* [PATCH v2 0/6] Fix some bugs about HugeTLB code
@ 2021-01-06  8:47 Muchun Song
  2021-01-06  8:47 ` [PATCH v2 1/6] mm: migrate: do not migrate HugeTLB page whose refcount is one Muchun Song
                   ` (6 more replies)
  0 siblings, 7 replies; 75+ messages in thread
From: Muchun Song @ 2021-01-06  8:47 UTC (permalink / raw)
  To: mike.kravetz, akpm
  Cc: n-horiguchi, ak, mhocko, linux-mm, linux-kernel, Muchun Song

This patch series aims to fix some bugs and add some improvements.

Changelog since v1 -> v2:
  - Export set_page_huge_active() in patch #2 so hugetlbfs can use it.
  - Use head[3].mapping to indicate that the page is freed in patch #3.
  - Flush @free_hpage_work in patch #4.

Muchun Song (6):
  mm: migrate: do not migrate HugeTLB page whose refcount is one
  mm: hugetlbfs: fix cannot migrate the fallocated HugeTLB page
  mm: hugetlb: fix a race between freeing and dissolving the page
  mm: hugetlb: add return -EAGAIN for dissolve_free_huge_page
  mm: hugetlb: fix a race between isolating and freeing page
  mm: hugetlb: remove VM_BUG_ON_PAGE from page_huge_active

 fs/hugetlbfs/inode.c    |  3 ++-
 include/linux/hugetlb.h |  2 ++
 mm/hugetlb.c            | 69 +++++++++++++++++++++++++++++++++++++++++++------
 mm/migrate.c            |  6 +++++
 4 files changed, 71 insertions(+), 9 deletions(-)

-- 
2.11.0


^ permalink raw reply	[flat|nested] 75+ messages in thread

* [PATCH v2 1/6] mm: migrate: do not migrate HugeTLB page whose refcount is one
  2021-01-06  8:47 [PATCH v2 0/6] Fix some bugs about HugeTLB code Muchun Song
@ 2021-01-06  8:47 ` Muchun Song
  2021-01-06 16:13   ` Michal Hocko
  2021-01-06  8:47 ` [PATCH v2 2/6] mm: hugetlbfs: fix cannot migrate the fallocated HugeTLB page Muchun Song
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 75+ messages in thread
From: Muchun Song @ 2021-01-06  8:47 UTC (permalink / raw)
  To: mike.kravetz, akpm
  Cc: n-horiguchi, ak, mhocko, linux-mm, linux-kernel, Muchun Song, Yang Shi

If the refcount is one when it is migrated, it means that the page
was freed from under us. So we are done and do not need to migrate.

Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Yang Shi <shy828301@gmail.com>
---
 mm/migrate.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/mm/migrate.c b/mm/migrate.c
index 4385f2fb5d18..a6631c4eb6a6 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1279,6 +1279,12 @@ static int unmap_and_move_huge_page(new_page_t get_new_page,
 		return -ENOSYS;
 	}
 
+	if (page_count(hpage) == 1) {
+		/* page was freed from under us. So we are done. */
+		putback_active_hugepage(hpage);
+		return MIGRATEPAGE_SUCCESS;
+	}
+
 	new_hpage = get_new_page(hpage, private);
 	if (!new_hpage)
 		return -ENOMEM;
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 75+ messages in thread
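The early return added by this patch can be exercised outside the kernel. The sketch below is a hypothetical userland model (the struct, the constants, and try_migrate() are illustrative stand-ins, not kernel code): when the only remaining reference is the one taken for migration, the page was concurrently freed, so the migration path just puts the page back and reports success.

```c
#include <assert.h>

/* Illustrative stand-ins for kernel values; not the real definitions. */
#define MIGRATEPAGE_SUCCESS 0
#define MIGRATE_STARTED     1

struct hpage {
	int refcount; /* models page_count() */
};

/*
 * Models the check added to unmap_and_move_huge_page(): a refcount of
 * one means only the isolation reference remains, i.e. the page was
 * freed from under us, so there is nothing left to migrate.
 */
static int try_migrate(struct hpage *hpage)
{
	if (hpage->refcount == 1) {
		hpage->refcount--; /* models putback_active_hugepage() */
		return MIGRATEPAGE_SUCCESS;
	}
	return MIGRATE_STARTED; /* continue with the real migration path */
}
```

The success return for the freed case matters: callers treat it like a completed migration, so no target page is allocated for a page that no longer needs one.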

* [PATCH v2 2/6] mm: hugetlbfs: fix cannot migrate the fallocated HugeTLB page
  2021-01-06  8:47 [PATCH v2 0/6] Fix some bugs about HugeTLB code Muchun Song
  2021-01-06  8:47 ` [PATCH v2 1/6] mm: migrate: do not migrate HugeTLB page whose refcount is one Muchun Song
@ 2021-01-06  8:47 ` Muchun Song
  2021-01-06 16:35   ` Michal Hocko
  2021-01-06  8:47 ` [PATCH v2 3/6] mm: hugetlb: fix a race between freeing and dissolving the page Muchun Song
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 75+ messages in thread
From: Muchun Song @ 2021-01-06  8:47 UTC (permalink / raw)
  To: mike.kravetz, akpm
  Cc: n-horiguchi, ak, mhocko, linux-mm, linux-kernel, Muchun Song

Because we can only isolate an active page via isolate_huge_page()
and hugetlbfs_fallocate() forgets to mark it as active, we cannot
isolate and migrate those pages.

Only export set_page_huge_active() and leave clear_page_huge_active()
static, because there are no external users.

Fixes: 70c3547e36f5 ("hugetlbfs: add hugetlbfs_fallocate()")
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
 fs/hugetlbfs/inode.c    | 3 ++-
 include/linux/hugetlb.h | 2 ++
 mm/hugetlb.c            | 2 +-
 3 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index b5c109703daa..21c20fd5f9ee 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -735,9 +735,10 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset,
 
 		mutex_unlock(&hugetlb_fault_mutex_table[hash]);
 
+		set_page_huge_active(page);
 		/*
 		 * unlock_page because locked by add_to_page_cache()
-		 * page_put due to reference from alloc_huge_page()
+		 * put_page() due to reference from alloc_huge_page()
 		 */
 		unlock_page(page);
 		put_page(page);
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index ebca2ef02212..b5807f23caf8 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -770,6 +770,8 @@ static inline void huge_ptep_modify_prot_commit(struct vm_area_struct *vma,
 }
 #endif
 
+void set_page_huge_active(struct page *page);
+
 #else	/* CONFIG_HUGETLB_PAGE */
 struct hstate {};
 
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 1f3bf1710b66..4741d60f8955 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1348,7 +1348,7 @@ bool page_huge_active(struct page *page)
 }
 
 /* never called for tail page */
-static void set_page_huge_active(struct page *page)
+void set_page_huge_active(struct page *page)
 {
 	VM_BUG_ON_PAGE(!PageHeadHuge(page), page);
 	SetPagePrivate(&page[1]);
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 75+ messages in thread

* [PATCH v2 3/6] mm: hugetlb: fix a race between freeing and dissolving the page
  2021-01-06  8:47 [PATCH v2 0/6] Fix some bugs about HugeTLB code Muchun Song
  2021-01-06  8:47 ` [PATCH v2 1/6] mm: migrate: do not migrate HugeTLB page whose refcount is one Muchun Song
  2021-01-06  8:47 ` [PATCH v2 2/6] mm: hugetlbfs: fix cannot migrate the fallocated HugeTLB page Muchun Song
@ 2021-01-06  8:47 ` Muchun Song
  2021-01-06 16:56   ` Michal Hocko
  2021-01-06  8:47 ` [PATCH v2 4/6] mm: hugetlb: add return -EAGAIN for dissolve_free_huge_page Muchun Song
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 75+ messages in thread
From: Muchun Song @ 2021-01-06  8:47 UTC (permalink / raw)
  To: mike.kravetz, akpm
  Cc: n-horiguchi, ak, mhocko, linux-mm, linux-kernel, Muchun Song

There is a race condition between __free_huge_page()
and dissolve_free_huge_page().

CPU0:                         CPU1:

// page_count(page) == 1
put_page(page)
  __free_huge_page(page)
                              dissolve_free_huge_page(page)
                                spin_lock(&hugetlb_lock)
                                // PageHuge(page) && !page_count(page)
                                update_and_free_page(page)
                                // page is freed to the buddy
                                spin_unlock(&hugetlb_lock)
    spin_lock(&hugetlb_lock)
    clear_page_huge_active(page)
    enqueue_huge_page(page)
    // It is wrong, the page is already freed
    spin_unlock(&hugetlb_lock)

The race window is between put_page() and the spin_lock() in
__free_huge_page().

We should make sure that the page is already on the free list
when it is dissolved.

Fixes: c8721bbbdd36 ("mm: memory-hotplug: enable memory hotplug to handle hugepage")
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
 mm/hugetlb.c | 38 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 4741d60f8955..8ff138c17129 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -79,6 +79,21 @@ DEFINE_SPINLOCK(hugetlb_lock);
 static int num_fault_mutexes;
 struct mutex *hugetlb_fault_mutex_table ____cacheline_aligned_in_smp;
 
+static inline bool PageHugeFreed(struct page *head)
+{
+	return (unsigned long)head[3].mapping == -1U;
+}
+
+static inline void SetPageHugeFreed(struct page *head)
+{
+	head[3].mapping = (void *)-1U;
+}
+
+static inline void ClearPageHugeFreed(struct page *head)
+{
+	head[3].mapping = NULL;
+}
+
 /* Forward declaration */
 static int hugetlb_acct_memory(struct hstate *h, long delta);
 
@@ -1028,6 +1043,7 @@ static void enqueue_huge_page(struct hstate *h, struct page *page)
 	list_move(&page->lru, &h->hugepage_freelists[nid]);
 	h->free_huge_pages++;
 	h->free_huge_pages_node[nid]++;
+	SetPageHugeFreed(page);
 }
 
 static struct page *dequeue_huge_page_node_exact(struct hstate *h, int nid)
@@ -1044,6 +1060,7 @@ static struct page *dequeue_huge_page_node_exact(struct hstate *h, int nid)
 
 		list_move(&page->lru, &h->hugepage_activelist);
 		set_page_refcounted(page);
+		ClearPageHugeFreed(page);
 		h->free_huge_pages--;
 		h->free_huge_pages_node[nid]--;
 		return page;
@@ -1291,6 +1308,17 @@ static inline void destroy_compound_gigantic_page(struct page *page,
 						unsigned int order) { }
 #endif
 
+/*
+ * Because we reuse the mapping field of some tail page structs, we should
+ * reset those mapping to initial value before @head is freed to the buddy
+ * allocator. The invalid value will be checked in the free_tail_pages_check().
+ */
+static inline void reset_tail_page_mapping(struct hstate *h, struct page *head)
+{
+	if (!hstate_is_gigantic(h))
+		head[3].mapping = TAIL_MAPPING;
+}
+
 static void update_and_free_page(struct hstate *h, struct page *page)
 {
 	int i;
@@ -1298,6 +1326,7 @@ static void update_and_free_page(struct hstate *h, struct page *page)
 	if (hstate_is_gigantic(h) && !gigantic_page_runtime_supported())
 		return;
 
+	reset_tail_page_mapping(h, page);
 	h->nr_huge_pages--;
 	h->nr_huge_pages_node[page_to_nid(page)]--;
 	for (i = 0; i < pages_per_huge_page(h); i++) {
@@ -1504,6 +1533,7 @@ static void prep_new_huge_page(struct hstate *h, struct page *page, int nid)
 	spin_lock(&hugetlb_lock);
 	h->nr_huge_pages++;
 	h->nr_huge_pages_node[nid]++;
+	ClearPageHugeFreed(page);
 	spin_unlock(&hugetlb_lock);
 }
 
@@ -1770,6 +1800,14 @@ int dissolve_free_huge_page(struct page *page)
 		int nid = page_to_nid(head);
 		if (h->free_huge_pages - h->resv_huge_pages == 0)
 			goto out;
+
+		/*
+		 * We should make sure that the page is already on the free list
+		 * when it is dissolved.
+		 */
+		if (unlikely(!PageHugeFreed(head)))
+			goto out;
+
 		/*
 		 * Move PageHWPoison flag from head page to the raw error page,
 		 * which makes any subpages rather than the error page reusable.
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 75+ messages in thread
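The head[3].mapping sentinel introduced here can be walked through in a small userland model. Everything below is a hypothetical sketch (a four-element array stands in for the compound page's struct pages, and dissolve() returns -1 where the kernel would report failure); it only demonstrates the state transitions the patch relies on.

```c
#include <assert.h>
#include <stddef.h>

struct page {
	void *mapping;
};

/* The "freed" state is stashed in the mapping field of the third tail
 * page, mirroring the helpers added by this patch. */
static int PageHugeFreed(struct page *head)
{
	return (unsigned long)head[3].mapping == (unsigned long)-1;
}

static void SetPageHugeFreed(struct page *head)
{
	head[3].mapping = (void *)(unsigned long)-1;
}

static void ClearPageHugeFreed(struct page *head)
{
	head[3].mapping = NULL;
}

/* Models the new bail-out in dissolve_free_huge_page(): a page that is
 * not yet on the free list (the put_page() -> enqueue window) must not
 * be dissolved. Returns 0 on success, -1 otherwise. */
static int dissolve(struct page *head)
{
	return PageHugeFreed(head) ? 0 : -1;
}
```

The flag is set in enqueue_huge_page() and cleared on dequeue and in prep_new_huge_page(), so "flag set" is exactly "page is on the free list" under hugetlb_lock.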

* [PATCH v2 4/6] mm: hugetlb: add return -EAGAIN for dissolve_free_huge_page
  2021-01-06  8:47 [PATCH v2 0/6] Fix some bugs about HugeTLB code Muchun Song
                   ` (2 preceding siblings ...)
  2021-01-06  8:47 ` [PATCH v2 3/6] mm: hugetlb: fix a race between freeing and dissolving the page Muchun Song
@ 2021-01-06  8:47 ` Muchun Song
  2021-01-06 17:07   ` Michal Hocko
  2021-01-06  8:47 ` [PATCH v2 5/6] mm: hugetlb: fix a race between isolating and freeing page Muchun Song
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 75+ messages in thread
From: Muchun Song @ 2021-01-06  8:47 UTC (permalink / raw)
  To: mike.kravetz, akpm
  Cc: n-horiguchi, ak, mhocko, linux-mm, linux-kernel, Muchun Song

When dissolve_free_huge_page() races with __free_huge_page(), we can
retry, because the race window is small.

Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
 mm/hugetlb.c | 26 +++++++++++++++++++++-----
 1 file changed, 21 insertions(+), 5 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 8ff138c17129..bf02e81e3953 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1775,10 +1775,11 @@ static int free_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed,
  * nothing for in-use hugepages and non-hugepages.
  * This function returns values like below:
  *
- *  -EBUSY: failed to dissolved free hugepages or the hugepage is in-use
- *          (allocated or reserved.)
- *       0: successfully dissolved free hugepages or the page is not a
- *          hugepage (considered as already dissolved)
+ *  -EAGAIN: race with __free_huge_page() and can do a retry
+ *  -EBUSY:  failed to dissolved free hugepages or the hugepage is in-use
+ *           (allocated or reserved.)
+ *       0:  successfully dissolved free hugepages or the page is not a
+ *           hugepage (considered as already dissolved)
  */
 int dissolve_free_huge_page(struct page *page)
 {
@@ -1805,8 +1806,10 @@ int dissolve_free_huge_page(struct page *page)
 		 * We should make sure that the page is already on the free list
 		 * when it is dissolved.
 		 */
-		if (unlikely(!PageHugeFreed(head)))
+		if (unlikely(!PageHugeFreed(head))) {
+			rc = -EAGAIN;
 			goto out;
+		}
 
 		/*
 		 * Move PageHWPoison flag from head page to the raw error page,
@@ -1825,6 +1828,14 @@ int dissolve_free_huge_page(struct page *page)
 	}
 out:
 	spin_unlock(&hugetlb_lock);
+
+	/*
+	 * If the freeing of the HugeTLB page is put on a work queue, we should
+	 * flush the work before retrying.
+	 */
+	if (unlikely(rc == -EAGAIN))
+		flush_work(&free_hpage_work);
+
 	return rc;
 }
 
@@ -1847,7 +1858,12 @@ int dissolve_free_huge_pages(unsigned long start_pfn, unsigned long end_pfn)
 
 	for (pfn = start_pfn; pfn < end_pfn; pfn += 1 << minimum_order) {
 		page = pfn_to_page(pfn);
+retry:
 		rc = dissolve_free_huge_page(page);
+		if (rc == -EAGAIN) {
+			cpu_relax();
+			goto retry;
+		}
 		if (rc)
 			break;
 	}
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 75+ messages in thread
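The -EAGAIN/retry interplay can be sketched as a userland model. Everything here is a hypothetical stand-in: the countdown variable represents the racing deferred free finishing, and flush_work() is represented only by a comment, since the point is just that the loop terminates once the racing free completes.

```c
#include <assert.h>

#define EAGAIN 11 /* illustrative errno value */

/* Stand-in for "the racing __free_huge_page() has not finished yet":
 * each failed attempt brings the free one step closer to completion. */
static int attempts_until_freed;

static int dissolve_free_huge_page_model(void)
{
	if (attempts_until_freed > 0) {
		attempts_until_freed--;
		return -EAGAIN; /* page not on the free list yet */
	}
	return 0; /* dissolved */
}

/* Models the retry loop added to dissolve_free_huge_pages(): on -EAGAIN
 * the kernel flushes free_hpage_work and retries the same pfn. */
static int dissolve_with_retry(void)
{
	int rc;

	do {
		rc = dissolve_free_huge_page_model();
		/* flush_work(&free_hpage_work) runs here in the kernel */
	} while (rc == -EAGAIN);
	return rc;
}
```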

* [PATCH v2 5/6] mm: hugetlb: fix a race between isolating and freeing page
  2021-01-06  8:47 [PATCH v2 0/6] Fix some bugs about HugeTLB code Muchun Song
                   ` (3 preceding siblings ...)
  2021-01-06  8:47 ` [PATCH v2 4/6] mm: hugetlb: add return -EAGAIN for dissolve_free_huge_page Muchun Song
@ 2021-01-06  8:47 ` Muchun Song
  2021-01-06 17:10   ` Michal Hocko
  2021-01-06  8:47 ` [PATCH v2 6/6] mm: hugetlb: remove VM_BUG_ON_PAGE from page_huge_active Muchun Song
  2021-01-07  9:30 ` [PATCH v2 0/6] Fix some bugs about HugeTLB code David Hildenbrand
  6 siblings, 1 reply; 75+ messages in thread
From: Muchun Song @ 2021-01-06  8:47 UTC (permalink / raw)
  To: mike.kravetz, akpm
  Cc: n-horiguchi, ak, mhocko, linux-mm, linux-kernel, Muchun Song

There is a race between isolate_huge_page() and __free_huge_page().

CPU0:                                       CPU1:

if (PageHuge(page))
                                            put_page(page)
                                              __free_huge_page(page)
                                                  spin_lock(&hugetlb_lock)
                                                  update_and_free_page(page)
                                                    set_compound_page_dtor(page,
                                                      NULL_COMPOUND_DTOR)
                                                  spin_unlock(&hugetlb_lock)
  isolate_huge_page(page)
    // trigger BUG_ON
    VM_BUG_ON_PAGE(!PageHead(page), page)
    spin_lock(&hugetlb_lock)
    page_huge_active(page)
      // trigger BUG_ON
      VM_BUG_ON_PAGE(!PageHuge(page), page)
    spin_unlock(&hugetlb_lock)

When we isolate a HugeTLB page on CPU0 and, meanwhile, free it to the
buddy allocator on CPU1, we can trigger a BUG_ON on CPU0, because the
page is already freed to the buddy allocator.

Fixes: c8721bbbdd36 ("mm: memory-hotplug: enable memory hotplug to handle hugepage")
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 mm/hugetlb.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index bf02e81e3953..67200dd25b1d 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5587,9 +5587,9 @@ bool isolate_huge_page(struct page *page, struct list_head *list)
 {
 	bool ret = true;
 
-	VM_BUG_ON_PAGE(!PageHead(page), page);
 	spin_lock(&hugetlb_lock);
-	if (!page_huge_active(page) || !get_page_unless_zero(page)) {
+	if (!PageHeadHuge(page) || !page_huge_active(page) ||
+	    !get_page_unless_zero(page)) {
 		ret = false;
 		goto unlock;
 	}
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 75+ messages in thread
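The reworked isolation check can also be modeled in userland. The struct and helpers below are hypothetical stand-ins; the point is that every check, including the head-page check that used to be an unlocked VM_BUG_ON, now happens under the lock, and the reference is taken only if the count is non-zero.

```c
#include <assert.h>

struct hpage {
	int is_head_huge; /* models PageHeadHuge() */
	int active;       /* models page_huge_active() */
	int refcount;
};

/* Models get_page_unless_zero(): never resurrect a freed page. */
static int get_page_unless_zero(struct hpage *p)
{
	if (p->refcount == 0)
		return 0;
	p->refcount++;
	return 1;
}

/* Models isolate_huge_page() after this patch: the head-page test moved
 * under the (elided here) hugetlb_lock instead of being an unlocked
 * VM_BUG_ON_PAGE. Returns 1 on success, 0 on failure. */
static int isolate_huge_page(struct hpage *p)
{
	if (!p->is_head_huge || !p->active || !get_page_unless_zero(p))
		return 0;
	return 1;
}
```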

* [PATCH v2 6/6] mm: hugetlb: remove VM_BUG_ON_PAGE from page_huge_active
  2021-01-06  8:47 [PATCH v2 0/6] Fix some bugs about HugeTLB code Muchun Song
                   ` (4 preceding siblings ...)
  2021-01-06  8:47 ` [PATCH v2 5/6] mm: hugetlb: fix a race between isolating and freeing page Muchun Song
@ 2021-01-06  8:47 ` Muchun Song
  2021-01-06 17:16   ` Michal Hocko
  2021-01-08 22:24   ` Mike Kravetz
  2021-01-07  9:30 ` [PATCH v2 0/6] Fix some bugs about HugeTLB code David Hildenbrand
  6 siblings, 2 replies; 75+ messages in thread
From: Muchun Song @ 2021-01-06  8:47 UTC (permalink / raw)
  To: mike.kravetz, akpm
  Cc: n-horiguchi, ak, mhocko, linux-mm, linux-kernel, Muchun Song

page_huge_active() can be called from scan_movable_pages(), which does
not hold a reference count on the HugeTLB page. So when we call
page_huge_active() from scan_movable_pages(), the HugeTLB page can be
freed in parallel. Then we will trigger the BUG_ON in
page_huge_active() when CONFIG_DEBUG_VM is enabled. Just remove the
VM_BUG_ON_PAGE.

Fixes: 7e1f049efb86 ("mm: hugetlb: cleanup using paeg_huge_active()")
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 mm/hugetlb.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 67200dd25b1d..7a24ed28ec4f 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1372,7 +1372,6 @@ struct hstate *size_to_hstate(unsigned long size)
  */
 bool page_huge_active(struct page *page)
 {
-	VM_BUG_ON_PAGE(!PageHuge(page), page);
 	return PageHead(page) && PagePrivate(&page[1]);
 }
 
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 75+ messages in thread
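With the VM_BUG_ON_PAGE gone, a racing caller that sees a page already returned to the buddy allocator simply gets false back. A hypothetical userland model (two array elements stand in for the head page and the first tail page):

```c
#include <assert.h>

struct page {
	int head;    /* models PageHead() */
	int private; /* models PagePrivate() on the first tail page */
};

/* Models page_huge_active() after this patch: no assertion, just a
 * plain predicate, so a page freed in parallel yields false rather
 * than a crash under CONFIG_DEBUG_VM. */
static int page_huge_active(struct page *page)
{
	return page->head && page[1].private;
}
```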

* Re: [PATCH v2 1/6] mm: migrate: do not migrate HugeTLB page whose refcount is one
  2021-01-06  8:47 ` [PATCH v2 1/6] mm: migrate: do not migrate HugeTLB page whose refcount is one Muchun Song
@ 2021-01-06 16:13   ` Michal Hocko
  2021-01-07  2:52       ` Muchun Song
  0 siblings, 1 reply; 75+ messages in thread
From: Michal Hocko @ 2021-01-06 16:13 UTC (permalink / raw)
  To: Muchun Song
  Cc: mike.kravetz, akpm, n-horiguchi, ak, linux-mm, linux-kernel, Yang Shi

On Wed 06-01-21 16:47:34, Muchun Song wrote:
> If the refcount is one when it is migrated, it means that the page
> was freed from under us. So we are done and do not need to migrate.

Is this common enough that it would warrant the explicit check for each
migration?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v2 2/6] mm: hugetlbfs: fix cannot migrate the fallocated HugeTLB page
  2021-01-06  8:47 ` [PATCH v2 2/6] mm: hugetlbfs: fix cannot migrate the fallocated HugeTLB page Muchun Song
@ 2021-01-06 16:35   ` Michal Hocko
  2021-01-06 19:30     ` Mike Kravetz
  2021-01-07  2:58       ` Muchun Song
  0 siblings, 2 replies; 75+ messages in thread
From: Michal Hocko @ 2021-01-06 16:35 UTC (permalink / raw)
  To: Muchun Song; +Cc: mike.kravetz, akpm, n-horiguchi, ak, linux-mm, linux-kernel

On Wed 06-01-21 16:47:35, Muchun Song wrote:
> Because we can only isolate an active page via isolate_huge_page()
> and hugetlbfs_fallocate() forgets to mark it as active, we cannot
> isolate and migrate those pages.

I had a little bit of a hard time understanding this initially and had
to dive into the code to make sense of it. I would consider the following
wording easier to grasp. Feel free to reuse if you like.
"
If a new hugetlb page is allocated during fallocate it will not be
marked as active (set_page_huge_active) which will result in a later
isolate_huge_page failure when the page migration code would like to
move that page. Such a failure would be unexpected and wrong.
"

Now to the fix. I believe that this patch shows that the
set_page_huge_active is just too subtle. Is there any reason why we
cannot make all freshly allocated huge pages active by default?
 
> Only export set_page_huge_active() and leave clear_page_huge_active()
> static, because there are no external users.
> 
> Fixes: 70c3547e36f5 (hugetlbfs: add hugetlbfs_fallocate())
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> ---
>  fs/hugetlbfs/inode.c    | 3 ++-
>  include/linux/hugetlb.h | 2 ++
>  mm/hugetlb.c            | 2 +-
>  3 files changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> index b5c109703daa..21c20fd5f9ee 100644
> --- a/fs/hugetlbfs/inode.c
> +++ b/fs/hugetlbfs/inode.c
> @@ -735,9 +735,10 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset,
>  
>  		mutex_unlock(&hugetlb_fault_mutex_table[hash]);
>  
> +		set_page_huge_active(page);
>  		/*
>  		 * unlock_page because locked by add_to_page_cache()
> -		 * page_put due to reference from alloc_huge_page()
> +		 * put_page() due to reference from alloc_huge_page()
>  		 */
>  		unlock_page(page);
>  		put_page(page);
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index ebca2ef02212..b5807f23caf8 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -770,6 +770,8 @@ static inline void huge_ptep_modify_prot_commit(struct vm_area_struct *vma,
>  }
>  #endif
>  
> +void set_page_huge_active(struct page *page);
> +
>  #else	/* CONFIG_HUGETLB_PAGE */
>  struct hstate {};
>  
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 1f3bf1710b66..4741d60f8955 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1348,7 +1348,7 @@ bool page_huge_active(struct page *page)
>  }
>  
>  /* never called for tail page */
> -static void set_page_huge_active(struct page *page)
> +void set_page_huge_active(struct page *page)
>  {
>  	VM_BUG_ON_PAGE(!PageHeadHuge(page), page);
>  	SetPagePrivate(&page[1]);
> -- 
> 2.11.0

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v2 3/6] mm: hugetlb: fix a race between freeing and dissolving the page
  2021-01-06  8:47 ` [PATCH v2 3/6] mm: hugetlb: fix a race between freeing and dissolving the page Muchun Song
@ 2021-01-06 16:56   ` Michal Hocko
  2021-01-06 20:58     ` Mike Kravetz
  2021-01-07  5:39       ` Muchun Song
  0 siblings, 2 replies; 75+ messages in thread
From: Michal Hocko @ 2021-01-06 16:56 UTC (permalink / raw)
  To: Muchun Song; +Cc: mike.kravetz, akpm, n-horiguchi, ak, linux-mm, linux-kernel

On Wed 06-01-21 16:47:36, Muchun Song wrote:
> There is a race condition between __free_huge_page()
> and dissolve_free_huge_page().
> 
> CPU0:                         CPU1:
> 
> // page_count(page) == 1
> put_page(page)
>   __free_huge_page(page)
>                               dissolve_free_huge_page(page)
>                                 spin_lock(&hugetlb_lock)
>                                 // PageHuge(page) && !page_count(page)
>                                 update_and_free_page(page)
>                                 // page is freed to the buddy
>                                 spin_unlock(&hugetlb_lock)
>     spin_lock(&hugetlb_lock)
>     clear_page_huge_active(page)
>     enqueue_huge_page(page)
>     // It is wrong, the page is already freed
>     spin_unlock(&hugetlb_lock)
> 
> The race window is between put_page() and the spin_lock() in
> __free_huge_page().

The race window really is between put_page and dissolve_free_huge_page.
And the result is that the put_page path would clobber an unrelated page
(either a free or an already reused page), which is quite serious.
Fortunately, pages are dissolved very rarely. I believe that a user would
need to be privileged to hit this by intention.

> We should make sure that the page is already on the free list
> when it is dissolved.

Another option would be to check for PageHuge in __free_huge_page. Have
you considered that rather than add yet another state? The scope of the
spinlock would have to be extended. If that sounds more tricky, then can
we check the page->lru in the dissolve path? If the page is still
PageHuge and reference count 0 then there shouldn't be many options
where it can be queued, right?

> Fixes: c8721bbbdd36 ("mm: memory-hotplug: enable memory hotplug to handle hugepage")
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> ---
>  mm/hugetlb.c | 38 ++++++++++++++++++++++++++++++++++++++
>  1 file changed, 38 insertions(+)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 4741d60f8955..8ff138c17129 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -79,6 +79,21 @@ DEFINE_SPINLOCK(hugetlb_lock);
>  static int num_fault_mutexes;
>  struct mutex *hugetlb_fault_mutex_table ____cacheline_aligned_in_smp;
>  
> +static inline bool PageHugeFreed(struct page *head)
> +{
> +	return (unsigned long)head[3].mapping == -1U;
> +}
> +
> +static inline void SetPageHugeFreed(struct page *head)
> +{
> +	head[3].mapping = (void *)-1U;
> +}
> +
> +static inline void ClearPageHugeFreed(struct page *head)
> +{
> +	head[3].mapping = NULL;
> +}
> +
>  /* Forward declaration */
>  static int hugetlb_acct_memory(struct hstate *h, long delta);
>  
> @@ -1028,6 +1043,7 @@ static void enqueue_huge_page(struct hstate *h, struct page *page)
>  	list_move(&page->lru, &h->hugepage_freelists[nid]);
>  	h->free_huge_pages++;
>  	h->free_huge_pages_node[nid]++;
> +	SetPageHugeFreed(page);
>  }
>  
>  static struct page *dequeue_huge_page_node_exact(struct hstate *h, int nid)
> @@ -1044,6 +1060,7 @@ static struct page *dequeue_huge_page_node_exact(struct hstate *h, int nid)
>  
>  		list_move(&page->lru, &h->hugepage_activelist);
>  		set_page_refcounted(page);
> +		ClearPageHugeFreed(page);
>  		h->free_huge_pages--;
>  		h->free_huge_pages_node[nid]--;
>  		return page;
> @@ -1291,6 +1308,17 @@ static inline void destroy_compound_gigantic_page(struct page *page,
>  						unsigned int order) { }
>  #endif
>  
> +/*
> + * Because we reuse the mapping field of some tail page structs, we should
> + * reset those mapping to initial value before @head is freed to the buddy
> + * allocator. The invalid value will be checked in the free_tail_pages_check().
> + */
> +static inline void reset_tail_page_mapping(struct hstate *h, struct page *head)
> +{
> +	if (!hstate_is_gigantic(h))
> +		head[3].mapping = TAIL_MAPPING;
> +}
> +
>  static void update_and_free_page(struct hstate *h, struct page *page)
>  {
>  	int i;
> @@ -1298,6 +1326,7 @@ static void update_and_free_page(struct hstate *h, struct page *page)
>  	if (hstate_is_gigantic(h) && !gigantic_page_runtime_supported())
>  		return;
>  
> +	reset_tail_page_mapping(h, page);
>  	h->nr_huge_pages--;
>  	h->nr_huge_pages_node[page_to_nid(page)]--;
>  	for (i = 0; i < pages_per_huge_page(h); i++) {
> @@ -1504,6 +1533,7 @@ static void prep_new_huge_page(struct hstate *h, struct page *page, int nid)
>  	spin_lock(&hugetlb_lock);
>  	h->nr_huge_pages++;
>  	h->nr_huge_pages_node[nid]++;
> +	ClearPageHugeFreed(page);
>  	spin_unlock(&hugetlb_lock);
>  }
>  
> @@ -1770,6 +1800,14 @@ int dissolve_free_huge_page(struct page *page)
>  		int nid = page_to_nid(head);
>  		if (h->free_huge_pages - h->resv_huge_pages == 0)
>  			goto out;
> +
> +		/*
> +		 * We should make sure that the page is already on the free list
> +		 * when it is dissolved.
> +		 */
> +		if (unlikely(!PageHugeFreed(head)))
> +			goto out;
> +
>  		/*
>  		 * Move PageHWPoison flag from head page to the raw error page,
>  		 * which makes any subpages rather than the error page reusable.
> -- 
> 2.11.0

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v2 4/6] mm: hugetlb: add return -EAGAIN for dissolve_free_huge_page
  2021-01-06  8:47 ` [PATCH v2 4/6] mm: hugetlb: add return -EAGAIN for dissolve_free_huge_page Muchun Song
@ 2021-01-06 17:07   ` Michal Hocko
  2021-01-07  3:11       ` Muchun Song
  0 siblings, 1 reply; 75+ messages in thread
From: Michal Hocko @ 2021-01-06 17:07 UTC (permalink / raw)
  To: Muchun Song; +Cc: mike.kravetz, akpm, n-horiguchi, ak, linux-mm, linux-kernel

On Wed 06-01-21 16:47:37, Muchun Song wrote:
> When dissolve_free_huge_page() races with __free_huge_page(), we can
> retry, because the race window is small.

Is this a bug fix or a mere optimization? I have a hard time telling
from the description.
 
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> ---
>  mm/hugetlb.c | 26 +++++++++++++++++++++-----
>  1 file changed, 21 insertions(+), 5 deletions(-)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
[...]
> @@ -1825,6 +1828,14 @@ int dissolve_free_huge_page(struct page *page)
>  	}
>  out:
>  	spin_unlock(&hugetlb_lock);
> +
> +	/*
> +	 * If the freeing of the HugeTLB page is put on a work queue, we should
> +	 * flush the work before retrying.
> +	 */
> +	if (unlikely(rc == -EAGAIN))
> +		flush_work(&free_hpage_work);

Is it safe to wait for the work to finish from this context?

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v2 5/6] mm: hugetlb: fix a race between isolating and freeing page
  2021-01-06  8:47 ` [PATCH v2 5/6] mm: hugetlb: fix a race between isolating and freeing page Muchun Song
@ 2021-01-06 17:10   ` Michal Hocko
  0 siblings, 0 replies; 75+ messages in thread
From: Michal Hocko @ 2021-01-06 17:10 UTC (permalink / raw)
  To: Muchun Song; +Cc: mike.kravetz, akpm, n-horiguchi, ak, linux-mm, linux-kernel

On Wed 06-01-21 16:47:38, Muchun Song wrote:
> There is a race between isolate_huge_page() and __free_huge_page().
> 
> CPU0:                                       CPU1:
> 
> if (PageHuge(page))
>                                             put_page(page)
>                                               __free_huge_page(page)
>                                                   spin_lock(&hugetlb_lock)
>                                                   update_and_free_page(page)
>                                                     set_compound_page_dtor(page,
>                                                       NULL_COMPOUND_DTOR)
>                                                   spin_unlock(&hugetlb_lock)
>   isolate_huge_page(page)
>     // trigger BUG_ON
>     VM_BUG_ON_PAGE(!PageHead(page), page)
>     spin_lock(&hugetlb_lock)
>     page_huge_active(page)
>       // trigger BUG_ON
>       VM_BUG_ON_PAGE(!PageHuge(page), page)
>     spin_unlock(&hugetlb_lock)
> 
> When we isolate a HugeTLB page on CPU0 and, meanwhile, free it to the
> buddy allocator on CPU1, we can trigger a BUG_ON on CPU0, because the
> page is already freed to the buddy allocator.
> 
> Fixes: c8721bbbdd36 ("mm: memory-hotplug: enable memory hotplug to handle hugepage")
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>

Acked-by: Michal Hocko <mhocko@suse.com>

Thanks!
> ---
>  mm/hugetlb.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index bf02e81e3953..67200dd25b1d 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -5587,9 +5587,9 @@ bool isolate_huge_page(struct page *page, struct list_head *list)
>  {
>  	bool ret = true;
>  
> -	VM_BUG_ON_PAGE(!PageHead(page), page);
>  	spin_lock(&hugetlb_lock);
> -	if (!page_huge_active(page) || !get_page_unless_zero(page)) {
> +	if (!PageHeadHuge(page) || !page_huge_active(page) ||
> +	    !get_page_unless_zero(page)) {
>  		ret = false;
>  		goto unlock;
>  	}
> -- 
> 2.11.0

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v2 6/6] mm: hugetlb: remove VM_BUG_ON_PAGE from page_huge_active
  2021-01-06  8:47 ` [PATCH v2 6/6] mm: hugetlb: remove VM_BUG_ON_PAGE from page_huge_active Muchun Song
@ 2021-01-06 17:16   ` Michal Hocko
  2021-01-08 22:24   ` Mike Kravetz
  1 sibling, 0 replies; 75+ messages in thread
From: Michal Hocko @ 2021-01-06 17:16 UTC (permalink / raw)
  To: Muchun Song; +Cc: mike.kravetz, akpm, n-horiguchi, ak, linux-mm, linux-kernel

On Wed 06-01-21 16:47:39, Muchun Song wrote:
> The page_huge_active() can be called from scan_movable_pages(), which
> does not hold a reference count on the HugeTLB page. So when we call
> page_huge_active() from scan_movable_pages(), the HugeTLB page can be
> freed in parallel. Then we will trigger the BUG_ON in
> page_huge_active() when CONFIG_DEBUG_VM is enabled. Just remove the
> VM_BUG_ON_PAGE.
> 
> Fixes: 7e1f049efb86 ("mm: hugetlb: cleanup using paeg_huge_active()")
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>

Acked-by: Michal Hocko <mhocko@suse.com>

The BUG_ON looks like the wrong thing to do regardless of the memory
hotplug use, which is admittedly ugly as well.

> ---
>  mm/hugetlb.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 67200dd25b1d..7a24ed28ec4f 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1372,7 +1372,6 @@ struct hstate *size_to_hstate(unsigned long size)
>   */
>  bool page_huge_active(struct page *page)
>  {
> -	VM_BUG_ON_PAGE(!PageHuge(page), page);
>  	return PageHead(page) && PagePrivate(&page[1]);
>  }
>  
> -- 
> 2.11.0

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v2 2/6] mm: hugetlbfs: fix cannot migrate the fallocated HugeTLB page
  2021-01-06 16:35   ` Michal Hocko
@ 2021-01-06 19:30     ` Mike Kravetz
  2021-01-06 20:02       ` Michal Hocko
  2021-01-07  2:58       ` Muchun Song
  1 sibling, 1 reply; 75+ messages in thread
From: Mike Kravetz @ 2021-01-06 19:30 UTC (permalink / raw)
  To: Michal Hocko, Muchun Song; +Cc: akpm, n-horiguchi, ak, linux-mm, linux-kernel

On 1/6/21 8:35 AM, Michal Hocko wrote:
> On Wed 06-01-21 16:47:35, Muchun Song wrote:
>> Because we only can isolate a active page via isolate_huge_page()
>> and hugetlbfs_fallocate() forget to mark it as active, we cannot
>> isolate and migrate those pages.
> 
I had a little bit of a hard time understanding this initially and had
to dive into the code to make sense of it. I would consider the
following wording easier to grasp. Feel free to reuse if you like.
> "
> If a new hugetlb page is allocated during fallocate it will not be
> marked as active (set_page_huge_active) which will result in a later
> isolate_huge_page failure when the page migration code would like to
> move that page. Such a failure would be unexpected and wrong.
> "
> 
> Now to the fix. I believe that this patch shows that the
> set_page_huge_active is just too subtle. Is there any reason why we
> cannot make all freshly allocated huge pages active by default?

I looked into that yesterday.  The primary issue is in page fault code,
hugetlb_no_page is an example.  If page_huge_active is set, then it can
be isolated for migration.  So, migration could race with the page fault
and the page could be migrated before being added to the page table of
the faulting task.  This was an issue when hugetlb_no_page called
set_page_huge_active right after allocating and clearing the huge page.
Commit cb6acd01e2e4
moved the set_page_huge_active after adding the page to the page table
to address this issue.
-- 
Mike Kravetz

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v2 2/6] mm: hugetlbfs: fix cannot migrate the fallocated HugeTLB page
  2021-01-06 19:30     ` Mike Kravetz
@ 2021-01-06 20:02       ` Michal Hocko
  2021-01-06 21:07         ` Mike Kravetz
  0 siblings, 1 reply; 75+ messages in thread
From: Michal Hocko @ 2021-01-06 20:02 UTC (permalink / raw)
  To: Mike Kravetz; +Cc: Muchun Song, akpm, n-horiguchi, ak, linux-mm, linux-kernel

On Wed 06-01-21 11:30:25, Mike Kravetz wrote:
> On 1/6/21 8:35 AM, Michal Hocko wrote:
> > On Wed 06-01-21 16:47:35, Muchun Song wrote:
> >> Because we only can isolate a active page via isolate_huge_page()
> >> and hugetlbfs_fallocate() forget to mark it as active, we cannot
> >> isolate and migrate those pages.
> > 
> > I've little bit hard time to understand this initially and had to dive
> > into the code to make sense of it. I would consider the following
> > wording easier to grasp. Feel free to reuse if you like.
> > "
> > If a new hugetlb page is allocated during fallocate it will not be
> > marked as active (set_page_huge_active) which will result in a later
> > isolate_huge_page failure when the page migration code would like to
> > move that page. Such a failure would be unexpected and wrong.
> > "
> > 
> > Now to the fix. I believe that this patch shows that the
> > set_page_huge_active is just too subtle. Is there any reason why we
> > cannot make all freshly allocated huge pages active by default?
> 
> I looked into that yesterday.  The primary issue is in page fault code,
> hugetlb_no_page is an example.  If page_huge_active is set, then it can
> be isolated for migration.  So, migration could race with the page fault
> and the page could be migrated before being added to the page table of
> the faulting task.  This was an issue when hugetlb_no_page set_page_huge_active
> right after allocating and clearing the huge page.  Commit cb6acd01e2e4
> moved the set_page_huge_active after adding the page to the page table
> to address this issue.

Thanks for the clarification. I was not aware of this subtlety. The
existing comment is not helping much TBH. I am still digesting the
suggested race. The page is new and exclusive and not visible via page
tables yet, so the only source of the migration would be pfn based
(hotplug, poisoning), right?

Btw. s@set_page_huge_active@set_page_huge_migrateable@ would help
readability IMHO. With a comment explaining that this _has_ to be called
after the page is fully initialized.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v2 3/6] mm: hugetlb: fix a race between freeing and dissolving the page
  2021-01-06 16:56   ` Michal Hocko
@ 2021-01-06 20:58     ` Mike Kravetz
  2021-01-07  3:08         ` Muchun Song
  2021-01-07  8:40       ` Michal Hocko
  2021-01-07  5:39       ` Muchun Song
  1 sibling, 2 replies; 75+ messages in thread
From: Mike Kravetz @ 2021-01-06 20:58 UTC (permalink / raw)
  To: Michal Hocko, Muchun Song; +Cc: akpm, n-horiguchi, ak, linux-mm, linux-kernel

On 1/6/21 8:56 AM, Michal Hocko wrote:
> On Wed 06-01-21 16:47:36, Muchun Song wrote:
>> There is a race condition between __free_huge_page()
>> and dissolve_free_huge_page().
>>
>> CPU0:                         CPU1:
>>
>> // page_count(page) == 1
>> put_page(page)
>>   __free_huge_page(page)
>>                               dissolve_free_huge_page(page)
>>                                 spin_lock(&hugetlb_lock)
>>                                 // PageHuge(page) && !page_count(page)
>>                                 update_and_free_page(page)
>>                                 // page is freed to the buddy
>>                                 spin_unlock(&hugetlb_lock)
>>     spin_lock(&hugetlb_lock)
>>     clear_page_huge_active(page)
>>     enqueue_huge_page(page)
>>     // It is wrong, the page is already freed
>>     spin_unlock(&hugetlb_lock)
>>
>> The race window is between put_page() and the spin_lock() in
>> __free_huge_page().
> 
> The race window really is between put_page and dissolve_free_huge_page,
> and the result is that the put_page path would clobber an unrelated page
> (either a free or an already reused page), which is quite serious.
> Fortunately, pages are dissolved very rarely. I believe that a user
> would need to be privileged to hit this intentionally.
> 
>> We should make sure that the page is already on the free list
>> when it is dissolved.
> 
> Another option would be to check for PageHuge in __free_huge_page. Have
> you considered that rather than add yet another state? The scope of the
> spinlock would have to be extended. If that sounds more tricky then can
> we check the page->lru in the dissolve path? If the page is still
> PageHuge and reference count 0 then there shouldn't be many options
> where it can be queued, right?

The tricky part with expanding lock scope will be the potential call to
hugepage_subpool_put_pages as it may also try to acquire the hugetlb_lock.

I am not sure what you mean by 'check the page->lru'?  If we knew the page
was on the free list, then we could dissolve.  But, I do not think there
is an easy way to determine that from page->lru.  A hugetlb page is either
going to be on the active list or free list.

> 
>> Fixes: c8721bbbdd36 ("mm: memory-hotplug: enable memory hotplug to handle hugepage")
>> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
>> ---
>>  mm/hugetlb.c | 38 ++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 38 insertions(+)
>>
>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>> index 4741d60f8955..8ff138c17129 100644
>> --- a/mm/hugetlb.c
>> +++ b/mm/hugetlb.c
>> @@ -79,6 +79,21 @@ DEFINE_SPINLOCK(hugetlb_lock);
>>  static int num_fault_mutexes;
>>  struct mutex *hugetlb_fault_mutex_table ____cacheline_aligned_in_smp;
>>  
>> +static inline bool PageHugeFreed(struct page *head)
>> +{
>> +	return (unsigned long)head[3].mapping == -1U;
>> +}
>> +
>> +static inline void SetPageHugeFreed(struct page *head)
>> +{
>> +	head[3].mapping = (void *)-1U;
>> +}
>> +
>> +static inline void ClearPageHugeFreed(struct page *head)
>> +{
>> +	head[3].mapping = NULL;
>> +}
>> +
>>  /* Forward declaration */
>>  static int hugetlb_acct_memory(struct hstate *h, long delta);
>>  
>> @@ -1028,6 +1043,7 @@ static void enqueue_huge_page(struct hstate *h, struct page *page)
>>  	list_move(&page->lru, &h->hugepage_freelists[nid]);
>>  	h->free_huge_pages++;
>>  	h->free_huge_pages_node[nid]++;
>> +	SetPageHugeFreed(page);
>>  }
>>  
>>  static struct page *dequeue_huge_page_node_exact(struct hstate *h, int nid)
>> @@ -1044,6 +1060,7 @@ static struct page *dequeue_huge_page_node_exact(struct hstate *h, int nid)
>>  
>>  		list_move(&page->lru, &h->hugepage_activelist);
>>  		set_page_refcounted(page);
>> +		ClearPageHugeFreed(page);
>>  		h->free_huge_pages--;
>>  		h->free_huge_pages_node[nid]--;
>>  		return page;
>> @@ -1291,6 +1308,17 @@ static inline void destroy_compound_gigantic_page(struct page *page,
>>  						unsigned int order) { }
>>  #endif
>>  
>> +/*
>> + * Because we reuse the mapping field of some tail page structs, we should
>> + * reset those mapping to initial value before @head is freed to the buddy
>> + * allocator. The invalid value will be checked in the free_tail_pages_check().
>> + */

When I suggested using head[3].mapping for this state, I was not aware of
this requirement.  My suggestion was only following the convention used in
PageHugeTemporary.  I would not have made the suggestion if I had realized
this was required.  Sorry.
-- 
Mike Kravetz

>> +static inline void reset_tail_page_mapping(struct hstate *h, struct page *head)
>> +{
>> +	if (!hstate_is_gigantic(h))
>> +		head[3].mapping = TAIL_MAPPING;
>> +}
>> +
>>  static void update_and_free_page(struct hstate *h, struct page *page)
>>  {
>>  	int i;
>> @@ -1298,6 +1326,7 @@ static void update_and_free_page(struct hstate *h, struct page *page)
>>  	if (hstate_is_gigantic(h) && !gigantic_page_runtime_supported())
>>  		return;
>>  
>> +	reset_tail_page_mapping(h, page);
>>  	h->nr_huge_pages--;
>>  	h->nr_huge_pages_node[page_to_nid(page)]--;
>>  	for (i = 0; i < pages_per_huge_page(h); i++) {
>> @@ -1504,6 +1533,7 @@ static void prep_new_huge_page(struct hstate *h, struct page *page, int nid)
>>  	spin_lock(&hugetlb_lock);
>>  	h->nr_huge_pages++;
>>  	h->nr_huge_pages_node[nid]++;
>> +	ClearPageHugeFreed(page);
>>  	spin_unlock(&hugetlb_lock);
>>  }
>>  
>> @@ -1770,6 +1800,14 @@ int dissolve_free_huge_page(struct page *page)
>>  		int nid = page_to_nid(head);
>>  		if (h->free_huge_pages - h->resv_huge_pages == 0)
>>  			goto out;
>> +
>> +		/*
>> +		 * We should make sure that the page is already on the free list
>> +		 * when it is dissolved.
>> +		 */
>> +		if (unlikely(!PageHugeFreed(head)))
>> +			goto out;
>> +
>>  		/*
>>  		 * Move PageHWPoison flag from head page to the raw error page,
>>  		 * which makes any subpages rather than the error page reusable.
>> -- 
>> 2.11.0
> 

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v2 2/6] mm: hugetlbfs: fix cannot migrate the fallocated HugeTLB page
  2021-01-06 20:02       ` Michal Hocko
@ 2021-01-06 21:07         ` Mike Kravetz
  2021-01-07  8:36           ` Michal Hocko
  0 siblings, 1 reply; 75+ messages in thread
From: Mike Kravetz @ 2021-01-06 21:07 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Muchun Song, akpm, n-horiguchi, ak, linux-mm, linux-kernel

On 1/6/21 12:02 PM, Michal Hocko wrote:
> On Wed 06-01-21 11:30:25, Mike Kravetz wrote:
>> On 1/6/21 8:35 AM, Michal Hocko wrote:
>>> On Wed 06-01-21 16:47:35, Muchun Song wrote:
>>>> Because we only can isolate a active page via isolate_huge_page()
>>>> and hugetlbfs_fallocate() forget to mark it as active, we cannot
>>>> isolate and migrate those pages.
>>>
>>> I've little bit hard time to understand this initially and had to dive
>>> into the code to make sense of it. I would consider the following
>>> wording easier to grasp. Feel free to reuse if you like.
>>> "
>>> If a new hugetlb page is allocated during fallocate it will not be
>>> marked as active (set_page_huge_active) which will result in a later
>>> isolate_huge_page failure when the page migration code would like to
>>> move that page. Such a failure would be unexpected and wrong.
>>> "
>>>
>>> Now to the fix. I believe that this patch shows that the
>>> set_page_huge_active is just too subtle. Is there any reason why we
>>> cannot make all freshly allocated huge pages active by default?
>>
>> I looked into that yesterday.  The primary issue is in page fault code,
>> hugetlb_no_page is an example.  If page_huge_active is set, then it can
>> be isolated for migration.  So, migration could race with the page fault
>> and the page could be migrated before being added to the page table of
>> the faulting task.  This was an issue when hugetlb_no_page set_page_huge_active
>> right after allocating and clearing the huge page.  Commit cb6acd01e2e4
>> moved the set_page_huge_active after adding the page to the page table
>> to address this issue.
> 
> Thanks for the clarification. I was not aware of this subtlety. The
> existing comment is not helping much TBH. I am still digesting the
> suggested race. The page is new and exclusive and not visible via page
> tables yet, so the only source of the migration would be pfn based
> (hotplug, poisoning), right?

That is correct.


> Btw. s@set_page_huge_active@set_page_huge_migrateable@ would help
> readability IMHO. With a comment explaining that this _has_ to be called
> after the page is fully initialized.

Agree, I will add that as a future enhancement.

-- 
Mike Kravetz

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [External] Re: [PATCH v2 1/6] mm: migrate: do not migrate HugeTLB page whose refcount is one
  2021-01-06 16:13   ` Michal Hocko
@ 2021-01-07  2:52       ` Muchun Song
  0 siblings, 0 replies; 75+ messages in thread
From: Muchun Song @ 2021-01-07  2:52 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Mike Kravetz, Andrew Morton, Naoya Horiguchi, Andi Kleen,
	Linux Memory Management List, LKML, Yang Shi

On Thu, Jan 7, 2021 at 12:13 AM Michal Hocko <mhocko@suse.com> wrote:
>
> On Wed 06-01-21 16:47:34, Muchun Song wrote:
> > If the refcount is one when it is migrated, it means that the page
> > was freed from under us. So we are done and do not need to migrate.
>
> Is this common enough that it would warrant the explicit check for each
> migration?

Are you worried about the overhead caused by the check? Thanks.

> --
> Michal Hocko
> SUSE Labs

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [External] Re: [PATCH v2 2/6] mm: hugetlbfs: fix cannot migrate the fallocated HugeTLB page
  2021-01-06 16:35   ` Michal Hocko
@ 2021-01-07  2:58       ` Muchun Song
  2021-01-07  2:58       ` Muchun Song
  1 sibling, 0 replies; 75+ messages in thread
From: Muchun Song @ 2021-01-07  2:58 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Mike Kravetz, Andrew Morton, Naoya Horiguchi, Andi Kleen,
	Linux Memory Management List, LKML

On Thu, Jan 7, 2021 at 12:35 AM Michal Hocko <mhocko@suse.com> wrote:
>
> On Wed 06-01-21 16:47:35, Muchun Song wrote:
> > Because we only can isolate a active page via isolate_huge_page()
> > and hugetlbfs_fallocate() forget to mark it as active, we cannot
> > isolate and migrate those pages.
>
> I've little bit hard time to understand this initially and had to dive
> into the code to make sense of it. I would consider the following
> wording easier to grasp. Feel free to reuse if you like.
> "
> If a new hugetlb page is allocated during fallocate it will not be
> marked as active (set_page_huge_active) which will result in a later
> isolate_huge_page failure when the page migration code would like to
> move that page. Such a failure would be unexpected and wrong.
> "

Thanks for your suggestion. I will reuse it.

>
> Now to the fix. I believe that this patch shows that the
> set_page_huge_active is just too subtle. Is there any reason why we
> cannot make all freshly allocated huge pages active by default?
>
> > Only export set_page_huge_active, just leave clear_page_huge_active
> > as static. Because there are no external users.
> >
> > Fixes: 70c3547e36f5 (hugetlbfs: add hugetlbfs_fallocate())
> > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> > ---
> >  fs/hugetlbfs/inode.c    | 3 ++-
> >  include/linux/hugetlb.h | 2 ++
> >  mm/hugetlb.c            | 2 +-
> >  3 files changed, 5 insertions(+), 2 deletions(-)
> >
> > diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> > index b5c109703daa..21c20fd5f9ee 100644
> > --- a/fs/hugetlbfs/inode.c
> > +++ b/fs/hugetlbfs/inode.c
> > @@ -735,9 +735,10 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset,
> >
> >               mutex_unlock(&hugetlb_fault_mutex_table[hash]);
> >
> > +             set_page_huge_active(page);
> >               /*
> >                * unlock_page because locked by add_to_page_cache()
> > -              * page_put due to reference from alloc_huge_page()
> > +              * put_page() due to reference from alloc_huge_page()
> >                */
> >               unlock_page(page);
> >               put_page(page);
> > diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> > index ebca2ef02212..b5807f23caf8 100644
> > --- a/include/linux/hugetlb.h
> > +++ b/include/linux/hugetlb.h
> > @@ -770,6 +770,8 @@ static inline void huge_ptep_modify_prot_commit(struct vm_area_struct *vma,
> >  }
> >  #endif
> >
> > +void set_page_huge_active(struct page *page);
> > +
> >  #else        /* CONFIG_HUGETLB_PAGE */
> >  struct hstate {};
> >
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index 1f3bf1710b66..4741d60f8955 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -1348,7 +1348,7 @@ bool page_huge_active(struct page *page)
> >  }
> >
> >  /* never called for tail page */
> > -static void set_page_huge_active(struct page *page)
> > +void set_page_huge_active(struct page *page)
> >  {
> >       VM_BUG_ON_PAGE(!PageHeadHuge(page), page);
> >       SetPagePrivate(&page[1]);
> > --
> > 2.11.0
>
> --
> Michal Hocko
> SUSE Labs

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [External] Re: [PATCH v2 3/6] mm: hugetlb: fix a race between freeing and dissolving the page
  2021-01-06 20:58     ` Mike Kravetz
@ 2021-01-07  3:08         ` Muchun Song
  2021-01-07  8:40       ` Michal Hocko
  1 sibling, 0 replies; 75+ messages in thread
From: Muchun Song @ 2021-01-07  3:08 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: Michal Hocko, Andrew Morton, Naoya Horiguchi, Andi Kleen,
	Linux Memory Management List, LKML

On Thu, Jan 7, 2021 at 5:00 AM Mike Kravetz <mike.kravetz@oracle.com> wrote:
>
> On 1/6/21 8:56 AM, Michal Hocko wrote:
> > On Wed 06-01-21 16:47:36, Muchun Song wrote:
> >> There is a race condition between __free_huge_page()
> >> and dissolve_free_huge_page().
> >>
> >> CPU0:                         CPU1:
> >>
> >> // page_count(page) == 1
> >> put_page(page)
> >>   __free_huge_page(page)
> >>                               dissolve_free_huge_page(page)
> >>                                 spin_lock(&hugetlb_lock)
> >>                                 // PageHuge(page) && !page_count(page)
> >>                                 update_and_free_page(page)
> >>                                 // page is freed to the buddy
> >>                                 spin_unlock(&hugetlb_lock)
> >>     spin_lock(&hugetlb_lock)
> >>     clear_page_huge_active(page)
> >>     enqueue_huge_page(page)
> >>     // It is wrong, the page is already freed
> >>     spin_unlock(&hugetlb_lock)
> >>
> >> The race window is between put_page() and the spin_lock() in
> >> __free_huge_page().
> >
> > The race window really is between put_page and dissolve_free_huge_page,
> > and the result is that the put_page path would clobber an unrelated page
> > (either a free or an already reused page), which is quite serious.
> > Fortunately, pages are dissolved very rarely. I believe that a user
> > would need to be privileged to hit this intentionally.
> >
> >> We should make sure that the page is already on the free list
> >> when it is dissolved.
> >
> > Another option would be to check for PageHuge in __free_huge_page. Have
> > you considered that rather than add yet another state? The scope of the
> > spinlock would have to be extended. If that sounds more tricky then can
> > we check the page->lru in the dissolve path? If the page is still
> > PageHuge and reference count 0 then there shouldn't be many options
> > where it can be queued, right?
>
> The tricky part with expanding lock scope will be the potential call to
> hugepage_subpool_put_pages as it may also try to acquire the hugetlb_lock.
>
> I am not sure what you mean by 'check the page->lru'?  If we knew the page
> was on the free list, then we could dissolve.  But, I do not think there
> is an easy way to determine that from page->lru.  A hugetlb page is either
> going to be on the active list or free list.
>
> >
> >> Fixes: c8721bbbdd36 ("mm: memory-hotplug: enable memory hotplug to handle hugepage")
> >> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> >> ---
> >>  mm/hugetlb.c | 38 ++++++++++++++++++++++++++++++++++++++
> >>  1 file changed, 38 insertions(+)
> >>
> >> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> >> index 4741d60f8955..8ff138c17129 100644
> >> --- a/mm/hugetlb.c
> >> +++ b/mm/hugetlb.c
> >> @@ -79,6 +79,21 @@ DEFINE_SPINLOCK(hugetlb_lock);
> >>  static int num_fault_mutexes;
> >>  struct mutex *hugetlb_fault_mutex_table ____cacheline_aligned_in_smp;
> >>
> >> +static inline bool PageHugeFreed(struct page *head)
> >> +{
> >> +    return (unsigned long)head[3].mapping == -1U;
> >> +}
> >> +
> >> +static inline void SetPageHugeFreed(struct page *head)
> >> +{
> >> +    head[3].mapping = (void *)-1U;
> >> +}
> >> +
> >> +static inline void ClearPageHugeFreed(struct page *head)
> >> +{
> >> +    head[3].mapping = NULL;
> >> +}
> >> +
> >>  /* Forward declaration */
> >>  static int hugetlb_acct_memory(struct hstate *h, long delta);
> >>
> >> @@ -1028,6 +1043,7 @@ static void enqueue_huge_page(struct hstate *h, struct page *page)
> >>      list_move(&page->lru, &h->hugepage_freelists[nid]);
> >>      h->free_huge_pages++;
> >>      h->free_huge_pages_node[nid]++;
> >> +    SetPageHugeFreed(page);
> >>  }
> >>
> >>  static struct page *dequeue_huge_page_node_exact(struct hstate *h, int nid)
> >> @@ -1044,6 +1060,7 @@ static struct page *dequeue_huge_page_node_exact(struct hstate *h, int nid)
> >>
> >>              list_move(&page->lru, &h->hugepage_activelist);
> >>              set_page_refcounted(page);
> >> +            ClearPageHugeFreed(page);
> >>              h->free_huge_pages--;
> >>              h->free_huge_pages_node[nid]--;
> >>              return page;
> >> @@ -1291,6 +1308,17 @@ static inline void destroy_compound_gigantic_page(struct page *page,
> >>                                              unsigned int order) { }
> >>  #endif
> >>
> >> +/*
> >> + * Because we reuse the mapping field of some tail page structs, we should
> >> + * reset those mapping to initial value before @head is freed to the buddy
> >> + * allocator. The invalid value will be checked in the free_tail_pages_check().
> >> + */
>
> When I suggested using head[3].mapping for this state, I was not aware of
> this requirement.  My suggestion was only following the convention used in
> PageHugeTemporary.  I would not have made the suggestion if I had realized
> this was required.  Sorry.

Yeah, PageHugeTemporary is lucky: free_tail_pages_check() does not
check head[2].mapping.

I will revert to the previous version (using head[3].private). Thanks.

> --
> Mike Kravetz
>
> >> +static inline void reset_tail_page_mapping(struct hstate *h, struct page *head)
> >> +{
> >> +    if (!hstate_is_gigantic(h))
> >> +            head[3].mapping = TAIL_MAPPING;
> >> +}
> >> +
> >>  static void update_and_free_page(struct hstate *h, struct page *page)
> >>  {
> >>      int i;
> >> @@ -1298,6 +1326,7 @@ static void update_and_free_page(struct hstate *h, struct page *page)
> >>      if (hstate_is_gigantic(h) && !gigantic_page_runtime_supported())
> >>              return;
> >>
> >> +    reset_tail_page_mapping(h, page);
> >>      h->nr_huge_pages--;
> >>      h->nr_huge_pages_node[page_to_nid(page)]--;
> >>      for (i = 0; i < pages_per_huge_page(h); i++) {
> >> @@ -1504,6 +1533,7 @@ static void prep_new_huge_page(struct hstate *h, struct page *page, int nid)
> >>      spin_lock(&hugetlb_lock);
> >>      h->nr_huge_pages++;
> >>      h->nr_huge_pages_node[nid]++;
> >> +    ClearPageHugeFreed(page);
> >>      spin_unlock(&hugetlb_lock);
> >>  }
> >>
> >> @@ -1770,6 +1800,14 @@ int dissolve_free_huge_page(struct page *page)
> >>              int nid = page_to_nid(head);
> >>              if (h->free_huge_pages - h->resv_huge_pages == 0)
> >>                      goto out;
> >> +
> >> +            /*
> >> +             * We should make sure that the page is already on the free list
> >> +             * when it is dissolved.
> >> +             */
> >> +            if (unlikely(!PageHugeFreed(head)))
> >> +                    goto out;
> >> +
> >>              /*
> >>               * Move PageHWPoison flag from head page to the raw error page,
> >>               * which makes any subpages rather than the error page reusable.
> >> --
> >> 2.11.0
> >

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [External] Re: [PATCH v2 3/6] mm: hugetlb: fix a race between freeing and dissolving the page
@ 2021-01-07  3:08         ` Muchun Song
  0 siblings, 0 replies; 75+ messages in thread
From: Muchun Song @ 2021-01-07  3:08 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: Michal Hocko, Andrew Morton, Naoya Horiguchi, Andi Kleen,
	Linux Memory Management List, LKML

On Thu, Jan 7, 2021 at 5:00 AM Mike Kravetz <mike.kravetz@oracle.com> wrote:
>
> On 1/6/21 8:56 AM, Michal Hocko wrote:
> > On Wed 06-01-21 16:47:36, Muchun Song wrote:
> >> There is a race condition between __free_huge_page()
> >> and dissolve_free_huge_page().
> >>
> >> CPU0:                         CPU1:
> >>
> >> // page_count(page) == 1
> >> put_page(page)
> >>   __free_huge_page(page)
> >>                               dissolve_free_huge_page(page)
> >>                                 spin_lock(&hugetlb_lock)
> >>                                 // PageHuge(page) && !page_count(page)
> >>                                 update_and_free_page(page)
> >>                                 // page is freed to the buddy
> >>                                 spin_unlock(&hugetlb_lock)
> >>     spin_lock(&hugetlb_lock)
> >>     clear_page_huge_active(page)
> >>     enqueue_huge_page(page)
> >>     // It is wrong, the page is already freed
> >>     spin_unlock(&hugetlb_lock)
> >>
> >> The race window is between put_page() and spin_lock() which
> >> is in the __free_huge_page().
> >
> > The race window really is between put_page and dissolve_free_huge_page.
> > And the result is that the put_page path would clobber an unrelated page
> > (either free or already reused page) which is quite serious.
> > Fortunately pages are dissolved very rarely. I believe that a user would
> > need to be privileged to hit this intentionally.
> >
> >> We should make sure that the page is already on the free list
> >> when it is dissolved.
> >
> > Another option would be to check for PageHuge in __free_huge_page. Have
> > you considered that rather than add yet another state? The scope of the
> > spinlock would have to be extended. If that sounds more tricky then can
> > we check the page->lru in the dissolve path? If the page is still
> > PageHuge and reference count 0 then there shouldn't be many options
> > where it can be queued, right?
>
> The tricky part with expanding lock scope will be the potential call to
> hugepage_subpool_put_pages as it may also try to acquire the hugetlb_lock.
>
> I am not sure what you mean by 'check the page->lru'.  If we knew the page
> was on the free list, then we could dissolve.  But, I do not think there
> is an easy way to determine that from page->lru.  A hugetlb page is either
> going to be on the active list or free list.
>
> >
> >> Fixes: c8721bbbdd36 ("mm: memory-hotplug: enable memory hotplug to handle hugepage")
> >> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> >> ---
> >>  mm/hugetlb.c | 38 ++++++++++++++++++++++++++++++++++++++
> >>  1 file changed, 38 insertions(+)
> >>
> >> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> >> index 4741d60f8955..8ff138c17129 100644
> >> --- a/mm/hugetlb.c
> >> +++ b/mm/hugetlb.c
> >> @@ -79,6 +79,21 @@ DEFINE_SPINLOCK(hugetlb_lock);
> >>  static int num_fault_mutexes;
> >>  struct mutex *hugetlb_fault_mutex_table ____cacheline_aligned_in_smp;
> >>
> >> +static inline bool PageHugeFreed(struct page *head)
> >> +{
> >> +    return (unsigned long)head[3].mapping == -1U;
> >> +}
> >> +
> >> +static inline void SetPageHugeFreed(struct page *head)
> >> +{
> >> +    head[3].mapping = (void *)-1U;
> >> +}
> >> +
> >> +static inline void ClearPageHugeFreed(struct page *head)
> >> +{
> >> +    head[3].mapping = NULL;
> >> +}
> >> +
> >>  /* Forward declaration */
> >>  static int hugetlb_acct_memory(struct hstate *h, long delta);
> >>
> >> @@ -1028,6 +1043,7 @@ static void enqueue_huge_page(struct hstate *h, struct page *page)
> >>      list_move(&page->lru, &h->hugepage_freelists[nid]);
> >>      h->free_huge_pages++;
> >>      h->free_huge_pages_node[nid]++;
> >> +    SetPageHugeFreed(page);
> >>  }
> >>
> >>  static struct page *dequeue_huge_page_node_exact(struct hstate *h, int nid)
> >> @@ -1044,6 +1060,7 @@ static struct page *dequeue_huge_page_node_exact(struct hstate *h, int nid)
> >>
> >>              list_move(&page->lru, &h->hugepage_activelist);
> >>              set_page_refcounted(page);
> >> +            ClearPageHugeFreed(page);
> >>              h->free_huge_pages--;
> >>              h->free_huge_pages_node[nid]--;
> >>              return page;
> >> @@ -1291,6 +1308,17 @@ static inline void destroy_compound_gigantic_page(struct page *page,
> >>                                              unsigned int order) { }
> >>  #endif
> >>
> >> +/*
> >> + * Because we reuse the mapping field of some tail page structs, we should
> >> + * reset those mappings to their initial value before @head is freed to the
> >> + * buddy allocator. The invalid value will be checked in free_tail_pages_check().
> >> + */
>
> When I suggested using head[3].mapping for this state, I was not aware of
> this requirement.  My suggestion was only following the convention used in
> PageHugeTemporary.  I would not have made the suggestion if I had realized
> this was required.  Sorry.

Yeah, PageHugeTemporary is lucky. free_tail_pages_check() does not
check head[2].mapping.

I will revert to the previous version (using head[3].private). Thanks.

> --
> Mike Kravetz
>
> >> +static inline void reset_tail_page_mapping(struct hstate *h, struct page *head)
> >> +{
> >> +    if (!hstate_is_gigantic(h))
> >> +            head[3].mapping = TAIL_MAPPING;
> >> +}
> >> +
> >>  static void update_and_free_page(struct hstate *h, struct page *page)
> >>  {
> >>      int i;
> >> @@ -1298,6 +1326,7 @@ static void update_and_free_page(struct hstate *h, struct page *page)
> >>      if (hstate_is_gigantic(h) && !gigantic_page_runtime_supported())
> >>              return;
> >>
> >> +    reset_tail_page_mapping(h, page);
> >>      h->nr_huge_pages--;
> >>      h->nr_huge_pages_node[page_to_nid(page)]--;
> >>      for (i = 0; i < pages_per_huge_page(h); i++) {
> >> @@ -1504,6 +1533,7 @@ static void prep_new_huge_page(struct hstate *h, struct page *page, int nid)
> >>      spin_lock(&hugetlb_lock);
> >>      h->nr_huge_pages++;
> >>      h->nr_huge_pages_node[nid]++;
> >> +    ClearPageHugeFreed(page);
> >>      spin_unlock(&hugetlb_lock);
> >>  }
> >>
> >> @@ -1770,6 +1800,14 @@ int dissolve_free_huge_page(struct page *page)
> >>              int nid = page_to_nid(head);
> >>              if (h->free_huge_pages - h->resv_huge_pages == 0)
> >>                      goto out;
> >> +
> >> +            /*
> >> +             * We should make sure that the page is already on the free list
> >> +             * when it is dissolved.
> >> +             */
> >> +            if (unlikely(!PageHugeFreed(head)))
> >> +                    goto out;
> >> +
> >>              /*
> >>               * Move PageHWPoison flag from head page to the raw error page,
> >>               * which makes any subpages rather than the error page reusable.
> >> --
> >> 2.11.0
> >


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [External] Re: [PATCH v2 4/6] mm: hugetlb: add return -EAGAIN for dissolve_free_huge_page
  2021-01-06 17:07   ` Michal Hocko
@ 2021-01-07  3:11       ` Muchun Song
  0 siblings, 0 replies; 75+ messages in thread
From: Muchun Song @ 2021-01-07  3:11 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Mike Kravetz, Andrew Morton, Naoya Horiguchi, Andi Kleen,
	Linux Memory Management List, LKML

On Thu, Jan 7, 2021 at 1:07 AM Michal Hocko <mhocko@suse.com> wrote:
>
> On Wed 06-01-21 16:47:37, Muchun Song wrote:
> > When dissolve_free_huge_page() races with __free_huge_page(), we can
> > do a retry because the race window is small.
>
> Is this a bug fix or a mere optimization? I have a hard time telling from
> the description.

It is an optimization. Thanks.

>
> > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> > ---
> >  mm/hugetlb.c | 26 +++++++++++++++++++++-----
> >  1 file changed, 21 insertions(+), 5 deletions(-)
> >
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> [...]
> > @@ -1825,6 +1828,14 @@ int dissolve_free_huge_page(struct page *page)
> >       }
> >  out:
> >       spin_unlock(&hugetlb_lock);
> > +
> > +     /*
> > +      * If the freeing of the HugeTLB page is put on a work queue, we should
> > +      * flush the work before retrying.
> > +      */
> > +     if (unlikely(rc == -EAGAIN))
> > +             flush_work(&free_hpage_work);
>
> Is it safe to wait for the work to finish from this context?

Yes. It is safe.

>
> --
> Michal Hocko
> SUSE Labs

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [External] Re: [PATCH v2 3/6] mm: hugetlb: fix a race between freeing and dissolving the page
  2021-01-06 16:56   ` Michal Hocko
@ 2021-01-07  5:39       ` Muchun Song
  2021-01-07  5:39       ` Muchun Song
  1 sibling, 0 replies; 75+ messages in thread
From: Muchun Song @ 2021-01-07  5:39 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Mike Kravetz, Andrew Morton, Naoya Horiguchi, Andi Kleen,
	Linux Memory Management List, LKML

On Thu, Jan 7, 2021 at 12:56 AM Michal Hocko <mhocko@suse.com> wrote:
>
> On Wed 06-01-21 16:47:36, Muchun Song wrote:
> > There is a race condition between __free_huge_page()
> > and dissolve_free_huge_page().
> >
> > CPU0:                         CPU1:
> >
> > // page_count(page) == 1
> > put_page(page)
> >   __free_huge_page(page)
> >                               dissolve_free_huge_page(page)
> >                                 spin_lock(&hugetlb_lock)
> >                                 // PageHuge(page) && !page_count(page)
> >                                 update_and_free_page(page)
> >                                 // page is freed to the buddy
> >                                 spin_unlock(&hugetlb_lock)
> >     spin_lock(&hugetlb_lock)
> >     clear_page_huge_active(page)
> >     enqueue_huge_page(page)
> >     // It is wrong, the page is already freed
> >     spin_unlock(&hugetlb_lock)
> >
> > The race window is between put_page() and spin_lock() which
> > is in the __free_huge_page().
>
> The race window really is between put_page and dissolve_free_huge_page.
> And the result is that the put_page path would clobber an unrelated page
> (either free or already reused page) which is quite serious.
> Fortunately pages are dissolved very rarely. I believe that a user would
> need to be privileged to hit this intentionally.
>
> > We should make sure that the page is already on the free list
> > when it is dissolved.
>
> Another option would be to check for PageHuge in __free_huge_page. Have
> you considered that rather than add yet another state? The scope of the
> spinlock would have to be extended. If that sounds more tricky then can
> we check the page->lru in the dissolve path? If the page is still
> PageHuge and reference count 0 then there shouldn't be many options
> where it can be queued, right?

Did you mean that we should iterate over the free list to check whether
the page is on it? If so, I do not think that is a better solution
than introducing another state, because if there are a lot of pages
on the free list, the walk may take some time while holding
hugetlb_lock. Right? Actually, we have some tail page structs
available to store the state. At least they are not in short supply right now.

Thanks.

>
> > Fixes: c8721bbbdd36 ("mm: memory-hotplug: enable memory hotplug to handle hugepage")
> > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> > ---
> >  mm/hugetlb.c | 38 ++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 38 insertions(+)
> >
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index 4741d60f8955..8ff138c17129 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -79,6 +79,21 @@ DEFINE_SPINLOCK(hugetlb_lock);
> >  static int num_fault_mutexes;
> >  struct mutex *hugetlb_fault_mutex_table ____cacheline_aligned_in_smp;
> >
> > +static inline bool PageHugeFreed(struct page *head)
> > +{
> > +     return (unsigned long)head[3].mapping == -1U;
> > +}
> > +
> > +static inline void SetPageHugeFreed(struct page *head)
> > +{
> > +     head[3].mapping = (void *)-1U;
> > +}
> > +
> > +static inline void ClearPageHugeFreed(struct page *head)
> > +{
> > +     head[3].mapping = NULL;
> > +}
> > +
> >  /* Forward declaration */
> >  static int hugetlb_acct_memory(struct hstate *h, long delta);
> >
> > @@ -1028,6 +1043,7 @@ static void enqueue_huge_page(struct hstate *h, struct page *page)
> >       list_move(&page->lru, &h->hugepage_freelists[nid]);
> >       h->free_huge_pages++;
> >       h->free_huge_pages_node[nid]++;
> > +     SetPageHugeFreed(page);
> >  }
> >
> >  static struct page *dequeue_huge_page_node_exact(struct hstate *h, int nid)
> > @@ -1044,6 +1060,7 @@ static struct page *dequeue_huge_page_node_exact(struct hstate *h, int nid)
> >
> >               list_move(&page->lru, &h->hugepage_activelist);
> >               set_page_refcounted(page);
> > +             ClearPageHugeFreed(page);
> >               h->free_huge_pages--;
> >               h->free_huge_pages_node[nid]--;
> >               return page;
> > @@ -1291,6 +1308,17 @@ static inline void destroy_compound_gigantic_page(struct page *page,
> >                                               unsigned int order) { }
> >  #endif
> >
> > +/*
> > + * Because we reuse the mapping field of some tail page structs, we should
> > + * reset those mappings to their initial value before @head is freed to the
> > + * buddy allocator. The invalid value will be checked in free_tail_pages_check().
> > + */
> > +static inline void reset_tail_page_mapping(struct hstate *h, struct page *head)
> > +{
> > +     if (!hstate_is_gigantic(h))
> > +             head[3].mapping = TAIL_MAPPING;
> > +}
> > +
> >  static void update_and_free_page(struct hstate *h, struct page *page)
> >  {
> >       int i;
> > @@ -1298,6 +1326,7 @@ static void update_and_free_page(struct hstate *h, struct page *page)
> >       if (hstate_is_gigantic(h) && !gigantic_page_runtime_supported())
> >               return;
> >
> > +     reset_tail_page_mapping(h, page);
> >       h->nr_huge_pages--;
> >       h->nr_huge_pages_node[page_to_nid(page)]--;
> >       for (i = 0; i < pages_per_huge_page(h); i++) {
> > @@ -1504,6 +1533,7 @@ static void prep_new_huge_page(struct hstate *h, struct page *page, int nid)
> >       spin_lock(&hugetlb_lock);
> >       h->nr_huge_pages++;
> >       h->nr_huge_pages_node[nid]++;
> > +     ClearPageHugeFreed(page);
> >       spin_unlock(&hugetlb_lock);
> >  }
> >
> > @@ -1770,6 +1800,14 @@ int dissolve_free_huge_page(struct page *page)
> >               int nid = page_to_nid(head);
> >               if (h->free_huge_pages - h->resv_huge_pages == 0)
> >                       goto out;
> > +
> > +             /*
> > +              * We should make sure that the page is already on the free list
> > +              * when it is dissolved.
> > +              */
> > +             if (unlikely(!PageHugeFreed(head)))
> > +                     goto out;
> > +
> >               /*
> >                * Move PageHWPoison flag from head page to the raw error page,
> >                * which makes any subpages rather than the error page reusable.
> > --
> > 2.11.0
>
> --
> Michal Hocko
> SUSE Labs

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v2 2/6] mm: hugetlbfs: fix cannot migrate the fallocated HugeTLB page
  2021-01-06 21:07         ` Mike Kravetz
@ 2021-01-07  8:36           ` Michal Hocko
  0 siblings, 0 replies; 75+ messages in thread
From: Michal Hocko @ 2021-01-07  8:36 UTC (permalink / raw)
  To: Mike Kravetz; +Cc: Muchun Song, akpm, n-horiguchi, ak, linux-mm, linux-kernel

On Wed 06-01-21 13:07:40, Mike Kravetz wrote:
> On 1/6/21 12:02 PM, Michal Hocko wrote:
> > On Wed 06-01-21 11:30:25, Mike Kravetz wrote:
> >> On 1/6/21 8:35 AM, Michal Hocko wrote:
> >>> On Wed 06-01-21 16:47:35, Muchun Song wrote:
> >>>> Because we can only isolate an active page via isolate_huge_page()
> >>>> and hugetlbfs_fallocate() forgets to mark it as active, we cannot
> >>>> isolate and migrate those pages.
> >>>
> >>> I had a bit of a hard time understanding this initially and had to dive
> >>> into the code to make sense of it. I would consider the following
> >>> wording easier to grasp. Feel free to reuse if you like.
> >>> "
> >>> If a new hugetlb page is allocated during fallocate it will not be
> >>> marked as active (set_page_huge_active) which will result in a later
> >>> isolate_huge_page failure when the page migration code would like to
> >>> move that page. Such a failure would be unexpected and wrong.
> >>> "
> >>>
> >>> Now to the fix. I believe that this patch shows that the
> >>> set_page_huge_active is just too subtle. Is there any reason why we
> >>> cannot make all freshly allocated huge pages active by default?
> >>
> >> I looked into that yesterday.  The primary issue is in page fault code,
> >> hugetlb_no_page is an example.  If page_huge_active is set, then it can
> >> be isolated for migration.  So, migration could race with the page fault
> >> and the page could be migrated before being added to the page table of
> >> the faulting task.  This was an issue when hugetlb_no_page set_page_huge_active
> >> right after allocating and clearing the huge page.  Commit cb6acd01e2e4
> >> moved the set_page_huge_active after adding the page to the page table
> >> to address this issue.
> > 
> > Thanks for the clarification. I was not aware of this subtlety. The
> > existing comment is not helping much TBH. I am still digesting the
> > suggested race. The page is new and exclusive and not visible via page
> > tables yet, so the only source of the migration would be pfn based
> > (hotplug, poisoning), right?
> 
> That is correct.
> 
> 
> > Btw. s@set_page_huge_active@set_page_huge_migrateable@ would help
> > readability IMHO. With a comment explaining that this _has_ to be called
> > after the page is fully initialized.
> 
> Agree, I will add that as a future enhancement.

Thanks!

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [External] Re: [PATCH v2 4/6] mm: hugetlb: add return -EAGAIN for dissolve_free_huge_page
  2021-01-07  3:11       ` Muchun Song
  (?)
@ 2021-01-07  8:39       ` Michal Hocko
  2021-01-07  9:01           ` Muchun Song
  -1 siblings, 1 reply; 75+ messages in thread
From: Michal Hocko @ 2021-01-07  8:39 UTC (permalink / raw)
  To: Muchun Song
  Cc: Mike Kravetz, Andrew Morton, Naoya Horiguchi, Andi Kleen,
	Linux Memory Management List, LKML

On Thu 07-01-21 11:11:41, Muchun Song wrote:
> On Thu, Jan 7, 2021 at 1:07 AM Michal Hocko <mhocko@suse.com> wrote:
> >
> > On Wed 06-01-21 16:47:37, Muchun Song wrote:
> > > When dissolve_free_huge_page() races with __free_huge_page(), we can
> > > do a retry because the race window is small.
> >
> > Is this a bug fix or a mere optimization? I have a hard time telling from
> > the description.
> 
> It is an optimization. Thanks.
> 
> >
> > > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> > > ---
> > >  mm/hugetlb.c | 26 +++++++++++++++++++++-----
> > >  1 file changed, 21 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > [...]
> > > @@ -1825,6 +1828,14 @@ int dissolve_free_huge_page(struct page *page)
> > >       }
> > >  out:
> > >       spin_unlock(&hugetlb_lock);
> > > +
> > > +     /*
> > > +      * If the freeing of the HugeTLB page is put on a work queue, we should
> > > +      * flush the work before retrying.
> > > +      */
> > > +     if (unlikely(rc == -EAGAIN))
> > > +             flush_work(&free_hpage_work);
> >
> > Is it safe to wait for the work to finish from this context?
> 
> Yes. It is safe.

Please expand on why in the changelog. Same for the optimization
including some numbers showing it really helps.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v2 3/6] mm: hugetlb: fix a race between freeing and dissolving the page
  2021-01-06 20:58     ` Mike Kravetz
  2021-01-07  3:08         ` Muchun Song
@ 2021-01-07  8:40       ` Michal Hocko
  2021-01-08  0:52         ` Mike Kravetz
  1 sibling, 1 reply; 75+ messages in thread
From: Michal Hocko @ 2021-01-07  8:40 UTC (permalink / raw)
  To: Mike Kravetz; +Cc: Muchun Song, akpm, n-horiguchi, ak, linux-mm, linux-kernel

On Wed 06-01-21 12:58:29, Mike Kravetz wrote:
> On 1/6/21 8:56 AM, Michal Hocko wrote:
> > On Wed 06-01-21 16:47:36, Muchun Song wrote:
> >> There is a race condition between __free_huge_page()
> >> and dissolve_free_huge_page().
> >>
> >> CPU0:                         CPU1:
> >>
> >> // page_count(page) == 1
> >> put_page(page)
> >>   __free_huge_page(page)
> >>                               dissolve_free_huge_page(page)
> >>                                 spin_lock(&hugetlb_lock)
> >>                                 // PageHuge(page) && !page_count(page)
> >>                                 update_and_free_page(page)
> >>                                 // page is freed to the buddy
> >>                                 spin_unlock(&hugetlb_lock)
> >>     spin_lock(&hugetlb_lock)
> >>     clear_page_huge_active(page)
> >>     enqueue_huge_page(page)
> >>     // It is wrong, the page is already freed
> >>     spin_unlock(&hugetlb_lock)
> >>
> >> The race window is between put_page() and spin_lock() which
> >> is in the __free_huge_page().
> > 
> > The race window really is between put_page and dissolve_free_huge_page.
> > And the result is that the put_page path would clobber an unrelated page
> > (either free or already reused page) which is quite serious.
> > Fortunately pages are dissolved very rarely. I believe that a user would
> > need to be privileged to hit this intentionally.
> > 
> >> We should make sure that the page is already on the free list
> >> when it is dissolved.
> > 
> > Another option would be to check for PageHuge in __free_huge_page. Have
> > you considered that rather than add yet another state? The scope of the
> > spinlock would have to be extended. If that sounds more tricky then can
> > we check the page->lru in the dissolve path? If the page is still
> > PageHuge and reference count 0 then there shouldn't be many options
> > where it can be queued, right?
> 
> The tricky part with expanding lock scope will be the potential call to
> hugepage_subpool_put_pages as it may also try to acquire the hugetlb_lock.

Can we rearrange the code and move hugepage_subpool_put_pages after all
this is done? Or is there any strong reason for the particular ordering?

> I am not sure what you mean by 'check the page->lru'.  If we knew the page
> was on the free list, then we could dissolve.  But, I do not think there
> is an easy way to determine that from page->lru.  A hugetlb page is either
> going to be on the active list or free list.

Can it be on the active list with ref count = 0?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [External] Re: [PATCH v2 3/6] mm: hugetlb: fix a race between freeing and dissolving the page
  2021-01-07  5:39       ` Muchun Song
  (?)
@ 2021-01-07  8:41       ` Michal Hocko
  2021-01-07  8:53           ` Muchun Song
  -1 siblings, 1 reply; 75+ messages in thread
From: Michal Hocko @ 2021-01-07  8:41 UTC (permalink / raw)
  To: Muchun Song
  Cc: Mike Kravetz, Andrew Morton, Naoya Horiguchi, Andi Kleen,
	Linux Memory Management List, LKML

On Thu 07-01-21 13:39:38, Muchun Song wrote:
> On Thu, Jan 7, 2021 at 12:56 AM Michal Hocko <mhocko@suse.com> wrote:
> >
> > On Wed 06-01-21 16:47:36, Muchun Song wrote:
> > > There is a race condition between __free_huge_page()
> > > and dissolve_free_huge_page().
> > >
> > > CPU0:                         CPU1:
> > >
> > > // page_count(page) == 1
> > > put_page(page)
> > >   __free_huge_page(page)
> > >                               dissolve_free_huge_page(page)
> > >                                 spin_lock(&hugetlb_lock)
> > >                                 // PageHuge(page) && !page_count(page)
> > >                                 update_and_free_page(page)
> > >                                 // page is freed to the buddy
> > >                                 spin_unlock(&hugetlb_lock)
> > >     spin_lock(&hugetlb_lock)
> > >     clear_page_huge_active(page)
> > >     enqueue_huge_page(page)
> > >     // It is wrong, the page is already freed
> > >     spin_unlock(&hugetlb_lock)
> > >
> > > The race windows is between put_page() and spin_lock() which
> > > is in the __free_huge_page().
> >
> > The race window reall is between put_page and dissolve_free_huge_page.
> > And the result is that the put_page path would clobber an unrelated page
> > (either free or already reused page) which is quite serious.
> > Fortunatelly pages are dissolved very rarely. I believe that user would
> > require to be privileged to hit this by intention.
> >
> > > We should make sure that the page is already on the free list
> > > when it is dissolved.
> >
> > Another option would be to check for PageHuge in __free_huge_page. Have
> > you considered that rather than add yet another state? The scope of the
> > spinlock would have to be extended. If that sounds more tricky then can
> > we check the page->lru in the dissolve path? If the page is still
> > PageHuge and reference count 0 then there shouldn't be many options
> > where it can be queued, right?
> 
> Did you mean that we iterate over the free list to check whether
> the page is on the free list?

No, I meant to check that the page is enqueued, which along with ref count
= 0 should mean it has been released to the pool, unless I am missing
something.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [External] Re: [PATCH v2 3/6] mm: hugetlb: fix a race between freeing and dissolving the page
  2021-01-07  8:41       ` Michal Hocko
@ 2021-01-07  8:53           ` Muchun Song
  0 siblings, 0 replies; 75+ messages in thread
From: Muchun Song @ 2021-01-07  8:53 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Mike Kravetz, Andrew Morton, Naoya Horiguchi, Andi Kleen,
	Linux Memory Management List, LKML

On Thu, Jan 7, 2021 at 4:41 PM Michal Hocko <mhocko@suse.com> wrote:
>
> On Thu 07-01-21 13:39:38, Muchun Song wrote:
> > On Thu, Jan 7, 2021 at 12:56 AM Michal Hocko <mhocko@suse.com> wrote:
> > >
> > > On Wed 06-01-21 16:47:36, Muchun Song wrote:
> > > > There is a race condition between __free_huge_page()
> > > > and dissolve_free_huge_page().
> > > >
> > > > CPU0:                         CPU1:
> > > >
> > > > // page_count(page) == 1
> > > > put_page(page)
> > > >   __free_huge_page(page)
> > > >                               dissolve_free_huge_page(page)
> > > >                                 spin_lock(&hugetlb_lock)
> > > >                                 // PageHuge(page) && !page_count(page)
> > > >                                 update_and_free_page(page)
> > > >                                 // page is freed to the buddy
> > > >                                 spin_unlock(&hugetlb_lock)
> > > >     spin_lock(&hugetlb_lock)
> > > >     clear_page_huge_active(page)
> > > >     enqueue_huge_page(page)
> > > >     // It is wrong, the page is already freed
> > > >     spin_unlock(&hugetlb_lock)
> > > >
> > > > The race windows is between put_page() and spin_lock() which
> > > > is in the __free_huge_page().
> > >
> > > The race window reall is between put_page and dissolve_free_huge_page.
> > > And the result is that the put_page path would clobber an unrelated page
> > > (either free or already reused page) which is quite serious.
> > > Fortunatelly pages are dissolved very rarely. I believe that user would
> > > require to be privileged to hit this by intention.
> > >
> > > > We should make sure that the page is already on the free list
> > > > when it is dissolved.
> > >
> > > Another option would be to check for PageHuge in __free_huge_page. Have
> > > you considered that rather than add yet another state? The scope of the
> > > spinlock would have to be extended. If that sounds more tricky then can
> > > we check the page->lru in the dissolve path? If the page is still
> > > PageHuge and reference count 0 then there shouldn't be many options
> > > where it can be queued, right?
> >
> > Did you mean that we iterate over the free list to check whether
> > the page is on the free list?
>
> No I meant to check that the page is enqueued which along with ref count
> = 0 should mean it has been released to the pool unless I am missing
> something.

When it is freed to the pool, the page can be on the free list, on the
active list, or on neither (its ->lru may be empty). How can we check
whether it is on the free list?

> --
> Michal Hocko
> SUSE Labs

^ permalink raw reply	[flat|nested] 75+ messages in thread


* Re: [External] Re: [PATCH v2 4/6] mm: hugetlb: add return -EAGAIN for dissolve_free_huge_page
  2021-01-07  8:39       ` Michal Hocko
@ 2021-01-07  9:01           ` Muchun Song
  0 siblings, 0 replies; 75+ messages in thread
From: Muchun Song @ 2021-01-07  9:01 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Mike Kravetz, Andrew Morton, Naoya Horiguchi, Andi Kleen,
	Linux Memory Management List, LKML

On Thu, Jan 7, 2021 at 4:39 PM Michal Hocko <mhocko@suse.com> wrote:
>
> On Thu 07-01-21 11:11:41, Muchun Song wrote:
> > On Thu, Jan 7, 2021 at 1:07 AM Michal Hocko <mhocko@suse.com> wrote:
> > >
> > > On Wed 06-01-21 16:47:37, Muchun Song wrote:
> > > > When dissolve_free_huge_page() races with __free_huge_page(), we can
> > > > do a retry. Because the race window is small.
> > >
> > > Is this a bug fix or mere optimization. I have hard time to tell from
> > > the description.
> >
> > It is optimization. Thanks.
> >
> > >
> > > > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> > > > ---
> > > >  mm/hugetlb.c | 26 +++++++++++++++++++++-----
> > > >  1 file changed, 21 insertions(+), 5 deletions(-)
> > > >
> > > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > > [...]
> > > > @@ -1825,6 +1828,14 @@ int dissolve_free_huge_page(struct page *page)
> > > >       }
> > > >  out:
> > > >       spin_unlock(&hugetlb_lock);
> > > > +
> > > > +     /*
> > > > +      * If the freeing of the HugeTLB page is put on a work queue, we should
> > > > +      * flush the work before retrying.
> > > > +      */
> > > > +     if (unlikely(rc == -EAGAIN))
> > > > +             flush_work(&free_hpage_work);
> > >
> > > Is it safe to wait for the work to finish from this context?
> >
> > Yes. It is safe.
>
> Please expand on why in the changelog. Same for the optimization
> including some numbers showing it really helps.

OK, the changelog should be updated. Do you agree the race window
is quite small? If so, why is it not an optimization? Don't we want
dissolve_free_huge_page() to succeed as often as possible when we
call it? I am confused about what numbers to show, because this is
not a performance optimization but an increase in the success rate
of dissolving.

Thanks.

>
> --
> Michal Hocko
> SUSE Labs

^ permalink raw reply	[flat|nested] 75+ messages in thread


* Re: [PATCH v2 0/6] Fix some bugs about HugeTLB code
  2021-01-06  8:47 [PATCH v2 0/6] Fix some bugs about HugeTLB code Muchun Song
                   ` (5 preceding siblings ...)
  2021-01-06  8:47 ` [PATCH v2 6/6] mm: hugetlb: remove VM_BUG_ON_PAGE from page_huge_active Muchun Song
@ 2021-01-07  9:30 ` David Hildenbrand
  2021-01-07  9:40     ` Muchun Song
  6 siblings, 1 reply; 75+ messages in thread
From: David Hildenbrand @ 2021-01-07  9:30 UTC (permalink / raw)
  To: Muchun Song, mike.kravetz, akpm
  Cc: n-horiguchi, ak, mhocko, linux-mm, linux-kernel

On 06.01.21 09:47, Muchun Song wrote:
> This patch series aims to fix some bugs and add some improvements.
> 
> Changelog since v1 -> v2:
>   - Export set_page_huge_active() in patch #2 to fix.
>   - Using head[3].mapping to indicate the page is freed in patch #3.
>   - Flush @free_hpage_work in patch #4.
> 
> Muchun Song (6):
>   mm: migrate: do not migrate HugeTLB page whose refcount is one
>   mm: hugetlbfs: fix cannot migrate the fallocated HugeTLB page
>   mm: hugetlb: fix a race between freeing and dissolving the page
>   mm: hugetlb: add return -EAGAIN for dissolve_free_huge_page
>   mm: hugetlb: fix a race between isolating and freeing page
>   mm: hugetlb: remove VM_BUG_ON_PAGE from page_huge_active
> 
>  fs/hugetlbfs/inode.c    |  3 ++-
>  include/linux/hugetlb.h |  2 ++
>  mm/hugetlb.c            | 69 +++++++++++++++++++++++++++++++++++++++++++------
>  mm/migrate.c            |  6 +++++
>  4 files changed, 71 insertions(+), 9 deletions(-)
> 

Repeating my question regarding ccing stable on selected fixes.

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [External] Re: [PATCH v2 0/6] Fix some bugs about HugeTLB code
  2021-01-07  9:30 ` [PATCH v2 0/6] Fix some bugs about HugeTLB code David Hildenbrand
@ 2021-01-07  9:40     ` Muchun Song
  0 siblings, 0 replies; 75+ messages in thread
From: Muchun Song @ 2021-01-07  9:40 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Mike Kravetz, Andrew Morton, Naoya Horiguchi, Andi Kleen, mhocko,
	Linux Memory Management List, LKML

On Thu, Jan 7, 2021 at 5:30 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 06.01.21 09:47, Muchun Song wrote:
> > This patch series aims to fix some bugs and add some improvements.
> >
> > Changelog since v1 -> v2:
> >   - Export set_page_huge_active() in patch #2 to fix.
> >   - Using head[3].mapping to indicate the page is freed in patch #3.
> >   - Flush @free_hpage_work in patch #4.
> >
> > Muchun Song (6):
> >   mm: migrate: do not migrate HugeTLB page whose refcount is one
> >   mm: hugetlbfs: fix cannot migrate the fallocated HugeTLB page
> >   mm: hugetlb: fix a race between freeing and dissolving the page
> >   mm: hugetlb: add return -EAGAIN for dissolve_free_huge_page
> >   mm: hugetlb: fix a race between isolating and freeing page
> >   mm: hugetlb: remove VM_BUG_ON_PAGE from page_huge_active
> >
> >  fs/hugetlbfs/inode.c    |  3 ++-
> >  include/linux/hugetlb.h |  2 ++
> >  mm/hugetlb.c            | 69 +++++++++++++++++++++++++++++++++++++++++++------
> >  mm/migrate.c            |  6 +++++
> >  4 files changed, 71 insertions(+), 9 deletions(-)
> >
>
> Repeating my question regarding ccing stable on selected fixes.
>

Do I just add a Cc tag in the commit log of the fix patches?
Sorry, I'm new to this process. Thanks.

> --
> Thanks,
>
> David / dhildenb
>

^ permalink raw reply	[flat|nested] 75+ messages in thread


* Re: [External] Re: [PATCH v2 0/6] Fix some bugs about HugeTLB code
  2021-01-07  9:40     ` Muchun Song
  (?)
@ 2021-01-07 10:10     ` David Hildenbrand
  2021-01-07 10:16         ` Muchun Song
  -1 siblings, 1 reply; 75+ messages in thread
From: David Hildenbrand @ 2021-01-07 10:10 UTC (permalink / raw)
  To: Muchun Song
  Cc: Mike Kravetz, Andrew Morton, Naoya Horiguchi, Andi Kleen, mhocko,
	Linux Memory Management List, LKML

On 07.01.21 10:40, Muchun Song wrote:
> On Thu, Jan 7, 2021 at 5:30 PM David Hildenbrand <david@redhat.com> wrote:
>>
>> On 06.01.21 09:47, Muchun Song wrote:
>>> This patch series aims to fix some bugs and add some improvements.
>>>
>>> Changelog since v1 -> v2:
>>>   - Export set_page_huge_active() in patch #2 to fix.
>>>   - Using head[3].mapping to indicate the page is freed in patch #3.
>>>   - Flush @free_hpage_work in patch #4.
>>>
>>> Muchun Song (6):
>>>   mm: migrate: do not migrate HugeTLB page whose refcount is one
>>>   mm: hugetlbfs: fix cannot migrate the fallocated HugeTLB page
>>>   mm: hugetlb: fix a race between freeing and dissolving the page
>>>   mm: hugetlb: add return -EAGAIN for dissolve_free_huge_page
>>>   mm: hugetlb: fix a race between isolating and freeing page
>>>   mm: hugetlb: remove VM_BUG_ON_PAGE from page_huge_active
>>>
>>>  fs/hugetlbfs/inode.c    |  3 ++-
>>>  include/linux/hugetlb.h |  2 ++
>>>  mm/hugetlb.c            | 69 +++++++++++++++++++++++++++++++++++++++++++------
>>>  mm/migrate.c            |  6 +++++
>>>  4 files changed, 71 insertions(+), 9 deletions(-)
>>>
>>
>> Repeating my question regarding ccing stable on selected fixes.
>>
> 
> Just add a CC tag in the commit log of the fix patches? Right?
> Sorry, I'm a novice about this. Thanks.

Sure, here is some information:

https://www.kernel.org/doc/html/v4.10/process/stable-kernel-rules.html

Applicable patches should be moved to the beginning of the series.

Add "Cc: stable@vger.kernel.org" similar to "Fixes:" to the respective
patches. In the ideal case, indicate the applicable earliest stable
release where it applies.

E.g., take a look at (random commit)

commit 20b329129009caf1c646152abe09b697227e1c37
Author: Bob Peterson <rpeterso@redhat.com>
Date:   Wed Nov 18 08:54:31 2020 -0500

    gfs2: Fix regression in freeze_go_sync
...
    Fixes: 541656d3a513 ("gfs2: freeze should work on read-only mounts")
    Cc: stable@vger.kernel.org # v5.8+
...


Consequently, actually cc when sending out these patches (e.g., let "git
send-email" do it automatically).

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [External] Re: [PATCH v2 0/6] Fix some bugs about HugeTLB code
  2021-01-07 10:10     ` David Hildenbrand
@ 2021-01-07 10:16         ` Muchun Song
  0 siblings, 0 replies; 75+ messages in thread
From: Muchun Song @ 2021-01-07 10:16 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Mike Kravetz, Andrew Morton, Naoya Horiguchi, Andi Kleen, mhocko,
	Linux Memory Management List, LKML

On Thu, Jan 7, 2021 at 6:10 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 07.01.21 10:40, Muchun Song wrote:
> > On Thu, Jan 7, 2021 at 5:30 PM David Hildenbrand <david@redhat.com> wrote:
> >>
> >> On 06.01.21 09:47, Muchun Song wrote:
> >>> This patch series aims to fix some bugs and add some improvements.
> >>>
> >>> Changelog since v1 -> v2:
> >>>   - Export set_page_huge_active() in patch #2 to fix.
> >>>   - Using head[3].mapping to indicate the page is freed in patch #3.
> >>>   - Flush @free_hpage_work in patch #4.
> >>>
> >>> Muchun Song (6):
> >>>   mm: migrate: do not migrate HugeTLB page whose refcount is one
> >>>   mm: hugetlbfs: fix cannot migrate the fallocated HugeTLB page
> >>>   mm: hugetlb: fix a race between freeing and dissolving the page
> >>>   mm: hugetlb: add return -EAGAIN for dissolve_free_huge_page
> >>>   mm: hugetlb: fix a race between isolating and freeing page
> >>>   mm: hugetlb: remove VM_BUG_ON_PAGE from page_huge_active
> >>>
> >>>  fs/hugetlbfs/inode.c    |  3 ++-
> >>>  include/linux/hugetlb.h |  2 ++
> >>>  mm/hugetlb.c            | 69 +++++++++++++++++++++++++++++++++++++++++++------
> >>>  mm/migrate.c            |  6 +++++
> >>>  4 files changed, 71 insertions(+), 9 deletions(-)
> >>>
> >>
> >> Repeating my question regarding ccing stable on selected fixes.
> >>
> >
> > Just add a CC tag in the commit log of the fix patches? Right?
> > Sorry, I'm a novice about this. Thanks.
>
> Sure, here is some information:
>
> https://www.kernel.org/doc/html/v4.10/process/stable-kernel-rules.html
>
> Applicable patches should be moved to the beginning of the series.
>
> Add "Cc: stable@vger.kernel.org" similar to "Fixes:" to the respective
> patches. In the ideal case, indicate the applicable earliest stable
> release where it applies.
>
> E.g., take a look at (random commit)
>
> commit 20b329129009caf1c646152abe09b697227e1c37
> Author: Bob Peterson <rpeterso@redhat.com>
> Date:   Wed Nov 18 08:54:31 2020 -0500
>
>     gfs2: Fix regression in freeze_go_sync
> ...
>     Fixes: 541656d3a513 ("gfs2: freeze should work on read-only mounts")
>     Cc: stable@vger.kernel.org # v5.8+
> ...
>
>
> Consequently, actually cc when sending out these patches (e.g., let "git
> send-email" do it automatically).

Great! Many thanks.

>
> --
> Thanks,
>
> David / dhildenb
>

^ permalink raw reply	[flat|nested] 75+ messages in thread


* Re: [External] Re: [PATCH v2 1/6] mm: migrate: do not migrate HugeTLB page whose refcount is one
  2021-01-07  2:52       ` Muchun Song
  (?)
@ 2021-01-07 11:16       ` Michal Hocko
  2021-01-07 11:24           ` Muchun Song
  -1 siblings, 1 reply; 75+ messages in thread
From: Michal Hocko @ 2021-01-07 11:16 UTC (permalink / raw)
  To: Muchun Song
  Cc: Mike Kravetz, Andrew Morton, Naoya Horiguchi, Andi Kleen,
	Linux Memory Management List, LKML, Yang Shi

On Thu 07-01-21 10:52:21, Muchun Song wrote:
> On Thu, Jan 7, 2021 at 12:13 AM Michal Hocko <mhocko@suse.com> wrote:
> >
> > On Wed 06-01-21 16:47:34, Muchun Song wrote:
> > > If the refcount is one when it is migrated, it means that the page
> > > was freed from under us. So we are done and do not need to migrate.
> >
> > Is this common enough that it would warrant the explicit check for each
> > migration?
> 
> Are you worried about the overhead caused by the check? Thanks.

I am not so much worried as I would like to see some actual justification
for this optimization.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [External] Re: [PATCH v2 3/6] mm: hugetlb: fix a race between freeing and dissolving the page
  2021-01-07  8:53           ` Muchun Song
  (?)
@ 2021-01-07 11:18           ` Michal Hocko
  2021-01-07 11:38               ` Muchun Song
  -1 siblings, 1 reply; 75+ messages in thread
From: Michal Hocko @ 2021-01-07 11:18 UTC (permalink / raw)
  To: Muchun Song
  Cc: Mike Kravetz, Andrew Morton, Naoya Horiguchi, Andi Kleen,
	Linux Memory Management List, LKML

On Thu 07-01-21 16:53:13, Muchun Song wrote:
> On Thu, Jan 7, 2021 at 4:41 PM Michal Hocko <mhocko@suse.com> wrote:
> >
> > On Thu 07-01-21 13:39:38, Muchun Song wrote:
> > > On Thu, Jan 7, 2021 at 12:56 AM Michal Hocko <mhocko@suse.com> wrote:
> > > >
> > > > On Wed 06-01-21 16:47:36, Muchun Song wrote:
> > > > > There is a race condition between __free_huge_page()
> > > > > and dissolve_free_huge_page().
> > > > >
> > > > > CPU0:                         CPU1:
> > > > >
> > > > > // page_count(page) == 1
> > > > > put_page(page)
> > > > >   __free_huge_page(page)
> > > > >                               dissolve_free_huge_page(page)
> > > > >                                 spin_lock(&hugetlb_lock)
> > > > >                                 // PageHuge(page) && !page_count(page)
> > > > >                                 update_and_free_page(page)
> > > > >                                 // page is freed to the buddy
> > > > >                                 spin_unlock(&hugetlb_lock)
> > > > >     spin_lock(&hugetlb_lock)
> > > > >     clear_page_huge_active(page)
> > > > >     enqueue_huge_page(page)
> > > > >     // It is wrong, the page is already freed
> > > > >     spin_unlock(&hugetlb_lock)
> > > > >
> > > > > The race windows is between put_page() and spin_lock() which
> > > > > is in the __free_huge_page().
> > > >
> > > > The race window reall is between put_page and dissolve_free_huge_page.
> > > > And the result is that the put_page path would clobber an unrelated page
> > > > (either free or already reused page) which is quite serious.
> > > > Fortunatelly pages are dissolved very rarely. I believe that user would
> > > > require to be privileged to hit this by intention.
> > > >
> > > > > We should make sure that the page is already on the free list
> > > > > when it is dissolved.
> > > >
> > > > Another option would be to check for PageHuge in __free_huge_page. Have
> > > > you considered that rather than add yet another state? The scope of the
> > > > spinlock would have to be extended. If that sounds more tricky then can
> > > > we check the page->lru in the dissolve path? If the page is still
> > > > PageHuge and reference count 0 then there shouldn't be many options
> > > > where it can be queued, right?
> > >
> > > Did you mean that we iterate over the free list to check whether
> > > the page is on the free list?
> >
> > No I meant to check that the page is enqueued which along with ref count
> > = 0 should mean it has been released to the pool unless I am missing
> > something.
> 
> The page can be on the free list or active list or empty when it
> is freed to the pool. How to check whether it is on the free list?

As I've said, I might be missing something here. But if the page is
freed, why does it matter whether it is on an active list or the free
list from the dissolve operation's POV?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [External] Re: [PATCH v2 4/6] mm: hugetlb: add return -EAGAIN for dissolve_free_huge_page
  2021-01-07  9:01           ` Muchun Song
  (?)
@ 2021-01-07 11:22           ` Michal Hocko
  -1 siblings, 0 replies; 75+ messages in thread
From: Michal Hocko @ 2021-01-07 11:22 UTC (permalink / raw)
  To: Muchun Song
  Cc: Mike Kravetz, Andrew Morton, Naoya Horiguchi, Andi Kleen,
	Linux Memory Management List, LKML

On Thu 07-01-21 17:01:16, Muchun Song wrote:
> On Thu, Jan 7, 2021 at 4:39 PM Michal Hocko <mhocko@suse.com> wrote:
> >
> > On Thu 07-01-21 11:11:41, Muchun Song wrote:
> > > On Thu, Jan 7, 2021 at 1:07 AM Michal Hocko <mhocko@suse.com> wrote:
> > > >
> > > > On Wed 06-01-21 16:47:37, Muchun Song wrote:
> > > > > When dissolve_free_huge_page() races with __free_huge_page(), we can
> > > > > do a retry. Because the race window is small.
> > > >
> > > > Is this a bug fix or mere optimization. I have hard time to tell from
> > > > the description.
> > >
> > > It is optimization. Thanks.
> > >
> > > >
> > > > > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> > > > > ---
> > > > >  mm/hugetlb.c | 26 +++++++++++++++++++++-----
> > > > >  1 file changed, 21 insertions(+), 5 deletions(-)
> > > > >
> > > > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > > > [...]
> > > > > @@ -1825,6 +1828,14 @@ int dissolve_free_huge_page(struct page *page)
> > > > >       }
> > > > >  out:
> > > > >       spin_unlock(&hugetlb_lock);
> > > > > +
> > > > > +     /*
> > > > > +      * If the freeing of the HugeTLB page is put on a work queue, we should
> > > > > +      * flush the work before retrying.
> > > > > +      */
> > > > > +     if (unlikely(rc == -EAGAIN))
> > > > > +             flush_work(&free_hpage_work);
> > > >
> > > > Is it safe to wait for the work to finish from this context?
> > >
> > > Yes. It is safe.
> >
> > Please expand on why in the changelog. Same for the optimization
> > including some numbers showing it really helps.
> 
> OK. Changelog should be updated. Do you agree the race window
> is quite small?

Yes, the race is very rare and the window itself should be relatively
small. This doesn't really answer the question about blocking though.

> If so, why is it not an optimization? Don’t we dissolve
> the page as successfully as possible when we call
> dissolve_free_huge_page()? I am confused about what numbers to show,
> because this is not a performance optimization but an increase in
> the success rate of dissolving.

And it is a very theoretical one, right? Can you even trigger it? What
would happen if the race is lost? Is it serious? This all would be part
of the changelog ideally. This is a tricky area that spans hugetlb,
memory hotplug and hwpoisoning. All of them tricky. 
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [External] Re: [PATCH v2 1/6] mm: migrate: do not migrate HugeTLB page whose refcount is one
  2021-01-07 11:16       ` Michal Hocko
@ 2021-01-07 11:24           ` Muchun Song
  0 siblings, 0 replies; 75+ messages in thread
From: Muchun Song @ 2021-01-07 11:24 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Mike Kravetz, Andrew Morton, Naoya Horiguchi, Andi Kleen,
	Linux Memory Management List, LKML, Yang Shi

On Thu, Jan 7, 2021 at 7:16 PM Michal Hocko <mhocko@suse.com> wrote:
>
> On Thu 07-01-21 10:52:21, Muchun Song wrote:
> > On Thu, Jan 7, 2021 at 12:13 AM Michal Hocko <mhocko@suse.com> wrote:
> > >
> > > On Wed 06-01-21 16:47:34, Muchun Song wrote:
> > > > If the refcount is one when it is migrated, it means that the page
> > > > was freed from under us. So we are done and do not need to migrate.
> > >
> > > Is this common enough that it would warrant the explicit check for each
> > > migration?
> >
> > Are you worried about the overhead caused by the check? Thanks.
>
> I am not as worried as I would like to see some actual justification for
> this optimization.

OK. The HugeTLB page can be freed after it was isolated but before
being migrated. Right? If we catch this case, it is an optimization.
I do this just like unmap_and_move() does for a base page.

> --
> Michal Hocko
> SUSE Labs

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [External] Re: [PATCH v2 3/6] mm: hugetlb: fix a race between freeing and dissolving the page
  2021-01-07 11:18           ` Michal Hocko
@ 2021-01-07 11:38               ` Muchun Song
  0 siblings, 0 replies; 75+ messages in thread
From: Muchun Song @ 2021-01-07 11:38 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Mike Kravetz, Andrew Morton, Naoya Horiguchi, Andi Kleen,
	Linux Memory Management List, LKML

On Thu, Jan 7, 2021 at 7:18 PM Michal Hocko <mhocko@suse.com> wrote:
>
> On Thu 07-01-21 16:53:13, Muchun Song wrote:
> > On Thu, Jan 7, 2021 at 4:41 PM Michal Hocko <mhocko@suse.com> wrote:
> > >
> > > On Thu 07-01-21 13:39:38, Muchun Song wrote:
> > > > On Thu, Jan 7, 2021 at 12:56 AM Michal Hocko <mhocko@suse.com> wrote:
> > > > >
> > > > > On Wed 06-01-21 16:47:36, Muchun Song wrote:
> > > > > > There is a race condition between __free_huge_page()
> > > > > > and dissolve_free_huge_page().
> > > > > >
> > > > > > CPU0:                         CPU1:
> > > > > >
> > > > > > // page_count(page) == 1
> > > > > > put_page(page)
> > > > > >   __free_huge_page(page)
> > > > > >                               dissolve_free_huge_page(page)
> > > > > >                                 spin_lock(&hugetlb_lock)
> > > > > >                                 // PageHuge(page) && !page_count(page)
> > > > > >                                 update_and_free_page(page)
> > > > > >                                 // page is freed to the buddy
> > > > > >                                 spin_unlock(&hugetlb_lock)
> > > > > >     spin_lock(&hugetlb_lock)
> > > > > >     clear_page_huge_active(page)
> > > > > >     enqueue_huge_page(page)
> > > > > >     // It is wrong, the page is already freed
> > > > > >     spin_unlock(&hugetlb_lock)
> > > > > >
> > > > > > The race window is between put_page() and spin_lock() which
> > > > > > is in __free_huge_page().
> > > > >
> > > > > The race window really is between put_page and dissolve_free_huge_page.
> > > > > And the result is that the put_page path would clobber an unrelated page
> > > > > (either free or already reused page) which is quite serious.
> > > > > Fortunately pages are dissolved very rarely. I believe that a user would
> > > > > need to be privileged to hit this intentionally.
> > > > >
> > > > > > We should make sure that the page is already on the free list
> > > > > > when it is dissolved.
> > > > >
> > > > > Another option would be to check for PageHuge in __free_huge_page. Have
> > > > > you considered that rather than add yet another state? The scope of the
> > > > > spinlock would have to be extended. If that sounds more tricky then can
> > > > > we check the page->lru in the dissolve path? If the page is still
> > > > > PageHuge and reference count 0 then there shouldn't be many options
> > > > > where it can be queued, right?
> > > >
> > > > Did you mean that we iterate over the free list to check whether
> > > > the page is on the free list?
> > >
> > > No I meant to check that the page is enqueued which along with ref count
> > > = 0 should mean it has been released to the pool unless I am missing
> > > something.
> >
> > The page can be on the free list or active list or empty when it
> > is freed to the pool. How to check whether it is on the free list?
>
> As I've said, I might be missing something here. But if the page is
> freed why does it matter whether it is on a active list or free list
> from the dissolve operation POV?

As you said "check the page->lru". I have a question.
How to check the page->lru in the dissolve path?

BTW, dissolve_free_huge_page aims to free the page
to buddy allocator. put_page (for HugeTLB page) aims
to free the page to the hugepage pool.

> --
> Michal Hocko
> SUSE Labs

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [External] Re: [PATCH v2 1/6] mm: migrate: do not migrate HugeTLB page whose refcount is one
  2021-01-07 11:24           ` Muchun Song
  (?)
@ 2021-01-07 11:48           ` Michal Hocko
  -1 siblings, 0 replies; 75+ messages in thread
From: Michal Hocko @ 2021-01-07 11:48 UTC (permalink / raw)
  To: Muchun Song
  Cc: Mike Kravetz, Andrew Morton, Naoya Horiguchi, Andi Kleen,
	Linux Memory Management List, LKML, Yang Shi

On Thu 07-01-21 19:24:59, Muchun Song wrote:
> On Thu, Jan 7, 2021 at 7:16 PM Michal Hocko <mhocko@suse.com> wrote:
> >
> > On Thu 07-01-21 10:52:21, Muchun Song wrote:
> > > On Thu, Jan 7, 2021 at 12:13 AM Michal Hocko <mhocko@suse.com> wrote:
> > > >
> > > > On Wed 06-01-21 16:47:34, Muchun Song wrote:
> > > > > If the refcount is one when it is migrated, it means that the page
> > > > > was freed from under us. So we are done and do not need to migrate.
> > > >
> > > > Is this common enough that it would warrant the explicit check for each
> > > > migration?
> > >
> > > Are you worried about the overhead caused by the check? Thanks.
> >
> > I am not as worried as I would like to see some actual justification for
> > this optimization.
> 
> OK. The HugeTLB page can be freed after it was isolated but before
> being migrated. Right? If we catch this case, it is an optimization.
> I do this just like unmap_and_move() does for a base page.

If your only justification is that this path should be consistent with the
regular-page path, then make that explicit in the changelog. Please think of
people reading this code in the future, scratching their heads over whether
this is something that can be altered, and failing to find the reasoning.
Was this a big optimization, such that some workload might suffer without
it? Can the race happen in the first place, and how likely is it?

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [External] Re: [PATCH v2 3/6] mm: hugetlb: fix a race between freeing and dissolving the page
  2021-01-07 11:38               ` Muchun Song
  (?)
@ 2021-01-07 12:38               ` Michal Hocko
  2021-01-07 12:59                   ` Muchun Song
  -1 siblings, 1 reply; 75+ messages in thread
From: Michal Hocko @ 2021-01-07 12:38 UTC (permalink / raw)
  To: Muchun Song
  Cc: Mike Kravetz, Andrew Morton, Naoya Horiguchi, Andi Kleen,
	Linux Memory Management List, LKML

On Thu 07-01-21 19:38:00, Muchun Song wrote:
> On Thu, Jan 7, 2021 at 7:18 PM Michal Hocko <mhocko@suse.com> wrote:
> >
> > On Thu 07-01-21 16:53:13, Muchun Song wrote:
> > > On Thu, Jan 7, 2021 at 4:41 PM Michal Hocko <mhocko@suse.com> wrote:
> > > >
> > > > On Thu 07-01-21 13:39:38, Muchun Song wrote:
> > > > > On Thu, Jan 7, 2021 at 12:56 AM Michal Hocko <mhocko@suse.com> wrote:
> > > > > >
> > > > > > On Wed 06-01-21 16:47:36, Muchun Song wrote:
> > > > > > > There is a race condition between __free_huge_page()
> > > > > > > and dissolve_free_huge_page().
> > > > > > >
> > > > > > > CPU0:                         CPU1:
> > > > > > >
> > > > > > > // page_count(page) == 1
> > > > > > > put_page(page)
> > > > > > >   __free_huge_page(page)
> > > > > > >                               dissolve_free_huge_page(page)
> > > > > > >                                 spin_lock(&hugetlb_lock)
> > > > > > >                                 // PageHuge(page) && !page_count(page)
> > > > > > >                                 update_and_free_page(page)
> > > > > > >                                 // page is freed to the buddy
> > > > > > >                                 spin_unlock(&hugetlb_lock)
> > > > > > >     spin_lock(&hugetlb_lock)
> > > > > > >     clear_page_huge_active(page)
> > > > > > >     enqueue_huge_page(page)
> > > > > > >     // It is wrong, the page is already freed
> > > > > > >     spin_unlock(&hugetlb_lock)
> > > > > > >
> > > > > > > The race window is between put_page() and spin_lock() which
> > > > > > > is in __free_huge_page().
> > > > > >
> > > > > > The race window really is between put_page and dissolve_free_huge_page.
> > > > > > And the result is that the put_page path would clobber an unrelated page
> > > > > > (either free or already reused page) which is quite serious.
> > > > > > Fortunately pages are dissolved very rarely. I believe that a user would
> > > > > > need to be privileged to hit this intentionally.
> > > > > >
> > > > > > > We should make sure that the page is already on the free list
> > > > > > > when it is dissolved.
> > > > > >
> > > > > > Another option would be to check for PageHuge in __free_huge_page. Have
> > > > > > you considered that rather than add yet another state? The scope of the
> > > > > > spinlock would have to be extended. If that sounds more tricky then can
> > > > > > we check the page->lru in the dissolve path? If the page is still
> > > > > > PageHuge and reference count 0 then there shouldn't be many options
> > > > > > where it can be queued, right?
> > > > >
> > > > > Did you mean that we iterate over the free list to check whether
> > > > > the page is on the free list?
> > > >
> > > > No I meant to check that the page is enqueued which along with ref count
> > > > = 0 should mean it has been released to the pool unless I am missing
> > > > something.
> > >
> > > The page can be on the free list or active list or empty when it
> > > is freed to the pool. How to check whether it is on the free list?
> >
> > As I've said, I might be missing something here. But if the page is
> > freed why does it matter whether it is on a active list or free list
> > from the dissolve operation POV?
> 
> As you said "check the page->lru". I have a question.
> How to check the page->lru in the dissolve path?

list_empty?
 
> BTW, dissolve_free_huge_page aims to free the page
> to buddy allocator. put_page (for HugeTLB page) aims
> to free the page to the hugepage pool.

Right. Can we simply back off in the dissolving path when ref count is
0 && PageHuge() if list_empty(page->lru)? Is there any other scenario
when all of the above is true and the page is not being freed?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [External] Re: [PATCH v2 3/6] mm: hugetlb: fix a race between freeing and dissolving the page
  2021-01-07 12:38               ` Michal Hocko
@ 2021-01-07 12:59                   ` Muchun Song
  0 siblings, 0 replies; 75+ messages in thread
From: Muchun Song @ 2021-01-07 12:59 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Mike Kravetz, Andrew Morton, Naoya Horiguchi, Andi Kleen,
	Linux Memory Management List, LKML

On Thu, Jan 7, 2021 at 8:38 PM Michal Hocko <mhocko@suse.com> wrote:
>
> On Thu 07-01-21 19:38:00, Muchun Song wrote:
> > On Thu, Jan 7, 2021 at 7:18 PM Michal Hocko <mhocko@suse.com> wrote:
> > >
> > > On Thu 07-01-21 16:53:13, Muchun Song wrote:
> > > > On Thu, Jan 7, 2021 at 4:41 PM Michal Hocko <mhocko@suse.com> wrote:
> > > > >
> > > > > On Thu 07-01-21 13:39:38, Muchun Song wrote:
> > > > > > On Thu, Jan 7, 2021 at 12:56 AM Michal Hocko <mhocko@suse.com> wrote:
> > > > > > >
> > > > > > > On Wed 06-01-21 16:47:36, Muchun Song wrote:
> > > > > > > > There is a race condition between __free_huge_page()
> > > > > > > > and dissolve_free_huge_page().
> > > > > > > >
> > > > > > > > CPU0:                         CPU1:
> > > > > > > >
> > > > > > > > // page_count(page) == 1
> > > > > > > > put_page(page)
> > > > > > > >   __free_huge_page(page)
> > > > > > > >                               dissolve_free_huge_page(page)
> > > > > > > >                                 spin_lock(&hugetlb_lock)
> > > > > > > >                                 // PageHuge(page) && !page_count(page)
> > > > > > > >                                 update_and_free_page(page)
> > > > > > > >                                 // page is freed to the buddy
> > > > > > > >                                 spin_unlock(&hugetlb_lock)
> > > > > > > >     spin_lock(&hugetlb_lock)
> > > > > > > >     clear_page_huge_active(page)
> > > > > > > >     enqueue_huge_page(page)
> > > > > > > >     // It is wrong, the page is already freed
> > > > > > > >     spin_unlock(&hugetlb_lock)
> > > > > > > >
> > > > > > > > The race window is between put_page() and spin_lock() which
> > > > > > > > is in __free_huge_page().
> > > > > > >
> > > > > > > The race window really is between put_page and dissolve_free_huge_page.
> > > > > > > And the result is that the put_page path would clobber an unrelated page
> > > > > > > (either free or already reused page) which is quite serious.
> > > > > > > Fortunately pages are dissolved very rarely. I believe that a user would
> > > > > > > need to be privileged to hit this intentionally.
> > > > > > >
> > > > > > > > We should make sure that the page is already on the free list
> > > > > > > > when it is dissolved.
> > > > > > >
> > > > > > > Another option would be to check for PageHuge in __free_huge_page. Have
> > > > > > > you considered that rather than add yet another state? The scope of the
> > > > > > > spinlock would have to be extended. If that sounds more tricky then can
> > > > > > > we check the page->lru in the dissolve path? If the page is still
> > > > > > > PageHuge and reference count 0 then there shouldn't be many options
> > > > > > > where it can be queued, right?
> > > > > >
> > > > > > Did you mean that we iterate over the free list to check whether
> > > > > > the page is on the free list?
> > > > >
> > > > > No I meant to check that the page is enqueued which along with ref count
> > > > > = 0 should mean it has been released to the pool unless I am missing
> > > > > something.
> > > >
> > > > The page can be on the free list or active list or empty when it
> > > > is freed to the pool. How to check whether it is on the free list?
> > >
> > > As I've said, I might be missing something here. But if the page is
> > > freed why does it matter whether it is on a active list or free list
> > > from the dissolve operation POV?
> >
> > As you said "check the page->lru". I have a question.
> > How to check the page->lru in the dissolve path?
>
> list_empty?

No.

>
> > BTW, dissolve_free_huge_page aims to free the page
> > to buddy allocator. put_page (for HugeTLB page) aims
> > to free the page to the hugepage pool.
>
> Right. Can we simply back off in the dissolving path when ref count is
> 0 && PageHuge() if list_empty(page->lru)? Is there any other scenario
> when all of the above is true and the page is not being freed?

The list_empty(&page->lru) may always return false.
The page before freeing is on the active list
(hstate->hugepage_activelist). Then it is on the free list
after freeing. So list_empty(&page->lru) is always false.

> --
> Michal Hocko
> SUSE Labs

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [External] Re: [PATCH v2 3/6] mm: hugetlb: fix a race between freeing and dissolving the page
  2021-01-07 12:59                   ` Muchun Song
  (?)
@ 2021-01-07 14:11                   ` Michal Hocko
  2021-01-07 15:11                       ` Muchun Song
  -1 siblings, 1 reply; 75+ messages in thread
From: Michal Hocko @ 2021-01-07 14:11 UTC (permalink / raw)
  To: Muchun Song
  Cc: Mike Kravetz, Andrew Morton, Naoya Horiguchi, Andi Kleen,
	Linux Memory Management List, LKML

On Thu 07-01-21 20:59:33, Muchun Song wrote:
> On Thu, Jan 7, 2021 at 8:38 PM Michal Hocko <mhocko@suse.com> wrote:
[...]
> > Right. Can we simply back off in the dissolving path when ref count is
> > 0 && PageHuge() if list_empty(page->lru)? Is there any other scenario
> > when all of the above is true and the page is not being freed?
> 
> The list_empty(&page->lru) may always return false.
> The page before freeing is on the active list
> (hstate->hugepage_activelist). Then it is on the free list
> after freeing. So list_empty(&page->lru) is always false.

The point I was trying to make is that the page has to be enqueued when
it is dissolved and freed. If the page is not enqueued then something is
racing. But then I have realized that this is not a great check to
detect the race because pages are going to be released to the buddy
allocator and that will reuse page->lru again. So scratch that and sorry
for the detour.

But that made me think some more, and one way to reliably detect the race
would be a PageHuge() check in the freeing path. This is what the dissolve
path does already. PageHuge becomes false during update_and_free_page()
while holding the hugetlb_lock. So can we use that?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [External] Re: [PATCH v2 3/6] mm: hugetlb: fix a race between freeing and dissolving the page
  2021-01-07 14:11                   ` Michal Hocko
@ 2021-01-07 15:11                       ` Muchun Song
  0 siblings, 0 replies; 75+ messages in thread
From: Muchun Song @ 2021-01-07 15:11 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Mike Kravetz, Andrew Morton, Naoya Horiguchi, Andi Kleen,
	Linux Memory Management List, LKML

On Thu, Jan 7, 2021 at 10:11 PM Michal Hocko <mhocko@suse.com> wrote:
>
> On Thu 07-01-21 20:59:33, Muchun Song wrote:
> > On Thu, Jan 7, 2021 at 8:38 PM Michal Hocko <mhocko@suse.com> wrote:
> [...]
> > > Right. Can we simply back off in the dissolving path when ref count is
> > > 0 && PageHuge() if list_empty(page->lru)? Is there any other scenario
> > > when all of the above is true and the page is not being freed?
> >
> > The list_empty(&page->lru) may always return false.
> > The page before freeing is on the active list
> > (hstate->hugepage_activelist). Then it is on the free list
> > after freeing. So list_empty(&page->lru) is always false.
>
> The point I was trying to make is that the page has to be enqueued when
> it is dissolved and freed. If the page is not enqueued then something is
> racing. But then I have realized that this is not a great check to
> detect the race because pages are going to be released to the buddy
> allocator and that will reuse page->lru again. So scratch that and sorry
> for the detour.
>
> But that made me think some more and one way to reliably detect the race
> should be PageHuge() check in the freeing path. This is what dissolve
> path does already. PageHuge becomes false during update_and_free_page()
> while holding the hugetlb_lock. So can we use that?

It may make things complex. Apart from freeing the page to the
buddy allocator, free_huge_page also does some other work for us.
If we detect the race in the freeing path and the page is no longer
a HugeTLB page, the freeing path just returns, so that other work
would have to move into the dissolve path. Right?

But I see a tricky problem. Look at free_huge_page(): if we are in
non-task context, we schedule a work item to free the page, reusing
page->mapping as the list node. If the page has already been freed
by the dissolve path, we must not touch page->mapping, so we need
to check PageHuge(). The check and the llist_add() would have to be
protected by hugetlb_lock, but we cannot take it there. Right? And
if the dissolve happens after the page is linked onto the list
(hpage_freelist), we would also have to remove it from that list.
It seems to make things more complex.

> --
> Michal Hocko
> SUSE Labs

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [External] Re: [PATCH v2 3/6] mm: hugetlb: fix a race between freeing and dissolving the page
@ 2021-01-07 15:11                       ` Muchun Song
  0 siblings, 0 replies; 75+ messages in thread
From: Muchun Song @ 2021-01-07 15:11 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Mike Kravetz, Andrew Morton, Naoya Horiguchi, Andi Kleen,
	Linux Memory Management List, LKML

On Thu, Jan 7, 2021 at 10:11 PM Michal Hocko <mhocko@suse.com> wrote:
>
> On Thu 07-01-21 20:59:33, Muchun Song wrote:
> > On Thu, Jan 7, 2021 at 8:38 PM Michal Hocko <mhocko@suse.com> wrote:
> [...]
> > > Right. Can we simply back off in the dissolving path when ref count is
> > > 0 && PageHuge() if list_empty(page->lru)? Is there any other scenario
> > > when the all above is true and the page is not being freed?
> >
> > The list_empty(&page->lru) may always return false.
> > The page before freeing is on the active list
> > (hstate->hugepage_activelist).Then it is on the free list
> > after freeing. So list_empty(&page->lru) is always false.
>
> The point I was trying to make is that the page has to be enqueued when
> it is dissolved and freed. If the page is not enqueued then something
> racing. But then I have realized that this is not a great check to
> detect the race because pages are going to be released to buddy
> allocator and that will reuse page->lru again. So scratch that and sorry
> for the detour.
>
> But that made me think some more and one way to reliably detect the race
> should be PageHuge() check in the freeing path. This is what dissolve
> path does already. PageHuge becomes false during update_and_free_page()
> while holding the hugetlb_lock. So can we use that?

It may make things complex. Apart from freeing the page to the
buddy allocator, free_huge_page() also does some other work for
us. If we detect the race in the freeing path and the page is no
longer a HugeTLB page, the freeing path just returns, so we would
also have to move that work to the dissolve path. Right?

But I see a tricky problem to solve there. See free_huge_page():
if we are in a non-task context, we should schedule a work item
to free the page, and we reuse page->mapping for that. If the page
has already been freed by the dissolve path, we should not touch
page->mapping, so we need to check PageHuge() first.
The check and the llist_add() would have to be protected by
hugetlb_lock, but we cannot do that here, right? And if the dissolve
happens after the page is linked to the list, we would also have to
remove it from the list (hpage_freelist). It seems to make
things more complex.

> --
> Michal Hocko
> SUSE Labs



* Re: [PATCH v2 3/6] mm: hugetlb: fix a race between freeing and dissolving the page
  2021-01-07  8:40       ` Michal Hocko
@ 2021-01-08  0:52         ` Mike Kravetz
  2021-01-08  8:28           ` Michal Hocko
  0 siblings, 1 reply; 75+ messages in thread
From: Mike Kravetz @ 2021-01-08  0:52 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Muchun Song, akpm, n-horiguchi, ak, linux-mm, linux-kernel

On 1/7/21 12:40 AM, Michal Hocko wrote:
> On Wed 06-01-21 12:58:29, Mike Kravetz wrote:
>> On 1/6/21 8:56 AM, Michal Hocko wrote:
>>> On Wed 06-01-21 16:47:36, Muchun Song wrote:
>>>> There is a race condition between __free_huge_page()
>>>> and dissolve_free_huge_page().
>>>>
>>>> CPU0:                         CPU1:
>>>>
>>>> // page_count(page) == 1
>>>> put_page(page)
>>>>   __free_huge_page(page)
>>>>                               dissolve_free_huge_page(page)
>>>>                                 spin_lock(&hugetlb_lock)
>>>>                                 // PageHuge(page) && !page_count(page)
>>>>                                 update_and_free_page(page)
>>>>                                 // page is freed to the buddy
>>>>                                 spin_unlock(&hugetlb_lock)
>>>>     spin_lock(&hugetlb_lock)
>>>>     clear_page_huge_active(page)
>>>>     enqueue_huge_page(page)
>>>>     // It is wrong, the page is already freed
>>>>     spin_unlock(&hugetlb_lock)
>>>>
>>>> The race window is between put_page() and spin_lock() which
>>>> is in the __free_huge_page().
>>>
>>> The race window really is between put_page and dissolve_free_huge_page.
>>> And the result is that the put_page path would clobber an unrelated page
>>> (either a free or an already reused page), which is quite serious.
>>> Fortunately pages are dissolved very rarely. I believe that a user would
>>> need to be privileged to hit this intentionally.
>>>
>>>> We should make sure that the page is already on the free list
>>>> when it is dissolved.
>>>
>>> Another option would be to check for PageHuge in __free_huge_page. Have
>>> you considered that rather than add yet another state? The scope of the
>>> spinlock would have to be extended. If that sounds more tricky then can
>>> we check the page->lru in the dissolve path? If the page is still
>>> PageHuge and reference count 0 then there shouldn't be many options
>>> where it can be queued, right?
>>
>> The tricky part with expanding lock scope will be the potential call to
>> hugepage_subpool_put_pages as it may also try to acquire the hugetlb_lock.
> 
> Can we rearrange the code and move hugepage_subpool_put_pages after all
> this is done? Or is there any strong reason for the particular ordering?

The reservation code is so fragile, I always get nervous when making
any changes.  However, the straightforward patch below passes some
simple testing.  The only difference I can see is that global counts
are adjusted before sub-pool counts.  This should not be an issue, as
global and sub-pool counts are adjusted independently (not under the
same lock).  Allocation code checks sub-pool counts before global
counts.  So, there is a SMALL potential that a racing allocation which
previously succeeded would now fail.  I do not think this is an issue
in practice.

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 3b38ea958e95..658593840212 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1395,6 +1395,11 @@ static void __free_huge_page(struct page *page)
 		(struct hugepage_subpool *)page_private(page);
 	bool restore_reserve;
 
+	spin_lock(&hugetlb_lock);
+	/* check for race with dissolve_free_huge_page/update_and_free_page */
+	if (!PageHuge(page))
+		return;
+
 	VM_BUG_ON_PAGE(page_count(page), page);
 	VM_BUG_ON_PAGE(page_mapcount(page), page);
 
@@ -1403,26 +1408,6 @@ static void __free_huge_page(struct page *page)
 	restore_reserve = PagePrivate(page);
 	ClearPagePrivate(page);
 
-	/*
-	 * If PagePrivate() was set on page, page allocation consumed a
-	 * reservation.  If the page was associated with a subpool, there
-	 * would have been a page reserved in the subpool before allocation
-	 * via hugepage_subpool_get_pages().  Since we are 'restoring' the
-	 * reservtion, do not call hugepage_subpool_put_pages() as this will
-	 * remove the reserved page from the subpool.
-	 */
-	if (!restore_reserve) {
-		/*
-		 * A return code of zero implies that the subpool will be
-		 * under its minimum size if the reservation is not restored
-		 * after page is free.  Therefore, force restore_reserve
-		 * operation.
-		 */
-		if (hugepage_subpool_put_pages(spool, 1) == 0)
-			restore_reserve = true;
-	}
-
-	spin_lock(&hugetlb_lock);
 	clear_page_huge_active(page);
 	hugetlb_cgroup_uncharge_page(hstate_index(h),
 				     pages_per_huge_page(h), page);
@@ -1446,6 +1431,28 @@ static void __free_huge_page(struct page *page)
 		enqueue_huge_page(h, page);
 	}
 	spin_unlock(&hugetlb_lock);
+
+	/*
+	 * If PagePrivate() was set on page, page allocation consumed a
+	 * reservation.  If the page was associated with a subpool, there
+	 * would have been a page reserved in the subpool before allocation
+	 * via hugepage_subpool_get_pages().  Since we are 'restoring' the
+	 * reservtion, do not call hugepage_subpool_put_pages() as this will
+	 * remove the reserved page from the subpool.
+	 */
+	if (!restore_reserve) {
+		/*
+		 * A return code of zero implies that the subpool will be
+		 * under its minimum size if the reservation is not restored
+		 * after page is free.  Therefore, we need to add 1 to the
+		 * global reserve count.
+		 */
+		if (hugepage_subpool_put_pages(spool, 1) == 0) {
+			spin_lock(&hugetlb_lock);
+			h->resv_huge_pages++;
+			spin_unlock(&hugetlb_lock);
+		}
+	}
 }
 
 /*


* Re: [External] Re: [PATCH v2 3/6] mm: hugetlb: fix a race between freeing and dissolving the page
  2021-01-07 15:11                       ` Muchun Song
  (?)
@ 2021-01-08  1:06                       ` Mike Kravetz
  2021-01-08  2:38                           ` Muchun Song
  -1 siblings, 1 reply; 75+ messages in thread
From: Mike Kravetz @ 2021-01-08  1:06 UTC (permalink / raw)
  To: Muchun Song, Michal Hocko
  Cc: Andrew Morton, Naoya Horiguchi, Andi Kleen,
	Linux Memory Management List, LKML

On 1/7/21 7:11 AM, Muchun Song wrote:
> On Thu, Jan 7, 2021 at 10:11 PM Michal Hocko <mhocko@suse.com> wrote:
>>
>> On Thu 07-01-21 20:59:33, Muchun Song wrote:
>>> On Thu, Jan 7, 2021 at 8:38 PM Michal Hocko <mhocko@suse.com> wrote:
>> [...]
>>>> Right. Can we simply back off in the dissolving path when ref count is
>>>> 0 && PageHuge() if list_empty(page->lru)? Is there any other scenario
>>>> when all the above is true and the page is not being freed?
>>>
>>> The list_empty(&page->lru) may always return false.
>>> The page before freeing is on the active list
>>> (hstate->hugepage_activelist). Then it is on the free list
>>> after freeing. So list_empty(&page->lru) is always false.
>>
>> The point I was trying to make is that the page has to be enqueued when
>> it is dissolved and freed. If the page is not enqueued then something is
>> racing. But then I have realized that this is not a great check to
>> detect the race because pages are going to be released to buddy
>> allocator and that will reuse page->lru again. So scratch that and sorry
>> for the detour.
>>
>> But that made me think some more and one way to reliably detect the race
>> should be PageHuge() check in the freeing path. This is what dissolve
>> path does already. PageHuge becomes false during update_and_free_page()
>> while holding the hugetlb_lock. So can we use that?
> 
> It may make the thing complex. Apart from freeing it to the
> buddy allocator, free_huge_page also does something else for
> us. If we detect the race in the freeing path, if it is not a HugeTLB
> page, the freeing path just returns. We also should move those
> things to the dissolve path. Right?
> 
> But I find a tricky problem to solve. See free_huge_page().
> If we are in non-task context, we should schedule a work
> to free the page. We reuse the page->mapping. If the page
> is already freed by the dissolve path. We should not touch
> the page->mapping. So we need to check PageHuge().
> The check and llist_add() should be protected by
> hugetlb_lock. But we cannot do that. Right? If dissolve
> happens after it is linked to the list. We also should
> remove it from the list (hpage_freelist). It seems to make
> the thing more complex.

You are correct.  This is also an issue/potential problem with this
race.  It seems that adding the state information might be the least
complex way to address the issue.

-- 
Mike Kravetz


* Re: [External] Re: [PATCH v2 3/6] mm: hugetlb: fix a race between freeing and dissolving the page
  2021-01-08  1:06                       ` Mike Kravetz
@ 2021-01-08  2:38                           ` Muchun Song
  0 siblings, 0 replies; 75+ messages in thread
From: Muchun Song @ 2021-01-08  2:38 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: Michal Hocko, Andrew Morton, Naoya Horiguchi, Andi Kleen,
	Linux Memory Management List, LKML

On Fri, Jan 8, 2021 at 9:08 AM Mike Kravetz <mike.kravetz@oracle.com> wrote:
>
> On 1/7/21 7:11 AM, Muchun Song wrote:
> > On Thu, Jan 7, 2021 at 10:11 PM Michal Hocko <mhocko@suse.com> wrote:
> >>
> >> On Thu 07-01-21 20:59:33, Muchun Song wrote:
> >>> On Thu, Jan 7, 2021 at 8:38 PM Michal Hocko <mhocko@suse.com> wrote:
> >> [...]
> >>>> Right. Can we simply back off in the dissolving path when ref count is
> >>>> 0 && PageHuge() if list_empty(page->lru)? Is there any other scenario
> >>>> when all the above is true and the page is not being freed?
> >>>
> >>> The list_empty(&page->lru) may always return false.
> >>> The page before freeing is on the active list
> >>> (hstate->hugepage_activelist). Then it is on the free list
> >>> after freeing. So list_empty(&page->lru) is always false.
> >>
> >> The point I was trying to make is that the page has to be enqueued when
> >> it is dissolved and freed. If the page is not enqueued then something is
> >> racing. But then I have realized that this is not a great check to
> >> detect the race because pages are going to be released to buddy
> >> allocator and that will reuse page->lru again. So scratch that and sorry
> >> for the detour.
> >>
> >> But that made me think some more and one way to reliably detect the race
> >> should be PageHuge() check in the freeing path. This is what dissolve
> >> path does already. PageHuge becomes false during update_and_free_page()
> >> while holding the hugetlb_lock. So can we use that?
> >
> > It may make the thing complex. Apart from freeing it to the
> > buddy allocator, free_huge_page also does something else for
> > us. If we detect the race in the freeing path, if it is not a HugeTLB
> > page, the freeing path just returns. We also should move those
> > things to the dissolve path. Right?
> >
> > But I find a tricky problem to solve. See free_huge_page().
> > If we are in non-task context, we should schedule a work
> > to free the page. We reuse the page->mapping. If the page
> > is already freed by the dissolve path. We should not touch
> > the page->mapping. So we need to check PageHuge().
> > The check and llist_add() should be protected by
> > hugetlb_lock. But we cannot do that. Right? If dissolve
> > happens after it is linked to the list. We also should
> > remove it from the list (hpage_freelist). It seems to make
> > the thing more complex.
>
> You are correct.  This is also an issue/potential problem with this
> race.  It seems that adding the state information might be the least
> complex way to address the issue.

Yeah, I agree with you. Adding a state is a simple solution.


>
> --
> Mike Kravetz



* Re: [PATCH v2 3/6] mm: hugetlb: fix a race between freeing and dissolving the page
  2021-01-08  0:52         ` Mike Kravetz
@ 2021-01-08  8:28           ` Michal Hocko
  0 siblings, 0 replies; 75+ messages in thread
From: Michal Hocko @ 2021-01-08  8:28 UTC (permalink / raw)
  To: Mike Kravetz; +Cc: Muchun Song, akpm, n-horiguchi, ak, linux-mm, linux-kernel

On Thu 07-01-21 16:52:19, Mike Kravetz wrote:
> On 1/7/21 12:40 AM, Michal Hocko wrote:
> > On Wed 06-01-21 12:58:29, Mike Kravetz wrote:
> >> On 1/6/21 8:56 AM, Michal Hocko wrote:
> >>> On Wed 06-01-21 16:47:36, Muchun Song wrote:
> >>>> There is a race condition between __free_huge_page()
> >>>> and dissolve_free_huge_page().
> >>>>
> >>>> CPU0:                         CPU1:
> >>>>
> >>>> // page_count(page) == 1
> >>>> put_page(page)
> >>>>   __free_huge_page(page)
> >>>>                               dissolve_free_huge_page(page)
> >>>>                                 spin_lock(&hugetlb_lock)
> >>>>                                 // PageHuge(page) && !page_count(page)
> >>>>                                 update_and_free_page(page)
> >>>>                                 // page is freed to the buddy
> >>>>                                 spin_unlock(&hugetlb_lock)
> >>>>     spin_lock(&hugetlb_lock)
> >>>>     clear_page_huge_active(page)
> >>>>     enqueue_huge_page(page)
> >>>>     // It is wrong, the page is already freed
> >>>>     spin_unlock(&hugetlb_lock)
> >>>>
> >>>> The race window is between put_page() and spin_lock() which
> >>>> is in the __free_huge_page().
> >>>
> >>> The race window really is between put_page and dissolve_free_huge_page.
> >>> And the result is that the put_page path would clobber an unrelated page
> >>> (either a free or an already reused page), which is quite serious.
> >>> Fortunately pages are dissolved very rarely. I believe that a user would
> >>> need to be privileged to hit this intentionally.
> >>>
> >>>> We should make sure that the page is already on the free list
> >>>> when it is dissolved.
> >>>
> >>> Another option would be to check for PageHuge in __free_huge_page. Have
> >>> you considered that rather than add yet another state? The scope of the
> >>> spinlock would have to be extended. If that sounds more tricky then can
> >>> we check the page->lru in the dissolve path? If the page is still
> >>> PageHuge and reference count 0 then there shouldn't be many options
> >>> where it can be queued, right?
> >>
> >> The tricky part with expanding lock scope will be the potential call to
> >> hugepage_subpool_put_pages as it may also try to acquire the hugetlb_lock.
> > 
> > Can we rearrange the code and move hugepage_subpool_put_pages after all
> > this is done? Or is there any strong reason for the particular ordering?
> 
> The reservation code is so fragile, I always get nervous when making
> any changes.  However, the straight forward patch below passes some
> simple testing.  The only difference I can see is that global counts
> are adjusted before sub-pool counts.  This should not be an issue as
> global and sub-pool counts are adjusted independently (not under the
> same lock).  Allocation code checks sub-pool counts before global
> counts.  So, there is a SMALL potential that a racing allocation which
> previously succeeded would now fail.  I do not think this is an issue
> in practice.
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 3b38ea958e95..658593840212 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1395,6 +1395,11 @@ static void __free_huge_page(struct page *page)
>  		(struct hugepage_subpool *)page_private(page);
>  	bool restore_reserve;
>  
> +	spin_lock(&hugetlb_lock);
> +	/* check for race with dissolve_free_huge_page/update_and_free_page */
> +	if (!PageHuge(page))
> +		return;
> +

This really wants to unlock the lock, right? But this is indeed what
I had in mind.
-- 
Michal Hocko
SUSE Labs


* Re: [External] Re: [PATCH v2 3/6] mm: hugetlb: fix a race between freeing and dissolving the page
  2021-01-07 15:11                       ` Muchun Song
  (?)
  (?)
@ 2021-01-08  8:43                       ` Michal Hocko
  2021-01-08  9:01                           ` Muchun Song
  -1 siblings, 1 reply; 75+ messages in thread
From: Michal Hocko @ 2021-01-08  8:43 UTC (permalink / raw)
  To: Muchun Song
  Cc: Mike Kravetz, Andrew Morton, Naoya Horiguchi, Andi Kleen,
	Linux Memory Management List, LKML

On Thu 07-01-21 23:11:22, Muchun Song wrote:
> On Thu, Jan 7, 2021 at 10:11 PM Michal Hocko <mhocko@suse.com> wrote:
> >
> > On Thu 07-01-21 20:59:33, Muchun Song wrote:
> > > On Thu, Jan 7, 2021 at 8:38 PM Michal Hocko <mhocko@suse.com> wrote:
> > [...]
> > > > Right. Can we simply back off in the dissolving path when ref count is
> > > > 0 && PageHuge() if list_empty(page->lru)? Is there any other scenario
> > > > when all the above is true and the page is not being freed?
> > >
> > > The list_empty(&page->lru) may always return false.
> > > The page before freeing is on the active list
> > > (hstate->hugepage_activelist). Then it is on the free list
> > > after freeing. So list_empty(&page->lru) is always false.
> >
> > The point I was trying to make is that the page has to be enqueued when
> > it is dissolved and freed. If the page is not enqueued then something is
> > racing. But then I have realized that this is not a great check to
> > detect the race because pages are going to be released to buddy
> > allocator and that will reuse page->lru again. So scratch that and sorry
> > for the detour.
> >
> > But that made me think some more and one way to reliably detect the race
> > should be PageHuge() check in the freeing path. This is what dissolve
> > path does already. PageHuge becomes false during update_and_free_page()
> > while holding the hugetlb_lock. So can we use that?
> 
> It may make the thing complex. Apart from freeing it to the
> buddy allocator, free_huge_page also does something else for
> us. If we detect the race in the freeing path, if it is not a HugeTLB
> page, the freeing path just returns. We also should move those
> things to the dissolve path. Right?

Not sure what you mean. Dissolving is a subset of the freeing path. It
effectively only implements the update_and_free_page branch (aka free
to buddy). It skips some of the existing steps because it believes it
sees a freed page. But as you have pointed out, this is racy, and I
strongly suspect it is simply wrong in those assumptions. E.g. hugetlb
cgroup accounting can get wrong, right?

The more I think about it, the more I think that the dissolving path
should simply share a common helper with __free_huge_page that would
release the huge page to the allocator. The only thing that should be
specific to the dissolving path is HWpoison handling. It shouldn't be
playing with accounting and whatnot. Btw the HWpoison handling is
suspicious as well; a lost race would mean this doesn't happen. But
maybe there is some fixup handled later on...
 
> But I find a tricky problem to solve. See free_huge_page().
> If we are in non-task context, we should schedule a work
> to free the page. We reuse the page->mapping. If the page
> is already freed by the dissolve path. We should not touch
> the page->mapping. So we need to check PageHuge().
> The check and llist_add() should be protected by
> hugetlb_lock. But we cannot do that. Right? If dissolve
> happens after it is linked to the list. We also should
> remove it from the list (hpage_freelist). It seems to make
> the thing more complex.

I am not sure I follow you here but yes PageHuge under hugetlb_lock
should be the reliable way to check for the race. I am not sure why we
really need to care about mapping or other state.
-- 
Michal Hocko
SUSE Labs


* Re: [External] Re: [PATCH v2 3/6] mm: hugetlb: fix a race between freeing and dissolving the page
  2021-01-08  8:43                       ` Michal Hocko
@ 2021-01-08  9:01                           ` Muchun Song
  0 siblings, 0 replies; 75+ messages in thread
From: Muchun Song @ 2021-01-08  9:01 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Mike Kravetz, Andrew Morton, Naoya Horiguchi, Andi Kleen,
	Linux Memory Management List, LKML

On Fri, Jan 8, 2021 at 4:43 PM Michal Hocko <mhocko@suse.com> wrote:
>
> On Thu 07-01-21 23:11:22, Muchun Song wrote:
> > On Thu, Jan 7, 2021 at 10:11 PM Michal Hocko <mhocko@suse.com> wrote:
> > >
> > > On Thu 07-01-21 20:59:33, Muchun Song wrote:
> > > > On Thu, Jan 7, 2021 at 8:38 PM Michal Hocko <mhocko@suse.com> wrote:
> > > [...]
> > > > > Right. Can we simply back off in the dissolving path when ref count is
> > > > > 0 && PageHuge() if list_empty(page->lru)? Is there any other scenario
> > > > > when all the above is true and the page is not being freed?
> > > >
> > > > The list_empty(&page->lru) may always return false.
> > > > The page before freeing is on the active list
> > > > (hstate->hugepage_activelist). Then it is on the free list
> > > > after freeing. So list_empty(&page->lru) is always false.
> > >
> > > The point I was trying to make is that the page has to be enqueued when
> > > it is dissolved and freed. If the page is not enqueued then something is
> > > racing. But then I have realized that this is not a great check to
> > > detect the race because pages are going to be released to buddy
> > > allocator and that will reuse page->lru again. So scratch that and sorry
> > > for the detour.
> > >
> > > But that made me think some more and one way to reliably detect the race
> > > should be PageHuge() check in the freeing path. This is what dissolve
> > > path does already. PageHuge becomes false during update_and_free_page()
> > > while holding the hugetlb_lock. So can we use that?
> >
> > It may make the thing complex. Apart from freeing it to the
> > buddy allocator, free_huge_page also does something else for
> > us. If we detect the race in the freeing path, if it is not a HugeTLB
> > page, the freeing path just returns. We also should move those
> > things to the dissolve path. Right?
>
> Not sure what you mean. Dissolving is a subset of the freeing path. It
> effectively only implements the update_and_free_page branch (aka free
> to buddy). It skips some of the existing steps because it believes it
> sees a freed page. But as you have pointed out this is racy and I
> strongly suspect it is simply wrong in those assumptions. E.g. hugetlb
> cgroup accounting can get wrong right?

OK, I see what you mean: update_and_free_page() should
do the freeing, similar to what __free_huge_page() does.

>
> The more I think about it the more I think that dissolving path should
> simply have a common helper with  __free_huge_page that would release
> the huge page to the allocator. The only thing that should be specific
> to the dissolving path is HWpoison handling. It shouldn't be playing
> with accounting and what not. Btw the HWpoison handling is suspicious as
> well, a lost race would mean this doesn't happen. But maybe there is
> some fixup handled later on...
>
> > But I find a tricky problem to solve. See free_huge_page().
> > If we are in non-task context, we should schedule a work
> > to free the page. We reuse the page->mapping. If the page
> > is already freed by the dissolve path. We should not touch
> > the page->mapping. So we need to check PageHuge().
> > The check and llist_add() should be protected by
> > hugetlb_lock. But we cannot do that. Right? If dissolve
> > happens after it is linked to the list. We also should
> > remove it from the list (hpage_freelist). It seems to make
> > the thing more complex.
>
> I am not sure I follow you here but yes PageHuge under hugetlb_lock
> should be the reliable way to check for the race. I am not sure why we
> really need to care about mapping or other state.

CPU0:                               CPU1:
free_huge_page(page)
  if (PageHuge(page))
                                    dissolve_free_huge_page(page)
                                      spin_lock(&hugetlb_lock)
                                      update_and_free_page(page)
                                      spin_unlock(&hugetlb_lock)
    llist_add(page->mapping)
    // the mapping is corrupted

The PageHuge(page) check and the llist_add() should be protected by
hugetlb_lock, right? If so, the problem is that we cannot hold
hugetlb_lock in the free_huge_page() path.

> --
> Michal Hocko
> SUSE Labs


* Re: [External] Re: [PATCH v2 3/6] mm: hugetlb: fix a race between freeing and dissolving the page
@ 2021-01-08  9:01                           ` Muchun Song
  0 siblings, 0 replies; 75+ messages in thread
From: Muchun Song @ 2021-01-08  9:01 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Mike Kravetz, Andrew Morton, Naoya Horiguchi, Andi Kleen,
	Linux Memory Management List, LKML

On Fri, Jan 8, 2021 at 4:43 PM Michal Hocko <mhocko@suse.com> wrote:
>
> On Thu 07-01-21 23:11:22, Muchun Song wrote:
> > On Thu, Jan 7, 2021 at 10:11 PM Michal Hocko <mhocko@suse.com> wrote:
> > >
> > > On Thu 07-01-21 20:59:33, Muchun Song wrote:
> > > > On Thu, Jan 7, 2021 at 8:38 PM Michal Hocko <mhocko@suse.com> wrote:
> > > [...]
> > > > > Right. Can we simply back off in the dissolving path when ref count is
> > > > > 0 && PageHuge() if list_empty(page->lru)? Is there any other scenario
> > > > > when the all above is true and the page is not being freed?
> > > >
> > > > The list_empty(&page->lru) may always return false.
> > > > The page before freeing is on the active list
> > > > (hstate->hugepage_activelist).Then it is on the free list
> > > > after freeing. So list_empty(&page->lru) is always false.
> > >
> > > The point I was trying to make is that the page has to be enqueued when
> > > it is dissolved and freed. If the page is not enqueued then something
> > > racing. But then I have realized that this is not a great check to
> > > detect the race because pages are going to be released to buddy
> > > allocator and that will reuse page->lru again. So scratch that and sorry
> > > for the detour.
> > >
> > > But that made me think some more and one way to reliably detect the race
> > > should be PageHuge() check in the freeing path. This is what dissolve
> > > path does already. PageHuge becomes false during update_and_free_page()
> > > while holding the hugetlb_lock. So can we use that?
> >
> > It may make the thing complex. Apart from freeing it to the
> > buddy allocator, free_huge_page also does something else for
> > us. If we detect the race in the freeing path, if it is not a HugeTLB
> > page, the freeing path just returns. We also should move those
> > things to the dissolve path. Right?
>
> Not sure what you mean. Dissolving is a subset of the freeing path. It
> effectivelly only implements the update_and_free_page branch (aka free
> to buddy). It skips some of the existing steps because it believes it
> sees a freed page. But as you have pointed out this is racy and I
> strongly suspect it is simply wrong in those assumptions. E.g. hugetlb
> cgroup accounting can get wrong right?

OK. I know what you mean. The update_and_free_page should
do the freeing which is similar to __free_huge_page().

>
> The more I think about it the more I think that dissolving path should
> simply have a common helper with  __free_huge_page that would release
> the huge page to the allocator. The only thing that should be specific
> to the dissolving path is HWpoison handling. It shouldn't be playing
> with accounting and what not. Btw the HWpoison handling is suspicious as
> well, a lost race would mean this doesn't happen. But maybe there is
> some fixup handled later on...
>
> > But I find a tricky problem to solve. See free_huge_page().
> > If we are in non-task context, we should schedule a work
> > to free the page. We reuse the page->mapping. If the page
> > is already freed by the dissolve path. We should not touch
> > the page->mapping. So we need to check PageHuge().
> > The check and llist_add() should be protected by
> > hugetlb_lock. But we cannot do that. Right? If dissolve
> > happens after it is linked to the list. We also should
> > remove it from the list (hpage_freelist). It seems to make
> > the thing more complex.
>
> I am not sure I follow you here but yes PageHuge under hugetlb_lock
> should be the reliable way to check for the race. I am not sure why we
> really need to care about mapping or other state.

CPU0:                               CPU1:
free_huge_page(page)
  if (PageHuge(page))
                                    dissolve_free_huge_page(page)
                                      spin_lock(&hugetlb_lock)
                                      update_and_free_page(page)
                                      spin_unlock(&hugetlb_lock)
    llist_add(page->mapping)
    // the mapping is corrupted

The PageHuge(page) and llist_add() should be protected by
hugetlb_lock. Right? If so, we cannot hold hugetlb_lock
in free_huge_page() path.
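
For context, the non-task-context path being discussed looks roughly like
this (an abbreviated sketch of mm/hugetlb.c from this era, not the exact
code; error handling elided):

```c
/*
 * Sketch, abbreviated: in non-task (softirq) context free_huge_page()
 * cannot take hugetlb_lock, so it defers the real freeing to a
 * workqueue by linking the page through page->mapping.  It is this
 * deferral that races with dissolve_free_huge_page() above.
 */
void free_huge_page(struct page *page)
{
	if (!in_task()) {
		/*
		 * Window: dissolve_free_huge_page() may call
		 * update_and_free_page() right here, after which
		 * page->mapping no longer belongs to hugetlb and
		 * the llist_add() below corrupts it.
		 */
		if (llist_add((struct llist_node *)&page->mapping,
			      &hpage_freelist))
			schedule_work(&free_hpage_work);
		return;
	}

	__free_huge_page(page);
}
```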

> --
> Michal Hocko
> SUSE Labs


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [External] Re: [PATCH v2 3/6] mm: hugetlb: fix a race between freeing and dissolving the page
  2021-01-08  9:01                           ` Muchun Song
  (?)
@ 2021-01-08  9:31                           ` Michal Hocko
  2021-01-08 10:08                               ` Muchun Song
  -1 siblings, 1 reply; 75+ messages in thread
From: Michal Hocko @ 2021-01-08  9:31 UTC (permalink / raw)
  To: Muchun Song
  Cc: Mike Kravetz, Andrew Morton, Naoya Horiguchi, Andi Kleen,
	Linux Memory Management List, LKML

On Fri 08-01-21 17:01:03, Muchun Song wrote:
> On Fri, Jan 8, 2021 at 4:43 PM Michal Hocko <mhocko@suse.com> wrote:
> >
> > On Thu 07-01-21 23:11:22, Muchun Song wrote:
[..]
> > > But I find a tricky problem to solve. See free_huge_page().
> > > If we are in non-task context, we should schedule a work
> > > to free the page. We reuse the page->mapping. If the page
> > > is already freed by the dissolve path. We should not touch
> > > the page->mapping. So we need to check PageHuge().
> > > The check and llist_add() should be protected by
> > > hugetlb_lock. But we cannot do that. Right? If dissolve
> > > happens after it is linked to the list. We also should
> > > remove it from the list (hpage_freelist). It seems to make
> > > the thing more complex.
> >
> > I am not sure I follow you here but yes PageHuge under hugetlb_lock
> > should be the reliable way to check for the race. I am not sure why we
> > really need to care about mapping or other state.
> 
> CPU0:                               CPU1:
> free_huge_page(page)
>   if (PageHuge(page))
>                                     dissolve_free_huge_page(page)
>                                       spin_lock(&hugetlb_lock)
>                                       update_and_free_page(page)
>                                       spin_unlock(&hugetlb_lock)
>     llist_add(page->mapping)
>     // the mapping is corrupted
> 
> The PageHuge(page) and llist_add() should be protected by
> hugetlb_lock. Right? If so, we cannot hold hugetlb_lock
> in free_huge_page() path.

OK, I see. I completely forgot about this snowflake. I thought that
free_huge_page was a typo missing initial __. Anyway you are right that
this path needs a check as well. But I don't see why we couldn't use the
lock here. The lock can be held only inside the !in_task branch.
Although it would be much nicer if the lock was held at this layer
rather than in both free_huge_page and __free_huge_page. But that clean up
can be done on top.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [External] Re: [PATCH v2 3/6] mm: hugetlb: fix a race between freeing and dissolving the page
  2021-01-08  9:31                           ` Michal Hocko
@ 2021-01-08 10:08                               ` Muchun Song
  0 siblings, 0 replies; 75+ messages in thread
From: Muchun Song @ 2021-01-08 10:08 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Mike Kravetz, Andrew Morton, Naoya Horiguchi, Andi Kleen,
	Linux Memory Management List, LKML

On Fri, Jan 8, 2021 at 5:31 PM Michal Hocko <mhocko@suse.com> wrote:
>
> On Fri 08-01-21 17:01:03, Muchun Song wrote:
> > On Fri, Jan 8, 2021 at 4:43 PM Michal Hocko <mhocko@suse.com> wrote:
> > >
> > > On Thu 07-01-21 23:11:22, Muchun Song wrote:
> [..]
> > > > But I find a tricky problem to solve. See free_huge_page().
> > > > If we are in non-task context, we should schedule a work
> > > > to free the page. We reuse the page->mapping. If the page
> > > > is already freed by the dissolve path. We should not touch
> > > > the page->mapping. So we need to check PageHuge().
> > > > The check and llist_add() should be protected by
> > > > hugetlb_lock. But we cannot do that. Right? If dissolve
> > > > happens after it is linked to the list. We also should
> > > > remove it from the list (hpage_freelist). It seems to make
> > > > the thing more complex.
> > >
> > > I am not sure I follow you here but yes PageHuge under hugetlb_lock
> > > should be the reliable way to check for the race. I am not sure why we
> > > really need to care about mapping or other state.
> >
> > CPU0:                               CPU1:
> > free_huge_page(page)
> >   if (PageHuge(page))
> >                                     dissolve_free_huge_page(page)
> >                                       spin_lock(&hugetlb_lock)
> >                                       update_and_free_page(page)
> >                                       spin_unlock(&hugetlb_lock)
> >     llist_add(page->mapping)
> >     // the mapping is corrupted
> >
> > The PageHuge(page) and llist_add() should be protected by
> > hugetlb_lock. Right? If so, we cannot hold hugetlb_lock
> > in free_huge_page() path.
>
> OK, I see. I completely forgot about this snowflake. I thought that
> free_huge_page was a typo missing initial __. Anyway you are right that
> this path needs a check as well. But I don't see why we couldn't use the
> lock here. The lock can be held only inside the !in_task branch.

Because we hold the hugetlb_lock without disabling IRQs. So if an
interrupt occurs after we take the lock, and that interrupt also frees
a HugeTLB page, it leads to deadlock.

task context:                             interrupt context:

put_page(page)
  spin_lock(&hugetlb_lock)

                                                put_page(page)
                                                  spin_lock(&hugetlb_lock)
                                                  // deadlock

  spin_unlock(&hugetlb_lock)
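
If hugetlb_lock were made IRQ-safe (hypothetical at this point in the
thread), the self-deadlock above could not happen, because interrupts
would be off on this CPU while the lock is held:

```c
/* Hypothetical sketch: an IRQ-safe variant of the locking above. */
unsigned long flags;

spin_lock_irqsave(&hugetlb_lock, flags);	/* IRQs off on this CPU */
/* ... PageHuge() check, llist_add(), etc. ... */
spin_unlock_irqrestore(&hugetlb_lock, flags);
/*
 * An interrupt-context put_page() can no longer fire on this CPU
 * while the lock is held, so it cannot spin on a lock we already own.
 */
```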

> Although it would be much more nicer if the lock was held at this layer
> rather than both free_huge_page and __free_huge_page. But that clean up
> can be done on top.
>
> --
> Michal Hocko
> SUSE Labs

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [External] Re: [PATCH v2 3/6] mm: hugetlb: fix a race between freeing and dissolving the page
  2021-01-08 10:08                               ` Muchun Song
  (?)
@ 2021-01-08 11:44                               ` Michal Hocko
  2021-01-08 11:52                                   ` Muchun Song
  -1 siblings, 1 reply; 75+ messages in thread
From: Michal Hocko @ 2021-01-08 11:44 UTC (permalink / raw)
  To: Muchun Song
  Cc: Mike Kravetz, Andrew Morton, Naoya Horiguchi, Andi Kleen,
	Linux Memory Management List, LKML

On Fri 08-01-21 18:08:57, Muchun Song wrote:
> On Fri, Jan 8, 2021 at 5:31 PM Michal Hocko <mhocko@suse.com> wrote:
> >
> > On Fri 08-01-21 17:01:03, Muchun Song wrote:
> > > On Fri, Jan 8, 2021 at 4:43 PM Michal Hocko <mhocko@suse.com> wrote:
> > > >
> > > > On Thu 07-01-21 23:11:22, Muchun Song wrote:
> > [..]
> > > > > But I find a tricky problem to solve. See free_huge_page().
> > > > > If we are in non-task context, we should schedule a work
> > > > > to free the page. We reuse the page->mapping. If the page
> > > > > is already freed by the dissolve path. We should not touch
> > > > > the page->mapping. So we need to check PageHuge().
> > > > > The check and llist_add() should be protected by
> > > > > hugetlb_lock. But we cannot do that. Right? If dissolve
> > > > > happens after it is linked to the list. We also should
> > > > > remove it from the list (hpage_freelist). It seems to make
> > > > > the thing more complex.
> > > >
> > > > I am not sure I follow you here but yes PageHuge under hugetlb_lock
> > > > should be the reliable way to check for the race. I am not sure why we
> > > > really need to care about mapping or other state.
> > >
> > > CPU0:                               CPU1:
> > > free_huge_page(page)
> > >   if (PageHuge(page))
> > >                                     dissolve_free_huge_page(page)
> > >                                       spin_lock(&hugetlb_lock)
> > >                                       update_and_free_page(page)
> > >                                       spin_unlock(&hugetlb_lock)
> > >     llist_add(page->mapping)
> > >     // the mapping is corrupted
> > >
> > > The PageHuge(page) and llist_add() should be protected by
> > > hugetlb_lock. Right? If so, we cannot hold hugetlb_lock
> > > in free_huge_page() path.
> >
> > OK, I see. I completely forgot about this snowflake. I thought that
> > free_huge_page was a typo missing initial __. Anyway you are right that
> > this path needs a check as well. But I don't see why we couldn't use the
> > lock here. The lock can be held only inside the !in_task branch.
> 
> Because we hold the hugetlb_lock without disable irq. So if an interrupt
> occurs after we hold the lock. And we also free a HugeTLB page. Then
> it leads to deadlock.

There is nothing really to prevent making hugetlb_lock irq safe, isn't
it?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [External] Re: [PATCH v2 3/6] mm: hugetlb: fix a race between freeing and dissolving the page
  2021-01-08 11:44                               ` Michal Hocko
@ 2021-01-08 11:52                                   ` Muchun Song
  0 siblings, 0 replies; 75+ messages in thread
From: Muchun Song @ 2021-01-08 11:52 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Mike Kravetz, Andrew Morton, Naoya Horiguchi, Andi Kleen,
	Linux Memory Management List, LKML

On Fri, Jan 8, 2021 at 7:44 PM Michal Hocko <mhocko@suse.com> wrote:
>
> On Fri 08-01-21 18:08:57, Muchun Song wrote:
> > On Fri, Jan 8, 2021 at 5:31 PM Michal Hocko <mhocko@suse.com> wrote:
> > >
> > > On Fri 08-01-21 17:01:03, Muchun Song wrote:
> > > > On Fri, Jan 8, 2021 at 4:43 PM Michal Hocko <mhocko@suse.com> wrote:
> > > > >
> > > > > On Thu 07-01-21 23:11:22, Muchun Song wrote:
> > > [..]
> > > > > > But I find a tricky problem to solve. See free_huge_page().
> > > > > > If we are in non-task context, we should schedule a work
> > > > > > to free the page. We reuse the page->mapping. If the page
> > > > > > is already freed by the dissolve path. We should not touch
> > > > > > the page->mapping. So we need to check PageHuge().
> > > > > > The check and llist_add() should be protected by
> > > > > > hugetlb_lock. But we cannot do that. Right? If dissolve
> > > > > > happens after it is linked to the list. We also should
> > > > > > remove it from the list (hpage_freelist). It seems to make
> > > > > > the thing more complex.
> > > > >
> > > > > I am not sure I follow you here but yes PageHuge under hugetlb_lock
> > > > > should be the reliable way to check for the race. I am not sure why we
> > > > > really need to care about mapping or other state.
> > > >
> > > > CPU0:                               CPU1:
> > > > free_huge_page(page)
> > > >   if (PageHuge(page))
> > > >                                     dissolve_free_huge_page(page)
> > > >                                       spin_lock(&hugetlb_lock)
> > > >                                       update_and_free_page(page)
> > > >                                       spin_unlock(&hugetlb_lock)
> > > >     llist_add(page->mapping)
> > > >     // the mapping is corrupted
> > > >
> > > > The PageHuge(page) and llist_add() should be protected by
> > > > hugetlb_lock. Right? If so, we cannot hold hugetlb_lock
> > > > in free_huge_page() path.
> > >
> > > OK, I see. I completely forgot about this snowflake. I thought that
> > > free_huge_page was a typo missing initial __. Anyway you are right that
> > > this path needs a check as well. But I don't see why we couldn't use the
> > > lock here. The lock can be held only inside the !in_task branch.
> >
> > Because we hold the hugetlb_lock without disable irq. So if an interrupt
> > occurs after we hold the lock. And we also free a HugeTLB page. Then
> > it leads to deadlock.
>
> There is nothing really to prevent making hugetlb_lock irq safe, isn't
> it?

Yeah. We can make the hugetlb_lock irq safe. But why have we not
done this? Maybe the commit changelog can provide more information.

See https://github.com/torvalds/linux/commit/c77c0a8ac4c522638a8242fcb9de9496e3cdbb2d
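
For reference, the mechanism that commit introduced — deferring the free
to process context precisely because hugetlb_lock is not IRQ-safe — is
roughly the following (abbreviated sketch, not the full commit):

```c
/*
 * Abbreviated from c77c0a8ac4c5 ("mm/hugetlb: defer freeing of huge
 * pages if in non-task context"): pages freed from softirq context
 * are queued on a lockless list (reusing page->mapping as the llist
 * node) and freed later from a workqueue, where taking the
 * non-IRQ-safe hugetlb_lock is fine.
 */
static LLIST_HEAD(hpage_freelist);

static void free_hpage_workfn(struct work_struct *work)
{
	struct llist_node *node;

	node = llist_del_all(&hpage_freelist);

	while (node) {
		struct page *page;

		page = container_of((struct address_space **)node,
				    struct page, mapping);
		node = node->next;
		__free_huge_page(page);	/* may take hugetlb_lock safely */
	}
}
static DECLARE_WORK(free_hpage_work, free_hpage_workfn);
```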

> --
> Michal Hocko
> SUSE Labs

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [External] Re: [PATCH v2 3/6] mm: hugetlb: fix a race between freeing and dissolving the page
  2021-01-08 11:52                                   ` Muchun Song
  (?)
@ 2021-01-08 12:04                                   ` Michal Hocko
  2021-01-08 12:24                                       ` Muchun Song
  -1 siblings, 1 reply; 75+ messages in thread
From: Michal Hocko @ 2021-01-08 12:04 UTC (permalink / raw)
  To: Muchun Song
  Cc: Mike Kravetz, Andrew Morton, Naoya Horiguchi, Andi Kleen,
	Linux Memory Management List, LKML

On Fri 08-01-21 19:52:54, Muchun Song wrote:
> On Fri, Jan 8, 2021 at 7:44 PM Michal Hocko <mhocko@suse.com> wrote:
> >
> > On Fri 08-01-21 18:08:57, Muchun Song wrote:
> > > On Fri, Jan 8, 2021 at 5:31 PM Michal Hocko <mhocko@suse.com> wrote:
> > > >
> > > > On Fri 08-01-21 17:01:03, Muchun Song wrote:
> > > > > On Fri, Jan 8, 2021 at 4:43 PM Michal Hocko <mhocko@suse.com> wrote:
> > > > > >
> > > > > > On Thu 07-01-21 23:11:22, Muchun Song wrote:
> > > > [..]
> > > > > > > But I find a tricky problem to solve. See free_huge_page().
> > > > > > > If we are in non-task context, we should schedule a work
> > > > > > > to free the page. We reuse the page->mapping. If the page
> > > > > > > is already freed by the dissolve path. We should not touch
> > > > > > > the page->mapping. So we need to check PageHuge().
> > > > > > > The check and llist_add() should be protected by
> > > > > > > hugetlb_lock. But we cannot do that. Right? If dissolve
> > > > > > > happens after it is linked to the list. We also should
> > > > > > > remove it from the list (hpage_freelist). It seems to make
> > > > > > > the thing more complex.
> > > > > >
> > > > > > I am not sure I follow you here but yes PageHuge under hugetlb_lock
> > > > > > should be the reliable way to check for the race. I am not sure why we
> > > > > > really need to care about mapping or other state.
> > > > >
> > > > > CPU0:                               CPU1:
> > > > > free_huge_page(page)
> > > > >   if (PageHuge(page))
> > > > >                                     dissolve_free_huge_page(page)
> > > > >                                       spin_lock(&hugetlb_lock)
> > > > >                                       update_and_free_page(page)
> > > > >                                       spin_unlock(&hugetlb_lock)
> > > > >     llist_add(page->mapping)
> > > > >     // the mapping is corrupted
> > > > >
> > > > > The PageHuge(page) and llist_add() should be protected by
> > > > > hugetlb_lock. Right? If so, we cannot hold hugetlb_lock
> > > > > in free_huge_page() path.
> > > >
> > > > OK, I see. I completely forgot about this snowflake. I thought that
> > > > free_huge_page was a typo missing initial __. Anyway you are right that
> > > > this path needs a check as well. But I don't see why we couldn't use the
> > > > lock here. The lock can be held only inside the !in_task branch.
> > >
> > > Because we hold the hugetlb_lock without disable irq. So if an interrupt
> > > occurs after we hold the lock. And we also free a HugeTLB page. Then
> > > it leads to deadlock.
> >
> > There is nothing really to prevent making hugetlb_lock irq safe, isn't
> > it?
> 
> Yeah. We can make the hugetlb_lock irq safe. But why have we not
> done this? Maybe the commit changelog can provide more information.
> 
> See https://github.com/torvalds/linux/commit/c77c0a8ac4c522638a8242fcb9de9496e3cdbb2d

Dang! Maybe it is time to finally stop stacking one workaround on top of
the other and put this code into shape. The amount of hackery and subtle
details has just grown beyond healthy!
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [External] Re: [PATCH v2 3/6] mm: hugetlb: fix a race between freeing and dissolving the page
  2021-01-08 12:04                                   ` Michal Hocko
@ 2021-01-08 12:24                                       ` Muchun Song
  0 siblings, 0 replies; 75+ messages in thread
From: Muchun Song @ 2021-01-08 12:24 UTC (permalink / raw)
  To: Michal Hocko, Mike Kravetz
  Cc: Andrew Morton, Naoya Horiguchi, Andi Kleen,
	Linux Memory Management List, LKML

On Fri, Jan 8, 2021 at 8:04 PM Michal Hocko <mhocko@suse.com> wrote:
>
> On Fri 08-01-21 19:52:54, Muchun Song wrote:
> > On Fri, Jan 8, 2021 at 7:44 PM Michal Hocko <mhocko@suse.com> wrote:
> > >
> > > On Fri 08-01-21 18:08:57, Muchun Song wrote:
> > > > On Fri, Jan 8, 2021 at 5:31 PM Michal Hocko <mhocko@suse.com> wrote:
> > > > >
> > > > > On Fri 08-01-21 17:01:03, Muchun Song wrote:
> > > > > > On Fri, Jan 8, 2021 at 4:43 PM Michal Hocko <mhocko@suse.com> wrote:
> > > > > > >
> > > > > > > On Thu 07-01-21 23:11:22, Muchun Song wrote:
> > > > > [..]
> > > > > > > > But I find a tricky problem to solve. See free_huge_page().
> > > > > > > > If we are in non-task context, we should schedule a work
> > > > > > > > to free the page. We reuse the page->mapping. If the page
> > > > > > > > is already freed by the dissolve path. We should not touch
> > > > > > > > the page->mapping. So we need to check PageHuge().
> > > > > > > > The check and llist_add() should be protected by
> > > > > > > > hugetlb_lock. But we cannot do that. Right? If dissolve
> > > > > > > > happens after it is linked to the list. We also should
> > > > > > > > remove it from the list (hpage_freelist). It seems to make
> > > > > > > > the thing more complex.
> > > > > > >
> > > > > > > I am not sure I follow you here but yes PageHuge under hugetlb_lock
> > > > > > > should be the reliable way to check for the race. I am not sure why we
> > > > > > > really need to care about mapping or other state.
> > > > > >
> > > > > > CPU0:                               CPU1:
> > > > > > free_huge_page(page)
> > > > > >   if (PageHuge(page))
> > > > > >                                     dissolve_free_huge_page(page)
> > > > > >                                       spin_lock(&hugetlb_lock)
> > > > > >                                       update_and_free_page(page)
> > > > > >                                       spin_unlock(&hugetlb_lock)
> > > > > >     llist_add(page->mapping)
> > > > > >     // the mapping is corrupted
> > > > > >
> > > > > > The PageHuge(page) and llist_add() should be protected by
> > > > > > hugetlb_lock. Right? If so, we cannot hold hugetlb_lock
> > > > > > in free_huge_page() path.
> > > > >
> > > > > OK, I see. I completely forgot about this snowflake. I thought that
> > > > > free_huge_page was a typo missing initial __. Anyway you are right that
> > > > > this path needs a check as well. But I don't see why we couldn't use the
> > > > > lock here. The lock can be held only inside the !in_task branch.
> > > >
> > > > Because we hold the hugetlb_lock without disable irq. So if an interrupt
> > > > occurs after we hold the lock. And we also free a HugeTLB page. Then
> > > > it leads to deadlock.
> > >
> > > There is nothing really to prevent making hugetlb_lock irq safe, isn't
> > > it?
> >
> > Yeah. We can make the hugetlb_lock irq safe. But why have we not
> > done this? Maybe the commit changelog can provide more information.
> >
> > See https://github.com/torvalds/linux/commit/c77c0a8ac4c522638a8242fcb9de9496e3cdbb2d
>
> Dang! Maybe it is time to finally stop stacking one workaround on top of
> the other and put this code into shape. The amount of hackery and subtle
> details has just grown beyond healthy!

From a readability point of view, maybe making the hugetlb_lock irq safe
is an improvement (in the future).

The details are always messy. :)

> --
> Michal Hocko
> SUSE Labs

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v2 6/6] mm: hugetlb: remove VM_BUG_ON_PAGE from page_huge_active
  2021-01-06  8:47 ` [PATCH v2 6/6] mm: hugetlb: remove VM_BUG_ON_PAGE from page_huge_active Muchun Song
  2021-01-06 17:16   ` Michal Hocko
@ 2021-01-08 22:24   ` Mike Kravetz
  2021-01-09  4:07       ` Muchun Song
  1 sibling, 1 reply; 75+ messages in thread
From: Mike Kravetz @ 2021-01-08 22:24 UTC (permalink / raw)
  To: Muchun Song, akpm; +Cc: n-horiguchi, ak, mhocko, linux-mm, linux-kernel

On 1/6/21 12:47 AM, Muchun Song wrote:
> The page_huge_active() can be called from scan_movable_pages(), which
> does not hold a reference count on the HugeTLB page. So when we call
> page_huge_active() from scan_movable_pages(), the HugeTLB page can
> be freed in parallel. Then we will trigger the BUG_ON in
> page_huge_active() when CONFIG_DEBUG_VM is enabled. Just remove the
> VM_BUG_ON_PAGE.
> 
> Fixes: 7e1f049efb86 ("mm: hugetlb: cleanup using paeg_huge_active()")
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
> ---
>  mm/hugetlb.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 67200dd25b1d..7a24ed28ec4f 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1372,7 +1372,6 @@ struct hstate *size_to_hstate(unsigned long size)
>   */
>  bool page_huge_active(struct page *page)
>  {
> -	VM_BUG_ON_PAGE(!PageHuge(page), page);
>  	return PageHead(page) && PagePrivate(&page[1]);
>  }

After more thought, should that return statement be changed to?
	return PageHeadHuge(page) && PagePrivate(&page[1]);

We only want to test that PagePrivate flag for hugetlb head pages.
Although, the possibility that the hugetlb page was freed and turned
into another compound page in this race window is REALLY small.
-- 
Mike Kravetz
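
The distinction Mike is drawing can be sketched with a toy model
(purely illustrative -- the Page class and the dtor strings are
made-up stand-ins for struct page and the compound destructor, not the
real kernel layout):

```python
# Toy model (not kernel code) of why PageHead() alone can misfire:
# after the hugetlb page is freed and reused as some other compound
# page, it is still a head page, and page[1]'s Private bit is stale.
HUGETLB_DTOR = "free_huge_page"   # stand-in for the hugetlb compound dtor

class Page:
    def __init__(self, head=False, private=False, dtor=None):
        self.head = head          # PG_head
        self.private = private    # PG_private
        self.dtor = dtor          # compound destructor, set on head pages

def page_head(p):
    return p.head

def page_head_huge(p):
    # PageHeadHuge(): a head page whose destructor is the hugetlb one.
    return p.head and p.dtor == HUGETLB_DTOR

def huge_active_old(pages):       # PageHead(page) && PagePrivate(&page[1])
    return page_head(pages[0]) and pages[1].private

def huge_active_new(pages):       # PageHeadHuge(page) && PagePrivate(&page[1])
    return page_head_huge(pages[0]) and pages[1].private

# A hugetlb page freed in the race window and reused as a different
# compound page (e.g. a THP) -- head bit set, tail Private bit stale:
reused = [Page(head=True, dtor="compound_page_dtor"), Page(private=True)]
```

The old check reports the reused page as an active hugetlb page (a
false positive); the PageHeadHuge()-based check rejects it because the
destructor no longer identifies the page as hugetlb.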

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [External] Re: [PATCH v2 6/6] mm: hugetlb: remove VM_BUG_ON_PAGE from page_huge_active
  2021-01-08 22:24   ` Mike Kravetz
@ 2021-01-09  4:07       ` Muchun Song
  0 siblings, 0 replies; 75+ messages in thread
From: Muchun Song @ 2021-01-09  4:07 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: Andrew Morton, Naoya Horiguchi, Andi Kleen, mhocko,
	Linux Memory Management List, LKML

On Sat, Jan 9, 2021 at 6:24 AM Mike Kravetz <mike.kravetz@oracle.com> wrote:
>
> On 1/6/21 12:47 AM, Muchun Song wrote:
> > The page_huge_active() can be called from scan_movable_pages(), which
> > does not hold a reference count on the HugeTLB page. So when we call
> > page_huge_active() from scan_movable_pages(), the HugeTLB page can
> > be freed in parallel. Then we trigger the BUG_ON in
> > page_huge_active() when CONFIG_DEBUG_VM is enabled. Just remove the
> > VM_BUG_ON_PAGE.
> >
> > Fixes: 7e1f049efb86 ("mm: hugetlb: cleanup using paeg_huge_active()")
> > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> > Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
> > ---
> >  mm/hugetlb.c | 1 -
> >  1 file changed, 1 deletion(-)
> >
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index 67200dd25b1d..7a24ed28ec4f 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -1372,7 +1372,6 @@ struct hstate *size_to_hstate(unsigned long size)
> >   */
> >  bool page_huge_active(struct page *page)
> >  {
> > -     VM_BUG_ON_PAGE(!PageHuge(page), page);
> >       return PageHead(page) && PagePrivate(&page[1]);
> >  }
>
> After more thought, should that return statement be changed to?
>         return PageHeadHuge(page) && PagePrivate(&page[1]);

Agree.

>
> We only want to test that PagePrivate flag for hugetlb head pages.
> Although, the possibility that the hugetlb page was freed and turned
> into another compound page in this race window is REALLY small.

Yeah. Thanks. I will update to PageHeadHuge().

> --
> Mike Kravetz

^ permalink raw reply	[flat|nested] 75+ messages in thread

end of thread, other threads:[~2021-01-09  4:09 UTC | newest]

Thread overview: 75+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-06  8:47 [PATCH v2 0/6] Fix some bugs about HugeTLB code Muchun Song
2021-01-06  8:47 ` [PATCH v2 1/6] mm: migrate: do not migrate HugeTLB page whose refcount is one Muchun Song
2021-01-06 16:13   ` Michal Hocko
2021-01-07  2:52     ` [External] " Muchun Song
2021-01-07  2:52       ` Muchun Song
2021-01-07 11:16       ` Michal Hocko
2021-01-07 11:24         ` Muchun Song
2021-01-07 11:24           ` Muchun Song
2021-01-07 11:48           ` Michal Hocko
2021-01-06  8:47 ` [PATCH v2 2/6] mm: hugetlbfs: fix cannot migrate the fallocated HugeTLB page Muchun Song
2021-01-06 16:35   ` Michal Hocko
2021-01-06 19:30     ` Mike Kravetz
2021-01-06 20:02       ` Michal Hocko
2021-01-06 21:07         ` Mike Kravetz
2021-01-07  8:36           ` Michal Hocko
2021-01-07  2:58     ` [External] " Muchun Song
2021-01-07  2:58       ` Muchun Song
2021-01-06  8:47 ` [PATCH v2 3/6] mm: hugetlb: fix a race between freeing and dissolving the page Muchun Song
2021-01-06 16:56   ` Michal Hocko
2021-01-06 20:58     ` Mike Kravetz
2021-01-07  3:08       ` [External] " Muchun Song
2021-01-07  3:08         ` Muchun Song
2021-01-07  8:40       ` Michal Hocko
2021-01-08  0:52         ` Mike Kravetz
2021-01-08  8:28           ` Michal Hocko
2021-01-07  5:39     ` [External] " Muchun Song
2021-01-07  5:39       ` Muchun Song
2021-01-07  8:41       ` Michal Hocko
2021-01-07  8:53         ` Muchun Song
2021-01-07  8:53           ` Muchun Song
2021-01-07 11:18           ` Michal Hocko
2021-01-07 11:38             ` Muchun Song
2021-01-07 11:38               ` Muchun Song
2021-01-07 12:38               ` Michal Hocko
2021-01-07 12:59                 ` Muchun Song
2021-01-07 12:59                   ` Muchun Song
2021-01-07 14:11                   ` Michal Hocko
2021-01-07 15:11                     ` Muchun Song
2021-01-07 15:11                       ` Muchun Song
2021-01-08  1:06                       ` Mike Kravetz
2021-01-08  2:38                         ` Muchun Song
2021-01-08  2:38                           ` Muchun Song
2021-01-08  8:43                       ` Michal Hocko
2021-01-08  9:01                         ` Muchun Song
2021-01-08  9:01                           ` Muchun Song
2021-01-08  9:31                           ` Michal Hocko
2021-01-08 10:08                             ` Muchun Song
2021-01-08 10:08                               ` Muchun Song
2021-01-08 11:44                               ` Michal Hocko
2021-01-08 11:52                                 ` Muchun Song
2021-01-08 11:52                                   ` Muchun Song
2021-01-08 12:04                                   ` Michal Hocko
2021-01-08 12:24                                     ` Muchun Song
2021-01-08 12:24                                       ` Muchun Song
2021-01-06  8:47 ` [PATCH v2 4/6] mm: hugetlb: add return -EAGAIN for dissolve_free_huge_page Muchun Song
2021-01-06 17:07   ` Michal Hocko
2021-01-07  3:11     ` [External] " Muchun Song
2021-01-07  3:11       ` Muchun Song
2021-01-07  8:39       ` Michal Hocko
2021-01-07  9:01         ` Muchun Song
2021-01-07  9:01           ` Muchun Song
2021-01-07 11:22           ` Michal Hocko
2021-01-06  8:47 ` [PATCH v2 5/6] mm: hugetlb: fix a race between isolating and freeing page Muchun Song
2021-01-06 17:10   ` Michal Hocko
2021-01-06  8:47 ` [PATCH v2 6/6] mm: hugetlb: remove VM_BUG_ON_PAGE from page_huge_active Muchun Song
2021-01-06 17:16   ` Michal Hocko
2021-01-08 22:24   ` Mike Kravetz
2021-01-09  4:07     ` [External] " Muchun Song
2021-01-09  4:07       ` Muchun Song
2021-01-07  9:30 ` [PATCH v2 0/6] Fix some bugs about HugeTLB code David Hildenbrand
2021-01-07  9:40   ` [External] " Muchun Song
2021-01-07  9:40     ` Muchun Song
2021-01-07 10:10     ` David Hildenbrand
2021-01-07 10:16       ` Muchun Song
2021-01-07 10:16         ` Muchun Song
