linux-kernel.vger.kernel.org archive mirror
* [PATCH v2 0/4] A few cleanup and fixup patches for migration
@ 2022-04-25 13:27 Miaohe Lin
  2022-04-25 13:27 ` [PATCH v2 1/4] mm/migration: reduce the rcu lock duration Miaohe Lin
                   ` (3 more replies)
  0 siblings, 4 replies; 34+ messages in thread
From: Miaohe Lin @ 2022-04-25 13:27 UTC (permalink / raw)
  To: akpm, mike.kravetz, naoya.horiguchi
  Cc: ying.huang, hch, dhowells, cl, david, linux-mm, linux-kernel, linmiaohe

Hi everyone,
This series contains a few patches to remove an unneeded lock page and
PageMovable check and to reduce the rcu lock duration. It also fixes a
potential pte_unmap on a pte that is not mapped. More details can be
found in the respective changelogs. Thanks!

---
v2:
  collect Reviewed-by tag
  make isolate_huge_page consistent with isolate_lru_page
  add hugetlbfs variant of hugetlb_migration_entry_wait
v1:
  rebase [1] on mainline.

[1] https://lore.kernel.org/lkml/20220304093409.25829-2-linmiaohe@huawei.com/T/
---
Miaohe Lin (4):
  mm/migration: reduce the rcu lock duration
  mm/migration: remove unneeded lock page and PageMovable check
  mm/migration: return errno when isolate_huge_page failed
  mm/migration: fix potential pte_unmap on an not mapped pte

 include/linux/hugetlb.h |  6 +++---
 include/linux/swapops.h | 12 ++++++++----
 mm/gup.c                |  2 +-
 mm/hugetlb.c            | 15 +++++++--------
 mm/memory-failure.c     |  2 +-
 mm/mempolicy.c          |  2 +-
 mm/migrate.c            | 39 +++++++++++++++++++++++++--------------
 7 files changed, 46 insertions(+), 32 deletions(-)

-- 
2.23.0


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH v2 1/4] mm/migration: reduce the rcu lock duration
  2022-04-25 13:27 [PATCH v2 0/4] A few cleanup and fixup patches for migration Miaohe Lin
@ 2022-04-25 13:27 ` Miaohe Lin
  2022-04-29  9:54   ` David Hildenbrand
  2022-05-06  3:23   ` ying.huang
  2022-04-25 13:27 ` [PATCH v2 2/4] mm/migration: remove unneeded lock page and PageMovable check Miaohe Lin
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 34+ messages in thread
From: Miaohe Lin @ 2022-04-25 13:27 UTC (permalink / raw)
  To: akpm, mike.kravetz, naoya.horiguchi
  Cc: ying.huang, hch, dhowells, cl, david, linux-mm, linux-kernel, linmiaohe

rcu_read_lock is required for grabbing the task refcount, but it's not
needed for ptrace_may_access. So we can release the rcu lock as soon as
the task refcount is successfully grabbed, to reduce the rcu holding time.

Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Christoph Lameter <cl@linux.com>
---
 mm/migrate.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index b2678279eb43..b779646665fe 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1902,17 +1902,16 @@ static struct mm_struct *find_mm_struct(pid_t pid, nodemask_t *mem_nodes)
 		return ERR_PTR(-ESRCH);
 	}
 	get_task_struct(task);
+	rcu_read_unlock();
 
 	/*
 	 * Check if this process has the right to modify the specified
 	 * process. Use the regular "ptrace_may_access()" checks.
 	 */
 	if (!ptrace_may_access(task, PTRACE_MODE_READ_REALCREDS)) {
-		rcu_read_unlock();
 		mm = ERR_PTR(-EPERM);
 		goto out;
 	}
-	rcu_read_unlock();
 
 	mm = ERR_PTR(security_task_movememory(task));
 	if (IS_ERR(mm))
-- 
2.23.0



* [PATCH v2 2/4] mm/migration: remove unneeded lock page and PageMovable check
  2022-04-25 13:27 [PATCH v2 0/4] A few cleanup and fixup patches for migration Miaohe Lin
  2022-04-25 13:27 ` [PATCH v2 1/4] mm/migration: reduce the rcu lock duration Miaohe Lin
@ 2022-04-25 13:27 ` Miaohe Lin
  2022-04-29 10:07   ` David Hildenbrand
  2022-04-25 13:27 ` [PATCH v2 3/4] mm/migration: return errno when isolate_huge_page failed Miaohe Lin
  2022-04-25 13:27 ` [PATCH v2 4/4] mm/migration: fix potential pte_unmap on an not mapped pte Miaohe Lin
  3 siblings, 1 reply; 34+ messages in thread
From: Miaohe Lin @ 2022-04-25 13:27 UTC (permalink / raw)
  To: akpm, mike.kravetz, naoya.horiguchi
  Cc: ying.huang, hch, dhowells, cl, david, linux-mm, linux-kernel, linmiaohe

When a non-lru movable page is freed from under us, __ClearPageMovable must
have been done. Even if it wasn't, ClearPageIsolated here won't hurt, as
the page will be freed anyway. So we can remove the unneeded lock page and
PageMovable check here.

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 mm/migrate.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index b779646665fe..0fc4651b3e39 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1093,12 +1093,8 @@ static int unmap_and_move(new_page_t get_new_page,
 		/* page was freed from under us. So we are done. */
 		ClearPageActive(page);
 		ClearPageUnevictable(page);
-		if (unlikely(__PageMovable(page))) {
-			lock_page(page);
-			if (!PageMovable(page))
-				ClearPageIsolated(page);
-			unlock_page(page);
-		}
+		if (unlikely(__PageMovable(page)))
+			ClearPageIsolated(page);
 		goto out;
 	}
 
-- 
2.23.0



* [PATCH v2 3/4] mm/migration: return errno when isolate_huge_page failed
  2022-04-25 13:27 [PATCH v2 0/4] A few cleanup and fixup patches for migration Miaohe Lin
  2022-04-25 13:27 ` [PATCH v2 1/4] mm/migration: reduce the rcu lock duration Miaohe Lin
  2022-04-25 13:27 ` [PATCH v2 2/4] mm/migration: remove unneeded lock page and PageMovable check Miaohe Lin
@ 2022-04-25 13:27 ` Miaohe Lin
  2022-04-29 10:08   ` David Hildenbrand
  2022-04-29 11:36   ` Muchun Song
  2022-04-25 13:27 ` [PATCH v2 4/4] mm/migration: fix potential pte_unmap on an not mapped pte Miaohe Lin
  3 siblings, 2 replies; 34+ messages in thread
From: Miaohe Lin @ 2022-04-25 13:27 UTC (permalink / raw)
  To: akpm, mike.kravetz, naoya.horiguchi
  Cc: ying.huang, hch, dhowells, cl, david, linux-mm, linux-kernel, linmiaohe

We might fail to isolate a huge page, e.g. because the page is under
migration, which clears HPageMigratable. We should return -EBUSY in this
case rather than always returning 1, which could confuse the user. Also,
make the prototype of isolate_huge_page consistent with isolate_lru_page
to improve readability.

Fixes: e8db67eb0ded ("mm: migrate: move_pages() supports thp migration")
Suggested-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
---
 include/linux/hugetlb.h |  6 +++---
 mm/gup.c                |  2 +-
 mm/hugetlb.c            | 11 +++++------
 mm/memory-failure.c     |  2 +-
 mm/mempolicy.c          |  2 +-
 mm/migrate.c            |  5 +++--
 6 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 04f0186b089b..306d6ef3fa22 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -170,7 +170,7 @@ bool hugetlb_reserve_pages(struct inode *inode, long from, long to,
 						vm_flags_t vm_flags);
 long hugetlb_unreserve_pages(struct inode *inode, long start, long end,
 						long freed);
-bool isolate_huge_page(struct page *page, struct list_head *list);
+int isolate_huge_page(struct page *page, struct list_head *list);
 int get_hwpoison_huge_page(struct page *page, bool *hugetlb);
 int get_huge_page_for_hwpoison(unsigned long pfn, int flags);
 void putback_active_hugepage(struct page *page);
@@ -376,9 +376,9 @@ static inline pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr,
 	return NULL;
 }
 
-static inline bool isolate_huge_page(struct page *page, struct list_head *list)
+static inline int isolate_huge_page(struct page *page, struct list_head *list)
 {
-	return false;
+	return -EBUSY;
 }
 
 static inline int get_hwpoison_huge_page(struct page *page, bool *hugetlb)
diff --git a/mm/gup.c b/mm/gup.c
index 5c17d4816441..c15d41636e8e 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1869,7 +1869,7 @@ static long check_and_migrate_movable_pages(unsigned long nr_pages,
 		 * Try to move out any movable page before pinning the range.
 		 */
 		if (folio_test_hugetlb(folio)) {
-			if (!isolate_huge_page(&folio->page,
+			if (isolate_huge_page(&folio->page,
 						&movable_page_list))
 				isolation_error_count++;
 			continue;
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 74c9964c1b11..098f81e8550d 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2766,8 +2766,7 @@ static int alloc_and_dissolve_huge_page(struct hstate *h, struct page *old_page,
 		 * Fail with -EBUSY if not possible.
 		 */
 		spin_unlock_irq(&hugetlb_lock);
-		if (!isolate_huge_page(old_page, list))
-			ret = -EBUSY;
+		ret = isolate_huge_page(old_page, list);
 		spin_lock_irq(&hugetlb_lock);
 		goto free_new;
 	} else if (!HPageFreed(old_page)) {
@@ -2843,7 +2842,7 @@ int isolate_or_dissolve_huge_page(struct page *page, struct list_head *list)
 	if (hstate_is_gigantic(h))
 		return -ENOMEM;
 
-	if (page_count(head) && isolate_huge_page(head, list))
+	if (page_count(head) && !isolate_huge_page(head, list))
 		ret = 0;
 	else if (!page_count(head))
 		ret = alloc_and_dissolve_huge_page(h, head, list);
@@ -6940,15 +6939,15 @@ follow_huge_pgd(struct mm_struct *mm, unsigned long address, pgd_t *pgd, int fla
 	return pte_page(*(pte_t *)pgd) + ((address & ~PGDIR_MASK) >> PAGE_SHIFT);
 }
 
-bool isolate_huge_page(struct page *page, struct list_head *list)
+int isolate_huge_page(struct page *page, struct list_head *list)
 {
-	bool ret = true;
+	int ret = 0;
 
 	spin_lock_irq(&hugetlb_lock);
 	if (!PageHeadHuge(page) ||
 	    !HPageMigratable(page) ||
 	    !get_page_unless_zero(page)) {
-		ret = false;
+		ret = -EBUSY;
 		goto unlock;
 	}
 	ClearHPageMigratable(page);
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 1d117190c350..a83d32bbc567 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -2203,7 +2203,7 @@ static bool isolate_page(struct page *page, struct list_head *pagelist)
 	bool lru = PageLRU(page);
 
 	if (PageHuge(page)) {
-		isolated = isolate_huge_page(page, pagelist);
+		isolated = !isolate_huge_page(page, pagelist);
 	} else {
 		if (lru)
 			isolated = !isolate_lru_page(page);
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index e4f125e48cc4..a4467c4e9f8d 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -602,7 +602,7 @@ static int queue_pages_hugetlb(pte_t *pte, unsigned long hmask,
 	/* With MPOL_MF_MOVE, we migrate only unshared hugepage. */
 	if (flags & (MPOL_MF_MOVE_ALL) ||
 	    (flags & MPOL_MF_MOVE && page_mapcount(page) == 1)) {
-		if (!isolate_huge_page(page, qp->pagelist) &&
+		if (isolate_huge_page(page, qp->pagelist) &&
 			(flags & MPOL_MF_STRICT))
 			/*
 			 * Failed to isolate page but allow migrating pages
diff --git a/mm/migrate.c b/mm/migrate.c
index 0fc4651b3e39..c937a496239b 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1628,8 +1628,9 @@ static int add_page_for_migration(struct mm_struct *mm, unsigned long addr,
 
 	if (PageHuge(page)) {
 		if (PageHead(page)) {
-			isolate_huge_page(page, pagelist);
-			err = 1;
+			err = isolate_huge_page(page, pagelist);
+			if (!err)
+				err = 1;
 		}
 	} else {
 		struct page *head;
-- 
2.23.0



* [PATCH v2 4/4] mm/migration: fix potential pte_unmap on an not mapped pte
  2022-04-25 13:27 [PATCH v2 0/4] A few cleanup and fixup patches for migration Miaohe Lin
                   ` (2 preceding siblings ...)
  2022-04-25 13:27 ` [PATCH v2 3/4] mm/migration: return errno when isolate_huge_page failed Miaohe Lin
@ 2022-04-25 13:27 ` Miaohe Lin
  2022-04-29  9:48   ` David Hildenbrand
  3 siblings, 1 reply; 34+ messages in thread
From: Miaohe Lin @ 2022-04-25 13:27 UTC (permalink / raw)
  To: akpm, mike.kravetz, naoya.horiguchi
  Cc: ying.huang, hch, dhowells, cl, david, linux-mm, linux-kernel, linmiaohe

__migration_entry_wait and migration_entry_wait_on_locked assume the pte is
always mapped by the caller. But this is not the case when they are called
from migration_entry_wait_huge and follow_huge_pmd. Add a hugetlbfs variant
that calls hugetlb_migration_entry_wait(ptep == NULL) to fix this issue.

Fixes: 30dad30922cc ("mm: migration: add migrate_entry_wait_huge()")
Suggested-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
---
 include/linux/swapops.h | 12 ++++++++----
 mm/hugetlb.c            |  4 ++--
 mm/migrate.c            | 23 +++++++++++++++++++----
 3 files changed, 29 insertions(+), 10 deletions(-)

diff --git a/include/linux/swapops.h b/include/linux/swapops.h
index 30cded849ee4..862e5a2053b1 100644
--- a/include/linux/swapops.h
+++ b/include/linux/swapops.h
@@ -244,8 +244,10 @@ extern void __migration_entry_wait(struct mm_struct *mm, pte_t *ptep,
 					spinlock_t *ptl);
 extern void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
 					unsigned long address);
-extern void migration_entry_wait_huge(struct vm_area_struct *vma,
-		struct mm_struct *mm, pte_t *pte);
+#ifdef CONFIG_HUGETLB_PAGE
+extern void __migration_entry_wait_huge(pte_t *ptep, spinlock_t *ptl);
+extern void migration_entry_wait_huge(struct vm_area_struct *vma, pte_t *pte);
+#endif
 #else
 static inline swp_entry_t make_readable_migration_entry(pgoff_t offset)
 {
@@ -271,8 +273,10 @@ static inline void __migration_entry_wait(struct mm_struct *mm, pte_t *ptep,
 					spinlock_t *ptl) { }
 static inline void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
 					 unsigned long address) { }
-static inline void migration_entry_wait_huge(struct vm_area_struct *vma,
-		struct mm_struct *mm, pte_t *pte) { }
+#ifdef CONFIG_HUGETLB_PAGE
+static inline void __migration_entry_wait_huge(pte_t *ptep, spinlock_t *ptl) { }
+static inline void migration_entry_wait_huge(struct vm_area_struct *vma, pte_t *pte) { }
+#endif
 static inline int is_writable_migration_entry(swp_entry_t entry)
 {
 	return 0;
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 098f81e8550d..994361ec75e0 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5689,7 +5689,7 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 		 */
 		entry = huge_ptep_get(ptep);
 		if (unlikely(is_hugetlb_entry_migration(entry))) {
-			migration_entry_wait_huge(vma, mm, ptep);
+			migration_entry_wait_huge(vma, ptep);
 			return 0;
 		} else if (unlikely(is_hugetlb_entry_hwpoisoned(entry)))
 			return VM_FAULT_HWPOISON_LARGE |
@@ -6907,7 +6907,7 @@ follow_huge_pmd(struct mm_struct *mm, unsigned long address,
 	} else {
 		if (is_hugetlb_entry_migration(pte)) {
 			spin_unlock(ptl);
-			__migration_entry_wait(mm, (pte_t *)pmd, ptl);
+			__migration_entry_wait_huge((pte_t *)pmd, ptl);
 			goto retry;
 		}
 		/*
diff --git a/mm/migrate.c b/mm/migrate.c
index c937a496239b..7b31d0b06977 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -315,13 +315,28 @@ void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
 	__migration_entry_wait(mm, ptep, ptl);
 }
 
-void migration_entry_wait_huge(struct vm_area_struct *vma,
-		struct mm_struct *mm, pte_t *pte)
+#ifdef CONFIG_HUGETLB_PAGE
+void __migration_entry_wait_huge(pte_t *ptep, spinlock_t *ptl)
 {
-	spinlock_t *ptl = huge_pte_lockptr(hstate_vma(vma), mm, pte);
-	__migration_entry_wait(mm, pte, ptl);
+	pte_t pte;
+
+	spin_lock(ptl);
+	pte = huge_ptep_get(ptep);
+
+	if (unlikely(!is_hugetlb_entry_migration(pte)))
+		spin_unlock(ptl);
+	else
+		migration_entry_wait_on_locked(pte_to_swp_entry(pte), NULL, ptl);
 }
 
+void migration_entry_wait_huge(struct vm_area_struct *vma, pte_t *pte)
+{
+	spinlock_t *ptl = huge_pte_lockptr(hstate_vma(vma), vma->vm_mm, pte);
+
+	__migration_entry_wait_huge(pte, ptl);
+}
+#endif
+
 #ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
 void pmd_migration_entry_wait(struct mm_struct *mm, pmd_t *pmd)
 {
-- 
2.23.0



* Re: [PATCH v2 4/4] mm/migration: fix potential pte_unmap on an not mapped pte
  2022-04-25 13:27 ` [PATCH v2 4/4] mm/migration: fix potential pte_unmap on an not mapped pte Miaohe Lin
@ 2022-04-29  9:48   ` David Hildenbrand
  0 siblings, 0 replies; 34+ messages in thread
From: David Hildenbrand @ 2022-04-29  9:48 UTC (permalink / raw)
  To: Miaohe Lin, akpm, mike.kravetz, naoya.horiguchi
  Cc: ying.huang, hch, dhowells, cl, linux-mm, linux-kernel

On 25.04.22 15:27, Miaohe Lin wrote:
> __migration_entry_wait and migration_entry_wait_on_locked assume pte is
> always mapped from caller. But this is not the case when it's called from
> migration_entry_wait_huge and follow_huge_pmd. Add a hugetlbfs variant that
> calls hugetlb_migration_entry_wait(ptep == NULL) to fix this issue.
> 
> Fixes: 30dad30922cc ("mm: migration: add migrate_entry_wait_huge()")
> Suggested-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> ---
>  include/linux/swapops.h | 12 ++++++++----
>  mm/hugetlb.c            |  4 ++--
>  mm/migrate.c            | 23 +++++++++++++++++++----
>  3 files changed, 29 insertions(+), 10 deletions(-)
> 
> diff --git a/include/linux/swapops.h b/include/linux/swapops.h
> index 30cded849ee4..862e5a2053b1 100644
> --- a/include/linux/swapops.h
> +++ b/include/linux/swapops.h
> @@ -244,8 +244,10 @@ extern void __migration_entry_wait(struct mm_struct *mm, pte_t *ptep,
>  					spinlock_t *ptl);
>  extern void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
>  					unsigned long address);
> -extern void migration_entry_wait_huge(struct vm_area_struct *vma,
> -		struct mm_struct *mm, pte_t *pte);
> +#ifdef CONFIG_HUGETLB_PAGE
> +extern void __migration_entry_wait_huge(pte_t *ptep, spinlock_t *ptl);
> +extern void migration_entry_wait_huge(struct vm_area_struct *vma, pte_t *pte);
> +#endif
>  #else
>  static inline swp_entry_t make_readable_migration_entry(pgoff_t offset)
>  {
> @@ -271,8 +273,10 @@ static inline void __migration_entry_wait(struct mm_struct *mm, pte_t *ptep,
>  					spinlock_t *ptl) { }
>  static inline void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
>  					 unsigned long address) { }
> -static inline void migration_entry_wait_huge(struct vm_area_struct *vma,
> -		struct mm_struct *mm, pte_t *pte) { }
> +#ifdef CONFIG_HUGETLB_PAGE
> +static inline void __migration_entry_wait_huge(pte_t *ptep, spinlock_t *ptl) { }
> +static inline void migration_entry_wait_huge(struct vm_area_struct *vma, pte_t *pte) { }
> +#endif
>  static inline int is_writable_migration_entry(swp_entry_t entry)
>  {
>  	return 0;
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 098f81e8550d..994361ec75e0 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -5689,7 +5689,7 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
>  		 */
>  		entry = huge_ptep_get(ptep);
>  		if (unlikely(is_hugetlb_entry_migration(entry))) {
> -			migration_entry_wait_huge(vma, mm, ptep);
> +			migration_entry_wait_huge(vma, ptep);
>  			return 0;
>  		} else if (unlikely(is_hugetlb_entry_hwpoisoned(entry)))
>  			return VM_FAULT_HWPOISON_LARGE |
> @@ -6907,7 +6907,7 @@ follow_huge_pmd(struct mm_struct *mm, unsigned long address,
>  	} else {
>  		if (is_hugetlb_entry_migration(pte)) {
>  			spin_unlock(ptl);
> -			__migration_entry_wait(mm, (pte_t *)pmd, ptl);
> +			__migration_entry_wait_huge((pte_t *)pmd, ptl);

The unlock+immediate relock looks a bit sub-optimal, but it was already
that way before your change.

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 
Thanks,

David / dhildenb



* Re: [PATCH v2 1/4] mm/migration: reduce the rcu lock duration
  2022-04-25 13:27 ` [PATCH v2 1/4] mm/migration: reduce the rcu lock duration Miaohe Lin
@ 2022-04-29  9:54   ` David Hildenbrand
  2022-05-09  3:14     ` Miaohe Lin
  2022-05-24 12:36     ` Miaohe Lin
  2022-05-06  3:23   ` ying.huang
  1 sibling, 2 replies; 34+ messages in thread
From: David Hildenbrand @ 2022-04-29  9:54 UTC (permalink / raw)
  To: Miaohe Lin, akpm, mike.kravetz, naoya.horiguchi
  Cc: ying.huang, hch, dhowells, cl, linux-mm, linux-kernel

On 25.04.22 15:27, Miaohe Lin wrote:
> rcu_read_lock is required by grabbing the task refcount but it's not
> needed for ptrace_may_access. So we could release the rcu lock after
> task refcount is successfully grabbed to reduce the rcu holding time.
> 
> Reviewed-by: Muchun Song <songmuchun@bytedance.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> Cc: Huang Ying <ying.huang@intel.com>
> Cc: David Howells <dhowells@redhat.com>
> Cc: Christoph Lameter <cl@linux.com>
> ---
>  mm/migrate.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/mm/migrate.c b/mm/migrate.c
> index b2678279eb43..b779646665fe 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1902,17 +1902,16 @@ static struct mm_struct *find_mm_struct(pid_t pid, nodemask_t *mem_nodes)
>  		return ERR_PTR(-ESRCH);
>  	}
>  	get_task_struct(task);
> +	rcu_read_unlock();
>  
>  	/*
>  	 * Check if this process has the right to modify the specified
>  	 * process. Use the regular "ptrace_may_access()" checks.
>  	 */
>  	if (!ptrace_may_access(task, PTRACE_MODE_READ_REALCREDS)) {
> -		rcu_read_unlock();
>  		mm = ERR_PTR(-EPERM);
>  		goto out;
>  	}
> -	rcu_read_unlock();
>  
>  	mm = ERR_PTR(security_task_movememory(task));
>  	if (IS_ERR(mm))

Similar pattern in:

mm/mempolicy.c:kernel_migrate_pages()
kernel/futex/syscalls.c:get_robust_list()
kernel/nsproxy.c:validate_nsset()

Exception:

sched/core_sched.c:sched_core_share_pid()


Should we unify -- i.e., adjust the remaining 3 as well?

-- 
Thanks,

David / dhildenb



* Re: [PATCH v2 2/4] mm/migration: remove unneeded lock page and PageMovable check
  2022-04-25 13:27 ` [PATCH v2 2/4] mm/migration: remove unneeded lock page and PageMovable check Miaohe Lin
@ 2022-04-29 10:07   ` David Hildenbrand
  2022-05-09  8:51     ` Miaohe Lin
  0 siblings, 1 reply; 34+ messages in thread
From: David Hildenbrand @ 2022-04-29 10:07 UTC (permalink / raw)
  To: Miaohe Lin, akpm, mike.kravetz, naoya.horiguchi
  Cc: ying.huang, hch, dhowells, cl, linux-mm, linux-kernel

On 25.04.22 15:27, Miaohe Lin wrote:
> When non-lru movable page was freed from under us, __ClearPageMovable must
> have been done. Even if it's not done, ClearPageIsolated here won't hurt
> as page will be freed anyway. So we can thus remove unneeded lock page and
> PageMovable check here.
> 
> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> ---
>  mm/migrate.c | 8 ++------
>  1 file changed, 2 insertions(+), 6 deletions(-)
> 
> diff --git a/mm/migrate.c b/mm/migrate.c
> index b779646665fe..0fc4651b3e39 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1093,12 +1093,8 @@ static int unmap_and_move(new_page_t get_new_page,
>  		/* page was freed from under us. So we are done. */
>  		ClearPageActive(page);
>  		ClearPageUnevictable(page);
> -		if (unlikely(__PageMovable(page))) {
> -			lock_page(page);
> -			if (!PageMovable(page))
> -				ClearPageIsolated(page);
> -			unlock_page(page);
> -		}
> +		if (unlikely(__PageMovable(page)))
> +			ClearPageIsolated(page);
>  		goto out;
>  	}

Hm, that code+change raises a couple of questions.

We're doing here the same as in putback_movable_pages(). So I guess the
difference here is that the caller did release the reference while the
page was isolated, while we don't assume the same in
putback_movable_pages().


Shouldn't whoever owned the page have cleared that? IOW, is it even
valid that we see a movable or isolated page here (WARN/BUG?)?

At least for balloon compaction, I remember that __PageMovable() is
properly cleared before freeing it via balloon_page_delete().


Also, I am not sure how reliable that page count check is here: if there
were another speculative reference to the page, we might see
"page_count(page) > 1" and not take that path, even though the previous
owner released the last reference.


-- 
Thanks,

David / dhildenb



* Re: [PATCH v2 3/4] mm/migration: return errno when isolate_huge_page failed
  2022-04-25 13:27 ` [PATCH v2 3/4] mm/migration: return errno when isolate_huge_page failed Miaohe Lin
@ 2022-04-29 10:08   ` David Hildenbrand
  2022-05-09  8:03     ` Miaohe Lin
  2022-04-29 11:36   ` Muchun Song
  1 sibling, 1 reply; 34+ messages in thread
From: David Hildenbrand @ 2022-04-29 10:08 UTC (permalink / raw)
  To: Miaohe Lin, akpm, mike.kravetz, naoya.horiguchi
  Cc: ying.huang, hch, dhowells, cl, linux-mm, linux-kernel

On 25.04.22 15:27, Miaohe Lin wrote:
> We might fail to isolate huge page due to e.g. the page is under migration
> which cleared HPageMigratable. So we should return -EBUSY in this case
> rather than always return 1 which could confuse the user. Also we make
> the prototype of isolate_huge_page consistent with isolate_lru_page to
> improve the readability.
> 
> Fixes: e8db67eb0ded ("mm: migrate: move_pages() supports thp migration")

If this is a fix, what's the runtime effect of it?

You state "could confuse", which doesn't indicate an actual BUG to me.


-- 
Thanks,

David / dhildenb



* Re: [PATCH v2 3/4] mm/migration: return errno when isolate_huge_page failed
  2022-04-25 13:27 ` [PATCH v2 3/4] mm/migration: return errno when isolate_huge_page failed Miaohe Lin
  2022-04-29 10:08   ` David Hildenbrand
@ 2022-04-29 11:36   ` Muchun Song
  2022-05-09  3:23     ` Miaohe Lin
  1 sibling, 1 reply; 34+ messages in thread
From: Muchun Song @ 2022-04-29 11:36 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: akpm, mike.kravetz, naoya.horiguchi, ying.huang, hch, dhowells,
	cl, david, linux-mm, linux-kernel

On Mon, Apr 25, 2022 at 09:27:22PM +0800, Miaohe Lin wrote:
> We might fail to isolate huge page due to e.g. the page is under migration
> which cleared HPageMigratable. So we should return -EBUSY in this case
> rather than always return 1 which could confuse the user. Also we make
> the prototype of isolate_huge_page consistent with isolate_lru_page to
> improve the readability.
> 
> Fixes: e8db67eb0ded ("mm: migrate: move_pages() supports thp migration")
> Suggested-by: Huang Ying <ying.huang@intel.com>
> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> ---
>  include/linux/hugetlb.h |  6 +++---
>  mm/gup.c                |  2 +-
>  mm/hugetlb.c            | 11 +++++------
>  mm/memory-failure.c     |  2 +-
>  mm/mempolicy.c          |  2 +-
>  mm/migrate.c            |  5 +++--
>  6 files changed, 14 insertions(+), 14 deletions(-)
> 
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index 04f0186b089b..306d6ef3fa22 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -170,7 +170,7 @@ bool hugetlb_reserve_pages(struct inode *inode, long from, long to,
>  						vm_flags_t vm_flags);
>  long hugetlb_unreserve_pages(struct inode *inode, long start, long end,
>  						long freed);
> -bool isolate_huge_page(struct page *page, struct list_head *list);
> +int isolate_huge_page(struct page *page, struct list_head *list);
>  int get_hwpoison_huge_page(struct page *page, bool *hugetlb);
>  int get_huge_page_for_hwpoison(unsigned long pfn, int flags);
>  void putback_active_hugepage(struct page *page);
> @@ -376,9 +376,9 @@ static inline pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr,
>  	return NULL;
>  }
>  
> -static inline bool isolate_huge_page(struct page *page, struct list_head *list)
> +static inline int isolate_huge_page(struct page *page, struct list_head *list)

Since you already touched all the call sites, how about renaming this
to hugetlb_isolate()? I've always felt that huge_page is not a
straightforward and clear name since we also have another type of
huge page (THP).  I think hugetlb is more specific.

Thanks.
 


* Re: [PATCH v2 1/4] mm/migration: reduce the rcu lock duration
  2022-04-25 13:27 ` [PATCH v2 1/4] mm/migration: reduce the rcu lock duration Miaohe Lin
  2022-04-29  9:54   ` David Hildenbrand
@ 2022-05-06  3:23   ` ying.huang
  2022-05-09  3:20     ` Miaohe Lin
  1 sibling, 1 reply; 34+ messages in thread
From: ying.huang @ 2022-05-06  3:23 UTC (permalink / raw)
  To: Miaohe Lin, akpm, mike.kravetz, naoya.horiguchi
  Cc: hch, dhowells, cl, david, linux-mm, linux-kernel

On Mon, 2022-04-25 at 21:27 +0800, Miaohe Lin wrote:
> rcu_read_lock is required by grabbing the task refcount but it's not
> needed for ptrace_may_access. So we could release the rcu lock after
> task refcount is successfully grabbed to reduce the rcu holding time.
> 
> Reviewed-by: Muchun Song <songmuchun@bytedance.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> Cc: Huang Ying <ying.huang@intel.com>
> Cc: David Howells <dhowells@redhat.com>
> Cc: Christoph Lameter <cl@linux.com>
> ---
>  mm/migrate.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/mm/migrate.c b/mm/migrate.c
> index b2678279eb43..b779646665fe 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1902,17 +1902,16 @@ static struct mm_struct *find_mm_struct(pid_t pid, nodemask_t *mem_nodes)
>  		return ERR_PTR(-ESRCH);
>  	}
>  	get_task_struct(task);
> +	rcu_read_unlock();
>  
> 
>  	/*
>  	 * Check if this process has the right to modify the specified
>  	 * process. Use the regular "ptrace_may_access()" checks.
>  	 */
>  	if (!ptrace_may_access(task, PTRACE_MODE_READ_REALCREDS)) {
> -		rcu_read_unlock();
>  		mm = ERR_PTR(-EPERM);
>  		goto out;
>  	}
> -	rcu_read_unlock();
>  
> 
>  	mm = ERR_PTR(security_task_movememory(task));
>  	if (IS_ERR(mm))

Hi, Miaohe,

Please check the previous discussion and verify for yourself whether the
originally reported race condition is still valid before resending this
patch. If you find that the original race condition isn't possible now,
please add the analysis to your changelog.

Best Regards,
Huang, Ying



* Re: [PATCH v2 1/4] mm/migration: reduce the rcu lock duration
  2022-04-29  9:54   ` David Hildenbrand
@ 2022-05-09  3:14     ` Miaohe Lin
  2022-05-24 12:36     ` Miaohe Lin
  1 sibling, 0 replies; 34+ messages in thread
From: Miaohe Lin @ 2022-05-09  3:14 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: ying.huang, hch, dhowells, cl, linux-mm, linux-kernel, akpm,
	mike.kravetz, naoya.horiguchi

On 2022/4/29 17:54, David Hildenbrand wrote:
> On 25.04.22 15:27, Miaohe Lin wrote:
>> rcu_read_lock is required by grabbing the task refcount but it's not
>> needed for ptrace_may_access. So we could release the rcu lock after
>> task refcount is successfully grabbed to reduce the rcu holding time.
>>
>> Reviewed-by: Muchun Song <songmuchun@bytedance.com>
>> Reviewed-by: Christoph Hellwig <hch@lst.de>
>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>> Cc: Huang Ying <ying.huang@intel.com>
>> Cc: David Howells <dhowells@redhat.com>
>> Cc: Christoph Lameter <cl@linux.com>
>> ---
>>  mm/migrate.c | 3 +--
>>  1 file changed, 1 insertion(+), 2 deletions(-)
>>
>> diff --git a/mm/migrate.c b/mm/migrate.c
>> index b2678279eb43..b779646665fe 100644
>> --- a/mm/migrate.c
>> +++ b/mm/migrate.c
>> @@ -1902,17 +1902,16 @@ static struct mm_struct *find_mm_struct(pid_t pid, nodemask_t *mem_nodes)
>>  		return ERR_PTR(-ESRCH);
>>  	}
>>  	get_task_struct(task);
>> +	rcu_read_unlock();
>>  
>>  	/*
>>  	 * Check if this process has the right to modify the specified
>>  	 * process. Use the regular "ptrace_may_access()" checks.
>>  	 */
>>  	if (!ptrace_may_access(task, PTRACE_MODE_READ_REALCREDS)) {
>> -		rcu_read_unlock();
>>  		mm = ERR_PTR(-EPERM);
>>  		goto out;
>>  	}
>> -	rcu_read_unlock();
>>  
>>  	mm = ERR_PTR(security_task_movememory(task));
>>  	if (IS_ERR(mm))
> 
> Similar pattern in:
> 
> mm/mempolicy.c:kernel_migrate_pages()
> kernel/futex/syscalls.c:get_robust_list()
> kernel/nsproxy.c:validate_nsset()
> 
> Exception:
> 
> sched/core_sched.c:sched_core_share_pid()
> 
> 
> Should we unify -- i.e., adjust the remaining 3 as well?
> 

Sorry for the late response. I think it's fine to do all of these together. But this patch is indeed
still being verified. I will try to do that once it has been verified. Thanks!

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 1/4] mm/migration: reduce the rcu lock duration
  2022-05-06  3:23   ` ying.huang
@ 2022-05-09  3:20     ` Miaohe Lin
  0 siblings, 0 replies; 34+ messages in thread
From: Miaohe Lin @ 2022-05-09  3:20 UTC (permalink / raw)
  To: ying.huang, dhowells, cl
  Cc: hch, david, linux-mm, linux-kernel, akpm, mike.kravetz, naoya.horiguchi

On 2022/5/6 11:23, ying.huang@intel.com wrote:
> On Mon, 2022-04-25 at 21:27 +0800, Miaohe Lin wrote:
>> rcu_read_lock is required by grabbing the task refcount but it's not
>> needed for ptrace_may_access. So we could release the rcu lock after
>> task refcount is successfully grabbed to reduce the rcu holding time.
>>
>> Reviewed-by: Muchun Song <songmuchun@bytedance.com>
>> Reviewed-by: Christoph Hellwig <hch@lst.de>
>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>> Cc: Huang Ying <ying.huang@intel.com>
>> Cc: David Howells <dhowells@redhat.com>
>> Cc: Christoph Lameter <cl@linux.com>
>> ---
>>  mm/migrate.c | 3 +--
>>  1 file changed, 1 insertion(+), 2 deletions(-)
>>
>> diff --git a/mm/migrate.c b/mm/migrate.c
>> index b2678279eb43..b779646665fe 100644
>> --- a/mm/migrate.c
>> +++ b/mm/migrate.c
>> @@ -1902,17 +1902,16 @@ static struct mm_struct *find_mm_struct(pid_t pid, nodemask_t *mem_nodes)
>>  		return ERR_PTR(-ESRCH);
>>  	}
>>  	get_task_struct(task);
>> +	rcu_read_unlock();
>>  
>>
>>  	/*
>>  	 * Check if this process has the right to modify the specified
>>  	 * process. Use the regular "ptrace_may_access()" checks.
>>  	 */
>>  	if (!ptrace_may_access(task, PTRACE_MODE_READ_REALCREDS)) {
>> -		rcu_read_unlock();
>>  		mm = ERR_PTR(-EPERM);
>>  		goto out;
>>  	}
>> -	rcu_read_unlock();
>>  
>>
>>  	mm = ERR_PTR(security_task_movememory(task));
>>  	if (IS_ERR(mm))
> 
> Hi, Miaohe,
> 
> Please check the previous discussion and verify whether the original
> reported race condition is still valid by yourself before resending this
> patch again.  If you find that the original race condition isn't
> possible now, please add the analysis in your change log.
> 

Sorry for the late response. It's a pity that this change has still not been verified by the relevant
experts. I will try to give my own analysis in the change log instead if the original race condition
turns out to be invalid now.

Many thanks!

> Best Regards,
> Huang, Ying
> 
> .
> 


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 3/4] mm/migration: return errno when isolate_huge_page failed
  2022-04-29 11:36   ` Muchun Song
@ 2022-05-09  3:23     ` Miaohe Lin
  2022-05-09  4:21       ` Muchun Song
  0 siblings, 1 reply; 34+ messages in thread
From: Miaohe Lin @ 2022-05-09  3:23 UTC (permalink / raw)
  To: Muchun Song
  Cc: akpm, mike.kravetz, naoya.horiguchi, ying.huang, hch, dhowells,
	cl, david, linux-mm, linux-kernel

On 2022/4/29 19:36, Muchun Song wrote:
> On Mon, Apr 25, 2022 at 09:27:22PM +0800, Miaohe Lin wrote:
>> We might fail to isolate huge page due to e.g. the page is under migration
>> which cleared HPageMigratable. So we should return -EBUSY in this case
>> rather than always return 1 which could confuse the user. Also we make
>> the prototype of isolate_huge_page consistent with isolate_lru_page to
>> improve the readability.
>>
>> Fixes: e8db67eb0ded ("mm: migrate: move_pages() supports thp migration")
>> Suggested-by: Huang Ying <ying.huang@intel.com>
>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>> ---
>>  include/linux/hugetlb.h |  6 +++---
>>  mm/gup.c                |  2 +-
>>  mm/hugetlb.c            | 11 +++++------
>>  mm/memory-failure.c     |  2 +-
>>  mm/mempolicy.c          |  2 +-
>>  mm/migrate.c            |  5 +++--
>>  6 files changed, 14 insertions(+), 14 deletions(-)
>>
>> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
>> index 04f0186b089b..306d6ef3fa22 100644
>> --- a/include/linux/hugetlb.h
>> +++ b/include/linux/hugetlb.h
>> @@ -170,7 +170,7 @@ bool hugetlb_reserve_pages(struct inode *inode, long from, long to,
>>  						vm_flags_t vm_flags);
>>  long hugetlb_unreserve_pages(struct inode *inode, long start, long end,
>>  						long freed);
>> -bool isolate_huge_page(struct page *page, struct list_head *list);
>> +int isolate_huge_page(struct page *page, struct list_head *list);
>>  int get_hwpoison_huge_page(struct page *page, bool *hugetlb);
>>  int get_huge_page_for_hwpoison(unsigned long pfn, int flags);
>>  void putback_active_hugepage(struct page *page);
>> @@ -376,9 +376,9 @@ static inline pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr,
>>  	return NULL;
>>  }
>>  
>> -static inline bool isolate_huge_page(struct page *page, struct list_head *list)
>> +static inline int isolate_huge_page(struct page *page, struct list_head *list)
> 
> Since you already touched all the call sites, how about renaming this
> to hugetlb_isolate()? I've always felt that huge_page is not a
> straightforward and clear name since we also have another type of
> huge page (THP).  I think hugetlb is more specific.
> 

Sorry for the late response. This suggestion looks good to me. But would isolate_hugetlb be more suitable?
That would make it more consistent with isolate_lru_page. What do you think?

Thanks!
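The calling-convention change this patch makes (a true/false success flag becomes a 0/-errno return, matching isolate_lru_page()) can be sketched as follows. The function names and the `migratable` flag are illustrative stand-ins, not the kernel API.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Before the patch the isolate helper returned bool, so a failure was
 * indistinguishable from "nothing to report" once collapsed to 1 by the
 * caller.  Returning 0 / -EBUSY lets the error propagate to the status
 * array that move_pages() fills in for userspace. */

static int isolate_page_sketch(bool migratable)
{
    if (!migratable)
        return -EBUSY;   /* e.g. HPageMigratable already cleared */
    /* ... move the page onto the caller's isolation list ... */
    return 0;
}

/* Caller pattern: propagate the errno instead of collapsing it. */
static int add_page_for_migration_sketch(bool migratable)
{
    return isolate_page_sketch(migratable);
}
```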

> Thanks.
>  
> .
> 


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 3/4] mm/migration: return errno when isolate_huge_page failed
  2022-05-09  3:23     ` Miaohe Lin
@ 2022-05-09  4:21       ` Muchun Song
  2022-05-09  7:51         ` Miaohe Lin
  0 siblings, 1 reply; 34+ messages in thread
From: Muchun Song @ 2022-05-09  4:21 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: Andrew Morton, Mike Kravetz,
	HORIGUCHI NAOYA(堀口 直也),
	Huang Ying, Christoph Hellwig, dhowells, Christoph Lameter,
	David Hildenbrand, Linux Memory Management List, LKML

On Mon, May 9, 2022 at 11:24 AM Miaohe Lin <linmiaohe@huawei.com> wrote:
>
> On 2022/4/29 19:36, Muchun Song wrote:
> > On Mon, Apr 25, 2022 at 09:27:22PM +0800, Miaohe Lin wrote:
> >> We might fail to isolate huge page due to e.g. the page is under migration
> >> which cleared HPageMigratable. So we should return -EBUSY in this case
> >> rather than always return 1 which could confuse the user. Also we make
> >> the prototype of isolate_huge_page consistent with isolate_lru_page to
> >> improve the readability.
> >>
> >> Fixes: e8db67eb0ded ("mm: migrate: move_pages() supports thp migration")
> >> Suggested-by: Huang Ying <ying.huang@intel.com>
> >> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> >> ---
> >>  include/linux/hugetlb.h |  6 +++---
> >>  mm/gup.c                |  2 +-
> >>  mm/hugetlb.c            | 11 +++++------
> >>  mm/memory-failure.c     |  2 +-
> >>  mm/mempolicy.c          |  2 +-
> >>  mm/migrate.c            |  5 +++--
> >>  6 files changed, 14 insertions(+), 14 deletions(-)
> >>
> >> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> >> index 04f0186b089b..306d6ef3fa22 100644
> >> --- a/include/linux/hugetlb.h
> >> +++ b/include/linux/hugetlb.h
> >> @@ -170,7 +170,7 @@ bool hugetlb_reserve_pages(struct inode *inode, long from, long to,
> >>                                              vm_flags_t vm_flags);
> >>  long hugetlb_unreserve_pages(struct inode *inode, long start, long end,
> >>                                              long freed);
> >> -bool isolate_huge_page(struct page *page, struct list_head *list);
> >> +int isolate_huge_page(struct page *page, struct list_head *list);
> >>  int get_hwpoison_huge_page(struct page *page, bool *hugetlb);
> >>  int get_huge_page_for_hwpoison(unsigned long pfn, int flags);
> >>  void putback_active_hugepage(struct page *page);
> >> @@ -376,9 +376,9 @@ static inline pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr,
> >>      return NULL;
> >>  }
> >>
> >> -static inline bool isolate_huge_page(struct page *page, struct list_head *list)
> >> +static inline int isolate_huge_page(struct page *page, struct list_head *list)
> >
> > Since you already touched all the call sites, how about renaming this
> > to hugetlb_isolate()? I've always felt that huge_page is not a
> > straightforward and clear name since we also have another type of
> > huge page (THP).  I think hugetlb is more specific.
> >
>
> Sorry for late respond. This suggestion looks good to me. But is isolate_hugetlb more suitable?
> This could make it more consistent with isolate_lru_page? What do you think?
>

There is also a function named folio_isolate_lru(). My initial consideration was
making it consistent with folio_isolate_lru(). isolate_hugetlb looks good to me
as well.

Thanks.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 3/4] mm/migration: return errno when isolate_huge_page failed
  2022-05-09  4:21       ` Muchun Song
@ 2022-05-09  7:51         ` Miaohe Lin
  0 siblings, 0 replies; 34+ messages in thread
From: Miaohe Lin @ 2022-05-09  7:51 UTC (permalink / raw)
  To: Muchun Song
  Cc: Andrew Morton, Mike Kravetz,
	HORIGUCHI NAOYA(堀口 直也),
	Huang Ying, Christoph Hellwig, dhowells, Christoph Lameter,
	David Hildenbrand, Linux Memory Management List, LKML

On 2022/5/9 12:21, Muchun Song wrote:
> On Mon, May 9, 2022 at 11:24 AM Miaohe Lin <linmiaohe@huawei.com> wrote:
>>
>> On 2022/4/29 19:36, Muchun Song wrote:
>>> On Mon, Apr 25, 2022 at 09:27:22PM +0800, Miaohe Lin wrote:
>>>> We might fail to isolate huge page due to e.g. the page is under migration
>>>> which cleared HPageMigratable. So we should return -EBUSY in this case
>>>> rather than always return 1 which could confuse the user. Also we make
>>>> the prototype of isolate_huge_page consistent with isolate_lru_page to
>>>> improve the readability.
>>>>
>>>> Fixes: e8db67eb0ded ("mm: migrate: move_pages() supports thp migration")
>>>> Suggested-by: Huang Ying <ying.huang@intel.com>
>>>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>>>> ---
>>>>  include/linux/hugetlb.h |  6 +++---
>>>>  mm/gup.c                |  2 +-
>>>>  mm/hugetlb.c            | 11 +++++------
>>>>  mm/memory-failure.c     |  2 +-
>>>>  mm/mempolicy.c          |  2 +-
>>>>  mm/migrate.c            |  5 +++--
>>>>  6 files changed, 14 insertions(+), 14 deletions(-)
>>>>
>>>> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
>>>> index 04f0186b089b..306d6ef3fa22 100644
>>>> --- a/include/linux/hugetlb.h
>>>> +++ b/include/linux/hugetlb.h
>>>> @@ -170,7 +170,7 @@ bool hugetlb_reserve_pages(struct inode *inode, long from, long to,
>>>>                                              vm_flags_t vm_flags);
>>>>  long hugetlb_unreserve_pages(struct inode *inode, long start, long end,
>>>>                                              long freed);
>>>> -bool isolate_huge_page(struct page *page, struct list_head *list);
>>>> +int isolate_huge_page(struct page *page, struct list_head *list);
>>>>  int get_hwpoison_huge_page(struct page *page, bool *hugetlb);
>>>>  int get_huge_page_for_hwpoison(unsigned long pfn, int flags);
>>>>  void putback_active_hugepage(struct page *page);
>>>> @@ -376,9 +376,9 @@ static inline pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr,
>>>>      return NULL;
>>>>  }
>>>>
>>>> -static inline bool isolate_huge_page(struct page *page, struct list_head *list)
>>>> +static inline int isolate_huge_page(struct page *page, struct list_head *list)
>>>
>>> Since you already touched all the call sites, how about renaming this
>>> to hugetlb_isolate()? I've always felt that huge_page is not a
>>> straightforward and clear name since we also have another type of
>>> huge page (THP).  I think hugetlb is more specific.
>>>
>>
>> Sorry for late respond. This suggestion looks good to me. But is isolate_hugetlb more suitable?
>> This could make it more consistent with isolate_lru_page? What do you think?
>>
> 
> There is also a function named folio_isolate_lru(). My initial consideration was
> making it consistent with folio_isolate_lru(). isolate_hugetlb looks good to me
> as well.

I see. Many thanks for your explanation. :)

> 
> Thanks.
> .
> 


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 3/4] mm/migration: return errno when isolate_huge_page failed
  2022-04-29 10:08   ` David Hildenbrand
@ 2022-05-09  8:03     ` Miaohe Lin
  0 siblings, 0 replies; 34+ messages in thread
From: Miaohe Lin @ 2022-05-09  8:03 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: ying.huang, hch, dhowells, cl, linux-mm, linux-kernel, akpm,
	mike.kravetz, naoya.horiguchi

On 2022/4/29 18:08, David Hildenbrand wrote:
> On 25.04.22 15:27, Miaohe Lin wrote:
>> We might fail to isolate huge page due to e.g. the page is under migration
>> which cleared HPageMigratable. So we should return -EBUSY in this case
>> rather than always return 1 which could confuse the user. Also we make
>> the prototype of isolate_huge_page consistent with isolate_lru_page to
>> improve the readability.
>>
>> Fixes: e8db67eb0ded ("mm: migrate: move_pages() supports thp migration")
> 
> If this is a fix, what's the runtime effect of it?
> 
> You state "could confuse", which doesn't indicate an actual BUG to me.

The hugetlb page might fail to be migrated due to an error while the failure is not reported in the
__user *status array. So the caller might think all of the memory was migrated and thus not retry
migrating the hugetlb page in the next round. Is this too trivial to bother adding a Fixes tag for?

Thanks!

> 
> 


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 2/4] mm/migration: remove unneeded lock page and PageMovable check
  2022-04-29 10:07   ` David Hildenbrand
@ 2022-05-09  8:51     ` Miaohe Lin
  2022-05-11 15:23       ` David Hildenbrand
  0 siblings, 1 reply; 34+ messages in thread
From: Miaohe Lin @ 2022-05-09  8:51 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: ying.huang, hch, dhowells, cl, linux-mm, linux-kernel, akpm,
	mike.kravetz, naoya.horiguchi

On 2022/4/29 18:07, David Hildenbrand wrote:
> On 25.04.22 15:27, Miaohe Lin wrote:
>> When non-lru movable page was freed from under us, __ClearPageMovable must
>> have been done. Even if it's not done, ClearPageIsolated here won't hurt
>> as page will be freed anyway. So we can thus remove unneeded lock page and
>> PageMovable check here.
>>
>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>> Reviewed-by: Christoph Hellwig <hch@lst.de>
>> ---
>>  mm/migrate.c | 8 ++------
>>  1 file changed, 2 insertions(+), 6 deletions(-)
>>
>> diff --git a/mm/migrate.c b/mm/migrate.c
>> index b779646665fe..0fc4651b3e39 100644
>> --- a/mm/migrate.c
>> +++ b/mm/migrate.c
>> @@ -1093,12 +1093,8 @@ static int unmap_and_move(new_page_t get_new_page,
>>  		/* page was freed from under us. So we are done. */
>>  		ClearPageActive(page);
>>  		ClearPageUnevictable(page);
>> -		if (unlikely(__PageMovable(page))) {
>> -			lock_page(page);
>> -			if (!PageMovable(page))
>> -				ClearPageIsolated(page);
>> -			unlock_page(page);
>> -		}
>> +		if (unlikely(__PageMovable(page)))
>> +			ClearPageIsolated(page);
>>  		goto out;
>>  	}
> 
> Hm, that code+change raises a couple of questions.
> 
> We're doing here the same as in putback_movable_pages(). So I guess the
> difference here is that the caller did release the reference while the
> page was isolated, while we don't assume the same in
> putback_movable_pages().

Agree.

> 
> 
> Shouldn't whoever owned the page have cleared that? IOW, is it even
> valid that we see a movable or isolated page here (WARN/BUG?)?
> 
> At least for balloon compaction, I remember that __PageMovable() is
> properly cleared before freeing it via balloon_page_delete().

z3fold, zsmalloc will do __ClearPageMovable when the page is going to be released.
So I think we shouldn't see a movable page here:

void __ClearPageMovable(struct page *page)
{
	VM_BUG_ON_PAGE(!PageMovable(page), page);
	/*
	 * Clear registered address_space val with keeping PAGE_MAPPING_MOVABLE
	 * flag so that VM can catch up released page by driver after isolation.
	 * With it, VM migration doesn't try to put it back.
	 */
	page->mapping = (void *)((unsigned long)page->mapping &
				PAGE_MAPPING_MOVABLE);
}

But it seems there is no such guarantee for the PageIsolated flag. Or am I missing something?

> 
> 
> Also, I am not sure how reliable that page count check is here: if we'd
> have another speculative reference to the page, we might see
> "page_count(page) > 1" and not take that path, although the previous
> owner released the last reference.

IIUC, there should not be such a speculative reference. The driver should have taken care
of it.

Thanks!
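The __ClearPageMovable() snippet quoted above relies on page->mapping doubling as a flags word: because the pointed-to structure is word-aligned, the low bits are free for PAGE_MAPPING_* flags. The sketch below is a simplified userspace model (it ignores the anon bit, and the constant merely mirrors PAGE_MAPPING_MOVABLE from page-flags.h); it shows how clearing the movable state keeps the flag bit but drops the address_space pointer, so migration can tell "released by the driver" apart from "never movable".

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_MAPPING_MOVABLE 0x2UL   /* mirrors include/linux/page-flags.h */

struct fake_page { uintptr_t mapping; };

/* Rough __PageMovable(): the flag bit is set, whether or not the
 * address_space pointer is still there.  (The real check also masks
 * out the anon bit; omitted here for brevity.) */
static int page_movable(const struct fake_page *p)
{
    return (p->mapping & PAGE_MAPPING_MOVABLE) != 0;
}

/* Rough PageMovable(): a registered address_space is still present. */
static int page_has_ops(const struct fake_page *p)
{
    return (p->mapping & ~PAGE_MAPPING_MOVABLE) != 0;
}

/* __ClearPageMovable(): keep the flag bit, drop the pointer. */
static void clear_movable(struct fake_page *p)
{
    p->mapping &= PAGE_MAPPING_MOVABLE;
}
```

After `clear_movable()`, `page_movable()` still reports true while `page_has_ops()` reports false, which is exactly the "released page by driver after isolation" state the kernel comment describes.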

> 
> 


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 2/4] mm/migration: remove unneeded lock page and PageMovable check
  2022-05-09  8:51     ` Miaohe Lin
@ 2022-05-11 15:23       ` David Hildenbrand
  2022-05-12  2:25         ` Miaohe Lin
  0 siblings, 1 reply; 34+ messages in thread
From: David Hildenbrand @ 2022-05-11 15:23 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: ying.huang, hch, dhowells, cl, linux-mm, linux-kernel, akpm,
	mike.kravetz, naoya.horiguchi

On 09.05.22 10:51, Miaohe Lin wrote:
> On 2022/4/29 18:07, David Hildenbrand wrote:
>> On 25.04.22 15:27, Miaohe Lin wrote:
>>> When non-lru movable page was freed from under us, __ClearPageMovable must
>>> have been done. Even if it's not done, ClearPageIsolated here won't hurt
>>> as page will be freed anyway. So we can thus remove unneeded lock page and
>>> PageMovable check here.
>>>
>>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>>> Reviewed-by: Christoph Hellwig <hch@lst.de>
>>> ---
>>>  mm/migrate.c | 8 ++------
>>>  1 file changed, 2 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/mm/migrate.c b/mm/migrate.c
>>> index b779646665fe..0fc4651b3e39 100644
>>> --- a/mm/migrate.c
>>> +++ b/mm/migrate.c
>>> @@ -1093,12 +1093,8 @@ static int unmap_and_move(new_page_t get_new_page,
>>>  		/* page was freed from under us. So we are done. */
>>>  		ClearPageActive(page);
>>>  		ClearPageUnevictable(page);
>>> -		if (unlikely(__PageMovable(page))) {
>>> -			lock_page(page);
>>> -			if (!PageMovable(page))
>>> -				ClearPageIsolated(page);
>>> -			unlock_page(page);
>>> -		}
>>> +		if (unlikely(__PageMovable(page)))
>>> +			ClearPageIsolated(page);
>>>  		goto out;
>>>  	}
>>
>> Hm, that code+change raises a couple of questions.
>>
>> We're doing here the same as in putback_movable_pages(). So I guess the
>> difference here is that the caller did release the reference while the
>> page was isolated, while we don't assume the same in
>> putback_movable_pages().
> 
> Agree.
> 
>>
>>
>> Shouldn't whoever owned the page have cleared that? IOW, is it even
>> valid that we see a movable or isolated page here (WARN/BUG?)?
>>
>> At least for balloon compaction, I remember that __PageMovable() is
>> properly cleared before freeing it via balloon_page_delete().
> 
> z3fold, zsmalloc will do __ClearPageMovable when the page is going to be released.
> So I think we shouldn't see a movable page here:
> 
> void __ClearPageMovable(struct page *page)
> {
> 	VM_BUG_ON_PAGE(!PageMovable(page), page);
> 	/*
> 	 * Clear registered address_space val with keeping PAGE_MAPPING_MOVABLE
> 	 * flag so that VM can catch up released page by driver after isolation.
> 	 * With it, VM migration doesn't try to put it back.
> 	 */
> 	page->mapping = (void *)((unsigned long)page->mapping &
> 				PAGE_MAPPING_MOVABLE);
> }
> 
> But it seems there is no guarantee for PageIsolated flag. Or am I miss something?

At least the code we have now:

if (unlikely(__PageMovable(page)))
	ClearPageIsolated(page);

Should be dead code. So PG_isolated could remain set.

If PG_isolated is still set, it will get cleared in the buddy when
freeing the page via

	page->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;

> 
>>
>>
>> Also, I am not sure how reliable that page count check is here: if we'd
>> have another speculative reference to the page, we might see
>> "page_count(page) > 1" and not take that path, although the previous
>> owner released the last reference.
> 
> IIUC, there should not be such speculative reference. The driver should have taken care
> of it.

How can you prevent any kind of speculative references?

See isolate_movable_page() as an example, which grabs a speculative
reference to then find out that the page is already isolated by someone
else, to then back off.

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 2/4] mm/migration: remove unneeded lock page and PageMovable check
  2022-05-11 15:23       ` David Hildenbrand
@ 2022-05-12  2:25         ` Miaohe Lin
  2022-05-12  7:10           ` David Hildenbrand
  0 siblings, 1 reply; 34+ messages in thread
From: Miaohe Lin @ 2022-05-12  2:25 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: ying.huang, hch, dhowells, cl, linux-mm, linux-kernel, akpm,
	mike.kravetz, naoya.horiguchi

On 2022/5/11 23:23, David Hildenbrand wrote:
> On 09.05.22 10:51, Miaohe Lin wrote:
>> On 2022/4/29 18:07, David Hildenbrand wrote:
snip
>>
>> z3fold, zsmalloc will do __ClearPageMovable when the page is going to be released.
>> So I think we shouldn't see a movable page here:
>>
>> void __ClearPageMovable(struct page *page)
>> {
>> 	VM_BUG_ON_PAGE(!PageMovable(page), page);
>> 	/*
>> 	 * Clear registered address_space val with keeping PAGE_MAPPING_MOVABLE
>> 	 * flag so that VM can catch up released page by driver after isolation.
>> 	 * With it, VM migration doesn't try to put it back.
>> 	 */
>> 	page->mapping = (void *)((unsigned long)page->mapping &
>> 				PAGE_MAPPING_MOVABLE);
>> }
>>
>> But it seems there is no guarantee for PageIsolated flag. Or am I miss something?
> 
> At least the code we have now:
> 
> if (unlikely(__PageMovable(page)))
> 	ClearPageIsolated(page);
> 
> Should be dead code. So PG_isolated could remain set.
> 
> If PG_isolated is still set, it will get cleared in the buddy when
> freeing the page via
> 
> 	page->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;

Yes, check_free_page only complains about flags belonging to PAGE_FLAGS_CHECK_AT_FREE, and PG_isolated
will be cleared in the buddy when freeing the page. But it might not be a good idea to rely on this?
IMHO, it would be better to clear PG_isolated explicitly ourselves.

> 
>>
>>>
>>>
>>> Also, I am not sure how reliable that page count check is here: if we'd
>>> have another speculative reference to the page, we might see
>>> "page_count(page) > 1" and not take that path, although the previous
>>> owner released the last reference.
>>
>> IIUC, there should not be such speculative reference. The driver should have taken care
>> of it.
> 
> How can you prevent any kind of speculative references?
> 
> See isolate_movable_page() as an example, which grabs a speculative
> reference to then find out that the page is already isolated by someone
> else, to then back off.

You're right. isolate_movable_page is indeed a speculative-reference case. But the page count check here
is just an optimization. If we encounter a speculative reference, it still works, just with the useless
effort of migrating a to-be-released page.

Thanks!
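The race being debated here — a transient get/put pair defeating the `page_count(page) == 1` heuristic — is easy to see in a toy model. Everything below is illustrative, not kernel code: `toy_page` and the helper names are made up.

```c
#include <assert.h>

/* The "page_count == 1" test only detects a page freed by its owner if
 * nobody holds a transient reference at the moment of the check. */

struct toy_page { int count; };

static int freed_by_owner(const struct toy_page *p)
{
    return p->count == 1;          /* the shaky heuristic */
}

/* Model a racing isolate_movable_page()-style caller: it takes a
 * speculative reference around the check, so the count reads 2 even
 * though the owner has already dropped its reference. */
static int check_with_speculative_ref(struct toy_page *p)
{
    p->count++;                    /* speculative get_page() */
    int seen_as_freed = freed_by_owner(p);
    p->count--;                    /* put_page() after backing off */
    return seen_as_freed;
}
```

The second helper returns false for a page whose owner is gone, which is the false negative discussed above: not a correctness bug per se, but the page then takes the full migration path instead of the early-out.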

> 


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 2/4] mm/migration: remove unneeded lock page and PageMovable check
  2022-05-12  2:25         ` Miaohe Lin
@ 2022-05-12  7:10           ` David Hildenbrand
  2022-05-12 13:26             ` Miaohe Lin
  0 siblings, 1 reply; 34+ messages in thread
From: David Hildenbrand @ 2022-05-12  7:10 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: ying.huang, hch, dhowells, cl, linux-mm, linux-kernel, akpm,
	mike.kravetz, naoya.horiguchi, Minchan Kim

>> If PG_isolated is still set, it will get cleared in the buddy when
>> freeing the page via
>>
>> 	page->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
> 
> Yes, check_free_page only complains about flags belonging to PAGE_FLAGS_CHECK_AT_FREE and PG_isolated
> will be cleared in the buddy when freeing the page. But it might not be a good idea to reply on this ?
> IMHO, it should be better to clear the PG_isolated explicitly ourselves.

I think we can pretty much rely on this handling in the buddy :)

> 
>>
>>>
>>>>
>>>>
>>>> Also, I am not sure how reliable that page count check is here: if we'd
>>>> have another speculative reference to the page, we might see
>>>> "page_count(page) > 1" and not take that path, although the previous
>>>> owner released the last reference.
>>>
>>> IIUC, there should not be such speculative reference. The driver should have taken care
>>> of it.
>>
>> How can you prevent any kind of speculative references?
>>
>> See isolate_movable_page() as an example, which grabs a speculative
>> reference to then find out that the page is already isolated by someone
>> else, to then back off.
> 
> You're right. isolate_movable_page will be an speculative references case. But the page count check here
> is just an optimization. If we encounter speculative references, it still works with useless effort of
> migrating to be released page.


Not really. The issue is that PAGE_FLAGS_CHECK_AT_FREE contains
PG_active and PG_unevictable.

We only clear those 2 flags if "page_count(page) == 1". Consequently,
with a speculative reference, we'll run into the check_free_page_bad()
when dropping the last reference.

This is just shaky. Special casing on "page_count(page) == 1" for
detecting "was this freed by the owner" is not 100% waterproof.

In an ideal world, we'd just get rid of that whole block of code and let
the actual freeing code clear PG_active and PG_unevictable. But that
would require changes to free_pages_prepare().


Now I do wonder, if we ever even have PG_active or PG_unevictable still
set when the page was freed by the owner in this code. IOW, maybe that
is dead code as well and we can just remove the whole shaky
"page_count(page) == 1" code block.

Ccing Minchan, who added clearing of the pageflags at that point.
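The two flag masks at the core of this subthread can be modelled as follows. The bit positions and mask contents below are illustrative stand-ins, not the kernel's actual values; the point is only the relationship between them: flags in the AT_FREE mask trigger a bad-page report if still set at the final put, while anything else in the AT_PREP mask is silently wiped by the buddy.

```c
#include <assert.h>

/* Illustrative bits -- not the kernel's real flag layout. */
#define PG_ACTIVE       (1u << 0)
#define PG_UNEVICTABLE  (1u << 1)
#define PG_ISOLATED     (1u << 2)

/* PG_isolated is only in the PREP mask, so leaving it set is harmless;
 * PG_active/PG_unevictable are in the FREE mask, so they must be
 * cleared before the last reference is dropped. */
#define FLAGS_CHECK_AT_FREE (PG_ACTIVE | PG_UNEVICTABLE)
#define FLAGS_CHECK_AT_PREP (PG_ACTIVE | PG_UNEVICTABLE | PG_ISOLATED)

/* Mirrors what check_free_page() complains about. */
static int free_page_is_bad(unsigned int flags)
{
    return (flags & FLAGS_CHECK_AT_FREE) != 0;
}

/* Mirrors the prep path: page->flags &= ~PAGE_FLAGS_CHECK_AT_PREP. */
static unsigned int prep_clear(unsigned int flags)
{
    return flags & ~FLAGS_CHECK_AT_PREP;
}
```

In this model a stale PG_ISOLATED sails through `free_page_is_bad()` and is scrubbed by `prep_clear()`, while a stale PG_ACTIVE is reported — which is why the speculative-reference window around the `page_count == 1` check matters for PG_active/PG_unevictable but not for PG_isolated.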

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 2/4] mm/migration: remove unneeded lock page and PageMovable check
  2022-05-12  7:10           ` David Hildenbrand
@ 2022-05-12 13:26             ` Miaohe Lin
  2022-05-12 16:50               ` David Hildenbrand
  0 siblings, 1 reply; 34+ messages in thread
From: Miaohe Lin @ 2022-05-12 13:26 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: ying.huang, hch, dhowells, cl, linux-mm, linux-kernel, akpm,
	mike.kravetz, naoya.horiguchi, Minchan Kim

On 2022/5/12 15:10, David Hildenbrand wrote:
>>> If PG_isolated is still set, it will get cleared in the buddy when
>>> freeing the page via
>>>
>>> 	page->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
>>
>> Yes, check_free_page only complains about flags belonging to PAGE_FLAGS_CHECK_AT_FREE and PG_isolated
>> will be cleared in the buddy when freeing the page. But it might not be a good idea to reply on this ?
>> IMHO, it should be better to clear the PG_isolated explicitly ourselves.
> 
> I think we can pretty much rely on this handling in the buddy :)

So is the below code change what you're suggesting?

	if (page_count(page) == 1) {
		/* page was freed from under us. So we are done. */
		ClearPageActive(page);
		ClearPageUnevictable(page);
-		if (unlikely(__PageMovable(page)))
-			ClearPageIsolated(page);
		goto out;
	}
> 
>>
>>>
>>>>
>>>>>
>>>>>
>>>>> Also, I am not sure how reliable that page count check is here: if we'd
>>>>> have another speculative reference to the page, we might see
>>>>> "page_count(page) > 1" and not take that path, although the previous
>>>>> owner released the last reference.
>>>>
>>>> IIUC, there should not be such speculative reference. The driver should have taken care
>>>> of it.
>>>
>>> How can you prevent any kind of speculative references?
>>>
>>> See isolate_movable_page() as an example, which grabs a speculative
>>> reference to then find out that the page is already isolated by someone
>>> else, to then back off.
>>
>> You're right. isolate_movable_page will be an speculative references case. But the page count check here
>> is just an optimization. If we encounter speculative references, it still works with useless effort of
>> migrating to be released page.
> 
> 
> Not really. The issue is that PAGE_FLAGS_CHECK_AT_FREE contains
> PG_active and PG_unevictable.
> 
> We only clear those 2 flags if "page_count(page) == 1". Consequently,
> with a speculative reference, we'll run into the check_free_page_bad()
> when dropping the last reference.

It seems that if a speculative reference happens after the "page_count(page) == 1" check,
it's OK because we have already cleared PG_active and PG_unevictable. And if it happens before
the check, this code block is skipped and the page will be freed after migration. PG_active
and PG_unevictable will then be correctly cleared when the page is actually freed via
__folio_clear_active. (Please see the comment below.)

> 
> This is just shaky. Special casing on "page_count(page) == 1" for
> detecting "was this freed by the owner" is not 100% water proof.
> 
> In an ideal world, we'd just get rid of that whole block of code and let
> the actual freeing code clear PG_active and PG_unevictable. But that
> would require changes to free_pages_prepare().
> 
> 
> Now I do wonder, if we ever even have PG_active or PG_unevictable still
> set when the page was freed by the owner in this code. IOW, maybe that
> is dead code as well and we can just remove the whole shaky
> "page_count(page) == 1" code block.

Consider the following common scenario: an anonymous page is actively used by its sole owner process, so it
will have PG_active set. Then the process exits while the VM tries to migrate that page. So the page
can have refcount == 1 while PG_active is still set. Note that normally PG_active is cleared when
the page is released:

__put_single_page
  PageLRU
    __clear_page_lru_flags
      __folio_clear_active
      __folio_clear_unevictable

But for an isolated page, PageLRU is cleared. So when the isolated page is released, __clear_page_lru_flags
won't be called, and we have to clear PG_active and PG_unevictable here manually. So I think
this code block works. Or am I missing something again?

Thanks!
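The two release paths described above (normal LRU release clearing the flags versus an isolated page skipping that path) can be sketched as a toy flags model. The bit values and helper name are illustrative, not the kernel's.

```c
#include <assert.h>

/* Illustrative bits -- not the kernel's real flag layout. */
#define PG_LRU     (1u << 0)
#define PG_ACTIVE  (1u << 1)

/* Rough model of __put_single_page(): only pages still on the LRU go
 * through __clear_page_lru_flags(), which drops PG_active (and
 * PG_unevictable, omitted here).  An isolated page has PG_lru clear,
 * so it keeps PG_active unless the migration code clears it first. */
static unsigned int put_single_page(unsigned int flags)
{
    if (flags & PG_LRU)
        flags &= ~(PG_LRU | PG_ACTIVE);   /* __clear_page_lru_flags() */
    return flags;
}
```

This is exactly the asymmetry the scenario above relies on: the freed-under-us page was isolated, so the LRU path cannot clean the flags and the `page_count == 1` block in unmap_and_move() has to do it.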

> 
> Ccing Minchan, who added clearing of the pageflags at that point.
> 


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 2/4] mm/migration: remove unneeded lock page and PageMovable check
  2022-05-12 13:26             ` Miaohe Lin
@ 2022-05-12 16:50               ` David Hildenbrand
  2022-05-16  2:44                 ` Miaohe Lin
  2022-05-24 12:47                 ` Miaohe Lin
  0 siblings, 2 replies; 34+ messages in thread
From: David Hildenbrand @ 2022-05-12 16:50 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: ying.huang, hch, dhowells, cl, linux-mm, linux-kernel, akpm,
	mike.kravetz, naoya.horiguchi, Minchan Kim

On 12.05.22 15:26, Miaohe Lin wrote:
> On 2022/5/12 15:10, David Hildenbrand wrote:
>>>> If PG_isolated is still set, it will get cleared in the buddy when
>>>> freeing the page via
>>>>
>>>> 	page->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
>>>
>>> Yes, check_free_page only complains about flags belonging to PAGE_FLAGS_CHECK_AT_FREE and PG_isolated
>>> will be cleared in the buddy when freeing the page. But it might not be a good idea to reply on this ?
>>> IMHO, it should be better to clear the PG_isolated explicitly ourselves.
>>
>> I think we can pretty much rely on this handling in the buddy :)
> 
> So is the below code change what you're suggesting?
> 
> 	if (page_count(page) == 1) {
> 		/* page was freed from under us. So we are done. */
> 		ClearPageActive(page);
> 		ClearPageUnevictable(page);
> -		if (unlikely(__PageMovable(page)))
> -			ClearPageIsolated(page);
> 		goto out;
> 	}

Yeah, unless I am missing something important :)

>>
>>>
>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>> Also, I am not sure how reliable that page count check is here: if we'd
>>>>>> have another speculative reference to the page, we might see
>>>>>> "page_count(page) > 1" and not take that path, although the previous
>>>>>> owner released the last reference.
>>>>>
>>>>> IIUC, there should not be such speculative reference. The driver should have taken care
>>>>> of it.
>>>>
>>>> How can you prevent any kind of speculative references?
>>>>
>>>> See isolate_movable_page() as an example, which grabs a speculative
>>>> reference to then find out that the page is already isolated by someone
>>>> else, to then back off.
>>>
>>> You're right. isolate_movable_page will be an speculative references case. But the page count check here
>>> is just an optimization. If we encounter speculative references, it still works with useless effort of
>>> migrating to be released page.
>>
>>
>> Not really. The issue is that PAGE_FLAGS_CHECK_AT_FREE contains
>> PG_active and PG_unevictable.
>>
>> We only clear those 2 flags if "page_count(page) == 1". Consequently,
>> with a speculative reference, we'll run into the check_free_page_bad()
>> when dropping the last reference.
> 
> It seems if a speculative reference happens after the "page_count(page) == 1" check,
> it's ok because we cleared the PG_active and PG_unevictable. And if it happens before
> the check, this code block is skipped and the page will be freed after migration. The
> PG_active and PG_unevictable will be correctly cleared when page is actually freed via
> __folio_clear_active. (Please see below comment)
> 
>>
>> This is just shaky. Special casing on "page_count(page) == 1" for
>> detecting "was this freed by the owner" is not 100% water proof.
>>
>> In an ideal world, we'd just get rid of that whole block of code and let
>> the actual freeing code clear PG_active and PG_unevictable. But that
>> would require changes to free_pages_prepare().
>>
>>
>> Now I do wonder, if we ever even have PG_active or PG_unevictable still
>> set when the page was freed by the owner in this code. IOW, maybe that
>> is dead code as well and we can just remove the whole shaky
>> "page_count(page) == 1" code block.
> 
> Think about below common scene: Anonymous page is actively used by the sole owner process, so it
> will have PG_active set. Then process exited while vm tries to migrate that page. So the page
> should have refcnt == 1 while PG_active is set? Note normally PG_active should be cleared when
> the page is released:
> 
> __put_single_page
>   PageLRU
>     __clear_page_lru_flags
>       __folio_clear_active
>       __folio_clear_unevictable
> 
> But for isolated page, PageLRU is cleared. So when the isolated page is released, __clear_page_lru_flags
> won't be called. So we have to clear the PG_active and PG_unevictable here manully. So I think
> this code block works. Or am I miss something again?

Let's assume the following: the page was freed by the owner and we enter
unmap_and_move().


#1: enter unmap_and_move() // page_count is 1
#2: enter isolate_movable_page() // page_count is 1
#2: get_page_unless_zero() // page_count is now 2
#1: if (page_count(page) == 1) { // does not trigger
#2: put_page(page); // page_count is now 1
#1: put_page(page); // page_count is now 0 -> freed


#1 will trigger __put_page() -> __put_single_page() ->
__page_cache_release(), which will not clear the flags because it's not
an LRU page at that point in time, right (-> isolated)?

We did not run that code block that would clear PG_active and
PG_unevictable.

Which still leaves the questions:

a) If PG_active and PG_unevictable were cleared, where?
b) Why is that code block that conditionally clears the flags of any
value, and why can't we simply drop it?

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 2/4] mm/migration: remove unneeded lock page and PageMovable check
  2022-05-12 16:50               ` David Hildenbrand
@ 2022-05-16  2:44                 ` Miaohe Lin
  2022-05-31 11:59                   ` David Hildenbrand
  2022-05-24 12:47                 ` Miaohe Lin
  1 sibling, 1 reply; 34+ messages in thread
From: Miaohe Lin @ 2022-05-16  2:44 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: ying.huang, hch, dhowells, cl, linux-mm, linux-kernel, akpm,
	mike.kravetz, naoya.horiguchi, Minchan Kim

On 2022/5/13 0:50, David Hildenbrand wrote:
> On 12.05.22 15:26, Miaohe Lin wrote:
>> On 2022/5/12 15:10, David Hildenbrand wrote:
>>>>> If PG_isolated is still set, it will get cleared in the buddy when
>>>>> freeing the page via
>>>>>
>>>>> 	page->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
>>>>
>>>> Yes, check_free_page only complains about flags belonging to PAGE_FLAGS_CHECK_AT_FREE and PG_isolated
>>>> will be cleared in the buddy when freeing the page. But it might not be a good idea to reply on this ?
>>>> IMHO, it should be better to clear the PG_isolated explicitly ourselves.
>>>
>>> I think we can pretty much rely on this handling in the buddy :)
>>
>> So is the below code change what you're suggesting?
>>
>> 	if (page_count(page) == 1) {
>> 		/* page was freed from under us. So we are done. */
>> 		ClearPageActive(page);
>> 		ClearPageUnevictable(page);
>> -		if (unlikely(__PageMovable(page)))
>> -			ClearPageIsolated(page);
>> 		goto out;
>> 	}
> 
> Yeah, unless I am missing something important :)
> 
>>>
>>>>
>>>>>
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Also, I am not sure how reliable that page count check is here: if we'd
>>>>>>> have another speculative reference to the page, we might see
>>>>>>> "page_count(page) > 1" and not take that path, although the previous
>>>>>>> owner released the last reference.
>>>>>>
>>>>>> IIUC, there should not be such speculative reference. The driver should have taken care
>>>>>> of it.
>>>>>
>>>>> How can you prevent any kind of speculative references?
>>>>>
>>>>> See isolate_movable_page() as an example, which grabs a speculative
>>>>> reference to then find out that the page is already isolated by someone
>>>>> else, to then back off.
>>>>
>>>> You're right. isolate_movable_page will be an speculative references case. But the page count check here
>>>> is just an optimization. If we encounter speculative references, it still works with useless effort of
>>>> migrating to be released page.
>>>
>>>
>>> Not really. The issue is that PAGE_FLAGS_CHECK_AT_FREE contains
>>> PG_active and PG_unevictable.
>>>
>>> We only clear those 2 flags if "page_count(page) == 1". Consequently,
>>> with a speculative reference, we'll run into the check_free_page_bad()
>>> when dropping the last reference.
>>
>> It seems if a speculative reference happens after the "page_count(page) == 1" check,
>> it's ok because we cleared the PG_active and PG_unevictable. And if it happens before
>> the check, this code block is skipped and the page will be freed after migration. The
>> PG_active and PG_unevictable will be correctly cleared when page is actually freed via
>> __folio_clear_active. (Please see below comment)
>>
>>>
>>> This is just shaky. Special casing on "page_count(page) == 1" for
>>> detecting "was this freed by the owner" is not 100% water proof.
>>>
>>> In an ideal world, we'd just get rid of that whole block of code and let
>>> the actual freeing code clear PG_active and PG_unevictable. But that
>>> would require changes to free_pages_prepare().
>>>
>>>
>>> Now I do wonder, if we ever even have PG_active or PG_unevictable still
>>> set when the page was freed by the owner in this code. IOW, maybe that
>>> is dead code as well and we can just remove the whole shaky
>>> "page_count(page) == 1" code block.
>>
>> Think about below common scene: Anonymous page is actively used by the sole owner process, so it
>> will have PG_active set. Then process exited while vm tries to migrate that page. So the page
>> should have refcnt == 1 while PG_active is set? Note normally PG_active should be cleared when
>> the page is released:
>>
>> __put_single_page
>>   PageLRU
>>     __clear_page_lru_flags
>>       __folio_clear_active
>>       __folio_clear_unevictable
>>
>> But for isolated page, PageLRU is cleared. So when the isolated page is released, __clear_page_lru_flags
>> won't be called. So we have to clear the PG_active and PG_unevictable here manully. So I think
>> this code block works. Or am I miss something again?
> 
> Let's assume the following: page as freed by the owner and we enter
> unmap_and_move().
> 
> 
> #1: enter unmap_and_move() // page_count is 1
> #2: enter isolate_movable_page() // page_count is 1
> #2: get_page_unless_zero() // page_count is now 2
> #1: if (page_count(page) == 1) { // does not trigger
> #2: put_page(page); // page_count is now 1
> #1: put_page(page); // page_count is now 0 -> freed
> 
> 
> #1 will trigger __put_page() -> __put_single_page() ->
> __page_cache_release() will not clear the flags because it's not an LRU
> page at that point in time, right (-> isolated)?

Sorry, you're right. I thought the old page would be freed via putback_lru_page(), which sets
PageLRU back, instead of via put_page() directly. So if the above race occurs, PG_active and
PG_unevictable will remain set while the page goes to the buddy, and check_free_page will
complain about it. But it seems this has never been witnessed?

> 
> We did not run that code block that would clear PG_active and
> PG_unevictable.
> 
> Which still leaves the questions:
> 
> a) If PG_active and PG_unevictable was cleared, where?

For LRU pages, PG_active and PG_unevictable are cleared via __page_cache_release. And for isolated
(LRU) pages, PG_active and PG_unevictable should be cleared by ourselves?

> b) Why is that code block that conditionally clears the flags of any
> value and why can't we simply drop it?
> 

To fix the issue, should we clear PG_active and PG_unevictable unconditionally here?

Thanks a lot!

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 1/4] mm/migration: reduce the rcu lock duration
  2022-04-29  9:54   ` David Hildenbrand
  2022-05-09  3:14     ` Miaohe Lin
@ 2022-05-24 12:36     ` Miaohe Lin
  1 sibling, 0 replies; 34+ messages in thread
From: Miaohe Lin @ 2022-05-24 12:36 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: ying.huang, hch, dhowells, cl, linux-mm, linux-kernel, akpm,
	mike.kravetz, naoya.horiguchi

On 2022/4/29 17:54, David Hildenbrand wrote:
> On 25.04.22 15:27, Miaohe Lin wrote:
>> rcu_read_lock is required by grabbing the task refcount but it's not
>> needed for ptrace_may_access. So we could release the rcu lock after
>> task refcount is successfully grabbed to reduce the rcu holding time.
>>
>> Reviewed-by: Muchun Song <songmuchun@bytedance.com>
>> Reviewed-by: Christoph Hellwig <hch@lst.de>
>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>> Cc: Huang Ying <ying.huang@intel.com>
>> Cc: David Howells <dhowells@redhat.com>
>> Cc: Christoph Lameter <cl@linux.com>
>> ---
>>  mm/migrate.c | 3 +--
>>  1 file changed, 1 insertion(+), 2 deletions(-)
>>
>> diff --git a/mm/migrate.c b/mm/migrate.c
>> index b2678279eb43..b779646665fe 100644
>> --- a/mm/migrate.c
>> +++ b/mm/migrate.c
>> @@ -1902,17 +1902,16 @@ static struct mm_struct *find_mm_struct(pid_t pid, nodemask_t *mem_nodes)
>>  		return ERR_PTR(-ESRCH);
>>  	}
>>  	get_task_struct(task);
>> +	rcu_read_unlock();
>>  
>>  	/*
>>  	 * Check if this process has the right to modify the specified
>>  	 * process. Use the regular "ptrace_may_access()" checks.
>>  	 */
>>  	if (!ptrace_may_access(task, PTRACE_MODE_READ_REALCREDS)) {
>> -		rcu_read_unlock();
>>  		mm = ERR_PTR(-EPERM);
>>  		goto out;
>>  	}
>> -	rcu_read_unlock();
>>  
>>  	mm = ERR_PTR(security_task_movememory(task));
>>  	if (IS_ERR(mm))
> 
> Similar pattern in:
> 
> mm/mempolicy.c:kernel_migrate_pages()
> kernel/futex/syscalls.c:get_robust_list()
> kernel/nsproxy.c:validate_nsset()
> 
> Exception:
> 
> sched/core_sched.c:sched_core_share_pid()
> 
> 
> Should we unify -- i.e., adjust the remaining 3 as well?
> 

I verified that this code change applies to kernel_migrate_pages(), but not to get_robust_list()
or validate_nsset(), because no task_struct reference is grabbed in the latter two. Will
send the new patch soon.

Thanks!

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 2/4] mm/migration: remove unneeded lock page and PageMovable check
  2022-05-12 16:50               ` David Hildenbrand
  2022-05-16  2:44                 ` Miaohe Lin
@ 2022-05-24 12:47                 ` Miaohe Lin
  1 sibling, 0 replies; 34+ messages in thread
From: Miaohe Lin @ 2022-05-24 12:47 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: ying.huang, hch, dhowells, cl, linux-mm, linux-kernel, akpm,
	mike.kravetz, naoya.horiguchi, Minchan Kim

On 2022/5/13 0:50, David Hildenbrand wrote:
> On 12.05.22 15:26, Miaohe Lin wrote:
>> On 2022/5/12 15:10, David Hildenbrand wrote:
>>>>> If PG_isolated is still set, it will get cleared in the buddy when
>>>>> freeing the page via
>>>>>
>>>>> 	page->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
>>>>
>>>> Yes, check_free_page only complains about flags belonging to PAGE_FLAGS_CHECK_AT_FREE and PG_isolated
>>>> will be cleared in the buddy when freeing the page. But it might not be a good idea to reply on this ?
>>>> IMHO, it should be better to clear the PG_isolated explicitly ourselves.
>>>
>>> I think we can pretty much rely on this handling in the buddy :)
>>
>> So is the below code change what you're suggesting?
>>
>> 	if (page_count(page) == 1) {
>> 		/* page was freed from under us. So we are done. */
>> 		ClearPageActive(page);
>> 		ClearPageUnevictable(page);
>> -		if (unlikely(__PageMovable(page)))
>> -			ClearPageIsolated(page);
>> 		goto out;
>> 	}
> 
> Yeah, unless I am missing something important :)
> 
>>>
>>>>
>>>>>
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Also, I am not sure how reliable that page count check is here: if we'd
>>>>>>> have another speculative reference to the page, we might see
>>>>>>> "page_count(page) > 1" and not take that path, although the previous
>>>>>>> owner released the last reference.
>>>>>>
>>>>>> IIUC, there should not be such speculative reference. The driver should have taken care
>>>>>> of it.
>>>>>
>>>>> How can you prevent any kind of speculative references?
>>>>>
>>>>> See isolate_movable_page() as an example, which grabs a speculative
>>>>> reference to then find out that the page is already isolated by someone
>>>>> else, to then back off.
>>>>
>>>> You're right. isolate_movable_page will be an speculative references case. But the page count check here
>>>> is just an optimization. If we encounter speculative references, it still works with useless effort of
>>>> migrating to be released page.
>>>
>>>
>>> Not really. The issue is that PAGE_FLAGS_CHECK_AT_FREE contains
>>> PG_active and PG_unevictable.
>>>
>>> We only clear those 2 flags if "page_count(page) == 1". Consequently,
>>> with a speculative reference, we'll run into the check_free_page_bad()
>>> when dropping the last reference.
>>
>> It seems if a speculative reference happens after the "page_count(page) == 1" check,
>> it's ok because we cleared the PG_active and PG_unevictable. And if it happens before
>> the check, this code block is skipped and the page will be freed after migration. The
>> PG_active and PG_unevictable will be correctly cleared when page is actually freed via
>> __folio_clear_active. (Please see below comment)
>>
>>>
>>> This is just shaky. Special casing on "page_count(page) == 1" for
>>> detecting "was this freed by the owner" is not 100% water proof.
>>>
>>> In an ideal world, we'd just get rid of that whole block of code and let
>>> the actual freeing code clear PG_active and PG_unevictable. But that
>>> would require changes to free_pages_prepare().
>>>
>>>
>>> Now I do wonder, if we ever even have PG_active or PG_unevictable still
>>> set when the page was freed by the owner in this code. IOW, maybe that
>>> is dead code as well and we can just remove the whole shaky
>>> "page_count(page) == 1" code block.
>>
>> Think about below common scene: Anonymous page is actively used by the sole owner process, so it
>> will have PG_active set. Then process exited while vm tries to migrate that page. So the page
>> should have refcnt == 1 while PG_active is set? Note normally PG_active should be cleared when
>> the page is released:
>>
>> __put_single_page
>>   PageLRU
>>     __clear_page_lru_flags
>>       __folio_clear_active
>>       __folio_clear_unevictable
>>
>> But for isolated page, PageLRU is cleared. So when the isolated page is released, __clear_page_lru_flags
>> won't be called. So we have to clear the PG_active and PG_unevictable here manully. So I think
>> this code block works. Or am I miss something again?
> 
> Let's assume the following: page as freed by the owner and we enter
> unmap_and_move().
> 
> 
> #1: enter unmap_and_move() // page_count is 1
> #2: enter isolate_movable_page() // page_count is 1
> #2: get_page_unless_zero() // page_count is now 2
> #1: if (page_count(page) == 1) { // does not trigger
> #2: put_page(page); // page_count is now 1
> #1: put_page(page); // page_count is now 0 -> freed
> 
> 
> #1 will trigger __put_page() -> __put_single_page() ->
> __page_cache_release() will not clear the flags because it's not an LRU
> page at that point in time, right (-> isolated)?
> 
> We did not run that code block that would clear PG_active and
> PG_unevictable.
> 
> Which still leaves the questions:
> 
> a) If PG_active and PG_unevictable was cleared, where?
> b) Why is that code block that conditionally clears the flags of any
> value and why can't we simply drop it?
> 

I took a closer look at the code today, and I think the current code works.
There are 3 cases in unmap_and_move:

1. The page is freed through the "if (page_count(page) == 1)" code block. This works
as PG_active and PG_unevictable are cleared here.

2. Failed to migrate the page. The page won't be released so we don't care about it.

3. The page is migrated successfully. The PG_active and PG_unevictable are cleared
via folio_migrate_flags():

	if (folio_test_clear_active(folio)) {
		VM_BUG_ON_FOLIO(folio_test_unevictable(folio), folio);
		folio_set_active(newfolio);
	} else if (folio_test_clear_unevictable(folio))
		folio_set_unevictable(newfolio);

For the above race case, the page won't be freed through the "if (page_count(page) == 1)" code block.
It will just be migrated and freed via put_page() after folio_migrate_flags() has cleared PG_active
and PG_unevictable.

Am I missing something again? ;)

Thanks a lot!

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 2/4] mm/migration: remove unneeded lock page and PageMovable check
  2022-05-16  2:44                 ` Miaohe Lin
@ 2022-05-31 11:59                   ` David Hildenbrand
  2022-05-31 12:37                     ` Miaohe Lin
  0 siblings, 1 reply; 34+ messages in thread
From: David Hildenbrand @ 2022-05-31 11:59 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: ying.huang, hch, dhowells, cl, linux-mm, linux-kernel, akpm,
	mike.kravetz, naoya.horiguchi, Minchan Kim

Sorry for the late reply, was on vacation.

>>>
>>> But for isolated page, PageLRU is cleared. So when the isolated page is released, __clear_page_lru_flags
>>> won't be called. So we have to clear the PG_active and PG_unevictable here manully. So I think
>>> this code block works. Or am I miss something again?
>>
>> Let's assume the following: page as freed by the owner and we enter
>> unmap_and_move().
>>
>>
>> #1: enter unmap_and_move() // page_count is 1
>> #2: enter isolate_movable_page() // page_count is 1
>> #2: get_page_unless_zero() // page_count is now 2
>> #1: if (page_count(page) == 1) { // does not trigger
>> #2: put_page(page); // page_count is now 1
>> #1: put_page(page); // page_count is now 0 -> freed
>>
>>
>> #1 will trigger __put_page() -> __put_single_page() ->
>> __page_cache_release() will not clear the flags because it's not an LRU
>> page at that point in time, right (-> isolated)?
> 
> Sorry, you're right. I thought the old page will be freed via putback_lru_page which will
> set PageLRU back instead of put_page directly. So if the above race occurs, PG_active and
> PG_unevictable will remain set while page goes to the buddy and check_free_page will complain
> about it. But it seems this is never witnessed?

Maybe

a) we were lucky so far and didn't trigger it
b) the whole code block is dead code because we are missing something
c) we are missing something else :)

> 
>>
>> We did not run that code block that would clear PG_active and
>> PG_unevictable.
>>
>> Which still leaves the questions:
>>
>> a) If PG_active and PG_unevictable was cleared, where?
> 
> For LRU pages, PG_active and PG_unevictable are cleared via __page_cache_release. And for isolated
> (LRU) pages, PG_active and PG_unevictable should be cleared ourselves?
> 
>> b) Why is that code block that conditionally clears the flags of any
>> value and why can't we simply drop it?
>>
> 
> To fix the issue, should we clear PG_active and PG_unevictable unconditionally here?

I wonder if we should simply teach the actual freeing code to clear
both flags when freeing an isolated page? IOW, detect that an "isolated
LRU" page is getting freed and fix it up?

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 2/4] mm/migration: remove unneeded lock page and PageMovable check
  2022-05-31 11:59                   ` David Hildenbrand
@ 2022-05-31 12:37                     ` Miaohe Lin
  2022-06-01 10:31                       ` David Hildenbrand
  0 siblings, 1 reply; 34+ messages in thread
From: Miaohe Lin @ 2022-05-31 12:37 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: ying.huang, hch, dhowells, cl, linux-mm, linux-kernel, akpm,
	mike.kravetz, naoya.horiguchi, Minchan Kim

On 2022/5/31 19:59, David Hildenbrand wrote:
> Sorry for the late reply, was on vacation.

That's all right. Hope you had a great time. ;)

> 
>>>>
>>>> But for isolated page, PageLRU is cleared. So when the isolated page is released, __clear_page_lru_flags
>>>> won't be called. So we have to clear the PG_active and PG_unevictable here manully. So I think
>>>> this code block works. Or am I miss something again?
>>>
>>> Let's assume the following: page as freed by the owner and we enter
>>> unmap_and_move().
>>>
>>>
>>> #1: enter unmap_and_move() // page_count is 1
>>> #2: enter isolate_movable_page() // page_count is 1
>>> #2: get_page_unless_zero() // page_count is now 2
>>> #1: if (page_count(page) == 1) { // does not trigger
>>> #2: put_page(page); // page_count is now 1
>>> #1: put_page(page); // page_count is now 0 -> freed
>>>
>>>
>>> #1 will trigger __put_page() -> __put_single_page() ->
>>> __page_cache_release() will not clear the flags because it's not an LRU
>>> page at that point in time, right (-> isolated)?
>>
>> Sorry, you're right. I thought the old page will be freed via putback_lru_page which will
>> set PageLRU back instead of put_page directly. So if the above race occurs, PG_active and
>> PG_unevictable will remain set while page goes to the buddy and check_free_page will complain
>> about it. But it seems this is never witnessed?
> 
> Maybe
> 
> a) we were lucky so far and didn't trigger it
> b) the whole code block is dead code because we are missing something
> c) we are missing something else :)

I think I found the things we missed in another email [1].
[1]: https://lore.kernel.org/all/948ea45e-3b2b-e16c-5b8c-4c34de0ea593@huawei.com/

Paste the main content of [1] here:

"
There are 3 cases in unmap_and_move:

1. The page is freed through the "if (page_count(page) == 1)" code block. This works
as PG_active and PG_unevictable are cleared here.

2. Failed to migrate the page. The page won't be released so we don't care about it.

3. The page is migrated successfully. The PG_active and PG_unevictable are cleared
via folio_migrate_flags():

	if (folio_test_clear_active(folio)) {
		VM_BUG_ON_FOLIO(folio_test_unevictable(folio), folio);
		folio_set_active(newfolio);
	} else if (folio_test_clear_unevictable(folio))
		folio_set_unevictable(newfolio);

For the above race case, the page won't be freed through the "if (page_count(page) == 1)" code block.
It will just be migrated and freed via put_page() after folio_migrate_flags() has cleared PG_active
and PG_unevictable.
"
Or am I missing something again? :)

> 
>>
>>>
>>> We did not run that code block that would clear PG_active and
>>> PG_unevictable.
>>>
>>> Which still leaves the questions:
>>>
>>> a) If PG_active and PG_unevictable was cleared, where?
>>
>> For LRU pages, PG_active and PG_unevictable are cleared via __page_cache_release. And for isolated
>> (LRU) pages, PG_active and PG_unevictable should be cleared ourselves?
>>
>>> b) Why is that code block that conditionally clears the flags of any
>>> value and why can't we simply drop it?
>>>
>>
>> To fix the issue, should we clear PG_active and PG_unevictable unconditionally here?
> 
> I wonder if we should simply teach actual freeing code to simply clear
> both flags when freeing an isolated page? IOW, to detect "isolated LRU"
> is getting freed and fixup?

IMHO, clearing both flags is indeed done by the caller. Another example I found
recently while reading the khugepaged code is pasted below:

collapse_file
	...
	page_ref_unfreeze(page, 1);
	ClearPageActive(page);
	ClearPageUnevictable(page);
	unlock_page(page);
	put_page(page);
	index++;
	...

Thanks!

> 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 2/4] mm/migration: remove unneeded lock page and PageMovable check
  2022-05-31 12:37                     ` Miaohe Lin
@ 2022-06-01 10:31                       ` David Hildenbrand
  2022-06-02  7:40                         ` Miaohe Lin
  0 siblings, 1 reply; 34+ messages in thread
From: David Hildenbrand @ 2022-06-01 10:31 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: ying.huang, hch, dhowells, cl, linux-mm, linux-kernel, akpm,
	mike.kravetz, naoya.horiguchi, Minchan Kim

On 31.05.22 14:37, Miaohe Lin wrote:
> On 2022/5/31 19:59, David Hildenbrand wrote:
>> Sorry for the late reply, was on vacation.
> 
> That's all right. Hope you have a great time. ;)
> 
>>
>>>>>
>>>>> But for isolated page, PageLRU is cleared. So when the isolated page is released, __clear_page_lru_flags
>>>>> won't be called. So we have to clear the PG_active and PG_unevictable here manully. So I think
>>>>> this code block works. Or am I miss something again?
>>>>
>>>> Let's assume the following: page as freed by the owner and we enter
>>>> unmap_and_move().
>>>>
>>>>
>>>> #1: enter unmap_and_move() // page_count is 1
>>>> #2: enter isolate_movable_page() // page_count is 1
>>>> #2: get_page_unless_zero() // page_count is now 2
>>>> #1: if (page_count(page) == 1) { // does not trigger
>>>> #2: put_page(page); // page_count is now 1
>>>> #1: put_page(page); // page_count is now 0 -> freed
>>>>
>>>>
>>>> #1 will trigger __put_page() -> __put_single_page() ->
>>>> __page_cache_release() will not clear the flags because it's not an LRU
>>>> page at that point in time, right (-> isolated)?
>>>
>>> Sorry, you're right. I thought the old page will be freed via putback_lru_page which will
>>> set PageLRU back instead of put_page directly. So if the above race occurs, PG_active and
>>> PG_unevictable will remain set while page goes to the buddy and check_free_page will complain
>>> about it. But it seems this is never witnessed?
>>
>> Maybe
>>
>> a) we were lucky so far and didn't trigger it
>> b) the whole code block is dead code because we are missing something
>> c) we are missing something else :)
> 
> I think I found the things we missed in another email [1].
> [1]: https://lore.kernel.org/all/948ea45e-3b2b-e16c-5b8c-4c34de0ea593@huawei.com/
> 
> Paste the main content of [1] here:
> 
> "
> There are 3 cases in unmap_and_move:
> 
> 1.page is freed through "if (page_count(page) == 1)" code block. This works
> as PG_active and PG_unevictable are cleared here.
> 
> 2. Failed to migrate the page. The page won't be release so we don't care about it.

Right, page is un-isolated.

> 
> 3. The page is migrated successfully. The PG_active and PG_unevictable are cleared
> via folio_migrate_flags():
> 
> 	if (folio_test_clear_active(folio)) {
> 		VM_BUG_ON_FOLIO(folio_test_unevictable(folio), folio);
> 		folio_set_active(newfolio);
> 	} else if (folio_test_clear_unevictable(folio))
> 		folio_set_unevictable(newfolio);

Right.

> 
> For the above race case, the page won't be freed through "if (page_count(page) == 1)" code block.
> It will just be migrated and freed via put_page() after folio_migrate_flags() having cleared PG_active
> and PG_unevictable.
> "
> Or Am I miss something again? :)

For #1, I'm still not sure what would happen on a speculative reference.

It's worth summarizing that

a) free_pages_prepare() will clear both flags via page->flags &=
~PAGE_FLAGS_CHECK_AT_PREP;

b) free_pages_prepare() will bail out if any flag is set in
check_free_page().

As we've never seen b) in the wild, this certainly has low priority, and
maybe it really cannot happen right now.

However, maybe really allowing these flags to be set when freeing the
page and removing the "page_count(page) == 1" case from migration code
would be the clean thing to do.

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 2/4] mm/migration: remove unneeded lock page and PageMovable check
  2022-06-01 10:31                       ` David Hildenbrand
@ 2022-06-02  7:40                         ` Miaohe Lin
  2022-06-02  8:47                           ` David Hildenbrand
  0 siblings, 1 reply; 34+ messages in thread
From: Miaohe Lin @ 2022-06-02  7:40 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: ying.huang, hch, dhowells, cl, linux-mm, linux-kernel, akpm,
	mike.kravetz, naoya.horiguchi, Minchan Kim

On 2022/6/1 18:31, David Hildenbrand wrote:
> On 31.05.22 14:37, Miaohe Lin wrote:
>> On 2022/5/31 19:59, David Hildenbrand wrote:
>>> Sorry for the late reply, was on vacation.
>>
>> That's all right. Hope you have a great time. ;)
>>
>>>
>>>>>>
>>>>>> But for isolated page, PageLRU is cleared. So when the isolated page is released, __clear_page_lru_flags
>>>>>> won't be called. So we have to clear the PG_active and PG_unevictable here manully. So I think
>>>>>> this code block works. Or am I miss something again?
>>>>>
>>>>> Let's assume the following: page as freed by the owner and we enter
>>>>> unmap_and_move().
>>>>>
>>>>>
>>>>> #1: enter unmap_and_move() // page_count is 1
>>>>> #2: enter isolate_movable_page() // page_count is 1
>>>>> #2: get_page_unless_zero() // page_count is now 2
>>>>> #1: if (page_count(page) == 1) { // does not trigger
>>>>> #2: put_page(page); // page_count is now 1
>>>>> #1: put_page(page); // page_count is now 0 -> freed
>>>>>
>>>>>
>>>>> #1 will trigger __put_page() -> __put_single_page() ->
>>>>> __page_cache_release() will not clear the flags because it's not an LRU
>>>>> page at that point in time, right (-> isolated)?
>>>>
>>>> Sorry, you're right. I thought the old page will be freed via putback_lru_page which will
>>>> set PageLRU back instead of put_page directly. So if the above race occurs, PG_active and
>>>> PG_unevictable will remain set while page goes to the buddy and check_free_page will complain
>>>> about it. But it seems this is never witnessed?
>>>
>>> Maybe
>>>
>>> a) we were lucky so far and didn't trigger it
>>> b) the whole code block is dead code because we are missing something
>>> c) we are missing something else :)
>>
>> I think I found the things we missed in another email [1].
>> [1]: https://lore.kernel.org/all/948ea45e-3b2b-e16c-5b8c-4c34de0ea593@huawei.com/
>>
>> Paste the main content of [1] here:
>>
>> "
>> There are 3 cases in unmap_and_move:
>>
>> 1.page is freed through "if (page_count(page) == 1)" code block. This works
>> as PG_active and PG_unevictable are cleared here.
>>
>> 2. Failed to migrate the page. The page won't be release so we don't care about it.
> 
> Right, page is un-isolated.
> 
>>
>> 3. The page is migrated successfully. The PG_active and PG_unevictable are cleared
>> via folio_migrate_flags():
>>
>> 	if (folio_test_clear_active(folio)) {
>> 		VM_BUG_ON_FOLIO(folio_test_unevictable(folio), folio);
>> 		folio_set_active(newfolio);
>> 	} else if (folio_test_clear_unevictable(folio))
>> 		folio_set_unevictable(newfolio);
> 
> Right.
> 
>>
>> For the above race case, the page won't be freed through "if (page_count(page) == 1)" code block.
>> It will just be migrated and freed via put_page() after folio_migrate_flags() having cleared PG_active
>> and PG_unevictable.
>> "
>> Or Am I miss something again? :)
> 
> For #1, I'm still not sure what would happen on a speculative reference.
> 
> It's worth summarizing that
> 
> a) free_pages_prepare() will clear both flags via page->flags &=
> ~PAGE_FLAGS_CHECK_AT_PREP;
> 
> b) free_pages_prepare() will bail out if any flag is set in
> check_free_page().
> 
> As we've never seen b) in the wild, this certainly has low priority, and
> maybe it really cannot happen right now.
> 
> However, maybe really allowing these flags to be set when freeing the
> page and removing the "page_count(page) == 1" case from migration code
> would be the clean thing to do.

IMHO, check_free_page is used to catch possible problems. There's this comment on PAGE_FLAGS_CHECK_AT_FREE:

/*
 * Flags checked when a page is freed.  Pages being freed should not have
 * these flags set.  If they are, there is a problem.
 */
#define PAGE_FLAGS_CHECK_AT_FREE

There might be an assumption: when a page is freed, it shouldn't be an active or unevictable page. It should be
inactive and evictable. So allowing these flags to be set when freeing the page might not be a good idea?

Thanks!

>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 2/4] mm/migration: remove unneeded lock page and PageMovable check
  2022-06-02  7:40                         ` Miaohe Lin
@ 2022-06-02  8:47                           ` David Hildenbrand
  2022-06-07  2:20                             ` Miaohe Lin
  0 siblings, 1 reply; 34+ messages in thread
From: David Hildenbrand @ 2022-06-02  8:47 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: ying.huang, hch, dhowells, cl, linux-mm, linux-kernel, akpm,
	mike.kravetz, naoya.horiguchi, Minchan Kim

On 02.06.22 09:40, Miaohe Lin wrote:
> On 2022/6/1 18:31, David Hildenbrand wrote:
>> On 31.05.22 14:37, Miaohe Lin wrote:
>>> On 2022/5/31 19:59, David Hildenbrand wrote:
>>>> Sorry for the late reply, was on vacation.
>>>
>>> That's all right. Hope you have a great time. ;)
>>>
>>>>
>>>>>>>
>>>>>>> But for isolated page, PageLRU is cleared. So when the isolated page is released, __clear_page_lru_flags
>>>>>>> won't be called. So we have to clear the PG_active and PG_unevictable here manully. So I think
>>>>>>> this code block works. Or am I miss something again?
>>>>>>
>>>>>> Let's assume the following: page as freed by the owner and we enter
>>>>>> unmap_and_move().
>>>>>>
>>>>>>
>>>>>> #1: enter unmap_and_move() // page_count is 1
>>>>>> #2: enter isolate_movable_page() // page_count is 1
>>>>>> #2: get_page_unless_zero() // page_count is now 2
>>>>>> #1: if (page_count(page) == 1) { // does not trigger
>>>>>> #2: put_page(page); // page_count is now 1
>>>>>> #1: put_page(page); // page_count is now 0 -> freed
>>>>>>
>>>>>>
>>>>>> #1 will trigger __put_page() -> __put_single_page() ->
>>>>>> __page_cache_release() will not clear the flags because it's not an LRU
>>>>>> page at that point in time, right (-> isolated)?
>>>>>
>>>>> Sorry, you're right. I thought the old page will be freed via putback_lru_page which will
>>>>> set PageLRU back instead of put_page directly. So if the above race occurs, PG_active and
>>>>> PG_unevictable will remain set while page goes to the buddy and check_free_page will complain
>>>>> about it. But it seems this is never witnessed?
>>>>
>>>> Maybe
>>>>
>>>> a) we were lucky so far and didn't trigger it
>>>> b) the whole code block is dead code because we are missing something
>>>> c) we are missing something else :)
>>>
>>> I think I found the things we missed in another email [1].
>>> [1]: https://lore.kernel.org/all/948ea45e-3b2b-e16c-5b8c-4c34de0ea593@huawei.com/
>>>
>>> Paste the main content of [1] here:
>>>
>>> "
>>> There are 3 cases in unmap_and_move:
>>>
>>> 1.page is freed through "if (page_count(page) == 1)" code block. This works
>>> as PG_active and PG_unevictable are cleared here.
>>>
>>> 2. Failed to migrate the page. The page won't be release so we don't care about it.
>>
>> Right, page is un-isolated.
>>
>>>
>>> 3. The page is migrated successfully. The PG_active and PG_unevictable are cleared
>>> via folio_migrate_flags():
>>>
>>> 	if (folio_test_clear_active(folio)) {
>>> 		VM_BUG_ON_FOLIO(folio_test_unevictable(folio), folio);
>>> 		folio_set_active(newfolio);
>>> 	} else if (folio_test_clear_unevictable(folio))
>>> 		folio_set_unevictable(newfolio);
>>
>> Right.
>>
>>>
>>> For the above race case, the page won't be freed through "if (page_count(page) == 1)" code block.
>>> It will just be migrated and freed via put_page() after folio_migrate_flags() having cleared PG_active
>>> and PG_unevictable.
>>> "
>>> Or Am I miss something again? :)
>>
>> For #1, I'm still not sure what would happen on a speculative reference.
>>
>> It's worth summarizing that
>>
>> a) free_pages_prepare() will clear both flags via page->flags &=
>> ~PAGE_FLAGS_CHECK_AT_PREP;
>>
>> b) free_pages_prepare() will bail out if any flag is set in
>> check_free_page().
>>
>> As we've never seen b) in the wild, this certainly has low priority, and
>> maybe it really cannot happen right now.
>>
>> However, maybe really allowing these flags to be set when freeing the
>> page and removing the "page_count(page) == 1" case from migration code
>> would be the clean thing to do.
> 
> IMHO, check_free_page is used to catch possible problem. There's the comment of PAGE_FLAGS_CHECK_AT_FREE:
> 
> /*
>  * Flags checked when a page is freed.  Pages being freed should not have
>  * these flags set.  If they are, there is a problem.
>  */
> #define PAGE_FLAGS_CHECK_AT_FREE
> 
> There might be an assumption: when page is freed, it shouldn't be an active or unevictable page. It should be
> inactive and evictable. So allowing these flags to be set when freeing the page might not be a good idea?

Yeah, and we'd be lifting that restriction because there is good reason
to do so.

Maybe we *could* special case for isolated pages; however, that adds
runtime overhead. Of course, we could perform different checks for e.g.,
DEBUG_VM vs !DEBUG_VM.

-- 
Thanks,

David / dhildenb


* Re: [PATCH v2 2/4] mm/migration: remove unneeded lock page and PageMovable check
  2022-06-02  8:47                           ` David Hildenbrand
@ 2022-06-07  2:20                             ` Miaohe Lin
  2022-06-08 10:05                               ` David Hildenbrand
  0 siblings, 1 reply; 34+ messages in thread
From: Miaohe Lin @ 2022-06-07  2:20 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: ying.huang, hch, dhowells, cl, linux-mm, linux-kernel, akpm,
	mike.kravetz, naoya.horiguchi, Minchan Kim

On 2022/6/2 16:47, David Hildenbrand wrote:
> On 02.06.22 09:40, Miaohe Lin wrote:
>> On 2022/6/1 18:31, David Hildenbrand wrote:
>>> On 31.05.22 14:37, Miaohe Lin wrote:
>>>> On 2022/5/31 19:59, David Hildenbrand wrote:
>>>>> Sorry for the late reply, was on vacation.
>>>>
>>>> That's all right. Hope you have a great time. ;)
>>>>
>>>>>
>>>>>>>>
>>>>>>>> But for isolated page, PageLRU is cleared. So when the isolated page is released, __clear_page_lru_flags
>>>>>>>> won't be called. So we have to clear the PG_active and PG_unevictable here manully. So I think
>>>>>>>> this code block works. Or am I miss something again?
>>>>>>>
>>>>>>> Let's assume the following: page as freed by the owner and we enter
>>>>>>> unmap_and_move().
>>>>>>>
>>>>>>>
>>>>>>> #1: enter unmap_and_move() // page_count is 1
>>>>>>> #2: enter isolate_movable_page() // page_count is 1
>>>>>>> #2: get_page_unless_zero() // page_count is now 2
>>>>>>> #1: if (page_count(page) == 1) { // does not trigger
>>>>>>> #2: put_page(page); // page_count is now 1
>>>>>>> #1: put_page(page); // page_count is now 0 -> freed
>>>>>>>
>>>>>>>
>>>>>>> #1 will trigger __put_page() -> __put_single_page() ->
>>>>>>> __page_cache_release() will not clear the flags because it's not an LRU
>>>>>>> page at that point in time, right (-> isolated)?
>>>>>>
>>>>>> Sorry, you're right. I thought the old page will be freed via putback_lru_page which will
>>>>>> set PageLRU back instead of put_page directly. So if the above race occurs, PG_active and
>>>>>> PG_unevictable will remain set while page goes to the buddy and check_free_page will complain
>>>>>> about it. But it seems this is never witnessed?
>>>>>
>>>>> Maybe
>>>>>
>>>>> a) we were lucky so far and didn't trigger it
>>>>> b) the whole code block is dead code because we are missing something
>>>>> c) we are missing something else :)
>>>>
>>>> I think I found the things we missed in another email [1].
>>>> [1]: https://lore.kernel.org/all/948ea45e-3b2b-e16c-5b8c-4c34de0ea593@huawei.com/
>>>>
>>>> Paste the main content of [1] here:
>>>>
>>>> "
>>>> There are 3 cases in unmap_and_move:
>>>>
>>>> 1.page is freed through "if (page_count(page) == 1)" code block. This works
>>>> as PG_active and PG_unevictable are cleared here.
>>>>
>>>> 2. Failed to migrate the page. The page won't be release so we don't care about it.
>>>
>>> Right, page is un-isolated.
>>>
>>>>
>>>> 3. The page is migrated successfully. The PG_active and PG_unevictable are cleared
>>>> via folio_migrate_flags():
>>>>
>>>> 	if (folio_test_clear_active(folio)) {
>>>> 		VM_BUG_ON_FOLIO(folio_test_unevictable(folio), folio);
>>>> 		folio_set_active(newfolio);
>>>> 	} else if (folio_test_clear_unevictable(folio))
>>>> 		folio_set_unevictable(newfolio);
>>>
>>> Right.
>>>
>>>>
>>>> For the above race case, the page won't be freed through "if (page_count(page) == 1)" code block.
>>>> It will just be migrated and freed via put_page() after folio_migrate_flags() having cleared PG_active
>>>> and PG_unevictable.
>>>> "
>>>> Or Am I miss something again? :)
>>>
>>> For #1, I'm still not sure what would happen on a speculative reference.
>>>
>>> It's worth summarizing that
>>>
>>> a) free_pages_prepare() will clear both flags via page->flags &=
>>> ~PAGE_FLAGS_CHECK_AT_PREP;
>>>
>>> b) free_pages_prepare() will bail out if any flag is set in
>>> check_free_page().
>>>
>>> As we've never seen b) in the wild, this certainly has low priority, and
>>> maybe it really cannot happen right now.
>>>
>>> However, maybe really allowing these flags to be set when freeing the
>>> page and removing the "page_count(page) == 1" case from migration code
>>> would be the clean thing to do.
>>
>> IMHO, check_free_page is used to catch possible problem. There's the comment of PAGE_FLAGS_CHECK_AT_FREE:
>>
>> /*
>>  * Flags checked when a page is freed.  Pages being freed should not have
>>  * these flags set.  If they are, there is a problem.
>>  */
>> #define PAGE_FLAGS_CHECK_AT_FREE
>>
>> There might be an assumption: when page is freed, it shouldn't be an active or unevictable page. It should be
>> inactive and evictable. So allowing these flags to be set when freeing the page might not be a good idea?
> 
> Yeah, and we'd be lifting that restriction because there is good reason
> to do so.
> 
> Maybe we *could* special case for isolated pages; however, that adds
> runtime overhead. Of course, we could perform different checks for e.g.,
> DEBUG_VM vs !DEBUG_VM.

I found one assumption about PG_active and PG_unevictable, namely in __folio_clear_lru_flags:

	/* this shouldn't happen, so leave the flags to bad_page() */
	if (folio_test_active(folio) && folio_test_unevictable(folio))
		return;

If PG_active and PG_unevictable are both set, this case will be caught by bad_page() via check_free_page().
There might be some other assumptions about PG_active and PG_unevictable, so I think it's not safe to lift that
restriction.

But maybe we could limit this check to DEBUG_VM as you suggested. Should I do that?

Thanks!

> 


* Re: [PATCH v2 2/4] mm/migration: remove unneeded lock page and PageMovable check
  2022-06-07  2:20                             ` Miaohe Lin
@ 2022-06-08 10:05                               ` David Hildenbrand
  2022-06-08 13:31                                 ` Miaohe Lin
  0 siblings, 1 reply; 34+ messages in thread
From: David Hildenbrand @ 2022-06-08 10:05 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: ying.huang, hch, dhowells, cl, linux-mm, linux-kernel, akpm,
	mike.kravetz, naoya.horiguchi, Minchan Kim

On 07.06.22 04:20, Miaohe Lin wrote:
> On 2022/6/2 16:47, David Hildenbrand wrote:
>> On 02.06.22 09:40, Miaohe Lin wrote:
>>> On 2022/6/1 18:31, David Hildenbrand wrote:
>>>> On 31.05.22 14:37, Miaohe Lin wrote:
>>>>> On 2022/5/31 19:59, David Hildenbrand wrote:
>>>>>> Sorry for the late reply, was on vacation.
>>>>>
>>>>> That's all right. Hope you have a great time. ;)
>>>>>
>>>>>>
>>>>>>>>>
>>>>>>>>> But for isolated page, PageLRU is cleared. So when the isolated page is released, __clear_page_lru_flags
>>>>>>>>> won't be called. So we have to clear the PG_active and PG_unevictable here manully. So I think
>>>>>>>>> this code block works. Or am I miss something again?
>>>>>>>>
>>>>>>>> Let's assume the following: page as freed by the owner and we enter
>>>>>>>> unmap_and_move().
>>>>>>>>
>>>>>>>>
>>>>>>>> #1: enter unmap_and_move() // page_count is 1
>>>>>>>> #2: enter isolate_movable_page() // page_count is 1
>>>>>>>> #2: get_page_unless_zero() // page_count is now 2
>>>>>>>> #1: if (page_count(page) == 1) { // does not trigger
>>>>>>>> #2: put_page(page); // page_count is now 1
>>>>>>>> #1: put_page(page); // page_count is now 0 -> freed
>>>>>>>>
>>>>>>>>
>>>>>>>> #1 will trigger __put_page() -> __put_single_page() ->
>>>>>>>> __page_cache_release() will not clear the flags because it's not an LRU
>>>>>>>> page at that point in time, right (-> isolated)?
>>>>>>>
>>>>>>> Sorry, you're right. I thought the old page will be freed via putback_lru_page which will
>>>>>>> set PageLRU back instead of put_page directly. So if the above race occurs, PG_active and
>>>>>>> PG_unevictable will remain set while page goes to the buddy and check_free_page will complain
>>>>>>> about it. But it seems this is never witnessed?
>>>>>>
>>>>>> Maybe
>>>>>>
>>>>>> a) we were lucky so far and didn't trigger it
>>>>>> b) the whole code block is dead code because we are missing something
>>>>>> c) we are missing something else :)
>>>>>
>>>>> I think I found the things we missed in another email [1].
>>>>> [1]: https://lore.kernel.org/all/948ea45e-3b2b-e16c-5b8c-4c34de0ea593@huawei.com/
>>>>>
>>>>> Paste the main content of [1] here:
>>>>>
>>>>> "
>>>>> There are 3 cases in unmap_and_move:
>>>>>
>>>>> 1.page is freed through "if (page_count(page) == 1)" code block. This works
>>>>> as PG_active and PG_unevictable are cleared here.
>>>>>
>>>>> 2. Failed to migrate the page. The page won't be release so we don't care about it.
>>>>
>>>> Right, page is un-isolated.
>>>>
>>>>>
>>>>> 3. The page is migrated successfully. The PG_active and PG_unevictable are cleared
>>>>> via folio_migrate_flags():
>>>>>
>>>>> 	if (folio_test_clear_active(folio)) {
>>>>> 		VM_BUG_ON_FOLIO(folio_test_unevictable(folio), folio);
>>>>> 		folio_set_active(newfolio);
>>>>> 	} else if (folio_test_clear_unevictable(folio))
>>>>> 		folio_set_unevictable(newfolio);
>>>>
>>>> Right.
>>>>
>>>>>
>>>>> For the above race case, the page won't be freed through "if (page_count(page) == 1)" code block.
>>>>> It will just be migrated and freed via put_page() after folio_migrate_flags() having cleared PG_active
>>>>> and PG_unevictable.
>>>>> "
>>>>> Or Am I miss something again? :)
>>>>
>>>> For #1, I'm still not sure what would happen on a speculative reference.
>>>>
>>>> It's worth summarizing that
>>>>
>>>> a) free_pages_prepare() will clear both flags via page->flags &=
>>>> ~PAGE_FLAGS_CHECK_AT_PREP;
>>>>
>>>> b) free_pages_prepare() will bail out if any flag is set in
>>>> check_free_page().
>>>>
>>>> As we've never seen b) in the wild, this certainly has low priority, and
>>>> maybe it really cannot happen right now.
>>>>
>>>> However, maybe really allowing these flags to be set when freeing the
>>>> page and removing the "page_count(page) == 1" case from migration code
>>>> would be the clean thing to do.
>>>
>>> IMHO, check_free_page is used to catch possible problem. There's the comment of PAGE_FLAGS_CHECK_AT_FREE:
>>>
>>> /*
>>>  * Flags checked when a page is freed.  Pages being freed should not have
>>>  * these flags set.  If they are, there is a problem.
>>>  */
>>> #define PAGE_FLAGS_CHECK_AT_FREE
>>>
>>> There might be an assumption: when page is freed, it shouldn't be an active or unevictable page. It should be
>>> inactive and evictable. So allowing these flags to be set when freeing the page might not be a good idea?
>>
>> Yeah, and we'd be lifting that restriction because there is good reason
>> to do so.
>>
>> Maybe we *could* special case for isolated pages; however, that adds
>> runtime overhead. Of course, we could perform different checks for e.g.,
>> DEBUG_VM vs !DEBUG_VM.
> 
> I found there is one assumption about PG_active and PG_unevictable, i.e. in __folio_clear_lru_flags:
> 
> 	/* this shouldn't happen, so leave the flags to bad_page() */
> 	if (folio_test_active(folio) && folio_test_unevictable(folio))
> 		return;
> 
> If PG_active and PG_unevictable are both set, this case will be caught in the bad_page() via check_free_page().
> There might be some other assumptions about PG_active and PG_unevictable. So I think it's not safe to lift that
> restriction.
> 
> But maybe we could limit this check within DEBUG_VM as you suggested. Am I supposed to do it?

Well, if you want, you can look into ways of cleaning that up and
removing the "if there is more than one reference, the owner hasn't
freed the page" condition, because there are corner cases where the
owner might have freed the page but speculative references keep the
refcount temporarily incremented.

-- 
Thanks,

David / dhildenb


* Re: [PATCH v2 2/4] mm/migration: remove unneeded lock page and PageMovable check
  2022-06-08 10:05                               ` David Hildenbrand
@ 2022-06-08 13:31                                 ` Miaohe Lin
  0 siblings, 0 replies; 34+ messages in thread
From: Miaohe Lin @ 2022-06-08 13:31 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: ying.huang, hch, dhowells, cl, linux-mm, linux-kernel, akpm,
	mike.kravetz, naoya.horiguchi, Minchan Kim

On 2022/6/8 18:05, David Hildenbrand wrote:
> On 07.06.22 04:20, Miaohe Lin wrote:
>> On 2022/6/2 16:47, David Hildenbrand wrote:
>>> On 02.06.22 09:40, Miaohe Lin wrote:
>>>> On 2022/6/1 18:31, David Hildenbrand wrote:
>>>>> On 31.05.22 14:37, Miaohe Lin wrote:
>>>>>> On 2022/5/31 19:59, David Hildenbrand wrote:
>>>>>>> Sorry for the late reply, was on vacation.
>>>>>>
>>>>>> That's all right. Hope you have a great time. ;)
>>>>>>
>>>>>>>
>>>>>>>>>>
>>>>>>>>>> But for isolated page, PageLRU is cleared. So when the isolated page is released, __clear_page_lru_flags
>>>>>>>>>> won't be called. So we have to clear the PG_active and PG_unevictable here manully. So I think
>>>>>>>>>> this code block works. Or am I miss something again?
>>>>>>>>>
>>>>>>>>> Let's assume the following: page as freed by the owner and we enter
>>>>>>>>> unmap_and_move().
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> #1: enter unmap_and_move() // page_count is 1
>>>>>>>>> #2: enter isolate_movable_page() // page_count is 1
>>>>>>>>> #2: get_page_unless_zero() // page_count is now 2
>>>>>>>>> #1: if (page_count(page) == 1) { // does not trigger
>>>>>>>>> #2: put_page(page); // page_count is now 1
>>>>>>>>> #1: put_page(page); // page_count is now 0 -> freed
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> #1 will trigger __put_page() -> __put_single_page() ->
>>>>>>>>> __page_cache_release() will not clear the flags because it's not an LRU
>>>>>>>>> page at that point in time, right (-> isolated)?
>>>>>>>>
>>>>>>>> Sorry, you're right. I thought the old page will be freed via putback_lru_page which will
>>>>>>>> set PageLRU back instead of put_page directly. So if the above race occurs, PG_active and
>>>>>>>> PG_unevictable will remain set while page goes to the buddy and check_free_page will complain
>>>>>>>> about it. But it seems this is never witnessed?
>>>>>>>
>>>>>>> Maybe
>>>>>>>
>>>>>>> a) we were lucky so far and didn't trigger it
>>>>>>> b) the whole code block is dead code because we are missing something
>>>>>>> c) we are missing something else :)
>>>>>>
>>>>>> I think I found the things we missed in another email [1].
>>>>>> [1]: https://lore.kernel.org/all/948ea45e-3b2b-e16c-5b8c-4c34de0ea593@huawei.com/
>>>>>>
>>>>>> Paste the main content of [1] here:
>>>>>>
>>>>>> "
>>>>>> There are 3 cases in unmap_and_move:
>>>>>>
>>>>>> 1.page is freed through "if (page_count(page) == 1)" code block. This works
>>>>>> as PG_active and PG_unevictable are cleared here.
>>>>>>
>>>>>> 2. Failed to migrate the page. The page won't be release so we don't care about it.
>>>>>
>>>>> Right, page is un-isolated.
>>>>>
>>>>>>
>>>>>> 3. The page is migrated successfully. The PG_active and PG_unevictable are cleared
>>>>>> via folio_migrate_flags():
>>>>>>
>>>>>> 	if (folio_test_clear_active(folio)) {
>>>>>> 		VM_BUG_ON_FOLIO(folio_test_unevictable(folio), folio);
>>>>>> 		folio_set_active(newfolio);
>>>>>> 	} else if (folio_test_clear_unevictable(folio))
>>>>>> 		folio_set_unevictable(newfolio);
>>>>>
>>>>> Right.
>>>>>
>>>>>>
>>>>>> For the above race case, the page won't be freed through "if (page_count(page) == 1)" code block.
>>>>>> It will just be migrated and freed via put_page() after folio_migrate_flags() having cleared PG_active
>>>>>> and PG_unevictable.
>>>>>> "
>>>>>> Or Am I miss something again? :)
>>>>>
>>>>> For #1, I'm still not sure what would happen on a speculative reference.
>>>>>
>>>>> It's worth summarizing that
>>>>>
>>>>> a) free_pages_prepare() will clear both flags via page->flags &=
>>>>> ~PAGE_FLAGS_CHECK_AT_PREP;
>>>>>
>>>>> b) free_pages_prepare() will bail out if any flag is set in
>>>>> check_free_page().
>>>>>
>>>>> As we've never seen b) in the wild, this certainly has low priority, and
>>>>> maybe it really cannot happen right now.
>>>>>
>>>>> However, maybe really allowing these flags to be set when freeing the
>>>>> page and removing the "page_count(page) == 1" case from migration code
>>>>> would be the clean thing to do.
>>>>
>>>> IMHO, check_free_page is used to catch possible problem. There's the comment of PAGE_FLAGS_CHECK_AT_FREE:
>>>>
>>>> /*
>>>>  * Flags checked when a page is freed.  Pages being freed should not have
>>>>  * these flags set.  If they are, there is a problem.
>>>>  */
>>>> #define PAGE_FLAGS_CHECK_AT_FREE
>>>>
>>>> There might be an assumption: when page is freed, it shouldn't be an active or unevictable page. It should be
>>>> inactive and evictable. So allowing these flags to be set when freeing the page might not be a good idea?
>>>
>>> Yeah, and we'd be lifting that restriction because there is good reason
>>> to do so.
>>>
>>> Maybe we *could* special case for isolated pages; however, that adds
>>> runtime overhead. Of course, we could perform different checks for e.g.,
>>> DEBUG_VM vs !DEBUG_VM.
>>
>> I found there is one assumption about PG_active and PG_unevictable, i.e. in __folio_clear_lru_flags:
>>
>> 	/* this shouldn't happen, so leave the flags to bad_page() */
>> 	if (folio_test_active(folio) && folio_test_unevictable(folio))
>> 		return;
>>
>> If PG_active and PG_unevictable are both set, this case will be caught in the bad_page() via check_free_page().
>> There might be some other assumptions about PG_active and PG_unevictable. So I think it's not safe to lift that
>> restriction.
>>
>> But maybe we could limit this check within DEBUG_VM as you suggested. Am I supposed to do it?
> 
> Well, if you want, you can look into ways of cleaning that up and
> removing the "if there is more than one reference, the owner hasn't
> freed the page" condition, because there are corner cases where the
> owner might have freed the page but speculative references keep the
> refcount temporarily incremented.
>

Let me queue it to my TODO list. :)

Thanks for your valuable suggestion!

end of thread, other threads:[~2022-06-08 13:32 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-25 13:27 [PATCH v2 0/4] A few cleanup and fixup patches for migration Miaohe Lin
2022-04-25 13:27 ` [PATCH v2 1/4] mm/migration: reduce the rcu lock duration Miaohe Lin
2022-04-29  9:54   ` David Hildenbrand
2022-05-09  3:14     ` Miaohe Lin
2022-05-24 12:36     ` Miaohe Lin
2022-05-06  3:23   ` ying.huang
2022-05-09  3:20     ` Miaohe Lin
2022-04-25 13:27 ` [PATCH v2 2/4] mm/migration: remove unneeded lock page and PageMovable check Miaohe Lin
2022-04-29 10:07   ` David Hildenbrand
2022-05-09  8:51     ` Miaohe Lin
2022-05-11 15:23       ` David Hildenbrand
2022-05-12  2:25         ` Miaohe Lin
2022-05-12  7:10           ` David Hildenbrand
2022-05-12 13:26             ` Miaohe Lin
2022-05-12 16:50               ` David Hildenbrand
2022-05-16  2:44                 ` Miaohe Lin
2022-05-31 11:59                   ` David Hildenbrand
2022-05-31 12:37                     ` Miaohe Lin
2022-06-01 10:31                       ` David Hildenbrand
2022-06-02  7:40                         ` Miaohe Lin
2022-06-02  8:47                           ` David Hildenbrand
2022-06-07  2:20                             ` Miaohe Lin
2022-06-08 10:05                               ` David Hildenbrand
2022-06-08 13:31                                 ` Miaohe Lin
2022-05-24 12:47                 ` Miaohe Lin
2022-04-25 13:27 ` [PATCH v2 3/4] mm/migration: return errno when isolate_huge_page failed Miaohe Lin
2022-04-29 10:08   ` David Hildenbrand
2022-05-09  8:03     ` Miaohe Lin
2022-04-29 11:36   ` Muchun Song
2022-05-09  3:23     ` Miaohe Lin
2022-05-09  4:21       ` Muchun Song
2022-05-09  7:51         ` Miaohe Lin
2022-04-25 13:27 ` [PATCH v2 4/4] mm/migration: fix potential pte_unmap on an not mapped pte Miaohe Lin
2022-04-29  9:48   ` David Hildenbrand
