* [PATCH v2 0/3] Fix cache flush issues considering PMD sharing
@ 2022-04-27 10:52 Baolin Wang
  2022-04-27 10:52 ` [PATCH v2 1/3] mm: hugetlb: Considering PMD sharing when flushing cache/TLBs Baolin Wang
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Baolin Wang @ 2022-04-27 10:52 UTC (permalink / raw)
  To: akpm, mike.kravetz
  Cc: almasrymina, songmuchun, baolin.wang, linux-mm, linux-kernel

Hi,

This patch set fixes some cache flushing issues that can occur when PMD
sharing is possible for hugetlb pages; they were found by code
inspection. Meanwhile, Mike found that flush_cache_page() cannot cover
the whole size of a hugetlb page on some architectures [1], so I added a
new patch 3 to fix this issue, since after some investigation I found
that only try_to_unmap_one() and try_to_migrate_one() need fixing.

Please help to review. Thanks.

[1] https://lore.kernel.org/linux-mm/064da3bb-5b4b-7332-a722-c5a541128705@oracle.com/

Changes from v1:
 - Add Reviewed-by tag from Mike.
 - Update some comments as suggested by Mike.
 - Add a new patch to fix the cache flush issue for hugetlb pages.

Baolin Wang (3):
  mm: hugetlb: Considering PMD sharing when flushing cache/TLBs
  mm: rmap: Move the cache flushing to the correct place for hugetlb PMD
    sharing
  mm: rmap: Use flush_cache_range() to flush cache for hugetlb pages

 mm/hugetlb.c |  17 ++++++++-
 mm/mremap.c  |   2 +-
 mm/rmap.c    | 118 ++++++++++++++++++++++++++++++++---------------------------
 3 files changed, 80 insertions(+), 57 deletions(-)

-- 
1.8.3.1



* [PATCH v2 1/3] mm: hugetlb: Considering PMD sharing when flushing cache/TLBs
  2022-04-27 10:52 [PATCH v2 0/3] Fix cache flush issues considering PMD sharing Baolin Wang
@ 2022-04-27 10:52 ` Baolin Wang
  2022-04-28  2:55   ` Muchun Song
  2022-04-27 10:52 ` [PATCH v2 2/3] mm: rmap: Move the cache flushing to the correct place for hugetlb PMD sharing Baolin Wang
  2022-04-27 10:52 ` [PATCH v2 3/3] mm: rmap: Use flush_cache_range() to flush cache for hugetlb pages Baolin Wang
  2 siblings, 1 reply; 11+ messages in thread
From: Baolin Wang @ 2022-04-27 10:52 UTC (permalink / raw)
  To: akpm, mike.kravetz
  Cc: almasrymina, songmuchun, baolin.wang, linux-mm, linux-kernel

When moving hugetlb page tables, the cache flushing is called in
move_page_tables() without considering the shared PMDs, which may
cause cache issues on some architectures.

Thus move the hugetlb cache flushing into move_hugetlb_page_tables(),
taking into account the shared PMD range calculated by
adjust_range_if_pmd_sharing_possible(). Meanwhile, also expand the TLB
flushing range in case of shared PMDs.

Note this was discovered via code inspection and has not caused a real
problem in practice so far.
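
For reference, a simplified sketch of the range adjustment relied on
above; the real adjust_range_if_pmd_sharing_possible() in mm/hugetlb.c
has additional checks (the VMA must span an aligned PUD and the range
must overlap it), so this is illustrative only:

	/*
	 * Illustrative sketch: unsharing drops a whole PMD page, so when
	 * sharing is possible the cache/TLB flush range is widened to
	 * PUD_SIZE aligned boundaries.
	 */
	static void sharing_range_sketch(struct vm_area_struct *vma,
					 unsigned long *start, unsigned long *end)
	{
		if (!(vma->vm_flags & VM_MAYSHARE))
			return;

		*start = ALIGN_DOWN(*start, PUD_SIZE);
		*end = ALIGN(*end, PUD_SIZE);
	}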

Fixes: 550a7d60bd5e ("mm, hugepages: add mremap() support for hugepage backed vma")
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 mm/hugetlb.c | 17 +++++++++++++++--
 mm/mremap.c  |  2 +-
 2 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 1945dfb..d3a6094 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4937,10 +4937,17 @@ int move_hugetlb_page_tables(struct vm_area_struct *vma,
 	unsigned long old_addr_copy;
 	pte_t *src_pte, *dst_pte;
 	struct mmu_notifier_range range;
+	bool shared_pmd = false;
 
 	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, mm, old_addr,
 				old_end);
 	adjust_range_if_pmd_sharing_possible(vma, &range.start, &range.end);
+	/*
+	 * In case of shared PMDs, we should cover the maximum possible
+	 * range.
+	 */
+	flush_cache_range(vma, range.start, range.end);
+
 	mmu_notifier_invalidate_range_start(&range);
 	/* Prevent race with file truncation */
 	i_mmap_lock_write(mapping);
@@ -4957,8 +4964,10 @@ int move_hugetlb_page_tables(struct vm_area_struct *vma,
 		 */
 		old_addr_copy = old_addr;
 
-		if (huge_pmd_unshare(mm, vma, &old_addr_copy, src_pte))
+		if (huge_pmd_unshare(mm, vma, &old_addr_copy, src_pte)) {
+			shared_pmd = true;
 			continue;
+		}
 
 		dst_pte = huge_pte_alloc(mm, new_vma, new_addr, sz);
 		if (!dst_pte)
@@ -4966,7 +4975,11 @@ int move_hugetlb_page_tables(struct vm_area_struct *vma,
 
 		move_huge_pte(vma, old_addr, new_addr, src_pte, dst_pte);
 	}
-	flush_tlb_range(vma, old_end - len, old_end);
+
+	if (shared_pmd)
+		flush_tlb_range(vma, range.start, range.end);
+	else
+		flush_tlb_range(vma, old_end - len, old_end);
 	mmu_notifier_invalidate_range_end(&range);
 	i_mmap_unlock_write(mapping);
 
diff --git a/mm/mremap.c b/mm/mremap.c
index 98f50e6..0970025 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -490,12 +490,12 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
 		return 0;
 
 	old_end = old_addr + len;
-	flush_cache_range(vma, old_addr, old_end);
 
 	if (is_vm_hugetlb_page(vma))
 		return move_hugetlb_page_tables(vma, new_vma, old_addr,
 						new_addr, len);
 
+	flush_cache_range(vma, old_addr, old_end);
 	mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0, vma, vma->vm_mm,
 				old_addr, old_end);
 	mmu_notifier_invalidate_range_start(&range);
-- 
1.8.3.1



* [PATCH v2 2/3] mm: rmap: Move the cache flushing to the correct place for hugetlb PMD sharing
  2022-04-27 10:52 [PATCH v2 0/3] Fix cache flush issues considering PMD sharing Baolin Wang
  2022-04-27 10:52 ` [PATCH v2 1/3] mm: hugetlb: Considering PMD sharing when flushing cache/TLBs Baolin Wang
@ 2022-04-27 10:52 ` Baolin Wang
  2022-04-28  5:55   ` Muchun Song
  2022-04-27 10:52 ` [PATCH v2 3/3] mm: rmap: Use flush_cache_range() to flush cache for hugetlb pages Baolin Wang
  2 siblings, 1 reply; 11+ messages in thread
From: Baolin Wang @ 2022-04-27 10:52 UTC (permalink / raw)
  To: akpm, mike.kravetz
  Cc: almasrymina, songmuchun, baolin.wang, linux-mm, linux-kernel

The cache level flush must always come first when changing an existing
virtual->physical mapping to a new value, since this allows us to
properly handle systems whose caches are strict and require a
virtual->physical translation to exist for a virtual address. So we
should move the cache flushing before huge_pmd_unshare().

As Muchun pointed out [1], the architectures that currently support
hugetlb PMD sharing have no cache flush issues in practice. But I think
we should still follow the cache/TLB flushing rules when changing a
valid virtual address mapping, in case of potential issues in the
future.

[1] https://lore.kernel.org/all/YmT%2F%2FhuUbFX+KHcy@FVFYT0MHHV2J.usts.net/
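
As a sketch of the ordering being followed here (cf.
Documentation/core-api/cachetlb.rst; the wrapper function below is
purely illustrative, only the two flush calls are real kernel APIs):

	static void mapping_change_order_sketch(struct vm_area_struct *vma,
						unsigned long start, unsigned long end)
	{
		/*
		 * 1) Flush the caches while the old virtual->physical
		 *    translation still exists, for strict caches.
		 */
		flush_cache_range(vma, start, end);

		/* 2) Change or clear the page table entries, e.g. huge_pmd_unshare(). */

		/* 3) Only then flush the now-stale TLB entries. */
		flush_tlb_range(vma, start, end);
	}
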
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
 mm/rmap.c | 40 ++++++++++++++++++++++------------------
 1 file changed, 22 insertions(+), 18 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index 61e63db..4f0d115 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1535,15 +1535,16 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 			 * do this outside rmap routines.
 			 */
 			VM_BUG_ON(!(flags & TTU_RMAP_LOCKED));
+			/*
+			 * huge_pmd_unshare may unmap an entire PMD page.
+			 * There is no way of knowing exactly which PMDs may
+			 * be cached for this mm, so we must flush them all.
+			 * start/end were already adjusted above to cover this
+			 * range.
+			 */
+			flush_cache_range(vma, range.start, range.end);
+
 			if (huge_pmd_unshare(mm, vma, &address, pvmw.pte)) {
-				/*
-				 * huge_pmd_unshare unmapped an entire PMD
-				 * page.  There is no way of knowing exactly
-				 * which PMDs may be cached for this mm, so
-				 * we must flush them all.  start/end were
-				 * already adjusted above to cover this range.
-				 */
-				flush_cache_range(vma, range.start, range.end);
 				flush_tlb_range(vma, range.start, range.end);
 				mmu_notifier_invalidate_range(mm, range.start,
 							      range.end);
@@ -1560,13 +1561,14 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 				page_vma_mapped_walk_done(&pvmw);
 				break;
 			}
+		} else {
+			flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
 		}
 
 		/*
 		 * Nuke the page table entry. When having to clear
 		 * PageAnonExclusive(), we always have to flush.
 		 */
-		flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
 		if (should_defer_flush(mm, flags) && !anon_exclusive) {
 			/*
 			 * We clear the PTE but do not flush so potentially
@@ -1890,15 +1892,16 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
 			 * do this outside rmap routines.
 			 */
 			VM_BUG_ON(!(flags & TTU_RMAP_LOCKED));
+			/*
+			 * huge_pmd_unshare may unmap an entire PMD page.
+			 * There is no way of knowing exactly which PMDs may
+			 * be cached for this mm, so we must flush them all.
+			 * start/end were already adjusted above to cover this
+			 * range.
+			 */
+			flush_cache_range(vma, range.start, range.end);
+
 			if (huge_pmd_unshare(mm, vma, &address, pvmw.pte)) {
-				/*
-				 * huge_pmd_unshare unmapped an entire PMD
-				 * page.  There is no way of knowing exactly
-				 * which PMDs may be cached for this mm, so
-				 * we must flush them all.  start/end were
-				 * already adjusted above to cover this range.
-				 */
-				flush_cache_range(vma, range.start, range.end);
 				flush_tlb_range(vma, range.start, range.end);
 				mmu_notifier_invalidate_range(mm, range.start,
 							      range.end);
@@ -1915,10 +1918,11 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
 				page_vma_mapped_walk_done(&pvmw);
 				break;
 			}
+		} else {
+			flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
 		}
 
 		/* Nuke the page table entry. */
-		flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
 		pteval = ptep_clear_flush(vma, address, pvmw.pte);
 
 		/* Set the dirty flag on the folio now the pte is gone. */
-- 
1.8.3.1



* [PATCH v2 3/3] mm: rmap: Use flush_cache_range() to flush cache for hugetlb pages
  2022-04-27 10:52 [PATCH v2 0/3] Fix cache flush issues considering PMD sharing Baolin Wang
  2022-04-27 10:52 ` [PATCH v2 1/3] mm: hugetlb: Considering PMD sharing when flushing cache/TLBs Baolin Wang
  2022-04-27 10:52 ` [PATCH v2 2/3] mm: rmap: Move the cache flushing to the correct place for hugetlb PMD sharing Baolin Wang
@ 2022-04-27 10:52 ` Baolin Wang
  2022-05-03 20:17   ` Mike Kravetz
  2 siblings, 1 reply; 11+ messages in thread
From: Baolin Wang @ 2022-04-27 10:52 UTC (permalink / raw)
  To: akpm, mike.kravetz
  Cc: almasrymina, songmuchun, baolin.wang, linux-mm, linux-kernel

Now we use flush_cache_page() to flush the cache for anonymous hugetlb
pages when unmapping or migrating a hugetlb page mapping, but
flush_cache_page() only handles a PAGE_SIZE range on some architectures
(such as arm32 and arc), which can cause potential cache issues. Thus
change to flush_cache_range() to cover the whole size of a hugetlb page.
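
To illustrate the difference (sketch only; the two helper functions are
hypothetical, while flush_cache_page()/flush_cache_range() are the real
kernel interfaces):

	/*
	 * On arm32/arc-like architectures flush_cache_page() only covers
	 * PAGE_SIZE, so covering a hugetlb page that way would need a loop
	 * over every base page...
	 */
	static void flush_huge_by_page_sketch(struct vm_area_struct *vma,
					      unsigned long addr, unsigned long pfn,
					      unsigned long hpage_size)
	{
		unsigned long off;

		for (off = 0; off < hpage_size; off += PAGE_SIZE)
			flush_cache_page(vma, addr + off, pfn + (off >> PAGE_SHIFT));
	}

	/* ...whereas one ranged flush covers the whole hugetlb mapping. */
	static void flush_huge_by_range_sketch(struct vm_area_struct *vma,
					       unsigned long addr,
					       unsigned long hpage_size)
	{
		flush_cache_range(vma, addr, addr + hpage_size);
	}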

Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
 mm/rmap.c | 90 ++++++++++++++++++++++++++++++++++-----------------------------
 1 file changed, 48 insertions(+), 42 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index 4f0d115..6fdd198 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1528,13 +1528,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 		anon_exclusive = folio_test_anon(folio) &&
 				 PageAnonExclusive(subpage);
 
-		if (folio_test_hugetlb(folio) && !folio_test_anon(folio)) {
-			/*
-			 * To call huge_pmd_unshare, i_mmap_rwsem must be
-			 * held in write mode.  Caller needs to explicitly
-			 * do this outside rmap routines.
-			 */
-			VM_BUG_ON(!(flags & TTU_RMAP_LOCKED));
+		if (folio_test_hugetlb(folio)) {
 			/*
 			 * huge_pmd_unshare may unmap an entire PMD page.
 			 * There is no way of knowing exactly which PMDs may
@@ -1544,22 +1538,31 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 			 */
 			flush_cache_range(vma, range.start, range.end);
 
-			if (huge_pmd_unshare(mm, vma, &address, pvmw.pte)) {
-				flush_tlb_range(vma, range.start, range.end);
-				mmu_notifier_invalidate_range(mm, range.start,
-							      range.end);
-
+			if (!folio_test_anon(folio)) {
 				/*
-				 * The ref count of the PMD page was dropped
-				 * which is part of the way map counting
-				 * is done for shared PMDs.  Return 'true'
-				 * here.  When there is no other sharing,
-				 * huge_pmd_unshare returns false and we will
-				 * unmap the actual page and drop map count
-				 * to zero.
+				 * To call huge_pmd_unshare, i_mmap_rwsem must be
+				 * held in write mode.  Caller needs to explicitly
+				 * do this outside rmap routines.
 				 */
-				page_vma_mapped_walk_done(&pvmw);
-				break;
+				VM_BUG_ON(!(flags & TTU_RMAP_LOCKED));
+
+				if (huge_pmd_unshare(mm, vma, &address, pvmw.pte)) {
+					flush_tlb_range(vma, range.start, range.end);
+					mmu_notifier_invalidate_range(mm, range.start,
+								      range.end);
+
+					/*
+					 * The ref count of the PMD page was dropped
+					 * which is part of the way map counting
+					 * is done for shared PMDs.  Return 'true'
+					 * here.  When there is no other sharing,
+					 * huge_pmd_unshare returns false and we will
+					 * unmap the actual page and drop map count
+					 * to zero.
+					 */
+					page_vma_mapped_walk_done(&pvmw);
+					break;
+				}
 			}
 		} else {
 			flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
@@ -1885,13 +1888,7 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
 		anon_exclusive = folio_test_anon(folio) &&
 				 PageAnonExclusive(subpage);
 
-		if (folio_test_hugetlb(folio) && !folio_test_anon(folio)) {
-			/*
-			 * To call huge_pmd_unshare, i_mmap_rwsem must be
-			 * held in write mode.  Caller needs to explicitly
-			 * do this outside rmap routines.
-			 */
-			VM_BUG_ON(!(flags & TTU_RMAP_LOCKED));
+		if (folio_test_hugetlb(folio)) {
 			/*
 			 * huge_pmd_unshare may unmap an entire PMD page.
 			 * There is no way of knowing exactly which PMDs may
@@ -1901,22 +1898,31 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
 			 */
 			flush_cache_range(vma, range.start, range.end);
 
-			if (huge_pmd_unshare(mm, vma, &address, pvmw.pte)) {
-				flush_tlb_range(vma, range.start, range.end);
-				mmu_notifier_invalidate_range(mm, range.start,
-							      range.end);
-
+			if (!folio_test_anon(folio)) {
 				/*
-				 * The ref count of the PMD page was dropped
-				 * which is part of the way map counting
-				 * is done for shared PMDs.  Return 'true'
-				 * here.  When there is no other sharing,
-				 * huge_pmd_unshare returns false and we will
-				 * unmap the actual page and drop map count
-				 * to zero.
+				 * To call huge_pmd_unshare, i_mmap_rwsem must be
+				 * held in write mode.  Caller needs to explicitly
+				 * do this outside rmap routines.
 				 */
-				page_vma_mapped_walk_done(&pvmw);
-				break;
+				VM_BUG_ON(!(flags & TTU_RMAP_LOCKED));
+
+				if (huge_pmd_unshare(mm, vma, &address, pvmw.pte)) {
+					flush_tlb_range(vma, range.start, range.end);
+					mmu_notifier_invalidate_range(mm, range.start,
+								      range.end);
+
+					/*
+					 * The ref count of the PMD page was dropped
+					 * which is part of the way map counting
+					 * is done for shared PMDs.  Return 'true'
+					 * here.  When there is no other sharing,
+					 * huge_pmd_unshare returns false and we will
+					 * unmap the actual page and drop map count
+					 * to zero.
+					 */
+					page_vma_mapped_walk_done(&pvmw);
+					break;
+				}
 			}
 		} else {
 			flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
-- 
1.8.3.1



* Re: [PATCH v2 1/3] mm: hugetlb: Considering PMD sharing when flushing cache/TLBs
  2022-04-27 10:52 ` [PATCH v2 1/3] mm: hugetlb: Considering PMD sharing when flushing cache/TLBs Baolin Wang
@ 2022-04-28  2:55   ` Muchun Song
  0 siblings, 0 replies; 11+ messages in thread
From: Muchun Song @ 2022-04-28  2:55 UTC (permalink / raw)
  To: Baolin Wang; +Cc: akpm, mike.kravetz, almasrymina, linux-mm, linux-kernel

On Wed, Apr 27, 2022 at 06:52:05PM +0800, Baolin Wang wrote:
> When moving hugetlb page tables, the cache flushing is called in
> move_page_tables() without considering the shared PMDs, which may
> cause cache issues on some architectures.
> 
> Thus move the hugetlb cache flushing into move_hugetlb_page_tables(),
> taking into account the shared PMD range calculated by
> adjust_range_if_pmd_sharing_possible(). Meanwhile, also expand the TLB
> flushing range in case of shared PMDs.
> 
> Note this was discovered via code inspection and has not caused a real
> problem in practice so far.
> 
> Fixes: 550a7d60bd5e ("mm, hugepages: add mremap() support for hugepage backed vma")
> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>

LGTM.

Reviewed-by: Muchun Song <songmuchun@bytedance.com>

Thanks.


* Re: [PATCH v2 2/3] mm: rmap: Move the cache flushing to the correct place for hugetlb PMD sharing
  2022-04-27 10:52 ` [PATCH v2 2/3] mm: rmap: Move the cache flushing to the correct place for hugetlb PMD sharing Baolin Wang
@ 2022-04-28  5:55   ` Muchun Song
  2022-04-28  7:06     ` Baolin Wang
  2022-05-03 18:42     ` Mike Kravetz
  0 siblings, 2 replies; 11+ messages in thread
From: Muchun Song @ 2022-04-28  5:55 UTC (permalink / raw)
  To: Baolin Wang; +Cc: akpm, mike.kravetz, almasrymina, linux-mm, linux-kernel

On Wed, Apr 27, 2022 at 06:52:06PM +0800, Baolin Wang wrote:
> The cache level flush must always come first when changing an existing
> virtual->physical mapping to a new value, since this allows us to
> properly handle systems whose caches are strict and require a
> virtual->physical translation to exist for a virtual address. So we
> should move the cache flushing before huge_pmd_unshare().
>

Right.

> As Muchun pointed out [1], the architectures that currently support
> hugetlb PMD sharing have no cache flush issues in practice. But I think
> we should still follow the cache/TLB flushing rules when changing a
> valid virtual address mapping, in case of potential issues in the future.

Right. One point I need to clarify: I do not object to this change, but
I want you to clarify in the commit log that this is not an issue in
practice, so others know they do not need to backport it.

> 
> [1] https://lore.kernel.org/all/YmT%2F%2FhuUbFX+KHcy@FVFYT0MHHV2J.usts.net/
> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> ---
>  mm/rmap.c | 40 ++++++++++++++++++++++------------------
>  1 file changed, 22 insertions(+), 18 deletions(-)
> 
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 61e63db..4f0d115 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1535,15 +1535,16 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
>  			 * do this outside rmap routines.
>  			 */
>  			VM_BUG_ON(!(flags & TTU_RMAP_LOCKED));
> +			/*
> +			 * huge_pmd_unshare may unmap an entire PMD page.
> +			 * There is no way of knowing exactly which PMDs may
> +			 * be cached for this mm, so we must flush them all.
> +			 * start/end were already adjusted above to cover this
> +			 * range.
> +			 */
> +			flush_cache_range(vma, range.start, range.end);
> +

flush_cache_range() is always called even if we do not need to flush.
How about introducing a new helper like hugetlb_pmd_shared() which
returns true for shared PMD? Then:

	if (hugetlb_pmd_shared(mm, vma, pvmw.pte)) {
		flush_cache_range(vma, range.start, range.end);
		huge_pmd_unshare(mm, vma, &address, pvmw.pte);
		flush_tlb_range(vma, range.start, range.end);
	}

The code could be a little simpler. Right?
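
A minimal sketch of what such a helper could look like (hypothetical at
this point; it mirrors the call above and relies on the existing rule
that every sharer takes a reference on the page backing the shared PMD
table):

	static bool hugetlb_pmd_shared(struct mm_struct *mm,
				       struct vm_area_struct *vma, pte_t *pte)
	{
		if (!(vma->vm_flags & VM_MAYSHARE))
			return false;

		/* More than one reference means the PMD page is shared. */
		return page_count(virt_to_page(pte)) > 1;
	}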

Thanks.



* Re: [PATCH v2 2/3] mm: rmap: Move the cache flushing to the correct place for hugetlb PMD sharing
  2022-04-28  5:55   ` Muchun Song
@ 2022-04-28  7:06     ` Baolin Wang
  2022-05-03 18:42     ` Mike Kravetz
  1 sibling, 0 replies; 11+ messages in thread
From: Baolin Wang @ 2022-04-28  7:06 UTC (permalink / raw)
  To: Muchun Song; +Cc: akpm, mike.kravetz, almasrymina, linux-mm, linux-kernel



On 4/28/2022 1:55 PM, Muchun Song wrote:
> On Wed, Apr 27, 2022 at 06:52:06PM +0800, Baolin Wang wrote:
>> The cache level flush must always come first when changing an existing
>> virtual->physical mapping to a new value, since this allows us to
>> properly handle systems whose caches are strict and require a
>> virtual->physical translation to exist for a virtual address. So we
>> should move the cache flushing before huge_pmd_unshare().
>>
> 
> Right.
> 
>> As Muchun pointed out [1], the architectures that currently support
>> hugetlb PMD sharing have no cache flush issues in practice. But I think
>> we should still follow the cache/TLB flushing rules when changing a
>> valid virtual address mapping, in case of potential issues in the future.
> 
> Right. One point I need to clarify: I do not object to this change, but
> I want you to clarify in the commit log that this is not an issue in
> practice, so others know they do not need to backport it.
> 
>>
>> [1] https://lore.kernel.org/all/YmT%2F%2FhuUbFX+KHcy@FVFYT0MHHV2J.usts.net/
>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>> ---
>>   mm/rmap.c | 40 ++++++++++++++++++++++------------------
>>   1 file changed, 22 insertions(+), 18 deletions(-)
>>
>> diff --git a/mm/rmap.c b/mm/rmap.c
>> index 61e63db..4f0d115 100644
>> --- a/mm/rmap.c
>> +++ b/mm/rmap.c
>> @@ -1535,15 +1535,16 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
>>   			 * do this outside rmap routines.
>>   			 */
>>   			VM_BUG_ON(!(flags & TTU_RMAP_LOCKED));
>> +			/*
>> +			 * huge_pmd_unshare may unmap an entire PMD page.
>> +			 * There is no way of knowing exactly which PMDs may
>> +			 * be cached for this mm, so we must flush them all.
>> +			 * start/end were already adjusted above to cover this
>> +			 * range.
>> +			 */
>> +			flush_cache_range(vma, range.start, range.end);
>> +
> 
> flush_cache_range() is always called even if we do not need to flush.

Right, this is intended. In the original code, if it is not a shared
PMD, we use flush_cache_page() to do the cache flushing. However,
flush_cache_page() cannot cover the whole size of a hugetlb page on
some architectures, which is fixed by patch 3.

> How about introducing a new helper like hugetlb_pmd_shared() which
> returns true for shared PMD? Then:
> 
> 	if (hugetlb_pmd_shared(mm, vma, pvmw.pte)) {
> 		flush_cache_range(vma, range.start, range.end);
> 		huge_pmd_unshare(mm, vma, &address, pvmw.pte);
> 		flush_tlb_range(vma, range.start, range.end);
> 	}
> 
> The code could be a little simpler. Right?

IMHO, after patch 3 the code will be changed as below, so there seems
to be no need to separate the shared-PMD check from huge_pmd_unshare()
into a new function.

if (folio_test_hugetlb(folio)) {
	flush_cache_range(vma, range.start, range.end);

	if (!folio_test_anon(folio)) {
		VM_BUG_ON(!(flags & TTU_RMAP_LOCKED));

		if (hugetlb_pmd_shared(mm, vma, pvmw.pte)) {
			huge_pmd_unshare(mm, vma, &address, pvmw.pte);
			flush_tlb_range(vma, range.start, range.end);
			......
			break;
		}
	}
} else {
	......
}


* Re: [PATCH v2 2/3] mm: rmap: Move the cache flushing to the correct place for hugetlb PMD sharing
  2022-04-28  5:55   ` Muchun Song
  2022-04-28  7:06     ` Baolin Wang
@ 2022-05-03 18:42     ` Mike Kravetz
  2022-05-04  2:50       ` Baolin Wang
  1 sibling, 1 reply; 11+ messages in thread
From: Mike Kravetz @ 2022-05-03 18:42 UTC (permalink / raw)
  To: Muchun Song, Baolin Wang; +Cc: akpm, almasrymina, linux-mm, linux-kernel

On 4/27/22 22:55, Muchun Song wrote:
> On Wed, Apr 27, 2022 at 06:52:06PM +0800, Baolin Wang wrote:
>> The cache level flush must always come first when changing an existing
>> virtual->physical mapping to a new value, since this allows us to
>> properly handle systems whose caches are strict and require a
>> virtual->physical translation to exist for a virtual address. So we
>> should move the cache flushing before huge_pmd_unshare().
>>
> 
> Right.
> 
>> As Muchun pointed out [1], the architectures that currently support
>> hugetlb PMD sharing have no cache flush issues in practice. But I think
>> we should still follow the cache/TLB flushing rules when changing a
>> valid virtual address mapping, in case of potential issues in the future.
> 
> Right. One point I need to clarify: I do not object to this change, but
> I want you to clarify in the commit log that this is not an issue in
> practice, so others know they do not need to backport it.
> 
>>
>> [1] https://lore.kernel.org/all/YmT%2F%2FhuUbFX+KHcy@FVFYT0MHHV2J.usts.net/
>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>> ---
>>  mm/rmap.c | 40 ++++++++++++++++++++++------------------
>>  1 file changed, 22 insertions(+), 18 deletions(-)
>>
>> diff --git a/mm/rmap.c b/mm/rmap.c
>> index 61e63db..4f0d115 100644
>> --- a/mm/rmap.c
>> +++ b/mm/rmap.c
>> @@ -1535,15 +1535,16 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
>>  			 * do this outside rmap routines.
>>  			 */
>>  			VM_BUG_ON(!(flags & TTU_RMAP_LOCKED));
>> +			/*
>> +			 * huge_pmd_unshare may unmap an entire PMD page.
>> +			 * There is no way of knowing exactly which PMDs may
>> +			 * be cached for this mm, so we must flush them all.
>> +			 * start/end were already adjusted above to cover this
>> +			 * range.
>> +			 */
>> +			flush_cache_range(vma, range.start, range.end);
>> +
> 
> flush_cache_range() is always called even if we do not need to flush.
> How about introducing a new helper like hugetlb_pmd_shared() which
> returns true for shared PMD? Then:
> 
> 	if (hugetlb_pmd_shared(mm, vma, pvmw.pte)) {
> 		flush_cache_range(vma, range.start, range.end);
> 		huge_pmd_unshare(mm, vma, &address, pvmw.pte);
> 		flush_tlb_range(vma, range.start, range.end);
> 	}
> 
> The code could be a little simpler. Right?
> 
> Thanks.
> 

I thought about adding a 'hugetlb_pmd_shared()' interface for another use.
I believe it could even be used earlier in this call sequence.  Since we
hold i_mmap_rwsem, we could even test for sharing BEFORE calling
adjust_range_if_pmd_sharing_possible().  We cannot make an authoritative
test inside adjust_range_if_pmd_sharing_possible() because not all callers
will be holding i_mmap_rwsem.

I think we COULD optimize to minimize the flush range.  However, I think
that would complicate this code even more, and it is difficult enough to
follow.

My preference would be to over-flush as is done here for correctness and
simplicity.  We can optimize later if desired.
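
For reference, the narrower-flush variant being deferred would look
roughly like this (sketch only, reusing the variables of
try_to_unmap_one() and the hypothetical hugetlb_pmd_shared() helper
discussed above):

	if (folio_test_hugetlb(folio)) {
		if (hugetlb_pmd_shared(mm, vma, pvmw.pte))
			/* unsharing may tear down a whole PMD page */
			flush_cache_range(vma, range.start, range.end);
		else
			/* otherwise only this hugetlb page is affected */
			flush_cache_range(vma, address,
					  address + huge_page_size(hstate_vma(vma)));
	}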

With Muchun's comment that this is not an issue in practice today,
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
-- 
Mike Kravetz


* Re: [PATCH v2 3/3] mm: rmap: Use flush_cache_range() to flush cache for hugetlb pages
  2022-04-27 10:52 ` [PATCH v2 3/3] mm: rmap: Use flush_cache_range() to flush cache for hugetlb pages Baolin Wang
@ 2022-05-03 20:17   ` Mike Kravetz
  2022-05-04  2:49     ` Baolin Wang
  0 siblings, 1 reply; 11+ messages in thread
From: Mike Kravetz @ 2022-05-03 20:17 UTC (permalink / raw)
  To: Baolin Wang, akpm; +Cc: almasrymina, songmuchun, linux-mm, linux-kernel

On 4/27/22 03:52, Baolin Wang wrote:
> Now we use flush_cache_page() to flush the cache for anonymous hugetlb
> pages when unmapping or migrating a hugetlb page mapping, but
> flush_cache_page() only handles a PAGE_SIZE range on some architectures
> (such as arm32 and arc), which can cause potential cache issues. Thus
> change to flush_cache_range() to cover the whole size of a hugetlb page.
> 
> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> ---
>  mm/rmap.c | 90 ++++++++++++++++++++++++++++++++++-----------------------------
>  1 file changed, 48 insertions(+), 42 deletions(-)
> 
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 4f0d115..6fdd198 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1528,13 +1528,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
>  		anon_exclusive = folio_test_anon(folio) &&
>  				 PageAnonExclusive(subpage);
>  
> -		if (folio_test_hugetlb(folio) && !folio_test_anon(folio)) {
> -			/*
> -			 * To call huge_pmd_unshare, i_mmap_rwsem must be
> -			 * held in write mode.  Caller needs to explicitly
> -			 * do this outside rmap routines.
> -			 */
> -			VM_BUG_ON(!(flags & TTU_RMAP_LOCKED));
> +		if (folio_test_hugetlb(folio)) {
>  			/*
>  			 * huge_pmd_unshare may unmap an entire PMD page.
>  			 * There is no way of knowing exactly which PMDs may
> @@ -1544,22 +1538,31 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
>  			 */
>  			flush_cache_range(vma, range.start, range.end);
>  
> -			if (huge_pmd_unshare(mm, vma, &address, pvmw.pte)) {
> -				flush_tlb_range(vma, range.start, range.end);
> -				mmu_notifier_invalidate_range(mm, range.start,
> -							      range.end);
> -
> +			if (!folio_test_anon(folio)) {
>  				/*
> -				 * The ref count of the PMD page was dropped
> -				 * which is part of the way map counting
> -				 * is done for shared PMDs.  Return 'true'
> -				 * here.  When there is no other sharing,
> -				 * huge_pmd_unshare returns false and we will
> -				 * unmap the actual page and drop map count
> -				 * to zero.
> +				 * To call huge_pmd_unshare, i_mmap_rwsem must be
> +				 * held in write mode.  Caller needs to explicitly
> +				 * do this outside rmap routines.
>  				 */
> -				page_vma_mapped_walk_done(&pvmw);
> -				break;
> +				VM_BUG_ON(!(flags & TTU_RMAP_LOCKED));
> +
> +				if (huge_pmd_unshare(mm, vma, &address, pvmw.pte)) {
> +					flush_tlb_range(vma, range.start, range.end);
> +					mmu_notifier_invalidate_range(mm, range.start,
> +								      range.end);
> +
> +					/*
> +					 * The ref count of the PMD page was dropped
> +					 * which is part of the way map counting
> +					 * is done for shared PMDs.  Return 'true'
> +					 * here.  When there is no other sharing,
> +					 * huge_pmd_unshare returns false and we will
> +					 * unmap the actual page and drop map count
> +					 * to zero.
> +					 */
> +					page_vma_mapped_walk_done(&pvmw);
> +					break;
> +				}
>  			}
>  		} else {
>  			flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
> @@ -1885,13 +1888,7 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
>  		anon_exclusive = folio_test_anon(folio) &&
>  				 PageAnonExclusive(subpage);
>  
> -		if (folio_test_hugetlb(folio) && !folio_test_anon(folio)) {
> -			/*
> -			 * To call huge_pmd_unshare, i_mmap_rwsem must be
> -			 * held in write mode.  Caller needs to explicitly
> -			 * do this outside rmap routines.
> -			 */
> -			VM_BUG_ON(!(flags & TTU_RMAP_LOCKED));
> +		if (folio_test_hugetlb(folio)) {
>  			/*
>  			 * huge_pmd_unshare may unmap an entire PMD page.
>  			 * There is no way of knowing exactly which PMDs may
> @@ -1901,22 +1898,31 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
>  			 */
>  			flush_cache_range(vma, range.start, range.end);
>  
> -			if (huge_pmd_unshare(mm, vma, &address, pvmw.pte)) {
> -				flush_tlb_range(vma, range.start, range.end);
> -				mmu_notifier_invalidate_range(mm, range.start,
> -							      range.end);
> -
> +			if (!folio_test_anon(folio)) {
>  				/*
> -				 * The ref count of the PMD page was dropped
> -				 * which is part of the way map counting
> -				 * is done for shared PMDs.  Return 'true'
> -				 * here.  When there is no other sharing,
> -				 * huge_pmd_unshare returns false and we will
> -				 * unmap the actual page and drop map count
> -				 * to zero.
> +				 * To call huge_pmd_unshare, i_mmap_rwsem must be
> +				 * held in write mode.  Caller needs to explicitly
> +				 * do this outside rmap routines.
>  				 */
> -				page_vma_mapped_walk_done(&pvmw);
> -				break;
> +				VM_BUG_ON(!(flags & TTU_RMAP_LOCKED));
> +
> +				if (huge_pmd_unshare(mm, vma, &address, pvmw.pte)) {
> +					flush_tlb_range(vma, range.start, range.end);
> +					mmu_notifier_invalidate_range(mm, range.start,
> +								      range.end);
> +
> +					/*
> +					 * The ref count of the PMD page was dropped
> +					 * which is part of the way map counting
> +					 * is done for shared PMDs.  Return 'true'
> +					 * here.  When there is no other sharing,
> +					 * huge_pmd_unshare returns false and we will
> +					 * unmap the actual page and drop map count
> +					 * to zero.
> +					 */
> +					page_vma_mapped_walk_done(&pvmw);
> +					break;
> +				}
>  			}
>  		} else {
>  			flush_cache_page(vma, address, pte_pfn(*pvmw.pte));

Thanks,
The code looks fine.  It is unfortunate that we need so many levels of
indenting and exceed 80 columns.  But, that is OK.

Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>

I see you have a followup series to address the call to ptep_clear_flush()
for hugetlb pages not unmapped via huge_pmd_share and will take a look at
that soon.
-- 
Mike Kravetz


* Re: [PATCH v2 3/3] mm: rmap: Use flush_cache_range() to flush cache for hugetlb pages
  2022-05-03 20:17   ` Mike Kravetz
@ 2022-05-04  2:49     ` Baolin Wang
  0 siblings, 0 replies; 11+ messages in thread
From: Baolin Wang @ 2022-05-04  2:49 UTC (permalink / raw)
  To: Mike Kravetz, akpm; +Cc: almasrymina, songmuchun, linux-mm, linux-kernel



On 5/4/2022 4:17 AM, Mike Kravetz wrote:
> On 4/27/22 03:52, Baolin Wang wrote:
>> Now we use flush_cache_page() to flush the cache for anonymous hugetlb
>> pages when unmapping or migrating a hugetlb page mapping, but
>> flush_cache_page() only handles a PAGE_SIZE range on some architectures
>> (such as arm32 and arc), which can cause potential cache issues. Thus
>> change to flush_cache_range() to cover the whole size of a hugetlb page.
>>
>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>> ---
>>   mm/rmap.c | 90 ++++++++++++++++++++++++++++++++++-----------------------------
>>   1 file changed, 48 insertions(+), 42 deletions(-)
>>
>> diff --git a/mm/rmap.c b/mm/rmap.c
>> index 4f0d115..6fdd198 100644
>> --- a/mm/rmap.c
>> +++ b/mm/rmap.c
>> @@ -1528,13 +1528,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
>>   		anon_exclusive = folio_test_anon(folio) &&
>>   				 PageAnonExclusive(subpage);
>>   
>> -		if (folio_test_hugetlb(folio) && !folio_test_anon(folio)) {
>> -			/*
>> -			 * To call huge_pmd_unshare, i_mmap_rwsem must be
>> -			 * held in write mode.  Caller needs to explicitly
>> -			 * do this outside rmap routines.
>> -			 */
>> -			VM_BUG_ON(!(flags & TTU_RMAP_LOCKED));
>> +		if (folio_test_hugetlb(folio)) {
>>   			/*
>>   			 * huge_pmd_unshare may unmap an entire PMD page.
>>   			 * There is no way of knowing exactly which PMDs may
>> @@ -1544,22 +1538,31 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
>>   			 */
>>   			flush_cache_range(vma, range.start, range.end);
>>   
>> -			if (huge_pmd_unshare(mm, vma, &address, pvmw.pte)) {
>> -				flush_tlb_range(vma, range.start, range.end);
>> -				mmu_notifier_invalidate_range(mm, range.start,
>> -							      range.end);
>> -
>> +			if (!folio_test_anon(folio)) {
>>   				/*
>> -				 * The ref count of the PMD page was dropped
>> -				 * which is part of the way map counting
>> -				 * is done for shared PMDs.  Return 'true'
>> -				 * here.  When there is no other sharing,
>> -				 * huge_pmd_unshare returns false and we will
>> -				 * unmap the actual page and drop map count
>> -				 * to zero.
>> +				 * To call huge_pmd_unshare, i_mmap_rwsem must be
>> +				 * held in write mode.  Caller needs to explicitly
>> +				 * do this outside rmap routines.
>>   				 */
>> -				page_vma_mapped_walk_done(&pvmw);
>> -				break;
>> +				VM_BUG_ON(!(flags & TTU_RMAP_LOCKED));
>> +
>> +				if (huge_pmd_unshare(mm, vma, &address, pvmw.pte)) {
>> +					flush_tlb_range(vma, range.start, range.end);
>> +					mmu_notifier_invalidate_range(mm, range.start,
>> +								      range.end);
>> +
>> +					/*
>> +					 * The ref count of the PMD page was dropped
>> +					 * which is part of the way map counting
>> +					 * is done for shared PMDs.  Return 'true'
>> +					 * here.  When there is no other sharing,
>> +					 * huge_pmd_unshare returns false and we will
>> +					 * unmap the actual page and drop map count
>> +					 * to zero.
>> +					 */
>> +					page_vma_mapped_walk_done(&pvmw);
>> +					break;
>> +				}
>>   			}
>>   		} else {
>>   			flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
>> @@ -1885,13 +1888,7 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
>>   		anon_exclusive = folio_test_anon(folio) &&
>>   				 PageAnonExclusive(subpage);
>>   
>> -		if (folio_test_hugetlb(folio) && !folio_test_anon(folio)) {
>> -			/*
>> -			 * To call huge_pmd_unshare, i_mmap_rwsem must be
>> -			 * held in write mode.  Caller needs to explicitly
>> -			 * do this outside rmap routines.
>> -			 */
>> -			VM_BUG_ON(!(flags & TTU_RMAP_LOCKED));
>> +		if (folio_test_hugetlb(folio)) {
>>   			/*
>>   			 * huge_pmd_unshare may unmap an entire PMD page.
>>   			 * There is no way of knowing exactly which PMDs may
>> @@ -1901,22 +1898,31 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
>>   			 */
>>   			flush_cache_range(vma, range.start, range.end);
>>   
>> -			if (huge_pmd_unshare(mm, vma, &address, pvmw.pte)) {
>> -				flush_tlb_range(vma, range.start, range.end);
>> -				mmu_notifier_invalidate_range(mm, range.start,
>> -							      range.end);
>> -
>> +			if (!folio_test_anon(folio)) {
>>   				/*
>> -				 * The ref count of the PMD page was dropped
>> -				 * which is part of the way map counting
>> -				 * is done for shared PMDs.  Return 'true'
>> -				 * here.  When there is no other sharing,
>> -				 * huge_pmd_unshare returns false and we will
>> -				 * unmap the actual page and drop map count
>> -				 * to zero.
>> +				 * To call huge_pmd_unshare, i_mmap_rwsem must be
>> +				 * held in write mode.  Caller needs to explicitly
>> +				 * do this outside rmap routines.
>>   				 */
>> -				page_vma_mapped_walk_done(&pvmw);
>> -				break;
>> +				VM_BUG_ON(!(flags & TTU_RMAP_LOCKED));
>> +
>> +				if (huge_pmd_unshare(mm, vma, &address, pvmw.pte)) {
>> +					flush_tlb_range(vma, range.start, range.end);
>> +					mmu_notifier_invalidate_range(mm, range.start,
>> +								      range.end);
>> +
>> +					/*
>> +					 * The ref count of the PMD page was dropped
>> +					 * which is part of the way map counting
>> +					 * is done for shared PMDs.  Return 'true'
>> +					 * here.  When there is no other sharing,
>> +					 * huge_pmd_unshare returns false and we will
>> +					 * unmap the actual page and drop map count
>> +					 * to zero.
>> +					 */
>> +					page_vma_mapped_walk_done(&pvmw);
>> +					break;
>> +				}
>>   			}
>>   		} else {
>>   			flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
> 
> Thanks,
> The code looks fine.  It is unfortunate that we need so many levels of
> indenting and exceed 80 columns.  But, that is OK.

I'll do a cleanup to make it more readable by factoring the hugetlb
case into a new function, after the fix series [1].

[1] https://lore.kernel.org/linux-arm-kernel/20220503120343.6264e126@thinkpad/T/

> 
> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
> 
> I see you have a followup series to address the call to ptep_clear_flush()
> for hugetlb pages not unmapped via huge_pmd_share and will take a look at
> that soon.

Thanks.


* Re: [PATCH v2 2/3] mm: rmap: Move the cache flushing to the correct place for hugetlb PMD sharing
  2022-05-03 18:42     ` Mike Kravetz
@ 2022-05-04  2:50       ` Baolin Wang
  0 siblings, 0 replies; 11+ messages in thread
From: Baolin Wang @ 2022-05-04  2:50 UTC (permalink / raw)
  To: Mike Kravetz, Muchun Song; +Cc: akpm, almasrymina, linux-mm, linux-kernel



On 5/4/2022 2:42 AM, Mike Kravetz wrote:
> On 4/27/22 22:55, Muchun Song wrote:
>> On Wed, Apr 27, 2022 at 06:52:06PM +0800, Baolin Wang wrote:
>>> The cache level flush must always come first when changing an existing
>>> virtual->physical mapping to a new value, since this allows us to
>>> properly handle systems whose caches are strict and require a
>>> virtual->physical translation to exist for a virtual address. So we
>>> should move the cache flushing before huge_pmd_unshare().
>>>
>>
>> Right.
>>
>>> As Muchun pointed out [1], the architectures that currently support
>>> hugetlb PMD sharing have no cache flush issues in practice. But I think
>>> we should still follow the cache/TLB flushing rules when changing a
>>> valid virtual address mapping, in case of potential issues in the future.
>>
>> Right. One point I need to clarify: I do not object to this change, but
>> I want you to clarify in the commit log that this is not an issue in
>> practice, so others know they do not need to backport it.
>>
>>>
>>> [1] https://lore.kernel.org/all/YmT%2F%2FhuUbFX+KHcy@FVFYT0MHHV2J.usts.net/
>>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>>> ---
>>>   mm/rmap.c | 40 ++++++++++++++++++++++------------------
>>>   1 file changed, 22 insertions(+), 18 deletions(-)
>>>
>>> diff --git a/mm/rmap.c b/mm/rmap.c
>>> index 61e63db..4f0d115 100644
>>> --- a/mm/rmap.c
>>> +++ b/mm/rmap.c
>>> @@ -1535,15 +1535,16 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
>>>   			 * do this outside rmap routines.
>>>   			 */
>>>   			VM_BUG_ON(!(flags & TTU_RMAP_LOCKED));
>>> +			/*
>>> +			 * huge_pmd_unshare may unmap an entire PMD page.
>>> +			 * There is no way of knowing exactly which PMDs may
>>> +			 * be cached for this mm, so we must flush them all.
>>> +			 * start/end were already adjusted above to cover this
>>> +			 * range.
>>> +			 */
>>> +			flush_cache_range(vma, range.start, range.end);
>>> +
>>
>> flush_cache_range() is always called even if we do not need to flush.
>> How about introducing a new helper like hugetlb_pmd_shared() which
>> returns true for shared PMD? Then:
>>
>> 	if (hugetlb_pmd_shared(mm, vma, pvmw.pte)) {
>> 		flush_cache_range(vma, range.start, range.end);
>> 		huge_pmd_unshare(mm, vma, &address, pvmw.pte);
>> 		flush_tlb_range(vma, range.start, range.end);
>> 	}
>>
>> The code could be a little simpler. Right?
>>
>> Thanks.
>>
> 
> I thought about adding a 'hugetlb_pmd_shared()' interface for another use.
> I believe it could even be used earlier in this call sequence.  Since we
> hold i_mmap_rwsem, we would even test for shared BEFORE calling
> adjust_range_if_pmd_sharing_possible.  We can not make an authoritative test
> in adjust range... because not all callers will be holding i_mmap_rwsem.
> 
> I think we COULD optimize to minimize the flush range.  However, I think
> that would complicate this code even more, and it is difficult enough to
> follow.
> 
> My preference would be to over flush as is done here for correctness and
> simplification.  We can optimize later if desired.

OK. Agree.

> 
> With Muchun's comment that this is not an issue in practice today,
> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>

Thanks.


