* [PATCH 0/2] Fix cache flush issues considering PMD sharing
@ 2022-04-24 14:50 Baolin Wang
  2022-04-24 14:50 ` [PATCH 1/2] mm: hugetlb: Considering PMD sharing when flushing cache/TLBs Baolin Wang
  2022-04-24 14:50 ` [PATCH 2/2] mm: rmap: Move the cache flushing to the correct place for hugetlb PMD sharing Baolin Wang
  0 siblings, 2 replies; 7+ messages in thread
From: Baolin Wang @ 2022-04-24 14:50 UTC (permalink / raw)
  To: akpm, mike.kravetz
  Cc: almasrymina, songmuchun, baolin.wang, linux-mm, linux-kernel

Hi,

This patch set fixes some cache flushing issues, found by code
inspection, that can occur when PMD sharing is possible for hugetlb
pages. Please help review. Thanks.


Baolin Wang (2):
  mm: hugetlb: Considering PMD sharing when flushing cache/TLBs
  mm: rmap: Move the cache flushing to the correct place for hugetlb PMD
    sharing

 mm/hugetlb.c | 17 +++++++++++++++--
 mm/mremap.c  |  2 +-
 mm/rmap.c    | 40 ++++++++++++++++++++++------------------
 3 files changed, 38 insertions(+), 21 deletions(-)

-- 
1.8.3.1



* [PATCH 1/2] mm: hugetlb: Considering PMD sharing when flushing cache/TLBs
  2022-04-24 14:50 [PATCH 0/2] Fix cache flush issues considering PMD sharing Baolin Wang
@ 2022-04-24 14:50 ` Baolin Wang
  2022-04-26  0:16   ` Mike Kravetz
  2022-04-24 14:50 ` [PATCH 2/2] mm: rmap: Move the cache flushing to the correct place for hugetlb PMD sharing Baolin Wang
  1 sibling, 1 reply; 7+ messages in thread
From: Baolin Wang @ 2022-04-24 14:50 UTC (permalink / raw)
  To: akpm, mike.kravetz
  Cc: almasrymina, songmuchun, baolin.wang, linux-mm, linux-kernel

When moving hugetlb page tables, the cache flushing is done in
move_page_tables() without considering shared PMDs, which may cause
cache issues on some architectures.

Thus move the hugetlb cache flushing into move_hugetlb_page_tables(),
taking into account the shared PMD range calculated by
adjust_range_if_pmd_sharing_possible(). Also expand the TLB flushing
range in case of shared PMDs.

Note this was discovered via code inspection and has not caused a real
problem in practice so far.
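
For reference, adjust_range_if_pmd_sharing_possible() expands the
flush/invalidate range to PUD boundaries when PMD sharing is possible.
A simplified sketch of that helper (not a verbatim copy of the upstream
code) looks roughly like:

    void adjust_range_if_pmd_sharing_possible(struct vm_area_struct *vma,
                                              unsigned long *start, unsigned long *end)
    {
            unsigned long v_start = ALIGN(vma->vm_start, PUD_SIZE);
            unsigned long v_end = ALIGN_DOWN(vma->vm_end, PUD_SIZE);

            /* Sharing needs a VM_MAYSHARE vma covering at least one full PUD. */
            if (!(vma->vm_flags & VM_MAYSHARE) || v_end <= v_start ||
                *end <= v_start || *start >= v_end)
                    return;

            /* Extend to PUD-aligned boundaries for the worst case. */
            if (*start > v_start)
                    *start = ALIGN_DOWN(*start, PUD_SIZE);
            if (*end < v_end)
                    *end = ALIGN(*end, PUD_SIZE);
    }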

Fixes: 550a7d60bd5e ("mm, hugepages: add mremap() support for hugepage backed vma")
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
 mm/hugetlb.c | 17 +++++++++++++++--
 mm/mremap.c  |  2 +-
 2 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 1945dfb..d3a6094 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4937,10 +4937,17 @@ int move_hugetlb_page_tables(struct vm_area_struct *vma,
 	unsigned long old_addr_copy;
 	pte_t *src_pte, *dst_pte;
 	struct mmu_notifier_range range;
+	bool shared_pmd = false;
 
 	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, mm, old_addr,
 				old_end);
 	adjust_range_if_pmd_sharing_possible(vma, &range.start, &range.end);
+	/*
+	 * In case of shared PMDs, we should cover the maximum possible
+	 * range.
+	 */
+	flush_cache_range(vma, range.start, range.end);
+
 	mmu_notifier_invalidate_range_start(&range);
 	/* Prevent race with file truncation */
 	i_mmap_lock_write(mapping);
@@ -4957,8 +4964,10 @@ int move_hugetlb_page_tables(struct vm_area_struct *vma,
 		 */
 		old_addr_copy = old_addr;
 
-		if (huge_pmd_unshare(mm, vma, &old_addr_copy, src_pte))
+		if (huge_pmd_unshare(mm, vma, &old_addr_copy, src_pte)) {
+			shared_pmd = true;
 			continue;
+		}
 
 		dst_pte = huge_pte_alloc(mm, new_vma, new_addr, sz);
 		if (!dst_pte)
@@ -4966,7 +4975,11 @@ int move_hugetlb_page_tables(struct vm_area_struct *vma,
 
 		move_huge_pte(vma, old_addr, new_addr, src_pte, dst_pte);
 	}
-	flush_tlb_range(vma, old_end - len, old_end);
+
+	if (shared_pmd)
+		flush_tlb_range(vma, range.start, range.end);
+	else
+		flush_tlb_range(vma, old_end - len, old_end);
 	mmu_notifier_invalidate_range_end(&range);
 	i_mmap_unlock_write(mapping);
 
diff --git a/mm/mremap.c b/mm/mremap.c
index 98f50e6..0970025 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -490,12 +490,12 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
 		return 0;
 
 	old_end = old_addr + len;
-	flush_cache_range(vma, old_addr, old_end);
 
 	if (is_vm_hugetlb_page(vma))
 		return move_hugetlb_page_tables(vma, new_vma, old_addr,
 						new_addr, len);
 
+	flush_cache_range(vma, old_addr, old_end);
 	mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0, vma, vma->vm_mm,
 				old_addr, old_end);
 	mmu_notifier_invalidate_range_start(&range);
-- 
1.8.3.1



* [PATCH 2/2] mm: rmap: Move the cache flushing to the correct place for hugetlb PMD sharing
  2022-04-24 14:50 [PATCH 0/2] Fix cache flush issues considering PMD sharing Baolin Wang
  2022-04-24 14:50 ` [PATCH 1/2] mm: hugetlb: Considering PMD sharing when flushing cache/TLBs Baolin Wang
@ 2022-04-24 14:50 ` Baolin Wang
  2022-04-26  0:20   ` Mike Kravetz
  1 sibling, 1 reply; 7+ messages in thread
From: Baolin Wang @ 2022-04-24 14:50 UTC (permalink / raw)
  To: akpm, mike.kravetz
  Cc: almasrymina, songmuchun, baolin.wang, linux-mm, linux-kernel

The cache level flush must always come first when changing an existing
virtual->physical mapping to a new value, since this allows us to
properly handle systems whose caches are strict and require a
virtual->physical translation to exist for a virtual address. So we
should move the cache flushing before huge_pmd_unshare().

As Muchun pointed out[1], the architectures that currently support
hugetlb PMD sharing have no cache flush issues in practice. But I think
we should still follow the cache/TLB flushing rules when changing a
valid virtual address mapping, in case of potential issues in the future.

[1] https://lore.kernel.org/all/YmT%2F%2FhuUbFX+KHcy@FVFYT0MHHV2J.usts.net/
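
In other words, the ordering required by
Documentation/core-api/cachetlb.rst is roughly:

    flush_cache_range(vma, start, end);    /* 1) cache flush, needs the old mapping */
    /* 2) change the page table: e.g. huge_pmd_unshare() or ptep_clear_flush() */
    flush_tlb_range(vma, start, end);      /* 3) TLB flush, after the mapping is changed */
    mmu_notifier_invalidate_range(mm, start, end);
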
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
 mm/rmap.c | 40 ++++++++++++++++++++++------------------
 1 file changed, 22 insertions(+), 18 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index 61e63db..81872bb 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1535,15 +1535,16 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 			 * do this outside rmap routines.
 			 */
 			VM_BUG_ON(!(flags & TTU_RMAP_LOCKED));
+			/*
+			 * huge_pmd_unshare unmapped an entire PMD page.
+			 * There is no way of knowing exactly which PMDs may
+			 * be cached for this mm, so we must flush them all.
+			 * start/end were already adjusted above to cover this
+			 * range.
+			 */
+			flush_cache_range(vma, range.start, range.end);
+
 			if (huge_pmd_unshare(mm, vma, &address, pvmw.pte)) {
-				/*
-				 * huge_pmd_unshare unmapped an entire PMD
-				 * page.  There is no way of knowing exactly
-				 * which PMDs may be cached for this mm, so
-				 * we must flush them all.  start/end were
-				 * already adjusted above to cover this range.
-				 */
-				flush_cache_range(vma, range.start, range.end);
 				flush_tlb_range(vma, range.start, range.end);
 				mmu_notifier_invalidate_range(mm, range.start,
 							      range.end);
@@ -1560,13 +1561,14 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 				page_vma_mapped_walk_done(&pvmw);
 				break;
 			}
+		} else {
+			flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
 		}
 
 		/*
 		 * Nuke the page table entry. When having to clear
 		 * PageAnonExclusive(), we always have to flush.
 		 */
-		flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
 		if (should_defer_flush(mm, flags) && !anon_exclusive) {
 			/*
 			 * We clear the PTE but do not flush so potentially
@@ -1890,15 +1892,16 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
 			 * do this outside rmap routines.
 			 */
 			VM_BUG_ON(!(flags & TTU_RMAP_LOCKED));
+			/*
+			 * huge_pmd_unshare unmapped an entire PMD page.
+			 * There is no way of knowing exactly which PMDs may
+			 * be cached for this mm, so we must flush them all.
+			 * start/end were already adjusted above to cover this
+			 * range.
+			 */
+			flush_cache_range(vma, range.start, range.end);
+
 			if (huge_pmd_unshare(mm, vma, &address, pvmw.pte)) {
-				/*
-				 * huge_pmd_unshare unmapped an entire PMD
-				 * page.  There is no way of knowing exactly
-				 * which PMDs may be cached for this mm, so
-				 * we must flush them all.  start/end were
-				 * already adjusted above to cover this range.
-				 */
-				flush_cache_range(vma, range.start, range.end);
 				flush_tlb_range(vma, range.start, range.end);
 				mmu_notifier_invalidate_range(mm, range.start,
 							      range.end);
@@ -1915,10 +1918,11 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
 				page_vma_mapped_walk_done(&pvmw);
 				break;
 			}
+		} else {
+			flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
 		}
 
 		/* Nuke the page table entry. */
-		flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
 		pteval = ptep_clear_flush(vma, address, pvmw.pte);
 
 		/* Set the dirty flag on the folio now the pte is gone. */
-- 
1.8.3.1



* Re: [PATCH 1/2] mm: hugetlb: Considering PMD sharing when flushing cache/TLBs
  2022-04-24 14:50 ` [PATCH 1/2] mm: hugetlb: Considering PMD sharing when flushing cache/TLBs Baolin Wang
@ 2022-04-26  0:16   ` Mike Kravetz
  0 siblings, 0 replies; 7+ messages in thread
From: Mike Kravetz @ 2022-04-26  0:16 UTC (permalink / raw)
  To: Baolin Wang, akpm; +Cc: almasrymina, songmuchun, linux-mm, linux-kernel

On 4/24/22 07:50, Baolin Wang wrote:
> When moving hugetlb page tables, the cache flushing is done in
> move_page_tables() without considering shared PMDs, which may cause
> cache issues on some architectures.
> 
> Thus move the hugetlb cache flushing into move_hugetlb_page_tables(),
> taking into account the shared PMD range calculated by
> adjust_range_if_pmd_sharing_possible(). Also expand the TLB flushing
> range in case of shared PMDs.
> 
> Note this was discovered via code inspection and has not caused a real
> problem in practice so far.
> 
> Fixes: 550a7d60bd5e ("mm, hugepages: add mremap() support for hugepage backed vma")
> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> ---
>  mm/hugetlb.c | 17 +++++++++++++++--
>  mm/mremap.c  |  2 +-
>  2 files changed, 16 insertions(+), 3 deletions(-)

Thanks!

Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
-- 
Mike Kravetz

> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 1945dfb..d3a6094 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -4937,10 +4937,17 @@ int move_hugetlb_page_tables(struct vm_area_struct *vma,
>  	unsigned long old_addr_copy;
>  	pte_t *src_pte, *dst_pte;
>  	struct mmu_notifier_range range;
> +	bool shared_pmd = false;
>  
>  	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, mm, old_addr,
>  				old_end);
>  	adjust_range_if_pmd_sharing_possible(vma, &range.start, &range.end);
> +	/*
> +	 * In case of shared PMDs, we should cover the maximum possible
> +	 * range.
> +	 */
> +	flush_cache_range(vma, range.start, range.end);
> +
>  	mmu_notifier_invalidate_range_start(&range);
>  	/* Prevent race with file truncation */
>  	i_mmap_lock_write(mapping);
> @@ -4957,8 +4964,10 @@ int move_hugetlb_page_tables(struct vm_area_struct *vma,
>  		 */
>  		old_addr_copy = old_addr;
>  
> -		if (huge_pmd_unshare(mm, vma, &old_addr_copy, src_pte))
> +		if (huge_pmd_unshare(mm, vma, &old_addr_copy, src_pte)) {
> +			shared_pmd = true;
>  			continue;
> +		}
>  
>  		dst_pte = huge_pte_alloc(mm, new_vma, new_addr, sz);
>  		if (!dst_pte)
> @@ -4966,7 +4975,11 @@ int move_hugetlb_page_tables(struct vm_area_struct *vma,
>  
>  		move_huge_pte(vma, old_addr, new_addr, src_pte, dst_pte);
>  	}
> -	flush_tlb_range(vma, old_end - len, old_end);
> +
> +	if (shared_pmd)
> +		flush_tlb_range(vma, range.start, range.end);
> +	else
> +		flush_tlb_range(vma, old_end - len, old_end);
>  	mmu_notifier_invalidate_range_end(&range);
>  	i_mmap_unlock_write(mapping);
>  
> diff --git a/mm/mremap.c b/mm/mremap.c
> index 98f50e6..0970025 100644
> --- a/mm/mremap.c
> +++ b/mm/mremap.c
> @@ -490,12 +490,12 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
>  		return 0;
>  
>  	old_end = old_addr + len;
> -	flush_cache_range(vma, old_addr, old_end);
>  
>  	if (is_vm_hugetlb_page(vma))
>  		return move_hugetlb_page_tables(vma, new_vma, old_addr,
>  						new_addr, len);
>  
> +	flush_cache_range(vma, old_addr, old_end);
>  	mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0, vma, vma->vm_mm,
>  				old_addr, old_end);
>  	mmu_notifier_invalidate_range_start(&range);



* Re: [PATCH 2/2] mm: rmap: Move the cache flushing to the correct place for hugetlb PMD sharing
  2022-04-24 14:50 ` [PATCH 2/2] mm: rmap: Move the cache flushing to the correct place for hugetlb PMD sharing Baolin Wang
@ 2022-04-26  0:20   ` Mike Kravetz
  2022-04-26  6:26     ` Baolin Wang
  0 siblings, 1 reply; 7+ messages in thread
From: Mike Kravetz @ 2022-04-26  0:20 UTC (permalink / raw)
  To: Baolin Wang, akpm; +Cc: almasrymina, songmuchun, linux-mm, linux-kernel

On 4/24/22 07:50, Baolin Wang wrote:
> The cache level flush must always come first when changing an existing
> virtual->physical mapping to a new value, since this allows us to
> properly handle systems whose caches are strict and require a
> virtual->physical translation to exist for a virtual address. So we
> should move the cache flushing before huge_pmd_unshare().
> 
> As Muchun pointed out[1], the architectures that currently support
> hugetlb PMD sharing have no cache flush issues in practice. But I think
> we should still follow the cache/TLB flushing rules when changing a
> valid virtual address mapping, in case of potential issues in the future.
> 
> [1] https://lore.kernel.org/all/YmT%2F%2FhuUbFX+KHcy@FVFYT0MHHV2J.usts.net/
> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> ---
>  mm/rmap.c | 40 ++++++++++++++++++++++------------------
>  1 file changed, 22 insertions(+), 18 deletions(-)
> 
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 61e63db..81872bb 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1535,15 +1535,16 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
>  			 * do this outside rmap routines.
>  			 */
>  			VM_BUG_ON(!(flags & TTU_RMAP_LOCKED));
> +			/*
> +			 * huge_pmd_unshare unmapped an entire PMD page.

Perhaps update this comment to say that huge_pmd_unshare 'may' unmap
an entire PMD page?

> +			 * There is no way of knowing exactly which PMDs may
> +			 * be cached for this mm, so we must flush them all.
> +			 * start/end were already adjusted above to cover this
> +			 * range.
> +			 */
> +			flush_cache_range(vma, range.start, range.end);
> +
>  			if (huge_pmd_unshare(mm, vma, &address, pvmw.pte)) {
> -				/*
> -				 * huge_pmd_unshare unmapped an entire PMD
> -				 * page.  There is no way of knowing exactly
> -				 * which PMDs may be cached for this mm, so
> -				 * we must flush them all.  start/end were
> -				 * already adjusted above to cover this range.
> -				 */
> -				flush_cache_range(vma, range.start, range.end);
>  				flush_tlb_range(vma, range.start, range.end);
>  				mmu_notifier_invalidate_range(mm, range.start,
>  							      range.end);
> @@ -1560,13 +1561,14 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
>  				page_vma_mapped_walk_done(&pvmw);
>  				break;
>  			}
> +		} else {
> +			flush_cache_page(vma, address, pte_pfn(*pvmw.pte));

I know this call to flush_cache_page() existed before your change.  But, when
looking at this now I wonder how hugetlb pages are handled?  Are there any
versions of flush_cache_page() that take page size into account?

-- 
Mike Kravetz

>  		}
>  
>  		/*
>  		 * Nuke the page table entry. When having to clear
>  		 * PageAnonExclusive(), we always have to flush.
>  		 */
> -		flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
>  		if (should_defer_flush(mm, flags) && !anon_exclusive) {
>  			/*
>  			 * We clear the PTE but do not flush so potentially
> @@ -1890,15 +1892,16 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
>  			 * do this outside rmap routines.
>  			 */
>  			VM_BUG_ON(!(flags & TTU_RMAP_LOCKED));
> +			/*
> +			 * huge_pmd_unshare unmapped an entire PMD page.
> +			 * There is no way of knowing exactly which PMDs may
> +			 * be cached for this mm, so we must flush them all.
> +			 * start/end were already adjusted above to cover this
> +			 * range.
> +			 */
> +			flush_cache_range(vma, range.start, range.end);
> +
>  			if (huge_pmd_unshare(mm, vma, &address, pvmw.pte)) {
> -				/*
> -				 * huge_pmd_unshare unmapped an entire PMD
> -				 * page.  There is no way of knowing exactly
> -				 * which PMDs may be cached for this mm, so
> -				 * we must flush them all.  start/end were
> -				 * already adjusted above to cover this range.
> -				 */
> -				flush_cache_range(vma, range.start, range.end);
>  				flush_tlb_range(vma, range.start, range.end);
>  				mmu_notifier_invalidate_range(mm, range.start,
>  							      range.end);
> @@ -1915,10 +1918,11 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
>  				page_vma_mapped_walk_done(&pvmw);
>  				break;
>  			}
> +		} else {
> +			flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
>  		}
>  
>  		/* Nuke the page table entry. */
> -		flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
>  		pteval = ptep_clear_flush(vma, address, pvmw.pte);
>  
>  		/* Set the dirty flag on the folio now the pte is gone. */



* Re: [PATCH 2/2] mm: rmap: Move the cache flushing to the correct place for hugetlb PMD sharing
  2022-04-26  0:20   ` Mike Kravetz
@ 2022-04-26  6:26     ` Baolin Wang
  2022-04-26 16:28       ` Mike Kravetz
  0 siblings, 1 reply; 7+ messages in thread
From: Baolin Wang @ 2022-04-26  6:26 UTC (permalink / raw)
  To: Mike Kravetz, akpm; +Cc: almasrymina, songmuchun, linux-mm, linux-kernel



On 4/26/2022 8:20 AM, Mike Kravetz wrote:
> On 4/24/22 07:50, Baolin Wang wrote:
>> The cache level flush must always come first when changing an existing
>> virtual->physical mapping to a new value, since this allows us to
>> properly handle systems whose caches are strict and require a
>> virtual->physical translation to exist for a virtual address. So we
>> should move the cache flushing before huge_pmd_unshare().
>>
>> As Muchun pointed out[1], the architectures that currently support
>> hugetlb PMD sharing have no cache flush issues in practice. But I think
>> we should still follow the cache/TLB flushing rules when changing a
>> valid virtual address mapping, in case of potential issues in the future.
>>
>> [1] https://lore.kernel.org/all/YmT%2F%2FhuUbFX+KHcy@FVFYT0MHHV2J.usts.net/
>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>> ---
>>   mm/rmap.c | 40 ++++++++++++++++++++++------------------
>>   1 file changed, 22 insertions(+), 18 deletions(-)
>>
>> diff --git a/mm/rmap.c b/mm/rmap.c
>> index 61e63db..81872bb 100644
>> --- a/mm/rmap.c
>> +++ b/mm/rmap.c
>> @@ -1535,15 +1535,16 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
>>   			 * do this outside rmap routines.
>>   			 */
>>   			VM_BUG_ON(!(flags & TTU_RMAP_LOCKED));
>> +			/*
>> +			 * huge_pmd_unshare unmapped an entire PMD page.
> 
> Perhaps update this comment to say that huge_pmd_unshare 'may' unmap
> an entire PMD page?

Sure, will do.

> 
>> +			 * There is no way of knowing exactly which PMDs may
>> +			 * be cached for this mm, so we must flush them all.
>> +			 * start/end were already adjusted above to cover this
>> +			 * range.
>> +			 */
>> +			flush_cache_range(vma, range.start, range.end);
>> +
>>   			if (huge_pmd_unshare(mm, vma, &address, pvmw.pte)) {
>> -				/*
>> -				 * huge_pmd_unshare unmapped an entire PMD
>> -				 * page.  There is no way of knowing exactly
>> -				 * which PMDs may be cached for this mm, so
>> -				 * we must flush them all.  start/end were
>> -				 * already adjusted above to cover this range.
>> -				 */
>> -				flush_cache_range(vma, range.start, range.end);
>>   				flush_tlb_range(vma, range.start, range.end);
>>   				mmu_notifier_invalidate_range(mm, range.start,
>>   							      range.end);
>> @@ -1560,13 +1561,14 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
>>   				page_vma_mapped_walk_done(&pvmw);
>>   				break;
>>   			}
>> +		} else {
>> +			flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
> 
> I know this call to flush_cache_page() existed before your change.  But, when
> looking at this now I wonder how hugetlb pages are handled?  Are there any
> versions of flush_cache_page() that take page size into account?

Thanks for the reminder. I checked the flush_cache_page() implementation
on some architectures (like arm32): they do not consider hugetlb pages,
so I think we may miss flushing the whole cache for hugetlb pages on
some architectures.

With this patch we can mitigate this issue, since we switch to
flush_cache_range() to cover the possible range when flushing the cache
for hugetlb pages. But for anon hugetlb pages, we should also convert
to using flush_cache_range(). I think we can do this conversion in a
separate patch set, checking all the places that use flush_cache_page()
to flush the cache for hugetlb pages. What do you think?
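
Something along these lines, maybe (just a rough, untested sketch; the
exact check and helper may differ when I actually do the conversion):

    /* Flush the whole hugetlb page, not just one PAGE_SIZE subpage. */
    if (folio_test_hugetlb(folio)) {
            unsigned long hsz = huge_page_size(hstate_vma(vma));

            flush_cache_range(vma, address, address + hsz);
    } else {
            flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
    }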


* Re: [PATCH 2/2] mm: rmap: Move the cache flushing to the correct place for hugetlb PMD sharing
  2022-04-26  6:26     ` Baolin Wang
@ 2022-04-26 16:28       ` Mike Kravetz
  0 siblings, 0 replies; 7+ messages in thread
From: Mike Kravetz @ 2022-04-26 16:28 UTC (permalink / raw)
  To: Baolin Wang, akpm; +Cc: almasrymina, songmuchun, linux-mm, linux-kernel

On 4/25/22 23:26, Baolin Wang wrote:
> 
> 
> On 4/26/2022 8:20 AM, Mike Kravetz wrote:
>> On 4/24/22 07:50, Baolin Wang wrote:
>>> The cache level flush must always come first when changing an existing
>>> virtual->physical mapping to a new value, since this allows us to
>>> properly handle systems whose caches are strict and require a
>>> virtual->physical translation to exist for a virtual address. So we
>>> should move the cache flushing before huge_pmd_unshare().
>>>
>>> As Muchun pointed out[1], the architectures that currently support
>>> hugetlb PMD sharing have no cache flush issues in practice. But I think
>>> we should still follow the cache/TLB flushing rules when changing a
>>> valid virtual address mapping, in case of potential issues in the future.
>>>
>>> [1] https://lore.kernel.org/all/YmT%2F%2FhuUbFX+KHcy@FVFYT0MHHV2J.usts.net/
>>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>>> ---
>>>   mm/rmap.c | 40 ++++++++++++++++++++++------------------
>>>   1 file changed, 22 insertions(+), 18 deletions(-)
>>>
>>> diff --git a/mm/rmap.c b/mm/rmap.c
>>> index 61e63db..81872bb 100644
>>> --- a/mm/rmap.c
>>> +++ b/mm/rmap.c
>>> @@ -1535,15 +1535,16 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
>>>                * do this outside rmap routines.
>>>                */
>>>               VM_BUG_ON(!(flags & TTU_RMAP_LOCKED));
>>> +            /*
>>> +             * huge_pmd_unshare unmapped an entire PMD page.
>>
>> Perhaps update this comment to say that huge_pmd_unshare 'may' unmap
>> an entire PMD page?
> 
> Sure, will do.
> 
>>
>>> +             * There is no way of knowing exactly which PMDs may
>>> +             * be cached for this mm, so we must flush them all.
>>> +             * start/end were already adjusted above to cover this
>>> +             * range.
>>> +             */
>>> +            flush_cache_range(vma, range.start, range.end);
>>> +
>>>               if (huge_pmd_unshare(mm, vma, &address, pvmw.pte)) {
>>> -                /*
>>> -                 * huge_pmd_unshare unmapped an entire PMD
>>> -                 * page.  There is no way of knowing exactly
>>> -                 * which PMDs may be cached for this mm, so
>>> -                 * we must flush them all.  start/end were
>>> -                 * already adjusted above to cover this range.
>>> -                 */
>>> -                flush_cache_range(vma, range.start, range.end);
>>>                   flush_tlb_range(vma, range.start, range.end);
>>>                   mmu_notifier_invalidate_range(mm, range.start,
>>>                                     range.end);
>>> @@ -1560,13 +1561,14 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
>>>                   page_vma_mapped_walk_done(&pvmw);
>>>                   break;
>>>               }
>>> +        } else {
>>> +            flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
>>
>> I know this call to flush_cache_page() existed before your change.  But, when
>> looking at this now I wonder how hugetlb pages are handled?  Are there any
>> versions of flush_cache_page() that take page size into account?
> 
> Thanks for the reminder. I checked the flush_cache_page() implementation on some architectures (like arm32): they do not consider hugetlb pages, so I think we may miss flushing the whole cache for hugetlb pages on some architectures.
> 
> With this patch we can mitigate this issue, since we switch to flush_cache_range() to cover the possible range when flushing the cache for hugetlb pages. But for anon hugetlb pages, we should also convert to
> using flush_cache_range(). I think we can do this conversion in a separate patch set, checking all the places that use flush_cache_page() to flush the cache for hugetlb pages. What do you think?

Yes, I am OK with that approach.

-- 
Mike Kravetz
