Linux-mm Archive on lore.kernel.org
* [PATCH v2] mm/hugetlb: Fix calculation of adjust_range_if_pmd_sharing_possible
@ 2020-07-30 20:16 Peter Xu
  2020-07-30 21:49 ` Mike Kravetz
  0 siblings, 1 reply; 4+ messages in thread
From: Peter Xu @ 2020-07-30 20:16 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: peterx, Andrew Morton, Andrea Arcangeli, Mike Kravetz, Matthew Wilcox

This was found by code inspection only.

The worst case scenario should assume that the whole range is covered by
pmd sharing.  The old algorithm might not work as expected for ranges
like (1g-2m, 1g+2m), where the range adjusted by the old code is
(0, 1g+2m) but the expected range is (0, 2g).

While at it, remove the loop, which should not be required.  With that, the
new code should also be faster when the invalidating range is huge.

CC: Andrea Arcangeli <aarcange@redhat.com>
CC: Mike Kravetz <mike.kravetz@oracle.com>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Matthew Wilcox <willy@infradead.org>
CC: linux-mm@kvack.org
CC: linux-kernel@vger.kernel.org
Signed-off-by: Peter Xu <peterx@redhat.com>
---
v2:
- use min/max instead of custom MIN/MAX [Matthew]
---
 mm/hugetlb.c | 24 ++++++++++--------------
 1 file changed, 10 insertions(+), 14 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 4645f1441d32..7332f3c4b8ec 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5321,25 +5321,21 @@ static bool vma_shareable(struct vm_area_struct *vma, unsigned long addr)
 void adjust_range_if_pmd_sharing_possible(struct vm_area_struct *vma,
 				unsigned long *start, unsigned long *end)
 {
-	unsigned long check_addr;
+	unsigned long a_start, a_end;
 
 	if (!(vma->vm_flags & VM_MAYSHARE))
 		return;
 
-	for (check_addr = *start; check_addr < *end; check_addr += PUD_SIZE) {
-		unsigned long a_start = check_addr & PUD_MASK;
-		unsigned long a_end = a_start + PUD_SIZE;
+	/* Extend the range to be PUD aligned for a worst case scenario */
+	a_start = ALIGN_DOWN(*start, PUD_SIZE);
+	a_end = ALIGN(*end, PUD_SIZE);
 
-		/*
-		 * If sharing is possible, adjust start/end if necessary.
-		 */
-		if (range_in_vma(vma, a_start, a_end)) {
-			if (a_start < *start)
-				*start = a_start;
-			if (a_end > *end)
-				*end = a_end;
-		}
-	}
+	/*
+	 * Intersect the range with the vma range, since pmd sharing won't be
+	 * across vma after all
+	 */
+	*start = max(vma->vm_start, a_start);
+	*end = min(vma->vm_end, a_end);
 }
 
 /*
-- 
2.26.2




* Re: [PATCH v2] mm/hugetlb: Fix calculation of adjust_range_if_pmd_sharing_possible
  2020-07-30 20:16 [PATCH v2] mm/hugetlb: Fix calculation of adjust_range_if_pmd_sharing_possible Peter Xu
@ 2020-07-30 21:49 ` Mike Kravetz
  2020-07-30 23:26   ` Peter Xu
  0 siblings, 1 reply; 4+ messages in thread
From: Mike Kravetz @ 2020-07-30 21:49 UTC (permalink / raw)
  To: Peter Xu, linux-mm, linux-kernel
  Cc: Andrew Morton, Andrea Arcangeli, Matthew Wilcox

On 7/30/20 1:16 PM, Peter Xu wrote:
> This is found by code observation only.
> 
> Firstly, the worst case scenario should assume the whole range was covered by
> pmd sharing.  The old algorithm might not work as expected for ranges
> like (1g-2m, 1g+2m), where the adjusted range should be (0, 1g+2m) but the
> expected range should be (0, 2g).
> 
> Since at it, remove the loop since it should not be required.  With that, the
> new code should be faster too when the invalidating range is huge.

Thanks Peter!

That is certainly much simpler than the loop in current code.  You say there
are instances where old code 'might not work' for ranges like (1g-2m, 1g+2m).
Not sure I understand what you mean by adjusted and expected ranges in the
message.  Both are possible 'adjusted' ranges depending on vma size.

Just trying to figure out if there is an actual problem in the existing code
that needs to be fixed in stable.  I think the existing code is correct, just
inefficient.
-- 
Mike Kravetz



* Re: [PATCH v2] mm/hugetlb: Fix calculation of adjust_range_if_pmd_sharing_possible
  2020-07-30 21:49 ` Mike Kravetz
@ 2020-07-30 23:26   ` Peter Xu
  2020-07-31  0:28     ` Mike Kravetz
  0 siblings, 1 reply; 4+ messages in thread
From: Peter Xu @ 2020-07-30 23:26 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: linux-mm, linux-kernel, Andrew Morton, Andrea Arcangeli, Matthew Wilcox

Hi, Mike,

On Thu, Jul 30, 2020 at 02:49:18PM -0700, Mike Kravetz wrote:
> On 7/30/20 1:16 PM, Peter Xu wrote:
> > This is found by code observation only.
> > 
> > Firstly, the worst case scenario should assume the whole range was covered by
> > pmd sharing.  The old algorithm might not work as expected for ranges
> > like (1g-2m, 1g+2m), where the adjusted range should be (0, 1g+2m) but the
> > expected range should be (0, 2g).
> > 
> > Since at it, remove the loop since it should not be required.  With that, the
> > new code should be faster too when the invalidating range is huge.
> 
> Thanks Peter!
> 
> That is certainly much simpler than the loop in current code.  You say there
> are instances where old code 'might not work' for ranges like (1g-2m, 1g+2m).
> Not sure I understand what you mean by adjusted and expected ranges in the
> message.  Both are possible 'adjusted' ranges depending on vma size.
> 
> Just trying to figure out if there is an actual problem in the existing code
> that needs to be fixed in stable.  I think the existing code is correct, just
> inefficient.

Thanks for the quick review!

I'm not sure whether that will cause a real problem, but iiuc in my previous
example of (1g-2m, 1g+2m) in the commit message, the old code will extend the
range to (0, 1g+2m).  In this case, if unluckily the (1g, 2g) range is a pud
with shared pmd, then imho we face the risk of partial tlb flushing with the
old code, because it will only flush tlb for range (0, 1g+2m) but not (0, 2g).
If that's the case, maybe it is worth cc'ing stable.

Anyway, I'd like to double-check with you in case I missed something.

Thanks,

-- 
Peter Xu




* Re: [PATCH v2] mm/hugetlb: Fix calculation of adjust_range_if_pmd_sharing_possible
  2020-07-30 23:26   ` Peter Xu
@ 2020-07-31  0:28     ` Mike Kravetz
  0 siblings, 0 replies; 4+ messages in thread
From: Mike Kravetz @ 2020-07-31  0:28 UTC (permalink / raw)
  To: Peter Xu
  Cc: linux-mm, linux-kernel, Andrew Morton, Andrea Arcangeli, Matthew Wilcox

On 7/30/20 4:26 PM, Peter Xu wrote:
> Hi, Mike,
> 
> On Thu, Jul 30, 2020 at 02:49:18PM -0700, Mike Kravetz wrote:
>> On 7/30/20 1:16 PM, Peter Xu wrote:
>>> This is found by code observation only.
>>>
>>> Firstly, the worst case scenario should assume the whole range was covered by
>>> pmd sharing.  The old algorithm might not work as expected for ranges
>>> like (1g-2m, 1g+2m), where the adjusted range should be (0, 1g+2m) but the
>>> expected range should be (0, 2g).
>>>
>>> Since at it, remove the loop since it should not be required.  With that, the
>>> new code should be faster too when the invalidating range is huge.
>>
>> Thanks Peter!
>>
>> That is certainly much simpler than the loop in current code.  You say there
>> are instances where old code 'might not work' for ranges like (1g-2m, 1g+2m).
>> Not sure I understand what you mean by adjusted and expected ranges in the
>> message.  Both are possible 'adjusted' ranges depending on vma size.
>>
>> Just trying to figure out if there is an actual problem in the existing code
>> that needs to be fixed in stable.  I think the existing code is correct, just
>> inefficient.
> 
> Thanks for the quick review!
> 
> I'm not sure whether that will cause a real problem, but iiuc in my previous
> example of (1g-2m, 1g+2m) in the commit message, the old code will extend the
> range to (0, 1g+2m).  In this case, if unluckily the (1g, 2g) range is a pud
> with shared pmd, then imho we face the risk of partial tlb flushing with the
> old code, because it will only flush tlb for range (0, 1g+2m) but not (0, 2g).
> If that's the case, maybe it worths cc stable.
> 
> Anyway, I'd like to double confirm with you in case I missed something.

You are correct.  With range (1g-2m, 1g+2m) within a vma (0, 2g) the existing
code will only adjust to (0, 1g+2m) which is incorrect.
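
For illustration, the example can be reproduced in user space.  The following
is a minimal sketch, not kernel code: struct range, adjust_old(), adjust_new()
and ALIGN_UP() are hypothetical stand-ins, PUD_SIZE is assumed to be 1 GiB
(x86-64 with 4 KiB base pages), and the VM_MAYSHARE check is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: one PUD entry maps 1 GiB, huge pages are 2 MiB. */
#define PUD_SIZE (UINT64_C(1) << 30)
#define PUD_MASK (~(PUD_SIZE - 1))
#define SZ_2M    (UINT64_C(2) << 20)

/* User-space stand-ins for the kernel's ALIGN_DOWN()/ALIGN(); power-of-two only. */
#define ALIGN_DOWN(x, a) ((x) & ~((uint64_t)(a) - 1))
#define ALIGN_UP(x, a)   (((x) + ((uint64_t)(a) - 1)) & ~((uint64_t)(a) - 1))

/* Hypothetical stand-in for a vma or an address range: [start, end). */
struct range {
	uint64_t start, end;
};

static int range_in_vma(const struct range *vma, uint64_t s, uint64_t e)
{
	return vma->start <= s && e <= vma->end;
}

/* Mirrors the loop removed by the patch. */
static struct range adjust_old(const struct range *vma, struct range r)
{
	uint64_t check_addr;

	assert(r.start < r.end);
	for (check_addr = r.start; check_addr < r.end; check_addr += PUD_SIZE) {
		uint64_t a_start = check_addr & PUD_MASK;
		uint64_t a_end = a_start + PUD_SIZE;

		/* Only widen when the whole PUD-sized span lies inside the vma. */
		if (range_in_vma(vma, a_start, a_end)) {
			if (a_start < r.start)
				r.start = a_start;
			if (a_end > r.end)
				r.end = a_end;
		}
	}
	return r;
}

/* Mirrors the patched code: PUD-align outwards, then clamp to the vma. */
static struct range adjust_new(const struct range *vma, struct range r)
{
	uint64_t a_start = ALIGN_DOWN(r.start, PUD_SIZE);
	uint64_t a_end = ALIGN_UP(r.end, PUD_SIZE);

	assert(r.start < r.end);
	r.start = a_start > vma->start ? a_start : vma->start;
	r.end = a_end < vma->end ? a_end : vma->end;
	return r;
}
```

With vma = (0, 2g) and the range (1g-2m, 1g+2m), the loop in adjust_old()
starts at 1g-2m, widens the start to 0, and then exits because the next
check address, 2g-2m, is already past the end.  The second PUD-sized span
(1g, 2g) is never examined, so the result is (0, 1g+2m).  adjust_new()
returns (0, 2g).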

We should cc stable.  The original reason for adjusting the range was to
prevent data corruption (getting wrong page).  Since the range is not always
adjusted correctly, the potential for corruption still exists.

However, I am fairly confident that adjust_range_if_pmd_sharing_possible is
only going to be called in two cases:
1) for a single page
2) for range == entire vma
In those cases, the current code should produce the correct results.
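
A quick way to check the two cases above is to run both versions side by side
in user space.  This is only a sketch under assumptions: struct range,
adjust_old(), adjust_new() and adjust_agree() are hypothetical names, PUD_SIZE
is assumed to be 1 GiB, and the VM_MAYSHARE check is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PUD_SIZE (UINT64_C(1) << 30)	/* assumption: 1 GiB per PUD entry */
#define PUD_MASK (~(PUD_SIZE - 1))

/* Hypothetical stand-in for a vma or an address range: [start, end). */
struct range {
	uint64_t start, end;
};

/* Loop-based adjustment, as in the code before the patch. */
static struct range adjust_old(struct range vma, struct range r)
{
	uint64_t addr;

	assert(r.start < r.end);
	for (addr = r.start; addr < r.end; addr += PUD_SIZE) {
		uint64_t a_start = addr & PUD_MASK;
		uint64_t a_end = a_start + PUD_SIZE;

		/* Widen only if the PUD-sized span is fully inside the vma. */
		if (vma.start <= a_start && a_end <= vma.end) {
			if (a_start < r.start)
				r.start = a_start;
			if (a_end > r.end)
				r.end = a_end;
		}
	}
	return r;
}

/* Align outwards to PUD_SIZE, then intersect with the vma, as in the patch. */
static struct range adjust_new(struct range vma, struct range r)
{
	uint64_t a_start = r.start & PUD_MASK;
	uint64_t a_end = (r.end + PUD_SIZE - 1) & PUD_MASK;

	assert(r.start < r.end);
	r.start = a_start > vma.start ? a_start : vma.start;
	r.end = a_end < vma.end ? a_end : vma.end;
	return r;
}

/* True when both versions compute the same adjusted range. */
static bool adjust_agree(struct range vma, struct range r)
{
	struct range o = adjust_old(vma, r);
	struct range n = adjust_new(vma, r);

	return o.start == n.start && o.end == n.end;
}
```

For a vma of (0, 2g), adjust_agree() holds both for a single 2m page such as
(1g+4m, 1g+6m), where each version yields (1g, 2g), and for the entire vma.
It does not hold for (1g-2m, 1g+2m), the range from the commit message.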

To be safe, let's just cc stable.

Also,
Fixes: 017b1660df89 ("mm: migration: fix migration of huge PMD shared pages")
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
-- 
Mike Kravetz

