linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH] mm/rmap: do not call mmu_notifier_invalidate_page() v3
@ 2017-08-29 19:05 Jérôme Glisse
  2017-08-29 19:06 ` Jerome Glisse
  2017-08-29 19:09 ` Linus Torvalds
  0 siblings, 2 replies; 7+ messages in thread
From: Jérôme Glisse @ 2017-08-29 19:05 UTC (permalink / raw)
  To: linux-kernel, linux-mm
  Cc: Jérôme Glisse, Linus Torvalds, Bernhard Held,
	Adam Borowski, Andrea Arcangeli, Radim Krčmář,
	Wanpeng Li, Paolo Bonzini, Takashi Iwai, Nadav Amit,
	Mike Galbraith, Kirill A . Shutemov, axie, Andrew Morton

Some MMU notifier need to be able to sleep during callback. This was
broken by c7ab0d2fdc84 ("mm: convert try_to_unmap_one() to use
page_vma_mapped_walk()").

This patch restore the sleep ability and properly capture the range of
address that needs to be invalidated.

Relevent threads:
https://lkml.kernel.org/r/20170809204333.27485-1-jglisse@redhat.com
https://lkml.kernel.org/r/20170804134928.l4klfcnqatni7vsc@black.fi.intel.com
https://marc.info/?l=kvm&m=150327081325160&w=2

Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Bernhard Held <berny156@gmx.de>
Cc: Adam Borowski <kilobyte@angband.pl>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Wanpeng Li <kernellwp@gmail.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: axie <axie@amd.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
---
 mm/rmap.c | 26 +++++++++++++++++++++-----
 1 file changed, 21 insertions(+), 5 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index c8993c63eb25..0b25b720f494 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -888,6 +888,8 @@ static bool page_mkclean_one(struct page *page, struct vm_area_struct *vma,
 		.flags = PVMW_SYNC,
 	};
 	int *cleaned = arg;
+	bool invalidate = false;
+	unsigned long start = address, end = address;
 
 	while (page_vma_mapped_walk(&pvmw)) {
 		int ret = 0;
@@ -905,6 +907,9 @@ static bool page_mkclean_one(struct page *page, struct vm_area_struct *vma,
 			entry = pte_mkclean(entry);
 			set_pte_at(vma->vm_mm, address, pte, entry);
 			ret = 1;
+			invalidate = true;
+			/* range is exclusive */
+			end = pvmw.address + PAGE_SIZE;
 		} else {
 #ifdef CONFIG_TRANSPARENT_HUGE_PAGECACHE
 			pmd_t *pmd = pvmw.pmd;
@@ -919,18 +924,22 @@ static bool page_mkclean_one(struct page *page, struct vm_area_struct *vma,
 			entry = pmd_mkclean(entry);
 			set_pmd_at(vma->vm_mm, address, pmd, entry);
 			ret = 1;
+			invalidate = true;
+			/* range is exclusive */
+			end = pvmw.address + PAGE_SIZE;
 #else
 			/* unexpected pmd-mapped page? */
 			WARN_ON_ONCE(1);
 #endif
 		}
 
-		if (ret) {
-			mmu_notifier_invalidate_page(vma->vm_mm, address);
+		if (ret)
 			(*cleaned)++;
-		}
 	}
 
+	if (invalidate)
+		mmu_notifier_invalidate_range(vma->vm_mm, start, end);
+
 	return true;
 }
 
@@ -1323,8 +1332,9 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
 	};
 	pte_t pteval;
 	struct page *subpage;
-	bool ret = true;
+	bool ret = true, invalidate = false;
 	enum ttu_flags flags = (enum ttu_flags)arg;
+	unsigned long start = address, end = address;
 
 	/* munlock has nothing to gain from examining un-locked vmas */
 	if ((flags & TTU_MUNLOCK) && !(vma->vm_flags & VM_LOCKED))
@@ -1490,8 +1500,14 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
 discard:
 		page_remove_rmap(subpage, PageHuge(page));
 		put_page(page);
-		mmu_notifier_invalidate_page(mm, address);
+		invalidate = true;
+		/* range is exclusive */
+		end = address + PAGE_SIZE;
 	}
+
+	if (invalidate)
+		mmu_notifier_invalidate_range(mm, start, end);
+
 	return ret;
 }
 
-- 
2.13.5

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [RFC PATCH] mm/rmap: do not call mmu_notifier_invalidate_page() v3
  2017-08-29 19:05 [RFC PATCH] mm/rmap: do not call mmu_notifier_invalidate_page() v3 Jérôme Glisse
@ 2017-08-29 19:06 ` Jerome Glisse
  2017-08-29 19:09 ` Linus Torvalds
  1 sibling, 0 replies; 7+ messages in thread
From: Jerome Glisse @ 2017-08-29 19:06 UTC (permalink / raw)
  To: linux-kernel, linux-mm
  Cc: Linus Torvalds, Bernhard Held, Adam Borowski, Andrea Arcangeli,
	Radim Krčmář,
	Wanpeng Li, Paolo Bonzini, Takashi Iwai, Nadav Amit,
	Mike Galbraith, Kirill A . Shutemov, axie, Andrew Morton

On Tue, Aug 29, 2017 at 03:05:26PM -0400, Jérôme Glisse wrote:
> Some MMU notifier need to be able to sleep during callback. This was
> broken by c7ab0d2fdc84 ("mm: convert try_to_unmap_one() to use
> page_vma_mapped_walk()").
> 
> This patch restore the sleep ability and properly capture the range of
> address that needs to be invalidated.
> 
> Relevent threads:
> https://lkml.kernel.org/r/20170809204333.27485-1-jglisse@redhat.com
> https://lkml.kernel.org/r/20170804134928.l4klfcnqatni7vsc@black.fi.intel.com
> https://marc.info/?l=kvm&m=150327081325160&w=2

Note that i haven't yet tested this patch i just wanted to put it
out there. I am gonna run some test over new few days an report.
Anyone else is welcome to test it too.

Cheers,
Jérôme

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC PATCH] mm/rmap: do not call mmu_notifier_invalidate_page() v3
  2017-08-29 19:05 [RFC PATCH] mm/rmap: do not call mmu_notifier_invalidate_page() v3 Jérôme Glisse
  2017-08-29 19:06 ` Jerome Glisse
@ 2017-08-29 19:09 ` Linus Torvalds
  2017-08-29 19:14   ` Jerome Glisse
  2017-08-29 19:16   ` Linus Torvalds
  1 sibling, 2 replies; 7+ messages in thread
From: Linus Torvalds @ 2017-08-29 19:09 UTC (permalink / raw)
  To: Jérôme Glisse
  Cc: Linux Kernel Mailing List, linux-mm, Bernhard Held,
	Adam Borowski, Andrea Arcangeli, Radim Krčmář,
	Wanpeng Li, Paolo Bonzini, Takashi Iwai, Nadav Amit,
	Mike Galbraith, Kirill A . Shutemov, axie, Andrew Morton

On Tue, Aug 29, 2017 at 12:05 PM, Jérôme Glisse <jglisse@redhat.com> wrote:
> Some MMU notifier need to be able to sleep during callback. This was
> broken by c7ab0d2fdc84 ("mm: convert try_to_unmap_one() to use
> page_vma_mapped_walk()").

No. No no no.

Didn't you learn *anything* from the bug?

You cannot replace "mmu_notifier_invalidate_page()" with
"mmu_notifier_invalidate_range()".

KVM implements mmu_notifier_invalidate_page().

IT DOES NOT IMPLEMENT THAT RANGE CRAP AT ALL.

So any approach like this is fundamentally garbage. Really. Stop
sending crap. This is exactly tehe same thing that we already reverted
because it was broken shit. Why do you re-send it without actually
fixing the fundamental problems that were pointed out?

               Linus

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC PATCH] mm/rmap: do not call mmu_notifier_invalidate_page() v3
  2017-08-29 19:09 ` Linus Torvalds
@ 2017-08-29 19:14   ` Jerome Glisse
  2017-08-29 19:16   ` Linus Torvalds
  1 sibling, 0 replies; 7+ messages in thread
From: Jerome Glisse @ 2017-08-29 19:14 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Linux Kernel Mailing List, linux-mm, Bernhard Held,
	Adam Borowski, Andrea Arcangeli, Radim Krčmář,
	Wanpeng Li, Paolo Bonzini, Takashi Iwai, Nadav Amit,
	Mike Galbraith, Kirill A . Shutemov, axie, Andrew Morton

On Tue, Aug 29, 2017 at 12:09:45PM -0700, Linus Torvalds wrote:
> On Tue, Aug 29, 2017 at 12:05 PM, Jérôme Glisse <jglisse@redhat.com> wrote:
> > Some MMU notifier need to be able to sleep during callback. This was
> > broken by c7ab0d2fdc84 ("mm: convert try_to_unmap_one() to use
> > page_vma_mapped_walk()").
> 
> No. No no no.
> 
> Didn't you learn *anything* from the bug?
> 
> You cannot replace "mmu_notifier_invalidate_page()" with
> "mmu_notifier_invalidate_range()".
> 
> KVM implements mmu_notifier_invalidate_page().
> 
> IT DOES NOT IMPLEMENT THAT RANGE CRAP AT ALL.
> 
> So any approach like this is fundamentally garbage. Really. Stop
> sending crap. This is exactly tehe same thing that we already reverted
> because it was broken shit. Why do you re-send it without actually
> fixing the fundamental problems that were pointed out?
> 

Sorry i missed the kvm not implementing the range() only callback so
i am gonna respin with start/end.

Jérôme

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC PATCH] mm/rmap: do not call mmu_notifier_invalidate_page() v3
  2017-08-29 19:09 ` Linus Torvalds
  2017-08-29 19:14   ` Jerome Glisse
@ 2017-08-29 19:16   ` Linus Torvalds
  2017-08-29 19:58     ` Linus Torvalds
  1 sibling, 1 reply; 7+ messages in thread
From: Linus Torvalds @ 2017-08-29 19:16 UTC (permalink / raw)
  To: Jérôme Glisse
  Cc: Linux Kernel Mailing List, linux-mm, Bernhard Held,
	Adam Borowski, Andrea Arcangeli, Radim Krčmář,
	Wanpeng Li, Paolo Bonzini, Takashi Iwai, Nadav Amit,
	Mike Galbraith, Kirill A . Shutemov, axie, Andrew Morton

On Tue, Aug 29, 2017 at 12:09 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> So any approach like this is fundamentally garbage. Really. Stop
> sending crap. This is exactly tehe same thing that we already reverted
> because it was broken shit. Why do you re-send it without actually
> fixing the fundamental problems that were pointed out?

Here's what I think might work:

 - put mmu_notifier_invalidate_range_start() before the rmap lock is
taken (and yes, this means that you don't know if it actually will do
anyhting)

 - put mmu_notifier_invalidate_range_end() after the lock is released.
And yes, this means that it will be unconditional and regardless of
whether anything happened)

And then you can check if something actually happened by catching the
*ATOMIC* call to mmu_notifier_invalidate_page(), setting a flag, and
then doing something blocking at mmu_notifier_invalidate_range_end()
time.

Maybe.

I don't know what the KVM issues are.

                Linus

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC PATCH] mm/rmap: do not call mmu_notifier_invalidate_page() v3
  2017-08-29 19:16   ` Linus Torvalds
@ 2017-08-29 19:58     ` Linus Torvalds
  2017-08-29 20:14       ` Jerome Glisse
  0 siblings, 1 reply; 7+ messages in thread
From: Linus Torvalds @ 2017-08-29 19:58 UTC (permalink / raw)
  To: Jérôme Glisse
  Cc: Linux Kernel Mailing List, linux-mm, Bernhard Held,
	Adam Borowski, Andrea Arcangeli, Radim Krčmář,
	Wanpeng Li, Paolo Bonzini, Takashi Iwai, Nadav Amit,
	Mike Galbraith, Kirill A . Shutemov, axie, Andrew Morton

On Tue, Aug 29, 2017 at 12:16 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> And then you can check if something actually happened by catching the
> *ATOMIC* call to mmu_notifier_invalidate_page(), setting a flag, and
> then doing something blocking at mmu_notifier_invalidate_range_end()
> time.
>
> Maybe.

Note that now I have looked more at the users, I think we actually
just want to get rid of mmu_notifier_invalidate_page() entirely in
favor of just calling mmu_notifier_invalidate_range_start()/end().

Nobody seems to want an atomic version of
mmu_notifier_invalidate_page(), they are perfectly happy just getting
those range_start/end() call instead.

HOWEVER.

There do seem to be places (eg powernv/npu-dma.c, iommu/amd_iommu_v2.c
and ommu/intel-svm.c) that want to get the "invalidate_page()" or
"invalidate_range()" calls, but do *not* catch the begin/end() ones.
The "range" calls were for atomic cases, and the "page" call was for
the few places that weren't (but should have been). They seem to do
the same things.

So just switching from mmu_notifier_invalidate_page() to the
"invalidate_range_start()/end()" pair instead could break those cases.

But the mmu_notifier_invalidate_range() call has always been atomic,
afaik.  It's called from the ptep_clear_flush_notify(), which is
called while holdin gthe ptl lock as far as I can tell.

So to handle the powernv/npu-dma.c, iommu/amd_iommu_v2.c and
ommu/intel-svm.c correctly, _and_ get he KVM case right, we probably
need to:

 - replace the existing mmu_notifier_invalidate_page() call with
mmu_notifier_invalidate_range(), and make sure it's inside the locked
region (ie fs/dax.c too - actually move it inside the lock)

 - surround the locked region with those
mmu_notifier_invalidate_range_start()/end() calls.

 - get rid of mmu_notifier_invalidate_page() entirely, it had bad
semantics anyway.

and from all I can tell that should work for everybody.

But maybe I'm missing something.

               Linus

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC PATCH] mm/rmap: do not call mmu_notifier_invalidate_page() v3
  2017-08-29 19:58     ` Linus Torvalds
@ 2017-08-29 20:14       ` Jerome Glisse
  0 siblings, 0 replies; 7+ messages in thread
From: Jerome Glisse @ 2017-08-29 20:14 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Linux Kernel Mailing List, linux-mm, Bernhard Held,
	Adam Borowski, Andrea Arcangeli, Radim Krčmář,
	Wanpeng Li, Paolo Bonzini, Takashi Iwai, Nadav Amit,
	Mike Galbraith, Kirill A . Shutemov, axie, Andrew Morton

On Tue, Aug 29, 2017 at 12:58:45PM -0700, Linus Torvalds wrote:
> On Tue, Aug 29, 2017 at 12:16 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > And then you can check if something actually happened by catching the
> > *ATOMIC* call to mmu_notifier_invalidate_page(), setting a flag, and
> > then doing something blocking at mmu_notifier_invalidate_range_end()
> > time.
> >
> > Maybe.
> 
> Note that now I have looked more at the users, I think we actually
> just want to get rid of mmu_notifier_invalidate_page() entirely in
> favor of just calling mmu_notifier_invalidate_range_start()/end().
> 
> Nobody seems to want an atomic version of
> mmu_notifier_invalidate_page(), they are perfectly happy just getting
> those range_start/end() call instead.
> 
> HOWEVER.
> 
> There do seem to be places (eg powernv/npu-dma.c, iommu/amd_iommu_v2.c
> and ommu/intel-svm.c) that want to get the "invalidate_page()" or
> "invalidate_range()" calls, but do *not* catch the begin/end() ones.
> The "range" calls were for atomic cases, and the "page" call was for
> the few places that weren't (but should have been). They seem to do
> the same things.
> 
> So just switching from mmu_notifier_invalidate_page() to the
> "invalidate_range_start()/end()" pair instead could break those cases.
> 
> But the mmu_notifier_invalidate_range() call has always been atomic,
> afaik.  It's called from the ptep_clear_flush_notify(), which is
> called while holdin gthe ptl lock as far as I can tell.
> 
> So to handle the powernv/npu-dma.c, iommu/amd_iommu_v2.c and
> ommu/intel-svm.c correctly, _and_ get he KVM case right, we probably
> need to:
> 
>  - replace the existing mmu_notifier_invalidate_page() call with
> mmu_notifier_invalidate_range(), and make sure it's inside the locked
> region (ie fs/dax.c too - actually move it inside the lock)
> 
>  - surround the locked region with those
> mmu_notifier_invalidate_range_start()/end() calls.
> 
>  - get rid of mmu_notifier_invalidate_page() entirely, it had bad
> semantics anyway.
> 
> and from all I can tell that should work for everybody.
> 
> But maybe I'm missing something.
> 

So ignore what i just sent, i will rework it toward that direction.
I believe you are right.

Jérôme

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2017-08-29 20:14 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-08-29 19:05 [RFC PATCH] mm/rmap: do not call mmu_notifier_invalidate_page() v3 Jérôme Glisse
2017-08-29 19:06 ` Jerome Glisse
2017-08-29 19:09 ` Linus Torvalds
2017-08-29 19:14   ` Jerome Glisse
2017-08-29 19:16   ` Linus Torvalds
2017-08-29 19:58     ` Linus Torvalds
2017-08-29 20:14       ` Jerome Glisse

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).