Linux-Fsdevel Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH v4 0/9] mmu notifier provide context informations
@ 2019-01-23 22:23 jglisse
  2019-01-23 22:23 ` [PATCH v4 1/9] mm/mmu_notifier: contextual information for event enums jglisse
                   ` (10 more replies)
  0 siblings, 11 replies; 24+ messages in thread
From: jglisse @ 2019-01-23 22:23 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, linux-kernel, Jérôme Glisse,
	Christian König, Jan Kara, Felix Kuehling, Jason Gunthorpe,
	Matthew Wilcox, Ross Zwisler, Dan Williams, Paolo Bonzini,
	Radim Krčmář,
	Michal Hocko, Ralph Campbell, John Hubbard, kvm, dri-devel,
	linux-rdma, linux-fsdevel, Arnd Bergmann

From: Jérôme Glisse <jglisse@redhat.com>

Hi Andrew, i see that you still have my event patch in you queue [1].
This patchset replace that single patch and is broken down in further
step so that it is easier to review and ascertain that no mistake were
made during mechanical changes. Here are the step:

    Patch 1 - add the enum values
    Patch 2 - coccinelle semantic patch to convert all call site of
              mmu_notifier_range_init to default enum value and also
              to passing down the vma when it is available
    Patch 3 - update many call site to more accurate enum values
    Patch 4 - add the information to the mmu_notifier_range struct
    Patch 5 - helper to test if a range is updated to read only

All the remaining patches are update to various driver to demonstrate
how this new information get use by device driver. I build tested
with make all and make all minus everything that enable mmu notifier
ie building with MMU_NOTIFIER=no. Also tested with some radeon,amd
gpu and intel gpu.

If they are no objections i believe best plan would be to merge the
the first 5 patches (all mm changes) through your queue for 5.1 and
then to delay driver update to each individual driver tree for 5.2.
This will allow each individual device driver maintainer time to more
thouroughly test this more then my own testing.

Note that i also intend to use this feature further in nouveau and
HMM down the road. I also expect that other user like KVM might be
interested into leveraging this new information to optimize some of
there secondary page table invalidation.

Here is an explaination on the rational for this patchset:


CPU page table update can happens for many reasons, not only as a result
of a syscall (munmap(), mprotect(), mremap(), madvise(), ...) but also
as a result of kernel activities (memory compression, reclaim, migration,
...).

This patch introduce a set of enums that can be associated with each of
the events triggering a mmu notifier. Latter patches take advantages of
those enum values.

- UNMAP: munmap() or mremap()
- CLEAR: page table is cleared (migration, compaction, reclaim, ...)
- PROTECTION_VMA: change in access protections for the range
- PROTECTION_PAGE: change in access protections for page in the range
- SOFT_DIRTY: soft dirtyness tracking

Being able to identify munmap() and mremap() from other reasons why the
page table is cleared is important to allow user of mmu notifier to
update their own internal tracking structure accordingly (on munmap or
mremap it is not longer needed to track range of virtual address as it
becomes invalid).

[1] https://www.ozlabs.org/~akpm/mmotm/broken-out/mm-mmu_notifier-contextual-information-for-event-triggering-invalidation-v2.patch

Cc: Christian König <christian.koenig@amd.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Ross Zwisler <zwisler@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: kvm@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
Cc: linux-rdma@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org
Cc: Arnd Bergmann <arnd@arndb.de>

Jérôme Glisse (9):
  mm/mmu_notifier: contextual information for event enums
  mm/mmu_notifier: contextual information for event triggering
    invalidation
  mm/mmu_notifier: use correct mmu_notifier events for each invalidation
  mm/mmu_notifier: pass down vma and reasons why mmu notifier is
    happening
  mm/mmu_notifier: mmu_notifier_range_update_to_read_only() helper
  gpu/drm/radeon: optimize out the case when a range is updated to read
    only
  gpu/drm/amdgpu: optimize out the case when a range is updated to read
    only
  gpu/drm/i915: optimize out the case when a range is updated to read
    only
  RDMA/umem_odp: optimize out the case when a range is updated to read
    only

 drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c  | 13 ++++++++
 drivers/gpu/drm/i915/i915_gem_userptr.c | 16 ++++++++++
 drivers/gpu/drm/radeon/radeon_mn.c      | 13 ++++++++
 drivers/infiniband/core/umem_odp.c      | 22 +++++++++++--
 fs/proc/task_mmu.c                      |  3 +-
 include/linux/mmu_notifier.h            | 42 ++++++++++++++++++++++++-
 include/rdma/ib_umem_odp.h              |  1 +
 kernel/events/uprobes.c                 |  3 +-
 mm/huge_memory.c                        | 14 +++++----
 mm/hugetlb.c                            | 11 ++++---
 mm/khugepaged.c                         |  3 +-
 mm/ksm.c                                |  6 ++--
 mm/madvise.c                            |  3 +-
 mm/memory.c                             | 25 +++++++++------
 mm/migrate.c                            |  5 ++-
 mm/mmu_notifier.c                       | 10 ++++++
 mm/mprotect.c                           |  4 ++-
 mm/mremap.c                             |  3 +-
 mm/oom_kill.c                           |  3 +-
 mm/rmap.c                               |  6 ++--
 20 files changed, 171 insertions(+), 35 deletions(-)

-- 
2.17.2


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v4 1/9] mm/mmu_notifier: contextual information for event enums
  2019-01-23 22:23 [PATCH v4 0/9] mmu notifier provide context informations jglisse
@ 2019-01-23 22:23 ` jglisse
  2019-01-23 22:23 ` [PATCH v4 2/9] mm/mmu_notifier: contextual information for event triggering invalidation jglisse
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 24+ messages in thread
From: jglisse @ 2019-01-23 22:23 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, linux-kernel, Jérôme Glisse,
	Christian König, Jan Kara, Felix Kuehling, Jason Gunthorpe,
	Matthew Wilcox, Ross Zwisler, Dan Williams, Paolo Bonzini,
	Radim Krčmář,
	Michal Hocko, Ralph Campbell, John Hubbard, kvm, dri-devel,
	linux-rdma, linux-fsdevel, Arnd Bergmann

From: Jérôme Glisse <jglisse@redhat.com>

CPU page table update can happens for many reasons, not only as a result
of a syscall (munmap(), mprotect(), mremap(), madvise(), ...) but also
as a result of kernel activities (memory compression, reclaim, migration,
...).

This patch introduce a set of enums that can be associated with each of
the events triggering a mmu notifier. Latter patches take advantages of
those enum values.

    - UNMAP: munmap() or mremap()
    - CLEAR: page table is cleared (migration, compaction, reclaim, ...)
    - PROTECTION_VMA: change in access protections for the range
    - PROTECTION_PAGE: change in access protections for page in the range
    - SOFT_DIRTY: soft dirtyness tracking

Being able to identify munmap() and mremap() from other reasons why the
page table is cleared is important to allow user of mmu notifier to
update their own internal tracking structure accordingly (on munmap or
mremap it is not longer needed to track range of virtual address as it
becomes invalid).

Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Ross Zwisler <zwisler@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: kvm@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
Cc: linux-rdma@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org
Cc: Arnd Bergmann <arnd@arndb.de>
---
 include/linux/mmu_notifier.h | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
index 4050ec1c3b45..abc9dbb7bcb6 100644
--- a/include/linux/mmu_notifier.h
+++ b/include/linux/mmu_notifier.h
@@ -10,6 +10,36 @@
 struct mmu_notifier;
 struct mmu_notifier_ops;
 
+/**
+ * enum mmu_notifier_event - reason for the mmu notifier callback
+ * @MMU_NOTIFY_UNMAP: either munmap() that unmap the range or a mremap() that
+ * move the range
+ *
+ * @MMU_NOTIFY_CLEAR: clear page table entry (many reasons for this like
+ * madvise() or replacing a page by another one, ...).
+ *
+ * @MMU_NOTIFY_PROTECTION_VMA: update is due to protection change for the range
+ * ie using the vma access permission (vm_page_prot) to update the whole range
+ * is enough no need to inspect changes to the CPU page table (mprotect()
+ * syscall)
+ *
+ * @MMU_NOTIFY_PROTECTION_PAGE: update is due to change in read/write flag for
+ * pages in the range so to mirror those changes the user must inspect the CPU
+ * page table (from the end callback).
+ *
+ * @MMU_NOTIFY_SOFT_DIRTY: soft dirty accounting (still same page and same
+ * access flags). User should soft dirty the page in the end callback to make
+ * sure that anyone relying on soft dirtyness catch pages that might be written
+ * through non CPU mappings.
+ */
+enum mmu_notifier_event {
+	MMU_NOTIFY_UNMAP = 0,
+	MMU_NOTIFY_CLEAR,
+	MMU_NOTIFY_PROTECTION_VMA,
+	MMU_NOTIFY_PROTECTION_PAGE,
+	MMU_NOTIFY_SOFT_DIRTY,
+};
+
 #ifdef CONFIG_MMU_NOTIFIER
 
 /*
-- 
2.17.2


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v4 2/9] mm/mmu_notifier: contextual information for event triggering invalidation
  2019-01-23 22:23 [PATCH v4 0/9] mmu notifier provide context informations jglisse
  2019-01-23 22:23 ` [PATCH v4 1/9] mm/mmu_notifier: contextual information for event enums jglisse
@ 2019-01-23 22:23 ` jglisse
  2019-01-23 22:23 ` [PATCH v4 3/9] mm/mmu_notifier: use correct mmu_notifier events for each invalidation jglisse
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 24+ messages in thread
From: jglisse @ 2019-01-23 22:23 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, linux-kernel, Jérôme Glisse,
	Christian König, Jan Kara, Felix Kuehling, Jason Gunthorpe,
	Matthew Wilcox, Ross Zwisler, Dan Williams, Paolo Bonzini,
	Radim Krčmář,
	Michal Hocko, Ralph Campbell, John Hubbard, kvm, dri-devel,
	linux-rdma, linux-fsdevel, Arnd Bergmann

From: Jérôme Glisse <jglisse@redhat.com>

CPU page table update can happens for many reasons, not only as a result
of a syscall (munmap(), mprotect(), mremap(), madvise(), ...) but also
as a result of kernel activities (memory compression, reclaim, migration,
...).

Users of mmu notifier API track changes to the CPU page table and take
specific action for them. While current API only provide range of virtual
address affected by the change, not why the changes is happening.

This patchset do the initial mechanical convertion of all the places that
calls mmu_notifier_range_init to also provide the default MMU_NOTIFY_UNMAP
event as well as the vma if it is know (most invalidation happens against
a given vma). Passing down the vma allows the users of mmu notifier to
inspect the new vma page protection.

The MMU_NOTIFY_UNMAP is always the safe default as users of mmu notifier
should assume that every for the range is going away when that event
happens. A latter patch do convert mm call path to use a more appropriate
events for each call.

This is done as 2 patches so that no call site is forgotten especialy
as it uses this following coccinelle patch:

%<----------------------------------------------------------------------
@@
identifier I1, I2, I3, I4;
@@
static inline void mmu_notifier_range_init(struct mmu_notifier_range *I1,
+enum mmu_notifier_event event,
+struct vm_area_struct *vma,
struct mm_struct *I2, unsigned long I3, unsigned long I4) { ... }

@@
@@
-#define mmu_notifier_range_init(range, mm, start, end)
+#define mmu_notifier_range_init(range, event, vma, mm, start, end)

@@
expression E1, E3, E4;
identifier I1;
@@
<...
mmu_notifier_range_init(E1,
+MMU_NOTIFY_UNMAP, I1,
I1->vm_mm, E3, E4)
...>

@@
expression E1, E2, E3, E4;
identifier FN, VMA;
@@
FN(..., struct vm_area_struct *VMA, ...) {
<...
mmu_notifier_range_init(E1,
+MMU_NOTIFY_UNMAP, VMA,
E2, E3, E4)
...> }

@@
expression E1, E2, E3, E4;
identifier FN, VMA;
@@
FN(...) {
struct vm_area_struct *VMA;
<...
mmu_notifier_range_init(E1,
+MMU_NOTIFY_UNMAP, VMA,
E2, E3, E4)
...> }

@@
expression E1, E2, E3, E4;
identifier FN;
@@
FN(...) {
<...
mmu_notifier_range_init(E1,
+MMU_NOTIFY_UNMAP, NULL,
E2, E3, E4)
...> }
---------------------------------------------------------------------->%

Applied with:
spatch --all-includes --sp-file mmu-notifier.spatch fs/proc/task_mmu.c --in-place
spatch --sp-file mmu-notifier.spatch --dir kernel/events/ --in-place
spatch --sp-file mmu-notifier.spatch --dir mm --in-place

Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Ross Zwisler <zwisler@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: kvm@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
Cc: linux-rdma@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org
Cc: Arnd Bergmann <arnd@arndb.de>
---
 fs/proc/task_mmu.c           |  3 ++-
 include/linux/mmu_notifier.h |  4 +++-
 kernel/events/uprobes.c      |  3 ++-
 mm/huge_memory.c             | 12 ++++++++----
 mm/hugetlb.c                 | 10 ++++++----
 mm/khugepaged.c              |  3 ++-
 mm/ksm.c                     |  6 ++++--
 mm/madvise.c                 |  3 ++-
 mm/memory.c                  | 25 ++++++++++++++++---------
 mm/migrate.c                 |  5 ++++-
 mm/mprotect.c                |  3 ++-
 mm/mremap.c                  |  3 ++-
 mm/oom_kill.c                |  3 ++-
 mm/rmap.c                    |  6 ++++--
 14 files changed, 59 insertions(+), 30 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index f0ec9edab2f3..57e7f98647d3 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -1143,7 +1143,8 @@ static ssize_t clear_refs_write(struct file *file, const char __user *buf,
 				break;
 			}
 
-			mmu_notifier_range_init(&range, mm, 0, -1UL);
+			mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP,
+						NULL, mm, 0, -1UL);
 			mmu_notifier_invalidate_range_start(&range);
 		}
 		walk_page_range(0, mm->highest_vm_end, &clear_refs_walk);
diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
index abc9dbb7bcb6..a9808add4070 100644
--- a/include/linux/mmu_notifier.h
+++ b/include/linux/mmu_notifier.h
@@ -348,6 +348,8 @@ static inline void mmu_notifier_mm_destroy(struct mm_struct *mm)
 
 
 static inline void mmu_notifier_range_init(struct mmu_notifier_range *range,
+					   enum mmu_notifier_event event,
+					   struct vm_area_struct *vma,
 					   struct mm_struct *mm,
 					   unsigned long start,
 					   unsigned long end)
@@ -482,7 +484,7 @@ static inline void _mmu_notifier_range_init(struct mmu_notifier_range *range,
 	range->end = end;
 }
 
-#define mmu_notifier_range_init(range, mm, start, end) \
+#define mmu_notifier_range_init(range,event,vma,mm,start,end)  \
 	_mmu_notifier_range_init(range, start, end)
 
 
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 8aef47ee7bfa..b67fe7e59621 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -174,7 +174,8 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
 	struct mmu_notifier_range range;
 	struct mem_cgroup *memcg;
 
-	mmu_notifier_range_init(&range, mm, addr, addr + PAGE_SIZE);
+	mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, vma, mm, addr,
+				addr + PAGE_SIZE);
 
 	VM_BUG_ON_PAGE(PageTransHuge(old_page), old_page);
 
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index faf357eaf0ce..b353e8b7876f 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1182,7 +1182,8 @@ static vm_fault_t do_huge_pmd_wp_page_fallback(struct vm_fault *vmf,
 		cond_resched();
 	}
 
-	mmu_notifier_range_init(&range, vma->vm_mm, haddr,
+	mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, vma, vma->vm_mm,
+				haddr,
 				haddr + HPAGE_PMD_SIZE);
 	mmu_notifier_invalidate_range_start(&range);
 
@@ -1345,7 +1346,8 @@ vm_fault_t do_huge_pmd_wp_page(struct vm_fault *vmf, pmd_t orig_pmd)
 				    vma, HPAGE_PMD_NR);
 	__SetPageUptodate(new_page);
 
-	mmu_notifier_range_init(&range, vma->vm_mm, haddr,
+	mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, vma, vma->vm_mm,
+				haddr,
 				haddr + HPAGE_PMD_SIZE);
 	mmu_notifier_invalidate_range_start(&range);
 
@@ -2023,7 +2025,8 @@ void __split_huge_pud(struct vm_area_struct *vma, pud_t *pud,
 	spinlock_t *ptl;
 	struct mmu_notifier_range range;
 
-	mmu_notifier_range_init(&range, vma->vm_mm, address & HPAGE_PUD_MASK,
+	mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, vma, vma->vm_mm,
+				address & HPAGE_PUD_MASK,
 				(address & HPAGE_PUD_MASK) + HPAGE_PUD_SIZE);
 	mmu_notifier_invalidate_range_start(&range);
 	ptl = pud_lock(vma->vm_mm, pud);
@@ -2241,7 +2244,8 @@ void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 	spinlock_t *ptl;
 	struct mmu_notifier_range range;
 
-	mmu_notifier_range_init(&range, vma->vm_mm, address & HPAGE_PMD_MASK,
+	mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, vma, vma->vm_mm,
+				address & HPAGE_PMD_MASK,
 				(address & HPAGE_PMD_MASK) + HPAGE_PMD_SIZE);
 	mmu_notifier_invalidate_range_start(&range);
 	ptl = pmd_lock(vma->vm_mm, pmd);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index df2e7dd5ff17..cbda46ad6a30 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3246,7 +3246,8 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 	cow = (vma->vm_flags & (VM_SHARED | VM_MAYWRITE)) == VM_MAYWRITE;
 
 	if (cow) {
-		mmu_notifier_range_init(&range, src, vma->vm_start,
+		mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, vma, src,
+					vma->vm_start,
 					vma->vm_end);
 		mmu_notifier_invalidate_range_start(&range);
 	}
@@ -3358,7 +3359,7 @@ void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
 	/*
 	 * If sharing possible, alert mmu notifiers of worst case.
 	 */
-	mmu_notifier_range_init(&range, mm, start, end);
+	mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, vma, mm, start, end);
 	adjust_range_if_pmd_sharing_possible(vma, &range.start, &range.end);
 	mmu_notifier_invalidate_range_start(&range);
 	address = start;
@@ -3626,7 +3627,8 @@ static vm_fault_t hugetlb_cow(struct mm_struct *mm, struct vm_area_struct *vma,
 	__SetPageUptodate(new_page);
 	set_page_huge_active(new_page);
 
-	mmu_notifier_range_init(&range, mm, haddr, haddr + huge_page_size(h));
+	mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, vma, mm, haddr,
+				haddr + huge_page_size(h));
 	mmu_notifier_invalidate_range_start(&range);
 
 	/*
@@ -4346,7 +4348,7 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
 	 * start/end.  Set range.start/range.end to cover the maximum possible
 	 * range if PMD sharing is possible.
 	 */
-	mmu_notifier_range_init(&range, mm, start, end);
+	mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, vma, mm, start, end);
 	adjust_range_if_pmd_sharing_possible(vma, &range.start, &range.end);
 
 	BUG_ON(address >= end);
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 4f017339ddb2..f903acb1b0a6 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1016,7 +1016,8 @@ static void collapse_huge_page(struct mm_struct *mm,
 	pte = pte_offset_map(pmd, address);
 	pte_ptl = pte_lockptr(mm, pmd);
 
-	mmu_notifier_range_init(&range, mm, address, address + HPAGE_PMD_SIZE);
+	mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, NULL, mm, address,
+				address + HPAGE_PMD_SIZE);
 	mmu_notifier_invalidate_range_start(&range);
 	pmd_ptl = pmd_lock(mm, pmd); /* probably unnecessary */
 	/*
diff --git a/mm/ksm.c b/mm/ksm.c
index 6c48ad13b4c9..6b27c4f0fb1f 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -1051,7 +1051,8 @@ static int write_protect_page(struct vm_area_struct *vma, struct page *page,
 
 	BUG_ON(PageTransCompound(page));
 
-	mmu_notifier_range_init(&range, mm, pvmw.address,
+	mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, vma, mm,
+				pvmw.address,
 				pvmw.address + PAGE_SIZE);
 	mmu_notifier_invalidate_range_start(&range);
 
@@ -1139,7 +1140,8 @@ static int replace_page(struct vm_area_struct *vma, struct page *page,
 	if (!pmd)
 		goto out;
 
-	mmu_notifier_range_init(&range, mm, addr, addr + PAGE_SIZE);
+	mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, vma, mm, addr,
+				addr + PAGE_SIZE);
 	mmu_notifier_invalidate_range_start(&range);
 
 	ptep = pte_offset_map_lock(mm, pmd, addr, &ptl);
diff --git a/mm/madvise.c b/mm/madvise.c
index 21a7881a2db4..04446dabba56 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -472,7 +472,8 @@ static int madvise_free_single_vma(struct vm_area_struct *vma,
 	range.end = min(vma->vm_end, end_addr);
 	if (range.end <= vma->vm_start)
 		return -EINVAL;
-	mmu_notifier_range_init(&range, mm, range.start, range.end);
+	mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, vma, mm,
+				range.start, range.end);
 
 	lru_add_drain();
 	tlb_gather_mmu(&tlb, mm, range.start, range.end);
diff --git a/mm/memory.c b/mm/memory.c
index e11ca9dd823f..d9b7c935e812 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1009,7 +1009,8 @@ int copy_page_range(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 	is_cow = is_cow_mapping(vma->vm_flags);
 
 	if (is_cow) {
-		mmu_notifier_range_init(&range, src_mm, addr, end);
+		mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, vma, src_mm,
+					addr, end);
 		mmu_notifier_invalidate_range_start(&range);
 	}
 
@@ -1333,7 +1334,8 @@ void unmap_vmas(struct mmu_gather *tlb,
 {
 	struct mmu_notifier_range range;
 
-	mmu_notifier_range_init(&range, vma->vm_mm, start_addr, end_addr);
+	mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, vma, vma->vm_mm,
+				start_addr, end_addr);
 	mmu_notifier_invalidate_range_start(&range);
 	for ( ; vma && vma->vm_start < end_addr; vma = vma->vm_next)
 		unmap_single_vma(tlb, vma, start_addr, end_addr, NULL);
@@ -1355,7 +1357,8 @@ void zap_page_range(struct vm_area_struct *vma, unsigned long start,
 	struct mmu_gather tlb;
 
 	lru_add_drain();
-	mmu_notifier_range_init(&range, vma->vm_mm, start, start + size);
+	mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, vma, vma->vm_mm,
+				start, start + size);
 	tlb_gather_mmu(&tlb, vma->vm_mm, start, range.end);
 	update_hiwater_rss(vma->vm_mm);
 	mmu_notifier_invalidate_range_start(&range);
@@ -1381,7 +1384,8 @@ static void zap_page_range_single(struct vm_area_struct *vma, unsigned long addr
 	struct mmu_gather tlb;
 
 	lru_add_drain();
-	mmu_notifier_range_init(&range, vma->vm_mm, address, address + size);
+	mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, vma, vma->vm_mm,
+				address, address + size);
 	tlb_gather_mmu(&tlb, vma->vm_mm, address, range.end);
 	update_hiwater_rss(vma->vm_mm);
 	mmu_notifier_invalidate_range_start(&range);
@@ -2271,7 +2275,8 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
 
 	__SetPageUptodate(new_page);
 
-	mmu_notifier_range_init(&range, mm, vmf->address & PAGE_MASK,
+	mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, vma, mm,
+				vmf->address & PAGE_MASK,
 				(vmf->address & PAGE_MASK) + PAGE_SIZE);
 	mmu_notifier_invalidate_range_start(&range);
 
@@ -4081,8 +4086,9 @@ static int __follow_pte_pmd(struct mm_struct *mm, unsigned long address,
 			goto out;
 
 		if (range) {
-			mmu_notifier_range_init(range, mm, address & PMD_MASK,
-					     (address & PMD_MASK) + PMD_SIZE);
+			mmu_notifier_range_init(range, MMU_NOTIFY_UNMAP, NULL,
+						mm, address & PMD_MASK,
+						(address & PMD_MASK) + PMD_SIZE);
 			mmu_notifier_invalidate_range_start(range);
 		}
 		*ptlp = pmd_lock(mm, pmd);
@@ -4099,8 +4105,9 @@ static int __follow_pte_pmd(struct mm_struct *mm, unsigned long address,
 		goto out;
 
 	if (range) {
-		mmu_notifier_range_init(range, mm, address & PAGE_MASK,
-				     (address & PAGE_MASK) + PAGE_SIZE);
+		mmu_notifier_range_init(range, MMU_NOTIFY_UNMAP, NULL, mm,
+					address & PAGE_MASK,
+					(address & PAGE_MASK) + PAGE_SIZE);
 		mmu_notifier_invalidate_range_start(range);
 	}
 	ptep = pte_offset_map_lock(mm, pmd, address, ptlp);
diff --git a/mm/migrate.c b/mm/migrate.c
index a16b15090df3..385c59d5c28d 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2342,7 +2342,8 @@ static void migrate_vma_collect(struct migrate_vma *migrate)
 	mm_walk.mm = migrate->vma->vm_mm;
 	mm_walk.private = migrate;
 
-	mmu_notifier_range_init(&range, mm_walk.mm, migrate->start,
+	mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, NULL, mm_walk.mm,
+				migrate->start,
 				migrate->end);
 	mmu_notifier_invalidate_range_start(&range);
 	walk_page_range(migrate->start, migrate->end, &mm_walk);
@@ -2750,6 +2751,8 @@ static void migrate_vma_pages(struct migrate_vma *migrate)
 				notified = true;
 
 				mmu_notifier_range_init(&range,
+							MMU_NOTIFY_UNMAP,
+							NULL,
 							migrate->vma->vm_mm,
 							addr, migrate->end);
 				mmu_notifier_invalidate_range_start(&range);
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 36cb358db170..b22e660701f1 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -185,7 +185,8 @@ static inline unsigned long change_pmd_range(struct vm_area_struct *vma,
 
 		/* invoke the mmu notifier if the pmd is populated */
 		if (!range.start) {
-			mmu_notifier_range_init(&range, vma->vm_mm, addr, end);
+			mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, vma,
+						vma->vm_mm, addr, end);
 			mmu_notifier_invalidate_range_start(&range);
 		}
 
diff --git a/mm/mremap.c b/mm/mremap.c
index 3320616ed93f..cac19c1e0af4 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -249,7 +249,8 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
 	old_end = old_addr + len;
 	flush_cache_range(vma, old_addr, old_end);
 
-	mmu_notifier_range_init(&range, vma->vm_mm, old_addr, old_end);
+	mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, vma, vma->vm_mm,
+				old_addr, old_end);
 	mmu_notifier_invalidate_range_start(&range);
 
 	for (; old_addr < old_end; old_addr += extent, new_addr += extent) {
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index f0e8cd9edb1a..77289bc6a943 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -531,7 +531,8 @@ bool __oom_reap_task_mm(struct mm_struct *mm)
 			struct mmu_notifier_range range;
 			struct mmu_gather tlb;
 
-			mmu_notifier_range_init(&range, mm, vma->vm_start,
+			mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, vma,
+						mm, vma->vm_start,
 						vma->vm_end);
 			tlb_gather_mmu(&tlb, mm, range.start, range.end);
 			if (mmu_notifier_invalidate_range_start_nonblock(&range)) {
diff --git a/mm/rmap.c b/mm/rmap.c
index 0454ecc29537..49c75f0c6c33 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -896,7 +896,8 @@ static bool page_mkclean_one(struct page *page, struct vm_area_struct *vma,
 	 * We have to assume the worse case ie pmd for invalidation. Note that
 	 * the page can not be free from this function.
 	 */
-	mmu_notifier_range_init(&range, vma->vm_mm, address,
+	mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, vma, vma->vm_mm,
+				address,
 				min(vma->vm_end, address +
 				    (PAGE_SIZE << compound_order(page))));
 	mmu_notifier_invalidate_range_start(&range);
@@ -1371,7 +1372,8 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
 	 * Note that the page can not be free in this function as call of
 	 * try_to_unmap() must hold a reference on the page.
 	 */
-	mmu_notifier_range_init(&range, vma->vm_mm, address,
+	mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, vma, vma->vm_mm,
+				address,
 				min(vma->vm_end, address +
 				    (PAGE_SIZE << compound_order(page))));
 	if (PageHuge(page)) {
-- 
2.17.2


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v4 3/9] mm/mmu_notifier: use correct mmu_notifier events for each invalidation
  2019-01-23 22:23 [PATCH v4 0/9] mmu notifier provide context informations jglisse
  2019-01-23 22:23 ` [PATCH v4 1/9] mm/mmu_notifier: contextual information for event enums jglisse
  2019-01-23 22:23 ` [PATCH v4 2/9] mm/mmu_notifier: contextual information for event triggering invalidation jglisse
@ 2019-01-23 22:23 ` jglisse
  2019-01-23 22:23 ` [PATCH v4 4/9] mm/mmu_notifier: pass down vma and reasons why mmu notifier is happening jglisse
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 24+ messages in thread
From: jglisse @ 2019-01-23 22:23 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, linux-kernel, Jérôme Glisse,
	Christian König, Jan Kara, Felix Kuehling, Jason Gunthorpe,
	Matthew Wilcox, Ross Zwisler, Dan Williams, Paolo Bonzini,
	Radim Krčmář,
	Michal Hocko, Ralph Campbell, John Hubbard, kvm, dri-devel,
	linux-rdma, linux-fsdevel, Arnd Bergmann

From: Jérôme Glisse <jglisse@redhat.com>

This update each existing invalidation to use the correct mmu notifier
event that represent what is happening to the CPU page table. See the
patch which introduced the events to see the rational behind this.

Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Ross Zwisler <zwisler@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: kvm@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
Cc: linux-rdma@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org
Cc: Arnd Bergmann <arnd@arndb.de>
---
 fs/proc/task_mmu.c      |  2 +-
 kernel/events/uprobes.c |  2 +-
 mm/huge_memory.c        | 14 ++++++--------
 mm/hugetlb.c            |  7 ++++---
 mm/khugepaged.c         |  2 +-
 mm/ksm.c                |  4 ++--
 mm/madvise.c            |  2 +-
 mm/memory.c             | 16 ++++++++--------
 mm/migrate.c            |  4 ++--
 mm/mprotect.c           |  5 +++--
 mm/rmap.c               |  6 +++---
 11 files changed, 32 insertions(+), 32 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 57e7f98647d3..cce226f3305f 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -1143,7 +1143,7 @@ static ssize_t clear_refs_write(struct file *file, const char __user *buf,
 				break;
 			}
 
-			mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP,
+			mmu_notifier_range_init(&range, MMU_NOTIFY_SOFT_DIRTY,
 						NULL, mm, 0, -1UL);
 			mmu_notifier_invalidate_range_start(&range);
 		}
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index b67fe7e59621..87e76a1dc758 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -174,7 +174,7 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
 	struct mmu_notifier_range range;
 	struct mem_cgroup *memcg;
 
-	mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, vma, mm, addr,
+	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, vma, mm, addr,
 				addr + PAGE_SIZE);
 
 	VM_BUG_ON_PAGE(PageTransHuge(old_page), old_page);
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index b353e8b7876f..957d23754217 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1182,9 +1182,8 @@ static vm_fault_t do_huge_pmd_wp_page_fallback(struct vm_fault *vmf,
 		cond_resched();
 	}
 
-	mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, vma, vma->vm_mm,
-				haddr,
-				haddr + HPAGE_PMD_SIZE);
+	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, vma, vma->vm_mm,
+				haddr, haddr + HPAGE_PMD_SIZE);
 	mmu_notifier_invalidate_range_start(&range);
 
 	vmf->ptl = pmd_lock(vma->vm_mm, vmf->pmd);
@@ -1346,9 +1345,8 @@ vm_fault_t do_huge_pmd_wp_page(struct vm_fault *vmf, pmd_t orig_pmd)
 				    vma, HPAGE_PMD_NR);
 	__SetPageUptodate(new_page);
 
-	mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, vma, vma->vm_mm,
-				haddr,
-				haddr + HPAGE_PMD_SIZE);
+	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, vma, vma->vm_mm,
+				haddr, haddr + HPAGE_PMD_SIZE);
 	mmu_notifier_invalidate_range_start(&range);
 
 	spin_lock(vmf->ptl);
@@ -2025,7 +2023,7 @@ void __split_huge_pud(struct vm_area_struct *vma, pud_t *pud,
 	spinlock_t *ptl;
 	struct mmu_notifier_range range;
 
-	mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, vma, vma->vm_mm,
+	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, vma, vma->vm_mm,
 				address & HPAGE_PUD_MASK,
 				(address & HPAGE_PUD_MASK) + HPAGE_PUD_SIZE);
 	mmu_notifier_invalidate_range_start(&range);
@@ -2244,7 +2242,7 @@ void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 	spinlock_t *ptl;
 	struct mmu_notifier_range range;
 
-	mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, vma, vma->vm_mm,
+	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, vma, vma->vm_mm,
 				address & HPAGE_PMD_MASK,
 				(address & HPAGE_PMD_MASK) + HPAGE_PMD_SIZE);
 	mmu_notifier_invalidate_range_start(&range);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index cbda46ad6a30..f691398ac6b6 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3246,7 +3246,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 	cow = (vma->vm_flags & (VM_SHARED | VM_MAYWRITE)) == VM_MAYWRITE;
 
 	if (cow) {
-		mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, vma, src,
+		mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, vma, src,
 					vma->vm_start,
 					vma->vm_end);
 		mmu_notifier_invalidate_range_start(&range);
@@ -3627,7 +3627,7 @@ static vm_fault_t hugetlb_cow(struct mm_struct *mm, struct vm_area_struct *vma,
 	__SetPageUptodate(new_page);
 	set_page_huge_active(new_page);
 
-	mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, vma, mm, haddr,
+	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, vma, mm, haddr,
 				haddr + huge_page_size(h));
 	mmu_notifier_invalidate_range_start(&range);
 
@@ -4348,7 +4348,8 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
 	 * start/end.  Set range.start/range.end to cover the maximum possible
 	 * range if PMD sharing is possible.
 	 */
-	mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, vma, mm, start, end);
+	mmu_notifier_range_init(&range, MMU_NOTIFY_PROTECTION_VMA,
+				vma, mm, start, end);
 	adjust_range_if_pmd_sharing_possible(vma, &range.start, &range.end);
 
 	BUG_ON(address >= end);
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index f903acb1b0a6..d09d71f10f0f 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1016,7 +1016,7 @@ static void collapse_huge_page(struct mm_struct *mm,
 	pte = pte_offset_map(pmd, address);
 	pte_ptl = pte_lockptr(mm, pmd);
 
-	mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, NULL, mm, address,
+	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, NULL, mm, address,
 				address + HPAGE_PMD_SIZE);
 	mmu_notifier_invalidate_range_start(&range);
 	pmd_ptl = pmd_lock(mm, pmd); /* probably unnecessary */
diff --git a/mm/ksm.c b/mm/ksm.c
index 6b27c4f0fb1f..97757c5fa15f 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -1051,7 +1051,7 @@ static int write_protect_page(struct vm_area_struct *vma, struct page *page,
 
 	BUG_ON(PageTransCompound(page));
 
-	mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, vma, mm,
+	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, vma, mm,
 				pvmw.address,
 				pvmw.address + PAGE_SIZE);
 	mmu_notifier_invalidate_range_start(&range);
@@ -1140,7 +1140,7 @@ static int replace_page(struct vm_area_struct *vma, struct page *page,
 	if (!pmd)
 		goto out;
 
-	mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, vma, mm, addr,
+	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, vma, mm, addr,
 				addr + PAGE_SIZE);
 	mmu_notifier_invalidate_range_start(&range);
 
diff --git a/mm/madvise.c b/mm/madvise.c
index 04446dabba56..a2f91bd286d9 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -472,7 +472,7 @@ static int madvise_free_single_vma(struct vm_area_struct *vma,
 	range.end = min(vma->vm_end, end_addr);
 	if (range.end <= vma->vm_start)
 		return -EINVAL;
-	mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, vma, mm,
+	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, vma, mm,
 				range.start, range.end);
 
 	lru_add_drain();
diff --git a/mm/memory.c b/mm/memory.c
index d9b7c935e812..a8c6922526f6 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1009,8 +1009,8 @@ int copy_page_range(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 	is_cow = is_cow_mapping(vma->vm_flags);
 
 	if (is_cow) {
-		mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, vma, src_mm,
-					addr, end);
+		mmu_notifier_range_init(&range, MMU_NOTIFY_PROTECTION_PAGE,
+					vma, src_mm, addr, end);
 		mmu_notifier_invalidate_range_start(&range);
 	}
 
@@ -1334,7 +1334,7 @@ void unmap_vmas(struct mmu_gather *tlb,
 {
 	struct mmu_notifier_range range;
 
-	mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, vma, vma->vm_mm,
+	mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, NULL, vma->vm_mm,
 				start_addr, end_addr);
 	mmu_notifier_invalidate_range_start(&range);
 	for ( ; vma && vma->vm_start < end_addr; vma = vma->vm_next)
@@ -1357,7 +1357,7 @@ void zap_page_range(struct vm_area_struct *vma, unsigned long start,
 	struct mmu_gather tlb;
 
 	lru_add_drain();
-	mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, vma, vma->vm_mm,
+	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, vma, vma->vm_mm,
 				start, start + size);
 	tlb_gather_mmu(&tlb, vma->vm_mm, start, range.end);
 	update_hiwater_rss(vma->vm_mm);
@@ -1384,7 +1384,7 @@ static void zap_page_range_single(struct vm_area_struct *vma, unsigned long addr
 	struct mmu_gather tlb;
 
 	lru_add_drain();
-	mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, vma, vma->vm_mm,
+	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, vma, vma->vm_mm,
 				address, address + size);
 	tlb_gather_mmu(&tlb, vma->vm_mm, address, range.end);
 	update_hiwater_rss(vma->vm_mm);
@@ -2275,7 +2275,7 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
 
 	__SetPageUptodate(new_page);
 
-	mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, vma, mm,
+	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, vma, mm,
 				vmf->address & PAGE_MASK,
 				(vmf->address & PAGE_MASK) + PAGE_SIZE);
 	mmu_notifier_invalidate_range_start(&range);
@@ -4086,7 +4086,7 @@ static int __follow_pte_pmd(struct mm_struct *mm, unsigned long address,
 			goto out;
 
 		if (range) {
-			mmu_notifier_range_init(range, MMU_NOTIFY_UNMAP, NULL,
+			mmu_notifier_range_init(range, MMU_NOTIFY_CLEAR, NULL,
 						mm, address & PMD_MASK,
 						(address & PMD_MASK) + PMD_SIZE);
 			mmu_notifier_invalidate_range_start(range);
@@ -4105,7 +4105,7 @@ static int __follow_pte_pmd(struct mm_struct *mm, unsigned long address,
 		goto out;
 
 	if (range) {
-		mmu_notifier_range_init(range, MMU_NOTIFY_UNMAP, NULL, mm,
+		mmu_notifier_range_init(range, MMU_NOTIFY_CLEAR, NULL, mm,
 					address & PAGE_MASK,
 					(address & PAGE_MASK) + PAGE_SIZE);
 		mmu_notifier_invalidate_range_start(range);
diff --git a/mm/migrate.c b/mm/migrate.c
index 385c59d5c28d..384cde91d886 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2342,7 +2342,7 @@ static void migrate_vma_collect(struct migrate_vma *migrate)
 	mm_walk.mm = migrate->vma->vm_mm;
 	mm_walk.private = migrate;
 
-	mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, NULL, mm_walk.mm,
+	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, NULL, mm_walk.mm,
 				migrate->start,
 				migrate->end);
 	mmu_notifier_invalidate_range_start(&range);
@@ -2751,7 +2751,7 @@ static void migrate_vma_pages(struct migrate_vma *migrate)
 				notified = true;
 
 				mmu_notifier_range_init(&range,
-							MMU_NOTIFY_UNMAP,
+							MMU_NOTIFY_CLEAR,
 							NULL,
 							migrate->vma->vm_mm,
 							addr, migrate->end);
diff --git a/mm/mprotect.c b/mm/mprotect.c
index b22e660701f1..c19e8bdc2648 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -185,8 +185,9 @@ static inline unsigned long change_pmd_range(struct vm_area_struct *vma,
 
 		/* invoke the mmu notifier if the pmd is populated */
 		if (!range.start) {
-			mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, vma,
-						vma->vm_mm, addr, end);
+			mmu_notifier_range_init(&range,
+						MMU_NOTIFY_PROTECTION_VMA,
+						vma, vma->vm_mm, addr, end);
 			mmu_notifier_invalidate_range_start(&range);
 		}
 
diff --git a/mm/rmap.c b/mm/rmap.c
index 49c75f0c6c33..81ea84ab511a 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -896,8 +896,8 @@ static bool page_mkclean_one(struct page *page, struct vm_area_struct *vma,
 	 * We have to assume the worse case ie pmd for invalidation. Note that
 	 * the page can not be free from this function.
 	 */
-	mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, vma, vma->vm_mm,
-				address,
+	mmu_notifier_range_init(&range, MMU_NOTIFY_PROTECTION_PAGE, vma,
+				vma->vm_mm, address,
 				min(vma->vm_end, address +
 				    (PAGE_SIZE << compound_order(page))));
 	mmu_notifier_invalidate_range_start(&range);
@@ -1372,7 +1372,7 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
 	 * Note that the page can not be free in this function as call of
 	 * try_to_unmap() must hold a reference on the page.
 	 */
-	mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, vma, vma->vm_mm,
+	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, vma, vma->vm_mm,
 				address,
 				min(vma->vm_end, address +
 				    (PAGE_SIZE << compound_order(page))));
-- 
2.17.2


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v4 4/9] mm/mmu_notifier: pass down vma and reasons why mmu notifier is happening
  2019-01-23 22:23 [PATCH v4 0/9] mmu notifier provide context informations jglisse
                   ` (2 preceding siblings ...)
  2019-01-23 22:23 ` [PATCH v4 3/9] mm/mmu_notifier: use correct mmu_notifier events for each invalidation jglisse
@ 2019-01-23 22:23 ` jglisse
  2019-01-23 22:23 ` [PATCH v4 5/9] mm/mmu_notifier: mmu_notifier_range_update_to_read_only() helper jglisse
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 24+ messages in thread
From: jglisse @ 2019-01-23 22:23 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, linux-kernel, Jérôme Glisse,
	Christian König, Jan Kara, Felix Kuehling, Jason Gunthorpe,
	Matthew Wilcox, Ross Zwisler, Dan Williams, Paolo Bonzini,
	Radim Krčmář,
	Michal Hocko, Ralph Campbell, John Hubbard, kvm, dri-devel,
	linux-rdma, linux-fsdevel, Arnd Bergmann

From: Jérôme Glisse <jglisse@redhat.com>

CPU page table update can happens for many reasons, not only as a result
of a syscall (munmap(), mprotect(), mremap(), madvise(), ...) but also
as a result of kernel activities (memory compression, reclaim, migration,
...).

Users of mmu notifier API track changes to the CPU page table and take
specific action for them. While current API only provide range of virtual
address affected by the change, not why the changes is happening

This patch is just passing down the new informations by adding it to the
mmu_notifier_range structure.

Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Ross Zwisler <zwisler@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: kvm@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
Cc: linux-rdma@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org
Cc: Arnd Bergmann <arnd@arndb.de>
---
 include/linux/mmu_notifier.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
index a9808add4070..7514775817de 100644
--- a/include/linux/mmu_notifier.h
+++ b/include/linux/mmu_notifier.h
@@ -56,9 +56,11 @@ struct mmu_notifier_mm {
 };
 
 struct mmu_notifier_range {
+	struct vm_area_struct *vma;
 	struct mm_struct *mm;
 	unsigned long start;
 	unsigned long end;
+	enum mmu_notifier_event event;
 	bool blockable;
 };
 
@@ -354,6 +356,8 @@ static inline void mmu_notifier_range_init(struct mmu_notifier_range *range,
 					   unsigned long start,
 					   unsigned long end)
 {
+	range->vma = vma;
+	range->event = event;
 	range->mm = mm;
 	range->start = start;
 	range->end = end;
-- 
2.17.2


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v4 5/9] mm/mmu_notifier: mmu_notifier_range_update_to_read_only() helper
  2019-01-23 22:23 [PATCH v4 0/9] mmu notifier provide context informations jglisse
                   ` (3 preceding siblings ...)
  2019-01-23 22:23 ` [PATCH v4 4/9] mm/mmu_notifier: pass down vma and reasons why mmu notifier is happening jglisse
@ 2019-01-23 22:23 ` jglisse
  2019-01-23 22:23 ` [PATCH v4 6/9] gpu/drm/radeon: optimize out the case when a range is updated to read only jglisse
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 24+ messages in thread
From: jglisse @ 2019-01-23 22:23 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, linux-kernel, Jérôme Glisse,
	Christian König, Jan Kara, Felix Kuehling, Jason Gunthorpe,
	Matthew Wilcox, Ross Zwisler, Dan Williams, Paolo Bonzini,
	Radim Krčmář,
	Michal Hocko, Ralph Campbell, John Hubbard, kvm, dri-devel,
	linux-rdma, linux-fsdevel, Arnd Bergmann

From: Jérôme Glisse <jglisse@redhat.com>

Helper to test if a range is updated to read only (it is still valid
to read from the range). This is useful for device driver or anyone
who wish to optimize out update when they know that they already have
the range map read only.

Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Ross Zwisler <zwisler@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: kvm@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
Cc: linux-rdma@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org
Cc: Arnd Bergmann <arnd@arndb.de>
---
 include/linux/mmu_notifier.h |  4 ++++
 mm/mmu_notifier.c            | 10 ++++++++++
 2 files changed, 14 insertions(+)

diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
index 7514775817de..be873c431886 100644
--- a/include/linux/mmu_notifier.h
+++ b/include/linux/mmu_notifier.h
@@ -257,6 +257,8 @@ extern void __mmu_notifier_invalidate_range_end(struct mmu_notifier_range *r,
 				  bool only_end);
 extern void __mmu_notifier_invalidate_range(struct mm_struct *mm,
 				  unsigned long start, unsigned long end);
+extern bool
+mmu_notifier_range_update_to_read_only(const struct mmu_notifier_range *range);
 
 static inline void mmu_notifier_release(struct mm_struct *mm)
 {
@@ -553,6 +555,8 @@ static inline void mmu_notifier_mm_destroy(struct mm_struct *mm)
 {
 }
 
+#define mmu_notifier_range_update_to_read_only(r) false
+
 #define ptep_clear_flush_young_notify ptep_clear_flush_young
 #define pmdp_clear_flush_young_notify pmdp_clear_flush_young
 #define ptep_clear_young_notify ptep_test_and_clear_young
diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
index 9c884abc7850..0b2f77715a08 100644
--- a/mm/mmu_notifier.c
+++ b/mm/mmu_notifier.c
@@ -395,3 +395,13 @@ void mmu_notifier_unregister_no_release(struct mmu_notifier *mn,
 	mmdrop(mm);
 }
 EXPORT_SYMBOL_GPL(mmu_notifier_unregister_no_release);
+
+bool
+mmu_notifier_range_update_to_read_only(const struct mmu_notifier_range *range)
+{
+	if (!range->vma || range->event != MMU_NOTIFY_PROTECTION_VMA)
+		return false;
+	/* Return true if the vma still have the read flag set. */
+	return range->vma->vm_flags & VM_READ;
+}
+EXPORT_SYMBOL_GPL(mmu_notifier_range_update_to_read_only);
-- 
2.17.2


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v4 6/9] gpu/drm/radeon: optimize out the case when a range is updated to read only
  2019-01-23 22:23 [PATCH v4 0/9] mmu notifier provide context informations jglisse
                   ` (4 preceding siblings ...)
  2019-01-23 22:23 ` [PATCH v4 5/9] mm/mmu_notifier: mmu_notifier_range_update_to_read_only() helper jglisse
@ 2019-01-23 22:23 ` jglisse
  2019-01-23 22:23 ` [PATCH v4 7/9] gpu/drm/amdgpu: " jglisse
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 24+ messages in thread
From: jglisse @ 2019-01-23 22:23 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, linux-kernel, Jérôme Glisse,
	Christian König, Jan Kara, Felix Kuehling, Jason Gunthorpe,
	Matthew Wilcox, Ross Zwisler, Dan Williams, Paolo Bonzini,
	Radim Krčmář,
	Michal Hocko, Ralph Campbell, John Hubbard, kvm, dri-devel,
	linux-rdma, linux-fsdevel, Arnd Bergmann

From: Jérôme Glisse <jglisse@redhat.com>

When range of virtual address is updated read only and corresponding
user ptr object are already read only it is pointless to do anything.
Optimize this case out.

Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Ross Zwisler <zwisler@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: kvm@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
Cc: linux-rdma@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org
Cc: Arnd Bergmann <arnd@arndb.de>
---
 drivers/gpu/drm/radeon/radeon_mn.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/drivers/gpu/drm/radeon/radeon_mn.c b/drivers/gpu/drm/radeon/radeon_mn.c
index b3019505065a..f77294f58e63 100644
--- a/drivers/gpu/drm/radeon/radeon_mn.c
+++ b/drivers/gpu/drm/radeon/radeon_mn.c
@@ -124,6 +124,7 @@ static int radeon_mn_invalidate_range_start(struct mmu_notifier *mn,
 	struct radeon_mn *rmn = container_of(mn, struct radeon_mn, mn);
 	struct ttm_operation_ctx ctx = { false, false };
 	struct interval_tree_node *it;
+	bool update_to_read_only;
 	unsigned long end;
 	int ret = 0;
 
@@ -138,6 +139,8 @@ static int radeon_mn_invalidate_range_start(struct mmu_notifier *mn,
 	else if (!mutex_trylock(&rmn->lock))
 		return -EAGAIN;
 
+	update_to_read_only = mmu_notifier_range_update_to_read_only(range);
+
 	it = interval_tree_iter_first(&rmn->objects, range->start, end);
 	while (it) {
 		struct radeon_mn_node *node;
@@ -153,10 +156,20 @@ static int radeon_mn_invalidate_range_start(struct mmu_notifier *mn,
 		it = interval_tree_iter_next(it, range->start, end);
 
 		list_for_each_entry(bo, &node->bos, mn_list) {
+			bool read_only;
 
 			if (!bo->tbo.ttm || bo->tbo.ttm->state != tt_bound)
 				continue;
 
+			/*
+			 * If it is already read only and we are updating to
+			 * read only then we do not need to change anything.
+			 * So save time and skip this one.
+			 */
+			read_only = radeon_ttm_tt_is_readonly(bo->tbo.ttm);
+			if (update_to_read_only && read_only)
+				continue;
+
 			r = radeon_bo_reserve(bo, true);
 			if (r) {
 				DRM_ERROR("(%ld) failed to reserve user bo\n", r);
-- 
2.17.2


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v4 7/9] gpu/drm/amdgpu: optimize out the case when a range is updated to read only
  2019-01-23 22:23 [PATCH v4 0/9] mmu notifier provide context informations jglisse
                   ` (5 preceding siblings ...)
  2019-01-23 22:23 ` [PATCH v4 6/9] gpu/drm/radeon: optimize out the case when a range is updated to read only jglisse
@ 2019-01-23 22:23 ` " jglisse
  2019-01-23 22:23 ` [PATCH v4 8/9] gpu/drm/i915: " jglisse
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 24+ messages in thread
From: jglisse @ 2019-01-23 22:23 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, linux-kernel, Jérôme Glisse,
	Christian König, Jan Kara, Felix Kuehling, Jason Gunthorpe,
	Matthew Wilcox, Ross Zwisler, Dan Williams, Paolo Bonzini,
	Radim Krčmář,
	Michal Hocko, Ralph Campbell, John Hubbard, kvm, dri-devel,
	linux-rdma, linux-fsdevel, Arnd Bergmann

From: Jérôme Glisse <jglisse@redhat.com>

When range of virtual address is updated read only and corresponding
user ptr object are already read only it is pointless to do anything.
Optimize this case out.

Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Ross Zwisler <zwisler@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: kvm@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
Cc: linux-rdma@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org
Cc: Arnd Bergmann <arnd@arndb.de>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
index 3e6823fdd939..7880eda064cd 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
@@ -294,6 +294,7 @@ static int amdgpu_mn_invalidate_range_start_hsa(struct mmu_notifier *mn,
 {
 	struct amdgpu_mn *amn = container_of(mn, struct amdgpu_mn, mn);
 	struct interval_tree_node *it;
+	bool update_to_read_only;
 	unsigned long end;
 
 	/* notification is exclusive, but interval is inclusive */
@@ -302,6 +303,8 @@ static int amdgpu_mn_invalidate_range_start_hsa(struct mmu_notifier *mn,
 	if (amdgpu_mn_read_lock(amn, range->blockable))
 		return -EAGAIN;
 
+	update_to_read_only = mmu_notifier_range_update_to_read_only(range);
+
 	it = interval_tree_iter_first(&amn->objects, range->start, end);
 	while (it) {
 		struct amdgpu_mn_node *node;
@@ -317,6 +320,16 @@ static int amdgpu_mn_invalidate_range_start_hsa(struct mmu_notifier *mn,
 
 		list_for_each_entry(bo, &node->bos, mn_list) {
 			struct kgd_mem *mem = bo->kfd_bo;
+			bool read_only;
+
+			/*
+			 * If it is already read only and we are updating to
+			 * read only then we do not need to change anything.
+			 * So save time and skip this one.
+			 */
+			read_only = amdgpu_ttm_tt_is_readonly(bo->tbo.ttm);
+			if (update_to_read_only && read_only)
+				continue;
 
 			if (amdgpu_ttm_tt_affect_userptr(bo->tbo.ttm,
 							 range->start,
-- 
2.17.2


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v4 8/9] gpu/drm/i915: optimize out the case when a range is updated to read only
  2019-01-23 22:23 [PATCH v4 0/9] mmu notifier provide context informations jglisse
                   ` (6 preceding siblings ...)
  2019-01-23 22:23 ` [PATCH v4 7/9] gpu/drm/amdgpu: " jglisse
@ 2019-01-23 22:23 ` " jglisse
  2019-01-24 12:09   ` Joonas Lahtinen
  2019-01-23 22:23 ` [PATCH v4 9/9] RDMA/umem_odp: " jglisse
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 24+ messages in thread
From: jglisse @ 2019-01-23 22:23 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, linux-kernel, Jérôme Glisse,
	Christian König, Jan Kara, Felix Kuehling, Jason Gunthorpe,
	Matthew Wilcox, Ross Zwisler, Dan Williams, Paolo Bonzini,
	Radim Krčmář,
	Michal Hocko, Ralph Campbell, John Hubbard, kvm, dri-devel,
	linux-rdma, linux-fsdevel, Arnd Bergmann

From: Jérôme Glisse <jglisse@redhat.com>

When range of virtual address is updated read only and corresponding
user ptr object are already read only it is pointless to do anything.
Optimize this case out.

Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Ross Zwisler <zwisler@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: kvm@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
Cc: linux-rdma@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org
Cc: Arnd Bergmann <arnd@arndb.de>
---
 drivers/gpu/drm/i915/i915_gem_userptr.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c
index 9558582c105e..23330ac3d7ea 100644
--- a/drivers/gpu/drm/i915/i915_gem_userptr.c
+++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
@@ -59,6 +59,7 @@ struct i915_mmu_object {
 	struct interval_tree_node it;
 	struct list_head link;
 	struct work_struct work;
+	bool read_only;
 	bool attached;
 };
 
@@ -119,6 +120,7 @@ static int i915_gem_userptr_mn_invalidate_range_start(struct mmu_notifier *_mn,
 		container_of(_mn, struct i915_mmu_notifier, mn);
 	struct i915_mmu_object *mo;
 	struct interval_tree_node *it;
+	bool update_to_read_only;
 	LIST_HEAD(cancelled);
 	unsigned long end;
 
@@ -128,6 +130,8 @@ static int i915_gem_userptr_mn_invalidate_range_start(struct mmu_notifier *_mn,
 	/* interval ranges are inclusive, but invalidate range is exclusive */
 	end = range->end - 1;
 
+	update_to_read_only = mmu_notifier_range_update_to_read_only(range);
+
 	spin_lock(&mn->lock);
 	it = interval_tree_iter_first(&mn->objects, range->start, end);
 	while (it) {
@@ -145,6 +149,17 @@ static int i915_gem_userptr_mn_invalidate_range_start(struct mmu_notifier *_mn,
 		 * object if it is not in the process of being destroyed.
 		 */
 		mo = container_of(it, struct i915_mmu_object, it);
+
+		/*
+		 * If it is already read only and we are updating to
+		 * read only then we do not need to change anything.
+		 * So save time and skip this one.
+		 */
+		if (update_to_read_only && mo->read_only) {
+			it = interval_tree_iter_next(it, range->start, end);
+			continue;
+		}
+
 		if (kref_get_unless_zero(&mo->obj->base.refcount))
 			queue_work(mn->wq, &mo->work);
 
@@ -270,6 +285,7 @@ i915_gem_userptr_init__mmu_notifier(struct drm_i915_gem_object *obj,
 	mo->mn = mn;
 	mo->obj = obj;
 	mo->it.start = obj->userptr.ptr;
+	mo->read_only = i915_gem_object_is_readonly(obj);
 	mo->it.last = obj->userptr.ptr + obj->base.size - 1;
 	INIT_WORK(&mo->work, cancel_userptr);
 
-- 
2.17.2


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v4 9/9] RDMA/umem_odp: optimize out the case when a range is updated to read only
  2019-01-23 22:23 [PATCH v4 0/9] mmu notifier provide context informations jglisse
                   ` (7 preceding siblings ...)
  2019-01-23 22:23 ` [PATCH v4 8/9] gpu/drm/i915: " jglisse
@ 2019-01-23 22:23 ` " jglisse
  2019-01-23 22:32   ` Jason Gunthorpe
  2019-01-23 22:54 ` [PATCH v4 0/9] mmu notifier provide context informations Dan Williams
  2019-01-31 16:10 ` Jerome Glisse
  10 siblings, 1 reply; 24+ messages in thread
From: jglisse @ 2019-01-23 22:23 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, linux-kernel, Jérôme Glisse,
	Christian König, Jan Kara, Felix Kuehling, Jason Gunthorpe,
	Matthew Wilcox, Ross Zwisler, Dan Williams, Paolo Bonzini,
	Radim Krčmář,
	Michal Hocko, Ralph Campbell, John Hubbard, kvm, dri-devel,
	linux-rdma, linux-fsdevel, Arnd Bergmann

From: Jérôme Glisse <jglisse@redhat.com>

When range of virtual address is updated read only and corresponding
user ptr object are already read only it is pointless to do anything.
Optimize this case out.

Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Ross Zwisler <zwisler@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: kvm@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
Cc: linux-rdma@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org
Cc: Arnd Bergmann <arnd@arndb.de>
---
 drivers/infiniband/core/umem_odp.c | 22 +++++++++++++++++++---
 include/rdma/ib_umem_odp.h         |  1 +
 2 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/core/umem_odp.c b/drivers/infiniband/core/umem_odp.c
index a4ec43093cb3..fa4e7fdcabfc 100644
--- a/drivers/infiniband/core/umem_odp.c
+++ b/drivers/infiniband/core/umem_odp.c
@@ -140,8 +140,15 @@ static void ib_umem_notifier_release(struct mmu_notifier *mn,
 static int invalidate_range_start_trampoline(struct ib_umem_odp *item,
 					     u64 start, u64 end, void *cookie)
 {
+	bool update_to_read_only = *((bool *)cookie);
+
 	ib_umem_notifier_start_account(item);
-	item->umem.context->invalidate_range(item, start, end);
+	/*
+	 * If it is already read only and we are updating to read only then we
+	 * do not need to change anything. So save time and skip this one.
+	 */
+	if (!update_to_read_only || !item->read_only)
+		item->umem.context->invalidate_range(item, start, end);
 	return 0;
 }
 
@@ -150,6 +157,7 @@ static int ib_umem_notifier_invalidate_range_start(struct mmu_notifier *mn,
 {
 	struct ib_ucontext_per_mm *per_mm =
 		container_of(mn, struct ib_ucontext_per_mm, mn);
+	bool update_to_read_only;
 
 	if (range->blockable)
 		down_read(&per_mm->umem_rwsem);
@@ -166,10 +174,13 @@ static int ib_umem_notifier_invalidate_range_start(struct mmu_notifier *mn,
 		return 0;
 	}
 
+	update_to_read_only = mmu_notifier_range_update_to_read_only(range);
+
 	return rbt_ib_umem_for_each_in_range(&per_mm->umem_tree, range->start,
 					     range->end,
 					     invalidate_range_start_trampoline,
-					     range->blockable, NULL);
+					     range->blockable,
+					     &update_to_read_only);
 }
 
 static int invalidate_range_end_trampoline(struct ib_umem_odp *item, u64 start,
@@ -363,6 +374,9 @@ struct ib_umem_odp *ib_alloc_odp_umem(struct ib_ucontext_per_mm *per_mm,
 		goto out_odp_data;
 	}
 
+	/* Assume read only at first, each time GUP is call this is updated. */
+	odp_data->read_only = true;
+
 	odp_data->dma_list =
 		vzalloc(array_size(pages, sizeof(*odp_data->dma_list)));
 	if (!odp_data->dma_list) {
@@ -619,8 +633,10 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp *umem_odp, u64 user_virt,
 		goto out_put_task;
 	}
 
-	if (access_mask & ODP_WRITE_ALLOWED_BIT)
+	if (access_mask & ODP_WRITE_ALLOWED_BIT) {
+		umem_odp->read_only = false;
 		flags |= FOLL_WRITE;
+	}
 
 	start_idx = (user_virt - ib_umem_start(umem)) >> page_shift;
 	k = start_idx;
diff --git a/include/rdma/ib_umem_odp.h b/include/rdma/ib_umem_odp.h
index 0b1446fe2fab..8256668c6170 100644
--- a/include/rdma/ib_umem_odp.h
+++ b/include/rdma/ib_umem_odp.h
@@ -76,6 +76,7 @@ struct ib_umem_odp {
 	struct completion	notifier_completion;
 	int			dying;
 	struct work_struct	work;
+	bool read_only;
 };
 
 static inline struct ib_umem_odp *to_ib_umem_odp(struct ib_umem *umem)
-- 
2.17.2


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 9/9] RDMA/umem_odp: optimize out the case when a range is updated to read only
  2019-01-23 22:23 ` [PATCH v4 9/9] RDMA/umem_odp: " jglisse
@ 2019-01-23 22:32   ` Jason Gunthorpe
  2019-01-23 22:46     ` Jerome Glisse
  0 siblings, 1 reply; 24+ messages in thread
From: Jason Gunthorpe @ 2019-01-23 22:32 UTC (permalink / raw)
  To: jglisse
  Cc: linux-mm, Andrew Morton, linux-kernel, Christian König,
	Jan Kara, Felix Kuehling, Matthew Wilcox, Ross Zwisler,
	Dan Williams, Paolo Bonzini, Radim Krčmář,
	Michal Hocko, Ralph Campbell, John Hubbard, kvm, dri-devel,
	linux-rdma, linux-fsdevel, Arnd Bergmann

On Wed, Jan 23, 2019 at 05:23:15PM -0500, jglisse@redhat.com wrote:
> From: Jérôme Glisse <jglisse@redhat.com>
> 
> When range of virtual address is updated read only and corresponding
> user ptr object are already read only it is pointless to do anything.
> Optimize this case out.
> 
> Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
> Cc: Christian König <christian.koenig@amd.com>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Felix Kuehling <Felix.Kuehling@amd.com>
> Cc: Jason Gunthorpe <jgg@mellanox.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Matthew Wilcox <mawilcox@microsoft.com>
> Cc: Ross Zwisler <zwisler@kernel.org>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Radim Krčmář <rkrcmar@redhat.com>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Ralph Campbell <rcampbell@nvidia.com>
> Cc: John Hubbard <jhubbard@nvidia.com>
> Cc: kvm@vger.kernel.org
> Cc: dri-devel@lists.freedesktop.org
> Cc: linux-rdma@vger.kernel.org
> Cc: linux-fsdevel@vger.kernel.org
> Cc: Arnd Bergmann <arnd@arndb.de>
>  drivers/infiniband/core/umem_odp.c | 22 +++++++++++++++++++---
>  include/rdma/ib_umem_odp.h         |  1 +
>  2 files changed, 20 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/infiniband/core/umem_odp.c b/drivers/infiniband/core/umem_odp.c
> index a4ec43093cb3..fa4e7fdcabfc 100644
> +++ b/drivers/infiniband/core/umem_odp.c
> @@ -140,8 +140,15 @@ static void ib_umem_notifier_release(struct mmu_notifier *mn,
>  static int invalidate_range_start_trampoline(struct ib_umem_odp *item,
>  					     u64 start, u64 end, void *cookie)
>  {
> +	bool update_to_read_only = *((bool *)cookie);
> +
>  	ib_umem_notifier_start_account(item);
> -	item->umem.context->invalidate_range(item, start, end);
> +	/*
> +	 * If it is already read only and we are updating to read only then we
> +	 * do not need to change anything. So save time and skip this one.
> +	 */
> +	if (!update_to_read_only || !item->read_only)
> +		item->umem.context->invalidate_range(item, start, end);
>  	return 0;
>  }
>  
> @@ -150,6 +157,7 @@ static int ib_umem_notifier_invalidate_range_start(struct mmu_notifier *mn,
>  {
>  	struct ib_ucontext_per_mm *per_mm =
>  		container_of(mn, struct ib_ucontext_per_mm, mn);
> +	bool update_to_read_only;
>  
>  	if (range->blockable)
>  		down_read(&per_mm->umem_rwsem);
> @@ -166,10 +174,13 @@ static int ib_umem_notifier_invalidate_range_start(struct mmu_notifier *mn,
>  		return 0;
>  	}
>  
> +	update_to_read_only = mmu_notifier_range_update_to_read_only(range);
> +
>  	return rbt_ib_umem_for_each_in_range(&per_mm->umem_tree, range->start,
>  					     range->end,
>  					     invalidate_range_start_trampoline,
> -					     range->blockable, NULL);
> +					     range->blockable,
> +					     &update_to_read_only);
>  }
>  
>  static int invalidate_range_end_trampoline(struct ib_umem_odp *item, u64 start,
> @@ -363,6 +374,9 @@ struct ib_umem_odp *ib_alloc_odp_umem(struct ib_ucontext_per_mm *per_mm,
>  		goto out_odp_data;
>  	}
>  
> +	/* Assume read only at first, each time GUP is call this is updated. */
> +	odp_data->read_only = true;
> +
>  	odp_data->dma_list =
>  		vzalloc(array_size(pages, sizeof(*odp_data->dma_list)));
>  	if (!odp_data->dma_list) {
> @@ -619,8 +633,10 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp *umem_odp, u64 user_virt,
>  		goto out_put_task;
>  	}
>  
> -	if (access_mask & ODP_WRITE_ALLOWED_BIT)
> +	if (access_mask & ODP_WRITE_ALLOWED_BIT) {
> +		umem_odp->read_only = false;

No locking?

>  		flags |= FOLL_WRITE;
> +	}
>  
>  	start_idx = (user_virt - ib_umem_start(umem)) >> page_shift;
>  	k = start_idx;
> diff --git a/include/rdma/ib_umem_odp.h b/include/rdma/ib_umem_odp.h
> index 0b1446fe2fab..8256668c6170 100644
> +++ b/include/rdma/ib_umem_odp.h
> @@ -76,6 +76,7 @@ struct ib_umem_odp {
>  	struct completion	notifier_completion;
>  	int			dying;
>  	struct work_struct	work;
> +	bool read_only;
>  };

The ib_umem already has a writeable flag. This reflects if the user
asked for write permission to be granted.. The tracking here is if any
remote fault thus far has requested write, is this an important
difference to justify the new flag?

Jason

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 9/9] RDMA/umem_odp: optimize out the case when a range is updated to read only
  2019-01-23 22:32   ` Jason Gunthorpe
@ 2019-01-23 22:46     ` Jerome Glisse
  0 siblings, 0 replies; 24+ messages in thread
From: Jerome Glisse @ 2019-01-23 22:46 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-mm, Andrew Morton, linux-kernel, Christian König,
	Jan Kara, Felix Kuehling, Matthew Wilcox, Ross Zwisler,
	Dan Williams, Paolo Bonzini, Radim Krčmář,
	Michal Hocko, Ralph Campbell, John Hubbard, kvm, dri-devel,
	linux-rdma, linux-fsdevel, Arnd Bergmann

On Wed, Jan 23, 2019 at 10:32:00PM +0000, Jason Gunthorpe wrote:
> On Wed, Jan 23, 2019 at 05:23:15PM -0500, jglisse@redhat.com wrote:
> > From: Jérôme Glisse <jglisse@redhat.com>
> > 
> > When range of virtual address is updated read only and corresponding
> > user ptr object are already read only it is pointless to do anything.
> > Optimize this case out.
> > 
> > Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
> > Cc: Christian König <christian.koenig@amd.com>
> > Cc: Jan Kara <jack@suse.cz>
> > Cc: Felix Kuehling <Felix.Kuehling@amd.com>
> > Cc: Jason Gunthorpe <jgg@mellanox.com>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: Matthew Wilcox <mawilcox@microsoft.com>
> > Cc: Ross Zwisler <zwisler@kernel.org>
> > Cc: Dan Williams <dan.j.williams@intel.com>
> > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > Cc: Radim Krčmář <rkrcmar@redhat.com>
> > Cc: Michal Hocko <mhocko@kernel.org>
> > Cc: Ralph Campbell <rcampbell@nvidia.com>
> > Cc: John Hubbard <jhubbard@nvidia.com>
> > Cc: kvm@vger.kernel.org
> > Cc: dri-devel@lists.freedesktop.org
> > Cc: linux-rdma@vger.kernel.org
> > Cc: linux-fsdevel@vger.kernel.org
> > Cc: Arnd Bergmann <arnd@arndb.de>
> >  drivers/infiniband/core/umem_odp.c | 22 +++++++++++++++++++---
> >  include/rdma/ib_umem_odp.h         |  1 +
> >  2 files changed, 20 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/infiniband/core/umem_odp.c b/drivers/infiniband/core/umem_odp.c
> > index a4ec43093cb3..fa4e7fdcabfc 100644
> > +++ b/drivers/infiniband/core/umem_odp.c
> > @@ -140,8 +140,15 @@ static void ib_umem_notifier_release(struct mmu_notifier *mn,
> >  static int invalidate_range_start_trampoline(struct ib_umem_odp *item,
> >  					     u64 start, u64 end, void *cookie)
> >  {
> > +	bool update_to_read_only = *((bool *)cookie);
> > +
> >  	ib_umem_notifier_start_account(item);
> > -	item->umem.context->invalidate_range(item, start, end);
> > +	/*
> > +	 * If it is already read only and we are updating to read only then we
> > +	 * do not need to change anything. So save time and skip this one.
> > +	 */
> > +	if (!update_to_read_only || !item->read_only)
> > +		item->umem.context->invalidate_range(item, start, end);
> >  	return 0;
> >  }
> >  
> > @@ -150,6 +157,7 @@ static int ib_umem_notifier_invalidate_range_start(struct mmu_notifier *mn,
> >  {
> >  	struct ib_ucontext_per_mm *per_mm =
> >  		container_of(mn, struct ib_ucontext_per_mm, mn);
> > +	bool update_to_read_only;
> >  
> >  	if (range->blockable)
> >  		down_read(&per_mm->umem_rwsem);
> > @@ -166,10 +174,13 @@ static int ib_umem_notifier_invalidate_range_start(struct mmu_notifier *mn,
> >  		return 0;
> >  	}
> >  
> > +	update_to_read_only = mmu_notifier_range_update_to_read_only(range);
> > +
> >  	return rbt_ib_umem_for_each_in_range(&per_mm->umem_tree, range->start,
> >  					     range->end,
> >  					     invalidate_range_start_trampoline,
> > -					     range->blockable, NULL);
> > +					     range->blockable,
> > +					     &update_to_read_only);
> >  }
> >  
> >  static int invalidate_range_end_trampoline(struct ib_umem_odp *item, u64 start,
> > @@ -363,6 +374,9 @@ struct ib_umem_odp *ib_alloc_odp_umem(struct ib_ucontext_per_mm *per_mm,
> >  		goto out_odp_data;
> >  	}
> >  
> > +	/* Assume read only at first, each time GUP is call this is updated. */
> > +	odp_data->read_only = true;
> > +
> >  	odp_data->dma_list =
> >  		vzalloc(array_size(pages, sizeof(*odp_data->dma_list)));
> >  	if (!odp_data->dma_list) {
> > @@ -619,8 +633,10 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp *umem_odp, u64 user_virt,
> >  		goto out_put_task;
> >  	}
> >  
> > -	if (access_mask & ODP_WRITE_ALLOWED_BIT)
> > +	if (access_mask & ODP_WRITE_ALLOWED_BIT) {
> > +		umem_odp->read_only = false;
> 
> No locking?

The mmu notitfier exclusion will ensure that it is not missed
ie it will be false before any mmu notifier might be call on
page GUPed with write flag which is what matter here. So lock
are useless here.

> 
> >  		flags |= FOLL_WRITE;
> > +	}
> >  
> >  	start_idx = (user_virt - ib_umem_start(umem)) >> page_shift;
> >  	k = start_idx;
> > diff --git a/include/rdma/ib_umem_odp.h b/include/rdma/ib_umem_odp.h
> > index 0b1446fe2fab..8256668c6170 100644
> > +++ b/include/rdma/ib_umem_odp.h
> > @@ -76,6 +76,7 @@ struct ib_umem_odp {
> >  	struct completion	notifier_completion;
> >  	int			dying;
> >  	struct work_struct	work;
> > +	bool read_only;
> >  };
> 
> The ib_umem already has a writeable flag. This reflects if the user
> asked for write permission to be granted.. The tracking here is if any
> remote fault thus far has requested write, is this an important
> difference to justify the new flag?

I did that patch couple week ago and now i do not remember why
i did not use that, i remember thinking about it ... damm i need
to keep better notes. I will review the code again.

Cheers,
Jérôme

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 0/9] mmu notifier provide context informations
  2019-01-23 22:23 [PATCH v4 0/9] mmu notifier provide context informations jglisse
                   ` (8 preceding siblings ...)
  2019-01-23 22:23 ` [PATCH v4 9/9] RDMA/umem_odp: " jglisse
@ 2019-01-23 22:54 ` Dan Williams
  2019-01-23 23:04   ` Jerome Glisse
  2019-01-31 16:10 ` Jerome Glisse
  10 siblings, 1 reply; 24+ messages in thread
From: Dan Williams @ 2019-01-23 22:54 UTC (permalink / raw)
  To: Jérôme Glisse
  Cc: Linux MM, Andrew Morton, Linux Kernel Mailing List,
	Christian König, Jan Kara, Felix Kuehling, Jason Gunthorpe,
	Matthew Wilcox, Ross Zwisler, Paolo Bonzini,
	Radim Krčmář,
	Michal Hocko, Ralph Campbell, John Hubbard, KVM list,
	Maling list - DRI developers, linux-rdma, linux-fsdevel,
	Arnd Bergmann

On Wed, Jan 23, 2019 at 2:23 PM <jglisse@redhat.com> wrote:
>
> From: Jérôme Glisse <jglisse@redhat.com>
>
> Hi Andrew, i see that you still have my event patch in you queue [1].
> This patchset replace that single patch and is broken down in further
> step so that it is easier to review and ascertain that no mistake were
> made during mechanical changes. Here are the step:
>
>     Patch 1 - add the enum values
>     Patch 2 - coccinelle semantic patch to convert all call site of
>               mmu_notifier_range_init to default enum value and also
>               to passing down the vma when it is available
>     Patch 3 - update many call site to more accurate enum values
>     Patch 4 - add the information to the mmu_notifier_range struct
>     Patch 5 - helper to test if a range is updated to read only
>
> All the remaining patches are update to various driver to demonstrate
> how this new information get use by device driver. I build tested
> with make all and make all minus everything that enable mmu notifier
> ie building with MMU_NOTIFIER=no. Also tested with some radeon,amd
> gpu and intel gpu.
>
> If they are no objections i believe best plan would be to merge the
> the first 5 patches (all mm changes) through your queue for 5.1 and
> then to delay driver update to each individual driver tree for 5.2.
> This will allow each individual device driver maintainer time to more
> thouroughly test this more then my own testing.
>
> Note that i also intend to use this feature further in nouveau and
> HMM down the road. I also expect that other user like KVM might be
> interested into leveraging this new information to optimize some of
> there secondary page table invalidation.

"Down the road" users should introduce the functionality they want to
consume. The common concern with preemptively including
forward-looking infrastructure is realizing later that the
infrastructure is not needed, or needs changing. If it has no current
consumer, leave it out.

>
> Here is an explaination on the rational for this patchset:
>
>
> CPU page table update can happens for many reasons, not only as a result
> of a syscall (munmap(), mprotect(), mremap(), madvise(), ...) but also
> as a result of kernel activities (memory compression, reclaim, migration,
> ...).
>
> This patch introduce a set of enums that can be associated with each of
> the events triggering a mmu notifier. Latter patches take advantages of
> those enum values.
>
> - UNMAP: munmap() or mremap()
> - CLEAR: page table is cleared (migration, compaction, reclaim, ...)
> - PROTECTION_VMA: change in access protections for the range
> - PROTECTION_PAGE: change in access protections for page in the range
> - SOFT_DIRTY: soft dirtyness tracking
>
> Being able to identify munmap() and mremap() from other reasons why the
> page table is cleared is important to allow user of mmu notifier to
> update their own internal tracking structure accordingly (on munmap or
> mremap it is not longer needed to track range of virtual address as it
> becomes invalid).

The only context information consumed in this patch set is
MMU_NOTIFY_PROTECTION_VMA.

What is the practical benefit of these "optimize out the case when a
range is updated to read only" optimizations? Any numbers to show this
is worth the code thrash?

>
> [1] https://www.ozlabs.org/~akpm/mmotm/broken-out/mm-mmu_notifier-contextual-information-for-event-triggering-invalidation-v2.patch
>
> Cc: Christian König <christian.koenig@amd.com>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Felix Kuehling <Felix.Kuehling@amd.com>
> Cc: Jason Gunthorpe <jgg@mellanox.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Matthew Wilcox <mawilcox@microsoft.com>
> Cc: Ross Zwisler <zwisler@kernel.org>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Radim Krčmář <rkrcmar@redhat.com>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Ralph Campbell <rcampbell@nvidia.com>
> Cc: John Hubbard <jhubbard@nvidia.com>
> Cc: kvm@vger.kernel.org
> Cc: dri-devel@lists.freedesktop.org
> Cc: linux-rdma@vger.kernel.org
> Cc: linux-fsdevel@vger.kernel.org
> Cc: Arnd Bergmann <arnd@arndb.de>
>
> Jérôme Glisse (9):
>   mm/mmu_notifier: contextual information for event enums
>   mm/mmu_notifier: contextual information for event triggering
>     invalidation
>   mm/mmu_notifier: use correct mmu_notifier events for each invalidation
>   mm/mmu_notifier: pass down vma and reasons why mmu notifier is
>     happening
>   mm/mmu_notifier: mmu_notifier_range_update_to_read_only() helper
>   gpu/drm/radeon: optimize out the case when a range is updated to read
>     only
>   gpu/drm/amdgpu: optimize out the case when a range is updated to read
>     only
>   gpu/drm/i915: optimize out the case when a range is updated to read
>     only
>   RDMA/umem_odp: optimize out the case when a range is updated to read
>     only
>
>  drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c  | 13 ++++++++
>  drivers/gpu/drm/i915/i915_gem_userptr.c | 16 ++++++++++
>  drivers/gpu/drm/radeon/radeon_mn.c      | 13 ++++++++
>  drivers/infiniband/core/umem_odp.c      | 22 +++++++++++--
>  fs/proc/task_mmu.c                      |  3 +-
>  include/linux/mmu_notifier.h            | 42 ++++++++++++++++++++++++-
>  include/rdma/ib_umem_odp.h              |  1 +
>  kernel/events/uprobes.c                 |  3 +-
>  mm/huge_memory.c                        | 14 +++++----
>  mm/hugetlb.c                            | 11 ++++---
>  mm/khugepaged.c                         |  3 +-
>  mm/ksm.c                                |  6 ++--
>  mm/madvise.c                            |  3 +-
>  mm/memory.c                             | 25 +++++++++------
>  mm/migrate.c                            |  5 ++-
>  mm/mmu_notifier.c                       | 10 ++++++
>  mm/mprotect.c                           |  4 ++-
>  mm/mremap.c                             |  3 +-
>  mm/oom_kill.c                           |  3 +-
>  mm/rmap.c                               |  6 ++--
>  20 files changed, 171 insertions(+), 35 deletions(-)
>
> --
> 2.17.2
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 0/9] mmu notifier provide context informations
  2019-01-23 22:54 ` [PATCH v4 0/9] mmu notifier provide context informations Dan Williams
@ 2019-01-23 23:04   ` Jerome Glisse
  2019-01-24  0:00     ` Dan Williams
  0 siblings, 1 reply; 24+ messages in thread
From: Jerome Glisse @ 2019-01-23 23:04 UTC (permalink / raw)
  To: Dan Williams
  Cc: Ralph Campbell, Jan Kara, Arnd Bergmann, KVM list,
	Matthew Wilcox, linux-rdma, John Hubbard, Felix Kuehling,
	Radim Krčmář,
	Linux Kernel Mailing List, Maling list - DRI developers,
	Michal Hocko, Linux MM, Jason Gunthorpe, Ross Zwisler,
	linux-fsdevel, Paolo Bonzini, Andrew Morton,
	Christian König

On Wed, Jan 23, 2019 at 02:54:40PM -0800, Dan Williams wrote:
> On Wed, Jan 23, 2019 at 2:23 PM <jglisse@redhat.com> wrote:
> >
> > From: Jérôme Glisse <jglisse@redhat.com>
> >
> > Hi Andrew, i see that you still have my event patch in you queue [1].
> > This patchset replace that single patch and is broken down in further
> > step so that it is easier to review and ascertain that no mistake were
> > made during mechanical changes. Here are the step:
> >
> >     Patch 1 - add the enum values
> >     Patch 2 - coccinelle semantic patch to convert all call site of
> >               mmu_notifier_range_init to default enum value and also
> >               to passing down the vma when it is available
> >     Patch 3 - update many call site to more accurate enum values
> >     Patch 4 - add the information to the mmu_notifier_range struct
> >     Patch 5 - helper to test if a range is updated to read only
> >
> > All the remaining patches are update to various driver to demonstrate
> > how this new information get use by device driver. I build tested
> > with make all and make all minus everything that enable mmu notifier
> > ie building with MMU_NOTIFIER=no. Also tested with some radeon,amd
> > gpu and intel gpu.
> >
> > If they are no objections i believe best plan would be to merge the
> > the first 5 patches (all mm changes) through your queue for 5.1 and
> > then to delay driver update to each individual driver tree for 5.2.
> > This will allow each individual device driver maintainer time to more
> > thouroughly test this more then my own testing.
> >
> > Note that i also intend to use this feature further in nouveau and
> > HMM down the road. I also expect that other user like KVM might be
> > interested into leveraging this new information to optimize some of
> > there secondary page table invalidation.
> 
> "Down the road" users should introduce the functionality they want to
> consume. The common concern with preemptively including
> forward-looking infrastructure is realizing later that the
> infrastructure is not needed, or needs changing. If it has no current
> consumer, leave it out.

This patchset already show that this is useful, what more can i do ?
I know i will use this information, in nouveau for memory policy we
allocate our own structure for every vma the GPU ever accessed or that
userspace hinted we should set a policy for. Right now with existing
mmu notifier i _must_ free those structure because i do not know if
the invalidation is an munmap or something else. So i am loosing
important informations and unecessarly free struct that i will have
to re-allocate just couple jiffies latter. That's one way i am using
this. The other way is to optimize GPU page table update just like i
am doing with all the patches to RDMA/ODP and various GPU drivers.


> > Here is an explaination on the rational for this patchset:
> >
> >
> > CPU page table update can happens for many reasons, not only as a result
> > of a syscall (munmap(), mprotect(), mremap(), madvise(), ...) but also
> > as a result of kernel activities (memory compression, reclaim, migration,
> > ...).
> >
> > This patch introduce a set of enums that can be associated with each of
> > the events triggering a mmu notifier. Latter patches take advantages of
> > those enum values.
> >
> > - UNMAP: munmap() or mremap()
> > - CLEAR: page table is cleared (migration, compaction, reclaim, ...)
> > - PROTECTION_VMA: change in access protections for the range
> > - PROTECTION_PAGE: change in access protections for page in the range
> > - SOFT_DIRTY: soft dirtyness tracking
> >
> > Being able to identify munmap() and mremap() from other reasons why the
> > page table is cleared is important to allow user of mmu notifier to
> > update their own internal tracking structure accordingly (on munmap or
> > mremap it is not longer needed to track range of virtual address as it
> > becomes invalid).
> 
> The only context information consumed in this patch set is
> MMU_NOTIFY_PROTECTION_VMA.
> 
> What is the practical benefit of these "optimize out the case when a
> range is updated to read only" optimizations? Any numbers to show this
> is worth the code thrash?

It depends on the workload for instance if you map to RDMA a file
read only like a log file for export, all write back that would
disrupt the RDMA mapping can be optimized out.

See above for more reasons why it is beneficial (knowing when it is
an munmap/mremap versus something else).

I would have not thought that passing down information as something
that controversial. Hopes this help you see the benefit of this.

Cheers,
Jérôme

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 0/9] mmu notifier provide context informations
  2019-01-23 23:04   ` Jerome Glisse
@ 2019-01-24  0:00     ` Dan Williams
  0 siblings, 0 replies; 24+ messages in thread
From: Dan Williams @ 2019-01-24  0:00 UTC (permalink / raw)
  To: Jerome Glisse
  Cc: Ralph Campbell, Jan Kara, Arnd Bergmann, KVM list,
	Matthew Wilcox, linux-rdma, John Hubbard, Felix Kuehling,
	Radim Krčmář,
	Linux Kernel Mailing List, Maling list - DRI developers,
	Michal Hocko, Linux MM, Jason Gunthorpe, Ross Zwisler,
	linux-fsdevel, Paolo Bonzini, Andrew Morton,
	Christian König

On Wed, Jan 23, 2019 at 3:05 PM Jerome Glisse <jglisse@redhat.com> wrote:
>
> On Wed, Jan 23, 2019 at 02:54:40PM -0800, Dan Williams wrote:
> > On Wed, Jan 23, 2019 at 2:23 PM <jglisse@redhat.com> wrote:
> > >
> > > From: Jérôme Glisse <jglisse@redhat.com>
> > >
> > > Hi Andrew, i see that you still have my event patch in you queue [1].
> > > This patchset replace that single patch and is broken down in further
> > > step so that it is easier to review and ascertain that no mistake were
> > > made during mechanical changes. Here are the step:
> > >
> > >     Patch 1 - add the enum values
> > >     Patch 2 - coccinelle semantic patch to convert all call site of
> > >               mmu_notifier_range_init to default enum value and also
> > >               to passing down the vma when it is available
> > >     Patch 3 - update many call site to more accurate enum values
> > >     Patch 4 - add the information to the mmu_notifier_range struct
> > >     Patch 5 - helper to test if a range is updated to read only
> > >
> > > All the remaining patches are update to various driver to demonstrate
> > > how this new information get use by device driver. I build tested
> > > with make all and make all minus everything that enable mmu notifier
> > > ie building with MMU_NOTIFIER=no. Also tested with some radeon,amd
> > > gpu and intel gpu.
> > >
> > > If they are no objections i believe best plan would be to merge the
> > > the first 5 patches (all mm changes) through your queue for 5.1 and
> > > then to delay driver update to each individual driver tree for 5.2.
> > > This will allow each individual device driver maintainer time to more
> > > thouroughly test this more then my own testing.
> > >
> > > Note that i also intend to use this feature further in nouveau and
> > > HMM down the road. I also expect that other user like KVM might be
> > > interested into leveraging this new information to optimize some of
> > > there secondary page table invalidation.
> >
> > "Down the road" users should introduce the functionality they want to
> > consume. The common concern with preemptively including
> > forward-looking infrastructure is realizing later that the
> > infrastructure is not needed, or needs changing. If it has no current
> > consumer, leave it out.
>
> This patchset already show that this is useful, what more can i do ?
> I know i will use this information, in nouveau for memory policy we
> allocate our own structure for every vma the GPU ever accessed or that
> userspace hinted we should set a policy for. Right now with existing
> mmu notifier i _must_ free those structure because i do not know if
> the invalidation is an munmap or something else. So i am loosing
> important informations and unecessarly free struct that i will have
> to re-allocate just couple jiffies latter. That's one way i am using
> this.

Understood, but that still seems to say stage the core support when
the nouveau enabling is ready.

> The other way is to optimize GPU page table update just like i
> am doing with all the patches to RDMA/ODP and various GPU drivers.

Yes.

>
>
> > > Here is an explaination on the rational for this patchset:
> > >
> > >
> > > CPU page table update can happens for many reasons, not only as a result
> > > of a syscall (munmap(), mprotect(), mremap(), madvise(), ...) but also
> > > as a result of kernel activities (memory compression, reclaim, migration,
> > > ...).
> > >
> > > This patch introduce a set of enums that can be associated with each of
> > > the events triggering a mmu notifier. Latter patches take advantages of
> > > those enum values.
> > >
> > > - UNMAP: munmap() or mremap()
> > > - CLEAR: page table is cleared (migration, compaction, reclaim, ...)
> > > - PROTECTION_VMA: change in access protections for the range
> > > - PROTECTION_PAGE: change in access protections for page in the range
> > > - SOFT_DIRTY: soft dirtyness tracking
> > >
> > > Being able to identify munmap() and mremap() from other reasons why the
> > > page table is cleared is important to allow user of mmu notifier to
> > > update their own internal tracking structure accordingly (on munmap or
> > > mremap it is not longer needed to track range of virtual address as it
> > > becomes invalid).
> >
> > The only context information consumed in this patch set is
> > MMU_NOTIFY_PROTECTION_VMA.
> >
> > What is the practical benefit of these "optimize out the case when a
> > range is updated to read only" optimizations? Any numbers to show this
> > is worth the code thrash?
>
> It depends on the workload for instance if you map to RDMA a file
> read only like a log file for export, all write back that would
> disrupt the RDMA mapping can be optimized out.
>
> See above for more reasons why it is beneficial (knowing when it is
> an munmap/mremap versus something else).
>
> I would have not thought that passing down information as something
> that controversial. Hopes this help you see the benefit of this.

I'm not asserting that it is controversial. I am asserting that
whenever a changelog says "optimize" it also includes concrete data
about the optimization scenario. Maybe the scenarios you have
optimized are clear to the driver owners, they just weren't
immediately clear to me.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 8/9] gpu/drm/i915: optimize out the case when a range is updated to read only
  2019-01-23 22:23 ` [PATCH v4 8/9] gpu/drm/i915: " jglisse
@ 2019-01-24 12:09   ` Joonas Lahtinen
  2019-01-24 15:30     ` Jerome Glisse
  0 siblings, 1 reply; 24+ messages in thread
From: Joonas Lahtinen @ 2019-01-24 12:09 UTC (permalink / raw)
  To: jglisse, linux-mm
  Cc: Ralph Campbell, Jan Kara, Arnd Bergmann, kvm, Matthew Wilcox,
	linux-rdma, John Hubbard, Felix Kuehling,
	Radim Krčmář,
	Dan Williams, linux-kernel, dri-devel, Michal Hocko,
	Jérôme Glisse, Jason Gunthorpe, Ross Zwisler,
	linux-fsdevel, Paolo Bonzini, Andrew Morton,
	Christian König

Hi Jerome,

This patch seems to have plenty of Cc:s, but none of the right ones :)

For further iterations, I guess you could use git option --cc to make
sure everyone gets the whole series, and still keep the Cc:s in the
patches themselves relevant to subsystems.

This doesn't seem to be on top of drm-tip, but on top of your previous
patches(?) that I had some comments about. Could you take a moment to
first address the couple of question I had, before proceeding to discuss
what is built on top of that base.

My reply's Message-ID is:
154289518994.19402.3481838548028068213@jlahtine-desk.ger.corp.intel.com

Regards, Joonas

PS. Please keep me Cc:d in the following patches, I'm keen on
understanding the motive and benefits.

Quoting jglisse@redhat.com (2019-01-24 00:23:14)
> From: Jérôme Glisse <jglisse@redhat.com>
> 
> When range of virtual address is updated read only and corresponding
> user ptr object are already read only it is pointless to do anything.
> Optimize this case out.
> 
> Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
> Cc: Christian König <christian.koenig@amd.com>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Felix Kuehling <Felix.Kuehling@amd.com>
> Cc: Jason Gunthorpe <jgg@mellanox.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Matthew Wilcox <mawilcox@microsoft.com>
> Cc: Ross Zwisler <zwisler@kernel.org>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Radim Krčmář <rkrcmar@redhat.com>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Ralph Campbell <rcampbell@nvidia.com>
> Cc: John Hubbard <jhubbard@nvidia.com>
> Cc: kvm@vger.kernel.org
> Cc: dri-devel@lists.freedesktop.org
> Cc: linux-rdma@vger.kernel.org
> Cc: linux-fsdevel@vger.kernel.org
> Cc: Arnd Bergmann <arnd@arndb.de>
> ---
>  drivers/gpu/drm/i915/i915_gem_userptr.c | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c
> index 9558582c105e..23330ac3d7ea 100644
> --- a/drivers/gpu/drm/i915/i915_gem_userptr.c
> +++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
> @@ -59,6 +59,7 @@ struct i915_mmu_object {
>         struct interval_tree_node it;
>         struct list_head link;
>         struct work_struct work;
> +       bool read_only;
>         bool attached;
>  };
>  
> @@ -119,6 +120,7 @@ static int i915_gem_userptr_mn_invalidate_range_start(struct mmu_notifier *_mn,
>                 container_of(_mn, struct i915_mmu_notifier, mn);
>         struct i915_mmu_object *mo;
>         struct interval_tree_node *it;
> +       bool update_to_read_only;
>         LIST_HEAD(cancelled);
>         unsigned long end;
>  
> @@ -128,6 +130,8 @@ static int i915_gem_userptr_mn_invalidate_range_start(struct mmu_notifier *_mn,
>         /* interval ranges are inclusive, but invalidate range is exclusive */
>         end = range->end - 1;
>  
> +       update_to_read_only = mmu_notifier_range_update_to_read_only(range);
> +
>         spin_lock(&mn->lock);
>         it = interval_tree_iter_first(&mn->objects, range->start, end);
>         while (it) {
> @@ -145,6 +149,17 @@ static int i915_gem_userptr_mn_invalidate_range_start(struct mmu_notifier *_mn,
>                  * object if it is not in the process of being destroyed.
>                  */
>                 mo = container_of(it, struct i915_mmu_object, it);
> +
> +               /*
> +                * If it is already read only and we are updating to
> +                * read only then we do not need to change anything.
> +                * So save time and skip this one.
> +                */
> +               if (update_to_read_only && mo->read_only) {
> +                       it = interval_tree_iter_next(it, range->start, end);
> +                       continue;
> +               }
> +
>                 if (kref_get_unless_zero(&mo->obj->base.refcount))
>                         queue_work(mn->wq, &mo->work);
>  
> @@ -270,6 +285,7 @@ i915_gem_userptr_init__mmu_notifier(struct drm_i915_gem_object *obj,
>         mo->mn = mn;
>         mo->obj = obj;
>         mo->it.start = obj->userptr.ptr;
> +       mo->read_only = i915_gem_object_is_readonly(obj);
>         mo->it.last = obj->userptr.ptr + obj->base.size - 1;
>         INIT_WORK(&mo->work, cancel_userptr);
>  
> -- 
> 2.17.2
> 
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 8/9] gpu/drm/i915: optimize out the case when a range is updated to read only
  2019-01-24 12:09   ` Joonas Lahtinen
@ 2019-01-24 15:30     ` Jerome Glisse
       [not found]       ` <154877159986.4387.16328989441685542244@jlahtine-desk.ger.corp.intel.com>
  0 siblings, 1 reply; 24+ messages in thread
From: Jerome Glisse @ 2019-01-24 15:30 UTC (permalink / raw)
  To: Joonas Lahtinen
  Cc: linux-mm, Ralph Campbell, Jan Kara, Arnd Bergmann, kvm,
	Matthew Wilcox, linux-rdma, John Hubbard, Felix Kuehling,
	Radim Krčmář,
	Dan Williams, linux-kernel, dri-devel, Michal Hocko,
	Jason Gunthorpe, Ross Zwisler, linux-fsdevel, Paolo Bonzini,
	Andrew Morton, Christian König

On Thu, Jan 24, 2019 at 02:09:12PM +0200, Joonas Lahtinen wrote:
> Hi Jerome,
> 
> This patch seems to have plenty of Cc:s, but none of the right ones :)

So sorry, i am bad with git commands.

> For further iterations, I guess you could use git option --cc to make
> sure everyone gets the whole series, and still keep the Cc:s in the
> patches themselves relevant to subsystems.

Will do.

> This doesn't seem to be on top of drm-tip, but on top of your previous
> patches(?) that I had some comments about. Could you take a moment to
> first address the couple of question I had, before proceeding to discuss
> what is built on top of that base.

It is on top of Linus tree so roughly ~ rc3 it does not depend on any
of the previous patch i posted. I still intended to propose to remove
GUP from i915 once i get around to implement the equivalent of GUP_fast
for HMM and other bonus cookies with it.

The plan is once i have all mm bits properly upstream then i can propose
patches to individual driver against the proper driver tree ie following
rules of each individual device driver sub-system and Cc only people
there to avoid spamming the mm folks :)


> 
> My reply's Message-ID is:
> 154289518994.19402.3481838548028068213@jlahtine-desk.ger.corp.intel.com
> 
> Regards, Joonas
> 
> PS. Please keep me Cc:d in the following patches, I'm keen on
> understanding the motive and benefits.
> 
> Quoting jglisse@redhat.com (2019-01-24 00:23:14)
> > From: Jérôme Glisse <jglisse@redhat.com>
> > 
> > When range of virtual address is updated read only and corresponding
> > user ptr object are already read only it is pointless to do anything.
> > Optimize this case out.
> > 
> > Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
> > Cc: Christian König <christian.koenig@amd.com>
> > Cc: Jan Kara <jack@suse.cz>
> > Cc: Felix Kuehling <Felix.Kuehling@amd.com>
> > Cc: Jason Gunthorpe <jgg@mellanox.com>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: Matthew Wilcox <mawilcox@microsoft.com>
> > Cc: Ross Zwisler <zwisler@kernel.org>
> > Cc: Dan Williams <dan.j.williams@intel.com>
> > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > Cc: Radim Krčmář <rkrcmar@redhat.com>
> > Cc: Michal Hocko <mhocko@kernel.org>
> > Cc: Ralph Campbell <rcampbell@nvidia.com>
> > Cc: John Hubbard <jhubbard@nvidia.com>
> > Cc: kvm@vger.kernel.org
> > Cc: dri-devel@lists.freedesktop.org
> > Cc: linux-rdma@vger.kernel.org
> > Cc: linux-fsdevel@vger.kernel.org
> > Cc: Arnd Bergmann <arnd@arndb.de>
> > ---
> >  drivers/gpu/drm/i915/i915_gem_userptr.c | 16 ++++++++++++++++
> >  1 file changed, 16 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c
> > index 9558582c105e..23330ac3d7ea 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_userptr.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
> > @@ -59,6 +59,7 @@ struct i915_mmu_object {
> >         struct interval_tree_node it;
> >         struct list_head link;
> >         struct work_struct work;
> > +       bool read_only;
> >         bool attached;
> >  };
> >  
> > @@ -119,6 +120,7 @@ static int i915_gem_userptr_mn_invalidate_range_start(struct mmu_notifier *_mn,
> >                 container_of(_mn, struct i915_mmu_notifier, mn);
> >         struct i915_mmu_object *mo;
> >         struct interval_tree_node *it;
> > +       bool update_to_read_only;
> >         LIST_HEAD(cancelled);
> >         unsigned long end;
> >  
> > @@ -128,6 +130,8 @@ static int i915_gem_userptr_mn_invalidate_range_start(struct mmu_notifier *_mn,
> >         /* interval ranges are inclusive, but invalidate range is exclusive */
> >         end = range->end - 1;
> >  
> > +       update_to_read_only = mmu_notifier_range_update_to_read_only(range);
> > +
> >         spin_lock(&mn->lock);
> >         it = interval_tree_iter_first(&mn->objects, range->start, end);
> >         while (it) {
> > @@ -145,6 +149,17 @@ static int i915_gem_userptr_mn_invalidate_range_start(struct mmu_notifier *_mn,
> >                  * object if it is not in the process of being destroyed.
> >                  */
> >                 mo = container_of(it, struct i915_mmu_object, it);
> > +
> > +               /*
> > +                * If it is already read only and we are updating to
> > +                * read only then we do not need to change anything.
> > +                * So save time and skip this one.
> > +                */
> > +               if (update_to_read_only && mo->read_only) {
> > +                       it = interval_tree_iter_next(it, range->start, end);
> > +                       continue;
> > +               }
> > +
> >                 if (kref_get_unless_zero(&mo->obj->base.refcount))
> >                         queue_work(mn->wq, &mo->work);
> >  
> > @@ -270,6 +285,7 @@ i915_gem_userptr_init__mmu_notifier(struct drm_i915_gem_object *obj,
> >         mo->mn = mn;
> >         mo->obj = obj;
> >         mo->it.start = obj->userptr.ptr;
> > +       mo->read_only = i915_gem_object_is_readonly(obj);
> >         mo->it.last = obj->userptr.ptr + obj->base.size - 1;
> >         INIT_WORK(&mo->work, cancel_userptr);
> >  
> > -- 
> > 2.17.2
> > 
> > _______________________________________________
> > dri-devel mailing list
> > dri-devel@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 8/9] gpu/drm/i915: optimize out the case when a range is updated to read only
       [not found]       ` <154877159986.4387.16328989441685542244@jlahtine-desk.ger.corp.intel.com>
@ 2019-01-29 16:21         ` Jerome Glisse
  0 siblings, 0 replies; 24+ messages in thread
From: Jerome Glisse @ 2019-01-29 16:21 UTC (permalink / raw)
  To: Joonas Lahtinen
  Cc: linux-mm, Ralph Campbell, Jan Kara, Arnd Bergmann, kvm,
	Matthew Wilcox, linux-rdma, John Hubbard, Felix Kuehling,
	Radim Krčmář,
	Dan Williams, linux-kernel, dri-devel, Michal Hocko,
	Jason Gunthorpe, Ross Zwisler, linux-fsdevel, Paolo Bonzini,
	Andrew Morton, Christian König

On Tue, Jan 29, 2019 at 04:20:00PM +0200, Joonas Lahtinen wrote:
> Quoting Jerome Glisse (2019-01-24 17:30:32)
> > On Thu, Jan 24, 2019 at 02:09:12PM +0200, Joonas Lahtinen wrote:
> > > Hi Jerome,
> > > 
> > > This patch seems to have plenty of Cc:s, but none of the right ones :)
> > 
> > So sorry, i am bad with git commands.
> > 
> > > For further iterations, I guess you could use git option --cc to make
> > > sure everyone gets the whole series, and still keep the Cc:s in the
> > > patches themselves relevant to subsystems.
> > 
> > Will do.
> > 
> > > This doesn't seem to be on top of drm-tip, but on top of your previous
> > > patches(?) that I had some comments about. Could you take a moment to
> > > first address the couple of question I had, before proceeding to discuss
> > > what is built on top of that base.
> > 
> > It is on top of Linus tree so roughly ~ rc3 it does not depend on any
> > of the previous patch i posted.
> 
> You actually managed to race a point in time just when Chris rewrote much
> of the userptr code in drm-tip, which I didn't remember of. My bad.
> 
> Still interested to hearing replies to my questions in the previous
> thread, if the series is still relevant. Trying to get my head around
> how the different aspects of HMM pan out for devices without fault handling.

HMM mirror does not need page fault handling for everything and in fact
for user ptr you can use HMM mirror without page fault support in hardware.
Page fault requirement is more like a __very__ nice to have feature.

So sorry i missed that mail i must had it in a middle of bugzilla spam
and deleted it. So here is a paste of it and answer. This was for a
patch to convert i915 to use HMM mirror instead of having i915 does it
own thing with GUP (get_user_page).

> Bit late reply, but here goes :)
>
> We're working quite hard to avoid pinning any pages unless they're in
> the GPU page tables. And when they are in the GPU page tables, they must
> be pinned for whole of that duration, for the reason that our GPUs can
> not take a fault. And to avoid thrashing GPU page tables, we do leave
> objects in page tables with the expectation that smart userspace
> recycles buffers.

You do not need to pin the page because you obey to mmu notifier ie
it is perfectly fine for you to keep the page map into the GPU until
you get an mmu notifier call back for the range of virtual address.

The pin from GUP in fact does not protect you from anything. GUP is
really misleading, by the time GUP return the page you get might not
correspond to the memory backing the virtual address.

In i915 code this is not an issue because you synchronize against
mmu notifier call back.

So my intention in converting GPU driver from GUP to HMM mirror is
just to avoid the useless page pin. As long as you obey the mmu
notifier call back (or HMM sync page table call back) then you are
fine.

> So what I understand of your proposal, it wouldn't really make a
> difference for us in the amount of pinned pages (which I agree,
> we'd love to see going down). When we're unable to take a fault,
> the first use effectively forces us to pin any pages and keep them
> pinned to avoid thrashing GPU page tables.

With HMM there is no pin, we never pin the page ie we never increment
the refcount on the page as it is useless to do so if you abide by
mmu notifier. Again the pin GUP take is misleading it does not block
mm event.

However Without pin and still abiding to mmu notifier you will not
see any difference in thrashing ie number of time you will get a mmu
notifier call back. As really those call back happens for good reasons.
For instance running out of memory and kernel trying to reclaim or
because userspace did a syscall that affect the range of virtual address.

This should not happen in regular workload and when they happen the pin
from GUP will not inhibit those either. In the end you will get the exact
same amount of trashing but you will inhibit thing like memory compaction
or migration while HMM does not block those (ie HMM is a good citizen ;)
while GUP user are not).

Also we are in the process of changing GUP and GUP will now have more
profound impact to filesystem and mm (inhibiting and breaking some of
the filesystem behavior). Converting GPU driver to HMM will avoid those
adverse impact and it is one of the motivation behind my crusade to
convert all GUP user that abide by mmu notifier to use HMM instead.


> So from i915 perspective, it just seems to be mostly an exchange of
> an API to an another for getting the pages. You already mentioned
> the fast path is being worked on, which is an obvious difference.
> But is there some other improvement one would be expecting, beyond
> the page pinning?

So for HMM i have a bunch of further optimization and new feature.
Using HMM would make it easier for i915 to leverage those.

> Also, is the requirement for a single non-file-backed VMA in the
> plans of being eliminated or is that inherent restriction of the
> HMM_MIRROR feature? We're currently not imposing such a limitation.

HMM does not have that limitation, never did. It seems that i915
unlike other driver does allow GUP on file back page, while other
GPU driver do not. So i made the assumption the i915 did have that
limitation without checking the code.


> > I still intended to propose to remove
> > GUP from i915 once i get around to implement the equivalent of GUP_fast
> > for HMM and other bonus cookies with it.
> > 
> > The plan is once i have all mm bits properly upstream then i can propose
> > patches to individual driver against the proper driver tree ie following
> > rules of each individual device driver sub-system and Cc only people
> > there to avoid spamming the mm folks :)
> 
> Makes sense, as we're having tons of changes in this field in i915, the
> churn to rebase on top of them will be substantial.

I am posting more HMM bits today for 5.1, i will probably post another i915
patchset in coming weeks. I will try to base it on for-5.1-drm tree as i am
not only doing i915 but amd too and it is easier if i can do all of them in
just one tree so i only have to switch GPU not kernel too for testing :)

> 
> Regards, Joonas
> 
> PS. Are you by any chance attending FOSDEM? Would be nice to chat about
> this.

No i am not going to fosdem :(

Cheers,
Jérôme

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 0/9] mmu notifier provide context informations
  2019-01-23 22:23 [PATCH v4 0/9] mmu notifier provide context informations jglisse
                   ` (9 preceding siblings ...)
  2019-01-23 22:54 ` [PATCH v4 0/9] mmu notifier provide context informations Dan Williams
@ 2019-01-31 16:10 ` Jerome Glisse
  2019-01-31 19:55   ` Andrew Morton
                     ` (2 more replies)
  10 siblings, 3 replies; 24+ messages in thread
From: Jerome Glisse @ 2019-01-31 16:10 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: linux-kernel, Christian König, Jan Kara, Felix Kuehling,
	Jason Gunthorpe, Matthew Wilcox, Ross Zwisler, Dan Williams,
	Paolo Bonzini, Radim Krčmář,
	Michal Hocko, Ralph Campbell, John Hubbard, kvm, dri-devel,
	linux-rdma, linux-fsdevel, Arnd Bergmann


Andrew what is your plan for this ? I had a discussion with Peter Xu
and Andrea about change_pte() and kvm. Today the change_pte() kvm
optimization is effectively disabled because of invalidate_range
calls. With a minimal couple lines patch on top of this patchset
we can bring back the kvm change_pte optimization and we can also
optimize some other cases like for instance when write protecting
after fork (but i am not sure this is something qemu does often so
it might not help for real kvm workload).

I will be posting a the extra patch as an RFC, but in the meantime
i wanted to know what was the status for this.


Jan, Christian does your previous ACK still holds for this ?


On Wed, Jan 23, 2019 at 05:23:06PM -0500, jglisse@redhat.com wrote:
> From: Jérôme Glisse <jglisse@redhat.com>
> 
> Hi Andrew, i see that you still have my event patch in you queue [1].
> This patchset replace that single patch and is broken down in further
> step so that it is easier to review and ascertain that no mistake were
> made during mechanical changes. Here are the step:
> 
>     Patch 1 - add the enum values
>     Patch 2 - coccinelle semantic patch to convert all call site of
>               mmu_notifier_range_init to default enum value and also
>               to passing down the vma when it is available
>     Patch 3 - update many call site to more accurate enum values
>     Patch 4 - add the information to the mmu_notifier_range struct
>     Patch 5 - helper to test if a range is updated to read only
> 
> All the remaining patches are update to various driver to demonstrate
> how this new information get use by device driver. I build tested
> with make all and make all minus everything that enable mmu notifier
> ie building with MMU_NOTIFIER=no. Also tested with some radeon,amd
> gpu and intel gpu.
> 
> If they are no objections i believe best plan would be to merge the
> the first 5 patches (all mm changes) through your queue for 5.1 and
> then to delay driver update to each individual driver tree for 5.2.
> This will allow each individual device driver maintainer time to more
> thouroughly test this more then my own testing.
> 
> Note that i also intend to use this feature further in nouveau and
> HMM down the road. I also expect that other user like KVM might be
> interested into leveraging this new information to optimize some of
> there secondary page table invalidation.
> 
> Here is an explaination on the rational for this patchset:
> 
> 
> CPU page table update can happens for many reasons, not only as a result
> of a syscall (munmap(), mprotect(), mremap(), madvise(), ...) but also
> as a result of kernel activities (memory compression, reclaim, migration,
> ...).
> 
> This patch introduce a set of enums that can be associated with each of
> the events triggering a mmu notifier. Latter patches take advantages of
> those enum values.
> 
> - UNMAP: munmap() or mremap()
> - CLEAR: page table is cleared (migration, compaction, reclaim, ...)
> - PROTECTION_VMA: change in access protections for the range
> - PROTECTION_PAGE: change in access protections for page in the range
> - SOFT_DIRTY: soft dirtyness tracking
> 
> Being able to identify munmap() and mremap() from other reasons why the
> page table is cleared is important to allow user of mmu notifier to
> update their own internal tracking structure accordingly (on munmap or
> mremap it is not longer needed to track range of virtual address as it
> becomes invalid).
> 
> [1] https://www.ozlabs.org/~akpm/mmotm/broken-out/mm-mmu_notifier-contextual-information-for-event-triggering-invalidation-v2.patch
> 
> Cc: Christian König <christian.koenig@amd.com>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Felix Kuehling <Felix.Kuehling@amd.com>
> Cc: Jason Gunthorpe <jgg@mellanox.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Matthew Wilcox <mawilcox@microsoft.com>
> Cc: Ross Zwisler <zwisler@kernel.org>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Radim Krčmář <rkrcmar@redhat.com>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Ralph Campbell <rcampbell@nvidia.com>
> Cc: John Hubbard <jhubbard@nvidia.com>
> Cc: kvm@vger.kernel.org
> Cc: dri-devel@lists.freedesktop.org
> Cc: linux-rdma@vger.kernel.org
> Cc: linux-fsdevel@vger.kernel.org
> Cc: Arnd Bergmann <arnd@arndb.de>
> 
> Jérôme Glisse (9):
>   mm/mmu_notifier: contextual information for event enums
>   mm/mmu_notifier: contextual information for event triggering
>     invalidation
>   mm/mmu_notifier: use correct mmu_notifier events for each invalidation
>   mm/mmu_notifier: pass down vma and reasons why mmu notifier is
>     happening
>   mm/mmu_notifier: mmu_notifier_range_update_to_read_only() helper
>   gpu/drm/radeon: optimize out the case when a range is updated to read
>     only
>   gpu/drm/amdgpu: optimize out the case when a range is updated to read
>     only
>   gpu/drm/i915: optimize out the case when a range is updated to read
>     only
>   RDMA/umem_odp: optimize out the case when a range is updated to read
>     only
> 
>  drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c  | 13 ++++++++
>  drivers/gpu/drm/i915/i915_gem_userptr.c | 16 ++++++++++
>  drivers/gpu/drm/radeon/radeon_mn.c      | 13 ++++++++
>  drivers/infiniband/core/umem_odp.c      | 22 +++++++++++--
>  fs/proc/task_mmu.c                      |  3 +-
>  include/linux/mmu_notifier.h            | 42 ++++++++++++++++++++++++-
>  include/rdma/ib_umem_odp.h              |  1 +
>  kernel/events/uprobes.c                 |  3 +-
>  mm/huge_memory.c                        | 14 +++++----
>  mm/hugetlb.c                            | 11 ++++---
>  mm/khugepaged.c                         |  3 +-
>  mm/ksm.c                                |  6 ++--
>  mm/madvise.c                            |  3 +-
>  mm/memory.c                             | 25 +++++++++------
>  mm/migrate.c                            |  5 ++-
>  mm/mmu_notifier.c                       | 10 ++++++
>  mm/mprotect.c                           |  4 ++-
>  mm/mremap.c                             |  3 +-
>  mm/oom_kill.c                           |  3 +-
>  mm/rmap.c                               |  6 ++--
>  20 files changed, 171 insertions(+), 35 deletions(-)
> 
> -- 
> 2.17.2
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 0/9] mmu notifier provide context informations
  2019-01-31 16:10 ` Jerome Glisse
@ 2019-01-31 19:55   ` Andrew Morton
  2019-01-31 20:33     ` Jerome Glisse
  2019-02-01 12:24   ` Christian König
  2019-02-01 21:02   ` Jan Kara
  2 siblings, 1 reply; 24+ messages in thread
From: Andrew Morton @ 2019-01-31 19:55 UTC (permalink / raw)
  To: Jerome Glisse
  Cc: linux-mm, linux-kernel, Christian König, Jan Kara,
	Felix Kuehling, Jason Gunthorpe, Matthew Wilcox, Ross Zwisler,
	Dan Williams, Paolo Bonzini, Radim Krčmář,
	Michal Hocko, Ralph Campbell, John Hubbard, kvm, dri-devel,
	linux-rdma, linux-fsdevel, Arnd Bergmann

On Thu, 31 Jan 2019 11:10:06 -0500 Jerome Glisse <jglisse@redhat.com> wrote:

> Andrew what is your plan for this ? I had a discussion with Peter Xu
> and Andrea about change_pte() and kvm. Today the change_pte() kvm
> optimization is effectively disabled because of invalidate_range
> calls. With a minimal couple lines patch on top of this patchset
> we can bring back the kvm change_pte optimization and we can also
> optimize some other cases like for instance when write protecting
> after fork (but i am not sure this is something qemu does often so
> it might not help for real kvm workload).
> 
> I will be posting a the extra patch as an RFC, but in the meantime
> i wanted to know what was the status for this.

The various drm patches appear to be headed for collisions with drm
tree development so we'll need to figure out how to handle that and in
what order things happen.

It's quite unclear from the v4 patchset's changelogs that this has
anything to do with KVM and "the change_pte() kvm optimization" hasn't
been described anywhere(?).

So..  I expect the thing to do here is to get everything finished, get
the changelogs completed with this new information and do a resend.

Can we omit the drm and rdma patches for now?  Feed them in via the
subsystem maintainers when the dust has settled?


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 0/9] mmu notifier provide context informations
  2019-01-31 19:55   ` Andrew Morton
@ 2019-01-31 20:33     ` Jerome Glisse
  0 siblings, 0 replies; 24+ messages in thread
From: Jerome Glisse @ 2019-01-31 20:33 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Christian König, Jan Kara,
	Felix Kuehling, Jason Gunthorpe, Matthew Wilcox, Ross Zwisler,
	Dan Williams, Paolo Bonzini, Radim Krčmář,
	Michal Hocko, Ralph Campbell, John Hubbard, kvm, dri-devel,
	linux-rdma, linux-fsdevel, Arnd Bergmann

On Thu, Jan 31, 2019 at 11:55:35AM -0800, Andrew Morton wrote:
> On Thu, 31 Jan 2019 11:10:06 -0500 Jerome Glisse <jglisse@redhat.com> wrote:
> 
> > Andrew what is your plan for this ? I had a discussion with Peter Xu
> > and Andrea about change_pte() and kvm. Today the change_pte() kvm
> > optimization is effectively disabled because of invalidate_range
> > calls. With a minimal couple lines patch on top of this patchset
> > we can bring back the kvm change_pte optimization and we can also
> > optimize some other cases like for instance when write protecting
> > after fork (but i am not sure this is something qemu does often so
> > it might not help for real kvm workload).
> > 
> > I will be posting a the extra patch as an RFC, but in the meantime
> > i wanted to know what was the status for this.
> 
> The various drm patches appear to be headed for collisions with drm
> tree development so we'll need to figure out how to handle that and in
> what order things happen.
> 
> It's quite unclear from the v4 patchset's changelogs that this has
> anything to do with KVM and "the change_pte() kvm optimization" hasn't
> been described anywhere(?).
> 
> So..  I expect the thing to do here is to get everything finished, get
> the changelogs completed with this new information and do a resend.
> 
> Can we omit the drm and rdma patches for now?  Feed them in via the
> subsystem maintainers when the dust has settled?

Yes, i should have pointed out that you can ignore the driver patches
i will resumit them through the appropriate tree once the mm bits are
in. I just wanted to show case how i intended to use this. I will try
not to forget next time to clearly tag things that are just there to
show case and that will be merge latter through different tree.

I will do a v5 with kvm bits once we have enough testing and confidence.
So i guess this all will be delayed to 5.2 and 5.3 for driver bits.
The kvm bits are outcomes of private emails and previous face to face
discussion around mmu notifier and kvm. I believe the context information
will turn to be useful to more users than the ones i am doing it for.

Cheers,
Jérôme

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 0/9] mmu notifier provide context informations
  2019-01-31 16:10 ` Jerome Glisse
  2019-01-31 19:55   ` Andrew Morton
@ 2019-02-01 12:24   ` Christian König
  2019-02-01 21:02   ` Jan Kara
  2 siblings, 0 replies; 24+ messages in thread
From: Christian König @ 2019-02-01 12:24 UTC (permalink / raw)
  To: Jerome Glisse, Andrew Morton, linux-mm
  Cc: Ralph Campbell, Jan Kara, Arnd Bergmann, kvm, Matthew Wilcox,
	linux-rdma, John Hubbard, Felix Kuehling,
	Radim Krčmář,
	linux-kernel, dri-devel, Michal Hocko, Jason Gunthorpe,
	Ross Zwisler, linux-fsdevel, Paolo Bonzini, Dan Williams,
	Christian König

Am 31.01.19 um 17:10 schrieb Jerome Glisse:
> Andrew what is your plan for this ? I had a discussion with Peter Xu
> and Andrea about change_pte() and kvm. Today the change_pte() kvm
> optimization is effectively disabled because of invalidate_range
> calls. With a minimal couple lines patch on top of this patchset
> we can bring back the kvm change_pte optimization and we can also
> optimize some other cases like for instance when write protecting
> after fork (but i am not sure this is something qemu does often so
> it might not help for real kvm workload).
>
> I will be posting a the extra patch as an RFC, but in the meantime
> i wanted to know what was the status for this.
>
>
> Jan, Christian does your previous ACK still holds for this ?

At least the general approach still sounds perfectly sane to me.

Regarding how to merge these patches I think we should just get the 
general infrastructure into Linus tree and when can then merge the DRM 
patches one release later when we are sure that it doesn't break anything.

Christian.

>
>
> On Wed, Jan 23, 2019 at 05:23:06PM -0500, jglisse@redhat.com wrote:
>> From: Jérôme Glisse <jglisse@redhat.com>
>>
>> Hi Andrew, i see that you still have my event patch in you queue [1].
>> This patchset replace that single patch and is broken down in further
>> step so that it is easier to review and ascertain that no mistake were
>> made during mechanical changes. Here are the step:
>>
>>      Patch 1 - add the enum values
>>      Patch 2 - coccinelle semantic patch to convert all call site of
>>                mmu_notifier_range_init to default enum value and also
>>                to passing down the vma when it is available
>>      Patch 3 - update many call site to more accurate enum values
>>      Patch 4 - add the information to the mmu_notifier_range struct
>>      Patch 5 - helper to test if a range is updated to read only
>>
>> All the remaining patches are update to various driver to demonstrate
>> how this new information get use by device driver. I build tested
>> with make all and make all minus everything that enable mmu notifier
>> ie building with MMU_NOTIFIER=no. Also tested with some radeon,amd
>> gpu and intel gpu.
>>
>> If they are no objections i believe best plan would be to merge the
>> the first 5 patches (all mm changes) through your queue for 5.1 and
>> then to delay driver update to each individual driver tree for 5.2.
>> This will allow each individual device driver maintainer time to more
>> thouroughly test this more then my own testing.
>>
>> Note that i also intend to use this feature further in nouveau and
>> HMM down the road. I also expect that other user like KVM might be
>> interested into leveraging this new information to optimize some of
>> there secondary page table invalidation.
>>
>> Here is an explaination on the rational for this patchset:
>>
>>
>> CPU page table update can happens for many reasons, not only as a result
>> of a syscall (munmap(), mprotect(), mremap(), madvise(), ...) but also
>> as a result of kernel activities (memory compression, reclaim, migration,
>> ...).
>>
>> This patch introduce a set of enums that can be associated with each of
>> the events triggering a mmu notifier. Latter patches take advantages of
>> those enum values.
>>
>> - UNMAP: munmap() or mremap()
>> - CLEAR: page table is cleared (migration, compaction, reclaim, ...)
>> - PROTECTION_VMA: change in access protections for the range
>> - PROTECTION_PAGE: change in access protections for page in the range
>> - SOFT_DIRTY: soft dirtyness tracking
>>
>> Being able to identify munmap() and mremap() from other reasons why the
>> page table is cleared is important to allow user of mmu notifier to
>> update their own internal tracking structure accordingly (on munmap or
>> mremap it is not longer needed to track range of virtual address as it
>> becomes invalid).
>>
>> [1] https://www.ozlabs.org/~akpm/mmotm/broken-out/mm-mmu_notifier-contextual-information-for-event-triggering-invalidation-v2.patch
>>
>> Cc: Christian König <christian.koenig@amd.com>
>> Cc: Jan Kara <jack@suse.cz>
>> Cc: Felix Kuehling <Felix.Kuehling@amd.com>
>> Cc: Jason Gunthorpe <jgg@mellanox.com>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: Matthew Wilcox <mawilcox@microsoft.com>
>> Cc: Ross Zwisler <zwisler@kernel.org>
>> Cc: Dan Williams <dan.j.williams@intel.com>
>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>> Cc: Radim Krčmář <rkrcmar@redhat.com>
>> Cc: Michal Hocko <mhocko@kernel.org>
>> Cc: Ralph Campbell <rcampbell@nvidia.com>
>> Cc: John Hubbard <jhubbard@nvidia.com>
>> Cc: kvm@vger.kernel.org
>> Cc: dri-devel@lists.freedesktop.org
>> Cc: linux-rdma@vger.kernel.org
>> Cc: linux-fsdevel@vger.kernel.org
>> Cc: Arnd Bergmann <arnd@arndb.de>
>>
>> Jérôme Glisse (9):
>>    mm/mmu_notifier: contextual information for event enums
>>    mm/mmu_notifier: contextual information for event triggering
>>      invalidation
>>    mm/mmu_notifier: use correct mmu_notifier events for each invalidation
>>    mm/mmu_notifier: pass down vma and reasons why mmu notifier is
>>      happening
>>    mm/mmu_notifier: mmu_notifier_range_update_to_read_only() helper
>>    gpu/drm/radeon: optimize out the case when a range is updated to read
>>      only
>>    gpu/drm/amdgpu: optimize out the case when a range is updated to read
>>      only
>>    gpu/drm/i915: optimize out the case when a range is updated to read
>>      only
>>    RDMA/umem_odp: optimize out the case when a range is updated to read
>>      only
>>
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c  | 13 ++++++++
>>   drivers/gpu/drm/i915/i915_gem_userptr.c | 16 ++++++++++
>>   drivers/gpu/drm/radeon/radeon_mn.c      | 13 ++++++++
>>   drivers/infiniband/core/umem_odp.c      | 22 +++++++++++--
>>   fs/proc/task_mmu.c                      |  3 +-
>>   include/linux/mmu_notifier.h            | 42 ++++++++++++++++++++++++-
>>   include/rdma/ib_umem_odp.h              |  1 +
>>   kernel/events/uprobes.c                 |  3 +-
>>   mm/huge_memory.c                        | 14 +++++----
>>   mm/hugetlb.c                            | 11 ++++---
>>   mm/khugepaged.c                         |  3 +-
>>   mm/ksm.c                                |  6 ++--
>>   mm/madvise.c                            |  3 +-
>>   mm/memory.c                             | 25 +++++++++------
>>   mm/migrate.c                            |  5 ++-
>>   mm/mmu_notifier.c                       | 10 ++++++
>>   mm/mprotect.c                           |  4 ++-
>>   mm/mremap.c                             |  3 +-
>>   mm/oom_kill.c                           |  3 +-
>>   mm/rmap.c                               |  6 ++--
>>   20 files changed, 171 insertions(+), 35 deletions(-)
>>
>> -- 
>> 2.17.2
>>
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 0/9] mmu notifier provide context informations
  2019-01-31 16:10 ` Jerome Glisse
  2019-01-31 19:55   ` Andrew Morton
  2019-02-01 12:24   ` Christian König
@ 2019-02-01 21:02   ` Jan Kara
  2019-02-11 18:54     ` Jerome Glisse
  2 siblings, 1 reply; 24+ messages in thread
From: Jan Kara @ 2019-02-01 21:02 UTC (permalink / raw)
  To: Jerome Glisse
  Cc: Andrew Morton, linux-mm, linux-kernel, Christian König,
	Jan Kara, Felix Kuehling, Jason Gunthorpe, Matthew Wilcox,
	Ross Zwisler, Dan Williams, Paolo Bonzini,
	Radim Krčmář,
	Michal Hocko, Ralph Campbell, John Hubbard, kvm, dri-devel,
	linux-rdma, linux-fsdevel, Arnd Bergmann

On Thu 31-01-19 11:10:06, Jerome Glisse wrote:
> 
> Andrew what is your plan for this ? I had a discussion with Peter Xu
> and Andrea about change_pte() and kvm. Today the change_pte() kvm
> optimization is effectively disabled because of invalidate_range
> calls. With a minimal couple lines patch on top of this patchset
> we can bring back the kvm change_pte optimization and we can also
> optimize some other cases like for instance when write protecting
> after fork (but i am not sure this is something qemu does often so
> it might not help for real kvm workload).
> 
> I will be posting a the extra patch as an RFC, but in the meantime
> i wanted to know what was the status for this.
> 
> Jan, Christian does your previous ACK still holds for this ?

Yes, I still think the approach makes sense. Dan's concern about in tree
users is valid but it seems you have those just not merged yet, right?

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 0/9] mmu notifier provide context informations
  2019-02-01 21:02   ` Jan Kara
@ 2019-02-11 18:54     ` Jerome Glisse
  0 siblings, 0 replies; 24+ messages in thread
From: Jerome Glisse @ 2019-02-11 18:54 UTC (permalink / raw)
  To: Jan Kara
  Cc: Andrew Morton, linux-mm, linux-kernel, Christian König,
	Felix Kuehling, Jason Gunthorpe, Matthew Wilcox, Ross Zwisler,
	Dan Williams, Paolo Bonzini, Radim Krčmář,
	Michal Hocko, Ralph Campbell, John Hubbard, kvm, dri-devel,
	linux-rdma, linux-fsdevel, Arnd Bergmann

On Fri, Feb 01, 2019 at 10:02:30PM +0100, Jan Kara wrote:
> On Thu 31-01-19 11:10:06, Jerome Glisse wrote:
> > 
> > Andrew what is your plan for this ? I had a discussion with Peter Xu
> > and Andrea about change_pte() and kvm. Today the change_pte() kvm
> > optimization is effectively disabled because of invalidate_range
> > calls. With a minimal couple lines patch on top of this patchset
> > we can bring back the kvm change_pte optimization and we can also
> > optimize some other cases like for instance when write protecting
> > after fork (but i am not sure this is something qemu does often so
> > it might not help for real kvm workload).
> > 
> > I will be posting a the extra patch as an RFC, but in the meantime
> > i wanted to know what was the status for this.
> > 
> > Jan, Christian does your previous ACK still holds for this ?
> 
> Yes, I still think the approach makes sense. Dan's concern about in tree
> users is valid but it seems you have those just not merged yet, right?

(Catching up on email)

This version included some of the first users for this but i do not
want to merge them through Andrew but through the individual driver
project tree. Also in the meantime i found a use for this with kvm
and i expect few others users of mmu notifier will leverage this
extra informations.

Cheers,
Jérôme

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, back to index

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-23 22:23 [PATCH v4 0/9] mmu notifier provide context informations jglisse
2019-01-23 22:23 ` [PATCH v4 1/9] mm/mmu_notifier: contextual information for event enums jglisse
2019-01-23 22:23 ` [PATCH v4 2/9] mm/mmu_notifier: contextual information for event triggering invalidation jglisse
2019-01-23 22:23 ` [PATCH v4 3/9] mm/mmu_notifier: use correct mmu_notifier events for each invalidation jglisse
2019-01-23 22:23 ` [PATCH v4 4/9] mm/mmu_notifier: pass down vma and reasons why mmu notifier is happening jglisse
2019-01-23 22:23 ` [PATCH v4 5/9] mm/mmu_notifier: mmu_notifier_range_update_to_read_only() helper jglisse
2019-01-23 22:23 ` [PATCH v4 6/9] gpu/drm/radeon: optimize out the case when a range is updated to read only jglisse
2019-01-23 22:23 ` [PATCH v4 7/9] gpu/drm/amdgpu: " jglisse
2019-01-23 22:23 ` [PATCH v4 8/9] gpu/drm/i915: " jglisse
2019-01-24 12:09   ` Joonas Lahtinen
2019-01-24 15:30     ` Jerome Glisse
     [not found]       ` <154877159986.4387.16328989441685542244@jlahtine-desk.ger.corp.intel.com>
2019-01-29 16:21         ` Jerome Glisse
2019-01-23 22:23 ` [PATCH v4 9/9] RDMA/umem_odp: " jglisse
2019-01-23 22:32   ` Jason Gunthorpe
2019-01-23 22:46     ` Jerome Glisse
2019-01-23 22:54 ` [PATCH v4 0/9] mmu notifier provide context informations Dan Williams
2019-01-23 23:04   ` Jerome Glisse
2019-01-24  0:00     ` Dan Williams
2019-01-31 16:10 ` Jerome Glisse
2019-01-31 19:55   ` Andrew Morton
2019-01-31 20:33     ` Jerome Glisse
2019-02-01 12:24   ` Christian König
2019-02-01 21:02   ` Jan Kara
2019-02-11 18:54     ` Jerome Glisse

Linux-Fsdevel Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-fsdevel/0 linux-fsdevel/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-fsdevel linux-fsdevel/ https://lore.kernel.org/linux-fsdevel \
		linux-fsdevel@vger.kernel.org linux-fsdevel@archiver.kernel.org
	public-inbox-index linux-fsdevel


Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-fsdevel


AGPL code for this site: git clone https://public-inbox.org/ public-inbox