linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 0/8] Create 'old' ptes for faultaround mappings on arm64 with hardware access flag
@ 2021-01-14 17:59 Will Deacon
  2021-01-14 17:59 ` [PATCH v3 1/8] mm: Cleanup faultaround and finish_fault() codepaths Will Deacon
                   ` (7 more replies)
  0 siblings, 8 replies; 22+ messages in thread
From: Will Deacon @ 2021-01-14 17:59 UTC (permalink / raw)
  To: linux-kernel
  Cc: kernel-team, Jan Kara, Minchan Kim, Catalin Marinas,
	Linus Torvalds, Hugh Dickins, linux-mm, Vinayak Menon,
	Kirill A . Shutemov, Andrew Morton, Will Deacon,
	linux-arm-kernel

Hi again folks,

This is the third version of the patches I previously posted here:

  v1: https://lore.kernel.org/r/20201209163950.8494-1-will@kernel.org
  v2: https://lore.kernel.org/r/20210108171517.5290-1-will@kernel.org

The patches allow architectures to opt-in at runtime for faultaround
mappings to be created as 'old' instead of 'young'. Although there have
been previous attempts at this, they failed either because the decision
was deferred to userspace [1] or because it was done unconditionally and
shown to regress benchmarks for particular architectures [2].

Minor changes since v2 include:

  * Update commit messages
  * Remove repeated word 'from from' in a comment
  * Restore 'vmf->flags' in filemap_map_pages()

The major additions are in the five RFC patches at the end of the
series, which attempt to implement a suggestion from Linus to split up
'struct vm_fault', clearly separating the mutable and immutable fields
in the data structure. I used Coccinelle to do most of the mechanical
work, but I also ran into some tricky problems along the way:

1. 'vmf->flags' is modified on the '->page_mkwrite()' path so I couldn't
   find a satisfactory way to move it to the new const structure. I toyed
   with getting rid of FAULT_FLAG_[MK]WRITE completely and just tracking
   these as bools, but there's also a weird piece of code in
   vmw_bo_vm_mkwrite() which modifies FAULT_FLAG_ALLOW_RETRY, so I gave
   up and left the 'flags' field alone.

2. I had to perform terrifying surgery on __collapse_huge_page_swapin()
   and, in doing so, I'm a bit wary about the initialisation of 'pgoff',
   as it isn't updated along with the address (this matches the old code).

3. vmf_insert_pfn_pmd() and friends take both a 'struct vm_fault' _and_
   a 'bool write'. I have left them alone, but that FAULT_FLAG_WRITE is
   causing trouble again.

4. Turns out 'struct vm_fault' is popular, so the diffstat is bloody
   massive.

Anyway, be good to hear any thoughts on this lot, particular with regards
to my comments above. I've also pushed the series here:

  https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/log/?h=faultaround

Cheers

Will

[1] https://www.spinics.net/lists/linux-mm/msg143831.html
[2] 315d09bf30c2 ("Revert "mm: make faultaround produce old ptes"")

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Vinayak Menon <vinmenon@codeaurora.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: <kernel-team@android.com>

--->8

Kirill A. Shutemov (1):
  mm: Cleanup faultaround and finish_fault() codepaths

Will Deacon (7):
  mm: Allow architectures to request 'old' entries when prefaulting
  arm64: mm: Implement arch_wants_old_prefaulted_pte()
  mm: Separate fault info out of 'struct vm_fault'
  mm: Pass 'address' to map to do_set_pte() and drop FAULT_FLAG_PREFAULT
  mm: Avoid modifying vmf.info.address in __collapse_huge_page_swapin()
  mm: Use static initialisers for 'info' field of 'struct vm_fault'
  mm: Mark 'info' field of 'struct vm_fault' as 'const'

 arch/arm64/include/asm/pgtable.h           |  12 +-
 arch/arm64/kernel/vdso.c                   |   4 +-
 arch/powerpc/kvm/book3s_64_vio.c           |   6 +-
 arch/powerpc/kvm/book3s_hv_uvmem.c         |   4 +-
 arch/powerpc/kvm/book3s_xive_native.c      |  13 +-
 arch/powerpc/platforms/cell/spufs/file.c   |  16 +-
 arch/s390/kernel/vdso.c                    |   4 +-
 arch/s390/kvm/kvm-s390.c                   |   2 +-
 arch/x86/entry/vdso/vma.c                  |  22 +-
 arch/x86/kernel/cpu/sgx/encl.c             |   4 +-
 drivers/char/agp/alpha-agp.c               |   2 +-
 drivers/char/mspec.c                       |   6 +-
 drivers/dax/device.c                       |  37 +-
 drivers/dma-buf/heaps/cma_heap.c           |   6 +-
 drivers/dma-buf/udmabuf.c                  |   4 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c    |   4 +-
 drivers/gpu/drm/armada/armada_gem.c        |   6 +-
 drivers/gpu/drm/drm_gem_shmem_helper.c     |   8 +-
 drivers/gpu/drm/drm_vm.c                   |  18 +-
 drivers/gpu/drm/etnaviv/etnaviv_gem.c      |  10 +-
 drivers/gpu/drm/gma500/framebuffer.c       |   4 +-
 drivers/gpu/drm/gma500/gem.c               |   8 +-
 drivers/gpu/drm/i915/gem/i915_gem_mman.c   |   8 +-
 drivers/gpu/drm/msm/msm_gem.c              |  11 +-
 drivers/gpu/drm/nouveau/nouveau_dmem.c     |   8 +-
 drivers/gpu/drm/nouveau/nouveau_ttm.c      |   2 +-
 drivers/gpu/drm/omapdrm/omap_gem.c         |  20 +-
 drivers/gpu/drm/radeon/radeon_ttm.c        |   4 +-
 drivers/gpu/drm/tegra/gem.c                |   6 +-
 drivers/gpu/drm/ttm/ttm_bo_vm.c            |  10 +-
 drivers/gpu/drm/vc4/vc4_bo.c               |   2 +-
 drivers/gpu/drm/vgem/vgem_drv.c            |   6 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_page_dirty.c |  12 +-
 drivers/hsi/clients/cmt_speech.c           |   2 +-
 drivers/hwtracing/intel_th/msu.c           |   8 +-
 drivers/infiniband/core/uverbs_main.c      |  10 +-
 drivers/infiniband/hw/hfi1/file_ops.c      |   2 +-
 drivers/infiniband/hw/qib/qib_file_ops.c   |   2 +-
 drivers/media/v4l2-core/videobuf-dma-sg.c  |   6 +-
 drivers/misc/cxl/context.c                 |   9 +-
 drivers/misc/ocxl/context.c                |  10 +-
 drivers/misc/ocxl/sysfs.c                  |   8 +-
 drivers/misc/sgi-gru/grumain.c             |   4 +-
 drivers/scsi/cxlflash/ocxl_hw.c            |   6 +-
 drivers/scsi/cxlflash/superpipe.c          |   2 +-
 drivers/scsi/sg.c                          |   4 +-
 drivers/target/target_core_user.c          |   6 +-
 drivers/uio/uio.c                          |   6 +-
 drivers/usb/mon/mon_bin.c                  |   4 +-
 drivers/vfio/pci/vfio_pci.c                |   2 +-
 drivers/vfio/pci/vfio_pci_nvlink2.c        |   8 +-
 drivers/vhost/vdpa.c                       |   6 +-
 drivers/video/fbdev/core/fb_defio.c        |  14 +-
 drivers/xen/privcmd-buf.c                  |   5 +-
 drivers/xen/privcmd.c                      |   4 +-
 fs/9p/vfs_file.c                           |   2 +-
 fs/afs/write.c                             |   2 +-
 fs/btrfs/inode.c                           |   4 +-
 fs/ceph/addr.c                             |   6 +-
 fs/dax.c                                   |  53 +--
 fs/ext2/file.c                             |   6 +-
 fs/ext4/file.c                             |   6 +-
 fs/ext4/inode.c                            |   4 +-
 fs/f2fs/file.c                             |   8 +-
 fs/fuse/dax.c                              |   2 +-
 fs/fuse/file.c                             |   4 +-
 fs/gfs2/file.c                             |   8 +-
 fs/iomap/buffered-io.c                     |   2 +-
 fs/kernfs/file.c                           |   4 +-
 fs/nfs/file.c                              |   2 +-
 fs/nilfs2/file.c                           |   2 +-
 fs/ocfs2/mmap.c                            |   8 +-
 fs/orangefs/file.c                         |   2 +-
 fs/orangefs/inode.c                        |   4 +-
 fs/proc/vmcore.c                           |   4 +-
 fs/ubifs/file.c                            |   2 +-
 fs/userfaultfd.c                           |  17 +-
 fs/xfs/xfs_file.c                          |  18 +-
 fs/zonefs/super.c                          |   6 +-
 include/linux/huge_mm.h                    |   6 +-
 include/linux/mm.h                         |  21 +-
 include/linux/pgtable.h                    |  11 +
 include/trace/events/fs_dax.h              |  28 +-
 ipc/shm.c                                  |   2 +-
 kernel/events/core.c                       |  12 +-
 kernel/relay.c                             |   4 +-
 lib/test_hmm.c                             |   4 +-
 mm/filemap.c                               | 208 +++++++---
 mm/huge_memory.c                           |  57 +--
 mm/hugetlb.c                               |   6 +-
 mm/internal.h                              |   4 +-
 mm/khugepaged.c                            |  39 +-
 mm/memory.c                                | 452 +++++++++------------
 mm/mmap.c                                  |   6 +-
 mm/shmem.c                                 |  16 +-
 mm/swap_state.c                            |  19 +-
 mm/swapfile.c                              |  13 +-
 samples/vfio-mdev/mbochs.c                 |  10 +-
 security/selinux/selinuxfs.c               |   4 +-
 sound/core/pcm_native.c                    |   8 +-
 sound/usb/usx2y/us122l.c                   |   4 +-
 sound/usb/usx2y/usX2Yhwdep.c               |   8 +-
 sound/usb/usx2y/usx2yhwdeppcm.c            |   4 +-
 virt/kvm/kvm_main.c                        |  12 +-
 104 files changed, 821 insertions(+), 730 deletions(-)

-- 
2.30.0.284.gd98b1dd5eaa7-goog


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v3 1/8] mm: Cleanup faultaround and finish_fault() codepaths
  2021-01-14 17:59 [PATCH v3 0/8] Create 'old' ptes for faultaround mappings on arm64 with hardware access flag Will Deacon
@ 2021-01-14 17:59 ` Will Deacon
  2021-02-09 20:24   ` Guenter Roeck
  2021-01-14 17:59 ` [PATCH v3 2/8] mm: Allow architectures to request 'old' entries when prefaulting Will Deacon
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 22+ messages in thread
From: Will Deacon @ 2021-01-14 17:59 UTC (permalink / raw)
  To: linux-kernel
  Cc: kernel-team, Jan Kara, Minchan Kim, Catalin Marinas,
	Linus Torvalds, Hugh Dickins, linux-mm, Vinayak Menon,
	Kirill A . Shutemov, Andrew Morton, Will Deacon,
	linux-arm-kernel

From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>

alloc_set_pte() has two users with different requirements: in the
faultaround code, it called from an atomic context and PTE page table
has to be preallocated. finish_fault() can sleep and allocate page table
as needed.

PTL locking rules are also strange, hard to follow and overkill for
finish_fault().

Let's untangle the mess. alloc_set_pte() has gone now. All locking is
explicit.

The price is some code duplication to handle huge pages in faultaround
path, but it should be fine, having overall improvement in readability.

Link: https://lore.kernel.org/r/20201229132819.najtavneutnf7ajp@box
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
[will: s/from from/from/ in comment; spotted by willy]
Signed-off-by: Will Deacon <will@kernel.org>
---
 fs/xfs/xfs_file.c       |   6 +-
 include/linux/mm.h      |  12 ++-
 include/linux/pgtable.h |  11 +++
 mm/filemap.c            | 177 ++++++++++++++++++++++++++---------
 mm/memory.c             | 199 ++++++++++++----------------------------
 5 files changed, 213 insertions(+), 192 deletions(-)

diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 5b0f93f73837..111fe73bb8a7 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -1319,17 +1319,19 @@ xfs_filemap_pfn_mkwrite(
 	return __xfs_filemap_fault(vmf, PE_SIZE_PTE, true);
 }
 
-static void
+static vm_fault_t
 xfs_filemap_map_pages(
 	struct vm_fault		*vmf,
 	pgoff_t			start_pgoff,
 	pgoff_t			end_pgoff)
 {
 	struct inode		*inode = file_inode(vmf->vma->vm_file);
+	vm_fault_t ret;
 
 	xfs_ilock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
-	filemap_map_pages(vmf, start_pgoff, end_pgoff);
+	ret = filemap_map_pages(vmf, start_pgoff, end_pgoff);
 	xfs_iunlock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
+	return ret;
 }
 
 static const struct vm_operations_struct xfs_file_vm_ops = {
diff --git a/include/linux/mm.h b/include/linux/mm.h
index ecdf8a8cd6ae..4572a9bc5862 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -542,8 +542,8 @@ struct vm_fault {
 					 * is not NULL, otherwise pmd.
 					 */
 	pgtable_t prealloc_pte;		/* Pre-allocated pte page table.
-					 * vm_ops->map_pages() calls
-					 * alloc_set_pte() from atomic context.
+					 * vm_ops->map_pages() sets up a page
+					 * table from atomic context.
 					 * do_fault_around() pre-allocates
 					 * page table to avoid allocation from
 					 * atomic context.
@@ -578,7 +578,7 @@ struct vm_operations_struct {
 	vm_fault_t (*fault)(struct vm_fault *vmf);
 	vm_fault_t (*huge_fault)(struct vm_fault *vmf,
 			enum page_entry_size pe_size);
-	void (*map_pages)(struct vm_fault *vmf,
+	vm_fault_t (*map_pages)(struct vm_fault *vmf,
 			pgoff_t start_pgoff, pgoff_t end_pgoff);
 	unsigned long (*pagesize)(struct vm_area_struct * area);
 
@@ -988,7 +988,9 @@ static inline pte_t maybe_mkwrite(pte_t pte, struct vm_area_struct *vma)
 	return pte;
 }
 
-vm_fault_t alloc_set_pte(struct vm_fault *vmf, struct page *page);
+vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page);
+void do_set_pte(struct vm_fault *vmf, struct page *page);
+
 vm_fault_t finish_fault(struct vm_fault *vmf);
 vm_fault_t finish_mkwrite_fault(struct vm_fault *vmf);
 #endif
@@ -2622,7 +2624,7 @@ extern void truncate_inode_pages_final(struct address_space *);
 
 /* generic vm_area_ops exported for stackable file systems */
 extern vm_fault_t filemap_fault(struct vm_fault *vmf);
-extern void filemap_map_pages(struct vm_fault *vmf,
+extern vm_fault_t filemap_map_pages(struct vm_fault *vmf,
 		pgoff_t start_pgoff, pgoff_t end_pgoff);
 extern vm_fault_t filemap_page_mkwrite(struct vm_fault *vmf);
 
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 8fcdfa52eb4b..36eb748f3c97 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -1314,6 +1314,17 @@ static inline int pmd_trans_unstable(pmd_t *pmd)
 #endif
 }
 
+/*
+ * the ordering of these checks is important for pmds with _page_devmap set.
+ * if we check pmd_trans_unstable() first we will trip the bad_pmd() check
+ * inside of pmd_none_or_trans_huge_or_clear_bad(). this will end up correctly
+ * returning 1 but not before it spams dmesg with the pmd_clear_bad() output.
+ */
+static inline int pmd_devmap_trans_unstable(pmd_t *pmd)
+{
+	return pmd_devmap(*pmd) || pmd_trans_unstable(pmd);
+}
+
 #ifndef CONFIG_NUMA_BALANCING
 /*
  * Technically a PTE can be PROTNONE even when not doing NUMA balancing but
diff --git a/mm/filemap.c b/mm/filemap.c
index 5c9d564317a5..c1f2dc89b8a7 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -42,6 +42,7 @@
 #include <linux/psi.h>
 #include <linux/ramfs.h>
 #include <linux/page_idle.h>
+#include <asm/pgalloc.h>
 #include "internal.h"
 
 #define CREATE_TRACE_POINTS
@@ -2911,74 +2912,164 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
 }
 EXPORT_SYMBOL(filemap_fault);
 
-void filemap_map_pages(struct vm_fault *vmf,
-		pgoff_t start_pgoff, pgoff_t end_pgoff)
+static bool filemap_map_pmd(struct vm_fault *vmf, struct page *page)
 {
-	struct file *file = vmf->vma->vm_file;
+	struct mm_struct *mm = vmf->vma->vm_mm;
+
+	/* Huge page is mapped? No need to proceed. */
+	if (pmd_trans_huge(*vmf->pmd)) {
+		unlock_page(page);
+		put_page(page);
+		return true;
+	}
+
+	if (pmd_none(*vmf->pmd) && PageTransHuge(page)) {
+	    vm_fault_t ret = do_set_pmd(vmf, page);
+	    if (!ret) {
+		    /* The page is mapped successfully, reference consumed. */
+		    unlock_page(page);
+		    return true;
+	    }
+	}
+
+	if (pmd_none(*vmf->pmd)) {
+		vmf->ptl = pmd_lock(mm, vmf->pmd);
+		if (likely(pmd_none(*vmf->pmd))) {
+			mm_inc_nr_ptes(mm);
+			pmd_populate(mm, vmf->pmd, vmf->prealloc_pte);
+			vmf->prealloc_pte = NULL;
+		}
+		spin_unlock(vmf->ptl);
+	}
+
+	/* See comment in handle_pte_fault() */
+	if (pmd_devmap_trans_unstable(vmf->pmd)) {
+		unlock_page(page);
+		put_page(page);
+		return true;
+	}
+
+	return false;
+}
+
+static struct page *next_uptodate_page(struct page *page,
+				       struct address_space *mapping,
+				       struct xa_state *xas, pgoff_t end_pgoff)
+{
+	unsigned long max_idx;
+
+	do {
+		if (!page)
+			return NULL;
+		if (xas_retry(xas, page))
+			continue;
+		if (xa_is_value(page))
+			continue;
+		if (PageLocked(page))
+			continue;
+		if (!page_cache_get_speculative(page))
+			continue;
+		/* Has the page moved or been split? */
+		if (unlikely(page != xas_reload(xas)))
+			goto skip;
+		if (!PageUptodate(page) || PageReadahead(page))
+			goto skip;
+		if (PageHWPoison(page))
+			goto skip;
+		if (!trylock_page(page))
+			goto skip;
+		if (page->mapping != mapping)
+			goto unlock;
+		if (!PageUptodate(page))
+			goto unlock;
+		max_idx = DIV_ROUND_UP(i_size_read(mapping->host), PAGE_SIZE);
+		if (xas->xa_index >= max_idx)
+			goto unlock;
+		return page;
+unlock:
+		unlock_page(page);
+skip:
+		put_page(page);
+	} while ((page = xas_next_entry(xas, end_pgoff)) != NULL);
+
+	return NULL;
+}
+
+static inline struct page *first_map_page(struct address_space *mapping,
+					  struct xa_state *xas,
+					  pgoff_t end_pgoff)
+{
+	return next_uptodate_page(xas_find(xas, end_pgoff),
+				  mapping, xas, end_pgoff);
+}
+
+static inline struct page *next_map_page(struct address_space *mapping,
+					 struct xa_state *xas,
+					 pgoff_t end_pgoff)
+{
+	return next_uptodate_page(xas_next_entry(xas, end_pgoff),
+				  mapping, xas, end_pgoff);
+}
+
+vm_fault_t filemap_map_pages(struct vm_fault *vmf,
+			     pgoff_t start_pgoff, pgoff_t end_pgoff)
+{
+	struct vm_area_struct *vma = vmf->vma;
+	struct file *file = vma->vm_file;
 	struct address_space *mapping = file->f_mapping;
 	pgoff_t last_pgoff = start_pgoff;
-	unsigned long max_idx;
+	unsigned long address = vmf->address;
 	XA_STATE(xas, &mapping->i_pages, start_pgoff);
 	struct page *head, *page;
 	unsigned int mmap_miss = READ_ONCE(file->f_ra.mmap_miss);
+	vm_fault_t ret = 0;
 
 	rcu_read_lock();
-	xas_for_each(&xas, head, end_pgoff) {
-		if (xas_retry(&xas, head))
-			continue;
-		if (xa_is_value(head))
-			goto next;
+	head = first_map_page(mapping, &xas, end_pgoff);
+	if (!head)
+		goto out;
 
-		/*
-		 * Check for a locked page first, as a speculative
-		 * reference may adversely influence page migration.
-		 */
-		if (PageLocked(head))
-			goto next;
-		if (!page_cache_get_speculative(head))
-			goto next;
+	if (filemap_map_pmd(vmf, head)) {
+		ret = VM_FAULT_NOPAGE;
+		goto out;
+	}
 
-		/* Has the page moved or been split? */
-		if (unlikely(head != xas_reload(&xas)))
-			goto skip;
+	vmf->address = vma->vm_start + ((start_pgoff - vma->vm_pgoff) << PAGE_SHIFT);
+	vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, vmf->address, &vmf->ptl);
+	do {
 		page = find_subpage(head, xas.xa_index);
-
-		if (!PageUptodate(head) ||
-				PageReadahead(page) ||
-				PageHWPoison(page))
-			goto skip;
-		if (!trylock_page(head))
-			goto skip;
-
-		if (head->mapping != mapping || !PageUptodate(head))
-			goto unlock;
-
-		max_idx = DIV_ROUND_UP(i_size_read(mapping->host), PAGE_SIZE);
-		if (xas.xa_index >= max_idx)
+		if (PageHWPoison(page))
 			goto unlock;
 
 		if (mmap_miss > 0)
 			mmap_miss--;
 
 		vmf->address += (xas.xa_index - last_pgoff) << PAGE_SHIFT;
-		if (vmf->pte)
-			vmf->pte += xas.xa_index - last_pgoff;
+		vmf->pte += xas.xa_index - last_pgoff;
 		last_pgoff = xas.xa_index;
-		if (alloc_set_pte(vmf, page))
+
+		if (!pte_none(*vmf->pte))
 			goto unlock;
+
+		do_set_pte(vmf, page);
+		/* no need to invalidate: a not-present page won't be cached */
+		update_mmu_cache(vma, vmf->address, vmf->pte);
 		unlock_page(head);
-		goto next;
+
+		/* The fault is handled */
+		if (vmf->address == address)
+			ret = VM_FAULT_NOPAGE;
+		continue;
 unlock:
 		unlock_page(head);
-skip:
 		put_page(head);
-next:
-		/* Huge page is mapped? No need to proceed. */
-		if (pmd_trans_huge(*vmf->pmd))
-			break;
-	}
+	} while ((head = next_map_page(mapping, &xas, end_pgoff)) != NULL);
+	pte_unmap_unlock(vmf->pte, vmf->ptl);
+out:
 	rcu_read_unlock();
+	vmf->address = address;
 	WRITE_ONCE(file->f_ra.mmap_miss, mmap_miss);
+	return ret;
 }
 EXPORT_SYMBOL(filemap_map_pages);
 
diff --git a/mm/memory.c b/mm/memory.c
index feff48e1465a..3e2fc2950ad7 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3503,7 +3503,7 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
 	if (pte_alloc(vma->vm_mm, vmf->pmd))
 		return VM_FAULT_OOM;
 
-	/* See the comment in pte_alloc_one_map() */
+	/* See comment in handle_pte_fault() */
 	if (unlikely(pmd_trans_unstable(vmf->pmd)))
 		return 0;
 
@@ -3643,66 +3643,6 @@ static vm_fault_t __do_fault(struct vm_fault *vmf)
 	return ret;
 }
 
-/*
- * The ordering of these checks is important for pmds with _PAGE_DEVMAP set.
- * If we check pmd_trans_unstable() first we will trip the bad_pmd() check
- * inside of pmd_none_or_trans_huge_or_clear_bad(). This will end up correctly
- * returning 1 but not before it spams dmesg with the pmd_clear_bad() output.
- */
-static int pmd_devmap_trans_unstable(pmd_t *pmd)
-{
-	return pmd_devmap(*pmd) || pmd_trans_unstable(pmd);
-}
-
-static vm_fault_t pte_alloc_one_map(struct vm_fault *vmf)
-{
-	struct vm_area_struct *vma = vmf->vma;
-
-	if (!pmd_none(*vmf->pmd))
-		goto map_pte;
-	if (vmf->prealloc_pte) {
-		vmf->ptl = pmd_lock(vma->vm_mm, vmf->pmd);
-		if (unlikely(!pmd_none(*vmf->pmd))) {
-			spin_unlock(vmf->ptl);
-			goto map_pte;
-		}
-
-		mm_inc_nr_ptes(vma->vm_mm);
-		pmd_populate(vma->vm_mm, vmf->pmd, vmf->prealloc_pte);
-		spin_unlock(vmf->ptl);
-		vmf->prealloc_pte = NULL;
-	} else if (unlikely(pte_alloc(vma->vm_mm, vmf->pmd))) {
-		return VM_FAULT_OOM;
-	}
-map_pte:
-	/*
-	 * If a huge pmd materialized under us just retry later.  Use
-	 * pmd_trans_unstable() via pmd_devmap_trans_unstable() instead of
-	 * pmd_trans_huge() to ensure the pmd didn't become pmd_trans_huge
-	 * under us and then back to pmd_none, as a result of MADV_DONTNEED
-	 * running immediately after a huge pmd fault in a different thread of
-	 * this mm, in turn leading to a misleading pmd_trans_huge() retval.
-	 * All we have to ensure is that it is a regular pmd that we can walk
-	 * with pte_offset_map() and we can do that through an atomic read in
-	 * C, which is what pmd_trans_unstable() provides.
-	 */
-	if (pmd_devmap_trans_unstable(vmf->pmd))
-		return VM_FAULT_NOPAGE;
-
-	/*
-	 * At this point we know that our vmf->pmd points to a page of ptes
-	 * and it cannot become pmd_none(), pmd_devmap() or pmd_trans_huge()
-	 * for the duration of the fault.  If a racing MADV_DONTNEED runs and
-	 * we zap the ptes pointed to by our vmf->pmd, the vmf->ptl will still
-	 * be valid and we will re-check to make sure the vmf->pte isn't
-	 * pte_none() under vmf->ptl protection when we return to
-	 * alloc_set_pte().
-	 */
-	vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, vmf->address,
-			&vmf->ptl);
-	return 0;
-}
-
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 static void deposit_prealloc_pte(struct vm_fault *vmf)
 {
@@ -3717,7 +3657,7 @@ static void deposit_prealloc_pte(struct vm_fault *vmf)
 	vmf->prealloc_pte = NULL;
 }
 
-static vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
+vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
 {
 	struct vm_area_struct *vma = vmf->vma;
 	bool write = vmf->flags & FAULT_FLAG_WRITE;
@@ -3775,52 +3715,17 @@ static vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
 	return ret;
 }
 #else
-static vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
+vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
 {
-	BUILD_BUG();
-	return 0;
+	return VM_FAULT_FALLBACK;
 }
 #endif
 
-/**
- * alloc_set_pte - setup new PTE entry for given page and add reverse page
- * mapping. If needed, the function allocates page table or use pre-allocated.
- *
- * @vmf: fault environment
- * @page: page to map
- *
- * Caller must take care of unlocking vmf->ptl, if vmf->pte is non-NULL on
- * return.
- *
- * Target users are page handler itself and implementations of
- * vm_ops->map_pages.
- *
- * Return: %0 on success, %VM_FAULT_ code in case of error.
- */
-vm_fault_t alloc_set_pte(struct vm_fault *vmf, struct page *page)
+void do_set_pte(struct vm_fault *vmf, struct page *page)
 {
 	struct vm_area_struct *vma = vmf->vma;
 	bool write = vmf->flags & FAULT_FLAG_WRITE;
 	pte_t entry;
-	vm_fault_t ret;
-
-	if (pmd_none(*vmf->pmd) && PageTransCompound(page)) {
-		ret = do_set_pmd(vmf, page);
-		if (ret != VM_FAULT_FALLBACK)
-			return ret;
-	}
-
-	if (!vmf->pte) {
-		ret = pte_alloc_one_map(vmf);
-		if (ret)
-			return ret;
-	}
-
-	/* Re-check under ptl */
-	if (unlikely(!pte_none(*vmf->pte))) {
-		update_mmu_tlb(vma, vmf->address, vmf->pte);
-		return VM_FAULT_NOPAGE;
-	}
 
 	flush_icache_page(vma, page);
 	entry = mk_pte(page, vma->vm_page_prot);
@@ -3837,14 +3742,8 @@ vm_fault_t alloc_set_pte(struct vm_fault *vmf, struct page *page)
 		page_add_file_rmap(page, false);
 	}
 	set_pte_at(vma->vm_mm, vmf->address, vmf->pte, entry);
-
-	/* no need to invalidate: a not-present page won't be cached */
-	update_mmu_cache(vma, vmf->address, vmf->pte);
-
-	return 0;
 }
 
-
 /**
  * finish_fault - finish page fault once we have prepared the page to fault
  *
@@ -3862,12 +3761,12 @@ vm_fault_t alloc_set_pte(struct vm_fault *vmf, struct page *page)
  */
 vm_fault_t finish_fault(struct vm_fault *vmf)
 {
+	struct vm_area_struct *vma = vmf->vma;
 	struct page *page;
-	vm_fault_t ret = 0;
+	vm_fault_t ret;
 
 	/* Did we COW the page? */
-	if ((vmf->flags & FAULT_FLAG_WRITE) &&
-	    !(vmf->vma->vm_flags & VM_SHARED))
+	if ((vmf->flags & FAULT_FLAG_WRITE) && !(vma->vm_flags & VM_SHARED))
 		page = vmf->cow_page;
 	else
 		page = vmf->page;
@@ -3876,12 +3775,38 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
 	 * check even for read faults because we might have lost our CoWed
 	 * page
 	 */
-	if (!(vmf->vma->vm_flags & VM_SHARED))
-		ret = check_stable_address_space(vmf->vma->vm_mm);
-	if (!ret)
-		ret = alloc_set_pte(vmf, page);
-	if (vmf->pte)
-		pte_unmap_unlock(vmf->pte, vmf->ptl);
+	if (!(vma->vm_flags & VM_SHARED)) {
+		ret = check_stable_address_space(vma->vm_mm);
+		if (ret)
+			return ret;
+	}
+
+	if (pmd_none(*vmf->pmd)) {
+		if (PageTransCompound(page)) {
+			ret = do_set_pmd(vmf, page);
+			if (ret != VM_FAULT_FALLBACK)
+				return ret;
+		}
+
+		if (unlikely(pte_alloc(vma->vm_mm, vmf->pmd)))
+			return VM_FAULT_OOM;
+	}
+
+	/* See comment in handle_pte_fault() */
+	if (pmd_devmap_trans_unstable(vmf->pmd))
+		return 0;
+
+	vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
+				      vmf->address, &vmf->ptl);
+	ret = 0;
+	/* Re-check under ptl */
+	if (likely(pte_none(*vmf->pte)))
+		do_set_pte(vmf, page);
+	else
+		ret = VM_FAULT_NOPAGE;
+
+	update_mmu_tlb(vma, vmf->address, vmf->pte);
+	pte_unmap_unlock(vmf->pte, vmf->ptl);
 	return ret;
 }
 
@@ -3951,13 +3876,12 @@ static vm_fault_t do_fault_around(struct vm_fault *vmf)
 	pgoff_t start_pgoff = vmf->pgoff;
 	pgoff_t end_pgoff;
 	int off;
-	vm_fault_t ret = 0;
 
 	nr_pages = READ_ONCE(fault_around_bytes) >> PAGE_SHIFT;
 	mask = ~(nr_pages * PAGE_SIZE - 1) & PAGE_MASK;
 
-	vmf->address = max(address & mask, vmf->vma->vm_start);
-	off = ((address - vmf->address) >> PAGE_SHIFT) & (PTRS_PER_PTE - 1);
+	address = max(address & mask, vmf->vma->vm_start);
+	off = ((vmf->address - address) >> PAGE_SHIFT) & (PTRS_PER_PTE - 1);
 	start_pgoff -= off;
 
 	/*
@@ -3965,7 +3889,7 @@ static vm_fault_t do_fault_around(struct vm_fault *vmf)
 	 *  the vma or nr_pages from start_pgoff, depending what is nearest.
 	 */
 	end_pgoff = start_pgoff -
-		((vmf->address >> PAGE_SHIFT) & (PTRS_PER_PTE - 1)) +
+		((address >> PAGE_SHIFT) & (PTRS_PER_PTE - 1)) +
 		PTRS_PER_PTE - 1;
 	end_pgoff = min3(end_pgoff, vma_pages(vmf->vma) + vmf->vma->vm_pgoff - 1,
 			start_pgoff + nr_pages - 1);
@@ -3973,31 +3897,11 @@ static vm_fault_t do_fault_around(struct vm_fault *vmf)
 	if (pmd_none(*vmf->pmd)) {
 		vmf->prealloc_pte = pte_alloc_one(vmf->vma->vm_mm);
 		if (!vmf->prealloc_pte)
-			goto out;
+			return VM_FAULT_OOM;
 		smp_wmb(); /* See comment in __pte_alloc() */
 	}
 
-	vmf->vma->vm_ops->map_pages(vmf, start_pgoff, end_pgoff);
-
-	/* Huge page is mapped? Page fault is solved */
-	if (pmd_trans_huge(*vmf->pmd)) {
-		ret = VM_FAULT_NOPAGE;
-		goto out;
-	}
-
-	/* ->map_pages() haven't done anything useful. Cold page cache? */
-	if (!vmf->pte)
-		goto out;
-
-	/* check if the page fault is solved */
-	vmf->pte -= (vmf->address >> PAGE_SHIFT) - (address >> PAGE_SHIFT);
-	if (!pte_none(*vmf->pte))
-		ret = VM_FAULT_NOPAGE;
-	pte_unmap_unlock(vmf->pte, vmf->ptl);
-out:
-	vmf->address = address;
-	vmf->pte = NULL;
-	return ret;
+	return vmf->vma->vm_ops->map_pages(vmf, start_pgoff, end_pgoff);
 }
 
 static vm_fault_t do_read_fault(struct vm_fault *vmf)
@@ -4353,7 +4257,18 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
 		 */
 		vmf->pte = NULL;
 	} else {
-		/* See comment in pte_alloc_one_map() */
+		/*
+		 * If a huge pmd materialized under us just retry later.  Use
+		 * pmd_trans_unstable() via pmd_devmap_trans_unstable() instead
+		 * of pmd_trans_huge() to ensure the pmd didn't become
+		 * pmd_trans_huge under us and then back to pmd_none, as a
+		 * result of MADV_DONTNEED running immediately after a huge pmd
+		 * fault in a different thread of this mm, in turn leading to a
+		 * misleading pmd_trans_huge() retval. All we have to ensure is
+		 * that it is a regular pmd that we can walk with
+		 * pte_offset_map() and we can do that through an atomic read
+		 * in C, which is what pmd_trans_unstable() provides.
+		 */
 		if (pmd_devmap_trans_unstable(vmf->pmd))
 			return 0;
 		/*
-- 
2.30.0.284.gd98b1dd5eaa7-goog


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v3 2/8] mm: Allow architectures to request 'old' entries when prefaulting
  2021-01-14 17:59 [PATCH v3 0/8] Create 'old' ptes for faultaround mappings on arm64 with hardware access flag Will Deacon
  2021-01-14 17:59 ` [PATCH v3 1/8] mm: Cleanup faultaround and finish_fault() codepaths Will Deacon
@ 2021-01-14 17:59 ` Will Deacon
  2021-01-14 17:59 ` [PATCH v3 3/8] arm64: mm: Implement arch_wants_old_prefaulted_pte() Will Deacon
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 22+ messages in thread
From: Will Deacon @ 2021-01-14 17:59 UTC (permalink / raw)
  To: linux-kernel
  Cc: kernel-team, Jan Kara, Minchan Kim, Catalin Marinas,
	Linus Torvalds, Hugh Dickins, linux-mm, Vinayak Menon,
	Kirill A . Shutemov, Andrew Morton, Will Deacon,
	linux-arm-kernel

Commit 5c0a85fad949 ("mm: make faultaround produce old ptes") changed
the "faultaround" behaviour to initialise prefaulted PTEs as 'old',
since this avoids vmscan wrongly assuming that they are hot, despite
having never been explicitly accessed by userspace. The change has been
shown to benefit numerous arm64 micro-architectures (with hardware
access flag) running Android, where both application launch latency and
direct reclaim time are significantly reduced (by 10%+ and ~80%
respectively).

Unfortunately, commit 315d09bf30c2 ("Revert "mm: make faultaround
produce old ptes"") reverted the change due to it being identified as
the cause of a ~6% regression in unixbench on x86. Experiments on a
variety of recent arm64 micro-architectures indicate that unixbench is
not affected by the original commit, which appears to yield a 0-1%
performance improvement.

Since one size does not fit all for the initial state of prefaulted
PTEs, introduce arch_wants_old_prefaulted_pte(), which allows an
architecture to opt-in to 'old' prefaulted PTEs at runtime based on
whatever criteria it may have.

Cc: Jan Kara <jack@suse.cz>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Reported-by: Vinayak Menon <vinmenon@codeaurora.org>
Signed-off-by: Will Deacon <will@kernel.org>
---
 include/linux/mm.h |  5 ++++-
 mm/filemap.c       | 14 ++++++++++----
 mm/memory.c        | 20 +++++++++++++++++++-
 3 files changed, 33 insertions(+), 6 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 4572a9bc5862..251a2339befb 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -434,6 +434,7 @@ extern pgprot_t protection_map[16];
  * @FAULT_FLAG_REMOTE: The fault is not for current task/mm.
  * @FAULT_FLAG_INSTRUCTION: The fault was during an instruction fetch.
  * @FAULT_FLAG_INTERRUPTIBLE: The fault can be interrupted by non-fatal signals.
+ * @FAULT_FLAG_PREFAULT: Fault was a prefault.
  *
  * About @FAULT_FLAG_ALLOW_RETRY and @FAULT_FLAG_TRIED: we can specify
  * whether we would allow page faults to retry by specifying these two
@@ -464,6 +465,7 @@ extern pgprot_t protection_map[16];
 #define FAULT_FLAG_REMOTE			0x80
 #define FAULT_FLAG_INSTRUCTION  		0x100
 #define FAULT_FLAG_INTERRUPTIBLE		0x200
+#define FAULT_FLAG_PREFAULT			0x400
 
 /*
  * The default fault flags that should be used by most of the
@@ -501,7 +503,8 @@ static inline bool fault_flag_allow_retry_first(unsigned int flags)
 	{ FAULT_FLAG_USER,		"USER" }, \
 	{ FAULT_FLAG_REMOTE,		"REMOTE" }, \
 	{ FAULT_FLAG_INSTRUCTION,	"INSTRUCTION" }, \
-	{ FAULT_FLAG_INTERRUPTIBLE,	"INTERRUPTIBLE" }
+	{ FAULT_FLAG_INTERRUPTIBLE,	"INTERRUPTIBLE" }, \
+	{ FAULT_FLAG_PREFAULT,		"PREFAULT" }
 
 /*
  * vm_fault is filled by the pagefault handler and passed to the vma's
diff --git a/mm/filemap.c b/mm/filemap.c
index c1f2dc89b8a7..a6dc97906c8e 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3019,6 +3019,7 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf,
 	struct address_space *mapping = file->f_mapping;
 	pgoff_t last_pgoff = start_pgoff;
 	unsigned long address = vmf->address;
+	unsigned long flags = vmf->flags;
 	XA_STATE(xas, &mapping->i_pages, start_pgoff);
 	struct page *head, *page;
 	unsigned int mmap_miss = READ_ONCE(file->f_ra.mmap_miss);
@@ -3051,14 +3052,18 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf,
 		if (!pte_none(*vmf->pte))
 			goto unlock;
 
+		/* We're about to handle the fault */
+		if (vmf->address == address) {
+			vmf->flags &= ~FAULT_FLAG_PREFAULT;
+			ret = VM_FAULT_NOPAGE;
+		} else {
+			vmf->flags |= FAULT_FLAG_PREFAULT;
+		}
+
 		do_set_pte(vmf, page);
 		/* no need to invalidate: a not-present page won't be cached */
 		update_mmu_cache(vma, vmf->address, vmf->pte);
 		unlock_page(head);
-
-		/* The fault is handled */
-		if (vmf->address == address)
-			ret = VM_FAULT_NOPAGE;
 		continue;
 unlock:
 		unlock_page(head);
@@ -3067,6 +3072,7 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf,
 	pte_unmap_unlock(vmf->pte, vmf->ptl);
 out:
 	rcu_read_unlock();
+	vmf->flags = flags;
 	vmf->address = address;
 	WRITE_ONCE(file->f_ra.mmap_miss, mmap_miss);
 	return ret;
diff --git a/mm/memory.c b/mm/memory.c
index 3e2fc2950ad7..f0e7c589ca9d 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -134,6 +134,18 @@ static inline bool arch_faults_on_old_pte(void)
 }
 #endif
 
+#ifndef arch_wants_old_prefaulted_pte
+static inline bool arch_wants_old_prefaulted_pte(void)
+{
+	/*
+	 * Transitioning a PTE from 'old' to 'young' can be expensive on
+	 * some architectures, even if it's performed in hardware. By
+	 * default, "false" means prefaulted entries will be 'young'.
+	 */
+	return false;
+}
+#endif
+
 static int __init disable_randmaps(char *s)
 {
 	randomize_va_space = 0;
@@ -3725,11 +3737,17 @@ void do_set_pte(struct vm_fault *vmf, struct page *page)
 {
 	struct vm_area_struct *vma = vmf->vma;
 	bool write = vmf->flags & FAULT_FLAG_WRITE;
+	bool prefault = vmf->flags & FAULT_FLAG_PREFAULT;
 	pte_t entry;
 
 	flush_icache_page(vma, page);
 	entry = mk_pte(page, vma->vm_page_prot);
-	entry = pte_sw_mkyoung(entry);
+
+	if (prefault && arch_wants_old_prefaulted_pte())
+		entry = pte_mkold(entry);
+	else
+		entry = pte_sw_mkyoung(entry);
+
 	if (write)
 		entry = maybe_mkwrite(pte_mkdirty(entry), vma);
 	/* copy-on-write page */
-- 
2.30.0.284.gd98b1dd5eaa7-goog


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v3 3/8] arm64: mm: Implement arch_wants_old_prefaulted_pte()
  2021-01-14 17:59 [PATCH v3 0/8] Create 'old' ptes for faultaround mappings on arm64 with hardware access flag Will Deacon
  2021-01-14 17:59 ` [PATCH v3 1/8] mm: Cleanup faultaround and finish_fault() codepaths Will Deacon
  2021-01-14 17:59 ` [PATCH v3 2/8] mm: Allow architectures to request 'old' entries when prefaulting Will Deacon
@ 2021-01-14 17:59 ` Will Deacon
  2021-01-14 17:59 ` [RFC PATCH 5/8] mm: Pass 'address' to map to do_set_pte() and drop FAULT_FLAG_PREFAULT Will Deacon
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 22+ messages in thread
From: Will Deacon @ 2021-01-14 17:59 UTC (permalink / raw)
  To: linux-kernel
  Cc: kernel-team, Jan Kara, Minchan Kim, Catalin Marinas,
	Linus Torvalds, Hugh Dickins, linux-mm, Vinayak Menon,
	Kirill A . Shutemov, Andrew Morton, Will Deacon,
	linux-arm-kernel

On CPUs with hardware AF/DBM, initialising prefaulted PTEs as 'old'
improves vmscan behaviour and does not appear to introduce any overhead
elsewhere.

Implement arch_wants_old_prefaulted_pte() to return 'true' if we detect
hardware access flag support at runtime. This can be extended in future
based on MIDR matching if necessary.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/pgtable.h | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 501562793ce2..e17b96d0e4b5 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -980,7 +980,17 @@ static inline bool arch_faults_on_old_pte(void)
 
 	return !cpu_has_hw_af();
 }
-#define arch_faults_on_old_pte arch_faults_on_old_pte
+#define arch_faults_on_old_pte		arch_faults_on_old_pte
+
+/*
+ * Experimentally, it's cheap to set the access flag in hardware and we
+ * benefit from prefaulting mappings as 'old' to start with.
+ */
+static inline bool arch_wants_old_prefaulted_pte(void)
+{
+	return !arch_faults_on_old_pte();
+}
+#define arch_wants_old_prefaulted_pte	arch_wants_old_prefaulted_pte
 
 #endif /* !__ASSEMBLY__ */
 
-- 
2.30.0.284.gd98b1dd5eaa7-goog


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [RFC PATCH 5/8] mm: Pass 'address' to map to do_set_pte() and drop FAULT_FLAG_PREFAULT
  2021-01-14 17:59 [PATCH v3 0/8] Create 'old' ptes for faultaround mappings on arm64 with hardware access flag Will Deacon
                   ` (2 preceding siblings ...)
  2021-01-14 17:59 ` [PATCH v3 3/8] arm64: mm: Implement arch_wants_old_prefaulted_pte() Will Deacon
@ 2021-01-14 17:59 ` Will Deacon
  2021-01-14 18:17   ` Linus Torvalds
  2021-01-14 17:59 ` [RFC PATCH 6/8] mm: Avoid modifying vmf.info.address in __collapse_huge_page_swapin() Will Deacon
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 22+ messages in thread
From: Will Deacon @ 2021-01-14 17:59 UTC (permalink / raw)
  To: linux-kernel
  Cc: kernel-team, Jan Kara, Minchan Kim, Catalin Marinas,
	Linus Torvalds, Hugh Dickins, linux-mm, Vinayak Menon,
	Kirill A . Shutemov, Andrew Morton, Will Deacon,
	linux-arm-kernel

Rather than modifying the 'address' field of the 'struct vm_fault_info'
passed to do_set_pte(), leave that to identify the real faulting address
and pass in the virtual address to be mapped by the new pte as a
separate argument.

This makes FAULT_FLAG_PREFAULT redundant, as a prefault entry can be
identified simply by comparing the new address parameter with the
faulting address, so remove the redundant flag at the same time.

Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Will Deacon <will@kernel.org>
---
 include/linux/mm.h |  7 ++-----
 mm/filemap.c       | 22 +++++++---------------
 mm/memory.c        | 10 +++++-----
 3 files changed, 14 insertions(+), 25 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 15e8dd77b8ff..c24dd17b32c5 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -434,7 +434,6 @@ extern pgprot_t protection_map[16];
  * @FAULT_FLAG_REMOTE: The fault is not for current task/mm.
  * @FAULT_FLAG_INSTRUCTION: The fault was during an instruction fetch.
  * @FAULT_FLAG_INTERRUPTIBLE: The fault can be interrupted by non-fatal signals.
- * @FAULT_FLAG_PREFAULT: Fault was a prefault.
  *
  * About @FAULT_FLAG_ALLOW_RETRY and @FAULT_FLAG_TRIED: we can specify
  * whether we would allow page faults to retry by specifying these two
@@ -465,7 +464,6 @@ extern pgprot_t protection_map[16];
 #define FAULT_FLAG_REMOTE			0x80
 #define FAULT_FLAG_INSTRUCTION  		0x100
 #define FAULT_FLAG_INTERRUPTIBLE		0x200
-#define FAULT_FLAG_PREFAULT			0x400
 
 /*
  * The default fault flags that should be used by most of the
@@ -503,8 +501,7 @@ static inline bool fault_flag_allow_retry_first(unsigned int flags)
 	{ FAULT_FLAG_USER,		"USER" }, \
 	{ FAULT_FLAG_REMOTE,		"REMOTE" }, \
 	{ FAULT_FLAG_INSTRUCTION,	"INSTRUCTION" }, \
-	{ FAULT_FLAG_INTERRUPTIBLE,	"INTERRUPTIBLE" }, \
-	{ FAULT_FLAG_PREFAULT,		"PREFAULT" }
+	{ FAULT_FLAG_INTERRUPTIBLE,	"INTERRUPTIBLE" }
 
 /*
  * vm_fault is filled by the pagefault handler and passed to the vma's
@@ -997,7 +994,7 @@ static inline pte_t maybe_mkwrite(pte_t pte, struct vm_area_struct *vma)
 }
 
 vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page);
-void do_set_pte(struct vm_fault *vmf, struct page *page);
+void do_set_pte(struct vm_fault *vmf, struct page *page, unsigned long addr);
 
 vm_fault_t finish_fault(struct vm_fault *vmf);
 vm_fault_t finish_mkwrite_fault(struct vm_fault *vmf);
diff --git a/mm/filemap.c b/mm/filemap.c
index c192e103843f..46ca16bf1372 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3018,8 +3018,7 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf,
 	struct file *file = vma->vm_file;
 	struct address_space *mapping = file->f_mapping;
 	pgoff_t last_pgoff = start_pgoff;
-	unsigned long address = vmf->info.address;
-	unsigned long flags = vmf->flags;
+	unsigned long addr;
 	XA_STATE(xas, &mapping->i_pages, start_pgoff);
 	struct page *head, *page;
 	unsigned int mmap_miss = READ_ONCE(file->f_ra.mmap_miss);
@@ -3035,9 +3034,8 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf,
 		goto out;
 	}
 
-	vmf->info.address = vma->vm_start + ((start_pgoff - vma->vm_pgoff) << PAGE_SHIFT);
-	vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
-				       vmf->info.address, &vmf->ptl);
+	addr = vma->vm_start + ((start_pgoff - vma->vm_pgoff) << PAGE_SHIFT);
+	vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, addr, &vmf->ptl);
 	do {
 		page = find_subpage(head, xas.xa_index);
 		if (PageHWPoison(page))
@@ -3046,7 +3044,7 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf,
 		if (mmap_miss > 0)
 			mmap_miss--;
 
-		vmf->info.address += (xas.xa_index - last_pgoff) << PAGE_SHIFT;
+		addr += (xas.xa_index - last_pgoff) << PAGE_SHIFT;
 		vmf->pte += xas.xa_index - last_pgoff;
 		last_pgoff = xas.xa_index;
 
@@ -3054,16 +3052,12 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf,
 			goto unlock;
 
 		/* We're about to handle the fault */
-		if (vmf->info.address == address) {
-			vmf->flags &= ~FAULT_FLAG_PREFAULT;
+		if (vmf->info.address == addr)
 			ret = VM_FAULT_NOPAGE;
-		} else {
-			vmf->flags |= FAULT_FLAG_PREFAULT;
-		}
 
-		do_set_pte(vmf, page);
+		do_set_pte(vmf, page, addr);
 		/* no need to invalidate: a not-present page won't be cached */
-		update_mmu_cache(vma, vmf->info.address, vmf->pte);
+		update_mmu_cache(vma, addr, vmf->pte);
 		unlock_page(head);
 		continue;
 unlock:
@@ -3073,8 +3067,6 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf,
 	pte_unmap_unlock(vmf->pte, vmf->ptl);
 out:
 	rcu_read_unlock();
-	vmf->flags = flags;
-	vmf->info.address = address;
 	WRITE_ONCE(file->f_ra.mmap_miss, mmap_miss);
 	return ret;
 }
diff --git a/mm/memory.c b/mm/memory.c
index ffb7598046fc..f19184542974 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3741,11 +3741,11 @@ vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
 }
 #endif
 
-void do_set_pte(struct vm_fault *vmf, struct page *page)
+void do_set_pte(struct vm_fault *vmf, struct page *page, unsigned long addr)
 {
 	struct vm_area_struct *vma = vmf->info.vma;
 	bool write = vmf->flags & FAULT_FLAG_WRITE;
-	bool prefault = vmf->flags & FAULT_FLAG_PREFAULT;
+	bool prefault = vmf->info.address != addr;
 	pte_t entry;
 
 	flush_icache_page(vma, page);
@@ -3761,13 +3761,13 @@ void do_set_pte(struct vm_fault *vmf, struct page *page)
 	/* copy-on-write page */
 	if (write && !(vma->vm_flags & VM_SHARED)) {
 		inc_mm_counter_fast(vma->vm_mm, MM_ANONPAGES);
-		page_add_new_anon_rmap(page, vma, vmf->info.address, false);
+		page_add_new_anon_rmap(page, vma, addr, false);
 		lru_cache_add_inactive_or_unevictable(page, vma);
 	} else {
 		inc_mm_counter_fast(vma->vm_mm, mm_counter_file(page));
 		page_add_file_rmap(page, false);
 	}
-	set_pte_at(vma->vm_mm, vmf->info.address, vmf->pte, entry);
+	set_pte_at(vma->vm_mm, addr, vmf->pte, entry);
 }
 
 /**
@@ -3827,7 +3827,7 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
 	ret = 0;
 	/* Re-check under ptl */
 	if (likely(pte_none(*vmf->pte)))
-		do_set_pte(vmf, page);
+		do_set_pte(vmf, page, vmf->info.address);
 	else
 		ret = VM_FAULT_NOPAGE;
 
-- 
2.30.0.284.gd98b1dd5eaa7-goog


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [RFC PATCH 6/8] mm: Avoid modifying vmf.info.address in __collapse_huge_page_swapin()
  2021-01-14 17:59 [PATCH v3 0/8] Create 'old' ptes for faultaround mappings on arm64 with hardware access flag Will Deacon
                   ` (3 preceding siblings ...)
  2021-01-14 17:59 ` [RFC PATCH 5/8] mm: Pass 'address' to map to do_set_pte() and drop FAULT_FLAG_PREFAULT Will Deacon
@ 2021-01-14 17:59 ` Will Deacon
  2021-01-14 17:59 ` [RFC PATCH 7/8] mm: Use static initialisers for 'info' field of 'struct vm_fault' Will Deacon
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 22+ messages in thread
From: Will Deacon @ 2021-01-14 17:59 UTC (permalink / raw)
  To: linux-kernel
  Cc: kernel-team, Jan Kara, Minchan Kim, Catalin Marinas,
	Linus Torvalds, Hugh Dickins, linux-mm, Vinayak Menon,
	Kirill A . Shutemov, Andrew Morton, Will Deacon,
	linux-arm-kernel

In preparation for const-ifying the 'info' field of 'struct vm_fault',
rework __collapse_huge_page_swapin() to avoid continously updating
vmf.info.address and instead populate a new 'struct vm_fault' on the
stack for each page being processed.

Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Will Deacon <will@kernel.org>
---
 mm/khugepaged.c | 41 ++++++++++++++++++++---------------------
 1 file changed, 20 insertions(+), 21 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 4494c90075fb..86c51a5d92d2 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -991,40 +991,43 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
 
 static bool __collapse_huge_page_swapin(struct mm_struct *mm,
 					struct vm_area_struct *vma,
-					unsigned long address, pmd_t *pmd,
+					unsigned long haddr, pmd_t *pmd,
 					int referenced)
 {
 	int swapped_in = 0;
 	vm_fault_t ret = 0;
-	struct vm_fault vmf = {
-		.info = {
-			.vma = vma,
-			.address = address,
-			.pgoff = linear_page_index(vma, address),
-		},
-		.flags = FAULT_FLAG_ALLOW_RETRY,
-		.pmd = pmd,
-	};
-
-	vmf.pte = pte_offset_map(pmd, address);
-	for (; vmf.info.address < address + HPAGE_PMD_NR*PAGE_SIZE;
-			vmf.pte++, vmf.info.address += PAGE_SIZE) {
+	unsigned long address, end = haddr + (HPAGE_PMD_NR * PAGE_SIZE);
+
+	for (address = haddr; address < end; address += PAGE_SIZE) {
+		struct vm_fault vmf = {
+			.info = {
+				.vma = vma,
+				.address = address,
+				.pgoff = linear_page_index(vma, haddr),
+			},
+			.flags = FAULT_FLAG_ALLOW_RETRY,
+			.pmd = pmd,
+		};
+
+		vmf.pte = pte_offset_map(pmd, address);
 		vmf.orig_pte = *vmf.pte;
-		if (!is_swap_pte(vmf.orig_pte))
+		if (!is_swap_pte(vmf.orig_pte)) {
+			pte_unmap(vmf.pte);
 			continue;
+		}
 		swapped_in++;
 		ret = do_swap_page(&vmf);
 
 		/* do_swap_page returns VM_FAULT_RETRY with released mmap_lock */
 		if (ret & VM_FAULT_RETRY) {
 			mmap_read_lock(mm);
-			if (hugepage_vma_revalidate(mm, address, &vmf.info.vma)) {
+			if (hugepage_vma_revalidate(mm, haddr, &vma)) {
 				/* vma is no longer available, don't continue to swapin */
 				trace_mm_collapse_huge_page_swapin(mm, swapped_in, referenced, 0);
 				return false;
 			}
 			/* check if the pmd is still valid */
-			if (mm_find_pmd(mm, address) != pmd) {
+			if (mm_find_pmd(mm, haddr) != pmd) {
 				trace_mm_collapse_huge_page_swapin(mm, swapped_in, referenced, 0);
 				return false;
 			}
@@ -1033,11 +1036,7 @@ static bool __collapse_huge_page_swapin(struct mm_struct *mm,
 			trace_mm_collapse_huge_page_swapin(mm, swapped_in, referenced, 0);
 			return false;
 		}
-		/* pte is unmapped now, we need to map it */
-		vmf.pte = pte_offset_map(pmd, vmf.info.address);
 	}
-	vmf.pte--;
-	pte_unmap(vmf.pte);
 
 	/* Drain LRU add pagevec to remove extra pin on the swapped in pages */
 	if (swapped_in)
-- 
2.30.0.284.gd98b1dd5eaa7-goog


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [RFC PATCH 7/8] mm: Use static initialisers for 'info' field of 'struct vm_fault'
  2021-01-14 17:59 [PATCH v3 0/8] Create 'old' ptes for faultaround mappings on arm64 with hardware access flag Will Deacon
                   ` (4 preceding siblings ...)
  2021-01-14 17:59 ` [RFC PATCH 6/8] mm: Avoid modifying vmf.info.address in __collapse_huge_page_swapin() Will Deacon
@ 2021-01-14 17:59 ` Will Deacon
  2021-01-14 17:59 ` [RFC PATCH 8/8] mm: Mark 'info' field of 'struct vm_fault' as 'const' Will Deacon
       [not found] ` <20210114175934.13070-5-will@kernel.org>
  7 siblings, 0 replies; 22+ messages in thread
From: Will Deacon @ 2021-01-14 17:59 UTC (permalink / raw)
  To: linux-kernel
  Cc: kernel-team, Jan Kara, Minchan Kim, Catalin Marinas,
	Linus Torvalds, Hugh Dickins, linux-mm, Vinayak Menon,
	Kirill A . Shutemov, Andrew Morton, Will Deacon,
	linux-arm-kernel

In preparation for const-ifying the 'info' field of 'struct vm_fault',
ensure that it is initialised using static initialisers.

Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Will Deacon <will@kernel.org>
---
 mm/shmem.c    |  8 +++++---
 mm/swapfile.c | 13 ++++++++-----
 2 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 6831d662fe01..4429e488636e 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1520,11 +1520,13 @@ static struct page *shmem_swapin(swp_entry_t swap, gfp_t gfp,
 {
 	struct vm_area_struct pvma;
 	struct page *page;
-	struct vm_fault vmf;
+	struct vm_fault vmf = {
+		.info = {
+			.vma = &pvma,
+		},
+	};
 
 	shmem_pseudo_vma_init(&pvma, info, index);
-	vmf.info.vma = &pvma;
-	vmf.info.address = 0;
 	page = swap_cluster_readahead(swap, gfp, &vmf);
 	shmem_pseudo_vma_destroy(&pvma);
 
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 96ac0725feff..2a21bf3cfdbf 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1951,8 +1951,6 @@ static int unuse_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 	si = swap_info[type];
 	pte = pte_offset_map(pmd, addr);
 	do {
-		struct vm_fault vmf;
-
 		if (!is_swap_pte(*pte))
 			continue;
 
@@ -1968,9 +1966,14 @@ static int unuse_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 		swap_map = &si->swap_map[offset];
 		page = lookup_swap_cache(entry, vma, addr);
 		if (!page) {
-			vmf.info.vma = vma;
-			vmf.info.address = addr;
-			vmf.pmd = pmd;
+			struct vm_fault vmf = {
+				.info = {
+					.vma = vma,
+					.address = addr,
+				},
+				.pmd = pmd,
+			};
+
 			page = swapin_readahead(entry, GFP_HIGHUSER_MOVABLE,
 						&vmf);
 		}
-- 
2.30.0.284.gd98b1dd5eaa7-goog


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [RFC PATCH 8/8] mm: Mark 'info' field of 'struct vm_fault' as 'const'
  2021-01-14 17:59 [PATCH v3 0/8] Create 'old' ptes for faultaround mappings on arm64 with hardware access flag Will Deacon
                   ` (5 preceding siblings ...)
  2021-01-14 17:59 ` [RFC PATCH 7/8] mm: Use static initialisers for 'info' field of 'struct vm_fault' Will Deacon
@ 2021-01-14 17:59 ` Will Deacon
       [not found] ` <20210114175934.13070-5-will@kernel.org>
  7 siblings, 0 replies; 22+ messages in thread
From: Will Deacon @ 2021-01-14 17:59 UTC (permalink / raw)
  To: linux-kernel
  Cc: kernel-team, Jan Kara, Minchan Kim, Catalin Marinas,
	Linus Torvalds, Hugh Dickins, linux-mm, Vinayak Menon,
	Kirill A . Shutemov, Andrew Morton, Will Deacon,
	linux-arm-kernel

The field is only ever read after being initialised, so mark it 'const'
before somebody tries to modify it again.

Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Will Deacon <will@kernel.org>
---
 include/linux/mm.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index c24dd17b32c5..b45a8e075f7f 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -521,7 +521,7 @@ struct vm_fault_info {
 };
 
 struct vm_fault {
-	struct vm_fault_info info;
+	const struct vm_fault_info info;
 	unsigned int flags;		/* FAULT_FLAG_xxx flags
 					 * XXX: should be in vm_fault_info */
 	pmd_t *pmd;			/* Pointer to pmd entry matching
-- 
2.30.0.284.gd98b1dd5eaa7-goog


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 4/8] mm: Separate fault info out of 'struct vm_fault'
       [not found] ` <20210114175934.13070-5-will@kernel.org>
@ 2021-01-14 18:16   ` Linus Torvalds
  2021-01-14 19:00     ` Will Deacon
  0 siblings, 1 reply; 22+ messages in thread
From: Linus Torvalds @ 2021-01-14 18:16 UTC (permalink / raw)
  To: Will Deacon
  Cc: Jan Kara, Minchan Kim, Catalin Marinas, Hugh Dickins,
	Linux Kernel Mailing List, Linux-MM, Vinayak Menon,
	Kirill A . Shutemov, Andrew Morton, Android Kernel Team,
	Linux ARM

On Thu, Jan 14, 2021 at 10:01 AM Will Deacon <will@kernel.org> wrote:
>
> Try to clean this up by splitting the immutable fault information out
> into a new 'struct vm_fault_info' which is embedded in 'struct vm_fault'
> and will later be made 'const'. The vast majority of this change was
> performed with a coccinelle patch:

You may have a reason for doing it this way, but my reaction to this
was: "just make the new embedded struct unnamed".

Then you wouldn't need to do all the automated coccinelle changes.

Is there some reason you didn't do that, or just a "oh, I didn't think of it".

                Linus

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 5/8] mm: Pass 'address' to map to do_set_pte() and drop FAULT_FLAG_PREFAULT
  2021-01-14 17:59 ` [RFC PATCH 5/8] mm: Pass 'address' to map to do_set_pte() and drop FAULT_FLAG_PREFAULT Will Deacon
@ 2021-01-14 18:17   ` Linus Torvalds
  2021-01-15 10:24     ` Will Deacon
  0 siblings, 1 reply; 22+ messages in thread
From: Linus Torvalds @ 2021-01-14 18:17 UTC (permalink / raw)
  To: Will Deacon
  Cc: Jan Kara, Minchan Kim, Catalin Marinas, Hugh Dickins,
	Linux Kernel Mailing List, Linux-MM, Vinayak Menon,
	Kirill A . Shutemov, Andrew Morton, Android Kernel Team,
	Linux ARM

On Thu, Jan 14, 2021 at 10:01 AM Will Deacon <will@kernel.org> wrote:
>
> Rather than modifying the 'address' field of the 'struct vm_fault_info'
> passed to do_set_pte(), leave that to identify the real faulting address
> and pass in the virtual address to be mapped by the new pte as a
> separate argument.

Ack.

In fact - apart from the question I had about the 'info' sub-structure
- ack on the whole series. But this one struck me particularly as
"that's simpler and clearer" even if that finish_fault() case is now
not as pretty (but with an unnamed structure it would be slightly
simpler, at least).

             Linus

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 4/8] mm: Separate fault info out of 'struct vm_fault'
  2021-01-14 18:16   ` [RFC PATCH 4/8] mm: Separate fault info out of 'struct vm_fault' Linus Torvalds
@ 2021-01-14 19:00     ` Will Deacon
  2021-01-14 19:09       ` Linus Torvalds
  0 siblings, 1 reply; 22+ messages in thread
From: Will Deacon @ 2021-01-14 19:00 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jan Kara, Minchan Kim, Catalin Marinas, Hugh Dickins,
	Linux Kernel Mailing List, Linux-MM, Vinayak Menon,
	Kirill A . Shutemov, Andrew Morton, Android Kernel Team,
	Linux ARM

On Thu, Jan 14, 2021 at 10:16:13AM -0800, Linus Torvalds wrote:
> On Thu, Jan 14, 2021 at 10:01 AM Will Deacon <will@kernel.org> wrote:
> >
> > Try to clean this up by splitting the immutable fault information out
> > into a new 'struct vm_fault_info' which is embedded in 'struct vm_fault'
> > and will later be made 'const'. The vast majority of this change was
> > performed with a coccinelle patch:
> 
> You may have a reason for doing it this way, but my reaction to this
> was: "just make the new embedded struct unnamed".
> 
> Then you wouldn't need to do all the automated coccinelle changes.
> 
> Is there some reason you didn't do that, or just a "oh, I didn't think of
> it".

I tried that initially, e.g.

struct vm_fault {
	const struct {
		unsigned long address;
		...
	};
};

but I found that I had to make all of the members const to get it to work,
at which point the anonymous struct wasn't really adding anything. Did I
just botch the syntax?

Will

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 4/8] mm: Separate fault info out of 'struct vm_fault'
  2021-01-14 19:00     ` Will Deacon
@ 2021-01-14 19:09       ` Linus Torvalds
  2021-01-14 19:41         ` Will Deacon
  0 siblings, 1 reply; 22+ messages in thread
From: Linus Torvalds @ 2021-01-14 19:09 UTC (permalink / raw)
  To: Will Deacon
  Cc: Jan Kara, Minchan Kim, Catalin Marinas, Hugh Dickins,
	Linux Kernel Mailing List, Linux-MM, Vinayak Menon,
	Kirill A . Shutemov, Andrew Morton, Android Kernel Team,
	Linux ARM

On Thu, Jan 14, 2021 at 11:00 AM Will Deacon <will@kernel.org> wrote:
>
> I tried that initially, but I found that I had to make all of the
> members const to get it to work, at which point the anonymous struct
> wasn't really adding anything. Did I just botch the syntax?

I'm not sure what you tried. But this stupid test-case sure works for me:

    struct hello {
        const struct {
                unsigned long address;
        };
        unsigned int flags;
    };

    extern int fn(struct hello *);

    int test(void)
    {
        struct hello a = {
                .address = 1,
        };
        a.flags = 0;
        return fn(&a);
    }

and because "address" is in that unnamed constant struct, you can only
set it within that initializer, and cannot do

        a.address = 0;

without an error (the way you _can_ do "a.flags = 0").

I don't see naming the struct making a difference - apart from forcing
that big rename patch, of course.

But maybe we're talking about different issues?

            Linus

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 4/8] mm: Separate fault info out of 'struct vm_fault'
  2021-01-14 19:09       ` Linus Torvalds
@ 2021-01-14 19:41         ` Will Deacon
  2021-01-14 20:09           ` Nick Desaulniers
  2021-01-14 21:11           ` Linus Torvalds
  0 siblings, 2 replies; 22+ messages in thread
From: Will Deacon @ 2021-01-14 19:41 UTC (permalink / raw)
  To: Linus Torvalds, ndesaulniers
  Cc: Jan Kara, Minchan Kim, Catalin Marinas, Hugh Dickins,
	Linux Kernel Mailing List, Linux-MM, Vinayak Menon,
	Kirill A . Shutemov, Andrew Morton, Android Kernel Team,
	Linux ARM

On Thu, Jan 14, 2021 at 11:09:01AM -0800, Linus Torvalds wrote:
> On Thu, Jan 14, 2021 at 11:00 AM Will Deacon <will@kernel.org> wrote:
> >
> > I tried that initially, but I found that I had to make all of the
> > members const to get it to work, at which point the anonymous struct
> > wasn't really adding anything. Did I just botch the syntax?
> 
> I'm not sure what you tried. But this stupid test-case sure works for me:
> 
>     struct hello {
>         const struct {
>                 unsigned long address;
>         };
>         unsigned int flags;
>     };
> 
>     extern int fn(struct hello *);
> 
>     int test(void)
>     {
>         struct hello a = {
>                 .address = 1,
>         };
>         a.flags = 0;
>         return fn(&a);
>     }
> 
> and because "address" is in that unnamed constant struct, you can only
> set it within that initializer, and cannot do
> 
>         a.address = 0;
> 
> without an error (the way you _can_ do "a.flags = 0").
> 
> I don't see naming the struct making a difference - apart from forcing
> that big rename patch, of course.
> 
> But maybe we're talking about different issues?

Urgh...

We _are_ both on the same page, and your reply above had me thinking I've
lost the plot, so I went back to the start. Check out v5.11-rc3 and apply
this patch:


diff --git a/include/linux/mm.h b/include/linux/mm.h
index ecdf8a8cd6ae..1eb950865450 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -514,11 +514,14 @@ static inline bool fault_flag_allow_retry_first(unsigned int flags)
  * pgoff should be used in favour of virtual_address, if possible.
  */
 struct vm_fault {
-       struct vm_area_struct *vma;     /* Target VMA */
+       const struct {
+               struct vm_area_struct *vma;     /* Target VMA */
+               gfp_t gfp_mask;                 /* gfp mask to be used for allocations */
+               pgoff_t pgoff;                  /* Logical page offset based on vma */
+               unsigned long address;          /* Faulting virtual address */
+       };
+
        unsigned int flags;             /* FAULT_FLAG_xxx flags */
-       gfp_t gfp_mask;                 /* gfp mask to be used for allocations */
-       pgoff_t pgoff;                  /* Logical page offset based on vma */
-       unsigned long address;          /* Faulting virtual address */
        pmd_t *pmd;                     /* Pointer to pmd entry matching
                                         * the 'address' */
        pud_t *pud;                     /* Pointer to pud entry matching


Sure enough, an arm64 defconfig builds perfectly alright with that change,
but it really shouldn't. I'm using clang 11.0.5, so I had another go with
GCC 9.2.1 and bang:

mm/filemap.c: In function ‘filemap_map_pages’:
mm/filemap.c:2963:16: error: assignment of member ‘address’ in read-only object
 2963 |   vmf->address += (xas.xa_index - last_pgoff) << PAGE_SHIFT;
      |                ^~
make[1]: *** [scripts/Makefile.build:279: mm/filemap.o] Error 1
make[1]: *** Waiting for unfinished jobs....
make: *** [Makefile:1805: mm] Error 2
make: *** Waiting for unfinished jobs....

Nick -- any clue what's happening here? We would like that const anonymous
struct to behave like a const struct member, as the alternative (naming the
thing) results in a lot of refactoring churn.

Cheers,

Will

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 4/8] mm: Separate fault info out of 'struct vm_fault'
  2021-01-14 19:41         ` Will Deacon
@ 2021-01-14 20:09           ` Nick Desaulniers
  2021-01-14 21:11           ` Linus Torvalds
  1 sibling, 0 replies; 22+ messages in thread
From: Nick Desaulniers @ 2021-01-14 20:09 UTC (permalink / raw)
  To: Will Deacon
  Cc: Android Kernel Team, Jan Kara, Minchan Kim, Catalin Marinas,
	Hugh Dickins, Linux Kernel Mailing List, Linux-MM, Vinayak Menon,
	Kirill A . Shutemov, Andrew Morton, Linus Torvalds, Linux ARM

On Thu, Jan 14, 2021 at 11:41 AM Will Deacon <will@kernel.org> wrote:
>
> On Thu, Jan 14, 2021 at 11:09:01AM -0800, Linus Torvalds wrote:
> > On Thu, Jan 14, 2021 at 11:00 AM Will Deacon <will@kernel.org> wrote:
> > >
> > > I tried that initially, but I found that I had to make all of the
> > > members const to get it to work, at which point the anonymous struct
> > > wasn't really adding anything. Did I just botch the syntax?
> >
> > I'm not sure what you tried. But this stupid test-case sure works for me:
> >
> >     struct hello {
> >         const struct {
> >                 unsigned long address;
> >         };
> >         unsigned int flags;
> >     };
> >
> >     extern int fn(struct hello *);
> >
> >     int test(void)
> >     {
> >         struct hello a = {
> >                 .address = 1,
> >         };
> >         a.flags = 0;
> >         return fn(&a);
> >     }
> >
> > and because "address" is in that unnamed constant struct, you can only
> > set it within that initializer, and cannot do
> >
> >         a.address = 0;
> >
> > without an error (the way you _can_ do "a.flags = 0").
> >
> > I don't see naming the struct making a difference - apart from forcing
> > that big rename patch, of course.
> >
> > But maybe we're talking about different issues?
>
> Urgh...
>
> We _are_ both on the same page, and your reply above had me thinking I've
> lost the plot, so I went back to the start. Check out v5.11-rc3 and apply
> this patch:
>
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index ecdf8a8cd6ae..1eb950865450 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -514,11 +514,14 @@ static inline bool fault_flag_allow_retry_first(unsigned int flags)
>   * pgoff should be used in favour of virtual_address, if possible.
>   */
>  struct vm_fault {
> -       struct vm_area_struct *vma;     /* Target VMA */
> +       const struct {
> +               struct vm_area_struct *vma;     /* Target VMA */
> +               gfp_t gfp_mask;                 /* gfp mask to be used for allocations */
> +               pgoff_t pgoff;                  /* Logical page offset based on vma */
> +               unsigned long address;          /* Faulting virtual address */
> +       };
> +
>         unsigned int flags;             /* FAULT_FLAG_xxx flags */
> -       gfp_t gfp_mask;                 /* gfp mask to be used for allocations */
> -       pgoff_t pgoff;                  /* Logical page offset based on vma */
> -       unsigned long address;          /* Faulting virtual address */
>         pmd_t *pmd;                     /* Pointer to pmd entry matching
>                                          * the 'address' */
>         pud_t *pud;                     /* Pointer to pud entry matching
>
>
> Sure enough, an arm64 defconfig builds perfectly alright with that change,
> but it really shouldn't. I'm using clang 11.0.5, so I had another go with
> GCC 9.2.1 and bang:
>
> mm/filemap.c: In function ‘filemap_map_pages’:
> mm/filemap.c:2963:16: error: assignment of member ‘address’ in read-only object
>  2963 |   vmf->address += (xas.xa_index - last_pgoff) << PAGE_SHIFT;
>       |                ^~
> make[1]: *** [scripts/Makefile.build:279: mm/filemap.o] Error 1
> make[1]: *** Waiting for unfinished jobs....
> make: *** [Makefile:1805: mm] Error 2
> make: *** Waiting for unfinished jobs....
>
> Nick -- any clue what's happening here? We would like that const anonymous
> struct to behave like a const struct member, as the alternative (naming the
> thing) results in a lot of refactoring churn.

Weird, looks like a bug to me in Clang, filed
https://bugs.llvm.org/show_bug.cgi?id=48755.
-- 
Thanks,
~Nick Desaulniers

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 4/8] mm: Separate fault info out of 'struct vm_fault'
  2021-01-14 19:41         ` Will Deacon
  2021-01-14 20:09           ` Nick Desaulniers
@ 2021-01-14 21:11           ` Linus Torvalds
  2021-01-15  9:23             ` Will Deacon
  1 sibling, 1 reply; 22+ messages in thread
From: Linus Torvalds @ 2021-01-14 21:11 UTC (permalink / raw)
  To: Will Deacon
  Cc: Jan Kara, Minchan Kim, Catalin Marinas, Nick Desaulniers,
	Linux Kernel Mailing List, Linux-MM, Vinayak Menon, Hugh Dickins,
	Kirill A . Shutemov, Andrew Morton, Android Kernel Team,
	Linux ARM

On Thu, Jan 14, 2021 at 11:41 AM Will Deacon <will@kernel.org> wrote:
>
> Sure enough, an arm64 defconfig builds perfectly alright with that change,
> but it really shouldn't. I'm using clang 11.0.5, so I had another go with
> GCC 9.2.1 and bang:

Ok, looks like a clang bug, but a reasonably benign one.

As long as we have sufficient coverage with gcc, we'll get error
reporting in a timely manner for any new incorrect assignments, so I
think we can do that constant anonymous struct even if it does mean
that clang might let some bad cases through (I personally use gcc for
build testing, and then clang for building my boot kernels, so I'd
catch anything x86-64 allmodconfig in my build tests).

And keeping it unnamed it would avoid a lot of noisy churn..

             Linus

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 4/8] mm: Separate fault info out of 'struct vm_fault'
  2021-01-14 21:11           ` Linus Torvalds
@ 2021-01-15  9:23             ` Will Deacon
  2021-01-15 21:32               ` Linus Torvalds
  0 siblings, 1 reply; 22+ messages in thread
From: Will Deacon @ 2021-01-15  9:23 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jan Kara, Minchan Kim, Catalin Marinas, Nick Desaulniers,
	Linux Kernel Mailing List, Linux-MM, Vinayak Menon, Hugh Dickins,
	Kirill A . Shutemov, Andrew Morton, Android Kernel Team,
	Linux ARM

On Thu, Jan 14, 2021 at 01:11:12PM -0800, Linus Torvalds wrote:
> On Thu, Jan 14, 2021 at 11:41 AM Will Deacon <will@kernel.org> wrote:
> >
> > Sure enough, an arm64 defconfig builds perfectly alright with that change,
> > but it really shouldn't. I'm using clang 11.0.5, so I had another go with
> > GCC 9.2.1 and bang:
> 
> Ok, looks like a clang bug, but a reasonably benign one.
> 
> As long as we have sufficient coverage with gcc, we'll get error
> reporting in a timely manner for any new incorrect assignments, so I
> think we can do that constant anonymous struct even if it does mean
> that clang might let some bad cases through (I personally use gcc for
> build testing, and then clang for building my boot kernels, so I'd
> catch anything x86-64 allmodconfig in my build tests).
> 
> And keeping it unnamed it would avoid a lot of noisy churn..

Hmm. The feedback on the clang bug suggests that GCC is the one in the
wrong here (although the argument is based on C11 and I haven't trawled
through the standards to see how this has evolved):

https://bugs.llvm.org/show_bug.cgi?id=48755#c1

There is at least some sympathy to generating a warning, so that might
be good enough. Otherwise, I suppose we can explicitly mark the fields
as 'const' but I won't jump to that immediately.

Will

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 5/8] mm: Pass 'address' to map to do_set_pte() and drop FAULT_FLAG_PREFAULT
  2021-01-14 18:17   ` Linus Torvalds
@ 2021-01-15 10:24     ` Will Deacon
  0 siblings, 0 replies; 22+ messages in thread
From: Will Deacon @ 2021-01-15 10:24 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jan Kara, Minchan Kim, Catalin Marinas, Hugh Dickins,
	Linux Kernel Mailing List, Linux-MM, Vinayak Menon,
	Kirill A . Shutemov, Andrew Morton, Android Kernel Team,
	Linux ARM

On Thu, Jan 14, 2021 at 10:17:22AM -0800, Linus Torvalds wrote:
> On Thu, Jan 14, 2021 at 10:01 AM Will Deacon <will@kernel.org> wrote:
> >
> > Rather than modifying the 'address' field of the 'struct vm_fault_info'
> > passed to do_set_pte(), leave that to identify the real faulting address
> > and pass in the virtual address to be mapped by the new pte as a
> > separate argument.
> 
> Ack.
> 
> In fact - apart from the question I had about the 'info' sub-structure
> - ack on the whole series. But this one struck me particularly as
> "that's simpler and clearer" even if that finish_fault() case is now
> not as pretty (but with an unnamed structure it would be slightly
> simpler, at least).

Thanks. I'll post another spin next week to avoid the named structure, but
if you (or anybody else) get a chance to glance over some of the other
things I mentioned in the cover letter beforehand that would be really
great.

Cheers,

Will

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 4/8] mm: Separate fault info out of 'struct vm_fault'
  2021-01-15  9:23             ` Will Deacon
@ 2021-01-15 21:32               ` Linus Torvalds
  2021-01-20  0:00                 ` Nick Desaulniers
  0 siblings, 1 reply; 22+ messages in thread
From: Linus Torvalds @ 2021-01-15 21:32 UTC (permalink / raw)
  To: Will Deacon
  Cc: Jan Kara, Minchan Kim, Catalin Marinas, Nick Desaulniers,
	Linux Kernel Mailing List, Linux-MM, Vinayak Menon, Hugh Dickins,
	Kirill A . Shutemov, Andrew Morton, Android Kernel Team,
	Linux ARM

On Fri, Jan 15, 2021 at 1:23 AM Will Deacon <will@kernel.org> wrote:
>
> Hmm. The feedback on the clang bug suggests that GCC is the one in the
> wrong here (although the argument is based on C11 and I haven't trawled
> through the standards to see how this has evolved):

Oh well.

That writing is absolutely the _worst_ kind of weaselwording standards
language reading, trying to make excuses for bad behavior by basically
depending on "this language is unclear", and trying to say that the
buggy behavior is required by C11.

What a disappointment.

Absolutely nothing in the quoted C11 language says to me what that bug
entry claims it says.

The argument seems to hinge on

   "The members of an anonymous structure or union are considered to
be members of the containing structure or union"

and then it makes the completely uncalled-for leap that that means
that because it was "int" in the const struct, it must be "int" in the
containing structure too.

Which is complete BS, and doesn't follow logically _or_ grammatically.
It would be a "member of the containing structure" even with the
"const" qualifier, so the argument they make is just inane.

In fact, the _other_ sentence they quote clearly points to this being
just a clang bug:

 "A modifiable lvalue is [...] if it is a structure or union, does not
have any member (including, recursively, any member or element of all
contained aggregates or unions) with a const- qualified type"

and clearly this recursively is an element with a const-qualified
recursive struct.

Whatever. It's one of those "read the documentation squint-eyed to
avoid doing the right thing" arguments.

It's not worth arguing with people like that, and let's just ignore
the clang bug here.

            Linus

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 4/8] mm: Separate fault info out of 'struct vm_fault'
  2021-01-15 21:32               ` Linus Torvalds
@ 2021-01-20  0:00                 ` Nick Desaulniers
  0 siblings, 0 replies; 22+ messages in thread
From: Nick Desaulniers @ 2021-01-20  0:00 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Android Kernel Team, Jan Kara, Minchan Kim, Catalin Marinas,
	Hugh Dickins, Linux Kernel Mailing List, Linux-MM, Vinayak Menon,
	Kirill A . Shutemov, Andrew Morton, Will Deacon, Linux ARM

On Fri, Jan 15, 2021 at 1:33 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Fri, Jan 15, 2021 at 1:23 AM Will Deacon <will@kernel.org> wrote:
> >
> > Hmm. The feedback on the clang bug suggests that GCC is the one in the
> > wrong here (although the argument is based on C11 and I haven't trawled
> > through the standards to see how this has evolved):
>
> Oh well.
>
> That writing is absolutely the _worst_ kind of weaselwording standards
> language reading, trying to make excuses for bad behavior by basically
> depending on "this language is unclear", and trying to say that the
> buggy behavior is required by C11.
>
> What a disappointment.

I don't really understand British humor either, but I assume that's
how the language lawyers throw shade on one anothers' standards.
Richard is both the WG21 spec editor (C++) and British, IIRC.
Apparently, there's a long conversion (behind closed doors; it's the
ISO way) going on in regards to the thread Richard has kicked off with
them (WG14; C).  Moreso on what should happen with the _Atomic
qualifier, assignments, and memcpy.  So it is still an important thing
to nail down the language spec.

Note there were also a lot of discussions lately on "where should the
volatile qualifier be allowed, or not."
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p1152r0.html
https://www.youtube.com/watch?v=KJW_DLaVXIY
(2018? ok, maybe not lately.  Lately for C)

I view this similarly as "where should the const qualifier be allowed, or not."
-- 
Thanks,
~Nick Desaulniers

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 1/8] mm: Cleanup faultaround and finish_fault() codepaths
  2021-01-14 17:59 ` [PATCH v3 1/8] mm: Cleanup faultaround and finish_fault() codepaths Will Deacon
@ 2021-02-09 20:24   ` Guenter Roeck
  2021-02-10 11:44     ` Will Deacon
  0 siblings, 1 reply; 22+ messages in thread
From: Guenter Roeck @ 2021-02-09 20:24 UTC (permalink / raw)
  To: Will Deacon
  Cc: kernel-team, Jan Kara, Minchan Kim, Catalin Marinas,
	Hugh Dickins, linux-kernel, linux-mm, Vinayak Menon,
	Kirill A . Shutemov, Andrew Morton, Linus Torvalds,
	linux-arm-kernel

On Thu, Jan 14, 2021 at 05:59:27PM +0000, Will Deacon wrote:
> From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> 
> alloc_set_pte() has two users with different requirements: in the
> faultaround code, it called from an atomic context and PTE page table
> has to be preallocated. finish_fault() can sleep and allocate page table
> as needed.
> 
> PTL locking rules are also strange, hard to follow and overkill for
> finish_fault().
> 
> Let's untangle the mess. alloc_set_pte() has gone now. All locking is
> explicit.
> 
> The price is some code duplication to handle huge pages in faultaround
> path, but it should be fine, having overall improvement in readability.
> 
> Link: https://lore.kernel.org/r/20201229132819.najtavneutnf7ajp@box
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> [will: s/from from/from/ in comment; spotted by willy]
> Signed-off-by: Will Deacon <will@kernel.org>
> ---
>  fs/xfs/xfs_file.c       |   6 +-
>  include/linux/mm.h      |  12 ++-
>  include/linux/pgtable.h |  11 +++
>  mm/filemap.c            | 177 ++++++++++++++++++++++++++---------
>  mm/memory.c             | 199 ++++++++++++----------------------------
>  5 files changed, 213 insertions(+), 192 deletions(-)
> 

When building microblaze:mmu_defconfig:

mm/filemap.c: In function 'filemap_map_pages':
mm/filemap.c:3153:3: error: implicit declaration of function 'update_mmu_cache'; did you mean 'update_mmu_tlb'?

Bisect log attached.

Guenter

---
# bad: [a4bfd8d46ac357c12529e4eebb6c89502b03ecc9] Add linux-next specific files for 20210209
# good: [92bf22614b21a2706f4993b278017e437f7785b3] Linux 5.11-rc7
git bisect start 'HEAD' 'v5.11-rc7'
# bad: [a8eb921ba7e8e77d994a1c6c69c8ef08456ecf53] Merge remote-tracking branch 'crypto/master'
git bisect bad a8eb921ba7e8e77d994a1c6c69c8ef08456ecf53
# bad: [b68df186dae8ae890df08059bb068b78252b053a] Merge remote-tracking branch 'hid/for-next'
git bisect bad b68df186dae8ae890df08059bb068b78252b053a
# bad: [323c9f6fb99b033883b404ecbc811e7b283a60b3] Merge remote-tracking branch 'sunxi/sunxi/for-next'
git bisect bad 323c9f6fb99b033883b404ecbc811e7b283a60b3
# bad: [4053c8a4d3b53205272aa16b65a5b7ed1a3a5b3e] Merge remote-tracking branch 'arm-soc/for-next'
git bisect bad 4053c8a4d3b53205272aa16b65a5b7ed1a3a5b3e
# good: [dfb8870aed6ad71fb7e378274521bf68a7d465cb] Merge branch 'arm/dt' into for-next
git bisect good dfb8870aed6ad71fb7e378274521bf68a7d465cb
# good: [9ad7e5a35f96b8bf8aebedfd8f397a64eecb21bd] Merge remote-tracking branch 'arm/for-next'
git bisect good 9ad7e5a35f96b8bf8aebedfd8f397a64eecb21bd
# good: [edc55d8409542cd05d5c17203615a162cddbcb4c] Merge branch 'arm/drivers' into for-next
git bisect good edc55d8409542cd05d5c17203615a162cddbcb4c
# good: [d1bbc35fcab28668c8992c4d5777234b794d7306] arm64: hibernate: add __force attribute to gfp_t casting
git bisect good d1bbc35fcab28668c8992c4d5777234b794d7306
# bad: [fb01b86f47a44f0c03278a7cc78ece8415898ed0] Merge branches 'for-next/cosmetic', 'for-next/crypto', 'for-next/faultaround', 'for-next/from-tip/irq/urgent', 'for-next/kexec', 'for-next/misc', 'for-next/mm', 'for-next/perf', 'for-next/random', 'for-next/rng', 'for-next/selftests', 'for-next/stacktrace', 'for-next/topology' and 'for-next/vdso' into for-next/core
git bisect bad fb01b86f47a44f0c03278a7cc78ece8415898ed0
# good: [750d43b4a79e5f3767ee1db933b42abdf967ce1e] dt-bindings: arm: add Cortex-A78 binding
git bisect good 750d43b4a79e5f3767ee1db933b42abdf967ce1e
# good: [0188a894c390e51475274ece76b4d601782d709e] arm64: vmlinux.ld.S: add assertion for tramp_pg_dir offset
git bisect good 0188a894c390e51475274ece76b4d601782d709e
# bad: [3f98a28cc37253269b4104cf95a51f7716a2eb97] mm/nommu: Fix return type of filemap_map_pages()
git bisect bad 3f98a28cc37253269b4104cf95a51f7716a2eb97
# bad: [742d33729a0df11c9d8d4625dbf21dd20cdefd44] mm: Move immutable fields of 'struct vm_fault' into anonymous struct
git bisect bad 742d33729a0df11c9d8d4625dbf21dd20cdefd44
# bad: [46bdb4277f98e70d0c91f4289897ade533fe9e80] mm: Allow architectures to request 'old' entries when prefaulting
git bisect bad 46bdb4277f98e70d0c91f4289897ade533fe9e80
# bad: [f9ce0be71d1fbb038ada15ced83474b0e63f264d] mm: Cleanup faultaround and finish_fault() codepaths
git bisect bad f9ce0be71d1fbb038ada15ced83474b0e63f264d
# first bad commit: [f9ce0be71d1fbb038ada15ced83474b0e63f264d] mm: Cleanup faultaround and finish_fault() codepaths


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 1/8] mm: Cleanup faultaround and finish_fault() codepaths
  2021-02-09 20:24   ` Guenter Roeck
@ 2021-02-10 11:44     ` Will Deacon
  2021-02-10 14:57       ` Guenter Roeck
  0 siblings, 1 reply; 22+ messages in thread
From: Will Deacon @ 2021-02-10 11:44 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: kernel-team, Jan Kara, Minchan Kim, Catalin Marinas,
	Hugh Dickins, linux-kernel, linux-mm, Vinayak Menon,
	Kirill A . Shutemov, Andrew Morton, Linus Torvalds,
	linux-arm-kernel

On Tue, Feb 09, 2021 at 12:24:49PM -0800, Guenter Roeck wrote:
> On Thu, Jan 14, 2021 at 05:59:27PM +0000, Will Deacon wrote:
> > From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> > 
> > alloc_set_pte() has two users with different requirements: in the
> > faultaround code, it called from an atomic context and PTE page table
> > has to be preallocated. finish_fault() can sleep and allocate page table
> > as needed.
> > 
> > PTL locking rules are also strange, hard to follow and overkill for
> > finish_fault().
> > 
> > Let's untangle the mess. alloc_set_pte() has gone now. All locking is
> > explicit.
> > 
> > The price is some code duplication to handle huge pages in faultaround
> > path, but it should be fine, having overall improvement in readability.
> > 
> > Link: https://lore.kernel.org/r/20201229132819.najtavneutnf7ajp@box
> > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > [will: s/from from/from/ in comment; spotted by willy]
> > Signed-off-by: Will Deacon <will@kernel.org>
> > ---
> >  fs/xfs/xfs_file.c       |   6 +-
> >  include/linux/mm.h      |  12 ++-
> >  include/linux/pgtable.h |  11 +++
> >  mm/filemap.c            | 177 ++++++++++++++++++++++++++---------
> >  mm/memory.c             | 199 ++++++++++++----------------------------
> >  5 files changed, 213 insertions(+), 192 deletions(-)
> > 
> 
> When building microblaze:mmu_defconfig:
> 
> mm/filemap.c: In function 'filemap_map_pages':
> mm/filemap.c:3153:3: error: implicit declaration of function 'update_mmu_cache'; did you mean 'update_mmu_tlb'?
> 
> Bisect log attached.

Looks like a missing include.

Will

--->8

From 076f93117c067d5b6caab4773c6d6da130859cc4 Mon Sep 17 00:00:00 2001
From: Will Deacon <will@kernel.org>
Date: Wed, 10 Feb 2021 11:15:11 +0000
Subject: [PATCH] mm: filemap: Fix microblaze build failure with
 'mmu_defconfig'

Commit f9ce0be71d1f ("mm: Cleanup faultaround and finish_fault()
codepaths") added a call to 'update_mmu_cache()' in mm/filemap.c, which
breaks the build for microblaze:

  | mm/filemap.c: In function 'filemap_map_pages':
  | mm/filemap.c:3153:3: error: implicit declaration of function 'update_mmu_cache'; did you mean 'update_mmu_tlb'?

Include asm/tlbflush.h in mm/filemap.c to make sure that the function
(or indeed, macro) is available.

Reported-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/20210209202449.GA104837@roeck-us.net
Signed-off-by: Will Deacon <will@kernel.org>
---
 mm/filemap.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/mm/filemap.c b/mm/filemap.c
index fb7a8d9b5603..2ca13227747b 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -43,6 +43,7 @@
 #include <linux/ramfs.h>
 #include <linux/page_idle.h>
 #include <asm/pgalloc.h>
+#include <asm/tlbflush.h>
 #include "internal.h"
 
 #define CREATE_TRACE_POINTS
-- 
2.30.0.478.g8a0d178c01-goog


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 1/8] mm: Cleanup faultaround and finish_fault() codepaths
  2021-02-10 11:44     ` Will Deacon
@ 2021-02-10 14:57       ` Guenter Roeck
  0 siblings, 0 replies; 22+ messages in thread
From: Guenter Roeck @ 2021-02-10 14:57 UTC (permalink / raw)
  To: Will Deacon
  Cc: kernel-team, Jan Kara, Minchan Kim, Catalin Marinas,
	Hugh Dickins, linux-kernel, linux-mm, Vinayak Menon,
	Kirill A . Shutemov, Andrew Morton, Linus Torvalds,
	linux-arm-kernel

On 2/10/21 3:44 AM, Will Deacon wrote:
> On Tue, Feb 09, 2021 at 12:24:49PM -0800, Guenter Roeck wrote:
>> On Thu, Jan 14, 2021 at 05:59:27PM +0000, Will Deacon wrote:
>>> From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
>>>
>>> alloc_set_pte() has two users with different requirements: in the
>>> faultaround code, it called from an atomic context and PTE page table
>>> has to be preallocated. finish_fault() can sleep and allocate page table
>>> as needed.
>>>
>>> PTL locking rules are also strange, hard to follow and overkill for
>>> finish_fault().
>>>
>>> Let's untangle the mess. alloc_set_pte() has gone now. All locking is
>>> explicit.
>>>
>>> The price is some code duplication to handle huge pages in faultaround
>>> path, but it should be fine, having overall improvement in readability.
>>>
>>> Link: https://lore.kernel.org/r/20201229132819.najtavneutnf7ajp@box
>>> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
>>> [will: s/from from/from/ in comment; spotted by willy]
>>> Signed-off-by: Will Deacon <will@kernel.org>
>>> ---
>>>  fs/xfs/xfs_file.c       |   6 +-
>>>  include/linux/mm.h      |  12 ++-
>>>  include/linux/pgtable.h |  11 +++
>>>  mm/filemap.c            | 177 ++++++++++++++++++++++++++---------
>>>  mm/memory.c             | 199 ++++++++++++----------------------------
>>>  5 files changed, 213 insertions(+), 192 deletions(-)
>>>
>>
>> When building microblaze:mmu_defconfig:
>>
>> mm/filemap.c: In function 'filemap_map_pages':
>> mm/filemap.c:3153:3: error: implicit declaration of function 'update_mmu_cache'; did you mean 'update_mmu_tlb'?
>>
>> Bisect log attached.
> 
> Looks like a missing include.
> 
Indeed.

> Will
> 
> --->8
> 
>>From 076f93117c067d5b6caab4773c6d6da130859cc4 Mon Sep 17 00:00:00 2001
> From: Will Deacon <will@kernel.org>
> Date: Wed, 10 Feb 2021 11:15:11 +0000
> Subject: [PATCH] mm: filemap: Fix microblaze build failure with
>  'mmu_defconfig'
> 
> Commit f9ce0be71d1f ("mm: Cleanup faultaround and finish_fault()
> codepaths") added a call to 'update_mmu_cache()' in mm/filemap.c, which
> breaks the build for microblaze:
> 
>   | mm/filemap.c: In function 'filemap_map_pages':
>   | mm/filemap.c:3153:3: error: implicit declaration of function 'update_mmu_cache'; did you mean 'update_mmu_tlb'?
> 
> Include asm/tlbflush.h in mm/filemap.c to make sure that the function
> (or indeed, macro) is available.
> 
> Reported-by: Guenter Roeck <linux@roeck-us.net>
> Link: https://lore.kernel.org/r/20210209202449.GA104837@roeck-us.net
> Signed-off-by: Will Deacon <will@kernel.org>

Tested-by: Guenter Roeck <linux@roeck-us.net>

> ---
>  mm/filemap.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/mm/filemap.c b/mm/filemap.c
> index fb7a8d9b5603..2ca13227747b 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -43,6 +43,7 @@
>  #include <linux/ramfs.h>
>  #include <linux/page_idle.h>
>  #include <asm/pgalloc.h>
> +#include <asm/tlbflush.h>
>  #include "internal.h"
>  
>  #define CREATE_TRACE_POINTS
> 


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2021-02-10 14:58 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-14 17:59 [PATCH v3 0/8] Create 'old' ptes for faultaround mappings on arm64 with hardware access flag Will Deacon
2021-01-14 17:59 ` [PATCH v3 1/8] mm: Cleanup faultaround and finish_fault() codepaths Will Deacon
2021-02-09 20:24   ` Guenter Roeck
2021-02-10 11:44     ` Will Deacon
2021-02-10 14:57       ` Guenter Roeck
2021-01-14 17:59 ` [PATCH v3 2/8] mm: Allow architectures to request 'old' entries when prefaulting Will Deacon
2021-01-14 17:59 ` [PATCH v3 3/8] arm64: mm: Implement arch_wants_old_prefaulted_pte() Will Deacon
2021-01-14 17:59 ` [RFC PATCH 5/8] mm: Pass 'address' to map to do_set_pte() and drop FAULT_FLAG_PREFAULT Will Deacon
2021-01-14 18:17   ` Linus Torvalds
2021-01-15 10:24     ` Will Deacon
2021-01-14 17:59 ` [RFC PATCH 6/8] mm: Avoid modifying vmf.info.address in __collapse_huge_page_swapin() Will Deacon
2021-01-14 17:59 ` [RFC PATCH 7/8] mm: Use static initialisers for 'info' field of 'struct vm_fault' Will Deacon
2021-01-14 17:59 ` [RFC PATCH 8/8] mm: Mark 'info' field of 'struct vm_fault' as 'const' Will Deacon
     [not found] ` <20210114175934.13070-5-will@kernel.org>
2021-01-14 18:16   ` [RFC PATCH 4/8] mm: Separate fault info out of 'struct vm_fault' Linus Torvalds
2021-01-14 19:00     ` Will Deacon
2021-01-14 19:09       ` Linus Torvalds
2021-01-14 19:41         ` Will Deacon
2021-01-14 20:09           ` Nick Desaulniers
2021-01-14 21:11           ` Linus Torvalds
2021-01-15  9:23             ` Will Deacon
2021-01-15 21:32               ` Linus Torvalds
2021-01-20  0:00                 ` Nick Desaulniers

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).