linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Ryan Roberts <ryan.roberts@arm.com>
To: Andrew Morton <akpm@linux-foundation.org>,
	"Matthew Wilcox (Oracle)" <willy@infradead.org>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	SeongJae Park <sj@kernel.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	damon@lists.linux.dev
Subject: 
Date: Thu, 11 May 2023 13:58:43 +0100	[thread overview]
Message-ID: <20230511125848.78621-1-ryan.roberts@arm.com> (raw)

Date: Thu, 11 May 2023 11:38:28 +0100
Subject: [PATCH v1 0/5] Encapsulate PTE contents from non-arch code

Hi All,

This series improves the encapsulation of pte entries by disallowing non-arch
code from directly dereferencing pte_t pointers. Instead code must use a new
helper, `pte_t ptep_deref(pte_t *ptep)`. By default, this helper does a direct
dereference of the pointer, so generated code should be exactly the same. But
it's presence sets us up for arch code being able to override the default to
"virtualize" the ptes without needing to maintain a shadow table.

I intend to take advantage of this for arm64 to enable use of its "contiguous
bit" to coalesce multiple ptes into a single tlb entry, reducing pressure and
improving performance. I have an RFC for the first part of this work at [1]. The
cover letter there also explains the second part, which this series is enabling.

I intend to post an RFC for the contpte changes in due course, but it would be
good to get the ball rolling on this enabler.

There are 2 reasons that I need the encapsulation:

  - Prevent leaking the arch-private PTE_CONT bit to the core code. If the core
    code reads a pte that contains this bit, it could end up calling
    set_pte_at() with the bit set which would confuse the implementation. So we
    can always clear PTE_CONT in ptep_deref() (and ptep_get()) to avoid a leaky
    abstraction.
  - Contiguous ptes have a single access and dirty bit for the contiguous range.
    So we need to "mix-in" those bits when the core is dereferencing a pte that
    lies in the contig range. There is code that dereferences the pte then takes
    different actions based on access/dirty (see e.g. write_protect_page()).

While ptep_get() and ptep_get_lockless() already exist, both of them are
implemented using READ_ONCE() by default. While we could use ptep_get() instead
of the new ptep_deref(), I didn't want to risk performance regression.
Alternatively, all call sites that currently use ptep_get() that need the
lockless behaviour could be upgraded to ptep_get_lockless() and ptep_get() could
be downgraded to a simple dereference. That would be cleanest, but is a much
bigger (and likely error prone) change because all the arch code would need to
be updated for the new definitions of ptep_get().

The series is split up as follows:

patchs 1-2: Fix bugs where code was _setting_ ptes directly, rather than using
            set_pte_at() and friends.
patch 3:    Fix highmem unmapping issue I spotted while doing the work.
patch 4:    Introduce the new ptep_deref() helper with default implementation.
patch 5:    Convert all direct dereferences to use ptep_deref().

[1] https://lore.kernel.org/linux-mm/20230414130303.2345383-1-ryan.roberts@arm.com/

Thanks,
Ryan


Ryan Roberts (5):
  mm: vmalloc must set pte via arch code
  mm: damon must atomically clear young on ptes and pmds
  mm: Fix failure to unmap pte on highmem systems
  mm: Add new ptep_deref() helper to fully encapsulate pte_t
  mm: ptep_deref() conversion

 .../drm/i915/gem/selftests/i915_gem_mman.c    |   8 +-
 drivers/misc/sgi-gru/grufault.c               |   2 +-
 drivers/vfio/vfio_iommu_type1.c               |   7 +-
 drivers/xen/privcmd.c                         |   2 +-
 fs/proc/task_mmu.c                            |  33 +++---
 fs/userfaultfd.c                              |   6 +-
 include/linux/hugetlb.h                       |   2 +-
 include/linux/mm_inline.h                     |   2 +-
 include/linux/pgtable.h                       |  13 ++-
 kernel/events/uprobes.c                       |   2 +-
 mm/damon/ops-common.c                         |  18 ++-
 mm/damon/ops-common.h                         |   4 +-
 mm/damon/paddr.c                              |   6 +-
 mm/damon/vaddr.c                              |  14 ++-
 mm/filemap.c                                  |   2 +-
 mm/gup.c                                      |  21 ++--
 mm/highmem.c                                  |  12 +-
 mm/hmm.c                                      |   2 +-
 mm/huge_memory.c                              |   4 +-
 mm/hugetlb.c                                  |   2 +-
 mm/hugetlb_vmemmap.c                          |   6 +-
 mm/kasan/init.c                               |   9 +-
 mm/kasan/shadow.c                             |  10 +-
 mm/khugepaged.c                               |  24 ++--
 mm/ksm.c                                      |  22 ++--
 mm/madvise.c                                  |   6 +-
 mm/mapping_dirty_helpers.c                    |   4 +-
 mm/memcontrol.c                               |   4 +-
 mm/memory-failure.c                           |   6 +-
 mm/memory.c                                   | 103 +++++++++---------
 mm/mempolicy.c                                |   6 +-
 mm/migrate.c                                  |  14 ++-
 mm/migrate_device.c                           |  14 ++-
 mm/mincore.c                                  |   2 +-
 mm/mlock.c                                    |   6 +-
 mm/mprotect.c                                 |   8 +-
 mm/mremap.c                                   |   2 +-
 mm/page_table_check.c                         |   4 +-
 mm/page_vma_mapped.c                          |  26 +++--
 mm/pgtable-generic.c                          |   2 +-
 mm/rmap.c                                     |  32 +++---
 mm/sparse-vmemmap.c                           |   8 +-
 mm/swap_state.c                               |   4 +-
 mm/swapfile.c                                 |  16 +--
 mm/userfaultfd.c                              |   4 +-
 mm/vmalloc.c                                  |  11 +-
 mm/vmscan.c                                   |  14 ++-
 virt/kvm/kvm_main.c                           |   9 +-
 48 files changed, 302 insertions(+), 236 deletions(-)

--
2.25.1


             reply	other threads:[~2023-05-11 12:59 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-05-11 12:58 Ryan Roberts [this message]
2023-05-11 12:58 ` [PATCH v1 1/5] mm: vmalloc must set pte via arch code Ryan Roberts
2023-05-11 13:14   ` Ryan Roberts
2023-05-11 12:58 ` [PATCH v1 2/5] mm: damon must atomically clear young on ptes and pmds Ryan Roberts
2023-05-11 13:14   ` Ryan Roberts
2023-05-11 12:58 ` [PATCH v1 3/5] mm: Fix failure to unmap pte on highmem systems Ryan Roberts
2023-05-11 13:14   ` Ryan Roberts
2023-05-11 12:58 ` [PATCH v1 4/5] mm: Add new ptep_deref() helper to fully encapsulate pte_t Ryan Roberts
2023-05-11 13:14   ` Ryan Roberts
2023-05-11 12:58 ` [PATCH v1 5/5] mm: ptep_deref() conversion Ryan Roberts
2023-05-11 13:14   ` Ryan Roberts
2023-05-12 10:34   ` kernel test robot
2023-05-11 13:13 ` Ryan Roberts

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230511125848.78621-1-ryan.roberts@arm.com \
    --to=ryan.roberts@arm.com \
    --cc=akpm@linux-foundation.org \
    --cc=damon@lists.linux.dev \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=sj@kernel.org \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).