* [PATCH 1/2] mm: refactor do_fault_around()
2023-03-17 21:58 [PATCH 0/2] Refactor do_fault_around() Lorenzo Stoakes
@ 2023-03-17 21:58 ` Lorenzo Stoakes
2023-03-17 21:58 ` [PATCH 2/2] mm: prefer fault_around_pages to fault_around_bytes Lorenzo Stoakes
2023-03-17 23:39 ` [PATCH 0/2] Refactor do_fault_around() Andrew Morton
2 siblings, 0 replies; 6+ messages in thread
From: Lorenzo Stoakes @ 2023-03-17 21:58 UTC (permalink / raw)
To: linux-mm, linux-kernel, Andrew Morton
Cc: David Hildenbrand, Matthew Wilcox, Vlastimil Babka, Lorenzo Stoakes
The existing logic is confusing and fails to abstract a number of bitwise
tricks.
Use ALIGN_DOWN() to perform alignment, pte_index() to obtain a PTE index
and represent the address range using PTE offsets, which naturally makes it
clear that the operation is intended to occur within only a single PTE and
prevents spanning more than one page table.
We rely on the fact that fault_around_bytes is always page-aligned, at
least one page in size, a power of two, and no larger than
PAGE_SIZE * PTRS_PER_PTE (i.e. the address space mapped by a single page
table). These invariants are all guaranteed by fault_around_bytes_set().
Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
---
mm/memory.c | 38 ++++++++++++++++----------------------
1 file changed, 16 insertions(+), 22 deletions(-)
diff --git a/mm/memory.c b/mm/memory.c
index c5f1bf906d0c..3d85aa7106b0 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4437,8 +4437,8 @@ late_initcall(fault_around_debugfs);
* It uses vm_ops->map_pages() to map the pages, which skips the page if it's
* not ready to be mapped: not up-to-date, locked, etc.
*
- * This function doesn't cross the VMA boundaries, in order to call map_pages()
- * only once.
+ * This function doesn't cross VMA or page table boundaries, in order to call
+ * map_pages() and acquire a PTE lock only once.
*
* fault_around_bytes defines how many bytes we'll try to map.
* do_fault_around() expects it to be set to a power of two less than or equal
@@ -4451,27 +4451,19 @@ late_initcall(fault_around_debugfs);
*/
static vm_fault_t do_fault_around(struct vm_fault *vmf)
{
- unsigned long address = vmf->address, nr_pages, mask;
- pgoff_t start_pgoff = vmf->pgoff;
- pgoff_t end_pgoff;
- int off;
+ pgoff_t nr_pages = READ_ONCE(fault_around_bytes) >> PAGE_SHIFT;
+ pgoff_t pte_off = pte_index(vmf->address);
+ /* The page offset of vmf->address within the VMA. */
+ pgoff_t vma_off = vmf->pgoff - vmf->vma->vm_pgoff;
+ pgoff_t from_pte, to_pte;
- nr_pages = READ_ONCE(fault_around_bytes) >> PAGE_SHIFT;
- mask = ~(nr_pages * PAGE_SIZE - 1) & PAGE_MASK;
+ /* The PTE offset of the start address, clamped to the VMA. */
+ from_pte = max(ALIGN_DOWN(pte_off, nr_pages),
+ pte_off - min(pte_off, vma_off));
- address = max(address & mask, vmf->vma->vm_start);
- off = ((vmf->address - address) >> PAGE_SHIFT) & (PTRS_PER_PTE - 1);
- start_pgoff -= off;
-
- /*
- * end_pgoff is either the end of the page table, the end of
- * the vma or nr_pages from start_pgoff, depending what is nearest.
- */
- end_pgoff = start_pgoff -
- ((address >> PAGE_SHIFT) & (PTRS_PER_PTE - 1)) +
- PTRS_PER_PTE - 1;
- end_pgoff = min3(end_pgoff, vma_pages(vmf->vma) + vmf->vma->vm_pgoff - 1,
- start_pgoff + nr_pages - 1);
+ /* The PTE offset of the end address, clamped to the VMA and PTE. */
+ to_pte = min3(from_pte + nr_pages, (pgoff_t)PTRS_PER_PTE,
+ pte_off + vma_pages(vmf->vma) - vma_off) - 1;
if (pmd_none(*vmf->pmd)) {
vmf->prealloc_pte = pte_alloc_one(vmf->vma->vm_mm);
@@ -4479,7 +4471,9 @@ static vm_fault_t do_fault_around(struct vm_fault *vmf)
return VM_FAULT_OOM;
}
- return vmf->vma->vm_ops->map_pages(vmf, start_pgoff, end_pgoff);
+ return vmf->vma->vm_ops->map_pages(vmf,
+ vmf->pgoff + from_pte - pte_off,
+ vmf->pgoff + to_pte - pte_off);
}
/* Return true if we should do read fault-around, false otherwise */
--
2.39.2
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH 2/2] mm: prefer fault_around_pages to fault_around_bytes
2023-03-17 21:58 [PATCH 0/2] Refactor do_fault_around() Lorenzo Stoakes
2023-03-17 21:58 ` [PATCH 1/2] mm: refactor do_fault_around() Lorenzo Stoakes
@ 2023-03-17 21:58 ` Lorenzo Stoakes
2023-03-17 23:39 ` [PATCH 0/2] Refactor do_fault_around() Andrew Morton
2 siblings, 0 replies; 6+ messages in thread
From: Lorenzo Stoakes @ 2023-03-17 21:58 UTC (permalink / raw)
To: linux-mm, linux-kernel, Andrew Morton
Cc: David Hildenbrand, Matthew Wilcox, Vlastimil Babka, Lorenzo Stoakes
All use of this value is now at page granularity, so specify the variable
as such too. This simplifies the logic.
We maintain the debugfs entry to ensure that there are no user-visible
changes.
Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
---
mm/memory.c | 26 +++++++++++++++-----------
1 file changed, 15 insertions(+), 11 deletions(-)
diff --git a/mm/memory.c b/mm/memory.c
index 3d85aa7106b0..ae01f541ad30 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4393,13 +4393,13 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
return ret;
}
-static unsigned long fault_around_bytes __read_mostly =
- rounddown_pow_of_two(65536);
+static unsigned long fault_around_pages __read_mostly =
+ 65536 >> PAGE_SHIFT;
#ifdef CONFIG_DEBUG_FS
static int fault_around_bytes_get(void *data, u64 *val)
{
- *val = fault_around_bytes;
+ *val = fault_around_pages << PAGE_SHIFT;
return 0;
}
@@ -4411,10 +4411,13 @@ static int fault_around_bytes_set(void *data, u64 val)
{
if (val / PAGE_SIZE > PTRS_PER_PTE)
return -EINVAL;
- if (val > PAGE_SIZE)
- fault_around_bytes = rounddown_pow_of_two(val);
- else
- fault_around_bytes = PAGE_SIZE; /* rounddown_pow_of_two(0) is undefined */
+
+ /*
+ * The minimum value is 1 page, however this results in no fault-around
+ * at all. See should_fault_around().
+ */
+ fault_around_pages = max(rounddown_pow_of_two(val) >> PAGE_SHIFT, 1UL);
+
return 0;
}
DEFINE_DEBUGFS_ATTRIBUTE(fault_around_bytes_fops,
@@ -4440,18 +4443,18 @@ late_initcall(fault_around_debugfs);
* This function doesn't cross VMA or page table boundaries, in order to call
* map_pages() and acquire a PTE lock only once.
*
- * fault_around_bytes defines how many bytes we'll try to map.
+ * fault_around_pages defines how many pages we'll try to map.
* do_fault_around() expects it to be set to a power of two less than or equal
* to PTRS_PER_PTE.
*
* The virtual address of the area that we map is naturally aligned to
- * fault_around_bytes rounded down to the machine page size
+ * fault_around_pages * PAGE_SIZE rounded down to the machine page size
* (and therefore to page order). This way it's easier to guarantee
* that we don't cross page table boundaries.
*/
static vm_fault_t do_fault_around(struct vm_fault *vmf)
{
- pgoff_t nr_pages = READ_ONCE(fault_around_bytes) >> PAGE_SHIFT;
+ pgoff_t nr_pages = READ_ONCE(fault_around_pages);
pgoff_t pte_off = pte_index(vmf->address);
/* The page offset of vmf->address within the VMA. */
pgoff_t vma_off = vmf->pgoff - vmf->vma->vm_pgoff;
@@ -4486,7 +4489,8 @@ static inline bool should_fault_around(struct vm_fault *vmf)
if (uffd_disable_fault_around(vmf->vma))
return false;
- return fault_around_bytes >> PAGE_SHIFT > 1;
+ /* A single page implies no faulting 'around' at all. */
+ return fault_around_pages > 1;
}
static vm_fault_t do_read_fault(struct vm_fault *vmf)
--
2.39.2
* Re: [PATCH 0/2] Refactor do_fault_around()
2023-03-17 21:58 [PATCH 0/2] Refactor do_fault_around() Lorenzo Stoakes
2023-03-17 21:58 ` [PATCH 1/2] mm: refactor do_fault_around() Lorenzo Stoakes
2023-03-17 21:58 ` [PATCH 2/2] mm: prefer fault_around_pages to fault_around_bytes Lorenzo Stoakes
@ 2023-03-17 23:39 ` Andrew Morton
2023-03-17 23:48 ` Lorenzo Stoakes
2 siblings, 1 reply; 6+ messages in thread
From: Andrew Morton @ 2023-03-17 23:39 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: linux-mm, linux-kernel, David Hildenbrand, Matthew Wilcox,
Vlastimil Babka
On Fri, 17 Mar 2023 21:58:24 +0000 Lorenzo Stoakes <lstoakes@gmail.com> wrote:
> Refactor do_fault_around() to avoid bitwise tricks and rather
> difficult-to-follow logic. Additionally, prefer fault_around_pages to
> fault_around_bytes as the operations are performed at a base page
> granularity.
>
> I have run this code against a small program I wrote which generates
> significant input data and compares output with the original function to
> ensure that it behaves the same as the old code across varying vmf, vma and
> fault_around_pages inputs.
Well, what changes were you looking for in that testing?
do_fault_around() could become a no-op and most tests wouldn't notice.
What we'd be looking for to test these changes is performance
differences.
Perhaps one could add a tracepoint to do_fault_around()'s call to
->map_pages, check that the before-and-after traces are identical.
Or, if you're old school and lazy,
	if (!strcmp(current->comm, "name-of-my-test-program"))
		printk("name-of-my-test-program: %lu %lu\n",
		       start_pgoff, end_pgoff);
then grep-n-diff the dmesg output.