linux-mm.kvack.org archive mirror
* API for setting multiple PTEs at once
@ 2023-02-02 21:14 Matthew Wilcox
  2023-02-02 21:48 ` Kirill A. Shutemov
  2023-02-07 20:27 ` Matthew Wilcox
  0 siblings, 2 replies; 10+ messages in thread
From: Matthew Wilcox @ 2023-02-02 21:14 UTC (permalink / raw)
  To: linux-arch; +Cc: Yin Fengwei, linux-mm

For those of you not subscribed, linux-mm is currently discussing
how best to handle page faults on large folios.  I simply made it work
when adding large folio support.  Now Yin Fengwei is working on
making it fast.

https://lore.kernel.org/linux-mm/Y9qjn0Y+1ir787nc@casper.infradead.org/
is perhaps the best place to start as it pertains to what the
architecture will see.

At the bottom of that function, I propose

+	for (i = 0; i < nr; i++) {
+		set_pte_at(vma->vm_mm, addr, vmf->pte + i, entry);
+		/* no need to invalidate: a not-present page won't be cached */
+		update_mmu_cache(vma, addr, vmf->pte + i);
+		addr += PAGE_SIZE;
+		entry = pte_next(entry);
+	}

(or I would have, had I not forgotten that pte_t isn't an integral type)
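
A user-space sketch of that loop, assuming pte_t is a one-member struct as
it is on most architectures. The pte_t, __pte() and pte_val() definitions
here are simplified stand-ins rather than the kernel's, model_set_pte_at()
and model_map_range() are made-up names, and pte_next() is the hypothetical
helper from the loop, implemented by adding PAGE_SIZE to the raw value
(only correct when the PFN field starts at PAGE_SHIFT):

```c
#include <assert.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

/* Simplified stand-in: wrapping the value in a struct forbids "entry++". */
typedef struct { unsigned long pte; } pte_t;

static inline pte_t __pte(unsigned long val)
{
	pte_t pte = { val };

	return pte;
}

static inline unsigned long pte_val(pte_t pte)
{
	return pte.pte;
}

/* Hypothetical helper: the entry for the next page, same flags. */
static inline pte_t pte_next(pte_t pte)
{
	return __pte(pte_val(pte) + PAGE_SIZE);
}

/* Stand-in for set_pte_at(): store the entry into the table slot. */
static inline void model_set_pte_at(pte_t *ptep, pte_t pte)
{
	*ptep = pte;
}

/* Fill nr consecutive slots, advancing the PFN by one each time. */
static inline void model_map_range(pte_t *ptep, pte_t entry, unsigned int nr)
{
	unsigned int i;

	assert(nr > 0);
	for (i = 0; i < nr; i++) {
		model_set_pte_at(ptep + i, entry);
		entry = pte_next(entry);
	}
}
```

The flag bits in the low part of the entry are untouched by the addition,
which is why a single add can stand in for rebuilding the entry from a PFN.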

But I think that some architectures want to mark PTEs specially for
"This is part of a contiguous range" -- ARM, perhaps?  So would you like
an API like:

	arch_set_ptes(mm, addr, vmf->pte, entry, nr);
	update_mmu_cache_range(vma, addr, vmf->pte, nr);

There are some challenges here.  For example, folios may be mapped
askew (ie not naturally aligned).  Another problem is that folios may
be unmapped in part (eg mmap(), fault, followed by munmap() of one of
the pages in the folio), and I presume you'd need to go and unmark the
other PTEs in that case.  So it's not as simple as just checking whether
'addr' and 'nr' are in some way compatible.
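
To make the "askew" case concrete, here is a minimal sketch of the kind of
test an architecture might apply before using a contiguous-range hint.
can_mark_contiguous() and its cont_nr parameter are hypothetical names, the
4KiB page size is an assumption, and the sketch deliberately ignores the
partial-unmap problem described above:

```c
#include <assert.h>
#include <stdbool.h>

#define PAGE_SHIFT 12

/*
 * Hypothetical check: a block of cont_nr PTEs (cont_nr a power of two)
 * can only carry a "contiguous" hint if the virtual page number and the
 * physical frame number are both aligned to the block size, and the
 * caller is setting at least one whole block.
 */
static inline bool can_mark_contiguous(unsigned long addr, unsigned long pfn,
				       unsigned int nr, unsigned int cont_nr)
{
	unsigned long vpn = addr >> PAGE_SHIFT;

	assert(cont_nr != 0 && (cont_nr & (cont_nr - 1)) == 0);
	if (nr < cont_nr)
		return false;
	/* Askew mapping: virtual or physical start is not block-aligned. */
	if ((vpn & (cont_nr - 1)) != 0 || (pfn & (cont_nr - 1)) != 0)
		return false;
	return true;
}
```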



* Re: API for setting multiple PTEs at once
  2023-02-02 21:14 API for setting multiple PTEs at once Matthew Wilcox
@ 2023-02-02 21:48 ` Kirill A. Shutemov
  2023-02-02 22:49   ` Matthew Wilcox
  2023-02-07 20:27 ` Matthew Wilcox
  1 sibling, 1 reply; 10+ messages in thread
From: Kirill A. Shutemov @ 2023-02-02 21:48 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: linux-arch, Yin Fengwei, linux-mm

On Thu, Feb 02, 2023 at 09:14:23PM +0000, Matthew Wilcox wrote:
> For those of you not subscribed, linux-mm is currently discussing
> how best to handle page faults on large folios.  I simply made it work
> when adding large folio support.  Now Yin Fengwei is working on
> making it fast.
> 
> https://lore.kernel.org/linux-mm/Y9qjn0Y+1ir787nc@casper.infradead.org/
> is perhaps the best place to start as it pertains to what the
> architecture will see.
> 
> At the bottom of that function, I propose
> 
> +       for (i = 0; i < nr; i++) {
> +               set_pte_at(vma->vm_mm, addr, vmf->pte + i, entry);
> +               /* no need to invalidate: a not-present page won't be cached */
> +               update_mmu_cache(vma, addr, vmf->pte + i);
> +               addr += PAGE_SIZE;
> +		entry = pte_next(entry);
> +	}
> 
> (or I would have, had I not forgotten that pte_t isn't an integral type)
> 
> But I think that some architectures want to mark PTEs specially for
> "This is part of a contiguous range" -- ARM, perhaps?  So would you like
> an API like:
> 
> 	arch_set_ptes(mm, addr, vmf->pte, entry, nr);

Maybe just set_ptes(). arch_ doesn't contribute much.

> 	update_mmu_cache_range(vma, addr, vmf->pte, nr);
> 
> There are some challenges here.  For example, folios may be mapped
> askew (ie not naturally aligned).  Another problem is that folios may
> be unmapped in part (eg mmap(), fault, followed by munmap() of one of
> the pages in the folio), and I presume you'd need to go and unmark the
> other PTEs in that case.  So it's not as simple as just checking whether
> 'addr' and 'nr' are in some way compatible.

I think the key question is who is responsible for 'nr' being safe: is it
the caller, or does set_ptes() need to check that the range belongs to the
same PTE page table, folio, VMA, etc.?

I think it has to be done by the caller, and set_ptes() has to be as simple
as possible.

-- 
  Kiryl Shutsemau / Kirill A. Shutemov



* Re: API for setting multiple PTEs at once
  2023-02-02 21:48 ` Kirill A. Shutemov
@ 2023-02-02 22:49   ` Matthew Wilcox
  2023-02-02 23:27     ` Kirill A. Shutemov
  0 siblings, 1 reply; 10+ messages in thread
From: Matthew Wilcox @ 2023-02-02 22:49 UTC (permalink / raw)
  To: Kirill A. Shutemov; +Cc: linux-arch, Yin Fengwei, linux-mm

On Fri, Feb 03, 2023 at 12:48:58AM +0300, Kirill A. Shutemov wrote:
> On Thu, Feb 02, 2023 at 09:14:23PM +0000, Matthew Wilcox wrote:
> > For those of you not subscribed, linux-mm is currently discussing
> > how best to handle page faults on large folios.  I simply made it work
> > when adding large folio support.  Now Yin Fengwei is working on
> > making it fast.
> > 
> > https://lore.kernel.org/linux-mm/Y9qjn0Y+1ir787nc@casper.infradead.org/
> > is perhaps the best place to start as it pertains to what the
> > architecture will see.
> > 
> > At the bottom of that function, I propose
> > 
> > +       for (i = 0; i < nr; i++) {
> > +               set_pte_at(vma->vm_mm, addr, vmf->pte + i, entry);
> > +               /* no need to invalidate: a not-present page won't be cached */
> > +               update_mmu_cache(vma, addr, vmf->pte + i);
> > +               addr += PAGE_SIZE;
> > +		entry = pte_next(entry);
> > +	}
> > 
> > (or I would have, had I not forgotten that pte_t isn't an integral type)
> > 
> > But I think that some architectures want to mark PTEs specially for
> > "This is part of a contiguous range" -- ARM, perhaps?  So would you like
> > an API like:
> > 
> > 	arch_set_ptes(mm, addr, vmf->pte, entry, nr);
> 
> Maybe just set_ptes(). arch_ doesn't contribute much.

Sure.

> > 	update_mmu_cache_range(vma, addr, vmf->pte, nr);
> > 
> > There are some challenges here.  For example, folios may be mapped
> > askew (ie not naturally aligned).  Another problem is that folios may
> > be unmapped in part (eg mmap(), fault, followed by munmap() of one of
> > the pages in the folio), and I presume you'd need to go and unmark the
> > other PTEs in that case.  So it's not as simple as just checking whether
> > 'addr' and 'nr' are in some way compatible.
> 
> I think the key question is who is responsible for 'nr' being safe: is it
> the caller, or does set_ptes() need to check that the range belongs to the
> same PTE page table, folio, VMA, etc.?
> 
> I think it has to be done by the caller, and set_ptes() has to be as simple
> as possible.

Caller guarantees that 'nr' is bounded by all of (vma, PMD table, folio).
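
As a sketch of what that guarantee means for a caller, the helper below
computes the largest 'nr' that stays inside all three bounds. max_batch()
and its parameters are hypothetical, and the 4KiB page / 2MiB PMD sizes are
assumptions chosen for illustration:

```c
#include <assert.h>

#define PAGE_SHIFT 12
#define PMD_SHIFT  21
#define PMD_SIZE   (1UL << PMD_SHIFT)

static inline unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/*
 * Hypothetical helper: how many PTEs may be set starting at addr.
 * vm_end is the end of the VMA, folio_idx is the index within its folio
 * of the page mapped at addr, and folio_nr is the folio size in pages.
 */
static inline unsigned long max_batch(unsigned long addr, unsigned long vm_end,
				      unsigned long folio_idx,
				      unsigned long folio_nr)
{
	/* Pages left before the end of the VMA. */
	unsigned long in_vma = (vm_end - addr) >> PAGE_SHIFT;
	/* Pages left before the next PMD boundary, i.e. in this PTE table. */
	unsigned long pmd_end = (addr | (PMD_SIZE - 1)) + 1;
	unsigned long in_table = (pmd_end - addr) >> PAGE_SHIFT;
	/* Pages left in the folio. */
	unsigned long in_folio = folio_nr - folio_idx;

	assert(addr < vm_end && folio_idx < folio_nr);
	return min_ul(in_vma, min_ul(in_table, in_folio));
}
```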

We don't currently allocate folios larger than PMD size, but perhaps we
should prepare for that and as part of this same exercise define

	set_pmds(mm, addr, vmf->pmd, entry, nr);

... where 'nr' is the number of PMDs to set, not number of pages.



* Re: API for setting multiple PTEs at once
  2023-02-02 22:49   ` Matthew Wilcox
@ 2023-02-02 23:27     ` Kirill A. Shutemov
  0 siblings, 0 replies; 10+ messages in thread
From: Kirill A. Shutemov @ 2023-02-02 23:27 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: linux-arch, Yin Fengwei, linux-mm

On Thu, Feb 02, 2023 at 10:49:38PM +0000, Matthew Wilcox wrote:
> On Fri, Feb 03, 2023 at 12:48:58AM +0300, Kirill A. Shutemov wrote:
> > On Thu, Feb 02, 2023 at 09:14:23PM +0000, Matthew Wilcox wrote:
> > > For those of you not subscribed, linux-mm is currently discussing
> > > how best to handle page faults on large folios.  I simply made it work
> > > when adding large folio support.  Now Yin Fengwei is working on
> > > making it fast.
> > > 
> > > https://lore.kernel.org/linux-mm/Y9qjn0Y+1ir787nc@casper.infradead.org/
> > > is perhaps the best place to start as it pertains to what the
> > > architecture will see.
> > > 
> > > At the bottom of that function, I propose
> > > 
> > > +       for (i = 0; i < nr; i++) {
> > > +               set_pte_at(vma->vm_mm, addr, vmf->pte + i, entry);
> > > +               /* no need to invalidate: a not-present page won't be cached */
> > > +               update_mmu_cache(vma, addr, vmf->pte + i);
> > > +               addr += PAGE_SIZE;
> > > +		entry = pte_next(entry);
> > > +	}
> > > 
> > > (or I would have, had I not forgotten that pte_t isn't an integral type)
> > > 
> > > But I think that some architectures want to mark PTEs specially for
> > > "This is part of a contiguous range" -- ARM, perhaps?  So would you like
> > > an API like:
> > > 
> > > 	arch_set_ptes(mm, addr, vmf->pte, entry, nr);
> > 
> > Maybe just set_ptes(). arch_ doesn't contribute much.
> 
> Sure.
> 
> > > 	update_mmu_cache_range(vma, addr, vmf->pte, nr);
> > > 
> > > There are some challenges here.  For example, folios may be mapped
> > > askew (ie not naturally aligned).  Another problem is that folios may
> > > be unmapped in part (eg mmap(), fault, followed by munmap() of one of
> > > the pages in the folio), and I presume you'd need to go and unmark the
> > > other PTEs in that case.  So it's not as simple as just checking whether
> > > 'addr' and 'nr' are in some way compatible.
> > 
> > I think the key question is who is responsible for 'nr' being safe: is it
> > the caller, or does set_ptes() need to check that the range belongs to the
> > same PTE page table, folio, VMA, etc.?
> > 
> > I think it has to be done by the caller, and set_ptes() has to be as simple
> > as possible.
> 
> Caller guarantees that 'nr' is bounded by all of (vma, PMD table, folio).

Also caller is responsible for taking all relevant locks.

> We don't currently allocate folios larger than PMD size, but perhaps we
> should prepare for that and as part of this same exercise define
> 
> 	set_pmds(mm, addr, vmf->pmd, entry, nr);
> 
> ... where 'nr' is the number of PMDs to set, not number of pages.

Sounds good to me.

-- 
  Kiryl Shutsemau / Kirill A. Shutemov



* Re: API for setting multiple PTEs at once
  2023-02-02 21:14 API for setting multiple PTEs at once Matthew Wilcox
  2023-02-02 21:48 ` Kirill A. Shutemov
@ 2023-02-07 20:27 ` Matthew Wilcox
  2023-02-08 11:23   ` Alexandre Ghiti
  2023-02-14  9:55   ` Alexandre Ghiti
  1 sibling, 2 replies; 10+ messages in thread
From: Matthew Wilcox @ 2023-02-07 20:27 UTC (permalink / raw)
  To: linux-arch
  Cc: Yin Fengwei, linux-mm, linux-alpha, linux-csky, linux-m68k,
	linux-mips, Dinh Nguyen, linux-parisc, linux-sh,
	linux-arm-kernel, loongarch, openrisc, linuxppc-dev, linux-riscv,
	sparclinux, linux-xtensa

On Thu, Feb 02, 2023 at 09:14:23PM +0000, Matthew Wilcox wrote:
> For those of you not subscribed, linux-mm is currently discussing
> how best to handle page faults on large folios.  I simply made it work
> when adding large folio support.  Now Yin Fengwei is working on
> making it fast.

OK, here's an actual implementation:

https://lore.kernel.org/linux-mm/20230207194937.122543-3-willy@infradead.org/

It survives a run of xfstests.  If your architecture doesn't store its
PFNs at PAGE_SHIFT, you're going to want to implement your own set_ptes(),
or you'll see entirely the wrong pages mapped into userspace.  You may
also wish to implement set_ptes() if it can be done more efficiently
than __pte(pte_val(pte) + PAGE_SIZE).
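
To illustrate why the PFN position matters, here is a small user-space
model. The model_*() and step_*() helpers are made up for this example, and
the layout with the PFN field at bit 10 is just an illustrative choice:

```c
#include <assert.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

/* Model of a PTE layout: the PFN field starts at bit pfn_shift. */
static inline unsigned long model_mk_pte(unsigned long pfn, unsigned long flags,
					 unsigned int pfn_shift)
{
	return (pfn << pfn_shift) | flags;
}

static inline unsigned long model_pte_pfn(unsigned long pteval,
					  unsigned int pfn_shift)
{
	return pteval >> pfn_shift;
}

/* The generic step: correct only when pfn_shift == PAGE_SHIFT. */
static inline unsigned long step_generic(unsigned long pteval)
{
	return pteval + PAGE_SIZE;
}

/* A layout-aware step: always advances the PFN by exactly one. */
static inline unsigned long step_by_shift(unsigned long pteval,
					  unsigned int pfn_shift)
{
	return pteval + (1UL << pfn_shift);
}
```

With the PFN at bit 10, the generic step moves the PFN forward by four
pages instead of one, so every PTE after the first would map the wrong page.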

Architectures that implement things like flush_icache_page() and
update_mmu_cache() may want to propose batched versions of those.
That's alpha, csky, m68k, mips, nios2, parisc, sh,
arm, loongarch, openrisc, powerpc, riscv, sparc and xtensa.
Maintainers BCC'd, mailing lists CC'd.

I'm happy to collect implementations and submit them as part of a v6.



* Re: API for setting multiple PTEs at once
  2023-02-07 20:27 ` Matthew Wilcox
@ 2023-02-08 11:23   ` Alexandre Ghiti
  2023-02-08 12:09     ` Yin, Fengwei
  2023-02-14  9:55   ` Alexandre Ghiti
  1 sibling, 1 reply; 10+ messages in thread
From: Alexandre Ghiti @ 2023-02-08 11:23 UTC (permalink / raw)
  To: Matthew Wilcox, linux-arch
  Cc: Yin Fengwei, linux-mm, linux-alpha, linux-csky, linux-m68k,
	linux-mips, Dinh Nguyen, linux-parisc, linux-sh,
	linux-arm-kernel, loongarch, openrisc, linuxppc-dev, linux-riscv,
	sparclinux, linux-xtensa

Hi Matthew,

On 2/7/23 21:27, Matthew Wilcox wrote:
> On Thu, Feb 02, 2023 at 09:14:23PM +0000, Matthew Wilcox wrote:
>> For those of you not subscribed, linux-mm is currently discussing
>> how best to handle page faults on large folios.  I simply made it work
>> when adding large folio support.  Now Yin Fengwei is working on
>> making it fast.
> OK, here's an actual implementation:
>
> https://lore.kernel.org/linux-mm/20230207194937.122543-3-willy@infradead.org/
>
> It survives a run of xfstests.  If your architecture doesn't store its
> PFNs at PAGE_SHIFT, you're going to want to implement your own set_ptes(),


riscv stores its PFN at _PAGE_PFN_SHIFT instead of PAGE_SHIFT, so we need
to reimplement set_ptes(). But I have been playing with your patchset and
we never hit the case where set_ptes() is called with nr > 1. Any
idea why? I booted a large Ubuntu defconfig and launched
will_it_scale.page_fault4.

I'll come up with the proper implementation of set_ptes anyway soon.

Thanks,

Alex


> or you'll see entirely the wrong pages mapped into userspace.  You may
> also wish to implement set_ptes() if it can be done more efficiently
> than __pte(pte_val(pte) + PAGE_SIZE).
>
> Architectures that implement things like flush_icache_page() and
> update_mmu_cache() may want to propose batched versions of those.
> That's alpha, csky, m68k, mips, nios2, parisc, sh,
> arm, loongarch, openrisc, powerpc, riscv, sparc and xtensa.
> Maintainers BCC'd, mailing lists CC'd.
>
> I'm happy to collect implementations and submit them as part of a v6.
>



* Re: API for setting multiple PTEs at once
  2023-02-08 11:23   ` Alexandre Ghiti
@ 2023-02-08 12:09     ` Yin, Fengwei
  2023-02-08 13:35       ` Matthew Wilcox
  0 siblings, 1 reply; 10+ messages in thread
From: Yin, Fengwei @ 2023-02-08 12:09 UTC (permalink / raw)
  To: Alexandre Ghiti, Matthew Wilcox, linux-arch
  Cc: linux-mm, linux-alpha, linux-csky, linux-m68k, linux-mips,
	Dinh Nguyen, linux-parisc, linux-sh, linux-arm-kernel, loongarch,
	openrisc, linuxppc-dev, linux-riscv, sparclinux, linux-xtensa



On 2/8/2023 7:23 PM, Alexandre Ghiti wrote:
> Hi Matthew,
> 
> On 2/7/23 21:27, Matthew Wilcox wrote:
>> On Thu, Feb 02, 2023 at 09:14:23PM +0000, Matthew Wilcox wrote:
>>> For those of you not subscribed, linux-mm is currently discussing
>>> how best to handle page faults on large folios.  I simply made it work
>>> when adding large folio support.  Now Yin Fengwei is working on
>>> making it fast.
>> OK, here's an actual implementation:
>>
>> https://lore.kernel.org/linux-mm/20230207194937.122543-3-willy@infradead.org/
>>
>> It survives a run of xfstests.  If your architecture doesn't store its
>> PFNs at PAGE_SHIFT, you're going to want to implement your own set_ptes(),
> 
> 
> riscv stores its PFN at _PAGE_PFN_SHIFT instead of PAGE_SHIFT, so we need to
> reimplement set_ptes(). But I have been playing with your patchset and we
> never hit the case where set_ptes() is called with nr > 1. Any idea why?
> I booted a large Ubuntu defconfig and launched will_it_scale.page_fault4.

You need to use the XFS filesystem to get large folios for file mappings.
Other filesystems may also work, but I have only tried XFS. Thanks.


Regards
Yin, Fengwei

> 
> I'll come up with the proper implementation of set_ptes anyway soon.
> 
> Thanks,
> 
> Alex
> 
> 
>> or you'll see entirely the wrong pages mapped into userspace.  You may
>> also wish to implement set_ptes() if it can be done more efficiently
>> than __pte(pte_val(pte) + PAGE_SIZE).
>>
>> Architectures that implement things like flush_icache_page() and
>> update_mmu_cache() may want to propose batched versions of those.
>> That's alpha, csky, m68k, mips, nios2, parisc, sh,
>> arm, loongarch, openrisc, powerpc, riscv, sparc and xtensa.
>> Maintainers BCC'd, mailing lists CC'd.
>>
>> I'm happy to collect implementations and submit them as part of a v6.
>>



* Re: API for setting multiple PTEs at once
  2023-02-08 12:09     ` Yin, Fengwei
@ 2023-02-08 13:35       ` Matthew Wilcox
  0 siblings, 0 replies; 10+ messages in thread
From: Matthew Wilcox @ 2023-02-08 13:35 UTC (permalink / raw)
  To: Yin, Fengwei
  Cc: Alexandre Ghiti, linux-arch, linux-mm, linux-alpha, linux-csky,
	linux-m68k, linux-mips, Dinh Nguyen, linux-parisc, linux-sh,
	linux-arm-kernel, loongarch, openrisc, linuxppc-dev, linux-riscv,
	sparclinux, linux-xtensa

On Wed, Feb 08, 2023 at 08:09:00PM +0800, Yin, Fengwei wrote:
> 
> 
> On 2/8/2023 7:23 PM, Alexandre Ghiti wrote:
> > Hi Matthew,
> > 
> > On 2/7/23 21:27, Matthew Wilcox wrote:
> >> On Thu, Feb 02, 2023 at 09:14:23PM +0000, Matthew Wilcox wrote:
> >>> For those of you not subscribed, linux-mm is currently discussing
> >>> how best to handle page faults on large folios.  I simply made it work
> >>> when adding large folio support.  Now Yin Fengwei is working on
> >>> making it fast.
> >> OK, here's an actual implementation:
> >>
> >> https://lore.kernel.org/linux-mm/20230207194937.122543-3-willy@infradead.org/
> >>
> >> It survives a run of xfstests.  If your architecture doesn't store its
> >> PFNs at PAGE_SHIFT, you're going to want to implement your own set_ptes(),
> > 
> > 
> > riscv stores its PFN at _PAGE_PFN_SHIFT instead of PAGE_SHIFT, so we need
> > to reimplement set_ptes(). But I have been playing with your patchset and
> > we never hit the case where set_ptes() is called with nr > 1. Any idea why?
> > I booted a large Ubuntu defconfig and launched will_it_scale.page_fault4.
>
> You need to use the XFS filesystem to get large folios for file mappings.
> Other filesystems may also work, but I have only tried XFS. Thanks.

XFS is certainly the flagship filesystem for large folio support, but
others have added support too, including AFS and EROFS.  You can also get
large folios in tmpfs (which is slightly different as it focuses on THPs
rather than generic large folios).

You also have to have CONFIG_TRANSPARENT_HUGEPAGE selected, which riscv
can do.  That restriction will be lifted at some point, but for now
large folios depend on the THP infrastructure.



* Re: API for setting multiple PTEs at once
  2023-02-07 20:27 ` Matthew Wilcox
  2023-02-08 11:23   ` Alexandre Ghiti
@ 2023-02-14  9:55   ` Alexandre Ghiti
  2023-02-20  8:29     ` Rolf Eike Beer
  1 sibling, 1 reply; 10+ messages in thread
From: Alexandre Ghiti @ 2023-02-14  9:55 UTC (permalink / raw)
  To: Matthew Wilcox, linux-arch
  Cc: Yin Fengwei, linux-mm, linux-alpha, linux-csky, linux-m68k,
	linux-mips, Dinh Nguyen, linux-parisc, linux-sh,
	linux-arm-kernel, loongarch, openrisc, linuxppc-dev, linux-riscv,
	sparclinux, linux-xtensa

Hi Matthew,

On 2/7/23 21:27, Matthew Wilcox wrote:
> On Thu, Feb 02, 2023 at 09:14:23PM +0000, Matthew Wilcox wrote:
>> For those of you not subscribed, linux-mm is currently discussing
>> how best to handle page faults on large folios.  I simply made it work
>> when adding large folio support.  Now Yin Fengwei is working on
>> making it fast.
> OK, here's an actual implementation:
>
> https://lore.kernel.org/linux-mm/20230207194937.122543-3-willy@infradead.org/
>
> It survives a run of xfstests.  If your architecture doesn't store its
> PFNs at PAGE_SHIFT, you're going to want to implement your own set_ptes(),
> or you'll see entirely the wrong pages mapped into userspace.  You may
> also wish to implement set_ptes() if it can be done more efficiently
> than __pte(pte_val(pte) + PAGE_SIZE).
>
> Architectures that implement things like flush_icache_page() and
> update_mmu_cache() may want to propose batched versions of those.
> That's alpha, csky, m68k, mips, nios2, parisc, sh,
> arm, loongarch, openrisc, powerpc, riscv, sparc and xtensa.
> Maintainers BCC'd, mailing lists CC'd.
>
> I'm happy to collect implementations and submit them as part of a v6.


Please find below the riscv implementation for set_ptes:

diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index ebee56d47003..10bf812776a4 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -463,6 +463,20 @@ static inline void set_pte_at(struct mm_struct *mm,
         __set_pte_at(mm, addr, ptep, pteval);
  }

+#define set_ptes set_ptes
+static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
+                           pte_t *ptep, pte_t pte, unsigned int nr)
+{
+       for (;;) {
+               set_pte_at(mm, addr, ptep, pte);
+               if (--nr == 0)
+                       break;
+               ptep++;
+               addr += PAGE_SIZE;
+               pte = __pte(pte_val(pte) + (1 << _PAGE_PFN_SHIFT));
+       }
+}
+
  static inline void pte_clear(struct mm_struct *mm,
         unsigned long addr, pte_t *ptep)
  {


Thanks,

Alex




* Re: API for setting multiple PTEs at once
  2023-02-14  9:55   ` Alexandre Ghiti
@ 2023-02-20  8:29     ` Rolf Eike Beer
  0 siblings, 0 replies; 10+ messages in thread
From: Rolf Eike Beer @ 2023-02-20  8:29 UTC (permalink / raw)
  To: Matthew Wilcox, linux-arch, Alexandre Ghiti
  Cc: Yin Fengwei, linux-mm, linux-alpha, linux-csky, linux-m68k,
	linux-mips, Dinh Nguyen, linux-parisc, linux-sh,
	linux-arm-kernel, loongarch, openrisc, linuxppc-dev, linux-riscv,
	sparclinux, linux-xtensa


On Tuesday, 14 February 2023 10:55:43 CET Alexandre Ghiti wrote:
> Hi Matthew,
> 
> On 2/7/23 21:27, Matthew Wilcox wrote:
> > On Thu, Feb 02, 2023 at 09:14:23PM +0000, Matthew Wilcox wrote:
> >> For those of you not subscribed, linux-mm is currently discussing
> >> how best to handle page faults on large folios.  I simply made it work
> >> when adding large folio support.  Now Yin Fengwei is working on
> >> making it fast.
> > 
> > OK, here's an actual implementation:
> > 
> > https://lore.kernel.org/linux-mm/20230207194937.122543-3-willy@infradead.org/
> > 
> > It survives a run of xfstests.  If your architecture doesn't store its
> > PFNs at PAGE_SHIFT, you're going to want to implement your own set_ptes(),
> > or you'll see entirely the wrong pages mapped into userspace.  You may
> > also wish to implement set_ptes() if it can be done more efficiently
> > than __pte(pte_val(pte) + PAGE_SIZE).
> > 
> > Architectures that implement things like flush_icache_page() and
> > update_mmu_cache() may want to propose batched versions of those.
> > That's alpha, csky, m68k, mips, nios2, parisc, sh,
> > arm, loongarch, openrisc, powerpc, riscv, sparc and xtensa.
> > Maintainers BCC'd, mailing lists CC'd.
> > 
> > I'm happy to collect implementations and submit them as part of a v6.
> 
> Please find below the riscv implementation for set_ptes:
> 
> diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
> index ebee56d47003..10bf812776a4 100644
> --- a/arch/riscv/include/asm/pgtable.h
> +++ b/arch/riscv/include/asm/pgtable.h
> @@ -463,6 +463,20 @@ static inline void set_pte_at(struct mm_struct *mm,
>          __set_pte_at(mm, addr, ptep, pteval);
>   }
> 
> +#define set_ptes set_ptes
> +static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
> +                           pte_t *ptep, pte_t pte, unsigned int nr)
> +{
> +       for (;;) {
> +               set_pte_at(mm, addr, ptep, pte);
> +               if (--nr == 0)
> +                       break;
> +               ptep++;
> +               addr += PAGE_SIZE;
> +               pte = __pte(pte_val(pte) + (1 << _PAGE_PFN_SHIFT));
> +       }
> +}

Given that this is the same code as the original version (surprise!), what 
about doing something like this in the generic code instead:

#ifndef PTE_PAGE_STEP
#define PTE_PAGE_STEP PAGE_SIZE
#endif

[…]

> +               pte = __pte(pte_val(pte) + PTE_PAGE_STEP);

The name of the define is an obvious candidate for bike-shedding, feel free to 
name it as you want.

Or if that isn't sound enough maybe introduce something like:

static inline pte_t
set_ptes_next_pte(pte_t pte)
{
	return __pte(pte_val(pte) + (1 << _PAGE_PFN_SHIFT));
}
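
For illustration, the first variant could look roughly like the following
user-space model. The pte_t, __pte() and pte_val() definitions are
simplified stand-ins, model_set_ptes() is a made-up name, and the
PTE_PAGE_STEP value (PFN field at bit 10) is an assumption chosen only to
show the override taking effect:

```c
#include <assert.h>

/* An architecture header would define this before the generic code... */
#define PTE_PAGE_STEP (1UL << 10)	/* illustrative: PFN field at bit 10 */

/* ...and the generic code falls back to PAGE_SIZE otherwise. */
#define PAGE_SIZE (1UL << 12)
#ifndef PTE_PAGE_STEP
#define PTE_PAGE_STEP PAGE_SIZE
#endif

typedef struct { unsigned long pte; } pte_t;

static inline pte_t __pte(unsigned long val)
{
	pte_t pte = { val };

	return pte;
}

static inline unsigned long pte_val(pte_t pte)
{
	return pte.pte;
}

/* Model of a generic set_ptes(): one loop, per-architecture step. */
static inline void model_set_ptes(pte_t *ptep, pte_t pte, unsigned int nr)
{
	assert(nr > 0);
	for (;;) {
		*ptep = pte;		/* stand-in for set_pte_at() */
		if (--nr == 0)
			break;
		ptep++;
		pte = __pte(pte_val(pte) + PTE_PAGE_STEP);
	}
}
```

This keeps a single copy of the loop, and an architecture only has to
supply the step constant.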

Greetings,

Eike
-- 
Rolf Eike Beer, emlix GmbH, http://www.emlix.com
Phone +49 551 30664-0, Fax +49 551 30664-11
Gothaer Platz 3, 37083 Göttingen, Germany
Registered office: Göttingen, District Court of Göttingen HR B 3160
Management: Heike Jordan, Dr. Uwe Kracke; VAT ID: DE 205 198 055

emlix - smart embedded open source



end of thread, other threads:[~2023-02-20  8:29 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
2023-02-02 21:14 API for setting multiple PTEs at once Matthew Wilcox
2023-02-02 21:48 ` Kirill A. Shutemov
2023-02-02 22:49   ` Matthew Wilcox
2023-02-02 23:27     ` Kirill A. Shutemov
2023-02-07 20:27 ` Matthew Wilcox
2023-02-08 11:23   ` Alexandre Ghiti
2023-02-08 12:09     ` Yin, Fengwei
2023-02-08 13:35       ` Matthew Wilcox
2023-02-14  9:55   ` Alexandre Ghiti
2023-02-20  8:29     ` Rolf Eike Beer
