* [LSF/MM/BPF TOPIC] Page table manipulation primitives
@ 2020-02-06 16:57 Mike Rapoport
  2020-02-06 17:34 ` Matthew Wilcox
  0 siblings, 1 reply; 4+ messages in thread
From: Mike Rapoport @ 2020-02-06 16:57 UTC (permalink / raw)
  To: lsf-pc; +Cc: linux-mm

While updating the architectures to properly use 5-level folded page tables
without <asm-generic/?level-fixup.h> and <asm-generic/pgtable-nop4d-hack.h>
I wondered whether we can do better than explicitly naming each and every
level of the page table, open-coding the traversal of all the layers numerous
times, and keeping copies of do_something_pXd_range().

Then I came across Kirill's "Proof-of-concept: better(?) page-table
manipulation API" [1], but as far as I could see there was no progress
since then.

I'd like to resurrect the topic and see if we can come up with
an actually better page table manipulation API.

[1] https://lore.kernel.org/lkml/20180424154355.mfjgkf47kdp2by4e@black.fi.intel.com/

-- 
Sincerely yours,
Mike.




* Re: [LSF/MM/BPF TOPIC] Page table manipulation primitives
  2020-02-06 16:57 [LSF/MM/BPF TOPIC] Page table manipulation primitives Mike Rapoport
@ 2020-02-06 17:34 ` Matthew Wilcox
  2020-02-07 17:45   ` Kirill A. Shutemov
  0 siblings, 1 reply; 4+ messages in thread
From: Matthew Wilcox @ 2020-02-06 17:34 UTC (permalink / raw)
  To: Mike Rapoport; +Cc: lsf-pc, linux-mm

On Thu, Feb 06, 2020 at 06:57:41PM +0200, Mike Rapoport wrote:
> While updating the architectures to properly use 5-level folded page tables
> without <asm-generic/?level-fixup.h> and <asm-generic/pgtable-nop4d-hack.h>
> I wondered if we can do better than explicitly name each and every level of
> the page table, open-code traversal of all the layers numerous times and
> have copied do_something_pXd_range().
> 
> Then I've come across Kirill's "Proof-of-concept: better(?) page-table
> manipulation API" [1], but as far as I could see there was no progress
> since then.
> 
> I'd like to resurrect the topic and try to see if we can come up with
> actually better page table manipulation API.
> 
> [1] https://lore.kernel.org/lkml/20180424154355.mfjgkf47kdp2by4e@black.fi.intel.com/

I don't think this approach helps support 64k pages on ARM, for example,
so it doesn't solve enough problems to be worth doing.  I'd favour
an interface which looked more like this:

	vpte_iter iter;
	vpte_t vpte;

	vpte_iter_for_each(vpte, iter, start, end, flags) {
		unsigned char order = vpte_order(&iter);
		... do things based on vpte and order ...
	}




* Re: [LSF/MM/BPF TOPIC] Page table manipulation primitives
  2020-02-06 17:34 ` Matthew Wilcox
@ 2020-02-07 17:45   ` Kirill A. Shutemov
  2020-02-07 19:40     ` Matthew Wilcox
  0 siblings, 1 reply; 4+ messages in thread
From: Kirill A. Shutemov @ 2020-02-07 17:45 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Mike Rapoport, lsf-pc, linux-mm

On Thu, Feb 06, 2020 at 09:34:10AM -0800, Matthew Wilcox wrote:
> On Thu, Feb 06, 2020 at 06:57:41PM +0200, Mike Rapoport wrote:
> > While updating the architectures to properly use 5-level folded page tables
> > without <asm-generic/?level-fixup.h> and <asm-generic/pgtable-nop4d-hack.h>
> > I wondered if we can do better than explicitly name each and every level of
> > the page table, open-code traversal of all the layers numerous times and
> > have copied do_something_pXd_range().
> > 
> > Then I've come across Kirill's "Proof-of-concept: better(?) page-table
> > manipulation API" [1], but as far as I could see there was no progress
> > since then.
> > 
> > I'd like to resurrect the topic and try to see if we can come up with
> > actually better page table manipulation API.
> > 
> > [1] https://lore.kernel.org/lkml/20180424154355.mfjgkf47kdp2by4e@black.fi.intel.com/

I played a bit more with it after that, but got distracted by other stuff.
I'll see if I can come up with an update.

> I don't think this approach helps support 64k pages on ARM

Could you specify what such support would require?

> , for example,
> so it doesn't solve enough problems to be worth doing.  I'd favour
> an interface which looked more like this:
> 
> 	vpte_iter iter;
> 	vpte_t vpte;
> 
> 	vpte_iter_for_each(vpte, iter, start, end, flags) {
> 		unsigned char order = vpte_order(&iter);
> 		... do things based on vpte and order ...
> 	}

It looks like just a higher-level API that can be provided on top of my
approach. Maybe it should be the default go-to. But I find it useful to be
able to go into low-level details where it matters.

-- 
 Kirill A. Shutemov



* Re: [LSF/MM/BPF TOPIC] Page table manipulation primitives
  2020-02-07 17:45   ` Kirill A. Shutemov
@ 2020-02-07 19:40     ` Matthew Wilcox
  0 siblings, 0 replies; 4+ messages in thread
From: Matthew Wilcox @ 2020-02-07 19:40 UTC (permalink / raw)
  To: Kirill A. Shutemov; +Cc: Mike Rapoport, lsf-pc, linux-mm

On Fri, Feb 07, 2020 at 08:45:53PM +0300, Kirill A. Shutemov wrote:
> On Thu, Feb 06, 2020 at 09:34:10AM -0800, Matthew Wilcox wrote:
> > On Thu, Feb 06, 2020 at 06:57:41PM +0200, Mike Rapoport wrote:
> > > While updating the architectures to properly use 5-level folded page tables
> > > without <asm-generic/?level-fixup.h> and <asm-generic/pgtable-nop4d-hack.h>
> > > I wondered if we can do better than explicitly name each and every level of
> > > the page table, open-code traversal of all the layers numerous times and
> > > have copied do_something_pXd_range().
> > > 
> > > Then I've come across Kirill's "Proof-of-concept: better(?) page-table
> > > manipulation API" [1], but as far as I could see there was no progress
> > > since then.
> > > 
> > > I'd like to resurrect the topic and try to see if we can come up with
> > > actually better page table manipulation API.
> > > 
> > > [1] https://lore.kernel.org/lkml/20180424154355.mfjgkf47kdp2by4e@black.fi.intel.com/
> 
> I played a bit more with it after that, but got distracted to other stuff.
> I'll see if I'll be able to come up with an update.
> 
> > I don't think this approach helps support 64k pages on ARM
> 
> Could you specify what such support would require?

For 64kB pages with a base 4kB page size, you set a special bit in 16 adjacent
aligned PTEs.  When the MMU sees that bit set, it uses a 64kB TLB entry.  So
I think what we want for a fully generic interface is:

void set_vpte_at(struct mm_struct *, unsigned long addr, vpte_iter *, vpte_t,
		unsigned int order);

(maybe we don't need an 'order' here; perhaps it's embedded in the vpte_iter)
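
To make the "special bit in 16 adjacent aligned PTEs" concrete, here is a
minimal userspace sketch, not kernel code. Everything in it is an assumption
made for illustration: model_pte_t, model_set_vpte(), and the position of the
hint bit (MODEL_PTE_CONT) are hypothetical and only model the idea that an
order-4 mapping is written as 16 base PTEs that all carry the hint.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: a "PTE" is a 64-bit word holding an address plus flags. */
typedef uint64_t model_pte_t;

#define MODEL_PAGE_SHIFT 12		/* 4kB base pages */
#define MODEL_PTE_CONT   (1ULL << 52)	/* assumed position of the hint bit */

/*
 * Map 2^order base pages starting at slot idx of table, backed by physical
 * address pa.  For order > 0 every PTE in the group carries the hint bit.
 * Returns 0 on success, -1 if idx or pa is not aligned to the group size.
 */
static int model_set_vpte(model_pte_t *table, size_t idx, uint64_t pa,
			  unsigned int order)
{
	size_t nr = (size_t)1 << order;
	uint64_t span = (uint64_t)nr << MODEL_PAGE_SHIFT;
	size_t i;

	assert(table != NULL);

	/* Both the slot index and the physical address must be group-aligned. */
	if ((idx & (nr - 1)) || (pa & (span - 1)))
		return -1;

	for (i = 0; i < nr; i++) {
		model_pte_t pte = pa + ((uint64_t)i << MODEL_PAGE_SHIFT);

		if (order)
			pte |= MODEL_PTE_CONT;
		table[idx + i] = pte;
	}
	return 0;
}
```

With order 4 this fills 16 slots (64kB with 4kB base pages); with order 0 it
degenerates to a single plain PTE, which is the property a generic
set_vpte_at() would presumably need so that callers do not special-case the
hint themselves.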

> > , for example,
> > so it doesn't solve enough problems to be worth doing.  I'd favour
> > an interface which looked more like this:
> > 
> > 	vpte_iter iter;
> > 	vpte_t vpte;
> > 
> > 	vpte_iter_for_each(vpte, iter, start, end, flags) {
> > 		unsigned char order = vpte_order(&iter);
> > 		... do things based on vpte and order ...
> > 	}
> 
> It looks like just an higher level API that can be provided over my
> approach. Maybe it should be the default go-to. But I find it useful to be
> able go into low-level details where it is matters.

I think the key difference is that I would not embed the 'order' in the
vpte, but keep it in the iter.  I don't know that every architecture has
the ability to tell from a union { pte_t, pmd_t, pud_t, p4d_t, pgd_t }
which of the levels it is.

Looking at the code you provided, another difference is that your method
involves a recursive call for each level of the page tables.  I'd rather
express these kinds of things as "I would like to iterate over each
page table entry in this range" than "Have I got to the bottom?  If not,
recursively call myself".  IOW vpte_iter_for_each() would work its way
down to the lowest level, and keep track of where it is in the iter,
so when moving to the next entry in the tree, it knows whether to go up
before going sideways, and then down as far as it needs to.
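
As a rough illustration of that non-recursive walk, here is a userspace
sketch over a toy 3-level table with 4 entries per level. None of these names
exist in the kernel: model_node, model_iter, model_iter_init() and
model_iter_next() are hypothetical, and the table layout is an assumption
chosen to keep the example small. The point is only that the iterator keeps
its path and the current order in its own state, goes down through child
tables, and goes up before going sideways when it runs off the end of a table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model of a tiny 3-level table, 4 entries per table. */
#define MODEL_LEVELS 3
#define MODEL_BITS   2
#define MODEL_FANOUT (1u << MODEL_BITS)

struct model_node;
struct model_entry {
	struct model_node *child;	/* non-NULL: lower-level table */
	uint64_t leaf;			/* non-zero: mapping at this level */
};
struct model_node {
	struct model_entry e[MODEL_FANOUT];
};

struct model_iter {
	struct model_node *table[MODEL_LEVELS];	/* path; last slot is the root */
	unsigned int level;	/* level being scanned, 0 = bottom */
	unsigned long addr;	/* next page index to examine */
	unsigned long end;	/* exclusive end of the range */
	unsigned long cur;	/* valid after next() returns 1: entry's page index */
	unsigned int order;	/* valid after next() returns 1: entry's order */
};

static void model_iter_init(struct model_iter *it, struct model_node *root,
			    unsigned long start, unsigned long end)
{
	assert(start <= end && end <= (1ul << (MODEL_BITS * MODEL_LEVELS)));
	it->table[MODEL_LEVELS - 1] = root;
	it->level = MODEL_LEVELS - 1;
	it->addr = start;
	it->end = end;
	it->cur = 0;
	it->order = 0;
}

/* Returns 1 and stores the next present entry in *out, or 0 when done. */
static int model_iter_next(struct model_iter *it, uint64_t *out)
{
	while (it->addr < it->end) {
		unsigned int shift = MODEL_BITS * it->level;
		unsigned long idx = (it->addr >> shift) & (MODEL_FANOUT - 1);
		struct model_entry *e = &it->table[it->level]->e[idx];
		unsigned long next;

		if (e->child && it->level > 0) {	/* go down */
			it->level--;
			it->table[it->level] = e->child;
			continue;
		}

		/* Leaf or hole: work out where the next sibling starts. */
		next = ((it->addr >> shift) + 1) << shift;
		it->cur = (it->addr >> shift) << shift;
		it->order = shift;

		/* Go up while the next address falls outside the current table. */
		while (it->level < MODEL_LEVELS - 1 &&
		       (next >> (MODEL_BITS * (it->level + 1))) !=
		       (it->addr >> (MODEL_BITS * (it->level + 1))))
			it->level++;

		it->addr = next;			/* go sideways */
		if (e->leaf) {
			*out = e->leaf;
			return 1;
		}
	}
	return 0;
}
```

A caller would loop with "while (model_iter_next(&it, &v))" and read it.order
to learn how large the current entry is. Note that the order lives in the
iterator, not in the returned value, which matches the first difference
described above.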

Whatever we come up with, we should be able to collapse away the levels
which aren't needed, and support whatever non-PTE-level TLB orders the
hardware supports without forcing support for those orders on x86 code.

I don't have a good solution for how to express the 'copy_pt_range' in
your example, where we need to iterate two mms at the same time.  Maybe
that's a special iterator which does exactly that.


