linux-riscv.lists.infradead.org archive mirror
* vmalloc faulting on RISC-V
@ 2020-09-09  6:29 Pekka Enberg
  2020-09-09 18:28 ` Joerg Roedel
  2020-09-09 20:35 ` Palmer Dabbelt
  0 siblings, 2 replies; 4+ messages in thread
From: Pekka Enberg @ 2020-09-09  6:29 UTC (permalink / raw)
  To: Palmer Dabbelt; +Cc: linux-riscv, Joerg Roedel

Hello!

Why does RISC-V need vmalloc faulting in do_page_fault()? If I
understand correctly, some architectures implement it because process
page tables can get out of sync with "init_mm.pgd". How does that
happen on RISC-V?

I am asking because Joerg Roedel recently switched the x86
architecture to a different approach because apparently vmalloc
faulting is error-prone:

commit 7f0a002b5a21302d9f4b29ba83c96cd433ff3769
Author: Joerg Roedel <jroedel@suse.de>
Date:   Mon Jun 1 21:52:40 2020 -0700

    x86/mm: remove vmalloc faulting

    Remove fault handling on vmalloc areas, as the vmalloc code now takes
    care of synchronizing changes to all page-tables in the system.

If RISC-V has the issue of page tables getting out of sync, I think we
should switch to this approach too.

Regards,

- Pekka

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: vmalloc faulting on RISC-V
  2020-09-09  6:29 vmalloc faulting on RISC-V Pekka Enberg
@ 2020-09-09 18:28 ` Joerg Roedel
  2020-09-09 20:35 ` Palmer Dabbelt
  1 sibling, 0 replies; 4+ messages in thread
From: Joerg Roedel @ 2020-09-09 18:28 UTC (permalink / raw)
  To: Pekka Enberg; +Cc: linux-riscv, Palmer Dabbelt

On Wed, Sep 09, 2020 at 09:29:10AM +0300, Pekka Enberg wrote:
> Why does RISC-V need vmalloc faulting in do_page_fault()? If I
> understand correctly, some architectures implement it because process
> page tables can get out of sync with "init_mm.pgd". How does that
> happen on RISC-V?
> 
> I am asking because Joerg Roedel recently switched the x86
> architecture to a different approach because apparently vmalloc
> faulting is error-prone:
> 
> commit 7f0a002b5a21302d9f4b29ba83c96cd433ff3769
> Author: Joerg Roedel <jroedel@suse.de>
> Date:   Mon Jun 1 21:52:40 2020 -0700
> 
>     x86/mm: remove vmalloc faulting
> 
>     Remove fault handling on vmalloc areas, as the vmalloc code now takes
>     care of synchronizing changes to all page-tables in the system.

It is actually not vmalloc-faulting alone that was error-prone, it was
its combination with the (now removed) vmalloc_sync_* interfaces which
had to be called at random places in the kernel. This interface was
removed and replaced by arch_sync_kernel_mappings(), which makes sure
that mappings are synchronized to all page-tables before the vmalloc'ed
pointer is returned.

The above commit removing vmalloc-faulting on x86 had to be (partially)
reverted, because relying on arch_sync_kernel_mappings() alone turned
out to have a race condition.

Please see commit

	4819e15f740e x86/mm/32: Bring back vmalloc faulting on x86_32

for details on that.

Regards,

	Joerg


* Re: vmalloc faulting on RISC-V
  2020-09-09  6:29 vmalloc faulting on RISC-V Pekka Enberg
  2020-09-09 18:28 ` Joerg Roedel
@ 2020-09-09 20:35 ` Palmer Dabbelt
  2020-09-18 12:08   ` Joerg Roedel
  1 sibling, 1 reply; 4+ messages in thread
From: Palmer Dabbelt @ 2020-09-09 20:35 UTC (permalink / raw)
  To: penberg; +Cc: linux-riscv, jroedel

On Tue, 08 Sep 2020 23:29:10 PDT (-0700), penberg@kernel.org wrote:
> Hello!
>
> Why does RISC-V need vmalloc faulting in do_page_fault()? If I
> understand correctly, some architectures implement it because process
> page tables can get out of sync with "init_mm.pgd". How does that
> happen on RISC-V?

RISC-V requires an sfence.vma when upgrading a mapping from invalid to valid,
so we need to do something.  Our two options are to eagerly sfence.vma (IIRC
there's a comment about how one might do so) or to handle the faults that
arise.

On Rocket these faults are unlikely to manifest on the vmalloc region because
we don't speculatively fill the DTLB, memory is in order, and we don't
regularly reference invalid vmalloc pages.  The DTLB does cache invalid
mappings (IIRC it helps with critical path because it decouples the stall logic
from the address calculation logic, both of which are very tight in the M stage
of a canonical 5-stage pipeline) so they can and do show up, just not that
often.

Since the faults are rare and sfence.vma is expensive we decided that it would
be a net win to elide the fences and handle the resulting faults.  IIRC we
don't have any benchmarks to back that up, but intuitively it still smells like
a reasonable decision.

> I am asking because Joerg Roedel recently switched the x86
> architecture to a different approach because apparently vmalloc
> faulting is error-prone:
>
> commit 7f0a002b5a21302d9f4b29ba83c96cd433ff3769
> Author: Joerg Roedel <jroedel@suse.de>
> Date:   Mon Jun 1 21:52:40 2020 -0700
>
>     x86/mm: remove vmalloc faulting
>
>     Remove fault handling on vmalloc areas, as the vmalloc code now takes
>     care of synchronizing changes to all page-tables in the system.
>
> If RISC-V has the issue of page tables getting out of sync, I think we
> should switch to this approach too.

If it's actually error prone then we'll need to switch over, but I'd anticipate
we pay a performance hit on existing hardware so I'd prefer to fix the bugs if
possible.  My guess would be that the bugs are very ISA-specific and the
performance tradeoffs are very implementation-specific, so while the x86 folks
are usually quite solid on these things that may not apply to our use cases.

If there really is some reason shared between RISC-V and x86 that makes this
approach infeasible then we'll have to fix it.  Knowing why x86 changed their
approach would be the first step.


* Re: vmalloc faulting on RISC-V
  2020-09-09 20:35 ` Palmer Dabbelt
@ 2020-09-18 12:08   ` Joerg Roedel
  0 siblings, 0 replies; 4+ messages in thread
From: Joerg Roedel @ 2020-09-18 12:08 UTC (permalink / raw)
  To: Palmer Dabbelt; +Cc: penberg, linux-riscv

On Wed, Sep 09, 2020 at 01:35:11PM -0700, Palmer Dabbelt wrote:
> If there really is some reason shared between RISC-V and x86 that makes this
> approach infeasible then we'll have to fix it.  Knowing why x86 changed their
> approach would be the first step.

On x86 there is a need to sync vmalloc paging updates at one single
level to all page-tables in the system. This is a slow operation and
requires a global spin-lock to be taken, so x86 did not synchronize the
mappings at creation time but used faulting to sync them to other
page-tables.

But this approach is problematic for at least two reasons:

	1) The page-fault handler might also access vmalloc'ed memory,
	   which can then result in recursive faults because the
	   page-fault handler can never finish the synchronization.

	   The primary case where this happened on x86 was with
	   tracing enabled, as the trace-buffer can be vmalloc'ed,
	   leading to recursive faults and an effectively frozen system.

	2) Faulting for synchronization does not work if valid ->
	   invalid changes need to be synchronized. This has happened on
	   x86-32 and caused memory corruption, because the page-table
	   level which needs synchronization (PMD) is also a huge-page
	   mapping level. The ioremap code can use huge-pages, so there
	   is a need to synchronize unmappings.

So x86 worked around these problems by having a function in the kernel
called vmalloc_sync_all() which had to be called at random places to
actively synchronize the mappings. But that never solved the underlying
issues so they popped up again and again in the past.

This was the reason I removed the vmalloc_sync_all() interface [by the
time it was already split into vmalloc_sync_mappings/unmappings()] and
introduced the arch_sync_kernel_mappings() function, which is only
called when a relevant mapping changed and which makes sure the memory
is mapped when vmalloc() returns.

Initially I thought this would also allow removing vmalloc faulting
completely, but that is not the case because of the race condition. But
arch_sync_kernel_mappings() serves as a decent replacement for the old
and even more broken vmalloc_sync_all() interface.

Regards,

	Joerg

