* overzealous TLB flushing by lazy VMAP flushing
@ 2014-08-04 23:23 David Miller
  2014-08-04 23:35   ` David Miller
  0 siblings, 1 reply; 3+ messages in thread
From: David Miller @ 2014-08-04 23:23 UTC (permalink / raw)
  To: sparclinux


Hey Nick,

The lazy VMAP flushing in mm/vmalloc.c seems to make various
assumptions about vmalloc area layout.

In particular it assumes that if there are pending VMAP flushes
in multiple regions managed by vmap/vunmap, it's safe to queue
up a range flush from the lowest such address to the highest
such address.

This assumption does not hold on sparc64, where it causes real
problems, as diagnosed by Christopher (CC:'d).

On sparc64 we have the following regions:

modules		0x010000000 --> 0x0f0000000
openfirmware	0x0f0000000 --> 0x100000000
vmalloc		0x100000000 --> 0x10000000000

So if a module is unloaded and some vfree()'s also occur, the next
lazy VMAP flush will flush a range that covers all of openfirmware.

This will flush the firmware's locked TLB entries, which in turn causes
all sorts of problems.
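
To make the failure concrete, here is a minimal sketch in C of the
min/max accumulation described above. It is not the code in
mm/vmalloc.c; the names addr_range, merge_pending() and
ranges_overlap() are made up for illustration, and only the
openfirmware bounds come from the layout table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Half-open address range [start, end). Hypothetical helper type. */
struct addr_range {
	uint64_t start;
	uint64_t end;
};

/* openfirmware region bounds, taken from the layout table above. */
static const struct addr_range ofw_region = {
	0x0f0000000ULL, 0x100000000ULL
};

/*
 * Hypothetical model of the lazy flusher's accumulation: collapse all
 * pending ranges into one span running from the lowest start address
 * to the highest end address.
 */
static struct addr_range merge_pending(const struct addr_range *pending,
				       size_t n)
{
	struct addr_range merged;
	size_t i;

	assert(n > 0);
	merged = pending[0];
	for (i = 1; i < n; i++) {
		if (pending[i].start < merged.start)
			merged.start = pending[i].start;
		if (pending[i].end > merged.end)
			merged.end = pending[i].end;
	}
	return merged;
}

/* True if two half-open ranges share at least one address. */
static bool ranges_overlap(struct addr_range a, struct addr_range b)
{
	return a.start < b.end && b.start < a.end;
}
```

Feeding this one pending range in the module area (say 0x010100000 to
0x010104000) and one in the vmalloc area (say 0x100200000 to
0x100202000), both invented addresses, yields a merged span of
0x010100000 to 0x100202000. Neither input overlaps ofw_region, but
the merged span covers it completely.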

It is not possible to adjust where these ranges are in order to make
the vmalloc and module ranges be right next to each other.  The
firmware area is fixed, first of all.  Second of all the module area
has to be in the low 4GB because of the code model we compile the
kernel with (all symbols are 32-bit), and we want to use as little of
the sub-4GB area as possible because it has to fit the main kernel
image, modules, and the firmware region.

We could add all sorts of range logic to the flush_tlb_range()
implementation on sparc64, but I really think that the kernel should
not trigger a TLB flush across a range for which it never managed any
mappings.

I also think that the lazy VMAP flusher should be mindful of this for
another reason.  Specifically, issuing such an enormous flush range is
going to be expensive, more expensive than whatever we were gaining by
batching these flushes.

Unlike for userspace mappings, for kernel mappings we can't have a
cutoff for page-by-page flushes and just do a context based TLB flush.
We always have to do page-by-page flushes.  So these huge ranges
really do hurt.
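
A rough back-of-the-envelope sketch of that cost, in C. It assumes
8KB base pages and page-aligned ranges; SKETCH_PAGE_SHIFT and
pages_in_range() are illustrative names, not kernel symbols.

```c
#include <stdint.h>

/* Assumption for this sketch: 8KB base pages. */
#define SKETCH_PAGE_SHIFT 13

/*
 * Number of page-sized TLB flushes needed to cover the half-open
 * range [start, end), assuming both bounds are page aligned.
 */
static uint64_t pages_in_range(uint64_t start, uint64_t end)
{
	return (end - start) >> SKETCH_PAGE_SHIFT;
}
```

Take a made-up 16KB module mapping at 0x010100000 and a made-up 8KB
vmalloc mapping at 0x100200000. Flushed separately they cost 2 + 1 = 3
page flushes. Flushed as one span from 0x010100000 to 0x100202000 they
cost 491649 page flushes, nearly all of them for addresses that never
had a mapping to begin with.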


* Re: overzealous TLB flushing by lazy VMAP flushing
  2014-08-04 23:23 overzealous TLB flushing by lazy VMAP flushing David Miller
@ 2014-08-04 23:35   ` David Miller
  0 siblings, 0 replies; 3+ messages in thread
From: David Miller @ 2014-08-04 23:35 UTC (permalink / raw)
  To: npiggin; +Cc: cat.schulze, sparclinux, linux-kernel

From: David Miller <davem@davemloft.net>
Date: Mon, 04 Aug 2014 16:23:14 -0700 (PDT)

Sorry, I screwed up the lkml CC:, fixing that here.

> Hey Nick,
> 
> The lazy VMAP flushing in mm/vmalloc.c seems to make various
> assumptions about vmalloc area layout.
> 
> In particular it assumes that if there are pending VMAP flushes
> in multiple regions managed by vmap/vunmap, it's safe to queue
> up a range flush from the lowest such address to the highest
> such address.
> 
> This assumption does not hold on sparc64, where it causes real
> problems, as diagnosed by Christopher (CC:'d).
> 
> On sparc64 we have the following regions:
> 
> modules		0x010000000 --> 0x0f0000000
> openfirmware	0x0f0000000 --> 0x100000000
> vmalloc		0x100000000 --> 0x10000000000
> 
> So if a module is unloaded and some vfree()'s also occur, the next
> lazy VMAP flush will flush a range that covers all of openfirmware.
> 
> This will flush the firmware's locked TLB entries, which in turn causes
> all sorts of problems.
> 
> It is not possible to adjust where these ranges are in order to make
> the vmalloc and module ranges be right next to each other.  The
> firmware area is fixed, first of all.  Second of all the module area
> has to be in the low 4GB because of the code model we compile the
> kernel with (all symbols are 32-bit), and we want to use as little of
> the sub-4GB area as possible because it has to fit the main kernel
> image, modules, and the firmware region.
> 
> We could add all sorts of range logic to the flush_tlb_range()
> implementation on sparc64, but I really think that the kernel should
> not trigger a TLB flush across a range for which it never managed any
> mappings.
> 
> I also think that the lazy VMAP flusher should be mindful of this for
> another reason.  Specifically, issuing such an enormous flush range is
> going to be expensive, more expensive than whatever we were gaining by
> batching these flushes.
> 
> Unlike for userspace mappings, for kernel mappings we can't have a
> cutoff for page-by-page flushes and just do a context based TLB flush.
> We always have to do page-by-page flushes.  So these huge ranges
> really do hurt.
