overzealous TLB flushing by lazy VMAP flushing

From: David Miller @ 2014-08-04 23:23 UTC
  To: sparclinux

Hey Nick,

The lazy VMAP flushing in mm/vmalloc.c seems to make various
assumptions about vmalloc area layout.

In particular it assumes that if there are pending VMAP flushes
in multiple regions managed by vmap/vunmap, it's safe to queue
up a range flush from the lowest such address to the highest
such address.

This assumption does not hold in general, and it causes real problems
on sparc64, as diagnosed by Christopher (CC:'d).

On sparc64 we have the following regions:

modules		0x010000000 --> 0x0f0000000
openfirmware	0x0f0000000 --> 0x100000000
vmalloc		0x100000000 --> 0x10000000000

So if a module is unloaded and some vfree()s occur before the next
lazy VMAP flush, that flush will cover a range that includes all of
openfirmware.
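
To make the overlap concrete, here is a minimal userspace sketch of the
lowest-to-highest merge described above, applied to the sparc64 layout.
The struct, the helper names and the two sample mappings are
hypothetical and only for illustration; this is not the code in
mm/vmalloc.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Openfirmware bounds copied from the region table above. */
#define OFW_START 0x0f0000000ULL
#define OFW_END   0x100000000ULL

/* Hypothetical half-open address range: [start, end). */
struct range {
	uint64_t start;
	uint64_t end;
};

/* Collapse all pending ranges into one [lowest start, highest end) span. */
static struct range merge_min_max(const struct range *pending, size_t n)
{
	assert(n > 0);
	struct range merged = pending[0];

	for (size_t i = 1; i < n; i++) {
		if (pending[i].start < merged.start)
			merged.start = pending[i].start;
		if (pending[i].end > merged.end)
			merged.end = pending[i].end;
	}
	return merged;
}

/* True if the two half-open ranges share at least one address. */
static bool ranges_overlap(struct range a, struct range b)
{
	return a.start < b.end && b.start < a.end;
}
```

With one pending range in the module area and one in the vmalloc area,
neither range touches openfirmware on its own, but the merged range
spans the whole firmware region.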

This will flush the firmware's locked TLB entries, which in turn
causes all sorts of problems.

It is not possible to move these ranges so that the vmalloc and module
ranges sit right next to each other.  First, the firmware area is
fixed.  Second, the module area has to be in the low 4GB because of
the code model we compile the kernel with (all symbols are 32-bit),
and we want to use as little of the sub-4GB area as possible because
it has to fit the main kernel image, modules, and the firmware region.

We could add all sorts of range logic to the flush_tlb_range()
implementation on sparc64, but I really think that the kernel should
not trigger a TLB flush across a range for which it never managed any
mappings.
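
For reference, the kind of arch-side range logic mentioned above could
look roughly like the following sketch, which splits a requested range
around the firmware hole.  The function and type names are
hypothetical, and the hole bounds are taken from the region table; it
only illustrates the workaround, not a fix for the underlying issue.

```c
#include <assert.h>
#include <stdint.h>

/* Firmware hole, using the openfirmware bounds from the table above. */
#define HOLE_START 0x0f0000000ULL
#define HOLE_END   0x100000000ULL

/* Hypothetical half-open address range: [start, end). */
struct range {
	uint64_t start;
	uint64_t end;
};

/*
 * Store the parts of 'in' that lie outside the hole into out[] and
 * return how many sub-ranges were written (0, 1 or 2).
 */
static int split_around_hole(struct range in, struct range out[2])
{
	int n = 0;

	assert(in.start <= in.end);

	/* Part below the hole, if any. */
	uint64_t low_end = in.end < HOLE_START ? in.end : HOLE_START;
	if (in.start < low_end)
		out[n++] = (struct range){ in.start, low_end };

	/* Part above the hole, if any. */
	uint64_t high_start = in.start > HOLE_END ? in.start : HOLE_END;
	if (high_start < in.end)
		out[n++] = (struct range){ high_start, in.end };

	return n;
}
```

A caller would then flush each returned sub-range separately.  Note
that this still leaves the flush walking the entire module area below
the hole.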

I also think that the lazy VMAP flusher should be mindful of this for
another reason.  Specifically, issuing such an enormous flush range is
going to be expensive, more expensive than whatever we were gaining by
batching these flushes.

Unlike for userspace mappings, for kernel mappings we can't have a
cutoff for page-by-page flushes and just do a context based TLB flush.
We always have to do page-by-page flushes.  So these huge ranges
really do hurt.
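
A rough way to see the cost is to count the page-by-page flushes for a
merged range versus the ranges that were actually unmapped.  The sketch
below assumes 8KB base pages (PAGE_SHIFT of 13) and page-aligned
addresses; the helper name and the sample sizes are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 8KB base pages. */
#define PAGE_SHIFT 13

/* Number of per-page flushes needed for a page-aligned [start, end). */
static uint64_t pages_in_range(uint64_t start, uint64_t end)
{
	assert(start <= end);
	return (end - start) >> PAGE_SHIFT;
}
```

For an 8KB module mapping at 0x010000000 and a 16KB vmalloc mapping at
0x100000000, flushing the two ranges separately touches 3 pages, while
the merged range from 0x010000000 to 0x100004000 touches 491522 pages.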
