From: Russell King <rmk@arm.linux.org.uk>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>,
	Avi Kivity <avi@redhat.com>, Thomas Gleixner <tglx@linutronix.de>,
	Rik van Riel <riel@redhat.com>, Ingo Molnar <mingo@elte.hu>,
	akpm@linux-foundation.org,
	Linus Torvalds <torvalds@linux-foundation.org>,
	linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org,
	linux-mm@kvack.org,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	David Miller <davem@davemloft.net>,
	Hugh Dickins <hugh.dickins@tiscali.co.uk>,
	Mel Gorman <mel@csn.ul.ie>, Nick Piggin <npiggin@kernel.dk>,
	Paul McKenney <paulmck@linux.vnet.ibm.com>,
	Yanmin Zhang <yanmin_zhang@linux.intel.com>,
	"Luck,Tony" <tony.luck@intel.com>,
	PaulMundt <lethal@linux-sh.org>,
	Chris Metcalf <cmetcalf@tilera.com>
Subject: Re: [PATCH 06/17] arm: mmu_gather rework
Date: Mon, 28 Feb 2011 11:59:07 +0000	[thread overview]
Message-ID: <20110228115907.GB492@flint.arm.linux.org.uk> (raw)
In-Reply-To: <1298893487.2428.10537.camel@twins>

On Mon, Feb 28, 2011 at 12:44:47PM +0100, Peter Zijlstra wrote:
> Right, so the normal case is:
> 
>   unmap_region()
>     tlb_gather_mmu()

The fullmm argument is important here as it specifies the mode.
      tlb_gather_mmu(, 0)

>     unmap_vmas()
>       for (; vma; vma = vma->vm_next)
>         unmap_page_range()
>           tlb_start_vma() -> flush cache range
>           zap_*_range()
>             ptep_get_and_clear_full() -> batch/track external tlbs
>             tlb_remove_tlb_entry() -> batch/track external tlbs
>             tlb_remove_page() -> track range/batch page
>           tlb_end_vma() -> flush tlb range

       tlb_finish_mmu() -> nothing

> 
>  [ for architectures that have hardware page table walkers
>    concurrent faults can still load the page tables ]
> 
>     free_pgtables()

        tlb_gather_mmu(, 1)

>       while (vma)
>         unlink_*_vma()
>         free_*_range()
>           *_free_tlb()
>     tlb_finish_mmu()

      tlb_finish_mmu() -> flush tlb mm

> 
>   free vmas

So this is all fine.  Note that we *don't* use the range stuff here.

> Now, if we want to track ranges _and_ have hardware page table walkers
> (ARM seems to be one such), we must flush TLBs at tlb_end_vma() because
> flush_tlb_range() requires a vma pointer (ARM and TILE actually use more
> than ->vm_mm), and on tlb_finish_mmu() issue a full mm wide invalidate
> because the hardware walker could have re-populated the cache after
> clearing the PTEs but before freeing the page tables.

No.  The hardware walker won't re-populate the TLB after the page table
entries have been cleared - where would it get this information from if
not from the page tables?

> What ARM does is it retains the last vma pointer and tracks
> pte_free_tlb() range and uses that in tlb_finish_mmu(), which is a tad
> hacky.

It may be hacky but then the TLB shootdown interface is hacky too.  We
don't keep the vma around to re-use after tlb_end_vma() - if you think
that then you misunderstand what's going on.  The vma pointer is kept
around as a cheap way of allowing tlb_finish_mmu() to distinguish
between the unmap_region() mode and the shift_arg_pages() mode.

> Mostly because of shift_arg_pages(), where we have:
> 
>   shift_arg_pages()
>     tlb_gather_mmu()

      tlb_gather_mmu(, 0)

>     free_*_range()
>     tlb_finish_mmu()

      tlb_finish_mmu() does nothing without the ARM change.
      tlb_finish_mmu() -> flush_tlb_mm() with the ARM change.

And this is where the bug was - these page table entries could find
their way into the TLB and persist after they've been torn down.

> For which ARM now punts and does a full tlb invalidate (no vma pointer).
> But also because it drags along that vma pointer, which might not at all
> match the range its actually going to invalidate (and hence its vm_flags
> might not accurately reflect things -- at worst more expensive than
> needed).

Where do you get that from?  Where exactly in the above code would the
VMA pointer get set?  In this case, it will be NULL, so we do a
flush_tlb_mm() for this case.  We have to - we don't have any VMA to
deal with at this point.

> The reason I wanted flush_tlb_range() to take an mm_struct and not the
> current vm_area_struct is because we can avoid doing the
> flush_tlb_range() from tlb_end_vma() and delay the thing until
> tlb_finish_mmu() without having to resort to such games as above. We
> could simply track the full range over all VMAs and free'd page-tables
> and do one range invalidate.

No.  That's stupid.  Consider the case where you have to loop one page
at a time over the range (we do on ARM.)  If we ended up with your
suggestion above, that means we could potentially have to loop 4K at a
time over 3GB of address space.  That's idiotic when we have an
instruction which can flush the entire TLB for a particular thread.

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:

