From: Peter Zijlstra <peterz@infradead.org>
To: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: will.deacon@arm.com, aneesh.kumar@linux.vnet.ibm.com,
akpm@linux-foundation.org, npiggin@gmail.com,
linux-arch@vger.kernel.org, linux-mm@kvack.org,
linux-kernel@vger.kernel.org, linux@armlinux.org.uk,
heiko.carstens@de.ibm.com,
Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [PATCH 2/2] s390/tlb: convert to generic mmu_gather
Date: Wed, 19 Sep 2018 18:15:14 +0200
Message-ID: <20180919161514.GK24124@hirez.programming.kicks-ass.net>
In-Reply-To: <20180919162809.30b5c416@mschwideX1>

On Wed, Sep 19, 2018 at 04:28:09PM +0200, Martin Schwidefsky wrote:
> On Wed, 19 Sep 2018 14:38:49 +0200
> Peter Zijlstra <peterz@infradead.org> wrote:
>
> > On Tue, Sep 18, 2018 at 02:51:51PM +0200, Martin Schwidefsky wrote:
> > > + page_table_free_rcu(tlb, (unsigned long *) pte, address);
> >
> > (whitespace damage, fixed)
> >
> > Also, could you perhaps explain the need for that
> > page_table_alloc/page_table_free code? That is, I get the comment about
> > using 2K page-table fragments out of a 4K physical page, but why this
> > custom allocator instead of kmem_cache? It feels like there's a little
> > extra complication, but it's not immediately obvious what.
>
> The kmem_cache code uses the fields of struct page for its tracking.
> pgtable_page_ctor uses the same fields, e.g. for the ptl. Last time
> I tried to convert the page_table_alloc/page_table_free to kmem_cache
> it just crashed. Plus the split of 4K pages into two 2K fragments is
> done on a per-mm basis, which should help a little bit with fragmentation.
Fair enough, thanks for the information.
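
For reference, a rough userspace sketch of the "two 2K fragments per 4K page, tracked per mm" idea. This is not the s390 page_table_alloc()/page_table_free() code: the names frag_mm, pt_page, frag_alloc() and frag_free() are hypothetical, the fixed-size array stands in for whatever per-mm bookkeeping the real allocator uses, and malloc() stands in for the page allocator.

```c
#include <stdint.h>
#include <stdlib.h>

#define PAGE_BYTES 4096u
#define FRAG_BYTES 2048u
#define MAX_PAGES  8u	/* arbitrary limit for this model */

/* One 4K backing page split into two 2K page-table fragments. */
struct pt_page {
	unsigned char *base;	/* NULL if this slot has no page */
	unsigned int used;	/* bit 0: lower half, bit 1: upper half */
};

/* Hypothetical per-mm state: fragments are only shared within one mm. */
struct frag_mm {
	struct pt_page pages[MAX_PAGES];
};

static void *frag_alloc(struct frag_mm *mm)
{
	struct pt_page *empty = NULL;
	unsigned int i;

	for (i = 0; i < MAX_PAGES; i++) {
		struct pt_page *pg = &mm->pages[i];

		if (!pg->base) {
			if (!empty)
				empty = pg;
			continue;
		}
		if (pg->used != 3u) {
			/* Half-used page of this mm: hand out the other half. */
			unsigned int half = (pg->used & 1u) ? 1u : 0u;

			pg->used |= 1u << half;
			return pg->base + half * FRAG_BYTES;
		}
	}
	if (!empty)
		return NULL;
	empty->base = (unsigned char *)malloc(PAGE_BYTES);
	if (!empty->base)
		return NULL;
	empty->used = 1u;	/* lower half goes to the caller */
	return empty->base;
}

static void frag_free(struct frag_mm *mm, void *frag)
{
	uintptr_t p = (uintptr_t)frag;
	unsigned int i;

	for (i = 0; i < MAX_PAGES; i++) {
		struct pt_page *pg = &mm->pages[i];
		uintptr_t base = (uintptr_t)pg->base;

		if (!pg->base || p < base || p >= base + PAGE_BYTES)
			continue;
		pg->used &= ~(1u << ((p - base) / FRAG_BYTES));
		if (!pg->used) {
			/* Both halves free again: give the page back. */
			free(pg->base);
			pg->base = NULL;
		}
		return;
	}
}
```

Two consecutive frag_alloc() calls on the same frag_mm return the two halves of one page; a third call opens a new page.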
> > It's that ASCE limit that makes it impossible to use the generic
> > helpers, right?
>
> There are two problems, one of them is related to the ASCE limit:
>
> 1) s390 supports 4 different page table layouts: 2 levels (2^31 bytes) for 31-bit compat,
> 3 levels (2^42 bytes) as the default for 64-bit, 4 levels (2^53 bytes) if 4 terabytes are
> not enough, and 5 levels (2^64 bytes) for the bragging rights.
> The pxd_free_tlb() functions turn into no-ops if the number of page table levels requires it.
Shiny, I think we (x86) have to choose at boot time which paging mode we
want and have to stick to it.
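
The sizes quoted in 1) follow from the table geometry, and the no-op behaviour is a per-mm runtime check. A small sketch of both, under assumptions not stated in the thread: 4K pages, 256-entry page tables (which is what makes a page table 2K), and 2048-entry segment/region tables. The names mm_levels, level_folded() and pxd_free_tlb_model() are hypothetical and only model the idea; they are not the kernel's helpers.

```c
#include <stdbool.h>

/* Assumed geometry: 12 offset bits, 8 page-table index bits,
 * 11 index bits for every segment/region table above that. */
enum { PAGE_BITS = 12, PTE_INDEX_BITS = 8, CRST_INDEX_BITS = 11 };

/* Virtual address bits covered by a layout with 2..5 levels. */
static unsigned int va_bits(unsigned int levels)
{
	return PAGE_BITS + PTE_INDEX_BITS + (levels - 1) * CRST_INDEX_BITS;
}

/* Hypothetical per-mm state: how many levels this mm really uses. */
struct mm_levels {
	unsigned int levels;
};

/* Intermediate levels that may be folded away. */
enum pt_level { LVL_PMD = 2, LVL_PUD = 3, LVL_P4D = 4 };

/* A level is folded when the mm has no separate table for it. */
static bool level_folded(const struct mm_levels *mm, enum pt_level lvl)
{
	return mm->levels <= (unsigned int)lvl;
}

/* Model of a pxd_free_tlb(): returns false for the no-op case and
 * true when a real table would be handed to the mmu_gather. */
static bool pxd_free_tlb_model(const struct mm_levels *mm, enum pt_level lvl)
{
	if (level_folded(mm, lvl))
		return false;
	/* Real code would queue the table for freeing here. */
	return true;
}
```

With these assumptions va_bits() gives 31, 42, 53 and 64 for 2 to 5 levels, matching the numbers above, and a default 3-level mm frees real pmd tables while its pud and p4d frees do nothing.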
> 2) The mm->context.flush_mm indication.
> That goes back to this beauty in the architecture:
>
> * "A valid table entry must not be changed while it is attached
> * to any CPU and may be used for translation by that CPU except to
> * (1) invalidate the entry by using INVALIDATE PAGE TABLE ENTRY,
> * or INVALIDATE DAT TABLE ENTRY, (2) alter bits 56-63 of a page
> * table entry, or (3) make a change by means of a COMPARE AND SWAP
> * AND PURGE instruction that purges the TLB."
>
> If one CPU is doing a mmu_gather page table operation on the only active thread
> in the system the individual page table updates are done in a lazy fashion with
> simple stores. If a second CPU picks up another thread for execution, the
> attach_count is increased and the page table updates are done with IPTE/IDTE
> from now on. But there might be TLB entries around that are not flushed yet.
> We may *not* let the second CPU see these TLB entries, otherwise the CPU may start an
> instruction, then lose the TLB entry without being able to recreate it. Due to that
> the CPU can end up with a half-finished instruction it can neither roll back nor
> complete, ending in a check-stop. The simplest example is MVC with a length
> of e.g. 256 bytes. The instruction has to complete with all 256 bytes moved,
> or no bytes may have moved at all.
> That is where the mm->context.flush_mm indication comes into play: if the
> second CPU finds the bit set at the time it attaches a thread, it will do
> an IDTE to flush all TLBs for the mm.
Oh man.. what fun. Still, this bit could easily be set in the
__*_free_tlb() functions afaict. But 1) above is reason enough.
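
For reference, a minimal userspace model of the flush_mm protocol as described in 2). It only tracks state and counts events; the struct and function names (mm_flush_model, pte_invalidate(), cpu_attach()) are hypothetical, and the counters stand in for the plain stores, the IPTE/IDTE updates and the IDTE flush mentioned above. It is a sketch of the description, not the s390 implementation.

```c
#include <stdbool.h>

struct mm_flush_model {
	int attach_count;	/* CPUs currently running a thread of this mm */
	bool flush_mm;		/* lazy updates were done, TLB entries may be stale */
	int lazy_stores;	/* updates done with simple stores */
	int ipte_updates;	/* updates done with IPTE/IDTE */
	int full_flushes;	/* IDTE flushes of all TLBs for the mm */
};

/* Invalidate one page table entry during an mmu_gather operation. */
static void pte_invalidate(struct mm_flush_model *mm)
{
	if (mm->attach_count <= 1) {
		/* Only the gathering CPU is attached: a simple store is
		 * enough, but remember that a flush is still owed. */
		mm->lazy_stores++;
		mm->flush_mm = true;
	} else {
		/* Another CPU may translate through this entry, so the
		 * update has to purge the TLB as part of the change. */
		mm->ipte_updates++;
	}
}

/* A CPU picks up a thread of this mm for execution. */
static void cpu_attach(struct mm_flush_model *mm)
{
	mm->attach_count++;
	if (mm->flush_mm) {
		/* Stale entries must not become visible to the new CPU. */
		mm->full_flushes++;
		mm->flush_mm = false;
	}
}
```

In this model the first pte_invalidate() with a single attached CPU sets flush_mm; the next cpu_attach() performs the one full flush, and later invalidations take the IPTE/IDTE path.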
Thanks!
Thread overview:
2018-09-18 12:51 [RFC][PATCH 0/2] convert s390 to generic mmu_gather Martin Schwidefsky
2018-09-18 12:51 ` [PATCH 1/2] asm-generic/tlb: introduce HAVE_MMU_GATHER_NO_GATHER Martin Schwidefsky
2018-09-18 12:51 ` [PATCH 2/2] s390/tlb: convert to generic mmu_gather Martin Schwidefsky
2018-09-19 12:38 ` Peter Zijlstra
2018-09-19 14:28 ` Martin Schwidefsky
2018-09-19 16:15 ` Peter Zijlstra [this message]