From: Peter Zijlstra <peterz@infradead.org>
To: Andy Lutomirski <luto@amacapital.net>
Cc: Nicholas Piggin <npiggin@gmail.com>, Mathieu Desnoyers <mathieu.desnoyers@efficios.com>, Anton Blanchard <anton@ozlabs.org>, Arnd Bergmann <arnd@arndb.de>, linux-arch <linux-arch@vger.kernel.org>, linux-kernel <linux-kernel@vger.kernel.org>, linux-mm <linux-mm@kvack.org>, linuxppc-dev <linuxppc-dev@lists.ozlabs.org>, Andy Lutomirski <luto@kernel.org>, x86 <x86@kernel.org>
Subject: Re: [RFC PATCH 4/7] x86: use exit_lazy_tlb rather than membarrier_mm_sync_core_before_usermode
Date: Thu, 16 Jul 2020 10:50:32 +0200
Message-ID: <20200716085032.GO10769@hirez.programming.kicks-ass.net>
In-Reply-To: <EFAD6E2F-EC08-4EB3-9ECC-2A963C023FC5@amacapital.net>

On Wed, Jul 15, 2020 at 10:18:20PM -0700, Andy Lutomirski wrote:
> > On Jul 15, 2020, at 9:15 PM, Nicholas Piggin <npiggin@gmail.com> wrote:
> >
> >   CPU0                      CPU1
> >                             1. user stuff
> >   a. membarrier()           2. enter kernel
> >   b. read rq->curr          3. rq->curr switched to kthread
> >   c. is kthread, skip IPI   4. switch_to kthread
> >   d. return to user         5. rq->curr switched to user thread
> >                             6. switch_to user thread
> >                             7. exit kernel
> >                             8. more user stuff
>
> I find it hard to believe that this is x86 only. Why would thread
> switch imply core sync on any architecture? Is x86 unique in having a
> stupid expensive core sync that is heavier than smp_mb()?

smp_mb() is nowhere near the most expensive barrier we have in Linux.
mb() might qualify, since that has completion requirements: it needs to
serialize against external actors.

On x86_64 things are rather murky, we have:

  LOCK prefix -- which implies smp_mb() before and after an RmW

  LFENCE -- which used to be rmb-like, until Spectre, and now is
  ISYNC-like. Since ISYNC ensures an empty pipeline, it also ensures all
  loads are retired (and therefore complete), so it implies rmb.
  MFENCE -- which is a memop completion barrier; it makes sure all
  previously issued memops are complete.

If you read that carefully, you'll note you have to use LFENCE + MFENCE
to order against non-memop instructions.

But none of them imply dumping the instruction decoder caches; that only
happens on core serializing instructions like CR3 writes, IRET, CPUID
and a few others. I think we recently got a SERIALIZE instruction to add
to this list.

On ARM64 there's a whole different set of barriers, and again smp_mb()
is nowhere near the top of the list. They have roughly 3 classes:

  ISB    -- instruction sync barrier
  DMB(x) -- memory ordering in domain x
  DSB(x) -- memory completion in domain x

And they have at least 3 domains (IIRC): system, outer, inner.

The ARM64 __switch_to() includes a dsb(sy), just like PowerPC used to
have a SYNC. But since PowerPC is rare in having only one ridiculously
heavy serializing instruction, we got to re-use the smp_mb() early in
__schedule() instead; ARM64 can't do that.

So rather than say that x86 is special here, I'd say that PowerPC is
special here.

> But I’m wondering if all this deferred sync stuff is wrong. In the
> brave new world of io_uring and such, perhaps kernel access matter
> too. Heck, even:

IIRC the membarrier SYNC_CORE use-case is about user-space
self-modifying code. Userspace re-uses a text address and needs
SYNC_CORE before it can be sure the old text is forgotten. Nothing the
kernel does matters there.

I suppose the manpage could be clearer there.