From: Andy Lutomirski <luto@kernel.org>
To: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@kernel.org>, X86 ML <x86@kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Mel Gorman <mgorman@suse.de>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	Nadav Amit <nadav.amit@gmail.com>, Rik van Riel <riel@redhat.com>,
	Dave Hansen <dave.hansen@intel.com>,
	Arjan van de Ven <arjan@linux.intel.com>,
	Peter Zijlstra <peterz@infradead.org>
Subject: Re: [PATCH v3 05/11] x86/mm: Track the TLB's tlb_gen and update the flushing algorithm
Date: Fri, 23 Jun 2017 08:46:40 -0700
Message-ID: <CALCETrX+B1Xa=0ZjYUNi+aApKPQerVqOt42bgGeNadaZc-c3hw@mail.gmail.com>
In-Reply-To: <20170623084219.k4lrorgtlshej7ri@pd.tnic>

On Fri, Jun 23, 2017 at 1:42 AM, Borislav Petkov <bp@alien8.de> wrote:
> On Thu, Jun 22, 2017 at 11:08:38AM -0700, Andy Lutomirski wrote:
>> Yes, I agree it's confusing.  There really are three numbers.  Those
>> numbers are: the latest generation, the generation that this CPU has
>> caught up to, and the generation that the requester of the flush we're
>> currently handling has asked us to catch up to.  I don't see a way to
>> reduce the complexity.
>
> Yeah, can you pls put that clarification what what is, over it. It
> explains it nicely what the check is supposed to do.

Done.  I've tried to improve a bunch of the comments in this function.

>
>> >> The flush IPI hits after a switch_mm_irqs_off() call notices the
>> >> change from 1 to 2. switch_mm_irqs_off() will do a full flush and
>> >> increment the local tlb_gen to 2, and the IPI handler for the partial
>> >> flush will see local_tlb_gen == mm_tlb_gen - 1 (because local_tlb_gen
>> >> == 2 and mm_tlb_gen == 3) and do a partial flush.
>> >
>> > Why, the 2->3 flush has f->end == TLB_FLUSH_ALL.
>> >
>> > That's why you have this thing in addition to the tlb_gen.
>>
>> Yes.  The idea is that we only do remote partial flushes when it's
>> 100% obvious that it's safe.
>
> So why wouldn't my simplified suggestion work then?
>
>	if (f->end != TLB_FLUSH_ALL &&
>	    mm_tlb_gen == local_tlb_gen + 1)
>
> 1->2 is a partial flush - gets promoted to a full one
> 2->3 is a full flush - it will get executed as one due to the f->end setting to
> TLB_FLUSH_ALL.

This could still fail in some cases, I think.  Suppose 1->2 is a
partial flush and 2->3 is a full flush.  We could have this order of
events:

 - CPU 1: Partial flush.  Increase context.tlb_gen to 2 and send IPI.

 - CPU 0: switch_mm(), observe mm_tlb_gen == 2, set local_tlb_gen to 2.

 - CPU 2: Full flush.  Increase context.tlb_gen to 3 and send IPI.

 - CPU 0: Receive partial flush IPI.  mm_tlb_gen == 3 and
   local_tlb_gen == 2, so the simplified check passes.  Do
   __flush_tlb_single() and set local_tlb_gen to 3.

Our invariant is now broken: CPU 0's percpu tlb_gen is now ahead of
its actual TLB state.

 - CPU 0: Receive full flush IPI and skip the flush.  Oops.

I think my condition makes it clear that the invariants we need hold
no matter what.

>
>> It could be converted to two full flushes or to just one, I think,
>> depending on what order everything happens in.
>
> Right. One flush at the right time would be optimal.
>
>> But this approach of using three separate tlb_gen values seems to
>> cover all the bases, and I don't think it's *that* bad.
>
> Sure.
>
> As I said in IRC, let's document that complexity then so that when we
> stumble over it in the future, we at least know why it was done this
> way.

I've given it a try.  Hopefully v4 is more clear.
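To make the ordering above easier to replay, here is a small userspace
model of it.  This is only a sketch, not the kernel code: struct
cpu_model, actual_gen, handle_flush_ipi() and run_scenario() are
hypothetical names invented for the illustration, and the three-value
check is my reading of the "three numbers" described earlier in the
thread (latest generation, this CPU's generation, and the generation the
requester asked for), assuming the request carries that last value in a
new_tlb_gen field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TLB_FLUSH_ALL (~0UL)

/* Hypothetical, trimmed-down flush request. */
struct flush_info {
	unsigned long end;	/* TLB_FLUSH_ALL means "full flush" */
	uint64_t new_tlb_gen;	/* generation the requester wants us at */
};

/* Hypothetical model of one CPU; actual_gen exists only in this model. */
struct cpu_model {
	uint64_t local_tlb_gen;	/* generation the CPU believes it has reached */
	uint64_t actual_gen;	/* generation its TLB contents really reflect */
};

typedef bool (*partial_ok_fn)(const struct flush_info *f,
			      uint64_t mm_tlb_gen, uint64_t local_tlb_gen);

/* The simplified check proposed in the quoted text. */
bool simplified_cond(const struct flush_info *f,
		     uint64_t mm_tlb_gen, uint64_t local_tlb_gen)
{
	return f->end != TLB_FLUSH_ALL && mm_tlb_gen == local_tlb_gen + 1;
}

/* Assumed form of the three-value check: also compare the requested gen. */
bool three_value_cond(const struct flush_info *f,
		      uint64_t mm_tlb_gen, uint64_t local_tlb_gen)
{
	return f->end != TLB_FLUSH_ALL &&
	       f->new_tlb_gen == local_tlb_gen + 1 &&
	       f->new_tlb_gen == mm_tlb_gen;
}

static void handle_flush_ipi(struct cpu_model *cpu, const struct flush_info *f,
			     uint64_t mm_tlb_gen, partial_ok_fn partial_ok)
{
	assert(cpu->local_tlb_gen <= mm_tlb_gen);

	if (cpu->local_tlb_gen == mm_tlb_gen)
		return;		/* believed up to date: skip the flush */

	if (partial_ok(f, mm_tlb_gen, cpu->local_tlb_gen)) {
		/*
		 * A partial flush only covers this request's range, so it
		 * brings the TLB forward only if this request is exactly
		 * the next generation after what the TLB already reflects.
		 */
		if (f->new_tlb_gen == cpu->actual_gen + 1)
			cpu->actual_gen = f->new_tlb_gen;
	} else {
		cpu->actual_gen = mm_tlb_gen;	/* full flush */
	}
	cpu->local_tlb_gen = mm_tlb_gen;
}

/* Replays the event order above; returns true if the invariant survives. */
bool run_scenario(partial_ok_fn partial_ok)
{
	uint64_t mm_tlb_gen = 1;
	struct cpu_model cpu0 = { .local_tlb_gen = 1, .actual_gen = 1 };

	/* CPU 1: partial flush, bump the mm generation to 2, send IPI. */
	struct flush_info partial = { .end = 0x2000UL,
				      .new_tlb_gen = ++mm_tlb_gen };

	/* CPU 0: switch_mm() observes gen 2 and does a full flush. */
	cpu0.actual_gen = mm_tlb_gen;
	cpu0.local_tlb_gen = mm_tlb_gen;

	/* CPU 2: full flush, bump the mm generation to 3, send IPI. */
	struct flush_info full = { .end = TLB_FLUSH_ALL,
				   .new_tlb_gen = ++mm_tlb_gen };

	/* CPU 0: the partial IPI arrives late, then the full one. */
	handle_flush_ipi(&cpu0, &partial, mm_tlb_gen, partial_ok);
	handle_flush_ipi(&cpu0, &full, mm_tlb_gen, partial_ok);

	return cpu0.actual_gen == cpu0.local_tlb_gen;
}
```

In this model, run_scenario(simplified_cond) leaves local_tlb_gen at 3
while the TLB only reflects generation 2, whereas
run_scenario(three_value_cond) turns the late partial request into a full
flush and keeps the two in step.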