From: Andy Lutomirski <luto@kernel.org>
To: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@kernel.org>, X86 ML <x86@kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Mel Gorman <mgorman@suse.de>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	Nadav Amit <nadav.amit@gmail.com>, Rik van Riel <riel@redhat.com>,
	Dave Hansen <dave.hansen@intel.com>,
	Arjan van de Ven <arjan@linux.intel.com>,
	Peter Zijlstra <peterz@infradead.org>
Subject: Re: [PATCH v3 05/11] x86/mm: Track the TLB's tlb_gen and update the flushing algorithm
Date: Fri, 23 Jun 2017 08:46:40 -0700
Message-ID: <CALCETrX+B1Xa=0ZjYUNi+aApKPQerVqOt42bgGeNadaZc-c3hw@mail.gmail.com>
In-Reply-To: <20170623084219.k4lrorgtlshej7ri@pd.tnic>

On Fri, Jun 23, 2017 at 1:42 AM, Borislav Petkov <bp@alien8.de> wrote:
> On Thu, Jun 22, 2017 at 11:08:38AM -0700, Andy Lutomirski wrote:
>> Yes, I agree it's confusing.  There really are three numbers.  Those
>> numbers are: the latest generation, the generation that this CPU has
>> caught up to, and the generation that the requester of the flush we're
>> currently handling has asked us to catch up to.  I don't see a way to
>> reduce the complexity.
>
> Yeah, can you pls put that clarification of what is what over it. It
> explains nicely what the check is supposed to do.

Done.  I've tried to improve a bunch of the comments in this function.
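For reference, the three numbers can be summarised as follows. This is a hypothetical user-space sketch, not the patch's code: struct tlb_gens, tlb_up_to_date() and request_is_next_gen() are made up for illustration, and only the field names follow this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of the three generations a flush handler compares. */
struct tlb_gens {
	uint64_t mm_tlb_gen;    /* latest generation: bumped by every flush request on the mm */
	uint64_t local_tlb_gen; /* generation this CPU's TLB has caught up to */
	uint64_t new_tlb_gen;   /* generation the requester of *this* flush asked us to reach */
};

/* Nothing to do: this CPU already reflects every flush requested so far. */
static bool tlb_up_to_date(const struct tlb_gens *g)
{
	assert(g->local_tlb_gen <= g->mm_tlb_gen); /* a CPU can't be ahead of the mm */
	return g->local_tlb_gen == g->mm_tlb_gen;
}

/*
 * The request being handled is exactly the one flush needed to move this
 * CPU forward by one generation.  Only then does its (possibly partial)
 * range describe everything that changed since local_tlb_gen.
 */
static bool request_is_next_gen(const struct tlb_gens *g)
{
	return g->new_tlb_gen == g->local_tlb_gen + 1;
}
```

The third field is what the mm-wide counter alone cannot express: mm_tlb_gen says how far behind this CPU is, while new_tlb_gen says which flush the current IPI actually describes.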

>
>> >> The flush IPI hits after a switch_mm_irqs_off() call notices the
>> >> change from 1 to 2. switch_mm_irqs_off() will do a full flush and
>> >> increment the local tlb_gen to 2, and the IPI handler for the partial
>> >> flush will see local_tlb_gen == mm_tlb_gen - 1 (because local_tlb_gen
>> >> == 2 and mm_tlb_gen == 3) and do a partial flush.
>> >
>> > Why, the 2->3 flush has f->end == TLB_FLUSH_ALL.
>> >
>> > That's why you have this thing in addition to the tlb_gen.
>>
>> Yes.  The idea is that we only do remote partial flushes when it's
>> 100% obvious that it's safe.
>
> So why wouldn't my simplified suggestion work then?
>
>         if (f->end != TLB_FLUSH_ALL &&
>              mm_tlb_gen == local_tlb_gen + 1)
>
> 1->2 is a partial flush - gets promoted to a full one
> 2->3 is a full flush - it will get executed as one due to the f->end setting to
> TLB_FLUSH_ALL.

This could still fail in some cases, I think.  Suppose 1->2 is a
partial flush and 2->3 is a full flush.  We could have this order of
events:

 - CPU 1: Partial flush.  Increase context.tlb_gen to 2 and send IPI.
 - CPU 0: switch_mm(), observe mm_tlb_gen == 2, set local_tlb_gen to 2.
 - CPU 2: Full flush.  Increase context.tlb_gen to 3 and send IPI.
 - CPU 0: Receive partial flush IPI.  mm_tlb_gen == 3 and
   local_tlb_gen == 2, so mm_tlb_gen == local_tlb_gen + 1.  Do
   __flush_tlb_single() and set local_tlb_gen to 3.

Our invariant is now broken: CPU 0's percpu tlb_gen is now ahead of
its actual TLB state.

 - CPU 0: Receive full flush IPI and skip the flush.  Oops.

I think my condition makes it clear that the invariants we need hold
no matter what.
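The ordering above can be replayed in a small single-threaded model. This is a sketch under stated assumptions: cpu_state, handle_flush(), replay_race() and the applied[] bookkeeping are hypothetical, TLB_FLUSH_ALL is only a stand-in for the kernel's marker, and three_gen_cond() is modelled on the idea that a request is handled as a partial flush only if its own target generation (new_tlb_gen) is exactly local_tlb_gen + 1 and also equal to mm_tlb_gen.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sketch only: a single-threaded model, not kernel code. */
#define TLB_FLUSH_ALL UINT64_MAX /* stand-in for the "flush everything" marker */
#define MAX_GEN 8

struct flush_req {
	uint64_t end;         /* TLB_FLUSH_ALL = full flush, anything else = partial */
	uint64_t new_tlb_gen; /* generation the requester wants us to reach */
};

struct cpu_state {
	uint64_t local_tlb_gen;    /* generation this CPU claims to have caught up to */
	bool applied[MAX_GEN + 1]; /* applied[g]: flush for generation g really done */
};

typedef bool (*partial_ok_fn)(const struct flush_req *f,
			      uint64_t local_tlb_gen, uint64_t mm_tlb_gen);

/* The simplified check: looks only at the mm-wide generation. */
static bool simplified_cond(const struct flush_req *f,
			    uint64_t local_tlb_gen, uint64_t mm_tlb_gen)
{
	return f->end != TLB_FLUSH_ALL && mm_tlb_gen == local_tlb_gen + 1;
}

/* Three-number check: this request must be exactly the next generation. */
static bool three_gen_cond(const struct flush_req *f,
			   uint64_t local_tlb_gen, uint64_t mm_tlb_gen)
{
	return f->end != TLB_FLUSH_ALL &&
	       f->new_tlb_gen == local_tlb_gen + 1 &&
	       f->new_tlb_gen == mm_tlb_gen;
}

/* A full flush makes every generation up to mm_tlb_gen effective. */
static void full_flush(struct cpu_state *c, uint64_t mm_tlb_gen)
{
	assert(mm_tlb_gen <= MAX_GEN);
	for (uint64_t g = 0; g <= mm_tlb_gen; g++)
		c->applied[g] = true;
	c->local_tlb_gen = mm_tlb_gen;
}

/* Model of the flush IPI handler. */
static void handle_flush(struct cpu_state *c, const struct flush_req *f,
			 uint64_t mm_tlb_gen, partial_ok_fn partial_ok)
{
	assert(c->local_tlb_gen <= mm_tlb_gen && f->new_tlb_gen <= MAX_GEN);
	if (c->local_tlb_gen == mm_tlb_gen)
		return;                             /* looks up to date: skip */
	if (partial_ok(f, c->local_tlb_gen, mm_tlb_gen)) {
		c->applied[f->new_tlb_gen] = true;  /* only this request's range */
		c->local_tlb_gen = mm_tlb_gen;
	} else {
		full_flush(c, mm_tlb_gen);
	}
}

/* Invariant: every generation up to local_tlb_gen was really flushed. */
static bool invariant_holds(const struct cpu_state *c)
{
	for (uint64_t g = 0; g <= c->local_tlb_gen; g++)
		if (!c->applied[g])
			return false;
	return true;
}

/* Replays the event order from the mail; true if no flush was lost. */
static bool replay_race(partial_ok_fn partial_ok)
{
	struct cpu_state cpu0 = { .local_tlb_gen = 1, .applied = { true, true } };
	uint64_t mm_tlb_gen = 1;

	mm_tlb_gen = 2;                     /* CPU 1: partial flush, gen 2 */
	struct flush_req partial = { .end = 0x1000, .new_tlb_gen = 2 };

	full_flush(&cpu0, mm_tlb_gen);      /* CPU 0: switch_mm() catches up to 2 */

	mm_tlb_gen = 3;                     /* CPU 2: full flush, gen 3 */
	struct flush_req full = { .end = TLB_FLUSH_ALL, .new_tlb_gen = 3 };

	handle_flush(&cpu0, &partial, mm_tlb_gen, partial_ok); /* partial IPI */
	handle_flush(&cpu0, &full, mm_tlb_gen, partial_ok);    /* full IPI */

	return invariant_holds(&cpu0);
}
```

In this model replay_race(simplified_cond) returns false: the partial IPI advances local_tlb_gen to 3 without applying generation 3, and the full IPI is then skipped. replay_race(three_gen_cond) returns true, because new_tlb_gen == 2 is not local_tlb_gen + 1, so the partial request is promoted to a full flush.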

>
>> It could be converted to two full flushes or to just one, I think,
>> depending on what order everything happens in.
>
> Right. One flush at the right time would be optimal.
>
>> But this approach of using three separate tlb_gen values seems to
>> cover all the bases, and I don't think it's *that* bad.
>
> Sure.
>
> As I said in IRC, let's document that complexity then so that when we
> stumble over it in the future, we at least know why it was done this
> way.

I've given it a try.  Hopefully v4 is more clear.
