From: "Roger Pau Monné" <roger.pau@citrix.com>
To: Jan Beulich <jbeulich@suse.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>,
	Andrew Cooper <andrew.cooper3@citrix.com>, Wei Liu <wl@xen.org>,
	"xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>
Subject: Re: [Xen-devel] [PATCH 2/9] x86: limit the amount of TLB flushing in switch_cr3_cr4()
Date: Thu, 12 Sep 2019 12:38:08 +0200
Message-ID: <20190912103808.muq5xasgjsix2wwi@Air-de-Roger>
In-Reply-To: <738ea7a4-5798-f1dd-65ad-356a0ac453d1@suse.com>

On Thu, Sep 12, 2019 at 12:11:55PM +0200, Jan Beulich wrote:
> On 12.09.2019 11:54, Roger Pau Monné wrote:
> > On Wed, Sep 11, 2019 at 05:22:17PM +0200, Jan Beulich wrote:
> >> We really need to flush the TLB just once, if we do so with or after the
> >> CR3 write. The only case where two flushes are unavoidable is when we
> >> mean to turn off CR4.PGE (perhaps just temporarily; see the code
> >> comment).
> >>
> >> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> > 
> > Thanks, this seems to make the logic of the function easier, but I'm
> > slightly worried about the performance impact given that a full flush
> > of all PCID contexts is done instead of the previous selective flush.
> 
> I think you've misunderstood:
> 
> >> --- a/xen/arch/x86/flushtlb.c
> >> +++ b/xen/arch/x86/flushtlb.c
> >> @@ -104,82 +104,65 @@ static void do_tlb_flush(void)
> >>  void switch_cr3_cr4(unsigned long cr3, unsigned long cr4)
> >>  {
> >>      unsigned long flags, old_cr4;
> >> -    unsigned int old_pcid;
> >>      u32 t;
> >>  
> >> +    /* Throughout this function we make this assumption: */
> >> +    ASSERT(!(cr4 & X86_CR4_PCIDE) || !(cr4 & X86_CR4_PGE));
> >> +
> >>      /* This non-reentrant function is sometimes called in interrupt context. */
> >>      local_irq_save(flags);
> >>  
> >>      t = pre_flush();
> >>  
> >>      old_cr4 = read_cr4();
> >> -    if ( old_cr4 & X86_CR4_PGE )
> >> +    ASSERT(!(old_cr4 & X86_CR4_PCIDE) || !(old_cr4 & X86_CR4_PGE));
> >> +
> >> +    /*
> >> +     * We need to write CR4 before CR3 if we're about to enable PCIDE, at the
> >> +     * very least when the new PCID is non-zero.
> >> +     *
> >> +     * As we also need to do two CR4 writes in total when PGE is enabled and
> >> +     * is to remain enabled, do the one temporarily turning off the bit right
> >> +     * here as well.
> >> +     *
> >> +     * The only TLB flushing effect we depend on here is in case we move from
> >> +     * PGE set to PCIDE set, where we want global page entries gone (and none
> >> +     * to re-appear) after this write.
> >> +     */
> >> +    if ( !(old_cr4 & X86_CR4_PCIDE) &&
> >> +         ((cr4 & X86_CR4_PCIDE) || (cr4 & old_cr4 & X86_CR4_PGE)) )
> >>      {
> >> -        /*
> >> -         * X86_CR4_PGE set means PCID is inactive.
> >> -         * We have to purge the TLB via flipping cr4.pge.
> >> -         */
> >>          old_cr4 = cr4 & ~X86_CR4_PGE;
> >>          write_cr4(old_cr4);
> >>      }
> >> -    else if ( use_invpcid )
> >> -    {
> >> -        /*
> >> -         * Flushing the TLB via INVPCID is necessary only in case PCIDs are
> >> -         * in use, which is true only with INVPCID being available.
> >> -         * Without PCID usage the following write_cr3() will purge the TLB
> >> -         * (we are in the cr4.pge off path) of all entries.
> >> -         * Using invpcid_flush_all_nonglobals() seems to be faster than
> >> -         * invpcid_flush_all(), so use that.
> >> -         */
> >> -        invpcid_flush_all_nonglobals();
> 
> This simply gets moved, while ...
> 
> >> -        /*
> >> -         * CR4.PCIDE needs to be set before the CR3 write below. Otherwise
> >> -         * - the CR3 write will fault when CR3.NOFLUSH is set (which is the
> >> -         *   case normally),
> >> -         * - the subsequent CR4 write will fault if CR3.PCID != 0.
> >> -         */
> >> -        if ( (old_cr4 & X86_CR4_PCIDE) < (cr4 & X86_CR4_PCIDE) )
> >> -        {
> >> -            write_cr4(cr4);
> >> -            old_cr4 = cr4;
> >> -        }
> >> -    }
> >>  
> >>      /*
> >> -     * If we don't change PCIDs, the CR3 write below needs to flush this very
> >> -     * PCID, even when a full flush was performed above, as we are currently
> >> -     * accumulating TLB entries again from the old address space.
> >> -     * NB: Clearing the bit when we don't use PCID is benign (as it is clear
> >> -     * already in that case), but allows the if() to be more simple.
> >> +     * If the CR4 write is to turn off PCIDE, we don't need the CR3 write to
> >> +     * flush anything, as that transition is a full flush itself.
> >>       */
> >> -    old_pcid = cr3_pcid(read_cr3());
> >> -    if ( old_pcid == cr3_pcid(cr3) )
> >> -        cr3 &= ~X86_CR3_NOFLUSH;
> >> -
> >> +    if ( (old_cr4 & X86_CR4_PCIDE) > (cr4 & X86_CR4_PCIDE) )
> >> +        cr3 |= X86_CR3_NOFLUSH;
> >>      write_cr3(cr3);
> >>  
> >>      if ( old_cr4 != cr4 )
> >>          write_cr4(cr4);
> >>  
> >>      /*
> >> -     * Make sure no TLB entries related to the old PCID created between
> >> -     * flushing the TLB and writing the new %cr3 value remain in the TLB.
> >> -     *
> >> -     * The write to CR4 just above has performed a wider flush in certain
> >> -     * cases, which therefore get excluded here. Since that write is
> >> -     * conditional, note in particular that it won't be skipped if PCIDE
> >> -     * transitions from 1 to 0. This is because the CR4 write further up will
> >> -     * have been skipped in this case, as PCIDE and PGE won't both be set at
> >> -     * the same time.
> >> -     *
> >> -     * Note also that PGE is always clear in old_cr4.
> >> +     *  PGE  | PCIDE | flush at
> >> +     * ------+-------+------------------------
> >> +     *  0->0 | 0->0  | CR3 write
> >> +     *  0->0 | 0->1  | n/a (see 1st CR4 write)
> >> +     *  0->x | 1->0  | CR4 write
> >> +     *  x->1 | x->1  | n/a
> >> +     *  0->0 | 1->1  | INVPCID
> >> +     *  0->1 | 0->0  | CR3 and CR4 writes
> >> +     *  1->0 | 0->0  | CR4 write
> >> +     *  1->0 | 0->1  | n/a (see 1st CR4 write)
> >> +     *  1->1 | 0->0  | n/a (see 1st CR4 write)
> >> +     *  1->x | 1->x  | n/a
> >>       */
> >> -    if ( old_pcid != cr3_pcid(cr3) &&
> > 
> > You seem to have dropped all the users of cr3_pcid, I guess the
> > function is not removed because you plan to use it in other sites?
> > 
> >> -         !(cr4 & X86_CR4_PGE) &&
> >> -         (old_cr4 & X86_CR4_PCIDE) <= (cr4 & X86_CR4_PCIDE) )
> >> -        invpcid_flush_single_context(old_pcid);
> >> +    if ( cr4 & X86_CR4_PCIDE )
> >> +        invpcid_flush_all_nonglobals();
> > 
> > Isn't this going to be quite expensive compared to the single PCID
> > flushing done before? (ie: invpcid_flush_single_context vs
> > invpcid_flush_all_nonglobals)
> 
> ... the invpcid_flush_single_context() gets eliminated altogether
> (by doing the main flush _after_ the control register writes).

Oh, thanks, I've certainly missed this move, sorry.
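
For reference, the post-patch control flow can be modelled roughly as
below. This is only a sketch under stated assumptions, not the Xen
code: struct flush_plan and plan_switch() are hypothetical names, only
the PGE and PCIDE bits of CR4 are modelled, and no control register is
actually touched. It merely records which of the steps from the hunk
above would be taken for a given old/new CR4 pair.

```c
#include <assert.h>
#include <stdbool.h>

/* Architectural CR4 bit positions (PGE is bit 7, PCIDE is bit 17). */
#define CR4_PGE   (1UL << 7)
#define CR4_PCIDE (1UL << 17)

/* Hypothetical record of the steps the function would take. */
struct flush_plan {
    bool early_cr4_write; /* CR4 write done before the CR3 write */
    bool cr3_noflush;     /* NOFLUSH forced on for the CR3 write */
    bool late_cr4_write;  /* CR4 write done after the CR3 write */
    bool invpcid_flush;   /* final invpcid_flush_all_nonglobals() */
};

/* Hypothetical model: same conditions as the patch, no hardware access. */
static struct flush_plan plan_switch(unsigned long old_cr4, unsigned long cr4)
{
    struct flush_plan p = { false, false, false, false };

    /* PCIDE and PGE are assumed never to be set at the same time. */
    assert(!(cr4 & CR4_PCIDE) || !(cr4 & CR4_PGE));
    assert(!(old_cr4 & CR4_PCIDE) || !(old_cr4 & CR4_PGE));

    /* Enabling PCIDE, or keeping PGE set: write CR4 (PGE clear) first. */
    if ( !(old_cr4 & CR4_PCIDE) &&
         ((cr4 & CR4_PCIDE) || (cr4 & old_cr4 & CR4_PGE)) )
    {
        old_cr4 = cr4 & ~CR4_PGE;
        p.early_cr4_write = true;
    }

    /* Turning PCIDE off flushes everything, so CR3 need not flush. */
    if ( (old_cr4 & CR4_PCIDE) > (cr4 & CR4_PCIDE) )
        p.cr3_noflush = true;

    /* The second CR4 write happens only if CR4 still differs. */
    if ( old_cr4 != cr4 )
        p.late_cr4_write = true;

    /* With PCIDE left enabled, the main flush comes last. */
    if ( cr4 & CR4_PCIDE )
        p.invpcid_flush = true;

    return p;
}
```

In this model, calling plan_switch(CR4_PCIDE, CR4_PCIDE) reports only
the final INVPCID flush, which corresponds to the "0->0 | 1->1" row of
the table in the new comment.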

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel