All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Roger Pau Monné" <roger.pau@citrix.com>
To: Jan Beulich <jbeulich@suse.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>,
	"xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>,
	Wei Liu <wl@xen.org>, Andrew Cooper <andrew.cooper3@citrix.com>
Subject: Re: [Xen-devel] [PATCH 2/9] x86: limit the amount of TLB flushing in switch_cr3_cr4()
Date: Thu, 12 Sep 2019 11:54:20 +0200	[thread overview]
Message-ID: <20190912095420.shrhi7prduwjmyuk@Air-de-Roger> (raw)
In-Reply-To: <c842ae3e-962d-0272-eff6-c5f517fee7c9@suse.com>

On Wed, Sep 11, 2019 at 05:22:17PM +0200, Jan Beulich wrote:
> We really need to flush the TLB just once, if we do so with or after the
> CR3 write. The only case where two flushes are unavoidable is when we
> mean to turn off CR4.PGE (perhaps just temporarily; see the code
> comment).
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Thanks, this seems to make the logic of the function easier, but I'm
slightly worried about the performance impact given that a full flush
of all PCID contexts is done instead of the previous selective flush.

> --- a/xen/arch/x86/flushtlb.c
> +++ b/xen/arch/x86/flushtlb.c
> @@ -104,82 +104,65 @@ static void do_tlb_flush(void)
>  void switch_cr3_cr4(unsigned long cr3, unsigned long cr4)
>  {
>      unsigned long flags, old_cr4;
> -    unsigned int old_pcid;
>      u32 t;
>  
> +    /* Throughout this function we make this assumption: */
> +    ASSERT(!(cr4 & X86_CR4_PCIDE) || !(cr4 & X86_CR4_PGE));
> +
>      /* This non-reentrant function is sometimes called in interrupt context. */
>      local_irq_save(flags);
>  
>      t = pre_flush();
>  
>      old_cr4 = read_cr4();
> -    if ( old_cr4 & X86_CR4_PGE )
> +    ASSERT(!(old_cr4 & X86_CR4_PCIDE) || !(old_cr4 & X86_CR4_PGE));
> +
> +    /*
> +     * We need to write CR4 before CR3 if we're about to enable PCIDE, at the
> +     * very least when the new PCID is non-zero.
> +     *
> +     * As we also need to do two CR4 writes in total when PGE is enabled and
> +     * is to remain enabled, do the one temporarily turning off the bit right
> +     * here as well.
> +     *
> +     * The only TLB flushing effect we depend on here is in case we move from
> +     * PGE set to PCIDE set, where we want global page entries gone (and none
> +     * to re-appear) after this write.
> +     */
> +    if ( !(old_cr4 & X86_CR4_PCIDE) &&
> +         ((cr4 & X86_CR4_PCIDE) || (cr4 & old_cr4 & X86_CR4_PGE)) )
>      {
> -        /*
> -         * X86_CR4_PGE set means PCID is inactive.
> -         * We have to purge the TLB via flipping cr4.pge.
> -         */
>          old_cr4 = cr4 & ~X86_CR4_PGE;
>          write_cr4(old_cr4);
>      }
> -    else if ( use_invpcid )
> -    {
> -        /*
> -         * Flushing the TLB via INVPCID is necessary only in case PCIDs are
> -         * in use, which is true only with INVPCID being available.
> -         * Without PCID usage the following write_cr3() will purge the TLB
> -         * (we are in the cr4.pge off path) of all entries.
> -         * Using invpcid_flush_all_nonglobals() seems to be faster than
> -         * invpcid_flush_all(), so use that.
> -         */
> -        invpcid_flush_all_nonglobals();
> -
> -        /*
> -         * CR4.PCIDE needs to be set before the CR3 write below. Otherwise
> -         * - the CR3 write will fault when CR3.NOFLUSH is set (which is the
> -         *   case normally),
> -         * - the subsequent CR4 write will fault if CR3.PCID != 0.
> -         */
> -        if ( (old_cr4 & X86_CR4_PCIDE) < (cr4 & X86_CR4_PCIDE) )
> -        {
> -            write_cr4(cr4);
> -            old_cr4 = cr4;
> -        }
> -    }
>  
>      /*
> -     * If we don't change PCIDs, the CR3 write below needs to flush this very
> -     * PCID, even when a full flush was performed above, as we are currently
> -     * accumulating TLB entries again from the old address space.
> -     * NB: Clearing the bit when we don't use PCID is benign (as it is clear
> -     * already in that case), but allows the if() to be more simple.
> +     * If the CR4 write is to turn off PCIDE, we don't need the CR3 write to
> +     * flush anything, as that transition is a full flush itself.
>       */
> -    old_pcid = cr3_pcid(read_cr3());
> -    if ( old_pcid == cr3_pcid(cr3) )
> -        cr3 &= ~X86_CR3_NOFLUSH;
> -
> +    if ( (old_cr4 & X86_CR4_PCIDE) > (cr4 & X86_CR4_PCIDE) )
> +        cr3 |= X86_CR3_NOFLUSH;
>      write_cr3(cr3);
>  
>      if ( old_cr4 != cr4 )
>          write_cr4(cr4);
>  
>      /*
> -     * Make sure no TLB entries related to the old PCID created between
> -     * flushing the TLB and writing the new %cr3 value remain in the TLB.
> -     *
> -     * The write to CR4 just above has performed a wider flush in certain
> -     * cases, which therefore get excluded here. Since that write is
> -     * conditional, note in particular that it won't be skipped if PCIDE
> -     * transitions from 1 to 0. This is because the CR4 write further up will
> -     * have been skipped in this case, as PCIDE and PGE won't both be set at
> -     * the same time.
> -     *
> -     * Note also that PGE is always clear in old_cr4.
> +     *  PGE  | PCIDE | flush at
> +     * ------+-------+------------------------
> +     *  0->0 | 0->0  | CR3 write
> +     *  0->0 | 0->1  | n/a (see 1st CR4 write)
> +     *  0->x | 1->0  | CR4 write
> +     *  x->1 | x->1  | n/a
> +     *  0->0 | 1->1  | INVPCID
> +     *  0->1 | 0->0  | CR3 and CR4 writes
> +     *  1->0 | 0->0  | CR4 write
> +     *  1->0 | 0->1  | n/a (see 1st CR4 write)
> +     *  1->1 | 0->0  | n/a (see 1st CR4 write)
> +     *  1->x | 1->x  | n/a
>       */
> -    if ( old_pcid != cr3_pcid(cr3) &&

You seem to have dropped all the users of cr3_pcid, I guess the
function is not removed because you plan to use it in other sites?

> -         !(cr4 & X86_CR4_PGE) &&
> -         (old_cr4 & X86_CR4_PCIDE) <= (cr4 & X86_CR4_PCIDE) )
> -        invpcid_flush_single_context(old_pcid);
> +    if ( cr4 & X86_CR4_PCIDE )
> +        invpcid_flush_all_nonglobals();

Isn't this going to be quite expensive compared to the single PCID
flushing done before? (ie: invpcid_flush_single_context vs
invpcid_flush_all_nonglobals)

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

  reply	other threads:[~2019-09-12  9:54 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-09-11 15:15 [Xen-devel] [PATCH RESEND/PING 0/9] XSA-292 follow-up Jan Beulich
2019-09-11 15:21 ` [Xen-devel] [PATCH 1/9] x86: adjust cr3_pcid() return type Jan Beulich
2019-09-12  9:19   ` Roger Pau Monné
2019-09-11 15:22 ` [Xen-devel] [PATCH 2/9] x86: limit the amount of TLB flushing in switch_cr3_cr4() Jan Beulich
2019-09-12  9:54   ` Roger Pau Monné [this message]
2019-09-12 10:11     ` Jan Beulich
2019-09-12 10:38       ` Roger Pau Monné
2019-09-11 15:22 ` [Xen-devel] [PATCH 3/9] x86/mm: honor opt_pcid also for 32-bit PV domains Jan Beulich
2019-09-12 10:34   ` Roger Pau Monné
2019-09-12 10:45     ` Jan Beulich
2019-09-11 15:23 ` [Xen-devel] [PATCH 4/9] x86/HVM: move NOFLUSH handling out of hvm_set_cr3() Jan Beulich
2019-09-12 11:35   ` Roger Pau Monné
2019-09-12 11:52     ` Jan Beulich
2019-09-12 14:44       ` Roger Pau Monné
2019-09-12 14:47         ` Jan Beulich
2019-09-12 15:42           ` Roger Pau Monné
2019-09-12 15:52             ` Jan Beulich
2019-09-11 15:24 ` [Xen-devel] [PATCH 5/9] x86/HVM: refuse CR3 loads with reserved (upper) bits set Jan Beulich
2019-09-12 11:45   ` Roger Pau Monné
2019-09-12 12:01     ` Jan Beulich
2019-09-11 15:25 ` [Xen-devel] [PATCH 6/9] x86/HVM: relax shadow mode check in hvm_set_cr3() Jan Beulich
2019-09-12 14:50   ` Roger Pau Monné
2019-09-11 15:25 ` [Xen-devel] [PATCH 7/9] x86/HVM: cosmetics to hvm_set_cr3() Jan Beulich
2019-09-12 15:04   ` Roger Pau Monné
2019-09-11 15:26 ` [Xen-devel] [PATCH 8/9] x86/CPUID: drop INVPCID dependency on PCID Jan Beulich
2019-09-12 15:11   ` Roger Pau Monné
2019-09-11 15:26 ` [Xen-devel] [PATCH 9/9] x86: PCID is unused when !PV Jan Beulich
2019-09-12 15:31   ` Roger Pau Monné
2019-09-12 15:46     ` Jan Beulich
2019-09-12 15:48     ` Jan Beulich
2019-09-12 15:57       ` Roger Pau Monné
  -- strict thread matches above, loose matches on Subject: below --
2019-05-02 11:35 [PATCH 0/9] XSA-292 follow-up Jan Beulich
2019-05-02 12:19 ` [Xen-devel] [PATCH 2/9] x86: limit the amount of TLB flushing in switch_cr3_cr4() Jan Beulich
2019-05-02 12:19   ` Jan Beulich

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190912095420.shrhi7prduwjmyuk@Air-de-Roger \
    --to=roger.pau@citrix.com \
    --cc=George.Dunlap@eu.citrix.com \
    --cc=andrew.cooper3@citrix.com \
    --cc=jbeulich@suse.com \
    --cc=wl@xen.org \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.