All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andrew Cooper <andrew.cooper3@citrix.com>
To: Jan Beulich <jbeulich@suse.com>
Cc: Xen-devel <xen-devel@lists.xenproject.org>,
	"Roger Pau Monné" <roger.pau@citrix.com>, "Wei Liu" <wl@xen.org>
Subject: Re: [PATCH] x86/pv: Flush TLB in response to paging structure changes
Date: Wed, 21 Oct 2020 16:39:13 +0100	[thread overview]
Message-ID: <9fe3d967-6bfe-71ef-6430-029de97dca8c@citrix.com> (raw)
In-Reply-To: <7967fa6e-213d-50e2-87d3-dbd319aa2060@suse.com>

On 21/10/2020 14:56, Jan Beulich wrote:
> On 21.10.2020 15:07, Andrew Cooper wrote:
>> @@ -4037,6 +4037,9 @@ long do_mmu_update(
>>                          break;
>>                      rc = mod_l2_entry(va, l2e_from_intpte(req.val), mfn,
>>                                        cmd == MMU_PT_UPDATE_PRESERVE_AD, v);
>> +                    /* Paging structure maybe changed.  Flush linear range. */
>> +                    if ( !rc )
>> +                        flush_flags_all |= FLUSH_TLB;
>>                      break;
>>  
>>                  case PGT_l3_page_table:
>> @@ -4044,6 +4047,9 @@ long do_mmu_update(
>>                          break;
>>                      rc = mod_l3_entry(va, l3e_from_intpte(req.val), mfn,
>>                                        cmd == MMU_PT_UPDATE_PRESERVE_AD, v);
>> +                    /* Paging structure maybe changed.  Flush linear range. */
>> +                    if ( !rc )
>> +                        flush_flags_all |= FLUSH_TLB;
>>                      break;
>>  
>>                  case PGT_l4_page_table:
>> @@ -4051,27 +4057,28 @@ long do_mmu_update(
>>                          break;
>>                      rc = mod_l4_entry(va, l4e_from_intpte(req.val), mfn,
>>                                        cmd == MMU_PT_UPDATE_PRESERVE_AD, v);
>> -                    if ( !rc && pt_owner->arch.pv.xpti )
>> +                    /* Paging structure maybe changed.  Flush linear range. */
>> +                    if ( !rc )
>>                      {
>> -                        bool local_in_use = false;
>> +                        bool local_in_use = mfn_eq(
>> +                            pagetable_get_mfn(curr->arch.guest_table), mfn);
>>  
>> -                        if ( mfn_eq(pagetable_get_mfn(curr->arch.guest_table),
>> -                                    mfn) )
>> -                        {
>> -                            local_in_use = true;
>> -                            get_cpu_info()->root_pgt_changed = true;
>> -                        }
>> +                        flush_flags_all |= FLUSH_TLB;
>> +
>> +                        if ( local_in_use )
>> +                            flush_flags_local |= FLUSH_TLB | FLUSH_ROOT_PGTBL;
> Aiui here (and in the code consuming the flags) you build upon
> flush_flags_local, when not zero, always being a superset of
> flush_flags_all. I think this is a trap to fall into when later
> wanting to change this code, but as per below this won't hold
> anymore anyway, I think. Hence here I think you want to set
> FLUSH_TLB unconditionally, and above for L3 and L2 you want to
> set it in both variables. Or, if I'm wrong below, a comment to
> that effect may help people avoid falling into this trap.
>
> An alternative would be to have
>
>     flush_flags_local |= (flush_flags_all & FLUSH_TLB);
>
> before doing the actual flush.

Honestly, this is what I meant by stating that the asymmetry is a total
mess.

I originally named all 'remote', but this is even less accurate, it may
still contain the current cpu.

Our matrix of complexity:

* FLUSH_TLB for L2+ structure changes
* FLUSH_TLB_GLOBAL/FLUSH_ROOT_PGTBL for XPTI

with optimisations to skip GLOBAL/ROOT_PGTBL on the local CPU if none of
the updates hit the L4-in-use, and to skip the remote if we hold all
references on the L4.

Everything is complicated because pt_owner may not be current, for
toolstack operations constructing a new domain.

>
>>                          /*
>>                           * No need to sync if all uses of the page can be
>>                           * accounted to the page lock we hold, its pinned
>>                           * status, and uses on this (v)CPU.
>>                           */
>> -                        if ( (page->u.inuse.type_info & PGT_count_mask) >
>> +                        if ( pt_owner->arch.pv.xpti &&
> I assume you've moved this here to avoid the further non-trivial
> checks when possible, but you've not put it around the setting
> of FLUSH_ROOT_PGTBL in flush_flags_local because setting
> ->root_pgt_changed is benign when !XPTI?

No - that was accidental, while attempting to reduce the diff.

>
>> +                             (page->u.inuse.type_info & PGT_count_mask) >
>>                               (1 + !!(page->u.inuse.type_info & PGT_pinned) +
>>                                mfn_eq(pagetable_get_mfn(curr->arch.guest_table_user),
>>                                       mfn) + local_in_use) )
>> -                            sync_guest = true;
>> +                            flush_flags_all |= FLUSH_ROOT_PGTBL;
> This needs to also specify FLUSH_TLB_GLOBAL, or else original
> XPTI behavior gets broken. Since the local CPU doesn't need this,
> the variable may then better be named flush_flags_remote?

See above.  remote is even more confusing than all.

>
> Or if I'm wrong here, the reasoning behind the dropping of the
> global flush in this case needs putting in the description, not
> the least because it would mean the change introducing it went
> too far.
>
>> @@ -4173,18 +4180,36 @@ long do_mmu_update(
>>      if ( va )
>>          unmap_domain_page(va);
>>  
>> -    if ( sync_guest )
>> +    /*
>> +     * Flushing needs to occur for one of several reasons.
>> +     *
>> +     * 1) An update to an L2 or higher occured.  This potentially changes the
>> +     *    pagetable structure, requiring a flush of the linear range.
>> +     * 2) An update to an L4 occured, and XPTI is enabled.  All CPUs running
>> +     *    on a copy of this L4 need refreshing.
>> +     */
>> +    if ( flush_flags_all || flush_flags_local )
> Minor remark: At least in x86 code it is more efficient to use
> | instead of || in such cases, to avoid relying on the compiler
> to carry out this small optimization.

This transformation should not be recommended in general.  There are
cases, including this one, where it is at best, no effect, and at worst,
wrong.

I don't care about people using ancient compilers.  They've got far
bigger (== more impactful) problems than (the absence of) this
transformation, and the TLB flush will dwarf anything the compiler does
here.

However, the hand "optimised" case breaks a compiler spotting that the
entire second clause is actually redundant for now.

I specifically didn't encode the dependency, to avoid subtle bugs
if/when someone alters the logic.

>
>>      {
>> +        cpumask_t *mask = pt_owner->dirty_cpumask;
>> +
>>          /*
>> -         * Force other vCPU-s of the affected guest to pick up L4 entry
>> -         * changes (if any).
>> +         * Local flushing may be asymmetric with remote.  If there is local
>> +         * flushing to do, perform it separately and omit the current CPU from
>> +         * pt_owner->dirty_cpumask.
>>           */
>> -        unsigned int cpu = smp_processor_id();
>> -        cpumask_t *mask = per_cpu(scratch_cpumask, cpu);
>> +        if ( flush_flags_local )
>> +        {
>> +            unsigned int cpu = smp_processor_id();
>> +
>> +            mask = per_cpu(scratch_cpumask, cpu);
>> +            cpumask_copy(mask, pt_owner->dirty_cpumask);
>> +            __cpumask_clear_cpu(cpu, mask);
>> +
>> +            flush_local(flush_flags_local);
>> +        }
>>  
>> -        cpumask_andnot(mask, pt_owner->dirty_cpumask, cpumask_of(cpu));
> I understand you're of the opinion that cpumask_copy() +
> __cpumask_clear_cpu() is more efficient than cpumask_andnot()?
> (I guess I agree for high enough CPU counts.)

Its faster in all cases, even low CPU counts.

~Andrew


  reply	other threads:[~2020-10-21 15:39 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-10-21 13:07 [PATCH] x86/pv: Flush TLB in response to paging structure changes Andrew Cooper
2020-10-21 13:56 ` Jan Beulich
2020-10-21 15:39   ` Andrew Cooper [this message]
2020-10-21 15:55     ` Andrew Cooper
2020-10-21 16:53       ` Andrew Cooper
2020-10-22  7:22     ` Jan Beulich
  -- strict thread matches above, loose matches on Subject: below --
2020-10-20 15:24 Andrew Cooper
2020-10-20 15:44 ` Andrew Cooper
2020-10-20 15:48 ` Jan Beulich
2020-10-20 16:20   ` Andrew Cooper
2020-10-20 17:10     ` Andrew Cooper
2020-10-20 18:46       ` Andrew Cooper
2020-10-21  6:55         ` Jan Beulich
2020-10-21 10:01           ` Andrew Cooper
2020-10-21 10:27             ` Jan Beulich
2020-10-21  5:58 ` Jan Beulich

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=9fe3d967-6bfe-71ef-6430-029de97dca8c@citrix.com \
    --to=andrew.cooper3@citrix.com \
    --cc=jbeulich@suse.com \
    --cc=roger.pau@citrix.com \
    --cc=wl@xen.org \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.