All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dario Faggioli <dario.faggioli@citrix.com>
To: Feng Wu <feng.wu@intel.com>, xen-devel@lists.xen.org
Cc: George Dunlap <george.dunlap@eu.citrix.com>,
	Andrew Cooper <andrew.cooper3@citrix.com>,
	Kevin Tian <kevin.tian@intel.com>, Keir Fraser <keir@xen.org>,
	Jan Beulich <jbeulich@suse.com>
Subject: Re: [PATCH v9 15/17] vmx: VT-d posted-interrupt core logic handling
Date: Tue, 10 Nov 2015 19:14:51 +0100	[thread overview]
Message-ID: <1447179291.6409.59.camel@citrix.com> (raw)
In-Reply-To: <1446540207-4806-16-git-send-email-feng.wu@intel.com>


[-- Attachment #1.1: Type: text/plain, Size: 12274 bytes --]

On Tue, 2015-11-03 at 16:43 +0800, Feng Wu wrote:
> This patch includes the following aspects:
> - Handling logic when vCPU is blocked:
>     * Add a global vector to wake up the blocked vCPU
>       when an interrupt is being posted to it (This part
>       was sugguested by Yang Zhang <yang.z.zhang@intel.com>).
>     * Define two per-cpu variables:
>           1. pi_blocked_vcpu:
>             A list storing the vCPUs which were blocked
>             on this pCPU.
> 
>           2. pi_blocked_vcpu_lock:
>             The spinlock to protect pi_blocked_vcpu.
> 
> - Add the following hooks, this part was suggested
>   by George Dunlap <george.dunlap@eu.citrix.com> and
>   Dario Faggioli <dario.faggioli@citrix.com>.
>     * arch_vcpu_block()
>       Called alled before vcpu is blocking and update the PID
>       (posted-interrupt descriptor).
> 
>     * vmx_pi_switch_from()
>       Called before context switch, we update the PID when the
>       vCPU is preempted or going to sleep.
> 
>     * vmx_pi_switch_to()
>       Called after context switch, we update the PID when the vCPU
>       is going to run.
> 
> - Before VM-entry, check the state of PI descriptor, make sure the
> 'NV' field is set to '&posted_intr_vector' when the guest is running
> in non-root mode. Suggested by Jan Beulich <jbeulich@suse.com>.
> 
So, TBH, I never felt in love with this changelog. I'm saying this now
because I'm starting to loose track of things, and the changelog is not
helpful as it could be, in explaining to me what this patch (and, in
this case, this particular version of the patch) does.

In fact, above, you're listing the functions and the data structure you
are adding, and some of the places from where they're called/used.
That, however, can all be figured out from the code.

I think this would want to become a bit more high level, and try to
describe the _reason_why_ the various bits and pieces have been put in
the shape they are. For instance, although you added a paragraph about
the VMENTRY check, it is not clear to me, by just reading this
description, that it is meant at avoiding having to add much more
hooks, as well as, as another example, what you put in the other email
about it needing to be called on ever VMENTRY (the "we can't use
vmx_do_resume() thing).

I'm not saying you should rewrite another design document in this
changelog, and I know it's hard to pick the best structure and content,
but, really, this isn't ideal.

Also, explicitly crediting people who suggested the various items is
not helpful too. In 5 years someone (either of us or new), won't care
much who actually came up with the idea of having hooks. What he'll
care, is for the changelog to provide guidance throughout what the code
does, and why it's been done that way. There is no trace of a similar
crediting scheme (which, in principle, it would apply to any, o at
least to major, comments you get at each review) in our history, and it
makes the changelog heavier to read, so I'd cut it all off.

> ---
> v9:
> - Remove arch_vcpu_block_cancel() and arch_vcpu_wake_prepare()
> - Add vmx_pi_state_change() and call it before VM Entry
> 
And this is why I said I'm losing track of what was your actual aim for
this round.

Since when the possibility of moving stuff to VMX helpers came out

Jan said, during v7 review (about the need to undo what
arch_vcpu_block() does in case local_evnts_need_delivery() is true):

1) "Couldn't this be taken care of by, if necessary, adjusting PI state
    in vmx_do_resume()?"

And George said:

2) "we could just clear SN on every vmexit, and set SN and NDST on 
    every vmentry, couldn't we?  Then we wouldn't need hooks on the 
    context switch path at all (which are only there to catch running
    -> runnable and runnable -> running) -- we could just have the 
    block/unblock hooks (to change NV and add / remove things from the 
    list)."

Then, during v8, review comments focused mostly on 1), because we felt
the need to try to get rid of the block-cancelling hook.

What you seem to have done in this patch, is something in the middle.
In fact, AFAIU:
 - the block-cancelling hook is no longer there,
 - the wakeup hooks are gone as well,
 - the context switch hooks are still there.

Now, I'm not saying that this is not satisfactory as a result, I'm
saying it's hard to keep track of what's going on and to figure out
what your aim was in this round (and hence check the aim against the
review comments, and the implementation against the aim).

If describing that does not fit in the changelog (e.g., because it
would be something about how the code is evolving, as it probably is),
then put it in the cover letter (this very patch is almost the only
thing being changed significantly in the series anyway), or in a few
paragraph, here, after the "---", above the v9 bulleted list of
changes.

> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index e448b31..09c9c08 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -83,7 +83,132 @@ static int vmx_msr_write_intercept(unsigned int
> msr, uint64_t msr_content);

> +void vmx_vcpu_block(struct vcpu *v)
> +{
> +    unsigned long flags;
> +    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> +
> +    if ( !has_arch_pdevs(v->domain) )
> +        return;
> +
> +    ASSERT(v->arch.hvm_vmx.pi_block_cpu == NR_CPUS);
> +
> +    /*
> +     * The vCPU is blocking, we need to add it to one of the per
> pCPU lists.
> +     * We save v->processor to v->arch.hvm_vmx.pi_block_cpu and use
> it for
> +     * the per-CPU list, we also save it to posted-interrupt
> descriptor and
> +     * make it as the destination of the wake-up notification event.
> +     */
> +    v->arch.hvm_vmx.pi_block_cpu = v->processor;
> +
> +    spin_lock_irqsave(&per_cpu(pi_blocked_vcpu_lock,
> +                      v->arch.hvm_vmx.pi_block_cpu), flags);
> +    list_add_tail(&v->arch.hvm_vmx.pi_blocked_vcpu_list,
> +                  &per_cpu(pi_blocked_vcpu, v
> ->arch.hvm_vmx.pi_block_cpu));
> +    spin_unlock_irqrestore(&per_cpu(pi_blocked_vcpu_lock,
> +                           v->arch.hvm_vmx.pi_block_cpu), flags);
> +
> +    ASSERT(!pi_test_sn(pi_desc));
> +
> +    /*
> +     * We don't need to set the 'NDST' field, since it should point
> to
> +     * the same pCPU as v->processor.
> +     */
> +
> +    write_atomic(&pi_desc->nv, pi_wakeup_vector);
>
Glad to see the 'do { } while ( cpmxchg(xxx) );' not being necessary
any longer! :-)

> +}
> +
> +static void vmx_pi_switch_from(struct vcpu *v)
> +{
> +    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> +
> +    if ( !has_arch_pdevs(v->domain) || !iommu_intpost ||
> +         test_bit(_VPF_blocked, &v->pause_flags) )
> +        return;
> +
The '!iommu_intpost' is probably the quickest condition to check. If it
is, it should be the first thing checked.

The same applies in a few other places.

> +static void vmx_pi_state_change(struct vcpu *v)
> +{
FWIW, I don't like the name. _state_change to what?

Looking at the code, it seems to me that what it does it to always sets
the PI descriptor to posted_intr_vector, and removes the vCPU from a
blocking list.

The reason why it does so is because we don't want the vCPU to run
while being in a blocked list, with posted_wakeup_vector in its PI
descriptor.

I'd go for a name that better reflect either the operations being done
or the aim.

> +    unsigned long flags;
> +    unsigned int pi_block_cpu;
> +    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> +
> +    if ( !has_arch_pdevs(v->domain) || !iommu_intpost )
> +        return;
> +
> +    ASSERT(!test_bit(_VPF_blocked, &v->pause_flags));
> +
> +    /*
> +     * Set 'NV' field back to posted_intr_vector, so the
> +     * Posted-Interrupts can be delivered to the vCPU when
> +     * it is running in non-root mode.
> +     */
> +    if ( pi_desc->nv != posted_intr_vector )
> +        write_atomic(&pi_desc->nv, posted_intr_vector);
> +
> +    /* the vCPU is not on any blocking list. */
> +    pi_block_cpu = v->arch.hvm_vmx.pi_block_cpu;
> +    if ( pi_block_cpu == NR_CPUS )
> +        return;
> +
> +    spin_lock_irqsave(&per_cpu(pi_blocked_vcpu_lock, pi_block_cpu),
> flags);
> +
> +    /*
> +     * v->arch.hvm_vmx.pi_block_cpu == NR_CPUS here means the vCPU
> was
> +     * removed from the blocking list while we are acquiring the
> lock.
> +     */
> +    if ( v->arch.hvm_vmx.pi_block_cpu == NR_CPUS )
> +    {
> +        spin_unlock_irqrestore(&per_cpu(pi_blocked_vcpu_lock,
> pi_block_cpu), flags);
> +        return;
> +    }
> +
> +    list_del(&v->arch.hvm_vmx.pi_blocked_vcpu_list);
> +    v->arch.hvm_vmx.pi_block_cpu = NR_CPUS;
> +    spin_unlock_irqrestore(&per_cpu(pi_blocked_vcpu_lock,
> pi_block_cpu), flags);
> +}
> +

> @@ -106,10 +231,18 @@ static int vmx_vcpu_initialise(struct vcpu *v)
>  
>      spin_lock_init(&v->arch.hvm_vmx.vmcs_lock);
>  
> +    INIT_LIST_HEAD(&v->arch.hvm_vmx.pi_blocked_vcpu_list);
> +    INIT_LIST_HEAD(&v->arch.hvm_vmx.pi_vcpu_on_set_list);
> +
> +    v->arch.hvm_vmx.pi_block_cpu = NR_CPUS;
> +
>      v->arch.schedule_tail    = vmx_do_resume;
>      v->arch.ctxt_switch_from = vmx_ctxt_switch_from;
>      v->arch.ctxt_switch_to   = vmx_ctxt_switch_to;
>  
> +    if ( iommu_intpost && has_hvm_container_vcpu(v) )
> +        v->arch.vcpu_block = vmx_vcpu_block;
> +
Genuine question: is it really possible to be here, in
vmx_vcpu_initialise(), with has_hvm_container_vcpu()==false ?

> +/* Handle VT-d posted-interrupt when VCPU is blocked. */
> +static void pi_wakeup_interrupt(struct cpu_user_regs *regs)
> +{
> +    struct arch_vmx_struct *vmx, *tmp;
> +    struct vcpu *v;
> +    spinlock_t *lock = &per_cpu(pi_blocked_vcpu_lock,
> smp_processor_id());
> +    struct list_head *blocked_vcpus =
> +                       &per_cpu(pi_blocked_vcpu,
> smp_processor_id());
> +    LIST_HEAD(list);
> +
> +    ack_APIC_irq();
> +    this_cpu(irq_count)++;
> +
> +    spin_lock(lock);
> +
> +    /*
> +     * XXX: The length of the list depends on how many vCPU is
> current
> +     * blocked on this specific pCPU. This may hurt the interrupt
> latency
> +     * if the list grows to too many entries.
> +     */
> +    list_for_each_entry_safe(vmx, tmp, blocked_vcpus,
> pi_blocked_vcpu_list)
> +    {
> +        if ( pi_test_on(&vmx->pi_desc) )
> +        {
> +            list_del(&vmx->pi_blocked_vcpu_list);
> +            vmx->pi_block_cpu = NR_CPUS;
> +
> +            /*
> +             * We cannot call vcpu_unblock here, since it also needs
> +             * 'pi_blocked_vcpu_lock', we store the vCPUs with ON
> +             * set in another list and unblock them after we release
> +             * 'pi_blocked_vcpu_lock'.
> +             */
>
Is this still true? If yes, is it evident from this patch? I'm asking
because I don't see that here.

I see that it was like that in v7, before of the wakeup hooks... Is
that still the case?

> +            list_add_tail(&vmx->pi_vcpu_on_set_list, &list);
> +        }
> +    }
> +
> +    spin_unlock(lock);
> +
> +    list_for_each_entry_safe(vmx, tmp, &list, pi_vcpu_on_set_list)
> +    {
> +        v = container_of(vmx, struct vcpu, arch.hvm_vmx);
> +        list_del(&vmx->pi_vcpu_on_set_list);
> +        vcpu_unblock(v);
> +    }
> +}
> +

> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
> index 3eefed7..6e5c2f9 100644
> --- a/xen/common/schedule.c
> +++ b/xen/common/schedule.c

> @@ -854,6 +856,7 @@ static long do_poll(struct sched_poll
> *sched_poll)
>  #endif
>  
>      rc = 0;
> +
>      if ( local_events_need_delivery() )
>          goto out;
> 
Just don't add this. :-)

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

  parent reply	other threads:[~2015-11-10 18:14 UTC|newest]

Thread overview: 45+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-11-03  8:43 [PATCH v9 00/17] Add VT-d Posted-Interrupts support Feng Wu
2015-11-03  8:43 ` [PATCH v9 01/17] VT-d Posted-intterrupt (PI) design Feng Wu
2015-11-10 14:52   ` Dario Faggioli
2015-11-24  6:34     ` Tian, Kevin
2015-11-24  6:34   ` Tian, Kevin
2015-11-24  6:49     ` Wu, Feng
2015-11-03  8:43 ` [PATCH v9 02/17] Add cmpxchg16b support for x86-64 Feng Wu
2015-11-03  8:43 ` [PATCH v9 03/17] iommu: Add iommu_intpost to control VT-d Posted-Interrupts feature Feng Wu
2015-11-03  8:43 ` [PATCH v9 04/17] vt-d: VT-d Posted-Interrupts feature detection Feng Wu
2015-11-24  7:14   ` Tian, Kevin
2015-11-03  8:43 ` [PATCH v9 05/17] vmx: Extend struct pi_desc to support VT-d Posted-Interrupts Feng Wu
2015-11-03  8:43 ` [PATCH v9 06/17] vmx: Add some helper functions for Posted-Interrupts Feng Wu
2015-11-24  7:15   ` Tian, Kevin
2015-11-03  8:43 ` [PATCH v9 07/17] vmx: Initialize VT-d Posted-Interrupts Descriptor Feng Wu
2015-11-03  8:43 ` [PATCH v9 08/17] vmx: Suppress posting interrupts when 'SN' is set Feng Wu
2015-11-24  7:29   ` Tian, Kevin
2015-11-24  7:46     ` Wu, Feng
2015-11-24  7:52       ` Tian, Kevin
2015-11-24  7:55         ` Wu, Feng
2015-11-03  8:43 ` [PATCH v9 09/17] VT-d: Remove pointless casts Feng Wu
2015-11-24  7:30   ` Tian, Kevin
2015-11-03  8:43 ` [PATCH v9 10/17] vt-d: Extend struct iremap_entry to support VT-d Posted-Interrupts Feng Wu
2015-11-03  8:43 ` [PATCH v9 11/17] vt-d: Add API to update IRTE when VT-d PI is used Feng Wu
2015-11-24  7:45   ` Tian, Kevin
2015-11-24  7:54     ` Wu, Feng
2015-11-24  7:56       ` Tian, Kevin
2015-11-24  8:52         ` Jan Beulich
2015-11-24 12:05           ` Tian, Kevin
2015-11-03  8:43 ` [PATCH v9 12/17] x86: move some APIC related macros to apicdef.h Feng Wu
2015-11-03  8:43 ` [PATCH v9 13/17] Update IRTE according to guest interrupt config changes Feng Wu
2015-11-03  8:43 ` [PATCH v9 14/17] vmx: Properly handle notification event when vCPU is running Feng Wu
2015-11-03  8:43 ` [PATCH v9 15/17] vmx: VT-d posted-interrupt core logic handling Feng Wu
2015-11-03  9:07   ` Wu, Feng
2015-11-10 16:59     ` Dario Faggioli
2015-11-11  1:43       ` Wu, Feng
2015-11-11 13:29         ` Wu, Feng
2015-11-11 14:46           ` Dario Faggioli
2015-11-10 18:14   ` Dario Faggioli [this message]
2015-11-03  8:43 ` [PATCH v9 16/17] VT-d: Dump the posted format IRTE Feng Wu
2015-11-17 15:55   ` Jan Beulich
2015-11-24  7:50   ` Tian, Kevin
2015-11-03  8:43 ` [PATCH v9 17/17] Add a command line parameter for VT-d posted-interrupts Feng Wu
2015-11-10  7:33 ` [PATCH v9 00/17] Add VT-d Posted-Interrupts support Wu, Feng
2015-11-10 10:39   ` Jan Beulich
2015-11-17 15:39   ` Jan Beulich

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1447179291.6409.59.camel@citrix.com \
    --to=dario.faggioli@citrix.com \
    --cc=andrew.cooper3@citrix.com \
    --cc=feng.wu@intel.com \
    --cc=george.dunlap@eu.citrix.com \
    --cc=jbeulich@suse.com \
    --cc=keir@xen.org \
    --cc=kevin.tian@intel.com \
    --cc=xen-devel@lists.xen.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.