xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
From: Dario Faggioli <dario.faggioli@citrix.com>
To: Feng Wu <feng.wu@intel.com>
Cc: Kevin Tian <kevin.tian@intel.com>, Keir Fraser <keir@xen.org>,
	George Dunlap <george.dunlap@eu.citrix.com>,
	Andrew Cooper <andrew.cooper3@citrix.com>,
	xen-devel@lists.xen.org, Jan Beulich <jbeulich@suse.com>
Subject: Re: [v4 16/17] vmx: Add some scheduler hooks for VT-d posted interrupts
Date: Tue, 28 Jul 2015 16:15:48 +0200	[thread overview]
Message-ID: <1438092948.2889.98.camel@citrix.com> (raw)
In-Reply-To: <1437651353-5275-17-git-send-email-feng.wu@intel.com>


[-- Attachment #1.1: Type: text/plain, Size: 7080 bytes --]

On Thu, 2015-07-23 at 19:35 +0800, Feng Wu wrote:
> This patch adds the following arch hooks in scheduler:
> - vmx_pre_ctx_switch_pi():
> It is called before context switch, we update the posted
> interrupt descriptor when the vCPU is preempted, go to sleep,
> or is blocked.
> 
> - vmx_post_ctx_switch_pi()
> It is called after context switch, we update the posted
> interrupt descriptor when the vCPU is going to run.
> 
> - arch_vcpu_wake()
> It will be called when waking up the vCPU, we update
> the posted interrupt descriptor when the vCPU is unblocked.
> 
Thanks again for doing this. I'll never get tired to say how much better
I like this, wrt the previous RUNSTATE_* based approach. :-)

> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -1550,9 +1550,19 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
>  
>      set_current(next);
>  
> +    /*
> +     * We need to update posted interrupt descriptor for each context switch,
> +     * hence cannot use the lazy context switch for this.
> +     */
>
Perhaps it's me, but I don't get the comment. Why do you mention "the
lazy context switch"? We can't use it "for this", as opposed to what
other circumstance where we can use it?

> +    if ( !is_idle_vcpu(prev) && prev->arch.pi_ctxt_switch_from )
> +        prev->arch.pi_ctxt_switch_from(prev);
> +
Flip the order in the conditions, maybe? So that guests with
pi_ctxt_switch_from==NULL suffers the least possible overhead?


>      if ( (per_cpu(curr_vcpu, cpu) == next) ||
>           (is_idle_vcpu(next) && cpu_online(cpu)) )
>      {
> +        if ( !is_idle_vcpu(next) && next->arch.pi_ctxt_switch_to )
>
Same as above.

> +            next->arch.pi_ctxt_switch_to(next);
> +
>          local_irq_enable();
>
Another thing: if prev == next --and let's call such vcpu pp-- you go
through both:

    pp->arch.pi_ctxt_switch_from(pp);
    pp->arch.pi_ctxt_switch_to(pp);

Is this really necessary? AFAICT, it reduces to:

    pi_set_sn(&pp->arch.hvm_vmx.pi_desc);
    if ( x2apic_enabled )
        write_atomic(&pp->arch.hvm_vmx.pi_desc->ndst,
                     cpu_physical_id(v->processor));
    else
        write_atomic(&pp->arch.hvm_vmx.pi_desc->ndst,
                     MASK_INSR(cpu_physical_id(v->processor),
                     PI_xAPIC_NDST_MASK));
    pi_clear_sn(pi_desc);

In fact, if the scheduler picked prev again, as the next vcpu to be run,
I expect

    ( cpu_runnable(v) || !test_bit(_VPF_blocked, &v->pause_flags) )

to be true, and hence vmx_pre_ctx_switch_pi() reducing to just a call to
pi_set_sn().

So, either it is the write_atomic() that you absolutely need, which I
don't think, or you really can check whether such situation is
occurring, and avoid going through _pre_ and _post_ back to back.

> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c

> @@ -116,10 +118,17 @@ static int vmx_vcpu_initialise(struct vcpu *v)
>      INIT_LIST_HEAD(&v->arch.hvm_vmx.pi_blocked_vcpu_list);
>      INIT_LIST_HEAD(&v->arch.hvm_vmx.pi_vcpu_on_set_list);
>  
> +    v->arch.hvm_vmx.pi_block_cpu = -1;
> +
> +    spin_lock_init(&v->arch.hvm_vmx.pi_lock);
> +
>      v->arch.schedule_tail    = vmx_do_resume;
>      v->arch.ctxt_switch_from = vmx_ctxt_switch_from;
>      v->arch.ctxt_switch_to   = vmx_ctxt_switch_to;
>  
> +    v->arch.pi_ctxt_switch_from = vmx_pre_ctx_switch_pi;
> +    v->arch.pi_ctxt_switch_to = vmx_post_ctx_switch_pi;
> +
Would it make sense to get the two initialization above with:

  if ( iommu_intpost && is_hvm_vcpu(v) )

and leave them NULL if that is false? I think it works, and it looks
worthwhile to me.

In fact, this would make the operands flip suggested above actually
effective at cutting the overhead for non-PI enabled guests/hosts.

> +static void vmx_pre_ctx_switch_pi(struct vcpu *v)
> +{
> +    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> +    struct pi_desc old, new;
> +    unsigned long flags, gflags;
> +
> +    if ( !iommu_intpost || !is_hvm_vcpu(v) || !has_arch_pdevs(v->domain) )
> +        return;
> +
And, if you do as I suggest in context_switch() and
vmx_vcpu_initialise(), this will reduce to:

    if ( !has_arch_pdevs(v->domain) )
        return;

> +    else if ( test_bit(_VPF_blocked, &v->pause_flags) )
> +    {
> +        /*
> +         * The vCPU is blocking, we need to add it to one of the per pCPU lists.
> +         * We save v->processor to v->arch.hvm_vmx.pi_block_cpu and use it for
> +         * the per-CPU list, we also save it to posted-interrupt descriptor and
> +         * make it as the destination of the wake-up notification event.
> +         */
> +        v->arch.hvm_vmx.pi_block_cpu = v->processor;
> +        spin_lock_irqsave(&per_cpu(pi_blocked_vcpu_lock,
> +                          v->arch.hvm_vmx.pi_block_cpu), flags);
> +        list_add_tail(&v->arch.hvm_vmx.pi_blocked_vcpu_list,
> +                      &per_cpu(pi_blocked_vcpu, v->arch.hvm_vmx.pi_block_cpu));
> +        spin_unlock_irqrestore(&per_cpu(pi_blocked_vcpu_lock,
> +                           v->arch.hvm_vmx.pi_block_cpu), flags);
> +
> +        do {
> +            old.control = new.control = pi_desc->control;
> +
> +            /* Should not block the vCPU if an interrupt was posted for it */
> +
There's no need for the blank line, and I'd say kill it.

> +            if ( pi_test_on(&old) )
> +            {
> +                spin_unlock_irqrestore(&v->arch.hvm_vmx.pi_lock, gflags);
> +                vcpu_unblock(v);
> +                return;
> +            }

> --- a/xen/common/schedule.c
> +++ b/xen/common/schedule.c
> @@ -381,6 +381,8 @@ void vcpu_wake(struct vcpu *v)
>      unsigned long flags;
>      spinlock_t *lock = vcpu_schedule_lock_irqsave(v, &flags);
>  
> +    arch_vcpu_wake(v);
> +
So, in the draft you sent a few days back, this was called at the end of
vcpu_wake(), right before releasing the lock. Now it's at the beginning,
before the scheduler's wakeup routine has a chance to run.

IMO, it feels more natural for it to be at the bottom (i.e., generic
stuff first, arch specific stuff afterwards), and, after a quick
inspection, I don't think I see nothing preventing things to be that
way.

However, I recall you mentioning having issues with such draft, which
are now resolved with this version. Since this is one of the differences
between the two, was it the cause of the issues you were seeing? If yes,
can you elaborate on how and why?

In the end, I'm not too opposed to the hook being at the beginning
rather than at the end, but there has to be a reason, which may well end
up better be stated in a comment...

Thanks and Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

  parent reply	other threads:[~2015-07-28 14:15 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-07-23 11:35 [v4 00/17] Add VT-d Posted-Interrupts support Feng Wu
2015-07-23 11:35 ` [v4 01/17] VT-d Posted-intterrupt (PI) design Feng Wu
2015-07-23 11:35 ` [v4 02/17] Add helper macro for X86_FEATURE_CX16 feature detection Feng Wu
2015-07-23 11:35 ` [v4 03/17] Add cmpxchg16b support for x86-64 Feng Wu
2015-07-24 15:03   ` Jan Beulich
2015-07-23 11:35 ` [v4 04/17] iommu: Add iommu_intpost to control VT-d Posted-Interrupts feature Feng Wu
2015-07-23 14:01   ` Andrew Cooper
2015-07-23 14:05     ` Andrew Cooper
2015-07-24  0:47       ` Wu, Feng
2015-07-23 11:35 ` [v4 05/17] vt-d: VT-d Posted-Interrupts feature detection Feng Wu
2015-07-24 15:05   ` Jan Beulich
2015-07-23 11:35 ` [v4 06/17] vmx: Extend struct pi_desc to support VT-d Posted-Interrupts Feng Wu
2015-07-23 11:35 ` [v4 07/17] vmx: Add some helper functions for Posted-Interrupts Feng Wu
2015-07-23 11:35 ` [v4 08/17] vmx: Initialize VT-d Posted-Interrupts Descriptor Feng Wu
2015-07-23 11:35 ` [v4 09/17] vmx: Suppress posting interrupts when 'SN' is set Feng Wu
2015-07-24 15:11   ` Jan Beulich
2015-07-23 11:35 ` [v4 10/17] vt-d: Extend struct iremap_entry to support VT-d Posted-Interrupts Feng Wu
2015-07-24 15:13   ` Jan Beulich
2015-07-23 11:35 ` [v4 11/17] vt-d: Add API to update IRTE when VT-d PI is used Feng Wu
2015-07-23 13:51   ` Andrew Cooper
2015-07-23 15:52     ` Jan Beulich
2015-07-23 15:55       ` Andrew Cooper
2015-07-23 16:00         ` Jan Beulich
2015-07-23 16:11           ` Andrew Cooper
2015-07-24  0:39     ` Wu, Feng
2015-07-24 15:27   ` Jan Beulich
2015-07-28  7:34     ` Wu, Feng
2015-08-11 10:18       ` Jan Beulich
2015-07-23 11:35 ` [v4 12/17] Update IRTE according to guest interrupt config changes Feng Wu
2015-07-23 11:35 ` [v4 13/17] vmx: posted-interrupt handling when vCPU is blocked Feng Wu
2015-07-23 11:35 ` [v4 14/17] vmx: Properly handle notification event when vCPU is running Feng Wu
2015-07-23 11:35 ` [v4 15/17] arm: add a dummy arch hooks for scheduler Feng Wu
2015-07-23 11:54   ` Julien Grall
2015-07-24  0:39     ` Wu, Feng
2015-07-23 11:58   ` Jan Beulich
2015-07-23 11:35 ` [v4 16/17] vmx: Add some scheduler hooks for VT-d posted interrupts Feng Wu
2015-07-23 12:50   ` Dario Faggioli
2015-07-24  0:49     ` Wu, Feng
2015-07-28 14:15   ` Dario Faggioli [this message]
2015-07-30  2:04     ` Wu, Feng
2015-07-30 18:26       ` Dario Faggioli
2015-08-11 10:23         ` Jan Beulich
2015-07-23 11:35 ` [v4 17/17] VT-d: Dump the posted format IRTE Feng Wu
2015-08-03  1:36 [v4 16/17] vmx: Add some scheduler hooks for VT-d posted interrupts Wu, Feng
2015-08-03 10:02 ` Dario Faggioli
2015-08-05  6:06   ` Wu, Feng

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1438092948.2889.98.camel@citrix.com \
    --to=dario.faggioli@citrix.com \
    --cc=andrew.cooper3@citrix.com \
    --cc=feng.wu@intel.com \
    --cc=george.dunlap@eu.citrix.com \
    --cc=jbeulich@suse.com \
    --cc=keir@xen.org \
    --cc=kevin.tian@intel.com \
    --cc=xen-devel@lists.xen.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).