Re: Ideas Re: [PATCH v14 1/2] vmx: VT-d posted-interrupt core logic handling

From: "Wu, Feng" <feng.wu@intel.com>
To: George Dunlap <george.dunlap@citrix.com>,
	Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
	George Dunlap <George.Dunlap@eu.citrix.com>
Cc: "Tian, Kevin" <kevin.tian@intel.com>, Keir Fraser <keir@xen.org>,
	Andrew Cooper <andrew.cooper3@citrix.com>,
	Dario Faggioli <dario.faggioli@citrix.com>,
	"xen-devel@lists.xen.org" <xen-devel@lists.xen.org>,
	Jan Beulich <jbeulich@suse.com>, "Wu, Feng" <feng.wu@intel.com>
Subject: Re: Ideas Re: [PATCH v14 1/2] vmx: VT-d posted-interrupt core logic handling
Date: Tue, 8 Mar 2016 13:10:47 +0000
Message-ID: <E959C4978C3B6342920538CF579893F00C367545@SHSMSX104.ccr.corp.intel.com>
In-Reply-To: <56DEBF4B.7060606@citrix.com>

> -----Original Message-----
> From: George Dunlap [mailto:george.dunlap@citrix.com]
> Sent: Tuesday, March 8, 2016 8:02 PM
> To: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>; George Dunlap
> <George.Dunlap@eu.citrix.com>
> Cc: Wu, Feng <feng.wu@intel.com>; Tian, Kevin <kevin.tian@intel.com>; Keir
> Fraser <keir@xen.org>; Andrew Cooper <andrew.cooper3@citrix.com>; Dario
> Faggioli <dario.faggioli@citrix.com>; xen-devel@lists.xen.org; Jan Beulich
> <jbeulich@suse.com>
> Subject: Re: [Xen-devel] Ideas Re: [PATCH v14 1/2] vmx: VT-d posted-interrupt
> core logic handling
> 
> On 07/03/16 15:53, Konrad Rzeszutek Wilk wrote:
> > On Mon, Mar 07, 2016 at 11:21:33AM +0000, George Dunlap wrote:
> >> On Fri, Mar 4, 2016 at 10:00 PM, Konrad Rzeszutek Wilk
> >> <konrad.wilk@oracle.com> wrote:
> >>>> +/* Handle VT-d posted-interrupt when VCPU is blocked. */
> >>>> +static void pi_wakeup_interrupt(struct cpu_user_regs *regs)
> >>>> +{
> >>>> +    struct arch_vmx_struct *vmx, *tmp;
> >>>> +    spinlock_t *lock = &per_cpu(vmx_pi_blocking, smp_processor_id()).lock;
> >>>> +    struct list_head *blocked_vcpus =
> >>>> +             &per_cpu(vmx_pi_blocking, smp_processor_id()).list;
> >>>> +
> >>>> +    ack_APIC_irq();
> >>>> +    this_cpu(irq_count)++;
> >>>> +
> >>>> +    spin_lock(lock);
> >>>> +
> >>>> +    /*
> >>>> +     * XXX: The length of the list depends on how many vCPU is current
> >>>> +     * blocked on this specific pCPU. This may hurt the interrupt latency
> >>>> +     * if the list grows to too many entries.
> >>>> +     */
> >>>> +    list_for_each_entry_safe(vmx, tmp, blocked_vcpus, pi_blocking.list)
> >>>> +    {
> >>>
> >>>
> >>> My recollection of the 'most-horrible' case of this being really bad is when
> >>> the scheduler puts the vCPU0 and VCPU1 of the guest on the same pCPU (as an example)
> >>> and they round-robin all the time.
> >>>
> >>> <handwaving>
> >>> Would it be perhaps possible to have an anti-affinity flag to deter the
> >>> scheduler from this? That is whichever struct vcpu has 'anti-affinity' flag
> >>> set - the scheduler will try as much as it can _to not_ schedule the 'struct vcpu'
> >>> if the previous 'struct vcpu' had this flag as well on this pCPU?
> >>
> >> Well having vcpus from the same guest on the same pcpu is problematic
> >> for a number of reasons -- spinlocks first and foremost.  So in
> >> general trying to avoid that would be useful for most guests.
> >
> > PV ticketlocks in HVM and PV guests make this "manageable".
> >
> >>
> >> The thing with scheduling is that it's a bit like economics: it seems
> >> simple but it's actually not at all obvious what the emergent behavior
> >> will be from adding a simple rule. :-)
> >
> > <nods>
> >>
> >> On the whole it seems unlikely that having two vcpus on a single pcpu
> >> is a "stable" situation -- it's likely to be pretty transient, and
> >> thus not have a major impact on performance.
> >
> > Except that we are concerned with it - in fact we are disabling this
> > feature because it may happen. How do we make sure it does not happen
> > all the time? Or at least do some back-off if things do get
> > in this situation.
> 
> So it's disabled by default based on a theoretical fear that it *may*
> cause performance problems, but without any actual performance problems
> having been observed?

Yes. According to Jan's comments in the previous thread, the list may in
theory become very long, so he tends to prefer keeping this feature off by
default for now.

> 
> It seems like there are a couple of ways we could approach this:
> 
> 1. Try to optimize the reverse look-up code so that it's not a linear
> linked list (getting rid of the theoretical fear)

Good point.
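
One possible shape for this, as a rough sketch only: split the per-pCPU
blocked list into a few buckets so that each wakeup scans one bucket rather
than the whole list. This assumes a separate notification vector could be
assigned per bucket, which the patch does not do. All names below
(PI_NR_BUCKETS, struct blocked_vcpu, pi_block, pi_wakeup_bucket) are
hypothetical, the "on" field only models the ON bit of the PI descriptor,
and locking is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: number of buckets (one wakeup vector each) per pCPU. */
#define PI_NR_BUCKETS 4

/* Simplified stand-in for a blocked vCPU; this is not Xen's struct vcpu. */
struct blocked_vcpu {
    unsigned int vcpu_id;
    bool on;                    /* models the PI descriptor's ON bit */
    struct blocked_vcpu *next;
};

/* Per-pCPU state: one singly linked list per bucket. */
struct pi_buckets {
    struct blocked_vcpu *head[PI_NR_BUCKETS];
};

/* Pick a bucket for a vCPU; a plain modulo is used here for illustration. */
static unsigned int pi_bucket_of(unsigned int vcpu_id)
{
    return vcpu_id % PI_NR_BUCKETS;
}

/* Put a vCPU that is about to block at the head of its bucket. */
static void pi_block(struct pi_buckets *b, struct blocked_vcpu *v)
{
    unsigned int i = pi_bucket_of(v->vcpu_id);

    v->next = b->head[i];
    b->head[i] = v;
}

/*
 * Handler body for one bucket: unlink every vCPU whose ON bit is set.
 * Returns the number of vCPUs unlinked; *scanned reports how many entries
 * were visited, which is the cost the thread is worried about.
 */
static unsigned int pi_wakeup_bucket(struct pi_buckets *b, unsigned int bucket,
                                     unsigned int *scanned)
{
    struct blocked_vcpu **pp = &b->head[bucket];
    unsigned int woken = 0;

    *scanned = 0;
    while (*pp != NULL) {
        struct blocked_vcpu *v = *pp;

        (*scanned)++;
        if (v->on) {
            *pp = v->next;      /* unlink; a real handler would unblock it */
            v->next = NULL;
            woken++;
        } else {
            pp = &v->next;
        }
    }
    return woken;
}
```

Note that this only divides the scan length by the number of buckets. Each
bucket is still a linear list, so it reduces the concern rather than
removing it.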

> 
> 2. Try to test engineered situations where we expect this to be a
> problem, to see how big of a problem it is (proving the theory to be
> accurate or inaccurate in this case)

Maybe we can run an SMP guest with all of its vCPUs pinned to a dedicated
pCPU, run some benchmarks in the guest with and without VT-d PI, and then
compare the performance between these two scenarios.

> 
> 3. Turn the feature on by default as soon as the 4.8 window opens up,
> perhaps with some sort of a check that runs when in debug mode that
> looks for the condition we're afraid of happening and BUG()s.  If we run
> a full development cycle without anyone hitting the bug in testing, then
> we just leave the feature on.

Maybe we can predefine a maximum acceptable length for the list and, if it
really reaches that number, print a warning or something like that. However,
how to decide the maximum length is a problem. This may need more thinking.

Thanks,
Feng

> 
> Then we'll only look at adding complexity to the scheduler if there's
> actually a problem to solve.
> 
>  -George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel