xen-devel.lists.xenproject.org archive mirror
* [v4 00/17] Add VT-d Posted-Interrupts support
@ 2015-07-23 11:35 Feng Wu
  2015-07-23 11:35 ` [v4 01/17] VT-d Posted-intterrupt (PI) design Feng Wu
                   ` (16 more replies)
  0 siblings, 17 replies; 46+ messages in thread
From: Feng Wu @ 2015-07-23 11:35 UTC (permalink / raw)
  To: xen-devel; +Cc: Feng Wu

VT-d Posted-Interrupts is an enhancement to CPU-side Posted-Interrupts.
With VT-d Posted-Interrupts enabled, external interrupts from
directly assigned devices can be delivered to guests without VMM
intervention when the guest is running in non-root mode.

You can find the VT-d Posted-Interrupts spec at the following URL:
http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/vt-directed-io-spec.html

This patch set follows this design:
http://article.gmane.org/gmane.comp.emulators.xen.devel/236476

Feng Wu (17):
  VT-d Posted-intterrupt (PI) design
  Add helper macro for X86_FEATURE_CX16 feature detection
  Add cmpxchg16b support for x86-64
  iommu: Add iommu_intpost to control VT-d Posted-Interrupts feature
  vt-d: VT-d Posted-Interrupts feature detection
  vmx: Extend struct pi_desc to support VT-d Posted-Interrupts
  vmx: Add some helper functions for Posted-Interrupts
  vmx: Initialize VT-d Posted-Interrupts Descriptor
  vmx: Suppress posting interrupts when 'SN' is set
  vt-d: Extend struct iremap_entry to support VT-d Posted-Interrupts
  vt-d: Add API to update IRTE when VT-d PI is used
  Update IRTE according to guest interrupt config changes
  vmx: posted-interrupt handling when vCPU is blocked
  vmx: Properly handle notification event when vCPU is running
  arm: add a dummy arch hooks for scheduler
  vmx: Add some scheduler hooks for VT-d posted interrupts
  VT-d: Dump the posted format IRTE

 docs/misc/vtd-pi.txt                   | 333 +++++++++++++++++++++++++++++++++
 xen/arch/x86/domain.c                  |  10 +
 xen/arch/x86/hvm/vmx/vmcs.c            |  21 +++
 xen/arch/x86/hvm/vmx/vmx.c             | 288 +++++++++++++++++++++++++++-
 xen/common/schedule.c                  |   2 +
 xen/drivers/passthrough/io.c           | 124 +++++++++++-
 xen/drivers/passthrough/iommu.c        |  17 +-
 xen/drivers/passthrough/vtd/intremap.c | 199 +++++++++++++++-----
 xen/drivers/passthrough/vtd/iommu.c    |  18 +-
 xen/drivers/passthrough/vtd/iommu.h    |  46 +++--
 xen/drivers/passthrough/vtd/utils.c    |  49 ++++-
 xen/include/asm-arm/domain.h           |   2 +
 xen/include/asm-x86/cpufeature.h       |   2 +
 xen/include/asm-x86/domain.h           |   3 +
 xen/include/asm-x86/hvm/hvm.h          |   2 +
 xen/include/asm-x86/hvm/vmx/vmcs.h     |  26 ++-
 xen/include/asm-x86/hvm/vmx/vmx.h      |  28 +++
 xen/include/asm-x86/iommu.h            |   2 +
 xen/include/asm-x86/x86_64/system.h    |  32 ++++
 xen/include/xen/iommu.h                |   2 +-
 20 files changed, 1130 insertions(+), 76 deletions(-)
 create mode 100644 docs/misc/vtd-pi.txt

-- 
2.1.0

* Re: [v4 16/17] vmx: Add some scheduler hooks for VT-d posted interrupts
@ 2015-08-03  1:36 Wu, Feng
  2015-08-03 10:02 ` Dario Faggioli
  0 siblings, 1 reply; 46+ messages in thread
From: Wu, Feng @ 2015-08-03  1:36 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: Tian, Kevin, Keir Fraser, George Dunlap, Andrew Cooper,
	xen-devel, Jan Beulich, Wu, Feng



> -----Original Message-----
> From: Dario Faggioli [mailto:dario.faggioli@citrix.com]
> Sent: Friday, July 31, 2015 2:27 AM
> To: Wu, Feng
> Cc: xen-devel@lists.xen.org; Keir Fraser; Jan Beulich; Andrew Cooper; Tian,
> Kevin; George Dunlap
> Subject: Re: [v4 16/17] vmx: Add some scheduler hooks for VT-d posted
> interrupts
> 
> On Thu, 2015-07-30 at 02:04 +0000, Wu, Feng wrote:
> > > -----Original Message-----
> > > From: Dario Faggioli [mailto:dario.faggioli@citrix.com]
> 
> > > > --- a/xen/arch/x86/domain.c
> > > > +++ b/xen/arch/x86/domain.c
> > > > @@ -1550,9 +1550,19 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
> > > >
> > > >      set_current(next);
> > > >
> > > > +    /*
> > > > +     * We need to update posted interrupt descriptor for each context switch,
> > > > +     * hence cannot use the lazy context switch for this.
> > > > +     */
> > > >
> > > Perhaps it's me, but I don't get the comment. Why do you mention "the
> > > lazy context switch"? We can't use it "for this", as opposed to what
> > > other circumstance where we can use it?
> >
> > Oh, maybe I shouldn't use that term here. What I want to say is that
> > __context_switch() isn't called on every context switch (for example,
> > non-idle vcpu -> idle vcpu), so we need to call
> > prev->arch.pi_ctxt_switch_from explicitly instead of in __context_switch().
> >
> Ok, I see what you mean now, and it's probably correct, as 'lazy context
> switch' is, in this context, exactly that (i.e., not actually context
> switching if next is the idle vcpu).
> 
> It's just that this term is used in the literature, in different places, to
> mean (slightly) different things, and there is no nearby reference to it
> (like in the function), so I still see a bit of room for potential
> confusion.
> 
> In the end, as you wish. If it were me, I'd add a few words to specify
> things better, something very similar to what you've put in this email,
> e.g.:
> 
> "When switching from non-idle to idle, we only do a lazy context switch.
> However, in order for posted interrupt (if available and enabled) to
> work properly, we at least need to update the descriptors"

Sounds good!
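
A minimal sketch of the behaviour discussed above, as a toy single-pCPU
model rather than Xen code. The hook names pi_ctxt_switch_from and
pi_ctxt_switch_to come from the patch; struct vcpu here is a cut-down
stand-in, and full_context_switch(), the counters and demo_lazy_switch()
are hypothetical names introduced only for illustration (the cpu_online()
check from the real hunk is left out). It shows that v -> idle -> v takes
the lazy path both times, yet each PI hook still runs once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for Xen's struct vcpu; only what the sketch needs. */
struct vcpu {
    bool is_idle;
    void (*pi_ctxt_switch_from)(struct vcpu *v);
    void (*pi_ctxt_switch_to)(struct vcpu *v);
};

/* Hypothetical counters so the behaviour can be observed. */
static int full_switches, pi_from_calls, pi_to_calls;

/* vcpu whose register state is currently loaded on this (single) pCPU. */
static struct vcpu *curr_vcpu;

static void pi_from(struct vcpu *v) { (void)v; pi_from_calls++; }
static void pi_to(struct vcpu *v)   { (void)v; pi_to_calls++; }

/* Stand-in for __context_switch(): the expensive state save/restore. */
static void full_context_switch(struct vcpu *next)
{
    full_switches++;
    curr_vcpu = next;
}

static void context_switch(struct vcpu *prev, struct vcpu *next)
{
    /* The PI hook runs on every switch away from a non-idle vcpu. */
    if (!prev->is_idle && prev->pi_ctxt_switch_from)
        prev->pi_ctxt_switch_from(prev);

    if (curr_vcpu == next || next->is_idle) {
        /* Lazy path: state stays loaded, but PI still needs updating. */
        if (!next->is_idle && next->pi_ctxt_switch_to)
            next->pi_ctxt_switch_to(next);
        return;
    }

    full_context_switch(next);
    if (next->pi_ctxt_switch_to)
        next->pi_ctxt_switch_to(next);
}

/* v -> idle -> v: both switches are lazy, yet each PI hook fires once. */
static void demo_lazy_switch(void)
{
    struct vcpu v = { false, pi_from, pi_to };
    struct vcpu idle = { true, NULL, NULL };

    full_switches = pi_from_calls = pi_to_calls = 0;
    curr_vcpu = &v;
    context_switch(&v, &idle);
    context_switch(&idle, &v);
    assert(full_switches == 0);
    assert(pi_from_calls == 1 && pi_to_calls == 1);
}
```

Had the hooks lived only inside full_context_switch(), neither would have
run in this sequence, which matches the missed-update problem described
above.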

> 
> Or some better English form of it. :-)
> 
> But that's certainly not critical, and I'll be OK with
> whatever the other maintainers agree on.
> 
> > > >      if ( (per_cpu(curr_vcpu, cpu) == next) ||
> > > >           (is_idle_vcpu(next) && cpu_online(cpu)) )
> > > >      {
> > > > +        if ( !is_idle_vcpu(next) && next->arch.pi_ctxt_switch_to )
> > > >
> > > Same as above.
> > >
> > > > +            next->arch.pi_ctxt_switch_to(next);
> > > > +
> > > >          local_irq_enable();
> > > >
> > > Another thing: if prev == next --and let's call such vcpu pp-- you go
> > > through both:
> > >
> > >     pp->arch.pi_ctxt_switch_from(pp);
> > >     pp->arch.pi_ctxt_switch_to(pp);
> >
> > In my understanding, if the scheduler chooses the same vcpu to run, it
> > will return early in schedule() as below:
> >
> > static void schedule(void)
> > {
> >     ....
> >
> >     /* get policy-specific decision on scheduling... */
> >     sched = this_cpu(scheduler);
> >     next_slice = sched->do_schedule(sched, now, tasklet_work_scheduled);
> >
> >     next = next_slice.task;
> >
> >     sd->curr = next;
> >
> >     if ( next_slice.time >= 0 ) /* -ve means no limit */
> >         set_timer(&sd->s_timer, now + next_slice.time);
> >
> >     if ( unlikely(prev == next) )
> >     {
> >         pcpu_schedule_unlock_irq(lock, cpu);
> >         trace_continue_running(next);
> >         return continue_running(prev);
> >     }
> >
> >     ....
> >
> > }
> >
> > If that is the case, by the time we get to context_switch(), prev and next
> > are different. Am I missing something?
> >
> That looks correct. Still, there are checks like '(prev!=next)' around
> in context_switch(), for both x86 and ARM... weird. I shall have a
> deeper look...
> 
> In any case, as far as this hunk is concerned, the
> '(per_cpu(curr_vcpu,cpu)==next)' is there to deal with the case where we
> went from vcpu v to idle, and we're now going from idle to v again,
> which is something you want to intercept.
> 
> So, at least for now, ignore my comments about it. I'll let you know if
> I find something interesting that you should take into account.
> 
> > > > --- a/xen/common/schedule.c
> > > > +++ b/xen/common/schedule.c
> > > > @@ -381,6 +381,8 @@ void vcpu_wake(struct vcpu *v)
> > > >      unsigned long flags;
> > > >      spinlock_t *lock = vcpu_schedule_lock_irqsave(v, &flags);
> > > >
> > > > +    arch_vcpu_wake(v);
> > > > +
> > > So, in the draft you sent a few days back, this was called at the end of
> > > vcpu_wake(), right before releasing the lock. Now it's at the beginning,
> > > before the scheduler's wakeup routine has a chance to run.
> > >
> > > IMO, it feels more natural for it to be at the bottom (i.e., generic
> > > stuff first, arch specific stuff afterwards), and, after a quick
> > > inspection, I don't think I see anything preventing things from being
> > > that way.
> > >
> > > However, I recall you mentioning having issues with such draft, which
> > > are now resolved with this version.
> >
> > The long latency issue mentioned previously had a different cause.
> > Originally I called 'pi_ctxt_switch_from' and 'pi_ctxt_switch_to' in
> > __context_switch(). However, that function is not called on every context
> > switch, as I described above. After fixing this, the performance issue
> > disappeared.
> >
> I see, thanks for explaining this.
> 
> > > Since this is one of the differences
> > > between the two, was it the cause of the issues you were seeing? If yes,
> > > can you elaborate on how and why?
> > >
> > > In the end, I'm not too opposed to the hook being at the beginning
> > > rather than at the end, but there has to be a reason, which may well end
> > > up better be stated in a comment...
> >
> > Here is the reason I put arch_vcpu_wake() at the beginning of vcpu_wake():
> > arch_vcpu_wake() handles some prerequisites for a vCPU which is about
> > to run, such as setting SN again and changing the NV field back to
> > 'posted_intr_vector', which should be finished before the vCPU is
> > actually scheduled to run. However, if we put arch_vcpu_wake() later
> > in vcpu_wake(), right before 'vcpu_schedule_unlock_irqrestore', then once
> > the 'wake' hook has finished the vcpu can run at any time (maybe on
> > another pCPU, since the current pCPU is protected by the lock). If
> > this can happen, it is incorrect. Does my understanding make sense?
> >
> It's safe in any case. In fact, the spinlock will prevent both the
> vcpu's processor from scheduling and any other processor from stealing
> the waking vcpu from the runqueue to run it.

Good to know. Regarding other processors not being able to steal the
waking vcpu from the runqueue, could you please point me to the relevant
code, so I can better understand how the spinlock provides this
protection? Thank you!
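
As a rough illustration of the locking argument only, and not a pointer
into the Xen scheduler sources: the sketch below models a runqueue guarded
by a toy spinlock built on C11 atomic_flag. arch_vcpu_wake() is the hook
name from the patch; struct runqueue, rq_lock(), try_steal() and
vcpu_wake_hook_last() are hypothetical names, and the "other pCPU" is
simulated from the same thread with a trylock so that the attempt fails
instead of spinning. The assumption being illustrated is that a stealer
must take the same lock the waker holds.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct vcpu {
    bool pi_ready;   /* set by the arch wake hook */
    bool runnable;
};

struct runqueue {
    atomic_flag lock;    /* toy spinlock */
    struct vcpu *head;   /* single-slot queue, enough for the sketch */
};

static bool rq_trylock(struct runqueue *rq)
{
    return !atomic_flag_test_and_set_explicit(&rq->lock,
                                              memory_order_acquire);
}

static void rq_lock(struct runqueue *rq)
{
    while (!rq_trylock(rq))
        ;  /* spin until the lock is free */
}

static void rq_unlock(struct runqueue *rq)
{
    atomic_flag_clear_explicit(&rq->lock, memory_order_release);
}

/* Stand-in for the arch hook: prepare the PI state of the vcpu. */
static void arch_vcpu_wake(struct vcpu *v)
{
    v->pi_ready = true;
}

/* What another pCPU would do: it must own the lock to dequeue. */
static struct vcpu *try_steal(struct runqueue *rq)
{
    struct vcpu *v;

    if (!rq_trylock(rq))
        return NULL;  /* lock held by the waker: nothing can be stolen */
    v = rq->head;
    rq->head = NULL;
    rq_unlock(rq);
    return v;
}

/*
 * Wake path with the arch hook placed last, just before the unlock.
 * Returns whatever a simulated stealer obtained while the lock was held.
 */
static struct vcpu *vcpu_wake_hook_last(struct runqueue *rq, struct vcpu *v)
{
    struct vcpu *stolen_early;

    rq_lock(rq);
    v->runnable = true;
    rq->head = v;                  /* stand-in for SCHED_OP(wake) */
    stolen_early = try_steal(rq);  /* steal attempt before the hook ran */
    arch_vcpu_wake(v);
    rq_unlock(rq);
    return stolen_early;
}
```

In this model the early steal attempt always returns NULL, and a steal
after the unlock sees pi_ready already set, so the hook position inside
the locked region does not affect correctness.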

Thanks,
Feng

> 
> That's actually why I wanted to double check your change to the position
> of the hook (wrt the draft), as it felt weird that the issue was in
> there. :-)
> 
> So, now that we know that safety is not an issue, where should we put
> the hook?
> 
> Having it before SCHED_OP(wake) may make people think that arch specific
> code is (or can, at some point) somehow influencing the scheduler
> specific wakeup code, which is not (and should not become, if possible)
> the case.
> 
> However, I kind of like the fact that the spinlock is released as soon
> as possible, after the call to SCHED_OP(wake). That will make it more
> likely, for the processors we may have sent IPIs to, during the
> scheduler specific wakeup code, to find the spinlock free. So, looking
> at things from this angle, it would be better to avoid putting stuff in
> between SCHED_OP(wake) and vcpu_schedule_unlock().
> 
> So, all in all, I'd say leave it on top, where it is in this patch. Of
> course, if others have opinions, I'm all ears. :-)
> 
> Thanks and Regards,
> Dario
> --
> <<This happens because I choose it to happen!>> (Raistlin Majere)
> -----------------------------------------------------------------
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


Thread overview: 46+ messages
2015-07-23 11:35 [v4 00/17] Add VT-d Posted-Interrupts support Feng Wu
2015-07-23 11:35 ` [v4 01/17] VT-d Posted-intterrupt (PI) design Feng Wu
2015-07-23 11:35 ` [v4 02/17] Add helper macro for X86_FEATURE_CX16 feature detection Feng Wu
2015-07-23 11:35 ` [v4 03/17] Add cmpxchg16b support for x86-64 Feng Wu
2015-07-24 15:03   ` Jan Beulich
2015-07-23 11:35 ` [v4 04/17] iommu: Add iommu_intpost to control VT-d Posted-Interrupts feature Feng Wu
2015-07-23 14:01   ` Andrew Cooper
2015-07-23 14:05     ` Andrew Cooper
2015-07-24  0:47       ` Wu, Feng
2015-07-23 11:35 ` [v4 05/17] vt-d: VT-d Posted-Interrupts feature detection Feng Wu
2015-07-24 15:05   ` Jan Beulich
2015-07-23 11:35 ` [v4 06/17] vmx: Extend struct pi_desc to support VT-d Posted-Interrupts Feng Wu
2015-07-23 11:35 ` [v4 07/17] vmx: Add some helper functions for Posted-Interrupts Feng Wu
2015-07-23 11:35 ` [v4 08/17] vmx: Initialize VT-d Posted-Interrupts Descriptor Feng Wu
2015-07-23 11:35 ` [v4 09/17] vmx: Suppress posting interrupts when 'SN' is set Feng Wu
2015-07-24 15:11   ` Jan Beulich
2015-07-23 11:35 ` [v4 10/17] vt-d: Extend struct iremap_entry to support VT-d Posted-Interrupts Feng Wu
2015-07-24 15:13   ` Jan Beulich
2015-07-23 11:35 ` [v4 11/17] vt-d: Add API to update IRTE when VT-d PI is used Feng Wu
2015-07-23 13:51   ` Andrew Cooper
2015-07-23 15:52     ` Jan Beulich
2015-07-23 15:55       ` Andrew Cooper
2015-07-23 16:00         ` Jan Beulich
2015-07-23 16:11           ` Andrew Cooper
2015-07-24  0:39     ` Wu, Feng
2015-07-24 15:27   ` Jan Beulich
2015-07-28  7:34     ` Wu, Feng
2015-08-11 10:18       ` Jan Beulich
2015-07-23 11:35 ` [v4 12/17] Update IRTE according to guest interrupt config changes Feng Wu
2015-07-23 11:35 ` [v4 13/17] vmx: posted-interrupt handling when vCPU is blocked Feng Wu
2015-07-23 11:35 ` [v4 14/17] vmx: Properly handle notification event when vCPU is running Feng Wu
2015-07-23 11:35 ` [v4 15/17] arm: add a dummy arch hooks for scheduler Feng Wu
2015-07-23 11:54   ` Julien Grall
2015-07-24  0:39     ` Wu, Feng
2015-07-23 11:58   ` Jan Beulich
2015-07-23 11:35 ` [v4 16/17] vmx: Add some scheduler hooks for VT-d posted interrupts Feng Wu
2015-07-23 12:50   ` Dario Faggioli
2015-07-24  0:49     ` Wu, Feng
2015-07-28 14:15   ` Dario Faggioli
2015-07-30  2:04     ` Wu, Feng
2015-07-30 18:26       ` Dario Faggioli
2015-08-11 10:23         ` Jan Beulich
2015-07-23 11:35 ` [v4 17/17] VT-d: Dump the posted format IRTE Feng Wu
2015-08-03  1:36 [v4 16/17] vmx: Add some scheduler hooks for VT-d posted interrupts Wu, Feng
2015-08-03 10:02 ` Dario Faggioli
2015-08-05  6:06   ` Wu, Feng
