linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Wu, Feng" <feng.wu@intel.com>
To: Marcelo Tosatti <mtosatti@redhat.com>
Cc: "tglx@linutronix.de" <tglx@linutronix.de>,
	"mingo@redhat.com" <mingo@redhat.com>,
	"hpa@zytor.com" <hpa@zytor.com>,
	"x86@kernel.org" <x86@kernel.org>,
	"gleb@kernel.org" <gleb@kernel.org>,
	"pbonzini@redhat.com" <pbonzini@redhat.com>,
	"dwmw2@infradead.org" <dwmw2@infradead.org>,
	"joro@8bytes.org" <joro@8bytes.org>,
	"alex.williamson@redhat.com" <alex.williamson@redhat.com>,
	"jiang.liu@linux.intel.com" <jiang.liu@linux.intel.com>,
	"eric.auger@linaro.org" <eric.auger@linaro.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"iommu@lists.linux-foundation.org"
	<iommu@lists.linux-foundation.org>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"Wu, Feng" <feng.wu@intel.com>
Subject: RE: [v3 24/26] KVM: Update Posted-Interrupts Descriptor when vCPU is blocked
Date: Mon, 16 Mar 2015 11:42:06 +0000	[thread overview]
Message-ID: <E959C4978C3B6342920538CF579893F00243A819@SHSMSX104.ccr.corp.intel.com> (raw)
In-Reply-To: <20150312011526.GA5878@amt.cnet>



> -----Original Message-----
> From: Marcelo Tosatti [mailto:mtosatti@redhat.com]
> Sent: Thursday, March 12, 2015 9:15 AM
> To: Wu, Feng
> Cc: tglx@linutronix.de; mingo@redhat.com; hpa@zytor.com; x86@kernel.org;
> gleb@kernel.org; pbonzini@redhat.com; dwmw2@infradead.org;
> joro@8bytes.org; alex.williamson@redhat.com; jiang.liu@linux.intel.com;
> eric.auger@linaro.org; linux-kernel@vger.kernel.org;
> iommu@lists.linux-foundation.org; kvm@vger.kernel.org
> Subject: Re: [v3 24/26] KVM: Update Posted-Interrupts Descriptor when vCPU
> is blocked
> 
> On Fri, Mar 06, 2015 at 06:51:52AM +0000, Wu, Feng wrote:
> >
> >
> > > -----Original Message-----
> > > From: Marcelo Tosatti [mailto:mtosatti@redhat.com]
> > > Sent: Wednesday, March 04, 2015 8:06 PM
> > > To: Wu, Feng
> > > Cc: tglx@linutronix.de; mingo@redhat.com; hpa@zytor.com;
> x86@kernel.org;
> > > gleb@kernel.org; pbonzini@redhat.com; dwmw2@infradead.org;
> > > joro@8bytes.org; alex.williamson@redhat.com; jiang.liu@linux.intel.com;
> > > eric.auger@linaro.org; linux-kernel@vger.kernel.org;
> > > iommu@lists.linux-foundation.org; kvm@vger.kernel.org
> > > Subject: Re: [v3 24/26] KVM: Update Posted-Interrupts Descriptor when
> vCPU
> > > is blocked
> > >
> > > On Mon, Mar 02, 2015 at 01:36:51PM +0000, Wu, Feng wrote:
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: Marcelo Tosatti [mailto:mtosatti@redhat.com]
> > > > > Sent: Friday, February 27, 2015 7:41 AM
> > > > > To: Wu, Feng
> > > > > Cc: tglx@linutronix.de; mingo@redhat.com; hpa@zytor.com;
> > > x86@kernel.org;
> > > > > gleb@kernel.org; pbonzini@redhat.com; dwmw2@infradead.org;
> > > > > joro@8bytes.org; alex.williamson@redhat.com;
> jiang.liu@linux.intel.com;
> > > > > eric.auger@linaro.org; linux-kernel@vger.kernel.org;
> > > > > iommu@lists.linux-foundation.org; kvm@vger.kernel.org
> > > > > Subject: Re: [v3 24/26] KVM: Update Posted-Interrupts Descriptor when
> > > vCPU
> > > > > is blocked
> > > > >
> > > > > On Fri, Dec 12, 2014 at 11:14:58PM +0800, Feng Wu wrote:
> > > > > > This patch updates the Posted-Interrupts Descriptor when vCPU
> > > > > > is blocked.
> > > > > >
> > > > > > pre-block:
> > > > > > - Add the vCPU to the blocked per-CPU list
> > > > > > - Clear 'SN'
> > > > > > - Set 'NV' to POSTED_INTR_WAKEUP_VECTOR
> > > > > >
> > > > > > post-block:
> > > > > > - Remove the vCPU from the per-CPU list
> > > > > >
> > > > > > Signed-off-by: Feng Wu <feng.wu@intel.com>
> > > > > > ---
> > > > > >  arch/x86/include/asm/kvm_host.h |  2 +
> > > > > >  arch/x86/kvm/vmx.c              | 96
> > > > > +++++++++++++++++++++++++++++++++++++++++
> > > > > >  arch/x86/kvm/x86.c              | 22 +++++++---
> > > > > >  include/linux/kvm_host.h        |  4 ++
> > > > > >  virt/kvm/kvm_main.c             |  6 +++
> > > > > >  5 files changed, 123 insertions(+), 7 deletions(-)
> > > > > >
> > > > > > diff --git a/arch/x86/include/asm/kvm_host.h
> > > > > b/arch/x86/include/asm/kvm_host.h
> > > > > > index 13e3e40..32c110a 100644
> > > > > > --- a/arch/x86/include/asm/kvm_host.h
> > > > > > +++ b/arch/x86/include/asm/kvm_host.h
> > > > > > @@ -101,6 +101,8 @@ static inline gfn_t gfn_to_index(gfn_t gfn,
> gfn_t
> > > > > base_gfn, int level)
> > > > > >
> > > > > >  #define ASYNC_PF_PER_VCPU 64
> > > > > >
> > > > > > +extern void (*wakeup_handler_callback)(void);
> > > > > > +
> > > > > >  enum kvm_reg {
> > > > > >  	VCPU_REGS_RAX = 0,
> > > > > >  	VCPU_REGS_RCX = 1,
> > > > > > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> > > > > > index bf2e6cd..a1c83a2 100644
> > > > > > --- a/arch/x86/kvm/vmx.c
> > > > > > +++ b/arch/x86/kvm/vmx.c
> > > > > > @@ -832,6 +832,13 @@ static DEFINE_PER_CPU(struct vmcs *,
> > > > > current_vmcs);
> > > > > >  static DEFINE_PER_CPU(struct list_head, loaded_vmcss_on_cpu);
> > > > > >  static DEFINE_PER_CPU(struct desc_ptr, host_gdt);
> > > > > >
> > > > > > +/*
> > > > > > + * We maintian a per-CPU linked-list of vCPU, so in wakeup_handler()
> we
> > > > > > + * can find which vCPU should be waken up.
> > > > > > + */
> > > > > > +static DEFINE_PER_CPU(struct list_head, blocked_vcpu_on_cpu);
> > > > > > +static DEFINE_PER_CPU(spinlock_t, blocked_vcpu_on_cpu_lock);
> > > > > > +
> > > > > >  static unsigned long *vmx_io_bitmap_a;
> > > > > >  static unsigned long *vmx_io_bitmap_b;
> > > > > >  static unsigned long *vmx_msr_bitmap_legacy;
> > > > > > @@ -1921,6 +1928,7 @@ static void vmx_vcpu_load(struct kvm_vcpu
> > > *vcpu,
> > > > > int cpu)
> > > > > >  		struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
> > > > > >  		struct pi_desc old, new;
> > > > > >  		unsigned int dest;
> > > > > > +		unsigned long flags;
> > > > > >
> > > > > >  		memset(&old, 0, sizeof(old));
> > > > > >  		memset(&new, 0, sizeof(new));
> > > > > > @@ -1942,6 +1950,20 @@ static void vmx_vcpu_load(struct
> kvm_vcpu
> > > > > *vcpu, int cpu)
> > > > > >  			new.nv = POSTED_INTR_VECTOR;
> > > > > >  		} while (cmpxchg(&pi_desc->control, old.control,
> > > > > >  				new.control) != old.control);
> > > > > > +
> > > > > > +		/*
> > > > > > +		 * Delete the vCPU from the related wakeup queue
> > > > > > +		 * if we are resuming from blocked state
> > > > > > +		 */
> > > > > > +		if (vcpu->blocked) {
> > > > > > +			vcpu->blocked = false;
> > > > > > +
> 	spin_lock_irqsave(&per_cpu(blocked_vcpu_on_cpu_lock,
> > > > > > +				vcpu->wakeup_cpu), flags);
> > > > > > +			list_del(&vcpu->blocked_vcpu_list);
> > > > > > +
> > > 	spin_unlock_irqrestore(&per_cpu(blocked_vcpu_on_cpu_lock,
> > > > > > +				vcpu->wakeup_cpu), flags);
> > > > > > +			vcpu->wakeup_cpu = -1;
> > > > > > +		}
> > > > > >  	}
> > > > > >  }
> > > > > >
> > > > > > @@ -1950,6 +1972,9 @@ static void vmx_vcpu_put(struct kvm_vcpu
> > > *vcpu)
> > > > > >  	if (irq_remapping_cap(IRQ_POSTING_CAP)) {
> > > > > >  		struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
> > > > > >  		struct pi_desc old, new;
> > > > > > +		unsigned long flags;
> > > > > > +		int cpu;
> > > > > > +		struct cpumask cpu_others_mask;
> > > > > >
> > > > > >  		memset(&old, 0, sizeof(old));
> > > > > >  		memset(&new, 0, sizeof(new));
> > > > > > @@ -1961,6 +1986,54 @@ static void vmx_vcpu_put(struct
> kvm_vcpu
> > > > > *vcpu)
> > > > > >  				pi_set_sn(&new);
> > > > > >  			} while (cmpxchg(&pi_desc->control, old.control,
> > > > > >  					new.control) != old.control);
> > > > > > +		} else if (vcpu->blocked) {
> > > > > > +			/*
> > > > > > +			 * The vcpu is blocked on the wait queue.
> > > > > > +			 * Store the blocked vCPU on the list of the
> > > > > > +			 * vcpu->wakeup_cpu, which is the destination
> > > > > > +			 * of the wake-up notification event.
> > > > > > +			 */
> > > > > > +			vcpu->wakeup_cpu = vcpu->cpu;
> > > > > > +
> 	spin_lock_irqsave(&per_cpu(blocked_vcpu_on_cpu_lock,
> > > > > > +					  vcpu->wakeup_cpu), flags);
> > > > > > +			list_add_tail(&vcpu->blocked_vcpu_list,
> > > > > > +				      &per_cpu(blocked_vcpu_on_cpu,
> > > > > > +				      vcpu->wakeup_cpu));
> > > > > > +			spin_unlock_irqrestore(
> > > > > > +					&per_cpu(blocked_vcpu_on_cpu_lock,
> > > > > > +					vcpu->wakeup_cpu), flags);
> > > > > > +
> > > > > > +			do {
> > > > > > +				old.control = new.control = pi_desc->control;
> > > > > > +
> > > > > > +				/*
> > > > > > +				 * We should not block the vCPU if
> > > > > > +				 * an interrupt is posted for it.
> > > > > > +				 */
> > > > > > +				if (pi_test_on(pi_desc) == 1) {
> > > > > > +					/*
> > > > > > +					 * We need schedule the wakeup worker
> > > > > > +					 * on a different cpu other than
> > > > > > +					 * vcpu->cpu, because in some case,
> > > > > > +					 * schedule_work() will call
> > > > > > +					 * try_to_wake_up() which needs acquire
> > > > > > +					 * the rq lock. This can cause deadlock.
> > > > > > +					 */
> > > > > > +					cpumask_copy(&cpu_others_mask,
> > > > > > +						     cpu_online_mask);
> > > > > > +					cpu_clear(vcpu->cpu, cpu_others_mask);
> > > > > > +					cpu = any_online_cpu(cpu_others_mask);
> > > > > > +
> > > > > > +					schedule_work_on(cpu,
> > > > > > +							 &vcpu->wakeup_worker);
> > > > > > +				}
> > > > > > +
> > > > > > +				pi_clear_sn(&new);
> > > > > > +
> > > > > > +				/* set 'NV' to 'wakeup vector' */
> > > > > > +				new.nv = POSTED_INTR_WAKEUP_VECTOR;
> > > > > > +			} while (cmpxchg(&pi_desc->control, old.control,
> > > > > > +				new.control) != old.control);
> > > > > >  		}
> > > > >
> > > > > This can be done exclusively on HLT emulation, correct? (that is, on
> > > > > entry to HLT and exit from HLT).
> > > >
> > > > Do you mean the following?
> > > > In kvm_emulate_halt(), we do:
> > > > 1. Add vCPU in the blocking list
> > > > 2. Clear 'SN'
> > > > 3. set 'NV' to POSTED_INTR_WAKEUP_VECTOR
> > > >
> > > > In __vcpu_run(), after kvm_vcpu_block(), we remove the vCPU from the
> > > > Bloc king list.
> > >
> > > Yes (please check its OK to do this...).
> >
> > I think about this for some time, and I feel this may be another solution
> > to implement it. Do you mind sharing your ideas about why do you think
> > this alternative is better than the current one? Thanks a lot!
> 
> Two reasons:
> 
> 1) Because it does not add overhead to vcpu_puts thats are not due to
> HLT. Doing so removes the "vcpu->blocked" variable (its implicit in the
> code anyway).
> 2) Easier to spot races.
> 
> Do you have any reason why having the code at vcpu_put/vcpu_load is
> better than the proposal to have the code at kvm_vcpu_block?

I think your proposal is good, I just want to better understand your idea here.:)

One thing, even we put the code to kvm_vcpu_block, we still need to add code
at vcpu_put/vcpu_load for the preemption case like what I did now.

> 
> > > > > If the vcpu is scheduled out for any other reason (transition to
> > > > > userspace or transition to other thread), it will eventually resume
> > > > > execution. And in that case, continuation of execution does not depend
> > > > > on the event (VT-d interrupt) notification.
> > > >
> > > > Yes, I think this is true for my current implementation, right?
> > > >
> > > > >
> > > > > There is a race window with the code above, I believe.
> > > >
> > > > I did careful code review back and forth for the above code, It will
> > > > be highly appreciated if you can point out the race window!
> > >
> > > So the remapping HW sees either POSTED_INTR_VECTOR or
> > > POSTED_INTR_WAKEUP_VECTOR.
> > >
> > > You should:
> > >
> > > 1. Set POSTED_INTR_WAKEUP_VECTOR.
> > > 2. Check for PIR / ON bit, which might have been set by
> > > POSTED_INTR_VECTOR notification.
> > > 3. emulate HLT.
> >
> > My original idea for pre-block operation is:
> > 1. Add vCPU to the per-cpu blocking list. Here is the code for this in my patch:
> > +                       vcpu->wakeup_cpu = vcpu->cpu;
> > +
> spin_lock_irqsave(&per_cpu(blocked_vcpu_on_cpu_lock,
> > +                                         vcpu->wakeup_cpu),
> flags);
> > +                       list_add_tail(&vcpu->blocked_vcpu_list,
> > +
> &per_cpu(blocked_vcpu_on_cpu,
> > +                                     vcpu->wakeup_cpu));
> > +                       spin_unlock_irqrestore(
> > +
> &per_cpu(blocked_vcpu_on_cpu_lock,
> > +                                       vcpu->wakeup_cpu), flags);
> > 2. Update Posted-interrupt descriptor, here is the code in my patch:
> > +                       do {
> > +                               old.control = new.control =
> pi_desc->control;
> > +
> > +                               /*
> > +                                * We should not block the vCPU if
> > +                                * an interrupt is posted for it.
> > +                                */
> > +                               if (pi_test_on(pi_desc) == 1) {
> > +                                       /*
> > +                                        * We need schedule the
> wakeup worker
> > +                                        * on a different cpu other
> than
> > +                                        * vcpu->cpu, because in
> some case,
> > +                                        * schedule_work() will call
> > +                                        * try_to_wake_up() which
> needs acquire
> > +                                        * the rq lock. This can
> cause deadlock.
> > +                                        */
> > +
> cpumask_copy(&cpu_others_mask,
> > +
> cpu_online_mask);
> > +                                       cpu_clear(vcpu->cpu,
> cpu_others_mask);
> > +                                       cpu =
> any_online_cpu(cpu_others_mask);
> > +
> > +                                       schedule_work_on(cpu,
> > +
> &vcpu->wakeup_worker);
> > +                               }
> > +
> > +                               WARN((pi_desc->sn == 1),
> > +                                    "Warning: SN field of
> posted-interrupts "
> > +                                    "is set before blocking\n");
> > +
> > +                               /* set 'NV' to 'wakeup vector' */
> > +                               new.nv =
> POSTED_INTR_WAKEUP_VECTOR;
> > +                       } while (cmpxchg(&pi_desc->control,
> old.control,
> > +                               new.control) != old.control);
> >
> > If PIR/ON bit is set by POSTED_INTR_VECTOR notification during the above
> operation, we will stop
> > blocking the vCPU like about. But seems I missed something in the above
> code which should be in
> > my mind from the beginning, I should add a 'break' in the end the above ' if
> (pi_test_on(pi_desc) == 1) {}',
> > so in this case, the 'NV' filed remains unchanged.
> 
> Right have to think carefully about all cases.
> 
> > > > > >  	}
> > > > > >
> > > > > > @@ -2842,6 +2915,8 @@ static int hardware_enable(void)
> > > > > >  		return -EBUSY;
> > > > > >
> > > > > >  	INIT_LIST_HEAD(&per_cpu(loaded_vmcss_on_cpu, cpu));
> > > > > > +	INIT_LIST_HEAD(&per_cpu(blocked_vcpu_on_cpu, cpu));
> > > > > > +	spin_lock_init(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));
> > > > > >
> > > > > >  	/*
> > > > > >  	 * Now we can enable the vmclear operation in kdump
> > > > > > @@ -9315,6 +9390,25 @@ static struct kvm_x86_ops vmx_x86_ops =
> {
> > > > > >  	.pi_set_sn = vmx_pi_set_sn,
> > > > > >  };
> > > > > >
> > > > > > +/*
> > > > > > + * Handler for POSTED_INTERRUPT_WAKEUP_VECTOR.
> > > > > > + */
> > > > > > +void wakeup_handler(void)
> > > > > > +{
> > > > > > +	struct kvm_vcpu *vcpu;
> > > > > > +	int cpu = smp_processor_id();
> > > > > > +
> > > > > > +	spin_lock(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));
> > > > > > +	list_for_each_entry(vcpu, &per_cpu(blocked_vcpu_on_cpu,
> cpu),
> > > > > > +			blocked_vcpu_list) {
> > > > > > +		struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
> > > > > > +
> > > > > > +		if (pi_test_on(pi_desc) == 1)
> > > > > > +			kvm_vcpu_kick(vcpu);
> > > > > > +	}
> > > > > > +	spin_unlock(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));
> > > > > > +}
> > > > >
> > > > > Looping through all blocked vcpus does not scale:
> > > > > Can you allocate more vectors and then multiplex those
> > > > > vectors amongst the HLT'ed vcpus?
> > > >
> > > > I am a little confused about this, can you elaborate it a bit more?
> > > > Thanks a lot!
> > >
> > > Picture the following overcommitment scenario:
> > >
> > > * High ratio of vCPUs/pCPUs, in the ratio 128/1 (this is exaggerated
> > > to demonstrate the issue).
> > > * Every VT-d interrupt is going to scan 128 entries in the list.
> > >
> > > Moreover, the test:
> > >
> > > 		if (pi_test_on(pi_desc) == 1)
> > > 			kvm_vcpu_kick(vcpu);
> > >
> > > Can trigger for vCPUs which have not been waken up due
> > > to VT-d interrupts, but for other interrupts.
> > >
> > > You can allocate, say 16 vectors on the pCPU for VT-d interrupts:
> > >
> > > POSTED_INTERRUPT_WAKEUP_VECTOR_1,
> > > POSTED_INTERRUPT_WAKEUP_VECTOR_2,
> > > ...
> > >
> >
> > Global vector is a limited resources in the system, and this involves
> > common x86 interrupt code changes. I am not sure we can allocate
> > so many dedicated global vector for KVM usage.
> 
> Why not? Have KVM use all free vectors (so if vectors are necessary for
> other purposes, people should shrink the KVM vector pool).

If we want to allocate more global vector for this usage, we need hpa's
input about it. Peter, what is your opinion?

> 
> BTW the Intel docs talk about that ("one vector per vCPU").
Yes, the Spec talks about this, but it is more complex using one vector per vCPU.

> 
> > > > > It seems there is a bunch free:
> > > > >
> > > > > commit 52aec3308db85f4e9f5c8b9f5dc4fbd0138c6fa4
> > > > > Author: Alex Shi <alex.shi@intel.com>
> > > > > Date:   Thu Jun 28 09:02:23 2012 +0800
> > > > >
> > > > >     x86/tlb: replace INVALIDATE_TLB_VECTOR by
> > > CALL_FUNCTION_VECTOR
> > > > >
> > > > > Can you add only vcpus which have posted IRTEs that point to this pCPU
> > > > > to the HLT'ed vcpu lists? (so for example, vcpus without assigned
> > > > > devices are not part of the list).
> > > >
> > > > Is it easy to find whether a vCPU (or the associated domain) has assigned
> > > devices?
> > > > If so, we can only add those vCPUs with assigned devices.
> > >
> > > When configuring IRTE, at kvm_arch_vfio_update_pi_irte?
> >
> > Yes.
> >
> > >
> > > > > > +
> > > > > >  static int __init vmx_init(void)
> > > > > >  {
> > > > > >  	int r, i, msr;
> > > > > > @@ -9429,6 +9523,8 @@ static int __init vmx_init(void)
> > > > > >
> > > > > >  	update_ple_window_actual_max();
> > > > > >
> > > > > > +	wakeup_handler_callback = wakeup_handler;
> > > > > > +
> > > > > >  	return 0;
> > > > > >
> > > > > >  out7:
> > > > > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > > > > index 0033df3..1551a46 100644
> > > > > > --- a/arch/x86/kvm/x86.c
> > > > > > +++ b/arch/x86/kvm/x86.c
> > > > > > @@ -6152,6 +6152,21 @@ static int vcpu_enter_guest(struct
> kvm_vcpu
> > > > > *vcpu)
> > > > > >  			kvm_vcpu_reload_apic_access_page(vcpu);
> > > > > >  	}
> > > > > >
> > > > > > +	/*
> > > > > > +	 * Since posted-interrupts can be set by VT-d HW now, in this
> > > > > > +	 * case, KVM_REQ_EVENT is not set. We move the following
> > > > > > +	 * operations out of the if statement.
> > > > > > +	 */
> > > > > > +	if (kvm_lapic_enabled(vcpu)) {
> > > > > > +		/*
> > > > > > +		 * Update architecture specific hints for APIC
> > > > > > +		 * virtual interrupt delivery.
> > > > > > +		 */
> > > > > > +		if (kvm_x86_ops->hwapic_irr_update)
> > > > > > +			kvm_x86_ops->hwapic_irr_update(vcpu,
> > > > > > +				kvm_lapic_find_highest_irr(vcpu));
> > > > > > +	}
> > > > > > +
> > > > >
> > > > > This is a hot fast path. You can set KVM_REQ_EVENT from
> wakeup_handler.
> > > >
> > > > I am afraid Setting KVM_REQ_EVENT from wakeup_handler doesn't help
> > > much,
> > > > if vCPU is running in ROOT mode, and VT-d hardware issues an notification
> > > event,
> > > > POSTED_INTR_VECTOR interrupt handler will be called.
> > >
> > > If vCPU is in root mode, remapping HW will find IRTE configured with
> > > vector == POSTED_INTR_WAKEUP_VECTOR, use that vector, which will
> > > VM-exit, and execute the interrupt handler wakeup_handler. Right?
> >
> > There are two cases:
> > Case 1: vCPU is blocked, so it is in root mode, this is what you described
> above.
> > Case 2, vCPU is running in root mode, such as, handling vm-exits, in this case,
> > the notification vector is 'POSTED_INTR_VECTOR', and if external interrupts
> > from assigned devices happen, the handled of 'POSTED_INTR_VECTOR' will
> > be called ( it is 'smp_kvm_posted_intr_ipi' in fact), this routine doesn't need
> > do real things, since the pending interrupts in PIR will be synced to vIRR
> before
> > VM-Entry (this code have already been there when enabling CPU-side
> > posted-interrupt along with APICv). Like what I said before, it is a little hard
> to
> > get vCPU related information in it, even if we get, it is not accurate and may
> harm
> > the performance.(need search)
> >
> > So only setting KVM_REQ_EVENT in wakeup_handler cannot cover the
> notification
> > event for 'POSTED_INTR_VECTOR'.
> >
> > >
> > > The point of this comment is that you can keep the
> > >
> > > "if (kvm_x86_ops->hwapic_irr_update)
> > > 	kvm_x86_ops->hwapic_irr_update(vcpu,
> > > 			kvm_lapic_find_highest_irr(vcpu));
> > > "
> > >
> > > Code inside KVM_REQ_EVENT handling section of vcpu_run, as long as
> > > wakeup_handler sets KVM_REQ_EVENT.
> >
> > Please see above.
> 
> OK can you set KVM_REQ_EVENT in case the ON bit is set,
> after disabling interrupts ?
> 
Currently, the following code is executed before local_irq_disable() is called,
so do you mean 1)moving local_irq_disable() to the place before it. 2) after interrupt
is disabled, set KVM_REQ_EVENT in case the ON bit is set?

"if (kvm_x86_ops->hwapic_irr_update)
	kvm_x86_ops->hwapic_irr_update(vcpu,
			kvm_lapic_find_highest_irr(vcpu));

> kvm_lapic_find_highest_irr(vcpu) eats some cache
> (4 cachelines) versus 1 cacheline for reading ON bit.
> 
> > > > > Please remove blocked and wakeup_cpu, they should not be necessary.
> > > >
> > > > Why do you think wakeup_cpu is not needed, when vCPU is blocked,
> > > > wakeup_cpu saves the cpu which the vCPU is blocked on, after vCPU
> > > > is woken up, it can run on a different cpu, so we need wakeup_cpu to
> > > > find the right list to wake up the vCPU.
> > >
> > > If the vCPU was moved it should have updated IRTE destination field
> > > to the pCPU which it has moved to?
> >
> > Every time a vCPU is scheduled to a new pCPU, the IRTE destination filed
> > would be updated accordingly.
> >
> > When vCPU is blocked. To wake up the blocked vCPU, we need to find which
> > list the vCPU is blocked on, and this is what wakeup_cpu used for?
> 
> Right, perhaps prev_vcpu is a better name.

Do you mean "prev_pcpu"?

Thanks,
Feng


  reply	other threads:[~2015-03-16 11:42 UTC|newest]

Thread overview: 140+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-12-12 15:14 [v3 00/26] Add VT-d Posted-Interrupts support Feng Wu
2014-12-12 15:14 ` [v3 01/26] genirq: Introduce irq_set_vcpu_affinity() to target an interrupt to a VCPU Feng Wu
2014-12-12 15:14 ` [v3 02/26] iommu: Add new member capability to struct irq_remap_ops Feng Wu
2015-01-28 15:22   ` David Woodhouse
2015-01-29  8:34     ` Wu, Feng
2014-12-12 15:14 ` [v3 03/26] iommu, x86: Define new irte structure for VT-d Posted-Interrupts Feng Wu
2015-01-28 15:26   ` David Woodhouse
2014-12-12 15:14 ` [v3 04/26] iommu, x86: Implement irq_set_vcpu_affinity for intel_ir_chip Feng Wu
2015-01-28 15:26   ` David Woodhouse
2015-01-29  7:55     ` Wu, Feng
2014-12-12 15:14 ` [v3 05/26] x86, irq: Implement irq_set_vcpu_affinity for pci_msi_ir_controller Feng Wu
2014-12-12 15:14 ` [v3 06/26] iommu, x86: No need to migrating irq for VT-d Posted-Interrupts Feng Wu
2014-12-18 14:26   ` Zhang, Yang Z
2014-12-19  1:40     ` Wu, Feng
2014-12-19  1:46       ` Zhang, Yang Z
2014-12-19 11:59         ` Paolo Bonzini
2014-12-23  0:37           ` Zhang, Yang Z
2014-12-23  8:47             ` Paolo Bonzini
2014-12-23  9:07               ` Wu, Feng
2014-12-23  9:34                 ` Paolo Bonzini
2014-12-24  1:38                   ` Zhang, Yang Z
2014-12-24  2:12                     ` Jiang Liu
2014-12-24  2:32                       ` Zhang, Yang Z
2014-12-24  3:08                         ` Wu, Feng
2014-12-24  4:04                           ` Zhang, Yang Z
2014-12-24  4:54                         ` Jiang Liu
2015-01-28 15:29   ` David Woodhouse
2014-12-12 15:14 ` [v3 07/26] iommu, x86: Add cap_pi_support() to detect VT-d PI capability Feng Wu
2015-01-28 15:32   ` David Woodhouse
2014-12-12 15:14 ` [v3 08/26] iommu, x86: Add intel_irq_remapping_capability() for Intel Feng Wu
2015-01-28 15:37   ` David Woodhouse
2015-01-29  8:57     ` Wu, Feng
2014-12-12 15:14 ` [v3 09/26] iommu, x86: define irq_remapping_cap() Feng Wu
2014-12-12 15:14 ` [v3 10/26] KVM: change struct pi_desc for VT-d Posted-Interrupts Feng Wu
2014-12-12 15:14 ` [v3 11/26] KVM: Add some helper functions for Posted-Interrupts Feng Wu
2014-12-12 15:14 ` [v3 12/26] KVM: Initialize VT-d Posted-Interrupts Descriptor Feng Wu
2014-12-18 15:19   ` Zhang, Yang Z
2014-12-12 15:14 ` [v3 13/26] KVM: Define a new interface kvm_find_dest_vcpu() for VT-d PI Feng Wu
2014-12-18 14:49   ` Zhang, Yang Z
2014-12-18 16:58     ` Paolo Bonzini
2014-12-19  1:13       ` Zhang, Yang Z
2014-12-19  1:30         ` Wu, Feng
2014-12-19  1:30       ` Wu, Feng
2014-12-19  1:47         ` Zhang, Yang Z
2014-12-19 11:59         ` Paolo Bonzini
2014-12-19 23:48           ` Wu, Feng
2014-12-20 13:16             ` Paolo Bonzini
2014-12-22  4:48               ` Wu, Feng
2014-12-22  9:27                 ` Paolo Bonzini
2014-12-22 11:04                   ` Wu, Feng
2014-12-22 11:06                     ` Paolo Bonzini
2014-12-22 11:17                       ` Wu, Feng
2014-12-22 11:23                         ` Paolo Bonzini
2014-12-22 14:13                           ` Yong Wang
2015-01-09 14:54   ` Radim Krčmář
2015-01-09 14:56     ` Paolo Bonzini
2015-01-09 15:12       ` Radim Krčmář
2015-01-09 15:18         ` Paolo Bonzini
2015-01-09 15:47           ` Radim Krčmář
2015-01-13  0:27       ` Wu, Feng
2015-01-13 16:17         ` Radim Kr?má?
2015-01-14  1:27           ` Wu, Feng
2015-01-14 13:02             ` Paolo Bonzini
2015-01-14 16:59             ` Radim Kr?má?
2015-01-20 21:04               ` Nadav Amit
2015-01-21 21:16                 ` Radim Kr?má?
2014-12-12 15:14 ` [v3 14/26] KVM: Get Posted-Interrupts descriptor address from struct kvm_vcpu Feng Wu
2014-12-12 15:14 ` [v3 15/26] KVM: add interfaces to control PI outside vmx Feng Wu
2014-12-12 15:14 ` [v3 16/26] KVM: Make struct kvm_irq_routing_table accessible Feng Wu
2014-12-17 16:17   ` Paolo Bonzini
2014-12-19  2:19     ` Wu, Feng
2014-12-19 11:59       ` Paolo Bonzini
2014-12-19 23:39         ` Wu, Feng
2014-12-12 15:14 ` [v3 17/26] KVM: make kvm_set_msi_irq() public Feng Wu
2014-12-17 17:32   ` Paolo Bonzini
2014-12-12 15:14 ` [v3 18/26] KVM: kvm-vfio: User API for VT-d Posted-Interrupts Feng Wu
2014-12-12 15:14 ` [v3 19/26] KVM: kvm-vfio: implement the VFIO skeleton " Feng Wu
2014-12-12 15:14 ` [v3 20/26] KVM: x86: kvm-vfio: VT-d posted-interrupts setup Feng Wu
2014-12-12 15:14 ` [v3 21/26] x86, irq: Define a global vector for VT-d Posted-Interrupts Feng Wu
2014-12-18 14:54   ` Zhang, Yang Z
2014-12-19  0:52     ` Wu, Feng
2015-01-30 18:18   ` H. Peter Anvin
2015-02-02  1:06     ` Wu, Feng
2015-02-23 22:04   ` Marcelo Tosatti
2014-12-12 15:14 ` [v3 22/26] KVM: Define a wakeup worker thread for vCPU Feng Wu
2014-12-12 15:14 ` [v3 23/26] KVM: Update Posted-Interrupts Descriptor when vCPU is preempted Feng Wu
2014-12-17 17:11   ` Paolo Bonzini
2014-12-18  3:15     ` Wu, Feng
2014-12-18  8:32       ` Paolo Bonzini
2014-12-19  2:09         ` Wu, Feng
2015-02-23 22:21   ` Marcelo Tosatti
2015-03-02  9:12     ` Wu, Feng
2014-12-12 15:14 ` [v3 24/26] KVM: Update Posted-Interrupts Descriptor when vCPU is blocked Feng Wu
2014-12-17 17:09   ` Paolo Bonzini
2014-12-18  3:16     ` Wu, Feng
2014-12-18  8:37       ` Paolo Bonzini
2014-12-19  2:51         ` Wu, Feng
2015-02-25 21:50   ` Marcelo Tosatti
2015-02-26  8:08     ` Wu, Feng
2015-02-26 23:41       ` Marcelo Tosatti
2015-02-26 23:40   ` Marcelo Tosatti
2015-03-02 13:36     ` Wu, Feng
2015-03-04 12:06       ` Marcelo Tosatti
2015-03-06  6:51         ` Wu, Feng
2015-03-12  1:15           ` Marcelo Tosatti
2015-03-16 11:42             ` Wu, Feng [this message]
2015-03-25 23:17               ` Marcelo Tosatti
2015-03-27  6:34                 ` Wu, Feng
2015-03-27 19:30                   ` Marcelo Tosatti
2015-03-30  4:46                     ` Wu, Feng
2015-03-30 23:55                       ` Marcelo Tosatti
2015-03-31  1:13                         ` Wu, Feng
2015-04-14  7:37                         ` Wu, Feng
2015-06-05 21:59                           ` Marcelo Tosatti
2015-06-08  1:43                             ` Wu, Feng
2014-12-12 15:14 ` [v3 25/26] KVM: Suppress posted-interrupt when 'SN' is set Feng Wu
2014-12-17 17:42   ` Paolo Bonzini
2014-12-18  3:14     ` Wu, Feng
2014-12-18  8:38       ` Paolo Bonzini
2014-12-18 15:09         ` Zhang, Yang Z
2014-12-19  2:58           ` Wu, Feng
2014-12-19  3:32             ` Zhang, Yang Z
2014-12-19  4:34               ` Wu, Feng
2014-12-19  4:44                 ` Zhang, Yang Z
2014-12-19  4:49                   ` Wu, Feng
2014-12-19  5:25                     ` Zhang, Yang Z
2014-12-19  5:46                       ` Wu, Feng
2014-12-19  7:04                         ` Zhang, Yang Z
2014-12-19 12:00                       ` Paolo Bonzini
2014-12-19 23:34                         ` Wu, Feng
2014-12-12 15:15 ` [v3 26/26] iommu/vt-d: Add a command line parameter for VT-d posted-interrupts Feng Wu
2015-01-28 15:39   ` David Woodhouse
2014-12-16  9:04 ` [v3 00/26] Add VT-d Posted-Interrupts support Wu, Feng
2015-01-06  1:10 ` Wu, Feng
2015-01-09 12:46   ` joro
2015-01-09 13:58     ` Wu, Feng
2015-01-21  2:25 ` Wu, Feng
2015-01-28  3:01   ` Wu, Feng
2015-01-28  3:44     ` Alex Williamson
2015-01-28  4:44       ` Wu, Feng

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=E959C4978C3B6342920538CF579893F00243A819@SHSMSX104.ccr.corp.intel.com \
    --to=feng.wu@intel.com \
    --cc=alex.williamson@redhat.com \
    --cc=dwmw2@infradead.org \
    --cc=eric.auger@linaro.org \
    --cc=gleb@kernel.org \
    --cc=hpa@zytor.com \
    --cc=iommu@lists.linux-foundation.org \
    --cc=jiang.liu@linux.intel.com \
    --cc=joro@8bytes.org \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=mtosatti@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=tglx@linutronix.de \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).