kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Andrew Murray <andrew.murray@arm.com>
To: Marc Zyngier <maz@kernel.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will.deacon@arm.com>,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	Sudeep Holla <sudeep.holla@arm.com>,
	kvmarm@lists.cs.columbia.edu,
	linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCH v2 14/18] KVM: arm64: spe: Provide guest virtual interrupts for SPE
Date: Tue, 24 Dec 2019 13:36:47 +0000	[thread overview]
Message-ID: <20191224133647.GO42593@e119886-lin.cambridge.arm.com> (raw)
In-Reply-To: <a2b8846377b3f5884feeb9728b16f826@www.loen.fr>

On Tue, Dec 24, 2019 at 01:22:46PM +0000, Marc Zyngier wrote:
> On 2019-12-24 13:08, Andrew Murray wrote:
> > On Tue, Dec 24, 2019 at 12:42:02PM +0000, Marc Zyngier wrote:
> > > On 2019-12-24 11:50, Andrew Murray wrote:
> > > > On Sun, Dec 22, 2019 at 12:07:50PM +0000, Marc Zyngier wrote:
> > > > > On Fri, 20 Dec 2019 14:30:21 +0000,
> > > > > Andrew Murray <andrew.murray@arm.com> wrote:
> > > > > >
> > > > > > Upon the exit of a guest, let's determine if the SPE device
> > > has
> > > > > generated
> > > > > > an interrupt - if so we'll inject a virtual interrupt to the
> > > > > guest.
> > > > > >
> > > > > > Upon the entry and exit of a guest we'll also update the state
> > > of
> > > > > the
> > > > > > physical IRQ such that it is active when a guest interrupt is
> > > > > pending
> > > > > > and the guest is running.
> > > > > >
> > > > > > Finally we map the physical IRQ to the virtual IRQ such that
> > > the
> > > > > guest
> > > > > > can deactivate the interrupt when it handles the interrupt.
> > > > > >
> > > > > > Signed-off-by: Andrew Murray <andrew.murray@arm.com>
> > > > > > ---
> > > > > >  include/kvm/arm_spe.h |  6 ++++
> > > > > >  virt/kvm/arm/arm.c    |  5 ++-
> > > > > >  virt/kvm/arm/spe.c    | 71
> > > > > +++++++++++++++++++++++++++++++++++++++++++
> > > > > >  3 files changed, 81 insertions(+), 1 deletion(-)
> > > > > >
> > > > > > diff --git a/include/kvm/arm_spe.h b/include/kvm/arm_spe.h
> > > > > > index 9c65130d726d..91b2214f543a 100644
> > > > > > --- a/include/kvm/arm_spe.h
> > > > > > +++ b/include/kvm/arm_spe.h
> > > > > > @@ -37,6 +37,9 @@ static inline bool
> > > kvm_arm_support_spe_v1(void)
> > > > > >  						      ID_AA64DFR0_PMSVER_SHIFT);
> > > > > >  }
> > > > > >
> > > > > > +void kvm_spe_flush_hwstate(struct kvm_vcpu *vcpu);
> > > > > > +inline void kvm_spe_sync_hwstate(struct kvm_vcpu *vcpu);
> > > > > > +
> > > > > >  int kvm_arm_spe_v1_set_attr(struct kvm_vcpu *vcpu,
> > > > > >  			    struct kvm_device_attr *attr);
> > > > > >  int kvm_arm_spe_v1_get_attr(struct kvm_vcpu *vcpu,
> > > > > > @@ -49,6 +52,9 @@ int kvm_arm_spe_v1_enable(struct kvm_vcpu
> > > > > *vcpu);
> > > > > >  #define kvm_arm_support_spe_v1()	(false)
> > > > > >  #define kvm_arm_spe_irq_initialized(v)	(false)
> > > > > >
> > > > > > +static inline void kvm_spe_flush_hwstate(struct kvm_vcpu
> > > *vcpu)
> > > > > {}
> > > > > > +static inline void kvm_spe_sync_hwstate(struct kvm_vcpu
> > > *vcpu) {}
> > > > > > +
> > > > > >  static inline int kvm_arm_spe_v1_set_attr(struct kvm_vcpu
> > > *vcpu,
> > > > > >  					  struct kvm_device_attr *attr)
> > > > > >  {
> > > > > > diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> > > > > > index 340d2388ee2c..a66085c8e785 100644
> > > > > > --- a/virt/kvm/arm/arm.c
> > > > > > +++ b/virt/kvm/arm/arm.c
> > > > > > @@ -741,6 +741,7 @@ int kvm_arch_vcpu_ioctl_run(struct
> > > kvm_vcpu
> > > > > *vcpu, struct kvm_run *run)
> > > > > >  		preempt_disable();
> > > > > >
> > > > > >  		kvm_pmu_flush_hwstate(vcpu);
> > > > > > +		kvm_spe_flush_hwstate(vcpu);
> > > > > >
> > > > > >  		local_irq_disable();
> > > > > >
> > > > > > @@ -782,6 +783,7 @@ int kvm_arch_vcpu_ioctl_run(struct
> > > kvm_vcpu
> > > > > *vcpu, struct kvm_run *run)
> > > > > >  		    kvm_request_pending(vcpu)) {
> > > > > >  			vcpu->mode = OUTSIDE_GUEST_MODE;
> > > > > >  			isb(); /* Ensure work in x_flush_hwstate is committed */
> > > > > > +			kvm_spe_sync_hwstate(vcpu);
> > > > > >  			kvm_pmu_sync_hwstate(vcpu);
> > > > > >  			if (static_branch_unlikely(&userspace_irqchip_in_use))
> > > > > >  				kvm_timer_sync_hwstate(vcpu);
> > > > > > @@ -816,11 +818,12 @@ int kvm_arch_vcpu_ioctl_run(struct
> > > kvm_vcpu
> > > > > *vcpu, struct kvm_run *run)
> > > > > >  		kvm_arm_clear_debug(vcpu);
> > > > > >
> > > > > >  		/*
> > > > > > -		 * We must sync the PMU state before the vgic state so
> > > > > > +		 * We must sync the PMU and SPE state before the vgic state
> > > so
> > > > > >  		 * that the vgic can properly sample the updated state of
> > > the
> > > > > >  		 * interrupt line.
> > > > > >  		 */
> > > > > >  		kvm_pmu_sync_hwstate(vcpu);
> > > > > > +		kvm_spe_sync_hwstate(vcpu);
> > > > >
> > > > > The *HUGE* difference is that the PMU is purely a virtual
> > > interrupt,
> > > > > while you're trying to deal with a HW interrupt here.
> > > > >
> > > > > >
> > > > > >  		/*
> > > > > >  		 * Sync the vgic state before syncing the timer state
> > > because
> > > > > > diff --git a/virt/kvm/arm/spe.c b/virt/kvm/arm/spe.c
> > > > > > index 83ac2cce2cc3..097ed39014e4 100644
> > > > > > --- a/virt/kvm/arm/spe.c
> > > > > > +++ b/virt/kvm/arm/spe.c
> > > > > > @@ -35,6 +35,68 @@ int kvm_arm_spe_v1_enable(struct kvm_vcpu
> > > > > *vcpu)
> > > > > >  	return 0;
> > > > > >  }
> > > > > >
> > > > > > +static inline void set_spe_irq_phys_active(struct
> > > > > arm_spe_kvm_info *info,
> > > > > > +					   bool active)
> > > > > > +{
> > > > > > +	int r;
> > > > > > +	r = irq_set_irqchip_state(info->physical_irq,
> > > > > IRQCHIP_STATE_ACTIVE,
> > > > > > +				  active);
> > > > > > +	WARN_ON(r);
> > > > > > +}
> > > > > > +
> > > > > > +void kvm_spe_flush_hwstate(struct kvm_vcpu *vcpu)
> > > > > > +{
> > > > > > +	struct kvm_spe *spe = &vcpu->arch.spe;
> > > > > > +	bool phys_active = false;
> > > > > > +	struct arm_spe_kvm_info *info = arm_spe_get_kvm_info();
> > > > > > +
> > > > > > +	if (!kvm_arm_spe_v1_ready(vcpu))
> > > > > > +		return;
> > > > > > +
> > > > > > +	if (irqchip_in_kernel(vcpu->kvm))
> > > > > > +		phys_active = kvm_vgic_map_is_active(vcpu, spe->irq_num);
> > > > > > +
> > > > > > +	phys_active |= spe->irq_level;
> > > > > > +
> > > > > > +	set_spe_irq_phys_active(info, phys_active);
> > > > >
> > > > > So you're happy to mess with the HW interrupt state even when
> > > you
> > > > > don't have a HW irqchip? If you are going to copy paste the
> > > timer
> > > > > code
> > > > > here, you'd need to support it all the way (no, don't).
> > > > >
> > > > > > +}
> > > > > > +
> > > > > > +void kvm_spe_sync_hwstate(struct kvm_vcpu *vcpu)
> > > > > > +{
> > > > > > +	struct kvm_spe *spe = &vcpu->arch.spe;
> > > > > > +	u64 pmbsr;
> > > > > > +	int r;
> > > > > > +	bool service;
> > > > > > +	struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;
> > > > > > +	struct arm_spe_kvm_info *info = arm_spe_get_kvm_info();
> > > > > > +
> > > > > > +	if (!kvm_arm_spe_v1_ready(vcpu))
> > > > > > +		return;
> > > > > > +
> > > > > > +	set_spe_irq_phys_active(info, false);
> > > > > > +
> > > > > > +	pmbsr = ctxt->sys_regs[PMBSR_EL1];
> > > > > > +	service = !!(pmbsr & BIT(SYS_PMBSR_EL1_S_SHIFT));
> > > > > > +	if (spe->irq_level == service)
> > > > > > +		return;
> > > > > > +
> > > > > > +	spe->irq_level = service;
> > > > > > +
> > > > > > +	if (likely(irqchip_in_kernel(vcpu->kvm))) {
> > > > > > +		r = kvm_vgic_inject_irq(vcpu->kvm, vcpu->vcpu_id,
> > > > > > +					spe->irq_num, service, spe);
> > > > > > +		WARN_ON(r);
> > > > > > +	}
> > > > > > +}
> > > > > > +
> > > > > > +static inline bool kvm_arch_arm_spe_v1_get_input_level(int
> > > > > vintid)
> > > > > > +{
> > > > > > +	struct kvm_vcpu *vcpu = kvm_arm_get_running_vcpu();
> > > > > > +	struct kvm_spe *spe = &vcpu->arch.spe;
> > > > > > +
> > > > > > +	return spe->irq_level;
> > > > > > +}
> > > > >
> > > > > This isn't what such a callback is for. It is supposed to sample
> > > the
> > > > > HW, an nothing else.
> > > > >
> > > > > > +
> > > > > >  static int kvm_arm_spe_v1_init(struct kvm_vcpu *vcpu)
> > > > > >  {
> > > > > >  	if (!kvm_arm_support_spe_v1())
> > > > > > @@ -48,6 +110,7 @@ static int kvm_arm_spe_v1_init(struct
> > > kvm_vcpu
> > > > > *vcpu)
> > > > > >
> > > > > >  	if (irqchip_in_kernel(vcpu->kvm)) {
> > > > > >  		int ret;
> > > > > > +		struct arm_spe_kvm_info *info;
> > > > > >
> > > > > >  		/*
> > > > > >  		 * If using the SPE with an in-kernel virtual GIC
> > > > > > @@ -57,10 +120,18 @@ static int kvm_arm_spe_v1_init(struct
> > > > > kvm_vcpu *vcpu)
> > > > > >  		if (!vgic_initialized(vcpu->kvm))
> > > > > >  			return -ENODEV;
> > > > > >
> > > > > > +		info = arm_spe_get_kvm_info();
> > > > > > +		if (!info->physical_irq)
> > > > > > +			return -ENODEV;
> > > > > > +
> > > > > >  		ret = kvm_vgic_set_owner(vcpu, vcpu->arch.spe.irq_num,
> > > > > >  					 &vcpu->arch.spe);
> > > > > >  		if (ret)
> > > > > >  			return ret;
> > > > > > +
> > > > > > +		ret = kvm_vgic_map_phys_irq(vcpu, info->physical_irq,
> > > > > > +					    vcpu->arch.spe.irq_num,
> > > > > > +					    kvm_arch_arm_spe_v1_get_input_level);
> > > > >
> > > > > You're mapping the interrupt int the guest, and yet you have
> > > never
> > > > > forwarded the interrupt the first place. All this flow is only
> > > going
> > > > > to wreck the host driver as soon as an interrupt occurs.
> > > > >
> > > > > I think you should rethink the interrupt handling altogether. It
> > > > > would
> > > > > make more sense if the interrupt was actually completely
> > > > > virtualized. If you can isolate the guest state and compute the
> > > > > interrupt state in SW (and from the above, it seems that you
> > > can),
> > > > > then you shouldn't mess with the whole forwarding *at all*, as
> > > it
> > > > > isn't designed for devices shared between host and guests.
> > > >
> > > > Yes it's possible to read SYS_PMBSR_EL1_S_SHIFT and determine if
> > > SPE
> > > > wants
> > > > service. If I understand correctly, you're suggesting on
> > > entry/exit to
> > > > the
> > > > guest we determine this and inject an interrupt to the guest. As
> > > well as
> > > > removing the kvm_vgic_map_phys_irq mapping to the physical
> > > interrupt?
> > > 
> > > The mapping only makes sense for devices that have their interrupt
> > > forwarded to a vcpu, where the expected flow is that the interrupt
> > > is taken on the host with a normal interrupt handler and then
> > > injected in the guest (you still have to manage the active state
> > > though). The basic assumption is that such a device is entirely
> > > owned by KVM.
> > 
> > Though the mapping does mean that if the guest handles the guest SPE
> > interrupt it doesn't have to wait for a guest exit before having the
> > SPE interrupt evaluated again (i.e. another SPE interrupt won't cause
> > a guest exit) - thus increasing the size of any black hole.
> 
> Sure. It still remains that your use case is outside of the scope of
> this internal API.
> 
> > > Here, you're abusing the mapping interface: you don't have an
> > > interrupt handler (the host SPE driver owns it), the interrupt
> > > isn't forwarded, and yet you're messing with the active state.
> > > None of that is expected, and you are in uncharted territory
> > > as far as KVM is concerned.
> > > 
> > > What bothers me the most is that this looks a lot like a previous
> > > implementation of the timers, and we had all the problems in the
> > > world to keep track of the interrupt state *and* have a reasonable
> > > level of performance (hitting the redistributor on the fast path
> > > is a performance killer).
> > > 
> > > > My understanding was that I needed knowledge of the physical SPE
> > > > interrupt
> > > > number so that I could prevent the host SPE driver from getting
> > > spurious
> > > > interrupts due to guest use of the SPE.
> > > 
> > > You can't completely rule out the host getting interrupted. Even if
> > > you set
> > > PMBSR_EL1.S to zero, there is no guarantee that the host will not
> > > observe
> > > the interrupt anyway (the GIC architecture doesn't tell you how
> > > quickly
> > > it will be retired, if ever). The host driver already checks for
> > > this
> > > anyway.
> > > 
> > > What you need to ensure is that PMBSR_EL1.S being set on guest entry
> > > doesn't immediately kick you out of the guest and prevent forward
> > > progress. This is why you need to manage the active state.
> > > 
> > > The real question is: how quickly do you want to react to a SPE
> > > interrupt firing while in a guest?
> > > 
> > > If you want to take it into account as soon as it fires, then you
> > > need
> > > to eagerly save/restore the active state together with the SPE state
> > > on
> > > each entry/exit, and performance will suffer. This is what you are
> > > currently doing.
> > > 
> > > If you're OK with evaluating the interrupt status on exit, but
> > > without
> > > the interrupt itself causing an exit, then you can simply manage it
> > > as a purely virtual interrupt, and just deal with the active state
> > > in load/put (set the interrupt as active on load, clear it on put).
> > 
> > This does feel like the pragmatic approach - a larger black hole in
> > exchange
> > for performance. I imagine the blackhole would be naturally reduced on
> > machines with high workloads.
> 
> Why? I don't see the relation between how busy the vcpu is and the size
> of the blackhole. It is strictly a function of the frequency of exits.

Indeed, my assumption being that the busier a system is the more
interrupts, thus leading to more exits and so an increased frequency of
SPE interrupt evaluation and thus smaller black hole.

Thanks,

Andrew Murray

> 
>         M.
> 
> > 
> > I'll refine the series to take this approach.
> > 
> > > 
> > > Given that SPE interrupts always indicate that profiling has
> > > stopped,
> > 
> > and faults :|
> > 
> > Thanks,
> > 
> > Andrew Murray
> > 
> > > this only affects the size of the black hole, and I'm inclined to do
> > > the latter.
> > > 
> > >         M.
> > > --
> > > Jazz is not dead. It just smells funny...
> 
> -- 
> Jazz is not dead. It just smells funny...

  reply	other threads:[~2019-12-24 13:36 UTC|newest]

Thread overview: 78+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-12-20 14:30 [PATCH v2 00/18] arm64: KVM: add SPE profiling support Andrew Murray
2019-12-20 14:30 ` [PATCH v2 01/18] dt-bindings: ARM SPE: highlight the need for PPI partitions on heterogeneous systems Andrew Murray
2019-12-20 14:30 ` [PATCH v2 02/18] arm64: KVM: reset E2PB correctly in MDCR_EL2 when exiting the guest(VHE) Andrew Murray
2019-12-21 13:12   ` Marc Zyngier
2019-12-24 10:29     ` Andrew Murray
2020-01-02 16:21       ` Andrew Murray
2019-12-20 14:30 ` [PATCH v2 03/18] arm64: KVM: define SPE data structure for each vcpu Andrew Murray
2019-12-21 13:19   ` Marc Zyngier
2019-12-24 12:01     ` Andrew Murray
2019-12-20 14:30 ` [PATCH v2 04/18] arm64: KVM: add SPE system registers to sys_reg_descs Andrew Murray
2019-12-20 14:30 ` [PATCH v2 05/18] arm64: KVM/VHE: enable the use PMSCR_EL12 on VHE systems Andrew Murray
2019-12-20 14:30 ` [PATCH v2 06/18] arm64: KVM: split debug save restore across vm/traps activation Andrew Murray
2019-12-20 14:30 ` [PATCH v2 07/18] arm64: KVM/debug: drop pmscr_el1 and use sys_regs[PMSCR_EL1] in kvm_cpu_context Andrew Murray
2019-12-20 14:30 ` [PATCH v2 08/18] arm64: KVM: add support to save/restore SPE profiling buffer controls Andrew Murray
2019-12-21 13:57   ` Marc Zyngier
2019-12-24 10:49     ` Andrew Murray
2019-12-24 15:17       ` Andrew Murray
2019-12-24 15:48         ` Marc Zyngier
2019-12-20 14:30 ` [PATCH v2 09/18] arm64: KVM: enable conditional save/restore full " Andrew Murray
2019-12-20 18:06   ` Mark Rutland
2019-12-24 12:15     ` Andrew Murray
2019-12-21 14:13   ` Marc Zyngier
2020-01-07 15:13     ` Andrew Murray
2020-01-08 11:17       ` Marc Zyngier
2020-01-08 11:58         ` Will Deacon
2020-01-08 12:36           ` Marc Zyngier
2020-01-08 13:10             ` Will Deacon
2020-01-09 11:23               ` Andrew Murray
2020-01-09 11:25                 ` Andrew Murray
2020-01-09 12:01                   ` Will Deacon
2020-01-10 10:54     ` Andrew Murray
2020-01-10 11:04       ` Andrew Murray
2020-01-10 11:51         ` Marc Zyngier
2020-01-10 12:12           ` Andrew Murray
2020-01-10 11:18       ` Marc Zyngier
2020-01-10 12:12         ` Andrew Murray
2020-01-10 13:34           ` Marc Zyngier
2019-12-20 14:30 ` [PATCH v2 10/18] arm64: KVM/debug: use EL1&0 stage 1 translation regime Andrew Murray
2019-12-22 10:34   ` Marc Zyngier
2019-12-24 11:11     ` Andrew Murray
2020-01-13 16:31     ` Andrew Murray
2020-01-15 14:03       ` Marc Zyngier
2019-12-20 14:30 ` [PATCH v2 11/18] KVM: arm64: don't trap Statistical Profiling controls to EL2 Andrew Murray
2019-12-20 18:08   ` Mark Rutland
2019-12-22 10:42   ` Marc Zyngier
2019-12-23 11:56     ` Andrew Murray
2019-12-23 12:05       ` Marc Zyngier
2019-12-23 12:10         ` Andrew Murray
2020-01-09 17:25           ` Andrew Murray
2020-01-09 17:42             ` Mark Rutland
2020-01-09 17:46               ` Andrew Murray
2019-12-20 14:30 ` [PATCH v2 12/18] KVM: arm64: add a new vcpu device control group for SPEv1 Andrew Murray
2019-12-22 11:03   ` Marc Zyngier
2019-12-24 12:30     ` Andrew Murray
2019-12-20 14:30 ` [PATCH v2 13/18] perf: arm_spe: Add KVM structure for obtaining IRQ info Andrew Murray
2019-12-22 11:24   ` Marc Zyngier
2019-12-24 12:35     ` Andrew Murray
2019-12-20 14:30 ` [PATCH v2 14/18] KVM: arm64: spe: Provide guest virtual interrupts for SPE Andrew Murray
2019-12-22 12:07   ` Marc Zyngier
2019-12-24 11:50     ` Andrew Murray
2019-12-24 12:42       ` Marc Zyngier
2019-12-24 13:08         ` Andrew Murray
2019-12-24 13:22           ` Marc Zyngier
2019-12-24 13:36             ` Andrew Murray [this message]
2019-12-24 13:46               ` Marc Zyngier
2019-12-20 14:30 ` [PATCH v2 15/18] perf: arm_spe: Handle guest/host exclusion flags Andrew Murray
2019-12-20 18:10   ` Mark Rutland
2019-12-22 12:10   ` Marc Zyngier
2019-12-23 12:10     ` Andrew Murray
2019-12-23 12:18       ` Marc Zyngier
2019-12-20 14:30 ` [PATCH v2 16/18] KVM: arm64: enable SPE support Andrew Murray
2019-12-20 14:30 ` [PATCH v2 17/18, KVMTOOL] update_headers: Sync kvm UAPI headers with linux v5.5-rc2 Andrew Murray
2019-12-20 14:30 ` [PATCH v2 18/18, KVMTOOL] kvm: add a vcpu feature for SPEv1 support Andrew Murray
2019-12-20 17:55 ` [PATCH v2 00/18] arm64: KVM: add SPE profiling support Mark Rutland
2019-12-24 12:54   ` Andrew Murray
2019-12-21 10:48 ` Marc Zyngier
2019-12-22 12:22   ` Marc Zyngier
2019-12-24 12:56     ` Andrew Murray

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191224133647.GO42593@e119886-lin.cambridge.arm.com \
    --to=andrew.murray@arm.com \
    --cc=catalin.marinas@arm.com \
    --cc=kvm@vger.kernel.org \
    --cc=kvmarm@lists.cs.columbia.edu \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=marc.zyngier@arm.com \
    --cc=maz@kernel.org \
    --cc=sudeep.holla@arm.com \
    --cc=will.deacon@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).