KVM Archive on lore.kernel.org
From: Marc Zyngier <maz@kernel.org>
To: Andrew Murray <andrew.murray@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will.deacon@arm.com>, <kvm@vger.kernel.org>,
	<linux-kernel@vger.kernel.org>,
	Sudeep Holla <sudeep.holla@arm.com>,
	<kvmarm@lists.cs.columbia.edu>,
	<linux-arm-kernel@lists.infradead.org>
Subject: Re: [PATCH v2 14/18] KVM: arm64: spe: Provide guest virtual interrupts for SPE
Date: Tue, 24 Dec 2019 12:42:02 +0000
Message-ID: <1f3fbff6c9db0f14c92a6e3fb800fa0f@www.loen.fr>
In-Reply-To: <20191224115031.GG42593@e119886-lin.cambridge.arm.com>

On 2019-12-24 11:50, Andrew Murray wrote:
> On Sun, Dec 22, 2019 at 12:07:50PM +0000, Marc Zyngier wrote:
>> On Fri, 20 Dec 2019 14:30:21 +0000,
>> Andrew Murray <andrew.murray@arm.com> wrote:
>> >
>> > Upon the exit of a guest, let's determine if the SPE device has generated
>> > an interrupt - if so we'll inject a virtual interrupt to the guest.
>> >
>> > Upon the entry and exit of a guest we'll also update the state of the
>> > physical IRQ such that it is active when a guest interrupt is pending
>> > and the guest is running.
>> >
>> > Finally we map the physical IRQ to the virtual IRQ such that the guest
>> > can deactivate the interrupt when it handles the interrupt.
>> >
>> > Signed-off-by: Andrew Murray <andrew.murray@arm.com>
>> > ---
>> >  include/kvm/arm_spe.h |  6 ++++
>> >  virt/kvm/arm/arm.c    |  5 ++-
>> >  virt/kvm/arm/spe.c    | 71 +++++++++++++++++++++++++++++++++++++++++++
>> >  3 files changed, 81 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/include/kvm/arm_spe.h b/include/kvm/arm_spe.h
>> > index 9c65130d726d..91b2214f543a 100644
>> > --- a/include/kvm/arm_spe.h
>> > +++ b/include/kvm/arm_spe.h
>> > @@ -37,6 +37,9 @@ static inline bool kvm_arm_support_spe_v1(void)
>> >  						      ID_AA64DFR0_PMSVER_SHIFT);
>> >  }
>> >
>> > +void kvm_spe_flush_hwstate(struct kvm_vcpu *vcpu);
>> > +inline void kvm_spe_sync_hwstate(struct kvm_vcpu *vcpu);
>> > +
>> >  int kvm_arm_spe_v1_set_attr(struct kvm_vcpu *vcpu,
>> >  			    struct kvm_device_attr *attr);
>> >  int kvm_arm_spe_v1_get_attr(struct kvm_vcpu *vcpu,
>> > @@ -49,6 +52,9 @@ int kvm_arm_spe_v1_enable(struct kvm_vcpu *vcpu);
>> >  #define kvm_arm_support_spe_v1()	(false)
>> >  #define kvm_arm_spe_irq_initialized(v)	(false)
>> >
>> > +static inline void kvm_spe_flush_hwstate(struct kvm_vcpu *vcpu) {}
>> > +static inline void kvm_spe_sync_hwstate(struct kvm_vcpu *vcpu) {}
>> > +
>> >  static inline int kvm_arm_spe_v1_set_attr(struct kvm_vcpu *vcpu,
>> >  					  struct kvm_device_attr *attr)
>> >  {
>> > diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
>> > index 340d2388ee2c..a66085c8e785 100644
>> > --- a/virt/kvm/arm/arm.c
>> > +++ b/virt/kvm/arm/arm.c
>> > @@ -741,6 +741,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>> >  		preempt_disable();
>> >
>> >  		kvm_pmu_flush_hwstate(vcpu);
>> > +		kvm_spe_flush_hwstate(vcpu);
>> >
>> >  		local_irq_disable();
>> >
>> > @@ -782,6 +783,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>> >  		    kvm_request_pending(vcpu)) {
>> >  			vcpu->mode = OUTSIDE_GUEST_MODE;
>> >  			isb(); /* Ensure work in x_flush_hwstate is committed */
>> > +			kvm_spe_sync_hwstate(vcpu);
>> >  			kvm_pmu_sync_hwstate(vcpu);
>> >  			if (static_branch_unlikely(&userspace_irqchip_in_use))
>> >  				kvm_timer_sync_hwstate(vcpu);
>> > @@ -816,11 +818,12 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>> >  		kvm_arm_clear_debug(vcpu);
>> >
>> >  		/*
>> > -		 * We must sync the PMU state before the vgic state so
>> > +		 * We must sync the PMU and SPE state before the vgic state so
>> >  		 * that the vgic can properly sample the updated state of the
>> >  		 * interrupt line.
>> >  		 */
>> >  		kvm_pmu_sync_hwstate(vcpu);
>> > +		kvm_spe_sync_hwstate(vcpu);
>>
>> The *HUGE* difference is that the PMU is purely a virtual interrupt,
>> while you're trying to deal with a HW interrupt here.
>>
>> >
>> >  		/*
>> >  		 * Sync the vgic state before syncing the timer state because
>> > diff --git a/virt/kvm/arm/spe.c b/virt/kvm/arm/spe.c
>> > index 83ac2cce2cc3..097ed39014e4 100644
>> > --- a/virt/kvm/arm/spe.c
>> > +++ b/virt/kvm/arm/spe.c
>> > @@ -35,6 +35,68 @@ int kvm_arm_spe_v1_enable(struct kvm_vcpu *vcpu)
>> >  	return 0;
>> >  }
>> >
>> > +static inline void set_spe_irq_phys_active(struct arm_spe_kvm_info *info,
>> > +					   bool active)
>> > +{
>> > +	int r;
>> > +	r = irq_set_irqchip_state(info->physical_irq, IRQCHIP_STATE_ACTIVE,
>> > +				  active);
>> > +	WARN_ON(r);
>> > +}
>> > +
>> > +void kvm_spe_flush_hwstate(struct kvm_vcpu *vcpu)
>> > +{
>> > +	struct kvm_spe *spe = &vcpu->arch.spe;
>> > +	bool phys_active = false;
>> > +	struct arm_spe_kvm_info *info = arm_spe_get_kvm_info();
>> > +
>> > +	if (!kvm_arm_spe_v1_ready(vcpu))
>> > +		return;
>> > +
>> > +	if (irqchip_in_kernel(vcpu->kvm))
>> > +		phys_active = kvm_vgic_map_is_active(vcpu, spe->irq_num);
>> > +
>> > +	phys_active |= spe->irq_level;
>> > +
>> > +	set_spe_irq_phys_active(info, phys_active);
>>
>> So you're happy to mess with the HW interrupt state even when you
>> don't have a HW irqchip? If you are going to copy paste the timer code
>> here, you'd need to support it all the way (no, don't).
>>
>> > +}
>> > +
>> > +void kvm_spe_sync_hwstate(struct kvm_vcpu *vcpu)
>> > +{
>> > +	struct kvm_spe *spe = &vcpu->arch.spe;
>> > +	u64 pmbsr;
>> > +	int r;
>> > +	bool service;
>> > +	struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;
>> > +	struct arm_spe_kvm_info *info = arm_spe_get_kvm_info();
>> > +
>> > +	if (!kvm_arm_spe_v1_ready(vcpu))
>> > +		return;
>> > +
>> > +	set_spe_irq_phys_active(info, false);
>> > +
>> > +	pmbsr = ctxt->sys_regs[PMBSR_EL1];
>> > +	service = !!(pmbsr & BIT(SYS_PMBSR_EL1_S_SHIFT));
>> > +	if (spe->irq_level == service)
>> > +		return;
>> > +
>> > +	spe->irq_level = service;
>> > +
>> > +	if (likely(irqchip_in_kernel(vcpu->kvm))) {
>> > +		r = kvm_vgic_inject_irq(vcpu->kvm, vcpu->vcpu_id,
>> > +					spe->irq_num, service, spe);
>> > +		WARN_ON(r);
>> > +	}
>> > +}
>> > +
>> > +static inline bool kvm_arch_arm_spe_v1_get_input_level(int vintid)
>> > +{
>> > +	struct kvm_vcpu *vcpu = kvm_arm_get_running_vcpu();
>> > +	struct kvm_spe *spe = &vcpu->arch.spe;
>> > +
>> > +	return spe->irq_level;
>> > +}
>>
>> This isn't what such a callback is for. It is supposed to sample the
>> HW, and nothing else.
>>
>> > +
>> >  static int kvm_arm_spe_v1_init(struct kvm_vcpu *vcpu)
>> >  {
>> >  	if (!kvm_arm_support_spe_v1())
>> > @@ -48,6 +110,7 @@ static int kvm_arm_spe_v1_init(struct kvm_vcpu *vcpu)
>> >
>> >  	if (irqchip_in_kernel(vcpu->kvm)) {
>> >  		int ret;
>> > +		struct arm_spe_kvm_info *info;
>> >
>> >  		/*
>> >  		 * If using the SPE with an in-kernel virtual GIC
>> > @@ -57,10 +120,18 @@ static int kvm_arm_spe_v1_init(struct kvm_vcpu *vcpu)
>> >  		if (!vgic_initialized(vcpu->kvm))
>> >  			return -ENODEV;
>> >
>> > +		info = arm_spe_get_kvm_info();
>> > +		if (!info->physical_irq)
>> > +			return -ENODEV;
>> > +
>> >  		ret = kvm_vgic_set_owner(vcpu, vcpu->arch.spe.irq_num,
>> >  					 &vcpu->arch.spe);
>> >  		if (ret)
>> >  			return ret;
>> > +
>> > +		ret = kvm_vgic_map_phys_irq(vcpu, info->physical_irq,
>> > +					    vcpu->arch.spe.irq_num,
>> > +					    kvm_arch_arm_spe_v1_get_input_level);
>>
>> You're mapping the interrupt into the guest, and yet you have never
>> forwarded the interrupt in the first place. All this flow is only going
>> to wreck the host driver as soon as an interrupt occurs.
>>
>> I think you should rethink the interrupt handling altogether. It would
>> make more sense if the interrupt was actually completely
>> virtualized. If you can isolate the guest state and compute the
>> interrupt state in SW (and from the above, it seems that you can),
>> then you shouldn't mess with the whole forwarding *at all*, as it
>> isn't designed for devices shared between host and guests.
>
> Yes it's possible to read SYS_PMBSR_EL1_S_SHIFT and determine if SPE wants
> service. If I understand correctly, you're suggesting on entry/exit to the
> guest we determine this and inject an interrupt to the guest. As well as
> removing the kvm_vgic_map_phys_irq mapping to the physical interrupt?

The mapping only makes sense for devices that have their interrupt
forwarded to a vcpu, where the expected flow is that the interrupt
is taken on the host with a normal interrupt handler and then
injected in the guest (you still have to manage the active state
though). The basic assumption is that such a device is entirely
owned by KVM.

Here, you're abusing the mapping interface: you don't have an
interrupt handler (the host SPE driver owns it), the interrupt
isn't forwarded, and yet you're messing with the active state.
None of that is expected, and you are in uncharted territory
as far as KVM is concerned.

What bothers me the most is that this looks a lot like a previous
implementation of the timers, and we had all the problems in the
world to keep track of the interrupt state *and* have a reasonable
level of performance (hitting the redistributor on the fast path
is a performance killer).

> My understanding was that I needed knowledge of the physical SPE interrupt
> number so that I could prevent the host SPE driver from getting spurious
> interrupts due to guest use of the SPE.

You can't completely rule out the host getting interrupted. Even if you set
PMBSR_EL1.S to zero, there is no guarantee that the host will not observe
the interrupt anyway (the GIC architecture doesn't tell you how quickly
it will be retired, if ever). The host driver already checks for this
anyway.
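
To illustrate the kind of check meant here, a minimal user-space sketch
(not the actual host SPE driver code): the handler looks at the service
bit and reports the interrupt as not handled when the bit is already
clear. Every model_ name is hypothetical, and the bit position used for
PMBSR_EL1.S is an assumption of this model.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: none of these names are kernel APIs. */
#define MODEL_PMBSR_S (UINT64_C(1) << 17) /* assumed position of PMBSR_EL1.S */

enum model_irqreturn { MODEL_IRQ_NONE, MODEL_IRQ_HANDLED };

/*
 * Model of a host handler that tolerates late or spurious interrupts:
 * if the buffer is not asking for service, there is nothing to do.
 */
static enum model_irqreturn model_host_spe_irq(uint64_t *pmbsr)
{
	if (!(*pmbsr & MODEL_PMBSR_S))
		return MODEL_IRQ_NONE;	/* spurious: S is already clear */

	/* Draining the profiling buffer is left out of this model. */
	*pmbsr &= ~MODEL_PMBSR_S;	/* acknowledge the service request */
	return MODEL_IRQ_HANDLED;
}
```

With a handler shaped like this, an interrupt that arrives after the
guest's S bit has been saved away simply returns MODEL_IRQ_NONE.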

What you need to ensure is that PMBSR_EL1.S being set on guest entry
doesn't immediately kick you out of the guest and prevent forward
progress. This is why you need to manage the active state.
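
A rough user-space model of that forward-progress problem, under the
assumption that a level interrupt is only signalled to the CPU while
its line is asserted and it is not marked active. The struct and
function names are made up for illustration and are not KVM or GIC
driver interfaces.

```c
#include <stdbool.h>

/* Toy model only: none of these names are kernel APIs. */
struct model_ppi {
	bool line;	/* level driven by the device (PMBSR_EL1.S here) */
	bool active;	/* active state held at the interrupt controller */
};

/* Signalled to the CPU only while asserted and not already active. */
static bool model_ppi_signalled(const struct model_ppi *ppi)
{
	return ppi->line && !ppi->active;
}

/*
 * Count how many of 'attempts' guest entries get to run guest code.
 * An entry is lost when the physical interrupt fires straight away.
 */
static unsigned int model_guest_entries_that_run(struct model_ppi *ppi,
						 bool mark_active,
						 unsigned int attempts)
{
	unsigned int ran = 0;

	for (unsigned int i = 0; i < attempts; i++) {
		ppi->active = mark_active;	/* state programmed before entry */
		if (!model_ppi_signalled(ppi))
			ran++;			/* guest makes forward progress */
		ppi->active = false;		/* state cleared again on exit */
	}
	return ran;
}
```

With the line asserted and mark_active false, no entry ever runs the
guest in this model; with mark_active true, every entry does.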

The real question is: how quickly do you want to react to a SPE
interrupt firing while in a guest?

If you want to take it into account as soon as it fires, then you need
to eagerly save/restore the active state together with the SPE state on
each entry/exit, and performance will suffer. This is what you are
currently doing.

If you're OK with evaluating the interrupt status on exit, but without
the interrupt itself causing an exit, then you can simply manage it
as a purely virtual interrupt, and just deal with the active state
in load/put (set the interrupt as active on load, clear it on put).

Given that SPE interrupts always indicate that profiling has stopped,
this only affects the size of the black hole, and I'm inclined to do
the latter.
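
As a sketch of what the latter option could look like, here is a small
user-space model, not a proposed patch: the physical interrupt is
marked active on load and cleared on put, and the virtual interrupt
level is recomputed from the saved guest PMBSR_EL1.S on exit, with an
injection only when the level changes. All model_ names are
hypothetical, and the bit position of S is an assumption of the model.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: none of these names are kernel APIs. */
#define MODEL_PMBSR_S (UINT64_C(1) << 17) /* assumed position of PMBSR_EL1.S */

struct model_vcpu {
	uint64_t guest_pmbsr;	/* saved guest copy of PMBSR_EL1 */
	bool irq_level;		/* last level given to the virtual GIC */
	bool phys_active;	/* modelled active state of the physical IRQ */
	unsigned int injections;/* number of level changes injected */
};

/* vcpu load: mark the physical interrupt active so it cannot fire. */
static void model_vcpu_load(struct model_vcpu *v)
{
	v->phys_active = true;
}

/* vcpu put: hand the interrupt back to the host in the inactive state. */
static void model_vcpu_put(struct model_vcpu *v)
{
	v->phys_active = false;
}

/*
 * Guest exit: compute the level in software from the saved guest state.
 * Returns true when a level change was injected.
 */
static bool model_sync_on_exit(struct model_vcpu *v)
{
	bool service = (v->guest_pmbsr & MODEL_PMBSR_S) != 0;

	if (v->irq_level == service)
		return false;

	v->irq_level = service;
	v->injections++;	/* stands in for injecting into the vgic */
	return true;
}
```

Nothing on the entry/exit path touches the physical interrupt state in
this model; only load and put do.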

         M.
-- 
Jazz is not dead. It just smells funny...
