All of lore.kernel.org
 help / color / mirror / Atom feed
From: Christoffer Dall <christoffer.dall@linaro.org>
To: Anup Patel <anup@brainfault.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>,
	"kvmarm@lists.cs.columbia.edu" <kvmarm@lists.cs.columbia.edu>,
	linux-arm-kernel <linux-arm-kernel@lists.infradead.org>,
	KVM General <kvm@vger.kernel.org>, patches <patches@apm.com>,
	Will Deacon <Will.Deacon@arm.com>,
	"Ian.Campbell@citrix.com" <Ian.Campbell@citrix.com>,
	Pranavkumar Sawargaonkar <pranavkumar@linaro.org>
Subject: Re: [RFC PATCH 0/6] ARM64: KVM: PMU infrastructure support
Date: Mon, 16 Feb 2015 13:23:09 +0100	[thread overview]
Message-ID: <20150216122309.GA23821@cbox> (raw)
In-Reply-To: <CAAhSdy1Nj1THsdF6UeM--ihmk8mnDR38Bi9v_ri3Y-MRKD34zg@mail.gmail.com>

On Mon, Feb 16, 2015 at 05:46:54PM +0530, Anup Patel wrote:
> Hi Christoffer,
> 
> On Sun, Feb 15, 2015 at 9:03 PM, Christoffer Dall
> <christoffer.dall@linaro.org> wrote:
> > Hi Anup,
> >
> > On Mon, Jan 12, 2015 at 09:49:13AM +0530, Anup Patel wrote:
> >> On Mon, Jan 12, 2015 at 12:41 AM, Christoffer Dall
> >> <christoffer.dall@linaro.org> wrote:
> >> > On Tue, Dec 30, 2014 at 11:19:13AM +0530, Anup Patel wrote:
> >> >> (dropping previous conversation for easy reading)
> >> >>
> >> >> Hi Marc/Christoffer,
> >> >>
> >> >> I tried implementing PMU context-switch via C code
> >> >> in EL1 mode and in atomic context with irqs disabled.
> >> >> The context switch itself works perfectly fine but
> >> >> irq forwarding is not clean for PMU irq.
> >> >>
> >> >> I found another issue that is GIC only samples irq
> >> >> lines if they are enabled. This means for using
> >> >> irq forwarding we will need to ensure that host PMU
> >> >> irq is enabled.  The arch_timer code does this by
> >> >> doing request_irq() for host virtual timer interrupt.
> >> >> For PMU, we can either enable/disable host PMU
> >> >> irq in context switch or we need to do have shared
> >> >> irq handler between kvm pmu and host kernel pmu.
> >> >
> >> > could we simply require the host PMU driver to request the IRQ and have
> >> > the driver inject the corresponding IRQ to the VM via a mechanism
> >> > similar to VFIO using an eventfd and irqfds etc.?
> >>
> >> Currently, the host PMU driver does request_irq() only when
> >> there is some event to be monitored. This means host will do
> >> request_irq() only when we run perf application on host
> >> user space.
> >>
> >> Initially, I though that we could simply pass IRQF_SHARED
> >> for request_irq() in host PMU driver and do the same for
> >> reqest_irq() in KVM PMU code but the PMU irq can be
> >> SPI or PPI. If the PMU irq is SPI then IRQF_SHARED
> >> flag would fine but if its PPI then we have no way to
> >> set IRQF_SHARED flag because request_percpu_irq()
> >> does not have irq flags parameter.
> >>
> >> >
> >> > (I haven't quite thought through if there's a way for the host PMU
> >> > driver to distinguish between an IRQ for itself and one for the guest,
> >> > though).
> >> >
> >> > It does feel like we will need some sort of communication/coordination
> >> > between the host PMU driver and KVM...
> >> >
> >> >>
> >> >> I have rethinked about our discussion so far. I
> >> >> understand that we need KVM PMU virtualization
> >> >> to meet following criteria:
> >> >> 1. No modification in host PMU driver
> >> >
> >> > is this really a strict requirement?  one of the advantages of KVM
> >> > should be that the rest of the kernel should be supportive of KVM.
> >>
> >> I guess so because host PMU driver should not do things
> >> differently for host and guest. I think this the reason why
> >> we discarded the mask/unmask PMU irq approach which
> >> I had implemented in RFC v1.
> >>
> >> >
> >> >> 2. No modification in guest PMU driver
> >> >> 3. No mask/unmask dance for sharing host PMU irq
> >> >> 4. Clean way to avoid infinite VM exits due to
> >> >> PMU interrupt
> >> >>
> >> >> I have discovered new approach which is as follows:
> >> >> 1. Context switch PMU in atomic context (i.e. local_irq_disable())
> >> >> 2. Ensure that host PMU irq is disabled when entering guest
> >> >> mode and re-enable host PMU irq when exiting guest mode if
> >> >> it was enabled previously.
> >> >
> >> > How does this look like software-engineering wise?  Would you be looking
> >> > up the IRQ number from the DT in the KVM code again?  How does KVM then
> >> > synchronize with the host PMU driver so they're not both requesting the
> >> > same IRQ at the same time?
> >>
> >> We only lookup host PMU irq numbers from DT at HYP init time.
> >>
> >> During context switch we know the host PMU irq number for
> >> current host CPU so we can get state of host PMU irq in
> >> context switch code.
> >>
> >> If we go by the shard irq handler approach then both KVM
> >> and host PMU driver will do request_irq() on same host
> >> PMU irq. In other words, there is no virtual PMU irq provided
> >> by HW for guest.
> >>
> >
> > Sorry for the *really* long delay in this response.
> >
> > We had a chat about this subject with Will Deacon and Marc Zyngier
> > during connect, and basically we came to think of a number of problems
> > with the current approach:
> >
> > 1. As you pointed out, there is a need for a shared IRQ handler, and
> >    there is no immediately nice way to implement this without a more
> >    sophisticated perf/kvm interface, probably comprising eventfds or
> >    something similar.
> >
> > 2. Hijacking the counters for the VM without perf knowing about it
> >    basically makes it impossible to do system-wide event counting, an
> >    important use case for a virtualization host.
> >
> > So the approach we will be taking now would be to:
> >
> > First, implement a strictly trap-and-emulate in software approach.  This
> > would allow any software relying on access to performance counters to
> > work, although potentially with slightly unprecise values.  This is the
> > approach taken by x86 and would be significantly simpler to support on
> > systems like big.LITTLE as well.
> 
> Actually, trap-and-emulate would also help avoid additions to the
> KVM world switch.
> 
> >
> > Second, if there are values obtained from within the guest that are so
> > skewed by the trap-and-emulate approach that we need to give the guest
> > access to counters, we should try to share the hardware by partitioning
> > the physical counters, but again, we need to coordinate with the host
> > perf system for this.  We would only be pursuing this approach if
> > absolutely necessary.
> 
> Yes, with trap-and-emulate we cannot accurately emulate all types
> of hw counters (particularly cache misses and similar events).
> 
> >
> > Apologies for the change in direction on this.
> >
> > What are your thoughts?  Do you still have time/interest to work
> > on any of this?
> 
> Its a drastic change in direction.
> 
> Currently, I have taken up some different work (not related to KVM)
> so for next few months I wont be able spend time on this.
> 
> Its better if Linaro takes this work to avoid any further delays.
> 
ok, will do, thanks for being responsive and putting in the efforts so
far!

Best,
-Christoffer

WARNING: multiple messages have this Message-ID (diff)
From: christoffer.dall@linaro.org (Christoffer Dall)
To: linux-arm-kernel@lists.infradead.org
Subject: [RFC PATCH 0/6] ARM64: KVM: PMU infrastructure support
Date: Mon, 16 Feb 2015 13:23:09 +0100	[thread overview]
Message-ID: <20150216122309.GA23821@cbox> (raw)
In-Reply-To: <CAAhSdy1Nj1THsdF6UeM--ihmk8mnDR38Bi9v_ri3Y-MRKD34zg@mail.gmail.com>

On Mon, Feb 16, 2015 at 05:46:54PM +0530, Anup Patel wrote:
> Hi Christoffer,
> 
> On Sun, Feb 15, 2015 at 9:03 PM, Christoffer Dall
> <christoffer.dall@linaro.org> wrote:
> > Hi Anup,
> >
> > On Mon, Jan 12, 2015 at 09:49:13AM +0530, Anup Patel wrote:
> >> On Mon, Jan 12, 2015 at 12:41 AM, Christoffer Dall
> >> <christoffer.dall@linaro.org> wrote:
> >> > On Tue, Dec 30, 2014 at 11:19:13AM +0530, Anup Patel wrote:
> >> >> (dropping previous conversation for easy reading)
> >> >>
> >> >> Hi Marc/Christoffer,
> >> >>
> >> >> I tried implementing PMU context-switch via C code
> >> >> in EL1 mode and in atomic context with irqs disabled.
> >> >> The context switch itself works perfectly fine but
> >> >> irq forwarding is not clean for PMU irq.
> >> >>
> >> >> I found another issue that is GIC only samples irq
> >> >> lines if they are enabled. This means for using
> >> >> irq forwarding we will need to ensure that host PMU
> >> >> irq is enabled.  The arch_timer code does this by
> >> >> doing request_irq() for host virtual timer interrupt.
> >> >> For PMU, we can either enable/disable host PMU
> >> >> irq in context switch or we need to do have shared
> >> >> irq handler between kvm pmu and host kernel pmu.
> >> >
> >> > could we simply require the host PMU driver to request the IRQ and have
> >> > the driver inject the corresponding IRQ to the VM via a mechanism
> >> > similar to VFIO using an eventfd and irqfds etc.?
> >>
> >> Currently, the host PMU driver does request_irq() only when
> >> there is some event to be monitored. This means host will do
> >> request_irq() only when we run perf application on host
> >> user space.
> >>
> >> Initially, I though that we could simply pass IRQF_SHARED
> >> for request_irq() in host PMU driver and do the same for
> >> reqest_irq() in KVM PMU code but the PMU irq can be
> >> SPI or PPI. If the PMU irq is SPI then IRQF_SHARED
> >> flag would fine but if its PPI then we have no way to
> >> set IRQF_SHARED flag because request_percpu_irq()
> >> does not have irq flags parameter.
> >>
> >> >
> >> > (I haven't quite thought through if there's a way for the host PMU
> >> > driver to distinguish between an IRQ for itself and one for the guest,
> >> > though).
> >> >
> >> > It does feel like we will need some sort of communication/coordination
> >> > between the host PMU driver and KVM...
> >> >
> >> >>
> >> >> I have rethinked about our discussion so far. I
> >> >> understand that we need KVM PMU virtualization
> >> >> to meet following criteria:
> >> >> 1. No modification in host PMU driver
> >> >
> >> > is this really a strict requirement?  one of the advantages of KVM
> >> > should be that the rest of the kernel should be supportive of KVM.
> >>
> >> I guess so because host PMU driver should not do things
> >> differently for host and guest. I think this the reason why
> >> we discarded the mask/unmask PMU irq approach which
> >> I had implemented in RFC v1.
> >>
> >> >
> >> >> 2. No modification in guest PMU driver
> >> >> 3. No mask/unmask dance for sharing host PMU irq
> >> >> 4. Clean way to avoid infinite VM exits due to
> >> >> PMU interrupt
> >> >>
> >> >> I have discovered new approach which is as follows:
> >> >> 1. Context switch PMU in atomic context (i.e. local_irq_disable())
> >> >> 2. Ensure that host PMU irq is disabled when entering guest
> >> >> mode and re-enable host PMU irq when exiting guest mode if
> >> >> it was enabled previously.
> >> >
> >> > How does this look like software-engineering wise?  Would you be looking
> >> > up the IRQ number from the DT in the KVM code again?  How does KVM then
> >> > synchronize with the host PMU driver so they're not both requesting the
> >> > same IRQ at the same time?
> >>
> >> We only lookup host PMU irq numbers from DT at HYP init time.
> >>
> >> During context switch we know the host PMU irq number for
> >> current host CPU so we can get state of host PMU irq in
> >> context switch code.
> >>
> >> If we go by the shard irq handler approach then both KVM
> >> and host PMU driver will do request_irq() on same host
> >> PMU irq. In other words, there is no virtual PMU irq provided
> >> by HW for guest.
> >>
> >
> > Sorry for the *really* long delay in this response.
> >
> > We had a chat about this subject with Will Deacon and Marc Zyngier
> > during connect, and basically we came to think of a number of problems
> > with the current approach:
> >
> > 1. As you pointed out, there is a need for a shared IRQ handler, and
> >    there is no immediately nice way to implement this without a more
> >    sophisticated perf/kvm interface, probably comprising eventfds or
> >    something similar.
> >
> > 2. Hijacking the counters for the VM without perf knowing about it
> >    basically makes it impossible to do system-wide event counting, an
> >    important use case for a virtualization host.
> >
> > So the approach we will be taking now would be to:
> >
> > First, implement a strictly trap-and-emulate in software approach.  This
> > would allow any software relying on access to performance counters to
> > work, although potentially with slightly unprecise values.  This is the
> > approach taken by x86 and would be significantly simpler to support on
> > systems like big.LITTLE as well.
> 
> Actually, trap-and-emulate would also help avoid additions to the
> KVM world switch.
> 
> >
> > Second, if there are values obtained from within the guest that are so
> > skewed by the trap-and-emulate approach that we need to give the guest
> > access to counters, we should try to share the hardware by partitioning
> > the physical counters, but again, we need to coordinate with the host
> > perf system for this.  We would only be pursuing this approach if
> > absolutely necessary.
> 
> Yes, with trap-and-emulate we cannot accurately emulate all types
> of hw counters (particularly cache misses and similar events).
> 
> >
> > Apologies for the change in direction on this.
> >
> > What are your thoughts?  Do you still have time/interest to work
> > on any of this?
> 
> Its a drastic change in direction.
> 
> Currently, I have taken up some different work (not related to KVM)
> so for next few months I wont be able spend time on this.
> 
> Its better if Linaro takes this work to avoid any further delays.
> 
ok, will do, thanks for being responsive and putting in the efforts so
far!

Best,
-Christoffer

  reply	other threads:[~2015-02-16 12:22 UTC|newest]

Thread overview: 78+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-08-05  9:24 [RFC PATCH 0/6] ARM64: KVM: PMU infrastructure support Anup Patel
2014-08-05  9:24 ` Anup Patel
2014-08-05  9:24 ` [RFC PATCH 1/6] ARM64: Move PMU register related defines to asm/pmu.h Anup Patel
2014-08-05  9:24   ` Anup Patel
2014-08-05  9:24 ` [RFC PATCH 2/6] ARM64: perf: Re-enable overflow interrupt from interrupt handler Anup Patel
2014-08-05  9:24   ` Anup Patel
2014-08-06 14:24   ` Will Deacon
2014-08-06 14:24     ` Will Deacon
2014-08-07  9:03     ` Anup Patel
2014-08-07  9:03       ` Anup Patel
2014-08-07  9:06       ` Will Deacon
2014-08-07  9:06         ` Will Deacon
2014-08-05  9:24 ` [RFC PATCH 3/6] ARM: " Anup Patel
2014-08-05  9:24   ` Anup Patel
2014-08-05  9:24 ` [RFC PATCH 4/6] ARM/ARM64: KVM: Add common code PMU IRQ routing Anup Patel
2014-08-05  9:24   ` Anup Patel
2014-08-05  9:24 ` [RFC PATCH 5/6] ARM64: KVM: Implement full context switch of PMU registers Anup Patel
2014-08-05  9:24   ` Anup Patel
2014-08-05  9:24 ` [RFC PATCH 6/6] ARM64: KVM: Upgrade to lazy " Anup Patel
2014-08-05  9:24   ` Anup Patel
2014-08-05  9:32 ` [RFC PATCH 0/6] ARM64: KVM: PMU infrastructure support Anup Patel
2014-08-05  9:32   ` Anup Patel
2014-08-05  9:35   ` Anup Patel
2014-08-05  9:35     ` Anup Patel
2014-11-07 20:23 ` Christoffer Dall
2014-11-07 20:23   ` Christoffer Dall
2014-11-07 20:25 ` Christoffer Dall
2014-11-07 20:25   ` Christoffer Dall
2014-11-08  9:36   ` Anup Patel
2014-11-08  9:36     ` Anup Patel
2014-11-08 12:39     ` Christoffer Dall
2014-11-08 12:39       ` Christoffer Dall
2014-11-11  9:18       ` Anup Patel
2014-11-11  9:18         ` Anup Patel
2014-11-18  3:24         ` Anup Patel
2014-11-18  3:24           ` Anup Patel
2014-11-19 15:29         ` Christoffer Dall
2014-11-19 15:29           ` Christoffer Dall
2014-11-20 14:47           ` Anup Patel
2014-11-20 14:47             ` Anup Patel
2014-11-21  9:59             ` Christoffer Dall
2014-11-21  9:59               ` Christoffer Dall
2014-11-21 10:36               ` Anup Patel
2014-11-21 10:36                 ` Anup Patel
2014-11-21 11:49                 ` Christoffer Dall
2014-11-21 11:49                   ` Christoffer Dall
2014-11-24  8:44                   ` Anup Patel
2014-11-24  8:44                     ` Anup Patel
2014-11-24 14:37                     ` Christoffer Dall
2014-11-24 14:37                       ` Christoffer Dall
2014-11-25 12:47                       ` Anup Patel
2014-11-25 12:47                         ` Anup Patel
2014-11-25 13:42                         ` Christoffer Dall
2014-11-25 13:42                           ` Christoffer Dall
2014-11-27 10:22                           ` Anup Patel
2014-11-27 10:22                             ` Anup Patel
2014-11-27 10:40                             ` Marc Zyngier
2014-11-27 10:40                               ` Marc Zyngier
2014-11-27 10:54                               ` Anup Patel
2014-11-27 10:54                                 ` Anup Patel
2014-11-27 11:06                                 ` Marc Zyngier
2014-11-27 11:06                                   ` Marc Zyngier
2014-12-30  5:49                                   ` Anup Patel
2014-12-30  5:49                                     ` Anup Patel
2015-01-08  4:02                                     ` Anup Patel
2015-01-08  4:02                                       ` Anup Patel
2015-01-11 19:11                                     ` Christoffer Dall
2015-01-11 19:11                                       ` Christoffer Dall
2015-01-12  4:19                                       ` Anup Patel
2015-01-12  4:19                                         ` Anup Patel
2015-02-15 15:33                                         ` Christoffer Dall
2015-02-15 15:33                                           ` Christoffer Dall
2015-02-16 12:16                                           ` Anup Patel
2015-02-16 12:16                                             ` Anup Patel
2015-02-16 12:23                                             ` Christoffer Dall [this message]
2015-02-16 12:23                                               ` Christoffer Dall
2015-01-14  4:28                                       ` Anup Patel
2015-01-14  4:28                                         ` Anup Patel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150216122309.GA23821@cbox \
    --to=christoffer.dall@linaro.org \
    --cc=Ian.Campbell@citrix.com \
    --cc=Will.Deacon@arm.com \
    --cc=anup@brainfault.org \
    --cc=kvm@vger.kernel.org \
    --cc=kvmarm@lists.cs.columbia.edu \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=marc.zyngier@arm.com \
    --cc=patches@apm.com \
    --cc=pranavkumar@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.