linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Alexandru Elisei <alexandru.elisei@arm.com>
To: Suzuki K Poulose <suzuki.poulose@arm.com>,
	maz@kernel.org, james.morse@arm.com,
	linux-arm-kernel@lists.infradead.org,
	kvmarm@lists.cs.columbia.edu, will@kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH v4 00/39] KVM: arm64: Add Statistical Profiling Extension (SPE) support
Date: Thu, 23 Sep 2021 16:12:27 +0100	[thread overview]
Message-ID: <62a2f16c-3aa0-3d7e-4b35-3a5bad701586@arm.com> (raw)
In-Reply-To: <963f68c8-b109-7ebb-751d-14ce46e3cdde@arm.com>

Hi Suzuki,

Thank you for having a look!

On 9/22/21 11:11, Suzuki K Poulose wrote:
> On 25/08/2021 17:17, Alexandru Elisei wrote:
>> This is v4 of the SPE series posted at [1]. v2 can be found at [2], and the
>> original series at [3].
>>
>> Statistical Profiling Extension (SPE) is an optional feature added in
>> ARMv8.2. It allows sampling at regular intervals of the operations executed
>> by the PE and storing a record of each operation in a memory buffer. A high
>> level overview of the extension is presented in an article on arm.com [4].
>>
>> This is another complete rewrite of the series, and nothing is set in
>> stone. If you think of a better way to do things, please suggest it.
>>
>>
>> Features added
>> ==============
>>
>> The rewrite enabled me to add support for several features not
>> present in the previous iteration:
>>
>> - Support for heterogeneous systems, where only some of the CPUs support SPE.
>>    This is accomplished via the KVM_ARM_VCPU_SUPPORTED_CPUS VCPU ioctl.
>>
>> - Support for VM migration with the KVM_ARM_VCPU_SPE_CTRL(KVM_ARM_VCPU_SPE_STOP)
>>    VCPU ioctl.
>>
>> - The requirement for userspace to mlock() the guest memory has been removed,
>>    and now userspace can make changes to memory contents after the memory is
>>    mapped at stage 2.
>>
>> - Better debugging of guest memory pinning by printing a warning when we
>>    get an unexpected read or write fault. This helped me catch several bugs
>>    during development, it has already proven very useful. Many thanks to
>>    James who suggested when reviewing v3.
>>
>>
>> Missing features
>> ================
>>
>> I've tried to keep the series as small as possible to make it easier to review,
>> while implementing the core functionality needed for the SPE emulation. As such,
>> I've chosen to not implement several features:
>>
>> - Host profiling a guest which has the SPE feature bit set (see open
>>    questions).
>>
>> - No errata workarounds have been implemented yet, and there are quite a few of
>>    them for Neoverse N1 and Neoverse V1.
>>
>> - Disabling CONFIG_NUMA_BALANCING is a hack to get KVM SPE to work and I am
>>    investigating other ways to get around automatic numa balancing, like
>>    requiring userspace to disable it via set_mempolicy(). I am also going to
>>    look at how VFIO gets around it. Suggestions welcome.
>>
>> - There's plenty of room for optimization. Off the top of my head, using
>>    block mappings at stage 2, batch pinning of pages (similar to what VFIO
>>    does), optimize the way KVM keeps track of pinned pages (using a linked
>>    list triples the memory usage), context-switch the SPE registers on
>>    vcpu_load/vcpu_put on VHE if the host is not profiling, locking
>>    optimizations, etc, etc.
>>
>> - ...and others. I'm sure I'm missing at least a few things which are
>>    important for someone.
>>
>>
>> Known issues
>> ============
>>
>> This is an RFC, so keep in mind that almost definitely there will be scary
>> bugs. For example, below is a list of known issues which don't affect the
>> correctness of the emulation, and which I'm planning to fix in a future
>> iteration:
>>
>> - With CONFIG_PROVE_LOCKING=y, lockdep complains about lock contention when
>>    the VCPU executes the dcache clean pending ops.
>>
>> - With CONFIG_PROVE_LOCKING=y, KVM will hit a BUG at
>>    kvm_lock_all_vcpus()->mutex_trylock(&vcpu->mutex) with more than 48
>>    VCPUs.
>>
>> This BUG statement can also be triggered with mainline. To reproduce it,
>> compile kvmtool from this branch [5] and follow the instruction in the
>> kvmtool commit message.
>>
>> One workaround could be to stop trying to lock all VCPUs when locking a
>> memslot and document the fact that it is required that no VCPUs are run
>> before the ioctl completes, otherwise bad things might happen to the VM.
>>
>>
>> Open questions
>> ==============
>>
>> 1. Implementing support for host profiling a guest with the SPE feature
>> means setting the profiling buffer owning regime to EL2. While that is in
>> effect,  PMBIDR_EL1.P will equal 1. This has two consequences: if the guest
>> probes SPE during this time, the driver will fail; and the guest will be
>> able to determine when it is profiled. I see two options here:
>
> This doesn't mean the EL2 is owning the SPE. It only tells you that a
> higher level EL is owning the SPE. It could as well be EL3. (e.g, MDCR_EL3.NSPB
> == 0 or 1). So I think this is architecturally correct,
> as long as we trap the guest access to other SPE registers and inject
> and UNDEF.

You are right, I was wrong about the part about the guest being able to determine
when it is profiled, I forgot that PMBIDR_EL1.P can also be 1 if EL3 is owning the
buffer.

But I don't understand why you are suggesting that KVM injects UNDEF in this case.
I was thinking that when the host is profiling the guest, KVM can store guest
writes to the SPE registers to the shadow copy of the registers. On a world switch
to guest, KVM will not restore the registers onto the hardware as to not interfere
with the profiling operation performed by the host. When profiling ends, KVM can
then let the guest use SPE again by restoring the guest register values onto the
hardware at every world switch and setting the buffer owning regime to EL1 again.

I put this as an open question because if the guest sees PMBIDR_EL1.P set when the
guest SPE driver probes, the driver will likely decide that SPE is not available.
But the VM has been created with SPE enabled because the host userspace wants the
guest to have access to SPE, and the host userspace might not realize that the act
of profiling a guest can make SPE unusable by that guest. I am inclined now to let
the host's userspace do whatever it wishes and allow it to profile a guest if it
so desires, and mention this possible side effect in the KVM documentation, as it
might be surprising for someone who isn't familiar with the inner workings of
KVM's SPE emulation.

Thanks,

Alex

>
>
> Thanks
> Suzuki

      reply	other threads:[~2021-09-23 15:11 UTC|newest]

Thread overview: 47+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-25 16:17 [RFC PATCH v4 00/39] KVM: arm64: Add Statistical Profiling Extension (SPE) support Alexandru Elisei
2021-08-25 16:17 ` [RFC PATCH v4 01/39] KVM: arm64: Make lock_all_vcpus() available to the rest of KVM Alexandru Elisei
2021-09-22 10:39   ` Suzuki K Poulose
2021-08-25 16:17 ` [RFC PATCH v4 02/39] KVM: arm64: Add lock/unlock memslot user API Alexandru Elisei
2021-10-18  9:04   ` Suzuki K Poulose
2021-10-18 14:09     ` Alexandru Elisei
2021-08-25 16:17 ` [RFC PATCH v4 03/39] KVM: arm64: Implement the memslot lock/unlock functionality Alexandru Elisei
2021-08-25 16:17 ` [RFC PATCH v4 04/39] KVM: arm64: Defer CMOs for locked memslots until a VCPU is run Alexandru Elisei
2021-10-17 11:43   ` Marc Zyngier
2021-10-18 12:56     ` Alexandru Elisei
2021-08-25 16:17 ` [RFC PATCH v4 05/39] KVM: arm64: Perform CMOs on locked memslots when userspace resets VCPUs Alexandru Elisei
2021-08-25 16:17 ` [RFC PATCH v4 06/39] KVM: arm64: Delay tag scrubbing for locked memslots until a VCPU runs Alexandru Elisei
2021-08-25 16:17 ` [RFC PATCH v4 07/39] KVM: arm64: Unlock memslots after stage 2 tables are freed Alexandru Elisei
2021-08-25 16:17 ` [RFC PATCH v4 08/39] KVM: arm64: Deny changes to locked memslots Alexandru Elisei
2021-08-25 16:17 ` [RFC PATCH v4 09/39] KVM: Add kvm_warn{,_ratelimited} macros Alexandru Elisei
2021-08-25 16:17 ` [RFC PATCH v4 10/39] KVM: arm64: Print a warning for unexpected faults on locked memslots Alexandru Elisei
2021-08-25 16:17 ` [RFC PATCH v4 11/39] KVM: arm64: Allow userspace to lock and unlock memslots Alexandru Elisei
2021-08-25 16:17 ` [RFC PATCH v4 12/39] KVM: arm64: Add the KVM_ARM_VCPU_SUPPORTED_CPUS VCPU ioctl Alexandru Elisei
2021-08-25 16:17 ` [RFC PATCH v4 13/39] KVM: arm64: Add CONFIG_KVM_ARM_SPE Kconfig option Alexandru Elisei
2021-08-25 16:17 ` [RFC PATCH v4 14/39] KVM: arm64: Add SPE capability and VCPU feature Alexandru Elisei
2021-08-25 16:17 ` [RFC PATCH v4 15/39] drivers/perf: Expose the cpumask of CPUs that support SPE Alexandru Elisei
2021-08-25 16:17 ` [RFC PATCH v4 16/39] KVM: arm64: Make SPE available when at least one CPU supports it Alexandru Elisei
2021-08-25 16:17 ` [RFC PATCH v4 17/39] KVM: arm64: Set the VCPU SPE feature bit when SPE is available Alexandru Elisei
2021-08-25 16:17 ` [RFC PATCH v4 18/39] KVM: arm64: Expose SPE version to guests Alexandru Elisei
2021-08-25 16:17 ` [RFC PATCH v4 19/39] KVM: arm64: Do not emulate SPE on CPUs which don't have SPE Alexandru Elisei
2021-08-25 16:17 ` [RFC PATCH v4 20/39] KVM: arm64: Add a new VCPU device control group for SPE Alexandru Elisei
2021-08-25 16:17 ` [RFC PATCH v4 21/39] KVM: arm64: Add SPE VCPU device attribute to set the interrupt number Alexandru Elisei
2021-08-25 16:17 ` [RFC PATCH v4 22/39] KVM: arm64: Add SPE VCPU device attribute to initialize SPE Alexandru Elisei
2021-08-25 16:17 ` [RFC PATCH v4 23/39] KVM: arm64: VHE: Clear MDCR_EL2.E2PB in vcpu_put() Alexandru Elisei
2021-08-25 16:18 ` [RFC PATCH v4 24/39] KVM: arm64: debug: Configure MDCR_EL2 when a VCPU has SPE Alexandru Elisei
2021-08-25 16:18 ` [RFC PATCH v4 25/39] KVM: arm64: Move the write to MDCR_EL2 out of __activate_traps_common() Alexandru Elisei
2021-08-25 16:18 ` [RFC PATCH v4 26/39] KVM: arm64: VHE: Change MDCR_EL2 at world switch if VCPU has SPE Alexandru Elisei
2021-08-25 16:18 ` [RFC PATCH v4 27/39] KVM: arm64: Add SPE system registers to VCPU context Alexandru Elisei
2021-08-25 16:18 ` [RFC PATCH v4 28/39] KVM: arm64: nVHE: Save PMSCR_EL1 to the host context Alexandru Elisei
2021-08-25 16:18 ` [RFC PATCH v4 29/39] KVM: arm64: Rename DEBUG_STATE_SAVE_SPE -> DEBUG_SAVE_SPE_BUFFER flags Alexandru Elisei
2021-08-25 16:18 ` [RFC PATCH v4 30/39] KVM: arm64: nVHE: Context switch SPE state if VCPU has SPE Alexandru Elisei
2021-08-25 16:18 ` [RFC PATCH v4 31/39] KVM: arm64: VHE: " Alexandru Elisei
2021-08-25 16:18 ` [RFC PATCH v4 32/39] KVM: arm64: Save/restore PMSNEVFR_EL1 on VCPU put/load Alexandru Elisei
2021-08-25 16:18 ` [RFC PATCH v4 33/39] KVM: arm64: Allow guest to use physical timestamps if perfmon_capable() Alexandru Elisei
2021-08-25 16:18 ` [RFC PATCH v4 34/39] KVM: arm64: Emulate SPE buffer management interrupt Alexandru Elisei
2021-08-25 16:18 ` [RFC PATCH v4 35/39] KVM: arm64: Add an userspace API to stop a VCPU profiling Alexandru Elisei
2021-08-25 16:18 ` [RFC PATCH v4 36/39] KVM: arm64: Implement " Alexandru Elisei
2021-08-25 16:18 ` [RFC PATCH v4 37/39] KVM: arm64: Add PMSIDR_EL1 to the SPE register context Alexandru Elisei
2021-08-25 16:18 ` [RFC PATCH v4 38/39] KVM: arm64: Make CONFIG_KVM_ARM_SPE depend on !CONFIG_NUMA_BALANCING Alexandru Elisei
2021-08-25 16:18 ` [RFC PATCH v4 39/39] KVM: arm64: Allow userspace to enable SPE for guests Alexandru Elisei
2021-09-22 10:11 ` [RFC PATCH v4 00/39] KVM: arm64: Add Statistical Profiling Extension (SPE) support Suzuki K Poulose
2021-09-23 15:12   ` Alexandru Elisei [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=62a2f16c-3aa0-3d7e-4b35-3a5bad701586@arm.com \
    --to=alexandru.elisei@arm.com \
    --cc=james.morse@arm.com \
    --cc=kvmarm@lists.cs.columbia.edu \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=maz@kernel.org \
    --cc=suzuki.poulose@arm.com \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).