From: Oliver Upton <oupton@google.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: kvm list <kvm@vger.kernel.org>,
	kvmarm@lists.cs.columbia.edu,
	Maxim Levitsky <mlevitsk@redhat.com>,
	Sean Christopherson <seanjc@google.com>,
	Marc Zyngier <maz@kernel.org>, Peter Shier <pshier@google.com>,
	Jim Mattson <jmattson@google.com>,
	David Matlack <dmatlack@google.com>,
	Ricardo Koller <ricarkol@google.com>,
	Jing Zhang <jingzhangos@google.com>,
	Raghavendra Rao Anata <rananta@google.com>
Subject: Re: [PATCH 00/10] KVM: Add idempotent controls for migrating system counter state
Date: Wed, 9 Jun 2021 10:11:32 -0500
Message-ID: <CAOQ_QsgPHAUuzeLy5sX=EhE8tKs7yEF3rxM47YeM_Pk3DUXMcg@mail.gmail.com>
In-Reply-To: <63db3823-b8a3-578d-4baa-146104bb977f@redhat.com>

On Wed, Jun 9, 2021 at 8:06 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 08/06/21 23:47, Oliver Upton wrote:
> > KVM's current means of saving/restoring system counters is plagued with
> > temporal issues. At least on ARM64 and x86, we migrate the guest's
> > system counter by-value through the respective guest system register
> > values (cntvct_el0, ia32_tsc). Restoring system counters by-value is
> > brittle as the state is not idempotent: the host system counter is still
> > advancing between the attempted save and restore. Furthermore, VMMs
> > may wish to transparently live migrate guest VMs, meaning that they
> > include the elapsed time due to live migration blackout in the guest
> > system counter view. The VMM thread could be preempted for any number of
> > reasons (scheduler, L0 hypervisor under nested) between the time that
> > it calculates the desired guest counter value and when KVM actually sets
> > this counter state.
> >
> > Despite the value-based interface that we present to userspace, KVM
> > actually has idempotent guest controls by way of system counter offsets.
> > We can avoid all of the issues associated with a value-based interface
> > by abstracting these offset controls in new ioctls. This series
> > introduces KVM_{GET,SET}_SYSTEM_COUNTER_STATE ioctls, meant to provide
> > userspace with idempotent controls of the guest system counter.
>
> Hi Oliver,
>
> I wonder how this compares to the idea of initializing the TSC via a
> synchronized (nanoseconds, TSC) pair.
> (https://lore.kernel.org/r/20201130133559.233242-2-mlevitsk@redhat.com),
> and whether it makes sense to apply that idea to ARM as well.  If so, it
> certainly is a good idea to use the same capability and ioctl, even
> though the details of the struct would be architecture-dependent.

Hey Paolo,

Yeah, great question. I had actually alluded to this on [02/10] when
talking to Marc.

Really, the issue we want to avoid is sampling the host counter twice,
which, at least with the existing means of counter migration, is
unavoidable because the VMM must account for elapsed time. Maxim's
patches appear to address that very issue as well.
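
The double-sampling problem can be illustrated with a toy model. This is
only a sketch and not code from the series: the toy_* names are made up,
and it assumes the guest view is simply the host counter plus an offset
(modulo 2^64). Restoring by value derives the offset from a second host
sample, so any delay before the restore shows up as lost guest time;
restoring by offset involves no sample at all.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model, not KVM code: guest view = host counter + offset (mod 2^64). */
struct toy_vcpu {
	uint64_t offset;
};

static uint64_t toy_guest_counter(const struct toy_vcpu *v, uint64_t host_now)
{
	return host_now + v->offset;
}

/* By-value restore: the offset is derived from a second host sample. */
static void toy_restore_by_value(struct toy_vcpu *v, uint64_t value,
				 uint64_t host_now)
{
	v->offset = value - host_now;
}

/* By-offset restore: no host sample is involved, so timing is irrelevant. */
static void toy_restore_by_offset(struct toy_vcpu *v, uint64_t offset)
{
	v->offset = offset;
}

/* One save/restore on the same host with 300 ticks passing in between. */
static void toy_demo(void)
{
	struct toy_vcpu v = { .offset = (uint64_t)-1000 };
	uint64_t saved_value = toy_guest_counter(&v, 5000);	/* 4000 */
	uint64_t saved_offset = v.offset;

	toy_restore_by_value(&v, saved_value, 5300);
	assert(toy_guest_counter(&v, 5300) == 4000);	/* 300 ticks lost */

	toy_restore_by_offset(&v, saved_offset);
	assert(toy_guest_counter(&v, 5300) == 4300);	/* nothing lost */
}
```

Calling toy_restore_by_offset() any number of times, at any point in
time, leaves the guest view unchanged, which is the idempotence the
cover letter refers to.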

Perhaps this will clarify the motivation for my approach: what if the
kernel weren't the authoritative source for wall time in a system?
Furthermore, VMMs may wish to define their own heuristics for counter
migration (e.g. we only allow the counter to 'jump' by X seconds
during migration blackout). If a VMM tried to assert its whims on the
TSC state before handing it down to the kernel, we would inadvertently
be sampling the host counter twice again. And anything can happen
between the time we assert that elapsed time is within SLO and the
time KVM computes the TSC offset (scheduling, L0 hypervisor
preemption).
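
As a rough sketch of such a VMM-side heuristic (purely hypothetical: the
toy_* names, the sample struct and the clamping policy are assumptions
for illustration, not the interface proposed in the series), the VMM
could take one consistent (host counter, wall time) sample on the
destination, cap the elapsed time, and turn the result into an offset.
Once the offset exists, it no longer matters how long it takes to reach
the kernel.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NSEC_PER_SEC 1000000000ULL

/* Hypothetical: one consistent (host counter, wall time) sample. */
struct toy_sample {
	uint64_t host_ticks;
	uint64_t wall_ns;
};

/* Convert nanoseconds to counter ticks without overflowing the product. */
static uint64_t toy_ns_to_ticks(uint64_t ns, uint64_t freq_hz)
{
	/* Keeps (ns % NSEC_PER_SEC) * freq_hz within 64 bits. */
	assert(freq_hz <= UINT64_MAX / TOY_NSEC_PER_SEC);
	return (ns / TOY_NSEC_PER_SEC) * freq_hz +
	       (ns % TOY_NSEC_PER_SEC) * freq_hz / TOY_NSEC_PER_SEC;
}

/*
 * Compute the destination offset from the source offset and one sample
 * per host, letting the guest counter jump by at most max_jump_ns.
 * All counter arithmetic is modulo 2^64.
 */
static uint64_t toy_dest_offset(uint64_t src_offset, struct toy_sample src,
				struct toy_sample dst, uint64_t freq_hz,
				uint64_t max_jump_ns)
{
	uint64_t elapsed_ns = dst.wall_ns > src.wall_ns ?
			      dst.wall_ns - src.wall_ns : 0;
	uint64_t guest_at_pause = src.host_ticks + src_offset;
	uint64_t guest_now;

	if (elapsed_ns > max_jump_ns)
		elapsed_ns = max_jump_ns;	/* VMM policy, not kernel policy */

	guest_now = guest_at_pause + toy_ns_to_ticks(elapsed_ns, freq_hz);
	return guest_now - dst.host_ticks;
}
```

The policy decision and the offset are both derived from the same
sample, so a preemption after toy_dest_offset() returns cannot
invalidate the decision.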

So, Maxim's changes would address my concerns in the general case, but
maybe not as much in edge cases where an operator may make decisions
about how much time can elapse while the guest hasn't had CPU time.

--
Thanks,
Oliver

> In your patches there isn't much architecture dependency in struct
> kvm_system_counter_state.  However,  Maxim's also added an
> MSR_IA32_TSC_ADJUST value to the struct, thus ensuring that the host
> could write not just an arbitrary TSC value, but also tie it to an
> arbitrary MSR_IA32_TSC_ADJUST value.  Specifying both in the same ioctl
> simplifies the userspace API.
>
> Paolo
>
> > Patch 1 defines the ioctls, and was separated from the two provided
> > implementations for the sake of review. If it is more intuitive, this
> > patch can be squashed into the implementation commit.
> >
> > Patch 2 realizes initial support for ARM64, migrating only the state
> > associated with the guest's virtual counter-timer. Patch 3 introduces a
> > KVM selftest to assert that userspace manipulation via the
> > aforementioned ioctls produces the expected system counter values within
> > the guest.
> >
> > Patch 4 extends upon the ARM64 implementation by adding support for
> > physical counter-timer offsetting. This is currently backed by a
> > trap-and-emulate implementation, but can also be virtualized in hardware
> > that fully implements ARMv8.6-ECV. ECV support has been elided from this
> > series out of convenience for the author :) Patch 5 adds some test cases
> > to the newly-minted kvm selftest to validate expectations of physical
> > counter-timer emulation.
> >
> > Patch 6 introduces yet another KVM selftest for aarch64, intended to
> > measure the effects of physical counter-timer emulation. Data for this
> > test can be found below; in short, there is some overhead for the sake
> > of correctness, but it isn't too bad.
> >
> > Patches 7-8 add support for the ioctls to x86 by shoehorning the
> > controls into the pre-existing synchronization heuristics. Patch 7
> > provides necessary helper methods for the implementation to play nice
> > with those heuristics, and patch 8 actually implements the ioctls.
> >
> > Patch 9 adds x86 test cases to the system counter KVM selftest. Lastly,
> > patch 10 documents the ioctls for both x86 and arm64.
> >
> > All patches apply cleanly to kvm/next at the following commit:
> >
> > a4345a7cecfb ("Merge tag 'kvmarm-fixes-5.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD")
> >
> > Physical counter benchmark
> > --------------------------
> >
> > The following data was collected by running 10000 iterations of the
> > benchmark test from Patch 6 on an Ampere Mt. Jade reference server, a 2S
> > machine with 2 80-core Ampere Altra SoCs. Measurements were collected
> > for both VHE and nVHE operation using the `kvm-arm.mode=` command-line
> > parameter.
> >
> > nVHE
> > ----
> >
> > +--------------------+--------+---------+
> > |       Metric       | Native | Trapped |
> > +--------------------+--------+---------+
> > | Average            | 54ns   | 148ns   |
> > | Standard Deviation | 124ns  | 122ns   |
> > | 95th Percentile    | 258ns  | 348ns   |
> > +--------------------+--------+---------+
> >
> > VHE
> > ---
> >
> > +--------------------+--------+---------+
> > |       Metric       | Native | Trapped |
> > +--------------------+--------+---------+
> > | Average            | 53ns   | 152ns   |
> > | Standard Deviation | 92ns   | 94ns    |
> > | 95th Percentile    | 204ns  | 307ns   |
> > +--------------------+--------+---------+
> >
> > Oliver Upton (10):
> >    KVM: Introduce KVM_{GET,SET}_SYSTEM_COUNTER_STATE ioctls
> >    KVM: arm64: Implement initial support for KVM_CAP_SYSTEM_COUNTER_STATE
> >    selftests: KVM: Introduce system_counter_state_test
> >    KVM: arm64: Add userspace control of the guest's physical counter
> >    selftests: KVM: Add test cases for physical counter offsetting
> >    selftests: KVM: Add counter emulation benchmark
> >    KVM: x86: Refactor tsc synchronization code
> >    KVM: x86: Implement KVM_CAP_SYSTEM_COUNTER_STATE
> >    selftests: KVM: Add support for x86 to system_counter_state_test
> >    Documentation: KVM: Document KVM_{GET,SET}_SYSTEM_COUNTER_STATE ioctls
> >
> >   Documentation/virt/kvm/api.rst                |  98 +++++++
> >   Documentation/virt/kvm/locking.rst            |  11 +
> >   arch/arm64/include/asm/kvm_host.h             |   6 +
> >   arch/arm64/include/asm/sysreg.h               |   1 +
> >   arch/arm64/include/uapi/asm/kvm.h             |  17 ++
> >   arch/arm64/kvm/arch_timer.c                   |  84 +++++-
> >   arch/arm64/kvm/arm.c                          |  25 ++
> >   arch/arm64/kvm/hyp/include/hyp/switch.h       |  31 +++
> >   arch/arm64/kvm/hyp/nvhe/timer-sr.c            |  16 +-
> >   arch/x86/include/asm/kvm_host.h               |   1 +
> >   arch/x86/include/uapi/asm/kvm.h               |   8 +
> >   arch/x86/kvm/x86.c                            | 176 +++++++++---
> >   include/uapi/linux/kvm.h                      |   5 +
> >   tools/testing/selftests/kvm/.gitignore        |   2 +
> >   tools/testing/selftests/kvm/Makefile          |   3 +
> >   .../kvm/aarch64/counter_emulation_benchmark.c | 209 ++++++++++++++
> >   .../selftests/kvm/include/aarch64/processor.h |  24 ++
> >   .../selftests/kvm/system_counter_state_test.c | 256 ++++++++++++++++++
> >   18 files changed, 926 insertions(+), 47 deletions(-)
> >   create mode 100644 tools/testing/selftests/kvm/aarch64/counter_emulation_benchmark.c
> >   create mode 100644 tools/testing/selftests/kvm/system_counter_state_test.c
> >
>

