linux-kernel.vger.kernel.org archive mirror
From: "Jürgen Groß" <jgross@suse.com>
To: Ankur Arora <ankur.a.arora@oracle.com>,
	linux-kernel@vger.kernel.org, x86@kernel.org
Cc: peterz@infradead.org, hpa@zytor.com, jpoimboe@redhat.com,
	namit@vmware.com, mhiramat@kernel.org, bp@alien8.de,
	vkuznets@redhat.com, pbonzini@redhat.com,
	boris.ostrovsky@oracle.com, mihai.carabas@oracle.com,
	kvm@vger.kernel.org, xen-devel@lists.xenproject.org,
	virtualization@lists.linux-foundation.org
Subject: Re: [RFC PATCH 00/26] Runtime paravirt patching
Date: Wed, 8 Apr 2020 14:28:06 +0200
Message-ID: <d7f8bff3-526a-6a84-2e81-677cfbac0111@suse.com>
In-Reply-To: <20200408050323.4237-1-ankur.a.arora@oracle.com>

On 08.04.20 07:02, Ankur Arora wrote:
> A KVM host (or another hypervisor) might advertise paravirtualized
> features and optimization hints (e.g. KVM_HINTS_REALTIME) which might
> become stale over the lifetime of the guest. For instance, the

Then this hint is wrong if it can't be guaranteed.

> host might go from being undersubscribed to being oversubscribed
> (or the other way round) and it would make sense for the guest
> to switch pv-ops based on that.

I think using pvops for such a feature change is just wrong.

What comes next? Using pvops for being able to migrate a guest from an
Intel to an AMD machine?

...

> There are four main sets of patches in this series:
> 
>   1. PV-ops management (patches 1-10, 20): mostly infrastructure and
>   refactoring pieces to make paravirt patching usable at runtime. For the
>   most part scoped under CONFIG_PARAVIRT_RUNTIME.
> 
>   Patches 1-7, to persist part of parainstructions in memory:
>    "x86/paravirt: Specify subsection in PVOP macros"
>    "x86/paravirt: Allow paravirt patching post-init"
>    "x86/paravirt: PVRTOP macros for PARAVIRT_RUNTIME"
>    "x86/alternatives: Refactor alternatives_smp_module*"
>    "x86/alternatives: Rename alternatives_smp*, smp_alt_module"
>    "x86/alternatives: Remove stale symbols"
>    "x86/paravirt: Persist .parainstructions.runtime"
> 
>   Patches 8-10, develop the interfaces to safely switch pv-ops:
>    "x86/paravirt: Stash native pv-ops"
>    "x86/paravirt: Add runtime_patch()"
>    "x86/paravirt: Add primitives to stage pv-ops"
> 
>   Patch 20 enables switching of pv_lock_ops:
>    "x86/paravirt: Enable pv-spinlocks in runtime_patch()"
> 
>   2. Non-emulated text poking (patches 11-19)
> 
>   Patches 11-13 are mostly refactoring to split __text_poke() into map,
>   unmap and poke/memcpy phases with the poke portion being re-entrant
>    "x86/alternatives: Remove return value of text_poke*()"
>    "x86/alternatives: Use __get_unlocked_pte() in text_poke()"
>    "x86/alternatives: Split __text_poke()"
> 
>   Patches 15, 17 add the actual poking state-machine:
>    "x86/alternatives: Non-emulated text poking"
>    "x86/alternatives: Add patching logic in text_poke_site()"
> 
>   with patches 14 and 18 containing the pieces for BP handling:
>    "x86/alternatives: Handle native insns in text_poke_loc*()"
>    "x86/alternatives: Handle BP in non-emulated text poking"
> 
>   and patch 19 provides the ability to use the state-machine above in an
>   NMI context (fixes some potential deadlocks when handling inter-
>   dependent operations and multiple NMIs):
>    "x86/alternatives: NMI safe runtime patching".
> 
>   Patch 16 provides the interface (paravirt_runtime_patch()) to use the
>   poking mechanism developed above and patch 21 adds a selftest:
>    "x86/alternatives: Add paravirt patching at runtime"
>    "x86/alternatives: Paravirt runtime selftest"
> 
>   3. KVM guest changes to be able to use this (patches 22-23,25-26):
>    "kvm/paravirt: Encapsulate KVM pv switching logic"
>    "x86/kvm: Add worker to trigger runtime patching"
>    "x86/kvm: Guest support for dynamic hints"
>    "x86/kvm: Add hint change notifier for KVM_HINT_REALTIME".
> 
>   4. KVM host changes to notify the guest of a change (patch 24):
>    "x86/kvm: Support dynamic CPUID hints"
> 
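For orientation, the conventional breakpoint-first write order used when
patching live x86 text can be modeled in a few lines. This is a userspace
sketch on a byte buffer, assuming the usual three-step order (INT3 on the
first byte, then the tail, then the first byte); it is not the series'
"non-emulated" state-machine, and how a CPU that hits the breakpoint is
handled is not modeled. sync_cores() and poke_with_breakpoint() are
hypothetical names.

```python
INT3 = 0xCC  # x86 breakpoint opcode


def sync_cores(log, step):
    """Hypothetical stand-in for serialising all CPUs; it only records the step."""
    log.append(step)


def poke_with_breakpoint(text, offset, new_insn):
    """Rewrite len(new_insn) bytes at text[offset] in breakpoint-first order.

    Returns the list of synchronisation points that were passed.
    """
    log = []

    # 1. Guard the site: any CPU fetching it now traps on INT3.
    text[offset] = INT3
    sync_cores(log, "int3 installed")

    # 2. Write everything except the first byte.
    if len(new_insn) > 1:
        text[offset + 1:offset + len(new_insn)] = new_insn[1:]
        sync_cores(log, "tail written")

    # 3. Replace the INT3 with the first byte of the new instruction.
    text[offset] = new_insn[0]
    sync_cores(log, "first byte written")
    return log


if __name__ == "__main__":
    site = bytearray([0x0F, 0x1F, 0x44, 0x00, 0x00])   # 5-byte NOP
    call = bytes([0xE8, 0x00, 0x00, 0x00, 0x00])       # call with rel32 = 0
    for step in poke_with_breakpoint(site, 0, call):
        print(step)
    print(site.hex())
```

The point of the ordering is that the site never holds a mix of old and
new bytes that a CPU could execute without first hitting the breakpoint.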
> Testing:
> With paravirt patching, the code is mostly stable on Intel and AMD
> systems under kernbench and locktorture with paravirt toggling (with
> and without synthetic NMIs) in the background.
> 
> Queued spinlock performance for locktorture is also on expected lines:
>   [ 1533.221563] Writes:  Total: 1048759000  Max/Min: 0/0   Fail: 0
>   # toggle PV spinlocks
> 
>   [ 1594.713699] Writes:  Total: 1111660545  Max/Min: 0/0   Fail: 0
>   # PV spinlocks (in ~60 seconds) = 62,901,545
> 
>   # toggle native spinlocks
>   [ 1656.117175] Writes:  Total: 1113888840  Max/Min: 0/0   Fail: 0
>    # native spinlocks (in ~60 seconds) = 2,228,295
> 
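The per-interval figures quoted above are differences between consecutive
cumulative "Writes: Total" values. A minimal sketch of that arithmetic,
with the sample labels being illustrative and the totals copied from the
log lines above:

```python
# Cumulative "Writes: Total" samples copied from the locktorture log above.
SAMPLES = [
    ("start", 1048759000),
    ("after ~60s with PV spinlocks", 1111660545),
    ("after ~60s with native spinlocks", 1113888840),
]


def interval_writes(samples):
    """Return (label, delta) pairs between consecutive cumulative totals."""
    return [
        (label, total - prev_total)
        for (_, prev_total), (label, total) in zip(samples, samples[1:])
    ]


if __name__ == "__main__":
    for label, delta in interval_writes(SAMPLES):
        print(f"{label}: {delta:,} writes")
```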
> Testing of alternatives is more limited: it has been used to rewrite
> mostly harmless X86_FEATUREs with load in the background.
> 
> Patches also at:
> 
> ssh://git@github.com/terminus/linux.git alternatives-rfc-upstream-v1
> 
> Please review.
> 
> Thanks
> Ankur
> 
> [1] The precise change in memory footprint depends on config options
> but the following example inlines queued_spin_unlock() (which forms
> the bulk of the added state). The added footprint is the size of the
> .parainstructions.runtime section:
> 
>   $ objdump -h vmlinux|grep .parainstructions
>   Idx Name                       Size      VMA               LMA               File-off  Algn
>    27 .parainstructions          0001013c  ffffffff82895000  0000000002895000  01c95000  2**3
>    28 .parainstructions.runtime  0000cd2c  ffffffff828a5140  00000000028a5140  01ca5140  2**3
> 
>    $ size vmlinux
>    text       data       bss        dec      hex       filename
>    13726196   12302814   14094336   40123346 2643bd2   vmlinux
> 
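The Size column in the objdump output above is hexadecimal. A small
sketch converting the two quoted section sizes to bytes (the dictionary
and function names are illustrative):

```python
# Section sizes copied from the "objdump -h" output above (hex strings).
SECTION_SIZES_HEX = {
    ".parainstructions": "0001013c",
    ".parainstructions.runtime": "0000cd2c",
}


def section_bytes(sizes_hex):
    """Convert objdump's hexadecimal Size column to integer byte counts."""
    return {name: int(size, 16) for name, size in sizes_hex.items()}


if __name__ == "__main__":
    for name, size in section_bytes(SECTION_SIZES_HEX).items():
        print(f"{name}: {size} bytes ({size / 1024:.1f} KiB)")
```

For this config the added .parainstructions.runtime section is 52524
bytes.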
> Ankur Arora (26):
>    x86/paravirt: Specify subsection in PVOP macros
>    x86/paravirt: Allow paravirt patching post-init
>    x86/paravirt: PVRTOP macros for PARAVIRT_RUNTIME
>    x86/alternatives: Refactor alternatives_smp_module*
>    x86/alternatives: Rename alternatives_smp*, smp_alt_module
>    x86/alternatives: Remove stale symbols
>    x86/paravirt: Persist .parainstructions.runtime
>    x86/paravirt: Stash native pv-ops
>    x86/paravirt: Add runtime_patch()
>    x86/paravirt: Add primitives to stage pv-ops
>    x86/alternatives: Remove return value of text_poke*()
>    x86/alternatives: Use __get_unlocked_pte() in text_poke()
>    x86/alternatives: Split __text_poke()
>    x86/alternatives: Handle native insns in text_poke_loc*()
>    x86/alternatives: Non-emulated text poking
>    x86/alternatives: Add paravirt patching at runtime
>    x86/alternatives: Add patching logic in text_poke_site()
>    x86/alternatives: Handle BP in non-emulated text poking
>    x86/alternatives: NMI safe runtime patching
>    x86/paravirt: Enable pv-spinlocks in runtime_patch()
>    x86/alternatives: Paravirt runtime selftest
>    kvm/paravirt: Encapsulate KVM pv switching logic
>    x86/kvm: Add worker to trigger runtime patching
>    x86/kvm: Support dynamic CPUID hints
>    x86/kvm: Guest support for dynamic hints
>    x86/kvm: Add hint change notifier for KVM_HINT_REALTIME
> 
>   Documentation/virt/kvm/api.rst        |  17 +
>   Documentation/virt/kvm/cpuid.rst      |   9 +-
>   arch/x86/Kconfig                      |  14 +
>   arch/x86/Kconfig.debug                |  13 +
>   arch/x86/entry/entry_64.S             |   5 +
>   arch/x86/include/asm/alternative.h    |  20 +-
>   arch/x86/include/asm/kvm_host.h       |   6 +
>   arch/x86/include/asm/kvm_para.h       |  17 +
>   arch/x86/include/asm/paravirt.h       |  10 +-
>   arch/x86/include/asm/paravirt_types.h | 230 ++++--
>   arch/x86/include/asm/text-patching.h  |  18 +-
>   arch/x86/include/uapi/asm/kvm_para.h  |   2 +
>   arch/x86/kernel/Makefile              |   1 +
>   arch/x86/kernel/alternative.c         | 987 +++++++++++++++++++++++---
>   arch/x86/kernel/kvm.c                 | 191 ++++-
>   arch/x86/kernel/module.c              |  42 +-
>   arch/x86/kernel/paravirt.c            |  16 +-
>   arch/x86/kernel/paravirt_patch.c      |  61 ++
>   arch/x86/kernel/pv_selftest.c         | 264 +++++++
>   arch/x86/kernel/pv_selftest.h         |  15 +
>   arch/x86/kernel/setup.c               |   2 +
>   arch/x86/kernel/vmlinux.lds.S         |  16 +
>   arch/x86/kvm/cpuid.c                  |   3 +-
>   arch/x86/kvm/x86.c                    |  39 +
>   include/asm-generic/kvm_para.h        |  12 +
>   include/asm-generic/vmlinux.lds.h     |   8 +
>   include/linux/kvm_para.h              |   5 +
>   include/linux/mm.h                    |  16 +-
>   include/linux/preempt.h               |  17 +
>   include/uapi/linux/kvm.h              |   4 +
>   kernel/locking/lock_events.c          |   2 +-
>   mm/memory.c                           |   9 +-
>   32 files changed, 1850 insertions(+), 221 deletions(-)
>   create mode 100644 arch/x86/kernel/pv_selftest.c
>   create mode 100644 arch/x86/kernel/pv_selftest.h
> 

Quite a lot of code churn and hacks for a problem which should not
occur on a well-administered machine.

Especially the NMI dependencies make me not want to Ack this series.


Juergen
