All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] docs/virt/kvm: Document running nested guests
@ 2020-02-07 15:30 Kashyap Chamarthy
  2020-02-07 15:46 ` Cornelia Huck
  2020-02-07 16:01 ` Dr. David Alan Gilbert
  0 siblings, 2 replies; 5+ messages in thread
From: Kashyap Chamarthy @ 2020-02-07 15:30 UTC (permalink / raw)
  To: kvm; +Cc: pbonzini, dgilbert, vkuznets, Kashyap Chamarthy

This is a rewrite of the Wiki page:

    https://www.linux-kvm.org/page/Nested_Guests

Signed-off-by: Kashyap Chamarthy <kchamart@redhat.com>
---
Question: is the live migration of L1-with-L2-running-in-it fixed for
*all* architectures, including s390x?
---
 .../virt/kvm/running-nested-guests.rst        | 171 ++++++++++++++++++
 1 file changed, 171 insertions(+)
 create mode 100644 Documentation/virt/kvm/running-nested-guests.rst

diff --git a/Documentation/virt/kvm/running-nested-guests.rst b/Documentation/virt/kvm/running-nested-guests.rst
new file mode 100644
index 0000000000000000000000000000000000000000..e94ab665c71a36b7718aebae902af16b792f6dd3
--- /dev/null
+++ b/Documentation/virt/kvm/running-nested-guests.rst
@@ -0,0 +1,171 @@
+Running nested guests with KVM
+==============================
+
+A nested guest is a KVM guest that in turn runs on a KVM guest::
+
+              .----------------.  .----------------.
+              |                |  |                |
+              |      L2        |  |      L2        |
+              | (Nested Guest) |  | (Nested Guest) |
+              |                |  |                |
+              |----------------'--'----------------|
+              |                                    |
+              |       L1 (Guest Hypervisor)        |
+              |          KVM (/dev/kvm)            |
+              |                                    |
+      .------------------------------------------------------.
+      |                 L0 (Host Hypervisor)                 |
+      |                    KVM (/dev/kvm)                    |
+      |------------------------------------------------------|
+      |                  x86 Hardware (VMX)                  |
+      '------------------------------------------------------'
+
+
+Terminology:
+
+  - L0 – level-0; the bare metal host, running KVM
+
+  - L1 – level-1 guest; a VM running on L0; also called the "guest
+    hypervisor", as it itself is capable of running KVM.
+
+  - L2 – level-2 guest; a VM running on L1, this is the "nested guest"
+
+
+Use Cases
+---------
+
+An additional layer of virtualization sometimes can .  You
+might have access to a large virtual machine in a cloud environment that
+you want to compartmentalize into multiple workloads.  You might be
+running a lab environment in a training session.
+
+There are several scenarios where nested KVM can be Useful:
+
+  - As a developer, you want to test your software on different OSes.
+    Instead of renting multiple VMs from a Cloud Provider, using nested
+    KVM lets you rent a large enough "guest hypervisor" (level-1 guest).
+    This in turn allows you to create multiple nested guests (level-2
+    guests), running different OSes, on which you can develop and test
+    your software.
+
+  - Live migration of "guest hypervisors" and their nested guests, for
+    load balancing, disaster recovery, etc.
+
+  - Using VMs for isolation (as in Kata Containers, and before it Clear
+    Containers https://lwn.net/Articles/644675/) if you're running on a
+    cloud provider that is already using virtual machines
+
+
+Procedure to enable nesting on the bare metal host
+--------------------------------------------------
+
+The KVM kernel modules do not enable nesting by default (though your
+distribution may override this default).  To enable nesting, set the
+``nested`` module parameter to ``Y`` or ``1``. You may set this
+parameter persistently in a file in ``/etc/modprobe.d`` in the L0 host:
+
+1. On the bare metal host (L0), list the kernel modules, and ensure that
+   the KVM modules::
+
+    $ lsmod | grep -i kvm
+    kvm_intel             133627  0
+    kvm                   435079  1 kvm_intel
+
+2. Show information for ``kvm_intel`` module::
+
+    $ modinfo kvm_intel | grep -i nested
+    parm:           nested:boolkvm                   435079  1 kvm_intel
+
+3. To make nested KVM configuration persistent across reboots, place the
+   below entry in a config attribute::
+
+    $ cat /etc/modprobe.d/kvm_intel.conf
+    options kvm-intel nested=y
+
+4. Unload and re-load the KVM Intel module::
+
+    $ sudo rmmod kvm-intel
+    $ sudo modprobe kvm-intel
+
+5. Verify if the ``nested`` parameter for KVM is enabled::
+
+    $ cat /sys/module/kvm_intel/parameters/nested
+    Y
+
+For AMD hosts, the process is the same as above, except that the module
+name is ``kvm-amd``.
+
+Once your bare metal host (L0) is configured for nesting, you should be
+able to start an L1 guest with ``qemu-kvm -cpu host`` (which passes
+through the host CPU's capabilities as-is to the guest); or for better
+live migration compatibility, use a named CPU model supported by QEMU,
+e.g.: ``-cpu Haswell-noTSX-IBRS,vmx=on`` and the guest will subsequently
+be capable of running an L2 guest with accelerated KVM.
+
+Additional nested-related kernel parameters
+-------------------------------------------
+
+If your hardware is sufficiently advanced (Intel Haswell processor or
+above which has newer hardware virt extensions), you might want to
+enable additional features: "Shadow VMCS (Virtual Machine Control
+Structure)", APIC Virtualization on your bare metal host (L0).
+Parameters for Intel hosts::
+
+    $ cat /sys/module/kvm_intel/parameters/enable_shadow_vmcs
+    Y
+
+    $ cat /sys/module/kvm_intel/parameters/enable_apicv
+    N
+
+    $ cat /sys/module/kvm_intel/parameters/ept
+    Y
+
+Again, to persist the above values across reboot, append them to
+``/etc/modprobe.d/kvm_intel.conf``::
+
+    options kvm-intel nested=y
+    options kvm-intel enable_shadow_vmcs=y
+    options kvm-intel enable_apivc=y
+    options kvm-intel ept=y
+
+
+Live migration with nested KVM
+------------------------------
+
+The below live migration scenarios should work as of Linux kernel 5.3
+and QEMU 4.2.0.  In all the below cases, L1 exposes ``/dev/kvm`` in
+it, i.e. the L2 guest is a "KVM-accelerated guest", not a "plain
+emulated guest" (as done by QEMU's TCG).
+
+- Migrating a nested guest (L2) to another L1 guest on the *same* bare
+  metal host.
+
+- Migrating a nested guest (L2) to another L1 guest on a *different*
+  bare metal host.
+
+- Migrating an L1 guest, with an *offline* nested guest in it, to
+  another bare metal host.
+
+- Migrating an L1 guest, with a  *live* nested guest in it, to another
+  bare metal host.
+
+
+Limitations on Linux kernel versions older than 5.3
+---------------------------------------------------
+
+On Linux kernel versions older than 5.3, once an L1 guest has started an
+L2 guest, the L1 guest would no longer capable of being migrated, saved,
+or loaded (refer to QEMU documentation on "save"/"load") until the L2
+guest shuts down.  [FIXME: Is this limitation fixed for *all*
+architectures, including s390x?]
+
+Attempting to migrate or save & load an L1 guest while an L2 guest is
+running will result in undefined behavior.  You might see a ``kernel
+BUG!`` entry in ``dmesg``, a kernel 'oops', or an outright kernel panic.
+Such a migrated or loaded L1 guest can no longer be considered stable or
+secure, and must be restarted.
+
+Migrating an L1 guest merely configured to support nesting, while not
+actually running L2 guests, is expected to function normally.
+Live-migrating an L2 guest from one L1 guest to another is also expected
+to succeed.
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] docs/virt/kvm: Document running nested guests
  2020-02-07 15:30 [PATCH] docs/virt/kvm: Document running nested guests Kashyap Chamarthy
@ 2020-02-07 15:46 ` Cornelia Huck
  2020-02-07 16:26   ` Kashyap Chamarthy
  2020-02-07 16:01 ` Dr. David Alan Gilbert
  1 sibling, 1 reply; 5+ messages in thread
From: Cornelia Huck @ 2020-02-07 15:46 UTC (permalink / raw)
  To: Kashyap Chamarthy, David Hildenbrand; +Cc: kvm, pbonzini, dgilbert, vkuznets

On Fri,  7 Feb 2020 16:30:02 +0100
Kashyap Chamarthy <kchamart@redhat.com> wrote:

> This is a rewrite of the Wiki page:
> 
>     https://www.linux-kvm.org/page/Nested_Guests

Thanks for doing that!

> 
> Signed-off-by: Kashyap Chamarthy <kchamart@redhat.com>
> ---
> Question: is the live migration of L1-with-L2-running-in-it fixed for
> *all* architectures, including s390x?
> ---
>  .../virt/kvm/running-nested-guests.rst        | 171 ++++++++++++++++++
>  1 file changed, 171 insertions(+)
>  create mode 100644 Documentation/virt/kvm/running-nested-guests.rst

FWIW, there's currently a series converting this subdirectory to rst
on-list.

> 
> diff --git a/Documentation/virt/kvm/running-nested-guests.rst b/Documentation/virt/kvm/running-nested-guests.rst
> new file mode 100644
> index 0000000000000000000000000000000000000000..e94ab665c71a36b7718aebae902af16b792f6dd3
> --- /dev/null
> +++ b/Documentation/virt/kvm/running-nested-guests.rst
> @@ -0,0 +1,171 @@
> +Running nested guests with KVM
> +==============================

I think the common style is to also have a "===..." line on top.

> +
> +A nested guest is a KVM guest that in turn runs on a KVM guest::
> +
> +              .----------------.  .----------------.
> +              |                |  |                |
> +              |      L2        |  |      L2        |
> +              | (Nested Guest) |  | (Nested Guest) |
> +              |                |  |                |
> +              |----------------'--'----------------|
> +              |                                    |
> +              |       L1 (Guest Hypervisor)        |
> +              |          KVM (/dev/kvm)            |
> +              |                                    |
> +      .------------------------------------------------------.
> +      |                 L0 (Host Hypervisor)                 |
> +      |                    KVM (/dev/kvm)                    |
> +      |------------------------------------------------------|
> +      |                  x86 Hardware (VMX)                  |

Just 'Hardware'? I don't think you want to make this x86-specific?

> +      '------------------------------------------------------'
> +
> +
> +Terminology:
> +
> +  - L0 – level-0; the bare metal host, running KVM
> +
> +  - L1 – level-1 guest; a VM running on L0; also called the "guest
> +    hypervisor", as it itself is capable of running KVM.
> +
> +  - L2 – level-2 guest; a VM running on L1, this is the "nested guest"
> +
> +
> +Use Cases
> +---------
> +
> +An additional layer of virtualization sometimes can .  You

Something seems to be missing here?

> +might have access to a large virtual machine in a cloud environment that
> +you want to compartmentalize into multiple workloads.  You might be
> +running a lab environment in a training session.
> +
> +There are several scenarios where nested KVM can be Useful:

s/Useful/useful/

> +
> +  - As a developer, you want to test your software on different OSes.
> +    Instead of renting multiple VMs from a Cloud Provider, using nested
> +    KVM lets you rent a large enough "guest hypervisor" (level-1 guest).
> +    This in turn allows you to create multiple nested guests (level-2
> +    guests), running different OSes, on which you can develop and test
> +    your software.
> +
> +  - Live migration of "guest hypervisors" and their nested guests, for
> +    load balancing, disaster recovery, etc.
> +
> +  - Using VMs for isolation (as in Kata Containers, and before it Clear
> +    Containers https://lwn.net/Articles/644675/) if you're running on a
> +    cloud provider that is already using virtual machines
> +
> +
> +Procedure to enable nesting on the bare metal host
> +--------------------------------------------------
> +
> +The KVM kernel modules do not enable nesting by default (though your
> +distribution may override this default).  To enable nesting, set the
> +``nested`` module parameter to ``Y`` or ``1``. You may set this
> +parameter persistently in a file in ``/etc/modprobe.d`` in the L0 host:
> +
> +1. On the bare metal host (L0), list the kernel modules, and ensure that
> +   the KVM modules::
> +
> +    $ lsmod | grep -i kvm
> +    kvm_intel             133627  0
> +    kvm                   435079  1 kvm_intel
> +
> +2. Show information for ``kvm_intel`` module::
> +
> +    $ modinfo kvm_intel | grep -i nested
> +    parm:           nested:boolkvm                   435079  1 kvm_intel
> +
> +3. To make nested KVM configuration persistent across reboots, place the
> +   below entry in a config attribute::
> +
> +    $ cat /etc/modprobe.d/kvm_intel.conf
> +    options kvm-intel nested=y
> +
> +4. Unload and re-load the KVM Intel module::
> +
> +    $ sudo rmmod kvm-intel
> +    $ sudo modprobe kvm-intel
> +
> +5. Verify if the ``nested`` parameter for KVM is enabled::
> +
> +    $ cat /sys/module/kvm_intel/parameters/nested
> +    Y
> +
> +For AMD hosts, the process is the same as above, except that the module
> +name is ``kvm-amd``.

This looks x86-specific. Don't know about others, but s390 has one
module, also a 'nested' parameter, which is mutually exclusive with a
'hpage' parameter.

> +
> +Once your bare metal host (L0) is configured for nesting, you should be
> +able to start an L1 guest with ``qemu-kvm -cpu host`` (which passes
> +through the host CPU's capabilities as-is to the guest); or for better
> +live migration compatibility, use a named CPU model supported by QEMU,
> +e.g.: ``-cpu Haswell-noTSX-IBRS,vmx=on`` and the guest will subsequently
> +be capable of running an L2 guest with accelerated KVM.

That's probably more something that should go into a section that gives
an example how to start a nested guest with QEMU? Cpu models also look
different between architectures.

> +
> +Additional nested-related kernel parameters
> +-------------------------------------------
> +
> +If your hardware is sufficiently advanced (Intel Haswell processor or
> +above which has newer hardware virt extensions), you might want to
> +enable additional features: "Shadow VMCS (Virtual Machine Control
> +Structure)", APIC Virtualization on your bare metal host (L0).
> +Parameters for Intel hosts::
> +
> +    $ cat /sys/module/kvm_intel/parameters/enable_shadow_vmcs
> +    Y
> +
> +    $ cat /sys/module/kvm_intel/parameters/enable_apicv
> +    N
> +
> +    $ cat /sys/module/kvm_intel/parameters/ept
> +    Y
> +
> +Again, to persist the above values across reboot, append them to
> +``/etc/modprobe.d/kvm_intel.conf``::
> +
> +    options kvm-intel nested=y
> +    options kvm-intel enable_shadow_vmcs=y
> +    options kvm-intel enable_apivc=y
> +    options kvm-intel ept=y

x86 specific -- maybe reorganize this document by starting with a
general setup section and then giving some architecture-specific
information?

> +
> +
> +Live migration with nested KVM
> +------------------------------
> +
> +The below live migration scenarios should work as of Linux kernel 5.3
> +and QEMU 4.2.0.  In all the below cases, L1 exposes ``/dev/kvm`` in
> +it, i.e. the L2 guest is a "KVM-accelerated guest", not a "plain
> +emulated guest" (as done by QEMU's TCG).
> +
> +- Migrating a nested guest (L2) to another L1 guest on the *same* bare
> +  metal host.
> +
> +- Migrating a nested guest (L2) to another L1 guest on a *different*
> +  bare metal host.
> +
> +- Migrating an L1 guest, with an *offline* nested guest in it, to
> +  another bare metal host.
> +
> +- Migrating an L1 guest, with a  *live* nested guest in it, to another
> +  bare metal host.
> +
> +
> +Limitations on Linux kernel versions older than 5.3
> +---------------------------------------------------
> +
> +On Linux kernel versions older than 5.3, once an L1 guest has started an
> +L2 guest, the L1 guest would no longer capable of being migrated, saved,
> +or loaded (refer to QEMU documentation on "save"/"load") until the L2
> +guest shuts down.  [FIXME: Is this limitation fixed for *all*
> +architectures, including s390x?]

I don't think we ever had that limitation on s390x, since the whole way
control blocks etc. are handled is different there. David (H), do you
remember?

> +
> +Attempting to migrate or save & load an L1 guest while an L2 guest is
> +running will result in undefined behavior.  You might see a ``kernel
> +BUG!`` entry in ``dmesg``, a kernel 'oops', or an outright kernel panic.
> +Such a migrated or loaded L1 guest can no longer be considered stable or
> +secure, and must be restarted.
> +
> +Migrating an L1 guest merely configured to support nesting, while not
> +actually running L2 guests, is expected to function normally.
> +Live-migrating an L2 guest from one L1 guest to another is also expected
> +to succeed.


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] docs/virt/kvm: Document running nested guests
  2020-02-07 15:30 [PATCH] docs/virt/kvm: Document running nested guests Kashyap Chamarthy
  2020-02-07 15:46 ` Cornelia Huck
@ 2020-02-07 16:01 ` Dr. David Alan Gilbert
  2020-02-07 16:40   ` Kashyap Chamarthy
  1 sibling, 1 reply; 5+ messages in thread
From: Dr. David Alan Gilbert @ 2020-02-07 16:01 UTC (permalink / raw)
  To: Kashyap Chamarthy; +Cc: kvm, pbonzini, vkuznets

* Kashyap Chamarthy (kchamart@redhat.com) wrote:
> This is a rewrite of the Wiki page:
> 
>     https://www.linux-kvm.org/page/Nested_Guests
> 
> Signed-off-by: Kashyap Chamarthy <kchamart@redhat.com>
> ---
> Question: is the live migration of L1-with-L2-running-in-it fixed for
> *all* architectures, including s390x?
> ---
>  .../virt/kvm/running-nested-guests.rst        | 171 ++++++++++++++++++
>  1 file changed, 171 insertions(+)
>  create mode 100644 Documentation/virt/kvm/running-nested-guests.rst
> 
> diff --git a/Documentation/virt/kvm/running-nested-guests.rst b/Documentation/virt/kvm/running-nested-guests.rst
> new file mode 100644
> index 0000000000000000000000000000000000000000..e94ab665c71a36b7718aebae902af16b792f6dd3
> --- /dev/null
> +++ b/Documentation/virt/kvm/running-nested-guests.rst
> @@ -0,0 +1,171 @@
> +Running nested guests with KVM
> +==============================
> +
> +A nested guest is a KVM guest that in turn runs on a KVM guest::

Note nesting maybe a little more general; e.g. L1 might be another
OS/hypervisor that wants to run it's own L2; and similarly
KVM might be the L1 under someone elses hypervisor.

I think this doc is mostly about the case of KVM being the L0
and wanting to run an L1 that's capable of running an L2.

> +              .----------------.  .----------------.
> +              |                |  |                |
> +              |      L2        |  |      L2        |
> +              | (Nested Guest) |  | (Nested Guest) |
> +              |                |  |                |
> +              |----------------'--'----------------|
> +              |                                    |
> +              |       L1 (Guest Hypervisor)        |
> +              |          KVM (/dev/kvm)            |
> +              |                                    |
> +      .------------------------------------------------------.
> +      |                 L0 (Host Hypervisor)                 |
> +      |                    KVM (/dev/kvm)                    |
> +      |------------------------------------------------------|
> +      |                  x86 Hardware (VMX)                  |
> +      '------------------------------------------------------'

This is now x86 specific but the doc is in a general directory;
I'm not sure what other architecture nesting rules are.

Woth having VMX/SVM at least.

> +
> +Terminology:
> +
> +  - L0 – level-0; the bare metal host, running KVM
> +
> +  - L1 – level-1 guest; a VM running on L0; also called the "guest
> +    hypervisor", as it itself is capable of running KVM.
> +
> +  - L2 – level-2 guest; a VM running on L1, this is the "nested guest"
> +
> +
> +Use Cases
> +---------
> +
> +An additional layer of virtualization sometimes can .  You
> +might have access to a large virtual machine in a cloud environment that
> +you want to compartmentalize into multiple workloads.  You might be
> +running a lab environment in a training session.

Lose this paragraph, and just use the list below?

> +There are several scenarios where nested KVM can be Useful:
> +
> +  - As a developer, you want to test your software on different OSes.
> +    Instead of renting multiple VMs from a Cloud Provider, using nested
> +    KVM lets you rent a large enough "guest hypervisor" (level-1 guest).
> +    This in turn allows you to create multiple nested guests (level-2
> +    guests), running different OSes, on which you can develop and test
> +    your software.
> +
> +  - Live migration of "guest hypervisors" and their nested guests, for
> +    load balancing, disaster recovery, etc.
> +
> +  - Using VMs for isolation (as in Kata Containers, and before it Clear
> +    Containers https://lwn.net/Articles/644675/) if you're running on a
> +    cloud provider that is already using virtual machines

Some others that might be worth listing;
   - VM image creation tools (e.g. virt-install etc) often run their own
     VM, and users expect these to work inside a VM.
   - Some other OS's use virtualization internally for other
     features/protection.

> +Procedure to enable nesting on the bare metal host
> +--------------------------------------------------
> +
> +The KVM kernel modules do not enable nesting by default (though your
> +distribution may override this default).

It's the other way;  see 1e58e5e for intel has made it default; AMD has
it set as default for longer.

 >To enable nesting, set the
> +``nested`` module parameter to ``Y`` or ``1``. You may set this
> +parameter persistently in a file in ``/etc/modprobe.d`` in the L0 host:


> +1. On the bare metal host (L0), list the kernel modules, and ensure that
> +   the KVM modules::
> +
> +    $ lsmod | grep -i kvm
> +    kvm_intel             133627  0
> +    kvm                   435079  1 kvm_intel
> +
> +2. Show information for ``kvm_intel`` module::
> +
> +    $ modinfo kvm_intel | grep -i nested
> +    parm:           nested:boolkvm                   435079  1 kvm_intel
> +
> +3. To make nested KVM configuration persistent across reboots, place the
> +   below entry in a config attribute::
> +
> +    $ cat /etc/modprobe.d/kvm_intel.conf
> +    options kvm-intel nested=y
> +
> +4. Unload and re-load the KVM Intel module::
> +
> +    $ sudo rmmod kvm-intel
> +    $ sudo modprobe kvm-intel
> +
> +5. Verify if the ``nested`` parameter for KVM is enabled::
> +
> +    $ cat /sys/module/kvm_intel/parameters/nested
> +    Y
> +
> +For AMD hosts, the process is the same as above, except that the module
> +name is ``kvm-amd``.
> +
> +Once your bare metal host (L0) is configured for nesting, you should be
> +able to start an L1 guest with ``qemu-kvm -cpu host`` (which passes
> +through the host CPU's capabilities as-is to the guest); or for better
> +live migration compatibility, use a named CPU model supported by QEMU,
> +e.g.: ``-cpu Haswell-noTSX-IBRS,vmx=on`` and the guest will subsequently
> +be capable of running an L2 guest with accelerated KVM.
> +
> +Additional nested-related kernel parameters
> +-------------------------------------------
> +
> +If your hardware is sufficiently advanced (Intel Haswell processor or
> +above which has newer hardware virt extensions), you might want to
> +enable additional features: "Shadow VMCS (Virtual Machine Control
> +Structure)", APIC Virtualization on your bare metal host (L0).
> +Parameters for Intel hosts::
> +
> +    $ cat /sys/module/kvm_intel/parameters/enable_shadow_vmcs
> +    Y
> +
> +    $ cat /sys/module/kvm_intel/parameters/enable_apicv
> +    N
> +
> +    $ cat /sys/module/kvm_intel/parameters/ept
> +    Y

Don't those happen automatically (mostly?)

> +Again, to persist the above values across reboot, append them to
> +``/etc/modprobe.d/kvm_intel.conf``::
> +
> +    options kvm-intel nested=y
> +    options kvm-intel enable_shadow_vmcs=y
> +    options kvm-intel enable_apivc=y
> +    options kvm-intel ept=y
> +
> +
> +Live migration with nested KVM
> +------------------------------
> +
> +The below live migration scenarios should work as of Linux kernel 5.3
> +and QEMU 4.2.0.  In all the below cases, L1 exposes ``/dev/kvm`` in
> +it, i.e. the L2 guest is a "KVM-accelerated guest", not a "plain
> +emulated guest" (as done by QEMU's TCG).
> +
> +- Migrating a nested guest (L2) to another L1 guest on the *same* bare
> +  metal host.
> +
> +- Migrating a nested guest (L2) to another L1 guest on a *different*
> +  bare metal host.
> +
> +- Migrating an L1 guest, with an *offline* nested guest in it, to
> +  another bare metal host.
> +
> +- Migrating an L1 guest, with a  *live* nested guest in it, to another
> +  bare metal host.
> +
> +
> +Limitations on Linux kernel versions older than 5.3
> +---------------------------------------------------
> +
> +On Linux kernel versions older than 5.3, once an L1 guest has started an
> +L2 guest, the L1 guest would no longer capable of being migrated, saved,
> +or loaded (refer to QEMU documentation on "save"/"load") until the L2
> +guest shuts down.  [FIXME: Is this limitation fixed for *all*
> +architectures, including s390x?]
> +
> +Attempting to migrate or save & load an L1 guest while an L2 guest is
> +running will result in undefined behavior.  You might see a ``kernel
> +BUG!`` entry in ``dmesg``, a kernel 'oops', or an outright kernel panic.
> +Such a migrated or loaded L1 guest can no longer be considered stable or
> +secure, and must be restarted.
> +
> +Migrating an L1 guest merely configured to support nesting, while not
> +actually running L2 guests, is expected to function normally.
> +Live-migrating an L2 guest from one L1 guest to another is also expected
> +to succeed.

Can you add an entry along the lines of 'reporting bugs with nesting'
that explains you should clearly state what the host CPU is,
and the exact OS and hypervisor config in L0,L1 and L2 ?

Dave

> -- 
> 2.21.0
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] docs/virt/kvm: Document running nested guests
  2020-02-07 15:46 ` Cornelia Huck
@ 2020-02-07 16:26   ` Kashyap Chamarthy
  0 siblings, 0 replies; 5+ messages in thread
From: Kashyap Chamarthy @ 2020-02-07 16:26 UTC (permalink / raw)
  To: Cornelia Huck; +Cc: David Hildenbrand, kvm, pbonzini, dgilbert, vkuznets

On Fri, Feb 07, 2020 at 04:46:53PM +0100, Cornelia Huck wrote:
> On Fri,  7 Feb 2020 16:30:02 +0100
> Kashyap Chamarthy <kchamart@redhat.com> wrote:
> 

[...]

> > ---
> >  .../virt/kvm/running-nested-guests.rst        | 171 ++++++++++++++++++
> >  1 file changed, 171 insertions(+)
> >  create mode 100644 Documentation/virt/kvm/running-nested-guests.rst
> 
> FWIW, there's currently a series converting this subdirectory to rst
> on-list.

I see, noted.  I hope there won't be any conflict, as this is a new file
addition.

> > 
> > diff --git a/Documentation/virt/kvm/running-nested-guests.rst b/Documentation/virt/kvm/running-nested-guests.rst
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..e94ab665c71a36b7718aebae902af16b792f6dd3
> > --- /dev/null
> > +++ b/Documentation/virt/kvm/running-nested-guests.rst
> > @@ -0,0 +1,171 @@
> > +Running nested guests with KVM
> > +==============================
> 
> I think the common style is to also have a "===..." line on top.

Will add.  (Just that some projects don't use it; others do.  :-))


> > +
> > +A nested guest is a KVM guest that in turn runs on a KVM guest::
> > +
> > +              .----------------.  .----------------.
> > +              |                |  |                |
> > +              |      L2        |  |      L2        |
> > +              | (Nested Guest) |  | (Nested Guest) |
> > +              |                |  |                |
> > +              |----------------'--'----------------|
> > +              |                                    |
> > +              |       L1 (Guest Hypervisor)        |
> > +              |          KVM (/dev/kvm)            |
> > +              |                                    |
> > +      .------------------------------------------------------.
> > +      |                 L0 (Host Hypervisor)                 |
> > +      |                    KVM (/dev/kvm)                    |
> > +      |------------------------------------------------------|
> > +      |                  x86 Hardware (VMX)                  |
> 
> Just 'Hardware'? I don't think you want to make this x86-specific?

Good point, will make it more generic.

> 
> > +      '------------------------------------------------------'
> > +
> > +
> > +Terminology:
> > +
> > +  - L0 – level-0; the bare metal host, running KVM
> > +
> > +  - L1 – level-1 guest; a VM running on L0; also called the "guest
> > +    hypervisor", as it itself is capable of running KVM.
> > +
> > +  - L2 – level-2 guest; a VM running on L1, this is the "nested guest"
> > +
> > +
> > +Use Cases
> > +---------
> > +
> > +An additional layer of virtualization sometimes can .  You
> 
> Something seems to be missing here?

Err, broken sentence while rewriting (perils of distraction).  I'll fix
it.

> > +might have access to a large virtual machine in a cloud environment that
> > +you want to compartmentalize into multiple workloads.  You might be
> > +running a lab environment in a training session.
> > +
> > +There are several scenarios where nested KVM can be Useful:
> 
> s/Useful/useful/

Will fix in v2.

[...]

> > +    $ cat /sys/module/kvm_intel/parameters/nested
> > +    Y
> > +
> > +For AMD hosts, the process is the same as above, except that the module
> > +name is ``kvm-amd``.
> 
> This looks x86-specific. Don't know about others, but s390 has one
> module, also a 'nested' parameter, which is mutually exclusive with a
> 'hpage' parameter.

Fair point, I'll add a seperate section for all relevant architectures.
Thanks for pointing it out.

> > +
> > +Once your bare metal host (L0) is configured for nesting, you should be
> > +able to start an L1 guest with ``qemu-kvm -cpu host`` (which passes
> > +through the host CPU's capabilities as-is to the guest); or for better
> > +live migration compatibility, use a named CPU model supported by QEMU,
> > +e.g.: ``-cpu Haswell-noTSX-IBRS,vmx=on`` and the guest will subsequently
> > +be capable of running an L2 guest with accelerated KVM.
> 
> That's probably more something that should go into a section that gives
> an example how to start a nested guest with QEMU? Cpu models also look
> different between architectures.

Yeah, I wondered about it.  I'll add a simple, representative example.

[...]

> > +Again, to persist the above values across reboot, append them to
> > +``/etc/modprobe.d/kvm_intel.conf``::
> > +
> > +    options kvm-intel nested=y
> > +    options kvm-intel enable_shadow_vmcs=y
> > +    options kvm-intel enable_apivc=y
> > +    options kvm-intel ept=y
> 
> x86 specific -- maybe reorganize this document by starting with a
> general setup section and then giving some architecture-specific
> information?

Yeah, good point.  Sorry, I was too x86-centric as I tend to just work
with x86 machines.  Reorganizing it as you suggest sounds good.

[...]

> > +Limitations on Linux kernel versions older than 5.3
> > +---------------------------------------------------
> > +
> > +On Linux kernel versions older than 5.3, once an L1 guest has started an
> > +L2 guest, the L1 guest would no longer capable of being migrated, saved,
> > +or loaded (refer to QEMU documentation on "save"/"load") until the L2
> > +guest shuts down.  [FIXME: Is this limitation fixed for *all*
> > +architectures, including s390x?]
> 
> I don't think we ever had that limitation on s390x, since the whole way
> control blocks etc. are handled is different there. David (H), do you
> remember?

I see, I was just not sure.  Thought I better ask on the list :-)

Thank you for the quick review!

[...]

-- 
/kashyap


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] docs/virt/kvm: Document running nested guests
  2020-02-07 16:01 ` Dr. David Alan Gilbert
@ 2020-02-07 16:40   ` Kashyap Chamarthy
  0 siblings, 0 replies; 5+ messages in thread
From: Kashyap Chamarthy @ 2020-02-07 16:40 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: kvm, pbonzini, vkuznets

On Fri, Feb 07, 2020 at 04:01:57PM +0000, Dr. David Alan Gilbert wrote:
> * Kashyap Chamarthy (kchamart@redhat.com) wrote:

[...]

> > +Running nested guests with KVM
> > +==============================
> > +
> > +A nested guest is a KVM guest that in turn runs on a KVM guest::
> 
> Note nesting maybe a little more general; e.g. L1 might be another
> OS/hypervisor that wants to run it's own L2; and similarly
> KVM might be the L1 under someone elses hypervisor.

True, I narrowly focused on KVM-on-KVM.

Will take this approach: I'll mention the generic nature of nesting, but
focus on KVM-on-KVM in this document.
 
> I think this doc is mostly about the case of KVM being the L0
> and wanting to run an L1 that's capable of running an L2.
>
> > +              .----------------.  .----------------.
> > +              |                |  |                |
> > +              |      L2        |  |      L2        |
> > +              | (Nested Guest) |  | (Nested Guest) |
> > +              |                |  |                |
> > +              |----------------'--'----------------|
> > +              |                                    |
> > +              |       L1 (Guest Hypervisor)        |
> > +              |          KVM (/dev/kvm)            |
> > +              |                                    |
> > +      .------------------------------------------------------.
> > +      |                 L0 (Host Hypervisor)                 |
> > +      |                    KVM (/dev/kvm)                    |
> > +      |------------------------------------------------------|
> > +      |                  x86 Hardware (VMX)                  |
> > +      '------------------------------------------------------'
> 
> This is now x86 specific but the doc is in a general directory;
> I'm not sure what other architecture nesting rules are.

Yeah, x86 is the beast I knew, so I stuck to it.  But since this is
upstream doc, I should bear in mind to clearly mention s390x and other
architectures. 
 
> Woth having VMX/SVM at least.

Will add.

[...]

> > +
> > +Use Cases
> > +---------
> > +
> > +An additional layer of virtualization sometimes can .  You
> > +might have access to a large virtual machine in a cloud environment that
> > +you want to compartmentalize into multiple workloads.  You might be
> > +running a lab environment in a training session.
> 
> Lose this paragraph, and just use the list below?

That was precisely my intention, but I didn't commit the local version
before sending.  Will fix in v2.

> > +There are several scenarios where nested KVM can be Useful:
> > +
> > +  - As a developer, you want to test your software on different OSes.
> > +    Instead of renting multiple VMs from a Cloud Provider, using nested
> > +    KVM lets you rent a large enough "guest hypervisor" (level-1 guest).
> > +    This in turn allows you to create multiple nested guests (level-2
> > +    guests), running different OSes, on which you can develop and test
> > +    your software.
> > +
> > +  - Live migration of "guest hypervisors" and their nested guests, for
> > +    load balancing, disaster recovery, etc.
> > +
> > +  - Using VMs for isolation (as in Kata Containers, and before it Clear
> > +    Containers https://lwn.net/Articles/644675/) if you're running on a
> > +    cloud provider that is already using virtual machines

The last use-case was pointed out by Paolo elsewhere.  (I should make
this more generic.)

> Some others that might be worth listing;
>    - VM image creation tools (e.g. virt-install etc) often run their own
>      VM, and users expect these to work inside a VM.
>    - Some other OS's use virtualization internally for other
>      features/protection.

Yeah.  Will add; thanks!

> > +Procedure to enable nesting on the bare metal host
> > +--------------------------------------------------
> > +
> > +The KVM kernel modules do not enable nesting by default (though your
> > +distribution may override this default).
> 
> It's the other way;  see 1e58e5e for intel has made it default; AMD has
> it set as default for longer.

Ah, this was another bit I realized later, but forgot to fix before
sending to the list.  (I recall seeing this when it came out about a
year ago:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1e58e5e)

Will fix.  Thanks for the eagle eyes :-)

> > +Additional nested-related kernel parameters
> > +-------------------------------------------
> > +
> > +If your hardware is sufficiently advanced (Intel Haswell processor or
> > +above which has newer hardware virt extensions), you might want to
> > +enable additional features: "Shadow VMCS (Virtual Machine Control
> > +Structure)", APIC Virtualization on your bare metal host (L0).
> > +Parameters for Intel hosts::
> > +
> > +    $ cat /sys/module/kvm_intel/parameters/enable_shadow_vmcs
> > +    Y
> > +
> > +    $ cat /sys/module/kvm_intel/parameters/enable_apicv
> > +    N
> > +
> > +    $ cat /sys/module/kvm_intel/parameters/ept
> > +    Y
> 
> Don't those happen automatically (mostly?)

EPT, yes.  I forget if `enable_shadow_vmcs` and `enable_apivc` are.
I'll investigate and update.

[...]

> > +Limitations on Linux kernel versions older than 5.3
> > +---------------------------------------------------
> > +
> > +On Linux kernel versions older than 5.3, once an L1 guest has started an
> > +L2 guest, the L1 guest would no longer capable of being migrated, saved,
> > +or loaded (refer to QEMU documentation on "save"/"load") until the L2
> > +guest shuts down.  [FIXME: Is this limitation fixed for *all*
> > +architectures, including s390x?]
> > +
> > +Attempting to migrate or save & load an L1 guest while an L2 guest is
> > +running will result in undefined behavior.  You might see a ``kernel
> > +BUG!`` entry in ``dmesg``, a kernel 'oops', or an outright kernel panic.
> > +Such a migrated or loaded L1 guest can no longer be considered stable or
> > +secure, and must be restarted.
> > +
> > +Migrating an L1 guest merely configured to support nesting, while not
> > +actually running L2 guests, is expected to function normally.
> > +Live-migrating an L2 guest from one L1 guest to another is also expected
> > +to succeed.
> 
> Can you add an entry along the lines of 'reporting bugs with nesting'
> that explains you should clearly state what the host CPU is,
> and the exact OS and hypervisor config in L0,L1 and L2 ?

Yes, good point.  I'll add a short version based my notes from here
(which you've reviewed in the past):

https://kashyapc.fedorapeople.org/Notes/_build/html/docs/Info-to-collect-when-debugging-nested-KVM.html#what-information-to-collect

Thanks for the review.

-- 
/kashyap


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2020-02-07 16:41 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-07 15:30 [PATCH] docs/virt/kvm: Document running nested guests Kashyap Chamarthy
2020-02-07 15:46 ` Cornelia Huck
2020-02-07 16:26   ` Kashyap Chamarthy
2020-02-07 16:01 ` Dr. David Alan Gilbert
2020-02-07 16:40   ` Kashyap Chamarthy

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.