* [PATCH v10 00/17] Add AMD SEV guest live migration support
From: Ashish Kalra @ 2021-02-04  0:35 UTC (permalink / raw)
  To: pbonzini
  Cc: tglx, mingo, hpa, rkrcmar, joro, bp, thomas.lendacky, x86, kvm,
	linux-kernel, srutherford, seanjc, venu.busireddy, brijesh.singh

From: Ashish Kalra <ashish.kalra@amd.com>

This series adds support for the AMD SEV guest live migration commands. To
protect the confidentiality of an SEV-protected guest's memory while it is in
transit, we need to use the SEV commands defined in the SEV API spec [1].

SEV guest VMs have the concept of private and shared memory. Private memory
is encrypted with the guest-specific key, while shared memory may be encrypted
with the hypervisor key. The commands provided by the SEV firmware are meant
to be used for private memory only. The patch series introduces a new
hypercall that the guest OS can use to notify the hypervisor of changes in
page encryption status. If a page is encrypted with the guest-specific key, we
use the SEV commands during migration; if a page is not encrypted, we fall
back to the default migration path.
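
Seen from the guest, the notification is a single hypercall; a minimal sketch,
assuming the kvm_sev_hypercall3() helper and the KVM_HC_PAGE_ENC_STATUS
definition added by patches 07 and 08 of this series:

  static void notify_page_enc_status(unsigned long gpa,
                                     unsigned int npages, bool enc)
  {
          /* a0 = start gpa, a1 = page count, a2 = 1 encrypted / 0 decrypted */
          kvm_sev_hypercall3(KVM_HC_PAGE_ENC_STATUS, gpa, npages, enc ? 1 : 0);
  }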

The series also adds the new ioctls KVM_{SET,GET}_SHARED_PAGES_LIST. QEMU can
use these ioctls to retrieve the guest's shared pages list and consult it
during migration to determine whether a given page is encrypted.

SEV live migration support is negotiated between the host and the guest as
follows. The host advertises the feature via a KVM feature CPUID bit
(KVM_FEATURE_SEV_LIVE_MIGRATION). The guest firmware (OVMF) detects this
feature and sets a UEFI environment variable indicating OVMF support for live
migration. The guest kernel likewise detects host support via CPUID and, in
the case of an EFI boot, verifies that OVMF also supports the feature by
reading the UEFI environment variable; if the variable is set, it enables live
migration by writing to a custom MSR. If not booted under EFI, the kernel
simply enables the feature by writing to the custom MSR directly. The host
returns an error from the KVM_GET_SHARED_PAGES_LIST ioctl if the guest has not
enabled live migration. The guest-side flow is sketched below.
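
A minimal sketch of that guest-side negotiation (illustrative only; the real
implementation lives in the guest-support patches of this series, and the MSR
enable-bit value used here is an assumption):

  #define KVM_SEV_LIVE_MIGRATION_ENABLED  BIT_ULL(0)      /* assumed bit */

  static void check_kvm_sev_migration(void)
  {
          if (!sev_active())
                  return;

          /* host support is advertised via the KVM CPUID feature bits */
          if (!kvm_para_has_feature(KVM_FEATURE_SEV_LIVE_MIGRATION))
                  return;

          /* on an EFI boot, also check the OVMF UEFI variable (omitted) */

          /* opt in by writing the custom MSR */
          wrmsrl(MSR_KVM_SEV_LIVE_MIGRATION, KVM_SEV_LIVE_MIGRATION_ENABLED);
  }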

A branch containing these patches is available here:
https://github.com/AMDESE/linux/tree/sev-migration-v10

[1] https://developer.amd.com/wp-content/resources/55766.PDF

Changes since v9:
- Transitioned from the page encryption bitmap to a shared pages list
  to keep track of the guest's shared/unencrypted memory regions.
- Moved back to marking the complete __bss_decrypted section as
  decrypted in the shared pages list.
- Added a new function, check_kvm_sev_migration(), invoked via
  kvm_init_platform(), for the guest to query host-side support for SEV
  live migration and to enable the feature, avoiding #ifdefs in the code.
- Renamed MSR_KVM_SEV_LIVE_MIG_EN to MSR_KVM_SEV_LIVE_MIGRATION.
- Added a new function, handle_unencrypted_region(), invoked from
  sev_dbg_crypt() to bypass unencrypted guest memory regions.

Changes since v8:
- Rebased to the kvm next branch.
- Fixed and added comments as per review feedback on the v8 patches.
- Removed implicitly enabling live migration for incoming VMs in
  KVM_SET_PAGE_ENC_BITMAP; it is now done via the KVM_SET_MSR ioctl.
- Added support for bypassing unencrypted guest memory regions for
  DBG_DECRYPT API calls; the guest memory region encryption status in
  sev_dbg_decrypt() is referenced using the page encryption bitmap.

Changes since v7:
- Removed the hypervisor-specific hypercall/paravirt callback for
  SEV live migration and moved back to calling kvm_sev_hypercall3
  directly.
- Fixed build errors reported by the kbuild test robot <lkp@intel.com>,
  specifically the build error seen when CONFIG_HYPERVISOR_GUEST=y and
  CONFIG_AMD_MEM_ENCRYPT=n.
- Implicitly enabled live migration for incoming VM(s) to handle
  A->B->C->... VM migrations.
- Fixed documentation as per comments on the v6 patches.
- Fixed the error return path in sev_send_update_data() as per comments
  on the v6 patches.

Changes since v6:
- Rebased to mainline and refactored to the new split SVM
  infrastructure.
- Moved to static allocation of the unified page encryption bitmap
  instead of dynamically resizing the bitmap. The static allocation
  is done implicitly by extending the kvm_arch_commit_memory_region()
  callback to add SVM-specific x86_ops which can read the userspace-provided
  memory region/memslots, calculate the amount of guest RAM managed by KVM,
  and grow the bitmap accordingly.
- Fixed KVM_SET_PAGE_ENC_BITMAP ioctl to set the whole bitmap instead
  of simply clearing specific bits.
- Removed KVM_PAGE_ENC_BITMAP_RESET ioctl, which is now performed using
  KVM_SET_PAGE_ENC_BITMAP.
- Extended guest support for enabling Live Migration feature by adding a
  check for UEFI environment variable indicating OVMF support for Live
  Migration feature and additionally checking for KVM capability for the
  same feature. If not booted under EFI, then we simply check for KVM
  capability.
- Add hypervisor specific hypercall for SEV live migration by adding
  a new paravirt callback as part of x86_hyper_runtime.
  (x86 hypervisor specific runtime callbacks)
- Moving MSR handling for MSR_KVM_SEV_LIVE_MIG_EN into svm/sev code 
  and adding check for SEV live migration enabled by guest in the 
  KVM_GET_PAGE_ENC_BITMAP ioctl.
- Instead of the complete __bss_decrypted section, only specific variables
  such as hv_clock_boot and wall_clock are marked as decrypted in the
  page encryption bitmap.

Changes since v5:
- Fixed build errors reported by the kbuild test robot <lkp@intel.com>.

Changes since v4:
- Added host support to extend the KVM capabilities/feature bits to
  include a new KVM_FEATURE_SEV_LIVE_MIGRATION, which the guest can use
  to query host-side support for SEV live migration, and a new custom
  MSR, MSR_KVM_SEV_LIVE_MIG_EN, for the guest to enable the SEV live
  migration feature.
- Ensured that the __bss_decrypted section is marked as decrypted in the
  page encryption bitmap.
- Fixed the KVM_GET_PAGE_ENC_BITMAP ioctl to return the correct bitmap
  as per the number of pages requested by the user. We now only copy
  bmap->num_pages bytes to the userspace buffer; if bmap->num_pages is
  not byte aligned, we read the trailing bits from userspace and copy
  those bits as is. This fixes guest page corruption issues observed
  after migration completion.
- Added kexec support for SEV live migration to reset the host's
  page encryption bitmap related to kernel-specific page encryption
  status settings before loading a new kernel via kexec. We cannot
  reset the complete page encryption bitmap here, as we need to
  retain the UEFI/OVMF firmware-specific settings.

Changes since v3:
- Rebased to mainline and retested.
- Added a new KVM_PAGE_ENC_BITMAP_RESET ioctl, which resets the
  page encryption bitmap on a guest reboot event.
- Added a more reliable sanity check for the GPA range being passed to
  the hypercall, to ensure that guest MMIO ranges are also marked
  in the page encryption bitmap.

Changes since v2:
 - Reset the page encryption bitmap on vCPU reboot.

Changes since v1:
 - Added support to share the page encryption status between the source
   and target machine.
 - Addressed review feedback from Tom Lendacky.
 - Added a check to limit the session blob length.
 - Updated the KVM_GET_PAGE_ENC_BITMAP ioctl to use the base_gfn instead
   of the memory slot when querying the bitmap.

Ashish Kalra (5):
  KVM: x86: Introduce new KVM_FEATURE_SEV_LIVE_MIGRATION feature &
    Custom MSR.
  EFI: Introduce the new AMD Memory Encryption GUID.
  KVM: x86: Add guest support for detecting and enabling SEV Live
    Migration feature.
  KVM: x86: Add kexec support for SEV Live Migration.
  KVM: SVM: Bypass DBG_DECRYPT API calls for unencrypted guest memory.

Brijesh Singh (11):
  KVM: SVM: Add KVM_SEV SEND_START command
  KVM: SVM: Add KVM_SEND_UPDATE_DATA command
  KVM: SVM: Add KVM_SEV_SEND_FINISH command
  KVM: SVM: Add support for KVM_SEV_RECEIVE_START command
  KVM: SVM: Add KVM_SEV_RECEIVE_UPDATE_DATA command
  KVM: SVM: Add KVM_SEV_RECEIVE_FINISH command
  KVM: x86: Add AMD SEV specific Hypercall3
  KVM: X86: Introduce KVM_HC_PAGE_ENC_STATUS hypercall
  mm: x86: Invoke hypercall when page encryption status is changed
  KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl
  KVM: x86: Introduce KVM_SET_SHARED_PAGES_LIST ioctl

 .../virt/kvm/amd-memory-encryption.rst        | 120 +++
 Documentation/virt/kvm/api.rst                |  44 +-
 Documentation/virt/kvm/cpuid.rst              |   5 +
 Documentation/virt/kvm/hypercalls.rst         |  15 +
 Documentation/virt/kvm/msr.rst                |  12 +
 arch/x86/include/asm/kvm_host.h               |   6 +
 arch/x86/include/asm/kvm_para.h               |  12 +
 arch/x86/include/asm/mem_encrypt.h            |   8 +
 arch/x86/include/asm/paravirt.h               |  10 +
 arch/x86/include/asm/paravirt_types.h         |   2 +
 arch/x86/include/uapi/asm/kvm_para.h          |   4 +
 arch/x86/kernel/kvm.c                         |  80 ++
 arch/x86/kernel/paravirt.c                    |   1 +
 arch/x86/kvm/svm/sev.c                        | 861 ++++++++++++++++++
 arch/x86/kvm/svm/svm.c                        |  20 +
 arch/x86/kvm/svm/svm.h                        |   9 +
 arch/x86/kvm/vmx/vmx.c                        |   1 +
 arch/x86/kvm/x86.c                            |  30 +
 arch/x86/mm/mem_encrypt.c                     |  98 +-
 arch/x86/mm/pat/set_memory.c                  |   7 +
 include/linux/efi.h                           |   1 +
 include/linux/psp-sev.h                       |   8 +-
 include/uapi/linux/kvm.h                      |  49 +
 include/uapi/linux/kvm_para.h                 |   1 +
 24 files changed, 1398 insertions(+), 6 deletions(-)

-- 
2.17.1



* [PATCH v10 01/16] KVM: SVM: Add KVM_SEV SEND_START command
From: Ashish Kalra @ 2021-02-04  0:36 UTC (permalink / raw)
  To: pbonzini
  Cc: tglx, mingo, hpa, rkrcmar, joro, bp, thomas.lendacky, x86, kvm,
	linux-kernel, srutherford, seanjc, venu.busireddy, brijesh.singh

From: Brijesh Singh <brijesh.singh@amd.com>

The command is used to create an outgoing SEV guest encryption context.
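
For context, a hedged userspace sketch of driving this command through the
KVM_MEMORY_ENCRYPT_OP ioctl (not part of this patch; vm_fd/sev_fd are assumed
open descriptors, and <linux/kvm.h> is assumed to come from a tree with this
series applied):

  /* also assumes <sys/ioctl.h>, <stdint.h> and <stdlib.h> */
  static void sev_send_start_example(int vm_fd, int sev_fd)
  {
          struct kvm_sev_send_start start = {};   /* session_len == 0: query */
          struct kvm_sev_cmd cmd = {
                  .id = KVM_SEV_SEND_START,
                  .data = (__u64)(uintptr_t)&start,
                  .sev_fd = sev_fd,
          };

          ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);      /* fills session_len */

          start.session_uaddr = (__u64)(uintptr_t)malloc(start.session_len);
          /* fill pdh_cert_*, plat_certs_* and amd_certs_* before this call */
          ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);      /* creates the context */
  }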

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Steve Rutherford <srutherford@google.com>
Reviewed-by: Venu Busireddy <venu.busireddy@oracle.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
 .../virt/kvm/amd-memory-encryption.rst        |  27 ++++
 arch/x86/kvm/svm/sev.c                        | 125 ++++++++++++++++++
 include/linux/psp-sev.h                       |   8 +-
 include/uapi/linux/kvm.h                      |  12 ++
 4 files changed, 168 insertions(+), 4 deletions(-)

diff --git a/Documentation/virt/kvm/amd-memory-encryption.rst b/Documentation/virt/kvm/amd-memory-encryption.rst
index 09a8f2a34e39..9f9896b72d36 100644
--- a/Documentation/virt/kvm/amd-memory-encryption.rst
+++ b/Documentation/virt/kvm/amd-memory-encryption.rst
@@ -263,6 +263,33 @@ Returns: 0 on success, -negative on error
                 __u32 trans_len;
         };
 
+10. KVM_SEV_SEND_START
+----------------------
+
+The KVM_SEV_SEND_START command can be used by the hypervisor to create an
+outgoing guest encryption context.
+
+Parameters (in): struct kvm_sev_send_start
+
+Returns: 0 on success, -negative on error
+
+::
+        struct kvm_sev_send_start {
+                __u32 policy;                 /* guest policy */
+
+                __u64 pdh_cert_uaddr;         /* platform Diffie-Hellman certificate */
+                __u32 pdh_cert_len;
+
+                __u64 plat_certs_uaddr;        /* platform certificate chain */
+                __u32 plat_certs_len;
+
+                __u64 amd_certs_uaddr;        /* AMD certificate */
+                __u32 amd_certs_len;
+
+                __u64 session_uaddr;          /* Guest session information */
+                __u32 session_len;
+        };
+
 References
 ==========
 
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index ac652bc476ae..3026c7fd2ffc 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -1039,6 +1039,128 @@ static int sev_launch_secret(struct kvm *kvm, struct kvm_sev_cmd *argp)
 	return ret;
 }
 
+/* Userspace wants to query session length. */
+static int
+__sev_send_start_query_session_length(struct kvm *kvm, struct kvm_sev_cmd *argp,
+				      struct kvm_sev_send_start *params)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+	struct sev_data_send_start *data;
+	int ret;
+
+	data = kzalloc(sizeof(*data), GFP_KERNEL_ACCOUNT);
+	if (data == NULL)
+		return -ENOMEM;
+
+	data->handle = sev->handle;
+	ret = sev_issue_cmd(kvm, SEV_CMD_SEND_START, data, &argp->error);
+
+	params->session_len = data->session_len;
+	if (copy_to_user((void __user *)(uintptr_t)argp->data, params,
+				sizeof(struct kvm_sev_send_start)))
+		ret = -EFAULT;
+
+	kfree(data);
+	return ret;
+}
+
+static int sev_send_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+	struct sev_data_send_start *data;
+	struct kvm_sev_send_start params;
+	void *amd_certs, *session_data;
+	void *pdh_cert, *plat_certs;
+	int ret;
+
+	if (!sev_guest(kvm))
+		return -ENOTTY;
+
+	if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data,
+				sizeof(struct kvm_sev_send_start)))
+		return -EFAULT;
+
+	/* if session_len is zero, userspace wants to query the session length */
+	if (!params.session_len)
+		return __sev_send_start_query_session_length(kvm, argp,
+				&params);
+
+	/* some sanity checks */
+	if (!params.pdh_cert_uaddr || !params.pdh_cert_len ||
+	    !params.session_uaddr || params.session_len > SEV_FW_BLOB_MAX_SIZE)
+		return -EINVAL;
+
+	/* allocate the memory to hold the session data blob */
+	session_data = kmalloc(params.session_len, GFP_KERNEL_ACCOUNT);
+	if (!session_data)
+		return -ENOMEM;
+
+	/* copy the certificate blobs from userspace */
+	pdh_cert = psp_copy_user_blob(params.pdh_cert_uaddr,
+				params.pdh_cert_len);
+	if (IS_ERR(pdh_cert)) {
+		ret = PTR_ERR(pdh_cert);
+		goto e_free_session;
+	}
+
+	plat_certs = psp_copy_user_blob(params.plat_certs_uaddr,
+				params.plat_certs_len);
+	if (IS_ERR(plat_certs)) {
+		ret = PTR_ERR(plat_certs);
+		goto e_free_pdh;
+	}
+
+	amd_certs = psp_copy_user_blob(params.amd_certs_uaddr,
+				params.amd_certs_len);
+	if (IS_ERR(amd_certs)) {
+		ret = PTR_ERR(amd_certs);
+		goto e_free_plat_cert;
+	}
+
+	data = kzalloc(sizeof(*data), GFP_KERNEL_ACCOUNT);
+	if (data == NULL) {
+		ret = -ENOMEM;
+		goto e_free_amd_cert;
+	}
+
+	/* populate the FW SEND_START field with system physical address */
+	data->pdh_cert_address = __psp_pa(pdh_cert);
+	data->pdh_cert_len = params.pdh_cert_len;
+	data->plat_certs_address = __psp_pa(plat_certs);
+	data->plat_certs_len = params.plat_certs_len;
+	data->amd_certs_address = __psp_pa(amd_certs);
+	data->amd_certs_len = params.amd_certs_len;
+	data->session_address = __psp_pa(session_data);
+	data->session_len = params.session_len;
+	data->handle = sev->handle;
+
+	ret = sev_issue_cmd(kvm, SEV_CMD_SEND_START, data, &argp->error);
+
+	if (!ret && copy_to_user((void __user *)(uintptr_t)params.session_uaddr,
+			session_data, params.session_len)) {
+		ret = -EFAULT;
+		goto e_free;
+	}
+
+	params.policy = data->policy;
+	params.session_len = data->session_len;
+	if (copy_to_user((void __user *)(uintptr_t)argp->data, &params,
+				sizeof(struct kvm_sev_send_start)))
+		ret = -EFAULT;
+
+e_free:
+	kfree(data);
+e_free_amd_cert:
+	kfree(amd_certs);
+e_free_plat_cert:
+	kfree(plat_certs);
+e_free_pdh:
+	kfree(pdh_cert);
+e_free_session:
+	kfree(session_data);
+	return ret;
+}
+
 int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
 {
 	struct kvm_sev_cmd sev_cmd;
@@ -1089,6 +1211,9 @@ int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
 	case KVM_SEV_LAUNCH_SECRET:
 		r = sev_launch_secret(kvm, &sev_cmd);
 		break;
+	case KVM_SEV_SEND_START:
+		r = sev_send_start(kvm, &sev_cmd);
+		break;
 	default:
 		r = -EINVAL;
 		goto out;
diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
index 49d155cd2dfe..454f35904d47 100644
--- a/include/linux/psp-sev.h
+++ b/include/linux/psp-sev.h
@@ -325,11 +325,11 @@ struct sev_data_send_start {
 	u64 pdh_cert_address;			/* In */
 	u32 pdh_cert_len;			/* In */
 	u32 reserved1;
-	u64 plat_cert_address;			/* In */
-	u32 plat_cert_len;			/* In */
+	u64 plat_certs_address;			/* In */
+	u32 plat_certs_len;			/* In */
 	u32 reserved2;
-	u64 amd_cert_address;			/* In */
-	u32 amd_cert_len;			/* In */
+	u64 amd_certs_address;			/* In */
+	u32 amd_certs_len;			/* In */
 	u32 reserved3;
 	u64 session_address;			/* In */
 	u32 session_len;			/* In/Out */
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 374c67875cdb..8f538fd873f6 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1645,6 +1645,18 @@ struct kvm_sev_dbg {
 	__u32 len;
 };
 
+struct kvm_sev_send_start {
+	__u32 policy;
+	__u64 pdh_cert_uaddr;
+	__u32 pdh_cert_len;
+	__u64 plat_certs_uaddr;
+	__u32 plat_certs_len;
+	__u64 amd_certs_uaddr;
+	__u32 amd_certs_len;
+	__u64 session_uaddr;
+	__u32 session_len;
+};
+
 #define KVM_DEV_ASSIGN_ENABLE_IOMMU	(1 << 0)
 #define KVM_DEV_ASSIGN_PCI_2_3		(1 << 1)
 #define KVM_DEV_ASSIGN_MASK_INTX	(1 << 2)
-- 
2.17.1



* [PATCH v10 02/16] KVM: SVM: Add KVM_SEND_UPDATE_DATA command
From: Ashish Kalra @ 2021-02-04  0:36 UTC (permalink / raw)
  To: pbonzini
  Cc: tglx, mingo, hpa, rkrcmar, joro, bp, thomas.lendacky, x86, kvm,
	linux-kernel, srutherford, seanjc, venu.busireddy, brijesh.singh

From: Brijesh Singh <brijesh.singh@amd.com>

The command is used to encrypt a guest memory region using the encryption
context created with KVM_SEV_SEND_START.
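
An assumed userspace flow for one page (a sketch, not part of this patch; as
in the kernel code below, issuing the command with zero hdr_len/trans_len
first queries the required buffer lengths):

  static void sev_send_one_page(int vm_fd, int sev_fd, __u64 guest_hva)
  {
          struct kvm_sev_send_update_data upd = {};       /* zero lengths: query */
          struct kvm_sev_cmd cmd = {
                  .id = KVM_SEV_SEND_UPDATE_DATA,
                  .data = (__u64)(uintptr_t)&upd,
                  .sev_fd = sev_fd,
          };

          ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);      /* get hdr/trans lengths */

          upd.hdr_uaddr   = (__u64)(uintptr_t)malloc(upd.hdr_len);
          upd.trans_uaddr = (__u64)(uintptr_t)malloc(upd.trans_len);
          upd.guest_uaddr = guest_hva;    /* must not cross a page boundary */
          upd.guest_len   = 4096;
          ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);      /* encrypts into trans */
  }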

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Steve Rutherford <srutherford@google.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
 .../virt/kvm/amd-memory-encryption.rst        |  24 ++++
 arch/x86/kvm/svm/sev.c                        | 122 ++++++++++++++++++
 include/uapi/linux/kvm.h                      |   9 ++
 3 files changed, 155 insertions(+)

diff --git a/Documentation/virt/kvm/amd-memory-encryption.rst b/Documentation/virt/kvm/amd-memory-encryption.rst
index 9f9896b72d36..8bed1d801558 100644
--- a/Documentation/virt/kvm/amd-memory-encryption.rst
+++ b/Documentation/virt/kvm/amd-memory-encryption.rst
@@ -290,6 +290,30 @@ Returns: 0 on success, -negative on error
                 __u32 session_len;
         };
 
+11. KVM_SEV_SEND_UPDATE_DATA
+----------------------------
+
+The KVM_SEV_SEND_UPDATE_DATA command can be used by the hypervisor to encrypt the
+outgoing guest memory region with the encryption context created using
+KVM_SEV_SEND_START.
+
+Parameters (in): struct kvm_sev_send_update_data
+
+Returns: 0 on success, -negative on error
+
+::
+
+        struct kvm_sev_send_update_data {
+                __u64 hdr_uaddr;        /* userspace address containing the packet header */
+                __u32 hdr_len;
+
+                __u64 guest_uaddr;      /* the source memory region to be encrypted */
+                __u32 guest_len;
+
+                __u64 trans_uaddr;      /* the destination memory region */
+                __u32 trans_len;
+        };
+
 References
 ==========
 
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 3026c7fd2ffc..98e46ae1cba3 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -33,6 +33,7 @@ static DECLARE_RWSEM(sev_deactivate_lock);
 static DEFINE_MUTEX(sev_bitmap_lock);
 unsigned int max_sev_asid;
 static unsigned int min_sev_asid;
+static unsigned long sev_me_mask;
 static unsigned long *sev_asid_bitmap;
 static unsigned long *sev_reclaim_asid_bitmap;
 
@@ -1161,6 +1162,123 @@ static int sev_send_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
 	return ret;
 }
 
+/* Userspace wants to query either header or trans length. */
+static int
+__sev_send_update_data_query_lengths(struct kvm *kvm, struct kvm_sev_cmd *argp,
+				     struct kvm_sev_send_update_data *params)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+	struct sev_data_send_update_data *data;
+	int ret;
+
+	data = kzalloc(sizeof(*data), GFP_KERNEL_ACCOUNT);
+	if (!data)
+		return -ENOMEM;
+
+	data->handle = sev->handle;
+	ret = sev_issue_cmd(kvm, SEV_CMD_SEND_UPDATE_DATA, data, &argp->error);
+
+	params->hdr_len = data->hdr_len;
+	params->trans_len = data->trans_len;
+
+	if (copy_to_user((void __user *)(uintptr_t)argp->data, params,
+			 sizeof(struct kvm_sev_send_update_data)))
+		ret = -EFAULT;
+
+	kfree(data);
+	return ret;
+}
+
+static int sev_send_update_data(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+	struct sev_data_send_update_data *data;
+	struct kvm_sev_send_update_data params;
+	void *hdr, *trans_data;
+	struct page **guest_page;
+	unsigned long n;
+	int ret, offset;
+
+	if (!sev_guest(kvm))
+		return -ENOTTY;
+
+	if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data,
+			sizeof(struct kvm_sev_send_update_data)))
+		return -EFAULT;
+
+	/* userspace wants to query either header or trans length */
+	if (!params.trans_len || !params.hdr_len)
+		return __sev_send_update_data_query_lengths(kvm, argp, &params);
+
+	if (!params.trans_uaddr || !params.guest_uaddr ||
+	    !params.guest_len || !params.hdr_uaddr)
+		return -EINVAL;
+
+	/* Check if we are crossing the page boundary */
+	offset = params.guest_uaddr & (PAGE_SIZE - 1);
+	if ((params.guest_len + offset > PAGE_SIZE))
+		return -EINVAL;
+
+	/* Pin guest memory */
+	guest_page = sev_pin_memory(kvm, params.guest_uaddr & PAGE_MASK,
+				    PAGE_SIZE, &n, 0);
+	if (!guest_page)
+		return -EFAULT;
+
+	/* allocate memory for header and transport buffer */
+	ret = -ENOMEM;
+	hdr = kmalloc(params.hdr_len, GFP_KERNEL_ACCOUNT);
+	if (!hdr)
+		goto e_unpin;
+
+	trans_data = kmalloc(params.trans_len, GFP_KERNEL_ACCOUNT);
+	if (!trans_data)
+		goto e_free_hdr;
+
+	data = kzalloc(sizeof(*data), GFP_KERNEL);
+	if (!data)
+		goto e_free_trans_data;
+
+	data->hdr_address = __psp_pa(hdr);
+	data->hdr_len = params.hdr_len;
+	data->trans_address = __psp_pa(trans_data);
+	data->trans_len = params.trans_len;
+
+	/* The SEND_UPDATE_DATA command requires C-bit to be always set. */
+	data->guest_address = (page_to_pfn(guest_page[0]) << PAGE_SHIFT) +
+				offset;
+	data->guest_address |= sev_me_mask;
+	data->guest_len = params.guest_len;
+	data->handle = sev->handle;
+
+	ret = sev_issue_cmd(kvm, SEV_CMD_SEND_UPDATE_DATA, data, &argp->error);
+
+	if (ret)
+		goto e_free;
+
+	/* copy transport buffer to user space */
+	if (copy_to_user((void __user *)(uintptr_t)params.trans_uaddr,
+			 trans_data, params.trans_len)) {
+		ret = -EFAULT;
+		goto e_free;
+	}
+
+	/* Copy packet header to userspace. */
+	ret = copy_to_user((void __user *)(uintptr_t)params.hdr_uaddr, hdr,
+				params.hdr_len);
+
+e_free:
+	kfree(data);
+e_free_trans_data:
+	kfree(trans_data);
+e_free_hdr:
+	kfree(hdr);
+e_unpin:
+	sev_unpin_memory(kvm, guest_page, n);
+
+	return ret;
+}
+
 int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
 {
 	struct kvm_sev_cmd sev_cmd;
@@ -1214,6 +1332,9 @@ int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
 	case KVM_SEV_SEND_START:
 		r = sev_send_start(kvm, &sev_cmd);
 		break;
+	case KVM_SEV_SEND_UPDATE_DATA:
+		r = sev_send_update_data(kvm, &sev_cmd);
+		break;
 	default:
 		r = -EINVAL;
 		goto out;
@@ -1392,6 +1513,7 @@ void __init sev_hardware_setup(void)
 
 	/* Minimum ASID value that should be used for SEV guest */
 	min_sev_asid = edx;
+	sev_me_mask = 1UL << (ebx & 0x3f);
 
 	/* Initialize SEV ASID bitmaps */
 	sev_asid_bitmap = bitmap_zalloc(max_sev_asid, GFP_KERNEL);
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 8f538fd873f6..0ff7bed508fc 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1657,6 +1657,15 @@ struct kvm_sev_send_start {
 	__u32 session_len;
 };
 
+struct kvm_sev_send_update_data {
+	__u64 hdr_uaddr;
+	__u32 hdr_len;
+	__u64 guest_uaddr;
+	__u32 guest_len;
+	__u64 trans_uaddr;
+	__u32 trans_len;
+};
+
 #define KVM_DEV_ASSIGN_ENABLE_IOMMU	(1 << 0)
 #define KVM_DEV_ASSIGN_PCI_2_3		(1 << 1)
 #define KVM_DEV_ASSIGN_MASK_INTX	(1 << 2)
-- 
2.17.1



* [PATCH v10 03/16] KVM: SVM: Add KVM_SEV_SEND_FINISH command
From: Ashish Kalra @ 2021-02-04  0:37 UTC (permalink / raw)
  To: pbonzini
  Cc: tglx, mingo, hpa, rkrcmar, joro, bp, thomas.lendacky, x86, kvm,
	linux-kernel, srutherford, seanjc, venu.busireddy, brijesh.singh

From: Brijesh Singh <brijesh.singh@amd.com>

The command is used to finalize the encryption context created with the
KVM_SEV_SEND_START command.
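
The assumed send-side ordering, sketched below (illustrative, not part of this
patch):

  /*
   * KVM_SEV_SEND_START        - create the outgoing context
   * KVM_SEV_SEND_UPDATE_DATA  - repeated for every private guest page
   * KVM_SEV_SEND_FINISH       - tear the context down
   */
  static void sev_send_finish_example(int vm_fd, int sev_fd)
  {
          struct kvm_sev_cmd cmd = {
                  .id = KVM_SEV_SEND_FINISH,      /* no parameters needed */
                  .sev_fd = sev_fd,
          };
          ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);
  }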

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
 .../virt/kvm/amd-memory-encryption.rst        |  8 +++++++
 arch/x86/kvm/svm/sev.c                        | 23 +++++++++++++++++++
 2 files changed, 31 insertions(+)

diff --git a/Documentation/virt/kvm/amd-memory-encryption.rst b/Documentation/virt/kvm/amd-memory-encryption.rst
index 8bed1d801558..0da0c199efa8 100644
--- a/Documentation/virt/kvm/amd-memory-encryption.rst
+++ b/Documentation/virt/kvm/amd-memory-encryption.rst
@@ -314,6 +314,14 @@ Returns: 0 on success, -negative on error
                 __u32 trans_len;
         };
 
+12. KVM_SEV_SEND_FINISH
+------------------------
+
+After completion of the migration flow, the KVM_SEV_SEND_FINISH command can be
+issued by the hypervisor to delete the encryption context.
+
+Returns: 0 on success, -negative on error
+
 References
 ==========
 
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 98e46ae1cba3..0d117b1e6491 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -1279,6 +1279,26 @@ static int sev_send_update_data(struct kvm *kvm, struct kvm_sev_cmd *argp)
 	return ret;
 }
 
+static int sev_send_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+	struct sev_data_send_finish *data;
+	int ret;
+
+	if (!sev_guest(kvm))
+		return -ENOTTY;
+
+	data = kzalloc(sizeof(*data), GFP_KERNEL);
+	if (!data)
+		return -ENOMEM;
+
+	data->handle = sev->handle;
+	ret = sev_issue_cmd(kvm, SEV_CMD_SEND_FINISH, data, &argp->error);
+
+	kfree(data);
+	return ret;
+}
+
 int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
 {
 	struct kvm_sev_cmd sev_cmd;
@@ -1335,6 +1355,9 @@ int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
 	case KVM_SEV_SEND_UPDATE_DATA:
 		r = sev_send_update_data(kvm, &sev_cmd);
 		break;
+	case KVM_SEV_SEND_FINISH:
+		r = sev_send_finish(kvm, &sev_cmd);
+		break;
 	default:
 		r = -EINVAL;
 		goto out;
-- 
2.17.1



* [PATCH v10 04/16] KVM: SVM: Add support for KVM_SEV_RECEIVE_START command
From: Ashish Kalra @ 2021-02-04  0:37 UTC (permalink / raw)
  To: pbonzini
  Cc: tglx, mingo, hpa, rkrcmar, joro, bp, thomas.lendacky, x86, kvm,
	linux-kernel, srutherford, seanjc, venu.busireddy, brijesh.singh

From: Brijesh Singh <brijesh.singh@amd.com>

The command is used to create the encryption context for an incoming
SEV guest. The encryption context can later be used by the hypervisor
to import the incoming data into the SEV guest memory space.
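
A hedged userspace sketch for the destination side (not from this patch; pdh
and session are the blobs the source produced with KVM_SEV_SEND_START):

  static void sev_receive_start_example(int vm_fd, int sev_fd, __u32 policy,
                                        void *pdh, __u32 pdh_len,
                                        void *session, __u32 session_len)
  {
          struct kvm_sev_receive_start rs = {
                  .handle = 0,            /* 0: firmware creates a new handle */
                  .policy = policy,       /* must match the sender's policy */
                  .pdh_uaddr = (__u64)(uintptr_t)pdh,
                  .pdh_len = pdh_len,
                  .session_uaddr = (__u64)(uintptr_t)session,
                  .session_len = session_len,
          };
          struct kvm_sev_cmd cmd = {
                  .id = KVM_SEV_RECEIVE_START,
                  .data = (__u64)(uintptr_t)&rs,
                  .sev_fd = sev_fd,
          };
          ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);      /* rs.handle now valid */
  }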

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Steve Rutherford <srutherford@google.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
 .../virt/kvm/amd-memory-encryption.rst        | 29 +++++++
 arch/x86/kvm/svm/sev.c                        | 81 +++++++++++++++++++
 include/uapi/linux/kvm.h                      |  9 +++
 3 files changed, 119 insertions(+)

diff --git a/Documentation/virt/kvm/amd-memory-encryption.rst b/Documentation/virt/kvm/amd-memory-encryption.rst
index 0da0c199efa8..079ac5ac2459 100644
--- a/Documentation/virt/kvm/amd-memory-encryption.rst
+++ b/Documentation/virt/kvm/amd-memory-encryption.rst
@@ -322,6 +322,35 @@ issued by the hypervisor to delete the encryption context.
 
 Returns: 0 on success, -negative on error
 
+13. KVM_SEV_RECEIVE_START
+-------------------------
+
+The KVM_SEV_RECEIVE_START command is used for creating the memory encryption
+context for an incoming SEV guest. To create the encryption context, the user must
+provide a guest policy, the platform public Diffie-Hellman (PDH) key and session
+information.
+
+Parameters (in/out): struct kvm_sev_receive_start
+
+Returns: 0 on success, -negative on error
+
+::
+
+        struct kvm_sev_receive_start {
+                __u32 handle;           /* if zero then firmware creates a new handle */
+                __u32 policy;           /* guest's policy */
+
+                __u64 pdh_uaddr;        /* userspace address pointing to the PDH key */
+                __u32 pdh_len;
+
+                __u64 session_uaddr;    /* userspace address which points to the guest session information */
+                __u32 session_len;
+        };
+
+On success, the 'handle' field contains a new handle and on error, a negative value.
+
+For more details, see SEV spec Section 6.12.
+
 References
 ==========
 
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 0d117b1e6491..ec0d573cb09a 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -1299,6 +1299,84 @@ static int sev_send_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
 	return ret;
 }
 
+static int sev_receive_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+	struct sev_data_receive_start *start;
+	struct kvm_sev_receive_start params;
+	int *error = &argp->error;
+	void *session_data;
+	void *pdh_data;
+	int ret;
+
+	if (!sev_guest(kvm))
+		return -ENOTTY;
+
+	/* Get parameter from the userspace */
+	if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data,
+			sizeof(struct kvm_sev_receive_start)))
+		return -EFAULT;
+
+	/* some sanity checks */
+	if (!params.pdh_uaddr || !params.pdh_len ||
+	    !params.session_uaddr || !params.session_len)
+		return -EINVAL;
+
+	pdh_data = psp_copy_user_blob(params.pdh_uaddr, params.pdh_len);
+	if (IS_ERR(pdh_data))
+		return PTR_ERR(pdh_data);
+
+	session_data = psp_copy_user_blob(params.session_uaddr,
+			params.session_len);
+	if (IS_ERR(session_data)) {
+		ret = PTR_ERR(session_data);
+		goto e_free_pdh;
+	}
+
+	ret = -ENOMEM;
+	start = kzalloc(sizeof(*start), GFP_KERNEL);
+	if (!start)
+		goto e_free_session;
+
+	start->handle = params.handle;
+	start->policy = params.policy;
+	start->pdh_cert_address = __psp_pa(pdh_data);
+	start->pdh_cert_len = params.pdh_len;
+	start->session_address = __psp_pa(session_data);
+	start->session_len = params.session_len;
+
+	/* create memory encryption context */
+	ret = __sev_issue_cmd(argp->sev_fd, SEV_CMD_RECEIVE_START, start,
+				error);
+	if (ret)
+		goto e_free;
+
+	/* Bind ASID to this guest */
+	ret = sev_bind_asid(kvm, start->handle, error);
+	if (ret)
+		goto e_free;
+
+	params.handle = start->handle;
+	if (copy_to_user((void __user *)(uintptr_t)argp->data,
+			 &params, sizeof(struct kvm_sev_receive_start))) {
+		ret = -EFAULT;
+		sev_unbind_asid(kvm, start->handle);
+		goto e_free;
+	}
+
+	sev->handle = start->handle;
+	sev->fd = argp->sev_fd;
+
+e_free:
+	kfree(start);
+e_free_session:
+	kfree(session_data);
+e_free_pdh:
+	kfree(pdh_data);
+
+	return ret;
+}
+
 int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
 {
 	struct kvm_sev_cmd sev_cmd;
@@ -1358,6 +1436,9 @@ int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
 	case KVM_SEV_SEND_FINISH:
 		r = sev_send_finish(kvm, &sev_cmd);
 		break;
+	case KVM_SEV_RECEIVE_START:
+		r = sev_receive_start(kvm, &sev_cmd);
+		break;
 	default:
 		r = -EINVAL;
 		goto out;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 0ff7bed508fc..d2eea75de8b3 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1666,6 +1666,15 @@ struct kvm_sev_send_update_data {
 	__u32 trans_len;
 };
 
+struct kvm_sev_receive_start {
+	__u32 handle;
+	__u32 policy;
+	__u64 pdh_uaddr;
+	__u32 pdh_len;
+	__u64 session_uaddr;
+	__u32 session_len;
+};
+
 #define KVM_DEV_ASSIGN_ENABLE_IOMMU	(1 << 0)
 #define KVM_DEV_ASSIGN_PCI_2_3		(1 << 1)
 #define KVM_DEV_ASSIGN_MASK_INTX	(1 << 2)
-- 
2.17.1



* [PATCH v10 05/16] KVM: SVM: Add KVM_SEV_RECEIVE_UPDATE_DATA command
From: Ashish Kalra @ 2021-02-04  0:37 UTC (permalink / raw)
  To: pbonzini
  Cc: tglx, mingo, hpa, rkrcmar, joro, bp, thomas.lendacky, x86, kvm,
	linux-kernel, srutherford, seanjc, venu.busireddy, brijesh.singh

From: Brijesh Singh <brijesh.singh@amd.com>

The command is used to copy the incoming buffer into the
SEV guest memory space.
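
A hedged userspace sketch of importing one transported page (not part of this
patch; hdr and trans are the buffers produced by the sender's
KVM_SEV_SEND_UPDATE_DATA):

  static void sev_receive_one_page(int vm_fd, int sev_fd, __u64 guest_hva,
                                   void *hdr, __u32 hdr_len,
                                   void *trans, __u32 trans_len)
  {
          struct kvm_sev_receive_update_data rud = {
                  .hdr_uaddr   = (__u64)(uintptr_t)hdr,
                  .hdr_len     = hdr_len,
                  .trans_uaddr = (__u64)(uintptr_t)trans, /* encrypted payload */
                  .trans_len   = trans_len,
                  .guest_uaddr = guest_hva,       /* destination guest page */
                  .guest_len   = 4096,
          };
          struct kvm_sev_cmd cmd = {
                  .id = KVM_SEV_RECEIVE_UPDATE_DATA,
                  .data = (__u64)(uintptr_t)&rud,
                  .sev_fd = sev_fd,
          };
          ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);
  }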

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
 .../virt/kvm/amd-memory-encryption.rst        | 24 ++++++
 arch/x86/kvm/svm/sev.c                        | 79 +++++++++++++++++++
 include/uapi/linux/kvm.h                      |  9 +++
 3 files changed, 112 insertions(+)

diff --git a/Documentation/virt/kvm/amd-memory-encryption.rst b/Documentation/virt/kvm/amd-memory-encryption.rst
index 079ac5ac2459..da40be3d8bc2 100644
--- a/Documentation/virt/kvm/amd-memory-encryption.rst
+++ b/Documentation/virt/kvm/amd-memory-encryption.rst
@@ -351,6 +351,30 @@ On success, the 'handle' field contains a new handle and on error, a negative va
 
 For more details, see SEV spec Section 6.12.
 
+14. KVM_SEV_RECEIVE_UPDATE_DATA
+-------------------------------
+
+The KVM_SEV_RECEIVE_UPDATE_DATA command can be used by the hypervisor to copy
+the incoming buffers into the guest memory region with the encryption context
+created during KVM_SEV_RECEIVE_START.
+
+Parameters (in): struct kvm_sev_receive_update_data
+
+Returns: 0 on success, -negative on error
+
+::
+
+        struct kvm_sev_receive_update_data {
+                __u64 hdr_uaddr;        /* userspace address containing the packet header */
+                __u32 hdr_len;
+
+                __u64 guest_uaddr;      /* the destination guest memory region */
+                __u32 guest_len;
+
+                __u64 trans_uaddr;      /* the incoming buffer memory region */
+                __u32 trans_len;
+        };
+
 References
 ==========
 
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index ec0d573cb09a..73d5dbb72a65 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -1377,6 +1377,82 @@ static int sev_receive_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
 	return ret;
 }
 
+static int sev_receive_update_data(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+	struct kvm_sev_receive_update_data params;
+	struct sev_data_receive_update_data *data;
+	void *hdr = NULL, *trans = NULL;
+	struct page **guest_page;
+	unsigned long n;
+	int ret, offset;
+
+	if (!sev_guest(kvm))
+		return -EINVAL;
+
+	if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data,
+			sizeof(struct kvm_sev_receive_update_data)))
+		return -EFAULT;
+
+	if (!params.hdr_uaddr || !params.hdr_len ||
+	    !params.guest_uaddr || !params.guest_len ||
+	    !params.trans_uaddr || !params.trans_len)
+		return -EINVAL;
+
+	/* Check if we are crossing the page boundary */
+	offset = params.guest_uaddr & (PAGE_SIZE - 1);
+	if ((params.guest_len + offset > PAGE_SIZE))
+		return -EINVAL;
+
+	hdr = psp_copy_user_blob(params.hdr_uaddr, params.hdr_len);
+	if (IS_ERR(hdr))
+		return PTR_ERR(hdr);
+
+	trans = psp_copy_user_blob(params.trans_uaddr, params.trans_len);
+	if (IS_ERR(trans)) {
+		ret = PTR_ERR(trans);
+		goto e_free_hdr;
+	}
+
+	ret = -ENOMEM;
+	data = kzalloc(sizeof(*data), GFP_KERNEL);
+	if (!data)
+		goto e_free_trans;
+
+	data->hdr_address = __psp_pa(hdr);
+	data->hdr_len = params.hdr_len;
+	data->trans_address = __psp_pa(trans);
+	data->trans_len = params.trans_len;
+
+	/* Pin guest memory */
+	ret = -EFAULT;
+	guest_page = sev_pin_memory(kvm, params.guest_uaddr & PAGE_MASK,
+				    PAGE_SIZE, &n, 0);
+	if (!guest_page)
+		goto e_free;
+
+	/* The RECEIVE_UPDATE_DATA command requires C-bit to be always set. */
+	data->guest_address = (page_to_pfn(guest_page[0]) << PAGE_SHIFT) +
+				offset;
+	data->guest_address |= sev_me_mask;
+	data->guest_len = params.guest_len;
+	data->handle = sev->handle;
+
+	ret = sev_issue_cmd(kvm, SEV_CMD_RECEIVE_UPDATE_DATA, data,
+				&argp->error);
+
+	sev_unpin_memory(kvm, guest_page, n);
+
+e_free:
+	kfree(data);
+e_free_trans:
+	kfree(trans);
+e_free_hdr:
+	kfree(hdr);
+
+	return ret;
+}
+
 int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
 {
 	struct kvm_sev_cmd sev_cmd;
@@ -1439,6 +1515,9 @@ int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
 	case KVM_SEV_RECEIVE_START:
 		r = sev_receive_start(kvm, &sev_cmd);
 		break;
+	case KVM_SEV_RECEIVE_UPDATE_DATA:
+		r = sev_receive_update_data(kvm, &sev_cmd);
+		break;
 	default:
 		r = -EINVAL;
 		goto out;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index d2eea75de8b3..c4e195a4220f 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1675,6 +1675,15 @@ struct kvm_sev_receive_start {
 	__u32 session_len;
 };
 
+struct kvm_sev_receive_update_data {
+	__u64 hdr_uaddr;
+	__u32 hdr_len;
+	__u64 guest_uaddr;
+	__u32 guest_len;
+	__u64 trans_uaddr;
+	__u32 trans_len;
+};
+
 #define KVM_DEV_ASSIGN_ENABLE_IOMMU	(1 << 0)
 #define KVM_DEV_ASSIGN_PCI_2_3		(1 << 1)
 #define KVM_DEV_ASSIGN_MASK_INTX	(1 << 2)
-- 
2.17.1



* [PATCH v10 06/16] KVM: SVM: Add KVM_SEV_RECEIVE_FINISH command
From: Ashish Kalra @ 2021-02-04  0:37 UTC (permalink / raw)
  To: pbonzini
  Cc: tglx, mingo, hpa, rkrcmar, joro, bp, thomas.lendacky, x86, kvm,
	linux-kernel, srutherford, seanjc, venu.busireddy, brijesh.singh

From: Brijesh Singh <brijesh.singh@amd.com>

The command finalizes the guest receiving process and makes the SEV guest
ready for execution.
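
The assumed receive-side ordering, sketched below (illustrative, not part of
this patch):

  /*
   * KVM_SEV_RECEIVE_START        - create the incoming context
   * KVM_SEV_RECEIVE_UPDATE_DATA  - import every transported page
   * KVM_SEV_RECEIVE_FINISH       - the guest becomes runnable
   */
  static void sev_receive_finish_example(int vm_fd, int sev_fd)
  {
          struct kvm_sev_cmd cmd = {
                  .id = KVM_SEV_RECEIVE_FINISH,
                  .sev_fd = sev_fd,
          };
          ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);
  }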

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Steve Rutherford <srutherford@google.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
 .../virt/kvm/amd-memory-encryption.rst        |  8 +++++++
 arch/x86/kvm/svm/sev.c                        | 23 +++++++++++++++++++
 2 files changed, 31 insertions(+)

diff --git a/Documentation/virt/kvm/amd-memory-encryption.rst b/Documentation/virt/kvm/amd-memory-encryption.rst
index da40be3d8bc2..1f7bbda1f971 100644
--- a/Documentation/virt/kvm/amd-memory-encryption.rst
+++ b/Documentation/virt/kvm/amd-memory-encryption.rst
@@ -375,6 +375,14 @@ Returns: 0 on success, -negative on error
                 __u32 trans_len;
         };
 
+15. KVM_SEV_RECEIVE_FINISH
+--------------------------
+
+After completion of the migration flow, the KVM_SEV_RECEIVE_FINISH command can be
+issued by the hypervisor to make the guest ready for execution.
+
+Returns: 0 on success, -negative on error
+
 References
 ==========
 
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 73d5dbb72a65..25eaf35ba51d 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -1453,6 +1453,26 @@ static int sev_receive_update_data(struct kvm *kvm, struct kvm_sev_cmd *argp)
 	return ret;
 }
 
+static int sev_receive_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+	struct sev_data_receive_finish *data;
+	int ret;
+
+	if (!sev_guest(kvm))
+		return -ENOTTY;
+
+	data = kzalloc(sizeof(*data), GFP_KERNEL);
+	if (!data)
+		return -ENOMEM;
+
+	data->handle = sev->handle;
+	ret = sev_issue_cmd(kvm, SEV_CMD_RECEIVE_FINISH, data, &argp->error);
+
+	kfree(data);
+	return ret;
+}
+
 int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
 {
 	struct kvm_sev_cmd sev_cmd;
@@ -1518,6 +1538,9 @@ int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
 	case KVM_SEV_RECEIVE_UPDATE_DATA:
 		r = sev_receive_update_data(kvm, &sev_cmd);
 		break;
+	case KVM_SEV_RECEIVE_FINISH:
+		r = sev_receive_finish(kvm, &sev_cmd);
+		break;
 	default:
 		r = -EINVAL;
 		goto out;
-- 
2.17.1



* [PATCH v10 07/16] KVM: x86: Add AMD SEV specific Hypercall3
From: Ashish Kalra @ 2021-02-04  0:38 UTC (permalink / raw)
  To: pbonzini
  Cc: tglx, mingo, hpa, rkrcmar, joro, bp, thomas.lendacky, x86, kvm,
	linux-kernel, srutherford, seanjc, venu.busireddy, brijesh.singh

From: Brijesh Singh <brijesh.singh@amd.com>

The KVM hypercall framework relies on the alternatives framework to patch
VMCALL -> VMMCALL on AMD platforms. If a hypercall is made before
apply_alternatives() is called, it defaults to VMCALL. That approach
works fine for non-SEV guests: the VMCALL causes a #UD, and the
hypervisor is able to decode the instruction and do the right thing. But
when SEV is active, guest memory is encrypted with the guest key and the
hypervisor is not able to decode the instruction bytes.

Add an SEV-specific hypercall3 that unconditionally uses VMMCALL. The
hypercall will be used by the SEV guest to notify the hypervisor about
encrypted pages.
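
For contrast, the generic helper's VMCALL is patched at run time; mainline
defines (at the time of this series):

  /* arch/x86/include/asm/kvm_para.h: the "vmcall" bytes are rewritten to
   * "vmmcall" by apply_alternatives(), which is exactly what an early SEV
   * guest cannot wait for */
  #define KVM_HYPERCALL \
          ALTERNATIVE("vmcall", "vmmcall", X86_FEATURE_VMMCALL)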

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Steve Rutherford <srutherford@google.com>
Reviewed-by: Venu Busireddy <venu.busireddy@oracle.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
 arch/x86/include/asm/kvm_para.h | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 338119852512..bc1b11d057fc 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -85,6 +85,18 @@ static inline long kvm_hypercall4(unsigned int nr, unsigned long p1,
 	return ret;
 }
 
+static inline long kvm_sev_hypercall3(unsigned int nr, unsigned long p1,
+				      unsigned long p2, unsigned long p3)
+{
+	long ret;
+
+	asm volatile("vmmcall"
+		     : "=a"(ret)
+		     : "a"(nr), "b"(p1), "c"(p2), "d"(p3)
+		     : "memory");
+	return ret;
+}
+
 #ifdef CONFIG_KVM_GUEST
 bool kvm_para_available(void);
 unsigned int kvm_arch_para_features(void);
-- 
2.17.1



* [PATCH v10 08/16] KVM: X86: Introduce KVM_HC_PAGE_ENC_STATUS hypercall
From: Ashish Kalra @ 2021-02-04  0:38 UTC (permalink / raw)
  To: pbonzini
  Cc: tglx, mingo, hpa, rkrcmar, joro, bp, thomas.lendacky, x86, kvm,
	linux-kernel, srutherford, seanjc, venu.busireddy, brijesh.singh

From: Brijesh Singh <brijesh.singh@amd.com>

This hypercall is used by the SEV guest to notify the hypervisor of a change
in the page encryption status. The hypercall should be invoked
only when the encryption attribute is changed from encrypted -> decrypted
and vice versa. By default, all guest pages are considered encrypted.

The patch introduces a new shared pages list, implemented as a sorted linked
list, to track the shared/unencrypted regions marked by the guest via this
hypercall; an illustrative sequence is sketched below.
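
An illustrative guest-side sequence showing the merge/split semantics of the
list (a sketch; gpa is an arbitrary page-aligned guest physical address):

  /* mark 16 pages shared: the list gains one node covering them */
  kvm_sev_hypercall3(KVM_HC_PAGE_ENC_STATUS, gpa, 16, 0);
  /* an adjacent shared range is merged into the existing node */
  kvm_sev_hypercall3(KVM_HC_PAGE_ENC_STATUS, gpa + 16 * PAGE_SIZE, 16, 0);
  /* re-encrypting an interior subrange splits the node in two */
  kvm_sev_hypercall3(KVM_HC_PAGE_ENC_STATUS, gpa + 4 * PAGE_SIZE, 4, 1);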

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Co-developed-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
 Documentation/virt/kvm/hypercalls.rst |  15 +++
 arch/x86/include/asm/kvm_host.h       |   2 +
 arch/x86/kvm/svm/sev.c                | 150 ++++++++++++++++++++++++++
 arch/x86/kvm/svm/svm.c                |   2 +
 arch/x86/kvm/svm/svm.h                |   5 +
 arch/x86/kvm/vmx/vmx.c                |   1 +
 arch/x86/kvm/x86.c                    |   6 ++
 include/uapi/linux/kvm_para.h         |   1 +
 8 files changed, 182 insertions(+)

diff --git a/Documentation/virt/kvm/hypercalls.rst b/Documentation/virt/kvm/hypercalls.rst
index ed4fddd364ea..7aff0cebab7c 100644
--- a/Documentation/virt/kvm/hypercalls.rst
+++ b/Documentation/virt/kvm/hypercalls.rst
@@ -169,3 +169,18 @@ a0: destination APIC ID
 
 :Usage example: When sending a call-function IPI-many to vCPUs, yield if
 	        any of the IPI target vCPUs was preempted.
+
+
+8. KVM_HC_PAGE_ENC_STATUS
+-------------------------
+:Architecture: x86
+:Status: active
+:Purpose: Notify the encryption status changes in guest page table (SEV guest)
+
+a0: the guest physical address of the start page
+a1: the number of pages
+a2: encryption attribute
+
+   Where:
+	* 1: Encryption attribute is set
+	* 0: Encryption attribute is cleared
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 3d6616f6f6ef..2da5f5e2a10e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1301,6 +1301,8 @@ struct kvm_x86_ops {
 	int (*complete_emulated_msr)(struct kvm_vcpu *vcpu, int err);
 
 	void (*vcpu_deliver_sipi_vector)(struct kvm_vcpu *vcpu, u8 vector);
+	int (*page_enc_status_hc)(struct kvm *kvm, unsigned long gpa,
+				  unsigned long sz, unsigned long mode);
 };
 
 struct kvm_x86_nested_ops {
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 25eaf35ba51d..55c628df5155 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -45,6 +45,11 @@ struct enc_region {
 	unsigned long size;
 };
 
+struct shared_region {
+	struct list_head list;
+	unsigned long gfn_start, gfn_end;
+};
+
 static int sev_flush_asids(void)
 {
 	int ret, error = 0;
@@ -196,6 +201,8 @@ static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
 	sev->active = true;
 	sev->asid = asid;
 	INIT_LIST_HEAD(&sev->regions_list);
+	INIT_LIST_HEAD(&sev->shared_pages_list);
+	sev->shared_pages_list_count = 0;
 
 	return 0;
 
@@ -1473,6 +1480,148 @@ static int sev_receive_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
 	return ret;
 }
 
+static int remove_shared_region(unsigned long start, unsigned long end,
+				struct list_head *head)
+{
+	struct shared_region *pos;
+
+	list_for_each_entry(pos, head, list) {
+		if (pos->gfn_start == start &&
+		    pos->gfn_end == end) {
+			list_del(&pos->list);
+			kfree(pos);
+			return -1;
+		} else if (start >= pos->gfn_start && end <= pos->gfn_end) {
+			if (start == pos->gfn_start)
+				pos->gfn_start = end + 1;
+			else if (end == pos->gfn_end)
+				pos->gfn_end = start - 1;
+			else {
+				/* Do a de-merge -- split linked list nodes */
+				unsigned long tmp;
+				struct shared_region *shrd_region;
+
+				tmp = pos->gfn_end;
+				pos->gfn_end = start-1;
+				shrd_region = kzalloc(sizeof(*shrd_region), GFP_KERNEL_ACCOUNT);
+				if (!shrd_region)
+					return -ENOMEM;
+				shrd_region->gfn_start = end + 1;
+				shrd_region->gfn_end = tmp;
+				list_add(&shrd_region->list, &pos->list);
+				return 1;
+			}
+			return 0;
+		}
+	}
+	return 0;
+}
+
+static int add_shared_region(unsigned long start, unsigned long end,
+			     struct list_head *shared_pages_list)
+{
+	struct list_head *head = shared_pages_list;
+	struct shared_region *shrd_region;
+	struct shared_region *pos;
+
+	if (list_empty(head)) {
+		shrd_region = kzalloc(sizeof(*shrd_region), GFP_KERNEL_ACCOUNT);
+		if (!shrd_region)
+			return -ENOMEM;
+		shrd_region->gfn_start = start;
+		shrd_region->gfn_end = end;
+		list_add_tail(&shrd_region->list, head);
+		return 1;
+	}
+
+	/*
+	 * The shared pages list is sorted in ascending order of guest
+	 * PAs; consecutive guest PA ranges are merged.
+	 */
+	list_for_each_entry(pos, head, list) {
+		if (pos->gfn_end < start)
+			continue;
+		/* merge consecutive guest PA(s) */
+		if (pos->gfn_start <= start && pos->gfn_end >= start) {
+			pos->gfn_end = end;
+			return 0;
+		}
+		break;
+	}
+	/*
+	 * Add a new node. Allocate nodes using GFP_KERNEL_ACCOUNT so that
+	 * kernel memory can be tracked/throttled in case a
+	 * malicious guest makes an infinite number of hypercalls to
+	 * exhaust host kernel memory and mount a DoS attack.
+	 */
+	shrd_region = kzalloc(sizeof(*shrd_region), GFP_KERNEL_ACCOUNT);
+	if (!shrd_region)
+		return -ENOMEM;
+	shrd_region->gfn_start = start;
+	shrd_region->gfn_end = end;
+	list_add_tail(&shrd_region->list, &pos->list);
+	return 1;
+}
+
+int svm_page_enc_status_hc(struct kvm *kvm, unsigned long gpa,
+			   unsigned long npages, unsigned long enc)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+	kvm_pfn_t pfn_start, pfn_end;
+	gfn_t gfn_start, gfn_end;
+	int ret = 0;
+
+	if (!sev_guest(kvm))
+		return -EINVAL;
+
+	if (!npages)
+		return 0;
+
+	gfn_start = gpa_to_gfn(gpa);
+	gfn_end = gfn_start + npages;
+
+	/* out-of-bounds access error check */
+	if (gfn_end <= gfn_start)
+		return -EINVAL;
+
+	/* let's make sure that the gpa exists in our memslots */
+	pfn_start = gfn_to_pfn(kvm, gfn_start);
+	pfn_end = gfn_to_pfn(kvm, gfn_end);
+
+	if (is_error_noslot_pfn(pfn_start) && !is_noslot_pfn(pfn_start)) {
+		/*
+		 * Fail only for a real error pfn; noslot (MMIO) pfns are
+		 * still allowed to be added to the shared pages list.
+		 */
+		return -EINVAL;
+	}
+
+	if (is_error_noslot_pfn(pfn_end) && !is_noslot_pfn(pfn_end)) {
+		/*
+		 * As above: fail only for a real error pfn, noslot
+		 * (MMIO) pfns may still be added to the shared list.
+		 */
+		return -EINVAL;
+	}
+
+	mutex_lock(&kvm->lock);
+
+	if (enc) {
+		ret = remove_shared_region(gfn_start, gfn_end,
+					   &sev->shared_pages_list);
+		if (ret != -ENOMEM)
+			sev->shared_pages_list_count += ret;
+	} else {
+		ret = add_shared_region(gfn_start, gfn_end,
+					&sev->shared_pages_list);
+		if (ret > 0)
+			sev->shared_pages_list_count++;
+	}
+
+	mutex_unlock(&kvm->lock);
+	return ret;
+}
+
 int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
 {
 	struct kvm_sev_cmd sev_cmd;
@@ -1693,6 +1842,7 @@ void sev_vm_destroy(struct kvm *kvm)
 
 	sev_unbind_asid(kvm, sev->handle);
 	sev_asid_free(sev->asid);
+	sev->shared_pages_list_count = 0;
 }
 
 void __init sev_hardware_setup(void)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index f923e14e87df..bb249ec625fc 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4536,6 +4536,8 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
 	.complete_emulated_msr = svm_complete_emulated_msr,
 
 	.vcpu_deliver_sipi_vector = svm_vcpu_deliver_sipi_vector,
+
+	.page_enc_status_hc = svm_page_enc_status_hc,
 };
 
 static struct kvm_x86_init_ops svm_init_ops __initdata = {
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 0fe874ae5498..6437c1fa1f24 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -79,6 +79,9 @@ struct kvm_sev_info {
 	unsigned long pages_locked; /* Number of pages locked */
 	struct list_head regions_list;  /* List of registered regions */
 	u64 ap_jump_table;	/* SEV-ES AP Jump Table address */
+	/* List and count of shared pages */
+	int shared_pages_list_count;
+	struct list_head shared_pages_list;
 };
 
 struct kvm_svm {
@@ -472,6 +475,8 @@ int nested_svm_check_exception(struct vcpu_svm *svm, unsigned nr,
 			       bool has_error_code, u32 error_code);
 int nested_svm_exit_special(struct vcpu_svm *svm);
 void sync_nested_vmcb_control(struct vcpu_svm *svm);
+int svm_page_enc_status_hc(struct kvm *kvm, unsigned long gpa,
+			   unsigned long npages, unsigned long enc);
 
 extern struct kvm_x86_nested_ops svm_nested_ops;
 
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index cc60b1fc3ee7..bcbf53851612 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7705,6 +7705,7 @@ static struct kvm_x86_ops vmx_x86_ops __initdata = {
 	.can_emulate_instruction = vmx_can_emulate_instruction,
 	.apic_init_signal_blocked = vmx_apic_init_signal_blocked,
 	.migrate_timers = vmx_migrate_timers,
+	.page_enc_status_hc = NULL,
 
 	.msr_filter_changed = vmx_msr_filter_changed,
 	.complete_emulated_msr = kvm_complete_insn_gp,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 76bce832cade..2f17f0f9ace7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8162,6 +8162,12 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
 		kvm_sched_yield(vcpu->kvm, a0);
 		ret = 0;
 		break;
+	case KVM_HC_PAGE_ENC_STATUS:
+		ret = -KVM_ENOSYS;
+		if (kvm_x86_ops.page_enc_status_hc)
+			ret = kvm_x86_ops.page_enc_status_hc(vcpu->kvm,
+					a0, a1, a2);
+		break;
 	default:
 		ret = -KVM_ENOSYS;
 		break;
diff --git a/include/uapi/linux/kvm_para.h b/include/uapi/linux/kvm_para.h
index 8b86609849b9..847b83b75dc8 100644
--- a/include/uapi/linux/kvm_para.h
+++ b/include/uapi/linux/kvm_para.h
@@ -29,6 +29,7 @@
 #define KVM_HC_CLOCK_PAIRING		9
 #define KVM_HC_SEND_IPI		10
 #define KVM_HC_SCHED_YIELD		11
+#define KVM_HC_PAGE_ENC_STATUS		12
 
 /*
  * hypercalls use architecture specific
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 75+ messages in thread

* [PATCH v10 09/16] mm: x86: Invoke hypercall when page encryption status is changed
  2021-02-04  0:35 [PATCH v10 00/17] Add AMD SEV guest live migration support Ashish Kalra
                   ` (7 preceding siblings ...)
  2021-02-04  0:38 ` [PATCH v10 08/16] KVM: X86: Introduce KVM_HC_PAGE_ENC_STATUS hypercall Ashish Kalra
@ 2021-02-04  0:39 ` Ashish Kalra
  2021-02-04  0:39 ` [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl Ashish Kalra
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 75+ messages in thread
From: Ashish Kalra @ 2021-02-04  0:39 UTC (permalink / raw)
  To: pbonzini
  Cc: tglx, mingo, hpa, rkrcmar, joro, bp, thomas.lendacky, x86, kvm,
	linux-kernel, srutherford, seanjc, venu.busireddy, brijesh.singh

From: Brijesh Singh <brijesh.singh@amd.com>

Invoke a hypercall when a memory region is changed from encrypted ->
decrypted and vice versa. The hypervisor needs to know the page
encryption status during guest migration.
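
For illustration, a guest driver sharing a buffer with the host would go
through the existing set_memory_decrypted()/set_memory_encrypted() helpers,
and with this patch those paths additionally notify the hypervisor. A
hypothetical driver sketch (not part of this patch):

  static int example_share_buffer(struct page *page, int npages)
  {
  	unsigned long vaddr = (unsigned long)page_address(page);
  	int ret;

  	/*
  	 * Clears the C-bit in the guest page tables and, with this
  	 * patch, also reaches page_encryption_changed() and thus the
  	 * KVM_HC_PAGE_ENC_STATUS hypercall.
  	 */
  	ret = set_memory_decrypted(vaddr, npages);
  	if (ret)
  		return ret;

  	/* ... hand the buffer to the hypervisor ... */

  	/* Flipping the range back issues the hypercall with enc == 1. */
  	return set_memory_encrypted(vaddr, npages);
  }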

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Venu Busireddy <venu.busireddy@oracle.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
 arch/x86/include/asm/paravirt.h       | 10 +++++
 arch/x86/include/asm/paravirt_types.h |  2 +
 arch/x86/kernel/paravirt.c            |  1 +
 arch/x86/mm/mem_encrypt.c             | 57 ++++++++++++++++++++++++++-
 arch/x86/mm/pat/set_memory.c          |  7 ++++
 5 files changed, 76 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index f8dce11d2bc1..1265e1f5db5f 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -84,6 +84,12 @@ static inline void paravirt_arch_exit_mmap(struct mm_struct *mm)
 	PVOP_VCALL1(mmu.exit_mmap, mm);
 }
 
+static inline void page_encryption_changed(unsigned long vaddr, int npages,
+						bool enc)
+{
+	PVOP_VCALL3(mmu.page_encryption_changed, vaddr, npages, enc);
+}
+
 #ifdef CONFIG_PARAVIRT_XXL
 static inline void load_sp0(unsigned long sp0)
 {
@@ -829,6 +835,10 @@ static inline void paravirt_arch_dup_mmap(struct mm_struct *oldmm,
 static inline void paravirt_arch_exit_mmap(struct mm_struct *mm)
 {
 }
+
+static inline void page_encryption_changed(unsigned long vaddr, int npages, bool enc)
+{
+}
 #endif
 #endif /* __ASSEMBLY__ */
 #endif /* _ASM_X86_PARAVIRT_H */
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index b6b02b7c19cc..6a83821cf758 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -208,6 +208,8 @@ struct pv_mmu_ops {
 
 	/* Hook for intercepting the destruction of an mm_struct. */
 	void (*exit_mmap)(struct mm_struct *mm);
+	void (*page_encryption_changed)(unsigned long vaddr, int npages,
+					bool enc);
 
 #ifdef CONFIG_PARAVIRT_XXL
 	struct paravirt_callee_save read_cr2;
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index 6c3407ba6ee9..52913356b6fa 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -340,6 +340,7 @@ struct paravirt_patch_template pv_ops = {
 			(void (*)(struct mmu_gather *, void *))tlb_remove_page,
 
 	.mmu.exit_mmap		= paravirt_nop,
+	.mmu.page_encryption_changed	= paravirt_nop,
 
 #ifdef CONFIG_PARAVIRT_XXL
 	.mmu.read_cr2		= __PV_IS_CALLEE_SAVE(native_read_cr2),
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index c79e5736ab2b..dc17d14f9bcd 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -19,6 +19,7 @@
 #include <linux/kernel.h>
 #include <linux/bitops.h>
 #include <linux/dma-mapping.h>
+#include <linux/kvm_para.h>
 
 #include <asm/tlbflush.h>
 #include <asm/fixmap.h>
@@ -29,6 +30,7 @@
 #include <asm/processor-flags.h>
 #include <asm/msr.h>
 #include <asm/cmdline.h>
+#include <asm/kvm_para.h>
 
 #include "mm_internal.h"
 
@@ -229,6 +231,47 @@ void __init sev_setup_arch(void)
 	swiotlb_adjust_size(size);
 }
 
+static void set_memory_enc_dec_hypercall(unsigned long vaddr, int npages,
+					bool enc)
+{
+	unsigned long sz = (unsigned long)npages << PAGE_SHIFT;
+	unsigned long vaddr_end, vaddr_next;
+
+	vaddr_end = vaddr + sz;
+
+	for (; vaddr < vaddr_end; vaddr = vaddr_next) {
+		int psize, pmask, level;
+		unsigned long pfn;
+		pte_t *kpte;
+
+		kpte = lookup_address(vaddr, &level);
+		if (!kpte || pte_none(*kpte))
+			return;
+
+		switch (level) {
+		case PG_LEVEL_4K:
+			pfn = pte_pfn(*kpte);
+			break;
+		case PG_LEVEL_2M:
+			pfn = pmd_pfn(*(pmd_t *)kpte);
+			break;
+		case PG_LEVEL_1G:
+			pfn = pud_pfn(*(pud_t *)kpte);
+			break;
+		default:
+			return;
+		}
+
+		psize = page_level_size(level);
+		pmask = page_level_mask(level);
+
+		kvm_sev_hypercall3(KVM_HC_PAGE_ENC_STATUS,
+				   pfn << PAGE_SHIFT, psize >> PAGE_SHIFT, enc);
+
+		vaddr_next = (vaddr & pmask) + psize;
+	}
+}
+
 static void __init __set_clr_pte_enc(pte_t *kpte, int level, bool enc)
 {
 	pgprot_t old_prot, new_prot;
@@ -286,12 +329,13 @@ static void __init __set_clr_pte_enc(pte_t *kpte, int level, bool enc)
 static int __init early_set_memory_enc_dec(unsigned long vaddr,
 					   unsigned long size, bool enc)
 {
-	unsigned long vaddr_end, vaddr_next;
+	unsigned long vaddr_end, vaddr_next, start;
 	unsigned long psize, pmask;
 	int split_page_size_mask;
 	int level, ret;
 	pte_t *kpte;
 
+	start = vaddr;
 	vaddr_next = vaddr;
 	vaddr_end = vaddr + size;
 
@@ -346,6 +390,8 @@ static int __init early_set_memory_enc_dec(unsigned long vaddr,
 
 	ret = 0;
 
+	set_memory_enc_dec_hypercall(start, PAGE_ALIGN(size) >> PAGE_SHIFT,
+					enc);
 out:
 	__flush_tlb_all();
 	return ret;
@@ -479,6 +525,15 @@ void __init mem_encrypt_init(void)
 	if (sev_active())
 		static_branch_enable(&sev_enable_key);
 
+#ifdef CONFIG_PARAVIRT
+	/*
+	 * With SEV, we need to make a hypercall when page encryption state is
+	 * changed.
+	 */
+	if (sev_active())
+		pv_ops.mmu.page_encryption_changed = set_memory_enc_dec_hypercall;
+#endif
+
 	print_mem_encrypt_feature_info();
 }
 
diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index 16f878c26667..3576b583ac65 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -27,6 +27,7 @@
 #include <asm/proto.h>
 #include <asm/memtype.h>
 #include <asm/set_memory.h>
+#include <asm/paravirt.h>
 
 #include "../mm_internal.h"
 
@@ -2012,6 +2013,12 @@ static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
 	 */
 	cpa_flush(&cpa, 0);
 
+	/*
+	 * Notify hypervisor that a given memory range is mapped encrypted
+	 * or decrypted. The hypervisor will use this information during the
+	 * VM migration.
+	 */
+	page_encryption_changed(addr, numpages, enc);
+
 	return ret;
 }
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 75+ messages in thread

* [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl
  2021-02-04  0:35 [PATCH v10 00/17] Add AMD SEV guest live migration support Ashish Kalra
                   ` (8 preceding siblings ...)
  2021-02-04  0:39 ` [PATCH v10 09/16] mm: x86: Invoke hypercall when page encryption status is changed Ashish Kalra
@ 2021-02-04  0:39 ` Ashish Kalra
  2021-02-04 16:14   ` Tom Lendacky
                     ` (2 more replies)
  2021-02-04  0:39 ` [PATCH v10 11/16] KVM: x86: Introduce KVM_SET_SHARED_PAGES_LIST ioctl Ashish Kalra
                   ` (5 subsequent siblings)
  15 siblings, 3 replies; 75+ messages in thread
From: Ashish Kalra @ 2021-02-04  0:39 UTC (permalink / raw)
  To: pbonzini
  Cc: tglx, mingo, hpa, rkrcmar, joro, bp, thomas.lendacky, x86, kvm,
	linux-kernel, srutherford, seanjc, venu.busireddy, brijesh.singh

From: Brijesh Singh <brijesh.singh@amd.com>

The ioctl is used to retrieve a guest's shared pages list.
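
A rough sketch of the intended userspace usage (the helper name is
illustrative, not part of this patch); the buffer is filled with
{gfn_start, gfn_end} pairs matching the kernel's internal
struct shared_region_array_entry layout:

  static int get_shared_list(int vm_fd, void *buf, __u32 bufsize, int *nents)
  {
  	struct kvm_shared_pages_list list = {
  		.pnents = nents,
  		.buffer = buf,
  		.size   = bufsize,
  	};

  	/* fails with E2BIG if bufsize cannot hold the whole list */
  	return ioctl(vm_fd, KVM_GET_SHARED_PAGES_LIST, &list);
  }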

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Co-developed-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
 Documentation/virt/kvm/api.rst  | 24 ++++++++++++++++
 arch/x86/include/asm/kvm_host.h |  2 ++
 arch/x86/kvm/svm/sev.c          | 49 +++++++++++++++++++++++++++++++++
 arch/x86/kvm/svm/svm.c          |  1 +
 arch/x86/kvm/svm/svm.h          |  1 +
 arch/x86/kvm/x86.c              | 12 ++++++++
 include/uapi/linux/kvm.h        |  9 ++++++
 7 files changed, 98 insertions(+)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 99ceb978c8b0..59ef537c0cdd 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -4677,6 +4677,30 @@ This ioctl resets VCPU registers and control structures according to
 the clear cpu reset definition in the POP. However, the cpu is not put
 into ESA mode. This reset is a superset of the initial reset.
 
+4.125 KVM_GET_SHARED_PAGES_LIST (vm ioctl)
+------------------------------------------
+
+:Capability: basic
+:Architectures: x86
+:Type: vm ioctl
+:Parameters: struct kvm_shared_pages_list (in/out)
+:Returns: 0 on success, -1 on error
+
+/* for KVM_GET_SHARED_PAGES_LIST */
+struct kvm_shared_pages_list {
+	int __user *pnents;
+	void __user *buffer;
+	__u32 size;
+};
+
+Encrypted VMs have the concept of private and shared pages. The private
+pages are encrypted with the guest-specific key, while the shared pages may
+be encrypted with the hypervisor key. KVM_GET_SHARED_PAGES_LIST can be used
+to get the guest's shared/unencrypted memory regions list. This list can be
+used during guest migration: if a page is private, userspace needs to use
+the SEV migration commands to transmit it.
+
 
 4.125 KVM_S390_PV_COMMAND
 -------------------------
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 2da5f5e2a10e..cd354d830e13 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1303,6 +1303,8 @@ struct kvm_x86_ops {
 	void (*vcpu_deliver_sipi_vector)(struct kvm_vcpu *vcpu, u8 vector);
 	int (*page_enc_status_hc)(struct kvm *kvm, unsigned long gpa,
 				  unsigned long sz, unsigned long mode);
+	int (*get_shared_pages_list)(struct kvm *kvm,
+				     struct kvm_shared_pages_list *list);
 };
 
 struct kvm_x86_nested_ops {
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 55c628df5155..701d74c8b15b 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -50,6 +50,11 @@ struct shared_region {
 	unsigned long gfn_start, gfn_end;
 };
 
+struct shared_region_array_entry {
+	unsigned long gfn_start;
+	unsigned long gfn_end;
+};
+
 static int sev_flush_asids(void)
 {
 	int ret, error = 0;
@@ -1622,6 +1627,50 @@ int svm_page_enc_status_hc(struct kvm *kvm, unsigned long gpa,
 	return ret;
 }
 
+int svm_get_shared_pages_list(struct kvm *kvm,
+			      struct kvm_shared_pages_list *list)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+	struct shared_region_array_entry *array;
+	struct shared_region *pos;
+	int ret, nents = 0;
+	unsigned long sz;
+
+	if (!sev_guest(kvm))
+		return -ENOTTY;
+
+	if (!list->size)
+		return -EINVAL;
+
+	if (!sev->shared_pages_list_count)
+		return put_user(0, list->pnents);
+
+	sz = sev->shared_pages_list_count * sizeof(struct shared_region_array_entry);
+	if (sz > list->size)
+		return -E2BIG;
+
+	array = kmalloc(sz, GFP_KERNEL);
+	if (!array)
+		return -ENOMEM;
+
+	mutex_lock(&kvm->lock);
+	list_for_each_entry(pos, &sev->shared_pages_list, list) {
+		array[nents].gfn_start = pos->gfn_start;
+		array[nents++].gfn_end = pos->gfn_end;
+	}
+	mutex_unlock(&kvm->lock);
+
+	ret = -EFAULT;
+	if (copy_to_user(list->buffer, array, sz))
+		goto out;
+	if (put_user(nents, list->pnents))
+		goto out;
+	ret = 0;
+out:
+	kfree(array);
+	return ret;
+}
+
 int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
 {
 	struct kvm_sev_cmd sev_cmd;
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index bb249ec625fc..533ce47ff158 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4538,6 +4538,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
 	.vcpu_deliver_sipi_vector = svm_vcpu_deliver_sipi_vector,
 
 	.page_enc_status_hc = svm_page_enc_status_hc,
+	.get_shared_pages_list = svm_get_shared_pages_list,
 };
 
 static struct kvm_x86_init_ops svm_init_ops __initdata = {
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 6437c1fa1f24..6a777c61373c 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -477,6 +477,7 @@ int nested_svm_exit_special(struct vcpu_svm *svm);
 void sync_nested_vmcb_control(struct vcpu_svm *svm);
 int svm_page_enc_status_hc(struct kvm *kvm, unsigned long gpa,
 			   unsigned long npages, unsigned long enc);
+int svm_get_shared_pages_list(struct kvm *kvm, struct kvm_shared_pages_list *list);
 
 extern struct kvm_x86_nested_ops svm_nested_ops;
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 2f17f0f9ace7..acfec2ae1402 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5719,6 +5719,18 @@ long kvm_arch_vm_ioctl(struct file *filp,
 	case KVM_X86_SET_MSR_FILTER:
 		r = kvm_vm_ioctl_set_msr_filter(kvm, argp);
 		break;
+	case KVM_GET_SHARED_PAGES_LIST: {
+		struct kvm_shared_pages_list list;
+
+		r = -EFAULT;
+		if (copy_from_user(&list, argp, sizeof(list)))
+			goto out;
+
+		r = -ENOTTY;
+		if (kvm_x86_ops.get_shared_pages_list)
+			r = kvm_x86_ops.get_shared_pages_list(kvm, &list);
+		break;
+	}
 	default:
 		r = -ENOTTY;
 	}
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index c4e195a4220f..0529ba80498a 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -544,6 +544,13 @@ struct kvm_clear_dirty_log {
 	};
 };
 
+/* for KVM_GET_SHARED_PAGES_LIST */
+struct kvm_shared_pages_list {
+	int __user *pnents;
+	void __user *buffer;
+	__u32 size;
+};
+
 /* for KVM_SET_SIGNAL_MASK */
 struct kvm_signal_mask {
 	__u32 len;
@@ -1565,6 +1572,8 @@ struct kvm_pv_cmd {
 /* Available with KVM_CAP_DIRTY_LOG_RING */
 #define KVM_RESET_DIRTY_RINGS		_IO(KVMIO, 0xc7)
 
+#define KVM_GET_SHARED_PAGES_LIST	_IOW(KVMIO, 0xc8, struct kvm_shared_pages_list)
+
 /* Secure Encrypted Virtualization command */
 enum sev_cmd_id {
 	/* Guest initialization commands */
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 75+ messages in thread

* [PATCH v10 11/16] KVM: x86: Introduce KVM_SET_SHARED_PAGES_LIST ioctl
  2021-02-04  0:35 [PATCH v10 00/17] Add AMD SEV guest live migration support Ashish Kalra
                   ` (9 preceding siblings ...)
  2021-02-04  0:39 ` [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl Ashish Kalra
@ 2021-02-04  0:39 ` Ashish Kalra
  2021-02-04  0:39 ` [PATCH v10 12/16] KVM: x86: Introduce new KVM_FEATURE_SEV_LIVE_MIGRATION feature & Custom MSR Ashish Kalra
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 75+ messages in thread
From: Ashish Kalra @ 2021-02-04  0:39 UTC (permalink / raw)
  To: pbonzini
  Cc: tglx, mingo, hpa, rkrcmar, joro, bp, thomas.lendacky, x86, kvm,
	linux-kernel, srutherford, seanjc, venu.busireddy, brijesh.singh

From: Brijesh Singh <brijesh.singh@amd.com>

The ioctl is used to set up the shared pages list for an
incoming guest.
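
A sketch of the intended usage on the destination side (illustrative only):
the entries received from the outgoing guest are handed back to KVM before
the incoming guest is resumed.

  static int set_shared_list(int vm_fd, void *buf, __u32 bufsize, int *nents)
  {
  	struct kvm_shared_pages_list list = {
  		.pnents = nents,
  		.buffer = buf,
  		.size   = bufsize,
  	};

  	/* passing a NULL buffer or *nents == 0 resets the list instead */
  	return ioctl(vm_fd, KVM_SET_SHARED_PAGES_LIST, &list);
  }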

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Co-developed-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
 Documentation/virt/kvm/api.rst  | 20 +++++++++-
 arch/x86/include/asm/kvm_host.h |  2 +
 arch/x86/kvm/svm/sev.c          | 70 +++++++++++++++++++++++++++++++++
 arch/x86/kvm/svm/svm.c          |  1 +
 arch/x86/kvm/svm/svm.h          |  1 +
 arch/x86/kvm/x86.c              | 12 ++++++
 include/uapi/linux/kvm.h        |  1 +
 7 files changed, 106 insertions(+), 1 deletion(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 59ef537c0cdd..efb4720733b4 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -4701,6 +4701,25 @@ This list can be used during the guest migration. If the page
 is private then the userspace need to use SEV migration commands to transmit
 the page.
 
+4.126 KVM_SET_SHARED_PAGES_LIST (vm ioctl)
+------------------------------------------
+
+:Capability: basic
+:Architectures: x86
+:Type: vm ioctl
+:Parameters: struct kvm_shared_pages_list (in/out)
+:Returns: 0 on success, -1 on error
+
+/* for KVM_SET_SHARED_PAGES_LIST */
+struct kvm_shared_pages_list {
+	int __user *pnents;
+	void __user *buffer;
+	__u32 size;
+};
+
+During guest live migration the outgoing guest exports its unencrypted
+memory regions list; KVM_SET_SHARED_PAGES_LIST can then be used to rebuild
+the shared/unencrypted regions list for the incoming guest.
 
 4.125 KVM_S390_PV_COMMAND
 -------------------------
@@ -4855,7 +4874,6 @@ into user space.
 If a vCPU is in running state while this ioctl is invoked, the vCPU may
 experience inconsistent filtering behavior on MSR accesses.
 
-
 5. The kvm_run structure
 ========================
 
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index cd354d830e13..f05b812b69bd 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1305,6 +1305,8 @@ struct kvm_x86_ops {
 				  unsigned long sz, unsigned long mode);
 	int (*get_shared_pages_list)(struct kvm *kvm,
 				     struct kvm_shared_pages_list *list);
+	int (*set_shared_pages_list)(struct kvm *kvm,
+				     struct kvm_shared_pages_list *list);
 };
 
 struct kvm_x86_nested_ops {
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 701d74c8b15b..b0d324aed515 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -1671,6 +1671,76 @@ int svm_get_shared_pages_list(struct kvm *kvm,
 	return ret;
 }
 
+int svm_set_shared_pages_list(struct kvm *kvm,
+			      struct kvm_shared_pages_list *list)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+	struct shared_region_array_entry *array;
+	struct shared_region *shrd_region;
+	int ret, nents, i;
+	unsigned long sz;
+
+	if (!sev_guest(kvm))
+		return -ENOTTY;
+
+	if (get_user(nents, list->pnents))
+		return -EFAULT;
+
+	/* special case of resetting the shared pages list */
+	if (!list->buffer || !nents) {
+		struct shared_region *pos, *q;
+
+		mutex_lock(&kvm->lock);
+		list_for_each_entry_safe(pos, q,
+					 &sev->shared_pages_list, list) {
+			list_del(&pos->list);
+			kfree(pos);
+		}
+		sev->shared_pages_list_count = 0;
+		mutex_unlock(&kvm->lock);
+
+		return 0;
+	}
+
+	sz = nents * sizeof(struct shared_region_array_entry);
+	array = kmalloc(sz, GFP_KERNEL);
+	if (!array)
+		return -ENOMEM;
+
+	ret = -EFAULT;
+	if (copy_from_user(array, list->buffer, sz))
+		goto out;
+
+	ret = 0;
+	mutex_lock(&kvm->lock);
+	for (i = 0; i < nents; i++) {
+		shrd_region = kzalloc(sizeof(*shrd_region), GFP_KERNEL_ACCOUNT);
+		if (!shrd_region) {
+			struct shared_region *pos, *q;
+
+			/* Free previously allocated entries */
+			list_for_each_entry_safe(pos, q,
+						 &sev->shared_pages_list,
+						 list) {
+				list_del(&pos->list);
+				kfree(pos);
+			}
+
+			mutex_unlock(&kvm->lock);
+			ret = -ENOMEM;
+			goto out;
+		}
+
+		shrd_region->gfn_start = array[i].gfn_start;
+		shrd_region->gfn_end = array[i].gfn_end;
+		list_add_tail(&shrd_region->list,
+			      &sev->shared_pages_list);
+	}
+	sev->shared_pages_list_count = nents;
+	mutex_unlock(&kvm->lock);
+
+out:
+	kfree(array);
+
+	return ret;
+}
+
 int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
 {
 	struct kvm_sev_cmd sev_cmd;
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 533ce47ff158..58f89f83caab 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4539,6 +4539,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
 
 	.page_enc_status_hc = svm_page_enc_status_hc,
 	.get_shared_pages_list = svm_get_shared_pages_list,
+	.set_shared_pages_list = svm_set_shared_pages_list,
 };
 
 static struct kvm_x86_init_ops svm_init_ops __initdata = {
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 6a777c61373c..066ca2a9f1e6 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -478,6 +478,7 @@ void sync_nested_vmcb_control(struct vcpu_svm *svm);
 int svm_page_enc_status_hc(struct kvm *kvm, unsigned long gpa,
 			   unsigned long npages, unsigned long enc);
 int svm_get_shared_pages_list(struct kvm *kvm, struct kvm_shared_pages_list *list);
+int svm_set_shared_pages_list(struct kvm *kvm, struct kvm_shared_pages_list *list);
 
 extern struct kvm_x86_nested_ops svm_nested_ops;
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index acfec2ae1402..c119715c1034 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5731,6 +5731,18 @@ long kvm_arch_vm_ioctl(struct file *filp,
 			r = kvm_x86_ops.get_shared_pages_list(kvm, &list);
 		break;
 	}
+	case KVM_SET_SHARED_PAGES_LIST: {
+		struct kvm_shared_pages_list list;
+
+		r = -EFAULT;
+		if (copy_from_user(&list, argp, sizeof(list)))
+			goto out;
+
+		r = -ENOTTY;
+		if (kvm_x86_ops.set_shared_pages_list)
+			r = kvm_x86_ops.set_shared_pages_list(kvm, &list);
+		break;
+	}
 	default:
 		r = -ENOTTY;
 	}
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 0529ba80498a..f704b08c97f2 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1573,6 +1573,7 @@ struct kvm_pv_cmd {
 #define KVM_RESET_DIRTY_RINGS		_IO(KVMIO, 0xc7)
 
 #define KVM_GET_SHARED_PAGES_LIST	_IOW(KVMIO, 0xc8, struct kvm_shared_pages_list)
+#define KVM_SET_SHARED_PAGES_LIST	_IOW(KVMIO, 0xc9, struct kvm_shared_pages_list)
 
 /* Secure Encrypted Virtualization command */
 enum sev_cmd_id {
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 75+ messages in thread

* [PATCH v10 12/16] KVM: x86: Introduce new KVM_FEATURE_SEV_LIVE_MIGRATION feature & Custom MSR.
  2021-02-04  0:35 [PATCH v10 00/17] Add AMD SEV guest live migration support Ashish Kalra
                   ` (10 preceding siblings ...)
  2021-02-04  0:39 ` [PATCH v10 11/16] KVM: x86: Introduce KVM_SET_SHARED_PAGES_LIST ioctl Ashish Kalra
@ 2021-02-04  0:39 ` Ashish Kalra
  2021-02-05  0:56   ` Steve Rutherford
  2021-02-16 23:20   ` Sean Christopherson
  2021-02-04  0:40 ` [PATCH v10 13/16] EFI: Introduce the new AMD Memory Encryption GUID Ashish Kalra
                   ` (3 subsequent siblings)
  15 siblings, 2 replies; 75+ messages in thread
From: Ashish Kalra @ 2021-02-04  0:39 UTC (permalink / raw)
  To: pbonzini
  Cc: tglx, mingo, hpa, rkrcmar, joro, bp, thomas.lendacky, x86, kvm,
	linux-kernel, srutherford, seanjc, venu.busireddy, brijesh.singh

From: Ashish Kalra <ashish.kalra@amd.com>

Add a new KVM_FEATURE_SEV_LIVE_MIGRATION feature bit for the guest to
check for host-side support for SEV live migration. Also add a new
custom MSR_KVM_SEV_LIVE_MIGRATION for the guest to enable the SEV live
migration feature.
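
The expected guest-side handshake then looks roughly as follows (the guest
code itself is added later in this series):

  if (kvm_para_has_feature(KVM_FEATURE_SEV_LIVE_MIGRATION))
  	wrmsrl(MSR_KVM_SEV_LIVE_MIGRATION,
  	       KVM_SEV_LIVE_MIGRATION_ENABLED);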

Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
 Documentation/virt/kvm/cpuid.rst     |  5 +++++
 Documentation/virt/kvm/msr.rst       | 12 ++++++++++++
 arch/x86/include/uapi/asm/kvm_para.h |  4 ++++
 arch/x86/kvm/svm/sev.c               | 13 +++++++++++++
 arch/x86/kvm/svm/svm.c               | 16 ++++++++++++++++
 arch/x86/kvm/svm/svm.h               |  2 ++
 6 files changed, 52 insertions(+)

diff --git a/Documentation/virt/kvm/cpuid.rst b/Documentation/virt/kvm/cpuid.rst
index cf62162d4be2..0bdb6cdb12d3 100644
--- a/Documentation/virt/kvm/cpuid.rst
+++ b/Documentation/virt/kvm/cpuid.rst
@@ -96,6 +96,11 @@ KVM_FEATURE_MSI_EXT_DEST_ID        15          guest checks this feature bit
                                                before using extended destination
                                                ID bits in MSI address bits 11-5.
 
+KVM_FEATURE_SEV_LIVE_MIGRATION     16          guest checks this feature bit before
+                                               using the page encryption state
+                                               hypercall to notify the page state
+                                               change
+
 KVM_FEATURE_CLOCKSOURCE_STABLE_BIT 24          host will warn if no guest-side
                                                per-cpu warps are expected in
                                                kvmclock
diff --git a/Documentation/virt/kvm/msr.rst b/Documentation/virt/kvm/msr.rst
index e37a14c323d2..020245d16087 100644
--- a/Documentation/virt/kvm/msr.rst
+++ b/Documentation/virt/kvm/msr.rst
@@ -376,3 +376,15 @@ data:
 	write '1' to bit 0 of the MSR, this causes the host to re-scan its queue
 	and check if there are more notifications pending. The MSR is available
 	if KVM_FEATURE_ASYNC_PF_INT is present in CPUID.
+
+MSR_KVM_SEV_LIVE_MIGRATION:
+        0x4b564d08
+
+        Control SEV live migration features.
+
+data:
+        Bit 0 enables (1) or disables (0) the host-side SEV live migration
+        feature. In other words, this is the guest telling the host that it
+        is properly handling the shared pages list.
+
+        All other bits are reserved.
diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index 950afebfba88..f6bfa138874f 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -33,6 +33,7 @@
 #define KVM_FEATURE_PV_SCHED_YIELD	13
 #define KVM_FEATURE_ASYNC_PF_INT	14
 #define KVM_FEATURE_MSI_EXT_DEST_ID	15
+#define KVM_FEATURE_SEV_LIVE_MIGRATION	16
 
 #define KVM_HINTS_REALTIME      0
 
@@ -54,6 +55,7 @@
 #define MSR_KVM_POLL_CONTROL	0x4b564d05
 #define MSR_KVM_ASYNC_PF_INT	0x4b564d06
 #define MSR_KVM_ASYNC_PF_ACK	0x4b564d07
+#define MSR_KVM_SEV_LIVE_MIGRATION	0x4b564d08
 
 struct kvm_steal_time {
 	__u64 steal;
@@ -136,4 +138,6 @@ struct kvm_vcpu_pv_apf_data {
 #define KVM_PV_EOI_ENABLED KVM_PV_EOI_MASK
 #define KVM_PV_EOI_DISABLED 0x0
 
+#define KVM_SEV_LIVE_MIGRATION_ENABLED BIT_ULL(0)
+
 #endif /* _UAPI_ASM_X86_KVM_PARA_H */
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index b0d324aed515..93f42b3d3e33 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -1627,6 +1627,16 @@ int svm_page_enc_status_hc(struct kvm *kvm, unsigned long gpa,
 	return ret;
 }
 
+void sev_update_migration_flags(struct kvm *kvm, u64 data)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+
+	if (!sev_guest(kvm))
+		return;
+
+	sev->live_migration_enabled = !!(data & KVM_SEV_LIVE_MIGRATION_ENABLED);
+}
+
 int svm_get_shared_pages_list(struct kvm *kvm,
 			      struct kvm_shared_pages_list *list)
 {
@@ -1639,6 +1649,9 @@ int svm_get_shared_pages_list(struct kvm *kvm,
 	if (!sev_guest(kvm))
 		return -ENOTTY;
 
+	if (!sev->live_migration_enabled)
+		return -EINVAL;
+
 	if (!list->size)
 		return -EINVAL;
 
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 58f89f83caab..43ea5061926f 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2903,6 +2903,9 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
 		svm->msr_decfg = data;
 		break;
 	}
+	case MSR_KVM_SEV_LIVE_MIGRATION:
+		sev_update_migration_flags(vcpu->kvm, data);
+		break;
 	case MSR_IA32_APICBASE:
 		if (kvm_vcpu_apicv_active(vcpu))
 			avic_update_vapic_bar(to_svm(vcpu), data);
@@ -3976,6 +3979,19 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 			vcpu->arch.cr3_lm_rsvd_bits &= ~(1UL << (best->ebx & 0x3f));
 	}
 
+	/*
+	 * If this is an SEV guest, advertise the SEV live migration feature.
+	 */
+	if (sev_guest(vcpu->kvm)) {
+		struct kvm_cpuid_entry2 *best;
+
+		best = kvm_find_cpuid_entry(vcpu, KVM_CPUID_FEATURES, 0);
+		if (!best)
+			return;
+
+		best->eax |= (1 << KVM_FEATURE_SEV_LIVE_MIGRATION);
+	}
+
 	if (!kvm_vcpu_apicv_active(vcpu))
 		return;
 
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 066ca2a9f1e6..e1bffc11e425 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -79,6 +79,7 @@ struct kvm_sev_info {
 	unsigned long pages_locked; /* Number of pages locked */
 	struct list_head regions_list;  /* List of registered regions */
 	u64 ap_jump_table;	/* SEV-ES AP Jump Table address */
+	bool live_migration_enabled;
 	/* List and count of shared pages */
 	int shared_pages_list_count;
 	struct list_head shared_pages_list;
@@ -592,6 +593,7 @@ int svm_unregister_enc_region(struct kvm *kvm,
 void pre_sev_run(struct vcpu_svm *svm, int cpu);
 void __init sev_hardware_setup(void);
 void sev_hardware_teardown(void);
+void sev_update_migration_flags(struct kvm *kvm, u64 data);
 void sev_free_vcpu(struct kvm_vcpu *vcpu);
 int sev_handle_vmgexit(struct vcpu_svm *svm);
 int sev_es_string_io(struct vcpu_svm *svm, int size, unsigned int port, int in);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 75+ messages in thread

* [PATCH v10 13/16] EFI: Introduce the new AMD Memory Encryption GUID.
  2021-02-04  0:35 [PATCH v10 00/17] Add AMD SEV guest live migration support Ashish Kalra
                   ` (11 preceding siblings ...)
  2021-02-04  0:39 ` [PATCH v10 12/16] KVM: x86: Introduce new KVM_FEATURE_SEV_LIVE_MIGRATION feature & Custom MSR Ashish Kalra
@ 2021-02-04  0:40 ` Ashish Kalra
  2021-02-04  0:40 ` [PATCH v10 14/16] KVM: x86: Add guest support for detecting and enabling SEV Live Migration feature Ashish Kalra
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 75+ messages in thread
From: Ashish Kalra @ 2021-02-04  0:40 UTC (permalink / raw)
  To: pbonzini
  Cc: tglx, mingo, hpa, rkrcmar, joro, bp, thomas.lendacky, x86, kvm,
	linux-kernel, srutherford, seanjc, venu.busireddy, brijesh.singh

From: Ashish Kalra <ashish.kalra@amd.com>

Introduce a new AMD Memory Encryption GUID which is currently used to
define a new UEFI environment variable indicating UEFI/OVMF support for
the SEV live migration feature. The variable is set up when UEFI/OVMF
detects host/hypervisor support for SEV live migration, and is later
read by the kernel using EFI runtime services to verify that OVMF
supports the live migration feature.
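
For reference, the kernel-side consumer added later in this series reads
the variable roughly like this (sketch):

  efi_char16_t name[] = L"SevLiveMigrationEnabled";
  efi_guid_t guid = MEM_ENCRYPT_GUID;
  bool enabled;
  unsigned long size = sizeof(enabled);
  efi_status_t status;

  status = efi.get_variable(name, &guid, NULL, &size, &enabled);
  /* EFI_SUCCESS with enabled != 0 => OVMF supports live migration */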

Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
 include/linux/efi.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/efi.h b/include/linux/efi.h
index 763b816ba19c..ae47db882bee 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -362,6 +362,7 @@ void efi_native_runtime_setup(void);
 
 /* OEM GUIDs */
 #define DELLEMC_EFI_RCI2_TABLE_GUID		EFI_GUID(0x2d9f28a2, 0xa886, 0x456a,  0x97, 0xa8, 0xf1, 0x1e, 0xf2, 0x4f, 0xf4, 0x55)
+#define MEM_ENCRYPT_GUID			EFI_GUID(0x0cf29b71, 0x9e51, 0x433a,  0xa3, 0xb7, 0x81, 0xf3, 0xab, 0x16, 0xb8, 0x75)
 
 typedef struct {
 	efi_guid_t guid;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 75+ messages in thread

* [PATCH v10 14/16] KVM: x86: Add guest support for detecting and enabling SEV Live Migration feature.
  2021-02-04  0:35 [PATCH v10 00/17] Add AMD SEV guest live migration support Ashish Kalra
                   ` (12 preceding siblings ...)
  2021-02-04  0:40 ` [PATCH v10 13/16] EFI: Introduce the new AMD Memory Encryption GUID Ashish Kalra
@ 2021-02-04  0:40 ` Ashish Kalra
  2021-02-18 17:56   ` Sean Christopherson
  2021-02-04  0:40 ` [PATCH v10 15/16] KVM: x86: Add kexec support for SEV Live Migration Ashish Kalra
  2021-02-04  0:40 ` [PATCH v10 16/16] KVM: SVM: Bypass DBG_DECRYPT API calls for unencrypted guest memory Ashish Kalra
  15 siblings, 1 reply; 75+ messages in thread
From: Ashish Kalra @ 2021-02-04  0:40 UTC (permalink / raw)
  To: pbonzini
  Cc: tglx, mingo, hpa, rkrcmar, joro, bp, thomas.lendacky, x86, kvm,
	linux-kernel, srutherford, seanjc, venu.busireddy, brijesh.singh

From: Ashish Kalra <ashish.kalra@amd.com>

The guest support for detecting and enabling the SEV live migration
feature uses the following logic:

 - kvm_init_platform() invokes check_kvm_sev_migration(), which
   checks whether the kernel is booted under EFI

   - If not EFI,

     i) check for the KVM_FEATURE_SEV_LIVE_MIGRATION CPUID bit

     ii) if CPUID reports that migration is supported, issue a wrmsrl()
         to enable the SEV live migration support

   - If EFI,

     i) check for the KVM_FEATURE_SEV_LIVE_MIGRATION CPUID bit

     ii) if CPUID reports that migration is supported, read the UEFI
         variable which indicates OVMF support for live migration

     iii) if the variable indicates live migration is supported, issue
          a wrmsrl() to enable the SEV live migration support

The EFI live migration check is done using a late_initcall() callback.

Also, ensure that the __bss_decrypted section is marked as decrypted in
the shared pages list.

Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
 arch/x86/include/asm/mem_encrypt.h |  8 +++++
 arch/x86/kernel/kvm.c              | 52 ++++++++++++++++++++++++++++++
 arch/x86/mm/mem_encrypt.c          | 41 +++++++++++++++++++++++
 3 files changed, 101 insertions(+)

diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
index 31c4df123aa0..19b77f3a62dc 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -21,6 +21,7 @@
 extern u64 sme_me_mask;
 extern u64 sev_status;
 extern bool sev_enabled;
+extern bool sev_live_migration_enabled;
 
 void sme_encrypt_execute(unsigned long encrypted_kernel_vaddr,
 			 unsigned long decrypted_kernel_vaddr,
@@ -44,8 +45,11 @@ void __init sme_enable(struct boot_params *bp);
 
 int __init early_set_memory_decrypted(unsigned long vaddr, unsigned long size);
 int __init early_set_memory_encrypted(unsigned long vaddr, unsigned long size);
+void __init early_set_mem_enc_dec_hypercall(unsigned long vaddr, int npages,
+					    bool enc);
 
 void __init mem_encrypt_free_decrypted_mem(void);
+void __init check_kvm_sev_migration(void);
 
 /* Architecture __weak replacement functions */
 void __init mem_encrypt_init(void);
@@ -60,6 +64,7 @@ bool sev_es_active(void);
 #else	/* !CONFIG_AMD_MEM_ENCRYPT */
 
 #define sme_me_mask	0ULL
+#define sev_live_migration_enabled	false
 
 static inline void __init sme_early_encrypt(resource_size_t paddr,
 					    unsigned long size) { }
@@ -84,8 +89,11 @@ static inline int __init
 early_set_memory_decrypted(unsigned long vaddr, unsigned long size) { return 0; }
 static inline int __init
 early_set_memory_encrypted(unsigned long vaddr, unsigned long size) { return 0; }
+static inline void __init
+early_set_mem_enc_dec_hypercall(unsigned long vaddr, int npages, bool enc) {}
 
 static inline void mem_encrypt_free_decrypted_mem(void) { }
+static inline void check_kvm_sev_migration(void) { }
 
 #define __bss_decrypted
 
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 5e78e01ca3b4..c4b8029c1442 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -26,6 +26,7 @@
 #include <linux/kprobes.h>
 #include <linux/nmi.h>
 #include <linux/swait.h>
+#include <linux/efi.h>
 #include <asm/timer.h>
 #include <asm/cpu.h>
 #include <asm/traps.h>
@@ -429,6 +430,56 @@ static inline void __set_percpu_decrypted(void *ptr, unsigned long size)
 	early_set_memory_decrypted((unsigned long) ptr, size);
 }
 
+static int __init setup_kvm_sev_migration(void)
+{
+	efi_char16_t efi_sev_live_migration_enabled[] = L"SevLiveMigrationEnabled";
+	efi_guid_t efi_variable_guid = MEM_ENCRYPT_GUID;
+	efi_status_t status;
+	unsigned long size;
+	bool enabled;
+
+	/*
+	 * check_kvm_sev_migration(), invoked via kvm_init_platform() before
+	 * this callback runs, will have set up the indicator that the live
+	 * migration feature is supported/enabled.
+	 */
+	if (!sev_live_migration_enabled)
+		return 0;
+
+	if (!efi_enabled(EFI_RUNTIME_SERVICES)) {
+		pr_info("%s: EFI runtime services are not enabled\n", __func__);
+		return 0;
+	}
+
+	size = sizeof(enabled);
+
+	/* Get variable contents into buffer */
+	status = efi.get_variable(efi_sev_live_migration_enabled,
+				  &efi_variable_guid, NULL, &size, &enabled);
+
+	if (status == EFI_NOT_FOUND) {
+		pr_info("%s: EFI live migration variable not found\n", __func__);
+		return 0;
+	}
+
+	if (status != EFI_SUCCESS) {
+		pr_info("%s: EFI variable retrieval failed\n", __func__);
+		return 0;
+	}
+
+	if (enabled == 0) {
+		pr_info("%s: live migration disabled in EFI\n", __func__);
+		return 0;
+	}
+
+	pr_info("%s: live migration enabled in EFI\n", __func__);
+	wrmsrl(MSR_KVM_SEV_LIVE_MIGRATION, KVM_SEV_LIVE_MIGRATION_ENABLED);
+
+	return 0;
+}
+
+late_initcall(setup_kvm_sev_migration);
+
 /*
  * Iterate through all possible CPUs and map the memory region pointed
  * by apf_reason, steal_time and kvm_apic_eoi as decrypted at once.
@@ -747,6 +798,7 @@ static bool __init kvm_msi_ext_dest_id(void)
 
 static void __init kvm_init_platform(void)
 {
+	check_kvm_sev_migration();
 	kvmclock_init();
 	x86_platform.apic_post_init = kvm_apic_init;
 }
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index dc17d14f9bcd..f80d2aee3938 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -20,6 +20,7 @@
 #include <linux/bitops.h>
 #include <linux/dma-mapping.h>
 #include <linux/kvm_para.h>
+#include <linux/efi.h>
 
 #include <asm/tlbflush.h>
 #include <asm/fixmap.h>
@@ -48,6 +49,8 @@ EXPORT_SYMBOL_GPL(sev_enable_key);
 
 bool sev_enabled __section(".data");
 
+bool sev_live_migration_enabled __section(".data");
+
 /* Buffer used for early in-place encryption by BSP, no locking needed */
 static char sme_early_buffer[PAGE_SIZE] __initdata __aligned(PAGE_SIZE);
 
@@ -237,6 +240,9 @@ static void set_memory_enc_dec_hypercall(unsigned long vaddr, int npages,
 	unsigned long sz = npages << PAGE_SHIFT;
 	unsigned long vaddr_end, vaddr_next;
 
+	if (!sev_live_migration_enabled)
+		return;
+
 	vaddr_end = vaddr + sz;
 
 	for (; vaddr < vaddr_end; vaddr = vaddr_next) {
@@ -407,6 +413,12 @@ int __init early_set_memory_encrypted(unsigned long vaddr, unsigned long size)
 	return early_set_memory_enc_dec(vaddr, size, true);
 }
 
+void __init early_set_mem_enc_dec_hypercall(unsigned long vaddr, int npages,
+					bool enc)
+{
+	set_memory_enc_dec_hypercall(vaddr, npages, enc);
+}
+
 /*
  * SME and SEV are very similar but they are not the same, so there are
  * times that the kernel will need to distinguish between SME and SEV. The
@@ -461,6 +473,35 @@ bool force_dma_unencrypted(struct device *dev)
 	return false;
 }
 
+void __init check_kvm_sev_migration(void)
+{
+	if (sev_active() &&
+	    kvm_para_has_feature(KVM_FEATURE_SEV_LIVE_MIGRATION)) {
+		unsigned long nr_pages;
+
+		pr_info("KVM: SEV live migration enabled\n");
+		sev_live_migration_enabled = true;
+
+		/*
+		 * Ensure that the __bss_decrypted section is marked as
+		 * decrypted in the shared pages list.
+		 */
+		nr_pages = DIV_ROUND_UP(__end_bss_decrypted - __start_bss_decrypted,
+					PAGE_SIZE);
+		early_set_mem_enc_dec_hypercall((unsigned long)__start_bss_decrypted,
+						nr_pages, 0);
+
+		/*
+		 * If not booted using EFI, enable Live migration support.
+		 */
+		if (!efi_enabled(EFI_BOOT))
+			wrmsrl(MSR_KVM_SEV_LIVE_MIGRATION,
+			       KVM_SEV_LIVE_MIGRATION_ENABLED);
+	} else {
+		pr_info("KVM: SEV live migration feature not supported\n");
+	}
+}
+
 void __init mem_encrypt_free_decrypted_mem(void)
 {
 	unsigned long vaddr, vaddr_end, npages;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 75+ messages in thread

* [PATCH v10 15/16] KVM: x86: Add kexec support for SEV Live Migration.
  2021-02-04  0:35 [PATCH v10 00/17] Add AMD SEV guest live migration support Ashish Kalra
                   ` (13 preceding siblings ...)
  2021-02-04  0:40 ` [PATCH v10 14/16] KVM: x86: Add guest support for detecting and enabling SEV Live Migration feature Ashish Kalra
@ 2021-02-04  0:40 ` Ashish Kalra
  2021-02-04  4:10     ` kernel test robot
  2021-02-04  0:40 ` [PATCH v10 16/16] KVM: SVM: Bypass DBG_DECRYPT API calls for unencrypted guest memory Ashish Kalra
  15 siblings, 1 reply; 75+ messages in thread
From: Ashish Kalra @ 2021-02-04  0:40 UTC (permalink / raw)
  To: pbonzini
  Cc: tglx, mingo, hpa, rkrcmar, joro, bp, thomas.lendacky, x86, kvm,
	linux-kernel, srutherford, seanjc, venu.busireddy, brijesh.singh

From: Ashish Kalra <ashish.kalra@amd.com>

Reset the host's shared pages list entries related to kernel-specific
page encryption status settings before loading a new kernel via kexec.
We cannot reset the complete shared pages list here, as we need to
retain the UEFI/OVMF firmware-specific settings.

The host maintains the shared pages list so that all unencrypted guest
memory regions are tracked; therefore, we need to explicitly mark all
shared pages as encrypted again before rebooting into the new guest
kernel.
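
Concretely, each E820 RAM region is flipped back to encrypted with the page
encryption status hypercall, i.e. per region:

  /* a0 = gpa, a1 = number of pages, a2 = 1 (encrypted) */
  kvm_sev_hypercall3(KVM_HC_PAGE_ENC_STATUS, entry->addr, nr_pages, 1);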

Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
 arch/x86/kernel/kvm.c | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index c4b8029c1442..d61156db7797 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -39,6 +39,7 @@
 #include <asm/cpuidle_haltpoll.h>
 #include <asm/ptrace.h>
 #include <asm/svm.h>
+#include <asm/e820/api.h>
 
 DEFINE_STATIC_KEY_FALSE(kvm_async_pf_enabled);
 
@@ -384,6 +385,33 @@ static void kvm_pv_guest_cpu_reboot(void *unused)
 	 */
 	if (kvm_para_has_feature(KVM_FEATURE_PV_EOI))
 		wrmsrl(MSR_KVM_PV_EOI_EN, 0);
+	/*
+	 * Reset the host's shared pages list related to kernel
+	 * specific page encryption status settings before we load a
+	 * new kernel by kexec. NOTE: We cannot reset the complete
+	 * shared pages list here as we need to retain the
+	 * UEFI/OVMF firmware specific settings.
+	 */
+	if (sev_live_migration_enabled && (smp_processor_id() == 0)) {
+		int i;
+		unsigned long nr_pages;
+
+		for (i = 0; i < e820_table->nr_entries; i++) {
+			struct e820_entry *entry = &e820_table->entries[i];
+			unsigned long start_pfn;
+			unsigned long end_pfn;
+
+			if (entry->type != E820_TYPE_RAM)
+				continue;
+
+			start_pfn = entry->addr >> PAGE_SHIFT;
+			end_pfn = (entry->addr + entry->size) >> PAGE_SHIFT;
+			nr_pages = DIV_ROUND_UP(entry->size, PAGE_SIZE);
+
+			kvm_sev_hypercall3(KVM_HC_PAGE_ENC_STATUS,
+					   entry->addr, nr_pages, 1);
+		}
+	}
 	kvm_pv_disable_apf();
 	kvm_disable_steal_time();
 }
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 75+ messages in thread

* [PATCH v10 16/16] KVM: SVM: Bypass DBG_DECRYPT API calls for unencrypted guest memory.
  2021-02-04  0:35 [PATCH v10 00/17] Add AMD SEV guest live migration support Ashish Kalra
                   ` (14 preceding siblings ...)
  2021-02-04  0:40 ` [PATCH v10 15/16] KVM: x86: Add kexec support for SEV Live Migration Ashish Kalra
@ 2021-02-04  0:40 ` Ashish Kalra
  2021-02-09  6:23     ` kernel test robot
  15 siblings, 1 reply; 75+ messages in thread
From: Ashish Kalra @ 2021-02-04  0:40 UTC (permalink / raw)
  To: pbonzini
  Cc: tglx, mingo, hpa, rkrcmar, joro, bp, thomas.lendacky, x86, kvm,
	linux-kernel, srutherford, seanjc, venu.busireddy, brijesh.singh

From: Ashish Kalra <ashish.kalra@amd.com>

For all unencrypted guest memory regions, such as the S/W IOTLB
bounce buffers and guest regions marked as "__bss_decrypted",
ensure that the DBG_DECRYPT API calls are bypassed.
The encryption status of guest memory regions is looked up in the
shared pages list.

Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
 arch/x86/kvm/svm/sev.c | 126 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 126 insertions(+)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 93f42b3d3e33..fa3fbbb73b33 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -888,6 +888,117 @@ static int __sev_dbg_encrypt_user(struct kvm *kvm, unsigned long paddr,
 	return ret;
 }
 
+static struct kvm_memory_slot *hva_to_memslot(struct kvm *kvm,
+					      unsigned long hva)
+{
+	struct kvm_memslots *slots = kvm_memslots(kvm);
+	struct kvm_memory_slot *memslot;
+
+	kvm_for_each_memslot(memslot, slots) {
+		if (hva >= memslot->userspace_addr &&
+		    hva < memslot->userspace_addr +
+			      (memslot->npages << PAGE_SHIFT))
+			return memslot;
+	}
+
+	return NULL;
+}
+
+static bool hva_to_gfn(struct kvm *kvm, unsigned long hva, gfn_t *gfn)
+{
+	struct kvm_memory_slot *memslot;
+	gpa_t gpa_offset;
+
+	memslot = hva_to_memslot(kvm, hva);
+	if (!memslot)
+		return false;
+
+	gpa_offset = hva - memslot->userspace_addr;
+	*gfn = ((memslot->base_gfn << PAGE_SHIFT) + gpa_offset) >> PAGE_SHIFT;
+
+	return true;
+}
+
+static bool is_unencrypted_region(gfn_t gfn_start, gfn_t gfn_end,
+				  struct list_head *head)
+{
+	struct shared_region *pos;
+
+	list_for_each_entry(pos, head, list)
+		if (gfn_start >= pos->gfn_start &&
+		    gfn_end <= pos->gfn_end)
+			return true;
+
+	return false;
+}
+
+static int handle_unencrypted_region(struct kvm *kvm,
+				     unsigned long vaddr,
+				     unsigned long vaddr_end,
+				     unsigned long dst_vaddr,
+				     unsigned int size,
+				     bool *is_decrypted)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+	struct page *page = NULL;
+	gfn_t gfn_start, gfn_end;
+	int len, s_off, d_off;
+	int srcu_idx;
+	int ret = 0;
+
+	/* ensure hva_to_gfn translations remain valid */
+	srcu_idx = srcu_read_lock(&kvm->srcu);
+
+	if (!hva_to_gfn(kvm, vaddr, &gfn_start)) {
+		srcu_read_unlock(&kvm->srcu, srcu_idx);
+		return -EINVAL;
+	}
+
+	if (!hva_to_gfn(kvm, vaddr_end, &gfn_end)) {
+		srcu_read_unlock(&kvm->srcu, srcu_idx);
+		return -EINVAL;
+	}
+
+	if (sev->shared_pages_list_count) {
+		if (is_unencrypted_region(gfn_start, gfn_end,
+					  &sev->shared_pages_list)) {
+			page = alloc_page(GFP_KERNEL);
+			if (!page) {
+				srcu_read_unlock(&kvm->srcu, srcu_idx);
+				return -ENOMEM;
+			}
+
+			/*
+			 * Since user buffer may not be page aligned, calculate the
+			 * offset within the page.
+			 */
+			s_off = vaddr & ~PAGE_MASK;
+			d_off = dst_vaddr & ~PAGE_MASK;
+			len = min_t(size_t, (PAGE_SIZE - s_off), size);
+
+			if (copy_from_user(page_address(page),
+					   (void __user *)(uintptr_t)vaddr, len)) {
+				__free_page(page);
+				srcu_read_unlock(&kvm->srcu, srcu_idx);
+				return -EFAULT;
+			}
+
+			if (copy_to_user((void __user *)(uintptr_t)dst_vaddr,
+					 page_address(page), len)) {
+				ret = -EFAULT;
+			}
+
+			__free_page(page);
+			srcu_read_unlock(&kvm->srcu, srcu_idx);
+			*is_decrypted = true;
+			return ret;
+		}
+	}
+	srcu_read_unlock(&kvm->srcu, srcu_idx);
+	*is_decrypted = false;
+	return ret;
+}
+
 static int sev_dbg_crypt(struct kvm *kvm, struct kvm_sev_cmd *argp, bool dec)
 {
 	unsigned long vaddr, vaddr_end, next_vaddr;
@@ -917,6 +1028,20 @@ static int sev_dbg_crypt(struct kvm *kvm, struct kvm_sev_cmd *argp, bool dec)
 	for (; vaddr < vaddr_end; vaddr = next_vaddr) {
 		int len, s_off, d_off;
 
+		if (dec) {
+			bool is_already_decrypted;
+
+			ret = handle_unencrypted_region(kvm,
+							vaddr,
+							vaddr_end,
+							dst_vaddr,
+							size,
+							&is_already_decrypted);
+
+			if (ret || is_already_decrypted)
+				goto already_decrypted;
+		}
+
 		/* lock userspace source and destination page */
 		src_p = sev_pin_memory(kvm, vaddr & PAGE_MASK, PAGE_SIZE, &n, 0);
 		if (IS_ERR(src_p))
@@ -961,6 +1086,7 @@ static int sev_dbg_crypt(struct kvm *kvm, struct kvm_sev_cmd *argp, bool dec)
 		sev_unpin_memory(kvm, src_p, n);
 		sev_unpin_memory(kvm, dst_p, n);
 
+already_decrypted:
 		if (ret)
 			goto err;
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 75+ messages in thread

* Re: [PATCH v10 15/16] KVM: x86: Add kexec support for SEV Live Migration.
  2021-02-04  0:40 ` [PATCH v10 15/16] KVM: x86: Add kexec support for SEV Live Migration Ashish Kalra
@ 2021-02-04  4:10     ` kernel test robot
  0 siblings, 0 replies; 75+ messages in thread
From: kernel test robot @ 2021-02-04  4:10 UTC (permalink / raw)
  To: Ashish Kalra, pbonzini
  Cc: kbuild-all, tglx, mingo, hpa, rkrcmar, joro, bp, thomas.lendacky,
	x86, kvm

[-- Attachment #1: Type: text/plain, Size: 3335 bytes --]

Hi Ashish,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on v5.11-rc6]
[also build test WARNING on next-20210125]
[cannot apply to kvm/linux-next tip/x86/mm tip/x86/core]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Ashish-Kalra/KVM-SVM-Add-KVM_SEV-SEND_START-command/20210204-093647
base:    1048ba83fb1c00cd24172e23e8263972f6b5d9ac
config: i386-randconfig-r034-20210202 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-15) 9.3.0
reproduce (this is a W=1 build):
        # https://github.com/0day-ci/linux/commit/51c7205a5d0abf98f52da67fcf7a223c521f9693
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Ashish-Kalra/KVM-SVM-Add-KVM_SEV-SEND_START-command/20210204-093647
        git checkout 51c7205a5d0abf98f52da67fcf7a223c521f9693
        # save the attached .config to linux build tree
        make W=1 ARCH=i386 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   arch/x86/kernel/kvm.c: In function 'kvm_pv_guest_cpu_reboot':
>> arch/x86/kernel/kvm.c:402:18: warning: variable 'end_pfn' set but not used [-Wunused-but-set-variable]
     402 |    unsigned long end_pfn;
         |                  ^~~~~~~
>> arch/x86/kernel/kvm.c:401:18: warning: variable 'start_pfn' set but not used [-Wunused-but-set-variable]
     401 |    unsigned long start_pfn;
         |                  ^~~~~~~~~


vim +/end_pfn +402 arch/x86/kernel/kvm.c

   378	
   379	static void kvm_pv_guest_cpu_reboot(void *unused)
   380	{
   381		/*
   382		 * We disable PV EOI before we load a new kernel by kexec,
   383		 * since MSR_KVM_PV_EOI_EN stores a pointer into old kernel's memory.
   384		 * New kernel can re-enable when it boots.
   385		 */
   386		if (kvm_para_has_feature(KVM_FEATURE_PV_EOI))
   387			wrmsrl(MSR_KVM_PV_EOI_EN, 0);
   388		/*
   389		 * Reset the host's shared pages list related to kernel
   390		 * specific page encryption status settings before we load a
   391		 * new kernel by kexec. NOTE: We cannot reset the complete
   392		 * shared pages list here as we need to retain the
   393		 * UEFI/OVMF firmware specific settings.
   394		 */
   395		if (sev_live_migration_enabled & (smp_processor_id() == 0)) {
   396			int i;
   397			unsigned long nr_pages;
   398	
   399			for (i = 0; i < e820_table->nr_entries; i++) {
   400				struct e820_entry *entry = &e820_table->entries[i];
 > 401				unsigned long start_pfn;
 > 402				unsigned long end_pfn;
   403	
   404				if (entry->type != E820_TYPE_RAM)
   405					continue;
   406	
   407				start_pfn = entry->addr >> PAGE_SHIFT;
   408				end_pfn = (entry->addr + entry->size) >> PAGE_SHIFT;
   409				nr_pages = DIV_ROUND_UP(entry->size, PAGE_SIZE);
   410	
   411				kvm_sev_hypercall3(KVM_HC_PAGE_ENC_STATUS,
   412						   entry->addr, nr_pages, 1);
   413			}
   414		}
   415		kvm_pv_disable_apf();
   416		kvm_disable_steal_time();
   417	}
   418	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 32333 bytes --]

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v10 08/16] KVM: X86: Introduce KVM_HC_PAGE_ENC_STATUS hypercall
  2021-02-04  0:38 ` [PATCH v10 08/16] KVM: X86: Introduce KVM_HC_PAGE_ENC_STATUS hypercall Ashish Kalra
@ 2021-02-04 16:03   ` Tom Lendacky
  2021-02-05  1:44   ` Steve Rutherford
  1 sibling, 0 replies; 75+ messages in thread
From: Tom Lendacky @ 2021-02-04 16:03 UTC (permalink / raw)
  To: Ashish Kalra, pbonzini
  Cc: tglx, mingo, hpa, rkrcmar, joro, bp, x86, kvm, linux-kernel,
	srutherford, seanjc, venu.busireddy, brijesh.singh

On 2/3/21 6:38 PM, Ashish Kalra wrote:
> From: Brijesh Singh <brijesh.singh@amd.com>
> 
> This hypercall is used by the SEV guest to notify a change in the page
> encryption status to the hypervisor. The hypercall should be invoked
> only when the encryption attribute is changed from encrypted -> decrypted
> and vice versa. By default all guest pages are considered encrypted.
> 
> The patch introduces a new shared pages list implemented as a
> sorted linked list to track the shared/unencrypted regions marked by the
> guest hypercall.
> 

...

> +
> +	if (enc) {
> +		ret = remove_shared_region(gfn_start, gfn_end,
> +					   &sev->shared_pages_list);
> +		if (ret != -ENOMEM)
> +			sev->shared_pages_list_count += ret;
> +	} else {
> +		ret = add_shared_region(gfn_start, gfn_end,
> +					&sev->shared_pages_list);
> +		if (ret > 0)
> +			sev->shared_pages_list_count++;
> +	}

I would move the shared_pages_list_count updates into the add/remove 
functions and then just return 0 or a -EXXXX error code from those 
functions. It seems simpler than "adding" ret or checking for a greater 
than 0 return code.
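
For example, a minimal sketch of that shape (based on the helpers in
this patch, with the sort/merge logic elided for brevity):

	/* Sketch: the helper owns the bookkeeping and returns 0 or -errno. */
	static int add_shared_region(unsigned long start, unsigned long end,
				     struct kvm_sev_info *sev)
	{
		struct shared_region *r;

		r = kzalloc(sizeof(*r), GFP_KERNEL_ACCOUNT);
		if (!r)
			return -ENOMEM;
		r->gfn_start = start;
		r->gfn_end = end;
		list_add_tail(&r->list, &sev->shared_pages_list);
		/* Count is updated here, not in the caller. */
		sev->shared_pages_list_count++;
		return 0;
	}

and the caller then reduces to:

	if (enc)
		ret = remove_shared_region(gfn_start, gfn_end, sev);
	else
		ret = add_shared_region(gfn_start, gfn_end, sev);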

Thanks,
Tom

> +
> +	mutex_unlock(&kvm->lock);
> +	return ret;
> +}
> +
>   int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
>   {
>   	struct kvm_sev_cmd sev_cmd;
> @@ -1693,6 +1842,7 @@ void sev_vm_destroy(struct kvm *kvm)
>   
>   	sev_unbind_asid(kvm, sev->handle);
>   	sev_asid_free(sev->asid);
> +	sev->shared_pages_list_count = 0;
>   }
>   
>   void __init sev_hardware_setup(void)
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index f923e14e87df..bb249ec625fc 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -4536,6 +4536,8 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
>   	.complete_emulated_msr = svm_complete_emulated_msr,
>   
>   	.vcpu_deliver_sipi_vector = svm_vcpu_deliver_sipi_vector,
> +
> +	.page_enc_status_hc = svm_page_enc_status_hc,
>   };
>   
>   static struct kvm_x86_init_ops svm_init_ops __initdata = {
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index 0fe874ae5498..6437c1fa1f24 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -79,6 +79,9 @@ struct kvm_sev_info {
>   	unsigned long pages_locked; /* Number of pages locked */
>   	struct list_head regions_list;  /* List of registered regions */
>   	u64 ap_jump_table;	/* SEV-ES AP Jump Table address */
> +	/* List and count of shared pages */
> +	int shared_pages_list_count;
> +	struct list_head shared_pages_list;
>   };
>   
>   struct kvm_svm {
> @@ -472,6 +475,8 @@ int nested_svm_check_exception(struct vcpu_svm *svm, unsigned nr,
>   			       bool has_error_code, u32 error_code);
>   int nested_svm_exit_special(struct vcpu_svm *svm);
>   void sync_nested_vmcb_control(struct vcpu_svm *svm);
> +int svm_page_enc_status_hc(struct kvm *kvm, unsigned long gpa,
> +			   unsigned long npages, unsigned long enc);
>   
>   extern struct kvm_x86_nested_ops svm_nested_ops;
>   
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index cc60b1fc3ee7..bcbf53851612 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -7705,6 +7705,7 @@ static struct kvm_x86_ops vmx_x86_ops __initdata = {
>   	.can_emulate_instruction = vmx_can_emulate_instruction,
>   	.apic_init_signal_blocked = vmx_apic_init_signal_blocked,
>   	.migrate_timers = vmx_migrate_timers,
> +	.page_enc_status_hc = NULL,
>   
>   	.msr_filter_changed = vmx_msr_filter_changed,
>   	.complete_emulated_msr = kvm_complete_insn_gp,
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 76bce832cade..2f17f0f9ace7 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -8162,6 +8162,12 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
>   		kvm_sched_yield(vcpu->kvm, a0);
>   		ret = 0;
>   		break;
> +	case KVM_HC_PAGE_ENC_STATUS:
> +		ret = -KVM_ENOSYS;
> +		if (kvm_x86_ops.page_enc_status_hc)
> +			ret = kvm_x86_ops.page_enc_status_hc(vcpu->kvm,
> +					a0, a1, a2);
> +		break;
>   	default:
>   		ret = -KVM_ENOSYS;
>   		break;
> diff --git a/include/uapi/linux/kvm_para.h b/include/uapi/linux/kvm_para.h
> index 8b86609849b9..847b83b75dc8 100644
> --- a/include/uapi/linux/kvm_para.h
> +++ b/include/uapi/linux/kvm_para.h
> @@ -29,6 +29,7 @@
>   #define KVM_HC_CLOCK_PAIRING		9
>   #define KVM_HC_SEND_IPI		10
>   #define KVM_HC_SCHED_YIELD		11
> +#define KVM_HC_PAGE_ENC_STATUS		12
>   
>   /*
>    * hypercalls use architecture specific
> 

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl
  2021-02-04  0:39 ` [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl Ashish Kalra
@ 2021-02-04 16:14   ` Tom Lendacky
  2021-02-04 16:34     ` Ashish Kalra
  2021-02-17  1:03   ` Sean Christopherson
  2021-02-17  1:06   ` Sean Christopherson
  2 siblings, 1 reply; 75+ messages in thread
From: Tom Lendacky @ 2021-02-04 16:14 UTC (permalink / raw)
  To: Ashish Kalra, pbonzini
  Cc: tglx, mingo, hpa, rkrcmar, joro, bp, x86, kvm, linux-kernel,
	srutherford, seanjc, venu.busireddy, brijesh.singh

On 2/3/21 6:39 PM, Ashish Kalra wrote:
> From: Brijesh Singh <brijesh.singh@amd.com>
> 
> The ioctl is used to retrieve a guest's shared pages list.
> 

...

>   
> +int svm_get_shared_pages_list(struct kvm *kvm,
> +			      struct kvm_shared_pages_list *list)
> +{
> +	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> +	struct shared_region_array_entry *array;
> +	struct shared_region *pos;
> +	int ret, nents = 0;
> +	unsigned long sz;
> +
> +	if (!sev_guest(kvm))
> +		return -ENOTTY;
> +
> +	if (!list->size)
> +		return -EINVAL;
> +
> +	if (!sev->shared_pages_list_count)
> +		return put_user(0, list->pnents);
> +
> +	sz = sev->shared_pages_list_count * sizeof(struct shared_region_array_entry);
> +	if (sz > list->size)
> +		return -E2BIG;
> +
> +	array = kmalloc(sz, GFP_KERNEL);
> +	if (!array)
> +		return -ENOMEM;
> +
> +	mutex_lock(&kvm->lock);

I think this lock needs to be taken before the memory size is calculated. 
If the list is expanded after obtaining the size and before taking the 
lock, you will run off the end of the array.
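
Something like this for the middle of the function, as a rough sketch
(every early return taken after the lock must unlock first):

	mutex_lock(&kvm->lock);

	if (!sev->shared_pages_list_count) {
		mutex_unlock(&kvm->lock);
		return put_user(0, list->pnents);
	}

	sz = sev->shared_pages_list_count * sizeof(struct shared_region_array_entry);
	if (sz > list->size) {
		mutex_unlock(&kvm->lock);
		return -E2BIG;
	}

	array = kmalloc(sz, GFP_KERNEL);
	if (!array) {
		mutex_unlock(&kvm->lock);
		return -ENOMEM;
	}

	list_for_each_entry(pos, &sev->shared_pages_list, list) {
		array[nents].gfn_start = pos->gfn_start;
		array[nents++].gfn_end = pos->gfn_end;
	}
	mutex_unlock(&kvm->lock);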

Thanks,
Tom

> +	list_for_each_entry(pos, &sev->shared_pages_list, list) {
> +		array[nents].gfn_start = pos->gfn_start;
> +		array[nents++].gfn_end = pos->gfn_end;
> +	}
> +	mutex_unlock(&kvm->lock);
> +
> +	ret = -EFAULT;
> +	if (copy_to_user(list->buffer, array, sz))
> +		goto out;
> +	if (put_user(nents, list->pnents))
> +		goto out;
> +	ret = 0;
> +out:
> +	kfree(array);
> +	return ret;
> +}
> +

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl
  2021-02-04 16:14   ` Tom Lendacky
@ 2021-02-04 16:34     ` Ashish Kalra
  0 siblings, 0 replies; 75+ messages in thread
From: Ashish Kalra @ 2021-02-04 16:34 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: pbonzini, tglx, mingo, hpa, rkrcmar, joro, bp, x86, kvm,
	linux-kernel, srutherford, seanjc, venu.busireddy, brijesh.singh

Hello Tom,

On Thu, Feb 04, 2021 at 10:14:37AM -0600, Tom Lendacky wrote:
> On 2/3/21 6:39 PM, Ashish Kalra wrote:
> > From: Brijesh Singh <brijesh.singh@amd.com>
> > 
> > The ioctl is used to retrieve a guest's shared pages list.
> > 
> 
> ...
> 
> > +int svm_get_shared_pages_list(struct kvm *kvm,
> > +			      struct kvm_shared_pages_list *list)
> > +{
> > +	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> > +	struct shared_region_array_entry *array;
> > +	struct shared_region *pos;
> > +	int ret, nents = 0;
> > +	unsigned long sz;
> > +
> > +	if (!sev_guest(kvm))
> > +		return -ENOTTY;
> > +
> > +	if (!list->size)
> > +		return -EINVAL;
> > +
> > +	if (!sev->shared_pages_list_count)
> > +		return put_user(0, list->pnents);
> > +
> > +	sz = sev->shared_pages_list_count * sizeof(struct shared_region_array_entry);
> > +	if (sz > list->size)
> > +		return -E2BIG;
> > +
> > +	array = kmalloc(sz, GFP_KERNEL);
> > +	if (!array)
> > +		return -ENOMEM;
> > +
> > +	mutex_lock(&kvm->lock);
> 
> I think this lock needs to be taken before the memory size is calculated. If
> the list is expanded after obtaining the size and before taking the lock,
> you will run off the end of the array.
> 

Yes, since the page encryption status hypercalls can happen
concurrently with this ioctl, this is an issue; I will take the lock
before the memory size is calculated.

Thanks,
Ashish

> 
> > +	list_for_each_entry(pos, &sev->shared_pages_list, list) {
> > +		array[nents].gfn_start = pos->gfn_start;
> > +		array[nents++].gfn_end = pos->gfn_end;
> > +	}
> > +	mutex_unlock(&kvm->lock);
> > +
> > +	ret = -EFAULT;
> > +	if (copy_to_user(list->buffer, array, sz))
> > +		goto out;
> > +	if (put_user(nents, list->pnents))
> > +		goto out;
> > +	ret = 0;
> > +out:
> > +	kfree(array);
> > +	return ret;
> > +}
> > +

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v10 12/16] KVM: x86: Introduce new KVM_FEATURE_SEV_LIVE_MIGRATION feature & Custom MSR.
  2021-02-04  0:39 ` [PATCH v10 12/16] KVM: x86: Introduce new KVM_FEATURE_SEV_LIVE_MIGRATION feature & Custom MSR Ashish Kalra
@ 2021-02-05  0:56   ` Steve Rutherford
  2021-02-05  3:07     ` Ashish Kalra
  2021-02-16 23:20   ` Sean Christopherson
  1 sibling, 1 reply; 75+ messages in thread
From: Steve Rutherford @ 2021-02-05  0:56 UTC (permalink / raw)
  To: Ashish Kalra
  Cc: Paolo Bonzini, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Radim Krčmář,
	Joerg Roedel, Borislav Petkov, Tom Lendacky, X86 ML, KVM list,
	LKML, Sean Christopherson, Venu Busireddy, Brijesh Singh

On Wed, Feb 3, 2021 at 4:39 PM Ashish Kalra <Ashish.Kalra@amd.com> wrote:
>
> From: Ashish Kalra <ashish.kalra@amd.com>
>
> Add new KVM_FEATURE_SEV_LIVE_MIGRATION feature for guest to check
> for host-side support for SEV live migration. Also add a new custom
> MSR_KVM_SEV_LIVE_MIGRATION for guest to enable the SEV live migration
> feature.
>
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> ---
>  Documentation/virt/kvm/cpuid.rst     |  5 +++++
>  Documentation/virt/kvm/msr.rst       | 12 ++++++++++++
>  arch/x86/include/uapi/asm/kvm_para.h |  4 ++++
>  arch/x86/kvm/svm/sev.c               | 13 +++++++++++++
>  arch/x86/kvm/svm/svm.c               | 16 ++++++++++++++++
>  arch/x86/kvm/svm/svm.h               |  2 ++
>  6 files changed, 52 insertions(+)
>
> diff --git a/Documentation/virt/kvm/cpuid.rst b/Documentation/virt/kvm/cpuid.rst
> index cf62162d4be2..0bdb6cdb12d3 100644
> --- a/Documentation/virt/kvm/cpuid.rst
> +++ b/Documentation/virt/kvm/cpuid.rst
> @@ -96,6 +96,11 @@ KVM_FEATURE_MSI_EXT_DEST_ID        15          guest checks this feature bit
>                                                 before using extended destination
>                                                 ID bits in MSI address bits 11-5.
>
> +KVM_FEATURE_SEV_LIVE_MIGRATION     16          guest checks this feature bit before
> +                                               using the page encryption state
> +                                               hypercall to notify the page state
> +                                               change
> +
>  KVM_FEATURE_CLOCKSOURCE_STABLE_BIT 24          host will warn if no guest-side
>                                                 per-cpu warps are expected in
>                                                 kvmclock
> diff --git a/Documentation/virt/kvm/msr.rst b/Documentation/virt/kvm/msr.rst
> index e37a14c323d2..020245d16087 100644
> --- a/Documentation/virt/kvm/msr.rst
> +++ b/Documentation/virt/kvm/msr.rst
> @@ -376,3 +376,15 @@ data:
>         write '1' to bit 0 of the MSR, this causes the host to re-scan its queue
>         and check if there are more notifications pending. The MSR is available
>         if KVM_FEATURE_ASYNC_PF_INT is present in CPUID.
> +
> +MSR_KVM_SEV_LIVE_MIGRATION:
> +        0x4b564d08
> +
> +       Control SEV Live Migration features.
> +
> +data:
> +        Bit 0 enables (1) or disables (0) host-side SEV Live Migration feature,
> +        in other words, this is guest->host communication that it's properly
> +        handling the shared pages list.
> +
> +        All other bits are reserved.
> diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
> index 950afebfba88..f6bfa138874f 100644
> --- a/arch/x86/include/uapi/asm/kvm_para.h
> +++ b/arch/x86/include/uapi/asm/kvm_para.h
> @@ -33,6 +33,7 @@
>  #define KVM_FEATURE_PV_SCHED_YIELD     13
>  #define KVM_FEATURE_ASYNC_PF_INT       14
>  #define KVM_FEATURE_MSI_EXT_DEST_ID    15
> +#define KVM_FEATURE_SEV_LIVE_MIGRATION 16
>
>  #define KVM_HINTS_REALTIME      0
>
> @@ -54,6 +55,7 @@
>  #define MSR_KVM_POLL_CONTROL   0x4b564d05
>  #define MSR_KVM_ASYNC_PF_INT   0x4b564d06
>  #define MSR_KVM_ASYNC_PF_ACK   0x4b564d07
> +#define MSR_KVM_SEV_LIVE_MIGRATION     0x4b564d08
>
>  struct kvm_steal_time {
>         __u64 steal;
> @@ -136,4 +138,6 @@ struct kvm_vcpu_pv_apf_data {
>  #define KVM_PV_EOI_ENABLED KVM_PV_EOI_MASK
>  #define KVM_PV_EOI_DISABLED 0x0
>
> +#define KVM_SEV_LIVE_MIGRATION_ENABLED BIT_ULL(0)
> +
>  #endif /* _UAPI_ASM_X86_KVM_PARA_H */
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index b0d324aed515..93f42b3d3e33 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -1627,6 +1627,16 @@ int svm_page_enc_status_hc(struct kvm *kvm, unsigned long gpa,
>         return ret;
>  }
>
> +void sev_update_migration_flags(struct kvm *kvm, u64 data)
> +{
> +       struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> +
> +       if (!sev_guest(kvm))
> +               return;

This should assert that userspace wanted the guest to be able to make
these calls (see more below).

>
> +
> +       sev->live_migration_enabled = !!(data & KVM_SEV_LIVE_MIGRATION_ENABLED);
> +}
> +
>  int svm_get_shared_pages_list(struct kvm *kvm,
>                               struct kvm_shared_pages_list *list)
>  {
> @@ -1639,6 +1649,9 @@ int svm_get_shared_pages_list(struct kvm *kvm,
>         if (!sev_guest(kvm))
>                 return -ENOTTY;
>
> +       if (!sev->live_migration_enabled)
> +               return -EINVAL;
> +
>         if (!list->size)
>                 return -EINVAL;
>
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 58f89f83caab..43ea5061926f 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -2903,6 +2903,9 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
>                 svm->msr_decfg = data;
>                 break;
>         }
> +       case MSR_KVM_SEV_LIVE_MIGRATION:
> +               sev_update_migration_flags(vcpu->kvm, data);
> +               break;
>         case MSR_IA32_APICBASE:
>                 if (kvm_vcpu_apicv_active(vcpu))
>                         avic_update_vapic_bar(to_svm(vcpu), data);
> @@ -3976,6 +3979,19 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
>                         vcpu->arch.cr3_lm_rsvd_bits &= ~(1UL << (best->ebx & 0x3f));
>         }
>
> +       /*
> +        * If SEV guest then enable the Live migration feature.
> +        */
> +       if (sev_guest(vcpu->kvm)) {
> +               struct kvm_cpuid_entry2 *best;
> +
> +               best = kvm_find_cpuid_entry(vcpu, KVM_CPUID_FEATURES, 0);
> +               if (!best)
> +                       return;
> +
> +               best->eax |= (1 << KVM_FEATURE_SEV_LIVE_MIGRATION);
> +       }
> +

Looking at this, I believe the only way for this bit to get enabled is
if userspace toggles it. There needs to be a way for userspace to
identify if the kernel underneath them does, in fact, support SEV LM.
I'm at risk for having misread these patches (it's a long series), but
I don't see anything that communicates upwards.

This could go upward with the other paravirt features flags in
cpuid.c. It could also be an explicit KVM Capability (checked through
check_extension).

Userspace should then have a chance to decide whether or not this
should be enabled. And when it's not enabled, the host should return a
GP in response to the hypercall. This could be configured either
through userspace stripping out the LM feature bit, or by calling a VM
scoped enable cap (KVM_VM_IOCTL_ENABLE_CAP).

I believe the typical path for a feature like this to be configured
would be to use ENABLE_CAP.
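
For instance, the userspace side of an ENABLE_CAP flow might look
roughly like this (KVM_CAP_SEV_LIVE_MIGRATION is a placeholder name,
not a capability defined by this series):

	/* Hypothetical: probe the kernel, then opt this VM in. */
	int has_lm = ioctl(kvm_fd, KVM_CHECK_EXTENSION,
			   KVM_CAP_SEV_LIVE_MIGRATION);
	if (has_lm > 0) {
		struct kvm_enable_cap cap = {
			.cap = KVM_CAP_SEV_LIVE_MIGRATION,
		};
		if (ioctl(vm_fd, KVM_ENABLE_CAP, &cap) < 0)
			err(1, "KVM_ENABLE_CAP");
	}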

>
>         if (!kvm_vcpu_apicv_active(vcpu))
>                 return;
>
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index 066ca2a9f1e6..e1bffc11e425 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -79,6 +79,7 @@ struct kvm_sev_info {
>         unsigned long pages_locked; /* Number of pages locked */
>         struct list_head regions_list;  /* List of registered regions */
>         u64 ap_jump_table;      /* SEV-ES AP Jump Table address */
> +       bool live_migration_enabled;
>         /* List and count of shared pages */
>         int shared_pages_list_count;
>         struct list_head shared_pages_list;
> @@ -592,6 +593,7 @@ int svm_unregister_enc_region(struct kvm *kvm,
>  void pre_sev_run(struct vcpu_svm *svm, int cpu);
>  void __init sev_hardware_setup(void);
>  void sev_hardware_teardown(void);
> +void sev_update_migration_flags(struct kvm *kvm, u64 data);
>  void sev_free_vcpu(struct kvm_vcpu *vcpu);
>  int sev_handle_vmgexit(struct vcpu_svm *svm);
>  int sev_es_string_io(struct vcpu_svm *svm, int size, unsigned int port, int in);
> --
> 2.17.1
>

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v10 08/16] KVM: X86: Introduce KVM_HC_PAGE_ENC_STATUS hypercall
  2021-02-04  0:38 ` [PATCH v10 08/16] KVM: X86: Introduce KVM_HC_PAGE_ENC_STATUS hypercall Ashish Kalra
  2021-02-04 16:03   ` Tom Lendacky
@ 2021-02-05  1:44   ` Steve Rutherford
  2021-02-05  3:32     ` Ashish Kalra
  1 sibling, 1 reply; 75+ messages in thread
From: Steve Rutherford @ 2021-02-05  1:44 UTC (permalink / raw)
  To: Ashish Kalra
  Cc: Paolo Bonzini, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Radim Krčmář,
	Joerg Roedel, Borislav Petkov, Tom Lendacky, X86 ML, KVM list,
	LKML, Sean Christopherson, Venu Busireddy, Brijesh Singh

On Wed, Feb 3, 2021 at 4:38 PM Ashish Kalra <Ashish.Kalra@amd.com> wrote:
>
> From: Brijesh Singh <brijesh.singh@amd.com>
>
> This hypercall is used by the SEV guest to notify a change in the page
> encryption status to the hypervisor. The hypercall should be invoked
> only when the encryption attribute is changed from encrypted -> decrypted
> and vice versa. By default all guest pages are considered encrypted.
>
> The patch introduces a new shared pages list implemented as a
> sorted linked list to track the shared/unencrypted regions marked by the
> guest hypercall.
>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: "Radim Krčmář" <rkrcmar@redhat.com>
> Cc: Joerg Roedel <joro@8bytes.org>
> Cc: Borislav Petkov <bp@suse.de>
> Cc: Tom Lendacky <thomas.lendacky@amd.com>
> Cc: x86@kernel.org
> Cc: kvm@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> Co-developed-by: Ashish Kalra <ashish.kalra@amd.com>
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> ---
>  Documentation/virt/kvm/hypercalls.rst |  15 +++
>  arch/x86/include/asm/kvm_host.h       |   2 +
>  arch/x86/kvm/svm/sev.c                | 150 ++++++++++++++++++++++++++
>  arch/x86/kvm/svm/svm.c                |   2 +
>  arch/x86/kvm/svm/svm.h                |   5 +
>  arch/x86/kvm/vmx/vmx.c                |   1 +
>  arch/x86/kvm/x86.c                    |   6 ++
>  include/uapi/linux/kvm_para.h         |   1 +
>  8 files changed, 182 insertions(+)
>
> diff --git a/Documentation/virt/kvm/hypercalls.rst b/Documentation/virt/kvm/hypercalls.rst
> index ed4fddd364ea..7aff0cebab7c 100644
> --- a/Documentation/virt/kvm/hypercalls.rst
> +++ b/Documentation/virt/kvm/hypercalls.rst
> @@ -169,3 +169,18 @@ a0: destination APIC ID
>
>  :Usage example: When sending a call-function IPI-many to vCPUs, yield if
>                 any of the IPI target vCPUs was preempted.
> +
> +
> +8. KVM_HC_PAGE_ENC_STATUS
> +-------------------------
> +:Architecture: x86
> +:Status: active
> +:Purpose: Notify the encryption status changes in guest page table (SEV guest)
> +
> +a0: the guest physical address of the start page
> +a1: the number of pages
> +a2: encryption attribute
> +
> +   Where:
> +       * 1: Encryption attribute is set
> +       * 0: Encryption attribute is cleared
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 3d6616f6f6ef..2da5f5e2a10e 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1301,6 +1301,8 @@ struct kvm_x86_ops {
>         int (*complete_emulated_msr)(struct kvm_vcpu *vcpu, int err);
>
>         void (*vcpu_deliver_sipi_vector)(struct kvm_vcpu *vcpu, u8 vector);
> +       int (*page_enc_status_hc)(struct kvm *kvm, unsigned long gpa,
> +                                 unsigned long sz, unsigned long mode);
>  };
>
>  struct kvm_x86_nested_ops {
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 25eaf35ba51d..55c628df5155 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -45,6 +45,11 @@ struct enc_region {
>         unsigned long size;
>  };
>
> +struct shared_region {
> +       struct list_head list;
> +       unsigned long gfn_start, gfn_end;
> +};
> +
>  static int sev_flush_asids(void)
>  {
>         int ret, error = 0;
> @@ -196,6 +201,8 @@ static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
>         sev->active = true;
>         sev->asid = asid;
>         INIT_LIST_HEAD(&sev->regions_list);
> +       INIT_LIST_HEAD(&sev->shared_pages_list);
> +       sev->shared_pages_list_count = 0;
>
>         return 0;
>
> @@ -1473,6 +1480,148 @@ static int sev_receive_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
>         return ret;
>  }
>
> +static int remove_shared_region(unsigned long start, unsigned long end,
> +                               struct list_head *head)
> +{
> +       struct shared_region *pos;
> +
> +       list_for_each_entry(pos, head, list) {
> +               if (pos->gfn_start == start &&
> +                   pos->gfn_end == end) {
> +                       list_del(&pos->list);
> +                       kfree(pos);
> +                       return -1;
> +               } else if (start >= pos->gfn_start && end <= pos->gfn_end) {
> +                       if (start == pos->gfn_start)
> +                               pos->gfn_start = end + 1;
> +                       else if (end == pos->gfn_end)
> +                               pos->gfn_end = start - 1;
> +                       else {
> +                               /* Do a de-merge -- split linked list nodes */
> +                               unsigned long tmp;
> +                               struct shared_region *shrd_region;
> +
> +                               tmp = pos->gfn_end;
> +                               pos->gfn_end = start-1;
> +                               shrd_region = kzalloc(sizeof(*shrd_region), GFP_KERNEL_ACCOUNT);
> +                               if (!shrd_region)
> +                                       return -ENOMEM;
> +                               shrd_region->gfn_start = end + 1;
> +                               shrd_region->gfn_end = tmp;
> +                               list_add(&shrd_region->list, &pos->list);
> +                               return 1;
> +                       }
> +                       return 0;
> +               }
> +       }

This doesn't handle the case where the region being marked as
encrypted is larger than the unencrypted region under
consideration, which (I believe) can happen with the current kexec
handling (since it is oblivious to the prior state).
I would probably break this down into the "five cases": no
intersection (skip), entry is completely contained (drop), entry
completely contains the removed region (split), intersect start
(chop), and intersect end (chop).
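
As a sketch of that shape (gfn ranges inclusive as in the patch; the
+1/-1 count bookkeeping the patch does is omitted here):

	static int remove_shared_region(unsigned long start, unsigned long end,
					struct list_head *head)
	{
		struct shared_region *pos, *next, *tail;

		list_for_each_entry_safe(pos, next, head, list) {
			/* 1) no intersection: skip */
			if (end < pos->gfn_start || start > pos->gfn_end)
				continue;
			/* 2) entry completely contained: drop */
			if (start <= pos->gfn_start && end >= pos->gfn_end) {
				list_del(&pos->list);
				kfree(pos);
				continue;
			}
			/* 3) entry completely contains the removed region: split */
			if (start > pos->gfn_start && end < pos->gfn_end) {
				tail = kzalloc(sizeof(*tail), GFP_KERNEL_ACCOUNT);
				if (!tail)
					return -ENOMEM;
				tail->gfn_start = end + 1;
				tail->gfn_end = pos->gfn_end;
				pos->gfn_end = start - 1;
				list_add(&tail->list, &pos->list);
				return 0;
			}
			/* 4) intersect start: chop the front */
			if (start <= pos->gfn_start)
				pos->gfn_start = end + 1;
			/* 5) intersect end: chop the back */
			else
				pos->gfn_end = start - 1;
		}
		return 0;
	}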

>
> +       return 0;
> +}
> +
> +static int add_shared_region(unsigned long start, unsigned long end,
> +                            struct list_head *shared_pages_list)
> +{
> +       struct list_head *head = shared_pages_list;
> +       struct shared_region *shrd_region;
> +       struct shared_region *pos;
> +
> +       if (list_empty(head)) {
> +               shrd_region = kzalloc(sizeof(*shrd_region), GFP_KERNEL_ACCOUNT);
> +               if (!shrd_region)
> +                       return -ENOMEM;
> +               shrd_region->gfn_start = start;
> +               shrd_region->gfn_end = end;
> +               list_add_tail(&shrd_region->list, head);
> +               return 1;
> +       }
> +
> +       /*
> +        * Shared pages list is a sorted list in ascending order of
> +        * guest PA's and also merges consecutive range of guest PA's
> +        */
> +       list_for_each_entry(pos, head, list) {
> +               if (pos->gfn_end < start)
> +                       continue;
> +               /* merge consecutive guest PA(s) */
> +               if (pos->gfn_start <= start && pos->gfn_end >= start) {
> +                       pos->gfn_end = end;

I'm not sure this correctly avoids having duplicate overlapping
elements in the list. It also doesn't merge consecutive contiguous
regions. The current guest implementation should never call the hypercall
with C=0 for the same region twice, without calling with C=1 in
between, but this API should be compatible with that model.

The easiest pattern would probably be to (sketched after this list):
1) find (or insert) the node that will contain the added region.
2) remove the contents of the added region from the tail (will
typically do nothing).
3) merge the head of the tail into the current node, if the end of the
current node matches the start of that head.
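
A rough sketch of that pattern (assumptions: list kept sorted
ascending, ranges inclusive as in the patch; count bookkeeping
omitted):

	static int add_shared_region(unsigned long start, unsigned long end,
				     struct list_head *head)
	{
		struct shared_region *cur = NULL, *pos, *next;

		/* 1) find (or create) the node that will contain the region */
		list_for_each_entry(pos, head, list) {
			if (pos->gfn_start > end + 1)
				break;
			if (pos->gfn_end + 1 >= start) {	/* overlaps or abuts */
				cur = pos;
				break;
			}
		}
		if (!cur) {
			cur = kzalloc(sizeof(*cur), GFP_KERNEL_ACCOUNT);
			if (!cur)
				return -ENOMEM;
			cur->gfn_start = start;
			cur->gfn_end = end;
			/* &pos->list is either the head or the first later node */
			list_add_tail(&cur->list, &pos->list);
			return 0;
		}
		cur->gfn_start = min(cur->gfn_start, start);
		cur->gfn_end = max(cur->gfn_end, end);
		/* 2)+3) swallow every later node the grown region now covers or abuts */
		while (!list_is_last(&cur->list, head)) {
			next = list_next_entry(cur, list);
			if (next->gfn_start > cur->gfn_end + 1)
				break;
			cur->gfn_end = max(cur->gfn_end, next->gfn_end);
			list_del(&next->list);
			kfree(next);
		}
		return 0;
	}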
>
> +                       return 0;
> +               }
> +               break;
> +       }
>
> +       /*
> +        * Add a new node, allocate nodes using GFP_KERNEL_ACCOUNT so that
> +        * kernel memory can be tracked/throttled in case a
> +        * malicious guest makes infinite number of hypercalls to
> +        * exhaust host kernel memory and cause a DOS attack.
> +        */
> +       shrd_region = kzalloc(sizeof(*shrd_region), GFP_KERNEL_ACCOUNT);
> +       if (!shrd_region)
> +               return -ENOMEM;
> +       shrd_region->gfn_start = start;
> +       shrd_region->gfn_end = end;
> +       list_add_tail(&shrd_region->list, &pos->list);
> +       return 1;
>
> +}
> +

Thanks!
Steve

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v10 12/16] KVM: x86: Introduce new KVM_FEATURE_SEV_LIVE_MIGRATION feature & Custom MSR.
  2021-02-05  0:56   ` Steve Rutherford
@ 2021-02-05  3:07     ` Ashish Kalra
  2021-02-06  2:54       ` Steve Rutherford
  0 siblings, 1 reply; 75+ messages in thread
From: Ashish Kalra @ 2021-02-05  3:07 UTC (permalink / raw)
  To: Steve Rutherford
  Cc: Paolo Bonzini, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Radim Krčmář,
	Joerg Roedel, Borislav Petkov, Tom Lendacky, X86 ML, KVM list,
	LKML, Sean Christopherson, Venu Busireddy, Brijesh Singh

Hello Steve,

On Thu, Feb 04, 2021 at 04:56:35PM -0800, Steve Rutherford wrote:
> On Wed, Feb 3, 2021 at 4:39 PM Ashish Kalra <Ashish.Kalra@amd.com> wrote:
> >
> > From: Ashish Kalra <ashish.kalra@amd.com>
> >
> > Add new KVM_FEATURE_SEV_LIVE_MIGRATION feature for guest to check
> > for host-side support for SEV live migration. Also add a new custom
> > MSR_KVM_SEV_LIVE_MIGRATION for guest to enable the SEV live migration
> > feature.
> >
> > Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> > ---
> >  Documentation/virt/kvm/cpuid.rst     |  5 +++++
> >  Documentation/virt/kvm/msr.rst       | 12 ++++++++++++
> >  arch/x86/include/uapi/asm/kvm_para.h |  4 ++++
> >  arch/x86/kvm/svm/sev.c               | 13 +++++++++++++
> >  arch/x86/kvm/svm/svm.c               | 16 ++++++++++++++++
> >  arch/x86/kvm/svm/svm.h               |  2 ++
> >  6 files changed, 52 insertions(+)
> >
> > diff --git a/Documentation/virt/kvm/cpuid.rst b/Documentation/virt/kvm/cpuid.rst
> > index cf62162d4be2..0bdb6cdb12d3 100644
> > --- a/Documentation/virt/kvm/cpuid.rst
> > +++ b/Documentation/virt/kvm/cpuid.rst
> > @@ -96,6 +96,11 @@ KVM_FEATURE_MSI_EXT_DEST_ID        15          guest checks this feature bit
> >                                                 before using extended destination
> >                                                 ID bits in MSI address bits 11-5.
> >
> > +KVM_FEATURE_SEV_LIVE_MIGRATION     16          guest checks this feature bit before
> > +                                               using the page encryption state
> > +                                               hypercall to notify the page state
> > +                                               change
> > +
> >  KVM_FEATURE_CLOCKSOURCE_STABLE_BIT 24          host will warn if no guest-side
> >                                                 per-cpu warps are expected in
> >                                                 kvmclock
> > diff --git a/Documentation/virt/kvm/msr.rst b/Documentation/virt/kvm/msr.rst
> > index e37a14c323d2..020245d16087 100644
> > --- a/Documentation/virt/kvm/msr.rst
> > +++ b/Documentation/virt/kvm/msr.rst
> > @@ -376,3 +376,15 @@ data:
> >         write '1' to bit 0 of the MSR, this causes the host to re-scan its queue
> >         and check if there are more notifications pending. The MSR is available
> >         if KVM_FEATURE_ASYNC_PF_INT is present in CPUID.
> > +
> > +MSR_KVM_SEV_LIVE_MIGRATION:
> > +        0x4b564d08
> > +
> > +       Control SEV Live Migration features.
> > +
> > +data:
> > +        Bit 0 enables (1) or disables (0) host-side SEV Live Migration feature,
> > +        in other words, this is guest->host communication that it's properly
> > +        handling the shared pages list.
> > +
> > +        All other bits are reserved.
> > diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
> > index 950afebfba88..f6bfa138874f 100644
> > --- a/arch/x86/include/uapi/asm/kvm_para.h
> > +++ b/arch/x86/include/uapi/asm/kvm_para.h
> > @@ -33,6 +33,7 @@
> >  #define KVM_FEATURE_PV_SCHED_YIELD     13
> >  #define KVM_FEATURE_ASYNC_PF_INT       14
> >  #define KVM_FEATURE_MSI_EXT_DEST_ID    15
> > +#define KVM_FEATURE_SEV_LIVE_MIGRATION 16
> >
> >  #define KVM_HINTS_REALTIME      0
> >
> > @@ -54,6 +55,7 @@
> >  #define MSR_KVM_POLL_CONTROL   0x4b564d05
> >  #define MSR_KVM_ASYNC_PF_INT   0x4b564d06
> >  #define MSR_KVM_ASYNC_PF_ACK   0x4b564d07
> > +#define MSR_KVM_SEV_LIVE_MIGRATION     0x4b564d08
> >
> >  struct kvm_steal_time {
> >         __u64 steal;
> > @@ -136,4 +138,6 @@ struct kvm_vcpu_pv_apf_data {
> >  #define KVM_PV_EOI_ENABLED KVM_PV_EOI_MASK
> >  #define KVM_PV_EOI_DISABLED 0x0
> >
> > +#define KVM_SEV_LIVE_MIGRATION_ENABLED BIT_ULL(0)
> > +
> >  #endif /* _UAPI_ASM_X86_KVM_PARA_H */
> > diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> > index b0d324aed515..93f42b3d3e33 100644
> > --- a/arch/x86/kvm/svm/sev.c
> > +++ b/arch/x86/kvm/svm/sev.c
> > @@ -1627,6 +1627,16 @@ int svm_page_enc_status_hc(struct kvm *kvm, unsigned long gpa,
> >         return ret;
> >  }
> >
> > +void sev_update_migration_flags(struct kvm *kvm, u64 data)
> > +{
> > +       struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> > +
> > +       if (!sev_guest(kvm))
> > +               return;
> 
> This should assert that userspace wanted the guest to be able to make
> these calls (see more below).
> 
> >
> > +
> > +       sev->live_migration_enabled = !!(data & KVM_SEV_LIVE_MIGRATION_ENABLED);
> > +}
> > +
> >  int svm_get_shared_pages_list(struct kvm *kvm,
> >                               struct kvm_shared_pages_list *list)
> >  {
> > @@ -1639,6 +1649,9 @@ int svm_get_shared_pages_list(struct kvm *kvm,
> >         if (!sev_guest(kvm))
> >                 return -ENOTTY;
> >
> > +       if (!sev->live_migration_enabled)
> > +               return -EINVAL;
> > +
> >         if (!list->size)
> >                 return -EINVAL;
> >
> > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > index 58f89f83caab..43ea5061926f 100644
> > --- a/arch/x86/kvm/svm/svm.c
> > +++ b/arch/x86/kvm/svm/svm.c
> > @@ -2903,6 +2903,9 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
> >                 svm->msr_decfg = data;
> >                 break;
> >         }
> > +       case MSR_KVM_SEV_LIVE_MIGRATION:
> > +               sev_update_migration_flags(vcpu->kvm, data);
> > +               break;
> >         case MSR_IA32_APICBASE:
> >                 if (kvm_vcpu_apicv_active(vcpu))
> >                         avic_update_vapic_bar(to_svm(vcpu), data);
> > @@ -3976,6 +3979,19 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
> >                         vcpu->arch.cr3_lm_rsvd_bits &= ~(1UL << (best->ebx & 0x3f));
> >         }
> >
> > +       /*
> > +        * If SEV guest then enable the Live migration feature.
> > +        */
> > +       if (sev_guest(vcpu->kvm)) {
> > +               struct kvm_cpuid_entry2 *best;
> > +
> > +               best = kvm_find_cpuid_entry(vcpu, KVM_CPUID_FEATURES, 0);
> > +               if (!best)
> > +                       return;
> > +
> > +               best->eax |= (1 << KVM_FEATURE_SEV_LIVE_MIGRATION);
> > +       }
> > +
> 
> Looking at this, I believe the only way for this bit to get enabled is
> if userspace toggles it. There needs to be a way for userspace to
> identify if the kernel underneath them does, in fact, support SEV LM.
> I'm at risk for having misread these patches (it's a long series), but
> I don't see anything that communicates upwards.
> 
> This could go upward with the other paravirt features flags in
> cpuid.c. It could also be an explicit KVM Capability (checked through
> check_extension).
> 
> Userspace should then have a chance to decide whether or not this
> should be enabled. And when it's not enabled, the host should return a
> GP in response to the hypercall. This could be configured either
> through userspace stripping out the LM feature bit, or by calling a VM
> scoped enable cap (KVM_VM_IOCTL_ENABLE_CAP).
> 
> I believe the typical path for a feature like this to be configured
> would be to use ENABLE_CAP.

I believe we have discussed and reviewed this earlier too. 

To summarize this feature: the host indicates whether it supports the
live migration feature, and the feature and the hypercall are only
enabled on the host once the guest checks for this support and does a
wrmsrl() to enable the feature. Also, the guest will not make the
hypercall if the host does not indicate support for it.

And these were your review comments on the above:

    "I see I misunderstood how the CPUID bits get passed through:
    usermode can still override them. Forgot about the back and forth
    for CPUID with usermode."

So as you mentioned, userspace can still override these and it gets a 
chance to decide whether or not this should be enabled.

Thanks,
Ashish

> >
> >         if (!kvm_vcpu_apicv_active(vcpu))
> >                 return;
> >
> > diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> > index 066ca2a9f1e6..e1bffc11e425 100644
> > --- a/arch/x86/kvm/svm/svm.h
> > +++ b/arch/x86/kvm/svm/svm.h
> > @@ -79,6 +79,7 @@ struct kvm_sev_info {
> >         unsigned long pages_locked; /* Number of pages locked */
> >         struct list_head regions_list;  /* List of registered regions */
> >         u64 ap_jump_table;      /* SEV-ES AP Jump Table address */
> > +       bool live_migration_enabled;
> >         /* List and count of shared pages */
> >         int shared_pages_list_count;
> >         struct list_head shared_pages_list;
> > @@ -592,6 +593,7 @@ int svm_unregister_enc_region(struct kvm *kvm,
> >  void pre_sev_run(struct vcpu_svm *svm, int cpu);
> >  void __init sev_hardware_setup(void);
> >  void sev_hardware_teardown(void);
> > +void sev_update_migration_flags(struct kvm *kvm, u64 data);
> >  void sev_free_vcpu(struct kvm_vcpu *vcpu);
> >  int sev_handle_vmgexit(struct vcpu_svm *svm);
> >  int sev_es_string_io(struct vcpu_svm *svm, int size, unsigned int port, int in);
> > --
> > 2.17.1
> >

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v10 08/16] KVM: X86: Introduce KVM_HC_PAGE_ENC_STATUS hypercall
  2021-02-05  1:44   ` Steve Rutherford
@ 2021-02-05  3:32     ` Ashish Kalra
  0 siblings, 0 replies; 75+ messages in thread
From: Ashish Kalra @ 2021-02-05  3:32 UTC (permalink / raw)
  To: Steve Rutherford
  Cc: Paolo Bonzini, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Radim Krčmář,
	Joerg Roedel, Borislav Petkov, Tom Lendacky, X86 ML, KVM list,
	LKML, Sean Christopherson, Venu Busireddy, Brijesh Singh

Hello Steve,

On Thu, Feb 04, 2021 at 05:44:27PM -0800, Steve Rutherford wrote:
> On Wed, Feb 3, 2021 at 4:38 PM Ashish Kalra <Ashish.Kalra@amd.com> wrote:
> >
> > From: Brijesh Singh <brijesh.singh@amd.com>
> >
> > This hypercall is used by the SEV guest to notify a change in the page
> > encryption status to the hypervisor. The hypercall should be invoked
> > only when the encryption attribute is changed from encrypted -> decrypted
> > and vice versa. By default all guest pages are considered encrypted.
> >
> > The patch introduces a new shared pages list implemented as a
> > sorted linked list to track the shared/unencrypted regions marked by the
> > guest hypercall.
> >
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Ingo Molnar <mingo@redhat.com>
> > Cc: "H. Peter Anvin" <hpa@zytor.com>
> > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > Cc: "Radim Krčmář" <rkrcmar@redhat.com>
> > Cc: Joerg Roedel <joro@8bytes.org>
> > Cc: Borislav Petkov <bp@suse.de>
> > Cc: Tom Lendacky <thomas.lendacky@amd.com>
> > Cc: x86@kernel.org
> > Cc: kvm@vger.kernel.org
> > Cc: linux-kernel@vger.kernel.org
> > Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> > Co-developed-by: Ashish Kalra <ashish.kalra@amd.com>
> > Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> > ---
> >  Documentation/virt/kvm/hypercalls.rst |  15 +++
> >  arch/x86/include/asm/kvm_host.h       |   2 +
> >  arch/x86/kvm/svm/sev.c                | 150 ++++++++++++++++++++++++++
> >  arch/x86/kvm/svm/svm.c                |   2 +
> >  arch/x86/kvm/svm/svm.h                |   5 +
> >  arch/x86/kvm/vmx/vmx.c                |   1 +
> >  arch/x86/kvm/x86.c                    |   6 ++
> >  include/uapi/linux/kvm_para.h         |   1 +
> >  8 files changed, 182 insertions(+)
> >
> > diff --git a/Documentation/virt/kvm/hypercalls.rst b/Documentation/virt/kvm/hypercalls.rst
> > index ed4fddd364ea..7aff0cebab7c 100644
> > --- a/Documentation/virt/kvm/hypercalls.rst
> > +++ b/Documentation/virt/kvm/hypercalls.rst
> > @@ -169,3 +169,18 @@ a0: destination APIC ID
> >
> >  :Usage example: When sending a call-function IPI-many to vCPUs, yield if
> >                 any of the IPI target vCPUs was preempted.
> > +
> > +
> > +8. KVM_HC_PAGE_ENC_STATUS
> > +-------------------------
> > +:Architecture: x86
> > +:Status: active
> > +:Purpose: Notify the encryption status changes in guest page table (SEV guest)
> > +
> > +a0: the guest physical address of the start page
> > +a1: the number of pages
> > +a2: encryption attribute
> > +
> > +   Where:
> > +       * 1: Encryption attribute is set
> > +       * 0: Encryption attribute is cleared
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index 3d6616f6f6ef..2da5f5e2a10e 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -1301,6 +1301,8 @@ struct kvm_x86_ops {
> >         int (*complete_emulated_msr)(struct kvm_vcpu *vcpu, int err);
> >
> >         void (*vcpu_deliver_sipi_vector)(struct kvm_vcpu *vcpu, u8 vector);
> > +       int (*page_enc_status_hc)(struct kvm *kvm, unsigned long gpa,
> > +                                 unsigned long sz, unsigned long mode);
> >  };
> >
> >  struct kvm_x86_nested_ops {
> > diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> > index 25eaf35ba51d..55c628df5155 100644
> > --- a/arch/x86/kvm/svm/sev.c
> > +++ b/arch/x86/kvm/svm/sev.c
> > @@ -45,6 +45,11 @@ struct enc_region {
> >         unsigned long size;
> >  };
> >
> > +struct shared_region {
> > +       struct list_head list;
> > +       unsigned long gfn_start, gfn_end;
> > +};
> > +
> >  static int sev_flush_asids(void)
> >  {
> >         int ret, error = 0;
> > @@ -196,6 +201,8 @@ static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
> >         sev->active = true;
> >         sev->asid = asid;
> >         INIT_LIST_HEAD(&sev->regions_list);
> > +       INIT_LIST_HEAD(&sev->shared_pages_list);
> > +       sev->shared_pages_list_count = 0;
> >
> >         return 0;
> >
> > @@ -1473,6 +1480,148 @@ static int sev_receive_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
> >         return ret;
> >  }
> >
> > +static int remove_shared_region(unsigned long start, unsigned long end,
> > +                               struct list_head *head)
> > +{
> > +       struct shared_region *pos;
> > +
> > +       list_for_each_entry(pos, head, list) {
> > +               if (pos->gfn_start == start &&
> > +                   pos->gfn_end == end) {
> > +                       list_del(&pos->list);
> > +                       kfree(pos);
> > +                       return -1;
> > +               } else if (start >= pos->gfn_start && end <= pos->gfn_end) {
> > +                       if (start == pos->gfn_start)
> > +                               pos->gfn_start = end + 1;
> > +                       else if (end == pos->gfn_end)
> > +                               pos->gfn_end = start - 1;
> > +                       else {
> > +                               /* Do a de-merge -- split linked list nodes */
> > +                               unsigned long tmp;
> > +                               struct shared_region *shrd_region;
> > +
> > +                               tmp = pos->gfn_end;
> > +                               pos->gfn_end = start-1;
> > +                               shrd_region = kzalloc(sizeof(*shrd_region), GFP_KERNEL_ACCOUNT);
> > +                               if (!shrd_region)
> > +                                       return -ENOMEM;
> > +                               shrd_region->gfn_start = end + 1;
> > +                               shrd_region->gfn_end = tmp;
> > +                               list_add(&shrd_region->list, &pos->list);
> > +                               return 1;
> > +                       }
> > +                       return 0;
> > +               }
> > +       }
> 
> This doesn't handle the case where the region being marked as
> encrypted is larger than the unencrypted region under
> consideration, which (I believe) can happen with the current kexec
> handling (since it is oblivious to the prior state).
> I would probably break this down into the "five cases": no
> intersection (skip), entry is completely contained (drop), entry
> completely contains the removed region (split), intersect start
> (chop), and intersect end (chop).
> 

I believe that the above is already handling these cases:

1) no intersection (skip): handled
2) entry is completely contained (drop): handled
3) entry completely contains the removed region (split): handled
4) intersect start in case of #3 (chop): handled
5) intersect end in case of #3 (chop): handled

The one case it does not handle currently is where the region being marked 
as encrypted is larger than the unencrypted region under consideration.

> >
> > +       return 0;
> > +}
> > +
> > +static int add_shared_region(unsigned long start, unsigned long end,
> > +                            struct list_head *shared_pages_list)
> > +{
> > +       struct list_head *head = shared_pages_list;
> > +       struct shared_region *shrd_region;
> > +       struct shared_region *pos;
> > +
> > +       if (list_empty(head)) {
> > +               shrd_region = kzalloc(sizeof(*shrd_region), GFP_KERNEL_ACCOUNT);
> > +               if (!shrd_region)
> > +                       return -ENOMEM;
> > +               shrd_region->gfn_start = start;
> > +               shrd_region->gfn_end = end;
> > +               list_add_tail(&shrd_region->list, head);
> > +               return 1;
> > +       }
> > +
> > +       /*
> > +        * Shared pages list is a sorted list in ascending order of
> > +        * guest PA's and also merges consecutive range of guest PA's
> > +        */
> > +       list_for_each_entry(pos, head, list) {
> > +               if (pos->gfn_end < start)
> > +                       continue;
> > +               /* merge consecutive guest PA(s) */
> > +               if (pos->gfn_start <= start && pos->gfn_end >= start) {
> > +                       pos->gfn_end = end;
> 
> I'm not sure this correctly avoids having duplicate overlapping
> elements in the list. It also doesn't merge consecutive contiguous
> regions. Current guest implementation should never call the hypercall
> with C=0 for the same region twice, without calling with c=1 in
> between, but this API should be compatible with that model.

Yes, it does not handle duplicate overlapping elements being added to
the list; I will fix that.

And merging consecutive contiguous regions is supported; the code
merges consecutive GPAs, as the example below shows:

...
[   50.894254] marking as unencrypted, gfn_start = c0000, gfn_end = fc000
[   50.894255] inserting new node @guest PA = c0000
[   50.894259] marking as unencrypted, gfn_start = fc000, gfn_end = fec00
[   50.894260] merging consecutive GPA start = c0000, end = fc000
[   50.894267] marking as unencrypted, gfn_start = fec00, gfn_end = fec01
[   50.894267] merging consecutive GPA start = c0000, end = fec00
[   50.894274] marking as unencrypted, gfn_start = fec01, gfn_end = fed00
[   50.894274] merging consecutive GPA start = c0000, end = fec01
[   50.894278] marking as unencrypted, gfn_start = fed00, gfn_end = fed01
[   50.894279] merging consecutive GPA start = c0000, end = fed00
[   50.894283] marking as unencrypted, gfn_start = fed00, gfn_end = fed1c
[   50.894283] merging consecutive GPA start = c0000, end = fed01
[   50.894287] marking as unencrypted, gfn_start = fed1c, gfn_end = fed20
[   50.894288] merging consecutive GPA start = c0000, end = fed1c
[   50.894294] marking as unencrypted, gfn_start = fed20, gfn_end = fee00
[   50.894294] merging consecutive GPA start = c0000, end = fed20
[   50.894303] marking as unencrypted, gfn_start = fee00, gfn_end = fef00
[   50.894304] merging consecutive GPA start = c0000, end = fee00
[   50.894669] marking as unencrypted, gfn_start = fef00, gfn_end = 1000000
...

The above is merged into a single entry, as shown below:

qemu-system-x86_64: gfn start = c0000, gfn_end = 1000000

> 
> The easiest pattern would probably be to:
> 1) find (or insert) the node that will contain the added region.

Except for the find part, the above is handled.

> 2) remove the contents of the added region from the tail (will
> typically do nothing).
> 3) merge the head of the tail into the current node, if the end of the
> current node matches the start of that head.

This is also being handled.

Thanks,
Ashish

> >
> > +                       return 0;
> > +               }
> > +               break;
> > +       }
> >
> > +       /*
> > +        * Add a new node, allocate nodes using GFP_KERNEL_ACCOUNT so that
> > +        * kernel memory can be tracked/throttled in case a
> > +        * malicious guest makes infinite number of hypercalls to
> > +        * exhaust host kernel memory and cause a DOS attack.
> > +        */
> > +       shrd_region = kzalloc(sizeof(*shrd_region), GFP_KERNEL_ACCOUNT);
> > +       if (!shrd_region)
> > +               return -ENOMEM;
> > +       shrd_region->gfn_start = start;
> > +       shrd_region->gfn_end = end;
> > +       list_add_tail(&shrd_region->list, &pos->list);
> > +       return 1;
> >
> > +}
> > +
> 
> Thanks!
> Steve

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v10 12/16] KVM: x86: Introduce new KVM_FEATURE_SEV_LIVE_MIGRATION feature & Custom MSR.
  2021-02-05  3:07     ` Ashish Kalra
@ 2021-02-06  2:54       ` Steve Rutherford
  2021-02-06  4:49         ` Ashish Kalra
  2021-02-06  5:46         ` Ashish Kalra
  0 siblings, 2 replies; 75+ messages in thread
From: Steve Rutherford @ 2021-02-06  2:54 UTC (permalink / raw)
  To: Ashish Kalra
  Cc: Paolo Bonzini, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Radim Krčmář,
	Joerg Roedel, Borislav Petkov, Tom Lendacky, X86 ML, KVM list,
	LKML, Sean Christopherson, Venu Busireddy, Brijesh Singh

On Thu, Feb 4, 2021 at 7:08 PM Ashish Kalra <ashish.kalra@amd.com> wrote:
>
> Hello Steve,
>
> On Thu, Feb 04, 2021 at 04:56:35PM -0800, Steve Rutherford wrote:
> > On Wed, Feb 3, 2021 at 4:39 PM Ashish Kalra <Ashish.Kalra@amd.com> wrote:
> > >
> > > From: Ashish Kalra <ashish.kalra@amd.com>
> > >
> > > Add new KVM_FEATURE_SEV_LIVE_MIGRATION feature for guest to check
> > > for host-side support for SEV live migration. Also add a new custom
> > > MSR_KVM_SEV_LIVE_MIGRATION for guest to enable the SEV live migration
> > > feature.
> > >
> > > Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> > > ---
> > >  Documentation/virt/kvm/cpuid.rst     |  5 +++++
> > >  Documentation/virt/kvm/msr.rst       | 12 ++++++++++++
> > >  arch/x86/include/uapi/asm/kvm_para.h |  4 ++++
> > >  arch/x86/kvm/svm/sev.c               | 13 +++++++++++++
> > >  arch/x86/kvm/svm/svm.c               | 16 ++++++++++++++++
> > >  arch/x86/kvm/svm/svm.h               |  2 ++
> > >  6 files changed, 52 insertions(+)
> > >
> > > diff --git a/Documentation/virt/kvm/cpuid.rst b/Documentation/virt/kvm/cpuid.rst
> > > index cf62162d4be2..0bdb6cdb12d3 100644
> > > --- a/Documentation/virt/kvm/cpuid.rst
> > > +++ b/Documentation/virt/kvm/cpuid.rst
> > > @@ -96,6 +96,11 @@ KVM_FEATURE_MSI_EXT_DEST_ID        15          guest checks this feature bit
> > >                                                 before using extended destination
> > >                                                 ID bits in MSI address bits 11-5.
> > >
> > > +KVM_FEATURE_SEV_LIVE_MIGRATION     16          guest checks this feature bit before
> > > +                                               using the page encryption state
> > > +                                               hypercall to notify the page state
> > > +                                               change
> > > +
> > >  KVM_FEATURE_CLOCKSOURCE_STABLE_BIT 24          host will warn if no guest-side
> > >                                                 per-cpu warps are expected in
> > >                                                 kvmclock
> > > diff --git a/Documentation/virt/kvm/msr.rst b/Documentation/virt/kvm/msr.rst
> > > index e37a14c323d2..020245d16087 100644
> > > --- a/Documentation/virt/kvm/msr.rst
> > > +++ b/Documentation/virt/kvm/msr.rst
> > > @@ -376,3 +376,15 @@ data:
> > >         write '1' to bit 0 of the MSR, this causes the host to re-scan its queue
> > >         and check if there are more notifications pending. The MSR is available
> > >         if KVM_FEATURE_ASYNC_PF_INT is present in CPUID.
> > > +
> > > +MSR_KVM_SEV_LIVE_MIGRATION:
> > > +        0x4b564d08
> > > +
> > > +       Control SEV Live Migration features.
> > > +
> > > +data:
> > > +        Bit 0 enables (1) or disables (0) host-side SEV Live Migration feature,
> > > +        in other words, this is guest->host communication that it's properly
> > > +        handling the shared pages list.
> > > +
> > > +        All other bits are reserved.
> > > diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
> > > index 950afebfba88..f6bfa138874f 100644
> > > --- a/arch/x86/include/uapi/asm/kvm_para.h
> > > +++ b/arch/x86/include/uapi/asm/kvm_para.h
> > > @@ -33,6 +33,7 @@
> > >  #define KVM_FEATURE_PV_SCHED_YIELD     13
> > >  #define KVM_FEATURE_ASYNC_PF_INT       14
> > >  #define KVM_FEATURE_MSI_EXT_DEST_ID    15
> > > +#define KVM_FEATURE_SEV_LIVE_MIGRATION 16
> > >
> > >  #define KVM_HINTS_REALTIME      0
> > >
> > > @@ -54,6 +55,7 @@
> > >  #define MSR_KVM_POLL_CONTROL   0x4b564d05
> > >  #define MSR_KVM_ASYNC_PF_INT   0x4b564d06
> > >  #define MSR_KVM_ASYNC_PF_ACK   0x4b564d07
> > > +#define MSR_KVM_SEV_LIVE_MIGRATION     0x4b564d08
> > >
> > >  struct kvm_steal_time {
> > >         __u64 steal;
> > > @@ -136,4 +138,6 @@ struct kvm_vcpu_pv_apf_data {
> > >  #define KVM_PV_EOI_ENABLED KVM_PV_EOI_MASK
> > >  #define KVM_PV_EOI_DISABLED 0x0
> > >
> > > +#define KVM_SEV_LIVE_MIGRATION_ENABLED BIT_ULL(0)
> > > +
> > >  #endif /* _UAPI_ASM_X86_KVM_PARA_H */
> > > diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> > > index b0d324aed515..93f42b3d3e33 100644
> > > --- a/arch/x86/kvm/svm/sev.c
> > > +++ b/arch/x86/kvm/svm/sev.c
> > > @@ -1627,6 +1627,16 @@ int svm_page_enc_status_hc(struct kvm *kvm, unsigned long gpa,
> > >         return ret;
> > >  }
> > >
> > > +void sev_update_migration_flags(struct kvm *kvm, u64 data)
> > > +{
> > > +       struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> > > +
> > > +       if (!sev_guest(kvm))
> > > +               return;
> >
> > This should assert that userspace wanted the guest to be able to make
> > these calls (see more below).
> >
> > >
> > > +
> > > +       sev->live_migration_enabled = !!(data & KVM_SEV_LIVE_MIGRATION_ENABLED);
> > > +}
> > > +
> > >  int svm_get_shared_pages_list(struct kvm *kvm,
> > >                               struct kvm_shared_pages_list *list)
> > >  {
> > > @@ -1639,6 +1649,9 @@ int svm_get_shared_pages_list(struct kvm *kvm,
> > >         if (!sev_guest(kvm))
> > >                 return -ENOTTY;
> > >
> > > +       if (!sev->live_migration_enabled)
> > > +               return -EINVAL;

This is currently under guest control, so I'm not certain this is
helpful. If I called this with otherwise valid parameters, and got
back -EINVAL, I would probably think the bug is on my end. But it
could be on the guest's end! I would probably drop this, but you could
have KVM return an empty list of regions when this happens.

Alternatively, as explained below, this could call guest_pv_has instead.

>
> > > +
> > >         if (!list->size)
> > >                 return -EINVAL;
> > >
> > > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > > index 58f89f83caab..43ea5061926f 100644
> > > --- a/arch/x86/kvm/svm/svm.c
> > > +++ b/arch/x86/kvm/svm/svm.c
> > > @@ -2903,6 +2903,9 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
> > >                 svm->msr_decfg = data;
> > >                 break;
> > >         }
> > > +       case MSR_KVM_SEV_LIVE_MIGRATION:
> > > +               sev_update_migration_flags(vcpu->kvm, data);
> > > +               break;
> > >         case MSR_IA32_APICBASE:
> > >                 if (kvm_vcpu_apicv_active(vcpu))
> > >                         avic_update_vapic_bar(to_svm(vcpu), data);
> > > @@ -3976,6 +3979,19 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
> > >                         vcpu->arch.cr3_lm_rsvd_bits &= ~(1UL << (best->ebx & 0x3f));
> > >         }
> > >
> > > +       /*
> > > +        * If SEV guest then enable the Live migration feature.
> > > +        */
> > > +       if (sev_guest(vcpu->kvm)) {
> > > +               struct kvm_cpuid_entry2 *best;
> > > +
> > > +               best = kvm_find_cpuid_entry(vcpu, KVM_CPUID_FEATURES, 0);
> > > +               if (!best)
> > > +                       return;
> > > +
> > > +               best->eax |= (1 << KVM_FEATURE_SEV_LIVE_MIGRATION);
> > > +       }
> > > +
> >
> > Looking at this, I believe the only way for this bit to get enabled is
> > if userspace toggles it. There needs to be a way for userspace to
> > identify if the kernel underneath them does, in fact, support SEV LM.
> > I'm at risk for having misread these patches (it's a long series), but
> > I don't see anything that communicates upwards.
> >
> > This could go upward with the other paravirt features flags in
> > cpuid.c. It could also be an explicit KVM Capability (checked through
> > check_extension).
> >
> > Userspace should then have a chance to decide whether or not this
> > should be enabled. And when it's not enabled, the host should return a
> > GP in response to the hypercall. This could be configured either
> > through userspace stripping out the LM feature bit, or by calling a VM
> > scoped enable cap (KVM_VM_IOCTL_ENABLE_CAP).
> >
> > I believe the typical path for a feature like this to be configured
> > would be to use ENABLE_CAP.
>
> I believe we have discussed and reviewed this earlier too.
>
> To summarize this feature, the host indicates if it supports the Live
> Migration feature and the feature and the hypercall are only enabled on
> the host when the guest checks for this support and does a wrmsrl() to
> enable the feature. Also the guest will not make the hypercall if the
> host does not indicate support for it.

I've gone through and read this patch a bit more closely, and the
surrounding code. Previously, I clearly misread this and the
surrounding space.

What happens if the guest just writes to the MSR anyway? Even if it
didn't receive a cue to do so? I believe the hypercall would still get
invoked here, since the hypercall does not check if SEV live migration
is enabled. Similarly, the MSR for enabling it is always available,
even if userspace didn't ask for the cpuid bit to be set. This should
not happen. Userspace should be in control of a new hypercall rolling
out.

I believe my interpretation last time was that the cpuid bit was
getting surfaced from the host kernel to host userspace, but I don't
actually see that in this patch series. Another way to ask this
question would be "How does userspace know the kernel they are on has
this patch series?". It needs some way of checking whether or not the
kernel underneath it supports SEV live migration. Technically, I think
userspace could call get_cpuid, set_cpuid (with the same values), and
then get_cpuid again, and it would be able to infer by checking the
SEV LM feature flag in the KVM leaf. This seems a bit kludgy. Checking
support should be easy.
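
To illustrate what I mean (just a sketch, untested; placing this
alongside the other paravirt feature bits in __do_cpuid_func() is my
assumption):

	/* arch/x86/kvm/cpuid.c:__do_cpuid_func(), KVM_CPUID_FEATURES leaf,
	 * appended after the existing (1 << KVM_FEATURE_*) bits:
	 */
	entry->eax |= (1 << KVM_FEATURE_SEV_LIVE_MIGRATION);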

An additional question is "how does userspace choose whether live
migration is advertised to the guest"? I believe userspace's desire
for a particular value of the paravirt feature flag in CPUID gets
overridden when they call set cpuid, since the feature flag is set in
svm_vcpu_after_set_cpuid regardless of what userspace asks for.
Userspace should have a choice in the matter.

Looking at similar paravirt-y features, there's precedent for another
way of doing this (may be preferred over CHECK_EXTENSION/ENABLE_CAP?):
this could call guest_pv_has before running the hypercall. The feature
(KVM_FEATURE_SEV_LIVE_MIGRATION) would then need to be exposed with
the other paravirt features in __do_cpuid_func. The function
guest_pv_has would represent if userspace has decided to expose SEV
live migration to the guest, and the sev->live_migration_enabled would
indicate if the guest responded affirmatively to the CPUID bit.
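
A minimal sketch of that gating on the MSR path (assuming the check
lives in svm_set_msr(), where a vcpu is available for guest_pv_has()):

	case MSR_KVM_SEV_LIVE_MIGRATION:
		if (!guest_pv_has(vcpu, KVM_FEATURE_SEV_LIVE_MIGRATION))
			return 1; /* inject #GP: bit not exposed by userspace */
		sev_update_migration_flags(vcpu->kvm, data);
		break;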

The downside of using guest_pv_has is that, if pv enforcement is
disabled, guest_pv_has will always return true, which seems a bit odd
for, say, a guest that isn't even running SEV. This isn't a deal
breaker, but using CHECK_EXTENSION and ENABLE_CAP sidesteps that. I'm
also not certain I would call this a paravirt feature.

> And these were your review comments on the above :
> I see I misunderstood how the CPUID bits get passed
> through: usermode can still override them. Forgot about the back and
> forth for CPUID with usermode.
>
> So as you mentioned, userspace can still override these and it gets a
> chance to decide whether or not this should be enabled.
>
> Thanks,
> Ashish


Thanks,
Steve

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v10 12/16] KVM: x86: Introduce new KVM_FEATURE_SEV_LIVE_MIGRATION feature & Custom MSR.
  2021-02-06  2:54       ` Steve Rutherford
@ 2021-02-06  4:49         ` Ashish Kalra
  2021-02-06  5:46         ` Ashish Kalra
  1 sibling, 0 replies; 75+ messages in thread
From: Ashish Kalra @ 2021-02-06  4:49 UTC (permalink / raw)
  To: Steve Rutherford
  Cc: Paolo Bonzini, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Radim Krčmář,
	Joerg Roedel, Borislav Petkov, Tom Lendacky, X86 ML, KVM list,
	LKML, Sean Christopherson, Venu Busireddy, Brijesh Singh

Hello Steve,

Let me first answer the queries that I can address immediately...

On Fri, Feb 05, 2021 at 06:54:21PM -0800, Steve Rutherford wrote:
> On Thu, Feb 4, 2021 at 7:08 PM Ashish Kalra <ashish.kalra@amd.com> wrote:
> >
> > Hello Steve,
> >
> > On Thu, Feb 04, 2021 at 04:56:35PM -0800, Steve Rutherford wrote:
> > > On Wed, Feb 3, 2021 at 4:39 PM Ashish Kalra <Ashish.Kalra@amd.com> wrote:
> > > >
> > > > From: Ashish Kalra <ashish.kalra@amd.com>
> > > >
> > > > Add new KVM_FEATURE_SEV_LIVE_MIGRATION feature for guest to check
> > > > for host-side support for SEV live migration. Also add a new custom
> > > > MSR_KVM_SEV_LIVE_MIGRATION for guest to enable the SEV live migration
> > > > feature.
> > > >
> > > > Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> > > > ---
> > > >  Documentation/virt/kvm/cpuid.rst     |  5 +++++
> > > >  Documentation/virt/kvm/msr.rst       | 12 ++++++++++++
> > > >  arch/x86/include/uapi/asm/kvm_para.h |  4 ++++
> > > >  arch/x86/kvm/svm/sev.c               | 13 +++++++++++++
> > > >  arch/x86/kvm/svm/svm.c               | 16 ++++++++++++++++
> > > >  arch/x86/kvm/svm/svm.h               |  2 ++
> > > >  6 files changed, 52 insertions(+)
> > > >
> > > > diff --git a/Documentation/virt/kvm/cpuid.rst b/Documentation/virt/kvm/cpuid.rst
> > > > index cf62162d4be2..0bdb6cdb12d3 100644
> > > > --- a/Documentation/virt/kvm/cpuid.rst
> > > > +++ b/Documentation/virt/kvm/cpuid.rst
> > > > @@ -96,6 +96,11 @@ KVM_FEATURE_MSI_EXT_DEST_ID        15          guest checks this feature bit
> > > >                                                 before using extended destination
> > > >                                                 ID bits in MSI address bits 11-5.
> > > >
> > > > +KVM_FEATURE_SEV_LIVE_MIGRATION     16          guest checks this feature bit before
> > > > +                                               using the page encryption state
> > > > +                                               hypercall to notify the page state
> > > > +                                               change
> > > > +
> > > >  KVM_FEATURE_CLOCKSOURCE_STABLE_BIT 24          host will warn if no guest-side
> > > >                                                 per-cpu warps are expected in
> > > >                                                 kvmclock
> > > > diff --git a/Documentation/virt/kvm/msr.rst b/Documentation/virt/kvm/msr.rst
> > > > index e37a14c323d2..020245d16087 100644
> > > > --- a/Documentation/virt/kvm/msr.rst
> > > > +++ b/Documentation/virt/kvm/msr.rst
> > > > @@ -376,3 +376,15 @@ data:
> > > >         write '1' to bit 0 of the MSR, this causes the host to re-scan its queue
> > > >         and check if there are more notifications pending. The MSR is available
> > > >         if KVM_FEATURE_ASYNC_PF_INT is present in CPUID.
> > > > +
> > > > +MSR_KVM_SEV_LIVE_MIGRATION:
> > > > +        0x4b564d08
> > > > +
> > > > +       Control SEV Live Migration features.
> > > > +
> > > > +data:
> > > > +        Bit 0 enables (1) or disables (0) host-side SEV Live Migration feature,
> > > > +        in other words, this is guest->host communication that it's properly
> > > > +        handling the shared pages list.
> > > > +
> > > > +        All other bits are reserved.
> > > > diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
> > > > index 950afebfba88..f6bfa138874f 100644
> > > > --- a/arch/x86/include/uapi/asm/kvm_para.h
> > > > +++ b/arch/x86/include/uapi/asm/kvm_para.h
> > > > @@ -33,6 +33,7 @@
> > > >  #define KVM_FEATURE_PV_SCHED_YIELD     13
> > > >  #define KVM_FEATURE_ASYNC_PF_INT       14
> > > >  #define KVM_FEATURE_MSI_EXT_DEST_ID    15
> > > > +#define KVM_FEATURE_SEV_LIVE_MIGRATION 16
> > > >
> > > >  #define KVM_HINTS_REALTIME      0
> > > >
> > > > @@ -54,6 +55,7 @@
> > > >  #define MSR_KVM_POLL_CONTROL   0x4b564d05
> > > >  #define MSR_KVM_ASYNC_PF_INT   0x4b564d06
> > > >  #define MSR_KVM_ASYNC_PF_ACK   0x4b564d07
> > > > +#define MSR_KVM_SEV_LIVE_MIGRATION     0x4b564d08
> > > >
> > > >  struct kvm_steal_time {
> > > >         __u64 steal;
> > > > @@ -136,4 +138,6 @@ struct kvm_vcpu_pv_apf_data {
> > > >  #define KVM_PV_EOI_ENABLED KVM_PV_EOI_MASK
> > > >  #define KVM_PV_EOI_DISABLED 0x0
> > > >
> > > > +#define KVM_SEV_LIVE_MIGRATION_ENABLED BIT_ULL(0)
> > > > +
> > > >  #endif /* _UAPI_ASM_X86_KVM_PARA_H */
> > > > diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> > > > index b0d324aed515..93f42b3d3e33 100644
> > > > --- a/arch/x86/kvm/svm/sev.c
> > > > +++ b/arch/x86/kvm/svm/sev.c
> > > > @@ -1627,6 +1627,16 @@ int svm_page_enc_status_hc(struct kvm *kvm, unsigned long gpa,
> > > >         return ret;
> > > >  }
> > > >
> > > > +void sev_update_migration_flags(struct kvm *kvm, u64 data)
> > > > +{
> > > > +       struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> > > > +
> > > > +       if (!sev_guest(kvm))
> > > > +               return;
> > >
> > > This should assert that userspace wanted the guest to be able to make
> > > these calls (see more below).
> > >
> > > >
> > > > +
> > > > +       sev->live_migration_enabled = !!(data & KVM_SEV_LIVE_MIGRATION_ENABLED);
> > > > +}
> > > > +
> > > >  int svm_get_shared_pages_list(struct kvm *kvm,
> > > >                               struct kvm_shared_pages_list *list)
> > > >  {
> > > > @@ -1639,6 +1649,9 @@ int svm_get_shared_pages_list(struct kvm *kvm,
> > > >         if (!sev_guest(kvm))
> > > >                 return -ENOTTY;
> > > >
> > > > +       if (!sev->live_migration_enabled)
> > > > +               return -EINVAL;
> 
> This is currently under guest control, so I'm not certain this is
> helpful. If I called this with otherwise valid parameters, and got
> back -EINVAL, I would probably think the bug is on my end. But it
> could be on the guest's end! I would probably drop this, but you could
> have KVM return an empty list of regions when this happens.

You will get -EINVAL until the guest kernel has booted to a specific
point, because live migration is only enabled after the guest kernel has
checked for host support for live migration and also for OVMF/UEFI
support for live migration, as per this guest kernel code below:

arch/x86/kernel/kvm.c:

kvm_init_platform() -> check_kvm_sev_migration()
..

arch/x86/mm/mem_encrypt.c:

void __init check_kvm_sev_migration(void)
{
        if (sev_active() &&
            kvm_para_has_feature(KVM_FEATURE_SEV_LIVE_MIGRATION)) {
                unsigned long nr_pages;

                pr_info("KVM enable live migration\n");
                sev_live_migration_enabled = true;

                /*
                 * Ensure that _bss_decrypted section is marked as decrypted in the
                 * shared pages list.
                 */
                nr_pages = DIV_ROUND_UP(__end_bss_decrypted - __start_bss_decrypted,
                                        PAGE_SIZE);
                early_set_mem_enc_dec_hypercall((unsigned long)__start_bss_decrypted,
                                                nr_pages, 0);

                /*
                 * If not booted using EFI, enable Live migration support.
                 */
                if (!efi_enabled(EFI_BOOT))
                        wrmsrl(MSR_KVM_SEV_LIVE_MIGRATION,
                               KVM_SEV_LIVE_MIGRATION_ENABLED);
        } else {
                pr_info("KVM enable live migration feature unsupported\n");
        }
}

Later, setup_kvm_sev_migration(), invoked via a late initcall, checks
whether live migration support was set up above and whether UEFI/OVMF
supports live migration, and then enables live migration on the host via
wrmsrl():

static int __init setup_kvm_sev_migration(void)
{       
        efi_char16_t efi_sev_live_migration_enabled[] = L"SevLiveMigrationEnabled";
        efi_guid_t efi_variable_guid = MEM_ENCRYPT_GUID;
        efi_status_t status;
        unsigned long size;
        bool enabled;
        
        /*
         * check_kvm_sev_migration(), invoked via kvm_init_platform() before
         * this callback, would have set up the indicator that the live
         * migration feature is supported/enabled.
         */
        if (!sev_live_migration_enabled)
                return 0;
        
        if (!efi_enabled(EFI_RUNTIME_SERVICES)) {
                pr_info("%s : EFI runtime services are not enabled\n", __func__);
                return 0;
        }

        size = sizeof(enabled);

        /* Get variable contents into buffer */
        status = efi.get_variable(efi_sev_live_migration_enabled,
                                  &efi_variable_guid, NULL, &size, &enabled);
                                  
        if (status == EFI_NOT_FOUND) {
                pr_info("%s : EFI live migration variable not found\n", __func__);
                return 0;
        }
        
        if (status != EFI_SUCCESS) {
                pr_info("%s : EFI variable retrieval failed\n", __func__);
                return 0;
        }
        
        if (enabled == 0) {
                pr_info("%s: live migration disabled in EFI\n", __func__);
                return 0;
        }

        pr_info("%s : live migration enabled in EFI\n", __func__);
        wrmsrl(MSR_KVM_SEV_LIVE_MIGRATION, KVM_SEV_LIVE_MIGRATION_ENABLED);

        return 0;
}

This ensures that host/guest live migration negotiation is completed only
when both host and guest have support for it and UEFI/OVMF also supports
it.

Please note that live migration cannot be initiated before this
negotiation is complete, which makes sense, as we don't want to enable it
until the host/guest negotiation is complete and UEFI/OVMF support for it
has been verified.

So there is a window, while the guest is booting and until the late
initcall is invoked, during which live migration cannot be initiated, and
the KVM_GET_SHARED_PAGES_LIST ioctl will return -EINVAL until then.
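
For illustration, userspace would need to tolerate that window; a hedged
sketch (the struct setup is an assumption, since the uapi definition of
struct kvm_shared_pages_list is not reproduced here):

	struct kvm_shared_pages_list list;

	memset(&list, 0, sizeof(list));
	/* fill in list.size and the output buffer per the uapi header */
	if (ioctl(vm_fd, KVM_GET_SHARED_PAGES_LIST, &list) < 0 &&
	    errno == EINVAL) {
		/* guest has not completed live migration negotiation yet */
	}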
               
> Alternatively, as explained below, this could call guest_pv_has instead.
> 
> >
> > > > +
> > > >         if (!list->size)
> > > >                 return -EINVAL;
> > > >
> > > > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > > > index 58f89f83caab..43ea5061926f 100644
> > > > --- a/arch/x86/kvm/svm/svm.c
> > > > +++ b/arch/x86/kvm/svm/svm.c
> > > > @@ -2903,6 +2903,9 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
> > > >                 svm->msr_decfg = data;
> > > >                 break;
> > > >         }
> > > > +       case MSR_KVM_SEV_LIVE_MIGRATION:
> > > > +               sev_update_migration_flags(vcpu->kvm, data);
> > > > +               break;
> > > >         case MSR_IA32_APICBASE:
> > > >                 if (kvm_vcpu_apicv_active(vcpu))
> > > >                         avic_update_vapic_bar(to_svm(vcpu), data);
> > > > @@ -3976,6 +3979,19 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
> > > >                         vcpu->arch.cr3_lm_rsvd_bits &= ~(1UL << (best->ebx & 0x3f));
> > > >         }
> > > >
> > > > +       /*
> > > > +        * If SEV guest then enable the Live migration feature.
> > > > +        */
> > > > +       if (sev_guest(vcpu->kvm)) {
> > > > +               struct kvm_cpuid_entry2 *best;
> > > > +
> > > > +               best = kvm_find_cpuid_entry(vcpu, KVM_CPUID_FEATURES, 0);
> > > > +               if (!best)
> > > > +                       return;
> > > > +
> > > > +               best->eax |= (1 << KVM_FEATURE_SEV_LIVE_MIGRATION);
> > > > +       }
> > > > +
> > >
> > > Looking at this, I believe the only way for this bit to get enabled is
> > > if userspace toggles it. There needs to be a way for userspace to
> > > identify if the kernel underneath them does, in fact, support SEV LM.
> > > I'm at risk for having misread these patches (it's a long series), but
> > > I don't see anything that communicates upwards.
> > >
> > > This could go upward with the other paravirt features flags in
> > > cpuid.c. It could also be an explicit KVM Capability (checked through
> > > check_extension).
> > >
> > > Userspace should then have a chance to decide whether or not this
> > > should be enabled. And when it's not enabled, the host should return a
> > > GP in response to the hypercall. This could be configured either
> > > through userspace stripping out the LM feature bit, or by calling a VM
> > > scoped enable cap (KVM_VM_IOCTL_ENABLE_CAP).
> > >
> > > I believe the typical path for a feature like this to be configured
> > > would be to use ENABLE_CAP.
> >
> > I believe we have discussed and reviewed this earlier too.
> >
> > To summarize this feature, the host indicates if it supports the Live
> > Migration feature and the feature and the hypercall are only enabled on
> > the host when the guest checks for this support and does a wrmsrl() to
> > enable the feature. Also the guest will not make the hypercall if the
> > host does not indicate support for it.
> 
> I've gone through and read this patch a bit more closely, and the
> surrounding code. Previously, I clearly misread this and the
> surrounding space.
> 
> What happens if the guest just writes to the MSR anyway? Even if it
> didn't receive a cue to do so? I believe the hypercall would still get
> invoked here, since the hypercall does not check if SEV live migration
> is enabled. Similarly, the MSR for enabling it is always available,
> even if userspace didn't ask for the cpuid bit to be set. This should
> not happen. Userspace should be in control of a new hypercall rolling
> out.

No, the guest will not invoke the hypercall until live migration support
has been enabled on the guest, as I described above. The hypercall
invocation code does this check, as shown below:

static void set_memory_enc_dec_hypercall(unsigned long vaddr, int npages,
                                        bool enc)
{
        unsigned long sz = npages << PAGE_SHIFT;
        unsigned long vaddr_end, vaddr_next;

        if (!sev_live_migration_enabled)
                return;
...
...

Thanks,
Ashish

> 
> I believe my interpretation last time was that the cpuid bit was
> getting surfaced from the host kernel to host userspace, but I don't
> actually see that in this patch series. Another way to ask this
> question would be "How does userspace know the kernel they are on has
> this patch series?". It needs some way of checking whether or not the
> kernel underneath it supports SEV live migration. Technically, I think
> userspace could call get_cpuid, set_cpuid (with the same values), and
> then get_cpuid again, and it would be able to infer by checking the
> SEV LM feature flag in the KVM leaf. This seems a bit kludgy. Checking
> support should be easy.
> 
> An additional question is "how does userspace choose whether live
> migration is advertised to the guest"? I believe userspace's desire
> for a particular value of the paravirt feature flag in CPUID get's
> overridden when they call set cpuid, since the feature flag is set in
> svm_vcpu_after_set_cpuid regardless of what userspace asks for.
> Userspace should have a choice in the matter.
> 
> Looking at similar paravirt-y features, there's precedent for another
> way of doing this (may be preferred over CHECK_EXTENSION/ENABLE_CAP?):
> this could call guest_pv_has before running the hypercall. The feature
> (KVM_FEATURE_SEV_LIVE_MIGRATION) would then need to be exposed with
> the other paravirt features in __do_cpuid_func. The function
> guest_pv_has would represent if userspace has decided to expose SEV
> live migration to the guest, and the sev->live_migration_enabled would
> indicate if the guest responded affirmatively to the CPUID bit.
> 
> The downside of using guest_pv_has is that, if pv enforcement is
> disabled, guest_pv_has will always return true, which seems a bit odd
> for a non-SEV guest. This isn't a deal breaker, but seems a bit odd
> for say, a guest that isn't even running SEV. Using CHECK_EXTENSION
> and ENABLE_CAP sidestep that. I'm also not certain I would call this a
> paravirt feature.
> 
> > And these were your review comments on the above :
> > I see I misunderstood how the CPUID bits get passed
> > through: usermode can still override them. Forgot about the back and
> > forth for CPUID with usermode.
> >
> > So as you mentioned, userspace can still override these and it gets a
> > chance to decide whether or not this should be enabled.
> >
> > Thanks,
> > Ashish
> 
> 
> Thanks,
> Steve

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v10 12/16] KVM: x86: Introduce new KVM_FEATURE_SEV_LIVE_MIGRATION feature & Custom MSR.
  2021-02-06  2:54       ` Steve Rutherford
  2021-02-06  4:49         ` Ashish Kalra
@ 2021-02-06  5:46         ` Ashish Kalra
  2021-02-06 13:56           ` Ashish Kalra
  1 sibling, 1 reply; 75+ messages in thread
From: Ashish Kalra @ 2021-02-06  5:46 UTC (permalink / raw)
  To: Steve Rutherford
  Cc: Paolo Bonzini, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Radim Krčmář,
	Joerg Roedel, Borislav Petkov, Tom Lendacky, X86 ML, KVM list,
	LKML, Sean Christopherson, Venu Busireddy, Brijesh Singh

Hello Steve,

Here is my continued response to your queries, especially those related
to userspace control of the SEV live migration feature:

On Fri, Feb 05, 2021 at 06:54:21PM -0800, Steve Rutherford wrote:
> On Thu, Feb 4, 2021 at 7:08 PM Ashish Kalra <ashish.kalra@amd.com> wrote:
> >
> > Hello Steve,
> >
> > On Thu, Feb 04, 2021 at 04:56:35PM -0800, Steve Rutherford wrote:
> > > On Wed, Feb 3, 2021 at 4:39 PM Ashish Kalra <Ashish.Kalra@amd.com> wrote:
> > > >
> > > > From: Ashish Kalra <ashish.kalra@amd.com>
> > > >
> > > > Add new KVM_FEATURE_SEV_LIVE_MIGRATION feature for guest to check
> > > > for host-side support for SEV live migration. Also add a new custom
> > > > MSR_KVM_SEV_LIVE_MIGRATION for guest to enable the SEV live migration
> > > > feature.
> > > >
> > > > Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> > > > ---
> > > >  Documentation/virt/kvm/cpuid.rst     |  5 +++++
> > > >  Documentation/virt/kvm/msr.rst       | 12 ++++++++++++
> > > >  arch/x86/include/uapi/asm/kvm_para.h |  4 ++++
> > > >  arch/x86/kvm/svm/sev.c               | 13 +++++++++++++
> > > >  arch/x86/kvm/svm/svm.c               | 16 ++++++++++++++++
> > > >  arch/x86/kvm/svm/svm.h               |  2 ++
> > > >  6 files changed, 52 insertions(+)
> > > >
> > > > diff --git a/Documentation/virt/kvm/cpuid.rst b/Documentation/virt/kvm/cpuid.rst
> > > > index cf62162d4be2..0bdb6cdb12d3 100644
> > > > --- a/Documentation/virt/kvm/cpuid.rst
> > > > +++ b/Documentation/virt/kvm/cpuid.rst
> > > > @@ -96,6 +96,11 @@ KVM_FEATURE_MSI_EXT_DEST_ID        15          guest checks this feature bit
> > > >                                                 before using extended destination
> > > >                                                 ID bits in MSI address bits 11-5.
> > > >
> > > > +KVM_FEATURE_SEV_LIVE_MIGRATION     16          guest checks this feature bit before
> > > > +                                               using the page encryption state
> > > > +                                               hypercall to notify the page state
> > > > +                                               change
> > > > +
> > > >  KVM_FEATURE_CLOCKSOURCE_STABLE_BIT 24          host will warn if no guest-side
> > > >                                                 per-cpu warps are expected in
> > > >                                                 kvmclock
> > > > diff --git a/Documentation/virt/kvm/msr.rst b/Documentation/virt/kvm/msr.rst
> > > > index e37a14c323d2..020245d16087 100644
> > > > --- a/Documentation/virt/kvm/msr.rst
> > > > +++ b/Documentation/virt/kvm/msr.rst
> > > > @@ -376,3 +376,15 @@ data:
> > > >         write '1' to bit 0 of the MSR, this causes the host to re-scan its queue
> > > >         and check if there are more notifications pending. The MSR is available
> > > >         if KVM_FEATURE_ASYNC_PF_INT is present in CPUID.
> > > > +
> > > > +MSR_KVM_SEV_LIVE_MIGRATION:
> > > > +        0x4b564d08
> > > > +
> > > > +       Control SEV Live Migration features.
> > > > +
> > > > +data:
> > > > +        Bit 0 enables (1) or disables (0) host-side SEV Live Migration feature,
> > > > +        in other words, this is guest->host communication that it's properly
> > > > +        handling the shared pages list.
> > > > +
> > > > +        All other bits are reserved.
> > > > diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
> > > > index 950afebfba88..f6bfa138874f 100644
> > > > --- a/arch/x86/include/uapi/asm/kvm_para.h
> > > > +++ b/arch/x86/include/uapi/asm/kvm_para.h
> > > > @@ -33,6 +33,7 @@
> > > >  #define KVM_FEATURE_PV_SCHED_YIELD     13
> > > >  #define KVM_FEATURE_ASYNC_PF_INT       14
> > > >  #define KVM_FEATURE_MSI_EXT_DEST_ID    15
> > > > +#define KVM_FEATURE_SEV_LIVE_MIGRATION 16
> > > >
> > > >  #define KVM_HINTS_REALTIME      0
> > > >
> > > > @@ -54,6 +55,7 @@
> > > >  #define MSR_KVM_POLL_CONTROL   0x4b564d05
> > > >  #define MSR_KVM_ASYNC_PF_INT   0x4b564d06
> > > >  #define MSR_KVM_ASYNC_PF_ACK   0x4b564d07
> > > > +#define MSR_KVM_SEV_LIVE_MIGRATION     0x4b564d08
> > > >
> > > >  struct kvm_steal_time {
> > > >         __u64 steal;
> > > > @@ -136,4 +138,6 @@ struct kvm_vcpu_pv_apf_data {
> > > >  #define KVM_PV_EOI_ENABLED KVM_PV_EOI_MASK
> > > >  #define KVM_PV_EOI_DISABLED 0x0
> > > >
> > > > +#define KVM_SEV_LIVE_MIGRATION_ENABLED BIT_ULL(0)
> > > > +
> > > >  #endif /* _UAPI_ASM_X86_KVM_PARA_H */
> > > > diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> > > > index b0d324aed515..93f42b3d3e33 100644
> > > > --- a/arch/x86/kvm/svm/sev.c
> > > > +++ b/arch/x86/kvm/svm/sev.c
> > > > @@ -1627,6 +1627,16 @@ int svm_page_enc_status_hc(struct kvm *kvm, unsigned long gpa,
> > > >         return ret;
> > > >  }
> > > >
> > > > +void sev_update_migration_flags(struct kvm *kvm, u64 data)
> > > > +{
> > > > +       struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> > > > +
> > > > +       if (!sev_guest(kvm))
> > > > +               return;
> > >
> > > This should assert that userspace wanted the guest to be able to make
> > > these calls (see more below).
> > >
> > > >
> > > > +
> > > > +       sev->live_migration_enabled = !!(data & KVM_SEV_LIVE_MIGRATION_ENABLED);
> > > > +}
> > > > +
> > > >  int svm_get_shared_pages_list(struct kvm *kvm,
> > > >                               struct kvm_shared_pages_list *list)
> > > >  {
> > > > @@ -1639,6 +1649,9 @@ int svm_get_shared_pages_list(struct kvm *kvm,
> > > >         if (!sev_guest(kvm))
> > > >                 return -ENOTTY;
> > > >
> > > > +       if (!sev->live_migration_enabled)
> > > > +               return -EINVAL;
> 
> This is currently under guest control, so I'm not certain this is
> helpful. If I called this with otherwise valid parameters, and got
> back -EINVAL, I would probably think the bug is on my end. But it
> could be on the guest's end! I would probably drop this, but you could
> have KVM return an empty list of regions when this happens.
> 
> Alternatively, as explained below, this could call guest_pv_has instead.
> 
> >
> > > > +
> > > >         if (!list->size)
> > > >                 return -EINVAL;
> > > >
> > > > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > > > index 58f89f83caab..43ea5061926f 100644
> > > > --- a/arch/x86/kvm/svm/svm.c
> > > > +++ b/arch/x86/kvm/svm/svm.c
> > > > @@ -2903,6 +2903,9 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
> > > >                 svm->msr_decfg = data;
> > > >                 break;
> > > >         }
> > > > +       case MSR_KVM_SEV_LIVE_MIGRATION:
> > > > +               sev_update_migration_flags(vcpu->kvm, data);
> > > > +               break;
> > > >         case MSR_IA32_APICBASE:
> > > >                 if (kvm_vcpu_apicv_active(vcpu))
> > > >                         avic_update_vapic_bar(to_svm(vcpu), data);
> > > > @@ -3976,6 +3979,19 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
> > > >                         vcpu->arch.cr3_lm_rsvd_bits &= ~(1UL << (best->ebx & 0x3f));
> > > >         }
> > > >
> > > > +       /*
> > > > +        * If SEV guest then enable the Live migration feature.
> > > > +        */
> > > > +       if (sev_guest(vcpu->kvm)) {
> > > > +               struct kvm_cpuid_entry2 *best;
> > > > +
> > > > +               best = kvm_find_cpuid_entry(vcpu, KVM_CPUID_FEATURES, 0);
> > > > +               if (!best)
> > > > +                       return;
> > > > +
> > > > +               best->eax |= (1 << KVM_FEATURE_SEV_LIVE_MIGRATION);
> > > > +       }
> > > > +
> > >
> > > Looking at this, I believe the only way for this bit to get enabled is
> > > if userspace toggles it. There needs to be a way for userspace to
> > > identify if the kernel underneath them does, in fact, support SEV LM.
> > > I'm at risk for having misread these patches (it's a long series), but
> > > I don't see anything that communicates upwards.
> > >
> > > This could go upward with the other paravirt features flags in
> > > cpuid.c. It could also be an explicit KVM Capability (checked through
> > > check_extension).
> > >
> > > Userspace should then have a chance to decide whether or not this
> > > should be enabled. And when it's not enabled, the host should return a
> > > GP in response to the hypercall. This could be configured either
> > > through userspace stripping out the LM feature bit, or by calling a VM
> > > scoped enable cap (KVM_VM_IOCTL_ENABLE_CAP).
> > >
> > > I believe the typical path for a feature like this to be configured
> > > would be to use ENABLE_CAP.
> >
> > I believe we have discussed and reviewed this earlier too.
> >
> > To summarize this feature, the host indicates if it supports the Live
> > Migration feature and the feature and the hypercall are only enabled on
> > the host when the guest checks for this support and does a wrmsrl() to
> > enable the feature. Also the guest will not make the hypercall if the
> > host does not indicate support for it.
> 
> I've gone through and read this patch a bit more closely, and the
> surrounding code. Previously, I clearly misread this and the
> surrounding space.
> 
> What happens if the guest just writes to the MSR anyway? Even if it
> didn't receive a cue to do so? I believe the hypercall would still get
> invoked here, since the hypercall does not check if SEV live migration
> is enabled. Similarly, the MSR for enabling it is always available,
> even if userspace didn't ask for the cpuid bit to be set. This should
> not happen. Userspace should be in control of a new hypercall rolling
> out.
> 
> I believe my interpretation last time was that the cpuid bit was
> getting surfaced from the host kernel to host userspace, but I don't
> actually see that in this patch series. Another way to ask this
> question would be "How does userspace know the kernel they are on has
> this patch series?". It needs some way of checking whether or not the
> kernel underneath it supports SEV live migration. Technically, I think
> userspace could call get_cpuid, set_cpuid (with the same values), and
> then get_cpuid again, and it would be able to infer by checking the
> SEV LM feature flag in the KVM leaf. This seems a bit kludgy. Checking
> support should be easy.
> 
> An additional question is "how does userspace choose whether live
> migration is advertised to the guest"? I believe userspace's desire
> for a particular value of the paravirt feature flag in CPUID get's
> overridden when they call set cpuid, since the feature flag is set in
> svm_vcpu_after_set_cpuid regardless of what userspace asks for.
> Userspace should have a choice in the matter.
> 

To summarize, KVM (host) enables the SEV live migration feature as
follows:

static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
{
...
        /*
         * If SEV guest then enable the Live migration feature.
         */
        if (sev_guest(vcpu->kvm)) {
                struct kvm_cpuid_entry2 *best;

                best = kvm_find_cpuid_entry(vcpu, KVM_CPUID_FEATURES, 0);
                if (!best)
                        return;

                best->eax |= (1 << KVM_FEATURE_SEV_LIVE_MIGRATION);
        }

...
...

Later, userspace can query the KVM_CPUID_FEATURES leaf to get the cpuid
data and override it. For example, this is how the Qemu userspace code
currently fixes up/overrides the KVM-reported CPUID features:

target/i386/kvm/kvm.c:

uint32_t kvm_arch_get_supported_cpuid(KVMState *s, uint32_t function,
                                      uint32_t index, int reg)
{
...
...

  cpuid = get_supported_cpuid(s);

  struct kvm_cpuid_entry2 *entry = cpuid_find_entry(cpuid, function, index);
  if (entry) {
      ret = cpuid_entry_get_reg(entry, reg);
  }
    
  /* Fixups for the data returned by KVM, below */

  ...
  ...

  } else if (function == KVM_CPUID_FEATURES && reg == R_EAX) {
        /* kvm_pv_unhalt is reported by GET_SUPPORTED_CPUID, but it can't
         * be enabled without the in-kernel irqchip
         */
        if (!kvm_irqchip_in_kernel()) {
            ret &= ~(1U << KVM_FEATURE_PV_UNHALT);
        }
        if (kvm_irqchip_is_split()) {
            ret |= 1U << KVM_FEATURE_MSI_EXT_DEST_ID;
        }
    } else if (function == KVM_CPUID_FEATURES && reg == R_EDX) {
        ret |= 1U << KVM_HINTS_REALTIME;
    }
    
    return ret;

So you can use a similar approach to override the
KVM_FEATURE_SEV_LIVE_MIGRATION feature.
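
For example, a sketch of such an override in
kvm_arch_get_supported_cpuid() ("sev_migration_allowed" is a made-up
name for whatever policy knob the VMM uses):

    } else if (function == KVM_CPUID_FEATURES && reg == R_EAX) {
        /* hypothetical: strip SEV LM unless migration is configured */
        if (!sev_migration_allowed) {
            ret &= ~(1U << KVM_FEATURE_SEV_LIVE_MIGRATION);
        }
    }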

Thanks,
Ashish

> Looking at similar paravirt-y features, there's precedent for another
> way of doing this (may be preferred over CHECK_EXTENSION/ENABLE_CAP?):
> this could call guest_pv_has before running the hypercall. The feature
> (KVM_FEATURE_SEV_LIVE_MIGRATION) would then need to be exposed with
> the other paravirt features in __do_cpuid_func. The function
> guest_pv_has would represent if userspace has decided to expose SEV
> live migration to the guest, and the sev->live_migration_enabled would
> indicate if the guest responded affirmatively to the CPUID bit.
> 
> The downside of using guest_pv_has is that, if pv enforcement is
> disabled, guest_pv_has will always return true, which seems a bit odd
> for a non-SEV guest. This isn't a deal breaker, but seems a bit odd
> for say, a guest that isn't even running SEV. Using CHECK_EXTENSION
> and ENABLE_CAP sidestep that. I'm also not certain I would call this a
> paravirt feature.
> 
> > And these were your review comments on the above :
> > I see I misunderstood how the CPUID bits get passed
> > through: usermode can still override them. Forgot about the back and
> > forth for CPUID with usermode.
> >
> > So as you mentioned, userspace can still override these and it gets a
> > chance to decide whether or not this should be enabled.
> >
> > Thanks,
> > Ashish
> 
> 
> Thanks,
> Steve

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v10 12/16] KVM: x86: Introduce new KVM_FEATURE_SEV_LIVE_MIGRATION feature & Custom MSR.
  2021-02-06  5:46         ` Ashish Kalra
@ 2021-02-06 13:56           ` Ashish Kalra
  2021-02-08  0:28             ` Ashish Kalra
  0 siblings, 1 reply; 75+ messages in thread
From: Ashish Kalra @ 2021-02-06 13:56 UTC (permalink / raw)
  To: Steve Rutherford
  Cc: Paolo Bonzini, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Radim Krčmář,
	Joerg Roedel, Borislav Petkov, Tom Lendacky, X86 ML, KVM list,
	LKML, Sean Christopherson, Venu Busireddy, Brijesh Singh

Hello Steve,

On Sat, Feb 06, 2021 at 05:46:17AM +0000, Ashish Kalra wrote:
> Hello Steve,
> 
> Continued response to your queries, especially related to userspace
> control of SEV live migration feature : 
> 
> On Fri, Feb 05, 2021 at 06:54:21PM -0800, Steve Rutherford wrote:
> > On Thu, Feb 4, 2021 at 7:08 PM Ashish Kalra <ashish.kalra@amd.com> wrote:
> > >
> > > Hello Steve,
> > >
> > > On Thu, Feb 04, 2021 at 04:56:35PM -0800, Steve Rutherford wrote:
> > > > On Wed, Feb 3, 2021 at 4:39 PM Ashish Kalra <Ashish.Kalra@amd.com> wrote:
> > > > >
> > > > > From: Ashish Kalra <ashish.kalra@amd.com>
> > > > >
> > > > > Add new KVM_FEATURE_SEV_LIVE_MIGRATION feature for guest to check
> > > > > for host-side support for SEV live migration. Also add a new custom
> > > > > MSR_KVM_SEV_LIVE_MIGRATION for guest to enable the SEV live migration
> > > > > feature.
> > > > >
> > > > > Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> > > > > ---
> > > > >  Documentation/virt/kvm/cpuid.rst     |  5 +++++
> > > > >  Documentation/virt/kvm/msr.rst       | 12 ++++++++++++
> > > > >  arch/x86/include/uapi/asm/kvm_para.h |  4 ++++
> > > > >  arch/x86/kvm/svm/sev.c               | 13 +++++++++++++
> > > > >  arch/x86/kvm/svm/svm.c               | 16 ++++++++++++++++
> > > > >  arch/x86/kvm/svm/svm.h               |  2 ++
> > > > >  6 files changed, 52 insertions(+)
> > > > >
> > > > > diff --git a/Documentation/virt/kvm/cpuid.rst b/Documentation/virt/kvm/cpuid.rst
> > > > > index cf62162d4be2..0bdb6cdb12d3 100644
> > > > > --- a/Documentation/virt/kvm/cpuid.rst
> > > > > +++ b/Documentation/virt/kvm/cpuid.rst
> > > > > @@ -96,6 +96,11 @@ KVM_FEATURE_MSI_EXT_DEST_ID        15          guest checks this feature bit
> > > > >                                                 before using extended destination
> > > > >                                                 ID bits in MSI address bits 11-5.
> > > > >
> > > > > +KVM_FEATURE_SEV_LIVE_MIGRATION     16          guest checks this feature bit before
> > > > > +                                               using the page encryption state
> > > > > +                                               hypercall to notify the page state
> > > > > +                                               change
> > > > > +
> > > > >  KVM_FEATURE_CLOCKSOURCE_STABLE_BIT 24          host will warn if no guest-side
> > > > >                                                 per-cpu warps are expected in
> > > > >                                                 kvmclock
> > > > > diff --git a/Documentation/virt/kvm/msr.rst b/Documentation/virt/kvm/msr.rst
> > > > > index e37a14c323d2..020245d16087 100644
> > > > > --- a/Documentation/virt/kvm/msr.rst
> > > > > +++ b/Documentation/virt/kvm/msr.rst
> > > > > @@ -376,3 +376,15 @@ data:
> > > > >         write '1' to bit 0 of the MSR, this causes the host to re-scan its queue
> > > > >         and check if there are more notifications pending. The MSR is available
> > > > >         if KVM_FEATURE_ASYNC_PF_INT is present in CPUID.
> > > > > +
> > > > > +MSR_KVM_SEV_LIVE_MIGRATION:
> > > > > +        0x4b564d08
> > > > > +
> > > > > +       Control SEV Live Migration features.
> > > > > +
> > > > > +data:
> > > > > +        Bit 0 enables (1) or disables (0) host-side SEV Live Migration feature,
> > > > > +        in other words, this is guest->host communication that it's properly
> > > > > +        handling the shared pages list.
> > > > > +
> > > > > +        All other bits are reserved.
> > > > > diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
> > > > > index 950afebfba88..f6bfa138874f 100644
> > > > > --- a/arch/x86/include/uapi/asm/kvm_para.h
> > > > > +++ b/arch/x86/include/uapi/asm/kvm_para.h
> > > > > @@ -33,6 +33,7 @@
> > > > >  #define KVM_FEATURE_PV_SCHED_YIELD     13
> > > > >  #define KVM_FEATURE_ASYNC_PF_INT       14
> > > > >  #define KVM_FEATURE_MSI_EXT_DEST_ID    15
> > > > > +#define KVM_FEATURE_SEV_LIVE_MIGRATION 16
> > > > >
> > > > >  #define KVM_HINTS_REALTIME      0
> > > > >
> > > > > @@ -54,6 +55,7 @@
> > > > >  #define MSR_KVM_POLL_CONTROL   0x4b564d05
> > > > >  #define MSR_KVM_ASYNC_PF_INT   0x4b564d06
> > > > >  #define MSR_KVM_ASYNC_PF_ACK   0x4b564d07
> > > > > +#define MSR_KVM_SEV_LIVE_MIGRATION     0x4b564d08
> > > > >
> > > > >  struct kvm_steal_time {
> > > > >         __u64 steal;
> > > > > @@ -136,4 +138,6 @@ struct kvm_vcpu_pv_apf_data {
> > > > >  #define KVM_PV_EOI_ENABLED KVM_PV_EOI_MASK
> > > > >  #define KVM_PV_EOI_DISABLED 0x0
> > > > >
> > > > > +#define KVM_SEV_LIVE_MIGRATION_ENABLED BIT_ULL(0)
> > > > > +
> > > > >  #endif /* _UAPI_ASM_X86_KVM_PARA_H */
> > > > > diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> > > > > index b0d324aed515..93f42b3d3e33 100644
> > > > > --- a/arch/x86/kvm/svm/sev.c
> > > > > +++ b/arch/x86/kvm/svm/sev.c
> > > > > @@ -1627,6 +1627,16 @@ int svm_page_enc_status_hc(struct kvm *kvm, unsigned long gpa,
> > > > >         return ret;
> > > > >  }
> > > > >
> > > > > +void sev_update_migration_flags(struct kvm *kvm, u64 data)
> > > > > +{
> > > > > +       struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> > > > > +
> > > > > +       if (!sev_guest(kvm))
> > > > > +               return;
> > > >
> > > > This should assert that userspace wanted the guest to be able to make
> > > > these calls (see more below).
> > > >
> > > > >
> > > > > +
> > > > > +       sev->live_migration_enabled = !!(data & KVM_SEV_LIVE_MIGRATION_ENABLED);
> > > > > +}
> > > > > +
> > > > >  int svm_get_shared_pages_list(struct kvm *kvm,
> > > > >                               struct kvm_shared_pages_list *list)
> > > > >  {
> > > > > @@ -1639,6 +1649,9 @@ int svm_get_shared_pages_list(struct kvm *kvm,
> > > > >         if (!sev_guest(kvm))
> > > > >                 return -ENOTTY;
> > > > >
> > > > > +       if (!sev->live_migration_enabled)
> > > > > +               return -EINVAL;
> > 
> > This is currently under guest control, so I'm not certain this is
> > helpful. If I called this with otherwise valid parameters, and got
> > back -EINVAL, I would probably think the bug is on my end. But it
> > could be on the guest's end! I would probably drop this, but you could
> > have KVM return an empty list of regions when this happens.
> > 
> > Alternatively, as explained below, this could call guest_pv_has instead.
> > 
> > >
> > > > > +
> > > > >         if (!list->size)
> > > > >                 return -EINVAL;
> > > > >
> > > > > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > > > > index 58f89f83caab..43ea5061926f 100644
> > > > > --- a/arch/x86/kvm/svm/svm.c
> > > > > +++ b/arch/x86/kvm/svm/svm.c
> > > > > @@ -2903,6 +2903,9 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
> > > > >                 svm->msr_decfg = data;
> > > > >                 break;
> > > > >         }
> > > > > +       case MSR_KVM_SEV_LIVE_MIGRATION:
> > > > > +               sev_update_migration_flags(vcpu->kvm, data);
> > > > > +               break;
> > > > >         case MSR_IA32_APICBASE:
> > > > >                 if (kvm_vcpu_apicv_active(vcpu))
> > > > >                         avic_update_vapic_bar(to_svm(vcpu), data);
> > > > > @@ -3976,6 +3979,19 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
> > > > >                         vcpu->arch.cr3_lm_rsvd_bits &= ~(1UL << (best->ebx & 0x3f));
> > > > >         }
> > > > >
> > > > > +       /*
> > > > > +        * If SEV guest then enable the Live migration feature.
> > > > > +        */
> > > > > +       if (sev_guest(vcpu->kvm)) {
> > > > > +               struct kvm_cpuid_entry2 *best;
> > > > > +
> > > > > +               best = kvm_find_cpuid_entry(vcpu, KVM_CPUID_FEATURES, 0);
> > > > > +               if (!best)
> > > > > +                       return;
> > > > > +
> > > > > +               best->eax |= (1 << KVM_FEATURE_SEV_LIVE_MIGRATION);
> > > > > +       }
> > > > > +
> > > >
> > > > Looking at this, I believe the only way for this bit to get enabled is
> > > > if userspace toggles it. There needs to be a way for userspace to
> > > > identify if the kernel underneath them does, in fact, support SEV LM.
> > > > I'm at risk for having misread these patches (it's a long series), but
> > > > I don't see anything that communicates upwards.
> > > >
> > > > This could go upward with the other paravirt features flags in
> > > > cpuid.c. It could also be an explicit KVM Capability (checked through
> > > > check_extension).
> > > >
> > > > Userspace should then have a chance to decide whether or not this
> > > > should be enabled. And when it's not enabled, the host should return a
> > > > GP in response to the hypercall. This could be configured either
> > > > through userspace stripping out the LM feature bit, or by calling a VM
> > > > scoped enable cap (KVM_VM_IOCTL_ENABLE_CAP).
> > > >
> > > > I believe the typical path for a feature like this to be configured
> > > > would be to use ENABLE_CAP.
> > >
> > > I believe we have discussed and reviewed this earlier too.
> > >
> > > To summarize this feature, the host indicates if it supports the Live
> > > Migration feature and the feature and the hypercall are only enabled on
> > > the host when the guest checks for this support and does a wrmsrl() to
> > > enable the feature. Also the guest will not make the hypercall if the
> > > host does not indicate support for it.
> > 
> > I've gone through and read this patch a bit more closely, and the
> > surrounding code. Previously, I clearly misread this and the
> > surrounding space.
> > 
> > What happens if the guest just writes to the MSR anyway? Even if it
> > didn't receive a cue to do so? I believe the hypercall would still get
> > invoked here, since the hypercall does not check if SEV live migration
> > is enabled. Similarly, the MSR for enabling it is always available,
> > even if userspace didn't ask for the cpuid bit to be set. This should
> > not happen. Userspace should be in control of a new hypercall rolling
> > out.
> > 
> > I believe my interpretation last time was that the cpuid bit was
> > getting surfaced from the host kernel to host userspace, but I don't
> > actually see that in this patch series. Another way to ask this
> > question would be "How does userspace know the kernel they are on has
> > this patch series?". It needs some way of checking whether or not the
> > kernel underneath it supports SEV live migration. Technically, I think
> > userspace could call get_cpuid, set_cpuid (with the same values), and
> > then get_cpuid again, and it would be able to infer by checking the
> > SEV LM feature flag in the KVM leaf. This seems a bit kludgy. Checking
> > support should be easy.
> > 
> > An additional question is "how does userspace choose whether live
> > migration is advertised to the guest"? I believe userspace's desire
> > for a particular value of the paravirt feature flag in CPUID get's
> > overridden when they call set cpuid, since the feature flag is set in
> > svm_vcpu_after_set_cpuid regardless of what userspace asks for.
> > Userspace should have a choice in the matter.
> > 

Actually, I did some more analysis of this, and I believe you are right
about the above: the feature flag gets set in svm_vcpu_after_set_cpuid.

So please ignore my comments below.

I am still analyzing this further.

Thanks,
Ashish
> 
> To summarize, KVM (host) enables SEV live migration feature as
> following:
> 
> static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
> {
> ...
>         /*
>          * If SEV guest then enable the Live migration feature.
>          */
>         if (sev_guest(vcpu->kvm)) {
>                 struct kvm_cpuid_entry2 *best;
> 
>                 best = kvm_find_cpuid_entry(vcpu, KVM_CPUID_FEATURES, 0);
>                 if (!best)
>                         return;
> 
>                 best->eax |= (1 << KVM_FEATURE_SEV_LIVE_MIGRATION);
>         }
> 
> ...
> ...
> 
> Later userspace can call cpuid(KVM_CPUID_FEATURES) and get the cpuid data
> and override it, for example, this is how Qemu userspace code currently
> fixups/overrides the KVM reported CPUID features : 
> 
> target/i386/kvm/kvm.c:
> 
> uint32_t kvm_arch_get_supported_cpuid(KVMState *s, uint32_t function,
>                                       uint32_t index, int reg)
> {
> ...
> ...
> 
>   cpuid = get_supported_cpuid(s);
> 
>   struct kvm_cpuid_entry2 *entry = cpuid_find_entry(cpuid, function, index);
>   if (entry) {
>       ret = cpuid_entry_get_reg(entry, reg);
>   }
>     
>   /* Fixups for the data returned by KVM, below */
> 
>   ...
>   ...
> 
>   } else if (function == KVM_CPUID_FEATURES && reg == R_EAX) {
>         /* kvm_pv_unhalt is reported by GET_SUPPORTED_CPUID, but it can't
>          * be enabled without the in-kernel irqchip
>          */
>         if (!kvm_irqchip_in_kernel()) {
>             ret &= ~(1U << KVM_FEATURE_PV_UNHALT);
>         }
>         if (kvm_irqchip_is_split()) {
>             ret |= 1U << KVM_FEATURE_MSI_EXT_DEST_ID;
>         }
>     } else if (function == KVM_CPUID_FEATURES && reg == R_EDX) {
>         ret |= 1U << KVM_HINTS_REALTIME;
>     }
>     
>     return ret;
> 
> So you can use a similar approach to override
> KVM_FEATURE_SEV_LIVE_MIGRATION feature.
> 
> Thanks,
> Ashish
> 
> > Looking at similar paravirt-y features, there's precedent for another
> > way of doing this (may be preferred over CHECK_EXTENSION/ENABLE_CAP?):
> > this could call guest_pv_has before running the hypercall. The feature
> > (KVM_FEATURE_SEV_LIVE_MIGRATION) would then need to be exposed with
> > the other paravirt features in __do_cpuid_func. The function
> > guest_pv_has would represent if userspace has decided to expose SEV
> > live migration to the guest, and the sev->live_migration_enabled would
> > indicate if the guest responded affirmatively to the CPUID bit.
> > 
> > The downside of using guest_pv_has is that, if pv enforcement is
> > disabled, guest_pv_has will always return true, which seems a bit odd
> > for a non-SEV guest. This isn't a deal breaker, but seems a bit odd
> > for say, a guest that isn't even running SEV. Using CHECK_EXTENSION
> > and ENABLE_CAP sidestep that. I'm also not certain I would call this a
> > paravirt feature.
> > 
> > > And these were your review comments on the above :
> > > I see I misunderstood how the CPUID bits get passed
> > > through: usermode can still override them. Forgot about the back and
> > > forth for CPUID with usermode.
> > >
> > > So as you mentioned, userspace can still override these and it gets a
> > > chance to decide whether or not this should be enabled.
> > >
> > > Thanks,
> > > Ashish
> > 
> > 
> > Thanks,
> > Steve

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v10 12/16] KVM: x86: Introduce new KVM_FEATURE_SEV_LIVE_MIGRATION feature & Custom MSR.
  2021-02-06 13:56           ` Ashish Kalra
@ 2021-02-08  0:28             ` Ashish Kalra
  2021-02-08 22:50               ` Steve Rutherford
  0 siblings, 1 reply; 75+ messages in thread
From: Ashish Kalra @ 2021-02-08  0:28 UTC (permalink / raw)
  To: Steve Rutherford
  Cc: Paolo Bonzini, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Radim Krčmář,
	Joerg Roedel, Borislav Petkov, Tom Lendacky, X86 ML, KVM list,
	LKML, Sean Christopherson, Venu Busireddy, Brijesh Singh

Hello Steve, 

On Sat, Feb 06, 2021 at 01:56:46PM +0000, Ashish Kalra wrote:
> Hello Steve,
> 
> On Sat, Feb 06, 2021 at 05:46:17AM +0000, Ashish Kalra wrote:
> > Hello Steve,
> > 
> > Continued response to your queries, especially related to userspace
> > control of SEV live migration feature : 
> > 
> > On Fri, Feb 05, 2021 at 06:54:21PM -0800, Steve Rutherford wrote:
> > > On Thu, Feb 4, 2021 at 7:08 PM Ashish Kalra <ashish.kalra@amd.com> wrote:
> > > >
> > > > Hello Steve,
> > > >
> > > > On Thu, Feb 04, 2021 at 04:56:35PM -0800, Steve Rutherford wrote:
> > > > > On Wed, Feb 3, 2021 at 4:39 PM Ashish Kalra <Ashish.Kalra@amd.com> wrote:
> > > > > >
> > > > > > From: Ashish Kalra <ashish.kalra@amd.com>
> > > > > >
> > > > > > Add new KVM_FEATURE_SEV_LIVE_MIGRATION feature for guest to check
> > > > > > for host-side support for SEV live migration. Also add a new custom
> > > > > > MSR_KVM_SEV_LIVE_MIGRATION for guest to enable the SEV live migration
> > > > > > feature.
> > > > > >
> > > > > > Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> > > > > > ---
> > > > > >  Documentation/virt/kvm/cpuid.rst     |  5 +++++
> > > > > >  Documentation/virt/kvm/msr.rst       | 12 ++++++++++++
> > > > > >  arch/x86/include/uapi/asm/kvm_para.h |  4 ++++
> > > > > >  arch/x86/kvm/svm/sev.c               | 13 +++++++++++++
> > > > > >  arch/x86/kvm/svm/svm.c               | 16 ++++++++++++++++
> > > > > >  arch/x86/kvm/svm/svm.h               |  2 ++
> > > > > >  6 files changed, 52 insertions(+)
> > > > > >
> > > > > > diff --git a/Documentation/virt/kvm/cpuid.rst b/Documentation/virt/kvm/cpuid.rst
> > > > > > index cf62162d4be2..0bdb6cdb12d3 100644
> > > > > > --- a/Documentation/virt/kvm/cpuid.rst
> > > > > > +++ b/Documentation/virt/kvm/cpuid.rst
> > > > > > @@ -96,6 +96,11 @@ KVM_FEATURE_MSI_EXT_DEST_ID        15          guest checks this feature bit
> > > > > >                                                 before using extended destination
> > > > > >                                                 ID bits in MSI address bits 11-5.
> > > > > >
> > > > > > +KVM_FEATURE_SEV_LIVE_MIGRATION     16          guest checks this feature bit before
> > > > > > +                                               using the page encryption state
> > > > > > +                                               hypercall to notify the page state
> > > > > > +                                               change
> > > > > > +
> > > > > >  KVM_FEATURE_CLOCKSOURCE_STABLE_BIT 24          host will warn if no guest-side
> > > > > >                                                 per-cpu warps are expected in
> > > > > >                                                 kvmclock
> > > > > > diff --git a/Documentation/virt/kvm/msr.rst b/Documentation/virt/kvm/msr.rst
> > > > > > index e37a14c323d2..020245d16087 100644
> > > > > > --- a/Documentation/virt/kvm/msr.rst
> > > > > > +++ b/Documentation/virt/kvm/msr.rst
> > > > > > @@ -376,3 +376,15 @@ data:
> > > > > >         write '1' to bit 0 of the MSR, this causes the host to re-scan its queue
> > > > > >         and check if there are more notifications pending. The MSR is available
> > > > > >         if KVM_FEATURE_ASYNC_PF_INT is present in CPUID.
> > > > > > +
> > > > > > +MSR_KVM_SEV_LIVE_MIGRATION:
> > > > > > +        0x4b564d08
> > > > > > +
> > > > > > +       Control SEV Live Migration features.
> > > > > > +
> > > > > > +data:
> > > > > > +        Bit 0 enables (1) or disables (0) host-side SEV Live Migration feature,
> > > > > > +        in other words, this is guest->host communication that it's properly
> > > > > > +        handling the shared pages list.
> > > > > > +
> > > > > > +        All other bits are reserved.
> > > > > > diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
> > > > > > index 950afebfba88..f6bfa138874f 100644
> > > > > > --- a/arch/x86/include/uapi/asm/kvm_para.h
> > > > > > +++ b/arch/x86/include/uapi/asm/kvm_para.h
> > > > > > @@ -33,6 +33,7 @@
> > > > > >  #define KVM_FEATURE_PV_SCHED_YIELD     13
> > > > > >  #define KVM_FEATURE_ASYNC_PF_INT       14
> > > > > >  #define KVM_FEATURE_MSI_EXT_DEST_ID    15
> > > > > > +#define KVM_FEATURE_SEV_LIVE_MIGRATION 16
> > > > > >
> > > > > >  #define KVM_HINTS_REALTIME      0
> > > > > >
> > > > > > @@ -54,6 +55,7 @@
> > > > > >  #define MSR_KVM_POLL_CONTROL   0x4b564d05
> > > > > >  #define MSR_KVM_ASYNC_PF_INT   0x4b564d06
> > > > > >  #define MSR_KVM_ASYNC_PF_ACK   0x4b564d07
> > > > > > +#define MSR_KVM_SEV_LIVE_MIGRATION     0x4b564d08
> > > > > >
> > > > > >  struct kvm_steal_time {
> > > > > >         __u64 steal;
> > > > > > @@ -136,4 +138,6 @@ struct kvm_vcpu_pv_apf_data {
> > > > > >  #define KVM_PV_EOI_ENABLED KVM_PV_EOI_MASK
> > > > > >  #define KVM_PV_EOI_DISABLED 0x0
> > > > > >
> > > > > > +#define KVM_SEV_LIVE_MIGRATION_ENABLED BIT_ULL(0)
> > > > > > +
> > > > > >  #endif /* _UAPI_ASM_X86_KVM_PARA_H */
> > > > > > diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> > > > > > index b0d324aed515..93f42b3d3e33 100644
> > > > > > --- a/arch/x86/kvm/svm/sev.c
> > > > > > +++ b/arch/x86/kvm/svm/sev.c
> > > > > > @@ -1627,6 +1627,16 @@ int svm_page_enc_status_hc(struct kvm *kvm, unsigned long gpa,
> > > > > >         return ret;
> > > > > >  }
> > > > > >
> > > > > > +void sev_update_migration_flags(struct kvm *kvm, u64 data)
> > > > > > +{
> > > > > > +       struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> > > > > > +
> > > > > > +       if (!sev_guest(kvm))
> > > > > > +               return;
> > > > >
> > > > > This should assert that userspace wanted the guest to be able to make
> > > > > these calls (see more below).
> > > > >
> > > > > >
> > > > > > +
> > > > > > +       sev->live_migration_enabled = !!(data & KVM_SEV_LIVE_MIGRATION_ENABLED);
> > > > > > +}
> > > > > > +
> > > > > >  int svm_get_shared_pages_list(struct kvm *kvm,
> > > > > >                               struct kvm_shared_pages_list *list)
> > > > > >  {
> > > > > > @@ -1639,6 +1649,9 @@ int svm_get_shared_pages_list(struct kvm *kvm,
> > > > > >         if (!sev_guest(kvm))
> > > > > >                 return -ENOTTY;
> > > > > >
> > > > > > +       if (!sev->live_migration_enabled)
> > > > > > +               return -EINVAL;
> > > 
> > > This is currently under guest control, so I'm not certain this is
> > > helpful. If I called this with otherwise valid parameters, and got
> > > back -EINVAL, I would probably think the bug is on my end. But it
> > > could be on the guest's end! I would probably drop this, but you could
> > > have KVM return an empty list of regions when this happens.
> > > 
> > > Alternatively, as explained below, this could call guest_pv_has instead.
> > > 
> > > >
> > > > > > +
> > > > > >         if (!list->size)
> > > > > >                 return -EINVAL;
> > > > > >
> > > > > > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > > > > > index 58f89f83caab..43ea5061926f 100644
> > > > > > --- a/arch/x86/kvm/svm/svm.c
> > > > > > +++ b/arch/x86/kvm/svm/svm.c
> > > > > > @@ -2903,6 +2903,9 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
> > > > > >                 svm->msr_decfg = data;
> > > > > >                 break;
> > > > > >         }
> > > > > > +       case MSR_KVM_SEV_LIVE_MIGRATION:
> > > > > > +               sev_update_migration_flags(vcpu->kvm, data);
> > > > > > +               break;
> > > > > >         case MSR_IA32_APICBASE:
> > > > > >                 if (kvm_vcpu_apicv_active(vcpu))
> > > > > >                         avic_update_vapic_bar(to_svm(vcpu), data);
> > > > > > @@ -3976,6 +3979,19 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
> > > > > >                         vcpu->arch.cr3_lm_rsvd_bits &= ~(1UL << (best->ebx & 0x3f));
> > > > > >         }
> > > > > >
> > > > > > +       /*
> > > > > > +        * If SEV guest then enable the Live migration feature.
> > > > > > +        */
> > > > > > +       if (sev_guest(vcpu->kvm)) {
> > > > > > +               struct kvm_cpuid_entry2 *best;
> > > > > > +
> > > > > > +               best = kvm_find_cpuid_entry(vcpu, KVM_CPUID_FEATURES, 0);
> > > > > > +               if (!best)
> > > > > > +                       return;
> > > > > > +
> > > > > > +               best->eax |= (1 << KVM_FEATURE_SEV_LIVE_MIGRATION);
> > > > > > +       }
> > > > > > +
> > > > >
> > > > > Looking at this, I believe the only way for this bit to get enabled is
> > > > > if userspace toggles it. There needs to be a way for userspace to
> > > > > identify if the kernel underneath them does, in fact, support SEV LM.
> > > > > I'm at risk for having misread these patches (it's a long series), but
> > > > > I don't see anything that communicates upwards.
> > > > >
> > > > > This could go upward with the other paravirt features flags in
> > > > > cpuid.c. It could also be an explicit KVM Capability (checked through
> > > > > check_extension).
> > > > >
> > > > > Userspace should then have a chance to decide whether or not this
> > > > > should be enabled. And when it's not enabled, the host should return a
> > > > > GP in response to the hypercall. This could be configured either
> > > > > through userspace stripping out the LM feature bit, or by calling a VM
> > > > > scoped enable cap (KVM_VM_IOCTL_ENABLE_CAP).
> > > > >
> > > > > I believe the typical path for a feature like this to be configured
> > > > > would be to use ENABLE_CAP.
> > > >
> > > > I believe we have discussed and reviewed this earlier too.
> > > >
> > > > To summarize this feature, the host indicates if it supports the Live
> > > > Migration feature, and the feature and the hypercall are only enabled on
> > > > the host when the guest checks for this support and does a wrmsrl() to
> > > > enable the feature. Also the guest will not make the hypercall if the
> > > > host does not indicate support for it.
> > > 
> > > I've gone through and read this patch a bit more closely, and the
> > > surrounding code. Previously, I clearly misread this and the
> > > surrounding space.
> > > 
> > > What happens if the guest just writes to the MSR anyway? Even if it
> > > didn't receive a cue to do so? I believe the hypercall would still get
> > > invoked here, since the hypercall does not check if SEV live migration
> > > is enabled. Similarly, the MSR for enabling it is always available,
> > > even if userspace didn't ask for the cpuid bit to be set. This should
> > > not happen. Userspace should be in control of a new hypercall rolling
> > > out.
> > > 
> > > I believe my interpretation last time was that the cpuid bit was
> > > getting surfaced from the host kernel to host userspace, but I don't
> > > actually see that in this patch series. Another way to ask this
> > > question would be "How does userspace know the kernel they are on has
> > > this patch series?". It needs some way of checking whether or not the
> > > kernel underneath it supports SEV live migration. Technically, I think
> > > userspace could call get_cpuid, set_cpuid (with the same values), and
> > > then get_cpuid again, and it would be able to infer by checking the
> > > SEV LM feature flag in the KVM leaf. This seems a bit kludgy. Checking
> > > support should be easy.
> > > 
> > > An additional question is "how does userspace choose whether live
> > > migration is advertised to the guest"? I believe userspace's desire
> > > for a particular value of the paravirt feature flag in CPUID gets
> > > overridden when they call set cpuid, since the feature flag is set in
> > > svm_vcpu_after_set_cpuid regardless of what userspace asks for.
> > > Userspace should have a choice in the matter.
> > > 
> 
> Actually I did some more analysis of this, and I believe you are right
> about the above; the feature flag gets set in svm_vcpu_after_set_cpuid.
> 

As you mentioned above, and as I confirmed in my previous email,
calling the KVM_SET_CPUID2 vcpu ioctl will always set the live migration
feature flag for the vCPU.

This is what will be queried by the guest to enable the kernel's
live migration feature and to start making hypercalls.

Now, I want to understand why you want the userspace to have a
choice in this matter?

After all, it is the userspace which actually initiates the live 
migration process, so doesn't it have the final choice in this
matter?

Even if this feature is reported by the host, the guest only uses
it to enable and make page encryption status hypercalls,
and the host's shared pages list gets set up accordingly.

But unless userspace calls the KVM_GET_SHARED_PAGES_LIST ioctl and
initiates the live migration process, the above simply enables
the guest to make hypercalls whenever a page's encryption
status changes.
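
For reference, the guest-side handshake described above boils down to
something like this (a minimal sketch, assuming the CPUID feature bit
and MSR added by this series; the exact call site is illustrative):

	/* Guest kernel, during platform setup. */
	if (sev_active() &&
	    kvm_para_has_feature(KVM_FEATURE_SEV_LIVE_MIGRATION))
		wrmsrl(MSR_KVM_SEV_LIVE_MIGRATION,
		       KVM_SEV_LIVE_MIGRATION_ENABLED);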

Thanks,
Ashish

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v10 12/16] KVM: x86: Introduce new KVM_FEATURE_SEV_LIVE_MIGRATION feature & Custom MSR.
  2021-02-08  0:28             ` Ashish Kalra
@ 2021-02-08 22:50               ` Steve Rutherford
  2021-02-10 20:36                 ` Ashish Kalra
  0 siblings, 1 reply; 75+ messages in thread
From: Steve Rutherford @ 2021-02-08 22:50 UTC (permalink / raw)
  To: Ashish Kalra
  Cc: Paolo Bonzini, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Radim Krčmář,
	Joerg Roedel, Borislav Petkov, Tom Lendacky, X86 ML, KVM list,
	LKML, Sean Christopherson, Venu Busireddy, Brijesh Singh

Hi Ashish,

On Sun, Feb 7, 2021 at 4:29 PM Ashish Kalra <ashish.kalra@amd.com> wrote:
>
> Hello Steve,
>
> On Sat, Feb 06, 2021 at 01:56:46PM +0000, Ashish Kalra wrote:
> > Hello Steve,
> >
> > On Sat, Feb 06, 2021 at 05:46:17AM +0000, Ashish Kalra wrote:
> > > Hello Steve,
> > >
> > > Continued response to your queries, especially related to userspace
> > > control of SEV live migration feature :
> > >
> > > On Fri, Feb 05, 2021 at 06:54:21PM -0800, Steve Rutherford wrote:
> > > > On Thu, Feb 4, 2021 at 7:08 PM Ashish Kalra <ashish.kalra@amd.com> wrote:
> > > > >
> > > > > Hello Steve,
> > > > >
> > > > > On Thu, Feb 04, 2021 at 04:56:35PM -0800, Steve Rutherford wrote:
> > > > > > On Wed, Feb 3, 2021 at 4:39 PM Ashish Kalra <Ashish.Kalra@amd.com> wrote:
> > > > > > >
> > > > > > > From: Ashish Kalra <ashish.kalra@amd.com>
> > > > > > >
> > > > > > > Add new KVM_FEATURE_SEV_LIVE_MIGRATION feature for guest to check
> > > > > > > for host-side support for SEV live migration. Also add a new custom
> > > > > > > MSR_KVM_SEV_LIVE_MIGRATION for guest to enable the SEV live migration
> > > > > > > feature.
> > > > > > >
> > > > > > > Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> > > > > > > ---
> > > > > > >  Documentation/virt/kvm/cpuid.rst     |  5 +++++
> > > > > > >  Documentation/virt/kvm/msr.rst       | 12 ++++++++++++
> > > > > > >  arch/x86/include/uapi/asm/kvm_para.h |  4 ++++
> > > > > > >  arch/x86/kvm/svm/sev.c               | 13 +++++++++++++
> > > > > > >  arch/x86/kvm/svm/svm.c               | 16 ++++++++++++++++
> > > > > > >  arch/x86/kvm/svm/svm.h               |  2 ++
> > > > > > >  6 files changed, 52 insertions(+)
> > > > > > >
> > > > > > > diff --git a/Documentation/virt/kvm/cpuid.rst b/Documentation/virt/kvm/cpuid.rst
> > > > > > > index cf62162d4be2..0bdb6cdb12d3 100644
> > > > > > > --- a/Documentation/virt/kvm/cpuid.rst
> > > > > > > +++ b/Documentation/virt/kvm/cpuid.rst
> > > > > > > @@ -96,6 +96,11 @@ KVM_FEATURE_MSI_EXT_DEST_ID        15          guest checks this feature bit
> > > > > > >                                                 before using extended destination
> > > > > > >                                                 ID bits in MSI address bits 11-5.
> > > > > > >
> > > > > > > +KVM_FEATURE_SEV_LIVE_MIGRATION     16          guest checks this feature bit before
> > > > > > > +                                               using the page encryption state
> > > > > > > +                                               hypercall to notify the page state
> > > > > > > +                                               change
> > > > > > > +
> > > > > > >  KVM_FEATURE_CLOCKSOURCE_STABLE_BIT 24          host will warn if no guest-side
> > > > > > >                                                 per-cpu warps are expected in
> > > > > > >                                                 kvmclock
> > > > > > > diff --git a/Documentation/virt/kvm/msr.rst b/Documentation/virt/kvm/msr.rst
> > > > > > > index e37a14c323d2..020245d16087 100644
> > > > > > > --- a/Documentation/virt/kvm/msr.rst
> > > > > > > +++ b/Documentation/virt/kvm/msr.rst
> > > > > > > @@ -376,3 +376,15 @@ data:
> > > > > > >         write '1' to bit 0 of the MSR, this causes the host to re-scan its queue
> > > > > > >         and check if there are more notifications pending. The MSR is available
> > > > > > >         if KVM_FEATURE_ASYNC_PF_INT is present in CPUID.
> > > > > > > +
> > > > > > > +MSR_KVM_SEV_LIVE_MIGRATION:
> > > > > > > +        0x4b564d08
> > > > > > > +
> > > > > > > +       Control SEV Live Migration features.
> > > > > > > +
> > > > > > > +data:
> > > > > > > +        Bit 0 enables (1) or disables (0) host-side SEV Live Migration feature,
> > > > > > > +        in other words, this is guest->host communication that it's properly
> > > > > > > +        handling the shared pages list.
> > > > > > > +
> > > > > > > +        All other bits are reserved.
> > > > > > > diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
> > > > > > > index 950afebfba88..f6bfa138874f 100644
> > > > > > > --- a/arch/x86/include/uapi/asm/kvm_para.h
> > > > > > > +++ b/arch/x86/include/uapi/asm/kvm_para.h
> > > > > > > @@ -33,6 +33,7 @@
> > > > > > >  #define KVM_FEATURE_PV_SCHED_YIELD     13
> > > > > > >  #define KVM_FEATURE_ASYNC_PF_INT       14
> > > > > > >  #define KVM_FEATURE_MSI_EXT_DEST_ID    15
> > > > > > > +#define KVM_FEATURE_SEV_LIVE_MIGRATION 16
> > > > > > >
> > > > > > >  #define KVM_HINTS_REALTIME      0
> > > > > > >
> > > > > > > @@ -54,6 +55,7 @@
> > > > > > >  #define MSR_KVM_POLL_CONTROL   0x4b564d05
> > > > > > >  #define MSR_KVM_ASYNC_PF_INT   0x4b564d06
> > > > > > >  #define MSR_KVM_ASYNC_PF_ACK   0x4b564d07
> > > > > > > +#define MSR_KVM_SEV_LIVE_MIGRATION     0x4b564d08
> > > > > > >
> > > > > > >  struct kvm_steal_time {
> > > > > > >         __u64 steal;
> > > > > > > @@ -136,4 +138,6 @@ struct kvm_vcpu_pv_apf_data {
> > > > > > >  #define KVM_PV_EOI_ENABLED KVM_PV_EOI_MASK
> > > > > > >  #define KVM_PV_EOI_DISABLED 0x0
> > > > > > >
> > > > > > > +#define KVM_SEV_LIVE_MIGRATION_ENABLED BIT_ULL(0)
> > > > > > > +
> > > > > > >  #endif /* _UAPI_ASM_X86_KVM_PARA_H */
> > > > > > > diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> > > > > > > index b0d324aed515..93f42b3d3e33 100644
> > > > > > > --- a/arch/x86/kvm/svm/sev.c
> > > > > > > +++ b/arch/x86/kvm/svm/sev.c
> > > > > > > @@ -1627,6 +1627,16 @@ int svm_page_enc_status_hc(struct kvm *kvm, unsigned long gpa,
> > > > > > >         return ret;
> > > > > > >  }
> > > > > > >
> > > > > > > +void sev_update_migration_flags(struct kvm *kvm, u64 data)
> > > > > > > +{
> > > > > > > +       struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> > > > > > > +
> > > > > > > +       if (!sev_guest(kvm))
> > > > > > > +               return;
> > > > > >
> > > > > > This should assert that userspace wanted the guest to be able to make
> > > > > > these calls (see more below).
> > > > > >
> > > > > > >
> > > > > > > +
> > > > > > > +       sev->live_migration_enabled = !!(data & KVM_SEV_LIVE_MIGRATION_ENABLED);
> > > > > > > +}
> > > > > > > +
> > > > > > >  int svm_get_shared_pages_list(struct kvm *kvm,
> > > > > > >                               struct kvm_shared_pages_list *list)
> > > > > > >  {
> > > > > > > @@ -1639,6 +1649,9 @@ int svm_get_shared_pages_list(struct kvm *kvm,
> > > > > > >         if (!sev_guest(kvm))
> > > > > > >                 return -ENOTTY;
> > > > > > >
> > > > > > > +       if (!sev->live_migration_enabled)
> > > > > > > +               return -EINVAL;
> > > >
> > > > This is currently under guest control, so I'm not certain this is
> > > > helpful. If I called this with otherwise valid parameters, and got
> > > > back -EINVAL, I would probably think the bug is on my end. But it
> > > > could be on the guest's end! I would probably drop this, but you could
> > > > have KVM return an empty list of regions when this happens.
> > > >
> > > > Alternatively, as explained below, this could call guest_pv_has instead.
> > > >
> > > > >
> > > > > > > +
> > > > > > >         if (!list->size)
> > > > > > >                 return -EINVAL;
> > > > > > >
> > > > > > > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > > > > > > index 58f89f83caab..43ea5061926f 100644
> > > > > > > --- a/arch/x86/kvm/svm/svm.c
> > > > > > > +++ b/arch/x86/kvm/svm/svm.c
> > > > > > > @@ -2903,6 +2903,9 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
> > > > > > >                 svm->msr_decfg = data;
> > > > > > >                 break;
> > > > > > >         }
> > > > > > > +       case MSR_KVM_SEV_LIVE_MIGRATION:
> > > > > > > +               sev_update_migration_flags(vcpu->kvm, data);
> > > > > > > +               break;
> > > > > > >         case MSR_IA32_APICBASE:
> > > > > > >                 if (kvm_vcpu_apicv_active(vcpu))
> > > > > > >                         avic_update_vapic_bar(to_svm(vcpu), data);
> > > > > > > @@ -3976,6 +3979,19 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
> > > > > > >                         vcpu->arch.cr3_lm_rsvd_bits &= ~(1UL << (best->ebx & 0x3f));
> > > > > > >         }
> > > > > > >
> > > > > > > +       /*
> > > > > > > +        * If SEV guest then enable the Live migration feature.
> > > > > > > +        */
> > > > > > > +       if (sev_guest(vcpu->kvm)) {
> > > > > > > +               struct kvm_cpuid_entry2 *best;
> > > > > > > +
> > > > > > > +               best = kvm_find_cpuid_entry(vcpu, KVM_CPUID_FEATURES, 0);
> > > > > > > +               if (!best)
> > > > > > > +                       return;
> > > > > > > +
> > > > > > > +               best->eax |= (1 << KVM_FEATURE_SEV_LIVE_MIGRATION);
> > > > > > > +       }
> > > > > > > +
> > > > > >
> > > > > > Looking at this, I believe the only way for this bit to get enabled is
> > > > > > if userspace toggles it. There needs to be a way for userspace to
> > > > > > identify if the kernel underneath them does, in fact, support SEV LM.
> > > > > > I'm at risk for having misread these patches (it's a long series), but
> > > > > > I don't see anything that communicates upwards.
> > > > > >
> > > > > > This could go upward with the other paravirt features flags in
> > > > > > cpuid.c. It could also be an explicit KVM Capability (checked through
> > > > > > check_extension).
> > > > > >
> > > > > > Userspace should then have a chance to decide whether or not this
> > > > > > should be enabled. And when it's not enabled, the host should return a
> > > > > > GP in response to the hypercall. This could be configured either
> > > > > > through userspace stripping out the LM feature bit, or by calling a VM
> > > > > > scoped enable cap (KVM_VM_IOCTL_ENABLE_CAP).
> > > > > >
> > > > > > I believe the typical path for a feature like this to be configured
> > > > > > would be to use ENABLE_CAP.
> > > > >
> > > > > I believe we have discussed and reviewed this earlier too.
> > > > >
> > > > > To summarize this feature, the host indicates if it supports the Live
> > > > > Migration feature, and the feature and the hypercall are only enabled on
> > > > > the host when the guest checks for this support and does a wrmsrl() to
> > > > > enable the feature. Also the guest will not make the hypercall if the
> > > > > host does not indicate support for it.
> > > >
> > > > I've gone through and read this patch a bit more closely, and the
> > > > surrounding code. Previously, I clearly misread this and the
> > > > surrounding space.
> > > >
> > > > What happens if the guest just writes to the MSR anyway? Even if it
> > > > didn't receive a cue to do so? I believe the hypercall would still get
> > > > invoked here, since the hypercall does not check if SEV live migration
> > > > is enabled. Similarly, the MSR for enabling it is always available,
> > > > even if userspace didn't ask for the cpuid bit to be set. This should
> > > > not happen. Userspace should be in control of a new hypercall rolling
> > > > out.
> > > >
> > > > I believe my interpretation last time was that the cpuid bit was
> > > > getting surfaced from the host kernel to host userspace, but I don't
> > > > actually see that in this patch series. Another way to ask this
> > > > question would be "How does userspace know the kernel they are on has
> > > > this patch series?". It needs some way of checking whether or not the
> > > > kernel underneath it supports SEV live migration. Technically, I think
> > > > userspace could call get_cpuid, set_cpuid (with the same values), and
> > > > then get_cpuid again, and it would be able to infer by checking the
> > > > SEV LM feature flag in the KVM leaf. This seems a bit kludgy. Checking
> > > > support should be easy.
> > > >
> > > > An additional question is "how does userspace choose whether live
> > > > migration is advertised to the guest"? I believe userspace's desire
> > > > for a particular value of the paravirt feature flag in CPUID gets
> > > > overridden when they call set cpuid, since the feature flag is set in
> > > > svm_vcpu_after_set_cpuid regardless of what userspace asks for.
> > > > Userspace should have a choice in the matter.
> > > >
> >
> > Actually I did some more analysis of this, and I believe you are right
> > about the above; the feature flag gets set in svm_vcpu_after_set_cpuid.
> >
>
> As you mentioned above, and as I confirmed in my previous email,
> calling the KVM_SET_CPUID2 vcpu ioctl will always set the live migration
> feature flag for the vCPU.
>
> This is what will be queried by the guest to enable the kernel's
> live migration feature and to start making hypercalls.
>
> Now, I want to understand why you want the userspace to have a
> choice in this matter?
Kernel rollout risk is a pretty big factor:
1) Feature flagging is a pretty common risk mitigation for new features.
2) Without userspace being able to intervene, the kernel rollout
becomes a feature rollout.

IIUC, as soon as new VMs started running on this host kernel, they
would immediately start calling the hypercall if they had the guest
patches, even if they did not do so on older versions of the host
kernel.

>
> After all, it is the userspace which actually initiates the live
> migration process, so doesn't it have the final choice in this
> matter?
With the current implementation, userspace has the final say in the
migration, but not the final say in whether or not that particular
hypercall is used by the guest. If a customer showed up, and said
"don't have my guest migrate", there is no way for the host to tell
the guest "hey, we're not even listening to what you're sending over
the hypercall". IIRC, there is an SEV Policy bit for migration
enablement, but even if it were set to false, that guest would still
update the host about its unencrypted regions.

Right now, the host can't even remove the feature bit from CPUID
(since its desire would be overridden post-set), so it doesn't have
the ability to tell the guest to hang up the phone. And even if we
could tell the guest through CPUID, if the guest ignored what we told
it, it could still send data down anyway! If there were a bug in this
implementation that we missed, the only way to avoid it would be to
roll out a new kernel, which is pretty heavy handed. If you could just
disable the feature (or never enable it in the first place), that
would be much less costly.

> Even if this feature is reported by the host, the guest only uses
> it to enable and make page encryption status hypercalls,
> and the host's shared pages list gets set up accordingly.
>
> But unless userspace calls the KVM_GET_SHARED_PAGES_LIST ioctl and
> initiates the live migration process, the above simply enables
> the guest to make hypercalls whenever a page's encryption
> status changes.
>
> Thanks,
> Ashish

Thanks,
Steve

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v10 16/16] KVM: SVM: Bypass DBG_DECRYPT API calls for unencrypted guest memory.
  2021-02-04  0:40 ` [PATCH v10 16/16] KVM: SVM: Bypass DBG_DECRYPT API calls for unencrypted guest memory Ashish Kalra
@ 2021-02-09  6:23     ` kernel test robot
  0 siblings, 0 replies; 75+ messages in thread
From: kernel test robot @ 2021-02-09  6:23 UTC (permalink / raw)
  To: Ashish Kalra, pbonzini
  Cc: kbuild-all, tglx, mingo, hpa, rkrcmar, joro, bp, thomas.lendacky,
	x86, kvm

[-- Attachment #1: Type: text/plain, Size: 3779 bytes --]

Hi Ashish,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on v5.11-rc6]
[also build test WARNING on next-20210125]
[cannot apply to kvm/linux-next tip/x86/mm tip/x86/core]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Ashish-Kalra/KVM-SVM-Add-KVM_SEV-SEND_START-command/20210204-093647
base:    1048ba83fb1c00cd24172e23e8263972f6b5d9ac
config: i386-randconfig-r015-20210209 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-15) 9.3.0
reproduce (this is a W=1 build):
        # https://github.com/0day-ci/linux/commit/c086cd5491c6b84adcffedc1bf798df758b18080
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Ashish-Kalra/KVM-SVM-Add-KVM_SEV-SEND_START-command/20210204-093647
        git checkout c086cd5491c6b84adcffedc1bf798df758b18080
        # save the attached .config to linux build tree
        make W=1 ARCH=i386 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   arch/x86/kvm/svm/sev.c: In function 'handle_unencrypted_region':
>> arch/x86/kvm/svm/sev.c:945:18: warning: variable 'd_off' set but not used [-Wunused-but-set-variable]
     945 |  int len, s_off, d_off;
         |                  ^~~~~


vim +/d_off +945 arch/x86/kvm/svm/sev.c

   934	
   935	static int handle_unencrypted_region(struct kvm *kvm,
   936					     unsigned long vaddr,
   937					     unsigned long vaddr_end,
   938					     unsigned long dst_vaddr,
   939					     unsigned int size,
   940					     bool *is_decrypted)
   941	{
   942		struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
   943		struct page *page = NULL;
   944		gfn_t gfn_start, gfn_end;
 > 945		int len, s_off, d_off;
   946		int srcu_idx;
   947		int ret = 0;
   948	
   949		/* ensure hva_to_gfn translations remain valid */
   950		srcu_idx = srcu_read_lock(&kvm->srcu);
   951	
   952		if (!hva_to_gfn(kvm, vaddr, &gfn_start)) {
   953			srcu_read_unlock(&kvm->srcu, srcu_idx);
   954			return -EINVAL;
   955		}
   956	
   957		if (!hva_to_gfn(kvm, vaddr_end, &gfn_end)) {
   958			srcu_read_unlock(&kvm->srcu, srcu_idx);
   959			return -EINVAL;
   960		}
   961	
   962		if (sev->shared_pages_list_count) {
   963			if (is_unencrypted_region(gfn_start, gfn_end,
   964						  &sev->shared_pages_list)) {
   965				page = alloc_page(GFP_KERNEL);
   966				if (!page) {
   967					srcu_read_unlock(&kvm->srcu, srcu_idx);
   968					return -ENOMEM;
   969				}
   970	
   971				/*
   972				 * Since user buffer may not be page aligned, calculate the
   973				 * offset within the page.
   974				 */
   975				s_off = vaddr & ~PAGE_MASK;
   976				d_off = dst_vaddr & ~PAGE_MASK;
   977				len = min_t(size_t, (PAGE_SIZE - s_off), size);
   978	
   979				if (copy_from_user(page_address(page),
   980						   (void __user *)(uintptr_t)vaddr, len)) {
   981					__free_page(page);
   982					srcu_read_unlock(&kvm->srcu, srcu_idx);
   983					return -EFAULT;
   984				}
   985	
   986				if (copy_to_user((void __user *)(uintptr_t)dst_vaddr,
   987						 page_address(page), len)) {
   988					ret = -EFAULT;
   989				}
   990	
   991				__free_page(page);
   992				srcu_read_unlock(&kvm->srcu, srcu_idx);
   993				*is_decrypted = true;
   994				return ret;
   995			}
   996		}
   997		srcu_read_unlock(&kvm->srcu, srcu_idx);
   998		*is_decrypted = false;
   999		return ret;
  1000	}
  1001	
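
The warning flags a dead store: d_off is computed from dst_vaddr but
never read, while copy_to_user() already writes to dst_vaddr directly.
A minimal fix sketch (assuming the destination offset is genuinely
unneeded) is to drop the variable and its assignment:

	int len, s_off;
	...
	s_off = vaddr & ~PAGE_MASK;
	len = min_t(size_t, (PAGE_SIZE - s_off), size);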

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 36525 bytes --]

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v10 12/16] KVM: x86: Introduce new KVM_FEATURE_SEV_LIVE_MIGRATION feature & Custom MSR.
  2021-02-08 22:50               ` Steve Rutherford
@ 2021-02-10 20:36                 ` Ashish Kalra
  2021-02-10 22:01                   ` Steve Rutherford
  0 siblings, 1 reply; 75+ messages in thread
From: Ashish Kalra @ 2021-02-10 20:36 UTC (permalink / raw)
  To: Steve Rutherford
  Cc: Paolo Bonzini, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Radim Krčmář,
	Joerg Roedel, Borislav Petkov, Tom Lendacky, X86 ML, KVM list,
	LKML, Sean Christopherson, Venu Busireddy, Brijesh Singh

Hello Steve,

On Mon, Feb 08, 2021 at 02:50:14PM -0800, Steve Rutherford wrote:
> Hi Ashish,
> 
> On Sun, Feb 7, 2021 at 4:29 PM Ashish Kalra <ashish.kalra@amd.com> wrote:
> >
> > Hello Steve,
> >
> > On Sat, Feb 06, 2021 at 01:56:46PM +0000, Ashish Kalra wrote:
> > > Hello Steve,
> > >
> > > On Sat, Feb 06, 2021 at 05:46:17AM +0000, Ashish Kalra wrote:
> > > > Hello Steve,
> > > >
> > > > Continued response to your queries, especially related to userspace
> > > > control of SEV live migration feature :
> > > >
> > > > On Fri, Feb 05, 2021 at 06:54:21PM -0800, Steve Rutherford wrote:
> > > > > On Thu, Feb 4, 2021 at 7:08 PM Ashish Kalra <ashish.kalra@amd.com> wrote:
> > > > > >
> > > > > > Hello Steve,
> > > > > >
> > > > > > On Thu, Feb 04, 2021 at 04:56:35PM -0800, Steve Rutherford wrote:
> > > > > > > On Wed, Feb 3, 2021 at 4:39 PM Ashish Kalra <Ashish.Kalra@amd.com> wrote:
> > > > > > > >
> > > > > > > > From: Ashish Kalra <ashish.kalra@amd.com>
> > > > > > > >
> > > > > > > > Add new KVM_FEATURE_SEV_LIVE_MIGRATION feature for guest to check
> > > > > > > > for host-side support for SEV live migration. Also add a new custom
> > > > > > > > MSR_KVM_SEV_LIVE_MIGRATION for guest to enable the SEV live migration
> > > > > > > > feature.
> > > > > > > >
> > > > > > > > Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> > > > > > > > ---
> > > > > > > >  Documentation/virt/kvm/cpuid.rst     |  5 +++++
> > > > > > > >  Documentation/virt/kvm/msr.rst       | 12 ++++++++++++
> > > > > > > >  arch/x86/include/uapi/asm/kvm_para.h |  4 ++++
> > > > > > > >  arch/x86/kvm/svm/sev.c               | 13 +++++++++++++
> > > > > > > >  arch/x86/kvm/svm/svm.c               | 16 ++++++++++++++++
> > > > > > > >  arch/x86/kvm/svm/svm.h               |  2 ++
> > > > > > > >  6 files changed, 52 insertions(+)
> > > > > > > >
> > > > > > > > diff --git a/Documentation/virt/kvm/cpuid.rst b/Documentation/virt/kvm/cpuid.rst
> > > > > > > > index cf62162d4be2..0bdb6cdb12d3 100644
> > > > > > > > --- a/Documentation/virt/kvm/cpuid.rst
> > > > > > > > +++ b/Documentation/virt/kvm/cpuid.rst
> > > > > > > > @@ -96,6 +96,11 @@ KVM_FEATURE_MSI_EXT_DEST_ID        15          guest checks this feature bit
> > > > > > > >                                                 before using extended destination
> > > > > > > >                                                 ID bits in MSI address bits 11-5.
> > > > > > > >
> > > > > > > > +KVM_FEATURE_SEV_LIVE_MIGRATION     16          guest checks this feature bit before
> > > > > > > > +                                               using the page encryption state
> > > > > > > > +                                               hypercall to notify the page state
> > > > > > > > +                                               change
> > > > > > > > +
> > > > > > > >  KVM_FEATURE_CLOCKSOURCE_STABLE_BIT 24          host will warn if no guest-side
> > > > > > > >                                                 per-cpu warps are expected in
> > > > > > > >                                                 kvmclock
> > > > > > > > diff --git a/Documentation/virt/kvm/msr.rst b/Documentation/virt/kvm/msr.rst
> > > > > > > > index e37a14c323d2..020245d16087 100644
> > > > > > > > --- a/Documentation/virt/kvm/msr.rst
> > > > > > > > +++ b/Documentation/virt/kvm/msr.rst
> > > > > > > > @@ -376,3 +376,15 @@ data:
> > > > > > > >         write '1' to bit 0 of the MSR, this causes the host to re-scan its queue
> > > > > > > >         and check if there are more notifications pending. The MSR is available
> > > > > > > >         if KVM_FEATURE_ASYNC_PF_INT is present in CPUID.
> > > > > > > > +
> > > > > > > > +MSR_KVM_SEV_LIVE_MIGRATION:
> > > > > > > > +        0x4b564d08
> > > > > > > > +
> > > > > > > > +       Control SEV Live Migration features.
> > > > > > > > +
> > > > > > > > +data:
> > > > > > > > +        Bit 0 enables (1) or disables (0) host-side SEV Live Migration feature,
> > > > > > > > +        in other words, this is guest->host communication that it's properly
> > > > > > > > +        handling the shared pages list.
> > > > > > > > +
> > > > > > > > +        All other bits are reserved.
> > > > > > > > diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
> > > > > > > > index 950afebfba88..f6bfa138874f 100644
> > > > > > > > --- a/arch/x86/include/uapi/asm/kvm_para.h
> > > > > > > > +++ b/arch/x86/include/uapi/asm/kvm_para.h
> > > > > > > > @@ -33,6 +33,7 @@
> > > > > > > >  #define KVM_FEATURE_PV_SCHED_YIELD     13
> > > > > > > >  #define KVM_FEATURE_ASYNC_PF_INT       14
> > > > > > > >  #define KVM_FEATURE_MSI_EXT_DEST_ID    15
> > > > > > > > +#define KVM_FEATURE_SEV_LIVE_MIGRATION 16
> > > > > > > >
> > > > > > > >  #define KVM_HINTS_REALTIME      0
> > > > > > > >
> > > > > > > > @@ -54,6 +55,7 @@
> > > > > > > >  #define MSR_KVM_POLL_CONTROL   0x4b564d05
> > > > > > > >  #define MSR_KVM_ASYNC_PF_INT   0x4b564d06
> > > > > > > >  #define MSR_KVM_ASYNC_PF_ACK   0x4b564d07
> > > > > > > > +#define MSR_KVM_SEV_LIVE_MIGRATION     0x4b564d08
> > > > > > > >
> > > > > > > >  struct kvm_steal_time {
> > > > > > > >         __u64 steal;
> > > > > > > > @@ -136,4 +138,6 @@ struct kvm_vcpu_pv_apf_data {
> > > > > > > >  #define KVM_PV_EOI_ENABLED KVM_PV_EOI_MASK
> > > > > > > >  #define KVM_PV_EOI_DISABLED 0x0
> > > > > > > >
> > > > > > > > +#define KVM_SEV_LIVE_MIGRATION_ENABLED BIT_ULL(0)
> > > > > > > > +
> > > > > > > >  #endif /* _UAPI_ASM_X86_KVM_PARA_H */
> > > > > > > > diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> > > > > > > > index b0d324aed515..93f42b3d3e33 100644
> > > > > > > > --- a/arch/x86/kvm/svm/sev.c
> > > > > > > > +++ b/arch/x86/kvm/svm/sev.c
> > > > > > > > @@ -1627,6 +1627,16 @@ int svm_page_enc_status_hc(struct kvm *kvm, unsigned long gpa,
> > > > > > > >         return ret;
> > > > > > > >  }
> > > > > > > >
> > > > > > > > +void sev_update_migration_flags(struct kvm *kvm, u64 data)
> > > > > > > > +{
> > > > > > > > +       struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> > > > > > > > +
> > > > > > > > +       if (!sev_guest(kvm))
> > > > > > > > +               return;
> > > > > > >
> > > > > > > This should assert that userspace wanted the guest to be able to make
> > > > > > > these calls (see more below).
> > > > > > >
> > > > > > > >
> > > > > > > > +
> > > > > > > > +       sev->live_migration_enabled = !!(data & KVM_SEV_LIVE_MIGRATION_ENABLED);
> > > > > > > > +}
> > > > > > > > +
> > > > > > > >  int svm_get_shared_pages_list(struct kvm *kvm,
> > > > > > > >                               struct kvm_shared_pages_list *list)
> > > > > > > >  {
> > > > > > > > @@ -1639,6 +1649,9 @@ int svm_get_shared_pages_list(struct kvm *kvm,
> > > > > > > >         if (!sev_guest(kvm))
> > > > > > > >                 return -ENOTTY;
> > > > > > > >
> > > > > > > > +       if (!sev->live_migration_enabled)
> > > > > > > > +               return -EINVAL;
> > > > >
> > > > > This is currently under guest control, so I'm not certain this is
> > > > > helpful. If I called this with otherwise valid parameters, and got
> > > > > back -EINVAL, I would probably think the bug is on my end. But it
> > > > > could be on the guest's end! I would probably drop this, but you could
> > > > > have KVM return an empty list of regions when this happens.
> > > > >
> > > > > Alternatively, as explained below, this could call guest_pv_has instead.
> > > > >
> > > > > >
> > > > > > > > +
> > > > > > > >         if (!list->size)
> > > > > > > >                 return -EINVAL;
> > > > > > > >
> > > > > > > > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > > > > > > > index 58f89f83caab..43ea5061926f 100644
> > > > > > > > --- a/arch/x86/kvm/svm/svm.c
> > > > > > > > +++ b/arch/x86/kvm/svm/svm.c
> > > > > > > > @@ -2903,6 +2903,9 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
> > > > > > > >                 svm->msr_decfg = data;
> > > > > > > >                 break;
> > > > > > > >         }
> > > > > > > > +       case MSR_KVM_SEV_LIVE_MIGRATION:
> > > > > > > > +               sev_update_migration_flags(vcpu->kvm, data);
> > > > > > > > +               break;
> > > > > > > >         case MSR_IA32_APICBASE:
> > > > > > > >                 if (kvm_vcpu_apicv_active(vcpu))
> > > > > > > >                         avic_update_vapic_bar(to_svm(vcpu), data);
> > > > > > > > @@ -3976,6 +3979,19 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
> > > > > > > >                         vcpu->arch.cr3_lm_rsvd_bits &= ~(1UL << (best->ebx & 0x3f));
> > > > > > > >         }
> > > > > > > >
> > > > > > > > +       /*
> > > > > > > > +        * If SEV guest then enable the Live migration feature.
> > > > > > > > +        */
> > > > > > > > +       if (sev_guest(vcpu->kvm)) {
> > > > > > > > +               struct kvm_cpuid_entry2 *best;
> > > > > > > > +
> > > > > > > > +               best = kvm_find_cpuid_entry(vcpu, KVM_CPUID_FEATURES, 0);
> > > > > > > > +               if (!best)
> > > > > > > > +                       return;
> > > > > > > > +
> > > > > > > > +               best->eax |= (1 << KVM_FEATURE_SEV_LIVE_MIGRATION);
> > > > > > > > +       }
> > > > > > > > +
> > > > > > >
> > > > > > > Looking at this, I believe the only way for this bit to get enabled is
> > > > > > > if userspace toggles it. There needs to be a way for userspace to
> > > > > > > identify if the kernel underneath them does, in fact, support SEV LM.
> > > > > > > I'm at risk for having misread these patches (it's a long series), but
> > > > > > > I don't see anything that communicates upwards.
> > > > > > >
> > > > > > > This could go upward with the other paravirt features flags in
> > > > > > > cpuid.c. It could also be an explicit KVM Capability (checked through
> > > > > > > check_extension).
> > > > > > >
> > > > > > > Userspace should then have a chance to decide whether or not this
> > > > > > > should be enabled. And when it's not enabled, the host should return a
> > > > > > > GP in response to the hypercall. This could be configured either
> > > > > > > through userspace stripping out the LM feature bit, or by calling a VM
> > > > > > > scoped enable cap (KVM_VM_IOCTL_ENABLE_CAP).
> > > > > > >
> > > > > > > I believe the typical path for a feature like this to be configured
> > > > > > > would be to use ENABLE_CAP.
> > > > > >
> > > > > > I believe we have discussed and reviewed this earlier too.
> > > > > >
> > > > > > To summarize this feature, the host indicates if it supports the Live
> > > > > > Migration feature, and the feature and the hypercall are only enabled on
> > > > > > the host when the guest checks for this support and does a wrmsrl() to
> > > > > > enable the feature. Also the guest will not make the hypercall if the
> > > > > > host does not indicate support for it.
> > > > >
> > > > > I've gone through and read this patch a bit more closely, and the
> > > > > surrounding code. Previously, I clearly misread this and the
> > > > > surrounding space.
> > > > >
> > > > > What happens if the guest just writes to the MSR anyway? Even if it
> > > > > didn't receive a cue to do so? I believe the hypercall would still get
> > > > > invoked here, since the hypercall does not check if SEV live migration
> > > > > is enabled. Similarly, the MSR for enabling it is always available,
> > > > > even if userspace didn't ask for the cpuid bit to be set. This should
> > > > > not happen. Userspace should be in control of a new hypercall rolling
> > > > > out.
> > > > >
> > > > > I believe my interpretation last time was that the cpuid bit was
> > > > > getting surfaced from the host kernel to host userspace, but I don't
> > > > > actually see that in this patch series. Another way to ask this
> > > > > question would be "How does userspace know the kernel they are on has
> > > > > this patch series?". It needs some way of checking whether or not the
> > > > > kernel underneath it supports SEV live migration. Technically, I think
> > > > > userspace could call get_cpuid, set_cpuid (with the same values), and
> > > > > then get_cpuid again, and it would be able to infer by checking the
> > > > > SEV LM feature flag in the KVM leaf. This seems a bit kludgy. Checking
> > > > > support should be easy.
> > > > >
> > > > > An additional question is "how does userspace choose whether live
> > > > > migration is advertised to the guest"? I believe userspace's desire
> > > > > for a particular value of the paravirt feature flag in CPUID gets
> > > > > overridden when they call set cpuid, since the feature flag is set in
> > > > > svm_vcpu_after_set_cpuid regardless of what userspace asks for.
> > > > > Userspace should have a choice in the matter.
> > > > >
> > >
> > > Actually I did some more analysis of this, and I believe you are right
> > > about the above; the feature flag gets set in svm_vcpu_after_set_cpuid.
> > >
> >
> > As you mentioned above, and as I confirmed in my previous email,
> > calling the KVM_SET_CPUID2 vcpu ioctl will always set the live migration
> > feature flag for the vCPU.
> >
> > This is what will be queried by the guest to enable the kernel's
> > live migration feature and to start making hypercalls.
> >
> > Now, I want to understand why you want the userspace to have a
> > choice in this matter?
> Kernel rollout risk is a pretty big factor:
> 1) Feature flagging is a pretty common risk mitigation for new features.
> 2) Without userspace being able to intervene, the kernel rollout
> becomes a feature rollout.
> 
> IIUC, as soon as new VMs started running on this host kernel, they
> would immediately start calling the hypercall if they had the guest
> patches, even if they did not do so on older versions of the host
> kernel.
> 
> >
> > After all, it is the userspace which actually initiates the live
> > migration process, so doesn't it have the final choice in this
> > matter?
> With the current implementation, userspace has the final say in the
> migration, but not the final say in whether or not that particular
> hypercall is used by the guest. If a customer showed up, and said
> "don't have my guest migrate", there is no way for the host to tell
> the guest "hey, we're not even listening to what you're sending over
> the hypercall". IIRC, there is an SEV Policy bit for migration
> enablement, but even if it were set to false, that guest would still
> update the host about its unencrypted regions.
> 
> Right now, the host can't even remove the feature bit from CPUID
> (since its desire would be overridden post-set), so it doesn't have
> the ability to tell the guest to hang up the phone. And even if we
> could tell the guest through CPUID, if the guest ignored what we told
> it, it could still send data down anyway! If there were a bug in this
> implementation that we missed, the only way to avoid it would be to
> roll out a new kernel, which is pretty heavy handed. If you could just
> disable the feature (or never enable it in the first place), that
> would be much less costly.
>

We can remove the implicit enabling of this live migration feature
from the svm_vcpu_after_set_cpuid() callback invoked after the
KVM_SET_CPUID2 ioctl, and let this feature flag be controlled by the
userspace VMM/qemu.

Userspace can set this feature flag explicitly by calling the
KVM_SET_CPUID2 ioctl, enabling this feature whenever it is ready to
do so.

I have tested this as part of the Qemu code:

int kvm_arch_init_vcpu(CPUState *cs)
{
...
...
        c->function = KVM_CPUID_FEATURES | kvm_base;
        c->eax = env->features[FEAT_KVM];
        c->eax |= (1 << KVM_FEATURE_SEV_LIVE_MIGRATION);
...
...
	
    r = kvm_vcpu_ioctl(cs, KVM_SET_CPUID2, &cpuid_data);
...
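
In a final patch this would presumably also be gated on what the host
kernel actually reports rather than set unconditionally, e.g. (a
sketch using QEMU's existing kvm_arch_get_supported_cpuid() helper):

        uint32_t kvm_feat = kvm_arch_get_supported_cpuid(cs->kvm_state,
                                                         KVM_CPUID_FEATURES,
                                                         0, R_EAX);
        if (kvm_feat & (1 << KVM_FEATURE_SEV_LIVE_MIGRATION)) {
            c->eax |= (1 << KVM_FEATURE_SEV_LIVE_MIGRATION);
        }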

Let me know if this addresses your concerns.

Thanks,
Ashish

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v10 12/16] KVM: x86: Introduce new KVM_FEATURE_SEV_LIVE_MIGRATION feature & Custom MSR.
  2021-02-10 20:36                 ` Ashish Kalra
@ 2021-02-10 22:01                   ` Steve Rutherford
  2021-02-10 22:05                     ` Steve Rutherford
  0 siblings, 1 reply; 75+ messages in thread
From: Steve Rutherford @ 2021-02-10 22:01 UTC (permalink / raw)
  To: Ashish Kalra
  Cc: Paolo Bonzini, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Radim Krčmář,
	Joerg Roedel, Borislav Petkov, Tom Lendacky, X86 ML, KVM list,
	LKML, Sean Christopherson, Venu Busireddy, Brijesh Singh

Hi Ashish,

On Wed, Feb 10, 2021 at 12:37 PM Ashish Kalra <ashish.kalra@amd.com> wrote:
>
> Hello Steve,
>
> We can remove the implicit enabling of this live migration feature
> from the svm_vcpu_after_set_cpuid() callback invoked after the
> KVM_SET_CPUID2 ioctl, and let this feature flag be controlled by the
> userspace VMM/qemu.
>
> Userspace can set this feature flag explicitly by calling the
> KVM_SET_CPUID2 ioctl, enabling this feature whenever it is ready to
> do so.
>
> I have tested this as part of the Qemu code:
>
> int kvm_arch_init_vcpu(CPUState *cs)
> {
> ...
> ...
>         c->function = KVM_CPUID_FEATURES | kvm_base;
>         c->eax = env->features[FEAT_KVM];
>         c->eax |= (1 << KVM_FEATURE_SEV_LIVE_MIGRATION);
> ...
> ...
>
>     r = kvm_vcpu_ioctl(cs, KVM_SET_CPUID2, &cpuid_data);
> ...
>
> Let me know if this addresses your concerns.
Removing implicit enablement is one part of the equation.
The other two are:
1) Host userspace being able to ask the kernel if it supports SEV Live Migration
2) Host userspace being able to disable access to the MSR/hypercall

Feature flagging for paravirt features is pretty complicated, since
you need all three parties to negotiate (host userspace/host
kernel/guest), and every single one has veto power. In the end, the
feature should only be available to the guest if every single party
says yes.

For an example of how to handle 1), the new feature flag could be
checked when asking the kernel which cpuid bits it supports by adding
it to the list of features that the kernel mentions in
KVM_GET_SUPPORTED_CPUID.

For example (in KVM's arch/x86/kvm/cpuid.c):
case KVM_CPUID_FEATURES:
==========
entry->eax = (1 << KVM_FEATURE_CLOCKSOURCE) |
    (1 << KVM_FEATURE_NOP_IO_DELAY) |
...
    (1 << KVM_FEATURE_PV_SCHED_YIELD) |
+  (1 << KVM_FEATURE_ASYNC_PF_INT) |
-   (1 << KVM_FEATURE_ASYNC_PF_INT);
+  (1 << KVM_FEATURE_SEV_LIVE_MIGRATION);
==========

Without this, userspace has to infer if the kernel it is on supports that flag.

For an example of how to handle 2), in the new msr handler, KVM should
throw a GP `if (!guest_pv_has(vcpu, KVM_FEATURE_SEV_LIVE_MIGRATION))`
(it can do this by returning th. The issue here is "what if the guest
ignores CPUID and calls the MSR/hypercall anyway". This is a less
important issue as it requires the guest to be malicious, but still
worth resolving. Additionally, the hypercall itself should check if
the MSR has been toggled by the guest.
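
Concretely, the MSR handler could look something like this (a sketch
only; guest_pv_has() is the existing helper in arch/x86/kvm/cpuid.h):

	case MSR_KVM_SEV_LIVE_MIGRATION:
		if (!guest_pv_has(vcpu, KVM_FEATURE_SEV_LIVE_MIGRATION))
			return 1;	/* injects #GP into the guest */
		sev_update_migration_flags(vcpu->kvm, data);
		break;

and the hypercall could likewise fail (e.g. with -KVM_ENOSYS) unless
the guest has set KVM_SEV_LIVE_MIGRATION_ENABLED via the MSR.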

Thanks,
Steve

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v10 12/16] KVM: x86: Introduce new KVM_FEATURE_SEV_LIVE_MIGRATION feature & Custom MSR.
  2021-02-10 22:01                   ` Steve Rutherford
@ 2021-02-10 22:05                     ` Steve Rutherford
  0 siblings, 0 replies; 75+ messages in thread
From: Steve Rutherford @ 2021-02-10 22:05 UTC (permalink / raw)
  To: Ashish Kalra
  Cc: Paolo Bonzini, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Radim Krčmář,
	Joerg Roedel, Borislav Petkov, Tom Lendacky, X86 ML, KVM list,
	LKML, Sean Christopherson, Venu Busireddy, Brijesh Singh

On Wed, Feb 10, 2021 at 2:01 PM Steve Rutherford <srutherford@google.com> wrote:
>
> Hi Ashish,
>
> On Wed, Feb 10, 2021 at 12:37 PM Ashish Kalra <ashish.kalra@amd.com> wrote:
> >
> > Hello Steve,
> >
> > We can remove the implicit enabling of this live migration feature
> > from the svm_vcpu_after_set_cpuid() callback invoked after the
> > KVM_SET_CPUID2 ioctl, and let this feature flag be controlled by the
> > userspace VMM/qemu.
> >
> > Userspace can set this feature flag explicitly by calling the
> > KVM_SET_CPUID2 ioctl, enabling this feature whenever it is ready to
> > do so.
> >
> > I have tested this as part of the Qemu code:
> >
> > int kvm_arch_init_vcpu(CPUState *cs)
> > {
> > ...
> > ...
> >         c->function = KVM_CPUID_FEATURES | kvm_base;
> >         c->eax = env->features[FEAT_KVM];
> >         c->eax |= (1 << KVM_FEATURE_SEV_LIVE_MIGRATION);
> > ...
> > ...
> >
> >     r = kvm_vcpu_ioctl(cs, KVM_SET_CPUID2, &cpuid_data);
> > ...
> >
> > Let me know if this addresses your concerns.
> Removing implicit enablement is one part of the equation.
> The other two are:
> 1) Host userspace being able to ask the kernel if it supports SEV Live Migration
> 2) Host userspace being able to disable access to the MSR/hypercall
>
> Feature flagging for paravirt features is pretty complicated, since
> you need all three parties to negotiate (host userspace/host
> kernel/guest), and every single one has veto power. In the end, the
> feature should only be available to the guest if every single party
> says yes.
>
> For an example of how to handle 1), the new feature flag could be
> checked when asking the kernel which cpuid bits it supports by adding
> it to the list of features that the kernel mentions in
> KVM_GET_SUPPORTED_CPUID.
>
> For example (in KVM's arch/x86/kvm/cpuid.c):
> case KVM_CPUID_FEATURES:
> ==========
> entry->eax = (1 << KVM_FEATURE_CLOCKSOURCE) |
>     (1 << KVM_FEATURE_NOP_IO_DELAY) |
> ...
>     (1 << KVM_FEATURE_PV_SCHED_YIELD) |
> +  (1 << KVM_FEATURE_ASYNC_PF_INT) |
> -   (1 << KVM_FEATURE_ASYNC_PF_INT);
> +  (1 << KVM_FEATURE_SEV_LIVE_MIGRATION);
> ==========
>
> Without this, userspace has to infer if the kernel it is on supports that flag.
>
> For an example of how to handle 2), in the new msr handler, KVM should
> throw a GP `if (!guest_pv_has(vcpu, KVM_FEATURE_SEV_LIVE_MIGRATION))`
> (it can do this by returning th. The issue here is "what if the guest
Correction: (it can do this by returning 1).

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v10 12/16] KVM: x86: Introduce new KVM_FEATURE_SEV_LIVE_MIGRATION feature & Custom MSR.
  2021-02-04  0:39 ` [PATCH v10 12/16] KVM: x86: Introduce new KVM_FEATURE_SEV_LIVE_MIGRATION feature & Custom MSR Ashish Kalra
  2021-02-05  0:56   ` Steve Rutherford
@ 2021-02-16 23:20   ` Sean Christopherson
  1 sibling, 0 replies; 75+ messages in thread
From: Sean Christopherson @ 2021-02-16 23:20 UTC (permalink / raw)
  To: Ashish Kalra
  Cc: pbonzini, tglx, mingo, hpa, rkrcmar, joro, bp, thomas.lendacky,
	x86, kvm, linux-kernel, srutherford, venu.busireddy,
	brijesh.singh

On Thu, Feb 04, 2021, Ashish Kalra wrote:
> diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
> index 950afebfba88..f6bfa138874f 100644
> --- a/arch/x86/include/uapi/asm/kvm_para.h
> +++ b/arch/x86/include/uapi/asm/kvm_para.h
> @@ -33,6 +33,7 @@
>  #define KVM_FEATURE_PV_SCHED_YIELD	13
>  #define KVM_FEATURE_ASYNC_PF_INT	14
>  #define KVM_FEATURE_MSI_EXT_DEST_ID	15
> +#define KVM_FEATURE_SEV_LIVE_MIGRATION	16
>  
>  #define KVM_HINTS_REALTIME      0
>  
> @@ -54,6 +55,7 @@
>  #define MSR_KVM_POLL_CONTROL	0x4b564d05
>  #define MSR_KVM_ASYNC_PF_INT	0x4b564d06
>  #define MSR_KVM_ASYNC_PF_ACK	0x4b564d07
> +#define MSR_KVM_SEV_LIVE_MIGRATION	0x4b564d08
>  
>  struct kvm_steal_time {
>  	__u64 steal;
> @@ -136,4 +138,6 @@ struct kvm_vcpu_pv_apf_data {
>  #define KVM_PV_EOI_ENABLED KVM_PV_EOI_MASK
>  #define KVM_PV_EOI_DISABLED 0x0
>  
> +#define KVM_SEV_LIVE_MIGRATION_ENABLED BIT_ULL(0)
> +
>  #endif /* _UAPI_ASM_X86_KVM_PARA_H */
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index b0d324aed515..93f42b3d3e33 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -1627,6 +1627,16 @@ int svm_page_enc_status_hc(struct kvm *kvm, unsigned long gpa,
>  	return ret;
>  }
>  
> +void sev_update_migration_flags(struct kvm *kvm, u64 data)
> +{

I don't see the point of a helper.  It's actually going to make the code
less readable once proper error handling is added.  Given that it's not static
and is exposed via svm.h despite having no external user, I assume this got
left behind when the implicit enabling was removed.

> +	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> +
> +	if (!sev_guest(kvm))

I 100% agree with Steve, this needs to check guest_cpuid_has() in addition to
sev_guest().  And it should return '1', i.e. signal #GP to the guest, not
silently eat the bad WRMSR.

> +		return;
> +
> +	sev->live_migration_enabled = !!(data & KVM_SEV_LIVE_MIGRATION_ENABLED);

The value needs to be checked as well, i.e. all bits except LIVE_MIGRATION...
should be treated as reserved-to-zero.

> +}
> +
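Folding the three review points together, a rough sketch of what the helper's
body could become (untested; assumes a vcpu is plumbed in so that
guest_pv_has() can be used):

	static int sev_update_migration_flags(struct kvm_vcpu *vcpu, u64 data)
	{
		struct kvm_sev_info *sev = &to_kvm_svm(vcpu->kvm)->sev_info;

		/* #GP unless this is an SEV guest with the feature exposed. */
		if (!sev_guest(vcpu->kvm) ||
		    !guest_pv_has(vcpu, KVM_FEATURE_SEV_LIVE_MIGRATION))
			return 1;

		/* All bits other than LIVE_MIGRATION_ENABLED are reserved. */
		if (data & ~KVM_SEV_LIVE_MIGRATION_ENABLED)
			return 1;

		sev->live_migration_enabled = !!(data & KVM_SEV_LIVE_MIGRATION_ENABLED);
		return 0;
	}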
>  int svm_get_shared_pages_list(struct kvm *kvm,
>  			      struct kvm_shared_pages_list *list)
>  {
> @@ -1639,6 +1649,9 @@ int svm_get_shared_pages_list(struct kvm *kvm,
>  	if (!sev_guest(kvm))
>  		return -ENOTTY;
>  
> +	if (!sev->live_migration_enabled)
> +		return -EINVAL;

EINVAL is a weird return value for something that is controlled by the guest,
especially since it's possible for the guest to support migration, just not
yet.  EBUSY maybe?  EOPNOTSUPP?
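
E.g., illustrative only:

	if (!sev->live_migration_enabled)
		return -EBUSY;	/* the guest may simply not have opted in yet */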

> +
>  	if (!list->size)
>  		return -EINVAL;
>  
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 58f89f83caab..43ea5061926f 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -2903,6 +2903,9 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
>  		svm->msr_decfg = data;
>  		break;
>  	}
> +	case MSR_KVM_SEV_LIVE_MIGRATION:
> +		sev_update_migration_flags(vcpu->kvm, data);
> +		break;

There should be a svm_get_msr() entry as well; I don't see any reason to prevent
the guest from reading the MSR.
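
Something along these lines, as a sketch (the field name comes from this patch):

	case MSR_KVM_SEV_LIVE_MIGRATION:
		msr_info->data = 0;
		if (sev_guest(vcpu->kvm) &&
		    to_kvm_svm(vcpu->kvm)->sev_info.live_migration_enabled)
			msr_info->data = KVM_SEV_LIVE_MIGRATION_ENABLED;
		break;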

>  	case MSR_IA32_APICBASE:
>  		if (kvm_vcpu_apicv_active(vcpu))
>  			avic_update_vapic_bar(to_svm(vcpu), data);
> @@ -3976,6 +3979,19 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
>  			vcpu->arch.cr3_lm_rsvd_bits &= ~(1UL << (best->ebx & 0x3f));
>  	}
>  
> +	/*
> +	 * If SEV guest then enable the Live migration feature.
> +	 */
> +	if (sev_guest(vcpu->kvm)) {
> +		struct kvm_cpuid_entry2 *best;
> +
> +		best = kvm_find_cpuid_entry(vcpu, KVM_CPUID_FEATURES, 0);
> +		if (!best)
> +			return;
> +
> +		best->eax |= (1 << KVM_FEATURE_SEV_LIVE_MIGRATION);

Again echoing Steve's concern, userspace is the ultimate authority on what
features are exposed to the VM.  I don't see any motivation for forcing live
migration to be enabled.

And as I believe was pointed out elsewhere, this bit needs to be advertised to
userspace via kvm_cpu_caps.

> +	}
> +
>  	if (!kvm_vcpu_apicv_active(vcpu))
>  		return;
>  
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index 066ca2a9f1e6..e1bffc11e425 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -79,6 +79,7 @@ struct kvm_sev_info {
>  	unsigned long pages_locked; /* Number of pages locked */
>  	struct list_head regions_list;  /* List of registered regions */
>  	u64 ap_jump_table;	/* SEV-ES AP Jump Table address */
> +	bool live_migration_enabled;
>  	/* List and count of shared pages */
>  	int shared_pages_list_count;
>  	struct list_head shared_pages_list;
> @@ -592,6 +593,7 @@ int svm_unregister_enc_region(struct kvm *kvm,
>  void pre_sev_run(struct vcpu_svm *svm, int cpu);
>  void __init sev_hardware_setup(void);
>  void sev_hardware_teardown(void);
> +void sev_update_migration_flags(struct kvm *kvm, u64 data);
>  void sev_free_vcpu(struct kvm_vcpu *vcpu);
>  int sev_handle_vmgexit(struct vcpu_svm *svm);
>  int sev_es_string_io(struct vcpu_svm *svm, int size, unsigned int port, int in);
> -- 
> 2.17.1
> 

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl
  2021-02-04  0:39 ` [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl Ashish Kalra
  2021-02-04 16:14   ` Tom Lendacky
@ 2021-02-17  1:03   ` Sean Christopherson
  2021-02-17 14:00     ` Kalra, Ashish
  2021-02-18 18:32     ` Kalra, Ashish
  2021-02-17  1:06   ` Sean Christopherson
  2 siblings, 2 replies; 75+ messages in thread
From: Sean Christopherson @ 2021-02-17  1:03 UTC (permalink / raw)
  To: Ashish Kalra
  Cc: pbonzini, tglx, mingo, hpa, rkrcmar, joro, bp, thomas.lendacky,
	x86, kvm, linux-kernel, srutherford, venu.busireddy,
	brijesh.singh

On Thu, Feb 04, 2021, Ashish Kalra wrote:
> From: Brijesh Singh <brijesh.singh@amd.com>
> 
> The ioctl is used to retrieve a guest's shared pages list.

What's the performance hit to boot time if KVM_HC_PAGE_ENC_STATUS is passed
through to userspace?  That way, userspace could manage the set of pages in
whatever data structure they want, and these get/set ioctls go away.

Also, aren't there plans for an in-guest migration helper?  If so, do we have
any idea what that interface will look like?  E.g. if we're going to end up with
a full fledged driver in the guest, why not bite the bullet now and bypass KVM
entirely?

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl
  2021-02-04  0:39 ` [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl Ashish Kalra
  2021-02-04 16:14   ` Tom Lendacky
  2021-02-17  1:03   ` Sean Christopherson
@ 2021-02-17  1:06   ` Sean Christopherson
  2 siblings, 0 replies; 75+ messages in thread
From: Sean Christopherson @ 2021-02-17  1:06 UTC (permalink / raw)
  To: Ashish Kalra
  Cc: pbonzini, tglx, mingo, hpa, joro, bp, thomas.lendacky, x86, kvm,
	linux-kernel, srutherford, venu.busireddy, brijesh.singh

On Thu, Feb 04, 2021, Ashish Kalra wrote:
> From: Brijesh Singh <brijesh.singh@amd.com>
> 
> The ioctl is used to retrieve a guest's shared pages list.
> 
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: "Radim Krčmář" <rkrcmar@redhat.com>

AFAIK, Radim is no longer involved with KVM, and his RedHat email is bouncing.
Probably best to drop his Cc in future revisions.

^ permalink raw reply	[flat|nested] 75+ messages in thread

* RE: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl
  2021-02-17  1:03   ` Sean Christopherson
@ 2021-02-17 14:00     ` Kalra, Ashish
  2021-02-17 16:13       ` Sean Christopherson
  2021-02-18 18:32     ` Kalra, Ashish
  1 sibling, 1 reply; 75+ messages in thread
From: Kalra, Ashish @ 2021-02-17 14:00 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: pbonzini, tglx, mingo, hpa, rkrcmar, joro, bp, Lendacky, Thomas,
	x86, kvm, linux-kernel, srutherford, venu.busireddy, Singh,
	Brijesh

[AMD Public Use]

-----Original Message-----
From: Sean Christopherson <seanjc@google.com> 
Sent: Tuesday, February 16, 2021 7:03 PM
To: Kalra, Ashish <Ashish.Kalra@amd.com>
Cc: pbonzini@redhat.com; tglx@linutronix.de; mingo@redhat.com; hpa@zytor.com; rkrcmar@redhat.com; joro@8bytes.org; bp@suse.de; Lendacky, Thomas <Thomas.Lendacky@amd.com>; x86@kernel.org; kvm@vger.kernel.org; linux-kernel@vger.kernel.org; srutherford@google.com; venu.busireddy@oracle.com; Singh, Brijesh <brijesh.singh@amd.com>
Subject: Re: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl

On Thu, Feb 04, 2021, Ashish Kalra wrote:
> From: Brijesh Singh <brijesh.singh@amd.com>
> 
> The ioctl is used to retrieve a guest's shared pages list.

>What's the performance hit to boot time if KVM_HC_PAGE_ENC_STATUS is passed through to userspace?  That way, userspace could manage the set of pages in whatever data structure they want, and these get/set ioctls go away.

What is the advantage of passing KVM_HC_PAGE_ENC_STATUS through to user-space?

As such it is just a simple interface to get the shared pages list via the get/set ioctls; simply an array is passed to these ioctls to get/set the shared pages
list.

>Also, aren't there plans for an in-guest migration helper?  If so, do we have any idea what that interface will look like?  E.g. if we're going to end up with a full fledged driver in the guest, why not bite the bullet now and bypass KVM entirely?

Even the in-guest migration helper will be using page encryption status hypercalls, so some interface is surely required.

Also, the in-guest migration helper will mainly be an OVMF component; it won't really be a full fledged kernel driver in the guest.

Thanks,
Ashish

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl
  2021-02-17 14:00     ` Kalra, Ashish
@ 2021-02-17 16:13       ` Sean Christopherson
  2021-02-18  6:48         ` Kalra, Ashish
  0 siblings, 1 reply; 75+ messages in thread
From: Sean Christopherson @ 2021-02-17 16:13 UTC (permalink / raw)
  To: Kalra, Ashish
  Cc: pbonzini, tglx, mingo, hpa, rkrcmar, joro, bp, Lendacky, Thomas,
	x86, kvm, linux-kernel, srutherford, venu.busireddy, Singh,
	Brijesh

On Wed, Feb 17, 2021, Kalra, Ashish wrote:
> From: Sean Christopherson <seanjc@google.com> 
> On Thu, Feb 04, 2021, Ashish Kalra wrote:
> > From: Brijesh Singh <brijesh.singh@amd.com>
> > 
> > The ioctl is used to retrieve a guest's shared pages list.
> 
> >What's the performance hit to boot time if KVM_HC_PAGE_ENC_STATUS is passed
> >through to userspace?  That way, userspace could manage the set of pages >in
> >whatever data structure they want, and these get/set ioctls go away.
> 
> What is the advantage of passing KVM_HC_PAGE_ENC_STATUS through to user-space
> ?
> 
> As such it is just a simple interface to get the shared page list via the
> get/set ioctl's. simply an array is passed to these ioctl to get/set the
> shared pages list.

It eliminates any probability of the kernel choosing the wrong data structure,
and it's two fewer ioctls to maintain and test.

> >Also, aren't there plans for an in-guest migration helper?  If so, do we
> >have any idea what that interface will look like?  E.g. if we're going to
> >end up with a full >fledged driver in the guest, why not bite the bullet now
> >and bypass KVM entirely?
> 
> Even the in-guest migration helper will be using page encryption status
> hypercalls, so some interface is surely required.

If it's a driver with a more extensive interface, then the hypercalls can be
replaced by a driver operation.  That's obviously a big if, though.

> Also the in-guest migration will be mainly an OVMF component, won't  really
> be a full fledged kernel driver in the guest.

Is there code and/or a description of what the proposed helper would look like?

^ permalink raw reply	[flat|nested] 75+ messages in thread

* RE: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl
  2021-02-17 16:13       ` Sean Christopherson
@ 2021-02-18  6:48         ` Kalra, Ashish
  2021-02-18 16:39           ` Sean Christopherson
  0 siblings, 1 reply; 75+ messages in thread
From: Kalra, Ashish @ 2021-02-18  6:48 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: pbonzini, tglx, mingo, hpa, rkrcmar, joro, bp, Lendacky, Thomas,
	x86, kvm, linux-kernel, srutherford, venu.busireddy, Singh,
	Brijesh

[AMD Public Use]

-----Original Message-----
From: Sean Christopherson <seanjc@google.com> 
Sent: Wednesday, February 17, 2021 10:13 AM
To: Kalra, Ashish <Ashish.Kalra@amd.com>
Cc: pbonzini@redhat.com; tglx@linutronix.de; mingo@redhat.com; hpa@zytor.com; rkrcmar@redhat.com; joro@8bytes.org; bp@suse.de; Lendacky, Thomas <Thomas.Lendacky@amd.com>; x86@kernel.org; kvm@vger.kernel.org; linux-kernel@vger.kernel.org; srutherford@google.com; venu.busireddy@oracle.com; Singh, Brijesh <brijesh.singh@amd.com>
Subject: Re: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl

On Wed, Feb 17, 2021, Kalra, Ashish wrote:
>> From: Sean Christopherson <seanjc@google.com> On Thu, Feb 04, 2021, 
>> Ashish Kalra wrote:
>> > From: Brijesh Singh <brijesh.singh@amd.com>
>> > 
>> > The ioctl is used to retrieve a guest's shared pages list.
>> 
>> >What's the performance hit to boot time if KVM_HC_PAGE_ENC_STATUS is 
>> >passed through to userspace?  That way, userspace could manage the 
>> >set of pages >in whatever data structure they want, and these get/set ioctls go away.
>> 
>> What is the advantage of passing KVM_HC_PAGE_ENC_STATUS through to 
>> user-space ?
>> 
>> As such it is just a simple interface to get the shared page list via 
>> the get/set ioctl's. simply an array is passed to these ioctl to 
>> get/set the shared pages list.

> It eliminates any probability of the kernel choosing the wrong data structure, and it's two fewer ioctls to maintain and test.

The set shared pages list ioctl cannot be avoided, as it needs to be issued to set up the shared pages list on the
migrated VM; that cannot be achieved by passing KVM_HC_PAGE_ENC_STATUS through to user-space.

So it makes sense to add both get/set shared pages list ioctls; passing through to user-space just adds more complexity
without any significant gains.

> >Also, aren't there plans for an in-guest migration helper?  If so, do 
> >we have any idea what that interface will look like?  E.g. if we're 
> >going to end up with a full >fledged driver in the guest, why not 
> >bite the bullet now and bypass KVM entirely?
> 
> Even the in-guest migration helper will be using page encryption 
> status hypercalls, so some interface is surely required.

>If it's a driver with a more extensive interface, then the hypercalls can be replaced by a driver operation.  That's obviously a big if, though.

> Also the in-guest migration will be mainly an OVMF component, won't  
> really be a full fledged kernel driver in the guest.

>Is there code and/or a description of what the proposed helper would look like?

Not right now; there are prototype(s) under development, which I assume will be posted upstream soon.

Thanks,
Ashish

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl
  2021-02-18  6:48         ` Kalra, Ashish
@ 2021-02-18 16:39           ` Sean Christopherson
  2021-02-18 17:05             ` Kalra, Ashish
  0 siblings, 1 reply; 75+ messages in thread
From: Sean Christopherson @ 2021-02-18 16:39 UTC (permalink / raw)
  To: Kalra, Ashish
  Cc: pbonzini, tglx, mingo, hpa, rkrcmar, joro, bp, Lendacky, Thomas,
	x86, kvm, linux-kernel, srutherford, venu.busireddy, Singh,
	Brijesh

On Thu, Feb 18, 2021, Kalra, Ashish wrote:
> From: Sean Christopherson <seanjc@google.com> 
> 
> On Wed, Feb 17, 2021, Kalra, Ashish wrote:
> >> From: Sean Christopherson <seanjc@google.com> On Thu, Feb 04, 2021, 
> >> Ashish Kalra wrote:
> >> > From: Brijesh Singh <brijesh.singh@amd.com>
> >> > 
> >> > The ioctl is used to retrieve a guest's shared pages list.
> >> 
> >> >What's the performance hit to boot time if KVM_HC_PAGE_ENC_STATUS is 
> >> >passed through to userspace?  That way, userspace could manage the 
> >> >set of pages >in whatever data structure they want, and these get/set ioctls go away.
> >> 
> >> What is the advantage of passing KVM_HC_PAGE_ENC_STATUS through to 
> >> user-space ?
> >> 
> >> As such it is just a simple interface to get the shared page list via 
> >> the get/set ioctl's. simply an array is passed to these ioctl to 
> >> get/set the shared pages list.
> 
> > It eliminates any probability of the kernel choosing the wrong data
> > structure, and it's two fewer ioctls to maintain and test.
> 
> The set shared pages list ioctl cannot be avoided as it needs to be issued to
> setup the shared pages list on the migrated VM, it cannot be achieved by
> passing KVM_HC_PAGE_ENC_STATUS through to user-space.

Why's that?  AIUI, KVM doesn't do anything with the list other than pass it back
to userspace.  Assuming that's the case, userspace can just hold onto the list
for the next migration.

> So it makes sense to add both get/set shared pages list ioctl, passing
> through to user-space is just adding more complexity without any significant
> gains.

No, it reduces complexity for KVM by avoiding locking and memslot dependencies.

^ permalink raw reply	[flat|nested] 75+ messages in thread

* RE: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl
  2021-02-18 16:39           ` Sean Christopherson
@ 2021-02-18 17:05             ` Kalra, Ashish
  2021-02-18 17:50               ` Sean Christopherson
  0 siblings, 1 reply; 75+ messages in thread
From: Kalra, Ashish @ 2021-02-18 17:05 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: pbonzini, tglx, mingo, hpa, rkrcmar, joro, bp, Lendacky, Thomas,
	x86, kvm, linux-kernel, srutherford, venu.busireddy, Singh,
	Brijesh

[AMD Public Use]


-----Original Message-----
From: Sean Christopherson <seanjc@google.com> 
Sent: Thursday, February 18, 2021 10:39 AM
To: Kalra, Ashish <Ashish.Kalra@amd.com>
Cc: pbonzini@redhat.com; tglx@linutronix.de; mingo@redhat.com; hpa@zytor.com; rkrcmar@redhat.com; joro@8bytes.org; bp@suse.de; Lendacky, Thomas <Thomas.Lendacky@amd.com>; x86@kernel.org; kvm@vger.kernel.org; linux-kernel@vger.kernel.org; srutherford@google.com; venu.busireddy@oracle.com; Singh, Brijesh <brijesh.singh@amd.com>
Subject: Re: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl

On Thu, Feb 18, 2021, Kalra, Ashish wrote:
> From: Sean Christopherson <seanjc@google.com>
> 
> On Wed, Feb 17, 2021, Kalra, Ashish wrote:
> >> From: Sean Christopherson <seanjc@google.com> On Thu, Feb 04, 2021, 
> >> Ashish Kalra wrote:
> >> > From: Brijesh Singh <brijesh.singh@amd.com>
> >> > 
> >> > The ioctl is used to retrieve a guest's shared pages list.
> >> 
> >> >What's the performance hit to boot time if KVM_HC_PAGE_ENC_STATUS 
> >> >is passed through to userspace?  That way, userspace could manage 
> >> >the set of pages >in whatever data structure they want, and these get/set ioctls go away.
> >> 
> >> What is the advantage of passing KVM_HC_PAGE_ENC_STATUS through to 
> >> user-space ?
> >> 
> >> As such it is just a simple interface to get the shared page list 
> >> via the get/set ioctl's. simply an array is passed to these ioctl 
> >> to get/set the shared pages list.
>> 
>> > It eliminates any probability of the kernel choosing the wrong data 
>> > structure, and it's two fewer ioctls to maintain and test.
>> 
>> The set shared pages list ioctl cannot be avoided as it needs to be 
>> issued to setup the shared pages list on the migrated VM, it cannot be 
>> achieved by passing KVM_HC_PAGE_ENC_STATUS through to user-space.

>Why's that?  AIUI, KVM doesn't do anything with the list other than pass it back to userspace.  Assuming that's the case, userspace can just hold onto the list for the next migration.

KVM does use it as part of the SEV DBG_DECRYPT API: within sev_dbg_decrypt() it checks whether the guest page(s) are encrypted,
and accordingly decides whether to decrypt the guest page(s) and return them to user-space, or just return them as-is.

Thanks,
Ashish

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl
  2021-02-18 17:05             ` Kalra, Ashish
@ 2021-02-18 17:50               ` Sean Christopherson
  0 siblings, 0 replies; 75+ messages in thread
From: Sean Christopherson @ 2021-02-18 17:50 UTC (permalink / raw)
  To: Kalra, Ashish
  Cc: pbonzini, tglx, mingo, hpa, rkrcmar, joro, bp, Lendacky, Thomas,
	x86, kvm, linux-kernel, srutherford, venu.busireddy, Singh,
	Brijesh

On Thu, Feb 18, 2021, Kalra, Ashish wrote:
> From: Sean Christopherson <seanjc@google.com> 
> 
> On Thu, Feb 18, 2021, Kalra, Ashish wrote:
> > From: Sean Christopherson <seanjc@google.com>
> > 
> > On Wed, Feb 17, 2021, Kalra, Ashish wrote:
> > >> From: Sean Christopherson <seanjc@google.com> On Thu, Feb 04, 2021, 
> > >> Ashish Kalra wrote:
> > >> > From: Brijesh Singh <brijesh.singh@amd.com>
> > >> > 
> > >> > The ioctl is used to retrieve a guest's shared pages list.
> > >> 
> > >> >What's the performance hit to boot time if KVM_HC_PAGE_ENC_STATUS 
> > >> >is passed through to userspace?  That way, userspace could manage 
> > >> >the set of pages >in whatever data structure they want, and these get/set ioctls go away.
> > >> 
> > >> What is the advantage of passing KVM_HC_PAGE_ENC_STATUS through to 
> > >> user-space ?
> > >> 
> > >> As such it is just a simple interface to get the shared page list 
> > >> via the get/set ioctl's. simply an array is passed to these ioctl 
> > >> to get/set the shared pages list.
> >> 
> >> > It eliminates any probability of the kernel choosing the wrong data 
> >> > structure, and it's two fewer ioctls to maintain and test.
> >> 
> >> The set shared pages list ioctl cannot be avoided as it needs to be 
> >> issued to setup the shared pages list on the migrated VM, it cannot be 
> >> achieved by passing KVM_HC_PAGE_ENC_STATUS through to user-space.
> 
> >Why's that?  AIUI, KVM doesn't do anything with the list other than pass it
> >back to userspace.  Assuming that's the case, userspace can just hold onto
> >the list for the next migration.
> 
> KVM does use it as part of the SEV DBG_DECTYPT API, within sev_dbg_decrypt()
> to check if the guest page(s) are encrypted or not, and accordingly use it to
> decide whether to decrypt the guest page(s) and return that back to
> user-space or just return it as it is.

Why is handling shared memory KVM's responsibility?  Userspace shouldn't be
asking KVM to decrypt memory it knows isn't encrypted.  My understanding is that
bogus decryption won't harm the kernel, it will only corrupt the guest.  In
other words, patch 16 can be dropped if managing the set of shared pages is
punted to userspace.

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v10 14/16] KVM: x86: Add guest support for detecting and enabling SEV Live Migration feature.
  2021-02-04  0:40 ` [PATCH v10 14/16] KVM: x86: Add guest support for detecting and enabling SEV Live Migration feature Ashish Kalra
@ 2021-02-18 17:56   ` Sean Christopherson
  0 siblings, 0 replies; 75+ messages in thread
From: Sean Christopherson @ 2021-02-18 17:56 UTC (permalink / raw)
  To: Ashish Kalra
  Cc: pbonzini, tglx, mingo, hpa, rkrcmar, joro, bp, thomas.lendacky,
	x86, kvm, linux-kernel, srutherford, venu.busireddy,
	brijesh.singh

On Thu, Feb 04, 2021, Ashish Kalra wrote:
> From: Ashish Kalra <ashish.kalra@amd.com>

...

>  arch/x86/include/asm/mem_encrypt.h |  8 +++++
>  arch/x86/kernel/kvm.c              | 52 ++++++++++++++++++++++++++++++
>  arch/x86/mm/mem_encrypt.c          | 41 +++++++++++++++++++++++
>  3 files changed, 101 insertions(+)

Please use "x86/kvm:" in the shortlog for guest side changes so that reviewers
can easily differentiate between host and guest patches.  Past changes haven't
exactly been consistent for guest changes, but "x86/kvm:" is the most popular
at a glance, and IMO aligns best with other kernel (non-KVM) changes.

^ permalink raw reply	[flat|nested] 75+ messages in thread

* RE: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl
  2021-02-17  1:03   ` Sean Christopherson
  2021-02-17 14:00     ` Kalra, Ashish
@ 2021-02-18 18:32     ` Kalra, Ashish
  2021-02-24 17:51       ` Ashish Kalra
  1 sibling, 1 reply; 75+ messages in thread
From: Kalra, Ashish @ 2021-02-18 18:32 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: pbonzini, tglx, mingo, hpa, rkrcmar, joro, bp, Lendacky, Thomas,
	x86, kvm, linux-kernel, srutherford, venu.busireddy, Singh,
	Brijesh

[AMD Public Use]


-----Original Message-----
From: Sean Christopherson <seanjc@google.com> 
Sent: Tuesday, February 16, 2021 7:03 PM
To: Kalra, Ashish <Ashish.Kalra@amd.com>
Cc: pbonzini@redhat.com; tglx@linutronix.de; mingo@redhat.com; hpa@zytor.com; rkrcmar@redhat.com; joro@8bytes.org; bp@suse.de; Lendacky, Thomas <Thomas.Lendacky@amd.com>; x86@kernel.org; kvm@vger.kernel.org; linux-kernel@vger.kernel.org; srutherford@google.com; venu.busireddy@oracle.com; Singh, Brijesh <brijesh.singh@amd.com>
Subject: Re: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl

On Thu, Feb 04, 2021, Ashish Kalra wrote:
> From: Brijesh Singh <brijesh.singh@amd.com>
> 
> The ioctl is used to retrieve a guest's shared pages list.

>What's the performance hit to boot time if KVM_HC_PAGE_ENC_STATUS is passed through to userspace?  That way, userspace could manage the set of pages >in whatever data structure they want, and these get/set ioctls go away.

I would be more concerned about the performance hit during guest DMA I/O if the page encryption status hypercalls are passed through to user-space:
a lot of guest DMA I/O dynamically sets up pages for encryption and then flips them back at DMA completion, so guest I/O will surely take a performance
hit with this pass-through approach.

Thanks,
Ashish

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl
  2021-02-18 18:32     ` Kalra, Ashish
@ 2021-02-24 17:51       ` Ashish Kalra
  2021-02-24 18:22         ` Sean Christopherson
  0 siblings, 1 reply; 75+ messages in thread
From: Ashish Kalra @ 2021-02-24 17:51 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: pbonzini, tglx, mingo, hpa, rkrcmar, joro, bp, Lendacky, Thomas,
	x86, kvm, linux-kernel, srutherford, venu.busireddy, Singh,
	Brijesh

On Thu, Feb 18, 2021 at 12:32:47PM -0600, Kalra, Ashish wrote:
> [AMD Public Use]
> 
> 
> -----Original Message-----
> From: Sean Christopherson <seanjc@google.com> 
> Sent: Tuesday, February 16, 2021 7:03 PM
> To: Kalra, Ashish <Ashish.Kalra@amd.com>
> Cc: pbonzini@redhat.com; tglx@linutronix.de; mingo@redhat.com; hpa@zytor.com; rkrcmar@redhat.com; joro@8bytes.org; bp@suse.de; Lendacky, Thomas <Thomas.Lendacky@amd.com>; x86@kernel.org; kvm@vger.kernel.org; linux-kernel@vger.kernel.org; srutherford@google.com; venu.busireddy@oracle.com; Singh, Brijesh <brijesh.singh@amd.com>
> Subject: Re: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl
> 
> On Thu, Feb 04, 2021, Ashish Kalra wrote:
> > From: Brijesh Singh <brijesh.singh@amd.com>
> > 
> > The ioctl is used to retrieve a guest's shared pages list.
> 
> >What's the performance hit to boot time if KVM_HC_PAGE_ENC_STATUS is passed through to userspace?  That way, userspace could manage the set of pages >in whatever data structure they want, and these get/set ioctls go away.
> 
> I will be more concerned about performance hit during guest DMA I/O if the page encryption status hypercalls are passed through to user-space, 
> a lot of guest DMA I/O dynamically sets up pages for encryption and then flips them at DMA completion, so guest I/O will surely take a performance 
> hit with this pass-through stuff.
> 

Here are some rough performance numbers comparing the # of heavy-weight VMEXITs to the # of hypercalls
during a SEV guest boot (launch of an Ubuntu 18.04 guest):

# ./perf record -e kvm:kvm_userspace_exit -e kvm:kvm_hypercall -a ./qemu-system-x86_64 -enable-kvm -cpu host -machine q35 -smp 16,maxcpus=64 -m 512M -drive if=pflash,format=raw,unit=0,file=/home/ashish/sev-migration/qemu-5.1.50/OVMF_CODE.fd,readonly -drive if=pflash,format=raw,unit=1,file=OVMF_VARS.fd -drive file=../ubuntu-18.04.qcow2,if=none,id=disk0,format=qcow2 -device virtio-scsi-pci,id=scsi,disable-legacy=on,iommu_platform=true -device scsi-hd,drive=disk0 -object sev-guest,id=sev0,cbitpos=47,reduced-phys-bits=1,policy=0x0 -machine memory-encryption=sev0 -trace events=/tmp/events -nographic -monitor pty -monitor unix:monitor-source,server,nowait -qmp unix:/tmp/qmp-sock,server,nowait -device virtio-rng-pci,disable-legacy=on,iommu_platform=true

...
...

root@diesel2540:/home/ashish/sev-migration/qemu-5.1.50# ./perf report
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 981K of event 'kvm:kvm_userspace_exit'
# Event count (approx.): 981021
#
# Overhead  Command          Shared Object     Symbol
# ........  ...............  ................  ..................
#
   100.00%  qemu-system-x86  [kernel.vmlinux]  [k] kvm_vcpu_ioctl


# Samples: 19K of event 'kvm:kvm_hypercall'
# Event count (approx.): 19573
#
# Overhead  Command          Shared Object     Symbol
# ........  ...............  ................  .........................
#
   100.00%  qemu-system-x86  [kernel.vmlinux]  [k] kvm_emulate_hypercall

Out of these 19573 hypercalls, the # of page encryption status hcalls is 19479,
so almost all hypercalls here are page encryption status hypercalls.

The above data indicates that there will be ~2% more Heavyweight VMEXITs
(19573 extra exits on top of ~981K) during SEV guest boot if we pass the
page encryption status hypercalls through to host userspace.

But, then Brijesh pointed out to me and highlighted that currently
OVMF is doing lot of VMEXITs because they don't use the DMA pool to minimize the C-bit toggles,
in other words, OVMF bounce buffer does page state change on every DMA allocate and free.

So here is the performance analysis for the case where the kernel and initrd have
been loaded into memory using grub, with perf started just before booting the kernel.

These are the performance #'s after the kernel and initrd have been loaded into memory,
with perf then attached and the kernel booted:

# Samples: 1M of event 'kvm:kvm_userspace_exit'
# Event count (approx.): 1081235
#
# Overhead  Trace output
# ........  ........................
#
    99.77%  reason KVM_EXIT_IO (2)
     0.23%  reason KVM_EXIT_MMIO (6)

# Samples: 1K of event 'kvm:kvm_hypercall'
# Event count (approx.): 1279
#

So as the above data indicates, Linux is only making ~1K hypercalls,
compared to ~18K hypercalls made by OVMF in the above use case.

Does the above add a prerequisite that OVMF needs to be optimized
before hypercall pass-through can be done?

Thanks,
Ashish

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl
  2021-02-24 17:51       ` Ashish Kalra
@ 2021-02-24 18:22         ` Sean Christopherson
  2021-02-25 20:20           ` Ashish Kalra
  0 siblings, 1 reply; 75+ messages in thread
From: Sean Christopherson @ 2021-02-24 18:22 UTC (permalink / raw)
  To: Ashish Kalra
  Cc: pbonzini, tglx, mingo, hpa, rkrcmar, joro, bp, Lendacky, Thomas,
	x86, kvm, linux-kernel, srutherford, venu.busireddy, Singh,
	Brijesh

On Wed, Feb 24, 2021, Ashish Kalra wrote:
> # Samples: 19K of event 'kvm:kvm_hypercall'
> # Event count (approx.): 19573
> #
> # Overhead  Command          Shared Object     Symbol
> # ........  ...............  ................  .........................
> #
>    100.00%  qemu-system-x86  [kernel.vmlinux]  [k] kvm_emulate_hypercall
> 
> Out of these 19573 hypercalls, # of page encryption status hcalls are 19479,
> so almost all hypercalls here are page encryption status hypercalls.

Oof.

> The above data indicates that there will be ~2% more Heavyweight VMEXITs
> during SEV guest boot if we do page encryption status hypercalls 
> pass-through to host userspace.
> 
> But, then Brijesh pointed out to me and highlighted that currently
> OVMF is doing lot of VMEXITs because they don't use the DMA pool to minimize the C-bit toggles,
> in other words, OVMF bounce buffer does page state change on every DMA allocate and free.
> 
> So here is the performance analysis after kernel and initrd have been
> loaded into memory using grub and then starting perf just before booting the kernel.
> 
> These are the performance #'s after kernel and initrd have been loaded into memory, 
> then perf is attached and kernel is booted : 
> 
> # Samples: 1M of event 'kvm:kvm_userspace_exit'
> # Event count (approx.): 1081235
> #
> # Overhead  Trace output
> # ........  ........................
> #
>     99.77%  reason KVM_EXIT_IO (2)
>      0.23%  reason KVM_EXIT_MMIO (6)
> 
> # Samples: 1K of event 'kvm:kvm_hypercall'
> # Event count (approx.): 1279
> #
> 
> So as the above data indicates, Linux is only making ~1K hypercalls,
> compared to ~18K hypercalls made by OVMF in the above use case.
> 
> Does the above adds a prerequisite that OVMF needs to be optimized if 
> and before hypercall pass-through can be done ? 

Disclaimer: my math could be totally wrong.

I doubt it's a hard requirement.  Assuming a conservative roundtrip time of 50k
cycles, those 18K hypercalls will add well under half a second of boot time.
If userspace can push the roundtrip time down to 10k cycles, the overhead is
more like 50 milliseconds.
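
(Spelling out the arithmetic, assuming a ~3 GHz clock: 18K hypercalls * 50K
cycles = 900M cycles, i.e. ~0.3 seconds; at 10K cycles per roundtrip that is
180M cycles, i.e. ~50-60 milliseconds.)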

That being said, this does seem like a good OVMF cleanup, irrespective of this
new hypercall.  I assume it's not cheap to convert a page between encrypted and
decrypted.

Thanks much for getting the numbers!

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl
  2021-02-24 18:22         ` Sean Christopherson
@ 2021-02-25 20:20           ` Ashish Kalra
  2021-02-25 22:59             ` Steve Rutherford
  0 siblings, 1 reply; 75+ messages in thread
From: Ashish Kalra @ 2021-02-25 20:20 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: pbonzini, tglx, mingo, hpa, rkrcmar, joro, bp, Lendacky, Thomas,
	x86, kvm, linux-kernel, srutherford, venu.busireddy, Singh,
	Brijesh

On Wed, Feb 24, 2021 at 10:22:33AM -0800, Sean Christopherson wrote:
> On Wed, Feb 24, 2021, Ashish Kalra wrote:
> > # Samples: 19K of event 'kvm:kvm_hypercall'
> > # Event count (approx.): 19573
> > #
> > # Overhead  Command          Shared Object     Symbol
> > # ........  ...............  ................  .........................
> > #
> >    100.00%  qemu-system-x86  [kernel.vmlinux]  [k] kvm_emulate_hypercall
> > 
> > Out of these 19573 hypercalls, # of page encryption status hcalls are 19479,
> > so almost all hypercalls here are page encryption status hypercalls.
> 
> Oof.
> 
> > The above data indicates that there will be ~2% more Heavyweight VMEXITs
> > during SEV guest boot if we do page encryption status hypercalls 
> > pass-through to host userspace.
> > 
> > But, then Brijesh pointed out to me and highlighted that currently
> > OVMF is doing lot of VMEXITs because they don't use the DMA pool to minimize the C-bit toggles,
> > in other words, OVMF bounce buffer does page state change on every DMA allocate and free.
> > 
> > So here is the performance analysis after kernel and initrd have been
> > loaded into memory using grub and then starting perf just before booting the kernel.
> > 
> > These are the performance #'s after kernel and initrd have been loaded into memory, 
> > then perf is attached and kernel is booted : 
> > 
> > # Samples: 1M of event 'kvm:kvm_userspace_exit'
> > # Event count (approx.): 1081235
> > #
> > # Overhead  Trace output
> > # ........  ........................
> > #
> >     99.77%  reason KVM_EXIT_IO (2)
> >      0.23%  reason KVM_EXIT_MMIO (6)
> > 
> > # Samples: 1K of event 'kvm:kvm_hypercall'
> > # Event count (approx.): 1279
> > #
> > 
> > So as the above data indicates, Linux is only making ~1K hypercalls,
> > compared to ~18K hypercalls made by OVMF in the above use case.
> > 
> > Does the above adds a prerequisite that OVMF needs to be optimized if 
> > and before hypercall pass-through can be done ? 
> 
> Disclaimer: my math could be totally wrong.
> 
> I doubt it's a hard requirement.  Assuming a conversative roundtrip time of 50k
> cycles, those 18K hypercalls will add well under a 1/2 a second of boot time.
> If userspace can push the roundtrip time down to 10k cycles, the overhead is
> more like 50 milliseconds.
> 
> That being said, this does seem like a good OVMF cleanup, irrespective of this
> new hypercall.  I assume it's not cheap to convert a page between encrypted and
> decrypted.
> 
> Thanks much for getting the numbers!

Considering the above data and guest boot time latencies 
(and potential issues with OVMF and optimizations required there),
do we have any consensus on whether we want to do page encryption
status hypercall passthrough or not?

Thanks,
Ashish

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl
  2021-02-25 20:20           ` Ashish Kalra
@ 2021-02-25 22:59             ` Steve Rutherford
  2021-02-25 23:24               ` Steve Rutherford
  2021-02-26 14:04               ` Ashish Kalra
  0 siblings, 2 replies; 75+ messages in thread
From: Steve Rutherford @ 2021-02-25 22:59 UTC (permalink / raw)
  To: Ashish Kalra
  Cc: Sean Christopherson, pbonzini, tglx, mingo, hpa, rkrcmar, joro,
	bp, Lendacky, Thomas, x86, kvm, linux-kernel, venu.busireddy,
	Singh, Brijesh

On Thu, Feb 25, 2021 at 12:20 PM Ashish Kalra <ashish.kalra@amd.com> wrote:
>
> On Wed, Feb 24, 2021 at 10:22:33AM -0800, Sean Christopherson wrote:
> > On Wed, Feb 24, 2021, Ashish Kalra wrote:
> > > # Samples: 19K of event 'kvm:kvm_hypercall'
> > > # Event count (approx.): 19573
> > > #
> > > # Overhead  Command          Shared Object     Symbol
> > > # ........  ...............  ................  .........................
> > > #
> > >    100.00%  qemu-system-x86  [kernel.vmlinux]  [k] kvm_emulate_hypercall
> > >
> > > Out of these 19573 hypercalls, # of page encryption status hcalls are 19479,
> > > so almost all hypercalls here are page encryption status hypercalls.
> >
> > Oof.
> >
> > > The above data indicates that there will be ~2% more Heavyweight VMEXITs
> > > during SEV guest boot if we do page encryption status hypercalls
> > > pass-through to host userspace.
> > >
> > > But, then Brijesh pointed out to me and highlighted that currently
> > > OVMF is doing lot of VMEXITs because they don't use the DMA pool to minimize the C-bit toggles,
> > > in other words, OVMF bounce buffer does page state change on every DMA allocate and free.
> > >
> > > So here is the performance analysis after kernel and initrd have been
> > > loaded into memory using grub and then starting perf just before booting the kernel.
> > >
> > > These are the performance #'s after kernel and initrd have been loaded into memory,
> > > then perf is attached and kernel is booted :
> > >
> > > # Samples: 1M of event 'kvm:kvm_userspace_exit'
> > > # Event count (approx.): 1081235
> > > #
> > > # Overhead  Trace output
> > > # ........  ........................
> > > #
> > >     99.77%  reason KVM_EXIT_IO (2)
> > >      0.23%  reason KVM_EXIT_MMIO (6)
> > >
> > > # Samples: 1K of event 'kvm:kvm_hypercall'
> > > # Event count (approx.): 1279
> > > #
> > >
> > > So as the above data indicates, Linux is only making ~1K hypercalls,
> > > compared to ~18K hypercalls made by OVMF in the above use case.
> > >
> > > Does the above adds a prerequisite that OVMF needs to be optimized if
> > > and before hypercall pass-through can be done ?
> >
> > Disclaimer: my math could be totally wrong.
> >
> > I doubt it's a hard requirement.  Assuming a conversative roundtrip time of 50k
> > cycles, those 18K hypercalls will add well under a 1/2 a second of boot time.
> > If userspace can push the roundtrip time down to 10k cycles, the overhead is
> > more like 50 milliseconds.
> >
> > That being said, this does seem like a good OVMF cleanup, irrespective of this
> > new hypercall.  I assume it's not cheap to convert a page between encrypted and
> > decrypted.
> >
> > Thanks much for getting the numbers!
>
> Considering the above data and guest boot time latencies
> (and potential issues with OVMF and optimizations required there),
> do we have any consensus on whether we want to do page encryption
> status hypercall passthrough or not ?
>
> Thanks,
> Ashish

Thanks for grabbing the data!

I am fine with both paths. Sean has stated an explicit desire for
hypercall exiting, so I think that would be the current consensus.

If we want to do hypercall exiting, this should be in a follow-up
series where we implement something more generic, e.g. a hypercall
exiting bitmap or hypercall exit list. If we are taking the hypercall
exit route, we can drop the kvm side of the hypercall. Userspace could
also handle the MSR using MSR filters (would need to confirm that).
Then userspace could also be in control of the cpuid bit.

Essentially, I think you could drop most of the host kernel work if
there were generic support for hypercall exiting. Then userspace would
be responsible for all of that. Thoughts on this?
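
For reference, a rough sketch of the MSR-filter route from the VMM side,
assuming the KVM_X86_SET_MSR_FILTER / KVM_CAP_X86_USER_SPACE_MSR uAPI that
went in around 5.10 (error handling elided):

	/*
	 * Deny in-kernel handling of the migration MSR so that guest
	 * accesses exit to userspace as KVM_EXIT_X86_{RD,WR}MSR.
	 */
	__u8 bitmap = 0;		/* bit clear => access is filtered */
	struct kvm_msr_filter filter = {
		.flags = KVM_MSR_FILTER_DEFAULT_ALLOW,
		.ranges[0] = {
			.flags = KVM_MSR_FILTER_READ | KVM_MSR_FILTER_WRITE,
			.nmsrs = 1,
			.base = MSR_KVM_SEV_LIVE_MIGRATION,
			.bitmap = &bitmap,
		},
	};
	struct kvm_enable_cap cap = {
		.cap = KVM_CAP_X86_USER_SPACE_MSR,
		.args[0] = KVM_MSR_EXIT_REASON_FILTER,
	};

	ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
	ioctl(vm_fd, KVM_X86_SET_MSR_FILTER, &filter);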

--Steve

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl
  2021-02-25 22:59             ` Steve Rutherford
@ 2021-02-25 23:24               ` Steve Rutherford
  2021-02-26 14:04               ` Ashish Kalra
  1 sibling, 0 replies; 75+ messages in thread
From: Steve Rutherford @ 2021-02-25 23:24 UTC (permalink / raw)
  To: Ashish Kalra
  Cc: Sean Christopherson, pbonzini, tglx, mingo, hpa, joro, bp,
	Lendacky, Thomas, x86, kvm, linux-kernel, venu.busireddy, Singh,
	Brijesh

On Thu, Feb 25, 2021 at 2:59 PM Steve Rutherford <srutherford@google.com> wrote:
>
> On Thu, Feb 25, 2021 at 12:20 PM Ashish Kalra <ashish.kalra@amd.com> wrote:
> >
> > On Wed, Feb 24, 2021 at 10:22:33AM -0800, Sean Christopherson wrote:
> > > On Wed, Feb 24, 2021, Ashish Kalra wrote:
> > > > # Samples: 19K of event 'kvm:kvm_hypercall'
> > > > # Event count (approx.): 19573
> > > > #
> > > > # Overhead  Command          Shared Object     Symbol
> > > > # ........  ...............  ................  .........................
> > > > #
> > > >    100.00%  qemu-system-x86  [kernel.vmlinux]  [k] kvm_emulate_hypercall
> > > >
> > > > Out of these 19573 hypercalls, # of page encryption status hcalls are 19479,
> > > > so almost all hypercalls here are page encryption status hypercalls.
> > >
> > > Oof.
> > >
> > > > The above data indicates that there will be ~2% more Heavyweight VMEXITs
> > > > during SEV guest boot if we do page encryption status hypercalls
> > > > pass-through to host userspace.
> > > >
> > > > But, then Brijesh pointed out to me and highlighted that currently
> > > > OVMF is doing lot of VMEXITs because they don't use the DMA pool to minimize the C-bit toggles,
> > > > in other words, OVMF bounce buffer does page state change on every DMA allocate and free.
> > > >
> > > > So here is the performance analysis after kernel and initrd have been
> > > > loaded into memory using grub and then starting perf just before booting the kernel.
> > > >
> > > > These are the performance #'s after kernel and initrd have been loaded into memory,
> > > > then perf is attached and kernel is booted :
> > > >
> > > > # Samples: 1M of event 'kvm:kvm_userspace_exit'
> > > > # Event count (approx.): 1081235
> > > > #
> > > > # Overhead  Trace output
> > > > # ........  ........................
> > > > #
> > > >     99.77%  reason KVM_EXIT_IO (2)
> > > >      0.23%  reason KVM_EXIT_MMIO (6)
> > > >
> > > > # Samples: 1K of event 'kvm:kvm_hypercall'
> > > > # Event count (approx.): 1279
> > > > #
> > > >
> > > > So as the above data indicates, Linux is only making ~1K hypercalls,
> > > > compared to ~18K hypercalls made by OVMF in the above use case.
> > > >
> > > > Does the above adds a prerequisite that OVMF needs to be optimized if
> > > > and before hypercall pass-through can be done ?
> > >
> > > Disclaimer: my math could be totally wrong.
> > >
> > > I doubt it's a hard requirement.  Assuming a conversative roundtrip time of 50k
> > > cycles, those 18K hypercalls will add well under a 1/2 a second of boot time.
> > > If userspace can push the roundtrip time down to 10k cycles, the overhead is
> > > more like 50 milliseconds.
> > >
> > > That being said, this does seem like a good OVMF cleanup, irrespective of this
> > > new hypercall.  I assume it's not cheap to convert a page between encrypted and
> > > decrypted.
> > >
> > > Thanks much for getting the numbers!
> >
> > Considering the above data and guest boot time latencies
> > (and potential issues with OVMF and optimizations required there),
> > do we have any consensus on whether we want to do page encryption
> > status hypercall passthrough or not ?
> >
> > Thanks,
> > Ashish
>
> Thanks for grabbing the data!
>
> I am fine with both paths. Sean has stated an explicit desire for
> hypercall exiting, so I think that would be the current consensus.
>
> If we want to do hypercall exiting, this should be in a follow-up
> series where we implement something more generic, e.g. a hypercall
> exiting bitmap or hypercall exit list. If we are taking the hypercall
> exit route, we can drop the kvm side of the hypercall. Userspace could
> also handle the MSR using MSR filters (would need to confirm that).
> Then userspace could also be in control of the cpuid bit.
>
> Essentially, I think you could drop most of the host kernel work if
> there were generic support for hypercall exiting. Then userspace would
> be responsible for all of that. Thoughts on this?
>
> --Steve

This could even go a step further, and use an MSR write from within
the guest instead of a hypercall, which could be patched through to
userspace without host modification, if I understand the MSR filtering
correctly.
--Steve

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl
  2021-02-25 22:59             ` Steve Rutherford
  2021-02-25 23:24               ` Steve Rutherford
@ 2021-02-26 14:04               ` Ashish Kalra
  2021-02-26 17:44                 ` Sean Christopherson
  1 sibling, 1 reply; 75+ messages in thread
From: Ashish Kalra @ 2021-02-26 14:04 UTC (permalink / raw)
  To: Steve Rutherford
  Cc: Sean Christopherson, pbonzini, tglx, mingo, hpa, rkrcmar, joro,
	bp, Lendacky, Thomas, x86, kvm, linux-kernel, venu.busireddy,
	Singh, Brijesh

Hello Steve,

On Thu, Feb 25, 2021 at 02:59:27PM -0800, Steve Rutherford wrote:
> On Thu, Feb 25, 2021 at 12:20 PM Ashish Kalra <ashish.kalra@amd.com> wrote:
> >
> > On Wed, Feb 24, 2021 at 10:22:33AM -0800, Sean Christopherson wrote:
> > > On Wed, Feb 24, 2021, Ashish Kalra wrote:
> > > > # Samples: 19K of event 'kvm:kvm_hypercall'
> > > > # Event count (approx.): 19573
> > > > #
> > > > # Overhead  Command          Shared Object     Symbol
> > > > # ........  ...............  ................  .........................
> > > > #
> > > >    100.00%  qemu-system-x86  [kernel.vmlinux]  [k] kvm_emulate_hypercall
> > > >
> > > > Out of these 19573 hypercalls, # of page encryption status hcalls are 19479,
> > > > so almost all hypercalls here are page encryption status hypercalls.
> > >
> > > Oof.
> > >
> > > > The above data indicates that there will be ~2% more Heavyweight VMEXITs
> > > > during SEV guest boot if we do page encryption status hypercalls
> > > > pass-through to host userspace.
> > > >
> > > > But, then Brijesh pointed out to me and highlighted that currently
> > > > OVMF is doing lot of VMEXITs because they don't use the DMA pool to minimize the C-bit toggles,
> > > > in other words, OVMF bounce buffer does page state change on every DMA allocate and free.
> > > >
> > > > So here is the performance analysis after kernel and initrd have been
> > > > loaded into memory using grub and then starting perf just before booting the kernel.
> > > >
> > > > These are the performance #'s after kernel and initrd have been loaded into memory,
> > > > then perf is attached and kernel is booted :
> > > >
> > > > # Samples: 1M of event 'kvm:kvm_userspace_exit'
> > > > # Event count (approx.): 1081235
> > > > #
> > > > # Overhead  Trace output
> > > > # ........  ........................
> > > > #
> > > >     99.77%  reason KVM_EXIT_IO (2)
> > > >      0.23%  reason KVM_EXIT_MMIO (6)
> > > >
> > > > # Samples: 1K of event 'kvm:kvm_hypercall'
> > > > # Event count (approx.): 1279
> > > > #
> > > >
> > > > So as the above data indicates, Linux is only making ~1K hypercalls,
> > > > compared to ~18K hypercalls made by OVMF in the above use case.
> > > >
> > > > Does the above adds a prerequisite that OVMF needs to be optimized if
> > > > and before hypercall pass-through can be done ?
> > >
> > > Disclaimer: my math could be totally wrong.
> > >
> > > I doubt it's a hard requirement.  Assuming a conversative roundtrip time of 50k
> > > cycles, those 18K hypercalls will add well under a 1/2 a second of boot time.
> > > If userspace can push the roundtrip time down to 10k cycles, the overhead is
> > > more like 50 milliseconds.
> > >
> > > That being said, this does seem like a good OVMF cleanup, irrespective of this
> > > new hypercall.  I assume it's not cheap to convert a page between encrypted and
> > > decrypted.
> > >
> > > Thanks much for getting the numbers!
> >
> > Considering the above data and guest boot time latencies
> > (and potential issues with OVMF and optimizations required there),
> > do we have any consensus on whether we want to do page encryption
> > status hypercall passthrough or not ?
> >
> > Thanks,
> > Ashish
> 
> Thanks for grabbing the data!
> 
> I am fine with both paths. Sean has stated an explicit desire for
> hypercall exiting, so I think that would be the current consensus.
> 
> If we want to do hypercall exiting, this should be in a follow-up
> series where we implement something more generic, e.g. a hypercall
> exiting bitmap or hypercall exit list. If we are taking the hypercall
> exit route, we can drop the kvm side of the hypercall. Userspace could
> also handle the MSR using MSR filters (would need to confirm that).
> Then userspace could also be in control of the cpuid bit.
> 
> Essentially, I think you could drop most of the host kernel work if
> there were generic support for hypercall exiting. Then userspace would
> be responsible for all of that. Thoughts on this?
> 
So if I understand it correctly, I will be submitting v11 of this patch-set
with in-kernel support for page encryption status hypercalls and the shared
pages list, userspace control of the SEV live migration feature, and fixes
for MSR handling.

In subsequent follow-up patches we will add generic support for hypercall
exiting, then drop the KVM side of the hypercall and add userspace support
for MSR handling.

Thanks,
Ashish

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl
  2021-02-26 14:04               ` Ashish Kalra
@ 2021-02-26 17:44                 ` Sean Christopherson
  2021-03-02 14:55                   ` Ashish Kalra
  2021-03-08 10:40                   ` Ashish Kalra
  0 siblings, 2 replies; 75+ messages in thread
From: Sean Christopherson @ 2021-02-26 17:44 UTC (permalink / raw)
  To: Ashish Kalra
  Cc: Steve Rutherford, pbonzini, joro, Lendacky, Thomas, kvm,
	linux-kernel, venu.busireddy, Singh, Brijesh, Will Deacon,
	Quentin Perret

+Will and Quentin (arm64)

Moving the non-KVM x86 folks to bcc, I don't think they care about KVM details
at this point.

On Fri, Feb 26, 2021, Ashish Kalra wrote:
> On Thu, Feb 25, 2021 at 02:59:27PM -0800, Steve Rutherford wrote:
> > On Thu, Feb 25, 2021 at 12:20 PM Ashish Kalra <ashish.kalra@amd.com> wrote:
> > Thanks for grabbing the data!
> > 
> > I am fine with both paths. Sean has stated an explicit desire for
> > hypercall exiting, so I think that would be the current consensus.

Yep, though it'd be good to get Paolo's input, too.

> > If we want to do hypercall exiting, this should be in a follow-up
> > series where we implement something more generic, e.g. a hypercall
> > exiting bitmap or hypercall exit list. If we are taking the hypercall
> > exit route, we can drop the kvm side of the hypercall.

I don't think this is a good candidate for arbitrary hypercall interception.  Or
rather, I think hypercall interception should be an orthogonal implementation.

The guest, including guest firmware, needs to be aware that the hypercall is
supported, and the ABI needs to be well-defined.  Relying on userspace VMMs to
implement a common ABI is an unnecessary risk.

We could make KVM's default behavior be a nop, i.e. have KVM enforce the ABI but
require further VMM intervention.  But, I just don't see the point, it would
save only a few lines of code.  It would also limit what KVM could do in the
future, e.g. if KVM wanted to do its own bookkeeping _and_ exit to userspace,
then mandatory interception would essentially make it impossible for KVM to do
bookkeeping while still honoring the interception request.

However, I do think it would make sense to have the userspace exit be a generic
exit type.  But hey, we already have the necessary ABI defined for that!  It's
just not used anywhere.

	/* KVM_EXIT_HYPERCALL */
	struct {
		__u64 nr;
		__u64 args[6];
		__u64 ret;
		__u32 longmode;
		__u32 pad;
	} hypercall;
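
A VMM could then consume those exits with something like the below sketch;
the argument layout mirrors the gpa/npages/enc-status triple this series
uses, and mark_range_shared() is a made-up userspace helper:

	case KVM_EXIT_HYPERCALL:
		if (run->hypercall.nr == KVM_HC_PAGE_ENC_STATUS) {
			/* args[0] = gpa, args[1] = npages, args[2] = enc */
			mark_range_shared(run->hypercall.args[0],
					  run->hypercall.args[1],
					  !run->hypercall.args[2]);
			run->hypercall.ret = 0;
		}
		break;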


> > Userspace could also handle the MSR using MSR filters (would need to
> > confirm that).  Then userspace could also be in control of the cpuid bit.

An MSR is not a great fit; it's x86 specific and limited to 64 bits of data.
The data limitation could be fudged by shoving data into non-standard GPRs, but
that will result in truly heinous guest code, and extensibility issues.

The data limitation is a moot point, because the x86-only thing is a deal
breaker.  arm64's pKVM work has a near-identical use case for a guest to share
memory with a host.  I can't think of a clever way to avoid having to support
TDX's and SNP's hypervisor-agnostic variants, but we can at least not have
multiple KVM variants.

> > Essentially, I think you could drop most of the host kernel work if
> > there were generic support for hypercall exiting. Then userspace would
> > be responsible for all of that. Thoughts on this?

> So if i understand it correctly, i will submitting v11 of this patch-set
> with in-kernel support for page encryption status hypercalls and shared
> pages list and the userspace control of SEV live migration feature
> support and fixes for MSR handling.

At this point, I'd say hold off on putting more effort into an implementation
until we have consensus.

> In subsequent follow-up patches we will add generic support for hypercall 
> exiting and then drop kvm side of hypercall and also add userspace
> support for MSR handling.
> 
> Thanks,
> Ashish

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl
  2021-02-26 17:44                 ` Sean Christopherson
@ 2021-03-02 14:55                   ` Ashish Kalra
  2021-03-02 15:15                     ` Ashish Kalra
  2021-03-03 18:54                     ` Will Deacon
  2021-03-08 10:40                   ` Ashish Kalra
  1 sibling, 2 replies; 75+ messages in thread
From: Ashish Kalra @ 2021-03-02 14:55 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Steve Rutherford, pbonzini, joro, Lendacky, Thomas, kvm,
	linux-kernel, venu.busireddy, Singh, Brijesh, Will Deacon,
	Quentin Perret

On Fri, Feb 26, 2021 at 09:44:41AM -0800, Sean Christopherson wrote:
> +Will and Quentin (arm64)
> 
> Moving the non-KVM x86 folks to bcc, I don't think they care about KVM details at this
> point.
> 
> On Fri, Feb 26, 2021, Ashish Kalra wrote:
> > On Thu, Feb 25, 2021 at 02:59:27PM -0800, Steve Rutherford wrote:
> > > On Thu, Feb 25, 2021 at 12:20 PM Ashish Kalra <ashish.kalra@amd.com> wrote:
> > > Thanks for grabbing the data!
> > > 
> > > I am fine with both paths. Sean has stated an explicit desire for
> > > hypercall exiting, so I think that would be the current consensus.
> 
> Yep, though it'd be good to get Paolo's input, too.
> 
> > > If we want to do hypercall exiting, this should be in a follow-up
> > > series where we implement something more generic, e.g. a hypercall
> > > exiting bitmap or hypercall exit list. If we are taking the hypercall
> > > exit route, we can drop the kvm side of the hypercall.
> 
> I don't think this is a good candidate for arbitrary hypercall interception.  Or
> rather, I think hypercall interception should be an orthogonal implementation.
> 
> The guest, including guest firmware, needs to be aware that the hypercall is
> supported, and the ABI needs to be well-defined.  Relying on userspace VMMs to
> implement a common ABI is an unnecessary risk.
> 
> We could make KVM's default behavior be a nop, i.e. have KVM enforce the ABI but
> require further VMM intervention.  But, I just don't see the point, it would
> save only a few lines of code.  It would also limit what KVM could do in the
> future, e.g. if KVM wanted to do its own bookkeeping _and_ exit to userspace,
> then mandatory interception would essentially make it impossible for KVM to do
> bookkeeping while still honoring the interception request.
> 
> However, I do think it would make sense to have the userspace exit be a generic
> exit type.  But hey, we already have the necessary ABI defined for that!  It's
> just not used anywhere.
> 
> 	/* KVM_EXIT_HYPERCALL */
> 	struct {
> 		__u64 nr;
> 		__u64 args[6];
> 		__u64 ret;
> 		__u32 longmode;
> 		__u32 pad;
> 	} hypercall;
> 
> 
> > > Userspace could also handle the MSR using MSR filters (would need to
> > > confirm that).  Then userspace could also be in control of the cpuid bit.
> 
> An MSR is not a great fit; it's x86 specific and limited to 64 bits of data.
> The data limitation could be fudged by shoving data into non-standard GPRs, but
> that will result in truly heinous guest code, and extensibility issues.
> 
> The data limitation is a moot point, because the x86-only thing is a deal
> breaker.  arm64's pKVM work has a near-identical use case for a guest to share
> memory with a host.  I can't think of a clever way to avoid having to support
> TDX's and SNP's hypervisor-agnostic variants, but we can at least not have
> multiple KVM variants.

Looking at arm64's pKVM work, I see that it is a recently introduced
RFC patch-set, probably relevant to the arm64 nVHE hypervisor
mode/implementation, and it potentially makes sense as it adds guest
memory protection where both the host and guest kernels run at the
same privilege level?

Though I do see that the pKVM stuff adds two hypercalls, specifically:

pkvm_create_mappings() (I assume this is for setting up shared memory
regions between host and guest) and
pkvm_create_private_mappings().

And the use-cases are quite similar to memory protection architectures'
use cases, for example, use with virtio devices, guest DMA I/O, etc.

But isn't this patch set still RFC? Though I agree that it adds
infrastructure for standardised communication between the host and
its guests for mutually controlled shared memory regions, and surely
adds some kind of portability between hypervisor implementations,
nothing is standardised yet, right?

Thanks,
Ashish

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl
  2021-03-02 14:55                   ` Ashish Kalra
@ 2021-03-02 15:15                     ` Ashish Kalra
  2021-03-03 18:54                     ` Will Deacon
  1 sibling, 0 replies; 75+ messages in thread
From: Ashish Kalra @ 2021-03-02 15:15 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Steve Rutherford, pbonzini, joro, Lendacky, Thomas, kvm,
	linux-kernel, venu.busireddy, Singh, Brijesh, Will Deacon,
	Quentin Perret

On Tue, Mar 02, 2021 at 02:55:43PM +0000, Ashish Kalra wrote:
> On Fri, Feb 26, 2021 at 09:44:41AM -0800, Sean Christopherson wrote:
> > +Will and Quentin (arm64)
> > 
> > Moving the non-KVM x86 folks to bcc, I don't think they care about KVM
> > details at this point.
> > 
> > On Fri, Feb 26, 2021, Ashish Kalra wrote:
> > > On Thu, Feb 25, 2021 at 02:59:27PM -0800, Steve Rutherford wrote:
> > > > On Thu, Feb 25, 2021 at 12:20 PM Ashish Kalra <ashish.kalra@amd.com> wrote:
> > > > Thanks for grabbing the data!
> > > > 
> > > > I am fine with both paths. Sean has stated an explicit desire for
> > > > hypercall exiting, so I think that would be the current consensus.
> > 
> > Yep, though it'd be good to get Paolo's input, too.
> > 
> > > > If we want to do hypercall exiting, this should be in a follow-up
> > > > series where we implement something more generic, e.g. a hypercall
> > > > exiting bitmap or hypercall exit list. If we are taking the hypercall
> > > > exit route, we can drop the kvm side of the hypercall.
> > 
> > I don't think this is a good candidate for arbitrary hypercall interception.  Or
> > rather, I think hypercall interception should be an orthogonal implementation.
> > 
> > The guest, including guest firmware, needs to be aware that the hypercall is
> > supported, and the ABI needs to be well-defined.  Relying on userspace VMMs to
> > implement a common ABI is an unnecessary risk.
> > 
> > We could make KVM's default behavior be a nop, i.e. have KVM enforce the ABI but
> > require further VMM intervention.  But, I just don't see the point, it would
> > save only a few lines of code.  It would also limit what KVM could do in the
> > future, e.g. if KVM wanted to do its own bookkeeping _and_ exit to userspace,
> > then mandatory interception would essentially make it impossible for KVM to do
> > bookkeeping while still honoring the interception request.
> > 
> > However, I do think it would make sense to have the userspace exit be a generic
> > exit type.  But hey, we already have the necessary ABI defined for that!  It's
> > just not used anywhere.
> > 
> > 	/* KVM_EXIT_HYPERCALL */
> > 	struct {
> > 		__u64 nr;
> > 		__u64 args[6];
> > 		__u64 ret;
> > 		__u32 longmode;
> > 		__u32 pad;
> > 	} hypercall;
> > 
> > 
> > > > Userspace could also handle the MSR using MSR filters (would need to
> > > > confirm that).  Then userspace could also be in control of the cpuid bit.
> > 
> > An MSR is not a great fit; it's x86 specific and limited to 64 bits of data.
> > The data limitation could be fudged by shoving data into non-standard GPRs, but
> > that will result in truly heinous guest code, and extensibility issues.
> > 
> > The data limitation is a moot point, because the x86-only thing is a deal
> > breaker.  arm64's pKVM work has a near-identical use case for a guest to share
> > memory with a host.  I can't think of a clever way to avoid having to support
> > TDX's and SNP's hypervisor-agnostic variants, but we can at least not have
> > multiple KVM variants.
> 
> Looking at arm64's pKVM work, I see that it is a recently introduced
> RFC patch-set, probably relevant to the arm64 nVHE hypervisor
> mode/implementation, and it potentially makes sense as it adds guest
> memory protection where both the host and guest kernels run at the
> same privilege level?
> 
> Though I do see that the pKVM stuff adds two hypercalls, specifically:
> 
> pkvm_create_mappings() (I assume this is for setting up shared memory
> regions between host and guest) and
> pkvm_create_private_mappings().
> 
> And the use-cases are quite similar to memory protection architectures'
> use cases, for example, use with virtio devices, guest DMA I/O, etc.
> 
> But isn't this patch set still RFC? Though I agree that it adds
> infrastructure for standardised communication between the host and
> its guests for mutually controlled shared memory regions, and surely
> adds some kind of portability between hypervisor implementations,
> nothing is standardised yet, right?
> 

And to add here, the hypercall implementation is in-HYP mode; there is
no infrastructure as part of this patch-set to do hypercall exiting
and handle it in user-space.

Though arguably, we may be able to add a hypercall exiting code path
to the amd64 implementation for the same hypercall interfaces?

Alternatively, we implement this in-kernel and then add SET/GET ioctl
interfaces to export the shared pages/regions list to user-space.

Thanks,
Ashish

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl
  2021-03-02 14:55                   ` Ashish Kalra
  2021-03-02 15:15                     ` Ashish Kalra
@ 2021-03-03 18:54                     ` Will Deacon
  2021-03-03 19:32                       ` Ashish Kalra
                                         ` (2 more replies)
  1 sibling, 3 replies; 75+ messages in thread
From: Will Deacon @ 2021-03-03 18:54 UTC (permalink / raw)
  To: Ashish Kalra
  Cc: Sean Christopherson, Steve Rutherford, pbonzini, joro, Lendacky,
	Thomas, kvm, linux-kernel, venu.busireddy, Singh, Brijesh,
	Quentin Perret, maz

[+Marc]

On Tue, Mar 02, 2021 at 02:55:43PM +0000, Ashish Kalra wrote:
> On Fri, Feb 26, 2021 at 09:44:41AM -0800, Sean Christopherson wrote:
> > On Fri, Feb 26, 2021, Ashish Kalra wrote:
> > > On Thu, Feb 25, 2021 at 02:59:27PM -0800, Steve Rutherford wrote:
> > > > On Thu, Feb 25, 2021 at 12:20 PM Ashish Kalra <ashish.kalra@amd.com> wrote:
> > > > Thanks for grabbing the data!
> > > > 
> > > > I am fine with both paths. Sean has stated an explicit desire for
> > > > hypercall exiting, so I think that would be the current consensus.
> > 
> > Yep, though it'd be good to get Paolo's input, too.
> > 
> > > > If we want to do hypercall exiting, this should be in a follow-up
> > > > series where we implement something more generic, e.g. a hypercall
> > > > exiting bitmap or hypercall exit list. If we are taking the hypercall
> > > > exit route, we can drop the kvm side of the hypercall.
> > 
> > I don't think this is a good candidate for arbitrary hypercall interception.  Or
> > rather, I think hypercall interception should be an orthogonal implementation.
> > 
> > The guest, including guest firmware, needs to be aware that the hypercall is
> > supported, and the ABI needs to be well-defined.  Relying on userspace VMMs to
> > implement a common ABI is an unnecessary risk.
> > 
> > We could make KVM's default behavior be a nop, i.e. have KVM enforce the ABI but
> > require further VMM intervention.  But, I just don't see the point, it would
> > save only a few lines of code.  It would also limit what KVM could do in the
> > future, e.g. if KVM wanted to do its own bookkeeping _and_ exit to userspace,
> > then mandatory interception would essentially make it impossible for KVM to do
> > bookkeeping while still honoring the interception request.
> > 
> > However, I do think it would make sense to have the userspace exit be a generic
> > exit type.  But hey, we already have the necessary ABI defined for that!  It's
> > just not used anywhere.
> > 
> > 	/* KVM_EXIT_HYPERCALL */
> > 	struct {
> > 		__u64 nr;
> > 		__u64 args[6];
> > 		__u64 ret;
> > 		__u32 longmode;
> > 		__u32 pad;
> > 	} hypercall;
> > 
> > 
> > > > Userspace could also handle the MSR using MSR filters (would need to
> > > > confirm that).  Then userspace could also be in control of the cpuid bit.
> > 
> > An MSR is not a great fit; it's x86 specific and limited to 64 bits of data.
> > The data limitation could be fudged by shoving data into non-standard GPRs, but
> > that will result in truly heinous guest code, and extensibility issues.
> > 
> > The data limitation is a moot point, because the x86-only thing is a deal
> > breaker.  arm64's pKVM work has a near-identical use case for a guest to share
> > memory with a host.  I can't think of a clever way to avoid having to support
> > TDX's and SNP's hypervisor-agnostic variants, but we can at least not have
> > multiple KVM variants.
> 
> > Looking at arm64's pKVM work, I see that it is a recently introduced
> > RFC patch-set, probably relevant to the arm64 nVHE hypervisor
> > mode/implementation, and it potentially makes sense as it adds guest
> > memory protection where both the host and guest kernels run at the
> > same privilege level?
> > 
> > Though I do see that the pKVM stuff adds two hypercalls, specifically:
> > 
> > pkvm_create_mappings() (I assume this is for setting up shared memory
> > regions between host and guest) and
> > pkvm_create_private_mappings().
> > 
> > And the use-cases are quite similar to memory protection architectures'
> > use cases, for example, use with virtio devices, guest DMA I/O, etc.

These hypercalls are both private to the host kernel communicating with
its hypervisor counterpart, so I don't think they're particularly
relevant here. As far as I can see, the more useful thing is to allow
the guest to communicate back to the host (and the VMM) that it has opened
up a memory window, perhaps for virtio rings or some other shared memory.

We hacked this up as a prototype in the past:

https://android-kvm.googlesource.com/linux/+/d12a9e2c12a52cf7140d40cd9fa092dc8a85fac9%5E%21/

but that's all arm64-specific and if we're solving the same problem as
you, then let's avoid arch-specific stuff if possible. The way in which
the guest discovers the interface will be arch-specific (we already have
a mechanism for that and some hypercalls are already allocated by specs
from Arm), but the interface back to the VMM and some (most?) of the host
handling could be shared.

> But isn't this patch set still RFC? Though I agree that it adds
> infrastructure for standardised communication between the host and
> its guests for mutually controlled shared memory regions, and surely
> adds some kind of portability between hypervisor implementations,
> nothing is standardised yet, right?

No, and it seems that you're further ahead than us in terms of
implementation in this area. We're happy to review patches though, to
make sure we end up with something that works for us both.

Will

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl
  2021-03-03 18:54                     ` Will Deacon
@ 2021-03-03 19:32                       ` Ashish Kalra
  2021-03-09 19:10                       ` Ashish Kalra
  2021-03-11 18:14                       ` Ashish Kalra
  2 siblings, 0 replies; 75+ messages in thread
From: Ashish Kalra @ 2021-03-03 19:32 UTC (permalink / raw)
  To: Will Deacon
  Cc: Sean Christopherson, Steve Rutherford, pbonzini, joro, Lendacky,
	Thomas, kvm, linux-kernel, venu.busireddy, Singh, Brijesh,
	Quentin Perret, maz

On Wed, Mar 03, 2021 at 06:54:41PM +0000, Will Deacon wrote:
> [+Marc]
> 
> On Tue, Mar 02, 2021 at 02:55:43PM +0000, Ashish Kalra wrote:
> > On Fri, Feb 26, 2021 at 09:44:41AM -0800, Sean Christopherson wrote:
> > > On Fri, Feb 26, 2021, Ashish Kalra wrote:
> > > > On Thu, Feb 25, 2021 at 02:59:27PM -0800, Steve Rutherford wrote:
> > > > > On Thu, Feb 25, 2021 at 12:20 PM Ashish Kalra <ashish.kalra@amd.com> wrote:
> > > > > Thanks for grabbing the data!
> > > > > 
> > > > > I am fine with both paths. Sean has stated an explicit desire for
> > > > > hypercall exiting, so I think that would be the current consensus.
> > > 
> > > Yep, though it'd be good to get Paolo's input, too.
> > > 
> > > > > If we want to do hypercall exiting, this should be in a follow-up
> > > > > series where we implement something more generic, e.g. a hypercall
> > > > > exiting bitmap or hypercall exit list. If we are taking the hypercall
> > > > > exit route, we can drop the kvm side of the hypercall.
> > > 
> > > I don't think this is a good candidate for arbitrary hypercall interception.  Or
> > > rather, I think hypercall interception should be an orthogonal implementation.
> > > 
> > > The guest, including guest firmware, needs to be aware that the hypercall is
> > > supported, and the ABI needs to be well-defined.  Relying on userspace VMMs to
> > > implement a common ABI is an unnecessary risk.
> > > 
> > > We could make KVM's default behavior be a nop, i.e. have KVM enforce the ABI but
> > > require further VMM intervention.  But, I just don't see the point, it would
> > > save only a few lines of code.  It would also limit what KVM could do in the
> > > future, e.g. if KVM wanted to do its own bookkeeping _and_ exit to userspace,
> > > then mandatory interception would essentially make it impossible for KVM to do
> > > bookkeeping while still honoring the interception request.
> > > 
> > > However, I do think it would make sense to have the userspace exit be a generic
> > > exit type.  But hey, we already have the necessary ABI defined for that!  It's
> > > just not used anywhere.
> > > 
> > > 	/* KVM_EXIT_HYPERCALL */
> > > 	struct {
> > > 		__u64 nr;
> > > 		__u64 args[6];
> > > 		__u64 ret;
> > > 		__u32 longmode;
> > > 		__u32 pad;
> > > 	} hypercall;
> > > 
> > > 
> > > > > Userspace could also handle the MSR using MSR filters (would need to
> > > > > confirm that).  Then userspace could also be in control of the cpuid bit.
> > > 
> > > An MSR is not a great fit; it's x86 specific and limited to 64 bits of data.
> > > The data limitation could be fudged by shoving data into non-standard GPRs, but
> > > that will result in truly heinous guest code, and extensibility issues.
> > > 
> > > The data limitation is a moot point, because the x86-only thing is a deal
> > > breaker.  arm64's pKVM work has a near-identical use case for a guest to share
> > > memory with a host.  I can't think of a clever way to avoid having to support
> > > TDX's and SNP's hypervisor-agnostic variants, but we can at least not have
> > > multiple KVM variants.
> > 
> > Looking at arm64's pKVM work, I see that it is a recently introduced
> > RFC patch-set, probably relevant to the arm64 nVHE hypervisor
> > mode/implementation, and it potentially makes sense as it adds guest
> > memory protection where both the host and guest kernels run at the
> > same privilege level?
> > 
> > Though I do see that the pKVM stuff adds two hypercalls, specifically:
> > 
> > pkvm_create_mappings() (I assume this is for setting up shared memory
> > regions between host and guest) and
> > pkvm_create_private_mappings().
> > 
> > And the use-cases are quite similar to memory protection architectures'
> > use cases, for example, use with virtio devices, guest DMA I/O, etc.
> 
> These hypercalls are both private to the host kernel communicating with
> its hypervisor counterpart, so I don't think they're particularly
> relevant here. 

Yes, I have the same thoughts here, as this looks like a private
hypercall interface between the host kernel and hypervisor, rather
than between guest and host/hypervisor.

> As far as I can see, the more useful thing is to allow
> the guest to communicate back to the host (and the VMM) that it has opened
> up a memory window, perhaps for virtio rings or some other shared memory.
> 

Yes, this is our main use case.

> We hacked this up as a prototype in the past:
> 
> https://android-kvm.googlesource.com/linux/+/d12a9e2c12a52cf7140d40cd9fa092dc8a85fac9%5E%21/
> 
> but that's all arm64-specific and if we're solving the same problem as
> you, then let's avoid arch-specific stuff if possible. The way in which
> the guest discovers the interface will be arch-specific (we already have
> a mechanism for that and some hypercalls are already allocated by specs
> from Arm), but the interface back to the VMM and some (most?) of the host
> handling could be shared.
> 

Ok, yes and that makes sense.

> > But isn't this patch set still RFC? Though I agree that it adds
> > infrastructure for standardised communication between the host and
> > its guests for mutually controlled shared memory regions, and surely
> > adds some kind of portability between hypervisor implementations,
> > nothing is standardised yet, right?
> 
> No, and it seems that you're further ahead than us in terms of
> implementation in this area. We're happy to review patches though, to
> make sure we end up with something that works for us both.
> 

Here is the link to the current patch series:
https://lore.kernel.org/lkml/cover.1612398155.git.ashish.kalra@amd.com/

And the following patches will be relevant here: 
  KVM: x86: Add AMD SEV specific Hypercall3
  KVM: X86: Introduce KVM_HC_PAGE_ENC_STATUS hypercall
  mm: x86: Invoke hypercall when page encryption status is changed
  KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl
  KVM: x86: Introduce KVM_SET_SHARED_PAGES_LIST ioctl
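
For quick reference, the guest side of the first two patches boils down
to a three-argument hypercall. A simplified sketch, with the argument
order (gpa, number of pages, encryption state) assumed from this series,
and using a raw VMMCALL since the guest is known to be on AMD:

	/* Simplified guest-side sketch of KVM_HC_PAGE_ENC_STATUS. */
	static inline long kvm_sev_hypercall3(unsigned int nr,
					      unsigned long p1,
					      unsigned long p2,
					      unsigned long p3)
	{
		long ret;

		asm volatile("vmmcall"
			     : "=a"(ret)
			     : "a"(nr), "b"(p1), "c"(p2), "d"(p3)
			     : "memory");
		return ret;
	}

	/* Tell the host that @npages starting at @gpa changed state. */
	static void notify_page_enc_status(unsigned long gpa,
					   unsigned long npages, bool enc)
	{
		kvm_sev_hypercall3(KVM_HC_PAGE_ENC_STATUS, gpa, npages, enc);
	}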

Thanks,
Ashish

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl
  2021-02-26 17:44                 ` Sean Christopherson
  2021-03-02 14:55                   ` Ashish Kalra
@ 2021-03-08 10:40                   ` Ashish Kalra
  2021-03-08 19:51                     ` Sean Christopherson
  1 sibling, 1 reply; 75+ messages in thread
From: Ashish Kalra @ 2021-03-08 10:40 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Steve Rutherford, pbonzini, joro, Lendacky, Thomas, kvm,
	linux-kernel, venu.busireddy, Singh, Brijesh, Will Deacon,
	Quentin Perret

On Fri, Feb 26, 2021 at 09:44:41AM -0800, Sean Christopherson wrote:
> +Will and Quentin (arm64)
> 
> Moving the non-KVM x86 folks to bcc, I don't think they care about KVM
> details at this point.
> 
> On Fri, Feb 26, 2021, Ashish Kalra wrote:
> > On Thu, Feb 25, 2021 at 02:59:27PM -0800, Steve Rutherford wrote:
> > > On Thu, Feb 25, 2021 at 12:20 PM Ashish Kalra <ashish.kalra@amd.com> wrote:
> > > Thanks for grabbing the data!
> > > 
> > > I am fine with both paths. Sean has stated an explicit desire for
> > > hypercall exiting, so I think that would be the current consensus.
> 
> Yep, though it'd be good to get Paolo's input, too.
> 
> > > If we want to do hypercall exiting, this should be in a follow-up
> > > series where we implement something more generic, e.g. a hypercall
> > > exiting bitmap or hypercall exit list. If we are taking the hypercall
> > > exit route, we can drop the kvm side of the hypercall.
> 
> I don't think this is a good candidate for arbitrary hypercall interception.  Or
> rather, I think hypercall interception should be an orthogonal implementation.
> 
> The guest, including guest firmware, needs to be aware that the hypercall is
> supported, and the ABI needs to be well-defined.  Relying on userspace VMMs to
> implement a common ABI is an unnecessary risk.
> 
> We could make KVM's default behavior be a nop, i.e. have KVM enforce the ABI but
> require further VMM intervention.  But, I just don't see the point, it would
> save only a few lines of code.  It would also limit what KVM could do in the
> future, e.g. if KVM wanted to do its own bookkeeping _and_ exit to userspace,
> then mandatory interception would essentially make it impossible for KVM to do
> bookkeeping while still honoring the interception request.
> 
> However, I do think it would make sense to have the userspace exit be a generic
> exit type.  But hey, we already have the necessary ABI defined for that!  It's
> just not used anywhere.
> 
> 	/* KVM_EXIT_HYPERCALL */
> 	struct {
> 		__u64 nr;
> 		__u64 args[6];
> 		__u64 ret;
> 		__u32 longmode;
> 		__u32 pad;
> 	} hypercall;
> 
> 
> > > Userspace could also handle the MSR using MSR filters (would need to
> > > confirm that).  Then userspace could also be in control of the cpuid bit.
> 
> An MSR is not a great fit; it's x86 specific and limited to 64 bits of data.
> The data limitation could be fudged by shoving data into non-standard GPRs, but
> that will result in truly heinous guest code, and extensibility issues.
> 
> The data limitation is a moot point, because the x86-only thing is a deal
> breaker.  arm64's pKVM work has a near-identical use case for a guest to share
> memory with a host.  I can't think of a clever way to avoid having to support
> TDX's and SNP's hypervisor-agnostic variants, but we can at least not have
> multiple KVM variants.
> 

Potentially, there is another reason for in-kernel hypercall handling
considering SEV-SNP. In case of SEV-SNP the RMP table tracks the state
of each guest page, for instance pages in hypervisor state, i.e., pages
with C=0 and pages in guest valid state with C=1.

Now, there shouldn't be a need for page encryption status hypercalls on 
SEV-SNP as KVM can track & reference guest page status directly using 
the RMP table.

As KVM maintains the RMP table, therefore we will need SET/GET type of
interfaces to provide the guest page encryption status to userspace.

For the above reason if we do in-kernel hypercall handling for page
encryption status (which we probably won't require for SEV-SNP &
correspondingly there will be no hypercall exiting), then we can
implement a standard GET/SET ioctl interface to get/set the guest page
encryption status for userspace, which will work across SEV, SEV-ES and
SEV-SNP.
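
As a rough sketch of what such a uapi could look like (the struct
layout and ioctl numbers below are hypothetical, not lifted from the
posted patches):

	/* Hypothetical uapi for a GET/SET shared-pages interface. */
	struct kvm_shared_pages_list {
		__u64 gfn_start; /* in: first gfn of the queried range  */
		__u64 npages;    /* in: number of gfns in the range     */
		__u64 buf;       /* in: userspace buffer of shared gfns */
		__u64 buf_count; /* in: capacity; out: entries used     */
	};

	#define KVM_GET_SHARED_PAGES_LIST \
		_IOWR(KVMIO, 0xd0, struct kvm_shared_pages_list)
	#define KVM_SET_SHARED_PAGES_LIST \
		_IOW(KVMIO, 0xd1, struct kvm_shared_pages_list)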

Thanks,
Ashish


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl
  2021-03-08 10:40                   ` Ashish Kalra
@ 2021-03-08 19:51                     ` Sean Christopherson
  2021-03-08 21:05                       ` Ashish Kalra
                                         ` (2 more replies)
  0 siblings, 3 replies; 75+ messages in thread
From: Sean Christopherson @ 2021-03-08 19:51 UTC (permalink / raw)
  To: Ashish Kalra
  Cc: Steve Rutherford, pbonzini, joro, Lendacky, Thomas, kvm,
	linux-kernel, venu.busireddy, Singh, Brijesh, Will Deacon,
	Quentin Perret

On Mon, Mar 08, 2021, Ashish Kalra wrote:
> On Fri, Feb 26, 2021 at 09:44:41AM -0800, Sean Christopherson wrote:
> > +Will and Quentin (arm64)
> > 
> > Moving the non-KVM x86 folks to bcc, I don't think they care about KVM
> > details at this point.
> > 
> > On Fri, Feb 26, 2021, Ashish Kalra wrote:
> > > On Thu, Feb 25, 2021 at 02:59:27PM -0800, Steve Rutherford wrote:
> > > > On Thu, Feb 25, 2021 at 12:20 PM Ashish Kalra <ashish.kalra@amd.com> wrote:
> > > > Thanks for grabbing the data!
> > > > 
> > > > I am fine with both paths. Sean has stated an explicit desire for
> > > > hypercall exiting, so I think that would be the current consensus.
> > 
> > Yep, though it'd be good to get Paolo's input, too.
> > 
> > > > If we want to do hypercall exiting, this should be in a follow-up
> > > > series where we implement something more generic, e.g. a hypercall
> > > > exiting bitmap or hypercall exit list. If we are taking the hypercall
> > > > exit route, we can drop the kvm side of the hypercall.
> > 
> > I don't think this is a good candidate for arbitrary hypercall interception.  Or
> > rather, I think hypercall interception should be an orthogonal implementation.
> > 
> > The guest, including guest firmware, needs to be aware that the hypercall is
> > supported, and the ABI needs to be well-defined.  Relying on userspace VMMs to
> > implement a common ABI is an unnecessary risk.
> > 
> > We could make KVM's default behavior be a nop, i.e. have KVM enforce the ABI but
> > require further VMM intervention.  But, I just don't see the point, it would
> > save only a few lines of code.  It would also limit what KVM could do in the
> > future, e.g. if KVM wanted to do its own bookkeeping _and_ exit to userspace,
> > then mandatory interception would essentially make it impossible for KVM to do
> > bookkeeping while still honoring the interception request.
> > 
> > However, I do think it would make sense to have the userspace exit be a generic
> > exit type.  But hey, we already have the necessary ABI defined for that!  It's
> > just not used anywhere.
> > 
> > 	/* KVM_EXIT_HYPERCALL */
> > 	struct {
> > 		__u64 nr;
> > 		__u64 args[6];
> > 		__u64 ret;
> > 		__u32 longmode;
> > 		__u32 pad;
> > 	} hypercall;
> > 
> > 
> > > > Userspace could also handle the MSR using MSR filters (would need to
> > > > confirm that).  Then userspace could also be in control of the cpuid bit.
> > 
> > An MSR is not a great fit; it's x86 specific and limited to 64 bits of data.
> > The data limitation could be fudged by shoving data into non-standard GPRs, but
> > that will result in truly heinous guest code, and extensibility issues.
> > 
> > The data limitation is a moot point, because the x86-only thing is a deal
> > breaker.  arm64's pKVM work has a near-identical use case for a guest to share
> > memory with a host.  I can't think of a clever way to avoid having to support
> > TDX's and SNP's hypervisor-agnostic variants, but we can at least not have
> > multiple KVM variants.
> > 
> 
> Potentially, there is another reason for in-kernel hypercall handling
> considering SEV-SNP. In case of SEV-SNP the RMP table tracks the state
> of each guest page, for instance pages in hypervisor state, i.e., pages
> with C=0 and pages in guest valid state with C=1.
> 
> Now, there shouldn't be a need for page encryption status hypercalls on 
> SEV-SNP as KVM can track & reference guest page status directly using 
> the RMP table.

Relying on the RMP table itself would require locking the RMP table for an
extended duration, and walking the entire RMP to find shared pages would be
very inefficient.

> As KVM maintains the RMP table, therefore we will need SET/GET type of
> interfaces to provide the guest page encryption status to userspace.

Hrm, somehow I temporarily forgot about SNP and TDX adding their own hypercalls
for converting between shared and private.  And in the case of TDX, the hypercall
can't be trusted, i.e. is just a hint, otherwise the guest could induce a #MC in
the host.

But, the different guest behavior doesn't require KVM to maintain a list/tree,
e.g. adding a dedicated KVM_EXIT_* for notifying userspace of page encryption
status changes would also suffice.  

Actually, that made me think of another argument against maintaining a list in
KVM: there's no way to notify userspace that a page's status has changed.
Userspace would need to query KVM to do GET_LIST after every GET_DIRTY.
Obviously not a huge issue, but it does make migration slightly less efficient.

On a related topic, there are fatal race conditions that will require careful
coordination between guest and host, and will effectively be wired into the ABI.
SNP and TDX don't suffer these issues because host awareness of status is atomic
with respect to the guest actually writing the page with the new encryption
status.

For SEV live migration...

If the guest does the hypercall after writing the page, then the guest is hosed
if it gets migrated while writing the page (scenario #1):

  vCPU                 Userspace
  zero_bytes[0:N]
                       <transfers written bytes as private instead of shared>
		       <migrates vCPU>
  zero_bytes[N+1:4095]
  set_shared (dest)
  kaboom!

If userspace does GET_DIRTY after GET_LIST, then the host would transfer bad
data by consuming a stale list (scenario #2):

  vCPU               Userspace
                     get_list (from KVM or internally)
  set_shared (src)
  zero_page (src)
                     get_dirty
                     <transfers private data instead of shared>
                     <migrates vCPU>
  kaboom!

If both guest and host order things to avoid #1 and #2, the host can still
migrate the wrong data (scenario #3):

  vCPU               Userspace
  set_private
  zero_bytes[0:4096]
                     get_dirty
  set_shared (src)
                     get_list
                     <transfers as shared instead of private>
		     <migrates vCPU>
  set_private (dest)
  kaboom!

Scenario #3 is unlikely, but plausible, e.g. if the guest bails from its
conversion flow for whatever reason, after making the initial hypercall.  Maybe
it goes without saying, but to address #3, the guest must consider existing data
as lost the instant it tells the host the page has been converted to a different
type.
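
To make that ordering concrete, here is a hypothetical guest-side
conversion that avoids #1 and #3. It is purely illustrative; the actual
series invokes the hypercall from the set_memory_{en,de}crypted() path
rather than open-coding it like this:

	/* Convert a page to shared, ordered to dodge races #1 and #3:
	 * notify the host before any write with the new encryption
	 * state, and treat the old private contents as lost. */
	static void convert_page_to_shared(unsigned long gpa)
	{
		void *va = __va(gpa);

		/* From here on, the old private contents are dead. */
		notify_page_enc_status(gpa, 1, false);

		set_memory_decrypted((unsigned long)va, 1);

		/* Only now is it safe to write shared data. */
		memset(va, 0, PAGE_SIZE);
	}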

> For the above reason if we do in-kernel hypercall handling for page
> encryption status (which we probably won't require for SEV-SNP &
> correspondingly there will be no hypercall exiting),

As above, that doesn't preclude KVM from exiting to userspace on conversion.

> then we can implement a standard GET/SET ioctl interface to get/set the guest
> page encryption status for userspace, which will work across SEV, SEV-ES and
> SEV-SNP.

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl
  2021-03-08 19:51                     ` Sean Christopherson
@ 2021-03-08 21:05                       ` Ashish Kalra
  2021-03-08 21:11                       ` Brijesh Singh
  2021-03-08 21:48                       ` Steve Rutherford
  2 siblings, 0 replies; 75+ messages in thread
From: Ashish Kalra @ 2021-03-08 21:05 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Steve Rutherford, pbonzini, joro, Lendacky, Thomas, kvm,
	linux-kernel, venu.busireddy, Singh, Brijesh, Will Deacon,
	Quentin Perret

On Mon, Mar 08, 2021 at 11:51:57AM -0800, Sean Christopherson wrote:
> On Mon, Mar 08, 2021, Ashish Kalra wrote:
> > On Fri, Feb 26, 2021 at 09:44:41AM -0800, Sean Christopherson wrote:
> > > +Will and Quentin (arm64)
> > > 
> > > Moving the non-KVM x86 folks to bcc, I don't think they care about KVM
> > > details at this point.
> > > 
> > > On Fri, Feb 26, 2021, Ashish Kalra wrote:
> > > > On Thu, Feb 25, 2021 at 02:59:27PM -0800, Steve Rutherford wrote:
> > > > > On Thu, Feb 25, 2021 at 12:20 PM Ashish Kalra <ashish.kalra@amd.com> wrote:
> > > > > Thanks for grabbing the data!
> > > > > 
> > > > > I am fine with both paths. Sean has stated an explicit desire for
> > > > > hypercall exiting, so I think that would be the current consensus.
> > > 
> > > Yep, though it'd be good to get Paolo's input, too.
> > > 
> > > > > If we want to do hypercall exiting, this should be in a follow-up
> > > > > series where we implement something more generic, e.g. a hypercall
> > > > > exiting bitmap or hypercall exit list. If we are taking the hypercall
> > > > > exit route, we can drop the kvm side of the hypercall.
> > > 
> > > I don't think this is a good candidate for arbitrary hypercall interception.  Or
> > > rather, I think hypercall interception should be an orthogonal implementation.
> > > 
> > > The guest, including guest firmware, needs to be aware that the hypercall is
> > > supported, and the ABI needs to be well-defined.  Relying on userspace VMMs to
> > > implement a common ABI is an unnecessary risk.
> > > 
> > > We could make KVM's default behavior be a nop, i.e. have KVM enforce the ABI but
> > > require further VMM intervention.  But, I just don't see the point, it would
> > > save only a few lines of code.  It would also limit what KVM could do in the
> > > future, e.g. if KVM wanted to do its own bookkeeping _and_ exit to userspace,
> > > then mandatory interception would essentially make it impossible for KVM to do
> > > bookkeeping while still honoring the interception request.
> > > 
> > > However, I do think it would make sense to have the userspace exit be a generic
> > > exit type.  But hey, we already have the necessary ABI defined for that!  It's
> > > just not used anywhere.
> > > 
> > > 	/* KVM_EXIT_HYPERCALL */
> > > 	struct {
> > > 		__u64 nr;
> > > 		__u64 args[6];
> > > 		__u64 ret;
> > > 		__u32 longmode;
> > > 		__u32 pad;
> > > 	} hypercall;
> > > 
> > > 
> > > > > Userspace could also handle the MSR using MSR filters (would need to
> > > > > confirm that).  Then userspace could also be in control of the cpuid bit.
> > > 
> > > An MSR is not a great fit; it's x86 specific and limited to 64 bits of data.
> > > The data limitation could be fudged by shoving data into non-standard GPRs, but
> > > that will result in truly heinous guest code, and extensibility issues.
> > > 
> > > The data limitation is a moot point, because the x86-only thing is a deal
> > > breaker.  arm64's pKVM work has a near-identical use case for a guest to share
> > > memory with a host.  I can't think of a clever way to avoid having to support
> > > TDX's and SNP's hypervisor-agnostic variants, but we can at least not have
> > > multiple KVM variants.
> > > 
> > 
> > Potentially, there is another reason for in-kernel hypercall handling
> > considering SEV-SNP. In case of SEV-SNP the RMP table tracks the state
> > of each guest page, for instance pages in hypervisor state, i.e., pages
> > with C=0 and pages in guest valid state with C=1.
> > 
> > Now, there shouldn't be a need for page encryption status hypercalls on 
> > SEV-SNP as KVM can track & reference guest page status directly using 
> > the RMP table.
> 
> Relying on the RMP table itself would require locking the RMP table for an
> extended duration, and walking the entire RMP to find shared pages would be
> very inefficient.
> 
> > As KVM maintains the RMP table, therefore we will need SET/GET type of
> > interfaces to provide the guest page encryption status to userspace.
> 
> Hrm, somehow I temporarily forgot about SNP and TDX adding their own hypercalls
> for converting between shared and private.  And in the case of TDX, the hypercall
> can't be trusted, i.e. is just a hint, otherwise the guest could induce a #MC in
> the host.

One question here: is this because, if the hypercall can cause direct
modifications to the shared EPT, it can induce a #MC in the host?

Thanks,
Ashish

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl
  2021-03-08 19:51                     ` Sean Christopherson
  2021-03-08 21:05                       ` Ashish Kalra
@ 2021-03-08 21:11                       ` Brijesh Singh
  2021-03-08 21:32                         ` Ashish Kalra
  2021-03-08 21:51                         ` Steve Rutherford
  2021-03-08 21:48                       ` Steve Rutherford
  2 siblings, 2 replies; 75+ messages in thread
From: Brijesh Singh @ 2021-03-08 21:11 UTC (permalink / raw)
  To: Sean Christopherson, Ashish Kalra
  Cc: brijesh.singh, Steve Rutherford, pbonzini, joro, Lendacky,
	Thomas, kvm, linux-kernel, venu.busireddy, Will Deacon,
	Quentin Perret


On 3/8/21 1:51 PM, Sean Christopherson wrote:
> On Mon, Mar 08, 2021, Ashish Kalra wrote:
>> On Fri, Feb 26, 2021 at 09:44:41AM -0800, Sean Christopherson wrote:
>>> +Will and Quentin (arm64)
>>>
>>> Moving the non-KVM x86 folks to bcc, I don't think they care about KVM
>>> details at this point.
>>>
>>> On Fri, Feb 26, 2021, Ashish Kalra wrote:
>>>> On Thu, Feb 25, 2021 at 02:59:27PM -0800, Steve Rutherford wrote:
>>>>> On Thu, Feb 25, 2021 at 12:20 PM Ashish Kalra <ashish.kalra@amd.com> wrote:
>>>>> Thanks for grabbing the data!
>>>>>
>>>>> I am fine with both paths. Sean has stated an explicit desire for
>>>>> hypercall exiting, so I think that would be the current consensus.
>>> Yep, though it'd be good to get Paolo's input, too.
>>>
>>>>> If we want to do hypercall exiting, this should be in a follow-up
>>>>> series where we implement something more generic, e.g. a hypercall
>>>>> exiting bitmap or hypercall exit list. If we are taking the hypercall
>>>>> exit route, we can drop the kvm side of the hypercall.
>>> I don't think this is a good candidate for arbitrary hypercall interception.  Or
>>> rather, I think hypercall interception should be an orthogonal implementation.
>>>
>>> The guest, including guest firmware, needs to be aware that the hypercall is
>>> supported, and the ABI needs to be well-defined.  Relying on userspace VMMs to
>>> implement a common ABI is an unnecessary risk.
>>>
>>> We could make KVM's default behavior be a nop, i.e. have KVM enforce the ABI but
>>> require further VMM intervention.  But, I just don't see the point, it would
>>> save only a few lines of code.  It would also limit what KVM could do in the
>>> future, e.g. if KVM wanted to do its own bookkeeping _and_ exit to userspace,
>>> then mandatory interception would essentially make it impossible for KVM to do
>>> bookkeeping while still honoring the interception request.
>>>
>>> However, I do think it would make sense to have the userspace exit be a generic
>>> exit type.  But hey, we already have the necessary ABI defined for that!  It's
>>> just not used anywhere.
>>>
>>> 	/* KVM_EXIT_HYPERCALL */
>>> 	struct {
>>> 		__u64 nr;
>>> 		__u64 args[6];
>>> 		__u64 ret;
>>> 		__u32 longmode;
>>> 		__u32 pad;
>>> 	} hypercall;
>>>
>>>
>>>>> Userspace could also handle the MSR using MSR filters (would need to
>>>>> confirm that).  Then userspace could also be in control of the cpuid bit.
>>> An MSR is not a great fit; it's x86 specific and limited to 64 bits of data.
>>> The data limitation could be fudged by shoving data into non-standard GPRs, but
>>> that will result in truly heinous guest code, and extensibility issues.
>>>
>>> The data limitation is a moot point, because the x86-only thing is a deal
>>> breaker.  arm64's pKVM work has a near-identical use case for a guest to share
>>> memory with a host.  I can't think of a clever way to avoid having to support
>>> TDX's and SNP's hypervisor-agnostic variants, but we can at least not have
>>> multiple KVM variants.
>>>
>> Potentially, there is another reason for in-kernel hypercall handling
>> considering SEV-SNP. In case of SEV-SNP the RMP table tracks the state
>> of each guest page, for instance pages in hypervisor state, i.e., pages
>> with C=0 and pages in guest valid state with C=1.
>>
>> Now, there shouldn't be a need for page encryption status hypercalls on 
>> SEV-SNP as KVM can track & reference guest page status directly using 
>> the RMP table.
> Relying on the RMP table itself would require locking the RMP table for an
> extended duration, and walking the entire RMP to find shared pages would be
> very inefficient.
>
>> As KVM maintains the RMP table, therefore we will need SET/GET type of
>> interfaces to provide the guest page encryption status to userspace.
> Hrm, somehow I temporarily forgot about SNP and TDX adding their own hypercalls
> for converting between shared and private.  And in the case of TDX, the hypercall
> can't be trusted, i.e. is just a hint, otherwise the guest could induce a #MC in
> the host.
>
> But, the different guest behavior doesn't require KVM to maintain a list/tree,
> e.g. adding a dedicated KVM_EXIT_* for notifying userspace of page encryption
> status changes would also suffice.  
>
> Actually, that made me think of another argument against maintaining a list in
> KVM: there's no way to notify userspace that a page's status has changed.
> Userspace would need to query KVM to do GET_LIST after every GET_DIRTY.
> Obviously not a huge issue, but it does make migration slightly less efficient.
>
> On a related topic, there are fatal race conditions that will require careful
> coordination between guest and host, and will effectively be wired into the ABI.
> SNP and TDX don't suffer these issues because host awareness of status is atomic
> with respect to the guest actually writing the page with the new encryption
> status.
>
> For SEV live migration...
>
> If the guest does the hypercall after writing the page, then the guest is hosed
> if it gets migrated while writing the page (scenario #1):
>
>   vCPU                 Userspace
>   zero_bytes[0:N]
>                        <transfers written bytes as private instead of shared>
> 		       <migrates vCPU>
>   zero_bytes[N+1:4095]
>   set_shared (dest)
>   kaboom!


Maybe I am missing something, but this is not any different from normal
operation inside a guest. Making a page shared/private in the page table
does not update the content of the page itself. In your above case, I
assume zero_bytes[N+1:4095] are written by the destination VM. The
memory region was private in the source VM page table, so those writes
will be performed encrypted. The destination VM later changed the memory
to shared, but nobody wrote to the memory after it was transitioned to
shared, so a reader of the memory should get ciphertext, unless there
was a write after the set_shared (dest).


> If userspace does GET_DIRTY after GET_LIST, then the host would transfer bad
> data by consuming a stale list (scenario #2):
>
>   vCPU               Userspace
>                      get_list (from KVM or internally)
>   set_shared (src)
>   zero_page (src)
>                      get_dirty
>                      <transfers private data instead of shared>
>                      <migrates vCPU>
>   kaboom!


I don't remember how things are done in Ashish's recent Qemu/KVM
patches, but in previous series the get_dirty() happens before querying
the encrypted state. There was some logic in the VMM to resync the
encrypted bitmap during the final migration stage and perform any
additional data transfer since the last sync.


> If both guest and host order things to avoid #1 and #2, the host can still
> migrate the wrong data (scenario #3):
>
>   vCPU               Userspace
>   set_private
>   zero_bytes[0:4096]
>                      get_dirty
>   set_shared (src)
>                      get_list
>                      <transfers as shared instead of private>
> 		     <migrates vCPU>
>   set_private (dest)
>   kaboom!


Since there was no write to the memory after the set_shared (src), the
content of the page should not have changed. After the set_private
(dest), the caller should see the same content written by
zero_bytes[0:4096].


> Scenario #3 is unlikely, but plausible, e.g. if the guest bails from its
> conversion flow for whatever reason, after making the initial hypercall.  Maybe
> it goes without saying, but to address #3, the guest must consider existing data
> as lost the instant it tells the host the page has been converted to a different
> type.
>
>> For the above reason if we do in-kernel hypercall handling for page
>> encryption status (which we probably won't require for SEV-SNP &
>> correspondingly there will be no hypercall exiting),
> As above, that doesn't preclude KVM from exiting to userspace on conversion.
>
>> then we can implement a standard GET/SET ioctl interface to get/set the guest
>> page encryption status for userspace, which will work across SEV, SEV-ES and
>> SEV-SNP.

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl
  2021-03-08 21:11                       ` Brijesh Singh
@ 2021-03-08 21:32                         ` Ashish Kalra
  2021-03-08 21:51                         ` Steve Rutherford
  1 sibling, 0 replies; 75+ messages in thread
From: Ashish Kalra @ 2021-03-08 21:32 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: Sean Christopherson, Steve Rutherford, pbonzini, joro, Lendacky,
	Thomas, kvm, linux-kernel, venu.busireddy, Will Deacon,
	Quentin Perret

On Mon, Mar 08, 2021 at 03:11:41PM -0600, Brijesh Singh wrote:
> 
> On 3/8/21 1:51 PM, Sean Christopherson wrote:
> > On Mon, Mar 08, 2021, Ashish Kalra wrote:
> >> On Fri, Feb 26, 2021 at 09:44:41AM -0800, Sean Christopherson wrote:
> >>> +Will and Quentin (arm64)
> >>>
> >>> Moving the non-KVM x86 folks to bcc, I don't think they care about KVM
> >>> details at this point.
> >>>
> >>> On Fri, Feb 26, 2021, Ashish Kalra wrote:
> >>>> On Thu, Feb 25, 2021 at 02:59:27PM -0800, Steve Rutherford wrote:
> >>>>> On Thu, Feb 25, 2021 at 12:20 PM Ashish Kalra <ashish.kalra@amd.com> wrote:
> >>>>> Thanks for grabbing the data!
> >>>>>
> >>>>> I am fine with both paths. Sean has stated an explicit desire for
> >>>>> hypercall exiting, so I think that would be the current consensus.
> >>> Yep, though it'd be good to get Paolo's input, too.
> >>>
> >>>>> If we want to do hypercall exiting, this should be in a follow-up
> >>>>> series where we implement something more generic, e.g. a hypercall
> >>>>> exiting bitmap or hypercall exit list. If we are taking the hypercall
> >>>>> exit route, we can drop the kvm side of the hypercall.
> >>> I don't think this is a good candidate for arbitrary hypercall interception.  Or
> >>> rather, I think hypercall interception should be an orthogonal implementation.
> >>>
> >>> The guest, including guest firmware, needs to be aware that the hypercall is
> >>> supported, and the ABI needs to be well-defined.  Relying on userspace VMMs to
> >>> implement a common ABI is an unnecessary risk.
> >>>
> >>> We could make KVM's default behavior be a nop, i.e. have KVM enforce the ABI but
> >>> require further VMM intervention.  But, I just don't see the point, it would
> >>> save only a few lines of code.  It would also limit what KVM could do in the
> >>> future, e.g. if KVM wanted to do its own bookkeeping _and_ exit to userspace,
> >>> then mandatory interception would essentially make it impossible for KVM to do
> >>> bookkeeping while still honoring the interception request.
> >>>
> >>> However, I do think it would make sense to have the userspace exit be a generic
> >>> exit type.  But hey, we already have the necessary ABI defined for that!  It's
> >>> just not used anywhere.
> >>>
> >>> 	/* KVM_EXIT_HYPERCALL */
> >>> 	struct {
> >>> 		__u64 nr;
> >>> 		__u64 args[6];
> >>> 		__u64 ret;
> >>> 		__u32 longmode;
> >>> 		__u32 pad;
> >>> 	} hypercall;
> >>>
> >>>
> >>>>> Userspace could also handle the MSR using MSR filters (would need to
> >>>>> confirm that).  Then userspace could also be in control of the cpuid bit.
> >>> An MSR is not a great fit; it's x86 specific and limited to 64 bits of data.
> >>> The data limitation could be fudged by shoving data into non-standard GPRs, but
> >>> that will result in truly heinous guest code, and extensibility issues.
> >>>
> >>> The data limitation is a moot point, because the x86-only thing is a deal
> >>> breaker.  arm64's pKVM work has a near-identical use case for a guest to share
> >>> memory with a host.  I can't think of a clever way to avoid having to support
> >>> TDX's and SNP's hypervisor-agnostic variants, but we can at least not have
> >>> multiple KVM variants.
> >>>
> >> Potentially, there is another reason for in-kernel hypercall handling
> >> considering SEV-SNP. In case of SEV-SNP the RMP table tracks the state
> >> of each guest page, for instance pages in hypervisor state, i.e., pages
> >> with C=0 and pages in guest valid state with C=1.
> >>
> >> Now, there shouldn't be a need for page encryption status hypercalls on 
> >> SEV-SNP as KVM can track & reference guest page status directly using 
> >> the RMP table.
> > Relying on the RMP table itself would require locking the RMP table for an
> > extended duration, and walking the entire RMP to find shared pages would be
> > very inefficient.
> >
> >> As KVM maintains the RMP table, therefore we will need SET/GET type of
> >> interfaces to provide the guest page encryption status to userspace.
> > Hrm, somehow I temporarily forgot about SNP and TDX adding their own hypercalls
> > for converting between shared and private.  And in the case of TDX, the hypercall
> > can't be trusted, i.e. is just a hint, otherwise the guest could induce a #MC in
> > the host.
> >
> > But, the different guest behavior doesn't require KVM to maintain a list/tree,
> > e.g. adding a dedicated KVM_EXIT_* for notifying userspace of page encryption
> > status changes would also suffice.  
> >
> > Actually, that made me think of another argument against maintaining a list in
> > KVM: there's no way to notify userspace that a page's status has changed.
> > Userspace would need to query KVM to do GET_LIST after every GET_DIRTY.
> > Obviously not a huge issue, but it does make migration slightly less efficient.
> >
> > On a related topic, there are fatal race conditions that will require careful
> > coordination between guest and host, and will effectively be wired into the ABI.
> > SNP and TDX don't suffer these issues because host awareness of status is atomic
> > with respect to the guest actually writing the page with the new encryption
> > status.
> >
> > For SEV live migration...
> >
> > If the guest does the hypercall after writing the page, then the guest is hosed
> > if it gets migrated while writing the page (scenario #1):
> >
> >   vCPU                 Userspace
> >   zero_bytes[0:N]
> >                        <transfers written bytes as private instead of shared>
> > 		       <migrates vCPU>
> >   zero_bytes[N+1:4095]
> >   set_shared (dest)
> >   kaboom!
> 
> 
> Maybe I am missing something, but this is not any different from normal
> operation inside a guest. Making a page shared/private in the page table
> does not update the content of the page itself. In your above case, I
> assume zero_bytes[N+1:4095] are written by the destination VM. The
> memory region was private in the source VM page table, so those writes
> will be performed encrypted. The destination VM later changed the memory
> to shared, but nobody wrote to the memory after it was transitioned to
> shared, so a reader of the memory should get ciphertext, unless there
> was a write after the set_shared (dest).
> 
> 
> > If userspace does GET_DIRTY after GET_LIST, then the host would transfer bad
> > data by consuming a stale list (scenario #2):
> >
> >   vCPU               Userspace
> >                      get_list (from KVM or internally)
> >   set_shared (src)
> >   zero_page (src)
> >                      get_dirty
> >                      <transfers private data instead of shared>
> >                      <migrates vCPU>
> >   kaboom!
> 
> 
> I don't remember how things are done in Ashish's recent Qemu/KVM
> patches, but in previous series the get_dirty() happens before querying
> the encrypted state. There was some logic in the VMM to resync the
> encrypted bitmap during the final migration stage and perform any
> additional data transfer since the last sync.
> 
> 

Yes, we do that, and in fact we added logic in the VMM to resync the
encrypted bitmap after every migration iteration; if there is a
difference in encrypted page states, we perform additional data
transfers corresponding to those changes.
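
In pseudo-C, the per-iteration ordering that keeps the list from going
stale relative to the dirty log looks roughly like this (every function
here is a placeholder, not actual Qemu code):

	/* One live migration pass: sync dirty state first, encryption
	 * status second, so a page whose status changed since the last
	 * pass gets re-queued and re-sent with the right treatment. */
	static void migration_iteration(void)
	{
		sync_dirty_bitmap();       /* KVM_GET_DIRTY_LOG         */
		sync_shared_pages_list();  /* KVM_GET_SHARED_PAGES_LIST */

		/* Re-dirty pages whose encryption status flipped since
		 * the previous iteration. */
		mark_changed_enc_status_dirty();

		transfer_dirty_pages();
	}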

Thanks,
Ashish


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl
  2021-03-08 19:51                     ` Sean Christopherson
  2021-03-08 21:05                       ` Ashish Kalra
  2021-03-08 21:11                       ` Brijesh Singh
@ 2021-03-08 21:48                       ` Steve Rutherford
  2 siblings, 0 replies; 75+ messages in thread
From: Steve Rutherford @ 2021-03-08 21:48 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Ashish Kalra, pbonzini, joro, Lendacky, Thomas, kvm,
	linux-kernel, venu.busireddy, Singh, Brijesh, Will Deacon,
	Quentin Perret

On Mon, Mar 8, 2021 at 11:52 AM Sean Christopherson <seanjc@google.com> wrote:
>
> On Mon, Mar 08, 2021, Ashish Kalra wrote:
> > On Fri, Feb 26, 2021 at 09:44:41AM -0800, Sean Christopherson wrote:
> > > +Will and Quentin (arm64)
> > >
> > > Moving the non-KVM x86 folks to bcc, I don't think they care about KVM
> > > details at this point.
> > >
> > > On Fri, Feb 26, 2021, Ashish Kalra wrote:
> > > > On Thu, Feb 25, 2021 at 02:59:27PM -0800, Steve Rutherford wrote:
> > > > > On Thu, Feb 25, 2021 at 12:20 PM Ashish Kalra <ashish.kalra@amd.com> wrote:
> > > > > Thanks for grabbing the data!
> > > > >
> > > > > I am fine with both paths. Sean has stated an explicit desire for
> > > > > hypercall exiting, so I think that would be the current consensus.
> > >
> > > Yep, though it'd be good to get Paolo's input, too.
> > >
> > > > > If we want to do hypercall exiting, this should be in a follow-up
> > > > > series where we implement something more generic, e.g. a hypercall
> > > > > exiting bitmap or hypercall exit list. If we are taking the hypercall
> > > > > exit route, we can drop the kvm side of the hypercall.
> > >
> > > I don't think this is a good candidate for arbitrary hypercall interception.  Or
> > > rather, I think hypercall interception should be an orthogonal implementation.
> > >
> > > The guest, including guest firmware, needs to be aware that the hypercall is
> > > supported, and the ABI needs to be well-defined.  Relying on userspace VMMs to
> > > implement a common ABI is an unnecessary risk.
> > >
> > > We could make KVM's default behavior be a nop, i.e. have KVM enforce the ABI but
> > > require further VMM intervention.  But, I just don't see the point, it would
> > > save only a few lines of code.  It would also limit what KVM could do in the
> > > future, e.g. if KVM wanted to do its own bookkeeping _and_ exit to userspace,
> > > then mandatory interception would essentially make it impossible for KVM to do
> > > bookkeeping while still honoring the interception request.
> > >
> > > However, I do think it would make sense to have the userspace exit be a generic
> > > exit type.  But hey, we already have the necessary ABI defined for that!  It's
> > > just not used anywhere.
> > >
> > >     /* KVM_EXIT_HYPERCALL */
> > >     struct {
> > >             __u64 nr;
> > >             __u64 args[6];
> > >             __u64 ret;
> > >             __u32 longmode;
> > >             __u32 pad;
> > >     } hypercall;
> > >
> > >
> > > > > Userspace could also handle the MSR using MSR filters (would need to
> > > > > confirm that).  Then userspace could also be in control of the cpuid bit.
> > >
> > > An MSR is not a great fit; it's x86 specific and limited to 64 bits of data.
> > > The data limitation could be fudged by shoving data into non-standard GPRs, but
> > > that will result in truly heinous guest code, and extensibility issues.
> > >
> > > The data limitation is a moot point, because the x86-only thing is a deal
> > > breaker.  arm64's pKVM work has a near-identical use case for a guest to share
> > > memory with a host.  I can't think of a clever way to avoid having to support
> > > TDX's and SNP's hypervisor-agnostic variants, but we can at least not have
> > > multiple KVM variants.
> > >
> >
> > Potentially, there is another reason for in-kernel hypercall handling
> > considering SEV-SNP. In case of SEV-SNP the RMP table tracks the state
> > of each guest page, for instance pages in hypervisor state, i.e., pages
> > with C=0 and pages in guest valid state with C=1.
> >
> > Now, there shouldn't be a need for page encryption status hypercalls on
> > SEV-SNP as KVM can track & reference guest page status directly using
> > the RMP table.
>
> Relying on the RMP table itself would require locking the RMP table for an
> extended duration, and walking the entire RMP to find shared pages would be
> very inefficient.
>
> > As KVM maintains the RMP table, we will need SET/GET type of
> > interfaces to provide the guest page encryption status to userspace.
>
> Hrm, somehow I temporarily forgot about SNP and TDX adding their own hypercalls
> for converting between shared and private.  And in the case of TDX, the hypercall
> can't be trusted, i.e. is just a hint, otherwise the guest could induce a #MC in
> the host.
>
> But, the different guest behavior doesn't require KVM to maintain a list/tree,
> e.g. adding a dedicated KVM_EXIT_* for notifying userspace of page encryption
> status changes would also suffice.
>
> Actually, that made me think of another argument against maintaining a list in
> KVM: there's no way to notify userspace that a page's status has changed.
> Userspace would need to query KVM to do GET_LIST after every GET_DIRTY.
> Obviously not a huge issue, but it does make migration slightly less efficient.
>
> On a related topic, there are fatal race conditions that will require careful
> coordination between guest and host, and will effectively be wired into the ABI.
> SNP and TDX don't suffer these issues because host awareness of status is atomic
> with respect to the guest actually writing the page with the new encryption
> status.
>
> For SEV live migration...
>
> If the guest does the hypercall after writing the page, then the guest is hosed
> if it gets migrated while writing the page (scenario #1):
>
>   vCPU                 Userspace
>   zero_bytes[0:N]
>                        <transfers written bytes as private instead of shared>
>                        <migrates vCPU>
>   zero_bytes[N+1:4095]
>   set_shared (dest)
>   kaboom!
>
> If userspace does GET_DIRTY after GET_LIST, then the host would transfer bad
> data by consuming a stale list (scenario #2):
>
>   vCPU               Userspace
>                      get_list (from KVM or internally)
>   set_shared (src)
>   zero_page (src)
>                      get_dirty
>                      <transfers private data instead of shared>
>                      <migrates vCPU>
>   kaboom!
>
> If both guest and host order things to avoid #1 and #2, the host can still
> migrate the wrong data (scenario #3):
>
>   vCPU               Userspace
>   set_private
>   zero_bytes[0:4096]
>                      get_dirty
>   set_shared (src)
>                      get_list
>                      <transfers as shared instead of private>
>                      <migrates vCPU>
>   set_private (dest)
>   kaboom!
>
> Scenario #3 is unlikely, but plausible, e.g. if the guest bails from its
> conversion flow for whatever reason, after making the initial hypercall.  Maybe
> it goes without saying, but to address #3, the guest must consider existing data
> as lost the instant it tells the host the page has been converted to a different
> type.
This requires the guest to bail on the conversion flow, and
subsequently treat the page as having its previous value. The kernel
does not currently do this. I don't think it should ever do this.
Currently, each time the guest changes its view of a page, it
re-zeroes that page, and it does so after flipping the c-bit and
calling the hypercall (IIRC). This is much more conservative than is
necessary for supporting LM, but does suffice. I think the minimal
requirement for LM support is "Flipping a page's encryption status
(and advertising that flip to the host) twice is not a no-op. The
kernel must reinitialize that page". This invariant should also be
necessary for anything that moves pages around on the same host (COPY,
SWAP_OUT, SWAP_IN).
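
To make that invariant concrete, a guest-side private->shared
conversion that honors it would look roughly like the sketch below.
set_memory_decrypted() is the real kernel helper, but the hypercall
name and argument layout are assumptions based on this series, not a
definitive ABI:

    static int make_page_shared(unsigned long vaddr)
    {
            int rc;

            /* From here on, the page's old contents are considered lost. */

            /* Clear the C-bit in the guest page tables. */
            rc = set_memory_decrypted(vaddr, 1);
            if (rc)
                    return rc;

            /*
             * Advertise the flip to the host *before* touching the page,
             * so a migration pass cannot catch a half-written page
             * (scenario #1 above).
             */
            kvm_hypercall3(KVM_HC_PAGE_ENC_STATUS, __pa(vaddr),
                           1 /* npages */, 0 /* now unencrypted */);

            /* Reinitialize the page through the new mapping. */
            memset((void *)vaddr, 0, PAGE_SIZE);
            return 0;
    }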

Relatedly, there is no workaround if the kernel does in-place
reencryption (since the page would be both encrypted and unencrypted).
That said, I believe the spec affirmatively tells you not to do this
outside of SME.

There's also no solution if the guest relies on the value of
ciphertext, or on the value of "decrypted plaintext". The guest could
detect live migration by noticing that a page's value from an
unencrypted perspective has changed when its value from an encrypted
perspective has not (or the other way around). This would also imply
that a very strange guest could be broken by live migration.

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl
  2021-03-08 21:11                       ` Brijesh Singh
  2021-03-08 21:32                         ` Ashish Kalra
@ 2021-03-08 21:51                         ` Steve Rutherford
  2021-03-09 19:42                           ` Sean Christopherson
  2021-03-10  3:42                           ` Kalra, Ashish
  1 sibling, 2 replies; 75+ messages in thread
From: Steve Rutherford @ 2021-03-08 21:51 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: Sean Christopherson, Ashish Kalra, pbonzini, joro, Lendacky,
	Thomas, kvm, linux-kernel, venu.busireddy, Will Deacon,
	Quentin Perret

On Mon, Mar 8, 2021 at 1:11 PM Brijesh Singh <brijesh.singh@amd.com> wrote:
>
>
> On 3/8/21 1:51 PM, Sean Christopherson wrote:
> > On Mon, Mar 08, 2021, Ashish Kalra wrote:
> >> On Fri, Feb 26, 2021 at 09:44:41AM -0800, Sean Christopherson wrote:
> >>> +Will and Quentin (arm64)
> >>>
> >>> Moving the non-KVM x86 folks to bcc, I don't think they care about KVM details at this
> >>> point.
> >>>
> >>> On Fri, Feb 26, 2021, Ashish Kalra wrote:
> >>>> On Thu, Feb 25, 2021 at 02:59:27PM -0800, Steve Rutherford wrote:
> >>>>> On Thu, Feb 25, 2021 at 12:20 PM Ashish Kalra <ashish.kalra@amd.com> wrote:
> >>>>> Thanks for grabbing the data!
> >>>>>
> >>>>> I am fine with both paths. Sean has stated an explicit desire for
> >>>>> hypercall exiting, so I think that would be the current consensus.
> >>> Yep, though it'd be good to get Paolo's input, too.
> >>>
> >>>>> If we want to do hypercall exiting, this should be in a follow-up
> >>>>> series where we implement something more generic, e.g. a hypercall
> >>>>> exiting bitmap or hypercall exit list. If we are taking the hypercall
> >>>>> exit route, we can drop the kvm side of the hypercall.
> >>> I don't think this is a good candidate for arbitrary hypercall interception.  Or
> >>> rather, I think hypercall interception should be an orthogonal implementation.
> >>>
> >>> The guest, including guest firmware, needs to be aware that the hypercall is
> >>> supported, and the ABI needs to be well-defined.  Relying on userspace VMMs to
> >>> implement a common ABI is an unnecessary risk.
> >>>
> >>> We could make KVM's default behavior be a nop, i.e. have KVM enforce the ABI but
> >>> require further VMM intervention.  But, I just don't see the point, it would
> >>> save only a few lines of code.  It would also limit what KVM could do in the
> >>> future, e.g. if KVM wanted to do its own bookkeeping _and_ exit to userspace,
> >>> then mandatory interception would essentially make it impossible for KVM to do
> >>> bookkeeping while still honoring the interception request.
> >>>
> >>> However, I do think it would make sense to have the userspace exit be a generic
> >>> exit type.  But hey, we already have the necessary ABI defined for that!  It's
> >>> just not used anywhere.
> >>>
> >>>     /* KVM_EXIT_HYPERCALL */
> >>>     struct {
> >>>             __u64 nr;
> >>>             __u64 args[6];
> >>>             __u64 ret;
> >>>             __u32 longmode;
> >>>             __u32 pad;
> >>>     } hypercall;
> >>>
> >>>
> >>>>> Userspace could also handle the MSR using MSR filters (would need to
> >>>>> confirm that).  Then userspace could also be in control of the cpuid bit.
> >>> An MSR is not a great fit; it's x86 specific and limited to 64 bits of data.
> >>> The data limitation could be fudged by shoving data into non-standard GPRs, but
> >>> that will result in truly heinous guest code, and extensibility issues.
> >>>
> >>> The data limitation is a moot point, because the x86-only thing is a deal
> >>> breaker.  arm64's pKVM work has a near-identical use case for a guest to share
> >>> memory with a host.  I can't think of a clever way to avoid having to support
> >>> TDX's and SNP's hypervisor-agnostic variants, but we can at least not have
> >>> multiple KVM variants.
> >>>
> >> Potentially, there is another reason for in-kernel hypercall handling
> >> considering SEV-SNP. In case of SEV-SNP the RMP table tracks the state
> >> of each guest page, for instance pages in hypervisor state, i.e., pages
> >> with C=0 and pages in guest valid state with C=1.
> >>
> >> Now, there shouldn't be a need for page encryption status hypercalls on
> >> SEV-SNP as KVM can track & reference guest page status directly using
> >> the RMP table.
> > Relying on the RMP table itself would require locking the RMP table for an
> > extended duration, and walking the entire RMP to find shared pages would be
> > very inefficient.
> >
> >> As KVM maintains the RMP table, we will need SET/GET type of
> >> interfaces to provide the guest page encryption status to userspace.
> > Hrm, somehow I temporarily forgot about SNP and TDX adding their own hypercalls
> > for converting between shared and private.  And in the case of TDX, the hypercall
> > can't be trusted, i.e. is just a hint, otherwise the guest could induce a #MC in
> > the host.
> >
> > But, the different guest behavior doesn't require KVM to maintain a list/tree,
> > e.g. adding a dedicated KVM_EXIT_* for notifying userspace of page encryption
> > status changes would also suffice.
> >
> > Actually, that made me think of another argument against maintaining a list in
> > KVM: there's no way to notify userspace that a page's status has changed.
> > Userspace would need to query KVM to do GET_LIST after every GET_DIRTY.
> > Obviously not a huge issue, but it does make migration slightly less efficient.
> >
> > On a related topic, there are fatal race conditions that will require careful
> > coordination between guest and host, and will effectively be wired into the ABI.
> > SNP and TDX don't suffer these issues because host awareness of status is atomic
> > with respect to the guest actually writing the page with the new encryption
> > status.
> >
> > For SEV live migration...
> >
> > If the guest does the hypercall after writing the page, then the guest is hosed
> > if it gets migrated while writing the page (scenario #1):
> >
> >   vCPU                 Userspace
> >   zero_bytes[0:N]
> >                        <transfers written bytes as private instead of shared>
> >                        <migrates vCPU>
> >   zero_bytes[N+1:4095]
> >   set_shared (dest)
> >   kaboom!
>
>
> > Maybe I am missing something, but this is not any different from normal
> > operation inside a guest. Making a page shared/private in the page table
> > does not update the content of the page itself. In your above case, I
> > assume zero_bytes[N+1:4095] are written by the destination VM. The
> > memory region was private in the source VM page table, so those writes
> > will be performed encrypted. The destination VM later changed the memory
> > to shared, but nobody wrote to the memory after it was transitioned to
> > shared, so a reader of the memory should get ciphertext unless there was
> > a write after the set_shared (dest).
>
>
> > If userspace does GET_DIRTY after GET_LIST, then the host would transfer bad
> > data by consuming a stale list (scenario #2):
> >
> >   vCPU               Userspace
> >                      get_list (from KVM or internally)
> >   set_shared (src)
> >   zero_page (src)
> >                      get_dirty
> >                      <transfers private data instead of shared>
> >                      <migrates vCPU>
> >   kaboom!
>
>
> > I don't remember how things are done in Ashish's recent Qemu/KVM
> > patches, but in previous series, the get_dirty() happens before querying
> > the encrypted state. There was some logic in the VMM to resync the
> > encrypted bitmap during the final migration stage and perform any
> > additional data transfer since the last sync.
>
>
> > If both guest and host order things to avoid #1 and #2, the host can still
> > migrate the wrong data (scenario #3):
> >
> >   vCPU               Userspace
> >   set_private
> >   zero_bytes[0:4096]
> >                      get_dirty
> >   set_shared (src)
> >                      get_list
> >                      <transfers as shared instead of private>
> >                      <migrates vCPU>
> >   set_private (dest)
> >   kaboom!
>
>
> > Since there was no write to the memory after the set_shared (src),
> > the content of the page should not have changed. After the set_private
> > (dest), the caller should be seeing the same content written by
> > zero_bytes[0:4096].
I think Sean was going for the situation where the VM has moved to the
destination, which would have changed the VEK. That way the guest
would be decrypting the old ciphertext with the new (wrong) key.
>
>
> > Scenario #3 is unlikely, but plausible, e.g. if the guest bails from its
> > conversion flow for whatever reason, after making the initial hypercall.  Maybe
> > it goes without saying, but to address #3, the guest must consider existing data
> > as lost the instant it tells the host the page has been converted to a different
> > type.
> >
> >> For the above reason if we do in-kernel hypercall handling for page
> >> encryption status (which we probably won't require for SEV-SNP &
> >> correspondingly there will be no hypercall exiting),
> > As above, that doesn't preclude KVM from exiting to userspace on conversion.
> >
> >> then we can implement a standard GET/SET ioctl interface to get/set the guest
> >> page encryption status for userspace, which will work across SEV, SEV-ES and
> >> SEV-SNP.

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl
  2021-03-03 18:54                     ` Will Deacon
  2021-03-03 19:32                       ` Ashish Kalra
@ 2021-03-09 19:10                       ` Ashish Kalra
  2021-03-11 18:14                       ` Ashish Kalra
  2 siblings, 0 replies; 75+ messages in thread
From: Ashish Kalra @ 2021-03-09 19:10 UTC (permalink / raw)
  To: Will Deacon
  Cc: Sean Christopherson, Steve Rutherford, pbonzini, joro, Lendacky,
	Thomas, kvm, linux-kernel, venu.busireddy, Singh, Brijesh,
	Quentin Perret, maz

On Wed, Mar 03, 2021 at 06:54:41PM +0000, Will Deacon wrote:
> [+Marc]
> 
> On Tue, Mar 02, 2021 at 02:55:43PM +0000, Ashish Kalra wrote:
> > On Fri, Feb 26, 2021 at 09:44:41AM -0800, Sean Christopherson wrote:
> > > On Fri, Feb 26, 2021, Ashish Kalra wrote:
> > > > On Thu, Feb 25, 2021 at 02:59:27PM -0800, Steve Rutherford wrote:
> > > > > On Thu, Feb 25, 2021 at 12:20 PM Ashish Kalra <ashish.kalra@amd.com> wrote:
> > > > > Thanks for grabbing the data!
> > > > > 
> > > > > I am fine with both paths. Sean has stated an explicit desire for
> > > > > hypercall exiting, so I think that would be the current consensus.
> > > 
> > > Yep, though it'd be good to get Paolo's input, too.
> > > 
> > > > > If we want to do hypercall exiting, this should be in a follow-up
> > > > > series where we implement something more generic, e.g. a hypercall
> > > > > exiting bitmap or hypercall exit list. If we are taking the hypercall
> > > > > exit route, we can drop the kvm side of the hypercall.
> > > 
> > > I don't think this is a good candidate for arbitrary hypercall interception.  Or
> > > rather, I think hypercall interception should be an orthogonal implementation.
> > > 
> > > The guest, including guest firmware, needs to be aware that the hypercall is
> > > supported, and the ABI needs to be well-defined.  Relying on userspace VMMs to
> > > implement a common ABI is an unnecessary risk.
> > > 
> > > We could make KVM's default behavior be a nop, i.e. have KVM enforce the ABI but
> > > require further VMM intervention.  But, I just don't see the point, it would
> > > save only a few lines of code.  It would also limit what KVM could do in the
> > > future, e.g. if KVM wanted to do its own bookkeeping _and_ exit to userspace,
> > > then mandatory interception would essentially make it impossible for KVM to do
> > > bookkeeping while still honoring the interception request.
> > > 
> > > However, I do think it would make sense to have the userspace exit be a generic
> > > exit type.  But hey, we already have the necessary ABI defined for that!  It's
> > > just not used anywhere.
> > > 
> > > 	/* KVM_EXIT_HYPERCALL */
> > > 	struct {
> > > 		__u64 nr;
> > > 		__u64 args[6];
> > > 		__u64 ret;
> > > 		__u32 longmode;
> > > 		__u32 pad;
> > > 	} hypercall;
> > > 

The documentation mentions using KVM_EXIT_IO on x86 to implement
"hypercall to userspace" functionality.

Are there any reasons for not using the KVM_EXIT_HYPERCALL interface
any more, even though it is still defined and has not been removed?
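
For concreteness, what I would expect the userspace side of a
KVM_EXIT_HYPERCALL based interface to look like is roughly the
following (an illustrative sketch only; KVM_HC_PAGE_ENC_STATUS, the
argument layout and vmm_set_page_enc_status() are assumptions, not a
posted patch):

    /* run is the vcpu's mmap'ed struct kvm_run. */
    if (run->exit_reason == KVM_EXIT_HYPERCALL) {
            if (run->hypercall.nr == KVM_HC_PAGE_ENC_STATUS) {
                    __u64 gpa    = run->hypercall.args[0];
                    __u64 npages = run->hypercall.args[1];
                    __u64 enc    = run->hypercall.args[2];

                    vmm_set_page_enc_status(gpa, npages, enc);
                    run->hypercall.ret = 0;
            } else {
                    run->hypercall.ret = -KVM_ENOSYS;
            }
            /* The next KVM_RUN would complete the hypercall with .ret. */
    }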

> > > 
> > > > > Userspace could also handle the MSR using MSR filters (would need to
> > > > > confirm that).  Then userspace could also be in control of the cpuid bit.
> > > 
> > > An MSR is not a great fit; it's x86 specific and limited to 64 bits of data.
> > > The data limitation could be fudged by shoving data into non-standard GPRs, but
> > > that will result in truly heinous guest code, and extensibility issues.
> > > 
> > > The data limitation is a moot point, because the x86-only thing is a deal
> > > breaker.  arm64's pKVM work has a near-identical use case for a guest to share
> > > memory with a host.  I can't think of a clever way to avoid having to support
> > > TDX's and SNP's hypervisor-agnostic variants, but we can at least not have
> > > multiple KVM variants.
> > 
> > Looking at arm64's pKVM work, I see that it is a recently introduced RFC
> > patch-set and probably relevant to arm64 nVHE hypervisor
> > mode/implementation, and potentially makes sense as it adds guest
> > memory protection as both host and guest kernels are running on the same
> > privilege level ?
> > 
> > Though i do see that the pKVM stuff adds two hypercalls, specifically :
> > 
> > pkvm_create_mappings() ( I assume this is for setting shared memory
> > regions between host and guest) &
> > pkvm_create_private_mappings().
> > 
> > And the use-cases are quite similar to memory protection architectures'
> > use cases, for example, use with virtio devices, guest DMA I/O, etc.
> 
> These hypercalls are both private to the host kernel communicating with
> its hypervisor counterpart, so I don't think they're particularly
> relevant here. As far as I can see, the more useful thing is to allow
> the guest to communicate back to the host (and the VMM) that it has opened
> up a memory window, perhaps for virtio rings or some other shared memory.
> 
> We hacked this up as a prototype in the past:
> 
> https://android-kvm.googlesource.com/linux/+/d12a9e2c12a52cf7140d40cd9fa092dc8a85fac9%5E%21/
> 
> but that's all arm64-specific and if we're solving the same problem as
> you, then let's avoid arch-specific stuff if possible. The way in which
> the guest discovers the interface will be arch-specific (we already have
> a mechanism for that and some hypercalls are already allocated by specs
> from Arm), but the interface back to the VMM and some (most?) of the host
> handling could be shared.
> 

Again, is there any reason for not using the KVM_EXIT_HYPERCALL interface for these?

Thanks,
Ashish

> > But, isn't this patch set still RFC, and though I agree that it adds
> > an infrastructure for standardised communication between the host and
> > its guests for mutually controlled shared memory regions and
> > surely adds some kind of portability between hypervisor
> > implementations, but nothing is standardised still, right?
> 
> No, and it seems that you're further ahead than us in terms of
> implementation in this area. We're happy to review patches though, to
> make sure we end up with something that works for us both.
> 
> Will

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl
  2021-03-08 21:51                         ` Steve Rutherford
@ 2021-03-09 19:42                           ` Sean Christopherson
  2021-03-10  3:42                           ` Kalra, Ashish
  1 sibling, 0 replies; 75+ messages in thread
From: Sean Christopherson @ 2021-03-09 19:42 UTC (permalink / raw)
  To: Steve Rutherford
  Cc: Brijesh Singh, Ashish Kalra, pbonzini, joro, Lendacky, Thomas,
	kvm, linux-kernel, venu.busireddy, Will Deacon, Quentin Perret

On Mon, Mar 08, 2021, Steve Rutherford wrote:
> On Mon, Mar 8, 2021 at 1:11 PM Brijesh Singh <brijesh.singh@amd.com> wrote:
> > On 3/8/21 1:51 PM, Sean Christopherson wrote:
> > > If the guest does the hypercall after writing the page, then the guest is hosed
> > > if it gets migrated while writing the page (scenario #1):
> > >
> > >   vCPU                 Userspace
> > >   zero_bytes[0:N]
> > >                        <transfers written bytes as private instead of shared>
> > >                        <migrates vCPU>
> > >   zero_bytes[N+1:4095]
> > >   set_shared (dest)
> > >   kaboom!
> >
> >
> >> Maybe I am missing something, but this is not any different from normal
> >> operation inside a guest. Making a page shared/private in the page table
> >> does not update the content of the page itself. In your above case, I
> >> assume zero_bytes[N+1:4095] are written by the destination VM. The
> >> memory region was private in the source VM page table, so those writes
> >> will be performed encrypted. The destination VM later changed the memory
> >> to shared, but nobody wrote to the memory after it was transitioned to
> >> shared, so a reader of the memory should get ciphertext unless there was
> >> a write after the set_shared (dest).

Sorry, that wasn't clear: there's an implied page table update to make the
page shared before zero_bytes.

> > > If userspace does GET_DIRTY after GET_LIST, then the host would transfer bad
> > > data by consuming a stale list (scenario #2):
> > >
> > >   vCPU               Userspace
> > >                      get_list (from KVM or internally)
> > >   set_shared (src)
> > >   zero_page (src)
> > >                      get_dirty
> > >                      <transfers private data instead of shared>
> > >                      <migrates vCPU>
> > >   kaboom!
> >
> >
> >> I don't remember how things are done in Ashish's recent Qemu/KVM
> >> patches, but in previous series, the get_dirty() happens before querying
> >> the encrypted state. There was some logic in the VMM to resync the
> >> encrypted bitmap during the final migration stage and perform any
> >> additional data transfer since the last sync.

It's likely that Ashish's patches did the correct thing; I just wanted to
point out that both host and guest need to do everything in a very
specific order.
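
FWIW, the ordering that avoids #2 can be written down as a simple rule
for the migration loop; this is an illustrative sketch only, not the
posted patches:

    /*
     * Fetch the dirty log *before* the shared pages list.  A page that
     * flips status after GET_DIRTY is also re-dirtied by the guest's
     * reinitialization write, so it gets re-sent on a later pass with a
     * fresh list instead of being sent now against a stale one.
     */
    do {
            ioctl(vm_fd, KVM_GET_DIRTY_LOG, &dirty_log);
            ioctl(vm_fd, KVM_GET_SHARED_PAGES_LIST, &shared_list);
            send_dirty_pages(&dirty_log, &shared_list);
    } while (!dirty_rate_below_threshold());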

> > > If both guest and host order things to avoid #1 and #2, the host can still
> > > migrate the wrong data (scenario #3):
> > >
> > >   vCPU               Userspace
> > >   set_private
> > >   zero_bytes[0:4096]
> > >                      get_dirty
> > >   set_shared (src)
> > >                      get_list
> > >                      <transfers as shared instead of private>
> > >                    <migrates vCPU>
> > >   set_private (dest)
> > >   kaboom!
> >
> >
> >> Since there was no write to the memory after the set_shared (src),
> >> the content of the page should not have changed. After the set_private
> >> (dest), the caller should be seeing the same content written by
> >> zero_bytes[0:4096].
>
> I think Sean was going for the situation where the VM has moved to the
> destination, which would have changed the VEK. That way the guest
> would be decrypting the old ciphertext with the new (wrong) key.

I think that's what I was saying?

I was pointing out that the userspace VMM would see the page as "shared" and so
read the guest memory directly instead of routing it through the PSP.

Anyways, my intent wasn't to point out a known issue anywhere.  I was working
through how GET_LIST would interact with GET_DIRTY_LOG to convince myself that
the various flows were bombproof.  I wanted to record those thoughts so that I
can remind myself of the requirements when I inevitably forget them in the future.

> > > Scenario #3 is unlikely, but plausible, e.g. if the guest bails from its
> > > conversion flow for whatever reason, after making the initial hypercall.  Maybe
> > > it goes without saying, but to address #3, the guest must consider existing data
> > > as lost the instant it tells the host the page has been converted to a different
> > > type.
> > >
> > >> For the above reason if we do in-kernel hypercall handling for page
> > >> encryption status (which we probably won't require for SEV-SNP &
> > >> correspondingly there will be no hypercall exiting),
> > > As above, that doesn't preclude KVM from exiting to userspace on conversion.
> > >
> > >> then we can implement a standard GET/SET ioctl interface to get/set the guest
> > >> page encryption status for userspace, which will work across SEV, SEV-ES and
> > >> SEV-SNP.

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl
  2021-03-08 21:51                         ` Steve Rutherford
  2021-03-09 19:42                           ` Sean Christopherson
@ 2021-03-10  3:42                           ` Kalra, Ashish
  2021-03-10  3:47                             ` Steve Rutherford
  1 sibling, 1 reply; 75+ messages in thread
From: Kalra, Ashish @ 2021-03-10  3:42 UTC (permalink / raw)
  To: Steve Rutherford
  Cc: Singh, Brijesh, Sean Christopherson, pbonzini, joro, Lendacky,
	Thomas, kvm, linux-kernel, venu.busireddy, Will Deacon,
	Quentin Perret



> On Mar 9, 2021, at 3:22 AM, Steve Rutherford <srutherford@google.com> wrote:
> 
> On Mon, Mar 8, 2021 at 1:11 PM Brijesh Singh <brijesh.singh@amd.com> wrote:
>> 
>> 
>>> On 3/8/21 1:51 PM, Sean Christopherson wrote:
>>> On Mon, Mar 08, 2021, Ashish Kalra wrote:
>>>> On Fri, Feb 26, 2021 at 09:44:41AM -0800, Sean Christopherson wrote:
>>>>> +Will and Quentin (arm64)
>>>>> 
>>>>> Moving the non-KVM x86 folks to bcc, I don't think they care about KVM details at this
>>>>> point.
>>>>> 
>>>>> On Fri, Feb 26, 2021, Ashish Kalra wrote:
>>>>>> On Thu, Feb 25, 2021 at 02:59:27PM -0800, Steve Rutherford wrote:
>>>>>>> On Thu, Feb 25, 2021 at 12:20 PM Ashish Kalra <ashish.kalra@amd.com> wrote:
>>>>>>> Thanks for grabbing the data!
>>>>>>> 
>>>>>>> I am fine with both paths. Sean has stated an explicit desire for
>>>>>>> hypercall exiting, so I think that would be the current consensus.
>>>>> Yep, though it'd be good to get Paolo's input, too.
>>>>> 
>>>>>>> If we want to do hypercall exiting, this should be in a follow-up
>>>>>>> series where we implement something more generic, e.g. a hypercall
>>>>>>> exiting bitmap or hypercall exit list. If we are taking the hypercall
>>>>>>> exit route, we can drop the kvm side of the hypercall.
>>>>> I don't think this is a good candidate for arbitrary hypercall interception.  Or
>>>>> rather, I think hypercall interception should be an orthogonal implementation.
>>>>> 
>>>>> The guest, including guest firmware, needs to be aware that the hypercall is
>>>>> supported, and the ABI needs to be well-defined.  Relying on userspace VMMs to
>>>>> implement a common ABI is an unnecessary risk.
>>>>> 
>>>>> We could make KVM's default behavior be a nop, i.e. have KVM enforce the ABI but
>>>>> require further VMM intervention.  But, I just don't see the point, it would
>>>>> save only a few lines of code.  It would also limit what KVM could do in the
>>>>> future, e.g. if KVM wanted to do its own bookkeeping _and_ exit to userspace,
>>>>> then mandatory interception would essentially make it impossible for KVM to do
>>>>> bookkeeping while still honoring the interception request.
>>>>> 
>>>>> However, I do think it would make sense to have the userspace exit be a generic
>>>>> exit type.  But hey, we already have the necessary ABI defined for that!  It's
>>>>> just not used anywhere.
>>>>> 
>>>>>    /* KVM_EXIT_HYPERCALL */
>>>>>    struct {
>>>>>            __u64 nr;
>>>>>            __u64 args[6];
>>>>>            __u64 ret;
>>>>>            __u32 longmode;
>>>>>            __u32 pad;
>>>>>    } hypercall;
>>>>> 
>>>>> 
>>>>>>> Userspace could also handle the MSR using MSR filters (would need to
>>>>>>> confirm that).  Then userspace could also be in control of the cpuid bit.
>>>>> An MSR is not a great fit; it's x86 specific and limited to 64 bits of data.
>>>>> The data limitation could be fudged by shoving data into non-standard GPRs, but
>>>>> that will result in truly heinous guest code, and extensibility issues.
>>>>> 
>>>>> The data limitation is a moot point, because the x86-only thing is a deal
>>>>> breaker.  arm64's pKVM work has a near-identical use case for a guest to share
>>>>> memory with a host.  I can't think of a clever way to avoid having to support
>>>>> TDX's and SNP's hypervisor-agnostic variants, but we can at least not have
>>>>> multiple KVM variants.
>>>>> 
>>>> Potentially, there is another reason for in-kernel hypercall handling
>>>> considering SEV-SNP. In case of SEV-SNP the RMP table tracks the state
>>>> of each guest page, for instance pages in hypervisor state, i.e., pages
>>>> with C=0 and pages in guest valid state with C=1.
>>>> 
>>>> Now, there shouldn't be a need for page encryption status hypercalls on
>>>> SEV-SNP as KVM can track & reference guest page status directly using
>>>> the RMP table.
>>> Relying on the RMP table itself would require locking the RMP table for an
>>> extended duration, and walking the entire RMP to find shared pages would be
>>> very inefficient.
>>> 
>>>> As KVM maintains the RMP table, we will need SET/GET type of
>>>> interfaces to provide the guest page encryption status to userspace.
>>> Hrm, somehow I temporarily forgot about SNP and TDX adding their own hypercalls
>>> for converting between shared and private.  And in the case of TDX, the hypercall
>>> can't be trusted, i.e. is just a hint, otherwise the guest could induce a #MC in
>>> the host.
>>> 
>>> But, the different guest behavior doesn't require KVM to maintain a list/tree,
>>> e.g. adding a dedicated KVM_EXIT_* for notifying userspace of page encryption
>>> status changes would also suffice.
>>> 
>>> Actually, that made me think of another argument against maintaining a list in
>>> KVM: there's no way to notify userspace that a page's status has changed.
>>> Userspace would need to query KVM to do GET_LIST after every GET_DIRTY.
>>> Obviously not a huge issue, but it does make migration slightly less efficient.
>>> 
>>> On a related topic, there are fatal race conditions that will require careful
>>> coordination between guest and host, and will effectively be wired into the ABI.
>>> SNP and TDX don't suffer these issues because host awareness of status is atomic
>>> with respect to the guest actually writing the page with the new encryption
>>> status.
>>> 
>>> For SEV live migration...
>>> 
>>> If the guest does the hypercall after writing the page, then the guest is hosed
>>> if it gets migrated while writing the page (scenario #1):
>>> 
>>>  vCPU                 Userspace
>>>  zero_bytes[0:N]
>>>                       <transfers written bytes as private instead of shared>
>>>                       <migrates vCPU>
>>>  zero_bytes[N+1:4095]
>>>  set_shared (dest)
>>>  kaboom!
>> 
>> 
>> Maybe I am missing something, but this is not any different from normal
>> operation inside a guest. Making a page shared/private in the page table
>> does not update the content of the page itself. In your above case, I
>> assume zero_bytes[N+1:4095] are written by the destination VM. The
>> memory region was private in the source VM page table, so those writes
>> will be performed encrypted. The destination VM later changed the memory
>> to shared, but nobody wrote to the memory after it was transitioned to
>> shared, so a reader of the memory should get ciphertext unless there was
>> a write after the set_shared (dest).
>> 
>> 
>>> If userspace does GET_DIRTY after GET_LIST, then the host would transfer bad
>>> data by consuming a stale list (scenario #2):
>>> 
>>>  vCPU               Userspace
>>>                     get_list (from KVM or internally)
>>>  set_shared (src)
>>>  zero_page (src)
>>>                     get_dirty
>>>                     <transfers private data instead of shared>
>>>                     <migrates vCPU>
>>>  kaboom!
>> 
>> 
>> I don't remember how things are done in Ashish's recent Qemu/KVM
>> patches, but in previous series, the get_dirty() happens before querying
>> the encrypted state. There was some logic in the VMM to resync the
>> encrypted bitmap during the final migration stage and perform any
>> additional data transfer since the last sync.
>> 
>> 
>>> If both guest and host order things to avoid #1 and #2, the host can still
>>> migrate the wrong data (scenario #3):
>>> 
>>>  vCPU               Userspace
>>>  set_private
>>>  zero_bytes[0:4096]
>>>                     get_dirty
>>>  set_shared (src)
>>>                     get_list
>>>                     <transfers as shared instead of private>
>>>                     <migrates vCPU>
>>>  set_private (dest)
>>>  kaboom!
>> 
>> 
>> Since there was no write to the memory after the set_shared (src),
>> the content of the page should not have changed. After the set_private
>> (dest), the caller should be seeing the same content written by
>> zero_bytes[0:4096].
> I think Sean was going for the situation where the VM has moved to the
> destination, which would have changed the VEK. That way the guest
> would be decrypting the old ciphertext with the new (wrong) key.
>> 

But how can this happen? If a page is migrated as private, then when it is received it will be decrypted using the transport key (TEK) and re-encrypted using the destination VM’s VEK on the destination VM.

Thanks,
Ashish

>> 
>>> Scenario #3 is unlikely, but plausible, e.g. if the guest bails from its
>>> conversion flow for whatever reason, after making the initial hypercall.  Maybe
>>> it goes without saying, but to address #3, the guest must consider existing data
>>> as lost the instant it tells the host the page has been converted to a different
>>> type.
>>> 
>>>> For the above reason if we do in-kernel hypercall handling for page
>>>> encryption status (which we probably won't require for SEV-SNP &
>>>> correspondingly there will be no hypercall exiting),
>>> As above, that doesn't preclude KVM from exiting to userspace on conversion.
>>> 
>>>> then we can implement a standard GET/SET ioctl interface to get/set the guest
>>>> page encryption status for userspace, which will work across SEV, SEV-ES and
>>>> SEV-SNP.

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl
  2021-03-10  3:42                           ` Kalra, Ashish
@ 2021-03-10  3:47                             ` Steve Rutherford
  0 siblings, 0 replies; 75+ messages in thread
From: Steve Rutherford @ 2021-03-10  3:47 UTC (permalink / raw)
  To: Kalra, Ashish
  Cc: Singh, Brijesh, Sean Christopherson, pbonzini, joro, Lendacky,
	Thomas, kvm, linux-kernel, venu.busireddy, Will Deacon,
	Quentin Perret

On Tue, Mar 9, 2021 at 7:42 PM Kalra, Ashish <Ashish.Kalra@amd.com> wrote:
>
>
>
> > On Mar 9, 2021, at 3:22 AM, Steve Rutherford <srutherford@google.com> wrote:
> >
> > On Mon, Mar 8, 2021 at 1:11 PM Brijesh Singh <brijesh.singh@amd.com> wrote:
> >>
> >>
> >>> On 3/8/21 1:51 PM, Sean Christopherson wrote:
> >>> On Mon, Mar 08, 2021, Ashish Kalra wrote:
> >>>> On Fri, Feb 26, 2021 at 09:44:41AM -0800, Sean Christopherson wrote:
> >>>>> +Will and Quentin (arm64)
> >>>>>
> >>>>> Moving the non-KVM x86 folks to bcc, I don't think they care about KVM details at this
> >>>>> point.
> >>>>>
> >>>>> On Fri, Feb 26, 2021, Ashish Kalra wrote:
> >>>>>> On Thu, Feb 25, 2021 at 02:59:27PM -0800, Steve Rutherford wrote:
> >>>>>>> On Thu, Feb 25, 2021 at 12:20 PM Ashish Kalra <ashish.kalra@amd.com> wrote:
> >>>>>>> Thanks for grabbing the data!
> >>>>>>>
> >>>>>>> I am fine with both paths. Sean has stated an explicit desire for
> >>>>>>> hypercall exiting, so I think that would be the current consensus.
> >>>>> Yep, though it'd be good to get Paolo's input, too.
> >>>>>
> >>>>>>> If we want to do hypercall exiting, this should be in a follow-up
> >>>>>>> series where we implement something more generic, e.g. a hypercall
> >>>>>>> exiting bitmap or hypercall exit list. If we are taking the hypercall
> >>>>>>> exit route, we can drop the kvm side of the hypercall.
> >>>>> I don't think this is a good candidate for arbitrary hypercall interception.  Or
> >>>>> rather, I think hypercall interception should be an orthogonal implementation.
> >>>>>
> >>>>> The guest, including guest firmware, needs to be aware that the hypercall is
> >>>>> supported, and the ABI needs to be well-defined.  Relying on userspace VMMs to
> >>>>> implement a common ABI is an unnecessary risk.
> >>>>>
> >>>>> We could make KVM's default behavior be a nop, i.e. have KVM enforce the ABI but
> >>>>> require further VMM intervention.  But, I just don't see the point, it would
> >>>>> save only a few lines of code.  It would also limit what KVM could do in the
> >>>>> future, e.g. if KVM wanted to do its own bookkeeping _and_ exit to userspace,
> >>>>> then mandatory interception would essentially make it impossible for KVM to do
> >>>>> bookkeeping while still honoring the interception request.
> >>>>>
> >>>>> However, I do think it would make sense to have the userspace exit be a generic
> >>>>> exit type.  But hey, we already have the necessary ABI defined for that!  It's
> >>>>> just not used anywhere.
> >>>>>
> >>>>>    /* KVM_EXIT_HYPERCALL */
> >>>>>    struct {
> >>>>>            __u64 nr;
> >>>>>            __u64 args[6];
> >>>>>            __u64 ret;
> >>>>>            __u32 longmode;
> >>>>>            __u32 pad;
> >>>>>    } hypercall;
> >>>>>
> >>>>>
> >>>>>>> Userspace could also handle the MSR using MSR filters (would need to
> >>>>>>> confirm that).  Then userspace could also be in control of the cpuid bit.
> >>>>> An MSR is not a great fit; it's x86 specific and limited to 64 bits of data.
> >>>>> The data limitation could be fudged by shoving data into non-standard GPRs, but
> >>>>> that will result in truly heinous guest code, and extensibility issues.
> >>>>>
> >>>>> The data limitation is a moot point, because the x86-only thing is a deal
> >>>>> breaker.  arm64's pKVM work has a near-identical use case for a guest to share
> >>>>> memory with a host.  I can't think of a clever way to avoid having to support
> >>>>> TDX's and SNP's hypervisor-agnostic variants, but we can at least not have
> >>>>> multiple KVM variants.
> >>>>>
> >>>> Potentially, there is another reason for in-kernel hypercall handling
> >>>> considering SEV-SNP. In case of SEV-SNP the RMP table tracks the state
> >>>> of each guest page, for instance pages in hypervisor state, i.e., pages
> >>>> with C=0 and pages in guest valid state with C=1.
> >>>>
> >>>> Now, there shouldn't be a need for page encryption status hypercalls on
> >>>> SEV-SNP as KVM can track & reference guest page status directly using
> >>>> the RMP table.
> >>> Relying on the RMP table itself would require locking the RMP table for an
> >>> extended duration, and walking the entire RMP to find shared pages would be
> >>> very inefficient.
> >>>
> >>>> As KVM maintains the RMP table, we will need SET/GET type of
> >>>> interfaces to provide the guest page encryption status to userspace.
> >>> Hrm, somehow I temporarily forgot about SNP and TDX adding their own hypercalls
> >>> for converting between shared and private.  And in the case of TDX, the hypercall
> >>> can't be trusted, i.e. is just a hint, otherwise the guest could induce a #MC in
> >>> the host.
> >>>
> >>> But, the different guest behavior doesn't require KVM to maintain a list/tree,
> >>> e.g. adding a dedicated KVM_EXIT_* for notifying userspace of page encryption
> >>> status changes would also suffice.
> >>>
> >>> Actually, that made me think of another argument against maintaining a list in
> >>> KVM: there's no way to notify userspace that a page's status has changed.
> >>> Userspace would need to query KVM to do GET_LIST after every GET_DIRTY.
> >>> Obviously not a huge issue, but it does make migration slightly less efficient.
> >>>
> >>> On a related topic, there are fatal race conditions that will require careful
> >>> coordination between guest and host, and will effectively be wired into the ABI.
> >>> SNP and TDX don't suffer these issues because host awareness of status is atomic
> >>> with respect to the guest actually writing the page with the new encryption
> >>> status.
> >>>
> >>> For SEV live migration...
> >>>
> >>> If the guest does the hypercall after writing the page, then the guest is hosed
> >>> if it gets migrated while writing the page (scenario #1):
> >>>
> >>>  vCPU                 Userspace
> >>>  zero_bytes[0:N]
> >>>                       <transfers written bytes as private instead of shared>
> >>>                       <migrates vCPU>
> >>>  zero_bytes[N+1:4095]
> >>>  set_shared (dest)
> >>>  kaboom!
> >>
> >>
> >> Maybe I am missing something, but this is not any different from normal
> >> operation inside a guest. Making a page shared/private in the page table
> >> does not update the content of the page itself. In your above case, I
> >> assume zero_bytes[N+1:4095] are written by the destination VM. The
> >> memory region was private in the source VM page table, so those writes
> >> will be performed encrypted. The destination VM later changed the memory
> >> to shared, but nobody wrote to the memory after it was transitioned to
> >> shared, so a reader of the memory should get ciphertext unless there was
> >> a write after the set_shared (dest).
> >>
> >>
> >>> If userspace does GET_DIRTY after GET_LIST, then the host would transfer bad
> >>> data by consuming a stale list (scenario #2):
> >>>
> >>>  vCPU               Userspace
> >>>                     get_list (from KVM or internally)
> >>>  set_shared (src)
> >>>  zero_page (src)
> >>>                     get_dirty
> >>>                     <transfers private data instead of shared>
> >>>                     <migrates vCPU>
> >>>  kaboom!
> >>
> >>
> >> I don't remember how things are done in Ashish's recent Qemu/KVM
> >> patches, but in previous series, the get_dirty() happens before querying
> >> the encrypted state. There was some logic in the VMM to resync the
> >> encrypted bitmap during the final migration stage and perform any
> >> additional data transfer since the last sync.
> >>
> >>
> >>> If both guest and host order things to avoid #1 and #2, the host can still
> >>> migrate the wrong data (scenario #3):
> >>>
> >>>  vCPU               Userspace
> >>>  set_private
> >>>  zero_bytes[0:4096]
> >>>                     get_dirty
> >>>  set_shared (src)
> >>>                     get_list
> >>>                     <transfers as shared instead of private>
> >>>                     <migrates vCPU>
> >>>  set_private (dest)
> >>>  kaboom!
> >>
> >>
> >> Since there was no write to the memory after the set_shared (src),
> >> the content of the page should not have changed. After the set_private
> >> (dest), the caller should be seeing the same content written by
> >> zero_bytes[0:4096].
> > I think Sean was going for the situation where the VM has moved to the
> > destination, which would have changed the VEK. That way the guest
> > would be decrypting the old ciphertext with the new (wrong) key.
> >>
>
> But how can this happen, if a page is migrated as private , when it is received it will be decrypted using the transport key TEK and then re-encrypted using the destination VM’s VEK on the destination VM.
>
If, as in scenario #3 above, the page is set to shared just before
being migrated, it would then be migrated in the clear but be
interpreted on the target as encrypted (since, immediately
post-migration, the page is flipped to private without ever writing to
the page). This is not a scenario that is expected to work, as it
requires violating (currently unspoken?) invariants.

Thanks,
Steve

> Thanks,
> Ashish
>
> >>
> >>> Scenario #3 is unlikely, but plausible, e.g. if the guest bails from its
> >>> conversion flow for whatever reason, after making the initial hypercall.  Maybe
> >>> it goes without saying, but to address #3, the guest must consider existing data
> >>> as lost the instant it tells the host the page has been converted to a different
> >>> type.
> >>>
> >>>> For the above reason if we do in-kernel hypercall handling for page
> >>>> encryption status (which we probably won't require for SEV-SNP &
> >>>> correspondingly there will be no hypercall exiting),
> >>> As above, that doesn't preclude KVM from exiting to userspace on conversion.
> >>>
> >>>> then we can implement a standard GET/SET ioctl interface to get/set the guest
> >>>> page encryption status for userspace, which will work across SEV, SEV-ES and
> >>>> SEV-SNP.

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl
  2021-03-03 18:54                     ` Will Deacon
  2021-03-03 19:32                       ` Ashish Kalra
  2021-03-09 19:10                       ` Ashish Kalra
@ 2021-03-11 18:14                       ` Ashish Kalra
  2021-03-11 20:48                         ` Steve Rutherford
  2 siblings, 1 reply; 75+ messages in thread
From: Ashish Kalra @ 2021-03-11 18:14 UTC (permalink / raw)
  To: Will Deacon
  Cc: Sean Christopherson, Steve Rutherford, pbonzini, joro, Lendacky,
	Thomas, kvm, linux-kernel, venu.busireddy, Singh, Brijesh,
	Quentin Perret, maz

On Wed, Mar 03, 2021 at 06:54:41PM +0000, Will Deacon wrote:
> [+Marc]
> 
> On Tue, Mar 02, 2021 at 02:55:43PM +0000, Ashish Kalra wrote:
> > On Fri, Feb 26, 2021 at 09:44:41AM -0800, Sean Christopherson wrote:
> > > On Fri, Feb 26, 2021, Ashish Kalra wrote:
> > > > On Thu, Feb 25, 2021 at 02:59:27PM -0800, Steve Rutherford wrote:
> > > > > On Thu, Feb 25, 2021 at 12:20 PM Ashish Kalra <ashish.kalra@amd.com> wrote:
> > > > > Thanks for grabbing the data!
> > > > > 
> > > > > I am fine with both paths. Sean has stated an explicit desire for
> > > > > hypercall exiting, so I think that would be the current consensus.
> > > 
> > > Yep, though it'd be good to get Paolo's input, too.
> > > 
> > > > > If we want to do hypercall exiting, this should be in a follow-up
> > > > > series where we implement something more generic, e.g. a hypercall
> > > > > exiting bitmap or hypercall exit list. If we are taking the hypercall
> > > > > exit route, we can drop the kvm side of the hypercall.
> > > 
> > > I don't think this is a good candidate for arbitrary hypercall interception.  Or
> > > rather, I think hypercall interception should be an orthogonal implementation.
> > > 
> > > The guest, including guest firmware, needs to be aware that the hypercall is
> > > supported, and the ABI needs to be well-defined.  Relying on userspace VMMs to
> > > implement a common ABI is an unnecessary risk.
> > > 
> > > We could make KVM's default behavior be a nop, i.e. have KVM enforce the ABI but
> > > require further VMM intervention.  But, I just don't see the point, it would
> > > save only a few lines of code.  It would also limit what KVM could do in the
> > > future, e.g. if KVM wanted to do its own bookkeeping _and_ exit to userspace,
> > > then mandatory interception would essentially make it impossible for KVM to do
> > > bookkeeping while still honoring the interception request.
> > > 
> > > However, I do think it would make sense to have the userspace exit be a generic
> > > exit type.  But hey, we already have the necessary ABI defined for that!  It's
> > > just not used anywhere.
> > > 
> > > 	/* KVM_EXIT_HYPERCALL */
> > > 	struct {
> > > 		__u64 nr;
> > > 		__u64 args[6];
> > > 		__u64 ret;
> > > 		__u32 longmode;
> > > 		__u32 pad;
> > > 	} hypercall;
> > > 
> > > 
> > > > > Userspace could also handle the MSR using MSR filters (would need to
> > > > > confirm that).  Then userspace could also be in control of the cpuid bit.
> > > 
> > > An MSR is not a great fit; it's x86 specific and limited to 64 bits of data.
> > > The data limitation could be fudged by shoving data into non-standard GPRs, but
> > > that will result in truly heinous guest code, and extensibility issues.
> > > 
> > > The data limitation is a moot point, because the x86-only thing is a deal
> > > breaker.  arm64's pKVM work has a near-identical use case for a guest to share
> > > memory with a host.  I can't think of a clever way to avoid having to support
> > > TDX's and SNP's hypervisor-agnostic variants, but we can at least not have
> > > multiple KVM variants.
> > 
> > Looking at arm64's pKVM work, I see that it is a recently introduced RFC
> > patch-set and probably relevant to arm64 nVHE hypervisor
> > mode/implementation, and potentially makes sense as it adds guest
> > memory protection as both host and guest kernels are running on the same
> > privilege level ?
> > 
> > Though i do see that the pKVM stuff adds two hypercalls, specifically :
> > 
> > pkvm_create_mappings() ( I assume this is for setting shared memory
> > regions between host and guest) &
> > pkvm_create_private_mappings().
> > 
> > And the use-cases are quite similar to memory protection architectures'
> > use cases, for example, use with virtio devices, guest DMA I/O, etc.
> 
> These hypercalls are both private to the host kernel communicating with
> its hypervisor counterpart, so I don't think they're particularly
> relevant here. As far as I can see, the more useful thing is to allow
> the guest to communicate back to the host (and the VMM) that it has opened
> up a memory window, perhaps for virtio rings or some other shared memory.
> 
> We hacked this up as a prototype in the past:
> 
> https://android-kvm.googlesource.com/linux/+/d12a9e2c12a52cf7140d40cd9fa092dc8a85fac9%5E%21/
> 
> but that's all arm64-specific and if we're solving the same problem as
> you, then let's avoid arch-specific stuff if possible. The way in which
> the guest discovers the interface will be arch-specific (we already have
> a mechanism for that and some hypercalls are already allocated by specs
> from Arm), but the interface back to the VMM and some (most?) of the host
> handling could be shared.
> 

I have started implementing similar "hypercall to userspace"
functionality for these DMA_SHARE/DMA_UNSHARE type interfaces,
corresponding to the SEV guest's add/remove shared region hypercalls on
the x86 platform.

This does not implement a generic hypercall exiting infrastructure; it
mainly extends the KVM hypercall support to return to userspace
specifically for the add/remove shared region hypercalls, and then
re-uses the existing userspace I/O completion callback functionality to
resume the guest after userspace has handled the hypercall.
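
In outline, the KVM side of what I have looks something like the
sketch below (simplified illustration, not the actual patch; the
hypercall name and bookkeeping details may differ):

    static int complete_hypercall_exit(struct kvm_vcpu *vcpu)
    {
            /* Userspace handled the exit and filled in .ret. */
            kvm_rax_write(vcpu, vcpu->run->hypercall.ret);
            return kvm_skip_emulated_instruction(vcpu);
    }

    /* In kvm_emulate_hypercall(): */
    case KVM_HC_PAGE_ENC_STATUS:
            vcpu->run->exit_reason       = KVM_EXIT_HYPERCALL;
            vcpu->run->hypercall.nr      = nr;
            vcpu->run->hypercall.args[0] = a0;      /* gpa */
            vcpu->run->hypercall.args[1] = a1;      /* npages */
            vcpu->run->hypercall.args[2] = a2;      /* enc status */
            /*
             * Re-use the userspace I/O completion callback so that the
             * hypercall is completed on the next KVM_RUN.
             */
            vcpu->arch.complete_userspace_io = complete_hypercall_exit;
            return 0;       /* 0 == return to userspace */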

Looking fwd. to any comments/feedback/thoughts on the above.

Thanks,
Ashish

> > But, isn't this patch set still RFC, and though I agree that it adds
> > an infrastructure for standardised communication between the host and
> > its guests for mutually controlled shared memory regions and
> > surely adds some kind of portability between hypervisor
> > implementations, but nothing is standardised still, right?
> 
> No, and it seems that you're further ahead than us in terms of
> implementation in this area. We're happy to review patches though, to
> make sure we end up with something that works for us both.
> 
> Will

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl
  2021-03-11 18:14                       ` Ashish Kalra
@ 2021-03-11 20:48                         ` Steve Rutherford
  2021-03-19 17:59                           ` Ashish Kalra
  0 siblings, 1 reply; 75+ messages in thread
From: Steve Rutherford @ 2021-03-11 20:48 UTC (permalink / raw)
  To: Ashish Kalra
  Cc: Will Deacon, Sean Christopherson, pbonzini, joro, Lendacky,
	Thomas, kvm, linux-kernel, venu.busireddy, Singh, Brijesh,
	Quentin Perret, maz

On Thu, Mar 11, 2021 at 10:15 AM Ashish Kalra <ashish.kalra@amd.com> wrote:
>
> On Wed, Mar 03, 2021 at 06:54:41PM +0000, Will Deacon wrote:
> > [+Marc]
> >
> > On Tue, Mar 02, 2021 at 02:55:43PM +0000, Ashish Kalra wrote:
> > > On Fri, Feb 26, 2021 at 09:44:41AM -0800, Sean Christopherson wrote:
> > > > On Fri, Feb 26, 2021, Ashish Kalra wrote:
> > > > > On Thu, Feb 25, 2021 at 02:59:27PM -0800, Steve Rutherford wrote:
> > > > > > On Thu, Feb 25, 2021 at 12:20 PM Ashish Kalra <ashish.kalra@amd.com> wrote:
> > > > > > Thanks for grabbing the data!
> > > > > >
> > > > > > I am fine with both paths. Sean has stated an explicit desire for
> > > > > > hypercall exiting, so I think that would be the current consensus.
> > > >
> > > > Yep, though it'd be good to get Paolo's input, too.
> > > >
> > > > > > If we want to do hypercall exiting, this should be in a follow-up
> > > > > > series where we implement something more generic, e.g. a hypercall
> > > > > > exiting bitmap or hypercall exit list. If we are taking the hypercall
> > > > > > exit route, we can drop the kvm side of the hypercall.
> > > >
> > > > I don't think this is a good candidate for arbitrary hypercall interception.  Or
> > > > rather, I think hypercall interception should be an orthogonal implementation.
> > > >
> > > > The guest, including guest firmware, needs to be aware that the hypercall is
> > > > supported, and the ABI needs to be well-defined.  Relying on userspace VMMs to
> > > > implement a common ABI is an unnecessary risk.
> > > >
> > > > We could make KVM's default behavior be a nop, i.e. have KVM enforce the ABI but
> > > > require further VMM intervention.  But, I just don't see the point, it would
> > > > save only a few lines of code.  It would also limit what KVM could do in the
> > > > future, e.g. if KVM wanted to do its own bookkeeping _and_ exit to userspace,
> > > > then mandatory interception would essentially make it impossible for KVM to do
> > > > bookkeeping while still honoring the interception request.
> > > >
> > > > However, I do think it would make sense to have the userspace exit be a generic
> > > > exit type.  But hey, we already have the necessary ABI defined for that!  It's
> > > > just not used anywhere.
> > > >
> > > >   /* KVM_EXIT_HYPERCALL */
> > > >   struct {
> > > >           __u64 nr;
> > > >           __u64 args[6];
> > > >           __u64 ret;
> > > >           __u32 longmode;
> > > >           __u32 pad;
> > > >   } hypercall;
> > > >
> > > >
> > > > > > Userspace could also handle the MSR using MSR filters (would need to
> > > > > > confirm that).  Then userspace could also be in control of the cpuid bit.
> > > >
> > > > An MSR is not a great fit; it's x86 specific and limited to 64 bits of data.
> > > > The data limitation could be fudged by shoving data into non-standard GPRs, but
> > > > that will result in truly heinous guest code, and extensibility issues.
> > > >
> > > > The data limitation is a moot point, because the x86-only thing is a deal
> > > > breaker.  arm64's pKVM work has a near-identical use case for a guest to share
> > > > memory with a host.  I can't think of a clever way to avoid having to support
> > > > TDX's and SNP's hypervisor-agnostic variants, but we can at least not have
> > > > multiple KVM variants.
> > >
> > > Looking at arm64's pKVM work, i see that it is a recently introduced RFC
> > > patch-set and probably relevant to arm64 nVHE hypervisor
> > > mode/implementation, and potentially makes sense as it adds guest
> > > memory protection as both host and guest kernels are running on the same
> > > privilege level ?
> > >
> > > Though i do see that the pKVM stuff adds two hypercalls, specifically :
> > >
> > > pkvm_create_mappings() ( I assume this is for setting shared memory
> > > regions between host and guest) &
> > > pkvm_create_private_mappings().
> > >
> > > And the use-cases are quite similar to memory protection architectues
> > > use cases, for example, use with virtio devices, guest DMA I/O, etc.
> >
> > These hypercalls are both private to the host kernel communicating with
> > its hypervisor counterpart, so I don't think they're particularly
> > relevant here. As far as I can see, the more useful thing is to allow
> > the guest to communicate back to the host (and the VMM) that it has opened
> > up a memory window, perhaps for virtio rings or some other shared memory.
> >
> > We hacked this up as a prototype in the past:
> >
> > https://android-kvm.googlesource.com/linux/+/d12a9e2c12a52cf7140d40cd9fa092dc8a85fac9%5E%21/
> >
> > but that's all arm64-specific and if we're solving the same problem as
> > you, then let's avoid arch-specific stuff if possible. The way in which
> > the guest discovers the interface will be arch-specific (we already have
> > a mechanism for that and some hypercalls are already allocated by specs
> > from Arm), but the interface back to the VMM and some (most?) of the host
> > handling could be shared.
> >
>
> I have started implementing a similar "hypercall to userspace"
> functionality for these DMA_SHARE/DMA_UNSHARE type of interfaces
> corresponding to SEV guest's add/remove shared regions on the x86 platform.
>
> This does not implement a generic hypercall exiting infrastructure,
> mainly extends the KVM hypercall support to return back to userspace
> specifically for add/remove shared region hypercalls and then re-uses
> the complete userspace I/O callback functionality to resume the guest
> after returning back from userspace handling of the hypercall.
>
> Looking fwd. to any comments/feedback/thoughts on the above.
Others have mentioned a lack of appetite for generic hypercall
intercepts, so this is the right approach.

Thanks,
Steve

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl
  2021-03-11 20:48                         ` Steve Rutherford
@ 2021-03-19 17:59                           ` Ashish Kalra
  2021-04-02  1:40                             ` Steve Rutherford
  0 siblings, 1 reply; 75+ messages in thread
From: Ashish Kalra @ 2021-03-19 17:59 UTC (permalink / raw)
  To: Steve Rutherford
  Cc: Will Deacon, Sean Christopherson, pbonzini, joro, Lendacky,
	Thomas, kvm, linux-kernel, venu.busireddy, Singh, Brijesh,
	Quentin Perret, maz

On Thu, Mar 11, 2021 at 12:48:07PM -0800, Steve Rutherford wrote:
> On Thu, Mar 11, 2021 at 10:15 AM Ashish Kalra <ashish.kalra@amd.com> wrote:
> >
> > On Wed, Mar 03, 2021 at 06:54:41PM +0000, Will Deacon wrote:
> > > [+Marc]
> > >
> > > On Tue, Mar 02, 2021 at 02:55:43PM +0000, Ashish Kalra wrote:
> > > > On Fri, Feb 26, 2021 at 09:44:41AM -0800, Sean Christopherson wrote:
> > > > > On Fri, Feb 26, 2021, Ashish Kalra wrote:
> > > > > > On Thu, Feb 25, 2021 at 02:59:27PM -0800, Steve Rutherford wrote:
> > > > > > > On Thu, Feb 25, 2021 at 12:20 PM Ashish Kalra <ashish.kalra@amd.com> wrote:
> > > > > > > Thanks for grabbing the data!
> > > > > > >
> > > > > > > I am fine with both paths. Sean has stated an explicit desire for
> > > > > > > hypercall exiting, so I think that would be the current consensus.
> > > > >
> > > > > Yep, though it'd be good to get Paolo's input, too.
> > > > >
> > > > > > > If we want to do hypercall exiting, this should be in a follow-up
> > > > > > > series where we implement something more generic, e.g. a hypercall
> > > > > > > exiting bitmap or hypercall exit list. If we are taking the hypercall
> > > > > > > exit route, we can drop the kvm side of the hypercall.
> > > > >
> > > > > I don't think this is a good candidate for arbitrary hypercall interception.  Or
> > > > > rather, I think hypercall interception should be an orthogonal implementation.
> > > > >
> > > > > The guest, including guest firmware, needs to be aware that the hypercall is
> > > > > supported, and the ABI needs to be well-defined.  Relying on userspace VMMs to
> > > > > implement a common ABI is an unnecessary risk.
> > > > >
> > > > > We could make KVM's default behavior be a nop, i.e. have KVM enforce the ABI but
> > > > > require further VMM intervention.  But, I just don't see the point, it would
> > > > > save only a few lines of code.  It would also limit what KVM could do in the
> > > > > future, e.g. if KVM wanted to do its own bookkeeping _and_ exit to userspace,
> > > > > then mandatory interception would essentially make it impossible for KVM to do
> > > > > bookkeeping while still honoring the interception request.
> > > > >
> > > > > However, I do think it would make sense to have the userspace exit be a generic
> > > > > exit type.  But hey, we already have the necessary ABI defined for that!  It's
> > > > > just not used anywhere.
> > > > >
> > > > >   /* KVM_EXIT_HYPERCALL */
> > > > >   struct {
> > > > >           __u64 nr;
> > > > >           __u64 args[6];
> > > > >           __u64 ret;
> > > > >           __u32 longmode;
> > > > >           __u32 pad;
> > > > >   } hypercall;
> > > > >
> > > > >
> > > > > > > Userspace could also handle the MSR using MSR filters (would need to
> > > > > > > confirm that).  Then userspace could also be in control of the cpuid bit.
> > > > >
> > > > > An MSR is not a great fit; it's x86 specific and limited to 64 bits of data.
> > > > > The data limitation could be fudged by shoving data into non-standard GPRs, but
> > > > > that will result in truly heinous guest code, and extensibility issues.
> > > > >

We may also need to pass the MSR through to userspace, as it is part
of the three-way negotiation of the SEV live migration feature between
the host (userspace/kernel), OVMF and the guest kernel.

The host (userspace/kernel) advertises its support for SEV live
migration via a CPUID bit, which is queried by OVMF; OVMF in turn adds
a new UEFI runtime variable to indicate its own support for live
migration. That variable is queried during guest kernel boot, and
accordingly the guest does a wrmsrl() to the custom MSR to complete
the SEV live migration negotiation and enable it, as sketched below.
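
A minimal sketch of that guest-side flow (illustrative only, not the
actual patch: ovmf_live_migration_supported() is a hypothetical
stand-in for the UEFI variable lookup, and the MSR bit name is an
assumption):

static void check_kvm_sev_migration(void)
{
	/* Host advertises the feature via the KVM CPUID bits. */
	if (!kvm_para_has_feature(KVM_FEATURE_SEV_LIVE_MIGRATION))
		return;

	/*
	 * On an EFI boot, also require the UEFI runtime variable
	 * that OVMF sets to signal its own support.
	 */
	if (efi_enabled(EFI_BOOT) && !ovmf_live_migration_supported())
		return;

	/* Complete the negotiation by writing the custom MSR. */
	wrmsrl(MSR_KVM_SEV_LIVE_MIGRATION, KVM_SEV_LIVE_MIGRATION_ENABLED);
}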

Now, the KVM_GET_SHARED_PAGES_LIST ioctl returns an error until this
MSR write enables SEV live migration, hence preventing userspace from
starting live migration before the feature support has been negotiated
and enabled on all three components - host, guest OVMF and guest
kernel.

But now, with this ioctl no longer existing, we will need to pass the
MSR through to userspace as well, so that userspace only initiates
live migration once the feature negotiation has been completed.

Thanks,
Ashish

> > > > > The data limitation is a moot point, because the x86-only thing is a deal
> > > > > breaker.  arm64's pKVM work has a near-identical use case for a guest to share
> > > > > memory with a host.  I can't think of a clever way to avoid having to support
> > > > > TDX's and SNP's hypervisor-agnostic variants, but we can at least not have
> > > > > multiple KVM variants.
> > > >
> > > > Looking at arm64's pKVM work, i see that it is a recently introduced RFC
> > > > patch-set and probably relevant to arm64 nVHE hypervisor
> > > > mode/implementation, and potentially makes sense as it adds guest
> > > > memory protection as both host and guest kernels are running on the same
> > > > privilege level ?
> > > >
> > > > Though i do see that the pKVM stuff adds two hypercalls, specifically :
> > > >
> > > > pkvm_create_mappings() ( I assume this is for setting shared memory
> > > > regions between host and guest) &
> > > > pkvm_create_private_mappings().
> > > >
> > > > And the use-cases are quite similar to memory protection architectues
> > > > use cases, for example, use with virtio devices, guest DMA I/O, etc.
> > >
> > > These hypercalls are both private to the host kernel communicating with
> > > its hypervisor counterpart, so I don't think they're particularly
> > > relevant here. As far as I can see, the more useful thing is to allow
> > > the guest to communicate back to the host (and the VMM) that it has opened
> > > up a memory window, perhaps for virtio rings or some other shared memory.
> > >
> > > We hacked this up as a prototype in the past:
> > >
> > > https://android-kvm.googlesource.com/linux/+/d12a9e2c12a52cf7140d40cd9fa092dc8a85fac9%5E%21/
> > >
> > > but that's all arm64-specific and if we're solving the same problem as
> > > you, then let's avoid arch-specific stuff if possible. The way in which
> > > the guest discovers the interface will be arch-specific (we already have
> > > a mechanism for that and some hypercalls are already allocated by specs
> > > from Arm), but the interface back to the VMM and some (most?) of the host
> > > handling could be shared.
> > >
> >
> > I have started implementing a similar "hypercall to userspace"
> > functionality for these DMA_SHARE/DMA_UNSHARE type of interfaces
> > corresponding to SEV guest's add/remove shared regions on the x86 platform.
> >
> > This does not implement a generic hypercall exiting infrastructure,
> > mainly extends the KVM hypercall support to return back to userspace
> > specifically for add/remove shared region hypercalls and then re-uses
> > the complete userspace I/O callback functionality to resume the guest
> > after returning back from userspace handling of the hypercall.
> >
> > Looking fwd. to any comments/feedback/thoughts on the above.
> Others have mentioned a lack of appetite for generic hypercall
> intercepts, so this is the right approach.
> 
> Thanks,
> Steve

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl
  2021-03-19 17:59                           ` Ashish Kalra
@ 2021-04-02  1:40                             ` Steve Rutherford
  2021-04-02 11:09                               ` Ashish Kalra
  0 siblings, 1 reply; 75+ messages in thread
From: Steve Rutherford @ 2021-04-02  1:40 UTC (permalink / raw)
  To: Ashish Kalra
  Cc: Will Deacon, Sean Christopherson, pbonzini, joro, Lendacky,
	Thomas, kvm, linux-kernel, venu.busireddy, Singh, Brijesh,
	Quentin Perret, maz

On Fri, Mar 19, 2021 at 11:00 AM Ashish Kalra <ashish.kalra@amd.com> wrote:
>
> On Thu, Mar 11, 2021 at 12:48:07PM -0800, Steve Rutherford wrote:
> > On Thu, Mar 11, 2021 at 10:15 AM Ashish Kalra <ashish.kalra@amd.com> wrote:
> > >
> > > On Wed, Mar 03, 2021 at 06:54:41PM +0000, Will Deacon wrote:
> > > > [+Marc]
> > > >
> > > > On Tue, Mar 02, 2021 at 02:55:43PM +0000, Ashish Kalra wrote:
> > > > > On Fri, Feb 26, 2021 at 09:44:41AM -0800, Sean Christopherson wrote:
> > > > > > On Fri, Feb 26, 2021, Ashish Kalra wrote:
> > > > > > > On Thu, Feb 25, 2021 at 02:59:27PM -0800, Steve Rutherford wrote:
> > > > > > > > On Thu, Feb 25, 2021 at 12:20 PM Ashish Kalra <ashish.kalra@amd.com> wrote:
> > > > > > > > Thanks for grabbing the data!
> > > > > > > >
> > > > > > > > I am fine with both paths. Sean has stated an explicit desire for
> > > > > > > > hypercall exiting, so I think that would be the current consensus.
> > > > > >
> > > > > > Yep, though it'd be good to get Paolo's input, too.
> > > > > >
> > > > > > > > If we want to do hypercall exiting, this should be in a follow-up
> > > > > > > > series where we implement something more generic, e.g. a hypercall
> > > > > > > > exiting bitmap or hypercall exit list. If we are taking the hypercall
> > > > > > > > exit route, we can drop the kvm side of the hypercall.
> > > > > >
> > > > > > I don't think this is a good candidate for arbitrary hypercall interception.  Or
> > > > > > rather, I think hypercall interception should be an orthogonal implementation.
> > > > > >
> > > > > > The guest, including guest firmware, needs to be aware that the hypercall is
> > > > > > supported, and the ABI needs to be well-defined.  Relying on userspace VMMs to
> > > > > > implement a common ABI is an unnecessary risk.
> > > > > >
> > > > > > We could make KVM's default behavior be a nop, i.e. have KVM enforce the ABI but
> > > > > > require further VMM intervention.  But, I just don't see the point, it would
> > > > > > save only a few lines of code.  It would also limit what KVM could do in the
> > > > > > future, e.g. if KVM wanted to do its own bookkeeping _and_ exit to userspace,
> > > > > > then mandatory interception would essentially make it impossible for KVM to do
> > > > > > bookkeeping while still honoring the interception request.
> > > > > >
> > > > > > However, I do think it would make sense to have the userspace exit be a generic
> > > > > > exit type.  But hey, we already have the necessary ABI defined for that!  It's
> > > > > > just not used anywhere.
> > > > > >
> > > > > >   /* KVM_EXIT_HYPERCALL */
> > > > > >   struct {
> > > > > >           __u64 nr;
> > > > > >           __u64 args[6];
> > > > > >           __u64 ret;
> > > > > >           __u32 longmode;
> > > > > >           __u32 pad;
> > > > > >   } hypercall;
> > > > > >
> > > > > >
> > > > > > > > Userspace could also handle the MSR using MSR filters (would need to
> > > > > > > > confirm that).  Then userspace could also be in control of the cpuid bit.
> > > > > >
> > > > > > An MSR is not a great fit; it's x86 specific and limited to 64 bits of data.
> > > > > > The data limitation could be fudged by shoving data into non-standard GPRs, but
> > > > > > that will result in truly heinous guest code, and extensibility issues.
> > > > > >
>
> We may also need to pass-through the MSR to userspace, as it is a part of this
> complete host (userspace/kernel), OVMF and guest kernel negotiation of
> the SEV live migration feature.
>
> Host (userspace/kernel) advertises it's support for SEV live migration
> feature via the CPUID bits, which is queried by OVMF and which in turn
> adds a new UEFI runtime variable to indicate support for SEV live
> migration, which is later queried during guest kernel boot and
> accordingly the guest does a wrmrsl() to custom MSR to complete SEV
> live migration negotiation and enable it.
>
> Now, the GET_SHARED_REGION_LIST ioctl returns error, until this MSR write
> enables SEV live migration, hence, preventing userspace to start live
> migration before the feature support has been negotiated and enabled on
> all the three components - host, guest OVMF and kernel.
>
> But, now with this ioctl not existing anymore, we will need to
> pass-through the MSR to userspace too, for it to only initiate live
> migration once the feature negotiation has been completed.
Hey Ashish,

I can't tell if you were waiting for feedback on this before posting
the follow-up patch series.

Here are a few options:
1) Add the MSR explicitly to the list of custom KVM MSRs, but don't
have it hooked up anywhere. The expectation would be for the VMM to
use MSR intercepts to handle the reads and writes. If that seems
weird, have svm_set_msr (or whatever) explicitly ignore it.
2) Add a getter and setter for the MSR. Only allow guests to use it if
they are SEV guests with the requisite CPUID bit set.

I think I prefer the former, and it should work fine from my
understanding of the MSR intercepts implementation. I'm also open to
other ideas. You could also have the MSR write trigger a KVM_EXIT of
the same type as the hypercall and have it just say "the MSR value
changed to XYZ", but that design sounds awkward.

Thanks,
Steve

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl
  2021-04-02  1:40                             ` Steve Rutherford
@ 2021-04-02 11:09                               ` Ashish Kalra
  0 siblings, 0 replies; 75+ messages in thread
From: Ashish Kalra @ 2021-04-02 11:09 UTC (permalink / raw)
  To: Steve Rutherford
  Cc: Will Deacon, Sean Christopherson, pbonzini, joro, Lendacky,
	Thomas, kvm, linux-kernel, venu.busireddy, Singh, Brijesh,
	Quentin Perret, maz

Hello Steve,

On Thu, Apr 01, 2021 at 06:40:06PM -0700, Steve Rutherford wrote:
> On Fri, Mar 19, 2021 at 11:00 AM Ashish Kalra <ashish.kalra@amd.com> wrote:
> >
> > On Thu, Mar 11, 2021 at 12:48:07PM -0800, Steve Rutherford wrote:
> > > On Thu, Mar 11, 2021 at 10:15 AM Ashish Kalra <ashish.kalra@amd.com> wrote:
> > > >
> > > > On Wed, Mar 03, 2021 at 06:54:41PM +0000, Will Deacon wrote:
> > > > > [+Marc]
> > > > >
> > > > > On Tue, Mar 02, 2021 at 02:55:43PM +0000, Ashish Kalra wrote:
> > > > > > On Fri, Feb 26, 2021 at 09:44:41AM -0800, Sean Christopherson wrote:
> > > > > > > On Fri, Feb 26, 2021, Ashish Kalra wrote:
> > > > > > > > On Thu, Feb 25, 2021 at 02:59:27PM -0800, Steve Rutherford wrote:
> > > > > > > > > On Thu, Feb 25, 2021 at 12:20 PM Ashish Kalra <ashish.kalra@amd.com> wrote:
> > > > > > > > > Thanks for grabbing the data!
> > > > > > > > >
> > > > > > > > > I am fine with both paths. Sean has stated an explicit desire for
> > > > > > > > > hypercall exiting, so I think that would be the current consensus.
> > > > > > >
> > > > > > > Yep, though it'd be good to get Paolo's input, too.
> > > > > > >
> > > > > > > > > If we want to do hypercall exiting, this should be in a follow-up
> > > > > > > > > series where we implement something more generic, e.g. a hypercall
> > > > > > > > > exiting bitmap or hypercall exit list. If we are taking the hypercall
> > > > > > > > > exit route, we can drop the kvm side of the hypercall.
> > > > > > >
> > > > > > > I don't think this is a good candidate for arbitrary hypercall interception.  Or
> > > > > > > rather, I think hypercall interception should be an orthogonal implementation.
> > > > > > >
> > > > > > > The guest, including guest firmware, needs to be aware that the hypercall is
> > > > > > > supported, and the ABI needs to be well-defined.  Relying on userspace VMMs to
> > > > > > > implement a common ABI is an unnecessary risk.
> > > > > > >
> > > > > > > We could make KVM's default behavior be a nop, i.e. have KVM enforce the ABI but
> > > > > > > require further VMM intervention.  But, I just don't see the point, it would
> > > > > > > save only a few lines of code.  It would also limit what KVM could do in the
> > > > > > > future, e.g. if KVM wanted to do its own bookkeeping _and_ exit to userspace,
> > > > > > > then mandatory interception would essentially make it impossible for KVM to do
> > > > > > > bookkeeping while still honoring the interception request.
> > > > > > >
> > > > > > > However, I do think it would make sense to have the userspace exit be a generic
> > > > > > > exit type.  But hey, we already have the necessary ABI defined for that!  It's
> > > > > > > just not used anywhere.
> > > > > > >
> > > > > > >   /* KVM_EXIT_HYPERCALL */
> > > > > > >   struct {
> > > > > > >           __u64 nr;
> > > > > > >           __u64 args[6];
> > > > > > >           __u64 ret;
> > > > > > >           __u32 longmode;
> > > > > > >           __u32 pad;
> > > > > > >   } hypercall;
> > > > > > >
> > > > > > >
> > > > > > > > > Userspace could also handle the MSR using MSR filters (would need to
> > > > > > > > > confirm that).  Then userspace could also be in control of the cpuid bit.
> > > > > > >
> > > > > > > An MSR is not a great fit; it's x86 specific and limited to 64 bits of data.
> > > > > > > The data limitation could be fudged by shoving data into non-standard GPRs, but
> > > > > > > that will result in truly heinous guest code, and extensibility issues.
> > > > > > >
> >
> > We may also need to pass-through the MSR to userspace, as it is a part of this
> > complete host (userspace/kernel), OVMF and guest kernel negotiation of
> > the SEV live migration feature.
> >
> > Host (userspace/kernel) advertises it's support for SEV live migration
> > feature via the CPUID bits, which is queried by OVMF and which in turn
> > adds a new UEFI runtime variable to indicate support for SEV live
> > migration, which is later queried during guest kernel boot and
> > accordingly the guest does a wrmrsl() to custom MSR to complete SEV
> > live migration negotiation and enable it.
> >
> > Now, the GET_SHARED_REGION_LIST ioctl returns error, until this MSR write
> > enables SEV live migration, hence, preventing userspace to start live
> > migration before the feature support has been negotiated and enabled on
> > all the three components - host, guest OVMF and kernel.
> >
> > But, now with this ioctl not existing anymore, we will need to
> > pass-through the MSR to userspace too, for it to only initiate live
> > migration once the feature negotiation has been completed.
> 
> I can't tell if you were waiting for feedback on this before posting
> the follow-up patch series.

Actually, I am going to post the follow-up patch series upstream early
next week.

I have already added support for MSR handling and exiting to
userspace; the current implementation looks like this:

The custom MSR is hooked in both svm_get_msr() and svm_set_msr():

@@ -2800,6 +2800,17 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
        case MSR_F10H_DECFG:
                msr_info->data = svm->msr_decfg;
                break;
+       case MSR_KVM_SEV_LIVE_MIGRATION:
+               if (!sev_guest(vcpu->kvm))
+                       return 1;
+
+               if (!guest_cpuid_has(vcpu, KVM_FEATURE_SEV_LIVE_MIGRATION))
+                       return 1;
+
+               /*
+                * Let userspace handle the MSR using MSR filters.
+                */
+               return KVM_MSR_RET_FILTERED;

So there are special checks added for both sev_guest() and the
requisite CPUID bit being set in the guest CPUID; if either check
fails, a #GP is signalled to the guest.

Otherwise, it returns KVM_MSR_RET_FILTERED, which allows userspace to
use MSR filters to handle the reads and writes via userspace exits
using KVM_EXIT_X86_RDMSR/KVM_EXIT_X86_WRMSR.
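
The svm_set_msr() side is not shown above; presumably it mirrors the
read path (a sketch under that assumption, not the actual patch):

	case MSR_KVM_SEV_LIVE_MIGRATION:
		if (!sev_guest(vcpu->kvm))
			return 1;

		if (!guest_cpuid_has(vcpu, KVM_FEATURE_SEV_LIVE_MIGRATION))
			return 1;

		/* Defer writes to userspace via MSR filtering as well. */
		return KVM_MSR_RET_FILTERED;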

Let me know if you have any feedback/comments on the above handling.

Thanks,
Ashish

> 
> Here are a few options:
> 1) Add the MSR explicitly to the list of custom kvm MSRs, but don't
> have it hooked up anywhere. The expectation would be for the VMM to
> use msr intercepts to handle the reads and writes. If that seems
> weird, have svm_set_msr (or whatever) explicitly ignore it.
> 2) Add a getter and setter for the MSR. Only allow guests to use it if
> they are sev_guests with the requisite CPUID bit set.
> 
> I think I prefer the former, and it should work fine from my
> understanding of the msr intercepts implementation. I'm also open to
> other ideas. You could also have the MSR write trigger a KVM_EXIT of
> the same type as the hypercall, but have it just say "the msr value
> changed to XYZ", but that design sounds awkward.
> 
> Thanks,
> Steve

^ permalink raw reply	[flat|nested] 75+ messages in thread

end of thread, other threads:[~2021-04-02 11:09 UTC | newest]

Thread overview: 75+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-02-04  0:35 [PATCH v10 00/17] Add AMD SEV guest live migration support Ashish Kalra
2021-02-04  0:36 ` [PATCH v10 01/16] KVM: SVM: Add KVM_SEV SEND_START command Ashish Kalra
2021-02-04  0:36 ` [PATCH v10 02/16] KVM: SVM: Add KVM_SEND_UPDATE_DATA command Ashish Kalra
2021-02-04  0:37 ` [PATCH v10 03/16] KVM: SVM: Add KVM_SEV_SEND_FINISH command Ashish Kalra
2021-02-04  0:37 ` [PATCH v10 04/16] KVM: SVM: Add support for KVM_SEV_RECEIVE_START command Ashish Kalra
2021-02-04  0:37 ` [PATCH v10 05/16] KVM: SVM: Add KVM_SEV_RECEIVE_UPDATE_DATA command Ashish Kalra
2021-02-04  0:37 ` [PATCH v10 06/16] KVM: SVM: Add KVM_SEV_RECEIVE_FINISH command Ashish Kalra
2021-02-04  0:38 ` [PATCH v10 07/16] KVM: x86: Add AMD SEV specific Hypercall3 Ashish Kalra
2021-02-04  0:38 ` [PATCH v10 08/16] KVM: X86: Introduce KVM_HC_PAGE_ENC_STATUS hypercall Ashish Kalra
2021-02-04 16:03   ` Tom Lendacky
2021-02-05  1:44   ` Steve Rutherford
2021-02-05  3:32     ` Ashish Kalra
2021-02-04  0:39 ` [PATCH v10 09/16] mm: x86: Invoke hypercall when page encryption status is changed Ashish Kalra
2021-02-04  0:39 ` [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl Ashish Kalra
2021-02-04 16:14   ` Tom Lendacky
2021-02-04 16:34     ` Ashish Kalra
2021-02-17  1:03   ` Sean Christopherson
2021-02-17 14:00     ` Kalra, Ashish
2021-02-17 16:13       ` Sean Christopherson
2021-02-18  6:48         ` Kalra, Ashish
2021-02-18 16:39           ` Sean Christopherson
2021-02-18 17:05             ` Kalra, Ashish
2021-02-18 17:50               ` Sean Christopherson
2021-02-18 18:32     ` Kalra, Ashish
2021-02-24 17:51       ` Ashish Kalra
2021-02-24 18:22         ` Sean Christopherson
2021-02-25 20:20           ` Ashish Kalra
2021-02-25 22:59             ` Steve Rutherford
2021-02-25 23:24               ` Steve Rutherford
2021-02-26 14:04               ` Ashish Kalra
2021-02-26 17:44                 ` Sean Christopherson
2021-03-02 14:55                   ` Ashish Kalra
2021-03-02 15:15                     ` Ashish Kalra
2021-03-03 18:54                     ` Will Deacon
2021-03-03 19:32                       ` Ashish Kalra
2021-03-09 19:10                       ` Ashish Kalra
2021-03-11 18:14                       ` Ashish Kalra
2021-03-11 20:48                         ` Steve Rutherford
2021-03-19 17:59                           ` Ashish Kalra
2021-04-02  1:40                             ` Steve Rutherford
2021-04-02 11:09                               ` Ashish Kalra
2021-03-08 10:40                   ` Ashish Kalra
2021-03-08 19:51                     ` Sean Christopherson
2021-03-08 21:05                       ` Ashish Kalra
2021-03-08 21:11                       ` Brijesh Singh
2021-03-08 21:32                         ` Ashish Kalra
2021-03-08 21:51                         ` Steve Rutherford
2021-03-09 19:42                           ` Sean Christopherson
2021-03-10  3:42                           ` Kalra, Ashish
2021-03-10  3:47                             ` Steve Rutherford
2021-03-08 21:48                       ` Steve Rutherford
2021-02-17  1:06   ` Sean Christopherson
2021-02-04  0:39 ` [PATCH v10 11/16] KVM: x86: Introduce KVM_SET_SHARED_PAGES_LIST ioctl Ashish Kalra
2021-02-04  0:39 ` [PATCH v10 12/16] KVM: x86: Introduce new KVM_FEATURE_SEV_LIVE_MIGRATION feature & Custom MSR Ashish Kalra
2021-02-05  0:56   ` Steve Rutherford
2021-02-05  3:07     ` Ashish Kalra
2021-02-06  2:54       ` Steve Rutherford
2021-02-06  4:49         ` Ashish Kalra
2021-02-06  5:46         ` Ashish Kalra
2021-02-06 13:56           ` Ashish Kalra
2021-02-08  0:28             ` Ashish Kalra
2021-02-08 22:50               ` Steve Rutherford
2021-02-10 20:36                 ` Ashish Kalra
2021-02-10 22:01                   ` Steve Rutherford
2021-02-10 22:05                     ` Steve Rutherford
2021-02-16 23:20   ` Sean Christopherson
2021-02-04  0:40 ` [PATCH v10 13/16] EFI: Introduce the new AMD Memory Encryption GUID Ashish Kalra
2021-02-04  0:40 ` [PATCH v10 14/16] KVM: x86: Add guest support for detecting and enabling SEV Live Migration feature Ashish Kalra
2021-02-18 17:56   ` Sean Christopherson
2021-02-04  0:40 ` [PATCH v10 15/16] KVM: x86: Add kexec support for SEV Live Migration Ashish Kalra
2021-02-04  4:10   ` kernel test robot
2021-02-04  4:10     ` kernel test robot
2021-02-04  0:40 ` [PATCH v10 16/16] KVM: SVM: Bypass DBG_DECRYPT API calls for unencrypted guest memory Ashish Kalra
2021-02-09  6:23   ` kernel test robot
2021-02-09  6:23     ` kernel test robot
