linux-kernel.vger.kernel.org archive mirror
* [PATCH v6 00/14] Add AMD SEV guest live migration support
@ 2020-03-30  6:19 Ashish Kalra
  2020-03-30  6:19 ` [PATCH v6 01/14] KVM: SVM: Add KVM_SEV SEND_START command Ashish Kalra
                   ` (14 more replies)
  0 siblings, 15 replies; 107+ messages in thread
From: Ashish Kalra @ 2020-03-30  6:19 UTC (permalink / raw)
  To: pbonzini
  Cc: tglx, mingo, hpa, joro, bp, thomas.lendacky, x86, kvm,
	linux-kernel, rientjes, srutherford, luto, brijesh.singh

From: Ashish Kalra <ashish.kalra@amd.com>

This series adds support for the AMD SEV guest live migration commands. To
protect the confidentiality of an SEV-protected guest's memory while it is in
transit, we need to use the SEV commands defined in the SEV API spec [1].

SEV guest VMs have the concept of private and shared memory. Private memory
is encrypted with the guest-specific key, while shared memory may be encrypted
with the hypervisor key. The commands provided by the SEV firmware are meant
to be used for the private memory only. The patch series introduces a new
hypercall that the guest OS can use to notify the hypervisor of a page's
encryption status. If a page is encrypted with the guest-specific key, we use
the SEV commands during migration; if a page is not encrypted, we fall back
to the default migration path.

The series also adds new ioctls, KVM_{SET,GET}_PAGE_ENC_BITMAP. These can be
used by QEMU to retrieve the page encryption bitmap and consult it during
migration to know whether a given page is encrypted.

[1] https://developer.amd.com/wp-content/resources/55766.PDF

Changes since v5:
- Fix build errors as
  Reported-by: kbuild test robot <lkp@intel.com>

Changes since v4:
- Host support has been added to extend KVM capabilities/feature bits to
  include a new KVM_FEATURE_SEV_LIVE_MIGRATION feature bit, which the guest
  can query to check for host-side SEV live migration support, and a new
  custom MSR, MSR_KVM_SEV_LIVE_MIG_EN, which the guest uses to enable the
  SEV live migration feature.
- Ensure that _bss_decrypted section is marked as decrypted in the
  page encryption bitmap.
- Fix the KVM_GET_PAGE_ENC_BITMAP ioctl to return the correct bitmap
  for the number of pages requested by the user. Ensure that we only
  copy bmap->num_pages bits into the userspace buffer; if
  bmap->num_pages is not byte aligned, we read the trailing bits
  from userspace and copy those bits back as-is. This fixes guest
  page corruption issues observed after migration completion.
- Add kexec support for SEV Live Migration to reset the host's
  page encryption bitmap related to kernel specific page encryption
  status settings before we load a new kernel by kexec. We cannot
  reset the complete page encryption bitmap here as we need to
  retain the UEFI/OVMF firmware specific settings.

Changes since v3:
- Rebasing to mainline and testing.
- Adding a new KVM_PAGE_ENC_BITMAP_RESET ioctl, which resets the 
  page encryption bitmap on a guest reboot event.
- Adding a more reliable sanity check for GPA range being passed to
  the hypercall to ensure that guest MMIO ranges are also marked
  in the page encryption bitmap.

Changes since v2:
 - reset the page encryption bitmap on vcpu reboot

Changes since v1:
 - Add support to share the page encryption between the source and target
   machine.
 - Fix review feedbacks from Tom Lendacky.
 - Add check to limit the session blob length.
 - Update the KVM_GET_PAGE_ENC_BITMAP ioctl to use the base_gfn instead of
   the memory slot when querying the bitmap.

Ashish Kalra (3):
  KVM: x86: Introduce KVM_PAGE_ENC_BITMAP_RESET ioctl
  KVM: x86: Introduce new KVM_FEATURE_SEV_LIVE_MIGRATION feature &
    Custom MSR.
  KVM: x86: Add kexec support for SEV Live Migration.

Brijesh Singh (11):
  KVM: SVM: Add KVM_SEV SEND_START command
  KVM: SVM: Add KVM_SEND_UPDATE_DATA command
  KVM: SVM: Add KVM_SEV_SEND_FINISH command
  KVM: SVM: Add support for KVM_SEV_RECEIVE_START command
  KVM: SVM: Add KVM_SEV_RECEIVE_UPDATE_DATA command
  KVM: SVM: Add KVM_SEV_RECEIVE_FINISH command
  KVM: x86: Add AMD SEV specific Hypercall3
  KVM: X86: Introduce KVM_HC_PAGE_ENC_STATUS hypercall
  KVM: x86: Introduce KVM_GET_PAGE_ENC_BITMAP ioctl
  mm: x86: Invoke hypercall when page encryption status is changed
  KVM: x86: Introduce KVM_SET_PAGE_ENC_BITMAP ioctl

 .../virt/kvm/amd-memory-encryption.rst        | 120 +++
 Documentation/virt/kvm/api.rst                |  62 ++
 Documentation/virt/kvm/cpuid.rst              |   4 +
 Documentation/virt/kvm/hypercalls.rst         |  15 +
 Documentation/virt/kvm/msr.rst                |  10 +
 arch/x86/include/asm/kvm_host.h               |  10 +
 arch/x86/include/asm/kvm_para.h               |  12 +
 arch/x86/include/asm/paravirt.h               |  10 +
 arch/x86/include/asm/paravirt_types.h         |   2 +
 arch/x86/include/uapi/asm/kvm_para.h          |   5 +
 arch/x86/kernel/kvm.c                         |  32 +
 arch/x86/kernel/paravirt.c                    |   1 +
 arch/x86/kvm/cpuid.c                          |   3 +-
 arch/x86/kvm/svm.c                            | 699 +++++++++++++++++-
 arch/x86/kvm/vmx/vmx.c                        |   1 +
 arch/x86/kvm/x86.c                            |  43 ++
 arch/x86/mm/mem_encrypt.c                     |  69 +-
 arch/x86/mm/pat/set_memory.c                  |   7 +
 include/linux/psp-sev.h                       |   8 +-
 include/uapi/linux/kvm.h                      |  53 ++
 include/uapi/linux/kvm_para.h                 |   1 +
 21 files changed, 1157 insertions(+), 10 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 107+ messages in thread

* [PATCH v6 01/14] KVM: SVM: Add KVM_SEV SEND_START command
  2020-03-30  6:19 [PATCH v6 00/14] Add AMD SEV guest live migration support Ashish Kalra
@ 2020-03-30  6:19 ` Ashish Kalra
  2020-04-02  6:27   ` Venu Busireddy
  2020-04-02 17:51   ` Krish Sadhukhan
  2020-03-30  6:20 ` [PATCH v6 02/14] KVM: SVM: Add KVM_SEND_UPDATE_DATA command Ashish Kalra
                   ` (13 subsequent siblings)
  14 siblings, 2 replies; 107+ messages in thread
From: Ashish Kalra @ 2020-03-30  6:19 UTC (permalink / raw)
  To: pbonzini
  Cc: tglx, mingo, hpa, joro, bp, thomas.lendacky, x86, kvm,
	linux-kernel, rientjes, srutherford, luto, brijesh.singh

From: Brijesh Singh <Brijesh.Singh@amd.com>

The command is used to create an outgoing SEV guest encryption context.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Steve Rutherford <srutherford@google.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
 .../virt/kvm/amd-memory-encryption.rst        |  27 ++++
 arch/x86/kvm/svm.c                            | 128 ++++++++++++++++++
 include/linux/psp-sev.h                       |   8 +-
 include/uapi/linux/kvm.h                      |  12 ++
 4 files changed, 171 insertions(+), 4 deletions(-)

diff --git a/Documentation/virt/kvm/amd-memory-encryption.rst b/Documentation/virt/kvm/amd-memory-encryption.rst
index c3129b9ba5cb..4fd34fc5c7a7 100644
--- a/Documentation/virt/kvm/amd-memory-encryption.rst
+++ b/Documentation/virt/kvm/amd-memory-encryption.rst
@@ -263,6 +263,33 @@ Returns: 0 on success, -negative on error
                 __u32 trans_len;
         };
 
+10. KVM_SEV_SEND_START
+----------------------
+
+The KVM_SEV_SEND_START command can be used by the hypervisor to create an
+outgoing guest encryption context.
+
+Parameters (in): struct kvm_sev_send_start
+
+Returns: 0 on success, -negative on error
+
+::
+        struct kvm_sev_send_start {
+                __u32 policy;                 /* guest policy */
+
+                __u64 pdh_cert_uaddr;         /* platform Diffie-Hellman certificate */
+                __u32 pdh_cert_len;
+
+                __u64 plat_certs_uaddr;       /* platform certificate chain */
+                __u32 plat_certs_len;
+
+                __u64 amd_certs_uaddr;        /* AMD certificate */
+                __u32 amd_certs_len;
+
+                __u64 session_uaddr;          /* Guest session information */
+                __u32 session_len;
+        };
+
 References
 ==========
 
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 50d1ebafe0b3..63d172e974ad 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -7149,6 +7149,131 @@ static int sev_launch_secret(struct kvm *kvm, struct kvm_sev_cmd *argp)
 	return ret;
 }
 
+/* Userspace wants to query session length. */
+static int
+__sev_send_start_query_session_length(struct kvm *kvm, struct kvm_sev_cmd *argp,
+				      struct kvm_sev_send_start *params)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+	struct sev_data_send_start *data;
+	int ret;
+
+	data = kzalloc(sizeof(*data), GFP_KERNEL_ACCOUNT);
+	if (data == NULL)
+		return -ENOMEM;
+
+	data->handle = sev->handle;
+	ret = sev_issue_cmd(kvm, SEV_CMD_SEND_START, data, &argp->error);
+
+	params->session_len = data->session_len;
+	if (copy_to_user((void __user *)(uintptr_t)argp->data, params,
+				sizeof(struct kvm_sev_send_start)))
+		ret = -EFAULT;
+
+	kfree(data);
+	return ret;
+}
+
+static int sev_send_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+	struct sev_data_send_start *data;
+	struct kvm_sev_send_start params;
+	void *amd_certs, *session_data;
+	void *pdh_cert, *plat_certs;
+	int ret;
+
+	if (!sev_guest(kvm))
+		return -ENOTTY;
+
+	if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data,
+				sizeof(struct kvm_sev_send_start)))
+		return -EFAULT;
+
+	/* if session_len is zero, userspace wants to query the session length */
+	if (!params.session_len)
+		return __sev_send_start_query_session_length(kvm, argp,
+				&params);
+
+	/* some sanity checks */
+	if (!params.pdh_cert_uaddr || !params.pdh_cert_len ||
+	    !params.session_uaddr || params.session_len > SEV_FW_BLOB_MAX_SIZE)
+		return -EINVAL;
+
+	/* allocate the memory to hold the session data blob */
+	session_data = kmalloc(params.session_len, GFP_KERNEL_ACCOUNT);
+	if (!session_data)
+		return -ENOMEM;
+
+	/* copy the certificate blobs from userspace */
+	pdh_cert = psp_copy_user_blob(params.pdh_cert_uaddr,
+				params.pdh_cert_len);
+	if (IS_ERR(pdh_cert)) {
+		ret = PTR_ERR(pdh_cert);
+		goto e_free_session;
+	}
+
+	plat_certs = psp_copy_user_blob(params.plat_certs_uaddr,
+				params.plat_certs_len);
+	if (IS_ERR(plat_certs)) {
+		ret = PTR_ERR(plat_certs);
+		goto e_free_pdh;
+	}
+
+	amd_certs = psp_copy_user_blob(params.amd_certs_uaddr,
+				params.amd_certs_len);
+	if (IS_ERR(amd_certs)) {
+		ret = PTR_ERR(amd_certs);
+		goto e_free_plat_cert;
+	}
+
+	data = kzalloc(sizeof(*data), GFP_KERNEL_ACCOUNT);
+	if (data == NULL) {
+		ret = -ENOMEM;
+		goto e_free_amd_cert;
+	}
+
+	/* populate the FW SEND_START field with system physical address */
+	data->pdh_cert_address = __psp_pa(pdh_cert);
+	data->pdh_cert_len = params.pdh_cert_len;
+	data->plat_certs_address = __psp_pa(plat_certs);
+	data->plat_certs_len = params.plat_certs_len;
+	data->amd_certs_address = __psp_pa(amd_certs);
+	data->amd_certs_len = params.amd_certs_len;
+	data->session_address = __psp_pa(session_data);
+	data->session_len = params.session_len;
+	data->handle = sev->handle;
+
+	ret = sev_issue_cmd(kvm, SEV_CMD_SEND_START, data, &argp->error);
+
+	if (ret)
+		goto e_free;
+
+	if (copy_to_user((void __user *)(uintptr_t) params.session_uaddr,
+			session_data, params.session_len)) {
+		ret = -EFAULT;
+		goto e_free;
+	}
+
+	params.policy = data->policy;
+	params.session_len = data->session_len;
+	if (copy_to_user((void __user *)(uintptr_t)argp->data, &params,
+				sizeof(struct kvm_sev_send_start)))
+		ret = -EFAULT;
+
+e_free:
+	kfree(data);
+e_free_amd_cert:
+	kfree(amd_certs);
+e_free_plat_cert:
+	kfree(plat_certs);
+e_free_pdh:
+	kfree(pdh_cert);
+e_free_session:
+	kfree(session_data);
+	return ret;
+}
+
 static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
 {
 	struct kvm_sev_cmd sev_cmd;
@@ -7193,6 +7318,9 @@ static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
 	case KVM_SEV_LAUNCH_SECRET:
 		r = sev_launch_secret(kvm, &sev_cmd);
 		break;
+	case KVM_SEV_SEND_START:
+		r = sev_send_start(kvm, &sev_cmd);
+		break;
 	default:
 		r = -EINVAL;
 		goto out;
diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
index 5167bf2bfc75..9f63b9d48b63 100644
--- a/include/linux/psp-sev.h
+++ b/include/linux/psp-sev.h
@@ -323,11 +323,11 @@ struct sev_data_send_start {
 	u64 pdh_cert_address;			/* In */
 	u32 pdh_cert_len;			/* In */
 	u32 reserved1;
-	u64 plat_cert_address;			/* In */
-	u32 plat_cert_len;			/* In */
+	u64 plat_certs_address;			/* In */
+	u32 plat_certs_len;			/* In */
 	u32 reserved2;
-	u64 amd_cert_address;			/* In */
-	u32 amd_cert_len;			/* In */
+	u64 amd_certs_address;			/* In */
+	u32 amd_certs_len;			/* In */
 	u32 reserved3;
 	u64 session_address;			/* In */
 	u32 session_len;			/* In/Out */
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 4b95f9a31a2f..17bef4c245e1 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1558,6 +1558,18 @@ struct kvm_sev_dbg {
 	__u32 len;
 };
 
+struct kvm_sev_send_start {
+	__u32 policy;
+	__u64 pdh_cert_uaddr;
+	__u32 pdh_cert_len;
+	__u64 plat_certs_uaddr;
+	__u32 plat_certs_len;
+	__u64 amd_certs_uaddr;
+	__u32 amd_certs_len;
+	__u64 session_uaddr;
+	__u32 session_len;
+};
+
 #define KVM_DEV_ASSIGN_ENABLE_IOMMU	(1 << 0)
 #define KVM_DEV_ASSIGN_PCI_2_3		(1 << 1)
 #define KVM_DEV_ASSIGN_MASK_INTX	(1 << 2)
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 107+ messages in thread

* [PATCH v6 02/14] KVM: SVM: Add KVM_SEND_UPDATE_DATA command
  2020-03-30  6:19 [PATCH v6 00/14] Add AMD SEV guest live migration support Ashish Kalra
  2020-03-30  6:19 ` [PATCH v6 01/14] KVM: SVM: Add KVM_SEV SEND_START command Ashish Kalra
@ 2020-03-30  6:20 ` Ashish Kalra
  2020-04-02 17:55   ` Venu Busireddy
  2020-04-02 20:13   ` Krish Sadhukhan
  2020-03-30  6:20 ` [PATCH v6 03/14] KVM: SVM: Add KVM_SEV_SEND_FINISH command Ashish Kalra
                   ` (12 subsequent siblings)
  14 siblings, 2 replies; 107+ messages in thread
From: Ashish Kalra @ 2020-03-30  6:20 UTC (permalink / raw)
  To: pbonzini
  Cc: tglx, mingo, hpa, joro, bp, thomas.lendacky, x86, kvm,
	linux-kernel, rientjes, srutherford, luto, brijesh.singh

From: Brijesh Singh <Brijesh.Singh@amd.com>

The command is used for encrypting the guest memory region using the encryption
context created with KVM_SEV_SEND_START.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Steve Rutherford <srutherford@google.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
 .../virt/kvm/amd-memory-encryption.rst        |  24 ++++
 arch/x86/kvm/svm.c                            | 136 +++++++++++++++++-
 include/uapi/linux/kvm.h                      |   9 ++
 3 files changed, 165 insertions(+), 4 deletions(-)

diff --git a/Documentation/virt/kvm/amd-memory-encryption.rst b/Documentation/virt/kvm/amd-memory-encryption.rst
index 4fd34fc5c7a7..f46817ef7019 100644
--- a/Documentation/virt/kvm/amd-memory-encryption.rst
+++ b/Documentation/virt/kvm/amd-memory-encryption.rst
@@ -290,6 +290,30 @@ Returns: 0 on success, -negative on error
                 __u32 session_len;
         };
 
+11. KVM_SEV_SEND_UPDATE_DATA
+----------------------------
+
+The KVM_SEV_SEND_UPDATE_DATA command can be used by the hypervisor to encrypt the
+outgoing guest memory region with the encryption context created using
+KVM_SEV_SEND_START.
+
+Parameters (in): struct kvm_sev_send_update_data
+
+Returns: 0 on success, -negative on error
+
+::
+
+        struct kvm_sev_send_update_data {
+                __u64 hdr_uaddr;        /* userspace address containing the packet header */
+                __u32 hdr_len;
+
+                __u64 guest_uaddr;      /* the source memory region to be encrypted */
+                __u32 guest_len;
+
+                __u64 trans_uaddr;      /* the destination memory region */
+                __u32 trans_len;
+        };
+
 References
 ==========
 
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 63d172e974ad..8561c47cc4f9 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -428,6 +428,7 @@ static DECLARE_RWSEM(sev_deactivate_lock);
 static DEFINE_MUTEX(sev_bitmap_lock);
 static unsigned int max_sev_asid;
 static unsigned int min_sev_asid;
+static unsigned long sev_me_mask;
 static unsigned long *sev_asid_bitmap;
 static unsigned long *sev_reclaim_asid_bitmap;
 #define __sme_page_pa(x) __sme_set(page_to_pfn(x) << PAGE_SHIFT)
@@ -1232,16 +1233,22 @@ static int avic_ga_log_notifier(u32 ga_tag)
 static __init int sev_hardware_setup(void)
 {
 	struct sev_user_data_status *status;
+	u32 eax, ebx;
 	int rc;
 
-	/* Maximum number of encrypted guests supported simultaneously */
-	max_sev_asid = cpuid_ecx(0x8000001F);
+	/*
+	 * Query the memory encryption information.
+	 *  EBX:  Bit 0:5 Pagetable bit position used to indicate encryption
+	 *  (aka Cbit).
+	 *  ECX:  Maximum number of encrypted guests supported simultaneously.
+	 *  EDX:  Minimum ASID value that should be used for SEV guest.
+	 */
+	cpuid(0x8000001f, &eax, &ebx, &max_sev_asid, &min_sev_asid);
 
 	if (!max_sev_asid)
 		return 1;
 
-	/* Minimum ASID value that should be used for SEV guest */
-	min_sev_asid = cpuid_edx(0x8000001F);
+	sev_me_mask = 1UL << (ebx & 0x3f);
 
 	/* Initialize SEV ASID bitmaps */
 	sev_asid_bitmap = bitmap_zalloc(max_sev_asid, GFP_KERNEL);
@@ -7274,6 +7281,124 @@ static int sev_send_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
 	return ret;
 }
 
+/* Userspace wants to query either header or trans length. */
+static int
+__sev_send_update_data_query_lengths(struct kvm *kvm, struct kvm_sev_cmd *argp,
+				     struct kvm_sev_send_update_data *params)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+	struct sev_data_send_update_data *data;
+	int ret;
+
+	data = kzalloc(sizeof(*data), GFP_KERNEL_ACCOUNT);
+	if (!data)
+		return -ENOMEM;
+
+	data->handle = sev->handle;
+	ret = sev_issue_cmd(kvm, SEV_CMD_SEND_UPDATE_DATA, data, &argp->error);
+
+	params->hdr_len = data->hdr_len;
+	params->trans_len = data->trans_len;
+
+	if (copy_to_user((void __user *)(uintptr_t)argp->data, params,
+			 sizeof(struct kvm_sev_send_update_data)))
+		ret = -EFAULT;
+
+	kfree(data);
+	return ret;
+}
+
+static int sev_send_update_data(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+	struct sev_data_send_update_data *data;
+	struct kvm_sev_send_update_data params;
+	void *hdr, *trans_data;
+	struct page **guest_page;
+	unsigned long n;
+	int ret, offset;
+
+	if (!sev_guest(kvm))
+		return -ENOTTY;
+
+	if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data,
+			sizeof(struct kvm_sev_send_update_data)))
+		return -EFAULT;
+
+	/* userspace wants to query either header or trans length */
+	if (!params.trans_len || !params.hdr_len)
+		return __sev_send_update_data_query_lengths(kvm, argp, &params);
+
+	if (!params.trans_uaddr || !params.guest_uaddr ||
+	    !params.guest_len || !params.hdr_uaddr)
+		return -EINVAL;
+
+
+	/* Check if we are crossing the page boundary */
+	offset = params.guest_uaddr & (PAGE_SIZE - 1);
+	if ((params.guest_len + offset > PAGE_SIZE))
+		return -EINVAL;
+
+	/* Pin guest memory */
+	guest_page = sev_pin_memory(kvm, params.guest_uaddr & PAGE_MASK,
+				    PAGE_SIZE, &n, 0);
+	if (!guest_page)
+		return -EFAULT;
+
+	/* allocate memory for header and transport buffer */
+	ret = -ENOMEM;
+	hdr = kmalloc(params.hdr_len, GFP_KERNEL_ACCOUNT);
+	if (!hdr)
+		goto e_unpin;
+
+	trans_data = kmalloc(params.trans_len, GFP_KERNEL_ACCOUNT);
+	if (!trans_data)
+		goto e_free_hdr;
+
+	data = kzalloc(sizeof(*data), GFP_KERNEL);
+	if (!data)
+		goto e_free_trans_data;
+
+	data->hdr_address = __psp_pa(hdr);
+	data->hdr_len = params.hdr_len;
+	data->trans_address = __psp_pa(trans_data);
+	data->trans_len = params.trans_len;
+
+	/* The SEND_UPDATE_DATA command requires C-bit to be always set. */
+	data->guest_address = (page_to_pfn(guest_page[0]) << PAGE_SHIFT) +
+				offset;
+	data->guest_address |= sev_me_mask;
+	data->guest_len = params.guest_len;
+	data->handle = sev->handle;
+
+	ret = sev_issue_cmd(kvm, SEV_CMD_SEND_UPDATE_DATA, data, &argp->error);
+
+	if (ret)
+		goto e_free;
+
+	/* copy transport buffer to user space */
+	if (copy_to_user((void __user *)(uintptr_t)params.trans_uaddr,
+			 trans_data, params.trans_len)) {
+		ret = -EFAULT;
+		goto e_free;
+	}
+
+	/* Copy packet header to userspace. */
+	if (copy_to_user((void __user *)(uintptr_t)params.hdr_uaddr, hdr,
+			 params.hdr_len))
+		ret = -EFAULT;
+
+e_free:
+	kfree(data);
+e_free_trans_data:
+	kfree(trans_data);
+e_free_hdr:
+	kfree(hdr);
+e_unpin:
+	sev_unpin_memory(kvm, guest_page, n);
+
+	return ret;
+}
+
 static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
 {
 	struct kvm_sev_cmd sev_cmd;
@@ -7321,6 +7446,9 @@ static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
 	case KVM_SEV_SEND_START:
 		r = sev_send_start(kvm, &sev_cmd);
 		break;
+	case KVM_SEV_SEND_UPDATE_DATA:
+		r = sev_send_update_data(kvm, &sev_cmd);
+		break;
 	default:
 		r = -EINVAL;
 		goto out;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 17bef4c245e1..d9dc81bb9c55 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1570,6 +1570,15 @@ struct kvm_sev_send_start {
 	__u32 session_len;
 };
 
+struct kvm_sev_send_update_data {
+	__u64 hdr_uaddr;
+	__u32 hdr_len;
+	__u64 guest_uaddr;
+	__u32 guest_len;
+	__u64 trans_uaddr;
+	__u32 trans_len;
+};
+
 #define KVM_DEV_ASSIGN_ENABLE_IOMMU	(1 << 0)
 #define KVM_DEV_ASSIGN_PCI_2_3		(1 << 1)
 #define KVM_DEV_ASSIGN_MASK_INTX	(1 << 2)
-- 
2.17.1



* [PATCH v6 03/14] KVM: SVM: Add KVM_SEV_SEND_FINISH command
  2020-03-30  6:19 [PATCH v6 00/14] Add AMD SEV guest live migration support Ashish Kalra
  2020-03-30  6:19 ` [PATCH v6 01/14] KVM: SVM: Add KVM_SEV SEND_START command Ashish Kalra
  2020-03-30  6:20 ` [PATCH v6 02/14] KVM: SVM: Add KVM_SEND_UPDATE_DATA command Ashish Kalra
@ 2020-03-30  6:20 ` Ashish Kalra
  2020-04-02 18:17   ` Venu Busireddy
  2020-04-02 20:15   ` Krish Sadhukhan
  2020-03-30  6:21 ` [PATCH v6 04/14] KVM: SVM: Add support for KVM_SEV_RECEIVE_START command Ashish Kalra
                   ` (11 subsequent siblings)
  14 siblings, 2 replies; 107+ messages in thread
From: Ashish Kalra @ 2020-03-30  6:20 UTC (permalink / raw)
  To: pbonzini
  Cc: tglx, mingo, hpa, joro, bp, thomas.lendacky, x86, kvm,
	linux-kernel, rientjes, srutherford, luto, brijesh.singh

From: Brijesh Singh <Brijesh.Singh@amd.com>

The command is used to finalize the encryption context created with the
KVM_SEV_SEND_START command.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Steve Rutherford <srutherford@google.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
 .../virt/kvm/amd-memory-encryption.rst        |  8 +++++++
 arch/x86/kvm/svm.c                            | 23 +++++++++++++++++++
 2 files changed, 31 insertions(+)

diff --git a/Documentation/virt/kvm/amd-memory-encryption.rst b/Documentation/virt/kvm/amd-memory-encryption.rst
index f46817ef7019..a45dcb5f8687 100644
--- a/Documentation/virt/kvm/amd-memory-encryption.rst
+++ b/Documentation/virt/kvm/amd-memory-encryption.rst
@@ -314,6 +314,14 @@ Returns: 0 on success, -negative on error
                 __u32 trans_len;
         };
 
+12. KVM_SEV_SEND_FINISH
+------------------------
+
+After completion of the migration flow, the KVM_SEV_SEND_FINISH command can be
+issued by the hypervisor to delete the encryption context.
+
+Returns: 0 on success, -negative on error
+
 References
 ==========
 
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 8561c47cc4f9..71a4cb3b817d 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -7399,6 +7399,26 @@ static int sev_send_update_data(struct kvm *kvm, struct kvm_sev_cmd *argp)
 	return ret;
 }
 
+static int sev_send_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+	struct sev_data_send_finish *data;
+	int ret;
+
+	if (!sev_guest(kvm))
+		return -ENOTTY;
+
+	data = kzalloc(sizeof(*data), GFP_KERNEL);
+	if (!data)
+		return -ENOMEM;
+
+	data->handle = sev->handle;
+	ret = sev_issue_cmd(kvm, SEV_CMD_SEND_FINISH, data, &argp->error);
+
+	kfree(data);
+	return ret;
+}
+
 static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
 {
 	struct kvm_sev_cmd sev_cmd;
@@ -7449,6 +7469,9 @@ static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
 	case KVM_SEV_SEND_UPDATE_DATA:
 		r = sev_send_update_data(kvm, &sev_cmd);
 		break;
+	case KVM_SEV_SEND_FINISH:
+		r = sev_send_finish(kvm, &sev_cmd);
+		break;
 	default:
 		r = -EINVAL;
 		goto out;
-- 
2.17.1



* [PATCH v6 04/14] KVM: SVM: Add support for KVM_SEV_RECEIVE_START command
  2020-03-30  6:19 [PATCH v6 00/14] Add AMD SEV guest live migration support Ashish Kalra
                   ` (2 preceding siblings ...)
  2020-03-30  6:20 ` [PATCH v6 03/14] KVM: SVM: Add KVM_SEV_SEND_FINISH command Ashish Kalra
@ 2020-03-30  6:21 ` Ashish Kalra
  2020-04-02 21:35   ` Venu Busireddy
  2020-04-02 22:09   ` Krish Sadhukhan
  2020-03-30  6:21 ` [PATCH v6 05/14] KVM: SVM: Add KVM_SEV_RECEIVE_UPDATE_DATA command Ashish Kalra
                   ` (10 subsequent siblings)
  14 siblings, 2 replies; 107+ messages in thread
From: Ashish Kalra @ 2020-03-30  6:21 UTC (permalink / raw)
  To: pbonzini
  Cc: tglx, mingo, hpa, joro, bp, thomas.lendacky, x86, kvm,
	linux-kernel, rientjes, srutherford, luto, brijesh.singh

From: Brijesh Singh <Brijesh.Singh@amd.com>

The command is used to create the encryption context for an incoming
SEV guest. The encryption context can be later used by the hypervisor
to import the incoming data into the SEV guest memory space.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Steve Rutherford <srutherford@google.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
 .../virt/kvm/amd-memory-encryption.rst        | 29 +++++++
 arch/x86/kvm/svm.c                            | 81 +++++++++++++++++++
 include/uapi/linux/kvm.h                      |  9 +++
 3 files changed, 119 insertions(+)

diff --git a/Documentation/virt/kvm/amd-memory-encryption.rst b/Documentation/virt/kvm/amd-memory-encryption.rst
index a45dcb5f8687..ef1f1f3a5b40 100644
--- a/Documentation/virt/kvm/amd-memory-encryption.rst
+++ b/Documentation/virt/kvm/amd-memory-encryption.rst
@@ -322,6 +322,35 @@ issued by the hypervisor to delete the encryption context.
 
 Returns: 0 on success, -negative on error
 
+13. KVM_SEV_RECEIVE_START
+-------------------------
+
+The KVM_SEV_RECEIVE_START command is used for creating the memory encryption
+context for an incoming SEV guest. To create the encryption context, the user must
+provide a guest policy, the platform public Diffie-Hellman (PDH) key and session
+information.
+
+Parameters: struct  kvm_sev_receive_start (in/out)
+
+Returns: 0 on success, -negative on error
+
+::
+
+        struct kvm_sev_receive_start {
+                __u32 handle;           /* if zero then firmware creates a new handle */
+                __u32 policy;           /* guest's policy */
+
+                __u64 pdh_uaddr;         /* userspace address pointing to the PDH key */
+                __u32 pdh_len;
+
+                __u64 session_uaddr;    /* userspace address which points to the guest session information */
+                __u32 session_len;
+        };
+
+On success, the 'handle' field contains the handle of the newly created encryption context.
+
+For more details, see SEV spec Section 6.12.
+
 References
 ==========
 
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 71a4cb3b817d..038b47685733 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -7419,6 +7419,84 @@ static int sev_send_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
 	return ret;
 }
 
+static int sev_receive_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+	struct sev_data_receive_start *start;
+	struct kvm_sev_receive_start params;
+	int *error = &argp->error;
+	void *session_data;
+	void *pdh_data;
+	int ret;
+
+	if (!sev_guest(kvm))
+		return -ENOTTY;
+
+	/* Get parameter from the userspace */
+	if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data,
+			sizeof(struct kvm_sev_receive_start)))
+		return -EFAULT;
+
+	/* some sanity checks */
+	if (!params.pdh_uaddr || !params.pdh_len ||
+	    !params.session_uaddr || !params.session_len)
+		return -EINVAL;
+
+	pdh_data = psp_copy_user_blob(params.pdh_uaddr, params.pdh_len);
+	if (IS_ERR(pdh_data))
+		return PTR_ERR(pdh_data);
+
+	session_data = psp_copy_user_blob(params.session_uaddr,
+			params.session_len);
+	if (IS_ERR(session_data)) {
+		ret = PTR_ERR(session_data);
+		goto e_free_pdh;
+	}
+
+	ret = -ENOMEM;
+	start = kzalloc(sizeof(*start), GFP_KERNEL);
+	if (!start)
+		goto e_free_session;
+
+	start->handle = params.handle;
+	start->policy = params.policy;
+	start->pdh_cert_address = __psp_pa(pdh_data);
+	start->pdh_cert_len = params.pdh_len;
+	start->session_address = __psp_pa(session_data);
+	start->session_len = params.session_len;
+
+	/* create memory encryption context */
+	ret = __sev_issue_cmd(argp->sev_fd, SEV_CMD_RECEIVE_START, start,
+				error);
+	if (ret)
+		goto e_free;
+
+	/* Bind ASID to this guest */
+	ret = sev_bind_asid(kvm, start->handle, error);
+	if (ret)
+		goto e_free;
+
+	params.handle = start->handle;
+	if (copy_to_user((void __user *)(uintptr_t)argp->data,
+			 &params, sizeof(struct kvm_sev_receive_start))) {
+		ret = -EFAULT;
+		sev_unbind_asid(kvm, start->handle);
+		goto e_free;
+	}
+
+	sev->handle = start->handle;
+	sev->fd = argp->sev_fd;
+
+e_free:
+	kfree(start);
+e_free_session:
+	kfree(session_data);
+e_free_pdh:
+	kfree(pdh_data);
+
+	return ret;
+}
+
 static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
 {
 	struct kvm_sev_cmd sev_cmd;
@@ -7472,6 +7550,9 @@ static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
 	case KVM_SEV_SEND_FINISH:
 		r = sev_send_finish(kvm, &sev_cmd);
 		break;
+	case KVM_SEV_RECEIVE_START:
+		r = sev_receive_start(kvm, &sev_cmd);
+		break;
 	default:
 		r = -EINVAL;
 		goto out;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index d9dc81bb9c55..74764b9db5fa 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1579,6 +1579,15 @@ struct kvm_sev_send_update_data {
 	__u32 trans_len;
 };
 
+struct kvm_sev_receive_start {
+	__u32 handle;
+	__u32 policy;
+	__u64 pdh_uaddr;
+	__u32 pdh_len;
+	__u64 session_uaddr;
+	__u32 session_len;
+};
+
 #define KVM_DEV_ASSIGN_ENABLE_IOMMU	(1 << 0)
 #define KVM_DEV_ASSIGN_PCI_2_3		(1 << 1)
 #define KVM_DEV_ASSIGN_MASK_INTX	(1 << 2)
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 107+ messages in thread

* [PATCH v6 05/14] KVM: SVM: Add KVM_SEV_RECEIVE_UPDATE_DATA command
  2020-03-30  6:19 [PATCH v6 00/14] Add AMD SEV guest live migration support Ashish Kalra
                   ` (3 preceding siblings ...)
  2020-03-30  6:21 ` [PATCH v6 04/14] KVM: SVM: Add support for KVM_SEV_RECEIVE_START command Ashish Kalra
@ 2020-03-30  6:21 ` Ashish Kalra
  2020-04-02 22:25   ` Krish Sadhukhan
  2020-04-02 22:29   ` Venu Busireddy
  2020-03-30  6:21 ` [PATCH v6 06/14] KVM: SVM: Add KVM_SEV_RECEIVE_FINISH command Ashish Kalra
                   ` (9 subsequent siblings)
  14 siblings, 2 replies; 107+ messages in thread
From: Ashish Kalra @ 2020-03-30  6:21 UTC (permalink / raw)
  To: pbonzini
  Cc: tglx, mingo, hpa, joro, bp, thomas.lendacky, x86, kvm,
	linux-kernel, rientjes, srutherford, luto, brijesh.singh

From: Brijesh Singh <Brijesh.Singh@amd.com>

The command is used for copying the incoming buffer into the
SEV guest memory space.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
 .../virt/kvm/amd-memory-encryption.rst        | 24 ++++++
 arch/x86/kvm/svm.c                            | 79 +++++++++++++++++++
 include/uapi/linux/kvm.h                      |  9 +++
 3 files changed, 112 insertions(+)

diff --git a/Documentation/virt/kvm/amd-memory-encryption.rst b/Documentation/virt/kvm/amd-memory-encryption.rst
index ef1f1f3a5b40..554aa33a99cc 100644
--- a/Documentation/virt/kvm/amd-memory-encryption.rst
+++ b/Documentation/virt/kvm/amd-memory-encryption.rst
@@ -351,6 +351,30 @@ On success, the 'handle' field contains a new handle and on error, a negative va
 
 For more details, see SEV spec Section 6.12.
 
+14. KVM_SEV_RECEIVE_UPDATE_DATA
+-------------------------------
+
+The KVM_SEV_RECEIVE_UPDATE_DATA command can be used by the hypervisor to copy
+the incoming buffers into the guest memory regions, using the encryption
+context created during KVM_SEV_RECEIVE_START.
+
+Parameters (in): struct kvm_sev_receive_update_data
+
+Returns: 0 on success, -negative on error
+
+::
+
+        struct kvm_sev_receive_update_data {
+                __u64 hdr_uaddr;        /* userspace address containing the packet header */
+                __u32 hdr_len;
+
+                __u64 guest_uaddr;      /* the destination guest memory region */
+                __u32 guest_len;
+
+                __u64 trans_uaddr;      /* the incoming buffer memory region  */
+                __u32 trans_len;
+        };
+
 References
 ==========
 
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 038b47685733..5fc5355536d7 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -7497,6 +7497,82 @@ static int sev_receive_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
 	return ret;
 }
 
+static int sev_receive_update_data(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+	struct kvm_sev_receive_update_data params;
+	struct sev_data_receive_update_data *data;
+	void *hdr = NULL, *trans = NULL;
+	struct page **guest_page;
+	unsigned long n;
+	int ret, offset;
+
+	if (!sev_guest(kvm))
+		return -EINVAL;
+
+	if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data,
+			sizeof(struct kvm_sev_receive_update_data)))
+		return -EFAULT;
+
+	if (!params.hdr_uaddr || !params.hdr_len ||
+	    !params.guest_uaddr || !params.guest_len ||
+	    !params.trans_uaddr || !params.trans_len)
+		return -EINVAL;
+
+	/* Check if we are crossing the page boundary */
+	offset = params.guest_uaddr & (PAGE_SIZE - 1);
+	if ((params.guest_len + offset > PAGE_SIZE))
+		return -EINVAL;
+
+	hdr = psp_copy_user_blob(params.hdr_uaddr, params.hdr_len);
+	if (IS_ERR(hdr))
+		return PTR_ERR(hdr);
+
+	trans = psp_copy_user_blob(params.trans_uaddr, params.trans_len);
+	if (IS_ERR(trans)) {
+		ret = PTR_ERR(trans);
+		goto e_free_hdr;
+	}
+
+	ret = -ENOMEM;
+	data = kzalloc(sizeof(*data), GFP_KERNEL);
+	if (!data)
+		goto e_free_trans;
+
+	data->hdr_address = __psp_pa(hdr);
+	data->hdr_len = params.hdr_len;
+	data->trans_address = __psp_pa(trans);
+	data->trans_len = params.trans_len;
+
+	/* Pin guest memory */
+	ret = -EFAULT;
+	guest_page = sev_pin_memory(kvm, params.guest_uaddr & PAGE_MASK,
+				    PAGE_SIZE, &n, 0);
+	if (!guest_page)
+		goto e_free;
+
+	/* The RECEIVE_UPDATE_DATA command requires the C-bit to always be set. */
+	data->guest_address = (page_to_pfn(guest_page[0]) << PAGE_SHIFT) +
+				offset;
+	data->guest_address |= sev_me_mask;
+	data->guest_len = params.guest_len;
+	data->handle = sev->handle;
+
+	ret = sev_issue_cmd(kvm, SEV_CMD_RECEIVE_UPDATE_DATA, data,
+				&argp->error);
+
+	sev_unpin_memory(kvm, guest_page, n);
+
+e_free:
+	kfree(data);
+e_free_trans:
+	kfree(trans);
+e_free_hdr:
+	kfree(hdr);
+
+	return ret;
+}
+
 static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
 {
 	struct kvm_sev_cmd sev_cmd;
@@ -7553,6 +7629,9 @@ static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
 	case KVM_SEV_RECEIVE_START:
 		r = sev_receive_start(kvm, &sev_cmd);
 		break;
+	case KVM_SEV_RECEIVE_UPDATE_DATA:
+		r = sev_receive_update_data(kvm, &sev_cmd);
+		break;
 	default:
 		r = -EINVAL;
 		goto out;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 74764b9db5fa..4e80c57a3182 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1588,6 +1588,15 @@ struct kvm_sev_receive_start {
 	__u32 session_len;
 };
 
+struct kvm_sev_receive_update_data {
+	__u64 hdr_uaddr;
+	__u32 hdr_len;
+	__u64 guest_uaddr;
+	__u32 guest_len;
+	__u64 trans_uaddr;
+	__u32 trans_len;
+};
+
 #define KVM_DEV_ASSIGN_ENABLE_IOMMU	(1 << 0)
 #define KVM_DEV_ASSIGN_PCI_2_3		(1 << 1)
 #define KVM_DEV_ASSIGN_MASK_INTX	(1 << 2)
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 107+ messages in thread

* [PATCH v6 06/14] KVM: SVM: Add KVM_SEV_RECEIVE_FINISH command
  2020-03-30  6:19 [PATCH v6 00/14] Add AMD SEV guest live migration support Ashish Kalra
                   ` (4 preceding siblings ...)
  2020-03-30  6:21 ` [PATCH v6 05/14] KVM: SVM: Add KVM_SEV_RECEIVE_UPDATE_DATA command Ashish Kalra
@ 2020-03-30  6:21 ` Ashish Kalra
  2020-04-02 22:24   ` Venu Busireddy
  2020-04-02 22:27   ` Krish Sadhukhan
  2020-03-30  6:21 ` [PATCH v6 07/14] KVM: x86: Add AMD SEV specific Hypercall3 Ashish Kalra
                   ` (8 subsequent siblings)
  14 siblings, 2 replies; 107+ messages in thread
From: Ashish Kalra @ 2020-03-30  6:21 UTC (permalink / raw)
  To: pbonzini
  Cc: tglx, mingo, hpa, joro, bp, thomas.lendacky, x86, kvm,
	linux-kernel, rientjes, srutherford, luto, brijesh.singh

From: Brijesh Singh <Brijesh.Singh@amd.com>

The command finalizes the guest receiving process and makes the SEV guest
ready for execution.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
 .../virt/kvm/amd-memory-encryption.rst        |  8 +++++++
 arch/x86/kvm/svm.c                            | 23 +++++++++++++++++++
 2 files changed, 31 insertions(+)

diff --git a/Documentation/virt/kvm/amd-memory-encryption.rst b/Documentation/virt/kvm/amd-memory-encryption.rst
index 554aa33a99cc..93cd95d9a6c0 100644
--- a/Documentation/virt/kvm/amd-memory-encryption.rst
+++ b/Documentation/virt/kvm/amd-memory-encryption.rst
@@ -375,6 +375,14 @@ Returns: 0 on success, -negative on error
                 __u32 trans_len;
         };
 
+15. KVM_SEV_RECEIVE_FINISH
+--------------------------
+
+After completion of the migration flow, the KVM_SEV_RECEIVE_FINISH command can be
+issued by the hypervisor to make the guest ready for execution.
+
+Returns: 0 on success, -negative on error
+
 References
 ==========
 
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 5fc5355536d7..7c2721e18b06 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -7573,6 +7573,26 @@ static int sev_receive_update_data(struct kvm *kvm, struct kvm_sev_cmd *argp)
 	return ret;
 }
 
+static int sev_receive_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+	struct sev_data_receive_finish *data;
+	int ret;
+
+	if (!sev_guest(kvm))
+		return -ENOTTY;
+
+	data = kzalloc(sizeof(*data), GFP_KERNEL);
+	if (!data)
+		return -ENOMEM;
+
+	data->handle = sev->handle;
+	ret = sev_issue_cmd(kvm, SEV_CMD_RECEIVE_FINISH, data, &argp->error);
+
+	kfree(data);
+	return ret;
+}
+
 static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
 {
 	struct kvm_sev_cmd sev_cmd;
@@ -7632,6 +7652,9 @@ static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
 	case KVM_SEV_RECEIVE_UPDATE_DATA:
 		r = sev_receive_update_data(kvm, &sev_cmd);
 		break;
+	case KVM_SEV_RECEIVE_FINISH:
+		r = sev_receive_finish(kvm, &sev_cmd);
+		break;
 	default:
 		r = -EINVAL;
 		goto out;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 107+ messages in thread

* [PATCH v6 07/14] KVM: x86: Add AMD SEV specific Hypercall3
  2020-03-30  6:19 [PATCH v6 00/14] Add AMD SEV guest live migration support Ashish Kalra
                   ` (5 preceding siblings ...)
  2020-03-30  6:21 ` [PATCH v6 06/14] KVM: SVM: Add KVM_SEV_RECEIVE_FINISH command Ashish Kalra
@ 2020-03-30  6:21 ` Ashish Kalra
  2020-04-02 22:36   ` Venu Busireddy
  2020-04-02 23:54   ` Krish Sadhukhan
  2020-03-30  6:22 ` [PATCH v6 08/14] KVM: X86: Introduce KVM_HC_PAGE_ENC_STATUS hypercall Ashish Kalra
                   ` (7 subsequent siblings)
  14 siblings, 2 replies; 107+ messages in thread
From: Ashish Kalra @ 2020-03-30  6:21 UTC (permalink / raw)
  To: pbonzini
  Cc: tglx, mingo, hpa, joro, bp, thomas.lendacky, x86, kvm,
	linux-kernel, rientjes, srutherford, luto, brijesh.singh

From: Brijesh Singh <Brijesh.Singh@amd.com>

The KVM hypercall framework relies on the alternatives framework to patch
VMCALL -> VMMCALL on AMD platforms. If a hypercall is made before
apply_alternatives() runs, it defaults to VMCALL. That works fine for a
non-SEV guest: the VMCALL causes a #UD, and the hypervisor is able to
decode the instruction and do the right thing. But when SEV is active,
guest memory is encrypted with the guest key, so the hypervisor cannot
decode the instruction bytes.

Add an SEV-specific hypercall3 that unconditionally uses VMMCALL. The
hypercall will be used by SEV guests to notify the hypervisor about
encrypted pages.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
 arch/x86/include/asm/kvm_para.h | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 9b4df6eaa11a..6c09255633a4 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -84,6 +84,18 @@ static inline long kvm_hypercall4(unsigned int nr, unsigned long p1,
 	return ret;
 }
 
+static inline long kvm_sev_hypercall3(unsigned int nr, unsigned long p1,
+				      unsigned long p2, unsigned long p3)
+{
+	long ret;
+
+	asm volatile("vmmcall"
+		     : "=a"(ret)
+		     : "a"(nr), "b"(p1), "c"(p2), "d"(p3)
+		     : "memory");
+	return ret;
+}
+
 #ifdef CONFIG_KVM_GUEST
 bool kvm_para_available(void);
 unsigned int kvm_arch_para_features(void);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 107+ messages in thread

* [PATCH v6 08/14] KVM: X86: Introduce KVM_HC_PAGE_ENC_STATUS hypercall
  2020-03-30  6:19 [PATCH v6 00/14] Add AMD SEV guest live migration support Ashish Kalra
                   ` (6 preceding siblings ...)
  2020-03-30  6:21 ` [PATCH v6 07/14] KVM: x86: Add AMD SEV specific Hypercall3 Ashish Kalra
@ 2020-03-30  6:22 ` Ashish Kalra
  2020-04-03  0:00   ` Venu Busireddy
                     ` (2 more replies)
  2020-03-30  6:22 ` [PATCH v6 09/14] KVM: x86: Introduce KVM_GET_PAGE_ENC_BITMAP ioctl Ashish Kalra
                   ` (6 subsequent siblings)
  14 siblings, 3 replies; 107+ messages in thread
From: Ashish Kalra @ 2020-03-30  6:22 UTC (permalink / raw)
  To: pbonzini
  Cc: tglx, mingo, hpa, joro, bp, thomas.lendacky, x86, kvm,
	linux-kernel, rientjes, srutherford, luto, brijesh.singh

From: Brijesh Singh <Brijesh.Singh@amd.com>

This hypercall is used by the SEV guest to notify the hypervisor of a
change in the page encryption status. The hypercall should be invoked
only when the encryption attribute is changed from encrypted -> decrypted
and vice versa. By default all guest pages are considered encrypted.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
 Documentation/virt/kvm/hypercalls.rst | 15 +++++
 arch/x86/include/asm/kvm_host.h       |  2 +
 arch/x86/kvm/svm.c                    | 95 +++++++++++++++++++++++++++
 arch/x86/kvm/vmx/vmx.c                |  1 +
 arch/x86/kvm/x86.c                    |  6 ++
 include/uapi/linux/kvm_para.h         |  1 +
 6 files changed, 120 insertions(+)

diff --git a/Documentation/virt/kvm/hypercalls.rst b/Documentation/virt/kvm/hypercalls.rst
index dbaf207e560d..ff5287e68e81 100644
--- a/Documentation/virt/kvm/hypercalls.rst
+++ b/Documentation/virt/kvm/hypercalls.rst
@@ -169,3 +169,18 @@ a0: destination APIC ID
 
 :Usage example: When sending a call-function IPI-many to vCPUs, yield if
 	        any of the IPI target vCPUs was preempted.
+
+
+8. KVM_HC_PAGE_ENC_STATUS
+-------------------------
+:Architecture: x86
+:Status: active
+:Purpose: Notify the hypervisor of encryption status changes for guest pages (SEV guest)
+
+a0: the guest physical address of the start page
+a1: the number of pages
+a2: encryption attribute
+
+   Where:
+	* 1: Encryption attribute is set
+	* 0: Encryption attribute is cleared
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 98959e8cd448..90718fa3db47 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1267,6 +1267,8 @@ struct kvm_x86_ops {
 
 	bool (*apic_init_signal_blocked)(struct kvm_vcpu *vcpu);
 	int (*enable_direct_tlbflush)(struct kvm_vcpu *vcpu);
+	int (*page_enc_status_hc)(struct kvm *kvm, unsigned long gpa,
+				  unsigned long sz, unsigned long mode);
 };
 
 struct kvm_arch_async_pf {
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 7c2721e18b06..1d8beaf1bceb 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -136,6 +136,8 @@ struct kvm_sev_info {
 	int fd;			/* SEV device fd */
 	unsigned long pages_locked; /* Number of pages locked */
 	struct list_head regions_list;  /* List of registered regions */
+	unsigned long *page_enc_bmap;
+	unsigned long page_enc_bmap_size;
 };
 
 struct kvm_svm {
@@ -1991,6 +1993,9 @@ static void sev_vm_destroy(struct kvm *kvm)
 
 	sev_unbind_asid(kvm, sev->handle);
 	sev_asid_free(sev->asid);
+
+	kvfree(sev->page_enc_bmap);
+	sev->page_enc_bmap = NULL;
 }
 
 static void avic_vm_destroy(struct kvm *kvm)
@@ -7593,6 +7598,94 @@ static int sev_receive_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
 	return ret;
 }
 
+static int sev_resize_page_enc_bitmap(struct kvm *kvm, unsigned long new_size)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+	unsigned long *map;
+	unsigned long sz;
+
+	if (sev->page_enc_bmap_size >= new_size)
+		return 0;
+
+	sz = ALIGN(new_size, BITS_PER_LONG) / 8;
+
+	map = vmalloc(sz);
+	if (!map) {
+		pr_err_once("Failed to allocate encrypted bitmap size %lx\n",
+				sz);
+		return -ENOMEM;
+	}
+
+	/* mark all pages encrypted (by default) */
+	memset(map, 0xff, sz);
+
+	bitmap_copy(map, sev->page_enc_bmap, sev->page_enc_bmap_size);
+	kvfree(sev->page_enc_bmap);
+
+	sev->page_enc_bmap = map;
+	sev->page_enc_bmap_size = new_size;
+
+	return 0;
+}
+
+static int svm_page_enc_status_hc(struct kvm *kvm, unsigned long gpa,
+				  unsigned long npages, unsigned long enc)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+	kvm_pfn_t pfn_start, pfn_end;
+	gfn_t gfn_start, gfn_end;
+	int ret;
+
+	if (!sev_guest(kvm))
+		return -EINVAL;
+
+	if (!npages)
+		return 0;
+
+	gfn_start = gpa_to_gfn(gpa);
+	gfn_end = gfn_start + npages;
+
+	/* reject an empty or wrapping range */
+	if (gfn_end <= gfn_start)
+		return -EINVAL;
+
+	/* let's make sure that the gpa range exists in a memslot */
+	pfn_start = gfn_to_pfn(kvm, gfn_start);
+	pfn_end = gfn_to_pfn(kvm, gfn_end);
+
+	if (is_error_noslot_pfn(pfn_start) && !is_noslot_pfn(pfn_start)) {
+		/*
+		 * Fail only on a genuine translation error; pure noslot
+		 * pfns pass so guest MMIO ranges can be added to the bitmap.
+		 */
+		return -EINVAL;
+	}
+
+	if (is_error_noslot_pfn(pfn_end) && !is_noslot_pfn(pfn_end)) {
+		/*
+		 * Fail only on a genuine translation error; pure noslot
+		 * pfns pass so guest MMIO ranges can be added to the bitmap.
+		 */
+		return -EINVAL;
+	}
+
+	mutex_lock(&kvm->lock);
+	ret = sev_resize_page_enc_bitmap(kvm, gfn_end);
+	if (ret)
+		goto unlock;
+
+	if (enc)
+		__bitmap_set(sev->page_enc_bmap, gfn_start,
+				gfn_end - gfn_start);
+	else
+		__bitmap_clear(sev->page_enc_bmap, gfn_start,
+				gfn_end - gfn_start);
+
+unlock:
+	mutex_unlock(&kvm->lock);
+	return ret;
+}
+
 static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
 {
 	struct kvm_sev_cmd sev_cmd;
@@ -7995,6 +8088,8 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
 	.need_emulation_on_page_fault = svm_need_emulation_on_page_fault,
 
 	.apic_init_signal_blocked = svm_apic_init_signal_blocked,
+
+	.page_enc_status_hc = svm_page_enc_status_hc,
 };
 
 static int __init svm_init(void)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 079d9fbf278e..f68e76ee7f9c 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -8001,6 +8001,7 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
 	.nested_get_evmcs_version = NULL,
 	.need_emulation_on_page_fault = vmx_need_emulation_on_page_fault,
 	.apic_init_signal_blocked = vmx_apic_init_signal_blocked,
+	.page_enc_status_hc = NULL,
 };
 
 static void vmx_cleanup_l1d_flush(void)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index cf95c36cb4f4..68428eef2dde 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7564,6 +7564,12 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
 		kvm_sched_yield(vcpu->kvm, a0);
 		ret = 0;
 		break;
+	case KVM_HC_PAGE_ENC_STATUS:
+		ret = -KVM_ENOSYS;
+		if (kvm_x86_ops->page_enc_status_hc)
+			ret = kvm_x86_ops->page_enc_status_hc(vcpu->kvm,
+					a0, a1, a2);
+		break;
 	default:
 		ret = -KVM_ENOSYS;
 		break;
diff --git a/include/uapi/linux/kvm_para.h b/include/uapi/linux/kvm_para.h
index 8b86609849b9..847b83b75dc8 100644
--- a/include/uapi/linux/kvm_para.h
+++ b/include/uapi/linux/kvm_para.h
@@ -29,6 +29,7 @@
 #define KVM_HC_CLOCK_PAIRING		9
 #define KVM_HC_SEND_IPI		10
 #define KVM_HC_SCHED_YIELD		11
+#define KVM_HC_PAGE_ENC_STATUS		12
 
 /*
  * hypercalls use architecture specific
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 107+ messages in thread

* [PATCH v6 09/14] KVM: x86: Introduce KVM_GET_PAGE_ENC_BITMAP ioctl
  2020-03-30  6:19 [PATCH v6 00/14] Add AMD SEV guest live migration support Ashish Kalra
                   ` (7 preceding siblings ...)
  2020-03-30  6:22 ` [PATCH v6 08/14] KVM: X86: Introduce KVM_HC_PAGE_ENC_STATUS hypercall Ashish Kalra
@ 2020-03-30  6:22 ` Ashish Kalra
  2020-04-03 18:30   ` Venu Busireddy
  2020-04-03 20:18   ` Krish Sadhukhan
  2020-03-30  6:22 ` [PATCH v6 10/14] mm: x86: Invoke hypercall when page encryption status is changed Ashish Kalra
                   ` (5 subsequent siblings)
  14 siblings, 2 replies; 107+ messages in thread
From: Ashish Kalra @ 2020-03-30  6:22 UTC (permalink / raw)
  To: pbonzini
  Cc: tglx, mingo, hpa, joro, bp, thomas.lendacky, x86, kvm,
	linux-kernel, rientjes, srutherford, luto, brijesh.singh

From: Brijesh Singh <Brijesh.Singh@amd.com>

The ioctl can be used to retrieve page encryption bitmap for a given
gfn range.

Return the bitmap covering exactly the number of pages requested by the
user. Ensure that we copy only enough bytes to hold bmap->num_pages bits
into the userspace buffer; if bmap->num_pages is not byte aligned, read
the trailing bits of the last byte from userspace and copy them back as is.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
 Documentation/virt/kvm/api.rst  | 27 +++++++++++++
 arch/x86/include/asm/kvm_host.h |  2 +
 arch/x86/kvm/svm.c              | 71 +++++++++++++++++++++++++++++++++
 arch/x86/kvm/x86.c              | 12 ++++++
 include/uapi/linux/kvm.h        | 12 ++++++
 5 files changed, 124 insertions(+)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index ebd383fba939..8ad800ebb54f 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -4648,6 +4648,33 @@ This ioctl resets VCPU registers and control structures according to
 the clear cpu reset definition in the POP. However, the cpu is not put
 into ESA mode. This reset is a superset of the initial reset.
 
+4.125 KVM_GET_PAGE_ENC_BITMAP (vm ioctl)
+----------------------------------------
+
+:Capability: basic
+:Architectures: x86
+:Type: vm ioctl
+:Parameters: struct kvm_page_enc_bitmap (in/out)
+:Returns: 0 on success, -1 on error
+
+/* for KVM_GET_PAGE_ENC_BITMAP */
+struct kvm_page_enc_bitmap {
+	__u64 start_gfn;
+	__u64 num_pages;
+	union {
+		void __user *enc_bitmap; /* one bit per page */
+		__u64 padding2;
+	};
+};
+
+Encrypted VMs have the concept of private and shared pages. A private
+page is encrypted with the guest-specific key, while a shared page may
+be encrypted with the hypervisor key. KVM_GET_PAGE_ENC_BITMAP can
+be used to get a bitmap indicating whether a guest page is private
+or shared. The bitmap can be consulted during guest migration: if a page
+is private, userspace needs to use the SEV migration commands to
+transmit the page.
+
 
 5. The kvm_run structure
 ========================
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 90718fa3db47..27e43e3ec9d8 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1269,6 +1269,8 @@ struct kvm_x86_ops {
 	int (*enable_direct_tlbflush)(struct kvm_vcpu *vcpu);
 	int (*page_enc_status_hc)(struct kvm *kvm, unsigned long gpa,
 				  unsigned long sz, unsigned long mode);
+	int (*get_page_enc_bitmap)(struct kvm *kvm,
+				struct kvm_page_enc_bitmap *bmap);
 };
 
 struct kvm_arch_async_pf {
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 1d8beaf1bceb..bae783cd396a 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -7686,6 +7686,76 @@ static int svm_page_enc_status_hc(struct kvm *kvm, unsigned long gpa,
 	return ret;
 }
 
+static int svm_get_page_enc_bitmap(struct kvm *kvm,
+				   struct kvm_page_enc_bitmap *bmap)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+	unsigned long gfn_start, gfn_end;
+	unsigned long sz, i, sz_bytes;
+	unsigned long *bitmap;
+	int ret, n;
+
+	if (!sev_guest(kvm))
+		return -ENOTTY;
+
+	gfn_start = bmap->start_gfn;
+	gfn_end = gfn_start + bmap->num_pages;
+
+	sz = ALIGN(bmap->num_pages, BITS_PER_LONG) / BITS_PER_BYTE;
+	bitmap = kmalloc(sz, GFP_KERNEL);
+	if (!bitmap)
+		return -ENOMEM;
+
+	/* by default all pages are marked encrypted */
+	memset(bitmap, 0xff, sz);
+
+	mutex_lock(&kvm->lock);
+	if (sev->page_enc_bmap) {
+		i = gfn_start;
+		for_each_clear_bit_from(i, sev->page_enc_bmap,
+				      min(sev->page_enc_bmap_size, gfn_end))
+			clear_bit(i - gfn_start, bitmap);
+	}
+	mutex_unlock(&kvm->lock);
+
+	ret = -EFAULT;
+
+	n = bmap->num_pages % BITS_PER_BYTE;
+	sz_bytes = ALIGN(bmap->num_pages, BITS_PER_BYTE) / BITS_PER_BYTE;
+
+	/*
+	 * Return the bitmap covering exactly the number of pages the
+	 * user requested. Only enough bytes to hold bmap->num_pages
+	 * bits are copied to the userspace buffer; if bmap->num_pages
+	 * is not byte aligned, the trailing bits of the last byte are
+	 * read from userspace and copied back unchanged.
+	 */
+
+	if (n) {
+		unsigned char *bitmap_kernel = (unsigned char *)bitmap;
+		unsigned char bitmap_user;
+		unsigned long offset, mask;
+
+		offset = bmap->num_pages / BITS_PER_BYTE;
+		if (copy_from_user(&bitmap_user, bmap->enc_bitmap + offset,
+				sizeof(unsigned char)))
+			goto out;
+
+		mask = GENMASK(n - 1, 0);
+		bitmap_user &= ~mask;
+		bitmap_kernel[offset] &= mask;
+		bitmap_kernel[offset] |= bitmap_user;
+	}
+
+	if (copy_to_user(bmap->enc_bitmap, bitmap, sz_bytes))
+		goto out;
+
+	ret = 0;
+out:
+	kfree(bitmap);
+	return ret;
+}
+
 static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
 {
 	struct kvm_sev_cmd sev_cmd;
@@ -8090,6 +8160,7 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
 	.apic_init_signal_blocked = svm_apic_init_signal_blocked,
 
 	.page_enc_status_hc = svm_page_enc_status_hc,
+	.get_page_enc_bitmap = svm_get_page_enc_bitmap,
 };
 
 static int __init svm_init(void)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 68428eef2dde..3c3fea4e20b5 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5226,6 +5226,18 @@ long kvm_arch_vm_ioctl(struct file *filp,
 	case KVM_SET_PMU_EVENT_FILTER:
 		r = kvm_vm_ioctl_set_pmu_event_filter(kvm, argp);
 		break;
+	case KVM_GET_PAGE_ENC_BITMAP: {
+		struct kvm_page_enc_bitmap bitmap;
+
+		r = -EFAULT;
+		if (copy_from_user(&bitmap, argp, sizeof(bitmap)))
+			goto out;
+
+		r = -ENOTTY;
+		if (kvm_x86_ops->get_page_enc_bitmap)
+			r = kvm_x86_ops->get_page_enc_bitmap(kvm, &bitmap);
+		break;
+	}
 	default:
 		r = -ENOTTY;
 	}
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 4e80c57a3182..db1ebf85e177 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -500,6 +500,16 @@ struct kvm_dirty_log {
 	};
 };
 
+/* for KVM_GET_PAGE_ENC_BITMAP */
+struct kvm_page_enc_bitmap {
+	__u64 start_gfn;
+	__u64 num_pages;
+	union {
+		void __user *enc_bitmap; /* one bit per page */
+		__u64 padding2;
+	};
+};
+
 /* for KVM_CLEAR_DIRTY_LOG */
 struct kvm_clear_dirty_log {
 	__u32 slot;
@@ -1478,6 +1488,8 @@ struct kvm_enc_region {
 #define KVM_S390_NORMAL_RESET	_IO(KVMIO,   0xc3)
 #define KVM_S390_CLEAR_RESET	_IO(KVMIO,   0xc4)
 
+#define KVM_GET_PAGE_ENC_BITMAP	_IOW(KVMIO, 0xc5, struct kvm_page_enc_bitmap)
+
 /* Secure Encrypted Virtualization command */
 enum sev_cmd_id {
 	/* Guest initialization commands */
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 107+ messages in thread

* [PATCH v6 10/14] mm: x86: Invoke hypercall when page encryption status is changed
  2020-03-30  6:19 [PATCH v6 00/14] Add AMD SEV guest live migration support Ashish Kalra
                   ` (8 preceding siblings ...)
  2020-03-30  6:22 ` [PATCH v6 09/14] KVM: x86: Introduce KVM_GET_PAGE_ENC_BITMAP ioctl Ashish Kalra
@ 2020-03-30  6:22 ` Ashish Kalra
  2020-04-03 21:07   ` Krish Sadhukhan
  2020-04-03 21:36   ` Venu Busireddy
  2020-03-30  6:22 ` [PATCH v6 11/14] KVM: x86: Introduce KVM_SET_PAGE_ENC_BITMAP ioctl Ashish Kalra
                   ` (4 subsequent siblings)
  14 siblings, 2 replies; 107+ messages in thread
From: Ashish Kalra @ 2020-03-30  6:22 UTC (permalink / raw)
  To: pbonzini
  Cc: tglx, mingo, hpa, joro, bp, thomas.lendacky, x86, kvm,
	linux-kernel, rientjes, srutherford, luto, brijesh.singh

From: Brijesh Singh <Brijesh.Singh@amd.com>

Invoke a hypercall when a memory region is changed from encrypted ->
decrypted and vice versa. The hypervisor needs to know the page
encryption status during guest migration.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
 arch/x86/include/asm/paravirt.h       | 10 +++++
 arch/x86/include/asm/paravirt_types.h |  2 +
 arch/x86/kernel/paravirt.c            |  1 +
 arch/x86/mm/mem_encrypt.c             | 57 ++++++++++++++++++++++++++-
 arch/x86/mm/pat/set_memory.c          |  7 ++++
 5 files changed, 76 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 694d8daf4983..8127b9c141bf 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -78,6 +78,12 @@ static inline void paravirt_arch_exit_mmap(struct mm_struct *mm)
 	PVOP_VCALL1(mmu.exit_mmap, mm);
 }
 
+static inline void page_encryption_changed(unsigned long vaddr, int npages,
+						bool enc)
+{
+	PVOP_VCALL3(mmu.page_encryption_changed, vaddr, npages, enc);
+}
+
 #ifdef CONFIG_PARAVIRT_XXL
 static inline void load_sp0(unsigned long sp0)
 {
@@ -946,6 +952,10 @@ static inline void paravirt_arch_dup_mmap(struct mm_struct *oldmm,
 static inline void paravirt_arch_exit_mmap(struct mm_struct *mm)
 {
 }
+
+static inline void page_encryption_changed(unsigned long vaddr, int npages, bool enc)
+{
+}
 #endif
 #endif /* __ASSEMBLY__ */
 #endif /* _ASM_X86_PARAVIRT_H */
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 732f62e04ddb..03bfd515c59c 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -215,6 +215,8 @@ struct pv_mmu_ops {
 
 	/* Hook for intercepting the destruction of an mm_struct. */
 	void (*exit_mmap)(struct mm_struct *mm);
+	void (*page_encryption_changed)(unsigned long vaddr, int npages,
+					bool enc);
 
 #ifdef CONFIG_PARAVIRT_XXL
 	struct paravirt_callee_save read_cr2;
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index c131ba4e70ef..840c02b23aeb 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -367,6 +367,7 @@ struct paravirt_patch_template pv_ops = {
 			(void (*)(struct mmu_gather *, void *))tlb_remove_page,
 
 	.mmu.exit_mmap		= paravirt_nop,
+	.mmu.page_encryption_changed	= paravirt_nop,
 
 #ifdef CONFIG_PARAVIRT_XXL
 	.mmu.read_cr2		= __PV_IS_CALLEE_SAVE(native_read_cr2),
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index f4bd4b431ba1..c9800fa811f6 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -19,6 +19,7 @@
 #include <linux/kernel.h>
 #include <linux/bitops.h>
 #include <linux/dma-mapping.h>
+#include <linux/kvm_para.h>
 
 #include <asm/tlbflush.h>
 #include <asm/fixmap.h>
@@ -29,6 +30,7 @@
 #include <asm/processor-flags.h>
 #include <asm/msr.h>
 #include <asm/cmdline.h>
+#include <asm/kvm_para.h>
 
 #include "mm_internal.h"
 
@@ -196,6 +198,47 @@ void __init sme_early_init(void)
 		swiotlb_force = SWIOTLB_FORCE;
 }
 
+static void set_memory_enc_dec_hypercall(unsigned long vaddr, int npages,
+					bool enc)
+{
+	unsigned long sz = npages << PAGE_SHIFT;
+	unsigned long vaddr_end, vaddr_next;
+
+	vaddr_end = vaddr + sz;
+
+	for (; vaddr < vaddr_end; vaddr = vaddr_next) {
+		int psize, pmask, level;
+		unsigned long pfn;
+		pte_t *kpte;
+
+		kpte = lookup_address(vaddr, &level);
+		if (!kpte || pte_none(*kpte))
+			return;
+
+		switch (level) {
+		case PG_LEVEL_4K:
+			pfn = pte_pfn(*kpte);
+			break;
+		case PG_LEVEL_2M:
+			pfn = pmd_pfn(*(pmd_t *)kpte);
+			break;
+		case PG_LEVEL_1G:
+			pfn = pud_pfn(*(pud_t *)kpte);
+			break;
+		default:
+			return;
+		}
+
+		psize = page_level_size(level);
+		pmask = page_level_mask(level);
+
+		kvm_sev_hypercall3(KVM_HC_PAGE_ENC_STATUS,
+				   pfn << PAGE_SHIFT, psize >> PAGE_SHIFT, enc);
+
+		vaddr_next = (vaddr & pmask) + psize;
+	}
+}
+
 static void __init __set_clr_pte_enc(pte_t *kpte, int level, bool enc)
 {
 	pgprot_t old_prot, new_prot;
@@ -253,12 +296,13 @@ static void __init __set_clr_pte_enc(pte_t *kpte, int level, bool enc)
 static int __init early_set_memory_enc_dec(unsigned long vaddr,
 					   unsigned long size, bool enc)
 {
-	unsigned long vaddr_end, vaddr_next;
+	unsigned long vaddr_end, vaddr_next, start;
 	unsigned long psize, pmask;
 	int split_page_size_mask;
 	int level, ret;
 	pte_t *kpte;
 
+	start = vaddr;
 	vaddr_next = vaddr;
 	vaddr_end = vaddr + size;
 
@@ -313,6 +357,8 @@ static int __init early_set_memory_enc_dec(unsigned long vaddr,
 
 	ret = 0;
 
+	set_memory_enc_dec_hypercall(start, PAGE_ALIGN(size) >> PAGE_SHIFT,
+					enc);
 out:
 	__flush_tlb_all();
 	return ret;
@@ -451,6 +497,15 @@ void __init mem_encrypt_init(void)
 	if (sev_active())
 		static_branch_enable(&sev_enable_key);
 
+#ifdef CONFIG_PARAVIRT
+	/*
+	 * With SEV, we need to make a hypercall when page encryption state is
+	 * changed.
+	 */
+	if (sev_active())
+		pv_ops.mmu.page_encryption_changed = set_memory_enc_dec_hypercall;
+#endif
+
 	pr_info("AMD %s active\n",
 		sev_active() ? "Secure Encrypted Virtualization (SEV)"
 			     : "Secure Memory Encryption (SME)");
diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index c4aedd00c1ba..86b7804129fc 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -26,6 +26,7 @@
 #include <asm/proto.h>
 #include <asm/memtype.h>
 #include <asm/set_memory.h>
+#include <asm/paravirt.h>
 
 #include "../mm_internal.h"
 
@@ -1987,6 +1988,12 @@ static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
 	 */
 	cpa_flush(&cpa, 0);
 
+	/* Notify hypervisor that a given memory range is mapped encrypted
+	 * or decrypted. The hypervisor will use this information during the
+	 * VM migration.
+	 */
+	page_encryption_changed(addr, numpages, enc);
+
 	return ret;
 }
 
-- 
2.17.1
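An aside for readers following the walk in set_memory_enc_dec_hypercall() above: each iteration reports one (possibly huge) mapping and then advances to the next boundary of that mapping's size via `vaddr_next = (vaddr & pmask) + psize`, so a range that starts in the middle of a large page still reports the whole page exactly once. A minimal userspace model of that arithmetic (illustrative only — `next_vaddr` and `hc_npages` are invented names, not kernel code):

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12

/*
 * Advance past the (possibly huge) page that maps 'vaddr': round down
 * to the page boundary for its size, then step one full page size.
 */
static unsigned long next_vaddr(unsigned long vaddr, unsigned long psize)
{
	unsigned long pmask = ~(psize - 1);

	return (vaddr & pmask) + psize;
}

/* Number of 4K frames one hypercall reports for a mapping of 'psize'. */
static unsigned long hc_npages(unsigned long psize)
{
	return psize >> PAGE_SHIFT;
}
```

For example, a range starting at 0x201000 inside a 2M mapping advances to 0x400000 in a single step while the hypercall reports all 512 frames of that large page.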


^ permalink raw reply related	[flat|nested] 107+ messages in thread

* [PATCH v6 11/14] KVM: x86: Introduce KVM_SET_PAGE_ENC_BITMAP ioctl
  2020-03-30  6:19 [PATCH v6 00/14] Add AMD SEV guest live migration support Ashish Kalra
                   ` (9 preceding siblings ...)
  2020-03-30  6:22 ` [PATCH v6 10/14] mm: x86: Invoke hypercall when page encryption status is changed Ashish Kalra
@ 2020-03-30  6:22 ` Ashish Kalra
  2020-04-03 21:10   ` Krish Sadhukhan
                     ` (2 more replies)
  2020-03-30  6:23 ` [PATCH v6 12/14] KVM: x86: Introduce KVM_PAGE_ENC_BITMAP_RESET ioctl Ashish Kalra
                   ` (3 subsequent siblings)
  14 siblings, 3 replies; 107+ messages in thread
From: Ashish Kalra @ 2020-03-30  6:22 UTC (permalink / raw)
  To: pbonzini
  Cc: tglx, mingo, hpa, joro, bp, thomas.lendacky, x86, kvm,
	linux-kernel, rientjes, srutherford, luto, brijesh.singh

From: Brijesh Singh <Brijesh.Singh@amd.com>

This ioctl can be used to set the page encryption bitmap for an
incoming guest.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
 Documentation/virt/kvm/api.rst  | 22 +++++++++++++++++
 arch/x86/include/asm/kvm_host.h |  2 ++
 arch/x86/kvm/svm.c              | 42 +++++++++++++++++++++++++++++++++
 arch/x86/kvm/x86.c              | 12 ++++++++++
 include/uapi/linux/kvm.h        |  1 +
 5 files changed, 79 insertions(+)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 8ad800ebb54f..4d1004a154f6 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -4675,6 +4675,28 @@ or shared. The bitmap can be used during the guest migration; if the page
 is private then userspace needs to use SEV migration commands to transmit
 the page.
 
+4.126 KVM_SET_PAGE_ENC_BITMAP (vm ioctl)
+----------------------------------------
+
+:Capability: basic
+:Architectures: x86
+:Type: vm ioctl
+:Parameters: struct kvm_page_enc_bitmap (in/out)
+:Returns: 0 on success, -1 on error
+
+/* for KVM_SET_PAGE_ENC_BITMAP */
+struct kvm_page_enc_bitmap {
+	__u64 start_gfn;
+	__u64 num_pages;
+	union {
+		void __user *enc_bitmap; /* one bit per page */
+		__u64 padding2;
+	};
+};
+
+During guest live migration the outgoing guest exports its page encryption
+bitmap, and KVM_SET_PAGE_ENC_BITMAP can be used to rebuild the page encryption
+bitmap for an incoming guest.
 
 5. The kvm_run structure
 ========================
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 27e43e3ec9d8..d30f770aaaea 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1271,6 +1271,8 @@ struct kvm_x86_ops {
 				  unsigned long sz, unsigned long mode);
 	int (*get_page_enc_bitmap)(struct kvm *kvm,
 				struct kvm_page_enc_bitmap *bmap);
+	int (*set_page_enc_bitmap)(struct kvm *kvm,
+				struct kvm_page_enc_bitmap *bmap);
 };
 
 struct kvm_arch_async_pf {
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index bae783cd396a..313343a43045 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -7756,6 +7756,47 @@ static int svm_get_page_enc_bitmap(struct kvm *kvm,
 	return ret;
 }
 
+static int svm_set_page_enc_bitmap(struct kvm *kvm,
+				   struct kvm_page_enc_bitmap *bmap)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+	unsigned long gfn_start, gfn_end;
+	unsigned long *bitmap;
+	unsigned long sz, i;
+	int ret;
+
+	if (!sev_guest(kvm))
+		return -ENOTTY;
+
+	gfn_start = bmap->start_gfn;
+	gfn_end = gfn_start + bmap->num_pages;
+
+	sz = ALIGN(bmap->num_pages, BITS_PER_LONG) / 8;
+	bitmap = kmalloc(sz, GFP_KERNEL);
+	if (!bitmap)
+		return -ENOMEM;
+
+	ret = -EFAULT;
+	if (copy_from_user(bitmap, bmap->enc_bitmap, sz))
+		goto out;
+
+	mutex_lock(&kvm->lock);
+	ret = sev_resize_page_enc_bitmap(kvm, gfn_end);
+	if (ret)
+		goto unlock;
+
+	i = 0;
+	for_each_clear_bit_from(i, bitmap, (gfn_end - gfn_start))
+		clear_bit(i + gfn_start, sev->page_enc_bmap);
+
+	ret = 0;
+unlock:
+	mutex_unlock(&kvm->lock);
+out:
+	kfree(bitmap);
+	return ret;
+}
+
 static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
 {
 	struct kvm_sev_cmd sev_cmd;
@@ -8161,6 +8202,7 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
 
 	.page_enc_status_hc = svm_page_enc_status_hc,
 	.get_page_enc_bitmap = svm_get_page_enc_bitmap,
+	.set_page_enc_bitmap = svm_set_page_enc_bitmap,
 };
 
 static int __init svm_init(void)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3c3fea4e20b5..05e953b2ec61 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5238,6 +5238,18 @@ long kvm_arch_vm_ioctl(struct file *filp,
 			r = kvm_x86_ops->get_page_enc_bitmap(kvm, &bitmap);
 		break;
 	}
+	case KVM_SET_PAGE_ENC_BITMAP: {
+		struct kvm_page_enc_bitmap bitmap;
+
+		r = -EFAULT;
+		if (copy_from_user(&bitmap, argp, sizeof(bitmap)))
+			goto out;
+
+		r = -ENOTTY;
+		if (kvm_x86_ops->set_page_enc_bitmap)
+			r = kvm_x86_ops->set_page_enc_bitmap(kvm, &bitmap);
+		break;
+	}
 	default:
 		r = -ENOTTY;
 	}
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index db1ebf85e177..b4b01d47e568 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1489,6 +1489,7 @@ struct kvm_enc_region {
 #define KVM_S390_CLEAR_RESET	_IO(KVMIO,   0xc4)
 
 #define KVM_GET_PAGE_ENC_BITMAP	_IOW(KVMIO, 0xc5, struct kvm_page_enc_bitmap)
+#define KVM_SET_PAGE_ENC_BITMAP	_IOW(KVMIO, 0xc6, struct kvm_page_enc_bitmap)
 
 /* Secure Encrypted Virtualization command */
 enum sev_cmd_id {
-- 
2.17.1
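A userspace model of the merge semantics intended above (illustrative only; the helper names are invented, not kernel code): bit i of the incoming bitmap describes gfn `start_gfn + i`, so each clear (shared) bit must be cleared at that offset in the destination bitmap. Only clear bits need propagating, because a freshly resized bitmap defaults to all-encrypted.

```c
#include <assert.h>
#include <limits.h>

#define BITS_PER_ULONG (sizeof(unsigned long) * CHAR_BIT)

/* Minimal test_bit/clear_bit over an unsigned long array, kernel-style. */
static int test_bit(unsigned long bit, const unsigned long *map)
{
	return (map[bit / BITS_PER_ULONG] >> (bit % BITS_PER_ULONG)) & 1UL;
}

static void clear_bit(unsigned long bit, unsigned long *map)
{
	map[bit / BITS_PER_ULONG] &= ~(1UL << (bit % BITS_PER_ULONG));
}

/*
 * Merge an incoming bitmap into 'dst': bit i of 'src' describes
 * gfn_start + i, so clear bits are cleared at that offset in dst.
 */
static void merge_enc_bitmap(unsigned long *dst, const unsigned long *src,
			     unsigned long gfn_start, unsigned long num_pages)
{
	unsigned long i;

	for (i = 0; i < num_pages; i++)
		if (!test_bit(i, src))
			clear_bit(gfn_start + i, dst);
}
```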



* [PATCH v6 12/14] KVM: x86: Introduce KVM_PAGE_ENC_BITMAP_RESET ioctl
  2020-03-30  6:19 [PATCH v6 00/14] Add AMD SEV guest live migration support Ashish Kalra
                   ` (10 preceding siblings ...)
  2020-03-30  6:22 ` [PATCH v6 11/14] KVM: x86: Introduce KVM_SET_PAGE_ENC_BITMAP ioctl Ashish Kalra
@ 2020-03-30  6:23 ` Ashish Kalra
  2020-04-03 21:14   ` Krish Sadhukhan
  2020-04-03 22:01   ` Venu Busireddy
  2020-03-30  6:23 ` [PATCH v6 13/14] KVM: x86: Introduce new KVM_FEATURE_SEV_LIVE_MIGRATION feature & Custom MSR Ashish Kalra
                   ` (2 subsequent siblings)
  14 siblings, 2 replies; 107+ messages in thread
From: Ashish Kalra @ 2020-03-30  6:23 UTC (permalink / raw)
  To: pbonzini
  Cc: tglx, mingo, hpa, joro, bp, thomas.lendacky, x86, kvm,
	linux-kernel, rientjes, srutherford, luto, brijesh.singh

From: Ashish Kalra <ashish.kalra@amd.com>

This ioctl can be used by the application to reset the page
encryption bitmap managed by the KVM driver. A typical usage
for this ioctl is on VM reboot: on reboot, we must reinitialize
the bitmap.

Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
 Documentation/virt/kvm/api.rst  | 13 +++++++++++++
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/svm.c              | 16 ++++++++++++++++
 arch/x86/kvm/x86.c              |  6 ++++++
 include/uapi/linux/kvm.h        |  1 +
 5 files changed, 37 insertions(+)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 4d1004a154f6..a11326ccc51d 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -4698,6 +4698,19 @@ During guest live migration the outgoing guest exports its page encryption
 bitmap, and KVM_SET_PAGE_ENC_BITMAP can be used to rebuild the page encryption
 bitmap for an incoming guest.
 
+4.127 KVM_PAGE_ENC_BITMAP_RESET (vm ioctl)
+------------------------------------------
+
+:Capability: basic
+:Architectures: x86
+:Type: vm ioctl
+:Parameters: none
+:Returns: 0 on success, -1 on error
+
+KVM_PAGE_ENC_BITMAP_RESET is used to reset the guest's page encryption
+bitmap during guest reboot; this is done only on the guest's boot vCPU.
+
+
 5. The kvm_run structure
 ========================
 
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index d30f770aaaea..a96ef6338cd2 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1273,6 +1273,7 @@ struct kvm_x86_ops {
 				struct kvm_page_enc_bitmap *bmap);
 	int (*set_page_enc_bitmap)(struct kvm *kvm,
 				struct kvm_page_enc_bitmap *bmap);
+	int (*reset_page_enc_bitmap)(struct kvm *kvm);
 };
 
 struct kvm_arch_async_pf {
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 313343a43045..c99b0207a443 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -7797,6 +7797,21 @@ static int svm_set_page_enc_bitmap(struct kvm *kvm,
 	return ret;
 }
 
+static int svm_reset_page_enc_bitmap(struct kvm *kvm)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+
+	if (!sev_guest(kvm))
+		return -ENOTTY;
+
+	mutex_lock(&kvm->lock);
+	/* by default all pages should be marked encrypted */
+	if (sev->page_enc_bmap_size)
+		bitmap_fill(sev->page_enc_bmap, sev->page_enc_bmap_size);
+	mutex_unlock(&kvm->lock);
+	return 0;
+}
+
 static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
 {
 	struct kvm_sev_cmd sev_cmd;
@@ -8203,6 +8218,7 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
 	.page_enc_status_hc = svm_page_enc_status_hc,
 	.get_page_enc_bitmap = svm_get_page_enc_bitmap,
 	.set_page_enc_bitmap = svm_set_page_enc_bitmap,
+	.reset_page_enc_bitmap = svm_reset_page_enc_bitmap,
 };
 
 static int __init svm_init(void)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 05e953b2ec61..2127ed937f53 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5250,6 +5250,12 @@ long kvm_arch_vm_ioctl(struct file *filp,
 			r = kvm_x86_ops->set_page_enc_bitmap(kvm, &bitmap);
 		break;
 	}
+	case KVM_PAGE_ENC_BITMAP_RESET: {
+		r = -ENOTTY;
+		if (kvm_x86_ops->reset_page_enc_bitmap)
+			r = kvm_x86_ops->reset_page_enc_bitmap(kvm);
+		break;
+	}
 	default:
 		r = -ENOTTY;
 	}
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index b4b01d47e568..0884a581fc37 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1490,6 +1490,7 @@ struct kvm_enc_region {
 
 #define KVM_GET_PAGE_ENC_BITMAP	_IOW(KVMIO, 0xc5, struct kvm_page_enc_bitmap)
 #define KVM_SET_PAGE_ENC_BITMAP	_IOW(KVMIO, 0xc6, struct kvm_page_enc_bitmap)
+#define KVM_PAGE_ENC_BITMAP_RESET	_IO(KVMIO, 0xc7)
 
 /* Secure Encrypted Virtualization command */
 enum sev_cmd_id {
-- 
2.17.1
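For reference, the new ioctl numbers follow the asm-generic encoding (x86 layout assumed here; some architectures differ), with KVMIO being 0xAE in `<linux/kvm.h>`. A sketch re-deriving them — the `MY_*` macros and `struct page_enc_bitmap` are local stand-ins for the real `_IO`/`_IOW` and `struct kvm_page_enc_bitmap`:

```c
#include <assert.h>
#include <stdint.h>

/*
 * asm-generic ioctl number layout (x86): nr in bits 0-7, type in bits
 * 8-15, argument size in bits 16-29, transfer direction in bits 30-31.
 */
#define IOC_NONE  0U
#define IOC_WRITE 1U

#define MY_IOC(dir, type, nr, size) \
	(((dir) << 30) | ((size) << 16) | ((type) << 8) | (nr))
#define MY_IO(type, nr)      MY_IOC(IOC_NONE, (type), (nr), 0U)
#define MY_IOW(type, nr, sz) MY_IOC(IOC_WRITE, (type), (nr), (uint32_t)(sz))

#define KVMIO 0xAEU

/* Same layout as kvm_page_enc_bitmap: the union collapses to one u64. */
struct page_enc_bitmap {
	uint64_t start_gfn;
	uint64_t num_pages;
	uint64_t enc_bitmap;
};
```

So KVM_PAGE_ENC_BITMAP_RESET encodes to 0xAEC7 and carries no payload, while the bitmap ioctls fold the 24-byte argument size into the number.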



* [PATCH v6 13/14] KVM: x86: Introduce new KVM_FEATURE_SEV_LIVE_MIGRATION feature & Custom MSR.
  2020-03-30  6:19 [PATCH v6 00/14] Add AMD SEV guest live migration support Ashish Kalra
                   ` (11 preceding siblings ...)
  2020-03-30  6:23 ` [PATCH v6 12/14] KVM: x86: Introduce KVM_PAGE_ENC_BITMAP_RESET ioctl Ashish Kalra
@ 2020-03-30  6:23 ` Ashish Kalra
  2020-03-30 15:52   ` Brijesh Singh
  2020-04-03 23:46   ` Krish Sadhukhan
  2020-03-30  6:23 ` [PATCH v6 14/14] KVM: x86: Add kexec support for SEV Live Migration Ashish Kalra
  2020-03-30 17:24 ` [PATCH v6 00/14] Add AMD SEV guest live migration support Venu Busireddy
  14 siblings, 2 replies; 107+ messages in thread
From: Ashish Kalra @ 2020-03-30  6:23 UTC (permalink / raw)
  To: pbonzini
  Cc: tglx, mingo, hpa, joro, bp, thomas.lendacky, x86, kvm,
	linux-kernel, rientjes, srutherford, luto, brijesh.singh

From: Ashish Kalra <ashish.kalra@amd.com>

Add a new KVM_FEATURE_SEV_LIVE_MIGRATION feature bit for the guest to
check for host-side support for SEV live migration. Also add a new custom
MSR_KVM_SEV_LIVE_MIG_EN for the guest to enable the SEV live migration
feature.

Also, ensure that _bss_decrypted section is marked as decrypted in the
page encryption bitmap.

Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
 Documentation/virt/kvm/cpuid.rst     |  4 ++++
 Documentation/virt/kvm/msr.rst       | 10 ++++++++++
 arch/x86/include/asm/kvm_host.h      |  3 +++
 arch/x86/include/uapi/asm/kvm_para.h |  5 +++++
 arch/x86/kernel/kvm.c                |  4 ++++
 arch/x86/kvm/cpuid.c                 |  3 ++-
 arch/x86/kvm/svm.c                   |  5 +++++
 arch/x86/kvm/x86.c                   |  7 +++++++
 arch/x86/mm/mem_encrypt.c            | 14 +++++++++++++-
 9 files changed, 53 insertions(+), 2 deletions(-)

diff --git a/Documentation/virt/kvm/cpuid.rst b/Documentation/virt/kvm/cpuid.rst
index 01b081f6e7ea..fcb191bb3016 100644
--- a/Documentation/virt/kvm/cpuid.rst
+++ b/Documentation/virt/kvm/cpuid.rst
@@ -86,6 +86,10 @@ KVM_FEATURE_PV_SCHED_YIELD        13          guest checks this feature bit
                                               before using paravirtualized
                                               sched yield.
 
+KVM_FEATURE_SEV_LIVE_MIGRATION    14          guest checks this feature bit
+                                              before enabling SEV live
+                                              migration feature.
+
 KVM_FEATURE_CLOCKSOURCE_STABLE_BIT 24         host will warn if no guest-side
                                               per-cpu warps are expected in
                                               kvmclock
diff --git a/Documentation/virt/kvm/msr.rst b/Documentation/virt/kvm/msr.rst
index 33892036672d..7cd7786bbb03 100644
--- a/Documentation/virt/kvm/msr.rst
+++ b/Documentation/virt/kvm/msr.rst
@@ -319,3 +319,13 @@ data:
 
 	KVM guests can request the host not to poll on HLT, for example if
 	they are performing polling themselves.
+
+MSR_KVM_SEV_LIVE_MIG_EN:
+        0x4b564d06
+
+	Control SEV Live Migration features.
+
+data:
+        Bit 0 enables (1) or disables (0) host-side SEV Live Migration feature.
+        Bit 1 enables (1) or disables (0) support for SEV Live Migration extensions.
+        All other bits are reserved.
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index a96ef6338cd2..ad5faaed43c0 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -780,6 +780,9 @@ struct kvm_vcpu_arch {
 
 	u64 msr_kvm_poll_control;
 
+	/* SEV Live Migration MSR (AMD only) */
+	u64 msr_kvm_sev_live_migration_flag;
+
 	/*
 	 * Indicates the guest is trying to write a gfn that contains one or
 	 * more of the PTEs used to translate the write itself, i.e. the access
diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index 2a8e0b6b9805..d9d4953b42ad 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -31,6 +31,7 @@
 #define KVM_FEATURE_PV_SEND_IPI	11
 #define KVM_FEATURE_POLL_CONTROL	12
 #define KVM_FEATURE_PV_SCHED_YIELD	13
+#define KVM_FEATURE_SEV_LIVE_MIGRATION	14
 
 #define KVM_HINTS_REALTIME      0
 
@@ -50,6 +51,7 @@
 #define MSR_KVM_STEAL_TIME  0x4b564d03
 #define MSR_KVM_PV_EOI_EN      0x4b564d04
 #define MSR_KVM_POLL_CONTROL	0x4b564d05
+#define MSR_KVM_SEV_LIVE_MIG_EN	0x4b564d06
 
 struct kvm_steal_time {
 	__u64 steal;
@@ -122,4 +124,7 @@ struct kvm_vcpu_pv_apf_data {
 #define KVM_PV_EOI_ENABLED KVM_PV_EOI_MASK
 #define KVM_PV_EOI_DISABLED 0x0
 
+#define KVM_SEV_LIVE_MIGRATION_ENABLED			(1 << 0)
+#define KVM_SEV_LIVE_MIGRATION_EXTENSIONS_SUPPORTED	(1 << 1)
+
 #endif /* _UAPI_ASM_X86_KVM_PARA_H */
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 6efe0410fb72..8fcee0b45231 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -418,6 +418,10 @@ static void __init sev_map_percpu_data(void)
 	if (!sev_active())
 		return;
 
+	if (kvm_para_has_feature(KVM_FEATURE_SEV_LIVE_MIGRATION)) {
+		wrmsrl(MSR_KVM_SEV_LIVE_MIG_EN, KVM_SEV_LIVE_MIGRATION_ENABLED);
+	}
+
 	for_each_possible_cpu(cpu) {
 		__set_percpu_decrypted(&per_cpu(apf_reason, cpu), sizeof(apf_reason));
 		__set_percpu_decrypted(&per_cpu(steal_time, cpu), sizeof(steal_time));
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index b1c469446b07..74c8b2a7270c 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -716,7 +716,8 @@ static inline int __do_cpuid_func(struct kvm_cpuid_entry2 *entry, u32 function,
 			     (1 << KVM_FEATURE_ASYNC_PF_VMEXIT) |
 			     (1 << KVM_FEATURE_PV_SEND_IPI) |
 			     (1 << KVM_FEATURE_POLL_CONTROL) |
-			     (1 << KVM_FEATURE_PV_SCHED_YIELD);
+			     (1 << KVM_FEATURE_PV_SCHED_YIELD) |
+			     (1 << KVM_FEATURE_SEV_LIVE_MIGRATION);
 
 		if (sched_info_on())
 			entry->eax |= (1 << KVM_FEATURE_STEAL_TIME);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index c99b0207a443..60ddc242a133 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -7632,6 +7632,7 @@ static int svm_page_enc_status_hc(struct kvm *kvm, unsigned long gpa,
 				  unsigned long npages, unsigned long enc)
 {
 	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+	struct kvm_vcpu *vcpu = kvm->vcpus[0];
 	kvm_pfn_t pfn_start, pfn_end;
 	gfn_t gfn_start, gfn_end;
 	int ret;
@@ -7639,6 +7640,10 @@ static int svm_page_enc_status_hc(struct kvm *kvm, unsigned long gpa,
 	if (!sev_guest(kvm))
 		return -EINVAL;
 
+	if (!(vcpu->arch.msr_kvm_sev_live_migration_flag &
+		KVM_SEV_LIVE_MIGRATION_ENABLED))
+		return -ENOTTY;
+
 	if (!npages)
 		return 0;
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 2127ed937f53..82867b8798f8 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2880,6 +2880,10 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		vcpu->arch.msr_kvm_poll_control = data;
 		break;
 
+	case MSR_KVM_SEV_LIVE_MIG_EN:
+		vcpu->arch.msr_kvm_sev_live_migration_flag = data;
+		break;
+
 	case MSR_IA32_MCG_CTL:
 	case MSR_IA32_MCG_STATUS:
 	case MSR_IA32_MC0_CTL ... MSR_IA32_MCx_CTL(KVM_MAX_MCE_BANKS) - 1:
@@ -3126,6 +3130,9 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case MSR_KVM_POLL_CONTROL:
 		msr_info->data = vcpu->arch.msr_kvm_poll_control;
 		break;
+	case MSR_KVM_SEV_LIVE_MIG_EN:
+		msr_info->data = vcpu->arch.msr_kvm_sev_live_migration_flag;
+		break;
 	case MSR_IA32_P5_MC_ADDR:
 	case MSR_IA32_P5_MC_TYPE:
 	case MSR_IA32_MCG_CAP:
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index c9800fa811f6..f6a841494845 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -502,8 +502,20 @@ void __init mem_encrypt_init(void)
 	 * With SEV, we need to make a hypercall when page encryption state is
 	 * changed.
 	 */
-	if (sev_active())
+	if (sev_active()) {
+		unsigned long nr_pages;
+
 		pv_ops.mmu.page_encryption_changed = set_memory_enc_dec_hypercall;
+
+		/*
+		 * Ensure that _bss_decrypted section is marked as decrypted in the
+		 * page encryption bitmap.
+		 */
+		nr_pages = DIV_ROUND_UP(__end_bss_decrypted - __start_bss_decrypted,
+			PAGE_SIZE);
+		set_memory_enc_dec_hypercall((unsigned long)__start_bss_decrypted,
+			nr_pages, 0);
+	}
 #endif
 
 	pr_info("AMD %s active\n",
-- 
2.17.1
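The new MSR is a plain bit field; per the hypervisor-side check in svm_page_enc_status_hc() above, the page-encryption-status hypercall is honoured only once the guest has set bit 0. A minimal userspace model of that check (illustrative sketch, not kernel code; `live_migration_enabled` is an invented name):

```c
#include <assert.h>
#include <stdint.h>

/* Bit layout of the proposed MSR_KVM_SEV_LIVE_MIG_EN value. */
#define SEV_LIVE_MIGRATION_ENABLED		(1ULL << 0)
#define SEV_LIVE_MIGRATION_EXT_SUPPORTED	(1ULL << 1)

/*
 * Mirror of the hypervisor-side gate: the page encryption status
 * hypercall is serviced only when the guest has set bit 0.
 */
static int live_migration_enabled(uint64_t msr_val)
{
	return (msr_val & SEV_LIVE_MIGRATION_ENABLED) != 0;
}
```

Note that bit 1 alone is not sufficient; the extensions bit only qualifies an already-enabled feature.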



* [PATCH v6 14/14] KVM: x86: Add kexec support for SEV Live Migration.
  2020-03-30  6:19 [PATCH v6 00/14] Add AMD SEV guest live migration support Ashish Kalra
                   ` (12 preceding siblings ...)
  2020-03-30  6:23 ` [PATCH v6 13/14] KVM: x86: Introduce new KVM_FEATURE_SEV_LIVE_MIGRATION feature & Custom MSR Ashish Kalra
@ 2020-03-30  6:23 ` Ashish Kalra
  2020-03-30 16:00   ` Brijesh Singh
                     ` (2 more replies)
  2020-03-30 17:24 ` [PATCH v6 00/14] Add AMD SEV guest live migration support Venu Busireddy
  14 siblings, 3 replies; 107+ messages in thread
From: Ashish Kalra @ 2020-03-30  6:23 UTC (permalink / raw)
  To: pbonzini
  Cc: tglx, mingo, hpa, joro, bp, thomas.lendacky, x86, kvm,
	linux-kernel, rientjes, srutherford, luto, brijesh.singh

From: Ashish Kalra <ashish.kalra@amd.com>

Reset the host's page encryption bitmap related to kernel-specific
page encryption status settings before we load a new kernel by
kexec. We cannot reset the complete page encryption bitmap here,
as we need to retain the UEFI/OVMF firmware-specific settings.

Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
 arch/x86/kernel/kvm.c | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 8fcee0b45231..ba6cce3c84af 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -34,6 +34,7 @@
 #include <asm/hypervisor.h>
 #include <asm/tlb.h>
 #include <asm/cpuidle_haltpoll.h>
+#include <asm/e820/api.h>
 
 static int kvmapf = 1;
 
@@ -357,6 +358,33 @@ static void kvm_pv_guest_cpu_reboot(void *unused)
 	 */
 	if (kvm_para_has_feature(KVM_FEATURE_PV_EOI))
 		wrmsrl(MSR_KVM_PV_EOI_EN, 0);
+	/*
+	 * Reset the host's page encryption bitmap related to kernel
+	 * specific page encryption status settings before we load a
+	 * new kernel by kexec. NOTE: We cannot reset the complete
+	 * page encryption bitmap here as we need to retain the
+	 * UEFI/OVMF firmware specific settings.
+	 */
+	if (kvm_para_has_feature(KVM_FEATURE_SEV_LIVE_MIGRATION) &&
+		(smp_processor_id() == 0)) {
+		unsigned long nr_pages;
+		int i;
+
+		for (i = 0; i < e820_table->nr_entries; i++) {
+			struct e820_entry *entry = &e820_table->entries[i];
+			unsigned long start_pfn, end_pfn;
+
+			if (entry->type != E820_TYPE_RAM)
+				continue;
+
+			start_pfn = entry->addr >> PAGE_SHIFT;
+			end_pfn = (entry->addr + entry->size) >> PAGE_SHIFT;
+			nr_pages = DIV_ROUND_UP(entry->size, PAGE_SIZE);
+
+			kvm_sev_hypercall3(KVM_HC_PAGE_ENC_STATUS,
+				entry->addr, nr_pages, 1);
+		}
+	}
 	kvm_pv_disable_apf();
 	kvm_disable_steal_time();
 }
-- 
2.17.1
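The kexec reset loop above walks the e820 table and, per E820_TYPE_RAM entry, reports a page count rounded up to whole pages starting at the entry's address. A model of the per-entry arithmetic (illustrative only; `start_pfn` and `e820_nr_pages` are invented names, not kernel code):

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* First page frame number covered by an e820 entry. */
static unsigned long start_pfn(uint64_t addr)
{
	return addr >> PAGE_SHIFT;
}

/* Page count the reset loop reports for one E820_TYPE_RAM entry:
 * the entry size is rounded up to whole pages. */
static unsigned long e820_nr_pages(uint64_t size)
{
	return DIV_ROUND_UP(size, PAGE_SIZE);
}
```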



* Re: [PATCH v6 13/14] KVM: x86: Introduce new KVM_FEATURE_SEV_LIVE_MIGRATION feature & Custom MSR.
  2020-03-30  6:23 ` [PATCH v6 13/14] KVM: x86: Introduce new KVM_FEATURE_SEV_LIVE_MIGRATION feature & Custom MSR Ashish Kalra
@ 2020-03-30 15:52   ` Brijesh Singh
  2020-03-30 16:42     ` Ashish Kalra
       [not found]     ` <20200330162730.GA21567@ashkalra_ubuntu_server>
  2020-04-03 23:46   ` Krish Sadhukhan
  1 sibling, 2 replies; 107+ messages in thread
From: Brijesh Singh @ 2020-03-30 15:52 UTC (permalink / raw)
  To: Ashish Kalra, pbonzini
  Cc: brijesh.singh, tglx, mingo, hpa, joro, bp, thomas.lendacky, x86,
	kvm, linux-kernel, rientjes, srutherford, luto


On 3/30/20 1:23 AM, Ashish Kalra wrote:
> From: Ashish Kalra <ashish.kalra@amd.com>
>
> Add a new KVM_FEATURE_SEV_LIVE_MIGRATION feature bit for the guest to
> check for host-side support for SEV live migration. Also add a new custom
> MSR_KVM_SEV_LIVE_MIG_EN for the guest to enable the SEV live migration
> feature.
>
> Also, ensure that _bss_decrypted section is marked as decrypted in the
> page encryption bitmap.
>
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> ---
>  Documentation/virt/kvm/cpuid.rst     |  4 ++++
>  Documentation/virt/kvm/msr.rst       | 10 ++++++++++
>  arch/x86/include/asm/kvm_host.h      |  3 +++
>  arch/x86/include/uapi/asm/kvm_para.h |  5 +++++
>  arch/x86/kernel/kvm.c                |  4 ++++
>  arch/x86/kvm/cpuid.c                 |  3 ++-
>  arch/x86/kvm/svm.c                   |  5 +++++
>  arch/x86/kvm/x86.c                   |  7 +++++++
>  arch/x86/mm/mem_encrypt.c            | 14 +++++++++++++-
>  9 files changed, 53 insertions(+), 2 deletions(-)


IMHO, this patch should be broken into multiple patches as it touches
the guest and the hypervisor at the same time. The first patch can
introduce the feature flag in KVM, the second patch can make the changes
specific to SVM, and the third patch can focus on how to make use of
that feature inside the guest. Additionally, invoking the HC to clear
the __bss_decrypted section should either be squashed into Patch 10/14
or be a separate patch itself.


> diff --git a/Documentation/virt/kvm/cpuid.rst b/Documentation/virt/kvm/cpuid.rst
> index 01b081f6e7ea..fcb191bb3016 100644
> --- a/Documentation/virt/kvm/cpuid.rst
> +++ b/Documentation/virt/kvm/cpuid.rst
> @@ -86,6 +86,10 @@ KVM_FEATURE_PV_SCHED_YIELD        13          guest checks this feature bit
>                                                before using paravirtualized
>                                                sched yield.
>  
> +KVM_FEATURE_SEV_LIVE_MIGRATION    14          guest checks this feature bit
> +                                              before enabling SEV live
> +                                              migration feature.
> +
>  KVM_FEATURE_CLOCKSOURCE_STABLE_BIT 24         host will warn if no guest-side
>                                               per-cpu warps are expected in
>                                               kvmclock
> diff --git a/Documentation/virt/kvm/msr.rst b/Documentation/virt/kvm/msr.rst
> index 33892036672d..7cd7786bbb03 100644
> --- a/Documentation/virt/kvm/msr.rst
> +++ b/Documentation/virt/kvm/msr.rst
> @@ -319,3 +319,13 @@ data:
>  
>  	KVM guests can request the host not to poll on HLT, for example if
>  	they are performing polling themselves.
> +
> +MSR_KVM_SEV_LIVE_MIG_EN:
> +        0x4b564d06
> +
> +	Control SEV Live Migration features.
> +
> +data:
> +        Bit 0 enables (1) or disables (0) host-side SEV Live Migration feature.
> +        Bit 1 enables (1) or disables (0) support for SEV Live Migration extensions.
> +        All other bits are reserved.
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index a96ef6338cd2..ad5faaed43c0 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -780,6 +780,9 @@ struct kvm_vcpu_arch {
>  
>  	u64 msr_kvm_poll_control;
>  
> +	/* SEV Live Migration MSR (AMD only) */
> +	u64 msr_kvm_sev_live_migration_flag;
> +
>  	/*
>  	 * Indicates the guest is trying to write a gfn that contains one or
>  	 * more of the PTEs used to translate the write itself, i.e. the access
> diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
> index 2a8e0b6b9805..d9d4953b42ad 100644
> --- a/arch/x86/include/uapi/asm/kvm_para.h
> +++ b/arch/x86/include/uapi/asm/kvm_para.h
> @@ -31,6 +31,7 @@
>  #define KVM_FEATURE_PV_SEND_IPI	11
>  #define KVM_FEATURE_POLL_CONTROL	12
>  #define KVM_FEATURE_PV_SCHED_YIELD	13
> +#define KVM_FEATURE_SEV_LIVE_MIGRATION	14
>  
>  #define KVM_HINTS_REALTIME      0
>  
> @@ -50,6 +51,7 @@
>  #define MSR_KVM_STEAL_TIME  0x4b564d03
>  #define MSR_KVM_PV_EOI_EN      0x4b564d04
>  #define MSR_KVM_POLL_CONTROL	0x4b564d05
> +#define MSR_KVM_SEV_LIVE_MIG_EN	0x4b564d06
>  
>  struct kvm_steal_time {
>  	__u64 steal;
> @@ -122,4 +124,7 @@ struct kvm_vcpu_pv_apf_data {
>  #define KVM_PV_EOI_ENABLED KVM_PV_EOI_MASK
>  #define KVM_PV_EOI_DISABLED 0x0
>  
> +#define KVM_SEV_LIVE_MIGRATION_ENABLED			(1 << 0)
> +#define KVM_SEV_LIVE_MIGRATION_EXTENSIONS_SUPPORTED	(1 << 1)
> +
>  #endif /* _UAPI_ASM_X86_KVM_PARA_H */
> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> index 6efe0410fb72..8fcee0b45231 100644
> --- a/arch/x86/kernel/kvm.c
> +++ b/arch/x86/kernel/kvm.c
> @@ -418,6 +418,10 @@ static void __init sev_map_percpu_data(void)
>  	if (!sev_active())
>  		return;
>  
> +	if (kvm_para_has_feature(KVM_FEATURE_SEV_LIVE_MIGRATION)) {
> +		wrmsrl(MSR_KVM_SEV_LIVE_MIG_EN, KVM_SEV_LIVE_MIGRATION_ENABLED);
> +	}
> +
>  	for_each_possible_cpu(cpu) {
>  		__set_percpu_decrypted(&per_cpu(apf_reason, cpu), sizeof(apf_reason));
>  		__set_percpu_decrypted(&per_cpu(steal_time, cpu), sizeof(steal_time));
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index b1c469446b07..74c8b2a7270c 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -716,7 +716,8 @@ static inline int __do_cpuid_func(struct kvm_cpuid_entry2 *entry, u32 function,
>  			     (1 << KVM_FEATURE_ASYNC_PF_VMEXIT) |
>  			     (1 << KVM_FEATURE_PV_SEND_IPI) |
>  			     (1 << KVM_FEATURE_POLL_CONTROL) |
> -			     (1 << KVM_FEATURE_PV_SCHED_YIELD);
> +			     (1 << KVM_FEATURE_PV_SCHED_YIELD) |
> +			     (1 << KVM_FEATURE_SEV_LIVE_MIGRATION);


Do we want to enable this feature unconditionally? Who will clear the
feature flags for non-SEV guests?
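The gating this comment asks for can be modeled in plain userspace C (illustrative only — `sev_active` and `feature_advertised` below are stand-ins for the kernel's `sev_active()` and `kvm_para_has_feature()` predicates, not the real ones): only a guest that is SEV-active *and* sees the feature bit should end up with the enable bit set in the MSR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define KVM_SEV_LIVE_MIGRATION_ENABLED	(1u << 0)

/*
 * Model of the guest-side decision: the simulated MSR_KVM_SEV_LIVE_MIG_EN
 * value gets the enable bit only when the guest is SEV-active AND the
 * hypervisor advertised KVM_FEATURE_SEV_LIVE_MIGRATION. A non-SEV guest
 * never touches the MSR, which is the behavior the review comment wants.
 */
static uint64_t live_mig_msr_after_boot(bool sev_active, bool feature_advertised)
{
	uint64_t msr = 0;

	if (sev_active && feature_advertised)
		msr |= KVM_SEV_LIVE_MIGRATION_ENABLED;
	return msr;
}
```

With this shape, clearing the feature bit for non-SEV guests on the host side becomes an optimization rather than a correctness requirement, since the guest check already suppresses the MSR write.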

>  
>  		if (sched_info_on())
>  			entry->eax |= (1 << KVM_FEATURE_STEAL_TIME);
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index c99b0207a443..60ddc242a133 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -7632,6 +7632,7 @@ static int svm_page_enc_status_hc(struct kvm *kvm, unsigned long gpa,
>  				  unsigned long npages, unsigned long enc)
>  {
>  	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> +	struct kvm_vcpu *vcpu = kvm->vcpus[0];
>  	kvm_pfn_t pfn_start, pfn_end;
>  	gfn_t gfn_start, gfn_end;
>  	int ret;
> @@ -7639,6 +7640,10 @@ static int svm_page_enc_status_hc(struct kvm *kvm, unsigned long gpa,
>  	if (!sev_guest(kvm))
>  		return -EINVAL;
>  
> +	if (!(vcpu->arch.msr_kvm_sev_live_migration_flag &
> +		KVM_SEV_LIVE_MIGRATION_ENABLED))
> +		return -ENOTTY;
> +
>  	if (!npages)
>  		return 0;
>  
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 2127ed937f53..82867b8798f8 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2880,6 +2880,10 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  		vcpu->arch.msr_kvm_poll_control = data;
>  		break;
>  
> +	case MSR_KVM_SEV_LIVE_MIG_EN:
> +		vcpu->arch.msr_kvm_sev_live_migration_flag = data;
> +		break;
> +
>  	case MSR_IA32_MCG_CTL:
>  	case MSR_IA32_MCG_STATUS:
>  	case MSR_IA32_MC0_CTL ... MSR_IA32_MCx_CTL(KVM_MAX_MCE_BANKS) - 1:
> @@ -3126,6 +3130,9 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  	case MSR_KVM_POLL_CONTROL:
>  		msr_info->data = vcpu->arch.msr_kvm_poll_control;
>  		break;
> +	case MSR_KVM_SEV_LIVE_MIG_EN:
> +		msr_info->data = vcpu->arch.msr_kvm_sev_live_migration_flag;
> +		break;
>  	case MSR_IA32_P5_MC_ADDR:
>  	case MSR_IA32_P5_MC_TYPE:
>  	case MSR_IA32_MCG_CAP:
> diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
> index c9800fa811f6..f6a841494845 100644
> --- a/arch/x86/mm/mem_encrypt.c
> +++ b/arch/x86/mm/mem_encrypt.c
> @@ -502,8 +502,20 @@ void __init mem_encrypt_init(void)
>  	 * With SEV, we need to make a hypercall when page encryption state is
>  	 * changed.
>  	 */
> -	if (sev_active())
> +	if (sev_active()) {
> +		unsigned long nr_pages;
> +
>  		pv_ops.mmu.page_encryption_changed = set_memory_enc_dec_hypercall;
> +
> +		/*
> +		 * Ensure that _bss_decrypted section is marked as decrypted in the
> +		 * page encryption bitmap.
> +		 */
> +		nr_pages = DIV_ROUND_UP(__end_bss_decrypted - __start_bss_decrypted,
> +			PAGE_SIZE);
> +		set_memory_enc_dec_hypercall((unsigned long)__start_bss_decrypted,
> +			nr_pages, 0);
> +	}


Isn't this too late? Shouldn't we be making the hypercall at the same
time we clear the encryption bit?
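For context, the state this hypercall maintains is a per-gfn encryption bitmap on the host. A userspace sketch (illustrative only; the toy guest size and byte-array bitmap are made up, not KVM's representation) of the set/clear-over-a-range semantics of KVM_HC_PAGE_ENC_STATUS:

```c
#include <assert.h>
#include <stdint.h>

#define GUEST_PAGES 1024			/* toy guest size, illustration only */

static uint8_t enc_bitmap[GUEST_PAGES / 8];	/* 1 = page encrypted with guest key */

/* Mirror of the hypercall semantics: mark [gfn, gfn + npages) as enc or not */
static void page_enc_status(uint64_t gfn, uint64_t npages, int enc)
{
	for (uint64_t i = gfn; i < gfn + npages; i++) {
		if (enc)
			enc_bitmap[i / 8] |= 1u << (i % 8);
		else
			enc_bitmap[i / 8] &= ~(1u << (i % 8));
	}
}

static int page_is_encrypted(uint64_t gfn)
{
	return (enc_bitmap[gfn / 8] >> (gfn % 8)) & 1;
}
```

The migration code then consults this bitmap per page to decide between the SEV send commands and the plain path, so the timing question above is really about when the bitmap becomes consistent with the guest's actual page state.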


>  #endif
>  
>  	pr_info("AMD %s active\n",

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v6 14/14] KVM: x86: Add kexec support for SEV Live Migration.
  2020-03-30  6:23 ` [PATCH v6 14/14] KVM: x86: Add kexec support for SEV Live Migration Ashish Kalra
@ 2020-03-30 16:00   ` Brijesh Singh
  2020-03-30 16:45     ` Ashish Kalra
  2020-04-03 12:57   ` Dave Young
  2020-04-04  0:55   ` Krish Sadhukhan
  2 siblings, 1 reply; 107+ messages in thread
From: Brijesh Singh @ 2020-03-30 16:00 UTC (permalink / raw)
  To: Ashish Kalra, pbonzini
  Cc: brijesh.singh, tglx, mingo, hpa, joro, bp, thomas.lendacky, x86,
	kvm, linux-kernel, rientjes, srutherford, luto


On 3/30/20 1:23 AM, Ashish Kalra wrote:
> From: Ashish Kalra <ashish.kalra@amd.com>
>
> Reset the host's page encryption bitmap related to kernel
> specific page encryption status settings before we load a
> new kernel by kexec. We cannot reset the complete
> page encryption bitmap here as we need to retain the
> UEFI/OVMF firmware specific settings.
>
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> ---
>  arch/x86/kernel/kvm.c | 28 ++++++++++++++++++++++++++++
>  1 file changed, 28 insertions(+)
>
> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> index 8fcee0b45231..ba6cce3c84af 100644
> --- a/arch/x86/kernel/kvm.c
> +++ b/arch/x86/kernel/kvm.c
> @@ -34,6 +34,7 @@
>  #include <asm/hypervisor.h>
>  #include <asm/tlb.h>
>  #include <asm/cpuidle_haltpoll.h>
> +#include <asm/e820/api.h>
>  
>  static int kvmapf = 1;
>  
> @@ -357,6 +358,33 @@ static void kvm_pv_guest_cpu_reboot(void *unused)
>  	 */
>  	if (kvm_para_has_feature(KVM_FEATURE_PV_EOI))
>  		wrmsrl(MSR_KVM_PV_EOI_EN, 0);
> +	/*
> +	 * Reset the host's page encryption bitmap related to kernel
> +	 * specific page encryption status settings before we load a
> +	 * new kernel by kexec. NOTE: We cannot reset the complete
> +	 * page encryption bitmap here as we need to retain the
> +	 * UEFI/OVMF firmware specific settings.
> +	 */
> +	if (kvm_para_has_feature(KVM_FEATURE_SEV_LIVE_MIGRATION) &&
> +		(smp_processor_id() == 0)) {


In patch 13/14, KVM_FEATURE_SEV_LIVE_MIGRATION is set unconditionally,
and because of that the below code will now be executed on non-SEV
guests. IMO, this feature must be cleared for non-SEV guests to avoid
making unnecessary hypercalls.


> +		unsigned long nr_pages;
> +		int i;
> +
> +		for (i = 0; i < e820_table->nr_entries; i++) {
> +			struct e820_entry *entry = &e820_table->entries[i];
> +			unsigned long start_pfn, end_pfn;
> +
> +			if (entry->type != E820_TYPE_RAM)
> +				continue;
> +
> +			start_pfn = entry->addr >> PAGE_SHIFT;
> +			end_pfn = (entry->addr + entry->size) >> PAGE_SHIFT;
> +			nr_pages = DIV_ROUND_UP(entry->size, PAGE_SIZE);
> +
> +			kvm_sev_hypercall3(KVM_HC_PAGE_ENC_STATUS,
> +				entry->addr, nr_pages, 1);
> +		}
> +	}
>  	kvm_pv_disable_apf();
>  	kvm_disable_steal_time();
>  }


* Re: [PATCH v6 13/14] KVM: x86: Introduce new KVM_FEATURE_SEV_LIVE_MIGRATION feature & Custom MSR.
  2020-03-30 15:52   ` Brijesh Singh
@ 2020-03-30 16:42     ` Ashish Kalra
       [not found]     ` <20200330162730.GA21567@ashkalra_ubuntu_server>
  1 sibling, 0 replies; 107+ messages in thread
From: Ashish Kalra @ 2020-03-30 16:42 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: pbonzini, tglx, mingo, hpa, joro, bp, thomas.lendacky, x86, kvm,
	linux-kernel, rientjes, srutherford, luto

Hello Brijesh,

On Mon, Mar 30, 2020 at 10:52:16AM -0500, Brijesh Singh wrote:
> 
> On 3/30/20 1:23 AM, Ashish Kalra wrote:
> > From: Ashish Kalra <ashish.kalra@amd.com>
> >
> > Add new KVM_FEATURE_SEV_LIVE_MIGRATION feature for guest to check
> > for host-side support for SEV live migration. Also add a new custom
> > MSR_KVM_SEV_LIVE_MIG_EN for guest to enable the SEV live migration
> > feature.
> >
> > Also, ensure that _bss_decrypted section is marked as decrypted in the
> > page encryption bitmap.
> >
> > Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> > ---
> >  Documentation/virt/kvm/cpuid.rst     |  4 ++++
> >  Documentation/virt/kvm/msr.rst       | 10 ++++++++++
> >  arch/x86/include/asm/kvm_host.h      |  3 +++
> >  arch/x86/include/uapi/asm/kvm_para.h |  5 +++++
> >  arch/x86/kernel/kvm.c                |  4 ++++
> >  arch/x86/kvm/cpuid.c                 |  3 ++-
> >  arch/x86/kvm/svm.c                   |  5 +++++
> >  arch/x86/kvm/x86.c                   |  7 +++++++
> >  arch/x86/mm/mem_encrypt.c            | 14 +++++++++++++-
> >  9 files changed, 53 insertions(+), 2 deletions(-)
> 
> 
> IMHO, this patch should be broken into multiple patches as it touches
> guest, and hypervisor at the same time. The first patch can introduce
> the feature flag in the kvm, second patch can make the changes specific
> to svm,  and third patch can focus on how to make use of that feature
> inside the guest. Additionally invoking the HC to clear the
> __bss_decrypted section should be either squash in Patch 10/14 or be a
> separate patch itself.
> 
> 

Ok.

I will also move the __bss_decrypted section HC to a separate patch.

> > diff --git a/Documentation/virt/kvm/cpuid.rst b/Documentation/virt/kvm/cpuid.rst
> > index 01b081f6e7ea..fcb191bb3016 100644
> > --- a/Documentation/virt/kvm/cpuid.rst
> > +++ b/Documentation/virt/kvm/cpuid.rst
> > @@ -86,6 +86,10 @@ KVM_FEATURE_PV_SCHED_YIELD        13          guest checks this feature bit
> >                                                before using paravirtualized
> >                                                sched yield.
> >  
> > +KVM_FEATURE_SEV_LIVE_MIGRATION    14          guest checks this feature bit
> > +                                              before enabling SEV live
> > +                                              migration feature.
> > +
> >  KVM_FEATURE_CLOCSOURCE_STABLE_BIT 24          host will warn if no guest-side
> >                                                per-cpu warps are expeced in
> >                                                kvmclock
> > diff --git a/Documentation/virt/kvm/msr.rst b/Documentation/virt/kvm/msr.rst
> > index 33892036672d..7cd7786bbb03 100644
> > --- a/Documentation/virt/kvm/msr.rst
> > +++ b/Documentation/virt/kvm/msr.rst
> > @@ -319,3 +319,13 @@ data:
> >  
> >  	KVM guests can request the host not to poll on HLT, for example if
> >  	they are performing polling themselves.
> > +
> > +MSR_KVM_SEV_LIVE_MIG_EN:
> > +        0x4b564d06
> > +
> > +	Control SEV Live Migration features.
> > +
> > +data:
> > +        Bit 0 enables (1) or disables (0) host-side SEV Live Migration feature.
> > +        Bit 1 enables (1) or disables (0) support for SEV Live Migration extensions.
> > +        All other bits are reserved.
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index a96ef6338cd2..ad5faaed43c0 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -780,6 +780,9 @@ struct kvm_vcpu_arch {
> >  
> >  	u64 msr_kvm_poll_control;
> >  
> > +	/* SEV Live Migration MSR (AMD only) */
> > +	u64 msr_kvm_sev_live_migration_flag;
> > +
> >  	/*
> >  	 * Indicates the guest is trying to write a gfn that contains one or
> >  	 * more of the PTEs used to translate the write itself, i.e. the access
> > diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
> > index 2a8e0b6b9805..d9d4953b42ad 100644
> > --- a/arch/x86/include/uapi/asm/kvm_para.h
> > +++ b/arch/x86/include/uapi/asm/kvm_para.h
> > @@ -31,6 +31,7 @@
> >  #define KVM_FEATURE_PV_SEND_IPI	11
> >  #define KVM_FEATURE_POLL_CONTROL	12
> >  #define KVM_FEATURE_PV_SCHED_YIELD	13
> > +#define KVM_FEATURE_SEV_LIVE_MIGRATION	14
> >  
> >  #define KVM_HINTS_REALTIME      0
> >  
> > @@ -50,6 +51,7 @@
> >  #define MSR_KVM_STEAL_TIME  0x4b564d03
> >  #define MSR_KVM_PV_EOI_EN      0x4b564d04
> >  #define MSR_KVM_POLL_CONTROL	0x4b564d05
> > +#define MSR_KVM_SEV_LIVE_MIG_EN	0x4b564d06
> >  
> >  struct kvm_steal_time {
> >  	__u64 steal;
> > @@ -122,4 +124,7 @@ struct kvm_vcpu_pv_apf_data {
> >  #define KVM_PV_EOI_ENABLED KVM_PV_EOI_MASK
> >  #define KVM_PV_EOI_DISABLED 0x0
> >  
> > +#define KVM_SEV_LIVE_MIGRATION_ENABLED			(1 << 0)
> > +#define KVM_SEV_LIVE_MIGRATION_EXTENSIONS_SUPPORTED	(1 << 1)
> > +
> >  #endif /* _UAPI_ASM_X86_KVM_PARA_H */
> > diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> > index 6efe0410fb72..8fcee0b45231 100644
> > --- a/arch/x86/kernel/kvm.c
> > +++ b/arch/x86/kernel/kvm.c
> > @@ -418,6 +418,10 @@ static void __init sev_map_percpu_data(void)
> >  	if (!sev_active())
> >  		return;
> >  
> > +	if (kvm_para_has_feature(KVM_FEATURE_SEV_LIVE_MIGRATION)) {
> > +		wrmsrl(MSR_KVM_SEV_LIVE_MIG_EN, KVM_SEV_LIVE_MIGRATION_ENABLED);
> > +	}
> > +
> >  	for_each_possible_cpu(cpu) {
> >  		__set_percpu_decrypted(&per_cpu(apf_reason, cpu), sizeof(apf_reason));
> >  		__set_percpu_decrypted(&per_cpu(steal_time, cpu), sizeof(steal_time));
> > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> > index b1c469446b07..74c8b2a7270c 100644
> > --- a/arch/x86/kvm/cpuid.c
> > +++ b/arch/x86/kvm/cpuid.c
> > @@ -716,7 +716,8 @@ static inline int __do_cpuid_func(struct kvm_cpuid_entry2 *entry, u32 function,
> >  			     (1 << KVM_FEATURE_ASYNC_PF_VMEXIT) |
> >  			     (1 << KVM_FEATURE_PV_SEND_IPI) |
> >  			     (1 << KVM_FEATURE_POLL_CONTROL) |
> > -			     (1 << KVM_FEATURE_PV_SCHED_YIELD);
> > +			     (1 << KVM_FEATURE_PV_SCHED_YIELD) |
> > +			     (1 << KVM_FEATURE_SEV_LIVE_MIGRATION);
> 
> 
> Do we want to enable this feature unconditionally ? Who will clear the
> feature flags for the non-SEV guest ?
>

The guest only enables/activates this feature if SEV is active.

> >  
> >  		if (sched_info_on())
> >  			entry->eax |= (1 << KVM_FEATURE_STEAL_TIME);
> > diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> > index c99b0207a443..60ddc242a133 100644
> > --- a/arch/x86/kvm/svm.c
> > +++ b/arch/x86/kvm/svm.c
> > @@ -7632,6 +7632,7 @@ static int svm_page_enc_status_hc(struct kvm *kvm, unsigned long gpa,
> >  				  unsigned long npages, unsigned long enc)
> >  {
> >  	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> > +	struct kvm_vcpu *vcpu = kvm->vcpus[0];
> >  	kvm_pfn_t pfn_start, pfn_end;
> >  	gfn_t gfn_start, gfn_end;
> >  	int ret;
> > @@ -7639,6 +7640,10 @@ static int svm_page_enc_status_hc(struct kvm *kvm, unsigned long gpa,
> >  	if (!sev_guest(kvm))
> >  		return -EINVAL;
> >  
> > +	if (!(vcpu->arch.msr_kvm_sev_live_migration_flag &
> > +		KVM_SEV_LIVE_MIGRATION_ENABLED))
> > +		return -ENOTTY;
> > +
> >  	if (!npages)
> >  		return 0;
> >  
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 2127ed937f53..82867b8798f8 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -2880,6 +2880,10 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> >  		vcpu->arch.msr_kvm_poll_control = data;
> >  		break;
> >  
> > +	case MSR_KVM_SEV_LIVE_MIG_EN:
> > +		vcpu->arch.msr_kvm_sev_live_migration_flag = data;
> > +		break;
> > +
> >  	case MSR_IA32_MCG_CTL:
> >  	case MSR_IA32_MCG_STATUS:
> >  	case MSR_IA32_MC0_CTL ... MSR_IA32_MCx_CTL(KVM_MAX_MCE_BANKS) - 1:
> > @@ -3126,6 +3130,9 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> >  	case MSR_KVM_POLL_CONTROL:
> >  		msr_info->data = vcpu->arch.msr_kvm_poll_control;
> >  		break;
> > +	case MSR_KVM_SEV_LIVE_MIG_EN:
> > +		msr_info->data = vcpu->arch.msr_kvm_sev_live_migration_flag;
> > +		break;
> >  	case MSR_IA32_P5_MC_ADDR:
> >  	case MSR_IA32_P5_MC_TYPE:
> >  	case MSR_IA32_MCG_CAP:
> > diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
> > index c9800fa811f6..f6a841494845 100644
> > --- a/arch/x86/mm/mem_encrypt.c
> > +++ b/arch/x86/mm/mem_encrypt.c
> > @@ -502,8 +502,20 @@ void __init mem_encrypt_init(void)
> >  	 * With SEV, we need to make a hypercall when page encryption state is
> >  	 * changed.
> >  	 */
> > -	if (sev_active())
> > +	if (sev_active()) {
> > +		unsigned long nr_pages;
> > +
> >  		pv_ops.mmu.page_encryption_changed = set_memory_enc_dec_hypercall;
> > +
> > +		/*
> > +		 * Ensure that _bss_decrypted section is marked as decrypted in the
> > +		 * page encryption bitmap.
> > +		 */
> > +		nr_pages = DIV_ROUND_UP(__end_bss_decrypted - __start_bss_decrypted,
> > +			PAGE_SIZE);
> > +		set_memory_enc_dec_hypercall((unsigned long)__start_bss_decrypted,
> > +			nr_pages, 0);
> > +	}
> 
> 
> Isn't this too late, should we be making hypercall at the same time we
> clear the encryption bit ?
> 
>

Actually this is being done somewhat lazily. After the guest
enables/activates the live migration feature, it should be fine to do it
here, or it can be moved into sev_map_percpu_data() where the first
hypercalls are made; in both cases the __bss_decrypted section will be
marked before the live migration process is initiated.
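The page count passed in that __bss_decrypted hypercall is just a rounded-up division over the section bounds. A minimal userspace check of the arithmetic (the region addresses below are arbitrary examples, not real section addresses):

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE	4096UL
/* Same rounding the kernel's DIV_ROUND_UP macro performs */
#define DIV_ROUND_UP(n, d)	(((n) + (d) - 1) / (d))

/* Number of 4K pages needed to cover [start, end), as in the patch hunk */
static unsigned long region_pages(uint64_t start, uint64_t end)
{
	return DIV_ROUND_UP(end - start, PAGE_SIZE);
}
```

The rounding matters: if the section size is not page-aligned, truncating instead of rounding up would leave the tail page marked encrypted in the bitmap while the guest treats it as decrypted.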

> >  #endif
> >  
> >  	pr_info("AMD %s active\n",

Thanks,
Ashish


* Re: [PATCH v6 14/14] KVM: x86: Add kexec support for SEV Live Migration.
  2020-03-30 16:00   ` Brijesh Singh
@ 2020-03-30 16:45     ` Ashish Kalra
  2020-03-31 14:26       ` Brijesh Singh
  0 siblings, 1 reply; 107+ messages in thread
From: Ashish Kalra @ 2020-03-30 16:45 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: pbonzini, tglx, mingo, hpa, joro, bp, thomas.lendacky, x86, kvm,
	linux-kernel, rientjes, srutherford, luto

Hello Brijesh,

On Mon, Mar 30, 2020 at 11:00:14AM -0500, Brijesh Singh wrote:
> 
> On 3/30/20 1:23 AM, Ashish Kalra wrote:
> > From: Ashish Kalra <ashish.kalra@amd.com>
> >
> > Reset the host's page encryption bitmap related to kernel
> > specific page encryption status settings before we load a
> > new kernel by kexec. We cannot reset the complete
> > page encryption bitmap here as we need to retain the
> > UEFI/OVMF firmware specific settings.
> >
> > Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> > ---
> >  arch/x86/kernel/kvm.c | 28 ++++++++++++++++++++++++++++
> >  1 file changed, 28 insertions(+)
> >
> > diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> > index 8fcee0b45231..ba6cce3c84af 100644
> > --- a/arch/x86/kernel/kvm.c
> > +++ b/arch/x86/kernel/kvm.c
> > @@ -34,6 +34,7 @@
> >  #include <asm/hypervisor.h>
> >  #include <asm/tlb.h>
> >  #include <asm/cpuidle_haltpoll.h>
> > +#include <asm/e820/api.h>
> >  
> >  static int kvmapf = 1;
> >  
> > @@ -357,6 +358,33 @@ static void kvm_pv_guest_cpu_reboot(void *unused)
> >  	 */
> >  	if (kvm_para_has_feature(KVM_FEATURE_PV_EOI))
> >  		wrmsrl(MSR_KVM_PV_EOI_EN, 0);
> > +	/*
> > +	 * Reset the host's page encryption bitmap related to kernel
> > +	 * specific page encryption status settings before we load a
> > +	 * new kernel by kexec. NOTE: We cannot reset the complete
> > +	 * page encryption bitmap here as we need to retain the
> > +	 * UEFI/OVMF firmware specific settings.
> > +	 */
> > +	if (kvm_para_has_feature(KVM_FEATURE_SEV_LIVE_MIGRATION) &&
> > +		(smp_processor_id() == 0)) {
> 
> 
> In patch 13/14, the KVM_FEATURE_SEV_LIVE_MIGRATION is set
> unconditionally and because of that now the below code will be executed
> on non-SEV guest. IMO, this feature must be cleared for non-SEV guest to
> avoid making unnecessary hypercall's.
> 
>

I will additionally add a sev_active() check here to ensure that we don't make unnecessary hypercalls on non-SEV guests.
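The per-entry arithmetic in that kexec reset loop (skip non-RAM entries, convert addr/size into a page count for the hypercall) can be sanity-checked in userspace with a mocked table. This is a sketch only — the struct layout is simplified and the demo entries are arbitrary values, not a real e820 map:

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT	12
#define PAGE_SIZE	(1UL << PAGE_SHIFT)
#define DIV_ROUND_UP(n, d)	(((n) + (d) - 1) / (d))

enum { E820_TYPE_RAM = 1, E820_TYPE_RESERVED = 2 };

struct e820_entry {
	uint64_t addr;
	uint64_t size;
	int type;
};

/* Arbitrary demo map: two RAM entries with a reserved hole between them */
static const struct e820_entry demo_table[] = {
	{ 0x0,      0x9f000,  E820_TYPE_RAM },		/* 159 pages */
	{ 0x9f000,  0x61000,  E820_TYPE_RESERVED },	/* skipped */
	{ 0x100000, 0x200000, E820_TYPE_RAM },		/* 512 pages */
};

/* Sum the pages the reboot path would re-mark encrypted: RAM entries only */
static unsigned long pages_to_reset(const struct e820_entry *tbl, int n)
{
	unsigned long total = 0;

	for (int i = 0; i < n; i++) {
		if (tbl[i].type != E820_TYPE_RAM)
			continue;
		total += DIV_ROUND_UP(tbl[i].size, PAGE_SIZE);
	}
	return total;
}
```

Skipping non-RAM entries is what preserves the UEFI/OVMF firmware settings the commit message mentions: only kernel-usable RAM is reset back to the encrypted default before kexec.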

> > +		unsigned long nr_pages;
> > +		int i;
> > +
> > +		for (i = 0; i < e820_table->nr_entries; i++) {
> > +			struct e820_entry *entry = &e820_table->entries[i];
> > +			unsigned long start_pfn, end_pfn;
> > +
> > +			if (entry->type != E820_TYPE_RAM)
> > +				continue;
> > +
> > +			start_pfn = entry->addr >> PAGE_SHIFT;
> > +			end_pfn = (entry->addr + entry->size) >> PAGE_SHIFT;
> > +			nr_pages = DIV_ROUND_UP(entry->size, PAGE_SIZE);
> > +
> > +			kvm_sev_hypercall3(KVM_HC_PAGE_ENC_STATUS,
> > +				entry->addr, nr_pages, 1);
> > +		}
> > +	}
> >  	kvm_pv_disable_apf();
> >  	kvm_disable_steal_time();
> >  }

Thanks,
Ashish


* Re: [PATCH v6 00/14] Add AMD SEV guest live migration support
  2020-03-30  6:19 [PATCH v6 00/14] Add AMD SEV guest live migration support Ashish Kalra
                   ` (13 preceding siblings ...)
  2020-03-30  6:23 ` [PATCH v6 14/14] KVM: x86: Add kexec support for SEV Live Migration Ashish Kalra
@ 2020-03-30 17:24 ` Venu Busireddy
  2020-03-30 18:28   ` Ashish Kalra
  14 siblings, 1 reply; 107+ messages in thread
From: Venu Busireddy @ 2020-03-30 17:24 UTC (permalink / raw)
  To: Ashish Kalra
  Cc: pbonzini, tglx, mingo, hpa, joro, bp, thomas.lendacky, x86, kvm,
	linux-kernel, rientjes, srutherford, luto, brijesh.singh

On 2020-03-30 06:19:27 +0000, Ashish Kalra wrote:
> From: Ashish Kalra <ashish.kalra@amd.com>
> 
> The series add support for AMD SEV guest live migration commands. To protect the
> confidentiality of an SEV protected guest memory while in transit we need to
> use the SEV commands defined in SEV API spec [1].
> 
> SEV guest VMs have the concept of private and shared memory. Private memory
> is encrypted with the guest-specific key, while shared memory may be encrypted
> with hypervisor key. The commands provided by the SEV FW are meant to be used
> for the private memory only. The patch series introduces a new hypercall.
> The guest OS can use this hypercall to notify the page encryption status.
> If the page is encrypted with guest specific-key then we use SEV command during
> the migration. If page is not encrypted then fallback to default.
> 
> The patch adds new ioctls KVM_{SET,GET}_PAGE_ENC_BITMAP. The ioctl can be used
> by the qemu to get the page encrypted bitmap. Qemu can consult this bitmap
> during the migration to know whether the page is encrypted.
> 
> [1] https://developer.amd.com/wp-content/resources/55766.PDF
> 
> Changes since v5:
> - Fix build errors as
>   Reported-by: kbuild test robot <lkp@intel.com>

Which upstream tag should I use to apply this patch set? I tried the
top of Linus's tree, and I get the following error when I apply this
patch set.

$ git am PATCH-v6-01-14-KVM-SVM-Add-KVM_SEV-SEND_START-command.mbox
Applying: KVM: SVM: Add KVM_SEV SEND_START command
Applying: KVM: SVM: Add KVM_SEND_UPDATE_DATA command
Applying: KVM: SVM: Add KVM_SEV_SEND_FINISH command
Applying: KVM: SVM: Add KVM_SEV_RECEIVE_FINISH command
error: patch failed: Documentation/virt/kvm/amd-memory-encryption.rst:375
error: Documentation/virt/kvm/amd-memory-encryption.rst: patch does not apply
error: patch failed: arch/x86/kvm/svm.c:7632
error: arch/x86/kvm/svm.c: patch does not apply
Patch failed at 0004 KVM: SVM: Add KVM_SEV_RECEIVE_FINISH command

Thanks,

Venu

> 
> Changes since v4:
> - Host support has been added to extend KVM capabilities/feature bits to 
>   include a new KVM_FEATURE_SEV_LIVE_MIGRATION, which the guest can
>   query for host-side support for SEV live migration and a new custom MSR
>   MSR_KVM_SEV_LIVE_MIG_EN is added for guest to enable the SEV live
>   migration feature.
> - Ensure that _bss_decrypted section is marked as decrypted in the
>   page encryption bitmap.
> - Fixing KVM_GET_PAGE_ENC_BITMAP ioctl to return the correct bitmap
>   as per the number of pages being requested by the user. Ensure that
>   we only copy bmap->num_pages bytes in the userspace buffer, if
>   bmap->num_pages is not byte aligned we read the trailing bits
>   from the userspace and copy those bits as is. This fixes guest
>   page(s) corruption issues observed after migration completion.
> - Add kexec support for SEV Live Migration to reset the host's
>   page encryption bitmap related to kernel specific page encryption
>   status settings before we load a new kernel by kexec. We cannot
>   reset the complete page encryption bitmap here as we need to
>   retain the UEFI/OVMF firmware specific settings.
> 
> Changes since v3:
> - Rebasing to mainline and testing.
> - Adding a new KVM_PAGE_ENC_BITMAP_RESET ioctl, which resets the 
>   page encryption bitmap on a guest reboot event.
> - Adding a more reliable sanity check for GPA range being passed to
>   the hypercall to ensure that guest MMIO ranges are also marked
>   in the page encryption bitmap.
> 
> Changes since v2:
>  - reset the page encryption bitmap on vcpu reboot
> 
> Changes since v1:
>  - Add support to share the page encryption between the source and target
>    machine.
>  - Fix review feedbacks from Tom Lendacky.
>  - Add check to limit the session blob length.
>  - Update KVM_GET_PAGE_ENC_BITMAP icotl to use the base_gfn instead of
>    the memory slot when querying the bitmap.
> 
> Ashish Kalra (3):
>   KVM: x86: Introduce KVM_PAGE_ENC_BITMAP_RESET ioctl
>   KVM: x86: Introduce new KVM_FEATURE_SEV_LIVE_MIGRATION feature &
>     Custom MSR.
>   KVM: x86: Add kexec support for SEV Live Migration.
> 
> Brijesh Singh (11):
>   KVM: SVM: Add KVM_SEV SEND_START command
>   KVM: SVM: Add KVM_SEND_UPDATE_DATA command
>   KVM: SVM: Add KVM_SEV_SEND_FINISH command
>   KVM: SVM: Add support for KVM_SEV_RECEIVE_START command
>   KVM: SVM: Add KVM_SEV_RECEIVE_UPDATE_DATA command
>   KVM: SVM: Add KVM_SEV_RECEIVE_FINISH command
>   KVM: x86: Add AMD SEV specific Hypercall3
>   KVM: X86: Introduce KVM_HC_PAGE_ENC_STATUS hypercall
>   KVM: x86: Introduce KVM_GET_PAGE_ENC_BITMAP ioctl
>   mm: x86: Invoke hypercall when page encryption status is changed
>   KVM: x86: Introduce KVM_SET_PAGE_ENC_BITMAP ioctl
> 
>  .../virt/kvm/amd-memory-encryption.rst        | 120 +++
>  Documentation/virt/kvm/api.rst                |  62 ++
>  Documentation/virt/kvm/cpuid.rst              |   4 +
>  Documentation/virt/kvm/hypercalls.rst         |  15 +
>  Documentation/virt/kvm/msr.rst                |  10 +
>  arch/x86/include/asm/kvm_host.h               |  10 +
>  arch/x86/include/asm/kvm_para.h               |  12 +
>  arch/x86/include/asm/paravirt.h               |  10 +
>  arch/x86/include/asm/paravirt_types.h         |   2 +
>  arch/x86/include/uapi/asm/kvm_para.h          |   5 +
>  arch/x86/kernel/kvm.c                         |  32 +
>  arch/x86/kernel/paravirt.c                    |   1 +
>  arch/x86/kvm/cpuid.c                          |   3 +-
>  arch/x86/kvm/svm.c                            | 699 +++++++++++++++++-
>  arch/x86/kvm/vmx/vmx.c                        |   1 +
>  arch/x86/kvm/x86.c                            |  43 ++
>  arch/x86/mm/mem_encrypt.c                     |  69 +-
>  arch/x86/mm/pat/set_memory.c                  |   7 +
>  include/linux/psp-sev.h                       |   8 +-
>  include/uapi/linux/kvm.h                      |  53 ++
>  include/uapi/linux/kvm_para.h                 |   1 +
>  21 files changed, 1157 insertions(+), 10 deletions(-)
> 
> -- 
> 2.17.1
> 


* Re: [PATCH v6 00/14] Add AMD SEV guest live migration support
  2020-03-30 17:24 ` [PATCH v6 00/14] Add AMD SEV guest live migration support Venu Busireddy
@ 2020-03-30 18:28   ` Ashish Kalra
  2020-03-30 19:13     ` Venu Busireddy
  0 siblings, 1 reply; 107+ messages in thread
From: Ashish Kalra @ 2020-03-30 18:28 UTC (permalink / raw)
  To: Venu Busireddy
  Cc: pbonzini, tglx, mingo, hpa, joro, bp, thomas.lendacky, x86, kvm,
	linux-kernel, rientjes, srutherford, luto, brijesh.singh

This is applied on top of Linux 5.6, as per commit below :

commit 7111951b8d4973bda27ff663f2cf18b663d15b48 (tag: v5.6, origin/master, origin/HEAD)
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Sun Mar 29 15:25:41 2020 -0700

    Linux 5.6

 Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Thanks,
Ashish

On Mon, Mar 30, 2020 at 12:24:46PM -0500, Venu Busireddy wrote:
> On 2020-03-30 06:19:27 +0000, Ashish Kalra wrote:
> > From: Ashish Kalra <ashish.kalra@amd.com>
> > 
> > The series add support for AMD SEV guest live migration commands. To protect the
> > confidentiality of an SEV protected guest memory while in transit we need to
> > use the SEV commands defined in SEV API spec [1].
> > 
> > SEV guest VMs have the concept of private and shared memory. Private memory
> > is encrypted with the guest-specific key, while shared memory may be encrypted
> > with hypervisor key. The commands provided by the SEV FW are meant to be used
> > for the private memory only. The patch series introduces a new hypercall.
> > The guest OS can use this hypercall to notify the page encryption status.
> > If the page is encrypted with guest specific-key then we use SEV command during
> > the migration. If page is not encrypted then fallback to default.
> > 
> > The patch adds new ioctls KVM_{SET,GET}_PAGE_ENC_BITMAP. The ioctl can be used
> > by the qemu to get the page encrypted bitmap. Qemu can consult this bitmap
> > during the migration to know whether the page is encrypted.
> > 
> > [1] https://developer.amd.com/wp-content/resources/55766.PDF
> > 
> > Changes since v5:
> > - Fix build errors as
> >   Reported-by: kbuild test robot <lkp@intel.com>
> 
> Which upstream tag should I use to apply this patch set? I tried the
> top of Linus's tree, and I get the following error when I apply this
> patch set.
> 
> $ git am PATCH-v6-01-14-KVM-SVM-Add-KVM_SEV-SEND_START-command.mbox
> Applying: KVM: SVM: Add KVM_SEV SEND_START command
> Applying: KVM: SVM: Add KVM_SEND_UPDATE_DATA command
> Applying: KVM: SVM: Add KVM_SEV_SEND_FINISH command
> Applying: KVM: SVM: Add KVM_SEV_RECEIVE_FINISH command
> error: patch failed: Documentation/virt/kvm/amd-memory-encryption.rst:375
> error: Documentation/virt/kvm/amd-memory-encryption.rst: patch does not apply
> error: patch failed: arch/x86/kvm/svm.c:7632
> error: arch/x86/kvm/svm.c: patch does not apply
> Patch failed at 0004 KVM: SVM: Add KVM_SEV_RECEIVE_FINISH command
> 
> Thanks,
> 
> Venu
> 
> > 
> > Changes since v4:
> > - Host support has been added to extend KVM capabilities/feature bits to 
> >   include a new KVM_FEATURE_SEV_LIVE_MIGRATION, which the guest can
> >   query for host-side support for SEV live migration and a new custom MSR
> >   MSR_KVM_SEV_LIVE_MIG_EN is added for guest to enable the SEV live
> >   migration feature.
> > - Ensure that _bss_decrypted section is marked as decrypted in the
> >   page encryption bitmap.
> > - Fixing KVM_GET_PAGE_ENC_BITMAP ioctl to return the correct bitmap
> >   as per the number of pages being requested by the user. Ensure that
> >   we only copy bmap->num_pages bytes in the userspace buffer, if
> >   bmap->num_pages is not byte aligned we read the trailing bits
> >   from the userspace and copy those bits as is. This fixes guest
> >   page(s) corruption issues observed after migration completion.
> > - Add kexec support for SEV Live Migration to reset the host's
> >   page encryption bitmap related to kernel specific page encryption
> >   status settings before we load a new kernel by kexec. We cannot
> >   reset the complete page encryption bitmap here as we need to
> >   retain the UEFI/OVMF firmware specific settings.
> > 
> > Changes since v3:
> > - Rebasing to mainline and testing.
> > - Adding a new KVM_PAGE_ENC_BITMAP_RESET ioctl, which resets the 
> >   page encryption bitmap on a guest reboot event.
> > - Adding a more reliable sanity check for GPA range being passed to
> >   the hypercall to ensure that guest MMIO ranges are also marked
> >   in the page encryption bitmap.
> > 
> > Changes since v2:
> >  - reset the page encryption bitmap on vcpu reboot
> > 
> > Changes since v1:
> >  - Add support to share the page encryption between the source and target
> >    machine.
> >  - Fix review feedbacks from Tom Lendacky.
> >  - Add check to limit the session blob length.
> >  - Update KVM_GET_PAGE_ENC_BITMAP icotl to use the base_gfn instead of
> >    the memory slot when querying the bitmap.
> > 
> > Ashish Kalra (3):
> >   KVM: x86: Introduce KVM_PAGE_ENC_BITMAP_RESET ioctl
> >   KVM: x86: Introduce new KVM_FEATURE_SEV_LIVE_MIGRATION feature &
> >     Custom MSR.
> >   KVM: x86: Add kexec support for SEV Live Migration.
> > 
> > Brijesh Singh (11):
> >   KVM: SVM: Add KVM_SEV SEND_START command
> >   KVM: SVM: Add KVM_SEND_UPDATE_DATA command
> >   KVM: SVM: Add KVM_SEV_SEND_FINISH command
> >   KVM: SVM: Add support for KVM_SEV_RECEIVE_START command
> >   KVM: SVM: Add KVM_SEV_RECEIVE_UPDATE_DATA command
> >   KVM: SVM: Add KVM_SEV_RECEIVE_FINISH command
> >   KVM: x86: Add AMD SEV specific Hypercall3
> >   KVM: X86: Introduce KVM_HC_PAGE_ENC_STATUS hypercall
> >   KVM: x86: Introduce KVM_GET_PAGE_ENC_BITMAP ioctl
> >   mm: x86: Invoke hypercall when page encryption status is changed
> >   KVM: x86: Introduce KVM_SET_PAGE_ENC_BITMAP ioctl
> > 
> >  .../virt/kvm/amd-memory-encryption.rst        | 120 +++
> >  Documentation/virt/kvm/api.rst                |  62 ++
> >  Documentation/virt/kvm/cpuid.rst              |   4 +
> >  Documentation/virt/kvm/hypercalls.rst         |  15 +
> >  Documentation/virt/kvm/msr.rst                |  10 +
> >  arch/x86/include/asm/kvm_host.h               |  10 +
> >  arch/x86/include/asm/kvm_para.h               |  12 +
> >  arch/x86/include/asm/paravirt.h               |  10 +
> >  arch/x86/include/asm/paravirt_types.h         |   2 +
> >  arch/x86/include/uapi/asm/kvm_para.h          |   5 +
> >  arch/x86/kernel/kvm.c                         |  32 +
> >  arch/x86/kernel/paravirt.c                    |   1 +
> >  arch/x86/kvm/cpuid.c                          |   3 +-
> >  arch/x86/kvm/svm.c                            | 699 +++++++++++++++++-
> >  arch/x86/kvm/vmx/vmx.c                        |   1 +
> >  arch/x86/kvm/x86.c                            |  43 ++
> >  arch/x86/mm/mem_encrypt.c                     |  69 +-
> >  arch/x86/mm/pat/set_memory.c                  |   7 +
> >  include/linux/psp-sev.h                       |   8 +-
> >  include/uapi/linux/kvm.h                      |  53 ++
> >  include/uapi/linux/kvm_para.h                 |   1 +
> >  21 files changed, 1157 insertions(+), 10 deletions(-)
> > 
> > -- 
> > 2.17.1
> > 

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v6 00/14] Add AMD SEV guest live migration support
  2020-03-30 18:28   ` Ashish Kalra
@ 2020-03-30 19:13     ` Venu Busireddy
  2020-03-30 21:52       ` Ashish Kalra
  0 siblings, 1 reply; 107+ messages in thread
From: Venu Busireddy @ 2020-03-30 19:13 UTC (permalink / raw)
  To: Ashish Kalra
  Cc: pbonzini, tglx, mingo, hpa, joro, bp, thomas.lendacky, x86, kvm,
	linux-kernel, rientjes, srutherford, luto, brijesh.singh

On 2020-03-30 18:28:45 +0000, Ashish Kalra wrote:
> This is applied on top of Linux 5.6, as per the commit below:
> 
> commit 7111951b8d4973bda27ff663f2cf18b663d15b48 (tag: v5.6, origin/master, origin/HEAD)
> Author: Linus Torvalds <torvalds@linux-foundation.org>
> Date:   Sun Mar 29 15:25:41 2020 -0700
> 
>     Linux 5.6
> 
>  Makefile | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Not sure what I am missing here! This is the current state of my sandbox:

$ git remote -v
origin  git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git (fetch)
origin  git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git (push)

$ git log --oneline
12acbbfef749 (HEAD -> master) KVM: SVM: Add KVM_SEV_SEND_FINISH command
e5f21e48bfff KVM: SVM: Add KVM_SEND_UPDATE_DATA command
6b2bcf682d08 KVM: SVM: Add KVM_SEV SEND_START command
7111951b8d49 (tag: v5.6, origin/master, origin/HEAD) Linux 5.6

$ git status
On branch master
Your branch is ahead of 'origin/master' by 3 commits.

As can be seen, I started with the commit (7111951b8d49) you mentioned.
I could apply 3 of the patches, but 04/14 is failing.

Any suggestions?

Thanks,

Venu

> Thanks,
> Ashish
> 
> On Mon, Mar 30, 2020 at 12:24:46PM -0500, Venu Busireddy wrote:
> > On 2020-03-30 06:19:27 +0000, Ashish Kalra wrote:
> > > From: Ashish Kalra <ashish.kalra@amd.com>
> > > 
> > > The series adds support for the AMD SEV guest live migration commands. To
> > > protect the confidentiality of SEV-protected guest memory while in transit,
> > > we need to use the SEV commands defined in the SEV API spec [1].
> > > 
> > > SEV guest VMs have the concept of private and shared memory. Private memory
> > > is encrypted with a guest-specific key, while shared memory may be encrypted
> > > with the hypervisor key. The commands provided by the SEV firmware are meant
> > > to be used for private memory only. The patch series introduces a new
> > > hypercall that the guest OS can use to notify the hypervisor of a page's
> > > encryption status. If a page is encrypted with the guest-specific key, the
> > > SEV commands are used during migration; if not, we fall back to the default
> > > path.
> > > 
> > > The series adds new ioctls, KVM_{SET,GET}_PAGE_ENC_BITMAP. QEMU can use
> > > these ioctls to retrieve the page encryption bitmap and consult it during
> > > migration to determine whether a page is encrypted.
> > > 
> > > [1] https://developer.amd.com/wp-content/resources/55766.PDF
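To make the bitmap semantics described above concrete, here is a minimal userspace-style sketch. The helper names are hypothetical (KVM's real bitmap lives in kernel memory and is manipulated via the hypercall and ioctls); the sketch only illustrates the one-bit-per-guest-page-frame idea, with a set bit meaning encrypted/private and a clear bit meaning shared:

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical helper mirroring the semantics of KVM_HC_PAGE_ENC_STATUS:
 * mark a run of guest page frames as encrypted (enc = 1) or shared (enc = 0). */
static void set_enc_status(unsigned long *bmap, unsigned long gfn,
                           unsigned long npages, int enc)
{
    for (unsigned long i = gfn; i < gfn + npages; i++) {
        if (enc)
            bmap[i / BITS_PER_LONG] |= 1UL << (i % BITS_PER_LONG);
        else
            bmap[i / BITS_PER_LONG] &= ~(1UL << (i % BITS_PER_LONG));
    }
}

/* The question a migration loop would ask per page: send it via the SEV
 * commands (encrypted) or via the default path (shared)? */
static int is_encrypted(const unsigned long *bmap, unsigned long gfn)
{
    return !!(bmap[gfn / BITS_PER_LONG] & (1UL << (gfn % BITS_PER_LONG)));
}
```

QEMU would consult a bitmap of this shape (fetched with KVM_GET_PAGE_ENC_BITMAP) for each page it streams during migration.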
> > > 
> > > Changes since v5:
> > > - Fix build errors as
> > >   Reported-by: kbuild test robot <lkp@intel.com>
> > 
> > Which upstream tag should I use to apply this patch set? I tried the
> > top of Linus's tree, and I get the following error when I apply this
> > patch set.
> > 
> > $ git am PATCH-v6-01-14-KVM-SVM-Add-KVM_SEV-SEND_START-command.mbox
> > Applying: KVM: SVM: Add KVM_SEV SEND_START command
> > Applying: KVM: SVM: Add KVM_SEND_UPDATE_DATA command
> > Applying: KVM: SVM: Add KVM_SEV_SEND_FINISH command
> > Applying: KVM: SVM: Add KVM_SEV_RECEIVE_FINISH command
> > error: patch failed: Documentation/virt/kvm/amd-memory-encryption.rst:375
> > error: Documentation/virt/kvm/amd-memory-encryption.rst: patch does not apply
> > error: patch failed: arch/x86/kvm/svm.c:7632
> > error: arch/x86/kvm/svm.c: patch does not apply
> > Patch failed at 0004 KVM: SVM: Add KVM_SEV_RECEIVE_FINISH command
> > 
> > Thanks,
> > 
> > Venu
> > 
> > > 
> > > Changes since v4:
> > > - Host support has been added to extend KVM capabilities/feature bits to 
> > >   include a new KVM_FEATURE_SEV_LIVE_MIGRATION, which the guest can
> > >   query for host-side support for SEV live migration and a new custom MSR
> > >   MSR_KVM_SEV_LIVE_MIG_EN is added for guest to enable the SEV live
> > >   migration feature.
> > > - Ensure that _bss_decrypted section is marked as decrypted in the
> > >   page encryption bitmap.
> > > - Fixing KVM_GET_PAGE_ENC_BITMAP ioctl to return the correct bitmap
> > >   as per the number of pages being requested by the user. Ensure that
> > >   we only copy bmap->num_pages bytes in the userspace buffer, if
> > >   bmap->num_pages is not byte aligned we read the trailing bits
> > >   from the userspace and copy those bits as is. This fixes guest
> > >   page(s) corruption issues observed after migration completion.
> > > - Add kexec support for SEV Live Migration to reset the host's
> > >   page encryption bitmap related to kernel specific page encryption
> > >   status settings before we load a new kernel by kexec. We cannot
> > >   reset the complete page encryption bitmap here as we need to
> > >   retain the UEFI/OVMF firmware specific settings.
> > > 
> > > Changes since v3:
> > > - Rebasing to mainline and testing.
> > > - Adding a new KVM_PAGE_ENC_BITMAP_RESET ioctl, which resets the 
> > >   page encryption bitmap on a guest reboot event.
> > > - Adding a more reliable sanity check for GPA range being passed to
> > >   the hypercall to ensure that guest MMIO ranges are also marked
> > >   in the page encryption bitmap.
> > > 
> > > Changes since v2:
> > >  - reset the page encryption bitmap on vcpu reboot
> > > 
> > > Changes since v1:
> > >  - Add support to share the page encryption between the source and target
> > >    machine.
> > >  - Address review feedback from Tom Lendacky.
> > >  - Add a check to limit the session blob length.
> > >  - Update the KVM_GET_PAGE_ENC_BITMAP ioctl to use the base_gfn instead of
> > >    the memory slot when querying the bitmap.
> > > 
> > > Ashish Kalra (3):
> > >   KVM: x86: Introduce KVM_PAGE_ENC_BITMAP_RESET ioctl
> > >   KVM: x86: Introduce new KVM_FEATURE_SEV_LIVE_MIGRATION feature &
> > >     Custom MSR.
> > >   KVM: x86: Add kexec support for SEV Live Migration.
> > > 
> > > Brijesh Singh (11):
> > >   KVM: SVM: Add KVM_SEV SEND_START command
> > >   KVM: SVM: Add KVM_SEND_UPDATE_DATA command
> > >   KVM: SVM: Add KVM_SEV_SEND_FINISH command
> > >   KVM: SVM: Add support for KVM_SEV_RECEIVE_START command
> > >   KVM: SVM: Add KVM_SEV_RECEIVE_UPDATE_DATA command
> > >   KVM: SVM: Add KVM_SEV_RECEIVE_FINISH command
> > >   KVM: x86: Add AMD SEV specific Hypercall3
> > >   KVM: X86: Introduce KVM_HC_PAGE_ENC_STATUS hypercall
> > >   KVM: x86: Introduce KVM_GET_PAGE_ENC_BITMAP ioctl
> > >   mm: x86: Invoke hypercall when page encryption status is changed
> > >   KVM: x86: Introduce KVM_SET_PAGE_ENC_BITMAP ioctl
> > > 
> > >  .../virt/kvm/amd-memory-encryption.rst        | 120 +++
> > >  Documentation/virt/kvm/api.rst                |  62 ++
> > >  Documentation/virt/kvm/cpuid.rst              |   4 +
> > >  Documentation/virt/kvm/hypercalls.rst         |  15 +
> > >  Documentation/virt/kvm/msr.rst                |  10 +
> > >  arch/x86/include/asm/kvm_host.h               |  10 +
> > >  arch/x86/include/asm/kvm_para.h               |  12 +
> > >  arch/x86/include/asm/paravirt.h               |  10 +
> > >  arch/x86/include/asm/paravirt_types.h         |   2 +
> > >  arch/x86/include/uapi/asm/kvm_para.h          |   5 +
> > >  arch/x86/kernel/kvm.c                         |  32 +
> > >  arch/x86/kernel/paravirt.c                    |   1 +
> > >  arch/x86/kvm/cpuid.c                          |   3 +-
> > >  arch/x86/kvm/svm.c                            | 699 +++++++++++++++++-
> > >  arch/x86/kvm/vmx/vmx.c                        |   1 +
> > >  arch/x86/kvm/x86.c                            |  43 ++
> > >  arch/x86/mm/mem_encrypt.c                     |  69 +-
> > >  arch/x86/mm/pat/set_memory.c                  |   7 +
> > >  include/linux/psp-sev.h                       |   8 +-
> > >  include/uapi/linux/kvm.h                      |  53 ++
> > >  include/uapi/linux/kvm_para.h                 |   1 +
> > >  21 files changed, 1157 insertions(+), 10 deletions(-)
> > > 
> > > -- 
> > > 2.17.1
> > > 

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v6 00/14] Add AMD SEV guest live migration support
  2020-03-30 19:13     ` Venu Busireddy
@ 2020-03-30 21:52       ` Ashish Kalra
  2020-03-31 14:42         ` Venu Busireddy
  0 siblings, 1 reply; 107+ messages in thread
From: Ashish Kalra @ 2020-03-30 21:52 UTC (permalink / raw)
  To: Venu Busireddy
  Cc: pbonzini, tglx, mingo, hpa, joro, bp, thomas.lendacky, x86, kvm,
	linux-kernel, rientjes, srutherford, luto, brijesh.singh

I just did a fresh clone of Linus's tree and I can apply these
patches cleanly on top of it.

Thanks,
Ashish

On Mon, Mar 30, 2020 at 02:13:07PM -0500, Venu Busireddy wrote:
> On 2020-03-30 18:28:45 +0000, Ashish Kalra wrote:
> > This is applied on top of Linux 5.6, as per commit below :
> > 
> > commit 7111951b8d4973bda27ff663f2cf18b663d15b48 (tag: v5.6, origin/master, origin/HEAD)
> > Author: Linus Torvalds <torvalds@linux-foundation.org>
> > Date:   Sun Mar 29 15:25:41 2020 -0700
> > 
> >     Linux 5.6
> > 
> >  Makefile | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> Not sure what I am missing here! This is the current state of my sandbox:
> 
> $ git remote -v
> origin  git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git (fetch)
> origin  git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git (push)
> 
> $ git log --oneline
> 12acbbfef749 (HEAD -> master) KVM: SVM: Add KVM_SEV_SEND_FINISH command
> e5f21e48bfff KVM: SVM: Add KVM_SEND_UPDATE_DATA command
> 6b2bcf682d08 KVM: SVM: Add KVM_SEV SEND_START command
> 7111951b8d49 (tag: v5.6, origin/master, origin/HEAD) Linux 5.6
> 
> $ git status
> On branch master
> Your branch is ahead of 'origin/master' by 3 commits.
> 
> As can be seen, I started with the commit (7111951b8d49) you mentioned.
> I could apply 3 of the patches, but 04/14 is failing.
> 
> Any suggestions?
> 
> Thanks,
> 
> Venu
> 
> > [...]

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v6 14/14] KVM: x86: Add kexec support for SEV Live Migration.
  2020-03-30 16:45     ` Ashish Kalra
@ 2020-03-31 14:26       ` Brijesh Singh
  2020-04-02 23:34         ` Ashish Kalra
  0 siblings, 1 reply; 107+ messages in thread
From: Brijesh Singh @ 2020-03-31 14:26 UTC (permalink / raw)
  To: Ashish Kalra
  Cc: brijesh.singh, pbonzini, tglx, mingo, hpa, joro, bp,
	thomas.lendacky, x86, kvm, linux-kernel, rientjes, srutherford,
	luto


On 3/30/20 11:45 AM, Ashish Kalra wrote:
> Hello Brijesh,
>
> On Mon, Mar 30, 2020 at 11:00:14AM -0500, Brijesh Singh wrote:
>> On 3/30/20 1:23 AM, Ashish Kalra wrote:
>>> From: Ashish Kalra <ashish.kalra@amd.com>
>>>
>>> Reset the host's page encryption bitmap related to kernel
>>> specific page encryption status settings before we load a
>>> new kernel by kexec. We cannot reset the complete
>>> page encryption bitmap here as we need to retain the
>>> UEFI/OVMF firmware specific settings.
>>>
>>> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
>>> ---
>>>  arch/x86/kernel/kvm.c | 28 ++++++++++++++++++++++++++++
>>>  1 file changed, 28 insertions(+)
>>>
>>> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
>>> index 8fcee0b45231..ba6cce3c84af 100644
>>> --- a/arch/x86/kernel/kvm.c
>>> +++ b/arch/x86/kernel/kvm.c
>>> @@ -34,6 +34,7 @@
>>>  #include <asm/hypervisor.h>
>>>  #include <asm/tlb.h>
>>>  #include <asm/cpuidle_haltpoll.h>
>>> +#include <asm/e820/api.h>
>>>  
>>>  static int kvmapf = 1;
>>>  
>>> @@ -357,6 +358,33 @@ static void kvm_pv_guest_cpu_reboot(void *unused)
>>>  	 */
>>>  	if (kvm_para_has_feature(KVM_FEATURE_PV_EOI))
>>>  		wrmsrl(MSR_KVM_PV_EOI_EN, 0);
>>> +	/*
>>> +	 * Reset the host's page encryption bitmap related to kernel
>>> +	 * specific page encryption status settings before we load a
>>> +	 * new kernel by kexec. NOTE: We cannot reset the complete
>>> +	 * page encryption bitmap here as we need to retain the
>>> +	 * UEFI/OVMF firmware specific settings.
>>> +	 */
>>> +	if (kvm_para_has_feature(KVM_FEATURE_SEV_LIVE_MIGRATION) &&
>>> +		(smp_processor_id() == 0)) {
>>
>> In patch 13/14, the KVM_FEATURE_SEV_LIVE_MIGRATION is set
>> unconditionally and because of that now the below code will be executed
>> on non-SEV guest. IMO, this feature must be cleared for non-SEV guest to
>> avoid making unnecessary hypercall's.
>>
>>
> I will additionally add an sev_active() check here to ensure that we don't make unnecessary hypercalls on non-SEV guests.


IMO, instead of using sev_active() we should make sure that the
feature is not enabled when SEV is not active.


>>> +		unsigned long nr_pages;
>>> +		int i;
>>> +
>>> +		for (i = 0; i < e820_table->nr_entries; i++) {
>>> +			struct e820_entry *entry = &e820_table->entries[i];
>>> +			unsigned long start_pfn, end_pfn;
>>> +
>>> +			if (entry->type != E820_TYPE_RAM)
>>> +				continue;
>>> +
>>> +			start_pfn = entry->addr >> PAGE_SHIFT;
>>> +			end_pfn = (entry->addr + entry->size) >> PAGE_SHIFT;
>>> +			nr_pages = DIV_ROUND_UP(entry->size, PAGE_SIZE);
>>> +
>>> +			kvm_sev_hypercall3(KVM_HC_PAGE_ENC_STATUS,
>>> +				entry->addr, nr_pages, 1);
>>> +		}
>>> +	}
>>>  	kvm_pv_disable_apf();
>>>  	kvm_disable_steal_time();
>>>  }
> Thanks,
> Ashish
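The gating being discussed in this exchange can be summarized with a small standalone sketch. The two boolean flags are hypothetical stand-ins for kvm_para_has_feature(KVM_FEATURE_SEV_LIVE_MIGRATION) and sev_active() (the real code queries CPUID/MSR state), and the page-count helper mirrors the patch's DIV_ROUND_UP math:

```c
#include <assert.h>
#include <stdbool.h>

#define PAGE_SIZE 4096ULL
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Hypothetical stand-ins for kvm_para_has_feature() and sev_active(). */
static bool host_has_live_migration;
static bool guest_is_sev;

/* Only the boot CPU resets the bitmap, and only when the host advertises
 * the feature AND the guest actually runs encrypted; otherwise a non-SEV
 * guest would issue pointless hypercalls -- the concern raised above. */
static bool should_reset_enc_bitmap(int cpu)
{
    return cpu == 0 && host_has_live_migration && guest_is_sev;
}

/* Number of 4K pages reported per E820_TYPE_RAM entry, as in the patch. */
static unsigned long long enc_status_pages(unsigned long long size)
{
    return DIV_ROUND_UP(size, PAGE_SIZE);
}
```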

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v6 00/14] Add AMD SEV guest live migration support
  2020-03-30 21:52       ` Ashish Kalra
@ 2020-03-31 14:42         ` Venu Busireddy
  0 siblings, 0 replies; 107+ messages in thread
From: Venu Busireddy @ 2020-03-31 14:42 UTC (permalink / raw)
  To: Ashish Kalra
  Cc: pbonzini, tglx, mingo, hpa, joro, bp, thomas.lendacky, x86, kvm,
	linux-kernel, rientjes, srutherford, luto, brijesh.singh

On 2020-03-30 21:52:45 +0000, Ashish Kalra wrote:
> I just did a fresh clone of Linus's tree and I can apply these
> patches cleanly on top of it.

Figured out what the problem was. Though the patches are listed in sorted
order at
https://lore.kernel.org/kvm/cover.1585548051.git.ashish.kalra@amd.com/,
the patches inside
https://lore.kernel.org/kvm/cover.1585548051.git.ashish.kalra@amd.com/t.mbox.gz
are not in sequential order. Hence, they were being applied out of order
by 'git am ....mbox', which caused the error. I had to edit the mbox file
by hand, or use a tool such as b4 (suggested by a colleague), to sort the
patches in the mbox file. Once that was done, I was able to apply the
entire patch set to my code base.
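The sorting fix described above comes down to ordering messages by the NN in their standard "[PATCH vX NN/MM]" subject tags. As an illustrative sketch (not the b4 tool itself, and assuming the first '/' in each subject belongs to the NN/MM tag, which holds for this series):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Extract NN from a "[PATCH vX NN/MM]" subject; returns 0 for the cover
 * letter (00/MM) or an untagged subject, so it sorts first. */
static int patch_num(const char *subject)
{
    const char *slash = strchr(subject, '/');
    const char *p;

    if (!slash)
        return 0;
    /* Walk back over the digits immediately before the '/'. */
    for (p = slash; p > subject && p[-1] >= '0' && p[-1] <= '9'; p--)
        ;
    return (int)strtol(p, NULL, 10);
}

/* qsort comparator over an array of subject strings. */
static int by_patch_num(const void *a, const void *b)
{
    return patch_num(*(const char *const *)a) -
           patch_num(*(const char *const *)b);
}
```

A tool like b4 does the equivalent (plus thread reassembly and attestation checks) when it pulls a series out of an archive.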

Thanks,

Venu

> [...]

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v6 01/14] KVM: SVM: Add KVM_SEV SEND_START command
  2020-03-30  6:19 ` [PATCH v6 01/14] KVM: SVM: Add KVM_SEV SEND_START command Ashish Kalra
@ 2020-04-02  6:27   ` Venu Busireddy
  2020-04-02 12:59     ` Brijesh Singh
  2020-04-02 17:51   ` Krish Sadhukhan
  1 sibling, 1 reply; 107+ messages in thread
From: Venu Busireddy @ 2020-04-02  6:27 UTC (permalink / raw)
  To: Ashish Kalra
  Cc: pbonzini, tglx, mingo, hpa, joro, bp, thomas.lendacky, x86, kvm,
	linux-kernel, rientjes, srutherford, luto, brijesh.singh

On 2020-03-30 06:19:59 +0000, Ashish Kalra wrote:
> From: Brijesh Singh <Brijesh.Singh@amd.com>
> 
> The command is used to create an outgoing SEV guest encryption context.
> 
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: "Radim Krčmář" <rkrcmar@redhat.com>
> Cc: Joerg Roedel <joro@8bytes.org>
> Cc: Borislav Petkov <bp@suse.de>
> Cc: Tom Lendacky <thomas.lendacky@amd.com>
> Cc: x86@kernel.org
> Cc: kvm@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Reviewed-by: Steve Rutherford <srutherford@google.com>
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> ---
>  .../virt/kvm/amd-memory-encryption.rst        |  27 ++++
>  arch/x86/kvm/svm.c                            | 128 ++++++++++++++++++
>  include/linux/psp-sev.h                       |   8 +-
>  include/uapi/linux/kvm.h                      |  12 ++
>  4 files changed, 171 insertions(+), 4 deletions(-)
> 
> diff --git a/Documentation/virt/kvm/amd-memory-encryption.rst b/Documentation/virt/kvm/amd-memory-encryption.rst
> index c3129b9ba5cb..4fd34fc5c7a7 100644
> --- a/Documentation/virt/kvm/amd-memory-encryption.rst
> +++ b/Documentation/virt/kvm/amd-memory-encryption.rst
> @@ -263,6 +263,33 @@ Returns: 0 on success, -negative on error
>                  __u32 trans_len;
>          };
>  
> +10. KVM_SEV_SEND_START
> +----------------------
> +
> +The KVM_SEV_SEND_START command can be used by the hypervisor to create an
> +outgoing guest encryption context.
> +
> +Parameters (in): struct kvm_sev_send_start
> +
> +Returns: 0 on success, -negative on error
> +
> +::
> +        struct kvm_sev_send_start {
> +                __u32 policy;                 /* guest policy */
> +
> +                __u64 pdh_cert_uaddr;         /* platform Diffie-Hellman certificate */
> +                __u32 pdh_cert_len;
> +
> +                __u64 plat_certs_uadr;        /* platform certificate chain */

Could this please be changed to plat_certs_uaddr, as it is referred to
in the rest of the code?

> +                __u32 plat_certs_len;
> +
> +                __u64 amd_certs_uaddr;        /* AMD certificate */
> +                __u32 amd_cert_len;

Could this please be changed to amd_certs_len, as it is referred to in
the rest of the code?

> +
> +                __u64 session_uaddr;          /* Guest session information */
> +                __u32 session_len;
> +        };
> +
>  References
>  ==========
>  
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 50d1ebafe0b3..63d172e974ad 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -7149,6 +7149,131 @@ static int sev_launch_secret(struct kvm *kvm, struct kvm_sev_cmd *argp)
>  	return ret;
>  }
>  
> +/* Userspace wants to query session length. */
> +static int
> +__sev_send_start_query_session_length(struct kvm *kvm, struct kvm_sev_cmd *argp,
> +				      struct kvm_sev_send_start *params)
> +{
> +	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> +	struct sev_data_send_start *data;
> +	int ret;
> +
> +	data = kzalloc(sizeof(*data), GFP_KERNEL_ACCOUNT);
> +	if (data == NULL)
> +		return -ENOMEM;
> +
> +	data->handle = sev->handle;
> +	ret = sev_issue_cmd(kvm, SEV_CMD_SEND_START, data, &argp->error);
> +
> +	params->session_len = data->session_len;
> +	if (copy_to_user((void __user *)(uintptr_t)argp->data, params,
> +				sizeof(struct kvm_sev_send_start)))
> +		ret = -EFAULT;
> +
> +	kfree(data);
> +	return ret;
> +}
> +
> +static int sev_send_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
> +{
> +	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> +	struct sev_data_send_start *data;
> +	struct kvm_sev_send_start params;
> +	void *amd_certs, *session_data;
> +	void *pdh_cert, *plat_certs;
> +	int ret;
> +
> +	if (!sev_guest(kvm))
> +		return -ENOTTY;
> +
> +	if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data,
> +				sizeof(struct kvm_sev_send_start)))
> +		return -EFAULT;
> +
> +	/* if session_len is zero, userspace wants to query the session length */
> +	if (!params.session_len)
> +		return __sev_send_start_query_session_length(kvm, argp,
> +				&params);
> +
> +	/* some sanity checks */
> +	if (!params.pdh_cert_uaddr || !params.pdh_cert_len ||
> +	    !params.session_uaddr || params.session_len > SEV_FW_BLOB_MAX_SIZE)
> +		return -EINVAL;
> +
> +	/* allocate the memory to hold the session data blob */
> +	session_data = kmalloc(params.session_len, GFP_KERNEL_ACCOUNT);
> +	if (!session_data)
> +		return -ENOMEM;
> +
> +	/* copy the certificate blobs from userspace */
> +	pdh_cert = psp_copy_user_blob(params.pdh_cert_uaddr,
> +				params.pdh_cert_len);
> +	if (IS_ERR(pdh_cert)) {
> +		ret = PTR_ERR(pdh_cert);
> +		goto e_free_session;
> +	}
> +
> +	plat_certs = psp_copy_user_blob(params.plat_certs_uaddr,
> +				params.plat_certs_len);
> +	if (IS_ERR(plat_certs)) {
> +		ret = PTR_ERR(plat_certs);
> +		goto e_free_pdh;
> +	}
> +
> +	amd_certs = psp_copy_user_blob(params.amd_certs_uaddr,
> +				params.amd_certs_len);
> +	if (IS_ERR(amd_certs)) {
> +		ret = PTR_ERR(amd_certs);
> +		goto e_free_plat_cert;
> +	}
> +
> +	data = kzalloc(sizeof(*data), GFP_KERNEL_ACCOUNT);
> +	if (data == NULL) {
> +		ret = -ENOMEM;
> +		goto e_free_amd_cert;
> +	}
> +
> +	/* populate the FW SEND_START field with system physical address */
> +	data->pdh_cert_address = __psp_pa(pdh_cert);
> +	data->pdh_cert_len = params.pdh_cert_len;
> +	data->plat_certs_address = __psp_pa(plat_certs);
> +	data->plat_certs_len = params.plat_certs_len;
> +	data->amd_certs_address = __psp_pa(amd_certs);
> +	data->amd_certs_len = params.amd_certs_len;
> +	data->session_address = __psp_pa(session_data);
> +	data->session_len = params.session_len;
> +	data->handle = sev->handle;
> +
> +	ret = sev_issue_cmd(kvm, SEV_CMD_SEND_START, data, &argp->error);
> +
> +	if (ret)
> +		goto e_free;
> +
> +	if (copy_to_user((void __user *)(uintptr_t) params.session_uaddr,
> +			session_data, params.session_len)) {
> +		ret = -EFAULT;
> +		goto e_free;
> +	}

To optimize the amount of data being copied to user space, could the
above section of code be changed as follows?

	params.session_len = data->session_len;
	if (copy_to_user((void __user *)(uintptr_t) params.session_uaddr,
			session_data, params.session_len)) {
		ret = -EFAULT;
		goto e_free;
	}

> +
> +	params.policy = data->policy;
> +	params.session_len = data->session_len;
> +	if (copy_to_user((void __user *)(uintptr_t)argp->data, &params,
> +				sizeof(struct kvm_sev_send_start)))
> +		ret = -EFAULT;

Since the only fields that are changed in the kvm_sev_send_start
structure are session_len and policy, why do we need to copy the entire
structure back to the user? Why not just those two values? Please see
the changes proposed to the kvm_sev_send_start structure further below
to accomplish this.

	params.policy = data->policy;
	if (copy_to_user((void __user *)(uintptr_t)argp->data, &params,
			sizeof(params.policy) + sizeof(params.session_len))
		ret = -EFAULT;
> +
> +e_free:
> +	kfree(data);
> +e_free_amd_cert:
> +	kfree(amd_certs);
> +e_free_plat_cert:
> +	kfree(plat_certs);
> +e_free_pdh:
> +	kfree(pdh_cert);
> +e_free_session:
> +	kfree(session_data);
> +	return ret;
> +}
> +
>  static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
>  {
>  	struct kvm_sev_cmd sev_cmd;
> @@ -7193,6 +7318,9 @@ static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
>  	case KVM_SEV_LAUNCH_SECRET:
>  		r = sev_launch_secret(kvm, &sev_cmd);
>  		break;
> +	case KVM_SEV_SEND_START:
> +		r = sev_send_start(kvm, &sev_cmd);
> +		break;
>  	default:
>  		r = -EINVAL;
>  		goto out;
> diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
> index 5167bf2bfc75..9f63b9d48b63 100644
> --- a/include/linux/psp-sev.h
> +++ b/include/linux/psp-sev.h
> @@ -323,11 +323,11 @@ struct sev_data_send_start {
>  	u64 pdh_cert_address;			/* In */
>  	u32 pdh_cert_len;			/* In */
>  	u32 reserved1;
> -	u64 plat_cert_address;			/* In */
> -	u32 plat_cert_len;			/* In */
> +	u64 plat_certs_address;			/* In */
> +	u32 plat_certs_len;			/* In */
>  	u32 reserved2;
> -	u64 amd_cert_address;			/* In */
> -	u32 amd_cert_len;			/* In */
> +	u64 amd_certs_address;			/* In */
> +	u32 amd_certs_len;			/* In */
>  	u32 reserved3;
>  	u64 session_address;			/* In */
>  	u32 session_len;			/* In/Out */
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 4b95f9a31a2f..17bef4c245e1 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1558,6 +1558,18 @@ struct kvm_sev_dbg {
>  	__u32 len;
>  };
>  
> +struct kvm_sev_send_start {
> +	__u32 policy;
> +	__u64 pdh_cert_uaddr;
> +	__u32 pdh_cert_len;
> +	__u64 plat_certs_uaddr;
> +	__u32 plat_certs_len;
> +	__u64 amd_certs_uaddr;
> +	__u32 amd_certs_len;
> +	__u64 session_uaddr;
> +	__u32 session_len;
> +};

Redo this structure as below:

struct kvm_sev_send_start {
	__u32 policy;
	__u32 session_len;
	__u64 session_uaddr;
	__u64 pdh_cert_uaddr;
	__u32 pdh_cert_len;
	__u64 plat_certs_uaddr;
	__u32 plat_certs_len;
	__u64 amd_certs_uaddr;
	__u32 amd_certs_len;
};

Or as below, just to make it look better.

struct kvm_sev_send_start {
	__u32 policy;
	__u32 session_len;
	__u64 session_uaddr;
	__u32 pdh_cert_len;
	__u64 pdh_cert_uaddr;
	__u32 plat_certs_len;
	__u64 plat_certs_uaddr;
	__u32 amd_certs_len;
	__u64 amd_certs_uaddr;
};

> +
>  #define KVM_DEV_ASSIGN_ENABLE_IOMMU	(1 << 0)
>  #define KVM_DEV_ASSIGN_PCI_2_3		(1 << 1)
>  #define KVM_DEV_ASSIGN_MASK_INTX	(1 << 2)
> -- 
> 2.17.1
> 

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v6 01/14] KVM: SVM: Add KVM_SEV SEND_START command
  2020-04-02  6:27   ` Venu Busireddy
@ 2020-04-02 12:59     ` Brijesh Singh
  2020-04-02 16:37       ` Venu Busireddy
  0 siblings, 1 reply; 107+ messages in thread
From: Brijesh Singh @ 2020-04-02 12:59 UTC (permalink / raw)
  To: Venu Busireddy, Ashish Kalra
  Cc: brijesh.singh, pbonzini, tglx, mingo, hpa, joro, bp,
	thomas.lendacky, x86, kvm, linux-kernel, rientjes, srutherford,
	luto

Hi Venu,

Thanks for the feedback.

On 4/2/20 1:27 AM, Venu Busireddy wrote:
> On 2020-03-30 06:19:59 +0000, Ashish Kalra wrote:
>> From: Brijesh Singh <Brijesh.Singh@amd.com>
>>
>> The command is used to create an outgoing SEV guest encryption context.
>>
>> Cc: Thomas Gleixner <tglx@linutronix.de>
>> Cc: Ingo Molnar <mingo@redhat.com>
>> Cc: "H. Peter Anvin" <hpa@zytor.com>
>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>> Cc: "Radim Krčmář" <rkrcmar@redhat.com>
>> Cc: Joerg Roedel <joro@8bytes.org>
>> Cc: Borislav Petkov <bp@suse.de>
>> Cc: Tom Lendacky <thomas.lendacky@amd.com>
>> Cc: x86@kernel.org
>> Cc: kvm@vger.kernel.org
>> Cc: linux-kernel@vger.kernel.org
>> Reviewed-by: Steve Rutherford <srutherford@google.com>
>> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
>> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
>> ---
>>  .../virt/kvm/amd-memory-encryption.rst        |  27 ++++
>>  arch/x86/kvm/svm.c                            | 128 ++++++++++++++++++
>>  include/linux/psp-sev.h                       |   8 +-
>>  include/uapi/linux/kvm.h                      |  12 ++
>>  4 files changed, 171 insertions(+), 4 deletions(-)
>>
>> diff --git a/Documentation/virt/kvm/amd-memory-encryption.rst b/Documentation/virt/kvm/amd-memory-encryption.rst
>> index c3129b9ba5cb..4fd34fc5c7a7 100644
>> --- a/Documentation/virt/kvm/amd-memory-encryption.rst
>> +++ b/Documentation/virt/kvm/amd-memory-encryption.rst
>> @@ -263,6 +263,33 @@ Returns: 0 on success, -negative on error
>>                  __u32 trans_len;
>>          };
>>  
>> +10. KVM_SEV_SEND_START
>> +----------------------
>> +
>> +The KVM_SEV_SEND_START command can be used by the hypervisor to create an
>> +outgoing guest encryption context.
>> +
>> +Parameters (in): struct kvm_sev_send_start
>> +
>> +Returns: 0 on success, -negative on error
>> +
>> +::
>> +        struct kvm_sev_send_start {
>> +                __u32 policy;                 /* guest policy */
>> +
>> +                __u64 pdh_cert_uaddr;         /* platform Diffie-Hellman certificate */
>> +                __u32 pdh_cert_len;
>> +
>> +                __u64 plat_certs_uadr;        /* platform certificate chain */
> Could this please be changed to plat_certs_uaddr, as it is referred to
> in the rest of the code?
>
>> +                __u32 plat_certs_len;
>> +
>> +                __u64 amd_certs_uaddr;        /* AMD certificate */
>> +                __u32 amd_cert_len;
> Could this please be changed to amd_certs_len, as it is referred to in
> the rest of the code?
>
>> +
>> +                __u64 session_uaddr;          /* Guest session information */
>> +                __u32 session_len;
>> +        };
>> +
>>  References
>>  ==========
>>  
>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
>> index 50d1ebafe0b3..63d172e974ad 100644
>> --- a/arch/x86/kvm/svm.c
>> +++ b/arch/x86/kvm/svm.c
>> @@ -7149,6 +7149,131 @@ static int sev_launch_secret(struct kvm *kvm, struct kvm_sev_cmd *argp)
>>  	return ret;
>>  }
>>  
>> +/* Userspace wants to query session length. */
>> +static int
>> +__sev_send_start_query_session_length(struct kvm *kvm, struct kvm_sev_cmd *argp,
>> +				      struct kvm_sev_send_start *params)
>> +{
>> +	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
>> +	struct sev_data_send_start *data;
>> +	int ret;
>> +
>> +	data = kzalloc(sizeof(*data), GFP_KERNEL_ACCOUNT);
>> +	if (data == NULL)
>> +		return -ENOMEM;
>> +
>> +	data->handle = sev->handle;
>> +	ret = sev_issue_cmd(kvm, SEV_CMD_SEND_START, data, &argp->error);
>> +
>> +	params->session_len = data->session_len;
>> +	if (copy_to_user((void __user *)(uintptr_t)argp->data, params,
>> +				sizeof(struct kvm_sev_send_start)))
>> +		ret = -EFAULT;
>> +
>> +	kfree(data);
>> +	return ret;
>> +}
>> +
>> +static int sev_send_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
>> +{
>> +	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
>> +	struct sev_data_send_start *data;
>> +	struct kvm_sev_send_start params;
>> +	void *amd_certs, *session_data;
>> +	void *pdh_cert, *plat_certs;
>> +	int ret;
>> +
>> +	if (!sev_guest(kvm))
>> +		return -ENOTTY;
>> +
>> +	if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data,
>> +				sizeof(struct kvm_sev_send_start)))
>> +		return -EFAULT;
>> +
>> +	/* if session_len is zero, userspace wants to query the session length */
>> +	if (!params.session_len)
>> +		return __sev_send_start_query_session_length(kvm, argp,
>> +				&params);
>> +
>> +	/* some sanity checks */
>> +	if (!params.pdh_cert_uaddr || !params.pdh_cert_len ||
>> +	    !params.session_uaddr || params.session_len > SEV_FW_BLOB_MAX_SIZE)
>> +		return -EINVAL;
>> +
>> +	/* allocate the memory to hold the session data blob */
>> +	session_data = kmalloc(params.session_len, GFP_KERNEL_ACCOUNT);
>> +	if (!session_data)
>> +		return -ENOMEM;
>> +
>> +	/* copy the certificate blobs from userspace */
>> +	pdh_cert = psp_copy_user_blob(params.pdh_cert_uaddr,
>> +				params.pdh_cert_len);
>> +	if (IS_ERR(pdh_cert)) {
>> +		ret = PTR_ERR(pdh_cert);
>> +		goto e_free_session;
>> +	}
>> +
>> +	plat_certs = psp_copy_user_blob(params.plat_certs_uaddr,
>> +				params.plat_certs_len);
>> +	if (IS_ERR(plat_certs)) {
>> +		ret = PTR_ERR(plat_certs);
>> +		goto e_free_pdh;
>> +	}
>> +
>> +	amd_certs = psp_copy_user_blob(params.amd_certs_uaddr,
>> +				params.amd_certs_len);
>> +	if (IS_ERR(amd_certs)) {
>> +		ret = PTR_ERR(amd_certs);
>> +		goto e_free_plat_cert;
>> +	}
>> +
>> +	data = kzalloc(sizeof(*data), GFP_KERNEL_ACCOUNT);
>> +	if (data == NULL) {
>> +		ret = -ENOMEM;
>> +		goto e_free_amd_cert;
>> +	}
>> +
>> +	/* populate the FW SEND_START field with system physical address */
>> +	data->pdh_cert_address = __psp_pa(pdh_cert);
>> +	data->pdh_cert_len = params.pdh_cert_len;
>> +	data->plat_certs_address = __psp_pa(plat_certs);
>> +	data->plat_certs_len = params.plat_certs_len;
>> +	data->amd_certs_address = __psp_pa(amd_certs);
>> +	data->amd_certs_len = params.amd_certs_len;
>> +	data->session_address = __psp_pa(session_data);
>> +	data->session_len = params.session_len;
>> +	data->handle = sev->handle;
>> +
>> +	ret = sev_issue_cmd(kvm, SEV_CMD_SEND_START, data, &argp->error);
>> +
>> +	if (ret)
>> +		goto e_free;
>> +
>> +	if (copy_to_user((void __user *)(uintptr_t) params.session_uaddr,
>> +			session_data, params.session_len)) {
>> +		ret = -EFAULT;
>> +		goto e_free;
>> +	}
> To optimize the amount of data being copied to user space, could the
> above section of code be changed as follows?
>
> 	params.session_len = data->session_len;
> 	if (copy_to_user((void __user *)(uintptr_t) params.session_uaddr,
> 			session_data, params.session_len)) {
> 		ret = -EFAULT;
> 		goto e_free;
> 	}


We should not be using data->session_len; it will cause -EFAULT when
the user has not allocated enough space at session_uaddr. Let's
consider the case where the user passes session_len=10 but the firmware
thinks the session length should be 64. In that case data->session_len
will contain a value of 64, but userspace has only allocated space for
10 bytes, and copy_to_user() will fail. If we are really concerned
about the amount of data getting copied to userspace, then use
min_t(size_t, params.session_len, data->session_len).


>> +
>> +	params.policy = data->policy;
>> +	params.session_len = data->session_len;
>> +	if (copy_to_user((void __user *)(uintptr_t)argp->data, &params,
>> +				sizeof(struct kvm_sev_send_start)))
>> +		ret = -EFAULT;
> Since the only fields that are changed in the kvm_sev_send_start structure
> are session_len and policy, why do we need to copy the entire structure
> back to the user? Why not just those two values? Please see the changes
> proposed to kvm_sev_send_start structure further below to accomplish this.

I think we also need to consider code readability while saving CPU
cycles. This is a very small structure. By splitting this into two
calls (#1 copy params.policy and #2 copy params.session_len) we would
add more CPU cycles. And if we get creative and rearrange the
structure, then code readability is lost, because the copy will now
depend on how the structure is laid out in memory.

>
> 	params.policy = data->policy;
> 	if (copy_to_user((void __user *)(uintptr_t)argp->data, &params,
> 			sizeof(params.policy) + sizeof(params.session_len))
> 		ret = -EFAULT;
>> +
>> +e_free:
>> +	kfree(data);
>> +e_free_amd_cert:
>> +	kfree(amd_certs);
>> +e_free_plat_cert:
>> +	kfree(plat_certs);
>> +e_free_pdh:
>> +	kfree(pdh_cert);
>> +e_free_session:
>> +	kfree(session_data);
>> +	return ret;
>> +}
>> +
>>  static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
>>  {
>>  	struct kvm_sev_cmd sev_cmd;
>> @@ -7193,6 +7318,9 @@ static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
>>  	case KVM_SEV_LAUNCH_SECRET:
>>  		r = sev_launch_secret(kvm, &sev_cmd);
>>  		break;
>> +	case KVM_SEV_SEND_START:
>> +		r = sev_send_start(kvm, &sev_cmd);
>> +		break;
>>  	default:
>>  		r = -EINVAL;
>>  		goto out;
>> diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
>> index 5167bf2bfc75..9f63b9d48b63 100644
>> --- a/include/linux/psp-sev.h
>> +++ b/include/linux/psp-sev.h
>> @@ -323,11 +323,11 @@ struct sev_data_send_start {
>>  	u64 pdh_cert_address;			/* In */
>>  	u32 pdh_cert_len;			/* In */
>>  	u32 reserved1;
>> -	u64 plat_cert_address;			/* In */
>> -	u32 plat_cert_len;			/* In */
>> +	u64 plat_certs_address;			/* In */
>> +	u32 plat_certs_len;			/* In */
>>  	u32 reserved2;
>> -	u64 amd_cert_address;			/* In */
>> -	u32 amd_cert_len;			/* In */
>> +	u64 amd_certs_address;			/* In */
>> +	u32 amd_certs_len;			/* In */
>>  	u32 reserved3;
>>  	u64 session_address;			/* In */
>>  	u32 session_len;			/* In/Out */
>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>> index 4b95f9a31a2f..17bef4c245e1 100644
>> --- a/include/uapi/linux/kvm.h
>> +++ b/include/uapi/linux/kvm.h
>> @@ -1558,6 +1558,18 @@ struct kvm_sev_dbg {
>>  	__u32 len;
>>  };
>>  
>> +struct kvm_sev_send_start {
>> +	__u32 policy;
>> +	__u64 pdh_cert_uaddr;
>> +	__u32 pdh_cert_len;
>> +	__u64 plat_certs_uaddr;
>> +	__u32 plat_certs_len;
>> +	__u64 amd_certs_uaddr;
>> +	__u32 amd_certs_len;
>> +	__u64 session_uaddr;
>> +	__u32 session_len;
>> +};
> Redo this structure as below:
>
> struct kvm_sev_send_start {
> 	__u32 policy;
> 	__u32 session_len;
> 	__u64 session_uaddr;
> 	__u64 pdh_cert_uaddr;
> 	__u32 pdh_cert_len;
> 	__u64 plat_certs_uaddr;
> 	__u32 plat_certs_len;
> 	__u64 amd_certs_uaddr;
> 	__u32 amd_certs_len;
> };
>
> Or as below, just to make it look better.
>
> struct kvm_sev_send_start {
> 	__u32 policy;
> 	__u32 session_len;
> 	__u64 session_uaddr;
> 	__u32 pdh_cert_len;
> 	__u64 pdh_cert_uaddr;
> 	__u32 plat_certs_len;
> 	__u64 plat_certs_uaddr;
> 	__u32 amd_certs_len;
> 	__u64 amd_certs_uaddr;
> };
>

Wherever applicable, I tried my best not to diverge from the SEV spec
structure layout. Anyone reading the SEV FW spec will see a similar
structure layout in the KVM/PSP header files. I would prefer to stick
to that approach.


>> +
>>  #define KVM_DEV_ASSIGN_ENABLE_IOMMU	(1 << 0)
>>  #define KVM_DEV_ASSIGN_PCI_2_3		(1 << 1)
>>  #define KVM_DEV_ASSIGN_MASK_INTX	(1 << 2)
>> -- 
>> 2.17.1
>>

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v6 01/14] KVM: SVM: Add KVM_SEV SEND_START command
  2020-04-02 12:59     ` Brijesh Singh
@ 2020-04-02 16:37       ` Venu Busireddy
  2020-04-02 18:04         ` Brijesh Singh
  0 siblings, 1 reply; 107+ messages in thread
From: Venu Busireddy @ 2020-04-02 16:37 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: Ashish Kalra, pbonzini, tglx, mingo, hpa, joro, bp,
	thomas.lendacky, x86, kvm, linux-kernel, rientjes, srutherford,
	luto

On 2020-04-02 07:59:54 -0500, Brijesh Singh wrote:
> Hi Venu,
> 
> Thanks for the feedback.
> 
> On 4/2/20 1:27 AM, Venu Busireddy wrote:
> > On 2020-03-30 06:19:59 +0000, Ashish Kalra wrote:
> >> From: Brijesh Singh <Brijesh.Singh@amd.com>
> >>
> >> The command is used to create an outgoing SEV guest encryption context.
> >>
> >> Cc: Thomas Gleixner <tglx@linutronix.de>
> >> Cc: Ingo Molnar <mingo@redhat.com>
> >> Cc: "H. Peter Anvin" <hpa@zytor.com>
> >> Cc: Paolo Bonzini <pbonzini@redhat.com>
> >> Cc: "Radim Krčmář" <rkrcmar@redhat.com>
> >> Cc: Joerg Roedel <joro@8bytes.org>
> >> Cc: Borislav Petkov <bp@suse.de>
> >> Cc: Tom Lendacky <thomas.lendacky@amd.com>
> >> Cc: x86@kernel.org
> >> Cc: kvm@vger.kernel.org
> >> Cc: linux-kernel@vger.kernel.org
> >> Reviewed-by: Steve Rutherford <srutherford@google.com>
> >> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> >> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> >> ---
> >>  .../virt/kvm/amd-memory-encryption.rst        |  27 ++++
> >>  arch/x86/kvm/svm.c                            | 128 ++++++++++++++++++
> >>  include/linux/psp-sev.h                       |   8 +-
> >>  include/uapi/linux/kvm.h                      |  12 ++
> >>  4 files changed, 171 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/Documentation/virt/kvm/amd-memory-encryption.rst b/Documentation/virt/kvm/amd-memory-encryption.rst
> >> index c3129b9ba5cb..4fd34fc5c7a7 100644
> >> --- a/Documentation/virt/kvm/amd-memory-encryption.rst
> >> +++ b/Documentation/virt/kvm/amd-memory-encryption.rst
> >> @@ -263,6 +263,33 @@ Returns: 0 on success, -negative on error
> >>                  __u32 trans_len;
> >>          };
> >>  
> >> +10. KVM_SEV_SEND_START
> >> +----------------------
> >> +
> >> +The KVM_SEV_SEND_START command can be used by the hypervisor to create an
> >> +outgoing guest encryption context.
> >> +
> >> +Parameters (in): struct kvm_sev_send_start
> >> +
> >> +Returns: 0 on success, -negative on error
> >> +
> >> +::
> >> +        struct kvm_sev_send_start {
> >> +                __u32 policy;                 /* guest policy */
> >> +
> >> +                __u64 pdh_cert_uaddr;         /* platform Diffie-Hellman certificate */
> >> +                __u32 pdh_cert_len;
> >> +
> >> +                __u64 plat_certs_uadr;        /* platform certificate chain */
> > Could this please be changed to plat_certs_uaddr, as it is referred to
> > in the rest of the code?
> >
> >> +                __u32 plat_certs_len;
> >> +
> >> +                __u64 amd_certs_uaddr;        /* AMD certificate */
> >> +                __u32 amd_cert_len;
> > Could this please be changed to amd_certs_len, as it is referred to in
> > the rest of the code?
> >
> >> +
> >> +                __u64 session_uaddr;          /* Guest session information */
> >> +                __u32 session_len;
> >> +        };
> >> +
> >>  References
> >>  ==========
> >>  
> >> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> >> index 50d1ebafe0b3..63d172e974ad 100644
> >> --- a/arch/x86/kvm/svm.c
> >> +++ b/arch/x86/kvm/svm.c
> >> @@ -7149,6 +7149,131 @@ static int sev_launch_secret(struct kvm *kvm, struct kvm_sev_cmd *argp)
> >>  	return ret;
> >>  }
> >>  
> >> +/* Userspace wants to query session length. */
> >> +static int
> >> +__sev_send_start_query_session_length(struct kvm *kvm, struct kvm_sev_cmd *argp,
> >> +				      struct kvm_sev_send_start *params)
> >> +{
> >> +	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> >> +	struct sev_data_send_start *data;
> >> +	int ret;
> >> +
> >> +	data = kzalloc(sizeof(*data), GFP_KERNEL_ACCOUNT);
> >> +	if (data == NULL)
> >> +		return -ENOMEM;
> >> +
> >> +	data->handle = sev->handle;
> >> +	ret = sev_issue_cmd(kvm, SEV_CMD_SEND_START, data, &argp->error);
> >> +
> >> +	params->session_len = data->session_len;
> >> +	if (copy_to_user((void __user *)(uintptr_t)argp->data, params,
> >> +				sizeof(struct kvm_sev_send_start)))
> >> +		ret = -EFAULT;
> >> +
> >> +	kfree(data);
> >> +	return ret;
> >> +}
> >> +
> >> +static int sev_send_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
> >> +{
> >> +	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> >> +	struct sev_data_send_start *data;
> >> +	struct kvm_sev_send_start params;
> >> +	void *amd_certs, *session_data;
> >> +	void *pdh_cert, *plat_certs;
> >> +	int ret;
> >> +
> >> +	if (!sev_guest(kvm))
> >> +		return -ENOTTY;
> >> +
> >> +	if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data,
> >> +				sizeof(struct kvm_sev_send_start)))
> >> +		return -EFAULT;
> >> +
> >> +	/* if session_len is zero, userspace wants to query the session length */
> >> +	if (!params.session_len)
> >> +		return __sev_send_start_query_session_length(kvm, argp,
> >> +				&params);
> >> +
> >> +	/* some sanity checks */
> >> +	if (!params.pdh_cert_uaddr || !params.pdh_cert_len ||
> >> +	    !params.session_uaddr || params.session_len > SEV_FW_BLOB_MAX_SIZE)
> >> +		return -EINVAL;
> >> +
> >> +	/* allocate the memory to hold the session data blob */
> >> +	session_data = kmalloc(params.session_len, GFP_KERNEL_ACCOUNT);
> >> +	if (!session_data)
> >> +		return -ENOMEM;
> >> +
> >> +	/* copy the certificate blobs from userspace */
> >> +	pdh_cert = psp_copy_user_blob(params.pdh_cert_uaddr,
> >> +				params.pdh_cert_len);
> >> +	if (IS_ERR(pdh_cert)) {
> >> +		ret = PTR_ERR(pdh_cert);
> >> +		goto e_free_session;
> >> +	}
> >> +
> >> +	plat_certs = psp_copy_user_blob(params.plat_certs_uaddr,
> >> +				params.plat_certs_len);
> >> +	if (IS_ERR(plat_certs)) {
> >> +		ret = PTR_ERR(plat_certs);
> >> +		goto e_free_pdh;
> >> +	}
> >> +
> >> +	amd_certs = psp_copy_user_blob(params.amd_certs_uaddr,
> >> +				params.amd_certs_len);
> >> +	if (IS_ERR(amd_certs)) {
> >> +		ret = PTR_ERR(amd_certs);
> >> +		goto e_free_plat_cert;
> >> +	}
> >> +
> >> +	data = kzalloc(sizeof(*data), GFP_KERNEL_ACCOUNT);
> >> +	if (data == NULL) {
> >> +		ret = -ENOMEM;
> >> +		goto e_free_amd_cert;
> >> +	}
> >> +
> >> +	/* populate the FW SEND_START field with system physical address */
> >> +	data->pdh_cert_address = __psp_pa(pdh_cert);
> >> +	data->pdh_cert_len = params.pdh_cert_len;
> >> +	data->plat_certs_address = __psp_pa(plat_certs);
> >> +	data->plat_certs_len = params.plat_certs_len;
> >> +	data->amd_certs_address = __psp_pa(amd_certs);
> >> +	data->amd_certs_len = params.amd_certs_len;
> >> +	data->session_address = __psp_pa(session_data);
> >> +	data->session_len = params.session_len;
> >> +	data->handle = sev->handle;
> >> +
> >> +	ret = sev_issue_cmd(kvm, SEV_CMD_SEND_START, data, &argp->error);
> >> +
> >> +	if (ret)
> >> +		goto e_free;
> >> +
> >> +	if (copy_to_user((void __user *)(uintptr_t) params.session_uaddr,
> >> +			session_data, params.session_len)) {
> >> +		ret = -EFAULT;
> >> +		goto e_free;
> >> +	}
> > To optimize the amount of data being copied to user space, could the
> > above section of code be changed as follows?
> >
> > 	params.session_len = data->session_len;
> > 	if (copy_to_user((void __user *)(uintptr_t) params.session_uaddr,
> > 			session_data, params.session_len)) {
> > 		ret = -EFAULT;
> > 		goto e_free;
> > 	}
> 
> 
> We should not be using data->session_len; it will cause -EFAULT when
> the user has not allocated enough space at session_uaddr. Let's
> consider the case where the user passes session_len=10 but the firmware
> thinks the session length should be 64. In that case data->session_len
> will contain a value of 64, but userspace has only allocated space for
> 10 bytes, and copy_to_user() will fail. If we are really concerned
> about the amount of data getting copied to userspace, then use
> min_t(size_t, params.session_len, data->session_len).

We are allocating a buffer of params.session_len size and passing that
buffer, and that length via data->session_len, to the firmware. Why
would the firmware set data->session_len to a larger value, despite
being told that the buffer is only params.session_len long? I thought
that only the reverse was possible, that is, the user sets
params.session_len to the MAX, but the session data is actually smaller
than that size.

Also, if for whatever reason the firmware sets data->session_len to
a larger value than what is passed, what is the user space expected
to do when the call returns? If the user space tries to access
params.session_len amount of data, it will possibly get a memory access
violation, because it did not originally allocate that large a buffer.

If we do go with using min_t(size_t, params.session_len,
data->session_len), then params.session_len should also be set to the
smaller of the two, right?

> >> +
> >> +	params.policy = data->policy;
> >> +	params.session_len = data->session_len;
> >> +	if (copy_to_user((void __user *)(uintptr_t)argp->data, &params,
> >> +				sizeof(struct kvm_sev_send_start)))
> >> +		ret = -EFAULT;
> > Since the only fields that are changed in the kvm_sev_send_start structure
> > are session_len and policy, why do we need to copy the entire structure
> > back to the user? Why not just those two values? Please see the changes
> > proposed to kvm_sev_send_start structure further below to accomplish this.
> 
> I think we also need to consider code readability while saving CPU
> cycles. This is a very small structure. By splitting this into two
> calls (#1 copy params.policy and #2 copy params.session_len) we would
> add more CPU cycles. And if we get creative and rearrange the
> structure, then code readability is lost, because the copy will now
> depend on how the structure is laid out in memory.

I was not recommending splitting that call into two. That would certainly
be more expensive than copying the entire structure. That is the reason
why I suggested reordering the members of kvm_sev_send_start. Isn't
there plenty of code where structures are defined in a way to keep the
data movement efficient? :-)

Please see my other comment below.

> 
> >
> > 	params.policy = data->policy;
> > 	if (copy_to_user((void __user *)(uintptr_t)argp->data, &params,
> > 			sizeof(params.policy) + sizeof(params.session_len))
> > 		ret = -EFAULT;
> >> +
> >> +e_free:
> >> +	kfree(data);
> >> +e_free_amd_cert:
> >> +	kfree(amd_certs);
> >> +e_free_plat_cert:
> >> +	kfree(plat_certs);
> >> +e_free_pdh:
> >> +	kfree(pdh_cert);
> >> +e_free_session:
> >> +	kfree(session_data);
> >> +	return ret;
> >> +}
> >> +
> >>  static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
> >>  {
> >>  	struct kvm_sev_cmd sev_cmd;
> >> @@ -7193,6 +7318,9 @@ static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
> >>  	case KVM_SEV_LAUNCH_SECRET:
> >>  		r = sev_launch_secret(kvm, &sev_cmd);
> >>  		break;
> >> +	case KVM_SEV_SEND_START:
> >> +		r = sev_send_start(kvm, &sev_cmd);
> >> +		break;
> >>  	default:
> >>  		r = -EINVAL;
> >>  		goto out;
> >> diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
> >> index 5167bf2bfc75..9f63b9d48b63 100644
> >> --- a/include/linux/psp-sev.h
> >> +++ b/include/linux/psp-sev.h
> >> @@ -323,11 +323,11 @@ struct sev_data_send_start {
> >>  	u64 pdh_cert_address;			/* In */
> >>  	u32 pdh_cert_len;			/* In */
> >>  	u32 reserved1;
> >> -	u64 plat_cert_address;			/* In */
> >> -	u32 plat_cert_len;			/* In */
> >> +	u64 plat_certs_address;			/* In */
> >> +	u32 plat_certs_len;			/* In */
> >>  	u32 reserved2;
> >> -	u64 amd_cert_address;			/* In */
> >> -	u32 amd_cert_len;			/* In */
> >> +	u64 amd_certs_address;			/* In */
> >> +	u32 amd_certs_len;			/* In */
> >>  	u32 reserved3;
> >>  	u64 session_address;			/* In */
> >>  	u32 session_len;			/* In/Out */
> >> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> >> index 4b95f9a31a2f..17bef4c245e1 100644
> >> --- a/include/uapi/linux/kvm.h
> >> +++ b/include/uapi/linux/kvm.h
> >> @@ -1558,6 +1558,18 @@ struct kvm_sev_dbg {
> >>  	__u32 len;
> >>  };
> >>  
> >> +struct kvm_sev_send_start {
> >> +	__u32 policy;
> >> +	__u64 pdh_cert_uaddr;
> >> +	__u32 pdh_cert_len;
> >> +	__u64 plat_certs_uaddr;
> >> +	__u32 plat_certs_len;
> >> +	__u64 amd_certs_uaddr;
> >> +	__u32 amd_certs_len;
> >> +	__u64 session_uaddr;
> >> +	__u32 session_len;
> >> +};
> > Redo this structure as below:
> >
> > struct kvm_sev_send_start {
> > 	__u32 policy;
> > 	__u32 session_len;
> > 	__u64 session_uaddr;
> > 	__u64 pdh_cert_uaddr;
> > 	__u32 pdh_cert_len;
> > 	__u64 plat_certs_uaddr;
> > 	__u32 plat_certs_len;
> > 	__u64 amd_certs_uaddr;
> > 	__u32 amd_certs_len;
> > };
> >
> > Or as below, just to make it look better.
> >
> > struct kvm_sev_send_start {
> > 	__u32 policy;
> > 	__u32 session_len;
> > 	__u64 session_uaddr;
> > 	__u32 pdh_cert_len;
> > 	__u64 pdh_cert_uaddr;
> > 	__u32 plat_certs_len;
> > 	__u64 plat_certs_uaddr;
> > 	__u32 amd_certs_len;
> > 	__u64 amd_certs_uaddr;
> > };
> >
> 
> Wherever applicable, I tried my best not to diverge from the SEV spec
> structure layout. Anyone reading the SEV FW spec will see a
> similar structure layout in the KVM/PSP header files. I would prefer to
> stick to that approach.

This structure is in uapi, and is anyway different from
sev_data_send_start, right? Does it really need to stay close to the
firmware structure layout? Just because the firmware folks thought of
a structure layout, that should not prevent our code from being efficient.

> 
> 
> >> +
> >>  #define KVM_DEV_ASSIGN_ENABLE_IOMMU	(1 << 0)
> >>  #define KVM_DEV_ASSIGN_PCI_2_3		(1 << 1)
> >>  #define KVM_DEV_ASSIGN_MASK_INTX	(1 << 2)
> >> -- 
> >> 2.17.1
> >>

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v6 01/14] KVM: SVM: Add KVM_SEV SEND_START command
  2020-03-30  6:19 ` [PATCH v6 01/14] KVM: SVM: Add KVM_SEV SEND_START command Ashish Kalra
  2020-04-02  6:27   ` Venu Busireddy
@ 2020-04-02 17:51   ` Krish Sadhukhan
  2020-04-02 18:38     ` Brijesh Singh
  1 sibling, 1 reply; 107+ messages in thread
From: Krish Sadhukhan @ 2020-04-02 17:51 UTC (permalink / raw)
  To: Ashish Kalra, pbonzini
  Cc: tglx, mingo, hpa, joro, bp, thomas.lendacky, x86, kvm,
	linux-kernel, rientjes, srutherford, luto, brijesh.singh


On 3/29/20 11:19 PM, Ashish Kalra wrote:
> From: Brijesh Singh <Brijesh.Singh@amd.com>
>
> The command is used to create an outgoing SEV guest encryption context.
>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: "Radim Krčmář" <rkrcmar@redhat.com>
> Cc: Joerg Roedel <joro@8bytes.org>
> Cc: Borislav Petkov <bp@suse.de>
> Cc: Tom Lendacky <thomas.lendacky@amd.com>
> Cc: x86@kernel.org
> Cc: kvm@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Reviewed-by: Steve Rutherford <srutherford@google.com>
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> ---
>   .../virt/kvm/amd-memory-encryption.rst        |  27 ++++
>   arch/x86/kvm/svm.c                            | 128 ++++++++++++++++++
>   include/linux/psp-sev.h                       |   8 +-
>   include/uapi/linux/kvm.h                      |  12 ++
>   4 files changed, 171 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/virt/kvm/amd-memory-encryption.rst b/Documentation/virt/kvm/amd-memory-encryption.rst
> index c3129b9ba5cb..4fd34fc5c7a7 100644
> --- a/Documentation/virt/kvm/amd-memory-encryption.rst
> +++ b/Documentation/virt/kvm/amd-memory-encryption.rst
> @@ -263,6 +263,33 @@ Returns: 0 on success, -negative on error
>                   __u32 trans_len;
>           };
>   
> +10. KVM_SEV_SEND_START
> +----------------------
> +
> +The KVM_SEV_SEND_START command can be used by the hypervisor to create an
> +outgoing guest encryption context.
Shouldn't we mention that this command is also used to save the guest to
the disk?
> +
> +Parameters (in): struct kvm_sev_send_start
> +
> +Returns: 0 on success, -negative on error
> +
> +::
> +        struct kvm_sev_send_start {
> +                __u32 policy;                 /* guest policy */
> +
> +                __u64 pdh_cert_uaddr;         /* platform Diffie-Hellman certificate */
> +                __u32 pdh_cert_len;
> +
> +                __u64 plat_certs_uadr;        /* platform certificate chain */
> +                __u32 plat_certs_len;
> +
> +                __u64 amd_certs_uaddr;        /* AMD certificate */
> +                __u32 amd_cert_len;
> +
> +                __u64 session_uaddr;          /* Guest session information */
> +                __u32 session_len;
> +        };
> +
>   References
>   ==========
>   
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 50d1ebafe0b3..63d172e974ad 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -7149,6 +7149,131 @@ static int sev_launch_secret(struct kvm *kvm, struct kvm_sev_cmd *argp)
>   	return ret;
>   }
>   
> +/* Userspace wants to query session length. */
> +static int
> +__sev_send_start_query_session_length(struct kvm *kvm, struct kvm_sev_cmd *argp,


Would __sev_query_send_start_session_length be a better name, perhaps?

> +				      struct kvm_sev_send_start *params)
> +{
> +	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> +	struct sev_data_send_start *data;
> +	int ret;
> +
> +	data = kzalloc(sizeof(*data), GFP_KERNEL_ACCOUNT);
> +	if (data == NULL)
> +		return -ENOMEM;
> +
> +	data->handle = sev->handle;
> +	ret = sev_issue_cmd(kvm, SEV_CMD_SEND_START, data, &argp->error);


We are not checking ret here, as we are assuming that the command will
always be successful. Is there any chance that sev->handle can be junk,
and should we have an ASSERT for it?

> +
> +	params->session_len = data->session_len;
> +	if (copy_to_user((void __user *)(uintptr_t)argp->data, params,
> +				sizeof(struct kvm_sev_send_start)))
> +		ret = -EFAULT;
> +
> +	kfree(data);
> +	return ret;
> +}
> +
> +static int sev_send_start(struct kvm *kvm, struct kvm_sev_cmd *argp)


For readability and ease of cscope searches, isn't it better to prefix
all these functions with "svm"?

It seems svm_sev_enabled() is an example of an appropriate naming style.

> +{
> +	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> +	struct sev_data_send_start *data;
> +	struct kvm_sev_send_start params;
> +	void *amd_certs, *session_data;
> +	void *pdh_cert, *plat_certs;
> +	int ret;
> +
> +	if (!sev_guest(kvm))
> +		return -ENOTTY;
> +
> +	if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data,
> +				sizeof(struct kvm_sev_send_start)))
> +		return -EFAULT;
> +
> +	/* if session_len is zero, userspace wants to query the session length */
> +	if (!params.session_len)
> +		return __sev_send_start_query_session_length(kvm, argp,
> +				&params);
> +
> +	/* some sanity checks */
> +	if (!params.pdh_cert_uaddr || !params.pdh_cert_len ||
> +	    !params.session_uaddr || params.session_len > SEV_FW_BLOB_MAX_SIZE)
> +		return -EINVAL;


What if params.plat_certs_uaddr or params.amd_certs_uaddr is NULL?

> +
> +	/* allocate the memory to hold the session data blob */
> +	session_data = kmalloc(params.session_len, GFP_KERNEL_ACCOUNT);
> +	if (!session_data)
> +		return -ENOMEM;
> +
> +	/* copy the certificate blobs from userspace */


You haven't added comments for plat_cert and amd_cert. Also, it's much
more readable if you add block comments like:

        /*
         * PDH cert
         */

> +	pdh_cert = psp_copy_user_blob(params.pdh_cert_uaddr,
> +				params.pdh_cert_len);
> +	if (IS_ERR(pdh_cert)) {
> +		ret = PTR_ERR(pdh_cert);
> +		goto e_free_session;
> +	}
> +
> +	plat_certs = psp_copy_user_blob(params.plat_certs_uaddr,
> +				params.plat_certs_len);
> +	if (IS_ERR(plat_certs)) {
> +		ret = PTR_ERR(plat_certs);
> +		goto e_free_pdh;
> +	}
> +
> +	amd_certs = psp_copy_user_blob(params.amd_certs_uaddr,
> +				params.amd_certs_len);
> +	if (IS_ERR(amd_certs)) {
> +		ret = PTR_ERR(amd_certs);
> +		goto e_free_plat_cert;
> +	}
> +
> +	data = kzalloc(sizeof(*data), GFP_KERNEL_ACCOUNT);
> +	if (data == NULL) {
> +		ret = -ENOMEM;
> +		goto e_free_amd_cert;
> +	}
> +
> +	/* populate the FW SEND_START field with system physical address */
> +	data->pdh_cert_address = __psp_pa(pdh_cert);
> +	data->pdh_cert_len = params.pdh_cert_len;
> +	data->plat_certs_address = __psp_pa(plat_certs);
> +	data->plat_certs_len = params.plat_certs_len;
> +	data->amd_certs_address = __psp_pa(amd_certs);
> +	data->amd_certs_len = params.amd_certs_len;
> +	data->session_address = __psp_pa(session_data);
> +	data->session_len = params.session_len;
> +	data->handle = sev->handle;
> +
> +	ret = sev_issue_cmd(kvm, SEV_CMD_SEND_START, data, &argp->error);
> +
> +	if (ret)
> +		goto e_free;
> +
> +	if (copy_to_user((void __user *)(uintptr_t) params.session_uaddr,
> +			session_data, params.session_len)) {
> +		ret = -EFAULT;
> +		goto e_free;
> +	}
> +
> +	params.policy = data->policy;
> +	params.session_len = data->session_len;
> +	if (copy_to_user((void __user *)(uintptr_t)argp->data, &params,
> +				sizeof(struct kvm_sev_send_start)))
> +		ret = -EFAULT;
> +
> +e_free:
> +	kfree(data);
> +e_free_amd_cert:
> +	kfree(amd_certs);
> +e_free_plat_cert:
> +	kfree(plat_certs);
> +e_free_pdh:
> +	kfree(pdh_cert);
> +e_free_session:
> +	kfree(session_data);
> +	return ret;
> +}
> +
>   static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
>   {
>   	struct kvm_sev_cmd sev_cmd;
> @@ -7193,6 +7318,9 @@ static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
>   	case KVM_SEV_LAUNCH_SECRET:
>   		r = sev_launch_secret(kvm, &sev_cmd);
>   		break;
> +	case KVM_SEV_SEND_START:
> +		r = sev_send_start(kvm, &sev_cmd);
> +		break;
>   	default:
>   		r = -EINVAL;
>   		goto out;
> diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
> index 5167bf2bfc75..9f63b9d48b63 100644
> --- a/include/linux/psp-sev.h
> +++ b/include/linux/psp-sev.h
> @@ -323,11 +323,11 @@ struct sev_data_send_start {
>   	u64 pdh_cert_address;			/* In */
>   	u32 pdh_cert_len;			/* In */
>   	u32 reserved1;
> -	u64 plat_cert_address;			/* In */
> -	u32 plat_cert_len;			/* In */
> +	u64 plat_certs_address;			/* In */
> +	u32 plat_certs_len;			/* In */


It seems that the 'platform certificate' and the 'amd_certificate' are
single entities, meaning only one copy exists for the particular platform
and the particular AMD product. Why are these plural, then?


>   	u32 reserved2;
> -	u64 amd_cert_address;			/* In */
> -	u32 amd_cert_len;			/* In */
> +	u64 amd_certs_address;			/* In */
> +	u32 amd_certs_len;			/* In */
>   	u32 reserved3;
>   	u64 session_address;			/* In */
>   	u32 session_len;			/* In/Out */
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 4b95f9a31a2f..17bef4c245e1 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1558,6 +1558,18 @@ struct kvm_sev_dbg {
>   	__u32 len;
>   };
>   
> +struct kvm_sev_send_start {
> +	__u32 policy;
> +	__u64 pdh_cert_uaddr;
> +	__u32 pdh_cert_len;
> +	__u64 plat_certs_uaddr;
> +	__u32 plat_certs_len;
> +	__u64 amd_certs_uaddr;
> +	__u32 amd_certs_len;
> +	__u64 session_uaddr;
> +	__u32 session_len;
> +};
> +
>   #define KVM_DEV_ASSIGN_ENABLE_IOMMU	(1 << 0)
>   #define KVM_DEV_ASSIGN_PCI_2_3		(1 << 1)
>   #define KVM_DEV_ASSIGN_MASK_INTX	(1 << 2)


* Re: [PATCH v6 02/14] KVM: SVM: Add KVM_SEND_UPDATE_DATA command
  2020-03-30  6:20 ` [PATCH v6 02/14] KVM: SVM: Add KVM_SEND_UPDATE_DATA command Ashish Kalra
@ 2020-04-02 17:55   ` Venu Busireddy
  2020-04-02 20:13   ` Krish Sadhukhan
  1 sibling, 0 replies; 107+ messages in thread
From: Venu Busireddy @ 2020-04-02 17:55 UTC (permalink / raw)
  To: Ashish Kalra
  Cc: pbonzini, tglx, mingo, hpa, joro, bp, thomas.lendacky, x86, kvm,
	linux-kernel, rientjes, srutherford, luto, brijesh.singh

On 2020-03-30 06:20:33 +0000, Ashish Kalra wrote:
> From: Brijesh Singh <Brijesh.Singh@amd.com>
> 
> The command is used for encrypting the guest memory region using the encryption
> context created with KVM_SEV_SEND_START.
> 
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: "Radim Krčmář" <rkrcmar@redhat.com>
> Cc: Joerg Roedel <joro@8bytes.org>
> Cc: Borislav Petkov <bp@suse.de>
> Cc: Tom Lendacky <thomas.lendacky@amd.com>
> Cc: x86@kernel.org
> Cc: kvm@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Reviewed-by: Steve Rutherford <srutherford@google.com>
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> ---
>  .../virt/kvm/amd-memory-encryption.rst        |  24 ++++
>  arch/x86/kvm/svm.c                            | 136 +++++++++++++++++-
>  include/uapi/linux/kvm.h                      |   9 ++
>  3 files changed, 165 insertions(+), 4 deletions(-)
> 
> diff --git a/Documentation/virt/kvm/amd-memory-encryption.rst b/Documentation/virt/kvm/amd-memory-encryption.rst
> index 4fd34fc5c7a7..f46817ef7019 100644
> --- a/Documentation/virt/kvm/amd-memory-encryption.rst
> +++ b/Documentation/virt/kvm/amd-memory-encryption.rst
> @@ -290,6 +290,30 @@ Returns: 0 on success, -negative on error
>                  __u32 session_len;
>          };
>  
> +11. KVM_SEV_SEND_UPDATE_DATA
> +----------------------------
> +
> +The KVM_SEV_SEND_UPDATE_DATA command can be used by the hypervisor to encrypt the
> +outgoing guest memory region with the encryption context created using
> +KVM_SEV_SEND_START.
> +
> +Parameters (in): struct kvm_sev_send_update_data
> +
> +Returns: 0 on success, -negative on error
> +
> +::
> +
> +        struct kvm_sev_launch_send_update_data {
> +                __u64 hdr_uaddr;        /* userspace address containing the packet header */
> +                __u32 hdr_len;
> +
> +                __u64 guest_uaddr;      /* the source memory region to be encrypted */
> +                __u32 guest_len;
> +
> +                __u64 trans_uaddr;      /* the destination memory region */
> +                __u32 trans_len;
> +        };
> +
>  References
>  ==========
>  
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 63d172e974ad..8561c47cc4f9 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -428,6 +428,7 @@ static DECLARE_RWSEM(sev_deactivate_lock);
>  static DEFINE_MUTEX(sev_bitmap_lock);
>  static unsigned int max_sev_asid;
>  static unsigned int min_sev_asid;
> +static unsigned long sev_me_mask;
>  static unsigned long *sev_asid_bitmap;
>  static unsigned long *sev_reclaim_asid_bitmap;
>  #define __sme_page_pa(x) __sme_set(page_to_pfn(x) << PAGE_SHIFT)
> @@ -1232,16 +1233,22 @@ static int avic_ga_log_notifier(u32 ga_tag)
>  static __init int sev_hardware_setup(void)
>  {
>  	struct sev_user_data_status *status;
> +	u32 eax, ebx;
>  	int rc;
>  
> -	/* Maximum number of encrypted guests supported simultaneously */
> -	max_sev_asid = cpuid_ecx(0x8000001F);
> +	/*
> +	 * Query the memory encryption information.
> +	 *  EBX:  Bit 0:5 Pagetable bit position used to indicate encryption
> +	 *  (aka Cbit).
> +	 *  ECX:  Maximum number of encrypted guests supported simultaneously.
> +	 *  EDX:  Minimum ASID value that should be used for SEV guest.
> +	 */
> +	cpuid(0x8000001f, &eax, &ebx, &max_sev_asid, &min_sev_asid);
>  
>  	if (!max_sev_asid)
>  		return 1;
>  
> -	/* Minimum ASID value that should be used for SEV guest */
> -	min_sev_asid = cpuid_edx(0x8000001F);
> +	sev_me_mask = 1UL << (ebx & 0x3f);
>  
>  	/* Initialize SEV ASID bitmaps */
>  	sev_asid_bitmap = bitmap_zalloc(max_sev_asid, GFP_KERNEL);
> @@ -7274,6 +7281,124 @@ static int sev_send_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
>  	return ret;
>  }
>  
> +/* Userspace wants to query either header or trans length. */
> +static int
> +__sev_send_update_data_query_lengths(struct kvm *kvm, struct kvm_sev_cmd *argp,
> +				     struct kvm_sev_send_update_data *params)
> +{
> +	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> +	struct sev_data_send_update_data *data;
> +	int ret;
> +
> +	data = kzalloc(sizeof(*data), GFP_KERNEL_ACCOUNT);
> +	if (!data)
> +		return -ENOMEM;
> +
> +	data->handle = sev->handle;
> +	ret = sev_issue_cmd(kvm, SEV_CMD_SEND_UPDATE_DATA, data, &argp->error);
> +
> +	params->hdr_len = data->hdr_len;
> +	params->trans_len = data->trans_len;
> +
> +	if (copy_to_user((void __user *)(uintptr_t)argp->data, params,
> +			 sizeof(struct kvm_sev_send_update_data)))
> +		ret = -EFAULT;
> +
> +	kfree(data);
> +	return ret;
> +}
> +
> +static int sev_send_update_data(struct kvm *kvm, struct kvm_sev_cmd *argp)
> +{
> +	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> +	struct sev_data_send_update_data *data;
> +	struct kvm_sev_send_update_data params;
> +	void *hdr, *trans_data;
> +	struct page **guest_page;
> +	unsigned long n;
> +	int ret, offset;
> +
> +	if (!sev_guest(kvm))
> +		return -ENOTTY;
> +
> +	if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data,
> +			sizeof(struct kvm_sev_send_update_data)))
> +		return -EFAULT;
> +
> +	/* userspace wants to query either header or trans length */
> +	if (!params.trans_len || !params.hdr_len)
> +		return __sev_send_update_data_query_lengths(kvm, argp, &params);
> +
> +	if (!params.trans_uaddr || !params.guest_uaddr ||
> +	    !params.guest_len || !params.hdr_uaddr)
> +		return -EINVAL;
> +
> +
> +	/* Check if we are crossing the page boundary */
> +	offset = params.guest_uaddr & (PAGE_SIZE - 1);
> +	if ((params.guest_len + offset > PAGE_SIZE))
> +		return -EINVAL;
> +
> +	/* Pin guest memory */
> +	guest_page = sev_pin_memory(kvm, params.guest_uaddr & PAGE_MASK,
> +				    PAGE_SIZE, &n, 0);
> +	if (!guest_page)
> +		return -EFAULT;
> +
> +	/* allocate memory for header and transport buffer */
> +	ret = -ENOMEM;
> +	hdr = kmalloc(params.hdr_len, GFP_KERNEL_ACCOUNT);
> +	if (!hdr)
> +		goto e_unpin;
> +
> +	trans_data = kmalloc(params.trans_len, GFP_KERNEL_ACCOUNT);
> +	if (!trans_data)
> +		goto e_free_hdr;
> +
> +	data = kzalloc(sizeof(*data), GFP_KERNEL);
> +	if (!data)
> +		goto e_free_trans_data;
> +
> +	data->hdr_address = __psp_pa(hdr);
> +	data->hdr_len = params.hdr_len;
> +	data->trans_address = __psp_pa(trans_data);
> +	data->trans_len = params.trans_len;
> +
> +	/* The SEND_UPDATE_DATA command requires C-bit to be always set. */
> +	data->guest_address = (page_to_pfn(guest_page[0]) << PAGE_SHIFT) +
> +				offset;
> +	data->guest_address |= sev_me_mask;
> +	data->guest_len = params.guest_len;
> +	data->handle = sev->handle;
> +
> +	ret = sev_issue_cmd(kvm, SEV_CMD_SEND_UPDATE_DATA, data, &argp->error);
> +
> +	if (ret)
> +		goto e_free;
> +
> +	/* copy transport buffer to user space */
> +	if (copy_to_user((void __user *)(uintptr_t)params.trans_uaddr,
> +			 trans_data, params.trans_len)) {
> +		ret = -EFAULT;
> +		goto e_unpin;

Shouldn't this be 

		goto e_free;
?

> +	}
> +
> +	/* Copy packet header to userspace. */
> +	ret = copy_to_user((void __user *)(uintptr_t)params.hdr_uaddr, hdr,
> +				params.hdr_len);
> +
> +e_free:
> +	kfree(data);
> +e_free_trans_data:
> +	kfree(trans_data);
> +e_free_hdr:
> +	kfree(hdr);
> +e_unpin:
> +	sev_unpin_memory(kvm, guest_page, n);
> +
> +	return ret;
> +}
> +
>  static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
>  {
>  	struct kvm_sev_cmd sev_cmd;
> @@ -7321,6 +7446,9 @@ static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
>  	case KVM_SEV_SEND_START:
>  		r = sev_send_start(kvm, &sev_cmd);
>  		break;
> +	case KVM_SEV_SEND_UPDATE_DATA:
> +		r = sev_send_update_data(kvm, &sev_cmd);
> +		break;
>  	default:
>  		r = -EINVAL;
>  		goto out;
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 17bef4c245e1..d9dc81bb9c55 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1570,6 +1570,15 @@ struct kvm_sev_send_start {
>  	__u32 session_len;
>  };
>  
> +struct kvm_sev_send_update_data {
> +	__u64 hdr_uaddr;
> +	__u32 hdr_len;
> +	__u64 guest_uaddr;
> +	__u32 guest_len;
> +	__u64 trans_uaddr;
> +	__u32 trans_len;
> +};
> +
>  #define KVM_DEV_ASSIGN_ENABLE_IOMMU	(1 << 0)
>  #define KVM_DEV_ASSIGN_PCI_2_3		(1 << 1)
>  #define KVM_DEV_ASSIGN_MASK_INTX	(1 << 2)
> -- 
> 2.17.1
> 


* Re: [PATCH v6 01/14] KVM: SVM: Add KVM_SEV SEND_START command
  2020-04-02 16:37       ` Venu Busireddy
@ 2020-04-02 18:04         ` Brijesh Singh
  2020-04-02 18:57           ` Venu Busireddy
  0 siblings, 1 reply; 107+ messages in thread
From: Brijesh Singh @ 2020-04-02 18:04 UTC (permalink / raw)
  To: Venu Busireddy
  Cc: brijesh.singh, Ashish Kalra, pbonzini, tglx, mingo, hpa, joro,
	bp, thomas.lendacky, x86, kvm, linux-kernel, rientjes,
	srutherford, luto


On 4/2/20 11:37 AM, Venu Busireddy wrote:
> On 2020-04-02 07:59:54 -0500, Brijesh Singh wrote:
>> Hi Venu,
>>
>> Thanks for the feedback.
>>
>> On 4/2/20 1:27 AM, Venu Busireddy wrote:
>>> On 2020-03-30 06:19:59 +0000, Ashish Kalra wrote:
>>>> From: Brijesh Singh <Brijesh.Singh@amd.com>
>>>>
>>>> The command is used to create an outgoing SEV guest encryption context.
>>>>
>>>> Cc: Thomas Gleixner <tglx@linutronix.de>
>>>> Cc: Ingo Molnar <mingo@redhat.com>
>>>> Cc: "H. Peter Anvin" <hpa@zytor.com>
>>>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>>>> Cc: "Radim Krčmář" <rkrcmar@redhat.com>
>>>> Cc: Joerg Roedel <joro@8bytes.org>
>>>> Cc: Borislav Petkov <bp@suse.de>
>>>> Cc: Tom Lendacky <thomas.lendacky@amd.com>
>>>> Cc: x86@kernel.org
>>>> Cc: kvm@vger.kernel.org
>>>> Cc: linux-kernel@vger.kernel.org
>>>> Reviewed-by: Steve Rutherford <srutherford@google.com>
>>>> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
>>>> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
>>>> ---
>>>>  .../virt/kvm/amd-memory-encryption.rst        |  27 ++++
>>>>  arch/x86/kvm/svm.c                            | 128 ++++++++++++++++++
>>>>  include/linux/psp-sev.h                       |   8 +-
>>>>  include/uapi/linux/kvm.h                      |  12 ++
>>>>  4 files changed, 171 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/Documentation/virt/kvm/amd-memory-encryption.rst b/Documentation/virt/kvm/amd-memory-encryption.rst
>>>> index c3129b9ba5cb..4fd34fc5c7a7 100644
>>>> --- a/Documentation/virt/kvm/amd-memory-encryption.rst
>>>> +++ b/Documentation/virt/kvm/amd-memory-encryption.rst
>>>> @@ -263,6 +263,33 @@ Returns: 0 on success, -negative on error
>>>>                  __u32 trans_len;
>>>>          };
>>>>  
>>>> +10. KVM_SEV_SEND_START
>>>> +----------------------
>>>> +
>>>> +The KVM_SEV_SEND_START command can be used by the hypervisor to create an
>>>> +outgoing guest encryption context.
>>>> +
>>>> +Parameters (in): struct kvm_sev_send_start
>>>> +
>>>> +Returns: 0 on success, -negative on error
>>>> +
>>>> +::
>>>> +        struct kvm_sev_send_start {
>>>> +                __u32 policy;                 /* guest policy */
>>>> +
>>>> +                __u64 pdh_cert_uaddr;         /* platform Diffie-Hellman certificate */
>>>> +                __u32 pdh_cert_len;
>>>> +
>>>> +                __u64 plat_certs_uadr;        /* platform certificate chain */
>>> Could this please be changed to plat_certs_uaddr, as it is referred to
>>> in the rest of the code?
>>>
>>>> +                __u32 plat_certs_len;
>>>> +
>>>> +                __u64 amd_certs_uaddr;        /* AMD certificate */
>>>> +                __u32 amd_cert_len;
>>> Could this please be changed to amd_certs_len, as it is referred to in
>>> the rest of the code?
>>>
>>>> +
>>>> +                __u64 session_uaddr;          /* Guest session information */
>>>> +                __u32 session_len;
>>>> +        };
>>>> +
>>>>  References
>>>>  ==========
>>>>  
>>>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
>>>> index 50d1ebafe0b3..63d172e974ad 100644
>>>> --- a/arch/x86/kvm/svm.c
>>>> +++ b/arch/x86/kvm/svm.c
>>>> @@ -7149,6 +7149,131 @@ static int sev_launch_secret(struct kvm *kvm, struct kvm_sev_cmd *argp)
>>>>  	return ret;
>>>>  }
>>>>  
>>>> +/* Userspace wants to query session length. */
>>>> +static int
>>>> +__sev_send_start_query_session_length(struct kvm *kvm, struct kvm_sev_cmd *argp,
>>>> +				      struct kvm_sev_send_start *params)
>>>> +{
>>>> +	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
>>>> +	struct sev_data_send_start *data;
>>>> +	int ret;
>>>> +
>>>> +	data = kzalloc(sizeof(*data), GFP_KERNEL_ACCOUNT);
>>>> +	if (data == NULL)
>>>> +		return -ENOMEM;
>>>> +
>>>> +	data->handle = sev->handle;
>>>> +	ret = sev_issue_cmd(kvm, SEV_CMD_SEND_START, data, &argp->error);
>>>> +
>>>> +	params->session_len = data->session_len;
>>>> +	if (copy_to_user((void __user *)(uintptr_t)argp->data, params,
>>>> +				sizeof(struct kvm_sev_send_start)))
>>>> +		ret = -EFAULT;
>>>> +
>>>> +	kfree(data);
>>>> +	return ret;
>>>> +}
>>>> +
>>>> +static int sev_send_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
>>>> +{
>>>> +	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
>>>> +	struct sev_data_send_start *data;
>>>> +	struct kvm_sev_send_start params;
>>>> +	void *amd_certs, *session_data;
>>>> +	void *pdh_cert, *plat_certs;
>>>> +	int ret;
>>>> +
>>>> +	if (!sev_guest(kvm))
>>>> +		return -ENOTTY;
>>>> +
>>>> +	if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data,
>>>> +				sizeof(struct kvm_sev_send_start)))
>>>> +		return -EFAULT;
>>>> +
>>>> +	/* if session_len is zero, userspace wants to query the session length */
>>>> +	if (!params.session_len)
>>>> +		return __sev_send_start_query_session_length(kvm, argp,
>>>> +				&params);
>>>> +
>>>> +	/* some sanity checks */
>>>> +	if (!params.pdh_cert_uaddr || !params.pdh_cert_len ||
>>>> +	    !params.session_uaddr || params.session_len > SEV_FW_BLOB_MAX_SIZE)
>>>> +		return -EINVAL;
>>>> +
>>>> +	/* allocate the memory to hold the session data blob */
>>>> +	session_data = kmalloc(params.session_len, GFP_KERNEL_ACCOUNT);
>>>> +	if (!session_data)
>>>> +		return -ENOMEM;
>>>> +
>>>> +	/* copy the certificate blobs from userspace */
>>>> +	pdh_cert = psp_copy_user_blob(params.pdh_cert_uaddr,
>>>> +				params.pdh_cert_len);
>>>> +	if (IS_ERR(pdh_cert)) {
>>>> +		ret = PTR_ERR(pdh_cert);
>>>> +		goto e_free_session;
>>>> +	}
>>>> +
>>>> +	plat_certs = psp_copy_user_blob(params.plat_certs_uaddr,
>>>> +				params.plat_certs_len);
>>>> +	if (IS_ERR(plat_certs)) {
>>>> +		ret = PTR_ERR(plat_certs);
>>>> +		goto e_free_pdh;
>>>> +	}
>>>> +
>>>> +	amd_certs = psp_copy_user_blob(params.amd_certs_uaddr,
>>>> +				params.amd_certs_len);
>>>> +	if (IS_ERR(amd_certs)) {
>>>> +		ret = PTR_ERR(amd_certs);
>>>> +		goto e_free_plat_cert;
>>>> +	}
>>>> +
>>>> +	data = kzalloc(sizeof(*data), GFP_KERNEL_ACCOUNT);
>>>> +	if (data == NULL) {
>>>> +		ret = -ENOMEM;
>>>> +		goto e_free_amd_cert;
>>>> +	}
>>>> +
>>>> +	/* populate the FW SEND_START field with system physical address */
>>>> +	data->pdh_cert_address = __psp_pa(pdh_cert);
>>>> +	data->pdh_cert_len = params.pdh_cert_len;
>>>> +	data->plat_certs_address = __psp_pa(plat_certs);
>>>> +	data->plat_certs_len = params.plat_certs_len;
>>>> +	data->amd_certs_address = __psp_pa(amd_certs);
>>>> +	data->amd_certs_len = params.amd_certs_len;
>>>> +	data->session_address = __psp_pa(session_data);
>>>> +	data->session_len = params.session_len;
>>>> +	data->handle = sev->handle;
>>>> +
>>>> +	ret = sev_issue_cmd(kvm, SEV_CMD_SEND_START, data, &argp->error);
>>>> +
>>>> +	if (ret)
>>>> +		goto e_free;
>>>> +
>>>> +	if (copy_to_user((void __user *)(uintptr_t) params.session_uaddr,
>>>> +			session_data, params.session_len)) {
>>>> +		ret = -EFAULT;
>>>> +		goto e_free;
>>>> +	}
>>> To optimize the amount of data being copied to user space, could the
>>> above section of code changed as follows?
>>>
>>> 	params.session_len = data->session_len;
>>> 	if (copy_to_user((void __user *)(uintptr_t) params.session_uaddr,
>>> 			session_data, params.session_len)) {
>>> 		ret = -EFAULT;
>>> 		goto e_free;
>>> 	}
>>
>> We should not be using the data->session_len, it will cause -EFAULT when
>> user has not allocated enough space in the session_uaddr. Let's consider
>> the case where user passes session_len=10 but firmware thinks the
>> session length should be 64. In that case the data->session_len will
>> contain a value of 64 but userspace has allocated space for 10 bytes
>> and copy_to_user() will fail. If we are really concerned about the amount
>> of data getting copied to userspace then use min_t(size_t,
>> params.session_len, data->session_len).
> We are allocating a buffer of params.session_len size and passing that
> buffer, and that length via data->session_len, to the firmware. Why would
> the firmware set data->session_len to a larger value, in spite of telling
> it that the buffer is only params.session_len long? I thought that only
> the reverse is possible, that is, the user sets the params.session_len
> to the MAX, but the session data is actually smaller than that size.


The question is, how does userspace know the session length? One
method is to precalculate a value based on your firmware version
and have userspace pass that; another approach is to set
params.session_len = 0 and query it from the FW. The FW spec allows
querying the length, please see the spec. In the qemu patches I chose the
second approach. This is because the session blob can change from one FW
version to another, and I tried to avoid calculating or hardcoding the
length for one version of the FW. You can certainly choose the first
method. We want to ensure that the kernel interface works in both cases.


> Also, if for whatever reason the firmware sets data->session_len to
> a larger value than what is passed, what is the user space expected
> to do when the call returns? If the user space tries to access
> params.session_len amount of data, it will possibly get a memory access
> violation, because it did not originally allocate that large a buffer.
>
> If we do go with using min_t(size_t, params.session_len,
> data->session_len), then params.session_len should also be set to the
> smaller of the two, right?
>
>>>> +
>>>> +	params.policy = data->policy;
>>>> +	params.session_len = data->session_len;
>>>> +	if (copy_to_user((void __user *)(uintptr_t)argp->data, &params,
>>>> +				sizeof(struct kvm_sev_send_start)))
>>>> +		ret = -EFAULT;
>>> Since the only fields that are changed in the kvm_sev_send_start structure
>>> are session_len and policy, why do we need to copy the entire structure
>>> back to the user? Why not just those two values? Please see the changes
>>> proposed to kvm_sev_send_start structure further below to accomplish this.
>> I think we also need to consider code readability while saving CPU
>> cycles. This is a very small structure. By duplicating into two calls,
>> #1 copy params.policy and #2 copy params.session_len, we will add more
>> CPU cycles. And, if we get creative and rearrange the structure, then code
>> readability is lost because now the copy will depend on how the
>> structure is laid out in memory.
> I was not recommending splitting that call into two. That would certainly
> be more expensive, than copying the entire structure. That is the reason
> why I suggested reordering the members of kvm_sev_send_start. Isn't
> there plenty of code where structures are defined in a way to keep the
> data movement efficient? :-)
>
> Please see my other comment below.
>
>>> 	params.policy = data->policy;
>>> 	if (copy_to_user((void __user *)(uintptr_t)argp->data, &params,
>>> 			sizeof(params.policy) + sizeof(params.session_len))
>>> 		ret = -EFAULT;
>>>> +
>>>> +e_free:
>>>> +	kfree(data);
>>>> +e_free_amd_cert:
>>>> +	kfree(amd_certs);
>>>> +e_free_plat_cert:
>>>> +	kfree(plat_certs);
>>>> +e_free_pdh:
>>>> +	kfree(pdh_cert);
>>>> +e_free_session:
>>>> +	kfree(session_data);
>>>> +	return ret;
>>>> +}
>>>> +
>>>>  static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
>>>>  {
>>>>  	struct kvm_sev_cmd sev_cmd;
>>>> @@ -7193,6 +7318,9 @@ static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
>>>>  	case KVM_SEV_LAUNCH_SECRET:
>>>>  		r = sev_launch_secret(kvm, &sev_cmd);
>>>>  		break;
>>>> +	case KVM_SEV_SEND_START:
>>>> +		r = sev_send_start(kvm, &sev_cmd);
>>>> +		break;
>>>>  	default:
>>>>  		r = -EINVAL;
>>>>  		goto out;
>>>> diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
>>>> index 5167bf2bfc75..9f63b9d48b63 100644
>>>> --- a/include/linux/psp-sev.h
>>>> +++ b/include/linux/psp-sev.h
>>>> @@ -323,11 +323,11 @@ struct sev_data_send_start {
>>>>  	u64 pdh_cert_address;			/* In */
>>>>  	u32 pdh_cert_len;			/* In */
>>>>  	u32 reserved1;
>>>> -	u64 plat_cert_address;			/* In */
>>>> -	u32 plat_cert_len;			/* In */
>>>> +	u64 plat_certs_address;			/* In */
>>>> +	u32 plat_certs_len;			/* In */
>>>>  	u32 reserved2;
>>>> -	u64 amd_cert_address;			/* In */
>>>> -	u32 amd_cert_len;			/* In */
>>>> +	u64 amd_certs_address;			/* In */
>>>> +	u32 amd_certs_len;			/* In */
>>>>  	u32 reserved3;
>>>>  	u64 session_address;			/* In */
>>>>  	u32 session_len;			/* In/Out */
>>>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>>>> index 4b95f9a31a2f..17bef4c245e1 100644
>>>> --- a/include/uapi/linux/kvm.h
>>>> +++ b/include/uapi/linux/kvm.h
>>>> @@ -1558,6 +1558,18 @@ struct kvm_sev_dbg {
>>>>  	__u32 len;
>>>>  };
>>>>  
>>>> +struct kvm_sev_send_start {
>>>> +	__u32 policy;
>>>> +	__u64 pdh_cert_uaddr;
>>>> +	__u32 pdh_cert_len;
>>>> +	__u64 plat_certs_uaddr;
>>>> +	__u32 plat_certs_len;
>>>> +	__u64 amd_certs_uaddr;
>>>> +	__u32 amd_certs_len;
>>>> +	__u64 session_uaddr;
>>>> +	__u32 session_len;
>>>> +};
>>> Redo this structure as below:
>>>
>>> struct kvm_sev_send_start {
>>> 	__u32 policy;
>>> 	__u32 session_len;
>>> 	__u64 session_uaddr;
>>> 	__u64 pdh_cert_uaddr;
>>> 	__u32 pdh_cert_len;
>>> 	__u64 plat_certs_uaddr;
>>> 	__u32 plat_certs_len;
>>> 	__u64 amd_certs_uaddr;
>>> 	__u32 amd_certs_len;
>>> };
>>>
>>> Or as below, just to make it look better.
>>>
>>> struct kvm_sev_send_start {
>>> 	__u32 policy;
>>> 	__u32 session_len;
>>> 	__u64 session_uaddr;
>>> 	__u32 pdh_cert_len;
>>> 	__u64 pdh_cert_uaddr;
>>> 	__u32 plat_certs_len;
>>> 	__u64 plat_certs_uaddr;
>>> 	__u32 amd_certs_len;
>>> 	__u64 amd_certs_uaddr;
>>> };
>>>
>> Wherever applicable, I tried my best not to diverge from the SEV spec
>> structure layout. Anyone reading the SEV FW spec will see a
>> similar structure layout in the KVM/PSP header files. I would prefer to
>> stick to that approach.
> This structure is in uapi, and is anyway different from the
> sev_data_send_start, right? Does it really need to stay close to the
> firmware structure layout? Just because the firmware folks thought of
> a structure layout, that should not prevent our code to be efficient.
>
>>
>>>> +
>>>>  #define KVM_DEV_ASSIGN_ENABLE_IOMMU	(1 << 0)
>>>>  #define KVM_DEV_ASSIGN_PCI_2_3		(1 << 1)
>>>>  #define KVM_DEV_ASSIGN_MASK_INTX	(1 << 2)
>>>> -- 
>>>> 2.17.1
>>>>

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v6 03/14] KVM: SVM: Add KVM_SEV_SEND_FINISH command
  2020-03-30  6:20 ` [PATCH v6 03/14] KVM: SVM: Add KVM_SEV_SEND_FINISH command Ashish Kalra
@ 2020-04-02 18:17   ` Venu Busireddy
  2020-04-02 20:15   ` Krish Sadhukhan
  1 sibling, 0 replies; 107+ messages in thread
From: Venu Busireddy @ 2020-04-02 18:17 UTC (permalink / raw)
  To: Ashish Kalra
  Cc: pbonzini, tglx, mingo, hpa, joro, bp, thomas.lendacky, x86, kvm,
	linux-kernel, rientjes, srutherford, luto, brijesh.singh

On 2020-03-30 06:20:49 +0000, Ashish Kalra wrote:
> From: Brijesh Singh <Brijesh.Singh@amd.com>
> 
> The command is used to finalize the encryption context created with
> KVM_SEV_SEND_START command.
> 
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: "Radim Krčmář" <rkrcmar@redhat.com>
> Cc: Joerg Roedel <joro@8bytes.org>
> Cc: Borislav Petkov <bp@suse.de>
> Cc: Tom Lendacky <thomas.lendacky@amd.com>
> Cc: x86@kernel.org
> Cc: kvm@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Reviewed-by: Steve Rutherford <srutherford@google.com>
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> ---
>  .../virt/kvm/amd-memory-encryption.rst        |  8 +++++++
>  arch/x86/kvm/svm.c                            | 23 +++++++++++++++++++
>  2 files changed, 31 insertions(+)
> 
> diff --git a/Documentation/virt/kvm/amd-memory-encryption.rst b/Documentation/virt/kvm/amd-memory-encryption.rst
> index f46817ef7019..a45dcb5f8687 100644
> --- a/Documentation/virt/kvm/amd-memory-encryption.rst
> +++ b/Documentation/virt/kvm/amd-memory-encryption.rst
> @@ -314,6 +314,14 @@ Returns: 0 on success, -negative on error
>                  __u32 trans_len;
>          };
>  
> +12. KVM_SEV_SEND_FINISH
> +------------------------
> +
> +After completion of the migration flow, the KVM_SEV_SEND_FINISH command can be
> +issued by the hypervisor to delete the encryption context.
> +
> +Returns: 0 on success, -negative on error

Didn't notice this earlier. I would suggest changing all occurrences of
"-negative" to either "negative" or "less than 0" in this file.

> +
>  References
>  ==========
>  
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 8561c47cc4f9..71a4cb3b817d 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -7399,6 +7399,26 @@ static int sev_send_update_data(struct kvm *kvm, struct kvm_sev_cmd *argp)
>  	return ret;
>  }
>  
> +static int sev_send_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
> +{
> +	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> +	struct sev_data_send_finish *data;
> +	int ret;
> +
> +	if (!sev_guest(kvm))
> +		return -ENOTTY;
> +
> +	data = kzalloc(sizeof(*data), GFP_KERNEL);
> +	if (!data)
> +		return -ENOMEM;
> +
> +	data->handle = sev->handle;
> +	ret = sev_issue_cmd(kvm, SEV_CMD_SEND_FINISH, data, &argp->error);
> +
> +	kfree(data);
> +	return ret;
> +}
> +
>  static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
>  {
>  	struct kvm_sev_cmd sev_cmd;
> @@ -7449,6 +7469,9 @@ static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
>  	case KVM_SEV_SEND_UPDATE_DATA:
>  		r = sev_send_update_data(kvm, &sev_cmd);
>  		break;
> +	case KVM_SEV_SEND_FINISH:
> +		r = sev_send_finish(kvm, &sev_cmd);
> +		break;
>  	default:
>  		r = -EINVAL;
>  		goto out;
> -- 
> 2.17.1
> 


* Re: [PATCH v6 01/14] KVM: SVM: Add KVM_SEV SEND_START command
  2020-04-02 17:51   ` Krish Sadhukhan
@ 2020-04-02 18:38     ` Brijesh Singh
  0 siblings, 0 replies; 107+ messages in thread
From: Brijesh Singh @ 2020-04-02 18:38 UTC (permalink / raw)
  To: Krish Sadhukhan, Ashish Kalra, pbonzini
  Cc: brijesh.singh, tglx, mingo, hpa, joro, bp, thomas.lendacky, x86,
	kvm, linux-kernel, rientjes, srutherford, luto


On 4/2/20 12:51 PM, Krish Sadhukhan wrote:
>
> On 3/29/20 11:19 PM, Ashish Kalra wrote:
>> From: Brijesh Singh <Brijesh.Singh@amd.com>
>>
>> The command is used to create an outgoing SEV guest encryption context.
>>
>> Cc: Thomas Gleixner <tglx@linutronix.de>
>> Cc: Ingo Molnar <mingo@redhat.com>
>> Cc: "H. Peter Anvin" <hpa@zytor.com>
>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>> Cc: "Radim Krčmář" <rkrcmar@redhat.com>
>> Cc: Joerg Roedel <joro@8bytes.org>
>> Cc: Borislav Petkov <bp@suse.de>
>> Cc: Tom Lendacky <thomas.lendacky@amd.com>
>> Cc: x86@kernel.org
>> Cc: kvm@vger.kernel.org
>> Cc: linux-kernel@vger.kernel.org
>> Reviewed-by: Steve Rutherford <srutherford@google.com>
>> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
>> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
>> ---
>>   .../virt/kvm/amd-memory-encryption.rst        |  27 ++++
>>   arch/x86/kvm/svm.c                            | 128 ++++++++++++++++++
>>   include/linux/psp-sev.h                       |   8 +-
>>   include/uapi/linux/kvm.h                      |  12 ++
>>   4 files changed, 171 insertions(+), 4 deletions(-)
>>
>> diff --git a/Documentation/virt/kvm/amd-memory-encryption.rst
>> b/Documentation/virt/kvm/amd-memory-encryption.rst
>> index c3129b9ba5cb..4fd34fc5c7a7 100644
>> --- a/Documentation/virt/kvm/amd-memory-encryption.rst
>> +++ b/Documentation/virt/kvm/amd-memory-encryption.rst
>> @@ -263,6 +263,33 @@ Returns: 0 on success, -negative on error
>>                   __u32 trans_len;
>>           };
>>   +10. KVM_SEV_SEND_START
>> +----------------------
>> +
>> +The KVM_SEV_SEND_START command can be used by the hypervisor to
>> create an
>> +outgoing guest encryption context.
> Shouldn't we mention that this command is also used to save the guest
> to the disk ?


No, because this command is not used for saving to the disk.


>> +
>> +Parameters (in): struct kvm_sev_send_start
>> +
>> +Returns: 0 on success, -negative on error
>> +
>> +::
>> +        struct kvm_sev_send_start {
>> +                __u32 policy;                 /* guest policy */
>> +
>> +                __u64 pdh_cert_uaddr;         /* platform
>> Diffie-Hellman certificate */
>> +                __u32 pdh_cert_len;
>> +
>> +                __u64 plat_certs_uadr;        /* platform
>> certificate chain */
>> +                __u32 plat_certs_len;
>> +
>> +                __u64 amd_certs_uaddr;        /* AMD certificate */
>> +                __u32 amd_cert_len;
>> +
>> +                __u64 session_uaddr;          /* Guest session
>> information */
>> +                __u32 session_len;
>> +        };
>> +
>>   References
>>   ==========
>>   diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
>> index 50d1ebafe0b3..63d172e974ad 100644
>> --- a/arch/x86/kvm/svm.c
>> +++ b/arch/x86/kvm/svm.c
>> @@ -7149,6 +7149,131 @@ static int sev_launch_secret(struct kvm *kvm,
>> struct kvm_sev_cmd *argp)
>>       return ret;
>>   }
>>   +/* Userspace wants to query session length. */
>> +static int
>> +__sev_send_start_query_session_length(struct kvm *kvm, struct
>> kvm_sev_cmd *argp,
>
>
> __sev_query_send_start_session_length a better name perhaps ?
>
>> +                      struct kvm_sev_send_start *params)
>> +{
>> +    struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
>> +    struct sev_data_send_start *data;
>> +    int ret;
>> +
>> +    data = kzalloc(sizeof(*data), GFP_KERNEL_ACCOUNT);
>> +    if (data == NULL)
>> +        return -ENOMEM;
>> +
>> +    data->handle = sev->handle;
>> +    ret = sev_issue_cmd(kvm, SEV_CMD_SEND_START, data, &argp->error);
>
>
> We are not checking ret here as we are assuming that the command will
> always be successful. Is there any chance that sev->handle can be junk
> and should we have an ASSERT for it ?


Both failure and success of this command need to be propagated to
userspace. In the case of failure the FW may provide additional
information which also needs to be propagated to userspace; hence, on
failure we should not assert.


>
>> +
>> +    params->session_len = data->session_len;
>> +    if (copy_to_user((void __user *)(uintptr_t)argp->data, params,
>> +                sizeof(struct kvm_sev_send_start)))
>> +        ret = -EFAULT;
>> +
>> +    kfree(data);
>> +    return ret;
>> +}
>> +
>> +static int sev_send_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
>
>
> For readability and ease of cscope searches, isn't it better to append
> "svm" to all these functions ?
>
> It seems svm_sev_enabled() is an example of an appropriate naming style.


I don't have a strong opinion on prepending or appending "svm" to all
these functions; it can be done outside this series. I personally don't
see any strong reason to append svm at this time. SEV is applicable to
the SVM file only. There is a pending patch which moves all the SEV
stuff into svm/sev.c.


>
>> +{
>> +    struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
>> +    struct sev_data_send_start *data;
>> +    struct kvm_sev_send_start params;
>> +    void *amd_certs, *session_data;
>> +    void *pdh_cert, *plat_certs;
>> +    int ret;
>> +
>> +    if (!sev_guest(kvm))
>> +        return -ENOTTY;
>> +
>> +    if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data,
>> +                sizeof(struct kvm_sev_send_start)))
>> +        return -EFAULT;
>> +
>> +    /* if session_len is zero, userspace wants to query the session
>> length */
>> +    if (!params.session_len)
>> +        return __sev_send_start_query_session_length(kvm, argp,
>> +                &params);
>> +
>> +    /* some sanity checks */
>> +    if (!params.pdh_cert_uaddr || !params.pdh_cert_len ||
>> +        !params.session_uaddr || params.session_len >
>> SEV_FW_BLOB_MAX_SIZE)
>> +        return -EINVAL;
>
>
> What if params.plat_certs_uaddr or params.amd_certs_uaddr is NULL ?
>
>> +
>> +    /* allocate the memory to hold the session data blob */
>> +    session_data = kmalloc(params.session_len, GFP_KERNEL_ACCOUNT);
>> +    if (!session_data)
>> +        return -ENOMEM;
>> +
>> +    /* copy the certificate blobs from userspace */
>
>
> You haven't added comments for plat_cert and amd_cert. Also, it's much
> more readable if you add block comments like,
>
>         /*
>
>          *  PDH cert
>
>          */
>
>> +    pdh_cert = psp_copy_user_blob(params.pdh_cert_uaddr,
>> +                params.pdh_cert_len);
>> +    if (IS_ERR(pdh_cert)) {
>> +        ret = PTR_ERR(pdh_cert);
>> +        goto e_free_session;
>> +    }
>> +
>> +    plat_certs = psp_copy_user_blob(params.plat_certs_uaddr,
>> +                params.plat_certs_len);
>> +    if (IS_ERR(plat_certs)) {
>> +        ret = PTR_ERR(plat_certs);
>> +        goto e_free_pdh;
>> +    }
>> +
>> +    amd_certs = psp_copy_user_blob(params.amd_certs_uaddr,
>> +                params.amd_certs_len);
>> +    if (IS_ERR(amd_certs)) {
>> +        ret = PTR_ERR(amd_certs);
>> +        goto e_free_plat_cert;
>> +    }
>> +
>> +    data = kzalloc(sizeof(*data), GFP_KERNEL_ACCOUNT);
>> +    if (data == NULL) {
>> +        ret = -ENOMEM;
>> +        goto e_free_amd_cert;
>> +    }
>> +
>> +    /* populate the FW SEND_START field with system physical address */
>> +    data->pdh_cert_address = __psp_pa(pdh_cert);
>> +    data->pdh_cert_len = params.pdh_cert_len;
>> +    data->plat_certs_address = __psp_pa(plat_certs);
>> +    data->plat_certs_len = params.plat_certs_len;
>> +    data->amd_certs_address = __psp_pa(amd_certs);
>> +    data->amd_certs_len = params.amd_certs_len;
>> +    data->session_address = __psp_pa(session_data);
>> +    data->session_len = params.session_len;
>> +    data->handle = sev->handle;
>> +
>> +    ret = sev_issue_cmd(kvm, SEV_CMD_SEND_START, data, &argp->error);
>> +
>> +    if (ret)
>> +        goto e_free;
>> +
>> +    if (copy_to_user((void __user *)(uintptr_t) params.session_uaddr,
>> +            session_data, params.session_len)) {
>> +        ret = -EFAULT;
>> +        goto e_free;
>> +    }
>> +
>> +    params.policy = data->policy;
>> +    params.session_len = data->session_len;
>> +    if (copy_to_user((void __user *)(uintptr_t)argp->data, &params,
>> +                sizeof(struct kvm_sev_send_start)))
>> +        ret = -EFAULT;
>> +
>> +e_free:
>> +    kfree(data);
>> +e_free_amd_cert:
>> +    kfree(amd_certs);
>> +e_free_plat_cert:
>> +    kfree(plat_certs);
>> +e_free_pdh:
>> +    kfree(pdh_cert);
>> +e_free_session:
>> +    kfree(session_data);
>> +    return ret;
>> +}
>> +
>>   static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
>>   {
>>       struct kvm_sev_cmd sev_cmd;
>> @@ -7193,6 +7318,9 @@ static int svm_mem_enc_op(struct kvm *kvm, void
>> __user *argp)
>>       case KVM_SEV_LAUNCH_SECRET:
>>           r = sev_launch_secret(kvm, &sev_cmd);
>>           break;
>> +    case KVM_SEV_SEND_START:
>> +        r = sev_send_start(kvm, &sev_cmd);
>> +        break;
>>       default:
>>           r = -EINVAL;
>>           goto out;
>> diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
>> index 5167bf2bfc75..9f63b9d48b63 100644
>> --- a/include/linux/psp-sev.h
>> +++ b/include/linux/psp-sev.h
>> @@ -323,11 +323,11 @@ struct sev_data_send_start {
>>       u64 pdh_cert_address;            /* In */
>>       u32 pdh_cert_len;            /* In */
>>       u32 reserved1;
>> -    u64 plat_cert_address;            /* In */
>> -    u32 plat_cert_len;            /* In */
>> +    u64 plat_certs_address;            /* In */
>> +    u32 plat_certs_len;            /* In */
>
>
> It seems that the 'platform certificate' and the 'amd_certificate' are
> single entities, meaning only one copy exists for the particular
> platform and the particular AMD product. Why are these plural then?
>
>
>>       u32 reserved2;
>> -    u64 amd_cert_address;            /* In */
>> -    u32 amd_cert_len;            /* In */
>> +    u64 amd_certs_address;            /* In */
>> +    u32 amd_certs_len;            /* In */
>>       u32 reserved3;
>>       u64 session_address;            /* In */
>>       u32 session_len;            /* In/Out */
>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>> index 4b95f9a31a2f..17bef4c245e1 100644
>> --- a/include/uapi/linux/kvm.h
>> +++ b/include/uapi/linux/kvm.h
>> @@ -1558,6 +1558,18 @@ struct kvm_sev_dbg {
>>       __u32 len;
>>   };
>>   +struct kvm_sev_send_start {
>> +    __u32 policy;
>> +    __u64 pdh_cert_uaddr;
>> +    __u32 pdh_cert_len;
>> +    __u64 plat_certs_uaddr;
>> +    __u32 plat_certs_len;
>> +    __u64 amd_certs_uaddr;
>> +    __u32 amd_certs_len;
>> +    __u64 session_uaddr;
>> +    __u32 session_len;
>> +};
>> +
>>   #define KVM_DEV_ASSIGN_ENABLE_IOMMU    (1 << 0)
>>   #define KVM_DEV_ASSIGN_PCI_2_3        (1 << 1)
>>   #define KVM_DEV_ASSIGN_MASK_INTX    (1 << 2)


* Re: [PATCH v6 01/14] KVM: SVM: Add KVM_SEV SEND_START command
  2020-04-02 18:04         ` Brijesh Singh
@ 2020-04-02 18:57           ` Venu Busireddy
  2020-04-02 19:17             ` Brijesh Singh
  0 siblings, 1 reply; 107+ messages in thread
From: Venu Busireddy @ 2020-04-02 18:57 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: Ashish Kalra, pbonzini, tglx, mingo, hpa, joro, bp,
	thomas.lendacky, x86, kvm, linux-kernel, rientjes, srutherford,
	luto

On 2020-04-02 13:04:13 -0500, Brijesh Singh wrote:
> 
> On 4/2/20 11:37 AM, Venu Busireddy wrote:
> > On 2020-04-02 07:59:54 -0500, Brijesh Singh wrote:
> >> Hi Venu,
> >>
> >> Thanks for the feedback.
> >>
> >> On 4/2/20 1:27 AM, Venu Busireddy wrote:
> >>> On 2020-03-30 06:19:59 +0000, Ashish Kalra wrote:
> >>>> From: Brijesh Singh <Brijesh.Singh@amd.com>
> >>>>
> >>>> The command is used to create an outgoing SEV guest encryption context.
> >>>>
> >>>> Cc: Thomas Gleixner <tglx@linutronix.de>
> >>>> Cc: Ingo Molnar <mingo@redhat.com>
> >>>> Cc: "H. Peter Anvin" <hpa@zytor.com>
> >>>> Cc: Paolo Bonzini <pbonzini@redhat.com>
> >>>> Cc: "Radim Krčmář" <rkrcmar@redhat.com>
> >>>> Cc: Joerg Roedel <joro@8bytes.org>
> >>>> Cc: Borislav Petkov <bp@suse.de>
> >>>> Cc: Tom Lendacky <thomas.lendacky@amd.com>
> >>>> Cc: x86@kernel.org
> >>>> Cc: kvm@vger.kernel.org
> >>>> Cc: linux-kernel@vger.kernel.org
> >>>> Reviewed-by: Steve Rutherford <srutherford@google.com>
> >>>> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> >>>> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> >>>> ---
> >>>>  .../virt/kvm/amd-memory-encryption.rst        |  27 ++++
> >>>>  arch/x86/kvm/svm.c                            | 128 ++++++++++++++++++
> >>>>  include/linux/psp-sev.h                       |   8 +-
> >>>>  include/uapi/linux/kvm.h                      |  12 ++
> >>>>  4 files changed, 171 insertions(+), 4 deletions(-)
> >>>>
> >>>> diff --git a/Documentation/virt/kvm/amd-memory-encryption.rst b/Documentation/virt/kvm/amd-memory-encryption.rst
> >>>> index c3129b9ba5cb..4fd34fc5c7a7 100644
> >>>> --- a/Documentation/virt/kvm/amd-memory-encryption.rst
> >>>> +++ b/Documentation/virt/kvm/amd-memory-encryption.rst
> >>>> @@ -263,6 +263,33 @@ Returns: 0 on success, -negative on error
> >>>>                  __u32 trans_len;
> >>>>          };
> >>>>  
> >>>> +10. KVM_SEV_SEND_START
> >>>> +----------------------
> >>>> +
> >>>> +The KVM_SEV_SEND_START command can be used by the hypervisor to create an
> >>>> +outgoing guest encryption context.
> >>>> +
> >>>> +Parameters (in): struct kvm_sev_send_start
> >>>> +
> >>>> +Returns: 0 on success, -negative on error
> >>>> +
> >>>> +::
> >>>> +        struct kvm_sev_send_start {
> >>>> +                __u32 policy;                 /* guest policy */
> >>>> +
> >>>> +                __u64 pdh_cert_uaddr;         /* platform Diffie-Hellman certificate */
> >>>> +                __u32 pdh_cert_len;
> >>>> +
> >>>> +                __u64 plat_certs_uadr;        /* platform certificate chain */
> >>> Could this please be changed to plat_certs_uaddr, as it is referred to
> >>> in the rest of the code?
> >>>
> >>>> +                __u32 plat_certs_len;
> >>>> +
> >>>> +                __u64 amd_certs_uaddr;        /* AMD certificate */
> >>>> +                __u32 amd_cert_len;
> >>> Could this please be changed to amd_certs_len, as it is referred to in
> >>> the rest of the code?
> >>>
> >>>> +
> >>>> +                __u64 session_uaddr;          /* Guest session information */
> >>>> +                __u32 session_len;
> >>>> +        };
> >>>> +
> >>>>  References
> >>>>  ==========
> >>>>  
> >>>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> >>>> index 50d1ebafe0b3..63d172e974ad 100644
> >>>> --- a/arch/x86/kvm/svm.c
> >>>> +++ b/arch/x86/kvm/svm.c
> >>>> @@ -7149,6 +7149,131 @@ static int sev_launch_secret(struct kvm *kvm, struct kvm_sev_cmd *argp)
> >>>>  	return ret;
> >>>>  }
> >>>>  
> >>>> +/* Userspace wants to query session length. */
> >>>> +static int
> >>>> +__sev_send_start_query_session_length(struct kvm *kvm, struct kvm_sev_cmd *argp,
> >>>> +				      struct kvm_sev_send_start *params)
> >>>> +{
> >>>> +	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> >>>> +	struct sev_data_send_start *data;
> >>>> +	int ret;
> >>>> +
> >>>> +	data = kzalloc(sizeof(*data), GFP_KERNEL_ACCOUNT);
> >>>> +	if (data == NULL)
> >>>> +		return -ENOMEM;
> >>>> +
> >>>> +	data->handle = sev->handle;
> >>>> +	ret = sev_issue_cmd(kvm, SEV_CMD_SEND_START, data, &argp->error);
> >>>> +
> >>>> +	params->session_len = data->session_len;
> >>>> +	if (copy_to_user((void __user *)(uintptr_t)argp->data, params,
> >>>> +				sizeof(struct kvm_sev_send_start)))
> >>>> +		ret = -EFAULT;
> >>>> +
> >>>> +	kfree(data);
> >>>> +	return ret;
> >>>> +}
> >>>> +
> >>>> +static int sev_send_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
> >>>> +{
> >>>> +	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> >>>> +	struct sev_data_send_start *data;
> >>>> +	struct kvm_sev_send_start params;
> >>>> +	void *amd_certs, *session_data;
> >>>> +	void *pdh_cert, *plat_certs;
> >>>> +	int ret;
> >>>> +
> >>>> +	if (!sev_guest(kvm))
> >>>> +		return -ENOTTY;
> >>>> +
> >>>> +	if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data,
> >>>> +				sizeof(struct kvm_sev_send_start)))
> >>>> +		return -EFAULT;
> >>>> +
> >>>> +	/* if session_len is zero, userspace wants to query the session length */
> >>>> +	if (!params.session_len)
> >>>> +		return __sev_send_start_query_session_length(kvm, argp,
> >>>> +				&params);
> >>>> +
> >>>> +	/* some sanity checks */
> >>>> +	if (!params.pdh_cert_uaddr || !params.pdh_cert_len ||
> >>>> +	    !params.session_uaddr || params.session_len > SEV_FW_BLOB_MAX_SIZE)
> >>>> +		return -EINVAL;
> >>>> +
> >>>> +	/* allocate the memory to hold the session data blob */
> >>>> +	session_data = kmalloc(params.session_len, GFP_KERNEL_ACCOUNT);
> >>>> +	if (!session_data)
> >>>> +		return -ENOMEM;
> >>>> +
> >>>> +	/* copy the certificate blobs from userspace */
> >>>> +	pdh_cert = psp_copy_user_blob(params.pdh_cert_uaddr,
> >>>> +				params.pdh_cert_len);
> >>>> +	if (IS_ERR(pdh_cert)) {
> >>>> +		ret = PTR_ERR(pdh_cert);
> >>>> +		goto e_free_session;
> >>>> +	}
> >>>> +
> >>>> +	plat_certs = psp_copy_user_blob(params.plat_certs_uaddr,
> >>>> +				params.plat_certs_len);
> >>>> +	if (IS_ERR(plat_certs)) {
> >>>> +		ret = PTR_ERR(plat_certs);
> >>>> +		goto e_free_pdh;
> >>>> +	}
> >>>> +
> >>>> +	amd_certs = psp_copy_user_blob(params.amd_certs_uaddr,
> >>>> +				params.amd_certs_len);
> >>>> +	if (IS_ERR(amd_certs)) {
> >>>> +		ret = PTR_ERR(amd_certs);
> >>>> +		goto e_free_plat_cert;
> >>>> +	}
> >>>> +
> >>>> +	data = kzalloc(sizeof(*data), GFP_KERNEL_ACCOUNT);
> >>>> +	if (data == NULL) {
> >>>> +		ret = -ENOMEM;
> >>>> +		goto e_free_amd_cert;
> >>>> +	}
> >>>> +
> >>>> +	/* populate the FW SEND_START field with system physical address */
> >>>> +	data->pdh_cert_address = __psp_pa(pdh_cert);
> >>>> +	data->pdh_cert_len = params.pdh_cert_len;
> >>>> +	data->plat_certs_address = __psp_pa(plat_certs);
> >>>> +	data->plat_certs_len = params.plat_certs_len;
> >>>> +	data->amd_certs_address = __psp_pa(amd_certs);
> >>>> +	data->amd_certs_len = params.amd_certs_len;
> >>>> +	data->session_address = __psp_pa(session_data);
> >>>> +	data->session_len = params.session_len;
> >>>> +	data->handle = sev->handle;
> >>>> +
> >>>> +	ret = sev_issue_cmd(kvm, SEV_CMD_SEND_START, data, &argp->error);
> >>>> +
> >>>> +	if (ret)
> >>>> +		goto e_free;
> >>>> +
> >>>> +	if (copy_to_user((void __user *)(uintptr_t) params.session_uaddr,
> >>>> +			session_data, params.session_len)) {
> >>>> +		ret = -EFAULT;
> >>>> +		goto e_free;
> >>>> +	}
> >>> To optimize the amount of data being copied to user space, could the
> >>> above section of code changed as follows?
> >>>
> >>> 	params.session_len = data->session_len;
> >>> 	if (copy_to_user((void __user *)(uintptr_t) params.session_uaddr,
> >>> 			session_data, params.session_len)) {
> >>> 		ret = -EFAULT;
> >>> 		goto e_free;
> >>> 	}
> >>
> >> We should not be using data->session_len, it will cause -EFAULT when
> >> the user has not allocated enough space in session_uaddr. Let's consider
> >> the case where the user passes session_len=10 but the firmware thinks the
> >> session length should be 64. In that case data->session_len will
> >> contain a value of 64, but userspace has allocated space for 10 bytes
> >> and copy_to_user() will fail. If we are really concerned about the amount
> >> of data getting copied to userspace then use min_t(size_t,
> >> params.session_len, data->session_len).
> > We are allocating a buffer of params.session_len size and passing that
> > buffer, and that length via data->session_len, to the firmware. Why would
> > the firmware set data->session_len to a larger value, in spite of telling
> > it that the buffer is only params.session_len long? I thought that only
> > the reverse is possible, that is, the user sets the params.session_len
> > to the MAX, but the session data is actually smaller than that size.
> 
> 
> The question is, how does userspace know the session length? One
> method is to precalculate a value based on your firmware version
> and have userspace pass that; another approach is to set
> params.session_len = 0 and query it from the FW. The FW spec allows
> querying the length, please see the spec. In the qemu patches I chose
> the second approach, because the session blob can change from one FW
> version to another and I tried to avoid calculating or hardcoding the
> length for a single version of the FW. You can certainly choose the first
> method. We want to ensure that the kernel interface works in both cases.

I like the fact that you have already implemented the functionality to
facilitate the user space to obtain the session length from the firmware
(by setting params.session_len to 0). However, I am trying to address
the case where the user space sets the params.session_len to a size
smaller than the size needed.

Let me put it differently. Let us say that the session blob needs 128
bytes, but the user space sets params.session_len to 16. That results
in us allocating a buffer of 16 bytes, and setting data->session_len to 16.

What does the firmware do now?

Does it copy 128 bytes into data->session_address, or, does it copy
16 bytes?

If it copies 128 bytes, we most certainly will end up with a kernel crash.

If it copies 16 bytes, then what does it set in data->session_len? 16,
or 128? If 16, everything is good. If 128, we end up causing memory
access violation for the user space.

Perhaps, this can be dealt a little differently? Why not always call
sev_issue_cmd(kvm, SEV_CMD_SEND_START, ...) with zeroed out data? Then,
if the user space has set params.session_len to 0, we return with the
needed params.session_len. Otherwise, we check if params.session_len is
large enough, and if not, we return -EINVAL?

> 
> 
> > Also, if for whatever reason the firmware sets data->session_len to
> > a larger value than what is passed, what is the user space expected
> > to do when the call returns? If the user space tries to access
> > params.session_len amount of data, it will possibly get a memory access
> > violation, because it did not originally allocate that large a buffer.
> >
> > If we do go with using min_t(size_t, params.session_len,
> > data->session_len), then params.session_len should also be set to the
> > smaller of the two, right?
> >
> >>>> +
> >>>> +	params.policy = data->policy;
> >>>> +	params.session_len = data->session_len;
> >>>> +	if (copy_to_user((void __user *)(uintptr_t)argp->data, &params,
> >>>> +				sizeof(struct kvm_sev_send_start)))
> >>>> +		ret = -EFAULT;
> >>> Since the only fields that are changed in the kvm_sev_send_start structure
> >>> are session_len and policy, why do we need to copy the entire structure
> >>> back to the user? Why not just those two values? Please see the changes
> >>> proposed to kvm_sev_send_start structure further below to accomplish this.
> >> I think we also need to consider code readability while saving
> >> CPU cycles. This is a very small structure. By duplicating into two calls,
> >> #1 copy params.policy and #2 copy params.session_len, we will add more
> >> CPU cycles. And, if we get creative and rearrange the structure, then code
> >> readability is lost because now the copy will depend on how the
> >> structure is laid out in memory.
> > I was not recommending splitting that call into two. That would certainly
> > be more expensive than copying the entire structure. That is the reason
> > why I suggested reordering the members of kvm_sev_send_start. Isn't
> > there plenty of code where structures are defined in a way to keep the
> > data movement efficient? :-)
> >
> > Please see my other comment below.
> >
> >>> 	params.policy = data->policy;
> >>> 	if (copy_to_user((void __user *)(uintptr_t)argp->data, &params,
> >>> 			sizeof(params.policy) + sizeof(params.session_len))
> >>> 		ret = -EFAULT;
> >>>> +
> >>>> +e_free:
> >>>> +	kfree(data);
> >>>> +e_free_amd_cert:
> >>>> +	kfree(amd_certs);
> >>>> +e_free_plat_cert:
> >>>> +	kfree(plat_certs);
> >>>> +e_free_pdh:
> >>>> +	kfree(pdh_cert);
> >>>> +e_free_session:
> >>>> +	kfree(session_data);
> >>>> +	return ret;
> >>>> +}
> >>>> +
> >>>>  static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
> >>>>  {
> >>>>  	struct kvm_sev_cmd sev_cmd;
> >>>> @@ -7193,6 +7318,9 @@ static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
> >>>>  	case KVM_SEV_LAUNCH_SECRET:
> >>>>  		r = sev_launch_secret(kvm, &sev_cmd);
> >>>>  		break;
> >>>> +	case KVM_SEV_SEND_START:
> >>>> +		r = sev_send_start(kvm, &sev_cmd);
> >>>> +		break;
> >>>>  	default:
> >>>>  		r = -EINVAL;
> >>>>  		goto out;
> >>>> diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
> >>>> index 5167bf2bfc75..9f63b9d48b63 100644
> >>>> --- a/include/linux/psp-sev.h
> >>>> +++ b/include/linux/psp-sev.h
> >>>> @@ -323,11 +323,11 @@ struct sev_data_send_start {
> >>>>  	u64 pdh_cert_address;			/* In */
> >>>>  	u32 pdh_cert_len;			/* In */
> >>>>  	u32 reserved1;
> >>>> -	u64 plat_cert_address;			/* In */
> >>>> -	u32 plat_cert_len;			/* In */
> >>>> +	u64 plat_certs_address;			/* In */
> >>>> +	u32 plat_certs_len;			/* In */
> >>>>  	u32 reserved2;
> >>>> -	u64 amd_cert_address;			/* In */
> >>>> -	u32 amd_cert_len;			/* In */
> >>>> +	u64 amd_certs_address;			/* In */
> >>>> +	u32 amd_certs_len;			/* In */
> >>>>  	u32 reserved3;
> >>>>  	u64 session_address;			/* In */
> >>>>  	u32 session_len;			/* In/Out */
> >>>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> >>>> index 4b95f9a31a2f..17bef4c245e1 100644
> >>>> --- a/include/uapi/linux/kvm.h
> >>>> +++ b/include/uapi/linux/kvm.h
> >>>> @@ -1558,6 +1558,18 @@ struct kvm_sev_dbg {
> >>>>  	__u32 len;
> >>>>  };
> >>>>  
> >>>> +struct kvm_sev_send_start {
> >>>> +	__u32 policy;
> >>>> +	__u64 pdh_cert_uaddr;
> >>>> +	__u32 pdh_cert_len;
> >>>> +	__u64 plat_certs_uaddr;
> >>>> +	__u32 plat_certs_len;
> >>>> +	__u64 amd_certs_uaddr;
> >>>> +	__u32 amd_certs_len;
> >>>> +	__u64 session_uaddr;
> >>>> +	__u32 session_len;
> >>>> +};
> >>> Redo this structure as below:
> >>>
> >>> struct kvm_sev_send_start {
> >>> 	__u32 policy;
> >>> 	__u32 session_len;
> >>> 	__u64 session_uaddr;
> >>> 	__u64 pdh_cert_uaddr;
> >>> 	__u32 pdh_cert_len;
> >>> 	__u64 plat_certs_uaddr;
> >>> 	__u32 plat_certs_len;
> >>> 	__u64 amd_certs_uaddr;
> >>> 	__u32 amd_certs_len;
> >>> };
> >>>
> >>> Or as below, just to make it look better.
> >>>
> >>> struct kvm_sev_send_start {
> >>> 	__u32 policy;
> >>> 	__u32 session_len;
> >>> 	__u64 session_uaddr;
> >>> 	__u32 pdh_cert_len;
> >>> 	__u64 pdh_cert_uaddr;
> >>> 	__u32 plat_certs_len;
> >>> 	__u64 plat_certs_uaddr;
> >>> 	__u32 amd_certs_len;
> >>> 	__u64 amd_certs_uaddr;
> >>> };
> >>>
> >> Wherever applicable, I tried my best not to divert from the SEV spec
> >> structure layout. Anyone who is reading the SEV FW spec will see a
> >> similar structure layout in the KVM/PSP header files. I would prefer to
> >> stick to that approach.
> > This structure is in uapi, and is anyway different from the
> > sev_data_send_start, right? Does it really need to stay close to the
> > firmware structure layout? Just because the firmware folks thought of
> > a structure layout, that should not prevent our code to be efficient.
> >
> >>
> >>>> +
> >>>>  #define KVM_DEV_ASSIGN_ENABLE_IOMMU	(1 << 0)
> >>>>  #define KVM_DEV_ASSIGN_PCI_2_3		(1 << 1)
> >>>>  #define KVM_DEV_ASSIGN_MASK_INTX	(1 << 2)
> >>>> -- 
> >>>> 2.17.1
> >>>>

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v6 01/14] KVM: SVM: Add KVM_SEV SEND_START command
  2020-04-02 18:57           ` Venu Busireddy
@ 2020-04-02 19:17             ` Brijesh Singh
  2020-04-02 19:43               ` Venu Busireddy
  0 siblings, 1 reply; 107+ messages in thread
From: Brijesh Singh @ 2020-04-02 19:17 UTC (permalink / raw)
  To: Venu Busireddy
  Cc: brijesh.singh, Ashish Kalra, pbonzini, tglx, mingo, hpa, joro,
	bp, thomas.lendacky, x86, kvm, linux-kernel, rientjes,
	srutherford, luto


On 4/2/20 1:57 PM, Venu Busireddy wrote:
[snip]...

>> The question is, how does userspace know the session length? One
>> method is to precalculate a value based on your firmware version
>> and have userspace pass that; another approach is to set
>> params.session_len = 0 and query it from the FW. The FW spec allows
>> querying the length, please see the spec. In the qemu patches I chose the
>> second approach. This is because the session blob can change from one FW
>> version to another and I tried to avoid calculating or hardcoding the
>> length for one version of the FW. You can certainly choose the first
>> method. We want to ensure that the kernel interface works in both cases.
> I like the fact that you have already implemented the functionality to
> facilitate the user space to obtain the session length from the firmware
> (by setting params.session_len to 0). However, I am trying to address
> the case where the user space sets the params.session_len to a size
> smaller than the size needed.
>
> Let me put it differently. Let us say that the session blob needs 128
> bytes, but the user space sets params.session_len to 16. That results
> in us allocating a buffer of 16 bytes and setting data->session_len to 16.
>
> What does the firmware do now?
>
> Does it copy 128 bytes into data->session_address, or, does it copy
> 16 bytes?
>
> If it copies 128 bytes, we most certainly will end up with a kernel crash.
>
> If it copies 16 bytes, then what does it set in data->session_len? 16,
> or 128? If 16, everything is good. If 128, we end up causing a memory
> access violation for the user space.

My interpretation of the spec is, if the user-provided length is smaller
than the FW-expected length, then the FW reports an error with
data->session_len set to the expected length. In other words, it should
*not* copy anything into the session buffer in the event of failure. If
the FW is touching memory beyond what is specified in session_len, then
it's a FW bug and we can't do much from the kernel. Am I missing something?


>
> Perhaps, this can be dealt with a little differently? Why not always call
> sev_issue_cmd(kvm, SEV_CMD_SEND_START, ...) with zeroed out data? Then,
> if the user space has set params.session_len to 0, we return with the
> needed params.session_len. Otherwise, we check if params.session_len is
> large enough, and if not, we return -EINVAL?


^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v6 01/14] KVM: SVM: Add KVM_SEV SEND_START command
  2020-04-02 19:17             ` Brijesh Singh
@ 2020-04-02 19:43               ` Venu Busireddy
  2020-04-02 20:04                 ` Brijesh Singh
  0 siblings, 1 reply; 107+ messages in thread
From: Venu Busireddy @ 2020-04-02 19:43 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: Ashish Kalra, pbonzini, tglx, mingo, hpa, joro, bp,
	thomas.lendacky, x86, kvm, linux-kernel, rientjes, srutherford,
	luto

On 2020-04-02 14:17:26 -0500, Brijesh Singh wrote:
> 
> On 4/2/20 1:57 PM, Venu Busireddy wrote:
> [snip]...
> 
> >> The question is, how does userspace know the session length? One
> >> method is to precalculate a value based on your firmware version
> >> and have userspace pass that; another approach is to set
> >> params.session_len = 0 and query it from the FW. The FW spec allows
> >> querying the length, please see the spec. In the qemu patches I chose the
> >> second approach. This is because the session blob can change from one FW
> >> version to another and I tried to avoid calculating or hardcoding the
> >> length for one version of the FW. You can certainly choose the first
> >> method. We want to ensure that the kernel interface works in both cases.
> > I like the fact that you have already implemented the functionality to
> > facilitate the user space to obtain the session length from the firmware
> > (by setting params.session_len to 0). However, I am trying to address
> > the case where the user space sets the params.session_len to a size
> > smaller than the size needed.
> >
> > Let me put it differently. Let us say that the session blob needs 128
> > bytes, but the user space sets params.session_len to 16. That results
> > in us allocating a buffer of 16 bytes and setting data->session_len to 16.
> >
> > What does the firmware do now?
> >
> > Does it copy 128 bytes into data->session_address, or, does it copy
> > 16 bytes?
> >
> > If it copies 128 bytes, we most certainly will end up with a kernel crash.
> >
> > If it copies 16 bytes, then what does it set in data->session_len? 16,
> > or 128? If 16, everything is good. If 128, we end up causing a memory
> > access violation for the user space.
> 
> My interpretation of the spec is, if the user-provided length is smaller
> than the FW-expected length, then the FW reports an error with
> data->session_len set to the expected length. In other words, it should
> *not* copy anything into the session buffer in the event of failure.

That is good, and expected behavior.

> If the FW is touching memory beyond what is specified in session_len, then
> it's a FW bug and we can't do much from the kernel.

Agreed. But let us assume that the firmware is not touching memory that
it is not supposed to.

> Am I missing something?

I believe you are agreeing that if the session blob needs 128 bytes and
user space sets params.session_len to 16, the firmware does not copy
any data to data->session_address, and sets data->session_len to 128.

Now, when we return, won't the user space try to access 128 bytes
(params.session_len) of data at params.session_uaddr, and crash? Because,
instead of returning an error that the buffer is not large enough, we
return the call successfully!

That is why I was suggesting the following, which you seem to have
missed.

> > Perhaps, this can be dealt with a little differently? Why not always call
> > sev_issue_cmd(kvm, SEV_CMD_SEND_START, ...) with zeroed out data? Then,
> > if the user space has set params.session_len to 0, we return with the
> > needed params.session_len. Otherwise, we check if params.session_len is
> > large enough, and if not, we return -EINVAL?

Doesn't the above approach address all scenarios?


^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v6 01/14] KVM: SVM: Add KVM_SEV SEND_START command
  2020-04-02 19:43               ` Venu Busireddy
@ 2020-04-02 20:04                 ` Brijesh Singh
  2020-04-02 20:19                   ` Venu Busireddy
  0 siblings, 1 reply; 107+ messages in thread
From: Brijesh Singh @ 2020-04-02 20:04 UTC (permalink / raw)
  To: Venu Busireddy
  Cc: brijesh.singh, Ashish Kalra, pbonzini, tglx, mingo, hpa, joro,
	bp, thomas.lendacky, x86, kvm, linux-kernel, rientjes,
	srutherford, luto


On 4/2/20 2:43 PM, Venu Busireddy wrote:
> On 2020-04-02 14:17:26 -0500, Brijesh Singh wrote:
>> On 4/2/20 1:57 PM, Venu Busireddy wrote:
>> [snip]...
>>
>>>> The question is, how does userspace know the session length? One
>>>> method is to precalculate a value based on your firmware version
>>>> and have userspace pass that; another approach is to set
>>>> params.session_len = 0 and query it from the FW. The FW spec allows
>>>> querying the length, please see the spec. In the qemu patches I chose the
>>>> second approach. This is because the session blob can change from one FW
>>>> version to another and I tried to avoid calculating or hardcoding the
>>>> length for one version of the FW. You can certainly choose the first
>>>> method. We want to ensure that the kernel interface works in both cases.
>>> I like the fact that you have already implemented the functionality to
>>> facilitate the user space to obtain the session length from the firmware
>>> (by setting params.session_len to 0). However, I am trying to address
>>> the case where the user space sets the params.session_len to a size
>>> smaller than the size needed.
>>>
>>> Let me put it differently. Let us say that the session blob needs 128
>>> bytes, but the user space sets params.session_len to 16. That results
>>> in us allocating a buffer of 16 bytes and setting data->session_len to 16.
>>>
>>> What does the firmware do now?
>>>
>>> Does it copy 128 bytes into data->session_address, or, does it copy
>>> 16 bytes?
>>>
>>> If it copies 128 bytes, we most certainly will end up with a kernel crash.
>>>
>>> If it copies 16 bytes, then what does it set in data->session_len? 16,
>>> or 128? If 16, everything is good. If 128, we end up causing a memory
>>> access violation for the user space.
>> My interpretation of the spec is, if the user-provided length is smaller
>> than the FW-expected length, then the FW reports an error with
>> data->session_len set to the expected length. In other words, it should
>> *not* copy anything into the session buffer in the event of failure.
> That is good, and expected behavior.
>
>> If the FW is touching memory beyond what is specified in session_len, then
>> it's a FW bug and we can't do much from the kernel.
> Agreed. But let us assume that the firmware is not touching memory that
> it is not supposed to.
>
>> Am I missing something?
> I believe you are agreeing that if the session blob needs 128 bytes and
> user space sets params.session_len to 16, the firmware does not copy
> any data to data->session_address, and sets data->session_len to 128.
>
> Now, when we return, won't the user space try to access 128 bytes
> (params.session_len) of data at params.session_uaddr, and crash? Because,
> instead of returning an error that the buffer is not large enough, we
> return the call successfully!


Ah, so the main issue is that we should not be going to e_free on error.
If session_len is less than the expected length, then the FW will return
an error. In the case of an error we can skip copying the session_data
into the userspace buffer, but we still need to pass the session_len and
policy back to userspace.

+	ret = sev_issue_cmd(kvm, SEV_CMD_SEND_START, data, &argp->error);
+
+	if (ret)
+		goto e_free;
+

If user space gets an error, it can decode it further to get
additional information (e.g. buffer too small).

>
> That is why I was suggesting the following, which you seem to have
> missed.
>
>>> Perhaps, this can be dealt with a little differently? Why not always call
>>> sev_issue_cmd(kvm, SEV_CMD_SEND_START, ...) with zeroed out data? Then,
>>> if the user space has set params.session_len to 0, we return with the
>>> needed params.session_len. Otherwise, we check if params.session_len is
>>> large enough, and if not, we return -EINVAL?
> Doesn't the above approach address all scenarios?
>

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v6 02/14] KVM: SVM: Add KVM_SEND_UPDATE_DATA command
  2020-03-30  6:20 ` [PATCH v6 02/14] KVM: SVM: Add KVM_SEND_UPDATE_DATA command Ashish Kalra
  2020-04-02 17:55   ` Venu Busireddy
@ 2020-04-02 20:13   ` Krish Sadhukhan
  1 sibling, 0 replies; 107+ messages in thread
From: Krish Sadhukhan @ 2020-04-02 20:13 UTC (permalink / raw)
  To: Ashish Kalra, pbonzini
  Cc: tglx, mingo, hpa, joro, bp, thomas.lendacky, x86, kvm,
	linux-kernel, rientjes, srutherford, luto, brijesh.singh


On 3/29/20 11:20 PM, Ashish Kalra wrote:
> From: Brijesh Singh <Brijesh.Singh@amd.com>
>
> The command is used for encrypting the guest memory region using the encryption
> context created with KVM_SEV_SEND_START.
>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: "Radim Krčmář" <rkrcmar@redhat.com>
> Cc: Joerg Roedel <joro@8bytes.org>
> Cc: Borislav Petkov <bp@suse.de>
> Cc: Tom Lendacky <thomas.lendacky@amd.com>
> Cc: x86@kernel.org
> Cc: kvm@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Reviewed-by: Steve Rutherford <srutherford@google.com>
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> ---
>   .../virt/kvm/amd-memory-encryption.rst        |  24 ++++
>   arch/x86/kvm/svm.c                            | 136 +++++++++++++++++-
>   include/uapi/linux/kvm.h                      |   9 ++
>   3 files changed, 165 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/virt/kvm/amd-memory-encryption.rst b/Documentation/virt/kvm/amd-memory-encryption.rst
> index 4fd34fc5c7a7..f46817ef7019 100644
> --- a/Documentation/virt/kvm/amd-memory-encryption.rst
> +++ b/Documentation/virt/kvm/amd-memory-encryption.rst
> @@ -290,6 +290,30 @@ Returns: 0 on success, -negative on error
>                   __u32 session_len;
>           };
>   
> +11. KVM_SEV_SEND_UPDATE_DATA
> +----------------------------
> +
> +The KVM_SEV_SEND_UPDATE_DATA command can be used by the hypervisor to encrypt the
> +outgoing guest memory region with the encryption context created using
> +KVM_SEV_SEND_START.
> +
> +Parameters (in): struct kvm_sev_send_update_data
> +
> +Returns: 0 on success, -negative on error
> +
> +::
> +
> +        struct kvm_sev_launch_send_update_data {
> +                __u64 hdr_uaddr;        /* userspace address containing the packet header */
> +                __u32 hdr_len;
> +
> +                __u64 guest_uaddr;      /* the source memory region to be encrypted */
> +                __u32 guest_len;
> +
> > +                __u64 trans_uaddr;      /* the destination memory region */
> +                __u32 trans_len;
> +        };
> +
>   References
>   ==========
>   
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 63d172e974ad..8561c47cc4f9 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -428,6 +428,7 @@ static DECLARE_RWSEM(sev_deactivate_lock);
>   static DEFINE_MUTEX(sev_bitmap_lock);
>   static unsigned int max_sev_asid;
>   static unsigned int min_sev_asid;
> +static unsigned long sev_me_mask;
>   static unsigned long *sev_asid_bitmap;
>   static unsigned long *sev_reclaim_asid_bitmap;
>   #define __sme_page_pa(x) __sme_set(page_to_pfn(x) << PAGE_SHIFT)
> @@ -1232,16 +1233,22 @@ static int avic_ga_log_notifier(u32 ga_tag)
>   static __init int sev_hardware_setup(void)
>   {
>   	struct sev_user_data_status *status;
> +	u32 eax, ebx;
>   	int rc;
>   
> -	/* Maximum number of encrypted guests supported simultaneously */
> -	max_sev_asid = cpuid_ecx(0x8000001F);
> +	/*
> +	 * Query the memory encryption information.
> +	 *  EBX:  Bit 0:5 Pagetable bit position used to indicate encryption
> +	 *  (aka Cbit).
> +	 *  ECX:  Maximum number of encrypted guests supported simultaneously.
> +	 *  EDX:  Minimum ASID value that should be used for SEV guest.
> +	 */
> +	cpuid(0x8000001f, &eax, &ebx, &max_sev_asid, &min_sev_asid);


Will max_sev_asid and the max number of guests supported always be the
same number?

>   
>   	if (!max_sev_asid)
>   		return 1;
>   
> -	/* Minimum ASID value that should be used for SEV guest */
> -	min_sev_asid = cpuid_edx(0x8000001F);
> +	sev_me_mask = 1UL << (ebx & 0x3f);
>   
>   	/* Initialize SEV ASID bitmaps */
>   	sev_asid_bitmap = bitmap_zalloc(max_sev_asid, GFP_KERNEL);
> @@ -7274,6 +7281,124 @@ static int sev_send_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
>   	return ret;
>   }
>   
> +/* Userspace wants to query either header or trans length. */
> +static int
> +__sev_send_update_data_query_lengths(struct kvm *kvm, struct kvm_sev_cmd *argp,
> +				     struct kvm_sev_send_update_data *params)
> +{
> +	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> +	struct sev_data_send_update_data *data;
> +	int ret;
> +
> +	data = kzalloc(sizeof(*data), GFP_KERNEL_ACCOUNT);
> +	if (!data)
> +		return -ENOMEM;
> +
> +	data->handle = sev->handle;
> +	ret = sev_issue_cmd(kvm, SEV_CMD_SEND_UPDATE_DATA, data, &argp->error);
> +
> +	params->hdr_len = data->hdr_len;
> +	params->trans_len = data->trans_len;
> +
> +	if (copy_to_user((void __user *)(uintptr_t)argp->data, params,
> +			 sizeof(struct kvm_sev_send_update_data)))
> +		ret = -EFAULT;
> +
> +	kfree(data);
> +	return ret;
> +}
> +
> +static int sev_send_update_data(struct kvm *kvm, struct kvm_sev_cmd *argp)
> +{
> +	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> +	struct sev_data_send_update_data *data;
> +	struct kvm_sev_send_update_data params;
> +	void *hdr, *trans_data;
> +	struct page **guest_page;
> +	unsigned long n;
> +	int ret, offset;
> +
> +	if (!sev_guest(kvm))
> +		return -ENOTTY;


Do we need to check the following conditions here?

     "The platform must be in the PSTATE.WORKING state.
      The guest must be in the GSTATE.SUPDATE state."

> +
> +	if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data,
> +			sizeof(struct kvm_sev_send_update_data)))
> +		return -EFAULT;
> +
> +	/* userspace wants to query either header or trans length */
> +	if (!params.trans_len || !params.hdr_len)
> +		return __sev_send_update_data_query_lengths(kvm, argp, &params);
> +
> +	if (!params.trans_uaddr || !params.guest_uaddr ||
> +	    !params.guest_len || !params.hdr_uaddr)
> +		return -EINVAL;
> +
> +
> +	/* Check if we are crossing the page boundary */
> +	offset = params.guest_uaddr & (PAGE_SIZE - 1);
> +	if ((params.guest_len + offset > PAGE_SIZE))
> +		return -EINVAL;
> +
> +	/* Pin guest memory */
> +	guest_page = sev_pin_memory(kvm, params.guest_uaddr & PAGE_MASK,
> +				    PAGE_SIZE, &n, 0);
> +	if (!guest_page)
> +		return -EFAULT;
> +
> +	/* allocate memory for header and transport buffer */
> +	ret = -ENOMEM;
> +	hdr = kmalloc(params.hdr_len, GFP_KERNEL_ACCOUNT);
> +	if (!hdr)
> +		goto e_unpin;
> +
> +	trans_data = kmalloc(params.trans_len, GFP_KERNEL_ACCOUNT);
> +	if (!trans_data)
> +		goto e_free_hdr;
> +
> +	data = kzalloc(sizeof(*data), GFP_KERNEL);
> +	if (!data)
> +		goto e_free_trans_data;
> +
> +	data->hdr_address = __psp_pa(hdr);
> +	data->hdr_len = params.hdr_len;
> +	data->trans_address = __psp_pa(trans_data);
> +	data->trans_len = params.trans_len;
> +
> +	/* The SEND_UPDATE_DATA command requires C-bit to be always set. */
> +	data->guest_address = (page_to_pfn(guest_page[0]) << PAGE_SHIFT) +
> +				offset;
> +	data->guest_address |= sev_me_mask;

Why not name the variable 'sev_cbit_mask' instead of sev_me_mask?

> +	data->guest_len = params.guest_len;
> +	data->handle = sev->handle;
> +
> +	ret = sev_issue_cmd(kvm, SEV_CMD_SEND_UPDATE_DATA, data, &argp->error);
> +
> +	if (ret)
> +		goto e_free;
> +
> +	/* copy transport buffer to user space */
> +	if (copy_to_user((void __user *)(uintptr_t)params.trans_uaddr,
> +			 trans_data, params.trans_len)) {
> +		ret = -EFAULT;
> +		goto e_unpin;
> +	}
> +
> +	/* Copy packet header to userspace. */
> +	ret = copy_to_user((void __user *)(uintptr_t)params.hdr_uaddr, hdr,
> +				params.hdr_len);
> +
> +e_free:
> +	kfree(data);
> +e_free_trans_data:
> +	kfree(trans_data);
> +e_free_hdr:
> +	kfree(hdr);
> +e_unpin:
> +	sev_unpin_memory(kvm, guest_page, n);
> +
> +	return ret;
> +}
> +
>   static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
>   {
>   	struct kvm_sev_cmd sev_cmd;
> @@ -7321,6 +7446,9 @@ static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
>   	case KVM_SEV_SEND_START:
>   		r = sev_send_start(kvm, &sev_cmd);
>   		break;
> +	case KVM_SEV_SEND_UPDATE_DATA:
> +		r = sev_send_update_data(kvm, &sev_cmd);
> +		break;
>   	default:
>   		r = -EINVAL;
>   		goto out;
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 17bef4c245e1..d9dc81bb9c55 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1570,6 +1570,15 @@ struct kvm_sev_send_start {
>   	__u32 session_len;
>   };
>   
> +struct kvm_sev_send_update_data {
> +	__u64 hdr_uaddr;
> +	__u32 hdr_len;
> +	__u64 guest_uaddr;
> +	__u32 guest_len;
> +	__u64 trans_uaddr;
> +	__u32 trans_len;
> +};
> +
>   #define KVM_DEV_ASSIGN_ENABLE_IOMMU	(1 << 0)
>   #define KVM_DEV_ASSIGN_PCI_2_3		(1 << 1)
>   #define KVM_DEV_ASSIGN_MASK_INTX	(1 << 2)


Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>


^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v6 03/14] KVM: SVM: Add KVM_SEV_SEND_FINISH command
  2020-03-30  6:20 ` [PATCH v6 03/14] KVM: SVM: Add KVM_SEV_SEND_FINISH command Ashish Kalra
  2020-04-02 18:17   ` Venu Busireddy
@ 2020-04-02 20:15   ` Krish Sadhukhan
  1 sibling, 0 replies; 107+ messages in thread
From: Krish Sadhukhan @ 2020-04-02 20:15 UTC (permalink / raw)
  To: Ashish Kalra, pbonzini
  Cc: tglx, mingo, hpa, joro, bp, thomas.lendacky, x86, kvm,
	linux-kernel, rientjes, srutherford, luto, brijesh.singh


On 3/29/20 11:20 PM, Ashish Kalra wrote:
> From: Brijesh Singh <Brijesh.Singh@amd.com>
>
> The command is used to finalize the encryption context created with the
> KVM_SEV_SEND_START command.
>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: "Radim Krčmář" <rkrcmar@redhat.com>
> Cc: Joerg Roedel <joro@8bytes.org>
> Cc: Borislav Petkov <bp@suse.de>
> Cc: Tom Lendacky <thomas.lendacky@amd.com>
> Cc: x86@kernel.org
> Cc: kvm@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Reviewed-by: Steve Rutherford <srutherford@google.com>
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> ---
>   .../virt/kvm/amd-memory-encryption.rst        |  8 +++++++
>   arch/x86/kvm/svm.c                            | 23 +++++++++++++++++++
>   2 files changed, 31 insertions(+)
>
> diff --git a/Documentation/virt/kvm/amd-memory-encryption.rst b/Documentation/virt/kvm/amd-memory-encryption.rst
> index f46817ef7019..a45dcb5f8687 100644
> --- a/Documentation/virt/kvm/amd-memory-encryption.rst
> +++ b/Documentation/virt/kvm/amd-memory-encryption.rst
> @@ -314,6 +314,14 @@ Returns: 0 on success, -negative on error
>                   __u32 trans_len;
>           };
>   
> +12. KVM_SEV_SEND_FINISH
> +------------------------
> +
> +After completion of the migration flow, the KVM_SEV_SEND_FINISH command can be
> +issued by the hypervisor to delete the encryption context.
> +
> +Returns: 0 on success, -negative on error
> +
>   References
>   ==========
>   
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 8561c47cc4f9..71a4cb3b817d 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -7399,6 +7399,26 @@ static int sev_send_update_data(struct kvm *kvm, struct kvm_sev_cmd *argp)
>   	return ret;
>   }
>   
> +static int sev_send_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
> +{
> +	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> +	struct sev_data_send_finish *data;
> +	int ret;
> +
> +	if (!sev_guest(kvm))
> +		return -ENOTTY;
> +
> +	data = kzalloc(sizeof(*data), GFP_KERNEL);
> +	if (!data)
> +		return -ENOMEM;
> +
> +	data->handle = sev->handle;
> +	ret = sev_issue_cmd(kvm, SEV_CMD_SEND_FINISH, data, &argp->error);
> +
> +	kfree(data);
> +	return ret;
> +}
> +
>   static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
>   {
>   	struct kvm_sev_cmd sev_cmd;
> @@ -7449,6 +7469,9 @@ static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
>   	case KVM_SEV_SEND_UPDATE_DATA:
>   		r = sev_send_update_data(kvm, &sev_cmd);
>   		break;
> +	case KVM_SEV_SEND_FINISH:
> +		r = sev_send_finish(kvm, &sev_cmd);
> +		break;
>   	default:
>   		r = -EINVAL;
>   		goto out;
Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v6 01/14] KVM: SVM: Add KVM_SEV SEND_START command
  2020-04-02 20:04                 ` Brijesh Singh
@ 2020-04-02 20:19                   ` Venu Busireddy
  0 siblings, 0 replies; 107+ messages in thread
From: Venu Busireddy @ 2020-04-02 20:19 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: Ashish Kalra, pbonzini, tglx, mingo, hpa, joro, bp,
	thomas.lendacky, x86, kvm, linux-kernel, rientjes, srutherford,
	luto

On 2020-04-02 15:04:54 -0500, Brijesh Singh wrote:
> 
> On 4/2/20 2:43 PM, Venu Busireddy wrote:
> > On 2020-04-02 14:17:26 -0500, Brijesh Singh wrote:
> >> On 4/2/20 1:57 PM, Venu Busireddy wrote:
> >> [snip]...
> >>
> >>>> The question is, how does userspace know the session length? One
> >>>> method is to precalculate a value based on your firmware version
> >>>> and have userspace pass that; another approach is to set
> >>>> params.session_len = 0 and query it from the FW. The FW spec allows
> >>>> querying the length, please see the spec. In the qemu patches I chose the
> >>>> second approach. This is because the session blob can change from one FW
> >>>> version to another and I tried to avoid calculating or hardcoding the
> >>>> length for one version of the FW. You can certainly choose the first
> >>>> method. We want to ensure that the kernel interface works in both cases.
> >>> I like the fact that you have already implemented the functionality to
> >>> facilitate the user space to obtain the session length from the firmware
> >>> (by setting params.session_len to 0). However, I am trying to address
> >>> the case where the user space sets the params.session_len to a size
> >>> smaller than the size needed.
> >>>
> >>> Let me put it differently. Let us say that the session blob needs 128
> >>> bytes, but the user space sets params.session_len to 16. That results
> >>> in us allocating a buffer of 16 bytes and setting data->session_len to 16.
> >>>
> >>> What does the firmware do now?
> >>>
> >>> Does it copy 128 bytes into data->session_address, or, does it copy
> >>> 16 bytes?
> >>>
> >>> If it copies 128 bytes, we most certainly will end up with a kernel crash.
> >>>
> >>> If it copies 16 bytes, then what does it set in data->session_len? 16,
> >>> or 128? If 16, everything is good. If 128, we end up causing a memory
> >>> access violation for the user space.
> >> My interpretation of the spec is, if the user-provided length is smaller
> >> than the FW-expected length, then the FW reports an error with
> >> data->session_len set to the expected length. In other words, it should
> >> *not* copy anything into the session buffer in the event of failure.
> > That is good, and expected behavior.
> >
> >> If the FW is touching memory beyond what is specified in session_len, then
> >> it's a FW bug and we can't do much from the kernel.
> > Agreed. But let us assume that the firmware is not touching memory that
> > it is not supposed to.
> >
> >> Am I missing something?
> > I believe you are agreeing that if the session blob needs 128 bytes and
> > user space sets params.session_len to 16, the firmware does not copy
> > any data to data->session_address, and sets data->session_len to 128.
> >
> > Now, when we return, won't the user space try to access 128 bytes
> > (params.session_len) of data in params.session_uaddr, and crash? Because,
> > instead of returning an error indicating that the buffer is not large
> > enough, we return success!
> 
> 
> Ah, so the main issue is that we should not be going to e_free on error. If
> session_len is less than the expected length then the FW will return an error.
> In the case of an error we can skip copying the session_data into the
> userspace buffer, but we still need to pass the session_len and policy
> back to userspace.

Sure, that is one way to fix the problem.

> 
> +	ret = sev_issue_cmd(kvm, SEV_CMD_SEND_START, data, &argp->error);
> +
> +	if (ret)
> +		goto e_free;
> +
> 
> If user space gets an error then it can decode it further to get
> additional information (e.g. buffer too small).
> 
> >
> > That is why I was suggesting the following, which you seem to have
> > missed.
> >
> >>> Perhaps, this can be dealt a little differently? Why not always call
> >>> sev_issue_cmd(kvm, SEV_CMD_SEND_START, ...) with zeroed out data? Then,
> >>> if the user space has set params.session_len to 0, we return with the
> >>> needed params.session_len. Otherwise, we check if params.session_len is
> >>> large enough, and if not, we return -EINVAL?
> > Doesn't the above approach address all scenarios?
> >

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v6 04/14] KVM: SVM: Add support for KVM_SEV_RECEIVE_START command
  2020-03-30  6:21 ` [PATCH v6 04/14] KVM: SVM: Add support for KVM_SEV_RECEIVE_START command Ashish Kalra
@ 2020-04-02 21:35   ` Venu Busireddy
  2020-04-02 22:09   ` Krish Sadhukhan
  1 sibling, 0 replies; 107+ messages in thread
From: Venu Busireddy @ 2020-04-02 21:35 UTC (permalink / raw)
  To: Ashish Kalra
  Cc: pbonzini, tglx, mingo, hpa, joro, bp, thomas.lendacky, x86, kvm,
	linux-kernel, rientjes, srutherford, luto, brijesh.singh

On 2020-03-30 06:21:04 +0000, Ashish Kalra wrote:
> From: Brijesh Singh <Brijesh.Singh@amd.com>
> 
> The command is used to create the encryption context for an incoming
> SEV guest. The encryption context can be later used by the hypervisor
> to import the incoming data into the SEV guest memory space.
> 
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: "Radim Krčmář" <rkrcmar@redhat.com>
> Cc: Joerg Roedel <joro@8bytes.org>
> Cc: Borislav Petkov <bp@suse.de>
> Cc: Tom Lendacky <thomas.lendacky@amd.com>
> Cc: x86@kernel.org
> Cc: kvm@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Reviewed-by: Steve Rutherford <srutherford@google.com>
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> ---
>  .../virt/kvm/amd-memory-encryption.rst        | 29 +++++++
>  arch/x86/kvm/svm.c                            | 81 +++++++++++++++++++
>  include/uapi/linux/kvm.h                      |  9 +++
>  3 files changed, 119 insertions(+)
> 
> diff --git a/Documentation/virt/kvm/amd-memory-encryption.rst b/Documentation/virt/kvm/amd-memory-encryption.rst
> index a45dcb5f8687..ef1f1f3a5b40 100644
> --- a/Documentation/virt/kvm/amd-memory-encryption.rst
> +++ b/Documentation/virt/kvm/amd-memory-encryption.rst
> @@ -322,6 +322,35 @@ issued by the hypervisor to delete the encryption context.
>  
>  Returns: 0 on success, -negative on error
>  
> +13. KVM_SEV_RECEIVE_START
> +------------------------
> +
> +The KVM_SEV_RECEIVE_START command is used for creating the memory encryption
> +context for an incoming SEV guest. To create the encryption context, the user must
> +provide a guest policy, the platform public Diffie-Hellman (PDH) key and session
> +information.
> +
> +Parameters: struct  kvm_sev_receive_start (in/out)
> +
> +Returns: 0 on success, -negative on error
> +
> +::
> +
> +        struct kvm_sev_receive_start {
> +                __u32 handle;           /* if zero then firmware creates a new handle */
> +                __u32 policy;           /* guest's policy */
> +
> +                __u64 pdh_uaddr;         /* userspace address pointing to the PDH key */
> +                __u32 dh_len;

Could dh_len be changed to pdh_len, to match the names in
kvm_sev_receive_start in include/uapi/linux/kvm.h?

> +
> +                __u64 session_addr;     /* userspace address which points to the guest session information */

Also, session_addr to session_uaddr?

> +                __u32 session_len;
> +        };
> +
> +On success, the 'handle' field contains a new handle and on error, a negative value.
> +
> +For more details, see SEV spec Section 6.12.
> +
>  References
>  ==========
>  
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 71a4cb3b817d..038b47685733 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -7419,6 +7419,84 @@ static int sev_send_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
>  	return ret;
>  }
>  
> +static int sev_receive_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
> +{
> +	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> +	struct sev_data_receive_start *start;
> +	struct kvm_sev_receive_start params;
> +	int *error = &argp->error;
> +	void *session_data;
> +	void *pdh_data;
> +	int ret;
> +
> +	if (!sev_guest(kvm))
> +		return -ENOTTY;
> +
> +	/* Get parameter from the userspace */
> +	if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data,
> +			sizeof(struct kvm_sev_receive_start)))
> +		return -EFAULT;
> +
> +	/* some sanity checks */
> +	if (!params.pdh_uaddr || !params.pdh_len ||
> +	    !params.session_uaddr || !params.session_len)
> +		return -EINVAL;
> +
> +	pdh_data = psp_copy_user_blob(params.pdh_uaddr, params.pdh_len);
> +	if (IS_ERR(pdh_data))
> +		return PTR_ERR(pdh_data);
> +
> +	session_data = psp_copy_user_blob(params.session_uaddr,
> +			params.session_len);
> +	if (IS_ERR(session_data)) {
> +		ret = PTR_ERR(session_data);
> +		goto e_free_pdh;
> +	}
> +
> +	ret = -ENOMEM;
> +	start = kzalloc(sizeof(*start), GFP_KERNEL);
> +	if (!start)
> +		goto e_free_session;
> +
> +	start->handle = params.handle;
> +	start->policy = params.policy;
> +	start->pdh_cert_address = __psp_pa(pdh_data);
> +	start->pdh_cert_len = params.pdh_len;
> +	start->session_address = __psp_pa(session_data);
> +	start->session_len = params.session_len;
> +
> +	/* create memory encryption context */
> +	ret = __sev_issue_cmd(argp->sev_fd, SEV_CMD_RECEIVE_START, start,
> +				error);
> +	if (ret)
> +		goto e_free;
> +
> +	/* Bind ASID to this guest */
> +	ret = sev_bind_asid(kvm, start->handle, error);
> +	if (ret)
> +		goto e_free;
> +
> +	params.handle = start->handle;
> +	if (copy_to_user((void __user *)(uintptr_t)argp->data,
> +			 &params, sizeof(struct kvm_sev_receive_start))) {
> +		ret = -EFAULT;
> +		sev_unbind_asid(kvm, start->handle);
> +		goto e_free;
> +	}
> +
> +	sev->handle = start->handle;
> +	sev->fd = argp->sev_fd;
> +
> +e_free:
> +	kfree(start);
> +e_free_session:
> +	kfree(session_data);
> +e_free_pdh:
> +	kfree(pdh_data);
> +
> +	return ret;
> +}
> +
>  static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
>  {
>  	struct kvm_sev_cmd sev_cmd;
> @@ -7472,6 +7550,9 @@ static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
>  	case KVM_SEV_SEND_FINISH:
>  		r = sev_send_finish(kvm, &sev_cmd);
>  		break;
> +	case KVM_SEV_RECEIVE_START:
> +		r = sev_receive_start(kvm, &sev_cmd);
> +		break;
>  	default:
>  		r = -EINVAL;
>  		goto out;
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index d9dc81bb9c55..74764b9db5fa 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1579,6 +1579,15 @@ struct kvm_sev_send_update_data {
>  	__u32 trans_len;
>  };
>  
> +struct kvm_sev_receive_start {
> +	__u32 handle;
> +	__u32 policy;
> +	__u64 pdh_uaddr;
> +	__u32 pdh_len;
> +	__u64 session_uaddr;
> +	__u32 session_len;
> +};
> +
>  #define KVM_DEV_ASSIGN_ENABLE_IOMMU	(1 << 0)
>  #define KVM_DEV_ASSIGN_PCI_2_3		(1 << 1)
>  #define KVM_DEV_ASSIGN_MASK_INTX	(1 << 2)
> -- 
> 2.17.1
> 
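For readers unfamiliar with how these commands are driven, a VMM (e.g. QEMU) would fill struct kvm_sev_receive_start and pass it through the KVM_MEMORY_ENCRYPT_OP ioctl with command id KVM_SEV_RECEIVE_START. The sketch below mirrors the uapi struct locally so it builds without the patched headers, and only shows the argument preparation; the actual ioctl invocation is omitted since it requires a live /dev/kvm with SEV support:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of struct kvm_sev_receive_start from the patch, so this
 * illustration compiles without the patched include/uapi/linux/kvm.h. */
struct kvm_sev_receive_start {
	uint32_t handle;
	uint32_t policy;
	uint64_t pdh_uaddr;
	uint32_t pdh_len;
	uint64_t session_uaddr;
	uint32_t session_len;
};

/* Fill the ioctl argument the way a VMM would before issuing
 * KVM_MEMORY_ENCRYPT_OP with KVM_SEV_RECEIVE_START.  handle == 0 asks
 * the firmware to create a new handle on the receiving side. */
static void prep_receive_start(struct kvm_sev_receive_start *p,
			       uint32_t policy,
			       const void *pdh, uint32_t pdh_len,
			       const void *session, uint32_t session_len)
{
	memset(p, 0, sizeof(*p));
	p->handle = 0;			/* request a fresh handle */
	p->policy = policy;
	p->pdh_uaddr = (uint64_t)(uintptr_t)pdh;
	p->pdh_len = pdh_len;
	p->session_uaddr = (uint64_t)(uintptr_t)session;
	p->session_len = session_len;
}
```

On success the kernel copies the struct back with the firmware-assigned handle filled in, which the VMM then uses for the subsequent RECEIVE_UPDATE_DATA calls.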

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v6 04/14] KVM: SVM: Add support for KVM_SEV_RECEIVE_START command
  2020-03-30  6:21 ` [PATCH v6 04/14] KVM: SVM: Add support for KVM_SEV_RECEIVE_START command Ashish Kalra
  2020-04-02 21:35   ` Venu Busireddy
@ 2020-04-02 22:09   ` Krish Sadhukhan
  1 sibling, 0 replies; 107+ messages in thread
From: Krish Sadhukhan @ 2020-04-02 22:09 UTC (permalink / raw)
  To: Ashish Kalra, pbonzini
  Cc: tglx, mingo, hpa, joro, bp, thomas.lendacky, x86, kvm,
	linux-kernel, rientjes, srutherford, luto, brijesh.singh


On 3/29/20 11:21 PM, Ashish Kalra wrote:
> From: Brijesh Singh <Brijesh.Singh@amd.com>
>
> The command is used to create the encryption context for an incoming
> SEV guest. The encryption context can be later used by the hypervisor
> to import the incoming data into the SEV guest memory space.
>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: "Radim Krčmář" <rkrcmar@redhat.com>
> Cc: Joerg Roedel <joro@8bytes.org>
> Cc: Borislav Petkov <bp@suse.de>
> Cc: Tom Lendacky <thomas.lendacky@amd.com>
> Cc: x86@kernel.org
> Cc: kvm@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Reviewed-by: Steve Rutherford <srutherford@google.com>
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> ---
>   .../virt/kvm/amd-memory-encryption.rst        | 29 +++++++
>   arch/x86/kvm/svm.c                            | 81 +++++++++++++++++++
>   include/uapi/linux/kvm.h                      |  9 +++
>   3 files changed, 119 insertions(+)
>
> diff --git a/Documentation/virt/kvm/amd-memory-encryption.rst b/Documentation/virt/kvm/amd-memory-encryption.rst
> index a45dcb5f8687..ef1f1f3a5b40 100644
> --- a/Documentation/virt/kvm/amd-memory-encryption.rst
> +++ b/Documentation/virt/kvm/amd-memory-encryption.rst
> @@ -322,6 +322,35 @@ issued by the hypervisor to delete the encryption context.
>   
>   Returns: 0 on success, -negative on error
>   
> +13. KVM_SEV_RECEIVE_START
> +------------------------
> +
> +The KVM_SEV_RECEIVE_START command is used for creating the memory encryption
> +context for an incoming SEV guest. To create the encryption context, the user must
> +provide a guest policy, the platform public Diffie-Hellman (PDH) key and session
> +information.
> +
> +Parameters: struct  kvm_sev_receive_start (in/out)
> +
> +Returns: 0 on success, -negative on error
> +
> +::
> +
> +        struct kvm_sev_receive_start {
> +                __u32 handle;           /* if zero then firmware creates a new handle */
> +                __u32 policy;           /* guest's policy */
> +
> +                __u64 pdh_uaddr;         /* userspace address pointing to the PDH key */
> +                __u32 dh_len;
> +
> +                __u64 session_addr;     /* userspace address which points to the guest session information */
> +                __u32 session_len;
> +        };
> +
> +On success, the 'handle' field contains a new handle and on error, a negative value.
> +
> +For more details, see SEV spec Section 6.12.
> +
>   References
>   ==========
>   
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 71a4cb3b817d..038b47685733 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -7419,6 +7419,84 @@ static int sev_send_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
>   	return ret;
>   }
>   
> +static int sev_receive_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
> +{
> +	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> +	struct sev_data_receive_start *start;
> +	struct kvm_sev_receive_start params;
> +	int *error = &argp->error;
> +	void *session_data;
> +	void *pdh_data;
> +	int ret;
> +
> +	if (!sev_guest(kvm))
> +		return -ENOTTY;
> +
> +	/* Get parameter from the userspace */
> +	if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data,
> +			sizeof(struct kvm_sev_receive_start)))
> +		return -EFAULT;
> +
> +	/* some sanity checks */
> +	if (!params.pdh_uaddr || !params.pdh_len ||
> +	    !params.session_uaddr || !params.session_len)
> +		return -EINVAL;
> +
> +	pdh_data = psp_copy_user_blob(params.pdh_uaddr, params.pdh_len);
> +	if (IS_ERR(pdh_data))
> +		return PTR_ERR(pdh_data);
> +
> +	session_data = psp_copy_user_blob(params.session_uaddr,
> +			params.session_len);
> +	if (IS_ERR(session_data)) {
> +		ret = PTR_ERR(session_data);
> +		goto e_free_pdh;
> +	}
> +
> +	ret = -ENOMEM;
> +	start = kzalloc(sizeof(*start), GFP_KERNEL);
> +	if (!start)
> +		goto e_free_session;
> +
> +	start->handle = params.handle;
> +	start->policy = params.policy;
> +	start->pdh_cert_address = __psp_pa(pdh_data);
> +	start->pdh_cert_len = params.pdh_len;
> +	start->session_address = __psp_pa(session_data);
> +	start->session_len = params.session_len;
> +
> +	/* create memory encryption context */
> +	ret = __sev_issue_cmd(argp->sev_fd, SEV_CMD_RECEIVE_START, start,
> +				error);
> +	if (ret)
> +		goto e_free;
> +
> +	/* Bind ASID to this guest */
> +	ret = sev_bind_asid(kvm, start->handle, error);
> +	if (ret)
> +		goto e_free;
> +
> +	params.handle = start->handle;
> +	if (copy_to_user((void __user *)(uintptr_t)argp->data,
> +			 &params, sizeof(struct kvm_sev_receive_start))) {
> +		ret = -EFAULT;
> +		sev_unbind_asid(kvm, start->handle);
> +		goto e_free;
> +	}
> +
> +	sev->handle = start->handle;
> +	sev->fd = argp->sev_fd;
> +
> +e_free:
> +	kfree(start);
> +e_free_session:
> +	kfree(session_data);
> +e_free_pdh:
> +	kfree(pdh_data);
> +
> +	return ret;
> +}
> +
>   static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
>   {
>   	struct kvm_sev_cmd sev_cmd;
> @@ -7472,6 +7550,9 @@ static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
>   	case KVM_SEV_SEND_FINISH:
>   		r = sev_send_finish(kvm, &sev_cmd);
>   		break;
> +	case KVM_SEV_RECEIVE_START:
> +		r = sev_receive_start(kvm, &sev_cmd);
> +		break;
>   	default:
>   		r = -EINVAL;
>   		goto out;
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index d9dc81bb9c55..74764b9db5fa 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1579,6 +1579,15 @@ struct kvm_sev_send_update_data {
>   	__u32 trans_len;
>   };
>   
> +struct kvm_sev_receive_start {
> +	__u32 handle;
> +	__u32 policy;
> +	__u64 pdh_uaddr;
> +	__u32 pdh_len;

Why not 'pdh_cert_uaddr' and 'pdh_cert_len' ? That's the naming 
convention you have followed in previous patches.
> +	__u64 session_uaddr;
> +	__u32 session_len;
> +};
> +
>   #define KVM_DEV_ASSIGN_ENABLE_IOMMU	(1 << 0)
>   #define KVM_DEV_ASSIGN_PCI_2_3		(1 << 1)
>   #define KVM_DEV_ASSIGN_MASK_INTX	(1 << 2)


Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>


^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v6 06/14] KVM: SVM: Add KVM_SEV_RECEIVE_FINISH command
  2020-03-30  6:21 ` [PATCH v6 06/14] KVM: SVM: Add KVM_SEV_RECEIVE_FINISH command Ashish Kalra
@ 2020-04-02 22:24   ` Venu Busireddy
  2020-04-02 22:27   ` Krish Sadhukhan
  1 sibling, 0 replies; 107+ messages in thread
From: Venu Busireddy @ 2020-04-02 22:24 UTC (permalink / raw)
  To: Ashish Kalra
  Cc: pbonzini, tglx, mingo, hpa, joro, bp, thomas.lendacky, x86, kvm,
	linux-kernel, rientjes, srutherford, luto, brijesh.singh

On 2020-03-30 06:21:36 +0000, Ashish Kalra wrote:
> From: Brijesh Singh <Brijesh.Singh@amd.com>
> 
> The command finalizes the guest receiving process and makes the SEV guest
> ready for execution.
> 
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: "Radim Krčmář" <rkrcmar@redhat.com>
> Cc: Joerg Roedel <joro@8bytes.org>
> Cc: Borislav Petkov <bp@suse.de>
> Cc: Tom Lendacky <thomas.lendacky@amd.com>
> Cc: x86@kernel.org
> Cc: kvm@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> ---
>  .../virt/kvm/amd-memory-encryption.rst        |  8 +++++++
>  arch/x86/kvm/svm.c                            | 23 +++++++++++++++++++
>  2 files changed, 31 insertions(+)
> 
> diff --git a/Documentation/virt/kvm/amd-memory-encryption.rst b/Documentation/virt/kvm/amd-memory-encryption.rst
> index 554aa33a99cc..93cd95d9a6c0 100644
> --- a/Documentation/virt/kvm/amd-memory-encryption.rst
> +++ b/Documentation/virt/kvm/amd-memory-encryption.rst
> @@ -375,6 +375,14 @@ Returns: 0 on success, -negative on error
>                  __u32 trans_len;
>          };
>  
> +15. KVM_SEV_RECEIVE_FINISH
> +------------------------
> +
> +After completion of the migration flow, the KVM_SEV_RECEIVE_FINISH command can be
> +issued by the hypervisor to make the guest ready for execution.
> +
> +Returns: 0 on success, -negative on error
> +
>  References
>  ==========
>  
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 5fc5355536d7..7c2721e18b06 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -7573,6 +7573,26 @@ static int sev_receive_update_data(struct kvm *kvm, struct kvm_sev_cmd *argp)
>  	return ret;
>  }
>  
> +static int sev_receive_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
> +{
> +	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> +	struct sev_data_receive_finish *data;
> +	int ret;
> +
> +	if (!sev_guest(kvm))
> +		return -ENOTTY;

Noticed this in earlier patches too. Is -ENOTTY the best return value?
Wouldn't one of -ENXIO, -ENODEV, or -EINVAL be a better choice? What is
the rationale for using -ENOTTY?

> +
> +	data = kzalloc(sizeof(*data), GFP_KERNEL);
> +	if (!data)
> +		return -ENOMEM;
> +
> +	data->handle = sev->handle;
> +	ret = sev_issue_cmd(kvm, SEV_CMD_RECEIVE_FINISH, data, &argp->error);
> +
> +	kfree(data);
> +	return ret;
> +}
> +
>  static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
>  {
>  	struct kvm_sev_cmd sev_cmd;
> @@ -7632,6 +7652,9 @@ static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
>  	case KVM_SEV_RECEIVE_UPDATE_DATA:
>  		r = sev_receive_update_data(kvm, &sev_cmd);
>  		break;
> +	case KVM_SEV_RECEIVE_FINISH:
> +		r = sev_receive_finish(kvm, &sev_cmd);
> +		break;
>  	default:
>  		r = -EINVAL;
>  		goto out;
> -- 
> 2.17.1
> 

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v6 05/14] KVM: SVM: Add KVM_SEV_RECEIVE_UPDATE_DATA command
  2020-03-30  6:21 ` [PATCH v6 05/14] KVM: SVM: Add KVM_SEV_RECEIVE_UPDATE_DATA command Ashish Kalra
@ 2020-04-02 22:25   ` Krish Sadhukhan
  2020-04-02 22:29   ` Venu Busireddy
  1 sibling, 0 replies; 107+ messages in thread
From: Krish Sadhukhan @ 2020-04-02 22:25 UTC (permalink / raw)
  To: Ashish Kalra, pbonzini
  Cc: tglx, mingo, hpa, joro, bp, thomas.lendacky, x86, kvm,
	linux-kernel, rientjes, srutherford, luto, brijesh.singh


On 3/29/20 11:21 PM, Ashish Kalra wrote:
> From: Brijesh Singh <Brijesh.Singh@amd.com>
>
> The command is used for copying the incoming buffer into the
> SEV guest memory space.
>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: "Radim Krčmář" <rkrcmar@redhat.com>
> Cc: Joerg Roedel <joro@8bytes.org>
> Cc: Borislav Petkov <bp@suse.de>
> Cc: Tom Lendacky <thomas.lendacky@amd.com>
> Cc: x86@kernel.org
> Cc: kvm@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> ---
>   .../virt/kvm/amd-memory-encryption.rst        | 24 ++++++
>   arch/x86/kvm/svm.c                            | 79 +++++++++++++++++++
>   include/uapi/linux/kvm.h                      |  9 +++
>   3 files changed, 112 insertions(+)
>
> diff --git a/Documentation/virt/kvm/amd-memory-encryption.rst b/Documentation/virt/kvm/amd-memory-encryption.rst
> index ef1f1f3a5b40..554aa33a99cc 100644
> --- a/Documentation/virt/kvm/amd-memory-encryption.rst
> +++ b/Documentation/virt/kvm/amd-memory-encryption.rst
> @@ -351,6 +351,30 @@ On success, the 'handle' field contains a new handle and on error, a negative va
>   
>   For more details, see SEV spec Section 6.12.
>   
> +14. KVM_SEV_RECEIVE_UPDATE_DATA
> +----------------------------
> +
> +The KVM_SEV_RECEIVE_UPDATE_DATA command can be used by the hypervisor to copy
> +the incoming buffers into the guest memory region with encryption context
> +created during the KVM_SEV_RECEIVE_START.
> +
> +Parameters (in): struct kvm_sev_receive_update_data
> +
> +Returns: 0 on success, -negative on error
> +
> +::
> +
> +        struct kvm_sev_launch_receive_update_data {
> +                __u64 hdr_uaddr;        /* userspace address containing the packet header */
> +                __u32 hdr_len;
> +
> +                __u64 guest_uaddr;      /* the destination guest memory region */
> +                __u32 guest_len;
> +
> +                __u64 trans_uaddr;      /* the incoming buffer memory region  */
> +                __u32 trans_len;
> +        };
> +
>   References
>   ==========
>   
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 038b47685733..5fc5355536d7 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -7497,6 +7497,82 @@ static int sev_receive_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
>   	return ret;
>   }
>   
> +static int sev_receive_update_data(struct kvm *kvm, struct kvm_sev_cmd *argp)
> +{
> +	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> +	struct kvm_sev_receive_update_data params;
> +	struct sev_data_receive_update_data *data;
> +	void *hdr = NULL, *trans = NULL;
> +	struct page **guest_page;
> +	unsigned long n;
> +	int ret, offset;
> +
> +	if (!sev_guest(kvm))
> +		return -EINVAL;
> +
> +	if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data,
> +			sizeof(struct kvm_sev_receive_update_data)))
> +		return -EFAULT;
> +
> +	if (!params.hdr_uaddr || !params.hdr_len ||
> +	    !params.guest_uaddr || !params.guest_len ||
> +	    !params.trans_uaddr || !params.trans_len)
> +		return -EINVAL;
> +
> +	/* Check if we are crossing the page boundary */
> +	offset = params.guest_uaddr & (PAGE_SIZE - 1);
> +	if ((params.guest_len + offset > PAGE_SIZE))
> +		return -EINVAL;
> +
> +	hdr = psp_copy_user_blob(params.hdr_uaddr, params.hdr_len);
> +	if (IS_ERR(hdr))
> +		return PTR_ERR(hdr);
> +
> +	trans = psp_copy_user_blob(params.trans_uaddr, params.trans_len);
> +	if (IS_ERR(trans)) {
> +		ret = PTR_ERR(trans);
> +		goto e_free_hdr;
> +	}
> +
> +	ret = -ENOMEM;
> +	data = kzalloc(sizeof(*data), GFP_KERNEL);
> +	if (!data)
> +		goto e_free_trans;
> +
> +	data->hdr_address = __psp_pa(hdr);
> +	data->hdr_len = params.hdr_len;
> +	data->trans_address = __psp_pa(trans);
> +	data->trans_len = params.trans_len;
> +
> +	/* Pin guest memory */
> +	ret = -EFAULT;
> +	guest_page = sev_pin_memory(kvm, params.guest_uaddr & PAGE_MASK,
> +				    PAGE_SIZE, &n, 0);
> +	if (!guest_page)
> +		goto e_free;
> +
> +	/* The RECEIVE_UPDATE_DATA command requires C-bit to be always set. */
> +	data->guest_address = (page_to_pfn(guest_page[0]) << PAGE_SHIFT) +
> +				offset;
> +	data->guest_address |= sev_me_mask;
> +	data->guest_len = params.guest_len;
> +	data->handle = sev->handle;
> +
> +	ret = sev_issue_cmd(kvm, SEV_CMD_RECEIVE_UPDATE_DATA, data,
> +				&argp->error);
> +
> +	sev_unpin_memory(kvm, guest_page, n);
> +
> +e_free:
> +	kfree(data);
> +e_free_trans:
> +	kfree(trans);
> +e_free_hdr:
> +	kfree(hdr);
> +
> +	return ret;
> +}
> +
>   static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
>   {
>   	struct kvm_sev_cmd sev_cmd;
> @@ -7553,6 +7629,9 @@ static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
>   	case KVM_SEV_RECEIVE_START:
>   		r = sev_receive_start(kvm, &sev_cmd);
>   		break;
> +	case KVM_SEV_RECEIVE_UPDATE_DATA:
> +		r = sev_receive_update_data(kvm, &sev_cmd);
> +		break;
>   	default:
>   		r = -EINVAL;
>   		goto out;
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 74764b9db5fa..4e80c57a3182 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1588,6 +1588,15 @@ struct kvm_sev_receive_start {
>   	__u32 session_len;
>   };
>   
> +struct kvm_sev_receive_update_data {
> +	__u64 hdr_uaddr;
> +	__u32 hdr_len;
> +	__u64 guest_uaddr;
> +	__u32 guest_len;
> +	__u64 trans_uaddr;
> +	__u32 trans_len;
> +};
> +
>   #define KVM_DEV_ASSIGN_ENABLE_IOMMU	(1 << 0)
>   #define KVM_DEV_ASSIGN_PCI_2_3		(1 << 1)
>   #define KVM_DEV_ASSIGN_MASK_INTX	(1 << 2)
Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
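One detail worth noting in this patch is the page-boundary sanity check: RECEIVE_UPDATE_DATA pins a single guest page, so the destination range must not cross a page boundary. In isolation the arithmetic looks like this (standalone sketch assuming 4 KiB pages):

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096UL	/* assume 4 KiB pages for this sketch */

/*
 * Mirrors the check in sev_receive_update_data(): the command operates
 * on one pinned guest page, so [guest_uaddr, guest_uaddr + guest_len)
 * must fit within a single page.  Returns nonzero if it would cross.
 */
static int crosses_page(uint64_t guest_uaddr, uint64_t guest_len)
{
	uint64_t offset = guest_uaddr & (PAGE_SIZE - 1);

	return guest_len + offset > PAGE_SIZE;
}
```

A range ending exactly at a page boundary is allowed; even one byte past it is rejected with -EINVAL in the real function.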

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v6 06/14] KVM: SVM: Add KVM_SEV_RECEIVE_FINISH command
  2020-03-30  6:21 ` [PATCH v6 06/14] KVM: SVM: Add KVM_SEV_RECEIVE_FINISH command Ashish Kalra
  2020-04-02 22:24   ` Venu Busireddy
@ 2020-04-02 22:27   ` Krish Sadhukhan
  2020-04-07  0:57     ` Steve Rutherford
  1 sibling, 1 reply; 107+ messages in thread
From: Krish Sadhukhan @ 2020-04-02 22:27 UTC (permalink / raw)
  To: Ashish Kalra, pbonzini
  Cc: tglx, mingo, hpa, joro, bp, thomas.lendacky, x86, kvm,
	linux-kernel, rientjes, srutherford, luto, brijesh.singh


On 3/29/20 11:21 PM, Ashish Kalra wrote:
> From: Brijesh Singh <Brijesh.Singh@amd.com>
>
> The command finalizes the guest receiving process and makes the SEV guest
> ready for execution.
>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: "Radim Krčmář" <rkrcmar@redhat.com>
> Cc: Joerg Roedel <joro@8bytes.org>
> Cc: Borislav Petkov <bp@suse.de>
> Cc: Tom Lendacky <thomas.lendacky@amd.com>
> Cc: x86@kernel.org
> Cc: kvm@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> ---
>   .../virt/kvm/amd-memory-encryption.rst        |  8 +++++++
>   arch/x86/kvm/svm.c                            | 23 +++++++++++++++++++
>   2 files changed, 31 insertions(+)
>
> diff --git a/Documentation/virt/kvm/amd-memory-encryption.rst b/Documentation/virt/kvm/amd-memory-encryption.rst
> index 554aa33a99cc..93cd95d9a6c0 100644
> --- a/Documentation/virt/kvm/amd-memory-encryption.rst
> +++ b/Documentation/virt/kvm/amd-memory-encryption.rst
> @@ -375,6 +375,14 @@ Returns: 0 on success, -negative on error
>                   __u32 trans_len;
>           };
>   
> +15. KVM_SEV_RECEIVE_FINISH
> +------------------------
> +
> +After completion of the migration flow, the KVM_SEV_RECEIVE_FINISH command can be
> +issued by the hypervisor to make the guest ready for execution.
> +
> +Returns: 0 on success, -negative on error
> +
>   References
>   ==========
>   
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 5fc5355536d7..7c2721e18b06 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -7573,6 +7573,26 @@ static int sev_receive_update_data(struct kvm *kvm, struct kvm_sev_cmd *argp)
>   	return ret;
>   }
>   
> +static int sev_receive_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
> +{
> +	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> +	struct sev_data_receive_finish *data;
> +	int ret;
> +
> +	if (!sev_guest(kvm))
> +		return -ENOTTY;
> +
> +	data = kzalloc(sizeof(*data), GFP_KERNEL);
> +	if (!data)
> +		return -ENOMEM;
> +
> +	data->handle = sev->handle;
> +	ret = sev_issue_cmd(kvm, SEV_CMD_RECEIVE_FINISH, data, &argp->error);
> +
> +	kfree(data);
> +	return ret;
> +}
> +
>   static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
>   {
>   	struct kvm_sev_cmd sev_cmd;
> @@ -7632,6 +7652,9 @@ static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
>   	case KVM_SEV_RECEIVE_UPDATE_DATA:
>   		r = sev_receive_update_data(kvm, &sev_cmd);
>   		break;
> +	case KVM_SEV_RECEIVE_FINISH:
> +		r = sev_receive_finish(kvm, &sev_cmd);
> +		break;
>   	default:
>   		r = -EINVAL;
>   		goto out;
Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v6 05/14] KVM: SVM: Add KVM_SEV_RECEIVE_UPDATE_DATA command
  2020-03-30  6:21 ` [PATCH v6 05/14] KVM: SVM: Add KVM_SEV_RECEIVE_UPDATE_DATA command Ashish Kalra
  2020-04-02 22:25   ` Krish Sadhukhan
@ 2020-04-02 22:29   ` Venu Busireddy
  2020-04-07  0:49     ` Steve Rutherford
  1 sibling, 1 reply; 107+ messages in thread
From: Venu Busireddy @ 2020-04-02 22:29 UTC (permalink / raw)
  To: Ashish Kalra
  Cc: pbonzini, tglx, mingo, hpa, joro, bp, thomas.lendacky, x86, kvm,
	linux-kernel, rientjes, srutherford, luto, brijesh.singh

On 2020-03-30 06:21:20 +0000, Ashish Kalra wrote:
> From: Brijesh Singh <Brijesh.Singh@amd.com>
> 
> The command is used for copying the incoming buffer into the
> SEV guest memory space.
> 
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: "Radim Krčmář" <rkrcmar@redhat.com>
> Cc: Joerg Roedel <joro@8bytes.org>
> Cc: Borislav Petkov <bp@suse.de>
> Cc: Tom Lendacky <thomas.lendacky@amd.com>
> Cc: x86@kernel.org
> Cc: kvm@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>

Reviewed-by: Venu Busireddy <venu.busireddy@oracle.com>

> ---
>  .../virt/kvm/amd-memory-encryption.rst        | 24 ++++++
>  arch/x86/kvm/svm.c                            | 79 +++++++++++++++++++
>  include/uapi/linux/kvm.h                      |  9 +++
>  3 files changed, 112 insertions(+)
> 
> diff --git a/Documentation/virt/kvm/amd-memory-encryption.rst b/Documentation/virt/kvm/amd-memory-encryption.rst
> index ef1f1f3a5b40..554aa33a99cc 100644
> --- a/Documentation/virt/kvm/amd-memory-encryption.rst
> +++ b/Documentation/virt/kvm/amd-memory-encryption.rst
> @@ -351,6 +351,30 @@ On success, the 'handle' field contains a new handle and on error, a negative va
>  
>  For more details, see SEV spec Section 6.12.
>  
> +14. KVM_SEV_RECEIVE_UPDATE_DATA
> +----------------------------
> +
> +The KVM_SEV_RECEIVE_UPDATE_DATA command can be used by the hypervisor to copy
> +the incoming buffers into the guest memory region with encryption context
> +created during the KVM_SEV_RECEIVE_START.
> +
> +Parameters (in): struct kvm_sev_receive_update_data
> +
> +Returns: 0 on success, -negative on error
> +
> +::
> +
> +        struct kvm_sev_launch_receive_update_data {
> +                __u64 hdr_uaddr;        /* userspace address containing the packet header */
> +                __u32 hdr_len;
> +
> +                __u64 guest_uaddr;      /* the destination guest memory region */
> +                __u32 guest_len;
> +
> +                __u64 trans_uaddr;      /* the incoming buffer memory region  */
> +                __u32 trans_len;
> +        };
> +
>  References
>  ==========
>  
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 038b47685733..5fc5355536d7 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -7497,6 +7497,82 @@ static int sev_receive_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
>  	return ret;
>  }
>  
> +static int sev_receive_update_data(struct kvm *kvm, struct kvm_sev_cmd *argp)
> +{
> +	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> +	struct kvm_sev_receive_update_data params;
> +	struct sev_data_receive_update_data *data;
> +	void *hdr = NULL, *trans = NULL;
> +	struct page **guest_page;
> +	unsigned long n;
> +	int ret, offset;
> +
> +	if (!sev_guest(kvm))
> +		return -EINVAL;
> +
> +	if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data,
> +			sizeof(struct kvm_sev_receive_update_data)))
> +		return -EFAULT;
> +
> +	if (!params.hdr_uaddr || !params.hdr_len ||
> +	    !params.guest_uaddr || !params.guest_len ||
> +	    !params.trans_uaddr || !params.trans_len)
> +		return -EINVAL;
> +
> +	/* Check if we are crossing the page boundary */
> +	offset = params.guest_uaddr & (PAGE_SIZE - 1);
> +	if ((params.guest_len + offset > PAGE_SIZE))
> +		return -EINVAL;
> +
> +	hdr = psp_copy_user_blob(params.hdr_uaddr, params.hdr_len);
> +	if (IS_ERR(hdr))
> +		return PTR_ERR(hdr);
> +
> +	trans = psp_copy_user_blob(params.trans_uaddr, params.trans_len);
> +	if (IS_ERR(trans)) {
> +		ret = PTR_ERR(trans);
> +		goto e_free_hdr;
> +	}
> +
> +	ret = -ENOMEM;
> +	data = kzalloc(sizeof(*data), GFP_KERNEL);
> +	if (!data)
> +		goto e_free_trans;
> +
> +	data->hdr_address = __psp_pa(hdr);
> +	data->hdr_len = params.hdr_len;
> +	data->trans_address = __psp_pa(trans);
> +	data->trans_len = params.trans_len;
> +
> +	/* Pin guest memory */
> +	ret = -EFAULT;
> +	guest_page = sev_pin_memory(kvm, params.guest_uaddr & PAGE_MASK,
> +				    PAGE_SIZE, &n, 0);
> +	if (!guest_page)
> +		goto e_free;
> +
> +	/* The RECEIVE_UPDATE_DATA command requires C-bit to be always set. */
> +	data->guest_address = (page_to_pfn(guest_page[0]) << PAGE_SHIFT) +
> +				offset;
> +	data->guest_address |= sev_me_mask;
> +	data->guest_len = params.guest_len;
> +	data->handle = sev->handle;
> +
> +	ret = sev_issue_cmd(kvm, SEV_CMD_RECEIVE_UPDATE_DATA, data,
> +				&argp->error);
> +
> +	sev_unpin_memory(kvm, guest_page, n);
> +
> +e_free:
> +	kfree(data);
> +e_free_trans:
> +	kfree(trans);
> +e_free_hdr:
> +	kfree(hdr);
> +
> +	return ret;
> +}
> +
>  static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
>  {
>  	struct kvm_sev_cmd sev_cmd;
> @@ -7553,6 +7629,9 @@ static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
>  	case KVM_SEV_RECEIVE_START:
>  		r = sev_receive_start(kvm, &sev_cmd);
>  		break;
> +	case KVM_SEV_RECEIVE_UPDATE_DATA:
> +		r = sev_receive_update_data(kvm, &sev_cmd);
> +		break;
>  	default:
>  		r = -EINVAL;
>  		goto out;
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 74764b9db5fa..4e80c57a3182 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1588,6 +1588,15 @@ struct kvm_sev_receive_start {
>  	__u32 session_len;
>  };
>  
> +struct kvm_sev_receive_update_data {
> +	__u64 hdr_uaddr;
> +	__u32 hdr_len;
> +	__u64 guest_uaddr;
> +	__u32 guest_len;
> +	__u64 trans_uaddr;
> +	__u32 trans_len;
> +};
> +
>  #define KVM_DEV_ASSIGN_ENABLE_IOMMU	(1 << 0)
>  #define KVM_DEV_ASSIGN_PCI_2_3		(1 << 1)
>  #define KVM_DEV_ASSIGN_MASK_INTX	(1 << 2)
> -- 
> 2.17.1
> 

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v6 07/14] KVM: x86: Add AMD SEV specific Hypercall3
  2020-03-30  6:21 ` [PATCH v6 07/14] KVM: x86: Add AMD SEV specific Hypercall3 Ashish Kalra
@ 2020-04-02 22:36   ` Venu Busireddy
  2020-04-02 23:54   ` Krish Sadhukhan
  1 sibling, 0 replies; 107+ messages in thread
From: Venu Busireddy @ 2020-04-02 22:36 UTC (permalink / raw)
  To: Ashish Kalra
  Cc: pbonzini, tglx, mingo, hpa, joro, bp, thomas.lendacky, x86, kvm,
	linux-kernel, rientjes, srutherford, luto, brijesh.singh

On 2020-03-30 06:21:52 +0000, Ashish Kalra wrote:
> From: Brijesh Singh <Brijesh.Singh@amd.com>
> 
> KVM hypercall framework relies on alternative framework to patch the
> VMCALL -> VMMCALL on AMD platform. If a hypercall is made before
> apply_alternative() is called then it defaults to VMCALL. The approach
> works fine on non SEV guest. A VMCALL would causes #UD, and hypervisor
                                              ^^^^^^
					      cause
> will be able to decode the instruction and do the right things. But
> when SEV is active, guest memory is encrypted with guest key and
> hypervisor will not be able to decode the instruction bytes.
> 
> Add SEV specific hypercall3, it unconditionally uses VMMCALL. The hypercall
                               ^^
			       which
> will be used by the SEV guest to notify encrypted pages to the hypervisor.
> 
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: "Radim Krčmář" <rkrcmar@redhat.com>
> Cc: Joerg Roedel <joro@8bytes.org>
> Cc: Borislav Petkov <bp@suse.de>
> Cc: Tom Lendacky <thomas.lendacky@amd.com>
> Cc: x86@kernel.org
> Cc: kvm@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>

Reviewed-by: Venu Busireddy <venu.busireddy@oracle.com>

> ---
>  arch/x86/include/asm/kvm_para.h | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
> index 9b4df6eaa11a..6c09255633a4 100644
> --- a/arch/x86/include/asm/kvm_para.h
> +++ b/arch/x86/include/asm/kvm_para.h
> @@ -84,6 +84,18 @@ static inline long kvm_hypercall4(unsigned int nr, unsigned long p1,
>  	return ret;
>  }
>  
> +static inline long kvm_sev_hypercall3(unsigned int nr, unsigned long p1,
> +				      unsigned long p2, unsigned long p3)
> +{
> +	long ret;
> +
> +	asm volatile("vmmcall"
> +		     : "=a"(ret)
> +		     : "a"(nr), "b"(p1), "c"(p2), "d"(p3)
> +		     : "memory");
> +	return ret;
> +}
> +
>  #ifdef CONFIG_KVM_GUEST
>  bool kvm_para_available(void);
>  unsigned int kvm_arch_para_features(void);
> -- 
> 2.17.1
> 


* Re: [PATCH v6 13/14] KVM: x86: Introduce new KVM_FEATURE_SEV_LIVE_MIGRATION feature & Custom MSR.
       [not found]           ` <20200401070931.GA8562@ashkalra_ubuntu_server>
@ 2020-04-02 23:29             ` Ashish Kalra
  0 siblings, 0 replies; 107+ messages in thread
From: Ashish Kalra @ 2020-04-02 23:29 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: pbonzini, tglx, mingo, hpa, joro, bp, Thomas, x86, kvm,
	linux-kernel, rientjes, srutherford, luto

Hello Brijesh,
> 
> On Tue, Mar 31, 2020 at 05:13:36PM +0000, Ashish Kalra wrote:
> > Hello Brijesh,
> > 
> > > > Actually this is being done somewhat lazily, after the guest
> > > > enables/activates the live migration feature, it should be fine to do it
> > > > here or it can be moved into sev_map_percpu_data() where the first
> > > > hypercalls are done, in both cases the __bss_decrypted section will be
> > > > marked before the live migration process is initiated.
> > > 
> > > 
> > > IMO, its not okay to do it here or inside sev_map_percpu_data(). So far,
> > > as soon as C-bit state is changed in page table we make a hypercall. It
> > > will be good idea to stick to that approach. I don't see any reason why
> > > we need to make an exception for the __bss_decrypted unless I am missing
> > > something. What will happen if VMM initiate the migration while guest
> > > BIOS is booting?  Are you saying its not supported ?
> > > 
> > 
> > The one thing this will require is checking for KVM para capability 
> > KVM_FEATURE_SEV_LIVE_MIGRATION as part of this code in startup_64(), i 
> > need to verify if i can check for this feature so early in startup code.
> > 
> > I need to check for this capability and do the wrmsrl() here as this
> > will be the 1st hypercall in the guest kernel and i will need to
> > enable live migration feature and hypercall support on the host
> > before making the hypercall.
> > 
 
I added the KVM para feature capability check here in startup_64(), and
as I thought, this does "not" work; as a side effect it also breaks the
KVM paravirtualization check, so KVM paravirtualization is not detected
later during kernel boot and all KVM paravirt features remain disabled.
 
I dug deeper into this and here's what happens ...

kvm_para_has_feature() calls kvm_arch_para_features(), which in turn calls
kvm_cpuid_base(), and this invokes __kvm_cpuid_base(). As "boot_cpu_data"
is still not populated/set up at this point, __kvm_cpuid_base() does not
detect X86_FEATURE_HYPERVISOR and, as a side effect, sets the static
variable kvm_cpuid_base to 0.

So, as the KVM para feature is not detected in startup_64(), the
hypercall does not get invoked; and as a side effect of calling
kvm_para_has_feature() in startup_64(), the static variable
"kvm_cpuid_base" gets set to 0. Later, during hypervisor detection
(kvm_detect), this stale value causes kvm_detect() to return failure,
and hence the KVM paravirtualization features don't get enabled for the
guest kernel.

So, calling kvm_para_has_feature() this early in startup_64() is not
going to work; hence, it is probably best to do the hypercall that marks
the __bss_decrypted section as decrypted (lazily) as part of
sev_map_percpu_data(), as per my original thought.

Thanks,
Ashish


* Re: [PATCH v6 14/14] KVM: x86: Add kexec support for SEV Live Migration.
  2020-03-31 14:26       ` Brijesh Singh
@ 2020-04-02 23:34         ` Ashish Kalra
  0 siblings, 0 replies; 107+ messages in thread
From: Ashish Kalra @ 2020-04-02 23:34 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: pbonzini, tglx, mingo, hpa, joro, bp, thomas.lendacky, x86, kvm,
	linux-kernel, rientjes, srutherford, luto

Hello Brijesh,

On Tue, Mar 31, 2020 at 09:26:26AM -0500, Brijesh Singh wrote:
> 
> On 3/30/20 11:45 AM, Ashish Kalra wrote:
> > Hello Brijesh,
> >
> > On Mon, Mar 30, 2020 at 11:00:14AM -0500, Brijesh Singh wrote:
> >> On 3/30/20 1:23 AM, Ashish Kalra wrote:
> >>> From: Ashish Kalra <ashish.kalra@amd.com>
> >>>
> >>> Reset the host's page encryption bitmap related to kernel
> >>> specific page encryption status settings before we load a
> >>> new kernel by kexec. We cannot reset the complete
> >>> page encryption bitmap here as we need to retain the
> >>> UEFI/OVMF firmware specific settings.
> >>>
> >>> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> >>> ---
> >>>  arch/x86/kernel/kvm.c | 28 ++++++++++++++++++++++++++++
> >>>  1 file changed, 28 insertions(+)
> >>>
> >>> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> >>> index 8fcee0b45231..ba6cce3c84af 100644
> >>> --- a/arch/x86/kernel/kvm.c
> >>> +++ b/arch/x86/kernel/kvm.c
> >>> @@ -34,6 +34,7 @@
> >>>  #include <asm/hypervisor.h>
> >>>  #include <asm/tlb.h>
> >>>  #include <asm/cpuidle_haltpoll.h>
> >>> +#include <asm/e820/api.h>
> >>>  
> >>>  static int kvmapf = 1;
> >>>  
> >>> @@ -357,6 +358,33 @@ static void kvm_pv_guest_cpu_reboot(void *unused)
> >>>  	 */
> >>>  	if (kvm_para_has_feature(KVM_FEATURE_PV_EOI))
> >>>  		wrmsrl(MSR_KVM_PV_EOI_EN, 0);
> >>> +	/*
> >>> +	 * Reset the host's page encryption bitmap related to kernel
> >>> +	 * specific page encryption status settings before we load a
> >>> +	 * new kernel by kexec. NOTE: We cannot reset the complete
> >>> +	 * page encryption bitmap here as we need to retain the
> >>> +	 * UEFI/OVMF firmware specific settings.
> >>> +	 */
> >>> +	if (kvm_para_has_feature(KVM_FEATURE_SEV_LIVE_MIGRATION) &&
> >>> +		(smp_processor_id() == 0)) {
> >>
> >> In patch 13/14, the KVM_FEATURE_SEV_LIVE_MIGRATION is set
> >> unconditionally and because of that now the below code will be executed
> >> on non-SEV guest. IMO, this feature must be cleared for non-SEV guest to
> >> avoid making unnecessary hypercall's.
> >>
> >>
> I will additionally add a sev_active() check here to ensure that we don't make unnecessary hypercalls on non-SEV guests.
> 
> 
> IMO, instead of using the sev_active() we should make sure that the
> feature is not enabled when SEV is not active.
> 

Yes, now the KVM_FEATURE_SEV_LIVE_MIGRATION feature is enabled
dynamically in svm_cpuid_update() after it gets called from
svm_launch_finish(), which ensures that it only gets set when an SEV
guest is active.

Thanks,
Ashish

> 
> >>> +		unsigned long nr_pages;
> >>> +		int i;
> >>> +
> >>> +		for (i = 0; i < e820_table->nr_entries; i++) {
> >>> +			struct e820_entry *entry = &e820_table->entries[i];
> >>> +			unsigned long start_pfn, end_pfn;
> >>> +
> >>> +			if (entry->type != E820_TYPE_RAM)
> >>> +				continue;
> >>> +
> >>> +			start_pfn = entry->addr >> PAGE_SHIFT;
> >>> +			end_pfn = (entry->addr + entry->size) >> PAGE_SHIFT;
> >>> +			nr_pages = DIV_ROUND_UP(entry->size, PAGE_SIZE);
> >>> +
> >>> +			kvm_sev_hypercall3(KVM_HC_PAGE_ENC_STATUS,
> >>> +				entry->addr, nr_pages, 1);
> >>> +		}
> >>> +	}
> >>>  	kvm_pv_disable_apf();
> >>>  	kvm_disable_steal_time();
> >>>  }
> > Thanks,
> > Ashish


* Re: [PATCH v6 07/14] KVM: x86: Add AMD SEV specific Hypercall3
  2020-03-30  6:21 ` [PATCH v6 07/14] KVM: x86: Add AMD SEV specific Hypercall3 Ashish Kalra
  2020-04-02 22:36   ` Venu Busireddy
@ 2020-04-02 23:54   ` Krish Sadhukhan
  2020-04-07  1:22     ` Steve Rutherford
  1 sibling, 1 reply; 107+ messages in thread
From: Krish Sadhukhan @ 2020-04-02 23:54 UTC (permalink / raw)
  To: Ashish Kalra, pbonzini
  Cc: tglx, mingo, hpa, joro, bp, thomas.lendacky, x86, kvm,
	linux-kernel, rientjes, srutherford, luto, brijesh.singh


On 3/29/20 11:21 PM, Ashish Kalra wrote:
> From: Brijesh Singh <Brijesh.Singh@amd.com>
>
> KVM hypercall framework relies on alternative framework to patch the
> VMCALL -> VMMCALL on AMD platform. If a hypercall is made before
> apply_alternative()

s/apply_alternative/apply_alternatives/
> is called then it defaults to VMCALL. The approach
> works fine on non SEV guest. A VMCALL would causes #UD, and hypervisor
> will be able to decode the instruction and do the right things. But
> when SEV is active, guest memory is encrypted with guest key and
> hypervisor will not be able to decode the instruction bytes.
>
> Add SEV specific hypercall3, it unconditionally uses VMMCALL. The hypercall
> will be used by the SEV guest to notify encrypted pages to the hypervisor.
>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: "Radim Krčmář" <rkrcmar@redhat.com>
> Cc: Joerg Roedel <joro@8bytes.org>
> Cc: Borislav Petkov <bp@suse.de>
> Cc: Tom Lendacky <thomas.lendacky@amd.com>
> Cc: x86@kernel.org
> Cc: kvm@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> ---
>   arch/x86/include/asm/kvm_para.h | 12 ++++++++++++
>   1 file changed, 12 insertions(+)
>
> diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
> index 9b4df6eaa11a..6c09255633a4 100644
> --- a/arch/x86/include/asm/kvm_para.h
> +++ b/arch/x86/include/asm/kvm_para.h
> @@ -84,6 +84,18 @@ static inline long kvm_hypercall4(unsigned int nr, unsigned long p1,
>   	return ret;
>   }
>   
> +static inline long kvm_sev_hypercall3(unsigned int nr, unsigned long p1,
> +				      unsigned long p2, unsigned long p3)
> +{
> +	long ret;
> +
> +	asm volatile("vmmcall"
> +		     : "=a"(ret)
> +		     : "a"(nr), "b"(p1), "c"(p2), "d"(p3)
> +		     : "memory");
> +	return ret;
> +}
> +
>   #ifdef CONFIG_KVM_GUEST
>   bool kvm_para_available(void);
>   unsigned int kvm_arch_para_features(void);
Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>


* Re: [PATCH v6 08/14] KVM: X86: Introduce KVM_HC_PAGE_ENC_STATUS hypercall
  2020-03-30  6:22 ` [PATCH v6 08/14] KVM: X86: Introduce KVM_HC_PAGE_ENC_STATUS hypercall Ashish Kalra
@ 2020-04-03  0:00   ` Venu Busireddy
  2020-04-03  1:31   ` Krish Sadhukhan
  2020-04-07  2:17   ` Steve Rutherford
  2 siblings, 0 replies; 107+ messages in thread
From: Venu Busireddy @ 2020-04-03  0:00 UTC (permalink / raw)
  To: Ashish Kalra
  Cc: pbonzini, tglx, mingo, hpa, joro, bp, thomas.lendacky, x86, kvm,
	linux-kernel, rientjes, srutherford, luto, brijesh.singh

On 2020-03-30 06:22:07 +0000, Ashish Kalra wrote:
> From: Brijesh Singh <Brijesh.Singh@amd.com>
> 
> This hypercall is used by the SEV guest to notify a change in the page
> encryption status to the hypervisor. The hypercall should be invoked
> only when the encryption attribute is changed from encrypted -> decrypted
> and vice versa. By default all guest pages are considered encrypted.
> 
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: "Radim Krčmář" <rkrcmar@redhat.com>
> Cc: Joerg Roedel <joro@8bytes.org>
> Cc: Borislav Petkov <bp@suse.de>
> Cc: Tom Lendacky <thomas.lendacky@amd.com>
> Cc: x86@kernel.org
> Cc: kvm@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>

Reviewed-by: Venu Busireddy <venu.busireddy@oracle.com>

> ---
>  Documentation/virt/kvm/hypercalls.rst | 15 +++++
>  arch/x86/include/asm/kvm_host.h       |  2 +
>  arch/x86/kvm/svm.c                    | 95 +++++++++++++++++++++++++++
>  arch/x86/kvm/vmx/vmx.c                |  1 +
>  arch/x86/kvm/x86.c                    |  6 ++
>  include/uapi/linux/kvm_para.h         |  1 +
>  6 files changed, 120 insertions(+)
> 
> diff --git a/Documentation/virt/kvm/hypercalls.rst b/Documentation/virt/kvm/hypercalls.rst
> index dbaf207e560d..ff5287e68e81 100644
> --- a/Documentation/virt/kvm/hypercalls.rst
> +++ b/Documentation/virt/kvm/hypercalls.rst
> @@ -169,3 +169,18 @@ a0: destination APIC ID
>  
>  :Usage example: When sending a call-function IPI-many to vCPUs, yield if
>  	        any of the IPI target vCPUs was preempted.
> +
> +
> +8. KVM_HC_PAGE_ENC_STATUS
> +-------------------------
> +:Architecture: x86
> +:Status: active
> +:Purpose: Notify the encryption status changes in guest page table (SEV guest)
> +
> +a0: the guest physical address of the start page
> +a1: the number of pages
> +a2: encryption attribute
> +
> +   Where:
> +	* 1: Encryption attribute is set
> +	* 0: Encryption attribute is cleared
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 98959e8cd448..90718fa3db47 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1267,6 +1267,8 @@ struct kvm_x86_ops {
>  
>  	bool (*apic_init_signal_blocked)(struct kvm_vcpu *vcpu);
>  	int (*enable_direct_tlbflush)(struct kvm_vcpu *vcpu);
> +	int (*page_enc_status_hc)(struct kvm *kvm, unsigned long gpa,
> +				  unsigned long sz, unsigned long mode);
>  };
>  
>  struct kvm_arch_async_pf {
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 7c2721e18b06..1d8beaf1bceb 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -136,6 +136,8 @@ struct kvm_sev_info {
>  	int fd;			/* SEV device fd */
>  	unsigned long pages_locked; /* Number of pages locked */
>  	struct list_head regions_list;  /* List of registered regions */
> +	unsigned long *page_enc_bmap;
> +	unsigned long page_enc_bmap_size;
>  };
>  
>  struct kvm_svm {
> @@ -1991,6 +1993,9 @@ static void sev_vm_destroy(struct kvm *kvm)
>  
>  	sev_unbind_asid(kvm, sev->handle);
>  	sev_asid_free(sev->asid);
> +
> +	kvfree(sev->page_enc_bmap);
> +	sev->page_enc_bmap = NULL;
>  }
>  
>  static void avic_vm_destroy(struct kvm *kvm)
> @@ -7593,6 +7598,94 @@ static int sev_receive_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
>  	return ret;
>  }
>  
> +static int sev_resize_page_enc_bitmap(struct kvm *kvm, unsigned long new_size)
> +{
> +	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> +	unsigned long *map;
> +	unsigned long sz;
> +
> +	if (sev->page_enc_bmap_size >= new_size)
> +		return 0;
> +
> +	sz = ALIGN(new_size, BITS_PER_LONG) / 8;
> +
> +	map = vmalloc(sz);
> +	if (!map) {
> +		pr_err_once("Failed to allocate encrypted bitmap size %lx\n",
> +				sz);
> +		return -ENOMEM;
> +	}
> +
> +	/* mark the page encrypted (by default) */
> +	memset(map, 0xff, sz);
> +
> +	bitmap_copy(map, sev->page_enc_bmap, sev->page_enc_bmap_size);
> +	kvfree(sev->page_enc_bmap);
> +
> +	sev->page_enc_bmap = map;
> +	sev->page_enc_bmap_size = new_size;
> +
> +	return 0;
> +}
> +
> +static int svm_page_enc_status_hc(struct kvm *kvm, unsigned long gpa,
> +				  unsigned long npages, unsigned long enc)
> +{
> +	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> +	kvm_pfn_t pfn_start, pfn_end;
> +	gfn_t gfn_start, gfn_end;
> +	int ret;
> +
> +	if (!sev_guest(kvm))
> +		return -EINVAL;
> +
> +	if (!npages)
> +		return 0;
> +
> +	gfn_start = gpa_to_gfn(gpa);
> +	gfn_end = gfn_start + npages;
> +
> +	/* out of bound access error check */
> +	if (gfn_end <= gfn_start)
> +		return -EINVAL;
> +
> +	/* lets make sure that gpa exist in our memslot */
> +	pfn_start = gfn_to_pfn(kvm, gfn_start);
> +	pfn_end = gfn_to_pfn(kvm, gfn_end);
> +
> +	if (is_error_noslot_pfn(pfn_start) && !is_noslot_pfn(pfn_start)) {
> +		/*
> +		 * Allow guest MMIO range(s) to be added
> +		 * to the page encryption bitmap.
> +		 */
> +		return -EINVAL;
> +	}
> +
> +	if (is_error_noslot_pfn(pfn_end) && !is_noslot_pfn(pfn_end)) {
> +		/*
> +		 * Allow guest MMIO range(s) to be added
> +		 * to the page encryption bitmap.
> +		 */
> +		return -EINVAL;
> +	}
> +
> +	mutex_lock(&kvm->lock);
> +	ret = sev_resize_page_enc_bitmap(kvm, gfn_end);
> +	if (ret)
> +		goto unlock;
> +
> +	if (enc)
> +		__bitmap_set(sev->page_enc_bmap, gfn_start,
> +				gfn_end - gfn_start);
> +	else
> +		__bitmap_clear(sev->page_enc_bmap, gfn_start,
> +				gfn_end - gfn_start);
> +
> +unlock:
> +	mutex_unlock(&kvm->lock);
> +	return ret;
> +}
> +
>  static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
>  {
>  	struct kvm_sev_cmd sev_cmd;
> @@ -7995,6 +8088,8 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
>  	.need_emulation_on_page_fault = svm_need_emulation_on_page_fault,
>  
>  	.apic_init_signal_blocked = svm_apic_init_signal_blocked,
> +
> +	.page_enc_status_hc = svm_page_enc_status_hc,
>  };
>  
>  static int __init svm_init(void)
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 079d9fbf278e..f68e76ee7f9c 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -8001,6 +8001,7 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
>  	.nested_get_evmcs_version = NULL,
>  	.need_emulation_on_page_fault = vmx_need_emulation_on_page_fault,
>  	.apic_init_signal_blocked = vmx_apic_init_signal_blocked,
> +	.page_enc_status_hc = NULL,
>  };
>  
>  static void vmx_cleanup_l1d_flush(void)
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index cf95c36cb4f4..68428eef2dde 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -7564,6 +7564,12 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
>  		kvm_sched_yield(vcpu->kvm, a0);
>  		ret = 0;
>  		break;
> +	case KVM_HC_PAGE_ENC_STATUS:
> +		ret = -KVM_ENOSYS;
> +		if (kvm_x86_ops->page_enc_status_hc)
> +			ret = kvm_x86_ops->page_enc_status_hc(vcpu->kvm,
> +					a0, a1, a2);
> +		break;
>  	default:
>  		ret = -KVM_ENOSYS;
>  		break;
> diff --git a/include/uapi/linux/kvm_para.h b/include/uapi/linux/kvm_para.h
> index 8b86609849b9..847b83b75dc8 100644
> --- a/include/uapi/linux/kvm_para.h
> +++ b/include/uapi/linux/kvm_para.h
> @@ -29,6 +29,7 @@
>  #define KVM_HC_CLOCK_PAIRING		9
>  #define KVM_HC_SEND_IPI		10
>  #define KVM_HC_SCHED_YIELD		11
> +#define KVM_HC_PAGE_ENC_STATUS		12
>  
>  /*
>   * hypercalls use architecture specific
> -- 
> 2.17.1
> 


* Re: [PATCH v6 08/14] KVM: X86: Introduce KVM_HC_PAGE_ENC_STATUS hypercall
  2020-03-30  6:22 ` [PATCH v6 08/14] KVM: X86: Introduce KVM_HC_PAGE_ENC_STATUS hypercall Ashish Kalra
  2020-04-03  0:00   ` Venu Busireddy
@ 2020-04-03  1:31   ` Krish Sadhukhan
  2020-04-03  1:57     ` Ashish Kalra
  2020-04-07  2:17   ` Steve Rutherford
  2 siblings, 1 reply; 107+ messages in thread
From: Krish Sadhukhan @ 2020-04-03  1:31 UTC (permalink / raw)
  To: Ashish Kalra, pbonzini
  Cc: tglx, mingo, hpa, joro, bp, thomas.lendacky, x86, kvm,
	linux-kernel, rientjes, srutherford, luto, brijesh.singh


On 3/29/20 11:22 PM, Ashish Kalra wrote:
> From: Brijesh Singh <Brijesh.Singh@amd.com>
>
> This hypercall is used by the SEV guest to notify a change in the page
> encryption status to the hypervisor. The hypercall should be invoked
> only when the encryption attribute is changed from encrypted -> decrypted
> and vice versa. By default all guest pages are considered encrypted.
>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: "Radim Krčmář" <rkrcmar@redhat.com>
> Cc: Joerg Roedel <joro@8bytes.org>
> Cc: Borislav Petkov <bp@suse.de>
> Cc: Tom Lendacky <thomas.lendacky@amd.com>
> Cc: x86@kernel.org
> Cc: kvm@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> ---
>   Documentation/virt/kvm/hypercalls.rst | 15 +++++
>   arch/x86/include/asm/kvm_host.h       |  2 +
>   arch/x86/kvm/svm.c                    | 95 +++++++++++++++++++++++++++
>   arch/x86/kvm/vmx/vmx.c                |  1 +
>   arch/x86/kvm/x86.c                    |  6 ++
>   include/uapi/linux/kvm_para.h         |  1 +
>   6 files changed, 120 insertions(+)
>
> diff --git a/Documentation/virt/kvm/hypercalls.rst b/Documentation/virt/kvm/hypercalls.rst
> index dbaf207e560d..ff5287e68e81 100644
> --- a/Documentation/virt/kvm/hypercalls.rst
> +++ b/Documentation/virt/kvm/hypercalls.rst
> @@ -169,3 +169,18 @@ a0: destination APIC ID
>   
>   :Usage example: When sending a call-function IPI-many to vCPUs, yield if
>   	        any of the IPI target vCPUs was preempted.
> +
> +
> +8. KVM_HC_PAGE_ENC_STATUS
> +-------------------------
> +:Architecture: x86
> +:Status: active
> +:Purpose: Notify the encryption status changes in guest page table (SEV guest)
> +
> +a0: the guest physical address of the start page
> +a1: the number of pages
> +a2: encryption attribute
> +
> +   Where:
> +	* 1: Encryption attribute is set
> +	* 0: Encryption attribute is cleared
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 98959e8cd448..90718fa3db47 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1267,6 +1267,8 @@ struct kvm_x86_ops {
>   
>   	bool (*apic_init_signal_blocked)(struct kvm_vcpu *vcpu);
>   	int (*enable_direct_tlbflush)(struct kvm_vcpu *vcpu);
> +	int (*page_enc_status_hc)(struct kvm *kvm, unsigned long gpa,
> +				  unsigned long sz, unsigned long mode);
>   };
>   
>   struct kvm_arch_async_pf {
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 7c2721e18b06..1d8beaf1bceb 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -136,6 +136,8 @@ struct kvm_sev_info {
>   	int fd;			/* SEV device fd */
>   	unsigned long pages_locked; /* Number of pages locked */
>   	struct list_head regions_list;  /* List of registered regions */
> +	unsigned long *page_enc_bmap;
> +	unsigned long page_enc_bmap_size;
>   };
>   
>   struct kvm_svm {
> @@ -1991,6 +1993,9 @@ static void sev_vm_destroy(struct kvm *kvm)
>   
>   	sev_unbind_asid(kvm, sev->handle);
>   	sev_asid_free(sev->asid);
> +
> +	kvfree(sev->page_enc_bmap);
> +	sev->page_enc_bmap = NULL;
>   }
>   
>   static void avic_vm_destroy(struct kvm *kvm)
> @@ -7593,6 +7598,94 @@ static int sev_receive_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
>   	return ret;
>   }
>   
> +static int sev_resize_page_enc_bitmap(struct kvm *kvm, unsigned long new_size)
> +{
> +	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> +	unsigned long *map;
> +	unsigned long sz;
> +
> +	if (sev->page_enc_bmap_size >= new_size)
> +		return 0;
> +
> +	sz = ALIGN(new_size, BITS_PER_LONG) / 8;
> +
> +	map = vmalloc(sz);


Just wondering why we can't directly modify sev->page_enc_bmap.

> +	if (!map) {
> +		pr_err_once("Failed to allocate encrypted bitmap size %lx\n",
> +				sz);
> +		return -ENOMEM;
> +	}
> +
> +	/* mark the page encrypted (by default) */
> +	memset(map, 0xff, sz);
> +
> +	bitmap_copy(map, sev->page_enc_bmap, sev->page_enc_bmap_size);
> +	kvfree(sev->page_enc_bmap);
> +
> +	sev->page_enc_bmap = map;
> +	sev->page_enc_bmap_size = new_size;
> +
> +	return 0;
> +}
> +
> +static int svm_page_enc_status_hc(struct kvm *kvm, unsigned long gpa,
> +				  unsigned long npages, unsigned long enc)
> +{
> +	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> +	kvm_pfn_t pfn_start, pfn_end;
> +	gfn_t gfn_start, gfn_end;
> +	int ret;
> +
> +	if (!sev_guest(kvm))
> +		return -EINVAL;
> +
> +	if (!npages)
> +		return 0;
> +
> +	gfn_start = gpa_to_gfn(gpa);
> +	gfn_end = gfn_start + npages;
> +
> +	/* out of bound access error check */
> +	if (gfn_end <= gfn_start)
> +		return -EINVAL;
> +
> +	/* lets make sure that gpa exist in our memslot */
> +	pfn_start = gfn_to_pfn(kvm, gfn_start);
> +	pfn_end = gfn_to_pfn(kvm, gfn_end);
> +
> +	if (is_error_noslot_pfn(pfn_start) && !is_noslot_pfn(pfn_start)) {
> +		/*
> +		 * Allow guest MMIO range(s) to be added
> +		 * to the page encryption bitmap.
> +		 */
> +		return -EINVAL;
> +	}
> +
> +	if (is_error_noslot_pfn(pfn_end) && !is_noslot_pfn(pfn_end)) {
> +		/*
> +		 * Allow guest MMIO range(s) to be added
> +		 * to the page encryption bitmap.
> +		 */
> +		return -EINVAL;
> +	}


It seems is_error_noslot_pfn() covers both cases: i) the gfn slot is
absent, and ii) failure to translate to a pfn. So do we still need
is_noslot_pfn()?

> +
> +	mutex_lock(&kvm->lock);
> +	ret = sev_resize_page_enc_bitmap(kvm, gfn_end);
> +	if (ret)
> +		goto unlock;
> +
> +	if (enc)
> +		__bitmap_set(sev->page_enc_bmap, gfn_start,
> +				gfn_end - gfn_start);
> +	else
> +		__bitmap_clear(sev->page_enc_bmap, gfn_start,
> +				gfn_end - gfn_start);
> +
> +unlock:
> +	mutex_unlock(&kvm->lock);
> +	return ret;
> +}
> +
>   static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
>   {
>   	struct kvm_sev_cmd sev_cmd;
> @@ -7995,6 +8088,8 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
>   	.need_emulation_on_page_fault = svm_need_emulation_on_page_fault,
>   
>   	.apic_init_signal_blocked = svm_apic_init_signal_blocked,
> +
> +	.page_enc_status_hc = svm_page_enc_status_hc,


Why not place it where the other encryption ops are located?

         ...

         .mem_enc_unreg_region

+      .page_enc_status_hc = svm_page_enc_status_hc

>   };
>   
>   static int __init svm_init(void)
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 079d9fbf278e..f68e76ee7f9c 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -8001,6 +8001,7 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
>   	.nested_get_evmcs_version = NULL,
>   	.need_emulation_on_page_fault = vmx_need_emulation_on_page_fault,
>   	.apic_init_signal_blocked = vmx_apic_init_signal_blocked,
> +	.page_enc_status_hc = NULL,
>   };
>   
>   static void vmx_cleanup_l1d_flush(void)
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index cf95c36cb4f4..68428eef2dde 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -7564,6 +7564,12 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
>   		kvm_sched_yield(vcpu->kvm, a0);
>   		ret = 0;
>   		break;
> +	case KVM_HC_PAGE_ENC_STATUS:
> +		ret = -KVM_ENOSYS;
> +		if (kvm_x86_ops->page_enc_status_hc)
> +			ret = kvm_x86_ops->page_enc_status_hc(vcpu->kvm,
> +					a0, a1, a2);
> +		break;
>   	default:
>   		ret = -KVM_ENOSYS;
>   		break;
> diff --git a/include/uapi/linux/kvm_para.h b/include/uapi/linux/kvm_para.h
> index 8b86609849b9..847b83b75dc8 100644
> --- a/include/uapi/linux/kvm_para.h
> +++ b/include/uapi/linux/kvm_para.h
> @@ -29,6 +29,7 @@
>   #define KVM_HC_CLOCK_PAIRING		9
>   #define KVM_HC_SEND_IPI		10
>   #define KVM_HC_SCHED_YIELD		11
> +#define KVM_HC_PAGE_ENC_STATUS		12
>   
>   /*
>    * hypercalls use architecture specific


* Re: [PATCH v6 08/14] KVM: X86: Introduce KVM_HC_PAGE_ENC_STATUS hypercall
  2020-04-03  1:31   ` Krish Sadhukhan
@ 2020-04-03  1:57     ` Ashish Kalra
  2020-04-03  2:58       ` Ashish Kalra
  0 siblings, 1 reply; 107+ messages in thread
From: Ashish Kalra @ 2020-04-03  1:57 UTC (permalink / raw)
  To: Krish Sadhukhan
  Cc: pbonzini, tglx, mingo, hpa, joro, bp, thomas.lendacky, x86, kvm,
	linux-kernel, rientjes, srutherford, luto, brijesh.singh

On Thu, Apr 02, 2020 at 06:31:54PM -0700, Krish Sadhukhan wrote:
> 
> On 3/29/20 11:22 PM, Ashish Kalra wrote:
> > From: Brijesh Singh <Brijesh.Singh@amd.com>
> > 
> > This hypercall is used by the SEV guest to notify a change in the page
> > encryption status to the hypervisor. The hypercall should be invoked
> > only when the encryption attribute is changed from encrypted -> decrypted
> > and vice versa. By default all guest pages are considered encrypted.
> > 
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Ingo Molnar <mingo@redhat.com>
> > Cc: "H. Peter Anvin" <hpa@zytor.com>
> > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > Cc: "Radim Krčmář" <rkrcmar@redhat.com>
> > Cc: Joerg Roedel <joro@8bytes.org>
> > Cc: Borislav Petkov <bp@suse.de>
> > Cc: Tom Lendacky <thomas.lendacky@amd.com>
> > Cc: x86@kernel.org
> > Cc: kvm@vger.kernel.org
> > Cc: linux-kernel@vger.kernel.org
> > Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> > Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> > ---
> >   Documentation/virt/kvm/hypercalls.rst | 15 +++++
> >   arch/x86/include/asm/kvm_host.h       |  2 +
> >   arch/x86/kvm/svm.c                    | 95 +++++++++++++++++++++++++++
> >   arch/x86/kvm/vmx/vmx.c                |  1 +
> >   arch/x86/kvm/x86.c                    |  6 ++
> >   include/uapi/linux/kvm_para.h         |  1 +
> >   6 files changed, 120 insertions(+)
> > 
> > diff --git a/Documentation/virt/kvm/hypercalls.rst b/Documentation/virt/kvm/hypercalls.rst
> > index dbaf207e560d..ff5287e68e81 100644
> > --- a/Documentation/virt/kvm/hypercalls.rst
> > +++ b/Documentation/virt/kvm/hypercalls.rst
> > @@ -169,3 +169,18 @@ a0: destination APIC ID
> >   :Usage example: When sending a call-function IPI-many to vCPUs, yield if
> >   	        any of the IPI target vCPUs was preempted.
> > +
> > +
> > +8. KVM_HC_PAGE_ENC_STATUS
> > +-------------------------
> > +:Architecture: x86
> > +:Status: active
> > +:Purpose: Notify the encryption status changes in guest page table (SEV guest)
> > +
> > +a0: the guest physical address of the start page
> > +a1: the number of pages
> > +a2: encryption attribute
> > +
> > +   Where:
> > +	* 1: Encryption attribute is set
> > +	* 0: Encryption attribute is cleared
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index 98959e8cd448..90718fa3db47 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -1267,6 +1267,8 @@ struct kvm_x86_ops {
> >   	bool (*apic_init_signal_blocked)(struct kvm_vcpu *vcpu);
> >   	int (*enable_direct_tlbflush)(struct kvm_vcpu *vcpu);
> > +	int (*page_enc_status_hc)(struct kvm *kvm, unsigned long gpa,
> > +				  unsigned long sz, unsigned long mode);
> >   };
> >   struct kvm_arch_async_pf {
> > diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> > index 7c2721e18b06..1d8beaf1bceb 100644
> > --- a/arch/x86/kvm/svm.c
> > +++ b/arch/x86/kvm/svm.c
> > @@ -136,6 +136,8 @@ struct kvm_sev_info {
> >   	int fd;			/* SEV device fd */
> >   	unsigned long pages_locked; /* Number of pages locked */
> >   	struct list_head regions_list;  /* List of registered regions */
> > +	unsigned long *page_enc_bmap;
> > +	unsigned long page_enc_bmap_size;
> >   };
> >   struct kvm_svm {
> > @@ -1991,6 +1993,9 @@ static void sev_vm_destroy(struct kvm *kvm)
> >   	sev_unbind_asid(kvm, sev->handle);
> >   	sev_asid_free(sev->asid);
> > +
> > +	kvfree(sev->page_enc_bmap);
> > +	sev->page_enc_bmap = NULL;
> >   }
> >   static void avic_vm_destroy(struct kvm *kvm)
> > @@ -7593,6 +7598,94 @@ static int sev_receive_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
> >   	return ret;
> >   }
> > +static int sev_resize_page_enc_bitmap(struct kvm *kvm, unsigned long new_size)
> > +{
> > +	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> > +	unsigned long *map;
> > +	unsigned long sz;
> > +
> > +	if (sev->page_enc_bmap_size >= new_size)
> > +		return 0;
> > +
> > +	sz = ALIGN(new_size, BITS_PER_LONG) / 8;
> > +
> > +	map = vmalloc(sz);
> 
> 
> Just wondering why we can't directly modify sev->page_enc_bmap.
> 

Because the page_enc_bmap needs to be resized (expanded) here: there is
no way to grow the existing allocation in place, so a larger bitmap is
allocated and the old contents are copied into it.

> > +	if (!map) {
> > +		pr_err_once("Failed to allocate encrypted bitmap size %lx\n",
> > +				sz);
> > +		return -ENOMEM;
> > +	}
> > +
> > +	/* mark the page encrypted (by default) */
> > +	memset(map, 0xff, sz);
> > +
> > +	bitmap_copy(map, sev->page_enc_bmap, sev->page_enc_bmap_size);
> > +	kvfree(sev->page_enc_bmap);
> > +
> > +	sev->page_enc_bmap = map;
> > +	sev->page_enc_bmap_size = new_size;
> > +
> > +	return 0;
> > +}
> > +
> > +static int svm_page_enc_status_hc(struct kvm *kvm, unsigned long gpa,
> > +				  unsigned long npages, unsigned long enc)
> > +{
> > +	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> > +	kvm_pfn_t pfn_start, pfn_end;
> > +	gfn_t gfn_start, gfn_end;
> > +	int ret;
> > +
> > +	if (!sev_guest(kvm))
> > +		return -EINVAL;
> > +
> > +	if (!npages)
> > +		return 0;
> > +
> > +	gfn_start = gpa_to_gfn(gpa);
> > +	gfn_end = gfn_start + npages;
> > +
> > +	/* out of bound access error check */
> > +	if (gfn_end <= gfn_start)
> > +		return -EINVAL;
> > +
> > +	/* lets make sure that gpa exist in our memslot */
> > +	pfn_start = gfn_to_pfn(kvm, gfn_start);
> > +	pfn_end = gfn_to_pfn(kvm, gfn_end);
> > +
> > +	if (is_error_noslot_pfn(pfn_start) && !is_noslot_pfn(pfn_start)) {
> > +		/*
> > +		 * Allow guest MMIO range(s) to be added
> > +		 * to the page encryption bitmap.
> > +		 */
> > +		return -EINVAL;
> > +	}
> > +
> > +	if (is_error_noslot_pfn(pfn_end) && !is_noslot_pfn(pfn_end)) {
> > +		/*
> > +		 * Allow guest MMIO range(s) to be added
> > +		 * to the page encryption bitmap.
> > +		 */
> > +		return -EINVAL;
> > +	}
> 
> 
> It seems is_error_noslot_pfn() covers both cases - i) gfn slot is absent,
> ii) failure to translate to pfn. So do we still need is_noslot_pfn() ?
>

We do additionally need the !is_noslot_pfn(..) check: is_error_noslot_pfn()
is true for both hard errors and the no-slot case, and MMIO ranges will not
have a memslot allocated, so the extra check lets them through.

Thanks,
Ashish
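The distinction between the two predicates can be modeled in standalone C. The constants below are illustrative, not the kernel's real KVM_PFN_ERR_* encoding; the point is only the guard logic: is_error_noslot_pfn() covers both hard errors and the no-slot case, and the extra !is_noslot_pfn() check lets no-slot (MMIO) pfns through while still rejecting genuine translation errors.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t kvm_pfn_t;

/* Illustrative encoding: anything at or above ERR_BASE is "error or
 * no-slot"; NOSLOT is one specific value in that range. The real
 * kernel uses KVM_PFN_ERR_* bit masks instead. */
#define PFN_ERR_BASE ((kvm_pfn_t)1 << 62)
#define PFN_NOSLOT   (PFN_ERR_BASE | 1)

static bool is_error_noslot_pfn(kvm_pfn_t pfn)
{
	return pfn >= PFN_ERR_BASE; /* hard error OR no memslot */
}

static bool is_noslot_pfn(kvm_pfn_t pfn)
{
	return pfn == PFN_NOSLOT; /* no memslot only (e.g. MMIO) */
}

/* The guard used in svm_page_enc_status_hc(): reject real translation
 * errors, but allow no-slot (MMIO) ranges into the bitmap. */
static bool reject_gpa(kvm_pfn_t pfn)
{
	return is_error_noslot_pfn(pfn) && !is_noslot_pfn(pfn);
}
```

With is_error_noslot_pfn() alone, MMIO ranges would be rejected; the combined condition is what allows them to be recorded in the page encryption bitmap.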

> > +
> > +	mutex_lock(&kvm->lock);
> > +	ret = sev_resize_page_enc_bitmap(kvm, gfn_end);
> > +	if (ret)
> > +		goto unlock;
> > +
> > +	if (enc)
> > +		__bitmap_set(sev->page_enc_bmap, gfn_start,
> > +				gfn_end - gfn_start);
> > +	else
> > +		__bitmap_clear(sev->page_enc_bmap, gfn_start,
> > +				gfn_end - gfn_start);
> > +
> > +unlock:
> > +	mutex_unlock(&kvm->lock);
> > +	return ret;
> > +}
> > +
> >   static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
> >   {
> >   	struct kvm_sev_cmd sev_cmd;
> > @@ -7995,6 +8088,8 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
> >   	.need_emulation_on_page_fault = svm_need_emulation_on_page_fault,
> >   	.apic_init_signal_blocked = svm_apic_init_signal_blocked,
> > +
> > +	.page_enc_status_hc = svm_page_enc_status_hc,
> 
> 
> Why not place it where other encryption ops are located ?
> 
>         ...
> 
>         .mem_enc_unreg_region
> 
> +      .page_enc_status_hc = svm_page_enc_status_hc
> 
> >   };
> >   static int __init svm_init(void)
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > index 079d9fbf278e..f68e76ee7f9c 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -8001,6 +8001,7 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
> >   	.nested_get_evmcs_version = NULL,
> >   	.need_emulation_on_page_fault = vmx_need_emulation_on_page_fault,
> >   	.apic_init_signal_blocked = vmx_apic_init_signal_blocked,
> > +	.page_enc_status_hc = NULL,
> >   };
> >   static void vmx_cleanup_l1d_flush(void)
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index cf95c36cb4f4..68428eef2dde 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -7564,6 +7564,12 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
> >   		kvm_sched_yield(vcpu->kvm, a0);
> >   		ret = 0;
> >   		break;
> > +	case KVM_HC_PAGE_ENC_STATUS:
> > +		ret = -KVM_ENOSYS;
> > +		if (kvm_x86_ops->page_enc_status_hc)
> > +			ret = kvm_x86_ops->page_enc_status_hc(vcpu->kvm,
> > +					a0, a1, a2);
> > +		break;
> >   	default:
> >   		ret = -KVM_ENOSYS;
> >   		break;
> > diff --git a/include/uapi/linux/kvm_para.h b/include/uapi/linux/kvm_para.h
> > index 8b86609849b9..847b83b75dc8 100644
> > --- a/include/uapi/linux/kvm_para.h
> > +++ b/include/uapi/linux/kvm_para.h
> > @@ -29,6 +29,7 @@
> >   #define KVM_HC_CLOCK_PAIRING		9
> >   #define KVM_HC_SEND_IPI		10
> >   #define KVM_HC_SCHED_YIELD		11
> > +#define KVM_HC_PAGE_ENC_STATUS		12
> >   /*
> >    * hypercalls use architecture specific

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v6 08/14] KVM: X86: Introduce KVM_HC_PAGE_ENC_STATUS hypercall
  2020-04-03  1:57     ` Ashish Kalra
@ 2020-04-03  2:58       ` Ashish Kalra
  2020-04-06 22:27         ` Krish Sadhukhan
  0 siblings, 1 reply; 107+ messages in thread
From: Ashish Kalra @ 2020-04-03  2:58 UTC (permalink / raw)
  To: Krish Sadhukhan
  Cc: pbonzini, tglx, mingo, hpa, joro, bp, thomas.lendacky, x86, kvm,
	linux-kernel, rientjes, srutherford, luto, brijesh.singh

On Fri, Apr 03, 2020 at 01:57:48AM +0000, Ashish Kalra wrote:
> On Thu, Apr 02, 2020 at 06:31:54PM -0700, Krish Sadhukhan wrote:
> > 
> > On 3/29/20 11:22 PM, Ashish Kalra wrote:
> > > From: Brijesh Singh <Brijesh.Singh@amd.com>
> > > 
> > > This hypercall is used by the SEV guest to notify a change in the page
> > > encryption status to the hypervisor. The hypercall should be invoked
> > > only when the encryption attribute is changed from encrypted -> decrypted
> > > and vice versa. By default all guest pages are considered encrypted.
> > > 
> > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > Cc: Ingo Molnar <mingo@redhat.com>
> > > Cc: "H. Peter Anvin" <hpa@zytor.com>
> > > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > > Cc: "Radim Krčmář" <rkrcmar@redhat.com>
> > > Cc: Joerg Roedel <joro@8bytes.org>
> > > Cc: Borislav Petkov <bp@suse.de>
> > > Cc: Tom Lendacky <thomas.lendacky@amd.com>
> > > Cc: x86@kernel.org
> > > Cc: kvm@vger.kernel.org
> > > Cc: linux-kernel@vger.kernel.org
> > > Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> > > Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> > > ---
> > >   Documentation/virt/kvm/hypercalls.rst | 15 +++++
> > >   arch/x86/include/asm/kvm_host.h       |  2 +
> > >   arch/x86/kvm/svm.c                    | 95 +++++++++++++++++++++++++++
> > >   arch/x86/kvm/vmx/vmx.c                |  1 +
> > >   arch/x86/kvm/x86.c                    |  6 ++
> > >   include/uapi/linux/kvm_para.h         |  1 +
> > >   6 files changed, 120 insertions(+)
> > > 
> > > diff --git a/Documentation/virt/kvm/hypercalls.rst b/Documentation/virt/kvm/hypercalls.rst
> > > index dbaf207e560d..ff5287e68e81 100644
> > > --- a/Documentation/virt/kvm/hypercalls.rst
> > > +++ b/Documentation/virt/kvm/hypercalls.rst
> > > @@ -169,3 +169,18 @@ a0: destination APIC ID
> > >   :Usage example: When sending a call-function IPI-many to vCPUs, yield if
> > >   	        any of the IPI target vCPUs was preempted.
> > > +
> > > +
> > > +8. KVM_HC_PAGE_ENC_STATUS
> > > +-------------------------
> > > +:Architecture: x86
> > > +:Status: active
> > > +:Purpose: Notify the encryption status changes in guest page table (SEV guest)
> > > +
> > > +a0: the guest physical address of the start page
> > > +a1: the number of pages
> > > +a2: encryption attribute
> > > +
> > > +   Where:
> > > +	* 1: Encryption attribute is set
> > > +	* 0: Encryption attribute is cleared
> > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > > index 98959e8cd448..90718fa3db47 100644
> > > --- a/arch/x86/include/asm/kvm_host.h
> > > +++ b/arch/x86/include/asm/kvm_host.h
> > > @@ -1267,6 +1267,8 @@ struct kvm_x86_ops {
> > >   	bool (*apic_init_signal_blocked)(struct kvm_vcpu *vcpu);
> > >   	int (*enable_direct_tlbflush)(struct kvm_vcpu *vcpu);
> > > +	int (*page_enc_status_hc)(struct kvm *kvm, unsigned long gpa,
> > > +				  unsigned long sz, unsigned long mode);
> > >   };
> > >   struct kvm_arch_async_pf {
> > > diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> > > index 7c2721e18b06..1d8beaf1bceb 100644
> > > --- a/arch/x86/kvm/svm.c
> > > +++ b/arch/x86/kvm/svm.c
> > > @@ -136,6 +136,8 @@ struct kvm_sev_info {
> > >   	int fd;			/* SEV device fd */
> > >   	unsigned long pages_locked; /* Number of pages locked */
> > >   	struct list_head regions_list;  /* List of registered regions */
> > > +	unsigned long *page_enc_bmap;
> > > +	unsigned long page_enc_bmap_size;
> > >   };
> > >   struct kvm_svm {
> > > @@ -1991,6 +1993,9 @@ static void sev_vm_destroy(struct kvm *kvm)
> > >   	sev_unbind_asid(kvm, sev->handle);
> > >   	sev_asid_free(sev->asid);
> > > +
> > > +	kvfree(sev->page_enc_bmap);
> > > +	sev->page_enc_bmap = NULL;
> > >   }
> > >   static void avic_vm_destroy(struct kvm *kvm)
> > > @@ -7593,6 +7598,94 @@ static int sev_receive_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
> > >   	return ret;
> > >   }
> > > +static int sev_resize_page_enc_bitmap(struct kvm *kvm, unsigned long new_size)
> > > +{
> > > +	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> > > +	unsigned long *map;
> > > +	unsigned long sz;
> > > +
> > > +	if (sev->page_enc_bmap_size >= new_size)
> > > +		return 0;
> > > +
> > > +	sz = ALIGN(new_size, BITS_PER_LONG) / 8;
> > > +
> > > +	map = vmalloc(sz);
> > 
> > 
> > Just wondering why we can't directly modify sev->page_enc_bmap.
> > 
> 
> Because the page_enc_bmap needs to be resized (expanded) here: there is
> no way to grow the existing allocation in place, so a larger bitmap is
> allocated and the old contents are copied into it.
> 

I don't believe there is a realloc()-like equivalent for the kmalloc()
interfaces.

Thanks,
Ashish
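The allocate-copy-free flow of sev_resize_page_enc_bitmap() can be sketched as plain userspace C (malloc/free standing in for vmalloc/kvfree; function names and the error convention are mine): new bits default to 1 ("encrypted"), and existing contents are preserved across the grow.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define BITS_PER_LONG (8 * sizeof(unsigned long))

static size_t bitmap_bytes(size_t nbits)
{
	/* round up to whole longs, as ALIGN(new_size, BITS_PER_LONG) / 8 does */
	return ((nbits + BITS_PER_LONG - 1) / BITS_PER_LONG) *
	       sizeof(unsigned long);
}

/* Grow *map to cover new_bits bits; no-op if it is already big enough. */
static int resize_enc_bitmap(unsigned long **map, size_t *nbits,
			     size_t new_bits)
{
	unsigned long *n;

	if (*nbits >= new_bits)
		return 0;

	n = malloc(bitmap_bytes(new_bits));
	if (!n)
		return -1;

	memset(n, 0xff, bitmap_bytes(new_bits)); /* default: encrypted */
	if (*map) {
		memcpy(n, *map, bitmap_bytes(*nbits)); /* keep old status */
		free(*map);
	}
	*map = n;
	*nbits = new_bits;
	return 0;
}
```

The new buffer is fully initialized to 0xff before the copy, so any pages beyond the old size start out marked encrypted, matching the hypercall's "encrypted by default" assumption.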

> > > +	if (!map) {
> > > +		pr_err_once("Failed to allocate encrypted bitmap size %lx\n",
> > > +				sz);
> > > +		return -ENOMEM;
> > > +	}
> > > +
> > > +	/* mark the page encrypted (by default) */
> > > +	memset(map, 0xff, sz);
> > > +
> > > +	bitmap_copy(map, sev->page_enc_bmap, sev->page_enc_bmap_size);
> > > +	kvfree(sev->page_enc_bmap);
> > > +
> > > +	sev->page_enc_bmap = map;
> > > +	sev->page_enc_bmap_size = new_size;
> > > +
> > > +	return 0;
> > > +}
> > > +
> > > +static int svm_page_enc_status_hc(struct kvm *kvm, unsigned long gpa,
> > > +				  unsigned long npages, unsigned long enc)
> > > +{
> > > +	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> > > +	kvm_pfn_t pfn_start, pfn_end;
> > > +	gfn_t gfn_start, gfn_end;
> > > +	int ret;
> > > +
> > > +	if (!sev_guest(kvm))
> > > +		return -EINVAL;
> > > +
> > > +	if (!npages)
> > > +		return 0;
> > > +
> > > +	gfn_start = gpa_to_gfn(gpa);
> > > +	gfn_end = gfn_start + npages;
> > > +
> > > +	/* out of bound access error check */
> > > +	if (gfn_end <= gfn_start)
> > > +		return -EINVAL;
> > > +
> > > +	/* lets make sure that gpa exist in our memslot */
> > > +	pfn_start = gfn_to_pfn(kvm, gfn_start);
> > > +	pfn_end = gfn_to_pfn(kvm, gfn_end);
> > > +
> > > +	if (is_error_noslot_pfn(pfn_start) && !is_noslot_pfn(pfn_start)) {
> > > +		/*
> > > +		 * Allow guest MMIO range(s) to be added
> > > +		 * to the page encryption bitmap.
> > > +		 */
> > > +		return -EINVAL;
> > > +	}
> > > +
> > > +	if (is_error_noslot_pfn(pfn_end) && !is_noslot_pfn(pfn_end)) {
> > > +		/*
> > > +		 * Allow guest MMIO range(s) to be added
> > > +		 * to the page encryption bitmap.
> > > +		 */
> > > +		return -EINVAL;
> > > +	}
> > 
> > 
> > It seems is_error_noslot_pfn() covers both cases - i) gfn slot is absent,
> > ii) failure to translate to pfn. So do we still need is_noslot_pfn() ?
> >
> 
> We do additionally need the !is_noslot_pfn(..) check: is_error_noslot_pfn()
> is true for both hard errors and the no-slot case, and MMIO ranges will not
> have a memslot allocated, so the extra check lets them through.
> 
> Thanks,
> Ashish
> 
> > > +
> > > +	mutex_lock(&kvm->lock);
> > > +	ret = sev_resize_page_enc_bitmap(kvm, gfn_end);
> > > +	if (ret)
> > > +		goto unlock;
> > > +
> > > +	if (enc)
> > > +		__bitmap_set(sev->page_enc_bmap, gfn_start,
> > > +				gfn_end - gfn_start);
> > > +	else
> > > +		__bitmap_clear(sev->page_enc_bmap, gfn_start,
> > > +				gfn_end - gfn_start);
> > > +
> > > +unlock:
> > > +	mutex_unlock(&kvm->lock);
> > > +	return ret;
> > > +}
> > > +
> > >   static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
> > >   {
> > >   	struct kvm_sev_cmd sev_cmd;
> > > @@ -7995,6 +8088,8 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
> > >   	.need_emulation_on_page_fault = svm_need_emulation_on_page_fault,
> > >   	.apic_init_signal_blocked = svm_apic_init_signal_blocked,
> > > +
> > > +	.page_enc_status_hc = svm_page_enc_status_hc,
> > 
> > 
> > Why not place it where other encryption ops are located ?
> > 
> >         ...
> > 
> >         .mem_enc_unreg_region
> > 
> > +      .page_enc_status_hc = svm_page_enc_status_hc
> > 
> > >   };
> > >   static int __init svm_init(void)
> > > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > > index 079d9fbf278e..f68e76ee7f9c 100644
> > > --- a/arch/x86/kvm/vmx/vmx.c
> > > +++ b/arch/x86/kvm/vmx/vmx.c
> > > @@ -8001,6 +8001,7 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
> > >   	.nested_get_evmcs_version = NULL,
> > >   	.need_emulation_on_page_fault = vmx_need_emulation_on_page_fault,
> > >   	.apic_init_signal_blocked = vmx_apic_init_signal_blocked,
> > > +	.page_enc_status_hc = NULL,
> > >   };
> > >   static void vmx_cleanup_l1d_flush(void)
> > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > index cf95c36cb4f4..68428eef2dde 100644
> > > --- a/arch/x86/kvm/x86.c
> > > +++ b/arch/x86/kvm/x86.c
> > > @@ -7564,6 +7564,12 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
> > >   		kvm_sched_yield(vcpu->kvm, a0);
> > >   		ret = 0;
> > >   		break;
> > > +	case KVM_HC_PAGE_ENC_STATUS:
> > > +		ret = -KVM_ENOSYS;
> > > +		if (kvm_x86_ops->page_enc_status_hc)
> > > +			ret = kvm_x86_ops->page_enc_status_hc(vcpu->kvm,
> > > +					a0, a1, a2);
> > > +		break;
> > >   	default:
> > >   		ret = -KVM_ENOSYS;
> > >   		break;
> > > diff --git a/include/uapi/linux/kvm_para.h b/include/uapi/linux/kvm_para.h
> > > index 8b86609849b9..847b83b75dc8 100644
> > > --- a/include/uapi/linux/kvm_para.h
> > > +++ b/include/uapi/linux/kvm_para.h
> > > @@ -29,6 +29,7 @@
> > >   #define KVM_HC_CLOCK_PAIRING		9
> > >   #define KVM_HC_SEND_IPI		10
> > >   #define KVM_HC_SCHED_YIELD		11
> > > +#define KVM_HC_PAGE_ENC_STATUS		12
> > >   /*
> > >    * hypercalls use architecture specific

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v6 14/14] KVM: x86: Add kexec support for SEV Live Migration.
  2020-03-30  6:23 ` [PATCH v6 14/14] KVM: x86: Add kexec support for SEV Live Migration Ashish Kalra
  2020-03-30 16:00   ` Brijesh Singh
@ 2020-04-03 12:57   ` Dave Young
  2020-04-04  0:55   ` Krish Sadhukhan
  2 siblings, 0 replies; 107+ messages in thread
From: Dave Young @ 2020-04-03 12:57 UTC (permalink / raw)
  To: Ashish Kalra
  Cc: pbonzini, tglx, mingo, hpa, joro, bp, thomas.lendacky, x86, kvm,
	linux-kernel, rientjes, srutherford, luto, brijesh.singh, kexec,
	lijiang, bhe

Ccing kexec list.

Ashish, could you cc kexec list if you repost later?
On 03/30/20 at 06:23am, Ashish Kalra wrote:
> From: Ashish Kalra <ashish.kalra@amd.com>
> 
> Reset the host's page encryption bitmap related to kernel
> specific page encryption status settings before we load a
> new kernel by kexec. We cannot reset the complete
> page encryption bitmap here as we need to retain the
> UEFI/OVMF firmware specific settings.
> 
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> ---
>  arch/x86/kernel/kvm.c | 28 ++++++++++++++++++++++++++++
>  1 file changed, 28 insertions(+)
> 
> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> index 8fcee0b45231..ba6cce3c84af 100644
> --- a/arch/x86/kernel/kvm.c
> +++ b/arch/x86/kernel/kvm.c
> @@ -34,6 +34,7 @@
>  #include <asm/hypervisor.h>
>  #include <asm/tlb.h>
>  #include <asm/cpuidle_haltpoll.h>
> +#include <asm/e820/api.h>
>  
>  static int kvmapf = 1;
>  
> @@ -357,6 +358,33 @@ static void kvm_pv_guest_cpu_reboot(void *unused)
>  	 */
>  	if (kvm_para_has_feature(KVM_FEATURE_PV_EOI))
>  		wrmsrl(MSR_KVM_PV_EOI_EN, 0);
> +	/*
> +	 * Reset the host's page encryption bitmap related to kernel
> +	 * specific page encryption status settings before we load a
> +	 * new kernel by kexec. NOTE: We cannot reset the complete
> +	 * page encryption bitmap here as we need to retain the
> +	 * UEFI/OVMF firmware specific settings.
> +	 */
> +	if (kvm_para_has_feature(KVM_FEATURE_SEV_LIVE_MIGRATION) &&
> +		(smp_processor_id() == 0)) {
> +		unsigned long nr_pages;
> +		int i;
> +
> +		for (i = 0; i < e820_table->nr_entries; i++) {
> +			struct e820_entry *entry = &e820_table->entries[i];
> +			unsigned long start_pfn, end_pfn;
> +
> +			if (entry->type != E820_TYPE_RAM)
> +				continue;
> +
> +			start_pfn = entry->addr >> PAGE_SHIFT;
> +			end_pfn = (entry->addr + entry->size) >> PAGE_SHIFT;
> +			nr_pages = DIV_ROUND_UP(entry->size, PAGE_SIZE);
> +
> +			kvm_sev_hypercall3(KVM_HC_PAGE_ENC_STATUS,
> +				entry->addr, nr_pages, 1);
> +		}
> +	}
>  	kvm_pv_disable_apf();
>  	kvm_disable_steal_time();
>  }
> -- 
> 2.17.1
> 


^ permalink raw reply	[flat|nested] 107+ messages in thread
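The reboot-hook loop in the kexec patch above walks the e820 table and, for each E820_TYPE_RAM entry, passes a page count computed with DIV_ROUND_UP so a partial trailing page is still reset. A standalone model of that per-entry arithmetic (struct layout and values are illustrative, not the kernel's e820 definitions):

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

struct e820_like {
	uint64_t addr;
	uint64_t size;
	int is_ram; /* stand-in for entry->type == E820_TYPE_RAM */
};

/* Number of pages the KVM_HC_PAGE_ENC_STATUS hypercall would be asked
 * to mark encrypted for one entry: DIV_ROUND_UP(size, PAGE_SIZE). */
static uint64_t entry_nr_pages(const struct e820_like *e)
{
	return (e->size + PAGE_SIZE - 1) / PAGE_SIZE;
}

/* Sum over RAM entries only, mirroring the E820_TYPE_RAM filter. */
static uint64_t total_reset_pages(const struct e820_like *tbl, int n)
{
	uint64_t total = 0;
	int i;

	for (i = 0; i < n; i++)
		if (tbl[i].is_ram)
			total += entry_nr_pages(&tbl[i]);
	return total;
}
```

Note that a 0x1001-byte RAM entry counts as two pages under DIV_ROUND_UP, whereas shifting the size right by PAGE_SHIFT would have dropped the trailing partial page.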

* Re: [PATCH v6 09/14] KVM: x86: Introduce KVM_GET_PAGE_ENC_BITMAP ioctl
  2020-03-30  6:22 ` [PATCH v6 09/14] KVM: x86: Introduce KVM_GET_PAGE_ENC_BITMAP ioctl Ashish Kalra
@ 2020-04-03 18:30   ` Venu Busireddy
  2020-04-03 20:18   ` Krish Sadhukhan
  1 sibling, 0 replies; 107+ messages in thread
From: Venu Busireddy @ 2020-04-03 18:30 UTC (permalink / raw)
  To: Ashish Kalra
  Cc: pbonzini, tglx, mingo, hpa, joro, bp, thomas.lendacky, x86, kvm,
	linux-kernel, rientjes, srutherford, luto, brijesh.singh

On 2020-03-30 06:22:23 +0000, Ashish Kalra wrote:
> From: Brijesh Singh <Brijesh.Singh@amd.com>
> 
> The ioctl can be used to retrieve page encryption bitmap for a given
> gfn range.
> 
> Return the correct bitmap as per the number of pages being requested
> by the user. Ensure that we only copy bmap->num_pages bytes in the
> userspace buffer, if bmap->num_pages is not byte aligned we read
> the trailing bits from the userspace and copy those bits as is.
> 
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: "Radim Krčmář" <rkrcmar@redhat.com>
> Cc: Joerg Roedel <joro@8bytes.org>
> Cc: Borislav Petkov <bp@suse.de>
> Cc: Tom Lendacky <thomas.lendacky@amd.com>
> Cc: x86@kernel.org
> Cc: kvm@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>

With the suggestions below...

Reviewed-by: Venu Busireddy <venu.busireddy@oracle.com>

> ---
>  Documentation/virt/kvm/api.rst  | 27 +++++++++++++
>  arch/x86/include/asm/kvm_host.h |  2 +
>  arch/x86/kvm/svm.c              | 71 +++++++++++++++++++++++++++++++++
>  arch/x86/kvm/x86.c              | 12 ++++++
>  include/uapi/linux/kvm.h        | 12 ++++++
>  5 files changed, 124 insertions(+)
> 
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index ebd383fba939..8ad800ebb54f 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -4648,6 +4648,33 @@ This ioctl resets VCPU registers and control structures according to
>  the clear cpu reset definition in the POP. However, the cpu is not put
>  into ESA mode. This reset is a superset of the initial reset.
>  
> +4.125 KVM_GET_PAGE_ENC_BITMAP (vm ioctl)
> +---------------------------------------
> +
> +:Capability: basic
> +:Architectures: x86
> +:Type: vm ioctl
> +:Parameters: struct kvm_page_enc_bitmap (in/out)
> +:Returns: 0 on success, -1 on error
> +
> +/* for KVM_GET_PAGE_ENC_BITMAP */
> +struct kvm_page_enc_bitmap {
> +	__u64 start_gfn;
> +	__u64 num_pages;
> +	union {
> +		void __user *enc_bitmap; /* one bit per page */
> +		__u64 padding2;
> +	};
> +};
> +
> +The encrypted VMs have concept of private and shared pages. The private
s/have concept/have the concept/
> +page is encrypted with the guest-specific key, while shared page may
s/page is/pages are/
s/shared page/the shared pages/
> +be encrypted with the hypervisor key. The KVM_GET_PAGE_ENC_BITMAP can
> +be used to get the bitmap indicating whether the guest page is private
> +or shared. The bitmap can be used during the guest migration, if the page
s/, if/. If/
> +is private then userspace need to use SEV migration commands to transmit
s/then userspace need/then the userspace needs/
> +the page.
> +
>  
>  5. The kvm_run structure
>  ========================
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 90718fa3db47..27e43e3ec9d8 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1269,6 +1269,8 @@ struct kvm_x86_ops {
>  	int (*enable_direct_tlbflush)(struct kvm_vcpu *vcpu);
>  	int (*page_enc_status_hc)(struct kvm *kvm, unsigned long gpa,
>  				  unsigned long sz, unsigned long mode);
> +	int (*get_page_enc_bitmap)(struct kvm *kvm,
> +				struct kvm_page_enc_bitmap *bmap);
>  };
>  
>  struct kvm_arch_async_pf {
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 1d8beaf1bceb..bae783cd396a 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -7686,6 +7686,76 @@ static int svm_page_enc_status_hc(struct kvm *kvm, unsigned long gpa,
>  	return ret;
>  }
>  
> +static int svm_get_page_enc_bitmap(struct kvm *kvm,
> +				   struct kvm_page_enc_bitmap *bmap)
> +{
> +	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> +	unsigned long gfn_start, gfn_end;
> +	unsigned long sz, i, sz_bytes;
> +	unsigned long *bitmap;
> +	int ret, n;
> +
> +	if (!sev_guest(kvm))
> +		return -ENOTTY;
> +
> +	gfn_start = bmap->start_gfn;
> +	gfn_end = gfn_start + bmap->num_pages;
> +
> +	sz = ALIGN(bmap->num_pages, BITS_PER_LONG) / BITS_PER_BYTE;
> +	bitmap = kmalloc(sz, GFP_KERNEL);
> +	if (!bitmap)
> +		return -ENOMEM;
> +
> +	/* by default all pages are marked encrypted */
> +	memset(bitmap, 0xff, sz);
> +
> +	mutex_lock(&kvm->lock);
> +	if (sev->page_enc_bmap) {
> +		i = gfn_start;
> +		for_each_clear_bit_from(i, sev->page_enc_bmap,
> +				      min(sev->page_enc_bmap_size, gfn_end))
> +			clear_bit(i - gfn_start, bitmap);
> +	}
> +	mutex_unlock(&kvm->lock);
> +
> +	ret = -EFAULT;
> +
> +	n = bmap->num_pages % BITS_PER_BYTE;
> +	sz_bytes = ALIGN(bmap->num_pages, BITS_PER_BYTE) / BITS_PER_BYTE;
> +
> +	/*
> +	 * Return the correct bitmap as per the number of pages being
> +	 * requested by the user. Ensure that we only copy bmap->num_pages
> +	 * bytes in the userspace buffer, if bmap->num_pages is not byte
> +	 * aligned we read the trailing bits from the userspace and copy
> +	 * those bits as is.
> +	 */
> +
> +	if (n) {
> +		unsigned char *bitmap_kernel = (unsigned char *)bitmap;
> +		unsigned char bitmap_user;
> +		unsigned long offset, mask;
> +
> +		offset = bmap->num_pages / BITS_PER_BYTE;
> +		if (copy_from_user(&bitmap_user, bmap->enc_bitmap + offset,
> +				sizeof(unsigned char)))
> +			goto out;
> +
> +		mask = GENMASK(n - 1, 0);
> +		bitmap_user &= ~mask;
> +		bitmap_kernel[offset] &= mask;
> +		bitmap_kernel[offset] |= bitmap_user;
> +	}
> +
> +	if (copy_to_user(bmap->enc_bitmap, bitmap, sz_bytes))
> +		goto out;
> +
> +	ret = 0;
> +out:
> +	kfree(bitmap);
> +	return ret;
> +}
> +
>  static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
>  {
>  	struct kvm_sev_cmd sev_cmd;
> @@ -8090,6 +8160,7 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
>  	.apic_init_signal_blocked = svm_apic_init_signal_blocked,
>  
>  	.page_enc_status_hc = svm_page_enc_status_hc,
> +	.get_page_enc_bitmap = svm_get_page_enc_bitmap,
>  };
>  
>  static int __init svm_init(void)
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 68428eef2dde..3c3fea4e20b5 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -5226,6 +5226,18 @@ long kvm_arch_vm_ioctl(struct file *filp,
>  	case KVM_SET_PMU_EVENT_FILTER:
>  		r = kvm_vm_ioctl_set_pmu_event_filter(kvm, argp);
>  		break;
> +	case KVM_GET_PAGE_ENC_BITMAP: {
> +		struct kvm_page_enc_bitmap bitmap;
> +
> +		r = -EFAULT;
> +		if (copy_from_user(&bitmap, argp, sizeof(bitmap)))
> +			goto out;
> +
> +		r = -ENOTTY;
> +		if (kvm_x86_ops->get_page_enc_bitmap)
> +			r = kvm_x86_ops->get_page_enc_bitmap(kvm, &bitmap);
> +		break;
> +	}
>  	default:
>  		r = -ENOTTY;
>  	}
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 4e80c57a3182..db1ebf85e177 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -500,6 +500,16 @@ struct kvm_dirty_log {
>  	};
>  };
>  
> +/* for KVM_GET_PAGE_ENC_BITMAP */
> +struct kvm_page_enc_bitmap {
> +	__u64 start_gfn;
> +	__u64 num_pages;
> +	union {
> +		void __user *enc_bitmap; /* one bit per page */
> +		__u64 padding2;
> +	};
> +};
> +
>  /* for KVM_CLEAR_DIRTY_LOG */
>  struct kvm_clear_dirty_log {
>  	__u32 slot;
> @@ -1478,6 +1488,8 @@ struct kvm_enc_region {
>  #define KVM_S390_NORMAL_RESET	_IO(KVMIO,   0xc3)
>  #define KVM_S390_CLEAR_RESET	_IO(KVMIO,   0xc4)
>  
> +#define KVM_GET_PAGE_ENC_BITMAP	_IOW(KVMIO, 0xc5, struct kvm_page_enc_bitmap)
> +
>  /* Secure Encrypted Virtualization command */
>  enum sev_cmd_id {
>  	/* Guest initialization commands */
> -- 
> 2.17.1
> 

^ permalink raw reply	[flat|nested] 107+ messages in thread
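The trailing-bits handling in svm_get_page_enc_bitmap() above — when num_pages is not byte aligned, the low n bits of the final byte come from the kernel bitmap and the high bits keep whatever userspace already had there — reduces to a one-byte mask-and-merge. A standalone sketch (function name is mine, not the kernel's):

```c
#include <assert.h>
#include <stdint.h>

/* n is the number of valid low bits in the final byte (1..7).
 * mask = GENMASK(n - 1, 0): take kernel bits below the mask and
 * preserve userspace bits above it, as the patch does before the
 * final copy_to_user(). */
static uint8_t merge_trailing_byte(uint8_t kernel_byte, uint8_t user_byte,
				   unsigned int n)
{
	uint8_t mask = (uint8_t)((1u << n) - 1);

	return (uint8_t)((kernel_byte & mask) | (user_byte & ~mask));
}
```

This is why the ioctl first does a copy_from_user() of that single byte: without reading the user's trailing bits back, the bits beyond num_pages would be overwritten with the kernel's default-encrypted 1s.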

* Re: [PATCH v6 09/14] KVM: x86: Introduce KVM_GET_PAGE_ENC_BITMAP ioctl
  2020-03-30  6:22 ` [PATCH v6 09/14] KVM: x86: Introduce KVM_GET_PAGE_ENC_BITMAP ioctl Ashish Kalra
  2020-04-03 18:30   ` Venu Busireddy
@ 2020-04-03 20:18   ` Krish Sadhukhan
  2020-04-03 20:47     ` Ashish Kalra
  2020-04-03 20:55     ` Venu Busireddy
  1 sibling, 2 replies; 107+ messages in thread
From: Krish Sadhukhan @ 2020-04-03 20:18 UTC (permalink / raw)
  To: Ashish Kalra, pbonzini
  Cc: tglx, mingo, hpa, joro, bp, thomas.lendacky, x86, kvm,
	linux-kernel, rientjes, srutherford, luto, brijesh.singh


On 3/29/20 11:22 PM, Ashish Kalra wrote:
> From: Brijesh Singh <Brijesh.Singh@amd.com>
>
> The ioctl can be used to retrieve page encryption bitmap for a given
> gfn range.
>
> Return the correct bitmap as per the number of pages being requested
> by the user. Ensure that we only copy bmap->num_pages bytes in the
> userspace buffer, if bmap->num_pages is not byte aligned we read
> the trailing bits from the userspace and copy those bits as is.
>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: "Radim Krčmář" <rkrcmar@redhat.com>
> Cc: Joerg Roedel <joro@8bytes.org>
> Cc: Borislav Petkov <bp@suse.de>
> Cc: Tom Lendacky <thomas.lendacky@amd.com>
> Cc: x86@kernel.org
> Cc: kvm@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> ---
>   Documentation/virt/kvm/api.rst  | 27 +++++++++++++
>   arch/x86/include/asm/kvm_host.h |  2 +
>   arch/x86/kvm/svm.c              | 71 +++++++++++++++++++++++++++++++++
>   arch/x86/kvm/x86.c              | 12 ++++++
>   include/uapi/linux/kvm.h        | 12 ++++++
>   5 files changed, 124 insertions(+)
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index ebd383fba939..8ad800ebb54f 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -4648,6 +4648,33 @@ This ioctl resets VCPU registers and control structures according to
>   the clear cpu reset definition in the POP. However, the cpu is not put
>   into ESA mode. This reset is a superset of the initial reset.
>   
> +4.125 KVM_GET_PAGE_ENC_BITMAP (vm ioctl)
> +---------------------------------------
> +
> +:Capability: basic
> +:Architectures: x86
> +:Type: vm ioctl
> +:Parameters: struct kvm_page_enc_bitmap (in/out)
> +:Returns: 0 on success, -1 on error
> +
> +/* for KVM_GET_PAGE_ENC_BITMAP */
> +struct kvm_page_enc_bitmap {
> +	__u64 start_gfn;
> +	__u64 num_pages;
> +	union {
> +		void __user *enc_bitmap; /* one bit per page */
> +		__u64 padding2;
> +	};
> +};
> +
> +The encrypted VMs have concept of private and shared pages. The private
> +page is encrypted with the guest-specific key, while shared page may
> +be encrypted with the hypervisor key. The KVM_GET_PAGE_ENC_BITMAP can
> +be used to get the bitmap indicating whether the guest page is private
> +or shared. The bitmap can be used during the guest migration, if the page
> +is private then userspace need to use SEV migration commands to transmit
> +the page.
> +
>   
>   5. The kvm_run structure
>   ========================
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 90718fa3db47..27e43e3ec9d8 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1269,6 +1269,8 @@ struct kvm_x86_ops {
>   	int (*enable_direct_tlbflush)(struct kvm_vcpu *vcpu);
>   	int (*page_enc_status_hc)(struct kvm *kvm, unsigned long gpa,
>   				  unsigned long sz, unsigned long mode);
> +	int (*get_page_enc_bitmap)(struct kvm *kvm,
> +				struct kvm_page_enc_bitmap *bmap);


Looking back at the previous patch, it seems that these two are
basically the setter/getter pair for page encryption status, though one
is implemented as a hypercall and the other as an ioctl. Given that
setter/getter relationship, isn't it better to have some symmetry in
the naming of the ops ? For example,

         set_page_enc_hc

         get_page_enc_ioctl

>   };
>   
>   struct kvm_arch_async_pf {
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 1d8beaf1bceb..bae783cd396a 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -7686,6 +7686,76 @@ static int svm_page_enc_status_hc(struct kvm *kvm, unsigned long gpa,
>   	return ret;
>   }
>   
> +static int svm_get_page_enc_bitmap(struct kvm *kvm,
> +				   struct kvm_page_enc_bitmap *bmap)
> +{
> +	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> +	unsigned long gfn_start, gfn_end;
> +	unsigned long sz, i, sz_bytes;
> +	unsigned long *bitmap;
> +	int ret, n;
> +
> +	if (!sev_guest(kvm))
> +		return -ENOTTY;
> +
> +	gfn_start = bmap->start_gfn;


What if bmap->start_gfn is junk ?

> +	gfn_end = gfn_start + bmap->num_pages;
> +
> +	sz = ALIGN(bmap->num_pages, BITS_PER_LONG) / BITS_PER_BYTE;
> +	bitmap = kmalloc(sz, GFP_KERNEL);
> +	if (!bitmap)
> +		return -ENOMEM;
> +
> +	/* by default all pages are marked encrypted */
> +	memset(bitmap, 0xff, sz);
> +
> +	mutex_lock(&kvm->lock);
> +	if (sev->page_enc_bmap) {
> +		i = gfn_start;
> +		for_each_clear_bit_from(i, sev->page_enc_bmap,
> +				      min(sev->page_enc_bmap_size, gfn_end))
> +			clear_bit(i - gfn_start, bitmap);
> +	}
> +	mutex_unlock(&kvm->lock);
> +
> +	ret = -EFAULT;
> +
> +	n = bmap->num_pages % BITS_PER_BYTE;
> +	sz_bytes = ALIGN(bmap->num_pages, BITS_PER_BYTE) / BITS_PER_BYTE;
> +
> +	/*
> +	 * Return the correct bitmap as per the number of pages being
> +	 * requested by the user. Ensure that we only copy bmap->num_pages
> +	 * bytes in the userspace buffer, if bmap->num_pages is not byte
> +	 * aligned we read the trailing bits from the userspace and copy
> +	 * those bits as is.
> +	 */
> +
> +	if (n) {


Is it better to check for 'num_pages' at the beginning of the function 
rather than coming this far if bmap->num_pages is zero ?

> +		unsigned char *bitmap_kernel = (unsigned char *)bitmap;


Just trying to understand why you need this extra variable instead of 
using 'bitmap' directly.

> +		unsigned char bitmap_user;
> +		unsigned long offset, mask;
> +
> +		offset = bmap->num_pages / BITS_PER_BYTE;
> +		if (copy_from_user(&bitmap_user, bmap->enc_bitmap + offset,
> +				sizeof(unsigned char)))
> +			goto out;
> +
> +		mask = GENMASK(n - 1, 0);
> +		bitmap_user &= ~mask;
> +		bitmap_kernel[offset] &= mask;
> +		bitmap_kernel[offset] |= bitmap_user;
> +	}
> +
> +	if (copy_to_user(bmap->enc_bitmap, bitmap, sz_bytes))


If 'n' is zero, we are still copying stuff back to the user. Is that 
what is expected from userland ?

Another point: since copy_from_user() was done in the caller, isn't it
better to move this to the caller to keep symmetry ?

> +		goto out;
> +
> +	ret = 0;
> +out:
> +	kfree(bitmap);
> +	return ret;
> +}
> +
>   static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
>   {
>   	struct kvm_sev_cmd sev_cmd;
> @@ -8090,6 +8160,7 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
>   	.apic_init_signal_blocked = svm_apic_init_signal_blocked,
>   
>   	.page_enc_status_hc = svm_page_enc_status_hc,
> +	.get_page_enc_bitmap = svm_get_page_enc_bitmap,
>   };
>   
>   static int __init svm_init(void)
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 68428eef2dde..3c3fea4e20b5 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -5226,6 +5226,18 @@ long kvm_arch_vm_ioctl(struct file *filp,
>   	case KVM_SET_PMU_EVENT_FILTER:
>   		r = kvm_vm_ioctl_set_pmu_event_filter(kvm, argp);
>   		break;
> +	case KVM_GET_PAGE_ENC_BITMAP: {
> +		struct kvm_page_enc_bitmap bitmap;
> +
> +		r = -EFAULT;
> +		if (copy_from_user(&bitmap, argp, sizeof(bitmap)))
> +			goto out;
> +
> +		r = -ENOTTY;
> +		if (kvm_x86_ops->get_page_enc_bitmap)
> +			r = kvm_x86_ops->get_page_enc_bitmap(kvm, &bitmap);
> +		break;
> +	}
>   	default:
>   		r = -ENOTTY;
>   	}
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 4e80c57a3182..db1ebf85e177 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -500,6 +500,16 @@ struct kvm_dirty_log {
>   	};
>   };
>   
> +/* for KVM_GET_PAGE_ENC_BITMAP */
> +struct kvm_page_enc_bitmap {
> +	__u64 start_gfn;
> +	__u64 num_pages;
> +	union {
> +		void __user *enc_bitmap; /* one bit per page */
> +		__u64 padding2;
> +	};
> +};
> +
>   /* for KVM_CLEAR_DIRTY_LOG */
>   struct kvm_clear_dirty_log {
>   	__u32 slot;
> @@ -1478,6 +1488,8 @@ struct kvm_enc_region {
>   #define KVM_S390_NORMAL_RESET	_IO(KVMIO,   0xc3)
>   #define KVM_S390_CLEAR_RESET	_IO(KVMIO,   0xc4)
>   
> +#define KVM_GET_PAGE_ENC_BITMAP	_IOW(KVMIO, 0xc5, struct kvm_page_enc_bitmap)
> +
>   /* Secure Encrypted Virtualization command */
>   enum sev_cmd_id {
>   	/* Guest initialization commands */

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v6 09/14] KVM: x86: Introduce KVM_GET_PAGE_ENC_BITMAP ioctl
  2020-04-03 20:18   ` Krish Sadhukhan
@ 2020-04-03 20:47     ` Ashish Kalra
  2020-04-06 22:07       ` Krish Sadhukhan
  2020-04-03 20:55     ` Venu Busireddy
  1 sibling, 1 reply; 107+ messages in thread
From: Ashish Kalra @ 2020-04-03 20:47 UTC (permalink / raw)
  To: Krish Sadhukhan
  Cc: pbonzini, tglx, mingo, hpa, joro, bp, thomas.lendacky, x86, kvm,
	linux-kernel, rientjes, srutherford, luto, brijesh.singh

On Fri, Apr 03, 2020 at 01:18:52PM -0700, Krish Sadhukhan wrote:
> 
> On 3/29/20 11:22 PM, Ashish Kalra wrote:
> > From: Brijesh Singh <Brijesh.Singh@amd.com>
> > 
> > The ioctl can be used to retrieve page encryption bitmap for a given
> > gfn range.
> > 
> > Return the correct bitmap as per the number of pages being requested
> > by the user. Ensure that we only copy bmap->num_pages bytes in the
> > userspace buffer, if bmap->num_pages is not byte aligned we read
> > the trailing bits from the userspace and copy those bits as is.
> > 
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Ingo Molnar <mingo@redhat.com>
> > Cc: "H. Peter Anvin" <hpa@zytor.com>
> > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > Cc: "Radim Krčmář" <rkrcmar@redhat.com>
> > Cc: Joerg Roedel <joro@8bytes.org>
> > Cc: Borislav Petkov <bp@suse.de>
> > Cc: Tom Lendacky <thomas.lendacky@amd.com>
> > Cc: x86@kernel.org
> > Cc: kvm@vger.kernel.org
> > Cc: linux-kernel@vger.kernel.org
> > Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> > Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> > ---
> >   Documentation/virt/kvm/api.rst  | 27 +++++++++++++
> >   arch/x86/include/asm/kvm_host.h |  2 +
> >   arch/x86/kvm/svm.c              | 71 +++++++++++++++++++++++++++++++++
> >   arch/x86/kvm/x86.c              | 12 ++++++
> >   include/uapi/linux/kvm.h        | 12 ++++++
> >   5 files changed, 124 insertions(+)
> > 
> > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > index ebd383fba939..8ad800ebb54f 100644
> > --- a/Documentation/virt/kvm/api.rst
> > +++ b/Documentation/virt/kvm/api.rst
> > @@ -4648,6 +4648,33 @@ This ioctl resets VCPU registers and control structures according to
> >   the clear cpu reset definition in the POP. However, the cpu is not put
> >   into ESA mode. This reset is a superset of the initial reset.
> > +4.125 KVM_GET_PAGE_ENC_BITMAP (vm ioctl)
> > +---------------------------------------
> > +
> > +:Capability: basic
> > +:Architectures: x86
> > +:Type: vm ioctl
> > +:Parameters: struct kvm_page_enc_bitmap (in/out)
> > +:Returns: 0 on success, -1 on error
> > +
> > +/* for KVM_GET_PAGE_ENC_BITMAP */
> > +struct kvm_page_enc_bitmap {
> > +	__u64 start_gfn;
> > +	__u64 num_pages;
> > +	union {
> > +		void __user *enc_bitmap; /* one bit per page */
> > +		__u64 padding2;
> > +	};
> > +};
> > +
> > +The encrypted VMs have concept of private and shared pages. The private
> > +page is encrypted with the guest-specific key, while shared page may
> > +be encrypted with the hypervisor key. The KVM_GET_PAGE_ENC_BITMAP can
> > +be used to get the bitmap indicating whether the guest page is private
> > +or shared. The bitmap can be used during the guest migration, if the page
> > +is private then userspace need to use SEV migration commands to transmit
> > +the page.
> > +
> >   5. The kvm_run structure
> >   ========================
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index 90718fa3db47..27e43e3ec9d8 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -1269,6 +1269,8 @@ struct kvm_x86_ops {
> >   	int (*enable_direct_tlbflush)(struct kvm_vcpu *vcpu);
> >   	int (*page_enc_status_hc)(struct kvm *kvm, unsigned long gpa,
> >   				  unsigned long sz, unsigned long mode);
> > +	int (*get_page_enc_bitmap)(struct kvm *kvm,
> > +				struct kvm_page_enc_bitmap *bmap);
> 
> 
> Looking back at the previous patch, it seems that these two are basically
> the setter/getter action for page encryption, though one is implemented as a
> hypercall while the other as an ioctl. If we consider the setter/getter
> aspect, isn't it better to have some sort of symmetry in the naming of the
> ops ? For example,
> 
>         set_page_enc_hc
> 
>         get_page_enc_ioctl
> 
> >   };

These are named as per their usage: while page_enc_status_hc is a
hypercall used by the guest to update the page encryption bitmap, the
other ones are ioctl interfaces used by Qemu (or a Qemu alternative) to
get/set the page encryption bitmap, so they are named accordingly.
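As an aside, here is an illustrative user-space sketch (not code from
this patch set) of how a VMM such as Qemu would size the
one-bit-per-page buffer it hands to KVM_GET_PAGE_ENC_BITMAP; it mirrors
the sz_bytes computation in svm_get_page_enc_bitmap():

```c
#include <assert.h>
#include <stdint.h>

#define BITS_PER_BYTE 8

/*
 * Bytes a VMM must allocate for the enc_bitmap buffer: one bit per
 * guest page in [start_gfn, start_gfn + num_pages), rounded up to a
 * whole byte. Equivalent to ALIGN(num_pages, 8) / 8 in the kernel.
 */
static uint64_t enc_bitmap_bytes(uint64_t num_pages)
{
	return (num_pages + BITS_PER_BYTE - 1) / BITS_PER_BYTE;
}
```

The buffer address would then be stored in
kvm_page_enc_bitmap.enc_bitmap before issuing the ioctl.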

> >   struct kvm_arch_async_pf {
> > diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> > index 1d8beaf1bceb..bae783cd396a 100644
> > --- a/arch/x86/kvm/svm.c
> > +++ b/arch/x86/kvm/svm.c
> > @@ -7686,6 +7686,76 @@ static int svm_page_enc_status_hc(struct kvm *kvm, unsigned long gpa,
> >   	return ret;
> >   }
> > +static int svm_get_page_enc_bitmap(struct kvm *kvm,
> > +				   struct kvm_page_enc_bitmap *bmap)
> > +{
> > +	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> > +	unsigned long gfn_start, gfn_end;
> > +	unsigned long sz, i, sz_bytes;
> > +	unsigned long *bitmap;
> > +	int ret, n;
> > +
> > +	if (!sev_guest(kvm))
> > +		return -ENOTTY;
> > +
> > +	gfn_start = bmap->start_gfn;
> 
> 
> What if bmap->start_gfn is junk ?
> 
> > +	gfn_end = gfn_start + bmap->num_pages;
> > +
> > +	sz = ALIGN(bmap->num_pages, BITS_PER_LONG) / BITS_PER_BYTE;
> > +	bitmap = kmalloc(sz, GFP_KERNEL);
> > +	if (!bitmap)
> > +		return -ENOMEM;
> > +
> > +	/* by default all pages are marked encrypted */
> > +	memset(bitmap, 0xff, sz);
> > +
> > +	mutex_lock(&kvm->lock);
> > +	if (sev->page_enc_bmap) {
> > +		i = gfn_start;
> > +		for_each_clear_bit_from(i, sev->page_enc_bmap,
> > +				      min(sev->page_enc_bmap_size, gfn_end))
> > +			clear_bit(i - gfn_start, bitmap);
> > +	}
> > +	mutex_unlock(&kvm->lock);
> > +
> > +	ret = -EFAULT;
> > +
> > +	n = bmap->num_pages % BITS_PER_BYTE;
> > +	sz_bytes = ALIGN(bmap->num_pages, BITS_PER_BYTE) / BITS_PER_BYTE;
> > +
> > +	/*
> > +	 * Return the correct bitmap as per the number of pages being
> > +	 * requested by the user. Ensure that we only copy bmap->num_pages
> > +	 * bytes in the userspace buffer, if bmap->num_pages is not byte
> > +	 * aligned we read the trailing bits from the userspace and copy
> > +	 * those bits as is.
> > +	 */
> > +
> > +	if (n) {
> 
> 
> Is it better to check for 'num_pages' at the beginning of the function
> rather than coming this far if bmap->num_pages is zero ?
> 

This is not checking whether "num_pages" is zero; it is checking
whether bmap->num_pages is byte aligned.

> > +		unsigned char *bitmap_kernel = (unsigned char *)bitmap;
> 
> 
> Just trying to understand why you need this extra variable instead of using
> 'bitmap' directly.
> 

Makes the code much more readable/understandable.

> > +		unsigned char bitmap_user;
> > +		unsigned long offset, mask;
> > +
> > +		offset = bmap->num_pages / BITS_PER_BYTE;
> > +		if (copy_from_user(&bitmap_user, bmap->enc_bitmap + offset,
> > +				sizeof(unsigned char)))
> > +			goto out;
> > +
> > +		mask = GENMASK(n - 1, 0);
> > +		bitmap_user &= ~mask;
> > +		bitmap_kernel[offset] &= mask;
> > +		bitmap_kernel[offset] |= bitmap_user;
> > +	}
> > +
> > +	if (copy_to_user(bmap->enc_bitmap, bitmap, sz_bytes))
> 
> 
> If 'n' is zero, we are still copying stuff back to the user. Is that what is
> expected from userland ?
> 
> Another point. Since copy_from_user() was done in the caller, isn't it
> better to move this to the caller to keep a symmetry ?
>

As per the comments above, please note that if n is not zero,
bmap->num_pages is not byte aligned, so we read the trailing bits from
userspace and copy those bits back as is. If n is zero, then
bmap->num_pages is byte aligned and we simply copy all the bytes back.
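To make the trailing-byte merge concrete, here is a stand-alone sketch
of the same mask arithmetic (illustrative only, not kernel code;
GENMASK(n - 1, 0) is open-coded as ((1u << n) - 1)): the final partial
byte keeps the low n kernel-computed bits and preserves the high
(8 - n) bits read back from userspace.

```c
#include <assert.h>

/*
 * Merge the partial trailing byte of the bitmap: keep the low n valid
 * bits from the kernel-computed byte, preserve the high 8 - n bits the
 * userspace buffer already contained. n is num_pages % 8, 1..7.
 */
static unsigned char merge_trailing_byte(unsigned char kernel_byte,
					 unsigned char user_byte,
					 unsigned int n)
{
	unsigned char mask = (unsigned char)((1u << n) - 1);

	user_byte &= (unsigned char)~mask;	/* high 8 - n bits */
	kernel_byte &= mask;			/* low n bits      */
	return kernel_byte | user_byte;
}
```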

Thanks,
Ashish

> > +		goto out;
> > +
> > +	ret = 0;
> > +out:
> > +	kfree(bitmap);
> > +	return ret;
> > +}
> > +
> >   static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
> >   {
> >   	struct kvm_sev_cmd sev_cmd;
> > @@ -8090,6 +8160,7 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
> >   	.apic_init_signal_blocked = svm_apic_init_signal_blocked,
> >   	.page_enc_status_hc = svm_page_enc_status_hc,
> > +	.get_page_enc_bitmap = svm_get_page_enc_bitmap,
> >   };
> >   static int __init svm_init(void)
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 68428eef2dde..3c3fea4e20b5 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -5226,6 +5226,18 @@ long kvm_arch_vm_ioctl(struct file *filp,
> >   	case KVM_SET_PMU_EVENT_FILTER:
> >   		r = kvm_vm_ioctl_set_pmu_event_filter(kvm, argp);
> >   		break;
> > +	case KVM_GET_PAGE_ENC_BITMAP: {
> > +		struct kvm_page_enc_bitmap bitmap;
> > +
> > +		r = -EFAULT;
> > +		if (copy_from_user(&bitmap, argp, sizeof(bitmap)))
> > +			goto out;
> > +
> > +		r = -ENOTTY;
> > +		if (kvm_x86_ops->get_page_enc_bitmap)
> > +			r = kvm_x86_ops->get_page_enc_bitmap(kvm, &bitmap);
> > +		break;
> > +	}
> >   	default:
> >   		r = -ENOTTY;
> >   	}
> > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> > index 4e80c57a3182..db1ebf85e177 100644
> > --- a/include/uapi/linux/kvm.h
> > +++ b/include/uapi/linux/kvm.h
> > @@ -500,6 +500,16 @@ struct kvm_dirty_log {
> >   	};
> >   };
> > +/* for KVM_GET_PAGE_ENC_BITMAP */
> > +struct kvm_page_enc_bitmap {
> > +	__u64 start_gfn;
> > +	__u64 num_pages;
> > +	union {
> > +		void __user *enc_bitmap; /* one bit per page */
> > +		__u64 padding2;
> > +	};
> > +};
> > +
> >   /* for KVM_CLEAR_DIRTY_LOG */
> >   struct kvm_clear_dirty_log {
> >   	__u32 slot;
> > @@ -1478,6 +1488,8 @@ struct kvm_enc_region {
> >   #define KVM_S390_NORMAL_RESET	_IO(KVMIO,   0xc3)
> >   #define KVM_S390_CLEAR_RESET	_IO(KVMIO,   0xc4)
> > +#define KVM_GET_PAGE_ENC_BITMAP	_IOW(KVMIO, 0xc5, struct kvm_page_enc_bitmap)
> > +
> >   /* Secure Encrypted Virtualization command */
> >   enum sev_cmd_id {
> >   	/* Guest initialization commands */


* Re: [PATCH v6 09/14] KVM: x86: Introduce KVM_GET_PAGE_ENC_BITMAP ioctl
  2020-04-03 20:18   ` Krish Sadhukhan
  2020-04-03 20:47     ` Ashish Kalra
@ 2020-04-03 20:55     ` Venu Busireddy
  2020-04-03 21:01       ` Ashish Kalra
  1 sibling, 1 reply; 107+ messages in thread
From: Venu Busireddy @ 2020-04-03 20:55 UTC (permalink / raw)
  To: Krish Sadhukhan
  Cc: Ashish Kalra, pbonzini, tglx, mingo, hpa, joro, bp,
	thomas.lendacky, x86, kvm, linux-kernel, rientjes, srutherford,
	luto, brijesh.singh

On 2020-04-03 13:18:52 -0700, Krish Sadhukhan wrote:
> 
> On 3/29/20 11:22 PM, Ashish Kalra wrote:
> > From: Brijesh Singh <Brijesh.Singh@amd.com>
> > 
> > The ioctl can be used to retrieve page encryption bitmap for a given
> > gfn range.
> > 
> > Return the correct bitmap as per the number of pages being requested
> > by the user. Ensure that we only copy bmap->num_pages bytes in the
> > userspace buffer, if bmap->num_pages is not byte aligned we read
> > the trailing bits from the userspace and copy those bits as is.
> > 
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Ingo Molnar <mingo@redhat.com>
> > Cc: "H. Peter Anvin" <hpa@zytor.com>
> > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > Cc: "Radim Krčmář" <rkrcmar@redhat.com>
> > Cc: Joerg Roedel <joro@8bytes.org>
> > Cc: Borislav Petkov <bp@suse.de>
> > Cc: Tom Lendacky <thomas.lendacky@amd.com>
> > Cc: x86@kernel.org
> > Cc: kvm@vger.kernel.org
> > Cc: linux-kernel@vger.kernel.org
> > Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> > Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> > ---
> >   Documentation/virt/kvm/api.rst  | 27 +++++++++++++
> >   arch/x86/include/asm/kvm_host.h |  2 +
> >   arch/x86/kvm/svm.c              | 71 +++++++++++++++++++++++++++++++++
> >   arch/x86/kvm/x86.c              | 12 ++++++
> >   include/uapi/linux/kvm.h        | 12 ++++++
> >   5 files changed, 124 insertions(+)
> > 
> > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > index ebd383fba939..8ad800ebb54f 100644
> > --- a/Documentation/virt/kvm/api.rst
> > +++ b/Documentation/virt/kvm/api.rst
> > @@ -4648,6 +4648,33 @@ This ioctl resets VCPU registers and control structures according to
> >   the clear cpu reset definition in the POP. However, the cpu is not put
> >   into ESA mode. This reset is a superset of the initial reset.
> > +4.125 KVM_GET_PAGE_ENC_BITMAP (vm ioctl)
> > +---------------------------------------
> > +
> > +:Capability: basic
> > +:Architectures: x86
> > +:Type: vm ioctl
> > +:Parameters: struct kvm_page_enc_bitmap (in/out)
> > +:Returns: 0 on success, -1 on error
> > +
> > +/* for KVM_GET_PAGE_ENC_BITMAP */
> > +struct kvm_page_enc_bitmap {
> > +	__u64 start_gfn;
> > +	__u64 num_pages;
> > +	union {
> > +		void __user *enc_bitmap; /* one bit per page */
> > +		__u64 padding2;
> > +	};
> > +};
> > +
> > +The encrypted VMs have concept of private and shared pages. The private
> > +page is encrypted with the guest-specific key, while shared page may
> > +be encrypted with the hypervisor key. The KVM_GET_PAGE_ENC_BITMAP can
> > +be used to get the bitmap indicating whether the guest page is private
> > +or shared. The bitmap can be used during the guest migration, if the page
> > +is private then userspace need to use SEV migration commands to transmit
> > +the page.
> > +
> >   5. The kvm_run structure
> >   ========================
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index 90718fa3db47..27e43e3ec9d8 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -1269,6 +1269,8 @@ struct kvm_x86_ops {
> >   	int (*enable_direct_tlbflush)(struct kvm_vcpu *vcpu);
> >   	int (*page_enc_status_hc)(struct kvm *kvm, unsigned long gpa,
> >   				  unsigned long sz, unsigned long mode);
> > +	int (*get_page_enc_bitmap)(struct kvm *kvm,
> > +				struct kvm_page_enc_bitmap *bmap);
> 
> 
> Looking back at the previous patch, it seems that these two are basically
> the setter/getter action for page encryption, though one is implemented as a
> hypercall while the other as an ioctl. If we consider the setter/getter
> aspect, isn't it better to have some sort of symmetry in the naming of the
> ops ? For example,
> 
>         set_page_enc_hc
> 
>         get_page_enc_ioctl
> 
> >   };
> >   struct kvm_arch_async_pf {
> > diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> > index 1d8beaf1bceb..bae783cd396a 100644
> > --- a/arch/x86/kvm/svm.c
> > +++ b/arch/x86/kvm/svm.c
> > @@ -7686,6 +7686,76 @@ static int svm_page_enc_status_hc(struct kvm *kvm, unsigned long gpa,
> >   	return ret;
> >   }
> > +static int svm_get_page_enc_bitmap(struct kvm *kvm,
> > +				   struct kvm_page_enc_bitmap *bmap)
> > +{
> > +	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> > +	unsigned long gfn_start, gfn_end;
> > +	unsigned long sz, i, sz_bytes;
> > +	unsigned long *bitmap;
> > +	int ret, n;
> > +
> > +	if (!sev_guest(kvm))
> > +		return -ENOTTY;
> > +
> > +	gfn_start = bmap->start_gfn;
> 
> 
> What if bmap->start_gfn is junk ?
> 
> > +	gfn_end = gfn_start + bmap->num_pages;
> > +
> > +	sz = ALIGN(bmap->num_pages, BITS_PER_LONG) / BITS_PER_BYTE;
> > +	bitmap = kmalloc(sz, GFP_KERNEL);
> > +	if (!bitmap)
> > +		return -ENOMEM;
> > +
> > +	/* by default all pages are marked encrypted */
> > +	memset(bitmap, 0xff, sz);
> > +
> > +	mutex_lock(&kvm->lock);
> > +	if (sev->page_enc_bmap) {
> > +		i = gfn_start;
> > +		for_each_clear_bit_from(i, sev->page_enc_bmap,
> > +				      min(sev->page_enc_bmap_size, gfn_end))
> > +			clear_bit(i - gfn_start, bitmap);
> > +	}
> > +	mutex_unlock(&kvm->lock);
> > +
> > +	ret = -EFAULT;
> > +
> > +	n = bmap->num_pages % BITS_PER_BYTE;
> > +	sz_bytes = ALIGN(bmap->num_pages, BITS_PER_BYTE) / BITS_PER_BYTE;
> > +
> > +	/*
> > +	 * Return the correct bitmap as per the number of pages being
> > +	 * requested by the user. Ensure that we only copy bmap->num_pages
> > +	 * bytes in the userspace buffer, if bmap->num_pages is not byte
> > +	 * aligned we read the trailing bits from the userspace and copy
> > +	 * those bits as is.
> > +	 */
> > +
> > +	if (n) {
> 
> 
> Is it better to check for 'num_pages' at the beginning of the function
> rather than coming this far if bmap->num_pages is zero ?
> 
> > +		unsigned char *bitmap_kernel = (unsigned char *)bitmap;
> 
> 
> Just trying to understand why you need this extra variable instead of using
> 'bitmap' directly.
> 
> > +		unsigned char bitmap_user;
> > +		unsigned long offset, mask;
> > +
> > +		offset = bmap->num_pages / BITS_PER_BYTE;
> > +		if (copy_from_user(&bitmap_user, bmap->enc_bitmap + offset,
> > +				sizeof(unsigned char)))
> > +			goto out;
> > +
> > +		mask = GENMASK(n - 1, 0);
> > +		bitmap_user &= ~mask;
> > +		bitmap_kernel[offset] &= mask;
> > +		bitmap_kernel[offset] |= bitmap_user;
> > +	}
> > +
> > +	if (copy_to_user(bmap->enc_bitmap, bitmap, sz_bytes))
> 
> 
> If 'n' is zero, we are still copying stuff back to the user. Is that what is
> expected from userland ?
> 
> Another point. Since copy_from_user() was done in the caller, isn't it
> better to move this to the caller to keep a symmetry ?

That would require changing the .get_page_enc_bitmap interface to pass
the local bitmap back to the caller for use in copy_to_user(), and the
caller would then have to free it. I think it is better to call
copy_to_user() here and free the bitmap before returning.

> 
> > +		goto out;
> > +
> > +	ret = 0;
> > +out:
> > +	kfree(bitmap);
> > +	return ret;
> > +}
> > +
> >   static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
> >   {
> >   	struct kvm_sev_cmd sev_cmd;
> > @@ -8090,6 +8160,7 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
> >   	.apic_init_signal_blocked = svm_apic_init_signal_blocked,
> >   	.page_enc_status_hc = svm_page_enc_status_hc,
> > +	.get_page_enc_bitmap = svm_get_page_enc_bitmap,
> >   };
> >   static int __init svm_init(void)
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 68428eef2dde..3c3fea4e20b5 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -5226,6 +5226,18 @@ long kvm_arch_vm_ioctl(struct file *filp,
> >   	case KVM_SET_PMU_EVENT_FILTER:
> >   		r = kvm_vm_ioctl_set_pmu_event_filter(kvm, argp);
> >   		break;
> > +	case KVM_GET_PAGE_ENC_BITMAP: {
> > +		struct kvm_page_enc_bitmap bitmap;
> > +
> > +		r = -EFAULT;
> > +		if (copy_from_user(&bitmap, argp, sizeof(bitmap)))
> > +			goto out;
> > +
> > +		r = -ENOTTY;
> > +		if (kvm_x86_ops->get_page_enc_bitmap)
> > +			r = kvm_x86_ops->get_page_enc_bitmap(kvm, &bitmap);
> > +		break;
> > +	}
> >   	default:
> >   		r = -ENOTTY;
> >   	}
> > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> > index 4e80c57a3182..db1ebf85e177 100644
> > --- a/include/uapi/linux/kvm.h
> > +++ b/include/uapi/linux/kvm.h
> > @@ -500,6 +500,16 @@ struct kvm_dirty_log {
> >   	};
> >   };
> > +/* for KVM_GET_PAGE_ENC_BITMAP */
> > +struct kvm_page_enc_bitmap {
> > +	__u64 start_gfn;
> > +	__u64 num_pages;
> > +	union {
> > +		void __user *enc_bitmap; /* one bit per page */
> > +		__u64 padding2;
> > +	};
> > +};
> > +
> >   /* for KVM_CLEAR_DIRTY_LOG */
> >   struct kvm_clear_dirty_log {
> >   	__u32 slot;
> > @@ -1478,6 +1488,8 @@ struct kvm_enc_region {
> >   #define KVM_S390_NORMAL_RESET	_IO(KVMIO,   0xc3)
> >   #define KVM_S390_CLEAR_RESET	_IO(KVMIO,   0xc4)
> > +#define KVM_GET_PAGE_ENC_BITMAP	_IOW(KVMIO, 0xc5, struct kvm_page_enc_bitmap)
> > +
> >   /* Secure Encrypted Virtualization command */
> >   enum sev_cmd_id {
> >   	/* Guest initialization commands */


* Re: [PATCH v6 09/14] KVM: x86: Introduce KVM_GET_PAGE_ENC_BITMAP ioctl
  2020-04-03 20:55     ` Venu Busireddy
@ 2020-04-03 21:01       ` Ashish Kalra
  0 siblings, 0 replies; 107+ messages in thread
From: Ashish Kalra @ 2020-04-03 21:01 UTC (permalink / raw)
  To: Venu Busireddy
  Cc: Krish Sadhukhan, pbonzini, tglx, mingo, hpa, joro, bp,
	thomas.lendacky, x86, kvm, linux-kernel, rientjes, srutherford,
	luto, brijesh.singh

On Fri, Apr 03, 2020 at 03:55:07PM -0500, Venu Busireddy wrote:
> On 2020-04-03 13:18:52 -0700, Krish Sadhukhan wrote:
> > 
> > On 3/29/20 11:22 PM, Ashish Kalra wrote:
> > > From: Brijesh Singh <Brijesh.Singh@amd.com>
> > > 
> > > The ioctl can be used to retrieve page encryption bitmap for a given
> > > gfn range.
> > > 
> > > Return the correct bitmap as per the number of pages being requested
> > > by the user. Ensure that we only copy bmap->num_pages bytes in the
> > > userspace buffer, if bmap->num_pages is not byte aligned we read
> > > the trailing bits from the userspace and copy those bits as is.
> > > 
> > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > Cc: Ingo Molnar <mingo@redhat.com>
> > > Cc: "H. Peter Anvin" <hpa@zytor.com>
> > > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > > Cc: "Radim Krčmář" <rkrcmar@redhat.com>
> > > Cc: Joerg Roedel <joro@8bytes.org>
> > > Cc: Borislav Petkov <bp@suse.de>
> > > Cc: Tom Lendacky <thomas.lendacky@amd.com>
> > > Cc: x86@kernel.org
> > > Cc: kvm@vger.kernel.org
> > > Cc: linux-kernel@vger.kernel.org
> > > Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> > > Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> > > ---
> > >   Documentation/virt/kvm/api.rst  | 27 +++++++++++++
> > >   arch/x86/include/asm/kvm_host.h |  2 +
> > >   arch/x86/kvm/svm.c              | 71 +++++++++++++++++++++++++++++++++
> > >   arch/x86/kvm/x86.c              | 12 ++++++
> > >   include/uapi/linux/kvm.h        | 12 ++++++
> > >   5 files changed, 124 insertions(+)
> > > 
> > > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > > index ebd383fba939..8ad800ebb54f 100644
> > > --- a/Documentation/virt/kvm/api.rst
> > > +++ b/Documentation/virt/kvm/api.rst
> > > @@ -4648,6 +4648,33 @@ This ioctl resets VCPU registers and control structures according to
> > >   the clear cpu reset definition in the POP. However, the cpu is not put
> > >   into ESA mode. This reset is a superset of the initial reset.
> > > +4.125 KVM_GET_PAGE_ENC_BITMAP (vm ioctl)
> > > +---------------------------------------
> > > +
> > > +:Capability: basic
> > > +:Architectures: x86
> > > +:Type: vm ioctl
> > > +:Parameters: struct kvm_page_enc_bitmap (in/out)
> > > +:Returns: 0 on success, -1 on error
> > > +
> > > +/* for KVM_GET_PAGE_ENC_BITMAP */
> > > +struct kvm_page_enc_bitmap {
> > > +	__u64 start_gfn;
> > > +	__u64 num_pages;
> > > +	union {
> > > +		void __user *enc_bitmap; /* one bit per page */
> > > +		__u64 padding2;
> > > +	};
> > > +};
> > > +
> > > +The encrypted VMs have concept of private and shared pages. The private
> > > +page is encrypted with the guest-specific key, while shared page may
> > > +be encrypted with the hypervisor key. The KVM_GET_PAGE_ENC_BITMAP can
> > > +be used to get the bitmap indicating whether the guest page is private
> > > +or shared. The bitmap can be used during the guest migration, if the page
> > > +is private then userspace need to use SEV migration commands to transmit
> > > +the page.
> > > +
> > >   5. The kvm_run structure
> > >   ========================
> > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > > index 90718fa3db47..27e43e3ec9d8 100644
> > > --- a/arch/x86/include/asm/kvm_host.h
> > > +++ b/arch/x86/include/asm/kvm_host.h
> > > @@ -1269,6 +1269,8 @@ struct kvm_x86_ops {
> > >   	int (*enable_direct_tlbflush)(struct kvm_vcpu *vcpu);
> > >   	int (*page_enc_status_hc)(struct kvm *kvm, unsigned long gpa,
> > >   				  unsigned long sz, unsigned long mode);
> > > +	int (*get_page_enc_bitmap)(struct kvm *kvm,
> > > +				struct kvm_page_enc_bitmap *bmap);
> > 
> > 
> > Looking back at the previous patch, it seems that these two are basically
> > the setter/getter action for page encryption, though one is implemented as a
> > hypercall while the other as an ioctl. If we consider the setter/getter
> > aspect, isn't it better to have some sort of symmetry in the naming of the
> > ops ? For example,
> > 
> >         set_page_enc_hc
> > 
> >         get_page_enc_ioctl
> > 
> > >   };
> > >   struct kvm_arch_async_pf {
> > > diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> > > index 1d8beaf1bceb..bae783cd396a 100644
> > > --- a/arch/x86/kvm/svm.c
> > > +++ b/arch/x86/kvm/svm.c
> > > @@ -7686,6 +7686,76 @@ static int svm_page_enc_status_hc(struct kvm *kvm, unsigned long gpa,
> > >   	return ret;
> > >   }
> > > +static int svm_get_page_enc_bitmap(struct kvm *kvm,
> > > +				   struct kvm_page_enc_bitmap *bmap)
> > > +{
> > > +	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> > > +	unsigned long gfn_start, gfn_end;
> > > +	unsigned long sz, i, sz_bytes;
> > > +	unsigned long *bitmap;
> > > +	int ret, n;
> > > +
> > > +	if (!sev_guest(kvm))
> > > +		return -ENOTTY;
> > > +
> > > +	gfn_start = bmap->start_gfn;
> > 
> > 
> > What if bmap->start_gfn is junk ?
> > 
> > > +	gfn_end = gfn_start + bmap->num_pages;
> > > +
> > > +	sz = ALIGN(bmap->num_pages, BITS_PER_LONG) / BITS_PER_BYTE;
> > > +	bitmap = kmalloc(sz, GFP_KERNEL);
> > > +	if (!bitmap)
> > > +		return -ENOMEM;
> > > +
> > > +	/* by default all pages are marked encrypted */
> > > +	memset(bitmap, 0xff, sz);
> > > +
> > > +	mutex_lock(&kvm->lock);
> > > +	if (sev->page_enc_bmap) {
> > > +		i = gfn_start;
> > > +		for_each_clear_bit_from(i, sev->page_enc_bmap,
> > > +				      min(sev->page_enc_bmap_size, gfn_end))
> > > +			clear_bit(i - gfn_start, bitmap);
> > > +	}
> > > +	mutex_unlock(&kvm->lock);
> > > +
> > > +	ret = -EFAULT;
> > > +
> > > +	n = bmap->num_pages % BITS_PER_BYTE;
> > > +	sz_bytes = ALIGN(bmap->num_pages, BITS_PER_BYTE) / BITS_PER_BYTE;
> > > +
> > > +	/*
> > > +	 * Return the correct bitmap as per the number of pages being
> > > +	 * requested by the user. Ensure that we only copy bmap->num_pages
> > > +	 * bytes in the userspace buffer, if bmap->num_pages is not byte
> > > +	 * aligned we read the trailing bits from the userspace and copy
> > > +	 * those bits as is.
> > > +	 */
> > > +
> > > +	if (n) {
> > 
> > 
> > Is it better to check for 'num_pages' at the beginning of the function
> > rather than coming this far if bmap->num_pages is zero ?
> > 
> > > +		unsigned char *bitmap_kernel = (unsigned char *)bitmap;
> > 
> > 
> > Just trying to understand why you need this extra variable instead of using
> > 'bitmap' directly.
> > 
> > > +		unsigned char bitmap_user;
> > > +		unsigned long offset, mask;
> > > +
> > > +		offset = bmap->num_pages / BITS_PER_BYTE;
> > > +		if (copy_from_user(&bitmap_user, bmap->enc_bitmap + offset,
> > > +				sizeof(unsigned char)))
> > > +			goto out;
> > > +
> > > +		mask = GENMASK(n - 1, 0);
> > > +		bitmap_user &= ~mask;
> > > +		bitmap_kernel[offset] &= mask;
> > > +		bitmap_kernel[offset] |= bitmap_user;
> > > +	}
> > > +
> > > +	if (copy_to_user(bmap->enc_bitmap, bitmap, sz_bytes))
> > 
> > 
> > If 'n' is zero, we are still copying stuff back to the user. Is that what is
> > expected from userland ?
> > 
> > Another point. Since copy_from_user() was done in the caller, isn't it
> > better to move this to the caller to keep a symmetry ?
> 
> That would need the interface of .get_page_enc_bitmap to change, to pass
> back the local bitmap to the caller for use in copy_to_user() and then
> free it up. I think it is better to call copy_to_user() here and free
> the bitmap before returning.
> 

As I replied in my earlier response to this patch, please note that, as
the comments above explain, we check here whether bmap->num_pages is byte
aligned; if it is not, we read the trailing bits from userspace and copy
those bits back as is.

Thanks,
Ashish

> > 
> > > +		goto out;
> > > +
> > > +	ret = 0;
> > > +out:
> > > +	kfree(bitmap);
> > > +	return ret;
> > > +}
> > > +
> > >   static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
> > >   {
> > >   	struct kvm_sev_cmd sev_cmd;
> > > @@ -8090,6 +8160,7 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
> > >   	.apic_init_signal_blocked = svm_apic_init_signal_blocked,
> > >   	.page_enc_status_hc = svm_page_enc_status_hc,
> > > +	.get_page_enc_bitmap = svm_get_page_enc_bitmap,
> > >   };
> > >   static int __init svm_init(void)
> > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > index 68428eef2dde..3c3fea4e20b5 100644
> > > --- a/arch/x86/kvm/x86.c
> > > +++ b/arch/x86/kvm/x86.c
> > > @@ -5226,6 +5226,18 @@ long kvm_arch_vm_ioctl(struct file *filp,
> > >   	case KVM_SET_PMU_EVENT_FILTER:
> > >   		r = kvm_vm_ioctl_set_pmu_event_filter(kvm, argp);
> > >   		break;
> > > +	case KVM_GET_PAGE_ENC_BITMAP: {
> > > +		struct kvm_page_enc_bitmap bitmap;
> > > +
> > > +		r = -EFAULT;
> > > +		if (copy_from_user(&bitmap, argp, sizeof(bitmap)))
> > > +			goto out;
> > > +
> > > +		r = -ENOTTY;
> > > +		if (kvm_x86_ops->get_page_enc_bitmap)
> > > +			r = kvm_x86_ops->get_page_enc_bitmap(kvm, &bitmap);
> > > +		break;
> > > +	}
> > >   	default:
> > >   		r = -ENOTTY;
> > >   	}
> > > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> > > index 4e80c57a3182..db1ebf85e177 100644
> > > --- a/include/uapi/linux/kvm.h
> > > +++ b/include/uapi/linux/kvm.h
> > > @@ -500,6 +500,16 @@ struct kvm_dirty_log {
> > >   	};
> > >   };
> > > +/* for KVM_GET_PAGE_ENC_BITMAP */
> > > +struct kvm_page_enc_bitmap {
> > > +	__u64 start_gfn;
> > > +	__u64 num_pages;
> > > +	union {
> > > +		void __user *enc_bitmap; /* one bit per page */
> > > +		__u64 padding2;
> > > +	};
> > > +};
> > > +
> > >   /* for KVM_CLEAR_DIRTY_LOG */
> > >   struct kvm_clear_dirty_log {
> > >   	__u32 slot;
> > > @@ -1478,6 +1488,8 @@ struct kvm_enc_region {
> > >   #define KVM_S390_NORMAL_RESET	_IO(KVMIO,   0xc3)
> > >   #define KVM_S390_CLEAR_RESET	_IO(KVMIO,   0xc4)
> > > +#define KVM_GET_PAGE_ENC_BITMAP	_IOW(KVMIO, 0xc5, struct kvm_page_enc_bitmap)
> > > +
> > >   /* Secure Encrypted Virtualization command */
> > >   enum sev_cmd_id {
> > >   	/* Guest initialization commands */

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v6 10/14] mm: x86: Invoke hypercall when page encryption status is changed
  2020-03-30  6:22 ` [PATCH v6 10/14] mm: x86: Invoke hypercall when page encryption status is changed Ashish Kalra
@ 2020-04-03 21:07   ` Krish Sadhukhan
  2020-04-03 21:30     ` Ashish Kalra
  2020-04-03 21:36   ` Venu Busireddy
  1 sibling, 1 reply; 107+ messages in thread
From: Krish Sadhukhan @ 2020-04-03 21:07 UTC (permalink / raw)
  To: Ashish Kalra, pbonzini
  Cc: tglx, mingo, hpa, joro, bp, thomas.lendacky, x86, kvm,
	linux-kernel, rientjes, srutherford, luto, brijesh.singh


On 3/29/20 11:22 PM, Ashish Kalra wrote:
> From: Brijesh Singh <Brijesh.Singh@amd.com>
>
> Invoke a hypercall when a memory region is changed from encrypted ->
> decrypted and vice versa. Hypervisor need to know the page encryption
> status during the guest migration.
>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: "Radim Krčmář" <rkrcmar@redhat.com>
> Cc: Joerg Roedel <joro@8bytes.org>
> Cc: Borislav Petkov <bp@suse.de>
> Cc: Tom Lendacky <thomas.lendacky@amd.com>
> Cc: x86@kernel.org
> Cc: kvm@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> ---
>   arch/x86/include/asm/paravirt.h       | 10 +++++
>   arch/x86/include/asm/paravirt_types.h |  2 +
>   arch/x86/kernel/paravirt.c            |  1 +
>   arch/x86/mm/mem_encrypt.c             | 57 ++++++++++++++++++++++++++-
>   arch/x86/mm/pat/set_memory.c          |  7 ++++
>   5 files changed, 76 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
> index 694d8daf4983..8127b9c141bf 100644
> --- a/arch/x86/include/asm/paravirt.h
> +++ b/arch/x86/include/asm/paravirt.h
> @@ -78,6 +78,12 @@ static inline void paravirt_arch_exit_mmap(struct mm_struct *mm)
>   	PVOP_VCALL1(mmu.exit_mmap, mm);
>   }
>   
> +static inline void page_encryption_changed(unsigned long vaddr, int npages,
> +						bool enc)
> +{
> +	PVOP_VCALL3(mmu.page_encryption_changed, vaddr, npages, enc);
> +}
> +
>   #ifdef CONFIG_PARAVIRT_XXL
>   static inline void load_sp0(unsigned long sp0)
>   {
> @@ -946,6 +952,10 @@ static inline void paravirt_arch_dup_mmap(struct mm_struct *oldmm,
>   static inline void paravirt_arch_exit_mmap(struct mm_struct *mm)
>   {
>   }
> +
> +static inline void page_encryption_changed(unsigned long vaddr, int npages, bool enc)
> +{
> +}
>   #endif
>   #endif /* __ASSEMBLY__ */
>   #endif /* _ASM_X86_PARAVIRT_H */
> diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
> index 732f62e04ddb..03bfd515c59c 100644
> --- a/arch/x86/include/asm/paravirt_types.h
> +++ b/arch/x86/include/asm/paravirt_types.h
> @@ -215,6 +215,8 @@ struct pv_mmu_ops {
>   
>   	/* Hook for intercepting the destruction of an mm_struct. */
>   	void (*exit_mmap)(struct mm_struct *mm);
> +	void (*page_encryption_changed)(unsigned long vaddr, int npages,
> +					bool enc);
>   
>   #ifdef CONFIG_PARAVIRT_XXL
>   	struct paravirt_callee_save read_cr2;
> diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
> index c131ba4e70ef..840c02b23aeb 100644
> --- a/arch/x86/kernel/paravirt.c
> +++ b/arch/x86/kernel/paravirt.c
> @@ -367,6 +367,7 @@ struct paravirt_patch_template pv_ops = {
>   			(void (*)(struct mmu_gather *, void *))tlb_remove_page,
>   
>   	.mmu.exit_mmap		= paravirt_nop,
> +	.mmu.page_encryption_changed	= paravirt_nop,
>   
>   #ifdef CONFIG_PARAVIRT_XXL
>   	.mmu.read_cr2		= __PV_IS_CALLEE_SAVE(native_read_cr2),
> diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
> index f4bd4b431ba1..c9800fa811f6 100644
> --- a/arch/x86/mm/mem_encrypt.c
> +++ b/arch/x86/mm/mem_encrypt.c
> @@ -19,6 +19,7 @@
>   #include <linux/kernel.h>
>   #include <linux/bitops.h>
>   #include <linux/dma-mapping.h>
> +#include <linux/kvm_para.h>
>   
>   #include <asm/tlbflush.h>
>   #include <asm/fixmap.h>
> @@ -29,6 +30,7 @@
>   #include <asm/processor-flags.h>
>   #include <asm/msr.h>
>   #include <asm/cmdline.h>
> +#include <asm/kvm_para.h>
>   
>   #include "mm_internal.h"
>   
> @@ -196,6 +198,47 @@ void __init sme_early_init(void)
>   		swiotlb_force = SWIOTLB_FORCE;
>   }
>   
> +static void set_memory_enc_dec_hypercall(unsigned long vaddr, int npages,
> +					bool enc)
> +{
> +	unsigned long sz = npages << PAGE_SHIFT;
> +	unsigned long vaddr_end, vaddr_next;
> +
> +	vaddr_end = vaddr + sz;
> +
> +	for (; vaddr < vaddr_end; vaddr = vaddr_next) {
> +		int psize, pmask, level;
> +		unsigned long pfn;
> +		pte_t *kpte;
> +
> +		kpte = lookup_address(vaddr, &level);
> +		if (!kpte || pte_none(*kpte))
> +			return;
> +
> +		switch (level) {
> +		case PG_LEVEL_4K:
> +			pfn = pte_pfn(*kpte);
> +			break;
> +		case PG_LEVEL_2M:
> +			pfn = pmd_pfn(*(pmd_t *)kpte);
> +			break;
> +		case PG_LEVEL_1G:
> +			pfn = pud_pfn(*(pud_t *)kpte);
> +			break;
> +		default:
> +			return;
> +		}


Is it possible to re-use the code in __set_clr_pte_enc() ?

> +
> +		psize = page_level_size(level);
> +		pmask = page_level_mask(level);
> +
> +		kvm_sev_hypercall3(KVM_HC_PAGE_ENC_STATUS,
> +				   pfn << PAGE_SHIFT, psize >> PAGE_SHIFT, enc);
> +
> +		vaddr_next = (vaddr & pmask) + psize;
> +	}
> +}
> +
>   static void __init __set_clr_pte_enc(pte_t *kpte, int level, bool enc)
>   {
>   	pgprot_t old_prot, new_prot;
> @@ -253,12 +296,13 @@ static void __init __set_clr_pte_enc(pte_t *kpte, int level, bool enc)
>   static int __init early_set_memory_enc_dec(unsigned long vaddr,
>   					   unsigned long size, bool enc)
>   {
> -	unsigned long vaddr_end, vaddr_next;
> +	unsigned long vaddr_end, vaddr_next, start;
>   	unsigned long psize, pmask;
>   	int split_page_size_mask;
>   	int level, ret;
>   	pte_t *kpte;
>   
> +	start = vaddr;
>   	vaddr_next = vaddr;
>   	vaddr_end = vaddr + size;
>   
> @@ -313,6 +357,8 @@ static int __init early_set_memory_enc_dec(unsigned long vaddr,
>   
>   	ret = 0;
>   
> +	set_memory_enc_dec_hypercall(start, PAGE_ALIGN(size) >> PAGE_SHIFT,
> +					enc);


If I haven't missed anything, it seems early_set_memory_encrypted() 
doesn't have a caller. So is there a possibility that we can end up 
calling it in non-SEV context and hence do we need to have the 
sev_active() guard here ?

>   out:
>   	__flush_tlb_all();
>   	return ret;
> @@ -451,6 +497,15 @@ void __init mem_encrypt_init(void)
>   	if (sev_active())
>   		static_branch_enable(&sev_enable_key);
>   
> +#ifdef CONFIG_PARAVIRT
> +	/*
> +	 * With SEV, we need to make a hypercall when page encryption state is
> +	 * changed.
> +	 */
> +	if (sev_active())
> +		pv_ops.mmu.page_encryption_changed = set_memory_enc_dec_hypercall;
> +#endif
> +
>   	pr_info("AMD %s active\n",
>   		sev_active() ? "Secure Encrypted Virtualization (SEV)"
>   			     : "Secure Memory Encryption (SME)");
> diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
> index c4aedd00c1ba..86b7804129fc 100644
> --- a/arch/x86/mm/pat/set_memory.c
> +++ b/arch/x86/mm/pat/set_memory.c
> @@ -26,6 +26,7 @@
>   #include <asm/proto.h>
>   #include <asm/memtype.h>
>   #include <asm/set_memory.h>
> +#include <asm/paravirt.h>
>   
>   #include "../mm_internal.h"
>   
> @@ -1987,6 +1988,12 @@ static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
>   	 */
>   	cpa_flush(&cpa, 0);
>   
> +	/* Notify hypervisor that a given memory range is mapped encrypted
> +	 * or decrypted. The hypervisor will use this information during the
> +	 * VM migration.
> +	 */
> +	page_encryption_changed(addr, numpages, enc);
> +
>   	return ret;
>   }
>   


* Re: [PATCH v6 11/14] KVM: x86: Introduce KVM_SET_PAGE_ENC_BITMAP ioctl
  2020-03-30  6:22 ` [PATCH v6 11/14] KVM: x86: Introduce KVM_SET_PAGE_ENC_BITMAP ioctl Ashish Kalra
@ 2020-04-03 21:10   ` Krish Sadhukhan
  2020-04-03 21:46   ` Venu Busireddy
  2020-04-08  0:26   ` Steve Rutherford
  2 siblings, 0 replies; 107+ messages in thread
From: Krish Sadhukhan @ 2020-04-03 21:10 UTC (permalink / raw)
  To: Ashish Kalra, pbonzini
  Cc: tglx, mingo, hpa, joro, bp, thomas.lendacky, x86, kvm,
	linux-kernel, rientjes, srutherford, luto, brijesh.singh


On 3/29/20 11:22 PM, Ashish Kalra wrote:
> From: Brijesh Singh <Brijesh.Singh@amd.com>
>
> The ioctl can be used to set page encryption bitmap for an
> incoming guest.
>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: "Radim Krčmář" <rkrcmar@redhat.com>
> Cc: Joerg Roedel <joro@8bytes.org>
> Cc: Borislav Petkov <bp@suse.de>
> Cc: Tom Lendacky <thomas.lendacky@amd.com>
> Cc: x86@kernel.org
> Cc: kvm@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> ---
>   Documentation/virt/kvm/api.rst  | 22 +++++++++++++++++
>   arch/x86/include/asm/kvm_host.h |  2 ++
>   arch/x86/kvm/svm.c              | 42 +++++++++++++++++++++++++++++++++
>   arch/x86/kvm/x86.c              | 12 ++++++++++
>   include/uapi/linux/kvm.h        |  1 +
>   5 files changed, 79 insertions(+)
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 8ad800ebb54f..4d1004a154f6 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -4675,6 +4675,28 @@ or shared. The bitmap can be used during the guest migration, if the page
>   is private then userspace need to use SEV migration commands to transmit
>   the page.
>   
> +4.126 KVM_SET_PAGE_ENC_BITMAP (vm ioctl)
> +---------------------------------------
> +
> +:Capability: basic
> +:Architectures: x86
> +:Type: vm ioctl
> +:Parameters: struct kvm_page_enc_bitmap (in/out)
> +:Returns: 0 on success, -1 on error
> +
> +/* for KVM_SET_PAGE_ENC_BITMAP */
> +struct kvm_page_enc_bitmap {
> +	__u64 start_gfn;
> +	__u64 num_pages;
> +	union {
> +		void __user *enc_bitmap; /* one bit per page */
> +		__u64 padding2;
> +	};
> +};
> +
> +During the guest live migration the outgoing guest exports its page encryption
> +bitmap, the KVM_SET_PAGE_ENC_BITMAP can be used to build the page encryption
> +bitmap for an incoming guest.
>   
>   5. The kvm_run structure
>   ========================
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 27e43e3ec9d8..d30f770aaaea 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1271,6 +1271,8 @@ struct kvm_x86_ops {
>   				  unsigned long sz, unsigned long mode);
>   	int (*get_page_enc_bitmap)(struct kvm *kvm,
>   				struct kvm_page_enc_bitmap *bmap);
> +	int (*set_page_enc_bitmap)(struct kvm *kvm,
> +				struct kvm_page_enc_bitmap *bmap);
>   };
>   
>   struct kvm_arch_async_pf {
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index bae783cd396a..313343a43045 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -7756,6 +7756,47 @@ static int svm_get_page_enc_bitmap(struct kvm *kvm,
>   	return ret;
>   }
>   
> +static int svm_set_page_enc_bitmap(struct kvm *kvm,
> +				   struct kvm_page_enc_bitmap *bmap)
> +{
> +	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> +	unsigned long gfn_start, gfn_end;
> +	unsigned long *bitmap;
> +	unsigned long sz, i;
> +	int ret;
> +
> +	if (!sev_guest(kvm))
> +		return -ENOTTY;
> +
> +	gfn_start = bmap->start_gfn;
> +	gfn_end = gfn_start + bmap->num_pages;


Same comment as the previous one. Do we continue if num_pages is zero ?

> +
> +	sz = ALIGN(bmap->num_pages, BITS_PER_LONG) / 8;
> +	bitmap = kmalloc(sz, GFP_KERNEL);
> +	if (!bitmap)
> +		return -ENOMEM;
> +
> +	ret = -EFAULT;
> +	if (copy_from_user(bitmap, bmap->enc_bitmap, sz))
> +		goto out;
> +
> +	mutex_lock(&kvm->lock);
> +	ret = sev_resize_page_enc_bitmap(kvm, gfn_end);
> +	if (ret)
> +		goto unlock;
> +
> +	i = gfn_start;
> +	for_each_clear_bit_from(i, bitmap, (gfn_end - gfn_start))
> +		clear_bit(i + gfn_start, sev->page_enc_bmap);
> +
> +	ret = 0;
> +unlock:
> +	mutex_unlock(&kvm->lock);
> +out:
> +	kfree(bitmap);
> +	return ret;
> +}
> +
>   static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
>   {
>   	struct kvm_sev_cmd sev_cmd;
> @@ -8161,6 +8202,7 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
>   
>   	.page_enc_status_hc = svm_page_enc_status_hc,
>   	.get_page_enc_bitmap = svm_get_page_enc_bitmap,
> +	.set_page_enc_bitmap = svm_set_page_enc_bitmap,
>   };
>   
>   static int __init svm_init(void)
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 3c3fea4e20b5..05e953b2ec61 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -5238,6 +5238,18 @@ long kvm_arch_vm_ioctl(struct file *filp,
>   			r = kvm_x86_ops->get_page_enc_bitmap(kvm, &bitmap);
>   		break;
>   	}
> +	case KVM_SET_PAGE_ENC_BITMAP: {
> +		struct kvm_page_enc_bitmap bitmap;
> +
> +		r = -EFAULT;
> +		if (copy_from_user(&bitmap, argp, sizeof(bitmap)))
> +			goto out;
> +
> +		r = -ENOTTY;
> +		if (kvm_x86_ops->set_page_enc_bitmap)
> +			r = kvm_x86_ops->set_page_enc_bitmap(kvm, &bitmap);
> +		break;
> +	}
>   	default:
>   		r = -ENOTTY;
>   	}
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index db1ebf85e177..b4b01d47e568 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1489,6 +1489,7 @@ struct kvm_enc_region {
>   #define KVM_S390_CLEAR_RESET	_IO(KVMIO,   0xc4)
>   
>   #define KVM_GET_PAGE_ENC_BITMAP	_IOW(KVMIO, 0xc5, struct kvm_page_enc_bitmap)
> +#define KVM_SET_PAGE_ENC_BITMAP	_IOW(KVMIO, 0xc6, struct kvm_page_enc_bitmap)
>   
>   /* Secure Encrypted Virtualization command */
>   enum sev_cmd_id {
Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>


* Re: [PATCH v6 12/14] KVM: x86: Introduce KVM_PAGE_ENC_BITMAP_RESET ioctl
  2020-03-30  6:23 ` [PATCH v6 12/14] KVM: x86: Introduce KVM_PAGE_ENC_BITMAP_RESET ioctl Ashish Kalra
@ 2020-04-03 21:14   ` Krish Sadhukhan
  2020-04-03 21:45     ` Ashish Kalra
  2020-04-03 22:01   ` Venu Busireddy
  1 sibling, 1 reply; 107+ messages in thread
From: Krish Sadhukhan @ 2020-04-03 21:14 UTC (permalink / raw)
  To: Ashish Kalra, pbonzini
  Cc: tglx, mingo, hpa, joro, bp, thomas.lendacky, x86, kvm,
	linux-kernel, rientjes, srutherford, luto, brijesh.singh


On 3/29/20 11:23 PM, Ashish Kalra wrote:
> From: Ashish Kalra <ashish.kalra@amd.com>
>
> This ioctl can be used by the application to reset the page
> encryption bitmap managed by the KVM driver. A typical usage
> for this ioctl is on VM reboot, on reboot, we must reinitialize
> the bitmap.
>
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> ---
>   Documentation/virt/kvm/api.rst  | 13 +++++++++++++
>   arch/x86/include/asm/kvm_host.h |  1 +
>   arch/x86/kvm/svm.c              | 16 ++++++++++++++++
>   arch/x86/kvm/x86.c              |  6 ++++++
>   include/uapi/linux/kvm.h        |  1 +
>   5 files changed, 37 insertions(+)
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 4d1004a154f6..a11326ccc51d 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -4698,6 +4698,19 @@ During the guest live migration the outgoing guest exports its page encryption
>   bitmap, the KVM_SET_PAGE_ENC_BITMAP can be used to build the page encryption
>   bitmap for an incoming guest.
>   
> +4.127 KVM_PAGE_ENC_BITMAP_RESET (vm ioctl)
> +-----------------------------------------
> +
> +:Capability: basic
> +:Architectures: x86
> +:Type: vm ioctl
> +:Parameters: none
> +:Returns: 0 on success, -1 on error
> +
> +The KVM_PAGE_ENC_BITMAP_RESET is used to reset the guest's page encryption
> +bitmap during guest reboot and this is only done on the guest's boot vCPU.
> +
> +
>   5. The kvm_run structure
>   ========================
>   
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index d30f770aaaea..a96ef6338cd2 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1273,6 +1273,7 @@ struct kvm_x86_ops {
>   				struct kvm_page_enc_bitmap *bmap);
>   	int (*set_page_enc_bitmap)(struct kvm *kvm,
>   				struct kvm_page_enc_bitmap *bmap);
> +	int (*reset_page_enc_bitmap)(struct kvm *kvm);
>   };
>   
>   struct kvm_arch_async_pf {
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 313343a43045..c99b0207a443 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -7797,6 +7797,21 @@ static int svm_set_page_enc_bitmap(struct kvm *kvm,
>   	return ret;
>   }
>   
> +static int svm_reset_page_enc_bitmap(struct kvm *kvm)
> +{
> +	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> +
> +	if (!sev_guest(kvm))
> +		return -ENOTTY;
> +
> +	mutex_lock(&kvm->lock);
> +	/* by default all pages should be marked encrypted */
> +	if (sev->page_enc_bmap_size)
> +		bitmap_fill(sev->page_enc_bmap, sev->page_enc_bmap_size);
> +	mutex_unlock(&kvm->lock);
> +	return 0;
> +}
> +
>   static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
>   {
>   	struct kvm_sev_cmd sev_cmd;
> @@ -8203,6 +8218,7 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
>   	.page_enc_status_hc = svm_page_enc_status_hc,
>   	.get_page_enc_bitmap = svm_get_page_enc_bitmap,
>   	.set_page_enc_bitmap = svm_set_page_enc_bitmap,
> +	.reset_page_enc_bitmap = svm_reset_page_enc_bitmap,


We don't need to initialize the intel ops to NULL ? It's not initialized 
in the previous patch either.

>   };
>   
>   static int __init svm_init(void)
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 05e953b2ec61..2127ed937f53 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -5250,6 +5250,12 @@ long kvm_arch_vm_ioctl(struct file *filp,
>   			r = kvm_x86_ops->set_page_enc_bitmap(kvm, &bitmap);
>   		break;
>   	}
> +	case KVM_PAGE_ENC_BITMAP_RESET: {
> +		r = -ENOTTY;
> +		if (kvm_x86_ops->reset_page_enc_bitmap)
> +			r = kvm_x86_ops->reset_page_enc_bitmap(kvm);
> +		break;
> +	}
>   	default:
>   		r = -ENOTTY;
>   	}
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index b4b01d47e568..0884a581fc37 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1490,6 +1490,7 @@ struct kvm_enc_region {
>   
>   #define KVM_GET_PAGE_ENC_BITMAP	_IOW(KVMIO, 0xc5, struct kvm_page_enc_bitmap)
>   #define KVM_SET_PAGE_ENC_BITMAP	_IOW(KVMIO, 0xc6, struct kvm_page_enc_bitmap)
> +#define KVM_PAGE_ENC_BITMAP_RESET	_IO(KVMIO, 0xc7)
>   
>   /* Secure Encrypted Virtualization command */
>   enum sev_cmd_id {
Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>


* Re: [PATCH v6 10/14] mm: x86: Invoke hypercall when page encryption status is changed
  2020-04-03 21:07   ` Krish Sadhukhan
@ 2020-04-03 21:30     ` Ashish Kalra
  0 siblings, 0 replies; 107+ messages in thread
From: Ashish Kalra @ 2020-04-03 21:30 UTC (permalink / raw)
  To: Krish Sadhukhan
  Cc: pbonzini, tglx, mingo, hpa, joro, bp, thomas.lendacky, x86, kvm,
	linux-kernel, rientjes, srutherford, luto, brijesh.singh

On Fri, Apr 03, 2020 at 02:07:02PM -0700, Krish Sadhukhan wrote:
> 
> On 3/29/20 11:22 PM, Ashish Kalra wrote:
> > From: Brijesh Singh <Brijesh.Singh@amd.com>
> > 
> > Invoke a hypercall when a memory region is changed from encrypted ->
> > decrypted and vice versa. Hypervisor need to know the page encryption
> > status during the guest migration.
> > 
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Ingo Molnar <mingo@redhat.com>
> > Cc: "H. Peter Anvin" <hpa@zytor.com>
> > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > Cc: "Radim Krčmář" <rkrcmar@redhat.com>
> > Cc: Joerg Roedel <joro@8bytes.org>
> > Cc: Borislav Petkov <bp@suse.de>
> > Cc: Tom Lendacky <thomas.lendacky@amd.com>
> > Cc: x86@kernel.org
> > Cc: kvm@vger.kernel.org
> > Cc: linux-kernel@vger.kernel.org
> > Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> > Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> > ---
> >   arch/x86/include/asm/paravirt.h       | 10 +++++
> >   arch/x86/include/asm/paravirt_types.h |  2 +
> >   arch/x86/kernel/paravirt.c            |  1 +
> >   arch/x86/mm/mem_encrypt.c             | 57 ++++++++++++++++++++++++++-
> >   arch/x86/mm/pat/set_memory.c          |  7 ++++
> >   5 files changed, 76 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
> > index 694d8daf4983..8127b9c141bf 100644
> > --- a/arch/x86/include/asm/paravirt.h
> > +++ b/arch/x86/include/asm/paravirt.h
> > @@ -78,6 +78,12 @@ static inline void paravirt_arch_exit_mmap(struct mm_struct *mm)
> >   	PVOP_VCALL1(mmu.exit_mmap, mm);
> >   }
> > +static inline void page_encryption_changed(unsigned long vaddr, int npages,
> > +						bool enc)
> > +{
> > +	PVOP_VCALL3(mmu.page_encryption_changed, vaddr, npages, enc);
> > +}
> > +
> >   #ifdef CONFIG_PARAVIRT_XXL
> >   static inline void load_sp0(unsigned long sp0)
> >   {
> > @@ -946,6 +952,10 @@ static inline void paravirt_arch_dup_mmap(struct mm_struct *oldmm,
> >   static inline void paravirt_arch_exit_mmap(struct mm_struct *mm)
> >   {
> >   }
> > +
> > +static inline void page_encryption_changed(unsigned long vaddr, int npages, bool enc)
> > +{
> > +}
> >   #endif
> >   #endif /* __ASSEMBLY__ */
> >   #endif /* _ASM_X86_PARAVIRT_H */
> > diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
> > index 732f62e04ddb..03bfd515c59c 100644
> > --- a/arch/x86/include/asm/paravirt_types.h
> > +++ b/arch/x86/include/asm/paravirt_types.h
> > @@ -215,6 +215,8 @@ struct pv_mmu_ops {
> >   	/* Hook for intercepting the destruction of an mm_struct. */
> >   	void (*exit_mmap)(struct mm_struct *mm);
> > +	void (*page_encryption_changed)(unsigned long vaddr, int npages,
> > +					bool enc);
> >   #ifdef CONFIG_PARAVIRT_XXL
> >   	struct paravirt_callee_save read_cr2;
> > diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
> > index c131ba4e70ef..840c02b23aeb 100644
> > --- a/arch/x86/kernel/paravirt.c
> > +++ b/arch/x86/kernel/paravirt.c
> > @@ -367,6 +367,7 @@ struct paravirt_patch_template pv_ops = {
> >   			(void (*)(struct mmu_gather *, void *))tlb_remove_page,
> >   	.mmu.exit_mmap		= paravirt_nop,
> > +	.mmu.page_encryption_changed	= paravirt_nop,
> >   #ifdef CONFIG_PARAVIRT_XXL
> >   	.mmu.read_cr2		= __PV_IS_CALLEE_SAVE(native_read_cr2),
> > diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
> > index f4bd4b431ba1..c9800fa811f6 100644
> > --- a/arch/x86/mm/mem_encrypt.c
> > +++ b/arch/x86/mm/mem_encrypt.c
> > @@ -19,6 +19,7 @@
> >   #include <linux/kernel.h>
> >   #include <linux/bitops.h>
> >   #include <linux/dma-mapping.h>
> > +#include <linux/kvm_para.h>
> >   #include <asm/tlbflush.h>
> >   #include <asm/fixmap.h>
> > @@ -29,6 +30,7 @@
> >   #include <asm/processor-flags.h>
> >   #include <asm/msr.h>
> >   #include <asm/cmdline.h>
> > +#include <asm/kvm_para.h>
> >   #include "mm_internal.h"
> > @@ -196,6 +198,47 @@ void __init sme_early_init(void)
> >   		swiotlb_force = SWIOTLB_FORCE;
> >   }
> > +static void set_memory_enc_dec_hypercall(unsigned long vaddr, int npages,
> > +					bool enc)
> > +{
> > +	unsigned long sz = npages << PAGE_SHIFT;
> > +	unsigned long vaddr_end, vaddr_next;
> > +
> > +	vaddr_end = vaddr + sz;
> > +
> > +	for (; vaddr < vaddr_end; vaddr = vaddr_next) {
> > +		int psize, pmask, level;
> > +		unsigned long pfn;
> > +		pte_t *kpte;
> > +
> > +		kpte = lookup_address(vaddr, &level);
> > +		if (!kpte || pte_none(*kpte))
> > +			return;
> > +
> > +		switch (level) {
> > +		case PG_LEVEL_4K:
> > +			pfn = pte_pfn(*kpte);
> > +			break;
> > +		case PG_LEVEL_2M:
> > +			pfn = pmd_pfn(*(pmd_t *)kpte);
> > +			break;
> > +		case PG_LEVEL_1G:
> > +			pfn = pud_pfn(*(pud_t *)kpte);
> > +			break;
> > +		default:
> > +			return;
> > +		}
> 
> 
> Is it possible to re-use the code in __set_clr_pte_enc() ?
> 
> > +
> > +		psize = page_level_size(level);
> > +		pmask = page_level_mask(level);
> > +
> > +		kvm_sev_hypercall3(KVM_HC_PAGE_ENC_STATUS,
> > +				   pfn << PAGE_SHIFT, psize >> PAGE_SHIFT, enc);
> > +
> > +		vaddr_next = (vaddr & pmask) + psize;
> > +	}
> > +}
> > +
> >   static void __init __set_clr_pte_enc(pte_t *kpte, int level, bool enc)
> >   {
> >   	pgprot_t old_prot, new_prot;
> > @@ -253,12 +296,13 @@ static void __init __set_clr_pte_enc(pte_t *kpte, int level, bool enc)
> >   static int __init early_set_memory_enc_dec(unsigned long vaddr,
> >   					   unsigned long size, bool enc)
> >   {
> > -	unsigned long vaddr_end, vaddr_next;
> > +	unsigned long vaddr_end, vaddr_next, start;
> >   	unsigned long psize, pmask;
> >   	int split_page_size_mask;
> >   	int level, ret;
> >   	pte_t *kpte;
> > +	start = vaddr;
> >   	vaddr_next = vaddr;
> >   	vaddr_end = vaddr + size;
> > @@ -313,6 +357,8 @@ static int __init early_set_memory_enc_dec(unsigned long vaddr,
> >   	ret = 0;
> > +	set_memory_enc_dec_hypercall(start, PAGE_ALIGN(size) >> PAGE_SHIFT,
> > +					enc);
> 
> 
> If I haven't missed anything, it seems early_set_memory_encrypted() doesn't
> have a caller. So is there a possibility that we can end up calling it in
> non-SEV context and hence do we need to have the sev_active() guard here ?
> 

As of now, early_set_memory_encrypted() has no caller, but
early_set_memory_decrypted() is used in __set_percpu_decrypted(), which
is only called under the sev_active() check.

Thanks,
Ashish

> >   out:
> >   	__flush_tlb_all();
> >   	return ret;
> > @@ -451,6 +497,15 @@ void __init mem_encrypt_init(void)
> >   	if (sev_active())
> >   		static_branch_enable(&sev_enable_key);
> > +#ifdef CONFIG_PARAVIRT
> > +	/*
> > +	 * With SEV, we need to make a hypercall when page encryption state is
> > +	 * changed.
> > +	 */
> > +	if (sev_active())
> > +		pv_ops.mmu.page_encryption_changed = set_memory_enc_dec_hypercall;
> > +#endif
> > +
> >   	pr_info("AMD %s active\n",
> >   		sev_active() ? "Secure Encrypted Virtualization (SEV)"
> >   			     : "Secure Memory Encryption (SME)");
> > diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
> > index c4aedd00c1ba..86b7804129fc 100644
> > --- a/arch/x86/mm/pat/set_memory.c
> > +++ b/arch/x86/mm/pat/set_memory.c
> > @@ -26,6 +26,7 @@
> >   #include <asm/proto.h>
> >   #include <asm/memtype.h>
> >   #include <asm/set_memory.h>
> > +#include <asm/paravirt.h>
> >   #include "../mm_internal.h"
> > @@ -1987,6 +1988,12 @@ static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
> >   	 */
> >   	cpa_flush(&cpa, 0);
> > +	/* Notify hypervisor that a given memory range is mapped encrypted
> > +	 * or decrypted. The hypervisor will use this information during the
> > +	 * VM migration.
> > +	 */
> > +	page_encryption_changed(addr, numpages, enc);
> > +
> >   	return ret;
> >   }

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v6 10/14] mm: x86: Invoke hypercall when page encryption status is changed
  2020-03-30  6:22 ` [PATCH v6 10/14] mm: x86: Invoke hypercall when page encryption status is changed Ashish Kalra
  2020-04-03 21:07   ` Krish Sadhukhan
@ 2020-04-03 21:36   ` Venu Busireddy
  1 sibling, 0 replies; 107+ messages in thread
From: Venu Busireddy @ 2020-04-03 21:36 UTC (permalink / raw)
  To: Ashish Kalra
  Cc: pbonzini, tglx, mingo, hpa, joro, bp, thomas.lendacky, x86, kvm,
	linux-kernel, rientjes, srutherford, luto, brijesh.singh

On 2020-03-30 06:22:38 +0000, Ashish Kalra wrote:
> From: Brijesh Singh <Brijesh.Singh@amd.com>
> 
> Invoke a hypercall when a memory region is changed from encrypted ->
> decrypted and vice versa. Hypervisor need to know the page encryption
s/need/needs/
> status during the guest migration.
> 
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: "Radim Krčmář" <rkrcmar@redhat.com>
> Cc: Joerg Roedel <joro@8bytes.org>
> Cc: Borislav Petkov <bp@suse.de>
> Cc: Tom Lendacky <thomas.lendacky@amd.com>
> Cc: x86@kernel.org
> Cc: kvm@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>

Reviewed-by: Venu Busireddy <venu.busireddy@oracle.com>

> ---
>  arch/x86/include/asm/paravirt.h       | 10 +++++
>  arch/x86/include/asm/paravirt_types.h |  2 +
>  arch/x86/kernel/paravirt.c            |  1 +
>  arch/x86/mm/mem_encrypt.c             | 57 ++++++++++++++++++++++++++-
>  arch/x86/mm/pat/set_memory.c          |  7 ++++
>  5 files changed, 76 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
> index 694d8daf4983..8127b9c141bf 100644
> --- a/arch/x86/include/asm/paravirt.h
> +++ b/arch/x86/include/asm/paravirt.h
> @@ -78,6 +78,12 @@ static inline void paravirt_arch_exit_mmap(struct mm_struct *mm)
>  	PVOP_VCALL1(mmu.exit_mmap, mm);
>  }
>  
> +static inline void page_encryption_changed(unsigned long vaddr, int npages,
> +						bool enc)
> +{
> +	PVOP_VCALL3(mmu.page_encryption_changed, vaddr, npages, enc);
> +}
> +
>  #ifdef CONFIG_PARAVIRT_XXL
>  static inline void load_sp0(unsigned long sp0)
>  {
> @@ -946,6 +952,10 @@ static inline void paravirt_arch_dup_mmap(struct mm_struct *oldmm,
>  static inline void paravirt_arch_exit_mmap(struct mm_struct *mm)
>  {
>  }
> +
> +static inline void page_encryption_changed(unsigned long vaddr, int npages, bool enc)
> +{
> +}
>  #endif
>  #endif /* __ASSEMBLY__ */
>  #endif /* _ASM_X86_PARAVIRT_H */
> diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
> index 732f62e04ddb..03bfd515c59c 100644
> --- a/arch/x86/include/asm/paravirt_types.h
> +++ b/arch/x86/include/asm/paravirt_types.h
> @@ -215,6 +215,8 @@ struct pv_mmu_ops {
>  
>  	/* Hook for intercepting the destruction of an mm_struct. */
>  	void (*exit_mmap)(struct mm_struct *mm);
> +	void (*page_encryption_changed)(unsigned long vaddr, int npages,
> +					bool enc);
>  
>  #ifdef CONFIG_PARAVIRT_XXL
>  	struct paravirt_callee_save read_cr2;
> diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
> index c131ba4e70ef..840c02b23aeb 100644
> --- a/arch/x86/kernel/paravirt.c
> +++ b/arch/x86/kernel/paravirt.c
> @@ -367,6 +367,7 @@ struct paravirt_patch_template pv_ops = {
>  			(void (*)(struct mmu_gather *, void *))tlb_remove_page,
>  
>  	.mmu.exit_mmap		= paravirt_nop,
> +	.mmu.page_encryption_changed	= paravirt_nop,
>  
>  #ifdef CONFIG_PARAVIRT_XXL
>  	.mmu.read_cr2		= __PV_IS_CALLEE_SAVE(native_read_cr2),
> diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
> index f4bd4b431ba1..c9800fa811f6 100644
> --- a/arch/x86/mm/mem_encrypt.c
> +++ b/arch/x86/mm/mem_encrypt.c
> @@ -19,6 +19,7 @@
>  #include <linux/kernel.h>
>  #include <linux/bitops.h>
>  #include <linux/dma-mapping.h>
> +#include <linux/kvm_para.h>
>  
>  #include <asm/tlbflush.h>
>  #include <asm/fixmap.h>
> @@ -29,6 +30,7 @@
>  #include <asm/processor-flags.h>
>  #include <asm/msr.h>
>  #include <asm/cmdline.h>
> +#include <asm/kvm_para.h>
>  
>  #include "mm_internal.h"
>  
> @@ -196,6 +198,47 @@ void __init sme_early_init(void)
>  		swiotlb_force = SWIOTLB_FORCE;
>  }
>  
> +static void set_memory_enc_dec_hypercall(unsigned long vaddr, int npages,
> +					bool enc)
> +{
> +	unsigned long sz = npages << PAGE_SHIFT;
> +	unsigned long vaddr_end, vaddr_next;
> +
> +	vaddr_end = vaddr + sz;
> +
> +	for (; vaddr < vaddr_end; vaddr = vaddr_next) {
> +		int psize, pmask, level;
> +		unsigned long pfn;
> +		pte_t *kpte;
> +
> +		kpte = lookup_address(vaddr, &level);
> +		if (!kpte || pte_none(*kpte))
> +			return;
> +
> +		switch (level) {
> +		case PG_LEVEL_4K:
> +			pfn = pte_pfn(*kpte);
> +			break;
> +		case PG_LEVEL_2M:
> +			pfn = pmd_pfn(*(pmd_t *)kpte);
> +			break;
> +		case PG_LEVEL_1G:
> +			pfn = pud_pfn(*(pud_t *)kpte);
> +			break;
> +		default:
> +			return;
> +		}
> +
> +		psize = page_level_size(level);
> +		pmask = page_level_mask(level);
> +
> +		kvm_sev_hypercall3(KVM_HC_PAGE_ENC_STATUS,
> +				   pfn << PAGE_SHIFT, psize >> PAGE_SHIFT, enc);
> +
> +		vaddr_next = (vaddr & pmask) + psize;
> +	}
> +}
> +
>  static void __init __set_clr_pte_enc(pte_t *kpte, int level, bool enc)
>  {
>  	pgprot_t old_prot, new_prot;
> @@ -253,12 +296,13 @@ static void __init __set_clr_pte_enc(pte_t *kpte, int level, bool enc)
>  static int __init early_set_memory_enc_dec(unsigned long vaddr,
>  					   unsigned long size, bool enc)
>  {
> -	unsigned long vaddr_end, vaddr_next;
> +	unsigned long vaddr_end, vaddr_next, start;
>  	unsigned long psize, pmask;
>  	int split_page_size_mask;
>  	int level, ret;
>  	pte_t *kpte;
>  
> +	start = vaddr;
>  	vaddr_next = vaddr;
>  	vaddr_end = vaddr + size;
>  
> @@ -313,6 +357,8 @@ static int __init early_set_memory_enc_dec(unsigned long vaddr,
>  
>  	ret = 0;
>  
> +	set_memory_enc_dec_hypercall(start, PAGE_ALIGN(size) >> PAGE_SHIFT,
> +					enc);
>  out:
>  	__flush_tlb_all();
>  	return ret;
> @@ -451,6 +497,15 @@ void __init mem_encrypt_init(void)
>  	if (sev_active())
>  		static_branch_enable(&sev_enable_key);
>  
> +#ifdef CONFIG_PARAVIRT
> +	/*
> +	 * With SEV, we need to make a hypercall when page encryption state is
> +	 * changed.
> +	 */
> +	if (sev_active())
> +		pv_ops.mmu.page_encryption_changed = set_memory_enc_dec_hypercall;
> +#endif
> +
>  	pr_info("AMD %s active\n",
>  		sev_active() ? "Secure Encrypted Virtualization (SEV)"
>  			     : "Secure Memory Encryption (SME)");
> diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
> index c4aedd00c1ba..86b7804129fc 100644
> --- a/arch/x86/mm/pat/set_memory.c
> +++ b/arch/x86/mm/pat/set_memory.c
> @@ -26,6 +26,7 @@
>  #include <asm/proto.h>
>  #include <asm/memtype.h>
>  #include <asm/set_memory.h>
> +#include <asm/paravirt.h>
>  
>  #include "../mm_internal.h"
>  
> @@ -1987,6 +1988,12 @@ static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
>  	 */
>  	cpa_flush(&cpa, 0);
>  
> +	/* Notify hypervisor that a given memory range is mapped encrypted
> +	 * or decrypted. The hypervisor will use this information during the
> +	 * VM migration.
> +	 */
> +	page_encryption_changed(addr, numpages, enc);
> +
>  	return ret;
>  }
>  
> -- 
> 2.17.1
> 

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v6 12/14] KVM: x86: Introduce KVM_PAGE_ENC_BITMAP_RESET ioctl
  2020-04-03 21:14   ` Krish Sadhukhan
@ 2020-04-03 21:45     ` Ashish Kalra
  2020-04-06 18:52       ` Krish Sadhukhan
  0 siblings, 1 reply; 107+ messages in thread
From: Ashish Kalra @ 2020-04-03 21:45 UTC (permalink / raw)
  To: Krish Sadhukhan
  Cc: pbonzini, tglx, mingo, hpa, joro, bp, thomas.lendacky, x86, kvm,
	linux-kernel, rientjes, srutherford, luto, brijesh.singh

On Fri, Apr 03, 2020 at 02:14:23PM -0700, Krish Sadhukhan wrote:
> 
> On 3/29/20 11:23 PM, Ashish Kalra wrote:
> > From: Ashish Kalra <ashish.kalra@amd.com>
> > 
> > This ioctl can be used by the application to reset the page
> > encryption bitmap managed by the KVM driver. A typical usage
> > for this ioctl is on VM reboot, on reboot, we must reinitialize
> > the bitmap.
> > 
> > Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> > ---
> >   Documentation/virt/kvm/api.rst  | 13 +++++++++++++
> >   arch/x86/include/asm/kvm_host.h |  1 +
> >   arch/x86/kvm/svm.c              | 16 ++++++++++++++++
> >   arch/x86/kvm/x86.c              |  6 ++++++
> >   include/uapi/linux/kvm.h        |  1 +
> >   5 files changed, 37 insertions(+)
> > 
> > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > index 4d1004a154f6..a11326ccc51d 100644
> > --- a/Documentation/virt/kvm/api.rst
> > +++ b/Documentation/virt/kvm/api.rst
> > @@ -4698,6 +4698,19 @@ During the guest live migration the outgoing guest exports its page encryption
> >   bitmap, the KVM_SET_PAGE_ENC_BITMAP can be used to build the page encryption
> >   bitmap for an incoming guest.
> > +4.127 KVM_PAGE_ENC_BITMAP_RESET (vm ioctl)
> > +-----------------------------------------
> > +
> > +:Capability: basic
> > +:Architectures: x86
> > +:Type: vm ioctl
> > +:Parameters: none
> > +:Returns: 0 on success, -1 on error
> > +
> > +The KVM_PAGE_ENC_BITMAP_RESET is used to reset the guest's page encryption
> > +bitmap during guest reboot and this is only done on the guest's boot vCPU.
> > +
> > +
> >   5. The kvm_run structure
> >   ========================
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index d30f770aaaea..a96ef6338cd2 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -1273,6 +1273,7 @@ struct kvm_x86_ops {
> >   				struct kvm_page_enc_bitmap *bmap);
> >   	int (*set_page_enc_bitmap)(struct kvm *kvm,
> >   				struct kvm_page_enc_bitmap *bmap);
> > +	int (*reset_page_enc_bitmap)(struct kvm *kvm);
> >   };
> >   struct kvm_arch_async_pf {
> > diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> > index 313343a43045..c99b0207a443 100644
> > --- a/arch/x86/kvm/svm.c
> > +++ b/arch/x86/kvm/svm.c
> > @@ -7797,6 +7797,21 @@ static int svm_set_page_enc_bitmap(struct kvm *kvm,
> >   	return ret;
> >   }
> > +static int svm_reset_page_enc_bitmap(struct kvm *kvm)
> > +{
> > +	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> > +
> > +	if (!sev_guest(kvm))
> > +		return -ENOTTY;
> > +
> > +	mutex_lock(&kvm->lock);
> > +	/* by default all pages should be marked encrypted */
> > +	if (sev->page_enc_bmap_size)
> > +		bitmap_fill(sev->page_enc_bmap, sev->page_enc_bmap_size);
> > +	mutex_unlock(&kvm->lock);
> > +	return 0;
> > +}
> > +
> >   static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
> >   {
> >   	struct kvm_sev_cmd sev_cmd;
> > @@ -8203,6 +8218,7 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
> >   	.page_enc_status_hc = svm_page_enc_status_hc,
> >   	.get_page_enc_bitmap = svm_get_page_enc_bitmap,
> >   	.set_page_enc_bitmap = svm_set_page_enc_bitmap,
> > +	.reset_page_enc_bitmap = svm_reset_page_enc_bitmap,
> 
> 
> Don't we need to initialize the Intel ops to NULL? They aren't initialized
> in the previous patch either.
> 
> >   };

This struct has static storage duration, so won't the uninitialized
members be zero-initialized?

> >   static int __init svm_init(void)
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 05e953b2ec61..2127ed937f53 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -5250,6 +5250,12 @@ long kvm_arch_vm_ioctl(struct file *filp,
> >   			r = kvm_x86_ops->set_page_enc_bitmap(kvm, &bitmap);
> >   		break;
> >   	}
> > +	case KVM_PAGE_ENC_BITMAP_RESET: {
> > +		r = -ENOTTY;
> > +		if (kvm_x86_ops->reset_page_enc_bitmap)
> > +			r = kvm_x86_ops->reset_page_enc_bitmap(kvm);
> > +		break;
> > +	}
> >   	default:
> >   		r = -ENOTTY;
> >   	}
> > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> > index b4b01d47e568..0884a581fc37 100644
> > --- a/include/uapi/linux/kvm.h
> > +++ b/include/uapi/linux/kvm.h
> > @@ -1490,6 +1490,7 @@ struct kvm_enc_region {
> >   #define KVM_GET_PAGE_ENC_BITMAP	_IOW(KVMIO, 0xc5, struct kvm_page_enc_bitmap)
> >   #define KVM_SET_PAGE_ENC_BITMAP	_IOW(KVMIO, 0xc6, struct kvm_page_enc_bitmap)
> > +#define KVM_PAGE_ENC_BITMAP_RESET	_IO(KVMIO, 0xc7)
> >   /* Secure Encrypted Virtualization command */
> >   enum sev_cmd_id {
> Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v6 11/14] KVM: x86: Introduce KVM_SET_PAGE_ENC_BITMAP ioctl
  2020-03-30  6:22 ` [PATCH v6 11/14] KVM: x86: Introduce KVM_SET_PAGE_ENC_BITMAP ioctl Ashish Kalra
  2020-04-03 21:10   ` Krish Sadhukhan
@ 2020-04-03 21:46   ` Venu Busireddy
  2020-04-08  0:26   ` Steve Rutherford
  2 siblings, 0 replies; 107+ messages in thread
From: Venu Busireddy @ 2020-04-03 21:46 UTC (permalink / raw)
  To: Ashish Kalra
  Cc: pbonzini, tglx, mingo, hpa, joro, bp, thomas.lendacky, x86, kvm,
	linux-kernel, rientjes, srutherford, luto, brijesh.singh

On 2020-03-30 06:22:55 +0000, Ashish Kalra wrote:
> From: Brijesh Singh <Brijesh.Singh@amd.com>
> 
> The ioctl can be used to set page encryption bitmap for an
> incoming guest.
> 
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: "Radim Krčmář" <rkrcmar@redhat.com>
> Cc: Joerg Roedel <joro@8bytes.org>
> Cc: Borislav Petkov <bp@suse.de>
> Cc: Tom Lendacky <thomas.lendacky@amd.com>
> Cc: x86@kernel.org
> Cc: kvm@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>

Reviewed-by: Venu Busireddy <venu.busireddy@oracle.com>

> ---
>  Documentation/virt/kvm/api.rst  | 22 +++++++++++++++++
>  arch/x86/include/asm/kvm_host.h |  2 ++
>  arch/x86/kvm/svm.c              | 42 +++++++++++++++++++++++++++++++++
>  arch/x86/kvm/x86.c              | 12 ++++++++++
>  include/uapi/linux/kvm.h        |  1 +
>  5 files changed, 79 insertions(+)
> 
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 8ad800ebb54f..4d1004a154f6 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -4675,6 +4675,28 @@ or shared. The bitmap can be used during the guest migration, if the page
>  is private then userspace need to use SEV migration commands to transmit
>  the page.
>  
> +4.126 KVM_SET_PAGE_ENC_BITMAP (vm ioctl)
> +---------------------------------------
> +
> +:Capability: basic
> +:Architectures: x86
> +:Type: vm ioctl
> +:Parameters: struct kvm_page_enc_bitmap (in/out)
> +:Returns: 0 on success, -1 on error
> +
> +/* for KVM_SET_PAGE_ENC_BITMAP */
> +struct kvm_page_enc_bitmap {
> +	__u64 start_gfn;
> +	__u64 num_pages;
> +	union {
> +		void __user *enc_bitmap; /* one bit per page */
> +		__u64 padding2;
> +	};
> +};
> +
> +During the guest live migration the outgoing guest exports its page encryption
> +bitmap, the KVM_SET_PAGE_ENC_BITMAP can be used to build the page encryption
> +bitmap for an incoming guest.
>  
>  5. The kvm_run structure
>  ========================
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 27e43e3ec9d8..d30f770aaaea 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1271,6 +1271,8 @@ struct kvm_x86_ops {
>  				  unsigned long sz, unsigned long mode);
>  	int (*get_page_enc_bitmap)(struct kvm *kvm,
>  				struct kvm_page_enc_bitmap *bmap);
> +	int (*set_page_enc_bitmap)(struct kvm *kvm,
> +				struct kvm_page_enc_bitmap *bmap);
>  };
>  
>  struct kvm_arch_async_pf {
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index bae783cd396a..313343a43045 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -7756,6 +7756,47 @@ static int svm_get_page_enc_bitmap(struct kvm *kvm,
>  	return ret;
>  }
>  
> +static int svm_set_page_enc_bitmap(struct kvm *kvm,
> +				   struct kvm_page_enc_bitmap *bmap)
> +{
> +	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> +	unsigned long gfn_start, gfn_end;
> +	unsigned long *bitmap;
> +	unsigned long sz, i;
> +	int ret;
> +
> +	if (!sev_guest(kvm))
> +		return -ENOTTY;
> +
> +	gfn_start = bmap->start_gfn;
> +	gfn_end = gfn_start + bmap->num_pages;
> +
> +	sz = ALIGN(bmap->num_pages, BITS_PER_LONG) / 8;
> +	bitmap = kmalloc(sz, GFP_KERNEL);
> +	if (!bitmap)
> +		return -ENOMEM;
> +
> +	ret = -EFAULT;
> +	if (copy_from_user(bitmap, bmap->enc_bitmap, sz))
> +		goto out;
> +
> +	mutex_lock(&kvm->lock);
> +	ret = sev_resize_page_enc_bitmap(kvm, gfn_end);
> +	if (ret)
> +		goto unlock;
> +
> +	i = gfn_start;
> +	for_each_clear_bit_from(i, bitmap, (gfn_end - gfn_start))
> +		clear_bit(i + gfn_start, sev->page_enc_bmap);
> +
> +	ret = 0;
> +unlock:
> +	mutex_unlock(&kvm->lock);
> +out:
> +	kfree(bitmap);
> +	return ret;
> +}
> +
>  static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
>  {
>  	struct kvm_sev_cmd sev_cmd;
> @@ -8161,6 +8202,7 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
>  
>  	.page_enc_status_hc = svm_page_enc_status_hc,
>  	.get_page_enc_bitmap = svm_get_page_enc_bitmap,
> +	.set_page_enc_bitmap = svm_set_page_enc_bitmap,
>  };
>  
>  static int __init svm_init(void)
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 3c3fea4e20b5..05e953b2ec61 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -5238,6 +5238,18 @@ long kvm_arch_vm_ioctl(struct file *filp,
>  			r = kvm_x86_ops->get_page_enc_bitmap(kvm, &bitmap);
>  		break;
>  	}
> +	case KVM_SET_PAGE_ENC_BITMAP: {
> +		struct kvm_page_enc_bitmap bitmap;
> +
> +		r = -EFAULT;
> +		if (copy_from_user(&bitmap, argp, sizeof(bitmap)))
> +			goto out;
> +
> +		r = -ENOTTY;
> +		if (kvm_x86_ops->set_page_enc_bitmap)
> +			r = kvm_x86_ops->set_page_enc_bitmap(kvm, &bitmap);
> +		break;
> +	}
>  	default:
>  		r = -ENOTTY;
>  	}
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index db1ebf85e177..b4b01d47e568 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1489,6 +1489,7 @@ struct kvm_enc_region {
>  #define KVM_S390_CLEAR_RESET	_IO(KVMIO,   0xc4)
>  
>  #define KVM_GET_PAGE_ENC_BITMAP	_IOW(KVMIO, 0xc5, struct kvm_page_enc_bitmap)
> +#define KVM_SET_PAGE_ENC_BITMAP	_IOW(KVMIO, 0xc6, struct kvm_page_enc_bitmap)
>  
>  /* Secure Encrypted Virtualization command */
>  enum sev_cmd_id {
> -- 
> 2.17.1
> 

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v6 12/14] KVM: x86: Introduce KVM_PAGE_ENC_BITMAP_RESET ioctl
  2020-03-30  6:23 ` [PATCH v6 12/14] KVM: x86: Introduce KVM_PAGE_ENC_BITMAP_RESET ioctl Ashish Kalra
  2020-04-03 21:14   ` Krish Sadhukhan
@ 2020-04-03 22:01   ` Venu Busireddy
  1 sibling, 0 replies; 107+ messages in thread
From: Venu Busireddy @ 2020-04-03 22:01 UTC (permalink / raw)
  To: Ashish Kalra
  Cc: pbonzini, tglx, mingo, hpa, joro, bp, thomas.lendacky, x86, kvm,
	linux-kernel, rientjes, srutherford, luto, brijesh.singh

On 2020-03-30 06:23:10 +0000, Ashish Kalra wrote:
> From: Ashish Kalra <ashish.kalra@amd.com>
> 
> This ioctl can be used by the application to reset the page
> encryption bitmap managed by the KVM driver. A typical usage
> for this ioctl is on VM reboot, on reboot, we must reinitialize
> the bitmap.
> 
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>

Reviewed-by: Venu Busireddy <venu.busireddy@oracle.com>

> ---
>  Documentation/virt/kvm/api.rst  | 13 +++++++++++++
>  arch/x86/include/asm/kvm_host.h |  1 +
>  arch/x86/kvm/svm.c              | 16 ++++++++++++++++
>  arch/x86/kvm/x86.c              |  6 ++++++
>  include/uapi/linux/kvm.h        |  1 +
>  5 files changed, 37 insertions(+)
> 
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 4d1004a154f6..a11326ccc51d 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -4698,6 +4698,19 @@ During the guest live migration the outgoing guest exports its page encryption
>  bitmap, the KVM_SET_PAGE_ENC_BITMAP can be used to build the page encryption
>  bitmap for an incoming guest.
>  
> +4.127 KVM_PAGE_ENC_BITMAP_RESET (vm ioctl)
> +-----------------------------------------
> +
> +:Capability: basic
> +:Architectures: x86
> +:Type: vm ioctl
> +:Parameters: none
> +:Returns: 0 on success, -1 on error
> +
> +The KVM_PAGE_ENC_BITMAP_RESET is used to reset the guest's page encryption
> +bitmap during guest reboot and this is only done on the guest's boot vCPU.
> +
> +
>  5. The kvm_run structure
>  ========================
>  
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index d30f770aaaea..a96ef6338cd2 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1273,6 +1273,7 @@ struct kvm_x86_ops {
>  				struct kvm_page_enc_bitmap *bmap);
>  	int (*set_page_enc_bitmap)(struct kvm *kvm,
>  				struct kvm_page_enc_bitmap *bmap);
> +	int (*reset_page_enc_bitmap)(struct kvm *kvm);
>  };
>  
>  struct kvm_arch_async_pf {
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 313343a43045..c99b0207a443 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -7797,6 +7797,21 @@ static int svm_set_page_enc_bitmap(struct kvm *kvm,
>  	return ret;
>  }
>  
> +static int svm_reset_page_enc_bitmap(struct kvm *kvm)
> +{
> +	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> +
> +	if (!sev_guest(kvm))
> +		return -ENOTTY;
> +
> +	mutex_lock(&kvm->lock);
> +	/* by default all pages should be marked encrypted */
> +	if (sev->page_enc_bmap_size)
> +		bitmap_fill(sev->page_enc_bmap, sev->page_enc_bmap_size);
> +	mutex_unlock(&kvm->lock);
> +	return 0;
> +}
> +
>  static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
>  {
>  	struct kvm_sev_cmd sev_cmd;
> @@ -8203,6 +8218,7 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
>  	.page_enc_status_hc = svm_page_enc_status_hc,
>  	.get_page_enc_bitmap = svm_get_page_enc_bitmap,
>  	.set_page_enc_bitmap = svm_set_page_enc_bitmap,
> +	.reset_page_enc_bitmap = svm_reset_page_enc_bitmap,
>  };
>  
>  static int __init svm_init(void)
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 05e953b2ec61..2127ed937f53 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -5250,6 +5250,12 @@ long kvm_arch_vm_ioctl(struct file *filp,
>  			r = kvm_x86_ops->set_page_enc_bitmap(kvm, &bitmap);
>  		break;
>  	}
> +	case KVM_PAGE_ENC_BITMAP_RESET: {
> +		r = -ENOTTY;
> +		if (kvm_x86_ops->reset_page_enc_bitmap)
> +			r = kvm_x86_ops->reset_page_enc_bitmap(kvm);
> +		break;
> +	}
>  	default:
>  		r = -ENOTTY;
>  	}
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index b4b01d47e568..0884a581fc37 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1490,6 +1490,7 @@ struct kvm_enc_region {
>  
>  #define KVM_GET_PAGE_ENC_BITMAP	_IOW(KVMIO, 0xc5, struct kvm_page_enc_bitmap)
>  #define KVM_SET_PAGE_ENC_BITMAP	_IOW(KVMIO, 0xc6, struct kvm_page_enc_bitmap)
> +#define KVM_PAGE_ENC_BITMAP_RESET	_IO(KVMIO, 0xc7)
>  
>  /* Secure Encrypted Virtualization command */
>  enum sev_cmd_id {
> -- 
> 2.17.1
> 

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v6 13/14] KVM: x86: Introduce new KVM_FEATURE_SEV_LIVE_MIGRATION feature & Custom MSR.
  2020-03-30  6:23 ` [PATCH v6 13/14] KVM: x86: Introduce new KVM_FEATURE_SEV_LIVE_MIGRATION feature & Custom MSR Ashish Kalra
  2020-03-30 15:52   ` Brijesh Singh
@ 2020-04-03 23:46   ` Krish Sadhukhan
  1 sibling, 0 replies; 107+ messages in thread
From: Krish Sadhukhan @ 2020-04-03 23:46 UTC (permalink / raw)
  To: Ashish Kalra, pbonzini
  Cc: tglx, mingo, hpa, joro, bp, thomas.lendacky, x86, kvm,
	linux-kernel, rientjes, srutherford, luto, brijesh.singh


On 3/29/20 11:23 PM, Ashish Kalra wrote:
> From: Ashish Kalra <ashish.kalra@amd.com>
>
> Add new KVM_FEATURE_SEV_LIVE_MIGRATION feature for guest to check
> for host-side support for SEV live migration. Also add a new custom
> MSR_KVM_SEV_LIVE_MIG_EN for guest to enable the SEV live migration
> feature.
>
> Also, ensure that _bss_decrypted section is marked as decrypted in the
> page encryption bitmap.
>
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> ---
>   Documentation/virt/kvm/cpuid.rst     |  4 ++++
>   Documentation/virt/kvm/msr.rst       | 10 ++++++++++
>   arch/x86/include/asm/kvm_host.h      |  3 +++
>   arch/x86/include/uapi/asm/kvm_para.h |  5 +++++
>   arch/x86/kernel/kvm.c                |  4 ++++
>   arch/x86/kvm/cpuid.c                 |  3 ++-
>   arch/x86/kvm/svm.c                   |  5 +++++
>   arch/x86/kvm/x86.c                   |  7 +++++++
>   arch/x86/mm/mem_encrypt.c            | 14 +++++++++++++-
>   9 files changed, 53 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/virt/kvm/cpuid.rst b/Documentation/virt/kvm/cpuid.rst
> index 01b081f6e7ea..fcb191bb3016 100644
> --- a/Documentation/virt/kvm/cpuid.rst
> +++ b/Documentation/virt/kvm/cpuid.rst
> @@ -86,6 +86,10 @@ KVM_FEATURE_PV_SCHED_YIELD        13          guest checks this feature bit
>                                                 before using paravirtualized
>                                                 sched yield.
>   
> +KVM_FEATURE_SEV_LIVE_MIGRATION    14          guest checks this feature bit
> +                                              before enabling SEV live
> +                                              migration feature.
> +
>   KVM_FEATURE_CLOCSOURCE_STABLE_BIT 24          host will warn if no guest-side
>                                                 per-cpu warps are expeced in
>                                                 kvmclock
> diff --git a/Documentation/virt/kvm/msr.rst b/Documentation/virt/kvm/msr.rst
> index 33892036672d..7cd7786bbb03 100644
> --- a/Documentation/virt/kvm/msr.rst
> +++ b/Documentation/virt/kvm/msr.rst
> @@ -319,3 +319,13 @@ data:
>   
>   	KVM guests can request the host not to poll on HLT, for example if
>   	they are performing polling themselves.
> +
> +MSR_KVM_SEV_LIVE_MIG_EN:
> +        0x4b564d06
> +
> +	Control SEV Live Migration features.
> +
> +data:
> +        Bit 0 enables (1) or disables (0) host-side SEV Live Migration feature.
> +        Bit 1 enables (1) or disables (0) support for SEV Live Migration extensions.
> +        All other bits are reserved.
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index a96ef6338cd2..ad5faaed43c0 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -780,6 +780,9 @@ struct kvm_vcpu_arch {
>   
>   	u64 msr_kvm_poll_control;
>   
> +	/* SEV Live Migration MSR (AMD only) */
> +	u64 msr_kvm_sev_live_migration_flag;
> +
>   	/*
>   	 * Indicates the guest is trying to write a gfn that contains one or
>   	 * more of the PTEs used to translate the write itself, i.e. the access
> diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
> index 2a8e0b6b9805..d9d4953b42ad 100644
> --- a/arch/x86/include/uapi/asm/kvm_para.h
> +++ b/arch/x86/include/uapi/asm/kvm_para.h
> @@ -31,6 +31,7 @@
>   #define KVM_FEATURE_PV_SEND_IPI	11
>   #define KVM_FEATURE_POLL_CONTROL	12
>   #define KVM_FEATURE_PV_SCHED_YIELD	13
> +#define KVM_FEATURE_SEV_LIVE_MIGRATION	14
>   
>   #define KVM_HINTS_REALTIME      0
>   
> @@ -50,6 +51,7 @@
>   #define MSR_KVM_STEAL_TIME  0x4b564d03
>   #define MSR_KVM_PV_EOI_EN      0x4b564d04
>   #define MSR_KVM_POLL_CONTROL	0x4b564d05
> +#define MSR_KVM_SEV_LIVE_MIG_EN	0x4b564d06
>   
>   struct kvm_steal_time {
>   	__u64 steal;
> @@ -122,4 +124,7 @@ struct kvm_vcpu_pv_apf_data {
>   #define KVM_PV_EOI_ENABLED KVM_PV_EOI_MASK
>   #define KVM_PV_EOI_DISABLED 0x0
>   
> +#define KVM_SEV_LIVE_MIGRATION_ENABLED			(1 << 0)
> +#define KVM_SEV_LIVE_MIGRATION_EXTENSIONS_SUPPORTED	(1 << 1)
> +
>   #endif /* _UAPI_ASM_X86_KVM_PARA_H */
> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> index 6efe0410fb72..8fcee0b45231 100644
> --- a/arch/x86/kernel/kvm.c
> +++ b/arch/x86/kernel/kvm.c
> @@ -418,6 +418,10 @@ static void __init sev_map_percpu_data(void)
>   	if (!sev_active())
>   		return;
>   
> +	if (kvm_para_has_feature(KVM_FEATURE_SEV_LIVE_MIGRATION)) {
> +		wrmsrl(MSR_KVM_SEV_LIVE_MIG_EN, KVM_SEV_LIVE_MIGRATION_ENABLED);
> +	}
> +
>   	for_each_possible_cpu(cpu) {
>   		__set_percpu_decrypted(&per_cpu(apf_reason, cpu), sizeof(apf_reason));
>   		__set_percpu_decrypted(&per_cpu(steal_time, cpu), sizeof(steal_time));
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index b1c469446b07..74c8b2a7270c 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -716,7 +716,8 @@ static inline int __do_cpuid_func(struct kvm_cpuid_entry2 *entry, u32 function,
>   			     (1 << KVM_FEATURE_ASYNC_PF_VMEXIT) |
>   			     (1 << KVM_FEATURE_PV_SEND_IPI) |
>   			     (1 << KVM_FEATURE_POLL_CONTROL) |
> -			     (1 << KVM_FEATURE_PV_SCHED_YIELD);
> +			     (1 << KVM_FEATURE_PV_SCHED_YIELD) |
> +			     (1 << KVM_FEATURE_SEV_LIVE_MIGRATION);
>   
>   		if (sched_info_on())
>   			entry->eax |= (1 << KVM_FEATURE_STEAL_TIME);
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index c99b0207a443..60ddc242a133 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -7632,6 +7632,7 @@ static int svm_page_enc_status_hc(struct kvm *kvm, unsigned long gpa,
>   				  unsigned long npages, unsigned long enc)
>   {
>   	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> +	struct kvm_vcpu *vcpu = kvm->vcpus[0];
>   	kvm_pfn_t pfn_start, pfn_end;
>   	gfn_t gfn_start, gfn_end;
>   	int ret;
> @@ -7639,6 +7640,10 @@ static int svm_page_enc_status_hc(struct kvm *kvm, unsigned long gpa,
>   	if (!sev_guest(kvm))
>   		return -EINVAL;
>   
> +	if (!(vcpu->arch.msr_kvm_sev_live_migration_flag &
> +		KVM_SEV_LIVE_MIGRATION_ENABLED))
> +		return -ENOTTY;
> +
>   	if (!npages)
>   		return 0;
>   
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 2127ed937f53..82867b8798f8 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2880,6 +2880,10 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>   		vcpu->arch.msr_kvm_poll_control = data;
>   		break;
>   
> +	case MSR_KVM_SEV_LIVE_MIG_EN:
> +		vcpu->arch.msr_kvm_sev_live_migration_flag = data;
> +		break;
> +
>   	case MSR_IA32_MCG_CTL:
>   	case MSR_IA32_MCG_STATUS:
>   	case MSR_IA32_MC0_CTL ... MSR_IA32_MCx_CTL(KVM_MAX_MCE_BANKS) - 1:
> @@ -3126,6 +3130,9 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>   	case MSR_KVM_POLL_CONTROL:
>   		msr_info->data = vcpu->arch.msr_kvm_poll_control;
>   		break;
> +	case MSR_KVM_SEV_LIVE_MIG_EN:
> +		msr_info->data = vcpu->arch.msr_kvm_sev_live_migration_flag;
> +		break;
>   	case MSR_IA32_P5_MC_ADDR:
>   	case MSR_IA32_P5_MC_TYPE:
>   	case MSR_IA32_MCG_CAP:
> diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
> index c9800fa811f6..f6a841494845 100644
> --- a/arch/x86/mm/mem_encrypt.c
> +++ b/arch/x86/mm/mem_encrypt.c
> @@ -502,8 +502,20 @@ void __init mem_encrypt_init(void)
>   	 * With SEV, we need to make a hypercall when page encryption state is
>   	 * changed.
>   	 */
> -	if (sev_active())
> +	if (sev_active()) {
> +		unsigned long nr_pages;
> +
>   		pv_ops.mmu.page_encryption_changed = set_memory_enc_dec_hypercall;
> +
> +		/*
> +		 * Ensure that _bss_decrypted section is marked as decrypted in the
> +		 * page encryption bitmap.
> +		 */
> +		nr_pages = DIV_ROUND_UP(__end_bss_decrypted - __start_bss_decrypted,
> +			PAGE_SIZE);
> +		set_memory_enc_dec_hypercall((unsigned long)__start_bss_decrypted,
> +			nr_pages, 0);
> +	}
>   #endif
>   
>   	pr_info("AMD %s active\n",
Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v6 14/14] KVM: x86: Add kexec support for SEV Live Migration.
  2020-03-30  6:23 ` [PATCH v6 14/14] KVM: x86: Add kexec support for SEV Live Migration Ashish Kalra
  2020-03-30 16:00   ` Brijesh Singh
  2020-04-03 12:57   ` Dave Young
@ 2020-04-04  0:55   ` Krish Sadhukhan
  2020-04-04 21:57     ` Ashish Kalra
  2 siblings, 1 reply; 107+ messages in thread
From: Krish Sadhukhan @ 2020-04-04  0:55 UTC (permalink / raw)
  To: Ashish Kalra, pbonzini
  Cc: tglx, mingo, hpa, joro, bp, thomas.lendacky, x86, kvm,
	linux-kernel, rientjes, srutherford, luto, brijesh.singh


On 3/29/20 11:23 PM, Ashish Kalra wrote:
> From: Ashish Kalra <ashish.kalra@amd.com>
>
> Reset the host's page encryption bitmap related to kernel
> specific page encryption status settings before we load a
> new kernel by kexec. We cannot reset the complete
> page encryption bitmap here as we need to retain the
> UEFI/OVMF firmware specific settings.


Can the commit message mention why the host's page encryption bitmap needs 
to be reset? Since the theme of these patches is guest migration in the SEV 
context, it might be useful to mention why the host context comes in here.

>
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> ---
>   arch/x86/kernel/kvm.c | 28 ++++++++++++++++++++++++++++
>   1 file changed, 28 insertions(+)
>
> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> index 8fcee0b45231..ba6cce3c84af 100644
> --- a/arch/x86/kernel/kvm.c
> +++ b/arch/x86/kernel/kvm.c
> @@ -34,6 +34,7 @@
>   #include <asm/hypervisor.h>
>   #include <asm/tlb.h>
>   #include <asm/cpuidle_haltpoll.h>
> +#include <asm/e820/api.h>
>   
>   static int kvmapf = 1;
>   
> @@ -357,6 +358,33 @@ static void kvm_pv_guest_cpu_reboot(void *unused)
>   	 */
>   	if (kvm_para_has_feature(KVM_FEATURE_PV_EOI))
>   		wrmsrl(MSR_KVM_PV_EOI_EN, 0);
> +	/*
> +	 * Reset the host's page encryption bitmap related to kernel
> +	 * specific page encryption status settings before we load a
> +	 * new kernel by kexec. NOTE: We cannot reset the complete
> +	 * page encryption bitmap here as we need to retain the
> +	 * UEFI/OVMF firmware specific settings.
> +	 */
> +	if (kvm_para_has_feature(KVM_FEATURE_SEV_LIVE_MIGRATION) &&
> +		(smp_processor_id() == 0)) {
> +		unsigned long nr_pages;
> +		int i;
> +
> +		for (i = 0; i < e820_table->nr_entries; i++) {
> +			struct e820_entry *entry = &e820_table->entries[i];
> +			unsigned long start_pfn, end_pfn;
> +
> +			if (entry->type != E820_TYPE_RAM)
> +				continue;
> +
> +			start_pfn = entry->addr >> PAGE_SHIFT;
> +			end_pfn = (entry->addr + entry->size) >> PAGE_SHIFT;
> +			nr_pages = DIV_ROUND_UP(entry->size, PAGE_SIZE);
> +
> +			kvm_sev_hypercall3(KVM_HC_PAGE_ENC_STATUS,
> +				entry->addr, nr_pages, 1);
> +		}
> +	}
>   	kvm_pv_disable_apf();
>   	kvm_disable_steal_time();
>   }


* Re: [PATCH v6 14/14] KVM: x86: Add kexec support for SEV Live Migration.
  2020-04-04  0:55   ` Krish Sadhukhan
@ 2020-04-04 21:57     ` Ashish Kalra
  2020-04-06 18:37       ` Krish Sadhukhan
  0 siblings, 1 reply; 107+ messages in thread
From: Ashish Kalra @ 2020-04-04 21:57 UTC (permalink / raw)
  To: Krish Sadhukhan
  Cc: pbonzini, tglx, mingo, hpa, joro, bp, thomas.lendacky, x86, kvm,
	linux-kernel, rientjes, srutherford, luto, brijesh.singh

The host's page encryption bitmap is maintained for the guest to track the encrypted/decrypted
state of the guest pages; since the new kernel loaded by kexec assumes all pages are encrypted
by default, we need to explicitly mark all shared pages as encrypted again before rebooting into it.

On Fri, Apr 03, 2020 at 05:55:52PM -0700, Krish Sadhukhan wrote:
> 
> On 3/29/20 11:23 PM, Ashish Kalra wrote:
> > From: Ashish Kalra <ashish.kalra@amd.com>
> > 
> > Reset the host's page encryption bitmap related to kernel
> > specific page encryption status settings before we load a
> > new kernel by kexec. We cannot reset the complete
> > page encryption bitmap here as we need to retain the
> > UEFI/OVMF firmware specific settings.
> 
> 
> Can the commit message mention why host page encryption needs to be reset ?
> Since the theme of these patches is guest migration in-SEV context, it might
> be useful to mention why the host context comes in here.
> 
> > 
> > Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> > ---
> >   arch/x86/kernel/kvm.c | 28 ++++++++++++++++++++++++++++
> >   1 file changed, 28 insertions(+)
> > 
> > diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> > index 8fcee0b45231..ba6cce3c84af 100644
> > --- a/arch/x86/kernel/kvm.c
> > +++ b/arch/x86/kernel/kvm.c
> > @@ -34,6 +34,7 @@
> >   #include <asm/hypervisor.h>
> >   #include <asm/tlb.h>
> >   #include <asm/cpuidle_haltpoll.h>
> > +#include <asm/e820/api.h>
> >   static int kvmapf = 1;
> > @@ -357,6 +358,33 @@ static void kvm_pv_guest_cpu_reboot(void *unused)
> >   	 */
> >   	if (kvm_para_has_feature(KVM_FEATURE_PV_EOI))
> >   		wrmsrl(MSR_KVM_PV_EOI_EN, 0);
> > +	/*
> > +	 * Reset the host's page encryption bitmap related to kernel
> > +	 * specific page encryption status settings before we load a
> > +	 * new kernel by kexec. NOTE: We cannot reset the complete
> > +	 * page encryption bitmap here as we need to retain the
> > +	 * UEFI/OVMF firmware specific settings.
> > +	 */
> > +	if (kvm_para_has_feature(KVM_FEATURE_SEV_LIVE_MIGRATION) &&
> > +		(smp_processor_id() == 0)) {
> > +		unsigned long nr_pages;
> > +		int i;
> > +
> > +		for (i = 0; i < e820_table->nr_entries; i++) {
> > +			struct e820_entry *entry = &e820_table->entries[i];
> > +			unsigned long start_pfn, end_pfn;
> > +
> > +			if (entry->type != E820_TYPE_RAM)
> > +				continue;
> > +
> > +			start_pfn = entry->addr >> PAGE_SHIFT;
> > +			end_pfn = (entry->addr + entry->size) >> PAGE_SHIFT;
> > +			nr_pages = DIV_ROUND_UP(entry->size, PAGE_SIZE);
> > +
> > +			kvm_sev_hypercall3(KVM_HC_PAGE_ENC_STATUS,
> > +				entry->addr, nr_pages, 1);
> > +		}
> > +	}
> >   	kvm_pv_disable_apf();
> >   	kvm_disable_steal_time();
> >   }


* Re: [PATCH v6 14/14] KVM: x86: Add kexec support for SEV Live Migration.
  2020-04-04 21:57     ` Ashish Kalra
@ 2020-04-06 18:37       ` Krish Sadhukhan
  0 siblings, 0 replies; 107+ messages in thread
From: Krish Sadhukhan @ 2020-04-06 18:37 UTC (permalink / raw)
  To: Ashish Kalra
  Cc: pbonzini, tglx, mingo, hpa, joro, bp, thomas.lendacky, x86, kvm,
	linux-kernel, rientjes, srutherford, luto, brijesh.singh


On 4/4/20 2:57 PM, Ashish Kalra wrote:
> The host's page encryption bitmap is maintained for the guest to keep the encrypted/decrypted state
> of the guest pages, therefore we need to explicitly mark all shared pages as encrypted again before
> rebooting into the new guest kernel.
>
> On Fri, Apr 03, 2020 at 05:55:52PM -0700, Krish Sadhukhan wrote:
>> On 3/29/20 11:23 PM, Ashish Kalra wrote:
>>> From: Ashish Kalra <ashish.kalra@amd.com>
>>>
>>> Reset the host's page encryption bitmap related to kernel
>>> specific page encryption status settings before we load a
>>> new kernel by kexec. We cannot reset the complete
>>> page encryption bitmap here as we need to retain the
>>> UEFI/OVMF firmware specific settings.
>>
>> Can the commit message mention why host page encryption needs to be reset ?
>> Since the theme of these patches is guest migration in-SEV context, it might
>> be useful to mention why the host context comes in here.
>>
>>> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
>>> ---
>>>    arch/x86/kernel/kvm.c | 28 ++++++++++++++++++++++++++++
>>>    1 file changed, 28 insertions(+)
>>>
>>> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
>>> index 8fcee0b45231..ba6cce3c84af 100644
>>> --- a/arch/x86/kernel/kvm.c
>>> +++ b/arch/x86/kernel/kvm.c
>>> @@ -34,6 +34,7 @@
>>>    #include <asm/hypervisor.h>
>>>    #include <asm/tlb.h>
>>>    #include <asm/cpuidle_haltpoll.h>
>>> +#include <asm/e820/api.h>
>>>    static int kvmapf = 1;
>>> @@ -357,6 +358,33 @@ static void kvm_pv_guest_cpu_reboot(void *unused)
>>>    	 */
>>>    	if (kvm_para_has_feature(KVM_FEATURE_PV_EOI))
>>>    		wrmsrl(MSR_KVM_PV_EOI_EN, 0);
>>> +	/*
>>> +	 * Reset the host's page encryption bitmap related to kernel
>>> +	 * specific page encryption status settings before we load a
>>> +	 * new kernel by kexec. NOTE: We cannot reset the complete
>>> +	 * page encryption bitmap here as we need to retain the
>>> +	 * UEFI/OVMF firmware specific settings.
>>> +	 */
>>> +	if (kvm_para_has_feature(KVM_FEATURE_SEV_LIVE_MIGRATION) &&
>>> +		(smp_processor_id() == 0)) {
>>> +		unsigned long nr_pages;
>>> +		int i;
>>> +
>>> +		for (i = 0; i < e820_table->nr_entries; i++) {
>>> +			struct e820_entry *entry = &e820_table->entries[i];
>>> +			unsigned long start_pfn, end_pfn;
>>> +
>>> +			if (entry->type != E820_TYPE_RAM)
>>> +				continue;
>>> +
>>> +			start_pfn = entry->addr >> PAGE_SHIFT;
>>> +			end_pfn = (entry->addr + entry->size) >> PAGE_SHIFT;
>>> +			nr_pages = DIV_ROUND_UP(entry->size, PAGE_SIZE);
>>> +
>>> +			kvm_sev_hypercall3(KVM_HC_PAGE_ENC_STATUS,
>>> +				entry->addr, nr_pages, 1);
>>> +		}
>>> +	}
>>>    	kvm_pv_disable_apf();
>>>    	kvm_disable_steal_time();
>>>    }

Thanks for the explanation. It will certainly help one understand the 
context better if you add it to the commit message.

Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>



* Re: [PATCH v6 12/14] KVM: x86: Introduce KVM_PAGE_ENC_BITMAP_RESET ioctl
  2020-04-03 21:45     ` Ashish Kalra
@ 2020-04-06 18:52       ` Krish Sadhukhan
  2020-04-08  1:25         ` Steve Rutherford
  0 siblings, 1 reply; 107+ messages in thread
From: Krish Sadhukhan @ 2020-04-06 18:52 UTC (permalink / raw)
  To: Ashish Kalra
  Cc: pbonzini, tglx, mingo, hpa, joro, bp, thomas.lendacky, x86, kvm,
	linux-kernel, rientjes, srutherford, luto, brijesh.singh


On 4/3/20 2:45 PM, Ashish Kalra wrote:
> On Fri, Apr 03, 2020 at 02:14:23PM -0700, Krish Sadhukhan wrote:
>> On 3/29/20 11:23 PM, Ashish Kalra wrote:
>>> From: Ashish Kalra <ashish.kalra@amd.com>
>>>
>>> This ioctl can be used by the application to reset the page
>>> encryption bitmap managed by the KVM driver. A typical usage
>>> for this ioctl is on VM reboot, on reboot, we must reinitialize
>>> the bitmap.
>>>
>>> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
>>> ---
>>>    Documentation/virt/kvm/api.rst  | 13 +++++++++++++
>>>    arch/x86/include/asm/kvm_host.h |  1 +
>>>    arch/x86/kvm/svm.c              | 16 ++++++++++++++++
>>>    arch/x86/kvm/x86.c              |  6 ++++++
>>>    include/uapi/linux/kvm.h        |  1 +
>>>    5 files changed, 37 insertions(+)
>>>
>>> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
>>> index 4d1004a154f6..a11326ccc51d 100644
>>> --- a/Documentation/virt/kvm/api.rst
>>> +++ b/Documentation/virt/kvm/api.rst
>>> @@ -4698,6 +4698,19 @@ During the guest live migration the outgoing guest exports its page encryption
>>>    bitmap, the KVM_SET_PAGE_ENC_BITMAP can be used to build the page encryption
>>>    bitmap for an incoming guest.
>>> +4.127 KVM_PAGE_ENC_BITMAP_RESET (vm ioctl)
>>> +-----------------------------------------
>>> +
>>> +:Capability: basic
>>> +:Architectures: x86
>>> +:Type: vm ioctl
>>> +:Parameters: none
>>> +:Returns: 0 on success, -1 on error
>>> +
>>> +The KVM_PAGE_ENC_BITMAP_RESET is used to reset the guest's page encryption
>>> +bitmap during guest reboot and this is only done on the guest's boot vCPU.
>>> +
>>> +
>>>    5. The kvm_run structure
>>>    ========================
>>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>>> index d30f770aaaea..a96ef6338cd2 100644
>>> --- a/arch/x86/include/asm/kvm_host.h
>>> +++ b/arch/x86/include/asm/kvm_host.h
>>> @@ -1273,6 +1273,7 @@ struct kvm_x86_ops {
>>>    				struct kvm_page_enc_bitmap *bmap);
>>>    	int (*set_page_enc_bitmap)(struct kvm *kvm,
>>>    				struct kvm_page_enc_bitmap *bmap);
>>> +	int (*reset_page_enc_bitmap)(struct kvm *kvm);
>>>    };
>>>    struct kvm_arch_async_pf {
>>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
>>> index 313343a43045..c99b0207a443 100644
>>> --- a/arch/x86/kvm/svm.c
>>> +++ b/arch/x86/kvm/svm.c
>>> @@ -7797,6 +7797,21 @@ static int svm_set_page_enc_bitmap(struct kvm *kvm,
>>>    	return ret;
>>>    }
>>> +static int svm_reset_page_enc_bitmap(struct kvm *kvm)
>>> +{
>>> +	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
>>> +
>>> +	if (!sev_guest(kvm))
>>> +		return -ENOTTY;
>>> +
>>> +	mutex_lock(&kvm->lock);
>>> +	/* by default all pages should be marked encrypted */
>>> +	if (sev->page_enc_bmap_size)
>>> +		bitmap_fill(sev->page_enc_bmap, sev->page_enc_bmap_size);
>>> +	mutex_unlock(&kvm->lock);
>>> +	return 0;
>>> +}
>>> +
>>>    static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
>>>    {
>>>    	struct kvm_sev_cmd sev_cmd;
>>> @@ -8203,6 +8218,7 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
>>>    	.page_enc_status_hc = svm_page_enc_status_hc,
>>>    	.get_page_enc_bitmap = svm_get_page_enc_bitmap,
>>>    	.set_page_enc_bitmap = svm_set_page_enc_bitmap,
>>> +	.reset_page_enc_bitmap = svm_reset_page_enc_bitmap,
>>
>> We don't need to initialize the intel ops to NULL ? It's not initialized in
>> the previous patch either.
>>
>>>    };
> This struct is declared with static storage duration, so won't the
> non-initialized members be 0 ?


Correct, although I see that 'nested_enable_evmcs' is explicitly 
initialized. Perhaps we should maintain that convention.

>
>>>    static int __init svm_init(void)
>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>>> index 05e953b2ec61..2127ed937f53 100644
>>> --- a/arch/x86/kvm/x86.c
>>> +++ b/arch/x86/kvm/x86.c
>>> @@ -5250,6 +5250,12 @@ long kvm_arch_vm_ioctl(struct file *filp,
>>>    			r = kvm_x86_ops->set_page_enc_bitmap(kvm, &bitmap);
>>>    		break;
>>>    	}
>>> +	case KVM_PAGE_ENC_BITMAP_RESET: {
>>> +		r = -ENOTTY;
>>> +		if (kvm_x86_ops->reset_page_enc_bitmap)
>>> +			r = kvm_x86_ops->reset_page_enc_bitmap(kvm);
>>> +		break;
>>> +	}
>>>    	default:
>>>    		r = -ENOTTY;
>>>    	}
>>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>>> index b4b01d47e568..0884a581fc37 100644
>>> --- a/include/uapi/linux/kvm.h
>>> +++ b/include/uapi/linux/kvm.h
>>> @@ -1490,6 +1490,7 @@ struct kvm_enc_region {
>>>    #define KVM_GET_PAGE_ENC_BITMAP	_IOW(KVMIO, 0xc5, struct kvm_page_enc_bitmap)
>>>    #define KVM_SET_PAGE_ENC_BITMAP	_IOW(KVMIO, 0xc6, struct kvm_page_enc_bitmap)
>>> +#define KVM_PAGE_ENC_BITMAP_RESET	_IO(KVMIO, 0xc7)
>>>    /* Secure Encrypted Virtualization command */
>>>    enum sev_cmd_id {
>> Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>


* Re: [PATCH v6 09/14] KVM: x86: Introduce KVM_GET_PAGE_ENC_BITMAP ioctl
  2020-04-03 20:47     ` Ashish Kalra
@ 2020-04-06 22:07       ` Krish Sadhukhan
  0 siblings, 0 replies; 107+ messages in thread
From: Krish Sadhukhan @ 2020-04-06 22:07 UTC (permalink / raw)
  To: Ashish Kalra
  Cc: pbonzini, tglx, mingo, hpa, joro, bp, thomas.lendacky, x86, kvm,
	linux-kernel, rientjes, srutherford, luto, brijesh.singh


On 4/3/20 1:47 PM, Ashish Kalra wrote:
> On Fri, Apr 03, 2020 at 01:18:52PM -0700, Krish Sadhukhan wrote:
>> On 3/29/20 11:22 PM, Ashish Kalra wrote:
>>> From: Brijesh Singh <Brijesh.Singh@amd.com>
>>>
>>> The ioctl can be used to retrieve page encryption bitmap for a given
>>> gfn range.
>>>
>>> Return the correct bitmap as per the number of pages being requested
>>> by the user. Ensure that we only copy bmap->num_pages bytes in the
>>> userspace buffer, if bmap->num_pages is not byte aligned we read
>>> the trailing bits from the userspace and copy those bits as is.
>>>
>>> Cc: Thomas Gleixner <tglx@linutronix.de>
>>> Cc: Ingo Molnar <mingo@redhat.com>
>>> Cc: "H. Peter Anvin" <hpa@zytor.com>
>>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>>> Cc: "Radim Krčmář" <rkrcmar@redhat.com>
>>> Cc: Joerg Roedel <joro@8bytes.org>
>>> Cc: Borislav Petkov <bp@suse.de>
>>> Cc: Tom Lendacky <thomas.lendacky@amd.com>
>>> Cc: x86@kernel.org
>>> Cc: kvm@vger.kernel.org
>>> Cc: linux-kernel@vger.kernel.org
>>> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
>>> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
>>> ---
>>>    Documentation/virt/kvm/api.rst  | 27 +++++++++++++
>>>    arch/x86/include/asm/kvm_host.h |  2 +
>>>    arch/x86/kvm/svm.c              | 71 +++++++++++++++++++++++++++++++++
>>>    arch/x86/kvm/x86.c              | 12 ++++++
>>>    include/uapi/linux/kvm.h        | 12 ++++++
>>>    5 files changed, 124 insertions(+)
>>>
>>> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
>>> index ebd383fba939..8ad800ebb54f 100644
>>> --- a/Documentation/virt/kvm/api.rst
>>> +++ b/Documentation/virt/kvm/api.rst
>>> @@ -4648,6 +4648,33 @@ This ioctl resets VCPU registers and control structures according to
>>>    the clear cpu reset definition in the POP. However, the cpu is not put
>>>    into ESA mode. This reset is a superset of the initial reset.
>>> +4.125 KVM_GET_PAGE_ENC_BITMAP (vm ioctl)
>>> +---------------------------------------
>>> +
>>> +:Capability: basic
>>> +:Architectures: x86
>>> +:Type: vm ioctl
>>> +:Parameters: struct kvm_page_enc_bitmap (in/out)
>>> +:Returns: 0 on success, -1 on error
>>> +
>>> +/* for KVM_GET_PAGE_ENC_BITMAP */
>>> +struct kvm_page_enc_bitmap {
>>> +	__u64 start_gfn;
>>> +	__u64 num_pages;
>>> +	union {
>>> +		void __user *enc_bitmap; /* one bit per page */
>>> +		__u64 padding2;
>>> +	};
>>> +};
>>> +
>>> +The encrypted VMs have concept of private and shared pages. The private
>>> +page is encrypted with the guest-specific key, while shared page may
>>> +be encrypted with the hypervisor key. The KVM_GET_PAGE_ENC_BITMAP can
>>> +be used to get the bitmap indicating whether the guest page is private
>>> +or shared. The bitmap can be used during the guest migration, if the page
>>> +is private then userspace need to use SEV migration commands to transmit
>>> +the page.
>>> +
>>>    5. The kvm_run structure
>>>    ========================
>>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>>> index 90718fa3db47..27e43e3ec9d8 100644
>>> --- a/arch/x86/include/asm/kvm_host.h
>>> +++ b/arch/x86/include/asm/kvm_host.h
>>> @@ -1269,6 +1269,8 @@ struct kvm_x86_ops {
>>>    	int (*enable_direct_tlbflush)(struct kvm_vcpu *vcpu);
>>>    	int (*page_enc_status_hc)(struct kvm *kvm, unsigned long gpa,
>>>    				  unsigned long sz, unsigned long mode);
>>> +	int (*get_page_enc_bitmap)(struct kvm *kvm,
>>> +				struct kvm_page_enc_bitmap *bmap);
>>
>> Looking back at the previous patch, it seems that these two are basically
>> the setter/getter action for page encryption, though one is implemented as a
>> hypercall while the other as an ioctl. If we consider the setter/getter
>> aspect, isn't it better to have some sort of symmetry in the naming of the
>> ops ? For example,
>>
>>          set_page_enc_hc
>>
>>          get_page_enc_ioctl
>>
>>>    };
> These are named as per their usage: while page_enc_status_hc is a
> hypercall used by the guest to update the page encryption bitmap, the
> other two are ioctl interfaces used by Qemu (or a Qemu alternative) to
> get/set the page encryption bitmap, so they are named accordingly.


OK.

Please rename 'set_page_enc_hc' to 'set_page_enc_hypercall' to match 
'patch_hypercall'.


Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>

>
>>>    struct kvm_arch_async_pf {
>>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
>>> index 1d8beaf1bceb..bae783cd396a 100644
>>> --- a/arch/x86/kvm/svm.c
>>> +++ b/arch/x86/kvm/svm.c
>>> @@ -7686,6 +7686,76 @@ static int svm_page_enc_status_hc(struct kvm *kvm, unsigned long gpa,
>>>    	return ret;
>>>    }
>>> +static int svm_get_page_enc_bitmap(struct kvm *kvm,
>>> +				   struct kvm_page_enc_bitmap *bmap)
>>> +{
>>> +	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
>>> +	unsigned long gfn_start, gfn_end;
>>> +	unsigned long sz, i, sz_bytes;
>>> +	unsigned long *bitmap;
>>> +	int ret, n;
>>> +
>>> +	if (!sev_guest(kvm))
>>> +		return -ENOTTY;
>>> +
>>> +	gfn_start = bmap->start_gfn;
>>
>> What if bmap->start_gfn is junk ?
>>
>>> +	gfn_end = gfn_start + bmap->num_pages;
>>> +
>>> +	sz = ALIGN(bmap->num_pages, BITS_PER_LONG) / BITS_PER_BYTE;
>>> +	bitmap = kmalloc(sz, GFP_KERNEL);
>>> +	if (!bitmap)
>>> +		return -ENOMEM;
>>> +
>>> +	/* by default all pages are marked encrypted */
>>> +	memset(bitmap, 0xff, sz);
>>> +
>>> +	mutex_lock(&kvm->lock);
>>> +	if (sev->page_enc_bmap) {
>>> +		i = gfn_start;
>>> +		for_each_clear_bit_from(i, sev->page_enc_bmap,
>>> +				      min(sev->page_enc_bmap_size, gfn_end))
>>> +			clear_bit(i - gfn_start, bitmap);
>>> +	}
>>> +	mutex_unlock(&kvm->lock);
>>> +
>>> +	ret = -EFAULT;
>>> +
>>> +	n = bmap->num_pages % BITS_PER_BYTE;
>>> +	sz_bytes = ALIGN(bmap->num_pages, BITS_PER_BYTE) / BITS_PER_BYTE;
>>> +
>>> +	/*
>>> +	 * Return the correct bitmap as per the number of pages being
>>> +	 * requested by the user. Ensure that we only copy bmap->num_pages
>>> +	 * bytes in the userspace buffer, if bmap->num_pages is not byte
>>> +	 * aligned we read the trailing bits from the userspace and copy
>>> +	 * those bits as is.
>>> +	 */
>>> +
>>> +	if (n) {
>>
>> Is it better to check for 'num_pages' at the beginning of the function
>> rather than coming this far if bmap->num_pages is zero ?
>>
> This is not checking "num_pages" itself; it is checking whether
> bmap->num_pages is byte-aligned.
>
>>> +		unsigned char *bitmap_kernel = (unsigned char *)bitmap;
>>
>> Just trying to understand why you need this extra variable instead of using
>> 'bitmap' directly.
>>
> Makes the code much more readable/understandable.
>
>>> +		unsigned char bitmap_user;
>>> +		unsigned long offset, mask;
>>> +
>>> +		offset = bmap->num_pages / BITS_PER_BYTE;
>>> +		if (copy_from_user(&bitmap_user, bmap->enc_bitmap + offset,
>>> +				sizeof(unsigned char)))
>>> +			goto out;
>>> +
>>> +		mask = GENMASK(n - 1, 0);
>>> +		bitmap_user &= ~mask;
>>> +		bitmap_kernel[offset] &= mask;
>>> +		bitmap_kernel[offset] |= bitmap_user;
>>> +	}
>>> +
>>> +	if (copy_to_user(bmap->enc_bitmap, bitmap, sz_bytes))
>>
>> If 'n' is zero, we are still copying stuff back to the user. Is that what is
>> expected from userland ?
>>
>> Another point. Since copy_from_user() was done in the caller, isn't it
>> better to move this to the caller to keep a symmetry ?
>>
> As per the comments above, please note that if n is not zero,
> bmap->num_pages is not byte-aligned, so we read the trailing bits from
> userspace and copy those bits as is. If n is zero, bmap->num_pages is
> byte-aligned and we copy all the bytes back.
>
> Thanks,
> Ashish
>
>>> +		goto out;
>>> +
>>> +	ret = 0;
>>> +out:
>>> +	kfree(bitmap);
>>> +	return ret;
>>> +}
>>> +
>>>    static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
>>>    {
>>>    	struct kvm_sev_cmd sev_cmd;
>>> @@ -8090,6 +8160,7 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
>>>    	.apic_init_signal_blocked = svm_apic_init_signal_blocked,
>>>    	.page_enc_status_hc = svm_page_enc_status_hc,
>>> +	.get_page_enc_bitmap = svm_get_page_enc_bitmap,
>>>    };
>>>    static int __init svm_init(void)
>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>>> index 68428eef2dde..3c3fea4e20b5 100644
>>> --- a/arch/x86/kvm/x86.c
>>> +++ b/arch/x86/kvm/x86.c
>>> @@ -5226,6 +5226,18 @@ long kvm_arch_vm_ioctl(struct file *filp,
>>>    	case KVM_SET_PMU_EVENT_FILTER:
>>>    		r = kvm_vm_ioctl_set_pmu_event_filter(kvm, argp);
>>>    		break;
>>> +	case KVM_GET_PAGE_ENC_BITMAP: {
>>> +		struct kvm_page_enc_bitmap bitmap;
>>> +
>>> +		r = -EFAULT;
>>> +		if (copy_from_user(&bitmap, argp, sizeof(bitmap)))
>>> +			goto out;
>>> +
>>> +		r = -ENOTTY;
>>> +		if (kvm_x86_ops->get_page_enc_bitmap)
>>> +			r = kvm_x86_ops->get_page_enc_bitmap(kvm, &bitmap);
>>> +		break;
>>> +	}
>>>    	default:
>>>    		r = -ENOTTY;
>>>    	}
>>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>>> index 4e80c57a3182..db1ebf85e177 100644
>>> --- a/include/uapi/linux/kvm.h
>>> +++ b/include/uapi/linux/kvm.h
>>> @@ -500,6 +500,16 @@ struct kvm_dirty_log {
>>>    	};
>>>    };
>>> +/* for KVM_GET_PAGE_ENC_BITMAP */
>>> +struct kvm_page_enc_bitmap {
>>> +	__u64 start_gfn;
>>> +	__u64 num_pages;
>>> +	union {
>>> +		void __user *enc_bitmap; /* one bit per page */
>>> +		__u64 padding2;
>>> +	};
>>> +};
>>> +
>>>    /* for KVM_CLEAR_DIRTY_LOG */
>>>    struct kvm_clear_dirty_log {
>>>    	__u32 slot;
>>> @@ -1478,6 +1488,8 @@ struct kvm_enc_region {
>>>    #define KVM_S390_NORMAL_RESET	_IO(KVMIO,   0xc3)
>>>    #define KVM_S390_CLEAR_RESET	_IO(KVMIO,   0xc4)
>>> +#define KVM_GET_PAGE_ENC_BITMAP	_IOW(KVMIO, 0xc5, struct kvm_page_enc_bitmap)
>>> +
>>>    /* Secure Encrypted Virtualization command */
>>>    enum sev_cmd_id {
>>>    	/* Guest initialization commands */


* Re: [PATCH v6 08/14] KVM: X86: Introduce KVM_HC_PAGE_ENC_STATUS hypercall
  2020-04-03  2:58       ` Ashish Kalra
@ 2020-04-06 22:27         ` Krish Sadhukhan
  0 siblings, 0 replies; 107+ messages in thread
From: Krish Sadhukhan @ 2020-04-06 22:27 UTC (permalink / raw)
  To: Ashish Kalra
  Cc: pbonzini, tglx, mingo, hpa, joro, bp, thomas.lendacky, x86, kvm,
	linux-kernel, rientjes, srutherford, luto, brijesh.singh


On 4/2/20 7:58 PM, Ashish Kalra wrote:
> On Fri, Apr 03, 2020 at 01:57:48AM +0000, Ashish Kalra wrote:
>> On Thu, Apr 02, 2020 at 06:31:54PM -0700, Krish Sadhukhan wrote:
>>> On 3/29/20 11:22 PM, Ashish Kalra wrote:
>>>> From: Brijesh Singh <Brijesh.Singh@amd.com>
>>>>
>>>> This hypercall is used by the SEV guest to notify a change in the page
>>>> encryption status to the hypervisor. The hypercall should be invoked
>>>> only when the encryption attribute is changed from encrypted -> decrypted
>>>> and vice versa. By default all guest pages are considered encrypted.
>>>>
>>>> Cc: Thomas Gleixner <tglx@linutronix.de>
>>>> Cc: Ingo Molnar <mingo@redhat.com>
>>>> Cc: "H. Peter Anvin" <hpa@zytor.com>
>>>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>>>> Cc: "Radim Krčmář" <rkrcmar@redhat.com>
>>>> Cc: Joerg Roedel <joro@8bytes.org>
>>>> Cc: Borislav Petkov <bp@suse.de>
>>>> Cc: Tom Lendacky <thomas.lendacky@amd.com>
>>>> Cc: x86@kernel.org
>>>> Cc: kvm@vger.kernel.org
>>>> Cc: linux-kernel@vger.kernel.org
>>>> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
>>>> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
>>>> ---
>>>>    Documentation/virt/kvm/hypercalls.rst | 15 +++++
>>>>    arch/x86/include/asm/kvm_host.h       |  2 +
>>>>    arch/x86/kvm/svm.c                    | 95 +++++++++++++++++++++++++++
>>>>    arch/x86/kvm/vmx/vmx.c                |  1 +
>>>>    arch/x86/kvm/x86.c                    |  6 ++
>>>>    include/uapi/linux/kvm_para.h         |  1 +
>>>>    6 files changed, 120 insertions(+)
>>>>
>>>> diff --git a/Documentation/virt/kvm/hypercalls.rst b/Documentation/virt/kvm/hypercalls.rst
>>>> index dbaf207e560d..ff5287e68e81 100644
>>>> --- a/Documentation/virt/kvm/hypercalls.rst
>>>> +++ b/Documentation/virt/kvm/hypercalls.rst
>>>> @@ -169,3 +169,18 @@ a0: destination APIC ID
>>>>    :Usage example: When sending a call-function IPI-many to vCPUs, yield if
>>>>    	        any of the IPI target vCPUs was preempted.
>>>> +
>>>> +
>>>> +8. KVM_HC_PAGE_ENC_STATUS
>>>> +-------------------------
>>>> +:Architecture: x86
>>>> +:Status: active
>>>> +:Purpose: Notify the encryption status changes in guest page table (SEV guest)
>>>> +
>>>> +a0: the guest physical address of the start page
>>>> +a1: the number of pages
>>>> +a2: encryption attribute
>>>> +
>>>> +   Where:
>>>> +	* 1: Encryption attribute is set
>>>> +	* 0: Encryption attribute is cleared
>>>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>>>> index 98959e8cd448..90718fa3db47 100644
>>>> --- a/arch/x86/include/asm/kvm_host.h
>>>> +++ b/arch/x86/include/asm/kvm_host.h
>>>> @@ -1267,6 +1267,8 @@ struct kvm_x86_ops {
>>>>    	bool (*apic_init_signal_blocked)(struct kvm_vcpu *vcpu);
>>>>    	int (*enable_direct_tlbflush)(struct kvm_vcpu *vcpu);
>>>> +	int (*page_enc_status_hc)(struct kvm *kvm, unsigned long gpa,
>>>> +				  unsigned long sz, unsigned long mode);
>>>>    };
>>>>    struct kvm_arch_async_pf {
>>>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
>>>> index 7c2721e18b06..1d8beaf1bceb 100644
>>>> --- a/arch/x86/kvm/svm.c
>>>> +++ b/arch/x86/kvm/svm.c
>>>> @@ -136,6 +136,8 @@ struct kvm_sev_info {
>>>>    	int fd;			/* SEV device fd */
>>>>    	unsigned long pages_locked; /* Number of pages locked */
>>>>    	struct list_head regions_list;  /* List of registered regions */
>>>> +	unsigned long *page_enc_bmap;
>>>> +	unsigned long page_enc_bmap_size;
>>>>    };
>>>>    struct kvm_svm {
>>>> @@ -1991,6 +1993,9 @@ static void sev_vm_destroy(struct kvm *kvm)
>>>>    	sev_unbind_asid(kvm, sev->handle);
>>>>    	sev_asid_free(sev->asid);
>>>> +
>>>> +	kvfree(sev->page_enc_bmap);
>>>> +	sev->page_enc_bmap = NULL;
>>>>    }
>>>>    static void avic_vm_destroy(struct kvm *kvm)
>>>> @@ -7593,6 +7598,94 @@ static int sev_receive_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
>>>>    	return ret;
>>>>    }
>>>> +static int sev_resize_page_enc_bitmap(struct kvm *kvm, unsigned long new_size)
>>>> +{
>>>> +	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
>>>> +	unsigned long *map;
>>>> +	unsigned long sz;
>>>> +
>>>> +	if (sev->page_enc_bmap_size >= new_size)
>>>> +		return 0;
>>>> +
>>>> +	sz = ALIGN(new_size, BITS_PER_LONG) / 8;
>>>> +
>>>> +	map = vmalloc(sz);
>>>
>>> Just wondering why we can't directly modify sev->page_enc_bmap.
>>>
>> Because the page_enc_bmap needs to be resized, i.e. expanded, here.


OK.

> I don't believe there is anything like a realloc() equivalent for the
> vmalloc() interface used here.
>
> Thanks,
> Ashish
>
>>>> +	if (!map) {
>>>> +		pr_err_once("Failed to allocate encrypted bitmap size %lx\n",
>>>> +				sz);
>>>> +		return -ENOMEM;
>>>> +	}
>>>> +
>>>> +	/* mark the page encrypted (by default) */
>>>> +	memset(map, 0xff, sz);
>>>> +
>>>> +	bitmap_copy(map, sev->page_enc_bmap, sev->page_enc_bmap_size);
>>>> +	kvfree(sev->page_enc_bmap);
>>>> +
>>>> +	sev->page_enc_bmap = map;
>>>> +	sev->page_enc_bmap_size = new_size;
>>>> +
>>>> +	return 0;
>>>> +}
>>>> +
>>>> +static int svm_page_enc_status_hc(struct kvm *kvm, unsigned long gpa,
>>>> +				  unsigned long npages, unsigned long enc)
>>>> +{
>>>> +	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
>>>> +	kvm_pfn_t pfn_start, pfn_end;
>>>> +	gfn_t gfn_start, gfn_end;
>>>> +	int ret;
>>>> +
>>>> +	if (!sev_guest(kvm))
>>>> +		return -EINVAL;
>>>> +
>>>> +	if (!npages)
>>>> +		return 0;
>>>> +
>>>> +	gfn_start = gpa_to_gfn(gpa);
>>>> +	gfn_end = gfn_start + npages;
>>>> +
>>>> +	/* out of bound access error check */
>>>> +	if (gfn_end <= gfn_start)
>>>> +		return -EINVAL;
>>>> +
>>>> +	/* lets make sure that gpa exist in our memslot */
>>>> +	pfn_start = gfn_to_pfn(kvm, gfn_start);
>>>> +	pfn_end = gfn_to_pfn(kvm, gfn_end);
>>>> +
>>>> +	if (is_error_noslot_pfn(pfn_start) && !is_noslot_pfn(pfn_start)) {
>>>> +		/*
>>>> +		 * Allow guest MMIO range(s) to be added
>>>> +		 * to the page encryption bitmap.
>>>> +		 */
>>>> +		return -EINVAL;
>>>> +	}
>>>> +
>>>> +	if (is_error_noslot_pfn(pfn_end) && !is_noslot_pfn(pfn_end)) {
>>>> +		/*
>>>> +		 * Allow guest MMIO range(s) to be added
>>>> +		 * to the page encryption bitmap.
>>>> +		 */
>>>> +		return -EINVAL;
>>>> +	}
>>>
>>> It seems is_error_noslot_pfn() covers both cases - i) gfn slot is absent,
>>> ii) failure to translate to pfn. So do we still need is_noslot_pfn() ?
>>>
>> We do additionally need to check for !is_noslot_pfn(..), as MMIO ranges
>> will not have a slot allocated.


The comments above is_error_noslot_pfn() seem to indicate that it covers 
both cases, "not in slot" and "failure to translate"...


Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>

>>
>> Thanks,
>> Ashish
>>
>>>> +
>>>> +	mutex_lock(&kvm->lock);
>>>> +	ret = sev_resize_page_enc_bitmap(kvm, gfn_end);
>>>> +	if (ret)
>>>> +		goto unlock;
>>>> +
>>>> +	if (enc)
>>>> +		__bitmap_set(sev->page_enc_bmap, gfn_start,
>>>> +				gfn_end - gfn_start);
>>>> +	else
>>>> +		__bitmap_clear(sev->page_enc_bmap, gfn_start,
>>>> +				gfn_end - gfn_start);
>>>> +
>>>> +unlock:
>>>> +	mutex_unlock(&kvm->lock);
>>>> +	return ret;
>>>> +}
>>>> +
>>>>    static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
>>>>    {
>>>>    	struct kvm_sev_cmd sev_cmd;
>>>> @@ -7995,6 +8088,8 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
>>>>    	.need_emulation_on_page_fault = svm_need_emulation_on_page_fault,
>>>>    	.apic_init_signal_blocked = svm_apic_init_signal_blocked,
>>>> +
>>>> +	.page_enc_status_hc = svm_page_enc_status_hc,
>>>
>>> Why not place it where other encryption ops are located ?
>>>
>>>          ...
>>>
>>>          .mem_enc_unreg_region
>>>
>>> +      .page_enc_status_hc = svm_page_enc_status_hc
>>>
>>>>    };
>>>>    static int __init svm_init(void)
>>>> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
>>>> index 079d9fbf278e..f68e76ee7f9c 100644
>>>> --- a/arch/x86/kvm/vmx/vmx.c
>>>> +++ b/arch/x86/kvm/vmx/vmx.c
>>>> @@ -8001,6 +8001,7 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
>>>>    	.nested_get_evmcs_version = NULL,
>>>>    	.need_emulation_on_page_fault = vmx_need_emulation_on_page_fault,
>>>>    	.apic_init_signal_blocked = vmx_apic_init_signal_blocked,
>>>> +	.page_enc_status_hc = NULL,
>>>>    };
>>>>    static void vmx_cleanup_l1d_flush(void)
>>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>>>> index cf95c36cb4f4..68428eef2dde 100644
>>>> --- a/arch/x86/kvm/x86.c
>>>> +++ b/arch/x86/kvm/x86.c
>>>> @@ -7564,6 +7564,12 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
>>>>    		kvm_sched_yield(vcpu->kvm, a0);
>>>>    		ret = 0;
>>>>    		break;
>>>> +	case KVM_HC_PAGE_ENC_STATUS:
>>>> +		ret = -KVM_ENOSYS;
>>>> +		if (kvm_x86_ops->page_enc_status_hc)
>>>> +			ret = kvm_x86_ops->page_enc_status_hc(vcpu->kvm,
>>>> +					a0, a1, a2);
>>>> +		break;
>>>>    	default:
>>>>    		ret = -KVM_ENOSYS;
>>>>    		break;
>>>> diff --git a/include/uapi/linux/kvm_para.h b/include/uapi/linux/kvm_para.h
>>>> index 8b86609849b9..847b83b75dc8 100644
>>>> --- a/include/uapi/linux/kvm_para.h
>>>> +++ b/include/uapi/linux/kvm_para.h
>>>> @@ -29,6 +29,7 @@
>>>>    #define KVM_HC_CLOCK_PAIRING		9
>>>>    #define KVM_HC_SEND_IPI		10
>>>>    #define KVM_HC_SCHED_YIELD		11
>>>> +#define KVM_HC_PAGE_ENC_STATUS		12
>>>>    /*
>>>>     * hypercalls use architecture specific

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v6 05/14] KVM: SVM: Add KVM_SEV_RECEIVE_UPDATE_DATA command
  2020-04-02 22:29   ` Venu Busireddy
@ 2020-04-07  0:49     ` Steve Rutherford
  0 siblings, 0 replies; 107+ messages in thread
From: Steve Rutherford @ 2020-04-07  0:49 UTC (permalink / raw)
  To: Venu Busireddy
  Cc: Ashish Kalra, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Joerg Roedel, Borislav Petkov, Tom Lendacky,
	X86 ML, KVM list, LKML, David Rientjes, Andy Lutomirski,
	Brijesh Singh

On Thu, Apr 2, 2020 at 3:31 PM Venu Busireddy <venu.busireddy@oracle.com> wrote:
>
> On 2020-03-30 06:21:20 +0000, Ashish Kalra wrote:
> > From: Brijesh Singh <Brijesh.Singh@amd.com>
> >
> > The command is used for copying the incoming buffer into the
> > SEV guest memory space.
> >
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Ingo Molnar <mingo@redhat.com>
> > Cc: "H. Peter Anvin" <hpa@zytor.com>
> > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > Cc: "Radim Krčmář" <rkrcmar@redhat.com>
> > Cc: Joerg Roedel <joro@8bytes.org>
> > Cc: Borislav Petkov <bp@suse.de>
> > Cc: Tom Lendacky <thomas.lendacky@amd.com>
> > Cc: x86@kernel.org
> > Cc: kvm@vger.kernel.org
> > Cc: linux-kernel@vger.kernel.org
> > Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> > Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
>
> Reviewed-by: Venu Busireddy <venu.busireddy@oracle.com>
>
> > ---
> >  .../virt/kvm/amd-memory-encryption.rst        | 24 ++++++
> >  arch/x86/kvm/svm.c                            | 79 +++++++++++++++++++
> >  include/uapi/linux/kvm.h                      |  9 +++
> >  3 files changed, 112 insertions(+)
> >
> > diff --git a/Documentation/virt/kvm/amd-memory-encryption.rst b/Documentation/virt/kvm/amd-memory-encryption.rst
> > index ef1f1f3a5b40..554aa33a99cc 100644
> > --- a/Documentation/virt/kvm/amd-memory-encryption.rst
> > +++ b/Documentation/virt/kvm/amd-memory-encryption.rst
> > @@ -351,6 +351,30 @@ On success, the 'handle' field contains a new handle and on error, a negative va
> >
> >  For more details, see SEV spec Section 6.12.
> >
> > +14. KVM_SEV_RECEIVE_UPDATE_DATA
> > +----------------------------
> > +
> > +The KVM_SEV_RECEIVE_UPDATE_DATA command can be used by the hypervisor to copy
> > +the incoming buffers into the guest memory region with encryption context
> > +created during the KVM_SEV_RECEIVE_START.
> > +
> > +Parameters (in): struct kvm_sev_receive_update_data
> > +
> > +Returns: 0 on success, -negative on error
> > +
> > +::
> > +
> > +        struct kvm_sev_launch_receive_update_data {
> > +                __u64 hdr_uaddr;        /* userspace address containing the packet header */
> > +                __u32 hdr_len;
> > +
> > +                __u64 guest_uaddr;      /* the destination guest memory region */
> > +                __u32 guest_len;
> > +
> > +                __u64 trans_uaddr;      /* the incoming buffer memory region  */
> > +                __u32 trans_len;
> > +        };
> > +
> >  References
> >  ==========
> >
> > diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> > index 038b47685733..5fc5355536d7 100644
> > --- a/arch/x86/kvm/svm.c
> > +++ b/arch/x86/kvm/svm.c
> > @@ -7497,6 +7497,82 @@ static int sev_receive_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
> >       return ret;
> >  }
> >
> > +static int sev_receive_update_data(struct kvm *kvm, struct kvm_sev_cmd *argp)
> > +{
> > +     struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> > +     struct kvm_sev_receive_update_data params;
> > +     struct sev_data_receive_update_data *data;
> > +     void *hdr = NULL, *trans = NULL;
> > +     struct page **guest_page;
> > +     unsigned long n;
> > +     int ret, offset;
> > +
> > +     if (!sev_guest(kvm))
> > +             return -EINVAL;
> > +
> > +     if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data,
> > +                     sizeof(struct kvm_sev_receive_update_data)))
> > +             return -EFAULT;
> > +
> > +     if (!params.hdr_uaddr || !params.hdr_len ||
> > +         !params.guest_uaddr || !params.guest_len ||
> > +         !params.trans_uaddr || !params.trans_len)
> > +             return -EINVAL;
> > +
> > +     /* Check if we are crossing the page boundary */
> > +     offset = params.guest_uaddr & (PAGE_SIZE - 1);
> > +     if ((params.guest_len + offset > PAGE_SIZE))
> > +             return -EINVAL;

Check for overflow.
>
> > +
> > +     hdr = psp_copy_user_blob(params.hdr_uaddr, params.hdr_len);
> > +     if (IS_ERR(hdr))
> > +             return PTR_ERR(hdr);
> > +
> > +     trans = psp_copy_user_blob(params.trans_uaddr, params.trans_len);
> > +     if (IS_ERR(trans)) {
> > +             ret = PTR_ERR(trans);
> > +             goto e_free_hdr;
> > +     }
> > +
> > +     ret = -ENOMEM;
> > +     data = kzalloc(sizeof(*data), GFP_KERNEL);
> > +     if (!data)
> > +             goto e_free_trans;
> > +
> > +     data->hdr_address = __psp_pa(hdr);
> > +     data->hdr_len = params.hdr_len;
> > +     data->trans_address = __psp_pa(trans);
> > +     data->trans_len = params.trans_len;
> > +
> > +     /* Pin guest memory */
> > +     ret = -EFAULT;
> > +     guest_page = sev_pin_memory(kvm, params.guest_uaddr & PAGE_MASK,
> > +                                 PAGE_SIZE, &n, 0);
> > +     if (!guest_page)
> > +             goto e_free;
> > +
> > +     /* The RECEIVE_UPDATE_DATA command requires C-bit to be always set. */
> > +     data->guest_address = (page_to_pfn(guest_page[0]) << PAGE_SHIFT) +
> > +                             offset;
> > +     data->guest_address |= sev_me_mask;
> > +     data->guest_len = params.guest_len;
> > +     data->handle = sev->handle;
> > +
> > +     ret = sev_issue_cmd(kvm, SEV_CMD_RECEIVE_UPDATE_DATA, data,
> > +                             &argp->error);
> > +
> > +     sev_unpin_memory(kvm, guest_page, n);
> > +
> > +e_free:
> > +     kfree(data);
> > +e_free_trans:
> > +     kfree(trans);
> > +e_free_hdr:
> > +     kfree(hdr);
> > +
> > +     return ret;
> > +}
> > +
> >  static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
> >  {
> >       struct kvm_sev_cmd sev_cmd;
> > @@ -7553,6 +7629,9 @@ static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
> >       case KVM_SEV_RECEIVE_START:
> >               r = sev_receive_start(kvm, &sev_cmd);
> >               break;
> > +     case KVM_SEV_RECEIVE_UPDATE_DATA:
> > +             r = sev_receive_update_data(kvm, &sev_cmd);
> > +             break;
> >       default:
> >               r = -EINVAL;
> >               goto out;
> > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> > index 74764b9db5fa..4e80c57a3182 100644
> > --- a/include/uapi/linux/kvm.h
> > +++ b/include/uapi/linux/kvm.h
> > @@ -1588,6 +1588,15 @@ struct kvm_sev_receive_start {
> >       __u32 session_len;
> >  };
> >
> > +struct kvm_sev_receive_update_data {
> > +     __u64 hdr_uaddr;
> > +     __u32 hdr_len;
> > +     __u64 guest_uaddr;
> > +     __u32 guest_len;
> > +     __u64 trans_uaddr;
> > +     __u32 trans_len;
> > +};
> > +
> >  #define KVM_DEV_ASSIGN_ENABLE_IOMMU  (1 << 0)
> >  #define KVM_DEV_ASSIGN_PCI_2_3               (1 << 1)
> >  #define KVM_DEV_ASSIGN_MASK_INTX     (1 << 2)
> > --
> > 2.17.1
> >
Otherwise looks fine to my eye.
Reviewed-by: Steve Rutherford <srutherford@google.com>


* Re: [PATCH v6 06/14] KVM: SVM: Add KVM_SEV_RECEIVE_FINISH command
  2020-04-02 22:27   ` Krish Sadhukhan
@ 2020-04-07  0:57     ` Steve Rutherford
  0 siblings, 0 replies; 107+ messages in thread
From: Steve Rutherford @ 2020-04-07  0:57 UTC (permalink / raw)
  To: Krish Sadhukhan
  Cc: Ashish Kalra, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Joerg Roedel, Borislav Petkov, Tom Lendacky,
	X86 ML, KVM list, LKML, David Rientjes, Andy Lutomirski,
	Brijesh Singh

On Thu, Apr 2, 2020 at 3:27 PM Krish Sadhukhan
<krish.sadhukhan@oracle.com> wrote:
>
>
> On 3/29/20 11:21 PM, Ashish Kalra wrote:
> > From: Brijesh Singh <Brijesh.Singh@amd.com>
> >
> > The command finalizes the guest receiving process and makes the SEV guest
> > ready for execution.
> >
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Ingo Molnar <mingo@redhat.com>
> > Cc: "H. Peter Anvin" <hpa@zytor.com>
> > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > Cc: "Radim Krčmář" <rkrcmar@redhat.com>
> > Cc: Joerg Roedel <joro@8bytes.org>
> > Cc: Borislav Petkov <bp@suse.de>
> > Cc: Tom Lendacky <thomas.lendacky@amd.com>
> > Cc: x86@kernel.org
> > Cc: kvm@vger.kernel.org
> > Cc: linux-kernel@vger.kernel.org
> > Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> > Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> > ---
> >   .../virt/kvm/amd-memory-encryption.rst        |  8 +++++++
> >   arch/x86/kvm/svm.c                            | 23 +++++++++++++++++++
> >   2 files changed, 31 insertions(+)
> >
> > diff --git a/Documentation/virt/kvm/amd-memory-encryption.rst b/Documentation/virt/kvm/amd-memory-encryption.rst
> > index 554aa33a99cc..93cd95d9a6c0 100644
> > --- a/Documentation/virt/kvm/amd-memory-encryption.rst
> > +++ b/Documentation/virt/kvm/amd-memory-encryption.rst
> > @@ -375,6 +375,14 @@ Returns: 0 on success, -negative on error
> >                   __u32 trans_len;
> >           };
> >
> > +15. KVM_SEV_RECEIVE_FINISH
> > +------------------------
> > +
> > +After completion of the migration flow, the KVM_SEV_RECEIVE_FINISH command can be
> > +issued by the hypervisor to make the guest ready for execution.
> > +
> > +Returns: 0 on success, -negative on error
> > +
> >   References
> >   ==========
> >
> > diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> > index 5fc5355536d7..7c2721e18b06 100644
> > --- a/arch/x86/kvm/svm.c
> > +++ b/arch/x86/kvm/svm.c
> > @@ -7573,6 +7573,26 @@ static int sev_receive_update_data(struct kvm *kvm, struct kvm_sev_cmd *argp)
> >       return ret;
> >   }
> >
> > +static int sev_receive_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
> > +{
> > +     struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> > +     struct sev_data_receive_finish *data;
> > +     int ret;
> > +
> > +     if (!sev_guest(kvm))
> > +             return -ENOTTY;
> > +
> > +     data = kzalloc(sizeof(*data), GFP_KERNEL);
> > +     if (!data)
> > +             return -ENOMEM;
> > +
> > +     data->handle = sev->handle;
> > +     ret = sev_issue_cmd(kvm, SEV_CMD_RECEIVE_FINISH, data, &argp->error);
> > +
> > +     kfree(data);
> > +     return ret;
> > +}
> > +
> >   static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
> >   {
> >       struct kvm_sev_cmd sev_cmd;
> > @@ -7632,6 +7652,9 @@ static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
> >       case KVM_SEV_RECEIVE_UPDATE_DATA:
> >               r = sev_receive_update_data(kvm, &sev_cmd);
> >               break;
> > +     case KVM_SEV_RECEIVE_FINISH:
> > +             r = sev_receive_finish(kvm, &sev_cmd);
> > +             break;
> >       default:
> >               r = -EINVAL;
> >               goto out;
> Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>

As to ENOTTY, the ioctl man page describes it as "The specified
request does not apply to the kind of object that the file descriptor
fd references", which seems appropriate here.
Reviewed-by: Steve Rutherford <srutherford@google.com>


* Re: [PATCH v6 07/14] KVM: x86: Add AMD SEV specific Hypercall3
  2020-04-02 23:54   ` Krish Sadhukhan
@ 2020-04-07  1:22     ` Steve Rutherford
  0 siblings, 0 replies; 107+ messages in thread
From: Steve Rutherford @ 2020-04-07  1:22 UTC (permalink / raw)
  To: Krish Sadhukhan
  Cc: Ashish Kalra, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Joerg Roedel, Borislav Petkov, Tom Lendacky,
	X86 ML, KVM list, LKML, David Rientjes, Andy Lutomirski,
	Brijesh Singh

On Thu, Apr 2, 2020 at 4:56 PM Krish Sadhukhan
<krish.sadhukhan@oracle.com> wrote:
>
>
> On 3/29/20 11:21 PM, Ashish Kalra wrote:
> > From: Brijesh Singh <Brijesh.Singh@amd.com>
> >
> > KVM hypercall framework relies on alternative framework to patch the
> > VMCALL -> VMMCALL on AMD platform. If a hypercall is made before
> > apply_alternative()
>
> s/apply_alternative/apply_alternatives/
> > is called then it defaults to VMCALL. The approach
> > works fine on non-SEV guests: a VMCALL causes a #UD, and the hypervisor
> > is able to decode the instruction and do the right thing. But
> > when SEV is active, guest memory is encrypted with the guest key and
> > the hypervisor is not able to decode the instruction bytes.
> >
> > Add an SEV-specific hypercall3 which unconditionally uses VMMCALL. The
> > hypercall will be used by the SEV guest to notify the hypervisor about
> > encrypted pages.
> >
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Ingo Molnar <mingo@redhat.com>
> > Cc: "H. Peter Anvin" <hpa@zytor.com>
> > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > Cc: "Radim Krčmář" <rkrcmar@redhat.com>
> > Cc: Joerg Roedel <joro@8bytes.org>
> > Cc: Borislav Petkov <bp@suse.de>
> > Cc: Tom Lendacky <thomas.lendacky@amd.com>
> > Cc: x86@kernel.org
> > Cc: kvm@vger.kernel.org
> > Cc: linux-kernel@vger.kernel.org
> > Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> > Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> > ---
> >   arch/x86/include/asm/kvm_para.h | 12 ++++++++++++
> >   1 file changed, 12 insertions(+)
> >
> > diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
> > index 9b4df6eaa11a..6c09255633a4 100644
> > --- a/arch/x86/include/asm/kvm_para.h
> > +++ b/arch/x86/include/asm/kvm_para.h
> > @@ -84,6 +84,18 @@ static inline long kvm_hypercall4(unsigned int nr, unsigned long p1,
> >       return ret;
> >   }
> >
> > +static inline long kvm_sev_hypercall3(unsigned int nr, unsigned long p1,
> > +                                   unsigned long p2, unsigned long p3)
> > +{
> > +     long ret;
> > +
> > +     asm volatile("vmmcall"
> > +                  : "=a"(ret)
> > +                  : "a"(nr), "b"(p1), "c"(p2), "d"(p3)
> > +                  : "memory");
> > +     return ret;
> > +}
> > +
> >   #ifdef CONFIG_KVM_GUEST
> >   bool kvm_para_available(void);
> >   unsigned int kvm_arch_para_features(void);
> Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>

Nit: I'd personally have named this kvm_vmmcall3, since it's about
invoking a particular instruction. The usage happens to be for SEV.

Reviewed-by: Steve Rutherford <srutherford@google.com>


* Re: [PATCH v6 08/14] KVM: X86: Introduce KVM_HC_PAGE_ENC_STATUS hypercall
  2020-03-30  6:22 ` [PATCH v6 08/14] KVM: X86: Introduce KVM_HC_PAGE_ENC_STATUS hypercall Ashish Kalra
  2020-04-03  0:00   ` Venu Busireddy
  2020-04-03  1:31   ` Krish Sadhukhan
@ 2020-04-07  2:17   ` Steve Rutherford
  2020-04-07  5:27     ` Ashish Kalra
  2 siblings, 1 reply; 107+ messages in thread
From: Steve Rutherford @ 2020-04-07  2:17 UTC (permalink / raw)
  To: Ashish Kalra
  Cc: Paolo Bonzini, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Joerg Roedel, Borislav Petkov, Tom Lendacky, X86 ML, KVM list,
	LKML, David Rientjes, Andy Lutomirski, Brijesh Singh

On Sun, Mar 29, 2020 at 11:22 PM Ashish Kalra <Ashish.Kalra@amd.com> wrote:
>
> From: Brijesh Singh <Brijesh.Singh@amd.com>
>
> This hypercall is used by the SEV guest to notify a change in the page
> encryption status to the hypervisor. The hypercall should be invoked
> only when the encryption attribute is changed from encrypted -> decrypted
> and vice versa. By default all guest pages are considered encrypted.
>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: "Radim Krčmář" <rkrcmar@redhat.com>
> Cc: Joerg Roedel <joro@8bytes.org>
> Cc: Borislav Petkov <bp@suse.de>
> Cc: Tom Lendacky <thomas.lendacky@amd.com>
> Cc: x86@kernel.org
> Cc: kvm@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> ---
>  Documentation/virt/kvm/hypercalls.rst | 15 +++++
>  arch/x86/include/asm/kvm_host.h       |  2 +
>  arch/x86/kvm/svm.c                    | 95 +++++++++++++++++++++++++++
>  arch/x86/kvm/vmx/vmx.c                |  1 +
>  arch/x86/kvm/x86.c                    |  6 ++
>  include/uapi/linux/kvm_para.h         |  1 +
>  6 files changed, 120 insertions(+)
>
> diff --git a/Documentation/virt/kvm/hypercalls.rst b/Documentation/virt/kvm/hypercalls.rst
> index dbaf207e560d..ff5287e68e81 100644
> --- a/Documentation/virt/kvm/hypercalls.rst
> +++ b/Documentation/virt/kvm/hypercalls.rst
> @@ -169,3 +169,18 @@ a0: destination APIC ID
>
>  :Usage example: When sending a call-function IPI-many to vCPUs, yield if
>                 any of the IPI target vCPUs was preempted.
> +
> +
> +8. KVM_HC_PAGE_ENC_STATUS
> +-------------------------
> +:Architecture: x86
> +:Status: active
> +:Purpose: Notify the encryption status changes in guest page table (SEV guest)
> +
> +a0: the guest physical address of the start page
> +a1: the number of pages
> +a2: encryption attribute
> +
> +   Where:
> +       * 1: Encryption attribute is set
> +       * 0: Encryption attribute is cleared
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 98959e8cd448..90718fa3db47 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1267,6 +1267,8 @@ struct kvm_x86_ops {
>
>         bool (*apic_init_signal_blocked)(struct kvm_vcpu *vcpu);
>         int (*enable_direct_tlbflush)(struct kvm_vcpu *vcpu);
> +       int (*page_enc_status_hc)(struct kvm *kvm, unsigned long gpa,
> +                                 unsigned long sz, unsigned long mode);
Nit: spell out size instead of sz.
>  };
>
>  struct kvm_arch_async_pf {
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 7c2721e18b06..1d8beaf1bceb 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -136,6 +136,8 @@ struct kvm_sev_info {
>         int fd;                 /* SEV device fd */
>         unsigned long pages_locked; /* Number of pages locked */
>         struct list_head regions_list;  /* List of registered regions */
> +       unsigned long *page_enc_bmap;
> +       unsigned long page_enc_bmap_size;
>  };
>
>  struct kvm_svm {
> @@ -1991,6 +1993,9 @@ static void sev_vm_destroy(struct kvm *kvm)
>
>         sev_unbind_asid(kvm, sev->handle);
>         sev_asid_free(sev->asid);
> +
> +       kvfree(sev->page_enc_bmap);
> +       sev->page_enc_bmap = NULL;
>  }
>
>  static void avic_vm_destroy(struct kvm *kvm)
> @@ -7593,6 +7598,94 @@ static int sev_receive_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
>         return ret;
>  }
>
> +static int sev_resize_page_enc_bitmap(struct kvm *kvm, unsigned long new_size)
> +{
> +       struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> +       unsigned long *map;
> +       unsigned long sz;
> +
> +       if (sev->page_enc_bmap_size >= new_size)
> +               return 0;
> +
> +       sz = ALIGN(new_size, BITS_PER_LONG) / 8;
> +
> +       map = vmalloc(sz);
> +       if (!map) {
> +               pr_err_once("Failed to allocate encrypted bitmap size %lx\n",
> +                               sz);
> +               return -ENOMEM;
> +       }
> +
> +       /* mark the page encrypted (by default) */
> +       memset(map, 0xff, sz);
> +
> +       bitmap_copy(map, sev->page_enc_bmap, sev->page_enc_bmap_size);
> +       kvfree(sev->page_enc_bmap);
> +
> +       sev->page_enc_bmap = map;
> +       sev->page_enc_bmap_size = new_size;
> +
> +       return 0;
> +}
> +
> +static int svm_page_enc_status_hc(struct kvm *kvm, unsigned long gpa,
> +                                 unsigned long npages, unsigned long enc)
> +{
> +       struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> +       kvm_pfn_t pfn_start, pfn_end;
> +       gfn_t gfn_start, gfn_end;
> +       int ret;
> +
> +       if (!sev_guest(kvm))
> +               return -EINVAL;
> +
> +       if (!npages)
> +               return 0;
> +
> +       gfn_start = gpa_to_gfn(gpa);
> +       gfn_end = gfn_start + npages;
> +
> +       /* out of bound access error check */
> +       if (gfn_end <= gfn_start)
> +               return -EINVAL;
> +
> +       /* lets make sure that gpa exist in our memslot */
> +       pfn_start = gfn_to_pfn(kvm, gfn_start);
> +       pfn_end = gfn_to_pfn(kvm, gfn_end);
> +
> +       if (is_error_noslot_pfn(pfn_start) && !is_noslot_pfn(pfn_start)) {
> +               /*
> +                * Allow guest MMIO range(s) to be added
> +                * to the page encryption bitmap.
> +                */
> +               return -EINVAL;
> +       }
> +
> +       if (is_error_noslot_pfn(pfn_end) && !is_noslot_pfn(pfn_end)) {
> +               /*
> +                * Allow guest MMIO range(s) to be added
> +                * to the page encryption bitmap.
> +                */
> +               return -EINVAL;
> +       }
> +
> +       mutex_lock(&kvm->lock);
> +       ret = sev_resize_page_enc_bitmap(kvm, gfn_end);
> +       if (ret)
> +               goto unlock;
> +
> +       if (enc)
> +               __bitmap_set(sev->page_enc_bmap, gfn_start,
> +                               gfn_end - gfn_start);
> +       else
> +               __bitmap_clear(sev->page_enc_bmap, gfn_start,
> +                               gfn_end - gfn_start);
> +
> +unlock:
> +       mutex_unlock(&kvm->lock);
> +       return ret;
> +}
> +
>  static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
>  {
>         struct kvm_sev_cmd sev_cmd;
> @@ -7995,6 +8088,8 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
>         .need_emulation_on_page_fault = svm_need_emulation_on_page_fault,
>
>         .apic_init_signal_blocked = svm_apic_init_signal_blocked,
> +
> +       .page_enc_status_hc = svm_page_enc_status_hc,
>  };
>
>  static int __init svm_init(void)
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 079d9fbf278e..f68e76ee7f9c 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -8001,6 +8001,7 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
>         .nested_get_evmcs_version = NULL,
>         .need_emulation_on_page_fault = vmx_need_emulation_on_page_fault,
>         .apic_init_signal_blocked = vmx_apic_init_signal_blocked,
> +       .page_enc_status_hc = NULL,
>  };
>
>  static void vmx_cleanup_l1d_flush(void)
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index cf95c36cb4f4..68428eef2dde 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -7564,6 +7564,12 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
>                 kvm_sched_yield(vcpu->kvm, a0);
>                 ret = 0;
>                 break;
> +       case KVM_HC_PAGE_ENC_STATUS:
> +               ret = -KVM_ENOSYS;
> +               if (kvm_x86_ops->page_enc_status_hc)
> +                       ret = kvm_x86_ops->page_enc_status_hc(vcpu->kvm,
> +                                       a0, a1, a2);
> +               break;
>         default:
>                 ret = -KVM_ENOSYS;
>                 break;
> diff --git a/include/uapi/linux/kvm_para.h b/include/uapi/linux/kvm_para.h
> index 8b86609849b9..847b83b75dc8 100644
> --- a/include/uapi/linux/kvm_para.h
> +++ b/include/uapi/linux/kvm_para.h
> @@ -29,6 +29,7 @@
>  #define KVM_HC_CLOCK_PAIRING           9
>  #define KVM_HC_SEND_IPI                10
>  #define KVM_HC_SCHED_YIELD             11
> +#define KVM_HC_PAGE_ENC_STATUS         12
>
>  /*
>   * hypercalls use architecture specific
> --
> 2.17.1
>

I'm still not excited by the dynamic resizing. I believe the guest
hypercall can be called in atomic contexts, which makes me
particularly unexcited to see a potentially large vmalloc on the host
followed by filling the buffer. Particularly when the buffer might be
non-trivial in size (~1MB per 32GB, per some back of the envelope
math).

I'd like to see an enable cap for preallocating this. Yes, the first
call might not be the right value because of hotplug, but won't the
VMM know when hotplug is happening? If the VMM asks for the wrong
size, and does not update the size correctly before granting the VM
access to more RAM, that seems like the VMM's fault.


* Re: [PATCH v6 08/14] KVM: X86: Introduce KVM_HC_PAGE_ENC_STATUS hypercall
  2020-04-07  2:17   ` Steve Rutherford
@ 2020-04-07  5:27     ` Ashish Kalra
  2020-04-08  0:01       ` Steve Rutherford
  0 siblings, 1 reply; 107+ messages in thread
From: Ashish Kalra @ 2020-04-07  5:27 UTC (permalink / raw)
  To: Steve Rutherford
  Cc: Paolo Bonzini, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Joerg Roedel, Borislav Petkov, Tom Lendacky, X86 ML, KVM list,
	LKML, David Rientjes, Andy Lutomirski, Brijesh Singh

Hello Steve,

On Mon, Apr 06, 2020 at 07:17:37PM -0700, Steve Rutherford wrote:
> On Sun, Mar 29, 2020 at 11:22 PM Ashish Kalra <Ashish.Kalra@amd.com> wrote:
> >
> > From: Brijesh Singh <Brijesh.Singh@amd.com>
> >
> > This hypercall is used by the SEV guest to notify a change in the page
> > encryption status to the hypervisor. The hypercall should be invoked
> > only when the encryption attribute is changed from encrypted -> decrypted
> > and vice versa. By default all guest pages are considered encrypted.
> >
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Ingo Molnar <mingo@redhat.com>
> > Cc: "H. Peter Anvin" <hpa@zytor.com>
> > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > Cc: "Radim Krčmář" <rkrcmar@redhat.com>
> > Cc: Joerg Roedel <joro@8bytes.org>
> > Cc: Borislav Petkov <bp@suse.de>
> > Cc: Tom Lendacky <thomas.lendacky@amd.com>
> > Cc: x86@kernel.org
> > Cc: kvm@vger.kernel.org
> > Cc: linux-kernel@vger.kernel.org
> > Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> > Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> > ---
> >  Documentation/virt/kvm/hypercalls.rst | 15 +++++
> >  arch/x86/include/asm/kvm_host.h       |  2 +
> >  arch/x86/kvm/svm.c                    | 95 +++++++++++++++++++++++++++
> >  arch/x86/kvm/vmx/vmx.c                |  1 +
> >  arch/x86/kvm/x86.c                    |  6 ++
> >  include/uapi/linux/kvm_para.h         |  1 +
> >  6 files changed, 120 insertions(+)
> >
> > diff --git a/Documentation/virt/kvm/hypercalls.rst b/Documentation/virt/kvm/hypercalls.rst
> > index dbaf207e560d..ff5287e68e81 100644
> > --- a/Documentation/virt/kvm/hypercalls.rst
> > +++ b/Documentation/virt/kvm/hypercalls.rst
> > @@ -169,3 +169,18 @@ a0: destination APIC ID
> >
> >  :Usage example: When sending a call-function IPI-many to vCPUs, yield if
> >                 any of the IPI target vCPUs was preempted.
> > +
> > +
> > +8. KVM_HC_PAGE_ENC_STATUS
> > +-------------------------
> > +:Architecture: x86
> > +:Status: active
> > +:Purpose: Notify the encryption status changes in guest page table (SEV guest)
> > +
> > +a0: the guest physical address of the start page
> > +a1: the number of pages
> > +a2: encryption attribute
> > +
> > +   Where:
> > +       * 1: Encryption attribute is set
> > +       * 0: Encryption attribute is cleared
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index 98959e8cd448..90718fa3db47 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -1267,6 +1267,8 @@ struct kvm_x86_ops {
> >
> >         bool (*apic_init_signal_blocked)(struct kvm_vcpu *vcpu);
> >         int (*enable_direct_tlbflush)(struct kvm_vcpu *vcpu);
> > +       int (*page_enc_status_hc)(struct kvm *kvm, unsigned long gpa,
> > +                                 unsigned long sz, unsigned long mode);
> Nit: spell out size instead of sz.
> >  };
> >
> >  struct kvm_arch_async_pf {
> > diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> > index 7c2721e18b06..1d8beaf1bceb 100644
> > --- a/arch/x86/kvm/svm.c
> > +++ b/arch/x86/kvm/svm.c
> > @@ -136,6 +136,8 @@ struct kvm_sev_info {
> >         int fd;                 /* SEV device fd */
> >         unsigned long pages_locked; /* Number of pages locked */
> >         struct list_head regions_list;  /* List of registered regions */
> > +       unsigned long *page_enc_bmap;
> > +       unsigned long page_enc_bmap_size;
> >  };
> >
> >  struct kvm_svm {
> > @@ -1991,6 +1993,9 @@ static void sev_vm_destroy(struct kvm *kvm)
> >
> >         sev_unbind_asid(kvm, sev->handle);
> >         sev_asid_free(sev->asid);
> > +
> > +       kvfree(sev->page_enc_bmap);
> > +       sev->page_enc_bmap = NULL;
> >  }
> >
> >  static void avic_vm_destroy(struct kvm *kvm)
> > @@ -7593,6 +7598,94 @@ static int sev_receive_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
> >         return ret;
> >  }
> >
> > +static int sev_resize_page_enc_bitmap(struct kvm *kvm, unsigned long new_size)
> > +{
> > +       struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> > +       unsigned long *map;
> > +       unsigned long sz;
> > +
> > +       if (sev->page_enc_bmap_size >= new_size)
> > +               return 0;
> > +
> > +       sz = ALIGN(new_size, BITS_PER_LONG) / 8;
> > +
> > +       map = vmalloc(sz);
> > +       if (!map) {
> > +               pr_err_once("Failed to allocate encrypted bitmap size %lx\n",
> > +                               sz);
> > +               return -ENOMEM;
> > +       }
> > +
> > +       /* mark the page encrypted (by default) */
> > +       memset(map, 0xff, sz);
> > +
> > +       bitmap_copy(map, sev->page_enc_bmap, sev->page_enc_bmap_size);
> > +       kvfree(sev->page_enc_bmap);
> > +
> > +       sev->page_enc_bmap = map;
> > +       sev->page_enc_bmap_size = new_size;
> > +
> > +       return 0;
> > +}
> > +
> > +static int svm_page_enc_status_hc(struct kvm *kvm, unsigned long gpa,
> > +                                 unsigned long npages, unsigned long enc)
> > +{
> > +       struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> > +       kvm_pfn_t pfn_start, pfn_end;
> > +       gfn_t gfn_start, gfn_end;
> > +       int ret;
> > +
> > +       if (!sev_guest(kvm))
> > +               return -EINVAL;
> > +
> > +       if (!npages)
> > +               return 0;
> > +
> > +       gfn_start = gpa_to_gfn(gpa);
> > +       gfn_end = gfn_start + npages;
> > +
> > +       /* out of bound access error check */
> > +       if (gfn_end <= gfn_start)
> > +               return -EINVAL;
> > +
> > +       /* lets make sure that gpa exist in our memslot */
> > +       pfn_start = gfn_to_pfn(kvm, gfn_start);
> > +       pfn_end = gfn_to_pfn(kvm, gfn_end);
> > +
> > +       if (is_error_noslot_pfn(pfn_start) && !is_noslot_pfn(pfn_start)) {
> > +               /*
> > +                * Allow guest MMIO range(s) to be added
> > +                * to the page encryption bitmap.
> > +                */
> > +               return -EINVAL;
> > +       }
> > +
> > +       if (is_error_noslot_pfn(pfn_end) && !is_noslot_pfn(pfn_end)) {
> > +               /*
> > +                * Allow guest MMIO range(s) to be added
> > +                * to the page encryption bitmap.
> > +                */
> > +               return -EINVAL;
> > +       }
> > +
> > +       mutex_lock(&kvm->lock);
> > +       ret = sev_resize_page_enc_bitmap(kvm, gfn_end);
> > +       if (ret)
> > +               goto unlock;
> > +
> > +       if (enc)
> > +               __bitmap_set(sev->page_enc_bmap, gfn_start,
> > +                               gfn_end - gfn_start);
> > +       else
> > +               __bitmap_clear(sev->page_enc_bmap, gfn_start,
> > +                               gfn_end - gfn_start);
> > +
> > +unlock:
> > +       mutex_unlock(&kvm->lock);
> > +       return ret;
> > +}
> > +
> >  static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
> >  {
> >         struct kvm_sev_cmd sev_cmd;
> > @@ -7995,6 +8088,8 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
> >         .need_emulation_on_page_fault = svm_need_emulation_on_page_fault,
> >
> >         .apic_init_signal_blocked = svm_apic_init_signal_blocked,
> > +
> > +       .page_enc_status_hc = svm_page_enc_status_hc,
> >  };
> >
> >  static int __init svm_init(void)
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > index 079d9fbf278e..f68e76ee7f9c 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -8001,6 +8001,7 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
> >         .nested_get_evmcs_version = NULL,
> >         .need_emulation_on_page_fault = vmx_need_emulation_on_page_fault,
> >         .apic_init_signal_blocked = vmx_apic_init_signal_blocked,
> > +       .page_enc_status_hc = NULL,
> >  };
> >
> >  static void vmx_cleanup_l1d_flush(void)
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index cf95c36cb4f4..68428eef2dde 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -7564,6 +7564,12 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
> >                 kvm_sched_yield(vcpu->kvm, a0);
> >                 ret = 0;
> >                 break;
> > +       case KVM_HC_PAGE_ENC_STATUS:
> > +               ret = -KVM_ENOSYS;
> > +               if (kvm_x86_ops->page_enc_status_hc)
> > +                       ret = kvm_x86_ops->page_enc_status_hc(vcpu->kvm,
> > +                                       a0, a1, a2);
> > +               break;
> >         default:
> >                 ret = -KVM_ENOSYS;
> >                 break;
> > diff --git a/include/uapi/linux/kvm_para.h b/include/uapi/linux/kvm_para.h
> > index 8b86609849b9..847b83b75dc8 100644
> > --- a/include/uapi/linux/kvm_para.h
> > +++ b/include/uapi/linux/kvm_para.h
> > @@ -29,6 +29,7 @@
> >  #define KVM_HC_CLOCK_PAIRING           9
> >  #define KVM_HC_SEND_IPI                10
> >  #define KVM_HC_SCHED_YIELD             11
> > +#define KVM_HC_PAGE_ENC_STATUS         12
> >
> >  /*
> >   * hypercalls use architecture specific
> > --
> > 2.17.1
> >
> 
> I'm still not excited by the dynamic resizing. I believe the guest
> hypercall can be called in atomic contexts, which makes me
> particularly unexcited to see a potentially large vmalloc on the host
> followed by filling the buffer. Particularly when the buffer might be
> non-trivial in size (~1MB per 32GB, per some back of the envelope
> math).
>

Looking at practical situations, most hypercalls will happen during
the boot stage, while device-specific initialization is taking place,
so the maximum page encryption bitmap size will typically be allocated
early enough.

In fact, the initial hypercalls made by OVMF will probably grow the
bitmap to its maximum size even before the kernel comes up, especially
as they will be setting up page enc/dec status for MMIO, ROM, ACPI
regions, PCI device memory, etc., and most importantly for the
"non-existent" high memory range (which will likely be what drives the
bitmap to its maximum size).

Let me know if you have different thoughts on this?

> I'd like to see an enable cap for preallocating this. Yes, the first
> call might not be the right value because of hotplug, but won't the
> VMM know when hotplug is happening? If the VMM asks for the wrong
> size, and does not update the size correctly before granting the VM
> access to more RAM, that seems like the VMM's fault.

Thanks,
Ashish

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v6 08/14] KVM: X86: Introduce KVM_HC_PAGE_ENC_STATUS hypercall
  2020-04-07  5:27     ` Ashish Kalra
@ 2020-04-08  0:01       ` Steve Rutherford
  2020-04-08  0:29         ` Brijesh Singh
  0 siblings, 1 reply; 107+ messages in thread
From: Steve Rutherford @ 2020-04-08  0:01 UTC (permalink / raw)
  To: Ashish Kalra
  Cc: Paolo Bonzini, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Joerg Roedel, Borislav Petkov, Tom Lendacky, X86 ML, KVM list,
	LKML, David Rientjes, Andy Lutomirski, Brijesh Singh

On Mon, Apr 6, 2020 at 10:27 PM Ashish Kalra <ashish.kalra@amd.com> wrote:
>
> Hello Steve,
>
> On Mon, Apr 06, 2020 at 07:17:37PM -0700, Steve Rutherford wrote:
> > On Sun, Mar 29, 2020 at 11:22 PM Ashish Kalra <Ashish.Kalra@amd.com> wrote:
> > >
> > > From: Brijesh Singh <Brijesh.Singh@amd.com>
> > >
> > > This hypercall is used by the SEV guest to notify a change in the page
> > > encryption status to the hypervisor. The hypercall should be invoked
> > > only when the encryption attribute is changed from encrypted -> decrypted
> > > and vice versa. By default all guest pages are considered encrypted.
> > >
> > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > Cc: Ingo Molnar <mingo@redhat.com>
> > > Cc: "H. Peter Anvin" <hpa@zytor.com>
> > > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > > Cc: "Radim Krčmář" <rkrcmar@redhat.com>
> > > Cc: Joerg Roedel <joro@8bytes.org>
> > > Cc: Borislav Petkov <bp@suse.de>
> > > Cc: Tom Lendacky <thomas.lendacky@amd.com>
> > > Cc: x86@kernel.org
> > > Cc: kvm@vger.kernel.org
> > > Cc: linux-kernel@vger.kernel.org
> > > Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> > > Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> > > ---
> > >  Documentation/virt/kvm/hypercalls.rst | 15 +++++
> > >  arch/x86/include/asm/kvm_host.h       |  2 +
> > >  arch/x86/kvm/svm.c                    | 95 +++++++++++++++++++++++++++
> > >  arch/x86/kvm/vmx/vmx.c                |  1 +
> > >  arch/x86/kvm/x86.c                    |  6 ++
> > >  include/uapi/linux/kvm_para.h         |  1 +
> > >  6 files changed, 120 insertions(+)
> > >
> > > diff --git a/Documentation/virt/kvm/hypercalls.rst b/Documentation/virt/kvm/hypercalls.rst
> > > index dbaf207e560d..ff5287e68e81 100644
> > > --- a/Documentation/virt/kvm/hypercalls.rst
> > > +++ b/Documentation/virt/kvm/hypercalls.rst
> > > @@ -169,3 +169,18 @@ a0: destination APIC ID
> > >
> > >  :Usage example: When sending a call-function IPI-many to vCPUs, yield if
> > >                 any of the IPI target vCPUs was preempted.
> > > +
> > > +
> > > +8. KVM_HC_PAGE_ENC_STATUS
> > > +-------------------------
> > > +:Architecture: x86
> > > +:Status: active
> > > +:Purpose: Notify the encryption status changes in guest page table (SEV guest)
> > > +
> > > +a0: the guest physical address of the start page
> > > +a1: the number of pages
> > > +a2: encryption attribute
> > > +
> > > +   Where:
> > > +       * 1: Encryption attribute is set
> > > +       * 0: Encryption attribute is cleared
> > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > > index 98959e8cd448..90718fa3db47 100644
> > > --- a/arch/x86/include/asm/kvm_host.h
> > > +++ b/arch/x86/include/asm/kvm_host.h
> > > @@ -1267,6 +1267,8 @@ struct kvm_x86_ops {
> > >
> > >         bool (*apic_init_signal_blocked)(struct kvm_vcpu *vcpu);
> > >         int (*enable_direct_tlbflush)(struct kvm_vcpu *vcpu);
> > > +       int (*page_enc_status_hc)(struct kvm *kvm, unsigned long gpa,
> > > +                                 unsigned long sz, unsigned long mode);
> > Nit: spell out size instead of sz.
> > >  };
> > >
> > >  struct kvm_arch_async_pf {
> > > diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> > > index 7c2721e18b06..1d8beaf1bceb 100644
> > > --- a/arch/x86/kvm/svm.c
> > > +++ b/arch/x86/kvm/svm.c
> > > @@ -136,6 +136,8 @@ struct kvm_sev_info {
> > >         int fd;                 /* SEV device fd */
> > >         unsigned long pages_locked; /* Number of pages locked */
> > >         struct list_head regions_list;  /* List of registered regions */
> > > +       unsigned long *page_enc_bmap;
> > > +       unsigned long page_enc_bmap_size;
> > >  };
> > >
> > >  struct kvm_svm {
> > > @@ -1991,6 +1993,9 @@ static void sev_vm_destroy(struct kvm *kvm)
> > >
> > >         sev_unbind_asid(kvm, sev->handle);
> > >         sev_asid_free(sev->asid);
> > > +
> > > +       kvfree(sev->page_enc_bmap);
> > > +       sev->page_enc_bmap = NULL;
> > >  }
> > >
> > >  static void avic_vm_destroy(struct kvm *kvm)
> > > @@ -7593,6 +7598,94 @@ static int sev_receive_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
> > >         return ret;
> > >  }
> > >
> > > +static int sev_resize_page_enc_bitmap(struct kvm *kvm, unsigned long new_size)
> > > +{
> > > +       struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> > > +       unsigned long *map;
> > > +       unsigned long sz;
> > > +
> > > +       if (sev->page_enc_bmap_size >= new_size)
> > > +               return 0;
> > > +
> > > +       sz = ALIGN(new_size, BITS_PER_LONG) / 8;
> > > +
> > > +       map = vmalloc(sz);
> > > +       if (!map) {
> > > +               pr_err_once("Failed to allocate encrypted bitmap size %lx\n",
> > > +                               sz);
> > > +               return -ENOMEM;
> > > +       }
> > > +
> > > +       /* mark the page encrypted (by default) */
> > > +       memset(map, 0xff, sz);
> > > +
> > > +       bitmap_copy(map, sev->page_enc_bmap, sev->page_enc_bmap_size);
> > > +       kvfree(sev->page_enc_bmap);
> > > +
> > > +       sev->page_enc_bmap = map;
> > > +       sev->page_enc_bmap_size = new_size;
> > > +
> > > +       return 0;
> > > +}
> > > +
> > > +static int svm_page_enc_status_hc(struct kvm *kvm, unsigned long gpa,
> > > +                                 unsigned long npages, unsigned long enc)
> > > +{
> > > +       struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> > > +       kvm_pfn_t pfn_start, pfn_end;
> > > +       gfn_t gfn_start, gfn_end;
> > > +       int ret;
> > > +
> > > +       if (!sev_guest(kvm))
> > > +               return -EINVAL;
> > > +
> > > +       if (!npages)
> > > +               return 0;
> > > +
> > > +       gfn_start = gpa_to_gfn(gpa);
> > > +       gfn_end = gfn_start + npages;
> > > +
> > > +       /* out of bound access error check */
> > > +       if (gfn_end <= gfn_start)
> > > +               return -EINVAL;
> > > +
> > > +       /* lets make sure that gpa exist in our memslot */
> > > +       pfn_start = gfn_to_pfn(kvm, gfn_start);
> > > +       pfn_end = gfn_to_pfn(kvm, gfn_end);
> > > +
> > > +       if (is_error_noslot_pfn(pfn_start) && !is_noslot_pfn(pfn_start)) {
> > > +               /*
> > > +                * Allow guest MMIO range(s) to be added
> > > +                * to the page encryption bitmap.
> > > +                */
> > > +               return -EINVAL;
> > > +       }
> > > +
> > > +       if (is_error_noslot_pfn(pfn_end) && !is_noslot_pfn(pfn_end)) {
> > > +               /*
> > > +                * Allow guest MMIO range(s) to be added
> > > +                * to the page encryption bitmap.
> > > +                */
> > > +               return -EINVAL;
> > > +       }
> > > +
> > > +       mutex_lock(&kvm->lock);
> > > +       ret = sev_resize_page_enc_bitmap(kvm, gfn_end);
> > > +       if (ret)
> > > +               goto unlock;
> > > +
> > > +       if (enc)
> > > +               __bitmap_set(sev->page_enc_bmap, gfn_start,
> > > +                               gfn_end - gfn_start);
> > > +       else
> > > +               __bitmap_clear(sev->page_enc_bmap, gfn_start,
> > > +                               gfn_end - gfn_start);
> > > +
> > > +unlock:
> > > +       mutex_unlock(&kvm->lock);
> > > +       return ret;
> > > +}
> > > +
> > >  static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
> > >  {
> > >         struct kvm_sev_cmd sev_cmd;
> > > @@ -7995,6 +8088,8 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
> > >         .need_emulation_on_page_fault = svm_need_emulation_on_page_fault,
> > >
> > >         .apic_init_signal_blocked = svm_apic_init_signal_blocked,
> > > +
> > > +       .page_enc_status_hc = svm_page_enc_status_hc,
> > >  };
> > >
> > >  static int __init svm_init(void)
> > > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > > index 079d9fbf278e..f68e76ee7f9c 100644
> > > --- a/arch/x86/kvm/vmx/vmx.c
> > > +++ b/arch/x86/kvm/vmx/vmx.c
> > > @@ -8001,6 +8001,7 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
> > >         .nested_get_evmcs_version = NULL,
> > >         .need_emulation_on_page_fault = vmx_need_emulation_on_page_fault,
> > >         .apic_init_signal_blocked = vmx_apic_init_signal_blocked,
> > > +       .page_enc_status_hc = NULL,
> > >  };
> > >
> > >  static void vmx_cleanup_l1d_flush(void)
> > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > index cf95c36cb4f4..68428eef2dde 100644
> > > --- a/arch/x86/kvm/x86.c
> > > +++ b/arch/x86/kvm/x86.c
> > > @@ -7564,6 +7564,12 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
> > >                 kvm_sched_yield(vcpu->kvm, a0);
> > >                 ret = 0;
> > >                 break;
> > > +       case KVM_HC_PAGE_ENC_STATUS:
> > > +               ret = -KVM_ENOSYS;
> > > +               if (kvm_x86_ops->page_enc_status_hc)
> > > +                       ret = kvm_x86_ops->page_enc_status_hc(vcpu->kvm,
> > > +                                       a0, a1, a2);
> > > +               break;
> > >         default:
> > >                 ret = -KVM_ENOSYS;
> > >                 break;
> > > diff --git a/include/uapi/linux/kvm_para.h b/include/uapi/linux/kvm_para.h
> > > index 8b86609849b9..847b83b75dc8 100644
> > > --- a/include/uapi/linux/kvm_para.h
> > > +++ b/include/uapi/linux/kvm_para.h
> > > @@ -29,6 +29,7 @@
> > >  #define KVM_HC_CLOCK_PAIRING           9
> > >  #define KVM_HC_SEND_IPI                10
> > >  #define KVM_HC_SCHED_YIELD             11
> > > +#define KVM_HC_PAGE_ENC_STATUS         12
> > >
> > >  /*
> > >   * hypercalls use architecture specific
> > > --
> > > 2.17.1
> > >
> >
> > I'm still not excited by the dynamic resizing. I believe the guest
> > hypercall can be called in atomic contexts, which makes me
> > particularly unexcited to see a potentially large vmalloc on the host
> > followed by filling the buffer. Particularly when the buffer might be
> > non-trivial in size (~1MB per 32GB, per some back of the envelope
> > math).
> >
>
> Looking at practical situations, most hypercalls will happen during
> the boot stage, while device-specific initialization is taking place,
> so the maximum page encryption bitmap size will typically be allocated
> early enough.
>
> In fact, the initial hypercalls made by OVMF will probably grow the
> bitmap to its maximum size even before the kernel comes up, especially
> as they will be setting up page enc/dec status for MMIO, ROM, ACPI
> regions, PCI device memory, etc., and most importantly for the
> "non-existent" high memory range (which will likely be what drives the
> bitmap to its maximum size).
>
> Let me know if you have different thoughts on this?

Hi Ashish,

If this is not an issue in practice, we can just move past this. If we
are basically guaranteed that OVMF will trigger hypercalls that expand
the bitmap beyond the top of memory, then, yes, that should work. That
still leaves me slightly nervous that OVMF might regress, since it's
not obvious that issuing a hypercall beyond the top of memory is
"required" to avoid a somewhat indirectly related issue in guest
kernels.

Adding a kvm_enable_cap doesn't seem particularly complicated and
sidesteps all of these concerns, so I still prefer it. Caveat: I
haven't reviewed the feature-bit patches yet, but the enable cap would
also make it possible for kernels that support live migration to avoid
advertising it if host userspace does not want it advertised. This
seems pretty important, since hosts that don't plan to live migrate
should be able to tell the guest to stop making these hypercalls.

Thanks,
Steve


* Re: [PATCH v6 11/14] KVM: x86: Introduce KVM_SET_PAGE_ENC_BITMAP ioctl
  2020-03-30  6:22 ` [PATCH v6 11/14] KVM: x86: Introduce KVM_SET_PAGE_ENC_BITMAP ioctl Ashish Kalra
  2020-04-03 21:10   ` Krish Sadhukhan
  2020-04-03 21:46   ` Venu Busireddy
@ 2020-04-08  0:26   ` Steve Rutherford
  2020-04-08  1:48     ` Ashish Kalra
  2 siblings, 1 reply; 107+ messages in thread
From: Steve Rutherford @ 2020-04-08  0:26 UTC (permalink / raw)
  To: Ashish Kalra
  Cc: Paolo Bonzini, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Joerg Roedel, Borislav Petkov, Tom Lendacky, X86 ML, KVM list,
	LKML, David Rientjes, Andy Lutomirski, Brijesh Singh

On Sun, Mar 29, 2020 at 11:23 PM Ashish Kalra <Ashish.Kalra@amd.com> wrote:
>
> From: Brijesh Singh <Brijesh.Singh@amd.com>
>
> The ioctl can be used to set page encryption bitmap for an
> incoming guest.
>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: "Radim Krčmář" <rkrcmar@redhat.com>
> Cc: Joerg Roedel <joro@8bytes.org>
> Cc: Borislav Petkov <bp@suse.de>
> Cc: Tom Lendacky <thomas.lendacky@amd.com>
> Cc: x86@kernel.org
> Cc: kvm@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> ---
>  Documentation/virt/kvm/api.rst  | 22 +++++++++++++++++
>  arch/x86/include/asm/kvm_host.h |  2 ++
>  arch/x86/kvm/svm.c              | 42 +++++++++++++++++++++++++++++++++
>  arch/x86/kvm/x86.c              | 12 ++++++++++
>  include/uapi/linux/kvm.h        |  1 +
>  5 files changed, 79 insertions(+)
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 8ad800ebb54f..4d1004a154f6 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -4675,6 +4675,28 @@ or shared. The bitmap can be used during the guest migration, if the page
>  is private then userspace need to use SEV migration commands to transmit
>  the page.
>
> +4.126 KVM_SET_PAGE_ENC_BITMAP (vm ioctl)
> +---------------------------------------
> +
> +:Capability: basic
> +:Architectures: x86
> +:Type: vm ioctl
> +:Parameters: struct kvm_page_enc_bitmap (in/out)
> +:Returns: 0 on success, -1 on error
> +
> +/* for KVM_SET_PAGE_ENC_BITMAP */
> +struct kvm_page_enc_bitmap {
> +       __u64 start_gfn;
> +       __u64 num_pages;
> +       union {
> +               void __user *enc_bitmap; /* one bit per page */
> +               __u64 padding2;
> +       };
> +};
> +
> +During the guest live migration the outgoing guest exports its page encryption
> +bitmap, the KVM_SET_PAGE_ENC_BITMAP can be used to build the page encryption
> +bitmap for an incoming guest.
>
>  5. The kvm_run structure
>  ========================
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 27e43e3ec9d8..d30f770aaaea 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1271,6 +1271,8 @@ struct kvm_x86_ops {
>                                   unsigned long sz, unsigned long mode);
>         int (*get_page_enc_bitmap)(struct kvm *kvm,
>                                 struct kvm_page_enc_bitmap *bmap);
> +       int (*set_page_enc_bitmap)(struct kvm *kvm,
> +                               struct kvm_page_enc_bitmap *bmap);
>  };
>
>  struct kvm_arch_async_pf {
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index bae783cd396a..313343a43045 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -7756,6 +7756,47 @@ static int svm_get_page_enc_bitmap(struct kvm *kvm,
>         return ret;
>  }
>
> +static int svm_set_page_enc_bitmap(struct kvm *kvm,
> +                                  struct kvm_page_enc_bitmap *bmap)
> +{
> +       struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> +       unsigned long gfn_start, gfn_end;
> +       unsigned long *bitmap;
> +       unsigned long sz, i;
> +       int ret;
> +
> +       if (!sev_guest(kvm))
> +               return -ENOTTY;
> +
> +       gfn_start = bmap->start_gfn;
> +       gfn_end = gfn_start + bmap->num_pages;
> +
> +       sz = ALIGN(bmap->num_pages, BITS_PER_LONG) / 8;
> +       bitmap = kmalloc(sz, GFP_KERNEL);
> +       if (!bitmap)
> +               return -ENOMEM;
> +
> +       ret = -EFAULT;
> +       if (copy_from_user(bitmap, bmap->enc_bitmap, sz))
> +               goto out;
> +
> +       mutex_lock(&kvm->lock);
> +       ret = sev_resize_page_enc_bitmap(kvm, gfn_end);
I realize now that usermode could use this for initializing the
minimum size of the enc bitmap, which probably solves my issue from
the other thread.
> +       if (ret)
> +               goto unlock;
> +
> +       i = gfn_start;
> +       for_each_clear_bit_from(i, bitmap, (gfn_end - gfn_start))
> +               clear_bit(i + gfn_start, sev->page_enc_bmap);
This API seems a bit strange, since it can only clear bits. I would
expect "set" to force the bitmap to match the values passed down,
instead of only ensuring that bits cleared in the input are also
cleared in the kernel.

This should copy the values from userspace (and fix up the ends, since
byte alignment makes that complicated) instead of iterating in this
way.
> +
> +       ret = 0;
> +unlock:
> +       mutex_unlock(&kvm->lock);
> +out:
> +       kfree(bitmap);
> +       return ret;
> +}
> +
>  static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
>  {
>         struct kvm_sev_cmd sev_cmd;
> @@ -8161,6 +8202,7 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
>
>         .page_enc_status_hc = svm_page_enc_status_hc,
>         .get_page_enc_bitmap = svm_get_page_enc_bitmap,
> +       .set_page_enc_bitmap = svm_set_page_enc_bitmap,
>  };
>
>  static int __init svm_init(void)
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 3c3fea4e20b5..05e953b2ec61 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -5238,6 +5238,18 @@ long kvm_arch_vm_ioctl(struct file *filp,
>                         r = kvm_x86_ops->get_page_enc_bitmap(kvm, &bitmap);
>                 break;
>         }
> +       case KVM_SET_PAGE_ENC_BITMAP: {
> +               struct kvm_page_enc_bitmap bitmap;
> +
> +               r = -EFAULT;
> +               if (copy_from_user(&bitmap, argp, sizeof(bitmap)))
> +                       goto out;
> +
> +               r = -ENOTTY;
> +               if (kvm_x86_ops->set_page_enc_bitmap)
> +                       r = kvm_x86_ops->set_page_enc_bitmap(kvm, &bitmap);
> +               break;
> +       }
>         default:
>                 r = -ENOTTY;
>         }
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index db1ebf85e177..b4b01d47e568 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1489,6 +1489,7 @@ struct kvm_enc_region {
>  #define KVM_S390_CLEAR_RESET   _IO(KVMIO,   0xc4)
>
>  #define KVM_GET_PAGE_ENC_BITMAP        _IOW(KVMIO, 0xc5, struct kvm_page_enc_bitmap)
> +#define KVM_SET_PAGE_ENC_BITMAP        _IOW(KVMIO, 0xc6, struct kvm_page_enc_bitmap)
>
>  /* Secure Encrypted Virtualization command */
>  enum sev_cmd_id {
> --
> 2.17.1
>


* Re: [PATCH v6 08/14] KVM: X86: Introduce KVM_HC_PAGE_ENC_STATUS hypercall
  2020-04-08  0:01       ` Steve Rutherford
@ 2020-04-08  0:29         ` Brijesh Singh
  2020-04-08  0:35           ` Steve Rutherford
  0 siblings, 1 reply; 107+ messages in thread
From: Brijesh Singh @ 2020-04-08  0:29 UTC (permalink / raw)
  To: Steve Rutherford, Ashish Kalra
  Cc: brijesh.singh, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Joerg Roedel, Borislav Petkov, Tom Lendacky,
	X86 ML, KVM list, LKML, David Rientjes, Andy Lutomirski


On 4/7/20 7:01 PM, Steve Rutherford wrote:
> On Mon, Apr 6, 2020 at 10:27 PM Ashish Kalra <ashish.kalra@amd.com> wrote:
>> Hello Steve,
>>
>> On Mon, Apr 06, 2020 at 07:17:37PM -0700, Steve Rutherford wrote:
>>> On Sun, Mar 29, 2020 at 11:22 PM Ashish Kalra <Ashish.Kalra@amd.com> wrote:
>>>> From: Brijesh Singh <Brijesh.Singh@amd.com>
>>>>
>>>> This hypercall is used by the SEV guest to notify a change in the page
>>>> encryption status to the hypervisor. The hypercall should be invoked
>>>> only when the encryption attribute is changed from encrypted -> decrypted
>>>> and vice versa. By default all guest pages are considered encrypted.
>>>>
>>>> Cc: Thomas Gleixner <tglx@linutronix.de>
>>>> Cc: Ingo Molnar <mingo@redhat.com>
>>>> Cc: "H. Peter Anvin" <hpa@zytor.com>
>>>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>>>> Cc: "Radim Krčmář" <rkrcmar@redhat.com>
>>>> Cc: Joerg Roedel <joro@8bytes.org>
>>>> Cc: Borislav Petkov <bp@suse.de>
>>>> Cc: Tom Lendacky <thomas.lendacky@amd.com>
>>>> Cc: x86@kernel.org
>>>> Cc: kvm@vger.kernel.org
>>>> Cc: linux-kernel@vger.kernel.org
>>>> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
>>>> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
>>>> ---
>>>>  Documentation/virt/kvm/hypercalls.rst | 15 +++++
>>>>  arch/x86/include/asm/kvm_host.h       |  2 +
>>>>  arch/x86/kvm/svm.c                    | 95 +++++++++++++++++++++++++++
>>>>  arch/x86/kvm/vmx/vmx.c                |  1 +
>>>>  arch/x86/kvm/x86.c                    |  6 ++
>>>>  include/uapi/linux/kvm_para.h         |  1 +
>>>>  6 files changed, 120 insertions(+)
>>>>
>>>> diff --git a/Documentation/virt/kvm/hypercalls.rst b/Documentation/virt/kvm/hypercalls.rst
>>>> index dbaf207e560d..ff5287e68e81 100644
>>>> --- a/Documentation/virt/kvm/hypercalls.rst
>>>> +++ b/Documentation/virt/kvm/hypercalls.rst
>>>> @@ -169,3 +169,18 @@ a0: destination APIC ID
>>>>
>>>>  :Usage example: When sending a call-function IPI-many to vCPUs, yield if
>>>>                 any of the IPI target vCPUs was preempted.
>>>> +
>>>> +
>>>> +8. KVM_HC_PAGE_ENC_STATUS
>>>> +-------------------------
>>>> +:Architecture: x86
>>>> +:Status: active
>>>> +:Purpose: Notify the encryption status changes in guest page table (SEV guest)
>>>> +
>>>> +a0: the guest physical address of the start page
>>>> +a1: the number of pages
>>>> +a2: encryption attribute
>>>> +
>>>> +   Where:
>>>> +       * 1: Encryption attribute is set
>>>> +       * 0: Encryption attribute is cleared
>>>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>>>> index 98959e8cd448..90718fa3db47 100644
>>>> --- a/arch/x86/include/asm/kvm_host.h
>>>> +++ b/arch/x86/include/asm/kvm_host.h
>>>> @@ -1267,6 +1267,8 @@ struct kvm_x86_ops {
>>>>
>>>>         bool (*apic_init_signal_blocked)(struct kvm_vcpu *vcpu);
>>>>         int (*enable_direct_tlbflush)(struct kvm_vcpu *vcpu);
>>>> +       int (*page_enc_status_hc)(struct kvm *kvm, unsigned long gpa,
>>>> +                                 unsigned long sz, unsigned long mode);
>>> Nit: spell out size instead of sz.
>>>>  };
>>>>
>>>>  struct kvm_arch_async_pf {
>>>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
>>>> index 7c2721e18b06..1d8beaf1bceb 100644
>>>> --- a/arch/x86/kvm/svm.c
>>>> +++ b/arch/x86/kvm/svm.c
>>>> @@ -136,6 +136,8 @@ struct kvm_sev_info {
>>>>         int fd;                 /* SEV device fd */
>>>>         unsigned long pages_locked; /* Number of pages locked */
>>>>         struct list_head regions_list;  /* List of registered regions */
>>>> +       unsigned long *page_enc_bmap;
>>>> +       unsigned long page_enc_bmap_size;
>>>>  };
>>>>
>>>>  struct kvm_svm {
>>>> @@ -1991,6 +1993,9 @@ static void sev_vm_destroy(struct kvm *kvm)
>>>>
>>>>         sev_unbind_asid(kvm, sev->handle);
>>>>         sev_asid_free(sev->asid);
>>>> +
>>>> +       kvfree(sev->page_enc_bmap);
>>>> +       sev->page_enc_bmap = NULL;
>>>>  }
>>>>
>>>>  static void avic_vm_destroy(struct kvm *kvm)
>>>> @@ -7593,6 +7598,94 @@ static int sev_receive_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
>>>>         return ret;
>>>>  }
>>>>
>>>> +static int sev_resize_page_enc_bitmap(struct kvm *kvm, unsigned long new_size)
>>>> +{
>>>> +       struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
>>>> +       unsigned long *map;
>>>> +       unsigned long sz;
>>>> +
>>>> +       if (sev->page_enc_bmap_size >= new_size)
>>>> +               return 0;
>>>> +
>>>> +       sz = ALIGN(new_size, BITS_PER_LONG) / 8;
>>>> +
>>>> +       map = vmalloc(sz);
>>>> +       if (!map) {
>>>> +               pr_err_once("Failed to allocate encrypted bitmap size %lx\n",
>>>> +                               sz);
>>>> +               return -ENOMEM;
>>>> +       }
>>>> +
>>>> +       /* mark the page encrypted (by default) */
>>>> +       memset(map, 0xff, sz);
>>>> +
>>>> +       bitmap_copy(map, sev->page_enc_bmap, sev->page_enc_bmap_size);
>>>> +       kvfree(sev->page_enc_bmap);
>>>> +
>>>> +       sev->page_enc_bmap = map;
>>>> +       sev->page_enc_bmap_size = new_size;
>>>> +
>>>> +       return 0;
>>>> +}
>>>> +
>>>> +static int svm_page_enc_status_hc(struct kvm *kvm, unsigned long gpa,
>>>> +                                 unsigned long npages, unsigned long enc)
>>>> +{
>>>> +       struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
>>>> +       kvm_pfn_t pfn_start, pfn_end;
>>>> +       gfn_t gfn_start, gfn_end;
>>>> +       int ret;
>>>> +
>>>> +       if (!sev_guest(kvm))
>>>> +               return -EINVAL;
>>>> +
>>>> +       if (!npages)
>>>> +               return 0;
>>>> +
>>>> +       gfn_start = gpa_to_gfn(gpa);
>>>> +       gfn_end = gfn_start + npages;
>>>> +
>>>> +       /* out of bound access error check */
>>>> +       if (gfn_end <= gfn_start)
>>>> +               return -EINVAL;
>>>> +
>>>> +       /* lets make sure that gpa exist in our memslot */
>>>> +       pfn_start = gfn_to_pfn(kvm, gfn_start);
>>>> +       pfn_end = gfn_to_pfn(kvm, gfn_end);
>>>> +
>>>> +       if (is_error_noslot_pfn(pfn_start) && !is_noslot_pfn(pfn_start)) {
>>>> +               /*
>>>> +                * Allow guest MMIO range(s) to be added
>>>> +                * to the page encryption bitmap.
>>>> +                */
>>>> +               return -EINVAL;
>>>> +       }
>>>> +
>>>> +       if (is_error_noslot_pfn(pfn_end) && !is_noslot_pfn(pfn_end)) {
>>>> +               /*
>>>> +                * Allow guest MMIO range(s) to be added
>>>> +                * to the page encryption bitmap.
>>>> +                */
>>>> +               return -EINVAL;
>>>> +       }
>>>> +
>>>> +       mutex_lock(&kvm->lock);
>>>> +       ret = sev_resize_page_enc_bitmap(kvm, gfn_end);
>>>> +       if (ret)
>>>> +               goto unlock;
>>>> +
>>>> +       if (enc)
>>>> +               __bitmap_set(sev->page_enc_bmap, gfn_start,
>>>> +                               gfn_end - gfn_start);
>>>> +       else
>>>> +               __bitmap_clear(sev->page_enc_bmap, gfn_start,
>>>> +                               gfn_end - gfn_start);
>>>> +
>>>> +unlock:
>>>> +       mutex_unlock(&kvm->lock);
>>>> +       return ret;
>>>> +}
>>>> +
>>>>  static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
>>>>  {
>>>>         struct kvm_sev_cmd sev_cmd;
>>>> @@ -7995,6 +8088,8 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
>>>>         .need_emulation_on_page_fault = svm_need_emulation_on_page_fault,
>>>>
>>>>         .apic_init_signal_blocked = svm_apic_init_signal_blocked,
>>>> +
>>>> +       .page_enc_status_hc = svm_page_enc_status_hc,
>>>>  };
>>>>
>>>>  static int __init svm_init(void)
>>>> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
>>>> index 079d9fbf278e..f68e76ee7f9c 100644
>>>> --- a/arch/x86/kvm/vmx/vmx.c
>>>> +++ b/arch/x86/kvm/vmx/vmx.c
>>>> @@ -8001,6 +8001,7 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
>>>>         .nested_get_evmcs_version = NULL,
>>>>         .need_emulation_on_page_fault = vmx_need_emulation_on_page_fault,
>>>>         .apic_init_signal_blocked = vmx_apic_init_signal_blocked,
>>>> +       .page_enc_status_hc = NULL,
>>>>  };
>>>>
>>>>  static void vmx_cleanup_l1d_flush(void)
>>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>>>> index cf95c36cb4f4..68428eef2dde 100644
>>>> --- a/arch/x86/kvm/x86.c
>>>> +++ b/arch/x86/kvm/x86.c
>>>> @@ -7564,6 +7564,12 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
>>>>                 kvm_sched_yield(vcpu->kvm, a0);
>>>>                 ret = 0;
>>>>                 break;
>>>> +       case KVM_HC_PAGE_ENC_STATUS:
>>>> +               ret = -KVM_ENOSYS;
>>>> +               if (kvm_x86_ops->page_enc_status_hc)
>>>> +                       ret = kvm_x86_ops->page_enc_status_hc(vcpu->kvm,
>>>> +                                       a0, a1, a2);
>>>> +               break;
>>>>         default:
>>>>                 ret = -KVM_ENOSYS;
>>>>                 break;
>>>> diff --git a/include/uapi/linux/kvm_para.h b/include/uapi/linux/kvm_para.h
>>>> index 8b86609849b9..847b83b75dc8 100644
>>>> --- a/include/uapi/linux/kvm_para.h
>>>> +++ b/include/uapi/linux/kvm_para.h
>>>> @@ -29,6 +29,7 @@
>>>>  #define KVM_HC_CLOCK_PAIRING           9
>>>>  #define KVM_HC_SEND_IPI                10
>>>>  #define KVM_HC_SCHED_YIELD             11
>>>> +#define KVM_HC_PAGE_ENC_STATUS         12
>>>>
>>>>  /*
>>>>   * hypercalls use architecture specific
>>>> --
>>>> 2.17.1
>>>>
>>> I'm still not excited by the dynamic resizing. I believe the guest
>>> hypercall can be called in atomic contexts, which makes me
>>> particularly unexcited to see a potentially large vmalloc on the host
>>> followed by filling the buffer. Particularly when the buffer might be
>>> non-trivial in size (~1MB per 32GB, per some back of the envelope
>>> math).
>>>
>> I think looking at more practical situations, most hypercalls will
>> happen during the boot stage, when device specific initializations are
>> happening, so typically the maximum page encryption bitmap size would
>> be allocated early enough.
>>
>> In fact, initial hypercalls made by OVMF will probably allocate the
>> maximum page bitmap size even before the kernel comes up, especially
>> as they will be setting up page enc/dec status for MMIO, ROM, ACPI
>> regions, PCI device memory, etc., and most importantly for
>> "non-existent" high memory range (which will probably be the
>> maximum size page encryption bitmap allocated/resized).
>>
>> Let me know if you have different thoughts on this ?
> Hi Ashish,
>
> If this is not an issue in practice, we can just move past this. If we
> are basically guaranteed that OVMF will trigger hypercalls that expand
> the bitmap beyond the top of memory, then, yes, that should work. That
> leaves me slightly nervous that OVMF might regress since it's not
> obvious that calling a hypercall beyond the top of memory would be
> "required" for avoiding a somewhat indirectly related issue in guest
> kernels.


If possible then we should try to avoid growing/shrinking the bitmap.
Today OVMF may not be accessing beyond memory, but a malicious guest
could send a hypercall down which triggers a huge memory allocation
on the host side and may eventually cause denial of service for others.
I am in favor of finding some solution to handle this case. How
about Steve's suggestion of the VMM making a call down to the kernel to
tell it how big the bitmap should be? Initially it would be sized to the
guest RAM, and if the VMM ever did a memory expansion it could send down
another notification to increase the bitmap.

Optionally, instead of adding a new ioctl, I was wondering if we can
extend kvm_arch_prepare_memory_region() with an SVM-specific x86_ops
callback which can read the userspace-provided memory region, calculate
the amount of guest RAM managed by KVM, and grow/shrink the bitmap
based on that information. I have not looked deep enough to see if it's
doable, but if it can work then we can avoid adding yet another ioctl.

>
> Adding a kvm_enable_cap doesn't seem particularly complicated and side
> steps all of these concerns, so I still prefer it. Caveat, haven't
> reviewed the patches about the feature bits yet: the enable cap would
> also make it possible for kernels that support live migration to avoid
> advertising live migration if host usermode does not want it to be
> advertised. This seems pretty important, since hosts that don't plan
> to live migrate should have the ability to tell the guest to stop
> calling.
>
> Thanks,
> Steve

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v6 08/14] KVM: X86: Introduce KVM_HC_PAGE_ENC_STATUS hypercall
  2020-04-08  0:29         ` Brijesh Singh
@ 2020-04-08  0:35           ` Steve Rutherford
  2020-04-08  1:17             ` Ashish Kalra
  0 siblings, 1 reply; 107+ messages in thread
From: Steve Rutherford @ 2020-04-08  0:35 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: Ashish Kalra, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Joerg Roedel, Borislav Petkov, Tom Lendacky,
	X86 ML, KVM list, LKML, David Rientjes, Andy Lutomirski

On Tue, Apr 7, 2020 at 5:29 PM Brijesh Singh <brijesh.singh@amd.com> wrote:
>
>
> On 4/7/20 7:01 PM, Steve Rutherford wrote:
> > On Mon, Apr 6, 2020 at 10:27 PM Ashish Kalra <ashish.kalra@amd.com> wrote:
> >> Hello Steve,
> >>
> >> On Mon, Apr 06, 2020 at 07:17:37PM -0700, Steve Rutherford wrote:
> >>> On Sun, Mar 29, 2020 at 11:22 PM Ashish Kalra <Ashish.Kalra@amd.com> wrote:
> >>>> From: Brijesh Singh <Brijesh.Singh@amd.com>
> >>>>
> >>>> This hypercall is used by the SEV guest to notify a change in the page
> >>>> encryption status to the hypervisor. The hypercall should be invoked
> >>>> only when the encryption attribute is changed from encrypted -> decrypted
> >>>> and vice versa. By default all guest pages are considered encrypted.
> >>>>
> >>>> Cc: Thomas Gleixner <tglx@linutronix.de>
> >>>> Cc: Ingo Molnar <mingo@redhat.com>
> >>>> Cc: "H. Peter Anvin" <hpa@zytor.com>
> >>>> Cc: Paolo Bonzini <pbonzini@redhat.com>
> >>>> Cc: "Radim Krčmář" <rkrcmar@redhat.com>
> >>>> Cc: Joerg Roedel <joro@8bytes.org>
> >>>> Cc: Borislav Petkov <bp@suse.de>
> >>>> Cc: Tom Lendacky <thomas.lendacky@amd.com>
> >>>> Cc: x86@kernel.org
> >>>> Cc: kvm@vger.kernel.org
> >>>> Cc: linux-kernel@vger.kernel.org
> >>>> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> >>>> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> >>>> ---
> >>>>  Documentation/virt/kvm/hypercalls.rst | 15 +++++
> >>>>  arch/x86/include/asm/kvm_host.h       |  2 +
> >>>>  arch/x86/kvm/svm.c                    | 95 +++++++++++++++++++++++++++
> >>>>  arch/x86/kvm/vmx/vmx.c                |  1 +
> >>>>  arch/x86/kvm/x86.c                    |  6 ++
> >>>>  include/uapi/linux/kvm_para.h         |  1 +
> >>>>  6 files changed, 120 insertions(+)
> >>>>
> >>>> diff --git a/Documentation/virt/kvm/hypercalls.rst b/Documentation/virt/kvm/hypercalls.rst
> >>>> index dbaf207e560d..ff5287e68e81 100644
> >>>> --- a/Documentation/virt/kvm/hypercalls.rst
> >>>> +++ b/Documentation/virt/kvm/hypercalls.rst
> >>>> @@ -169,3 +169,18 @@ a0: destination APIC ID
> >>>>
> >>>>  :Usage example: When sending a call-function IPI-many to vCPUs, yield if
> >>>>                 any of the IPI target vCPUs was preempted.
> >>>> +
> >>>> +
> >>>> +8. KVM_HC_PAGE_ENC_STATUS
> >>>> +-------------------------
> >>>> +:Architecture: x86
> >>>> +:Status: active
> >>>> +:Purpose: Notify the encryption status changes in guest page table (SEV guest)
> >>>> +
> >>>> +a0: the guest physical address of the start page
> >>>> +a1: the number of pages
> >>>> +a2: encryption attribute
> >>>> +
> >>>> +   Where:
> >>>> +       * 1: Encryption attribute is set
> >>>> +       * 0: Encryption attribute is cleared
> >>>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> >>>> index 98959e8cd448..90718fa3db47 100644
> >>>> --- a/arch/x86/include/asm/kvm_host.h
> >>>> +++ b/arch/x86/include/asm/kvm_host.h
> >>>> @@ -1267,6 +1267,8 @@ struct kvm_x86_ops {
> >>>>
> >>>>         bool (*apic_init_signal_blocked)(struct kvm_vcpu *vcpu);
> >>>>         int (*enable_direct_tlbflush)(struct kvm_vcpu *vcpu);
> >>>> +       int (*page_enc_status_hc)(struct kvm *kvm, unsigned long gpa,
> >>>> +                                 unsigned long sz, unsigned long mode);
> >>> Nit: spell out size instead of sz.
> >>>>  };
> >>>>
> >>>>  struct kvm_arch_async_pf {
> >>>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> >>>> index 7c2721e18b06..1d8beaf1bceb 100644
> >>>> --- a/arch/x86/kvm/svm.c
> >>>> +++ b/arch/x86/kvm/svm.c
> >>>> @@ -136,6 +136,8 @@ struct kvm_sev_info {
> >>>>         int fd;                 /* SEV device fd */
> >>>>         unsigned long pages_locked; /* Number of pages locked */
> >>>>         struct list_head regions_list;  /* List of registered regions */
> >>>> +       unsigned long *page_enc_bmap;
> >>>> +       unsigned long page_enc_bmap_size;
> >>>>  };
> >>>>
> >>>>  struct kvm_svm {
> >>>> @@ -1991,6 +1993,9 @@ static void sev_vm_destroy(struct kvm *kvm)
> >>>>
> >>>>         sev_unbind_asid(kvm, sev->handle);
> >>>>         sev_asid_free(sev->asid);
> >>>> +
> >>>> +       kvfree(sev->page_enc_bmap);
> >>>> +       sev->page_enc_bmap = NULL;
> >>>>  }
> >>>>
> >>>>  static void avic_vm_destroy(struct kvm *kvm)
> >>>> @@ -7593,6 +7598,94 @@ static int sev_receive_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
> >>>>         return ret;
> >>>>  }
> >>>>
> >>>> +static int sev_resize_page_enc_bitmap(struct kvm *kvm, unsigned long new_size)
> >>>> +{
> >>>> +       struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> >>>> +       unsigned long *map;
> >>>> +       unsigned long sz;
> >>>> +
> >>>> +       if (sev->page_enc_bmap_size >= new_size)
> >>>> +               return 0;
> >>>> +
> >>>> +       sz = ALIGN(new_size, BITS_PER_LONG) / 8;
> >>>> +
> >>>> +       map = vmalloc(sz);
> >>>> +       if (!map) {
> >>>> +               pr_err_once("Failed to allocate encrypted bitmap size %lx\n",
> >>>> +                               sz);
> >>>> +               return -ENOMEM;
> >>>> +       }
> >>>> +
> >>>> +       /* mark the page encrypted (by default) */
> >>>> +       memset(map, 0xff, sz);
> >>>> +
> >>>> +       bitmap_copy(map, sev->page_enc_bmap, sev->page_enc_bmap_size);
> >>>> +       kvfree(sev->page_enc_bmap);
> >>>> +
> >>>> +       sev->page_enc_bmap = map;
> >>>> +       sev->page_enc_bmap_size = new_size;
> >>>> +
> >>>> +       return 0;
> >>>> +}
> >>>> +
> >>>> +static int svm_page_enc_status_hc(struct kvm *kvm, unsigned long gpa,
> >>>> +                                 unsigned long npages, unsigned long enc)
> >>>> +{
> >>>> +       struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> >>>> +       kvm_pfn_t pfn_start, pfn_end;
> >>>> +       gfn_t gfn_start, gfn_end;
> >>>> +       int ret;
> >>>> +
> >>>> +       if (!sev_guest(kvm))
> >>>> +               return -EINVAL;
> >>>> +
> >>>> +       if (!npages)
> >>>> +               return 0;
> >>>> +
> >>>> +       gfn_start = gpa_to_gfn(gpa);
> >>>> +       gfn_end = gfn_start + npages;
> >>>> +
> >>>> +       /* out of bound access error check */
> >>>> +       if (gfn_end <= gfn_start)
> >>>> +               return -EINVAL;
> >>>> +
> >>>> +       /* lets make sure that gpa exist in our memslot */
> >>>> +       pfn_start = gfn_to_pfn(kvm, gfn_start);
> >>>> +       pfn_end = gfn_to_pfn(kvm, gfn_end);
> >>>> +
> >>>> +       if (is_error_noslot_pfn(pfn_start) && !is_noslot_pfn(pfn_start)) {
> >>>> +               /*
> >>>> +                * Allow guest MMIO range(s) to be added
> >>>> +                * to the page encryption bitmap.
> >>>> +                */
> >>>> +               return -EINVAL;
> >>>> +       }
> >>>> +
> >>>> +       if (is_error_noslot_pfn(pfn_end) && !is_noslot_pfn(pfn_end)) {
> >>>> +               /*
> >>>> +                * Allow guest MMIO range(s) to be added
> >>>> +                * to the page encryption bitmap.
> >>>> +                */
> >>>> +               return -EINVAL;
> >>>> +       }
> >>>> +
> >>>> +       mutex_lock(&kvm->lock);
> >>>> +       ret = sev_resize_page_enc_bitmap(kvm, gfn_end);
> >>>> +       if (ret)
> >>>> +               goto unlock;
> >>>> +
> >>>> +       if (enc)
> >>>> +               __bitmap_set(sev->page_enc_bmap, gfn_start,
> >>>> +                               gfn_end - gfn_start);
> >>>> +       else
> >>>> +               __bitmap_clear(sev->page_enc_bmap, gfn_start,
> >>>> +                               gfn_end - gfn_start);
> >>>> +
> >>>> +unlock:
> >>>> +       mutex_unlock(&kvm->lock);
> >>>> +       return ret;
> >>>> +}
> >>>> +
> >>>>  static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
> >>>>  {
> >>>>         struct kvm_sev_cmd sev_cmd;
> >>>> @@ -7995,6 +8088,8 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
> >>>>         .need_emulation_on_page_fault = svm_need_emulation_on_page_fault,
> >>>>
> >>>>         .apic_init_signal_blocked = svm_apic_init_signal_blocked,
> >>>> +
> >>>> +       .page_enc_status_hc = svm_page_enc_status_hc,
> >>>>  };
> >>>>
> >>>>  static int __init svm_init(void)
> >>>> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> >>>> index 079d9fbf278e..f68e76ee7f9c 100644
> >>>> --- a/arch/x86/kvm/vmx/vmx.c
> >>>> +++ b/arch/x86/kvm/vmx/vmx.c
> >>>> @@ -8001,6 +8001,7 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
> >>>>         .nested_get_evmcs_version = NULL,
> >>>>         .need_emulation_on_page_fault = vmx_need_emulation_on_page_fault,
> >>>>         .apic_init_signal_blocked = vmx_apic_init_signal_blocked,
> >>>> +       .page_enc_status_hc = NULL,
> >>>>  };
> >>>>
> >>>>  static void vmx_cleanup_l1d_flush(void)
> >>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> >>>> index cf95c36cb4f4..68428eef2dde 100644
> >>>> --- a/arch/x86/kvm/x86.c
> >>>> +++ b/arch/x86/kvm/x86.c
> >>>> @@ -7564,6 +7564,12 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
> >>>>                 kvm_sched_yield(vcpu->kvm, a0);
> >>>>                 ret = 0;
> >>>>                 break;
> >>>> +       case KVM_HC_PAGE_ENC_STATUS:
> >>>> +               ret = -KVM_ENOSYS;
> >>>> +               if (kvm_x86_ops->page_enc_status_hc)
> >>>> +                       ret = kvm_x86_ops->page_enc_status_hc(vcpu->kvm,
> >>>> +                                       a0, a1, a2);
> >>>> +               break;
> >>>>         default:
> >>>>                 ret = -KVM_ENOSYS;
> >>>>                 break;
> >>>> diff --git a/include/uapi/linux/kvm_para.h b/include/uapi/linux/kvm_para.h
> >>>> index 8b86609849b9..847b83b75dc8 100644
> >>>> --- a/include/uapi/linux/kvm_para.h
> >>>> +++ b/include/uapi/linux/kvm_para.h
> >>>> @@ -29,6 +29,7 @@
> >>>>  #define KVM_HC_CLOCK_PAIRING           9
> >>>>  #define KVM_HC_SEND_IPI                10
> >>>>  #define KVM_HC_SCHED_YIELD             11
> >>>> +#define KVM_HC_PAGE_ENC_STATUS         12
> >>>>
> >>>>  /*
> >>>>   * hypercalls use architecture specific
> >>>> --
> >>>> 2.17.1
> >>>>
> >>> I'm still not excited by the dynamic resizing. I believe the guest
> >>> hypercall can be called in atomic contexts, which makes me
> >>> particularly unexcited to see a potentially large vmalloc on the host
> >>> followed by filling the buffer. Particularly when the buffer might be
> >>> non-trivial in size (~1MB per 32GB, per some back of the envelope
> >>> math).
> >>>
> >> I think looking at more practical situations, most hypercalls will
> >> happen during the boot stage, when device specific initializations are
> >> happening, so typically the maximum page encryption bitmap size would
> >> be allocated early enough.
> >>
> >> In fact, initial hypercalls made by OVMF will probably allocate the
> >> maximum page bitmap size even before the kernel comes up, especially
> >> as they will be setting up page enc/dec status for MMIO, ROM, ACPI
> >> regions, PCI device memory, etc., and most importantly for
> >> "non-existent" high memory range (which will probably be the
> >> maximum size page encryption bitmap allocated/resized).
> >>
> >> Let me know if you have different thoughts on this ?
> > Hi Ashish,
> >
> > If this is not an issue in practice, we can just move past this. If we
> > are basically guaranteed that OVMF will trigger hypercalls that expand
> > the bitmap beyond the top of memory, then, yes, that should work. That
> > leaves me slightly nervous that OVMF might regress since it's not
> > obvious that calling a hypercall beyond the top of memory would be
> > "required" for avoiding a somewhat indirectly related issue in guest
> > kernels.
>
>
> If possible then we should try to avoid growing/shrinking the bitmap.
> Today OVMF may not be accessing beyond memory, but a malicious guest
> could send a hypercall down which triggers a huge memory allocation
> on the host side and may eventually cause denial of service for others.
Nice catch! Was just writing up an email about this.
> I am in favor of finding some solution to handle this case. How
> about Steve's suggestion of the VMM making a call down to the kernel to
> tell it how big the bitmap should be? Initially it would be sized to the
> guest RAM, and if the VMM ever did a memory expansion it could send down
> another notification to increase the bitmap.
>
> Optionally, instead of adding a new ioctl, I was wondering if we can
> extend kvm_arch_prepare_memory_region() with an SVM-specific x86_ops
> callback which can read the userspace-provided memory region, calculate
> the amount of guest RAM managed by KVM, and grow/shrink the bitmap
> based on that information. I have not looked deep enough to see if it's
> doable, but if it can work then we can avoid adding yet another ioctl.
We also have the KVM_SET_PAGE_ENC_BITMAP ioctl in a later patch in this
series. We could also use that set ioctl for initialization (it's a
little excessive for initialization, since there will be an additional
ephemeral allocation and a few additional buffer copies, but that's
probably fine). An enable_cap has the added benefit of probably being
necessary anyway, so that userspace can disable the migration feature flag.

In general, userspace is going to have to be in direct control of the
buffer and its size.
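As a rough model of that division of responsibility — the VMM fixes the
bitmap size once, and guest hypercalls may only update bits within that
bound, never trigger a host-side allocation — the following userspace
sketch may help. All names here are hypothetical, not from the patch set:

```c
#include <stdlib.h>
#include <string.h>

/* Bitmap whose size is decided by the VMM, not by the guest. */
struct enc_bitmap {
	unsigned char *bits;
	unsigned long nbits;	/* fixed at init; guests cannot grow it */
};

/* VMM-side: allocate once, sized to guest RAM; encrypted by default,
 * matching the patch's memset(map, 0xff, sz). */
static int vmm_init_bitmap(struct enc_bitmap *b, unsigned long guest_gfns)
{
	unsigned long bytes = (guest_gfns + 7) / 8;

	b->nbits = guest_gfns;
	b->bits = malloc(bytes);
	if (!b->bits)
		return -1;
	memset(b->bits, 0xff, bytes);
	return 0;
}

/* Guest-side hypercall handler: reject out-of-bound requests instead
 * of resizing, closing the allocation-driven DoS discussed above. */
static int guest_set_enc_status(struct enc_bitmap *b, unsigned long gfn,
				unsigned long npages, int enc)
{
	unsigned long i;

	if (gfn + npages < gfn || gfn + npages > b->nbits)
		return -1;
	for (i = gfn; i < gfn + npages; i++) {
		if (enc)
			b->bits[i / 8] |= 1u << (i % 8);
		else
			b->bits[i / 8] &= ~(1u << (i % 8));
	}
	return 0;
}
```

Under this model, a request such as setting 500 pages starting at gfn
1000 against a 1024-gfn bitmap simply fails, rather than forcing the
host to vmalloc a larger buffer on the guest's behalf.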


* Re: [PATCH v6 08/14] KVM: X86: Introduce KVM_HC_PAGE_ENC_STATUS hypercall
  2020-04-08  0:35           ` Steve Rutherford
@ 2020-04-08  1:17             ` Ashish Kalra
  2020-04-08  1:38               ` Steve Rutherford
  0 siblings, 1 reply; 107+ messages in thread
From: Ashish Kalra @ 2020-04-08  1:17 UTC (permalink / raw)
  To: Steve Rutherford
  Cc: Brijesh Singh, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Joerg Roedel, Borislav Petkov, Tom Lendacky,
	X86 ML, KVM list, LKML, David Rientjes, Andy Lutomirski

Hello Steve, Brijesh,

On Tue, Apr 07, 2020 at 05:35:57PM -0700, Steve Rutherford wrote:
> On Tue, Apr 7, 2020 at 5:29 PM Brijesh Singh <brijesh.singh@amd.com> wrote:
> >
> >
> > On 4/7/20 7:01 PM, Steve Rutherford wrote:
> > > On Mon, Apr 6, 2020 at 10:27 PM Ashish Kalra <ashish.kalra@amd.com> wrote:
> > >> Hello Steve,
> > >>
> > >> On Mon, Apr 06, 2020 at 07:17:37PM -0700, Steve Rutherford wrote:
> > >>> On Sun, Mar 29, 2020 at 11:22 PM Ashish Kalra <Ashish.Kalra@amd.com> wrote:
> > >>>> From: Brijesh Singh <Brijesh.Singh@amd.com>
> > >>>>
> > >>>> This hypercall is used by the SEV guest to notify a change in the page
> > >>>> encryption status to the hypervisor. The hypercall should be invoked
> > >>>> only when the encryption attribute is changed from encrypted -> decrypted
> > >>>> and vice versa. By default all guest pages are considered encrypted.
> > >>>>
> > >>>> Cc: Thomas Gleixner <tglx@linutronix.de>
> > >>>> Cc: Ingo Molnar <mingo@redhat.com>
> > >>>> Cc: "H. Peter Anvin" <hpa@zytor.com>
> > >>>> Cc: Paolo Bonzini <pbonzini@redhat.com>
> > >>>> Cc: "Radim Krčmář" <rkrcmar@redhat.com>
> > >>>> Cc: Joerg Roedel <joro@8bytes.org>
> > >>>> Cc: Borislav Petkov <bp@suse.de>
> > >>>> Cc: Tom Lendacky <thomas.lendacky@amd.com>
> > >>>> Cc: x86@kernel.org
> > >>>> Cc: kvm@vger.kernel.org
> > >>>> Cc: linux-kernel@vger.kernel.org
> > >>>> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> > >>>> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> > >>>> ---
> > >>>>  Documentation/virt/kvm/hypercalls.rst | 15 +++++
> > >>>>  arch/x86/include/asm/kvm_host.h       |  2 +
> > >>>>  arch/x86/kvm/svm.c                    | 95 +++++++++++++++++++++++++++
> > >>>>  arch/x86/kvm/vmx/vmx.c                |  1 +
> > >>>>  arch/x86/kvm/x86.c                    |  6 ++
> > >>>>  include/uapi/linux/kvm_para.h         |  1 +
> > >>>>  6 files changed, 120 insertions(+)
> > >>>>
> > >>>> diff --git a/Documentation/virt/kvm/hypercalls.rst b/Documentation/virt/kvm/hypercalls.rst
> > >>>> index dbaf207e560d..ff5287e68e81 100644
> > >>>> --- a/Documentation/virt/kvm/hypercalls.rst
> > >>>> +++ b/Documentation/virt/kvm/hypercalls.rst
> > >>>> @@ -169,3 +169,18 @@ a0: destination APIC ID
> > >>>>
> > >>>>  :Usage example: When sending a call-function IPI-many to vCPUs, yield if
> > >>>>                 any of the IPI target vCPUs was preempted.
> > >>>> +
> > >>>> +
> > >>>> +8. KVM_HC_PAGE_ENC_STATUS
> > >>>> +-------------------------
> > >>>> +:Architecture: x86
> > >>>> +:Status: active
> > >>>> +:Purpose: Notify the encryption status changes in guest page table (SEV guest)
> > >>>> +
> > >>>> +a0: the guest physical address of the start page
> > >>>> +a1: the number of pages
> > >>>> +a2: encryption attribute
> > >>>> +
> > >>>> +   Where:
> > >>>> +       * 1: Encryption attribute is set
> > >>>> +       * 0: Encryption attribute is cleared
> > >>>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > >>>> index 98959e8cd448..90718fa3db47 100644
> > >>>> --- a/arch/x86/include/asm/kvm_host.h
> > >>>> +++ b/arch/x86/include/asm/kvm_host.h
> > >>>> @@ -1267,6 +1267,8 @@ struct kvm_x86_ops {
> > >>>>
> > >>>>         bool (*apic_init_signal_blocked)(struct kvm_vcpu *vcpu);
> > >>>>         int (*enable_direct_tlbflush)(struct kvm_vcpu *vcpu);
> > >>>> +       int (*page_enc_status_hc)(struct kvm *kvm, unsigned long gpa,
> > >>>> +                                 unsigned long sz, unsigned long mode);
> > >>> Nit: spell out size instead of sz.
> > >>>>  };
> > >>>>
> > >>>>  struct kvm_arch_async_pf {
> > >>>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> > >>>> index 7c2721e18b06..1d8beaf1bceb 100644
> > >>>> --- a/arch/x86/kvm/svm.c
> > >>>> +++ b/arch/x86/kvm/svm.c
> > >>>> @@ -136,6 +136,8 @@ struct kvm_sev_info {
> > >>>>         int fd;                 /* SEV device fd */
> > >>>>         unsigned long pages_locked; /* Number of pages locked */
> > >>>>         struct list_head regions_list;  /* List of registered regions */
> > >>>> +       unsigned long *page_enc_bmap;
> > >>>> +       unsigned long page_enc_bmap_size;
> > >>>>  };
> > >>>>
> > >>>>  struct kvm_svm {
> > >>>> @@ -1991,6 +1993,9 @@ static void sev_vm_destroy(struct kvm *kvm)
> > >>>>
> > >>>>         sev_unbind_asid(kvm, sev->handle);
> > >>>>         sev_asid_free(sev->asid);
> > >>>> +
> > >>>> +       kvfree(sev->page_enc_bmap);
> > >>>> +       sev->page_enc_bmap = NULL;
> > >>>>  }
> > >>>>
> > >>>>  static void avic_vm_destroy(struct kvm *kvm)
> > >>>> @@ -7593,6 +7598,94 @@ static int sev_receive_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
> > >>>>         return ret;
> > >>>>  }
> > >>>>
> > >>>> +static int sev_resize_page_enc_bitmap(struct kvm *kvm, unsigned long new_size)
> > >>>> +{
> > >>>> +       struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> > >>>> +       unsigned long *map;
> > >>>> +       unsigned long sz;
> > >>>> +
> > >>>> +       if (sev->page_enc_bmap_size >= new_size)
> > >>>> +               return 0;
> > >>>> +
> > >>>> +       sz = ALIGN(new_size, BITS_PER_LONG) / 8;
> > >>>> +
> > >>>> +       map = vmalloc(sz);
> > >>>> +       if (!map) {
> > >>>> +               pr_err_once("Failed to allocate encrypted bitmap size %lx\n",
> > >>>> +                               sz);
> > >>>> +               return -ENOMEM;
> > >>>> +       }
> > >>>> +
> > >>>> +       /* mark the page encrypted (by default) */
> > >>>> +       memset(map, 0xff, sz);
> > >>>> +
> > >>>> +       bitmap_copy(map, sev->page_enc_bmap, sev->page_enc_bmap_size);
> > >>>> +       kvfree(sev->page_enc_bmap);
> > >>>> +
> > >>>> +       sev->page_enc_bmap = map;
> > >>>> +       sev->page_enc_bmap_size = new_size;
> > >>>> +
> > >>>> +       return 0;
> > >>>> +}
> > >>>> +
> > >>>> +static int svm_page_enc_status_hc(struct kvm *kvm, unsigned long gpa,
> > >>>> +                                 unsigned long npages, unsigned long enc)
> > >>>> +{
> > >>>> +       struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> > >>>> +       kvm_pfn_t pfn_start, pfn_end;
> > >>>> +       gfn_t gfn_start, gfn_end;
> > >>>> +       int ret;
> > >>>> +
> > >>>> +       if (!sev_guest(kvm))
> > >>>> +               return -EINVAL;
> > >>>> +
> > >>>> +       if (!npages)
> > >>>> +               return 0;
> > >>>> +
> > >>>> +       gfn_start = gpa_to_gfn(gpa);
> > >>>> +       gfn_end = gfn_start + npages;
> > >>>> +
> > >>>> +       /* out of bound access error check */
> > >>>> +       if (gfn_end <= gfn_start)
> > >>>> +               return -EINVAL;
> > >>>> +
> > >>>> +       /* lets make sure that gpa exist in our memslot */
> > >>>> +       pfn_start = gfn_to_pfn(kvm, gfn_start);
> > >>>> +       pfn_end = gfn_to_pfn(kvm, gfn_end);
> > >>>> +
> > >>>> +       if (is_error_noslot_pfn(pfn_start) && !is_noslot_pfn(pfn_start)) {
> > >>>> +               /*
> > >>>> +                * Allow guest MMIO range(s) to be added
> > >>>> +                * to the page encryption bitmap.
> > >>>> +                */
> > >>>> +               return -EINVAL;
> > >>>> +       }
> > >>>> +
> > >>>> +       if (is_error_noslot_pfn(pfn_end) && !is_noslot_pfn(pfn_end)) {
> > >>>> +               /*
> > >>>> +                * Allow guest MMIO range(s) to be added
> > >>>> +                * to the page encryption bitmap.
> > >>>> +                */
> > >>>> +               return -EINVAL;
> > >>>> +       }
> > >>>> +
> > >>>> +       mutex_lock(&kvm->lock);
> > >>>> +       ret = sev_resize_page_enc_bitmap(kvm, gfn_end);
> > >>>> +       if (ret)
> > >>>> +               goto unlock;
> > >>>> +
> > >>>> +       if (enc)
> > >>>> +               __bitmap_set(sev->page_enc_bmap, gfn_start,
> > >>>> +                               gfn_end - gfn_start);
> > >>>> +       else
> > >>>> +               __bitmap_clear(sev->page_enc_bmap, gfn_start,
> > >>>> +                               gfn_end - gfn_start);
> > >>>> +
> > >>>> +unlock:
> > >>>> +       mutex_unlock(&kvm->lock);
> > >>>> +       return ret;
> > >>>> +}
> > >>>> +
> > >>>>  static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
> > >>>>  {
> > >>>>         struct kvm_sev_cmd sev_cmd;
> > >>>> @@ -7995,6 +8088,8 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
> > >>>>         .need_emulation_on_page_fault = svm_need_emulation_on_page_fault,
> > >>>>
> > >>>>         .apic_init_signal_blocked = svm_apic_init_signal_blocked,
> > >>>> +
> > >>>> +       .page_enc_status_hc = svm_page_enc_status_hc,
> > >>>>  };
> > >>>>
> > >>>>  static int __init svm_init(void)
> > >>>> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > >>>> index 079d9fbf278e..f68e76ee7f9c 100644
> > >>>> --- a/arch/x86/kvm/vmx/vmx.c
> > >>>> +++ b/arch/x86/kvm/vmx/vmx.c
> > >>>> @@ -8001,6 +8001,7 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
> > >>>>         .nested_get_evmcs_version = NULL,
> > >>>>         .need_emulation_on_page_fault = vmx_need_emulation_on_page_fault,
> > >>>>         .apic_init_signal_blocked = vmx_apic_init_signal_blocked,
> > >>>> +       .page_enc_status_hc = NULL,
> > >>>>  };
> > >>>>
> > >>>>  static void vmx_cleanup_l1d_flush(void)
> > >>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > >>>> index cf95c36cb4f4..68428eef2dde 100644
> > >>>> --- a/arch/x86/kvm/x86.c
> > >>>> +++ b/arch/x86/kvm/x86.c
> > >>>> @@ -7564,6 +7564,12 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
> > >>>>                 kvm_sched_yield(vcpu->kvm, a0);
> > >>>>                 ret = 0;
> > >>>>                 break;
> > >>>> +       case KVM_HC_PAGE_ENC_STATUS:
> > >>>> +               ret = -KVM_ENOSYS;
> > >>>> +               if (kvm_x86_ops->page_enc_status_hc)
> > >>>> +                       ret = kvm_x86_ops->page_enc_status_hc(vcpu->kvm,
> > >>>> +                                       a0, a1, a2);
> > >>>> +               break;
> > >>>>         default:
> > >>>>                 ret = -KVM_ENOSYS;
> > >>>>                 break;
> > >>>> diff --git a/include/uapi/linux/kvm_para.h b/include/uapi/linux/kvm_para.h
> > >>>> index 8b86609849b9..847b83b75dc8 100644
> > >>>> --- a/include/uapi/linux/kvm_para.h
> > >>>> +++ b/include/uapi/linux/kvm_para.h
> > >>>> @@ -29,6 +29,7 @@
> > >>>>  #define KVM_HC_CLOCK_PAIRING           9
> > >>>>  #define KVM_HC_SEND_IPI                10
> > >>>>  #define KVM_HC_SCHED_YIELD             11
> > >>>> +#define KVM_HC_PAGE_ENC_STATUS         12
> > >>>>
> > >>>>  /*
> > >>>>   * hypercalls use architecture specific
> > >>>> --
> > >>>> 2.17.1
> > >>>>
> > >>> I'm still not excited by the dynamic resizing. I believe the guest
> > >>> hypercall can be called in atomic contexts, which makes me
> > >>> particularly unexcited to see a potentially large vmalloc on the host
> > >>> followed by filling the buffer. Particularly when the buffer might be
> > >>> non-trivial in size (~1MB per 32GB, per some back of the envelope
> > >>> math).
> > >>>
> > >> I think, looking at more practical situations, most hypercalls will
> > >> happen during the boot stage, when device-specific initializations are
> > >> happening, so typically the maximum page encryption bitmap size would
> > >> be allocated early enough.
> > >>
> > >> In fact, initial hypercalls made by OVMF will probably allocate the
> > >> maximum page bitmap size even before the kernel comes up, especially
> > >> as they will be setting up page enc/dec status for MMIO, ROM, ACPI
> > >> regions, PCI device memory, etc., and most importantly for
> > >> "non-existent" high memory range (which will probably be the
> > >> maximum size page encryption bitmap allocated/resized).
> > >>
> > >> Let me know if you have different thoughts on this?
> > > Hi Ashish,
> > >
> > > If this is not an issue in practice, we can just move past this. If we
> > > are basically guaranteed that OVMF will trigger hypercalls that expand
> > > the bitmap beyond the top of memory, then, yes, that should work. That
> > > leaves me slightly nervous that OVMF might regress since it's not
> > > obvious that calling a hypercall beyond the top of memory would be
> > > "required" for avoiding a somewhat indirectly related issue in guest
> > > kernels.
> >
> >
> > If possible then we should try to avoid growing/shrinking the bitmap.
> > Today OVMF may not be accessing beyond the top of memory, but a malicious
> > guest could send a hypercall down which can trigger a huge memory
> > allocation on the host side and may eventually cause denial of service
> > for others.
> Nice catch! Was just writing up an email about this.
> > I am in favor of finding some solution to handle this case. How
> > about Steve's suggestion of the VMM making a call down to the kernel to
> > tell it how big the bitmap should be? Initially it should be equal to the
> > guest RAM, and if the VMM ever does a memory expansion then it can send
> > down another notification to increase the bitmap.
> >
> > Optionally, instead of adding a new ioctl, I was wondering if we can
> > extend kvm_arch_prepare_memory_region() with an SVM-specific x86_op
> > which can read the userspace-provided memory region, calculate
> > the amount of guest RAM managed by KVM, and grow/shrink the bitmap
> > based on that information. I have not looked deep enough to see if it's
> > doable, but if it can work then we can avoid adding yet another ioctl.
> We also have the set bitmap ioctl in a later patch in this series. We
> could also use the set ioctl for initialization (it's a little
> excessive for initialization since there will be an additional
> ephemeral allocation and a few additional buffer copies, but that's
> probably fine). An enable_cap has the added benefit of probably being
> necessary anyway so usermode can disable the migration feature flag.
> 
> In general, userspace is going to have to be in direct control of the
> buffer and its size.

My only practical concern about setting a static bitmap size based on guest
memory is about the hypercalls made initially by OVMF to set page
enc/dec status for ROM, ACPI regions and especially the non-existent
high memory range. The new ioctl will statically set up the bitmap size to
whatever guest RAM is specified, say 4G, 8G, etc., but the OVMF
hypercall for non-existent memory will try to cover a guest
physical memory range like ~6G->64G (for a 4G guest RAM setup). This
hypercall will basically have to just return doing nothing, because
the allocated bitmap won't have this guest physical range available.

Also, hypercalls for ROM, ACPI, device regions and any memory holes within
the static bitmap set up as per the guest RAM config will work, but what
about hypercalls for any device regions beyond the guest RAM config?

Thanks,
Ashish

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v6 12/14] KVM: x86: Introduce KVM_PAGE_ENC_BITMAP_RESET ioctl
  2020-04-06 18:52       ` Krish Sadhukhan
@ 2020-04-08  1:25         ` Steve Rutherford
  2020-04-08  1:52           ` Ashish Kalra
  0 siblings, 1 reply; 107+ messages in thread
From: Steve Rutherford @ 2020-04-08  1:25 UTC (permalink / raw)
  To: Krish Sadhukhan
  Cc: Ashish Kalra, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Joerg Roedel, Borislav Petkov, Tom Lendacky,
	X86 ML, KVM list, LKML, David Rientjes, Andy Lutomirski,
	Brijesh Singh

On Mon, Apr 6, 2020 at 11:53 AM Krish Sadhukhan
<krish.sadhukhan@oracle.com> wrote:
>
>
> On 4/3/20 2:45 PM, Ashish Kalra wrote:
> > On Fri, Apr 03, 2020 at 02:14:23PM -0700, Krish Sadhukhan wrote:
> >> On 3/29/20 11:23 PM, Ashish Kalra wrote:
> >>> From: Ashish Kalra <ashish.kalra@amd.com>
> >>>
> >>> This ioctl can be used by the application to reset the page
> >>> encryption bitmap managed by the KVM driver. A typical usage
> >>> for this ioctl is on VM reboot; on reboot, we must reinitialize
> >>> the bitmap.
> >>>
> >>> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> >>> ---
> >>>    Documentation/virt/kvm/api.rst  | 13 +++++++++++++
> >>>    arch/x86/include/asm/kvm_host.h |  1 +
> >>>    arch/x86/kvm/svm.c              | 16 ++++++++++++++++
> >>>    arch/x86/kvm/x86.c              |  6 ++++++
> >>>    include/uapi/linux/kvm.h        |  1 +
> >>>    5 files changed, 37 insertions(+)
> >>>
> >>> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> >>> index 4d1004a154f6..a11326ccc51d 100644
> >>> --- a/Documentation/virt/kvm/api.rst
> >>> +++ b/Documentation/virt/kvm/api.rst
> >>> @@ -4698,6 +4698,19 @@ During the guest live migration the outgoing guest exports its page encryption
> >>>    bitmap, the KVM_SET_PAGE_ENC_BITMAP can be used to build the page encryption
> >>>    bitmap for an incoming guest.
> >>> +4.127 KVM_PAGE_ENC_BITMAP_RESET (vm ioctl)
> >>> +-----------------------------------------
> >>> +
> >>> +:Capability: basic
> >>> +:Architectures: x86
> >>> +:Type: vm ioctl
> >>> +:Parameters: none
> >>> +:Returns: 0 on success, -1 on error
> >>> +
> >>> +The KVM_PAGE_ENC_BITMAP_RESET is used to reset the guest's page encryption
> >>> +bitmap during guest reboot and this is only done on the guest's boot vCPU.
> >>> +
> >>> +
> >>>    5. The kvm_run structure
> >>>    ========================
> >>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> >>> index d30f770aaaea..a96ef6338cd2 100644
> >>> --- a/arch/x86/include/asm/kvm_host.h
> >>> +++ b/arch/x86/include/asm/kvm_host.h
> >>> @@ -1273,6 +1273,7 @@ struct kvm_x86_ops {
> >>>                             struct kvm_page_enc_bitmap *bmap);
> >>>     int (*set_page_enc_bitmap)(struct kvm *kvm,
> >>>                             struct kvm_page_enc_bitmap *bmap);
> >>> +   int (*reset_page_enc_bitmap)(struct kvm *kvm);
> >>>    };
> >>>    struct kvm_arch_async_pf {
> >>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> >>> index 313343a43045..c99b0207a443 100644
> >>> --- a/arch/x86/kvm/svm.c
> >>> +++ b/arch/x86/kvm/svm.c
> >>> @@ -7797,6 +7797,21 @@ static int svm_set_page_enc_bitmap(struct kvm *kvm,
> >>>     return ret;
> >>>    }
> >>> +static int svm_reset_page_enc_bitmap(struct kvm *kvm)
> >>> +{
> >>> +   struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> >>> +
> >>> +   if (!sev_guest(kvm))
> >>> +           return -ENOTTY;
> >>> +
> >>> +   mutex_lock(&kvm->lock);
> >>> +   /* by default all pages should be marked encrypted */
> >>> +   if (sev->page_enc_bmap_size)
> >>> +           bitmap_fill(sev->page_enc_bmap, sev->page_enc_bmap_size);
> >>> +   mutex_unlock(&kvm->lock);
> >>> +   return 0;
> >>> +}
> >>> +
> >>>    static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
> >>>    {
> >>>     struct kvm_sev_cmd sev_cmd;
> >>> @@ -8203,6 +8218,7 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
> >>>     .page_enc_status_hc = svm_page_enc_status_hc,
> >>>     .get_page_enc_bitmap = svm_get_page_enc_bitmap,
> >>>     .set_page_enc_bitmap = svm_set_page_enc_bitmap,
> >>> +   .reset_page_enc_bitmap = svm_reset_page_enc_bitmap,
> >>
> >> We don't need to initialize the intel ops to NULL ? It's not initialized in
> >> the previous patch either.
> >>
> >>>    };
> > This struct is declared as "static storage", so won't the non-initialized
> > members be 0?
>
>
> Correct. Although, I see that 'nested_enable_evmcs' is explicitly
> initialized. We should maintain the convention, perhaps.
>
> >
> >>>    static int __init svm_init(void)
> >>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> >>> index 05e953b2ec61..2127ed937f53 100644
> >>> --- a/arch/x86/kvm/x86.c
> >>> +++ b/arch/x86/kvm/x86.c
> >>> @@ -5250,6 +5250,12 @@ long kvm_arch_vm_ioctl(struct file *filp,
> >>>                     r = kvm_x86_ops->set_page_enc_bitmap(kvm, &bitmap);
> >>>             break;
> >>>     }
> >>> +   case KVM_PAGE_ENC_BITMAP_RESET: {
> >>> +           r = -ENOTTY;
> >>> +           if (kvm_x86_ops->reset_page_enc_bitmap)
> >>> +                   r = kvm_x86_ops->reset_page_enc_bitmap(kvm);
> >>> +           break;
> >>> +   }
> >>>     default:
> >>>             r = -ENOTTY;
> >>>     }
> >>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> >>> index b4b01d47e568..0884a581fc37 100644
> >>> --- a/include/uapi/linux/kvm.h
> >>> +++ b/include/uapi/linux/kvm.h
> >>> @@ -1490,6 +1490,7 @@ struct kvm_enc_region {
> >>>    #define KVM_GET_PAGE_ENC_BITMAP  _IOW(KVMIO, 0xc5, struct kvm_page_enc_bitmap)
> >>>    #define KVM_SET_PAGE_ENC_BITMAP  _IOW(KVMIO, 0xc6, struct kvm_page_enc_bitmap)
> >>> +#define KVM_PAGE_ENC_BITMAP_RESET  _IO(KVMIO, 0xc7)
> >>>    /* Secure Encrypted Virtualization command */
> >>>    enum sev_cmd_id {
> >> Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>


Doesn't this overlap with the set ioctl? Yes, obviously, you have to
copy the new value down and do a bit more work, but I don't think
resetting the bitmap is going to be the bottleneck on reboot. Seems
excessive to add another ioctl for this.


* Re: [PATCH v6 08/14] KVM: X86: Introduce KVM_HC_PAGE_ENC_STATUS hypercall
  2020-04-08  1:17             ` Ashish Kalra
@ 2020-04-08  1:38               ` Steve Rutherford
  2020-04-08  2:34                 ` Brijesh Singh
  0 siblings, 1 reply; 107+ messages in thread
From: Steve Rutherford @ 2020-04-08  1:38 UTC (permalink / raw)
  To: Ashish Kalra
  Cc: Brijesh Singh, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Joerg Roedel, Borislav Petkov, Tom Lendacky,
	X86 ML, KVM list, LKML, David Rientjes, Andy Lutomirski

On Tue, Apr 7, 2020 at 6:17 PM Ashish Kalra <ashish.kalra@amd.com> wrote:
>
> Hello Steve, Brijesh,
>
> On Tue, Apr 07, 2020 at 05:35:57PM -0700, Steve Rutherford wrote:
> > On Tue, Apr 7, 2020 at 5:29 PM Brijesh Singh <brijesh.singh@amd.com> wrote:
> > >
> > >
> > > On 4/7/20 7:01 PM, Steve Rutherford wrote:
> > > > On Mon, Apr 6, 2020 at 10:27 PM Ashish Kalra <ashish.kalra@amd.com> wrote:
> > > >> Hello Steve,
> > > >>
> > > >> On Mon, Apr 06, 2020 at 07:17:37PM -0700, Steve Rutherford wrote:
> > > >>> On Sun, Mar 29, 2020 at 11:22 PM Ashish Kalra <Ashish.Kalra@amd.com> wrote:
> > > >>>> From: Brijesh Singh <Brijesh.Singh@amd.com>
> > > >>>>
> > > >>>> This hypercall is used by the SEV guest to notify a change in the page
> > > >>>> encryption status to the hypervisor. The hypercall should be invoked
> > > >>>> only when the encryption attribute is changed from encrypted -> decrypted
> > > >>>> and vice versa. By default all guest pages are considered encrypted.
> > > >>>>
> > > >>>> Cc: Thomas Gleixner <tglx@linutronix.de>
> > > >>>> Cc: Ingo Molnar <mingo@redhat.com>
> > > >>>> Cc: "H. Peter Anvin" <hpa@zytor.com>
> > > >>>> Cc: Paolo Bonzini <pbonzini@redhat.com>
> > > >>>> Cc: "Radim Krčmář" <rkrcmar@redhat.com>
> > > >>>> Cc: Joerg Roedel <joro@8bytes.org>
> > > >>>> Cc: Borislav Petkov <bp@suse.de>
> > > >>>> Cc: Tom Lendacky <thomas.lendacky@amd.com>
> > > >>>> Cc: x86@kernel.org
> > > >>>> Cc: kvm@vger.kernel.org
> > > >>>> Cc: linux-kernel@vger.kernel.org
> > > >>>> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> > > >>>> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> > > >>>> ---
> > > >>>>  Documentation/virt/kvm/hypercalls.rst | 15 +++++
> > > >>>>  arch/x86/include/asm/kvm_host.h       |  2 +
> > > >>>>  arch/x86/kvm/svm.c                    | 95 +++++++++++++++++++++++++++
> > > >>>>  arch/x86/kvm/vmx/vmx.c                |  1 +
> > > >>>>  arch/x86/kvm/x86.c                    |  6 ++
> > > >>>>  include/uapi/linux/kvm_para.h         |  1 +
> > > >>>>  6 files changed, 120 insertions(+)
> > > >>>>
> > > >>>> diff --git a/Documentation/virt/kvm/hypercalls.rst b/Documentation/virt/kvm/hypercalls.rst
> > > >>>> index dbaf207e560d..ff5287e68e81 100644
> > > >>>> --- a/Documentation/virt/kvm/hypercalls.rst
> > > >>>> +++ b/Documentation/virt/kvm/hypercalls.rst
> > > >>>> @@ -169,3 +169,18 @@ a0: destination APIC ID
> > > >>>>
> > > >>>>  :Usage example: When sending a call-function IPI-many to vCPUs, yield if
> > > >>>>                 any of the IPI target vCPUs was preempted.
> > > >>>> +
> > > >>>> +
> > > >>>> +8. KVM_HC_PAGE_ENC_STATUS
> > > >>>> +-------------------------
> > > >>>> +:Architecture: x86
> > > >>>> +:Status: active
> > > >>>> +:Purpose: Notify the encryption status changes in guest page table (SEV guest)
> > > >>>> +
> > > >>>> +a0: the guest physical address of the start page
> > > >>>> +a1: the number of pages
> > > >>>> +a2: encryption attribute
> > > >>>> +
> > > >>>> +   Where:
> > > >>>> +       * 1: Encryption attribute is set
> > > >>>> +       * 0: Encryption attribute is cleared
> > > >>>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > > >>>> index 98959e8cd448..90718fa3db47 100644
> > > >>>> --- a/arch/x86/include/asm/kvm_host.h
> > > >>>> +++ b/arch/x86/include/asm/kvm_host.h
> > > >>>> @@ -1267,6 +1267,8 @@ struct kvm_x86_ops {
> > > >>>>
> > > >>>>         bool (*apic_init_signal_blocked)(struct kvm_vcpu *vcpu);
> > > >>>>         int (*enable_direct_tlbflush)(struct kvm_vcpu *vcpu);
> > > >>>> +       int (*page_enc_status_hc)(struct kvm *kvm, unsigned long gpa,
> > > >>>> +                                 unsigned long sz, unsigned long mode);
> > > >>> Nit: spell out size instead of sz.
> > > >>>>  };
> > > >>>>
> > > >>>>  struct kvm_arch_async_pf {
> > > >>>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> > > >>>> index 7c2721e18b06..1d8beaf1bceb 100644
> > > >>>> --- a/arch/x86/kvm/svm.c
> > > >>>> +++ b/arch/x86/kvm/svm.c
> > > >>>> @@ -136,6 +136,8 @@ struct kvm_sev_info {
> > > >>>>         int fd;                 /* SEV device fd */
> > > >>>>         unsigned long pages_locked; /* Number of pages locked */
> > > >>>>         struct list_head regions_list;  /* List of registered regions */
> > > >>>> +       unsigned long *page_enc_bmap;
> > > >>>> +       unsigned long page_enc_bmap_size;
> > > >>>>  };
> > > >>>>
> > > >>>>  struct kvm_svm {
> > > >>>> @@ -1991,6 +1993,9 @@ static void sev_vm_destroy(struct kvm *kvm)
> > > >>>>
> > > >>>>         sev_unbind_asid(kvm, sev->handle);
> > > >>>>         sev_asid_free(sev->asid);
> > > >>>> +
> > > >>>> +       kvfree(sev->page_enc_bmap);
> > > >>>> +       sev->page_enc_bmap = NULL;
> > > >>>>  }
> > > >>>>
> > > >>>>  static void avic_vm_destroy(struct kvm *kvm)
> > > >>>> @@ -7593,6 +7598,94 @@ static int sev_receive_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
> > > >>>>         return ret;
> > > >>>>  }
> > > >>>>
> > > >>>> +static int sev_resize_page_enc_bitmap(struct kvm *kvm, unsigned long new_size)
> > > >>>> +{
> > > >>>> +       struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> > > >>>> +       unsigned long *map;
> > > >>>> +       unsigned long sz;
> > > >>>> +
> > > >>>> +       if (sev->page_enc_bmap_size >= new_size)
> > > >>>> +               return 0;
> > > >>>> +
> > > >>>> +       sz = ALIGN(new_size, BITS_PER_LONG) / 8;
> > > >>>> +
> > > >>>> +       map = vmalloc(sz);
> > > >>>> +       if (!map) {
> > > >>>> +               pr_err_once("Failed to allocate encrypted bitmap size %lx\n",
> > > >>>> +                               sz);
> > > >>>> +               return -ENOMEM;
> > > >>>> +       }
> > > >>>> +
> > > >>>> +       /* mark the page encrypted (by default) */
> > > >>>> +       memset(map, 0xff, sz);
> > > >>>> +
> > > >>>> +       bitmap_copy(map, sev->page_enc_bmap, sev->page_enc_bmap_size);
> > > >>>> +       kvfree(sev->page_enc_bmap);
> > > >>>> +
> > > >>>> +       sev->page_enc_bmap = map;
> > > >>>> +       sev->page_enc_bmap_size = new_size;
> > > >>>> +
> > > >>>> +       return 0;
> > > >>>> +}
> > > >>>> +
> > > >>>> +static int svm_page_enc_status_hc(struct kvm *kvm, unsigned long gpa,
> > > >>>> +                                 unsigned long npages, unsigned long enc)
> > > >>>> +{
> > > >>>> +       struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> > > >>>> +       kvm_pfn_t pfn_start, pfn_end;
> > > >>>> +       gfn_t gfn_start, gfn_end;
> > > >>>> +       int ret;
> > > >>>> +
> > > >>>> +       if (!sev_guest(kvm))
> > > >>>> +               return -EINVAL;
> > > >>>> +
> > > >>>> +       if (!npages)
> > > >>>> +               return 0;
> > > >>>> +
> > > >>>> +       gfn_start = gpa_to_gfn(gpa);
> > > >>>> +       gfn_end = gfn_start + npages;
> > > >>>> +
> > > >>>> +       /* out of bound access error check */
> > > >>>> +       if (gfn_end <= gfn_start)
> > > >>>> +               return -EINVAL;
> > > >>>> +
> > > >>>> +       /* lets make sure that gpa exist in our memslot */
> > > >>>> +       pfn_start = gfn_to_pfn(kvm, gfn_start);
> > > >>>> +       pfn_end = gfn_to_pfn(kvm, gfn_end);
> > > >>>> +
> > > >>>> +       if (is_error_noslot_pfn(pfn_start) && !is_noslot_pfn(pfn_start)) {
> > > >>>> +               /*
> > > >>>> +                * Allow guest MMIO range(s) to be added
> > > >>>> +                * to the page encryption bitmap.
> > > >>>> +                */
> > > >>>> +               return -EINVAL;
> > > >>>> +       }
> > > >>>> +
> > > >>>> +       if (is_error_noslot_pfn(pfn_end) && !is_noslot_pfn(pfn_end)) {
> > > >>>> +               /*
> > > >>>> +                * Allow guest MMIO range(s) to be added
> > > >>>> +                * to the page encryption bitmap.
> > > >>>> +                */
> > > >>>> +               return -EINVAL;
> > > >>>> +       }
> > > >>>> +
> > > >>>> +       mutex_lock(&kvm->lock);
> > > >>>> +       ret = sev_resize_page_enc_bitmap(kvm, gfn_end);
> > > >>>> +       if (ret)
> > > >>>> +               goto unlock;
> > > >>>> +
> > > >>>> +       if (enc)
> > > >>>> +               __bitmap_set(sev->page_enc_bmap, gfn_start,
> > > >>>> +                               gfn_end - gfn_start);
> > > >>>> +       else
> > > >>>> +               __bitmap_clear(sev->page_enc_bmap, gfn_start,
> > > >>>> +                               gfn_end - gfn_start);
> > > >>>> +
> > > >>>> +unlock:
> > > >>>> +       mutex_unlock(&kvm->lock);
> > > >>>> +       return ret;
> > > >>>> +}
> > > >>>> +
> > > >>>>  static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
> > > >>>>  {
> > > >>>>         struct kvm_sev_cmd sev_cmd;
> > > >>>> @@ -7995,6 +8088,8 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
> > > >>>>         .need_emulation_on_page_fault = svm_need_emulation_on_page_fault,
> > > >>>>
> > > >>>>         .apic_init_signal_blocked = svm_apic_init_signal_blocked,
> > > >>>> +
> > > >>>> +       .page_enc_status_hc = svm_page_enc_status_hc,
> > > >>>>  };
> > > >>>>
> > > >>>>  static int __init svm_init(void)
> > > >>>> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > > >>>> index 079d9fbf278e..f68e76ee7f9c 100644
> > > >>>> --- a/arch/x86/kvm/vmx/vmx.c
> > > >>>> +++ b/arch/x86/kvm/vmx/vmx.c
> > > >>>> @@ -8001,6 +8001,7 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
> > > >>>>         .nested_get_evmcs_version = NULL,
> > > >>>>         .need_emulation_on_page_fault = vmx_need_emulation_on_page_fault,
> > > >>>>         .apic_init_signal_blocked = vmx_apic_init_signal_blocked,
> > > >>>> +       .page_enc_status_hc = NULL,
> > > >>>>  };
> > > >>>>
> > > >>>>  static void vmx_cleanup_l1d_flush(void)
> > > >>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > >>>> index cf95c36cb4f4..68428eef2dde 100644
> > > >>>> --- a/arch/x86/kvm/x86.c
> > > >>>> +++ b/arch/x86/kvm/x86.c
> > > >>>> @@ -7564,6 +7564,12 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
> > > >>>>                 kvm_sched_yield(vcpu->kvm, a0);
> > > >>>>                 ret = 0;
> > > >>>>                 break;
> > > >>>> +       case KVM_HC_PAGE_ENC_STATUS:
> > > >>>> +               ret = -KVM_ENOSYS;
> > > >>>> +               if (kvm_x86_ops->page_enc_status_hc)
> > > >>>> +                       ret = kvm_x86_ops->page_enc_status_hc(vcpu->kvm,
> > > >>>> +                                       a0, a1, a2);
> > > >>>> +               break;
> > > >>>>         default:
> > > >>>>                 ret = -KVM_ENOSYS;
> > > >>>>                 break;
> > > >>>> diff --git a/include/uapi/linux/kvm_para.h b/include/uapi/linux/kvm_para.h
> > > >>>> index 8b86609849b9..847b83b75dc8 100644
> > > >>>> --- a/include/uapi/linux/kvm_para.h
> > > >>>> +++ b/include/uapi/linux/kvm_para.h
> > > >>>> @@ -29,6 +29,7 @@
> > > >>>>  #define KVM_HC_CLOCK_PAIRING           9
> > > >>>>  #define KVM_HC_SEND_IPI                10
> > > >>>>  #define KVM_HC_SCHED_YIELD             11
> > > >>>> +#define KVM_HC_PAGE_ENC_STATUS         12
> > > >>>>
> > > >>>>  /*
> > > >>>>   * hypercalls use architecture specific
> > > >>>> --
> > > >>>> 2.17.1
> > > >>>>
> > > >>> I'm still not excited by the dynamic resizing. I believe the guest
> > > >>> hypercall can be called in atomic contexts, which makes me
> > > >>> particularly unexcited to see a potentially large vmalloc on the host
> > > >>> followed by filling the buffer. Particularly when the buffer might be
> > > >>> non-trivial in size (~1MB per 32GB, per some back of the envelope
> > > >>> math).
> > > >>>
> > > >> I think, looking at more practical situations, most hypercalls will
> > > >> happen during the boot stage, when device-specific initializations are
> > > >> happening, so typically the maximum page encryption bitmap size would
> > > >> be allocated early enough.
> > > >>
> > > >> In fact, initial hypercalls made by OVMF will probably allocate the
> > > >> maximum page bitmap size even before the kernel comes up, especially
> > > >> as they will be setting up page enc/dec status for MMIO, ROM, ACPI
> > > >> regions, PCI device memory, etc., and most importantly for
> > > >> "non-existent" high memory range (which will probably be the
> > > >> maximum size page encryption bitmap allocated/resized).
> > > >>
> > > >> Let me know if you have different thoughts on this?
> > > > Hi Ashish,
> > > >
> > > > If this is not an issue in practice, we can just move past this. If we
> > > > are basically guaranteed that OVMF will trigger hypercalls that expand
> > > > the bitmap beyond the top of memory, then, yes, that should work. That
> > > > leaves me slightly nervous that OVMF might regress since it's not
> > > > obvious that calling a hypercall beyond the top of memory would be
> > > > "required" for avoiding a somewhat indirectly related issue in guest
> > > > kernels.
> > >
> > >
> > > If possible then we should try to avoid growing/shrinking the bitmap.
> > > Today OVMF may not be accessing beyond the top of memory, but a malicious
> > > guest could send a hypercall down which can trigger a huge memory
> > > allocation on the host side and may eventually cause denial of service
> > > for others.
> > Nice catch! Was just writing up an email about this.
> > > I am in favor of finding some solution to handle this case. How
> > > about Steve's suggestion of the VMM making a call down to the kernel to
> > > tell it how big the bitmap should be? Initially it should be equal to the
> > > guest RAM, and if the VMM ever does a memory expansion then it can send
> > > down another notification to increase the bitmap.
> > >
> > > Optionally, instead of adding a new ioctl, I was wondering if we can
> > > extend kvm_arch_prepare_memory_region() with an SVM-specific x86_op
> > > which can read the userspace-provided memory region, calculate
> > > the amount of guest RAM managed by KVM, and grow/shrink the bitmap
> > > based on that information. I have not looked deep enough to see if it's
> > > doable, but if it can work then we can avoid adding yet another ioctl.
> > We also have the set bitmap ioctl in a later patch in this series. We
> > could also use the set ioctl for initialization (it's a little
> > excessive for initialization since there will be an additional
> > ephemeral allocation and a few additional buffer copies, but that's
> > probably fine). An enable_cap has the added benefit of probably being
> > necessary anyway so usermode can disable the migration feature flag.
> >
> > In general, userspace is going to have to be in direct control of the
> > buffer and its size.
>
> My only practical concern about setting a static bitmap size based on guest
> memory is about the hypercalls made initially by OVMF to set page
> enc/dec status for ROM, ACPI regions and especially the non-existent
> high memory range. The new ioctl will statically set up the bitmap size to
> whatever guest RAM is specified, say 4G, 8G, etc., but the OVMF
> hypercall for non-existent memory will try to cover a guest
> physical memory range like ~6G->64G (for a 4G guest RAM setup). This
> hypercall will basically have to just return doing nothing, because
> the allocated bitmap won't have this guest physical range available.
>
> Also, hypercalls for ROM, ACPI, device regions and any memory holes within
> the static bitmap set up as per the guest RAM config will work, but what
> about hypercalls for any device regions beyond the guest RAM config?
>
> Thanks,
> Ashish
I'm not super familiar with what the address range beyond the top of RAM is
used for. If the memory is not backed by RAM, will it even matter for
migration? Sounds like the encryption for SEV won't even apply to it.
If we don't need to know what the c-bit state of an address is, we
don't need to track it. It doesn't hurt to track it (which is why I'm
not super concerned about tracking the memory holes).


* Re: [PATCH v6 11/14] KVM: x86: Introduce KVM_SET_PAGE_ENC_BITMAP ioctl
  2020-04-08  0:26   ` Steve Rutherford
@ 2020-04-08  1:48     ` Ashish Kalra
  2020-04-10  0:06       ` Steve Rutherford
  0 siblings, 1 reply; 107+ messages in thread
From: Ashish Kalra @ 2020-04-08  1:48 UTC (permalink / raw)
  To: Steve Rutherford
  Cc: Paolo Bonzini, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Joerg Roedel, Borislav Petkov, Tom Lendacky, X86 ML, KVM list,
	LKML, David Rientjes, Andy Lutomirski, Brijesh Singh

Hello Steve,

On Tue, Apr 07, 2020 at 05:26:33PM -0700, Steve Rutherford wrote:
> On Sun, Mar 29, 2020 at 11:23 PM Ashish Kalra <Ashish.Kalra@amd.com> wrote:
> >
> > From: Brijesh Singh <Brijesh.Singh@amd.com>
> >
> > The ioctl can be used to set page encryption bitmap for an
> > incoming guest.
> >
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Ingo Molnar <mingo@redhat.com>
> > Cc: "H. Peter Anvin" <hpa@zytor.com>
> > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > Cc: "Radim Krčmář" <rkrcmar@redhat.com>
> > Cc: Joerg Roedel <joro@8bytes.org>
> > Cc: Borislav Petkov <bp@suse.de>
> > Cc: Tom Lendacky <thomas.lendacky@amd.com>
> > Cc: x86@kernel.org
> > Cc: kvm@vger.kernel.org
> > Cc: linux-kernel@vger.kernel.org
> > Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> > Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> > ---
> >  Documentation/virt/kvm/api.rst  | 22 +++++++++++++++++
> >  arch/x86/include/asm/kvm_host.h |  2 ++
> >  arch/x86/kvm/svm.c              | 42 +++++++++++++++++++++++++++++++++
> >  arch/x86/kvm/x86.c              | 12 ++++++++++
> >  include/uapi/linux/kvm.h        |  1 +
> >  5 files changed, 79 insertions(+)
> >
> > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > index 8ad800ebb54f..4d1004a154f6 100644
> > --- a/Documentation/virt/kvm/api.rst
> > +++ b/Documentation/virt/kvm/api.rst
> > @@ -4675,6 +4675,28 @@ or shared. The bitmap can be used during the guest migration, if the page
> >  is private then userspace need to use SEV migration commands to transmit
> >  the page.
> >
> > +4.126 KVM_SET_PAGE_ENC_BITMAP (vm ioctl)
> > +---------------------------------------
> > +
> > +:Capability: basic
> > +:Architectures: x86
> > +:Type: vm ioctl
> > +:Parameters: struct kvm_page_enc_bitmap (in/out)
> > +:Returns: 0 on success, -1 on error
> > +
> > +/* for KVM_SET_PAGE_ENC_BITMAP */
> > +struct kvm_page_enc_bitmap {
> > +       __u64 start_gfn;
> > +       __u64 num_pages;
> > +       union {
> > +               void __user *enc_bitmap; /* one bit per page */
> > +               __u64 padding2;
> > +       };
> > +};
> > +
> > +During the guest live migration the outgoing guest exports its page encryption
> > +bitmap, the KVM_SET_PAGE_ENC_BITMAP can be used to build the page encryption
> > +bitmap for an incoming guest.
> >
> >  5. The kvm_run structure
> >  ========================
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index 27e43e3ec9d8..d30f770aaaea 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -1271,6 +1271,8 @@ struct kvm_x86_ops {
> >                                   unsigned long sz, unsigned long mode);
> >         int (*get_page_enc_bitmap)(struct kvm *kvm,
> >                                 struct kvm_page_enc_bitmap *bmap);
> > +       int (*set_page_enc_bitmap)(struct kvm *kvm,
> > +                               struct kvm_page_enc_bitmap *bmap);
> >  };
> >
> >  struct kvm_arch_async_pf {
> > diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> > index bae783cd396a..313343a43045 100644
> > --- a/arch/x86/kvm/svm.c
> > +++ b/arch/x86/kvm/svm.c
> > @@ -7756,6 +7756,47 @@ static int svm_get_page_enc_bitmap(struct kvm *kvm,
> >         return ret;
> >  }
> >
> > +static int svm_set_page_enc_bitmap(struct kvm *kvm,
> > +                                  struct kvm_page_enc_bitmap *bmap)
> > +{
> > +       struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> > +       unsigned long gfn_start, gfn_end;
> > +       unsigned long *bitmap;
> > +       unsigned long sz, i;
> > +       int ret;
> > +
> > +       if (!sev_guest(kvm))
> > +               return -ENOTTY;
> > +
> > +       gfn_start = bmap->start_gfn;
> > +       gfn_end = gfn_start + bmap->num_pages;
> > +
> > +       sz = ALIGN(bmap->num_pages, BITS_PER_LONG) / 8;
> > +       bitmap = kmalloc(sz, GFP_KERNEL);
> > +       if (!bitmap)
> > +               return -ENOMEM;
> > +
> > +       ret = -EFAULT;
> > +       if (copy_from_user(bitmap, bmap->enc_bitmap, sz))
> > +               goto out;
> > +
> > +       mutex_lock(&kvm->lock);
> > +       ret = sev_resize_page_enc_bitmap(kvm, gfn_end);
> I realize now that usermode could use this for initializing the
> minimum size of the enc bitmap, which probably solves my issue from
> the other thread.
> > +       if (ret)
> > +               goto unlock;
> > +
> > +       i = gfn_start;
> > +       for_each_clear_bit_from(i, bitmap, (gfn_end - gfn_start))
> > +               clear_bit(i + gfn_start, sev->page_enc_bmap);
> This API seems a bit strange, since it can only clear bits. I would
> expect "set" to force the values to match the values passed down,
> instead of only ensuring that cleared bits in the input are also
> cleared in the kernel.
>

The sev_resize_page_enc_bitmap() call allocates a new bitmap and
initializes it to all 0xFFs (all pages encrypted by default), so the
code here only needs to clear those bits that are also cleared in the
input bitmap.

Thanks,
Ashish

> This should copy the values from userspace (and fix up the ends since
> byte alignment makes that complicated), instead of iterating in this
> way.
> > +
> > +       ret = 0;
> > +unlock:
> > +       mutex_unlock(&kvm->lock);
> > +out:
> > +       kfree(bitmap);
> > +       return ret;
> > +}
> > +
> >  static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
> >  {
> >         struct kvm_sev_cmd sev_cmd;
> > @@ -8161,6 +8202,7 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
> >
> >         .page_enc_status_hc = svm_page_enc_status_hc,
> >         .get_page_enc_bitmap = svm_get_page_enc_bitmap,
> > +       .set_page_enc_bitmap = svm_set_page_enc_bitmap,
> >  };
> >
> >  static int __init svm_init(void)
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 3c3fea4e20b5..05e953b2ec61 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -5238,6 +5238,18 @@ long kvm_arch_vm_ioctl(struct file *filp,
> >                         r = kvm_x86_ops->get_page_enc_bitmap(kvm, &bitmap);
> >                 break;
> >         }
> > +       case KVM_SET_PAGE_ENC_BITMAP: {
> > +               struct kvm_page_enc_bitmap bitmap;
> > +
> > +               r = -EFAULT;
> > +               if (copy_from_user(&bitmap, argp, sizeof(bitmap)))
> > +                       goto out;
> > +
> > +               r = -ENOTTY;
> > +               if (kvm_x86_ops->set_page_enc_bitmap)
> > +                       r = kvm_x86_ops->set_page_enc_bitmap(kvm, &bitmap);
> > +               break;
> > +       }
> >         default:
> >                 r = -ENOTTY;
> >         }
> > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> > index db1ebf85e177..b4b01d47e568 100644
> > --- a/include/uapi/linux/kvm.h
> > +++ b/include/uapi/linux/kvm.h
> > @@ -1489,6 +1489,7 @@ struct kvm_enc_region {
> >  #define KVM_S390_CLEAR_RESET   _IO(KVMIO,   0xc4)
> >
> >  #define KVM_GET_PAGE_ENC_BITMAP        _IOW(KVMIO, 0xc5, struct kvm_page_enc_bitmap)
> > +#define KVM_SET_PAGE_ENC_BITMAP        _IOW(KVMIO, 0xc6, struct kvm_page_enc_bitmap)
> >
> >  /* Secure Encrypted Virtualization command */
> >  enum sev_cmd_id {
> > --
> > 2.17.1
> >

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v6 12/14] KVM: x86: Introduce KVM_PAGE_ENC_BITMAP_RESET ioctl
  2020-04-08  1:25         ` Steve Rutherford
@ 2020-04-08  1:52           ` Ashish Kalra
  2020-04-10  0:59             ` Steve Rutherford
  0 siblings, 1 reply; 107+ messages in thread
From: Ashish Kalra @ 2020-04-08  1:52 UTC (permalink / raw)
  To: Steve Rutherford
  Cc: Krish Sadhukhan, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Joerg Roedel, Borislav Petkov, Tom Lendacky,
	X86 ML, KVM list, LKML, David Rientjes, Andy Lutomirski,
	Brijesh Singh

Hello Steve,

On Tue, Apr 07, 2020 at 06:25:51PM -0700, Steve Rutherford wrote:
> On Mon, Apr 6, 2020 at 11:53 AM Krish Sadhukhan
> <krish.sadhukhan@oracle.com> wrote:
> >
> >
> > On 4/3/20 2:45 PM, Ashish Kalra wrote:
> > > On Fri, Apr 03, 2020 at 02:14:23PM -0700, Krish Sadhukhan wrote:
> > >> On 3/29/20 11:23 PM, Ashish Kalra wrote:
> > >>> From: Ashish Kalra <ashish.kalra@amd.com>
> > >>>
> > >>> This ioctl can be used by the application to reset the page
> > >>> encryption bitmap managed by the KVM driver. A typical usage
> > >>> for this ioctl is on VM reboot, on reboot, we must reinitialize
> > >>> the bitmap.
> > >>>
> > >>> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> > >>> ---
> > >>>    Documentation/virt/kvm/api.rst  | 13 +++++++++++++
> > >>>    arch/x86/include/asm/kvm_host.h |  1 +
> > >>>    arch/x86/kvm/svm.c              | 16 ++++++++++++++++
> > >>>    arch/x86/kvm/x86.c              |  6 ++++++
> > >>>    include/uapi/linux/kvm.h        |  1 +
> > >>>    5 files changed, 37 insertions(+)
> > >>>
> > >>> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > >>> index 4d1004a154f6..a11326ccc51d 100644
> > >>> --- a/Documentation/virt/kvm/api.rst
> > >>> +++ b/Documentation/virt/kvm/api.rst
> > >>> @@ -4698,6 +4698,19 @@ During the guest live migration the outgoing guest exports its page encryption
> > >>>    bitmap, the KVM_SET_PAGE_ENC_BITMAP can be used to build the page encryption
> > >>>    bitmap for an incoming guest.
> > >>> +4.127 KVM_PAGE_ENC_BITMAP_RESET (vm ioctl)
> > >>> +-----------------------------------------
> > >>> +
> > >>> +:Capability: basic
> > >>> +:Architectures: x86
> > >>> +:Type: vm ioctl
> > >>> +:Parameters: none
> > >>> +:Returns: 0 on success, -1 on error
> > >>> +
> > >>> +The KVM_PAGE_ENC_BITMAP_RESET is used to reset the guest's page encryption
> > >>> +bitmap during guest reboot and this is only done on the guest's boot vCPU.
> > >>> +
> > >>> +
> > >>>    5. The kvm_run structure
> > >>>    ========================
> > >>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > >>> index d30f770aaaea..a96ef6338cd2 100644
> > >>> --- a/arch/x86/include/asm/kvm_host.h
> > >>> +++ b/arch/x86/include/asm/kvm_host.h
> > >>> @@ -1273,6 +1273,7 @@ struct kvm_x86_ops {
> > >>>                             struct kvm_page_enc_bitmap *bmap);
> > >>>     int (*set_page_enc_bitmap)(struct kvm *kvm,
> > >>>                             struct kvm_page_enc_bitmap *bmap);
> > >>> +   int (*reset_page_enc_bitmap)(struct kvm *kvm);
> > >>>    };
> > >>>    struct kvm_arch_async_pf {
> > >>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> > >>> index 313343a43045..c99b0207a443 100644
> > >>> --- a/arch/x86/kvm/svm.c
> > >>> +++ b/arch/x86/kvm/svm.c
> > >>> @@ -7797,6 +7797,21 @@ static int svm_set_page_enc_bitmap(struct kvm *kvm,
> > >>>     return ret;
> > >>>    }
> > >>> +static int svm_reset_page_enc_bitmap(struct kvm *kvm)
> > >>> +{
> > >>> +   struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> > >>> +
> > >>> +   if (!sev_guest(kvm))
> > >>> +           return -ENOTTY;
> > >>> +
> > >>> +   mutex_lock(&kvm->lock);
> > >>> +   /* by default all pages should be marked encrypted */
> > >>> +   if (sev->page_enc_bmap_size)
> > >>> +           bitmap_fill(sev->page_enc_bmap, sev->page_enc_bmap_size);
> > >>> +   mutex_unlock(&kvm->lock);
> > >>> +   return 0;
> > >>> +}
> > >>> +
> > >>>    static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
> > >>>    {
> > >>>     struct kvm_sev_cmd sev_cmd;
> > >>> @@ -8203,6 +8218,7 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
> > >>>     .page_enc_status_hc = svm_page_enc_status_hc,
> > >>>     .get_page_enc_bitmap = svm_get_page_enc_bitmap,
> > >>>     .set_page_enc_bitmap = svm_set_page_enc_bitmap,
> > >>> +   .reset_page_enc_bitmap = svm_reset_page_enc_bitmap,
> > >>
> > >> We don't need to initialize the intel ops to NULL ? It's not initialized in
> > >> the previous patch either.
> > >>
> > >>>    };
> > > This struct is declared as "static storage", so won't the non-initialized
> > > members be 0 ?
> >
> >
> > Correct. Although, I see that 'nested_enable_evmcs' is explicitly
> > initialized. We should maintain the convention, perhaps.
> >
> > >
> > >>>    static int __init svm_init(void)
> > >>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > >>> index 05e953b2ec61..2127ed937f53 100644
> > >>> --- a/arch/x86/kvm/x86.c
> > >>> +++ b/arch/x86/kvm/x86.c
> > >>> @@ -5250,6 +5250,12 @@ long kvm_arch_vm_ioctl(struct file *filp,
> > >>>                     r = kvm_x86_ops->set_page_enc_bitmap(kvm, &bitmap);
> > >>>             break;
> > >>>     }
> > >>> +   case KVM_PAGE_ENC_BITMAP_RESET: {
> > >>> +           r = -ENOTTY;
> > >>> +           if (kvm_x86_ops->reset_page_enc_bitmap)
> > >>> +                   r = kvm_x86_ops->reset_page_enc_bitmap(kvm);
> > >>> +           break;
> > >>> +   }
> > >>>     default:
> > >>>             r = -ENOTTY;
> > >>>     }
> > >>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> > >>> index b4b01d47e568..0884a581fc37 100644
> > >>> --- a/include/uapi/linux/kvm.h
> > >>> +++ b/include/uapi/linux/kvm.h
> > >>> @@ -1490,6 +1490,7 @@ struct kvm_enc_region {
> > >>>    #define KVM_GET_PAGE_ENC_BITMAP  _IOW(KVMIO, 0xc5, struct kvm_page_enc_bitmap)
> > >>>    #define KVM_SET_PAGE_ENC_BITMAP  _IOW(KVMIO, 0xc6, struct kvm_page_enc_bitmap)
> > >>> +#define KVM_PAGE_ENC_BITMAP_RESET  _IO(KVMIO, 0xc7)
> > >>>    /* Secure Encrypted Virtualization command */
> > >>>    enum sev_cmd_id {
> > >> Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
> 
> 
> Doesn't this overlap with the set ioctl? Yes, obviously, you have to
> copy the new value down and do a bit more work, but I don't think
> resetting the bitmap is going to be the bottleneck on reboot. Seems
> excessive to add another ioctl for this.

The set ioctl is primarily provided for the incoming VM to set up the
page encryption bitmap, whereas this reset ioctl gives the source VM a
simple interface to reset the whole page encryption bitmap.

Thanks,
Ashish

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v6 08/14] KVM: X86: Introduce KVM_HC_PAGE_ENC_STATUS hypercall
  2020-04-08  1:38               ` Steve Rutherford
@ 2020-04-08  2:34                 ` Brijesh Singh
  2020-04-08  3:18                   ` Ashish Kalra
  0 siblings, 1 reply; 107+ messages in thread
From: Brijesh Singh @ 2020-04-08  2:34 UTC (permalink / raw)
  To: Steve Rutherford, Ashish Kalra
  Cc: brijesh.singh, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Joerg Roedel, Borislav Petkov, Tom Lendacky,
	X86 ML, KVM list, LKML, David Rientjes, Andy Lutomirski


On 4/7/20 8:38 PM, Steve Rutherford wrote:
> On Tue, Apr 7, 2020 at 6:17 PM Ashish Kalra <ashish.kalra@amd.com> wrote:
>> Hello Steve, Brijesh,
>>
>> On Tue, Apr 07, 2020 at 05:35:57PM -0700, Steve Rutherford wrote:
>>> On Tue, Apr 7, 2020 at 5:29 PM Brijesh Singh <brijesh.singh@amd.com> wrote:
>>>>
>>>> On 4/7/20 7:01 PM, Steve Rutherford wrote:
>>>>> On Mon, Apr 6, 2020 at 10:27 PM Ashish Kalra <ashish.kalra@amd.com> wrote:
>>>>>> Hello Steve,
>>>>>>
>>>>>> On Mon, Apr 06, 2020 at 07:17:37PM -0700, Steve Rutherford wrote:
>>>>>>> On Sun, Mar 29, 2020 at 11:22 PM Ashish Kalra <Ashish.Kalra@amd.com> wrote:
>>>>>>>> From: Brijesh Singh <Brijesh.Singh@amd.com>
>>>>>>>>
>>>>>>>> This hypercall is used by the SEV guest to notify a change in the page
>>>>>>>> encryption status to the hypervisor. The hypercall should be invoked
>>>>>>>> only when the encryption attribute is changed from encrypted -> decrypted
>>>>>>>> and vice versa. By default all guest pages are considered encrypted.
>>>>>>>>
>>>>>>>> Cc: Thomas Gleixner <tglx@linutronix.de>
>>>>>>>> Cc: Ingo Molnar <mingo@redhat.com>
>>>>>>>> Cc: "H. Peter Anvin" <hpa@zytor.com>
>>>>>>>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>>>>>>>> Cc: "Radim Krčmář" <rkrcmar@redhat.com>
>>>>>>>> Cc: Joerg Roedel <joro@8bytes.org>
>>>>>>>> Cc: Borislav Petkov <bp@suse.de>
>>>>>>>> Cc: Tom Lendacky <thomas.lendacky@amd.com>
>>>>>>>> Cc: x86@kernel.org
>>>>>>>> Cc: kvm@vger.kernel.org
>>>>>>>> Cc: linux-kernel@vger.kernel.org
>>>>>>>> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
>>>>>>>> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
>>>>>>>> ---
>>>>>>>>  Documentation/virt/kvm/hypercalls.rst | 15 +++++
>>>>>>>>  arch/x86/include/asm/kvm_host.h       |  2 +
>>>>>>>>  arch/x86/kvm/svm.c                    | 95 +++++++++++++++++++++++++++
>>>>>>>>  arch/x86/kvm/vmx/vmx.c                |  1 +
>>>>>>>>  arch/x86/kvm/x86.c                    |  6 ++
>>>>>>>>  include/uapi/linux/kvm_para.h         |  1 +
>>>>>>>>  6 files changed, 120 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/Documentation/virt/kvm/hypercalls.rst b/Documentation/virt/kvm/hypercalls.rst
>>>>>>>> index dbaf207e560d..ff5287e68e81 100644
>>>>>>>> --- a/Documentation/virt/kvm/hypercalls.rst
>>>>>>>> +++ b/Documentation/virt/kvm/hypercalls.rst
>>>>>>>> @@ -169,3 +169,18 @@ a0: destination APIC ID
>>>>>>>>
>>>>>>>>  :Usage example: When sending a call-function IPI-many to vCPUs, yield if
>>>>>>>>                 any of the IPI target vCPUs was preempted.
>>>>>>>> +
>>>>>>>> +
>>>>>>>> +8. KVM_HC_PAGE_ENC_STATUS
>>>>>>>> +-------------------------
>>>>>>>> +:Architecture: x86
>>>>>>>> +:Status: active
>>>>>>>> +:Purpose: Notify the encryption status changes in guest page table (SEV guest)
>>>>>>>> +
>>>>>>>> +a0: the guest physical address of the start page
>>>>>>>> +a1: the number of pages
>>>>>>>> +a2: encryption attribute
>>>>>>>> +
>>>>>>>> +   Where:
>>>>>>>> +       * 1: Encryption attribute is set
>>>>>>>> +       * 0: Encryption attribute is cleared
>>>>>>>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>>>>>>>> index 98959e8cd448..90718fa3db47 100644
>>>>>>>> --- a/arch/x86/include/asm/kvm_host.h
>>>>>>>> +++ b/arch/x86/include/asm/kvm_host.h
>>>>>>>> @@ -1267,6 +1267,8 @@ struct kvm_x86_ops {
>>>>>>>>
>>>>>>>>         bool (*apic_init_signal_blocked)(struct kvm_vcpu *vcpu);
>>>>>>>>         int (*enable_direct_tlbflush)(struct kvm_vcpu *vcpu);
>>>>>>>> +       int (*page_enc_status_hc)(struct kvm *kvm, unsigned long gpa,
>>>>>>>> +                                 unsigned long sz, unsigned long mode);
>>>>>>> Nit: spell out size instead of sz.
>>>>>>>>  };
>>>>>>>>
>>>>>>>>  struct kvm_arch_async_pf {
>>>>>>>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
>>>>>>>> index 7c2721e18b06..1d8beaf1bceb 100644
>>>>>>>> --- a/arch/x86/kvm/svm.c
>>>>>>>> +++ b/arch/x86/kvm/svm.c
>>>>>>>> @@ -136,6 +136,8 @@ struct kvm_sev_info {
>>>>>>>>         int fd;                 /* SEV device fd */
>>>>>>>>         unsigned long pages_locked; /* Number of pages locked */
>>>>>>>>         struct list_head regions_list;  /* List of registered regions */
>>>>>>>> +       unsigned long *page_enc_bmap;
>>>>>>>> +       unsigned long page_enc_bmap_size;
>>>>>>>>  };
>>>>>>>>
>>>>>>>>  struct kvm_svm {
>>>>>>>> @@ -1991,6 +1993,9 @@ static void sev_vm_destroy(struct kvm *kvm)
>>>>>>>>
>>>>>>>>         sev_unbind_asid(kvm, sev->handle);
>>>>>>>>         sev_asid_free(sev->asid);
>>>>>>>> +
>>>>>>>> +       kvfree(sev->page_enc_bmap);
>>>>>>>> +       sev->page_enc_bmap = NULL;
>>>>>>>>  }
>>>>>>>>
>>>>>>>>  static void avic_vm_destroy(struct kvm *kvm)
>>>>>>>> @@ -7593,6 +7598,94 @@ static int sev_receive_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
>>>>>>>>         return ret;
>>>>>>>>  }
>>>>>>>>
>>>>>>>> +static int sev_resize_page_enc_bitmap(struct kvm *kvm, unsigned long new_size)
>>>>>>>> +{
>>>>>>>> +       struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
>>>>>>>> +       unsigned long *map;
>>>>>>>> +       unsigned long sz;
>>>>>>>> +
>>>>>>>> +       if (sev->page_enc_bmap_size >= new_size)
>>>>>>>> +               return 0;
>>>>>>>> +
>>>>>>>> +       sz = ALIGN(new_size, BITS_PER_LONG) / 8;
>>>>>>>> +
>>>>>>>> +       map = vmalloc(sz);
>>>>>>>> +       if (!map) {
>>>>>>>> +               pr_err_once("Failed to allocate encrypted bitmap size %lx\n",
>>>>>>>> +                               sz);
>>>>>>>> +               return -ENOMEM;
>>>>>>>> +       }
>>>>>>>> +
>>>>>>>> +       /* mark the page encrypted (by default) */
>>>>>>>> +       memset(map, 0xff, sz);
>>>>>>>> +
>>>>>>>> +       bitmap_copy(map, sev->page_enc_bmap, sev->page_enc_bmap_size);
>>>>>>>> +       kvfree(sev->page_enc_bmap);
>>>>>>>> +
>>>>>>>> +       sev->page_enc_bmap = map;
>>>>>>>> +       sev->page_enc_bmap_size = new_size;
>>>>>>>> +
>>>>>>>> +       return 0;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +static int svm_page_enc_status_hc(struct kvm *kvm, unsigned long gpa,
>>>>>>>> +                                 unsigned long npages, unsigned long enc)
>>>>>>>> +{
>>>>>>>> +       struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
>>>>>>>> +       kvm_pfn_t pfn_start, pfn_end;
>>>>>>>> +       gfn_t gfn_start, gfn_end;
>>>>>>>> +       int ret;
>>>>>>>> +
>>>>>>>> +       if (!sev_guest(kvm))
>>>>>>>> +               return -EINVAL;
>>>>>>>> +
>>>>>>>> +       if (!npages)
>>>>>>>> +               return 0;
>>>>>>>> +
>>>>>>>> +       gfn_start = gpa_to_gfn(gpa);
>>>>>>>> +       gfn_end = gfn_start + npages;
>>>>>>>> +
>>>>>>>> +       /* out of bound access error check */
>>>>>>>> +       if (gfn_end <= gfn_start)
>>>>>>>> +               return -EINVAL;
>>>>>>>> +
>>>>>>>> +       /* lets make sure that gpa exist in our memslot */
>>>>>>>> +       pfn_start = gfn_to_pfn(kvm, gfn_start);
>>>>>>>> +       pfn_end = gfn_to_pfn(kvm, gfn_end);
>>>>>>>> +
>>>>>>>> +       if (is_error_noslot_pfn(pfn_start) && !is_noslot_pfn(pfn_start)) {
>>>>>>>> +               /*
>>>>>>>> +                * Allow guest MMIO range(s) to be added
>>>>>>>> +                * to the page encryption bitmap.
>>>>>>>> +                */
>>>>>>>> +               return -EINVAL;
>>>>>>>> +       }
>>>>>>>> +
>>>>>>>> +       if (is_error_noslot_pfn(pfn_end) && !is_noslot_pfn(pfn_end)) {
>>>>>>>> +               /*
>>>>>>>> +                * Allow guest MMIO range(s) to be added
>>>>>>>> +                * to the page encryption bitmap.
>>>>>>>> +                */
>>>>>>>> +               return -EINVAL;
>>>>>>>> +       }
>>>>>>>> +
>>>>>>>> +       mutex_lock(&kvm->lock);
>>>>>>>> +       ret = sev_resize_page_enc_bitmap(kvm, gfn_end);
>>>>>>>> +       if (ret)
>>>>>>>> +               goto unlock;
>>>>>>>> +
>>>>>>>> +       if (enc)
>>>>>>>> +               __bitmap_set(sev->page_enc_bmap, gfn_start,
>>>>>>>> +                               gfn_end - gfn_start);
>>>>>>>> +       else
>>>>>>>> +               __bitmap_clear(sev->page_enc_bmap, gfn_start,
>>>>>>>> +                               gfn_end - gfn_start);
>>>>>>>> +
>>>>>>>> +unlock:
>>>>>>>> +       mutex_unlock(&kvm->lock);
>>>>>>>> +       return ret;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>>  static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
>>>>>>>>  {
>>>>>>>>         struct kvm_sev_cmd sev_cmd;
>>>>>>>> @@ -7995,6 +8088,8 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
>>>>>>>>         .need_emulation_on_page_fault = svm_need_emulation_on_page_fault,
>>>>>>>>
>>>>>>>>         .apic_init_signal_blocked = svm_apic_init_signal_blocked,
>>>>>>>> +
>>>>>>>> +       .page_enc_status_hc = svm_page_enc_status_hc,
>>>>>>>>  };
>>>>>>>>
>>>>>>>>  static int __init svm_init(void)
>>>>>>>> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
>>>>>>>> index 079d9fbf278e..f68e76ee7f9c 100644
>>>>>>>> --- a/arch/x86/kvm/vmx/vmx.c
>>>>>>>> +++ b/arch/x86/kvm/vmx/vmx.c
>>>>>>>> @@ -8001,6 +8001,7 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
>>>>>>>>         .nested_get_evmcs_version = NULL,
>>>>>>>>         .need_emulation_on_page_fault = vmx_need_emulation_on_page_fault,
>>>>>>>>         .apic_init_signal_blocked = vmx_apic_init_signal_blocked,
>>>>>>>> +       .page_enc_status_hc = NULL,
>>>>>>>>  };
>>>>>>>>
>>>>>>>>  static void vmx_cleanup_l1d_flush(void)
>>>>>>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>>>>>>>> index cf95c36cb4f4..68428eef2dde 100644
>>>>>>>> --- a/arch/x86/kvm/x86.c
>>>>>>>> +++ b/arch/x86/kvm/x86.c
>>>>>>>> @@ -7564,6 +7564,12 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
>>>>>>>>                 kvm_sched_yield(vcpu->kvm, a0);
>>>>>>>>                 ret = 0;
>>>>>>>>                 break;
>>>>>>>> +       case KVM_HC_PAGE_ENC_STATUS:
>>>>>>>> +               ret = -KVM_ENOSYS;
>>>>>>>> +               if (kvm_x86_ops->page_enc_status_hc)
>>>>>>>> +                       ret = kvm_x86_ops->page_enc_status_hc(vcpu->kvm,
>>>>>>>> +                                       a0, a1, a2);
>>>>>>>> +               break;
>>>>>>>>         default:
>>>>>>>>                 ret = -KVM_ENOSYS;
>>>>>>>>                 break;
>>>>>>>> diff --git a/include/uapi/linux/kvm_para.h b/include/uapi/linux/kvm_para.h
>>>>>>>> index 8b86609849b9..847b83b75dc8 100644
>>>>>>>> --- a/include/uapi/linux/kvm_para.h
>>>>>>>> +++ b/include/uapi/linux/kvm_para.h
>>>>>>>> @@ -29,6 +29,7 @@
>>>>>>>>  #define KVM_HC_CLOCK_PAIRING           9
>>>>>>>>  #define KVM_HC_SEND_IPI                10
>>>>>>>>  #define KVM_HC_SCHED_YIELD             11
>>>>>>>> +#define KVM_HC_PAGE_ENC_STATUS         12
>>>>>>>>
>>>>>>>>  /*
>>>>>>>>   * hypercalls use architecture specific
>>>>>>>> --
>>>>>>>> 2.17.1
>>>>>>>>
>>>>>>> I'm still not excited by the dynamic resizing. I believe the guest
>>>>>>> hypercall can be called in atomic contexts, which makes me
>>>>>>> particularly unexcited to see a potentially large vmalloc on the host
>>>>>>> followed by filling the buffer. Particularly when the buffer might be
>>>>>>> non-trivial in size (~1MB per 32GB, per some back of the envelope
>>>>>>> math).
>>>>>>>
>>>>>> I think looking at more practical situations, most hypercalls will
>>>>>> happen during the boot stage, when device specific initializations are
>>>>>> happening, so typically the maximum page encryption bitmap size would
>>>>>> be allocated early enough.
>>>>>>
>>>>>> In fact, initial hypercalls made by OVMF will probably allocate the
>>>>>> maximum page bitmap size even before the kernel comes up, especially
>>>>>> as they will be setting up page enc/dec status for MMIO, ROM, ACPI
>>>>>> regions, PCI device memory, etc., and most importantly for
>>>>>> "non-existent" high memory range (which will probably be the
>>>>>> maximum size page encryption bitmap allocated/resized).
>>>>>>
>>>>>> Let me know if you have different thoughts on this ?
>>>>> Hi Ashish,
>>>>>
>>>>> If this is not an issue in practice, we can just move past this. If we
>>>>> are basically guaranteed that OVMF will trigger hypercalls that expand
>>>>> the bitmap beyond the top of memory, then, yes, that should work. That
>>>>> leaves me slightly nervous that OVMF might regress since it's not
>>>>> obvious that calling a hypercall beyond the top of memory would be
>>>>> "required" for avoiding a somewhat indirectly related issue in guest
>>>>> kernels.
>>>>
>>>> If possible then we should try to avoid growing/shrinking the bitmap .
>>>> Today OVMF may not be accessing beyond memory but a malicious guest
>>>> could send a hypercall down which can trigger a huge memory allocation
>>>> on the host side and may eventually cause denial of service for other.
>>> Nice catch! Was just writing up an email about this.
>>>> I am in favor if we can find some solution to handle this case. How
>>>> about Steve's suggestion about VMM making a call down to the kernel to
>>>> tell how big the bitmap should be? Initially it should be equal to the
>>>> guest RAM and if VMM ever did the memory expansion then it can send down
>>>> another notification to increase the bitmap ?
>>>>
>>>> Optionally, instead of adding a new ioctl, I was wondering if we can
>>>> extend the kvm_arch_prepare_memory_region() to make svm specific x86_ops
>>>> which can take read the userspace provided memory region and calculate
>>>> the amount of guest RAM managed by the KVM and grow/shrink the bitmap
>>>> based on that information. I have not looked deep enough to see if its
>>>> doable but if it can work then we can avoid adding yet another ioctl.
>>> We also have the set bitmap ioctl in a later patch in this series. We
>>> could also use the set ioctl for initialization (it's a little
>>> excessive for initialization since there will be an additional
>>> ephemeral allocation and a few additional buffer copies, but that's
>>> probably fine). An enable_cap has the added benefit of probably being
>>> necessary anyway so usermode can disable the migration feature flag.
>>>
>>> In general, userspace is going to have to be in direct control of the
>>> buffer and its size.
>> My only practical concern about setting a static bitmap size based on guest
>> memory is about the hypercalls being made initially by OVMF to set page
>> enc/dec status for ROM, ACPI regions and especially the non-existent
>> high memory range. The new ioctl will statically setup bitmap size to
>> whatever guest RAM is specified, say 4G, 8G, etc., but the OVMF
>> hypercall for non-existent memory will try to do a hypercall for guest
>> physical memory range like ~6G->64G (for 4G guest RAM setup), this
>> hypercall will basically have to just return doing nothing, because
>> the allocated bitmap won't have this guest physical range available ?


IMO, OVMF issuing a hypercall beyond the guest RAM is simply wrong; it
should *not* do that.  There was a feature request I submitted some time
back to Tianocore https://bugzilla.tianocore.org/show_bug.cgi?id=623 as
I saw this coming in the future. I tried highlighting the problem that
the MdeModulePkg does not provide a notifier to tell OVMF when the core
creates the MMIO holes etc. It was not a big problem with SEV initially
because we never got down to the hypervisor to do anything about those
non-existent regions. But with migration it is now important that we
restart the discussion with the UEFI folks and see what can be done. In
the kernel patches we should do what is right for the kernel and not
work around the OVMF limitation.


>> Also, hypercalls for ROM, ACPI, device regions and any memory holes within
>> the static bitmap setup as per guest RAM config will work, but what
>> about hypercalls for any device regions beyond the guest RAM config ?
>>
>> Thanks,
>> Ashish
> I'm not super familiar with what the address beyond the top of ram is
> used for. If the memory is not backed by RAM, will it even matter for
> migration? Sounds like the encryption for SEV won't even apply to it.
> If we don't need to know what the c-bit state of an address is, we
> don't need to track it. It doesn't hurt to track it (which is why I'm
> not super concerned about tracking the memory holes).

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v6 08/14] KVM: X86: Introduce KVM_HC_PAGE_ENC_STATUS hypercall
  2020-04-08  2:34                 ` Brijesh Singh
@ 2020-04-08  3:18                   ` Ashish Kalra
  2020-04-09 16:18                     ` Ashish Kalra
  0 siblings, 1 reply; 107+ messages in thread
From: Ashish Kalra @ 2020-04-08  3:18 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: Steve Rutherford, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Joerg Roedel, Borislav Petkov, Tom Lendacky,
	X86 ML, KVM list, LKML, David Rientjes, Andy Lutomirski

Hello Brijesh,

On Tue, Apr 07, 2020 at 09:34:15PM -0500, Brijesh Singh wrote:
> 
> On 4/7/20 8:38 PM, Steve Rutherford wrote:
> > On Tue, Apr 7, 2020 at 6:17 PM Ashish Kalra <ashish.kalra@amd.com> wrote:
> >> Hello Steve, Brijesh,
> >>
> >> On Tue, Apr 07, 2020 at 05:35:57PM -0700, Steve Rutherford wrote:
> >>> On Tue, Apr 7, 2020 at 5:29 PM Brijesh Singh <brijesh.singh@amd.com> wrote:
> >>>>
> >>>> On 4/7/20 7:01 PM, Steve Rutherford wrote:
> >>>>> On Mon, Apr 6, 2020 at 10:27 PM Ashish Kalra <ashish.kalra@amd.com> wrote:
> >>>>>> Hello Steve,
> >>>>>>
> >>>>>> On Mon, Apr 06, 2020 at 07:17:37PM -0700, Steve Rutherford wrote:
> >>>>>>> On Sun, Mar 29, 2020 at 11:22 PM Ashish Kalra <Ashish.Kalra@amd.com> wrote:
> >>>>>>>> From: Brijesh Singh <Brijesh.Singh@amd.com>
> >>>>>>>>
> >>>>>>>> This hypercall is used by the SEV guest to notify a change in the page
> >>>>>>>> encryption status to the hypervisor. The hypercall should be invoked
> >>>>>>>> only when the encryption attribute is changed from encrypted -> decrypted
> >>>>>>>> and vice versa. By default all guest pages are considered encrypted.
> >>>>>>>>
> >>>>>>>> Cc: Thomas Gleixner <tglx@linutronix.de>
> >>>>>>>> Cc: Ingo Molnar <mingo@redhat.com>
> >>>>>>>> Cc: "H. Peter Anvin" <hpa@zytor.com>
> >>>>>>>> Cc: Paolo Bonzini <pbonzini@redhat.com>
> >>>>>>>> Cc: "Radim Krčmář" <rkrcmar@redhat.com>
> >>>>>>>> Cc: Joerg Roedel <joro@8bytes.org>
> >>>>>>>> Cc: Borislav Petkov <bp@suse.de>
> >>>>>>>> Cc: Tom Lendacky <thomas.lendacky@amd.com>
> >>>>>>>> Cc: x86@kernel.org
> >>>>>>>> Cc: kvm@vger.kernel.org
> >>>>>>>> Cc: linux-kernel@vger.kernel.org
> >>>>>>>> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> >>>>>>>> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> >>>>>>>> ---
> >>>>>>>>  Documentation/virt/kvm/hypercalls.rst | 15 +++++
> >>>>>>>>  arch/x86/include/asm/kvm_host.h       |  2 +
> >>>>>>>>  arch/x86/kvm/svm.c                    | 95 +++++++++++++++++++++++++++
> >>>>>>>>  arch/x86/kvm/vmx/vmx.c                |  1 +
> >>>>>>>>  arch/x86/kvm/x86.c                    |  6 ++
> >>>>>>>>  include/uapi/linux/kvm_para.h         |  1 +
> >>>>>>>>  6 files changed, 120 insertions(+)
> >>>>>>>>
> >>>>>>>> diff --git a/Documentation/virt/kvm/hypercalls.rst b/Documentation/virt/kvm/hypercalls.rst
> >>>>>>>> index dbaf207e560d..ff5287e68e81 100644
> >>>>>>>> --- a/Documentation/virt/kvm/hypercalls.rst
> >>>>>>>> +++ b/Documentation/virt/kvm/hypercalls.rst
> >>>>>>>> @@ -169,3 +169,18 @@ a0: destination APIC ID
> >>>>>>>>
> >>>>>>>>  :Usage example: When sending a call-function IPI-many to vCPUs, yield if
> >>>>>>>>                 any of the IPI target vCPUs was preempted.
> >>>>>>>> +
> >>>>>>>> +
> >>>>>>>> +8. KVM_HC_PAGE_ENC_STATUS
> >>>>>>>> +-------------------------
> >>>>>>>> +:Architecture: x86
> >>>>>>>> +:Status: active
> >>>>>>>> +:Purpose: Notify the encryption status changes in guest page table (SEV guest)
> >>>>>>>> +
> >>>>>>>> +a0: the guest physical address of the start page
> >>>>>>>> +a1: the number of pages
> >>>>>>>> +a2: encryption attribute
> >>>>>>>> +
> >>>>>>>> +   Where:
> >>>>>>>> +       * 1: Encryption attribute is set
> >>>>>>>> +       * 0: Encryption attribute is cleared
> >>>>>>>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> >>>>>>>> index 98959e8cd448..90718fa3db47 100644
> >>>>>>>> --- a/arch/x86/include/asm/kvm_host.h
> >>>>>>>> +++ b/arch/x86/include/asm/kvm_host.h
> >>>>>>>> @@ -1267,6 +1267,8 @@ struct kvm_x86_ops {
> >>>>>>>>
> >>>>>>>>         bool (*apic_init_signal_blocked)(struct kvm_vcpu *vcpu);
> >>>>>>>>         int (*enable_direct_tlbflush)(struct kvm_vcpu *vcpu);
> >>>>>>>> +       int (*page_enc_status_hc)(struct kvm *kvm, unsigned long gpa,
> >>>>>>>> +                                 unsigned long sz, unsigned long mode);
> >>>>>>> Nit: spell out size instead of sz.
> >>>>>>>>  };
> >>>>>>>>
> >>>>>>>>  struct kvm_arch_async_pf {
> >>>>>>>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> >>>>>>>> index 7c2721e18b06..1d8beaf1bceb 100644
> >>>>>>>> --- a/arch/x86/kvm/svm.c
> >>>>>>>> +++ b/arch/x86/kvm/svm.c
> >>>>>>>> @@ -136,6 +136,8 @@ struct kvm_sev_info {
> >>>>>>>>         int fd;                 /* SEV device fd */
> >>>>>>>>         unsigned long pages_locked; /* Number of pages locked */
> >>>>>>>>         struct list_head regions_list;  /* List of registered regions */
> >>>>>>>> +       unsigned long *page_enc_bmap;
> >>>>>>>> +       unsigned long page_enc_bmap_size;
> >>>>>>>>  };
> >>>>>>>>
> >>>>>>>>  struct kvm_svm {
> >>>>>>>> @@ -1991,6 +1993,9 @@ static void sev_vm_destroy(struct kvm *kvm)
> >>>>>>>>
> >>>>>>>>         sev_unbind_asid(kvm, sev->handle);
> >>>>>>>>         sev_asid_free(sev->asid);
> >>>>>>>> +
> >>>>>>>> +       kvfree(sev->page_enc_bmap);
> >>>>>>>> +       sev->page_enc_bmap = NULL;
> >>>>>>>>  }
> >>>>>>>>
> >>>>>>>>  static void avic_vm_destroy(struct kvm *kvm)
> >>>>>>>> @@ -7593,6 +7598,94 @@ static int sev_receive_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
> >>>>>>>>         return ret;
> >>>>>>>>  }
> >>>>>>>>
> >>>>>>>> +static int sev_resize_page_enc_bitmap(struct kvm *kvm, unsigned long new_size)
> >>>>>>>> +{
> >>>>>>>> +       struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> >>>>>>>> +       unsigned long *map;
> >>>>>>>> +       unsigned long sz;
> >>>>>>>> +
> >>>>>>>> +       if (sev->page_enc_bmap_size >= new_size)
> >>>>>>>> +               return 0;
> >>>>>>>> +
> >>>>>>>> +       sz = ALIGN(new_size, BITS_PER_LONG) / 8;
> >>>>>>>> +
> >>>>>>>> +       map = vmalloc(sz);
> >>>>>>>> +       if (!map) {
> >>>>>>>> +               pr_err_once("Failed to allocate encrypted bitmap size %lx\n",
> >>>>>>>> +                               sz);
> >>>>>>>> +               return -ENOMEM;
> >>>>>>>> +       }
> >>>>>>>> +
> >>>>>>>> +       /* mark the page encrypted (by default) */
> >>>>>>>> +       memset(map, 0xff, sz);
> >>>>>>>> +
> >>>>>>>> +       bitmap_copy(map, sev->page_enc_bmap, sev->page_enc_bmap_size);
> >>>>>>>> +       kvfree(sev->page_enc_bmap);
> >>>>>>>> +
> >>>>>>>> +       sev->page_enc_bmap = map;
> >>>>>>>> +       sev->page_enc_bmap_size = new_size;
> >>>>>>>> +
> >>>>>>>> +       return 0;
> >>>>>>>> +}
> >>>>>>>> +
> >>>>>>>> +static int svm_page_enc_status_hc(struct kvm *kvm, unsigned long gpa,
> >>>>>>>> +                                 unsigned long npages, unsigned long enc)
> >>>>>>>> +{
> >>>>>>>> +       struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> >>>>>>>> +       kvm_pfn_t pfn_start, pfn_end;
> >>>>>>>> +       gfn_t gfn_start, gfn_end;
> >>>>>>>> +       int ret;
> >>>>>>>> +
> >>>>>>>> +       if (!sev_guest(kvm))
> >>>>>>>> +               return -EINVAL;
> >>>>>>>> +
> >>>>>>>> +       if (!npages)
> >>>>>>>> +               return 0;
> >>>>>>>> +
> >>>>>>>> +       gfn_start = gpa_to_gfn(gpa);
> >>>>>>>> +       gfn_end = gfn_start + npages;
> >>>>>>>> +
> >>>>>>>> +       /* out of bound access error check */
> >>>>>>>> +       if (gfn_end <= gfn_start)
> >>>>>>>> +               return -EINVAL;
> >>>>>>>> +
> >>>>>>>> +       /* let's make sure that the gpa exists in our memslots */
> >>>>>>>> +       pfn_start = gfn_to_pfn(kvm, gfn_start);
> >>>>>>>> +       pfn_end = gfn_to_pfn(kvm, gfn_end);
> >>>>>>>> +
> >>>>>>>> +       if (is_error_noslot_pfn(pfn_start) && !is_noslot_pfn(pfn_start)) {
> >>>>>>>> +               /*
> >>>>>>>> +                * Allow guest MMIO range(s) to be added
> >>>>>>>> +                * to the page encryption bitmap.
> >>>>>>>> +                */
> >>>>>>>> +               return -EINVAL;
> >>>>>>>> +       }
> >>>>>>>> +
> >>>>>>>> +       if (is_error_noslot_pfn(pfn_end) && !is_noslot_pfn(pfn_end)) {
> >>>>>>>> +               /*
> >>>>>>>> +                * Allow guest MMIO range(s) to be added
> >>>>>>>> +                * to the page encryption bitmap.
> >>>>>>>> +                */
> >>>>>>>> +               return -EINVAL;
> >>>>>>>> +       }
> >>>>>>>> +
> >>>>>>>> +       mutex_lock(&kvm->lock);
> >>>>>>>> +       ret = sev_resize_page_enc_bitmap(kvm, gfn_end);
> >>>>>>>> +       if (ret)
> >>>>>>>> +               goto unlock;
> >>>>>>>> +
> >>>>>>>> +       if (enc)
> >>>>>>>> +               __bitmap_set(sev->page_enc_bmap, gfn_start,
> >>>>>>>> +                               gfn_end - gfn_start);
> >>>>>>>> +       else
> >>>>>>>> +               __bitmap_clear(sev->page_enc_bmap, gfn_start,
> >>>>>>>> +                               gfn_end - gfn_start);
> >>>>>>>> +
> >>>>>>>> +unlock:
> >>>>>>>> +       mutex_unlock(&kvm->lock);
> >>>>>>>> +       return ret;
> >>>>>>>> +}
> >>>>>>>> +
> >>>>>>>>  static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
> >>>>>>>>  {
> >>>>>>>>         struct kvm_sev_cmd sev_cmd;
> >>>>>>>> @@ -7995,6 +8088,8 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
> >>>>>>>>         .need_emulation_on_page_fault = svm_need_emulation_on_page_fault,
> >>>>>>>>
> >>>>>>>>         .apic_init_signal_blocked = svm_apic_init_signal_blocked,
> >>>>>>>> +
> >>>>>>>> +       .page_enc_status_hc = svm_page_enc_status_hc,
> >>>>>>>>  };
> >>>>>>>>
> >>>>>>>>  static int __init svm_init(void)
> >>>>>>>> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> >>>>>>>> index 079d9fbf278e..f68e76ee7f9c 100644
> >>>>>>>> --- a/arch/x86/kvm/vmx/vmx.c
> >>>>>>>> +++ b/arch/x86/kvm/vmx/vmx.c
> >>>>>>>> @@ -8001,6 +8001,7 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
> >>>>>>>>         .nested_get_evmcs_version = NULL,
> >>>>>>>>         .need_emulation_on_page_fault = vmx_need_emulation_on_page_fault,
> >>>>>>>>         .apic_init_signal_blocked = vmx_apic_init_signal_blocked,
> >>>>>>>> +       .page_enc_status_hc = NULL,
> >>>>>>>>  };
> >>>>>>>>
> >>>>>>>>  static void vmx_cleanup_l1d_flush(void)
> >>>>>>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> >>>>>>>> index cf95c36cb4f4..68428eef2dde 100644
> >>>>>>>> --- a/arch/x86/kvm/x86.c
> >>>>>>>> +++ b/arch/x86/kvm/x86.c
> >>>>>>>> @@ -7564,6 +7564,12 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
> >>>>>>>>                 kvm_sched_yield(vcpu->kvm, a0);
> >>>>>>>>                 ret = 0;
> >>>>>>>>                 break;
> >>>>>>>> +       case KVM_HC_PAGE_ENC_STATUS:
> >>>>>>>> +               ret = -KVM_ENOSYS;
> >>>>>>>> +               if (kvm_x86_ops->page_enc_status_hc)
> >>>>>>>> +                       ret = kvm_x86_ops->page_enc_status_hc(vcpu->kvm,
> >>>>>>>> +                                       a0, a1, a2);
> >>>>>>>> +               break;
> >>>>>>>>         default:
> >>>>>>>>                 ret = -KVM_ENOSYS;
> >>>>>>>>                 break;
> >>>>>>>> diff --git a/include/uapi/linux/kvm_para.h b/include/uapi/linux/kvm_para.h
> >>>>>>>> index 8b86609849b9..847b83b75dc8 100644
> >>>>>>>> --- a/include/uapi/linux/kvm_para.h
> >>>>>>>> +++ b/include/uapi/linux/kvm_para.h
> >>>>>>>> @@ -29,6 +29,7 @@
> >>>>>>>>  #define KVM_HC_CLOCK_PAIRING           9
> >>>>>>>>  #define KVM_HC_SEND_IPI                10
> >>>>>>>>  #define KVM_HC_SCHED_YIELD             11
> >>>>>>>> +#define KVM_HC_PAGE_ENC_STATUS         12
> >>>>>>>>
> >>>>>>>>  /*
> >>>>>>>>   * hypercalls use architecture specific
> >>>>>>>> --
> >>>>>>>> 2.17.1
> >>>>>>>>
> >>>>>>> I'm still not excited by the dynamic resizing. I believe the guest
> >>>>>>> hypercall can be called in atomic contexts, which makes me
> >>>>>>> particularly unexcited to see a potentially large vmalloc on the host
> >>>>>>> followed by filling the buffer. Particularly when the buffer might be
> >>>>>>> non-trivial in size (~1MB per 32GB, per some back of the envelope
> >>>>>>> math).
> >>>>>>>
> >>>>>> I think looking at more practical situations, most hypercalls will
> >>>>>> happen during the boot stage, when device specific initializations are
> >>>>>> happening, so typically the maximum page encryption bitmap size would
> >>>>>> be allocated early enough.
> >>>>>>
> >>>>>> In fact, initial hypercalls made by OVMF will probably allocate the
> >>>>>> maximum page bitmap size even before the kernel comes up, especially
> >>>>>> as they will be setting up page enc/dec status for MMIO, ROM, ACPI
> >>>>>> regions, PCI device memory, etc., and most importantly for
> >>>>>> "non-existent" high memory range (which will probably be the
> >>>>>> maximum size page encryption bitmap allocated/resized).
> >>>>>>
> >>>>>> Let me know if you have different thoughts on this?
> >>>>> Hi Ashish,
> >>>>>
> >>>>> If this is not an issue in practice, we can just move past this. If we
> >>>>> are basically guaranteed that OVMF will trigger hypercalls that expand
> >>>>> the bitmap beyond the top of memory, then, yes, that should work. That
> >>>>> leaves me slightly nervous that OVMF might regress since it's not
> >>>>> obvious that calling a hypercall beyond the top of memory would be
> >>>>> "required" for avoiding a somewhat indirectly related issue in guest
> >>>>> kernels.
> >>>>
> >>>> If possible then we should try to avoid growing/shrinking the bitmap.
> >>>> Today OVMF may not be accessing beyond memory, but a malicious guest
> >>>> could send a hypercall down which can trigger a huge memory allocation
> >>>> on the host side and may eventually cause denial of service for others.
> >>> Nice catch! Was just writing up an email about this.
> >>>> I am in favor of finding some solution to handle this case. How
> >>>> about Steve's suggestion about the VMM making a call down to the kernel to
> >>>> tell it how big the bitmap should be? Initially it should be equal to the
> >>>> guest RAM, and if the VMM ever does a memory expansion then it can send down
> >>>> another notification to increase the bitmap?
> >>>>
> >>>> Optionally, instead of adding a new ioctl, I was wondering if we can
> >>>> extend kvm_arch_prepare_memory_region() to make an SVM-specific x86_ops
> >>>> callback which can read the userspace-provided memory region and calculate
> >>>> the amount of guest RAM managed by KVM and grow/shrink the bitmap
> >>>> based on that information. I have not looked deep enough to see if it's
> >>>> doable, but if it can work then we can avoid adding yet another ioctl.
> >>> We also have the set bitmap ioctl in a later patch in this series. We
> >>> could also use the set ioctl for initialization (it's a little
> >>> excessive for initialization since there will be an additional
> >>> ephemeral allocation and a few additional buffer copies, but that's
> >>> probably fine). An enable_cap has the added benefit of probably being
> >>> necessary anyway so usermode can disable the migration feature flag.
> >>>
> >>> In general, userspace is going to have to be in direct control of the
> >>> buffer and its size.
> >> My only practical concern about setting a static bitmap size based on guest
> >> memory is about the hypercalls made initially by OVMF to set page
> >> enc/dec status for ROM, ACPI regions and especially the non-existent
> >> high memory range. The new ioctl will statically set up the bitmap size to
> >> whatever guest RAM is specified, say 4G, 8G, etc., but the OVMF
> >> hypercall for non-existent memory will try to do a hypercall for a guest
> >> physical memory range like ~6G->64G (for a 4G guest RAM setup); this
> >> hypercall will basically have to just return doing nothing, because
> >> the allocated bitmap won't have this guest physical range available?
> 
> 
> IMO, OVMF issuing a hypercall beyond the guest RAM is simply wrong; it
> should *not* do that.  There was a feature request I submitted some time
> back to Tianocore https://bugzilla.tianocore.org/show_bug.cgi?id=623 as
> I saw this coming in the future. I tried highlighting the problem in the
> MdeModulePkg that it does not provide a notifier to tell OVMF when the core
> creates the MMIO holes etc. It was not a big problem with SEV
> initially because we were never getting down to the hypervisor to do
> something about those non-existent regions. But with migration it is
> now important that we restart the discussion with the UEFI folks and
> see what can be done. In the kernel patches we should do what is right
> for the kernel and not work around the OVMF limitation.

Ok, this makes sense. I will start exploring
kvm_arch_prepare_memory_region() to see if it can assist in computing
the guest RAM; otherwise I will look at adding a new ioctl interface
for the same.

Thanks,
Ashish

> 
> 
> >> Also, hypercalls for ROM, ACPI, device regions and any memory holes within
> >> the static bitmap setup as per guest RAM config will work, but what
> >> about hypercalls for any device regions beyond the guest RAM config?
> >>
> >> Thanks,
> >> Ashish
> > I'm not super familiar with what the addresses beyond the top of RAM are
> > used for. If the memory is not backed by RAM, will it even matter for
> > migration? It sounds like the encryption for SEV won't even apply to it.
> > If we don't need to know what the C-bit state of an address is, we
> > don't need to track it. It doesn't hurt to track it (which is why I'm
> > not super concerned about tracking the memory holes).

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v6 08/14] KVM: X86: Introduce KVM_HC_PAGE_ENC_STATUS hypercall
  2020-04-08  3:18                   ` Ashish Kalra
@ 2020-04-09 16:18                     ` Ashish Kalra
  2020-04-09 20:41                       ` Steve Rutherford
  0 siblings, 1 reply; 107+ messages in thread
From: Ashish Kalra @ 2020-04-09 16:18 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: Steve Rutherford, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Joerg Roedel, Borislav Petkov, Tom Lendacky,
	X86 ML, KVM list, LKML, David Rientjes, Andy Lutomirski

Hello Brijesh, Steve,

On Wed, Apr 08, 2020 at 03:18:18AM +0000, Ashish Kalra wrote:
> Hello Brijesh,
> 
> On Tue, Apr 07, 2020 at 09:34:15PM -0500, Brijesh Singh wrote:
> > 
> > On 4/7/20 8:38 PM, Steve Rutherford wrote:
> > > On Tue, Apr 7, 2020 at 6:17 PM Ashish Kalra <ashish.kalra@amd.com> wrote:
> > >> Hello Steve, Brijesh,
> > >>
> > >> On Tue, Apr 07, 2020 at 05:35:57PM -0700, Steve Rutherford wrote:
> > >>> On Tue, Apr 7, 2020 at 5:29 PM Brijesh Singh <brijesh.singh@amd.com> wrote:
> > >>>>
> > >>>> On 4/7/20 7:01 PM, Steve Rutherford wrote:
> > >>>>> On Mon, Apr 6, 2020 at 10:27 PM Ashish Kalra <ashish.kalra@amd.com> wrote:
> > >>>>>> Hello Steve,
> > >>>>>>
> > >>>>>> On Mon, Apr 06, 2020 at 07:17:37PM -0700, Steve Rutherford wrote:
> > >>>>>>> On Sun, Mar 29, 2020 at 11:22 PM Ashish Kalra <Ashish.Kalra@amd.com> wrote:
> > [...]
> > 
> > 
> > IMO, Ovmf issuing a hypercall beyond the guest RAM is simple wrong, it
> > should *not* do that.  There was a feature request I submitted sometime
> > back to Tianocore https://bugzilla.tianocore.org/show_bug.cgi?id=623 as
> > I saw this coming in future. I tried highlighting the problem in the
> > MdeModulePkg that it does not provide a notifier to tell OVMF when core
> > creates the MMIO holes etc. It was not a big problem with the SEV
> > initially because we were never getting down to hypervisor to do
> > something about those non-existent regions. But with the migration its
> > now important that we should restart the discussion with UEFI folks and
> > see what can be done. In the kernel patches we should do what is right
> > for the kernel and not workaround the Ovmf limitation.
> 
> Ok, this makes sense. I will start exploring
> kvm_arch_prepare_memory_region() to see if it can assist in computing
> the guest RAM or otherwise i will look at adding a new ioctl interface
> for the same.
> 

I looked at kvm_arch_prepare_memory_region() and
kvm_arch_commit_memory_region(), and kvm_arch_commit_memory_region()
looks to be ideal to use for this.

I walked the kvm_memslots in this function, and I can compute the
approximate guest RAM mapped by KVM, though I get the guest RAM size as
"twice" the configured size because of the two address spaces on x86 KVM;
I believe there is one additional address space for SMM/SMRAM use.

I don't think we have a use case of migrating an SEV guest with SMM
support?

Considering that, I believe I can just compute the guest RAM size
using the memslots for address space #0 and use that to grow/shrink the bitmap.

As you mentioned, I will need to add a new SVM-specific x86_ops
callback as part of kvm_arch_commit_memory_region() which will in turn
call sev_resize_page_enc_bitmap().

Thanks,
Ashish


^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v6 08/14] KVM: X86: Introduce KVM_HC_PAGE_ENC_STATUS hypercall
  2020-04-09 16:18                     ` Ashish Kalra
@ 2020-04-09 20:41                       ` Steve Rutherford
  0 siblings, 0 replies; 107+ messages in thread
From: Steve Rutherford @ 2020-04-09 20:41 UTC (permalink / raw)
  To: Ashish Kalra
  Cc: Brijesh Singh, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Joerg Roedel, Borislav Petkov, Tom Lendacky,
	X86 ML, KVM list, LKML, David Rientjes, Andy Lutomirski

On Thu, Apr 9, 2020 at 9:18 AM Ashish Kalra <ashish.kalra@amd.com> wrote:
>
> Hello Brijesh, Steve,
>
> On Wed, Apr 08, 2020 at 03:18:18AM +0000, Ashish Kalra wrote:
> > Hello Brijesh,
> >
> > On Tue, Apr 07, 2020 at 09:34:15PM -0500, Brijesh Singh wrote:
> > >
> > > On 4/7/20 8:38 PM, Steve Rutherford wrote:
> > > > On Tue, Apr 7, 2020 at 6:17 PM Ashish Kalra <ashish.kalra@amd.com> wrote:
> > > >> Hello Steve, Brijesh,
> > > >>
> > > >> On Tue, Apr 07, 2020 at 05:35:57PM -0700, Steve Rutherford wrote:
> > > >>> On Tue, Apr 7, 2020 at 5:29 PM Brijesh Singh <brijesh.singh@amd.com> wrote:
> > > >>>>
> > > >>>> On 4/7/20 7:01 PM, Steve Rutherford wrote:
> > > >>>>> On Mon, Apr 6, 2020 at 10:27 PM Ashish Kalra <ashish.kalra@amd.com> wrote:
> > > >>>>>> Hello Steve,
> > > >>>>>>
> > > >>>>>> On Mon, Apr 06, 2020 at 07:17:37PM -0700, Steve Rutherford wrote:
> > > >>>>>>> On Sun, Mar 29, 2020 at 11:22 PM Ashish Kalra <Ashish.Kalra@amd.com> wrote:
> > > >>>>>>>> From: Brijesh Singh <Brijesh.Singh@amd.com>
> > > >>>>>>>>
> > > >>>>>>>> This hypercall is used by the SEV guest to notify a change in the page
> > > >>>>>>>> encryption status to the hypervisor. The hypercall should be invoked
> > > >>>>>>>> only when the encryption attribute is changed from encrypted -> decrypted
> > > >>>>>>>> and vice versa. By default all guest pages are considered encrypted.
> > > >>>>>>>>
> > > >>>>>>>> Cc: Thomas Gleixner <tglx@linutronix.de>
> > > >>>>>>>> Cc: Ingo Molnar <mingo@redhat.com>
> > > >>>>>>>> Cc: "H. Peter Anvin" <hpa@zytor.com>
> > > >>>>>>>> Cc: Paolo Bonzini <pbonzini@redhat.com>
> > > >>>>>>>> Cc: "Radim Krčmář" <rkrcmar@redhat.com>
> > > >>>>>>>> Cc: Joerg Roedel <joro@8bytes.org>
> > > >>>>>>>> Cc: Borislav Petkov <bp@suse.de>
> > > >>>>>>>> Cc: Tom Lendacky <thomas.lendacky@amd.com>
> > > >>>>>>>> Cc: x86@kernel.org
> > > >>>>>>>> Cc: kvm@vger.kernel.org
> > > >>>>>>>> Cc: linux-kernel@vger.kernel.org
> > > >>>>>>>> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> > > >>>>>>>> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> > > >>>>>>>> ---
> > > >>>>>>>>  Documentation/virt/kvm/hypercalls.rst | 15 +++++
> > > >>>>>>>>  arch/x86/include/asm/kvm_host.h       |  2 +
> > > >>>>>>>>  arch/x86/kvm/svm.c                    | 95 +++++++++++++++++++++++++++
> > > >>>>>>>>  arch/x86/kvm/vmx/vmx.c                |  1 +
> > > >>>>>>>>  arch/x86/kvm/x86.c                    |  6 ++
> > > >>>>>>>>  include/uapi/linux/kvm_para.h         |  1 +
> > > >>>>>>>>  6 files changed, 120 insertions(+)
> > > >>>>>>>>
> > > >>>>>>>> diff --git a/Documentation/virt/kvm/hypercalls.rst b/Documentation/virt/kvm/hypercalls.rst
> > > >>>>>>>> index dbaf207e560d..ff5287e68e81 100644
> > > >>>>>>>> --- a/Documentation/virt/kvm/hypercalls.rst
> > > >>>>>>>> +++ b/Documentation/virt/kvm/hypercalls.rst
> > > >>>>>>>> @@ -169,3 +169,18 @@ a0: destination APIC ID
> > > >>>>>>>>
> > > >>>>>>>>  :Usage example: When sending a call-function IPI-many to vCPUs, yield if
> > > >>>>>>>>                 any of the IPI target vCPUs was preempted.
> > > >>>>>>>> +
> > > >>>>>>>> +
> > > >>>>>>>> +8. KVM_HC_PAGE_ENC_STATUS
> > > >>>>>>>> +-------------------------
> > > >>>>>>>> +:Architecture: x86
> > > >>>>>>>> +:Status: active
> > > >>>>>>>> +:Purpose: Notify the encryption status changes in guest page table (SEV guest)
> > > >>>>>>>> +
> > > >>>>>>>> +a0: the guest physical address of the start page
> > > >>>>>>>> +a1: the number of pages
> > > >>>>>>>> +a2: encryption attribute
> > > >>>>>>>> +
> > > >>>>>>>> +   Where:
> > > >>>>>>>> +       * 1: Encryption attribute is set
> > > >>>>>>>> +       * 0: Encryption attribute is cleared
> > > >>>>>>>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > > >>>>>>>> index 98959e8cd448..90718fa3db47 100644
> > > >>>>>>>> --- a/arch/x86/include/asm/kvm_host.h
> > > >>>>>>>> +++ b/arch/x86/include/asm/kvm_host.h
> > > >>>>>>>> @@ -1267,6 +1267,8 @@ struct kvm_x86_ops {
> > > >>>>>>>>
> > > >>>>>>>>         bool (*apic_init_signal_blocked)(struct kvm_vcpu *vcpu);
> > > >>>>>>>>         int (*enable_direct_tlbflush)(struct kvm_vcpu *vcpu);
> > > >>>>>>>> +       int (*page_enc_status_hc)(struct kvm *kvm, unsigned long gpa,
> > > >>>>>>>> +                                 unsigned long sz, unsigned long mode);
> > > >>>>>>> Nit: spell out size instead of sz.
> > > >>>>>>>>  };
> > > >>>>>>>>
> > > >>>>>>>>  struct kvm_arch_async_pf {
> > > >>>>>>>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> > > >>>>>>>> index 7c2721e18b06..1d8beaf1bceb 100644
> > > >>>>>>>> --- a/arch/x86/kvm/svm.c
> > > >>>>>>>> +++ b/arch/x86/kvm/svm.c
> > > >>>>>>>> @@ -136,6 +136,8 @@ struct kvm_sev_info {
> > > >>>>>>>>         int fd;                 /* SEV device fd */
> > > >>>>>>>>         unsigned long pages_locked; /* Number of pages locked */
> > > >>>>>>>>         struct list_head regions_list;  /* List of registered regions */
> > > >>>>>>>> +       unsigned long *page_enc_bmap;
> > > >>>>>>>> +       unsigned long page_enc_bmap_size;
> > > >>>>>>>>  };
> > > >>>>>>>>
> > > >>>>>>>>  struct kvm_svm {
> > > >>>>>>>> @@ -1991,6 +1993,9 @@ static void sev_vm_destroy(struct kvm *kvm)
> > > >>>>>>>>
> > > >>>>>>>>         sev_unbind_asid(kvm, sev->handle);
> > > >>>>>>>>         sev_asid_free(sev->asid);
> > > >>>>>>>> +
> > > >>>>>>>> +       kvfree(sev->page_enc_bmap);
> > > >>>>>>>> +       sev->page_enc_bmap = NULL;
> > > >>>>>>>>  }
> > > >>>>>>>>
> > > >>>>>>>>  static void avic_vm_destroy(struct kvm *kvm)
> > > >>>>>>>> @@ -7593,6 +7598,94 @@ static int sev_receive_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
> > > >>>>>>>>         return ret;
> > > >>>>>>>>  }
> > > >>>>>>>>
> > > >>>>>>>> +static int sev_resize_page_enc_bitmap(struct kvm *kvm, unsigned long new_size)
> > > >>>>>>>> +{
> > > >>>>>>>> +       struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> > > >>>>>>>> +       unsigned long *map;
> > > >>>>>>>> +       unsigned long sz;
> > > >>>>>>>> +
> > > >>>>>>>> +       if (sev->page_enc_bmap_size >= new_size)
> > > >>>>>>>> +               return 0;
> > > >>>>>>>> +
> > > >>>>>>>> +       sz = ALIGN(new_size, BITS_PER_LONG) / 8;
> > > >>>>>>>> +
> > > >>>>>>>> +       map = vmalloc(sz);
> > > >>>>>>>> +       if (!map) {
> > > >>>>>>>> +               pr_err_once("Failed to allocate encrypted bitmap size %lx\n",
> > > >>>>>>>> +                               sz);
> > > >>>>>>>> +               return -ENOMEM;
> > > >>>>>>>> +       }
> > > >>>>>>>> +
> > > >>>>>>>> +       /* mark the page encrypted (by default) */
> > > >>>>>>>> +       memset(map, 0xff, sz);
> > > >>>>>>>> +
> > > >>>>>>>> +       bitmap_copy(map, sev->page_enc_bmap, sev->page_enc_bmap_size);
> > > >>>>>>>> +       kvfree(sev->page_enc_bmap);
> > > >>>>>>>> +
> > > >>>>>>>> +       sev->page_enc_bmap = map;
> > > >>>>>>>> +       sev->page_enc_bmap_size = new_size;
> > > >>>>>>>> +
> > > >>>>>>>> +       return 0;
> > > >>>>>>>> +}
> > > >>>>>>>> +
> > > >>>>>>>> +static int svm_page_enc_status_hc(struct kvm *kvm, unsigned long gpa,
> > > >>>>>>>> +                                 unsigned long npages, unsigned long enc)
> > > >>>>>>>> +{
> > > >>>>>>>> +       struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> > > >>>>>>>> +       kvm_pfn_t pfn_start, pfn_end;
> > > >>>>>>>> +       gfn_t gfn_start, gfn_end;
> > > >>>>>>>> +       int ret;
> > > >>>>>>>> +
> > > >>>>>>>> +       if (!sev_guest(kvm))
> > > >>>>>>>> +               return -EINVAL;
> > > >>>>>>>> +
> > > >>>>>>>> +       if (!npages)
> > > >>>>>>>> +               return 0;
> > > >>>>>>>> +
> > > >>>>>>>> +       gfn_start = gpa_to_gfn(gpa);
> > > >>>>>>>> +       gfn_end = gfn_start + npages;
> > > >>>>>>>> +
> > > >>>>>>>> +       /* out of bound access error check */
> > > >>>>>>>> +       if (gfn_end <= gfn_start)
> > > >>>>>>>> +               return -EINVAL;
> > > >>>>>>>> +
> > > >>>>>>>> +       /* lets make sure that gpa exist in our memslot */
> > > >>>>>>>> +       pfn_start = gfn_to_pfn(kvm, gfn_start);
> > > >>>>>>>> +       pfn_end = gfn_to_pfn(kvm, gfn_end);
> > > >>>>>>>> +
> > > >>>>>>>> +       if (is_error_noslot_pfn(pfn_start) && !is_noslot_pfn(pfn_start)) {
> > > >>>>>>>> +               /*
> > > >>>>>>>> +                * Allow guest MMIO range(s) to be added
> > > >>>>>>>> +                * to the page encryption bitmap.
> > > >>>>>>>> +                */
> > > >>>>>>>> +               return -EINVAL;
> > > >>>>>>>> +       }
> > > >>>>>>>> +
> > > >>>>>>>> +       if (is_error_noslot_pfn(pfn_end) && !is_noslot_pfn(pfn_end)) {
> > > >>>>>>>> +               /*
> > > >>>>>>>> +                * Allow guest MMIO range(s) to be added
> > > >>>>>>>> +                * to the page encryption bitmap.
> > > >>>>>>>> +                */
> > > >>>>>>>> +               return -EINVAL;
> > > >>>>>>>> +       }
> > > >>>>>>>> +
> > > >>>>>>>> +       mutex_lock(&kvm->lock);
> > > >>>>>>>> +       ret = sev_resize_page_enc_bitmap(kvm, gfn_end);
> > > >>>>>>>> +       if (ret)
> > > >>>>>>>> +               goto unlock;
> > > >>>>>>>> +
> > > >>>>>>>> +       if (enc)
> > > >>>>>>>> +               __bitmap_set(sev->page_enc_bmap, gfn_start,
> > > >>>>>>>> +                               gfn_end - gfn_start);
> > > >>>>>>>> +       else
> > > >>>>>>>> +               __bitmap_clear(sev->page_enc_bmap, gfn_start,
> > > >>>>>>>> +                               gfn_end - gfn_start);
> > > >>>>>>>> +
> > > >>>>>>>> +unlock:
> > > >>>>>>>> +       mutex_unlock(&kvm->lock);
> > > >>>>>>>> +       return ret;
> > > >>>>>>>> +}
> > > >>>>>>>> +
> > > >>>>>>>>  static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
> > > >>>>>>>>  {
> > > >>>>>>>>         struct kvm_sev_cmd sev_cmd;
> > > >>>>>>>> @@ -7995,6 +8088,8 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
> > > >>>>>>>>         .need_emulation_on_page_fault = svm_need_emulation_on_page_fault,
> > > >>>>>>>>
> > > >>>>>>>>         .apic_init_signal_blocked = svm_apic_init_signal_blocked,
> > > >>>>>>>> +
> > > >>>>>>>> +       .page_enc_status_hc = svm_page_enc_status_hc,
> > > >>>>>>>>  };
> > > >>>>>>>>
> > > >>>>>>>>  static int __init svm_init(void)
> > > >>>>>>>> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > > >>>>>>>> index 079d9fbf278e..f68e76ee7f9c 100644
> > > >>>>>>>> --- a/arch/x86/kvm/vmx/vmx.c
> > > >>>>>>>> +++ b/arch/x86/kvm/vmx/vmx.c
> > > >>>>>>>> @@ -8001,6 +8001,7 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
> > > >>>>>>>>         .nested_get_evmcs_version = NULL,
> > > >>>>>>>>         .need_emulation_on_page_fault = vmx_need_emulation_on_page_fault,
> > > >>>>>>>>         .apic_init_signal_blocked = vmx_apic_init_signal_blocked,
> > > >>>>>>>> +       .page_enc_status_hc = NULL,
> > > >>>>>>>>  };
> > > >>>>>>>>
> > > >>>>>>>>  static void vmx_cleanup_l1d_flush(void)
> > > >>>>>>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > >>>>>>>> index cf95c36cb4f4..68428eef2dde 100644
> > > >>>>>>>> --- a/arch/x86/kvm/x86.c
> > > >>>>>>>> +++ b/arch/x86/kvm/x86.c
> > > >>>>>>>> @@ -7564,6 +7564,12 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
> > > >>>>>>>>                 kvm_sched_yield(vcpu->kvm, a0);
> > > >>>>>>>>                 ret = 0;
> > > >>>>>>>>                 break;
> > > >>>>>>>> +       case KVM_HC_PAGE_ENC_STATUS:
> > > >>>>>>>> +               ret = -KVM_ENOSYS;
> > > >>>>>>>> +               if (kvm_x86_ops->page_enc_status_hc)
> > > >>>>>>>> +                       ret = kvm_x86_ops->page_enc_status_hc(vcpu->kvm,
> > > >>>>>>>> +                                       a0, a1, a2);
> > > >>>>>>>> +               break;
> > > >>>>>>>>         default:
> > > >>>>>>>>                 ret = -KVM_ENOSYS;
> > > >>>>>>>>                 break;
> > > >>>>>>>> diff --git a/include/uapi/linux/kvm_para.h b/include/uapi/linux/kvm_para.h
> > > >>>>>>>> index 8b86609849b9..847b83b75dc8 100644
> > > >>>>>>>> --- a/include/uapi/linux/kvm_para.h
> > > >>>>>>>> +++ b/include/uapi/linux/kvm_para.h
> > > >>>>>>>> @@ -29,6 +29,7 @@
> > > >>>>>>>>  #define KVM_HC_CLOCK_PAIRING           9
> > > >>>>>>>>  #define KVM_HC_SEND_IPI                10
> > > >>>>>>>>  #define KVM_HC_SCHED_YIELD             11
> > > >>>>>>>> +#define KVM_HC_PAGE_ENC_STATUS         12
> > > >>>>>>>>
> > > >>>>>>>>  /*
> > > >>>>>>>>   * hypercalls use architecture specific
> > > >>>>>>>> --
> > > >>>>>>>> 2.17.1
> > > >>>>>>>>
> > > >>>>>>> I'm still not excited by the dynamic resizing. I believe the guest
> > > >>>>>>> hypercall can be called in atomic contexts, which makes me
> > > >>>>>>> particularly unexcited to see a potentially large vmalloc on the host
> > > >>>>>>> followed by filling the buffer. Particularly when the buffer might be
> > > >>>>>>> non-trivial in size (~1MB per 32GB, per some back of the envelope
> > > >>>>>>> math).
> > > >>>>>>>
> > > >>>>>> I think looking at more practical situations, most hypercalls will
> > > >>>>>> happen during the boot stage, when device specific initializations are
> > > >>>>>> happening, so typically the maximum page encryption bitmap size would
> > > >>>>>> be allocated early enough.
> > > >>>>>>
> > > >>>>>> In fact, initial hypercalls made by OVMF will probably allocate the
> > > >>>>>> maximum page bitmap size even before the kernel comes up, especially
> > > >>>>>> as they will be setting up page enc/dec status for MMIO, ROM, ACPI
> > > >>>>>> regions, PCI device memory, etc., and most importantly for
> > > >>>>>> "non-existent" high memory range (which will probably be the
> > > >>>>>> maximum size page encryption bitmap allocated/resized).
> > > >>>>>>
> > > >>>>>> Let me know if you have different thoughts on this ?
> > > >>>>> Hi Ashish,
> > > >>>>>
> > > >>>>> If this is not an issue in practice, we can just move past this. If we
> > > >>>>> are basically guaranteed that OVMF will trigger hypercalls that expand
> > > >>>>> the bitmap beyond the top of memory, then, yes, that should work. That
> > > >>>>> leaves me slightly nervous that OVMF might regress since it's not
> > > >>>>> obvious that calling a hypercall beyond the top of memory would be
> > > >>>>> "required" for avoiding a somewhat indirectly related issue in guest
> > > >>>>> kernels.
> > > >>>>
> > > >>>> If possible then we should try to avoid growing/shrinking the bitmap .
> > > >>>> Today OVMF may not be accessing beyond memory but a malicious guest
> > > >>>> could send a hypercall down which can trigger a huge memory allocation
> > > >>>> on the host side and may eventually cause denial of service for other.
> > > >>> Nice catch! Was just writing up an email about this.
> > > >>>> I am in favor if we can find some solution to handle this case. How
> > > >>>> about Steve's suggestion about VMM making a call down to the kernel to
> > > >>>> tell how big the bitmap should be? Initially it should be equal to the
> > > >>>> guest RAM and if VMM ever did the memory expansion then it can send down
> > > >>>> another notification to increase the bitmap ?
> > > >>>>
> > > >>>> Optionally, instead of adding a new ioctl, I was wondering if we can
> > > >>>> extend the kvm_arch_prepare_memory_region() to make svm specific x86_ops
> > > >>>> which can take read the userspace provided memory region and calculate
> > > >>>> the amount of guest RAM managed by the KVM and grow/shrink the bitmap
> > > >>>> based on that information. I have not looked deep enough to see if its
> > > >>>> doable but if it can work then we can avoid adding yet another ioctl.
> > > >>> We also have the set bitmap ioctl in a later patch in this series. We
> > > >>> could also use the set ioctl for initialization (it's a little
> > > >>> excessive for initialization since there will be an additional
> > > >>> ephemeral allocation and a few additional buffer copies, but that's
> > > >>> probably fine). An enable_cap has the added benefit of probably being
> > > >>> necessary anyway so usermode can disable the migration feature flag.
> > > >>>
> > > >>> In general, userspace is going to have to be in direct control of the
> > > >>> buffer and its size.
> > > >> My only practical concern about setting a static bitmap size based on guest
> > > >> memory is about the hypercalls being made initially by OVMF to set page
> > > >> enc/dec status for ROM, ACPI regions and especially the non-existent
> > > >> high memory range. The new ioctl will statically setup bitmap size to
> > > >> whatever guest RAM is specified, say 4G, 8G, etc., but the OVMF
> > > >> hypercall for non-existent memory will try to do a hypercall for guest
> > > >> physical memory range like ~6G->64G (for 4G guest RAM setup), this
> > > >> hypercall will basically have to just return doing nothing, because
> > > >> the allocated bitmap won't have this guest physical range available ?
> > >
> > >
> > > IMO, Ovmf issuing a hypercall beyond the guest RAM is simple wrong, it
> > > should *not* do that.  There was a feature request I submitted sometime
> > > back to Tianocore https://bugzilla.tianocore.org/show_bug.cgi?id=623 as
> > > I saw this coming in future. I tried highlighting the problem in the
> > > MdeModulePkg that it does not provide a notifier to tell OVMF when core
> > > creates the MMIO holes etc. It was not a big problem with the SEV
> > > initially because we were never getting down to hypervisor to do
> > > something about those non-existent regions. But with the migration its
> > > now important that we should restart the discussion with UEFI folks and
> > > see what can be done. In the kernel patches we should do what is right
> > > for the kernel and not workaround the Ovmf limitation.
> >
> > Ok, this makes sense. I will start exploring
> > kvm_arch_prepare_memory_region() to see if it can assist in computing
> > the guest RAM or otherwise i will look at adding a new ioctl interface
> > for the same.
> >
>
> I looked at kvm_arch_prepare_memory_region() and
> kvm_arch_commit_memory_region() and kvm_arch_commit_memory_region()
> looks to be ideal to use for this.
>
> I walked the kvm_memslots in this function and i can compute the
> approximate guest RAM mapped by KVM, though, i get the guest RAM size as
> "twice" the configured size because of the two address spaces on x86 KVM,
> i believe there is one additional address space for SMM/SMRAM use.
>
> I don't think we have a use case of migrating a SEV guest with SMM
> support ?
>
> Considering that, i believe that i can just compute the guest RAM size
> using memslots for address space #0 and use that to grow/shrink the bitmap.
>
> As you mentioned i will need to add a new SVM specific x86_ops to
> callback as part of kvm_arch_commit_memory_region() which will in-turn
> call sev_resize_page_enc_bitmap().
>
> Thanks,
> Ashish
>
> > >
> > >
> > > >> Also, hypercalls for ROM, ACPI, device regions and any memory holes within
> > > >> the static bitmap setup as per guest RAM config will work, but what
> > > >> about hypercalls for any device regions beyond the guest RAM config ?
> > > >>
> > > >> Thanks,
> > > >> Ashish
Supporting migration of an SEV VM with SMM seems unnecessary, but I
wouldn't go out of my way to make it harder to fix. The bitmap is
already as_id-unaware (as it likely should be, since the PA space is
shared), so I would personally skip summing up the sizes and
instead look for the highest PA that is mapped by a memslot. If I'm
not mistaken, the address spaces should (more or less) entirely
overlap. This will be suboptimal if the PA space is sparse, but that
would be weird anyway.

You could also go back to a suggestion I had much earlier and have the
memslots own the bitmaps. I believe there is space to add flags for
enabling and disabling features like this on memslots. Basically, you
could treat this in the same way as dirty bitmaps, which seem pretty
similar. This would work well even if PA space were sparse.

The current patch set is nowhere near sufficient for supporting SMM
migration securely, which is fine: we would need to separate out
access to the hypercall for particular pages, so that non-SMM could
not corrupt SMM. And then the code in SMM would also need to support
the hypercall if it ever used shared pages. Until someone asks for
this I don't believe it is worth doing. If we put the bitmaps on the
memslots, we could probably limit access to a memslot's bitmap to
vcpus that can access that memslot. This seems unnecessary for now.

--Steve


* Re: [PATCH v6 11/14] KVM: x86: Introduce KVM_SET_PAGE_ENC_BITMAP ioctl
  2020-04-08  1:48     ` Ashish Kalra
@ 2020-04-10  0:06       ` Steve Rutherford
  2020-04-10  1:23         ` Ashish Kalra
  0 siblings, 1 reply; 107+ messages in thread
From: Steve Rutherford @ 2020-04-10  0:06 UTC (permalink / raw)
  To: Ashish Kalra
  Cc: Paolo Bonzini, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Joerg Roedel, Borislav Petkov, Tom Lendacky, X86 ML, KVM list,
	LKML, David Rientjes, Andy Lutomirski, Brijesh Singh

On Tue, Apr 7, 2020 at 6:49 PM Ashish Kalra <ashish.kalra@amd.com> wrote:
>
> Hello Steve,
>
> On Tue, Apr 07, 2020 at 05:26:33PM -0700, Steve Rutherford wrote:
> > On Sun, Mar 29, 2020 at 11:23 PM Ashish Kalra <Ashish.Kalra@amd.com> wrote:
> > >
> > > From: Brijesh Singh <Brijesh.Singh@amd.com>
> > >
> > > The ioctl can be used to set page encryption bitmap for an
> > > incoming guest.
> > >
> > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > Cc: Ingo Molnar <mingo@redhat.com>
> > > Cc: "H. Peter Anvin" <hpa@zytor.com>
> > > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > > Cc: "Radim Krčmář" <rkrcmar@redhat.com>
> > > Cc: Joerg Roedel <joro@8bytes.org>
> > > Cc: Borislav Petkov <bp@suse.de>
> > > Cc: Tom Lendacky <thomas.lendacky@amd.com>
> > > Cc: x86@kernel.org
> > > Cc: kvm@vger.kernel.org
> > > Cc: linux-kernel@vger.kernel.org
> > > Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> > > Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> > > ---
> > >  Documentation/virt/kvm/api.rst  | 22 +++++++++++++++++
> > >  arch/x86/include/asm/kvm_host.h |  2 ++
> > >  arch/x86/kvm/svm.c              | 42 +++++++++++++++++++++++++++++++++
> > >  arch/x86/kvm/x86.c              | 12 ++++++++++
> > >  include/uapi/linux/kvm.h        |  1 +
> > >  5 files changed, 79 insertions(+)
> > >
> > > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > > index 8ad800ebb54f..4d1004a154f6 100644
> > > --- a/Documentation/virt/kvm/api.rst
> > > +++ b/Documentation/virt/kvm/api.rst
> > > @@ -4675,6 +4675,28 @@ or shared. The bitmap can be used during the guest migration, if the page
> > >  is private then userspace need to use SEV migration commands to transmit
> > >  the page.
> > >
> > > +4.126 KVM_SET_PAGE_ENC_BITMAP (vm ioctl)
> > > +---------------------------------------
> > > +
> > > +:Capability: basic
> > > +:Architectures: x86
> > > +:Type: vm ioctl
> > > +:Parameters: struct kvm_page_enc_bitmap (in/out)
> > > +:Returns: 0 on success, -1 on error
> > > +
> > > +/* for KVM_SET_PAGE_ENC_BITMAP */
> > > +struct kvm_page_enc_bitmap {
> > > +       __u64 start_gfn;
> > > +       __u64 num_pages;
> > > +       union {
> > > +               void __user *enc_bitmap; /* one bit per page */
> > > +               __u64 padding2;
> > > +       };
> > > +};
> > > +
> > > +During the guest live migration the outgoing guest exports its page encryption
> > > +bitmap, the KVM_SET_PAGE_ENC_BITMAP can be used to build the page encryption
> > > +bitmap for an incoming guest.
> > >
> > >  5. The kvm_run structure
> > >  ========================
> > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > > index 27e43e3ec9d8..d30f770aaaea 100644
> > > --- a/arch/x86/include/asm/kvm_host.h
> > > +++ b/arch/x86/include/asm/kvm_host.h
> > > @@ -1271,6 +1271,8 @@ struct kvm_x86_ops {
> > >                                   unsigned long sz, unsigned long mode);
> > >         int (*get_page_enc_bitmap)(struct kvm *kvm,
> > >                                 struct kvm_page_enc_bitmap *bmap);
> > > +       int (*set_page_enc_bitmap)(struct kvm *kvm,
> > > +                               struct kvm_page_enc_bitmap *bmap);
> > >  };
> > >
> > >  struct kvm_arch_async_pf {
> > > diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> > > index bae783cd396a..313343a43045 100644
> > > --- a/arch/x86/kvm/svm.c
> > > +++ b/arch/x86/kvm/svm.c
> > > @@ -7756,6 +7756,47 @@ static int svm_get_page_enc_bitmap(struct kvm *kvm,
> > >         return ret;
> > >  }
> > >
> > > +static int svm_set_page_enc_bitmap(struct kvm *kvm,
> > > +                                  struct kvm_page_enc_bitmap *bmap)
> > > +{
> > > +       struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> > > +       unsigned long gfn_start, gfn_end;
> > > +       unsigned long *bitmap;
> > > +       unsigned long sz, i;
> > > +       int ret;
> > > +
> > > +       if (!sev_guest(kvm))
> > > +               return -ENOTTY;
> > > +
> > > +       gfn_start = bmap->start_gfn;
> > > +       gfn_end = gfn_start + bmap->num_pages;
> > > +
> > > +       sz = ALIGN(bmap->num_pages, BITS_PER_LONG) / 8;
> > > +       bitmap = kmalloc(sz, GFP_KERNEL);
> > > +       if (!bitmap)
> > > +               return -ENOMEM;
> > > +
> > > +       ret = -EFAULT;
> > > +       if (copy_from_user(bitmap, bmap->enc_bitmap, sz))
> > > +               goto out;
> > > +
> > > +       mutex_lock(&kvm->lock);
> > > +       ret = sev_resize_page_enc_bitmap(kvm, gfn_end);
> > I realize now that usermode could use this for initializing the
> > minimum size of the enc bitmap, which probably solves my issue from
> > the other thread.
> > > +       if (ret)
> > > +               goto unlock;
> > > +
> > > +       i = gfn_start;
> > > +       for_each_clear_bit_from(i, bitmap, (gfn_end - gfn_start))
> > > +               clear_bit(i + gfn_start, sev->page_enc_bmap);
> > This API seems a bit strange, since it can only clear bits. I would
> > expect "set" to force the values to match the values passed down,
> > instead of only ensuring that cleared bits in the input are also
> > cleared in the kernel.
> >
>
> The sev_resize_page_enc_bitmap() will allocate a new bitmap and
> set it to all 0xFF's, therefore, the code here simply clears the bits
> in the bitmap as per the cleared bits in the input.

If I'm not mistaken, resize only reinitializes the newly extended part
of the buffer, and copies the old values for the rest.
With the API you proposed you could probably reimplement a normal set
call by calling get, then reset, and then set, but this feels
cumbersome.

--Steve


* Re: [PATCH v6 12/14] KVM: x86: Introduce KVM_PAGE_ENC_BITMAP_RESET ioctl
  2020-04-08  1:52           ` Ashish Kalra
@ 2020-04-10  0:59             ` Steve Rutherford
  2020-04-10  1:34               ` Ashish Kalra
  0 siblings, 1 reply; 107+ messages in thread
From: Steve Rutherford @ 2020-04-10  0:59 UTC (permalink / raw)
  To: Ashish Kalra
  Cc: Krish Sadhukhan, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Joerg Roedel, Borislav Petkov, Tom Lendacky,
	X86 ML, KVM list, LKML, David Rientjes, Andy Lutomirski,
	Brijesh Singh

On Tue, Apr 7, 2020 at 6:52 PM Ashish Kalra <ashish.kalra@amd.com> wrote:
>
> Hello Steve,
>
> On Tue, Apr 07, 2020 at 06:25:51PM -0700, Steve Rutherford wrote:
> > On Mon, Apr 6, 2020 at 11:53 AM Krish Sadhukhan
> > <krish.sadhukhan@oracle.com> wrote:
> > >
> > >
> > > On 4/3/20 2:45 PM, Ashish Kalra wrote:
> > > > On Fri, Apr 03, 2020 at 02:14:23PM -0700, Krish Sadhukhan wrote:
> > > >> On 3/29/20 11:23 PM, Ashish Kalra wrote:
> > > >>> From: Ashish Kalra <ashish.kalra@amd.com>
> > > >>>
> > > >>> This ioctl can be used by the application to reset the page
> > > >>> encryption bitmap managed by the KVM driver. A typical usage
> > > >>> for this ioctl is on VM reboot, on reboot, we must reinitialize
> > > >>> the bitmap.
> > > >>>
> > > >>> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> > > >>> ---
> > > >>>    Documentation/virt/kvm/api.rst  | 13 +++++++++++++
> > > >>>    arch/x86/include/asm/kvm_host.h |  1 +
> > > >>>    arch/x86/kvm/svm.c              | 16 ++++++++++++++++
> > > >>>    arch/x86/kvm/x86.c              |  6 ++++++
> > > >>>    include/uapi/linux/kvm.h        |  1 +
> > > >>>    5 files changed, 37 insertions(+)
> > > >>>
> > > >>> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > > >>> index 4d1004a154f6..a11326ccc51d 100644
> > > >>> --- a/Documentation/virt/kvm/api.rst
> > > >>> +++ b/Documentation/virt/kvm/api.rst
> > > >>> @@ -4698,6 +4698,19 @@ During the guest live migration the outgoing guest exports its page encryption
> > > >>>    bitmap, the KVM_SET_PAGE_ENC_BITMAP can be used to build the page encryption
> > > >>>    bitmap for an incoming guest.
> > > >>> +4.127 KVM_PAGE_ENC_BITMAP_RESET (vm ioctl)
> > > >>> +-----------------------------------------
> > > >>> +
> > > >>> +:Capability: basic
> > > >>> +:Architectures: x86
> > > >>> +:Type: vm ioctl
> > > >>> +:Parameters: none
> > > >>> +:Returns: 0 on success, -1 on error
> > > >>> +
> > > >>> +The KVM_PAGE_ENC_BITMAP_RESET is used to reset the guest's page encryption
> > > >>> +bitmap during guest reboot and this is only done on the guest's boot vCPU.
> > > >>> +
> > > >>> +
> > > >>>    5. The kvm_run structure
> > > >>>    ========================
> > > >>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > > >>> index d30f770aaaea..a96ef6338cd2 100644
> > > >>> --- a/arch/x86/include/asm/kvm_host.h
> > > >>> +++ b/arch/x86/include/asm/kvm_host.h
> > > >>> @@ -1273,6 +1273,7 @@ struct kvm_x86_ops {
> > > >>>                             struct kvm_page_enc_bitmap *bmap);
> > > >>>     int (*set_page_enc_bitmap)(struct kvm *kvm,
> > > >>>                             struct kvm_page_enc_bitmap *bmap);
> > > >>> +   int (*reset_page_enc_bitmap)(struct kvm *kvm);
> > > >>>    };
> > > >>>    struct kvm_arch_async_pf {
> > > >>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> > > >>> index 313343a43045..c99b0207a443 100644
> > > >>> --- a/arch/x86/kvm/svm.c
> > > >>> +++ b/arch/x86/kvm/svm.c
> > > >>> @@ -7797,6 +7797,21 @@ static int svm_set_page_enc_bitmap(struct kvm *kvm,
> > > >>>     return ret;
> > > >>>    }
> > > >>> +static int svm_reset_page_enc_bitmap(struct kvm *kvm)
> > > >>> +{
> > > >>> +   struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> > > >>> +
> > > >>> +   if (!sev_guest(kvm))
> > > >>> +           return -ENOTTY;
> > > >>> +
> > > >>> +   mutex_lock(&kvm->lock);
> > > >>> +   /* by default all pages should be marked encrypted */
> > > >>> +   if (sev->page_enc_bmap_size)
> > > >>> +           bitmap_fill(sev->page_enc_bmap, sev->page_enc_bmap_size);
> > > >>> +   mutex_unlock(&kvm->lock);
> > > >>> +   return 0;
> > > >>> +}
> > > >>> +
> > > >>>    static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
> > > >>>    {
> > > >>>     struct kvm_sev_cmd sev_cmd;
> > > >>> @@ -8203,6 +8218,7 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
> > > >>>     .page_enc_status_hc = svm_page_enc_status_hc,
> > > >>>     .get_page_enc_bitmap = svm_get_page_enc_bitmap,
> > > >>>     .set_page_enc_bitmap = svm_set_page_enc_bitmap,
> > > >>> +   .reset_page_enc_bitmap = svm_reset_page_enc_bitmap,
> > > >>
> > > >> We don't need to initialize the intel ops to NULL ? It's not initialized in
> > > >> the previous patch either.
> > > >>
> > > >>>    };
> > > > This struct is declared as "static storage", so won't the non-initialized
> > > > members be 0 ?
> > >
> > >
> > > Correct. Although, I see that 'nested_enable_evmcs' is explicitly
> > > initialized. We should maintain the convention, perhaps.
> > >
> > > >
> > > >>>    static int __init svm_init(void)
> > > >>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > >>> index 05e953b2ec61..2127ed937f53 100644
> > > >>> --- a/arch/x86/kvm/x86.c
> > > >>> +++ b/arch/x86/kvm/x86.c
> > > >>> @@ -5250,6 +5250,12 @@ long kvm_arch_vm_ioctl(struct file *filp,
> > > >>>                     r = kvm_x86_ops->set_page_enc_bitmap(kvm, &bitmap);
> > > >>>             break;
> > > >>>     }
> > > >>> +   case KVM_PAGE_ENC_BITMAP_RESET: {
> > > >>> +           r = -ENOTTY;
> > > >>> +           if (kvm_x86_ops->reset_page_enc_bitmap)
> > > >>> +                   r = kvm_x86_ops->reset_page_enc_bitmap(kvm);
> > > >>> +           break;
> > > >>> +   }
> > > >>>     default:
> > > >>>             r = -ENOTTY;
> > > >>>     }
> > > >>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> > > >>> index b4b01d47e568..0884a581fc37 100644
> > > >>> --- a/include/uapi/linux/kvm.h
> > > >>> +++ b/include/uapi/linux/kvm.h
> > > >>> @@ -1490,6 +1490,7 @@ struct kvm_enc_region {
> > > >>>    #define KVM_GET_PAGE_ENC_BITMAP  _IOW(KVMIO, 0xc5, struct kvm_page_enc_bitmap)
> > > >>>    #define KVM_SET_PAGE_ENC_BITMAP  _IOW(KVMIO, 0xc6, struct kvm_page_enc_bitmap)
> > > >>> +#define KVM_PAGE_ENC_BITMAP_RESET  _IO(KVMIO, 0xc7)
> > > >>>    /* Secure Encrypted Virtualization command */
> > > >>>    enum sev_cmd_id {
> > > >> Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
> >
> >
> > Doesn't this overlap with the set ioctl? Yes, obviously, you have to
> > copy the new value down and do a bit more work, but I don't think
> > resetting the bitmap is going to be the bottleneck on reboot. Seems
> > excessive to add another ioctl for this.
>
> The set ioctl is generally available/provided for the incoming VM to setup
> the page encryption bitmap, this reset ioctl is meant for the source VM
> as a simple interface to reset the whole page encryption bitmap.
>
> Thanks,
> Ashish


Hey Ashish,

These seem very overlapping. I think this API should be refactored a bit.

1) Use kvm_vm_ioctl_enable_cap to control whether or not this
hypercall (and related feature bit) is offered to the VM, and also the
size of the buffer.
2) Use set for manipulating values in the bitmap, including resetting
the bitmap. Set the bitmap pointer to null if you want to reset to all
0xFFs. When the bitmap pointer is set, it should set the values to
exactly what is pointed at, instead of only clearing bits, as is done
currently.
3) Use get for fetching values from the kernel. Personally, I'd
require alignment of the base GFN to a multiple of 8 (but the number
of pages could be whatever), so you can just use a memcpy. Optionally,
you may want some way to tell userspace the size of the existing
buffer, so it can ensure that it can ask for the entire buffer without
having to track the size in usermode (not strictly necessary, but nice
to have since it ensures that there is only one place that has to
manage this value).

If you want to expand or contract the bitmap, you can use enable cap
to adjust the size.
If you don't want to offer the hypercall to the guest, don't call the
enable cap.
This API avoids using up another ioctl. Ioctl space is somewhat
scarce. It also gives userspace fine grained control over the buffer,
so it can support both hot-plug and hot-unplug (or at the very least
it is not obviously incompatible with those). It also gives userspace
control over whether or not the feature is offered. The hypercall
isn't free, and being able to tell guests not to make it when the host
isn't going to migrate them anyway will be useful.
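
To make the proposed "set" semantics concrete, here is a minimal
userspace model (a hedged sketch: the helper names and layout below are
illustrative, not the kernel implementation). A NULL input pointer
resets the whole bitmap to the all-encrypted default, and a non-NULL
input forces the kernel bits to match the input exactly, setting as
well as clearing:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Model of the per-page encryption bitmap: 1 = encrypted (the default). */
static void bitmap_fill_all(unsigned long *map, size_t nbits)
{
	size_t nlongs = (nbits + BITS_PER_LONG - 1) / BITS_PER_LONG;

	memset(map, 0xff, nlongs * sizeof(unsigned long));
}

static int test_bit(size_t n, const unsigned long *map)
{
	return (map[n / BITS_PER_LONG] >> (n % BITS_PER_LONG)) & 1UL;
}

static void assign_bit(size_t n, unsigned long *map, int value)
{
	if (value)
		map[n / BITS_PER_LONG] |= 1UL << (n % BITS_PER_LONG);
	else
		map[n / BITS_PER_LONG] &= ~(1UL << (n % BITS_PER_LONG));
}

/*
 * Proposed "set" semantics: a NULL input resets everything to the
 * encrypted default; otherwise the kernel bits in
 * [gfn_start, gfn_start + num_pages) are forced to match the input
 * exactly, i.e. bits can be both set and cleared.
 */
static void set_page_enc_bitmap(unsigned long *kmap, size_t kmap_bits,
				const unsigned long *input,
				size_t gfn_start, size_t num_pages)
{
	size_t i;

	if (!input) {
		bitmap_fill_all(kmap, kmap_bits);
		return;
	}
	for (i = 0; i < num_pages; i++)
		assign_bit(gfn_start + i, kmap, test_bit(i, input));
}
```

This is the "go to any state" behavior a typical KVM set call has,
rather than the clear-only behavior of the posted patch.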

Thanks,
--Steve

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v6 11/14] KVM: x86: Introduce KVM_SET_PAGE_ENC_BITMAP ioctl
  2020-04-10  0:06       ` Steve Rutherford
@ 2020-04-10  1:23         ` Ashish Kalra
  2020-04-10 18:08           ` Steve Rutherford
  0 siblings, 1 reply; 107+ messages in thread
From: Ashish Kalra @ 2020-04-10  1:23 UTC (permalink / raw)
  To: Steve Rutherford
  Cc: Paolo Bonzini, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Joerg Roedel, Borislav Petkov, Tom Lendacky, X86 ML, KVM list,
	LKML, David Rientjes, Andy Lutomirski, Brijesh Singh

Hello Steve,

On Thu, Apr 09, 2020 at 05:06:21PM -0700, Steve Rutherford wrote:
> On Tue, Apr 7, 2020 at 6:49 PM Ashish Kalra <ashish.kalra@amd.com> wrote:
> >
> > Hello Steve,
> >
> > On Tue, Apr 07, 2020 at 05:26:33PM -0700, Steve Rutherford wrote:
> > > On Sun, Mar 29, 2020 at 11:23 PM Ashish Kalra <Ashish.Kalra@amd.com> wrote:
> > > >
> > > > From: Brijesh Singh <Brijesh.Singh@amd.com>
> > > >
> > > > The ioctl can be used to set page encryption bitmap for an
> > > > incoming guest.
> > > >
> > > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > > Cc: Ingo Molnar <mingo@redhat.com>
> > > > Cc: "H. Peter Anvin" <hpa@zytor.com>
> > > > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > > > Cc: "Radim Krčmář" <rkrcmar@redhat.com>
> > > > Cc: Joerg Roedel <joro@8bytes.org>
> > > > Cc: Borislav Petkov <bp@suse.de>
> > > > Cc: Tom Lendacky <thomas.lendacky@amd.com>
> > > > Cc: x86@kernel.org
> > > > Cc: kvm@vger.kernel.org
> > > > Cc: linux-kernel@vger.kernel.org
> > > > Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> > > > Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> > > > ---
> > > >  Documentation/virt/kvm/api.rst  | 22 +++++++++++++++++
> > > >  arch/x86/include/asm/kvm_host.h |  2 ++
> > > >  arch/x86/kvm/svm.c              | 42 +++++++++++++++++++++++++++++++++
> > > >  arch/x86/kvm/x86.c              | 12 ++++++++++
> > > >  include/uapi/linux/kvm.h        |  1 +
> > > >  5 files changed, 79 insertions(+)
> > > >
> > > > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > > > index 8ad800ebb54f..4d1004a154f6 100644
> > > > --- a/Documentation/virt/kvm/api.rst
> > > > +++ b/Documentation/virt/kvm/api.rst
> > > > @@ -4675,6 +4675,28 @@ or shared. The bitmap can be used during the guest migration, if the page
> > > >  is private then userspace need to use SEV migration commands to transmit
> > > >  the page.
> > > >
> > > > +4.126 KVM_SET_PAGE_ENC_BITMAP (vm ioctl)
> > > > +---------------------------------------
> > > > +
> > > > +:Capability: basic
> > > > +:Architectures: x86
> > > > +:Type: vm ioctl
> > > > +:Parameters: struct kvm_page_enc_bitmap (in/out)
> > > > +:Returns: 0 on success, -1 on error
> > > > +
> > > > +/* for KVM_SET_PAGE_ENC_BITMAP */
> > > > +struct kvm_page_enc_bitmap {
> > > > +       __u64 start_gfn;
> > > > +       __u64 num_pages;
> > > > +       union {
> > > > +               void __user *enc_bitmap; /* one bit per page */
> > > > +               __u64 padding2;
> > > > +       };
> > > > +};
> > > > +
> > > > +During the guest live migration the outgoing guest exports its page encryption
> > > > +bitmap, the KVM_SET_PAGE_ENC_BITMAP can be used to build the page encryption
> > > > +bitmap for an incoming guest.
> > > >
> > > >  5. The kvm_run structure
> > > >  ========================
> > > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > > > index 27e43e3ec9d8..d30f770aaaea 100644
> > > > --- a/arch/x86/include/asm/kvm_host.h
> > > > +++ b/arch/x86/include/asm/kvm_host.h
> > > > @@ -1271,6 +1271,8 @@ struct kvm_x86_ops {
> > > >                                   unsigned long sz, unsigned long mode);
> > > >         int (*get_page_enc_bitmap)(struct kvm *kvm,
> > > >                                 struct kvm_page_enc_bitmap *bmap);
> > > > +       int (*set_page_enc_bitmap)(struct kvm *kvm,
> > > > +                               struct kvm_page_enc_bitmap *bmap);
> > > >  };
> > > >
> > > >  struct kvm_arch_async_pf {
> > > > diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> > > > index bae783cd396a..313343a43045 100644
> > > > --- a/arch/x86/kvm/svm.c
> > > > +++ b/arch/x86/kvm/svm.c
> > > > @@ -7756,6 +7756,47 @@ static int svm_get_page_enc_bitmap(struct kvm *kvm,
> > > >         return ret;
> > > >  }
> > > >
> > > > +static int svm_set_page_enc_bitmap(struct kvm *kvm,
> > > > +                                  struct kvm_page_enc_bitmap *bmap)
> > > > +{
> > > > +       struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> > > > +       unsigned long gfn_start, gfn_end;
> > > > +       unsigned long *bitmap;
> > > > +       unsigned long sz, i;
> > > > +       int ret;
> > > > +
> > > > +       if (!sev_guest(kvm))
> > > > +               return -ENOTTY;
> > > > +
> > > > +       gfn_start = bmap->start_gfn;
> > > > +       gfn_end = gfn_start + bmap->num_pages;
> > > > +
> > > > +       sz = ALIGN(bmap->num_pages, BITS_PER_LONG) / 8;
> > > > +       bitmap = kmalloc(sz, GFP_KERNEL);
> > > > +       if (!bitmap)
> > > > +               return -ENOMEM;
> > > > +
> > > > +       ret = -EFAULT;
> > > > +       if (copy_from_user(bitmap, bmap->enc_bitmap, sz))
> > > > +               goto out;
> > > > +
> > > > +       mutex_lock(&kvm->lock);
> > > > +       ret = sev_resize_page_enc_bitmap(kvm, gfn_end);
> > > I realize now that usermode could use this for initializing the
> > > minimum size of the enc bitmap, which probably solves my issue from
> > > the other thread.
> > > > +       if (ret)
> > > > +               goto unlock;
> > > > +
> > > > +       i = gfn_start;
> > > > +       for_each_clear_bit_from(i, bitmap, (gfn_end - gfn_start))
> > > > +               clear_bit(i + gfn_start, sev->page_enc_bmap);
> > > This API seems a bit strange, since it can only clear bits. I would
> > > expect "set" to force the values to match the values passed down,
> > > instead of only ensuring that cleared bits in the input are also
> > > cleared in the kernel.
> > >
> >
> > The sev_resize_page_enc_bitmap() will allocate a new bitmap and
> > set it to all 0xFF's, therefore, the code here simply clears the bits
> > in the bitmap as per the cleared bits in the input.
> 
> If I'm not mistaken, resize only reinitializes the newly extended part
> of the buffer, and copies the old values for the rest.
> With the API you proposed you could probably reimplement a normal set
> call by calling get, then reset, and then set, but this feels
> cumbersome.
> 

As I mentioned earlier, the set API is primarily meant for the incoming
VM. The resize will initialize the incoming VM's bitmap to all 0xFFs,
and since no bitmap is allocated on the incoming VM initially, the
bitmap copy is a no-op; the subsequent clear_bit will then clear the
incoming VM's bits as per the input.

Thanks,
Ashish


* Re: [PATCH v6 12/14] KVM: x86: Introduce KVM_PAGE_ENC_BITMAP_RESET ioctl
  2020-04-10  0:59             ` Steve Rutherford
@ 2020-04-10  1:34               ` Ashish Kalra
  2020-04-10 18:14                 ` Steve Rutherford
  0 siblings, 1 reply; 107+ messages in thread
From: Ashish Kalra @ 2020-04-10  1:34 UTC (permalink / raw)
  To: Steve Rutherford
  Cc: Krish Sadhukhan, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Joerg Roedel, Borislav Petkov, Tom Lendacky,
	X86 ML, KVM list, LKML, David Rientjes, Andy Lutomirski,
	Brijesh Singh

Hello Steve,

On Thu, Apr 09, 2020 at 05:59:56PM -0700, Steve Rutherford wrote:
> On Tue, Apr 7, 2020 at 6:52 PM Ashish Kalra <ashish.kalra@amd.com> wrote:
> >
> > Hello Steve,
> >
> > On Tue, Apr 07, 2020 at 06:25:51PM -0700, Steve Rutherford wrote:
> > > On Mon, Apr 6, 2020 at 11:53 AM Krish Sadhukhan
> > > <krish.sadhukhan@oracle.com> wrote:
> > > >
> > > >
> > > > On 4/3/20 2:45 PM, Ashish Kalra wrote:
> > > > > On Fri, Apr 03, 2020 at 02:14:23PM -0700, Krish Sadhukhan wrote:
> > > > >> On 3/29/20 11:23 PM, Ashish Kalra wrote:
> > > > >>> From: Ashish Kalra <ashish.kalra@amd.com>
> > > > >>>
> > > > >>> This ioctl can be used by the application to reset the page
> > > > >>> encryption bitmap managed by the KVM driver. A typical usage
> > > > >>> for this ioctl is on VM reboot, on reboot, we must reinitialize
> > > > >>> the bitmap.
> > > > >>>
> > > > >>> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> > > > >>> ---
> > > > >>>    Documentation/virt/kvm/api.rst  | 13 +++++++++++++
> > > > >>>    arch/x86/include/asm/kvm_host.h |  1 +
> > > > >>>    arch/x86/kvm/svm.c              | 16 ++++++++++++++++
> > > > >>>    arch/x86/kvm/x86.c              |  6 ++++++
> > > > >>>    include/uapi/linux/kvm.h        |  1 +
> > > > >>>    5 files changed, 37 insertions(+)
> > > > >>>
> > > > >>> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > > > >>> index 4d1004a154f6..a11326ccc51d 100644
> > > > >>> --- a/Documentation/virt/kvm/api.rst
> > > > >>> +++ b/Documentation/virt/kvm/api.rst
> > > > >>> @@ -4698,6 +4698,19 @@ During the guest live migration the outgoing guest exports its page encryption
> > > > >>>    bitmap, the KVM_SET_PAGE_ENC_BITMAP can be used to build the page encryption
> > > > >>>    bitmap for an incoming guest.
> > > > >>> +4.127 KVM_PAGE_ENC_BITMAP_RESET (vm ioctl)
> > > > >>> +-----------------------------------------
> > > > >>> +
> > > > >>> +:Capability: basic
> > > > >>> +:Architectures: x86
> > > > >>> +:Type: vm ioctl
> > > > >>> +:Parameters: none
> > > > >>> +:Returns: 0 on success, -1 on error
> > > > >>> +
> > > > >>> +The KVM_PAGE_ENC_BITMAP_RESET is used to reset the guest's page encryption
> > > > >>> +bitmap during guest reboot and this is only done on the guest's boot vCPU.
> > > > >>> +
> > > > >>> +
> > > > >>>    5. The kvm_run structure
> > > > >>>    ========================
> > > > >>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > > > >>> index d30f770aaaea..a96ef6338cd2 100644
> > > > >>> --- a/arch/x86/include/asm/kvm_host.h
> > > > >>> +++ b/arch/x86/include/asm/kvm_host.h
> > > > >>> @@ -1273,6 +1273,7 @@ struct kvm_x86_ops {
> > > > >>>                             struct kvm_page_enc_bitmap *bmap);
> > > > >>>     int (*set_page_enc_bitmap)(struct kvm *kvm,
> > > > >>>                             struct kvm_page_enc_bitmap *bmap);
> > > > >>> +   int (*reset_page_enc_bitmap)(struct kvm *kvm);
> > > > >>>    };
> > > > >>>    struct kvm_arch_async_pf {
> > > > >>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> > > > >>> index 313343a43045..c99b0207a443 100644
> > > > >>> --- a/arch/x86/kvm/svm.c
> > > > >>> +++ b/arch/x86/kvm/svm.c
> > > > >>> @@ -7797,6 +7797,21 @@ static int svm_set_page_enc_bitmap(struct kvm *kvm,
> > > > >>>     return ret;
> > > > >>>    }
> > > > >>> +static int svm_reset_page_enc_bitmap(struct kvm *kvm)
> > > > >>> +{
> > > > >>> +   struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> > > > >>> +
> > > > >>> +   if (!sev_guest(kvm))
> > > > >>> +           return -ENOTTY;
> > > > >>> +
> > > > >>> +   mutex_lock(&kvm->lock);
> > > > >>> +   /* by default all pages should be marked encrypted */
> > > > >>> +   if (sev->page_enc_bmap_size)
> > > > >>> +           bitmap_fill(sev->page_enc_bmap, sev->page_enc_bmap_size);
> > > > >>> +   mutex_unlock(&kvm->lock);
> > > > >>> +   return 0;
> > > > >>> +}
> > > > >>> +
> > > > >>>    static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
> > > > >>>    {
> > > > >>>     struct kvm_sev_cmd sev_cmd;
> > > > >>> @@ -8203,6 +8218,7 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
> > > > >>>     .page_enc_status_hc = svm_page_enc_status_hc,
> > > > >>>     .get_page_enc_bitmap = svm_get_page_enc_bitmap,
> > > > >>>     .set_page_enc_bitmap = svm_set_page_enc_bitmap,
> > > > >>> +   .reset_page_enc_bitmap = svm_reset_page_enc_bitmap,
> > > > >>
> > > > >> We don't need to initialize the intel ops to NULL ? It's not initialized in
> > > > >> the previous patch either.
> > > > >>
> > > > >>>    };
> > > > > This struct is declared as "static storage", so won't the non-initialized
> > > > > members be 0 ?
> > > >
> > > >
> > > > Correct. Although, I see that 'nested_enable_evmcs' is explicitly
> > > > initialized. We should maintain the convention, perhaps.
> > > >
> > > > >
> > > > >>>    static int __init svm_init(void)
> > > > >>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > > >>> index 05e953b2ec61..2127ed937f53 100644
> > > > >>> --- a/arch/x86/kvm/x86.c
> > > > >>> +++ b/arch/x86/kvm/x86.c
> > > > >>> @@ -5250,6 +5250,12 @@ long kvm_arch_vm_ioctl(struct file *filp,
> > > > >>>                     r = kvm_x86_ops->set_page_enc_bitmap(kvm, &bitmap);
> > > > >>>             break;
> > > > >>>     }
> > > > >>> +   case KVM_PAGE_ENC_BITMAP_RESET: {
> > > > >>> +           r = -ENOTTY;
> > > > >>> +           if (kvm_x86_ops->reset_page_enc_bitmap)
> > > > >>> +                   r = kvm_x86_ops->reset_page_enc_bitmap(kvm);
> > > > >>> +           break;
> > > > >>> +   }
> > > > >>>     default:
> > > > >>>             r = -ENOTTY;
> > > > >>>     }
> > > > >>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> > > > >>> index b4b01d47e568..0884a581fc37 100644
> > > > >>> --- a/include/uapi/linux/kvm.h
> > > > >>> +++ b/include/uapi/linux/kvm.h
> > > > >>> @@ -1490,6 +1490,7 @@ struct kvm_enc_region {
> > > > >>>    #define KVM_GET_PAGE_ENC_BITMAP  _IOW(KVMIO, 0xc5, struct kvm_page_enc_bitmap)
> > > > >>>    #define KVM_SET_PAGE_ENC_BITMAP  _IOW(KVMIO, 0xc6, struct kvm_page_enc_bitmap)
> > > > >>> +#define KVM_PAGE_ENC_BITMAP_RESET  _IO(KVMIO, 0xc7)
> > > > >>>    /* Secure Encrypted Virtualization command */
> > > > >>>    enum sev_cmd_id {
> > > > >> Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
> > >
> > >
> > > Doesn't this overlap with the set ioctl? Yes, obviously, you have to
> > > copy the new value down and do a bit more work, but I don't think
> > > resetting the bitmap is going to be the bottleneck on reboot. Seems
> > > excessive to add another ioctl for this.
> >
> > The set ioctl is generally available/provided for the incoming VM to setup
> > the page encryption bitmap, this reset ioctl is meant for the source VM
> > as a simple interface to reset the whole page encryption bitmap.
> >
> > Thanks,
> > Ashish
> 
> 
> Hey Ashish,
> 
> These seem very overlapping. I think this API should be refactored a bit.
> 
> 1) Use kvm_vm_ioctl_enable_cap to control whether or not this
> hypercall (and related feature bit) is offered to the VM, and also the
> size of the buffer.

If you look at patch 13/14, I have added a new KVM paravirt feature
called KVM_FEATURE_SEV_LIVE_MIGRATION, which indicates host support for
SEV live migration, and a new custom MSR which the guest writes (via
wrmsr) to enable the live migration feature, so this serves the same
purpose as the enable-cap support.

There are further extensions to this support that I am adding, so patch
13/14 of this patch set is still being enhanced and will have full
support when I repost.

> 2) Use set for manipulating values in the bitmap, including resetting
> the bitmap. Set the bitmap pointer to null if you want to reset to all
> 0xFFs. When the bitmap pointer is set, it should set the values to
> exactly what is pointed at, instead of only clearing bits, as is done
> currently.

As I mentioned in my earlier email, the set API is intended for the
incoming VM, but if you really need to use it for the outgoing VM then
it can be modified.

> 3) Use get for fetching values from the kernel. Personally, I'd
> require alignment of the base GFN to a multiple of 8 (but the number
> of pages could be whatever), so you can just use a memcpy. Optionally,
> you may want some way to tell userspace the size of the existing
> buffer, so it can ensure that it can ask for the entire buffer without
> having to track the size in usermode (not strictly necessary, but nice
> to have since it ensures that there is only one place that has to
> manage this value).
> 
> If you want to expand or contract the bitmap, you can use enable cap
> to adjust the size.

As discussed on the earlier mail thread, we now do this dynamically by
computing the guest RAM size when the set_user_memory_region ioctl is
invoked. I believe that should handle the hot-plug and hot-unplug
events too, as any hot memory update will require the KVM memslots to
be updated.
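
For reference, the grow-only resize such memslot-driven sizing implies
can be modeled in userspace like this (a hedged sketch; the function
name follows sev_resize_page_enc_bitmap from the patches, everything
else is illustrative). Old contents are preserved and the newly added
tail defaults to all-1s, i.e. encrypted:

```c
#include <limits.h>
#include <stdlib.h>
#include <string.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

static size_t longs_for(size_t nbits)
{
	return (nbits + BITS_PER_LONG - 1) / BITS_PER_LONG;
}

/*
 * Grow-only resize: keep the old bitmap contents, initialize the newly
 * added tail to all-1s (encrypted by default), never shrink.
 */
static int resize_page_enc_bitmap(unsigned long **map, size_t *nbits,
				  size_t new_nbits)
{
	size_t old_longs = longs_for(*nbits);
	size_t new_longs = longs_for(new_nbits);
	unsigned long *n;

	if (new_nbits <= *nbits)
		return 0;	/* already large enough */

	n = malloc(new_longs * sizeof(unsigned long));
	if (!n)
		return -1;
	/* default: every page is marked encrypted */
	memset(n, 0xff, new_longs * sizeof(unsigned long));
	if (*map) {
		/* preserve whatever the guest has reported so far */
		memcpy(n, *map, old_longs * sizeof(unsigned long));
		free(*map);
	}
	*map = n;
	*nbits = new_nbits;
	return 0;
}
```

Called from the memslot-update path with the highest end GFN of any
slot, this keeps the bitmap sized to guest RAM without a separate
sizing ioctl.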

> If you don't want to offer the hypercall to the guest, don't call the
> enable cap.
> This API avoids using up another ioctl. Ioctl space is somewhat
> scarce. It also gives userspace fine grained control over the buffer,
> so it can support both hot-plug and hot-unplug (or at the very least
> it is not obviously incompatible with those). It also gives userspace
> control over whether or not the feature is offered. The hypercall
> isn't free, and being able to tell guests to not call when the host
> wasn't going to migrate it anyway will be useful.
> 

As I mentioned above, the host now indicates whether it supports the
live migration feature, and the feature and the hypercall are only
enabled on the host once the guest checks for this support and does a
wrmsr() to enable the feature. Also, the guest will not make the
hypercall if the host does not indicate support for it.
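
That negotiation can be summarized with a toy model (the structure,
bit position, and error codes below are illustrative; only the
KVM_FEATURE_SEV_LIVE_MIGRATION feature bit and the
MSR_KVM_SEV_LIVE_MIG_EN MSR come from the patches): the hypercall is
honored only after the host advertises the feature and the guest
writes the enable bit.

```c
#include <errno.h>

#define LIVE_MIG_ENABLE_BIT 0x1ULL	/* illustrative bit position */

struct sev_vm {
	int host_has_live_mig;		/* host advertised the feature bit */
	unsigned long long mig_msr;	/* value guest wrote to the MSR */
};

/* Guest wrmsr to the live-migration enable MSR, host side. */
static int wrmsr_live_mig(struct sev_vm *vm, unsigned long long val)
{
	if (!vm->host_has_live_mig)
		return -EINVAL;		/* MSR not offered to this guest */
	vm->mig_msr = val;
	return 0;
}

/* Page-encryption-status hypercall, gated on the negotiated feature. */
static int page_enc_status_hc(struct sev_vm *vm)
{
	if (!vm->host_has_live_mig || !(vm->mig_msr & LIVE_MIG_ENABLE_BIT))
		return -ENOTTY;		/* feature not negotiated */
	return 0;			/* would update the bitmap here */
}
```

A well-behaved guest additionally checks the feature bit before ever
issuing the wrmsr or the hypercall, so an unsupporting host never sees
either.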

Thanks,
Ashish


* Re: [PATCH v6 11/14] KVM: x86: Introduce KVM_SET_PAGE_ENC_BITMAP ioctl
  2020-04-10  1:23         ` Ashish Kalra
@ 2020-04-10 18:08           ` Steve Rutherford
  0 siblings, 0 replies; 107+ messages in thread
From: Steve Rutherford @ 2020-04-10 18:08 UTC (permalink / raw)
  To: Ashish Kalra
  Cc: Paolo Bonzini, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Joerg Roedel, Borislav Petkov, Tom Lendacky, X86 ML, KVM list,
	LKML, David Rientjes, Andy Lutomirski, Brijesh Singh

On Thu, Apr 9, 2020 at 6:23 PM Ashish Kalra <ashish.kalra@amd.com> wrote:
>
> Hello Steve,
>
> On Thu, Apr 09, 2020 at 05:06:21PM -0700, Steve Rutherford wrote:
> > On Tue, Apr 7, 2020 at 6:49 PM Ashish Kalra <ashish.kalra@amd.com> wrote:
> > >
> > > Hello Steve,
> > >
> > > On Tue, Apr 07, 2020 at 05:26:33PM -0700, Steve Rutherford wrote:
> > > > On Sun, Mar 29, 2020 at 11:23 PM Ashish Kalra <Ashish.Kalra@amd.com> wrote:
> > > > >
> > > > > From: Brijesh Singh <Brijesh.Singh@amd.com>
> > > > >
> > > > > The ioctl can be used to set page encryption bitmap for an
> > > > > incoming guest.
> > > > >
> > > > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > > > Cc: Ingo Molnar <mingo@redhat.com>
> > > > > Cc: "H. Peter Anvin" <hpa@zytor.com>
> > > > > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > > > > Cc: "Radim Krčmář" <rkrcmar@redhat.com>
> > > > > Cc: Joerg Roedel <joro@8bytes.org>
> > > > > Cc: Borislav Petkov <bp@suse.de>
> > > > > Cc: Tom Lendacky <thomas.lendacky@amd.com>
> > > > > Cc: x86@kernel.org
> > > > > Cc: kvm@vger.kernel.org
> > > > > Cc: linux-kernel@vger.kernel.org
> > > > > Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> > > > > Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> > > > > ---
> > > > >  Documentation/virt/kvm/api.rst  | 22 +++++++++++++++++
> > > > >  arch/x86/include/asm/kvm_host.h |  2 ++
> > > > >  arch/x86/kvm/svm.c              | 42 +++++++++++++++++++++++++++++++++
> > > > >  arch/x86/kvm/x86.c              | 12 ++++++++++
> > > > >  include/uapi/linux/kvm.h        |  1 +
> > > > >  5 files changed, 79 insertions(+)
> > > > >
> > > > > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > > > > index 8ad800ebb54f..4d1004a154f6 100644
> > > > > --- a/Documentation/virt/kvm/api.rst
> > > > > +++ b/Documentation/virt/kvm/api.rst
> > > > > @@ -4675,6 +4675,28 @@ or shared. The bitmap can be used during the guest migration, if the page
> > > > >  is private then userspace need to use SEV migration commands to transmit
> > > > >  the page.
> > > > >
> > > > > +4.126 KVM_SET_PAGE_ENC_BITMAP (vm ioctl)
> > > > > +---------------------------------------
> > > > > +
> > > > > +:Capability: basic
> > > > > +:Architectures: x86
> > > > > +:Type: vm ioctl
> > > > > +:Parameters: struct kvm_page_enc_bitmap (in/out)
> > > > > +:Returns: 0 on success, -1 on error
> > > > > +
> > > > > +/* for KVM_SET_PAGE_ENC_BITMAP */
> > > > > +struct kvm_page_enc_bitmap {
> > > > > +       __u64 start_gfn;
> > > > > +       __u64 num_pages;
> > > > > +       union {
> > > > > +               void __user *enc_bitmap; /* one bit per page */
> > > > > +               __u64 padding2;
> > > > > +       };
> > > > > +};
> > > > > +
> > > > > +During the guest live migration the outgoing guest exports its page encryption
> > > > > +bitmap, the KVM_SET_PAGE_ENC_BITMAP can be used to build the page encryption
> > > > > +bitmap for an incoming guest.
> > > > >
> > > > >  5. The kvm_run structure
> > > > >  ========================
> > > > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > > > > index 27e43e3ec9d8..d30f770aaaea 100644
> > > > > --- a/arch/x86/include/asm/kvm_host.h
> > > > > +++ b/arch/x86/include/asm/kvm_host.h
> > > > > @@ -1271,6 +1271,8 @@ struct kvm_x86_ops {
> > > > >                                   unsigned long sz, unsigned long mode);
> > > > >         int (*get_page_enc_bitmap)(struct kvm *kvm,
> > > > >                                 struct kvm_page_enc_bitmap *bmap);
> > > > > +       int (*set_page_enc_bitmap)(struct kvm *kvm,
> > > > > +                               struct kvm_page_enc_bitmap *bmap);
> > > > >  };
> > > > >
> > > > >  struct kvm_arch_async_pf {
> > > > > diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> > > > > index bae783cd396a..313343a43045 100644
> > > > > --- a/arch/x86/kvm/svm.c
> > > > > +++ b/arch/x86/kvm/svm.c
> > > > > @@ -7756,6 +7756,47 @@ static int svm_get_page_enc_bitmap(struct kvm *kvm,
> > > > >         return ret;
> > > > >  }
> > > > >
> > > > > +static int svm_set_page_enc_bitmap(struct kvm *kvm,
> > > > > +                                  struct kvm_page_enc_bitmap *bmap)
> > > > > +{
> > > > > +       struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> > > > > +       unsigned long gfn_start, gfn_end;
> > > > > +       unsigned long *bitmap;
> > > > > +       unsigned long sz, i;
> > > > > +       int ret;
> > > > > +
> > > > > +       if (!sev_guest(kvm))
> > > > > +               return -ENOTTY;
> > > > > +
> > > > > +       gfn_start = bmap->start_gfn;
> > > > > +       gfn_end = gfn_start + bmap->num_pages;
> > > > > +
> > > > > +       sz = ALIGN(bmap->num_pages, BITS_PER_LONG) / 8;
> > > > > +       bitmap = kmalloc(sz, GFP_KERNEL);
> > > > > +       if (!bitmap)
> > > > > +               return -ENOMEM;
> > > > > +
> > > > > +       ret = -EFAULT;
> > > > > +       if (copy_from_user(bitmap, bmap->enc_bitmap, sz))
> > > > > +               goto out;
> > > > > +
> > > > > +       mutex_lock(&kvm->lock);
> > > > > +       ret = sev_resize_page_enc_bitmap(kvm, gfn_end);
> > > > I realize now that usermode could use this for initializing the
> > > > minimum size of the enc bitmap, which probably solves my issue from
> > > > the other thread.
> > > > > +       if (ret)
> > > > > +               goto unlock;
> > > > > +
> > > > > +       i = gfn_start;
> > > > > +       for_each_clear_bit_from(i, bitmap, (gfn_end - gfn_start))
> > > > > +               clear_bit(i + gfn_start, sev->page_enc_bmap);
> > > > This API seems a bit strange, since it can only clear bits. I would
> > > > expect "set" to force the values to match the values passed down,
> > > > instead of only ensuring that cleared bits in the input are also
> > > > cleared in the kernel.
> > > >
> > >
> > > The sev_resize_page_enc_bitmap() will allocate a new bitmap and
> > > set it to all 0xFF's, therefore, the code here simply clears the bits
> > > in the bitmap as per the cleared bits in the input.
> >
> > If I'm not mistaken, resize only reinitializes the newly extended part
> > of the buffer, and copies the old values for the rest.
> > With the API you proposed you could probably reimplement a normal set
> > call by calling get, then reset, and then set, but this feels
> > cumbersome.
> >
>
> As i mentioned earlier, the set api is basically meant for the incoming
> VM, the resize will initialize the incoming VM's bitmap to all 0xFF's
> and as there won't be any bitmap allocated initially on the incoming VM,
> therefore, the bitmap copy will not do anything and the clear_bit later
> will clear the incoming VM's bits as per the input.

The documentation does not make that super clear. A typical set call
in the KVM API lets you go to any state, not just a subset of states.
Yes, this works in the common case of migrating a VM to a particular
target, once. But I find the behavior of the current API surprising,
and I prefer APIs that are unsurprising. Had I not read the code, it
would have been very easy for me to assume it worked like a normal set
call. You could rename the ioctl to something like "CLEAR_BITS", but a
set-based API is more common.

Thanks,
Steve


* Re: [PATCH v6 12/14] KVM: x86: Introduce KVM_PAGE_ENC_BITMAP_RESET ioctl
  2020-04-10  1:34               ` Ashish Kalra
@ 2020-04-10 18:14                 ` Steve Rutherford
  2020-04-10 20:16                   ` Steve Rutherford
  0 siblings, 1 reply; 107+ messages in thread
From: Steve Rutherford @ 2020-04-10 18:14 UTC (permalink / raw)
  To: Ashish Kalra
  Cc: Krish Sadhukhan, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Joerg Roedel, Borislav Petkov, Tom Lendacky,
	X86 ML, KVM list, LKML, David Rientjes, Andy Lutomirski,
	Brijesh Singh

On Thu, Apr 9, 2020 at 6:34 PM Ashish Kalra <ashish.kalra@amd.com> wrote:
>
> Hello Steve,
>
> On Thu, Apr 09, 2020 at 05:59:56PM -0700, Steve Rutherford wrote:
> > On Tue, Apr 7, 2020 at 6:52 PM Ashish Kalra <ashish.kalra@amd.com> wrote:
> > >
> > > Hello Steve,
> > >
> > > On Tue, Apr 07, 2020 at 06:25:51PM -0700, Steve Rutherford wrote:
> > > > On Mon, Apr 6, 2020 at 11:53 AM Krish Sadhukhan
> > > > <krish.sadhukhan@oracle.com> wrote:
> > > > >
> > > > >
> > > > > On 4/3/20 2:45 PM, Ashish Kalra wrote:
> > > > > > On Fri, Apr 03, 2020 at 02:14:23PM -0700, Krish Sadhukhan wrote:
> > > > > >> On 3/29/20 11:23 PM, Ashish Kalra wrote:
> > > > > >>> From: Ashish Kalra <ashish.kalra@amd.com>
> > > > > >>>
> > > > > >>> This ioctl can be used by the application to reset the page
> > > > > >>> encryption bitmap managed by the KVM driver. A typical usage
> > > > > >>> for this ioctl is on VM reboot, on reboot, we must reinitialize
> > > > > >>> the bitmap.
> > > > > >>>
> > > > > >>> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> > > > > >>> ---
> > > > > >>>    Documentation/virt/kvm/api.rst  | 13 +++++++++++++
> > > > > >>>    arch/x86/include/asm/kvm_host.h |  1 +
> > > > > >>>    arch/x86/kvm/svm.c              | 16 ++++++++++++++++
> > > > > >>>    arch/x86/kvm/x86.c              |  6 ++++++
> > > > > >>>    include/uapi/linux/kvm.h        |  1 +
> > > > > >>>    5 files changed, 37 insertions(+)
> > > > > >>>
> > > > > >>> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > > > > >>> index 4d1004a154f6..a11326ccc51d 100644
> > > > > >>> --- a/Documentation/virt/kvm/api.rst
> > > > > >>> +++ b/Documentation/virt/kvm/api.rst
> > > > > >>> @@ -4698,6 +4698,19 @@ During the guest live migration the outgoing guest exports its page encryption
> > > > > >>>    bitmap, the KVM_SET_PAGE_ENC_BITMAP can be used to build the page encryption
> > > > > >>>    bitmap for an incoming guest.
> > > > > >>> +4.127 KVM_PAGE_ENC_BITMAP_RESET (vm ioctl)
> > > > > >>> +-----------------------------------------
> > > > > >>> +
> > > > > >>> +:Capability: basic
> > > > > >>> +:Architectures: x86
> > > > > >>> +:Type: vm ioctl
> > > > > >>> +:Parameters: none
> > > > > >>> +:Returns: 0 on success, -1 on error
> > > > > >>> +
> > > > > >>> +The KVM_PAGE_ENC_BITMAP_RESET is used to reset the guest's page encryption
> > > > > >>> +bitmap during guest reboot and this is only done on the guest's boot vCPU.
> > > > > >>> +
> > > > > >>> +
> > > > > >>>    5. The kvm_run structure
> > > > > >>>    ========================
> > > > > >>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > > > > >>> index d30f770aaaea..a96ef6338cd2 100644
> > > > > >>> --- a/arch/x86/include/asm/kvm_host.h
> > > > > >>> +++ b/arch/x86/include/asm/kvm_host.h
> > > > > >>> @@ -1273,6 +1273,7 @@ struct kvm_x86_ops {
> > > > > >>>                             struct kvm_page_enc_bitmap *bmap);
> > > > > >>>     int (*set_page_enc_bitmap)(struct kvm *kvm,
> > > > > >>>                             struct kvm_page_enc_bitmap *bmap);
> > > > > >>> +   int (*reset_page_enc_bitmap)(struct kvm *kvm);
> > > > > >>>    };
> > > > > >>>    struct kvm_arch_async_pf {
> > > > > >>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> > > > > >>> index 313343a43045..c99b0207a443 100644
> > > > > >>> --- a/arch/x86/kvm/svm.c
> > > > > >>> +++ b/arch/x86/kvm/svm.c
> > > > > >>> @@ -7797,6 +7797,21 @@ static int svm_set_page_enc_bitmap(struct kvm *kvm,
> > > > > >>>     return ret;
> > > > > >>>    }
> > > > > >>> +static int svm_reset_page_enc_bitmap(struct kvm *kvm)
> > > > > >>> +{
> > > > > >>> +   struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> > > > > >>> +
> > > > > >>> +   if (!sev_guest(kvm))
> > > > > >>> +           return -ENOTTY;
> > > > > >>> +
> > > > > >>> +   mutex_lock(&kvm->lock);
> > > > > >>> +   /* by default all pages should be marked encrypted */
> > > > > >>> +   if (sev->page_enc_bmap_size)
> > > > > >>> +           bitmap_fill(sev->page_enc_bmap, sev->page_enc_bmap_size);
> > > > > >>> +   mutex_unlock(&kvm->lock);
> > > > > >>> +   return 0;
> > > > > >>> +}
> > > > > >>> +
> > > > > >>>    static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
> > > > > >>>    {
> > > > > >>>     struct kvm_sev_cmd sev_cmd;
> > > > > >>> @@ -8203,6 +8218,7 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
> > > > > >>>     .page_enc_status_hc = svm_page_enc_status_hc,
> > > > > >>>     .get_page_enc_bitmap = svm_get_page_enc_bitmap,
> > > > > >>>     .set_page_enc_bitmap = svm_set_page_enc_bitmap,
> > > > > >>> +   .reset_page_enc_bitmap = svm_reset_page_enc_bitmap,
> > > > > >>
> > > > > >> We don't need to initialize the intel ops to NULL ? It's not initialized in
> > > > > >> the previous patch either.
> > > > > >>
> > > > > >>>    };
> > > > > > This struct is declared as "static storage", so won't the non-initialized
> > > > > > members be 0 ?
> > > > >
> > > > >
> > > > > Correct. Although, I see that 'nested_enable_evmcs' is explicitly
> > > > > initialized. We should maintain the convention, perhaps.
> > > > >
> > > > > >
> > > > > >>>    static int __init svm_init(void)
> > > > > >>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > > > >>> index 05e953b2ec61..2127ed937f53 100644
> > > > > >>> --- a/arch/x86/kvm/x86.c
> > > > > >>> +++ b/arch/x86/kvm/x86.c
> > > > > >>> @@ -5250,6 +5250,12 @@ long kvm_arch_vm_ioctl(struct file *filp,
> > > > > >>>                     r = kvm_x86_ops->set_page_enc_bitmap(kvm, &bitmap);
> > > > > >>>             break;
> > > > > >>>     }
> > > > > >>> +   case KVM_PAGE_ENC_BITMAP_RESET: {
> > > > > >>> +           r = -ENOTTY;
> > > > > >>> +           if (kvm_x86_ops->reset_page_enc_bitmap)
> > > > > >>> +                   r = kvm_x86_ops->reset_page_enc_bitmap(kvm);
> > > > > >>> +           break;
> > > > > >>> +   }
> > > > > >>>     default:
> > > > > >>>             r = -ENOTTY;
> > > > > >>>     }
> > > > > >>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> > > > > >>> index b4b01d47e568..0884a581fc37 100644
> > > > > >>> --- a/include/uapi/linux/kvm.h
> > > > > >>> +++ b/include/uapi/linux/kvm.h
> > > > > >>> @@ -1490,6 +1490,7 @@ struct kvm_enc_region {
> > > > > >>>    #define KVM_GET_PAGE_ENC_BITMAP  _IOW(KVMIO, 0xc5, struct kvm_page_enc_bitmap)
> > > > > >>>    #define KVM_SET_PAGE_ENC_BITMAP  _IOW(KVMIO, 0xc6, struct kvm_page_enc_bitmap)
> > > > > >>> +#define KVM_PAGE_ENC_BITMAP_RESET  _IO(KVMIO, 0xc7)
> > > > > >>>    /* Secure Encrypted Virtualization command */
> > > > > >>>    enum sev_cmd_id {
> > > > > >> Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
> > > >
> > > >
> > > > Doesn't this overlap with the set ioctl? Yes, obviously, you have to
> > > > copy the new value down and do a bit more work, but I don't think
> > > > resetting the bitmap is going to be the bottleneck on reboot. Seems
> > > > excessive to add another ioctl for this.
> > >
> > > The set ioctl is generally available/provided for the incoming VM to setup
> > > the page encryption bitmap, this reset ioctl is meant for the source VM
> > > as a simple interface to reset the whole page encryption bitmap.
> > >
> > > Thanks,
> > > Ashish
> >
> >
> > Hey Ashish,
> >
> > These seem very overlapping. I think this API should be refactored a bit.
> >
> > 1) Use kvm_vm_ioctl_enable_cap to control whether or not this
> > hypercall (and related feature bit) is offered to the VM, and also the
> > size of the buffer.
>
> If you look at patch 13/14, i have added a new kvm para feature called
> "KVM_FEATURE_SEV_LIVE_MIGRATION" which indicates host support for SEV
> Live Migration and a new Custom MSR which the guest does a wrmsr to
> enable the Live Migration feature, so this is like the enable cap
> support.
>
> There are further extensions to this support i am adding, so patch 13/14
> of this patch-set is still being enhanced and will have full support
> when i repost next.
>
> > 2) Use set for manipulating values in the bitmap, including resetting
> > the bitmap. Set the bitmap pointer to null if you want to reset to all
> > 0xFFs. When the bitmap pointer is set, it should set the values to
> > exactly what is pointed at, instead of only clearing bits, as is done
> > currently.
>
> As i mentioned in my earlier email, the set api is supposed to be for
> the incoming VM, but if you really need to use it for the outgoing VM
> then it can be modified.
>
> > 3) Use get for fetching values from the kernel. Personally, I'd
> > require alignment of the base GFN to a multiple of 8 (but the number
> > of pages could be whatever), so you can just use a memcpy. Optionally,
> > you may want some way to tell userspace the size of the existing
> > buffer, so it can ensure that it can ask for the entire buffer without
> > having to track the size in usermode (not strictly necessary, but nice
> > to have since it ensures that there is only one place that has to
> > manage this value).
> >
> > If you want to expand or contract the bitmap, you can use enable cap
> > to adjust the size.
>
> As being discussed on the earlier mail thread, we are doing this
> dynamically now by computing the guest RAM size when the
> set_user_memory_region ioctl is invoked. I believe that should handle
> the hot-plug and hot-unplug events too, as any hot memory updates will
> need KVM memslots to be updated.
Ahh, sorry, forgot you mentioned this: yes this can work. Host needs
to be able to decide not to allocate, but this should be workable.
>
> > If you don't want to offer the hypercall to the guest, don't call the
> > enable cap.
> > This API avoids using up another ioctl. Ioctl space is somewhat
> > scarce. It also gives userspace fine grained control over the buffer,
> > so it can support both hot-plug and hot-unplug (or at the very least
> > it is not obviously incompatible with those). It also gives userspace
> > control over whether or not the feature is offered. The hypercall
> > isn't free, and being able to tell guests to not call when the host
> > wasn't going to migrate it anyway will be useful.
> >
>
> As i mentioned above, now the host indicates if it supports the Live
> Migration feature and the feature and the hypercall are only enabled on
> the host when the guest checks for this support and does a wrmsr() to
> enable the feature. Also the guest will not make the hypercall if the
> host does not indicate support for it.
If my read of those patches was correct, the host will always
advertise support for the hypercall. And the only bit controlling
whether or not the hypercall is advertised is essentially the kernel
version. You need to roll out a new kernel to disable the hypercall.

An enable cap could give the host control of this feature bit. It
could also give the host control of whether or not the bitmaps are
allocated. This is important since I assume not every SEV VM is
planned to be migrated. And if the host does not plan to migrate it
probably doesn't want to waste memory or cycles managing this bitmap.

Thanks,
Steve
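
[Editor's sketch] For context, this is roughly how a userspace VMM (QEMU, in the
series' intended usage) would drive the three ioctls under discussion. The ioctl
numbers are taken from the quoted patch; the `kvm_page_enc_bitmap` layout is a
simplified reconstruction from earlier patches in the series. The series never
merged upstream, so none of these definitions exist in mainline <linux/kvm.h>
and are reproduced locally:

```c
#include <stdint.h>
#include <stdlib.h>
#include <sys/ioctl.h>

#define KVMIO 0xAE  /* standard KVM ioctl magic */

/* Simplified layout, reconstructed from the series (assumption). */
struct kvm_page_enc_bitmap {
	uint64_t start_gfn;   /* first guest frame covered by the request */
	uint64_t num_pages;   /* number of pages whose status is wanted */
	void *enc_bitmap;     /* user buffer, one bit per page, 1 = encrypted */
};

#define KVM_GET_PAGE_ENC_BITMAP   _IOW(KVMIO, 0xc5, struct kvm_page_enc_bitmap)
#define KVM_SET_PAGE_ENC_BITMAP   _IOW(KVMIO, 0xc6, struct kvm_page_enc_bitmap)
#define KVM_PAGE_ENC_BITMAP_RESET _IO(KVMIO, 0xc7)

/* Bytes needed for one bit per page; Steve's review asks that the base
 * GFN be a multiple of 8 so the kernel side can memcpy whole bytes. */
static size_t enc_bitmap_bytes(uint64_t num_pages)
{
	return (num_pages + 7) / 8;
}

/* Migration side: fetch the encryption status of
 * [start_gfn, start_gfn + num_pages). Caller frees the returned buffer;
 * returns NULL on error. */
void *fetch_enc_bitmap(int vm_fd, uint64_t start_gfn, uint64_t num_pages)
{
	struct kvm_page_enc_bitmap req = {
		.start_gfn = start_gfn,
		.num_pages = num_pages,
		.enc_bitmap = calloc(1, enc_bitmap_bytes(num_pages)),
	};
	if (!req.enc_bitmap)
		return NULL;
	if (ioctl(vm_fd, KVM_GET_PAGE_ENC_BITMAP, &req) < 0) {
		free(req.enc_bitmap);
		return NULL;
	}
	return req.enc_bitmap;
}

/* Source-VM side: on guest reboot, mark every page encrypted again. */
int reset_enc_bitmap(int vm_fd)
{
	return ioctl(vm_fd, KVM_PAGE_ENC_BITMAP_RESET);
}
```

Whether reset deserves its own ioctl, as here, or should be folded into
KVM_SET_PAGE_ENC_BITMAP is exactly the design point debated above.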

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v6 12/14] KVM: x86: Introduce KVM_PAGE_ENC_BITMAP_RESET ioctl
  2020-04-10 18:14                 ` Steve Rutherford
@ 2020-04-10 20:16                   ` Steve Rutherford
  2020-04-10 20:18                     ` Steve Rutherford
  0 siblings, 1 reply; 107+ messages in thread
From: Steve Rutherford @ 2020-04-10 20:16 UTC (permalink / raw)
  To: Ashish Kalra
  Cc: Krish Sadhukhan, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Joerg Roedel, Borislav Petkov, Tom Lendacky,
	X86 ML, KVM list, LKML, David Rientjes, Andy Lutomirski,
	Brijesh Singh

On Fri, Apr 10, 2020 at 11:14 AM Steve Rutherford
<srutherford@google.com> wrote:
>
> On Thu, Apr 9, 2020 at 6:34 PM Ashish Kalra <ashish.kalra@amd.com> wrote:
> > [...]
>
> If my read of those patches was correct, the host will always
> advertise support for the hypercall. And the only bit controlling
> whether or not the hypercall is advertised is essentially the kernel
> version. You need to rollout a new kernel to disable the hypercall.

Ahh, awesome, I see I misunderstood how the CPUID bits get passed
through: usermode can still override them. Forgot about the back and
forth for CPUID with usermode. My point about informing the guest
kernel is clearly moot. The host still needs the ability to prevent
allocations, but that is more minor. Maybe use a flag on the memslots
directly?
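
[Editor's sketch] Steve's earlier item (2) — having the set ioctl subsume reset —
can be modeled outside the kernel. This is only a model of the *proposed*
semantics (a NULL source pointer resets the bitmap to all-1s; a non-NULL source
is copied verbatim rather than only clearing bits, as the posted patch does);
it is not code from the series, and `bmap`/`bytes` stand in for
sev->page_enc_bmap and its size in bytes:

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Proposed KVM_SET_PAGE_ENC_BITMAP semantics, modeled in userspace:
 * src == NULL  -> reset everything to encrypted (the post-reboot state);
 * src != NULL  -> overwrite the target with exactly the bytes given. */
int set_page_enc_bitmap(uint8_t *bmap, size_t bytes,
			const uint8_t *src, size_t src_bytes)
{
	if (!src) {
		memset(bmap, 0xFF, bytes);   /* all pages marked encrypted */
		return 0;
	}
	if (src_bytes > bytes)
		return -1;                   /* request exceeds the bitmap */
	memcpy(bmap, src, src_bytes);        /* set values exactly */
	return 0;
}
```

With this, the dedicated KVM_PAGE_ENC_BITMAP_RESET ioctl becomes a
degenerate case of set, which is the space-saving Steve is after.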


* Re: [PATCH v6 12/14] KVM: x86: Introduce KVM_PAGE_ENC_BITMAP_RESET ioctl
  2020-04-10 20:16                   ` Steve Rutherford
@ 2020-04-10 20:18                     ` Steve Rutherford
  2020-04-10 20:55                       ` Kalra, Ashish
  0 siblings, 1 reply; 107+ messages in thread
From: Steve Rutherford @ 2020-04-10 20:18 UTC (permalink / raw)
  To: Ashish Kalra
  Cc: Krish Sadhukhan, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Joerg Roedel, Borislav Petkov, Tom Lendacky,
	X86 ML, KVM list, LKML, David Rientjes, Andy Lutomirski,
	Brijesh Singh

On Fri, Apr 10, 2020 at 1:16 PM Steve Rutherford <srutherford@google.com> wrote:
>
> On Fri, Apr 10, 2020 at 11:14 AM Steve Rutherford
> <srutherford@google.com> wrote:
> >
> > [...]
>
> Ahh, awesome, I see I misunderstood how the CPUID bits get passed
> through: usermode can still override them. Forgot about the back and
> forth for CPUID with usermode. My point about informing the guest
> kernel is clearly moot. The host still needs the ability to prevent
> allocations, but that is more minor. Maybe use a flag on the memslots
> directly?
On second thought: burning the memslot flag for 30mb per tb of VM
seems like a waste.

^ permalink raw reply	[flat|nested] 107+ messages in thread

* RE: [PATCH v6 12/14] KVM: x86: Introduce KVM_PAGE_ENC_BITMAP_RESET ioctl
  2020-04-10 20:18                     ` Steve Rutherford
@ 2020-04-10 20:55                       ` Kalra, Ashish
  2020-04-10 21:42                         ` Brijesh Singh
  2020-04-10 22:02                         ` Brijesh Singh
  0 siblings, 2 replies; 107+ messages in thread
From: Kalra, Ashish @ 2020-04-10 20:55 UTC (permalink / raw)
  To: Steve Rutherford
  Cc: Krish Sadhukhan, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Joerg Roedel, Borislav Petkov, Lendacky, Thomas,
	X86 ML, KVM list, LKML, David Rientjes, Andy Lutomirski, Singh,
	Brijesh

[AMD Official Use Only - Internal Distribution Only]

Hello Steve,

-----Original Message-----
From: Steve Rutherford <srutherford@google.com> 
Sent: Friday, April 10, 2020 3:19 PM
To: Kalra, Ashish <Ashish.Kalra@amd.com>
Cc: Krish Sadhukhan <krish.sadhukhan@oracle.com>; Paolo Bonzini <pbonzini@redhat.com>; Thomas Gleixner <tglx@linutronix.de>; Ingo Molnar <mingo@redhat.com>; H. Peter Anvin <hpa@zytor.com>; Joerg Roedel <joro@8bytes.org>; Borislav Petkov <bp@suse.de>; Lendacky, Thomas <Thomas.Lendacky@amd.com>; X86 ML <x86@kernel.org>; KVM list <kvm@vger.kernel.org>; LKML <linux-kernel@vger.kernel.org>; David Rientjes <rientjes@google.com>; Andy Lutomirski <luto@kernel.org>; Singh, Brijesh <brijesh.singh@amd.com>
Subject: Re: [PATCH v6 12/14] KVM: x86: Introduce KVM_PAGE_ENC_BITMAP_RESET ioctl

On Fri, Apr 10, 2020 at 1:16 PM Steve Rutherford <srutherford@google.com> wrote:
>
> On Fri, Apr 10, 2020 at 11:14 AM Steve Rutherford 
> <srutherford@google.com> wrote:
> >
> > On Thu, Apr 9, 2020 at 6:34 PM Ashish Kalra <ashish.kalra@amd.com> wrote:
> > >
> > > Hello Steve,
> > >
> > > On Thu, Apr 09, 2020 at 05:59:56PM -0700, Steve Rutherford wrote:
> > > > On Tue, Apr 7, 2020 at 6:52 PM Ashish Kalra <ashish.kalra@amd.com> wrote:
> > > > >
> > > > > Hello Steve,
> > > > >
> > > > > On Tue, Apr 07, 2020 at 06:25:51PM -0700, Steve Rutherford wrote:
> > > > > > On Mon, Apr 6, 2020 at 11:53 AM Krish Sadhukhan 
> > > > > > <krish.sadhukhan@oracle.com> wrote:
> > > > > > >
> > > > > > >
> > > > > > > On 4/3/20 2:45 PM, Ashish Kalra wrote:
> > > > > > > > On Fri, Apr 03, 2020 at 02:14:23PM -0700, Krish Sadhukhan wrote:
> > > > > > > >> On 3/29/20 11:23 PM, Ashish Kalra wrote:
> > > > > > > >>> From: Ashish Kalra <ashish.kalra@amd.com>
> > > > > > > >>>
> > > > > > > >>> This ioctl can be used by the application to reset the 
> > > > > > > >>> page encryption bitmap managed by the KVM driver. A 
> > > > > > > >>> typical usage for this ioctl is on VM reboot, on 
> > > > > > > >>> reboot, we must reinitialize the bitmap.
> > > > > > > >>>
> > > > > > > >>> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> > > > > > > >>> ---
> > > > > > > >>>    Documentation/virt/kvm/api.rst  | 13 +++++++++++++
> > > > > > > >>>    arch/x86/include/asm/kvm_host.h |  1 +
> > > > > > > >>>    arch/x86/kvm/svm.c              | 16 ++++++++++++++++
> > > > > > > >>>    arch/x86/kvm/x86.c              |  6 ++++++
> > > > > > > >>>    include/uapi/linux/kvm.h        |  1 +
> > > > > > > >>>    5 files changed, 37 insertions(+)
> > > > > > > >>>
> > > > > > > >>> diff --git a/Documentation/virt/kvm/api.rst 
> > > > > > > >>> b/Documentation/virt/kvm/api.rst index 
> > > > > > > >>> 4d1004a154f6..a11326ccc51d 100644
> > > > > > > >>> --- a/Documentation/virt/kvm/api.rst
> > > > > > > >>> +++ b/Documentation/virt/kvm/api.rst
> > > > > > > >>> @@ -4698,6 +4698,19 @@ During the guest live migration the outgoing guest exports its page encryption
> > > > > > > >>>    bitmap, the KVM_SET_PAGE_ENC_BITMAP can be used to build the page encryption
> > > > > > > >>>    bitmap for an incoming guest.
> > > > > > > >>> +4.127 KVM_PAGE_ENC_BITMAP_RESET (vm ioctl)
> > > > > > > >>> +-----------------------------------------
> > > > > > > >>> +
> > > > > > > >>> +:Capability: basic
> > > > > > > >>> +:Architectures: x86
> > > > > > > >>> +:Type: vm ioctl
> > > > > > > >>> +:Parameters: none
> > > > > > > >>> +:Returns: 0 on success, -1 on error
> > > > > > > >>> +
> > > > > > > >>> +The KVM_PAGE_ENC_BITMAP_RESET is used to reset the 
> > > > > > > >>> +guest's page encryption bitmap during guest reboot and this is only done on the guest's boot vCPU.
> > > > > > > >>> +
> > > > > > > >>> +
> > > > > > > >>>    5. The kvm_run structure
> > > > > > > >>>    ======================== diff --git 
> > > > > > > >>> a/arch/x86/include/asm/kvm_host.h 
> > > > > > > >>> b/arch/x86/include/asm/kvm_host.h index 
> > > > > > > >>> d30f770aaaea..a96ef6338cd2 100644
> > > > > > > >>> --- a/arch/x86/include/asm/kvm_host.h
> > > > > > > >>> +++ b/arch/x86/include/asm/kvm_host.h
> > > > > > > >>> @@ -1273,6 +1273,7 @@ struct kvm_x86_ops {
> > > > > > > >>>                             struct kvm_page_enc_bitmap *bmap);
> > > > > > > >>>     int (*set_page_enc_bitmap)(struct kvm *kvm,
> > > > > > > >>>                             struct kvm_page_enc_bitmap 
> > > > > > > >>> *bmap);
> > > > > > > >>> +   int (*reset_page_enc_bitmap)(struct kvm *kvm);
> > > > > > > >>>    };
> > > > > > > >>>    struct kvm_arch_async_pf { diff --git 
> > > > > > > >>> a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index 
> > > > > > > >>> 313343a43045..c99b0207a443 100644
> > > > > > > >>> --- a/arch/x86/kvm/svm.c
> > > > > > > >>> +++ b/arch/x86/kvm/svm.c
> > > > > > > >>> @@ -7797,6 +7797,21 @@ static int svm_set_page_enc_bitmap(struct kvm *kvm,
> > > > > > > >>>     return ret;
> > > > > > > >>>    }
> > > > > > > >>> +static int svm_reset_page_enc_bitmap(struct kvm *kvm) 
> > > > > > > >>> +{
> > > > > > > >>> +   struct kvm_sev_info *sev = 
> > > > > > > >>> +&to_kvm_svm(kvm)->sev_info;
> > > > > > > >>> +
> > > > > > > >>> +   if (!sev_guest(kvm))
> > > > > > > >>> +           return -ENOTTY;
> > > > > > > >>> +
> > > > > > > >>> +   mutex_lock(&kvm->lock);
> > > > > > > >>> +   /* by default all pages should be marked encrypted */
> > > > > > > >>> +   if (sev->page_enc_bmap_size)
> > > > > > > >>> +           bitmap_fill(sev->page_enc_bmap, sev->page_enc_bmap_size);
> > > > > > > >>> +   mutex_unlock(&kvm->lock);
> > > > > > > >>> +   return 0;
> > > > > > > >>> +}
> > > > > > > >>> +
> > > > > > > >>>    static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
> > > > > > > >>>    {
> > > > > > > >>>     struct kvm_sev_cmd sev_cmd; @@ -8203,6 +8218,7 @@ 
> > > > > > > >>> static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
> > > > > > > >>>     .page_enc_status_hc = svm_page_enc_status_hc,
> > > > > > > >>>     .get_page_enc_bitmap = svm_get_page_enc_bitmap,
> > > > > > > >>>     .set_page_enc_bitmap = svm_set_page_enc_bitmap,
> > > > > > > >>> +   .reset_page_enc_bitmap = 
> > > > > > > >>> + svm_reset_page_enc_bitmap,
> > > > > > > >>
> > > > > > > >> We don't need to initialize the intel ops to NULL ? 
> > > > > > > >> It's not initialized in the previous patch either.
> > > > > > > >>
> > > > > > > >>>    };
> > > > > > > > This struct is declared as "static storage", so won't 
> > > > > > > > the non-initialized members be 0 ?
> > > > > > >
> > > > > > >
> > > > > > > Correct. Although, I see that 'nested_enable_evmcs' is 
> > > > > > > explicitly initialized. We should maintain the convention, perhaps.
> > > > > > >
> > > > > > > >
> > > > > > > >>>    static int __init svm_init(void) diff --git 
> > > > > > > >>> a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 
> > > > > > > >>> 05e953b2ec61..2127ed937f53 100644
> > > > > > > >>> --- a/arch/x86/kvm/x86.c
> > > > > > > >>> +++ b/arch/x86/kvm/x86.c
> > > > > > > >>> @@ -5250,6 +5250,12 @@ long kvm_arch_vm_ioctl(struct file *filp,
> > > > > > > >>>                     r = kvm_x86_ops->set_page_enc_bitmap(kvm, &bitmap);
> > > > > > > >>>             break;
> > > > > > > >>>     }
> > > > > > > >>> +   case KVM_PAGE_ENC_BITMAP_RESET: {
> > > > > > > >>> +           r = -ENOTTY;
> > > > > > > >>> +           if (kvm_x86_ops->reset_page_enc_bitmap)
> > > > > > > >>> +                   r = kvm_x86_ops->reset_page_enc_bitmap(kvm);
> > > > > > > >>> +           break;
> > > > > > > >>> +   }
> > > > > > > >>>     default:
> > > > > > > >>>             r = -ENOTTY;
> > > > > > > >>>     }
> > > > > > > >>> diff --git a/include/uapi/linux/kvm.h 
> > > > > > > >>> b/include/uapi/linux/kvm.h index 
> > > > > > > >>> b4b01d47e568..0884a581fc37 100644
> > > > > > > >>> --- a/include/uapi/linux/kvm.h
> > > > > > > >>> +++ b/include/uapi/linux/kvm.h
> > > > > > > >>> @@ -1490,6 +1490,7 @@ struct kvm_enc_region {
> > > > > > > >>>    #define KVM_GET_PAGE_ENC_BITMAP  _IOW(KVMIO, 0xc5, struct kvm_page_enc_bitmap)
> > > > > > > >>>    #define KVM_SET_PAGE_ENC_BITMAP  _IOW(KVMIO, 0xc6, 
> > > > > > > >>> struct kvm_page_enc_bitmap)
> > > > > > > >>> +#define KVM_PAGE_ENC_BITMAP_RESET  _IO(KVMIO, 0xc7)
> > > > > > > >>>    /* Secure Encrypted Virtualization command */
> > > > > > > >>>    enum sev_cmd_id {
> > > > > > > >> Reviewed-by: Krish Sadhukhan 
> > > > > > > >> <krish.sadhukhan@oracle.com>
> > > > > >
> > > > > >
> > > > > > Doesn't this overlap with the set ioctl? Yes, obviously, you 
> > > > > > have to copy the new value down and do a bit more work, but 
> > > > > > I don't think resetting the bitmap is going to be the 
> > > > > > bottleneck on reboot. Seems excessive to add another ioctl for this.
> > > > >
> > > > > The set ioctl is generally available/provided for the incoming 
> > > > > VM to setup the page encryption bitmap, this reset ioctl is 
> > > > > meant for the source VM as a simple interface to reset the whole page encryption bitmap.
> > > > >
> > > > > Thanks,
> > > > > Ashish
> > > >
> > > >
> > > > Hey Ashish,
> > > >
> > > > These seem very overlapping. I think this API should be refactored a bit.
> > > >
> > > > 1) Use kvm_vm_ioctl_enable_cap to control whether or not this 
> > > > hypercall (and related feature bit) is offered to the VM, and 
> > > > also the size of the buffer.
> > >
> > > If you look at patch 13/14, i have added a new kvm para feature 
> > > called "KVM_FEATURE_SEV_LIVE_MIGRATION" which indicates host 
> > > support for SEV Live Migration and a new Custom MSR which the 
> > > guest does a wrmsr to enable the Live Migration feature, so this 
> > > is like the enable cap support.
> > >
> > > There are further extensions to this support i am adding, so patch 
> > > 13/14 of this patch-set is still being enhanced and will have full 
> > > support when i repost next.
> > >
> > > > 2) Use set for manipulating values in the bitmap, including 
> > > > resetting the bitmap. Set the bitmap pointer to null if you want 
> > > > to reset to all 0xFFs. When the bitmap pointer is set, it should 
> > > > set the values to exactly what is pointed at, instead of only 
> > > > clearing bits, as is done currently.
> > >
> > > As i mentioned in my earlier email, the set api is supposed to be 
> > > for the incoming VM, but if you really need to use it for the 
> > > outgoing VM then it can be modified.
> > >
> > > > 3) Use get for fetching values from the kernel. Personally, I'd 
> > > > require alignment of the base GFN to a multiple of 8 (but the 
> > > > number of pages could be whatever), so you can just use a 
> > > > memcpy. Optionally, you may want some way to tell userspace the 
> > > > size of the existing buffer, so it can ensure that it can ask 
> > > > for the entire buffer without having to track the size in 
> > > > usermode (not strictly necessary, but nice to have since it 
> > > > ensures that there is only one place that has to manage this value).
> > > >
> > > > If you want to expand or contract the bitmap, you can use enable 
> > > > cap to adjust the size.
> > >
> > > As being discussed on the earlier mail thread, we are doing this 
> > > dynamically now by computing the guest RAM size when the 
> > > set_user_memory_region ioctl is invoked. I believe that should 
> > > handle the hot-plug and hot-unplug events too, as any hot memory 
> > > updates will need KVM memslots to be updated.
> > Ahh, sorry, forgot you mentioned this: yes this can work. Host needs 
> > to be able to decide not to allocate, but this should be workable.
> > >
> > > > If you don't want to offer the hypercall to the guest, don't 
> > > > call the enable cap.
> > > > This API avoids using up another ioctl. Ioctl space is somewhat 
> > > > scarce. It also gives userspace fine grained control over the 
> > > > buffer, so it can support both hot-plug and hot-unplug (or at 
> > > > the very least it is not obviously incompatible with those). It 
> > > > also gives userspace control over whether or not the feature is 
> > > > offered. The hypercall isn't free, and being able to tell guests 
> > > > to not call when the host wasn't going to migrate it anyway will be useful.
> > > >
> > >
> > > As i mentioned above, now the host indicates if it supports the 
> > > Live Migration feature and the feature and the hypercall are only 
> > > enabled on the host when the guest checks for this support and 
> > > does a wrmsr() to enable the feature. Also the guest will not make 
> > > the hypercall if the host does not indicate support for it.
> > If my read of those patches was correct, the host will always 
> > advertise support for the hypercall. And the only bit controlling 
> > whether or not the hypercall is advertised is essentially the kernel 
> > version. You need to rollout a new kernel to disable the hypercall.
>
> Ahh, awesome, I see I misunderstood how the CPUID bits get passed
> through: usermode can still override them. Forgot about the back and 
> forth for CPUID with usermode. My point about informing the guest 
> kernel is clearly moot. The host still needs the ability to prevent 
> allocations, but that is more minor. Maybe use a flag on the memslots 
> directly?
> On second thought: burning the memslot flag for 30mb per tb of VM seems like a waste.

Currently, I am still using the approach of a "unified" page encryption bitmap instead of a
bitmap per memslot, with the main change being that the resizing is now done only when
memslots are updated, via the kvm_arch_commit_memory_region() interface.

Thanks,
Ashish


* Re: [PATCH v6 12/14] KVM: x86: Introduce KVM_PAGE_ENC_BITMAP_RESET ioctl
  2020-04-10 20:55                       ` Kalra, Ashish
@ 2020-04-10 21:42                         ` Brijesh Singh
  2020-04-10 21:46                           ` Sean Christopherson
  2020-04-10 22:02                         ` Brijesh Singh
  1 sibling, 1 reply; 107+ messages in thread
From: Brijesh Singh @ 2020-04-10 21:42 UTC (permalink / raw)
  To: Kalra, Ashish, Steve Rutherford
  Cc: brijesh.singh, Krish Sadhukhan, Paolo Bonzini, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Joerg Roedel, Borislav Petkov,
	Lendacky, Thomas, X86 ML, KVM list, LKML, David Rientjes,
	Andy Lutomirski


On 4/10/20 3:55 PM, Kalra, Ashish wrote:
> [AMD Official Use Only - Internal Distribution Only]
>
> Hello Steve,
>
> -----Original Message-----
> From: Steve Rutherford <srutherford@google.com> 
> Sent: Friday, April 10, 2020 3:19 PM
> To: Kalra, Ashish <Ashish.Kalra@amd.com>
> Cc: Krish Sadhukhan <krish.sadhukhan@oracle.com>; Paolo Bonzini <pbonzini@redhat.com>; Thomas Gleixner <tglx@linutronix.de>; Ingo Molnar <mingo@redhat.com>; H. Peter Anvin <hpa@zytor.com>; Joerg Roedel <joro@8bytes.org>; Borislav Petkov <bp@suse.de>; Lendacky, Thomas <Thomas.Lendacky@amd.com>; X86 ML <x86@kernel.org>; KVM list <kvm@vger.kernel.org>; LKML <linux-kernel@vger.kernel.org>; David Rientjes <rientjes@google.com>; Andy Lutomirski <luto@kernel.org>; Singh, Brijesh <brijesh.singh@amd.com>
> Subject: Re: [PATCH v6 12/14] KVM: x86: Introduce KVM_PAGE_ENC_BITMAP_RESET ioctl
>
> On Fri, Apr 10, 2020 at 1:16 PM Steve Rutherford <srutherford@google.com> wrote:
>> On Fri, Apr 10, 2020 at 11:14 AM Steve Rutherford 
>> <srutherford@google.com> wrote:
>>> On Thu, Apr 9, 2020 at 6:34 PM Ashish Kalra <ashish.kalra@amd.com> wrote:
>>>> Hello Steve,
>>>>
>>>> On Thu, Apr 09, 2020 at 05:59:56PM -0700, Steve Rutherford wrote:
>>>>> On Tue, Apr 7, 2020 at 6:52 PM Ashish Kalra <ashish.kalra@amd.com> wrote:
>>>>>> Hello Steve,
>>>>>>
>>>>>> On Tue, Apr 07, 2020 at 06:25:51PM -0700, Steve Rutherford wrote:
>>>>>>> On Mon, Apr 6, 2020 at 11:53 AM Krish Sadhukhan 
>>>>>>> <krish.sadhukhan@oracle.com> wrote:
>>>>>>>>
>>>>>>>> On 4/3/20 2:45 PM, Ashish Kalra wrote:
>>>>>>>>> On Fri, Apr 03, 2020 at 02:14:23PM -0700, Krish Sadhukhan wrote:
>>>>>>>>>> On 3/29/20 11:23 PM, Ashish Kalra wrote:
>>>>>>>>>>> From: Ashish Kalra <ashish.kalra@amd.com>
>>>>>>>>>>>
>>>>>>>>>>> This ioctl can be used by the application to reset the 
>>>>>>>>>>> page encryption bitmap managed by the KVM driver. A 
>>>>>>>>>>> typical usage for this ioctl is on VM reboot, on 
>>>>>>>>>>> reboot, we must reinitialize the bitmap.
>>>>>>>>>>>
>>>>>>>>>>> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
>>>>>>>>>>> ---
>>>>>>>>>>>    Documentation/virt/kvm/api.rst  | 13 +++++++++++++
>>>>>>>>>>>    arch/x86/include/asm/kvm_host.h |  1 +
>>>>>>>>>>>    arch/x86/kvm/svm.c              | 16 ++++++++++++++++
>>>>>>>>>>>    arch/x86/kvm/x86.c              |  6 ++++++
>>>>>>>>>>>    include/uapi/linux/kvm.h        |  1 +
>>>>>>>>>>>    5 files changed, 37 insertions(+)
>>>>>>>>>>>
>>>>>>>>>>> diff --git a/Documentation/virt/kvm/api.rst 
>>>>>>>>>>> b/Documentation/virt/kvm/api.rst index 
>>>>>>>>>>> 4d1004a154f6..a11326ccc51d 100644
>>>>>>>>>>> --- a/Documentation/virt/kvm/api.rst
>>>>>>>>>>> +++ b/Documentation/virt/kvm/api.rst
>>>>>>>>>>> @@ -4698,6 +4698,19 @@ During the guest live migration the outgoing guest exports its page encryption
>>>>>>>>>>>    bitmap, the KVM_SET_PAGE_ENC_BITMAP can be used to build the page encryption
>>>>>>>>>>>    bitmap for an incoming guest.
>>>>>>>>>>> +4.127 KVM_PAGE_ENC_BITMAP_RESET (vm ioctl)
>>>>>>>>>>> +-----------------------------------------
>>>>>>>>>>> +
>>>>>>>>>>> +:Capability: basic
>>>>>>>>>>> +:Architectures: x86
>>>>>>>>>>> +:Type: vm ioctl
>>>>>>>>>>> +:Parameters: none
>>>>>>>>>>> +:Returns: 0 on success, -1 on error
>>>>>>>>>>> +
>>>>>>>>>>> +The KVM_PAGE_ENC_BITMAP_RESET is used to reset the 
>>>>>>>>>>> +guest's page encryption bitmap during guest reboot and this is only done on the guest's boot vCPU.
>>>>>>>>>>> +
>>>>>>>>>>> +
>>>>>>>>>>>    5. The kvm_run structure
>>>>>>>>>>>    ======================== diff --git 
>>>>>>>>>>> a/arch/x86/include/asm/kvm_host.h 
>>>>>>>>>>> b/arch/x86/include/asm/kvm_host.h index 
>>>>>>>>>>> d30f770aaaea..a96ef6338cd2 100644
>>>>>>>>>>> --- a/arch/x86/include/asm/kvm_host.h
>>>>>>>>>>> +++ b/arch/x86/include/asm/kvm_host.h
>>>>>>>>>>> @@ -1273,6 +1273,7 @@ struct kvm_x86_ops {
>>>>>>>>>>>                             struct kvm_page_enc_bitmap *bmap);
>>>>>>>>>>>     int (*set_page_enc_bitmap)(struct kvm *kvm,
>>>>>>>>>>>                             struct kvm_page_enc_bitmap 
>>>>>>>>>>> *bmap);
>>>>>>>>>>> +   int (*reset_page_enc_bitmap)(struct kvm *kvm);
>>>>>>>>>>>    };
>>>>>>>>>>>    struct kvm_arch_async_pf { diff --git 
>>>>>>>>>>> a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index 
>>>>>>>>>>> 313343a43045..c99b0207a443 100644
>>>>>>>>>>> --- a/arch/x86/kvm/svm.c
>>>>>>>>>>> +++ b/arch/x86/kvm/svm.c
>>>>>>>>>>> @@ -7797,6 +7797,21 @@ static int svm_set_page_enc_bitmap(struct kvm *kvm,
>>>>>>>>>>>     return ret;
>>>>>>>>>>>    }
>>>>>>>>>>> +static int svm_reset_page_enc_bitmap(struct kvm *kvm) 
>>>>>>>>>>> +{
>>>>>>>>>>> +   struct kvm_sev_info *sev = 
>>>>>>>>>>> +&to_kvm_svm(kvm)->sev_info;
>>>>>>>>>>> +
>>>>>>>>>>> +   if (!sev_guest(kvm))
>>>>>>>>>>> +           return -ENOTTY;
>>>>>>>>>>> +
>>>>>>>>>>> +   mutex_lock(&kvm->lock);
>>>>>>>>>>> +   /* by default all pages should be marked encrypted */
>>>>>>>>>>> +   if (sev->page_enc_bmap_size)
>>>>>>>>>>> +           bitmap_fill(sev->page_enc_bmap, sev->page_enc_bmap_size);
>>>>>>>>>>> +   mutex_unlock(&kvm->lock);
>>>>>>>>>>> +   return 0;
>>>>>>>>>>> +}
>>>>>>>>>>> +
>>>>>>>>>>>    static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
>>>>>>>>>>>    {
>>>>>>>>>>>     struct kvm_sev_cmd sev_cmd; @@ -8203,6 +8218,7 @@ 
>>>>>>>>>>> static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
>>>>>>>>>>>     .page_enc_status_hc = svm_page_enc_status_hc,
>>>>>>>>>>>     .get_page_enc_bitmap = svm_get_page_enc_bitmap,
>>>>>>>>>>>     .set_page_enc_bitmap = svm_set_page_enc_bitmap,
>>>>>>>>>>> +   .reset_page_enc_bitmap = 
>>>>>>>>>>> + svm_reset_page_enc_bitmap,
>>>>>>>>>> We don't need to initialize the intel ops to NULL ? 
>>>>>>>>>> It's not initialized in the previous patch either.
>>>>>>>>>>
>>>>>>>>>>>    };
>>>>>>>>> This struct is declared as "static storage", so won't 
>>>>>>>>> the non-initialized members be 0 ?
>>>>>>>>
>>>>>>>> Correct. Although, I see that 'nested_enable_evmcs' is 
>>>>>>>> explicitly initialized. We should maintain the convention, perhaps.
>>>>>>>>
>>>>>>>>>>>    static int __init svm_init(void) diff --git 
>>>>>>>>>>> a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 
>>>>>>>>>>> 05e953b2ec61..2127ed937f53 100644
>>>>>>>>>>> --- a/arch/x86/kvm/x86.c
>>>>>>>>>>> +++ b/arch/x86/kvm/x86.c
>>>>>>>>>>> @@ -5250,6 +5250,12 @@ long kvm_arch_vm_ioctl(struct file *filp,
>>>>>>>>>>>                     r = kvm_x86_ops->set_page_enc_bitmap(kvm, &bitmap);
>>>>>>>>>>>             break;
>>>>>>>>>>>     }
>>>>>>>>>>> +   case KVM_PAGE_ENC_BITMAP_RESET: {
>>>>>>>>>>> +           r = -ENOTTY;
>>>>>>>>>>> +           if (kvm_x86_ops->reset_page_enc_bitmap)
>>>>>>>>>>> +                   r = kvm_x86_ops->reset_page_enc_bitmap(kvm);
>>>>>>>>>>> +           break;
>>>>>>>>>>> +   }
>>>>>>>>>>>     default:
>>>>>>>>>>>             r = -ENOTTY;
>>>>>>>>>>>     }
>>>>>>>>>>> diff --git a/include/uapi/linux/kvm.h 
>>>>>>>>>>> b/include/uapi/linux/kvm.h index 
>>>>>>>>>>> b4b01d47e568..0884a581fc37 100644
>>>>>>>>>>> --- a/include/uapi/linux/kvm.h
>>>>>>>>>>> +++ b/include/uapi/linux/kvm.h
>>>>>>>>>>> @@ -1490,6 +1490,7 @@ struct kvm_enc_region {
>>>>>>>>>>>    #define KVM_GET_PAGE_ENC_BITMAP  _IOW(KVMIO, 0xc5, struct kvm_page_enc_bitmap)
>>>>>>>>>>>    #define KVM_SET_PAGE_ENC_BITMAP  _IOW(KVMIO, 0xc6, 
>>>>>>>>>>> struct kvm_page_enc_bitmap)
>>>>>>>>>>> +#define KVM_PAGE_ENC_BITMAP_RESET  _IO(KVMIO, 0xc7)
>>>>>>>>>>>    /* Secure Encrypted Virtualization command */
>>>>>>>>>>>    enum sev_cmd_id {
>>>>>>>>>> Reviewed-by: Krish Sadhukhan 
>>>>>>>>>> <krish.sadhukhan@oracle.com>
>>>>>>>
>>>>>>> Doesn't this overlap with the set ioctl? Yes, obviously, you 
>>>>>>> have to copy the new value down and do a bit more work, but 
>>>>>>> I don't think resetting the bitmap is going to be the 
>>>>>>> bottleneck on reboot. Seems excessive to add another ioctl for this.
>>>>>> The set ioctl is generally available/provided for the incoming 
>>>>>> VM to setup the page encryption bitmap, this reset ioctl is 
>>>>>> meant for the source VM as a simple interface to reset the whole page encryption bitmap.
>>>>>>
>>>>>> Thanks,
>>>>>> Ashish
>>>>>
>>>>> Hey Ashish,
>>>>>
>>>>> These seem very overlapping. I think this API should be refactored a bit.
>>>>>
>>>>> 1) Use kvm_vm_ioctl_enable_cap to control whether or not this 
>>>>> hypercall (and related feature bit) is offered to the VM, and 
>>>>> also the size of the buffer.
>>>> If you look at patch 13/14, i have added a new kvm para feature 
>>>> called "KVM_FEATURE_SEV_LIVE_MIGRATION" which indicates host 
>>>> support for SEV Live Migration and a new Custom MSR which the 
>>>> guest does a wrmsr to enable the Live Migration feature, so this 
>>>> is like the enable cap support.
>>>>
>>>> There are further extensions to this support i am adding, so patch 
>>>> 13/14 of this patch-set is still being enhanced and will have full 
>>>> support when i repost next.
>>>>
>>>>> 2) Use set for manipulating values in the bitmap, including 
>>>>> resetting the bitmap. Set the bitmap pointer to null if you want 
>>>>> to reset to all 0xFFs. When the bitmap pointer is set, it should 
>>>>> set the values to exactly what is pointed at, instead of only 
>>>>> clearing bits, as is done currently.
>>>> As i mentioned in my earlier email, the set api is supposed to be 
>>>> for the incoming VM, but if you really need to use it for the 
>>>> outgoing VM then it can be modified.
>>>>
>>>>> 3) Use get for fetching values from the kernel. Personally, I'd 
>>>>> require alignment of the base GFN to a multiple of 8 (but the 
>>>>> number of pages could be whatever), so you can just use a 
>>>>> memcpy. Optionally, you may want some way to tell userspace the 
>>>>> size of the existing buffer, so it can ensure that it can ask 
>>>>> for the entire buffer without having to track the size in 
>>>>> usermode (not strictly necessary, but nice to have since it 
>>>>> ensures that there is only one place that has to manage this value).
>>>>>
>>>>> If you want to expand or contract the bitmap, you can use enable 
>>>>> cap to adjust the size.
>>>> As being discussed on the earlier mail thread, we are doing this 
>>>> dynamically now by computing the guest RAM size when the 
>>>> set_user_memory_region ioctl is invoked. I believe that should 
>>>> handle the hot-plug and hot-unplug events too, as any hot memory 
>>>> updates will need KVM memslots to be updated.
>>> Ahh, sorry, forgot you mentioned this: yes this can work. Host needs 
>>> to be able to decide not to allocate, but this should be workable.
>>>>> If you don't want to offer the hypercall to the guest, don't 
>>>>> call the enable cap.
>>>>> This API avoids using up another ioctl. Ioctl space is somewhat 
>>>>> scarce. It also gives userspace fine grained control over the 
>>>>> buffer, so it can support both hot-plug and hot-unplug (or at 
>>>>> the very least it is not obviously incompatible with those). It 
>>>>> also gives userspace control over whether or not the feature is 
>>>>> offered. The hypercall isn't free, and being able to tell guests 
>>>>> to not call when the host wasn't going to migrate it anyway will be useful.
>>>>>
>>>> As i mentioned above, now the host indicates if it supports the 
>>>> Live Migration feature and the feature and the hypercall are only 
>>>> enabled on the host when the guest checks for this support and 
>>>> does a wrmsr() to enable the feature. Also the guest will not make 
>>>> the hypercall if the host does not indicate support for it.
>>> If my read of those patches was correct, the host will always 
>>> advertise support for the hypercall. And the only bit controlling 
>>> whether or not the hypercall is advertised is essentially the kernel 
>>> version. You need to rollout a new kernel to disable the hypercall.
>> Ahh, awesome, I see I misunderstood how the CPUID bits get passed
>> through: usermode can still override them. Forgot about the back and 
>> forth for CPUID with usermode. My point about informing the guest 
>> kernel is clearly moot. The host still needs the ability to prevent 
>> allocations, but that is more minor. Maybe use a flag on the memslots 
>> directly?
>> On second thought: burning the memslot flag for 30mb per tb of VM seems like a waste.
> Currently, I am still using the approach of a "unified" page encryption bitmap instead of a 
> bitmap per memslot, with the main change being that the resizing is now done only when
> memslots are updated, via the kvm_arch_commit_memory_region() interface.


Just a note: I believe kvm_arch_commit_memory_region() may be getting
called every time there is a change in a memory region (e.g. add,
update, delete, etc.), so your architecture-specific hook now needs to
be aware of all those changes and act accordingly. This basically
means that svm.c will probably need to understand all those memory
slot flags, etc. IMO, having a separate ioctl to hint the size makes
more sense if you are doing a unified bitmap, but if the bitmap is per
memslot then calculating the size from the memslot information makes more sense.



* Re: [PATCH v6 12/14] KVM: x86: Introduce KVM_PAGE_ENC_BITMAP_RESET ioctl
  2020-04-10 21:42                         ` Brijesh Singh
@ 2020-04-10 21:46                           ` Sean Christopherson
  2020-04-10 21:58                             ` Brijesh Singh
  0 siblings, 1 reply; 107+ messages in thread
From: Sean Christopherson @ 2020-04-10 21:46 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: Kalra, Ashish, Steve Rutherford, Krish Sadhukhan, Paolo Bonzini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joerg Roedel,
	Borislav Petkov, Lendacky, Thomas, X86 ML, KVM list, LKML,
	David Rientjes, Andy Lutomirski

On Fri, Apr 10, 2020 at 04:42:29PM -0500, Brijesh Singh wrote:
> 
> On 4/10/20 3:55 PM, Kalra, Ashish wrote:
> > [AMD Official Use Only - Internal Distribution Only]

Can you please resend the original mail without the above header, so us
non-AMD folks can follow along?  :-)


* Re: [PATCH v6 12/14] KVM: x86: Introduce KVM_PAGE_ENC_BITMAP_RESET ioctl
  2020-04-10 21:46                           ` Sean Christopherson
@ 2020-04-10 21:58                             ` Brijesh Singh
  0 siblings, 0 replies; 107+ messages in thread
From: Brijesh Singh @ 2020-04-10 21:58 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: brijesh.singh, Kalra, Ashish, Steve Rutherford, Krish Sadhukhan,
	Paolo Bonzini, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Joerg Roedel, Borislav Petkov, Lendacky, Thomas, X86 ML,
	KVM list, LKML, David Rientjes, Andy Lutomirski


On 4/10/20 4:46 PM, Sean Christopherson wrote:
> On Fri, Apr 10, 2020 at 04:42:29PM -0500, Brijesh Singh wrote:
>> On 4/10/20 3:55 PM, Kalra, Ashish wrote:
>>> [AMD Official Use Only - Internal Distribution Only]
> Can you please resend the original mail without the above header, so us
> non-AMD folks can follow along?  :-)

Haha :) looks like one of us probably used Outlook for responding. These
days IT has been auto-adding some of those tags. Let me resend with it
removed.



* Re: [PATCH v6 12/14] KVM: x86: Introduce KVM_PAGE_ENC_BITMAP_RESET ioctl
  2020-04-10 20:55                       ` Kalra, Ashish
  2020-04-10 21:42                         ` Brijesh Singh
@ 2020-04-10 22:02                         ` Brijesh Singh
  2020-04-11  0:35                           ` Ashish Kalra
  1 sibling, 1 reply; 107+ messages in thread
From: Brijesh Singh @ 2020-04-10 22:02 UTC (permalink / raw)
  To: Kalra, Ashish, Steve Rutherford
  Cc: brijesh.singh, Krish Sadhukhan, Paolo Bonzini, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Joerg Roedel, Borislav Petkov,
	Lendacky, Thomas, X86 ML, KVM list, LKML, David Rientjes,
	Andy Lutomirski

resend with internal distribution tag removed.


On 4/10/20 3:55 PM, Kalra, Ashish wrote:
[snip]
..
>
> Hello Steve,
>
> -----Original Message-----
> From: Steve Rutherford <srutherford@google.com> 
> Sent: Friday, April 10, 2020 3:19 PM
> To: Kalra, Ashish <Ashish.Kalra@amd.com>
> Cc: Krish Sadhukhan <krish.sadhukhan@oracle.com>; Paolo Bonzini <pbonzini@redhat.com>; Thomas Gleixner <tglx@linutronix.de>; Ingo Molnar <mingo@redhat.com>; H. Peter Anvin <hpa@zytor.com>; Joerg Roedel <joro@8bytes.org>; Borislav Petkov <bp@suse.de>; Lendacky, Thomas <Thomas.Lendacky@amd.com>; X86 ML <x86@kernel.org>; KVM list <kvm@vger.kernel.org>; LKML <linux-kernel@vger.kernel.org>; David Rientjes <rientjes@google.com>; Andy Lutomirski <luto@kernel.org>; Singh, Brijesh <brijesh.singh@amd.com>
> Subject: Re: [PATCH v6 12/14] KVM: x86: Introduce KVM_PAGE_ENC_BITMAP_RESET ioctl
>
> On Fri, Apr 10, 2020 at 1:16 PM Steve Rutherford <srutherford@google.com> wrote:
>> On Fri, Apr 10, 2020 at 11:14 AM Steve Rutherford 
>> <srutherford@google.com> wrote:
>>> On Thu, Apr 9, 2020 at 6:34 PM Ashish Kalra <ashish.kalra@amd.com> wrote:
>>>> Hello Steve,
>>>>
>>>> On Thu, Apr 09, 2020 at 05:59:56PM -0700, Steve Rutherford wrote:
>>>>> On Tue, Apr 7, 2020 at 6:52 PM Ashish Kalra <ashish.kalra@amd.com> wrote:
>>>>>> Hello Steve,
>>>>>>
>>>>>> On Tue, Apr 07, 2020 at 06:25:51PM -0700, Steve Rutherford wrote:
>>>>>>> On Mon, Apr 6, 2020 at 11:53 AM Krish Sadhukhan 
>>>>>>> <krish.sadhukhan@oracle.com> wrote:
>>>>>>>>
>>>>>>>> On 4/3/20 2:45 PM, Ashish Kalra wrote:
>>>>>>>>> On Fri, Apr 03, 2020 at 02:14:23PM -0700, Krish Sadhukhan wrote:
>>>>>>>>>> On 3/29/20 11:23 PM, Ashish Kalra wrote:
>>>>>>>>>>> From: Ashish Kalra <ashish.kalra@amd.com>
>>>>>>>>>>>
>>>>>>>>>>> This ioctl can be used by the application to reset the 
>>>>>>>>>>> page encryption bitmap managed by the KVM driver. A 
>>>>>>>>>>> typical usage for this ioctl is on VM reboot, on 
>>>>>>>>>>> reboot, we must reinitialize the bitmap.
>>>>>>>>>>>
>>>>>>>>>>> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
>>>>>>>>>>> ---
>>>>>>>>>>>    Documentation/virt/kvm/api.rst  | 13 +++++++++++++
>>>>>>>>>>>    arch/x86/include/asm/kvm_host.h |  1 +
>>>>>>>>>>>    arch/x86/kvm/svm.c              | 16 ++++++++++++++++
>>>>>>>>>>>    arch/x86/kvm/x86.c              |  6 ++++++
>>>>>>>>>>>    include/uapi/linux/kvm.h        |  1 +
>>>>>>>>>>>    5 files changed, 37 insertions(+)
>>>>>>>>>>>
>>>>>>>>>>> diff --git a/Documentation/virt/kvm/api.rst 
>>>>>>>>>>> b/Documentation/virt/kvm/api.rst index 
>>>>>>>>>>> 4d1004a154f6..a11326ccc51d 100644
>>>>>>>>>>> --- a/Documentation/virt/kvm/api.rst
>>>>>>>>>>> +++ b/Documentation/virt/kvm/api.rst
>>>>>>>>>>> @@ -4698,6 +4698,19 @@ During the guest live migration the outgoing guest exports its page encryption
>>>>>>>>>>>    bitmap, the KVM_SET_PAGE_ENC_BITMAP can be used to build the page encryption
>>>>>>>>>>>    bitmap for an incoming guest.
>>>>>>>>>>> +4.127 KVM_PAGE_ENC_BITMAP_RESET (vm ioctl)
>>>>>>>>>>> +-----------------------------------------
>>>>>>>>>>> +
>>>>>>>>>>> +:Capability: basic
>>>>>>>>>>> +:Architectures: x86
>>>>>>>>>>> +:Type: vm ioctl
>>>>>>>>>>> +:Parameters: none
>>>>>>>>>>> +:Returns: 0 on success, -1 on error
>>>>>>>>>>> +
>>>>>>>>>>> +The KVM_PAGE_ENC_BITMAP_RESET is used to reset the 
>>>>>>>>>>> +guest's page encryption bitmap during guest reboot and this is only done on the guest's boot vCPU.
>>>>>>>>>>> +
>>>>>>>>>>> +
>>>>>>>>>>>    5. The kvm_run structure
>>>>>>>>>>>    ======================== diff --git 
>>>>>>>>>>> a/arch/x86/include/asm/kvm_host.h 
>>>>>>>>>>> b/arch/x86/include/asm/kvm_host.h index 
>>>>>>>>>>> d30f770aaaea..a96ef6338cd2 100644
>>>>>>>>>>> --- a/arch/x86/include/asm/kvm_host.h
>>>>>>>>>>> +++ b/arch/x86/include/asm/kvm_host.h
>>>>>>>>>>> @@ -1273,6 +1273,7 @@ struct kvm_x86_ops {
>>>>>>>>>>>                             struct kvm_page_enc_bitmap *bmap);
>>>>>>>>>>>     int (*set_page_enc_bitmap)(struct kvm *kvm,
>>>>>>>>>>>                             struct kvm_page_enc_bitmap 
>>>>>>>>>>> *bmap);
>>>>>>>>>>> +   int (*reset_page_enc_bitmap)(struct kvm *kvm);
>>>>>>>>>>>    };
>>>>>>>>>>>    struct kvm_arch_async_pf { diff --git 
>>>>>>>>>>> a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index 
>>>>>>>>>>> 313343a43045..c99b0207a443 100644
>>>>>>>>>>> --- a/arch/x86/kvm/svm.c
>>>>>>>>>>> +++ b/arch/x86/kvm/svm.c
>>>>>>>>>>> @@ -7797,6 +7797,21 @@ static int svm_set_page_enc_bitmap(struct kvm *kvm,
>>>>>>>>>>>     return ret;
>>>>>>>>>>>    }
>>>>>>>>>>> +static int svm_reset_page_enc_bitmap(struct kvm *kvm) 
>>>>>>>>>>> +{
>>>>>>>>>>> +   struct kvm_sev_info *sev = 
>>>>>>>>>>> +&to_kvm_svm(kvm)->sev_info;
>>>>>>>>>>> +
>>>>>>>>>>> +   if (!sev_guest(kvm))
>>>>>>>>>>> +           return -ENOTTY;
>>>>>>>>>>> +
>>>>>>>>>>> +   mutex_lock(&kvm->lock);
>>>>>>>>>>> +   /* by default all pages should be marked encrypted */
>>>>>>>>>>> +   if (sev->page_enc_bmap_size)
>>>>>>>>>>> +           bitmap_fill(sev->page_enc_bmap, sev->page_enc_bmap_size);
>>>>>>>>>>> +   mutex_unlock(&kvm->lock);
>>>>>>>>>>> +   return 0;
>>>>>>>>>>> +}
>>>>>>>>>>> +
>>>>>>>>>>>    static int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
>>>>>>>>>>>    {
>>>>>>>>>>>     struct kvm_sev_cmd sev_cmd; @@ -8203,6 +8218,7 @@ 
>>>>>>>>>>> static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
>>>>>>>>>>>     .page_enc_status_hc = svm_page_enc_status_hc,
>>>>>>>>>>>     .get_page_enc_bitmap = svm_get_page_enc_bitmap,
>>>>>>>>>>>     .set_page_enc_bitmap = svm_set_page_enc_bitmap,
>>>>>>>>>>> +   .reset_page_enc_bitmap = 
>>>>>>>>>>> + svm_reset_page_enc_bitmap,
>>>>>>>>>> We don't need to initialize the intel ops to NULL ? 
>>>>>>>>>> It's not initialized in the previous patch either.
>>>>>>>>>>
>>>>>>>>>>>    };
>>>>>>>>> This struct is declared as "static storage", so won't 
>>>>>>>>> the non-initialized members be 0 ?
>>>>>>>>
>>>>>>>> Correct. Although, I see that 'nested_enable_evmcs' is 
>>>>>>>> explicitly initialized. We should maintain the convention, perhaps.
>>>>>>>>
>>>>>>>>>>>    static int __init svm_init(void) diff --git 
>>>>>>>>>>> a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 
>>>>>>>>>>> 05e953b2ec61..2127ed937f53 100644
>>>>>>>>>>> --- a/arch/x86/kvm/x86.c
>>>>>>>>>>> +++ b/arch/x86/kvm/x86.c
>>>>>>>>>>> @@ -5250,6 +5250,12 @@ long kvm_arch_vm_ioctl(struct file *filp,
>>>>>>>>>>>                     r = kvm_x86_ops->set_page_enc_bitmap(kvm, &bitmap);
>>>>>>>>>>>             break;
>>>>>>>>>>>     }
>>>>>>>>>>> +   case KVM_PAGE_ENC_BITMAP_RESET: {
>>>>>>>>>>> +           r = -ENOTTY;
>>>>>>>>>>> +           if (kvm_x86_ops->reset_page_enc_bitmap)
>>>>>>>>>>> +                   r = kvm_x86_ops->reset_page_enc_bitmap(kvm);
>>>>>>>>>>> +           break;
>>>>>>>>>>> +   }
>>>>>>>>>>>     default:
>>>>>>>>>>>             r = -ENOTTY;
>>>>>>>>>>>     }
>>>>>>>>>>> diff --git a/include/uapi/linux/kvm.h 
>>>>>>>>>>> b/include/uapi/linux/kvm.h index 
>>>>>>>>>>> b4b01d47e568..0884a581fc37 100644
>>>>>>>>>>> --- a/include/uapi/linux/kvm.h
>>>>>>>>>>> +++ b/include/uapi/linux/kvm.h
>>>>>>>>>>> @@ -1490,6 +1490,7 @@ struct kvm_enc_region {
>>>>>>>>>>>    #define KVM_GET_PAGE_ENC_BITMAP  _IOW(KVMIO, 0xc5, struct kvm_page_enc_bitmap)
>>>>>>>>>>>    #define KVM_SET_PAGE_ENC_BITMAP  _IOW(KVMIO, 0xc6, 
>>>>>>>>>>> struct kvm_page_enc_bitmap)
>>>>>>>>>>> +#define KVM_PAGE_ENC_BITMAP_RESET  _IO(KVMIO, 0xc7)
>>>>>>>>>>>    /* Secure Encrypted Virtualization command */
>>>>>>>>>>>    enum sev_cmd_id {
>>>>>>>>>> Reviewed-by: Krish Sadhukhan 
>>>>>>>>>> <krish.sadhukhan@oracle.com>
>>>>>>>
>>>>>>> Doesn't this overlap with the set ioctl? Yes, obviously, you 
>>>>>>> have to copy the new value down and do a bit more work, but 
>>>>>>> I don't think resetting the bitmap is going to be the 
>>>>>>> bottleneck on reboot. Seems excessive to add another ioctl for this.
>>>>>> The set ioctl is generally available/provided for the incoming 
>>>>>> VM to setup the page encryption bitmap, this reset ioctl is 
>>>>>> meant for the source VM as a simple interface to reset the whole page encryption bitmap.
>>>>>>
>>>>>> Thanks,
>>>>>> Ashish
>>>>>
>>>>> Hey Ashish,
>>>>>
>>>>> These seem very overlapping. I think this API should be refactored a bit.
>>>>>
>>>>> 1) Use kvm_vm_ioctl_enable_cap to control whether or not this 
>>>>> hypercall (and related feature bit) is offered to the VM, and 
>>>>> also the size of the buffer.
>>>> If you look at patch 13/14, i have added a new kvm para feature 
>>>> called "KVM_FEATURE_SEV_LIVE_MIGRATION" which indicates host 
>>>> support for SEV Live Migration and a new Custom MSR which the 
>>>> guest does a wrmsr to enable the Live Migration feature, so this 
>>>> is like the enable cap support.
>>>>
>>>> There are further extensions to this support i am adding, so patch 
>>>> 13/14 of this patch-set is still being enhanced and will have full 
>>>> support when i repost next.
>>>>
>>>>> 2) Use set for manipulating values in the bitmap, including 
>>>>> resetting the bitmap. Set the bitmap pointer to null if you want 
>>>>> to reset to all 0xFFs. When the bitmap pointer is set, it should 
>>>>> set the values to exactly what is pointed at, instead of only 
>>>>> clearing bits, as is done currently.
>>>> As i mentioned in my earlier email, the set api is supposed to be 
>>>> for the incoming VM, but if you really need to use it for the 
>>>> outgoing VM then it can be modified.
>>>>
>>>>> 3) Use get for fetching values from the kernel. Personally, I'd 
>>>>> require alignment of the base GFN to a multiple of 8 (but the 
>>>>> number of pages could be whatever), so you can just use a 
>>>>> memcpy. Optionally, you may want some way to tell userspace the 
>>>>> size of the existing buffer, so it can ensure that it can ask 
>>>>> for the entire buffer without having to track the size in 
>>>>> usermode (not strictly necessary, but nice to have since it 
>>>>> ensures that there is only one place that has to manage this value).
>>>>>
>>>>> If you want to expand or contract the bitmap, you can use enable 
>>>>> cap to adjust the size.
>>>> As being discussed on the earlier mail thread, we are doing this 
>>>> dynamically now by computing the guest RAM size when the 
>>>> set_user_memory_region ioctl is invoked. I believe that should 
>>>> handle the hot-plug and hot-unplug events too, as any hot memory 
>>>> updates will need KVM memslots to be updated.
>>> Ahh, sorry, forgot you mentioned this: yes this can work. Host needs 
>>> to be able to decide not to allocate, but this should be workable.
>>>>> If you don't want to offer the hypercall to the guest, don't 
>>>>> call the enable cap.
>>>>> This API avoids using up another ioctl. Ioctl space is somewhat 
>>>>> scarce. It also gives userspace fine grained control over the 
>>>>> buffer, so it can support both hot-plug and hot-unplug (or at 
>>>>> the very least it is not obviously incompatible with those). It 
>>>>> also gives userspace control over whether or not the feature is 
>>>>> offered. The hypercall isn't free, and being able to tell guests 
>>>>> to not call when the host wasn't going to migrate it anyway will be useful.
>>>>>
>>>> As i mentioned above, now the host indicates if it supports the 
>>>> Live Migration feature and the feature and the hypercall are only 
>>>> enabled on the host when the guest checks for this support and 
>>>> does a wrmsr() to enable the feature. Also the guest will not make 
>>>> the hypercall if the host does not indicate support for it.
>>> If my read of those patches was correct, the host will always 
>>> advertise support for the hypercall. And the only bit controlling 
>>> whether or not the hypercall is advertised is essentially the kernel 
>>> version. You need to rollout a new kernel to disable the hypercall.
>> Ahh, awesome, I see I misunderstood how the CPUID bits get passed
>> through: usermode can still override them. Forgot about the back and 
>> forth for CPUID with usermode. My point about informing the guest 
>> kernel is clearly moot. The host still needs the ability to prevent 
>> allocations, but that is more minor. Maybe use a flag on the memslots 
>> directly?
>> On second thought: burning the memslot flag for 30mb per tb of VM seems like a waste.
> Currently, I am still using the approach of a "unified" page encryption bitmap
> instead of a bitmap per memslot, with the main change being that the resizing
> is now only done when memslots are updated via the
> kvm_arch_commit_memory_region() interface.


Just a note: I believe kvm_arch_commit_memory_region() may be getting
called every time there is a change in a memory region (e.g. add,
update, delete). So your architecture-specific hook now needs to be
well aware of all those changes and act accordingly. This basically
means that svm.c will probably need to understand all those memory
slot flags, etc. IMO, having a separate ioctl to hint the size makes more
sense if you are doing a unified bitmap, but if the bitmap is per memslot
then calculating the size based on the memslot information makes more sense.
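
The resize-on-commit flow discussed above can be sketched in plain C. This is
a userspace simulation for illustration only -- `enc_bitmap`, `enc_resize`,
etc. are invented names, not symbols from the patch set -- showing a unified
bitmap that grows when a memslot update raises the highest mapped GFN, with
pages newly covered by the bitmap defaulting to "encrypted", as in the patches:

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

#define BITS_PER_LONG (CHAR_BIT * sizeof(unsigned long))
#define LONGS_FOR(n)  (((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)

/* One bit per guest page frame; a set bit means "encrypted". */
struct enc_bitmap {
	unsigned long *bits;
	size_t nbits;		/* highest mapped GFN + 1 */
};

/*
 * Grow the unified bitmap when a memslot commit raises the highest
 * mapped GFN.  Existing encryption state is preserved; newly covered
 * pages default to encrypted (all bits set), matching the SEV default.
 * There is deliberately no shrink on delete/update.
 */
static int enc_resize(struct enc_bitmap *b, size_t new_nbits)
{
	unsigned long *n;

	if (new_nbits <= b->nbits)
		return 0;
	n = malloc(LONGS_FOR(new_nbits) * sizeof(*n));
	if (!n)
		return -1;
	memset(n, 0xff, LONGS_FOR(new_nbits) * sizeof(*n));
	if (b->bits) {
		memcpy(n, b->bits, LONGS_FOR(b->nbits) * sizeof(*n));
		free(b->bits);
	}
	b->bits = n;
	b->nbits = new_nbits;
	return 0;
}

static int enc_test_bit(const struct enc_bitmap *b, size_t gfn)
{
	return (b->bits[gfn / BITS_PER_LONG] >> (gfn % BITS_PER_LONG)) & 1;
}

/* Guest hypercall path would end up clearing bits for shared pages. */
static void enc_clear_bit(struct enc_bitmap *b, size_t gfn)
{
	b->bits[gfn / BITS_PER_LONG] &= ~(1UL << (gfn % BITS_PER_LONG));
}
```

The point of the sketch is that only a growth in the highest mapped GFN
triggers a reallocation, so a typical guest sees one resize at launch.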


^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v6 12/14] KVM: x86: Introduce KVM_PAGE_ENC_BITMAP_RESET ioctl
  2020-04-10 22:02                         ` Brijesh Singh
@ 2020-04-11  0:35                           ` Ashish Kalra
  0 siblings, 0 replies; 107+ messages in thread
From: Ashish Kalra @ 2020-04-11  0:35 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: Steve Rutherford, Krish Sadhukhan, Paolo Bonzini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joerg Roedel,
	Borislav Petkov, Lendacky, Thomas, X86 ML, KVM list, LKML,
	David Rientjes, Andy Lutomirski

Hello Brijesh,

On Fri, Apr 10, 2020 at 05:02:46PM -0500, Brijesh Singh wrote:
> resend with internal distribution tag removed.
> 
> 
> On 4/10/20 3:55 PM, Kalra, Ashish wrote:
> [snip]
> > Currently, I am still using the approach of a "unified" page encryption bitmap
> > instead of a bitmap per memslot, with the main change being that the resizing
> > is now only done when memslots are updated via the
> > kvm_arch_commit_memory_region() interface.
> 
> 
> Just a note: I believe kvm_arch_commit_memory_region() may be getting
> called every time there is a change in a memory region (e.g. add,
> update, delete). So your architecture-specific hook now needs to be
> well aware of all those changes and act accordingly. This basically
> means that svm.c will probably need to understand all those memory
> slot flags, etc. IMO, having a separate ioctl to hint the size makes more
> sense if you are doing a unified bitmap, but if the bitmap is per memslot
> then calculating the size based on the memslot information makes more sense.
> 

If, instead of a unified bitmap, I use a bitmap-per-memslot approach, the
svm/sev code will still need to have knowledge of the memslot flags,
etc.; that information will be required because the svm/sev code will be
responsible for syncing the page encryption bitmap to userspace, not the
generic KVM x86 code which gets invoked for the dirty page bitmap sync.

Currently, the architecture hooks need to be aware of the
KVM_MR_ADD/KVM_MR_DELETE flags, and the resize will only happen
if the highest guest PA that is mapped by a memslot gets modified;
otherwise there will typically be just one resize, at the initial
guest launch.

Thanks,
Ashish
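
Stepping back from the thread: the bitmap lifecycle being debated above can
be sketched from the userspace consumer's point of view. This is a hedged
illustration in plain C -- `page_is_encrypted`, `enc_bitmap_reset`, and
`choose_transport` are invented names; the real consumer would be QEMU
driving KVM_GET_PAGE_ENC_BITMAP and KVM_PAGE_ENC_BITMAP_RESET -- showing the
reset-to-encrypted-on-reboot semantics and the per-page transport decision
during migration:

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

#define BITS_PER_LONG (CHAR_BIT * sizeof(unsigned long))
#define LONGS_FOR(n)  (((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)

/* Userspace view of the page-encryption bitmap: set bit => the page is
 * encrypted with the guest-specific key. */
static int page_is_encrypted(const unsigned long *enc_bmap, size_t gfn)
{
	return (enc_bmap[gfn / BITS_PER_LONG] >> (gfn % BITS_PER_LONG)) & 1;
}

/* Mirrors what the kernel-side reset does (bitmap_fill): on guest
 * reboot every page goes back to the "encrypted" default. */
static void enc_bitmap_reset(unsigned long *enc_bmap, size_t nbits)
{
	memset(enc_bmap, 0xff, LONGS_FOR(nbits) * sizeof(unsigned long));
}

/* During migration the sender consults the bitmap per dirty page:
 * encrypted pages must go through the SEV firmware commands
 * (SEND_UPDATE_DATA); shared pages can be copied as ordinary RAM. */
enum xfer_method { XFER_SEV_FW, XFER_PLAIN };

static enum xfer_method choose_transport(const unsigned long *enc_bmap,
					 size_t gfn)
{
	return page_is_encrypted(enc_bmap, gfn) ? XFER_SEV_FW : XFER_PLAIN;
}
```

Under these assumptions, a page the guest has hypercalled as shared is sent
plainly until a reboot resets the whole bitmap back to encrypted.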

^ permalink raw reply	[flat|nested] 107+ messages in thread

end of thread, other threads:[~2020-04-11  0:35 UTC | newest]

Thread overview: 107+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-30  6:19 [PATCH v6 00/14] Add AMD SEV guest live migration support Ashish Kalra
2020-03-30  6:19 ` [PATCH v6 01/14] KVM: SVM: Add KVM_SEV SEND_START command Ashish Kalra
2020-04-02  6:27   ` Venu Busireddy
2020-04-02 12:59     ` Brijesh Singh
2020-04-02 16:37       ` Venu Busireddy
2020-04-02 18:04         ` Brijesh Singh
2020-04-02 18:57           ` Venu Busireddy
2020-04-02 19:17             ` Brijesh Singh
2020-04-02 19:43               ` Venu Busireddy
2020-04-02 20:04                 ` Brijesh Singh
2020-04-02 20:19                   ` Venu Busireddy
2020-04-02 17:51   ` Krish Sadhukhan
2020-04-02 18:38     ` Brijesh Singh
2020-03-30  6:20 ` [PATCH v6 02/14] KVM: SVM: Add KVM_SEND_UPDATE_DATA command Ashish Kalra
2020-04-02 17:55   ` Venu Busireddy
2020-04-02 20:13   ` Krish Sadhukhan
2020-03-30  6:20 ` [PATCH v6 03/14] KVM: SVM: Add KVM_SEV_SEND_FINISH command Ashish Kalra
2020-04-02 18:17   ` Venu Busireddy
2020-04-02 20:15   ` Krish Sadhukhan
2020-03-30  6:21 ` [PATCH v6 04/14] KVM: SVM: Add support for KVM_SEV_RECEIVE_START command Ashish Kalra
2020-04-02 21:35   ` Venu Busireddy
2020-04-02 22:09   ` Krish Sadhukhan
2020-03-30  6:21 ` [PATCH v6 05/14] KVM: SVM: Add KVM_SEV_RECEIVE_UPDATE_DATA command Ashish Kalra
2020-04-02 22:25   ` Krish Sadhukhan
2020-04-02 22:29   ` Venu Busireddy
2020-04-07  0:49     ` Steve Rutherford
2020-03-30  6:21 ` [PATCH v6 06/14] KVM: SVM: Add KVM_SEV_RECEIVE_FINISH command Ashish Kalra
2020-04-02 22:24   ` Venu Busireddy
2020-04-02 22:27   ` Krish Sadhukhan
2020-04-07  0:57     ` Steve Rutherford
2020-03-30  6:21 ` [PATCH v6 07/14] KVM: x86: Add AMD SEV specific Hypercall3 Ashish Kalra
2020-04-02 22:36   ` Venu Busireddy
2020-04-02 23:54   ` Krish Sadhukhan
2020-04-07  1:22     ` Steve Rutherford
2020-03-30  6:22 ` [PATCH v6 08/14] KVM: X86: Introduce KVM_HC_PAGE_ENC_STATUS hypercall Ashish Kalra
2020-04-03  0:00   ` Venu Busireddy
2020-04-03  1:31   ` Krish Sadhukhan
2020-04-03  1:57     ` Ashish Kalra
2020-04-03  2:58       ` Ashish Kalra
2020-04-06 22:27         ` Krish Sadhukhan
2020-04-07  2:17   ` Steve Rutherford
2020-04-07  5:27     ` Ashish Kalra
2020-04-08  0:01       ` Steve Rutherford
2020-04-08  0:29         ` Brijesh Singh
2020-04-08  0:35           ` Steve Rutherford
2020-04-08  1:17             ` Ashish Kalra
2020-04-08  1:38               ` Steve Rutherford
2020-04-08  2:34                 ` Brijesh Singh
2020-04-08  3:18                   ` Ashish Kalra
2020-04-09 16:18                     ` Ashish Kalra
2020-04-09 20:41                       ` Steve Rutherford
2020-03-30  6:22 ` [PATCH v6 09/14] KVM: x86: Introduce KVM_GET_PAGE_ENC_BITMAP ioctl Ashish Kalra
2020-04-03 18:30   ` Venu Busireddy
2020-04-03 20:18   ` Krish Sadhukhan
2020-04-03 20:47     ` Ashish Kalra
2020-04-06 22:07       ` Krish Sadhukhan
2020-04-03 20:55     ` Venu Busireddy
2020-04-03 21:01       ` Ashish Kalra
2020-03-30  6:22 ` [PATCH v6 10/14] mm: x86: Invoke hypercall when page encryption status is changed Ashish Kalra
2020-04-03 21:07   ` Krish Sadhukhan
2020-04-03 21:30     ` Ashish Kalra
2020-04-03 21:36   ` Venu Busireddy
2020-03-30  6:22 ` [PATCH v6 11/14] KVM: x86: Introduce KVM_SET_PAGE_ENC_BITMAP ioctl Ashish Kalra
2020-04-03 21:10   ` Krish Sadhukhan
2020-04-03 21:46   ` Venu Busireddy
2020-04-08  0:26   ` Steve Rutherford
2020-04-08  1:48     ` Ashish Kalra
2020-04-10  0:06       ` Steve Rutherford
2020-04-10  1:23         ` Ashish Kalra
2020-04-10 18:08           ` Steve Rutherford
2020-03-30  6:23 ` [PATCH v6 12/14] KVM: x86: Introduce KVM_PAGE_ENC_BITMAP_RESET ioctl Ashish Kalra
2020-04-03 21:14   ` Krish Sadhukhan
2020-04-03 21:45     ` Ashish Kalra
2020-04-06 18:52       ` Krish Sadhukhan
2020-04-08  1:25         ` Steve Rutherford
2020-04-08  1:52           ` Ashish Kalra
2020-04-10  0:59             ` Steve Rutherford
2020-04-10  1:34               ` Ashish Kalra
2020-04-10 18:14                 ` Steve Rutherford
2020-04-10 20:16                   ` Steve Rutherford
2020-04-10 20:18                     ` Steve Rutherford
2020-04-10 20:55                       ` Kalra, Ashish
2020-04-10 21:42                         ` Brijesh Singh
2020-04-10 21:46                           ` Sean Christopherson
2020-04-10 21:58                             ` Brijesh Singh
2020-04-10 22:02                         ` Brijesh Singh
2020-04-11  0:35                           ` Ashish Kalra
2020-04-03 22:01   ` Venu Busireddy
2020-03-30  6:23 ` [PATCH v6 13/14] KVM: x86: Introduce new KVM_FEATURE_SEV_LIVE_MIGRATION feature & Custom MSR Ashish Kalra
2020-03-30 15:52   ` Brijesh Singh
2020-03-30 16:42     ` Ashish Kalra
     [not found]     ` <20200330162730.GA21567@ashkalra_ubuntu_server>
     [not found]       ` <1de5e95f-4485-f2ff-aba8-aa8b15564796@amd.com>
     [not found]         ` <20200331171336.GA24050@ashkalra_ubuntu_server>
     [not found]           ` <20200401070931.GA8562@ashkalra_ubuntu_server>
2020-04-02 23:29             ` Ashish Kalra
2020-04-03 23:46   ` Krish Sadhukhan
2020-03-30  6:23 ` [PATCH v6 14/14] KVM: x86: Add kexec support for SEV Live Migration Ashish Kalra
2020-03-30 16:00   ` Brijesh Singh
2020-03-30 16:45     ` Ashish Kalra
2020-03-31 14:26       ` Brijesh Singh
2020-04-02 23:34         ` Ashish Kalra
2020-04-03 12:57   ` Dave Young
2020-04-04  0:55   ` Krish Sadhukhan
2020-04-04 21:57     ` Ashish Kalra
2020-04-06 18:37       ` Krish Sadhukhan
2020-03-30 17:24 ` [PATCH v6 00/14] Add AMD SEV guest live migration support Venu Busireddy
2020-03-30 18:28   ` Ashish Kalra
2020-03-30 19:13     ` Venu Busireddy
2020-03-30 21:52       ` Ashish Kalra
2020-03-31 14:42         ` Venu Busireddy