* [PATCH v11 00/19] KVM: s390: pv: implement lazy destroy for reboot
@ 2022-06-03  6:56 Claudio Imbrenda
  2022-06-03  6:56 ` [PATCH v11 01/19] KVM: s390: pv: leak the topmost page table when destroy fails Claudio Imbrenda
                   ` (19 more replies)
  0 siblings, 20 replies; 36+ messages in thread
From: Claudio Imbrenda @ 2022-06-03  6:56 UTC (permalink / raw)
  To: kvm
  Cc: borntraeger, frankja, thuth, pasic, david, linux-s390,
	linux-kernel, scgl, mimu, nrb

Previously, when a protected VM was rebooted or when it was shut down,
its memory was made unprotected, and then the protected VM itself was
destroyed. Looping over the whole address space can take some time,
considering the overhead of the various Ultravisor Calls (UVCs). This
means that a reboot or a shutdown could take a potentially long time,
depending on the amount of memory used.

This patchseries implements a deferred destroy mechanism for protected
guests. When a protected guest is destroyed, its memory can be cleared
in the background, allowing the guest to restart or terminate significantly
faster than before.

There are 2 possibilities when a protected VM is torn down:
* it still has an address space associated (reboot case)
* it does not have an address space anymore (shutdown case)

For the reboot case, two new commands are available for the
KVM_S390_PV_COMMAND ioctl:

KVM_PV_ASYNC_DISABLE_PREPARE: prepares the current protected VM for
asynchronous teardown. The current VM will then continue immediately
as non-protected. If a protected VM had already been set aside without
starting the teardown process, this call will fail. In that case, the
userspace process should issue a normal KVM_PV_DISABLE.

KVM_PV_ASYNC_DISABLE: tears down the protected VM previously set aside
for asynchronous teardown. This PV command should ideally be issued by
userspace from a separate thread. If a fatal signal is received (or
the process terminates naturally), the command will terminate
immediately without completing.

The idea is that userspace should first issue the
KVM_PV_ASYNC_DISABLE_PREPARE command, and in case of success, create a
new thread and issue KVM_PV_ASYNC_DISABLE from there. This also allows
for proper accounting of the CPU time needed for the asynchronous
teardown.
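
For illustration only, a minimal userspace sketch of this flow could look
like the following. It assumes the struct kvm_pv_cmd layout from
<linux/kvm.h> and the command names introduced by this series; the helper
names are made up and error handling is mostly omitted:

  #include <pthread.h>
  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  static void *async_disable_fn(void *arg)
  {
          int vm_fd = (int)(long)arg;
          struct kvm_pv_cmd cmd = { .cmd = KVM_PV_ASYNC_DISABLE };

          /* performs the actual teardown; CPU time is accounted to this thread */
          ioctl(vm_fd, KVM_S390_PV_COMMAND, &cmd);
          return NULL;
  }

  static int pv_disable_for_reboot(int vm_fd)
  {
          struct kvm_pv_cmd cmd = { .cmd = KVM_PV_ASYNC_DISABLE_PREPARE };
          pthread_t tid;

          if (ioctl(vm_fd, KVM_S390_PV_COMMAND, &cmd)) {
                  /* a VM is already set aside: fall back to the synchronous path */
                  cmd.cmd = KVM_PV_DISABLE;
                  return ioctl(vm_fd, KVM_S390_PV_COMMAND, &cmd);
          }
          /* the current VM continues as non-protected; clean up the old one in the background */
          return pthread_create(&tid, NULL, async_disable_fn, (void *)(long)vm_fd);
  }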

This means that the same address space can have memory belonging to
more than one protected guest; only one of them will be running, and the
others will in fact not even have any CPUs.

The shutdown case should be dealt with in userspace (e.g. using
clone(CLONE_VM)).
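
Purely as an illustration of that idea (this is not part of this series;
the helper below is hypothetical and glosses over many details): a process
could hand the final teardown of the address space to a child created with
CLONE_VM, so that the per-page destroy work triggered by the mm teardown
runs after the main process has already exited:

  #define _GNU_SOURCE
  #include <sched.h>
  #include <unistd.h>
  #include <stdlib.h>

  static int pipefd[2];

  static int teardown_helper(void *arg)
  {
          char c;

          close(pipefd[1]);
          /* read() returns 0 (EOF) once the parent has exited and closed its end */
          read(pipefd[0], &c, 1);
          /*
           * Exiting as the last user of the shared mm triggers the mm
           * teardown, and with it the destroy page UVCs, outside the
           * main process.
           */
          return 0;
  }

  static void setup_shutdown_helper(void)
  {
          char *stack = malloc(256 * 1024);

          pipe(pipefd);
          /* share the mm, but not the file descriptor table */
          clone(teardown_helper, stack + 256 * 1024, CLONE_VM | SIGCHLD, NULL);
  }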

A module parameter is also provided to disable the new functionality,
which is otherwise enabled by default. This should not be an issue
since the new functionality is opt-in anyway. It is mainly intended to
aid debugging.

v10->v11
* rebase
* improve comments and patch descriptions
* rename s390_remove_old_asce to s390_unlist_old_asce
* rename DESTROY_LOOP_THRESHOLD to GATHER_GET_PAGES
* rename module parameter lazy_destroy to async_destroy
* move the WRITE_ONCE to be right after the UVC in patch 13
* improve handling leftover secure VMs in patch 14
* lock only when needed in patch 15, instead of always locking and then
  unlocking and locking again
* refactor should_export_before_import to make it more readable

v9->v10
* improved and expanded comments, fix typos
* add new patch: perform destroy configuration UVC before clearing
  memory for unconditional deinit_vm (instead of afterwards)
* explicitly initialize kvm->arch.pv.async_deinit in kvm_arch_init_vm
* do not try to call the destroy fast UVC in the MMU notifier if it is
  not available

v8->v9
* rebased
* added dependency on MMU_NOTIFIER for KVM in arch/s390/kvm/Kconfig
* add support for the Destroy Secure Configuration Fast UVC
* minor fixes

v7->v8
* switched patches 8 and 9
* improved comments, documentation and patch descriptions
* remove mm notifier when the struct kvm is torn down
* removed useless locks in the mm notifier
* use _ASCE_ORIGIN instead of PAGE_MASK for ASCEs
* cleanup of some compiler warnings
* remove some harmless but useless duplicate code
* the last parameter of __s390_uv_destroy_range is now bool
* rename the KVM capability to KVM_CAP_S390_PROTECTED_ASYNC_DISABLE

v6->v7
* moved INIT_LIST_HEAD inside spinlock in patch 1
* improved commit messages in patch 2
* added missing locks in patch 3
* added and expanded some comments in patch 11
* rebased

v5->v6
* completely reworked the series
* removed kernel thread for asynchronous teardown
* added new commands to KVM_S390_PV_COMMAND ioctl

v4->v5
* fixed and improved some patch descriptions
* added some comments to better explain what's going on
* use vma_lookup instead of find_vma
* rename is_protected to protected_count since now it's used as a counter

v3->v4
* added patch 2
* split patch 3
* removed the shutdown part -- will be a separate patchseries
* moved the patch introducing the module parameter

v2->v3
* added definitions for CC return codes for the UVC instruction
* improved make_secure_pte:
  - renamed rc to cc
  - added comments to explain why returning -EAGAIN is ok
* fixed kvm_s390_pv_replace_asce and kvm_s390_pv_remove_old_asce:
  - renamed
  - added locking
  - moved to gmap.c
* do proper error management in do_secure_storage_access instead of
  trying again hoping to get a different exception
* fix outdated patch descriptions

v1->v2
* rebased on a more recent kernel
* improved/expanded some patch descriptions
* improved/expanded some comments
* added patch 1, which prevents stall notification when the system is
  under heavy load.
* rename some members of struct deferred_priv to improve readability
* avoid a use-after-free bug of the struct mm in case of shutdown
* add missing return when lazy destroy is disabled
* add support for OOM notifier

Claudio Imbrenda (19):
  KVM: s390: pv: leak the topmost page table when destroy fails
  KVM: s390: pv: handle secure storage violations for protected guests
  KVM: s390: pv: handle secure storage exceptions for normal guests
  KVM: s390: pv: refactor s390_reset_acc
  KVM: s390: pv: usage counter instead of flag
  KVM: s390: pv: add export before import
  KVM: s390: pv: module parameter to fence asynchronous destroy
  KVM: s390: pv: clear the state without memset
  KVM: s390: pv: Add kvm_s390_cpus_from_pv to kvm-s390.h and add
    documentation
  KVM: s390: pv: add mmu_notifier
  s390/mm: KVM: pv: when tearing down, try to destroy protected pages
  KVM: s390: pv: refactoring of kvm_s390_pv_deinit_vm
  KVM: s390: pv: destroy the configuration before its memory
  KVM: s390: pv: cleanup leftover protected VMs if needed
  KVM: s390: pv: asynchronous destroy for reboot
  KVM: s390: pv: api documentation for asynchronous destroy
  KVM: s390: pv: add KVM_CAP_S390_PROTECTED_ASYNC_DISABLE
  KVM: s390: pv: avoid export before import if possible
  KVM: s390: pv: support for Destroy fast UVC

 Documentation/virt/kvm/api.rst      |  25 ++-
 arch/s390/include/asm/gmap.h        |  39 +++-
 arch/s390/include/asm/kvm_host.h    |   4 +
 arch/s390/include/asm/mmu.h         |   2 +-
 arch/s390/include/asm/mmu_context.h |   2 +-
 arch/s390/include/asm/pgtable.h     |  21 +-
 arch/s390/include/asm/uv.h          |  11 +
 arch/s390/kernel/uv.c               |  79 +++++++
 arch/s390/kvm/Kconfig               |   1 +
 arch/s390/kvm/kvm-s390.c            |  80 ++++++-
 arch/s390/kvm/kvm-s390.h            |   3 +
 arch/s390/kvm/pv.c                  | 325 +++++++++++++++++++++++++++-
 arch/s390/mm/fault.c                |  23 +-
 arch/s390/mm/gmap.c                 | 173 ++++++++++++---
 include/uapi/linux/kvm.h            |   3 +
 15 files changed, 743 insertions(+), 48 deletions(-)

-- 
2.36.1



* [PATCH v11 01/19] KVM: s390: pv: leak the topmost page table when destroy fails
  2022-06-03  6:56 [PATCH v11 00/19] KVM: s390: pv: implement lazy destroy for reboot Claudio Imbrenda
@ 2022-06-03  6:56 ` Claudio Imbrenda
  2022-06-03  6:56 ` [PATCH v11 02/19] KVM: s390: pv: handle secure storage violations for protected guests Claudio Imbrenda
                   ` (18 subsequent siblings)
  19 siblings, 0 replies; 36+ messages in thread
From: Claudio Imbrenda @ 2022-06-03  6:56 UTC (permalink / raw)
  To: kvm
  Cc: borntraeger, frankja, thuth, pasic, david, linux-s390,
	linux-kernel, scgl, mimu, nrb

Each secure guest must have a unique ASCE (address space control
element); we must prevent new guests from using the same page for their
ASCE, to avoid errors.

Since the ASCE mostly consists of the address of the topmost page table
(plus some flags), we must not return that memory to the pool unless
the ASCE is no longer in use.

Only a successful Destroy Secure Configuration UVC will make the ASCE
reusable again.

If the Destroy Configuration UVC fails, the page cannot be reused for a
secure guest (neither as an ASCE nor for other memory areas). To avoid
a collision, it must not be used again. This is a permanent error and
the page becomes in practice unusable, so we set it aside and leak it.
On failure we already leak other memory that belongs to the ultravisor
(i.e. the variable and base storage for a guest) and not leaking the
topmost page table was an oversight.

This error (and thus the leakage) should not happen unless the hardware
is broken or KVM has some unknown serious bug.

Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Fixes: 29b40f105ec8d55 ("KVM: s390: protvirt: Add initial vm and cpu lifecycle handling")
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
---
 arch/s390/include/asm/gmap.h |  2 +
 arch/s390/kvm/pv.c           |  9 ++--
 arch/s390/mm/gmap.c          | 86 ++++++++++++++++++++++++++++++++++++
 3 files changed, 94 insertions(+), 3 deletions(-)

diff --git a/arch/s390/include/asm/gmap.h b/arch/s390/include/asm/gmap.h
index 40264f60b0da..f4073106e1f3 100644
--- a/arch/s390/include/asm/gmap.h
+++ b/arch/s390/include/asm/gmap.h
@@ -148,4 +148,6 @@ void gmap_sync_dirty_log_pmd(struct gmap *gmap, unsigned long dirty_bitmap[4],
 			     unsigned long gaddr, unsigned long vmaddr);
 int gmap_mark_unmergeable(void);
 void s390_reset_acc(struct mm_struct *mm);
+void s390_unlist_old_asce(struct gmap *gmap);
+int s390_replace_asce(struct gmap *gmap);
 #endif /* _ASM_S390_GMAP_H */
diff --git a/arch/s390/kvm/pv.c b/arch/s390/kvm/pv.c
index cc7c9599f43e..8eee3fc414e5 100644
--- a/arch/s390/kvm/pv.c
+++ b/arch/s390/kvm/pv.c
@@ -161,10 +161,13 @@ int kvm_s390_pv_deinit_vm(struct kvm *kvm, u16 *rc, u16 *rrc)
 	atomic_set(&kvm->mm->context.is_protected, 0);
 	KVM_UV_EVENT(kvm, 3, "PROTVIRT DESTROY VM: rc %x rrc %x", *rc, *rrc);
 	WARN_ONCE(cc, "protvirt destroy vm failed rc %x rrc %x", *rc, *rrc);
-	/* Inteded memory leak on "impossible" error */
-	if (!cc)
+	/* Intended memory leak on "impossible" error */
+	if (!cc) {
 		kvm_s390_pv_dealloc_vm(kvm);
-	return cc ? -EIO : 0;
+		return 0;
+	}
+	s390_replace_asce(kvm->arch.gmap);
+	return -EIO;
 }
 
 int kvm_s390_pv_init_vm(struct kvm *kvm, u16 *rc, u16 *rrc)
diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
index 1ac73917a8d3..bd07157f834f 100644
--- a/arch/s390/mm/gmap.c
+++ b/arch/s390/mm/gmap.c
@@ -2721,3 +2721,89 @@ void s390_reset_acc(struct mm_struct *mm)
 	mmput(mm);
 }
 EXPORT_SYMBOL_GPL(s390_reset_acc);
+
+/**
+ * s390_unlist_old_asce - Remove the topmost level of page tables from the
+ * list of page tables of the gmap.
+ * @gmap: the gmap whose table is to be removed
+ *
+ * On s390x, KVM keeps a list of all pages containing the page tables of the
+ * gmap (the CRST list). This list is used at tear down time to free all
+ * pages that are now not needed anymore.
+ *
+ * This function removes the topmost page of the tree (the one pointed to by
+ * the ASCE) from the CRST list.
+ *
+ * This means that it will not be freed when the VM is torn down, and needs
+ * to be handled separately by the caller, unless a leak is actually
+ * intended. Notice that this function will only remove the page from the
+ * list, the page will still be used as a top level page table (and ASCE).
+ */
+void s390_unlist_old_asce(struct gmap *gmap)
+{
+	struct page *old;
+
+	old = virt_to_page(gmap->table);
+	spin_lock(&gmap->guest_table_lock);
+	list_del(&old->lru);
+	/*
+	 * Sometimes the topmost page might need to be "removed" multiple
+	 * times, for example if the VM is rebooted into secure mode several
+	 * times concurrently, or if s390_replace_asce fails after calling
+ * s390_unlist_old_asce and is attempted again later. In that case
+	 * the old asce has been removed from the list, and therefore it
+	 * will not be freed when the VM terminates, but the ASCE is still
+	 * in use and still pointed to.
+	 * A subsequent call to replace_asce will follow the pointer and try
+	 * to remove the same page from the list again.
+	 * Therefore it's necessary that the page of the ASCE has valid
+	 * pointers, so list_del can work (and do nothing) without
+	 * dereferencing stale or invalid pointers.
+	 */
+	INIT_LIST_HEAD(&old->lru);
+	spin_unlock(&gmap->guest_table_lock);
+}
+EXPORT_SYMBOL_GPL(s390_unlist_old_asce);
+
+/**
+ * s390_replace_asce - Try to replace the current ASCE of a gmap with a copy
+ * @gmap: the gmap whose ASCE needs to be replaced
+ *
+ * If the allocation of the new top level page table fails, the ASCE is not
+ * replaced.
+ * In any case, the old ASCE is always removed from the gmap CRST list.
+ * Therefore the caller has to make sure to save a pointer to it
+ * beforehand, unless a leak is actually intended.
+ */
+int s390_replace_asce(struct gmap *gmap)
+{
+	unsigned long asce;
+	struct page *page;
+	void *table;
+
+	s390_unlist_old_asce(gmap);
+
+	page = alloc_pages(GFP_KERNEL_ACCOUNT, CRST_ALLOC_ORDER);
+	if (!page)
+		return -ENOMEM;
+	table = page_to_virt(page);
+	memcpy(table, gmap->table, 1UL << (CRST_ALLOC_ORDER + PAGE_SHIFT));
+
+	/*
+	 * The caller has to deal with the old ASCE, but here we make sure
+	 * the new one is properly added to the CRST list, so that
+	 * it will be freed when the VM is torn down.
+	 */
+	spin_lock(&gmap->guest_table_lock);
+	list_add(&page->lru, &gmap->crst_list);
+	spin_unlock(&gmap->guest_table_lock);
+
+	/* Set new table origin while preserving existing ASCE control bits */
+	asce = (gmap->asce & ~_ASCE_ORIGIN) | __pa(table);
+	WRITE_ONCE(gmap->asce, asce);
+	WRITE_ONCE(gmap->mm->context.gmap_asce, asce);
+	WRITE_ONCE(gmap->table, table);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(s390_replace_asce);
-- 
2.36.1



* [PATCH v11 02/19] KVM: s390: pv: handle secure storage violations for protected guests
  2022-06-03  6:56 [PATCH v11 00/19] KVM: s390: pv: implement lazy destroy for reboot Claudio Imbrenda
  2022-06-03  6:56 ` [PATCH v11 01/19] KVM: s390: pv: leak the topmost page table when destroy fails Claudio Imbrenda
@ 2022-06-03  6:56 ` Claudio Imbrenda
  2022-06-03  6:56 ` [PATCH v11 03/19] KVM: s390: pv: handle secure storage exceptions for normal guests Claudio Imbrenda
                   ` (17 subsequent siblings)
  19 siblings, 0 replies; 36+ messages in thread
From: Claudio Imbrenda @ 2022-06-03  6:56 UTC (permalink / raw)
  To: kvm
  Cc: borntraeger, frankja, thuth, pasic, david, linux-s390,
	linux-kernel, scgl, mimu, nrb

A secure storage violation is triggered when a protected guest tries to
access secure memory that has been mapped erroneously, or that belongs
to a different protected guest or to the ultravisor.

With upcoming patches, protected guests will be able to trigger secure
storage violations in normal operation. This happens for example if a
protected guest is rebooted with deferred destroy enabled and the new
guest is also protected.

When the new protected guest touches pages that have not yet been
destroyed, and thus are accounted to the previous protected guest, a
secure storage violation is raised.

This patch adds handling of secure storage violations for protected
guests.

This exception is handled by first trying to destroy the page, because
it is expected to belong to a defunct protected guest where a destroy
should be possible. Note that a secure page can only be destroyed if
its protected VM does not have any CPUs, which only happens when the
protected VM is being terminated. If that fails, a normal export of
the page is attempted.

This means that pages that trigger the exception will be made
non-secure (in one way or another) before attempting to use them again
for a different secure guest.

Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Acked-by: Janosch Frank <frankja@linux.ibm.com>
---
 arch/s390/include/asm/uv.h |  1 +
 arch/s390/kernel/uv.c      | 55 ++++++++++++++++++++++++++++++++++++++
 arch/s390/mm/fault.c       | 10 +++++++
 3 files changed, 66 insertions(+)

diff --git a/arch/s390/include/asm/uv.h b/arch/s390/include/asm/uv.h
index cfea7b77a5b8..ba64e0be03bb 100644
--- a/arch/s390/include/asm/uv.h
+++ b/arch/s390/include/asm/uv.h
@@ -378,6 +378,7 @@ static inline int is_prot_virt_host(void)
 }
 
 int gmap_make_secure(struct gmap *gmap, unsigned long gaddr, void *uvcb);
+int gmap_destroy_page(struct gmap *gmap, unsigned long gaddr);
 int uv_destroy_owned_page(unsigned long paddr);
 int uv_convert_from_secure(unsigned long paddr);
 int uv_convert_owned_from_secure(unsigned long paddr);
diff --git a/arch/s390/kernel/uv.c b/arch/s390/kernel/uv.c
index a5425075dd25..2754471cc789 100644
--- a/arch/s390/kernel/uv.c
+++ b/arch/s390/kernel/uv.c
@@ -334,6 +334,61 @@ int gmap_convert_to_secure(struct gmap *gmap, unsigned long gaddr)
 }
 EXPORT_SYMBOL_GPL(gmap_convert_to_secure);
 
+/**
+ * gmap_destroy_page - Destroy a guest page.
+ * @gmap: the gmap of the guest
+ * @gaddr: the guest address to destroy
+ *
+ * An attempt will be made to destroy the given guest page. If the attempt
+ * fails, an attempt is made to export the page. If both attempts fail, an
+ * appropriate error is returned.
+ */
+int gmap_destroy_page(struct gmap *gmap, unsigned long gaddr)
+{
+	struct vm_area_struct *vma;
+	unsigned long uaddr;
+	struct page *page;
+	int rc;
+
+	rc = -EFAULT;
+	mmap_read_lock(gmap->mm);
+
+	uaddr = __gmap_translate(gmap, gaddr);
+	if (IS_ERR_VALUE(uaddr))
+		goto out;
+	vma = vma_lookup(gmap->mm, uaddr);
+	if (!vma)
+		goto out;
+	/*
+	 * Huge pages should not be able to become secure
+	 */
+	if (is_vm_hugetlb_page(vma))
+		goto out;
+
+	rc = 0;
+	/* we take an extra reference here */
+	page = follow_page(vma, uaddr, FOLL_WRITE | FOLL_GET);
+	if (IS_ERR_OR_NULL(page))
+		goto out;
+	rc = uv_destroy_owned_page(page_to_phys(page));
+	/*
+	 * Fault handlers can race; it is possible that two CPUs will fault
+	 * on the same secure page. One CPU can destroy the page, reboot,
+	 * re-enter secure mode and import it, while the second CPU was
+	 * stuck at the beginning of the handler. At some point the second
+	 * CPU will be able to progress, and it will not be able to destroy
+	 * the page. In that case we do not want to terminate the process,
+	 * we instead try to export the page.
+	 */
+	if (rc)
+		rc = uv_convert_owned_from_secure(page_to_phys(page));
+	put_page(page);
+out:
+	mmap_read_unlock(gmap->mm);
+	return rc;
+}
+EXPORT_SYMBOL_GPL(gmap_destroy_page);
+
 /*
  * To be called with the page locked or with an extra reference! This will
  * prevent gmap_make_secure from touching the page concurrently. Having 2
diff --git a/arch/s390/mm/fault.c b/arch/s390/mm/fault.c
index e173b6187ad5..af1ac49168fb 100644
--- a/arch/s390/mm/fault.c
+++ b/arch/s390/mm/fault.c
@@ -837,6 +837,16 @@ NOKPROBE_SYMBOL(do_non_secure_storage_access);
 
 void do_secure_storage_violation(struct pt_regs *regs)
 {
+	unsigned long gaddr = regs->int_parm_long & __FAIL_ADDR_MASK;
+	struct gmap *gmap = (struct gmap *)S390_lowcore.gmap;
+
+	/*
+	 * If the VM has been rebooted, its address space might still contain
+	 * secure pages from the previous boot.
+	 * Clear the page so it can be reused.
+	 */
+	if (!gmap_destroy_page(gmap, gaddr))
+		return;
 	/*
 	 * Either KVM messed up the secure guest mapping or the same
 	 * page is mapped into multiple secure guests.
-- 
2.36.1



* [PATCH v11 03/19] KVM: s390: pv: handle secure storage exceptions for normal guests
  2022-06-03  6:56 [PATCH v11 00/19] KVM: s390: pv: implement lazy destroy for reboot Claudio Imbrenda
  2022-06-03  6:56 ` [PATCH v11 01/19] KVM: s390: pv: leak the topmost page table when destroy fails Claudio Imbrenda
  2022-06-03  6:56 ` [PATCH v11 02/19] KVM: s390: pv: handle secure storage violations for protected guests Claudio Imbrenda
@ 2022-06-03  6:56 ` Claudio Imbrenda
  2022-06-03  6:56 ` [PATCH v11 04/19] KVM: s390: pv: refactor s390_reset_acc Claudio Imbrenda
                   ` (16 subsequent siblings)
  19 siblings, 0 replies; 36+ messages in thread
From: Claudio Imbrenda @ 2022-06-03  6:56 UTC (permalink / raw)
  To: kvm
  Cc: borntraeger, frankja, thuth, pasic, david, linux-s390,
	linux-kernel, scgl, mimu, nrb

With upcoming patches, normal guests might touch secure pages.

This patch extends the existing exception handler to convert the pages
to non-secure even when the exception is triggered by a normal guest.

This can happen for example when a secure guest reboots; the first
stage of a secure guest is non-secure, and in general a secure guest
can reboot into non-secure mode.

If the secure memory of the previous boot has not been cleared up
completely yet (which will be allowed to happen in an upcoming patch),
a non-secure guest might touch secure memory, which will need to be
handled properly.

This means that gmap faults must be handled and not cause termination
of the process. The handling is the same as for userspace accesses: it is
enough to translate the gmap address to a user address and then let the
normal user fault code handle it.

Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
---
 arch/s390/mm/fault.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/arch/s390/mm/fault.c b/arch/s390/mm/fault.c
index af1ac49168fb..ee7871f770fb 100644
--- a/arch/s390/mm/fault.c
+++ b/arch/s390/mm/fault.c
@@ -754,6 +754,7 @@ void do_secure_storage_access(struct pt_regs *regs)
 	struct vm_area_struct *vma;
 	struct mm_struct *mm;
 	struct page *page;
+	struct gmap *gmap;
 	int rc;
 
 	/*
@@ -783,6 +784,17 @@ void do_secure_storage_access(struct pt_regs *regs)
 	}
 
 	switch (get_fault_type(regs)) {
+	case GMAP_FAULT:
+		mm = current->mm;
+		gmap = (struct gmap *)S390_lowcore.gmap;
+		mmap_read_lock(mm);
+		addr = __gmap_translate(gmap, addr);
+		mmap_read_unlock(mm);
+		if (IS_ERR_VALUE(addr)) {
+			do_fault_error(regs, VM_ACCESS_FLAGS, VM_FAULT_BADMAP);
+			break;
+		}
+		fallthrough;
 	case USER_FAULT:
 		mm = current->mm;
 		mmap_read_lock(mm);
@@ -811,7 +823,6 @@ void do_secure_storage_access(struct pt_regs *regs)
 		if (rc)
 			BUG();
 		break;
-	case GMAP_FAULT:
 	default:
 		do_fault_error(regs, VM_READ | VM_WRITE, VM_FAULT_BADMAP);
 		WARN_ON_ONCE(1);
-- 
2.36.1



* [PATCH v11 04/19] KVM: s390: pv: refactor s390_reset_acc
  2022-06-03  6:56 [PATCH v11 00/19] KVM: s390: pv: implement lazy destroy for reboot Claudio Imbrenda
                   ` (2 preceding siblings ...)
  2022-06-03  6:56 ` [PATCH v11 03/19] KVM: s390: pv: handle secure storage exceptions for normal guests Claudio Imbrenda
@ 2022-06-03  6:56 ` Claudio Imbrenda
  2022-06-03  6:56 ` [PATCH v11 05/19] KVM: s390: pv: usage counter instead of flag Claudio Imbrenda
                   ` (15 subsequent siblings)
  19 siblings, 0 replies; 36+ messages in thread
From: Claudio Imbrenda @ 2022-06-03  6:56 UTC (permalink / raw)
  To: kvm
  Cc: borntraeger, frankja, thuth, pasic, david, linux-s390,
	linux-kernel, scgl, mimu, nrb

Refactor s390_reset_acc so that it can be reused in upcoming patches.

We don't want to hold all the locks used in a walk_page_range for too
long, and the destroy page UVC does take some time to complete.
Therefore we quickly gather the pages to destroy, and then destroy them
without holding all the locks.

The new refactored function can optionally return early, without
completing, if a fatal signal is pending (returning an appropriate
error code). Two wrappers are provided to call the new function.

Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Nico Boehr <nrb@linux.ibm.com>
---
 arch/s390/include/asm/gmap.h | 37 +++++++++++++-
 arch/s390/kvm/pv.c           | 12 ++++-
 arch/s390/mm/gmap.c          | 95 +++++++++++++++++++++++++-----------
 3 files changed, 112 insertions(+), 32 deletions(-)

diff --git a/arch/s390/include/asm/gmap.h b/arch/s390/include/asm/gmap.h
index f4073106e1f3..21ae3b3c3bb5 100644
--- a/arch/s390/include/asm/gmap.h
+++ b/arch/s390/include/asm/gmap.h
@@ -147,7 +147,42 @@ int gmap_mprotect_notify(struct gmap *, unsigned long start,
 void gmap_sync_dirty_log_pmd(struct gmap *gmap, unsigned long dirty_bitmap[4],
 			     unsigned long gaddr, unsigned long vmaddr);
 int gmap_mark_unmergeable(void);
-void s390_reset_acc(struct mm_struct *mm);
 void s390_unlist_old_asce(struct gmap *gmap);
 int s390_replace_asce(struct gmap *gmap);
+void s390_uv_destroy_pfns(unsigned long count, unsigned long *pfns);
+int __s390_uv_destroy_range(struct mm_struct *mm, unsigned long start,
+			    unsigned long end, bool interruptible);
+
+/**
+ * s390_uv_destroy_range - Destroy a range of pages in the given mm.
+ * @mm: the mm to operate on
+ * @start: the start of the range
+ * @end: the end of the range
+ *
+ * This function will call cond_resched, so it should not generate stalls, but
+ * it will otherwise only return once it has completed.
+ */
+static inline void s390_uv_destroy_range(struct mm_struct *mm, unsigned long start,
+					 unsigned long end)
+{
+	(void)__s390_uv_destroy_range(mm, start, end, false);
+}
+
+/**
+ * s390_uv_destroy_range_interruptible - Destroy a range of pages in the
+ * given mm, but stop when a fatal signal is received.
+ * @mm: the mm to operate on
+ * @start: the start of the range
+ * @end: the end of the range
+ *
+ * This function will call cond_resched, so it should not generate stalls. If
+ * a fatal signal is received, it will return with -EINTR immediately,
+ * without finishing destroying the whole range. Upon successful
+ * completion, 0 is returned.
+ */
+static inline int s390_uv_destroy_range_interruptible(struct mm_struct *mm, unsigned long start,
+						      unsigned long end)
+{
+	return __s390_uv_destroy_range(mm, start, end, true);
+}
 #endif /* _ASM_S390_GMAP_H */
diff --git a/arch/s390/kvm/pv.c b/arch/s390/kvm/pv.c
index 8eee3fc414e5..bcbe10862f9f 100644
--- a/arch/s390/kvm/pv.c
+++ b/arch/s390/kvm/pv.c
@@ -12,6 +12,8 @@
 #include <asm/gmap.h>
 #include <asm/uv.h>
 #include <asm/mman.h>
+#include <linux/pagewalk.h>
+#include <linux/sched/mm.h>
 #include "kvm-s390.h"
 
 int kvm_s390_pv_destroy_cpu(struct kvm_vcpu *vcpu, u16 *rc, u16 *rrc)
@@ -152,8 +154,14 @@ int kvm_s390_pv_deinit_vm(struct kvm *kvm, u16 *rc, u16 *rrc)
 {
 	int cc;
 
-	/* make all pages accessible before destroying the guest */
-	s390_reset_acc(kvm->mm);
+	/*
+	 * if the mm still has a mapping, make all its pages accessible
+	 * before destroying the guest
+	 */
+	if (mmget_not_zero(kvm->mm)) {
+		s390_uv_destroy_range(kvm->mm, 0, TASK_SIZE);
+		mmput(kvm->mm);
+	}
 
 	cc = uv_cmd_nodata(kvm_s390_pv_get_handle(kvm),
 			   UVC_CMD_DESTROY_SEC_CONF, rc, rrc);
diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
index bd07157f834f..653b97053d5e 100644
--- a/arch/s390/mm/gmap.c
+++ b/arch/s390/mm/gmap.c
@@ -2683,44 +2683,81 @@ void s390_reset_cmma(struct mm_struct *mm)
 }
 EXPORT_SYMBOL_GPL(s390_reset_cmma);
 
-/*
- * make inaccessible pages accessible again
- */
-static int __s390_reset_acc(pte_t *ptep, unsigned long addr,
-			    unsigned long next, struct mm_walk *walk)
+#define GATHER_GET_PAGES 32
+
+struct reset_walk_state {
+	unsigned long next;
+	unsigned long count;
+	unsigned long pfns[GATHER_GET_PAGES];
+};
+
+static int s390_gather_pages(pte_t *ptep, unsigned long addr,
+			     unsigned long next, struct mm_walk *walk)
 {
+	struct reset_walk_state *p = walk->private;
 	pte_t pte = READ_ONCE(*ptep);
 
-	/* There is a reference through the mapping */
-	if (pte_present(pte))
-		WARN_ON_ONCE(uv_destroy_owned_page(pte_val(pte) & PAGE_MASK));
-
-	return 0;
+	if (pte_present(pte)) {
+		/* we have a reference from the mapping, take an extra one */
+		get_page(phys_to_page(pte_val(pte)));
+		p->pfns[p->count] = phys_to_pfn(pte_val(pte));
+		p->next = next;
+		p->count++;
+	}
+	return p->count >= GATHER_GET_PAGES;
 }
 
-static const struct mm_walk_ops reset_acc_walk_ops = {
-	.pte_entry		= __s390_reset_acc,
+static const struct mm_walk_ops gather_pages_ops = {
+	.pte_entry = s390_gather_pages,
 };
 
-#include <linux/sched/mm.h>
-void s390_reset_acc(struct mm_struct *mm)
+/*
+ * Call the Destroy secure page UVC on each page in the given array of PFNs.
+ * Each page needs to have an extra reference, which will be released here.
+ */
+void s390_uv_destroy_pfns(unsigned long count, unsigned long *pfns)
 {
-	if (!mm_is_protected(mm))
-		return;
-	/*
-	 * we might be called during
-	 * reset:                             we walk the pages and clear
-	 * close of all kvm file descriptors: we walk the pages and clear
-	 * exit of process on fd closure:     vma already gone, do nothing
-	 */
-	if (!mmget_not_zero(mm))
-		return;
-	mmap_read_lock(mm);
-	walk_page_range(mm, 0, TASK_SIZE, &reset_acc_walk_ops, NULL);
-	mmap_read_unlock(mm);
-	mmput(mm);
+	unsigned long i;
+
+	for (i = 0; i < count; i++) {
+		/* we always have an extra reference */
+		uv_destroy_owned_page(pfn_to_phys(pfns[i]));
+		/* get rid of the extra reference */
+		put_page(pfn_to_page(pfns[i]));
+		cond_resched();
+	}
+}
+EXPORT_SYMBOL_GPL(s390_uv_destroy_pfns);
+
+/**
+ * __s390_uv_destroy_range - Walk the given range of the given address
+ * space, and call the destroy secure page UVC on each page.
+ * Optionally exit early if a fatal signal is pending.
+ * @mm: the mm to operate on
+ * @start: the start of the range
+ * @end: the end of the range
+ * @interruptible: if true, stop when a fatal signal is received
+ * Return: 0 on success, -EINTR if the function stopped before completing
+ */
+int __s390_uv_destroy_range(struct mm_struct *mm, unsigned long start,
+			    unsigned long end, bool interruptible)
+{
+	struct reset_walk_state state = { .next = start };
+	int r = 1;
+
+	while (r > 0) {
+		state.count = 0;
+		mmap_read_lock(mm);
+		r = walk_page_range(mm, state.next, end, &gather_pages_ops, &state);
+		mmap_read_unlock(mm);
+		cond_resched();
+		s390_uv_destroy_pfns(state.count, state.pfns);
+		if (interruptible && fatal_signal_pending(current))
+			return -EINTR;
+	}
+	return 0;
 }
-EXPORT_SYMBOL_GPL(s390_reset_acc);
+EXPORT_SYMBOL_GPL(__s390_uv_destroy_range);
 
 /**
  * s390_unlist_old_asce - Remove the topmost level of page tables from the
-- 
2.36.1



* [PATCH v11 05/19] KVM: s390: pv: usage counter instead of flag
  2022-06-03  6:56 [PATCH v11 00/19] KVM: s390: pv: implement lazy destroy for reboot Claudio Imbrenda
                   ` (3 preceding siblings ...)
  2022-06-03  6:56 ` [PATCH v11 04/19] KVM: s390: pv: refactor s390_reset_acc Claudio Imbrenda
@ 2022-06-03  6:56 ` Claudio Imbrenda
  2022-06-03  6:56 ` [PATCH v11 06/19] KVM: s390: pv: add export before import Claudio Imbrenda
                   ` (14 subsequent siblings)
  19 siblings, 0 replies; 36+ messages in thread
From: Claudio Imbrenda @ 2022-06-03  6:56 UTC (permalink / raw)
  To: kvm
  Cc: borntraeger, frankja, thuth, pasic, david, linux-s390,
	linux-kernel, scgl, mimu, nrb

Use the new protected_count field as a counter instead of the old
is_protected flag. This will be used in upcoming patches.

Increment the counter when a secure configuration is created, and
decrement it when it is destroyed. Previously the flag was set when the
set secure parameters UVC was performed.

Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Acked-by: Janosch Frank <frankja@linux.ibm.com>
---
 arch/s390/include/asm/mmu.h         |  2 +-
 arch/s390/include/asm/mmu_context.h |  2 +-
 arch/s390/include/asm/pgtable.h     |  2 +-
 arch/s390/kvm/pv.c                  | 12 +++++++-----
 4 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/arch/s390/include/asm/mmu.h b/arch/s390/include/asm/mmu.h
index 82aae78e1315..1572b3634cdd 100644
--- a/arch/s390/include/asm/mmu.h
+++ b/arch/s390/include/asm/mmu.h
@@ -18,7 +18,7 @@ typedef struct {
 	unsigned long asce_limit;
 	unsigned long vdso_base;
 	/* The mmu context belongs to a secure guest. */
-	atomic_t is_protected;
+	atomic_t protected_count;
 	/*
 	 * The following bitfields need a down_write on the mm
 	 * semaphore when they are written to. As they are only
diff --git a/arch/s390/include/asm/mmu_context.h b/arch/s390/include/asm/mmu_context.h
index c7937f369e62..2a38af5a00c2 100644
--- a/arch/s390/include/asm/mmu_context.h
+++ b/arch/s390/include/asm/mmu_context.h
@@ -26,7 +26,7 @@ static inline int init_new_context(struct task_struct *tsk,
 	INIT_LIST_HEAD(&mm->context.gmap_list);
 	cpumask_clear(&mm->context.cpu_attach_mask);
 	atomic_set(&mm->context.flush_count, 0);
-	atomic_set(&mm->context.is_protected, 0);
+	atomic_set(&mm->context.protected_count, 0);
 	mm->context.gmap_asce = 0;
 	mm->context.flush_mm = 0;
 #ifdef CONFIG_PGSTE
diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index a397b072a580..f16403ba81ec 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -525,7 +525,7 @@ static inline int mm_has_pgste(struct mm_struct *mm)
 static inline int mm_is_protected(struct mm_struct *mm)
 {
 #ifdef CONFIG_PGSTE
-	if (unlikely(atomic_read(&mm->context.is_protected)))
+	if (unlikely(atomic_read(&mm->context.protected_count)))
 		return 1;
 #endif
 	return 0;
diff --git a/arch/s390/kvm/pv.c b/arch/s390/kvm/pv.c
index bcbe10862f9f..f3134d79f8e1 100644
--- a/arch/s390/kvm/pv.c
+++ b/arch/s390/kvm/pv.c
@@ -166,7 +166,8 @@ int kvm_s390_pv_deinit_vm(struct kvm *kvm, u16 *rc, u16 *rrc)
 	cc = uv_cmd_nodata(kvm_s390_pv_get_handle(kvm),
 			   UVC_CMD_DESTROY_SEC_CONF, rc, rrc);
 	WRITE_ONCE(kvm->arch.gmap->guest_handle, 0);
-	atomic_set(&kvm->mm->context.is_protected, 0);
+	if (!cc)
+		atomic_dec(&kvm->mm->context.protected_count);
 	KVM_UV_EVENT(kvm, 3, "PROTVIRT DESTROY VM: rc %x rrc %x", *rc, *rrc);
 	WARN_ONCE(cc, "protvirt destroy vm failed rc %x rrc %x", *rc, *rrc);
 	/* Intended memory leak on "impossible" error */
@@ -208,11 +209,14 @@ int kvm_s390_pv_init_vm(struct kvm *kvm, u16 *rc, u16 *rrc)
 	/* Outputs */
 	kvm->arch.pv.handle = uvcb.guest_handle;
 
+	atomic_inc(&kvm->mm->context.protected_count);
 	if (cc) {
-		if (uvcb.header.rc & UVC_RC_NEED_DESTROY)
+		if (uvcb.header.rc & UVC_RC_NEED_DESTROY) {
 			kvm_s390_pv_deinit_vm(kvm, &dummy, &dummy);
-		else
+		} else {
+			atomic_dec(&kvm->mm->context.protected_count);
 			kvm_s390_pv_dealloc_vm(kvm);
+		}
 		return -EIO;
 	}
 	kvm->arch.gmap->guest_handle = uvcb.guest_handle;
@@ -235,8 +239,6 @@ int kvm_s390_pv_set_sec_parms(struct kvm *kvm, void *hdr, u64 length, u16 *rc,
 	*rrc = uvcb.header.rrc;
 	KVM_UV_EVENT(kvm, 3, "PROTVIRT VM SET PARMS: rc %x rrc %x",
 		     *rc, *rrc);
-	if (!cc)
-		atomic_set(&kvm->mm->context.is_protected, 1);
 	return cc ? -EINVAL : 0;
 }
 
-- 
2.36.1



* [PATCH v11 06/19] KVM: s390: pv: add export before import
  2022-06-03  6:56 [PATCH v11 00/19] KVM: s390: pv: implement lazy destroy for reboot Claudio Imbrenda
                   ` (4 preceding siblings ...)
  2022-06-03  6:56 ` [PATCH v11 05/19] KVM: s390: pv: usage counter instead of flag Claudio Imbrenda
@ 2022-06-03  6:56 ` Claudio Imbrenda
  2022-06-03  6:56 ` [PATCH v11 07/19] KVM: s390: pv: module parameter to fence asynchronous destroy Claudio Imbrenda
                   ` (13 subsequent siblings)
  19 siblings, 0 replies; 36+ messages in thread
From: Claudio Imbrenda @ 2022-06-03  6:56 UTC (permalink / raw)
  To: kvm
  Cc: borntraeger, frankja, thuth, pasic, david, linux-s390,
	linux-kernel, scgl, mimu, nrb

Due to upcoming changes, it will be possible to temporarily have
multiple protected VMs in the same address space, although only one
will actually be active.

In that scenario, it is necessary to perform an export of every page
that is to be imported, since the hardware does not allow a page
belonging to a protected guest to be imported into a different
protected guest.

This also applies to pages that are shared, and thus accessible by the
host.

Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
---
 arch/s390/kernel/uv.c | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/arch/s390/kernel/uv.c b/arch/s390/kernel/uv.c
index 2754471cc789..02aca3c5dce1 100644
--- a/arch/s390/kernel/uv.c
+++ b/arch/s390/kernel/uv.c
@@ -234,6 +234,26 @@ static int make_secure_pte(pte_t *ptep, unsigned long addr,
 	return uvcb->rc == 0x10a ? -ENXIO : -EINVAL;
 }
 
+/**
+ * should_export_before_import - Determine whether an export is needed
+ * before an import-like operation
+ * @uvcb: the Ultravisor control block of the UVC to be performed
+ * @mm: the mm of the process
+ *
+ * Although considered as one, the Unpin Page UVC is not an actual import,
+ * so it is not affected.
+ *
+ * Also, no export is needed when there is only one protected VM, because the
+ * page cannot belong to the wrong VM in that case (there is no "other VM"
+ * it can belong to).
+ */
+static bool should_export_before_import(struct uv_cb_header *uvcb, struct mm_struct *mm)
+{
+	if (uvcb->cmd == UVC_CMD_UNPIN_PAGE_SHARED)
+		return false;
+	return atomic_read(&mm->context.protected_count) > 1;
+}
+
 /*
  * Requests the Ultravisor to make a page accessible to a guest.
  * If it's brought in the first time, it will be cleared. If
@@ -277,6 +297,8 @@ int gmap_make_secure(struct gmap *gmap, unsigned long gaddr, void *uvcb)
 
 	lock_page(page);
 	ptep = get_locked_pte(gmap->mm, uaddr, &ptelock);
+	if (should_export_before_import(uvcb, gmap->mm))
+		uv_convert_from_secure(page_to_phys(page));
 	rc = make_secure_pte(ptep, uaddr, page, uvcb);
 	pte_unmap_unlock(ptep, ptelock);
 	unlock_page(page);
-- 
2.36.1



* [PATCH v11 07/19] KVM: s390: pv: module parameter to fence asynchronous destroy
  2022-06-03  6:56 [PATCH v11 00/19] KVM: s390: pv: implement lazy destroy for reboot Claudio Imbrenda
                   ` (5 preceding siblings ...)
  2022-06-03  6:56 ` [PATCH v11 06/19] KVM: s390: pv: add export before import Claudio Imbrenda
@ 2022-06-03  6:56 ` Claudio Imbrenda
  2022-06-15  9:53   ` Janosch Frank
  2022-06-03  6:56 ` [PATCH v11 08/19] KVM: s390: pv: clear the state without memset Claudio Imbrenda
                   ` (12 subsequent siblings)
  19 siblings, 1 reply; 36+ messages in thread
From: Claudio Imbrenda @ 2022-06-03  6:56 UTC (permalink / raw)
  To: kvm
  Cc: borntraeger, frankja, thuth, pasic, david, linux-s390,
	linux-kernel, scgl, mimu, nrb

Add the module parameter "async_destroy", to allow the asynchronous
destroy mechanism to be switched off.  This might be useful for
debugging purposes.

The parameter is enabled by default.

Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
---
 arch/s390/kvm/kvm-s390.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 76ad6408cb2c..49e27b5d7c3a 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -206,6 +206,11 @@ unsigned int diag9c_forwarding_hz;
 module_param(diag9c_forwarding_hz, uint, 0644);
 MODULE_PARM_DESC(diag9c_forwarding_hz, "Maximum diag9c forwarding per second, 0 to turn off");
 
+/* allow asynchronous deinit for protected guests, enabled by default */
+static int async_destroy = 1;
+module_param(async_destroy, int, 0444);
+MODULE_PARM_DESC(async_destroy, "Asynchronous destroy for protected guests");
+
 /*
  * For now we handle at most 16 double words as this is what the s390 base
  * kernel handles and stores in the prefix page. If we ever need to go beyond
-- 
2.36.1



* [PATCH v11 08/19] KVM: s390: pv: clear the state without memset
  2022-06-03  6:56 [PATCH v11 00/19] KVM: s390: pv: implement lazy destroy for reboot Claudio Imbrenda
                   ` (6 preceding siblings ...)
  2022-06-03  6:56 ` [PATCH v11 07/19] KVM: s390: pv: module parameter to fence asynchronous destroy Claudio Imbrenda
@ 2022-06-03  6:56 ` Claudio Imbrenda
  2022-06-03  6:56 ` [PATCH v11 09/19] KVM: s390: pv: Add kvm_s390_cpus_from_pv to kvm-s390.h and add documentation Claudio Imbrenda
                   ` (11 subsequent siblings)
  19 siblings, 0 replies; 36+ messages in thread
From: Claudio Imbrenda @ 2022-06-03  6:56 UTC (permalink / raw)
  To: kvm
  Cc: borntraeger, frankja, thuth, pasic, david, linux-s390,
	linux-kernel, scgl, mimu, nrb

Do not use memset to clean the whole struct kvm_s390_pv; instead,
explicitly clear the fields that need to be cleared.

Upcoming patches will introduce new fields in the struct kvm_s390_pv
that will not need to be cleared.

Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
---
 arch/s390/kvm/pv.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/s390/kvm/pv.c b/arch/s390/kvm/pv.c
index f3134d79f8e1..9eca80afedce 100644
--- a/arch/s390/kvm/pv.c
+++ b/arch/s390/kvm/pv.c
@@ -16,6 +16,14 @@
 #include <linux/sched/mm.h>
 #include "kvm-s390.h"
 
+static void kvm_s390_clear_pv_state(struct kvm *kvm)
+{
+	kvm->arch.pv.handle = 0;
+	kvm->arch.pv.guest_len = 0;
+	kvm->arch.pv.stor_base = 0;
+	kvm->arch.pv.stor_var = NULL;
+}
+
 int kvm_s390_pv_destroy_cpu(struct kvm_vcpu *vcpu, u16 *rc, u16 *rrc)
 {
 	int cc;
@@ -110,7 +118,7 @@ static void kvm_s390_pv_dealloc_vm(struct kvm *kvm)
 	vfree(kvm->arch.pv.stor_var);
 	free_pages(kvm->arch.pv.stor_base,
 		   get_order(uv_info.guest_base_stor_len));
-	memset(&kvm->arch.pv, 0, sizeof(kvm->arch.pv));
+	kvm_s390_clear_pv_state(kvm);
 }
 
 static int kvm_s390_pv_alloc_vm(struct kvm *kvm)
-- 
2.36.1



* [PATCH v11 09/19] KVM: s390: pv: Add kvm_s390_cpus_from_pv to kvm-s390.h and add documentation
  2022-06-03  6:56 [PATCH v11 00/19] KVM: s390: pv: implement lazy destroy for reboot Claudio Imbrenda
                   ` (7 preceding siblings ...)
  2022-06-03  6:56 ` [PATCH v11 08/19] KVM: s390: pv: clear the state without memset Claudio Imbrenda
@ 2022-06-03  6:56 ` Claudio Imbrenda
  2022-06-03  6:56 ` [PATCH v11 10/19] KVM: s390: pv: add mmu_notifier Claudio Imbrenda
                   ` (10 subsequent siblings)
  19 siblings, 0 replies; 36+ messages in thread
From: Claudio Imbrenda @ 2022-06-03  6:56 UTC (permalink / raw)
  To: kvm
  Cc: borntraeger, frankja, thuth, pasic, david, linux-s390,
	linux-kernel, scgl, mimu, nrb

Future changes make it necessary to call this function from pv.c.

While we are at it, let's properly document kvm_s390_cpus_from_pv() and
kvm_s390_cpus_to_pv().

Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
---
 arch/s390/kvm/kvm-s390.c | 26 +++++++++++++++++++++++++-
 arch/s390/kvm/kvm-s390.h |  1 +
 2 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 49e27b5d7c3a..8973985593a9 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -2175,7 +2175,20 @@ static int kvm_s390_set_cmma_bits(struct kvm *kvm,
 	return r;
 }
 
-static int kvm_s390_cpus_from_pv(struct kvm *kvm, u16 *rcp, u16 *rrcp)
+/**
+ * kvm_s390_cpus_from_pv - Convert all protected vCPUs in a protected VM to
+ * non-protected.
+ * @kvm: the VM whose protected vCPUs are to be converted
+ * @rcp: return value for the RC field of the UVC (in case of error)
+ * @rrcp: return value for the RRC field of the UVC (in case of error)
+ *
+ * Does not stop in case of error; it tries to convert as many
+ * CPUs as possible. In case of error, the RC and RRC of the last error are
+ * returned.
+ *
+ * Return: 0 in case of success, otherwise -EIO
+ */
+int kvm_s390_cpus_from_pv(struct kvm *kvm, u16 *rcp, u16 *rrcp)
 {
 	struct kvm_vcpu *vcpu;
 	u16 rc, rrc;
@@ -2205,6 +2218,17 @@ static int kvm_s390_cpus_from_pv(struct kvm *kvm, u16 *rcp, u16 *rrcp)
 	return ret;
 }
 
+/**
+ * kvm_s390_cpus_to_pv - Convert all non-protected vCPUs in a protected VM
+ * to protected.
+ * @kvm: the VM whose protected vCPUs are to be converted
+ * @rc: return value for the RC field of the UVC (in case of error)
+ * @rrc: return value for the RRC field of the UVC (in case of error)
+ *
+ * Tries to undo the conversion in case of error.
+ *
+ * Return: 0 in case of success, otherwise -EIO
+ */
 static int kvm_s390_cpus_to_pv(struct kvm *kvm, u16 *rc, u16 *rrc)
 {
 	unsigned long i;
diff --git a/arch/s390/kvm/kvm-s390.h b/arch/s390/kvm/kvm-s390.h
index 497d52a83c78..d3abedafa7a8 100644
--- a/arch/s390/kvm/kvm-s390.h
+++ b/arch/s390/kvm/kvm-s390.h
@@ -374,6 +374,7 @@ int kvm_s390_vcpu_setup_cmma(struct kvm_vcpu *vcpu);
 void kvm_s390_vcpu_unsetup_cmma(struct kvm_vcpu *vcpu);
 void kvm_s390_set_cpu_timer(struct kvm_vcpu *vcpu, __u64 cputm);
 __u64 kvm_s390_get_cpu_timer(struct kvm_vcpu *vcpu);
+int kvm_s390_cpus_from_pv(struct kvm *kvm, u16 *rcp, u16 *rrcp);
 
 /* implemented in diag.c */
 int kvm_s390_handle_diag(struct kvm_vcpu *vcpu);
-- 
2.36.1



* [PATCH v11 10/19] KVM: s390: pv: add mmu_notifier
  2022-06-03  6:56 [PATCH v11 00/19] KVM: s390: pv: implement lazy destroy for reboot Claudio Imbrenda
                   ` (8 preceding siblings ...)
  2022-06-03  6:56 ` [PATCH v11 09/19] KVM: s390: pv: Add kvm_s390_cpus_from_pv to kvm-s390.h and add documentation Claudio Imbrenda
@ 2022-06-03  6:56 ` Claudio Imbrenda
  2022-06-08 12:02   ` Nico Boehr
  2022-06-03  6:56 ` [PATCH v11 11/19] s390/mm: KVM: pv: when tearing down, try to destroy protected pages Claudio Imbrenda
                   ` (9 subsequent siblings)
  19 siblings, 1 reply; 36+ messages in thread
From: Claudio Imbrenda @ 2022-06-03  6:56 UTC (permalink / raw)
  To: kvm
  Cc: borntraeger, frankja, thuth, pasic, david, linux-s390,
	linux-kernel, scgl, mimu, nrb

Add an mmu_notifier for protected VMs. The callback function is
triggered when the mm is torn down, and will attempt to convert all
protected vCPUs to non-protected. This allows the mm teardown to use
the destroy page UVC instead of export.

Also make KVM select CONFIG_MMU_NOTIFIER, needed to use mmu_notifiers.

Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Acked-by: Janosch Frank <frankja@linux.ibm.com>
---
 arch/s390/include/asm/kvm_host.h |  2 ++
 arch/s390/kvm/Kconfig            |  1 +
 arch/s390/kvm/kvm-s390.c         | 10 ++++++++++
 arch/s390/kvm/pv.c               | 26 ++++++++++++++++++++++++++
 4 files changed, 39 insertions(+)

diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 766028d54a3e..5824efe5fc9d 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -19,6 +19,7 @@
 #include <linux/kvm.h>
 #include <linux/seqlock.h>
 #include <linux/module.h>
+#include <linux/mmu_notifier.h>
 #include <asm/debug.h>
 #include <asm/cpu.h>
 #include <asm/fpu/api.h>
@@ -923,6 +924,7 @@ struct kvm_s390_pv {
 	u64 guest_len;
 	unsigned long stor_base;
 	void *stor_var;
+	struct mmu_notifier mmu_notifier;
 };
 
 struct kvm_arch{
diff --git a/arch/s390/kvm/Kconfig b/arch/s390/kvm/Kconfig
index 2e84d3922f7c..33f4ff909476 100644
--- a/arch/s390/kvm/Kconfig
+++ b/arch/s390/kvm/Kconfig
@@ -34,6 +34,7 @@ config KVM
 	select SRCU
 	select KVM_VFIO
 	select INTERVAL_TREE
+	select MMU_NOTIFIER
 	help
 	  Support hosting paravirtualized guest machines using the SIE
 	  virtualization capability on the mainframe. This should work
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 8973985593a9..fe1fa896def7 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -31,6 +31,7 @@
 #include <linux/sched/signal.h>
 #include <linux/string.h>
 #include <linux/pgtable.h>
+#include <linux/mmu_notifier.h>
 
 #include <asm/asm-offsets.h>
 #include <asm/lowcore.h>
@@ -2937,6 +2938,15 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
 	 */
 	if (kvm_s390_pv_get_handle(kvm))
 		kvm_s390_pv_deinit_vm(kvm, &rc, &rrc);
+	/*
+	 * Remove the mmu notifier only when the whole KVM VM is torn down,
+	 * and only if one was registered to begin with. If the VM is
+	 * currently not protected, but has previously been protected,
+	 * then it's possible that the notifier is still registered.
+	 */
+	if (kvm->arch.pv.mmu_notifier.ops)
+		mmu_notifier_unregister(&kvm->arch.pv.mmu_notifier, kvm->mm);
+
 	debug_unregister(kvm->arch.dbf);
 	free_page((unsigned long)kvm->arch.sie_page2);
 	if (!kvm_is_ucontrol(kvm))
diff --git a/arch/s390/kvm/pv.c b/arch/s390/kvm/pv.c
index 9eca80afedce..da1bae111fb1 100644
--- a/arch/s390/kvm/pv.c
+++ b/arch/s390/kvm/pv.c
@@ -14,6 +14,7 @@
 #include <asm/mman.h>
 #include <linux/pagewalk.h>
 #include <linux/sched/mm.h>
+#include <linux/mmu_notifier.h>
 #include "kvm-s390.h"
 
 static void kvm_s390_clear_pv_state(struct kvm *kvm)
@@ -187,6 +188,26 @@ int kvm_s390_pv_deinit_vm(struct kvm *kvm, u16 *rc, u16 *rrc)
 	return -EIO;
 }
 
+static void kvm_s390_pv_mmu_notifier_release(struct mmu_notifier *subscription,
+					     struct mm_struct *mm)
+{
+	struct kvm *kvm = container_of(subscription, struct kvm, arch.pv.mmu_notifier);
+	u16 dummy;
+
+	/*
+	 * No locking is needed since this is the last thread of the last user of this
+	 * struct mm.
+	 * When the struct kvm gets deinitialized, this notifier is also
+	 * unregistered. This means that if this notifier runs, then the
+	 * struct kvm is still valid.
+	 */
+	kvm_s390_cpus_from_pv(kvm, &dummy, &dummy);
+}
+
+static const struct mmu_notifier_ops kvm_s390_pv_mmu_notifier_ops = {
+	.release = kvm_s390_pv_mmu_notifier_release,
+};
+
 int kvm_s390_pv_init_vm(struct kvm *kvm, u16 *rc, u16 *rrc)
 {
 	struct uv_cb_cgc uvcb = {
@@ -228,6 +249,11 @@ int kvm_s390_pv_init_vm(struct kvm *kvm, u16 *rc, u16 *rrc)
 		return -EIO;
 	}
 	kvm->arch.gmap->guest_handle = uvcb.guest_handle;
+	/* Add the notifier only once. No races because we hold kvm->lock */
+	if (kvm->arch.pv.mmu_notifier.ops != &kvm_s390_pv_mmu_notifier_ops) {
+		kvm->arch.pv.mmu_notifier.ops = &kvm_s390_pv_mmu_notifier_ops;
+		mmu_notifier_register(&kvm->arch.pv.mmu_notifier, kvm->mm);
+	}
 	return 0;
 }
 
-- 
2.36.1



* [PATCH v11 11/19] s390/mm: KVM: pv: when tearing down, try to destroy protected pages
  2022-06-03  6:56 [PATCH v11 00/19] KVM: s390: pv: implement lazy destroy for reboot Claudio Imbrenda
                   ` (9 preceding siblings ...)
  2022-06-03  6:56 ` [PATCH v11 10/19] KVM: s390: pv: add mmu_notifier Claudio Imbrenda
@ 2022-06-03  6:56 ` Claudio Imbrenda
  2022-06-08 12:03   ` Nico Boehr
  2022-06-03  6:56 ` [PATCH v11 12/19] KVM: s390: pv: refactoring of kvm_s390_pv_deinit_vm Claudio Imbrenda
                   ` (8 subsequent siblings)
  19 siblings, 1 reply; 36+ messages in thread
From: Claudio Imbrenda @ 2022-06-03  6:56 UTC (permalink / raw)
  To: kvm
  Cc: borntraeger, frankja, thuth, pasic, david, linux-s390,
	linux-kernel, scgl, mimu, nrb

When ptep_get_and_clear_full is called for a mm teardown, we will now
attempt to destroy the secure pages. This will be faster than export.

In case it was not a teardown, or if for some reason the destroy page
UVC failed, we fall back to exporting the page, as before.

Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Acked-by: Janosch Frank <frankja@linux.ibm.com>
---
 arch/s390/include/asm/pgtable.h | 19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index f16403ba81ec..cf81acf3879c 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -1182,9 +1182,22 @@ static inline pte_t ptep_get_and_clear_full(struct mm_struct *mm,
 	} else {
 		res = ptep_xchg_lazy(mm, addr, ptep, __pte(_PAGE_INVALID));
 	}
-	/* At this point the reference through the mapping is still present */
-	if (mm_is_protected(mm) && pte_present(res))
-		uv_convert_owned_from_secure(pte_val(res) & PAGE_MASK);
+	/* Nothing to do */
+	if (!mm_is_protected(mm) || !pte_present(res))
+		return res;
+	/*
+	 * At this point the reference through the mapping is still present.
+	 * The notifier should have destroyed all protected vCPUs at this
+	 * point, so the destroy should be successful.
+	 */
+	if (full && !uv_destroy_owned_page(pte_val(res) & PAGE_MASK))
+		return res;
+	/*
+	 * If something went wrong and the page could not be destroyed, or
+	 * if this is not a mm teardown, the slower export is used as
+	 * fallback instead.
+	 */
+	uv_convert_owned_from_secure(pte_val(res) & PAGE_MASK);
 	return res;
 }
 
-- 
2.36.1



* [PATCH v11 12/19] KVM: s390: pv: refactoring of kvm_s390_pv_deinit_vm
  2022-06-03  6:56 [PATCH v11 00/19] KVM: s390: pv: implement lazy destroy for reboot Claudio Imbrenda
                   ` (10 preceding siblings ...)
  2022-06-03  6:56 ` [PATCH v11 11/19] s390/mm: KVM: pv: when tearing down, try to destroy protected pages Claudio Imbrenda
@ 2022-06-03  6:56 ` Claudio Imbrenda
  2022-06-03  6:56 ` [PATCH v11 13/19] KVM: s390: pv: destroy the configuration before its memory Claudio Imbrenda
                   ` (7 subsequent siblings)
  19 siblings, 0 replies; 36+ messages in thread
From: Claudio Imbrenda @ 2022-06-03  6:56 UTC (permalink / raw)
  To: kvm
  Cc: borntraeger, frankja, thuth, pasic, david, linux-s390,
	linux-kernel, scgl, mimu, nrb

Refactor kvm_s390_pv_deinit_vm to improve readability and simplify the
improvements that are coming in subsequent patches.

No functional change intended.

[note: this can potentially be squashed into the next patch, I factored
it out to simplify the review process]

Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
---
 arch/s390/kvm/pv.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/s390/kvm/pv.c b/arch/s390/kvm/pv.c
index da1bae111fb1..a389555d62e7 100644
--- a/arch/s390/kvm/pv.c
+++ b/arch/s390/kvm/pv.c
@@ -175,17 +175,17 @@ int kvm_s390_pv_deinit_vm(struct kvm *kvm, u16 *rc, u16 *rrc)
 	cc = uv_cmd_nodata(kvm_s390_pv_get_handle(kvm),
 			   UVC_CMD_DESTROY_SEC_CONF, rc, rrc);
 	WRITE_ONCE(kvm->arch.gmap->guest_handle, 0);
-	if (!cc)
-		atomic_dec(&kvm->mm->context.protected_count);
-	KVM_UV_EVENT(kvm, 3, "PROTVIRT DESTROY VM: rc %x rrc %x", *rc, *rrc);
-	WARN_ONCE(cc, "protvirt destroy vm failed rc %x rrc %x", *rc, *rrc);
-	/* Intended memory leak on "impossible" error */
 	if (!cc) {
+		atomic_dec(&kvm->mm->context.protected_count);
 		kvm_s390_pv_dealloc_vm(kvm);
-		return 0;
+	} else {
+		/* Intended memory leak on "impossible" error */
+		s390_replace_asce(kvm->arch.gmap);
 	}
-	s390_replace_asce(kvm->arch.gmap);
-	return -EIO;
+	KVM_UV_EVENT(kvm, 3, "PROTVIRT DESTROY VM: rc %x rrc %x", *rc, *rrc);
+	WARN_ONCE(cc, "protvirt destroy vm failed rc %x rrc %x", *rc, *rrc);
+
+	return cc ? -EIO : 0;
 }
 
 static void kvm_s390_pv_mmu_notifier_release(struct mmu_notifier *subscription,
-- 
2.36.1



* [PATCH v11 13/19] KVM: s390: pv: destroy the configuration before its memory
  2022-06-03  6:56 [PATCH v11 00/19] KVM: s390: pv: implement lazy destroy for reboot Claudio Imbrenda
                   ` (11 preceding siblings ...)
  2022-06-03  6:56 ` [PATCH v11 12/19] KVM: s390: pv: refactoring of kvm_s390_pv_deinit_vm Claudio Imbrenda
@ 2022-06-03  6:56 ` Claudio Imbrenda
  2022-06-08 12:06   ` Nico Boehr
  2022-06-14 14:23   ` Janosch Frank
  2022-06-03  6:56 ` [PATCH v11 14/19] KVM: s390: pv: cleanup leftover protected VMs if needed Claudio Imbrenda
                   ` (6 subsequent siblings)
  19 siblings, 2 replies; 36+ messages in thread
From: Claudio Imbrenda @ 2022-06-03  6:56 UTC (permalink / raw)
  To: kvm
  Cc: borntraeger, frankja, thuth, pasic, david, linux-s390,
	linux-kernel, scgl, mimu, nrb

Move the Destroy Secure Configuration UVC before the loop to destroy
the memory. If the protected VM has memory, it will be cleaned up and
made accessible by the Destroy Secure Configuraion UVC. The struct
page for the relevant pages will still have the protected bit set, so
the loop is still needed to clean that up.

Switching the order of those two operations does not change the
outcome, but it is significantly faster.

Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
---
 arch/s390/kvm/pv.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/s390/kvm/pv.c b/arch/s390/kvm/pv.c
index a389555d62e7..6cffea26c47f 100644
--- a/arch/s390/kvm/pv.c
+++ b/arch/s390/kvm/pv.c
@@ -163,6 +163,9 @@ int kvm_s390_pv_deinit_vm(struct kvm *kvm, u16 *rc, u16 *rrc)
 {
 	int cc;
 
+	cc = uv_cmd_nodata(kvm_s390_pv_get_handle(kvm),
+			   UVC_CMD_DESTROY_SEC_CONF, rc, rrc);
+	WRITE_ONCE(kvm->arch.gmap->guest_handle, 0);
 	/*
 	 * if the mm still has a mapping, make all its pages accessible
 	 * before destroying the guest
@@ -172,9 +175,6 @@ int kvm_s390_pv_deinit_vm(struct kvm *kvm, u16 *rc, u16 *rrc)
 		mmput(kvm->mm);
 	}
 
-	cc = uv_cmd_nodata(kvm_s390_pv_get_handle(kvm),
-			   UVC_CMD_DESTROY_SEC_CONF, rc, rrc);
-	WRITE_ONCE(kvm->arch.gmap->guest_handle, 0);
 	if (!cc) {
 		atomic_dec(&kvm->mm->context.protected_count);
 		kvm_s390_pv_dealloc_vm(kvm);
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v11 14/19] KVM: s390: pv: cleanup leftover protected VMs if needed
  2022-06-03  6:56 [PATCH v11 00/19] KVM: s390: pv: implement lazy destroy for reboot Claudio Imbrenda
                   ` (12 preceding siblings ...)
  2022-06-03  6:56 ` [PATCH v11 13/19] KVM: s390: pv: destroy the configuration before its memory Claudio Imbrenda
@ 2022-06-03  6:56 ` Claudio Imbrenda
  2022-06-15  9:59   ` Janosch Frank
  2022-06-03  6:56 ` [PATCH v11 15/19] KVM: s390: pv: asynchronous destroy for reboot Claudio Imbrenda
                   ` (5 subsequent siblings)
  19 siblings, 1 reply; 36+ messages in thread
From: Claudio Imbrenda @ 2022-06-03  6:56 UTC (permalink / raw)
  To: kvm
  Cc: borntraeger, frankja, thuth, pasic, david, linux-s390,
	linux-kernel, scgl, mimu, nrb

In upcoming patches it will be possible to start tearing down a
protected VM, and finish the teardown concurrently in a different
thread.

Protected VMs that are pending for tear down ("leftover") need to be
cleaned properly when the userspace process (e.g. qemu) terminates.

This patch makes sure that all "leftover" protected VMs are always
properly torn down.

Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
---
 arch/s390/include/asm/kvm_host.h |   2 +
 arch/s390/kvm/kvm-s390.c         |   2 +
 arch/s390/kvm/pv.c               | 109 ++++++++++++++++++++++++++++---
 3 files changed, 104 insertions(+), 9 deletions(-)

diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 5824efe5fc9d..cca8e05e0a71 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -924,6 +924,8 @@ struct kvm_s390_pv {
 	u64 guest_len;
 	unsigned long stor_base;
 	void *stor_var;
+	void *prepared_for_async_deinit;
+	struct list_head need_cleanup;
 	struct mmu_notifier mmu_notifier;
 };
 
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index fe1fa896def7..369de8377116 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -2890,6 +2890,8 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 	kvm_s390_vsie_init(kvm);
 	if (use_gisa)
 		kvm_s390_gisa_init(kvm);
+	INIT_LIST_HEAD(&kvm->arch.pv.need_cleanup);
+	kvm->arch.pv.prepared_for_async_deinit = NULL;
 	KVM_EVENT(3, "vm 0x%pK created by pid %u", kvm, current->pid);
 
 	return 0;
diff --git a/arch/s390/kvm/pv.c b/arch/s390/kvm/pv.c
index 6cffea26c47f..8471c17d538c 100644
--- a/arch/s390/kvm/pv.c
+++ b/arch/s390/kvm/pv.c
@@ -17,6 +17,19 @@
 #include <linux/mmu_notifier.h>
 #include "kvm-s390.h"
 
+/**
+ * @struct leftover_pv_vm
+ * Represents a "leftover" protected VM that is still registered with the
+ * Ultravisor, but which does not correspond any longer to an active KVM VM.
+ */
+struct leftover_pv_vm {
+	struct list_head list;
+	unsigned long old_gmap_table;
+	u64 handle;
+	void *stor_var;
+	unsigned long stor_base;
+};
+
 static void kvm_s390_clear_pv_state(struct kvm *kvm)
 {
 	kvm->arch.pv.handle = 0;
@@ -158,23 +171,88 @@ static int kvm_s390_pv_alloc_vm(struct kvm *kvm)
 	return -ENOMEM;
 }
 
+/**
+ * kvm_s390_pv_dispose_one_leftover - Clean up one leftover protected VM.
+ * @kvm the KVM that was associated with this leftover protected VM
+ * @leftover details about the leftover protected VM that needs a clean up
+ * @rc the RC code of the Destroy Secure Configuration UVC
+ * @rrc the RRC code of the Destroy Secure Configuration UVC
+ * Return: 0 in case of success, otherwise 1
+ *
+ * Destroy one leftover protected VM.
+ * On success, kvm->mm->context.protected_count will be decremented atomically
+ * and all other resources used by the VM will be freed.
+ */
+static int kvm_s390_pv_dispose_one_leftover(struct kvm *kvm, struct leftover_pv_vm *leftover,
+					    u16 *rc, u16 *rrc)
+{
+	int cc;
+
+	cc = uv_cmd_nodata(leftover->handle, UVC_CMD_DESTROY_SEC_CONF, rc, rrc);
+	KVM_UV_EVENT(kvm, 3, "PROTVIRT DESTROY LEFTOVER VM: rc %x rrc %x", *rc, *rrc);
+	WARN_ONCE(cc, "protvirt destroy leftover vm failed rc %x rrc %x", *rc, *rrc);
+	if (cc)
+		return cc;
+	/*
+	 * Intentionally leak unusable memory. If the UVC fails, the memory
+	 * used for the VM and its metadata is permanently unusable.
+	 * This can only happen in case of a serious KVM or hardware bug; it
+	 * is not expected to happen in normal operation.
+	 */
+	free_pages(leftover->stor_base, get_order(uv_info.guest_base_stor_len));
+	free_pages(leftover->old_gmap_table, CRST_ALLOC_ORDER);
+	vfree(leftover->stor_var);
+	atomic_dec(&kvm->mm->context.protected_count);
+	return 0;
+}
+
+/**
+ * kvm_s390_pv_cleanup_leftovers - Clean up all leftover protected VMs.
+ * @kvm the KVM whose leftover protected VMs are to be cleaned up
+ * @rc the RC code of the first failing UVC, unless it was already != 1
+ * @rrc the RRC code of the first failing UVC, unless @rc was already != 1
+ * Return: 0 if all leftover VMs are successfully cleaned up, otherwise 1
+ *
+ * This function will clean up all "leftover" protected VMs, including the
+ * one that had been set aside for deferred teardown.
+ */
+static int kvm_s390_pv_cleanup_leftovers(struct kvm *kvm, u16 *rc, u16 *rrc)
+{
+	struct leftover_pv_vm *cur;
+	u16 _rc, _rrc;
+	int cc = 0;
+
+	if (kvm->arch.pv.prepared_for_async_deinit)
+		list_add(kvm->arch.pv.prepared_for_async_deinit, &kvm->arch.pv.need_cleanup);
+
+	while (!list_empty(&kvm->arch.pv.need_cleanup)) {
+		cur = list_first_entry(&kvm->arch.pv.need_cleanup, typeof(*cur), list);
+		if (kvm_s390_pv_dispose_one_leftover(kvm, cur, &_rc, &_rrc)) {
+			cc = 1;
+			/* do not overwrite a previous error code */
+			if (*rc == 1) {
+				*rc = _rc;
+				*rrc = _rrc;
+			}
+		}
+		list_del(&cur->list);
+		kfree(cur);
+	}
+	kvm->arch.pv.prepared_for_async_deinit = NULL;
+	return cc;
+}
+
 /* this should not fail, but if it does, we must not free the donated memory */
 int kvm_s390_pv_deinit_vm(struct kvm *kvm, u16 *rc, u16 *rrc)
 {
 	int cc;
 
+	/* Make sure the counter does not reach 0 before calling s390_uv_destroy_range */
+	atomic_inc(&kvm->mm->context.protected_count);
+
 	cc = uv_cmd_nodata(kvm_s390_pv_get_handle(kvm),
 			   UVC_CMD_DESTROY_SEC_CONF, rc, rrc);
 	WRITE_ONCE(kvm->arch.gmap->guest_handle, 0);
-	/*
-	 * if the mm still has a mapping, make all its pages accessible
-	 * before destroying the guest
-	 */
-	if (mmget_not_zero(kvm->mm)) {
-		s390_uv_destroy_range(kvm->mm, 0, TASK_SIZE);
-		mmput(kvm->mm);
-	}
-
 	if (!cc) {
 		atomic_dec(&kvm->mm->context.protected_count);
 		kvm_s390_pv_dealloc_vm(kvm);
@@ -185,6 +263,19 @@ int kvm_s390_pv_deinit_vm(struct kvm *kvm, u16 *rc, u16 *rrc)
 	KVM_UV_EVENT(kvm, 3, "PROTVIRT DESTROY VM: rc %x rrc %x", *rc, *rrc);
 	WARN_ONCE(cc, "protvirt destroy vm failed rc %x rrc %x", *rc, *rrc);
 
+	cc |= kvm_s390_pv_cleanup_leftovers(kvm, rc, rrc);
+
+	/*
+	 * If the mm still has a mapping, try to mark all its pages as
+	 * accessible. The counter should not reach zero before this
+	 * cleanup has been performed.
+	 */
+	if (mmget_not_zero(kvm->mm)) {
+		s390_uv_destroy_range(kvm->mm, 0, TASK_SIZE);
+		mmput(kvm->mm);
+	}
+	/* Now the counter can safely reach 0 */
+	atomic_dec(&kvm->mm->context.protected_count);
 	return cc ? -EIO : 0;
 }
 
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v11 15/19] KVM: s390: pv: asynchronous destroy for reboot
  2022-06-03  6:56 [PATCH v11 00/19] KVM: s390: pv: implement lazy destroy for reboot Claudio Imbrenda
                   ` (13 preceding siblings ...)
  2022-06-03  6:56 ` [PATCH v11 14/19] KVM: s390: pv: cleanup leftover protected VMs if needed Claudio Imbrenda
@ 2022-06-03  6:56 ` Claudio Imbrenda
  2022-06-15 10:58   ` Janosch Frank
  2022-06-20  9:41   ` Janosch Frank
  2022-06-03  6:56 ` [PATCH v11 16/19] KVM: s390: pv: api documentation for asynchronous destroy Claudio Imbrenda
                   ` (4 subsequent siblings)
  19 siblings, 2 replies; 36+ messages in thread
From: Claudio Imbrenda @ 2022-06-03  6:56 UTC (permalink / raw)
  To: kvm
  Cc: borntraeger, frankja, thuth, pasic, david, linux-s390,
	linux-kernel, scgl, mimu, nrb

Until now, destroying a protected guest was an entirely synchronous
operation that could potentially take a very long time, depending on
the size of the guest, due to the time needed to clean up the address
space from protected pages.

This patch implements an asynchronous destroy mechanism, that allows a
protected guest to reboot significantly faster than previously.

This is achieved by clearing the pages of the old guest in background.
In case of reboot, the new guest will be able to run in the same
address space almost immediately.

The old protected guest is then only destroyed when all of its memory has
been destroyed or otherwise made non protected.

Two new PV commands are added for the KVM_S390_PV_COMMAND ioctl:

KVM_PV_ASYNC_DISABLE_PREPARE: prepares the current protected VM for
asynchronous teardown. The current VM will then continue immediately
as non-protected. If a protected VM had already been set aside without
starting the teardown process, this call will fail.

KVM_PV_ASYNC_DISABLE: tears down the protected VM previously set aside
for asynchronous teardown. This PV command should ideally be issued by
userspace from a separate thread. If a fatal signal is received (or the
process terminates naturally), the command will terminate immediately
without completing.

Leftover protected VMs are cleaned up when a KVM VM is torn down
normally (either via IOCTL or when the process terminates); this
cleanup has been implemented in a previous patch.
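
As a rough illustration (not part of this patch), userspace could drive
the two new commands as sketched below; the helper names are made up for
this example, and error handling and thread lifetime management are
omitted:

#include <stdint.h>
#include <pthread.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static void *async_disable_thread(void *arg)
{
	int vm_fd = (int)(intptr_t)arg;
	struct kvm_pv_cmd cmd = { .cmd = KVM_PV_ASYNC_DISABLE };

	/* Performs the slow memory cleanup; CPU time is accounted to this thread */
	ioctl(vm_fd, KVM_S390_PV_COMMAND, &cmd);
	return NULL;
}

static int start_async_teardown(int vm_fd)
{
	struct kvm_pv_cmd prep = { .cmd = KVM_PV_ASYNC_DISABLE_PREPARE };
	pthread_t tid;

	if (ioctl(vm_fd, KVM_S390_PV_COMMAND, &prep))
		return -1;	/* fall back to a plain KVM_PV_DISABLE */
	pthread_create(&tid, NULL, async_disable_thread,
		       (void *)(intptr_t)vm_fd);
	/* The VM can be restarted as a new protected guest right away */
	return 0;
}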

Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
---
 arch/s390/kvm/kvm-s390.c |  34 +++++++++-
 arch/s390/kvm/kvm-s390.h |   2 +
 arch/s390/kvm/pv.c       | 131 +++++++++++++++++++++++++++++++++++++++
 include/uapi/linux/kvm.h |   2 +
 4 files changed, 166 insertions(+), 3 deletions(-)

diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 369de8377116..842419092c0c 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -2256,9 +2256,13 @@ static int kvm_s390_cpus_to_pv(struct kvm *kvm, u16 *rc, u16 *rrc)
 
 static int kvm_s390_handle_pv(struct kvm *kvm, struct kvm_pv_cmd *cmd)
 {
+	const bool needslock = (cmd->cmd != KVM_PV_ASYNC_DISABLE);
+	void __user *argp = (void __user *)cmd->data;
 	int r = 0;
 	u16 dummy;
-	void __user *argp = (void __user *)cmd->data;
+
+	if (needslock)
+		mutex_lock(&kvm->lock);
 
 	switch (cmd->cmd) {
 	case KVM_PV_ENABLE: {
@@ -2292,6 +2296,28 @@ static int kvm_s390_handle_pv(struct kvm *kvm, struct kvm_pv_cmd *cmd)
 		set_bit(IRQ_PEND_EXT_SERVICE, &kvm->arch.float_int.masked_irqs);
 		break;
 	}
+	case KVM_PV_ASYNC_DISABLE_PREPARE:
+		r = -EINVAL;
+		if (!kvm_s390_pv_is_protected(kvm) || !async_destroy)
+			break;
+
+		r = kvm_s390_cpus_from_pv(kvm, &cmd->rc, &cmd->rrc);
+		/*
+		 * If a CPU could not be destroyed, destroy VM will also fail.
+		 * There is no point in trying to destroy it. Instead return
+		 * the rc and rrc from the first CPU that failed destroying.
+		 */
+		if (r)
+			break;
+		r = kvm_s390_pv_deinit_vm_async_prepare(kvm, &cmd->rc, &cmd->rrc);
+
+		/* no need to block service interrupts any more */
+		clear_bit(IRQ_PEND_EXT_SERVICE, &kvm->arch.float_int.masked_irqs);
+		break;
+	case KVM_PV_ASYNC_DISABLE:
+		/* This must not be called while holding kvm->lock */
+		r = kvm_s390_pv_deinit_vm_async(kvm, &cmd->rc, &cmd->rrc);
+		break;
 	case KVM_PV_DISABLE: {
 		r = -EINVAL;
 		if (!kvm_s390_pv_is_protected(kvm))
@@ -2393,6 +2419,9 @@ static int kvm_s390_handle_pv(struct kvm *kvm, struct kvm_pv_cmd *cmd)
 	default:
 		r = -ENOTTY;
 	}
+	if (needslock)
+		mutex_unlock(&kvm->lock);
+
 	return r;
 }
 
@@ -2597,9 +2626,8 @@ long kvm_arch_vm_ioctl(struct file *filp,
 			r = -EINVAL;
 			break;
 		}
-		mutex_lock(&kvm->lock);
+		/* must be called without kvm->lock */
 		r = kvm_s390_handle_pv(kvm, &args);
-		mutex_unlock(&kvm->lock);
 		if (copy_to_user(argp, &args, sizeof(args))) {
 			r = -EFAULT;
 			break;
diff --git a/arch/s390/kvm/kvm-s390.h b/arch/s390/kvm/kvm-s390.h
index d3abedafa7a8..d296afb6041c 100644
--- a/arch/s390/kvm/kvm-s390.h
+++ b/arch/s390/kvm/kvm-s390.h
@@ -243,6 +243,8 @@ static inline u32 kvm_s390_get_gisa_desc(struct kvm *kvm)
 /* implemented in pv.c */
 int kvm_s390_pv_destroy_cpu(struct kvm_vcpu *vcpu, u16 *rc, u16 *rrc);
 int kvm_s390_pv_create_cpu(struct kvm_vcpu *vcpu, u16 *rc, u16 *rrc);
+int kvm_s390_pv_deinit_vm_async_prepare(struct kvm *kvm, u16 *rc, u16 *rrc);
+int kvm_s390_pv_deinit_vm_async(struct kvm *kvm, u16 *rc, u16 *rrc);
 int kvm_s390_pv_deinit_vm(struct kvm *kvm, u16 *rc, u16 *rrc);
 int kvm_s390_pv_init_vm(struct kvm *kvm, u16 *rc, u16 *rrc);
 int kvm_s390_pv_set_sec_parms(struct kvm *kvm, void *hdr, u64 length, u16 *rc,
diff --git a/arch/s390/kvm/pv.c b/arch/s390/kvm/pv.c
index 8471c17d538c..ab06fa366e49 100644
--- a/arch/s390/kvm/pv.c
+++ b/arch/s390/kvm/pv.c
@@ -279,6 +279,137 @@ int kvm_s390_pv_deinit_vm(struct kvm *kvm, u16 *rc, u16 *rrc)
 	return cc ? -EIO : 0;
 }
 
+/**
+ * kvm_s390_destroy_lower_2g - Destroy the first 2GB of protected guest memory.
+ * @kvm the VM whose memory is to be cleared.
+ * Destroy the first 2GB of guest memory, to avoid prefix issues after reboot.
+ */
+static void kvm_s390_destroy_lower_2g(struct kvm *kvm)
+{
+	struct kvm_memory_slot *slot;
+	unsigned long lim;
+	int srcu_idx;
+
+	srcu_idx = srcu_read_lock(&kvm->srcu);
+
+	/* Take the memslot containing guest absolute address 0 */
+	slot = gfn_to_memslot(kvm, 0);
+	/* Clear all slots that are completely below 2GB */
+	while (slot && slot->base_gfn + slot->npages < SZ_2G / PAGE_SIZE) {
+		lim = slot->userspace_addr + slot->npages * PAGE_SIZE;
+		s390_uv_destroy_range(kvm->mm, slot->userspace_addr, lim);
+		/* Take the next memslot */
+		slot = gfn_to_memslot(kvm, slot->base_gfn + slot->npages);
+	}
+	/* Last slot crosses the 2G boundary, clear only up to 2GB */
+	if (slot && slot->base_gfn < SZ_2G / PAGE_SIZE) {
+		lim = slot->userspace_addr + SZ_2G - slot->base_gfn * PAGE_SIZE;
+		s390_uv_destroy_range(kvm->mm, slot->userspace_addr, lim);
+	}
+
+	srcu_read_unlock(&kvm->srcu, srcu_idx);
+}
+
+/**
+ * kvm_s390_pv_deinit_vm_async_prepare - Prepare a protected VM for
+ * asynchronous teardown.
+ * @kvm the VM
+ * @rc return value for the RC field of the UVCB
+ * @rrc return value for the RRC field of the UVCB
+ *
+ * Prepare the protected VM for asynchronous teardown. The VM will be able
+ * to continue immediately as a non-secure VM, and the information needed to
+ * properly tear down the protected VM is set aside. If another protected VM
+ * was already set aside without starting a teardown, the function will
+ * fail.
+ *
+ * Context: kvm->lock needs to be held
+ *
+ * Return: 0 in case of success, -EINVAL if another protected VM was already set
+ * aside, -ENOMEM if the system ran out of memory.
+ */
+int kvm_s390_pv_deinit_vm_async_prepare(struct kvm *kvm, u16 *rc, u16 *rrc)
+{
+	struct leftover_pv_vm *priv;
+
+	/*
+	 * If an asynchronous deinitialization is already pending, refuse.
+	 * A synchronous deinitialization has to be performed instead.
+	 */
+	if (READ_ONCE(kvm->arch.pv.prepared_for_async_deinit))
+		return -EINVAL;
+	priv = kmalloc(sizeof(*priv), GFP_KERNEL | __GFP_ZERO);
+	if (!priv)
+		return -ENOMEM;
+
+	priv->stor_var = kvm->arch.pv.stor_var;
+	priv->stor_base = kvm->arch.pv.stor_base;
+	priv->handle = kvm_s390_pv_get_handle(kvm);
+	priv->old_gmap_table = (unsigned long)kvm->arch.gmap->table;
+	WRITE_ONCE(kvm->arch.gmap->guest_handle, 0);
+	if (s390_replace_asce(kvm->arch.gmap)) {
+		kfree(priv);
+		return -ENOMEM;
+	}
+
+	kvm_s390_destroy_lower_2g(kvm);
+	kvm_s390_clear_pv_state(kvm);
+	WRITE_ONCE(kvm->arch.pv.prepared_for_async_deinit, priv);
+
+	*rc = 1;
+	*rrc = 42;
+	return 0;
+}
+
+/**
+ * kvm_s390_pv_deinit_vm_async - Perform an asynchronous teardown of a
+ * protected VM.
+ * @kvm the VM previously associated with the protected VM
+ * @rc return value for the RC field of the UVCB
+ * @rrc return value for the RRC field of the UVCB
+ *
+ * Tear down the protected VM that had previously been set aside using
+ * kvm_s390_pv_deinit_vm_async_prepare.
+ *
+ * Context: kvm->lock must not be held.
+ *
+ * Return: 0 in case of success, -EINVAL if no protected VM had been
+ * prepared for asynchronous teardown, -EIO in case of other errors.
+ */
+int kvm_s390_pv_deinit_vm_async(struct kvm *kvm, u16 *rc, u16 *rrc)
+{
+	struct leftover_pv_vm *p;
+	int ret = 0;
+
+	lockdep_assert_not_held(&kvm->lock);
+
+	p = xchg(&kvm->arch.pv.prepared_for_async_deinit, NULL);
+	if (!p)
+		return -EINVAL;
+
+	/* When a fatal signal is received, stop immediately */
+	if (s390_uv_destroy_range_interruptible(kvm->mm, 0, TASK_SIZE_MAX))
+		goto done;
+	if (kvm_s390_pv_dispose_one_leftover(kvm, p, rc, rrc))
+		ret = -EIO;
+	kfree(p);
+	p = NULL;
+done:
+	/*
+	 * p is not NULL if we aborted because of a fatal signal, in which
+	 * case queue the leftover for later cleanup.
+	 */
+	if (p) {
+		mutex_lock(&kvm->lock);
+		list_add(&p->list, &kvm->arch.pv.need_cleanup);
+		mutex_unlock(&kvm->lock);
+		/* Did not finish, but pretend things went well */
+		*rc = 1;
+		*rrc = 42;
+	}
+	return ret;
+}
+
 static void kvm_s390_pv_mmu_notifier_release(struct mmu_notifier *subscription,
 					     struct mm_struct *mm)
 {
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 5088bd9f1922..91b072c137bf 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1668,6 +1668,8 @@ enum pv_cmd_id {
 	KVM_PV_VERIFY,
 	KVM_PV_PREP_RESET,
 	KVM_PV_UNSHARE_ALL,
+	KVM_PV_ASYNC_DISABLE_PREPARE,
+	KVM_PV_ASYNC_DISABLE,
 };
 
 struct kvm_pv_cmd {
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v11 16/19] KVM: s390: pv: api documentation for asynchronous destroy
  2022-06-03  6:56 [PATCH v11 00/19] KVM: s390: pv: implement lazy destroy for reboot Claudio Imbrenda
                   ` (14 preceding siblings ...)
  2022-06-03  6:56 ` [PATCH v11 15/19] KVM: s390: pv: asynchronous destroy for reboot Claudio Imbrenda
@ 2022-06-03  6:56 ` Claudio Imbrenda
  2022-06-20  9:13   ` Janosch Frank
  2022-06-03  6:56 ` [PATCH v11 17/19] KVM: s390: pv: add KVM_CAP_S390_PROTECTED_ASYNC_DISABLE Claudio Imbrenda
                   ` (3 subsequent siblings)
  19 siblings, 1 reply; 36+ messages in thread
From: Claudio Imbrenda @ 2022-06-03  6:56 UTC (permalink / raw)
  To: kvm
  Cc: borntraeger, frankja, thuth, pasic, david, linux-s390,
	linux-kernel, scgl, mimu, nrb

Add documentation for the new commands added to the KVM_S390_PV_COMMAND
ioctl.

Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Nico Boehr <nrb@linux.ibm.com>
---
 Documentation/virt/kvm/api.rst | 25 ++++++++++++++++++++++---
 1 file changed, 22 insertions(+), 3 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 11e00a46c610..97d35b30ce3b 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -5143,11 +5143,13 @@ KVM_PV_ENABLE
   =====      =============================
 
 KVM_PV_DISABLE
-
   Deregister the VM from the Ultravisor and reclaim the memory that
   had been donated to the Ultravisor, making it usable by the kernel
-  again.  All registered VCPUs are converted back to non-protected
-  ones.
+  again. All registered VCPUs are converted back to non-protected
+  ones. If a previous VM had been prepared for asynchronous teardown
+  with KVM_PV_ASYNC_DISABLE_PREPARE and not actually torn down with
+  KVM_PV_ASYNC_DISABLE, it will be torn down in this call together with
+  the current VM.
 
 KVM_PV_VM_SET_SEC_PARMS
   Pass the image header from VM memory to the Ultravisor in
@@ -5160,6 +5162,23 @@ KVM_PV_VM_VERIFY
   Verify the integrity of the unpacked image. Only if this succeeds,
   KVM is allowed to start protected VCPUs.
 
+KVM_PV_ASYNC_DISABLE_PREPARE
+  Prepare the current protected VM for asynchronous teardown. Most
+  resources used by the current protected VM will be set aside for a
+  subsequent asynchronous teardown. The current protected VM will then
+  resume execution immediately as non-protected. If a protected VM had
+  already been prepared without starting the asynchronous teardown process,
+  this call will fail. In that case, the userspace process should issue a
+  normal KVM_PV_DISABLE.
+
+KVM_PV_ASYNC_DISABLE
+  Tear down the protected VM previously prepared for asynchronous teardown.
+  The resources that had been set aside will be freed asynchronously during
+  the execution of this command.
+  This PV command should ideally be issued by userspace from a separate
+  thread. If a fatal signal is received (or the process terminates
+  naturally), the command will terminate immediately without completing.
+
 4.126 KVM_X86_SET_MSR_FILTER
 ----------------------------
 
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v11 17/19] KVM: s390: pv: add KVM_CAP_S390_PROTECTED_ASYNC_DISABLE
  2022-06-03  6:56 [PATCH v11 00/19] KVM: s390: pv: implement lazy destroy for reboot Claudio Imbrenda
                   ` (15 preceding siblings ...)
  2022-06-03  6:56 ` [PATCH v11 16/19] KVM: s390: pv: api documentation for asynchronous destroy Claudio Imbrenda
@ 2022-06-03  6:56 ` Claudio Imbrenda
  2022-06-03  6:56 ` [PATCH v11 18/19] KVM: s390: pv: avoid export before import if possible Claudio Imbrenda
                   ` (2 subsequent siblings)
  19 siblings, 0 replies; 36+ messages in thread
From: Claudio Imbrenda @ 2022-06-03  6:56 UTC (permalink / raw)
  To: kvm
  Cc: borntraeger, frankja, thuth, pasic, david, linux-s390,
	linux-kernel, scgl, mimu, nrb

Add KVM_CAP_S390_PROTECTED_ASYNC_DISABLE to signal that the
KVM_PV_ASYNC_DISABLE and KVM_PV_ASYNC_DISABLE_PREPARE commands for the
KVM_S390_PV_COMMAND ioctl are available.
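
A minimal sketch of how userspace might probe for the capability before
using the new commands (vm_fd handling omitted, the flag name is
illustrative):

	bool use_async_disable =
		ioctl(vm_fd, KVM_CHECK_EXTENSION,
		      KVM_CAP_S390_PROTECTED_ASYNC_DISABLE) > 0;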

Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Nico Boehr <nrb@linux.ibm.com>
---
 arch/s390/kvm/kvm-s390.c | 3 +++
 include/uapi/linux/kvm.h | 1 +
 2 files changed, 4 insertions(+)

diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 842419092c0c..aeb654849f2d 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -609,6 +609,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_S390_BPB:
 		r = test_facility(82);
 		break;
+	case KVM_CAP_S390_PROTECTED_ASYNC_DISABLE:
+		r = async_destroy && is_prot_virt_host();
+		break;
 	case KVM_CAP_S390_PROTECTED:
 		r = is_prot_virt_host();
 		break;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 91b072c137bf..b22dd1221b30 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1157,6 +1157,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_VM_TSC_CONTROL 214
 #define KVM_CAP_SYSTEM_EVENT_DATA 215
 #define KVM_CAP_ARM_SYSTEM_SUSPEND 216
+#define KVM_CAP_S390_PROTECTED_ASYNC_DISABLE 225
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v11 18/19] KVM: s390: pv: avoid export before import if possible
  2022-06-03  6:56 [PATCH v11 00/19] KVM: s390: pv: implement lazy destroy for reboot Claudio Imbrenda
                   ` (16 preceding siblings ...)
  2022-06-03  6:56 ` [PATCH v11 17/19] KVM: s390: pv: add KVM_CAP_S390_PROTECTED_ASYNC_DISABLE Claudio Imbrenda
@ 2022-06-03  6:56 ` Claudio Imbrenda
  2022-06-07 14:33   ` Nico Boehr
  2022-06-20  9:56   ` Janosch Frank
  2022-06-03  6:56 ` [PATCH v11 19/19] KVM: s390: pv: support for Destroy fast UVC Claudio Imbrenda
  2022-06-14 14:29 ` [PATCH v11 00/19] KVM: s390: pv: implement lazy destroy for reboot Janosch Frank
  19 siblings, 2 replies; 36+ messages in thread
From: Claudio Imbrenda @ 2022-06-03  6:56 UTC (permalink / raw)
  To: kvm
  Cc: borntraeger, frankja, thuth, pasic, david, linux-s390,
	linux-kernel, scgl, mimu, nrb

If the appropriate UV feature bit is set, there is no need to perform
an export before import.

Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
---
 arch/s390/kernel/uv.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/s390/kernel/uv.c b/arch/s390/kernel/uv.c
index 02aca3c5dce1..c18c3d6a4314 100644
--- a/arch/s390/kernel/uv.c
+++ b/arch/s390/kernel/uv.c
@@ -249,6 +249,8 @@ static int make_secure_pte(pte_t *ptep, unsigned long addr,
  */
 static bool should_export_before_import(struct uv_cb_header *uvcb, struct mm_struct *mm)
 {
+	if (test_bit_inv(BIT_UV_FEAT_MISC, &uv_info.uv_feature_indications))
+		return false;
 	if (uvcb->cmd == UVC_CMD_UNPIN_PAGE_SHARED)
 		return false;
 	return atomic_read(&mm->context.protected_count) > 1;
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v11 19/19] KVM: s390: pv: support for Destroy fast UVC
  2022-06-03  6:56 [PATCH v11 00/19] KVM: s390: pv: implement lazy destroy for reboot Claudio Imbrenda
                   ` (17 preceding siblings ...)
  2022-06-03  6:56 ` [PATCH v11 18/19] KVM: s390: pv: avoid export before import if possible Claudio Imbrenda
@ 2022-06-03  6:56 ` Claudio Imbrenda
  2022-06-14 14:29 ` [PATCH v11 00/19] KVM: s390: pv: implement lazy destroy for reboot Janosch Frank
  19 siblings, 0 replies; 36+ messages in thread
From: Claudio Imbrenda @ 2022-06-03  6:56 UTC (permalink / raw)
  To: kvm
  Cc: borntraeger, frankja, thuth, pasic, david, linux-s390,
	linux-kernel, scgl, mimu, nrb

Add support for the Destroy Secure Configuration Fast Ultravisor call,
and take advantage of it for asynchronous destroy.

When supported, the protected guest is destroyed immediately using the
new UVC, leaving only the memory to be cleaned up asynchronously.

Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
---
 arch/s390/include/asm/uv.h | 10 +++++++
 arch/s390/kvm/pv.c         | 58 ++++++++++++++++++++++++++++++++------
 2 files changed, 59 insertions(+), 9 deletions(-)

diff --git a/arch/s390/include/asm/uv.h b/arch/s390/include/asm/uv.h
index ba64e0be03bb..8b255d26c5a7 100644
--- a/arch/s390/include/asm/uv.h
+++ b/arch/s390/include/asm/uv.h
@@ -34,6 +34,7 @@
 #define UVC_CMD_INIT_UV			0x000f
 #define UVC_CMD_CREATE_SEC_CONF		0x0100
 #define UVC_CMD_DESTROY_SEC_CONF	0x0101
+#define UVC_CMD_DESTROY_SEC_CONF_FAST	0x0102
 #define UVC_CMD_CREATE_SEC_CPU		0x0120
 #define UVC_CMD_DESTROY_SEC_CPU		0x0121
 #define UVC_CMD_CONV_TO_SEC_STOR	0x0200
@@ -77,6 +78,7 @@ enum uv_cmds_inst {
 	BIT_UVC_CMD_UNSHARE_ALL = 20,
 	BIT_UVC_CMD_PIN_PAGE_SHARED = 21,
 	BIT_UVC_CMD_UNPIN_PAGE_SHARED = 22,
+	BIT_UVC_CMD_DESTROY_SEC_CONF_FAST = 23,
 	BIT_UVC_CMD_RETR_ATTEST = 28,
 };
 
@@ -213,6 +215,14 @@ struct uv_cb_nodata {
 	u64 reserved20[4];
 } __packed __aligned(8);
 
+/* Destroy Configuration Fast */
+struct uv_cb_destroy_fast {
+	struct uv_cb_header header;
+	u64 reserved08[2];
+	u64 handle;
+	u64 reserved20[5];
+} __packed __aligned(8);
+
 /* Set Shared Access */
 struct uv_cb_share {
 	struct uv_cb_header header;
diff --git a/arch/s390/kvm/pv.c b/arch/s390/kvm/pv.c
index ab06fa366e49..51c35df41a83 100644
--- a/arch/s390/kvm/pv.c
+++ b/arch/s390/kvm/pv.c
@@ -188,6 +188,9 @@ static int kvm_s390_pv_dispose_one_leftover(struct kvm *kvm, struct leftover_pv_
 {
 	int cc;
 
+	/* It used the destroy-fast UVC, nothing left to do here */
+	if (!leftover->handle)
+		goto done_fast;
 	cc = uv_cmd_nodata(leftover->handle, UVC_CMD_DESTROY_SEC_CONF, rc, rrc);
 	KVM_UV_EVENT(kvm, 3, "PROTVIRT DESTROY LEFTOVER VM: rc %x rrc %x", *rc, *rrc);
 	WARN_ONCE(cc, "protvirt destroy leftover vm failed rc %x rrc %x", *rc, *rrc);
@@ -202,6 +205,7 @@ static int kvm_s390_pv_dispose_one_leftover(struct kvm *kvm, struct leftover_pv_
 	free_pages(leftover->stor_base, get_order(uv_info.guest_base_stor_len));
 	free_pages(leftover->old_gmap_table, CRST_ALLOC_ORDER);
 	vfree(leftover->stor_var);
+done_fast:
 	atomic_dec(&kvm->mm->context.protected_count);
 	return 0;
 }
@@ -310,6 +314,32 @@ static void kvm_s390_destroy_lower_2g(struct kvm *kvm)
 	srcu_read_unlock(&kvm->srcu, srcu_idx);
 }
 
+static int kvm_s390_pv_deinit_vm_fast(struct kvm *kvm, u16 *rc, u16 *rrc)
+{
+	struct uv_cb_destroy_fast uvcb = {
+		.header.cmd = UVC_CMD_DESTROY_SEC_CONF_FAST,
+		.header.len = sizeof(uvcb),
+		.handle = kvm_s390_pv_get_handle(kvm),
+	};
+	int cc;
+
+	cc = uv_call_sched(0, (u64)&uvcb);
+	*rc = uvcb.header.rc;
+	*rrc = uvcb.header.rrc;
+	WRITE_ONCE(kvm->arch.gmap->guest_handle, 0);
+	KVM_UV_EVENT(kvm, 3, "PROTVIRT DESTROY VM FAST: rc %x rrc %x", *rc, *rrc);
+	WARN_ONCE(cc, "protvirt destroy vm fast failed rc %x rrc %x", *rc, *rrc);
+	/* Intended memory leak on "impossible" error */
+	if (!cc)
+		kvm_s390_pv_dealloc_vm(kvm);
+	return cc ? -EIO : 0;
+}
+
+static inline bool is_destroy_fast_available(void)
+{
+	return test_bit_inv(BIT_UVC_CMD_DESTROY_SEC_CONF_FAST, uv_info.inst_calls_list);
+}
+
 /**
  * kvm_s390_pv_deinit_vm_async_prepare - Prepare a protected VM for
  * asynchronous teardown.
@@ -331,6 +361,7 @@ static void kvm_s390_destroy_lower_2g(struct kvm *kvm)
 int kvm_s390_pv_deinit_vm_async_prepare(struct kvm *kvm, u16 *rc, u16 *rrc)
 {
 	struct leftover_pv_vm *priv;
+	int res;
 
 	/*
 	 * If an asynchronous deinitialization is already pending, refuse.
@@ -342,14 +373,20 @@ int kvm_s390_pv_deinit_vm_async_prepare(struct kvm *kvm, u16 *rc, u16 *rrc)
 	if (!priv)
 		return -ENOMEM;
 
-	priv->stor_var = kvm->arch.pv.stor_var;
-	priv->stor_base = kvm->arch.pv.stor_base;
-	priv->handle = kvm_s390_pv_get_handle(kvm);
-	priv->old_gmap_table = (unsigned long)kvm->arch.gmap->table;
-	WRITE_ONCE(kvm->arch.gmap->guest_handle, 0);
-	if (s390_replace_asce(kvm->arch.gmap)) {
-		kfree(priv);
-		return -ENOMEM;
+	if (is_destroy_fast_available()) {
+		res = kvm_s390_pv_deinit_vm_fast(kvm, rc, rrc);
+		if (res)
+			return res;
+	} else {
+		priv->stor_var = kvm->arch.pv.stor_var;
+		priv->stor_base = kvm->arch.pv.stor_base;
+		priv->handle = kvm_s390_pv_get_handle(kvm);
+		priv->old_gmap_table = (unsigned long)kvm->arch.gmap->table;
+		WRITE_ONCE(kvm->arch.gmap->guest_handle, 0);
+		if (s390_replace_asce(kvm->arch.gmap)) {
+			kfree(priv);
+			return -ENOMEM;
+		}
 	}
 
 	kvm_s390_destroy_lower_2g(kvm);
@@ -415,6 +452,7 @@ static void kvm_s390_pv_mmu_notifier_release(struct mmu_notifier *subscription,
 {
 	struct kvm *kvm = container_of(subscription, struct kvm, arch.pv.mmu_notifier);
 	u16 dummy;
+	int r;
 
 	/*
 	 * No locking is needed since this is the last thread of the last user of this
@@ -423,7 +461,9 @@ static void kvm_s390_pv_mmu_notifier_release(struct mmu_notifier *subscription,
 	 * unregistered. This means that if this notifier runs, then the
 	 * struct kvm is still valid.
 	 */
-	kvm_s390_cpus_from_pv(kvm, &dummy, &dummy);
+	r = kvm_s390_cpus_from_pv(kvm, &dummy, &dummy);
+	if (!r && is_destroy_fast_available())
+		kvm_s390_pv_deinit_vm_fast(kvm, &dummy, &dummy);
 }
 
 static const struct mmu_notifier_ops kvm_s390_pv_mmu_notifier_ops = {
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH v11 18/19] KVM: s390: pv: avoid export before import if possible
  2022-06-03  6:56 ` [PATCH v11 18/19] KVM: s390: pv: avoid export before import if possible Claudio Imbrenda
@ 2022-06-07 14:33   ` Nico Boehr
  2022-06-20  9:56   ` Janosch Frank
  1 sibling, 0 replies; 36+ messages in thread
From: Nico Boehr @ 2022-06-07 14:33 UTC (permalink / raw)
  To: Claudio Imbrenda
  Cc: kvm, borntraeger, frankja, thuth, pasic, david, linux-s390,
	linux-kernel, scgl, mimu

On Fri,  3 Jun 2022 08:56:44 +0200
Claudio Imbrenda <imbrenda@linux.ibm.com> wrote:

> If the appropriate UV feature bit is set, there is no need to perform
> an export before import.
> 
> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>

Reviewed-by: Nico Boehr <nrb@linux.ibm.com>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v11 10/19] KVM: s390: pv: add mmu_notifier
  2022-06-03  6:56 ` [PATCH v11 10/19] KVM: s390: pv: add mmu_notifier Claudio Imbrenda
@ 2022-06-08 12:02   ` Nico Boehr
  0 siblings, 0 replies; 36+ messages in thread
From: Nico Boehr @ 2022-06-08 12:02 UTC (permalink / raw)
  To: Claudio Imbrenda
  Cc: kvm, borntraeger, frankja, thuth, pasic, david, linux-s390,
	linux-kernel, scgl, mimu

On Fri,  3 Jun 2022 08:56:36 +0200
Claudio Imbrenda <imbrenda@linux.ibm.com> wrote:

> Add an mmu_notifier for protected VMs. The callback function is
> triggered when the mm is torn down, and will attempt to convert all
> protected vCPUs to non-protected. This allows the mm teardown to use
> the destroy page UVC instead of export.
> 
> Also make KVM select CONFIG_MMU_NOTIFIER, needed to use mmu_notifiers.
> 
> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
> Acked-by: Janosch Frank <frankja@linux.ibm.com>

Reviewed-by: Nico Boehr <nrb@linux.ibm.com>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v11 11/19] s390/mm: KVM: pv: when tearing down, try to destroy protected pages
  2022-06-03  6:56 ` [PATCH v11 11/19] s390/mm: KVM: pv: when tearing down, try to destroy protected pages Claudio Imbrenda
@ 2022-06-08 12:03   ` Nico Boehr
  0 siblings, 0 replies; 36+ messages in thread
From: Nico Boehr @ 2022-06-08 12:03 UTC (permalink / raw)
  To: Claudio Imbrenda
  Cc: kvm, borntraeger, frankja, thuth, pasic, david, linux-s390,
	linux-kernel, scgl, mimu

On Fri,  3 Jun 2022 08:56:37 +0200
Claudio Imbrenda <imbrenda@linux.ibm.com> wrote:

> When ptep_get_and_clear_full is called for a mm teardown, we will now
> attempt to destroy the secure pages. This will be faster than export.
> 
> In case it was not a teardown, or if for some reason the destroy page
> UVC failed, we try with an export page, like before.
> 
> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
> Acked-by: Janosch Frank <frankja@linux.ibm.com>

Reviewed-by: Nico Boehr <nrb@linux.ibm.com>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v11 13/19] KVM: s390: pv: destroy the configuration before its memory
  2022-06-03  6:56 ` [PATCH v11 13/19] KVM: s390: pv: destroy the configuration before its memory Claudio Imbrenda
@ 2022-06-08 12:06   ` Nico Boehr
  2022-06-14 14:23   ` Janosch Frank
  1 sibling, 0 replies; 36+ messages in thread
From: Nico Boehr @ 2022-06-08 12:06 UTC (permalink / raw)
  To: Claudio Imbrenda
  Cc: kvm, borntraeger, frankja, thuth, pasic, david, linux-s390,
	linux-kernel, scgl, mimu

On Fri,  3 Jun 2022 08:56:39 +0200
Claudio Imbrenda <imbrenda@linux.ibm.com> wrote:

> Move the Destroy Secure Configuration UVC before the loop to destroy
> the memory. If the protected VM has memory, it will be cleaned up and
> made accessible by the Destroy Secure Configuraion UVC. The struct
> page for the relevant pages will still have the protected bit set, so
> the loop is still needed to clean that up.
> 
> Switching the order of those two operations does not change the
> outcome, but it is significantly faster.
> 
> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>

Reviewed-by: Nico Boehr <nrb@linux.ibm.com>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v11 13/19] KVM: s390: pv: destroy the configuration before its memory
  2022-06-03  6:56 ` [PATCH v11 13/19] KVM: s390: pv: destroy the configuration before its memory Claudio Imbrenda
  2022-06-08 12:06   ` Nico Boehr
@ 2022-06-14 14:23   ` Janosch Frank
  1 sibling, 0 replies; 36+ messages in thread
From: Janosch Frank @ 2022-06-14 14:23 UTC (permalink / raw)
  To: Claudio Imbrenda, kvm
  Cc: borntraeger, thuth, pasic, david, linux-s390, linux-kernel, scgl,
	mimu, nrb

On 6/3/22 08:56, Claudio Imbrenda wrote:
> Move the Destroy Secure Configuration UVC before the loop to destroy
> the memory. If the protected VM has memory, it will be cleaned up and
> made accessible by the Destroy Secure Configuraion UVC. The struct

s/Configuraion/Configuration/

> page for the relevant pages will still have the protected bit set, so
> the loop is still needed to clean that up.
> 
> Switching the order of those two operations does not change the
> outcome, but it is significantly faster.
> 
> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>

Reviewed-by: Janosch Frank <frankja@linux.ibm.com>

> ---
>   arch/s390/kvm/pv.c | 6 +++---
>   1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/s390/kvm/pv.c b/arch/s390/kvm/pv.c
> index a389555d62e7..6cffea26c47f 100644
> --- a/arch/s390/kvm/pv.c
> +++ b/arch/s390/kvm/pv.c
> @@ -163,6 +163,9 @@ int kvm_s390_pv_deinit_vm(struct kvm *kvm, u16 *rc, u16 *rrc)
>   {
>   	int cc;
>   
> +	cc = uv_cmd_nodata(kvm_s390_pv_get_handle(kvm),
> +			   UVC_CMD_DESTROY_SEC_CONF, rc, rrc);
> +	WRITE_ONCE(kvm->arch.gmap->guest_handle, 0);
>   	/*
>   	 * if the mm still has a mapping, make all its pages accessible
>   	 * before destroying the guest
> @@ -172,9 +175,6 @@ int kvm_s390_pv_deinit_vm(struct kvm *kvm, u16 *rc, u16 *rrc)
>   		mmput(kvm->mm);
>   	}
>   
> -	cc = uv_cmd_nodata(kvm_s390_pv_get_handle(kvm),
> -			   UVC_CMD_DESTROY_SEC_CONF, rc, rrc);
> -	WRITE_ONCE(kvm->arch.gmap->guest_handle, 0);
>   	if (!cc) {
>   		atomic_dec(&kvm->mm->context.protected_count);
>   		kvm_s390_pv_dealloc_vm(kvm);


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v11 00/19] KVM: s390: pv: implement lazy destroy for reboot
  2022-06-03  6:56 [PATCH v11 00/19] KVM: s390: pv: implement lazy destroy for reboot Claudio Imbrenda
                   ` (18 preceding siblings ...)
  2022-06-03  6:56 ` [PATCH v11 19/19] KVM: s390: pv: support for Destroy fast UVC Claudio Imbrenda
@ 2022-06-14 14:29 ` Janosch Frank
  19 siblings, 0 replies; 36+ messages in thread
From: Janosch Frank @ 2022-06-14 14:29 UTC (permalink / raw)
  To: Claudio Imbrenda, kvm
  Cc: borntraeger, thuth, pasic, david, linux-s390, linux-kernel, scgl,
	mimu, nrb

On 6/3/22 08:56, Claudio Imbrenda wrote:
> Previously, when a protected VM was rebooted or when it was shut down,
> its memory was made unprotected, and then the protected VM itself was
> destroyed. Looping over the whole address space can take some time,
> considering the overhead of the various Ultravisor Calls (UVCs). This
> means that a reboot or a shutdown would take a potentially long amount
> of time, depending on the amount of used memory.
> 
> This patchseries implements a deferred destroy mechanism for protected
> guests. When a protected guest is destroyed, its memory can be cleared
> in background, allowing the guest to restart or terminate significantly
> faster than before.
> 
> There are 2 possibilities when a protected VM is torn down:
> * it still has an address space associated (reboot case)
> * it does not have an address space anymore (shutdown case)
> 

Please add patches 1-13 to devel for some CI coverage.
I'll try reviewing the remaining patches this week.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v11 07/19] KVM: s390: pv: module parameter to fence asynchronous destroy
  2022-06-03  6:56 ` [PATCH v11 07/19] KVM: s390: pv: module parameter to fence asynchronous destroy Claudio Imbrenda
@ 2022-06-15  9:53   ` Janosch Frank
  2022-06-15  9:59     ` Claudio Imbrenda
  0 siblings, 1 reply; 36+ messages in thread
From: Janosch Frank @ 2022-06-15  9:53 UTC (permalink / raw)
  To: Claudio Imbrenda, kvm
  Cc: borntraeger, thuth, pasic, david, linux-s390, linux-kernel, scgl,
	mimu, nrb

On 6/3/22 08:56, Claudio Imbrenda wrote:
> Add the module parameter "async_destroy", to allow the asynchronous
> destroy mechanism to be switched off.  This might be useful for
> debugging purposes.
> 
> The parameter is enabled by default.
> 
> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
> Reviewed-by: Janosch Frank <frankja@linux.ibm.com>

Normally this would be one of the last patches in the series, no?

> ---
>   arch/s390/kvm/kvm-s390.c | 5 +++++
>   1 file changed, 5 insertions(+)
> 
> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> index 76ad6408cb2c..49e27b5d7c3a 100644
> --- a/arch/s390/kvm/kvm-s390.c
> +++ b/arch/s390/kvm/kvm-s390.c
> @@ -206,6 +206,11 @@ unsigned int diag9c_forwarding_hz;
>   module_param(diag9c_forwarding_hz, uint, 0644);
>   MODULE_PARM_DESC(diag9c_forwarding_hz, "Maximum diag9c forwarding per second, 0 to turn off");
>   
> +/* allow asynchronous deinit for protected guests, enable by default */
> +static int async_destroy = 1;
> +module_param(async_destroy, int, 0444);
> +MODULE_PARM_DESC(async_destroy, "Asynchronous destroy for protected guests");
> +
>   /*
>    * For now we handle at most 16 double words as this is what the s390 base
>    * kernel handles and stores in the prefix page. If we ever need to go beyond


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v11 14/19] KVM: s390: pv: cleanup leftover protected VMs if needed
  2022-06-03  6:56 ` [PATCH v11 14/19] KVM: s390: pv: cleanup leftover protected VMs if needed Claudio Imbrenda
@ 2022-06-15  9:59   ` Janosch Frank
  2022-06-15 10:19     ` Claudio Imbrenda
  0 siblings, 1 reply; 36+ messages in thread
From: Janosch Frank @ 2022-06-15  9:59 UTC (permalink / raw)
  To: Claudio Imbrenda, kvm
  Cc: borntraeger, thuth, pasic, david, linux-s390, linux-kernel, scgl,
	mimu, nrb

On 6/3/22 08:56, Claudio Imbrenda wrote:
> In upcoming patches it will be possible to start tearing down a
> protected VM, and finish the teardown concurrently in a different
> thread.

s/,/
s/the/its/

> 
> Protected VMs that are pending for tear down ("leftover") need to be
> cleaned properly when the userspace process (e.g. qemu) terminates.
> 
> This patch makes sure that all "leftover" protected VMs are always
> properly torn down.

So we're handling the kvm_arch_destroy_vm() case here, right?
Maybe add that in a more prominent way and rework the subject:

KVM: s390: pv: cleanup leftover PV VM shells on VM shutdown

> 
> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
> ---
>   arch/s390/include/asm/kvm_host.h |   2 +
>   arch/s390/kvm/kvm-s390.c         |   2 +
>   arch/s390/kvm/pv.c               | 109 ++++++++++++++++++++++++++++---
>   3 files changed, 104 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
> index 5824efe5fc9d..cca8e05e0a71 100644
> --- a/arch/s390/include/asm/kvm_host.h
> +++ b/arch/s390/include/asm/kvm_host.h
> @@ -924,6 +924,8 @@ struct kvm_s390_pv {
>   	u64 guest_len;
>   	unsigned long stor_base;
>   	void *stor_var;
> +	void *prepared_for_async_deinit;
> +	struct list_head need_cleanup;
>   	struct mmu_notifier mmu_notifier;
>   };
>   
> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> index fe1fa896def7..369de8377116 100644
> --- a/arch/s390/kvm/kvm-s390.c
> +++ b/arch/s390/kvm/kvm-s390.c
> @@ -2890,6 +2890,8 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>   	kvm_s390_vsie_init(kvm);
>   	if (use_gisa)
>   		kvm_s390_gisa_init(kvm);
> +	INIT_LIST_HEAD(&kvm->arch.pv.need_cleanup);
> +	kvm->arch.pv.prepared_for_async_deinit = NULL;
>   	KVM_EVENT(3, "vm 0x%pK created by pid %u", kvm, current->pid);
>   
>   	return 0;
> diff --git a/arch/s390/kvm/pv.c b/arch/s390/kvm/pv.c
> index 6cffea26c47f..8471c17d538c 100644
> --- a/arch/s390/kvm/pv.c
> +++ b/arch/s390/kvm/pv.c
> @@ -17,6 +17,19 @@
>   #include <linux/mmu_notifier.h>
>   #include "kvm-s390.h"
>   
> +/**
> + * @struct leftover_pv_vm

Any other ideas on naming these VMs?
Also I'd turn that around: pv_vm_leftover

> + * Represents a "leftover" protected VM that is still registered with the
> + * Ultravisor, but which does not correspond any longer to an active KVM VM.
> + */
> +struct leftover_pv_vm {
> +	struct list_head list;
> +	unsigned long old_gmap_table;
> +	u64 handle;
> +	void *stor_var;
> +	unsigned long stor_base;
> +};
> +

I think we should switch this patch and the next one and add this struct 
to the next patch. The list work below makes more sense once the next 
patch has been read.

>   static void kvm_s390_clear_pv_state(struct kvm *kvm)
>   {
>   	kvm->arch.pv.handle = 0;
> @@ -158,23 +171,88 @@ static int kvm_s390_pv_alloc_vm(struct kvm *kvm)
>   	return -ENOMEM;
>   }
>   

>   


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v11 07/19] KVM: s390: pv: module parameter to fence asynchronous destroy
  2022-06-15  9:53   ` Janosch Frank
@ 2022-06-15  9:59     ` Claudio Imbrenda
  0 siblings, 0 replies; 36+ messages in thread
From: Claudio Imbrenda @ 2022-06-15  9:59 UTC (permalink / raw)
  To: Janosch Frank
  Cc: kvm, borntraeger, thuth, pasic, david, linux-s390, linux-kernel,
	scgl, mimu, nrb

On Wed, 15 Jun 2022 11:53:17 +0200
Janosch Frank <frankja@linux.ibm.com> wrote:

> On 6/3/22 08:56, Claudio Imbrenda wrote:
> > Add the module parameter "async_destroy", to allow the asynchronous
> > destroy mechanism to be switched off.  This might be useful for
> > debugging purposes.
> > 
> > The parameter is enabled by default.
> > 
> > Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
> > Reviewed-by: Janosch Frank <frankja@linux.ibm.com>  
> 
> Normally this would be one of the last patches in the series, no?

I need the variable to be already defined, because the subsequent
patches use it to fence things

> 
> > ---
> >   arch/s390/kvm/kvm-s390.c | 5 +++++
> >   1 file changed, 5 insertions(+)
> > 
> > diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> > index 76ad6408cb2c..49e27b5d7c3a 100644
> > --- a/arch/s390/kvm/kvm-s390.c
> > +++ b/arch/s390/kvm/kvm-s390.c
> > @@ -206,6 +206,11 @@ unsigned int diag9c_forwarding_hz;
> >   module_param(diag9c_forwarding_hz, uint, 0644);
> >   MODULE_PARM_DESC(diag9c_forwarding_hz, "Maximum diag9c forwarding per second, 0 to turn off");
> >   
> > +/* allow asynchronous deinit for protected guests, enable by default */
> > +static int async_destroy = 1;
> > +module_param(async_destroy, int, 0444);
> > +MODULE_PARM_DESC(async_destroy, "Asynchronous destroy for protected guests");
> > +
> >   /*
> >    * For now we handle at most 16 double words as this is what the s390 base
> >    * kernel handles and stores in the prefix page. If we ever need to go beyond  
> 


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v11 14/19] KVM: s390: pv: cleanup leftover protected VMs if needed
  2022-06-15  9:59   ` Janosch Frank
@ 2022-06-15 10:19     ` Claudio Imbrenda
  2022-06-15 10:57       ` Janosch Frank
  0 siblings, 1 reply; 36+ messages in thread
From: Claudio Imbrenda @ 2022-06-15 10:19 UTC (permalink / raw)
  To: Janosch Frank
  Cc: kvm, borntraeger, thuth, pasic, david, linux-s390, linux-kernel,
	scgl, mimu, nrb

On Wed, 15 Jun 2022 11:59:36 +0200
Janosch Frank <frankja@linux.ibm.com> wrote:

> On 6/3/22 08:56, Claudio Imbrenda wrote:
> > In upcoming patches it will be possible to start tearing down a
> > protected VM, and finish the teardown concurrently in a different
> > thread.  
> 
> s/,/
> s/the/its/

will fix

> 
> > 
> > Protected VMs that are pending for tear down ("leftover") need to be
> > cleaned properly when the userspace process (e.g. qemu) terminates.
> > 
> > This patch makes sure that all "leftover" protected VMs are always
> > properly torn down.  
> 
> So we're handling the kvm_arch_destroy_vm() case here, right?

yes

> Maybe add that in a more prominent way and rework the subject:
> 
> KVM: s390: pv: cleanup leftover PV VM shells on VM shutdown

ok, I'll change the description and rework the subject

> 
> > 
> > Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
> > ---
> >   arch/s390/include/asm/kvm_host.h |   2 +
> >   arch/s390/kvm/kvm-s390.c         |   2 +
> >   arch/s390/kvm/pv.c               | 109 ++++++++++++++++++++++++++++---
> >   3 files changed, 104 insertions(+), 9 deletions(-)
> > 
> > diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
> > index 5824efe5fc9d..cca8e05e0a71 100644
> > --- a/arch/s390/include/asm/kvm_host.h
> > +++ b/arch/s390/include/asm/kvm_host.h
> > @@ -924,6 +924,8 @@ struct kvm_s390_pv {
> >   	u64 guest_len;
> >   	unsigned long stor_base;
> >   	void *stor_var;
> > +	void *prepared_for_async_deinit;
> > +	struct list_head need_cleanup;
> >   	struct mmu_notifier mmu_notifier;
> >   };
> >   
> > diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> > index fe1fa896def7..369de8377116 100644
> > --- a/arch/s390/kvm/kvm-s390.c
> > +++ b/arch/s390/kvm/kvm-s390.c
> > @@ -2890,6 +2890,8 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
> >   	kvm_s390_vsie_init(kvm);
> >   	if (use_gisa)
> >   		kvm_s390_gisa_init(kvm);
> > +	INIT_LIST_HEAD(&kvm->arch.pv.need_cleanup);
> > +	kvm->arch.pv.prepared_for_async_deinit = NULL;
> >   	KVM_EVENT(3, "vm 0x%pK created by pid %u", kvm, current->pid);
> >   
> >   	return 0;
> > diff --git a/arch/s390/kvm/pv.c b/arch/s390/kvm/pv.c
> > index 6cffea26c47f..8471c17d538c 100644
> > --- a/arch/s390/kvm/pv.c
> > +++ b/arch/s390/kvm/pv.c
> > @@ -17,6 +17,19 @@
> >   #include <linux/mmu_notifier.h>
> >   #include "kvm-s390.h"
> >   
> > +/**
> > + * @struct leftover_pv_vm  
> 
> Any other ideas on naming these VMs?

not really

> Also I'd turn that around: pv_vm_leftover

I mean, it's a leftover protected VM, it felt more natural to name it
that way

> 
> > + * Represents a "leftover" protected VM that is still registered with the
> > + * Ultravisor, but which does not correspond any longer to an active KVM VM.
> > + */
> > +struct leftover_pv_vm {
> > +	struct list_head list;
> > +	unsigned long old_gmap_table;
> > +	u64 handle;
> > +	void *stor_var;
> > +	unsigned long stor_base;
> > +};
> > +  
> 
> I think we should switch this patch and the next one and add this struct 
> to the next patch. The list work below makes more sense once the next 
> patch has been read.

but the next patch will leave leftovers in some circumstances, and
those won't be cleaned up without this patch.

having this patch first means that when the next patch is applied, the
leftovers are already taken care of

> >   static void kvm_s390_clear_pv_state(struct kvm *kvm)
> >   {
> >   	kvm->arch.pv.handle = 0;
> > @@ -158,23 +171,88 @@ static int kvm_s390_pv_alloc_vm(struct kvm *kvm)
> >   	return -ENOMEM;
> >   }
> >     
> 
> >     
> 


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v11 14/19] KVM: s390: pv: cleanup leftover protected VMs if needed
  2022-06-15 10:19     ` Claudio Imbrenda
@ 2022-06-15 10:57       ` Janosch Frank
  2022-06-15 11:13         ` Claudio Imbrenda
  0 siblings, 1 reply; 36+ messages in thread
From: Janosch Frank @ 2022-06-15 10:57 UTC (permalink / raw)
  To: Claudio Imbrenda
  Cc: kvm, borntraeger, thuth, pasic, david, linux-s390, linux-kernel,
	scgl, mimu, nrb

On 6/15/22 12:19, Claudio Imbrenda wrote:
> On Wed, 15 Jun 2022 11:59:36 +0200
> Janosch Frank <frankja@linux.ibm.com> wrote:
> 
>> On 6/3/22 08:56, Claudio Imbrenda wrote:
>>> In upcoming patches it will be possible to start tearing down a
>>> protected VM, and finish the teardown concurrently in a different
>>> thread.
>>
>> s/,/
>> s/the/its/
> 
> will fix
> 
>>
>>>
>>> Protected VMs that are pending for tear down ("leftover") need to be
>>> cleaned properly when the userspace process (e.g. qemu) terminates.
>>>
>>> This patch makes sure that all "leftover" protected VMs are always
>>> properly torn down.
>>
>> So we're handling the kvm_arch_destroy_vm() case here, right?
> 
> yes
> 
>> Maybe add that in a more prominent way and rework the subject:
>>
>> KVM: s390: pv: cleanup leftover PV VM shells on VM shutdown
> 
> ok, I'll change the description and rework the subject
> 
>>
>>>
>>> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
>>> ---
>>>    arch/s390/include/asm/kvm_host.h |   2 +
>>>    arch/s390/kvm/kvm-s390.c         |   2 +
>>>    arch/s390/kvm/pv.c               | 109 ++++++++++++++++++++++++++++---
>>>    3 files changed, 104 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
>>> index 5824efe5fc9d..cca8e05e0a71 100644
>>> --- a/arch/s390/include/asm/kvm_host.h
>>> +++ b/arch/s390/include/asm/kvm_host.h
>>> @@ -924,6 +924,8 @@ struct kvm_s390_pv {
>>>    	u64 guest_len;
>>>    	unsigned long stor_base;
>>>    	void *stor_var;
>>> +	void *prepared_for_async_deinit;
>>> +	struct list_head need_cleanup;
>>>    	struct mmu_notifier mmu_notifier;
>>>    };
>>>    
>>> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
>>> index fe1fa896def7..369de8377116 100644
>>> --- a/arch/s390/kvm/kvm-s390.c
>>> +++ b/arch/s390/kvm/kvm-s390.c
>>> @@ -2890,6 +2890,8 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>>>    	kvm_s390_vsie_init(kvm);
>>>    	if (use_gisa)
>>>    		kvm_s390_gisa_init(kvm);
>>> +	INIT_LIST_HEAD(&kvm->arch.pv.need_cleanup);
>>> +	kvm->arch.pv.prepared_for_async_deinit = NULL;
>>>    	KVM_EVENT(3, "vm 0x%pK created by pid %u", kvm, current->pid);
>>>    
>>>    	return 0;
>>> diff --git a/arch/s390/kvm/pv.c b/arch/s390/kvm/pv.c
>>> index 6cffea26c47f..8471c17d538c 100644
>>> --- a/arch/s390/kvm/pv.c
>>> +++ b/arch/s390/kvm/pv.c
>>> @@ -17,6 +17,19 @@
>>>    #include <linux/mmu_notifier.h>
>>>    #include "kvm-s390.h"
>>>    
>>> +/**
>>> + * @struct leftover_pv_vm
>>
>> Any other ideas on naming these VMs?
> 
> not really
> 
>> Also I'd turn that around: pv_vm_leftover
> 
> I mean, it's a leftover protected VM, it felt more natural to name it
> that way
> 
>>
>>> + * Represents a "leftover" protected VM that is still registered with the
>>> + * Ultravisor, but which does not correspond any longer to an active KVM VM.
>>> + */
>>> +struct leftover_pv_vm {
>>> +	struct list_head list;
>>> +	unsigned long old_gmap_table;
>>> +	u64 handle;
>>> +	void *stor_var;
>>> +	unsigned long stor_base;
>>> +};
>>> +
>>
>> I think we should switch this patch and the next one and add this struct
>> to the next patch. The list work below makes more sense once the next
>> patch has been read.
> 
> but the next patch will leave leftovers in some circumstances, and
> those won't be cleaned up without this patch.
> 
> having this patch first means that when the next patch is applied, the
> leftovers are already taken care of

Then I opt for squashing the patch.

Without the next patch prepared_for_async_deinit will always be NULL and 
this code is completely unneeded, no?

> 
>>>    static void kvm_s390_clear_pv_state(struct kvm *kvm)
>>>    {
>>>    	kvm->arch.pv.handle = 0;
>>> @@ -158,23 +171,88 @@ static int kvm_s390_pv_alloc_vm(struct kvm *kvm)
>>>    	return -ENOMEM;
>>>    }
>>>      
>>
>>>      
>>
> 


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v11 15/19] KVM: s390: pv: asynchronous destroy for reboot
  2022-06-03  6:56 ` [PATCH v11 15/19] KVM: s390: pv: asynchronous destroy for reboot Claudio Imbrenda
@ 2022-06-15 10:58   ` Janosch Frank
  2022-06-20  9:41   ` Janosch Frank
  1 sibling, 0 replies; 36+ messages in thread
From: Janosch Frank @ 2022-06-15 10:58 UTC (permalink / raw)
  To: Claudio Imbrenda, kvm
  Cc: borntraeger, thuth, pasic, david, linux-s390, linux-kernel, scgl,
	mimu, nrb

On 6/3/22 08:56, Claudio Imbrenda wrote:
> Until now, destroying a protected guest was an entirely synchronous
> operation that could potentially take a very long time, depending on
> the size of the guest, due to the time needed to clean up the address
> space from protected pages.
> 
> This patch implements an asynchronous destroy mechanism that allows a
> protected guest to reboot significantly faster than previously.
> 
> This is achieved by clearing the pages of the old guest in background.
> In case of reboot, the new guest will be able to run in the same
> address space almost immediately.
> 
> The old protected guest is then only destroyed when all of its memory has
> been destroyed or otherwise made non-protected.
> 
> Two new PV commands are added for the KVM_S390_PV_COMMAND ioctl:
> 
> KVM_PV_ASYNC_DISABLE_PREPARE: prepares the current protected VM for
> asynchronous teardown. The current VM will then continue immediately
> as non-protected. If a protected VM had already been set aside without
> starting the teardown process, this call will fail.
> 
> KVM_PV_ASYNC_DISABLE: tears down the protected VM previously set aside
> for asynchronous teardown. This PV command should ideally be issued by
> userspace from a separate thread. If a fatal signal is received (or the
> process terminates naturally), the command will terminate immediately
> without completing.
> 
> Leftover protected VMs are cleaned up when a KVM VM is torn down
> normally (either via IOCTL or when the process terminates); this
> cleanup has been implemented in a previous patch.
> 
> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
> ---
>   arch/s390/kvm/kvm-s390.c |  34 +++++++++-
>   arch/s390/kvm/kvm-s390.h |   2 +
>   arch/s390/kvm/pv.c       | 131 +++++++++++++++++++++++++++++++++++++++
>   include/uapi/linux/kvm.h |   2 +
>   4 files changed, 166 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> index 369de8377116..842419092c0c 100644
> --- a/arch/s390/kvm/kvm-s390.c
> +++ b/arch/s390/kvm/kvm-s390.c
> @@ -2256,9 +2256,13 @@ static int kvm_s390_cpus_to_pv(struct kvm *kvm, u16 *rc, u16 *rrc)
>   
>   static int kvm_s390_handle_pv(struct kvm *kvm, struct kvm_pv_cmd *cmd)
>   {
> +	const bool needslock = (cmd->cmd != KVM_PV_ASYNC_DISABLE);
> +	void __user *argp = (void __user *)cmd->data;
>   	int r = 0;
>   	u16 dummy;
> -	void __user *argp = (void __user *)cmd->data;
> +
> +	if (needslock)
> +		mutex_lock(&kvm->lock);
>   
>   	switch (cmd->cmd) {
>   	case KVM_PV_ENABLE: {
> @@ -2292,6 +2296,28 @@ static int kvm_s390_handle_pv(struct kvm *kvm, struct kvm_pv_cmd *cmd)
>   		set_bit(IRQ_PEND_EXT_SERVICE, &kvm->arch.float_int.masked_irqs);
>   		break;
>   	}
> +	case KVM_PV_ASYNC_DISABLE_PREPARE:
> +		r = -EINVAL;
> +		if (!kvm_s390_pv_is_protected(kvm) || !async_destroy)
> +			break;
> +
> +		r = kvm_s390_cpus_from_pv(kvm, &cmd->rc, &cmd->rrc);
> +		/*
> +		 * If a CPU could not be destroyed, destroy VM will also fail.
> +		 * There is no point in trying to destroy it. Instead return
> +		 * the rc and rrc from the first CPU that failed destroying.
> +		 */
> +		if (r)
> +			break;
> +		r = kvm_s390_pv_deinit_vm_async_prepare(kvm, &cmd->rc, &cmd->rrc);
> +
> +		/* no need to block service interrupts any more */
> +		clear_bit(IRQ_PEND_EXT_SERVICE, &kvm->arch.float_int.masked_irqs);
> +		break;
> +	case KVM_PV_ASYNC_DISABLE:
> +		/* This must not be called while holding kvm->lock */
> +		r = kvm_s390_pv_deinit_vm_async(kvm, &cmd->rc, &cmd->rrc);
> +		break;
>   	case KVM_PV_DISABLE: {
>   		r = -EINVAL;
>   		if (!kvm_s390_pv_is_protected(kvm))
> @@ -2393,6 +2419,9 @@ static int kvm_s390_handle_pv(struct kvm *kvm, struct kvm_pv_cmd *cmd)
>   	default:
>   		r = -ENOTTY;
>   	}
> +	if (needslock)
> +		mutex_unlock(&kvm->lock);
> +
>   	return r;
>   }
>   
> @@ -2597,9 +2626,8 @@ long kvm_arch_vm_ioctl(struct file *filp,
>   			r = -EINVAL;
>   			break;
>   		}
> -		mutex_lock(&kvm->lock);
> +		/* must be called without kvm->lock */
>   		r = kvm_s390_handle_pv(kvm, &args);
> -		mutex_unlock(&kvm->lock);
>   		if (copy_to_user(argp, &args, sizeof(args))) {
>   			r = -EFAULT;
>   			break;
> diff --git a/arch/s390/kvm/kvm-s390.h b/arch/s390/kvm/kvm-s390.h
> index d3abedafa7a8..d296afb6041c 100644
> --- a/arch/s390/kvm/kvm-s390.h
> +++ b/arch/s390/kvm/kvm-s390.h
> @@ -243,6 +243,8 @@ static inline u32 kvm_s390_get_gisa_desc(struct kvm *kvm)
>   /* implemented in pv.c */
>   int kvm_s390_pv_destroy_cpu(struct kvm_vcpu *vcpu, u16 *rc, u16 *rrc);
>   int kvm_s390_pv_create_cpu(struct kvm_vcpu *vcpu, u16 *rc, u16 *rrc);
> +int kvm_s390_pv_deinit_vm_async_prepare(struct kvm *kvm, u16 *rc, u16 *rrc);
> +int kvm_s390_pv_deinit_vm_async(struct kvm *kvm, u16 *rc, u16 *rrc);
>   int kvm_s390_pv_deinit_vm(struct kvm *kvm, u16 *rc, u16 *rrc);
>   int kvm_s390_pv_init_vm(struct kvm *kvm, u16 *rc, u16 *rrc);
>   int kvm_s390_pv_set_sec_parms(struct kvm *kvm, void *hdr, u64 length, u16 *rc,
> diff --git a/arch/s390/kvm/pv.c b/arch/s390/kvm/pv.c
> index 8471c17d538c..ab06fa366e49 100644
> --- a/arch/s390/kvm/pv.c
> +++ b/arch/s390/kvm/pv.c
> @@ -279,6 +279,137 @@ int kvm_s390_pv_deinit_vm(struct kvm *kvm, u16 *rc, u16 *rrc)
>   	return cc ? -EIO : 0;
>   }
>   
> +/**
> + * kvm_s390_destroy_lower_2g - Destroy the first 2GB of protected guest memory.
> + * @kvm the VM whose memory is to be cleared.
> + * Destroy the first 2GB of guest memory, to avoid prefix issues after reboot.
> + */
> +static void kvm_s390_destroy_lower_2g(struct kvm *kvm)
> +{
> +	struct kvm_memory_slot *slot;
> +	unsigned long lim;
> +	int srcu_idx;
> +
> +	srcu_idx = srcu_read_lock(&kvm->srcu);
> +
> +	/* Take the memslot containing guest absolute address 0 */
> +	slot = gfn_to_memslot(kvm, 0);
> +	/* Clear all slots that are completely below 2GB */
> +	while (slot && slot->base_gfn + slot->npages < SZ_2G / PAGE_SIZE) {
> +		lim = slot->userspace_addr + slot->npages * PAGE_SIZE;
> +		s390_uv_destroy_range(kvm->mm, slot->userspace_addr, lim);
> +		/* Take the next memslot */
> +		slot = gfn_to_memslot(kvm, slot->base_gfn + slot->npages);
> +	}
> +	/* Last slot crosses the 2G boundary, clear only up to 2GB */
> +	if (slot && slot->base_gfn < SZ_2G / PAGE_SIZE) {
> +		lim = slot->userspace_addr + SZ_2G - slot->base_gfn * PAGE_SIZE;
> +		s390_uv_destroy_range(kvm->mm, slot->userspace_addr, lim);
> +	}

Any reason why you split that up instead of always calculating a length 
and using MIN()?
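
For illustration, one possible shape of the min()-based variant (only a
sketch of the suggestion, untested and not part of the patch):

static void kvm_s390_destroy_lower_2g(struct kvm *kvm)
{
	struct kvm_memory_slot *slot;
	unsigned long len;
	int srcu_idx;

	srcu_idx = srcu_read_lock(&kvm->srcu);

	/* Walk the memslots starting at guest absolute address 0 */
	slot = gfn_to_memslot(kvm, 0);
	while (slot && slot->base_gfn < SZ_2G / PAGE_SIZE) {
		/* Length of this slot, capped at the 2GB boundary */
		len = min_t(unsigned long, slot->npages,
			    SZ_2G / PAGE_SIZE - slot->base_gfn) * PAGE_SIZE;
		s390_uv_destroy_range(kvm->mm, slot->userspace_addr,
				      slot->userspace_addr + len);
		/* Take the next memslot */
		slot = gfn_to_memslot(kvm, slot->base_gfn + slot->npages);
	}

	srcu_read_unlock(&kvm->srcu, srcu_idx);
}

This folds the loop and the trailing 2G-boundary special case into one, at
the cost of computing the cap on every iteration.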



* Re: [PATCH v11 14/19] KVM: s390: pv: cleanup leftover protected VMs if needed
  2022-06-15 10:57       ` Janosch Frank
@ 2022-06-15 11:13         ` Claudio Imbrenda
  0 siblings, 0 replies; 36+ messages in thread
From: Claudio Imbrenda @ 2022-06-15 11:13 UTC (permalink / raw)
  To: Janosch Frank
  Cc: kvm, borntraeger, thuth, pasic, david, linux-s390, linux-kernel,
	scgl, mimu, nrb

On Wed, 15 Jun 2022 12:57:39 +0200
Janosch Frank <frankja@linux.ibm.com> wrote:

[...]

> >> I think we should switch this patch and the next one and add this struct
> >> to the next patch. The list work below makes more sense once the next
> >> patch has been read.  
> > 
> > but the next patch will leave leftovers in some circumstances, and
> > those won't be cleaned up without this patch.
> > 
> > having this patch first means that when the next patch is applied, the
> > leftovers are already taken care of  
> 
> Then I opt for squashing the patch.
> 
> Without the next patch prepared_for_async_deinit will always be NULL and 
> this code is completely unneeded, no?

correct. I had split them to make them smaller and easier to review

I will squash them if you think it's better

> 
> >   
> >>>    static void kvm_s390_clear_pv_state(struct kvm *kvm)
> >>>    {
> >>>    	kvm->arch.pv.handle = 0;
> >>> @@ -158,23 +171,88 @@ static int kvm_s390_pv_alloc_vm(struct kvm *kvm)
> >>>    	return -ENOMEM;
> >>>    }
> >>>        
> >>  
> >>>        
> >>  
> >   
> 



* Re: [PATCH v11 16/19] KVM: s390: pv: api documentation for asynchronous destroy
  2022-06-03  6:56 ` [PATCH v11 16/19] KVM: s390: pv: api documentation for asynchronous destroy Claudio Imbrenda
@ 2022-06-20  9:13   ` Janosch Frank
  0 siblings, 0 replies; 36+ messages in thread
From: Janosch Frank @ 2022-06-20  9:13 UTC (permalink / raw)
  To: Claudio Imbrenda, kvm
  Cc: borntraeger, thuth, pasic, david, linux-s390, linux-kernel, scgl,
	mimu, nrb

On 6/3/22 08:56, Claudio Imbrenda wrote:
> Add documentation for the new commands added to the KVM_S390_PV_COMMAND
> ioctl.
> 
> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
> Reviewed-by: Nico Boehr <nrb@linux.ibm.com>
> ---
>   Documentation/virt/kvm/api.rst | 25 ++++++++++++++++++++++---
>   1 file changed, 22 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 11e00a46c610..97d35b30ce3b 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -5143,11 +5143,13 @@ KVM_PV_ENABLE
>     =====      =============================
>   
>   KVM_PV_DISABLE
> -
>     Deregister the VM from the Ultravisor and reclaim the memory that
>     had been donated to the Ultravisor, making it usable by the kernel
> -  again.  All registered VCPUs are converted back to non-protected
> -  ones.
> +  again. All registered VCPUs are converted back to non-protected
> +  ones. If a previous VM had been prepared for asynchronous teardown
> +  with KVM_PV_ASYNC_DISABLE_PREPARE and not actually torn down with
> +  KVM_PV_ASYNC_DISABLE, it will be torn down in this call together with
> +  the current VM.
>   
>   KVM_PV_VM_SET_SEC_PARMS
>     Pass the image header from VM memory to the Ultravisor in
> @@ -5160,6 +5162,23 @@ KVM_PV_VM_VERIFY
>     Verify the integrity of the unpacked image. Only if this succeeds,
>     KVM is allowed to start protected VCPUs.
>   
> +KVM_PV_ASYNC_DISABLE_PREPARE
> +  Prepare the current protected VM for asynchronous teardown. Most
> +  resources used by the current protected VM will be set aside for a

We should state that leftover UV state needs cleanup, namely secure 
storage and the configuration.
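
To make the intended flow concrete for readers of the documentation, a
minimal userspace sketch (assuming the new command macros added by this
series in the uapi headers; error handling trimmed, so purely
illustrative):

#include <stdint.h>
#include <pthread.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static void *async_disable_worker(void *arg)
{
	int vm_fd = (int)(intptr_t)arg;
	struct kvm_pv_cmd cmd = { .cmd = KVM_PV_ASYNC_DISABLE };

	/* Runs without kvm->lock in the kernel and may take a long time */
	ioctl(vm_fd, KVM_S390_PV_COMMAND, &cmd);
	return NULL;
}

static int teardown_for_reboot(int vm_fd)
{
	struct kvm_pv_cmd prep = { .cmd = KVM_PV_ASYNC_DISABLE_PREPARE };
	pthread_t worker;

	/* Set the protected VM aside; the VM continues as non-protected */
	if (ioctl(vm_fd, KVM_S390_PV_COMMAND, &prep))
		return -1;	/* fall back to a plain KVM_PV_DISABLE */

	/* Pay for the actual teardown in a separate thread */
	pthread_create(&worker, NULL, async_disable_worker,
		       (void *)(intptr_t)vm_fd);
	pthread_detach(worker);
	return 0;
}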


* Re: [PATCH v11 15/19] KVM: s390: pv: asynchronous destroy for reboot
  2022-06-03  6:56 ` [PATCH v11 15/19] KVM: s390: pv: asynchronous destroy for reboot Claudio Imbrenda
  2022-06-15 10:58   ` Janosch Frank
@ 2022-06-20  9:41   ` Janosch Frank
  1 sibling, 0 replies; 36+ messages in thread
From: Janosch Frank @ 2022-06-20  9:41 UTC (permalink / raw)
  To: Claudio Imbrenda, kvm
  Cc: borntraeger, thuth, pasic, david, linux-s390, linux-kernel, scgl,
	mimu, nrb

On 6/3/22 08:56, Claudio Imbrenda wrote:
> Until now, destroying a protected guest was an entirely synchronous
> operation that could potentially take a very long time, depending on
> the size of the guest, due to the time needed to clean up the address
> space from protected pages.
> 
> This patch implements an asynchronous destroy mechanism that allows a
> protected guest to reboot significantly faster than previously.
> 
> This is achieved by clearing the pages of the old guest in background.
> In case of reboot, the new guest will be able to run in the same
> address space almost immediately.
> 
> The old protected guest is then only destroyed when all of its memory has
> been destroyed or otherwise made non-protected.
> 
> Two new PV commands are added for the KVM_S390_PV_COMMAND ioctl:
> 
> KVM_PV_ASYNC_DISABLE_PREPARE: prepares the current protected VM for
> asynchronous teardown. The current VM will then continue immediately
> as non-protected. If a protected VM had already been set aside without
> starting the teardown process, this call will fail.
> 
> KVM_PV_ASYNC_DISABLE: tears down the protected VM previously set aside
> for asynchronous teardown. This PV command should ideally be issued by
> userspace from a separate thread. If a fatal signal is received (or the
> process terminates naturally), the command will terminate immediately
> without completing.
> 
> Leftover protected VMs are cleaned up when a KVM VM is torn down
> normally (either via IOCTL or when the process terminates); this
> cleanup has been implemented in a previous patch.
> 
> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
> ---
>   arch/s390/kvm/kvm-s390.c |  34 +++++++++-
>   arch/s390/kvm/kvm-s390.h |   2 +
>   arch/s390/kvm/pv.c       | 131 +++++++++++++++++++++++++++++++++++++++
>   include/uapi/linux/kvm.h |   2 +
>   4 files changed, 166 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> index 369de8377116..842419092c0c 100644
> --- a/arch/s390/kvm/kvm-s390.c
> +++ b/arch/s390/kvm/kvm-s390.c
> @@ -2256,9 +2256,13 @@ static int kvm_s390_cpus_to_pv(struct kvm *kvm, u16 *rc, u16 *rrc)
>   
>   static int kvm_s390_handle_pv(struct kvm *kvm, struct kvm_pv_cmd *cmd)
>   {
> +	const bool needslock = (cmd->cmd != KVM_PV_ASYNC_DISABLE);
> +	void __user *argp = (void __user *)cmd->data;
>   	int r = 0;
>   	u16 dummy;
> -	void __user *argp = (void __user *)cmd->data;
> +
> +	if (needslock)
> +		mutex_lock(&kvm->lock);
>   
>   	switch (cmd->cmd) {
>   	case KVM_PV_ENABLE: {
> @@ -2292,6 +2296,28 @@ static int kvm_s390_handle_pv(struct kvm *kvm, struct kvm_pv_cmd *cmd)
>   		set_bit(IRQ_PEND_EXT_SERVICE, &kvm->arch.float_int.masked_irqs);
>   		break;
>   	}
> +	case KVM_PV_ASYNC_DISABLE_PREPARE:
> +		r = -EINVAL;
> +		if (!kvm_s390_pv_is_protected(kvm) || !async_destroy)
> +			break;
> +
> +		r = kvm_s390_cpus_from_pv(kvm, &cmd->rc, &cmd->rrc);
> +		/*
> +		 * If a CPU could not be destroyed, destroy VM will also fail.
> +		 * There is no point in trying to destroy it. Instead return
> +		 * the rc and rrc from the first CPU that failed destroying.
> +		 */
> +		if (r)
> +			break;
> +		r = kvm_s390_pv_deinit_vm_async_prepare(kvm, &cmd->rc, &cmd->rrc);
> +
> +		/* no need to block service interrupts any more */
> +		clear_bit(IRQ_PEND_EXT_SERVICE, &kvm->arch.float_int.masked_irqs);
> +		break;
> +	case KVM_PV_ASYNC_DISABLE:

I'd like to have the ASYNC and DISABLE swapped so we're in line with the 
normal disable. Also renaming KVM_PV_ASYNC_DISABLE to 
KVM_PV_ASYNC_DISABLE_EXECUTE or something similar would make sense to me.

> +		/* This must not be called while holding kvm->lock */
> +		r = kvm_s390_pv_deinit_vm_async(kvm, &cmd->rc, &cmd->rrc);
> +		break;
>   	case KVM_PV_DISABLE: {
>   		r = -EINVAL;
>   		if (!kvm_s390_pv_is_protected(kvm))
> @@ -2393,6 +2419,9 @@ static int kvm_s390_handle_pv(struct kvm *kvm, struct kvm_pv_cmd *cmd)
>   	default:
>   		r = -ENOTTY;
>   	}
> +	if (needslock)
> +		mutex_unlock(&kvm->lock);
> +
>   	return r;
>   }
>   
> @@ -2597,9 +2626,8 @@ long kvm_arch_vm_ioctl(struct file *filp,
>   			r = -EINVAL;
>   			break;
>   		}
> -		mutex_lock(&kvm->lock);
> +		/* must be called without kvm->lock */
>   		r = kvm_s390_handle_pv(kvm, &args);
> -		mutex_unlock(&kvm->lock);
>   		if (copy_to_user(argp, &args, sizeof(args))) {
>   			r = -EFAULT;
>   			break;
> diff --git a/arch/s390/kvm/kvm-s390.h b/arch/s390/kvm/kvm-s390.h
> index d3abedafa7a8..d296afb6041c 100644
> --- a/arch/s390/kvm/kvm-s390.h
> +++ b/arch/s390/kvm/kvm-s390.h
> @@ -243,6 +243,8 @@ static inline u32 kvm_s390_get_gisa_desc(struct kvm *kvm)
>   /* implemented in pv.c */
>   int kvm_s390_pv_destroy_cpu(struct kvm_vcpu *vcpu, u16 *rc, u16 *rrc);
>   int kvm_s390_pv_create_cpu(struct kvm_vcpu *vcpu, u16 *rc, u16 *rrc);
> +int kvm_s390_pv_deinit_vm_async_prepare(struct kvm *kvm, u16 *rc, u16 *rrc);
> +int kvm_s390_pv_deinit_vm_async(struct kvm *kvm, u16 *rc, u16 *rrc);
>   int kvm_s390_pv_deinit_vm(struct kvm *kvm, u16 *rc, u16 *rrc);
>   int kvm_s390_pv_init_vm(struct kvm *kvm, u16 *rc, u16 *rrc);
>   int kvm_s390_pv_set_sec_parms(struct kvm *kvm, void *hdr, u64 length, u16 *rc,
> diff --git a/arch/s390/kvm/pv.c b/arch/s390/kvm/pv.c
> index 8471c17d538c..ab06fa366e49 100644
> --- a/arch/s390/kvm/pv.c
> +++ b/arch/s390/kvm/pv.c
> @@ -279,6 +279,137 @@ int kvm_s390_pv_deinit_vm(struct kvm *kvm, u16 *rc, u16 *rrc)
>   	return cc ? -EIO : 0;
>   }
>   
> +/**
> + * kvm_s390_destroy_lower_2g - Destroy the first 2GB of protected guest memory.
> + * @kvm the VM whose memory is to be cleared.
> + * Destroy the first 2GB of guest memory, to avoid prefix issues after reboot.

Please add a line stating that destroying is only possible if the 
configuration that owns the storage has no PV cpus registered with the UV.

> + */
> +static void kvm_s390_destroy_lower_2g(struct kvm *kvm)
> +{
> +	struct kvm_memory_slot *slot;
> +	unsigned long lim;
> +	int srcu_idx;
> +
> +	srcu_idx = srcu_read_lock(&kvm->srcu);
> +
> +	/* Take the memslot containing guest absolute address 0 */
> +	slot = gfn_to_memslot(kvm, 0);
> +	/* Clear all slots that are completely below 2GB */
> +	while (slot && slot->base_gfn + slot->npages < SZ_2G / PAGE_SIZE) {
> +		lim = slot->userspace_addr + slot->npages * PAGE_SIZE;
> +		s390_uv_destroy_range(kvm->mm, slot->userspace_addr, lim);
> +		/* Take the next memslot */
> +		slot = gfn_to_memslot(kvm, slot->base_gfn + slot->npages);
> +	}
> +	/* Last slot crosses the 2G boundary, clear only up to 2GB */
> +	if (slot && slot->base_gfn < SZ_2G / PAGE_SIZE) {
> +		lim = slot->userspace_addr + SZ_2G - slot->base_gfn * PAGE_SIZE;
> +		s390_uv_destroy_range(kvm->mm, slot->userspace_addr, lim);
> +	}
> +
> +	srcu_read_unlock(&kvm->srcu, srcu_idx);
> +}
> +
> +/**
> + * kvm_s390_pv_deinit_vm_async_prepare - Prepare a protected VM for
> + * asynchronous teardown.
> + * @kvm the VM
> + * @rc return value for the RC field of the UVCB
> + * @rrc return value for the RRC field of the UVCB
> + *
> + * Prepare the protected VM for asynchronous teardown. The VM will be able
> + * to continue immediately as a non-secure VM, and the information needed to
> + * properly tear down the protected VM is set aside. If another protected VM
> + * was already set aside without starting a teardown, the function will
> + * fail.
> + *
> + * Context: kvm->lock needs to be held

You're asserting that via lockdep in the function below, why not here?

Also add something like: All PV cpus need to be destroyed by the caller 
before calling this function.

I'm considering that we add a cpu counter to the kvm arch pv struct 
which counts the currently created PV cpus so we can do a proper 
WARN_ONCE(count) here.
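
Roughly what that could look like at the top of
kvm_s390_pv_deinit_vm_async_prepare() (created_cpus is a hypothetical
atomic_t maintained in the PV cpu create/destroy paths, sketched here only
to illustrate the suggestion):

	lockdep_assert_held(&kvm->lock);

	/* All PV cpus must have been destroyed by the caller */
	WARN_ONCE(atomic_read(&kvm->arch.pv.created_cpus),
		  "PV cpus still registered with the UV");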

> + *
> + * Return: 0 in case of success, -EINVAL if another protected VM was already set
> + * aside, -ENOMEM if the system ran out of memory.
> + */
> +int kvm_s390_pv_deinit_vm_async_prepare(struct kvm *kvm, u16 *rc, u16 *rrc)
> +{
> +	struct leftover_pv_vm *priv;
> +
> +	/*
> +	 * If an asynchronous deinitialization is already pending, refuse.
> +	 * A synchronous deinitialization has to be performed instead.
> +	 */
> +	if (READ_ONCE(kvm->arch.pv.prepared_for_async_deinit))
> +		return -EINVAL;
> +	priv = kmalloc(sizeof(*priv), GFP_KERNEL | __GFP_ZERO);
> +	if (!priv)
> +		return -ENOMEM;
> +
> +	priv->stor_var = kvm->arch.pv.stor_var;
> +	priv->stor_base = kvm->arch.pv.stor_base;
> +	priv->handle = kvm_s390_pv_get_handle(kvm);
> +	priv->old_gmap_table = (unsigned long)kvm->arch.gmap->table;
> +	WRITE_ONCE(kvm->arch.gmap->guest_handle, 0);
> +	if (s390_replace_asce(kvm->arch.gmap)) {
> +		kfree(priv);
> +		return -ENOMEM;
> +	}
> +
> +	kvm_s390_destroy_lower_2g(kvm);
> +	kvm_s390_clear_pv_state(kvm);
> +	WRITE_ONCE(kvm->arch.pv.prepared_for_async_deinit, priv);
> +
> +	*rc = 1;
> +	*rrc = 42;
> +	return 0;
> +}
> +


* Re: [PATCH v11 18/19] KVM: s390: pv: avoid export before import if possible
  2022-06-03  6:56 ` [PATCH v11 18/19] KVM: s390: pv: avoid export before import if possible Claudio Imbrenda
  2022-06-07 14:33   ` Nico Boehr
@ 2022-06-20  9:56   ` Janosch Frank
  1 sibling, 0 replies; 36+ messages in thread
From: Janosch Frank @ 2022-06-20  9:56 UTC (permalink / raw)
  To: Claudio Imbrenda, kvm
  Cc: borntraeger, thuth, pasic, david, linux-s390, linux-kernel, scgl,
	mimu, nrb

On 6/3/22 08:56, Claudio Imbrenda wrote:
> If the appropriate UV feature bit is set, there is no need to perform
> an export before import.
> 
> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
> ---
>   arch/s390/kernel/uv.c | 2 ++
>   1 file changed, 2 insertions(+)
> 
> diff --git a/arch/s390/kernel/uv.c b/arch/s390/kernel/uv.c
> index 02aca3c5dce1..c18c3d6a4314 100644
> --- a/arch/s390/kernel/uv.c
> +++ b/arch/s390/kernel/uv.c
> @@ -249,6 +249,8 @@ static int make_secure_pte(pte_t *ptep, unsigned long addr,
>    */
>   static bool should_export_before_import(struct uv_cb_header *uvcb, struct mm_struct *mm)
>   {
> +	if (test_bit_inv(BIT_UV_FEAT_MISC, &uv_info.uv_feature_indications))
> +		return false;

This needs a comment explaining that this is only an option for shared
pages.

>   	if (uvcb->cmd == UVC_CMD_UNPIN_PAGE_SHARED)
>   		return false;
>   	return atomic_read(&mm->context.protected_count) > 1;



end of thread

Thread overview: 36+ messages
2022-06-03  6:56 [PATCH v11 00/19] KVM: s390: pv: implement lazy destroy for reboot Claudio Imbrenda
2022-06-03  6:56 ` [PATCH v11 01/19] KVM: s390: pv: leak the topmost page table when destroy fails Claudio Imbrenda
2022-06-03  6:56 ` [PATCH v11 02/19] KVM: s390: pv: handle secure storage violations for protected guests Claudio Imbrenda
2022-06-03  6:56 ` [PATCH v11 03/19] KVM: s390: pv: handle secure storage exceptions for normal guests Claudio Imbrenda
2022-06-03  6:56 ` [PATCH v11 04/19] KVM: s390: pv: refactor s390_reset_acc Claudio Imbrenda
2022-06-03  6:56 ` [PATCH v11 05/19] KVM: s390: pv: usage counter instead of flag Claudio Imbrenda
2022-06-03  6:56 ` [PATCH v11 06/19] KVM: s390: pv: add export before import Claudio Imbrenda
2022-06-03  6:56 ` [PATCH v11 07/19] KVM: s390: pv: module parameter to fence asynchronous destroy Claudio Imbrenda
2022-06-15  9:53   ` Janosch Frank
2022-06-15  9:59     ` Claudio Imbrenda
2022-06-03  6:56 ` [PATCH v11 08/19] KVM: s390: pv: clear the state without memset Claudio Imbrenda
2022-06-03  6:56 ` [PATCH v11 09/19] KVM: s390: pv: Add kvm_s390_cpus_from_pv to kvm-s390.h and add documentation Claudio Imbrenda
2022-06-03  6:56 ` [PATCH v11 10/19] KVM: s390: pv: add mmu_notifier Claudio Imbrenda
2022-06-08 12:02   ` Nico Boehr
2022-06-03  6:56 ` [PATCH v11 11/19] s390/mm: KVM: pv: when tearing down, try to destroy protected pages Claudio Imbrenda
2022-06-08 12:03   ` Nico Boehr
2022-06-03  6:56 ` [PATCH v11 12/19] KVM: s390: pv: refactoring of kvm_s390_pv_deinit_vm Claudio Imbrenda
2022-06-03  6:56 ` [PATCH v11 13/19] KVM: s390: pv: destroy the configuration before its memory Claudio Imbrenda
2022-06-08 12:06   ` Nico Boehr
2022-06-14 14:23   ` Janosch Frank
2022-06-03  6:56 ` [PATCH v11 14/19] KVM: s390: pv: cleanup leftover protected VMs if needed Claudio Imbrenda
2022-06-15  9:59   ` Janosch Frank
2022-06-15 10:19     ` Claudio Imbrenda
2022-06-15 10:57       ` Janosch Frank
2022-06-15 11:13         ` Claudio Imbrenda
2022-06-03  6:56 ` [PATCH v11 15/19] KVM: s390: pv: asynchronous destroy for reboot Claudio Imbrenda
2022-06-15 10:58   ` Janosch Frank
2022-06-20  9:41   ` Janosch Frank
2022-06-03  6:56 ` [PATCH v11 16/19] KVM: s390: pv: api documentation for asynchronous destroy Claudio Imbrenda
2022-06-20  9:13   ` Janosch Frank
2022-06-03  6:56 ` [PATCH v11 17/19] KVM: s390: pv: add KVM_CAP_S390_PROTECTED_ASYNC_DISABLE Claudio Imbrenda
2022-06-03  6:56 ` [PATCH v11 18/19] KVM: s390: pv: avoid export before import if possible Claudio Imbrenda
2022-06-07 14:33   ` Nico Boehr
2022-06-20  9:56   ` Janosch Frank
2022-06-03  6:56 ` [PATCH v11 19/19] KVM: s390: pv: support for Destroy fast UVC Claudio Imbrenda
2022-06-14 14:29 ` [PATCH v11 00/19] KVM: s390: pv: implement lazy destroy for reboot Janosch Frank
