All of lore.kernel.org
 help / color / mirror / Atom feed
From: Claudio Imbrenda <imbrenda@linux.ibm.com>
To: kvm@vger.kernel.org
Cc: borntraeger@de.ibm.com, frankja@linux.ibm.com, thuth@redhat.com,
	pasic@linux.ibm.com, david@redhat.com,
	linux-s390@vger.kernel.org, linux-kernel@vger.kernel.org,
	scgl@linux.ibm.com, mimu@linux.ibm.com, nrb@linux.ibm.com
Subject: [PATCH v12 01/18] KVM: s390: pv: leak the topmost page table when destroy fails
Date: Tue, 28 Jun 2022 15:56:02 +0200	[thread overview]
Message-ID: <20220628135619.32410-2-imbrenda@linux.ibm.com> (raw)
In-Reply-To: <20220628135619.32410-1-imbrenda@linux.ibm.com>

Each secure guest must have a unique ASCE (address space control
element); we must avoid that new guests use the same page for their
ASCE, to avoid errors.

Since the ASCE mostly consists of the address of the topmost page table
(plus some flags), we must not return that memory to the pool unless
the ASCE is no longer in use.

Only a successful Destroy Secure Configuration UVC will make the ASCE
reusable again.

If the Destroy Configuration UVC fails, the ASCE cannot be reused for a
secure guest (either for the ASCE or for other memory areas). To avoid
a collision, it must not be used again. This is a permanent error and
the page becomes in practice unusable, so we set it aside and leak it.
On failure we already leak other memory that belongs to the ultravisor
(i.e. the variable and base storage for a guest) and not leaking the
topmost page table was an oversight.

This error (and thus the leakage) should not happen unless the hardware
is broken or KVM has some unknown serious bug.

Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Fixes: 29b40f105ec8d55 ("KVM: s390: protvirt: Add initial vm and cpu lifecycle handling")
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
---
 arch/s390/include/asm/gmap.h |  2 +
 arch/s390/kvm/pv.c           |  9 ++--
 arch/s390/mm/gmap.c          | 86 ++++++++++++++++++++++++++++++++++++
 3 files changed, 94 insertions(+), 3 deletions(-)

diff --git a/arch/s390/include/asm/gmap.h b/arch/s390/include/asm/gmap.h
index 40264f60b0da..f4073106e1f3 100644
--- a/arch/s390/include/asm/gmap.h
+++ b/arch/s390/include/asm/gmap.h
@@ -148,4 +148,6 @@ void gmap_sync_dirty_log_pmd(struct gmap *gmap, unsigned long dirty_bitmap[4],
 			     unsigned long gaddr, unsigned long vmaddr);
 int gmap_mark_unmergeable(void);
 void s390_reset_acc(struct mm_struct *mm);
+void s390_unlist_old_asce(struct gmap *gmap);
+int s390_replace_asce(struct gmap *gmap);
 #endif /* _ASM_S390_GMAP_H */
diff --git a/arch/s390/kvm/pv.c b/arch/s390/kvm/pv.c
index cc7c9599f43e..8eee3fc414e5 100644
--- a/arch/s390/kvm/pv.c
+++ b/arch/s390/kvm/pv.c
@@ -161,10 +161,13 @@ int kvm_s390_pv_deinit_vm(struct kvm *kvm, u16 *rc, u16 *rrc)
 	atomic_set(&kvm->mm->context.is_protected, 0);
 	KVM_UV_EVENT(kvm, 3, "PROTVIRT DESTROY VM: rc %x rrc %x", *rc, *rrc);
 	WARN_ONCE(cc, "protvirt destroy vm failed rc %x rrc %x", *rc, *rrc);
-	/* Inteded memory leak on "impossible" error */
-	if (!cc)
+	/* Intended memory leak on "impossible" error */
+	if (!cc) {
 		kvm_s390_pv_dealloc_vm(kvm);
-	return cc ? -EIO : 0;
+		return 0;
+	}
+	s390_replace_asce(kvm->arch.gmap);
+	return -EIO;
 }
 
 int kvm_s390_pv_init_vm(struct kvm *kvm, u16 *rc, u16 *rrc)
diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
index b8ae4a4aa2ba..85cab61d87a9 100644
--- a/arch/s390/mm/gmap.c
+++ b/arch/s390/mm/gmap.c
@@ -2735,3 +2735,89 @@ void s390_reset_acc(struct mm_struct *mm)
 	mmput(mm);
 }
 EXPORT_SYMBOL_GPL(s390_reset_acc);
+
+/**
+ * s390_unlist_old_asce - Remove the topmost level of page tables from the
+ * list of page tables of the gmap.
+ * @gmap: the gmap whose table is to be removed
+ *
+ * On s390x, KVM keeps a list of all pages containing the page tables of the
+ * gmap (the CRST list). This list is used at tear down time to free all
+ * pages that are now not needed anymore.
+ *
+ * This function removes the topmost page of the tree (the one pointed to by
+ * the ASCE) from the CRST list.
+ *
+ * This means that it will not be freed when the VM is torn down, and needs
+ * to be handled separately by the caller, unless a leak is actually
+ * intended. Notice that this function will only remove the page from the
+ * list, the page will still be used as a top level page table (and ASCE).
+ */
+void s390_unlist_old_asce(struct gmap *gmap)
+{
+	struct page *old;
+
+	old = virt_to_page(gmap->table);
+	spin_lock(&gmap->guest_table_lock);
+	list_del(&old->lru);
+	/*
+	 * Sometimes the topmost page might need to be "removed" multiple
+	 * times, for example if the VM is rebooted into secure mode several
+	 * times concurrently, or if s390_replace_asce fails after calling
+	 * s390_remove_old_asce and is attempted again later. In that case
+	 * the old asce has been removed from the list, and therefore it
+	 * will not be freed when the VM terminates, but the ASCE is still
+	 * in use and still pointed to.
+	 * A subsequent call to replace_asce will follow the pointer and try
+	 * to remove the same page from the list again.
+	 * Therefore it's necessary that the page of the ASCE has valid
+	 * pointers, so list_del can work (and do nothing) without
+	 * dereferencing stale or invalid pointers.
+	 */
+	INIT_LIST_HEAD(&old->lru);
+	spin_unlock(&gmap->guest_table_lock);
+}
+EXPORT_SYMBOL_GPL(s390_unlist_old_asce);
+
+/**
+ * s390_replace_asce - Try to replace the current ASCE of a gmap with a copy
+ * @gmap: the gmap whose ASCE needs to be replaced
+ *
+ * If the allocation of the new top level page table fails, the ASCE is not
+ * replaced.
+ * In any case, the old ASCE is always removed from the gmap CRST list.
+ * Therefore the caller has to make sure to save a pointer to it
+ * beforehand, unless a leak is actually intended.
+ */
+int s390_replace_asce(struct gmap *gmap)
+{
+	unsigned long asce;
+	struct page *page;
+	void *table;
+
+	s390_unlist_old_asce(gmap);
+
+	page = alloc_pages(GFP_KERNEL_ACCOUNT, CRST_ALLOC_ORDER);
+	if (!page)
+		return -ENOMEM;
+	table = page_to_virt(page);
+	memcpy(table, gmap->table, 1UL << (CRST_ALLOC_ORDER + PAGE_SHIFT));
+
+	/*
+	 * The caller has to deal with the old ASCE, but here we make sure
+	 * the new one is properly added to the CRST list, so that
+	 * it will be freed when the VM is torn down.
+	 */
+	spin_lock(&gmap->guest_table_lock);
+	list_add(&page->lru, &gmap->crst_list);
+	spin_unlock(&gmap->guest_table_lock);
+
+	/* Set new table origin while preserving existing ASCE control bits */
+	asce = (gmap->asce & ~_ASCE_ORIGIN) | __pa(table);
+	WRITE_ONCE(gmap->asce, asce);
+	WRITE_ONCE(gmap->mm->context.gmap_asce, asce);
+	WRITE_ONCE(gmap->table, table);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(s390_replace_asce);
-- 
2.36.1


  reply	other threads:[~2022-06-28 13:56 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-06-28 13:56 [PATCH v12 00/18] KVM: s390: pv: implement lazy destroy for reboot Claudio Imbrenda
2022-06-28 13:56 ` Claudio Imbrenda [this message]
2022-06-28 13:56 ` [PATCH v12 02/18] KVM: s390: pv: handle secure storage violations for protected guests Claudio Imbrenda
2022-06-28 13:56 ` [PATCH v12 03/18] KVM: s390: pv: handle secure storage exceptions for normal guests Claudio Imbrenda
2022-06-28 13:56 ` [PATCH v12 04/18] KVM: s390: pv: refactor s390_reset_acc Claudio Imbrenda
2022-06-28 13:56 ` [PATCH v12 05/18] KVM: s390: pv: usage counter instead of flag Claudio Imbrenda
2022-06-28 13:56 ` [PATCH v12 06/18] KVM: s390: pv: add export before import Claudio Imbrenda
2022-06-28 13:56 ` [PATCH v12 07/18] KVM: s390: pv: clear the state without memset Claudio Imbrenda
2022-06-28 13:56 ` [PATCH v12 08/18] KVM: s390: pv: Add kvm_s390_cpus_from_pv to kvm-s390.h and add documentation Claudio Imbrenda
2022-06-28 13:56 ` [PATCH v12 09/18] KVM: s390: pv: add mmu_notifier Claudio Imbrenda
2022-06-28 13:56 ` [PATCH v12 10/18] s390/mm: KVM: pv: when tearing down, try to destroy protected pages Claudio Imbrenda
2022-06-28 13:56 ` [PATCH v12 11/18] KVM: s390: pv: refactoring of kvm_s390_pv_deinit_vm Claudio Imbrenda
2022-06-28 13:56 ` [PATCH v12 12/18] KVM: s390: pv: destroy the configuration before its memory Claudio Imbrenda
2022-06-28 13:56 ` [PATCH v12 13/18] KVM: s390: pv: asynchronous destroy for reboot Claudio Imbrenda
2022-06-28 13:56 ` [PATCH v12 14/18] KVM: s390: pv: api documentation for asynchronous destroy Claudio Imbrenda
2022-06-28 13:56 ` [PATCH v12 15/18] KVM: s390: pv: add KVM_CAP_S390_PROTECTED_ASYNC_DISABLE Claudio Imbrenda
2022-06-28 13:56 ` [PATCH v12 16/18] KVM: s390: pv: avoid export before import if possible Claudio Imbrenda
2022-06-28 13:56 ` [PATCH v12 17/18] KVM: s390: pv: support for Destroy fast UVC Claudio Imbrenda
2022-06-28 13:56 ` [PATCH v12 18/18] KVM: s390: pv: module parameter to fence asynchronous destroy Claudio Imbrenda
2022-07-14  6:59 ` [PATCH v12 00/18] KVM: s390: pv: implement lazy destroy for reboot Janosch Frank
2022-07-14  9:17   ` Claudio Imbrenda

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220628135619.32410-2-imbrenda@linux.ibm.com \
    --to=imbrenda@linux.ibm.com \
    --cc=borntraeger@de.ibm.com \
    --cc=david@redhat.com \
    --cc=frankja@linux.ibm.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=mimu@linux.ibm.com \
    --cc=nrb@linux.ibm.com \
    --cc=pasic@linux.ibm.com \
    --cc=scgl@linux.ibm.com \
    --cc=thuth@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.