* [PATCH 00/35] KVM: s390: Add support for protected VMs
@ 2020-02-07 11:39 Christian Borntraeger
  2020-02-07 11:39 ` [PATCH 01/35] mm:gup/writeback: add callbacks for inaccessible pages Christian Borntraeger
                   ` (34 more replies)
  0 siblings, 35 replies; 147+ messages in thread
From: Christian Borntraeger @ 2020-02-07 11:39 UTC (permalink / raw)
  To: Christian Borntraeger, Janosch Frank
  Cc: KVM, Cornelia Huck, David Hildenbrand, Thomas Huth,
	Ulrich Weigand, Claudio Imbrenda, Andrea Arcangeli, linux-s390,
	Michael Mueller, Vasily Gorbik, linux-mm, Andrew Morton


Upfront: This series contains a "pretty small" common code memory
management change that allows paging, backing guests with files, etc.
almost exactly as for normal VMs. It should be a no-op for all
architectures that do not opt in, and it should be usable by others
that also want to be notified when pages are about to be used for
things like I/O.
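
As an illustration of the "no-op for architectures not opting in"
claim, here is a hedged userspace sketch of the weak-hook pattern the
series uses (struct page, the macro name usage, and the hook bodies
are stand-ins for this example, not the kernel's definitions):

```c
#include <assert.h>
#include <stddef.h>

/*
 * Userspace model of the opt-in pattern: common code calls an arch
 * hook that defaults to a no-op, and only architectures defining
 * HAVE_ARCH_MAKE_PAGE_ACCESSIBLE provide a real implementation.
 * "struct page" here is a stand-in, not the kernel's definition.
 */
struct page { int inaccessible; };

#ifndef HAVE_ARCH_MAKE_PAGE_ACCESSIBLE
static inline int arch_make_page_accessible(struct page *page)
{
	(void)page;
	return 0;	/* no-op: architectures not opting in pay nothing */
}
#else
static inline int arch_make_page_accessible(struct page *page)
{
	page->inaccessible = 0;	/* e.g. s390: export the page from the UV */
	return 0;
}
#endif

/* common code (gup, writeback) just calls the hook before doing I/O */
static int prepare_page_for_io(struct page *page)
{
	return arch_make_page_accessible(page);
}
```

With the macro undefined, the hook compiles to a trivial return 0, so
the two new call sites in mm/gup.c and mm/page-writeback.c cost next
to nothing on other architectures.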

I CCed linux-mm (and Andrew as mm maintainer and Andrea as he was
involved in some design discussions) on the first patch (common code
mm). I also added the CC to some other patches that make use of this
infrastructure or are dealing with arch-specific memory management.

The full patch queue is on the linux-s390 and kvm mailing lists.  It
would be good to get an ACK for the common code patch; I can then
carry it via the s390 tree.

Overview
--------
Protected VMs (PVMs) are KVM VMs where KVM can no longer access the
VM's state, such as guest memory and guest registers. Instead, the
PVMs are mostly managed by a new entity called the Ultravisor (UV),
which provides an API so that KVM and the PVM can request management
actions.

PVMs are encrypted at rest and protected from hypervisor access while
running. They switch from normal operation into protected mode, so we
can still use the standard boot process to load an encrypted blob and
then move it into protected mode.

Rebooting is only possible by passing through unprotected/normal mode
and then switching to protected mode again.

All patches are in the protvirtv3 branch of the korg s390 kvm git
https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux.git/log/?h=protvirtv3

Claudio presented the technology at KVM Forum 2019:

https://static.sched.com/hosted_files/kvmforum2019/3b/ibm_protected_vms_s390x.pdf


RFCv2 -> v1 (you can diff the protvirtv2 and protvirtv3 branches)
- tons of review feedback integrated (see mail thread)
- memory management now complete and working
- Documentation patches merged
- interrupt patches merged
- CONFIG_KVM_S390_PROTECTED_VIRTUALIZATION_HOST removed
- SIDA interface integrated into memop
- for merged patches, I removed Reviewed-bys that were not present on
  all of the merged patches

Christian Borntraeger (3):
  KVM: s390/mm: Make pages accessible before destroying the guest
  KVM: s390: protvirt: Add SCLP interrupt handling
  KVM: s390: protvirt: do not inject interrupts after start

Claudio Imbrenda (3):
  mm:gup/writeback: add callbacks for inaccessible pages
  s390/mm: provide memory management functions for protected KVM guests
  KVM: s390/mm: handle guest unpin events

Janosch Frank (23):
  KVM: s390: add new variants of UV CALL
  KVM: s390: protvirt: Add initial lifecycle handling
  KVM: s390: protvirt: Add KVM api documentation
  KVM: s390: protvirt: Secure memory is not mergeable
  KVM: s390: protvirt: Handle SE notification interceptions
  KVM: s390: protvirt: Instruction emulation
  KVM: s390: protvirt: Handle spec exception loops
  KVM: s390: protvirt: Add new gprs location handling
  KVM: S390: protvirt: Introduce instruction data area bounce buffer
  KVM: s390: protvirt: handle secure guest prefix pages
  KVM: s390: protvirt: Write sthyi data to instruction data area
  KVM: s390: protvirt: STSI handling
  KVM: s390: protvirt: disallow one_reg
  KVM: s390: protvirt: Only sync fmt4 registers
  KVM: s390: protvirt: Add program exception injection
  KVM: s390: protvirt: Add diag 308 subcode 8 - 10 handling
  KVM: s390: protvirt: UV calls diag308 0, 1
  KVM: s390: protvirt: Report CPU state to Ultravisor
  KVM: s390: protvirt: Support cmd 5 operation state
  KVM: s390: protvirt: Add UV debug trace
  KVM: s390: protvirt: Mask PSW interrupt bits for interception 104 and
    112
  KVM: s390: protvirt: Add UV cpu reset calls
  DOCUMENTATION: Protected virtual machine introduction and IPL

Michael Mueller (2):
  KVM: s390: protvirt: Add interruption injection controls
  KVM: s390: protvirt: Implement interruption injection

Ulrich Weigand (1):
  KVM: s390/interrupt: do not pin adapter interrupt pages

Vasily Gorbik (3):
  s390/protvirt: introduce host side setup
  s390/protvirt: add ultravisor initialization
  s390/mm: add (non)secure page access exceptions handlers

 .../admin-guide/kernel-parameters.txt         |   5 +
 Documentation/virt/kvm/api.txt                |  67 ++-
 Documentation/virt/kvm/index.rst              |   2 +
 Documentation/virt/kvm/s390-pv-boot.rst       |  79 +++
 Documentation/virt/kvm/s390-pv.rst            | 116 +++++
 MAINTAINERS                                   |   1 +
 arch/s390/boot/Makefile                       |   2 +-
 arch/s390/boot/uv.c                           |  21 +-
 arch/s390/include/asm/gmap.h                  |   3 +
 arch/s390/include/asm/kvm_host.h              | 114 ++++-
 arch/s390/include/asm/mmu.h                   |   2 +
 arch/s390/include/asm/mmu_context.h           |   1 +
 arch/s390/include/asm/page.h                  |   5 +
 arch/s390/include/asm/pgtable.h               |  35 +-
 arch/s390/include/asm/uv.h                    | 267 +++++++++-
 arch/s390/kernel/Makefile                     |   1 +
 arch/s390/kernel/pgm_check.S                  |   4 +-
 arch/s390/kernel/setup.c                      |   7 +-
 arch/s390/kernel/uv.c                         | 274 ++++++++++
 arch/s390/kvm/Makefile                        |   2 +-
 arch/s390/kvm/diag.c                          |   1 +
 arch/s390/kvm/intercept.c                     | 109 +++-
 arch/s390/kvm/interrupt.c                     | 371 +++++++++++---
 arch/s390/kvm/kvm-s390.c                      | 477 ++++++++++++++++--
 arch/s390/kvm/kvm-s390.h                      |  39 ++
 arch/s390/kvm/priv.c                          |  11 +-
 arch/s390/kvm/pv.c                            | 292 +++++++++++
 arch/s390/mm/fault.c                          |  86 ++++
 arch/s390/mm/gmap.c                           |  65 ++-
 include/linux/gfp.h                           |   6 +
 include/uapi/linux/kvm.h                      |  42 +-
 mm/gup.c                                      |   2 +
 mm/page-writeback.c                           |   1 +
 33 files changed, 2325 insertions(+), 185 deletions(-)
 create mode 100644 Documentation/virt/kvm/s390-pv-boot.rst
 create mode 100644 Documentation/virt/kvm/s390-pv.rst
 create mode 100644 arch/s390/kernel/uv.c
 create mode 100644 arch/s390/kvm/pv.c

-- 
2.24.0


* [PATCH 01/35] mm:gup/writeback: add callbacks for inaccessible pages
  2020-02-07 11:39 [PATCH 00/35] KVM: s390: Add support for protected VMs Christian Borntraeger
@ 2020-02-07 11:39 ` Christian Borntraeger
  2020-02-10 17:27     ` Christian Borntraeger
                     ` (2 more replies)
  2020-02-07 11:39 ` [PATCH 02/35] KVM: s390/interrupt: do not pin adapter interrupt pages Christian Borntraeger
                   ` (33 subsequent siblings)
  34 siblings, 3 replies; 147+ messages in thread
From: Christian Borntraeger @ 2020-02-07 11:39 UTC (permalink / raw)
  To: Christian Borntraeger, Janosch Frank
  Cc: KVM, Cornelia Huck, David Hildenbrand, Thomas Huth,
	Ulrich Weigand, Claudio Imbrenda, Andrea Arcangeli, linux-s390,
	Michael Mueller, Vasily Gorbik, linux-mm, Andrew Morton

From: Claudio Imbrenda <imbrenda@linux.ibm.com>

With the introduction of protected KVM guests on s390 there is now a
concept of inaccessible pages. These pages need to be made accessible
before the host can access them.

While CPU accesses will trigger a fault that can be resolved, I/O
accesses will simply fail.  We therefore need a callback into
architecture code for places that will do I/O, namely when writeback
is started or when a page reference is taken.

Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
 include/linux/gfp.h | 6 ++++++
 mm/gup.c            | 2 ++
 mm/page-writeback.c | 1 +
 3 files changed, 9 insertions(+)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index e5b817cb86e7..be2754841369 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -485,6 +485,12 @@ static inline void arch_free_page(struct page *page, int order) { }
 #ifndef HAVE_ARCH_ALLOC_PAGE
 static inline void arch_alloc_page(struct page *page, int order) { }
 #endif
+#ifndef HAVE_ARCH_MAKE_PAGE_ACCESSIBLE
+static inline int arch_make_page_accessible(struct page *page)
+{
+	return 0;
+}
+#endif
 
 struct page *
 __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
diff --git a/mm/gup.c b/mm/gup.c
index 7646bf993b25..a01262cd2821 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -257,6 +257,7 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
 			page = ERR_PTR(-ENOMEM);
 			goto out;
 		}
+		arch_make_page_accessible(page);
 	}
 	if (flags & FOLL_TOUCH) {
 		if ((flags & FOLL_WRITE) &&
@@ -1870,6 +1871,7 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
 
 		VM_BUG_ON_PAGE(compound_head(page) != head, page);
 
+		arch_make_page_accessible(page);
 		SetPageReferenced(page);
 		pages[*nr] = page;
 		(*nr)++;
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 2caf780a42e7..0f0bd14571b1 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2806,6 +2806,7 @@ int __test_set_page_writeback(struct page *page, bool keep_write)
 		inc_lruvec_page_state(page, NR_WRITEBACK);
 		inc_zone_page_state(page, NR_ZONE_WRITE_PENDING);
 	}
+	arch_make_page_accessible(page);
 	unlock_page_memcg(page);
 	return ret;
 
-- 
2.24.0


* [PATCH 02/35] KVM: s390/interrupt: do not pin adapter interrupt pages
  2020-02-07 11:39 [PATCH 00/35] KVM: s390: Add support for protected VMs Christian Borntraeger
  2020-02-07 11:39 ` [PATCH 01/35] mm:gup/writeback: add callbacks for inaccessible pages Christian Borntraeger
@ 2020-02-07 11:39 ` Christian Borntraeger
  2020-02-10 12:26   ` David Hildenbrand
  2020-02-10 12:40   ` [PATCH 02/35] KVM: s390/interrupt: do not pin adapter interrupt pages David Hildenbrand
  2020-02-07 11:39 ` [PATCH 03/35] s390/protvirt: introduce host side setup Christian Borntraeger
                   ` (32 subsequent siblings)
  34 siblings, 2 replies; 147+ messages in thread
From: Christian Borntraeger @ 2020-02-07 11:39 UTC (permalink / raw)
  To: Christian Borntraeger, Janosch Frank
  Cc: KVM, Cornelia Huck, David Hildenbrand, Thomas Huth,
	Ulrich Weigand, Claudio Imbrenda, Andrea Arcangeli, linux-s390,
	Michael Mueller, Vasily Gorbik, linux-mm, Andrew Morton

From: Ulrich Weigand <Ulrich.Weigand@de.ibm.com>

The adapter interrupt page containing the indicator bits is currently
pinned. That means that a guest with many devices can pin a lot of
memory pages in the host. This also complicates the reference tracking
that is needed for the memory management of protected virtual
machines.
We can reuse the pte notifiers to "cache" the page without pinning it.
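
A minimal single-entry model of this "cache instead of pin" idea, with
hypothetical names; the real patch below additionally handles locking
and the ERR_PTR(-EBUSY)/-EAGAIN races that this sketch omits:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Sketch: the notifier drops the cached page pointer instead of
 * calling put_page(), and the lookup re-resolves the page on demand.
 */
struct map_entry {
	unsigned long guest_addr;
	void *page;		/* cached, not pinned; NULL when invalidated */
};

/* stand-in for get_user_pages_remote() resolving the backing page */
static void *fault_in_page(unsigned long guest_addr)
{
	static char backing[4096];
	(void)guest_addr;
	return backing;
}

/* pte notifier: just forget the cached pointer, no reference to drop */
static void notifier_invalidate(struct map_entry *map,
				unsigned long start, unsigned long end)
{
	if (start <= map->guest_addr && map->guest_addr < end)
		map->page = NULL;
}

/* lookup: reuse the cached page or fault it in again */
static void *get_map_page(struct map_entry *map, unsigned long addr)
{
	if (map->guest_addr != addr)
		return NULL;
	if (!map->page)
		map->page = fault_in_page(addr);
	return map->page;
}
```

The point is that invalidation becomes cheap bookkeeping: the host mm
can unmap or page out at will, and the next interrupt delivery simply
re-resolves the page.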

Signed-off-by: Ulrich Weigand <Ulrich.Weigand@de.ibm.com>
Suggested-by: Andrea Arcangeli <aarcange@redhat.com>
[borntraeger@de.ibm.com: patch merging, splitting, fixing]
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
 arch/s390/include/asm/kvm_host.h |   4 +-
 arch/s390/kvm/interrupt.c        | 155 +++++++++++++++++++++++--------
 arch/s390/kvm/kvm-s390.c         |   4 +
 arch/s390/kvm/kvm-s390.h         |   2 +
 4 files changed, 123 insertions(+), 42 deletions(-)

diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 73044545ecac..884503e05424 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -701,9 +701,9 @@ struct s390_io_adapter {
 	bool masked;
 	bool swap;
 	bool suppressible;
-	struct rw_semaphore maps_lock;
+	spinlock_t maps_lock;
 	struct list_head maps;
-	atomic_t nr_maps;
+	int nr_maps;
 };
 
 #define MAX_S390_IO_ADAPTERS ((MAX_ISC + 1) * 8)
diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
index c06c89d370a7..4bfb2f8fe57c 100644
--- a/arch/s390/kvm/interrupt.c
+++ b/arch/s390/kvm/interrupt.c
@@ -28,6 +28,7 @@
 #include <asm/switch_to.h>
 #include <asm/nmi.h>
 #include <asm/airq.h>
+#include <linux/pagemap.h>
 #include "kvm-s390.h"
 #include "gaccess.h"
 #include "trace-s390.h"
@@ -2328,8 +2329,8 @@ static int register_io_adapter(struct kvm_device *dev,
 		return -ENOMEM;
 
 	INIT_LIST_HEAD(&adapter->maps);
-	init_rwsem(&adapter->maps_lock);
-	atomic_set(&adapter->nr_maps, 0);
+	spin_lock_init(&adapter->maps_lock);
+	adapter->nr_maps = 0;
 	adapter->id = adapter_info.id;
 	adapter->isc = adapter_info.isc;
 	adapter->maskable = adapter_info.maskable;
@@ -2375,19 +2376,15 @@ static int kvm_s390_adapter_map(struct kvm *kvm, unsigned int id, __u64 addr)
 		ret = -EFAULT;
 		goto out;
 	}
-	ret = get_user_pages_fast(map->addr, 1, FOLL_WRITE, &map->page);
-	if (ret < 0)
-		goto out;
-	BUG_ON(ret != 1);
-	down_write(&adapter->maps_lock);
-	if (atomic_inc_return(&adapter->nr_maps) < MAX_S390_ADAPTER_MAPS) {
+	spin_lock(&adapter->maps_lock);
+	if (adapter->nr_maps < MAX_S390_ADAPTER_MAPS) {
+		adapter->nr_maps++;
 		list_add_tail(&map->list, &adapter->maps);
 		ret = 0;
 	} else {
-		put_page(map->page);
 		ret = -EINVAL;
 	}
-	up_write(&adapter->maps_lock);
+	spin_unlock(&adapter->maps_lock);
 out:
 	if (ret)
 		kfree(map);
@@ -2403,18 +2400,17 @@ static int kvm_s390_adapter_unmap(struct kvm *kvm, unsigned int id, __u64 addr)
 	if (!adapter || !addr)
 		return -EINVAL;
 
-	down_write(&adapter->maps_lock);
+	spin_lock(&adapter->maps_lock);
 	list_for_each_entry_safe(map, tmp, &adapter->maps, list) {
 		if (map->guest_addr == addr) {
 			found = 1;
-			atomic_dec(&adapter->nr_maps);
+			adapter->nr_maps--;
 			list_del(&map->list);
-			put_page(map->page);
 			kfree(map);
 			break;
 		}
 	}
-	up_write(&adapter->maps_lock);
+	spin_unlock(&adapter->maps_lock);
 
 	return found ? 0 : -EINVAL;
 }
@@ -2430,7 +2426,6 @@ void kvm_s390_destroy_adapters(struct kvm *kvm)
 		list_for_each_entry_safe(map, tmp,
 					 &kvm->arch.adapters[i]->maps, list) {
 			list_del(&map->list);
-			put_page(map->page);
 			kfree(map);
 		}
 		kfree(kvm->arch.adapters[i]);
@@ -2690,6 +2685,31 @@ struct kvm_device_ops kvm_flic_ops = {
 	.destroy = flic_destroy,
 };
 
+void kvm_s390_adapter_gmap_notifier(struct gmap *gmap, unsigned long start,
+				    unsigned long end)
+{
+	struct kvm *kvm = gmap->private;
+	struct s390_map_info *map, *tmp;
+	int i;
+
+	for (i = 0; i < MAX_S390_IO_ADAPTERS; i++) {
+		struct s390_io_adapter *adapter = kvm->arch.adapters[i];
+
+		if (!adapter)
+			continue;
+		spin_lock(&adapter->maps_lock);
+		list_for_each_entry_safe(map, tmp, &adapter->maps, list) {
+			if (start <= map->guest_addr && map->guest_addr < end) {
+				if (IS_ERR(map->page))
+					map->page = ERR_PTR(-EAGAIN);
+				else
+					map->page = NULL;
+			}
+		}
+		spin_unlock(&adapter->maps_lock);
+	}
+}
+
 static unsigned long get_ind_bit(__u64 addr, unsigned long bit_nr, bool swap)
 {
 	unsigned long bit;
@@ -2699,19 +2719,71 @@ static unsigned long get_ind_bit(__u64 addr, unsigned long bit_nr, bool swap)
 	return swap ? (bit ^ (BITS_PER_LONG - 1)) : bit;
 }
 
-static struct s390_map_info *get_map_info(struct s390_io_adapter *adapter,
-					  u64 addr)
+static struct page *get_map_page(struct kvm *kvm,
+				 struct s390_io_adapter *adapter,
+				 u64 addr)
 {
 	struct s390_map_info *map;
+	unsigned long uaddr;
+	struct page *page;
+	bool need_retry;
+	int ret;
 
 	if (!adapter)
 		return NULL;
+retry:
+	page = NULL;
+	uaddr = 0;
+	spin_lock(&adapter->maps_lock);
+	list_for_each_entry(map, &adapter->maps, list)
+		if (map->guest_addr == addr) {
+			uaddr = map->addr;
+			page = map->page;
+			if (!page)
+				map->page = ERR_PTR(-EBUSY);
+			else if (IS_ERR(page) || !page_cache_get_speculative(page)) {
+				spin_unlock(&adapter->maps_lock);
+				goto retry;
+			}
+			break;
+		}
+	spin_unlock(&adapter->maps_lock);
+
+	if (page)
+		return page;
+	if (!uaddr)
+		return NULL;
 
-	list_for_each_entry(map, &adapter->maps, list) {
-		if (map->guest_addr == addr)
-			return map;
+	down_read(&kvm->mm->mmap_sem);
+	ret = set_pgste_bits(kvm->mm, uaddr, PGSTE_IN_BIT, PGSTE_IN_BIT);
+	if (ret)
+		goto fail;
+	ret = get_user_pages_remote(NULL, kvm->mm, uaddr, 1, FOLL_WRITE,
+				    &page, NULL, NULL);
+	if (ret < 1)
+		page = NULL;
+fail:
+	up_read(&kvm->mm->mmap_sem);
+	need_retry = true;
+	spin_lock(&adapter->maps_lock);
+	list_for_each_entry(map, &adapter->maps, list)
+		if (map->guest_addr == addr) {
+			if (map->page == ERR_PTR(-EBUSY)) {
+				map->page = page;
+				need_retry = false;
+			} else if (IS_ERR(map->page)) {
+				map->page = NULL;
+			}
+			break;
+		}
+	spin_unlock(&adapter->maps_lock);
+	if (need_retry) {
+		if (page)
+			put_page(page);
+		goto retry;
 	}
-	return NULL;
+
+	return page;
 }
 
 static int adapter_indicators_set(struct kvm *kvm,
@@ -2720,30 +2792,35 @@ static int adapter_indicators_set(struct kvm *kvm,
 {
 	unsigned long bit;
 	int summary_set, idx;
-	struct s390_map_info *info;
+	struct page *ind_page, *summary_page;
 	void *map;
 
-	info = get_map_info(adapter, adapter_int->ind_addr);
-	if (!info)
+	ind_page = get_map_page(kvm, adapter, adapter_int->ind_addr);
+	if (!ind_page)
 		return -1;
-	map = page_address(info->page);
-	bit = get_ind_bit(info->addr, adapter_int->ind_offset, adapter->swap);
-	set_bit(bit, map);
-	idx = srcu_read_lock(&kvm->srcu);
-	mark_page_dirty(kvm, info->guest_addr >> PAGE_SHIFT);
-	set_page_dirty_lock(info->page);
-	info = get_map_info(adapter, adapter_int->summary_addr);
-	if (!info) {
-		srcu_read_unlock(&kvm->srcu, idx);
+	summary_page = get_map_page(kvm, adapter, adapter_int->summary_addr);
+	if (!summary_page) {
+		put_page(ind_page);
 		return -1;
 	}
-	map = page_address(info->page);
-	bit = get_ind_bit(info->addr, adapter_int->summary_offset,
-			  adapter->swap);
+
+	idx = srcu_read_lock(&kvm->srcu);
+	map = page_address(ind_page);
+	bit = get_ind_bit(adapter_int->ind_addr,
+			  adapter_int->ind_offset, adapter->swap);
+	set_bit(bit, map);
+	mark_page_dirty(kvm, adapter_int->ind_addr >> PAGE_SHIFT);
+	set_page_dirty_lock(ind_page);
+	map = page_address(summary_page);
+	bit = get_ind_bit(adapter_int->summary_addr,
+			  adapter_int->summary_offset, adapter->swap);
 	summary_set = test_and_set_bit(bit, map);
-	mark_page_dirty(kvm, info->guest_addr >> PAGE_SHIFT);
-	set_page_dirty_lock(info->page);
+	mark_page_dirty(kvm, adapter_int->summary_addr >> PAGE_SHIFT);
+	set_page_dirty_lock(summary_page);
 	srcu_read_unlock(&kvm->srcu, idx);
+
+	put_page(ind_page);
+	put_page(summary_page);
 	return summary_set ? 0 : 1;
 }
 
@@ -2765,9 +2842,7 @@ static int set_adapter_int(struct kvm_kernel_irq_routing_entry *e,
 	adapter = get_io_adapter(kvm, e->adapter.adapter_id);
 	if (!adapter)
 		return -1;
-	down_read(&adapter->maps_lock);
 	ret = adapter_indicators_set(kvm, adapter, &e->adapter);
-	up_read(&adapter->maps_lock);
 	if ((ret > 0) && !adapter->masked) {
 		ret = kvm_s390_inject_airq(kvm, adapter);
 		if (ret == 0)
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index e39f6ef97b09..1a48214ac507 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -219,6 +219,7 @@ static struct kvm_s390_vm_cpu_subfunc kvm_s390_available_subfunc;
 
 static struct gmap_notifier gmap_notifier;
 static struct gmap_notifier vsie_gmap_notifier;
+static struct gmap_notifier adapter_gmap_notifier;
 debug_info_t *kvm_s390_dbf;
 
 /* Section: not file related */
@@ -299,6 +300,8 @@ int kvm_arch_hardware_setup(void)
 	gmap_register_pte_notifier(&gmap_notifier);
 	vsie_gmap_notifier.notifier_call = kvm_s390_vsie_gmap_notifier;
 	gmap_register_pte_notifier(&vsie_gmap_notifier);
+	adapter_gmap_notifier.notifier_call = kvm_s390_adapter_gmap_notifier;
+	gmap_register_pte_notifier(&adapter_gmap_notifier);
 	atomic_notifier_chain_register(&s390_epoch_delta_notifier,
 				       &kvm_clock_notifier);
 	return 0;
@@ -308,6 +311,7 @@ void kvm_arch_hardware_unsetup(void)
 {
 	gmap_unregister_pte_notifier(&gmap_notifier);
 	gmap_unregister_pte_notifier(&vsie_gmap_notifier);
+	gmap_unregister_pte_notifier(&adapter_gmap_notifier);
 	atomic_notifier_chain_unregister(&s390_epoch_delta_notifier,
 					 &kvm_clock_notifier);
 }
diff --git a/arch/s390/kvm/kvm-s390.h b/arch/s390/kvm/kvm-s390.h
index 6d9448dbd052..54c5eb4b275d 100644
--- a/arch/s390/kvm/kvm-s390.h
+++ b/arch/s390/kvm/kvm-s390.h
@@ -367,6 +367,8 @@ int s390int_to_s390irq(struct kvm_s390_interrupt *s390int,
 			struct kvm_s390_irq *s390irq);
 
 /* implemented in interrupt.c */
+void kvm_s390_adapter_gmap_notifier(struct gmap *gmap, unsigned long start,
+				    unsigned long end);
 int kvm_s390_vcpu_has_irq(struct kvm_vcpu *vcpu, int exclude_stop);
 int psw_extint_disabled(struct kvm_vcpu *vcpu);
 void kvm_s390_destroy_adapters(struct kvm *kvm);
-- 
2.24.0


* [PATCH 03/35] s390/protvirt: introduce host side setup
  2020-02-07 11:39 [PATCH 00/35] KVM: s390: Add support for protected VMs Christian Borntraeger
  2020-02-07 11:39 ` [PATCH 01/35] mm:gup/writeback: add callbacks for inaccessible pages Christian Borntraeger
  2020-02-07 11:39 ` [PATCH 02/35] KVM: s390/interrupt: do not pin adapter interrupt pages Christian Borntraeger
@ 2020-02-07 11:39 ` Christian Borntraeger
  2020-02-10  9:42   ` Thomas Huth
                     ` (2 more replies)
  2020-02-07 11:39 ` [PATCH 04/35] s390/protvirt: add ultravisor initialization Christian Borntraeger
                   ` (31 subsequent siblings)
  34 siblings, 3 replies; 147+ messages in thread
From: Christian Borntraeger @ 2020-02-07 11:39 UTC (permalink / raw)
  To: Christian Borntraeger, Janosch Frank
  Cc: KVM, Cornelia Huck, David Hildenbrand, Thomas Huth,
	Ulrich Weigand, Claudio Imbrenda, Andrea Arcangeli, linux-s390,
	Michael Mueller, Vasily Gorbik

From: Vasily Gorbik <gor@linux.ibm.com>

Add a "prot_virt" command line option that controls whether support
for hosting protected VMs is enabled in the kernel at early boot time.
This has to be done early, because enabling it needs large amounts of
memory and will disable some features, like STP time sync, for the
LPAR.

Extend the ultravisor info definitions and expose them via the uv_info
struct, which is filled in during startup.
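
The gist of the early-param logic can be sketched in plain userspace C
(parse_bool is a minimal stand-in for the kernel's kstrtobool, and the
guest/facility checks are passed in as plain flags here rather than
calling is_prot_virt_guest()/test_facility(158)):

```c
#include <assert.h>
#include <string.h>

/* minimal kstrtobool stand-in: accepts "1"/"y"/"on" as true */
static int parse_bool(const char *val)
{
	if (!strcmp(val, "1") || !strcmp(val, "y") || !strcmp(val, "on"))
		return 1;
	return 0;
}

/*
 * Sketch of prot_virt_setup(): accept the boolean, then drop the
 * request if we are ourselves a protected guest (a guest cannot host
 * protected VMs) or the ultravisor-call facility is absent.
 */
static int prot_virt_setup(const char *val, int running_as_guest,
			   int has_facility_158)
{
	int prot_virt_host = parse_bool(val);

	if (running_as_guest)
		prot_virt_host = 0;
	if (!has_facility_158)
		prot_virt_host = 0;
	return prot_virt_host;
}
```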

Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
[borntraeger@de.ibm.com: patch merging, splitting, fixing]
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
 .../admin-guide/kernel-parameters.txt         |  5 ++
 arch/s390/boot/Makefile                       |  2 +-
 arch/s390/boot/uv.c                           | 21 +++++++-
 arch/s390/include/asm/uv.h                    | 46 +++++++++++++++--
 arch/s390/kernel/Makefile                     |  1 +
 arch/s390/kernel/setup.c                      |  4 --
 arch/s390/kernel/uv.c                         | 49 +++++++++++++++++++
 7 files changed, 119 insertions(+), 9 deletions(-)
 create mode 100644 arch/s390/kernel/uv.c

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index ade4e6ec23e0..327af96f9528 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3750,6 +3750,11 @@
 			before loading.
 			See Documentation/admin-guide/blockdev/ramdisk.rst.
 
+	prot_virt=	[S390] enable hosting protected virtual machines
+			isolated from the hypervisor (if hardware supports
+			that).
+			Format: <bool>
+
 	psi=		[KNL] Enable or disable pressure stall information
 			tracking.
 			Format: <bool>
diff --git a/arch/s390/boot/Makefile b/arch/s390/boot/Makefile
index e2c47d3a1c89..30f1811540c5 100644
--- a/arch/s390/boot/Makefile
+++ b/arch/s390/boot/Makefile
@@ -37,7 +37,7 @@ CFLAGS_sclp_early_core.o += -I$(srctree)/drivers/s390/char
 obj-y	:= head.o als.o startup.o mem_detect.o ipl_parm.o ipl_report.o
 obj-y	+= string.o ebcdic.o sclp_early_core.o mem.o ipl_vmparm.o cmdline.o
 obj-y	+= version.o pgm_check_info.o ctype.o text_dma.o
-obj-$(CONFIG_PROTECTED_VIRTUALIZATION_GUEST)	+= uv.o
+obj-$(findstring y, $(CONFIG_PROTECTED_VIRTUALIZATION_GUEST) $(CONFIG_PGSTE))	+= uv.o
 obj-$(CONFIG_RELOCATABLE)	+= machine_kexec_reloc.o
 obj-$(CONFIG_RANDOMIZE_BASE)	+= kaslr.o
 targets	:= bzImage startup.a section_cmp.boot.data section_cmp.boot.preserved.data $(obj-y)
diff --git a/arch/s390/boot/uv.c b/arch/s390/boot/uv.c
index ed007f4a6444..af9e1cc93c68 100644
--- a/arch/s390/boot/uv.c
+++ b/arch/s390/boot/uv.c
@@ -3,7 +3,13 @@
 #include <asm/facility.h>
 #include <asm/sections.h>
 
+/* will be used in arch/s390/kernel/uv.c */
+#ifdef CONFIG_PROTECTED_VIRTUALIZATION_GUEST
 int __bootdata_preserved(prot_virt_guest);
+#endif
+#if IS_ENABLED(CONFIG_KVM)
+struct uv_info __bootdata_preserved(uv_info);
+#endif
 
 void uv_query_info(void)
 {
@@ -18,7 +24,20 @@ void uv_query_info(void)
 	if (uv_call(0, (uint64_t)&uvcb))
 		return;
 
-	if (test_bit_inv(BIT_UVC_CMD_SET_SHARED_ACCESS, (unsigned long *)uvcb.inst_calls_list) &&
+	if (IS_ENABLED(CONFIG_KVM)) {
+		memcpy(uv_info.inst_calls_list, uvcb.inst_calls_list, sizeof(uv_info.inst_calls_list));
+		uv_info.uv_base_stor_len = uvcb.uv_base_stor_len;
+		uv_info.guest_base_stor_len = uvcb.conf_base_phys_stor_len;
+		uv_info.guest_virt_base_stor_len = uvcb.conf_base_virt_stor_len;
+		uv_info.guest_virt_var_stor_len = uvcb.conf_virt_var_stor_len;
+		uv_info.guest_cpu_stor_len = uvcb.cpu_stor_len;
+		uv_info.max_sec_stor_addr = ALIGN(uvcb.max_guest_stor_addr, PAGE_SIZE);
+		uv_info.max_num_sec_conf = uvcb.max_num_sec_conf;
+		uv_info.max_guest_cpus = uvcb.max_guest_cpus;
+	}
+
+	if (IS_ENABLED(CONFIG_PROTECTED_VIRTUALIZATION_GUEST) &&
+	    test_bit_inv(BIT_UVC_CMD_SET_SHARED_ACCESS, (unsigned long *)uvcb.inst_calls_list) &&
 	    test_bit_inv(BIT_UVC_CMD_REMOVE_SHARED_ACCESS, (unsigned long *)uvcb.inst_calls_list))
 		prot_virt_guest = 1;
 }
diff --git a/arch/s390/include/asm/uv.h b/arch/s390/include/asm/uv.h
index 4093a2856929..cc7b0b0bc874 100644
--- a/arch/s390/include/asm/uv.h
+++ b/arch/s390/include/asm/uv.h
@@ -44,7 +44,19 @@ struct uv_cb_qui {
 	struct uv_cb_header header;
 	u64 reserved08;
 	u64 inst_calls_list[4];
-	u64 reserved30[15];
+	u64 reserved30[2];
+	u64 uv_base_stor_len;
+	u64 reserved48;
+	u64 conf_base_phys_stor_len;
+	u64 conf_base_virt_stor_len;
+	u64 conf_virt_var_stor_len;
+	u64 cpu_stor_len;
+	u32 reserved70[3];
+	u32 max_num_sec_conf;
+	u64 max_guest_stor_addr;
+	u8  reserved88[158-136];
+	u16 max_guest_cpus;
+	u64 reserveda0;
 } __packed __aligned(8);
 
 struct uv_cb_share {
@@ -69,9 +81,21 @@ static inline int uv_call(unsigned long r1, unsigned long r2)
 	return cc;
 }
 
-#ifdef CONFIG_PROTECTED_VIRTUALIZATION_GUEST
+struct uv_info {
+	unsigned long inst_calls_list[4];
+	unsigned long uv_base_stor_len;
+	unsigned long guest_base_stor_len;
+	unsigned long guest_virt_base_stor_len;
+	unsigned long guest_virt_var_stor_len;
+	unsigned long guest_cpu_stor_len;
+	unsigned long max_sec_stor_addr;
+	unsigned int max_num_sec_conf;
+	unsigned short max_guest_cpus;
+};
+extern struct uv_info uv_info;
 extern int prot_virt_guest;
 
+#ifdef CONFIG_PROTECTED_VIRTUALIZATION_GUEST
 static inline int is_prot_virt_guest(void)
 {
 	return prot_virt_guest;
@@ -121,11 +145,27 @@ static inline int uv_remove_shared(unsigned long addr)
 	return share(addr, UVC_CMD_REMOVE_SHARED_ACCESS);
 }
 
-void uv_query_info(void);
 #else
 #define is_prot_virt_guest() 0
 static inline int uv_set_shared(unsigned long addr) { return 0; }
 static inline int uv_remove_shared(unsigned long addr) { return 0; }
+#endif
+
+#if IS_ENABLED(CONFIG_KVM)
+extern int prot_virt_host;
+
+static inline int is_prot_virt_host(void)
+{
+	return prot_virt_host;
+}
+#else
+#define is_prot_virt_host() 0
+#endif
+
+#if defined(CONFIG_PROTECTED_VIRTUALIZATION_GUEST) ||                          \
+	IS_ENABLED(CONFIG_KVM)
+void uv_query_info(void);
+#else
 static inline void uv_query_info(void) {}
 #endif
 
diff --git a/arch/s390/kernel/Makefile b/arch/s390/kernel/Makefile
index 2b1203cf7be6..22bfb8d5084e 100644
--- a/arch/s390/kernel/Makefile
+++ b/arch/s390/kernel/Makefile
@@ -78,6 +78,7 @@ obj-$(CONFIG_PERF_EVENTS)	+= perf_cpum_cf_events.o perf_regs.o
 obj-$(CONFIG_PERF_EVENTS)	+= perf_cpum_cf_diag.o
 
 obj-$(CONFIG_TRACEPOINTS)	+= trace.o
+obj-$(findstring y, $(CONFIG_PROTECTED_VIRTUALIZATION_GUEST) $(CONFIG_PGSTE))	+= uv.o
 
 # vdso
 obj-y				+= vdso64/
diff --git a/arch/s390/kernel/setup.c b/arch/s390/kernel/setup.c
index d5fbd754f41a..f2ab2528859f 100644
--- a/arch/s390/kernel/setup.c
+++ b/arch/s390/kernel/setup.c
@@ -92,10 +92,6 @@ char elf_platform[ELF_PLATFORM_SIZE];
 
 unsigned long int_hwcap = 0;
 
-#ifdef CONFIG_PROTECTED_VIRTUALIZATION_GUEST
-int __bootdata_preserved(prot_virt_guest);
-#endif
-
 int __bootdata(noexec_disabled);
 int __bootdata(memory_end_set);
 unsigned long __bootdata(memory_end);
diff --git a/arch/s390/kernel/uv.c b/arch/s390/kernel/uv.c
new file mode 100644
index 000000000000..fbf2a98de642
--- /dev/null
+++ b/arch/s390/kernel/uv.c
@@ -0,0 +1,49 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Common Ultravisor functions and initialization
+ *
+ * Copyright IBM Corp. 2019, 2020
+ */
+#include <linux/kernel.h>
+#include <linux/types.h>
+#include <linux/sizes.h>
+#include <linux/bitmap.h>
+#include <linux/memblock.h>
+#include <asm/facility.h>
+#include <asm/sections.h>
+#include <asm/uv.h>
+
+/* the bootdata_preserved fields come from ones in arch/s390/boot/uv.c */
+#ifdef CONFIG_PROTECTED_VIRTUALIZATION_GUEST
+int __bootdata_preserved(prot_virt_guest);
+#endif
+
+#if IS_ENABLED(CONFIG_KVM)
+int prot_virt_host;
+EXPORT_SYMBOL(prot_virt_host);
+struct uv_info __bootdata_preserved(uv_info);
+EXPORT_SYMBOL(uv_info);
+
+static int __init prot_virt_setup(char *val)
+{
+	bool enabled;
+	int rc;
+
+	rc = kstrtobool(val, &enabled);
+	if (!rc && enabled)
+		prot_virt_host = 1;
+
+	if (is_prot_virt_guest() && prot_virt_host) {
+		prot_virt_host = 0;
+		pr_info("Running as protected virtualization guest.");
+	}
+
+	if (prot_virt_host && !test_facility(158)) {
+		prot_virt_host = 0;
+		pr_info("The ultravisor call facility is not available.");
+	}
+
+	return rc;
+}
+early_param("prot_virt", prot_virt_setup);
+#endif
-- 
2.24.0


* [PATCH 04/35] s390/protvirt: add ultravisor initialization
  2020-02-07 11:39 [PATCH 00/35] KVM: s390: Add support for protected VMs Christian Borntraeger
                   ` (2 preceding siblings ...)
  2020-02-07 11:39 ` [PATCH 03/35] s390/protvirt: introduce host side setup Christian Borntraeger
@ 2020-02-07 11:39 ` Christian Borntraeger
  2020-02-14 10:25   ` David Hildenbrand
  2020-02-07 11:39 ` [PATCH 05/35] s390/mm: provide memory management functions for protected KVM guests Christian Borntraeger
                   ` (30 subsequent siblings)
  34 siblings, 1 reply; 147+ messages in thread
From: Christian Borntraeger @ 2020-02-07 11:39 UTC (permalink / raw)
  To: Christian Borntraeger, Janosch Frank
  Cc: KVM, Cornelia Huck, David Hildenbrand, Thomas Huth,
	Ulrich Weigand, Claudio Imbrenda, Andrea Arcangeli, linux-s390,
	Michael Mueller, Vasily Gorbik

From: Vasily Gorbik <gor@linux.ibm.com>

Before being able to host protected virtual machines, donate some of
the memory to the ultravisor. Besides that, the ultravisor might
impose addressing limitations on memory used to back protected VM
storage. Treat that limit as the protected virtualization host's
virtual memory limit.

Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
[borntraeger@de.ibm.com: patch merging, splitting, fixing]
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
 arch/s390/include/asm/uv.h | 15 +++++++++++
 arch/s390/kernel/setup.c   |  3 +++
 arch/s390/kernel/uv.c      | 53 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 71 insertions(+)

diff --git a/arch/s390/include/asm/uv.h b/arch/s390/include/asm/uv.h
index cc7b0b0bc874..9e988543201f 100644
--- a/arch/s390/include/asm/uv.h
+++ b/arch/s390/include/asm/uv.h
@@ -23,12 +23,14 @@
 #define UVC_RC_NO_RESUME	0x0007
 
 #define UVC_CMD_QUI			0x0001
+#define UVC_CMD_INIT_UV			0x000f
 #define UVC_CMD_SET_SHARED_ACCESS	0x1000
 #define UVC_CMD_REMOVE_SHARED_ACCESS	0x1001
 
 /* Bits in installed uv calls */
 enum uv_cmds_inst {
 	BIT_UVC_CMD_QUI = 0,
+	BIT_UVC_CMD_INIT_UV = 1,
 	BIT_UVC_CMD_SET_SHARED_ACCESS = 8,
 	BIT_UVC_CMD_REMOVE_SHARED_ACCESS = 9,
 };
@@ -59,6 +61,14 @@ struct uv_cb_qui {
 	u64 reserveda0;
 } __packed __aligned(8);
 
+struct uv_cb_init {
+	struct uv_cb_header header;
+	u64 reserved08[2];
+	u64 stor_origin;
+	u64 stor_len;
+	u64 reserved28[4];
+} __packed __aligned(8);
+
 struct uv_cb_share {
 	struct uv_cb_header header;
 	u64 reserved08[3];
@@ -158,8 +168,13 @@ static inline int is_prot_virt_host(void)
 {
 	return prot_virt_host;
 }
+
+void setup_uv(void);
+void adjust_to_uv_max(unsigned long *vmax);
 #else
 #define is_prot_virt_host() 0
+static inline void setup_uv(void) {}
+static inline void adjust_to_uv_max(unsigned long *vmax) {}
 #endif
 
 #if defined(CONFIG_PROTECTED_VIRTUALIZATION_GUEST) ||                          \
diff --git a/arch/s390/kernel/setup.c b/arch/s390/kernel/setup.c
index f2ab2528859f..5f178d557cc8 100644
--- a/arch/s390/kernel/setup.c
+++ b/arch/s390/kernel/setup.c
@@ -560,6 +560,8 @@ static void __init setup_memory_end(void)
 			vmax = _REGION1_SIZE; /* 4-level kernel page table */
 	}
 
+	adjust_to_uv_max(&vmax);
+
 	/* module area is at the end of the kernel address space. */
 	MODULES_END = vmax;
 	MODULES_VADDR = MODULES_END - MODULES_LEN;
@@ -1140,6 +1142,7 @@ void __init setup_arch(char **cmdline_p)
 	 */
 	memblock_trim_memory(1UL << (MAX_ORDER - 1 + PAGE_SHIFT));
 
+	setup_uv();
 	setup_memory_end();
 	setup_memory();
 	dma_contiguous_reserve(memory_end);
diff --git a/arch/s390/kernel/uv.c b/arch/s390/kernel/uv.c
index fbf2a98de642..a06a628a88da 100644
--- a/arch/s390/kernel/uv.c
+++ b/arch/s390/kernel/uv.c
@@ -46,4 +46,57 @@ static int __init prot_virt_setup(char *val)
 	return rc;
 }
 early_param("prot_virt", prot_virt_setup);
+
+static int __init uv_init(unsigned long stor_base, unsigned long stor_len)
+{
+	struct uv_cb_init uvcb = {
+		.header.cmd = UVC_CMD_INIT_UV,
+		.header.len = sizeof(uvcb),
+		.stor_origin = stor_base,
+		.stor_len = stor_len,
+	};
+	int cc;
+
+	cc = uv_call(0, (uint64_t)&uvcb);
+	if (cc || uvcb.header.rc != UVC_RC_EXECUTED) {
+		pr_err("Ultravisor init failed with cc: %d rc: 0x%hx\n", cc,
+		       uvcb.header.rc);
+		return -1;
+	}
+	return 0;
+}
+
+void __init setup_uv(void)
+{
+	unsigned long uv_stor_base;
+
+	if (!prot_virt_host)
+		return;
+
+	uv_stor_base = (unsigned long)memblock_alloc_try_nid(
+		uv_info.uv_base_stor_len, SZ_1M, SZ_2G,
+		MEMBLOCK_ALLOC_ACCESSIBLE, NUMA_NO_NODE);
+	if (!uv_stor_base) {
+		pr_info("Failed to reserve %lu bytes for ultravisor base storage\n",
+			uv_info.uv_base_stor_len);
+		goto fail;
+	}
+
+	if (uv_init(uv_stor_base, uv_info.uv_base_stor_len)) {
+		memblock_free(uv_stor_base, uv_info.uv_base_stor_len);
+		goto fail;
+	}
+
+	pr_info("Reserving %luMB as ultravisor base storage\n",
+		uv_info.uv_base_stor_len >> 20);
+	return;
+fail:
+	prot_virt_host = 0;
+}
+
+void adjust_to_uv_max(unsigned long *vmax)
+{
+	if (prot_virt_host && *vmax > uv_info.max_sec_stor_addr)
+		*vmax = uv_info.max_sec_stor_addr;
+}
 #endif
-- 
2.24.0

^ permalink raw reply related	[flat|nested] 147+ messages in thread
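[Editor's note: the clamping done by `adjust_to_uv_max()` in this patch can be modeled in userspace C. The struct and address limits below are stand-ins for the kernel definitions, not the real ones:]

```c
#include <assert.h>

/* Userspace model of adjust_to_uv_max(): clamp the kernel's virtual
 * address space limit (vmax) to the ultravisor's maximum secure
 * storage address, but only when running as a PV host. */
struct uv_info_model {
	unsigned long max_sec_stor_addr;
};

static void model_adjust_to_uv_max(int prot_virt_host,
				   const struct uv_info_model *uv,
				   unsigned long *vmax)
{
	if (prot_virt_host && *vmax > uv->max_sec_stor_addr)
		*vmax = uv->max_sec_stor_addr;
}
```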

* [PATCH 05/35] s390/mm: provide memory management functions for protected KVM guests
  2020-02-07 11:39 [PATCH 00/35] KVM: s390: Add support for protected VMs Christian Borntraeger
                   ` (3 preceding siblings ...)
  2020-02-07 11:39 ` [PATCH 04/35] s390/protvirt: add ultravisor initialization Christian Borntraeger
@ 2020-02-07 11:39 ` Christian Borntraeger
  2020-02-12 13:42   ` Cornelia Huck
  2020-02-14 17:59   ` David Hildenbrand
  2020-02-07 11:39 ` [PATCH 06/35] s390/mm: add (non)secure page access exceptions handlers Christian Borntraeger
                   ` (29 subsequent siblings)
  34 siblings, 2 replies; 147+ messages in thread
From: Christian Borntraeger @ 2020-02-07 11:39 UTC (permalink / raw)
  To: Christian Borntraeger, Janosch Frank
  Cc: KVM, Cornelia Huck, David Hildenbrand, Thomas Huth,
	Ulrich Weigand, Claudio Imbrenda, Andrea Arcangeli, linux-s390,
	Michael Mueller, Vasily Gorbik, linux-mm, Andrew Morton

From: Claudio Imbrenda <imbrenda@linux.ibm.com>

This provides the basic ultravisor calls and page table handling to cope
with secure guests:
- provide arch_make_page_accessible
- make pages accessible after unmapping of secure guests
- provide the ultravisor commands to convert to/from secure
- provide the ultravisor commands to pin/unpin shared pages
- provide callbacks to make pages secure (inaccessible)
 - we check the expected pin count to only make pages secure if the
   host is not accessing them
 - we fence hugetlbfs for secure pages

Co-developed-by: Ulrich Weigand <Ulrich.Weigand@de.ibm.com>
Signed-off-by: Ulrich Weigand <Ulrich.Weigand@de.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
[borntraeger@de.ibm.com: patch merging, splitting, fixing]
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
 arch/s390/include/asm/gmap.h        |   2 +
 arch/s390/include/asm/mmu.h         |   2 +
 arch/s390/include/asm/mmu_context.h |   1 +
 arch/s390/include/asm/page.h        |   5 +
 arch/s390/include/asm/pgtable.h     |  34 +++++-
 arch/s390/include/asm/uv.h          |  52 +++++++++
 arch/s390/kernel/uv.c               | 172 ++++++++++++++++++++++++++++
 7 files changed, 263 insertions(+), 5 deletions(-)

diff --git a/arch/s390/include/asm/gmap.h b/arch/s390/include/asm/gmap.h
index 37f96b6f0e61..e2d2f48c5c7c 100644
--- a/arch/s390/include/asm/gmap.h
+++ b/arch/s390/include/asm/gmap.h
@@ -9,6 +9,7 @@
 #ifndef _ASM_S390_GMAP_H
 #define _ASM_S390_GMAP_H
 
+#include <linux/radix-tree.h>
 #include <linux/refcount.h>
 
 /* Generic bits for GMAP notification on DAT table entry changes. */
@@ -61,6 +62,7 @@ struct gmap {
 	spinlock_t shadow_lock;
 	struct gmap *parent;
 	unsigned long orig_asce;
+	unsigned long guest_handle;
 	int edat_level;
 	bool removed;
 	bool initialized;
diff --git a/arch/s390/include/asm/mmu.h b/arch/s390/include/asm/mmu.h
index bcfb6371086f..e21b618ad432 100644
--- a/arch/s390/include/asm/mmu.h
+++ b/arch/s390/include/asm/mmu.h
@@ -16,6 +16,8 @@ typedef struct {
 	unsigned long asce;
 	unsigned long asce_limit;
 	unsigned long vdso_base;
+	/* The mmu context belongs to a secure guest. */
+	atomic_t is_protected;
 	/*
 	 * The following bitfields need a down_write on the mm
 	 * semaphore when they are written to. As they are only
diff --git a/arch/s390/include/asm/mmu_context.h b/arch/s390/include/asm/mmu_context.h
index 8d04e6f3f796..afa836014076 100644
--- a/arch/s390/include/asm/mmu_context.h
+++ b/arch/s390/include/asm/mmu_context.h
@@ -23,6 +23,7 @@ static inline int init_new_context(struct task_struct *tsk,
 	INIT_LIST_HEAD(&mm->context.gmap_list);
 	cpumask_clear(&mm->context.cpu_attach_mask);
 	atomic_set(&mm->context.flush_count, 0);
+	atomic_set(&mm->context.is_protected, 0);
 	mm->context.gmap_asce = 0;
 	mm->context.flush_mm = 0;
 	mm->context.compat_mm = test_thread_flag(TIF_31BIT);
diff --git a/arch/s390/include/asm/page.h b/arch/s390/include/asm/page.h
index a4d38092530a..05ea3e42a041 100644
--- a/arch/s390/include/asm/page.h
+++ b/arch/s390/include/asm/page.h
@@ -151,6 +151,11 @@ static inline int devmem_is_allowed(unsigned long pfn)
 #define HAVE_ARCH_FREE_PAGE
 #define HAVE_ARCH_ALLOC_PAGE
 
+#if IS_ENABLED(CONFIG_PGSTE)
+int arch_make_page_accessible(struct page *page);
+#define HAVE_ARCH_MAKE_PAGE_ACCESSIBLE
+#endif
+
 #endif /* !__ASSEMBLY__ */
 
 #define __PAGE_OFFSET		0x0UL
diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index 7b03037a8475..dbd1453e6924 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -19,6 +19,7 @@
 #include <linux/atomic.h>
 #include <asm/bug.h>
 #include <asm/page.h>
+#include <asm/uv.h>
 
 extern pgd_t swapper_pg_dir[];
 extern void paging_init(void);
@@ -520,6 +521,15 @@ static inline int mm_has_pgste(struct mm_struct *mm)
 	return 0;
 }
 
+static inline int mm_is_protected(struct mm_struct *mm)
+{
+#ifdef CONFIG_PGSTE
+	if (unlikely(atomic_read(&mm->context.is_protected)))
+		return 1;
+#endif
+	return 0;
+}
+
 static inline int mm_alloc_pgste(struct mm_struct *mm)
 {
 #ifdef CONFIG_PGSTE
@@ -1059,7 +1069,12 @@ static inline int ptep_clear_flush_young(struct vm_area_struct *vma,
 static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
 				       unsigned long addr, pte_t *ptep)
 {
-	return ptep_xchg_lazy(mm, addr, ptep, __pte(_PAGE_INVALID));
+	pte_t res;
+
+	res = ptep_xchg_lazy(mm, addr, ptep, __pte(_PAGE_INVALID));
+	if (mm_is_protected(mm) && pte_present(res))
+		uv_convert_from_secure(pte_val(res) & PAGE_MASK);
+	return res;
 }
 
 #define __HAVE_ARCH_PTEP_MODIFY_PROT_TRANSACTION
@@ -1071,7 +1086,12 @@ void ptep_modify_prot_commit(struct vm_area_struct *, unsigned long,
 static inline pte_t ptep_clear_flush(struct vm_area_struct *vma,
 				     unsigned long addr, pte_t *ptep)
 {
-	return ptep_xchg_direct(vma->vm_mm, addr, ptep, __pte(_PAGE_INVALID));
+	pte_t res;
+
+	res = ptep_xchg_direct(vma->vm_mm, addr, ptep, __pte(_PAGE_INVALID));
+	if (mm_is_protected(vma->vm_mm) && pte_present(res))
+		uv_convert_from_secure(pte_val(res) & PAGE_MASK);
+	return res;
 }
 
 /*
@@ -1086,12 +1106,16 @@ static inline pte_t ptep_get_and_clear_full(struct mm_struct *mm,
 					    unsigned long addr,
 					    pte_t *ptep, int full)
 {
+	pte_t res;
 	if (full) {
-		pte_t pte = *ptep;
+		res = *ptep;
 		*ptep = __pte(_PAGE_INVALID);
-		return pte;
+	} else {
+		res = ptep_xchg_lazy(mm, addr, ptep, __pte(_PAGE_INVALID));
 	}
-	return ptep_xchg_lazy(mm, addr, ptep, __pte(_PAGE_INVALID));
+	if (mm_is_protected(mm) && pte_present(res))
+		uv_convert_from_secure(pte_val(res) & PAGE_MASK);
+	return res;
 }
 
 #define __HAVE_ARCH_PTEP_SET_WRPROTECT
diff --git a/arch/s390/include/asm/uv.h b/arch/s390/include/asm/uv.h
index 9e988543201f..1b97230a57ba 100644
--- a/arch/s390/include/asm/uv.h
+++ b/arch/s390/include/asm/uv.h
@@ -15,6 +15,7 @@
 #include <linux/errno.h>
 #include <linux/bug.h>
 #include <asm/page.h>
+#include <asm/gmap.h>
 
 #define UVC_RC_EXECUTED		0x0001
 #define UVC_RC_INV_CMD		0x0002
@@ -24,6 +25,10 @@
 
 #define UVC_CMD_QUI			0x0001
 #define UVC_CMD_INIT_UV			0x000f
+#define UVC_CMD_CONV_TO_SEC_STOR	0x0200
+#define UVC_CMD_CONV_FROM_SEC_STOR	0x0201
+#define UVC_CMD_PIN_PAGE_SHARED		0x0341
+#define UVC_CMD_UNPIN_PAGE_SHARED	0x0342
 #define UVC_CMD_SET_SHARED_ACCESS	0x1000
 #define UVC_CMD_REMOVE_SHARED_ACCESS	0x1001
 
@@ -31,8 +36,12 @@
 enum uv_cmds_inst {
 	BIT_UVC_CMD_QUI = 0,
 	BIT_UVC_CMD_INIT_UV = 1,
+	BIT_UVC_CMD_CONV_TO_SEC_STOR = 6,
+	BIT_UVC_CMD_CONV_FROM_SEC_STOR = 7,
 	BIT_UVC_CMD_SET_SHARED_ACCESS = 8,
 	BIT_UVC_CMD_REMOVE_SHARED_ACCESS = 9,
+	BIT_UVC_CMD_PIN_PAGE_SHARED = 21,
+	BIT_UVC_CMD_UNPIN_PAGE_SHARED = 22,
 };
 
 struct uv_cb_header {
@@ -69,6 +78,19 @@ struct uv_cb_init {
 	u64 reserved28[4];
 } __packed __aligned(8);
 
+struct uv_cb_cts {
+	struct uv_cb_header header;
+	u64 reserved08[2];
+	u64 guest_handle;
+	u64 gaddr;
+} __packed __aligned(8);
+
+struct uv_cb_cfs {
+	struct uv_cb_header header;
+	u64 reserved08[2];
+	u64 paddr;
+} __packed __aligned(8);
+
 struct uv_cb_share {
 	struct uv_cb_header header;
 	u64 reserved08[3];
@@ -169,12 +191,42 @@ static inline int is_prot_virt_host(void)
 	return prot_virt_host;
 }
 
+int uv_make_secure(struct gmap *gmap, unsigned long gaddr, void *uvcb);
+int uv_convert_from_secure(unsigned long paddr);
+
+static inline int uv_convert_to_secure(struct gmap *gmap, unsigned long gaddr)
+{
+	struct uv_cb_cts uvcb = {
+		.header.cmd = UVC_CMD_CONV_TO_SEC_STOR,
+		.header.len = sizeof(uvcb),
+		.guest_handle = gmap->guest_handle,
+		.gaddr = gaddr,
+	};
+
+	return uv_make_secure(gmap, gaddr, &uvcb);
+}
+
 void setup_uv(void);
 void adjust_to_uv_max(unsigned long *vmax);
 #else
 #define is_prot_virt_host() 0
 static inline void setup_uv(void) {}
 static inline void adjust_to_uv_max(unsigned long *vmax) {}
+
+static inline int uv_make_secure(struct gmap *gmap, unsigned long gaddr, void *uvcb)
+{
+	return 0;
+}
+
+static inline int uv_convert_from_secure(unsigned long paddr)
+{
+	return 0;
+}
+
+static inline int uv_convert_to_secure(struct gmap *gmap, unsigned long gaddr)
+{
+	return 0;
+}
 #endif
 
 #if defined(CONFIG_PROTECTED_VIRTUALIZATION_GUEST) ||                          \
diff --git a/arch/s390/kernel/uv.c b/arch/s390/kernel/uv.c
index a06a628a88da..15ac598a3d8d 100644
--- a/arch/s390/kernel/uv.c
+++ b/arch/s390/kernel/uv.c
@@ -9,6 +9,8 @@
 #include <linux/sizes.h>
 #include <linux/bitmap.h>
 #include <linux/memblock.h>
+#include <linux/pagemap.h>
+#include <linux/swap.h>
 #include <asm/facility.h>
 #include <asm/sections.h>
 #include <asm/uv.h>
@@ -99,4 +101,174 @@ void adjust_to_uv_max(unsigned long *vmax)
 	if (prot_virt_host && *vmax > uv_info.max_sec_stor_addr)
 		*vmax = uv_info.max_sec_stor_addr;
 }
+
+static int __uv_pin_shared(unsigned long paddr)
+{
+	struct uv_cb_cfs uvcb = {
+		.header.cmd	= UVC_CMD_PIN_PAGE_SHARED,
+		.header.len	= sizeof(uvcb),
+		.paddr		= paddr,
+	};
+
+	if (uv_call(0, (u64)&uvcb))
+		return -EINVAL;
+	return 0;
+}
+
+/*
+ * Requests the Ultravisor to encrypt a guest page and make it
+ * accessible to the host for paging (export).
+ *
+ * @paddr: Absolute host address of page to be exported
+ */
+int uv_convert_from_secure(unsigned long paddr)
+{
+	struct uv_cb_cfs uvcb = {
+		.header.cmd = UVC_CMD_CONV_FROM_SEC_STOR,
+		.header.len = sizeof(uvcb),
+		.paddr = paddr
+	};
+
+	uv_call(0, (u64)&uvcb);
+
+	if (uvcb.header.rc == 1 || uvcb.header.rc == 0x107)
+		return 0;
+	return -EINVAL;
+}
+
+static int expected_page_refs(struct page *page)
+{
+	int res;
+
+	res = page_mapcount(page);
+	if (PageSwapCache(page))
+		res++;
+	else if (page_mapping(page)) {
+		res++;
+		if (page_has_private(page))
+			res++;
+	}
+	return res;
+}
+
+struct conv_params {
+	struct uv_cb_header *uvcb;
+	struct page *page;
+};
+
+static int make_secure_pte(pte_t *ptep, unsigned long addr, void *data)
+{
+	struct conv_params *params = data;
+	pte_t entry = READ_ONCE(*ptep);
+	struct page *page;
+	int expected, rc = 0;
+
+	if (!pte_present(entry))
+		return -ENXIO;
+	if (pte_val(entry) & (_PAGE_INVALID | _PAGE_PROTECT))
+		return -ENXIO;
+
+	page = pte_page(entry);
+	if (page != params->page)
+		return -ENXIO;
+
+	if (PageWriteback(page))
+		return -EAGAIN;
+	expected = expected_page_refs(page);
+	if (!page_ref_freeze(page, expected))
+		return -EBUSY;
+	set_bit(PG_arch_1, &page->flags);
+	rc = uv_call(0, (u64)params->uvcb);
+	page_ref_unfreeze(page, expected);
+	if (rc)
+		rc = (params->uvcb->rc == 0x10a) ? -ENXIO : -EINVAL;
+	return rc;
+}
+
+/*
+ * Requests the Ultravisor to make a page accessible to a guest.
+ * If it's brought in the first time, it will be cleared. If
+ * it has been exported before, it will be decrypted and integrity
+ * checked.
+ *
+ * @gmap: Guest mapping
+ * @gaddr: Guest 2 absolute address to be imported
+ */
+int uv_make_secure(struct gmap *gmap, unsigned long gaddr, void *uvcb)
+{
+	struct conv_params params = { .uvcb = uvcb };
+	struct vm_area_struct *vma;
+	unsigned long uaddr;
+	int rc, local_drain = 0;
+
+again:
+	rc = -EFAULT;
+	down_read(&gmap->mm->mmap_sem);
+
+	uaddr = __gmap_translate(gmap, gaddr);
+	if (IS_ERR_VALUE(uaddr))
+		goto out;
+	vma = find_vma(gmap->mm, uaddr);
+	if (!vma)
+		goto out;
+	if (is_vm_hugetlb_page(vma))
+		goto out;
+
+	rc = -ENXIO;
+	params.page = follow_page(vma, uaddr, FOLL_WRITE | FOLL_NOWAIT);
+	if (IS_ERR_OR_NULL(params.page))
+		goto out;
+
+	lock_page(params.page);
+	rc = apply_to_page_range(gmap->mm, uaddr, PAGE_SIZE, make_secure_pte, &params);
+	unlock_page(params.page);
+out:
+	up_read(&gmap->mm->mmap_sem);
+
+	if (rc == -EBUSY) {
+		if (local_drain) {
+			lru_add_drain_all();
+			return -EAGAIN;
+		}
+		lru_add_drain();
+		local_drain = 1;
+		goto again;
+	} else if (rc == -ENXIO) {
+		if (gmap_fault(gmap, gaddr, FAULT_FLAG_WRITE))
+			return -EFAULT;
+		return -EAGAIN;
+	}
+	return rc;
+}
+EXPORT_SYMBOL_GPL(uv_make_secure);
+
+/**
+ * To be called with the page locked or with an extra reference!
+ */
+int arch_make_page_accessible(struct page *page)
+{
+	int rc = 0;
+
+	if (PageHuge(page))
+		return 0;
+
+	if (!test_bit(PG_arch_1, &page->flags))
+		return 0;
+
+	rc = __uv_pin_shared(page_to_phys(page));
+	if (!rc) {
+		clear_bit(PG_arch_1, &page->flags);
+		return 0;
+	}
+
+	rc = uv_convert_from_secure(page_to_phys(page));
+	if (!rc) {
+		clear_bit(PG_arch_1, &page->flags);
+		return 0;
+	}
+
+	return rc;
+}
+EXPORT_SYMBOL_GPL(arch_make_page_accessible);
+
 #endif
-- 
2.24.0

^ permalink raw reply related	[flat|nested] 147+ messages in thread
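[Editor's note: the fallback logic of `arch_make_page_accessible()` added here — try the cheap pin-shared call first, then a full export — can be summarized with a small userspace model. `PG_arch_1`, `__uv_pin_shared()` and `uv_convert_from_secure()` are simulated as plain flags, not the kernel implementations:]

```c
#include <assert.h>

/* Simulated page state: arch_1 set means the page may be secure. */
struct page_model {
	int arch_1;	/* models PG_arch_1 */
	int shared_ok;	/* the pin-page-shared UVC would succeed */
	int export_ok;	/* the convert-from-secure UVC would succeed */
};

/* Model of arch_make_page_accessible(): cheap pin-shared first,
 * fall back to a full export, clear the flag on success.
 * -22 stands in for -EINVAL. */
static int model_make_page_accessible(struct page_model *page)
{
	if (!page->arch_1)
		return 0;		/* page is already accessible */
	if (page->shared_ok) {		/* cheap path: pin page shared */
		page->arch_1 = 0;
		return 0;
	}
	if (page->export_ok) {		/* fallback: export (encrypt) */
		page->arch_1 = 0;
		return 0;
	}
	return -22;
}
```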

* [PATCH 06/35] s390/mm: add (non)secure page access exceptions handlers
  2020-02-07 11:39 [PATCH 00/35] KVM: s390: Add support for protected VMs Christian Borntraeger
                   ` (4 preceding siblings ...)
  2020-02-07 11:39 ` [PATCH 05/35] s390/mm: provide memory management functions for protected KVM guests Christian Borntraeger
@ 2020-02-07 11:39 ` Christian Borntraeger
  2020-02-14 18:05   ` David Hildenbrand
  2020-02-07 11:39 ` [PATCH 07/35] KVM: s390: add new variants of UV CALL Christian Borntraeger
                   ` (28 subsequent siblings)
  34 siblings, 1 reply; 147+ messages in thread
From: Christian Borntraeger @ 2020-02-07 11:39 UTC (permalink / raw)
  To: Christian Borntraeger, Janosch Frank
  Cc: KVM, Cornelia Huck, David Hildenbrand, Thomas Huth,
	Ulrich Weigand, Claudio Imbrenda, Andrea Arcangeli, linux-s390,
	Michael Mueller, Vasily Gorbik, linux-mm, Andrew Morton,
	Janosch Frank

From: Vasily Gorbik <gor@linux.ibm.com>

Add exception handlers that transparently transition non-secure pages
to secure (import) upon guest access and secure pages to non-secure
(export) upon hypervisor access.

Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
[frankja@linux.ibm.com: adding checks for failures]
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
[imbrenda@linux.ibm.com:  adding a check for gmap fault]
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
[borntraeger@de.ibm.com: patch merging, splitting, fixing]
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
 arch/s390/kernel/pgm_check.S |  4 +-
 arch/s390/mm/fault.c         | 86 ++++++++++++++++++++++++++++++++++++
 2 files changed, 88 insertions(+), 2 deletions(-)

diff --git a/arch/s390/kernel/pgm_check.S b/arch/s390/kernel/pgm_check.S
index 59dee9d3bebf..27ac4f324c70 100644
--- a/arch/s390/kernel/pgm_check.S
+++ b/arch/s390/kernel/pgm_check.S
@@ -78,8 +78,8 @@ PGM_CHECK(do_dat_exception)		/* 39 */
 PGM_CHECK(do_dat_exception)		/* 3a */
 PGM_CHECK(do_dat_exception)		/* 3b */
 PGM_CHECK_DEFAULT			/* 3c */
-PGM_CHECK_DEFAULT			/* 3d */
-PGM_CHECK_DEFAULT			/* 3e */
+PGM_CHECK(do_secure_storage_access)	/* 3d */
+PGM_CHECK(do_non_secure_storage_access)	/* 3e */
 PGM_CHECK_DEFAULT			/* 3f */
 PGM_CHECK_DEFAULT			/* 40 */
 PGM_CHECK_DEFAULT			/* 41 */
diff --git a/arch/s390/mm/fault.c b/arch/s390/mm/fault.c
index 7b0bb475c166..fab4219fa0be 100644
--- a/arch/s390/mm/fault.c
+++ b/arch/s390/mm/fault.c
@@ -38,6 +38,7 @@
 #include <asm/irq.h>
 #include <asm/mmu_context.h>
 #include <asm/facility.h>
+#include <asm/uv.h>
 #include "../kernel/entry.h"
 
 #define __FAIL_ADDR_MASK -4096L
@@ -816,3 +817,88 @@ static int __init pfault_irq_init(void)
 early_initcall(pfault_irq_init);
 
 #endif /* CONFIG_PFAULT */
+
+#if IS_ENABLED(CONFIG_KVM)
+void do_secure_storage_access(struct pt_regs *regs)
+{
+	unsigned long addr = regs->int_parm_long & __FAIL_ADDR_MASK;
+	struct vm_area_struct *vma;
+	struct mm_struct *mm;
+	struct page *page;
+	int rc;
+
+	switch (get_fault_type(regs)) {
+	case USER_FAULT:
+		mm = current->mm;
+		down_read(&mm->mmap_sem);
+		vma = find_vma(mm, addr);
+		if (!vma) {
+			up_read(&mm->mmap_sem);
+			do_fault_error(regs, VM_READ | VM_WRITE, VM_FAULT_BADMAP);
+			break;
+		}
+		page = follow_page(vma, addr, FOLL_WRITE | FOLL_GET);
+		if (IS_ERR_OR_NULL(page)) {
+			up_read(&mm->mmap_sem);
+			break;
+		}
+		if (arch_make_page_accessible(page))
+			send_sig(SIGSEGV, current, 0);
+		put_page(page);
+		up_read(&mm->mmap_sem);
+		break;
+	case KERNEL_FAULT:
+		page = phys_to_page(addr);
+		if (unlikely(!try_get_page(page)))
+			break;
+		rc = arch_make_page_accessible(page);
+		put_page(page);
+		if (rc)
+			BUG();
+		break;
+	case VDSO_FAULT:
+		/* fallthrough */
+	case GMAP_FAULT:
+		/* fallthrough */
+	default:
+		do_fault_error(regs, VM_READ | VM_WRITE, VM_FAULT_BADMAP);
+		WARN_ON_ONCE(1);
+	}
+}
+NOKPROBE_SYMBOL(do_secure_storage_access);
+
+void do_non_secure_storage_access(struct pt_regs *regs)
+{
+	unsigned long gaddr = regs->int_parm_long & __FAIL_ADDR_MASK;
+	struct gmap *gmap = (struct gmap *)S390_lowcore.gmap;
+	struct uv_cb_cts uvcb = {
+		.header.cmd = UVC_CMD_CONV_TO_SEC_STOR,
+		.header.len = sizeof(uvcb),
+		.guest_handle = gmap->guest_handle,
+		.gaddr = gaddr,
+	};
+	int rc;
+
+	if (get_fault_type(regs) != GMAP_FAULT) {
+		do_fault_error(regs, VM_READ | VM_WRITE, VM_FAULT_BADMAP);
+		WARN_ON_ONCE(1);
+		return;
+	}
+
+	rc = uv_make_secure(gmap, gaddr, &uvcb);
+	if (rc == -EINVAL && uvcb.header.rc != 0x104)
+		send_sig(SIGSEGV, current, 0);
+}
+NOKPROBE_SYMBOL(do_non_secure_storage_access);
+
+#else
+void do_secure_storage_access(struct pt_regs *regs)
+{
+	default_trap_handler(regs);
+}
+
+void do_non_secure_storage_access(struct pt_regs *regs)
+{
+	default_trap_handler(regs);
+}
+#endif
-- 
2.24.0

^ permalink raw reply related	[flat|nested] 147+ messages in thread
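[Editor's note: the decision at the end of `do_non_secure_storage_access()` — a failed import raises SIGSEGV unless the ultravisor reported return code 0x104 — can be modeled as a tiny predicate. Values are simulated; -22 stands in for -EINVAL:]

```c
#include <assert.h>

/* Model of the final check in do_non_secure_storage_access():
 * SIGSEGV is sent only when uv_make_secure() failed with -EINVAL
 * and the UV return code is not the tolerated 0x104. */
static int model_should_sigsegv(int rc, unsigned short uv_rc)
{
	return rc == -22 && uv_rc != 0x104;
}
```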

* [PATCH 07/35] KVM: s390: add new variants of UV CALL
  2020-02-07 11:39 [PATCH 00/35] KVM: s390: Add support for protected VMs Christian Borntraeger
                   ` (5 preceding siblings ...)
  2020-02-07 11:39 ` [PATCH 06/35] s390/mm: add (non)secure page access exceptions handlers Christian Borntraeger
@ 2020-02-07 11:39 ` Christian Borntraeger
  2020-02-07 14:34   ` Thomas Huth
                     ` (2 more replies)
  2020-02-07 11:39 ` [PATCH 08/35] KVM: s390: protvirt: Add initial lifecycle handling Christian Borntraeger
                   ` (27 subsequent siblings)
  34 siblings, 3 replies; 147+ messages in thread
From: Christian Borntraeger @ 2020-02-07 11:39 UTC (permalink / raw)
  To: Christian Borntraeger, Janosch Frank
  Cc: KVM, Cornelia Huck, David Hildenbrand, Thomas Huth,
	Ulrich Weigand, Claudio Imbrenda, Andrea Arcangeli, linux-s390,
	Michael Mueller, Vasily Gorbik, Janosch Frank

From: Janosch Frank <frankja@linux.ibm.com>

This adds 2 new variants of the UV CALL.

The first variant handles UV CALLs that might have longer busy
conditions or just take longer when doing partial completion. We should
schedule when necessary.

The second variant handles UV CALLs that only need the handle but have
no payload (e.g. destroying a VM). We can provide a simple wrapper for
those.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
[borntraeger@de.ibm.com: patch merging, splitting, fixing]
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
 arch/s390/include/asm/uv.h | 59 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 59 insertions(+)

diff --git a/arch/s390/include/asm/uv.h b/arch/s390/include/asm/uv.h
index 1b97230a57ba..e1cef772fde1 100644
--- a/arch/s390/include/asm/uv.h
+++ b/arch/s390/include/asm/uv.h
@@ -14,6 +14,7 @@
 #include <linux/types.h>
 #include <linux/errno.h>
 #include <linux/bug.h>
+#include <linux/sched.h>
 #include <asm/page.h>
 #include <asm/gmap.h>
 
@@ -91,6 +92,19 @@ struct uv_cb_cfs {
 	u64 paddr;
 } __packed __aligned(8);
 
+/*
+ * A common UV call struct for calls that take no payload
+ * Examples:
+ * Destroy cpu/config
+ * Verify
+ */
+struct uv_cb_nodata {
+	struct uv_cb_header header;
+	u64 reserved08[2];
+	u64 handle;
+	u64 reserved20[4];
+} __packed __aligned(8);
+
 struct uv_cb_share {
 	struct uv_cb_header header;
 	u64 reserved08[3];
@@ -98,6 +112,31 @@ struct uv_cb_share {
 	u64 reserved28;
 } __packed __aligned(8);
 
+/*
+ * Low level uv_call that takes r1 and r2 as parameter and avoids
+ * stalls for long running busy conditions by doing schedule
+ */
+static inline int uv_call_sched(unsigned long r1, unsigned long r2)
+{
+	int cc;
+
+	do {
+		asm volatile(
+			"0:	.insn rrf,0xB9A40000,%[r1],%[r2],0,0\n"
+			"		ipm	%[cc]\n"
+			"		srl	%[cc],28\n"
+			: [cc] "=d" (cc)
+			: [r1] "d" (r1), [r2] "d" (r2)
+			: "memory", "cc");
+		if (need_resched())
+			schedule();
+	} while (cc > 1);
+	return cc;
+}
+
+/*
+ * Low level uv_call that takes r1 and r2 as parameter
+ */
 static inline int uv_call(unsigned long r1, unsigned long r2)
 {
 	int cc;
@@ -113,6 +152,26 @@ static inline int uv_call(unsigned long r1, unsigned long r2)
 	return cc;
 }
 
+/*
+ * special variant of uv_call that only transports the cpu or guest
+ * handle and the command, like destroy or verify.
+ */
+static inline int uv_cmd_nodata(u64 handle, u16 cmd, u32 *ret)
+{
+	int rc;
+	struct uv_cb_nodata uvcb = {
+		.header.cmd = cmd,
+		.header.len = sizeof(uvcb),
+		.handle = handle,
+	};
+
+	WARN(!handle, "No handle provided to Ultravisor call cmd %x\n", cmd);
+	rc = uv_call_sched(0, (u64)&uvcb);
+	if (ret)
+		*ret = *(u32 *)&uvcb.header.rc;
+	return rc ? -EINVAL : 0;
+}
+
 struct uv_info {
 	unsigned long inst_calls_list[4];
 	unsigned long uv_base_stor_len;
-- 
2.24.0

^ permalink raw reply related	[flat|nested] 147+ messages in thread
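[Editor's note: the retry loop in `uv_call_sched()` — repeat while the condition code signals "busy" (cc > 1), rescheduling between attempts — can be sketched in userspace. The instruction is faked by a stub that returns busy a configurable number of times:]

```c
#include <assert.h>

/* Stub for the UVC instruction: return cc 2 (busy) busy_left times,
 * then cc 0 (done). */
static int busy_left;

static int stub_uv_call(void)
{
	return busy_left-- > 0 ? 2 : 0;
}

/* Model of uv_call_sched(): loop while cc > 1, giving the scheduler
 * a chance between attempts (the kernel calls schedule() when
 * need_resched() is set). */
static int model_uv_call_sched(int *attempts)
{
	int cc;

	do {
		cc = stub_uv_call();
		(*attempts)++;
		/* kernel: if (need_resched()) schedule(); */
	} while (cc > 1);
	return cc;
}
```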

* [PATCH 08/35] KVM: s390: protvirt: Add initial lifecycle handling
  2020-02-07 11:39 [PATCH 00/35] KVM: s390: Add support for protected VMs Christian Borntraeger
                   ` (6 preceding siblings ...)
  2020-02-07 11:39 ` [PATCH 07/35] KVM: s390: add new variants of UV CALL Christian Borntraeger
@ 2020-02-07 11:39 ` Christian Borntraeger
  2020-02-07 16:32   ` Thomas Huth
                     ` (2 more replies)
  2020-02-07 11:39 ` [PATCH 09/35] KVM: s390: protvirt: Add KVM api documentation Christian Borntraeger
                   ` (26 subsequent siblings)
  34 siblings, 3 replies; 147+ messages in thread
From: Christian Borntraeger @ 2020-02-07 11:39 UTC (permalink / raw)
  To: Christian Borntraeger, Janosch Frank
  Cc: KVM, Cornelia Huck, David Hildenbrand, Thomas Huth,
	Ulrich Weigand, Claudio Imbrenda, Andrea Arcangeli, linux-s390,
	Michael Mueller, Vasily Gorbik, Janosch Frank

From: Janosch Frank <frankja@linux.ibm.com>

This contains 3 main changes:
1. changes in SIE control block handling for secure guests
2. helper functions for create/destroy/unpack secure guests
3. KVM_S390_PV_COMMAND ioctl to allow userspace to deal with secure
machines

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
[borntraeger@de.ibm.com: patch merging, splitting, fixing]
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
 arch/s390/include/asm/kvm_host.h |  24 ++-
 arch/s390/include/asm/uv.h       |  69 +++++++++
 arch/s390/kvm/Makefile           |   2 +-
 arch/s390/kvm/kvm-s390.c         | 191 +++++++++++++++++++++++-
 arch/s390/kvm/kvm-s390.h         |  27 ++++
 arch/s390/kvm/pv.c               | 244 +++++++++++++++++++++++++++++++
 include/uapi/linux/kvm.h         |  33 +++++
 7 files changed, 586 insertions(+), 4 deletions(-)
 create mode 100644 arch/s390/kvm/pv.c

diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 884503e05424..3ed31c5f80e1 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -160,7 +160,13 @@ struct kvm_s390_sie_block {
 	__u8	reserved08[4];		/* 0x0008 */
 #define PROG_IN_SIE (1<<0)
 	__u32	prog0c;			/* 0x000c */
-	__u8	reserved10[16];		/* 0x0010 */
+	union {
+		__u8	reserved10[16];		/* 0x0010 */
+		struct {
+			__u64	pv_handle_cpu;
+			__u64	pv_handle_config;
+		};
+	};
 #define PROG_BLOCK_SIE	(1<<0)
 #define PROG_REQUEST	(1<<1)
 	atomic_t prog20;		/* 0x0020 */
@@ -233,7 +239,7 @@ struct kvm_s390_sie_block {
 #define ECB3_RI  0x01
 	__u8    ecb3;			/* 0x0063 */
 	__u32	scaol;			/* 0x0064 */
-	__u8	reserved68;		/* 0x0068 */
+	__u8	sdf;			/* 0x0068 */
 	__u8    epdx;			/* 0x0069 */
 	__u8    reserved6a[2];		/* 0x006a */
 	__u32	todpr;			/* 0x006c */
@@ -645,6 +651,11 @@ struct kvm_guestdbg_info_arch {
 	unsigned long last_bp;
 };
 
+struct kvm_s390_pv_vcpu {
+	u64 handle;
+	unsigned long stor_base;
+};
+
 struct kvm_vcpu_arch {
 	struct kvm_s390_sie_block *sie_block;
 	/* if vsie is active, currently executed shadow sie control block */
@@ -673,6 +684,7 @@ struct kvm_vcpu_arch {
 	__u64 cputm_start;
 	bool gs_enabled;
 	bool skey_enabled;
+	struct kvm_s390_pv_vcpu pv;
 };
 
 struct kvm_vm_stat {
@@ -846,6 +858,13 @@ struct kvm_s390_gisa_interrupt {
 	DECLARE_BITMAP(kicked_mask, KVM_MAX_VCPUS);
 };
 
+struct kvm_s390_pv {
+	u64 handle;
+	u64 guest_len;
+	unsigned long stor_base;
+	void *stor_var;
+};
+
 struct kvm_arch{
 	void *sca;
 	int use_esca;
@@ -881,6 +900,7 @@ struct kvm_arch{
 	DECLARE_BITMAP(cpu_feat, KVM_S390_VM_CPU_FEAT_NR_BITS);
 	DECLARE_BITMAP(idle_mask, KVM_MAX_VCPUS);
 	struct kvm_s390_gisa_interrupt gisa_int;
+	struct kvm_s390_pv pv;
 };
 
 #define KVM_HVA_ERR_BAD		(-1UL)
diff --git a/arch/s390/include/asm/uv.h b/arch/s390/include/asm/uv.h
index e1cef772fde1..7c21d55d2e49 100644
--- a/arch/s390/include/asm/uv.h
+++ b/arch/s390/include/asm/uv.h
@@ -23,11 +23,19 @@
 #define UVC_RC_INV_STATE	0x0003
 #define UVC_RC_INV_LEN		0x0005
 #define UVC_RC_NO_RESUME	0x0007
+#define UVC_RC_NEED_DESTROY	0x8000
 
 #define UVC_CMD_QUI			0x0001
 #define UVC_CMD_INIT_UV			0x000f
+#define UVC_CMD_CREATE_SEC_CONF		0x0100
+#define UVC_CMD_DESTROY_SEC_CONF	0x0101
+#define UVC_CMD_CREATE_SEC_CPU		0x0120
+#define UVC_CMD_DESTROY_SEC_CPU		0x0121
 #define UVC_CMD_CONV_TO_SEC_STOR	0x0200
 #define UVC_CMD_CONV_FROM_SEC_STOR	0x0201
+#define UVC_CMD_SET_SEC_CONF_PARAMS	0x0300
+#define UVC_CMD_UNPACK_IMG		0x0301
+#define UVC_CMD_VERIFY_IMG		0x0302
 #define UVC_CMD_PIN_PAGE_SHARED		0x0341
 #define UVC_CMD_UNPIN_PAGE_SHARED	0x0342
 #define UVC_CMD_SET_SHARED_ACCESS	0x1000
@@ -37,10 +45,17 @@
 enum uv_cmds_inst {
 	BIT_UVC_CMD_QUI = 0,
 	BIT_UVC_CMD_INIT_UV = 1,
+	BIT_UVC_CMD_CREATE_SEC_CONF = 2,
+	BIT_UVC_CMD_DESTROY_SEC_CONF = 3,
+	BIT_UVC_CMD_CREATE_SEC_CPU = 4,
+	BIT_UVC_CMD_DESTROY_SEC_CPU = 5,
 	BIT_UVC_CMD_CONV_TO_SEC_STOR = 6,
 	BIT_UVC_CMD_CONV_FROM_SEC_STOR = 7,
 	BIT_UVC_CMD_SET_SHARED_ACCESS = 8,
 	BIT_UVC_CMD_REMOVE_SHARED_ACCESS = 9,
+	BIT_UVC_CMD_SET_SEC_PARMS = 11,
+	BIT_UVC_CMD_UNPACK_IMG = 13,
+	BIT_UVC_CMD_VERIFY_IMG = 14,
 	BIT_UVC_CMD_PIN_PAGE_SHARED = 21,
 	BIT_UVC_CMD_UNPIN_PAGE_SHARED = 22,
 };
@@ -52,6 +67,7 @@ struct uv_cb_header {
 	u16 rrc;	/* Return Reason Code */
 } __packed __aligned(8);
 
+/* Query Ultravisor Information */
 struct uv_cb_qui {
 	struct uv_cb_header header;
 	u64 reserved08;
@@ -71,6 +87,7 @@ struct uv_cb_qui {
 	u64 reserveda0;
 } __packed __aligned(8);
 
+/* Initialize Ultravisor */
 struct uv_cb_init {
 	struct uv_cb_header header;
 	u64 reserved08[2];
@@ -79,6 +96,35 @@ struct uv_cb_init {
 	u64 reserved28[4];
 } __packed __aligned(8);
 
+/* Create Guest Configuration */
+struct uv_cb_cgc {
+	struct uv_cb_header header;
+	u64 reserved08[2];
+	u64 guest_handle;
+	u64 conf_base_stor_origin;
+	u64 conf_virt_stor_origin;
+	u64 reserved30;
+	u64 guest_stor_origin;
+	u64 guest_stor_len;
+	u64 guest_sca;
+	u64 guest_asce;
+	u64 reserved58[5];
+} __packed __aligned(8);
+
+/* Create Secure CPU */
+struct uv_cb_csc {
+	struct uv_cb_header header;
+	u64 reserved08[2];
+	u64 cpu_handle;
+	u64 guest_handle;
+	u64 stor_origin;
+	u8  reserved30[6];
+	u16 num;
+	u64 state_origin;
+	u64 reserved40[4];
+} __packed __aligned(8);
+
+/* Convert to Secure */
 struct uv_cb_cts {
 	struct uv_cb_header header;
 	u64 reserved08[2];
@@ -86,12 +132,34 @@ struct uv_cb_cts {
 	u64 gaddr;
 } __packed __aligned(8);
 
+/* Convert from Secure / Pin Page Shared */
 struct uv_cb_cfs {
 	struct uv_cb_header header;
 	u64 reserved08[2];
 	u64 paddr;
 } __packed __aligned(8);
 
+/* Set Secure Config Parameter */
+struct uv_cb_ssc {
+	struct uv_cb_header header;
+	u64 reserved08[2];
+	u64 guest_handle;
+	u64 sec_header_origin;
+	u32 sec_header_len;
+	u32 reserved2c;
+	u64 reserved30[4];
+} __packed __aligned(8);
+
+/* Unpack */
+struct uv_cb_unp {
+	struct uv_cb_header header;
+	u64 reserved08[2];
+	u64 guest_handle;
+	u64 gaddr;
+	u64 tweak[2];
+	u64 reserved38[3];
+} __packed __aligned(8);
+
 /*
  * A common UV call struct for calls that take no payload
  * Examples:
@@ -105,6 +173,7 @@ struct uv_cb_nodata {
 	u64 reserved20[4];
 } __packed __aligned(8);
 
+/* Set Shared Access */
 struct uv_cb_share {
 	struct uv_cb_header header;
 	u64 reserved08[3];
diff --git a/arch/s390/kvm/Makefile b/arch/s390/kvm/Makefile
index 05ee90a5ea08..12decca22e7c 100644
--- a/arch/s390/kvm/Makefile
+++ b/arch/s390/kvm/Makefile
@@ -9,6 +9,6 @@ common-objs = $(KVM)/kvm_main.o $(KVM)/eventfd.o  $(KVM)/async_pf.o $(KVM)/irqch
 ccflags-y := -Ivirt/kvm -Iarch/s390/kvm
 
 kvm-objs := $(common-objs) kvm-s390.o intercept.o interrupt.o priv.o sigp.o
-kvm-objs += diag.o gaccess.o guestdbg.o vsie.o
+kvm-objs += diag.o gaccess.o guestdbg.o vsie.o pv.o
 
 obj-$(CONFIG_KVM) += kvm.o
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 1a48214ac507..e1bccbb41fdd 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -44,6 +44,7 @@
 #include <asm/cpacf.h>
 #include <asm/timex.h>
 #include <asm/ap.h>
+#include <asm/uv.h>
 #include "kvm-s390.h"
 #include "gaccess.h"
 
@@ -236,6 +237,7 @@ int kvm_arch_check_processor_compat(void)
 
 static void kvm_gmap_notifier(struct gmap *gmap, unsigned long start,
 			      unsigned long end);
+static int sca_switch_to_extended(struct kvm *kvm);
 
 static void kvm_clock_sync_scb(struct kvm_s390_sie_block *scb, u64 delta)
 {
@@ -568,6 +570,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_S390_BPB:
 		r = test_facility(82);
 		break;
+	case KVM_CAP_S390_PROTECTED:
+		r = is_prot_virt_host();
+		break;
 	default:
 		r = 0;
 	}
@@ -2162,6 +2167,115 @@ static int kvm_s390_set_cmma_bits(struct kvm *kvm,
 	return r;
 }
 
+static int kvm_s390_handle_pv(struct kvm *kvm, struct kvm_pv_cmd *cmd)
+{
+	int r = 0;
+	void __user *argp = (void __user *)cmd->data;
+
+	switch (cmd->cmd) {
+	case KVM_PV_VM_CREATE: {
+		r = -EINVAL;
+		if (kvm_s390_pv_is_protected(kvm))
+			break;
+
+		r = kvm_s390_pv_alloc_vm(kvm);
+		if (r)
+			break;
+
+		mutex_lock(&kvm->lock);
+		kvm_s390_vcpu_block_all(kvm);
+		/* FMT 4 SIE needs esca */
+		r = sca_switch_to_extended(kvm);
+		if (r) {
+			kvm_s390_pv_dealloc_vm(kvm);
+			kvm_s390_vcpu_unblock_all(kvm);
+			mutex_unlock(&kvm->lock);
+			break;
+		}
+		r = kvm_s390_pv_create_vm(kvm);
+		kvm_s390_vcpu_unblock_all(kvm);
+		mutex_unlock(&kvm->lock);
+		break;
+	}
+	case KVM_PV_VM_DESTROY: {
+		r = -EINVAL;
+		if (!kvm_s390_pv_is_protected(kvm))
+			break;
+
+		/* All VCPUs have to be destroyed before this call. */
+		mutex_lock(&kvm->lock);
+		kvm_s390_vcpu_block_all(kvm);
+		r = kvm_s390_pv_destroy_vm(kvm);
+		if (!r)
+			kvm_s390_pv_dealloc_vm(kvm);
+		kvm_s390_vcpu_unblock_all(kvm);
+		mutex_unlock(&kvm->lock);
+		break;
+	}
+	case KVM_PV_VM_SET_SEC_PARMS: {
+		struct kvm_s390_pv_sec_parm parms = {};
+		void *hdr;
+
+		r = -EINVAL;
+		if (!kvm_s390_pv_is_protected(kvm))
+			break;
+
+		r = -EFAULT;
+		if (copy_from_user(&parms, argp, sizeof(parms)))
+			break;
+
+		/* Currently restricted to 8KB */
+		r = -EINVAL;
+		if (parms.length > PAGE_SIZE * 2)
+			break;
+
+		r = -ENOMEM;
+		hdr = vmalloc(parms.length);
+		if (!hdr)
+			break;
+
+		r = -EFAULT;
+		if (!copy_from_user(hdr, (void __user *)parms.origin,
+				   parms.length))
+			r = kvm_s390_pv_set_sec_parms(kvm, hdr, parms.length);
+
+		vfree(hdr);
+		break;
+	}
+	case KVM_PV_VM_UNPACK: {
+		struct kvm_s390_pv_unp unp = {};
+
+		r = -EINVAL;
+		if (!kvm_s390_pv_is_protected(kvm))
+			break;
+
+		r = -EFAULT;
+		if (copy_from_user(&unp, argp, sizeof(unp)))
+			break;
+
+		r = kvm_s390_pv_unpack(kvm, unp.addr, unp.size, unp.tweak);
+		break;
+	}
+	case KVM_PV_VM_VERIFY: {
+		u32 ret;
+
+		r = -EINVAL;
+		if (!kvm_s390_pv_is_protected(kvm))
+			break;
+
+		r = uv_cmd_nodata(kvm_s390_pv_handle(kvm),
+				  UVC_CMD_VERIFY_IMG,
+				  &ret);
+		VM_EVENT(kvm, 3, "PROTVIRT VERIFY: rc %x rrc %x",
+			 ret >> 16, ret & 0x0000ffff);
+		break;
+	}
+	default:
+		return -ENOTTY;
+	}
+	return r;
+}
+
 long kvm_arch_vm_ioctl(struct file *filp,
 		       unsigned int ioctl, unsigned long arg)
 {
@@ -2259,6 +2373,20 @@ long kvm_arch_vm_ioctl(struct file *filp,
 		mutex_unlock(&kvm->slots_lock);
 		break;
 	}
+	case KVM_S390_PV_COMMAND: {
+		struct kvm_pv_cmd args;
+
+		r = -EINVAL;
+		if (!is_prot_virt_host())
+			break;
+
+		r = -EFAULT;
+		if (copy_from_user(&args, argp, sizeof(args)))
+			break;
+
+		r = kvm_s390_handle_pv(kvm, &args);
+		break;
+	}
 	default:
 		r = -ENOTTY;
 	}
@@ -2534,6 +2662,8 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
 
 	if (vcpu->kvm->arch.use_cmma)
 		kvm_s390_vcpu_unsetup_cmma(vcpu);
+	if (kvm_s390_pv_handle_cpu(vcpu))
+		kvm_s390_pv_destroy_cpu(vcpu);
 	free_page((unsigned long)(vcpu->arch.sie_block));
 
 	kvm_vcpu_uninit(vcpu);
@@ -2560,8 +2690,12 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
 {
 	kvm_free_vcpus(kvm);
 	sca_dispose(kvm);
-	debug_unregister(kvm->arch.dbf);
 	kvm_s390_gisa_destroy(kvm);
+	if (kvm_s390_pv_is_protected(kvm)) {
+		kvm_s390_pv_destroy_vm(kvm);
+		kvm_s390_pv_dealloc_vm(kvm);
+	}
+	debug_unregister(kvm->arch.dbf);
 	free_page((unsigned long)kvm->arch.sie_page2);
 	if (!kvm_is_ucontrol(kvm))
 		gmap_remove(kvm->arch.gmap);
@@ -2657,6 +2791,9 @@ static int sca_switch_to_extended(struct kvm *kvm)
 	unsigned int vcpu_idx;
 	u32 scaol, scaoh;
 
+	if (kvm->arch.use_esca)
+		return 0;
+
 	new_sca = alloc_pages_exact(sizeof(*new_sca), GFP_KERNEL|__GFP_ZERO);
 	if (!new_sca)
 		return -ENOMEM;
@@ -3049,6 +3186,15 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm,
 	rc = kvm_vcpu_init(vcpu, kvm, id);
 	if (rc)
 		goto out_free_sie_block;
+
+	if (kvm_s390_pv_is_protected(kvm)) {
+		rc = kvm_s390_pv_create_cpu(vcpu);
+		if (rc) {
+			kvm_vcpu_uninit(vcpu);
+			goto out_free_sie_block;
+		}
+	}
+
 	VM_EVENT(kvm, 3, "create cpu %d at 0x%pK, sie block at 0x%pK", id, vcpu,
 		 vcpu->arch.sie_block);
 	trace_kvm_s390_create_vcpu(id, vcpu, vcpu->arch.sie_block);
@@ -4357,6 +4503,35 @@ long kvm_arch_vcpu_async_ioctl(struct file *filp,
 	return -ENOIOCTLCMD;
 }
 
+static int kvm_s390_handle_pv_vcpu(struct kvm_vcpu *vcpu,
+				   struct kvm_pv_cmd *cmd)
+{
+	int r = 0;
+
+	if (!kvm_s390_pv_is_protected(vcpu->kvm))
+		return -EINVAL;
+
+	switch (cmd->cmd) {
+	case KVM_PV_VCPU_CREATE: {
+		if (kvm_s390_pv_handle_cpu(vcpu))
+			return -EINVAL;
+
+		r = kvm_s390_pv_create_cpu(vcpu);
+		break;
+	}
+	case KVM_PV_VCPU_DESTROY: {
+		if (!kvm_s390_pv_handle_cpu(vcpu))
+			return -EINVAL;
+
+		r = kvm_s390_pv_destroy_cpu(vcpu);
+		break;
+	}
+	default:
+		r = -ENOTTY;
+	}
+	return r;
+}
+
 long kvm_arch_vcpu_ioctl(struct file *filp,
 			 unsigned int ioctl, unsigned long arg)
 {
@@ -4498,6 +4673,20 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
 					   irq_state.len);
 		break;
 	}
+	case KVM_S390_PV_COMMAND_VCPU: {
+		struct kvm_pv_cmd args;
+
+		r = -EINVAL;
+		if (!is_prot_virt_host())
+			break;
+
+		r = -EFAULT;
+		if (copy_from_user(&args, argp, sizeof(args)))
+			break;
+
+		r = kvm_s390_handle_pv_vcpu(vcpu, &args);
+		break;
+	}
 	default:
 		r = -ENOTTY;
 	}
diff --git a/arch/s390/kvm/kvm-s390.h b/arch/s390/kvm/kvm-s390.h
index 54c5eb4b275d..32c0c01d5df0 100644
--- a/arch/s390/kvm/kvm-s390.h
+++ b/arch/s390/kvm/kvm-s390.h
@@ -196,6 +196,33 @@ static inline int kvm_s390_user_cpu_state_ctrl(struct kvm *kvm)
 	return kvm->arch.user_cpu_state_ctrl != 0;
 }
 
+/* implemented in pv.c */
+void kvm_s390_pv_dealloc_vm(struct kvm *kvm);
+int kvm_s390_pv_alloc_vm(struct kvm *kvm);
+int kvm_s390_pv_create_vm(struct kvm *kvm);
+int kvm_s390_pv_create_cpu(struct kvm_vcpu *vcpu);
+int kvm_s390_pv_destroy_vm(struct kvm *kvm);
+int kvm_s390_pv_destroy_cpu(struct kvm_vcpu *vcpu);
+int kvm_s390_pv_set_sec_parms(struct kvm *kvm, void *hdr, u64 length);
+int kvm_s390_pv_unpack(struct kvm *kvm, unsigned long addr, unsigned long size,
+		       unsigned long tweak);
+int kvm_s390_pv_verify(struct kvm *kvm);
+
+static inline bool kvm_s390_pv_is_protected(struct kvm *kvm)
+{
+	return !!kvm->arch.pv.handle;
+}
+
+static inline u64 kvm_s390_pv_handle(struct kvm *kvm)
+{
+	return kvm->arch.pv.handle;
+}
+
+static inline u64 kvm_s390_pv_handle_cpu(struct kvm_vcpu *vcpu)
+{
+	return vcpu->arch.pv.handle;
+}
+
 /* implemented in interrupt.c */
 int kvm_s390_handle_wait(struct kvm_vcpu *vcpu);
 void kvm_s390_vcpu_wakeup(struct kvm_vcpu *vcpu);
diff --git a/arch/s390/kvm/pv.c b/arch/s390/kvm/pv.c
new file mode 100644
index 000000000000..4795e61f4e16
--- /dev/null
+++ b/arch/s390/kvm/pv.c
@@ -0,0 +1,244 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Hosting Secure Execution virtual machines
+ *
+ * Copyright IBM Corp. 2019
+ *    Author(s): Janosch Frank <frankja@linux.ibm.com>
+ */
+#include <linux/kvm.h>
+#include <linux/kvm_host.h>
+#include <linux/pagemap.h>
+#include <asm/pgalloc.h>
+#include <asm/gmap.h>
+#include <asm/uv.h>
+#include <asm/gmap.h>
+#include <asm/mman.h>
+#include "kvm-s390.h"
+
+void kvm_s390_pv_dealloc_vm(struct kvm *kvm)
+{
+	vfree(kvm->arch.pv.stor_var);
+	free_pages(kvm->arch.pv.stor_base,
+		   get_order(uv_info.guest_base_stor_len));
+	memset(&kvm->arch.pv, 0, sizeof(kvm->arch.pv));
+}
+
+int kvm_s390_pv_alloc_vm(struct kvm *kvm)
+{
+	unsigned long base = uv_info.guest_base_stor_len;
+	unsigned long virt = uv_info.guest_virt_var_stor_len;
+	unsigned long npages = 0, vlen = 0;
+	struct kvm_memory_slot *memslot;
+
+	kvm->arch.pv.stor_var = NULL;
+	kvm->arch.pv.stor_base = __get_free_pages(GFP_KERNEL, get_order(base));
+	if (!kvm->arch.pv.stor_base)
+		return -ENOMEM;
+
+	/*
+	 * Calculate current guest storage for allocation of the
+	 * variable storage, which is based on the length in MB.
+	 *
+	 * Slots are sorted by GFN
+	 */
+	mutex_lock(&kvm->slots_lock);
+	memslot = kvm_memslots(kvm)->memslots;
+	npages = memslot->base_gfn + memslot->npages;
+	mutex_unlock(&kvm->slots_lock);
+
+	kvm->arch.pv.guest_len = npages * PAGE_SIZE;
+
+	/* Allocate variable storage */
+	vlen = ALIGN(virt * ((npages * PAGE_SIZE) / HPAGE_SIZE), PAGE_SIZE);
+	vlen += uv_info.guest_virt_base_stor_len;
+	kvm->arch.pv.stor_var = vzalloc(vlen);
+	if (!kvm->arch.pv.stor_var)
+		goto out_err;
+	return 0;
+
+out_err:
+	kvm_s390_pv_dealloc_vm(kvm);
+	return -ENOMEM;
+}
+
+int kvm_s390_pv_destroy_vm(struct kvm *kvm)
+{
+	int rc;
+	u32 ret;
+
+	rc = uv_cmd_nodata(kvm_s390_pv_handle(kvm),
+			   UVC_CMD_DESTROY_SEC_CONF, &ret);
+	WRITE_ONCE(kvm->arch.gmap->guest_handle, 0);
+	atomic_set(&kvm->mm->context.is_protected, 0);
+	VM_EVENT(kvm, 3, "PROTVIRT DESTROY VM: rc %x rrc %x",
+		 ret >> 16, ret & 0x0000ffff);
+	return rc;
+}
+
+int kvm_s390_pv_destroy_cpu(struct kvm_vcpu *vcpu)
+{
+	int rc = 0;
+	u32 ret;
+
+	if (kvm_s390_pv_handle_cpu(vcpu)) {
+		rc = uv_cmd_nodata(kvm_s390_pv_handle_cpu(vcpu),
+				   UVC_CMD_DESTROY_SEC_CPU,
+				   &ret);
+
+		VCPU_EVENT(vcpu, 3, "PROTVIRT DESTROY VCPU: cpu %d rc %x rrc %x",
+			   vcpu->vcpu_id, ret >> 16, ret & 0x0000ffff);
+	}
+
+	free_pages(vcpu->arch.pv.stor_base,
+		   get_order(uv_info.guest_cpu_stor_len));
+	vcpu->arch.sie_block->pv_handle_cpu = 0;
+	vcpu->arch.sie_block->pv_handle_config = 0;
+	memset(&vcpu->arch.pv, 0, sizeof(vcpu->arch.pv));
+	vcpu->arch.sie_block->sdf = 0;
+	return rc;
+}
+
+int kvm_s390_pv_create_cpu(struct kvm_vcpu *vcpu)
+{
+	int rc;
+	struct uv_cb_csc uvcb = {
+		.header.cmd = UVC_CMD_CREATE_SEC_CPU,
+		.header.len = sizeof(uvcb),
+	};
+
+	if (kvm_s390_pv_handle_cpu(vcpu))
+		return -EINVAL;
+
+	vcpu->arch.pv.stor_base = __get_free_pages(GFP_KERNEL,
+						   get_order(uv_info.guest_cpu_stor_len));
+	if (!vcpu->arch.pv.stor_base)
+		return -ENOMEM;
+
+	/* Input */
+	uvcb.guest_handle = kvm_s390_pv_handle(vcpu->kvm);
+	uvcb.num = vcpu->arch.sie_block->icpua;
+	uvcb.state_origin = (u64)vcpu->arch.sie_block;
+	uvcb.stor_origin = (u64)vcpu->arch.pv.stor_base;
+
+	rc = uv_call(0, (u64)&uvcb);
+	VCPU_EVENT(vcpu, 3, "PROTVIRT CREATE VCPU: cpu %d handle %llx rc %x rrc %x",
+		   vcpu->vcpu_id, uvcb.cpu_handle, uvcb.header.rc,
+		   uvcb.header.rrc);
+
+	if (rc) {
+		kvm_s390_pv_destroy_cpu(vcpu);
+		return -EINVAL;
+	}
+
+	/* Output */
+	vcpu->arch.pv.handle = uvcb.cpu_handle;
+	vcpu->arch.sie_block->pv_handle_cpu = uvcb.cpu_handle;
+	vcpu->arch.sie_block->pv_handle_config = kvm_s390_pv_handle(vcpu->kvm);
+	vcpu->arch.sie_block->sdf = 2;
+	return 0;
+}
+
+int kvm_s390_pv_create_vm(struct kvm *kvm)
+{
+	int rc;
+
+	struct uv_cb_cgc uvcb = {
+		.header.cmd = UVC_CMD_CREATE_SEC_CONF,
+		.header.len = sizeof(uvcb)
+	};
+
+	if (kvm_s390_pv_handle(kvm))
+		return -EINVAL;
+
+	/* Inputs */
+	uvcb.guest_stor_origin = 0; /* MSO is 0 for KVM */
+	uvcb.guest_stor_len = kvm->arch.pv.guest_len;
+	uvcb.guest_asce = kvm->arch.gmap->asce;
+	uvcb.guest_sca = (unsigned long)kvm->arch.sca;
+	uvcb.conf_base_stor_origin = (u64)kvm->arch.pv.stor_base;
+	uvcb.conf_virt_stor_origin = (u64)kvm->arch.pv.stor_var;
+
+	rc = uv_call(0, (u64)&uvcb);
+	VM_EVENT(kvm, 3, "PROTVIRT CREATE VM: handle %llx len %llx rc %x rrc %x",
+		 uvcb.guest_handle, uvcb.guest_stor_len, uvcb.header.rc,
+		 uvcb.header.rrc);
+
+	/* Outputs */
+	kvm->arch.pv.handle = uvcb.guest_handle;
+
+	if (rc && (uvcb.header.rc & 0x8000)) {
+		kvm_s390_pv_destroy_vm(kvm);
+		return -EINVAL;
+	}
+	kvm->arch.gmap->guest_handle = uvcb.guest_handle;
+	atomic_set(&kvm->mm->context.is_protected, 1);
+	return rc;
+}
+
+int kvm_s390_pv_set_sec_parms(struct kvm *kvm,
+			      void *hdr, u64 length)
+{
+	int rc;
+	struct uv_cb_ssc uvcb = {
+		.header.cmd = UVC_CMD_SET_SEC_CONF_PARAMS,
+		.header.len = sizeof(uvcb),
+		.sec_header_origin = (u64)hdr,
+		.sec_header_len = length,
+		.guest_handle = kvm_s390_pv_handle(kvm),
+	};
+
+	if (!kvm_s390_pv_handle(kvm))
+		return -EINVAL;
+
+	rc = uv_call(0, (u64)&uvcb);
+	VM_EVENT(kvm, 3, "PROTVIRT VM SET PARMS: rc %x rrc %x",
+		 uvcb.header.rc, uvcb.header.rrc);
+	if (rc)
+		return -EINVAL;
+	return 0;
+}
+
+static int unpack_one(struct kvm *kvm, unsigned long addr, u64 tweak[2])
+{
+	struct uv_cb_unp uvcb = {
+		.header.cmd = UVC_CMD_UNPACK_IMG,
+		.header.len = sizeof(uvcb),
+		.guest_handle = kvm_s390_pv_handle(kvm),
+		.gaddr = addr,
+		.tweak[0] = tweak[0],
+		.tweak[1] = tweak[1],
+	};
+	int rc;
+
+	rc = uv_make_secure(kvm->arch.gmap, addr, &uvcb);
+
+	if (rc)
+		VM_EVENT(kvm, 3, "PROTVIRT VM UNPACK: failed addr %llx rc %x rrc %x",
+			 uvcb.gaddr, uvcb.header.rc, uvcb.header.rrc);
+	return rc;
+}
+
+int kvm_s390_pv_unpack(struct kvm *kvm, unsigned long addr, unsigned long size,
+		       unsigned long tweak)
+{
+	int rc = 0;
+	u64 tw[2] = {tweak, 0};
+
+	if (addr & ~PAGE_MASK || !size || size & ~PAGE_MASK)
+		return -EINVAL;
+
+	VM_EVENT(kvm, 3, "PROTVIRT VM UNPACK: start addr %lx size %lx",
+		 addr, size);
+
+	while (tw[1] < size) {
+		rc = unpack_one(kvm, addr, tw);
+		if (rc == -EAGAIN)
+			continue;
+		if (rc)
+			break;
+		addr += PAGE_SIZE;
+		tw[1] += PAGE_SIZE;
+	}
+	VM_EVENT(kvm, 3, "PROTVIRT VM UNPACK: finished with rc %x", rc);
+	return rc;
+}
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 4b95f9a31a2f..eab741bc12c3 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1010,6 +1010,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_ARM_NISV_TO_USER 177
 #define KVM_CAP_ARM_INJECT_EXT_DABT 178
 #define KVM_CAP_S390_VCPU_RESETS 179
+#define KVM_CAP_S390_PROTECTED 181
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -1478,6 +1479,38 @@ struct kvm_enc_region {
 #define KVM_S390_NORMAL_RESET	_IO(KVMIO,   0xc3)
 #define KVM_S390_CLEAR_RESET	_IO(KVMIO,   0xc4)
 
+struct kvm_s390_pv_sec_parm {
+	__u64	origin;
+	__u64	length;
+};
+
+struct kvm_s390_pv_unp {
+	__u64 addr;
+	__u64 size;
+	__u64 tweak;
+};
+
+enum pv_cmd_id {
+	KVM_PV_VM_CREATE,
+	KVM_PV_VM_DESTROY,
+	KVM_PV_VM_SET_SEC_PARMS,
+	KVM_PV_VM_UNPACK,
+	KVM_PV_VM_VERIFY,
+	KVM_PV_VCPU_CREATE,
+	KVM_PV_VCPU_DESTROY,
+};
+
+struct kvm_pv_cmd {
+	__u32	cmd;	/* Command to be executed */
+	__u16	rc;	/* Ultravisor return code */
+	__u16	rrc;	/* Ultravisor return reason code */
+	__u64	data;	/* Data or address */
+};
+
+/* Available with KVM_CAP_S390_PROTECTED */
+#define KVM_S390_PV_COMMAND		_IOW(KVMIO, 0xc5, struct kvm_pv_cmd)
+#define KVM_S390_PV_COMMAND_VCPU	_IOW(KVMIO, 0xc6, struct kvm_pv_cmd)
+
 /* Secure Encrypted Virtualization command */
 enum sev_cmd_id {
 	/* Guest initialization commands */
-- 
2.24.0
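The kvm_s390_pv_unpack() loop above walks the encrypted image one page at a time, keeping the caller's tweak in tw[0] and the page offset in tw[1]. A minimal userspace sketch of that walk follows; the names and the 4K page size are stand-ins for the demo, the recording array replaces the real unpack_one() UV call, and the -EAGAIN retry path is omitted:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096UL
#define TOY_PAGE_MASK (~(TOY_PAGE_SIZE - 1))

/* Record of one simulated unpack_one() call: guest address + tweak pair. */
struct unpack_call {
	uint64_t gaddr;
	uint64_t tweak[2];
};

/*
 * Mirror of the kvm_s390_pv_unpack() walk: reject unaligned or empty
 * input, then advance the guest address and the tweak offset page by
 * page.  "calls" just records what unpack_one() would have been handed.
 */
static int toy_unpack(uint64_t addr, uint64_t size, uint64_t tweak,
		      struct unpack_call *calls, size_t *ncalls)
{
	uint64_t tw[2] = { tweak, 0 };
	size_t n = 0;

	if ((addr & ~TOY_PAGE_MASK) || !size || (size & ~TOY_PAGE_MASK))
		return -1; /* -EINVAL in the kernel */

	while (tw[1] < size) {
		calls[n].gaddr = addr;
		calls[n].tweak[0] = tw[0];
		calls[n].tweak[1] = tw[1];
		n++;
		addr += TOY_PAGE_SIZE;
		tw[1] += TOY_PAGE_SIZE;
	}
	*ncalls = n;
	return 0;
}
```

Because tw[1] advances in lockstep with the guest address, every page of the image is decrypted with a unique tweak pair even though userspace passes in only a single base tweak.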

^ permalink raw reply related	[flat|nested] 147+ messages in thread

* [PATCH 09/35] KVM: s390: protvirt: Add KVM api documentation
  2020-02-07 11:39 [PATCH 00/35] KVM: s390: Add support for protected VMs Christian Borntraeger
                   ` (7 preceding siblings ...)
  2020-02-07 11:39 ` [PATCH 08/35] KVM: s390: protvirt: Add initial lifecycle handling Christian Borntraeger
@ 2020-02-07 11:39 ` Christian Borntraeger
  2020-02-08 14:57   ` Thomas Huth
  2020-02-07 11:39 ` [PATCH 10/35] KVM: s390: protvirt: Secure memory is not mergeable Christian Borntraeger
                   ` (25 subsequent siblings)
  34 siblings, 1 reply; 147+ messages in thread
From: Christian Borntraeger @ 2020-02-07 11:39 UTC (permalink / raw)
  To: Christian Borntraeger, Janosch Frank
  Cc: KVM, Cornelia Huck, David Hildenbrand, Thomas Huth,
	Ulrich Weigand, Claudio Imbrenda, Andrea Arcangeli, linux-s390,
	Michael Mueller, Vasily Gorbik, Janosch Frank

From: Janosch Frank <frankja@linux.ibm.com>

Add documentation for the KVM_CAP_S390_PROTECTED capability and the
KVM_S390_PV_COMMAND and KVM_S390_PV_COMMAND_VCPU ioctls.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
[borntraeger@de.ibm.com: patch merging, splitting, fixing]
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
 Documentation/virt/kvm/api.txt | 61 ++++++++++++++++++++++++++++++++++
 1 file changed, 61 insertions(+)

diff --git a/Documentation/virt/kvm/api.txt b/Documentation/virt/kvm/api.txt
index 73448764f544..4874d42286ca 100644
--- a/Documentation/virt/kvm/api.txt
+++ b/Documentation/virt/kvm/api.txt
@@ -4204,6 +4204,60 @@ the clear cpu reset definition in the POP. However, the cpu is not put
 into ESA mode. This reset is a superset of the initial reset.
 
 
+4.125 KVM_S390_PV_COMMAND
+
+Capability: KVM_CAP_S390_PROTECTED
+Architectures: s390
+Type: vm ioctl
+Parameters: struct kvm_pv_cmd
+Returns: 0 on success, < 0 on error
+
+struct kvm_pv_cmd {
+	__u32	cmd;	/* Command to be executed */
+	__u16	rc;	/* Ultravisor return code */
+	__u16	rrc;	/* Ultravisor return reason code */
+	__u64	data;	/* Data or address */
+};
+
+cmd values:
+KVM_PV_VM_CREATE
+Allocate memory and register the VM with the Ultravisor, thereby
+donating memory to the Ultravisor and making it inaccessible to KVM.
+
+KVM_PV_VM_DESTROY
+Deregister the VM from the Ultravisor and free the memory that was
+donated, so the kernel can use it again. All registered VCPUs have to
+be unregistered beforehand and all memory has to be exported or
+shared.
+
+KVM_PV_VM_SET_SEC_PARMS
+Pass the image header from VM memory to the Ultravisor in preparation
+for image unpacking and verification.
+
+KVM_PV_VM_UNPACK
+Unpack (protect and decrypt) a page of the encrypted boot image.
+
+KVM_PV_VM_VERIFY
+Verify the integrity of the unpacked image. Only if this succeeds is
+KVM allowed to start protected VCPUs.
+
+4.126 KVM_S390_PV_COMMAND_VCPU
+
+Capability: KVM_CAP_S390_PROTECTED
+Architectures: s390
+Type: vcpu ioctl
+Parameters: struct kvm_pv_cmd
+Returns: 0 on success, < 0 on error
+
+cmd values:
+KVM_PV_VCPU_CREATE
+Allocate memory and register a VCPU with the Ultravisor, thereby
+donating memory to the Ultravisor and making it inaccessible to KVM.
+
+KVM_PV_VCPU_DESTROY
+Unregister the VCPU from the Ultravisor and free the memory that was
+donated, so the kernel can use it again.
+
 5. The kvm_run structure
 ------------------------
 
@@ -5439,3 +5493,10 @@ Architectures: s390
 
 This capability indicates that the KVM_S390_NORMAL_RESET and
 KVM_S390_CLEAR_RESET ioctls are available.
+
+8.23 KVM_CAP_S390_PROTECTED
+
+Architecture: s390
+
+This capability indicates that KVM can start protected VMs, which in
+turn implies that the Ultravisor has been initialized.
-- 
2.24.0
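The kvm_pv_cmd layout documented above packs the command, the two Ultravisor result codes, and the data pointer into a fixed 16-byte block, so the same struct can be reused for every PV command. A quick userspace sanity check of that layout (plain C types replace the kernel's __uXX types; this is a mirror for illustration, not the real <linux/kvm.h>):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace mirror of struct kvm_pv_cmd from the uapi header. */
struct kvm_pv_cmd {
	uint32_t cmd;	/* Command to be executed */
	uint16_t rc;	/* Ultravisor return code */
	uint16_t rrc;	/* Ultravisor return reason code */
	uint64_t data;	/* Data or address */
};

/* Mirror of enum pv_cmd_id: the command values are simply 0..6. */
enum pv_cmd_id {
	KVM_PV_VM_CREATE,
	KVM_PV_VM_DESTROY,
	KVM_PV_VM_SET_SEC_PARMS,
	KVM_PV_VM_UNPACK,
	KVM_PV_VM_VERIFY,
	KVM_PV_VCPU_CREATE,
	KVM_PV_VCPU_DESTROY,
};
```

A typical userspace flow would issue KVM_PV_VM_CREATE, then KVM_PV_VM_SET_SEC_PARMS with the image header, one KVM_PV_VM_UNPACK per image chunk, and finally KVM_PV_VM_VERIFY before starting protected VCPUs.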


* [PATCH 10/35] KVM: s390: protvirt: Secure memory is not mergeable
  2020-02-07 11:39 [PATCH 00/35] KVM: s390: Add support for protected VMs Christian Borntraeger
                   ` (8 preceding siblings ...)
  2020-02-07 11:39 ` [PATCH 09/35] KVM: s390: protvirt: Add KVM api documentation Christian Borntraeger
@ 2020-02-07 11:39 ` Christian Borntraeger
  2020-02-07 11:39 ` [PATCH 11/35] KVM: s390/mm: Make pages accessible before destroying the guest Christian Borntraeger
                   ` (24 subsequent siblings)
  34 siblings, 0 replies; 147+ messages in thread
From: Christian Borntraeger @ 2020-02-07 11:39 UTC (permalink / raw)
  To: Christian Borntraeger, Janosch Frank
  Cc: KVM, Cornelia Huck, David Hildenbrand, Thomas Huth,
	Ulrich Weigand, Claudio Imbrenda, Andrea Arcangeli, linux-s390,
	Michael Mueller, Vasily Gorbik, Janosch Frank, linux-mm,
	Andrew Morton

From: Janosch Frank <frankja@linux.ibm.com>

KSM will not work on secure pages: whenever the kernel reads a secure
page, it sees only the encrypted content, so no two secure pages will
ever look the same.

Let's mark the guest pages as unmergeable when we transition to secure
mode.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
[borntraeger@de.ibm.com: patch merging, splitting, fixing]
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
 arch/s390/include/asm/gmap.h |  1 +
 arch/s390/kvm/kvm-s390.c     |  8 ++++++++
 arch/s390/mm/gmap.c          | 30 ++++++++++++++++++++----------
 3 files changed, 29 insertions(+), 10 deletions(-)

diff --git a/arch/s390/include/asm/gmap.h b/arch/s390/include/asm/gmap.h
index e2d2f48c5c7c..e1f2cc0b2b00 100644
--- a/arch/s390/include/asm/gmap.h
+++ b/arch/s390/include/asm/gmap.h
@@ -146,4 +146,5 @@ int gmap_mprotect_notify(struct gmap *, unsigned long start,
 
 void gmap_sync_dirty_log_pmd(struct gmap *gmap, unsigned long dirty_bitmap[4],
 			     unsigned long gaddr, unsigned long vmaddr);
+int gmap_mark_unmergeable(void);
 #endif /* _ASM_S390_GMAP_H */
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index e1bccbb41fdd..ad86e74e27ec 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -2182,6 +2182,14 @@ static int kvm_s390_handle_pv(struct kvm *kvm, struct kvm_pv_cmd *cmd)
 		if (r)
 			break;
 
+		down_write(&current->mm->mmap_sem);
+		r = gmap_mark_unmergeable();
+		up_write(&current->mm->mmap_sem);
+		if (r) {
+			kvm_s390_pv_dealloc_vm(kvm);
+			break;
+		}
+
 		mutex_lock(&kvm->lock);
 		kvm_s390_vcpu_block_all(kvm);
 		/* FMT 4 SIE needs esca */
diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
index edcdca97e85e..7291452fe5f0 100644
--- a/arch/s390/mm/gmap.c
+++ b/arch/s390/mm/gmap.c
@@ -2548,6 +2548,22 @@ int s390_enable_sie(void)
 }
 EXPORT_SYMBOL_GPL(s390_enable_sie);
 
+int gmap_mark_unmergeable(void)
+{
+	struct mm_struct *mm = current->mm;
+	struct vm_area_struct *vma;
+
+	for (vma = mm->mmap; vma; vma = vma->vm_next) {
+		if (ksm_madvise(vma, vma->vm_start, vma->vm_end,
+				MADV_UNMERGEABLE, &vma->vm_flags)) {
+			return -ENOMEM;
+		}
+	}
+	mm->def_flags &= ~VM_MERGEABLE;
+	return 0;
+}
+EXPORT_SYMBOL_GPL(gmap_mark_unmergeable);
+
 /*
  * Enable storage key handling from now on and initialize the storage
  * keys with the default key.
@@ -2593,7 +2609,6 @@ static const struct mm_walk_ops enable_skey_walk_ops = {
 int s390_enable_skey(void)
 {
 	struct mm_struct *mm = current->mm;
-	struct vm_area_struct *vma;
 	int rc = 0;
 
 	down_write(&mm->mmap_sem);
@@ -2601,16 +2616,11 @@ int s390_enable_skey(void)
 		goto out_up;
 
 	mm->context.uses_skeys = 1;
-	for (vma = mm->mmap; vma; vma = vma->vm_next) {
-		if (ksm_madvise(vma, vma->vm_start, vma->vm_end,
-				MADV_UNMERGEABLE, &vma->vm_flags)) {
-			mm->context.uses_skeys = 0;
-			rc = -ENOMEM;
-			goto out_up;
-		}
+	rc = gmap_mark_unmergeable();
+	if (rc) {
+		mm->context.uses_skeys = 0;
+		goto out_up;
 	}
-	mm->def_flags &= ~VM_MERGEABLE;
-
 	walk_page_range(mm, 0, TASK_SIZE, &enable_skey_walk_ops, NULL);
 
 out_up:
-- 
2.24.0
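The reasoning above can be illustrated with a toy cipher: two pages with identical plaintext encrypt to different images once a per-page tweak is mixed in, so a content-comparing scanner like KSM never finds a merge candidate. The XOR "cipher" and the shrunken page size below are purely illustrative assumptions, not what the Ultravisor actually does:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE 256  /* shrunken page for the demo */

/* Illustrative "encryption": XOR with a per-page tweak, nothing more. */
static void toy_encrypt(uint8_t *dst, const uint8_t *src, uint64_t tweak)
{
	for (int i = 0; i < TOY_PAGE_SIZE; i++)
		dst[i] = src[i] ^ (uint8_t)(tweak + i);
}

/* A KSM-style scanner merges only pages whose contents compare equal. */
static int toy_mergeable(const uint8_t *a, const uint8_t *b)
{
	return memcmp(a, b, TOY_PAGE_SIZE) == 0;
}
```

Since every secure page carries a different tweak, even fully identical guest pages produce distinct ciphertext, which is why the patch marks the whole address space MADV_UNMERGEABLE instead of letting KSM scan it uselessly.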


* [PATCH 11/35] KVM: s390/mm: Make pages accessible before destroying the guest
  2020-02-07 11:39 [PATCH 00/35] KVM: s390: Add support for protected VMs Christian Borntraeger
                   ` (9 preceding siblings ...)
  2020-02-07 11:39 ` [PATCH 10/35] KVM: s390: protvirt: Secure memory is not mergeable Christian Borntraeger
@ 2020-02-07 11:39 ` Christian Borntraeger
  2020-02-14 18:40   ` David Hildenbrand
  2020-02-07 11:39 ` [PATCH 12/35] KVM: s390: protvirt: Handle SE notification interceptions Christian Borntraeger
                   ` (23 subsequent siblings)
  34 siblings, 1 reply; 147+ messages in thread
From: Christian Borntraeger @ 2020-02-07 11:39 UTC (permalink / raw)
  To: Christian Borntraeger, Janosch Frank
  Cc: KVM, Cornelia Huck, David Hildenbrand, Thomas Huth,
	Ulrich Weigand, Claudio Imbrenda, Andrea Arcangeli, linux-s390,
	Michael Mueller, Vasily Gorbik, linux-mm, Andrew Morton

Before we destroy the secure configuration, we should make all pages
accessible again. This also matters during reboot, where we reboot into
a non-secure guest that can then transition into secure mode again. As
this "new" secure guest will have a new ID, we cannot reuse the old
page state.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
---
 arch/s390/include/asm/pgtable.h |  1 +
 arch/s390/kvm/pv.c              |  2 ++
 arch/s390/mm/gmap.c             | 35 +++++++++++++++++++++++++++++++++
 3 files changed, 38 insertions(+)

diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index dbd1453e6924..3e2ea997c334 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -1669,6 +1669,7 @@ extern int vmem_remove_mapping(unsigned long start, unsigned long size);
 extern int s390_enable_sie(void);
 extern int s390_enable_skey(void);
 extern void s390_reset_cmma(struct mm_struct *mm);
+extern void s390_reset_acc(struct mm_struct *mm);
 
 /* s390 has a private copy of get unmapped area to deal with cache synonyms */
 #define HAVE_ARCH_UNMAPPED_AREA
diff --git a/arch/s390/kvm/pv.c b/arch/s390/kvm/pv.c
index 4795e61f4e16..392795a92bd9 100644
--- a/arch/s390/kvm/pv.c
+++ b/arch/s390/kvm/pv.c
@@ -66,6 +66,8 @@ int kvm_s390_pv_destroy_vm(struct kvm *kvm)
 	int rc;
 	u32 ret;
 
+	/* make all pages accessible before destroying the guest */
+	s390_reset_acc(kvm->mm);
 	rc = uv_cmd_nodata(kvm_s390_pv_handle(kvm),
 			   UVC_CMD_DESTROY_SEC_CONF, &ret);
 	WRITE_ONCE(kvm->arch.gmap->guest_handle, 0);
diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
index 7291452fe5f0..27926a06df32 100644
--- a/arch/s390/mm/gmap.c
+++ b/arch/s390/mm/gmap.c
@@ -2650,3 +2650,38 @@ void s390_reset_cmma(struct mm_struct *mm)
 	up_write(&mm->mmap_sem);
 }
 EXPORT_SYMBOL_GPL(s390_reset_cmma);
+
+/*
+ * make inaccessible pages accessible again
+ */
+static int __s390_reset_acc(pte_t *ptep, unsigned long addr,
+			    unsigned long next, struct mm_walk *walk)
+{
+	pte_t pte = READ_ONCE(*ptep);
+
+	if (pte_present(pte))
+		WARN_ON_ONCE(uv_convert_from_secure(pte_val(pte) & PAGE_MASK));
+	return 0;
+}
+
+static const struct mm_walk_ops reset_acc_walk_ops = {
+	.pte_entry		= __s390_reset_acc,
+};
+
+#include <linux/sched/mm.h>
+void s390_reset_acc(struct mm_struct *mm)
+{
+	/*
+	 * we might be called during
+	 * reset:                             we walk the pages and clear
+	 * close of all kvm file descriptors: we walk the pages and clear
+	 * exit of process on fd closure:     vma already gone, do nothing
+	 */
+	if (!mmget_not_zero(mm))
+		return;
+	down_read(&mm->mmap_sem);
+	walk_page_range(mm, 0, TASK_SIZE, &reset_acc_walk_ops, NULL);
+	up_read(&mm->mmap_sem);
+	mmput(mm);
+}
+EXPORT_SYMBOL_GPL(s390_reset_acc);
-- 
2.24.0
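The walk_page_range() pattern used by s390_reset_acc() above — visit every present PTE and export its backing page — can be sketched with a flat toy page table. All names here are stand-ins: the "secure" flag models the page's UV state, and the conversion helper replaces the real uv_convert_from_secure() call:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NPTES 8

/* Toy PTE: just a present bit plus a "secure" state to clear. */
struct toy_pte {
	bool present;
	bool secure;
};

/* Stand-in for uv_convert_from_secure(): make one page accessible. */
static void toy_convert_from_secure(struct toy_pte *pte)
{
	pte->secure = false;
}

/*
 * Stand-in for the pte_entry walk: like __s390_reset_acc(), only
 * present entries are touched; holes in the address space are skipped.
 * Returns the number of pages converted.
 */
static size_t toy_reset_acc(struct toy_pte *ptes, size_t n)
{
	size_t converted = 0;

	for (size_t i = 0; i < n; i++) {
		if (!ptes[i].present)
			continue;
		toy_convert_from_secure(&ptes[i]);
		converted++;
	}
	return converted;
}
```

Skipping non-present entries mirrors why the real callback checks pte_present() first: swapped-out or never-faulted pages have no frame to convert.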


* [PATCH 12/35] KVM: s390: protvirt: Handle SE notification interceptions
  2020-02-07 11:39 [PATCH 00/35] KVM: s390: Add support for protected VMs Christian Borntraeger
                   ` (10 preceding siblings ...)
  2020-02-07 11:39 ` [PATCH 11/35] KVM: s390/mm: Make pages accessible before destroying the guest Christian Borntraeger
@ 2020-02-07 11:39 ` Christian Borntraeger
  2020-02-07 11:39 ` [PATCH 13/35] KVM: s390: protvirt: Instruction emulation Christian Borntraeger
                   ` (22 subsequent siblings)
  34 siblings, 0 replies; 147+ messages in thread
From: Christian Borntraeger @ 2020-02-07 11:39 UTC (permalink / raw)
  To: Christian Borntraeger, Janosch Frank
  Cc: KVM, Cornelia Huck, David Hildenbrand, Thomas Huth,
	Ulrich Weigand, Claudio Imbrenda, Andrea Arcangeli, linux-s390,
	Michael Mueller, Vasily Gorbik, Janosch Frank

From: Janosch Frank <frankja@linux.ibm.com>

Since there is no interception for the load control and load PSW
instructions in protected mode, we need a new way to get notified
whenever the guest has just enabled the reception of IRQs, so that we
can inject them right away.

The new interception codes solve that problem by providing a
notification for changes to the IRQ-enablement-relevant bits in CRs 0,
6 and 14, as well as the machine check mask bit in the PSW.

No special handling is needed for these interception codes, the KVM
pre-run code will consult all necessary CRs and PSW bits and inject
IRQs the guest is enabled for.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
[borntraeger@de.ibm.com: patch merging, splitting, fixing]
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
 arch/s390/include/asm/kvm_host.h | 2 ++
 arch/s390/kvm/intercept.c        | 9 +++++++++
 2 files changed, 11 insertions(+)

diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 3ed31c5f80e1..08acf280c4b0 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -215,6 +215,8 @@ struct kvm_s390_sie_block {
 #define ICPT_PARTEXEC	0x38
 #define ICPT_IOINST	0x40
 #define ICPT_KSS	0x5c
+#define ICPT_MCHKREQ	0x60
+#define ICPT_INT_ENABLE	0x64
 	__u8	icptcode;		/* 0x0050 */
 	__u8	icptstatus;		/* 0x0051 */
 	__u16	ihcpu;			/* 0x0052 */
diff --git a/arch/s390/kvm/intercept.c b/arch/s390/kvm/intercept.c
index a389fa85cca2..6aeb4b36042c 100644
--- a/arch/s390/kvm/intercept.c
+++ b/arch/s390/kvm/intercept.c
@@ -480,6 +480,15 @@ int kvm_handle_sie_intercept(struct kvm_vcpu *vcpu)
 	case ICPT_KSS:
 		rc = kvm_s390_skey_check_enable(vcpu);
 		break;
+	case ICPT_MCHKREQ:
+	case ICPT_INT_ENABLE:
+		/*
+		 * PSW bit 13 or a CR (0, 6, 14) changed and we might
+		 * now be able to deliver interrupts. The pre-run code
+		 * will take care of this.
+		 */
+		rc = 0;
+		break;
 	default:
 		return -EOPNOTSUPP;
 	}
-- 
2.24.0

^ permalink raw reply related	[flat|nested] 147+ messages in thread

* [PATCH 13/35] KVM: s390: protvirt: Instruction emulation
  2020-02-07 11:39 [PATCH 00/35] KVM: s390: Add support for protected VMs Christian Borntraeger
                   ` (11 preceding siblings ...)
  2020-02-07 11:39 ` [PATCH 12/35] KVM: s390: protvirt: Handle SE notification interceptions Christian Borntraeger
@ 2020-02-07 11:39 ` Christian Borntraeger
  2020-02-07 11:39 ` [PATCH 14/35] KVM: s390: protvirt: Add interruption injection controls Christian Borntraeger
                   ` (21 subsequent siblings)
  34 siblings, 0 replies; 147+ messages in thread
From: Christian Borntraeger @ 2020-02-07 11:39 UTC (permalink / raw)
  To: Christian Borntraeger, Janosch Frank
  Cc: KVM, Cornelia Huck, David Hildenbrand, Thomas Huth,
	Ulrich Weigand, Claudio Imbrenda, Andrea Arcangeli, linux-s390,
	Michael Mueller, Vasily Gorbik, Janosch Frank

From: Janosch Frank <frankja@linux.ibm.com>

We have two new SIE exit codes dealing with instructions.

104 (0x68) indicates a secure instruction interception, for which the
SIE needs hypervisor action to complete the instruction. We can
piggy-back on the existing instruction handlers.

108 (0x6c) is merely a notification and provides data for tracking and
management. For example, it is used to tell the host about a new value
for the prefix register. As there will be several special case handlers
in later patches, we handle this in a separate function.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
[borntraeger@de.ibm.com: patch merging, splitting, fixing]
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
 arch/s390/include/asm/kvm_host.h |  2 ++
 arch/s390/kvm/intercept.c        | 11 +++++++++++
 2 files changed, 13 insertions(+)

diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 08acf280c4b0..ae7c611ee9dd 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -217,6 +217,8 @@ struct kvm_s390_sie_block {
 #define ICPT_KSS	0x5c
 #define ICPT_MCHKREQ	0x60
 #define ICPT_INT_ENABLE	0x64
+#define ICPT_PV_INSTR	0x68
+#define ICPT_PV_NOTIFY	0x6c
 	__u8	icptcode;		/* 0x0050 */
 	__u8	icptstatus;		/* 0x0051 */
 	__u16	ihcpu;			/* 0x0052 */
diff --git a/arch/s390/kvm/intercept.c b/arch/s390/kvm/intercept.c
index 6aeb4b36042c..6fdbac696f65 100644
--- a/arch/s390/kvm/intercept.c
+++ b/arch/s390/kvm/intercept.c
@@ -444,6 +444,11 @@ static int handle_operexc(struct kvm_vcpu *vcpu)
 	return kvm_s390_inject_program_int(vcpu, PGM_OPERATION);
 }
 
+static int handle_pv_notification(struct kvm_vcpu *vcpu)
+{
+	return handle_instruction(vcpu);
+}
+
 int kvm_handle_sie_intercept(struct kvm_vcpu *vcpu)
 {
 	int rc, per_rc = 0;
@@ -489,6 +494,12 @@ int kvm_handle_sie_intercept(struct kvm_vcpu *vcpu)
 		 */
 		rc = 0;
 		break;
+	case ICPT_PV_INSTR:
+		rc = handle_instruction(vcpu);
+		break;
+	case ICPT_PV_NOTIFY:
+		rc = handle_pv_notification(vcpu);
+		break;
 	default:
 		return -EOPNOTSUPP;
 	}
-- 
2.24.0


* [PATCH 14/35] KVM: s390: protvirt: Add interruption injection controls
  2020-02-07 11:39 [PATCH 00/35] KVM: s390: Add support for protected VMs Christian Borntraeger
                   ` (12 preceding siblings ...)
  2020-02-07 11:39 ` [PATCH 13/35] KVM: s390: protvirt: Instruction emulation Christian Borntraeger
@ 2020-02-07 11:39 ` Christian Borntraeger
  2020-02-07 11:39 ` [PATCH 15/35] KVM: s390: protvirt: Implement interruption injection Christian Borntraeger
                   ` (20 subsequent siblings)
  34 siblings, 0 replies; 147+ messages in thread
From: Christian Borntraeger @ 2020-02-07 11:39 UTC (permalink / raw)
  To: Christian Borntraeger, Janosch Frank
  Cc: KVM, Cornelia Huck, David Hildenbrand, Thomas Huth,
	Ulrich Weigand, Claudio Imbrenda, Andrea Arcangeli, linux-s390,
	Michael Mueller, Vasily Gorbik

From: Michael Mueller <mimu@linux.ibm.com>

This defines the necessary data structures in the SIE control block to
inject machine checks, external and I/O interrupts. We first define
the interrupt injection control, which specifies the next interrupt to
inject. Then we define the fields that contain the payload for machine
checks, external and I/O interrupts.

Signed-off-by: Michael Mueller <mimu@linux.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
[borntraeger@de.ibm.com: patch merging, splitting, fixing]
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
 arch/s390/include/asm/kvm_host.h | 56 +++++++++++++++++++++++++-------
 1 file changed, 44 insertions(+), 12 deletions(-)

diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index ae7c611ee9dd..a453670d37fa 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -222,7 +222,15 @@ struct kvm_s390_sie_block {
 	__u8	icptcode;		/* 0x0050 */
 	__u8	icptstatus;		/* 0x0051 */
 	__u16	ihcpu;			/* 0x0052 */
-	__u8	reserved54[2];		/* 0x0054 */
+	__u8	reserved54;		/* 0x0054 */
+#define IICTL_CODE_NONE		 0x00
+#define IICTL_CODE_MCHK		 0x01
+#define IICTL_CODE_EXT		 0x02
+#define IICTL_CODE_IO		 0x03
+#define IICTL_CODE_RESTART	 0x04
+#define IICTL_CODE_SPECIFICATION 0x10
+#define IICTL_CODE_OPERAND	 0x11
+	__u8	iictl;			/* 0x0055 */
 	__u16	ipa;			/* 0x0056 */
 	__u32	ipb;			/* 0x0058 */
 	__u32	scaoh;			/* 0x005c */
@@ -259,24 +267,48 @@ struct kvm_s390_sie_block {
 #define HPID_KVM	0x4
 #define HPID_VSIE	0x5
 	__u8	hpid;			/* 0x00b8 */
-	__u8	reservedb9[11];		/* 0x00b9 */
-	__u16	extcpuaddr;		/* 0x00c4 */
-	__u16	eic;			/* 0x00c6 */
+	__u8	reservedb9[7];		/* 0x00b9 */
+	union {
+		struct {
+			__u32	eiparams;	/* 0x00c0 */
+			__u16	extcpuaddr;	/* 0x00c4 */
+			__u16	eic;		/* 0x00c6 */
+		};
+		__u64	mcic;			/* 0x00c0 */
+	} __packed;
 	__u32	reservedc8;		/* 0x00c8 */
-	__u16	pgmilc;			/* 0x00cc */
-	__u16	iprcc;			/* 0x00ce */
-	__u32	dxc;			/* 0x00d0 */
-	__u16	mcn;			/* 0x00d4 */
-	__u8	perc;			/* 0x00d6 */
-	__u8	peratmid;		/* 0x00d7 */
+	union {
+		struct {
+			__u16	pgmilc;		/* 0x00cc */
+			__u16	iprcc;		/* 0x00ce */
+		};
+		__u32	edc;			/* 0x00cc */
+	} __packed;
+	union {
+		struct {
+			__u32	dxc;		/* 0x00d0 */
+			__u16	mcn;		/* 0x00d4 */
+			__u8	perc;		/* 0x00d6 */
+			__u8	peratmid;	/* 0x00d7 */
+		};
+		__u64	faddr;			/* 0x00d0 */
+	} __packed;
 	__u64	peraddr;		/* 0x00d8 */
 	__u8	eai;			/* 0x00e0 */
 	__u8	peraid;			/* 0x00e1 */
 	__u8	oai;			/* 0x00e2 */
 	__u8	armid;			/* 0x00e3 */
 	__u8	reservede4[4];		/* 0x00e4 */
-	__u64	tecmc;			/* 0x00e8 */
-	__u8	reservedf0[12];		/* 0x00f0 */
+	union {
+		__u64	tecmc;		/* 0x00e8 */
+		struct {
+			__u16	subchannel_id;	/* 0x00e8 */
+			__u16	subchannel_nr;	/* 0x00ea */
+			__u32	io_int_parm;	/* 0x00ec */
+			__u32	io_int_word;	/* 0x00f0 */
+		};
+	} __packed;
+	__u8	reservedf4[8];		/* 0x00f4 */
 #define CRYCB_FORMAT_MASK 0x00000003
 #define CRYCB_FORMAT0 0x00000000
 #define CRYCB_FORMAT1 0x00000001
-- 
2.24.0


* [PATCH 15/35] KVM: s390: protvirt: Implement interruption injection
  2020-02-07 11:39 [PATCH 00/35] KVM: s390: Add support for protected VMs Christian Borntraeger
                   ` (13 preceding siblings ...)
  2020-02-07 11:39 ` [PATCH 14/35] KVM: s390: protvirt: Add interruption injection controls Christian Borntraeger
@ 2020-02-07 11:39 ` Christian Borntraeger
  2020-02-10 10:03   ` Thomas Huth
  2020-02-07 11:39 ` [PATCH 16/35] KVM: s390: protvirt: Add SCLP interrupt handling Christian Borntraeger
                   ` (19 subsequent siblings)
  34 siblings, 1 reply; 147+ messages in thread
From: Christian Borntraeger @ 2020-02-07 11:39 UTC (permalink / raw)
  To: Christian Borntraeger, Janosch Frank
  Cc: KVM, Cornelia Huck, David Hildenbrand, Thomas Huth,
	Ulrich Weigand, Claudio Imbrenda, Andrea Arcangeli, linux-s390,
	Michael Mueller, Vasily Gorbik

From: Michael Mueller <mimu@linux.ibm.com>

The patch implements interruption injection for the following
list of interruption types:

   - I/O (uses inject io interruption)
     __deliver_io

   - External (uses inject external interruption)
     __deliver_cpu_timer
     __deliver_ckc
     __deliver_emergency_signal
     __deliver_external_call

   - cpu restart (uses inject restart interruption)
     __deliver_restart

   - machine checks (uses mcic, failing address and external damage)
     __write_machine_check

Please note that posted interrupts (GISA) are not used for protected
guests as of today.

The service interrupt is handled in a followup patch.

Signed-off-by: Michael Mueller <mimu@linux.ibm.com>
[borntraeger@de.ibm.com: patch merging, splitting, fixing]
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
 arch/s390/include/asm/kvm_host.h |   6 ++
 arch/s390/kvm/interrupt.c        | 106 +++++++++++++++++++++++--------
 2 files changed, 86 insertions(+), 26 deletions(-)

diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index a453670d37fa..1319a496c8f3 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -578,6 +578,12 @@ enum irq_types {
 #define IRQ_PEND_MCHK_MASK ((1UL << IRQ_PEND_MCHK_REP) | \
 			    (1UL << IRQ_PEND_MCHK_EX))
 
+#define IRQ_PEND_EXT_II_MASK ((1UL << IRQ_PEND_EXT_CPU_TIMER)  | \
+			      (1UL << IRQ_PEND_EXT_CLOCK_COMP) | \
+			      (1UL << IRQ_PEND_EXT_EMERGENCY)  | \
+			      (1UL << IRQ_PEND_EXT_EXTERNAL)   | \
+			      (1UL << IRQ_PEND_EXT_SERVICE))
+
 struct kvm_s390_interrupt_info {
 	struct list_head list;
 	u64	type;
diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
index 4bfb2f8fe57c..e5ee52e33d96 100644
--- a/arch/s390/kvm/interrupt.c
+++ b/arch/s390/kvm/interrupt.c
@@ -388,6 +388,12 @@ static unsigned long deliverable_irqs(struct kvm_vcpu *vcpu)
 		__clear_bit(IRQ_PEND_EXT_SERVICE, &active_mask);
 	if (psw_mchk_disabled(vcpu))
 		active_mask &= ~IRQ_PEND_MCHK_MASK;
+	/* PV guest cpus can have a single interruption injected at a time. */
+	if (kvm_s390_pv_is_protected(vcpu->kvm) &&
+	    vcpu->arch.sie_block->iictl != IICTL_CODE_NONE)
+		active_mask &= ~(IRQ_PEND_EXT_II_MASK |
+				 IRQ_PEND_IO_MASK |
+				 IRQ_PEND_MCHK_MASK);
 	/*
 	 * Check both floating and local interrupt's cr14 because
 	 * bit IRQ_PEND_MCHK_REP could be set in both cases.
@@ -480,19 +486,23 @@ static void set_intercept_indicators(struct kvm_vcpu *vcpu)
 static int __must_check __deliver_cpu_timer(struct kvm_vcpu *vcpu)
 {
 	struct kvm_s390_local_interrupt *li = &vcpu->arch.local_int;
-	int rc;
+	int rc = 0;
 
 	vcpu->stat.deliver_cputm++;
 	trace_kvm_s390_deliver_interrupt(vcpu->vcpu_id, KVM_S390_INT_CPU_TIMER,
 					 0, 0);
-
-	rc  = put_guest_lc(vcpu, EXT_IRQ_CPU_TIMER,
-			   (u16 *)__LC_EXT_INT_CODE);
-	rc |= put_guest_lc(vcpu, 0, (u16 *)__LC_EXT_CPU_ADDR);
-	rc |= write_guest_lc(vcpu, __LC_EXT_OLD_PSW,
-			     &vcpu->arch.sie_block->gpsw, sizeof(psw_t));
-	rc |= read_guest_lc(vcpu, __LC_EXT_NEW_PSW,
-			    &vcpu->arch.sie_block->gpsw, sizeof(psw_t));
+	if (kvm_s390_pv_is_protected(vcpu->kvm)) {
+		vcpu->arch.sie_block->iictl = IICTL_CODE_EXT;
+		vcpu->arch.sie_block->eic = EXT_IRQ_CPU_TIMER;
+	} else {
+		rc  = put_guest_lc(vcpu, EXT_IRQ_CPU_TIMER,
+				   (u16 *)__LC_EXT_INT_CODE);
+		rc |= put_guest_lc(vcpu, 0, (u16 *)__LC_EXT_CPU_ADDR);
+		rc |= write_guest_lc(vcpu, __LC_EXT_OLD_PSW,
+				     &vcpu->arch.sie_block->gpsw, sizeof(psw_t));
+		rc |= read_guest_lc(vcpu, __LC_EXT_NEW_PSW,
+				    &vcpu->arch.sie_block->gpsw, sizeof(psw_t));
+	}
 	clear_bit(IRQ_PEND_EXT_CPU_TIMER, &li->pending_irqs);
 	return rc ? -EFAULT : 0;
 }
@@ -500,19 +510,23 @@ static int __must_check __deliver_cpu_timer(struct kvm_vcpu *vcpu)
 static int __must_check __deliver_ckc(struct kvm_vcpu *vcpu)
 {
 	struct kvm_s390_local_interrupt *li = &vcpu->arch.local_int;
-	int rc;
+	int rc = 0;
 
 	vcpu->stat.deliver_ckc++;
 	trace_kvm_s390_deliver_interrupt(vcpu->vcpu_id, KVM_S390_INT_CLOCK_COMP,
 					 0, 0);
-
-	rc  = put_guest_lc(vcpu, EXT_IRQ_CLK_COMP,
-			   (u16 __user *)__LC_EXT_INT_CODE);
-	rc |= put_guest_lc(vcpu, 0, (u16 *)__LC_EXT_CPU_ADDR);
-	rc |= write_guest_lc(vcpu, __LC_EXT_OLD_PSW,
-			     &vcpu->arch.sie_block->gpsw, sizeof(psw_t));
-	rc |= read_guest_lc(vcpu, __LC_EXT_NEW_PSW,
-			    &vcpu->arch.sie_block->gpsw, sizeof(psw_t));
+	if (kvm_s390_pv_is_protected(vcpu->kvm)) {
+		vcpu->arch.sie_block->iictl = IICTL_CODE_EXT;
+		vcpu->arch.sie_block->eic = EXT_IRQ_CLK_COMP;
+	} else {
+		rc  = put_guest_lc(vcpu, EXT_IRQ_CLK_COMP,
+				   (u16 __user *)__LC_EXT_INT_CODE);
+		rc |= put_guest_lc(vcpu, 0, (u16 *)__LC_EXT_CPU_ADDR);
+		rc |= write_guest_lc(vcpu, __LC_EXT_OLD_PSW,
+				     &vcpu->arch.sie_block->gpsw, sizeof(psw_t));
+		rc |= read_guest_lc(vcpu, __LC_EXT_NEW_PSW,
+				    &vcpu->arch.sie_block->gpsw, sizeof(psw_t));
+	}
 	clear_bit(IRQ_PEND_EXT_CLOCK_COMP, &li->pending_irqs);
 	return rc ? -EFAULT : 0;
 }
@@ -554,6 +568,20 @@ static int __write_machine_check(struct kvm_vcpu *vcpu,
 	union mci mci;
 	int rc;
 
+	/*
+	 * All other possible payload for a machine check (e.g. the register
+	 * contents in the save area) will be handled by the ultravisor, as
+	 * the hypervisor does not not have the needed information for
+	 * protected guests.
+	 */
+	if (kvm_s390_pv_is_protected(vcpu->kvm)) {
+		vcpu->arch.sie_block->iictl = IICTL_CODE_MCHK;
+		vcpu->arch.sie_block->mcic = mchk->mcic;
+		vcpu->arch.sie_block->faddr = mchk->failing_storage_address;
+		vcpu->arch.sie_block->edc = mchk->ext_damage_code;
+		return 0;
+	}
+
 	mci.val = mchk->mcic;
 	/* take care of lazy register loading */
 	save_fpu_regs();
@@ -697,17 +725,21 @@ static int __must_check __deliver_machine_check(struct kvm_vcpu *vcpu)
 static int __must_check __deliver_restart(struct kvm_vcpu *vcpu)
 {
 	struct kvm_s390_local_interrupt *li = &vcpu->arch.local_int;
-	int rc;
+	int rc = 0;
 
 	VCPU_EVENT(vcpu, 3, "%s", "deliver: cpu restart");
 	vcpu->stat.deliver_restart_signal++;
 	trace_kvm_s390_deliver_interrupt(vcpu->vcpu_id, KVM_S390_RESTART, 0, 0);
 
-	rc  = write_guest_lc(vcpu,
-			     offsetof(struct lowcore, restart_old_psw),
-			     &vcpu->arch.sie_block->gpsw, sizeof(psw_t));
-	rc |= read_guest_lc(vcpu, offsetof(struct lowcore, restart_psw),
-			    &vcpu->arch.sie_block->gpsw, sizeof(psw_t));
+	if (kvm_s390_pv_is_protected(vcpu->kvm)) {
+		vcpu->arch.sie_block->iictl = IICTL_CODE_RESTART;
+	} else {
+		rc  = write_guest_lc(vcpu,
+				     offsetof(struct lowcore, restart_old_psw),
+				     &vcpu->arch.sie_block->gpsw, sizeof(psw_t));
+		rc |= read_guest_lc(vcpu, offsetof(struct lowcore, restart_psw),
+				    &vcpu->arch.sie_block->gpsw, sizeof(psw_t));
+	}
 	clear_bit(IRQ_PEND_RESTART, &li->pending_irqs);
 	return rc ? -EFAULT : 0;
 }
@@ -749,6 +781,12 @@ static int __must_check __deliver_emergency_signal(struct kvm_vcpu *vcpu)
 	vcpu->stat.deliver_emergency_signal++;
 	trace_kvm_s390_deliver_interrupt(vcpu->vcpu_id, KVM_S390_INT_EMERGENCY,
 					 cpu_addr, 0);
+	if (kvm_s390_pv_is_protected(vcpu->kvm)) {
+		vcpu->arch.sie_block->iictl = IICTL_CODE_EXT;
+		vcpu->arch.sie_block->eic = EXT_IRQ_EMERGENCY_SIG;
+		vcpu->arch.sie_block->extcpuaddr = cpu_addr;
+		return 0;
+	}
 
 	rc  = put_guest_lc(vcpu, EXT_IRQ_EMERGENCY_SIG,
 			   (u16 *)__LC_EXT_INT_CODE);
@@ -777,6 +815,12 @@ static int __must_check __deliver_external_call(struct kvm_vcpu *vcpu)
 	trace_kvm_s390_deliver_interrupt(vcpu->vcpu_id,
 					 KVM_S390_INT_EXTERNAL_CALL,
 					 extcall.code, 0);
+	if (kvm_s390_pv_is_protected(vcpu->kvm)) {
+		vcpu->arch.sie_block->iictl = IICTL_CODE_EXT;
+		vcpu->arch.sie_block->eic = EXT_IRQ_EXTERNAL_CALL;
+		vcpu->arch.sie_block->extcpuaddr = extcall.code;
+		return 0;
+	}
 
 	rc  = put_guest_lc(vcpu, EXT_IRQ_EXTERNAL_CALL,
 			   (u16 *)__LC_EXT_INT_CODE);
@@ -1029,6 +1073,15 @@ static int __do_deliver_io(struct kvm_vcpu *vcpu, struct kvm_s390_io_info *io)
 {
 	int rc;
 
+	if (kvm_s390_pv_is_protected(vcpu->kvm)) {
+		vcpu->arch.sie_block->iictl = IICTL_CODE_IO;
+		vcpu->arch.sie_block->subchannel_id = io->subchannel_id;
+		vcpu->arch.sie_block->subchannel_nr = io->subchannel_nr;
+		vcpu->arch.sie_block->io_int_parm = io->io_int_parm;
+		vcpu->arch.sie_block->io_int_word = io->io_int_word;
+		return 0;
+	}
+
 	rc  = put_guest_lc(vcpu, io->subchannel_id, (u16 *)__LC_SUBCHANNEL_ID);
 	rc |= put_guest_lc(vcpu, io->subchannel_nr, (u16 *)__LC_SUBCHANNEL_NR);
 	rc |= put_guest_lc(vcpu, io->io_int_parm, (u32 *)__LC_IO_INT_PARM);
@@ -1422,7 +1475,7 @@ static int __inject_extcall(struct kvm_vcpu *vcpu, struct kvm_s390_irq *irq)
 	if (kvm_get_vcpu_by_id(vcpu->kvm, src_id) == NULL)
 		return -EINVAL;
 
-	if (sclp.has_sigpif)
+	if (sclp.has_sigpif && !kvm_s390_pv_handle_cpu(vcpu))
 		return sca_inject_ext_call(vcpu, src_id);
 
 	if (test_and_set_bit(IRQ_PEND_EXT_EXTERNAL, &li->pending_irqs))
@@ -1835,7 +1888,8 @@ static void __floating_irq_kick(struct kvm *kvm, u64 type)
 		break;
 	case KVM_S390_INT_IO_MIN...KVM_S390_INT_IO_MAX:
 		if (!(type & KVM_S390_INT_IO_AI_MASK &&
-		      kvm->arch.gisa_int.origin))
+		      kvm->arch.gisa_int.origin) ||
+		      kvm_s390_pv_handle_cpu(dst_vcpu))
 			kvm_s390_set_cpuflags(dst_vcpu, CPUSTAT_IO_INT);
 		break;
 	default:
-- 
2.24.0


* [PATCH 16/35] KVM: s390: protvirt: Add SCLP interrupt handling
  2020-02-07 11:39 [PATCH 00/35] KVM: s390: Add support for protected VMs Christian Borntraeger
                   ` (14 preceding siblings ...)
  2020-02-07 11:39 ` [PATCH 15/35] KVM: s390: protvirt: Implement interruption injection Christian Borntraeger
@ 2020-02-07 11:39 ` Christian Borntraeger
  2020-02-11 12:00   ` Thomas Huth
  2020-02-07 11:39 ` [PATCH 17/35] KVM: s390: protvirt: Handle spec exception loops Christian Borntraeger
                   ` (18 subsequent siblings)
  34 siblings, 1 reply; 147+ messages in thread
From: Christian Borntraeger @ 2020-02-07 11:39 UTC (permalink / raw)
  To: Christian Borntraeger, Janosch Frank
  Cc: KVM, Cornelia Huck, David Hildenbrand, Thomas Huth,
	Ulrich Weigand, Claudio Imbrenda, Andrea Arcangeli, linux-s390,
	Michael Mueller, Vasily Gorbik, Janosch Frank

The SCLP interrupt is kind of special: the ultravisor polices that we
do not inject an SCLP interrupt with payload if no SCCB is outstanding.
On the other hand, we have "asynchronous" event interrupts, e.g. for
console input.
We therefore separate the two variants into an SCLP interrupt and an
SCLP event interrupt. The SCLP interrupt is masked until a previous
SERVC instruction has finished (SIE exit 108).

[frankja@linux.ibm.com: factoring out write_sclp]
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
 arch/s390/include/asm/kvm_host.h |  6 ++-
 arch/s390/kvm/intercept.c        | 27 ++++++++++
 arch/s390/kvm/interrupt.c        | 92 ++++++++++++++++++++++++++------
 arch/s390/kvm/kvm-s390.c         |  2 +
 4 files changed, 110 insertions(+), 17 deletions(-)

diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 1319a496c8f3..bd1ddbfef436 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -518,6 +518,7 @@ enum irq_types {
 	IRQ_PEND_PFAULT_INIT,
 	IRQ_PEND_EXT_HOST,
 	IRQ_PEND_EXT_SERVICE,
+	IRQ_PEND_EXT_SERVICE_EV,
 	IRQ_PEND_EXT_TIMING,
 	IRQ_PEND_EXT_CPU_TIMER,
 	IRQ_PEND_EXT_CLOCK_COMP,
@@ -562,6 +563,7 @@ enum irq_types {
 			   (1UL << IRQ_PEND_EXT_TIMING)     | \
 			   (1UL << IRQ_PEND_EXT_HOST)       | \
 			   (1UL << IRQ_PEND_EXT_SERVICE)    | \
+			   (1UL << IRQ_PEND_EXT_SERVICE_EV) | \
 			   (1UL << IRQ_PEND_VIRTIO)         | \
 			   (1UL << IRQ_PEND_PFAULT_INIT)    | \
 			   (1UL << IRQ_PEND_PFAULT_DONE))
@@ -582,7 +584,8 @@ enum irq_types {
 			      (1UL << IRQ_PEND_EXT_CLOCK_COMP) | \
 			      (1UL << IRQ_PEND_EXT_EMERGENCY)  | \
 			      (1UL << IRQ_PEND_EXT_EXTERNAL)   | \
-			      (1UL << IRQ_PEND_EXT_SERVICE))
+			      (1UL << IRQ_PEND_EXT_SERVICE)    | \
+			      (1UL << IRQ_PEND_EXT_SERVICE_EV))
 
 struct kvm_s390_interrupt_info {
 	struct list_head list;
@@ -642,6 +645,7 @@ struct kvm_s390_local_interrupt {
 
 struct kvm_s390_float_interrupt {
 	unsigned long pending_irqs;
+	unsigned long masked_irqs;
 	spinlock_t lock;
 	struct list_head lists[FIRQ_LIST_COUNT];
 	int counters[FIRQ_MAX_COUNT];
diff --git a/arch/s390/kvm/intercept.c b/arch/s390/kvm/intercept.c
index 6fdbac696f65..d50a0214eba1 100644
--- a/arch/s390/kvm/intercept.c
+++ b/arch/s390/kvm/intercept.c
@@ -444,8 +444,35 @@ static int handle_operexc(struct kvm_vcpu *vcpu)
 	return kvm_s390_inject_program_int(vcpu, PGM_OPERATION);
 }
 
+static int handle_pv_sclp(struct kvm_vcpu *vcpu)
+{
+	struct kvm_s390_float_interrupt *fi = &vcpu->kvm->arch.float_int;
+
+	spin_lock(&fi->lock);
+	/*
+	 * 2 cases:
+	 * a: an sccb answering interrupt was already pending or in flight.
+	 *    As the sccb value is not known we can simply set some value to
+	 *    trigger delivery of a saved SCCB. UV will then use its saved
+	 *    copy of the SCCB value.
+	 * b: an error SCCB interrupt needs to be injected so we also inject
+	 *    a fake SCCB address. Firmware will use the proper one.
+	 * This makes sure that both errors and real sccb returns will only
+	 * be delivered after a notification intercept (instruction has
+	 * finished) but not after others.
+	 */
+	fi->srv_signal.ext_params |= 0x43000;
+	set_bit(IRQ_PEND_EXT_SERVICE, &fi->pending_irqs);
+	clear_bit(IRQ_PEND_EXT_SERVICE, &fi->masked_irqs);
+	spin_unlock(&fi->lock);
+	return 0;
+}
+
 static int handle_pv_notification(struct kvm_vcpu *vcpu)
 {
+	if (vcpu->arch.sie_block->ipa == 0xb220)
+		return handle_pv_sclp(vcpu);
+
 	return handle_instruction(vcpu);
 }
 
diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
index e5ee52e33d96..c28fa09cb557 100644
--- a/arch/s390/kvm/interrupt.c
+++ b/arch/s390/kvm/interrupt.c
@@ -325,8 +325,11 @@ static inline int gisa_tac_ipm_gisc(struct kvm_s390_gisa *gisa, u32 gisc)
 
 static inline unsigned long pending_irqs_no_gisa(struct kvm_vcpu *vcpu)
 {
-	return vcpu->kvm->arch.float_int.pending_irqs |
-		vcpu->arch.local_int.pending_irqs;
+	unsigned long pending = vcpu->kvm->arch.float_int.pending_irqs |
+				vcpu->arch.local_int.pending_irqs;
+
+	pending &= ~vcpu->kvm->arch.float_int.masked_irqs;
+	return pending;
 }
 
 static inline unsigned long pending_irqs(struct kvm_vcpu *vcpu)
@@ -384,8 +387,10 @@ static unsigned long deliverable_irqs(struct kvm_vcpu *vcpu)
 		__clear_bit(IRQ_PEND_EXT_CLOCK_COMP, &active_mask);
 	if (!(vcpu->arch.sie_block->gcr[0] & CR0_CPU_TIMER_SUBMASK))
 		__clear_bit(IRQ_PEND_EXT_CPU_TIMER, &active_mask);
-	if (!(vcpu->arch.sie_block->gcr[0] & CR0_SERVICE_SIGNAL_SUBMASK))
+	if (!(vcpu->arch.sie_block->gcr[0] & CR0_SERVICE_SIGNAL_SUBMASK)) {
 		__clear_bit(IRQ_PEND_EXT_SERVICE, &active_mask);
+		__clear_bit(IRQ_PEND_EXT_SERVICE_EV, &active_mask);
+	}
 	if (psw_mchk_disabled(vcpu))
 		active_mask &= ~IRQ_PEND_MCHK_MASK;
 	/* PV guest cpus can have a single interruption injected at a time. */
@@ -947,6 +952,31 @@ static int __must_check __deliver_prog(struct kvm_vcpu *vcpu)
 	return rc ? -EFAULT : 0;
 }
 
+#define SCCB_MASK 0xFFFFFFF8
+#define SCCB_EVENT_PENDING 0x3
+
+static int write_sclp(struct kvm_vcpu *vcpu, u32 parm)
+{
+	int rc;
+
+	if (kvm_s390_pv_handle_cpu(vcpu)) {
+		vcpu->arch.sie_block->iictl = IICTL_CODE_EXT;
+		vcpu->arch.sie_block->eic = EXT_IRQ_SERVICE_SIG;
+		vcpu->arch.sie_block->eiparams = parm;
+		return 0;
+	}
+
+	rc  = put_guest_lc(vcpu, EXT_IRQ_SERVICE_SIG, (u16 *)__LC_EXT_INT_CODE);
+	rc |= put_guest_lc(vcpu, 0, (u16 *)__LC_EXT_CPU_ADDR);
+	rc |= write_guest_lc(vcpu, __LC_EXT_OLD_PSW,
+			     &vcpu->arch.sie_block->gpsw, sizeof(psw_t));
+	rc |= read_guest_lc(vcpu, __LC_EXT_NEW_PSW,
+			    &vcpu->arch.sie_block->gpsw, sizeof(psw_t));
+	rc |= put_guest_lc(vcpu, parm,
+			   (u32 *)__LC_EXT_PARAMS);
+	return rc;
+}
+
 static int __must_check __deliver_service(struct kvm_vcpu *vcpu)
 {
 	struct kvm_s390_float_interrupt *fi = &vcpu->kvm->arch.float_int;
@@ -954,13 +984,17 @@ static int __must_check __deliver_service(struct kvm_vcpu *vcpu)
 	int rc = 0;
 
 	spin_lock(&fi->lock);
-	if (!(test_bit(IRQ_PEND_EXT_SERVICE, &fi->pending_irqs))) {
+	if (test_bit(IRQ_PEND_EXT_SERVICE, &fi->masked_irqs) ||
+	    !(test_bit(IRQ_PEND_EXT_SERVICE, &fi->pending_irqs))) {
 		spin_unlock(&fi->lock);
 		return 0;
 	}
 	ext = fi->srv_signal;
 	memset(&fi->srv_signal, 0, sizeof(ext));
 	clear_bit(IRQ_PEND_EXT_SERVICE, &fi->pending_irqs);
+	clear_bit(IRQ_PEND_EXT_SERVICE_EV, &fi->pending_irqs);
+	if (kvm_s390_pv_is_protected(vcpu->kvm))
+		set_bit(IRQ_PEND_EXT_SERVICE, &fi->masked_irqs);
 	spin_unlock(&fi->lock);
 
 	VCPU_EVENT(vcpu, 4, "deliver: sclp parameter 0x%x",
@@ -969,15 +1003,33 @@ static int __must_check __deliver_service(struct kvm_vcpu *vcpu)
 	trace_kvm_s390_deliver_interrupt(vcpu->vcpu_id, KVM_S390_INT_SERVICE,
 					 ext.ext_params, 0);
 
-	rc  = put_guest_lc(vcpu, EXT_IRQ_SERVICE_SIG, (u16 *)__LC_EXT_INT_CODE);
-	rc |= put_guest_lc(vcpu, 0, (u16 *)__LC_EXT_CPU_ADDR);
-	rc |= write_guest_lc(vcpu, __LC_EXT_OLD_PSW,
-			     &vcpu->arch.sie_block->gpsw, sizeof(psw_t));
-	rc |= read_guest_lc(vcpu, __LC_EXT_NEW_PSW,
-			    &vcpu->arch.sie_block->gpsw, sizeof(psw_t));
-	rc |= put_guest_lc(vcpu, ext.ext_params,
-			   (u32 *)__LC_EXT_PARAMS);
+	rc = write_sclp(vcpu, ext.ext_params);
+	return rc ? -EFAULT : 0;
+}
+
+static int __must_check __deliver_service_ev(struct kvm_vcpu *vcpu)
+{
+	struct kvm_s390_float_interrupt *fi = &vcpu->kvm->arch.float_int;
+	struct kvm_s390_ext_info ext;
+	int rc = 0;
+
+	spin_lock(&fi->lock);
+	if (!(test_bit(IRQ_PEND_EXT_SERVICE_EV, &fi->pending_irqs))) {
+		spin_unlock(&fi->lock);
+		return 0;
+	}
+	ext = fi->srv_signal;
+	/* only clear the event bit */
+	fi->srv_signal.ext_params &= ~SCCB_EVENT_PENDING;
+	clear_bit(IRQ_PEND_EXT_SERVICE_EV, &fi->pending_irqs);
+	spin_unlock(&fi->lock);
+
+	VCPU_EVENT(vcpu, 4, "%s", "deliver: sclp parameter event");
+	vcpu->stat.deliver_service_signal++;
+	trace_kvm_s390_deliver_interrupt(vcpu->vcpu_id, KVM_S390_INT_SERVICE,
+					 ext.ext_params, 0);
 
+	rc = write_sclp(vcpu, SCCB_EVENT_PENDING);
 	return rc ? -EFAULT : 0;
 }
 
@@ -1383,6 +1435,9 @@ int __must_check kvm_s390_deliver_pending_interrupts(struct kvm_vcpu *vcpu)
 		case IRQ_PEND_EXT_SERVICE:
 			rc = __deliver_service(vcpu);
 			break;
+		case IRQ_PEND_EXT_SERVICE_EV:
+			rc = __deliver_service_ev(vcpu);
+			break;
 		case IRQ_PEND_PFAULT_DONE:
 			rc = __deliver_pfault_done(vcpu);
 			break;
@@ -1735,9 +1790,6 @@ struct kvm_s390_interrupt_info *kvm_s390_get_io_int(struct kvm *kvm,
 	return inti;
 }
 
-#define SCCB_MASK 0xFFFFFFF8
-#define SCCB_EVENT_PENDING 0x3
-
 static int __inject_service(struct kvm *kvm,
 			     struct kvm_s390_interrupt_info *inti)
 {
@@ -1746,6 +1798,11 @@ static int __inject_service(struct kvm *kvm,
 	kvm->stat.inject_service_signal++;
 	spin_lock(&fi->lock);
 	fi->srv_signal.ext_params |= inti->ext.ext_params & SCCB_EVENT_PENDING;
+
+	/* We always allow events, track them separately from the sccb ints */
+	if (fi->srv_signal.ext_params & SCCB_EVENT_PENDING)
+		set_bit(IRQ_PEND_EXT_SERVICE_EV, &fi->pending_irqs);
+
 	/*
 	 * Early versions of the QEMU s390 bios will inject several
 	 * service interrupts after another without handling a
@@ -2137,6 +2194,8 @@ void kvm_s390_clear_float_irqs(struct kvm *kvm)
 
 	spin_lock(&fi->lock);
 	fi->pending_irqs = 0;
+	if (!kvm_s390_pv_is_protected(kvm))
+		fi->masked_irqs = 0;
 	memset(&fi->srv_signal, 0, sizeof(fi->srv_signal));
 	memset(&fi->mchk, 0, sizeof(fi->mchk));
 	for (i = 0; i < FIRQ_LIST_COUNT; i++)
@@ -2201,7 +2260,8 @@ static int get_all_floating_irqs(struct kvm *kvm, u8 __user *usrbuf, u64 len)
 			n++;
 		}
 	}
-	if (test_bit(IRQ_PEND_EXT_SERVICE, &fi->pending_irqs)) {
+	if (test_bit(IRQ_PEND_EXT_SERVICE, &fi->pending_irqs) ||
+	    test_bit(IRQ_PEND_EXT_SERVICE_EV, &fi->pending_irqs)) {
 		if (n == max_irqs) {
 			/* signal userspace to try again */
 			ret = -ENOMEM;
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index ad86e74e27ec..e1dd851aca65 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -2202,6 +2202,8 @@ static int kvm_s390_handle_pv(struct kvm *kvm, struct kvm_pv_cmd *cmd)
 		}
 		r = kvm_s390_pv_create_vm(kvm);
 		kvm_s390_vcpu_unblock_all(kvm);
+		/* we need to block service interrupts from now on */
+		set_bit(IRQ_PEND_EXT_SERVICE, &kvm->arch.float_int.masked_irqs);
 		mutex_unlock(&kvm->lock);
 		break;
 	}
-- 
2.24.0


* [PATCH 17/35] KVM: s390: protvirt: Handle spec exception loops
  2020-02-07 11:39 [PATCH 00/35] KVM: s390: Add support for protected VMs Christian Borntraeger
                   ` (15 preceding siblings ...)
  2020-02-07 11:39 ` [PATCH 16/35] KVM: s390: protvirt: Add SCLP interrupt handling Christian Borntraeger
@ 2020-02-07 11:39 ` Christian Borntraeger
  2020-02-07 11:39 ` [PATCH 18/35] KVM: s390: protvirt: Add new gprs location handling Christian Borntraeger
                   ` (17 subsequent siblings)
  34 siblings, 0 replies; 147+ messages in thread
From: Christian Borntraeger @ 2020-02-07 11:39 UTC (permalink / raw)
  To: Christian Borntraeger, Janosch Frank
  Cc: KVM, Cornelia Huck, David Hildenbrand, Thomas Huth,
	Ulrich Weigand, Claudio Imbrenda, Andrea Arcangeli, linux-s390,
	Michael Mueller, Vasily Gorbik, Janosch Frank

From: Janosch Frank <frankja@linux.ibm.com>

SIE intercept code 8 is used only for specification exception loops of
protected guests. That means we need to stop the guest when we see it;
this is done by userspace.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
[borntraeger@de.ibm.com: patch merging, splitting, fixing]
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
 arch/s390/kvm/intercept.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/arch/s390/kvm/intercept.c b/arch/s390/kvm/intercept.c
index d50a0214eba1..db3dd5ee0b7a 100644
--- a/arch/s390/kvm/intercept.c
+++ b/arch/s390/kvm/intercept.c
@@ -231,6 +231,13 @@ static int handle_prog(struct kvm_vcpu *vcpu)
 
 	vcpu->stat.exit_program_interruption++;
 
+	/*
+	 * Intercept 8 indicates a loop of specification exceptions
+	 * for protected guests.
+	 */
+	if (kvm_s390_pv_is_protected(vcpu->kvm))
+		return -EOPNOTSUPP;
+
 	if (guestdbg_enabled(vcpu) && per_event(vcpu)) {
 		rc = kvm_s390_handle_per_event(vcpu);
 		if (rc)
-- 
2.24.0


* [PATCH 18/35] KVM: s390: protvirt: Add new gprs location handling
  2020-02-07 11:39 [PATCH 00/35] KVM: s390: Add support for protected VMs Christian Borntraeger
                   ` (16 preceding siblings ...)
  2020-02-07 11:39 ` [PATCH 17/35] KVM: s390: protvirt: Handle spec exception loops Christian Borntraeger
@ 2020-02-07 11:39 ` Christian Borntraeger
  2020-02-07 11:39 ` [PATCH 19/35] KVM: S390: protvirt: Introduce instruction data area bounce buffer Christian Borntraeger
                   ` (16 subsequent siblings)
  34 siblings, 0 replies; 147+ messages in thread
From: Christian Borntraeger @ 2020-02-07 11:39 UTC (permalink / raw)
  To: Christian Borntraeger, Janosch Frank
  Cc: KVM, Cornelia Huck, David Hildenbrand, Thomas Huth,
	Ulrich Weigand, Claudio Imbrenda, Andrea Arcangeli, linux-s390,
	Michael Mueller, Vasily Gorbik, Janosch Frank

From: Janosch Frank <frankja@linux.ibm.com>

Guest general purpose registers for protected guests are stored at
offset 0x380 of the SIE page and have to be copied to / from the
kvm_run structure around each SIE entry and exit.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
[borntraeger@de.ibm.com: patch merging, splitting, fixing]
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
 arch/s390/include/asm/kvm_host.h |  4 +++-
 arch/s390/kvm/kvm-s390.c         | 11 +++++++++++
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index bd1ddbfef436..9d7b248dcadc 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -343,7 +343,9 @@ struct kvm_s390_itdb {
 struct sie_page {
 	struct kvm_s390_sie_block sie_block;
 	struct mcck_volatile_info mcck_info;	/* 0x0200 */
-	__u8 reserved218[1000];		/* 0x0218 */
+	__u8 reserved218[360];		/* 0x0218 */
+	__u64 pv_grregs[16];		/* 0x0380 */
+	__u8 reserved400[512];		/* 0x0400 */
 	struct kvm_s390_itdb itdb;	/* 0x0600 */
 	__u8 reserved700[2304];		/* 0x0700 */
 };
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index e1dd851aca65..6f90d16cad92 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -3999,6 +3999,7 @@ static int vcpu_post_run(struct kvm_vcpu *vcpu, int exit_reason)
 static int __vcpu_run(struct kvm_vcpu *vcpu)
 {
 	int rc, exit_reason;
+	struct sie_page *sie_page = (struct sie_page *)vcpu->arch.sie_block;
 
 	/*
 	 * We try to hold kvm->srcu during most of vcpu_run (except when run-
@@ -4020,8 +4021,18 @@ static int __vcpu_run(struct kvm_vcpu *vcpu)
 		guest_enter_irqoff();
 		__disable_cpu_timer_accounting(vcpu);
 		local_irq_enable();
+		if (kvm_s390_pv_is_protected(vcpu->kvm)) {
+			memcpy(sie_page->pv_grregs,
+			       vcpu->run->s.regs.gprs,
+			       sizeof(sie_page->pv_grregs));
+		}
 		exit_reason = sie64a(vcpu->arch.sie_block,
 				     vcpu->run->s.regs.gprs);
+		if (kvm_s390_pv_is_protected(vcpu->kvm)) {
+			memcpy(vcpu->run->s.regs.gprs,
+			       sie_page->pv_grregs,
+			       sizeof(sie_page->pv_grregs));
+		}
 		local_irq_disable();
 		__enable_cpu_timer_accounting(vcpu);
 		guest_exit_irqoff();
-- 
2.24.0

* [PATCH 19/35] KVM: S390: protvirt: Introduce instruction data area bounce buffer
  2020-02-07 11:39 [PATCH 00/35] KVM: s390: Add support for protected VMs Christian Borntraeger
                   ` (17 preceding siblings ...)
  2020-02-07 11:39 ` [PATCH 18/35] KVM: s390: protvirt: Add new gprs location handling Christian Borntraeger
@ 2020-02-07 11:39 ` Christian Borntraeger
  2020-02-07 11:39 ` [PATCH 20/35] KVM: s390: protvirt: handle secure guest prefix pages Christian Borntraeger
                   ` (15 subsequent siblings)
  34 siblings, 0 replies; 147+ messages in thread
From: Christian Borntraeger @ 2020-02-07 11:39 UTC (permalink / raw)
  To: Christian Borntraeger, Janosch Frank
  Cc: KVM, Cornelia Huck, David Hildenbrand, Thomas Huth,
	Ulrich Weigand, Claudio Imbrenda, Andrea Arcangeli, linux-s390,
	Michael Mueller, Vasily Gorbik, Janosch Frank

From: Janosch Frank <frankja@linux.ibm.com>

Now that we can't access guest memory anymore, we have a dedicated
satellite block that acts as a bounce buffer for instruction data.

We reuse the memop interface to copy the instruction data to / from
userspace. This lets us reuse a lot of QEMU code that already used this
interface for logical guest memory accesses, which are no longer
possible in protected mode anyway.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
[borntraeger@de.ibm.com: patch merging, splitting, fixing]
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
 arch/s390/include/asm/kvm_host.h | 11 ++++++++-
 arch/s390/kvm/kvm-s390.c         | 42 ++++++++++++++++++++++++++++++++
 arch/s390/kvm/pv.c               |  9 +++++++
 include/uapi/linux/kvm.h         |  6 ++++-
 4 files changed, 66 insertions(+), 2 deletions(-)

diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 9d7b248dcadc..05949ff75a1e 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -127,6 +127,12 @@ struct mcck_volatile_info {
 #define CR14_INITIAL_MASK (CR14_UNUSED_32 | CR14_UNUSED_33 | \
 			   CR14_EXTERNAL_DAMAGE_SUBMASK)
 
+#define SIDAD_SIZE_MASK		0xff
+#define sida_origin(sie_block) \
+	(sie_block->sidad & PAGE_MASK)
+#define sida_size(sie_block) \
+	(((sie_block->sidad & SIDAD_SIZE_MASK) + 1) * PAGE_SIZE)
+
 #define CPUSTAT_STOPPED    0x80000000
 #define CPUSTAT_WAIT       0x10000000
 #define CPUSTAT_ECALL_PEND 0x08000000
@@ -315,7 +321,10 @@ struct kvm_s390_sie_block {
 #define CRYCB_FORMAT2 0x00000003
 	__u32	crycbd;			/* 0x00fc */
 	__u64	gcr[16];		/* 0x0100 */
-	__u64	gbea;			/* 0x0180 */
+	union {
+		__u64	gbea;		/* 0x0180 */
+		__u64	sidad;
+	};
 	__u8    reserved188[8];		/* 0x0188 */
 	__u64   sdnxo;			/* 0x0190 */
 	__u8    reserved198[8];		/* 0x0198 */
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 6f90d16cad92..1797490e3e77 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -4435,6 +4435,34 @@ static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
 	return r;
 }
 
+static long kvm_s390_guest_sida_op(struct kvm_vcpu *vcpu,
+				   struct kvm_s390_mem_op *mop)
+{
+	void __user *uaddr = (void __user *)mop->buf;
+	int r = 0;
+
+	if (mop->flags || !mop->size)
+		return -EINVAL;
+	if (mop->size + mop->sida_offset < mop->size)
+		return -EINVAL;
+	if (mop->size + mop->sida_offset > sida_size(vcpu->arch.sie_block))
+		return -E2BIG;
+
+	switch (mop->op) {
+	case KVM_S390_MEMOP_SIDA_READ:
+		if (copy_to_user(uaddr, (void *)(sida_origin(vcpu->arch.sie_block) +
+				 mop->sida_offset), mop->size))
+			r = -EFAULT;
+
+		break;
+	case KVM_S390_MEMOP_SIDA_WRITE:
+		if (copy_from_user((void *)(sida_origin(vcpu->arch.sie_block) +
+				   mop->sida_offset), uaddr, mop->size))
+			r = -EFAULT;
+		break;
+	}
+	return r;
+}
 static long kvm_s390_guest_mem_op(struct kvm_vcpu *vcpu,
 				  struct kvm_s390_mem_op *mop)
 {
@@ -4444,6 +4472,7 @@ static long kvm_s390_guest_mem_op(struct kvm_vcpu *vcpu,
 	const u64 supported_flags = KVM_S390_MEMOP_F_INJECT_EXCEPTION
 				    | KVM_S390_MEMOP_F_CHECK_ONLY;
 
+	BUILD_BUG_ON(sizeof(*mop) != 64);
 	if (mop->flags & ~supported_flags || mop->ar >= NUM_ACRS || !mop->size)
 		return -EINVAL;
 
@@ -4460,6 +4489,10 @@ static long kvm_s390_guest_mem_op(struct kvm_vcpu *vcpu,
 
 	switch (mop->op) {
 	case KVM_S390_MEMOP_LOGICAL_READ:
+		if (kvm_s390_pv_is_protected(vcpu->kvm)) {
+			r = -EINVAL;
+			break;
+		}
 		if (mop->flags & KVM_S390_MEMOP_F_CHECK_ONLY) {
 			r = check_gva_range(vcpu, mop->gaddr, mop->ar,
 					    mop->size, GACC_FETCH);
@@ -4472,6 +4505,10 @@ static long kvm_s390_guest_mem_op(struct kvm_vcpu *vcpu,
 		}
 		break;
 	case KVM_S390_MEMOP_LOGICAL_WRITE:
+		if (kvm_s390_pv_is_protected(vcpu->kvm)) {
+			r = -EINVAL;
+			break;
+		}
 		if (mop->flags & KVM_S390_MEMOP_F_CHECK_ONLY) {
 			r = check_gva_range(vcpu, mop->gaddr, mop->ar,
 					    mop->size, GACC_STORE);
@@ -4483,6 +4520,11 @@ static long kvm_s390_guest_mem_op(struct kvm_vcpu *vcpu,
 		}
 		r = write_guest(vcpu, mop->gaddr, mop->ar, tmpbuf, mop->size);
 		break;
+	case KVM_S390_MEMOP_SIDA_READ:
+	case KVM_S390_MEMOP_SIDA_WRITE:
+		/* we are locked against sida going away by the vcpu->mutex */
+		r = kvm_s390_guest_sida_op(vcpu, mop);
+		break;
 	default:
 		r = -EINVAL;
 	}
diff --git a/arch/s390/kvm/pv.c b/arch/s390/kvm/pv.c
index 392795a92bd9..70e452192468 100644
--- a/arch/s390/kvm/pv.c
+++ b/arch/s390/kvm/pv.c
@@ -93,6 +93,7 @@ int kvm_s390_pv_destroy_cpu(struct kvm_vcpu *vcpu)
 
 	free_pages(vcpu->arch.pv.stor_base,
 		   get_order(uv_info.guest_cpu_stor_len));
+	free_page(sida_origin(vcpu->arch.sie_block));
 	vcpu->arch.sie_block->pv_handle_cpu = 0;
 	vcpu->arch.sie_block->pv_handle_config = 0;
 	memset(&vcpu->arch.pv, 0, sizeof(vcpu->arch.pv));
@@ -122,6 +123,14 @@ int kvm_s390_pv_create_cpu(struct kvm_vcpu *vcpu)
 	uvcb.state_origin = (u64)vcpu->arch.sie_block;
 	uvcb.stor_origin = (u64)vcpu->arch.pv.stor_base;
 
+	/* Alloc Secure Instruction Data Area Designation */
+	vcpu->arch.sie_block->sidad = __get_free_page(GFP_KERNEL | __GFP_ZERO);
+	if (!vcpu->arch.sie_block->sidad) {
+		free_pages(vcpu->arch.pv.stor_base,
+			   get_order(uv_info.guest_cpu_stor_len));
+		return -ENOMEM;
+	}
+
 	rc = uv_call(0, (u64)&uvcb);
 	VCPU_EVENT(vcpu, 3, "PROTVIRT CREATE VCPU: cpu %d handle %llx rc %x rrc %x",
 		   vcpu->vcpu_id, uvcb.cpu_handle, uvcb.header.rc,
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index eab741bc12c3..51d4018c3be3 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -475,11 +475,15 @@ struct kvm_s390_mem_op {
 	__u32 op;		/* type of operation */
 	__u64 buf;		/* buffer in userspace */
 	__u8 ar;		/* the access register number */
-	__u8 reserved[31];	/* should be set to 0 */
+	__u8 reserved21[3];	/* should be set to 0 */
+	__u32 sida_offset;	/* offset into the sida */
+	__u8 reserved28[24];	/* should be set to 0 */
 };
 /* types for kvm_s390_mem_op->op */
 #define KVM_S390_MEMOP_LOGICAL_READ	0
 #define KVM_S390_MEMOP_LOGICAL_WRITE	1
+#define KVM_S390_MEMOP_SIDA_READ	2
+#define KVM_S390_MEMOP_SIDA_WRITE	3
 /* flags for kvm_s390_mem_op->flags */
 #define KVM_S390_MEMOP_F_CHECK_ONLY		(1ULL << 0)
 #define KVM_S390_MEMOP_F_INJECT_EXCEPTION	(1ULL << 1)
-- 
2.24.0

* [PATCH 20/35] KVM: s390: protvirt: handle secure guest prefix pages
  2020-02-07 11:39 [PATCH 00/35] KVM: s390: Add support for protected VMs Christian Borntraeger
                   ` (18 preceding siblings ...)
  2020-02-07 11:39 ` [PATCH 19/35] KVM: S390: protvirt: Introduce instruction data area bounce buffer Christian Borntraeger
@ 2020-02-07 11:39 ` Christian Borntraeger
  2020-02-13  8:37   ` Christian Borntraeger
  2020-02-07 11:39 ` [PATCH 21/35] KVM: s390/mm: handle guest unpin events Christian Borntraeger
                   ` (14 subsequent siblings)
  34 siblings, 1 reply; 147+ messages in thread
From: Christian Borntraeger @ 2020-02-07 11:39 UTC (permalink / raw)
  To: Christian Borntraeger, Janosch Frank
  Cc: KVM, Cornelia Huck, David Hildenbrand, Thomas Huth,
	Ulrich Weigand, Claudio Imbrenda, Andrea Arcangeli, linux-s390,
	Michael Mueller, Vasily Gorbik, Janosch Frank

From: Janosch Frank <frankja@linux.ibm.com>

The SPX instruction is handled by the ultravisor. We do get a
notification intercept, though. Let us update our internal view.

In addition to that, when the guest prefix page is not secure, an
intercept 112 (0x70) is indicated.  To avoid this for the most common
cases, we can make the guest prefix page protected whenever we pin it.
We have to deal with 112 nevertheless, e.g. when some host code triggers
an export (e.g. qemu dump guest memory). We can simply re-run the
pinning logic by doing a no-op prefix change.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
[borntraeger@de.ibm.com: patch merging, splitting, fixing]
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
 arch/s390/include/asm/kvm_host.h |  1 +
 arch/s390/kvm/intercept.c        | 16 ++++++++++++++++
 arch/s390/kvm/kvm-s390.c         | 14 ++++++++++++++
 3 files changed, 31 insertions(+)

diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 05949ff75a1e..0e3ffad4137f 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -225,6 +225,7 @@ struct kvm_s390_sie_block {
 #define ICPT_INT_ENABLE	0x64
 #define ICPT_PV_INSTR	0x68
 #define ICPT_PV_NOTIFY	0x6c
+#define ICPT_PV_PREF	0x70
 	__u8	icptcode;		/* 0x0050 */
 	__u8	icptstatus;		/* 0x0051 */
 	__u16	ihcpu;			/* 0x0052 */
diff --git a/arch/s390/kvm/intercept.c b/arch/s390/kvm/intercept.c
index db3dd5ee0b7a..2a966dc52611 100644
--- a/arch/s390/kvm/intercept.c
+++ b/arch/s390/kvm/intercept.c
@@ -451,6 +451,15 @@ static int handle_operexc(struct kvm_vcpu *vcpu)
 	return kvm_s390_inject_program_int(vcpu, PGM_OPERATION);
 }
 
+static int handle_pv_spx(struct kvm_vcpu *vcpu)
+{
+	u32 pref = *(u32 *)vcpu->arch.sie_block->sidad;
+
+	kvm_s390_set_prefix(vcpu, pref);
+	trace_kvm_s390_handle_prefix(vcpu, 1, pref);
+	return 0;
+}
+
 static int handle_pv_sclp(struct kvm_vcpu *vcpu)
 {
 	struct kvm_s390_float_interrupt *fi = &vcpu->kvm->arch.float_int;
@@ -477,6 +486,8 @@ static int handle_pv_sclp(struct kvm_vcpu *vcpu)
 
 static int handle_pv_notification(struct kvm_vcpu *vcpu)
 {
+	if (vcpu->arch.sie_block->ipa == 0xb210)
+		return handle_pv_spx(vcpu);
 	if (vcpu->arch.sie_block->ipa == 0xb220)
 		return handle_pv_sclp(vcpu);
 
@@ -534,6 +545,11 @@ int kvm_handle_sie_intercept(struct kvm_vcpu *vcpu)
 	case ICPT_PV_NOTIFY:
 		rc = handle_pv_notification(vcpu);
 		break;
+	case ICPT_PV_PREF:
+		rc = 0;
+		/* request to convert and pin the prefix pages again */
+		kvm_make_request(KVM_REQ_MMU_RELOAD, vcpu);
+		break;
 	default:
 		return -EOPNOTSUPP;
 	}
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 1797490e3e77..63d158149936 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -3678,6 +3678,20 @@ static int kvm_s390_handle_requests(struct kvm_vcpu *vcpu)
 		rc = gmap_mprotect_notify(vcpu->arch.gmap,
 					  kvm_s390_get_prefix(vcpu),
 					  PAGE_SIZE * 2, PROT_WRITE);
+		if (!rc && kvm_s390_pv_is_protected(vcpu->kvm)) {
+			do {
+				rc = uv_convert_to_secure(
+						vcpu->arch.gmap,
+						kvm_s390_get_prefix(vcpu));
+			} while (rc == -EAGAIN);
+			WARN_ONCE(rc, "Error while importing first prefix page. rc %d", rc);
+			do {
+				rc = uv_convert_to_secure(
+						vcpu->arch.gmap,
+						kvm_s390_get_prefix(vcpu) + PAGE_SIZE);
+			} while (rc == -EAGAIN);
+			WARN_ONCE(rc, "Error while importing second prefix page. rc %d", rc);
+		}
 		if (rc) {
 			kvm_make_request(KVM_REQ_MMU_RELOAD, vcpu);
 			return rc;
-- 
2.24.0

* [PATCH 21/35] KVM: s390/mm: handle guest unpin events
  2020-02-07 11:39 [PATCH 00/35] KVM: s390: Add support for protected VMs Christian Borntraeger
                   ` (19 preceding siblings ...)
  2020-02-07 11:39 ` [PATCH 20/35] KVM: s390: protvirt: handle secure guest prefix pages Christian Borntraeger
@ 2020-02-07 11:39 ` Christian Borntraeger
  2020-02-10 14:58   ` Thomas Huth
  2020-02-07 11:39 ` [PATCH 22/35] KVM: s390: protvirt: Write sthyi data to instruction data area Christian Borntraeger
                   ` (13 subsequent siblings)
  34 siblings, 1 reply; 147+ messages in thread
From: Christian Borntraeger @ 2020-02-07 11:39 UTC (permalink / raw)
  To: Christian Borntraeger, Janosch Frank
  Cc: KVM, Cornelia Huck, David Hildenbrand, Thomas Huth,
	Ulrich Weigand, Claudio Imbrenda, Andrea Arcangeli, linux-s390,
	Michael Mueller, Vasily Gorbik, linux-mm, Andrew Morton

From: Claudio Imbrenda <imbrenda@linux.ibm.com>

The current code first tries to pin shared pages; if that fails (e.g.
because the page is not shared), it will export them. For shared pages
this means that we get a new intercept telling us that the guest is
unsharing that page. At that point we make the page secure and revoke
the host access. This is synchronized with other host events, e.g. the
code will wait until host I/O has finished.

Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
[borntraeger@de.ibm.com: patch merging, splitting, fixing]
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
 arch/s390/kvm/intercept.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/arch/s390/kvm/intercept.c b/arch/s390/kvm/intercept.c
index 2a966dc52611..e155389a4a66 100644
--- a/arch/s390/kvm/intercept.c
+++ b/arch/s390/kvm/intercept.c
@@ -16,6 +16,7 @@
 #include <asm/asm-offsets.h>
 #include <asm/irq.h>
 #include <asm/sysinfo.h>
+#include <asm/uv.h>
 
 #include "kvm-s390.h"
 #include "gaccess.h"
@@ -484,12 +485,35 @@ static int handle_pv_sclp(struct kvm_vcpu *vcpu)
 	return 0;
 }
 
+static int handle_pv_uvc(struct kvm_vcpu *vcpu)
+{
+	struct uv_cb_share *guest_uvcb = (void *)vcpu->arch.sie_block->sidad;
+	struct uv_cb_cts uvcb = {
+		.header.cmd	= UVC_CMD_UNPIN_PAGE_SHARED,
+		.header.len	= sizeof(uvcb),
+		.guest_handle	= kvm_s390_pv_handle(vcpu->kvm),
+		.gaddr		= guest_uvcb->paddr,
+	};
+	int rc;
+
+	if (guest_uvcb->header.cmd != UVC_CMD_REMOVE_SHARED_ACCESS) {
+		WARN_ONCE(1, "Unexpected UVC 0x%x!\n", guest_uvcb->header.cmd);
+		return 0;
+	}
+	rc = uv_make_secure(vcpu->arch.gmap, uvcb.gaddr, &uvcb);
+	if (rc == -EINVAL && uvcb.header.rc == 0x104)
+		return 0;
+	return rc;
+}
+
 static int handle_pv_notification(struct kvm_vcpu *vcpu)
 {
 	if (vcpu->arch.sie_block->ipa == 0xb210)
 		return handle_pv_spx(vcpu);
 	if (vcpu->arch.sie_block->ipa == 0xb220)
 		return handle_pv_sclp(vcpu);
+	if (vcpu->arch.sie_block->ipa == 0xb9a4)
+		return handle_pv_uvc(vcpu);
 
 	return handle_instruction(vcpu);
 }
-- 
2.24.0

* [PATCH 22/35] KVM: s390: protvirt: Write sthyi data to instruction data area
  2020-02-07 11:39 [PATCH 00/35] KVM: s390: Add support for protected VMs Christian Borntraeger
                   ` (20 preceding siblings ...)
  2020-02-07 11:39 ` [PATCH 21/35] KVM: s390/mm: handle guest unpin events Christian Borntraeger
@ 2020-02-07 11:39 ` Christian Borntraeger
  2020-02-07 11:39 ` [PATCH 23/35] KVM: s390: protvirt: STSI handling Christian Borntraeger
                   ` (12 subsequent siblings)
  34 siblings, 0 replies; 147+ messages in thread
From: Christian Borntraeger @ 2020-02-07 11:39 UTC (permalink / raw)
  To: Christian Borntraeger, Janosch Frank
  Cc: KVM, Cornelia Huck, David Hildenbrand, Thomas Huth,
	Ulrich Weigand, Claudio Imbrenda, Andrea Arcangeli, linux-s390,
	Michael Mueller, Vasily Gorbik, Janosch Frank

From: Janosch Frank <frankja@linux.ibm.com>

STHYI data has to go through the bounce buffer.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
[borntraeger@de.ibm.com: patch merging, splitting, fixing]
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
 arch/s390/kvm/intercept.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/arch/s390/kvm/intercept.c b/arch/s390/kvm/intercept.c
index e155389a4a66..51a8bb7e956a 100644
--- a/arch/s390/kvm/intercept.c
+++ b/arch/s390/kvm/intercept.c
@@ -392,7 +392,7 @@ int handle_sthyi(struct kvm_vcpu *vcpu)
 		goto out;
 	}
 
-	if (addr & ~PAGE_MASK)
+	if (!kvm_s390_pv_is_protected(vcpu->kvm) && (addr & ~PAGE_MASK))
 		return kvm_s390_inject_program_int(vcpu, PGM_SPECIFICATION);
 
 	sctns = (void *)get_zeroed_page(GFP_KERNEL);
@@ -403,10 +403,15 @@ int handle_sthyi(struct kvm_vcpu *vcpu)
 
 out:
 	if (!cc) {
-		r = write_guest(vcpu, addr, reg2, sctns, PAGE_SIZE);
-		if (r) {
-			free_page((unsigned long)sctns);
-			return kvm_s390_inject_prog_cond(vcpu, r);
+		if (kvm_s390_pv_is_protected(vcpu->kvm)) {
+			memcpy((void *)(sida_origin(vcpu->arch.sie_block)),
+			       sctns, PAGE_SIZE);
+		} else {
+			r = write_guest(vcpu, addr, reg2, sctns, PAGE_SIZE);
+			if (r) {
+				free_page((unsigned long)sctns);
+				return kvm_s390_inject_prog_cond(vcpu, r);
+			}
 		}
 	}
 
-- 
2.24.0

* [PATCH 23/35] KVM: s390: protvirt: STSI handling
  2020-02-07 11:39 [PATCH 00/35] KVM: s390: Add support for protected VMs Christian Borntraeger
                   ` (21 preceding siblings ...)
  2020-02-07 11:39 ` [PATCH 22/35] KVM: s390: protvirt: Write sthyi data to instruction data area Christian Borntraeger
@ 2020-02-07 11:39 ` Christian Borntraeger
  2020-02-08 15:01   ` Thomas Huth
  2020-02-11 10:55   ` Cornelia Huck
  2020-02-07 11:39 ` [PATCH 24/35] KVM: s390: protvirt: disallow one_reg Christian Borntraeger
                   ` (11 subsequent siblings)
  34 siblings, 2 replies; 147+ messages in thread
From: Christian Borntraeger @ 2020-02-07 11:39 UTC (permalink / raw)
  To: Christian Borntraeger, Janosch Frank
  Cc: KVM, Cornelia Huck, David Hildenbrand, Thomas Huth,
	Ulrich Weigand, Claudio Imbrenda, Andrea Arcangeli, linux-s390,
	Michael Mueller, Vasily Gorbik, Janosch Frank

From: Janosch Frank <frankja@linux.ibm.com>

Save the response in the SIDA and disable address checking for
protected guests.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
[borntraeger@de.ibm.com: patch merging, splitting, fixing]
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
 arch/s390/kvm/priv.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/arch/s390/kvm/priv.c b/arch/s390/kvm/priv.c
index ed52ffa8d5d4..b2de7dc5f58d 100644
--- a/arch/s390/kvm/priv.c
+++ b/arch/s390/kvm/priv.c
@@ -872,7 +872,7 @@ static int handle_stsi(struct kvm_vcpu *vcpu)
 
 	operand2 = kvm_s390_get_base_disp_s(vcpu, &ar);
 
-	if (operand2 & 0xfff)
+	if (!kvm_s390_pv_is_protected(vcpu->kvm) && (operand2 & 0xfff))
 		return kvm_s390_inject_program_int(vcpu, PGM_SPECIFICATION);
 
 	switch (fc) {
@@ -893,8 +893,13 @@ static int handle_stsi(struct kvm_vcpu *vcpu)
 		handle_stsi_3_2_2(vcpu, (void *) mem);
 		break;
 	}
-
-	rc = write_guest(vcpu, operand2, ar, (void *)mem, PAGE_SIZE);
+	if (kvm_s390_pv_is_protected(vcpu->kvm)) {
+		memcpy((void *)sida_origin(vcpu->arch.sie_block), (void *)mem,
+		       PAGE_SIZE);
+		rc = 0;
+	} else {
+		rc = write_guest(vcpu, operand2, ar, (void *)mem, PAGE_SIZE);
+	}
 	if (rc) {
 		rc = kvm_s390_inject_prog_cond(vcpu, rc);
 		goto out;
-- 
2.24.0

* [PATCH 24/35] KVM: s390: protvirt: disallow one_reg
  2020-02-07 11:39 [PATCH 00/35] KVM: s390: Add support for protected VMs Christian Borntraeger
                   ` (22 preceding siblings ...)
  2020-02-07 11:39 ` [PATCH 23/35] KVM: s390: protvirt: STSI handling Christian Borntraeger
@ 2020-02-07 11:39 ` Christian Borntraeger
  2020-02-10 17:53   ` Cornelia Huck
  2020-02-07 11:39 ` [PATCH 25/35] KVM: s390: protvirt: Only sync fmt4 registers Christian Borntraeger
                   ` (10 subsequent siblings)
  34 siblings, 1 reply; 147+ messages in thread
From: Christian Borntraeger @ 2020-02-07 11:39 UTC (permalink / raw)
  To: Christian Borntraeger, Janosch Frank
  Cc: KVM, Cornelia Huck, David Hildenbrand, Thomas Huth,
	Ulrich Weigand, Claudio Imbrenda, Andrea Arcangeli, linux-s390,
	Michael Mueller, Vasily Gorbik, Janosch Frank

From: Janosch Frank <frankja@linux.ibm.com>

A lot of the registers are controlled by the Ultravisor and never
visible to KVM. Some fields in the SIE control block are overlaid,
like gbea. As no userspace uses the ONE_REG interface on s390, it is
safe to disable it for protected guests.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
[borntraeger@de.ibm.com: patch merging, splitting, fixing]
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
 Documentation/virt/kvm/api.txt | 6 ++++--
 arch/s390/kvm/kvm-s390.c       | 3 +++
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/Documentation/virt/kvm/api.txt b/Documentation/virt/kvm/api.txt
index 4874d42286ca..4bee7c023426 100644
--- a/Documentation/virt/kvm/api.txt
+++ b/Documentation/virt/kvm/api.txt
@@ -1918,7 +1918,8 @@ Parameters: struct kvm_one_reg (in)
 Returns: 0 on success, negative value on failure
 Errors:
   ENOENT:   no such register
-  EINVAL:   invalid register ID, or no such register
+  EINVAL:   invalid register ID, or no such register, ONE_REG forbidden
+            for protected guests (s390).
   EPERM:    (arm64) register access not allowed before vcpu finalization
 (These error codes are indicative only: do not rely on a specific error
 code being returned in a specific situation.)
@@ -2311,7 +2312,8 @@ Parameters: struct kvm_one_reg (in and out)
 Returns: 0 on success, negative value on failure
 Errors include:
   ENOENT:   no such register
-  EINVAL:   invalid register ID, or no such register
+  EINVAL:   invalid register ID, or no such register, ONE_REG forbidden
+            for protected guests (s390)
   EPERM:    (arm64) register access not allowed before vcpu finalization
 (These error codes are indicative only: do not rely on a specific error
 code being returned in a specific situation.)
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 63d158149936..f995040102ea 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -4649,6 +4649,9 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
 	case KVM_SET_ONE_REG:
 	case KVM_GET_ONE_REG: {
 		struct kvm_one_reg reg;
+		r = -EINVAL;
+		if (kvm_s390_pv_is_protected(vcpu->kvm))
+			break;
 		r = -EFAULT;
 		if (copy_from_user(&reg, argp, sizeof(reg)))
 			break;
-- 
2.24.0

* [PATCH 25/35] KVM: s390: protvirt: Only sync fmt4 registers
  2020-02-07 11:39 [PATCH 00/35] KVM: s390: Add support for protected VMs Christian Borntraeger
                   ` (23 preceding siblings ...)
  2020-02-07 11:39 ` [PATCH 24/35] KVM: s390: protvirt: disallow one_reg Christian Borntraeger
@ 2020-02-07 11:39 ` Christian Borntraeger
  2020-02-09 15:50   ` Thomas Huth
  2020-02-11 10:51   ` Cornelia Huck
  2020-02-07 11:39 ` [PATCH 26/35] KVM: s390: protvirt: Add program exception injection Christian Borntraeger
                   ` (9 subsequent siblings)
  34 siblings, 2 replies; 147+ messages in thread
From: Christian Borntraeger @ 2020-02-07 11:39 UTC (permalink / raw)
  To: Christian Borntraeger, Janosch Frank
  Cc: KVM, Cornelia Huck, David Hildenbrand, Thomas Huth,
	Ulrich Weigand, Claudio Imbrenda, Andrea Arcangeli, linux-s390,
	Michael Mueller, Vasily Gorbik, Janosch Frank

From: Janosch Frank <frankja@linux.ibm.com>

A lot of the registers are controlled by the Ultravisor and never
visible to KVM. In addition, some registers are overlaid, like gbea is
with sidad, which might leak data to userspace.

Hence we sync a minimal set of registers for both SIE formats and then
check and sync format 2 registers if necessary.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
[borntraeger@de.ibm.com: patch merging, splitting, fixing]
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
 arch/s390/kvm/kvm-s390.c | 116 ++++++++++++++++++++++++---------------
 1 file changed, 72 insertions(+), 44 deletions(-)

diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index f995040102ea..7df48cc942fd 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -3447,9 +3447,11 @@ static void kvm_arch_vcpu_ioctl_initial_reset(struct kvm_vcpu *vcpu)
 	vcpu->arch.sie_block->gcr[0] = CR0_INITIAL_MASK;
 	vcpu->arch.sie_block->gcr[14] = CR14_INITIAL_MASK;
 	vcpu->run->s.regs.fpc = 0;
-	vcpu->arch.sie_block->gbea = 1;
-	vcpu->arch.sie_block->pp = 0;
-	vcpu->arch.sie_block->fpf &= ~FPF_BPBC;
+	if (!kvm_s390_pv_handle_cpu(vcpu)) {
+		vcpu->arch.sie_block->gbea = 1;
+		vcpu->arch.sie_block->pp = 0;
+		vcpu->arch.sie_block->fpf &= ~FPF_BPBC;
+	}
 }
 
 static void kvm_arch_vcpu_ioctl_clear_reset(struct kvm_vcpu *vcpu)
@@ -4060,25 +4062,16 @@ static int __vcpu_run(struct kvm_vcpu *vcpu)
 	return rc;
 }
 
-static void sync_regs(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
+static void sync_regs_fmt2(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 {
 	struct runtime_instr_cb *riccb;
 	struct gs_cb *gscb;
 
-	riccb = (struct runtime_instr_cb *) &kvm_run->s.regs.riccb;
-	gscb = (struct gs_cb *) &kvm_run->s.regs.gscb;
 	vcpu->arch.sie_block->gpsw.mask = kvm_run->psw_mask;
 	vcpu->arch.sie_block->gpsw.addr = kvm_run->psw_addr;
-	if (kvm_run->kvm_dirty_regs & KVM_SYNC_PREFIX)
-		kvm_s390_set_prefix(vcpu, kvm_run->s.regs.prefix);
-	if (kvm_run->kvm_dirty_regs & KVM_SYNC_CRS) {
-		memcpy(&vcpu->arch.sie_block->gcr, &kvm_run->s.regs.crs, 128);
-		/* some control register changes require a tlb flush */
-		kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
-	}
+	riccb = (struct runtime_instr_cb *) &kvm_run->s.regs.riccb;
+	gscb = (struct gs_cb *) &kvm_run->s.regs.gscb;
 	if (kvm_run->kvm_dirty_regs & KVM_SYNC_ARCH0) {
-		kvm_s390_set_cpu_timer(vcpu, kvm_run->s.regs.cputm);
-		vcpu->arch.sie_block->ckc = kvm_run->s.regs.ckc;
 		vcpu->arch.sie_block->todpr = kvm_run->s.regs.todpr;
 		vcpu->arch.sie_block->pp = kvm_run->s.regs.pp;
 		vcpu->arch.sie_block->gbea = kvm_run->s.regs.gbea;
@@ -4119,6 +4112,47 @@ static void sync_regs(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 		vcpu->arch.sie_block->fpf &= ~FPF_BPBC;
 		vcpu->arch.sie_block->fpf |= kvm_run->s.regs.bpbc ? FPF_BPBC : 0;
 	}
+	if (MACHINE_HAS_GS) {
+		preempt_disable();
+		__ctl_set_bit(2, 4);
+		if (current->thread.gs_cb) {
+			vcpu->arch.host_gscb = current->thread.gs_cb;
+			save_gs_cb(vcpu->arch.host_gscb);
+		}
+		if (vcpu->arch.gs_enabled) {
+			current->thread.gs_cb = (struct gs_cb *)
+						&vcpu->run->s.regs.gscb;
+			restore_gs_cb(current->thread.gs_cb);
+		}
+		preempt_enable();
+	}
+	/* SIE will load etoken directly from SDNX and therefore kvm_run */
+}
+
+static void sync_regs(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
+{
+	/*
+	 * at several places we have to modify our internal view to not do
+	 * things that are disallowed by the ultravisor. For example we must
+	 * not inject interrupts after specific exits (e.g. 112). We do this
+	 * by turning off the MIE bits of our PSW copy. To avoid getting
+	 * validity intercepts, we do only accept the condition code from
+	 * userspace.
+	 */
+	vcpu->arch.sie_block->gpsw.mask &= ~PSW_MASK_CC;
+	vcpu->arch.sie_block->gpsw.mask |= kvm_run->psw_mask & PSW_MASK_CC;
+
+	if (kvm_run->kvm_dirty_regs & KVM_SYNC_PREFIX)
+		kvm_s390_set_prefix(vcpu, kvm_run->s.regs.prefix);
+	if (kvm_run->kvm_dirty_regs & KVM_SYNC_CRS) {
+		memcpy(&vcpu->arch.sie_block->gcr, &kvm_run->s.regs.crs, 128);
+		/* some control register changes require a tlb flush */
+		kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
+	}
+	if (kvm_run->kvm_dirty_regs & KVM_SYNC_ARCH0) {
+		kvm_s390_set_cpu_timer(vcpu, kvm_run->s.regs.cputm);
+		vcpu->arch.sie_block->ckc = kvm_run->s.regs.ckc;
+	}
 	save_access_regs(vcpu->arch.host_acrs);
 	restore_access_regs(vcpu->run->s.regs.acrs);
 	/* save host (userspace) fprs/vrs */
@@ -4133,23 +4167,31 @@ static void sync_regs(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 	if (test_fp_ctl(current->thread.fpu.fpc))
 		/* User space provided an invalid FPC, let's clear it */
 		current->thread.fpu.fpc = 0;
+
+	/* Sync fmt2 only data */
+	if (likely(!kvm_s390_pv_is_protected(vcpu->kvm)))
+		sync_regs_fmt2(vcpu, kvm_run);
+	kvm_run->kvm_dirty_regs = 0;
+}
+
+static void store_regs_fmt2(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
+{
+	kvm_run->s.regs.pp = vcpu->arch.sie_block->pp;
+	kvm_run->s.regs.gbea = vcpu->arch.sie_block->gbea;
+	kvm_run->s.regs.bpbc = (vcpu->arch.sie_block->fpf & FPF_BPBC) == FPF_BPBC;
 	if (MACHINE_HAS_GS) {
-		preempt_disable();
 		__ctl_set_bit(2, 4);
-		if (current->thread.gs_cb) {
-			vcpu->arch.host_gscb = current->thread.gs_cb;
-			save_gs_cb(vcpu->arch.host_gscb);
-		}
-		if (vcpu->arch.gs_enabled) {
-			current->thread.gs_cb = (struct gs_cb *)
-						&vcpu->run->s.regs.gscb;
-			restore_gs_cb(current->thread.gs_cb);
-		}
+		if (vcpu->arch.gs_enabled)
+			save_gs_cb(current->thread.gs_cb);
+		preempt_disable();
+		current->thread.gs_cb = vcpu->arch.host_gscb;
+		restore_gs_cb(vcpu->arch.host_gscb);
 		preempt_enable();
+		if (!vcpu->arch.host_gscb)
+			__ctl_clear_bit(2, 4);
+		vcpu->arch.host_gscb = NULL;
 	}
-	/* SIE will load etoken directly from SDNX and therefore kvm_run */
-
-	kvm_run->kvm_dirty_regs = 0;
+	/* SIE will save etoken directly into SDNX and therefore kvm_run */
 }
 
 static void store_regs(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
@@ -4161,12 +4203,9 @@ static void store_regs(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 	kvm_run->s.regs.cputm = kvm_s390_get_cpu_timer(vcpu);
 	kvm_run->s.regs.ckc = vcpu->arch.sie_block->ckc;
 	kvm_run->s.regs.todpr = vcpu->arch.sie_block->todpr;
-	kvm_run->s.regs.pp = vcpu->arch.sie_block->pp;
-	kvm_run->s.regs.gbea = vcpu->arch.sie_block->gbea;
 	kvm_run->s.regs.pft = vcpu->arch.pfault_token;
 	kvm_run->s.regs.pfs = vcpu->arch.pfault_select;
 	kvm_run->s.regs.pfc = vcpu->arch.pfault_compare;
-	kvm_run->s.regs.bpbc = (vcpu->arch.sie_block->fpf & FPF_BPBC) == FPF_BPBC;
 	save_access_regs(vcpu->run->s.regs.acrs);
 	restore_access_regs(vcpu->arch.host_acrs);
 	/* Save guest register state */
@@ -4175,19 +4214,8 @@ static void store_regs(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 	/* Restore will be done lazily at return */
 	current->thread.fpu.fpc = vcpu->arch.host_fpregs.fpc;
 	current->thread.fpu.regs = vcpu->arch.host_fpregs.regs;
-	if (MACHINE_HAS_GS) {
-		__ctl_set_bit(2, 4);
-		if (vcpu->arch.gs_enabled)
-			save_gs_cb(current->thread.gs_cb);
-		preempt_disable();
-		current->thread.gs_cb = vcpu->arch.host_gscb;
-		restore_gs_cb(vcpu->arch.host_gscb);
-		preempt_enable();
-		if (!vcpu->arch.host_gscb)
-			__ctl_clear_bit(2, 4);
-		vcpu->arch.host_gscb = NULL;
-	}
-	/* SIE will save etoken directly into SDNX and therefore kvm_run */
+	if (likely(!kvm_s390_pv_is_protected(vcpu->kvm)))
+		store_regs_fmt2(vcpu, kvm_run);
 }
 
 int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
-- 
2.24.0

^ permalink raw reply related	[flat|nested] 147+ messages in thread

* [PATCH 26/35] KVM: s390: protvirt: Add program exception injection
  2020-02-07 11:39 [PATCH 00/35] KVM: s390: Add support for protected VMs Christian Borntraeger
                   ` (24 preceding siblings ...)
  2020-02-07 11:39 ` [PATCH 25/35] KVM: s390: protvirt: Only sync fmt4 registers Christian Borntraeger
@ 2020-02-07 11:39 ` Christian Borntraeger
  2020-02-09 15:52   ` Thomas Huth
  2020-02-07 11:39 ` [PATCH 27/35] KVM: s390: protvirt: Add diag 308 subcode 8 - 10 handling Christian Borntraeger
                   ` (8 subsequent siblings)
  34 siblings, 1 reply; 147+ messages in thread
From: Christian Borntraeger @ 2020-02-07 11:39 UTC (permalink / raw)
  To: Christian Borntraeger, Janosch Frank
  Cc: KVM, Cornelia Huck, David Hildenbrand, Thomas Huth,
	Ulrich Weigand, Claudio Imbrenda, Andrea Arcangeli, linux-s390,
	Michael Mueller, Vasily Gorbik, Janosch Frank

From: Janosch Frank <frankja@linux.ibm.com>

Only two program exceptions can be injected for a protected guest:
specification and operand.

For both, a code needs to be specified in the interrupt injection
control of the state description, as the guest prefix page is not
accessible to KVM for such guests.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
[borntraeger@de.ibm.com: patch merging, splitting, fixing]
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
 arch/s390/kvm/interrupt.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
index c28fa09cb557..2df6459ab98b 100644
--- a/arch/s390/kvm/interrupt.c
+++ b/arch/s390/kvm/interrupt.c
@@ -837,6 +837,21 @@ static int __must_check __deliver_external_call(struct kvm_vcpu *vcpu)
 	return rc ? -EFAULT : 0;
 }
 
+static int __deliver_prog_pv(struct kvm_vcpu *vcpu, u16 code)
+{
+	switch (code) {
+	case PGM_SPECIFICATION:
+		vcpu->arch.sie_block->iictl = IICTL_CODE_SPECIFICATION;
+		break;
+	case PGM_OPERAND:
+		vcpu->arch.sie_block->iictl = IICTL_CODE_OPERAND;
+		break;
+	default:
+		return -EINVAL;
+	}
+	return 0;
+}
+
 static int __must_check __deliver_prog(struct kvm_vcpu *vcpu)
 {
 	struct kvm_s390_local_interrupt *li = &vcpu->arch.local_int;
@@ -857,6 +872,9 @@ static int __must_check __deliver_prog(struct kvm_vcpu *vcpu)
 	trace_kvm_s390_deliver_interrupt(vcpu->vcpu_id, KVM_S390_PROGRAM_INT,
 					 pgm_info.code, 0);
 
+	if (kvm_s390_pv_is_protected(vcpu->kvm))
+		return __deliver_prog_pv(vcpu, pgm_info.code & ~PGM_PER);
+
 	switch (pgm_info.code & ~PGM_PER) {
 	case PGM_AFX_TRANSLATION:
 	case PGM_ASX_TRANSLATION:
-- 
2.24.0


* [PATCH 27/35] KVM: s390: protvirt: Add diag 308 subcode 8 - 10 handling
  2020-02-07 11:39 [PATCH 00/35] KVM: s390: Add support for protected VMs Christian Borntraeger
                   ` (25 preceding siblings ...)
  2020-02-07 11:39 ` [PATCH 26/35] KVM: s390: protvirt: Add program exception injection Christian Borntraeger
@ 2020-02-07 11:39 ` Christian Borntraeger
  2020-02-07 11:39 ` [PATCH 28/35] KVM: s390: protvirt: UV calls diag308 0, 1 Christian Borntraeger
                   ` (7 subsequent siblings)
  34 siblings, 0 replies; 147+ messages in thread
From: Christian Borntraeger @ 2020-02-07 11:39 UTC (permalink / raw)
  To: Christian Borntraeger, Janosch Frank
  Cc: KVM, Cornelia Huck, David Hildenbrand, Thomas Huth,
	Ulrich Weigand, Claudio Imbrenda, Andrea Arcangeli, linux-s390,
	Michael Mueller, Vasily Gorbik, Janosch Frank

From: Janosch Frank <frankja@linux.ibm.com>

If the host initialized the Ultravisor, we can set stfle bit 161
(protected virtual IPL enhancements facility), which indicates that
the IPL subcodes 8, 9, and 10 are valid. These subcodes are used by a
normal guest to set/retrieve an IPL information block of type 5 (for
protected virtual machines) and transition into protected mode.

Once in protected mode, the Ultravisor will conceal the facility bit.
Therefore each boot into protected mode has to go through
non-protected mode. There is no secure re-ipl with subcode 10 without
a previous subcode 3.

In protected mode, subcode 4 is not available, as the VM no longer has
access to its memory from non-protected mode. I.e., only an IPL clear
is possible.

The error cases will all be handled in userspace.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
[borntraeger@de.ibm.com: patch merging, splitting, fixing]
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
 arch/s390/kvm/kvm-s390.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 7df48cc942fd..4afa44e3d1ed 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -2611,6 +2611,11 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 	if (css_general_characteristics.aiv && test_facility(65))
 		set_kvm_facility(kvm->arch.model.fac_mask, 65);
 
+	if (is_prot_virt_host()) {
+		set_kvm_facility(kvm->arch.model.fac_mask, 161);
+		set_kvm_facility(kvm->arch.model.fac_list, 161);
+	}
+
 	kvm->arch.model.cpuid = kvm_s390_get_initial_cpuid();
 	kvm->arch.model.ibc = sclp.ibc & 0x0fff;
 
-- 
2.24.0


* [PATCH 28/35] KVM: s390: protvirt: UV calls diag308 0, 1
  2020-02-07 11:39 [PATCH 00/35] KVM: s390: Add support for protected VMs Christian Borntraeger
                   ` (26 preceding siblings ...)
  2020-02-07 11:39 ` [PATCH 27/35] KVM: s390: protvirt: Add diag 308 subcode 8 - 10 handling Christian Borntraeger
@ 2020-02-07 11:39 ` Christian Borntraeger
  2020-02-09 16:03   ` Thomas Huth
  2020-02-07 11:39 ` [PATCH 29/35] KVM: s390: protvirt: Report CPU state to Ultravisor Christian Borntraeger
                   ` (6 subsequent siblings)
  34 siblings, 1 reply; 147+ messages in thread
From: Christian Borntraeger @ 2020-02-07 11:39 UTC (permalink / raw)
  To: Christian Borntraeger, Janosch Frank
  Cc: KVM, Cornelia Huck, David Hildenbrand, Thomas Huth,
	Ulrich Weigand, Claudio Imbrenda, Andrea Arcangeli, linux-s390,
	Michael Mueller, Vasily Gorbik, Janosch Frank

From: Janosch Frank <frankja@linux.ibm.com>

diag 308 subcodes 0 and 1 require KVM and Ultravisor interaction, since
the cpus have to be set into multiple reset states.

* All cpus need to be stopped
* The "unshare all" UVC needs to be executed
* The "perform reset" UVC needs to be executed
* The cpus need to be reset via the "set cpu state" UVC
* The issuing cpu needs to set state 5 via "set cpu state"

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
[borntraeger@de.ibm.com: patch merging, splitting, fixing]
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
 arch/s390/include/asm/uv.h | 25 +++++++++++++++++++++++++
 arch/s390/kvm/diag.c       |  1 +
 arch/s390/kvm/kvm-s390.c   | 28 ++++++++++++++++++++++++++++
 arch/s390/kvm/kvm-s390.h   |  1 +
 include/uapi/linux/kvm.h   |  2 ++
 5 files changed, 57 insertions(+)

diff --git a/arch/s390/include/asm/uv.h b/arch/s390/include/asm/uv.h
index 7c21d55d2e49..237d0b417d07 100644
--- a/arch/s390/include/asm/uv.h
+++ b/arch/s390/include/asm/uv.h
@@ -36,6 +36,12 @@
 #define UVC_CMD_SET_SEC_CONF_PARAMS	0x0300
 #define UVC_CMD_UNPACK_IMG		0x0301
 #define UVC_CMD_VERIFY_IMG		0x0302
+#define UVC_CMD_CPU_RESET		0x0310
+#define UVC_CMD_CPU_RESET_INITIAL	0x0311
+#define UVC_CMD_PREPARE_RESET		0x0320
+#define UVC_CMD_CPU_RESET_CLEAR		0x0321
+#define UVC_CMD_CPU_SET_STATE		0x0330
+#define UVC_CMD_SET_UNSHARED_ALL	0x0340
 #define UVC_CMD_PIN_PAGE_SHARED		0x0341
 #define UVC_CMD_UNPIN_PAGE_SHARED	0x0342
 #define UVC_CMD_SET_SHARED_ACCESS	0x1000
@@ -56,6 +62,12 @@ enum uv_cmds_inst {
 	BIT_UVC_CMD_SET_SEC_PARMS = 11,
 	BIT_UVC_CMD_UNPACK_IMG = 13,
 	BIT_UVC_CMD_VERIFY_IMG = 14,
+	BIT_UVC_CMD_CPU_RESET = 15,
+	BIT_UVC_CMD_CPU_RESET_INITIAL = 16,
+	BIT_UVC_CMD_CPU_SET_STATE = 17,
+	BIT_UVC_CMD_PREPARE_RESET = 18,
+	BIT_UVC_CMD_CPU_PERFORM_CLEAR_RESET = 19,
+	BIT_UVC_CMD_UNSHARE_ALL = 20,
 	BIT_UVC_CMD_PIN_PAGE_SHARED = 21,
 	BIT_UVC_CMD_UNPIN_PAGE_SHARED = 22,
 };
@@ -160,6 +172,19 @@ struct uv_cb_unp {
 	u64 reserved38[3];
 } __packed __aligned(8);
 
+#define PV_CPU_STATE_OPR	1
+#define PV_CPU_STATE_STP	2
+#define PV_CPU_STATE_CHKSTP	3
+
+struct uv_cb_cpu_set_state {
+	struct uv_cb_header header;
+	u64 reserved08[2];
+	u64 cpu_handle;
+	u8  reserved20[7];
+	u8  state;
+	u64 reserved28[5];
+};
+
 /*
  * A common UV call struct for calls that take no payload
  * Examples:
diff --git a/arch/s390/kvm/diag.c b/arch/s390/kvm/diag.c
index 3fb54ec2cf3e..390830385b9f 100644
--- a/arch/s390/kvm/diag.c
+++ b/arch/s390/kvm/diag.c
@@ -13,6 +13,7 @@
 #include <asm/pgalloc.h>
 #include <asm/gmap.h>
 #include <asm/virtio-ccw.h>
+#include <asm/uv.h>
 #include "kvm-s390.h"
 #include "trace.h"
 #include "trace-s390.h"
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 4afa44e3d1ed..0be18ac1afb5 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -2280,6 +2280,34 @@ static int kvm_s390_handle_pv(struct kvm *kvm, struct kvm_pv_cmd *cmd)
 			 ret >> 16, ret & 0x0000ffff);
 		break;
 	}
+	case KVM_PV_VM_PREP_RESET: {
+		u32 ret;
+
+		r = -EINVAL;
+		if (!kvm_s390_pv_is_protected(kvm))
+			break;
+
+		r = uv_cmd_nodata(kvm_s390_pv_handle(kvm),
+				  UVC_CMD_PREPARE_RESET,
+				  &ret);
+		VM_EVENT(kvm, 3, "PROTVIRT PREP RESET: rc %x rrc %x",
+			 ret >> 16, ret & 0x0000ffff);
+		break;
+	}
+	case KVM_PV_VM_UNSHARE_ALL: {
+		u32 ret;
+
+		r = -EINVAL;
+		if (!kvm_s390_pv_is_protected(kvm))
+			break;
+
+		r = uv_cmd_nodata(kvm_s390_pv_handle(kvm),
+				  UVC_CMD_SET_UNSHARED_ALL,
+				  &ret);
+		VM_EVENT(kvm, 3, "PROTVIRT UNSHARE: %d rc %x rrc %x",
+			 r, ret >> 16, ret & 0x0000ffff);
+		break;
+	}
 	default:
 		return -ENOTTY;
 	}
diff --git a/arch/s390/kvm/kvm-s390.h b/arch/s390/kvm/kvm-s390.h
index 32c0c01d5df0..7530042a44e9 100644
--- a/arch/s390/kvm/kvm-s390.h
+++ b/arch/s390/kvm/kvm-s390.h
@@ -207,6 +207,7 @@ int kvm_s390_pv_set_sec_parms(struct kvm *kvm, void *hdr, u64 length);
 int kvm_s390_pv_unpack(struct kvm *kvm, unsigned long addr, unsigned long size,
 		       unsigned long tweak);
 int kvm_s390_pv_verify(struct kvm *kvm);
+int kvm_s390_pv_set_cpu_state(struct kvm_vcpu *vcpu, u8 state);
 
 static inline bool kvm_s390_pv_is_protected(struct kvm *kvm)
 {
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 51d4018c3be3..864e73bd40e4 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1500,6 +1500,8 @@ enum pv_cmd_id {
 	KVM_PV_VM_SET_SEC_PARMS,
 	KVM_PV_VM_UNPACK,
 	KVM_PV_VM_VERIFY,
+	KVM_PV_VM_PREP_RESET,
+	KVM_PV_VM_UNSHARE_ALL,
 	KVM_PV_VCPU_CREATE,
 	KVM_PV_VCPU_DESTROY,
 };
-- 
2.24.0


* [PATCH 29/35] KVM: s390: protvirt: Report CPU state to Ultravisor
  2020-02-07 11:39 [PATCH 00/35] KVM: s390: Add support for protected VMs Christian Borntraeger
                   ` (27 preceding siblings ...)
  2020-02-07 11:39 ` [PATCH 28/35] KVM: s390: protvirt: UV calls diag308 0, 1 Christian Borntraeger
@ 2020-02-07 11:39 ` Christian Borntraeger
  2020-02-07 11:39 ` [PATCH 30/35] KVM: s390: protvirt: Support cmd 5 operation state Christian Borntraeger
                   ` (5 subsequent siblings)
  34 siblings, 0 replies; 147+ messages in thread
From: Christian Borntraeger @ 2020-02-07 11:39 UTC (permalink / raw)
  To: Christian Borntraeger, Janosch Frank
  Cc: KVM, Cornelia Huck, David Hildenbrand, Thomas Huth,
	Ulrich Weigand, Claudio Imbrenda, Andrea Arcangeli, linux-s390,
	Michael Mueller, Vasily Gorbik, Janosch Frank

From: Janosch Frank <frankja@linux.ibm.com>

VCPU states have to be reported to the ultravisor for SIGP
interpretation.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
[borntraeger@de.ibm.com: patch merging, splitting, fixing]
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
 arch/s390/kvm/kvm-s390.c |  5 ++++-
 arch/s390/kvm/pv.c       | 19 +++++++++++++++++++
 2 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 0be18ac1afb5..ee98799212d3 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -4437,7 +4437,8 @@ void kvm_s390_vcpu_start(struct kvm_vcpu *vcpu)
 		 */
 		__disable_ibs_on_all_vcpus(vcpu->kvm);
 	}
-
+	/* Let's tell the UV that we want to start again */
+	kvm_s390_pv_set_cpu_state(vcpu, PV_CPU_STATE_OPR);
 	kvm_s390_clear_cpuflags(vcpu, CPUSTAT_STOPPED);
 	/*
 	 * Another VCPU might have used IBS while we were offline.
@@ -4465,6 +4466,8 @@ void kvm_s390_vcpu_stop(struct kvm_vcpu *vcpu)
 	kvm_s390_clear_stop_irq(vcpu);
 
 	kvm_s390_set_cpuflags(vcpu, CPUSTAT_STOPPED);
+	/* Let's tell the UV that we successfully stopped the vcpu */
+	kvm_s390_pv_set_cpu_state(vcpu, PV_CPU_STATE_STP);
 	__disable_ibs_on_vcpu(vcpu);
 
 	for (i = 0; i < online_vcpus; i++) {
diff --git a/arch/s390/kvm/pv.c b/arch/s390/kvm/pv.c
index 70e452192468..a58f5106ba5f 100644
--- a/arch/s390/kvm/pv.c
+++ b/arch/s390/kvm/pv.c
@@ -253,3 +253,22 @@ int kvm_s390_pv_unpack(struct kvm *kvm, unsigned long addr, unsigned long size,
 	VM_EVENT(kvm, 3, "PROTVIRT VM UNPACK: finished with rc %x", rc);
 	return rc;
 }
+
+int kvm_s390_pv_set_cpu_state(struct kvm_vcpu *vcpu, u8 state)
+{
+	int rc;
+	struct uv_cb_cpu_set_state uvcb = {
+		.header.cmd	= UVC_CMD_CPU_SET_STATE,
+		.header.len	= sizeof(uvcb),
+		.cpu_handle	= kvm_s390_pv_handle_cpu(vcpu),
+		.state		= state,
+	};
+
+	if (!kvm_s390_pv_handle_cpu(vcpu))
+		return -EINVAL;
+
+	rc = uv_call(0, (u64)&uvcb);
+	if (rc)
+		return -EINVAL;
+	return 0;
+}
-- 
2.24.0


* [PATCH 30/35] KVM: s390: protvirt: Support cmd 5 operation state
  2020-02-07 11:39 [PATCH 00/35] KVM: s390: Add support for protected VMs Christian Borntraeger
                   ` (28 preceding siblings ...)
  2020-02-07 11:39 ` [PATCH 29/35] KVM: s390: protvirt: Report CPU state to Ultravisor Christian Borntraeger
@ 2020-02-07 11:39 ` Christian Borntraeger
  2020-02-07 11:39 ` [PATCH 31/35] KVM: s390: protvirt: Add UV debug trace Christian Borntraeger
                   ` (4 subsequent siblings)
  34 siblings, 0 replies; 147+ messages in thread
From: Christian Borntraeger @ 2020-02-07 11:39 UTC (permalink / raw)
  To: Christian Borntraeger, Janosch Frank
  Cc: KVM, Cornelia Huck, David Hildenbrand, Thomas Huth,
	Ulrich Weigand, Claudio Imbrenda, Andrea Arcangeli, linux-s390,
	Michael Mueller, Vasily Gorbik, Janosch Frank

From: Janosch Frank <frankja@linux.ibm.com>

Code 5 for the set cpu state UV call tells the UV to load a PSW from
the SE header (first IPL) or from guest location 0x0 (diag 308 subcode
0/1). It also sets the cpu into operating state afterwards, so we can
start it.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
[borntraeger@de.ibm.com: patch merging, splitting, fixing]
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
 arch/s390/include/asm/uv.h | 1 +
 arch/s390/kvm/kvm-s390.c   | 7 +++++++
 include/uapi/linux/kvm.h   | 1 +
 3 files changed, 9 insertions(+)

diff --git a/arch/s390/include/asm/uv.h b/arch/s390/include/asm/uv.h
index 237d0b417d07..94fc0b60e4c8 100644
--- a/arch/s390/include/asm/uv.h
+++ b/arch/s390/include/asm/uv.h
@@ -175,6 +175,7 @@ struct uv_cb_unp {
 #define PV_CPU_STATE_OPR	1
 #define PV_CPU_STATE_STP	2
 #define PV_CPU_STATE_CHKSTP	3
+#define PV_CPU_STATE_OPR_LOAD	5
 
 struct uv_cb_cpu_set_state {
 	struct uv_cb_header header;
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index ee98799212d3..3a06622c52fb 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -4667,6 +4667,13 @@ static int kvm_s390_handle_pv_vcpu(struct kvm_vcpu *vcpu,
 		r = kvm_s390_pv_destroy_cpu(vcpu);
 		break;
 	}
+	case KVM_PV_VCPU_SET_IPL_PSW: {
+		if (!kvm_s390_pv_handle_cpu(vcpu))
+			return -EINVAL;
+
+		r = kvm_s390_pv_set_cpu_state(vcpu, PV_CPU_STATE_OPR_LOAD);
+		break;
+	}
 	default:
 		r = -ENOTTY;
 	}
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 864e73bd40e4..9125d6b2a974 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1504,6 +1504,7 @@ enum pv_cmd_id {
 	KVM_PV_VM_UNSHARE_ALL,
 	KVM_PV_VCPU_CREATE,
 	KVM_PV_VCPU_DESTROY,
+	KVM_PV_VCPU_SET_IPL_PSW,
 };
 
 struct kvm_pv_cmd {
-- 
2.24.0


* [PATCH 31/35] KVM: s390: protvirt: Add UV debug trace
  2020-02-07 11:39 [PATCH 00/35] KVM: s390: Add support for protected VMs Christian Borntraeger
                   ` (29 preceding siblings ...)
  2020-02-07 11:39 ` [PATCH 30/35] KVM: s390: protvirt: Support cmd 5 operation state Christian Borntraeger
@ 2020-02-07 11:39 ` Christian Borntraeger
  2020-02-10 13:22   ` Cornelia Huck
  2020-02-07 11:39 ` [PATCH 32/35] KVM: s390: protvirt: Mask PSW interrupt bits for interception 104 and 112 Christian Borntraeger
                   ` (3 subsequent siblings)
  34 siblings, 1 reply; 147+ messages in thread
From: Christian Borntraeger @ 2020-02-07 11:39 UTC (permalink / raw)
  To: Christian Borntraeger, Janosch Frank
  Cc: KVM, Cornelia Huck, David Hildenbrand, Thomas Huth,
	Ulrich Weigand, Claudio Imbrenda, Andrea Arcangeli, linux-s390,
	Michael Mueller, Vasily Gorbik, Janosch Frank

From: Janosch Frank <frankja@linux.ibm.com>

Let's have some debug traces which stay around for longer than the
guest.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
[borntraeger@de.ibm.com: patch merging, splitting, fixing]
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
 arch/s390/kvm/kvm-s390.c |  9 ++++++++-
 arch/s390/kvm/kvm-s390.h |  9 +++++++++
 arch/s390/kvm/pv.c       | 20 +++++++++++++++++++-
 3 files changed, 36 insertions(+), 2 deletions(-)

diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 3a06622c52fb..ced2bac251a6 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -222,6 +222,7 @@ static struct gmap_notifier gmap_notifier;
 static struct gmap_notifier vsie_gmap_notifier;
 static struct gmap_notifier adapter_gmap_notifier;
 debug_info_t *kvm_s390_dbf;
+debug_info_t *kvm_s390_dbf_uv;
 
 /* Section: not file related */
 int kvm_arch_hardware_enable(void)
@@ -466,7 +467,12 @@ int kvm_arch_init(void *opaque)
 	if (!kvm_s390_dbf)
 		return -ENOMEM;
 
-	if (debug_register_view(kvm_s390_dbf, &debug_sprintf_view))
+	kvm_s390_dbf_uv = debug_register("kvm-uv", 32, 1, 7 * sizeof(long));
+	if (!kvm_s390_dbf_uv)
+		goto out;
+
+	if (debug_register_view(kvm_s390_dbf, &debug_sprintf_view) ||
+	    debug_register_view(kvm_s390_dbf_uv, &debug_sprintf_view))
 		goto out;
 
 	kvm_s390_cpu_feat_init();
@@ -493,6 +499,7 @@ void kvm_arch_exit(void)
 {
 	kvm_s390_gib_destroy();
 	debug_unregister(kvm_s390_dbf);
+	debug_unregister(kvm_s390_dbf_uv);
 }
 
 /* Section: device related */
diff --git a/arch/s390/kvm/kvm-s390.h b/arch/s390/kvm/kvm-s390.h
index 7530042a44e9..0121a5b36e54 100644
--- a/arch/s390/kvm/kvm-s390.h
+++ b/arch/s390/kvm/kvm-s390.h
@@ -25,6 +25,15 @@
 #define IS_ITDB_VALID(vcpu)	((*(char *)vcpu->arch.sie_block->itdba == TDB_FORMAT1))
 
 extern debug_info_t *kvm_s390_dbf;
+extern debug_info_t *kvm_s390_dbf_uv;
+
+#define KVM_UV_EVENT(d_kvm, d_loglevel, d_string, d_args...)\
+do { \
+	debug_sprintf_event(kvm_s390_dbf_uv, d_loglevel, \
+			    "%s: " d_string "\n", d_kvm->arch.dbf->name, \
+			    d_args); \
+} while (0)
+
 #define KVM_EVENT(d_loglevel, d_string, d_args...)\
 do { \
 	debug_sprintf_event(kvm_s390_dbf, d_loglevel, d_string "\n", \
diff --git a/arch/s390/kvm/pv.c b/arch/s390/kvm/pv.c
index a58f5106ba5f..da281d8dcc92 100644
--- a/arch/s390/kvm/pv.c
+++ b/arch/s390/kvm/pv.c
@@ -74,6 +74,8 @@ int kvm_s390_pv_destroy_vm(struct kvm *kvm)
 	atomic_set(&kvm->mm->context.is_protected, 0);
 	VM_EVENT(kvm, 3, "PROTVIRT DESTROY VM: rc %x rrc %x",
 		 ret >> 16, ret & 0x0000ffff);
+	KVM_UV_EVENT(kvm, 3, "PROTVIRT DESTROY VM: rc %x rrc %x",
+		 ret >> 16, ret & 0x0000ffff);
 	return rc;
 }
 
@@ -89,6 +91,8 @@ int kvm_s390_pv_destroy_cpu(struct kvm_vcpu *vcpu)
 
 		VCPU_EVENT(vcpu, 3, "PROTVIRT DESTROY VCPU: cpu %d rc %x rrc %x",
 			   vcpu->vcpu_id, ret >> 16, ret & 0x0000ffff);
+		KVM_UV_EVENT(vcpu->kvm, 3, "PROTVIRT DESTROY VCPU: cpu %d rc %x rrc %x",
+			     vcpu->vcpu_id, ret >> 16, ret & 0x0000ffff);
 	}
 
 	free_pages(vcpu->arch.pv.stor_base,
@@ -135,6 +139,10 @@ int kvm_s390_pv_create_cpu(struct kvm_vcpu *vcpu)
 	VCPU_EVENT(vcpu, 3, "PROTVIRT CREATE VCPU: cpu %d handle %llx rc %x rrc %x",
 		   vcpu->vcpu_id, uvcb.cpu_handle, uvcb.header.rc,
 		   uvcb.header.rrc);
+	KVM_UV_EVENT(vcpu->kvm, 3,
+		     "PROTVIRT CREATE VCPU: cpu %d handle %llx rc %x rrc %x",
+		     vcpu->vcpu_id, uvcb.cpu_handle, uvcb.header.rc,
+		     uvcb.header.rrc);
 
 	if (rc) {
 		kvm_s390_pv_destroy_cpu(vcpu);
@@ -174,6 +182,10 @@ int kvm_s390_pv_create_vm(struct kvm *kvm)
 		 uvcb.guest_handle, uvcb.guest_stor_len, uvcb.header.rc,
 		 uvcb.header.rrc);
 
+	KVM_UV_EVENT(kvm, 3, "PROTVIRT CREATE VM: handle %llx len %llx rc %x rrc %x",
+		 uvcb.guest_handle, uvcb.guest_stor_len, uvcb.header.rc,
+		 uvcb.header.rrc);
+
 	/* Outputs */
 	kvm->arch.pv.handle = uvcb.guest_handle;
 
@@ -204,6 +216,8 @@ int kvm_s390_pv_set_sec_parms(struct kvm *kvm,
 	rc = uv_call(0, (u64)&uvcb);
 	VM_EVENT(kvm, 3, "PROTVIRT VM SET PARMS: rc %x rrc %x",
 		 uvcb.header.rc, uvcb.header.rrc);
+	KVM_UV_EVENT(kvm, 3, "PROTVIRT VM SET PARMS: rc %x rrc %x",
+		     uvcb.header.rc, uvcb.header.rrc);
 	if (rc)
 		return -EINVAL;
 	return 0;
@@ -223,9 +237,12 @@ static int unpack_one(struct kvm *kvm, unsigned long addr, u64 tweak[2])
 
 	rc = uv_make_secure(kvm->arch.gmap, addr, &uvcb);
 
-	if (rc)
+	if (rc) {
 		VM_EVENT(kvm, 3, "PROTVIRT VM UNPACK: failed addr %llx rc %x rrc %x",
 			 uvcb.gaddr, uvcb.header.rc, uvcb.header.rrc);
+		KVM_UV_EVENT(kvm, 3, "PROTVIRT VM UNPACK: failed with rc %x rrc %x",
+			     uvcb.header.rc, uvcb.header.rrc);
+	}
 	return rc;
 }
 
@@ -251,6 +268,7 @@ int kvm_s390_pv_unpack(struct kvm *kvm, unsigned long addr, unsigned long size,
 		tw[1] += PAGE_SIZE;
 	}
 	VM_EVENT(kvm, 3, "PROTVIRT VM UNPACK: finished with rc %x", rc);
+	KVM_UV_EVENT(kvm, 3, "PROTVIRT VM UNPACK: finished with rc %x", rc);
 	return rc;
 }
 
-- 
2.24.0


* [PATCH 32/35] KVM: s390: protvirt: Mask PSW interrupt bits for interception 104 and 112
  2020-02-07 11:39 [PATCH 00/35] KVM: s390: Add support for protected VMs Christian Borntraeger
                   ` (30 preceding siblings ...)
  2020-02-07 11:39 ` [PATCH 31/35] KVM: s390: protvirt: Add UV debug trace Christian Borntraeger
@ 2020-02-07 11:39 ` Christian Borntraeger
  2020-02-09 16:07   ` Thomas Huth
  2020-02-10 13:28   ` Cornelia Huck
  2020-02-07 11:39 ` [PATCH 33/35] KVM: s390: protvirt: do not inject interrupts after start Christian Borntraeger
                   ` (2 subsequent siblings)
  34 siblings, 2 replies; 147+ messages in thread
From: Christian Borntraeger @ 2020-02-07 11:39 UTC (permalink / raw)
  To: Christian Borntraeger, Janosch Frank
  Cc: KVM, Cornelia Huck, David Hildenbrand, Thomas Huth,
	Ulrich Weigand, Claudio Imbrenda, Andrea Arcangeli, linux-s390,
	Michael Mueller, Vasily Gorbik, Janosch Frank

From: Janosch Frank <frankja@linux.ibm.com>

We're not allowed to inject interrupts on intercepts that leave the
guest state in an "in-between" state where the next SIE entry will do a
continuation, namely secure instruction interception and secure prefix
interception.
As our PSW is just a copy of the real one that will be replaced on the
next exit, we can mask out the interrupt bits in the PSW to make sure
that we do not inject anything.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
[borntraeger@de.ibm.com: patch merging, splitting, fixing]
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
 arch/s390/kvm/kvm-s390.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index ced2bac251a6..8c7b27287b91 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -4052,6 +4052,7 @@ static int vcpu_post_run(struct kvm_vcpu *vcpu, int exit_reason)
 	return vcpu_post_run_fault_in_sie(vcpu);
 }
 
+#define PSW_INT_MASK (PSW_MASK_EXT | PSW_MASK_IO | PSW_MASK_MCHECK)
 static int __vcpu_run(struct kvm_vcpu *vcpu)
 {
 	int rc, exit_reason;
@@ -4088,6 +4089,10 @@ static int __vcpu_run(struct kvm_vcpu *vcpu)
 			memcpy(vcpu->run->s.regs.gprs,
 			       sie_page->pv_grregs,
 			       sizeof(sie_page->pv_grregs));
+			if (vcpu->arch.sie_block->icptcode == ICPT_PV_INSTR ||
+			    vcpu->arch.sie_block->icptcode == ICPT_PV_PREF) {
+				vcpu->arch.sie_block->gpsw.mask &= ~PSW_INT_MASK;
+			}
 		}
 		local_irq_disable();
 		__enable_cpu_timer_accounting(vcpu);
-- 
2.24.0


* [PATCH 33/35] KVM: s390: protvirt: do not inject interrupts after start
  2020-02-07 11:39 [PATCH 00/35] KVM: s390: Add support for protected VMs Christian Borntraeger
                   ` (31 preceding siblings ...)
  2020-02-07 11:39 ` [PATCH 32/35] KVM: s390: protvirt: Mask PSW interrupt bits for interception 104 and 112 Christian Borntraeger
@ 2020-02-07 11:39 ` Christian Borntraeger
  2020-02-07 11:39 ` [PATCH 34/35] KVM: s390: protvirt: Add UV cpu reset calls Christian Borntraeger
  2020-02-07 11:39 ` [PATCH 35/35] DOCUMENTATION: Protected virtual machine introduction and IPL Christian Borntraeger
  34 siblings, 0 replies; 147+ messages in thread
From: Christian Borntraeger @ 2020-02-07 11:39 UTC (permalink / raw)
  To: Christian Borntraeger, Janosch Frank
  Cc: KVM, Cornelia Huck, David Hildenbrand, Thomas Huth,
	Ulrich Weigand, Claudio Imbrenda, Andrea Arcangeli, linux-s390,
	Michael Mueller, Vasily Gorbik

As PSW restart is handled by the ultravisor (and we only get a start
notification), we must re-check the PSW after a start before injecting
interrupts.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
---
 arch/s390/kvm/kvm-s390.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 8c7b27287b91..27365fea5f95 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -4452,6 +4452,13 @@ void kvm_s390_vcpu_start(struct kvm_vcpu *vcpu)
 	/* Let's tell the UV that we want to start again */
 	kvm_s390_pv_set_cpu_state(vcpu, PV_CPU_STATE_OPR);
 	kvm_s390_clear_cpuflags(vcpu, CPUSTAT_STOPPED);
+	/*
+	 * The real PSW might have changed due to a RESTART interpreted by the
+	 * ultravisor. We block all interrupts and let the next sie exit
+	 * refresh our view.
+	 */
+	if (kvm_s390_pv_is_protected(vcpu->kvm))
+		vcpu->arch.sie_block->gpsw.mask &= ~PSW_INT_MASK;
 	/*
 	 * Another VCPU might have used IBS while we were offline.
 	 * Let's play safe and flush the VCPU at startup.
-- 
2.24.0


* [PATCH 34/35] KVM: s390: protvirt: Add UV cpu reset calls
  2020-02-07 11:39 [PATCH 00/35] KVM: s390: Add support for protected VMs Christian Borntraeger
                   ` (32 preceding siblings ...)
  2020-02-07 11:39 ` [PATCH 33/35] KVM: s390: protvirt: do not inject interrupts after start Christian Borntraeger
@ 2020-02-07 11:39 ` Christian Borntraeger
  2020-02-10 13:17   ` Cornelia Huck
  2020-02-07 11:39 ` [PATCH 35/35] DOCUMENTATION: Protected virtual machine introduction and IPL Christian Borntraeger
  34 siblings, 1 reply; 147+ messages in thread
From: Christian Borntraeger @ 2020-02-07 11:39 UTC (permalink / raw)
  To: Christian Borntraeger, Janosch Frank
  Cc: KVM, Cornelia Huck, David Hildenbrand, Thomas Huth,
	Ulrich Weigand, Claudio Imbrenda, Andrea Arcangeli, linux-s390,
	Michael Mueller, Vasily Gorbik, Janosch Frank

From: Janosch Frank <frankja@linux.ibm.com>

For protected VMs, the VCPU resets are done by the Ultravisor, as KVM
has no access to the VCPU registers.

As the Ultravisor will only accept a call for the reset that is
needed, we need to fence the UV calls when chaining resets.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
[borntraeger@de.ibm.com: patch merging, splitting, fixing]
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
 arch/s390/kvm/kvm-s390.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 27365fea5f95..a56660607fd5 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -4706,6 +4706,7 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
 	void __user *argp = (void __user *)arg;
 	int idx;
 	long r;
+	u32 uvret;
 
 	vcpu_load(vcpu);
 
@@ -4727,14 +4728,33 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
 	case KVM_S390_CLEAR_RESET:
 		r = 0;
 		kvm_arch_vcpu_ioctl_clear_reset(vcpu);
+		if (kvm_s390_pv_handle_cpu(vcpu)) {
+			r = uv_cmd_nodata(kvm_s390_pv_handle_cpu(vcpu),
+					  UVC_CMD_CPU_RESET_CLEAR, &uvret);
+			VCPU_EVENT(vcpu, 3, "PROTVIRT RESET CLEAR VCPU: rc %x rrc %x",
+				   uvret >> 16, uvret & 0x0000ffff);
+		}
 		break;
 	case KVM_S390_INITIAL_RESET:
 		r = 0;
 		kvm_arch_vcpu_ioctl_initial_reset(vcpu);
+		if (kvm_s390_pv_handle_cpu(vcpu)) {
+			r = uv_cmd_nodata(kvm_s390_pv_handle_cpu(vcpu),
+					  UVC_CMD_CPU_RESET_INITIAL,
+					  &uvret);
+			VCPU_EVENT(vcpu, 3, "PROTVIRT RESET INITIAL VCPU: rc %x rrc %x",
+				   uvret >> 16, uvret & 0x0000ffff);
+		}
 		break;
 	case KVM_S390_NORMAL_RESET:
 		r = 0;
 		kvm_arch_vcpu_ioctl_normal_reset(vcpu);
+		if (kvm_s390_pv_handle_cpu(vcpu)) {
+			r = uv_cmd_nodata(kvm_s390_pv_handle_cpu(vcpu),
+					  UVC_CMD_CPU_RESET, &uvret);
+			VCPU_EVENT(vcpu, 3, "PROTVIRT RESET NORMAL VCPU: rc %x rrc %x",
+				   uvret >> 16, uvret & 0x0000ffff);
+		}
 		break;
 	case KVM_SET_ONE_REG:
 	case KVM_GET_ONE_REG: {
-- 
2.24.0

^ permalink raw reply related	[flat|nested] 147+ messages in thread

* [PATCH 35/35] DOCUMENTATION: Protected virtual machine introduction and IPL
  2020-02-07 11:39 [PATCH 00/35] KVM: s390: Add support for protected VMs Christian Borntraeger
                   ` (33 preceding siblings ...)
  2020-02-07 11:39 ` [PATCH 34/35] KVM: s390: protvirt: Add UV cpu reset calls Christian Borntraeger
@ 2020-02-07 11:39 ` Christian Borntraeger
  2020-02-11 12:23   ` Thomas Huth
  2020-02-12 11:01   ` Cornelia Huck
  34 siblings, 2 replies; 147+ messages in thread
From: Christian Borntraeger @ 2020-02-07 11:39 UTC (permalink / raw)
  To: Christian Borntraeger, Janosch Frank
  Cc: KVM, Cornelia Huck, David Hildenbrand, Thomas Huth,
	Ulrich Weigand, Claudio Imbrenda, Andrea Arcangeli, linux-s390,
	Michael Mueller, Vasily Gorbik, Janosch Frank

From: Janosch Frank <frankja@linux.ibm.com>

Add documentation about protected KVM guests and description of changes
that are necessary to move a KVM VM into Protected Virtualization mode.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
[borntraeger@de.ibm.com: fixing and conversion to rst]
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
 Documentation/virt/kvm/index.rst        |   2 +
 Documentation/virt/kvm/s390-pv-boot.rst |  79 ++++++++++++++++
 Documentation/virt/kvm/s390-pv.rst      | 116 ++++++++++++++++++++++++
 MAINTAINERS                             |   1 +
 4 files changed, 198 insertions(+)
 create mode 100644 Documentation/virt/kvm/s390-pv-boot.rst
 create mode 100644 Documentation/virt/kvm/s390-pv.rst

diff --git a/Documentation/virt/kvm/index.rst b/Documentation/virt/kvm/index.rst
index ada224a511fe..9a3b3fff18aa 100644
--- a/Documentation/virt/kvm/index.rst
+++ b/Documentation/virt/kvm/index.rst
@@ -10,3 +10,5 @@ KVM
    amd-memory-encryption
    cpuid
    vcpu-requests
+   s390-pv
+   s390-pv-boot
diff --git a/Documentation/virt/kvm/s390-pv-boot.rst b/Documentation/virt/kvm/s390-pv-boot.rst
new file mode 100644
index 000000000000..47814e53369a
--- /dev/null
+++ b/Documentation/virt/kvm/s390-pv-boot.rst
@@ -0,0 +1,79 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+======================================
+s390 (IBM Z) Boot/IPL of Protected VMs
+======================================
+
+Summary
+-------
+Protected Virtual Machines (PVM) are not accessible by I/O or the
+hypervisor.  When the hypervisor wants to access the memory of a PVM,
+the memory first needs to be made accessible. When doing so, the
+memory will be encrypted.  See :doc:`s390-pv` for details.
+
+On IPL, a small plaintext bootloader is started, which provides KVM
+with information about the encrypted components and the metadata
+necessary to decrypt the protected virtual machine.
+
+Based on this data, KVM will make the protected virtual machine known
+to the Ultravisor (UV) and instruct it to secure the memory of the PVM,
+decrypt the components and verify the data and address list hashes, to
+ensure integrity. Afterwards KVM can run the PVM via the SIE
+instruction which the UV will intercept and execute on KVM's behalf.
+
+The switch into PV mode lets us load encrypted guest executables and
+data via every available method (network, dasd, scsi, direct kernel,
+...) without the need to change the boot process.
+
+
+Diag308
+-------
+This diagnose instruction is the basis for VM IPL. The VM can set and
+retrieve IPL information blocks that specify the IPL method/devices,
+and request VM memory and subsystem resets, as well as IPLs.
+
+For PVs this concept has been extended with new subcodes:
+
+Subcode 8: Set an IPL Information Block of type 5 (information block
+for PVMs)
+Subcode 9: Store the saved block in guest memory
+Subcode 10: Move into Protected Virtualization mode
+
+The new PV load-device-specific-parameters field specifies all data
+that is necessary to move into PV mode.
+
+* PV Header origin
+* PV Header length
+* List of Components composed of
+   * AES-XTS Tweak prefix
+   * Origin
+   * Size
+
+The PV header contains the keys and hashes, which the UV will use to
+decrypt and verify the PV, as well as control flags and a start PSW.
+
+The components are, for instance, an encrypted kernel, the kernel
+command line and an initrd. The components are decrypted by the UV.
+
+All guest data that was not decrypted during the switch to protected
+virtualization mode reads as zero on first access by the PV.
+
+
+When running in protected mode some subcodes will result in exceptions
+or return error codes.
+
+Subcodes 4 and 7 will result in specification exceptions as they would
+not clear out the guest memory.
+When removing a secure VM, the UV will clear all memory, so we can't
+have non-clearing IPL subcodes.
+
+Subcodes 8, 9, 10 will result in specification exceptions.
+Re-IPL into protected mode is only possible via a detour through
+non-protected mode.
+
+Keys
+----
+Every CEC will have a unique public key to enable tooling to build
+encrypted images.
+See `s390-tools <https://github.com/ibm-s390-tools/s390-tools/>`_
+for the tooling.
diff --git a/Documentation/virt/kvm/s390-pv.rst b/Documentation/virt/kvm/s390-pv.rst
new file mode 100644
index 000000000000..dbe9110dfd1e
--- /dev/null
+++ b/Documentation/virt/kvm/s390-pv.rst
@@ -0,0 +1,116 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=========================================
+s390 (IBM Z) Ultravisor and Protected VMs
+=========================================
+
+Summary
+-------
+Protected virtual machines (PVM) are KVM VMs whose state, such as
+guest memory and guest registers, can no longer be accessed by
+KVM. Instead, the PVMs are mostly managed by a new entity called
+the Ultravisor (UV). The UV provides an API that can be used by
+PVMs and KVM to request management actions.
+
+Each guest starts in non-protected mode and then may make a
+request to transition into protected mode. On transition, KVM
+registers the guest and its VCPUs with the Ultravisor and prepares
+everything for running it.
+
+The Ultravisor will secure and decrypt the guest's boot memory
+(i.e. kernel/initrd). It will safeguard state changes like VCPU
+starts/stops and injected interrupts while the guest is running.
+
+As access to the guest's state, such as the SIE state description, is
+normally needed to be able to run a VM, some changes have been made in
+SIE behavior. A new format 4 state description has been introduced,
+where some fields have different meanings for a PVM. SIE exits are
+minimized as much as possible to improve speed and reduce exposed
+guest state.
+
+
+Interrupt injection
+-------------------
+Interrupt injection is safeguarded by the Ultravisor. As KVM doesn't
+have access to the VCPUs' lowcores, injection is handled via the
+format 4 state description.
+
+Machine check, external, I/O and restart interruptions can each be
+injected on SIE entry via a bit in the interrupt injection control
+field (offset 0x54). If the guest cpu is not enabled for the interrupt
+at the time of injection, a validity interception is recognized. The
+format 4 state description contains fields in the interception data
+block where data associated with the interrupt can be transported.
+
+Program and Service Call exceptions have another layer of
+safeguarding; they can only be injected for instructions that have
+been intercepted into KVM. The exceptions need to be a valid outcome
+of an instruction emulation by KVM, e.g. we can never inject an
+addressing exception, as those are reported by SIE directly, since
+KVM has no access to the guest memory.
+
+
+Mask notification interceptions
+-------------------------------
+KVM needs to be notified when a PVM enables a certain class of
+interrupt, but it can no longer intercept lctl(g) and lpsw(e). As a
+replacement, two new interception codes have been introduced: one
+indicating that the contents of CRs 0, 6, or 14 have been changed,
+indicating different interruption subclasses; and one indicating
+that PSW bit 13 has been changed, i.e. that a machine check
+intervention was requested and machine checks are now enabled.
+
+Instruction emulation
+---------------------
+With the format 4 state description for PVMs, the SIE instruction already
+interprets more instructions than it does with format 2. It is not able
+to interpret every instruction, but needs to hand some tasks to KVM;
+therefore, the SIE and the Ultravisor safeguard emulation inputs and outputs.
+
+The control structures associated with SIE provide the Secure
+Instruction Data Area (SIDA), the Interception Parameters (IP) and the
+Secure Interception General Register Save Area.  Guest GRs and most of
+the instruction data, such as I/O data structures, are filtered.
+Instruction data is copied to and from the Secure Instruction Data
+Area (SIDA) when needed.  Guest GRs are put into / retrieved from the
+Secure Interception General Register Save Area.
+
+Only GR values needed to emulate an instruction will be copied into this
+save area and the real register numbers will be hidden.
+
+The Interception Parameters state description field still contains
+the bytes of the instruction text, but with pre-set register values
+instead of the actual ones. I.e. each instruction always uses the same
+instruction text, in order not to leak guest instruction text.
+This also implies that the register content that a guest had in r<n>
+may be in r<m> from the hypervisor's point of view.
+
+The Secure Instruction Data Area contains instruction storage
+data. Instruction data, i.e. data being referenced by an instruction
+like the SCCB for sclp, is moved over the SIDA. When an instruction is
+intercepted, the SIE will only allow data and program interrupts for
+this instruction to be moved to the guest via the two data areas
+discussed before. Other data is either ignored or results in validity
+interceptions.
+
+
+Instruction emulation interceptions
+-----------------------------------
+There are two types of SIE secure instruction intercepts: the normal
+and the notification type. Normal secure instruction intercepts will
+make the guest pending for completion of the intercepted instruction,
+i.e. on SIE entry, SIE attempts to complete emulation of the
+instruction with the data provided by KVM. That might be a program
+exception or instruction completion.
+
+The notification type intercepts inform KVM about guest environment
+changes due to guest instruction interpretation. Such an interception
+is recognized, for example, for the store prefix instruction to provide
+the new lowcore location. On SIE reentry, any KVM data in the data areas
+is ignored and execution continues as if the guest instruction had
+completed. For that reason KVM is not allowed to inject a program
+interrupt.
+
+Links
+-----
+`KVM Forum 2019 presentation <https://static.sched.com/hosted_files/kvmforum2019/3b/ibm_protected_vms_s390x.pdf>`_
diff --git a/MAINTAINERS b/MAINTAINERS
index 56765f542244..90da412bebd9 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9106,6 +9106,7 @@ L:	kvm@vger.kernel.org
 W:	http://www.ibm.com/developerworks/linux/linux390/
 T:	git git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux.git
 S:	Supported
+F:	Documentation/virt/kvm/s390*
 F:	arch/s390/include/uapi/asm/kvm*
 F:	arch/s390/include/asm/gmap.h
 F:	arch/s390/include/asm/kvm*
-- 
2.24.0

^ permalink raw reply related	[flat|nested] 147+ messages in thread

* Re: [PATCH 07/35] KVM: s390: add new variants of UV CALL
  2020-02-07 11:39 ` [PATCH 07/35] KVM: s390: add new variants of UV CALL Christian Borntraeger
@ 2020-02-07 14:34   ` Thomas Huth
  2020-02-07 15:03     ` Christian Borntraeger
  2020-02-10 12:16   ` Cornelia Huck
  2020-02-14 18:28   ` David Hildenbrand
  2 siblings, 1 reply; 147+ messages in thread
From: Thomas Huth @ 2020-02-07 14:34 UTC (permalink / raw)
  To: Christian Borntraeger, Janosch Frank
  Cc: KVM, Cornelia Huck, David Hildenbrand, Ulrich Weigand,
	Claudio Imbrenda, Andrea Arcangeli, linux-s390, Michael Mueller,
	Vasily Gorbik, Janosch Frank

On 07/02/2020 12.39, Christian Borntraeger wrote:
> From: Janosch Frank <frankja@linux.ibm.com>
> 
> This add 2 new variants of the UV CALL.
> 
> The first variant handles UV CALLs that might have longer busy
> conditions or just need longer when doing partial completion. We should
> schedule when necessary.
> 
> The second variant handles UV CALLs that only need the handle but have
> no payload (e.g. destroying a VM). We can provide a simple wrapper for
> those.
> 
> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
> [borntraeger@de.ibm.com: patch merging, splitting, fixing]
> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
> ---
>  arch/s390/include/asm/uv.h | 59 ++++++++++++++++++++++++++++++++++++++
>  1 file changed, 59 insertions(+)
> 
> diff --git a/arch/s390/include/asm/uv.h b/arch/s390/include/asm/uv.h
> index 1b97230a57ba..e1cef772fde1 100644
> --- a/arch/s390/include/asm/uv.h
> +++ b/arch/s390/include/asm/uv.h
> @@ -14,6 +14,7 @@
>  #include <linux/types.h>
>  #include <linux/errno.h>
>  #include <linux/bug.h>
> +#include <linux/sched.h>
>  #include <asm/page.h>
>  #include <asm/gmap.h>
>  
> @@ -91,6 +92,19 @@ struct uv_cb_cfs {
>  	u64 paddr;
>  } __packed __aligned(8);
>  
> +/*
> + * A common UV call struct for calls that take no payload
> + * Examples:
> + * Destroy cpu/config
> + * Verify
> + */
> +struct uv_cb_nodata {
> +	struct uv_cb_header header;
> +	u64 reserved08[2];
> +	u64 handle;
> +	u64 reserved20[4];
> +} __packed __aligned(8);
> +
>  struct uv_cb_share {
>  	struct uv_cb_header header;
>  	u64 reserved08[3];
> @@ -98,6 +112,31 @@ struct uv_cb_share {
>  	u64 reserved28;
>  } __packed __aligned(8);
>  
> +/*
> + * Low level uv_call that takes r1 and r2 as parameter and avoids
> + * stalls for long running busy conditions by doing schedule
> + */
> +static inline int uv_call_sched(unsigned long r1, unsigned long r2)
> +{
> +	int cc;
> +
> +	do {
> +		asm volatile(
> +			"0:	.insn rrf,0xB9A40000,%[r1],%[r2],0,0\n"
> +			"		ipm	%[cc]\n"
> +			"		srl	%[cc],28\n"

Maybe remove one TAB before "ipm" and "srl" ?

Apart from that, patch looks fine to me now.

Reviewed-by: Thomas Huth <thuth@redhat.com>

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH 07/35] KVM: s390: add new variants of UV CALL
  2020-02-07 14:34   ` Thomas Huth
@ 2020-02-07 15:03     ` Christian Borntraeger
  0 siblings, 0 replies; 147+ messages in thread
From: Christian Borntraeger @ 2020-02-07 15:03 UTC (permalink / raw)
  To: Thomas Huth, Janosch Frank
  Cc: KVM, Cornelia Huck, David Hildenbrand, Ulrich Weigand,
	Claudio Imbrenda, Andrea Arcangeli, linux-s390, Michael Mueller,
	Vasily Gorbik, Janosch Frank




On 07.02.20 15:34, Thomas Huth wrote:
[...]

+static inline int uv_call_sched(unsigned long r1, unsigned long r2)
>> +{
>> +	int cc;
>> +
>> +	do {
>> +		asm volatile(
>> +			"0:	.insn rrf,0xB9A40000,%[r1],%[r2],0,0\n"
>> +			"		ipm	%[cc]\n"
>> +			"		srl	%[cc],28\n"
> 
> Maybe remove one TAB before "ipm" and "srl" ?

ack
> 
> Apart from that, patch looks fine to me now.
> 
> Reviewed-by: Thomas Huth <thuth@redhat.com>
> 

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH 08/35] KVM: s390: protvirt: Add initial lifecycle handling
  2020-02-07 11:39 ` [PATCH 08/35] KVM: s390: protvirt: Add initial lifecycle handling Christian Borntraeger
@ 2020-02-07 16:32   ` Thomas Huth
  2020-02-10  8:34     ` Christian Borntraeger
  2020-02-08 14:54   ` Thomas Huth
  2020-02-14 18:39   ` [PATCH 08/35] KVM: s390: protvirt: Add initial lifecycle handling David Hildenbrand
  2 siblings, 1 reply; 147+ messages in thread
From: Thomas Huth @ 2020-02-07 16:32 UTC (permalink / raw)
  To: Christian Borntraeger, Janosch Frank
  Cc: KVM, Cornelia Huck, David Hildenbrand, Ulrich Weigand,
	Claudio Imbrenda, Andrea Arcangeli, linux-s390, Michael Mueller,
	Vasily Gorbik, Janosch Frank

On 07/02/2020 12.39, Christian Borntraeger wrote:
> From: Janosch Frank <frankja@linux.ibm.com>
> 
> This contains 3 main changes:
> 1. changes in SIE control block handling for secure guests
> 2. helper functions for create/destroy/unpack secure guests
> 3. KVM_S390_PV_COMMAND ioctl to allow userspace dealing with secure
> machines
> 
> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
> [borntraeger@de.ibm.com: patch merging, splitting, fixing]
> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
> ---
[...]
> diff --git a/arch/s390/include/asm/uv.h b/arch/s390/include/asm/uv.h
> index e1cef772fde1..7c21d55d2e49 100644
> --- a/arch/s390/include/asm/uv.h
> +++ b/arch/s390/include/asm/uv.h
> @@ -23,11 +23,19 @@
>  #define UVC_RC_INV_STATE	0x0003
>  #define UVC_RC_INV_LEN		0x0005
>  #define UVC_RC_NO_RESUME	0x0007
> +#define UVC_RC_NEED_DESTROY	0x8000

This define is never used. I'd suggest to drop it.

The rest of the patch looks ok to me.

 Thomas

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH 08/35] KVM: s390: protvirt: Add initial lifecycle handling
  2020-02-07 11:39 ` [PATCH 08/35] KVM: s390: protvirt: Add initial lifecycle handling Christian Borntraeger
  2020-02-07 16:32   ` Thomas Huth
@ 2020-02-08 14:54   ` Thomas Huth
  2020-02-10 11:43     ` Christian Borntraeger
  2020-02-14 18:39   ` [PATCH 08/35] KVM: s390: protvirt: Add initial lifecycle handling David Hildenbrand
  2 siblings, 1 reply; 147+ messages in thread
From: Thomas Huth @ 2020-02-08 14:54 UTC (permalink / raw)
  To: Christian Borntraeger, Janosch Frank
  Cc: KVM, Cornelia Huck, David Hildenbrand, Ulrich Weigand,
	Claudio Imbrenda, Andrea Arcangeli, linux-s390, Michael Mueller,
	Vasily Gorbik, Janosch Frank

On 07/02/2020 12.39, Christian Borntraeger wrote:
> From: Janosch Frank <frankja@linux.ibm.com>
> 
> This contains 3 main changes:
> 1. changes in SIE control block handling for secure guests
> 2. helper functions for create/destroy/unpack secure guests
> 3. KVM_S390_PV_COMMAND ioctl to allow userspace dealing with secure
> machines
> 
> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
> [borntraeger@de.ibm.com: patch merging, splitting, fixing]
> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
> ---
>  arch/s390/include/asm/kvm_host.h |  24 ++-
>  arch/s390/include/asm/uv.h       |  69 +++++++++
>  arch/s390/kvm/Makefile           |   2 +-
>  arch/s390/kvm/kvm-s390.c         | 191 +++++++++++++++++++++++-
>  arch/s390/kvm/kvm-s390.h         |  27 ++++
>  arch/s390/kvm/pv.c               | 244 +++++++++++++++++++++++++++++++
>  include/uapi/linux/kvm.h         |  33 +++++
>  7 files changed, 586 insertions(+), 4 deletions(-)
>  create mode 100644 arch/s390/kvm/pv.c
[...]
> +struct kvm_pv_cmd {
> +	__u32	cmd;	/* Command to be executed */
> +	__u16	rc;	/* Ultravisor return code */
> +	__u16	rrc;	/* Ultravisor return reason code */

What are rc and rrc good for? I currently can't spot the code where they
are used...

> +	__u64	data;	/* Data or address */
> +};
> +
> +/* Available with KVM_CAP_S390_PROTECTED */
> +#define KVM_S390_PV_COMMAND		_IOW(KVMIO, 0xc5, struct kvm_pv_cmd)
> +#define KVM_S390_PV_COMMAND_VCPU	_IOW(KVMIO, 0xc6, struct kvm_pv_cmd)

If you intend to return values in rc and rrc, shouldn't this rather be
declared as _IOWR instead ?

 Thomas

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH 09/35] KVM: s390: protvirt: Add KVM api documentation
  2020-02-07 11:39 ` [PATCH 09/35] KVM: s390: protvirt: Add KVM api documentation Christian Borntraeger
@ 2020-02-08 14:57   ` Thomas Huth
  2020-02-10 12:26     ` Christian Borntraeger
  0 siblings, 1 reply; 147+ messages in thread
From: Thomas Huth @ 2020-02-08 14:57 UTC (permalink / raw)
  To: Christian Borntraeger, Janosch Frank
  Cc: KVM, Cornelia Huck, David Hildenbrand, Ulrich Weigand,
	Claudio Imbrenda, Andrea Arcangeli, linux-s390, Michael Mueller,
	Vasily Gorbik, Janosch Frank

On 07/02/2020 12.39, Christian Borntraeger wrote:
> From: Janosch Frank <frankja@linux.ibm.com>
> 
> Add documentation for KVM_CAP_S390_PROTECTED capability and the
> KVM_S390_PV_COMMAND and KVM_S390_PV_COMMAND_VCPU ioctls.
> 
> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
> [borntraeger@de.ibm.com: patch merging, splitting, fixing]
> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
> ---
>  Documentation/virt/kvm/api.txt | 61 ++++++++++++++++++++++++++++++++++
>  1 file changed, 61 insertions(+)
> 
> diff --git a/Documentation/virt/kvm/api.txt b/Documentation/virt/kvm/api.txt
> index 73448764f544..4874d42286ca 100644
> --- a/Documentation/virt/kvm/api.txt
> +++ b/Documentation/virt/kvm/api.txt
> @@ -4204,6 +4204,60 @@ the clear cpu reset definition in the POP. However, the cpu is not put
>  into ESA mode. This reset is a superset of the initial reset.
>  
>  
> +4.125 KVM_S390_PV_COMMAND
> +
> +Capability: KVM_CAP_S390_PROTECTED
> +Architectures: s390
> +Type: vm ioctl
> +Parameters: struct kvm_pv_cmd
> +Returns: 0 on success, < 0 on error
> +
> +struct kvm_pv_cmd {
> +	__u32	cmd;	/* Command to be executed */
> +	__u16	rc;	/* Ultravisor return code */
> +	__u16	rrc;	/* Ultravisor return reason code */
> +	__u64	data;	/* Data or address */

That remindes me ... do we maybe want a "reserved" field in here for
future extensions? Or is the "data" pointer enough?

> +};
> +
> +cmd values:
> +KVM_PV_VM_CREATE
> +Allocate memory and register the VM with the Ultravisor, thereby
> +donating memory to the Ultravisor making it inaccessible to KVM.
> +
> +KVM_PV_VM_DESTROY
> +Deregisters the VM from the Ultravisor and frees memory that was
> +donated, so the kernel can use it again. All registered VCPUs have to
> +be unregistered beforehand and all memory has to be exported or
> +shared.
> +
> +KVM_PV_VM_SET_SEC_PARMS
> +Pass the image header from VM memory to the Ultravisor in preparation
> +of image unpacking and verification.
> +
> +KVM_PV_VM_UNPACK
> +Unpack (protect and decrypt) a page of the encrypted boot image.
> +
> +KVM_PV_VM_VERIFY
> +Verify the integrity of the unpacked image. Only if this succeeds, KVM
> +is allowed to start protected VCPUs.

You also don't mention rc and rrc here ... yet another indication that
it is unused?

 Thomas

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH 23/35] KVM: s390: protvirt: STSI handling
  2020-02-07 11:39 ` [PATCH 23/35] KVM: s390: protvirt: STSI handling Christian Borntraeger
@ 2020-02-08 15:01   ` Thomas Huth
  2020-02-11 10:55   ` Cornelia Huck
  1 sibling, 0 replies; 147+ messages in thread
From: Thomas Huth @ 2020-02-08 15:01 UTC (permalink / raw)
  To: Christian Borntraeger, Janosch Frank
  Cc: KVM, Cornelia Huck, David Hildenbrand, Ulrich Weigand,
	Claudio Imbrenda, Andrea Arcangeli, linux-s390, Michael Mueller,
	Vasily Gorbik, Janosch Frank

On 07/02/2020 12.39, Christian Borntraeger wrote:
> From: Janosch Frank <frankja@linux.ibm.com>
> 
> Save response to sidad and disable address checking for protected
> guests.
> 
> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
> [borntraeger@de.ibm.com: patch merging, splitting, fixing]
> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
> ---
>  arch/s390/kvm/priv.c | 11 ++++++++---
>  1 file changed, 8 insertions(+), 3 deletions(-)

Reviewed-by: Thomas Huth <thuth@redhat.com>

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH 25/35] KVM: s390: protvirt: Only sync fmt4 registers
  2020-02-07 11:39 ` [PATCH 25/35] KVM: s390: protvirt: Only sync fmt4 registers Christian Borntraeger
@ 2020-02-09 15:50   ` Thomas Huth
  2020-02-10  9:33     ` Christian Borntraeger
  2020-02-11 10:51   ` Cornelia Huck
  1 sibling, 1 reply; 147+ messages in thread
From: Thomas Huth @ 2020-02-09 15:50 UTC (permalink / raw)
  To: Christian Borntraeger, Janosch Frank
  Cc: KVM, Cornelia Huck, David Hildenbrand, Ulrich Weigand,
	Claudio Imbrenda, Andrea Arcangeli, linux-s390, Michael Mueller,
	Vasily Gorbik, Janosch Frank

On 07/02/2020 12.39, Christian Borntraeger wrote:
> From: Janosch Frank <frankja@linux.ibm.com>
> 
> A lot of the registers are controlled by the Ultravisor and never
> visible to KVM. Also some registers are overlayed, like gbea is with
> sidad, which might leak data to userspace.
> 
> Hence we sync a minimal set of registers for both SIE formats and then
> check and sync format 2 registers if necessary.
> 
> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
> [borntraeger@de.ibm.com: patch merging, splitting, fixing]
> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
> ---
>  arch/s390/kvm/kvm-s390.c | 116 ++++++++++++++++++++++++---------------
>  1 file changed, 72 insertions(+), 44 deletions(-)
> 
> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> index f995040102ea..7df48cc942fd 100644
> --- a/arch/s390/kvm/kvm-s390.c
> +++ b/arch/s390/kvm/kvm-s390.c
> @@ -3447,9 +3447,11 @@ static void kvm_arch_vcpu_ioctl_initial_reset(struct kvm_vcpu *vcpu)
>  	vcpu->arch.sie_block->gcr[0] = CR0_INITIAL_MASK;
>  	vcpu->arch.sie_block->gcr[14] = CR14_INITIAL_MASK;
>  	vcpu->run->s.regs.fpc = 0;
> -	vcpu->arch.sie_block->gbea = 1;
> -	vcpu->arch.sie_block->pp = 0;
> -	vcpu->arch.sie_block->fpf &= ~FPF_BPBC;
> +	if (!kvm_s390_pv_handle_cpu(vcpu)) {
> +		vcpu->arch.sie_block->gbea = 1;
> +		vcpu->arch.sie_block->pp = 0;
> +		vcpu->arch.sie_block->fpf &= ~FPF_BPBC;
> +	}

Technically, this part is not about sync'ing but about reset ... worth
to mention this in the patch description, too? (or maybe even move to
the reset patch 34/35 or a new patch?)

And what about vcpu->arch.sie_block->todpr ? Should that be moved into
the if-statement, too?

>  }
>  
>  static void kvm_arch_vcpu_ioctl_clear_reset(struct kvm_vcpu *vcpu)
> @@ -4060,25 +4062,16 @@ static int __vcpu_run(struct kvm_vcpu *vcpu)
>  	return rc;
>  }
>  
> -static void sync_regs(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
> +static void sync_regs_fmt2(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
>  {
>  	struct runtime_instr_cb *riccb;
>  	struct gs_cb *gscb;
>  
> -	riccb = (struct runtime_instr_cb *) &kvm_run->s.regs.riccb;
> -	gscb = (struct gs_cb *) &kvm_run->s.regs.gscb;
>  	vcpu->arch.sie_block->gpsw.mask = kvm_run->psw_mask;
>  	vcpu->arch.sie_block->gpsw.addr = kvm_run->psw_addr;
> -	if (kvm_run->kvm_dirty_regs & KVM_SYNC_PREFIX)
> -		kvm_s390_set_prefix(vcpu, kvm_run->s.regs.prefix);
> -	if (kvm_run->kvm_dirty_regs & KVM_SYNC_CRS) {
> -		memcpy(&vcpu->arch.sie_block->gcr, &kvm_run->s.regs.crs, 128);
> -		/* some control register changes require a tlb flush */
> -		kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
> -	}
> +	riccb = (struct runtime_instr_cb *) &kvm_run->s.regs.riccb;
> +	gscb = (struct gs_cb *) &kvm_run->s.regs.gscb;

You could leave the riccb and gscb lines at the beginning to make the
diff a little bit nicer.

>  	if (kvm_run->kvm_dirty_regs & KVM_SYNC_ARCH0) {
> -		kvm_s390_set_cpu_timer(vcpu, kvm_run->s.regs.cputm);
> -		vcpu->arch.sie_block->ckc = kvm_run->s.regs.ckc;
>  		vcpu->arch.sie_block->todpr = kvm_run->s.regs.todpr;
>  		vcpu->arch.sie_block->pp = kvm_run->s.regs.pp;
>  		vcpu->arch.sie_block->gbea = kvm_run->s.regs.gbea;
> @@ -4119,6 +4112,47 @@ static void sync_regs(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
>  		vcpu->arch.sie_block->fpf &= ~FPF_BPBC;
>  		vcpu->arch.sie_block->fpf |= kvm_run->s.regs.bpbc ? FPF_BPBC : 0;
>  	}
> +	if (MACHINE_HAS_GS) {
> +		preempt_disable();
> +		__ctl_set_bit(2, 4);
> +		if (current->thread.gs_cb) {
> +			vcpu->arch.host_gscb = current->thread.gs_cb;
> +			save_gs_cb(vcpu->arch.host_gscb);
> +		}
> +		if (vcpu->arch.gs_enabled) {
> +			current->thread.gs_cb = (struct gs_cb *)
> +						&vcpu->run->s.regs.gscb;
> +			restore_gs_cb(current->thread.gs_cb);
> +		}
> +		preempt_enable();
> +	}
> +	/* SIE will load etoken directly from SDNX and therefore kvm_run */
> +}
> +
> +static void sync_regs(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
> +{
> +	/*
> +	 * at several places we have to modify our internal view to not do
> +	 * things that are disallowed by the ultravisor. For example we must
> +	 * not inject interrupts after specific exits (e.g. 112). We do this
> +	 * by turning off the MIE bits of our PSW copy. To avoid getting
> +	 * validity intercepts, we do only accept the condition code from
> +	 * userspace.
> +	 */
> +	vcpu->arch.sie_block->gpsw.mask &= ~PSW_MASK_CC;
> +	vcpu->arch.sie_block->gpsw.mask |= kvm_run->psw_mask & PSW_MASK_CC;

I think it would be cleaner to only do this for protected guests. You
could combine it with the call to sync_regs_fmt2():

	if (likely(!kvm_s390_pv_is_protected(vcpu->kvm))) {
		sync_regs_fmt2(vcpu, kvm_run);
	} else {
		vcpu->arch.sie_block->gpsw.mask &= ~PSW_MASK_CC;
		vcpu->arch.sie_block->gpsw.mask |= kvm_run->psw_mask &
						   PSW_MASK_CC;
	}

> +	if (kvm_run->kvm_dirty_regs & KVM_SYNC_PREFIX)
> +		kvm_s390_set_prefix(vcpu, kvm_run->s.regs.prefix);
> +	if (kvm_run->kvm_dirty_regs & KVM_SYNC_CRS) {
> +		memcpy(&vcpu->arch.sie_block->gcr, &kvm_run->s.regs.crs, 128);
> +		/* some control register changes require a tlb flush */
> +		kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
> +	}
> +	if (kvm_run->kvm_dirty_regs & KVM_SYNC_ARCH0) {
> +		kvm_s390_set_cpu_timer(vcpu, kvm_run->s.regs.cputm);
> +		vcpu->arch.sie_block->ckc = kvm_run->s.regs.ckc;
> +	}
>  	save_access_regs(vcpu->arch.host_acrs);
>  	restore_access_regs(vcpu->run->s.regs.acrs);
>  	/* save host (userspace) fprs/vrs */
> @@ -4133,23 +4167,31 @@ static void sync_regs(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
>  	if (test_fp_ctl(current->thread.fpu.fpc))
>  		/* User space provided an invalid FPC, let's clear it */
>  		current->thread.fpu.fpc = 0;
> +
> +	/* Sync fmt2 only data */
> +	if (likely(!kvm_s390_pv_is_protected(vcpu->kvm)))
> +		sync_regs_fmt2(vcpu, kvm_run);
> +	kvm_run->kvm_dirty_regs = 0;
> +}
> +
> +static void store_regs_fmt2(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
> +{
> +	kvm_run->s.regs.pp = vcpu->arch.sie_block->pp;
> +	kvm_run->s.regs.gbea = vcpu->arch.sie_block->gbea;
> +	kvm_run->s.regs.bpbc = (vcpu->arch.sie_block->fpf & FPF_BPBC) == FPF_BPBC;
>  	if (MACHINE_HAS_GS) {
> -		preempt_disable();
>  		__ctl_set_bit(2, 4);
> -		if (current->thread.gs_cb) {
> -			vcpu->arch.host_gscb = current->thread.gs_cb;
> -			save_gs_cb(vcpu->arch.host_gscb);
> -		}
> -		if (vcpu->arch.gs_enabled) {
> -			current->thread.gs_cb = (struct gs_cb *)
> -						&vcpu->run->s.regs.gscb;
> -			restore_gs_cb(current->thread.gs_cb);
> -		}
> +		if (vcpu->arch.gs_enabled)
> +			save_gs_cb(current->thread.gs_cb);
> +		preempt_disable();
> +		current->thread.gs_cb = vcpu->arch.host_gscb;
> +		restore_gs_cb(vcpu->arch.host_gscb);
>  		preempt_enable();
> +		if (!vcpu->arch.host_gscb)
> +			__ctl_clear_bit(2, 4);
> +		vcpu->arch.host_gscb = NULL;
>  	}
> -	/* SIE will load etoken directly from SDNX and therefore kvm_run */
> -
> -	kvm_run->kvm_dirty_regs = 0;
> +	/* SIE will save etoken directly into SDNX and therefore kvm_run */
>  }
>  
>  static void store_regs(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
> @@ -4161,12 +4203,9 @@ static void store_regs(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
>  	kvm_run->s.regs.cputm = kvm_s390_get_cpu_timer(vcpu);
>  	kvm_run->s.regs.ckc = vcpu->arch.sie_block->ckc;
>  	kvm_run->s.regs.todpr = vcpu->arch.sie_block->todpr;

TODPR handling has been move from sync_regs() to sync_regs_fmt2() ...
should this here move from store_regs() to store_regs_fmt2(), too?

And maybe you should also not read the sie_block->gpsw.addr (and some of
the control registers) field in store_regs() either, i.e. move the lines
to store_regs_fmt2()?

> -	kvm_run->s.regs.pp = vcpu->arch.sie_block->pp;
> -	kvm_run->s.regs.gbea = vcpu->arch.sie_block->gbea;
>  	kvm_run->s.regs.pft = vcpu->arch.pfault_token;
>  	kvm_run->s.regs.pfs = vcpu->arch.pfault_select;
>  	kvm_run->s.regs.pfc = vcpu->arch.pfault_compare;
> -	kvm_run->s.regs.bpbc = (vcpu->arch.sie_block->fpf & FPF_BPBC) == FPF_BPBC;
>  	save_access_regs(vcpu->run->s.regs.acrs);
>  	restore_access_regs(vcpu->arch.host_acrs);
>  	/* Save guest register state */
> @@ -4175,19 +4214,8 @@ static void store_regs(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
>  	/* Restore will be done lazily at return */
>  	current->thread.fpu.fpc = vcpu->arch.host_fpregs.fpc;
>  	current->thread.fpu.regs = vcpu->arch.host_fpregs.regs;
> -	if (MACHINE_HAS_GS) {
> -		__ctl_set_bit(2, 4);
> -		if (vcpu->arch.gs_enabled)
> -			save_gs_cb(current->thread.gs_cb);
> -		preempt_disable();
> -		current->thread.gs_cb = vcpu->arch.host_gscb;
> -		restore_gs_cb(vcpu->arch.host_gscb);
> -		preempt_enable();
> -		if (!vcpu->arch.host_gscb)
> -			__ctl_clear_bit(2, 4);
> -		vcpu->arch.host_gscb = NULL;
> -	}
> -	/* SIE will save etoken directly into SDNX and therefore kvm_run */
> +	if (likely(!kvm_s390_pv_is_protected(vcpu->kvm)))
> +		store_regs_fmt2(vcpu, kvm_run);
>  }
>  
>  int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
> 

 Thomas

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH 26/35] KVM: s390: protvirt: Add program exception injection
  2020-02-07 11:39 ` [PATCH 26/35] KVM: s390: protvirt: Add program exception injection Christian Borntraeger
@ 2020-02-09 15:52   ` Thomas Huth
  0 siblings, 0 replies; 147+ messages in thread
From: Thomas Huth @ 2020-02-09 15:52 UTC (permalink / raw)
  To: Christian Borntraeger, Janosch Frank
  Cc: KVM, Cornelia Huck, David Hildenbrand, Ulrich Weigand,
	Claudio Imbrenda, Andrea Arcangeli, linux-s390, Michael Mueller,
	Vasily Gorbik, Janosch Frank

On 07/02/2020 12.39, Christian Borntraeger wrote:
> From: Janosch Frank <frankja@linux.ibm.com>
> 
> Only two program exceptions can be injected for a protected guest:
> specification and operand.
> 
> For both, a code needs to be specified in the interrupt injection
> control of the state description, as the guest prefix page is not
> accessible to KVM for such guests.
> 
> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
> Reviewed-by: Cornelia Huck <cohuck@redhat.com>
> [borntraeger@de.ibm.com: patch merging, splitting, fixing]
> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
> ---
>  arch/s390/kvm/interrupt.c | 18 ++++++++++++++++++
>  1 file changed, 18 insertions(+)
> 
> diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
> index c28fa09cb557..2df6459ab98b 100644
> --- a/arch/s390/kvm/interrupt.c
> +++ b/arch/s390/kvm/interrupt.c
> @@ -837,6 +837,21 @@ static int __must_check __deliver_external_call(struct kvm_vcpu *vcpu)
>  	return rc ? -EFAULT : 0;
>  }
>  
> +static int __deliver_prog_pv(struct kvm_vcpu *vcpu, u16 code)
> +{
> +	switch (code) {
> +	case PGM_SPECIFICATION:
> +		vcpu->arch.sie_block->iictl = IICTL_CODE_SPECIFICATION;
> +		break;
> +	case PGM_OPERAND:
> +		vcpu->arch.sie_block->iictl = IICTL_CODE_OPERAND;
> +		break;
> +	default:
> +		return -EINVAL;
> +	}
> +	return 0;
> +}
> +
>  static int __must_check __deliver_prog(struct kvm_vcpu *vcpu)
>  {
>  	struct kvm_s390_local_interrupt *li = &vcpu->arch.local_int;
> @@ -857,6 +872,9 @@ static int __must_check __deliver_prog(struct kvm_vcpu *vcpu)
>  	trace_kvm_s390_deliver_interrupt(vcpu->vcpu_id, KVM_S390_PROGRAM_INT,
>  					 pgm_info.code, 0);
>  
> +	if (kvm_s390_pv_is_protected(vcpu->kvm))
> +		return __deliver_prog_pv(vcpu, pgm_info.code & ~PGM_PER);
> +
>  	switch (pgm_info.code & ~PGM_PER) {
>  	case PGM_AFX_TRANSLATION:
>  	case PGM_ASX_TRANSLATION:
> 

Reviewed-by: Thomas Huth <thuth@redhat.com>


* Re: [PATCH 28/35] KVM: s390: protvirt: UV calls diag308 0, 1
  2020-02-07 11:39 ` [PATCH 28/35] KVM: s390: protvirt: UV calls diag308 0, 1 Christian Borntraeger
@ 2020-02-09 16:03   ` Thomas Huth
  2020-02-10  8:45     ` Christian Borntraeger
  0 siblings, 1 reply; 147+ messages in thread
From: Thomas Huth @ 2020-02-09 16:03 UTC (permalink / raw)
  To: Christian Borntraeger, Janosch Frank
  Cc: KVM, Cornelia Huck, David Hildenbrand, Ulrich Weigand,
	Claudio Imbrenda, Andrea Arcangeli, linux-s390, Michael Mueller,
	Vasily Gorbik, Janosch Frank

On 07/02/2020 12.39, Christian Borntraeger wrote:
> From: Janosch Frank <frankja@linux.ibm.com>
> 
> diag 308 subcode 0 and 1 require KVM and Ultravisor interaction, since
> the cpus have to be set into multiple reset states.
> 
> * All cpus need to be stopped
> * The "unshare all" UVC needs to be executed
> * The "perform reset" UVC needs to be executed
> * The cpus need to be reset via the "set cpu state" UVC
> * The issuing cpu needs to set state 5 via "set cpu state"

Is the patch description still accurate here? The patch seems mostly
about adding two new UVCs, and not really about diag 308 ... ?

> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
> [borntraeger@de.ibm.com: patch merging, splitting, fixing]
> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
> ---
[...]
> diff --git a/arch/s390/kvm/diag.c b/arch/s390/kvm/diag.c
> index 3fb54ec2cf3e..390830385b9f 100644
> --- a/arch/s390/kvm/diag.c
> +++ b/arch/s390/kvm/diag.c
> @@ -13,6 +13,7 @@
>  #include <asm/pgalloc.h>
>  #include <asm/gmap.h>
>  #include <asm/virtio-ccw.h>
> +#include <asm/uv.h>
>  #include "kvm-s390.h"
>  #include "trace.h"
>  #include "trace-s390.h"

This single change to diag.c looks like it could either be removed, or
the hunk should belong to another patch.

> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> index 4afa44e3d1ed..0be18ac1afb5 100644
> --- a/arch/s390/kvm/kvm-s390.c
> +++ b/arch/s390/kvm/kvm-s390.c

 Thomas


* Re: [PATCH 32/35] KVM: s390: protvirt: Mask PSW interrupt bits for interception 104 and 112
  2020-02-07 11:39 ` [PATCH 32/35] KVM: s390: protvirt: Mask PSW interrupt bits for interception 104 and 112 Christian Borntraeger
@ 2020-02-09 16:07   ` Thomas Huth
  2020-02-10 13:28   ` Cornelia Huck
  1 sibling, 0 replies; 147+ messages in thread
From: Thomas Huth @ 2020-02-09 16:07 UTC (permalink / raw)
  To: Christian Borntraeger, Janosch Frank
  Cc: KVM, Cornelia Huck, David Hildenbrand, Ulrich Weigand,
	Claudio Imbrenda, Andrea Arcangeli, linux-s390, Michael Mueller,
	Vasily Gorbik, Janosch Frank

On 07/02/2020 12.39, Christian Borntraeger wrote:
> From: Janosch Frank <frankja@linux.ibm.com>
> 
> We're not allowed to inject interrupts on intercepts that leave the
> guest state in an "in-between" state where the next SIE entry will do a
> continuation.  Namely secure instruction interception and secure prefix
> interception.
> As our PSW is just a copy of the real one that will be replaced on the
> next exit, we can mask out the interrupt bits in the PSW to make sure
> that we do not inject anything.
> 
> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
> [borntraeger@de.ibm.com: patch merging, splitting, fixing]
> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
> ---
>  arch/s390/kvm/kvm-s390.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> index ced2bac251a6..8c7b27287b91 100644
> --- a/arch/s390/kvm/kvm-s390.c
> +++ b/arch/s390/kvm/kvm-s390.c
> @@ -4052,6 +4052,7 @@ static int vcpu_post_run(struct kvm_vcpu *vcpu, int exit_reason)
>  	return vcpu_post_run_fault_in_sie(vcpu);
>  }
>  
> +#define PSW_INT_MASK (PSW_MASK_EXT | PSW_MASK_IO | PSW_MASK_MCHECK)
>  static int __vcpu_run(struct kvm_vcpu *vcpu)
>  {
>  	int rc, exit_reason;
> @@ -4088,6 +4089,10 @@ static int __vcpu_run(struct kvm_vcpu *vcpu)
>  			memcpy(vcpu->run->s.regs.gprs,
>  			       sie_page->pv_grregs,
>  			       sizeof(sie_page->pv_grregs));
> +			if (vcpu->arch.sie_block->icptcode == ICPT_PV_INSTR ||
> +			    vcpu->arch.sie_block->icptcode == ICPT_PV_PREF) {
> +				vcpu->arch.sie_block->gpsw.mask &= ~PSW_INT_MASK;
> +			}
>  		}
>  		local_irq_disable();
>  		__enable_cpu_timer_accounting(vcpu);
> 

Reviewed-by: Thomas Huth <thuth@redhat.com>


* Re: [PATCH 08/35] KVM: s390: protvirt: Add initial lifecycle handling
  2020-02-07 16:32   ` Thomas Huth
@ 2020-02-10  8:34     ` Christian Borntraeger
  0 siblings, 0 replies; 147+ messages in thread
From: Christian Borntraeger @ 2020-02-10  8:34 UTC (permalink / raw)
  To: Thomas Huth, Janosch Frank
  Cc: KVM, Cornelia Huck, David Hildenbrand, Ulrich Weigand,
	Claudio Imbrenda, Andrea Arcangeli, linux-s390, Michael Mueller,
	Vasily Gorbik, Janosch Frank



On 07.02.20 17:32, Thomas Huth wrote:
> On 07/02/2020 12.39, Christian Borntraeger wrote:
>> From: Janosch Frank <frankja@linux.ibm.com>
>>
>> This contains 3 main changes:
>> 1. changes in SIE control block handling for secure guests
>> 2. helper functions for create/destroy/unpack secure guests
>> 3. KVM_S390_PV_COMMAND ioctl to allow userspace dealing with secure
>> machines
>>
>> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
>> [borntraeger@de.ibm.com: patch merging, splitting, fixing]
>> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
>> ---
> [...]
>> diff --git a/arch/s390/include/asm/uv.h b/arch/s390/include/asm/uv.h
>> index e1cef772fde1..7c21d55d2e49 100644
>> --- a/arch/s390/include/asm/uv.h
>> +++ b/arch/s390/include/asm/uv.h
>> @@ -23,11 +23,19 @@
>>  #define UVC_RC_INV_STATE	0x0003
>>  #define UVC_RC_INV_LEN		0x0005
>>  #define UVC_RC_NO_RESUME	0x0007
>> +#define UVC_RC_NEED_DESTROY	0x8000
> 
> This define is never used. I'd suggest to drop it.

It should be used in

diff --git a/arch/s390/kvm/pv.c b/arch/s390/kvm/pv.c
index da281d8dcc92..8cc927ca061f 100644
--- a/arch/s390/kvm/pv.c
+++ b/arch/s390/kvm/pv.c
@@ -189,7 +189,7 @@ int kvm_s390_pv_create_vm(struct kvm *kvm)
        /* Outputs */
        kvm->arch.pv.handle = uvcb.guest_handle;
 
-       if (rc && (uvcb.header.rc & 0x8000)) {
+       if (rc && (uvcb.header.rc & UVC_RC_NEED_DESTROY)) {
                kvm_s390_pv_destroy_vm(kvm);
                return -EINVAL;
        }


Will fix.


* Re: [PATCH 28/35] KVM: s390: protvirt: UV calls diag308 0, 1
  2020-02-09 16:03   ` Thomas Huth
@ 2020-02-10  8:45     ` Christian Borntraeger
  0 siblings, 0 replies; 147+ messages in thread
From: Christian Borntraeger @ 2020-02-10  8:45 UTC (permalink / raw)
  To: Thomas Huth, Janosch Frank
  Cc: KVM, Cornelia Huck, David Hildenbrand, Ulrich Weigand,
	Claudio Imbrenda, Andrea Arcangeli, linux-s390, Michael Mueller,
	Vasily Gorbik, Janosch Frank



On 09.02.20 17:03, Thomas Huth wrote:
> On 07/02/2020 12.39, Christian Borntraeger wrote:
>> From: Janosch Frank <frankja@linux.ibm.com>
>>
>> diag 308 subcode 0 and 1 require KVM and Ultravisor interaction, since
>> the cpus have to be set into multiple reset states.
>>
>> * All cpus need to be stopped
>> * The "unshare all" UVC needs to be executed
>> * The "perform reset" UVC needs to be executed
>> * The cpus need to be reset via the "set cpu state" UVC
>> * The issuing cpu needs to set state 5 via "set cpu state"
> 
> Is the patch description still accurate here? The patch seems mostly
> about adding two new UVCs, and not really about diag 308 ... ?

Yes, this patch seems a bit unordered; I messed that one up.
I will keep the UNSHARE_ALL and the KVM_PV_VM_PREP_RESET things,
as we call both for diag 308 subcode 0 and 1 (kexec and kdump).

Everything else belongs in other patches. 
Will move and improve the patch description.

> 
>> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
>> [borntraeger@de.ibm.com: patch merging, splitting, fixing]
>> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
>> ---
> [...]
>> diff --git a/arch/s390/kvm/diag.c b/arch/s390/kvm/diag.c
>> index 3fb54ec2cf3e..390830385b9f 100644
>> --- a/arch/s390/kvm/diag.c
>> +++ b/arch/s390/kvm/diag.c
>> @@ -13,6 +13,7 @@
>>  #include <asm/pgalloc.h>
>>  #include <asm/gmap.h>
>>  #include <asm/virtio-ccw.h>
>> +#include <asm/uv.h>
>>  #include "kvm-s390.h"
>>  #include "trace.h"
>>  #include "trace-s390.h"
> 
> This single change to diag.c looks like it could either be removed, or
> the hunk should belong to another patch.
> 
>> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
>> index 4afa44e3d1ed..0be18ac1afb5 100644
>> --- a/arch/s390/kvm/kvm-s390.c
>> +++ b/arch/s390/kvm/kvm-s390.c
> 
>  Thomas
> 


* Re: [PATCH 25/35] KVM: s390: protvirt: Only sync fmt4 registers
  2020-02-09 15:50   ` Thomas Huth
@ 2020-02-10  9:33     ` Christian Borntraeger
  0 siblings, 0 replies; 147+ messages in thread
From: Christian Borntraeger @ 2020-02-10  9:33 UTC (permalink / raw)
  To: Thomas Huth, Janosch Frank
  Cc: KVM, Cornelia Huck, David Hildenbrand, Ulrich Weigand,
	Claudio Imbrenda, Andrea Arcangeli, linux-s390, Michael Mueller,
	Vasily Gorbik, Janosch Frank

On 09.02.20 16:50, Thomas Huth wrote:
> On 07/02/2020 12.39, Christian Borntraeger wrote:
>> From: Janosch Frank <frankja@linux.ibm.com>
>>
>> A lot of the registers are controlled by the Ultravisor and never
>> visible to KVM. Also some registers are overlaid, like gbea is with
>> sidad, which might leak data to userspace.
>>
>> Hence we sync a minimal set of registers for both SIE formats and then
>> check and sync format 2 registers if necessary.
>>
>> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
>> [borntraeger@de.ibm.com: patch merging, splitting, fixing]
>> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
>> ---
>>  arch/s390/kvm/kvm-s390.c | 116 ++++++++++++++++++++++++---------------
>>  1 file changed, 72 insertions(+), 44 deletions(-)
>>
>> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
>> index f995040102ea..7df48cc942fd 100644
>> --- a/arch/s390/kvm/kvm-s390.c
>> +++ b/arch/s390/kvm/kvm-s390.c
>> @@ -3447,9 +3447,11 @@ static void kvm_arch_vcpu_ioctl_initial_reset(struct kvm_vcpu *vcpu)
>>  	vcpu->arch.sie_block->gcr[0] = CR0_INITIAL_MASK;
>>  	vcpu->arch.sie_block->gcr[14] = CR14_INITIAL_MASK;
>>  	vcpu->run->s.regs.fpc = 0;
>> -	vcpu->arch.sie_block->gbea = 1;
>> -	vcpu->arch.sie_block->pp = 0;
>> -	vcpu->arch.sie_block->fpf &= ~FPF_BPBC;
>> +	if (!kvm_s390_pv_handle_cpu(vcpu)) {
>> +		vcpu->arch.sie_block->gbea = 1;
>> +		vcpu->arch.sie_block->pp = 0;
>> +		vcpu->arch.sie_block->fpf &= ~FPF_BPBC;
>> +	}
> 
> Technically, this part is not about sync'ing but about reset ... worth
> mentioning in the patch description, too? (or maybe even move to
> the reset patch 34/35 or a new patch?)

Will move into a separate patch. 
> 
> And what about vcpu->arch.sie_block->todpr ? Should that be moved into
> the if-statement, too?

Yes, todpr is not accessible by KVM and should go in here.


> 
>>  }
>>  
>>  static void kvm_arch_vcpu_ioctl_clear_reset(struct kvm_vcpu *vcpu)
>> @@ -4060,25 +4062,16 @@ static int __vcpu_run(struct kvm_vcpu *vcpu)
>>  	return rc;
>>  }
>>  
>> -static void sync_regs(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
>> +static void sync_regs_fmt2(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
>>  {
>>  	struct runtime_instr_cb *riccb;
>>  	struct gs_cb *gscb;
>>  
>> -	riccb = (struct runtime_instr_cb *) &kvm_run->s.regs.riccb;
>> -	gscb = (struct gs_cb *) &kvm_run->s.regs.gscb;
>>  	vcpu->arch.sie_block->gpsw.mask = kvm_run->psw_mask;
>>  	vcpu->arch.sie_block->gpsw.addr = kvm_run->psw_addr;
>> -	if (kvm_run->kvm_dirty_regs & KVM_SYNC_PREFIX)
>> -		kvm_s390_set_prefix(vcpu, kvm_run->s.regs.prefix);
>> -	if (kvm_run->kvm_dirty_regs & KVM_SYNC_CRS) {
>> -		memcpy(&vcpu->arch.sie_block->gcr, &kvm_run->s.regs.crs, 128);
>> -		/* some control register changes require a tlb flush */
>> -		kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
>> -	}
>> +	riccb = (struct runtime_instr_cb *) &kvm_run->s.regs.riccb;
>> +	gscb = (struct gs_cb *) &kvm_run->s.regs.gscb;
> 
> You could leave the riccb and gscb lines at the beginning to make the
> diff a little bit nicer.

ack.
> 
>>  	if (kvm_run->kvm_dirty_regs & KVM_SYNC_ARCH0) {
>> -		kvm_s390_set_cpu_timer(vcpu, kvm_run->s.regs.cputm);
>> -		vcpu->arch.sie_block->ckc = kvm_run->s.regs.ckc;
>>  		vcpu->arch.sie_block->todpr = kvm_run->s.regs.todpr;
>>  		vcpu->arch.sie_block->pp = kvm_run->s.regs.pp;
>>  		vcpu->arch.sie_block->gbea = kvm_run->s.regs.gbea;
>> @@ -4119,6 +4112,47 @@ static void sync_regs(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
>>  		vcpu->arch.sie_block->fpf &= ~FPF_BPBC;
>>  		vcpu->arch.sie_block->fpf |= kvm_run->s.regs.bpbc ? FPF_BPBC : 0;
>>  	}
>> +	if (MACHINE_HAS_GS) {
>> +		preempt_disable();
>> +		__ctl_set_bit(2, 4);
>> +		if (current->thread.gs_cb) {
>> +			vcpu->arch.host_gscb = current->thread.gs_cb;
>> +			save_gs_cb(vcpu->arch.host_gscb);
>> +		}
>> +		if (vcpu->arch.gs_enabled) {
>> +			current->thread.gs_cb = (struct gs_cb *)
>> +						&vcpu->run->s.regs.gscb;
>> +			restore_gs_cb(current->thread.gs_cb);
>> +		}
>> +		preempt_enable();
>> +	}
>> +	/* SIE will load etoken directly from SDNX and therefore kvm_run */
>> +}
>> +
>> +static void sync_regs(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
>> +{
>> +	/*
>> +	 * at several places we have to modify our internal view to not do
>> +	 * things that are disallowed by the ultravisor. For example we must
>> +	 * not inject interrupts after specific exits (e.g. 112). We do this
>> +	 * by turning off the MIE bits of our PSW copy. To avoid getting
>> +	 * validity intercepts, we do only accept the condition code from
>> +	 * userspace.
>> +	 */
>> +	vcpu->arch.sie_block->gpsw.mask &= ~PSW_MASK_CC;
>> +	vcpu->arch.sie_block->gpsw.mask |= kvm_run->psw_mask & PSW_MASK_CC;
> 
> I think it would be cleaner to only do this for protected guests. You
> could combine it with the call to sync_regs_fmt2():
> 
> 	if (likely(!kvm_s390_pv_is_protected(vcpu->kvm))) {
> 		sync_regs_fmt2(vcpu, kvm_run);
> 	} else {
> 		vcpu->arch.sie_block->gpsw.mask &= ~PSW_MASK_CC;
> 		vcpu->arch.sie_block->gpsw.mask |= kvm_run->psw_mask &
> 						   PSW_MASK_CC;
> 	}

I like that. 

[...]
>>  static void store_regs(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
>> @@ -4161,12 +4203,9 @@ static void store_regs(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
>>  	kvm_run->s.regs.cputm = kvm_s390_get_cpu_timer(vcpu);
>>  	kvm_run->s.regs.ckc = vcpu->arch.sie_block->ckc;
>>  	kvm_run->s.regs.todpr = vcpu->arch.sie_block->todpr;
> 
> TODPR handling has been moved from sync_regs() to sync_regs_fmt2() ...
> should this here move from store_regs() to store_regs_fmt2(), too?

ack.
> 
> And maybe you should also not read the sie_block->gpsw.addr (and some of
> the control registers) field in store_regs() either, i.e. move the lines
> to store_regs_fmt2()?
> 
>> -	kvm_run->s.regs.pp = vcpu->arch.sie_block->pp;
>> -	kvm_run->s.regs.gbea = vcpu->arch.sie_block->gbea;
>>  	kvm_run->s.regs.pft = vcpu->arch.pfault_token;
>>  	kvm_run->s.regs.pfs = vcpu->arch.pfault_select;
>>  	kvm_run->s.regs.pfc = vcpu->arch.pfault_compare;
>> -	kvm_run->s.regs.bpbc = (vcpu->arch.sie_block->fpf & FPF_BPBC) == FPF_BPBC;
>>  	save_access_regs(vcpu->run->s.regs.acrs);
>>  	restore_access_regs(vcpu->arch.host_acrs);
>>  	/* Save guest register state */
>> @@ -4175,19 +4214,8 @@ static void store_regs(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
>>  	/* Restore will be done lazily at return */
>>  	current->thread.fpu.fpc = vcpu->arch.host_fpregs.fpc;
>>  	current->thread.fpu.regs = vcpu->arch.host_fpregs.regs;
>> -	if (MACHINE_HAS_GS) {
>> -		__ctl_set_bit(2, 4);
>> -		if (vcpu->arch.gs_enabled)
>> -			save_gs_cb(current->thread.gs_cb);
>> -		preempt_disable();
>> -		current->thread.gs_cb = vcpu->arch.host_gscb;
>> -		restore_gs_cb(vcpu->arch.host_gscb);
>> -		preempt_enable();
>> -		if (!vcpu->arch.host_gscb)
>> -			__ctl_clear_bit(2, 4);
>> -		vcpu->arch.host_gscb = NULL;
>> -	}
>> -	/* SIE will save etoken directly into SDNX and therefore kvm_run */
>> +	if (likely(!kvm_s390_pv_is_protected(vcpu->kvm)))
>> +		store_regs_fmt2(vcpu, kvm_run);
>>  }
>>  
>>  int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
>>
> 
>  Thomas
> 


* Re: [PATCH 03/35] s390/protvirt: introduce host side setup
  2020-02-07 11:39 ` [PATCH 03/35] s390/protvirt: introduce host side setup Christian Borntraeger
@ 2020-02-10  9:42   ` Thomas Huth
  2020-02-10  9:48     ` Christian Borntraeger
  2020-02-10 11:54   ` Cornelia Huck
  2020-02-10 12:38   ` David Hildenbrand
  2 siblings, 1 reply; 147+ messages in thread
From: Thomas Huth @ 2020-02-10  9:42 UTC (permalink / raw)
  To: Christian Borntraeger, Janosch Frank
  Cc: KVM, Cornelia Huck, David Hildenbrand, Ulrich Weigand,
	Claudio Imbrenda, Andrea Arcangeli, linux-s390, Michael Mueller,
	Vasily Gorbik

On 07/02/2020 12.39, Christian Borntraeger wrote:
> From: Vasily Gorbik <gor@linux.ibm.com>
> 
> Add "prot_virt" command line option which controls if the kernel
> protected VMs support is enabled at early boot time. This has to be
> done early, because it needs large amounts of memory and will disable
> some features like STP time sync for the lpar.
> 
> Extend ultravisor info definitions and expose it via uv_info struct
> filled in during startup.
> 
> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
> [borntraeger@de.ibm.com: patch merging, splitting, fixing]
> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
> ---
[...]
> diff --git a/arch/s390/boot/uv.c b/arch/s390/boot/uv.c
> index ed007f4a6444..af9e1cc93c68 100644
> --- a/arch/s390/boot/uv.c
> +++ b/arch/s390/boot/uv.c
> @@ -3,7 +3,13 @@
>  #include <asm/facility.h>
>  #include <asm/sections.h>
>  
> +/* will be used in arch/s390/kernel/uv.c */
> +#ifdef CONFIG_PROTECTED_VIRTUALIZATION_GUEST
>  int __bootdata_preserved(prot_virt_guest);
> +#endif
> +#if IS_ENABLED(CONFIG_KVM)
> +struct uv_info __bootdata_preserved(uv_info);
> +#endif
>  
>  void uv_query_info(void)
>  {
> @@ -18,7 +24,20 @@ void uv_query_info(void)
>  	if (uv_call(0, (uint64_t)&uvcb))
>  		return;
>  
> -	if (test_bit_inv(BIT_UVC_CMD_SET_SHARED_ACCESS, (unsigned long *)uvcb.inst_calls_list) &&
> +	if (IS_ENABLED(CONFIG_KVM)) {
> +		memcpy(uv_info.inst_calls_list, uvcb.inst_calls_list, sizeof(uv_info.inst_calls_list));
> +		uv_info.uv_base_stor_len = uvcb.uv_base_stor_len;
> +		uv_info.guest_base_stor_len = uvcb.conf_base_phys_stor_len;
> +		uv_info.guest_virt_base_stor_len = uvcb.conf_base_virt_stor_len;
> +		uv_info.guest_virt_var_stor_len = uvcb.conf_virt_var_stor_len;
> +		uv_info.guest_cpu_stor_len = uvcb.cpu_stor_len;
> +		uv_info.max_sec_stor_addr = ALIGN(uvcb.max_guest_stor_addr, PAGE_SIZE);
> +		uv_info.max_num_sec_conf = uvcb.max_num_sec_conf;
> +		uv_info.max_guest_cpus = uvcb.max_guest_cpus;
> +	}
> +
> +	if (IS_ENABLED(CONFIG_PROTECTED_VIRTUALIZATION_GUEST) &&
> +	    test_bit_inv(BIT_UVC_CMD_SET_SHARED_ACCESS, (unsigned long *)uvcb.inst_calls_list) &&
>  	    test_bit_inv(BIT_UVC_CMD_REMOVE_SHARED_ACCESS, (unsigned long *)uvcb.inst_calls_list))
>  		prot_virt_guest = 1;
>  }
> diff --git a/arch/s390/include/asm/uv.h b/arch/s390/include/asm/uv.h
> index 4093a2856929..cc7b0b0bc874 100644
> --- a/arch/s390/include/asm/uv.h
> +++ b/arch/s390/include/asm/uv.h
> @@ -44,7 +44,19 @@ struct uv_cb_qui {
>  	struct uv_cb_header header;
>  	u64 reserved08;
>  	u64 inst_calls_list[4];
> -	u64 reserved30[15];
> +	u64 reserved30[2];
> +	u64 uv_base_stor_len;
> +	u64 reserved48;
> +	u64 conf_base_phys_stor_len;
> +	u64 conf_base_virt_stor_len;
> +	u64 conf_virt_var_stor_len;
> +	u64 cpu_stor_len;
> +	u32 reserved70[3];
> +	u32 max_num_sec_conf;
> +	u64 max_guest_stor_addr;
> +	u8  reserved88[158-136];
> +	u16 max_guest_cpus;
> +	u64 reserveda0;
>  } __packed __aligned(8);
>  
>  struct uv_cb_share {
> @@ -69,9 +81,21 @@ static inline int uv_call(unsigned long r1, unsigned long r2)
>  	return cc;
>  }
>  
> -#ifdef CONFIG_PROTECTED_VIRTUALIZATION_GUEST
> +struct uv_info {
> +	unsigned long inst_calls_list[4];
> +	unsigned long uv_base_stor_len;
> +	unsigned long guest_base_stor_len;
> +	unsigned long guest_virt_base_stor_len;
> +	unsigned long guest_virt_var_stor_len;
> +	unsigned long guest_cpu_stor_len;
> +	unsigned long max_sec_stor_addr;
> +	unsigned int max_num_sec_conf;
> +	unsigned short max_guest_cpus;
> +};
> +extern struct uv_info uv_info;
>  extern int prot_virt_guest;

Don't you want to keep prot_virt_guest within the "#ifdef
CONFIG_PROTECTED_VIRTUALIZATION_GUEST" ?

> +#ifdef CONFIG_PROTECTED_VIRTUALIZATION_GUEST
>  static inline int is_prot_virt_guest(void)
>  {
>  	return prot_virt_guest;
> @@ -121,11 +145,27 @@ static inline int uv_remove_shared(unsigned long addr)
>  	return share(addr, UVC_CMD_REMOVE_SHARED_ACCESS);
>  }
>  
> -void uv_query_info(void);
>  #else
>  #define is_prot_virt_guest() 0
>  static inline int uv_set_shared(unsigned long addr) { return 0; }
>  static inline int uv_remove_shared(unsigned long addr) { return 0; }
> +#endif
> +
> +#if IS_ENABLED(CONFIG_KVM)
> +extern int prot_virt_host;
> +
> +static inline int is_prot_virt_host(void)
> +{
> +	return prot_virt_host;
> +}
> +#else
> +#define is_prot_virt_host() 0
> +#endif
> +
> +#if defined(CONFIG_PROTECTED_VIRTUALIZATION_GUEST) ||                          \
> +	IS_ENABLED(CONFIG_KVM)
> +void uv_query_info(void);
> +#else
>  static inline void uv_query_info(void) {}
>  #endif

With the nit fixed:
Reviewed-by: Thomas Huth <thuth@redhat.com>


* Re: [PATCH 03/35] s390/protvirt: introduce host side setup
  2020-02-10  9:42   ` Thomas Huth
@ 2020-02-10  9:48     ` Christian Borntraeger
  0 siblings, 0 replies; 147+ messages in thread
From: Christian Borntraeger @ 2020-02-10  9:48 UTC (permalink / raw)
  To: Thomas Huth, Janosch Frank
  Cc: KVM, Cornelia Huck, David Hildenbrand, Ulrich Weigand,
	Claudio Imbrenda, Andrea Arcangeli, linux-s390, Michael Mueller,
	Vasily Gorbik



On 10.02.20 10:42, Thomas Huth wrote:

>> +	unsigned short max_guest_cpus;
>> +};
>> +extern struct uv_info uv_info;
>>  extern int prot_virt_guest;
> 
> Don't you want to keep prot_virt_guest within the "#ifdef
> CONFIG_PROTECTED_VIRTUALIZATION_GUEST" ?

yes. 


* Re: [PATCH 15/35] KVM: s390: protvirt: Implement interruption injection
  2020-02-07 11:39 ` [PATCH 15/35] KVM: s390: protvirt: Implement interruption injection Christian Borntraeger
@ 2020-02-10 10:03   ` Thomas Huth
  0 siblings, 0 replies; 147+ messages in thread
From: Thomas Huth @ 2020-02-10 10:03 UTC (permalink / raw)
  To: Christian Borntraeger, Janosch Frank
  Cc: KVM, Cornelia Huck, David Hildenbrand, Ulrich Weigand,
	Claudio Imbrenda, Andrea Arcangeli, linux-s390, Michael Mueller,
	Vasily Gorbik

On 07/02/2020 12.39, Christian Borntraeger wrote:
> From: Michael Mueller <mimu@linux.ibm.com>
> 
> The patch implements interruption injection for the following
> list of interruption types:
> 
>    - I/O (uses inject io interruption)
>      __deliver_io
> 
>    - External (uses inject external interruption)
>      __deliver_cpu_timer
>      __deliver_ckc
>      __deliver_emergency_signal
>      __deliver_external_call
> 
>    - cpu restart (uses inject restart interruption)
>      __deliver_restart
> 
>    - machine checks (uses mcic, failing address and external damage)
>      __write_machine_check
> 
> Please note that posted interrupts (GISA) are not used for protected
> guests as of today.
> 
> The service interrupt is handled in a followup patch.
> 
> Signed-off-by: Michael Mueller <mimu@linux.ibm.com>
> [borntraeger@de.ibm.com: patch merging, splitting, fixing]
> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
> ---
>  arch/s390/include/asm/kvm_host.h |   6 ++
>  arch/s390/kvm/interrupt.c        | 106 +++++++++++++++++++++++--------
>  2 files changed, 86 insertions(+), 26 deletions(-)

Reviewed-by: Thomas Huth <thuth@redhat.com>


* Re: [PATCH 08/35] KVM: s390: protvirt: Add initial lifecycle handling
  2020-02-08 14:54   ` Thomas Huth
@ 2020-02-10 11:43     ` Christian Borntraeger
  2020-02-10 11:45       ` [PATCH/RFC] KVM: s390: protvirt: pass-through rc and rrc Christian Borntraeger
  0 siblings, 1 reply; 147+ messages in thread
From: Christian Borntraeger @ 2020-02-10 11:43 UTC (permalink / raw)
  To: Thomas Huth, Janosch Frank
  Cc: KVM, Cornelia Huck, David Hildenbrand, Ulrich Weigand,
	Claudio Imbrenda, Andrea Arcangeli, linux-s390, Michael Mueller,
	Vasily Gorbik, Janosch Frank

On 08.02.20 15:54, Thomas Huth wrote:
> On 07/02/2020 12.39, Christian Borntraeger wrote:
>> From: Janosch Frank <frankja@linux.ibm.com>
>>
>> This contains 3 main changes:
>> 1. changes in SIE control block handling for secure guests
>> 2. helper functions for create/destroy/unpack secure guests
>> 3. KVM_S390_PV_COMMAND ioctl to allow userspace dealing with secure
>> machines
>>
>> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
>> [borntraeger@de.ibm.com: patch merging, splitting, fixing]
>> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
>> ---
>>  arch/s390/include/asm/kvm_host.h |  24 ++-
>>  arch/s390/include/asm/uv.h       |  69 +++++++++
>>  arch/s390/kvm/Makefile           |   2 +-
>>  arch/s390/kvm/kvm-s390.c         | 191 +++++++++++++++++++++++-
>>  arch/s390/kvm/kvm-s390.h         |  27 ++++
>>  arch/s390/kvm/pv.c               | 244 +++++++++++++++++++++++++++++++
>>  include/uapi/linux/kvm.h         |  33 +++++
>>  7 files changed, 586 insertions(+), 4 deletions(-)
>>  create mode 100644 arch/s390/kvm/pv.c
> [...]
>> +struct kvm_pv_cmd {
>> +	__u32	cmd;	/* Command to be executed */
>> +	__u16	rc;	/* Ultravisor return code */
>> +	__u16	rrc;	/* Ultravisor return reason code */
> 
> What are rc and rrc good for? I currently can't spot the code where they
> are used...

Janosch wants to have those for some cases. I will post an add-on patch
as a reply.

> 
>> +	__u64	data;	/* Data or address */
>> +};
>> +
>> +/* Available with KVM_CAP_S390_PROTECTED */
>> +#define KVM_S390_PV_COMMAND		_IOW(KVMIO, 0xc5, struct kvm_pv_cmd)
>> +#define KVM_S390_PV_COMMAND_VCPU	_IOW(KVMIO, 0xc6, struct kvm_pv_cmd)
> 
> If you intend to return values in rc and rrc, shouldn't this rather be
> declared as _IOWR instead ?

If yes then Yes.


* [PATCH/RFC] KVM: s390: protvirt: pass-through rc and rrc
  2020-02-10 11:43     ` Christian Borntraeger
@ 2020-02-10 11:45       ` Christian Borntraeger
  2020-02-10 12:06         ` Christian Borntraeger
  0 siblings, 1 reply; 147+ messages in thread
From: Christian Borntraeger @ 2020-02-10 11:45 UTC (permalink / raw)
  To: borntraeger
  Cc: Ulrich.Weigand, aarcange, cohuck, david, frankja, frankja, gor,
	imbrenda, kvm, linux-s390, mimu, thuth

This would be one variant to get the RC/RRC to userspace.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
 arch/s390/kvm/kvm-s390.c | 34 +++++++++++++++++++++++++---------
 arch/s390/kvm/kvm-s390.h | 15 ++++++++-------
 arch/s390/kvm/pv.c       | 30 ++++++++++++++++++++++--------
 include/uapi/linux/kvm.h |  4 ++--
 4 files changed, 57 insertions(+), 26 deletions(-)

diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index e1bccbb41fdd..8dae9629b47f 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -2172,6 +2172,8 @@ static int kvm_s390_handle_pv(struct kvm *kvm, struct kvm_pv_cmd *cmd)
 	int r = 0;
 	void __user *argp = (void __user *)cmd->data;
 
+	cmd->rc = 0;
+	cmd->rrc = 0;
 	switch (cmd->cmd) {
 	case KVM_PV_VM_CREATE: {
 		r = -EINVAL;
@@ -2192,7 +2194,7 @@ static int kvm_s390_handle_pv(struct kvm *kvm, struct kvm_pv_cmd *cmd)
 			mutex_unlock(&kvm->lock);
 			break;
 		}
-		r = kvm_s390_pv_create_vm(kvm);
+		r = kvm_s390_pv_create_vm(kvm, cmd);
 		kvm_s390_vcpu_unblock_all(kvm);
 		mutex_unlock(&kvm->lock);
 		break;
@@ -2205,7 +2207,7 @@ static int kvm_s390_handle_pv(struct kvm *kvm, struct kvm_pv_cmd *cmd)
 		/* All VCPUs have to be destroyed before this call. */
 		mutex_lock(&kvm->lock);
 		kvm_s390_vcpu_block_all(kvm);
-		r = kvm_s390_pv_destroy_vm(kvm);
+		r = kvm_s390_pv_destroy_vm(kvm, cmd);
 		if (!r)
 			kvm_s390_pv_dealloc_vm(kvm);
 		kvm_s390_vcpu_unblock_all(kvm);
@@ -2237,7 +2239,7 @@ static int kvm_s390_handle_pv(struct kvm *kvm, struct kvm_pv_cmd *cmd)
 		r = -EFAULT;
 		if (!copy_from_user(hdr, (void __user *)parms.origin,
 				   parms.length))
-			r = kvm_s390_pv_set_sec_parms(kvm, hdr, parms.length);
+			r = kvm_s390_pv_set_sec_parms(kvm, hdr, parms.length, cmd);
 
 		vfree(hdr);
 		break;
@@ -2253,7 +2255,7 @@ static int kvm_s390_handle_pv(struct kvm *kvm, struct kvm_pv_cmd *cmd)
 		if (copy_from_user(&unp, argp, sizeof(unp)))
 			break;
 
-		r = kvm_s390_pv_unpack(kvm, unp.addr, unp.size, unp.tweak);
+		r = kvm_s390_pv_unpack(kvm, unp.addr, unp.size, unp.tweak, cmd);
 		break;
 	}
 	case KVM_PV_VM_VERIFY: {
@@ -2268,6 +2270,8 @@ static int kvm_s390_handle_pv(struct kvm *kvm, struct kvm_pv_cmd *cmd)
 				  &ret);
 		VM_EVENT(kvm, 3, "PROTVIRT VERIFY: rc %x rrc %x",
 			 ret >> 16, ret & 0x0000ffff);
+		cmd->rc = ret >> 16;
+		cmd->rrc = ret & 0xffff;
 		break;
 	}
 	default:
@@ -2385,6 +2389,10 @@ long kvm_arch_vm_ioctl(struct file *filp,
 			break;
 
 		r = kvm_s390_handle_pv(kvm, &args);
+
+		if (copy_to_user(argp, &args, sizeof(args)))
+			r = -EFAULT;
+
 		break;
 	}
 	default:
@@ -2650,6 +2658,8 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 
 void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
 {
+	struct kvm_pv_cmd dummy;
+
 	VCPU_EVENT(vcpu, 3, "%s", "free cpu");
 	trace_kvm_s390_destroy_vcpu(vcpu->vcpu_id);
 	kvm_s390_clear_local_irqs(vcpu);
@@ -2663,7 +2673,7 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
 	if (vcpu->kvm->arch.use_cmma)
 		kvm_s390_vcpu_unsetup_cmma(vcpu);
 	if (kvm_s390_pv_handle_cpu(vcpu))
-		kvm_s390_pv_destroy_cpu(vcpu);
+		kvm_s390_pv_destroy_cpu(vcpu, &dummy);
 	free_page((unsigned long)(vcpu->arch.sie_block));
 
 	kvm_vcpu_uninit(vcpu);
@@ -2688,11 +2698,13 @@ static void kvm_free_vcpus(struct kvm *kvm)
 
 void kvm_arch_destroy_vm(struct kvm *kvm)
 {
+	struct kvm_pv_cmd dummy;
+
 	kvm_free_vcpus(kvm);
 	sca_dispose(kvm);
 	kvm_s390_gisa_destroy(kvm);
 	if (kvm_s390_pv_is_protected(kvm)) {
-		kvm_s390_pv_destroy_vm(kvm);
+		kvm_s390_pv_destroy_vm(kvm, &dummy);
 		kvm_s390_pv_dealloc_vm(kvm);
 	}
 	debug_unregister(kvm->arch.dbf);
@@ -3153,6 +3165,7 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm,
 {
 	struct kvm_vcpu *vcpu;
 	struct sie_page *sie_page;
+	struct kvm_pv_cmd dummy;
 	int rc = -EINVAL;
 
 	if (!kvm_is_ucontrol(kvm) && !sca_can_add_vcpu(kvm, id))
@@ -3188,7 +3201,7 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm,
 		goto out_free_sie_block;
 
 	if (kvm_s390_pv_is_protected(kvm)) {
-		rc = kvm_s390_pv_create_cpu(vcpu);
+		rc = kvm_s390_pv_create_cpu(vcpu, &dummy);
 		if (rc) {
 			kvm_vcpu_uninit(vcpu);
 			goto out_free_sie_block;
@@ -4511,19 +4524,22 @@ static int kvm_s390_handle_pv_vcpu(struct kvm_vcpu *vcpu,
 	if (!kvm_s390_pv_is_protected(vcpu->kvm))
 		return -EINVAL;
 
+	cmd->rc = 0;
+	cmd->rrc = 0;
+
 	switch (cmd->cmd) {
 	case KVM_PV_VCPU_CREATE: {
 		if (kvm_s390_pv_handle_cpu(vcpu))
 			return -EINVAL;
 
-		r = kvm_s390_pv_create_cpu(vcpu);
+		r = kvm_s390_pv_create_cpu(vcpu, cmd);
 		break;
 	}
 	case KVM_PV_VCPU_DESTROY: {
 		if (!kvm_s390_pv_handle_cpu(vcpu))
 			return -EINVAL;
 
-		r = kvm_s390_pv_destroy_cpu(vcpu);
+		r = kvm_s390_pv_destroy_cpu(vcpu, cmd);
 		break;
 	}
 	default:
diff --git a/arch/s390/kvm/kvm-s390.h b/arch/s390/kvm/kvm-s390.h
index 32c0c01d5df0..b77d5f565b5c 100644
--- a/arch/s390/kvm/kvm-s390.h
+++ b/arch/s390/kvm/kvm-s390.h
@@ -199,14 +199,15 @@ static inline int kvm_s390_user_cpu_state_ctrl(struct kvm *kvm)
 /* implemented in pv.c */
 void kvm_s390_pv_dealloc_vm(struct kvm *kvm);
 int kvm_s390_pv_alloc_vm(struct kvm *kvm);
-int kvm_s390_pv_create_vm(struct kvm *kvm);
-int kvm_s390_pv_create_cpu(struct kvm_vcpu *vcpu);
-int kvm_s390_pv_destroy_vm(struct kvm *kvm);
-int kvm_s390_pv_destroy_cpu(struct kvm_vcpu *vcpu);
-int kvm_s390_pv_set_sec_parms(struct kvm *kvm, void *hdr, u64 length);
+int kvm_s390_pv_create_vm(struct kvm *kvm, struct kvm_pv_cmd *cmd);
+int kvm_s390_pv_create_cpu(struct kvm_vcpu *vcpu, struct kvm_pv_cmd *cmd);
+int kvm_s390_pv_destroy_vm(struct kvm *kvm, struct kvm_pv_cmd *cmd);
+int kvm_s390_pv_destroy_cpu(struct kvm_vcpu *vcpu, struct kvm_pv_cmd *cmd);
+int kvm_s390_pv_set_sec_parms(struct kvm *kvm, void *hdr, u64 length,
+			      struct kvm_pv_cmd *cmd);
 int kvm_s390_pv_unpack(struct kvm *kvm, unsigned long addr, unsigned long size,
-		       unsigned long tweak);
-int kvm_s390_pv_verify(struct kvm *kvm);
+		       unsigned long tweak, struct kvm_pv_cmd *cmd);
+int kvm_s390_pv_verify(struct kvm *kvm, struct kvm_pv_cmd *cmd);
 
 static inline bool kvm_s390_pv_is_protected(struct kvm *kvm)
 {
diff --git a/arch/s390/kvm/pv.c b/arch/s390/kvm/pv.c
index c1778cb3f8ac..381dc3fefac4 100644
--- a/arch/s390/kvm/pv.c
+++ b/arch/s390/kvm/pv.c
@@ -61,7 +61,7 @@ int kvm_s390_pv_alloc_vm(struct kvm *kvm)
 	return -ENOMEM;
 }
 
-int kvm_s390_pv_destroy_vm(struct kvm *kvm)
+int kvm_s390_pv_destroy_vm(struct kvm *kvm, struct kvm_pv_cmd *cmd)
 {
 	int rc;
 	u32 ret;
@@ -72,10 +72,12 @@ int kvm_s390_pv_destroy_vm(struct kvm *kvm)
 	atomic_set(&kvm->mm->context.is_protected, 0);
 	VM_EVENT(kvm, 3, "PROTVIRT DESTROY VM: rc %x rrc %x",
 		 ret >> 16, ret & 0x0000ffff);
+	cmd->rc = ret >> 16;
+	cmd->rrc = ret & 0xffff;
 	return rc;
 }
 
-int kvm_s390_pv_destroy_cpu(struct kvm_vcpu *vcpu)
+int kvm_s390_pv_destroy_cpu(struct kvm_vcpu *vcpu, struct kvm_pv_cmd *cmd)
 {
 	int rc = 0;
 	u32 ret;
@@ -87,6 +89,8 @@ int kvm_s390_pv_destroy_cpu(struct kvm_vcpu *vcpu)
 
 		VCPU_EVENT(vcpu, 3, "PROTVIRT DESTROY VCPU: cpu %d rc %x rrc %x",
 			   vcpu->vcpu_id, ret >> 16, ret & 0x0000ffff);
+		cmd->rc = ret >> 16;
+		cmd->rrc = ret & 0xffff;
 	}
 
 	free_pages(vcpu->arch.pv.stor_base,
@@ -98,7 +102,7 @@ int kvm_s390_pv_destroy_cpu(struct kvm_vcpu *vcpu)
 	return rc;
 }
 
-int kvm_s390_pv_create_cpu(struct kvm_vcpu *vcpu)
+int kvm_s390_pv_create_cpu(struct kvm_vcpu *vcpu, struct kvm_pv_cmd *cmd)
 {
 	int rc;
 	struct uv_cb_csc uvcb = {
@@ -124,9 +128,13 @@ int kvm_s390_pv_create_cpu(struct kvm_vcpu *vcpu)
 	VCPU_EVENT(vcpu, 3, "PROTVIRT CREATE VCPU: cpu %d handle %llx rc %x rrc %x",
 		   vcpu->vcpu_id, uvcb.cpu_handle, uvcb.header.rc,
 		   uvcb.header.rrc);
+	cmd->rc = uvcb.header.rc;
+	cmd->rrc = uvcb.header.rrc;
 
 	if (rc) {
-		kvm_s390_pv_destroy_cpu(vcpu);
+		struct kvm_pv_cmd dummy;
+
+		kvm_s390_pv_destroy_cpu(vcpu, &dummy);
 		return -EINVAL;
 	}
 
@@ -138,7 +146,7 @@ int kvm_s390_pv_create_cpu(struct kvm_vcpu *vcpu)
 	return 0;
 }
 
-int kvm_s390_pv_create_vm(struct kvm *kvm)
+int kvm_s390_pv_create_vm(struct kvm *kvm, struct kvm_pv_cmd *cmd)
 {
 	int rc;
 
@@ -162,12 +170,15 @@ int kvm_s390_pv_create_vm(struct kvm *kvm)
 	VM_EVENT(kvm, 3, "PROTVIRT CREATE VM: handle %llx len %llx rc %x rrc %x",
 		 uvcb.guest_handle, uvcb.guest_stor_len, uvcb.header.rc,
 		 uvcb.header.rrc);
+	cmd->rc = uvcb.header.rc;
+	cmd->rrc = uvcb.header.rrc;
 
 	/* Outputs */
 	kvm->arch.pv.handle = uvcb.guest_handle;
 
 	if (rc && (uvcb.header.rc & UVC_RC_NEED_DESTROY)) {
-		kvm_s390_pv_destroy_vm(kvm);
+		struct kvm_pv_cmd dummy;
+		kvm_s390_pv_destroy_vm(kvm, &dummy);
 		return -EINVAL;
 	}
 	kvm->arch.gmap->guest_handle = uvcb.guest_handle;
@@ -176,7 +187,7 @@ int kvm_s390_pv_create_vm(struct kvm *kvm)
 }
 
 int kvm_s390_pv_set_sec_parms(struct kvm *kvm,
-			      void *hdr, u64 length)
+			      void *hdr, u64 length, struct kvm_pv_cmd *cmd)
 {
 	int rc;
 	struct uv_cb_ssc uvcb = {
@@ -193,6 +204,9 @@ int kvm_s390_pv_set_sec_parms(struct kvm *kvm,
 	rc = uv_call(0, (u64)&uvcb);
 	VM_EVENT(kvm, 3, "PROTVIRT VM SET PARMS: rc %x rrc %x",
 		 uvcb.header.rc, uvcb.header.rrc);
+	cmd->rc = uvcb.header.rc;
+	cmd->rrc = uvcb.header.rrc;
+
 	if (rc)
 		return -EINVAL;
 	return 0;
@@ -219,7 +233,7 @@ static int unpack_one(struct kvm *kvm, unsigned long addr, u64 tweak[2])
 }
 
 int kvm_s390_pv_unpack(struct kvm *kvm, unsigned long addr, unsigned long size,
-		       unsigned long tweak)
+		       unsigned long tweak, struct kvm_pv_cmd *cmd)
 {
 	int rc = 0;
 	u64 tw[2] = {tweak, 0};
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index eab741bc12c3..17c1a9556eac 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1508,8 +1508,8 @@ struct kvm_pv_cmd {
 };
 
 /* Available with KVM_CAP_S390_PROTECTED */
-#define KVM_S390_PV_COMMAND		_IOW(KVMIO, 0xc5, struct kvm_pv_cmd)
-#define KVM_S390_PV_COMMAND_VCPU	_IOW(KVMIO, 0xc6, struct kvm_pv_cmd)
+#define KVM_S390_PV_COMMAND		_IOWR(KVMIO, 0xc5, struct kvm_pv_cmd)
+#define KVM_S390_PV_COMMAND_VCPU	_IOWR(KVMIO, 0xc6, struct kvm_pv_cmd)
 
 /* Secure Encrypted Virtualization command */
 enum sev_cmd_id {
-- 
2.24.0


* Re: [PATCH 03/35] s390/protvirt: introduce host side setup
  2020-02-07 11:39 ` [PATCH 03/35] s390/protvirt: introduce host side setup Christian Borntraeger
  2020-02-10  9:42   ` Thomas Huth
@ 2020-02-10 11:54   ` Cornelia Huck
  2020-02-10 12:14     ` Christian Borntraeger
  2020-02-10 12:38   ` David Hildenbrand
  2 siblings, 1 reply; 147+ messages in thread
From: Cornelia Huck @ 2020-02-10 11:54 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Janosch Frank, KVM, David Hildenbrand, Thomas Huth,
	Ulrich Weigand, Claudio Imbrenda, Andrea Arcangeli, linux-s390,
	Michael Mueller, Vasily Gorbik

On Fri,  7 Feb 2020 06:39:26 -0500
Christian Borntraeger <borntraeger@de.ibm.com> wrote:

> From: Vasily Gorbik <gor@linux.ibm.com>
> 
> Add a "prot_virt" command line option which controls whether kernel
> support for protected VMs is enabled at early boot time. This has to be
> done early, because it needs large amounts of memory and will disable
> some features like STP time sync for the LPAR.
> 
> Extend the ultravisor info definitions and expose them via the uv_info
> struct, which is filled in during startup.
> 
> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
> [borntraeger@de.ibm.com: patch merging, splitting, fixing]
> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
> ---
>  .../admin-guide/kernel-parameters.txt         |  5 ++
>  arch/s390/boot/Makefile                       |  2 +-
>  arch/s390/boot/uv.c                           | 21 +++++++-
>  arch/s390/include/asm/uv.h                    | 46 +++++++++++++++--
>  arch/s390/kernel/Makefile                     |  1 +
>  arch/s390/kernel/setup.c                      |  4 --
>  arch/s390/kernel/uv.c                         | 49 +++++++++++++++++++
>  7 files changed, 119 insertions(+), 9 deletions(-)
>  create mode 100644 arch/s390/kernel/uv.c

(...)

> diff --git a/arch/s390/boot/Makefile b/arch/s390/boot/Makefile
> index e2c47d3a1c89..30f1811540c5 100644
> --- a/arch/s390/boot/Makefile
> +++ b/arch/s390/boot/Makefile
> @@ -37,7 +37,7 @@ CFLAGS_sclp_early_core.o += -I$(srctree)/drivers/s390/char
>  obj-y	:= head.o als.o startup.o mem_detect.o ipl_parm.o ipl_report.o
>  obj-y	+= string.o ebcdic.o sclp_early_core.o mem.o ipl_vmparm.o cmdline.o
>  obj-y	+= version.o pgm_check_info.o ctype.o text_dma.o
> -obj-$(CONFIG_PROTECTED_VIRTUALIZATION_GUEST)	+= uv.o
> +obj-$(findstring y, $(CONFIG_PROTECTED_VIRTUALIZATION_GUEST) $(CONFIG_PGSTE))	+= uv.o

I'm wondering why you're checking CONFIG_PGSTE here...

>  obj-$(CONFIG_RELOCATABLE)	+= machine_kexec_reloc.o
>  obj-$(CONFIG_RANDOMIZE_BASE)	+= kaslr.o
>  targets	:= bzImage startup.a section_cmp.boot.data section_cmp.boot.preserved.data $(obj-y)
> diff --git a/arch/s390/boot/uv.c b/arch/s390/boot/uv.c
> index ed007f4a6444..af9e1cc93c68 100644
> --- a/arch/s390/boot/uv.c
> +++ b/arch/s390/boot/uv.c
> @@ -3,7 +3,13 @@
>  #include <asm/facility.h>
>  #include <asm/sections.h>
>  
> +/* will be used in arch/s390/kernel/uv.c */
> +#ifdef CONFIG_PROTECTED_VIRTUALIZATION_GUEST
>  int __bootdata_preserved(prot_virt_guest);
> +#endif
> +#if IS_ENABLED(CONFIG_KVM)

...and CONFIG_KVM here and below?

> +struct uv_info __bootdata_preserved(uv_info);
> +#endif
>  
>  void uv_query_info(void)
>  {

(...)

> +static int __init prot_virt_setup(char *val)
> +{
> +	bool enabled;
> +	int rc;
> +
> +	rc = kstrtobool(val, &enabled);
> +	if (!rc && enabled)
> +		prot_virt_host = 1;
> +
> +	if (is_prot_virt_guest() && prot_virt_host) {
> +		prot_virt_host = 0;
> +		pr_info("Running as protected virtualization guest.");
> +	}
> +
> +	if (prot_virt_host && !test_facility(158)) {
> +		prot_virt_host = 0;
> +		pr_info("The ultravisor call facility is not available.");
> +	}

What about prefixing these two with 'prot_virt:'? It seems the name is
settled now?

> +
> +	return rc;
> +}
> +early_param("prot_virt", prot_virt_setup);
> +#endif


* Re: [PATCH/RFC] KVM: s390: protvirt: pass-through rc and rrc
  2020-02-10 11:45       ` [PATCH/RFC] KVM: s390: protvirt: pass-through rc and rrc Christian Borntraeger
@ 2020-02-10 12:06         ` Christian Borntraeger
  2020-02-10 12:29           ` Thomas Huth
  2020-02-10 12:50           ` Cornelia Huck
  0 siblings, 2 replies; 147+ messages in thread
From: Christian Borntraeger @ 2020-02-10 12:06 UTC (permalink / raw)
  To: thuth
  Cc: Ulrich.Weigand, aarcange, cohuck, david, frankja, frankja, gor,
	imbrenda, kvm, linux-s390, mimu

What about the following: I will rip out RC and RRC, but add
a 32-bit flags field (which must be 0 for now) and 3 * 64-bit reserved fields.


On 10.02.20 12:45, Christian Borntraeger wrote:
> This would be one variant to get the RC/RRC to userspace.

[...]


* Re: [PATCH 03/35] s390/protvirt: introduce host side setup
  2020-02-10 11:54   ` Cornelia Huck
@ 2020-02-10 12:14     ` Christian Borntraeger
  2020-02-10 12:31       ` Cornelia Huck
  0 siblings, 1 reply; 147+ messages in thread
From: Christian Borntraeger @ 2020-02-10 12:14 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Janosch Frank, KVM, David Hildenbrand, Thomas Huth,
	Ulrich Weigand, Claudio Imbrenda, Andrea Arcangeli, linux-s390,
	Michael Mueller, Vasily Gorbik



On 10.02.20 12:54, Cornelia Huck wrote:
> On Fri,  7 Feb 2020 06:39:26 -0500
> Christian Borntraeger <borntraeger@de.ibm.com> wrote:
> 
>> From: Vasily Gorbik <gor@linux.ibm.com>
>>
>> Add a "prot_virt" command line option which controls whether kernel
>> support for protected VMs is enabled at early boot time. This has to be
>> done early, because it needs large amounts of memory and will disable
>> some features like STP time sync for the LPAR.
>>
>> Extend the ultravisor info definitions and expose them via the uv_info
>> struct, which is filled in during startup.
>>
>> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
>> [borntraeger@de.ibm.com: patch merging, splitting, fixing]
>> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
>> ---
>>  .../admin-guide/kernel-parameters.txt         |  5 ++
>>  arch/s390/boot/Makefile                       |  2 +-
>>  arch/s390/boot/uv.c                           | 21 +++++++-
>>  arch/s390/include/asm/uv.h                    | 46 +++++++++++++++--
>>  arch/s390/kernel/Makefile                     |  1 +
>>  arch/s390/kernel/setup.c                      |  4 --
>>  arch/s390/kernel/uv.c                         | 49 +++++++++++++++++++
>>  7 files changed, 119 insertions(+), 9 deletions(-)
>>  create mode 100644 arch/s390/kernel/uv.c
> 
> (...)
> 
>> diff --git a/arch/s390/boot/Makefile b/arch/s390/boot/Makefile
>> index e2c47d3a1c89..30f1811540c5 100644
>> --- a/arch/s390/boot/Makefile
>> +++ b/arch/s390/boot/Makefile
>> @@ -37,7 +37,7 @@ CFLAGS_sclp_early_core.o += -I$(srctree)/drivers/s390/char
>>  obj-y	:= head.o als.o startup.o mem_detect.o ipl_parm.o ipl_report.o
>>  obj-y	+= string.o ebcdic.o sclp_early_core.o mem.o ipl_vmparm.o cmdline.o
>>  obj-y	+= version.o pgm_check_info.o ctype.o text_dma.o
>> -obj-$(CONFIG_PROTECTED_VIRTUALIZATION_GUEST)	+= uv.o
>> +obj-$(findstring y, $(CONFIG_PROTECTED_VIRTUALIZATION_GUEST) $(CONFIG_PGSTE))	+= uv.o
> 
> I'm wondering why you're checking CONFIG_PGSTE here...

It was just simpler for a Makefile, because CONFIG_KVM can be m or y.
PGSTE is always y when CONFIG_KVM is set. Suggestions welcome.
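
For anyone puzzled by the idiom: `$(findstring y, $(CONFIG_PROTECTED_VIRTUALIZATION_GUEST) $(CONFIG_PGSTE))`
expands to `y` as soon as either symbol is `y`, so the assignment becomes
`obj-y`; any other combination (`m`, `n`, unset) yields the inert `obj-` list.
This can be reproduced outside kbuild (the variable names below are made up):

```shell
# Evaluate the same $(findstring ...) expression in a throwaway Makefile.
# The CONFIG values A and B are made up for the demonstration.
printf 'A := n\nB := y\nall:;@echo "obj-$(findstring y, $(A) $(B))"\n' | make -s -f -   # prints: obj-y
printf 'A := m\nB := m\nall:;@echo "obj-$(findstring y, $(A) $(B))"\n' | make -s -f -   # prints: obj-
```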

[...]

>> +		prot_virt_host = 0;
>> +		pr_info("Running as protected virtualization guest.");
>> +	}
>> +
>> +	if (prot_virt_host && !test_facility(158)) {
>> +		prot_virt_host = 0;
>> +		pr_info("The ultravisor call facility is not available.");
>> +	}
> 
> What about prefixing these two with 'prot_virt:'? It seems the name is
> settled now?

It is not settled, but I can certainly do something like

#define KMSG_COMPONENT "prot_virt"
#define pr_fmt(fmt) KMSG_COMPONENT ": " fmt


to prefix all pr_* calls in this file.


* Re: [PATCH 07/35] KVM: s390: add new variants of UV CALL
  2020-02-07 11:39 ` [PATCH 07/35] KVM: s390: add new variants of UV CALL Christian Borntraeger
  2020-02-07 14:34   ` Thomas Huth
@ 2020-02-10 12:16   ` Cornelia Huck
  2020-02-10 12:22     ` Christian Borntraeger
  2020-02-14 18:28   ` David Hildenbrand
  2 siblings, 1 reply; 147+ messages in thread
From: Cornelia Huck @ 2020-02-10 12:16 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Janosch Frank, KVM, David Hildenbrand, Thomas Huth,
	Ulrich Weigand, Claudio Imbrenda, Andrea Arcangeli, linux-s390,
	Michael Mueller, Vasily Gorbik, Janosch Frank

On Fri,  7 Feb 2020 06:39:30 -0500
Christian Borntraeger <borntraeger@de.ibm.com> wrote:

> From: Janosch Frank <frankja@linux.ibm.com>
> 
> This add 2 new variants of the UV CALL.

"This adds two new helper functions for doing UV CALLs."

?

> 
> The first variant handles UV CALLs that might have longer busy
> conditions or just need longer when doing partial completion. We should
> schedule when necessary.
> 
> The second variant handles UV CALLs that only need the handle but have
> no payload (e.g. destroying a VM). We can provide a simple wrapper for
> those.
> 
> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
> [borntraeger@de.ibm.com: patch merging, splitting, fixing]
> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
> ---
>  arch/s390/include/asm/uv.h | 59 ++++++++++++++++++++++++++++++++++++++
>  1 file changed, 59 insertions(+)

Reviewed-by: Cornelia Huck <cohuck@redhat.com>


* Re: [PATCH 07/35] KVM: s390: add new variants of UV CALL
  2020-02-10 12:16   ` Cornelia Huck
@ 2020-02-10 12:22     ` Christian Borntraeger
  0 siblings, 0 replies; 147+ messages in thread
From: Christian Borntraeger @ 2020-02-10 12:22 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Janosch Frank, KVM, David Hildenbrand, Thomas Huth,
	Ulrich Weigand, Claudio Imbrenda, Andrea Arcangeli, linux-s390,
	Michael Mueller, Vasily Gorbik, Janosch Frank



On 10.02.20 13:16, Cornelia Huck wrote:
> On Fri,  7 Feb 2020 06:39:30 -0500
> Christian Borntraeger <borntraeger@de.ibm.com> wrote:
> 
>> From: Janosch Frank <frankja@linux.ibm.com>
>>
>> This add 2 new variants of the UV CALL.
> 
> "This adds two new helper functions for doing UV CALLs."

ack

> 
> ?
> 
>>
>> The first variant handles UV CALLs that might have longer busy
>> conditions or just need longer when doing partial completion. We should
>> schedule when necessary.
>>
>> The second variant handles UV CALLs that only need the handle but have
>> no payload (e.g. destroying a VM). We can provide a simple wrapper for
>> those.
>>
>> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
>> [borntraeger@de.ibm.com: patch merging, splitting, fixing]
>> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
>> ---
>>  arch/s390/include/asm/uv.h | 59 ++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 59 insertions(+)
> 
> Reviewed-by: Cornelia Huck <cohuck@redhat.com>
> 


* Re: [PATCH 09/35] KVM: s390: protvirt: Add KVM api documentation
  2020-02-08 14:57   ` Thomas Huth
@ 2020-02-10 12:26     ` Christian Borntraeger
  2020-02-10 12:57       ` Cornelia Huck
  0 siblings, 1 reply; 147+ messages in thread
From: Christian Borntraeger @ 2020-02-10 12:26 UTC (permalink / raw)
  To: Thomas Huth, Janosch Frank
  Cc: KVM, Cornelia Huck, David Hildenbrand, Ulrich Weigand,
	Claudio Imbrenda, Andrea Arcangeli, linux-s390, Michael Mueller,
	Vasily Gorbik, Janosch Frank



On 08.02.20 15:57, Thomas Huth wrote:
> On 07/02/2020 12.39, Christian Borntraeger wrote:
>> From: Janosch Frank <frankja@linux.ibm.com>
>>
>> Add documentation for KVM_CAP_S390_PROTECTED capability and the
>> KVM_S390_PV_COMMAND and KVM_S390_PV_COMMAND_VCPU ioctls.
>>
>> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
>> [borntraeger@de.ibm.com: patch merging, splitting, fixing]
>> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
>> ---
>>  Documentation/virt/kvm/api.txt | 61 ++++++++++++++++++++++++++++++++++
>>  1 file changed, 61 insertions(+)
>>
>> diff --git a/Documentation/virt/kvm/api.txt b/Documentation/virt/kvm/api.txt
>> index 73448764f544..4874d42286ca 100644
>> --- a/Documentation/virt/kvm/api.txt
>> +++ b/Documentation/virt/kvm/api.txt
>> @@ -4204,6 +4204,60 @@ the clear cpu reset definition in the POP. However, the cpu is not put
>>  into ESA mode. This reset is a superset of the initial reset.
>>  
>>  
>> +4.125 KVM_S390_PV_COMMAND
>> +
>> +Capability: KVM_CAP_S390_PROTECTED
>> +Architectures: s390
>> +Type: vm ioctl
>> +Parameters: struct kvm_pv_cmd
>> +Returns: 0 on success, < 0 on error
>> +
>> +struct kvm_pv_cmd {
>> +	__u32	cmd;	/* Command to be executed */
>> +	__u16	rc;	/* Ultravisor return code */
>> +	__u16	rrc;	/* Ultravisor return reason code */
>> +	__u64	data;	/* Data or address */
> 
> That reminds me ... do we maybe want a "reserved" field in here for
> future extensions? Or is the "data" pointer enough?


This is now:

struct kvm_pv_cmd {

        __u32 cmd;      /* Command to be executed */
        __u32 flags;    /* flags for future extensions. Must be 0 for now */
        __u64 data;     /* Data or address */
        __u64 reserved[2];
};


* Re: [PATCH 02/35] KVM: s390/interrupt: do not pin adapter interrupt pages
  2020-02-07 11:39 ` [PATCH 02/35] KVM: s390/interrupt: do not pin adapter interrupt pages Christian Borntraeger
@ 2020-02-10 12:26   ` David Hildenbrand
  2020-02-10 18:38     ` Christian Borntraeger
  2020-02-10 18:56       ` Ulrich Weigand
  2020-02-10 12:40   ` [PATCH 02/35] KVM: s390/interrupt: do not pin adapter interrupt pages David Hildenbrand
  1 sibling, 2 replies; 147+ messages in thread
From: David Hildenbrand @ 2020-02-10 12:26 UTC (permalink / raw)
  To: Christian Borntraeger, Janosch Frank
  Cc: KVM, Cornelia Huck, Thomas Huth, Ulrich Weigand,
	Claudio Imbrenda, Andrea Arcangeli, linux-s390, Michael Mueller,
	Vasily Gorbik, linux-mm, Andrew Morton

On 07.02.20 12:39, Christian Borntraeger wrote:
> From: Ulrich Weigand <Ulrich.Weigand@de.ibm.com>
> 
> The adapter interrupt page containing the indicator bits is currently
> pinned. That means that a guest with many devices can pin a lot of
> memory pages in the host. This also complicates the reference tracking
> which is needed for memory management handling of protected virtual
> machines.
> We can reuse the pte notifiers to "cache" the page without pinning it.
> 
> Signed-off-by: Ulrich Weigand <Ulrich.Weigand@de.ibm.com>
> Suggested-by: Andrea Arcangeli <aarcange@redhat.com>
> [borntraeger@de.ibm.com: patch merging, splitting, fixing]
> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
> ---

So, instead of pinning explicitly, look up the page address, cache it,
and glue its lifetime to the gmap table entry. When that entry is
changed, invalidate the cached page. On re-access, look up the page
again and register the gmap notifier for the table entry again.

[...]

>  #define MAX_S390_IO_ADAPTERS ((MAX_ISC + 1) * 8)
> diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
> index c06c89d370a7..4bfb2f8fe57c 100644
> --- a/arch/s390/kvm/interrupt.c
> +++ b/arch/s390/kvm/interrupt.c
> @@ -28,6 +28,7 @@
>  #include <asm/switch_to.h>
>  #include <asm/nmi.h>
>  #include <asm/airq.h>
> +#include <linux/pagemap.h>
>  #include "kvm-s390.h"
>  #include "gaccess.h"
>  #include "trace-s390.h"
> @@ -2328,8 +2329,8 @@ static int register_io_adapter(struct kvm_device *dev,
>  		return -ENOMEM;
>  
>  	INIT_LIST_HEAD(&adapter->maps);
> -	init_rwsem(&adapter->maps_lock);
> -	atomic_set(&adapter->nr_maps, 0);
> +	spin_lock_init(&adapter->maps_lock);
> +	adapter->nr_maps = 0;
>  	adapter->id = adapter_info.id;
>  	adapter->isc = adapter_info.isc;
>  	adapter->maskable = adapter_info.maskable;
> @@ -2375,19 +2376,15 @@ static int kvm_s390_adapter_map(struct kvm *kvm, unsigned int id, __u64 addr)
>  		ret = -EFAULT;
>  		goto out;
>  	}
> -	ret = get_user_pages_fast(map->addr, 1, FOLL_WRITE, &map->page);
> -	if (ret < 0)
> -		goto out;
> -	BUG_ON(ret != 1);
> -	down_write(&adapter->maps_lock);
> -	if (atomic_inc_return(&adapter->nr_maps) < MAX_S390_ADAPTER_MAPS) {
> +	spin_lock(&adapter->maps_lock);
> +	if (adapter->nr_maps < MAX_S390_ADAPTER_MAPS) {
> +		adapter->nr_maps++;
>  		list_add_tail(&map->list, &adapter->maps);

I do wonder if we should check for duplicates. The unmap path will only
remove exactly one entry. But maybe this can never happen or is already
handled at a higher layer.

>  }
> @@ -2430,7 +2426,6 @@ void kvm_s390_destroy_adapters(struct kvm *kvm)
>  		list_for_each_entry_safe(map, tmp,
>  					 &kvm->arch.adapters[i]->maps, list) {
>  			list_del(&map->list);
> -			put_page(map->page);
>  			kfree(map);
>  		}
>  		kfree(kvm->arch.adapters[i]);

Between the gmap being removed in kvm_arch_vcpu_destroy() and
kvm_s390_destroy_adapters(), the entries would no longer properly get
invalidated. AFAIK, removing/freeing the gmap will not trigger any
notifiers.

Not sure if that's an issue (IOW, if we can have some very weird race).
But I guess we would have similar races already :)

> @@ -2690,6 +2685,31 @@ struct kvm_device_ops kvm_flic_ops = {
>  	.destroy = flic_destroy,
>  };
>  
> +void kvm_s390_adapter_gmap_notifier(struct gmap *gmap, unsigned long start,
> +				    unsigned long end)
> +{
> +	struct kvm *kvm = gmap->private;
> +	struct s390_map_info *map, *tmp;
> +	int i;
> +
> +	for (i = 0; i < MAX_S390_IO_ADAPTERS; i++) {
> +		struct s390_io_adapter *adapter = kvm->arch.adapters[i];
> +
> +		if (!adapter)
> +			continue;

I have to ask a very dumb question: How is kvm->arch.adapters[] protected?

I don't see any explicit locking e.g., on
flic_set_attr()->register_io_adapter().


[...]

> +static struct page *get_map_page(struct kvm *kvm,
> +				 struct s390_io_adapter *adapter,
> +				 u64 addr)
>  {
>  	struct s390_map_info *map;
> +	unsigned long uaddr;
> +	struct page *page;
> +	bool need_retry;
> +	int ret;
>  
>  	if (!adapter)
>  		return NULL;
> +retry:
> +	page = NULL;
> +	uaddr = 0;
> +	spin_lock(&adapter->maps_lock);
> +	list_for_each_entry(map, &adapter->maps, list)
> +		if (map->guest_addr == addr) {

Could it happen that we don't have a fitting entry in the list?

> +			uaddr = map->addr;
> +			page = map->page;
> +			if (!page)
> +				map->page = ERR_PTR(-EBUSY);
> +			else if (IS_ERR(page) || !page_cache_get_speculative(page)) {
> +				spin_unlock(&adapter->maps_lock);
> +				goto retry;
> +			}
> +			break;
> +		}

Can we please factor out looking up the list entry to a separate
function, to be called under lock? (and e.g., use it below as well)

spin_lock(&adapter->maps_lock);
entry = fancy_new_function();
if (!entry)
	return NULL;
uaddr = entry->addr;
page = entry->page;
if (!page)
	...
spin_unlock(&adapter->maps_lock);
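Such a factored-out lookup could look like this userspace model (struct and function names are simplified/invented; the real code walks a list_head under adapter->maps_lock):

```c
#include <assert.h>
#include <stddef.h>

/* simplified stand-in for struct s390_map_info */
struct map_info_model {
	unsigned long guest_addr;
	unsigned long addr;
	void *page;
	struct map_info_model *next;	/* plain list instead of list_head */
};

/*
 * The factored-out helper suggested above: find the map entry for a
 * guest address, or NULL if there is none. In the kernel this would be
 * called with adapter->maps_lock held.
 */
static struct map_info_model *find_map_entry(struct map_info_model *head,
					     unsigned long guest_addr)
{
	struct map_info_model *map;

	for (map = head; map; map = map->next)
		if (map->guest_addr == guest_addr)
			return map;
	return NULL;
}
```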


> +	spin_unlock(&adapter->maps_lock);
> +
> +	if (page)
> +		return page;
> +	if (!uaddr)
> +		return NULL;
>  
> -	list_for_each_entry(map, &adapter->maps, list) {
> -		if (map->guest_addr == addr)
> -			return map;
> +	down_read(&kvm->mm->mmap_sem);
> +	ret = set_pgste_bits(kvm->mm, uaddr, PGSTE_IN_BIT, PGSTE_IN_BIT);
> +	if (ret)
> +		goto fail;
> +	ret = get_user_pages_remote(NULL, kvm->mm, uaddr, 1, FOLL_WRITE,
> +				    &page, NULL, NULL);
> +	if (ret < 1)
> +		page = NULL;
> +fail:
> +	up_read(&kvm->mm->mmap_sem);
> +	need_retry = true;
> +	spin_lock(&adapter->maps_lock);
> +	list_for_each_entry(map, &adapter->maps, list)
> +		if (map->guest_addr == addr) {

Could it happen that our entry is suddenly no longer in the list?

> +			if (map->page == ERR_PTR(-EBUSY)) {
> +				map->page = page;
> +				need_retry = false;
> +			} else if (IS_ERR(map->page)) {

else if (map->page == ERR_PTR(-EINVAL)

or simply "else" (every other value would be a BUG_ON, right?)

/* race with a notifier - don't store the entry and retry */

> +				map->page = NULL;
> +			}



> +			break;
> +		}
> +	spin_unlock(&adapter->maps_lock);
> +	if (need_retry) {
> +		if (page)
> +			put_page(page);
> +		goto retry;
>  	}
> -	return NULL;
> +
> +	return page;

Wow, this function is ... special. It took me way too long to figure out
what is going on here. We certainly need comments in there.

I can see that

- ERR_PTR(-EBUSY) is used when somebody is about to do the
  get_user_pages_remote(). Others have to loop until that is resolved.
- ERR_PTR(-EINVAL) is used when the entry gets invalidated by the
  notifier while somebody is about to set it (while still
  ERR_PTR(-EBUSY)). The one currently processing the entry will
  eventually set it back to NULL.

I think we should make this clearer by only setting ERR_PTR(-EINVAL) in
the notifier if already ERR_PTR(-EBUSY), along with a comment.

Can we document the values for map->page and how they are to be handled
right in the struct?
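To make the two special values concrete: the kernel encodes small negative errno values directly in the pointer. A userspace re-implementation of the relevant helpers (mirroring the well-known include/linux/err.h macros; the state list below paraphrases this review, not the final code):

```c
#include <assert.h>
#include <errno.h>

/* userspace re-implementation of the kernel's pointer-error encoding */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

/*
 * map->page states as discussed in this review:
 *   NULL            - nothing cached; the next access looks the page up
 *   ERR_PTR(-EBUSY) - somebody is about to do get_user_pages_remote();
 *                     others loop until this is resolved
 *   other ERR_PTR   - invalidated by the notifier while busy; the
 *                     thread that set -EBUSY resets it to NULL
 *   valid pointer   - cached page, usable after a successful
 *                     page_cache_get_speculative()
 */
```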

-- 
Thanks,

David / dhildenb


* Re: [PATCH/RFC] KVM: s390: protvirt: pass-through rc and rrc
  2020-02-10 12:06         ` Christian Borntraeger
@ 2020-02-10 12:29           ` Thomas Huth
  2020-02-10 12:50           ` Cornelia Huck
  1 sibling, 0 replies; 147+ messages in thread
From: Thomas Huth @ 2020-02-10 12:29 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Ulrich.Weigand, aarcange, cohuck, david, frankja, frankja, gor,
	imbrenda, kvm, linux-s390, mimu

On 10/02/2020 13.06, Christian Borntraeger wrote:
> What about the following. I will rip out RC and RRC but add 
> a 32bit flags field (which must be 0) and 3*64 bit reserved.

Flags and reserved always sounds good :-)

 Thomas


* Re: [PATCH 03/35] s390/protvirt: introduce host side setup
  2020-02-10 12:14     ` Christian Borntraeger
@ 2020-02-10 12:31       ` Cornelia Huck
  0 siblings, 0 replies; 147+ messages in thread
From: Cornelia Huck @ 2020-02-10 12:31 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Janosch Frank, KVM, David Hildenbrand, Thomas Huth,
	Ulrich Weigand, Claudio Imbrenda, Andrea Arcangeli, linux-s390,
	Michael Mueller, Vasily Gorbik

On Mon, 10 Feb 2020 13:14:03 +0100
Christian Borntraeger <borntraeger@de.ibm.com> wrote:

> On 10.02.20 12:54, Cornelia Huck wrote:
> > On Fri,  7 Feb 2020 06:39:26 -0500
> > Christian Borntraeger <borntraeger@de.ibm.com> wrote:
> >   
> >> From: Vasily Gorbik <gor@linux.ibm.com>
> >>
> >> Add "prot_virt" command line option which controls if the kernel
> >> protected VMs support is enabled at early boot time. This has to be
> >> done early, because it needs large amounts of memory and will disable
> >> some features like STP time sync for the lpar.
> >>
> >> Extend ultravisor info definitions and expose it via uv_info struct
> >> filled in during startup.
> >>
> >> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
> >> [borntraeger@de.ibm.com: patch merging, splitting, fixing]
> >> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
> >> ---
> >>  .../admin-guide/kernel-parameters.txt         |  5 ++
> >>  arch/s390/boot/Makefile                       |  2 +-
> >>  arch/s390/boot/uv.c                           | 21 +++++++-
> >>  arch/s390/include/asm/uv.h                    | 46 +++++++++++++++--
> >>  arch/s390/kernel/Makefile                     |  1 +
> >>  arch/s390/kernel/setup.c                      |  4 --
> >>  arch/s390/kernel/uv.c                         | 49 +++++++++++++++++++
> >>  7 files changed, 119 insertions(+), 9 deletions(-)
> >>  create mode 100644 arch/s390/kernel/uv.c  
> > 
> > (...)
> >   
> >> diff --git a/arch/s390/boot/Makefile b/arch/s390/boot/Makefile
> >> index e2c47d3a1c89..30f1811540c5 100644
> >> --- a/arch/s390/boot/Makefile
> >> +++ b/arch/s390/boot/Makefile
> >> @@ -37,7 +37,7 @@ CFLAGS_sclp_early_core.o += -I$(srctree)/drivers/s390/char
> >>  obj-y	:= head.o als.o startup.o mem_detect.o ipl_parm.o ipl_report.o
> >>  obj-y	+= string.o ebcdic.o sclp_early_core.o mem.o ipl_vmparm.o cmdline.o
> >>  obj-y	+= version.o pgm_check_info.o ctype.o text_dma.o
> >> -obj-$(CONFIG_PROTECTED_VIRTUALIZATION_GUEST)	+= uv.o
> >> +obj-$(findstring y, $(CONFIG_PROTECTED_VIRTUALIZATION_GUEST) $(CONFIG_PGSTE))	+= uv.o  
> > 
> > I'm wondering why you're checking CONFIG_PGSTE here...  
> 
> It was just simpler for a Makefile, because CONFIG_KVM can be m or y.
> PGSTE is always y when CONFIG_KVM is set. Suggestions welcome.

My only complaint is that it is a bit non-obvious at a glance... but
yeah, I don't have a better suggestion, either.

> 
> [...]
> 
> >> +		prot_virt_host = 0;
> >> +		pr_info("Running as protected virtualization guest.");
> >> +	}
> >> +
> >> +	if (prot_virt_host && !test_facility(158)) {
> >> +		prot_virt_host = 0;
> >> +		pr_info("The ultravisor call facility is not available.");
> >> +	}  
> > 
> > What about prefixing these two with 'prot_virt:'? It seems the name is
> > settled now?  
> 
> It is not settled, but I can certainly do something like
> 
> #define KMSG_COMPONENT "prot_virt"
> #define pr_fmt(fmt) KMSG_COMPONENT ": " fmt
> 
> 
> to prefix all pr_* calls in this file.

That would make it easier to associate any messages (especially the
second message here) with this feature, I think.


* Re: [PATCH 03/35] s390/protvirt: introduce host side setup
  2020-02-07 11:39 ` [PATCH 03/35] s390/protvirt: introduce host side setup Christian Borntraeger
  2020-02-10  9:42   ` Thomas Huth
  2020-02-10 11:54   ` Cornelia Huck
@ 2020-02-10 12:38   ` David Hildenbrand
  2020-02-10 12:54     ` Christian Borntraeger
  2 siblings, 1 reply; 147+ messages in thread
From: David Hildenbrand @ 2020-02-10 12:38 UTC (permalink / raw)
  To: Christian Borntraeger, Janosch Frank
  Cc: KVM, Cornelia Huck, Thomas Huth, Ulrich Weigand,
	Claudio Imbrenda, Andrea Arcangeli, linux-s390, Michael Mueller,
	Vasily Gorbik

On 07.02.20 12:39, Christian Borntraeger wrote:
> From: Vasily Gorbik <gor@linux.ibm.com>
> 
> Add "prot_virt" command line option which controls if the kernel
> protected VMs support is enabled at early boot time. This has to be
> done early, because it needs large amounts of memory and will disable
> some features like STP time sync for the lpar.
> 
> Extend ultravisor info definitions and expose it via uv_info struct
> filled in during startup.
> 
> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
> [borntraeger@de.ibm.com: patch merging, splitting, fixing]
> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
> ---
>  .../admin-guide/kernel-parameters.txt         |  5 ++
>  arch/s390/boot/Makefile                       |  2 +-
>  arch/s390/boot/uv.c                           | 21 +++++++-
>  arch/s390/include/asm/uv.h                    | 46 +++++++++++++++--
>  arch/s390/kernel/Makefile                     |  1 +
>  arch/s390/kernel/setup.c                      |  4 --
>  arch/s390/kernel/uv.c                         | 49 +++++++++++++++++++
>  7 files changed, 119 insertions(+), 9 deletions(-)
>  create mode 100644 arch/s390/kernel/uv.c
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index ade4e6ec23e0..327af96f9528 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -3750,6 +3750,11 @@
>  			before loading.
>  			See Documentation/admin-guide/blockdev/ramdisk.rst.
>  
> +	prot_virt=	[S390] enable hosting protected virtual machines
> +			isolated from the hypervisor (if hardware supports
> +			that).
> +			Format: <bool>
> +
>  	psi=		[KNL] Enable or disable pressure stall information
>  			tracking.
>  			Format: <bool>
> diff --git a/arch/s390/boot/Makefile b/arch/s390/boot/Makefile
> index e2c47d3a1c89..30f1811540c5 100644
> --- a/arch/s390/boot/Makefile
> +++ b/arch/s390/boot/Makefile
> @@ -37,7 +37,7 @@ CFLAGS_sclp_early_core.o += -I$(srctree)/drivers/s390/char
>  obj-y	:= head.o als.o startup.o mem_detect.o ipl_parm.o ipl_report.o
>  obj-y	+= string.o ebcdic.o sclp_early_core.o mem.o ipl_vmparm.o cmdline.o
>  obj-y	+= version.o pgm_check_info.o ctype.o text_dma.o
> -obj-$(CONFIG_PROTECTED_VIRTUALIZATION_GUEST)	+= uv.o
> +obj-$(findstring y, $(CONFIG_PROTECTED_VIRTUALIZATION_GUEST) $(CONFIG_PGSTE))	+= uv.o
>  obj-$(CONFIG_RELOCATABLE)	+= machine_kexec_reloc.o
>  obj-$(CONFIG_RANDOMIZE_BASE)	+= kaslr.o
>  targets	:= bzImage startup.a section_cmp.boot.data section_cmp.boot.preserved.data $(obj-y)
> diff --git a/arch/s390/boot/uv.c b/arch/s390/boot/uv.c
> index ed007f4a6444..af9e1cc93c68 100644
> --- a/arch/s390/boot/uv.c
> +++ b/arch/s390/boot/uv.c
> @@ -3,7 +3,13 @@
>  #include <asm/facility.h>
>  #include <asm/sections.h>
>  
> +/* will be used in arch/s390/kernel/uv.c */
> +#ifdef CONFIG_PROTECTED_VIRTUALIZATION_GUEST
>  int __bootdata_preserved(prot_virt_guest);
> +#endif
> +#if IS_ENABLED(CONFIG_KVM)
> +struct uv_info __bootdata_preserved(uv_info);
> +#endif
>  
>  void uv_query_info(void)
>  {
> @@ -18,7 +24,20 @@ void uv_query_info(void)
>  	if (uv_call(0, (uint64_t)&uvcb))
>  		return;
>  
> -	if (test_bit_inv(BIT_UVC_CMD_SET_SHARED_ACCESS, (unsigned long *)uvcb.inst_calls_list) &&
> +	if (IS_ENABLED(CONFIG_KVM)) {
> +		memcpy(uv_info.inst_calls_list, uvcb.inst_calls_list, sizeof(uv_info.inst_calls_list));
> +		uv_info.uv_base_stor_len = uvcb.uv_base_stor_len;
> +		uv_info.guest_base_stor_len = uvcb.conf_base_phys_stor_len;
> +		uv_info.guest_virt_base_stor_len = uvcb.conf_base_virt_stor_len;
> +		uv_info.guest_virt_var_stor_len = uvcb.conf_virt_var_stor_len;
> +		uv_info.guest_cpu_stor_len = uvcb.cpu_stor_len;
> +		uv_info.max_sec_stor_addr = ALIGN(uvcb.max_guest_stor_addr, PAGE_SIZE);
> +		uv_info.max_num_sec_conf = uvcb.max_num_sec_conf;
> +		uv_info.max_guest_cpus = uvcb.max_guest_cpus;
> +	}
> +
> +	if (IS_ENABLED(CONFIG_PROTECTED_VIRTUALIZATION_GUEST) &&
> +	    test_bit_inv(BIT_UVC_CMD_SET_SHARED_ACCESS, (unsigned long *)uvcb.inst_calls_list) &&
>  	    test_bit_inv(BIT_UVC_CMD_REMOVE_SHARED_ACCESS, (unsigned long *)uvcb.inst_calls_list))
>  		prot_virt_guest = 1;
>  }
> diff --git a/arch/s390/include/asm/uv.h b/arch/s390/include/asm/uv.h
> index 4093a2856929..cc7b0b0bc874 100644
> --- a/arch/s390/include/asm/uv.h
> +++ b/arch/s390/include/asm/uv.h
> @@ -44,7 +44,19 @@ struct uv_cb_qui {
>  	struct uv_cb_header header;
>  	u64 reserved08;
>  	u64 inst_calls_list[4];
> -	u64 reserved30[15];
> +	u64 reserved30[2];
> +	u64 uv_base_stor_len;
> +	u64 reserved48;
> +	u64 conf_base_phys_stor_len;
> +	u64 conf_base_virt_stor_len;
> +	u64 conf_virt_var_stor_len;
> +	u64 cpu_stor_len;
> +	u32 reserved70[3];
> +	u32 max_num_sec_conf;
> +	u64 max_guest_stor_addr;
> +	u8  reserved88[158-136];
> +	u16 max_guest_cpus;
> +	u64 reserveda0;
>  } __packed __aligned(8);
>  
>  struct uv_cb_share {
> @@ -69,9 +81,21 @@ static inline int uv_call(unsigned long r1, unsigned long r2)
>  	return cc;
>  }
>  
> -#ifdef CONFIG_PROTECTED_VIRTUALIZATION_GUEST
> +struct uv_info {
> +	unsigned long inst_calls_list[4];
> +	unsigned long uv_base_stor_len;
> +	unsigned long guest_base_stor_len;
> +	unsigned long guest_virt_base_stor_len;
> +	unsigned long guest_virt_var_stor_len;
> +	unsigned long guest_cpu_stor_len;
> +	unsigned long max_sec_stor_addr;
> +	unsigned int max_num_sec_conf;
> +	unsigned short max_guest_cpus;
> +};
> +extern struct uv_info uv_info;
>  extern int prot_virt_guest;
>  
> +#ifdef CONFIG_PROTECTED_VIRTUALIZATION_GUEST
>  static inline int is_prot_virt_guest(void)
>  {
>  	return prot_virt_guest;
> @@ -121,11 +145,27 @@ static inline int uv_remove_shared(unsigned long addr)
>  	return share(addr, UVC_CMD_REMOVE_SHARED_ACCESS);
>  }
>  
> -void uv_query_info(void);
>  #else
>  #define is_prot_virt_guest() 0
>  static inline int uv_set_shared(unsigned long addr) { return 0; }
>  static inline int uv_remove_shared(unsigned long addr) { return 0; }
> +#endif
> +
> +#if IS_ENABLED(CONFIG_KVM)
> +extern int prot_virt_host;
> +
> +static inline int is_prot_virt_host(void)
> +{
> +	return prot_virt_host;
> +}
> +#else
> +#define is_prot_virt_host() 0
> +#endif
> +
> +#if defined(CONFIG_PROTECTED_VIRTUALIZATION_GUEST) ||                          \
> +	IS_ENABLED(CONFIG_KVM)
> +void uv_query_info(void);
> +#else
>  static inline void uv_query_info(void) {}
>  #endif
>  
> diff --git a/arch/s390/kernel/Makefile b/arch/s390/kernel/Makefile
> index 2b1203cf7be6..22bfb8d5084e 100644
> --- a/arch/s390/kernel/Makefile
> +++ b/arch/s390/kernel/Makefile
> @@ -78,6 +78,7 @@ obj-$(CONFIG_PERF_EVENTS)	+= perf_cpum_cf_events.o perf_regs.o
>  obj-$(CONFIG_PERF_EVENTS)	+= perf_cpum_cf_diag.o
>  
>  obj-$(CONFIG_TRACEPOINTS)	+= trace.o
> +obj-$(findstring y, $(CONFIG_PROTECTED_VIRTUALIZATION_GUEST) $(CONFIG_PGSTE))	+= uv.o
>  
>  # vdso
>  obj-y				+= vdso64/
> diff --git a/arch/s390/kernel/setup.c b/arch/s390/kernel/setup.c
> index d5fbd754f41a..f2ab2528859f 100644
> --- a/arch/s390/kernel/setup.c
> +++ b/arch/s390/kernel/setup.c
> @@ -92,10 +92,6 @@ char elf_platform[ELF_PLATFORM_SIZE];
>  
>  unsigned long int_hwcap = 0;
>  
> -#ifdef CONFIG_PROTECTED_VIRTUALIZATION_GUEST
> -int __bootdata_preserved(prot_virt_guest);
> -#endif
> -
>  int __bootdata(noexec_disabled);
>  int __bootdata(memory_end_set);
>  unsigned long __bootdata(memory_end);
> diff --git a/arch/s390/kernel/uv.c b/arch/s390/kernel/uv.c
> new file mode 100644
> index 000000000000..fbf2a98de642
> --- /dev/null
> +++ b/arch/s390/kernel/uv.c
> @@ -0,0 +1,49 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Common Ultravisor functions and initialization
> + *
> + * Copyright IBM Corp. 2019, 2020
> + */
> +#include <linux/kernel.h>
> +#include <linux/types.h>
> +#include <linux/sizes.h>
> +#include <linux/bitmap.h>
> +#include <linux/memblock.h>
> +#include <asm/facility.h>
> +#include <asm/sections.h>
> +#include <asm/uv.h>
> +
> +/* the bootdata_preserved fields come from ones in arch/s390/boot/uv.c */
> +#ifdef CONFIG_PROTECTED_VIRTUALIZATION_GUEST
> +int __bootdata_preserved(prot_virt_guest);
> +#endif
> +
> +#if IS_ENABLED(CONFIG_KVM)
> +int prot_virt_host;
> +EXPORT_SYMBOL(prot_virt_host);
> +struct uv_info __bootdata_preserved(uv_info);
> +EXPORT_SYMBOL(uv_info);
> +
> +static int __init prot_virt_setup(char *val)
> +{
> +	bool enabled;
> +	int rc;
> +
> +	rc = kstrtobool(val, &enabled);
> +	if (!rc && enabled)
> +		prot_virt_host = 1;
> +
> +	if (is_prot_virt_guest() && prot_virt_host) {
> +		prot_virt_host = 0;
> +		pr_info("Running as protected virtualization guest.");

/me confused about gluing an informative message to disabling a feature.

Should this actually be a

pr_warn("Protected virtualization not available in protected guests.");

> +	}
> +
> +	if (prot_virt_host && !test_facility(158)) {
> +		prot_virt_host = 0;
> +		pr_info("The ultravisor call facility is not available.");

It's somewhat confusing for a user who requested "prot_virt" to get
that error message.

pr_warn("Protected virtualization not supported by the hardware.");


-- 
Thanks,

David / dhildenb


* Re: [PATCH 02/35] KVM: s390/interrupt: do not pin adapter interrupt pages
  2020-02-07 11:39 ` [PATCH 02/35] KVM: s390/interrupt: do not pin adapter interrupt pages Christian Borntraeger
  2020-02-10 12:26   ` David Hildenbrand
@ 2020-02-10 12:40   ` David Hildenbrand
  1 sibling, 0 replies; 147+ messages in thread
From: David Hildenbrand @ 2020-02-10 12:40 UTC (permalink / raw)
  To: Christian Borntraeger, Janosch Frank
  Cc: KVM, Cornelia Huck, Thomas Huth, Ulrich Weigand,
	Claudio Imbrenda, Andrea Arcangeli, linux-s390, Michael Mueller,
	Vasily Gorbik, linux-mm, Andrew Morton

[...]

> +void kvm_s390_adapter_gmap_notifier(struct gmap *gmap, unsigned long start,
> +				    unsigned long end)
> +{
> +	struct kvm *kvm = gmap->private;
> +	struct s390_map_info *map, *tmp;
> +	int i;
> +
> +	for (i = 0; i < MAX_S390_IO_ADAPTERS; i++) {
> +		struct s390_io_adapter *adapter = kvm->arch.adapters[i];
> +
> +		if (!adapter)
> +			continue;
> +		spin_lock(&adapter->maps_lock);
> +		list_for_each_entry_safe(map, tmp, &adapter->maps, list) {

list_for_each_entry() is sufficient, we are not removing entries.

> +			if (start <= map->guest_addr && map->guest_addr < end) {
> +				if (IS_ERR(map->page))
> +					map->page = ERR_PTR(-EAGAIN);
> +				else
> +					map->page = NULL;
> +			}

-- 
Thanks,

David / dhildenb


* Re: [PATCH/RFC] KVM: s390: protvirt: pass-through rc and rrc
  2020-02-10 12:06         ` Christian Borntraeger
  2020-02-10 12:29           ` Thomas Huth
@ 2020-02-10 12:50           ` Cornelia Huck
  2020-02-10 12:56             ` Christian Borntraeger
  1 sibling, 1 reply; 147+ messages in thread
From: Cornelia Huck @ 2020-02-10 12:50 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: thuth, Ulrich.Weigand, aarcange, david, frankja, frankja, gor,
	imbrenda, kvm, linux-s390, mimu

On Mon, 10 Feb 2020 13:06:19 +0100
Christian Borntraeger <borntraeger@de.ibm.com> wrote:

> What about the following. I will rip out RC and RRC but add 
> a 32bit flags field (which must be 0) and 3*64 bit reserved.

Probably dumb question: How are these new fields supposed to be used?


* Re: [PATCH 03/35] s390/protvirt: introduce host side setup
  2020-02-10 12:38   ` David Hildenbrand
@ 2020-02-10 12:54     ` Christian Borntraeger
  0 siblings, 0 replies; 147+ messages in thread
From: Christian Borntraeger @ 2020-02-10 12:54 UTC (permalink / raw)
  To: David Hildenbrand, Janosch Frank
  Cc: KVM, Cornelia Huck, Thomas Huth, Ulrich Weigand,
	Claudio Imbrenda, Andrea Arcangeli, linux-s390, Michael Mueller,
	Vasily Gorbik



On 10.02.20 13:38, David Hildenbrand wrote:

>> +		pr_info("Running as protected virtualization guest.");
> 
> /me confused about gluing an informative message to disabling a feature.
> 
> Should this actually be a
> 
> pr_warn("Protected virtualization not available in protected guests.");

Yes, this is probably better.

> 
>> +	}
>> +
>> +	if (prot_virt_host && !test_facility(158)) {
>> +		prot_virt_host = 0;
>> +		pr_info("The ultravisor call facility is not available.");
> 
> It's somewhat confusing for a user who requested "prot_virt" to get
> that error message.
> 
> pr_warn("Protected virtualization not supported by the hardware.");


The name is still in flux, but we can change that later on. Will use your 
variant.


* Re: [PATCH/RFC] KVM: s390: protvirt: pass-through rc and rrc
  2020-02-10 12:50           ` Cornelia Huck
@ 2020-02-10 12:56             ` Christian Borntraeger
  2020-02-11  8:48               ` Janosch Frank
  0 siblings, 1 reply; 147+ messages in thread
From: Christian Borntraeger @ 2020-02-10 12:56 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: thuth, Ulrich.Weigand, aarcange, david, frankja, frankja, gor,
	imbrenda, kvm, linux-s390, mimu



On 10.02.20 13:50, Cornelia Huck wrote:
> On Mon, 10 Feb 2020 13:06:19 +0100
> Christian Borntraeger <borntraeger@de.ibm.com> wrote:
> 
>> What about the following. I will rip out RC and RRC but add 
>> a 32bit flags field (which must be 0) and 3*64 bit reserved.
> 
> Probably dumb question: How are these new fields supposed to be used?

This was planned for error handling in QEMU. As we have no user of rc/rrc
yet, I have ripped that out and added a flags field + 16 bytes of reserved.
Usage is as usual: flags must be 0. When flags != 0, the reserved fields
will take on a new meaning.


* Re: [PATCH 09/35] KVM: s390: protvirt: Add KVM api documentation
  2020-02-10 12:26     ` Christian Borntraeger
@ 2020-02-10 12:57       ` Cornelia Huck
  2020-02-10 13:02         ` Christian Borntraeger
  0 siblings, 1 reply; 147+ messages in thread
From: Cornelia Huck @ 2020-02-10 12:57 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Thomas Huth, Janosch Frank, KVM, David Hildenbrand,
	Ulrich Weigand, Claudio Imbrenda, Andrea Arcangeli, linux-s390,
	Michael Mueller, Vasily Gorbik, Janosch Frank

On Mon, 10 Feb 2020 13:26:35 +0100
Christian Borntraeger <borntraeger@de.ibm.com> wrote:

> On 08.02.20 15:57, Thomas Huth wrote:
> > On 07/02/2020 12.39, Christian Borntraeger wrote:  
> >> From: Janosch Frank <frankja@linux.ibm.com>
> >>
> >> Add documentation for KVM_CAP_S390_PROTECTED capability and the
> >> KVM_S390_PV_COMMAND and KVM_S390_PV_COMMAND_VCPU ioctls.
> >>
> >> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
> >> [borntraeger@de.ibm.com: patch merging, splitting, fixing]
> >> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
> >> ---
> >>  Documentation/virt/kvm/api.txt | 61 ++++++++++++++++++++++++++++++++++
> >>  1 file changed, 61 insertions(+)

> >> +4.125 KVM_S390_PV_COMMAND
> >> +
> >> +Capability: KVM_CAP_S390_PROTECTED
> >> +Architectures: s390
> >> +Type: vm ioctl
> >> +Parameters: struct kvm_pv_cmd
> >> +Returns: 0 on success, < 0 on error
> >> +
> >> +struct kvm_pv_cmd {
> >> +	__u32	cmd;	/* Command to be executed */
> >> +	__u16	rc;	/* Ultravisor return code */
> >> +	__u16	rrc;	/* Ultravisor return reason code */
> >> +	__u64	data;	/* Data or address */  
> > 
> > That reminds me ... do we maybe want a "reserved" field in here for
> > future extensions? Or is the "data" pointer enough?  
> 
> 
> This is now:
> 
> struct kvm_pv_cmd {
> 
>         __u32 cmd;      /* Command to be executed */
>         __u32 flags;    /* flags for future extensions. Must be 0 for now */
>         __u64 data;     /* Data or address */
>         __u64 reserved[2];
> };

Ok, that is where you add this... but still, the question: are those
fields only ever set by userspace, or could the kernel return things in
the reserved fields in the future?

Also, two 64 bit values seem a bit arbitrary... what about a data
address + length construct instead? (Length might be a fixed value per
flag?)


* Re: [PATCH 09/35] KVM: s390: protvirt: Add KVM api documentation
  2020-02-10 12:57       ` Cornelia Huck
@ 2020-02-10 13:02         ` Christian Borntraeger
  0 siblings, 0 replies; 147+ messages in thread
From: Christian Borntraeger @ 2020-02-10 13:02 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Thomas Huth, Janosch Frank, KVM, David Hildenbrand,
	Ulrich Weigand, Claudio Imbrenda, Andrea Arcangeli, linux-s390,
	Michael Mueller, Vasily Gorbik, Janosch Frank



On 10.02.20 13:57, Cornelia Huck wrote:
> On Mon, 10 Feb 2020 13:26:35 +0100
> Christian Borntraeger <borntraeger@de.ibm.com> wrote:
> 
>> On 08.02.20 15:57, Thomas Huth wrote:
>>> On 07/02/2020 12.39, Christian Borntraeger wrote:  
>>>> From: Janosch Frank <frankja@linux.ibm.com>
>>>>
>>>> Add documentation for KVM_CAP_S390_PROTECTED capability and the
>>>> KVM_S390_PV_COMMAND and KVM_S390_PV_COMMAND_VCPU ioctls.
>>>>
>>>> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
>>>> [borntraeger@de.ibm.com: patch merging, splitting, fixing]
>>>> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
>>>> ---
>>>>  Documentation/virt/kvm/api.txt | 61 ++++++++++++++++++++++++++++++++++
>>>>  1 file changed, 61 insertions(+)
> 
>>>> +4.125 KVM_S390_PV_COMMAND
>>>> +
>>>> +Capability: KVM_CAP_S390_PROTECTED
>>>> +Architectures: s390
>>>> +Type: vm ioctl
>>>> +Parameters: struct kvm_pv_cmd
>>>> +Returns: 0 on success, < 0 on error
>>>> +
>>>> +struct kvm_pv_cmd {
>>>> +	__u32	cmd;	/* Command to be executed */
>>>> +	__u16	rc;	/* Ultravisor return code */
>>>> +	__u16	rrc;	/* Ultravisor return reason code */
>>>> +	__u64	data;	/* Data or address */  
>>>
>>> That reminds me ... do we maybe want a "reserved" field in here for
>>> future extensions? Or is the "data" pointer enough?  
>>
>>
>> This is now:
>>
>> struct kvm_pv_cmd {
>>
>>         __u32 cmd;      /* Command to be executed */
>>         __u32 flags;    /* flags for future extensions. Must be 0 for now */
>>         __u64 data;     /* Data or address */
>>         __u64 reserved[2];
>> };
> 
> Ok, that is where you add this... but still, the question: are those
> fields only ever set by userspace, or could the kernel return things in
> the reserved fields in the future?

I will change the IOWR to make sure that we can have both directions.
> 
> Also, two 64 bit values seem a bit arbitrary... what about a data
> address + length construct instead? (Length might be a fixed value per
> flag?)

When you look at all the other examples, we define those as reserved bytes.
The idea is to have no semantics at all; whenever we add a new flag, we will
replace the reserved bytes with a new meaning.

e.g. see
struct kvm_s390_skeys {
        __u64 start_gfn;
        __u64 count;
        __u64 skeydata_addr;
        __u32 flags;
        __u32 reserved[9];
};

or

/* for KVM_S390_MEM_OP and KVM_S390_SIDA_OP */
struct kvm_s390_mem_op {
        /* in */
        __u64 gaddr;            /* the guest address */
        __u64 flags;            /* flags */
        __u32 size;             /* amount of bytes */
        __u32 op;               /* type of operation */
        __u64 buf;              /* buffer in userspace */
        __u8 ar;                /* the access register number */
        __u8 reserved21[3];     /* should be set to 0 */
        __u32 offset;           /* offset into the sida */
        __u8 reserved28[24];    /* should be set to 0 */
};
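
As an aside, the "reserved must be zero" convention only pays off if the
kernel actually enforces it on the way in. A minimal sketch of such a
check (struct and function names are made up for illustration and mirror
the layout quoted above; this is not the final uapi definition):

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical mirror of the discussed layout; illustrative only. */
struct kvm_pv_cmd_sketch {
	uint32_t cmd;          /* command to be executed */
	uint32_t flags;        /* must be 0 until a flag is defined */
	uint64_t data;         /* data or address */
	uint64_t reserved[2];  /* must be 0; gains meaning via new flags */
};

/* Forward-compat check in the usual kernel style: reject any bits we
 * do not understand yet, so old kernels fail loudly instead of
 * silently ignoring future semantics. */
static int pv_cmd_check(const struct kvm_pv_cmd_sketch *cmd)
{
	if (cmd->flags)
		return -EINVAL;
	if (cmd->reserved[0] || cmd->reserved[1])
		return -EINVAL;
	return 0;
}
```

With that in place, a future flag can redefine one of the reserved
words without any ambiguity about what old userspace may have put there.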

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH 34/35] KVM: s390: protvirt: Add UV cpu reset calls
  2020-02-07 11:39 ` [PATCH 34/35] KVM: s390: protvirt: Add UV cpu reset calls Christian Borntraeger
@ 2020-02-10 13:17   ` Cornelia Huck
  2020-02-10 13:25     ` Christian Borntraeger
  0 siblings, 1 reply; 147+ messages in thread
From: Cornelia Huck @ 2020-02-10 13:17 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Janosch Frank, KVM, David Hildenbrand, Thomas Huth,
	Ulrich Weigand, Claudio Imbrenda, Andrea Arcangeli, linux-s390,
	Michael Mueller, Vasily Gorbik, Janosch Frank

On Fri,  7 Feb 2020 06:39:57 -0500
Christian Borntraeger <borntraeger@de.ibm.com> wrote:

> From: Janosch Frank <frankja@linux.ibm.com>
> 
> For protected VMs, the VCPU resets are done by the Ultravisor, as KVM
> has no access to the VCPU registers.
> 
> As the Ultravisor will only accept a call for the reset that is
> needed, we need to fence the UV calls when chaining resets.

Last time, I suggested replacing this with

"Note that the ultravisor will only accept a call for the exact reset
that has been requested."

I still suggest that :)

> 
> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
> Reviewed-by: Thomas Huth <thuth@redhat.com>
> Reviewed-by: David Hildenbrand <david@redhat.com>
> [borntraeger@de.ibm.com: patch merging, splitting, fixing]
> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
> ---
>  arch/s390/kvm/kvm-s390.c | 20 ++++++++++++++++++++
>  1 file changed, 20 insertions(+)

Reviewed-by: Cornelia Huck <cohuck@redhat.com>

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH 31/35] KVM: s390: protvirt: Add UV debug trace
  2020-02-07 11:39 ` [PATCH 31/35] KVM: s390: protvirt: Add UV debug trace Christian Borntraeger
@ 2020-02-10 13:22   ` Cornelia Huck
  2020-02-10 13:40     ` Christian Borntraeger
  0 siblings, 1 reply; 147+ messages in thread
From: Cornelia Huck @ 2020-02-10 13:22 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Janosch Frank, KVM, David Hildenbrand, Thomas Huth,
	Ulrich Weigand, Claudio Imbrenda, Andrea Arcangeli, linux-s390,
	Michael Mueller, Vasily Gorbik, Janosch Frank

On Fri,  7 Feb 2020 06:39:54 -0500
Christian Borntraeger <borntraeger@de.ibm.com> wrote:

> From: Janosch Frank <frankja@linux.ibm.com>
> 
> Let's have some debug traces which stay around for longer than the
> guest.
> 
> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
> [borntraeger@de.ibm.com: patch merging, splitting, fixing]
> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
> ---
>  arch/s390/kvm/kvm-s390.c |  9 ++++++++-
>  arch/s390/kvm/kvm-s390.h |  9 +++++++++
>  arch/s390/kvm/pv.c       | 20 +++++++++++++++++++-
>  3 files changed, 36 insertions(+), 2 deletions(-)
(...)
> diff --git a/arch/s390/kvm/pv.c b/arch/s390/kvm/pv.c
> index a58f5106ba5f..da281d8dcc92 100644
> --- a/arch/s390/kvm/pv.c
> +++ b/arch/s390/kvm/pv.c
> @@ -74,6 +74,8 @@ int kvm_s390_pv_destroy_vm(struct kvm *kvm)
>  	atomic_set(&kvm->mm->context.is_protected, 0);
>  	VM_EVENT(kvm, 3, "PROTVIRT DESTROY VM: rc %x rrc %x",
>  		 ret >> 16, ret & 0x0000ffff);
> +	KVM_UV_EVENT(kvm, 3, "PROTVIRT DESTROY VM: rc %x rrc %x",
> +		 ret >> 16, ret & 0x0000ffff);
>  	return rc;
>  }
>  
> @@ -89,6 +91,8 @@ int kvm_s390_pv_destroy_cpu(struct kvm_vcpu *vcpu)
>  
>  		VCPU_EVENT(vcpu, 3, "PROTVIRT DESTROY VCPU: cpu %d rc %x rrc %x",
>  			   vcpu->vcpu_id, ret >> 16, ret & 0x0000ffff);

I think these should drop the vcpu_id, as VCPU_EVENT already includes
it (in the patch introducing them).

> +		KVM_UV_EVENT(vcpu->kvm, 3, "PROTVIRT DESTROY VCPU: cpu %d rc %x rrc %x",
> +			     vcpu->vcpu_id, ret >> 16, ret & 0x0000ffff);
>  	}
>  
>  	free_pages(vcpu->arch.pv.stor_base,

Otherwise, looks good.

Reviewed-by: Cornelia Huck <cohuck@redhat.com>

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH 34/35] KVM: s390: protvirt: Add UV cpu reset calls
  2020-02-10 13:17   ` Cornelia Huck
@ 2020-02-10 13:25     ` Christian Borntraeger
  0 siblings, 0 replies; 147+ messages in thread
From: Christian Borntraeger @ 2020-02-10 13:25 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Janosch Frank, KVM, David Hildenbrand, Thomas Huth,
	Ulrich Weigand, Claudio Imbrenda, Andrea Arcangeli, linux-s390,
	Michael Mueller, Vasily Gorbik, Janosch Frank



On 10.02.20 14:17, Cornelia Huck wrote:
> On Fri,  7 Feb 2020 06:39:57 -0500
> Christian Borntraeger <borntraeger@de.ibm.com> wrote:
> 
>> From: Janosch Frank <frankja@linux.ibm.com>
>>
>> For protected VMs, the VCPU resets are done by the Ultravisor, as KVM
>> has no access to the VCPU registers.
>>
>> As the Ultravisor will only accept a call for the reset that is
>> needed, we need to fence the UV calls when chaining resets.
> 
> Last time, I suggested replacing this with
> 
> "Note that the ultravisor will only accept a call for the exact reset
> that has been requested."
> 
> I still suggest that :)

Will do.

> 
>>
>> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
>> Reviewed-by: Thomas Huth <thuth@redhat.com>
>> Reviewed-by: David Hildenbrand <david@redhat.com>
>> [borntraeger@de.ibm.com: patch merging, splitting, fixing]
>> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
>> ---
>>  arch/s390/kvm/kvm-s390.c | 20 ++++++++++++++++++++
>>  1 file changed, 20 insertions(+)
> 
> Reviewed-by: Cornelia Huck <cohuck@redhat.com>
> 

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH 32/35] KVM: s390: protvirt: Mask PSW interrupt bits for interception 104 and 112
  2020-02-07 11:39 ` [PATCH 32/35] KVM: s390: protvirt: Mask PSW interrupt bits for interception 104 and 112 Christian Borntraeger
  2020-02-09 16:07   ` Thomas Huth
@ 2020-02-10 13:28   ` Cornelia Huck
  2020-02-10 13:48     ` Christian Borntraeger
  1 sibling, 1 reply; 147+ messages in thread
From: Cornelia Huck @ 2020-02-10 13:28 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Janosch Frank, KVM, David Hildenbrand, Thomas Huth,
	Ulrich Weigand, Claudio Imbrenda, Andrea Arcangeli, linux-s390,
	Michael Mueller, Vasily Gorbik, Janosch Frank

On Fri,  7 Feb 2020 06:39:55 -0500
Christian Borntraeger <borntraeger@de.ibm.com> wrote:

> From: Janosch Frank <frankja@linux.ibm.com>
> 
> We're not allowed to inject interrupts on intercepts that leave the
> guest state in an "in-beetween" state where the next SIE entry will do a

s/beetween/between/

> continuation.  Namely secure instruction interception and secure prefix

s/continuation. Namely/continuation, namely,/

Add which one is 104 and which one is 112, so you can match up the
description with the subject?


> interception.
> As our PSW is just a copy of the real one that will be replaced on the
> next exit, we can mask out the interrupt bits in the PSW to make sure
> that we do not inject anything.
> 
> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
> [borntraeger@de.ibm.com: patch merging, splitting, fixing]
> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
> ---
>  arch/s390/kvm/kvm-s390.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> index ced2bac251a6..8c7b27287b91 100644
> --- a/arch/s390/kvm/kvm-s390.c
> +++ b/arch/s390/kvm/kvm-s390.c
> @@ -4052,6 +4052,7 @@ static int vcpu_post_run(struct kvm_vcpu *vcpu, int exit_reason)
>  	return vcpu_post_run_fault_in_sie(vcpu);
>  }
>  
> +#define PSW_INT_MASK (PSW_MASK_EXT | PSW_MASK_IO | PSW_MASK_MCHECK)
>  static int __vcpu_run(struct kvm_vcpu *vcpu)
>  {
>  	int rc, exit_reason;
> @@ -4088,6 +4089,10 @@ static int __vcpu_run(struct kvm_vcpu *vcpu)
>  			memcpy(vcpu->run->s.regs.gprs,
>  			       sie_page->pv_grregs,
>  			       sizeof(sie_page->pv_grregs));

Add a comment, as suggested by Thomas last time?

> +			if (vcpu->arch.sie_block->icptcode == ICPT_PV_INSTR ||
> +			    vcpu->arch.sie_block->icptcode == ICPT_PV_PREF) {
> +				vcpu->arch.sie_block->gpsw.mask &= ~PSW_INT_MASK;
> +			}
>  		}
>  		local_irq_disable();
>  		__enable_cpu_timer_accounting(vcpu);

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH 31/35] KVM: s390: protvirt: Add UV debug trace
  2020-02-10 13:22   ` Cornelia Huck
@ 2020-02-10 13:40     ` Christian Borntraeger
  0 siblings, 0 replies; 147+ messages in thread
From: Christian Borntraeger @ 2020-02-10 13:40 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Janosch Frank, KVM, David Hildenbrand, Thomas Huth,
	Ulrich Weigand, Claudio Imbrenda, Andrea Arcangeli, linux-s390,
	Michael Mueller, Vasily Gorbik, Janosch Frank



On 10.02.20 14:22, Cornelia Huck wrote:
> On Fri,  7 Feb 2020 06:39:54 -0500
> Christian Borntraeger <borntraeger@de.ibm.com> wrote:
> 
>> From: Janosch Frank <frankja@linux.ibm.com>
>>
>> Let's have some debug traces which stay around for longer than the
>> guest.
>>
>> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
>> [borntraeger@de.ibm.com: patch merging, splitting, fixing]
>> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
>> ---
>>  arch/s390/kvm/kvm-s390.c |  9 ++++++++-
>>  arch/s390/kvm/kvm-s390.h |  9 +++++++++
>>  arch/s390/kvm/pv.c       | 20 +++++++++++++++++++-
>>  3 files changed, 36 insertions(+), 2 deletions(-)
> (...)
>> diff --git a/arch/s390/kvm/pv.c b/arch/s390/kvm/pv.c
>> index a58f5106ba5f..da281d8dcc92 100644
>> --- a/arch/s390/kvm/pv.c
>> +++ b/arch/s390/kvm/pv.c
>> @@ -74,6 +74,8 @@ int kvm_s390_pv_destroy_vm(struct kvm *kvm)
>>  	atomic_set(&kvm->mm->context.is_protected, 0);
>>  	VM_EVENT(kvm, 3, "PROTVIRT DESTROY VM: rc %x rrc %x",
>>  		 ret >> 16, ret & 0x0000ffff);
>> +	KVM_UV_EVENT(kvm, 3, "PROTVIRT DESTROY VM: rc %x rrc %x",
>> +		 ret >> 16, ret & 0x0000ffff);
>>  	return rc;
>>  }
>>  
>> @@ -89,6 +91,8 @@ int kvm_s390_pv_destroy_cpu(struct kvm_vcpu *vcpu)
>>  
>>  		VCPU_EVENT(vcpu, 3, "PROTVIRT DESTROY VCPU: cpu %d rc %x rrc %x",
>>  			   vcpu->vcpu_id, ret >> 16, ret & 0x0000ffff);
> 
> I think these should drop the vcpu_id, as VCPU_EVENT already includes
> it (in the patch introducing them).

ack, this is patch 8.

> 
>> +		KVM_UV_EVENT(vcpu->kvm, 3, "PROTVIRT DESTROY VCPU: cpu %d rc %x rrc %x",
>> +			     vcpu->vcpu_id, ret >> 16, ret & 0x0000ffff);

>>  	}
>>  
>>  	free_pages(vcpu->arch.pv.stor_base,
> 
> Otherwise, looks good.
> 
> Reviewed-by: Cornelia Huck <cohuck@redhat.com>
> 

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH 32/35] KVM: s390: protvirt: Mask PSW interrupt bits for interception 104 and 112
  2020-02-10 13:28   ` Cornelia Huck
@ 2020-02-10 13:48     ` Christian Borntraeger
  2020-02-10 14:47       ` Cornelia Huck
  0 siblings, 1 reply; 147+ messages in thread
From: Christian Borntraeger @ 2020-02-10 13:48 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Janosch Frank, KVM, David Hildenbrand, Thomas Huth,
	Ulrich Weigand, Claudio Imbrenda, Andrea Arcangeli, linux-s390,
	Michael Mueller, Vasily Gorbik, Janosch Frank



On 10.02.20 14:28, Cornelia Huck wrote:
> On Fri,  7 Feb 2020 06:39:55 -0500
> Christian Borntraeger <borntraeger@de.ibm.com> wrote:
> 
>> From: Janosch Frank <frankja@linux.ibm.com>
>>
>> We're not allowed to inject interrupts on intercepts that leave the
>> guest state in an "in-beetween" state where the next SIE entry will do a
> 
> s/beetween/between/

ack
> 
>> continuation.  Namely secure instruction interception and secure prefix
> 
> s/continuation. Namely/continuation, namely,/

ack.
> 
> Add which one is 104 and which one is 112, so you can match up the
> description with the subject?

ack.
> 
> 
>> interception.
>> As our PSW is just a copy of the real one that will be replaced on the
>> next exit, we can mask out the interrupt bits in the PSW to make sure
>> that we do not inject anything.
>>
>> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
>> [borntraeger@de.ibm.com: patch merging, splitting, fixing]
>> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
>> ---
>>  arch/s390/kvm/kvm-s390.c | 5 +++++
>>  1 file changed, 5 insertions(+)
>>
>> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
>> index ced2bac251a6..8c7b27287b91 100644
>> --- a/arch/s390/kvm/kvm-s390.c
>> +++ b/arch/s390/kvm/kvm-s390.c
>> @@ -4052,6 +4052,7 @@ static int vcpu_post_run(struct kvm_vcpu *vcpu, int exit_reason)
>>  	return vcpu_post_run_fault_in_sie(vcpu);
>>  }
>>  
>> +#define PSW_INT_MASK (PSW_MASK_EXT | PSW_MASK_IO | PSW_MASK_MCHECK)
>>  static int __vcpu_run(struct kvm_vcpu *vcpu)
>>  {
>>  	int rc, exit_reason;
>> @@ -4088,6 +4089,10 @@ static int __vcpu_run(struct kvm_vcpu *vcpu)
>>  			memcpy(vcpu->run->s.regs.gprs,
>>  			       sie_page->pv_grregs,
>>  			       sizeof(sie_page->pv_grregs));
> 
> Add a comment, as suggested by Thomas last time?


diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index b53cabc15d9d..52a5196fe975 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -4089,6 +4089,12 @@ static int __vcpu_run(struct kvm_vcpu *vcpu)
                        memcpy(vcpu->run->s.regs.gprs,
                               sie_page->pv_grregs,
                               sizeof(sie_page->pv_grregs));
+                       /*
+                        * We're not allowed to inject interrupts on intercepts
+                        * that leave the guest state in an "in-beetween" state
+                        * where the next SIE entry will do a continuation.
+                        * Fence interrupts in our "internal" PSW.
+                        */
                        if (vcpu->arch.sie_block->icptcode == ICPT_PV_INSTR ||
                            vcpu->arch.sie_block->icptcode == ICPT_PV_PREF) {
                                vcpu->arch.sie_block->gpsw.mask &= ~PSW_INT_MASK;



> 
>> +			if (vcpu->arch.sie_block->icptcode == ICPT_PV_INSTR ||
>> +			    vcpu->arch.sie_block->icptcode == ICPT_PV_PREF) {
>> +				vcpu->arch.sie_block->gpsw.mask &= ~PSW_INT_MASK;
>> +			}
>>  		}
>>  		local_irq_disable();
>>  		__enable_cpu_timer_accounting(vcpu);
> 

^ permalink raw reply related	[flat|nested] 147+ messages in thread

* Re: [PATCH 32/35] KVM: s390: protvirt: Mask PSW interrupt bits for interception 104 and 112
  2020-02-10 13:48     ` Christian Borntraeger
@ 2020-02-10 14:47       ` Cornelia Huck
  0 siblings, 0 replies; 147+ messages in thread
From: Cornelia Huck @ 2020-02-10 14:47 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Janosch Frank, KVM, David Hildenbrand, Thomas Huth,
	Ulrich Weigand, Claudio Imbrenda, Andrea Arcangeli, linux-s390,
	Michael Mueller, Vasily Gorbik, Janosch Frank

On Mon, 10 Feb 2020 14:48:06 +0100
Christian Borntraeger <borntraeger@de.ibm.com> wrote:

> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> index b53cabc15d9d..52a5196fe975 100644
> --- a/arch/s390/kvm/kvm-s390.c
> +++ b/arch/s390/kvm/kvm-s390.c
> @@ -4089,6 +4089,12 @@ static int __vcpu_run(struct kvm_vcpu *vcpu)
>                         memcpy(vcpu->run->s.regs.gprs,
>                                sie_page->pv_grregs,
>                                sizeof(sie_page->pv_grregs));
> +                       /*
> +                        * We're not allowed to inject interrupts on intercepts
> +                        * that leave the guest state in an "in-beetween" state

s/beetween/between/ here as well :)

> +                        * where the next SIE entry will do a continuation.
> +                        * Fence interrupts in our "internal" PSW.
> +                        */
>                         if (vcpu->arch.sie_block->icptcode == ICPT_PV_INSTR ||
>                             vcpu->arch.sie_block->icptcode == ICPT_PV_PREF) {
>                                 vcpu->arch.sie_block->gpsw.mask &= ~PSW_INT_MASK;

With that on top,

Reviewed-by: Cornelia Huck <cohuck@redhat.com>

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH 21/35] KVM: s390/mm: handle guest unpin events
  2020-02-07 11:39 ` [PATCH 21/35] KVM: s390/mm: handle guest unpin events Christian Borntraeger
@ 2020-02-10 14:58   ` Thomas Huth
  2020-02-11 13:21     ` Cornelia Huck
  0 siblings, 1 reply; 147+ messages in thread
From: Thomas Huth @ 2020-02-10 14:58 UTC (permalink / raw)
  To: Christian Borntraeger, Janosch Frank
  Cc: KVM, Cornelia Huck, David Hildenbrand, Ulrich Weigand,
	Claudio Imbrenda, Andrea Arcangeli, linux-s390, Michael Mueller,
	Vasily Gorbik, linux-mm, Andrew Morton

On 07/02/2020 12.39, Christian Borntraeger wrote:
> From: Claudio Imbrenda <imbrenda@linux.ibm.com>
> 
> The current code tries to first pin shared pages, if that fails (e.g.
> because the page is not shared) it will export them. For shared pages
> this means that we get a new intercept telling us that the guest is
> unsharing that page. We will make the page secure at that point in time
> and revoke the host access. This is synchronized with other host events,
> e.g. the code will wait until host I/O has finished.
> 
> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
> [borntraeger@de.ibm.com: patch merging, splitting, fixing]
> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
> ---
>  arch/s390/kvm/intercept.c | 24 ++++++++++++++++++++++++
>  1 file changed, 24 insertions(+)
> 
> diff --git a/arch/s390/kvm/intercept.c b/arch/s390/kvm/intercept.c
> index 2a966dc52611..e155389a4a66 100644
> --- a/arch/s390/kvm/intercept.c
> +++ b/arch/s390/kvm/intercept.c
> @@ -16,6 +16,7 @@
>  #include <asm/asm-offsets.h>
>  #include <asm/irq.h>
>  #include <asm/sysinfo.h>
> +#include <asm/uv.h>
>  
>  #include "kvm-s390.h"
>  #include "gaccess.h"
> @@ -484,12 +485,35 @@ static int handle_pv_sclp(struct kvm_vcpu *vcpu)
>  	return 0;
>  }
>  
> +static int handle_pv_uvc(struct kvm_vcpu *vcpu)
> +{
> +	struct uv_cb_share *guest_uvcb = (void *)vcpu->arch.sie_block->sidad;
> +	struct uv_cb_cts uvcb = {
> +		.header.cmd	= UVC_CMD_UNPIN_PAGE_SHARED,
> +		.header.len	= sizeof(uvcb),
> +		.guest_handle	= kvm_s390_pv_handle(vcpu->kvm),
> +		.gaddr		= guest_uvcb->paddr,
> +	};
> +	int rc;
> +
> +	if (guest_uvcb->header.cmd != UVC_CMD_REMOVE_SHARED_ACCESS) {
> +		WARN_ONCE(1, "Unexpected UVC 0x%x!\n", guest_uvcb->header.cmd);

Is there a way to signal the failed command to the guest, too?

 Thomas


> +		return 0;
> +	}
> +	rc = uv_make_secure(vcpu->arch.gmap, uvcb.gaddr, &uvcb);
> +	if (rc == -EINVAL && uvcb.header.rc == 0x104)
> +		return 0;
> +	return rc;
> +}
> +
>  static int handle_pv_notification(struct kvm_vcpu *vcpu)
>  {
>  	if (vcpu->arch.sie_block->ipa == 0xb210)
>  		return handle_pv_spx(vcpu);
>  	if (vcpu->arch.sie_block->ipa == 0xb220)
>  		return handle_pv_sclp(vcpu);
> +	if (vcpu->arch.sie_block->ipa == 0xb9a4)
> +		return handle_pv_uvc(vcpu);
>  
>  	return handle_instruction(vcpu);
>  }
> 

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH 01/35] mm:gup/writeback: add callbacks for inaccessible pages
  2020-02-07 11:39 ` [PATCH 01/35] mm:gup/writeback: add callbacks for inaccessible pages Christian Borntraeger
@ 2020-02-10 17:27     ` Christian Borntraeger
  2020-02-10 18:17   ` David Hildenbrand
  2020-02-18  3:36   ` Tian, Kevin
  2 siblings, 0 replies; 147+ messages in thread
From: Christian Borntraeger @ 2020-02-10 17:27 UTC (permalink / raw)
  To: Janosch Frank, Andrew Morton, Marc Zyngier, Sean Christopherson,
	Tom Lendacky
  Cc: KVM, Cornelia Huck, David Hildenbrand, Thomas Huth,
	Ulrich Weigand, Claudio Imbrenda, Andrea Arcangeli, linux-s390,
	Michael Mueller, Vasily Gorbik, linux-mm, kvm-ppc, Paolo Bonzini

CC Marc Zyngier for KVM on ARM.  Marc, see below. Will there be any
use for this on KVM/ARM in the future?

CC Sean Christopherson/Tom Lendacky. Any obvious use case for Intel/AMD
to have a callback before a page is used for I/O?

Andrew (or other mm people): any chance to get an ACK for this change?
I could then carry it via the s390 or KVM tree. Or, if you want to carry
it yourself, I can send an updated version (we would need to synchronize
so that Linus pulls the KVM changes after the mm changes).

Andrea asked if others would benefit from this, so here is some more
information about it (which I can also put into the patch description).
We have talked to the POWER folks. They do not use the standard memory
management; instead they have a hard split between secure and normal
memory. The secure memory is handled by the hypervisor as device memory,
and the ultravisor and the hypervisor move it back and forth as needed.

On s390 there is no *separate* pool of physical pages that are secure.
Instead, *any* physical page can be marked as secure or not, by
setting a bit in a per-page data structure that hardware uses to stop
unauthorized access.  (That bit is under control of the ultravisor.)

Note that one side effect of this strategy is that the decision
*which* secure pages to encrypt and then swap out is actually done by
the hypervisor, not the ultravisor.  In our case, the hypervisor is
Linux/KVM, so we're using the regular Linux memory management scheme
(active/inactive LRU lists etc.) to make this decision.  The advantage
is that the Ultravisor code does not need to itself implement any
memory management code, making it a lot simpler.

However, in the end this is why we need the hook into Linux memory
management: once Linux has decided to swap a page out, we need to get
a chance to tell the Ultravisor to "export" the page (i.e., encrypt
its contents and mark it no longer secure).

As outlined below this should be a no-op for anybody not opting in.
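
To make the opt-in mechanics concrete, here is a toy user-space model of
the hook (all names and the "export" step are stand-ins invented for
this sketch; the real s390 implementation would call into the
Ultravisor rather than flip a flag):

```c
#include <assert.h>

/* Toy page: one bit says whether the host may touch it for I/O. */
struct toy_page {
	int secure;	/* 1 = inaccessible to the host */
};

/* Default for non-opt-in architectures: a no-op returning success,
 * matching the generic stub added to include/linux/gfp.h below. */
static int generic_make_page_accessible(struct toy_page *page)
{
	(void)page;
	return 0;
}

/* Opt-in architecture: export (encrypt + mark non-secure) the page
 * so that subsequent host I/O does not fail. */
static int s390ish_make_page_accessible(struct toy_page *page)
{
	if (page->secure)
		page->secure = 0;	/* stands in for the UV export call */
	return 0;
}
```

The gup/writeback call sites only ever see the one-line generic stub
unless an architecture defines HAVE_ARCH_MAKE_PAGE_ACCESSIBLE, which is
why this is a no-op for everyone else.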

Christian                                   

On 07.02.20 12:39, Christian Borntraeger wrote:
> From: Claudio Imbrenda <imbrenda@linux.ibm.com>
> 
> With the introduction of protected KVM guests on s390 there is now a
> concept of inaccessible pages. These pages need to be made accessible
> before the host can access them.
> 
> While cpu accesses will trigger a fault that can be resolved, I/O
> accesses will just fail.  We need to add a callback into architecture
> code for places that will do I/O, namely when writeback is started or
> when a page reference is taken.
> 
> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
> ---
>  include/linux/gfp.h | 6 ++++++
>  mm/gup.c            | 2 ++
>  mm/page-writeback.c | 1 +
>  3 files changed, 9 insertions(+)
> 
> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> index e5b817cb86e7..be2754841369 100644
> --- a/include/linux/gfp.h
> +++ b/include/linux/gfp.h
> @@ -485,6 +485,12 @@ static inline void arch_free_page(struct page *page, int order) { }
>  #ifndef HAVE_ARCH_ALLOC_PAGE
>  static inline void arch_alloc_page(struct page *page, int order) { }
>  #endif
> +#ifndef HAVE_ARCH_MAKE_PAGE_ACCESSIBLE
> +static inline int arch_make_page_accessible(struct page *page)
> +{
> +	return 0;
> +}
> +#endif
>  
>  struct page *
>  __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
> diff --git a/mm/gup.c b/mm/gup.c
> index 7646bf993b25..a01262cd2821 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -257,6 +257,7 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
>  			page = ERR_PTR(-ENOMEM);
>  			goto out;
>  		}
> +		arch_make_page_accessible(page);
>  	}
>  	if (flags & FOLL_TOUCH) {
>  		if ((flags & FOLL_WRITE) &&
> @@ -1870,6 +1871,7 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
>  
>  		VM_BUG_ON_PAGE(compound_head(page) != head, page);
>  
> +		arch_make_page_accessible(page);
>  		SetPageReferenced(page);
>  		pages[*nr] = page;
>  		(*nr)++;
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index 2caf780a42e7..0f0bd14571b1 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -2806,6 +2806,7 @@ int __test_set_page_writeback(struct page *page, bool keep_write)
>  		inc_lruvec_page_state(page, NR_WRITEBACK);
>  		inc_zone_page_state(page, NR_ZONE_WRITE_PENDING);
>  	}
> +	arch_make_page_accessible(page);
>  	unlock_page_memcg(page);

As outlined by Ulrich, we can move the callback after the unlock.

>  	return ret;
>  
> 

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH 24/35] KVM: s390: protvirt: disallow one_reg
  2020-02-07 11:39 ` [PATCH 24/35] KVM: s390: protvirt: disallow one_reg Christian Borntraeger
@ 2020-02-10 17:53   ` Cornelia Huck
  2020-02-10 18:34     ` Christian Borntraeger
  0 siblings, 1 reply; 147+ messages in thread
From: Cornelia Huck @ 2020-02-10 17:53 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Janosch Frank, KVM, David Hildenbrand, Thomas Huth,
	Ulrich Weigand, Claudio Imbrenda, Andrea Arcangeli, linux-s390,
	Michael Mueller, Vasily Gorbik, Janosch Frank

On Fri,  7 Feb 2020 06:39:47 -0500
Christian Borntraeger <borntraeger@de.ibm.com> wrote:

> From: Janosch Frank <frankja@linux.ibm.com>
> 
> A lot of the registers are controlled by the Ultravisor and never
> visible to KVM. Some fields in the sie control block are overlayed,
> like gbea. As no userspace uses the ONE_REG interface on s390 it is safe
> to disable this for protected guests.

Last round, I suggested

"As no known userspace uses the ONE_REG interface on s390 if sync regs
are available, no functionality is lost if it is disabled for protected
guests."

Any opinion on that?

> 
> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
> Reviewed-by: Thomas Huth <thuth@redhat.com>
> Reviewed-by: Cornelia Huck <cohuck@redhat.com>
> [borntraeger@de.ibm.com: patch merging, splitting, fixing]
> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
> ---
>  Documentation/virt/kvm/api.txt | 6 ++++--
>  arch/s390/kvm/kvm-s390.c       | 3 +++
>  2 files changed, 7 insertions(+), 2 deletions(-)

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH 01/35] mm:gup/writeback: add callbacks for inaccessible pages
  2020-02-07 11:39 ` [PATCH 01/35] mm:gup/writeback: add callbacks for inaccessible pages Christian Borntraeger
  2020-02-10 17:27     ` Christian Borntraeger
@ 2020-02-10 18:17   ` David Hildenbrand
  2020-02-10 18:28     ` Christian Borntraeger
  2020-02-18  3:36   ` Tian, Kevin
  2 siblings, 1 reply; 147+ messages in thread
From: David Hildenbrand @ 2020-02-10 18:17 UTC (permalink / raw)
  To: Christian Borntraeger, Janosch Frank
  Cc: KVM, Cornelia Huck, Thomas Huth, Ulrich Weigand,
	Claudio Imbrenda, Andrea Arcangeli, linux-s390, Michael Mueller,
	Vasily Gorbik, linux-mm, Andrew Morton, Michal Hocko

On 07.02.20 12:39, Christian Borntraeger wrote:
> From: Claudio Imbrenda <imbrenda@linux.ibm.com>
> 
> With the introduction of protected KVM guests on s390 there is now a
> concept of inaccessible pages. These pages need to be made accessible
> before the host can access them.
> 
> While cpu accesses will trigger a fault that can be resolved, I/O
> accesses will just fail.  We need to add a callback into architecture
> code for places that will do I/O, namely when writeback is started or
> when a page reference is taken.

My question would be: What guarantees that the page will stay accessible
(for I/O)? IIRC, pages can be converted back to secure/inaccessible
whenever the guest wants to access them. How will that be dealt with?

I would assume some magic counter that tracks if the page still has to
remain accessible. Once all clients that require the page to be
"accessible" on the I/O path are done, the page can be made inaccessible
again. But then, I would assume there would be something like a

/* make page accessible and make sure the page will remain accessible */
arch_get_page_accessible(page);

/* we're done dealing with the page content */
arch_put_page_accessible(page);

You mention page references. I think you should elaborate in more detail
in the patch description how that is expected to work.


(side note: I assume you guys have a plan for dealing with kdump wanting
to dump inaccessible pages. the kexec kernel would have to talk to the
UV to convert pages - and also make pages accessible on the I/O path I
guess - or one would want to mark and skip encrypted pages completely in
kdump somehow, as the content is essentially garbage)



cc Michal (not sure if your area of expertise)
https://lore.kernel.org/kvm/20200207113958.7320-2-borntraeger@de.ibm.com/

-- 
Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH 01/35] mm:gup/writeback: add callbacks for inaccessible pages
  2020-02-10 18:17   ` David Hildenbrand
@ 2020-02-10 18:28     ` Christian Borntraeger
  2020-02-10 18:43       ` David Hildenbrand
  0 siblings, 1 reply; 147+ messages in thread
From: Christian Borntraeger @ 2020-02-10 18:28 UTC (permalink / raw)
  To: David Hildenbrand, Janosch Frank
  Cc: KVM, Cornelia Huck, Thomas Huth, Ulrich Weigand,
	Claudio Imbrenda, Andrea Arcangeli, linux-s390, Michael Mueller,
	Vasily Gorbik, linux-mm, Andrew Morton, Michal Hocko



On 10.02.20 19:17, David Hildenbrand wrote:
> On 07.02.20 12:39, Christian Borntraeger wrote:
>> From: Claudio Imbrenda <imbrenda@linux.ibm.com>
>>
>> With the introduction of protected KVM guests on s390 there is now a
>> concept of inaccessible pages. These pages need to be made accessible
>> before the host can access them.
>>
>> While cpu accesses will trigger a fault that can be resolved, I/O
>> accesses will just fail.  We need to add a callback into architecture
>> code for places that will do I/O, namely when writeback is started or
>> when a page reference is taken.
> 
> My question would be: What guarantees that the page will stay accessible
> (for I/O)? IIRC, pages can be converted back to secure/inaccessible
> whenever the guest wants to access them. How will that be dealt with?

Yes, in patch 5 we do use the page lock, PageWriteback and page_ref_freeze
to make the page secure again only if no I/O is about to be started or
still running.

We have minimized the common code impact (just these 3 callbacks) so that 
architecture code can do the right thing.

> 
> I would assume some magic counter that tracks if the page still has to
> remain accessible. Once all clients that require the page to be
> "accessible" on the I/O path are done, the page can be made inaccessible
> again. But then, I would assume there would be something like a
> 
> /* make page accessible and make sure the page will remain accessible */
> arch_get_page_accessible(page);
> 
> /* we're done dealing with the page content */
> arch_put_page_accessible(page);
> 
> You mention page references. I think you should elaborate in more detail
> in the patch description how that is expected to work.
> 
> 
> (side note: I assume you guys have a plan for dealing with kdump wanting
> to dump inaccessible pages. the kexec kernel would have to talk to the
> UV to convert pages - and also make pages accessible on the I/O path I
> guess - or one would want to mark and skip encrypted pages completely in
> kdump somehow, as the content is essentially garbage)

On kexec and kdump the ultravisor is called as part of the diagnose
308 subcodes 0 and 1 to make sure that (a) kdump works (no fault on a
previously secure page) and (b) the content of the secure pages is no
longer accessible.

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH 24/35] KVM: s390: protvirt: disallow one_reg
  2020-02-10 17:53   ` Cornelia Huck
@ 2020-02-10 18:34     ` Christian Borntraeger
  2020-02-11  8:27       ` Cornelia Huck
  0 siblings, 1 reply; 147+ messages in thread
From: Christian Borntraeger @ 2020-02-10 18:34 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Janosch Frank, KVM, David Hildenbrand, Thomas Huth,
	Ulrich Weigand, Claudio Imbrenda, Andrea Arcangeli, linux-s390,
	Michael Mueller, Vasily Gorbik, Janosch Frank



On 10.02.20 18:53, Cornelia Huck wrote:
> On Fri,  7 Feb 2020 06:39:47 -0500
> Christian Borntraeger <borntraeger@de.ibm.com> wrote:
> 
>> From: Janosch Frank <frankja@linux.ibm.com>
>>
>> A lot of the registers are controlled by the Ultravisor and never
>> visible to KVM. Some fields in the sie control block are overlaid,
>> like gbea. As no userspace uses the ONE_REG interface on s390, it is
>> safe to disable this for protected guests.
> 
> Last round, I suggested
> 
> "As no known userspace uses the ONE_REG interface on s390 if sync regs
> are available, no functionality is lost if it is disabled for protected
> guests."

If you think this variant is better I can use this, I am fine with either. 
> 
> Any opinion on that?
> 
>>
>> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
>> Reviewed-by: Thomas Huth <thuth@redhat.com>
>> Reviewed-by: Cornelia Huck <cohuck@redhat.com>
>> [borntraeger@de.ibm.com: patch merging, splitting, fixing]
>> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
>> ---
>>  Documentation/virt/kvm/api.txt | 6 ++++--
>>  arch/s390/kvm/kvm-s390.c       | 3 +++
>>  2 files changed, 7 insertions(+), 2 deletions(-)
> 

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH 02/35] KVM: s390/interrupt: do not pin adapter interrupt pages
  2020-02-10 12:26   ` David Hildenbrand
@ 2020-02-10 18:38     ` Christian Borntraeger
  2020-02-10 19:33       ` David Hildenbrand
  2020-02-10 18:56       ` Ulrich Weigand
  1 sibling, 1 reply; 147+ messages in thread
From: Christian Borntraeger @ 2020-02-10 18:38 UTC (permalink / raw)
  To: David Hildenbrand, Janosch Frank
  Cc: KVM, Cornelia Huck, Thomas Huth, Ulrich Weigand,
	Claudio Imbrenda, Andrea Arcangeli, linux-s390, Michael Mueller,
	Vasily Gorbik, linux-mm, Andrew Morton



On 10.02.20 13:26, David Hildenbrand wrote:
> On 07.02.20 12:39, Christian Borntraeger wrote:
>> From: Ulrich Weigand <Ulrich.Weigand@de.ibm.com>
>>
>> The adapter interrupt page containing the indicator bits is currently
>> pinned. That means that a guest with many devices can pin a lot of
>> memory pages in the host. This also complicates the reference tracking
>> which is needed for memory management handling of protected virtual
>> machines.
>> We can reuse the pte notifiers to "cache" the page without pinning it.
>>
>> Signed-off-by: Ulrich Weigand <Ulrich.Weigand@de.ibm.com>
>> Suggested-by: Andrea Arcangeli <aarcange@redhat.com>
>> [borntraeger@de.ibm.com: patch merging, splitting, fixing]
>> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
>> ---
> 
> So, instead of pinning explicitly, look up the page address, cache it,
> and glue its lifetime to the gmap table entry. When that entry is
> changed, invalidate the cached page. On re-access, look up the page
> again and register the gmap notifier for the table entry again.

I think I might want to split this into two parts.
part 1: a naive approach that always does get_user_pages_remote/put_page
part 2: do the complex caching

Ulrich mentioned that this actually could make the map/unmap a no-op as we
have the address and bit already in the irq route. In the end this might be
as fast as today's pinning, as we replace a list walk with a page table walk. 
Plus it would simplify the code. Will have a look if that is the case.

> 
> [...]
> 
>>  #define MAX_S390_IO_ADAPTERS ((MAX_ISC + 1) * 8)
>> diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
>> index c06c89d370a7..4bfb2f8fe57c 100644
>> --- a/arch/s390/kvm/interrupt.c
>> +++ b/arch/s390/kvm/interrupt.c
>> @@ -28,6 +28,7 @@
>>  #include <asm/switch_to.h>
>>  #include <asm/nmi.h>
>>  #include <asm/airq.h>
>> +#include <linux/pagemap.h>
>>  #include "kvm-s390.h"
>>  #include "gaccess.h"
>>  #include "trace-s390.h"
>> @@ -2328,8 +2329,8 @@ static int register_io_adapter(struct kvm_device *dev,
>>  		return -ENOMEM;
>>  
>>  	INIT_LIST_HEAD(&adapter->maps);
>> -	init_rwsem(&adapter->maps_lock);
>> -	atomic_set(&adapter->nr_maps, 0);
>> +	spin_lock_init(&adapter->maps_lock);
>> +	adapter->nr_maps = 0;
>>  	adapter->id = adapter_info.id;
>>  	adapter->isc = adapter_info.isc;
>>  	adapter->maskable = adapter_info.maskable;
>> @@ -2375,19 +2376,15 @@ static int kvm_s390_adapter_map(struct kvm *kvm, unsigned int id, __u64 addr)
>>  		ret = -EFAULT;
>>  		goto out;
>>  	}
>> -	ret = get_user_pages_fast(map->addr, 1, FOLL_WRITE, &map->page);
>> -	if (ret < 0)
>> -		goto out;
>> -	BUG_ON(ret != 1);
>> -	down_write(&adapter->maps_lock);
>> -	if (atomic_inc_return(&adapter->nr_maps) < MAX_S390_ADAPTER_MAPS) {
>> +	spin_lock(&adapter->maps_lock);
>> +	if (adapter->nr_maps < MAX_S390_ADAPTER_MAPS) {
>> +		adapter->nr_maps++;
>>  		list_add_tail(&map->list, &adapter->maps);
> 
> I do wonder if we should check for duplicates. The unmap path will only
> remove exactly one entry. But maybe this can never happen or is already
> handled on a a higher layer.


This would be a broken userspace, but I also do not see what would break
in the host if this happens.


> 
>>  }
>> @@ -2430,7 +2426,6 @@ void kvm_s390_destroy_adapters(struct kvm *kvm)
>>  		list_for_each_entry_safe(map, tmp,
>>  					 &kvm->arch.adapters[i]->maps, list) {
>>  			list_del(&map->list);
>> -			put_page(map->page);
>>  			kfree(map);
>>  		}
>>  		kfree(kvm->arch.adapters[i]);
> 
> Between the gmap being removed in kvm_arch_vcpu_destroy() and
> kvm_s390_destroy_adapters(), the entries would no longer properly get
> invalidated. AFAIK, removing/freeing the gmap will not trigger any
> notifiers.
> 
> Not sure if that's an issue (IOW, if we can have some very weird race).
> But I guess we would have similar races already :)

This is only called when all file descriptors are closed, which also
closes all irq routes. So I guess no I/O should be going on anymore. 

> 
>> @@ -2690,6 +2685,31 @@ struct kvm_device_ops kvm_flic_ops = {
>>  	.destroy = flic_destroy,
>>  };
>>  
>> +void kvm_s390_adapter_gmap_notifier(struct gmap *gmap, unsigned long start,
>> +				    unsigned long end)
>> +{
>> +	struct kvm *kvm = gmap->private;
>> +	struct s390_map_info *map, *tmp;
>> +	int i;
>> +
>> +	for (i = 0; i < MAX_S390_IO_ADAPTERS; i++) {
>> +		struct s390_io_adapter *adapter = kvm->arch.adapters[i];
>> +
>> +		if (!adapter)
>> +			continue;
> 
> I have to ask very dumb: How is kvm->arch.adapters[] protected?

We only add new ones, and they are removed at guest teardown, it seems.
[...]

Let me have a look if we can simplify this.

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH 01/35] mm:gup/writeback: add callbacks for inaccessible pages
  2020-02-10 18:28     ` Christian Borntraeger
@ 2020-02-10 18:43       ` David Hildenbrand
  2020-02-10 18:51         ` Christian Borntraeger
  0 siblings, 1 reply; 147+ messages in thread
From: David Hildenbrand @ 2020-02-10 18:43 UTC (permalink / raw)
  To: Christian Borntraeger, Janosch Frank
  Cc: KVM, Cornelia Huck, Thomas Huth, Ulrich Weigand,
	Claudio Imbrenda, Andrea Arcangeli, linux-s390, Michael Mueller,
	Vasily Gorbik, linux-mm, Andrew Morton, Michal Hocko

On 10.02.20 19:28, Christian Borntraeger wrote:
> 
> 
> On 10.02.20 19:17, David Hildenbrand wrote:
>> On 07.02.20 12:39, Christian Borntraeger wrote:
>>> From: Claudio Imbrenda <imbrenda@linux.ibm.com>
>>>
>>> With the introduction of protected KVM guests on s390 there is now a
>>> concept of inaccessible pages. These pages need to be made accessible
>>> before the host can access them.
>>>
>>> While cpu accesses will trigger a fault that can be resolved, I/O
>>> accesses will just fail.  We need to add a callback into architecture
>>> code for places that will do I/O, namely when writeback is started or
>>> when a page reference is taken.
>>
>> My question would be: What guarantees that the page will stay accessible
>> (for I/O)? IIRC, pages can be converted back to secure/inaccessible
>> whenever the guest wants to access them. How will that be dealt with?
> 
> Yes, in patch 5 we do use the page lock, PageWriteback and page_ref_freeze
> to make the page secure again only if no I/O is about to be started or
> still running.
> 
> We have minimized the common code impact (just these 3 callbacks) so that 
> architecture code can do the right thing.

So the magic is

+static int expected_page_refs(struct page *page)
+{
+	int res;
+
+	res = page_mapcount(page);
+	if (PageSwapCache(page))
+		res++;
+	else if (page_mapping(page)) {
+		res++;
+		if (page_has_private(page))
+			res++;
+	}
+	return res;
+}
[...]
+static int make_secure_pte(pte_t *ptep, unsigned long addr, void *data)
[...]
+	if (PageWriteback(page))
+		return -EAGAIN;
+	expected = expected_page_refs(page);
+	if (!page_ref_freeze(page, expected))
+		return -EBUSY;
[...]
+	rc = uv_call(0, (u64)params->uvcb);
+	page_ref_unfreeze(page, expected);

As long as a page does not have the expected refcount, it cannot be
converted to secure and cannot be used by the guest.

I assume this implies, that if a guest page is pinned somewhere (e.g.,
in KVM), it won't be usable by the guest.

Please add all these details to the patch description. I think they are
crucial to understand how this is expected to work and to be used.


-- 
Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH 01/35] mm:gup/writeback: add callbacks for inaccessible pages
  2020-02-10 18:43       ` David Hildenbrand
@ 2020-02-10 18:51         ` Christian Borntraeger
  0 siblings, 0 replies; 147+ messages in thread
From: Christian Borntraeger @ 2020-02-10 18:51 UTC (permalink / raw)
  To: David Hildenbrand, Janosch Frank
  Cc: KVM, Cornelia Huck, Thomas Huth, Ulrich Weigand,
	Claudio Imbrenda, Andrea Arcangeli, linux-s390, Michael Mueller,
	Vasily Gorbik, linux-mm, Andrew Morton, Michal Hocko



On 10.02.20 19:43, David Hildenbrand wrote:
> On 10.02.20 19:28, Christian Borntraeger wrote:
>>
>>
>> On 10.02.20 19:17, David Hildenbrand wrote:
>>> On 07.02.20 12:39, Christian Borntraeger wrote:
>>>> From: Claudio Imbrenda <imbrenda@linux.ibm.com>
>>>>
>>>> With the introduction of protected KVM guests on s390 there is now a
>>>> concept of inaccessible pages. These pages need to be made accessible
>>>> before the host can access them.
>>>>
>>>> While cpu accesses will trigger a fault that can be resolved, I/O
>>>> accesses will just fail.  We need to add a callback into architecture
>>>> code for places that will do I/O, namely when writeback is started or
>>>> when a page reference is taken.
>>>
>>> My question would be: What guarantees that the page will stay accessible
>>> (for I/O)? IIRC, pages can be converted back to secure/inaccessible
>>> whenever the guest wants to access them. How will that be dealt with?
>>
>> Yes, in patch 5 we do use the page lock, PageWriteback and page_ref_freeze
>> to make the page secure again only if no I/O is about to be started or
>> still running.
>>
>> We have minimized the common code impact (just these 3 callbacks) so that 
>> architecture code can do the right thing.
> 
> So the magic is
> 
> +static int expected_page_refs(struct page *page)
> +{
> +	int res;
> +
> +	res = page_mapcount(page);
> +	if (PageSwapCache(page))
> +		res++;
> +	else if (page_mapping(page)) {
> +		res++;
> +		if (page_has_private(page))
> +			res++;
> +	}
> +	return res;
> +}
> [...]
> +static int make_secure_pte(pte_t *ptep, unsigned long addr, void *data)
> [...]
> +	if (PageWriteback(page))
> +		return -EAGAIN;
> +	expected = expected_page_refs(page);
> +	if (!page_ref_freeze(page, expected))
> +		return -EBUSY;
> [...]
> +	rc = uv_call(0, (u64)params->uvcb);
> +	page_ref_unfreeze(page, expected);
> 
> As long as a page does not have the expected refcount, it cannot be
> converted to secure and cannot be used by the guest.
> 
> I assume this implies, that if a guest page is pinned somewhere (e.g.,
> in KVM), it won't be usable by the guest.

Yes, but you can always exit QEMU; nothing will "block". You have a permanent
SIE exit. This is something that should not happen for QEMU/KVM, and the
expected refcount logic can be found in many common code places. 
> 
> Please add all these details to the patch description. I think they are
> crucial to understand how this is expected to work and to be used.

Makes sense. Will add more explanation to patch 5.
Ulrich also had some idea how to simplify patch 5 in some places.

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH 02/35] KVM: s390/interrupt: do not pin adapter interrupt
  2020-02-10 12:26   ` David Hildenbrand
@ 2020-02-10 18:56       ` Ulrich Weigand
  2020-02-10 18:56       ` Ulrich Weigand
  1 sibling, 0 replies; 147+ messages in thread
From: Ulrich Weigand @ 2020-02-10 18:56 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Christian Borntraeger, Janosch Frank, KVM, Cornelia Huck,
	Thomas Huth, Ulrich Weigand, Claudio Imbrenda, Andrea Arcangeli,
	linux-s390, Michael Mueller, Vasily Gorbik, linux-mm,
	Andrew Morton

David Hildenbrand wrote:

> So, instead of pinning explicitly, look up the page address, cache it,
> and glue its lifetime to the gmap table entry. When that entry is
> changed, invalidate the cached page. On re-access, look up the page
> again and register the gmap notifier for the table entry again.

Yes, exactly.

> [...]
> > +static struct page *get_map_page(struct kvm *kvm,
> > +				 struct s390_io_adapter *adapter,
> > +				 u64 addr)
> >  {
> >  	struct s390_map_info *map;
> > +	unsigned long uaddr;
> > +	struct page *page;
> > +	bool need_retry;
> > +	int ret;
> > 
> >  	if (!adapter)
> >  		return NULL;
> > +retry:
> > +	page = NULL;
> > +	uaddr = 0;
> > +	spin_lock(&adapter->maps_lock);
> > +	list_for_each_entry(map, &adapter->maps, list)
> > +		if (map->guest_addr == addr) {
> 
> Could it happen, that we don't have a fitting entry in the list?

Yes, if user space tries to signal an interrupt on a page that
was not properly announced via KVM_S390_IO_ADAPTER_MAP.

In that case, the loop returns with page == NULL and uaddr == 0,
which will cause the code below to return NULL, which will cause
the caller to return an error to user space.

> > +			uaddr = map->addr;
> > +			page = map->page;
> > +			if (!page)
> > +				map->page = ERR_PTR(-EBUSY);
> > +			else if (IS_ERR(page) || !page_cache_get_speculative(page)) {
> > +				spin_unlock(&adapter->maps_lock);
> > +				goto retry;
> > +			}
> > +			break;
> > +		}
> 
> Can we please factor out looking up the list entry to a separate
> function, to be called under lock? (and e.g., use it below as well)

Good idea, I like that.  Will update the patch ...

> > +	need_retry = true;
> > +	spin_lock(&adapter->maps_lock);
> > +	list_for_each_entry(map, &adapter->maps, list)
> > +		if (map->guest_addr == addr) {
> 
> Could it happen that our entry is suddenly no longer in the list?

Yes, if user space did a KVM_S390_IO_ADAPTER_UNMAP in the meantime.
In this case we'll exit the loop with need_retry == true and will
restart from the beginning, usually then returning an error back
to user space.

> > +			if (map->page == ERR_PTR(-EBUSY)) {
> > +				map->page = page;
> > +				need_retry = false;
> > +			} else if (IS_ERR(map->page)) {
> 
> else if (map->page == ERR_PTR(-EINVAL))
> 
> or simply "else" (every other value would be a BUG_ON, right?)

Usually yes.  I guess there's the theoretical case that we race
with user space removing the old entry with KVM_S390_IO_ADAPTER_UNMAP
and immediately afterwards installing a new entry with the same
guest address.  In that case, we'll also fall into the need_retry
case here.

> Wow, this function is ... special. Took me way too long to figure out
> what is going on here. We certainly need comments in there.

I agree.  As Christian said, it's not fully clear that all of this
is really needed.  Maybe just doing the get_user_pages_remote every
time is actually enough -- we should do the "cache" magic only if
this is really critical for performance.

> I can see that
> 
> - ERR_PTR(-EBUSY) is used when somebody is about to do the
>   get_user_pages_remote(). others have to loop until that is resolved.
> - ERR_PTR(-EINVAL) is used when the entry gets invalidated by the
>   notifier while somebody is about to set it (while still
>   ERR_PTR(-EBUSY)). The one currently processing the entry will
>   eventually set it back to NULL.

Yes, that's the intent.
 
> I think we should make this clearer by only setting ERR_PTR(-EINVAL) in
> the notifier if already ERR_PTR(-EBUSY), along with a comment.

I guess I wanted to catch the case where we get another invalidation
while we already have -EINVAL.  But given the rest of the logic, this
shouldn't actually ever happen.  (If it *did* happen, however, then
setting to -EINVAL again is safer than resetting to NULL.)

> Can we document the values for map->page and how they are to be handled
> right in the struct?

OK, will do.

Bye,
Ulrich


-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  Ulrich.Weigand@de.ibm.com

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH 02/35] KVM: s390/interrupt: do not pin adapter interrupt pages
  2020-02-10 18:38     ` Christian Borntraeger
@ 2020-02-10 19:33       ` David Hildenbrand
  2020-02-11  9:23         ` [PATCH v2 RFC] " Christian Borntraeger
  0 siblings, 1 reply; 147+ messages in thread
From: David Hildenbrand @ 2020-02-10 19:33 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: David Hildenbrand, Janosch Frank, KVM, Cornelia Huck,
	Thomas Huth, Ulrich Weigand, Claudio Imbrenda, Andrea Arcangeli,
	linux-s390, Michael Mueller, Vasily Gorbik, linux-mm,
	Andrew Morton



> On 10.02.2020 at 19:41, Christian Borntraeger <borntraeger@de.ibm.com> wrote:
> 
> 
> 
>> On 10.02.20 13:26, David Hildenbrand wrote:
>>> On 07.02.20 12:39, Christian Borntraeger wrote:
>>> From: Ulrich Weigand <Ulrich.Weigand@de.ibm.com>
>>> 
>>> The adapter interrupt page containing the indicator bits is currently
>>> pinned. That means that a guest with many devices can pin a lot of
>>> memory pages in the host. This also complicates the reference tracking
>>> which is needed for memory management handling of protected virtual
>>> machines.
>>> We can reuse the pte notifiers to "cache" the page without pinning it.
>>> 
>>> Signed-off-by: Ulrich Weigand <Ulrich.Weigand@de.ibm.com>
>>> Suggested-by: Andrea Arcangeli <aarcange@redhat.com>
>>> [borntraeger@de.ibm.com: patch merging, splitting, fixing]
>>> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
>>> ---
>> 
>> So, instead of pinning explicitly, look up the page address, cache it,
>> and glue its lifetime to the gmap table entry. When that entry is
>> changed, invalidate the cached page. On re-access, look up the page
>> again and register the gmap notifier for the table entry again.
> 
> I think I might want to split this into two parts.
> part 1: a naive approach that always does get_user_pages_remote/put_page
> part 2: do the complex caching
> 
> Ulrich mentioned that this actually could make the map/unmap a no-op as we
> have the address and bit already in the irq route. In the end this might be
> as fast as today's pinning, as we replace a list walk with a page table walk. 
> Plus it would simplify the code. Will have a look if that is the case.

If we could simplify that heavily, that would be awesome!

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH 24/35] KVM: s390: protvirt: disallow one_reg
  2020-02-10 18:34     ` Christian Borntraeger
@ 2020-02-11  8:27       ` Cornelia Huck
  0 siblings, 0 replies; 147+ messages in thread
From: Cornelia Huck @ 2020-02-11  8:27 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Janosch Frank, KVM, David Hildenbrand, Thomas Huth,
	Ulrich Weigand, Claudio Imbrenda, Andrea Arcangeli, linux-s390,
	Michael Mueller, Vasily Gorbik, Janosch Frank

On Mon, 10 Feb 2020 19:34:56 +0100
Christian Borntraeger <borntraeger@de.ibm.com> wrote:

> On 10.02.20 18:53, Cornelia Huck wrote:
> > On Fri,  7 Feb 2020 06:39:47 -0500
> > Christian Borntraeger <borntraeger@de.ibm.com> wrote:
> >   
> >> From: Janosch Frank <frankja@linux.ibm.com>
> >>
> >> A lot of the registers are controlled by the Ultravisor and never
> >> visible to KVM. Some fields in the sie control block are overlaid,
> >> like gbea. As no userspace uses the ONE_REG interface on s390, it is
> >> safe to disable this for protected guests.  
> > 
> > Last round, I suggested
> > 
> > "As no known userspace uses the ONE_REG interface on s390 if sync regs
> > are available, no functionality is lost if it is disabled for protected
> > guests."  
> 
> If you think this variant is better I can use this, I am fine with either. 

Well, yes :) I was afraid that it fell through the cracks.

> > 
> > Any opinion on that?
> >   
> >>
> >> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
> >> Reviewed-by: Thomas Huth <thuth@redhat.com>
> >> Reviewed-by: Cornelia Huck <cohuck@redhat.com>
> >> [borntraeger@de.ibm.com: patch merging, splitting, fixing]
> >> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
> >> ---
> >>  Documentation/virt/kvm/api.txt | 6 ++++--
> >>  arch/s390/kvm/kvm-s390.c       | 3 +++
> >>  2 files changed, 7 insertions(+), 2 deletions(-)  
> >   
> 


* Re: [PATCH/RFC] KVM: s390: protvirt: pass-through rc and rrc
  2020-02-10 12:56             ` Christian Borntraeger
@ 2020-02-11  8:48               ` Janosch Frank
  2020-02-13  8:43                 ` Christian Borntraeger
  0 siblings, 1 reply; 147+ messages in thread
From: Janosch Frank @ 2020-02-11  8:48 UTC (permalink / raw)
  To: Christian Borntraeger, Cornelia Huck
  Cc: thuth, Ulrich.Weigand, aarcange, david, frankja, gor, imbrenda,
	kvm, linux-s390, mimu


On 2/10/20 1:56 PM, Christian Borntraeger wrote:
> 
> 
> On 10.02.20 13:50, Cornelia Huck wrote:
>> On Mon, 10 Feb 2020 13:06:19 +0100
>> Christian Borntraeger <borntraeger@de.ibm.com> wrote:
>>
>>> What about the following. I will rip out RC and RRC but add 
>>> a 32bit flags field (which must be 0) and 3*64 bit reserved.
>>
>> Probably dumb question: How are these new fields supposed to be used?
> 
> This was planned for error handling in QEMU. As we have no user of rc/rrc
> yet, I have ripped that out and added a flag field + 16 bytes of reserved.
> Usage is as usual flags must be 0. When flags!=0 the reserved fields will
> have a new meaning. 
> 

I want to have the rcs because right now we would only output the return
value of the ioctl, and most UV error codes are mapped to -EINVAL. So if
an error occurs, admins would need to match up the crashed VM with the
UV debugfs files, which might not even exist if debugfs is not mounted...

That's also one of the reasons I like having separate create calls for
VM and VCPUs.
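As a side note, the "flags + reserved" layout Christian describes above could be sketched as follows. The struct and field names are purely illustrative assumptions for this thread, not the final kernel uAPI:

```c
#include <stdint.h>

/*
 * Sketch of the discussed command-structure tail: a 32-bit flags field
 * that must be zero for now, plus 16 reserved bytes that only gain a
 * meaning once a flag bit is defined.  Names and exact layout are
 * illustrative only.
 */
struct pv_cmd_tail {
	uint32_t flags;        /* must be 0 today */
	uint32_t reserved[4];  /* 16 bytes, interpreted per future flags */
};

/* Today's validation: reject any request that sets undefined flags. */
static inline int pv_cmd_tail_valid(const struct pv_cmd_tail *t)
{
	return t->flags == 0;
}
```

A userspace caller would zero the whole tail before issuing the request, so old binaries stay valid when new flags are defined later.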




* [PATCH v2 RFC] KVM: s390/interrupt: do not pin adapter interrupt pages
  2020-02-10 19:33       ` David Hildenbrand
@ 2020-02-11  9:23         ` Christian Borntraeger
  2020-02-12 11:52           ` Christian Borntraeger
                             ` (2 more replies)
  0 siblings, 3 replies; 147+ messages in thread
From: Christian Borntraeger @ 2020-02-11  9:23 UTC (permalink / raw)
  To: david
  Cc: Ulrich.Weigand, aarcange, akpm, borntraeger, cohuck, frankja,
	gor, imbrenda, kvm, linux-mm, linux-s390, mimu, thuth

From: Ulrich Weigand <Ulrich.Weigand@de.ibm.com>

The adapter interrupt page containing the indicator bits is currently
pinned. That means that a guest with many devices can pin a lot of
memory pages in the host. This also complicates the reference tracking
which is needed for memory management handling of protected virtual
machines.
We can simply try to get the userspace page, set the bits, and free the
page. By storing the userspace address in the irq routing entry instead
of the guest address we can actually avoid many lookups and list walks,
so that this variant is very likely not slower.

Signed-off-by: Ulrich Weigand <Ulrich.Weigand@de.ibm.com>
[borntraeger@de.ibm.com: patch simplification]
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
quick and dirty sketch of how this could look


 arch/s390/include/asm/kvm_host.h |   3 -
 arch/s390/kvm/interrupt.c        | 146 +++++++++++--------------------
 2 files changed, 49 insertions(+), 100 deletions(-)

diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 0d398738ded9..88a218872fa0 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -771,9 +771,6 @@ struct s390_io_adapter {
 	bool masked;
 	bool swap;
 	bool suppressible;
-	struct rw_semaphore maps_lock;
-	struct list_head maps;
-	atomic_t nr_maps;
 };
 
 #define MAX_S390_IO_ADAPTERS ((MAX_ISC + 1) * 8)
diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
index d4d35ec79e12..e6fe8b61ee9b 100644
--- a/arch/s390/kvm/interrupt.c
+++ b/arch/s390/kvm/interrupt.c
@@ -2459,9 +2459,6 @@ static int register_io_adapter(struct kvm_device *dev,
 	if (!adapter)
 		return -ENOMEM;
 
-	INIT_LIST_HEAD(&adapter->maps);
-	init_rwsem(&adapter->maps_lock);
-	atomic_set(&adapter->nr_maps, 0);
 	adapter->id = adapter_info.id;
 	adapter->isc = adapter_info.isc;
 	adapter->maskable = adapter_info.maskable;
@@ -2488,83 +2485,26 @@ int kvm_s390_mask_adapter(struct kvm *kvm, unsigned int id, bool masked)
 
 static int kvm_s390_adapter_map(struct kvm *kvm, unsigned int id, __u64 addr)
 {
-	struct s390_io_adapter *adapter = get_io_adapter(kvm, id);
-	struct s390_map_info *map;
-	int ret;
-
-	if (!adapter || !addr)
-		return -EINVAL;
-
-	map = kzalloc(sizeof(*map), GFP_KERNEL);
-	if (!map) {
-		ret = -ENOMEM;
-		goto out;
-	}
-	INIT_LIST_HEAD(&map->list);
-	map->guest_addr = addr;
-	map->addr = gmap_translate(kvm->arch.gmap, addr);
-	if (map->addr == -EFAULT) {
-		ret = -EFAULT;
-		goto out;
-	}
-	ret = get_user_pages_fast(map->addr, 1, FOLL_WRITE, &map->page);
-	if (ret < 0)
-		goto out;
-	BUG_ON(ret != 1);
-	down_write(&adapter->maps_lock);
-	if (atomic_inc_return(&adapter->nr_maps) < MAX_S390_ADAPTER_MAPS) {
-		list_add_tail(&map->list, &adapter->maps);
-		ret = 0;
-	} else {
-		put_page(map->page);
-		ret = -EINVAL;
+	/*
+	 * We resolve the gpa to hva when setting the IRQ routing. If userspace
+	 * decides to mess with the memslots it better also updates the irq
+	 * routing. Otherwise we will write to the wrong userspace address.
+	 */
+	return 0;
 	}
-	up_write(&adapter->maps_lock);
-out:
-	if (ret)
-		kfree(map);
-	return ret;
-}
 
 static int kvm_s390_adapter_unmap(struct kvm *kvm, unsigned int id, __u64 addr)
 {
-	struct s390_io_adapter *adapter = get_io_adapter(kvm, id);
-	struct s390_map_info *map, *tmp;
-	int found = 0;
-
-	if (!adapter || !addr)
-		return -EINVAL;
-
-	down_write(&adapter->maps_lock);
-	list_for_each_entry_safe(map, tmp, &adapter->maps, list) {
-		if (map->guest_addr == addr) {
-			found = 1;
-			atomic_dec(&adapter->nr_maps);
-			list_del(&map->list);
-			put_page(map->page);
-			kfree(map);
-			break;
-		}
-	}
-	up_write(&adapter->maps_lock);
-
-	return found ? 0 : -EINVAL;
+	return 0;
 }
 
 void kvm_s390_destroy_adapters(struct kvm *kvm)
 {
 	int i;
-	struct s390_map_info *map, *tmp;
 
 	for (i = 0; i < MAX_S390_IO_ADAPTERS; i++) {
 		if (!kvm->arch.adapters[i])
 			continue;
-		list_for_each_entry_safe(map, tmp,
-					 &kvm->arch.adapters[i]->maps, list) {
-			list_del(&map->list);
-			put_page(map->page);
-			kfree(map);
-		}
 		kfree(kvm->arch.adapters[i]);
 	}
 }
@@ -2831,19 +2771,25 @@ static unsigned long get_ind_bit(__u64 addr, unsigned long bit_nr, bool swap)
 	return swap ? (bit ^ (BITS_PER_LONG - 1)) : bit;
 }
 
-static struct s390_map_info *get_map_info(struct s390_io_adapter *adapter,
-					  u64 addr)
+static struct page *get_map_page(struct kvm *kvm,
+				 struct s390_io_adapter *adapter,
+				 u64 uaddr)
 {
-	struct s390_map_info *map;
+	struct page *page;
+	int ret;
 
 	if (!adapter)
 		return NULL;
-
-	list_for_each_entry(map, &adapter->maps, list) {
-		if (map->guest_addr == addr)
-			return map;
-	}
-	return NULL;
+	page = NULL;
+	if (!uaddr)
+		return NULL;
+	down_read(&kvm->mm->mmap_sem);
+	ret = get_user_pages_remote(NULL, kvm->mm, uaddr, 1, FOLL_WRITE,
+				    &page, NULL, NULL);
+	if (ret < 1)
+		page = NULL;
+	up_read(&kvm->mm->mmap_sem);
+	return page;
 }
 
 static int adapter_indicators_set(struct kvm *kvm,
@@ -2852,30 +2798,35 @@ static int adapter_indicators_set(struct kvm *kvm,
 {
 	unsigned long bit;
 	int summary_set, idx;
-	struct s390_map_info *info;
+	struct page *ind_page, *summary_page;
 	void *map;
 
-	info = get_map_info(adapter, adapter_int->ind_addr);
-	if (!info)
+	ind_page = get_map_page(kvm, adapter, adapter_int->ind_addr);
+	if (!ind_page)
 		return -1;
-	map = page_address(info->page);
-	bit = get_ind_bit(info->addr, adapter_int->ind_offset, adapter->swap);
-	set_bit(bit, map);
-	idx = srcu_read_lock(&kvm->srcu);
-	mark_page_dirty(kvm, info->guest_addr >> PAGE_SHIFT);
-	set_page_dirty_lock(info->page);
-	info = get_map_info(adapter, adapter_int->summary_addr);
-	if (!info) {
-		srcu_read_unlock(&kvm->srcu, idx);
+	summary_page = get_map_page(kvm, adapter, adapter_int->summary_addr);
+	if (!summary_page) {
+		put_page(ind_page);
 		return -1;
 	}
-	map = page_address(info->page);
-	bit = get_ind_bit(info->addr, adapter_int->summary_offset,
-			  adapter->swap);
+
+	idx = srcu_read_lock(&kvm->srcu);
+	map = page_address(ind_page);
+	bit = get_ind_bit(adapter_int->ind_addr,
+			  adapter_int->ind_offset, adapter->swap);
+	set_bit(bit, map);
+	mark_page_dirty(kvm, adapter_int->ind_addr >> PAGE_SHIFT);
+	set_page_dirty_lock(ind_page);
+	map = page_address(summary_page);
+	bit = get_ind_bit(adapter_int->summary_addr,
+			  adapter_int->summary_offset, adapter->swap);
 	summary_set = test_and_set_bit(bit, map);
-	mark_page_dirty(kvm, info->guest_addr >> PAGE_SHIFT);
-	set_page_dirty_lock(info->page);
+	mark_page_dirty(kvm, adapter_int->summary_addr >> PAGE_SHIFT);
+	set_page_dirty_lock(summary_page);
 	srcu_read_unlock(&kvm->srcu, idx);
+
+	put_page(ind_page);
+	put_page(summary_page);
 	return summary_set ? 0 : 1;
 }
 
@@ -2897,9 +2848,7 @@ static int set_adapter_int(struct kvm_kernel_irq_routing_entry *e,
 	adapter = get_io_adapter(kvm, e->adapter.adapter_id);
 	if (!adapter)
 		return -1;
-	down_read(&adapter->maps_lock);
 	ret = adapter_indicators_set(kvm, adapter, &e->adapter);
-	up_read(&adapter->maps_lock);
 	if ((ret > 0) && !adapter->masked) {
 		ret = kvm_s390_inject_airq(kvm, adapter);
 		if (ret == 0)
@@ -2951,12 +2900,15 @@ int kvm_set_routing_entry(struct kvm *kvm,
 			  const struct kvm_irq_routing_entry *ue)
 {
 	int ret;
+	u64 uaddr;
 
 	switch (ue->type) {
 	case KVM_IRQ_ROUTING_S390_ADAPTER:
 		e->set = set_adapter_int;
-		e->adapter.summary_addr = ue->u.adapter.summary_addr;
-		e->adapter.ind_addr = ue->u.adapter.ind_addr;
+		uaddr =  gmap_translate(kvm->arch.gmap, ue->u.adapter.summary_addr);
+		e->adapter.summary_addr = uaddr;
+		uaddr =  gmap_translate(kvm->arch.gmap, ue->u.adapter.ind_addr);
+		e->adapter.ind_addr = uaddr;
 		e->adapter.summary_offset = ue->u.adapter.summary_offset;
 		e->adapter.ind_offset = ue->u.adapter.ind_offset;
 		e->adapter.adapter_id = ue->u.adapter.adapter_id;
-- 
2.24.0


* Re: [PATCH 25/35] KVM: s390: protvirt: Only sync fmt4 registers
  2020-02-07 11:39 ` [PATCH 25/35] KVM: s390: protvirt: Only sync fmt4 registers Christian Borntraeger
  2020-02-09 15:50   ` Thomas Huth
@ 2020-02-11 10:51   ` Cornelia Huck
  2020-02-11 12:59     ` Christian Borntraeger
  1 sibling, 1 reply; 147+ messages in thread
From: Cornelia Huck @ 2020-02-11 10:51 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Janosch Frank, KVM, David Hildenbrand, Thomas Huth,
	Ulrich Weigand, Claudio Imbrenda, Andrea Arcangeli, linux-s390,
	Michael Mueller, Vasily Gorbik, Janosch Frank

On Fri,  7 Feb 2020 06:39:48 -0500
Christian Borntraeger <borntraeger@de.ibm.com> wrote:

> +static void sync_regs(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
> +{
> +	/*
> +	 * at several places we have to modify our internal view to not do

s/at/In/ ?

> +	 * things that are disallowed by the ultravisor. For example we must
> +	 * not inject interrupts after specific exits (e.g. 112). We do this

Spell out what 112 is?

> +	 * by turning off the MIE bits of our PSW copy. To avoid getting

And also spell out what MIE is?

> +	 * validity intercepts, we do only accept the condition code from
> +	 * userspace.
> +	 */


* Re: [PATCH 23/35] KVM: s390: protvirt: STSI handling
  2020-02-07 11:39 ` [PATCH 23/35] KVM: s390: protvirt: STSI handling Christian Borntraeger
  2020-02-08 15:01   ` Thomas Huth
@ 2020-02-11 10:55   ` Cornelia Huck
  1 sibling, 0 replies; 147+ messages in thread
From: Cornelia Huck @ 2020-02-11 10:55 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Janosch Frank, KVM, David Hildenbrand, Thomas Huth,
	Ulrich Weigand, Claudio Imbrenda, Andrea Arcangeli, linux-s390,
	Michael Mueller, Vasily Gorbik, Janosch Frank

On Fri,  7 Feb 2020 06:39:46 -0500
Christian Borntraeger <borntraeger@de.ibm.com> wrote:

> From: Janosch Frank <frankja@linux.ibm.com>
> 
> Save response to sidad and disable address checking for protected
> guests.
> 
> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
> [borntraeger@de.ibm.com: patch merging, splitting, fixing]
> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
> ---
>  arch/s390/kvm/priv.c | 11 ++++++++---
>  1 file changed, 8 insertions(+), 3 deletions(-)

Reviewed-by: Cornelia Huck <cohuck@redhat.com>


* Re: [PATCH 01/35] mm:gup/writeback: add callbacks for inaccessible pages
  2020-02-10 17:27     ` Christian Borntraeger
  (?)
@ 2020-02-11 11:26     ` Will Deacon
  2020-02-11 11:43         ` Christian Borntraeger
  2020-02-13 14:48         ` Christian Borntraeger
  -1 siblings, 2 replies; 147+ messages in thread
From: Will Deacon @ 2020-02-11 11:26 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Janosch Frank, Andrew Morton, Marc Zyngier, Sean Christopherson,
	Tom Lendacky, KVM, Cornelia Huck, David Hildenbrand, Thomas Huth,
	Ulrich Weigand, Claudio Imbrenda, Andrea Arcangeli, linux-s390,
	Michael Mueller, Vasily Gorbik, linux-mm, kvm-ppc, Paolo Bonzini,
	mark.rutland, qperret, palmerdabbelt

On Mon, Feb 10, 2020 at 06:27:04PM +0100, Christian Borntraeger wrote:
> CC Marc Zyngier for KVM on ARM.  Marc, see below. Will there be any
> use for this on KVM/ARM in the future?

I can't speak for Marc, but I can say that we're interested in something
like this for potentially isolating VMs from a KVM host in Android.
However, we've currently been working on the assumption that the memory
removed from the host won't usually be touched by the host (i.e. no
KSM or swapping out), so all we'd probably want at the moment is to be
able to return an error back from arch_make_page_accessible(). Its return
code is ignored in this patch :/

One thing I don't grok about the ultravisor encryption is how it avoids
replay attacks when paging back in. For example, if the host is compromised
and replaces the page contents with an old encrypted value. Are you storing
per-page metadata somewhere to ensure "freshness" of the encrypted data?

Will
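For illustration, the error propagation Will is asking for boils down to the pattern below. The stub merely stands in for the real arch_make_page_accessible() hook; nothing here is actual kernel code:

```c
#include <errno.h>

/*
 * Toy model: when an architecture can refuse to make a page accessible
 * again (e.g. memory removed from the host for a protected guest), the
 * caller must propagate that failure instead of discarding it.  The
 * stub below is a hypothetical stand-in for the real arch hook.
 */
static int arch_make_page_accessible_stub(int page_is_donated)
{
	/* Assumption: donated pages cannot be made accessible to the host. */
	return page_is_donated ? -EIO : 0;
}

static int start_writeback(int page_is_donated)
{
	int rc = arch_make_page_accessible_stub(page_is_donated);

	if (rc)
		return rc;  /* surface the failure rather than ignoring it */
	/* ... otherwise proceed with I/O on the now-accessible page ... */
	return 0;
}
```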


* Re: [PATCH 01/35] mm:gup/writeback: add callbacks for inaccessible pages
  2020-02-11 11:26     ` Will Deacon
@ 2020-02-11 11:43         ` Christian Borntraeger
  2020-02-13 14:48         ` Christian Borntraeger
  1 sibling, 0 replies; 147+ messages in thread
From: Christian Borntraeger @ 2020-02-11 11:43 UTC (permalink / raw)
  To: Will Deacon
  Cc: Janosch Frank, Andrew Morton, Marc Zyngier, Sean Christopherson,
	Tom Lendacky, KVM, Cornelia Huck, David Hildenbrand, Thomas Huth,
	Ulrich Weigand, Claudio Imbrenda, Andrea Arcangeli, linux-s390,
	Michael Mueller, Vasily Gorbik, linux-mm, kvm-ppc, Paolo Bonzini,
	mark.rutland, qperret, palmerdabbelt



On 11.02.20 12:26, Will Deacon wrote:
> On Mon, Feb 10, 2020 at 06:27:04PM +0100, Christian Borntraeger wrote:
>> CC Marc Zyngier for KVM on ARM.  Marc, see below. Will there be any
>> use for this on KVM/ARM in the future?
> 
> I can't speak for Marc, but I can say that we're interested in something
> like this for potentially isolating VMs from a KVM host in Android.
> However, we've currently been working on the assumption that the memory
> removed from the host won't usually be touched by the host (i.e. no
> KSM or swapping out), so all we'd probably want at the moment is to be
> able to return an error back from arch_make_page_accessible(). Its return
> code is ignored in this patch :/
> 
> One thing I don't grok about the ultravisor encryption is how it avoids
> replay attacks when paging back in. For example, if the host is compromised
> and replaces the page contents with an old encrypted value. Are you storing
> per-page metadata somewhere to ensure "freshness" of the encrypted data?

Can't talk about the others, but on s390 the ultravisor stores counter,
tweak, address and hashing information. No replay or page exchange within
the guest is possible. (We can move the guest content to a different host
page though by using export/import, as this will revalidate the
correctness from the guest's point of view.)
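A toy model of that freshness check, for readers following along: the trusted entity keeps per-page metadata so a replayed old encrypted copy is rejected. This is conceptual only — the real ultravisor also stores tweak, address and hash data, and none of its interfaces are modelled here:

```c
#include <stdint.h>

/*
 * Conceptual sketch: per-page metadata (here just a counter) lives in
 * trusted storage and is bumped on every export.  An imported page
 * carrying a stale counter - i.e. a replayed old encrypted copy - is
 * rejected.  No real crypto or UV interfaces are modelled.
 */
struct uv_page_meta {
	uint64_t counter;  /* trusted, bumped on every export */
};

static uint64_t uv_export_page(struct uv_page_meta *meta)
{
	return ++meta->counter;  /* value travels with the encrypted page */
}

static int uv_import_page(const struct uv_page_meta *meta, uint64_t page_ctr)
{
	return page_ctr == meta->counter;  /* stale copy => replay, reject */
}
```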


* Re: [PATCH 16/35] KVM: s390: protvirt: Add SCLP interrupt handling
  2020-02-07 11:39 ` [PATCH 16/35] KVM: s390: protvirt: Add SCLP interrupt handling Christian Borntraeger
@ 2020-02-11 12:00   ` Thomas Huth
  2020-02-11 20:06     ` Christian Borntraeger
  0 siblings, 1 reply; 147+ messages in thread
From: Thomas Huth @ 2020-02-11 12:00 UTC (permalink / raw)
  To: Christian Borntraeger, Janosch Frank
  Cc: KVM, Cornelia Huck, David Hildenbrand, Ulrich Weigand,
	Claudio Imbrenda, Andrea Arcangeli, linux-s390, Michael Mueller,
	Vasily Gorbik, Janosch Frank

On 07/02/2020 12.39, Christian Borntraeger wrote:
> The sclp interrupt is kind of special. The ultravisor polices that we
> do not inject an sclp interrupt with payload if no sccb is outstanding.
> On the other hand we have "asynchronous" event interrupts, e.g. for
> console input.
> We separate both variants into sclp interrupt and sclp event interrupt.
> The sclp interrupt is masked until a previous servc instruction has
> finished (sie exit 108).
> 
> [frankja@linux.ibm.com: factoring out write_sclp]
> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
> ---
[...]
> diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
> index e5ee52e33d96..c28fa09cb557 100644
> --- a/arch/s390/kvm/interrupt.c
> +++ b/arch/s390/kvm/interrupt.c
> @@ -325,8 +325,11 @@ static inline int gisa_tac_ipm_gisc(struct kvm_s390_gisa *gisa, u32 gisc)
>  
>  static inline unsigned long pending_irqs_no_gisa(struct kvm_vcpu *vcpu)
>  {
> -	return vcpu->kvm->arch.float_int.pending_irqs |
> -		vcpu->arch.local_int.pending_irqs;
> +	unsigned long pending = vcpu->kvm->arch.float_int.pending_irqs |
> +				vcpu->arch.local_int.pending_irqs;
> +
> +	pending &= ~vcpu->kvm->arch.float_int.masked_irqs;
> +	return pending;
>  }
>  
>  static inline unsigned long pending_irqs(struct kvm_vcpu *vcpu)
> @@ -384,8 +387,10 @@ static unsigned long deliverable_irqs(struct kvm_vcpu *vcpu)
>  		__clear_bit(IRQ_PEND_EXT_CLOCK_COMP, &active_mask);
>  	if (!(vcpu->arch.sie_block->gcr[0] & CR0_CPU_TIMER_SUBMASK))
>  		__clear_bit(IRQ_PEND_EXT_CPU_TIMER, &active_mask);
> -	if (!(vcpu->arch.sie_block->gcr[0] & CR0_SERVICE_SIGNAL_SUBMASK))
> +	if (!(vcpu->arch.sie_block->gcr[0] & CR0_SERVICE_SIGNAL_SUBMASK)) {
>  		__clear_bit(IRQ_PEND_EXT_SERVICE, &active_mask);
> +		__clear_bit(IRQ_PEND_EXT_SERVICE_EV, &active_mask);
> +	}
>  	if (psw_mchk_disabled(vcpu))
>  		active_mask &= ~IRQ_PEND_MCHK_MASK;
>  	/* PV guest cpus can have a single interruption injected at a time. */
> @@ -947,6 +952,31 @@ static int __must_check __deliver_prog(struct kvm_vcpu *vcpu)
>  	return rc ? -EFAULT : 0;
>  }
>  
> +#define SCCB_MASK 0xFFFFFFF8
> +#define SCCB_EVENT_PENDING 0x3
> +
> +static int write_sclp(struct kvm_vcpu *vcpu, u32 parm)
> +{
> +	int rc;
> +
> +	if (kvm_s390_pv_handle_cpu(vcpu)) {
> +		vcpu->arch.sie_block->iictl = IICTL_CODE_EXT;
> +		vcpu->arch.sie_block->eic = EXT_IRQ_SERVICE_SIG;
> +		vcpu->arch.sie_block->eiparams = parm;
> +		return 0;
> +	}
> +
> +	rc  = put_guest_lc(vcpu, EXT_IRQ_SERVICE_SIG, (u16 *)__LC_EXT_INT_CODE);
> +	rc |= put_guest_lc(vcpu, 0, (u16 *)__LC_EXT_CPU_ADDR);
> +	rc |= write_guest_lc(vcpu, __LC_EXT_OLD_PSW,
> +			     &vcpu->arch.sie_block->gpsw, sizeof(psw_t));
> +	rc |= read_guest_lc(vcpu, __LC_EXT_NEW_PSW,
> +			    &vcpu->arch.sie_block->gpsw, sizeof(psw_t));
> +	rc |= put_guest_lc(vcpu, parm,
> +			   (u32 *)__LC_EXT_PARAMS);
> +	return rc;

I think it would be nicer to move the "return rc ? -EFAULT : 0;" here
instead of using it in the __deliver_service* functions...

> +}
> +
>  static int __must_check __deliver_service(struct kvm_vcpu *vcpu)
>  {
>  	struct kvm_s390_float_interrupt *fi = &vcpu->kvm->arch.float_int;
> @@ -954,13 +984,17 @@ static int __must_check __deliver_service(struct kvm_vcpu *vcpu)
>  	int rc = 0;
>  
>  	spin_lock(&fi->lock);
> -	if (!(test_bit(IRQ_PEND_EXT_SERVICE, &fi->pending_irqs))) {
> +	if (test_bit(IRQ_PEND_EXT_SERVICE, &fi->masked_irqs) ||
> +	    !(test_bit(IRQ_PEND_EXT_SERVICE, &fi->pending_irqs))) {
>  		spin_unlock(&fi->lock);
>  		return 0;
>  	}
>  	ext = fi->srv_signal;
>  	memset(&fi->srv_signal, 0, sizeof(ext));
>  	clear_bit(IRQ_PEND_EXT_SERVICE, &fi->pending_irqs);
> +	clear_bit(IRQ_PEND_EXT_SERVICE_EV, &fi->pending_irqs);
> +	if (kvm_s390_pv_is_protected(vcpu->kvm))
> +		set_bit(IRQ_PEND_EXT_SERVICE, &fi->masked_irqs);
>  	spin_unlock(&fi->lock);
>  
>  	VCPU_EVENT(vcpu, 4, "deliver: sclp parameter 0x%x",
> @@ -969,15 +1003,33 @@ static int __must_check __deliver_service(struct kvm_vcpu *vcpu)
>  	trace_kvm_s390_deliver_interrupt(vcpu->vcpu_id, KVM_S390_INT_SERVICE,
>  					 ext.ext_params, 0);
>  
> -	rc  = put_guest_lc(vcpu, EXT_IRQ_SERVICE_SIG, (u16 *)__LC_EXT_INT_CODE);
> -	rc |= put_guest_lc(vcpu, 0, (u16 *)__LC_EXT_CPU_ADDR);
> -	rc |= write_guest_lc(vcpu, __LC_EXT_OLD_PSW,
> -			     &vcpu->arch.sie_block->gpsw, sizeof(psw_t));
> -	rc |= read_guest_lc(vcpu, __LC_EXT_NEW_PSW,
> -			    &vcpu->arch.sie_block->gpsw, sizeof(psw_t));
> -	rc |= put_guest_lc(vcpu, ext.ext_params,
> -			   (u32 *)__LC_EXT_PARAMS);
> +	rc = write_sclp(vcpu, ext.ext_params);
> +	return rc ? -EFAULT : 0;

... i.e. use "return write_sclp(...)" here...

> +}
> +
> +static int __must_check __deliver_service_ev(struct kvm_vcpu *vcpu)
> +{
> +	struct kvm_s390_float_interrupt *fi = &vcpu->kvm->arch.float_int;
> +	struct kvm_s390_ext_info ext;
> +	int rc = 0;
> +
> +	spin_lock(&fi->lock);
> +	if (!(test_bit(IRQ_PEND_EXT_SERVICE_EV, &fi->pending_irqs))) {
> +		spin_unlock(&fi->lock);
> +		return 0;
> +	}
> +	ext = fi->srv_signal;
> +	/* only clear the event bit */
> +	fi->srv_signal.ext_params &= ~SCCB_EVENT_PENDING;
> +	clear_bit(IRQ_PEND_EXT_SERVICE_EV, &fi->pending_irqs);
> +	spin_unlock(&fi->lock);
> +
> +	VCPU_EVENT(vcpu, 4, "%s", "deliver: sclp parameter event");
> +	vcpu->stat.deliver_service_signal++;
> +	trace_kvm_s390_deliver_interrupt(vcpu->vcpu_id, KVM_S390_INT_SERVICE,
> +					 ext.ext_params, 0);
>  
> +	rc = write_sclp(vcpu, SCCB_EVENT_PENDING);
>  	return rc ? -EFAULT : 0;
>  }

... and here.

Apart from that, patch looks ok to me.

 Thomas
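Thomas's suggested reshuffle boils down to the control flow below. This is a standalone sketch with the guest-access calls stubbed out, not the actual kernel functions:

```c
#include <errno.h>

/*
 * Sketch of the suggested refactoring: the helper translates its
 * accumulated guest-access result into -EFAULT/0 itself, so every
 * __deliver_service* caller can simply "return write_sclp(...)".
 * The rc value stands in for the put/write/read_guest_lc chain.
 */
static int write_sclp_sketch(int guest_access_failed)
{
	int rc = guest_access_failed;  /* accumulated guest-access result */

	return rc ? -EFAULT : 0;  /* moved into the helper, as suggested */
}

static int deliver_service_sketch(int guest_access_failed)
{
	return write_sclp_sketch(guest_access_failed);  /* no rc juggling */
}
```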


* Re: [PATCH 35/35] DOCUMENTATION: Protected virtual machine introduction and IPL
  2020-02-07 11:39 ` [PATCH 35/35] DOCUMENTATION: Protected virtual machine introduction and IPL Christian Borntraeger
@ 2020-02-11 12:23   ` Thomas Huth
  2020-02-11 20:03     ` Christian Borntraeger
  2020-02-12 11:01   ` Cornelia Huck
  1 sibling, 1 reply; 147+ messages in thread
From: Thomas Huth @ 2020-02-11 12:23 UTC (permalink / raw)
  To: Christian Borntraeger, Janosch Frank
  Cc: KVM, Cornelia Huck, David Hildenbrand, Ulrich Weigand,
	Claudio Imbrenda, Andrea Arcangeli, linux-s390, Michael Mueller,
	Vasily Gorbik, Janosch Frank

On 07/02/2020 12.39, Christian Borntraeger wrote:
> From: Janosch Frank <frankja@linux.ibm.com>
> 
> Add documentation about protected KVM guests and description of changes
> that are necessary to move a KVM VM into Protected Virtualization mode.
> 
> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
> [borntraeger@de.ibm.com: fixing and conversion to rst]
> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
> ---
[...]
> diff --git a/Documentation/virt/kvm/s390-pv-boot.rst b/Documentation/virt/kvm/s390-pv-boot.rst
> new file mode 100644
> index 000000000000..47814e53369a
> --- /dev/null
> +++ b/Documentation/virt/kvm/s390-pv-boot.rst
> @@ -0,0 +1,79 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +======================================
> +s390 (IBM Z) Boot/IPL of Protected VMs
> +======================================
> +
> +Summary
> +-------
> +Protected Virtual Machines (PVM) are not accessible by I/O or the
> +hypervisor.  When the hypervisor wants to access the memory of PVMs
> +the memory needs to be made accessible. When doing so, the memory will
> +be encrypted.  See :doc:`s390-pv` for details.
> +
> +On IPL a small plaintext bootloader is started which provides
> +information about the encrypted components and necessary metadata to
> +KVM to decrypt the protected virtual machine.
> +
> +Based on this data, KVM will make the protected virtual machine known
> +to the Ultravisor(UV) and instruct it to secure the memory of the PVM,
> +decrypt the components and verify the data and address list hashes, to
> +ensure integrity. Afterwards KVM can run the PVM via the SIE
> +instruction which the UV will intercept and execute on KVM's behalf.
> +
> +The switch into PV mode lets us load encrypted guest executables and

Maybe rather: "After the switch into PV mode, the guest can load ..." ?

> +data via every available method (network, dasd, scsi, direct kernel,
> +...) without the need to change the boot process.
> +
> +
> +Diag308
> +-------
> +This diagnose instruction is the basis for VM IPL. The VM can set and
> +retrieve IPL information blocks, that specify the IPL method/devices
> +and request VM memory and subsystem resets, as well as IPLs.
> +
> +For PVs this concept has been extended with new subcodes:
> +
> +Subcode 8: Set an IPL Information Block of type 5 (information block
> +for PVMs)
> +Subcode 9: Store the saved block in guest memory
> +Subcode 10: Move into Protected Virtualization mode
> +
> +The new PV load-device-specific-parameters field specifies all data,

remove the comma?

> +that is necessary to move into PV mode.
> +
> +* PV Header origin
> +* PV Header length
> +* List of Components composed of
> +   * AES-XTS Tweak prefix
> +   * Origin
> +   * Size
> +
> +The PV header contains the keys and hashes, which the UV will use to
> +decrypt and verify the PV, as well as control flags and a start PSW.
> +
> +The components are for instance an encrypted kernel, kernel cmd and

s/kernel cmd/kernel parameters/ ?

> +initrd. The components are decrypted by the UV.
> +
> +All non-decrypted data of the guest before it switches to protected
> +virtualization mode are zero on first access of the PV.

Before it switches to protected virtualization mode, all non-decrypted
data of the guest are ... ?

> +
> +When running in protected mode some subcodes will result in exceptions
> +or return error codes.
> +
> +Subcodes 4 and 7 will result in specification exceptions as they would
> +not clear out the guest memory.
> +When removing a secure VM, the UV will clear all memory, so we can't
> +have non-clearing IPL subcodes.
> +
> +Subcodes 8, 9, 10 will result in specification exceptions.
> +Re-IPL into a protected mode is only possible via a detour into non
> +protected mode.
> +
> +Keys
> +----
> +Every CEC will have a unique public key to enable tooling to build
> +encrypted images.
> +See  `s390-tools <https://github.com/ibm-s390-tools/s390-tools/>`_
> +for the tooling.
> diff --git a/Documentation/virt/kvm/s390-pv.rst b/Documentation/virt/kvm/s390-pv.rst
> new file mode 100644
> index 000000000000..dbe9110dfd1e
> --- /dev/null
> +++ b/Documentation/virt/kvm/s390-pv.rst
> @@ -0,0 +1,116 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +=========================================
> +s390 (IBM Z) Ultravisor and Protected VMs
> +=========================================
> +
> +Summary
> +-------
> +Protected virtual machines (PVM) are KVM VMs, where KVM can't access
> +the VM's state like guest memory and guest registers anymore. Instead,
> +the PVMs are mostly managed by a new entity called Ultravisor
> +(UV). The UV provides an API that can be used by PVMs and KVM to
> +request management actions.
> +
> +Each guest starts in the non-protected mode and then may make a
> +request to transition into protected mode. On transition, KVM
> +registers the guest and its VCPUs with the Ultravisor and prepares
> +everything for running it.
> +
> +The Ultravisor will secure and decrypt the guest's boot memory
> +(i.e. kernel/initrd). It will safeguard state changes like VCPU
> +starts/stops and injected interrupts while the guest is running.
> +
> +As access to the guest's state, such as the SIE state description, is
> +normally needed to be able to run a VM, some changes have been made in
> +SIE behavior. A new format 4 state description has been introduced,

s/in SIE behavior/in the behavior of the SIE instruction/ ?

> +where some fields have different meanings for a PVM. SIE exits are
> +minimized as much as possible to improve speed and reduce exposed
> +guest state.
[...]

 Thomas


* Re: [PATCH 25/35] KVM: s390: protvirt: Only sync fmt4 registers
  2020-02-11 10:51   ` Cornelia Huck
@ 2020-02-11 12:59     ` Christian Borntraeger
  0 siblings, 0 replies; 147+ messages in thread
From: Christian Borntraeger @ 2020-02-11 12:59 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Janosch Frank, KVM, David Hildenbrand, Thomas Huth,
	Ulrich Weigand, Claudio Imbrenda, Andrea Arcangeli, linux-s390,
	Michael Mueller, Vasily Gorbik, Janosch Frank



On 11.02.20 11:51, Cornelia Huck wrote:
> On Fri,  7 Feb 2020 06:39:48 -0500
> Christian Borntraeger <borntraeger@de.ibm.com> wrote:
> 
>> +static void sync_regs(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
>> +{
>> +	/*
>> +	 * at several places we have to modify our internal view to not do
> 
> s/at/In/ ?

ack

> 
>> +	 * things that are disallowed by the ultravisor. For example we must
>> +	 * not inject interrupts after specific exits (e.g. 112). We do this
> 
> Spell out what 112 is?

ack.
> 
>> +	 * by turning off the MIE bits of our PSW copy. To avoid getting
> 
> And also spell out what MIE is?

ack
> 
>> +	 * validity intercepts, we do only accept the condition code from
>> +	 * userspace.
>> +	 */
> 


* Re: [PATCH 21/35] KVM: s390/mm: handle guest unpin events
  2020-02-10 14:58   ` Thomas Huth
@ 2020-02-11 13:21     ` Cornelia Huck
  0 siblings, 0 replies; 147+ messages in thread
From: Cornelia Huck @ 2020-02-11 13:21 UTC (permalink / raw)
  To: Thomas Huth
  Cc: Christian Borntraeger, Janosch Frank, KVM, David Hildenbrand,
	Ulrich Weigand, Claudio Imbrenda, Andrea Arcangeli, linux-s390,
	Michael Mueller, Vasily Gorbik, linux-mm, Andrew Morton

On Mon, 10 Feb 2020 15:58:11 +0100
Thomas Huth <thuth@redhat.com> wrote:

> On 07/02/2020 12.39, Christian Borntraeger wrote:
> > From: Claudio Imbrenda <imbrenda@linux.ibm.com>
> > 
> > The current code tries to first pin shared pages, if that fails (e.g.
> > because the page is not shared) it will export them. For shared pages
> > this means that we get a new intercept telling us that the guest is
> > unsharing that page. We will make the page secure at that point in time
> > and revoke the host access. This is synchronized with other host events,
> > e.g. the code will wait until host I/O has finished.
> > 
> > Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
> > [borntraeger@de.ibm.com: patch merging, splitting, fixing]
> > Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
> > ---
> >  arch/s390/kvm/intercept.c | 24 ++++++++++++++++++++++++
> >  1 file changed, 24 insertions(+)
> > 
> > diff --git a/arch/s390/kvm/intercept.c b/arch/s390/kvm/intercept.c
> > index 2a966dc52611..e155389a4a66 100644
> > --- a/arch/s390/kvm/intercept.c
> > +++ b/arch/s390/kvm/intercept.c
> > @@ -16,6 +16,7 @@
> >  #include <asm/asm-offsets.h>
> >  #include <asm/irq.h>
> >  #include <asm/sysinfo.h>
> > +#include <asm/uv.h>
> >  
> >  #include "kvm-s390.h"
> >  #include "gaccess.h"
> > @@ -484,12 +485,35 @@ static int handle_pv_sclp(struct kvm_vcpu *vcpu)
> >  	return 0;
> >  }
> >  
> > +static int handle_pv_uvc(struct kvm_vcpu *vcpu)
> > +{
> > +	struct uv_cb_share *guest_uvcb = (void *)vcpu->arch.sie_block->sidad;
> > +	struct uv_cb_cts uvcb = {
> > +		.header.cmd	= UVC_CMD_UNPIN_PAGE_SHARED,
> > +		.header.len	= sizeof(uvcb),
> > +		.guest_handle	= kvm_s390_pv_handle(vcpu->kvm),
> > +		.gaddr		= guest_uvcb->paddr,
> > +	};
> > +	int rc;
> > +
> > +	if (guest_uvcb->header.cmd != UVC_CMD_REMOVE_SHARED_ACCESS) {
> > +		WARN_ONCE(1, "Unexpected UVC 0x%x!\n", guest_uvcb->header.cmd);  
> 
> Is there a way to signal the failed command to the guest, too?

I'm wondering at which layer the actual problem occurs here. Is it
because a (new) command was not interpreted or rejected by the
ultravisor so that it ended up being handled by the hypervisor? If so,
what should the guest know?

> 
>  Thomas
> 
> 
> > +		return 0;
> > +	}
> > +	rc = uv_make_secure(vcpu->arch.gmap, uvcb.gaddr, &uvcb);
> > +	if (rc == -EINVAL && uvcb.header.rc == 0x104)

This wants a comment.

> > +		return 0;
> > +	return rc;
> > +}
> > +
> >  static int handle_pv_notification(struct kvm_vcpu *vcpu)
> >  {
> >  	if (vcpu->arch.sie_block->ipa == 0xb210)
> >  		return handle_pv_spx(vcpu);
> >  	if (vcpu->arch.sie_block->ipa == 0xb220)
> >  		return handle_pv_sclp(vcpu);
> > +	if (vcpu->arch.sie_block->ipa == 0xb9a4)
> > +		return handle_pv_uvc(vcpu);

Is it defined by the architecture what the possible commands are
for which the hypervisor may get control? If we get something
unexpected, is returning 0 the right strategy?

> >  
> >  	return handle_instruction(vcpu);
> >  }
> >   
> 


* Re: [PATCH 35/35] DOCUMENTATION: Protected virtual machine introduction and IPL
  2020-02-11 12:23   ` Thomas Huth
@ 2020-02-11 20:03     ` Christian Borntraeger
  2020-02-12 11:03       ` Cornelia Huck
  0 siblings, 1 reply; 147+ messages in thread
From: Christian Borntraeger @ 2020-02-11 20:03 UTC (permalink / raw)
  To: Thomas Huth, Janosch Frank
  Cc: KVM, Cornelia Huck, David Hildenbrand, Ulrich Weigand,
	Claudio Imbrenda, Andrea Arcangeli, linux-s390, Michael Mueller,
	Vasily Gorbik, Janosch Frank



On 11.02.20 13:23, Thomas Huth wrote:
> On 07/02/2020 12.39, Christian Borntraeger wrote:
>> From: Janosch Frank <frankja@linux.ibm.com>
>>
>> Add documentation about protected KVM guests and description of changes
>> that are necessary to move a KVM VM into Protected Virtualization mode.
>>
>> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
>> [borntraeger@de.ibm.com: fixing and conversion to rst]
>> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
>> ---
> [...]
>> diff --git a/Documentation/virt/kvm/s390-pv-boot.rst b/Documentation/virt/kvm/s390-pv-boot.rst
>> new file mode 100644
>> index 000000000000..47814e53369a
>> --- /dev/null
>> +++ b/Documentation/virt/kvm/s390-pv-boot.rst
>> @@ -0,0 +1,79 @@
>> +.. SPDX-License-Identifier: GPL-2.0
>> +
>> +======================================
>> +s390 (IBM Z) Boot/IPL of Protected VMs
>> +======================================
>> +
>> +Summary
>> +-------
>> +Protected Virtual Machines (PVM) are not accessible by I/O or the
>> +hypervisor.  When the hypervisor wants to access the memory of PVMs
>> +the memory needs to be made accessible. When doing so, the memory will
>> +be encrypted.  See :doc:`s390-pv` for details.
>> +
>> +On IPL a small plaintext bootloader is started which provides
>> +information about the encrypted components and necessary metadata to
>> +KVM to decrypt the protected virtual machine.
>> +
>> +Based on this data, KVM will make the protected virtual machine known
>> +to the Ultravisor(UV) and instruct it to secure the memory of the PVM,
>> +decrypt the components and verify the data and address list hashes, to
>> +ensure integrity. Afterwards KVM can run the PVM via the SIE
>> +instruction which the UV will intercept and execute on KVM's behalf.
>> +
>> +The switch into PV mode lets us load encrypted guest executables and
> 
> Maybe rather: "After the switch into PV mode, the guest can load ..." ?

No, it's not after the switch. By doing the switch, the guest image can be loaded
from anywhere because it is just like a kernel.

So I will do:

As the guest image is just like an opaque kernel image that does the
switch into PV mode itself, the user can load encrypted guest
executables and data via every available method (network, dasd, scsi,
direct kernel, ...) without the need to change the boot process.



> 
>> +data via every available method (network, dasd, scsi, direct kernel,
>> +...) without the need to change the boot process.
>> +
>> +
>> +Diag308
>> +-------
>> +This diagnose instruction is the basis for VM IPL. The VM can set and
>> +retrieve IPL information blocks, that specify the IPL method/devices
>> +and request VM memory and subsystem resets, as well as IPLs.
>> +
>> +For PVs this concept has been extended with new subcodes:
>> +
>> +Subcode 8: Set an IPL Information Block of type 5 (information block
>> +for PVMs)
>> +Subcode 9: Store the saved block in guest memory
>> +Subcode 10: Move into Protected Virtualization mode
>> +
>> +The new PV load-device-specific-parameters field specifies all data,
> 
> remove the comma?

ack.

> 
>> +that is necessary to move into PV mode.
>> +
>> +* PV Header origin
>> +* PV Header length
>> +* List of Components composed of
>> +   * AES-XTS Tweak prefix
>> +   * Origin
>> +   * Size
>> +
>> +The PV header contains the keys and hashes, which the UV will use to
>> +decrypt and verify the PV, as well as control flags and a start PSW.
>> +
>> +The components are for instance an encrypted kernel, kernel cmd and
> 
> s/kernel cmd/kernel parameters/ ?

ack
> 
>> +initrd. The components are decrypted by the UV.
>> +
>> +All non-decrypted data of the guest before it switches to protected
>> +virtualization mode are zero on first access of the PV.
> 
> Before it switches to protected virtualization mode, all non-decrypted
> data of the guest are ... ?

No, this is about the data after the initial import.
What about

After the initial import of the encrypted data all defined pages will
contain the guest content. All non-specified pages will start out as
zero pages on first access.


> 
>> +
>> +When running in protected mode some subcodes will result in exceptions
>> +or return error codes.
>> +
>> +Subcodes 4 and 7 will result in specification exceptions as they would
>> +not clear out the guest memory.
>> +When removing a secure VM, the UV will clear all memory, so we can't
>> +have non-clearing IPL subcodes.
>> +
>> +Subcodes 8, 9, 10 will result in specification exceptions.
>> +Re-IPL into a protected mode is only possible via a detour into non
>> +protected mode.
>> +
>> +Keys
>> +----
>> +Every CEC will have a unique public key to enable tooling to build
>> +encrypted images.
>> +See  `s390-tools <https://github.com/ibm-s390-tools/s390-tools/>`_
>> +for the tooling.
>> diff --git a/Documentation/virt/kvm/s390-pv.rst b/Documentation/virt/kvm/s390-pv.rst
>> new file mode 100644
>> index 000000000000..dbe9110dfd1e
>> --- /dev/null
>> +++ b/Documentation/virt/kvm/s390-pv.rst
>> @@ -0,0 +1,116 @@
>> +.. SPDX-License-Identifier: GPL-2.0
>> +
>> +=========================================
>> +s390 (IBM Z) Ultravisor and Protected VMs
>> +=========================================
>> +
>> +Summary
>> +-------
>> +Protected virtual machines (PVM) are KVM VMs, where KVM can't access
>> +the VM's state like guest memory and guest registers anymore. Instead,
>> +the PVMs are mostly managed by a new entity called Ultravisor
>> +(UV). The UV provides an API that can be used by PVMs and KVM to
>> +request management actions.
>> +
>> +Each guest starts in the non-protected mode and then may make a
>> +request to transition into protected mode. On transition, KVM
>> +registers the guest and its VCPUs with the Ultravisor and prepares
>> +everything for running it.
>> +
>> +The Ultravisor will secure and decrypt the guest's boot memory
>> +(i.e. kernel/initrd). It will safeguard state changes like VCPU
>> +starts/stops and injected interrupts while the guest is running.
>> +
>> +As access to the guest's state, such as the SIE state description, is
>> +normally needed to be able to run a VM, some changes have been made in
>> +SIE behavior. A new format 4 state description has been introduced,
> 
> s/in SIE behavior/in the behavior of the SIE instruction/ ?

ack
> 


* Re: [PATCH 16/35] KVM: s390: protvirt: Add SCLP interrupt handling
  2020-02-11 12:00   ` Thomas Huth
@ 2020-02-11 20:06     ` Christian Borntraeger
  0 siblings, 0 replies; 147+ messages in thread
From: Christian Borntraeger @ 2020-02-11 20:06 UTC (permalink / raw)
  To: Thomas Huth, Janosch Frank
  Cc: KVM, Cornelia Huck, David Hildenbrand, Ulrich Weigand,
	Claudio Imbrenda, Andrea Arcangeli, linux-s390, Michael Mueller,
	Vasily Gorbik, Janosch Frank



On 11.02.20 13:00, Thomas Huth wrote:
[...]
>> +
>> +static int write_sclp(struct kvm_vcpu *vcpu, u32 parm)
>> +{
>> +	int rc;
>> +
>> +	if (kvm_s390_pv_handle_cpu(vcpu)) {
>> +		vcpu->arch.sie_block->iictl = IICTL_CODE_EXT;
>> +		vcpu->arch.sie_block->eic = EXT_IRQ_SERVICE_SIG;
>> +		vcpu->arch.sie_block->eiparams = parm;
>> +		return 0;
>> +	}
>> +
>> +	rc  = put_guest_lc(vcpu, EXT_IRQ_SERVICE_SIG, (u16 *)__LC_EXT_INT_CODE);
>> +	rc |= put_guest_lc(vcpu, 0, (u16 *)__LC_EXT_CPU_ADDR);
>> +	rc |= write_guest_lc(vcpu, __LC_EXT_OLD_PSW,
>> +			     &vcpu->arch.sie_block->gpsw, sizeof(psw_t));
>> +	rc |= read_guest_lc(vcpu, __LC_EXT_NEW_PSW,
>> +			    &vcpu->arch.sie_block->gpsw, sizeof(psw_t));
>> +	rc |= put_guest_lc(vcpu, parm,
>> +			   (u32 *)__LC_EXT_PARAMS);
>> +	return rc;
> 
> I think it would be nicer to move the "return rc ? -EFAULT : 0;" here
> instead of using it in the __deliver_service* functions...

ack. That would also allow us to get rid of rc in the deliver functions.


* Re: [PATCH 35/35] DOCUMENTATION: Protected virtual machine introduction and IPL
  2020-02-07 11:39 ` [PATCH 35/35] DOCUMENTATION: Protected virtual machine introduction and IPL Christian Borntraeger
  2020-02-11 12:23   ` Thomas Huth
@ 2020-02-12 11:01   ` Cornelia Huck
  2020-02-12 16:36     ` Christian Borntraeger
  1 sibling, 1 reply; 147+ messages in thread
From: Cornelia Huck @ 2020-02-12 11:01 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Janosch Frank, KVM, David Hildenbrand, Thomas Huth,
	Ulrich Weigand, Claudio Imbrenda, Andrea Arcangeli, linux-s390,
	Michael Mueller, Vasily Gorbik, Janosch Frank

On Fri,  7 Feb 2020 06:39:58 -0500
Christian Borntraeger <borntraeger@de.ibm.com> wrote:

> From: Janosch Frank <frankja@linux.ibm.com>
> 
> Add documentation about protected KVM guests and description of changes
> that are necessary to move a KVM VM into Protected Virtualization mode.
> 
> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
> [borntraeger@de.ibm.com: fixing and conversion to rst]
> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
> ---
>  Documentation/virt/kvm/index.rst        |   2 +
>  Documentation/virt/kvm/s390-pv-boot.rst |  79 ++++++++++++++++
>  Documentation/virt/kvm/s390-pv.rst      | 116 ++++++++++++++++++++++++
>  MAINTAINERS                             |   1 +
>  4 files changed, 198 insertions(+)
>  create mode 100644 Documentation/virt/kvm/s390-pv-boot.rst
>  create mode 100644 Documentation/virt/kvm/s390-pv.rst
> 
(...)
> diff --git a/Documentation/virt/kvm/s390-pv-boot.rst b/Documentation/virt/kvm/s390-pv-boot.rst
> new file mode 100644
> index 000000000000..47814e53369a
> --- /dev/null
> +++ b/Documentation/virt/kvm/s390-pv-boot.rst
> @@ -0,0 +1,79 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +======================================
> +s390 (IBM Z) Boot/IPL of Protected VMs
> +======================================
> +
> +Summary
> +-------
> +Protected Virtual Machines (PVM) are not accessible by I/O or the
> +hypervisor.  When the hypervisor wants to access the memory of PVMs
> +the memory needs to be made accessible. When doing so, the memory will
> +be encrypted.  See :doc:`s390-pv` for details.

Maybe

"The memory of Protected Virtual Machines (PVMs) is not accessible to
I/O or the hypervisor. In those cases where the hypervisor needs to
access the memory of a PVM, that memory must be made accessible. Memory
made accessible to the hypervisor will be encrypted. See :doc:`s390-pv`
for details."

?

> +
> +On IPL a small plaintext bootloader is started which provides

"On IPL (boot), a small plaintext bootloader is started, which..."

?

> +information about the encrypted components and necessary metadata to
> +KVM to decrypt the protected virtual machine.

(...)

> +Diag308
> +-------
> +This diagnose instruction is the basis for VM IPL. The VM can set and

"This diagnose instruction is the basic mechanism to handle IPL and
related operations for virtual machines." ?

> +retrieve IPL information blocks, that specify the IPL method/devices
> +and request VM memory and subsystem resets, as well as IPLs.
> +
> +For PVs this concept has been extended with new subcodes:

s/For PVs/For PVMs,/

(...)

> +When running in protected mode some subcodes will result in exceptions

s/When running in protected mode/When running in protected virtualization mode,/

?

> +or return error codes.
> +
> +Subcodes 4 and 7 will result in specification exceptions as they would
> +not clear out the guest memory.
> +When removing a secure VM, the UV will clear all memory, so we can't
> +have non-clearing IPL subcodes.

"Subcodes 4 and 7, which specify operations that do not clear the guest
memory, will result in specification exceptions. This is because the UV
will clear all memory when a secure VM is removed, and therefore
non-clearing IPL subcodes are not allowed."

?

(...)
> diff --git a/Documentation/virt/kvm/s390-pv.rst b/Documentation/virt/kvm/s390-pv.rst
> new file mode 100644
> index 000000000000..dbe9110dfd1e
> --- /dev/null
> +++ b/Documentation/virt/kvm/s390-pv.rst
> @@ -0,0 +1,116 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +=========================================
> +s390 (IBM Z) Ultravisor and Protected VMs
> +=========================================
> +
> +Summary
> +-------
> +Protected virtual machines (PVM) are KVM VMs, where KVM can't access
> +the VM's state like guest memory and guest registers anymore. Instead,

"...are KVM VMs that do not allow KVM to access VM state like guest
memory or guest registers."

?

(...)

> +The Interception Parameters state description field still contains
> +the bytes of the instruction text, but with pre-set register values
> +instead of the actual ones. I.e. each instruction always uses the same
> +instruction text, in order not to leak guest instruction text.
> +This also implies that the register content that a guest had in r<n>
> +may be in r<m> from the hypervisors point of view.

s/hypervisors/hypervisor's/

> +
> +The Secure Instruction Data Area contains instruction storage
> +data. Instruction data, i.e. data being referenced by an instruction
> +like the SCCB for sclp, is moved over the SIDA. When an instruction is

s/over/via/ ?

> +intercepted, the SIE will only allow data and program interrupts for
> +this instruction to be moved to the guest via the two data areas
> +discussed before. Other data is either ignored or results in validity
> +interceptions.

(...)


* Re: [PATCH 35/35] DOCUMENTATION: Protected virtual machine introduction and IPL
  2020-02-11 20:03     ` Christian Borntraeger
@ 2020-02-12 11:03       ` Cornelia Huck
  2020-02-12 11:49         ` Christian Borntraeger
  0 siblings, 1 reply; 147+ messages in thread
From: Cornelia Huck @ 2020-02-12 11:03 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Thomas Huth, Janosch Frank, KVM, David Hildenbrand,
	Ulrich Weigand, Claudio Imbrenda, Andrea Arcangeli, linux-s390,
	Michael Mueller, Vasily Gorbik, Janosch Frank

On Tue, 11 Feb 2020 21:03:17 +0100
Christian Borntraeger <borntraeger@de.ibm.com> wrote:

> On 11.02.20 13:23, Thomas Huth wrote:
> > On 07/02/2020 12.39, Christian Borntraeger wrote:  
> >> +The switch into PV mode lets us load encrypted guest executables and  
> > 
> > Maybe rather: "After the switch into PV mode, the guest can load ..." ?  
> 
> No, it's not after the switch. By doing the switch, the guest image can be loaded
> from anywhere because it is just like a kernel.
> 
> So I will do:
> 
> As the guest image is just like an opaque kernel image that does the
> switch into PV mode itself, the user can load encrypted guest
> executables and data via every available method (network, dasd, scsi,
> direct kernel, ...) without the need to change the boot process.

Sounds good to me.

(...)

> >> +All non-decrypted data of the guest before it switches to protected
> >> +virtualization mode are zero on first access of the PV.  
> > 
> > Before it switches to protected virtualization mode, all non-decrypted
> > data of the guest are ... ?  
> 
> No, this is about the data after the initial import.
> What about
> 
> After the initial import of the encrypted data all defined pages will

s/data/data,/

> contain the guest content. All non-specified pages will start out as
> zero pages on first access.

Also sounds good to me.

(...)


* Re: [PATCH 35/35] DOCUMENTATION: Protected virtual machine introduction and IPL
  2020-02-12 11:03       ` Cornelia Huck
@ 2020-02-12 11:49         ` Christian Borntraeger
  0 siblings, 0 replies; 147+ messages in thread
From: Christian Borntraeger @ 2020-02-12 11:49 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Thomas Huth, Janosch Frank, KVM, David Hildenbrand,
	Ulrich Weigand, Claudio Imbrenda, Andrea Arcangeli, linux-s390,
	Michael Mueller, Vasily Gorbik, Janosch Frank



On 12.02.20 12:03, Cornelia Huck wrote:
> On Tue, 11 Feb 2020 21:03:17 +0100
> Christian Borntraeger <borntraeger@de.ibm.com> wrote:
> 
>> On 11.02.20 13:23, Thomas Huth wrote:
>>> On 07/02/2020 12.39, Christian Borntraeger wrote:  
>>>> +The switch into PV mode lets us load encrypted guest executables and  
>>>
>>> Maybe rather: "After the switch into PV mode, the guest can load ..." ?  
>>
>> No, it's not after the switch. By doing the switch, the guest image can be loaded
>> from anywhere because it is just like a kernel.
>>
>> So I will do:
>>
>> As the guest image is just like an opaque kernel image that does the
>> switch into PV mode itself, the user can load encrypted guest
>> executables and data via every available method (network, dasd, scsi,
>> direct kernel, ...) without the need to change the boot process.
> 
> Sounds good to me.
> 
> (...)
> 
>>>> +All non-decrypted data of the guest before it switches to protected
>>>> +virtualization mode are zero on first access of the PV.  
>>>
>>> Before it switches to protected virtualization mode, all non-decrypted
>>> data of the guest are ... ?  
>>
>> No, this is about the data after the initial import.
>> What about
>>
>> After the initial import of the encrypted data all defined pages will
> 
> s/data/data,/

ack.
> 
>> contain the guest content. All non-specified pages will start out as
>> zero pages on first access.
> 
> Also sounds good to me.
> 
> (...)
> 


* Re: [PATCH v2 RFC] KVM: s390/interrupt: do not pin adapter interrupt pages
  2020-02-11  9:23         ` [PATCH v2 RFC] " Christian Borntraeger
@ 2020-02-12 11:52           ` Christian Borntraeger
  2020-02-12 12:16           ` David Hildenbrand
  2020-02-12 12:39           ` Cornelia Huck
  2 siblings, 0 replies; 147+ messages in thread
From: Christian Borntraeger @ 2020-02-12 11:52 UTC (permalink / raw)
  To: david
  Cc: Ulrich.Weigand, aarcange, akpm, cohuck, frankja, gor, imbrenda,
	kvm, linux-mm, linux-s390, mimu, thuth

I pushed that variant to my next branch. This should trigger several regression runs
in regard to function and performance for normal KVM guests.
Let's see if this has any impact at all. If not, this could be the simplest solution that
also simplifies a lot of code.

On 11.02.20 10:23, Christian Borntraeger wrote:
> From: Ulrich Weigand <Ulrich.Weigand@de.ibm.com>
> 
> The adapter interrupt page containing the indicator bits is currently
> pinned. That means that a guest with many devices can pin a lot of
> memory pages in the host. This also complicates the reference tracking
> which is needed for memory management handling of protected virtual
> machines.
> We can simply try to get the userspace page, set the bits and free the
> page. By storing the userspace address in the irq routing entry instead
> of the guest address we can actually avoid many lookups and list walks
> so that this variant is very likely not slower.
> 
> Signed-off-by: Ulrich Weigand <Ulrich.Weigand@de.ibm.com>
> [borntraeger@de.ibm.com: patch simplification]
> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
> ---
> quick and dirty, how this could look
> 
> 
>  arch/s390/include/asm/kvm_host.h |   3 -
>  arch/s390/kvm/interrupt.c        | 146 +++++++++++--------------------
>  2 files changed, 49 insertions(+), 100 deletions(-)
> 
> diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
> index 0d398738ded9..88a218872fa0 100644
> --- a/arch/s390/include/asm/kvm_host.h
> +++ b/arch/s390/include/asm/kvm_host.h
> @@ -771,9 +771,6 @@ struct s390_io_adapter {
>  	bool masked;
>  	bool swap;
>  	bool suppressible;
> -	struct rw_semaphore maps_lock;
> -	struct list_head maps;
> -	atomic_t nr_maps;
>  };
>  
>  #define MAX_S390_IO_ADAPTERS ((MAX_ISC + 1) * 8)
> diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
> index d4d35ec79e12..e6fe8b61ee9b 100644
> --- a/arch/s390/kvm/interrupt.c
> +++ b/arch/s390/kvm/interrupt.c
> @@ -2459,9 +2459,6 @@ static int register_io_adapter(struct kvm_device *dev,
>  	if (!adapter)
>  		return -ENOMEM;
>  
> -	INIT_LIST_HEAD(&adapter->maps);
> -	init_rwsem(&adapter->maps_lock);
> -	atomic_set(&adapter->nr_maps, 0);
>  	adapter->id = adapter_info.id;
>  	adapter->isc = adapter_info.isc;
>  	adapter->maskable = adapter_info.maskable;
> @@ -2488,83 +2485,26 @@ int kvm_s390_mask_adapter(struct kvm *kvm, unsigned int id, bool masked)
>  
>  static int kvm_s390_adapter_map(struct kvm *kvm, unsigned int id, __u64 addr)
>  {
> -	struct s390_io_adapter *adapter = get_io_adapter(kvm, id);
> -	struct s390_map_info *map;
> -	int ret;
> -
> -	if (!adapter || !addr)
> -		return -EINVAL;
> -
> -	map = kzalloc(sizeof(*map), GFP_KERNEL);
> -	if (!map) {
> -		ret = -ENOMEM;
> -		goto out;
> -	}
> -	INIT_LIST_HEAD(&map->list);
> -	map->guest_addr = addr;
> -	map->addr = gmap_translate(kvm->arch.gmap, addr);
> -	if (map->addr == -EFAULT) {
> -		ret = -EFAULT;
> -		goto out;
> -	}
> -	ret = get_user_pages_fast(map->addr, 1, FOLL_WRITE, &map->page);
> -	if (ret < 0)
> -		goto out;
> -	BUG_ON(ret != 1);
> -	down_write(&adapter->maps_lock);
> -	if (atomic_inc_return(&adapter->nr_maps) < MAX_S390_ADAPTER_MAPS) {
> -		list_add_tail(&map->list, &adapter->maps);
> -		ret = 0;
> -	} else {
> -		put_page(map->page);
> -		ret = -EINVAL;
> +	/*
> +	 * We resolve the gpa to hva when setting the IRQ routing. If userspace
> +	 * decides to mess with the memslots it better also updates the irq
> +	 * routing. Otherwise we will write to the wrong userspace address.
> +	 */
> +	return 0;
>  	}
> -	up_write(&adapter->maps_lock);
> -out:
> -	if (ret)
> -		kfree(map);
> -	return ret;
> -}
>  
>  static int kvm_s390_adapter_unmap(struct kvm *kvm, unsigned int id, __u64 addr)
>  {
> -	struct s390_io_adapter *adapter = get_io_adapter(kvm, id);
> -	struct s390_map_info *map, *tmp;
> -	int found = 0;
> -
> -	if (!adapter || !addr)
> -		return -EINVAL;
> -
> -	down_write(&adapter->maps_lock);
> -	list_for_each_entry_safe(map, tmp, &adapter->maps, list) {
> -		if (map->guest_addr == addr) {
> -			found = 1;
> -			atomic_dec(&adapter->nr_maps);
> -			list_del(&map->list);
> -			put_page(map->page);
> -			kfree(map);
> -			break;
> -		}
> -	}
> -	up_write(&adapter->maps_lock);
> -
> -	return found ? 0 : -EINVAL;
> +	return 0;
>  }
>  
>  void kvm_s390_destroy_adapters(struct kvm *kvm)
>  {
>  	int i;
> -	struct s390_map_info *map, *tmp;
>  
>  	for (i = 0; i < MAX_S390_IO_ADAPTERS; i++) {
>  		if (!kvm->arch.adapters[i])
>  			continue;
> -		list_for_each_entry_safe(map, tmp,
> -					 &kvm->arch.adapters[i]->maps, list) {
> -			list_del(&map->list);
> -			put_page(map->page);
> -			kfree(map);
> -		}
>  		kfree(kvm->arch.adapters[i]);
>  	}
>  }
> @@ -2831,19 +2771,25 @@ static unsigned long get_ind_bit(__u64 addr, unsigned long bit_nr, bool swap)
>  	return swap ? (bit ^ (BITS_PER_LONG - 1)) : bit;
>  }
>  
> -static struct s390_map_info *get_map_info(struct s390_io_adapter *adapter,
> -					  u64 addr)
> +static struct page *get_map_page(struct kvm *kvm,
> +				 struct s390_io_adapter *adapter,
> +				 u64 uaddr)
>  {
> -	struct s390_map_info *map;
> +	struct page *page;
> +	int ret;
>  
>  	if (!adapter)
>  		return NULL;
> -
> -	list_for_each_entry(map, &adapter->maps, list) {
> -		if (map->guest_addr == addr)
> -			return map;
> -	}
> -	return NULL;
> +	page = NULL;
> +	if (!uaddr)
> +		return NULL;
> +	down_read(&kvm->mm->mmap_sem);
> +	ret = get_user_pages_remote(NULL, kvm->mm, uaddr, 1, FOLL_WRITE,
> +				    &page, NULL, NULL);
> +	if (ret < 1)
> +		page = NULL;
> +	up_read(&kvm->mm->mmap_sem);
> +	return page;
>  }
>  
>  static int adapter_indicators_set(struct kvm *kvm,
> @@ -2852,30 +2798,35 @@ static int adapter_indicators_set(struct kvm *kvm,
>  {
>  	unsigned long bit;
>  	int summary_set, idx;
> -	struct s390_map_info *info;
> +	struct page *ind_page, *summary_page;
>  	void *map;
>  
> -	info = get_map_info(adapter, adapter_int->ind_addr);
> -	if (!info)
> +	ind_page = get_map_page(kvm, adapter, adapter_int->ind_addr);
> +	if (!ind_page)
>  		return -1;
> -	map = page_address(info->page);
> -	bit = get_ind_bit(info->addr, adapter_int->ind_offset, adapter->swap);
> -	set_bit(bit, map);
> -	idx = srcu_read_lock(&kvm->srcu);
> -	mark_page_dirty(kvm, info->guest_addr >> PAGE_SHIFT);
> -	set_page_dirty_lock(info->page);
> -	info = get_map_info(adapter, adapter_int->summary_addr);
> -	if (!info) {
> -		srcu_read_unlock(&kvm->srcu, idx);
> +	summary_page = get_map_page(kvm, adapter, adapter_int->summary_addr);
> +	if (!summary_page) {
> +		put_page(ind_page);
>  		return -1;
>  	}
> -	map = page_address(info->page);
> -	bit = get_ind_bit(info->addr, adapter_int->summary_offset,
> -			  adapter->swap);
> +
> +	idx = srcu_read_lock(&kvm->srcu);
> +	map = page_address(ind_page);
> +	bit = get_ind_bit(adapter_int->ind_addr,
> +			  adapter_int->ind_offset, adapter->swap);
> +	set_bit(bit, map);
> +	mark_page_dirty(kvm, adapter_int->ind_addr >> PAGE_SHIFT);
> +	set_page_dirty_lock(ind_page);
> +	map = page_address(summary_page);
> +	bit = get_ind_bit(adapter_int->summary_addr,
> +			  adapter_int->summary_offset, adapter->swap);
>  	summary_set = test_and_set_bit(bit, map);
> -	mark_page_dirty(kvm, info->guest_addr >> PAGE_SHIFT);
> -	set_page_dirty_lock(info->page);
> +	mark_page_dirty(kvm, adapter_int->summary_addr >> PAGE_SHIFT);
> +	set_page_dirty_lock(summary_page);
>  	srcu_read_unlock(&kvm->srcu, idx);
> +
> +	put_page(ind_page);
> +	put_page(summary_page);
>  	return summary_set ? 0 : 1;
>  }
>  
> @@ -2897,9 +2848,7 @@ static int set_adapter_int(struct kvm_kernel_irq_routing_entry *e,
>  	adapter = get_io_adapter(kvm, e->adapter.adapter_id);
>  	if (!adapter)
>  		return -1;
> -	down_read(&adapter->maps_lock);
>  	ret = adapter_indicators_set(kvm, adapter, &e->adapter);
> -	up_read(&adapter->maps_lock);
>  	if ((ret > 0) && !adapter->masked) {
>  		ret = kvm_s390_inject_airq(kvm, adapter);
>  		if (ret == 0)
> @@ -2951,12 +2900,15 @@ int kvm_set_routing_entry(struct kvm *kvm,
>  			  const struct kvm_irq_routing_entry *ue)
>  {
>  	int ret;
> +	u64 uaddr;
>  
>  	switch (ue->type) {
>  	case KVM_IRQ_ROUTING_S390_ADAPTER:
>  		e->set = set_adapter_int;
> -		e->adapter.summary_addr = ue->u.adapter.summary_addr;
> -		e->adapter.ind_addr = ue->u.adapter.ind_addr;
> +		uaddr =  gmap_translate(kvm->arch.gmap, ue->u.adapter.summary_addr);
> +		e->adapter.summary_addr = uaddr;
> +		uaddr =  gmap_translate(kvm->arch.gmap, ue->u.adapter.ind_addr);
> +		e->adapter.ind_addr = uaddr;
>  		e->adapter.summary_offset = ue->u.adapter.summary_offset;
>  		e->adapter.ind_offset = ue->u.adapter.ind_offset;
>  		e->adapter.adapter_id = ue->u.adapter.adapter_id;
> 


* Re: [PATCH v2 RFC] KVM: s390/interrupt: do not pin adapter interrupt pages
  2020-02-11  9:23         ` [PATCH v2 RFC] " Christian Borntraeger
  2020-02-12 11:52           ` Christian Borntraeger
@ 2020-02-12 12:16           ` David Hildenbrand
  2020-02-12 12:22             ` Christian Borntraeger
  2020-02-12 12:39           ` Cornelia Huck
  2 siblings, 1 reply; 147+ messages in thread
From: David Hildenbrand @ 2020-02-12 12:16 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Ulrich.Weigand, aarcange, akpm, cohuck, frankja, gor, imbrenda,
	kvm, linux-mm, linux-s390, mimu, thuth, dgilbert


> +	/*
> +	 * We resolve the gpa to hva when setting the IRQ routing. If userspace
> +	 * decides to mess with the memslots it better also updates the irq
> +	 * routing. Otherwise we will write to the wrong userspace address.
> +	 */

I guess this is just like the old handling, where a page was pinned. But
slightly better :) So the pages are definitely part of guest memory.

Fun stuff: If (a nasty) guest (in current code) zaps this page using
balloon inflation and the page is re-accessed (e.g., by the guest or by
the host), a new page will be faulted in, and there will be an
inconsistency between what the guest/user space sees and what this code
sees. Going via the user space address looks cleaner.

Now, with postcopy live migration, we will also zap all guest memory
before starting the guest, I do wonder if that produces a similar
inconsistency ... usually, when pages are pinned in the kernel, we
inhibit the balloon and implicitly also postcopy.

If so, this actually fixes an issue. But might depend on the order
things are initialized in user space. Or I am messing up things :)

[...]

>  static int kvm_s390_adapter_unmap(struct kvm *kvm, unsigned int id, __u64 addr)
>  {
> -	struct s390_io_adapter *adapter = get_io_adapter(kvm, id);
> -	struct s390_map_info *map, *tmp;
> -	int found = 0;
> -
> -	if (!adapter || !addr)
> -		return -EINVAL;
> -
> -	down_write(&adapter->maps_lock);
> -	list_for_each_entry_safe(map, tmp, &adapter->maps, list) {
> -		if (map->guest_addr == addr) {
> -			found = 1;
> -			atomic_dec(&adapter->nr_maps);
> -			list_del(&map->list);
> -			put_page(map->page);
> -			kfree(map);
> -			break;
> -		}
> -	}
> -	up_write(&adapter->maps_lock);
> -
> -	return found ? 0 : -EINVAL;
> +	return 0;

Can we get rid of this function?

>  }

> +static struct page *get_map_page(struct kvm *kvm,
> +				 struct s390_io_adapter *adapter,
> +				 u64 uaddr)
>  {
> -	struct s390_map_info *map;
> +	struct page *page;
> +	int ret;
>  
>  	if (!adapter)
>  		return NULL;
> -
> -	list_for_each_entry(map, &adapter->maps, list) {
> -		if (map->guest_addr == addr)
> -			return map;
> -	}
> -	return NULL;
> +	page = NULL;

struct page *page = NULL;

> +	if (!uaddr)
> +		return NULL;
> +	down_read(&kvm->mm->mmap_sem);
> +	ret = get_user_pages_remote(NULL, kvm->mm, uaddr, 1, FOLL_WRITE,
> +				    &page, NULL, NULL);
> +	if (ret < 1)
> +		page = NULL;

Is that really necessary? According to the doc, pinned pages are stored
to the array.  ret < 1 means "no pages" were pinned, so nothing should
be stored.

-- 
Thanks,

David / dhildenb


* Re: [PATCH v2 RFC] KVM: s390/interrupt: do not pin adapter interrupt pages
  2020-02-12 12:16           ` David Hildenbrand
@ 2020-02-12 12:22             ` Christian Borntraeger
  2020-02-12 12:47               ` David Hildenbrand
  0 siblings, 1 reply; 147+ messages in thread
From: Christian Borntraeger @ 2020-02-12 12:22 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Ulrich.Weigand, aarcange, akpm, cohuck, frankja, gor, imbrenda,
	kvm, linux-mm, linux-s390, mimu, thuth, dgilbert



On 12.02.20 13:16, David Hildenbrand wrote:
> 
>> +	/*
>> +	 * We resolve the gpa to hva when setting the IRQ routing. If userspace
>> +	 * decides to mess with the memslots it better also updates the irq
>> +	 * routing. Otherwise we will write to the wrong userspace address.
>> +	 */
> 
> I guess this is just like the old handling, where a page was pinned. But
> slightly better :) So the pages are definitely part of guest memory.
> 
> Fun stuff: If (a nasty) guest (in current code) zaps this page using
> balloon inflation and the page is re-accessed (e.g., by the guest or by
> the host), a new page will be faulted in, and there will be an
> inconsistency between what the guest/user space sees and what this code
> sees. Going via the user space address looks cleaner.
> 
> Now, with postcopy live migration, we will also zap all guest memory
> before starting the guest, I do wonder if that produces a similar
> inconsistency ... usually, when pages are pinned in the kernel, we
> inhibit the balloon and implicitly also postcopy.
> 
> If so, this actually fixes an issue. But might depend on the order
> things are initialized in user space. Or I am messing up things :)

Yes, the current code has some corner cases where a guest can shoot itself
in the foot. This variant could actually be safer.
> 
> [...]
> 
>>  static int kvm_s390_adapter_unmap(struct kvm *kvm, unsigned int id, __u64 addr)
>>  {
>> -	struct s390_io_adapter *adapter = get_io_adapter(kvm, id);
>> -	struct s390_map_info *map, *tmp;
>> -	int found = 0;
>> -
>> -	if (!adapter || !addr)
>> -		return -EINVAL;
>> -
>> -	down_write(&adapter->maps_lock);
>> -	list_for_each_entry_safe(map, tmp, &adapter->maps, list) {
>> -		if (map->guest_addr == addr) {
>> -			found = 1;
>> -			atomic_dec(&adapter->nr_maps);
>> -			list_del(&map->list);
>> -			put_page(map->page);
>> -			kfree(map);
>> -			break;
>> -		}
>> -	}
>> -	up_write(&adapter->maps_lock);
>> -
>> -	return found ? 0 : -EINVAL;
>> +	return 0;
> 
> Can we get rid of this function?

And do a return in the handler? Maybe, yes. Will have a look.
> 
>>  }
> 
>> +static struct page *get_map_page(struct kvm *kvm,
>> +				 struct s390_io_adapter *adapter,
>> +				 u64 uaddr)
>>  {
>> -	struct s390_map_info *map;
>> +	struct page *page;
>> +	int ret;
>>  
>>  	if (!adapter)
>>  		return NULL;
>> -
>> -	list_for_each_entry(map, &adapter->maps, list) {
>> -		if (map->guest_addr == addr)
>> -			return map;
>> -	}
>> -	return NULL;
>> +	page = NULL;
> 
> struct page *page = NULL;
> 
>> +	if (!uaddr)
>> +		return NULL;
>> +	down_read(&kvm->mm->mmap_sem);
>> +	ret = get_user_pages_remote(NULL, kvm->mm, uaddr, 1, FOLL_WRITE,
>> +				    &page, NULL, NULL);
>> +	if (ret < 1)
>> +		page = NULL;
> 
> Is that really necessary? According to the doc, pinned pages are stored
> to the array.  ret < 1 means "no pages" were pinned, so nothing should
> be stored.

Probably. Will have a look.


* Re: [PATCH v2 RFC] KVM: s390/interrupt: do not pin adapter interrupt pages
  2020-02-11  9:23         ` [PATCH v2 RFC] " Christian Borntraeger
  2020-02-12 11:52           ` Christian Borntraeger
  2020-02-12 12:16           ` David Hildenbrand
@ 2020-02-12 12:39           ` Cornelia Huck
  2020-02-12 12:44             ` Christian Borntraeger
  2 siblings, 1 reply; 147+ messages in thread
From: Cornelia Huck @ 2020-02-12 12:39 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: david, Ulrich.Weigand, aarcange, akpm, frankja, gor, imbrenda,
	kvm, linux-mm, linux-s390, mimu, thuth

On Tue, 11 Feb 2020 04:23:41 -0500
Christian Borntraeger <borntraeger@de.ibm.com> wrote:

> From: Ulrich Weigand <Ulrich.Weigand@de.ibm.com>
> 
> The adapter interrupt page containing the indicator bits is currently
> pinned. That means that a guest with many devices can pin a lot of
> memory pages in the host. This also complicates the reference tracking
> which is needed for memory management handling of protected virtual
> machines.
> We can simply try to get the userspace page, set the bits, and free the
> page. By storing the userspace address in the irq routing entry instead
> of the guest address, we can actually avoid many lookups and list walks,
> so this variant is very likely not slower.
> 
> Signed-off-by: Ulrich Weigand <Ulrich.Weigand@de.ibm.com>
> [borntraeger@de.ibm.com: patch simplification]
> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
> ---
> quick and dirty, to show what this could look like
> 
> 
>  arch/s390/include/asm/kvm_host.h |   3 -
>  arch/s390/kvm/interrupt.c        | 146 +++++++++++--------------------
>  2 files changed, 49 insertions(+), 100 deletions(-)
> 

(...)

> @@ -2488,83 +2485,26 @@ int kvm_s390_mask_adapter(struct kvm *kvm, unsigned int id, bool masked)
>  
>  static int kvm_s390_adapter_map(struct kvm *kvm, unsigned int id, __u64 addr)
>  {
> -	struct s390_io_adapter *adapter = get_io_adapter(kvm, id);
> -	struct s390_map_info *map;
> -	int ret;
> -
> -	if (!adapter || !addr)
> -		return -EINVAL;
> -
> -	map = kzalloc(sizeof(*map), GFP_KERNEL);
> -	if (!map) {
> -		ret = -ENOMEM;
> -		goto out;
> -	}
> -	INIT_LIST_HEAD(&map->list);
> -	map->guest_addr = addr;
> -	map->addr = gmap_translate(kvm->arch.gmap, addr);
> -	if (map->addr == -EFAULT) {
> -		ret = -EFAULT;
> -		goto out;
> -	}
> -	ret = get_user_pages_fast(map->addr, 1, FOLL_WRITE, &map->page);
> -	if (ret < 0)
> -		goto out;
> -	BUG_ON(ret != 1);
> -	down_write(&adapter->maps_lock);
> -	if (atomic_inc_return(&adapter->nr_maps) < MAX_S390_ADAPTER_MAPS) {
> -		list_add_tail(&map->list, &adapter->maps);
> -		ret = 0;
> -	} else {
> -		put_page(map->page);
> -		ret = -EINVAL;
> +	/*
> +	 * We resolve the gpa to hva when setting the IRQ routing. If userspace
> +	 * decides to mess with the memslots it better also updates the irq
> +	 * routing. Otherwise we will write to the wrong userspace address.
> +	 */
> +	return 0;

Given that this function now always returns 0, we basically get a
completely useless roundtrip into the kernel when userspace is trying
to set up the mappings.

Can we define a new IO_ADAPTER_MAPPING_NOT_NEEDED or so capability that
userspace can check?

This change in behaviour probably wants a change in the documentation
as well.

>  	}
> -	up_write(&adapter->maps_lock);
> -out:
> -	if (ret)
> -		kfree(map);
> -	return ret;
> -}
>  
>  static int kvm_s390_adapter_unmap(struct kvm *kvm, unsigned int id, __u64 addr)
>  {
> -	struct s390_io_adapter *adapter = get_io_adapter(kvm, id);
> -	struct s390_map_info *map, *tmp;
> -	int found = 0;
> -
> -	if (!adapter || !addr)
> -		return -EINVAL;
> -
> -	down_write(&adapter->maps_lock);
> -	list_for_each_entry_safe(map, tmp, &adapter->maps, list) {
> -		if (map->guest_addr == addr) {
> -			found = 1;
> -			atomic_dec(&adapter->nr_maps);
> -			list_del(&map->list);
> -			put_page(map->page);
> -			kfree(map);
> -			break;
> -		}
> -	}
> -	up_write(&adapter->maps_lock);
> -
> -	return found ? 0 : -EINVAL;
> +	return 0;

Same here.

>  }
>  
>  void kvm_s390_destroy_adapters(struct kvm *kvm)
>  {
>  	int i;
> -	struct s390_map_info *map, *tmp;
>  
>  	for (i = 0; i < MAX_S390_IO_ADAPTERS; i++) {
>  		if (!kvm->arch.adapters[i])
>  			continue;
> -		list_for_each_entry_safe(map, tmp,
> -					 &kvm->arch.adapters[i]->maps, list) {
> -			list_del(&map->list);
> -			put_page(map->page);
> -			kfree(map);
> -		}
>  		kfree(kvm->arch.adapters[i]);

Call kfree() unconditionally?

>  	}
>  }
> @@ -2831,19 +2771,25 @@ static unsigned long get_ind_bit(__u64 addr, unsigned long bit_nr, bool swap)
>  	return swap ? (bit ^ (BITS_PER_LONG - 1)) : bit;
>  }
>  
> -static struct s390_map_info *get_map_info(struct s390_io_adapter *adapter,
> -					  u64 addr)
> +static struct page *get_map_page(struct kvm *kvm,
> +				 struct s390_io_adapter *adapter,
> +				 u64 uaddr)
>  {
> -	struct s390_map_info *map;
> +	struct page *page;
> +	int ret;
>  
>  	if (!adapter)
>  		return NULL;
> -
> -	list_for_each_entry(map, &adapter->maps, list) {
> -		if (map->guest_addr == addr)
> -			return map;
> -	}
> -	return NULL;
> +	page = NULL;
> +	if (!uaddr)
> +		return NULL;
> +	down_read(&kvm->mm->mmap_sem);
> +	ret = get_user_pages_remote(NULL, kvm->mm, uaddr, 1, FOLL_WRITE,
> +				    &page, NULL, NULL);
> +	if (ret < 1)
> +		page = NULL;
> +	up_read(&kvm->mm->mmap_sem);
> +	return page;
>  }
>  
>  static int adapter_indicators_set(struct kvm *kvm,

(...)

> @@ -2951,12 +2900,15 @@ int kvm_set_routing_entry(struct kvm *kvm,
>  			  const struct kvm_irq_routing_entry *ue)
>  {
>  	int ret;
> +	u64 uaddr;
>  
>  	switch (ue->type) {
>  	case KVM_IRQ_ROUTING_S390_ADAPTER:
>  		e->set = set_adapter_int;
> -		e->adapter.summary_addr = ue->u.adapter.summary_addr;
> -		e->adapter.ind_addr = ue->u.adapter.ind_addr;
> +		uaddr =  gmap_translate(kvm->arch.gmap, ue->u.adapter.summary_addr);

Can gmap_translate() return -EFAULT here? The code above only seems to
check for 0... do we want to return an error here?

> +		e->adapter.summary_addr = uaddr;
> +		uaddr =  gmap_translate(kvm->arch.gmap, ue->u.adapter.ind_addr);
> +		e->adapter.ind_addr = uaddr;
>  		e->adapter.summary_offset = ue->u.adapter.summary_offset;
>  		e->adapter.ind_offset = ue->u.adapter.ind_offset;
>  		e->adapter.adapter_id = ue->u.adapter.adapter_id;


* Re: [PATCH v2 RFC] KVM: s390/interrupt: do not pin adapter interrupt pages
  2020-02-12 12:39           ` Cornelia Huck
@ 2020-02-12 12:44             ` Christian Borntraeger
  2020-02-12 13:07               ` Cornelia Huck
  0 siblings, 1 reply; 147+ messages in thread
From: Christian Borntraeger @ 2020-02-12 12:44 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: david, Ulrich.Weigand, aarcange, akpm, frankja, gor, imbrenda,
	kvm, linux-mm, linux-s390, mimu, thuth



On 12.02.20 13:39, Cornelia Huck wrote:
[...]

>> +	 */
>> +	return 0;
> 
> Given that this function now always returns 0, we basically get a
> completely useless roundtrip into the kernel when userspace is trying
> to set up the mappings.
> 
> Can we define a new IO_ADAPTER_MAPPING_NOT_NEEDED or so capability that
> userspace can check?

Nack. This is one system call per initial indicator ccw. This happens so
seldom, and is so cheap, that I do not see a point in optimizing it.


> This change in behaviour probably wants a change in the documentation
> as well.

Yep. 
[...]

>> @@ -2951,12 +2900,15 @@ int kvm_set_routing_entry(struct kvm *kvm,
>>  			  const struct kvm_irq_routing_entry *ue)
>>  {
>>  	int ret;
>> +	u64 uaddr;
>>  
>>  	switch (ue->type) {
>>  	case KVM_IRQ_ROUTING_S390_ADAPTER:
>>  		e->set = set_adapter_int;
>> -		e->adapter.summary_addr = ue->u.adapter.summary_addr;
>> -		e->adapter.ind_addr = ue->u.adapter.ind_addr;
>> +		uaddr =  gmap_translate(kvm->arch.gmap, ue->u.adapter.summary_addr);
> 
> Can gmap_translate() return -EFAULT here? The code above only seems to
> check for 0... do we want to return an error here?

Yes.

> 
>> +		e->adapter.summary_addr = uaddr;
>> +		uaddr =  gmap_translate(kvm->arch.gmap, ue->u.adapter.ind_addr);
>> +		e->adapter.ind_addr = uaddr;
>>  		e->adapter.summary_offset = ue->u.adapter.summary_offset;
>>  		e->adapter.ind_offset = ue->u.adapter.ind_offset;
>>  		e->adapter.adapter_id = ue->u.adapter.adapter_id;
> 


* Re: [PATCH v2 RFC] KVM: s390/interrupt: do not pin adapter interrupt pages
  2020-02-12 12:22             ` Christian Borntraeger
@ 2020-02-12 12:47               ` David Hildenbrand
  0 siblings, 0 replies; 147+ messages in thread
From: David Hildenbrand @ 2020-02-12 12:47 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Ulrich.Weigand, aarcange, akpm, cohuck, frankja, gor, imbrenda,
	kvm, linux-mm, linux-s390, mimu, thuth, dgilbert

On 12.02.20 13:22, Christian Borntraeger wrote:
> 
> 
> On 12.02.20 13:16, David Hildenbrand wrote:
>>
>>> +	/*
>>> +	 * We resolve the gpa to hva when setting the IRQ routing. If userspace
>>> +	 * decides to mess with the memslots it better also updates the irq
>>> +	 * routing. Otherwise we will write to the wrong userspace address.
>>> +	 */
>>
>> I guess this is just like the old handling, where a page was pinned. But
>> slightly better :) So the pages are definitely part of guest memory.
>>
>> Fun stuff: If (a nasty) guest (in current code) zaps this page using
>> balloon inflation and the page is re-accessed (e.g., by the guest or by
>> the host), a new page will be faulted in, and there will be an
>> inconsistency between what the guest/user space sees and what this code
>> sees. Going via the user space address looks cleaner.
>>
>> Now, with postcopy live migration, we will also zap all guest memory
>> before starting the guest, I do wonder if that produces a similar
>> inconsistency ... usually, when pages are pinned in the kernel, we
>> inhibit the balloon and implicitly also postcopy.
>>
>> If so, this actually fixes an issue. But might depend on the order
>> things are initialized in user space. Or I am messing up things :)
> 
> Yes, the current code has some corner cases where a guest can shoot itself
> in the foot. This variant could actually be safer.

At least with postcopy it would be a silent migration issue, not guest
triggered. But I am not sure if it can trigger.

Anyhow, this is safer :)

-- 
Thanks,

David / dhildenb


* Re: [PATCH v2 RFC] KVM: s390/interrupt: do not pin adapter interrupt pages
  2020-02-12 12:44             ` Christian Borntraeger
@ 2020-02-12 13:07               ` Cornelia Huck
  0 siblings, 0 replies; 147+ messages in thread
From: Cornelia Huck @ 2020-02-12 13:07 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: david, Ulrich.Weigand, aarcange, akpm, frankja, gor, imbrenda,
	kvm, linux-mm, linux-s390, mimu, thuth

On Wed, 12 Feb 2020 13:44:53 +0100
Christian Borntraeger <borntraeger@de.ibm.com> wrote:

> On 12.02.20 13:39, Cornelia Huck wrote:
> [...]
> 
> >> +	 */
> >> +	return 0;  
> > 
> > Given that this function now always returns 0, we basically get a
> > completely useless roundtrip into the kernel when userspace is trying
> > to set up the mappings.
> > 
> > Can we define a new IO_ADAPTER_MAPPING_NOT_NEEDED or so capability that
> > userspace can check?  
> 
> Nack. This is one system call per initial indicator ccw. This happens so
> seldom, and is so cheap, that I do not see a point in optimizing it.

NB that zpci also calls this. Probably a rare event there as well.

> 
> 
> > This change in behaviour probably wants a change in the documentation
> > as well.  
> 
> Yep. 


* Re: [PATCH 05/35] s390/mm: provide memory management functions for protected KVM guests
  2020-02-07 11:39 ` [PATCH 05/35] s390/mm: provide memory management functions for protected KVM guests Christian Borntraeger
@ 2020-02-12 13:42   ` Cornelia Huck
  2020-02-13  7:43     ` Christian Borntraeger
  2020-02-14 17:59   ` David Hildenbrand
  1 sibling, 1 reply; 147+ messages in thread
From: Cornelia Huck @ 2020-02-12 13:42 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Janosch Frank, KVM, David Hildenbrand, Thomas Huth,
	Ulrich Weigand, Claudio Imbrenda, Andrea Arcangeli, linux-s390,
	Michael Mueller, Vasily Gorbik, linux-mm, Andrew Morton

On Fri,  7 Feb 2020 06:39:28 -0500
Christian Borntraeger <borntraeger@de.ibm.com> wrote:

> From: Claudio Imbrenda <imbrenda@linux.ibm.com>
> 
> This provides the basic ultravisor calls and page table handling to cope
> with secure guests:
> - provide arch_make_page_accessible
> - make pages accessible after unmapping of secure guests
> - provide the ultravisor commands convert to/from secure
> - provide the ultravisor commands pin/unpin shared
> - provide callbacks to make pages secure (inaccessible)
>  - we check for the expected pin count to only make pages secure if the
>    host is not accessing them
>  - we fence hugetlbfs for secure pages
> 
> Co-developed-by: Ulrich Weigand <Ulrich.Weigand@de.ibm.com>
> Signed-off-by: Ulrich Weigand <Ulrich.Weigand@de.ibm.com>
> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
> [borntraeger@de.ibm.com: patch merging, splitting, fixing]
> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
> ---
>  arch/s390/include/asm/gmap.h        |   2 +
>  arch/s390/include/asm/mmu.h         |   2 +
>  arch/s390/include/asm/mmu_context.h |   1 +
>  arch/s390/include/asm/page.h        |   5 +
>  arch/s390/include/asm/pgtable.h     |  34 +++++-
>  arch/s390/include/asm/uv.h          |  52 +++++++++
>  arch/s390/kernel/uv.c               | 172 ++++++++++++++++++++++++++++
>  7 files changed, 263 insertions(+), 5 deletions(-)

(...)

> +/*
> + * Requests the Ultravisor to encrypt a guest page and make it
> + * accessible to the host for paging (export).
> + *
> + * @paddr: Absolute host address of page to be exported
> + */
> +int uv_convert_from_secure(unsigned long paddr)
> +{
> +	struct uv_cb_cfs uvcb = {
> +		.header.cmd = UVC_CMD_CONV_FROM_SEC_STOR,
> +		.header.len = sizeof(uvcb),
> +		.paddr = paddr
> +	};
> +
> +	uv_call(0, (u64)&uvcb);
> +
> +	if (uvcb.header.rc == 1 || uvcb.header.rc == 0x107)

I think this either wants a comment or some speaking #defines.

> +		return 0;
> +	return -EINVAL;
> +}

(...)


* Re: [PATCH 35/35] DOCUMENTATION: Protected virtual machine introduction and IPL
  2020-02-12 11:01   ` Cornelia Huck
@ 2020-02-12 16:36     ` Christian Borntraeger
  0 siblings, 0 replies; 147+ messages in thread
From: Christian Borntraeger @ 2020-02-12 16:36 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Janosch Frank, KVM, David Hildenbrand, Thomas Huth,
	Ulrich Weigand, Claudio Imbrenda, Andrea Arcangeli, linux-s390,
	Michael Mueller, Vasily Gorbik, Janosch Frank



On 12.02.20 12:01, Cornelia Huck wrote:
> On Fri,  7 Feb 2020 06:39:58 -0500
> Christian Borntraeger <borntraeger@de.ibm.com> wrote:
> 
>> From: Janosch Frank <frankja@linux.ibm.com>
>>
>> Add documentation about protected KVM guests and description of changes
>> that are necessary to move a KVM VM into Protected Virtualization mode.
>>
>> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
>> [borntraeger@de.ibm.com: fixing and conversion to rst]
>> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
>> ---
>>  Documentation/virt/kvm/index.rst        |   2 +
>>  Documentation/virt/kvm/s390-pv-boot.rst |  79 ++++++++++++++++
>>  Documentation/virt/kvm/s390-pv.rst      | 116 ++++++++++++++++++++++++
>>  MAINTAINERS                             |   1 +
>>  4 files changed, 198 insertions(+)
>>  create mode 100644 Documentation/virt/kvm/s390-pv-boot.rst
>>  create mode 100644 Documentation/virt/kvm/s390-pv.rst
>>
> (...)
>> diff --git a/Documentation/virt/kvm/s390-pv-boot.rst b/Documentation/virt/kvm/s390-pv-boot.rst
>> new file mode 100644
>> index 000000000000..47814e53369a
>> --- /dev/null
>> +++ b/Documentation/virt/kvm/s390-pv-boot.rst
>> @@ -0,0 +1,79 @@
>> +.. SPDX-License-Identifier: GPL-2.0
>> +
>> +======================================
>> +s390 (IBM Z) Boot/IPL of Protected VMs
>> +======================================
>> +
>> +Summary
>> +-------
>> +Protected Virtual Machines (PVM) are not accessible by I/O or the
>> +hypervisor.  When the hypervisor wants to access the memory of PVMs
>> +the memory needs to be made accessible. When doing so, the memory will
>> +be encrypted.  See :doc:`s390-pv` for details.
> 
> Maybe
> 
> "The memory of Protected Virtual Machines (PVMs) is not accessible to
> I/O or the hypervisor. In those cases where the hypervisor needs to
> access the memory of a PVM, that memory must be made accessible. Memory
> made accessible to the hypervisor will be encrypted. See :doc:`s390-pv`
> for details."

looks good.

> 
> ?
> 
>> +
>> +On IPL a small plaintext bootloader is started which provides
> 
> "On IPL (boot), a small plaintext bootloader is started, which..."

ok


> 
> ?
> 
>> +information about the encrypted components and necessary metadata to
>> +KVM to decrypt the protected virtual machine.
> 
> (...)
> 
>> +Diag308
>> +-------
>> +This diagnose instruction is the basis for VM IPL. The VM can set and
> 
> "This diagnose instruction is the basic mechanism to handle IPL and
> related operations for virtual machines." ?


ok


> 
>> +retrieve IPL information blocks, that specify the IPL method/devices
>> +and request VM memory and subsystem resets, as well as IPLs.
>> +
>> +For PVs this concept has been extended with new subcodes:
> 
> s/For PVs/For PVMs,/

ok
> 
> (...)
> 
>> +When running in protected mode some subcodes will result in exceptions
> 
> s/When running in protected mode/When running in protected virtualization mode,/
> 
ok

> ?
> 
>> +or return error codes.
>> +
>> +Subcodes 4 and 7 will result in specification exceptions as they would
>> +not clear out the guest memory.
>> +When removing a secure VM, the UV will clear all memory, so we can't
>> +have non-clearing IPL subcodes.
> 
> "Subcodes 4 and 7, which specify operations that do not clear the guest
> memory, will result in specification exceptions. This is because the UV
> will clear all memory when a secure VM is removed, and therefore
> non-clearing IPL subcodes are not allowed."

ok


> 
> ?
> 
> (...)
>> diff --git a/Documentation/virt/kvm/s390-pv.rst b/Documentation/virt/kvm/s390-pv.rst
>> new file mode 100644
>> index 000000000000..dbe9110dfd1e
>> --- /dev/null
>> +++ b/Documentation/virt/kvm/s390-pv.rst
>> @@ -0,0 +1,116 @@
>> +.. SPDX-License-Identifier: GPL-2.0
>> +
>> +=========================================
>> +s390 (IBM Z) Ultravisor and Protected VMs
>> +=========================================
>> +
>> +Summary
>> +-------
>> +Protected virtual machines (PVM) are KVM VMs, where KVM can't access
>> +the VM's state like guest memory and guest registers anymore. Instead,
> 
> "...are KVM VMs that do not allow KVM to access VM state like guest
> memory or guest registers."
> 
> ?
> 
> (...)
> 
>> +The Interception Parameters state description field still contains
>> +the bytes of the instruction text, but with pre-set register values
>> +instead of the actual ones. I.e. each instruction always uses the same
>> +instruction text, in order not to leak guest instruction text.
>> +This also implies that the register content that a guest had in r<n>
>> +may be in r<m> from the hypervisors point of view.
> 
> s/hypervisors/hypervisor's/

ack.

> 
>> +
>> +The Secure Instruction Data Area contains instruction storage
>> +data. Instruction data, i.e. data being referenced by an instruction
>> +like the SCCB for sclp, is moved over the SIDA. When an instruction is
> 
> s/over/via/ ?

ack
> 
>> +intercepted, the SIE will only allow data and program interrupts for
>> +this instruction to be moved to the guest via the two data areas
>> +discussed before. Other data is either ignored or results in validity
>> +interceptions.
> 
> (...)
> 


* Re: [PATCH 05/35] s390/mm: provide memory management functions for protected KVM guests
  2020-02-12 13:42   ` Cornelia Huck
@ 2020-02-13  7:43     ` Christian Borntraeger
  2020-02-13  8:44       ` Cornelia Huck
  0 siblings, 1 reply; 147+ messages in thread
From: Christian Borntraeger @ 2020-02-13  7:43 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Janosch Frank, KVM, David Hildenbrand, Thomas Huth,
	Ulrich Weigand, Claudio Imbrenda, Andrea Arcangeli, linux-s390,
	Michael Mueller, Vasily Gorbik, linux-mm, Andrew Morton



On 12.02.20 14:42, Cornelia Huck wrote:
> On Fri,  7 Feb 2020 06:39:28 -0500
> Christian Borntraeger <borntraeger@de.ibm.com> wrote:
> 
>> From: Claudio Imbrenda <imbrenda@linux.ibm.com>
>>
>> This provides the basic ultravisor calls and page table handling to cope
>> with secure guests:
>> - provide arch_make_page_accessible
>> - make pages accessible after unmapping of secure guests
>> - provide the ultravisor commands convert to/from secure
>> - provide the ultravisor commands pin/unpin shared
>> - provide callbacks to make pages secure (inaccessible)
>>  - we check for the expected pin count to only make pages secure if the
>>    host is not accessing them
>>  - we fence hugetlbfs for secure pages
>>
>> Co-developed-by: Ulrich Weigand <Ulrich.Weigand@de.ibm.com>
>> Signed-off-by: Ulrich Weigand <Ulrich.Weigand@de.ibm.com>
>> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
>> [borntraeger@de.ibm.com: patch merging, splitting, fixing]
>> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
>> ---
>>  arch/s390/include/asm/gmap.h        |   2 +
>>  arch/s390/include/asm/mmu.h         |   2 +
>>  arch/s390/include/asm/mmu_context.h |   1 +
>>  arch/s390/include/asm/page.h        |   5 +
>>  arch/s390/include/asm/pgtable.h     |  34 +++++-
>>  arch/s390/include/asm/uv.h          |  52 +++++++++
>>  arch/s390/kernel/uv.c               | 172 ++++++++++++++++++++++++++++
>>  7 files changed, 263 insertions(+), 5 deletions(-)
> 
> (...)
> 
>> +/*
>> + * Requests the Ultravisor to encrypt a guest page and make it
>> + * accessible to the host for paging (export).
>> + *
>> + * @paddr: Absolute host address of page to be exported
>> + */
>> +int uv_convert_from_secure(unsigned long paddr)
>> +{
>> +	struct uv_cb_cfs uvcb = {
>> +		.header.cmd = UVC_CMD_CONV_FROM_SEC_STOR,
>> +		.header.len = sizeof(uvcb),
>> +		.paddr = paddr
>> +	};
>> +
>> +	uv_call(0, (u64)&uvcb);
>> +
>> +	if (uvcb.header.rc == 1 || uvcb.header.rc == 0x107)
> 
> I think this either wants a comment or some speaking #defines.

Yes. We will improve some other aspects of this patch, but I will add

	/* Return on success or if this page was already exported */
> 
>> +		return 0;
>> +	return -EINVAL;
>> +}
> 
> (...)
> 


* Re: [PATCH 20/35] KVM: s390: protvirt: handle secure guest prefix pages
  2020-02-07 11:39 ` [PATCH 20/35] KVM: s390: protvirt: handle secure guest prefix pages Christian Borntraeger
@ 2020-02-13  8:37   ` Christian Borntraeger
  0 siblings, 0 replies; 147+ messages in thread
From: Christian Borntraeger @ 2020-02-13  8:37 UTC (permalink / raw)
  To: Janosch Frank
  Cc: KVM, Cornelia Huck, David Hildenbrand, Thomas Huth,
	Ulrich Weigand, Claudio Imbrenda, Andrea Arcangeli, linux-s390,
	Michael Mueller, Vasily Gorbik, Janosch Frank



On 07.02.20 12:39, Christian Borntraeger wrote:
> From: Janosch Frank <frankja@linux.ibm.com>
> 
> The SPX instruction is handled by the ultravisor. We do get a
> notification intercept, though. Let us update our internal view.
> 
> In addition to that, when the guest prefix page is not secure, an
> intercept 112 (0x70) is indicated.  To avoid this for the most common
> cases, we can make the guest prefix page protected whenever we pin it.
> We have to deal with 112 nevertheless, e.g. when some host code triggers
> an export (e.g. qemu dump guest memory). We can simply re-run the
> pinning logic by doing a no-op prefix change.
> 
> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
> [borntraeger@de.ibm.com: patch merging, splitting, fixing]
> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
> ---
>  arch/s390/include/asm/kvm_host.h |  1 +
>  arch/s390/kvm/intercept.c        | 16 ++++++++++++++++
>  arch/s390/kvm/kvm-s390.c         | 14 ++++++++++++++
>  3 files changed, 31 insertions(+)
> 
> diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
> index 05949ff75a1e..0e3ffad4137f 100644
> --- a/arch/s390/include/asm/kvm_host.h
> +++ b/arch/s390/include/asm/kvm_host.h
> @@ -225,6 +225,7 @@ struct kvm_s390_sie_block {
>  #define ICPT_INT_ENABLE	0x64
>  #define ICPT_PV_INSTR	0x68
>  #define ICPT_PV_NOTIFY	0x6c
> +#define ICPT_PV_PREF	0x70
>  	__u8	icptcode;		/* 0x0050 */
>  	__u8	icptstatus;		/* 0x0051 */
>  	__u16	ihcpu;			/* 0x0052 */
> diff --git a/arch/s390/kvm/intercept.c b/arch/s390/kvm/intercept.c
> index db3dd5ee0b7a..2a966dc52611 100644
> --- a/arch/s390/kvm/intercept.c
> +++ b/arch/s390/kvm/intercept.c
> @@ -451,6 +451,15 @@ static int handle_operexc(struct kvm_vcpu *vcpu)
>  	return kvm_s390_inject_program_int(vcpu, PGM_OPERATION);
>  }
>  
> +static int handle_pv_spx(struct kvm_vcpu *vcpu)
> +{
> +	u32 pref = *(u32 *)vcpu->arch.sie_block->sidad;
> +
> +	kvm_s390_set_prefix(vcpu, pref);
> +	trace_kvm_s390_handle_prefix(vcpu, 1, pref);
> +	return 0;
> +}
> +
>  static int handle_pv_sclp(struct kvm_vcpu *vcpu)
>  {
>  	struct kvm_s390_float_interrupt *fi = &vcpu->kvm->arch.float_int;
> @@ -477,6 +486,8 @@ static int handle_pv_sclp(struct kvm_vcpu *vcpu)
>  
>  static int handle_pv_notification(struct kvm_vcpu *vcpu)
>  {
> +	if (vcpu->arch.sie_block->ipa == 0xb210)
> +		return handle_pv_spx(vcpu);
>  	if (vcpu->arch.sie_block->ipa == 0xb220)
>  		return handle_pv_sclp(vcpu);
>  
> @@ -534,6 +545,11 @@ int kvm_handle_sie_intercept(struct kvm_vcpu *vcpu)
>  	case ICPT_PV_NOTIFY:
>  		rc = handle_pv_notification(vcpu);
>  		break;
> +	case ICPT_PV_PREF:
> +		rc = 0;
> +		/* request to convert and pin the prefix pages again */
> +		kvm_make_request(KVM_REQ_MMU_RELOAD, vcpu);
> +		break;
>  	default:
>  		return -EOPNOTSUPP;
>  	}
> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> index 1797490e3e77..63d158149936 100644
> --- a/arch/s390/kvm/kvm-s390.c
> +++ b/arch/s390/kvm/kvm-s390.c
> @@ -3678,6 +3678,20 @@ static int kvm_s390_handle_requests(struct kvm_vcpu *vcpu)
>  		rc = gmap_mprotect_notify(vcpu->arch.gmap,
>  					  kvm_s390_get_prefix(vcpu),
>  					  PAGE_SIZE * 2, PROT_WRITE);
> +		if (!rc && kvm_s390_pv_is_protected(vcpu->kvm)) {
> +			do {
> +				rc = uv_convert_to_secure(
> +						vcpu->arch.gmap,
> +						kvm_s390_get_prefix(vcpu));
> +			} while (rc == -EAGAIN);
> +			WARN_ONCE(rc, "Error while importing first prefix page. rc %d", rc);
> +			do {
> +				rc = uv_convert_to_secure(
> +						vcpu->arch.gmap,
> +						kvm_s390_get_prefix(vcpu) + PAGE_SIZE);
> +			} while (rc == -EAGAIN);
> +			WARN_ONCE(rc, "Error while importing second prefix page. rc %d", rc);

I think it might be better to move this hunk into the ICPT_PV_PREF handler.
Then we can clean up the convert-to-secure handling a bit.

> +		}
>  		if (rc) {
>  			kvm_make_request(KVM_REQ_MMU_RELOAD, vcpu);
>  			return rc;
> 
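
For context, the retry pattern in the hunk above can be modeled in plain C. Here uv_convert_to_secure() is replaced by a stub that transiently fails with -EAGAIN before succeeding; the stub and its "3 attempts" behaviour are purely illustrative assumptions, not the real Ultravisor call:

```c
#include <errno.h>

/* Stub standing in for the real UV call: transiently busy, then succeeds.
 * The "fail twice, then succeed" behaviour is illustrative only. */
static int attempts;
static int uv_convert_to_secure_stub(unsigned long gaddr)
{
	(void)gaddr;
	return (++attempts < 3) ? -EAGAIN : 0;
}

/* Mirrors the do/while pattern above: retry while the conversion is
 * transiently blocked (e.g. the page is still under writeback). */
static int import_prefix_page(unsigned long gaddr)
{
	int rc;

	do {
		rc = uv_convert_to_secure_stub(gaddr);
	} while (rc == -EAGAIN);
	return rc;
}
```

The prefix area (lowcore) spans two pages, which is why the hunk imports the prefix address and prefix + PAGE_SIZE separately.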

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH/RFC] KVM: s390: protvirt: pass-through rc and rrc
  2020-02-11  8:48               ` Janosch Frank
@ 2020-02-13  8:43                 ` Christian Borntraeger
  0 siblings, 0 replies; 147+ messages in thread
From: Christian Borntraeger @ 2020-02-13  8:43 UTC (permalink / raw)
  To: Janosch Frank, Cornelia Huck
  Cc: thuth, Ulrich.Weigand, david, imbrenda, kvm, linux-s390, mimu

On 11.02.20 09:48, Janosch Frank wrote:
> On 2/10/20 1:56 PM, Christian Borntraeger wrote:
>>
>>
>> On 10.02.20 13:50, Cornelia Huck wrote:
>>> On Mon, 10 Feb 2020 13:06:19 +0100
>>> Christian Borntraeger <borntraeger@de.ibm.com> wrote:
>>>
>>>> What about the following. I will rip out RC and RRC but add 
>>>> a 32bit flags field (which must be 0) and 3*64 bit reserved.
>>>
>>> Probably dumb question: How are these new fields supposed to be used?
>>
>> This was planned for error handling in QEMU. As we have no user of rc/rrc
>> yet, I have ripped that out and added a flags field + 16 bytes of reserved.
>> Usage is as usual: flags must be 0. When flags != 0, the reserved fields will
>> take on a new meaning.
>>
> 
> I want to have the rcs because right now we would only output the return
> value of the ioctl and most UV error codes are mapped to -EINVAL. So if
> an error occurs, admins would need to match up the crashed VM with the
> UV debugfs files which might not even exist if debugfs is not mounted...
> 
> That's also one of the reasons I like having separate create calls for
> VM and VCPUs.

Janosch convinced me that we need rc and rrc for some calls. For example
the set secure configuration parameter passes along the header from the
guest image to the hardware. There are several different errors possible,
for example a wrong key or a wrong list of requirements. Userspace needs
to know which error occurred in order to provide proper error messages.

I will try to build something as clean as possible.
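
A minimal sketch of the layout discussed above: rc/rrc for error pass-through, a flags field that must be zero, and reserved space whose meaning can be defined later. The struct name, field names, and exact sizes are assumptions drawn from this thread, not the final kernel ABI:

```c
#include <stdint.h>

/* Hypothetical response layout combining the two ideas in this thread:
 * rc/rrc pass-through for error reporting, plus flags + reserved bytes
 * for future extensibility. Illustrative only. */
struct pv_resp_sketch {
	uint16_t rc;           /* UV return code */
	uint16_t rrc;          /* UV return reason code */
	uint32_t flags;        /* must be 0 for now */
	uint64_t reserved[2];  /* 16 bytes; meaning defined when flags != 0 */
};
```

With this layout the struct packs to 24 bytes with no padding, so it can be extended compatibly by giving the reserved bytes meaning behind a flag bit.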

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH 05/35] s390/mm: provide memory management functions for protected KVM guests
  2020-02-13  7:43     ` Christian Borntraeger
@ 2020-02-13  8:44       ` Cornelia Huck
  0 siblings, 0 replies; 147+ messages in thread
From: Cornelia Huck @ 2020-02-13  8:44 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Janosch Frank, KVM, David Hildenbrand, Thomas Huth,
	Ulrich Weigand, Claudio Imbrenda, Andrea Arcangeli, linux-s390,
	Michael Mueller, Vasily Gorbik, linux-mm, Andrew Morton

On Thu, 13 Feb 2020 08:43:33 +0100
Christian Borntraeger <borntraeger@de.ibm.com> wrote:

> On 12.02.20 14:42, Cornelia Huck wrote:
> > On Fri,  7 Feb 2020 06:39:28 -0500
> > Christian Borntraeger <borntraeger@de.ibm.com> wrote:
> >   
> >> From: Claudio Imbrenda <imbrenda@linux.ibm.com>
> >>
> >> This provides the basic ultravisor calls and page table handling to cope
> >> with secure guests:
> >> - provide arch_make_page_accessible
> >> - make pages accessible after unmapping of secure guests
> >> - provide the ultravisor commands convert to/from secure
> >> - provide the ultravisor commands pin/unpin shared
> >> - provide callbacks to make pages secure (inaccessible)
> >>  - we check for the expected pin count to only make pages secure if the
> >>    host is not accessing them
> >>  - we fence hugetlbfs for secure pages
> >>
> >> Co-developed-by: Ulrich Weigand <Ulrich.Weigand@de.ibm.com>
> >> Signed-off-by: Ulrich Weigand <Ulrich.Weigand@de.ibm.com>
> >> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
> >> [borntraeger@de.ibm.com: patch merging, splitting, fixing]
> >> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
> >> ---
> >>  arch/s390/include/asm/gmap.h        |   2 +
> >>  arch/s390/include/asm/mmu.h         |   2 +
> >>  arch/s390/include/asm/mmu_context.h |   1 +
> >>  arch/s390/include/asm/page.h        |   5 +
> >>  arch/s390/include/asm/pgtable.h     |  34 +++++-
> >>  arch/s390/include/asm/uv.h          |  52 +++++++++
> >>  arch/s390/kernel/uv.c               | 172 ++++++++++++++++++++++++++++
> >>  7 files changed, 263 insertions(+), 5 deletions(-)  
> > 
> > (...)
> >   
> >> +/*
> >> + * Requests the Ultravisor to encrypt a guest page and make it
> >> + * accessible to the host for paging (export).
> >> + *
> >> + * @paddr: Absolute host address of page to be exported
> >> + */
> >> +int uv_convert_from_secure(unsigned long paddr)
> >> +{
> >> +	struct uv_cb_cfs uvcb = {
> >> +		.header.cmd = UVC_CMD_CONV_FROM_SEC_STOR,
> >> +		.header.len = sizeof(uvcb),
> >> +		.paddr = paddr
> >> +	};
> >> +
> >> +	uv_call(0, (u64)&uvcb);
> >> +
> >> +	if (uvcb.header.rc == 1 || uvcb.header.rc == 0x107)  
> > 
> > I think this either wants a comment or some speaking #defines.  
> 
> Yes. We will improve some other aspects of this patch, but I will add
> 
> 	/* Return on success or if this page was already exported */

Sounds good.
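
As a sketch, the suggested speaking #defines could look like the following. 0x0001 follows the UVC_RC_EXECUTED convention used elsewhere in the series; the name chosen here for 0x0107 is invented for illustration:

```c
#include <errno.h>

#define UVC_RC_EXECUTED       0x0001  /* call succeeded */
#define UVC_RC_PAGE_EXPORTED  0x0107  /* assumed name: page already exported */

/* Mirrors the rc check in uv_convert_from_secure() quoted above. */
static int cfs_rc_to_errno(unsigned short rc)
{
	/* Return success, or if this page was already exported earlier. */
	if (rc == UVC_RC_EXECUTED || rc == UVC_RC_PAGE_EXPORTED)
		return 0;
	return -EINVAL;
}
```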

> >   
> >> +		return 0;
> >> +	return -EINVAL;
> >> +}  
> > 
> > (...)
> >   
> 

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH 01/35] mm:gup/writeback: add callbacks for inaccessible pages
  2020-02-11 11:26     ` Will Deacon
@ 2020-02-13 14:48         ` Christian Borntraeger
  2020-02-13 14:48         ` Christian Borntraeger
  1 sibling, 0 replies; 147+ messages in thread
From: Christian Borntraeger @ 2020-02-13 14:48 UTC (permalink / raw)
  To: Will Deacon
  Cc: Janosch Frank, Andrew Morton, Marc Zyngier, Sean Christopherson,
	Tom Lendacky, KVM, Cornelia Huck, David Hildenbrand, Thomas Huth,
	Ulrich Weigand, Claudio Imbrenda, Andrea Arcangeli, linux-s390,
	Michael Mueller, Vasily Gorbik, linux-mm, kvm-ppc, Paolo Bonzini,
	mark.rutland, qperret, palmerdabbelt



On 11.02.20 12:26, Will Deacon wrote:
> On Mon, Feb 10, 2020 at 06:27:04PM +0100, Christian Borntraeger wrote:
>> CC Marc Zyngier for KVM on ARM.  Marc, see below. Will there be any
>> use for this on KVM/ARM in the future?
> 
> I can't speak for Marc, but I can say that we're interested in something
> like this for potentially isolating VMs from a KVM host in Android.
> However, we've currently been working on the assumption that the memory
> removed from the host won't usually be touched by the host (i.e. no
> KSM or swapping out), so all we'd probably want at the moment is to be
> able to return an error back from arch_make_page_accessible(). Its return
> code is ignored in this patch :/

I think there are two ways at the moment. One is to keep the memory away from
Linux, e.g. by using the memory as device driver memory like kmalloc. This is
kind of what Power does, and I understand that you want to follow that model
and do not want to use paging, file backing and the like.
Our approach tries to fully integrate into the existing Linux LRU methods.

Back to your approach: what happens when a malicious QEMU starts direct I/O
on such isolated memory? Is that what you meant by adding error checking in
these hooks? For the gup.c code, returning an error seems straightforward.

I have no idea what to do in writeback. When somebody manages to trigger
writeback on such a page, it already seems too late. 

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH 01/35] mm:gup/writeback: add callbacks for inaccessible pages
  2020-02-10 17:27     ` Christian Borntraeger
@ 2020-02-13 19:56       ` Sean Christopherson
  -1 siblings, 0 replies; 147+ messages in thread
From: Sean Christopherson @ 2020-02-13 19:56 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Janosch Frank, Andrew Morton, Marc Zyngier, Tom Lendacky, KVM,
	Cornelia Huck, David Hildenbrand, Thomas Huth, Ulrich Weigand,
	Claudio Imbrenda, Andrea Arcangeli, linux-s390, Michael Mueller,
	Vasily Gorbik, linux-mm, kvm-ppc, Paolo Bonzini

On Mon, Feb 10, 2020 at 06:27:04PM +0100, Christian Borntraeger wrote:
> CC Marc Zyngier for KVM on ARM.  Marc, see below. Will there be any
> use for this on KVM/ARM in the future?
> 
> CC Sean Christopherson/Tom Lendacky. Any obvious use case for Intel/AMD
> to have a callback before a page is used for I/O?

Yes?

> Andrew (or other mm people) any chance to get an ACK for this change?
> I could then carry that via s390 or KVM tree. Or if you want to carry
> that yourself I can send an updated version (we need to kind of 
> synchronize that Linus will pull the KVM changes after the mm changes).
> 
> Andrea asked if others would benefit from this, so here are some more
> information about this (and I can also put this into the patch
> description).  So we have talked to the POWER folks. They do not use
> the standard normal memory management; instead they have a hard split
> between secure and normal memory. The secure memory is then handled by
> the hypervisor as device memory, and the ultravisor and the hypervisor
> move it back and forth when needed.
> 
> On s390 there is no *separate* pool of physical pages that are secure.
> Instead, *any* physical page can be marked as secure or not, by
> setting a bit in a per-page data structure that hardware uses to stop
> unauthorized access.  (That bit is under control of the ultravisor.)
> 
> Note that one side effect of this strategy is that the decision
> *which* secure pages to encrypt and then swap out is actually done by
> the hypervisor, not the ultravisor.  In our case, the hypervisor is
> Linux/KVM, so we're using the regular Linux memory management scheme
> (active/inactive LRU lists etc.) to make this decision.  The advantage
> is that the Ultravisor code does not need to itself implement any
> memory management code, making it a lot simpler.

Disclaimer: I'm not familiar with s390 guest page faults or UV.  I tried
to give myself a crash course, apologies if I'm way out in left field...

AIUI, pages will first be added to a secure guest by converting a normal,
non-secure page to secure and stuffing it into the guest page tables.  To
swap a page from a secure guest, arch_make_page_accessible() will be called
to encrypt the page in place so that it can be accessed by the untrusted
kernel/VMM and written out to disk.  And to fault the page back in, on s390
a secure guest access to a non-secure page will generate a page fault with
a dedicated type.  That fault routes directly to
do_non_secure_storage_access(), which converts the page to secure and thus
makes it re-accessible to the guest.
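
The lifecycle summarized above can be written down as a tiny state model. This is purely illustrative; the function names are not the kernel's, and the real transitions are Ultravisor calls:

```c
/* Toy model of the s390 PV page lifecycle sketched above. */
enum pv_state {
	PV_NON_SECURE,  /* host-accessible; encrypted contents if exported */
	PV_SECURE       /* guest-only: host CPU access faults, host I/O fails */
};

/* Convert-to-secure: done by the fault handler when the guest touches
 * a non-secure page (models do_non_secure_storage_access()). */
static enum pv_state pv_fault_in(enum pv_state s)
{
	(void)s;
	return PV_SECURE;
}

/* Convert-from-secure ("export"): encrypt in place so the untrusted host
 * can swap the page out (models arch_make_page_accessible()). */
static enum pv_state pv_make_accessible(enum pv_state s)
{
	(void)s;
	return PV_NON_SECURE;
}

/* Host I/O is only safe on non-secure pages. */
static int host_io_ok(enum pv_state s)
{
	return s == PV_NON_SECURE;
}
```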

That all sounds sane and usable for Intel.

My big question is the follow/get flows, more on that below.

> However, in the end this is why we need the hook into Linux memory
> management: once Linux has decided to swap a page out, we need to get
> a chance to tell the Ultravisor to "export" the page (i.e., encrypt
> its contents and mark it no longer secure).
> 
> As outlined below this should be a no-op for anybody not opting in.
> 
> Christian                                   
> 
> On 07.02.20 12:39, Christian Borntraeger wrote:
> > From: Claudio Imbrenda <imbrenda@linux.ibm.com>
> > 
> > With the introduction of protected KVM guests on s390 there is now a
> > concept of inaccessible pages. These pages need to be made accessible
> > before the host can access them.
> > 
> > While cpu accesses will trigger a fault that can be resolved, I/O
> > accesses will just fail.  We need to add a callback into architecture
> > code for places that will do I/O, namely when writeback is started or
> > when a page reference is taken.
> > 
> > Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
> > Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
> > ---
> >  include/linux/gfp.h | 6 ++++++
> >  mm/gup.c            | 2 ++
> >  mm/page-writeback.c | 1 +
> >  3 files changed, 9 insertions(+)
> > 
> > diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> > index e5b817cb86e7..be2754841369 100644
> > --- a/include/linux/gfp.h
> > +++ b/include/linux/gfp.h
> > @@ -485,6 +485,12 @@ static inline void arch_free_page(struct page *page, int order) { }
> >  #ifndef HAVE_ARCH_ALLOC_PAGE
> >  static inline void arch_alloc_page(struct page *page, int order) { }
> >  #endif
> > +#ifndef HAVE_ARCH_MAKE_PAGE_ACCESSIBLE
> > +static inline int arch_make_page_accessible(struct page *page)
> > +{
> > +	return 0;
> > +}
> > +#endif
> >  
> >  struct page *
> >  __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
> > diff --git a/mm/gup.c b/mm/gup.c
> > index 7646bf993b25..a01262cd2821 100644
> > --- a/mm/gup.c
> > +++ b/mm/gup.c
> > @@ -257,6 +257,7 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
> >  			page = ERR_PTR(-ENOMEM);
> >  			goto out;
> >  		}
> > +		arch_make_page_accessible(page);

As Will pointed out, the return value definitely needs to be checked, there
will undoubtedly be scenarios where the page cannot be made accessible.

What is the use case for calling arch_make_page_accessible() in the follow()
and gup() paths?  Live migration is the only thing that comes to mind, and
for live migration I would expect you would want to keep the secure guest
running when copying pages to the target, i.e. use pre-copy.  That would
conflict with converting the page in place.  Rather, migration would use a
separate dedicated path to copy the encrypted contents of the secure page to
a completely different page, and send *that* across the wire so that the
guest can continue accessing the original page.

Am I missing a need to do this for the swap/reclaim case?  Or is there a
completely different use case I'm overlooking?

Tangentially related, hooks here could be quite useful for sanity checking
the kernel/KVM and/or debugging kernel/KVM bugs.  Would it make sense to
pass a param to arch_make_page_accessible() to provide some information as
to why the page needs to be made accessible?

> >  	}
> >  	if (flags & FOLL_TOUCH) {
> >  		if ((flags & FOLL_WRITE) &&
> > @@ -1870,6 +1871,7 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
> >  
> >  		VM_BUG_ON_PAGE(compound_head(page) != head, page);
> >  
> > +		arch_make_page_accessible(page);
> >  		SetPageReferenced(page);
> >  		pages[*nr] = page;
> >  		(*nr)++;
> > diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> > index 2caf780a42e7..0f0bd14571b1 100644
> > --- a/mm/page-writeback.c
> > +++ b/mm/page-writeback.c
> > @@ -2806,6 +2806,7 @@ int __test_set_page_writeback(struct page *page, bool keep_write)
> >  		inc_lruvec_page_state(page, NR_WRITEBACK);
> >  		inc_zone_page_state(page, NR_ZONE_WRITE_PENDING);
> >  	}
> > +	arch_make_page_accessible(page);
> >  	unlock_page_memcg(page);
> 
> As outlined by Ulrich, we can move the callback after the unlock.
> 
> >  	return ret;
> >  
> > 
> 

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH 01/35] mm:gup/writeback: add callbacks for inaccessible pages
  2020-02-13 19:56       ` Sean Christopherson
@ 2020-02-13 20:13         ` Christian Borntraeger
  -1 siblings, 0 replies; 147+ messages in thread
From: Christian Borntraeger @ 2020-02-13 20:13 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Janosch Frank, Andrew Morton, Marc Zyngier, Tom Lendacky, KVM,
	Cornelia Huck, David Hildenbrand, Thomas Huth, Ulrich Weigand,
	Claudio Imbrenda, Andrea Arcangeli, linux-s390, Michael Mueller,
	Vasily Gorbik, linux-mm, kvm-ppc, Paolo Bonzini



On 13.02.20 20:56, Sean Christopherson wrote:
> On Mon, Feb 10, 2020 at 06:27:04PM +0100, Christian Borntraeger wrote:
>> CC Marc Zyngier for KVM on ARM.  Marc, see below. Will there be any
>> use for this on KVM/ARM in the future?
>>
>> CC Sean Christopherson/Tom Lendacky. Any obvious use case for Intel/AMD
>> to have a callback before a page is used for I/O?
> 
> Yes?
> 
>> Andrew (or other mm people) any chance to get an ACK for this change?
>> I could then carry that via s390 or KVM tree. Or if you want to carry
>> that yourself I can send an updated version (we need to kind of 
>> synchronize that Linus will pull the KVM changes after the mm changes).
>>
>> Andrea asked if others would benefit from this, so here are some more
>> information about this (and I can also put this into the patch
>> description).  So we have talked to the POWER folks. They do not use
>> the standard normal memory management; instead they have a hard split
>> between secure and normal memory. The secure memory is then handled by
>> the hypervisor as device memory, and the ultravisor and the hypervisor
>> move it back and forth when needed.
>>
>> On s390 there is no *separate* pool of physical pages that are secure.
>> Instead, *any* physical page can be marked as secure or not, by
>> setting a bit in a per-page data structure that hardware uses to stop
>> unauthorized access.  (That bit is under control of the ultravisor.)
>>
>> Note that one side effect of this strategy is that the decision
>> *which* secure pages to encrypt and then swap out is actually done by
>> the hypervisor, not the ultravisor.  In our case, the hypervisor is
>> Linux/KVM, so we're using the regular Linux memory management scheme
>> (active/inactive LRU lists etc.) to make this decision.  The advantage
>> is that the Ultravisor code does not need to itself implement any
>> memory management code, making it a lot simpler.
> 
> Disclaimer: I'm not familiar with s390 guest page faults or UV.  I tried
> to give myself a crash course, apologies if I'm way out in left field...
> 
> AIUI, pages will first be added to a secure guest by converting a normal,
> non-secure page to secure and stuffing it into the guest page tables.  To
> swap a page from a secure guest, arch_make_page_accessible() will be called
> to encrypt the page in place so that it can be accessed by the untrusted
> kernel/VMM and written out to disk.  And to fault the page back in, on s390
> a secure guest access to a non-secure page will generate a page fault with
> a dedicated type.  That fault routes directly to
> do_non_secure_storage_access(), which converts the page to secure and thus
> makes it re-accessible to the guest.
> 
> That all sounds sane and usable for Intel.
> 
> My big question is the follow/get flows, more on that below.
> 
>> However, in the end this is why we need the hook into Linux memory
>> management: once Linux has decided to swap a page out, we need to get
>> a chance to tell the Ultravisor to "export" the page (i.e., encrypt
>> its contents and mark it no longer secure).
>>
>> As outlined below this should be a no-op for anybody not opting in.
>>
>> Christian                                   
>>
>> On 07.02.20 12:39, Christian Borntraeger wrote:
>>> From: Claudio Imbrenda <imbrenda@linux.ibm.com>
>>>
>>> With the introduction of protected KVM guests on s390 there is now a
>>> concept of inaccessible pages. These pages need to be made accessible
>>> before the host can access them.
>>>
>>> While cpu accesses will trigger a fault that can be resolved, I/O
>>> accesses will just fail.  We need to add a callback into architecture
>>> code for places that will do I/O, namely when writeback is started or
>>> when a page reference is taken.
>>>
>>> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
>>> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
>>> ---
>>>  include/linux/gfp.h | 6 ++++++
>>>  mm/gup.c            | 2 ++
>>>  mm/page-writeback.c | 1 +
>>>  3 files changed, 9 insertions(+)
>>>
>>> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
>>> index e5b817cb86e7..be2754841369 100644
>>> --- a/include/linux/gfp.h
>>> +++ b/include/linux/gfp.h
>>> @@ -485,6 +485,12 @@ static inline void arch_free_page(struct page *page, int order) { }
>>>  #ifndef HAVE_ARCH_ALLOC_PAGE
>>>  static inline void arch_alloc_page(struct page *page, int order) { }
>>>  #endif
>>> +#ifndef HAVE_ARCH_MAKE_PAGE_ACCESSIBLE
>>> +static inline int arch_make_page_accessible(struct page *page)
>>> +{
>>> +	return 0;
>>> +}
>>> +#endif
>>>  
>>>  struct page *
>>>  __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
>>> diff --git a/mm/gup.c b/mm/gup.c
>>> index 7646bf993b25..a01262cd2821 100644
>>> --- a/mm/gup.c
>>> +++ b/mm/gup.c
>>> @@ -257,6 +257,7 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
>>>  			page = ERR_PTR(-ENOMEM);
>>>  			goto out;
>>>  		}
>>> +		arch_make_page_accessible(page);
> 
> As Will pointed out, the return value definitely needs to be checked, there
> will undoubtedly be scenarios where the page cannot be made accessible.

Actually on s390 this should always succeed unless we have a bug.

But we can certainly provide a variant of that patch that does check the
return value: proper error handling for gup and a WARN_ON for page-writeback.
> 
> What is the use case for calling arch_make_page_accessible() in the follow()
> and gup() paths?  Live migration is the only thing that comes to mind, and
> for live migration I would expect you would want to keep the secure guest
> running when copying pages to the target, i.e. use pre-copy.  That would
> conflict with converting the page in place.  Rather, migration would use a
> separate dedicated path to copy the encrypted contents of the secure page to
> a completely different page, and send *that* across the wire so that the
> guest can continue accessing the original page.
> Am I missing a need to do this for the swap/reclaim case?  Or is there a
> completely different use case I'm overlooking?

This is actually to protect the host against a malicious user space. For
example, a bad QEMU could simply start direct I/O on such protected memory.
We do not want userspace to be able to trigger I/O errors, and thus we
implemented the logic "whenever somebody accesses that page (gup) or
starts I/O on it, make sure that this page can be accessed". When the guest
tries to access that page, we will wait in the page fault handler for
writeback to have finished and for the page_ref to be the expected value.
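The expected-refcount check described here can be sketched as follows (again a toy model; the field names and errno choices are illustrative, not the real struct page logic):

```c
#include <assert.h>
#include <errno.h>

/* Toy page state standing in for struct page. */
struct toy_page { int refcount; int writeback; int secure; };

/*
 * Sketch of the convert-to-secure side: the import into secure memory
 * only proceeds once writeback has finished and the reference count is
 * exactly the expected value, i.e. no gup user or in-flight I/O still
 * holds an extra reference that could touch the page while it is secure.
 */
static int make_secure_sketch(struct toy_page *page, int expected)
{
	if (page->writeback)
		return -EAGAIN;		/* fault handler waits and retries */
	if (page->refcount != expected)
		return -EBUSY;		/* extra reference: retry later */
	page->secure = 1;		/* ultravisor import call in real code */
	return 0;
}
```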



> 
> Tangentially related, hooks here could be quite useful for sanity checking
> the kernel/KVM and/or debugging kernel/KVM bugs.  Would it make sense to
> pass a param to arch_make_page_accessible() to provide some information as
> to why the page needs to be made accessible?

Some kind of enum that can be used optionally to optimize things?

> 
>>>  	}
>>>  	if (flags & FOLL_TOUCH) {
>>>  		if ((flags & FOLL_WRITE) &&
>>> @@ -1870,6 +1871,7 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
>>>  
>>>  		VM_BUG_ON_PAGE(compound_head(page) != head, page);
>>>  
>>> +		arch_make_page_accessible(page);
>>>  		SetPageReferenced(page);
>>>  		pages[*nr] = page;
>>>  		(*nr)++;
>>> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
>>> index 2caf780a42e7..0f0bd14571b1 100644
>>> --- a/mm/page-writeback.c
>>> +++ b/mm/page-writeback.c
>>> @@ -2806,6 +2806,7 @@ int __test_set_page_writeback(struct page *page, bool keep_write)
>>>  		inc_lruvec_page_state(page, NR_WRITEBACK);
>>>  		inc_zone_page_state(page, NR_ZONE_WRITE_PENDING);
>>>  	}
>>> +	arch_make_page_accessible(page);
>>>  	unlock_page_memcg(page);
>>
>> As outlined by Ulrich, we can move the callback after the unlock.
>>
>>>  	return ret;
>>>  
>>>
>>

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH 01/35] mm:gup/writeback: add callbacks for inaccessible pages
  2020-02-13 20:13         ` Christian Borntraeger
@ 2020-02-13 20:46           ` Sean Christopherson
  -1 siblings, 0 replies; 147+ messages in thread
From: Sean Christopherson @ 2020-02-13 20:46 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Janosch Frank, Andrew Morton, Marc Zyngier, Tom Lendacky, KVM,
	Cornelia Huck, David Hildenbrand, Thomas Huth, Ulrich Weigand,
	Claudio Imbrenda, Andrea Arcangeli, linux-s390, Michael Mueller,
	Vasily Gorbik, linux-mm, kvm-ppc, Paolo Bonzini

On Thu, Feb 13, 2020 at 09:13:35PM +0100, Christian Borntraeger wrote:
> 
> On 13.02.20 20:56, Sean Christopherson wrote:
> > On Mon, Feb 10, 2020 at 06:27:04PM +0100, Christian Borntraeger wrote:
> > Am I missing a need to do this for the swap/reclaim case?  Or is there a
> > completely different use case I'm overlooking?
> 
> This is actually to protect the host against a malicious user space. For 
> example a bad QEMU could simply start direct I/O on such protected memory.
> We do not want userspace to be able to trigger I/O errors and thus we
> implemented the logic to "whenever somebody accesses that page (gup) or
> doing I/O, make sure that this page can be accessed. When the guest tries
> to access that page we will wait in the page fault handler for writeback to
> have finished and for the page_ref to be the expected value.

Ah.  I was assuming the pages would be unmappable by userspace, enforced by
some other mechanism.

> > 
> > Tangentially related, hooks here could be quite useful for sanity checking
> > the kernel/KVM and/or debugging kernel/KVM bugs.  Would it make sense to
> > pass a param to arch_make_page_accessible() to provide some information as
> > to why the page needs to be made accessible?
> 
> Some kind of enum that can be used optionally to optimize things?

Not just optimize; in the case above it'd probably be preferable for us to
reject a userspace mapping outright, e.g. return -EFAULT if called from
gup()/follow().  Debug scenarios might also require differentiating between
writeback and "other".
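Such a reason-aware hook might look roughly like this (purely illustrative: the enum, its names, and the stub body are invented here and are not part of the patch):

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical reasons; neither the enum nor these names exist in the patch. */
enum make_accessible_reason {
	MAKE_ACCESSIBLE_WRITEBACK,	/* page is about to be written out */
	MAKE_ACCESSIBLE_GUP,		/* userspace takes a page reference */
};

/*
 * Reason-aware variant: writeback exports (encrypts) the secure page so
 * the I/O can proceed, while a userspace pin of a secure page is
 * rejected outright with -EFAULT.
 */
static int make_page_accessible_for(int page_is_secure,
				    enum make_accessible_reason why)
{
	if (!page_is_secure)
		return 0;
	switch (why) {
	case MAKE_ACCESSIBLE_WRITEBACK:
		return 0;		/* export the page, then allow I/O */
	case MAKE_ACCESSIBLE_GUP:
		return -EFAULT;		/* refuse to hand it to userspace */
	}
	return -EINVAL;
}
```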

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH 04/35] s390/protvirt: add ultravisor initialization
  2020-02-07 11:39 ` [PATCH 04/35] s390/protvirt: add ultravisor initialization Christian Borntraeger
@ 2020-02-14 10:25   ` David Hildenbrand
  2020-02-14 10:33     ` Christian Borntraeger
  0 siblings, 1 reply; 147+ messages in thread
From: David Hildenbrand @ 2020-02-14 10:25 UTC (permalink / raw)
  To: Christian Borntraeger, Janosch Frank
  Cc: KVM, Cornelia Huck, Thomas Huth, Ulrich Weigand,
	Claudio Imbrenda, Andrea Arcangeli, linux-s390, Michael Mueller,
	Vasily Gorbik

[...]

>  #if defined(CONFIG_PROTECTED_VIRTUALIZATION_GUEST) ||                          \
> diff --git a/arch/s390/kernel/setup.c b/arch/s390/kernel/setup.c
> index f2ab2528859f..5f178d557cc8 100644
> --- a/arch/s390/kernel/setup.c
> +++ b/arch/s390/kernel/setup.c
> @@ -560,6 +560,8 @@ static void __init setup_memory_end(void)
>  			vmax = _REGION1_SIZE; /* 4-level kernel page table */
>  	}
>  
> +	adjust_to_uv_max(&vmax);

I'd somewhat prefer

if (prot_virt_host)
	adjust_to_uv_max(&vmax);

> +
>  	/* module area is at the end of the kernel address space. */
>  	MODULES_END = vmax;
>  	MODULES_VADDR = MODULES_END - MODULES_LEN;
> @@ -1140,6 +1142,7 @@ void __init setup_arch(char **cmdline_p)
>  	 */
>  	memblock_trim_memory(1UL << (MAX_ORDER - 1 + PAGE_SHIFT));
>  
> +	setup_uv();

and

if (prot_virt_host)
	setup_uv();

Moving the checks out of the functions makes it clearer that this is
optional.

>  	setup_memory_end();
>  	setup_memory();
>  	dma_contiguous_reserve(memory_end);
> diff --git a/arch/s390/kernel/uv.c b/arch/s390/kernel/uv.c
> index fbf2a98de642..a06a628a88da 100644
> --- a/arch/s390/kernel/uv.c
> +++ b/arch/s390/kernel/uv.c
> @@ -46,4 +46,57 @@ static int __init prot_virt_setup(char *val)
>  	return rc;
>  }
>  early_param("prot_virt", prot_virt_setup);
> +
> +static int __init uv_init(unsigned long stor_base, unsigned long stor_len)
> +{
> +	struct uv_cb_init uvcb = {
> +		.header.cmd = UVC_CMD_INIT_UV,
> +		.header.len = sizeof(uvcb),
> +		.stor_origin = stor_base,
> +		.stor_len = stor_len,
> +	};
> +	int cc;
> +
> +	cc = uv_call(0, (uint64_t)&uvcb);

Could do

int cc = uv_call(0, (uint64_t)&uvcb);

> +	if (cc || uvcb.header.rc != UVC_RC_EXECUTED) {
> +		pr_err("Ultravisor init failed with cc: %d rc: 0x%hx\n", cc,
> +		       uvcb.header.rc);
> +		return -1;
> +	}
> +	return 0;
> +}
> +
> +void __init setup_uv(void)
> +{
> +	unsigned long uv_stor_base;
> +
> +	if (!prot_virt_host)
> +		return;
> +
> +	uv_stor_base = (unsigned long)memblock_alloc_try_nid(
> +		uv_info.uv_base_stor_len, SZ_1M, SZ_2G,
> +		MEMBLOCK_ALLOC_ACCESSIBLE, NUMA_NO_NODE);
> +	if (!uv_stor_base) {
> +		pr_info("Failed to reserve %lu bytes for ultravisor base storage\n",
> +			uv_info.uv_base_stor_len);

pr_err() ? pr_warn()?

> +		goto fail;
> +	}
> +
> +	if (uv_init(uv_stor_base, uv_info.uv_base_stor_len)) {
> +		memblock_free(uv_stor_base, uv_info.uv_base_stor_len);
> +		goto fail;
> +	}
> +
> +	pr_info("Reserving %luMB as ultravisor base storage\n",
> +		uv_info.uv_base_stor_len >> 20);
> +	return;
> +fail:

I'd add here:

pr_info("Disabling support for protected virtualization");

> +	prot_virt_host = 0;
> +}
> +
> +void adjust_to_uv_max(unsigned long *vmax)
> +{
> +	if (prot_virt_host && *vmax > uv_info.max_sec_stor_addr)
> +		*vmax = uv_info.max_sec_stor_addr;

Once you move the prot virt check out of this function

	*vmax = max_t(unsigned long, *vmax, uv_info.max_sec_stor_addr);

> +}
>  #endif
> 


-- 
Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH 04/35] s390/protvirt: add ultravisor initialization
  2020-02-14 10:25   ` David Hildenbrand
@ 2020-02-14 10:33     ` Christian Borntraeger
  2020-02-14 10:34       ` David Hildenbrand
  0 siblings, 1 reply; 147+ messages in thread
From: Christian Borntraeger @ 2020-02-14 10:33 UTC (permalink / raw)
  To: David Hildenbrand, Janosch Frank
  Cc: KVM, Cornelia Huck, Thomas Huth, Ulrich Weigand,
	Claudio Imbrenda, Andrea Arcangeli, linux-s390, Michael Mueller,
	Vasily Gorbik



On 14.02.20 11:25, David Hildenbrand wrote:
> [...]
> 
>>  #if defined(CONFIG_PROTECTED_VIRTUALIZATION_GUEST) ||                          \
>> diff --git a/arch/s390/kernel/setup.c b/arch/s390/kernel/setup.c
>> index f2ab2528859f..5f178d557cc8 100644
>> --- a/arch/s390/kernel/setup.c
>> +++ b/arch/s390/kernel/setup.c
>> @@ -560,6 +560,8 @@ static void __init setup_memory_end(void)
>>  			vmax = _REGION1_SIZE; /* 4-level kernel page table */
>>  	}
>>  
>> +	adjust_to_uv_max(&vmax);
> 
> I'd somewhat prefer
> 
> if (prot_virt_host)
> 	adjust_to_uv_max(&vmax);
> 
>> +

fine with me. ack

>>  	/* module area is at the end of the kernel address space. */
>>  	MODULES_END = vmax;
>>  	MODULES_VADDR = MODULES_END - MODULES_LEN;
>> @@ -1140,6 +1142,7 @@ void __init setup_arch(char **cmdline_p)
>>  	 */
>>  	memblock_trim_memory(1UL << (MAX_ORDER - 1 + PAGE_SHIFT));
>>  
>> +	setup_uv();
> 
> and
> 
> if (prot_virt_host)
> 	setup_uv();

ack
> 
> Moving the checks out of the functions. Makes it clearer that this is
> optional.
> 
>>  	setup_memory_end();
>>  	setup_memory();
>>  	dma_contiguous_reserve(memory_end);
>> diff --git a/arch/s390/kernel/uv.c b/arch/s390/kernel/uv.c
>> index fbf2a98de642..a06a628a88da 100644
>> --- a/arch/s390/kernel/uv.c
>> +++ b/arch/s390/kernel/uv.c
>> @@ -46,4 +46,57 @@ static int __init prot_virt_setup(char *val)
>>  	return rc;
>>  }
>>  early_param("prot_virt", prot_virt_setup);
>> +
>> +static int __init uv_init(unsigned long stor_base, unsigned long stor_len)
>> +{
>> +	struct uv_cb_init uvcb = {
>> +		.header.cmd = UVC_CMD_INIT_UV,
>> +		.header.len = sizeof(uvcb),
>> +		.stor_origin = stor_base,
>> +		.stor_len = stor_len,
>> +	};
>> +	int cc;
>> +
>> +	cc = uv_call(0, (uint64_t)&uvcb);
> 
> Could do
> 
> int cc = uv_call(0, (uint64_t)&uvcb);

I could actually get rid of the cc and the comparison with UVC_RC_EXECUTED.
When the condition code is 0, rc must be 1.

Something like
	if (uv_call(0,.....
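Spelled out as a stand-alone sketch with stubbed types (uv_call() and the control block are simulated here; only UVC_RC_EXECUTED and the "cc 0 implies rc executed" rule come from the discussion):

```c
#include <assert.h>

#define UVC_RC_EXECUTED 0x0001

struct uv_cb_header { unsigned short rc; };

/* Stub of uv_call(): by definition, cc == 0 implies rc == UVC_RC_EXECUTED. */
static int uv_call_stub(struct uv_cb_header *h, int fail)
{
	h->rc = fail ? 0x0002 : UVC_RC_EXECUTED;
	return fail ? 1 : 0;	/* nonzero condition code on failure */
}

/*
 * The simplification above: because cc == 0 already guarantees
 * rc == UVC_RC_EXECUTED, the cc variable and the rc comparison can go,
 * and the call result is tested directly.
 */
static int uv_init_sketch(int fail)
{
	struct uv_cb_header uvcb = { 0 };

	if (uv_call_stub(&uvcb, fail))
		return -1;	/* real code: pr_err() with uvcb.rc */
	return 0;
}
```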

> 
>> +	if (cc || uvcb.header.rc != UVC_RC_EXECUTED) {
>> +		pr_err("Ultravisor init failed with cc: %d rc: 0x%hx\n", cc,
>> +		       uvcb.header.rc);
>> +		return -1;
>> +	}
>> +	return 0;
>> +}
>> +
>> +void __init setup_uv(void)
>> +{
>> +	unsigned long uv_stor_base;
>> +
>> +	if (!prot_virt_host)
>> +		return;
>> +
>> +	uv_stor_base = (unsigned long)memblock_alloc_try_nid(
>> +		uv_info.uv_base_stor_len, SZ_1M, SZ_2G,
>> +		MEMBLOCK_ALLOC_ACCESSIBLE, NUMA_NO_NODE);
>> +	if (!uv_stor_base) {
>> +		pr_info("Failed to reserve %lu bytes for ultravisor base storage\n",
>> +			uv_info.uv_base_stor_len);
> 
> pr_err() ? pr_warn()

ack.


> 
>> +		goto fail;
>> +	}
>> +
>> +	if (uv_init(uv_stor_base, uv_info.uv_base_stor_len)) {
>> +		memblock_free(uv_stor_base, uv_info.uv_base_stor_len);
>> +		goto fail;
>> +	}
>> +
>> +	pr_info("Reserving %luMB as ultravisor base storage\n",
>> +		uv_info.uv_base_stor_len >> 20);
>> +	return;
>> +fail:
> 
> I'd add here:
> 
> pr_info("Disabling support for protected virtualization");

ack

> 
>> +	prot_virt_host = 0;
>> +}
>> +
>> +void adjust_to_uv_max(unsigned long *vmax)
>> +{
>> +	if (prot_virt_host && *vmax > uv_info.max_sec_stor_addr)
>> +		*vmax = uv_info.max_sec_stor_addr;
> 
> Once you move the prot virt check out of this function


> 
> 	*vmax = max_t(unsigned long, *vmax, uv_info.max_sec_stor_addr);

ack
> 
>> +}
>>  #endif
>>
> 
> 

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH 04/35] s390/protvirt: add ultravisor initialization
  2020-02-14 10:33     ` Christian Borntraeger
@ 2020-02-14 10:34       ` David Hildenbrand
  0 siblings, 0 replies; 147+ messages in thread
From: David Hildenbrand @ 2020-02-14 10:34 UTC (permalink / raw)
  To: Christian Borntraeger, Janosch Frank
  Cc: KVM, Cornelia Huck, Thomas Huth, Ulrich Weigand,
	Claudio Imbrenda, Andrea Arcangeli, linux-s390, Michael Mueller,
	Vasily Gorbik


>>
>>> +	prot_virt_host = 0;
>>> +}
>>> +
>>> +void adjust_to_uv_max(unsigned long *vmax)
>>> +{
>>> +	if (prot_virt_host && *vmax > uv_info.max_sec_stor_addr)
>>> +		*vmax = uv_info.max_sec_stor_addr;
>>
>> Once you move the prot virt check out of this function
> 
> 
>>
>> 	*vmax = max_t(unsigned long, *vmax, uv_info.max_sec_stor_addr);

actually min_t, sorry :)


-- 
Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH 05/35] s390/mm: provide memory management functions for protected KVM guests
  2020-02-07 11:39 ` [PATCH 05/35] s390/mm: provide memory management functions for protected KVM guests Christian Borntraeger
  2020-02-12 13:42   ` Cornelia Huck
@ 2020-02-14 17:59   ` David Hildenbrand
  2020-02-14 21:17     ` Christian Borntraeger
  1 sibling, 1 reply; 147+ messages in thread
From: David Hildenbrand @ 2020-02-14 17:59 UTC (permalink / raw)
  To: Christian Borntraeger, Janosch Frank
  Cc: KVM, Cornelia Huck, Thomas Huth, Ulrich Weigand,
	Claudio Imbrenda, Andrea Arcangeli, linux-s390, Michael Mueller,
	Vasily Gorbik, linux-mm, Andrew Morton

>  
>  /*
> @@ -1086,12 +1106,16 @@ static inline pte_t ptep_get_and_clear_full(struct mm_struct *mm,
>  					    unsigned long addr,
>  					    pte_t *ptep, int full)
>  {
> +	pte_t res;

Empty line missing.

>  	if (full) {
> -		pte_t pte = *ptep;
> +		res = *ptep;
>  		*ptep = __pte(_PAGE_INVALID);
> -		return pte;
> +	} else {
> +		res = ptep_xchg_lazy(mm, addr, ptep, __pte(_PAGE_INVALID));
>  	}
> -	return ptep_xchg_lazy(mm, addr, ptep, __pte(_PAGE_INVALID));
> +	if (mm_is_protected(mm) && pte_present(res))
> +		uv_convert_from_secure(pte_val(res) & PAGE_MASK);
> +	return res;
>  }

[...]

> +int uv_make_secure(struct gmap *gmap, unsigned long gaddr, void *uvcb);
> +int uv_convert_from_secure(unsigned long paddr);
> +
> +static inline int uv_convert_to_secure(struct gmap *gmap, unsigned long gaddr)
> +{
> +	struct uv_cb_cts uvcb = {
> +		.header.cmd = UVC_CMD_CONV_TO_SEC_STOR,
> +		.header.len = sizeof(uvcb),
> +		.guest_handle = gmap->guest_handle,
> +		.gaddr = gaddr,
> +	};
> +
> +	return uv_make_secure(gmap, gaddr, &uvcb);
> +}

I'd actually suggest to name everything that eats a gmap "gmap_",

e.g., "gmap_make_secure()"

[...]

>  
>  #if defined(CONFIG_PROTECTED_VIRTUALIZATION_GUEST) ||                          \
> diff --git a/arch/s390/kernel/uv.c b/arch/s390/kernel/uv.c
> index a06a628a88da..15ac598a3d8d 100644
> --- a/arch/s390/kernel/uv.c
> +++ b/arch/s390/kernel/uv.c
> @@ -9,6 +9,8 @@
>  #include <linux/sizes.h>
>  #include <linux/bitmap.h>
>  #include <linux/memblock.h>
> +#include <linux/pagemap.h>
> +#include <linux/swap.h>
>  #include <asm/facility.h>
>  #include <asm/sections.h>
>  #include <asm/uv.h>
> @@ -99,4 +101,174 @@ void adjust_to_uv_max(unsigned long *vmax)
>  	if (prot_virt_host && *vmax > uv_info.max_sec_stor_addr)
>  		*vmax = uv_info.max_sec_stor_addr;
>  }
> +
> +static int __uv_pin_shared(unsigned long paddr)
> +{
> +	struct uv_cb_cfs uvcb = {
> +		.header.cmd	= UVC_CMD_PIN_PAGE_SHARED,
> +		.header.len	= sizeof(uvcb),
> +		.paddr		= paddr,

please drop all the superfluous spaces (just as in the other uv calls).

> +	};
> +
> +	if (uv_call(0, (u64)&uvcb))
> +		return -EINVAL;
> +	return 0;
> +}

[...]

> +static int make_secure_pte(pte_t *ptep, unsigned long addr, void *data)
> +{
> +	struct conv_params *params = data;
> +	pte_t entry = READ_ONCE(*ptep);
> +	struct page *page;
> +	int expected, rc = 0;
> +
> +	if (!pte_present(entry))
> +		return -ENXIO;
> +	if (pte_val(entry) & (_PAGE_INVALID | _PAGE_PROTECT))
> +		return -ENXIO;
> +
> +	page = pte_page(entry);
> +	if (page != params->page)
> +		return -ENXIO;
> +
> +	if (PageWriteback(page))
> +		return -EAGAIN;
> +	expected = expected_page_refs(page);

I do wonder if we could factor out expected_page_refs() and reuse from
other sources ...

I do wonder about huge page backing of guests, and especially
hpage_nr_pages(page) used in mm/migrate.c:expected_page_refs(). But I
can spot some hugepage exclusion below ... This needs comments.

> +	if (!page_ref_freeze(page, expected))
> +		return -EBUSY;
> +	set_bit(PG_arch_1, &page->flags);

Can we please document somewhere how PG_arch_1 is used on s390x? (page)

"The generic code guarantees that this bit is cleared for a page when it
first is entered into the page cache" - should not be an issue, right?

> +	rc = uv_call(0, (u64)params->uvcb);
> +	page_ref_unfreeze(page, expected);
> +	if (rc)
> +		rc = (params->uvcb->rc == 0x10a) ? -ENXIO : -EINVAL;
> +	return rc;
> +}
> +
> +/*
> + * Requests the Ultravisor to make a page accessible to a guest.
> + * If it's brought in the first time, it will be cleared. If
> + * it has been exported before, it will be decrypted and integrity
> + * checked.
> + *
> + * @gmap: Guest mapping
> + * @gaddr: Guest 2 absolute address to be imported

I'd just drop the (incomplete) parameter documentation; everybody
reaching this point should know what a gmap and what a gaddr is ...

> + */
> +int uv_make_secure(struct gmap *gmap, unsigned long gaddr, void *uvcb)
> +{
> +	struct conv_params params = { .uvcb = uvcb };
> +	struct vm_area_struct *vma;
> +	unsigned long uaddr;
> +	int rc, local_drain = 0;
> +
> +again:
> +	rc = -EFAULT;
> +	down_read(&gmap->mm->mmap_sem);
> +
> +	uaddr = __gmap_translate(gmap, gaddr);
> +	if (IS_ERR_VALUE(uaddr))
> +		goto out;
> +	vma = find_vma(gmap->mm, uaddr);
> +	if (!vma)
> +		goto out;
> +	if (is_vm_hugetlb_page(vma))
> +		goto out;

Hah there it is! How is it enforced on upper layers/excluded? Will
hpage=true fail with prot virt? What if a guest is not a protected guest
but wants to use huge pages? This needs comments/patch description.

> +
> +	rc = -ENXIO;
> +	params.page = follow_page(vma, uaddr, FOLL_WRITE | FOLL_NOWAIT);
> +	if (IS_ERR_OR_NULL(params.page))
> +		goto out;
> +
> +	lock_page(params.page);
> +	rc = apply_to_page_range(gmap->mm, uaddr, PAGE_SIZE, make_secure_pte, &params);

Ehm, isn't it just always a single page?

> +	unlock_page(params.page);
> +out:
> +	up_read(&gmap->mm->mmap_sem);
> +
> +	if (rc == -EBUSY) {
> +		if (local_drain) {
> +			lru_add_drain_all();
> +			return -EAGAIN;
> +		}
> +		lru_add_drain();

comments please why that is performed.

> +		local_drain = 1;
> +		goto again;

Could we end up in an endless loop?

> +	} else if (rc == -ENXIO) {
> +		if (gmap_fault(gmap, gaddr, FAULT_FLAG_WRITE))
> +			return -EFAULT;
> +		return -EAGAIN;
> +	}
> +	return rc;
> +}
> +EXPORT_SYMBOL_GPL(uv_make_secure);
> +
> +/**
> + * To be called with the page locked or with an extra reference!
> + */
> +int arch_make_page_accessible(struct page *page)
> +{
> +	int rc = 0;
> +
> +	if (PageHuge(page))
> +		return 0;

Ah, another instance. Comment please why

> +
> +	if (!test_bit(PG_arch_1, &page->flags))
> +		return 0;

"Can you describe the meaning of this bit with three words"? Or a couple
more? :D

"once upon a time, the page was secure and still might be" ?
"the page is secure and therefore inaccessible" ?

> +
> +	rc = __uv_pin_shared(page_to_phys(page));
> +	if (!rc) {
> +		clear_bit(PG_arch_1, &page->flags);
> +		return 0;
> +	}
> +
> +	rc = uv_convert_from_secure(page_to_phys(page));
> +	if (!rc) {
> +		clear_bit(PG_arch_1, &page->flags);
> +		return 0;
> +	}
> +
> +	return rc;
> +}
> +EXPORT_SYMBOL_GPL(arch_make_page_accessible);
> +
>  #endif
> 

More code comments would be highly appreciated!

-- 
Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH 06/35] s390/mm: add (non)secure page access exceptions handlers
  2020-02-07 11:39 ` [PATCH 06/35] s390/mm: add (non)secure page access exceptions handlers Christian Borntraeger
@ 2020-02-14 18:05   ` David Hildenbrand
  2020-02-14 19:59     ` Christian Borntraeger
  0 siblings, 1 reply; 147+ messages in thread
From: David Hildenbrand @ 2020-02-14 18:05 UTC (permalink / raw)
  To: Christian Borntraeger, Janosch Frank
  Cc: KVM, Cornelia Huck, Thomas Huth, Ulrich Weigand,
	Claudio Imbrenda, Andrea Arcangeli, linux-s390, Michael Mueller,
	Vasily Gorbik, linux-mm, Andrew Morton, Janosch Frank

On 07.02.20 12:39, Christian Borntraeger wrote:
> From: Vasily Gorbik <gor@linux.ibm.com>
> 
> Add exceptions handlers performing transparent transition of non-secure
> pages to secure (import) upon guest access and secure pages to
> non-secure (export) upon hypervisor access.
> 
> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
> [frankja@linux.ibm.com: adding checks for failures]
> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
> [imbrenda@linux.ibm.com:  adding a check for gmap fault]
> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
> [borntraeger@de.ibm.com: patch merging, splitting, fixing]
> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
> ---
>  arch/s390/kernel/pgm_check.S |  4 +-
>  arch/s390/mm/fault.c         | 86 ++++++++++++++++++++++++++++++++++++
>  2 files changed, 88 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/s390/kernel/pgm_check.S b/arch/s390/kernel/pgm_check.S
> index 59dee9d3bebf..27ac4f324c70 100644
> --- a/arch/s390/kernel/pgm_check.S
> +++ b/arch/s390/kernel/pgm_check.S
> @@ -78,8 +78,8 @@ PGM_CHECK(do_dat_exception)		/* 39 */
>  PGM_CHECK(do_dat_exception)		/* 3a */
>  PGM_CHECK(do_dat_exception)		/* 3b */
>  PGM_CHECK_DEFAULT			/* 3c */
> -PGM_CHECK_DEFAULT			/* 3d */
> -PGM_CHECK_DEFAULT			/* 3e */
> +PGM_CHECK(do_secure_storage_access)	/* 3d */
> +PGM_CHECK(do_non_secure_storage_access)	/* 3e */
>  PGM_CHECK_DEFAULT			/* 3f */
>  PGM_CHECK_DEFAULT			/* 40 */
>  PGM_CHECK_DEFAULT			/* 41 */
> diff --git a/arch/s390/mm/fault.c b/arch/s390/mm/fault.c
> index 7b0bb475c166..fab4219fa0be 100644
> --- a/arch/s390/mm/fault.c
> +++ b/arch/s390/mm/fault.c
> @@ -38,6 +38,7 @@
>  #include <asm/irq.h>
>  #include <asm/mmu_context.h>
>  #include <asm/facility.h>
> +#include <asm/uv.h>
>  #include "../kernel/entry.h"
>  
>  #define __FAIL_ADDR_MASK -4096L
> @@ -816,3 +817,88 @@ static int __init pfault_irq_init(void)
>  early_initcall(pfault_irq_init);
>  
>  #endif /* CONFIG_PFAULT */
> +
> +#if IS_ENABLED(CONFIG_KVM)
> +void do_secure_storage_access(struct pt_regs *regs)
> +{
> +	unsigned long addr = regs->int_parm_long & __FAIL_ADDR_MASK;
> +	struct vm_area_struct *vma;
> +	struct mm_struct *mm;
> +	struct page *page;
> +	int rc;
> +
> +	switch (get_fault_type(regs)) {
> +	case USER_FAULT:
> +		mm = current->mm;
> +		down_read(&mm->mmap_sem);
> +		vma = find_vma(mm, addr);
> +		if (!vma) {
> +			up_read(&mm->mmap_sem);
> +			do_fault_error(regs, VM_READ | VM_WRITE, VM_FAULT_BADMAP);
> +			break;
> +		}
> +		page = follow_page(vma, addr, FOLL_WRITE | FOLL_GET);
> +		if (IS_ERR_OR_NULL(page)) {
> +			up_read(&mm->mmap_sem);
> +			break;
> +		}
> +		if (arch_make_page_accessible(page))
> +			send_sig(SIGSEGV, current, 0);
> +		put_page(page);
> +		up_read(&mm->mmap_sem);
> +		break;
> +	case KERNEL_FAULT:
> +		page = phys_to_page(addr);
> +		if (unlikely(!try_get_page(page)))
> +			break;
> +		rc = arch_make_page_accessible(page);
> +		put_page(page);
> +		if (rc)
> +			BUG();
> +		break;
> +	case VDSO_FAULT:
> +		/* fallthrough */
> +	case GMAP_FAULT:
> +		/* fallthrough */

Could we ever get here from the SIE?

> +	default:
> +		do_fault_error(regs, VM_READ | VM_WRITE, VM_FAULT_BADMAP);
> +		WARN_ON_ONCE(1);
> +	}
> +}
> +NOKPROBE_SYMBOL(do_secure_storage_access);
> +
> +void do_non_secure_storage_access(struct pt_regs *regs)
> +{
> +	unsigned long gaddr = regs->int_parm_long & __FAIL_ADDR_MASK;
> +	struct gmap *gmap = (struct gmap *)S390_lowcore.gmap;
> +	struct uv_cb_cts uvcb = {
> +		.header.cmd = UVC_CMD_CONV_TO_SEC_STOR,
> +		.header.len = sizeof(uvcb),
> +		.guest_handle = gmap->guest_handle,
> +		.gaddr = gaddr,
> +	};
> +	int rc;
> +
> +	if (get_fault_type(regs) != GMAP_FAULT) {
> +		do_fault_error(regs, VM_READ | VM_WRITE, VM_FAULT_BADMAP);
> +		WARN_ON_ONCE(1);
> +		return;
> +	}
> +
> +	rc = uv_make_secure(gmap, gaddr, &uvcb);
> +	if (rc == -EINVAL && uvcb.header.rc != 0x104)
> +		send_sig(SIGSEGV, current, 0);


Looks good to me, but I don't feel like being ready for an r-b. I'll
have to let that sink in :)

Assumed-is-okay-by: David Hildenbrand <david@redhat.com>


-- 
Thanks,

David / dhildenb


* Re: [PATCH 07/35] KVM: s390: add new variants of UV CALL
  2020-02-07 11:39 ` [PATCH 07/35] KVM: s390: add new variants of UV CALL Christian Borntraeger
  2020-02-07 14:34   ` Thomas Huth
  2020-02-10 12:16   ` Cornelia Huck
@ 2020-02-14 18:28   ` David Hildenbrand
  2020-02-14 20:13     ` Christian Borntraeger
  2 siblings, 1 reply; 147+ messages in thread
From: David Hildenbrand @ 2020-02-14 18:28 UTC (permalink / raw)
  To: Christian Borntraeger, Janosch Frank
  Cc: KVM, Cornelia Huck, Thomas Huth, Ulrich Weigand,
	Claudio Imbrenda, Andrea Arcangeli, linux-s390, Michael Mueller,
	Vasily Gorbik, Janosch Frank

On 07.02.20 12:39, Christian Borntraeger wrote:
> From: Janosch Frank <frankja@linux.ibm.com>
> 
> This adds 2 new variants of the UV CALL.
> 
> The first variant handles UV CALLs that might have longer busy
> conditions or just need longer when doing partial completion. We should
> schedule when necessary.
> 
> The second variant handles UV CALLs that only need the handle but have
> no payload (e.g. destroying a VM). We can provide a simple wrapper for
> those.
> 
> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
> [borntraeger@de.ibm.com: patch merging, splitting, fixing]
> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
> ---
>  arch/s390/include/asm/uv.h | 59 ++++++++++++++++++++++++++++++++++++++
>  1 file changed, 59 insertions(+)
> 
> diff --git a/arch/s390/include/asm/uv.h b/arch/s390/include/asm/uv.h
> index 1b97230a57ba..e1cef772fde1 100644
> --- a/arch/s390/include/asm/uv.h
> +++ b/arch/s390/include/asm/uv.h
> @@ -14,6 +14,7 @@
>  #include <linux/types.h>
>  #include <linux/errno.h>
>  #include <linux/bug.h>
> +#include <linux/sched.h>
>  #include <asm/page.h>
>  #include <asm/gmap.h>
>  
> @@ -91,6 +92,19 @@ struct uv_cb_cfs {
>  	u64 paddr;
>  } __packed __aligned(8);
>  
> +/*
> + * A common UV call struct for calls that take no payload
> + * Examples:
> + * Destroy cpu/config
> + * Verify
> + */
> +struct uv_cb_nodata {
> +	struct uv_cb_header header;
> +	u64 reserved08[2];
> +	u64 handle;
> +	u64 reserved20[4];
> +} __packed __aligned(8);
> +
>  struct uv_cb_share {
>  	struct uv_cb_header header;
>  	u64 reserved08[3];
> @@ -98,6 +112,31 @@ struct uv_cb_share {
>  	u64 reserved28;
>  } __packed __aligned(8);
>  
> +/*
> + * Low level uv_call that takes r1 and r2 as parameter and avoids
> + * stalls for long running busy conditions by doing schedule
> + */
> +static inline int uv_call_sched(unsigned long r1, unsigned long r2)
> +{
> +	int cc;
> +
> +	do {
> +		asm volatile(
> +			"0:	.insn rrf,0xB9A40000,%[r1],%[r2],0,0\n"

label not necessary

> +			"		ipm	%[cc]\n"
> +			"		srl	%[cc],28\n"
> +			: [cc] "=d" (cc)
> +			: [r1] "d" (r1), [r2] "d" (r2)
> +			: "memory", "cc");

I was wondering if we could reuse uv_call() - something like

static inline int __uv_call(unsigned long r1, unsigned long r2)
{
	int cc;

	asm volatile(
		"	.insn rrf,0xB9A40000,%[r1],%[r2],0,0\n"
		"		ipm	%[cc]\n"
		"		srl	%[cc],28\n"
		: [cc] "=d" (cc)
		: [r1] "a" (r1), [r2] "a" (r2)
		: "memory", "cc");
	return cc;
}

static inline int uv_call(unsigned long r1, unsigned long r2)
{
	int cc;

	do {
		cc = __uv_call(r1, r2);
	} while (cc > 1);
	return cc;
}

static inline int uv_call_sched(unsigned long r1, unsigned long r2)
{
	int cc;

	do {
		cc = __uv_call(r1, r2);
		cond_resched();
	} while (cc > 1);
	return cc;
}

> +		if (need_resched())
> +			schedule();

cond_resched();

> +	} while (cc > 1);
> +	return cc;
> +}
> +
> +/*
> + * Low level uv_call that takes r1 and r2 as parameter
> + */

This "r1 and r2" does not sound like relevant news. Same for the other
variant above.

>  static inline int uv_call(unsigned long r1, unsigned long r2)
>  {
>  	int cc;
> @@ -113,6 +152,26 @@ static inline int uv_call(unsigned long r1, unsigned long r2)
>  	return cc;
>  }
>  
> +/*
> + * special variant of uv_call that only transports the cpu or guest
> + * handle and the command, like destroy or verify.
> + */
> +static inline int uv_cmd_nodata(u64 handle, u16 cmd, u32 *ret)

uv_call_sched_simple() ?

> +{
> +	int rc;
> +	struct uv_cb_nodata uvcb = {
> +		.header.cmd = cmd,
> +		.header.len = sizeof(uvcb),
> +		.handle = handle,
> +	};
> +
> +	WARN(!handle, "No handle provided to Ultravisor call cmd %x\n", cmd);
> +	rc = uv_call_sched(0, (u64)&uvcb);
> +	if (ret)
> +		*ret = *(u32 *)&uvcb.header.rc;
> +	return rc ? -EINVAL : 0;
> +}
> +
>  struct uv_info {
>  	unsigned long inst_calls_list[4];
>  	unsigned long uv_base_stor_len;
> 


-- 
Thanks,

David / dhildenb


* Re: [PATCH 08/35] KVM: s390: protvirt: Add initial lifecycle handling
  2020-02-07 11:39 ` [PATCH 08/35] KVM: s390: protvirt: Add initial lifecycle handling Christian Borntraeger
  2020-02-07 16:32   ` Thomas Huth
  2020-02-08 14:54   ` Thomas Huth
@ 2020-02-14 18:39   ` David Hildenbrand
  2020-02-14 21:22     ` Christian Borntraeger
  2 siblings, 1 reply; 147+ messages in thread
From: David Hildenbrand @ 2020-02-14 18:39 UTC (permalink / raw)
  To: Christian Borntraeger, Janosch Frank
  Cc: KVM, Cornelia Huck, Thomas Huth, Ulrich Weigand,
	Claudio Imbrenda, Andrea Arcangeli, linux-s390, Michael Mueller,
	Vasily Gorbik, Janosch Frank

On 07.02.20 12:39, Christian Borntraeger wrote:
> From: Janosch Frank <frankja@linux.ibm.com>
> 
> This contains 3 main changes:
> 1. changes in SIE control block handling for secure guests
> 2. helper functions for create/destroy/unpack secure guests
> 3. KVM_S390_PV_COMMAND ioctl to allow userspace dealing with secure
> machines
> 
> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
> [borntraeger@de.ibm.com: patch merging, splitting, fixing]
> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
> ---
>  arch/s390/include/asm/kvm_host.h |  24 ++-
>  arch/s390/include/asm/uv.h       |  69 +++++++++
>  arch/s390/kvm/Makefile           |   2 +-
>  arch/s390/kvm/kvm-s390.c         | 191 +++++++++++++++++++++++-
>  arch/s390/kvm/kvm-s390.h         |  27 ++++
>  arch/s390/kvm/pv.c               | 244 +++++++++++++++++++++++++++++++
>  include/uapi/linux/kvm.h         |  33 +++++
>  7 files changed, 586 insertions(+), 4 deletions(-)
>  create mode 100644 arch/s390/kvm/pv.c
> 
> diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
> index 884503e05424..3ed31c5f80e1 100644
> --- a/arch/s390/include/asm/kvm_host.h
> +++ b/arch/s390/include/asm/kvm_host.h
> @@ -160,7 +160,13 @@ struct kvm_s390_sie_block {
>  	__u8	reserved08[4];		/* 0x0008 */
>  #define PROG_IN_SIE (1<<0)
>  	__u32	prog0c;			/* 0x000c */
> -	__u8	reserved10[16];		/* 0x0010 */
> +	union {
> +		__u8	reserved10[16];		/* 0x0010 */
> +		struct {
> +			__u64	pv_handle_cpu;
> +			__u64	pv_handle_config;
> +		};
> +	};
>  #define PROG_BLOCK_SIE	(1<<0)
>  #define PROG_REQUEST	(1<<1)
>  	atomic_t prog20;		/* 0x0020 */
> @@ -233,7 +239,7 @@ struct kvm_s390_sie_block {
>  #define ECB3_RI  0x01
>  	__u8    ecb3;			/* 0x0063 */
>  	__u32	scaol;			/* 0x0064 */
> -	__u8	reserved68;		/* 0x0068 */
> +	__u8	sdf;			/* 0x0068 */
>  	__u8    epdx;			/* 0x0069 */
>  	__u8    reserved6a[2];		/* 0x006a */
>  	__u32	todpr;			/* 0x006c */
> @@ -645,6 +651,11 @@ struct kvm_guestdbg_info_arch {
>  	unsigned long last_bp;
>  };
>  
> +struct kvm_s390_pv_vcpu {
> +	u64 handle;
> +	unsigned long stor_base;
> +};
> +
>  struct kvm_vcpu_arch {
>  	struct kvm_s390_sie_block *sie_block;
>  	/* if vsie is active, currently executed shadow sie control block */
> @@ -673,6 +684,7 @@ struct kvm_vcpu_arch {
>  	__u64 cputm_start;
>  	bool gs_enabled;
>  	bool skey_enabled;
> +	struct kvm_s390_pv_vcpu pv;
>  };
>  
>  struct kvm_vm_stat {
> @@ -846,6 +858,13 @@ struct kvm_s390_gisa_interrupt {
>  	DECLARE_BITMAP(kicked_mask, KVM_MAX_VCPUS);
>  };
>  
> +struct kvm_s390_pv {
> +	u64 handle;
> +	u64 guest_len;
> +	unsigned long stor_base;
> +	void *stor_var;
> +};
> +
>  struct kvm_arch{
>  	void *sca;
>  	int use_esca;
> @@ -881,6 +900,7 @@ struct kvm_arch{
>  	DECLARE_BITMAP(cpu_feat, KVM_S390_VM_CPU_FEAT_NR_BITS);
>  	DECLARE_BITMAP(idle_mask, KVM_MAX_VCPUS);
>  	struct kvm_s390_gisa_interrupt gisa_int;
> +	struct kvm_s390_pv pv;
>  };
>  
>  #define KVM_HVA_ERR_BAD		(-1UL)
> diff --git a/arch/s390/include/asm/uv.h b/arch/s390/include/asm/uv.h
> index e1cef772fde1..7c21d55d2e49 100644
> --- a/arch/s390/include/asm/uv.h
> +++ b/arch/s390/include/asm/uv.h
> @@ -23,11 +23,19 @@
>  #define UVC_RC_INV_STATE	0x0003
>  #define UVC_RC_INV_LEN		0x0005
>  #define UVC_RC_NO_RESUME	0x0007
> +#define UVC_RC_NEED_DESTROY	0x8000
>  
>  #define UVC_CMD_QUI			0x0001
>  #define UVC_CMD_INIT_UV			0x000f
> +#define UVC_CMD_CREATE_SEC_CONF		0x0100
> +#define UVC_CMD_DESTROY_SEC_CONF	0x0101
> +#define UVC_CMD_CREATE_SEC_CPU		0x0120
> +#define UVC_CMD_DESTROY_SEC_CPU		0x0121
>  #define UVC_CMD_CONV_TO_SEC_STOR	0x0200
>  #define UVC_CMD_CONV_FROM_SEC_STOR	0x0201
> +#define UVC_CMD_SET_SEC_CONF_PARAMS	0x0300
> +#define UVC_CMD_UNPACK_IMG		0x0301
> +#define UVC_CMD_VERIFY_IMG		0x0302
>  #define UVC_CMD_PIN_PAGE_SHARED		0x0341
>  #define UVC_CMD_UNPIN_PAGE_SHARED	0x0342
>  #define UVC_CMD_SET_SHARED_ACCESS	0x1000
> @@ -37,10 +45,17 @@
>  enum uv_cmds_inst {
>  	BIT_UVC_CMD_QUI = 0,
>  	BIT_UVC_CMD_INIT_UV = 1,
> +	BIT_UVC_CMD_CREATE_SEC_CONF = 2,
> +	BIT_UVC_CMD_DESTROY_SEC_CONF = 3,
> +	BIT_UVC_CMD_CREATE_SEC_CPU = 4,
> +	BIT_UVC_CMD_DESTROY_SEC_CPU = 5,
>  	BIT_UVC_CMD_CONV_TO_SEC_STOR = 6,
>  	BIT_UVC_CMD_CONV_FROM_SEC_STOR = 7,
>  	BIT_UVC_CMD_SET_SHARED_ACCESS = 8,
>  	BIT_UVC_CMD_REMOVE_SHARED_ACCESS = 9,
> +	BIT_UVC_CMD_SET_SEC_PARMS = 11,
> +	BIT_UVC_CMD_UNPACK_IMG = 13,
> +	BIT_UVC_CMD_VERIFY_IMG = 14,
>  	BIT_UVC_CMD_PIN_PAGE_SHARED = 21,
>  	BIT_UVC_CMD_UNPIN_PAGE_SHARED = 22,
>  };
> @@ -52,6 +67,7 @@ struct uv_cb_header {
>  	u16 rrc;	/* Return Reason Code */
>  } __packed __aligned(8);
>  
> +/* Query Ultravisor Information */
>  struct uv_cb_qui {
>  	struct uv_cb_header header;
>  	u64 reserved08;
> @@ -71,6 +87,7 @@ struct uv_cb_qui {
>  	u64 reserveda0;
>  } __packed __aligned(8);
>  
> +/* Initialize Ultravisor */
>  struct uv_cb_init {
>  	struct uv_cb_header header;
>  	u64 reserved08[2];
> @@ -79,6 +96,35 @@ struct uv_cb_init {
>  	u64 reserved28[4];
>  } __packed __aligned(8);
>  
> +/* Create Guest Configuration */
> +struct uv_cb_cgc {
> +	struct uv_cb_header header;
> +	u64 reserved08[2];
> +	u64 guest_handle;
> +	u64 conf_base_stor_origin;
> +	u64 conf_virt_stor_origin;
> +	u64 reserved30;
> +	u64 guest_stor_origin;
> +	u64 guest_stor_len;
> +	u64 guest_sca;
> +	u64 guest_asce;
> +	u64 reserved58[5];
> +} __packed __aligned(8);
> +
> +/* Create Secure CPU */
> +struct uv_cb_csc {
> +	struct uv_cb_header header;
> +	u64 reserved08[2];
> +	u64 cpu_handle;
> +	u64 guest_handle;
> +	u64 stor_origin;
> +	u8  reserved30[6];
> +	u16 num;
> +	u64 state_origin;
> +	u64 reserved40[4];
> +} __packed __aligned(8);
> +
> +/* Convert to Secure */
>  struct uv_cb_cts {
>  	struct uv_cb_header header;
>  	u64 reserved08[2];
> @@ -86,12 +132,34 @@ struct uv_cb_cts {
>  	u64 gaddr;
>  } __packed __aligned(8);
>  
> +/* Convert from Secure / Pin Page Shared */
>  struct uv_cb_cfs {
>  	struct uv_cb_header header;
>  	u64 reserved08[2];
>  	u64 paddr;
>  } __packed __aligned(8);
>  
> +/* Set Secure Config Parameter */
> +struct uv_cb_ssc {
> +	struct uv_cb_header header;
> +	u64 reserved08[2];
> +	u64 guest_handle;
> +	u64 sec_header_origin;
> +	u32 sec_header_len;
> +	u32 reserved2c;
> +	u64 reserved30[4];
> +} __packed __aligned(8);
> +
> +/* Unpack */
> +struct uv_cb_unp {
> +	struct uv_cb_header header;
> +	u64 reserved08[2];
> +	u64 guest_handle;
> +	u64 gaddr;
> +	u64 tweak[2];
> +	u64 reserved38[3];
> +} __packed __aligned(8);
> +
>  /*
>   * A common UV call struct for calls that take no payload
>   * Examples:
> @@ -105,6 +173,7 @@ struct uv_cb_nodata {
>  	u64 reserved20[4];
>  } __packed __aligned(8);
>  
> +/* Set Shared Access */
>  struct uv_cb_share {
>  	struct uv_cb_header header;
>  	u64 reserved08[3];
> diff --git a/arch/s390/kvm/Makefile b/arch/s390/kvm/Makefile
> index 05ee90a5ea08..12decca22e7c 100644
> --- a/arch/s390/kvm/Makefile
> +++ b/arch/s390/kvm/Makefile
> @@ -9,6 +9,6 @@ common-objs = $(KVM)/kvm_main.o $(KVM)/eventfd.o  $(KVM)/async_pf.o $(KVM)/irqch
>  ccflags-y := -Ivirt/kvm -Iarch/s390/kvm
>  
>  kvm-objs := $(common-objs) kvm-s390.o intercept.o interrupt.o priv.o sigp.o
> -kvm-objs += diag.o gaccess.o guestdbg.o vsie.o
> +kvm-objs += diag.o gaccess.o guestdbg.o vsie.o pv.o
>  
>  obj-$(CONFIG_KVM) += kvm.o
> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> index 1a48214ac507..e1bccbb41fdd 100644
> --- a/arch/s390/kvm/kvm-s390.c
> +++ b/arch/s390/kvm/kvm-s390.c
> @@ -44,6 +44,7 @@
>  #include <asm/cpacf.h>
>  #include <asm/timex.h>
>  #include <asm/ap.h>
> +#include <asm/uv.h>
>  #include "kvm-s390.h"
>  #include "gaccess.h"
>  
> @@ -236,6 +237,7 @@ int kvm_arch_check_processor_compat(void)
>  
>  static void kvm_gmap_notifier(struct gmap *gmap, unsigned long start,
>  			      unsigned long end);
> +static int sca_switch_to_extended(struct kvm *kvm);
>  
>  static void kvm_clock_sync_scb(struct kvm_s390_sie_block *scb, u64 delta)
>  {
> @@ -568,6 +570,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>  	case KVM_CAP_S390_BPB:
>  		r = test_facility(82);
>  		break;
> +	case KVM_CAP_S390_PROTECTED:
> +		r = is_prot_virt_host();
> +		break;
>  	default:
>  		r = 0;
>  	}
> @@ -2162,6 +2167,115 @@ static int kvm_s390_set_cmma_bits(struct kvm *kvm,
>  	return r;
>  }
>  
> +static int kvm_s390_handle_pv(struct kvm *kvm, struct kvm_pv_cmd *cmd)
> +{
> +	int r = 0;
> +	void __user *argp = (void __user *)cmd->data;
> +
> +	switch (cmd->cmd) {
> +	case KVM_PV_VM_CREATE: {
> +		r = -EINVAL;
> +		if (kvm_s390_pv_is_protected(kvm))
> +			break;
> +
> +		r = kvm_s390_pv_alloc_vm(kvm);
> +		if (r)
> +			break;
> +
> +		mutex_lock(&kvm->lock);
> +		kvm_s390_vcpu_block_all(kvm);
> +		/* FMT 4 SIE needs esca */
> +		r = sca_switch_to_extended(kvm);
> +		if (r) {
> +			kvm_s390_pv_dealloc_vm(kvm);
> +			kvm_s390_vcpu_unblock_all(kvm);
> +			mutex_unlock(&kvm->lock);
> +			break;
> +		}
> +		r = kvm_s390_pv_create_vm(kvm);
> +		kvm_s390_vcpu_unblock_all(kvm);
> +		mutex_unlock(&kvm->lock);
> +		break;
> +	}
> +	case KVM_PV_VM_DESTROY: {
> +		r = -EINVAL;
> +		if (!kvm_s390_pv_is_protected(kvm))
> +			break;
> +
> +		/* All VCPUs have to be destroyed before this call. */
> +		mutex_lock(&kvm->lock);
> +		kvm_s390_vcpu_block_all(kvm);
> +		r = kvm_s390_pv_destroy_vm(kvm);
> +		if (!r)
> +			kvm_s390_pv_dealloc_vm(kvm);
> +		kvm_s390_vcpu_unblock_all(kvm);
> +		mutex_unlock(&kvm->lock);
> +		break;
> +	}
> +	case KVM_PV_VM_SET_SEC_PARMS: {
> +		struct kvm_s390_pv_sec_parm parms = {};
> +		void *hdr;
> +
> +		r = -EINVAL;
> +		if (!kvm_s390_pv_is_protected(kvm))
> +			break;
> +
> +		r = -EFAULT;
> +		if (copy_from_user(&parms, argp, sizeof(parms)))
> +			break;
> +
> +		/* Currently restricted to 8KB */
> +		r = -EINVAL;
> +		if (parms.length > PAGE_SIZE * 2)
> +			break;
> +
> +		r = -ENOMEM;
> +		hdr = vmalloc(parms.length);
> +		if (!hdr)
> +			break;
> +
> +		r = -EFAULT;
> +		if (!copy_from_user(hdr, (void __user *)parms.origin,
> +				   parms.length))
> +			r = kvm_s390_pv_set_sec_parms(kvm, hdr, parms.length);
> +
> +		vfree(hdr);
> +		break;
> +	}
> +	case KVM_PV_VM_UNPACK: {
> +		struct kvm_s390_pv_unp unp = {};
> +
> +		r = -EINVAL;
> +		if (!kvm_s390_pv_is_protected(kvm))
> +			break;
> +
> +		r = -EFAULT;
> +		if (copy_from_user(&unp, argp, sizeof(unp)))
> +			break;
> +
> +		r = kvm_s390_pv_unpack(kvm, unp.addr, unp.size, unp.tweak);
> +		break;
> +	}
> +	case KVM_PV_VM_VERIFY: {
> +		u32 ret;
> +
> +		r = -EINVAL;
> +		if (!kvm_s390_pv_is_protected(kvm))
> +			break;
> +
> +		r = uv_cmd_nodata(kvm_s390_pv_handle(kvm),
> +				  UVC_CMD_VERIFY_IMG,
> +				  &ret);
> +		VM_EVENT(kvm, 3, "PROTVIRT VERIFY: rc %x rrc %x",
> +			 ret >> 16, ret & 0x0000ffff);
> +		break;
> +	}
> +	default:
> +		return -ENOTTY;
> +	}
> +	return r;
> +}
> +
>  long kvm_arch_vm_ioctl(struct file *filp,
>  		       unsigned int ioctl, unsigned long arg)
>  {
> @@ -2259,6 +2373,20 @@ long kvm_arch_vm_ioctl(struct file *filp,
>  		mutex_unlock(&kvm->slots_lock);
>  		break;
>  	}
> +	case KVM_S390_PV_COMMAND: {
> +		struct kvm_pv_cmd args;
> +
> +		r = -EINVAL;
> +		if (!is_prot_virt_host())
> +			break;
> +
> +		r = -EFAULT;
> +		if (copy_from_user(&args, argp, sizeof(args)))
> +			break;
> +
> +		r = kvm_s390_handle_pv(kvm, &args);
> +		break;
> +	}
>  	default:
>  		r = -ENOTTY;
>  	}
> @@ -2534,6 +2662,8 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
>  
>  	if (vcpu->kvm->arch.use_cmma)
>  		kvm_s390_vcpu_unsetup_cmma(vcpu);
> +	if (kvm_s390_pv_handle_cpu(vcpu))
> +		kvm_s390_pv_destroy_cpu(vcpu);
>  	free_page((unsigned long)(vcpu->arch.sie_block));
>  
>  	kvm_vcpu_uninit(vcpu);
> @@ -2560,8 +2690,12 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
>  {
>  	kvm_free_vcpus(kvm);
>  	sca_dispose(kvm);
> -	debug_unregister(kvm->arch.dbf);
>  	kvm_s390_gisa_destroy(kvm);
> +	if (kvm_s390_pv_is_protected(kvm)) {
> +		kvm_s390_pv_destroy_vm(kvm);
> +		kvm_s390_pv_dealloc_vm(kvm);
> +	}
> +	debug_unregister(kvm->arch.dbf);
>  	free_page((unsigned long)kvm->arch.sie_page2);
>  	if (!kvm_is_ucontrol(kvm))
>  		gmap_remove(kvm->arch.gmap);
> @@ -2657,6 +2791,9 @@ static int sca_switch_to_extended(struct kvm *kvm)
>  	unsigned int vcpu_idx;
>  	u32 scaol, scaoh;
>  
> +	if (kvm->arch.use_esca)
> +		return 0;
> +
>  	new_sca = alloc_pages_exact(sizeof(*new_sca), GFP_KERNEL|__GFP_ZERO);
>  	if (!new_sca)
>  		return -ENOMEM;
> @@ -3049,6 +3186,15 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm,
>  	rc = kvm_vcpu_init(vcpu, kvm, id);
>  	if (rc)
>  		goto out_free_sie_block;
> +
> +	if (kvm_s390_pv_is_protected(kvm)) {
> +		rc = kvm_s390_pv_create_cpu(vcpu);
> +		if (rc) {
> +			kvm_vcpu_uninit(vcpu);
> +			goto out_free_sie_block;
> +		}
> +	}
> +
>  	VM_EVENT(kvm, 3, "create cpu %d at 0x%pK, sie block at 0x%pK", id, vcpu,
>  		 vcpu->arch.sie_block);
>  	trace_kvm_s390_create_vcpu(id, vcpu, vcpu->arch.sie_block);
> @@ -4357,6 +4503,35 @@ long kvm_arch_vcpu_async_ioctl(struct file *filp,
>  	return -ENOIOCTLCMD;
>  }
>  
> +static int kvm_s390_handle_pv_vcpu(struct kvm_vcpu *vcpu,
> +				   struct kvm_pv_cmd *cmd)
> +{
> +	int r = 0;
> +
> +	if (!kvm_s390_pv_is_protected(vcpu->kvm))
> +		return -EINVAL;
> +
> +	switch (cmd->cmd) {
> +	case KVM_PV_VCPU_CREATE: {
> +		if (kvm_s390_pv_handle_cpu(vcpu))
> +			return -EINVAL;
> +
> +		r = kvm_s390_pv_create_cpu(vcpu);
> +		break;
> +	}
> +	case KVM_PV_VCPU_DESTROY: {
> +		if (!kvm_s390_pv_handle_cpu(vcpu))
> +			return -EINVAL;
> +
> +		r = kvm_s390_pv_destroy_cpu(vcpu);
> +		break;
> +	}

I feel like my review comments for this patch were lost, so not
repeating them.


-- 
Thanks,

David / dhildenb


* Re: [PATCH 11/35] KVM: s390/mm: Make pages accessible before destroying the guest
  2020-02-07 11:39 ` [PATCH 11/35] KVM: s390/mm: Make pages accessible before destroying the guest Christian Borntraeger
@ 2020-02-14 18:40   ` David Hildenbrand
  0 siblings, 0 replies; 147+ messages in thread
From: David Hildenbrand @ 2020-02-14 18:40 UTC (permalink / raw)
  To: Christian Borntraeger, Janosch Frank
  Cc: KVM, Cornelia Huck, Thomas Huth, Ulrich Weigand,
	Claudio Imbrenda, Andrea Arcangeli, linux-s390, Michael Mueller,
	Vasily Gorbik, linux-mm, Andrew Morton

On 07.02.20 12:39, Christian Borntraeger wrote:
> Before we destroy the secure configuration, we better make all
> pages accessible again. This also happens during reboot, where we reboot
> into a non-secure guest that then can go again into secure mode. As
> this "new" secure guest will have a new ID we cannot reuse the old page
> state.
> 
> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
> Reviewed-by: Thomas Huth <thuth@redhat.com>
> Reviewed-by: Cornelia Huck <cohuck@redhat.com>
> ---
>  arch/s390/include/asm/pgtable.h |  1 +
>  arch/s390/kvm/pv.c              |  2 ++
>  arch/s390/mm/gmap.c             | 35 +++++++++++++++++++++++++++++++++
>  3 files changed, 38 insertions(+)
> 
> diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
> index dbd1453e6924..3e2ea997c334 100644
> --- a/arch/s390/include/asm/pgtable.h
> +++ b/arch/s390/include/asm/pgtable.h
> @@ -1669,6 +1669,7 @@ extern int vmem_remove_mapping(unsigned long start, unsigned long size);
>  extern int s390_enable_sie(void);
>  extern int s390_enable_skey(void);
>  extern void s390_reset_cmma(struct mm_struct *mm);
> +extern void s390_reset_acc(struct mm_struct *mm);
>  
>  /* s390 has a private copy of get unmapped area to deal with cache synonyms */
>  #define HAVE_ARCH_UNMAPPED_AREA
> diff --git a/arch/s390/kvm/pv.c b/arch/s390/kvm/pv.c
> index 4795e61f4e16..392795a92bd9 100644
> --- a/arch/s390/kvm/pv.c
> +++ b/arch/s390/kvm/pv.c
> @@ -66,6 +66,8 @@ int kvm_s390_pv_destroy_vm(struct kvm *kvm)
>  	int rc;
>  	u32 ret;
>  
> +	/* make all pages accessible before destroying the guest */
> +	s390_reset_acc(kvm->mm);
>  	rc = uv_cmd_nodata(kvm_s390_pv_handle(kvm),
>  			   UVC_CMD_DESTROY_SEC_CONF, &ret);
>  	WRITE_ONCE(kvm->arch.gmap->guest_handle, 0);
> diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
> index 7291452fe5f0..27926a06df32 100644
> --- a/arch/s390/mm/gmap.c
> +++ b/arch/s390/mm/gmap.c
> @@ -2650,3 +2650,38 @@ void s390_reset_cmma(struct mm_struct *mm)
>  	up_write(&mm->mmap_sem);
>  }
>  EXPORT_SYMBOL_GPL(s390_reset_cmma);
> +
> +/*
> + * make inaccessible pages accessible again
> + */
> +static int __s390_reset_acc(pte_t *ptep, unsigned long addr,
> +			    unsigned long next, struct mm_walk *walk)
> +{
> +	pte_t pte = READ_ONCE(*ptep);
> +
> +	if (pte_present(pte))
> +		WARN_ON_ONCE(uv_convert_from_secure(pte_val(pte) & PAGE_MASK));
> +	return 0;
> +}
> +
> +static const struct mm_walk_ops reset_acc_walk_ops = {
> +	.pte_entry		= __s390_reset_acc,
> +};
> +
> +#include <linux/sched/mm.h>
> +void s390_reset_acc(struct mm_struct *mm)
> +{
> +	/*
> +	 * we might be called during
> +	 * reset:                             we walk the pages and clear
> +	 * close of all kvm file descriptors: we walk the pages and clear
> +	 * exit of process on fd closure:     vma already gone, do nothing
> +	 */
> +	if (!mmget_not_zero(mm))
> +		return;
> +	down_read(&mm->mmap_sem);
> +	walk_page_range(mm, 0, TASK_SIZE, &reset_acc_walk_ops, NULL);
> +	up_read(&mm->mmap_sem);
> +	mmput(mm);
> +}
> +EXPORT_SYMBOL_GPL(s390_reset_acc);
> 

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 
Thanks,

David / dhildenb


* Re: [PATCH 06/35] s390/mm: add (non)secure page access exceptions handlers
  2020-02-14 18:05   ` David Hildenbrand
@ 2020-02-14 19:59     ` Christian Borntraeger
  0 siblings, 0 replies; 147+ messages in thread
From: Christian Borntraeger @ 2020-02-14 19:59 UTC (permalink / raw)
  To: David Hildenbrand, Janosch Frank
  Cc: KVM, Cornelia Huck, Thomas Huth, Ulrich Weigand,
	Claudio Imbrenda, Andrea Arcangeli, linux-s390, Michael Mueller,
	Vasily Gorbik, linux-mm, Andrew Morton, Janosch Frank



On 14.02.20 19:05, David Hildenbrand wrote:
> On 07.02.20 12:39, Christian Borntraeger wrote:
>> From: Vasily Gorbik <gor@linux.ibm.com>
>>
>> Add exceptions handlers performing transparent transition of non-secure
>> pages to secure (import) upon guest access and secure pages to
>> non-secure (export) upon hypervisor access.
>>
>> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
>> [frankja@linux.ibm.com: adding checks for failures]
>> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
>> [imbrenda@linux.ibm.com:  adding a check for gmap fault]
>> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
>> [borntraeger@de.ibm.com: patch merging, splitting, fixing]
>> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
>> ---
>>  arch/s390/kernel/pgm_check.S |  4 +-
>>  arch/s390/mm/fault.c         | 86 ++++++++++++++++++++++++++++++++++++
>>  2 files changed, 88 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/s390/kernel/pgm_check.S b/arch/s390/kernel/pgm_check.S
>> index 59dee9d3bebf..27ac4f324c70 100644
>> --- a/arch/s390/kernel/pgm_check.S
>> +++ b/arch/s390/kernel/pgm_check.S
>> @@ -78,8 +78,8 @@ PGM_CHECK(do_dat_exception)		/* 39 */
>>  PGM_CHECK(do_dat_exception)		/* 3a */
>>  PGM_CHECK(do_dat_exception)		/* 3b */
>>  PGM_CHECK_DEFAULT			/* 3c */
>> -PGM_CHECK_DEFAULT			/* 3d */
>> -PGM_CHECK_DEFAULT			/* 3e */
>> +PGM_CHECK(do_secure_storage_access)	/* 3d */
>> +PGM_CHECK(do_non_secure_storage_access)	/* 3e */
>>  PGM_CHECK_DEFAULT			/* 3f */
>>  PGM_CHECK_DEFAULT			/* 40 */
>>  PGM_CHECK_DEFAULT			/* 41 */
>> diff --git a/arch/s390/mm/fault.c b/arch/s390/mm/fault.c
>> index 7b0bb475c166..fab4219fa0be 100644
>> --- a/arch/s390/mm/fault.c
>> +++ b/arch/s390/mm/fault.c
>> @@ -38,6 +38,7 @@
>>  #include <asm/irq.h>
>>  #include <asm/mmu_context.h>
>>  #include <asm/facility.h>
>> +#include <asm/uv.h>
>>  #include "../kernel/entry.h"
>>  
>>  #define __FAIL_ADDR_MASK -4096L
>> @@ -816,3 +817,88 @@ static int __init pfault_irq_init(void)
>>  early_initcall(pfault_irq_init);
>>  
>>  #endif /* CONFIG_PFAULT */
>> +
>> +#if IS_ENABLED(CONFIG_KVM)
>> +void do_secure_storage_access(struct pt_regs *regs)
>> +{
>> +	unsigned long addr = regs->int_parm_long & __FAIL_ADDR_MASK;
>> +	struct vm_area_struct *vma;
>> +	struct mm_struct *mm;
>> +	struct page *page;
>> +	int rc;
>> +
>> +	switch (get_fault_type(regs)) {
>> +	case USER_FAULT:
>> +		mm = current->mm;
>> +		down_read(&mm->mmap_sem);
>> +		vma = find_vma(mm, addr);
>> +		if (!vma) {
>> +			up_read(&mm->mmap_sem);
>> +			do_fault_error(regs, VM_READ | VM_WRITE, VM_FAULT_BADMAP);
>> +			break;
>> +		}
>> +		page = follow_page(vma, addr, FOLL_WRITE | FOLL_GET);
>> +		if (IS_ERR_OR_NULL(page)) {
>> +			up_read(&mm->mmap_sem);
>> +			break;
>> +		}
>> +		if (arch_make_page_accessible(page))
>> +			send_sig(SIGSEGV, current, 0);
>> +		put_page(page);
>> +		up_read(&mm->mmap_sem);
>> +		break;
>> +	case KERNEL_FAULT:
>> +		page = phys_to_page(addr);
>> +		if (unlikely(!try_get_page(page)))
>> +			break;
>> +		rc = arch_make_page_accessible(page);
>> +		put_page(page);
>> +		if (rc)
>> +			BUG();
>> +		break;
>> +	case VDSO_FAULT:
>> +		/* fallthrough */
>> +	case GMAP_FAULT:
>> +		/* fallthrough */
> 
> Could we ever get here from the SIE?

GMAP_FAULT is only set if we came from the SIE critical section, so unless we have a bug, no.

> 
>> +	default:
>> +		do_fault_error(regs, VM_READ | VM_WRITE, VM_FAULT_BADMAP);
>> +		WARN_ON_ONCE(1);
>> +	}
>> +}
>> +NOKPROBE_SYMBOL(do_secure_storage_access);
>> +
>> +void do_non_secure_storage_access(struct pt_regs *regs)
>> +{
>> +	unsigned long gaddr = regs->int_parm_long & __FAIL_ADDR_MASK;
>> +	struct gmap *gmap = (struct gmap *)S390_lowcore.gmap;
>> +	struct uv_cb_cts uvcb = {
>> +		.header.cmd = UVC_CMD_CONV_TO_SEC_STOR,
>> +		.header.len = sizeof(uvcb),
>> +		.guest_handle = gmap->guest_handle,
>> +		.gaddr = gaddr,
>> +	};
>> +	int rc;
>> +
>> +	if (get_fault_type(regs) != GMAP_FAULT) {
>> +		do_fault_error(regs, VM_READ | VM_WRITE, VM_FAULT_BADMAP);
>> +		WARN_ON_ONCE(1);
>> +		return;
>> +	}
>> +
>> +	rc = uv_make_secure(gmap, gaddr, &uvcb);
>> +	if (rc == -EINVAL && uvcb.header.rc != 0x104)
>> +		send_sig(SIGSEGV, current, 0);
> 
> 
> Looks good to me, but I don't feel like being ready for an r-b. I'll
> have to let that sink in :)
> 
> Assumed-is-okay-by: David Hildenbrand <david@redhat.com>
> 
> 


* Re: [PATCH 07/35] KVM: s390: add new variants of UV CALL
  2020-02-14 18:28   ` David Hildenbrand
@ 2020-02-14 20:13     ` Christian Borntraeger
  0 siblings, 0 replies; 147+ messages in thread
From: Christian Borntraeger @ 2020-02-14 20:13 UTC (permalink / raw)
  To: David Hildenbrand, Janosch Frank
  Cc: KVM, Cornelia Huck, Thomas Huth, Ulrich Weigand,
	Claudio Imbrenda, Andrea Arcangeli, linux-s390, Michael Mueller,
	Vasily Gorbik, Janosch Frank



On 14.02.20 19:28, David Hildenbrand wrote:
> On 07.02.20 12:39, Christian Borntraeger wrote:
>> From: Janosch Frank <frankja@linux.ibm.com>
>>
>> This adds 2 new variants of the UV CALL.
>>
>> The first variant handles UV CALLs that might have longer busy
>> conditions or just need longer when doing partial completion. We should
>> schedule when necessary.
>>
>> The second variant handles UV CALLs that only need the handle but have
>> no payload (e.g. destroying a VM). We can provide a simple wrapper for
>> those.
>>
>> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
>> [borntraeger@de.ibm.com: patch merging, splitting, fixing]
>> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
>> ---
>>  arch/s390/include/asm/uv.h | 59 ++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 59 insertions(+)
>>
>> diff --git a/arch/s390/include/asm/uv.h b/arch/s390/include/asm/uv.h
>> index 1b97230a57ba..e1cef772fde1 100644
>> --- a/arch/s390/include/asm/uv.h
>> +++ b/arch/s390/include/asm/uv.h
>> @@ -14,6 +14,7 @@
>>  #include <linux/types.h>
>>  #include <linux/errno.h>
>>  #include <linux/bug.h>
>> +#include <linux/sched.h>
>>  #include <asm/page.h>
>>  #include <asm/gmap.h>
>>  
>> @@ -91,6 +92,19 @@ struct uv_cb_cfs {
>>  	u64 paddr;
>>  } __packed __aligned(8);
>>  
>> +/*
>> + * A common UV call struct for calls that take no payload
>> + * Examples:
>> + * Destroy cpu/config
>> + * Verify
>> + */
>> +struct uv_cb_nodata {
>> +	struct uv_cb_header header;
>> +	u64 reserved08[2];
>> +	u64 handle;
>> +	u64 reserved20[4];
>> +} __packed __aligned(8);
>> +
>>  struct uv_cb_share {
>>  	struct uv_cb_header header;
>>  	u64 reserved08[3];
>> @@ -98,6 +112,31 @@ struct uv_cb_share {
>>  	u64 reserved28;
>>  } __packed __aligned(8);
>>  
>> +/*
>> + * Low level uv_call that takes r1 and r2 as parameter and avoids
>> + * stalls for long running busy conditions by doing schedule
>> + */
>> +static inline int uv_call_sched(unsigned long r1, unsigned long r2)
>> +{
>> +	int cc;
>> +
>> +	do {
>> +		asm volatile(
>> +			"0:	.insn rrf,0xB9A40000,%[r1],%[r2],0,0\n"
> 
> label not necessary

ack
> 
>> +			"		ipm	%[cc]\n"
>> +			"		srl	%[cc],28\n"
>> +			: [cc] "=d" (cc)
>> +			: [r1] "d" (r1), [r2] "d" (r2)
>> +			: "memory", "cc");
> 
> I was wondering if we could reuse uv_call() - something like
> 
> static inline int __uv_call(unsigned long r1, unsigned long r2)
> {
> 	int cc;
> 
> 	asm volatile(
> 		"	.insn rrf,0xB9A40000,%[r1],%[r2],0,0\n"
> 		"		ipm	%[cc]\n"
> 		"		srl	%[cc],28\n"
> 		: [cc] "=d" (cc)
> 		: [r1] "a" (r1), [r2] "a" (r2)
> 		: "memory", "cc");
> 	return cc;
> }
> 
> static inline int uv_call(unsigned long r1, unsigned long r2)
> {
> 	int rc;
> 
> 	do {
> 		cc = __uv_call(unsigned long r1, unsigned long r2);
> 	} while (cc > 1)
> 	return rc;

This will likely generate less efficient assembly code, but it is certainly
easier to read. Will change.
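To make the suggested split concrete, here is a host-runnable sketch of the refactoring (with the rc/cc mix-ups in the quoted pseudo-code fixed). The s390 `.insn rrf,0xB9A40000` Ultravisor instruction is replaced by a mock that reports "busy" (cc 2) a few times, so the retry logic can be exercised on any host; `mock_busy` is an illustrative assumption, not part of the real interface:

```c
#include <assert.h>

/*
 * Sketch of the __uv_call()/uv_call()/uv_call_sched() split.  The real
 * low level function issues the UV instruction and extracts the
 * condition code via ipm/srl; here a mock busy counter stands in.
 */

int mock_busy;	/* number of busy (cc > 1) replies the mock still gives */

/* stand-in for the inline asm issuing the Ultravisor call */
static int __uv_call(unsigned long r1, unsigned long r2)
{
	(void)r1;
	(void)r2;
	if (mock_busy > 0) {
		mock_busy--;
		return 2;	/* cc 2: busy, caller must retry */
	}
	return 0;		/* cc 0: success */
}

/* retry until the condition code signals completion (cc <= 1) */
static int uv_call(unsigned long r1, unsigned long r2)
{
	int cc;

	do {
		cc = __uv_call(r1, r2);
	} while (cc > 1);
	return cc;
}

/* same loop, but gives up the cpu between retries */
static int uv_call_sched(unsigned long r1, unsigned long r2)
{
	int cc;

	do {
		cc = __uv_call(r1, r2);
		/* cond_resched() would go here in kernel context */
	} while (cc > 1);
	return cc;
}
```

In the kernel, the comment in `uv_call_sched()` would be an actual `cond_resched()` call; everything else maps one-to-one onto the suggestion above.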
> }
> 
> static inline int uv_call_sched(unsigned long r1, unsigned long r2)
> {
> 	int rc;
> 
> 	do {
> 		cc = __uv_call(unsigned long r1, unsigned long r2);
> 		cond_resched();
> 	} while (rc > 1)
> 	return rc;
> }

> 
>> +		if (need_resched())
>> +			schedule();
> 
> cond_resched();

ack

> 
>> +	} while (cc > 1);
>> +	return cc;
>> +}
>> +
>> +/*
>> + * Low level uv_call that takes r1 and r2 as parameter
>> + */
> 
> This "r1 and r2" does not sound like relevant news. Same for the other
> variant above.
> 
>>  static inline int uv_call(unsigned long r1, unsigned long r2)
>>  {
>>  	int cc;
>> @@ -113,6 +152,26 @@ static inline int uv_call(unsigned long r1, unsigned long r2)
>>  	return cc;
>>  }
>>  
>> +/*
>> + * special variant of uv_call that only transports the cpu or guest
>> + * handle and the command, like destroy or verify.
>> + */
>> +static inline int uv_cmd_nodata(u64 handle, u16 cmd, u32 *ret)
> 
> uv_call_sched_simple() ?

I think "nodata" actually is the better description.
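As a sketch of what such a "nodata" wrapper could look like on top of `uv_call_sched()`: the struct layout mirrors `struct uv_cb_nodata` from the patch, but the low level call is mocked so this runs on any host, the field types are simplified (the real prototype uses `u64`/`u16`/`u32`), and the 0x0100 success rc is an assumption of the mock only.

```c
#include <assert.h>
#include <string.h>

struct uv_cb_header {
	unsigned short len;
	unsigned short cmd;	/* command code */
	unsigned short rc;	/* response code */
	unsigned short rrc;	/* return reason code */
};

/* control block for UV calls that take no payload (destroy, verify) */
struct uv_cb_nodata {
	struct uv_cb_header header;
	unsigned long reserved08[2];
	unsigned long handle;	/* cpu or configuration handle */
	unsigned long reserved20[4];
};

/* mock of the schedule-aware low level call */
static int uv_call_sched(unsigned long r1, unsigned long r2)
{
	struct uv_cb_nodata *uvcb = (struct uv_cb_nodata *)r2;

	(void)r1;
	uvcb->header.rc = 0x0100;	/* mocked: always succeeds */
	return 0;			/* cc 0 */
}

/* wrapper for UV calls that carry only a handle and a command */
static int uv_cmd_nodata(unsigned long handle, unsigned short cmd,
			 unsigned short *rc)
{
	struct uv_cb_nodata uvcb;
	int cc;

	memset(&uvcb, 0, sizeof(uvcb));
	uvcb.header.len = sizeof(uvcb);
	uvcb.header.cmd = cmd;
	uvcb.handle = handle;
	cc = uv_call_sched(0, (unsigned long)&uvcb);
	*rc = uvcb.header.rc;
	return cc ? -1 : 0;
}

/* tiny self-check helper for the sketch */
static int uv_cmd_nodata_demo(void)
{
	unsigned short rc = 0;

	if (uv_cmd_nodata(0x1234, 0x5678, &rc))
		return -1;
	return rc;
}
```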

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH 05/35] s390/mm: provide memory management functions for protected KVM guests
  2020-02-14 17:59   ` David Hildenbrand
@ 2020-02-14 21:17     ` Christian Borntraeger
  0 siblings, 0 replies; 147+ messages in thread
From: Christian Borntraeger @ 2020-02-14 21:17 UTC (permalink / raw)
  To: David Hildenbrand, Janosch Frank
  Cc: KVM, Cornelia Huck, Thomas Huth, Ulrich Weigand,
	Claudio Imbrenda, Andrea Arcangeli, linux-s390, Michael Mueller,
	Vasily Gorbik, linux-mm, Andrew Morton

In general this patch has changed a lot, but several comments still apply

On 14.02.20 18:59, David Hildenbrand wrote:
>>  
>>  /*
>> @@ -1086,12 +1106,16 @@ static inline pte_t ptep_get_and_clear_full(struct mm_struct *mm,
>>  					    unsigned long addr,
>>  					    pte_t *ptep, int full)
>>  {
>> +	pte_t res;
> 
> Empty line missing.

ack

> 
>>  	if (full) {
>> -		pte_t pte = *ptep;
>> +		res = *ptep;
>>  		*ptep = __pte(_PAGE_INVALID);
>> -		return pte;
>> +	} else {
>> +		res = ptep_xchg_lazy(mm, addr, ptep, __pte(_PAGE_INVALID));
>>  	}
>> -	return ptep_xchg_lazy(mm, addr, ptep, __pte(_PAGE_INVALID));
>> +	if (mm_is_protected(mm) && pte_present(res))
>> +		uv_convert_from_secure(pte_val(res) & PAGE_MASK);
>> +	return res;
>>  }
> 
> [...]
> 
>> +int uv_make_secure(struct gmap *gmap, unsigned long gaddr, void *uvcb);
>> +int uv_convert_from_secure(unsigned long paddr);
>> +
>> +static inline int uv_convert_to_secure(struct gmap *gmap, unsigned long gaddr)
>> +{
>> +	struct uv_cb_cts uvcb = {
>> +		.header.cmd = UVC_CMD_CONV_TO_SEC_STOR,
>> +		.header.len = sizeof(uvcb),
>> +		.guest_handle = gmap->guest_handle,
>> +		.gaddr = gaddr,
>> +	};
>> +
>> +	return uv_make_secure(gmap, gaddr, &uvcb);
>> +}
> 
> I'd actually suggest to name everything that eats a gmap "gmap_",
> 
> e.g., "gmap_make_secure()"
> 
> [...]

ack.

> 
>>  
>>  #if defined(CONFIG_PROTECTED_VIRTUALIZATION_GUEST) ||                          \
>> diff --git a/arch/s390/kernel/uv.c b/arch/s390/kernel/uv.c
>> index a06a628a88da..15ac598a3d8d 100644
>> --- a/arch/s390/kernel/uv.c
>> +++ b/arch/s390/kernel/uv.c
>> @@ -9,6 +9,8 @@
>>  #include <linux/sizes.h>
>>  #include <linux/bitmap.h>
>>  #include <linux/memblock.h>
>> +#include <linux/pagemap.h>
>> +#include <linux/swap.h>
>>  #include <asm/facility.h>
>>  #include <asm/sections.h>
>>  #include <asm/uv.h>
>> @@ -99,4 +101,174 @@ void adjust_to_uv_max(unsigned long *vmax)
>>  	if (prot_virt_host && *vmax > uv_info.max_sec_stor_addr)
>>  		*vmax = uv_info.max_sec_stor_addr;
>>  }
>> +
>> +static int __uv_pin_shared(unsigned long paddr)
>> +{
>> +	struct uv_cb_cfs uvcb = {
>> +		.header.cmd	= UVC_CMD_PIN_PAGE_SHARED,
>> +		.header.len	= sizeof(uvcb),
>> +		.paddr		= paddr,
> 
> please drop all the superfluous spaces (just as in the other uv calls).

ack

> 
>> +	};
>> +
>> +	if (uv_call(0, (u64)&uvcb))
>> +		return -EINVAL;
>> +	return 0;
>> +}
> 
> [...]
> 
>> +static int make_secure_pte(pte_t *ptep, unsigned long addr, void *data)
>> +{
>> +	struct conv_params *params = data;
>> +	pte_t entry = READ_ONCE(*ptep);
>> +	struct page *page;
>> +	int expected, rc = 0;
>> +
>> +	if (!pte_present(entry))
>> +		return -ENXIO;
>> +	if (pte_val(entry) & (_PAGE_INVALID | _PAGE_PROTECT))
>> +		return -ENXIO;
>> +
>> +	page = pte_page(entry);
>> +	if (page != params->page)
>> +		return -ENXIO;
>> +
>> +	if (PageWriteback(page))
>> +		return -EAGAIN;
>> +	expected = expected_page_refs(page);
> 
> I do wonder if we could factor out expected_page_refs() and reuse from
> other sources ...
> 
> I do wonder about huge page backing of guests, and especially
> hpage_nr_pages(page) used in mm/migrate.c:expected_page_refs(). But I
> can spot some hugepage exclusion below ... This needs comments.

Yes, we looked into several places, and ALL of them do their own math with
their own side conditions. There is no single function that accounts for all
possible conditions, and I am not going to start writing one now given the
review bandwidth of the mm tree.

I will add the following to expected_page_refs():

/*
 * Calculate the expected ref_count for a page that would otherwise have no
 * further pins. This was cribbed from similar functions in other places in
 * the kernel, but with some slight modifications. We know, for example,
 * that a secure page cannot be a huge page.
 */

and a comment to the hugetlb check as well.
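For illustration, a host-runnable sketch of the expected-refcount math discussed here: one reference per mapping, plus one for the swap cache, or one for the page cache plus one more if private data (e.g. buffer heads) is attached. `struct mock_page` and its fields are stand-ins for the real `struct page` helpers (`page_mapcount()`, `PageSwapCache()`, `page_mapping()`, `page_has_private()`) and are assumptions of this sketch.

```c
#include <assert.h>
#include <stddef.h>

struct mock_page {
	int mapcount;		/* page_mapcount() */
	int swapcache;		/* PageSwapCache() */
	void *mapping;		/* page_mapping() */
	int has_private;	/* page_has_private() */
};

static int expected_page_refs(const struct mock_page *page)
{
	int res = page->mapcount;

	if (page->swapcache) {
		res++;			/* the swap cache holds one reference */
	} else if (page->mapping) {
		res++;			/* the page cache holds one reference */
		if (page->has_private)
			res++;		/* e.g. attached buffer heads */
	}
	return res;
}

/* an anonymous page, mapped twice, sitting in the swap cache */
static int demo_anon_swapcache(void)
{
	struct mock_page p = { 2, 1, NULL, 0 };

	return expected_page_refs(&p);
}

/* a file-backed page, mapped once, with private data attached */
static int demo_file_private(void)
{
	struct mock_page p = { 1, 0, (void *)1, 1 };

	return expected_page_refs(&p);
}
```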




> 
>> +	if (!page_ref_freeze(page, expected))
>> +		return -EBUSY;
>> +	set_bit(PG_arch_1, &page->flags);
> 
> Can we please document somewhere how PG_arch_1 is used on s390x? (page)
> 
> "The generic code guarantees that this bit is cleared for a page when it
> first is entered into the page cache" - should not be an issue, right?

Right
> 
>> +	rc = uv_call(0, (u64)params->uvcb);
>> +	page_ref_unfreeze(page, expected);
>> +	if (rc)
>> +		rc = (params->uvcb->rc == 0x10a) ? -ENXIO : -EINVAL;
>> +	return rc;
>> +}
>> +
>> +/*
>> + * Requests the Ultravisor to make a page accessible to a guest.
>> + * If it's brought in the first time, it will be cleared. If
>> + * it has been exported before, it will be decrypted and integrity
>> + * checked.
>> + *
>> + * @gmap: Guest mapping
>> + * @gaddr: Guest 2 absolute address to be imported
> 
> I'd just drop the the (incomplete) parameter documentation, everybody
> reaching this point should now what a gmap and what a gaddr is ...

ack.
> 
>> + */
>> +int uv_make_secure(struct gmap *gmap, unsigned long gaddr, void *uvcb)
>> +{
>> +	struct conv_params params = { .uvcb = uvcb };
>> +	struct vm_area_struct *vma;
>> +	unsigned long uaddr;
>> +	int rc, local_drain = 0;
>> +
>> +again:
>> +	rc = -EFAULT;
>> +	down_read(&gmap->mm->mmap_sem);
>> +
>> +	uaddr = __gmap_translate(gmap, gaddr);
>> +	if (IS_ERR_VALUE(uaddr))
>> +		goto out;
>> +	vma = find_vma(gmap->mm, uaddr);
>> +	if (!vma)
>> +		goto out;
>> +	if (is_vm_hugetlb_page(vma))
>> +		goto out;
> 
> Hah there it is! How is it enforced on upper layers/excluded? Will
> hpage=true fail with prot virt? What if a guest is not a protected guest
> but wants to sue huge pages? This needs comments/patch description.

will add

        /*
         * Secure pages cannot be huge, and userspace should not combine
         * both. In case userspace does it anyway, this will result in an
         * -EFAULT for the unpack; the guest thus never reaches secure
         * mode. If userspace plays dirty tricks and maps huge pages later
         * on, this will result in a segmentation fault.
         */


> 
>> +
>> +	rc = -ENXIO;
>> +	params.page = follow_page(vma, uaddr, FOLL_WRITE | FOLL_NOWAIT);
>> +	if (IS_ERR_OR_NULL(params.page))
>> +		goto out;
>> +
>> +	lock_page(params.page);
>> +	rc = apply_to_page_range(gmap->mm, uaddr, PAGE_SIZE, make_secure_pte, &params);
> 
> Ehm, isn't it just always a single page?

Yes, already fixed.

> 
>> +	unlock_page(params.page);
>> +out:
>> +	up_read(&gmap->mm->mmap_sem);
>> +
>> +	if (rc == -EBUSY) {
>> +		if (local_drain) {
>> +			lru_add_drain_all();
>> +			return -EAGAIN;
>> +		}
>> +		lru_add_drain();
> 
> comments please why that is performed.

done

> 
>> +		local_drain = 1;
[..]

>> +
>> +	if (PageHuge(page))
>> +		return 0;
> 
> Ah, another instance. Comment please why
> 
>> +
>> +	if (!test_bit(PG_arch_1, &page->flags))
>> +		return 0;
> 
> "Can you describe the meaning of this bit with three words"? Or a couple
> more? :D
> 
> "once upon a time, the page was secure and still might be" ?
> "the page is secure and therefore inaccessible" ?


        /*
         * PG_arch_1 is used in 3 places:
         * 1. for kernel page tables during early boot
         * 2. for storage keys of huge pages and KVM
         * 3. as an indication that this page might be secure. This can
         *    over-indicate, e.g. we set the bit before calling
         *    convert_to_secure.
         * As secure pages are never huge, all 3 variants can co-exist.
         */

> 
>> +
>> +	rc = __uv_pin_shared(page_to_phys(page));
>> +	if (!rc) {
>> +		clear_bit(PG_arch_1, &page->flags);
>> +		return 0;
>> +	}
>> +
>> +	rc = uv_convert_from_secure(page_to_phys(page));
>> +	if (!rc) {
>> +		clear_bit(PG_arch_1, &page->flags);
>> +		return 0;
>> +	}
>> +
>> +	return rc;
>> +}
>> +EXPORT_SYMBOL_GPL(arch_make_page_accessible);
>> +
>>  #endif
>>
> 
> More code comments would be highly appreciated!
> 
done
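The pin-first/export-second fallback quoted above can be sketched in a host-runnable form like this. Both UV calls are mocked, and a bare flag word stands in for `page->flags`; these stubs and the `demo()` helper are assumptions of the sketch, not the actual kernel code.

```c
#include <assert.h>

#define PG_ARCH_1	(1UL << 0)	/* "page might be secure" indication */

int pin_shared_ok;	/* mock switch: does UVC_CMD_PIN_PAGE_SHARED succeed? */

/* mock: pin the page shared so I/O can proceed without exporting it */
static int __uv_pin_shared(unsigned long paddr)
{
	(void)paddr;
	return pin_shared_ok ? 0 : -1;
}

/* mock: full export (encrypt in place, mark no longer secure) */
static int uv_convert_from_secure(unsigned long paddr)
{
	(void)paddr;
	return 0;	/* mocked: the export always succeeds */
}

static int make_page_accessible(unsigned long *flags, unsigned long paddr)
{
	int rc;

	if (!(*flags & PG_ARCH_1))
		return 0;	/* page was never secure, nothing to do */

	rc = __uv_pin_shared(paddr);	/* cheap path first */
	if (!rc) {
		*flags &= ~PG_ARCH_1;
		return 0;
	}

	rc = uv_convert_from_secure(paddr);	/* expensive: full export */
	if (!rc)
		*flags &= ~PG_ARCH_1;
	return rc;
}

/* run one scenario; returns the remaining flags, or -1 on error */
static long demo(int pin_ok)
{
	unsigned long flags = PG_ARCH_1;

	pin_shared_ok = pin_ok;
	if (make_page_accessible(&flags, 0))
		return -1;
	return (long)flags;
}
```

Either path ends with PG_arch_1 cleared, matching the over-indication semantics described in the comment: the bit may be set on a page that turns out not to need any UV action.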

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH 08/35] KVM: s390: protvirt: Add initial lifecycle handling
  2020-02-14 18:39   ` [PATCH 08/35] KVM: s390: protvirt: Add initial lifecycle handling David Hildenbrand
@ 2020-02-14 21:22     ` Christian Borntraeger
  0 siblings, 0 replies; 147+ messages in thread
From: Christian Borntraeger @ 2020-02-14 21:22 UTC (permalink / raw)
  To: David Hildenbrand, Janosch Frank
  Cc: KVM, Cornelia Huck, Thomas Huth, Ulrich Weigand,
	Claudio Imbrenda, Andrea Arcangeli, linux-s390, Michael Mueller,
	Vasily Gorbik, Janosch Frank



On 14.02.20 19:39, David Hildenbrand wrote:
> On 07.02.20 12:39, Christian Borntraeger wrote:
>> From: Janosch Frank <frankja@linux.ibm.com>
>>
>> This contains 3 main changes:
>> 1. changes in SIE control block handling for secure guests
>> 2. helper functions for create/destroy/unpack secure guests
>> 3. KVM_S390_PV_COMMAND ioctl to allow userspace dealing with secure
>> machines
[...]
> 
> I feel like my review comments for this patch were lost, so not
> repeating them

Basically you only asked if we could combine vm/vcpu_create/destroy into
enable/disable. Janosch came up with some cases regarding error handling
where enable/disable would be hard to get right, and exposing the
individual interfaces provides some advantages.

If you still want to go down that path, please look at the next round of
kernel/qemu patches and then let's discuss.

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH 01/35] mm:gup/writeback: add callbacks for inaccessible pages
  2020-02-13 20:13         ` Christian Borntraeger
@ 2020-02-17 20:55           ` Tom Lendacky
  -1 siblings, 0 replies; 147+ messages in thread
From: Tom Lendacky @ 2020-02-17 20:55 UTC (permalink / raw)
  To: Christian Borntraeger, Sean Christopherson
  Cc: Janosch Frank, Andrew Morton, Marc Zyngier, KVM, Cornelia Huck,
	David Hildenbrand, Thomas Huth, Ulrich Weigand, Claudio Imbrenda,
	Andrea Arcangeli, linux-s390, Michael Mueller, Vasily Gorbik,
	linux-mm, kvm-ppc, Paolo Bonzini

On 2/13/20 2:13 PM, Christian Borntraeger wrote:
> 
> 
> On 13.02.20 20:56, Sean Christopherson wrote:
>> On Mon, Feb 10, 2020 at 06:27:04PM +0100, Christian Borntraeger wrote:
>>> CC Marc Zyngier for KVM on ARM.  Marc, see below. Will there be any
>>> use for this on KVM/ARM in the future?
>>>
>>> CC Sean Christopherson/Tom Lendacky. Any obvious use case for Intel/AMD
>>> to have a callback before a page is used for I/O?

From an SEV-SNP perspective, I don't think so. The SEV-SNP architecture
uses page states and having the hypervisor change the state from beneath
the guest might trigger the guest into thinking it's being attacked vs
just allowing the I/O to fail. Is this a concern with flooding the console
with I/O error messages?

>>
>> Yes?
>>
>>> Andrew (or other mm people) any chance to get an ACK for this change?
>>> I could then carry that via s390 or KVM tree. Or if you want to carry
>>> that yourself I can send an updated version (we need to kind of 
>>> synchronize that Linus will pull the KVM changes after the mm changes).
>>>
>>> Andrea asked if others would benefit from this, so here are some more
>>> information about this (and I can also put this into the patch
>>> description).  So we have talked to the POWER folks. They do not use
>>> the standard normal memory management, instead they have a hard split
>>> between secure and normal memory. The secure memory  is the handled by
>>> the hypervisor as device memory and the ultravisor and the hypervisor
>>> move this forth and back when needed.
>>>
>>> On s390 there is no *separate* pool of physical pages that are secure.
>>> Instead, *any* physical page can be marked as secure or not, by
>>> setting a bit in a per-page data structure that hardware uses to stop
>>> unauthorized access.  (That bit is under control of the ultravisor.)
>>>
>>> Note that one side effect of this strategy is that the decision
>>> *which* secure pages to encrypt and then swap out is actually done by
>>> the hypervisor, not the ultravisor.  In our case, the hypervisor is
>>> Linux/KVM, so we're using the regular Linux memory management scheme
>>> (active/inactive LRU lists etc.) to make this decision.  The advantage
>>> is that the Ultravisor code does not need to itself implement any
>>> memory management code, making it a lot simpler.
>>
>> Disclaimer: I'm not familiar with s390 guest page faults or UV.  I tried
>> to give myself a crash course, apologies if I'm way out in left field...
>>
>> AIUI, pages will first be added to a secure guest by converting a normal,
>> non-secure page to secure and stuffing it into the guest page tables.  To
>> swap a page from a secure guest, arch_make_page_accessible() will be called
>> to encrypt the page in place so that it can be accessed by the untrusted
>> kernel/VMM and written out to disk.  And to fault the page back in, on s390
>> a secure guest access to a non-secure page will generate a page fault with
>> a dedicated type.  That fault routes directly to
>> do_non_secure_storage_access(), which converts the page to secure and thus
>> makes it re-accessible to the guest.
>>
>> That all sounds sane and usable for Intel.
>>
>> My big question is the follow/get flows, more on that below.
>>
>>> However, in the end this is why we need the hook into Linux memory
>>> management: once Linux has decided to swap a page out, we need to get
>>> a chance to tell the Ultravisor to "export" the page (i.e., encrypt
>>> its contents and mark it no longer secure).
>>>
>>> As outlined below this should be a no-op for anybody not opting in.
>>>
>>> Christian                                   
>>>
>>> On 07.02.20 12:39, Christian Borntraeger wrote:
>>>> From: Claudio Imbrenda <imbrenda@linux.ibm.com>
>>>>
>>>> With the introduction of protected KVM guests on s390 there is now a
>>>> concept of inaccessible pages. These pages need to be made accessible
>>>> before the host can access them.
>>>>
>>>> While cpu accesses will trigger a fault that can be resolved, I/O
>>>> accesses will just fail.  We need to add a callback into architecture
>>>> code for places that will do I/O, namely when writeback is started or
>>>> when a page reference is taken.
>>>>
>>>> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
>>>> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
>>>> ---
>>>>  include/linux/gfp.h | 6 ++++++
>>>>  mm/gup.c            | 2 ++
>>>>  mm/page-writeback.c | 1 +
>>>>  3 files changed, 9 insertions(+)
>>>>
>>>> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
>>>> index e5b817cb86e7..be2754841369 100644
>>>> --- a/include/linux/gfp.h
>>>> +++ b/include/linux/gfp.h
>>>> @@ -485,6 +485,12 @@ static inline void arch_free_page(struct page *page, int order) { }
>>>>  #ifndef HAVE_ARCH_ALLOC_PAGE
>>>>  static inline void arch_alloc_page(struct page *page, int order) { }
>>>>  #endif
>>>> +#ifndef HAVE_ARCH_MAKE_PAGE_ACCESSIBLE
>>>> +static inline int arch_make_page_accessible(struct page *page)
>>>> +{
>>>> +	return 0;
>>>> +}
>>>> +#endif
>>>>  
>>>>  struct page *
>>>>  __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
>>>> diff --git a/mm/gup.c b/mm/gup.c
>>>> index 7646bf993b25..a01262cd2821 100644
>>>> --- a/mm/gup.c
>>>> +++ b/mm/gup.c
>>>> @@ -257,6 +257,7 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
>>>>  			page = ERR_PTR(-ENOMEM);
>>>>  			goto out;
>>>>  		}
>>>> +		arch_make_page_accessible(page);
>>
>> As Will pointed out, the return value definitely needs to be checked, there
>> will undoubtedly be scenarios where the page cannot be made accessible.
> 
> Actually on s390 this should always succeed unless we have a bug.
> 
> But we can certainly provide a variant of that patch that does check the
> return value: proper error handling for gup, and a WARN_ON for
> page-writeback.
>>
>> What is the use case for calling arch_make_page_accessible() in the follow()
>> and gup() paths?  Live migration is the only thing that comes to mind, and
>> for live migration I would expect you would want to keep the secure guest
>> running when copying pages to the target, i.e. use pre-copy.  That would
>> conflict with converting the page in place.  Rather, migration would use a
>> separate dedicated path to copy the encrypted contents of the secure page to
>> a completely different page, and send *that* across the wire so that the
>> guest can continue accessing the original page.
>> Am I missing a need to do this for the swap/reclaim case?  Or is there a
>> completely different use case I'm overlooking?
> 
> This is actually to protect the host against malicious user space. For
> example, a bad QEMU could simply start direct I/O on such protected memory.
> We do not want userspace to be able to trigger I/O errors, and thus we
> implemented the following logic: whenever somebody takes a reference to
> such a page (gup) or starts I/O on it, make sure the page can be accessed.
> When the guest then tries to access that page, we wait in the page fault
> handler for writeback to have finished and for the page_ref to reach the
> expected value.

So in this case, when the guest tries to access the page, the page may now
be corrupted because I/O was allowed to be done to it? Or will the I/O
have been blocked in some way, but without generating the I/O error?

Thanks,
Tom

> 
> 
> 
>>
>> Tangentially related, hooks here could be quite useful for sanity checking
>> the kernel/KVM and/or debugging kernel/KVM bugs.  Would it make sense to
>> pass a param to arch_make_page_accessible() to provide some information as
>> to why the page needs to be made accessible?
> 
> Some kind of enum that can be used optionally to optimize things?
> 
>>
>>>>  	}
>>>>  	if (flags & FOLL_TOUCH) {
>>>>  		if ((flags & FOLL_WRITE) &&
>>>> @@ -1870,6 +1871,7 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
>>>>  
>>>>  		VM_BUG_ON_PAGE(compound_head(page) != head, page);
>>>>  
>>>> +		arch_make_page_accessible(page);
>>>>  		SetPageReferenced(page);
>>>>  		pages[*nr] = page;
>>>>  		(*nr)++;
>>>> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
>>>> index 2caf780a42e7..0f0bd14571b1 100644
>>>> --- a/mm/page-writeback.c
>>>> +++ b/mm/page-writeback.c
>>>> @@ -2806,6 +2806,7 @@ int __test_set_page_writeback(struct page *page, bool keep_write)
>>>>  		inc_lruvec_page_state(page, NR_WRITEBACK);
>>>>  		inc_zone_page_state(page, NR_ZONE_WRITE_PENDING);
>>>>  	}
>>>> +	arch_make_page_accessible(page);
>>>>  	unlock_page_memcg(page);
>>>
>>> As outlined by Ulrich, we can move the callback after the unlock.
>>>
>>>>  	return ret;
>>>>  
>>>>
>>>
> 

^ permalink raw reply	[flat|nested] 147+ messages in thread


* Re: [PATCH 01/35] mm:gup/writeback: add callbacks for inaccessible pages
  2020-02-17 20:55           ` Tom Lendacky
@ 2020-02-17 21:14             ` Christian Borntraeger
  -1 siblings, 0 replies; 147+ messages in thread
From: Christian Borntraeger @ 2020-02-17 21:14 UTC (permalink / raw)
  To: Tom Lendacky, Sean Christopherson
  Cc: Janosch Frank, Andrew Morton, Marc Zyngier, KVM, Cornelia Huck,
	David Hildenbrand, Thomas Huth, Ulrich Weigand, Claudio Imbrenda,
	Andrea Arcangeli, linux-s390, Michael Mueller, Vasily Gorbik,
	linux-mm, kvm-ppc, Paolo Bonzini



On 17.02.20 21:55, Tom Lendacky wrote:
[...]

>>> What is the use case for calling arch_make_page_accessible() in the follow()
>>> and gup() paths?  Live migration is the only thing that comes to mind, and
>>> for live migration I would expect you would want to keep the secure guest
>>> running when copying pages to the target, i.e. use pre-copy.  That would
>>> conflict with converting the page in place.  Rather, migration would use a
>>> separate dedicated path to copy the encrypted contents of the secure page to
>>> a completely different page, and send *that* across the wire so that the
>>> guest can continue accessing the original page.
>>> Am I missing a need to do this for the swap/reclaim case?  Or is there a
>>> completely different use case I'm overlooking?
>>
>> This is actually to protect the host against a malicious user space. For
>> example, a bad QEMU could simply start direct I/O on such protected memory.
>> We do not want userspace to be able to trigger I/O errors, and thus we
>> implemented the logic: "whenever somebody accesses that page (gup) or
>> does I/O on it, make sure that this page can be accessed". When the guest
>> tries to access that page, we will wait in the page fault handler for
>> writeback to have finished and for the page_ref to be the expected value.
> 
> So in this case, when the guest tries to access the page, the page may now
> be corrupted because I/O was allowed to be done to it? Or will the I/O
> have been blocked in some way, but without generating the I/O error?

No, the I/O would be blocked by the hardware. That's why we encrypt and export
the page for I/O usage. As soon as the refcount drops to the expected value,
the guest can access its (unchanged) content after the import. The import
would check the hash etc., so there is no corruption of the guest state in
any case (apart from denial of service, which is always possible).
If we did not have these hooks, a malicious user could trigger I/O (which
would be blocked), but the blocked I/O would generate an I/O error, and this
could bring trouble to some device drivers. We want to avoid that.

In other words: the hardware/firmware will ensure guest integrity. But host
integrity (kernel vs. userspace) must be enforced by the host kernel as
usual, and this is one part of it.

But thanks for the clarification that you do not need those hooks.

^ permalink raw reply	[flat|nested] 147+ messages in thread


* RE: [PATCH 01/35] mm:gup/writeback: add callbacks for inaccessible pages
  2020-02-07 11:39 ` [PATCH 01/35] mm:gup/writeback: add callbacks for inaccessible pages Christian Borntraeger
  2020-02-10 17:27     ` Christian Borntraeger
  2020-02-10 18:17   ` David Hildenbrand
@ 2020-02-18  3:36   ` Tian, Kevin
  2020-02-18  6:44     ` Christian Borntraeger
  2 siblings, 1 reply; 147+ messages in thread
From: Tian, Kevin @ 2020-02-18  3:36 UTC (permalink / raw)
  To: Christian Borntraeger, Janosch Frank
  Cc: KVM, Cornelia Huck, David Hildenbrand, Thomas Huth,
	Ulrich Weigand, Claudio Imbrenda, Andrea Arcangeli, linux-s390,
	Michael Mueller, Vasily Gorbik, linux-mm, Andrew Morton

> From: Christian Borntraeger
> Sent: Friday, February 7, 2020 7:39 PM
> 
> From: Claudio Imbrenda <imbrenda@linux.ibm.com>
> 
> With the introduction of protected KVM guests on s390 there is now a
> concept of inaccessible pages. These pages need to be made accessible
> before the host can access them.
> 
> While cpu accesses will trigger a fault that can be resolved, I/O
> accesses will just fail.  We need to add a callback into architecture
> code for places that will do I/O, namely when writeback is started or
> when a page reference is taken.

What about hooking the callback to DMA API ops?

> 
> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
> ---
>  include/linux/gfp.h | 6 ++++++
>  mm/gup.c            | 2 ++
>  mm/page-writeback.c | 1 +
>  3 files changed, 9 insertions(+)
> 
> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> index e5b817cb86e7..be2754841369 100644
> --- a/include/linux/gfp.h
> +++ b/include/linux/gfp.h
> @@ -485,6 +485,12 @@ static inline void arch_free_page(struct page *page,
> int order) { }
>  #ifndef HAVE_ARCH_ALLOC_PAGE
>  static inline void arch_alloc_page(struct page *page, int order) { }
>  #endif
> +#ifndef HAVE_ARCH_MAKE_PAGE_ACCESSIBLE
> +static inline int arch_make_page_accessible(struct page *page)
> +{
> +	return 0;
> +}
> +#endif
> 
>  struct page *
>  __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int
> preferred_nid,
> diff --git a/mm/gup.c b/mm/gup.c
> index 7646bf993b25..a01262cd2821 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -257,6 +257,7 @@ static struct page *follow_page_pte(struct
> vm_area_struct *vma,
>  			page = ERR_PTR(-ENOMEM);
>  			goto out;
>  		}
> +		arch_make_page_accessible(page);
>  	}
>  	if (flags & FOLL_TOUCH) {
>  		if ((flags & FOLL_WRITE) &&
> @@ -1870,6 +1871,7 @@ static int gup_pte_range(pmd_t pmd, unsigned
> long addr, unsigned long end,
> 
>  		VM_BUG_ON_PAGE(compound_head(page) != head, page);
> 
> +		arch_make_page_accessible(page);
>  		SetPageReferenced(page);
>  		pages[*nr] = page;
>  		(*nr)++;
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index 2caf780a42e7..0f0bd14571b1 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -2806,6 +2806,7 @@ int __test_set_page_writeback(struct page *page,
> bool keep_write)
>  		inc_lruvec_page_state(page, NR_WRITEBACK);
>  		inc_zone_page_state(page, NR_ZONE_WRITE_PENDING);
>  	}
> +	arch_make_page_accessible(page);
>  	unlock_page_memcg(page);
>  	return ret;
> 
> --
> 2.24.0

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH 01/35] mm:gup/writeback: add callbacks for inaccessible pages
  2020-02-18  3:36   ` Tian, Kevin
@ 2020-02-18  6:44     ` Christian Borntraeger
  0 siblings, 0 replies; 147+ messages in thread
From: Christian Borntraeger @ 2020-02-18  6:44 UTC (permalink / raw)
  To: Tian, Kevin, Janosch Frank
  Cc: KVM, Cornelia Huck, David Hildenbrand, Thomas Huth,
	Ulrich Weigand, Claudio Imbrenda, Andrea Arcangeli, linux-s390,
	Michael Mueller, Vasily Gorbik, linux-mm, Andrew Morton


On 18.02.20 04:36, Tian, Kevin wrote:
>> From: Christian Borntraeger
>> Sent: Friday, February 7, 2020 7:39 PM
>>
>> From: Claudio Imbrenda <imbrenda@linux.ibm.com>
>>
>> With the introduction of protected KVM guests on s390 there is now a
>> concept of inaccessible pages. These pages need to be made accessible
>> before the host can access them.
>>
>> While cpu accesses will trigger a fault that can be resolved, I/O
>> accesses will just fail.  We need to add a callback into architecture
>> code for places that will do I/O, namely when writeback is started or
>> when a page reference is taken.
> 
> What about hooking the callback to DMA API ops?

Not all device drivers use the DMA API, so it won't work for us.

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [PATCH 01/35] mm:gup/writeback: add callbacks for inaccessible pages
  2020-02-13 14:48         ` Christian Borntraeger
  (?)
@ 2020-02-18 16:02         ` Will Deacon
  -1 siblings, 0 replies; 147+ messages in thread
From: Will Deacon @ 2020-02-18 16:02 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Janosch Frank, Andrew Morton, Marc Zyngier, Sean Christopherson,
	Tom Lendacky, KVM, Cornelia Huck, David Hildenbrand, Thomas Huth,
	Ulrich Weigand, Claudio Imbrenda, Andrea Arcangeli, linux-s390,
	Michael Mueller, Vasily Gorbik, linux-mm, kvm-ppc, Paolo Bonzini,
	mark.rutland, qperret, palmerdabbelt

On Thu, Feb 13, 2020 at 03:48:16PM +0100, Christian Borntraeger wrote:
> 
> 
> On 11.02.20 12:26, Will Deacon wrote:
> > On Mon, Feb 10, 2020 at 06:27:04PM +0100, Christian Borntraeger wrote:
> >> CC Marc Zyngier for KVM on ARM.  Marc, see below. Will there be any
> >> use for this on KVM/ARM in the future?
> > 
> > I can't speak for Marc, but I can say that we're interested in something
> > like this for potentially isolating VMs from a KVM host in Android.
> > However, we've currently been working on the assumption that the memory
> > removed from the host won't usually be touched by the host (i.e. no
> > KSM or swapping out), so all we'd probably want at the moment is to be
> > able to return an error back from arch_make_page_accessible(). Its return
> > code is ignored in this patch :/
> 
> I think there are two ways at the moment. One is to keep the memory away from
> Linux, e.g. by using the memory as device driver memory, like kmalloc. This is
> kind of what Power does. And I understand that you want to follow that model
> and do not want to use paging, file backing and so on.

Correct.

> Our approach tries to fully integrate into the existing Linux LRU methods.
> 
> Back to your approach. What happens when a malicious QEMU starts direct I/O
> on such isolated memory? Is that what you meant by adding error checking in
> these hooks? For the gup.c code, returning an error seems straightforward.

Yes, it would be nice if the host could avoid even trying to access the
page if it's inaccessible and so returning an error from
arch_make_page_accessible() would be a good way to achieve that. If the
access goes ahead anyway, then the hypervisor will have to handle the
fault and effectively ignore the host access (writes will be lost, reads
will return poison).

> I have no idea what to do in writeback. When somebody manages to trigger
> writeback on such a page, it already seems too late.

For now, we could just have a BUG_ON().

Will

^ permalink raw reply	[flat|nested] 147+ messages in thread

end of thread, other threads:[~2020-02-18 16:02 UTC | newest]

Thread overview: 147+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-07 11:39 [PATCH 00/35] KVM: s390: Add support for protected VMs Christian Borntraeger
2020-02-07 11:39 ` [PATCH 01/35] mm:gup/writeback: add callbacks for inaccessible pages Christian Borntraeger
2020-02-10 17:27   ` Christian Borntraeger
2020-02-10 17:27     ` Christian Borntraeger
2020-02-11 11:26     ` Will Deacon
2020-02-11 11:43       ` Christian Borntraeger
2020-02-11 11:43         ` Christian Borntraeger
2020-02-13 14:48       ` Christian Borntraeger
2020-02-13 14:48         ` Christian Borntraeger
2020-02-18 16:02         ` Will Deacon
2020-02-13 19:56     ` Sean Christopherson
2020-02-13 19:56       ` Sean Christopherson
2020-02-13 20:13       ` Christian Borntraeger
2020-02-13 20:13         ` Christian Borntraeger
2020-02-13 20:46         ` Sean Christopherson
2020-02-13 20:46           ` Sean Christopherson
2020-02-17 20:55         ` Tom Lendacky
2020-02-17 20:55           ` Tom Lendacky
2020-02-17 21:14           ` Christian Borntraeger
2020-02-17 21:14             ` Christian Borntraeger
2020-02-10 18:17   ` David Hildenbrand
2020-02-10 18:28     ` Christian Borntraeger
2020-02-10 18:43       ` David Hildenbrand
2020-02-10 18:51         ` Christian Borntraeger
2020-02-18  3:36   ` Tian, Kevin
2020-02-18  6:44     ` Christian Borntraeger
2020-02-07 11:39 ` [PATCH 02/35] KVM: s390/interrupt: do not pin adapter interrupt pages Christian Borntraeger
2020-02-10 12:26   ` David Hildenbrand
2020-02-10 18:38     ` Christian Borntraeger
2020-02-10 19:33       ` David Hildenbrand
2020-02-11  9:23         ` [PATCH v2 RFC] " Christian Borntraeger
2020-02-12 11:52           ` Christian Borntraeger
2020-02-12 12:16           ` David Hildenbrand
2020-02-12 12:22             ` Christian Borntraeger
2020-02-12 12:47               ` David Hildenbrand
2020-02-12 12:39           ` Cornelia Huck
2020-02-12 12:44             ` Christian Borntraeger
2020-02-12 13:07               ` Cornelia Huck
2020-02-10 18:56     ` [PATCH 02/35] KVM: s390/interrupt: do not pin adapter interrupt Ulrich Weigand
2020-02-10 18:56       ` Ulrich Weigand
2020-02-10 12:40   ` [PATCH 02/35] KVM: s390/interrupt: do not pin adapter interrupt pages David Hildenbrand
2020-02-07 11:39 ` [PATCH 03/35] s390/protvirt: introduce host side setup Christian Borntraeger
2020-02-10  9:42   ` Thomas Huth
2020-02-10  9:48     ` Christian Borntraeger
2020-02-10 11:54   ` Cornelia Huck
2020-02-10 12:14     ` Christian Borntraeger
2020-02-10 12:31       ` Cornelia Huck
2020-02-10 12:38   ` David Hildenbrand
2020-02-10 12:54     ` Christian Borntraeger
2020-02-07 11:39 ` [PATCH 04/35] s390/protvirt: add ultravisor initialization Christian Borntraeger
2020-02-14 10:25   ` David Hildenbrand
2020-02-14 10:33     ` Christian Borntraeger
2020-02-14 10:34       ` David Hildenbrand
2020-02-07 11:39 ` [PATCH 05/35] s390/mm: provide memory management functions for protected KVM guests Christian Borntraeger
2020-02-12 13:42   ` Cornelia Huck
2020-02-13  7:43     ` Christian Borntraeger
2020-02-13  8:44       ` Cornelia Huck
2020-02-14 17:59   ` David Hildenbrand
2020-02-14 21:17     ` Christian Borntraeger
2020-02-07 11:39 ` [PATCH 06/35] s390/mm: add (non)secure page access exceptions handlers Christian Borntraeger
2020-02-14 18:05   ` David Hildenbrand
2020-02-14 19:59     ` Christian Borntraeger
2020-02-07 11:39 ` [PATCH 07/35] KVM: s390: add new variants of UV CALL Christian Borntraeger
2020-02-07 14:34   ` Thomas Huth
2020-02-07 15:03     ` Christian Borntraeger
2020-02-10 12:16   ` Cornelia Huck
2020-02-10 12:22     ` Christian Borntraeger
2020-02-14 18:28   ` David Hildenbrand
2020-02-14 20:13     ` Christian Borntraeger
2020-02-07 11:39 ` [PATCH 08/35] KVM: s390: protvirt: Add initial lifecycle handling Christian Borntraeger
2020-02-07 16:32   ` Thomas Huth
2020-02-10  8:34     ` Christian Borntraeger
2020-02-08 14:54   ` Thomas Huth
2020-02-10 11:43     ` Christian Borntraeger
2020-02-10 11:45       ` [PATCH/RFC] KVM: s390: protvirt: pass-through rc and rrc Christian Borntraeger
2020-02-10 12:06         ` Christian Borntraeger
2020-02-10 12:29           ` Thomas Huth
2020-02-10 12:50           ` Cornelia Huck
2020-02-10 12:56             ` Christian Borntraeger
2020-02-11  8:48               ` Janosch Frank
2020-02-13  8:43                 ` Christian Borntraeger
2020-02-14 18:39   ` [PATCH 08/35] KVM: s390: protvirt: Add initial lifecycle handling David Hildenbrand
2020-02-14 21:22     ` Christian Borntraeger
2020-02-07 11:39 ` [PATCH 09/35] KVM: s390: protvirt: Add KVM api documentation Christian Borntraeger
2020-02-08 14:57   ` Thomas Huth
2020-02-10 12:26     ` Christian Borntraeger
2020-02-10 12:57       ` Cornelia Huck
2020-02-10 13:02         ` Christian Borntraeger
2020-02-07 11:39 ` [PATCH 10/35] KVM: s390: protvirt: Secure memory is not mergeable Christian Borntraeger
2020-02-07 11:39 ` [PATCH 11/35] KVM: s390/mm: Make pages accessible before destroying the guest Christian Borntraeger
2020-02-14 18:40   ` David Hildenbrand
2020-02-07 11:39 ` [PATCH 12/35] KVM: s390: protvirt: Handle SE notification interceptions Christian Borntraeger
2020-02-07 11:39 ` [PATCH 13/35] KVM: s390: protvirt: Instruction emulation Christian Borntraeger
2020-02-07 11:39 ` [PATCH 14/35] KVM: s390: protvirt: Add interruption injection controls Christian Borntraeger
2020-02-07 11:39 ` [PATCH 15/35] KVM: s390: protvirt: Implement interruption injection Christian Borntraeger
2020-02-10 10:03   ` Thomas Huth
2020-02-07 11:39 ` [PATCH 16/35] KVM: s390: protvirt: Add SCLP interrupt handling Christian Borntraeger
2020-02-11 12:00   ` Thomas Huth
2020-02-11 20:06     ` Christian Borntraeger
2020-02-07 11:39 ` [PATCH 17/35] KVM: s390: protvirt: Handle spec exception loops Christian Borntraeger
2020-02-07 11:39 ` [PATCH 18/35] KVM: s390: protvirt: Add new gprs location handling Christian Borntraeger
2020-02-07 11:39 ` [PATCH 19/35] KVM: S390: protvirt: Introduce instruction data area bounce buffer Christian Borntraeger
2020-02-07 11:39 ` [PATCH 20/35] KVM: s390: protvirt: handle secure guest prefix pages Christian Borntraeger
2020-02-13  8:37   ` Christian Borntraeger
2020-02-07 11:39 ` [PATCH 21/35] KVM: s390/mm: handle guest unpin events Christian Borntraeger
2020-02-10 14:58   ` Thomas Huth
2020-02-11 13:21     ` Cornelia Huck
2020-02-07 11:39 ` [PATCH 22/35] KVM: s390: protvirt: Write sthyi data to instruction data area Christian Borntraeger
2020-02-07 11:39 ` [PATCH 23/35] KVM: s390: protvirt: STSI handling Christian Borntraeger
2020-02-08 15:01   ` Thomas Huth
2020-02-11 10:55   ` Cornelia Huck
2020-02-07 11:39 ` [PATCH 24/35] KVM: s390: protvirt: disallow one_reg Christian Borntraeger
2020-02-10 17:53   ` Cornelia Huck
2020-02-10 18:34     ` Christian Borntraeger
2020-02-11  8:27       ` Cornelia Huck
2020-02-07 11:39 ` [PATCH 25/35] KVM: s390: protvirt: Only sync fmt4 registers Christian Borntraeger
2020-02-09 15:50   ` Thomas Huth
2020-02-10  9:33     ` Christian Borntraeger
2020-02-11 10:51   ` Cornelia Huck
2020-02-11 12:59     ` Christian Borntraeger
2020-02-07 11:39 ` [PATCH 26/35] KVM: s390: protvirt: Add program exception injection Christian Borntraeger
2020-02-09 15:52   ` Thomas Huth
2020-02-07 11:39 ` [PATCH 27/35] KVM: s390: protvirt: Add diag 308 subcode 8 - 10 handling Christian Borntraeger
2020-02-07 11:39 ` [PATCH 28/35] KVM: s390: protvirt: UV calls diag308 0, 1 Christian Borntraeger
2020-02-09 16:03   ` Thomas Huth
2020-02-10  8:45     ` Christian Borntraeger
2020-02-07 11:39 ` [PATCH 29/35] KVM: s390: protvirt: Report CPU state to Ultravisor Christian Borntraeger
2020-02-07 11:39 ` [PATCH 30/35] KVM: s390: protvirt: Support cmd 5 operation state Christian Borntraeger
2020-02-07 11:39 ` [PATCH 31/35] KVM: s390: protvirt: Add UV debug trace Christian Borntraeger
2020-02-10 13:22   ` Cornelia Huck
2020-02-10 13:40     ` Christian Borntraeger
2020-02-07 11:39 ` [PATCH 32/35] KVM: s390: protvirt: Mask PSW interrupt bits for interception 104 and 112 Christian Borntraeger
2020-02-09 16:07   ` Thomas Huth
2020-02-10 13:28   ` Cornelia Huck
2020-02-10 13:48     ` Christian Borntraeger
2020-02-10 14:47       ` Cornelia Huck
2020-02-07 11:39 ` [PATCH 33/35] KVM: s390: protvirt: do not inject interrupts after start Christian Borntraeger
2020-02-07 11:39 ` [PATCH 34/35] KVM: s390: protvirt: Add UV cpu reset calls Christian Borntraeger
2020-02-10 13:17   ` Cornelia Huck
2020-02-10 13:25     ` Christian Borntraeger
2020-02-07 11:39 ` [PATCH 35/35] DOCUMENTATION: Protected virtual machine introduction and IPL Christian Borntraeger
2020-02-11 12:23   ` Thomas Huth
2020-02-11 20:03     ` Christian Borntraeger
2020-02-12 11:03       ` Cornelia Huck
2020-02-12 11:49         ` Christian Borntraeger
2020-02-12 11:01   ` Cornelia Huck
2020-02-12 16:36     ` Christian Borntraeger

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.