Linux-mm Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH v2 00/42] KVM: s390: Add support for protected VMs
@ 2020-02-14 22:26 Christian Borntraeger
  2020-02-14 22:26 ` [PATCH v2 01/42] mm:gup/writeback: add callbacks for inaccessible pages Christian Borntraeger
                   ` (4 more replies)
  0 siblings, 5 replies; 22+ messages in thread
From: Christian Borntraeger @ 2020-02-14 22:26 UTC (permalink / raw)
  To: Christian Borntraeger, Janosch Frank, Andrew Morton
  Cc: KVM, Cornelia Huck, David Hildenbrand, Thomas Huth,
	Ulrich Weigand, Claudio Imbrenda, linux-s390, Michael Mueller,
	Vasily Gorbik, Andrea Arcangeli, linux-mm, Will Deacon,
	Sean Christopherson

mm people: This series contains a "pretty small" common code memory
management change that will allow paging, guest backing with files etc
almost just like normal VMs. It should be a no-op for all architectures
not opting in. And it should be usable for others that also try to get
notified on "the pages are in the process of being used for things like
I/O". At the end of the series are two sample patches as these hooks
seem to be useful for other with error handling/call information.  I
would suggest to keep the patch as is and add the additional things when
intel/arm know exactly what they need.

mm-related patches CCed on linux-mm, the complete list can be found on
the KVM and linux-s390 list. 

Andrew, any chance to either take " mm:gup/writeback: add callbacks for
inaccessible pages" or ACK so that I can take it?

Overview
--------
Protected VMs (PVM) are KVM VMs, where KVM can't access the VM's state
like guest memory and guest registers anymore. Instead the PVMs are
mostly managed by a new entity called Ultravisor (UV), which provides
an API, so KVM and the PV can request management actions.

PVMs are encrypted at rest and protected from hypervisor access while
running. They switch from a normal operation into protected mode, so
we can still use the standard boot process to load a encrypted blob
and then move it into protected mode.

Rebooting is only possible by passing through the unprotected/normal
mode and switching to protected again.

All patches are in the protvirtv4 branch of the korg s390 kvm git
https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux.git/log/?h=protvirtv4

Claudio presented the technology at his presentation at KVM Forum
2019.

https://static.sched.com/hosted_files/kvmforum2019/3b/ibm_protected_vms_s390x.pdf


v-> v2
- rebase on top of kvm/master
- pipe through rc and rrc. This might have created some churn here and
  there
- turn off sclp masking when rebooting into "unsecure"
- memory management simplification
- prefix page handling now via intercept 112
- io interrupt intervention request fix (do not use GISA)
- api.txt conversion to rst
- sample patches on top of mm/gup/writeback
- tons of review feedback
- kvm_uv debug feature fixes and unifications
- ultravisor information for /sys/firmware
- 

RFCv2 -> v1 (you can diff the protvirtv2 and the protvirtv3 branch)
- tons of review feedback integrated (see mail thread)
- memory management now complete and working
- Documentation patches merged
- interrupt patches merged
- CONFIG_KVM_S390_PROTECTED_VIRTUALIZATION_HOST removed
- SIDA interface integrated into memop
- for merged patches I removed reviews that were not in all patches

Christian Borntraeger (5):
  KVM: s390/mm: Make pages accessible before destroying the guest
  KVM: s390: protvirt: Add SCLP interrupt handling
  KVM: s390: protvirt: do not inject interrupts after start
  s390/uv: Fix handling of length extensions (already in s390 tree)
  KVM: s390: rstify new ioctls in api.rst

Claudio Imbrenda (6):
  mm:gup/writeback: add callbacks for inaccessible pages
  s390/mm: provide memory management functions for protected KVM guests
  KVM: s390/mm: handle guest unpin events
  example for future extension: mm:gup/writeback: add callbacks for
    inaccessible pages: error cases
  example for future extension: mm:gup/writeback: add callbacks for
    inaccessible pages: source indication
  potential fixup for "s390/mm: provide memory management functions for
    protected KVM guests"

Janosch Frank (25):
  KVM: s390: protvirt: Add UV debug trace
  KVM: s390: add new variants of UV CALL
  KVM: s390: protvirt: Add initial vm and cpu lifecycle handling
  KVM: s390: protvirt: Add KVM api documentation
  KVM: s390: protvirt: Secure memory is not mergeable
  KVM: s390: protvirt: Handle SE notification interceptions
  KVM: s390: protvirt: Instruction emulation
  KVM: s390: protvirt: Handle spec exception loops
  KVM: s390: protvirt: Add new gprs location handling
  KVM: S390: protvirt: Introduce instruction data area bounce buffer
  KVM: s390: protvirt: handle secure guest prefix pages
  KVM: s390: protvirt: Write sthyi data to instruction data area
  KVM: s390: protvirt: STSI handling
  KVM: s390: protvirt: disallow one_reg
  KVM: s390: protvirt: Do only reset registers that are accessible
  KVM: s390: protvirt: Only sync fmt4 registers
  KVM: s390: protvirt: Add program exception injection
  KVM: s390: protvirt: Add diag 308 subcode 8 - 10 handling
  KVM: s390: protvirt: UV calls in support of diag308 0, 1
  KVM: s390: protvirt: Report CPU state to Ultravisor
  KVM: s390: protvirt: Support cmd 5 operation state
  KVM: s390: protvirt: Mask PSW interrupt bits for interception 104 and
    112
  KVM: s390: protvirt: Add UV cpu reset calls
  DOCUMENTATION: Protected virtual machine introduction and IPL
  s390: protvirt: Add sysfs firmware interface for Ultravisor
    information

Michael Mueller (2):
  KVM: s390: protvirt: Add interruption injection controls
  KVM: s390: protvirt: Implement interruption injection

Ulrich Weigand (1):
  KVM: s390/interrupt: do not pin adapter interrupt pages

Vasily Gorbik (3):
  s390/protvirt: introduce host side setup
  s390/protvirt: add ultravisor initialization
  s390/mm: add (non)secure page access exceptions handlers

 .../admin-guide/kernel-parameters.txt         |   5 +
 Documentation/virt/kvm/api.rst                | 108 +++-
 Documentation/virt/kvm/devices/s390_flic.rst  |  11 +-
 Documentation/virt/kvm/index.rst              |   2 +
 Documentation/virt/kvm/s390-pv-boot.rst       |  83 +++
 Documentation/virt/kvm/s390-pv.rst            | 116 +++++
 MAINTAINERS                                   |   1 +
 arch/s390/boot/Makefile                       |   2 +-
 arch/s390/boot/uv.c                           |  23 +-
 arch/s390/include/asm/gmap.h                  |   6 +
 arch/s390/include/asm/kvm_host.h              | 113 ++++-
 arch/s390/include/asm/mmu.h                   |   2 +
 arch/s390/include/asm/mmu_context.h           |   1 +
 arch/s390/include/asm/page.h                  |   5 +
 arch/s390/include/asm/pgtable.h               |  35 +-
 arch/s390/include/asm/uv.h                    | 251 ++++++++-
 arch/s390/kernel/Makefile                     |   1 +
 arch/s390/kernel/pgm_check.S                  |   4 +-
 arch/s390/kernel/setup.c                      |   9 +-
 arch/s390/kernel/uv.c                         | 412 +++++++++++++++
 arch/s390/kvm/Makefile                        |   2 +-
 arch/s390/kvm/intercept.c                     | 111 +++-
 arch/s390/kvm/interrupt.c                     | 391 ++++++++------
 arch/s390/kvm/kvm-s390.c                      | 479 ++++++++++++++++--
 arch/s390/kvm/kvm-s390.h                      |  40 ++
 arch/s390/kvm/priv.c                          |  11 +-
 arch/s390/kvm/pv.c                            | 295 +++++++++++
 arch/s390/mm/fault.c                          |  78 +++
 arch/s390/mm/gmap.c                           |  65 ++-
 include/linux/gfp.h                           |  12 +
 include/uapi/linux/kvm.h                      |  44 +-
 mm/gup.c                                      |  15 +-
 mm/page-writeback.c                           |   5 +
 33 files changed, 2438 insertions(+), 300 deletions(-)
 create mode 100644 Documentation/virt/kvm/s390-pv-boot.rst
 create mode 100644 Documentation/virt/kvm/s390-pv.rst
 create mode 100644 arch/s390/kernel/uv.c
 create mode 100644 arch/s390/kvm/pv.c

-- 
2.25.0



^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v2 01/42] mm:gup/writeback: add callbacks for inaccessible pages
  2020-02-14 22:26 [PATCH v2 00/42] KVM: s390: Add support for protected VMs Christian Borntraeger
@ 2020-02-14 22:26 ` Christian Borntraeger
  2020-02-17  9:14   ` David Hildenbrand
  2020-02-14 22:26 ` [PATCH v2 05/42] s390/mm: provide memory management functions for protected KVM guests Christian Borntraeger
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 22+ messages in thread
From: Christian Borntraeger @ 2020-02-14 22:26 UTC (permalink / raw)
  To: Christian Borntraeger, Janosch Frank, Andrew Morton
  Cc: KVM, Cornelia Huck, David Hildenbrand, Thomas Huth,
	Ulrich Weigand, Claudio Imbrenda, linux-s390, Michael Mueller,
	Vasily Gorbik, Andrea Arcangeli, linux-mm, Will Deacon,
	Sean Christopherson

From: Claudio Imbrenda <imbrenda@linux.ibm.com>

With the introduction of protected KVM guests on s390 there is now a
concept of inaccessible pages. These pages need to be made accessible
before the host can access them.

While cpu accesses will trigger a fault that can be resolved, I/O
accesses will just fail.  We need to add a callback into architecture
code for places that will do I/O, namely when writeback is started or
when a page reference is taken.

This is not only to enable paging, file backing etc, it is also
necessary to protect the host against a malicious user space. For
example a bad QEMU could simply start direct I/O on such protected
memory.  We do not want userspace to be able to trigger I/O errors and
thus we the logic is "whenever somebody accesses that page (gup) or
doing I/O, make sure that this page can be accessed. When the guest
tries to access that page we will wait in the page fault handler for
writeback to have finished and for the page_ref to be the expected
value.

If wanted by others, the callbacks can be extended with error handlin
and a parameter from where this is called.

Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
 include/linux/gfp.h | 6 ++++++
 mm/gup.c            | 2 ++
 mm/page-writeback.c | 1 +
 3 files changed, 9 insertions(+)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index e5b817cb86e7..be2754841369 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -485,6 +485,12 @@ static inline void arch_free_page(struct page *page, int order) { }
 #ifndef HAVE_ARCH_ALLOC_PAGE
 static inline void arch_alloc_page(struct page *page, int order) { }
 #endif
+#ifndef HAVE_ARCH_MAKE_PAGE_ACCESSIBLE
+static inline int arch_make_page_accessible(struct page *page)
+{
+	return 0;
+}
+#endif
 
 struct page *
 __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
diff --git a/mm/gup.c b/mm/gup.c
index 1b521e0ac1de..a1c15d029f7c 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -276,6 +276,7 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
 			page = ERR_PTR(-ENOMEM);
 			goto out;
 		}
+		arch_make_page_accessible(page);
 	}
 	if (flags & FOLL_TOUCH) {
 		if ((flags & FOLL_WRITE) &&
@@ -1919,6 +1920,7 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
 
 		VM_BUG_ON_PAGE(compound_head(page) != head, page);
 
+		arch_make_page_accessible(page);
 		SetPageReferenced(page);
 		pages[*nr] = page;
 		(*nr)++;
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 2caf780a42e7..4c020e4ae71c 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2807,6 +2807,7 @@ int __test_set_page_writeback(struct page *page, bool keep_write)
 		inc_zone_page_state(page, NR_ZONE_WRITE_PENDING);
 	}
 	unlock_page_memcg(page);
+	arch_make_page_accessible(page);
 	return ret;
 
 }
-- 
2.25.0



^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v2 05/42] s390/mm: provide memory management functions for protected KVM guests
  2020-02-14 22:26 [PATCH v2 00/42] KVM: s390: Add support for protected VMs Christian Borntraeger
  2020-02-14 22:26 ` [PATCH v2 01/42] mm:gup/writeback: add callbacks for inaccessible pages Christian Borntraeger
@ 2020-02-14 22:26 ` Christian Borntraeger
  2020-02-17 10:21   ` David Hildenbrand
  2020-02-14 22:26 ` [PATCH v2 06/42] s390/mm: add (non)secure page access exceptions handlers Christian Borntraeger
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 22+ messages in thread
From: Christian Borntraeger @ 2020-02-14 22:26 UTC (permalink / raw)
  To: Christian Borntraeger, Janosch Frank
  Cc: KVM, Cornelia Huck, David Hildenbrand, Thomas Huth,
	Ulrich Weigand, Claudio Imbrenda, linux-s390, Michael Mueller,
	Vasily Gorbik, Andrea Arcangeli, linux-mm

From: Claudio Imbrenda <imbrenda@linux.ibm.com>

This provides the basic ultravisor calls and page table handling to cope
with secure guests:
- provide arch_make_page_accessible
- make pages accessible after unmapping of secure guests
- provide the ultravisor commands convert to/from secure
- provide the ultravisor commands pin/unpin shared
- provide callbacks to make pages secure (inacccessible)
 - we check for the expected pin count to only make pages secure if the
   host is not accessing them
 - we fence hugetlbfs for secure pages

Co-developed-by: Ulrich Weigand <Ulrich.Weigand@de.ibm.com>
Signed-off-by: Ulrich Weigand <Ulrich.Weigand@de.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
[borntraeger@de.ibm.com: patch merging, splitting, fixing]
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
 arch/s390/include/asm/gmap.h        |   4 +
 arch/s390/include/asm/mmu.h         |   2 +
 arch/s390/include/asm/mmu_context.h |   1 +
 arch/s390/include/asm/page.h        |   5 +
 arch/s390/include/asm/pgtable.h     |  35 ++++-
 arch/s390/include/asm/uv.h          |  31 ++++
 arch/s390/kernel/uv.c               | 223 ++++++++++++++++++++++++++++
 7 files changed, 296 insertions(+), 5 deletions(-)

diff --git a/arch/s390/include/asm/gmap.h b/arch/s390/include/asm/gmap.h
index 37f96b6f0e61..3c4926aa78f4 100644
--- a/arch/s390/include/asm/gmap.h
+++ b/arch/s390/include/asm/gmap.h
@@ -9,6 +9,7 @@
 #ifndef _ASM_S390_GMAP_H
 #define _ASM_S390_GMAP_H
 
+#include <linux/radix-tree.h>
 #include <linux/refcount.h>
 
 /* Generic bits for GMAP notification on DAT table entry changes. */
@@ -31,6 +32,7 @@
  * @table: pointer to the page directory
  * @asce: address space control element for gmap page table
  * @pfault_enabled: defines if pfaults are applicable for the guest
+ * @guest_handle: protected virtual machine handle for the ultravisor
  * @host_to_rmap: radix tree with gmap_rmap lists
  * @children: list of shadow gmap structures
  * @pt_list: list of all page tables used in the shadow guest address space
@@ -54,6 +56,8 @@ struct gmap {
 	unsigned long asce_end;
 	void *private;
 	bool pfault_enabled;
+	/* only set for protected virtual machines */
+	unsigned long guest_handle;
 	/* Additional data for shadow guest address spaces */
 	struct radix_tree_root host_to_rmap;
 	struct list_head children;
diff --git a/arch/s390/include/asm/mmu.h b/arch/s390/include/asm/mmu.h
index bcfb6371086f..e21b618ad432 100644
--- a/arch/s390/include/asm/mmu.h
+++ b/arch/s390/include/asm/mmu.h
@@ -16,6 +16,8 @@ typedef struct {
 	unsigned long asce;
 	unsigned long asce_limit;
 	unsigned long vdso_base;
+	/* The mmu context belongs to a secure guest. */
+	atomic_t is_protected;
 	/*
 	 * The following bitfields need a down_write on the mm
 	 * semaphore when they are written to. As they are only
diff --git a/arch/s390/include/asm/mmu_context.h b/arch/s390/include/asm/mmu_context.h
index 8d04e6f3f796..afa836014076 100644
--- a/arch/s390/include/asm/mmu_context.h
+++ b/arch/s390/include/asm/mmu_context.h
@@ -23,6 +23,7 @@ static inline int init_new_context(struct task_struct *tsk,
 	INIT_LIST_HEAD(&mm->context.gmap_list);
 	cpumask_clear(&mm->context.cpu_attach_mask);
 	atomic_set(&mm->context.flush_count, 0);
+	atomic_set(&mm->context.is_protected, 0);
 	mm->context.gmap_asce = 0;
 	mm->context.flush_mm = 0;
 	mm->context.compat_mm = test_thread_flag(TIF_31BIT);
diff --git a/arch/s390/include/asm/page.h b/arch/s390/include/asm/page.h
index 85e944f04c70..4ebcf891ff3c 100644
--- a/arch/s390/include/asm/page.h
+++ b/arch/s390/include/asm/page.h
@@ -153,6 +153,11 @@ static inline int devmem_is_allowed(unsigned long pfn)
 #define HAVE_ARCH_FREE_PAGE
 #define HAVE_ARCH_ALLOC_PAGE
 
+#if IS_ENABLED(CONFIG_PGSTE)
+int arch_make_page_accessible(struct page *page);
+#define HAVE_ARCH_MAKE_PAGE_ACCESSIBLE
+#endif
+
 #endif /* !__ASSEMBLY__ */
 
 #define __PAGE_OFFSET		0x0UL
diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index 137a3920ca36..cc7a1adacb94 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -19,6 +19,7 @@
 #include <linux/atomic.h>
 #include <asm/bug.h>
 #include <asm/page.h>
+#include <asm/uv.h>
 
 extern pgd_t swapper_pg_dir[];
 extern void paging_init(void);
@@ -520,6 +521,15 @@ static inline int mm_has_pgste(struct mm_struct *mm)
 	return 0;
 }
 
+static inline int mm_is_protected(struct mm_struct *mm)
+{
+#ifdef CONFIG_PGSTE
+	if (unlikely(atomic_read(&mm->context.is_protected)))
+		return 1;
+#endif
+	return 0;
+}
+
 static inline int mm_alloc_pgste(struct mm_struct *mm)
 {
 #ifdef CONFIG_PGSTE
@@ -1061,7 +1071,12 @@ static inline int ptep_clear_flush_young(struct vm_area_struct *vma,
 static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
 				       unsigned long addr, pte_t *ptep)
 {
-	return ptep_xchg_lazy(mm, addr, ptep, __pte(_PAGE_INVALID));
+	pte_t res;
+
+	res = ptep_xchg_lazy(mm, addr, ptep, __pte(_PAGE_INVALID));
+	if (mm_is_protected(mm) && pte_present(res))
+		uv_convert_from_secure(pte_val(res) & PAGE_MASK);
+	return res;
 }
 
 #define __HAVE_ARCH_PTEP_MODIFY_PROT_TRANSACTION
@@ -1073,7 +1088,12 @@ void ptep_modify_prot_commit(struct vm_area_struct *, unsigned long,
 static inline pte_t ptep_clear_flush(struct vm_area_struct *vma,
 				     unsigned long addr, pte_t *ptep)
 {
-	return ptep_xchg_direct(vma->vm_mm, addr, ptep, __pte(_PAGE_INVALID));
+	pte_t res;
+
+	res = ptep_xchg_direct(vma->vm_mm, addr, ptep, __pte(_PAGE_INVALID));
+	if (mm_is_protected(vma->vm_mm) && pte_present(res))
+		uv_convert_from_secure(pte_val(res) & PAGE_MASK);
+	return res;
 }
 
 /*
@@ -1088,12 +1108,17 @@ static inline pte_t ptep_get_and_clear_full(struct mm_struct *mm,
 					    unsigned long addr,
 					    pte_t *ptep, int full)
 {
+	pte_t res;
+
 	if (full) {
-		pte_t pte = *ptep;
+		res = *ptep;
 		*ptep = __pte(_PAGE_INVALID);
-		return pte;
+	} else {
+		res = ptep_xchg_lazy(mm, addr, ptep, __pte(_PAGE_INVALID));
 	}
-	return ptep_xchg_lazy(mm, addr, ptep, __pte(_PAGE_INVALID));
+	if (mm_is_protected(mm) && pte_present(res))
+		uv_convert_from_secure(pte_val(res) & PAGE_MASK);
+	return res;
 }
 
 #define __HAVE_ARCH_PTEP_SET_WRPROTECT
diff --git a/arch/s390/include/asm/uv.h b/arch/s390/include/asm/uv.h
index f5b55e3972b3..e45963cc7f40 100644
--- a/arch/s390/include/asm/uv.h
+++ b/arch/s390/include/asm/uv.h
@@ -15,6 +15,7 @@
 #include <linux/errno.h>
 #include <linux/bug.h>
 #include <asm/page.h>
+#include <asm/gmap.h>
 
 #define UVC_RC_EXECUTED		0x0001
 #define UVC_RC_INV_CMD		0x0002
@@ -24,6 +25,10 @@
 
 #define UVC_CMD_QUI			0x0001
 #define UVC_CMD_INIT_UV			0x000f
+#define UVC_CMD_CONV_TO_SEC_STOR	0x0200
+#define UVC_CMD_CONV_FROM_SEC_STOR	0x0201
+#define UVC_CMD_PIN_PAGE_SHARED		0x0341
+#define UVC_CMD_UNPIN_PAGE_SHARED	0x0342
 #define UVC_CMD_SET_SHARED_ACCESS	0x1000
 #define UVC_CMD_REMOVE_SHARED_ACCESS	0x1001
 
@@ -31,8 +36,12 @@
 enum uv_cmds_inst {
 	BIT_UVC_CMD_QUI = 0,
 	BIT_UVC_CMD_INIT_UV = 1,
+	BIT_UVC_CMD_CONV_TO_SEC_STOR = 6,
+	BIT_UVC_CMD_CONV_FROM_SEC_STOR = 7,
 	BIT_UVC_CMD_SET_SHARED_ACCESS = 8,
 	BIT_UVC_CMD_REMOVE_SHARED_ACCESS = 9,
+	BIT_UVC_CMD_PIN_PAGE_SHARED = 21,
+	BIT_UVC_CMD_UNPIN_PAGE_SHARED = 22,
 };
 
 struct uv_cb_header {
@@ -69,6 +78,19 @@ struct uv_cb_init {
 	u64 reserved28[4];
 } __packed __aligned(8);
 
+struct uv_cb_cts {
+	struct uv_cb_header header;
+	u64 reserved08[2];
+	u64 guest_handle;
+	u64 gaddr;
+} __packed __aligned(8);
+
+struct uv_cb_cfs {
+	struct uv_cb_header header;
+	u64 reserved08[2];
+	u64 paddr;
+} __packed __aligned(8);
+
 struct uv_cb_share {
 	struct uv_cb_header header;
 	u64 reserved08[3];
@@ -170,12 +192,21 @@ static inline int is_prot_virt_host(void)
 	return prot_virt_host;
 }
 
+int gmap_make_secure(struct gmap *gmap, unsigned long gaddr, void *uvcb);
+int uv_convert_from_secure(unsigned long paddr);
+int gmap_convert_to_secure(struct gmap *gmap, unsigned long gaddr);
+
 void setup_uv(void);
 void adjust_to_uv_max(unsigned long *vmax);
 #else
 #define is_prot_virt_host() 0
 static inline void setup_uv(void) {}
 static inline void adjust_to_uv_max(unsigned long *vmax) {}
+
+static inline int uv_convert_from_secure(unsigned long paddr)
+{
+	return 0;
+}
 #endif
 
 #if defined(CONFIG_PROTECTED_VIRTUALIZATION_GUEST) ||                          \
diff --git a/arch/s390/kernel/uv.c b/arch/s390/kernel/uv.c
index 1424994f5489..9a6c309864a0 100644
--- a/arch/s390/kernel/uv.c
+++ b/arch/s390/kernel/uv.c
@@ -12,6 +12,8 @@
 #include <linux/sizes.h>
 #include <linux/bitmap.h>
 #include <linux/memblock.h>
+#include <linux/pagemap.h>
+#include <linux/swap.h>
 #include <asm/facility.h>
 #include <asm/sections.h>
 #include <asm/uv.h>
@@ -100,4 +102,225 @@ void adjust_to_uv_max(unsigned long *vmax)
 {
 	*vmax = min_t(unsigned long, *vmax, uv_info.max_sec_stor_addr);
 }
+
+/*
+ * Requests the Ultravisor to pin the page in the shared state. This will
+ * cause an intercept when the guest attempts to unshare the pinned page.
+ */
+static int uv_pin_shared(unsigned long paddr)
+{
+	struct uv_cb_cfs uvcb = {
+		.header.cmd = UVC_CMD_PIN_PAGE_SHARED,
+		.header.len = sizeof(uvcb),
+		.paddr = paddr,
+	};
+
+	if (uv_call(0, (u64)&uvcb))
+		return -EINVAL;
+	return 0;
+}
+
+/*
+ * Requests the Ultravisor to encrypt a guest page and make it
+ * accessible to the host for paging (export).
+ *
+ * @paddr: Absolute host address of page to be exported
+ */
+int uv_convert_from_secure(unsigned long paddr)
+{
+	struct uv_cb_cfs uvcb = {
+		.header.cmd = UVC_CMD_CONV_FROM_SEC_STOR,
+		.header.len = sizeof(uvcb),
+		.paddr = paddr
+	};
+
+	if (uv_call(0, (u64)&uvcb))
+		return -EINVAL;
+	return 0;
+}
+
+/*
+ * Calculate the expected ref_count for a page that would otherwise have no
+ * further pins. This was cribbed from similar functions in other places in
+ * the kernel, but with some slight modifications. We know that a secure
+ * page can not be a huge page for example.
+ */
+static int expected_page_refs(struct page *page)
+{
+	int res;
+
+	res = page_mapcount(page);
+	if (PageSwapCache(page)) {
+		res++;
+	} else if (page_mapping(page)) {
+		res++;
+		if (page_has_private(page))
+			res++;
+	}
+	return res;
+}
+
+static int make_secure_pte(pte_t *ptep, unsigned long addr,
+			   struct page *exp_page, struct uv_cb_header *uvcb)
+{
+	pte_t entry = READ_ONCE(*ptep);
+	struct page *page;
+	int expected, rc = 0;
+
+	if (!pte_present(entry))
+		return -ENXIO;
+	if (pte_val(entry) & _PAGE_INVALID)
+		return -ENXIO;
+
+	page = pte_page(entry);
+	if (page != exp_page)
+		return -ENXIO;
+	if (PageWriteback(page))
+		return -EAGAIN;
+	expected = expected_page_refs(page);
+	if (!page_ref_freeze(page, expected))
+		return -EBUSY;
+	set_bit(PG_arch_1, &page->flags);
+	rc = uv_call(0, (u64)uvcb);
+	page_ref_unfreeze(page, expected);
+	/* Return -ENXIO if the page was not mapped, -EINVAL otherwise */
+	if (rc)
+		rc = uvcb->rc == 0x10a ? -ENXIO : -EINVAL;
+	return rc;
+}
+
+/*
+ * Requests the Ultravisor to make a page accessible to a guest.
+ * If it's brought in the first time, it will be cleared. If
+ * it has been exported before, it will be decrypted and integrity
+ * checked.
+ */
+int gmap_make_secure(struct gmap *gmap, unsigned long gaddr, void *uvcb)
+{
+	struct vm_area_struct *vma;
+	unsigned long uaddr;
+	struct page *page;
+	int rc, local_drain = 0;
+	spinlock_t *ptelock;
+	pte_t *ptep;
+
+again:
+	rc = -EFAULT;
+	down_read(&gmap->mm->mmap_sem);
+
+	uaddr = __gmap_translate(gmap, gaddr);
+	if (IS_ERR_VALUE(uaddr))
+		goto out;
+	vma = find_vma(gmap->mm, uaddr);
+	if (!vma)
+		goto out;
+	/*
+	 * Secure pages cannot be huge and userspace should not combine both.
+	 * In case userspace does it anyway this will result in an -EFAULT for
+	 * the unpack. The guest is thus never reaching secure mode. If
+	 * userspace is playing dirty tricky with mapping huge pages later
+	 * on this will result in a segmenation fault.
+	 */
+	if (is_vm_hugetlb_page(vma))
+		goto out;
+
+	rc = -ENXIO;
+	page = follow_page(vma, uaddr, FOLL_WRITE);
+	if (IS_ERR_OR_NULL(page))
+		goto out;
+
+	lock_page(page);
+	ptep = get_locked_pte(gmap->mm, uaddr, &ptelock);
+	rc = make_secure_pte(ptep, uaddr, page, uvcb);
+	pte_unmap_unlock(ptep, ptelock);
+	unlock_page(page);
+out:
+	up_read(&gmap->mm->mmap_sem);
+
+	if (rc == -EAGAIN) {
+		wait_on_page_writeback(page);
+	} else if (rc == -EBUSY) {
+		/*
+		 * If we have tried a local drain and the page refcount
+		 * still does not match our expected safe value, try with a
+		 * system wide drain. This is needed if the pagevecs holding
+		 * the page are on a different CPU.
+		 */
+		if (local_drain) {
+			lru_add_drain_all();
+			/* We give up here, and let the caller try again */
+			return -EAGAIN;
+		}
+		/*
+		 * We are here if the page refcount does not match the
+		 * expected safe value. The main culprits are usually
+		 * pagevecs. With lru_add_drain() we drain the pagevecs
+		 * on the local CPU so that hopefully the refcount will
+		 * reach the expected safe value.
+		 */
+		lru_add_drain();
+		local_drain = 1;
+		/* And now we try again immediately after draining */
+		goto again;
+	} else if (rc == -ENXIO) {
+		if (gmap_fault(gmap, gaddr, FAULT_FLAG_WRITE))
+			return -EFAULT;
+		return -EAGAIN;
+	}
+	return rc;
+}
+EXPORT_SYMBOL_GPL(gmap_make_secure);
+
+int gmap_convert_to_secure(struct gmap *gmap, unsigned long gaddr)
+{
+	struct uv_cb_cts uvcb = {
+		.header.cmd = UVC_CMD_CONV_TO_SEC_STOR,
+		.header.len = sizeof(uvcb),
+		.guest_handle = gmap->guest_handle,
+		.gaddr = gaddr,
+	};
+
+	return gmap_make_secure(gmap, gaddr, &uvcb);
+}
+EXPORT_SYMBOL_GPL(gmap_convert_to_secure);
+
+/**
+ * To be called with the page locked or with an extra reference!
+ */
+int arch_make_page_accessible(struct page *page)
+{
+	int rc = 0;
+
+	/* Hugepage cannot be protected, so nothing to do */
+	if (PageHuge(page))
+		return 0;
+
+	/*
+	 * PG_arch_1 is used in 3 places:
+	 * 1. for kernel page tables during early boot
+	 * 2. for storage keys of huge pages and KVM
+	 * 3. As an indication that this page might be secure. This can
+	 *    overindicate, e.g. we set the bit before calling
+	 *    convert_to_secure.
+	 * As secure pages are never huge, all 3 variants can co-exists.
+	 */
+	if (!test_bit(PG_arch_1, &page->flags))
+		return 0;
+
+	rc = uv_pin_shared(page_to_phys(page));
+	if (!rc) {
+		clear_bit(PG_arch_1, &page->flags);
+		return 0;
+	}
+
+	rc = uv_convert_from_secure(page_to_phys(page));
+	if (!rc) {
+		clear_bit(PG_arch_1, &page->flags);
+		return 0;
+	}
+
+	return rc;
+}
+EXPORT_SYMBOL_GPL(arch_make_page_accessible);
+
 #endif
-- 
2.25.0



^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v2 06/42] s390/mm: add (non)secure page access exceptions handlers
  2020-02-14 22:26 [PATCH v2 00/42] KVM: s390: Add support for protected VMs Christian Borntraeger
  2020-02-14 22:26 ` [PATCH v2 01/42] mm:gup/writeback: add callbacks for inaccessible pages Christian Borntraeger
  2020-02-14 22:26 ` [PATCH v2 05/42] s390/mm: provide memory management functions for protected KVM guests Christian Borntraeger
@ 2020-02-14 22:26 ` Christian Borntraeger
  2020-02-14 22:26 ` [PATCH v2 39/42] example for future extension: mm:gup/writeback: add callbacks for inaccessible pages: error cases Christian Borntraeger
  2020-02-14 22:26 ` [PATCH v2 40/42] example for future extension: mm:gup/writeback: add callbacks for inaccessible pages: source indication Christian Borntraeger
  4 siblings, 0 replies; 22+ messages in thread
From: Christian Borntraeger @ 2020-02-14 22:26 UTC (permalink / raw)
  To: Christian Borntraeger, Janosch Frank
  Cc: KVM, Cornelia Huck, David Hildenbrand, Thomas Huth,
	Ulrich Weigand, Claudio Imbrenda, linux-s390, Michael Mueller,
	Vasily Gorbik, Andrea Arcangeli, linux-mm, Janosch Frank

From: Vasily Gorbik <gor@linux.ibm.com>

Add exceptions handlers performing transparent transition of non-secure
pages to secure (import) upon guest access and secure pages to
non-secure (export) upon hypervisor access.

Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
[frankja@linux.ibm.com: adding checks for failures]
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
[imbrenda@linux.ibm.com:  adding a check for gmap fault]
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
[borntraeger@de.ibm.com: patch merging, splitting, fixing]
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
 arch/s390/kernel/pgm_check.S |  4 +-
 arch/s390/mm/fault.c         | 78 ++++++++++++++++++++++++++++++++++++
 2 files changed, 80 insertions(+), 2 deletions(-)

diff --git a/arch/s390/kernel/pgm_check.S b/arch/s390/kernel/pgm_check.S
index eee3a482195a..2c27907a5ffc 100644
--- a/arch/s390/kernel/pgm_check.S
+++ b/arch/s390/kernel/pgm_check.S
@@ -78,8 +78,8 @@ PGM_CHECK(do_dat_exception)		/* 39 */
 PGM_CHECK(do_dat_exception)		/* 3a */
 PGM_CHECK(do_dat_exception)		/* 3b */
 PGM_CHECK_DEFAULT			/* 3c */
-PGM_CHECK_DEFAULT			/* 3d */
-PGM_CHECK_DEFAULT			/* 3e */
+PGM_CHECK(do_secure_storage_access)	/* 3d */
+PGM_CHECK(do_non_secure_storage_access)	/* 3e */
 PGM_CHECK_DEFAULT			/* 3f */
 PGM_CHECK(monitor_event_exception)	/* 40 */
 PGM_CHECK_DEFAULT			/* 41 */
diff --git a/arch/s390/mm/fault.c b/arch/s390/mm/fault.c
index 7b0bb475c166..7bd86ebc882f 100644
--- a/arch/s390/mm/fault.c
+++ b/arch/s390/mm/fault.c
@@ -38,6 +38,7 @@
 #include <asm/irq.h>
 #include <asm/mmu_context.h>
 #include <asm/facility.h>
+#include <asm/uv.h>
 #include "../kernel/entry.h"
 
 #define __FAIL_ADDR_MASK -4096L
@@ -816,3 +817,80 @@ static int __init pfault_irq_init(void)
 early_initcall(pfault_irq_init);
 
 #endif /* CONFIG_PFAULT */
+
+#if IS_ENABLED(CONFIG_PGSTE)
+void do_secure_storage_access(struct pt_regs *regs)
+{
+	unsigned long addr = regs->int_parm_long & __FAIL_ADDR_MASK;
+	struct vm_area_struct *vma;
+	struct mm_struct *mm;
+	struct page *page;
+	int rc;
+
+	switch (get_fault_type(regs)) {
+	case USER_FAULT:
+		mm = current->mm;
+		down_read(&mm->mmap_sem);
+		vma = find_vma(mm, addr);
+		if (!vma) {
+			up_read(&mm->mmap_sem);
+			do_fault_error(regs, VM_READ | VM_WRITE, VM_FAULT_BADMAP);
+			break;
+		}
+		page = follow_page(vma, addr, FOLL_WRITE | FOLL_GET);
+		if (IS_ERR_OR_NULL(page)) {
+			up_read(&mm->mmap_sem);
+			break;
+		}
+		if (arch_make_page_accessible(page))
+			send_sig(SIGSEGV, current, 0);
+		put_page(page);
+		up_read(&mm->mmap_sem);
+		break;
+	case KERNEL_FAULT:
+		page = phys_to_page(addr);
+		if (unlikely(!try_get_page(page)))
+			break;
+		rc = arch_make_page_accessible(page);
+		put_page(page);
+		if (rc)
+			BUG();
+		break;
+	case VDSO_FAULT:
+		/* fallthrough */
+	case GMAP_FAULT:
+		/* fallthrough */
+	default:
+		do_fault_error(regs, VM_READ | VM_WRITE, VM_FAULT_BADMAP);
+		WARN_ON_ONCE(1);
+	}
+}
+NOKPROBE_SYMBOL(do_secure_storage_access);
+
+void do_non_secure_storage_access(struct pt_regs *regs)
+{
+	unsigned long gaddr = regs->int_parm_long & __FAIL_ADDR_MASK;
+	struct gmap *gmap = (struct gmap *)S390_lowcore.gmap;
+
+	if (get_fault_type(regs) != GMAP_FAULT) {
+		do_fault_error(regs, VM_READ | VM_WRITE, VM_FAULT_BADMAP);
+		WARN_ON_ONCE(1);
+		return;
+	}
+
+	if (gmap_convert_to_secure(gmap, gaddr) == -EINVAL)
+		send_sig(SIGSEGV, current, 0);
+}
+NOKPROBE_SYMBOL(do_non_secure_storage_access);
+
+#else
+void do_secure_storage_access(struct pt_regs *regs)
+{
+	default_trap_handler(regs);
+}
+
+void do_non_secure_storage_access(struct pt_regs *regs)
+{
+	default_trap_handler(regs);
+}
+#endif
-- 
2.25.0



^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v2 39/42] example for future extension: mm:gup/writeback: add callbacks for inaccessible pages: error cases
  2020-02-14 22:26 [PATCH v2 00/42] KVM: s390: Add support for protected VMs Christian Borntraeger
                   ` (2 preceding siblings ...)
  2020-02-14 22:26 ` [PATCH v2 06/42] s390/mm: add (non)secure page access exceptions handlers Christian Borntraeger
@ 2020-02-14 22:26 ` Christian Borntraeger
  2020-02-18 16:25   ` Will Deacon
  2020-02-14 22:26 ` [PATCH v2 40/42] example for future extension: mm:gup/writeback: add callbacks for inaccessible pages: source indication Christian Borntraeger
  4 siblings, 1 reply; 22+ messages in thread
From: Christian Borntraeger @ 2020-02-14 22:26 UTC (permalink / raw)
  To: Christian Borntraeger, Janosch Frank, Andrew Morton
  Cc: KVM, Cornelia Huck, David Hildenbrand, Thomas Huth,
	Ulrich Weigand, Claudio Imbrenda, linux-s390, Michael Mueller,
	Vasily Gorbik, Andrea Arcangeli, linux-mm, Will Deacon,
	Sean Christopherson

From: Claudio Imbrenda <imbrenda@linux.ibm.com>

This is a potential extension to do error handling if we fail to
make the page accessible if we know what others need.

Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
---
 mm/gup.c            | 17 ++++++++++++-----
 mm/page-writeback.c |  6 +++++-
 2 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index a1c15d029f7c..354bcfbd844b 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -193,6 +193,7 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
 	struct page *page;
 	spinlock_t *ptl;
 	pte_t *ptep, pte;
+	int ret;
 
 	/* FOLL_GET and FOLL_PIN are mutually exclusive. */
 	if (WARN_ON_ONCE((flags & (FOLL_PIN | FOLL_GET)) ==
@@ -250,8 +251,6 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
 		if (is_zero_pfn(pte_pfn(pte))) {
 			page = pte_page(pte);
 		} else {
-			int ret;
-
 			ret = follow_pfn_pte(vma, address, ptep, flags);
 			page = ERR_PTR(ret);
 			goto out;
@@ -259,7 +258,6 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
 	}
 
 	if (flags & FOLL_SPLIT && PageTransCompound(page)) {
-		int ret;
 		get_page(page);
 		pte_unmap_unlock(ptep, ptl);
 		lock_page(page);
@@ -276,7 +274,12 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
 			page = ERR_PTR(-ENOMEM);
 			goto out;
 		}
-		arch_make_page_accessible(page);
+		ret = arch_make_page_accessible(page);
+		if (ret) {
+			put_page(page);
+			page = ERR_PTR(ret);
+			goto out;
+		}
 	}
 	if (flags & FOLL_TOUCH) {
 		if ((flags & FOLL_WRITE) &&
@@ -1920,7 +1923,11 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
 
 		VM_BUG_ON_PAGE(compound_head(page) != head, page);
 
-		arch_make_page_accessible(page);
+		ret = arch_make_page_accessible(page);
+		if (ret) {
+			put_page(head);
+			goto pte_unmap;
+		}
 		SetPageReferenced(page);
 		pages[*nr] = page;
 		(*nr)++;
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 4c020e4ae71c..558d7063c117 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2807,7 +2807,11 @@ int __test_set_page_writeback(struct page *page, bool keep_write)
 		inc_zone_page_state(page, NR_ZONE_WRITE_PENDING);
 	}
 	unlock_page_memcg(page);
-	arch_make_page_accessible(page);
+	/*
+	 * If writeback has been triggered on a page that cannot be made
+	 * accessible, it is too late.
+	 */
+	WARN_ON(arch_make_page_accessible(page));
 	return ret;
 
 }
-- 
2.25.0



^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v2 40/42] example for future extension: mm:gup/writeback: add callbacks for inaccessible pages: source indication
  2020-02-14 22:26 [PATCH v2 00/42] KVM: s390: Add support for protected VMs Christian Borntraeger
                   ` (3 preceding siblings ...)
  2020-02-14 22:26 ` [PATCH v2 39/42] example for future extension: mm:gup/writeback: add callbacks for inaccessible pages: error cases Christian Borntraeger
@ 2020-02-14 22:26 ` Christian Borntraeger
  2020-02-17 14:15   ` Ulrich Weigand
  4 siblings, 1 reply; 22+ messages in thread
From: Christian Borntraeger @ 2020-02-14 22:26 UTC (permalink / raw)
  To: Christian Borntraeger, Janosch Frank, Andrew Morton
  Cc: KVM, Cornelia Huck, David Hildenbrand, Thomas Huth,
	Ulrich Weigand, Claudio Imbrenda, linux-s390, Michael Mueller,
	Vasily Gorbik, Andrea Arcangeli, linux-mm, Will Deacon,
	Sean Christopherson

From: Claudio Imbrenda <imbrenda@linux.ibm.com>

We might want to do different things depending on where we are coming
from.

Cc: Will Deacon <will@kernel.org>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
[borntraeger@de.ibm.com: patch description]
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
 include/linux/gfp.h | 8 +++++++-
 mm/gup.c            | 4 ++--
 mm/page-writeback.c | 2 +-
 3 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index be2754841369..a15fcb361e7c 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -485,8 +485,14 @@ static inline void arch_free_page(struct page *page, int order) { }
 #ifndef HAVE_ARCH_ALLOC_PAGE
 static inline void arch_alloc_page(struct page *page, int order) { }
 #endif
+enum access_type {
+	MAKE_ACCESSIBLE_GENERIC,
+	MAKE_ACCESSIBLE_GET,
+	MAKE_ACCESSIBLE_GET_FAST,
+	MAKE_ACCESSIBLE_WRITEBACK
+};
 #ifndef HAVE_ARCH_MAKE_PAGE_ACCESSIBLE
-static inline int arch_make_page_accessible(struct page *page)
+static inline int arch_make_page_accessible(struct page *page, int where)
 {
 	return 0;
 }
diff --git a/mm/gup.c b/mm/gup.c
index 354bcfbd844b..ce962c155724 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -274,7 +274,7 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
 			page = ERR_PTR(-ENOMEM);
 			goto out;
 		}
-		ret = arch_make_page_accessible(page);
+		ret = arch_make_page_accessible(page, MAKE_ACCESSIBLE_GET);
 		if (ret) {
 			put_page(page);
 			page = ERR_PTR(ret);
@@ -1923,7 +1923,7 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
 
 		VM_BUG_ON_PAGE(compound_head(page) != head, page);
 
-		ret = arch_make_page_accessible(page);
+		ret = arch_make_page_accessible(page, MAKE_ACCESSIBLE_GET_FAST);
 		if (ret) {
 			put_page(head);
 			goto pte_unmap;
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 558d7063c117..f85148e59800 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2811,7 +2811,7 @@ int __test_set_page_writeback(struct page *page, bool keep_write)
 	 * If writeback has been triggered on a page that cannot be made
 	 * accessible, it is too late.
 	 */
-	WARN_ON(arch_make_page_accessible(page));
+	WARN_ON(arch_make_page_accessible(page, MAKE_ACCESSIBLE_WRITEBACK));
 	return ret;
 
 }
-- 
2.25.0



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 01/42] mm:gup/writeback: add callbacks for inaccessible pages
  2020-02-14 22:26 ` [PATCH v2 01/42] mm:gup/writeback: add callbacks for inaccessible pages Christian Borntraeger
@ 2020-02-17  9:14   ` David Hildenbrand
  2020-02-17 11:10     ` Christian Borntraeger
  0 siblings, 1 reply; 22+ messages in thread
From: David Hildenbrand @ 2020-02-17  9:14 UTC (permalink / raw)
  To: Christian Borntraeger, Janosch Frank, Andrew Morton
  Cc: KVM, Cornelia Huck, Thomas Huth, Ulrich Weigand,
	Claudio Imbrenda, linux-s390, Michael Mueller, Vasily Gorbik,
	Andrea Arcangeli, linux-mm, Will Deacon, Sean Christopherson

On 14.02.20 23:26, Christian Borntraeger wrote:
> From: Claudio Imbrenda <imbrenda@linux.ibm.com>
> 
> With the introduction of protected KVM guests on s390 there is now a
> concept of inaccessible pages. These pages need to be made accessible
> before the host can access them.
> 
> While cpu accesses will trigger a fault that can be resolved, I/O
> accesses will just fail.  We need to add a callback into architecture
> code for places that will do I/O, namely when writeback is started or
> when a page reference is taken.
> 
> This is not only to enable paging, file backing etc, it is also
> necessary to protect the host against a malicious user space. For
> example a bad QEMU could simply start direct I/O on such protected
> memory.  We do not want userspace to be able to trigger I/O errors and
> thus we the logic is "whenever somebody accesses that page (gup) or
> doing I/O, make sure that this page can be accessed. When the guest
> tries to access that page we will wait in the page fault handler for
> writeback to have finished and for the page_ref to be the expected
> value.
> 
> If wanted by others, the callbacks can be extended with error handlin
> and a parameter from where this is called.

s/handlin/handling/

One last question from my side:

Why is it OK to ignore errors here. IOW, why not squash "[PATCH v2
39/42] example for future extension: mm:gup/writeback: add callbacks for
inaccessible pages: error cases" into this patch.

I can see in patch "[PATCH v2 05/42] s390/mm: provide memory management
functions for protected KVM guests", that the call can fail for various
reasons. That puzzles me a bit - what would happen if any of that fails?
Or will it actually never fail for s390x (and all that error handling in
arch_make_page_accessible() is essentially dead code in real life?)

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 05/42] s390/mm: provide memory management functions for protected KVM guests
  2020-02-14 22:26 ` [PATCH v2 05/42] s390/mm: provide memory management functions for protected KVM guests Christian Borntraeger
@ 2020-02-17 10:21   ` David Hildenbrand
  2020-02-17 11:28     ` Christian Borntraeger
  0 siblings, 1 reply; 22+ messages in thread
From: David Hildenbrand @ 2020-02-17 10:21 UTC (permalink / raw)
  To: Christian Borntraeger, Janosch Frank
  Cc: KVM, Cornelia Huck, Thomas Huth, Ulrich Weigand,
	Claudio Imbrenda, linux-s390, Michael Mueller, Vasily Gorbik,
	Andrea Arcangeli, linux-mm

> diff --git a/arch/s390/include/asm/page.h b/arch/s390/include/asm/page.h
> index 85e944f04c70..4ebcf891ff3c 100644
> --- a/arch/s390/include/asm/page.h
> +++ b/arch/s390/include/asm/page.h
> @@ -153,6 +153,11 @@ static inline int devmem_is_allowed(unsigned long pfn)
>  #define HAVE_ARCH_FREE_PAGE
>  #define HAVE_ARCH_ALLOC_PAGE
>  
> +#if IS_ENABLED(CONFIG_PGSTE)
> +int arch_make_page_accessible(struct page *page);
> +#define HAVE_ARCH_MAKE_PAGE_ACCESSIBLE
> +#endif
> +

Feels like this should have been one of the (CONFIG_)ARCH_HAVE_XXX
thingies defined via kconfig instead.

E.g., like (CONFIG_)HAVE_ARCH_TRANSPARENT_HUGEPAGE

[...]

> +
> +/*
> + * Requests the Ultravisor to encrypt a guest page and make it
> + * accessible to the host for paging (export).
> + *
> + * @paddr: Absolute host address of page to be exported
> + */
> +int uv_convert_from_secure(unsigned long paddr)
> +{
> +	struct uv_cb_cfs uvcb = {
> +		.header.cmd = UVC_CMD_CONV_FROM_SEC_STOR,
> +		.header.len = sizeof(uvcb),
> +		.paddr = paddr
> +	};
> +
> +	if (uv_call(0, (u64)&uvcb))
> +		return -EINVAL;
> +	return 0;
> +}
> +
> +/*
> + * Calculate the expected ref_count for a page that would otherwise have no
> + * further pins. This was cribbed from similar functions in other places in
> + * the kernel, but with some slight modifications. We know that a secure
> + * page can not be a huge page for example.

s/ca not cannot/

> + */
> +static int expected_page_refs(struct page *page)
> +{
> +	int res;
> +
> +	res = page_mapcount(page);
> +	if (PageSwapCache(page)) {
> +		res++;
> +	} else if (page_mapping(page)) {
> +		res++;
> +		if (page_has_private(page))
> +			res++;
> +	}
> +	return res;
> +}
> +
> +static int make_secure_pte(pte_t *ptep, unsigned long addr,
> +			   struct page *exp_page, struct uv_cb_header *uvcb)
> +{
> +	pte_t entry = READ_ONCE(*ptep);
> +	struct page *page;
> +	int expected, rc = 0;
> +
> +	if (!pte_present(entry))
> +		return -ENXIO;
> +	if (pte_val(entry) & _PAGE_INVALID)
> +		return -ENXIO;
> +
> +	page = pte_page(entry);
> +	if (page != exp_page)
> +		return -ENXIO;
> +	if (PageWriteback(page))
> +		return -EAGAIN;
> +	expected = expected_page_refs(page);
> +	if (!page_ref_freeze(page, expected))
> +		return -EBUSY;
> +	set_bit(PG_arch_1, &page->flags);
> +	rc = uv_call(0, (u64)uvcb);
> +	page_ref_unfreeze(page, expected);
> +	/* Return -ENXIO if the page was not mapped, -EINVAL otherwise */
> +	if (rc)
> +		rc = uvcb->rc == 0x10a ? -ENXIO : -EINVAL;
> +	return rc;
> +}
> +
> +/*
> + * Requests the Ultravisor to make a page accessible to a guest.
> + * If it's brought in the first time, it will be cleared. If
> + * it has been exported before, it will be decrypted and integrity
> + * checked.
> + */
> +int gmap_make_secure(struct gmap *gmap, unsigned long gaddr, void *uvcb)
> +{
> +	struct vm_area_struct *vma;
> +	unsigned long uaddr;
> +	struct page *page;
> +	int rc, local_drain = 0;

local_drain could have been a bool.

> +	spinlock_t *ptelock;
> +	pte_t *ptep;
> +
> +again:
> +	rc = -EFAULT;
> +	down_read(&gmap->mm->mmap_sem);
> +
> +	uaddr = __gmap_translate(gmap, gaddr);
> +	if (IS_ERR_VALUE(uaddr))
> +		goto out;
> +	vma = find_vma(gmap->mm, uaddr);
> +	if (!vma)
> +		goto out;
> +	/*
> +	 * Secure pages cannot be huge and userspace should not combine both.
> +	 * In case userspace does it anyway this will result in an -EFAULT for
> +	 * the unpack. The guest is thus never reaching secure mode. If
> +	 * userspace is playing dirty tricky with mapping huge pages later
> +	 * on this will result in a segmenation fault.

s/segmenation/segmentation/

> +	 */
> +	if (is_vm_hugetlb_page(vma))
> +		goto out;
> +
> +	rc = -ENXIO;
> +	page = follow_page(vma, uaddr, FOLL_WRITE);
> +	if (IS_ERR_OR_NULL(page))
> +		goto out;
> +
> +	lock_page(page);
> +	ptep = get_locked_pte(gmap->mm, uaddr, &ptelock);
> +	rc = make_secure_pte(ptep, uaddr, page, uvcb);
> +	pte_unmap_unlock(ptep, ptelock);
> +	unlock_page(page);
> +out:
> +	up_read(&gmap->mm->mmap_sem);
> +
> +	if (rc == -EAGAIN) {
> +		wait_on_page_writeback(page);
> +	} else if (rc == -EBUSY) {
> +		/*
> +		 * If we have tried a local drain and the page refcount
> +		 * still does not match our expected safe value, try with a
> +		 * system wide drain. This is needed if the pagevecs holding
> +		 * the page are on a different CPU.
> +		 */
> +		if (local_drain) {
> +			lru_add_drain_all();

I do wonder if that is valid to be called with all the locks at this point.

> +			/* We give up here, and let the caller try again */
> +			return -EAGAIN;
> +		}
> +		/*
> +		 * We are here if the page refcount does not match the
> +		 * expected safe value. The main culprits are usually
> +		 * pagevecs. With lru_add_drain() we drain the pagevecs
> +		 * on the local CPU so that hopefully the refcount will
> +		 * reach the expected safe value.
> +		 */
> +		lru_add_drain();

dito ...

> +		local_drain = 1;
> +		/* And now we try again immediately after draining */
> +		goto again;
> +	} else if (rc == -ENXIO) {
> +		if (gmap_fault(gmap, gaddr, FAULT_FLAG_WRITE))
> +			return -EFAULT;
> +		return -EAGAIN;
> +	}
> +	return rc;
> +}
> +EXPORT_SYMBOL_GPL(gmap_make_secure);
> +
> +int gmap_convert_to_secure(struct gmap *gmap, unsigned long gaddr)
> +{
> +	struct uv_cb_cts uvcb = {
> +		.header.cmd = UVC_CMD_CONV_TO_SEC_STOR,
> +		.header.len = sizeof(uvcb),
> +		.guest_handle = gmap->guest_handle,
> +		.gaddr = gaddr,
> +	};
> +
> +	return gmap_make_secure(gmap, gaddr, &uvcb);
> +}
> +EXPORT_SYMBOL_GPL(gmap_convert_to_secure);
> +
> +/**
> + * To be called with the page locked or with an extra reference!

Can we have races here? (IOW, two callers concurrently for the same page)

> + */
> +int arch_make_page_accessible(struct page *page)
> +{
> +	int rc = 0;
> +
> +	/* Hugepage cannot be protected, so nothing to do */
> +	if (PageHuge(page))
> +		return 0;
> +
> +	/*
> +	 * PG_arch_1 is used in 3 places:
> +	 * 1. for kernel page tables during early boot
> +	 * 2. for storage keys of huge pages and KVM
> +	 * 3. As an indication that this page might be secure. This can
> +	 *    overindicate, e.g. we set the bit before calling
> +	 *    convert_to_secure.
> +	 * As secure pages are never huge, all 3 variants can co-exists.
> +	 */
> +	if (!test_bit(PG_arch_1, &page->flags))
> +		return 0;
> +
> +	rc = uv_pin_shared(page_to_phys(page));
> +	if (!rc) {
> +		clear_bit(PG_arch_1, &page->flags);
> +		return 0;
> +	}

Overall, looks sane to me. (I am mostly concerned about possible races,
e.g., when two gmaps would be created for a single VM and nasty stuff be
done with them). But yeah, I guess you guys thought about this ;)

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 01/42] mm:gup/writeback: add callbacks for inaccessible pages
  2020-02-17  9:14   ` David Hildenbrand
@ 2020-02-17 11:10     ` Christian Borntraeger
  2020-02-18  8:27       ` David Hildenbrand
  0 siblings, 1 reply; 22+ messages in thread
From: Christian Borntraeger @ 2020-02-17 11:10 UTC (permalink / raw)
  To: David Hildenbrand, Janosch Frank, Andrew Morton
  Cc: KVM, Cornelia Huck, Thomas Huth, Ulrich Weigand,
	Claudio Imbrenda, linux-s390, Michael Mueller, Vasily Gorbik,
	Andrea Arcangeli, linux-mm, Will Deacon, Sean Christopherson



On 17.02.20 10:14, David Hildenbrand wrote:
> On 14.02.20 23:26, Christian Borntraeger wrote:
>> From: Claudio Imbrenda <imbrenda@linux.ibm.com>
>>
>> With the introduction of protected KVM guests on s390 there is now a
>> concept of inaccessible pages. These pages need to be made accessible
>> before the host can access them.
>>
>> While cpu accesses will trigger a fault that can be resolved, I/O
>> accesses will just fail.  We need to add a callback into architecture
>> code for places that will do I/O, namely when writeback is started or
>> when a page reference is taken.
>>
>> This is not only to enable paging, file backing etc, it is also
>> necessary to protect the host against a malicious user space. For
>> example a bad QEMU could simply start direct I/O on such protected
>> memory.  We do not want userspace to be able to trigger I/O errors and
>> thus we the logic is "whenever somebody accesses that page (gup) or
>> doing I/O, make sure that this page can be accessed. When the guest
>> tries to access that page we will wait in the page fault handler for
>> writeback to have finished and for the page_ref to be the expected
>> value.
>>
>> If wanted by others, the callbacks can be extended with error handlin
>> and a parameter from where this is called.
> 
> s/handlin/handling/

ack. 
> 
> One last question from my side:
> 
> Why is it OK to ignore errors here. IOW, why not squash "[PATCH v2
> 39/42] example for future extension: mm:gup/writeback: add callbacks for
> inaccessible pages: error cases" into this patch.

I can certainly squash this in. But I have never heard feedback from the 
memory management people if they would prefer the patch as-is (no overhead
at all for !s390) or already with the error handling. 
> 
> I can see in patch "[PATCH v2 05/42] s390/mm: provide memory management
> functions for protected KVM guests", that the call can fail for various
> reasons. That puzzles me a bit - what would happen if any of that fails?

The convert _to_ secure has error handling. uv_convert_from_secure (the
one that is used for arch_make_page_accessible can only fail if we mess up,
e.g. an "export" on memory that we have donated to the ultravisor.


> Or will it actually never fail for s390x (and all that error handling in
> arch_make_page_accessible() is essentially dead code in real life?)

So yes, if everything is setup properly this should not fail in real life
and only we have a kernel (or firmware) bug.



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 05/42] s390/mm: provide memory management functions for protected KVM guests
  2020-02-17 10:21   ` David Hildenbrand
@ 2020-02-17 11:28     ` Christian Borntraeger
  2020-02-17 12:07       ` David Hildenbrand
  0 siblings, 1 reply; 22+ messages in thread
From: Christian Borntraeger @ 2020-02-17 11:28 UTC (permalink / raw)
  To: David Hildenbrand, Janosch Frank
  Cc: KVM, Cornelia Huck, Thomas Huth, Ulrich Weigand,
	Claudio Imbrenda, linux-s390, Michael Mueller, Vasily Gorbik,
	Andrea Arcangeli, linux-mm



On 17.02.20 11:21, David Hildenbrand wrote:
>> diff --git a/arch/s390/include/asm/page.h b/arch/s390/include/asm/page.h
>> index 85e944f04c70..4ebcf891ff3c 100644
>> --- a/arch/s390/include/asm/page.h
>> +++ b/arch/s390/include/asm/page.h
>> @@ -153,6 +153,11 @@ static inline int devmem_is_allowed(unsigned long pfn)
>>  #define HAVE_ARCH_FREE_PAGE
>>  #define HAVE_ARCH_ALLOC_PAGE
>>  
>> +#if IS_ENABLED(CONFIG_PGSTE)
>> +int arch_make_page_accessible(struct page *page);
>> +#define HAVE_ARCH_MAKE_PAGE_ACCESSIBLE
>> +#endif
>> +
> 
> Feels like this should have been one of the (CONFIG_)ARCH_HAVE_XXX
> thingies defined via kconfig instead.
> 
> E.g., like (CONFIG_)HAVE_ARCH_TRANSPARENT_HUGEPAGE
> 
> [...]

This looks more or less like HAVE_ARCH_ALLOC_PAGE. You will find both
variants.
I think I will leave it that way for now until we need it to be a config
or the mm maintainers have a preference.


> 
>> +
>> +/*
>> + * Requests the Ultravisor to encrypt a guest page and make it
>> + * accessible to the host for paging (export).
>> + *
>> + * @paddr: Absolute host address of page to be exported
>> + */
>> +int uv_convert_from_secure(unsigned long paddr)
>> +{
>> +	struct uv_cb_cfs uvcb = {
>> +		.header.cmd = UVC_CMD_CONV_FROM_SEC_STOR,
>> +		.header.len = sizeof(uvcb),
>> +		.paddr = paddr
>> +	};
>> +
>> +	if (uv_call(0, (u64)&uvcb))
>> +		return -EINVAL;
>> +	return 0;
>> +}
>> +
>> +/*
>> + * Calculate the expected ref_count for a page that would otherwise have no
>> + * further pins. This was cribbed from similar functions in other places in
>> + * the kernel, but with some slight modifications. We know that a secure
>> + * page can not be a huge page for example.
> 
> s/ca not cannot/

ack.


> 
>> + */
>> +static int expected_page_refs(struct page *page)
>> +{
>> +	int res;
>> +
>> +	res = page_mapcount(page);
>> +	if (PageSwapCache(page)) {
>> +		res++;
>> +	} else if (page_mapping(page)) {
>> +		res++;
>> +		if (page_has_private(page))
>> +			res++;
>> +	}
>> +	return res;
>> +}
>> +
>> +static int make_secure_pte(pte_t *ptep, unsigned long addr,
>> +			   struct page *exp_page, struct uv_cb_header *uvcb)
>> +{
>> +	pte_t entry = READ_ONCE(*ptep);
>> +	struct page *page;
>> +	int expected, rc = 0;
>> +
>> +	if (!pte_present(entry))
>> +		return -ENXIO;
>> +	if (pte_val(entry) & _PAGE_INVALID)
>> +		return -ENXIO;
>> +
>> +	page = pte_page(entry);
>> +	if (page != exp_page)
>> +		return -ENXIO;
>> +	if (PageWriteback(page))
>> +		return -EAGAIN;
>> +	expected = expected_page_refs(page);
>> +	if (!page_ref_freeze(page, expected))
>> +		return -EBUSY;
>> +	set_bit(PG_arch_1, &page->flags);
>> +	rc = uv_call(0, (u64)uvcb);
>> +	page_ref_unfreeze(page, expected);
>> +	/* Return -ENXIO if the page was not mapped, -EINVAL otherwise */
>> +	if (rc)
>> +		rc = uvcb->rc == 0x10a ? -ENXIO : -EINVAL;
>> +	return rc;
>> +}
>> +
>> +/*
>> + * Requests the Ultravisor to make a page accessible to a guest.
>> + * If it's brought in the first time, it will be cleared. If
>> + * it has been exported before, it will be decrypted and integrity
>> + * checked.
>> + */
>> +int gmap_make_secure(struct gmap *gmap, unsigned long gaddr, void *uvcb)
>> +{
>> +	struct vm_area_struct *vma;
>> +	unsigned long uaddr;
>> +	struct page *page;
>> +	int rc, local_drain = 0;
> 
> local_drain could have been a bool.

ack
> 
>> +	spinlock_t *ptelock;
>> +	pte_t *ptep;
>> +
>> +again:
>> +	rc = -EFAULT;
>> +	down_read(&gmap->mm->mmap_sem);
>> +
>> +	uaddr = __gmap_translate(gmap, gaddr);
>> +	if (IS_ERR_VALUE(uaddr))
>> +		goto out;
>> +	vma = find_vma(gmap->mm, uaddr);
>> +	if (!vma)
>> +		goto out;
>> +	/*
>> +	 * Secure pages cannot be huge and userspace should not combine both.
>> +	 * In case userspace does it anyway this will result in an -EFAULT for
>> +	 * the unpack. The guest is thus never reaching secure mode. If
>> +	 * userspace is playing dirty tricky with mapping huge pages later
>> +	 * on this will result in a segmenation fault.
> 
> s/segmenation/segmentation/

ack.
> 
>> +	 */
>> +	if (is_vm_hugetlb_page(vma))
>> +		goto out;
>> +
>> +	rc = -ENXIO;
>> +	page = follow_page(vma, uaddr, FOLL_WRITE);
>> +	if (IS_ERR_OR_NULL(page))
>> +		goto out;
>> +
>> +	lock_page(page);
>> +	ptep = get_locked_pte(gmap->mm, uaddr, &ptelock);
>> +	rc = make_secure_pte(ptep, uaddr, page, uvcb);
>> +	pte_unmap_unlock(ptep, ptelock);
>> +	unlock_page(page);
>> +out:
>> +	up_read(&gmap->mm->mmap_sem);
>> +
>> +	if (rc == -EAGAIN) {
>> +		wait_on_page_writeback(page);
>> +	} else if (rc == -EBUSY) {
>> +		/*
>> +		 * If we have tried a local drain and the page refcount
>> +		 * still does not match our expected safe value, try with a
>> +		 * system wide drain. This is needed if the pagevecs holding
>> +		 * the page are on a different CPU.
>> +		 */
>> +		if (local_drain) {
>> +			lru_add_drain_all();
> 
> I do wonder if that is valid to be called with all the locks at this point.

This function uses per cpu workers and needs no other locks. Also verified 
with lockdep. 

> 
>> +			/* We give up here, and let the caller try again */
>> +			return -EAGAIN;
>> +		}
>> +		/*
>> +		 * We are here if the page refcount does not match the
>> +		 * expected safe value. The main culprits are usually
>> +		 * pagevecs. With lru_add_drain() we drain the pagevecs
>> +		 * on the local CPU so that hopefully the refcount will
>> +		 * reach the expected safe value.
>> +		 */
>> +		lru_add_drain();
> 
> dito ...

dito. 

> 
>> +		local_drain = 1;
>> +		/* And now we try again immediately after draining */
>> +		goto again;
>> +	} else if (rc == -ENXIO) {
>> +		if (gmap_fault(gmap, gaddr, FAULT_FLAG_WRITE))
>> +			return -EFAULT;
>> +		return -EAGAIN;
>> +	}
>> +	return rc;
>> +}
>> +EXPORT_SYMBOL_GPL(gmap_make_secure);
>> +
>> +int gmap_convert_to_secure(struct gmap *gmap, unsigned long gaddr)
>> +{
>> +	struct uv_cb_cts uvcb = {
>> +		.header.cmd = UVC_CMD_CONV_TO_SEC_STOR,
>> +		.header.len = sizeof(uvcb),
>> +		.guest_handle = gmap->guest_handle,
>> +		.gaddr = gaddr,
>> +	};
>> +
>> +	return gmap_make_secure(gmap, gaddr, &uvcb);
>> +}
>> +EXPORT_SYMBOL_GPL(gmap_convert_to_secure);
>> +
>> +/**
>> + * To be called with the page locked or with an extra reference!
> 
> Can we have races here? (IOW, two callers concurrently for the same page)

That would be fine and is part of the design. The ultravisor calls will
either make the page accessible or will be a (mostly) no-op.
In fact, we allow for slight over-indication of "needs to be exported"

What about:

/*
 * To be called with the page locked or with an extra reference! This will
 * prevent gmap_make_secure from touching the page concurrently. Having 2
 * parallel make_page_accessible is fine, as the UV calls will become a 
 * no-op if the page is already exported.
 */


> 
>> + */
>> +int arch_make_page_accessible(struct page *page)
>> +{
>> +	int rc = 0;
>> +
>> +	/* Hugepage cannot be protected, so nothing to do */
>> +	if (PageHuge(page))
>> +		return 0;
>> +
>> +	/*
>> +	 * PG_arch_1 is used in 3 places:
>> +	 * 1. for kernel page tables during early boot
>> +	 * 2. for storage keys of huge pages and KVM
>> +	 * 3. As an indication that this page might be secure. This can
>> +	 *    overindicate, e.g. we set the bit before calling
>> +	 *    convert_to_secure.
>> +	 * As secure pages are never huge, all 3 variants can co-exists.
>> +	 */
>> +	if (!test_bit(PG_arch_1, &page->flags))
>> +		return 0;
>> +
>> +	rc = uv_pin_shared(page_to_phys(page));
>> +	if (!rc) {
>> +		clear_bit(PG_arch_1, &page->flags);
>> +		return 0;
>> +	}
> 
> Overall, looks sane to me. (I am mostly concerned about possible races,
> e.g., when two gmaps would be created for a single VM and nasty stuff be
> done with them). But yeah, I guess you guys thought about this ;)




^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 05/42] s390/mm: provide memory management functions for protected KVM guests
  2020-02-17 11:28     ` Christian Borntraeger
@ 2020-02-17 12:07       ` David Hildenbrand
  0 siblings, 0 replies; 22+ messages in thread
From: David Hildenbrand @ 2020-02-17 12:07 UTC (permalink / raw)
  To: Christian Borntraeger, Janosch Frank
  Cc: KVM, Cornelia Huck, Thomas Huth, Ulrich Weigand,
	Claudio Imbrenda, linux-s390, Michael Mueller, Vasily Gorbik,
	Andrea Arcangeli, linux-mm


>>> +		if (local_drain) {
>>> +			lru_add_drain_all();
>>
>> I do wonder if that is valid to be called with all the locks at this point.
> 
> This function uses per cpu workers and needs no other locks. Also verified 
> with lockdep. 

Okay, perfect.

>>> +/**
>>> + * To be called with the page locked or with an extra reference!
>>
>> Can we have races here? (IOW, two callers concurrently for the same page)
> 
> That would be fine and is part of the design. The ultravisor calls will
> either make the page accessible or will be a (mostly) no-op.
> In fact, we allow for slight over-indication of "needs to be exported"
> 
> What about:
> 
> /*
>  * To be called with the page locked or with an extra reference! This will
>  * prevent gmap_make_secure from touching the page concurrently. Having 2
>  * parallel make_page_accessible is fine, as the UV calls will become a 
>  * no-op if the page is already exported.
>  */

Yes, much clearer, thanks!



-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 40/42] example for future extension: mm:gup/writeback: add callbacks for inaccessible pages: source indication
  2020-02-14 22:26 ` [PATCH v2 40/42] example for future extension: mm:gup/writeback: add callbacks for inaccessible pages: source indication Christian Borntraeger
@ 2020-02-17 14:15   ` Ulrich Weigand
  2020-02-17 14:38     ` Christian Borntraeger
  0 siblings, 1 reply; 22+ messages in thread
From: Ulrich Weigand @ 2020-02-17 14:15 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Janosch Frank, Andrew Morton, KVM, Cornelia Huck,
	David Hildenbrand, Thomas Huth, Ulrich Weigand, Claudio Imbrenda,
	linux-s390, Michael Mueller, Vasily Gorbik, Andrea Arcangeli,
	linux-mm, Will Deacon, Sean Christopherson

On Fri, Feb 14, 2020 at 05:26:56PM -0500, Christian Borntraeger wrote:
> +enum access_type {
> +	MAKE_ACCESSIBLE_GENERIC,
> +	MAKE_ACCESSIBLE_GET,
> +	MAKE_ACCESSIBLE_GET_FAST,
> +	MAKE_ACCESSIBLE_WRITEBACK
> +};
>  #ifndef HAVE_ARCH_MAKE_PAGE_ACCESSIBLE
> -static inline int arch_make_page_accessible(struct page *page)
> +static inline int arch_make_page_accessible(struct page *page, int where)

If we want to make this distinction, wouldn't it be simpler to just
use different function names, like
  arch_make_page_accessible_for_writeback
  arch_make_page_accessible_for_gup
etc.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  Ulrich.Weigand@de.ibm.com



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 40/42] example for future extension: mm:gup/writeback: add callbacks for inaccessible pages: source indication
  2020-02-17 14:15   ` Ulrich Weigand
@ 2020-02-17 14:38     ` Christian Borntraeger
  0 siblings, 0 replies; 22+ messages in thread
From: Christian Borntraeger @ 2020-02-17 14:38 UTC (permalink / raw)
  To: Ulrich Weigand
  Cc: Janosch Frank, Andrew Morton, KVM, Cornelia Huck,
	David Hildenbrand, Thomas Huth, Ulrich Weigand, Claudio Imbrenda,
	linux-s390, Michael Mueller, Vasily Gorbik, Andrea Arcangeli,
	linux-mm, Will Deacon, Sean Christopherson



On 17.02.20 15:15, Ulrich Weigand wrote:
> On Fri, Feb 14, 2020 at 05:26:56PM -0500, Christian Borntraeger wrote:
>> +enum access_type {
>> +	MAKE_ACCESSIBLE_GENERIC,
>> +	MAKE_ACCESSIBLE_GET,
>> +	MAKE_ACCESSIBLE_GET_FAST,
>> +	MAKE_ACCESSIBLE_WRITEBACK
>> +};
>>  #ifndef HAVE_ARCH_MAKE_PAGE_ACCESSIBLE
>> -static inline int arch_make_page_accessible(struct page *page)
>> +static inline int arch_make_page_accessible(struct page *page, int where)
> 
> If we want to make this distinction, wouldn't it be simpler to just
> use different function names, like
>   arch_make_page_accessible_for_writeback
>   arch_make_page_accessible_for_gup
> etc.

Agreed.
I would suggest to do these changes when somebody needs them, though.
On the other hand, Patch 39 (the error handling) is something that we
could merge now.




^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 01/42] mm:gup/writeback: add callbacks for inaccessible pages
  2020-02-17 11:10     ` Christian Borntraeger
@ 2020-02-18  8:27       ` David Hildenbrand
  2020-02-18 15:46         ` Sean Christopherson
  0 siblings, 1 reply; 22+ messages in thread
From: David Hildenbrand @ 2020-02-18  8:27 UTC (permalink / raw)
  To: Christian Borntraeger, Janosch Frank, Andrew Morton
  Cc: KVM, Cornelia Huck, Thomas Huth, Ulrich Weigand,
	Claudio Imbrenda, linux-s390, Michael Mueller, Vasily Gorbik,
	Andrea Arcangeli, linux-mm, Will Deacon, Sean Christopherson

On 17.02.20 12:10, Christian Borntraeger wrote:
> 
> 
> On 17.02.20 10:14, David Hildenbrand wrote:
>> On 14.02.20 23:26, Christian Borntraeger wrote:
>>> From: Claudio Imbrenda <imbrenda@linux.ibm.com>
>>>
>>> With the introduction of protected KVM guests on s390 there is now a
>>> concept of inaccessible pages. These pages need to be made accessible
>>> before the host can access them.
>>>
>>> While cpu accesses will trigger a fault that can be resolved, I/O
>>> accesses will just fail.  We need to add a callback into architecture
>>> code for places that will do I/O, namely when writeback is started or
>>> when a page reference is taken.
>>>
>>> This is not only to enable paging, file backing etc, it is also
>>> necessary to protect the host against a malicious user space. For
>>> example a bad QEMU could simply start direct I/O on such protected
>>> memory.  We do not want userspace to be able to trigger I/O errors and
>>> thus we the logic is "whenever somebody accesses that page (gup) or
>>> doing I/O, make sure that this page can be accessed. When the guest
>>> tries to access that page we will wait in the page fault handler for
>>> writeback to have finished and for the page_ref to be the expected
>>> value.
>>>
>>> If wanted by others, the callbacks can be extended with error handlin
>>> and a parameter from where this is called.
>>
>> s/handlin/handling/
> 
> ack. 
>>
>> One last question from my side:
>>
>> Why is it OK to ignore errors here. IOW, why not squash "[PATCH v2
>> 39/42] example for future extension: mm:gup/writeback: add callbacks for
>> inaccessible pages: error cases" into this patch.
> 
> I can certainly squash this in. But I have never heard feedback from the 
> memory management people if they would prefer the patch as-is (no overhead
> at all for !s390) or already with the error handling. 
>>
>> I can see in patch "[PATCH v2 05/42] s390/mm: provide memory management
>> functions for protected KVM guests", that the call can fail for various
>> reasons. That puzzles me a bit - what would happen if any of that fails?
> 
> The convert _to_ secure has error handling. uv_convert_from_secure (the
> one that is used for arch_make_page_accessible can only fail if we mess up,
> e.g. an "export" on memory that we have donated to the ultravisor.

Okay, so more like a BUG_ON() in case it really fails.

> 
> 
>> Or will it actually never fail for s390x (and all that error handling in
>> arch_make_page_accessible() is essentially dead code in real life?)
> 
> So yes, if everything is setup properly this should not fail in real life
> and only we have a kernel (or firmware) bug.
> 

Then, without feedback from other possible users, this should be a void
function. So either introduce error handling or convert it to a void for
now (and add e.g., BUG_ON and a comment inside the s390x implementation).

As discussed in the other patch, I'd prefer a
CONFIG_HAVE_ARCH_MAKE_PAGE_ACCESSIBLE and handle it via kconfig, but not
sure what other people here prefer.

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 01/42] mm:gup/writeback: add callbacks for inaccessible pages
  2020-02-18  8:27       ` David Hildenbrand
@ 2020-02-18 15:46         ` Sean Christopherson
  2020-02-18 16:02           ` Will Deacon
  0 siblings, 1 reply; 22+ messages in thread
From: Sean Christopherson @ 2020-02-18 15:46 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Christian Borntraeger, Janosch Frank, Andrew Morton, KVM,
	Cornelia Huck, Thomas Huth, Ulrich Weigand, Claudio Imbrenda,
	linux-s390, Michael Mueller, Vasily Gorbik, Andrea Arcangeli,
	linux-mm, Will Deacon

On Tue, Feb 18, 2020 at 09:27:20AM +0100, David Hildenbrand wrote:
> On 17.02.20 12:10, Christian Borntraeger wrote:
> > So yes, if everything is setup properly this should not fail in real life
> > and only we have a kernel (or firmware) bug.
> > 
> 
> Then, without feedback from other possible users, this should be a void
> function. So either introduce error handling or convert it to a void for
> now (and add e.g., BUG_ON and a comment inside the s390x implementation).

My preference would also be for a void function (versus ignoring an int
return).


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 01/42] mm:gup/writeback: add callbacks for inaccessible pages
  2020-02-18 15:46         ` Sean Christopherson
@ 2020-02-18 16:02           ` Will Deacon
  2020-02-18 16:15             ` Christian Borntraeger
  0 siblings, 1 reply; 22+ messages in thread
From: Will Deacon @ 2020-02-18 16:02 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: David Hildenbrand, Christian Borntraeger, Janosch Frank,
	Andrew Morton, KVM, Cornelia Huck, Thomas Huth, Ulrich Weigand,
	Claudio Imbrenda, linux-s390, Michael Mueller, Vasily Gorbik,
	Andrea Arcangeli, linux-mm

On Tue, Feb 18, 2020 at 07:46:10AM -0800, Sean Christopherson wrote:
> On Tue, Feb 18, 2020 at 09:27:20AM +0100, David Hildenbrand wrote:
> > On 17.02.20 12:10, Christian Borntraeger wrote:
> > > So yes, if everything is setup properly this should not fail in real life
> > > and only we have a kernel (or firmware) bug.
> > > 
> > 
> > Then, without feedback from other possible users, this should be a void
> > function. So either introduce error handling or convert it to a void for
> > now (and add e.g., BUG_ON and a comment inside the s390x implementation).
> 
> My preference would also be for a void function (versus ignoring an int
> return).

The gup code could certainly handle the error value, although the writeback
is a lot less clear (so a BUG_ON() would seem to be sufficient for now).

Will


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 01/42] mm:gup/writeback: add callbacks for inaccessible pages
  2020-02-18 16:02           ` Will Deacon
@ 2020-02-18 16:15             ` Christian Borntraeger
  2020-02-18 21:35               ` Sean Christopherson
  0 siblings, 1 reply; 22+ messages in thread
From: Christian Borntraeger @ 2020-02-18 16:15 UTC (permalink / raw)
  To: Will Deacon, Sean Christopherson
  Cc: David Hildenbrand, Janosch Frank, Andrew Morton, KVM,
	Cornelia Huck, Thomas Huth, Ulrich Weigand, Claudio Imbrenda,
	linux-s390, Michael Mueller, Vasily Gorbik, Andrea Arcangeli,
	linux-mm



On 18.02.20 17:02, Will Deacon wrote:
> On Tue, Feb 18, 2020 at 07:46:10AM -0800, Sean Christopherson wrote:
>> On Tue, Feb 18, 2020 at 09:27:20AM +0100, David Hildenbrand wrote:
>>> On 17.02.20 12:10, Christian Borntraeger wrote:
>>>> So yes, if everything is setup properly this should not fail in real life
>>>> and only we have a kernel (or firmware) bug.
>>>>
>>>
>>> Then, without feedback from other possible users, this should be a void
>>> function. So either introduce error handling or convert it to a void for
>>> now (and add e.g., BUG_ON and a comment inside the s390x implementation).
>>
>> My preference would also be for a void function (versus ignoring an int
>> return).
> 
> The gup code could certainly handle the error value, although the writeback
> is a lot less clear (so a BUG_ON() would seem to be sufficient for now).

Sean, David. Can we agree on merging patch 39?



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 39/42] example for future extension: mm:gup/writeback: add callbacks for inaccessible pages: error cases
  2020-02-14 22:26 ` [PATCH v2 39/42] example for future extension: mm:gup/writeback: add callbacks for inaccessible pages: error cases Christian Borntraeger
@ 2020-02-18 16:25   ` Will Deacon
  2020-02-18 16:30     ` Christian Borntraeger
  0 siblings, 1 reply; 22+ messages in thread
From: Will Deacon @ 2020-02-18 16:25 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Janosch Frank, Andrew Morton, KVM, Cornelia Huck,
	David Hildenbrand, Thomas Huth, Ulrich Weigand, Claudio Imbrenda,
	linux-s390, Michael Mueller, Vasily Gorbik, Andrea Arcangeli,
	linux-mm, Sean Christopherson

On Fri, Feb 14, 2020 at 05:26:55PM -0500, Christian Borntraeger wrote:
> From: Claudio Imbrenda <imbrenda@linux.ibm.com>
> 
> This is a potential extension to do error handling if we fail to
> make the page accessible if we know what others need.
> 
> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
> ---
>  mm/gup.c            | 17 ++++++++++++-----
>  mm/page-writeback.c |  6 +++++-
>  2 files changed, 17 insertions(+), 6 deletions(-)

Sorry, I missed this when replying elsewhere in the thread!
Anyway, looks good to me:

Acked-by: Will Deacon <will@kernel.org>

Thanks,

Will


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 39/42] example for future extension: mm:gup/writeback: add callbacks for inaccessible pages: error cases
  2020-02-18 16:25   ` Will Deacon
@ 2020-02-18 16:30     ` Christian Borntraeger
  2020-02-18 16:33       ` Will Deacon
  0 siblings, 1 reply; 22+ messages in thread
From: Christian Borntraeger @ 2020-02-18 16:30 UTC (permalink / raw)
  To: Will Deacon
  Cc: Janosch Frank, Andrew Morton, KVM, Cornelia Huck,
	David Hildenbrand, Thomas Huth, Ulrich Weigand, Claudio Imbrenda,
	linux-s390, Michael Mueller, Vasily Gorbik, Andrea Arcangeli,
	linux-mm, Sean Christopherson



On 18.02.20 17:25, Will Deacon wrote:
> On Fri, Feb 14, 2020 at 05:26:55PM -0500, Christian Borntraeger wrote:
>> From: Claudio Imbrenda <imbrenda@linux.ibm.com>
>>
>> This is a potential extension to do error handling if we fail to
>> make the page accessible if we know what others need.
>>
>> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
>> ---
>>  mm/gup.c            | 17 ++++++++++++-----
>>  mm/page-writeback.c |  6 +++++-
>>  2 files changed, 17 insertions(+), 6 deletions(-)
> 
> Sorry, I missed this when replying elsewhere in the thread!
> Anyway, looks good to me:
> 
> Acked-by: Will Deacon <will@kernel.org>

I can use that for a combined patch (this one merged into the first) ? Correct?



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 39/42] example for future extension: mm:gup/writeback: add callbacks for inaccessible pages: error cases
  2020-02-18 16:30     ` Christian Borntraeger
@ 2020-02-18 16:33       ` Will Deacon
  0 siblings, 0 replies; 22+ messages in thread
From: Will Deacon @ 2020-02-18 16:33 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Janosch Frank, Andrew Morton, KVM, Cornelia Huck,
	David Hildenbrand, Thomas Huth, Ulrich Weigand, Claudio Imbrenda,
	linux-s390, Michael Mueller, Vasily Gorbik, Andrea Arcangeli,
	linux-mm, Sean Christopherson

On Tue, Feb 18, 2020 at 05:30:40PM +0100, Christian Borntraeger wrote:
> On 18.02.20 17:25, Will Deacon wrote:
> > On Fri, Feb 14, 2020 at 05:26:55PM -0500, Christian Borntraeger wrote:
> >> From: Claudio Imbrenda <imbrenda@linux.ibm.com>
> >>
> >> This is a potential extension to do error handling if we fail to
> >> make the page accessible if we know what others need.
> >>
> >> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
> >> ---
> >>  mm/gup.c            | 17 ++++++++++++-----
> >>  mm/page-writeback.c |  6 +++++-
> >>  2 files changed, 17 insertions(+), 6 deletions(-)
> > 
> > Sorry, I missed this when replying elsewhere in the thread!
> > Anyway, looks good to me:
> > 
> > Acked-by: Will Deacon <will@kernel.org>
> 
> I can use that for a combined patch (this one merged into the first) ? Correct?

Fine by me!

Will


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 01/42] mm:gup/writeback: add callbacks for inaccessible pages
  2020-02-18 16:15             ` Christian Borntraeger
@ 2020-02-18 21:35               ` Sean Christopherson
  2020-02-19  8:31                 ` David Hildenbrand
  0 siblings, 1 reply; 22+ messages in thread
From: Sean Christopherson @ 2020-02-18 21:35 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Will Deacon, David Hildenbrand, Janosch Frank, Andrew Morton,
	KVM, Cornelia Huck, Thomas Huth, Ulrich Weigand,
	Claudio Imbrenda, linux-s390, Michael Mueller, Vasily Gorbik,
	Andrea Arcangeli, linux-mm

On Tue, Feb 18, 2020 at 05:15:57PM +0100, Christian Borntraeger wrote:
> 
> 
> On 18.02.20 17:02, Will Deacon wrote:
> > On Tue, Feb 18, 2020 at 07:46:10AM -0800, Sean Christopherson wrote:
> >> On Tue, Feb 18, 2020 at 09:27:20AM +0100, David Hildenbrand wrote:
> >>> On 17.02.20 12:10, Christian Borntraeger wrote:
> >>>> So yes, if everything is setup properly this should not fail in real life
> >>>> and only we have a kernel (or firmware) bug.
> >>>>
> >>>
> >>> Then, without feedback from other possible users, this should be a void
> >>> function. So either introduce error handling or convert it to a void for
> >>> now (and add e.g., BUG_ON and a comment inside the s390x implementation).
> >>
> >> My preference would also be for a void function (versus ignoring an int
> >> return).
> > 
> > The gup code could certainly handle the error value, although the writeback
> > is a lot less clear (so a BUG_ON() would seem to be sufficient for now).
> 
> Sean, David. Can we agree on merging patch 39?

I'm a-ok with adding error checking, ignoring the return value is the only
option I don't like :-)


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 01/42] mm:gup/writeback: add callbacks for inaccessible pages
  2020-02-18 21:35               ` Sean Christopherson
@ 2020-02-19  8:31                 ` David Hildenbrand
  0 siblings, 0 replies; 22+ messages in thread
From: David Hildenbrand @ 2020-02-19  8:31 UTC (permalink / raw)
  To: Sean Christopherson, Christian Borntraeger
  Cc: Will Deacon, Janosch Frank, Andrew Morton, KVM, Cornelia Huck,
	Thomas Huth, Ulrich Weigand, Claudio Imbrenda, linux-s390,
	Michael Mueller, Vasily Gorbik, Andrea Arcangeli, linux-mm

On 18.02.20 22:35, Sean Christopherson wrote:
> On Tue, Feb 18, 2020 at 05:15:57PM +0100, Christian Borntraeger wrote:
>>
>>
>> On 18.02.20 17:02, Will Deacon wrote:
>>> On Tue, Feb 18, 2020 at 07:46:10AM -0800, Sean Christopherson wrote:
>>>> On Tue, Feb 18, 2020 at 09:27:20AM +0100, David Hildenbrand wrote:
>>>>> On 17.02.20 12:10, Christian Borntraeger wrote:
>>>>>> So yes, if everything is setup properly this should not fail in real life
>>>>>> and only we have a kernel (or firmware) bug.
>>>>>>
>>>>>
>>>>> Then, without feedback from other possible users, this should be a void
>>>>> function. So either introduce error handling or convert it to a void for
>>>>> now (and add e.g., BUG_ON and a comment inside the s390x implementation).
>>>>
>>>> My preference would also be for a void function (versus ignoring an int
>>>> return).
>>>
>>> The gup code could certainly handle the error value, although the writeback
>>> is a lot less clear (so a BUG_ON() would seem to be sufficient for now).
>>
>> Sean, David. Can we agree on merging patch 39?
> 
> I'm a-ok with adding error checking, ignoring the return value is the only
> option I don't like :-)

Same over here :)

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, back to index

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-14 22:26 [PATCH v2 00/42] KVM: s390: Add support for protected VMs Christian Borntraeger
2020-02-14 22:26 ` [PATCH v2 01/42] mm:gup/writeback: add callbacks for inaccessible pages Christian Borntraeger
2020-02-17  9:14   ` David Hildenbrand
2020-02-17 11:10     ` Christian Borntraeger
2020-02-18  8:27       ` David Hildenbrand
2020-02-18 15:46         ` Sean Christopherson
2020-02-18 16:02           ` Will Deacon
2020-02-18 16:15             ` Christian Borntraeger
2020-02-18 21:35               ` Sean Christopherson
2020-02-19  8:31                 ` David Hildenbrand
2020-02-14 22:26 ` [PATCH v2 05/42] s390/mm: provide memory management functions for protected KVM guests Christian Borntraeger
2020-02-17 10:21   ` David Hildenbrand
2020-02-17 11:28     ` Christian Borntraeger
2020-02-17 12:07       ` David Hildenbrand
2020-02-14 22:26 ` [PATCH v2 06/42] s390/mm: add (non)secure page access exceptions handlers Christian Borntraeger
2020-02-14 22:26 ` [PATCH v2 39/42] example for future extension: mm:gup/writeback: add callbacks for inaccessible pages: error cases Christian Borntraeger
2020-02-18 16:25   ` Will Deacon
2020-02-18 16:30     ` Christian Borntraeger
2020-02-18 16:33       ` Will Deacon
2020-02-14 22:26 ` [PATCH v2 40/42] example for future extension: mm:gup/writeback: add callbacks for inaccessible pages: source indication Christian Borntraeger
2020-02-17 14:15   ` Ulrich Weigand
2020-02-17 14:38     ` Christian Borntraeger

Linux-mm Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-mm/0 linux-mm/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-mm linux-mm/ https://lore.kernel.org/linux-mm \
		linux-mm@kvack.org
	public-inbox-index linux-mm

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kvack.linux-mm


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git