* [RFC/PATCH v2 00/22] KVM/s390: Hugetlbfs enablement
@ 2017-12-13 12:53 Janosch Frank
  2017-12-13 12:53 ` [RFC/PATCH v2 01/22] s390/mm: make gmap_protect_range more modular Janosch Frank
                   ` (24 more replies)
  0 siblings, 25 replies; 67+ messages in thread
From: Janosch Frank @ 2017-12-13 12:53 UTC (permalink / raw)
  To: kvm; +Cc: schwidefsky, borntraeger, david, dominik.dingel, linux-s390

The s390 architecture has supported 1M pages since the z10 and hugetlbfs
support for them was added soon after, but KVM has always used standard
4k pages for guest backings.

This patch set adds full support for 1M huge page backings for s390
KVM guests. That includes VSIE (nested VMs) for these guests, so we
are able to run all combinations of backings for all layers of guests.

When running a VSIE guest in a huge page backed guest, we need to
split some huge pages to be able to set granular protection. This way
we avoid a prot/unprot cycle if prefixes and VSIE pages containing
level 3 gmap DAT tables share the same segment, as the prefix has to
be accessible at all times and the VSIE page has to be write
protected.

TODO:
* Cleanups & Documentation
* Refactoring to get rid of a lot of indents
* Find a way to reduce or beautify bit checks on table entries
* Storage key support for split pages (will be a separate bugfix)
* Regression testing
* Testing large setups
* Testing multi level VSIE

V2:
	* Incorporated changes from David's cleanup
	* Now flushing with IDTE_NODAT for protection transfers.
	* Added RRBE huge page handling for g2 -> g3 skey emulation
	* Added documentation for capability
	* Renamed GMAP_ENTRY_* constants
	* Added SEGMENT hardware bits constants
	* Improved some patch descriptions
	* General small improvements
	* Introduced pte_from_pmd function

Accomplished testing:
l2: KVM guest
l3: nested KVM guest

* 1m l2 guests
* VSIE (l3) 4k and 1m guests on 1m l2
* 1m l2 -> l2 migration with 4k/1m l3 guests
* l3 -> l2 migration
* postcopy works only every second try, which seems to be a QEMU or setup issue


The initial prototype was started by Dominik Dingel. I had the
pleasure of adding the VSIE part, the protection transfers and the
optimizations. A huge thanks to Christian and Martin, who reviewed the
series and helped with debugging and design.

Dominik Dingel (2):
  s390/mm: hugetlb pages within a gmap can not be freed
  s390/mm: clear huge page storage keys on enable_skey

Janosch Frank (20):
  s390/mm: make gmap_protect_range more modular
  s390/mm: Abstract gmap notify bit setting
  s390/mm: add gmap PMD invalidation notification
  s390/mm: Add gmap pmd invalidation and clearing
  s390/mm: Introduce gmap_pmdp_xchg
  RFC: s390/mm: Transfer guest pmd protection to host
  s390/mm: Add huge page dirty sync support
  s390/mm: Add huge pmd storage key handling
  s390/mm: Remove superfluous parameter
  s390/mm: Add gmap_protect_large read protection support
  s390/mm: Make gmap_read_table EDAT1 compatible
  s390/mm: Make protect_rmap EDAT1 compatible
  s390/mm: GMAP read table extensions
  s390/mm: Add shadow segment code
  s390/mm: Add VSIE reverse fake case
  s390/mm: Remove gmap_pte_op_walk
  s390/mm: Split huge pages if granular protection is needed
  s390/mm: Enable gmap huge pmd support
  KVM: s390: Add KVM HPAGE capability
  RFC: s390/mm: Add gmap lock classes

 Documentation/virtual/kvm/api.txt |   10 +
 arch/s390/include/asm/gmap.h      |   39 +-
 arch/s390/include/asm/pgtable.h   |   18 +-
 arch/s390/kvm/gaccess.c           |   64 +-
 arch/s390/kvm/kvm-s390.c          |   19 +-
 arch/s390/mm/fault.c              |   10 +-
 arch/s390/mm/gmap.c               | 1275 +++++++++++++++++++++++++++++++++----
 arch/s390/mm/pageattr.c           |    6 +-
 arch/s390/mm/pgtable.c            |  176 ++++-
 include/uapi/linux/kvm.h          |    1 +
 10 files changed, 1445 insertions(+), 173 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 67+ messages in thread

* [RFC/PATCH v2 01/22] s390/mm: make gmap_protect_range more modular
  2017-12-13 12:53 [RFC/PATCH v2 00/22] KVM/s390: Hugetlbfs enablement Janosch Frank
@ 2017-12-13 12:53 ` Janosch Frank
  2018-01-22 11:33   ` David Hildenbrand
  2017-12-13 12:53 ` [RFC/PATCH v2 02/22] s390/mm: Abstract gmap notify bit setting Janosch Frank
                   ` (23 subsequent siblings)
  24 siblings, 1 reply; 67+ messages in thread
From: Janosch Frank @ 2017-12-13 12:53 UTC (permalink / raw)
  To: kvm; +Cc: schwidefsky, borntraeger, david, dominik.dingel, linux-s390

This patch reworks the gmap_protect_range logic and extracts the pte
handling into its own function. We now also walk down to the pmd and
make it accessible in the function for later use. This way we can add
huge page handling logic more easily.

Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
---
 arch/s390/mm/gmap.c | 102 ++++++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 92 insertions(+), 10 deletions(-)

diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
index 05d459b..8de8bf9 100644
--- a/arch/s390/mm/gmap.c
+++ b/arch/s390/mm/gmap.c
@@ -874,7 +874,88 @@ static int gmap_pte_op_fixup(struct gmap *gmap, unsigned long gaddr,
  */
 static void gmap_pte_op_end(spinlock_t *ptl)
 {
-	spin_unlock(ptl);
+	if (ptl)
+		spin_unlock(ptl);
+}
+
+/**
+ * gmap_pmd_op_walk - walk the gmap tables, get the guest table lock
+ *		      and return the pmd pointer
+ * @gmap: pointer to guest mapping meta data structure
+ * @gaddr: virtual address in the guest address space
+ *
+ * Returns a pointer to the pmd for a guest address, or NULL
+ */
+static inline pmd_t *gmap_pmd_op_walk(struct gmap *gmap, unsigned long gaddr)
+{
+	pmd_t *pmdp;
+
+	spin_lock(&gmap->guest_table_lock);
+	pmdp = (pmd_t *) gmap_table_walk(gmap, gaddr, 1);
+
+	/*
+	 * Empty pmds can become large after we give up the
+	 * guest_table_lock, so we have to check for pmd_none
+	 * here.
+	 */
+	if (!pmdp || pmd_none(*pmdp)) {
+		spin_unlock(&gmap->guest_table_lock);
+		return NULL;
+	}
+	/*
+	 * For plain 4k guests that do not run under the vsie it
+	 * suffices to take the pte lock later on. Thus we can unlock
+	 * the guest_table_lock here.
+	 */
+	if (!pmd_large(*pmdp) && !gmap_is_shadow(gmap))
+		spin_unlock(&gmap->guest_table_lock);
+	return pmdp;
+}
+
+/**
+ * gmap_pmd_op_end - release the guest_table_lock if needed
+ * @gmap: pointer to the guest mapping meta data structure
+ * @pmdp: pointer to the pmd
+ */
+static inline void gmap_pmd_op_end(struct gmap *gmap, pmd_t *pmdp)
+{
+	if (pmd_large(*pmdp) || gmap_is_shadow(gmap))
+		spin_unlock(&gmap->guest_table_lock);
+}
+
+/*
+ * gmap_protect_pte - remove access rights to memory and set pgste bits
+ * @gmap: pointer to guest mapping meta data structure
+ * @gaddr: virtual address in the guest address space
+ * @pmdp: pointer to the pmd associated with the pte
+ * @prot: indicates access rights: PROT_NONE, PROT_READ or PROT_WRITE
+ * @bits: pgste notification bits to set
+ *
+ * Returns 0 if successfully protected, -ENOMEM if out of memory and
+ * -EAGAIN if a fixup is needed.
+ *
+ * Expected to be called with sg->mm->mmap_sem in read and
+ * guest_table_lock held for shadow gmaps.
+ */
+static int gmap_protect_pte(struct gmap *gmap, unsigned long gaddr,
+			    pmd_t *pmdp, int prot, unsigned long bits)
+{
+	int rc;
+	pte_t *ptep;
+	spinlock_t *ptl = NULL;
+
+	/* We have no upper segment, let's go back and fix this up. */
+	if (pmd_val(*pmdp) & _SEGMENT_ENTRY_INVALID)
+		return -EAGAIN;
+
+	ptep = pte_alloc_map_lock(gmap->mm, pmdp, gaddr, &ptl);
+	if (!ptep)
+		return -ENOMEM;
+
+	/* Protect and unlock. */
+	rc = ptep_force_prot(gmap->mm, gaddr, ptep, prot, bits);
+	gmap_pte_op_end(ptl);
+	return rc;
 }
 
 /*
@@ -896,16 +977,20 @@ static int gmap_protect_range(struct gmap *gmap, unsigned long gaddr,
 			      unsigned long len, int prot, unsigned long bits)
 {
 	unsigned long vmaddr;
-	spinlock_t *ptl;
-	pte_t *ptep;
+	pmd_t *pmdp;
 	int rc;
 
 	while (len) {
 		rc = -EAGAIN;
-		ptep = gmap_pte_op_walk(gmap, gaddr, &ptl);
-		if (ptep) {
-			rc = ptep_force_prot(gmap->mm, gaddr, ptep, prot, bits);
-			gmap_pte_op_end(ptl);
+		pmdp = gmap_pmd_op_walk(gmap, gaddr);
+		if (pmdp) {
+			rc = gmap_protect_pte(gmap, gaddr, pmdp, prot,
+					      bits);
+			if (!rc) {
+				len -= PAGE_SIZE;
+				gaddr += PAGE_SIZE;
+			}
+			gmap_pmd_op_end(gmap, pmdp);
 		}
 		if (rc) {
 			vmaddr = __gmap_translate(gmap, gaddr);
@@ -914,10 +999,7 @@ static int gmap_protect_range(struct gmap *gmap, unsigned long gaddr,
 			rc = gmap_pte_op_fixup(gmap, gaddr, vmaddr, prot);
 			if (rc)
 				return rc;
-			continue;
 		}
-		gaddr += PAGE_SIZE;
-		len -= PAGE_SIZE;
 	}
 	return 0;
 }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 67+ messages in thread
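
With the rework above, gaddr and len only advance after a successful
protection; on a failed walk the loop falls through to the fixup path
and then retries the same address. A minimal stand-alone sketch of that
retry structure (plain user-space C with the gmap walk, protection and
fixup mocked out; not the kernel implementation):

#include <errno.h>

#define PAGE_SIZE 4096UL

static unsigned long fixed_up = -1UL;

/* Mock of the pmd walk + pte protection: fails until a fixup happened. */
static int protect_one_page(unsigned long gaddr)
{
	return (fixed_up == gaddr) ? 0 : -EAGAIN;
}

/* Mock of __gmap_translate() + gmap_pte_op_fixup(). */
static int fixup(unsigned long gaddr)
{
	fixed_up = gaddr;
	return 0;
}

/* Same control flow as the reworked gmap_protect_range(). */
static int protect_range(unsigned long gaddr, unsigned long len)
{
	int rc;

	while (len) {
		rc = protect_one_page(gaddr);
		if (!rc) {			/* advance only on success */
			gaddr += PAGE_SIZE;
			len -= PAGE_SIZE;
			continue;
		}
		rc = fixup(gaddr);		/* otherwise fix up ... */
		if (rc)
			return rc;		/* ... and retry the same gaddr */
	}
	return 0;
}

int main(void)
{
	return protect_range(0x100000UL, 4 * PAGE_SIZE);
}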

* [RFC/PATCH v2 02/22] s390/mm: Abstract gmap notify bit setting
  2017-12-13 12:53 [RFC/PATCH v2 00/22] KVM/s390: Hugetlbfs enablement Janosch Frank
  2017-12-13 12:53 ` [RFC/PATCH v2 01/22] s390/mm: make gmap_protect_range more modular Janosch Frank
@ 2017-12-13 12:53 ` Janosch Frank
  2018-01-22 11:34   ` David Hildenbrand
  2017-12-13 12:53 ` [RFC/PATCH v2 03/22] s390/mm: add gmap PMD invalidation notification Janosch Frank
                   ` (22 subsequent siblings)
  24 siblings, 1 reply; 67+ messages in thread
From: Janosch Frank @ 2017-12-13 12:53 UTC (permalink / raw)
  To: kvm; +Cc: schwidefsky, borntraeger, david, dominik.dingel, linux-s390

Currently we use the software PGSTE bits PGSTE_IN_BIT and PGSTE_VSIE_BIT
to request a notification before an invalidation occurs on a prefix page
or a VSIE page respectively. Both bits only work for a PGSTE, which only
exists for page tables.

For huge page support we also need such bits for segments (pmds), so
let's introduce abstract GMAP_NOTIFY_* bits that are translated into the
respective PGSTE or segment bits when gmap DAT table entries are
protected.

Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
 arch/s390/include/asm/gmap.h |  4 ++++
 arch/s390/mm/gmap.c          | 13 ++++++++-----
 2 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/arch/s390/include/asm/gmap.h b/arch/s390/include/asm/gmap.h
index e07cce8..c1bc563 100644
--- a/arch/s390/include/asm/gmap.h
+++ b/arch/s390/include/asm/gmap.h
@@ -9,6 +9,10 @@
 #ifndef _ASM_S390_GMAP_H
 #define _ASM_S390_GMAP_H
 
+/* Generic bits for GMAP notification on DAT table entry changes. */
+#define GMAP_NOTIFY_SHADOW	0x2
+#define GMAP_NOTIFY_MPROT	0x1
+
 /**
  * struct gmap_struct - guest address space
  * @list: list head for the mm->context gmap list
diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
index 8de8bf9..e7825d2 100644
--- a/arch/s390/mm/gmap.c
+++ b/arch/s390/mm/gmap.c
@@ -929,7 +929,7 @@ static inline void gmap_pmd_op_end(struct gmap *gmap, pmd_t *pmdp)
  * @gaddr: virtual address in the guest address space
  * @pmdp: pointer to the pmd associated with the pte
  * @prot: indicates access rights: PROT_NONE, PROT_READ or PROT_WRITE
- * @bits: pgste notification bits to set
+ * @bits: notification bits to set
  *
  * Returns 0 if successfully protected, -ENOMEM if out of memory and
  * -EAGAIN if a fixup is needed.
@@ -943,6 +943,7 @@ static int gmap_protect_pte(struct gmap *gmap, unsigned long gaddr,
 	int rc;
 	pte_t *ptep;
 	spinlock_t *ptl = NULL;
+	unsigned long pbits = 0;
 
 	/* We have no upper segment, let's go back and fix this up. */
 	if (pmd_val(*pmdp) & _SEGMENT_ENTRY_INVALID)
@@ -952,8 +953,10 @@ static int gmap_protect_pte(struct gmap *gmap, unsigned long gaddr,
 	if (!ptep)
 		return -ENOMEM;
 
+	pbits |= (bits & GMAP_NOTIFY_MPROT) ? PGSTE_IN_BIT : 0;
+	pbits |= (bits & GMAP_NOTIFY_SHADOW) ? PGSTE_VSIE_BIT : 0;
 	/* Protect and unlock. */
-	rc = ptep_force_prot(gmap->mm, gaddr, ptep, prot, bits);
+	rc = ptep_force_prot(gmap->mm, gaddr, ptep, prot, pbits);
 	gmap_pte_op_end(ptl);
 	return rc;
 }
@@ -1028,7 +1031,7 @@ int gmap_mprotect_notify(struct gmap *gmap, unsigned long gaddr,
 	if (!MACHINE_HAS_ESOP && prot == PROT_READ)
 		return -EINVAL;
 	down_read(&gmap->mm->mmap_sem);
-	rc = gmap_protect_range(gmap, gaddr, len, prot, PGSTE_IN_BIT);
+	rc = gmap_protect_range(gmap, gaddr, len, prot, GMAP_NOTIFY_MPROT);
 	up_read(&gmap->mm->mmap_sem);
 	return rc;
 }
@@ -1150,7 +1153,7 @@ static int gmap_protect_rmap(struct gmap *sg, unsigned long raddr,
 		if (ptep) {
 			spin_lock(&sg->guest_table_lock);
 			rc = ptep_force_prot(parent->mm, paddr, ptep, prot,
-					     PGSTE_VSIE_BIT);
+					     GMAP_NOTIFY_SHADOW);
 			if (!rc)
 				gmap_insert_rmap(sg, vmaddr, rmap);
 			spin_unlock(&sg->guest_table_lock);
@@ -1616,7 +1619,7 @@ struct gmap *gmap_shadow(struct gmap *parent, unsigned long asce,
 	down_read(&parent->mm->mmap_sem);
 	rc = gmap_protect_range(parent, asce & _ASCE_ORIGIN,
 				((asce & _ASCE_TABLE_LENGTH) + 1) * PAGE_SIZE,
-				PROT_READ, PGSTE_VSIE_BIT);
+				PROT_READ, GMAP_NOTIFY_SHADOW);
 	up_read(&parent->mm->mmap_sem);
 	spin_lock(&parent->shadow_lock);
 	new->initialized = true;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 67+ messages in thread
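
The abstraction above boils down to a simple translation when a pte is
finally protected. A tiny stand-alone sketch of that mapping (the
GMAP_NOTIFY_* values are from this patch; the PGSTE_* values below are
illustrative placeholders, not the kernel's definitions):

#include <assert.h>

#define GMAP_NOTIFY_SHADOW	0x2UL
#define GMAP_NOTIFY_MPROT	0x1UL

/* Placeholders; the real PGSTE_IN_BIT/PGSTE_VSIE_BIT live in asm/pgtable.h. */
#define PGSTE_IN_BIT		0x0080UL
#define PGSTE_VSIE_BIT		0x0040UL

/* Same translation gmap_protect_pte() performs before ptep_force_prot(). */
static unsigned long gmap_to_pgste_bits(unsigned long bits)
{
	unsigned long pbits = 0;

	pbits |= (bits & GMAP_NOTIFY_MPROT) ? PGSTE_IN_BIT : 0;
	pbits |= (bits & GMAP_NOTIFY_SHADOW) ? PGSTE_VSIE_BIT : 0;
	return pbits;
}

int main(void)
{
	assert(gmap_to_pgste_bits(GMAP_NOTIFY_MPROT) == PGSTE_IN_BIT);
	assert(gmap_to_pgste_bits(GMAP_NOTIFY_SHADOW) == PGSTE_VSIE_BIT);
	assert(gmap_to_pgste_bits(GMAP_NOTIFY_MPROT | GMAP_NOTIFY_SHADOW) ==
	       (PGSTE_IN_BIT | PGSTE_VSIE_BIT));
	return 0;
}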

* [RFC/PATCH v2 03/22] s390/mm: add gmap PMD invalidation notification
  2017-12-13 12:53 [RFC/PATCH v2 00/22] KVM/s390: Hugetlbfs enablement Janosch Frank
  2017-12-13 12:53 ` [RFC/PATCH v2 01/22] s390/mm: make gmap_protect_range more modular Janosch Frank
  2017-12-13 12:53 ` [RFC/PATCH v2 02/22] s390/mm: Abstract gmap notify bit setting Janosch Frank
@ 2017-12-13 12:53 ` Janosch Frank
  2017-12-21  9:24   ` Janosch Frank
                     ` (2 more replies)
  2017-12-13 12:53 ` [RFC/PATCH v2 04/22] s390/mm: Add gmap pmd invalidation and clearing Janosch Frank
                   ` (21 subsequent siblings)
  24 siblings, 3 replies; 67+ messages in thread
From: Janosch Frank @ 2017-12-13 12:53 UTC (permalink / raw)
  To: kvm; +Cc: schwidefsky, borntraeger, david, dominik.dingel, linux-s390

For later migration of huge pages we want to write-protect guest
PMDs. While doing this, we have to make absolutely sure that the
guest's lowcore is always accessible when the VCPU is running. With
PTEs, this is solved by marking the PGSTEs of the lowcore pages with
the invalidation notification bit and kicking the guest out of the SIE
via a notifier function if we need to invalidate such a page.

With PMDs we do not have PGSTEs or other spare bits we could use in the
host PMD. Instead we pick one of the free bits in the gmap PMD. Every
time a host pmd is invalidated, we check whether the respective gmap
PMD has the bit set and in that case fire up the notifier.

In this first step we only support setting the invalidation bit; support
for restricting access to guest pmds will follow shortly.

Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com>
---
 arch/s390/include/asm/gmap.h    |  3 ++
 arch/s390/include/asm/pgtable.h |  7 +++-
 arch/s390/mm/gmap.c             | 92 ++++++++++++++++++++++++++++++++++++-----
 arch/s390/mm/pgtable.c          |  4 ++
 4 files changed, 94 insertions(+), 12 deletions(-)

diff --git a/arch/s390/include/asm/gmap.h b/arch/s390/include/asm/gmap.h
index c1bc563..21bb658 100644
--- a/arch/s390/include/asm/gmap.h
+++ b/arch/s390/include/asm/gmap.h
@@ -13,6 +13,9 @@
 #define GMAP_NOTIFY_SHADOW	0x2
 #define GMAP_NOTIFY_MPROT	0x1
 
+/* Status bits in the gmap segment entry. */
+#define _SEGMENT_ENTRY_GMAP_IN		0x0001	/* invalidation notify bit */
+
 /**
  * struct gmap_struct - guest address space
  * @list: list head for the mm->context gmap list
diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index 57d7bc9..ba3840c 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -269,8 +269,10 @@ static inline int is_module_addr(void *addr)
 #define _REGION_ENTRY_BITS_LARGE 0xffffffff8000fe2fUL
 
 /* Bits in the segment table entry */
-#define _SEGMENT_ENTRY_BITS	0xfffffffffffffe33UL
-#define _SEGMENT_ENTRY_BITS_LARGE 0xfffffffffff0ff33UL
+#define _SEGMENT_ENTRY_BITS			0xfffffffffffffe33UL
+#define _SEGMENT_ENTRY_BITS_LARGE 		0xfffffffffff0ff33UL
+#define _SEGMENT_ENTRY_HARDWARE_BITS		0xfffffffffffffe30UL
+#define _SEGMENT_ENTRY_HARDWARE_BITS_LARGE 	0xfffffffffff00730UL
 #define _SEGMENT_ENTRY_ORIGIN_LARGE ~0xfffffUL /* large page address	    */
 #define _SEGMENT_ENTRY_ORIGIN	~0x7ffUL/* page table origin		    */
 #define _SEGMENT_ENTRY_PROTECT	0x200	/* segment protection bit	    */
@@ -1093,6 +1095,7 @@ void ptep_set_pte_at(struct mm_struct *mm, unsigned long addr,
 void ptep_set_notify(struct mm_struct *mm, unsigned long addr, pte_t *ptep);
 void ptep_notify(struct mm_struct *mm, unsigned long addr,
 		 pte_t *ptep, unsigned long bits);
+void pmdp_notify(struct mm_struct *mm, unsigned long addr);
 int ptep_force_prot(struct mm_struct *mm, unsigned long gaddr,
 		    pte_t *ptep, int prot, unsigned long bit);
 void ptep_zap_unused(struct mm_struct *mm, unsigned long addr,
diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
index e7825d2..ff7fe24 100644
--- a/arch/s390/mm/gmap.c
+++ b/arch/s390/mm/gmap.c
@@ -596,10 +596,15 @@ int __gmap_link(struct gmap *gmap, unsigned long gaddr, unsigned long vmaddr)
 	if (*table == _SEGMENT_ENTRY_EMPTY) {
 		rc = radix_tree_insert(&gmap->host_to_guest,
 				       vmaddr >> PMD_SHIFT, table);
-		if (!rc)
-			*table = pmd_val(*pmd);
-	} else
-		rc = 0;
+		if (!rc) {
+			if (pmd_large(*pmd)) {
+				*table = pmd_val(*pmd) &
+					_SEGMENT_ENTRY_HARDWARE_BITS_LARGE;
+			} else
+				*table = pmd_val(*pmd) &
+					_SEGMENT_ENTRY_HARDWARE_BITS;
+		}
+	}
 	spin_unlock(&gmap->guest_table_lock);
 	spin_unlock(ptl);
 	radix_tree_preload_end();
@@ -962,6 +967,33 @@ static int gmap_protect_pte(struct gmap *gmap, unsigned long gaddr,
 }
 
 /*
+ * gmap_protect_pmd - set pmd notification bits
+ * @pmdp: pointer to the pmd to be protected
+ * @prot: indicates access rights: PROT_NONE, PROT_READ or PROT_WRITE
+ * @bits: notification bits to set
+ *
+ * Returns 0 if successfully protected, -ENOMEM if out of memory and
+ * -EAGAIN if a fixup is needed.
+ *
+ * Expected to be called with sg->mm->mmap_sem in read and
+ * guest_table_lock held.
+ */
+static int gmap_protect_pmd(struct gmap *gmap, unsigned long gaddr,
+			    pmd_t *pmdp, int prot, unsigned long bits)
+{
+	const int pmd_i = pmd_val(*pmdp) & _SEGMENT_ENTRY_INVALID;
+	const int pmd_p = pmd_val(*pmdp) & _SEGMENT_ENTRY_PROTECT;
+
+	/* Fixup needed */
+	if ((pmd_i && (prot != PROT_NONE)) || (pmd_p && (prot & PROT_WRITE)))
+		return -EAGAIN;
+
+	if (bits & GMAP_NOTIFY_MPROT)
+		pmd_val(*pmdp) |=  _SEGMENT_ENTRY_GMAP_IN;
+	return 0;
+}
+
+/*
  * gmap_protect_range - remove access rights to memory and set pgste bits
  * @gmap: pointer to guest mapping meta data structure
  * @gaddr: virtual address in the guest address space
@@ -979,7 +1011,7 @@ static int gmap_protect_pte(struct gmap *gmap, unsigned long gaddr,
 static int gmap_protect_range(struct gmap *gmap, unsigned long gaddr,
 			      unsigned long len, int prot, unsigned long bits)
 {
-	unsigned long vmaddr;
+	unsigned long vmaddr, dist;
 	pmd_t *pmdp;
 	int rc;
 
@@ -987,11 +1019,21 @@ static int gmap_protect_range(struct gmap *gmap, unsigned long gaddr,
 		rc = -EAGAIN;
 		pmdp = gmap_pmd_op_walk(gmap, gaddr);
 		if (pmdp) {
-			rc = gmap_protect_pte(gmap, gaddr, pmdp, prot,
-					      bits);
-			if (!rc) {
-				len -= PAGE_SIZE;
-				gaddr += PAGE_SIZE;
+			if (!pmd_large(*pmdp)) {
+				rc = gmap_protect_pte(gmap, gaddr, pmdp, prot,
+						      bits);
+				if (!rc) {
+					len -= PAGE_SIZE;
+					gaddr += PAGE_SIZE;
+				}
+			} else {
+				rc = gmap_protect_pmd(gmap, gaddr, pmdp, prot,
+						      bits);
+				if (!rc) {
+					dist = HPAGE_SIZE - (gaddr & ~HPAGE_MASK);
+					len = len < dist ? 0 : len - dist;
+					gaddr = (gaddr & HPAGE_MASK) + HPAGE_SIZE;
+				}
 			}
 			gmap_pmd_op_end(gmap, pmdp);
 		}
@@ -2185,6 +2227,36 @@ void ptep_notify(struct mm_struct *mm, unsigned long vmaddr,
 }
 EXPORT_SYMBOL_GPL(ptep_notify);
 
+/**
+ * pmdp_notify - call all invalidation callbacks for a specific pmd
+ * @mm: pointer to the process mm_struct
+ * @vmaddr: virtual address in the process address space
+ *
+ * This function is expected to be called with mmap_sem held in read.
+ */
+void pmdp_notify(struct mm_struct *mm, unsigned long vmaddr)
+{
+	unsigned long *table, gaddr;
+	struct gmap *gmap;
+
+	rcu_read_lock();
+	list_for_each_entry_rcu(gmap, &mm->context.gmap_list, list) {
+		spin_lock(&gmap->guest_table_lock);
+		table = radix_tree_lookup(&gmap->host_to_guest,
+					  vmaddr >> PMD_SHIFT);
+		if (!table || !(*table & _SEGMENT_ENTRY_GMAP_IN)) {
+			spin_unlock(&gmap->guest_table_lock);
+			continue;
+		}
+		gaddr = __gmap_segment_gaddr(table);
+		*table &= ~_SEGMENT_ENTRY_GMAP_IN;
+		spin_unlock(&gmap->guest_table_lock);
+		gmap_call_notifier(gmap, gaddr, gaddr + HPAGE_SIZE - 1);
+	}
+	rcu_read_unlock();
+}
+EXPORT_SYMBOL_GPL(pmdp_notify);
+
 static inline void thp_split_mm(struct mm_struct *mm)
 {
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
index 4f2b65d..a6cc540 100644
--- a/arch/s390/mm/pgtable.c
+++ b/arch/s390/mm/pgtable.c
@@ -405,6 +405,8 @@ pmd_t pmdp_xchg_direct(struct mm_struct *mm, unsigned long addr,
 	pmd_t old;
 
 	preempt_disable();
+	if (mm_has_pgste(mm))
+		pmdp_notify(mm, addr);
 	old = pmdp_flush_direct(mm, addr, pmdp);
 	*pmdp = new;
 	preempt_enable();
@@ -418,6 +420,8 @@ pmd_t pmdp_xchg_lazy(struct mm_struct *mm, unsigned long addr,
 	pmd_t old;
 
 	preempt_disable();
+	if (mm_has_pgste(mm))
+		pmdp_notify(mm, addr);
 	old = pmdp_flush_lazy(mm, addr, pmdp);
 	*pmdp = new;
 	preempt_enable();
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 67+ messages in thread
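
The large-pmd branch added to gmap_protect_range() above steps to the
next 1M segment boundary instead of advancing by 4k. A stand-alone
sketch of that address arithmetic, assuming the s390 EDAT-1 segment size
of 1 MiB for HPAGE_SIZE:

#include <assert.h>

#define HPAGE_SHIFT	20		/* 1M segments (EDAT-1) */
#define HPAGE_SIZE	(1UL << HPAGE_SHIFT)
#define HPAGE_MASK	(~(HPAGE_SIZE - 1))

/*
 * One large-pmd step of gmap_protect_range(): consume the rest of the
 * current 1M segment and continue at the next segment boundary.
 */
static void step_over_segment(unsigned long *gaddr, unsigned long *len)
{
	unsigned long dist = HPAGE_SIZE - (*gaddr & ~HPAGE_MASK);

	*len = (*len < dist) ? 0 : *len - dist;
	*gaddr = (*gaddr & HPAGE_MASK) + HPAGE_SIZE;
}

int main(void)
{
	/* Start 0x1000 bytes into a segment, with 2M left to protect. */
	unsigned long gaddr = 0x00100000UL + 0x1000UL;
	unsigned long len = 0x200000UL;

	/* The first step eats the remaining 0xff000 bytes of the segment. */
	step_over_segment(&gaddr, &len);
	assert(gaddr == 0x00200000UL && len == 0x200000UL - 0xff000UL);

	/* The second step consumes a full segment. */
	step_over_segment(&gaddr, &len);
	assert(gaddr == 0x00300000UL && len == 0x1000UL);
	return 0;
}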

* [RFC/PATCH v2 04/22] s390/mm: Add gmap pmd invalidation and clearing
  2017-12-13 12:53 [RFC/PATCH v2 00/22] KVM/s390: Hugetlbfs enablement Janosch Frank
                   ` (2 preceding siblings ...)
  2017-12-13 12:53 ` [RFC/PATCH v2 03/22] s390/mm: add gmap PMD invalidation notification Janosch Frank
@ 2017-12-13 12:53 ` Janosch Frank
  2017-12-13 12:53 ` [RFC/PATCH v2 05/22] s390/mm: hugetlb pages within a gmap can not be freed Janosch Frank
                   ` (20 subsequent siblings)
  24 siblings, 0 replies; 67+ messages in thread
From: Janosch Frank @ 2017-12-13 12:53 UTC (permalink / raw)
  To: kvm; +Cc: schwidefsky, borntraeger, david, dominik.dingel, linux-s390

If the host invalidates a pmd, we also have to invalidate the
corresponding gmap pmds and flush them from the TLB. This is
necessary because we do not share the pmd tables between host and
guest as we do with ptes.

The clearing part of the new functions sets a guest pmd entry to
_SEGMENT_ENTRY_EMPTY, so the guest will fault on it and we will
re-link it.

Flushing the gmap is not necessary in the host's lazy local and csp
cases, as both purge the TLB completely.

Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com>
Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
---
 arch/s390/include/asm/pgtable.h |   6 ++-
 arch/s390/mm/gmap.c             | 114 ++++++++++++++++++++++++++++++++++++++++
 arch/s390/mm/pgtable.c          |  17 ++++--
 3 files changed, 133 insertions(+), 4 deletions(-)

diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index ba3840c..647c300 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -272,7 +272,7 @@ static inline int is_module_addr(void *addr)
 #define _SEGMENT_ENTRY_BITS			0xfffffffffffffe33UL
 #define _SEGMENT_ENTRY_BITS_LARGE 		0xfffffffffff0ff33UL
 #define _SEGMENT_ENTRY_HARDWARE_BITS		0xfffffffffffffe30UL
-#define _SEGMENT_ENTRY_HARDWARE_BITS_LARGE 	0xfffffffffff00730UL
+#define _SEGMENT_ENTRY_HARDWARE_BITS_LARGE	0xfffffffffff00730UL
 #define _SEGMENT_ENTRY_ORIGIN_LARGE ~0xfffffUL /* large page address	    */
 #define _SEGMENT_ENTRY_ORIGIN	~0x7ffUL/* page table origin		    */
 #define _SEGMENT_ENTRY_PROTECT	0x200	/* segment protection bit	    */
@@ -1120,6 +1120,10 @@ int set_pgste_bits(struct mm_struct *mm, unsigned long addr,
 int get_pgste(struct mm_struct *mm, unsigned long hva, unsigned long *pgstep);
 int pgste_perform_essa(struct mm_struct *mm, unsigned long hva, int orc,
 			unsigned long *oldpte, unsigned long *oldpgste);
+void gmap_pmdp_csp(struct mm_struct *mm, unsigned long vmaddr);
+void gmap_pmdp_invalidate(struct mm_struct *mm, unsigned long vmaddr);
+void gmap_pmdp_idte_local(struct mm_struct *mm, unsigned long vmaddr);
+void gmap_pmdp_idte_global(struct mm_struct *mm, unsigned long vmaddr);
 
 /*
  * Certain architectures need to do special things when PTEs
diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
index ff7fe24..aceaeb5 100644
--- a/arch/s390/mm/gmap.c
+++ b/arch/s390/mm/gmap.c
@@ -2257,6 +2257,120 @@ void pmdp_notify(struct mm_struct *mm, unsigned long vmaddr)
 }
 EXPORT_SYMBOL_GPL(pmdp_notify);
 
+static void gmap_pmdp_clear(struct mm_struct *mm, unsigned long vmaddr,
+			    int purge)
+{
+	pmd_t *pmdp;
+	struct gmap *gmap;
+
+	rcu_read_lock();
+	list_for_each_entry_rcu(gmap, &mm->context.gmap_list, list) {
+		spin_lock(&gmap->guest_table_lock);
+		pmdp = (pmd_t *)radix_tree_delete(&gmap->host_to_guest,
+						   vmaddr >> PMD_SHIFT);
+		if (pmdp) {
+			if (purge)
+				__pmdp_csp(pmdp);
+			pmd_val(*pmdp) = _SEGMENT_ENTRY_EMPTY;
+		}
+		spin_unlock(&gmap->guest_table_lock);
+	}
+	rcu_read_unlock();
+}
+
+/**
+ * gmap_pmdp_invalidate - invalidate all affected guest pmd entries without
+ *                        flushing
+ * @mm: pointer to the process mm_struct
+ * @vmaddr: virtual address in the process address space
+ */
+void gmap_pmdp_invalidate(struct mm_struct *mm, unsigned long vmaddr)
+{
+	gmap_pmdp_clear(mm, vmaddr, 0);
+}
+EXPORT_SYMBOL_GPL(gmap_pmdp_invalidate);
+
+/**
+ * gmap_pmdp_csp - csp all affected guest pmd entries
+ * @mm: pointer to the process mm_struct
+ * @vmaddr: virtual address in the process address space
+ */
+void gmap_pmdp_csp(struct mm_struct *mm, unsigned long vmaddr)
+{
+	gmap_pmdp_clear(mm, vmaddr, 1);
+}
+EXPORT_SYMBOL_GPL(gmap_pmdp_csp);
+
+/**
+ * gmap_pmdp_idte_local - invalidate and clear a guest pmd entry
+ * @mm: pointer to the process mm_struct
+ * @vmaddr: virtual address in the process address space
+ */
+void gmap_pmdp_idte_local(struct mm_struct *mm, unsigned long vmaddr)
+{
+	unsigned long *entry, gaddr;
+	struct gmap *gmap;
+	pmd_t *pmdp;
+
+	rcu_read_lock();
+	list_for_each_entry_rcu(gmap, &mm->context.gmap_list, list) {
+		spin_lock(&gmap->guest_table_lock);
+		entry = radix_tree_delete(&gmap->host_to_guest,
+					  vmaddr >> PMD_SHIFT);
+		if (entry) {
+			pmdp = (pmd_t *)entry;
+			gaddr = __gmap_segment_gaddr(entry);
+			if (MACHINE_HAS_TLB_GUEST)
+				__pmdp_idte(gaddr, pmdp,
+					    IDTE_GUEST_ASCE,
+					    gmap->asce, IDTE_LOCAL);
+			else if (MACHINE_HAS_IDTE)
+				__pmdp_idte(gaddr, pmdp, 0, 0,
+					    IDTE_LOCAL);
+			*entry = _SEGMENT_ENTRY_EMPTY;
+		}
+		spin_unlock(&gmap->guest_table_lock);
+	}
+	rcu_read_unlock();
+}
+EXPORT_SYMBOL_GPL(gmap_pmdp_idte_local);
+
+/**
+ * gmap_pmdp_idte_global - invalidate and clear a guest pmd entry
+ * @mm: pointer to the process mm_struct
+ * @vmaddr: virtual address in the process address space
+ */
+void gmap_pmdp_idte_global(struct mm_struct *mm, unsigned long vmaddr)
+{
+	unsigned long *entry, gaddr;
+	struct gmap *gmap;
+	pmd_t *pmdp;
+
+	rcu_read_lock();
+	list_for_each_entry_rcu(gmap, &mm->context.gmap_list, list) {
+		spin_lock(&gmap->guest_table_lock);
+		entry = radix_tree_delete(&gmap->host_to_guest,
+					  vmaddr >> PMD_SHIFT);
+		if (entry) {
+			pmdp = (pmd_t *)entry;
+			gaddr = __gmap_segment_gaddr(entry);
+			if (MACHINE_HAS_TLB_GUEST)
+				__pmdp_idte(gaddr, pmdp,
+					    IDTE_GUEST_ASCE,
+					    gmap->asce, IDTE_GLOBAL);
+			else if (MACHINE_HAS_IDTE)
+				__pmdp_idte(gaddr, pmdp, 0, 0,
+					    IDTE_GLOBAL);
+			else
+				__pmdp_csp(pmdp);
+			*entry = _SEGMENT_ENTRY_EMPTY;
+		}
+		spin_unlock(&gmap->guest_table_lock);
+	}
+	rcu_read_unlock();
+}
+EXPORT_SYMBOL_GPL(gmap_pmdp_idte_global);
+
 static inline void thp_split_mm(struct mm_struct *mm)
 {
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
index a6cc540..e690879 100644
--- a/arch/s390/mm/pgtable.c
+++ b/arch/s390/mm/pgtable.c
@@ -347,18 +347,27 @@ static inline void pmdp_idte_local(struct mm_struct *mm,
 			    mm->context.asce, IDTE_LOCAL);
 	else
 		__pmdp_idte(addr, pmdp, 0, 0, IDTE_LOCAL);
+	if (mm_has_pgste(mm))
+		gmap_pmdp_idte_local(mm, addr);
 }
 
 static inline void pmdp_idte_global(struct mm_struct *mm,
 				    unsigned long addr, pmd_t *pmdp)
 {
-	if (MACHINE_HAS_TLB_GUEST)
+	if (MACHINE_HAS_TLB_GUEST) {
 		__pmdp_idte(addr, pmdp, IDTE_NODAT | IDTE_GUEST_ASCE,
 			    mm->context.asce, IDTE_GLOBAL);
-	else if (MACHINE_HAS_IDTE)
+		if (mm_has_pgste(mm))
+			gmap_pmdp_idte_global(mm, addr);
+	} else if (MACHINE_HAS_IDTE) {
 		__pmdp_idte(addr, pmdp, 0, 0, IDTE_GLOBAL);
-	else
+		if (mm_has_pgste(mm))
+			gmap_pmdp_idte_global(mm, addr);
+	} else {
 		__pmdp_csp(pmdp);
+		if (mm_has_pgste(mm))
+			gmap_pmdp_csp(mm, addr);
+	}
 }
 
 static inline pmd_t pmdp_flush_direct(struct mm_struct *mm,
@@ -392,6 +401,8 @@ static inline pmd_t pmdp_flush_lazy(struct mm_struct *mm,
 			  cpumask_of(smp_processor_id()))) {
 		pmd_val(*pmdp) |= _SEGMENT_ENTRY_INVALID;
 		mm->context.flush_mm = 1;
+		if (mm_has_pgste(mm))
+			gmap_pmdp_invalidate(mm, addr);
 	} else {
 		pmdp_idte_global(mm, addr, pmdp);
 	}
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 67+ messages in thread
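
The gmap segment entries cleared above hold only the architected bits of
the host pmd plus gmap-private status bits, because __gmap_link() masks
the host value with _SEGMENT_ENTRY_HARDWARE_BITS(_LARGE) when linking
(see patch 03). A small model of that masking, using the constants from
this series (the host pmd value is made up for illustration):

#include <assert.h>

#define _SEGMENT_ENTRY_HARDWARE_BITS_LARGE	0xfffffffffff00730UL
#define _SEGMENT_ENTRY_GMAP_IN			0x0001UL  /* gmap software bit */

/* What __gmap_link() stores in the gmap segment table for a large pmd. */
static unsigned long gmap_entry_from_host(unsigned long host_pmd)
{
	return host_pmd & _SEGMENT_ENTRY_HARDWARE_BITS_LARGE;
}

int main(void)
{
	/* Hypothetical large host pmd with some software bits (0xc) set. */
	unsigned long host_pmd = 0x0000000081500400UL | 0x0cUL;
	unsigned long entry = gmap_entry_from_host(host_pmd);

	/* The software bits are stripped ... */
	assert(entry == 0x0000000081500400UL);
	/* ... leaving them free for gmap-private notification state. */
	entry |= _SEGMENT_ENTRY_GMAP_IN;
	assert(entry & _SEGMENT_ENTRY_GMAP_IN);
	return 0;
}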

* [RFC/PATCH v2 05/22] s390/mm: hugetlb pages within a gmap can not be freed
  2017-12-13 12:53 [RFC/PATCH v2 00/22] KVM/s390: Hugetlbfs enablement Janosch Frank
                   ` (3 preceding siblings ...)
  2017-12-13 12:53 ` [RFC/PATCH v2 04/22] s390/mm: Add gmap pmd invalidation and clearing Janosch Frank
@ 2017-12-13 12:53 ` Janosch Frank
  2018-01-24 13:45   ` David Hildenbrand
  2017-12-13 12:53 ` [RFC/PATCH v2 06/22] s390/mm: Introduce gmap_pmdp_xchg Janosch Frank
                   ` (19 subsequent siblings)
  24 siblings, 1 reply; 67+ messages in thread
From: Janosch Frank @ 2017-12-13 12:53 UTC (permalink / raw)
  To: kvm; +Cc: schwidefsky, borntraeger, david, dominik.dingel, linux-s390

From: Dominik Dingel <dingel@linux.vnet.ibm.com>

Guests backed by huge pages could theoretically free unused pages via
the diagnose 10 instruction. We currently don't allow that, so we never
have to fault such a page back in once it is needed again.

Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
---
 arch/s390/mm/gmap.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
index aceaeb5..056acfc 100644
--- a/arch/s390/mm/gmap.c
+++ b/arch/s390/mm/gmap.c
@@ -695,6 +695,9 @@ void gmap_discard(struct gmap *gmap, unsigned long from, unsigned long to)
 		vmaddr |= gaddr & ~PMD_MASK;
 		/* Find vma in the parent mm */
 		vma = find_vma(gmap->mm, vmaddr);
+		/* We do not discard pages that are backed by hugetlbfs */
+		if (vma && is_vm_hugetlb_page(vma))
+			continue;
 		size = min(to - gaddr, PMD_SIZE - (gaddr & ~PMD_MASK));
 		zap_page_range(vma, vmaddr, size);
 	}
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [RFC/PATCH v2 06/22] s390/mm: Introduce gmap_pmdp_xchg
  2017-12-13 12:53 [RFC/PATCH v2 00/22] KVM/s390: Hugetlbfs enablement Janosch Frank
                   ` (4 preceding siblings ...)
  2017-12-13 12:53 ` [RFC/PATCH v2 05/22] s390/mm: hugetlb pages within a gmap can not be freed Janosch Frank
@ 2017-12-13 12:53 ` Janosch Frank
  2017-12-13 12:53 ` [RFC/PATCH v2 07/22] RFC: s390/mm: Transfer guest pmd protection to host Janosch Frank
                   ` (18 subsequent siblings)
  24 siblings, 0 replies; 67+ messages in thread
From: Janosch Frank @ 2017-12-13 12:53 UTC (permalink / raw)
  To: kvm; +Cc: schwidefsky, borntraeger, david, dominik.dingel, linux-s390

When changing guest pmds, we don't need to take care of the
corresponding host pmd. This means we don't need to flush the host TLB
entries and we don't need to notify every gmap about the change.

Let's introduce a function that exchanges a gmap pmd and takes care of
the necessary flushing and notification.

Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com>
---
 arch/s390/mm/gmap.c | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 49 insertions(+)

diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
index 056acfc..a252fe7 100644
--- a/arch/s390/mm/gmap.c
+++ b/arch/s390/mm/gmap.c
@@ -23,6 +23,9 @@
 
 #define GMAP_SHADOW_FAKE_TABLE 1ULL
 
+static void gmap_pmdp_xchg(struct gmap *gmap, pmd_t *old, pmd_t new,
+			   unsigned long gaddr);
+
 /**
  * gmap_alloc - allocate and initialize a guest address space
  * @mm: pointer to the parent mm_struct
@@ -2231,6 +2234,26 @@ void ptep_notify(struct mm_struct *mm, unsigned long vmaddr,
 EXPORT_SYMBOL_GPL(ptep_notify);
 
 /**
+ * pmdp_notify_gmap - call all invalidation callbacks for a specific pmd
+ * @gmap: pointer to the guest address space structure
+ * @gaddr: guest address which is affected
+ *
+ * This function is expected to be called with a locked
+ * guest_table_lock.
+ */
+static void pmdp_notify_gmap(struct gmap *gmap, unsigned long gaddr)
+{
+	unsigned long *table;
+
+	gaddr &= HPAGE_MASK;
+	table = gmap_table_walk(gmap, gaddr, 1);
+	if (!table || !(*table & _SEGMENT_ENTRY_GMAP_IN))
+		return;
+	*table &= ~_SEGMENT_ENTRY_GMAP_IN;
+	gmap_call_notifier(gmap, gaddr, gaddr + HPAGE_SIZE - 1);
+}
+
+/**
  * pmdp_notify - call all invalidation callbacks for a specific pmd
  * @mm: pointer to the process mm_struct
  * @vmaddr: virtual address in the process address space
@@ -2374,6 +2397,32 @@ void gmap_pmdp_idte_global(struct mm_struct *mm, unsigned long vmaddr)
 }
 EXPORT_SYMBOL_GPL(gmap_pmdp_idte_global);
 
+/**
+ * gmap_pmdp_xchg - exchange a gmap pmd with another and notify
+ * @gmap: pointer to the guest address space structure
+ * @pmdp: pointer to the pmd entry
+ * @new: replacement entry
+ * @gaddr: the affected guest address
+ *
+ * This function is assumed to be called with the guest_table_lock
+ * held.
+ */
+static void gmap_pmdp_xchg(struct gmap *gmap, pmd_t *pmdp, pmd_t new,
+			   unsigned long gaddr)
+{
+	pmdp_notify_gmap(gmap, gaddr);
+	if (MACHINE_HAS_TLB_GUEST)
+		__pmdp_idte(gaddr, (pmd_t *)pmdp,
+			    IDTE_GUEST_ASCE, gmap->asce,
+			    IDTE_GLOBAL);
+	else if (MACHINE_HAS_IDTE)
+		__pmdp_idte(gaddr, (pmd_t *)pmdp,
+			    0, 0, IDTE_GLOBAL);
+	else
+		__pmdp_csp(pmdp);
+	*pmdp = new;
+}
+
 static inline void thp_split_mm(struct mm_struct *mm)
 {
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [RFC/PATCH v2 07/22] RFC: s390/mm: Transfer guest pmd protection to host
  2017-12-13 12:53 [RFC/PATCH v2 00/22] KVM/s390: Hugetlbfs enablement Janosch Frank
                   ` (5 preceding siblings ...)
  2017-12-13 12:53 ` [RFC/PATCH v2 06/22] s390/mm: Introduce gmap_pmdp_xchg Janosch Frank
@ 2017-12-13 12:53 ` Janosch Frank
  2017-12-13 12:53 ` [RFC/PATCH v2 08/22] s390/mm: Add huge page dirty sync support Janosch Frank
                   ` (17 subsequent siblings)
  24 siblings, 0 replies; 67+ messages in thread
From: Janosch Frank @ 2017-12-13 12:53 UTC (permalink / raw)
  To: kvm; +Cc: schwidefsky, borntraeger, david, dominik.dingel, linux-s390

If we protect the guest pmd, e.g. for dirty tracking, we need to
transfer the protection to the host pmd which we copied when linking
to the guest.

If we don't, we might lose changes on migration, as changes made on
the host side do not get tracked.

Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com>
---
 arch/s390/mm/gmap.c | 115 ++++++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 103 insertions(+), 12 deletions(-)

diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
index a252fe7..dfa3a0d 100644
--- a/arch/s390/mm/gmap.c
+++ b/arch/s390/mm/gmap.c
@@ -15,6 +15,7 @@
 #include <linux/swapops.h>
 #include <linux/ksm.h>
 #include <linux/mman.h>
+#include <linux/hugetlb.h>
 
 #include <asm/pgtable.h>
 #include <asm/pgalloc.h>
@@ -934,6 +935,84 @@ static inline void gmap_pmd_op_end(struct gmap *gmap, pmd_t *pmdp)
 		spin_unlock(&gmap->guest_table_lock);
 }
 
+/**
+ * gmap_pmdp_transfer_prot - transfer protection of guest pmd to host pmd
+ * @mm: the memory context
+ * @address: the affected host virtual address
+ * @gpmdp: guest pmd ptr
+ * @hpmdp: host pmd ptr
+ *
+ * Transfers the protection from a guest pmd to the associated host
+ * pmd. This has to be done with a plain idte to circumvent the gmap
+ * invalidation hooks in the standard invalidation functions provided
+ * by pgtable.c.
+ */
+static void gmap_pmdp_transfer_prot(struct mm_struct *mm, unsigned long addr,
+				    pmd_t *gpmdp, pmd_t *hpmdp)
+{
+	const int gpmd_i = pmd_val(*gpmdp) & _SEGMENT_ENTRY_INVALID;
+	const int gpmd_p = pmd_val(*gpmdp) & _SEGMENT_ENTRY_PROTECT;
+	const int hpmd_i = pmd_val(*hpmdp) & _SEGMENT_ENTRY_INVALID;
+	const int hpmd_p = pmd_val(*hpmdp) & _SEGMENT_ENTRY_PROTECT;
+	pmd_t new = *hpmdp;
+
+	/* Fastpath, change not needed. */
+	if (hpmd_i || (hpmd_p && gpmd_p) || (!gpmd_i && !gpmd_p))
+		return;
+
+	if (gpmd_p && !hpmd_p)
+		pmd_val(new) |= _SEGMENT_ENTRY_PROTECT;
+	if (!gpmd_i && !hpmd_i)
+		pmd_val(new) &= ~_SEGMENT_ENTRY_INVALID;
+
+	if (MACHINE_HAS_TLB_GUEST)
+		__pmdp_idte(addr, hpmdp,
+			    IDTE_NODAT | IDTE_GUEST_ASCE,
+			    mm->context.asce, IDTE_GLOBAL);
+	else if (MACHINE_HAS_IDTE)
+		__pmdp_idte(addr, hpmdp, 0, 0,
+			    IDTE_GLOBAL);
+	else
+		__pmdp_csp(hpmdp);
+	*hpmdp = new;
+}
+
+/**
+ * gmap_pmdp_force_prot - change access rights of a locked pmd
+ * @mm: pointer to the process mm_struct
+ * @addr: virtual address in the guest address space
+ * @pmdp: pointer to the page table entry
+ * @prot: indicates guest access rights: PROT_NONE, PROT_READ or PROT_WRITE
+ * @bits: software bit to set (e.g. for notification)
+ *
+ * Returns 0 if the access rights were changed and -EAGAIN if the current
+ * and requested access rights are incompatible.
+ */
+static int gmap_pmdp_force_prot(struct gmap *gmap, unsigned long addr,
+				pmd_t *pmdp, int prot, unsigned long bits)
+{
+	int pmd_i = pmd_val(*pmdp) & _SEGMENT_ENTRY_INVALID;
+	int pmd_p = pmd_val(*pmdp) & _SEGMENT_ENTRY_PROTECT;
+	pmd_t new = *pmdp;
+
+	/* Fixup needed */
+	if ((pmd_i && (prot != PROT_NONE)) || (pmd_p && (prot == PROT_WRITE)))
+		return -EAGAIN;
+
+	if (prot == PROT_NONE && !pmd_i) {
+		pmd_val(new) |= _SEGMENT_ENTRY_INVALID;
+		gmap_pmdp_xchg(gmap, pmdp, new, addr);
+	}
+
+	if (prot == PROT_READ && !pmd_p) {
+		pmd_val(new) &= ~_SEGMENT_ENTRY_INVALID;
+		pmd_val(new) |= _SEGMENT_ENTRY_PROTECT;
+		gmap_pmdp_xchg(gmap, pmdp, new, addr);
+	}
+	pmd_val(*pmdp) |=  bits;
+	return 0;
+}
+
 /*
  * gmap_protect_pte - remove access rights to memory and set pgste bits
  * @gmap: pointer to guest mapping meta data structure
@@ -985,18 +1064,23 @@ static int gmap_protect_pte(struct gmap *gmap, unsigned long gaddr,
  * guest_table_lock held.
  */
 static int gmap_protect_pmd(struct gmap *gmap, unsigned long gaddr,
-			    pmd_t *pmdp, int prot, unsigned long bits)
+			    unsigned long vmaddr, pmd_t *pmdp, pmd_t *hpmdp,
+			    int prot, unsigned long bits)
 {
-	const int pmd_i = pmd_val(*pmdp) & _SEGMENT_ENTRY_INVALID;
-	const int pmd_p = pmd_val(*pmdp) & _SEGMENT_ENTRY_PROTECT;
+	unsigned long sbits = 0;
+	int ret = 0;
 
-	/* Fixup needed */
-	if ((pmd_i && (prot != PROT_NONE)) || (pmd_p && (prot & PROT_WRITE)))
-		return -EAGAIN;
+	sbits |= (bits & GMAP_NOTIFY_MPROT) ? _SEGMENT_ENTRY_GMAP_IN : 0;
+	/* Protect gmap pmd */
+	ret = gmap_pmdp_force_prot(gmap, gaddr, pmdp, prot, sbits);
+	/*
+	 * Transfer protection back to the host pmd, so userspace has
+	 * never more access rights than the VM.
+	 */
+	if (!ret)
+		gmap_pmdp_transfer_prot(gmap->mm, vmaddr, pmdp, hpmdp);
 
-	if (bits & GMAP_NOTIFY_MPROT)
-		pmd_val(*pmdp) |=  _SEGMENT_ENTRY_GMAP_IN;
-	return 0;
+	return ret;
 }
 
 /*
@@ -1017,12 +1101,18 @@ static int gmap_protect_pmd(struct gmap *gmap, unsigned long gaddr,
 static int gmap_protect_range(struct gmap *gmap, unsigned long gaddr,
 			      unsigned long len, int prot, unsigned long bits)
 {
+	spinlock_t *ptl;
 	unsigned long vmaddr, dist;
-	pmd_t *pmdp;
+	pmd_t *pmdp, *hpmdp;
 	int rc;
 
 	while (len) {
 		rc = -EAGAIN;
+		vmaddr = __gmap_translate(gmap, gaddr);
+		hpmdp = (pmd_t *)huge_pte_offset(gmap->mm, vmaddr, HPAGE_SIZE);
+		/* Do we need tests here? */
+		ptl = pmd_lock(gmap->mm, hpmdp);
+
 		pmdp = gmap_pmd_op_walk(gmap, gaddr);
 		if (pmdp) {
 			if (!pmd_large(*pmdp)) {
@@ -1033,8 +1123,8 @@ static int gmap_protect_range(struct gmap *gmap, unsigned long gaddr,
 					gaddr += PAGE_SIZE;
 				}
 			} else {
-				rc = gmap_protect_pmd(gmap, gaddr, pmdp, prot,
-						      bits);
+				rc =  gmap_protect_pmd(gmap, gaddr, vmaddr,
+						       pmdp, hpmdp, prot, bits);
 				if (!rc) {
 					dist = HPAGE_SIZE - (gaddr & ~HPAGE_MASK);
 					len = len < dist ? 0 : len - dist;
@@ -1043,6 +1133,7 @@ static int gmap_protect_range(struct gmap *gmap, unsigned long gaddr,
 			}
 			gmap_pmd_op_end(gmap, pmdp);
 		}
+		spin_unlock(ptl);
 		if (rc) {
 			vmaddr = __gmap_translate(gmap, gaddr);
 			if (IS_ERR_VALUE(vmaddr))
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 67+ messages in thread
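
The transfer only ever tightens the host pmd; it never grants the host
more access than the gmap has. A minimal user-space model of the flag
logic in gmap_pmdp_transfer_prot() (invalid/protect bits only, without
the idte/TLB handling):

#include <assert.h>
#include <stdbool.h>

struct seg {
	bool invalid;		/* _SEGMENT_ENTRY_INVALID */
	bool protect;		/* _SEGMENT_ENTRY_PROTECT */
};

/* Flag part of gmap_pmdp_transfer_prot(): make the host pmd at least as
 * restrictive as the guest (gmap) pmd. */
static void transfer_prot(const struct seg *gpmd, struct seg *hpmd)
{
	/* Fastpath, change not needed. */
	if (hpmd->invalid || (hpmd->protect && gpmd->protect) ||
	    (!gpmd->invalid && !gpmd->protect))
		return;

	if (gpmd->protect && !hpmd->protect)
		hpmd->protect = true;
	if (!gpmd->invalid && !hpmd->invalid)
		hpmd->invalid = false;
}

int main(void)
{
	struct seg gpmd = { .invalid = false, .protect = true };
	struct seg hpmd = { .invalid = false, .protect = false };

	/* Write protecting the gmap pmd write protects the host pmd, too. */
	transfer_prot(&gpmd, &hpmd);
	assert(hpmd.protect && !hpmd.invalid);
	return 0;
}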

* [RFC/PATCH v2 08/22] s390/mm: Add huge page dirty sync support
  2017-12-13 12:53 [RFC/PATCH v2 00/22] KVM/s390: Hugetlbfs enablement Janosch Frank
                   ` (6 preceding siblings ...)
  2017-12-13 12:53 ` [RFC/PATCH v2 07/22] RFC: s390/mm: Transfer guest pmd protection to host Janosch Frank
@ 2017-12-13 12:53 ` Janosch Frank
  2017-12-13 12:53 ` [RFC/PATCH v2 09/22] s390/mm: clear huge page storage keys on enable_skey Janosch Frank
                   ` (16 subsequent siblings)
  24 siblings, 0 replies; 67+ messages in thread
From: Janosch Frank @ 2017-12-13 12:53 UTC (permalink / raw)
  To: kvm; +Cc: schwidefsky, borntraeger, david, dominik.dingel, linux-s390

To do dirty logging with huge pages, we protect huge pmds in the
gmap. When they are written to, we unprotect them and mark them dirty.

We introduce the function gmap_test_and_clear_dirty_segment which
handles dirty sync for huge pages.

Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com>
Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
---
 arch/s390/include/asm/gmap.h |  6 +++-
 arch/s390/kvm/kvm-s390.c     | 18 ++++++----
 arch/s390/mm/gmap.c          | 80 ++++++++++++++++++++++++++++++++++++++++++--
 3 files changed, 94 insertions(+), 10 deletions(-)

diff --git a/arch/s390/include/asm/gmap.h b/arch/s390/include/asm/gmap.h
index 21bb658..ba12eef 100644
--- a/arch/s390/include/asm/gmap.h
+++ b/arch/s390/include/asm/gmap.h
@@ -13,8 +13,10 @@
 #define GMAP_NOTIFY_SHADOW	0x2
 #define GMAP_NOTIFY_MPROT	0x1
 
-/* Status bits in the gmap segment entry. */
+/* Status bits in huge and non-huge gmap segment entries. */
 #define _SEGMENT_ENTRY_GMAP_IN		0x0001	/* invalidation notify bit */
+/* Status bits only for huge segment entries */
+#define _SEGMENT_ENTRY_GMAP_UC		0x4000	/* user dirty (migration) */
 
 /**
  * struct gmap_struct - guest address space
@@ -139,4 +141,6 @@ void gmap_pte_notify(struct mm_struct *, unsigned long addr, pte_t *,
 int gmap_mprotect_notify(struct gmap *, unsigned long start,
 			 unsigned long len, int prot);
 
+void gmap_sync_dirty_log_pmd(struct gmap *gmap, unsigned long dirty_bitmap[4],
+			     unsigned long gaddr, unsigned long vmaddr);
 #endif /* _ASM_S390_GMAP_H */
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index ec8b68e..73fb3bc 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -428,19 +428,23 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 }
 
 static void kvm_s390_sync_dirty_log(struct kvm *kvm,
-					struct kvm_memory_slot *memslot)
+				    struct kvm_memory_slot *memslot)
 {
 	gfn_t cur_gfn, last_gfn;
-	unsigned long address;
+	unsigned long gaddr, vmaddr;
+	unsigned long *dirty = memslot->dirty_bitmap;
 	struct gmap *gmap = kvm->arch.gmap;
 
-	/* Loop over all guest pages */
+	/* Loop over all guest segments */
 	last_gfn = memslot->base_gfn + memslot->npages;
-	for (cur_gfn = memslot->base_gfn; cur_gfn <= last_gfn; cur_gfn++) {
-		address = gfn_to_hva_memslot(memslot, cur_gfn);
+	for (cur_gfn = memslot->base_gfn; cur_gfn <= last_gfn; cur_gfn += _PAGE_ENTRIES, dirty += 4) {
+		gaddr = gfn_to_gpa(cur_gfn);
+		vmaddr = gfn_to_hva_memslot(memslot, cur_gfn);
+		if (kvm_is_error_hva(vmaddr))
+			continue;
+
+		gmap_sync_dirty_log_pmd(gmap, dirty, gaddr, vmaddr);
 
-		if (test_and_clear_guest_dirty(gmap->mm, address))
-			mark_page_dirty(kvm, cur_gfn);
 		if (fatal_signal_pending(current))
 			return;
 		cond_resched();
diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
index dfa3a0d..fa99e21 100644
--- a/arch/s390/mm/gmap.c
+++ b/arch/s390/mm/gmap.c
@@ -545,6 +545,7 @@ int __gmap_link(struct gmap *gmap, unsigned long gaddr, unsigned long vmaddr)
 	p4d_t *p4d;
 	pud_t *pud;
 	pmd_t *pmd;
+	pmd_t unprot;
 	int rc;
 
 	BUG_ON(gmap_is_shadow(gmap));
@@ -602,12 +603,19 @@ int __gmap_link(struct gmap *gmap, unsigned long gaddr, unsigned long vmaddr)
 				       vmaddr >> PMD_SHIFT, table);
 		if (!rc) {
 			if (pmd_large(*pmd)) {
-				*table = pmd_val(*pmd) &
-					_SEGMENT_ENTRY_HARDWARE_BITS_LARGE;
+				*table = (pmd_val(*pmd) &
+					  _SEGMENT_ENTRY_HARDWARE_BITS_LARGE)
+					| _SEGMENT_ENTRY_GMAP_UC;
 			} else
 				*table = pmd_val(*pmd) &
 					_SEGMENT_ENTRY_HARDWARE_BITS;
 		}
+	} else if (*table & _SEGMENT_ENTRY_PROTECT &&
+		   !(pmd_val(*pmd) & _SEGMENT_ENTRY_PROTECT)) {
+		unprot = __pmd((*table & (_SEGMENT_ENTRY_HARDWARE_BITS_LARGE
+					  & ~_SEGMENT_ENTRY_PROTECT))
+			       | _SEGMENT_ENTRY_GMAP_UC);
+		gmap_pmdp_xchg(gmap, (pmd_t *)table, unprot, gaddr);
 	}
 	spin_unlock(&gmap->guest_table_lock);
 	spin_unlock(ptl);
@@ -2514,6 +2522,74 @@ static void gmap_pmdp_xchg(struct gmap *gmap, pmd_t *pmdp, pmd_t new,
 	*pmdp = new;
 }
 
+/**
+ * gmap_test_and_clear_dirty_segment - test and reset segment dirty status
+ * @gmap: pointer to guest address space
+ * @pmdp: pointer to the pmd to be tested
+ * @gaddr: virtual address in the guest address space
+ *
+ * This function is assumed to be called with the guest_table_lock
+ * held.
+ */
+bool gmap_test_and_clear_dirty_segment(struct gmap *gmap, pmd_t *pmdp,
+				       pmd_t *hpmdp, unsigned long gaddr,
+				       unsigned long vmaddr)
+{
+	if (pmd_val(*pmdp) & _SEGMENT_ENTRY_INVALID)
+		return false;
+
+	/* Already protected memory, which did not change is clean */
+	if (pmd_val(*pmdp) & _SEGMENT_ENTRY_PROTECT &&
+	    !(pmd_val(*pmdp) & _SEGMENT_ENTRY_GMAP_UC))
+		return false;
+
+	/* Clear UC indication and reset protection */
+	pmd_val(*pmdp) &= ~_SEGMENT_ENTRY_GMAP_UC;
+	gmap_protect_pmd(gmap, gaddr, vmaddr, pmdp, hpmdp, PROT_READ, 0);
+	return true;
+}
+
+/**
+ * gmap_sync_dirty_log_pmd - set bitmap based on dirty status of segment
+ * @gmap: pointer to guest address space
+ * @bitmap: dirty bitmap for this pmd
+ * @gaddr: virtual address in the guest address space
+ * @vmaddr: virtual address in the host address space
+ *
+ * This function is assumed to be called with the guest_table_lock
+ * held.
+ */
+void gmap_sync_dirty_log_pmd(struct gmap *gmap, unsigned long bitmap[4],
+			     unsigned long gaddr, unsigned long vmaddr)
+{
+	int i = 0;
+	pmd_t *pmdp, *hpmdp;
+	spinlock_t *ptl;
+
+	hpmdp = (pmd_t *)huge_pte_offset(gmap->mm, vmaddr, HPAGE_SIZE);
+	if (!hpmdp)
+		return;
+	ptl = pmd_lock(gmap->mm, hpmdp);
+	pmdp = gmap_pmd_op_walk(gmap, gaddr);
+	if (!pmdp) {
+		spin_unlock(ptl);
+		return;
+	}
+
+	if (pmd_large(*pmdp)) {
+		if (gmap_test_and_clear_dirty_segment(gmap, pmdp, hpmdp,
+						      gaddr, vmaddr))
+			memset(bitmap, 0xFF, 32);
+	} else {
+		for (; i < _PAGE_ENTRIES; i++, vmaddr += PAGE_SIZE) {
+			if (test_and_clear_guest_dirty(gmap->mm, vmaddr))
+				set_bit_le(i, bitmap);
+		}
+	}
+	gmap_pmd_op_end(gmap, pmdp);
+	spin_unlock(ptl);
+}
+
 static inline void thp_split_mm(struct mm_struct *mm)
 {
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 67+ messages in thread
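
kvm_s390_sync_dirty_log() above now walks the memslot in segment-sized
steps and passes gmap_sync_dirty_log_pmd() a window of four bitmap words
per 1M segment. A small sketch of that layout, assuming 4k pages, 64-bit
longs and _PAGE_ENTRIES = 256 as on s390:

#include <assert.h>
#include <string.h>

#define PAGE_SIZE	4096UL
#define HPAGE_SIZE	(1UL << 20)	/* one 1M segment */
#define _PAGE_ENTRIES	256		/* 4k pages per segment */
#define BITS_PER_LONG	64

int main(void)
{
	/* Dirty bitmap window covering exactly one segment. */
	unsigned long bitmap[_PAGE_ENTRIES / BITS_PER_LONG];

	/* 256 pages per segment need 4 longs per window ... */
	assert(HPAGE_SIZE / PAGE_SIZE == _PAGE_ENTRIES);
	assert(sizeof(bitmap) == 4 * sizeof(unsigned long));

	/* ... which is why a dirty huge segment is reported by filling
	 * the whole 32-byte window at once (memset(bitmap, 0xFF, 32)). */
	memset(bitmap, 0xFF, sizeof(bitmap));
	assert(bitmap[0] == ~0UL && bitmap[3] == ~0UL);
	return 0;
}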

* [RFC/PATCH v2 09/22] s390/mm: clear huge page storage keys on enable_skey
  2017-12-13 12:53 [RFC/PATCH v2 00/22] KVM/s390: Hugetlbfs enablement Janosch Frank
                   ` (7 preceding siblings ...)
  2017-12-13 12:53 ` [RFC/PATCH v2 08/22] s390/mm: Add huge page dirty sync support Janosch Frank
@ 2017-12-13 12:53 ` Janosch Frank
  2017-12-13 12:53 ` [RFC/PATCH v2 10/22] s390/mm: Add huge pmd storage key handling Janosch Frank
                   ` (15 subsequent siblings)
  24 siblings, 0 replies; 67+ messages in thread
From: Janosch Frank @ 2017-12-13 12:53 UTC (permalink / raw)
  To: kvm; +Cc: schwidefsky, borntraeger, david, dominik.dingel, linux-s390

From: Dominik Dingel <dingel@linux.vnet.ibm.com>

When a guest starts using storage keys, we trap and set a default one
for its whole valid address space. With this patch we are now able to
do that for large pages.

To speed up the storage key insertion, we use
__storage_key_init_range, which in turn will use sske_frame to set
multiple storage keys with one instruction. As it has previously been
used for debugging, we have to get rid of the default key check and
make it quiescing.

Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com>
[replaced page_set_storage_key loop with __storage_key_init_range]
Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
---
 arch/s390/mm/gmap.c     | 26 +++++++++++++++++++++++---
 arch/s390/mm/pageattr.c |  6 ++----
 2 files changed, 25 insertions(+), 7 deletions(-)

diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
index fa99e21..ffc11d8 100644
--- a/arch/s390/mm/gmap.c
+++ b/arch/s390/mm/gmap.c
@@ -2666,17 +2666,37 @@ EXPORT_SYMBOL_GPL(s390_enable_sie);
  * Enable storage key handling from now on and initialize the storage
  * keys with the default key.
  */
-static int __s390_enable_skey(pte_t *pte, unsigned long addr,
-			      unsigned long next, struct mm_walk *walk)
+static int __s390_enable_skey_pte(pte_t *pte, unsigned long addr,
+				  unsigned long next, struct mm_walk *walk)
 {
 	/* Clear storage key */
 	ptep_zap_key(walk->mm, addr, pte);
 	return 0;
 }
 
+static int __s390_enable_skey_hugetlb(pte_t *pte, unsigned long addr,
+				      unsigned long hmask, unsigned long next,
+				      struct mm_walk *walk)
+{
+	pmd_t *pmd = (pmd_t *)pte;
+	unsigned long start, end;
+
+	if (pmd_val(*pmd) & _SEGMENT_ENTRY_INVALID
+	    || !(pmd_val(*pmd) & _SEGMENT_ENTRY_WRITE))
+		return 0;
+
+	start = pmd_val(*pmd) & HPAGE_MASK;
+	end = start + HPAGE_SIZE - 1;
+	__storage_key_init_range(start, end);
+	return 0;
+}
+
 int s390_enable_skey(void)
 {
-	struct mm_walk walk = { .pte_entry = __s390_enable_skey };
+	struct mm_walk walk = {
+		.hugetlb_entry = __s390_enable_skey_hugetlb,
+		.pte_entry = __s390_enable_skey_pte,
+	};
 	struct mm_struct *mm = current->mm;
 	struct vm_area_struct *vma;
 	int rc = 0;
diff --git a/arch/s390/mm/pageattr.c b/arch/s390/mm/pageattr.c
index c441715..f8c6faa 100644
--- a/arch/s390/mm/pageattr.c
+++ b/arch/s390/mm/pageattr.c
@@ -14,7 +14,7 @@
 
 static inline unsigned long sske_frame(unsigned long addr, unsigned char skey)
 {
-	asm volatile(".insn rrf,0xb22b0000,%[skey],%[addr],9,0"
+	asm volatile(".insn rrf,0xb22b0000,%[skey],%[addr],1,0"
 		     : [addr] "+a" (addr) : [skey] "d" (skey));
 	return addr;
 }
@@ -23,8 +23,6 @@ void __storage_key_init_range(unsigned long start, unsigned long end)
 {
 	unsigned long boundary, size;
 
-	if (!PAGE_DEFAULT_KEY)
-		return;
 	while (start < end) {
 		if (MACHINE_HAS_EDAT1) {
 			/* set storage keys for a 1MB frame */
@@ -37,7 +35,7 @@ void __storage_key_init_range(unsigned long start, unsigned long end)
 				continue;
 			}
 		}
-		page_set_storage_key(start, PAGE_DEFAULT_KEY, 0);
+		page_set_storage_key(start, PAGE_DEFAULT_KEY, 1);
 		start += PAGE_SIZE;
 	}
 }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [RFC/PATCH v2 10/22] s390/mm: Add huge pmd storage key handling
  2017-12-13 12:53 [RFC/PATCH v2 00/22] KVM/s390: Hugetlbfs enablement Janosch Frank
                   ` (8 preceding siblings ...)
  2017-12-13 12:53 ` [RFC/PATCH v2 09/22] s390/mm: clear huge page storage keys on enable_skey Janosch Frank
@ 2017-12-13 12:53 ` Janosch Frank
  2017-12-13 12:53 ` [RFC/PATCH v2 11/22] s390/mm: Remove superfluous parameter Janosch Frank
                   ` (14 subsequent siblings)
  24 siblings, 0 replies; 67+ messages in thread
From: Janosch Frank @ 2017-12-13 12:53 UTC (permalink / raw)
  To: kvm; +Cc: schwidefsky, borntraeger, david, dominik.dingel, linux-s390

For guests with huge page mappings, storage keys have to be set
directly in hardware. There are no PGSTEs for PMDs that we could use
to retain the guest's logical view of the key.

Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com>
---
 arch/s390/mm/pgtable.c | 104 ++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 98 insertions(+), 6 deletions(-)

diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
index e690879..d18b80e 100644
--- a/arch/s390/mm/pgtable.c
+++ b/arch/s390/mm/pgtable.c
@@ -766,12 +766,45 @@ EXPORT_SYMBOL_GPL(test_and_clear_guest_dirty);
 int set_guest_storage_key(struct mm_struct *mm, unsigned long addr,
 			  unsigned char key, bool nq)
 {
-	unsigned long keyul;
+	unsigned long keyul, address;
 	spinlock_t *ptl;
 	pgste_t old, new;
+	pgd_t *pgd;
+	p4d_t *p4d;
+	pud_t *pud;
+	pmd_t *pmd;
 	pte_t *ptep;
 
-	ptep = get_locked_pte(mm, addr, &ptl);
+	pgd = pgd_offset(mm, addr);
+	p4d = p4d_alloc(mm, pgd, addr);
+	if (!p4d)
+		return -EFAULT;
+	pud = pud_alloc(mm, p4d, addr);
+	if (!pud)
+		return -EFAULT;
+	pmd = pmd_alloc(mm, pud, addr);
+	if (!pmd)
+		return -EFAULT;
+
+	ptl = pmd_lock(mm, pmd);
+	if (!pmd_present(*pmd)) {
+		spin_unlock(ptl);
+		return -EFAULT;
+	}
+	if (pmd_large(*pmd)) {
+		address = pmd_val(*pmd) & HPAGE_MASK;
+		address |= addr & ~HPAGE_MASK;
+		/*
+		 * Huge pmds need quiescing operations, they are
+		 * always mapped.
+		 */
+		page_set_storage_key(address, key, 1);
+		spin_unlock(ptl);
+		return 0;
+	}
+	spin_unlock(ptl);
+
+	ptep = pte_alloc_map_lock(mm, pmd, addr, &ptl);
 	if (unlikely(!ptep))
 		return -EFAULT;
 
@@ -782,7 +815,7 @@ int set_guest_storage_key(struct mm_struct *mm, unsigned long addr,
 	pgste_val(new) |= (keyul & (_PAGE_CHANGED | _PAGE_REFERENCED)) << 48;
 	pgste_val(new) |= (keyul & (_PAGE_ACC_BITS | _PAGE_FP_BIT)) << 56;
 	if (!(pte_val(*ptep) & _PAGE_INVALID)) {
-		unsigned long address, bits, skey;
+		unsigned long bits, skey;
 
 		address = pte_val(*ptep) & PAGE_MASK;
 		skey = (unsigned long) page_get_storage_key(address);
@@ -845,14 +878,43 @@ EXPORT_SYMBOL(cond_set_guest_storage_key);
 int reset_guest_reference_bit(struct mm_struct *mm, unsigned long addr)
 {
 	spinlock_t *ptl;
+	unsigned long address;
 	pgste_t old, new;
+	pgd_t *pgd;
+	p4d_t *p4d;
+	pud_t *pud;
+	pmd_t *pmd;
 	pte_t *ptep;
 	int cc = 0;
 
-	ptep = get_locked_pte(mm, addr, &ptl);
-	if (unlikely(!ptep))
+	pgd = pgd_offset(mm, addr);
+	p4d = p4d_alloc(mm, pgd, addr);
+	if (!p4d)
+		return -EFAULT;
+	pud = pud_alloc(mm, p4d, addr);
+	if (!pud)
+		return -EFAULT;
+	pmd = pmd_alloc(mm, pud, addr);
+	if (!pmd)
 		return -EFAULT;
 
+	ptl = pmd_lock(mm, pmd);
+	if (!pmd_present(*pmd)) {
+		spin_unlock(ptl);
+		return -EFAULT;
+	}
+	if (pmd_large(*pmd)) {
+		address = pmd_val(*pmd) & HPAGE_MASK;
+		address |= addr & ~HPAGE_MASK;
+		cc = page_reset_referenced(address);
+		spin_unlock(ptl);
+		return cc;
+	}
+	spin_unlock(ptl);
+
+	ptep = pte_alloc_map_lock(mm, pmd, addr, &ptl);
+	if (unlikely(!ptep))
+		return -EFAULT;
 	new = old = pgste_get_lock(ptep);
 	/* Reset guest reference bit only */
 	pgste_val(new) &= ~PGSTE_GR_BIT;
@@ -877,11 +939,41 @@ EXPORT_SYMBOL(reset_guest_reference_bit);
 int get_guest_storage_key(struct mm_struct *mm, unsigned long addr,
 			  unsigned char *key)
 {
+	unsigned long address;
 	spinlock_t *ptl;
 	pgste_t pgste;
+	pgd_t *pgd;
+	p4d_t *p4d;
+	pud_t *pud;
+	pmd_t *pmd;
 	pte_t *ptep;
 
-	ptep = get_locked_pte(mm, addr, &ptl);
+	pgd = pgd_offset(mm, addr);
+	p4d = p4d_alloc(mm, pgd, addr);
+	if (!p4d)
+		return -EFAULT;
+	pud = pud_alloc(mm, p4d, addr);
+	if (!pud)
+		return -EFAULT;
+	pmd = pmd_alloc(mm, pud, addr);
+	if (!pmd)
+		return -EFAULT;
+
+	ptl = pmd_lock(mm, pmd);
+	if (!pmd_present(*pmd)) {
+		spin_unlock(ptl);
+		return -EFAULT;
+	}
+	if (pmd_large(*pmd)) {
+		address = pmd_val(*pmd) & HPAGE_MASK;
+		address |= addr & ~HPAGE_MASK;
+		*key = page_get_storage_key(address);
+		spin_unlock(ptl);
+		return 0;
+	}
+	spin_unlock(ptl);
+
+	ptep = pte_alloc_map_lock(mm, pmd, addr, &ptl);
 	if (unlikely(!ptep))
 		return -EFAULT;
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [RFC/PATCH v2 11/22] s390/mm: Remove superfluous parameter
  2017-12-13 12:53 [RFC/PATCH v2 00/22] KVM/s390: Hugetlbfs enablement Janosch Frank
                   ` (9 preceding siblings ...)
  2017-12-13 12:53 ` [RFC/PATCH v2 10/22] s390/mm: Add huge pmd storage key handling Janosch Frank
@ 2017-12-13 12:53 ` Janosch Frank
  2017-12-21  9:22   ` Janosch Frank
                     ` (2 more replies)
  2017-12-13 12:53 ` [RFC/PATCH v2 12/22] s390/mm: Add gmap_protect_large read protection support Janosch Frank
                   ` (13 subsequent siblings)
  24 siblings, 3 replies; 67+ messages in thread
From: Janosch Frank @ 2017-12-13 12:53 UTC (permalink / raw)
  To: kvm; +Cc: schwidefsky, borntraeger, david, dominik.dingel, linux-s390

It seems the pte parameter of gmap_shadow_notify() was already unused
before the last cleanup and was simply overlooked, so remove it.

Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com>
---
 arch/s390/mm/gmap.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
index ffc11d8..d396da8 100644
--- a/arch/s390/mm/gmap.c
+++ b/arch/s390/mm/gmap.c
@@ -2237,7 +2237,7 @@ EXPORT_SYMBOL_GPL(gmap_shadow_page);
  * Called with sg->parent->shadow_lock.
  */
 static void gmap_shadow_notify(struct gmap *sg, unsigned long vmaddr,
-			       unsigned long gaddr, pte_t *pte)
+			       unsigned long gaddr)
 {
 	struct gmap_rmap *rmap, *rnext, *head;
 	unsigned long start, end, bits, raddr;
@@ -2322,7 +2322,7 @@ void ptep_notify(struct mm_struct *mm, unsigned long vmaddr,
 			spin_lock(&gmap->shadow_lock);
 			list_for_each_entry_safe(sg, next,
 						 &gmap->children, list)
-				gmap_shadow_notify(sg, vmaddr, gaddr, pte);
+				gmap_shadow_notify(sg, vmaddr, gaddr);
 			spin_unlock(&gmap->shadow_lock);
 		}
 		if (bits & PGSTE_IN_BIT)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [RFC/PATCH v2 12/22] s390/mm: Add gmap_protect_large read protection support
  2017-12-13 12:53 [RFC/PATCH v2 00/22] KVM/s390: Hugetlbfs enablement Janosch Frank
                   ` (10 preceding siblings ...)
  2017-12-13 12:53 ` [RFC/PATCH v2 11/22] s390/mm: Remove superfluous parameter Janosch Frank
@ 2017-12-13 12:53 ` Janosch Frank
  2017-12-13 12:53 ` [RFC/PATCH v2 13/22] s390/mm: Make gmap_read_table EDAT1 compatible Janosch Frank
                   ` (12 subsequent siblings)
  24 siblings, 0 replies; 67+ messages in thread
From: Janosch Frank @ 2017-12-13 12:53 UTC (permalink / raw)
  To: kvm; +Cc: schwidefsky, borntraeger, david, dominik.dingel, linux-s390

We need to be able to write-protect segments when shadowing them for
VSIE, so introduce a dedicated notification bit for shadow protection.

Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com>
---
 arch/s390/include/asm/gmap.h | 1 +
 arch/s390/mm/gmap.c          | 1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/s390/include/asm/gmap.h b/arch/s390/include/asm/gmap.h
index ba12eef..518e00c 100644
--- a/arch/s390/include/asm/gmap.h
+++ b/arch/s390/include/asm/gmap.h
@@ -17,6 +17,7 @@
 #define _SEGMENT_ENTRY_GMAP_IN		0x0001	/* invalidation notify bit */
 /* Status bits only for huge segment entries */
 #define _SEGMENT_ENTRY_GMAP_UC		0x4000	/* user dirty (migration) */
+#define _SEGMENT_ENTRY_GMAP_VSIE	0x8000	/* vsie bit */
 
 /**
  * struct gmap_struct - guest address space
diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
index d396da8..dabc734 100644
--- a/arch/s390/mm/gmap.c
+++ b/arch/s390/mm/gmap.c
@@ -1079,6 +1079,7 @@ static int gmap_protect_pmd(struct gmap *gmap, unsigned long gaddr,
 	int ret = 0;
 
 	sbits |= (bits & GMAP_NOTIFY_MPROT) ? _SEGMENT_ENTRY_GMAP_IN : 0;
+	sbits |= (bits & GMAP_NOTIFY_SHADOW) ? _SEGMENT_ENTRY_GMAP_VSIE : 0;
 	/* Protect gmap pmd */
 	ret = gmap_pmdp_force_prot(gmap, gaddr, pmdp, prot, sbits);
 	/*
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [RFC/PATCH v2 13/22] s390/mm: Make gmap_read_table EDAT1 compatible
  2017-12-13 12:53 [RFC/PATCH v2 00/22] KVM/s390: Hugetlbfs enablement Janosch Frank
                   ` (11 preceding siblings ...)
  2017-12-13 12:53 ` [RFC/PATCH v2 12/22] s390/mm: Add gmap_protect_large read protection support Janosch Frank
@ 2017-12-13 12:53 ` Janosch Frank
  2017-12-13 12:53 ` [RFC/PATCH v2 14/22] s390/mm: Make protect_rmap " Janosch Frank
                   ` (11 subsequent siblings)
  24 siblings, 0 replies; 67+ messages in thread
From: Janosch Frank @ 2017-12-13 12:53 UTC (permalink / raw)
  To: kvm; +Cc: schwidefsky, borntraeger, david, dominik.dingel, linux-s390

For the VSIE we shadow the GMAP/DAT tables that a guest N builds for
its N + 1 guest. This means we read the DAT tables from the memory of
guest N and use the retrieved information to build a shadow version,
which is then actually used to run the nested guest.

gmap_read_table is used to retrieve the data from the guest's address
space. Unfortunately it currently has no support for reading from huge
guests, so let's add that.
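
The 4k and the 1M path only differ in how the host address to read
from is composed. A minimal sketch (illustration only, not part of the
patch; the helper name is made up):

/*
 * frame_mask is PAGE_MASK for a pte backing and HPAGE_MASK for a
 * huge pmd backing.
 */
static unsigned long host_read_addr(unsigned long entry_val,
				    unsigned long frame_mask,
				    unsigned long gaddr)
{
	/* frame bits come from the table entry, the byte offset from
	 * the guest address */
	return (entry_val & frame_mask) | (gaddr & ~frame_mask);
}

So the pte case reads from host_read_addr(pte_val(pte), PAGE_MASK,
gaddr) and the huge case from host_read_addr(pmd_val(pmd), HPAGE_MASK,
gaddr), matching the hunk below.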

Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com>
---
 arch/s390/mm/gmap.c | 37 ++++++++++++++++++++++++++-----------
 1 file changed, 26 insertions(+), 11 deletions(-)

diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
index dabc734..828eadf 100644
--- a/arch/s390/mm/gmap.c
+++ b/arch/s390/mm/gmap.c
@@ -1201,23 +1201,38 @@ int gmap_read_table(struct gmap *gmap, unsigned long gaddr, unsigned long *val)
 {
 	unsigned long address, vmaddr;
 	spinlock_t *ptl;
+	pmd_t *pmdp, pmd;
 	pte_t *ptep, pte;
 	int rc;
 
 	while (1) {
 		rc = -EAGAIN;
-		ptep = gmap_pte_op_walk(gmap, gaddr, &ptl);
-		if (ptep) {
-			pte = *ptep;
-			if (pte_present(pte) && (pte_val(pte) & _PAGE_READ)) {
-				address = pte_val(pte) & PAGE_MASK;
-				address += gaddr & ~PAGE_MASK;
-				*val = *(unsigned long *) address;
-				pte_val(*ptep) |= _PAGE_YOUNG;
-				/* Do *NOT* clear the _PAGE_INVALID bit! */
-				rc = 0;
+		pmdp = gmap_pmd_op_walk(gmap, gaddr);
+		if (pmdp) {
+			if (!pmd_large(*pmdp)) {
+				ptep = pte_alloc_map_lock(gmap->mm, pmdp, gaddr, &ptl);
+				if (ptep) {
+					pte = *ptep;
+					if (pte_present(pte) && (pte_val(pte) & _PAGE_READ)) {
+						address = pte_val(pte) & PAGE_MASK;
+						address += gaddr & ~PAGE_MASK;
+						*val = *(unsigned long *) address;
+						pte_val(*ptep) |= _PAGE_YOUNG;
+						/* Do *NOT* clear the _PAGE_INVALID bit! */
+						rc = 0;
+					}
+					gmap_pte_op_end(ptl);
+				}
+			} else {
+				pmd = *pmdp;
+				if (!(pmd_val(pmd) & _SEGMENT_ENTRY_INVALID)) {
+					address = pmd_val(pmd) & HPAGE_MASK;
+					address += gaddr & ~HPAGE_MASK;
+					*val = *(unsigned long *) address;
+					rc = 0;
+				}
 			}
-			gmap_pte_op_end(ptl);
+			gmap_pmd_op_end(gmap, pmdp);
 		}
 		if (!rc)
 			break;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [RFC/PATCH v2 14/22] s390/mm: Make protect_rmap EDAT1 compatible
  2017-12-13 12:53 [RFC/PATCH v2 00/22] KVM/s390: Hugetlbfs enablement Janosch Frank
                   ` (12 preceding siblings ...)
  2017-12-13 12:53 ` [RFC/PATCH v2 13/22] s390/mm: Make gmap_read_table EDAT1 compatible Janosch Frank
@ 2017-12-13 12:53 ` Janosch Frank
  2017-12-13 12:53 ` [RFC/PATCH v2 15/22] s390/mm: GMAP read table extensions Janosch Frank
                   ` (10 subsequent siblings)
  24 siblings, 0 replies; 67+ messages in thread
From: Janosch Frank @ 2017-12-13 12:53 UTC (permalink / raw)
  To: kvm; +Cc: schwidefsky, borntraeger, david, dominik.dingel, linux-s390

When shadowing, we must make sure that any changes to the GMAP inside
guest N are directly reflected in our shadow GMAP. This is done by
write-protecting the guest N memory at the places where it stores DAT
tables for guest N + 1.

This still lacks EDAT1 support, so let's add it.
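
With EDAT1 a single protection step can cover a whole 1M segment, so
the loop has to advance differently depending on the backing. A rough
sketch of the step-size handling (illustration only; "large" stands
for the pmd_large() check in the code below):

if (!large) {
	/* a 4k pte was protected: advance by one page */
	paddr += PAGE_SIZE;
	len -= PAGE_SIZE;
} else {
	/* a 1M pmd was protected: the rest of this segment is done,
	 * continue at the next segment boundary */
	dist = HPAGE_SIZE - (paddr & ~HPAGE_MASK);
	len = len < dist ? 0 : len - dist;
	paddr = (paddr & HPAGE_MASK) + HPAGE_SIZE;
}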

Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com>
---
 arch/s390/mm/gmap.c | 80 +++++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 66 insertions(+), 14 deletions(-)

diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
index 828eadf..d45ac26da 100644
--- a/arch/s390/mm/gmap.c
+++ b/arch/s390/mm/gmap.c
@@ -1275,6 +1275,51 @@ static inline void gmap_insert_rmap(struct gmap *sg, unsigned long vmaddr,
 	}
 }
 
+static int gmap_protect_rmap_pmd(struct gmap *sg, struct gmap_rmap *rmap,
+				 unsigned long paddr, unsigned long vmaddr,
+				 pmd_t *pmdp, int prot)
+{
+	int rc = 0;
+
+	/* We have no upper segment, let's go back and fix this up. */
+	if (pmd_val(*pmdp) & _SEGMENT_ENTRY_INVALID)
+		return -EAGAIN;
+
+	spin_lock(&sg->guest_table_lock);
+	rc = gmap_protect_large(sg->parent, paddr, pmdp,
+				prot, GMAP_NOTIFY_SHADOW);
+	if (!rc)
+		gmap_insert_rmap(sg, vmaddr & HPAGE_MASK, rmap);
+
+	spin_unlock(&sg->guest_table_lock);
+	return rc;
+}
+
+static int gmap_protect_rmap_pte(struct gmap *sg, struct gmap_rmap *rmap,
+				 unsigned long paddr, unsigned long vmaddr,
+				 pmd_t *pmdp, int prot)
+{
+	int rc = 0;
+	pte_t *ptep = NULL;
+	spinlock_t *ptl = NULL;
+
+	/* We have no upper segment, let's go back and fix this up. */
+	if (pmd_val(*pmdp) & _SEGMENT_ENTRY_INVALID)
+		return -EAGAIN;
+
+	ptep = pte_alloc_map_lock(sg->parent->mm, pmdp, paddr, &ptl);
+	if (ptep) {
+		spin_lock(&sg->guest_table_lock);
+		rc = ptep_force_prot(sg->parent->mm, paddr, ptep, prot,
+				     PGSTE_VSIE_BIT);
+		if (!rc)
+			gmap_insert_rmap(sg, vmaddr, rmap);
+		spin_unlock(&sg->guest_table_lock);
+		gmap_pte_op_end(ptl);
+	}
+	return rc;
+}
+
 /**
  * gmap_protect_rmap - modify access rights to memory and create an rmap
  * @sg: pointer to the shadow guest address space structure
@@ -1291,9 +1336,8 @@ static int gmap_protect_rmap(struct gmap *sg, unsigned long raddr,
 {
 	struct gmap *parent;
 	struct gmap_rmap *rmap;
-	unsigned long vmaddr;
-	spinlock_t *ptl;
-	pte_t *ptep;
+	unsigned long vmaddr, dist;
+	pmd_t *pmdp;
 	int rc;
 
 	BUG_ON(!gmap_is_shadow(sg));
@@ -1312,15 +1356,25 @@ static int gmap_protect_rmap(struct gmap *sg, unsigned long raddr,
 			return rc;
 		}
 		rc = -EAGAIN;
-		ptep = gmap_pte_op_walk(parent, paddr, &ptl);
-		if (ptep) {
-			spin_lock(&sg->guest_table_lock);
-			rc = ptep_force_prot(parent->mm, paddr, ptep, prot,
-					     GMAP_NOTIFY_SHADOW);
-			if (!rc)
-				gmap_insert_rmap(sg, vmaddr, rmap);
-			spin_unlock(&sg->guest_table_lock);
-			gmap_pte_op_end(ptl);
+		pmdp = gmap_pmd_op_walk(parent, paddr);
+		if (pmdp) {
+			if (!pmd_large(*pmdp)) {
+				rc = gmap_protect_rmap_pte(sg, rmap, paddr,
+							   vmaddr, pmdp, prot);
+				if (!rc) {
+					paddr += PAGE_SIZE;
+					len -= PAGE_SIZE;
+				}
+			} else {
+				rc = gmap_protect_rmap_pmd(sg, rmap, paddr,
+							   vmaddr, pmdp, prot);
+				if (!rc) {
+					dist = HPAGE_SIZE - (paddr & ~HPAGE_MASK);
+					len = len < dist ? 0 : len - dist;
+					paddr = (paddr & HPAGE_MASK) + HPAGE_SIZE;
+				}
+			}
+			gmap_pmd_op_end(parent, pmdp);
 		}
 		radix_tree_preload_end();
 		if (rc) {
@@ -1330,8 +1384,6 @@ static int gmap_protect_rmap(struct gmap *sg, unsigned long raddr,
 				return rc;
 			continue;
 		}
-		paddr += PAGE_SIZE;
-		len -= PAGE_SIZE;
 	}
 	return 0;
 }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [RFC/PATCH v2 15/22] s390/mm: GMAP read table extensions
  2017-12-13 12:53 [RFC/PATCH v2 00/22] KVM/s390: Hugetlbfs enablement Janosch Frank
                   ` (13 preceding siblings ...)
  2017-12-13 12:53 ` [RFC/PATCH v2 14/22] s390/mm: Make protect_rmap " Janosch Frank
@ 2017-12-13 12:53 ` Janosch Frank
  2017-12-13 12:53 ` [RFC/PATCH v2 16/22] s390/mm: Add shadow segment code Janosch Frank
                   ` (9 subsequent siblings)
  24 siblings, 0 replies; 67+ messages in thread
From: Janosch Frank @ 2017-12-13 12:53 UTC (permalink / raw)
  To: kvm; +Cc: schwidefsky, borntraeger, david, dominik.dingel, linux-s390

gmap_read_table has to tell us which guest DAT level it walked down to
in order to retrieve the information we requested. With this
information we can start shadowing without searching for the pmd or
pte a second time.

* This commit will most likely be merged into the read table EDAT1
* extension or into the shadowing patches.
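
A rough usage sketch (illustration only, simplified from the gaccess.c
hunks below):

int fc = 0;

rc = gmap_read_table(parent, ptr + vaddr.sx * 8, &ste.val, &fc);
if (rc)
	return rc;
if (fc) {
	/* the value was read through a 1M segment in the parent;
	 * later patches use this to shadow on the segment level
	 * without a second table walk */
}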

Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com>
---
 arch/s390/include/asm/gmap.h |  2 +-
 arch/s390/kvm/gaccess.c      | 15 +++++++++------
 arch/s390/mm/gmap.c          |  4 +++-
 3 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/arch/s390/include/asm/gmap.h b/arch/s390/include/asm/gmap.h
index 518e00c..24c2f86 100644
--- a/arch/s390/include/asm/gmap.h
+++ b/arch/s390/include/asm/gmap.h
@@ -117,7 +117,7 @@ void gmap_discard(struct gmap *, unsigned long from, unsigned long to);
 void __gmap_zap(struct gmap *, unsigned long gaddr);
 void gmap_unlink(struct mm_struct *, unsigned long *table, unsigned long vmaddr);
 
-int gmap_read_table(struct gmap *gmap, unsigned long gaddr, unsigned long *val);
+int gmap_read_table(struct gmap *gmap, unsigned long gaddr, unsigned long *val, int *fc);
 
 struct gmap *gmap_shadow(struct gmap *parent, unsigned long asce,
 			 int edat_level);
diff --git a/arch/s390/kvm/gaccess.c b/arch/s390/kvm/gaccess.c
index c24bfa7..ebea9ec 100644
--- a/arch/s390/kvm/gaccess.c
+++ b/arch/s390/kvm/gaccess.c
@@ -987,7 +987,7 @@ static int kvm_s390_shadow_tables(struct gmap *sg, unsigned long saddr,
 	union asce asce;
 	union vaddress vaddr;
 	unsigned long ptr;
-	int rc;
+	int rc, fc = 0;
 
 	*fake = 0;
 	*dat_protection = 0;
@@ -1034,7 +1034,8 @@ static int kvm_s390_shadow_tables(struct gmap *sg, unsigned long saddr,
 			rfte.val = ptr;
 			goto shadow_r2t;
 		}
-		rc = gmap_read_table(parent, ptr + vaddr.rfx * 8, &rfte.val);
+		rc = gmap_read_table(parent, ptr + vaddr.rfx * 8, &rfte.val,
+				     &fc);
 		if (rc)
 			return rc;
 		if (rfte.i)
@@ -1060,7 +1061,8 @@ static int kvm_s390_shadow_tables(struct gmap *sg, unsigned long saddr,
 			rste.val = ptr;
 			goto shadow_r3t;
 		}
-		rc = gmap_read_table(parent, ptr + vaddr.rsx * 8, &rste.val);
+		rc = gmap_read_table(parent, ptr + vaddr.rsx * 8, &rste.val,
+				     &fc);
 		if (rc)
 			return rc;
 		if (rste.i)
@@ -1087,7 +1089,8 @@ static int kvm_s390_shadow_tables(struct gmap *sg, unsigned long saddr,
 			rtte.val = ptr;
 			goto shadow_sgt;
 		}
-		rc = gmap_read_table(parent, ptr + vaddr.rtx * 8, &rtte.val);
+		rc = gmap_read_table(parent, ptr + vaddr.rtx * 8, &rtte.val,
+				     &fc);
 		if (rc)
 			return rc;
 		if (rtte.i)
@@ -1123,7 +1126,7 @@ static int kvm_s390_shadow_tables(struct gmap *sg, unsigned long saddr,
 			ste.val = ptr;
 			goto shadow_pgt;
 		}
-		rc = gmap_read_table(parent, ptr + vaddr.sx * 8, &ste.val);
+		rc = gmap_read_table(parent, ptr + vaddr.sx * 8, &ste.val, &fc);
 		if (rc)
 			return rc;
 		if (ste.i)
@@ -1192,7 +1195,7 @@ int kvm_s390_shadow_fault(struct kvm_vcpu *vcpu, struct gmap *sg,
 		goto shadow_page;
 	}
 	if (!rc)
-		rc = gmap_read_table(sg->parent, pgt + vaddr.px * 8, &pte.val);
+		rc = gmap_read_table(sg->parent, pgt + vaddr.px * 8, &pte.val, &fc);
 	if (!rc && pte.i)
 		rc = PGM_PAGE_TRANSLATION;
 	if (!rc && pte.z)
diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
index d45ac26da..8833d2a 100644
--- a/arch/s390/mm/gmap.c
+++ b/arch/s390/mm/gmap.c
@@ -1197,7 +1197,8 @@ EXPORT_SYMBOL_GPL(gmap_mprotect_notify);
  *
  * Called with gmap->mm->mmap_sem in read.
  */
-int gmap_read_table(struct gmap *gmap, unsigned long gaddr, unsigned long *val)
+int gmap_read_table(struct gmap *gmap, unsigned long gaddr, unsigned long *val,
+		    int *fc)
 {
 	unsigned long address, vmaddr;
 	spinlock_t *ptl;
@@ -1229,6 +1230,7 @@ int gmap_read_table(struct gmap *gmap, unsigned long gaddr, unsigned long *val)
 					address = pmd_val(pmd) & HPAGE_MASK;
 					address += gaddr & ~HPAGE_MASK;
 					*val = *(unsigned long *) address;
+					*fc = 1;
 					rc = 0;
 				}
 			}
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [RFC/PATCH v2 16/22] s390/mm: Add shadow segment code
  2017-12-13 12:53 [RFC/PATCH v2 00/22] KVM/s390: Hugetlbfs enablement Janosch Frank
                   ` (14 preceding siblings ...)
  2017-12-13 12:53 ` [RFC/PATCH v2 15/22] s390/mm: GMAP read table extensions Janosch Frank
@ 2017-12-13 12:53 ` Janosch Frank
  2017-12-13 12:53 ` [RFC/PATCH v2 17/22] s390/mm: Add VSIE reverse fake case Janosch Frank
                   ` (8 subsequent siblings)
  24 siblings, 0 replies; 67+ messages in thread
From: Janosch Frank @ 2017-12-13 12:53 UTC (permalink / raw)
  To: kvm; +Cc: schwidefsky, borntraeger, david, dominik.dingel, linux-s390

The VSIE code does not yet support shadowing hosts that are backed
with huge pages. Let's add large-to-large shadowing.
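
The core of the large-to-large case is condensed below (illustration
only, taken from gmap_shadow_segment() further down; locking, rmap
bookkeeping and fault handling omitted): the parent pmd gets the VSIE
notification bit, and the shadow pmd keeps the parent's 1M frame plus
the guest's protection.

*spmdp = __pmd(pmd_val(spmd) | _SEGMENT_ENTRY_GMAP_VSIE);
pmd_val(tpmd) = (pmd_val(spmd) & HPAGE_MASK) |		 /* parent's 1M frame  */
		_SEGMENT_ENTRY_LARGE |			 /* stays a huge entry */
		(pmd_val(pmd) & _SEGMENT_ENTRY_PROTECT); /* guest protection   */
*tpmdp = tpmd;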

Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com>
---
 arch/s390/include/asm/gmap.h |   6 +-
 arch/s390/kvm/gaccess.c      |  35 +++++-
 arch/s390/mm/gmap.c          | 292 +++++++++++++++++++++++++++++++++++++------
 3 files changed, 292 insertions(+), 41 deletions(-)

diff --git a/arch/s390/include/asm/gmap.h b/arch/s390/include/asm/gmap.h
index 24c2f86..5247549 100644
--- a/arch/s390/include/asm/gmap.h
+++ b/arch/s390/include/asm/gmap.h
@@ -130,9 +130,11 @@ int gmap_shadow_sgt(struct gmap *sg, unsigned long saddr, unsigned long sgt,
 		    int fake);
 int gmap_shadow_pgt(struct gmap *sg, unsigned long saddr, unsigned long pgt,
 		    int fake);
-int gmap_shadow_pgt_lookup(struct gmap *sg, unsigned long saddr,
-			   unsigned long *pgt, int *dat_protection, int *fake);
+int gmap_shadow_sgt_lookup(struct gmap *sg, unsigned long saddr,
+			   unsigned long *pgt, int *dat_protection,
+			   int *fake);
 int gmap_shadow_page(struct gmap *sg, unsigned long saddr, pte_t pte);
+int gmap_shadow_segment(struct gmap *sg, unsigned long saddr, pmd_t pmd);
 
 void gmap_register_pte_notifier(struct gmap_notifier *);
 void gmap_unregister_pte_notifier(struct gmap_notifier *);
diff --git a/arch/s390/kvm/gaccess.c b/arch/s390/kvm/gaccess.c
index ebea9ec..045d12e 100644
--- a/arch/s390/kvm/gaccess.c
+++ b/arch/s390/kvm/gaccess.c
@@ -981,7 +981,7 @@ int kvm_s390_check_low_addr_prot_real(struct kvm_vcpu *vcpu, unsigned long gra)
  */
 static int kvm_s390_shadow_tables(struct gmap *sg, unsigned long saddr,
 				  unsigned long *pgt, int *dat_protection,
-				  int *fake)
+				  int *fake, int *lvl)
 {
 	struct gmap *parent;
 	union asce asce;
@@ -1136,6 +1136,17 @@ static int kvm_s390_shadow_tables(struct gmap *sg, unsigned long saddr,
 		if (ste.cs && asce.p)
 			return PGM_TRANSLATION_SPEC;
 		*dat_protection |= ste.fc0.p;
+
+		/* Parent is mapped by huge pages. */
+		if (fc) {
+			/* Guest is also huge, easy case. */
+			if (ste.fc && sg->edat_level >= 1) {
+				*lvl = 1;
+				*pgt = ptr;
+				return 0;
+			}
+		}
+		/* Small to small and small to huge case */
 		if (ste.fc && sg->edat_level >= 1) {
 			*fake = 1;
 			ptr = ste.fc1.sfaa * _SEGMENT_SIZE;
@@ -1172,8 +1183,9 @@ int kvm_s390_shadow_fault(struct kvm_vcpu *vcpu, struct gmap *sg,
 {
 	union vaddress vaddr;
 	union page_table_entry pte;
+	union segment_table_entry ste;
 	unsigned long pgt;
-	int dat_protection, fake;
+	int dat_protection, fake, lvl, fc;
 	int rc;
 
 	down_read(&sg->mm->mmap_sem);
@@ -1184,12 +1196,26 @@ int kvm_s390_shadow_fault(struct kvm_vcpu *vcpu, struct gmap *sg,
 	 */
 	ipte_lock(vcpu);
 
-	rc = gmap_shadow_pgt_lookup(sg, saddr, &pgt, &dat_protection, &fake);
+	rc = gmap_shadow_sgt_lookup(sg, saddr, &pgt, &dat_protection, &fake);
 	if (rc)
 		rc = kvm_s390_shadow_tables(sg, saddr, &pgt, &dat_protection,
-					    &fake);
+					    &fake, &lvl);
 
 	vaddr.addr = saddr;
+
+	/* Shadow stopped at segment level, we map pmd to pmd */
+	if (lvl) {
+		if (!rc)
+			rc = gmap_read_table(sg->parent, pgt + vaddr.sx * 8,
+					     &ste.val, &fc);
+		if (!rc && ste.i)
+			rc = PGM_PAGE_TRANSLATION;
+		ste.fc1.p |= dat_protection;
+		if (!rc)
+			rc = gmap_shadow_segment(sg, saddr, __pmd(ste.val));
+		goto out;
+	}
+
 	if (fake) {
 		pte.val = pgt + vaddr.px * PAGE_SIZE;
 		goto shadow_page;
@@ -1204,6 +1230,7 @@ int kvm_s390_shadow_fault(struct kvm_vcpu *vcpu, struct gmap *sg,
 	pte.p |= dat_protection;
 	if (!rc)
 		rc = gmap_shadow_page(sg, saddr, __pte(pte.val));
+out:
 	ipte_unlock(vcpu);
 	up_read(&sg->mm->mmap_sem);
 	return rc;
diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
index 8833d2a..b3d01d9 100644
--- a/arch/s390/mm/gmap.c
+++ b/arch/s390/mm/gmap.c
@@ -1279,7 +1279,7 @@ static inline void gmap_insert_rmap(struct gmap *sg, unsigned long vmaddr,
 
 static int gmap_protect_rmap_pmd(struct gmap *sg, struct gmap_rmap *rmap,
 				 unsigned long paddr, unsigned long vmaddr,
-				 pmd_t *pmdp, int prot)
+				 pmd_t *pmdp, pmd_t *hpmdp, int prot)
 {
 	int rc = 0;
 
@@ -1288,8 +1288,8 @@ static int gmap_protect_rmap_pmd(struct gmap *sg, struct gmap_rmap *rmap,
 		return -EAGAIN;
 
 	spin_lock(&sg->guest_table_lock);
-	rc = gmap_protect_large(sg->parent, paddr, pmdp,
-				prot, GMAP_NOTIFY_SHADOW);
+	rc = gmap_protect_pmd(sg->parent, paddr, vmaddr, pmdp, hpmdp,
+			      prot, GMAP_NOTIFY_SHADOW);
 	if (!rc)
 		gmap_insert_rmap(sg, vmaddr & HPAGE_MASK, rmap);
 
@@ -1339,7 +1339,8 @@ static int gmap_protect_rmap(struct gmap *sg, unsigned long raddr,
 	struct gmap *parent;
 	struct gmap_rmap *rmap;
 	unsigned long vmaddr, dist;
-	pmd_t *pmdp;
+	spinlock_t *ptl;
+	pmd_t *pmdp, *hpmdp;
 	int rc;
 
 	BUG_ON(!gmap_is_shadow(sg));
@@ -1348,12 +1349,18 @@ static int gmap_protect_rmap(struct gmap *sg, unsigned long raddr,
 		vmaddr = __gmap_translate(parent, paddr);
 		if (IS_ERR_VALUE(vmaddr))
 			return vmaddr;
+		hpmdp = (pmd_t *)huge_pte_offset(parent->mm, vmaddr, HPAGE_SIZE);
+		/* Do we need tests here? */
+		ptl = pmd_lock(parent->mm, hpmdp);
 		rmap = kzalloc(sizeof(*rmap), GFP_KERNEL);
-		if (!rmap)
+		if (!rmap) {
+			spin_unlock(ptl);
 			return -ENOMEM;
+		}
 		rmap->raddr = raddr;
 		rc = radix_tree_preload(GFP_KERNEL);
 		if (rc) {
+			spin_unlock(ptl);
 			kfree(rmap);
 			return rc;
 		}
@@ -1369,7 +1376,8 @@ static int gmap_protect_rmap(struct gmap *sg, unsigned long raddr,
 				}
 			} else {
 				rc = gmap_protect_rmap_pmd(sg, rmap, paddr,
-							   vmaddr, pmdp, prot);
+							   vmaddr, pmdp,
+							   hpmdp, prot);
 				if (!rc) {
 					dist = HPAGE_SIZE - (paddr & ~HPAGE_MASK);
 					len = len < dist ? 0 : len - dist;
@@ -1378,6 +1386,7 @@ static int gmap_protect_rmap(struct gmap *sg, unsigned long raddr,
 			}
 			gmap_pmd_op_end(parent, pmdp);
 		}
+		spin_unlock(ptl);
 		radix_tree_preload_end();
 		if (rc) {
 			kfree(rmap);
@@ -1391,6 +1400,7 @@ static int gmap_protect_rmap(struct gmap *sg, unsigned long raddr,
 }
 
 #define _SHADOW_RMAP_MASK	0x7
+#define _SHADOW_RMAP_SEGMENT_LP	0x6
 #define _SHADOW_RMAP_REGION1	0x5
 #define _SHADOW_RMAP_REGION2	0x4
 #define _SHADOW_RMAP_REGION3	0x3
@@ -1498,13 +1508,16 @@ static void __gmap_unshadow_sgt(struct gmap *sg, unsigned long raddr,
 	for (i = 0; i < _CRST_ENTRIES; i++, raddr += _SEGMENT_SIZE) {
 		if (!(sgt[i] & _SEGMENT_ENTRY_ORIGIN))
 			continue;
-		pgt = (unsigned long *)(sgt[i] & _REGION_ENTRY_ORIGIN);
+
+		if (!(sgt[i] & _SEGMENT_ENTRY_LARGE)) {
+			pgt = (unsigned long *)(sgt[i] & _REGION_ENTRY_ORIGIN);
+			__gmap_unshadow_pgt(sg, raddr, pgt);
+			/* Free page table */
+			page = pfn_to_page(__pa(pgt) >> PAGE_SHIFT);
+			list_del(&page->lru);
+			page_table_free_pgste(page);
+		}
 		sgt[i] = _SEGMENT_ENTRY_EMPTY;
-		__gmap_unshadow_pgt(sg, raddr, pgt);
-		/* Free page table */
-		page = pfn_to_page(__pa(pgt) >> PAGE_SHIFT);
-		list_del(&page->lru);
-		page_table_free_pgste(page);
 	}
 }
 
@@ -2119,32 +2132,62 @@ EXPORT_SYMBOL_GPL(gmap_shadow_sgt);
  *
  * Called with sg->mm->mmap_sem in read.
  */
-int gmap_shadow_pgt_lookup(struct gmap *sg, unsigned long saddr,
+void gmap_shadow_pgt_lookup(struct gmap *sg, unsigned long *sge,
+			    unsigned long saddr, unsigned long *pgt,
+			    int *dat_protection, int *fake)
+{
+	struct page *page;
+
+	/* Shadow page tables are full pages (pte+pgste) */
+	page = pfn_to_page(*sge >> PAGE_SHIFT);
+	*pgt = page->index & ~GMAP_SHADOW_FAKE_TABLE;
+	*dat_protection = !!(*sge & _SEGMENT_ENTRY_PROTECT);
+	*fake = !!(page->index & GMAP_SHADOW_FAKE_TABLE);
+}
+EXPORT_SYMBOL_GPL(gmap_shadow_pgt_lookup);
+
+int gmap_shadow_sgt_lookup(struct gmap *sg, unsigned long saddr,
 			   unsigned long *pgt, int *dat_protection,
 			   int *fake)
 {
-	unsigned long *table;
+	unsigned long *sge, *r3e = NULL;
 	struct page *page;
-	int rc;
+	int rc = -EAGAIN;
 
 	BUG_ON(!gmap_is_shadow(sg));
 	spin_lock(&sg->guest_table_lock);
-	table = gmap_table_walk(sg, saddr, 1); /* get segment pointer */
-	if (table && !(*table & _SEGMENT_ENTRY_INVALID)) {
-		/* Shadow page tables are full pages (pte+pgste) */
-		page = pfn_to_page(*table >> PAGE_SHIFT);
-		*pgt = page->index & ~GMAP_SHADOW_FAKE_TABLE;
-		*dat_protection = !!(*table & _SEGMENT_ENTRY_PROTECT);
-		*fake = !!(page->index & GMAP_SHADOW_FAKE_TABLE);
-		rc = 0;
-	} else  {
-		rc = -EAGAIN;
+	if (sg->asce & _ASCE_TYPE_MASK) {
+		/* >2 GB guest */
+		r3e = (unsigned long *) gmap_table_walk(sg, saddr, 2);
+		if (!r3e || (*r3e & _REGION_ENTRY_INVALID))
+			goto out;
+		sge = (unsigned long *)(*r3e & _REGION_ENTRY_ORIGIN) + ((saddr & _SEGMENT_INDEX) >> _SEGMENT_SHIFT);
+	} else {
+		sge = (unsigned long *)(sg->asce & PAGE_MASK) + ((saddr & _SEGMENT_INDEX) >> _SEGMENT_SHIFT);
 	}
+	if (*sge & _SEGMENT_ENTRY_INVALID)
+		goto out;
+	rc = 0;
+	if (*sge & _SEGMENT_ENTRY_LARGE) {
+		if (r3e) {
+			page = pfn_to_page(*r3e >> PAGE_SHIFT);
+			*pgt = page->index & ~GMAP_SHADOW_FAKE_TABLE;
+			*dat_protection = !!(*r3e & _SEGMENT_ENTRY_PROTECT);
+			*fake = !!(page->index & GMAP_SHADOW_FAKE_TABLE);
+		} else {
+			*pgt = sg->orig_asce & PAGE_MASK;
+			*dat_protection = 0;
+			*fake = 0;
+		}
+	} else {
+		gmap_shadow_pgt_lookup(sg, sge, saddr, pgt,
+				       dat_protection, fake);
+	}
+out:
 	spin_unlock(&sg->guest_table_lock);
 	return rc;
-
 }
-EXPORT_SYMBOL_GPL(gmap_shadow_pgt_lookup);
+EXPORT_SYMBOL_GPL(gmap_shadow_sgt_lookup);
 
 /**
  * gmap_shadow_pgt - instantiate a shadow page table
@@ -2226,6 +2269,89 @@ int gmap_shadow_pgt(struct gmap *sg, unsigned long saddr, unsigned long pgt,
 }
 EXPORT_SYMBOL_GPL(gmap_shadow_pgt);
 
+int gmap_shadow_segment(struct gmap *sg, unsigned long saddr, pmd_t pmd)
+{
+	struct gmap *parent;
+	struct gmap_rmap *rmap;
+	unsigned long vmaddr, paddr;
+	pmd_t spmd, tpmd, *spmdp = NULL, *tpmdp;
+	int prot;
+	int rc;
+
+	BUG_ON(!gmap_is_shadow(sg));
+	parent = sg->parent;
+
+	prot = (pmd_val(pmd) & _SEGMENT_ENTRY_PROTECT) ? PROT_READ : PROT_WRITE;
+	rmap = kzalloc(sizeof(*rmap), GFP_KERNEL);
+	if (!rmap)
+		return -ENOMEM;
+	rmap->raddr = (saddr & HPAGE_MASK) | _SHADOW_RMAP_SEGMENT_LP;
+
+	while (1) {
+		paddr = pmd_val(pmd) & HPAGE_MASK;
+		vmaddr = __gmap_translate(parent, paddr);
+		if (IS_ERR_VALUE(vmaddr)) {
+			rc = vmaddr;
+			break;
+		}
+		rc = radix_tree_preload(GFP_KERNEL);
+		if (rc)
+			break;
+		rc = -EAGAIN;
+
+		/* Let's look up the parent's mapping */
+		spmdp = gmap_pmd_op_walk(parent, paddr);
+		if (spmdp) {
+			spin_lock(&sg->guest_table_lock);
+			/* Get shadow segment table pointer */
+			tpmdp = (pmd_t *) gmap_table_walk(sg, saddr, 1);
+			if (!tpmdp) {
+				spin_unlock(&sg->guest_table_lock);
+				gmap_pmd_op_end(parent, spmdp);
+				radix_tree_preload_end();
+				break;
+			}
+			/* Shadowing magic happens here. */
+			if (!(pmd_val(*tpmdp) & _SEGMENT_ENTRY_INVALID)) {
+				rc = 0;	/* already shadowed */
+				spin_unlock(&sg->guest_table_lock);
+				gmap_pmd_op_end(parent, spmdp);
+				radix_tree_preload_end();
+				break;
+			}
+			spmd = *spmdp;
+			if (!(pmd_val(spmd) & _SEGMENT_ENTRY_INVALID) &&
+			    !((pmd_val(spmd) & _SEGMENT_ENTRY_PROTECT) &&
+			      !(pmd_val(pmd) & _SEGMENT_ENTRY_PROTECT))) {
+
+				*spmdp = __pmd(pmd_val(spmd)
+					       | _SEGMENT_ENTRY_GMAP_VSIE);
+
+				/* Insert shadow ste */
+				pmd_val(tpmd) = ((pmd_val(spmd) & HPAGE_MASK) |
+						 _SEGMENT_ENTRY_LARGE |
+						 (pmd_val(pmd) & _SEGMENT_ENTRY_PROTECT));
+				*tpmdp = tpmd;
+
+				gmap_insert_rmap(sg, vmaddr, rmap);
+				rc = 0;
+			}
+			spin_unlock(&sg->guest_table_lock);
+			gmap_pmd_op_end(parent, spmdp);
+		}
+		radix_tree_preload_end();
+		if (!rc)
+			break;
+		rc = gmap_pte_op_fixup(parent, paddr, vmaddr, prot);
+		if (rc)
+			break;
+	}
+	if (rc)
+		kfree(rmap);
+	return rc;
+}
+EXPORT_SYMBOL_GPL(gmap_shadow_segment);
+
 /**
  * gmap_shadow_page - create a shadow page mapping
  * @sg: pointer to the shadow guest address space structure
@@ -2302,6 +2428,78 @@ int gmap_shadow_page(struct gmap *sg, unsigned long saddr, pte_t pte)
 EXPORT_SYMBOL_GPL(gmap_shadow_page);
 
 /**
+ * gmap_unshadow_segment - remove a huge segment from a shadow segment table
+ * @sg: pointer to the shadow guest address space structure
+ * @raddr: rmap address in the shadow guest address space
+ *
+ * Called with the sg->guest_table_lock
+ */
+static void gmap_unshadow_segment(struct gmap *sg, unsigned long raddr)
+{
+	unsigned long *table;
+
+	BUG_ON(!gmap_is_shadow(sg));
+	/* We already have the lock */
+	table = gmap_table_walk(sg, raddr, 1); /* get segment table pointer */
+	if (!table || *table & _SEGMENT_ENTRY_INVALID ||
+	    !(*table & _SEGMENT_ENTRY_LARGE))
+		return;
+	gmap_call_notifier(sg, raddr, raddr + (1UL << 20) - 1);
+	gmap_pmdp_xchg(sg, (pmd_t *)table, __pmd(_SEGMENT_ENTRY_EMPTY), raddr);
+}
+
+static void gmap_shadow_notify_pmd(struct gmap *sg, unsigned long vmaddr,
+				   unsigned long gaddr)
+{
+	struct gmap_rmap *rmap, *rnext, *head;
+	unsigned long start, end, bits, raddr;
+
+
+	BUG_ON(!gmap_is_shadow(sg));
+
+	spin_lock(&sg->guest_table_lock);
+	if (sg->removed) {
+		spin_unlock(&sg->guest_table_lock);
+		return;
+	}
+	/* Check for top level table */
+	start = sg->orig_asce & _ASCE_ORIGIN;
+	end = start + ((sg->orig_asce & _ASCE_TABLE_LENGTH) + 1) * 4096;
+	if (!(sg->orig_asce & _ASCE_REAL_SPACE) && gaddr >= start &&
+	    gaddr < ((end & HPAGE_MASK) + HPAGE_SIZE - 1)) {
+		/* The complete shadow table has to go */
+		gmap_unshadow(sg);
+		spin_unlock(&sg->guest_table_lock);
+		list_del(&sg->list);
+		gmap_put(sg);
+		return;
+	}
+	/* Remove the page table tree for one specific entry */
+	head = radix_tree_delete(&sg->host_to_rmap, (vmaddr & HPAGE_MASK) >> PAGE_SHIFT);
+	gmap_for_each_rmap_safe(rmap, rnext, head) {
+		bits = rmap->raddr & _SHADOW_RMAP_MASK;
+		raddr = rmap->raddr ^ bits;
+		switch (bits) {
+		case _SHADOW_RMAP_REGION1:
+			gmap_unshadow_r2t(sg, raddr);
+			break;
+		case _SHADOW_RMAP_REGION2:
+			gmap_unshadow_r3t(sg, raddr);
+			break;
+		case _SHADOW_RMAP_REGION3:
+			gmap_unshadow_sgt(sg, raddr);
+			break;
+		case _SHADOW_RMAP_SEGMENT_LP:
+			gmap_unshadow_segment(sg, raddr);
+			break;
+		}
+		kfree(rmap);
+	}
+	spin_unlock(&sg->guest_table_lock);
+}
+
+
+/**
  * gmap_shadow_notify - handle notifications for shadow gmap
  *
  * Called with sg->parent->shadow_lock.
@@ -2413,13 +2611,28 @@ EXPORT_SYMBOL_GPL(ptep_notify);
 static void pmdp_notify_gmap(struct gmap *gmap, unsigned long gaddr)
 {
 	unsigned long *table;
+	unsigned long vmaddr, bits;
+	struct gmap *sg, *next;
 
 	gaddr &= HPAGE_MASK;
 	table = gmap_table_walk(gmap, gaddr, 1);
-	if (!table || !(*table & _SEGMENT_ENTRY_GMAP_IN))
+	if (!table)
 		return;
-	*table &= ~_SEGMENT_ENTRY_GMAP_IN;
-	gmap_call_notifier(gmap, gaddr, gaddr + HPAGE_SIZE - 1);
+	bits = *table & (_SEGMENT_ENTRY_GMAP_IN | _SEGMENT_ENTRY_GMAP_VSIE);
+	if (!bits)
+		return;
+	*table ^= bits;
+	vmaddr = __gmap_translate(gmap, gaddr);
+	if (!list_empty(&gmap->children) && (bits & _SEGMENT_ENTRY_GMAP_VSIE)
+	    && (*table & _SEGMENT_ENTRY_PROTECT)) {
+		spin_lock(&gmap->shadow_lock);
+		list_for_each_entry_safe(sg, next,
+					 &gmap->children, list)
+			gmap_shadow_notify_pmd(sg, vmaddr, gaddr);
+		spin_unlock(&gmap->shadow_lock);
+	}
+	if (bits & _SEGMENT_ENTRY_GMAP_IN)
+		gmap_call_notifier(gmap, gaddr, gaddr + HPAGE_SIZE - 1);
 }
 
 /**
@@ -2431,22 +2644,31 @@ static void pmdp_notify_gmap(struct gmap *gmap, unsigned long gaddr)
  */
 void pmdp_notify(struct mm_struct *mm, unsigned long vmaddr)
 {
-	unsigned long *table, gaddr;
-	struct gmap *gmap;
+	unsigned long *table, gaddr, bits;
+	struct gmap *gmap, *sg, *next;
 
 	rcu_read_lock();
 	list_for_each_entry_rcu(gmap, &mm->context.gmap_list, list) {
 		spin_lock(&gmap->guest_table_lock);
 		table = radix_tree_lookup(&gmap->host_to_guest,
 					  vmaddr >> PMD_SHIFT);
-		if (!table || !(*table & _SEGMENT_ENTRY_GMAP_IN)) {
+		if (!table) {
 			spin_unlock(&gmap->guest_table_lock);
 			continue;
 		}
+		bits = *table & (_SEGMENT_ENTRY_GMAP_IN | _SEGMENT_ENTRY_GMAP_VSIE);
+		*table ^= bits;
 		gaddr = __gmap_segment_gaddr(table);
-		*table &= ~_SEGMENT_ENTRY_GMAP_IN;
 		spin_unlock(&gmap->guest_table_lock);
-		gmap_call_notifier(gmap, gaddr, gaddr + HPAGE_SIZE - 1);
+		if (!list_empty(&gmap->children) && (bits & _SEGMENT_ENTRY_GMAP_VSIE)) {
+			spin_lock(&gmap->shadow_lock);
+			list_for_each_entry_safe(sg, next,
+						 &gmap->children, list)
+				gmap_shadow_notify_pmd(sg, vmaddr, gaddr);
+			spin_unlock(&gmap->shadow_lock);
+		}
+		if (bits & _SEGMENT_ENTRY_GMAP_IN)
+			gmap_call_notifier(gmap, gaddr, gaddr + HPAGE_SIZE - 1);
 	}
 	rcu_read_unlock();
 }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [RFC/PATCH v2 17/22] s390/mm: Add VSIE reverse fake case
  2017-12-13 12:53 [RFC/PATCH v2 00/22] KVM/s390: Hugetlbfs enablement Janosch Frank
                   ` (15 preceding siblings ...)
  2017-12-13 12:53 ` [RFC/PATCH v2 16/22] s390/mm: Add shadow segment code Janosch Frank
@ 2017-12-13 12:53 ` Janosch Frank
  2017-12-13 12:53 ` [RFC/PATCH v2 18/22] s390/mm: Remove gmap_pte_op_walk Janosch Frank
                   ` (7 subsequent siblings)
  24 siblings, 0 replies; 67+ messages in thread
From: Janosch Frank @ 2017-12-13 12:53 UTC (permalink / raw)
  To: kvm; +Cc: schwidefsky, borntraeger, david, dominik.dingel, linux-s390

The fake VSIE case lets us run huge VSIE guests on small hosts by
creating fake page tables. When running a small guest on a huge host,
we need to create fake tables once again, this time in the reverse
direction.

The fake tables are needed to make sure that the VSIE guest can only
access the memory that its host actually mapped for it.
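
The interesting bit is how a 4k shadow pte is derived when the parent
maps the area with a 1M segment (illustration only, taken from the
gmap_shadow_page() hunk below):

*tptep = __pte(((pmd_val(*spmdp) & _SEGMENT_ENTRY_ORIGIN_LARGE) +
		(pte_index(paddr) << 12)) |	/* 4k frame inside the 1M segment */
	       (pte_val(pte) & _PAGE_PROTECT));	/* keep the guest's protection bit */
pmd_val(*spmdp) |= _SEGMENT_ENTRY_GMAP_VSIE;	/* notify when the parent pmd changes */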

Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com>
---
 arch/s390/include/asm/gmap.h |  2 +-
 arch/s390/kvm/gaccess.c      | 20 +++++++++++++++----
 arch/s390/mm/gmap.c          | 46 ++++++++++++++++++++++++++++++++++----------
 3 files changed, 53 insertions(+), 15 deletions(-)

diff --git a/arch/s390/include/asm/gmap.h b/arch/s390/include/asm/gmap.h
index 5247549..d0a47d1 100644
--- a/arch/s390/include/asm/gmap.h
+++ b/arch/s390/include/asm/gmap.h
@@ -132,7 +132,7 @@ int gmap_shadow_pgt(struct gmap *sg, unsigned long saddr, unsigned long pgt,
 		    int fake);
 int gmap_shadow_sgt_lookup(struct gmap *sg, unsigned long saddr,
 			   unsigned long *pgt, int *dat_protection,
-			   int *fake);
+			   int *fake, int *lvl);
 int gmap_shadow_page(struct gmap *sg, unsigned long saddr, pte_t pte);
 int gmap_shadow_segment(struct gmap *sg, unsigned long saddr, pmd_t pmd);
 
diff --git a/arch/s390/kvm/gaccess.c b/arch/s390/kvm/gaccess.c
index 045d12e..de40d17 100644
--- a/arch/s390/kvm/gaccess.c
+++ b/arch/s390/kvm/gaccess.c
@@ -1144,10 +1144,22 @@ static int kvm_s390_shadow_tables(struct gmap *sg, unsigned long saddr,
 				*lvl = 1;
 				*pgt = ptr;
 				return 0;
+			} else {
+				/*
+				 * Reverse fake case.
+				 * We map a huge parent to a small guest, i.e.
+				 * we need fake shadow pagetables.
+				 *
+				 * We need pagetables here, because
+				 * guests not aligned on 1M could
+				 * read/write from/to the parent or
+				 * host.
+				 */
+				*lvl = 0;
 			}
 		}
 		/* Small to small and small to huge case */
-		if (ste.fc && sg->edat_level >= 1) {
+		if (!fc && ste.fc && sg->edat_level >= 1) {
 			*fake = 1;
 			ptr = ste.fc1.sfaa * _SEGMENT_SIZE;
 			ste.val = ptr;
@@ -1185,7 +1197,7 @@ int kvm_s390_shadow_fault(struct kvm_vcpu *vcpu, struct gmap *sg,
 	union page_table_entry pte;
 	union segment_table_entry ste;
 	unsigned long pgt;
-	int dat_protection, fake, lvl, fc;
+	int dat_protection, fake, lvl = 0, fc;
 	int rc;
 
 	down_read(&sg->mm->mmap_sem);
@@ -1196,7 +1208,7 @@ int kvm_s390_shadow_fault(struct kvm_vcpu *vcpu, struct gmap *sg,
 	 */
 	ipte_lock(vcpu);
 
-	rc = gmap_shadow_sgt_lookup(sg, saddr, &pgt, &dat_protection, &fake);
+	rc = gmap_shadow_sgt_lookup(sg, saddr, &pgt, &dat_protection, &fake, &lvl);
 	if (rc)
 		rc = kvm_s390_shadow_tables(sg, saddr, &pgt, &dat_protection,
 					    &fake, &lvl);
@@ -1204,7 +1216,7 @@ int kvm_s390_shadow_fault(struct kvm_vcpu *vcpu, struct gmap *sg,
 	vaddr.addr = saddr;
 
 	/* Shadow stopped at segment level, we map pmd to pmd */
-	if (lvl) {
+	if (!rc && lvl) {
 		if (!rc)
 			rc = gmap_read_table(sg->parent, pgt + vaddr.sx * 8,
 					     &ste.val, &fc);
diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
index b3d01d9..8bcaa53 100644
--- a/arch/s390/mm/gmap.c
+++ b/arch/s390/mm/gmap.c
@@ -1506,7 +1506,7 @@ static void __gmap_unshadow_sgt(struct gmap *sg, unsigned long raddr,
 
 	BUG_ON(!gmap_is_shadow(sg));
 	for (i = 0; i < _CRST_ENTRIES; i++, raddr += _SEGMENT_SIZE) {
-		if (!(sgt[i] & _SEGMENT_ENTRY_ORIGIN))
+		if (sgt[i] ==  _SEGMENT_ENTRY_EMPTY)
 			continue;
 
 		if (!(sgt[i] & _SEGMENT_ENTRY_LARGE)) {
@@ -2148,7 +2148,7 @@ EXPORT_SYMBOL_GPL(gmap_shadow_pgt_lookup);
 
 int gmap_shadow_sgt_lookup(struct gmap *sg, unsigned long saddr,
 			   unsigned long *pgt, int *dat_protection,
-			   int *fake)
+			   int *fake, int *lvl)
 {
 	unsigned long *sge, *r3e = NULL;
 	struct page *page;
@@ -2179,9 +2179,11 @@ int gmap_shadow_sgt_lookup(struct gmap *sg, unsigned long saddr,
 			*dat_protection = 0;
 			*fake = 0;
 		}
+		*lvl = 1;
 	} else {
 		gmap_shadow_pgt_lookup(sg, sge, saddr, pgt,
 				       dat_protection, fake);
+		*lvl = 0;
 	}
 out:
 	spin_unlock(&sg->guest_table_lock);
@@ -2370,6 +2372,7 @@ int gmap_shadow_page(struct gmap *sg, unsigned long saddr, pte_t pte)
 	struct gmap_rmap *rmap;
 	unsigned long vmaddr, paddr;
 	spinlock_t *ptl;
+	pmd_t *spmdp;
 	pte_t *sptep, *tptep;
 	int prot;
 	int rc;
@@ -2394,26 +2397,43 @@ int gmap_shadow_page(struct gmap *sg, unsigned long saddr, pte_t pte)
 		if (rc)
 			break;
 		rc = -EAGAIN;
-		sptep = gmap_pte_op_walk(parent, paddr, &ptl);
-		if (sptep) {
+		spmdp = gmap_pmd_op_walk(parent, paddr);
+		if (spmdp && !(pmd_val(*spmdp) & _SEGMENT_ENTRY_INVALID)) {
 			spin_lock(&sg->guest_table_lock);
 			/* Get page table pointer */
 			tptep = (pte_t *) gmap_table_walk(sg, saddr, 0);
 			if (!tptep) {
 				spin_unlock(&sg->guest_table_lock);
-				gmap_pte_op_end(ptl);
 				radix_tree_preload_end();
+				gmap_pmd_op_end(parent, spmdp);
 				break;
 			}
-			rc = ptep_shadow_pte(sg->mm, saddr, sptep, tptep, pte);
-			if (rc > 0) {
-				/* Success and a new mapping */
-				gmap_insert_rmap(sg, vmaddr, rmap);
+
+			if (pmd_large(*spmdp)) {
+				/* TODO: Bits and pgstes */
+				*tptep = __pte(((pmd_val(*spmdp) &
+						_SEGMENT_ENTRY_ORIGIN_LARGE)
+					       + (pte_index(paddr) << 12))
+					       | (pte_val(pte) & _PAGE_PROTECT));
+				pmd_val(*spmdp) |= _SEGMENT_ENTRY_GMAP_VSIE;
+				gmap_insert_rmap(sg, vmaddr & HPAGE_MASK, rmap);
 				rmap = NULL;
 				rc = 0;
+			} else {
+				sptep = pte_alloc_map_lock(parent->mm, spmdp, paddr, &ptl);
+				if (sptep) {
+					rc = ptep_shadow_pte(sg->mm, saddr, sptep, tptep, pte);
+					if (rc > 0) {
+						/* Success and a new mapping */
+						gmap_insert_rmap(sg, vmaddr, rmap);
+						rmap = NULL;
+						rc = 0;
+					}
+					gmap_pte_op_end(ptl);
+				}
 			}
-			gmap_pte_op_end(ptl);
 			spin_unlock(&sg->guest_table_lock);
+			gmap_pmd_op_end(parent, spmdp);
 		}
 		radix_tree_preload_end();
 		if (!rc)
@@ -2492,6 +2512,12 @@ static void gmap_shadow_notify_pmd(struct gmap *sg, unsigned long vmaddr,
 		case _SHADOW_RMAP_SEGMENT_LP:
 			gmap_unshadow_segment(sg, raddr);
 			break;
+		case _SHADOW_RMAP_SEGMENT:
+			gmap_unshadow_pgt(sg, raddr);
+			break;
+		case _SHADOW_RMAP_PGTABLE:
+			gmap_unshadow_page(sg, raddr);
+			break;
 		}
 		kfree(rmap);
 	}
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [RFC/PATCH v2 18/22] s390/mm: Remove gmap_pte_op_walk
  2017-12-13 12:53 [RFC/PATCH v2 00/22] KVM/s390: Hugetlbfs enablement Janosch Frank
                   ` (16 preceding siblings ...)
  2017-12-13 12:53 ` [RFC/PATCH v2 17/22] s390/mm: Add VSIE reverse fake case Janosch Frank
@ 2017-12-13 12:53 ` Janosch Frank
  2017-12-13 12:53 ` [RFC/PATCH v2 19/22] s390/mm: Split huge pages if granular protection is needed Janosch Frank
                   ` (6 subsequent siblings)
  24 siblings, 0 replies; 67+ messages in thread
From: Janosch Frank @ 2017-12-13 12:53 UTC (permalink / raw)
  To: kvm; +Cc: schwidefsky, borntraeger, david, dominik.dingel, linux-s390

After the large page support was added, there are no more users of
this function. Let's get rid of it.

Might be squashed later on into the previous commit.

Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com>
---
 arch/s390/mm/gmap.c | 32 --------------------------------
 1 file changed, 32 deletions(-)

diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
index 8bcaa53..74f6f06 100644
--- a/arch/s390/mm/gmap.c
+++ b/arch/s390/mm/gmap.c
@@ -828,38 +828,6 @@ static inline unsigned long *gmap_table_walk(struct gmap *gmap,
 }
 
 /**
- * gmap_pte_op_walk - walk the gmap page table, get the page table lock
- *		      and return the pte pointer
- * @gmap: pointer to guest mapping meta data structure
- * @gaddr: virtual address in the guest address space
- * @ptl: pointer to the spinlock pointer
- *
- * Returns a pointer to the locked pte for a guest address, or NULL
- *
- * Note: Can also be called for shadow gmaps.
- */
-static pte_t *gmap_pte_op_walk(struct gmap *gmap, unsigned long gaddr,
-			       spinlock_t **ptl)
-{
-	unsigned long *table;
-
-	if (gmap_is_shadow(gmap))
-		spin_lock(&gmap->guest_table_lock);
-	/* Walk the gmap page table, lock and get pte pointer */
-	table = gmap_table_walk(gmap, gaddr, 1); /* get segment pointer */
-	if (!table || *table & _SEGMENT_ENTRY_INVALID) {
-		if (gmap_is_shadow(gmap))
-			spin_unlock(&gmap->guest_table_lock);
-		return NULL;
-	}
-	if (gmap_is_shadow(gmap)) {
-		*ptl = &gmap->guest_table_lock;
-		return pte_offset_map((pmd_t *) table, gaddr);
-	}
-	return pte_alloc_map_lock(gmap->mm, (pmd_t *) table, gaddr, ptl);
-}
-
-/**
  * gmap_pte_op_fixup - force a page in and connect the gmap page table
  * @gmap: pointer to guest mapping meta data structure
  * @gaddr: virtual address in the guest address space
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [RFC/PATCH v2 19/22] s390/mm: Split huge pages if granular protection is needed
  2017-12-13 12:53 [RFC/PATCH v2 00/22] KVM/s390: Hugetlbfs enablement Janosch Frank
                   ` (17 preceding siblings ...)
  2017-12-13 12:53 ` [RFC/PATCH v2 18/22] s390/mm: Remove gmap_pte_op_walk Janosch Frank
@ 2017-12-13 12:53 ` Janosch Frank
  2018-01-25  7:16   ` Janosch Frank
  2017-12-13 12:53 ` [RFC/PATCH v2 20/22] s390/mm: Enable gmap huge pmd support Janosch Frank
                   ` (5 subsequent siblings)
  24 siblings, 1 reply; 67+ messages in thread
From: Janosch Frank @ 2017-12-13 12:53 UTC (permalink / raw)
  To: kvm; +Cc: schwidefsky, borntraeger, david, dominik.dingel, linux-s390

A guest can put DAT tables for a lower-level guest in the same huge
segment as one of its prefixes or a g3 page. This would make it
necessary for the segment to be unprotected (because of the prefix)
and protected (because of the shadowing) at the same time. This is not
possible in this universe.

Hence we split the affected huge segment, so that we can protect it on
a per-page basis. Such gmap segments are special and get a new
software bit that helps us handle this edge case.
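
The split replaces the huge gmap pmd with a freshly allocated pgste
page table that maps the same 1M segment with 256 individual 4k
entries (illustration only, condensed from gmap_pmd_split() below):

for (i = 0; i < 256; i++) {
	/* pte i points to the i-th 4k frame of the 1M segment */
	table[i] = (pmd_val(*pmdp) & HPAGE_MASK) + i * PAGE_SIZE;
	table[i] |= _PAGE_PRESENT | _PAGE_READ | _PAGE_WRITE;
	/* second half of the page holds the pgstes, start out dirty */
	table[i + PTRS_PER_PTE] |= PGSTE_UC_BIT;
}
pmd_val(new) = (unsigned long) table | _SEGMENT_ENTRY |
	       _SEGMENT_ENTRY_GMAP_SPLIT;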

Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com>
---
 arch/s390/include/asm/gmap.h    |  13 ++
 arch/s390/include/asm/pgtable.h |   7 +-
 arch/s390/mm/fault.c            |  10 +-
 arch/s390/mm/gmap.c             | 256 ++++++++++++++++++++++++++++++++++++----
 arch/s390/mm/pgtable.c          |  51 ++++++++
 5 files changed, 313 insertions(+), 24 deletions(-)

diff --git a/arch/s390/include/asm/gmap.h b/arch/s390/include/asm/gmap.h
index d0a47d1..a187033 100644
--- a/arch/s390/include/asm/gmap.h
+++ b/arch/s390/include/asm/gmap.h
@@ -15,6 +15,7 @@
 
 /* Status bits in huge and non-huge gmap segment entries. */
 #define _SEGMENT_ENTRY_GMAP_IN		0x0001	/* invalidation notify bit */
+#define _SEGMENT_ENTRY_GMAP_SPLIT	0x0002  /* split huge pmd */
 /* Status bits only for huge segment entries */
 #define _SEGMENT_ENTRY_GMAP_UC		0x4000	/* user dirty (migration) */
 #define _SEGMENT_ENTRY_GMAP_VSIE	0x8000	/* vsie bit */
@@ -58,6 +59,7 @@ struct gmap {
 	struct radix_tree_root host_to_rmap;
 	struct list_head children;
 	struct list_head pt_list;
+	struct list_head split_list;
 	spinlock_t shadow_lock;
 	struct gmap *parent;
 	unsigned long orig_asce;
@@ -98,6 +100,17 @@ static inline int gmap_is_shadow(struct gmap *gmap)
 	return !!gmap->parent;
 }
 
+/**
+ * gmap_pmd_is_split - Returns if a huge gmap pmd has been split.
+ * @pmdp: pointer to the pmd
+ *
+ * Returns true if the passed huge gmap pmd has been split.
+ */
+static inline bool gmap_pmd_is_split(pmd_t *pmdp)
+{
+	return !!(pmd_val(*pmdp) & _SEGMENT_ENTRY_GMAP_SPLIT);
+}
+
 struct gmap *gmap_create(struct mm_struct *mm, unsigned long limit);
 void gmap_remove(struct gmap *gmap);
 struct gmap *gmap_get(struct gmap *gmap);
diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index 647c300..e68691a 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -1095,6 +1095,8 @@ void ptep_set_pte_at(struct mm_struct *mm, unsigned long addr,
 void ptep_set_notify(struct mm_struct *mm, unsigned long addr, pte_t *ptep);
 void ptep_notify(struct mm_struct *mm, unsigned long addr,
 		 pte_t *ptep, unsigned long bits);
+void ptep_notify_gmap(struct mm_struct *mm, unsigned long vmaddr,
+		      pte_t *pte, unsigned long bits);
 void pmdp_notify(struct mm_struct *mm, unsigned long addr);
 int ptep_force_prot(struct mm_struct *mm, unsigned long gaddr,
 		    pte_t *ptep, int prot, unsigned long bit);
@@ -1104,8 +1106,11 @@ void ptep_zap_key(struct mm_struct *mm, unsigned long addr, pte_t *ptep);
 int ptep_shadow_pte(struct mm_struct *mm, unsigned long saddr,
 		    pte_t *sptep, pte_t *tptep, pte_t pte);
 void ptep_unshadow_pte(struct mm_struct *mm, unsigned long saddr, pte_t *ptep);
-
+void ptep_remove_dirty_protection_split(struct mm_struct *mm, pte_t *ptep,
+					unsigned long vmaddr);
 bool test_and_clear_guest_dirty(struct mm_struct *mm, unsigned long address);
+bool test_and_clear_guest_dirty_split(struct mm_struct *mm, pmd_t *pmdp,
+				      unsigned long vmaddr);
 int set_guest_storage_key(struct mm_struct *mm, unsigned long addr,
 			  unsigned char key, bool nq);
 int cond_set_guest_storage_key(struct mm_struct *mm, unsigned long addr,
diff --git a/arch/s390/mm/fault.c b/arch/s390/mm/fault.c
index 93faeca..ba92860 100644
--- a/arch/s390/mm/fault.c
+++ b/arch/s390/mm/fault.c
@@ -418,7 +418,7 @@ static inline int do_exception(struct pt_regs *regs, int access)
 	struct vm_area_struct *vma;
 	enum fault_type type;
 	unsigned long trans_exc_code;
-	unsigned long address;
+	unsigned long address, gaddress = 0;
 	unsigned int flags;
 	int fault;
 
@@ -475,6 +475,12 @@ static inline int do_exception(struct pt_regs *regs, int access)
 			fault = VM_FAULT_BADMAP;
 			goto out_up;
 		}
+		/*
+		 * The GMAP code needs the full fault address even
+		 * when using large pages. Hence we take an unmasked
+		 * copy to feed to __gmap_link.
+		 */
+		gaddress = address;
 		if (gmap->pfault_enabled)
 			flags |= FAULT_FLAG_RETRY_NOWAIT;
 	}
@@ -551,7 +557,7 @@ static inline int do_exception(struct pt_regs *regs, int access)
 	}
 	if (IS_ENABLED(CONFIG_PGSTE) && gmap) {
 		address =  __gmap_link(gmap, current->thread.gmap_addr,
-				       address);
+				       gaddress);
 		if (address == -EFAULT) {
 			fault = VM_FAULT_BADMAP;
 			goto out_up;
diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
index 74f6f06..1c15a98 100644
--- a/arch/s390/mm/gmap.c
+++ b/arch/s390/mm/gmap.c
@@ -64,6 +64,7 @@ static struct gmap *gmap_alloc(unsigned long limit)
 	INIT_LIST_HEAD(&gmap->crst_list);
 	INIT_LIST_HEAD(&gmap->children);
 	INIT_LIST_HEAD(&gmap->pt_list);
+	INIT_LIST_HEAD(&gmap->split_list);
 	INIT_RADIX_TREE(&gmap->guest_to_host, GFP_KERNEL);
 	INIT_RADIX_TREE(&gmap->host_to_guest, GFP_ATOMIC);
 	INIT_RADIX_TREE(&gmap->host_to_rmap, GFP_ATOMIC);
@@ -195,6 +196,12 @@ static void gmap_free(struct gmap *gmap)
 	gmap_radix_tree_free(&gmap->guest_to_host);
 	gmap_radix_tree_free(&gmap->host_to_guest);
 
+	/* Free split pmd page tables */
+	spin_lock(&gmap->guest_table_lock);
+	list_for_each_entry_safe(page, next, &gmap->split_list, lru)
+		page_table_free_pgste(page);
+	spin_unlock(&gmap->guest_table_lock);
+
 	/* Free additional data for a shadow gmap */
 	if (gmap_is_shadow(gmap)) {
 		/* Free all page tables. */
@@ -546,6 +553,7 @@ int __gmap_link(struct gmap *gmap, unsigned long gaddr, unsigned long vmaddr)
 	pud_t *pud;
 	pmd_t *pmd;
 	pmd_t unprot;
+	pte_t *ptep;
 	int rc;
 
 	BUG_ON(gmap_is_shadow(gmap));
@@ -616,6 +624,16 @@ int __gmap_link(struct gmap *gmap, unsigned long gaddr, unsigned long vmaddr)
 					  & ~_SEGMENT_ENTRY_PROTECT))
 			       | _SEGMENT_ENTRY_GMAP_UC);
 		gmap_pmdp_xchg(gmap, (pmd_t *)table, unprot, gaddr);
+	} else if (gmap_pmd_is_split((pmd_t *)table)) {
+		/*
+		 * Split pmds are somewhere in-between a normal and a
+		 * large pmd. As we don't share the page table, the
+		 * host does not remove protection on a fault and we
+		 * have to do it ourselves for the guest mapping.
+		 */
+		ptep = pte_offset_map((pmd_t *)table, gaddr);
+		if (pte_val(*ptep) & _PAGE_PROTECT)
+			ptep_remove_dirty_protection_split(mm, ptep, vmaddr);
 	}
 	spin_unlock(&gmap->guest_table_lock);
 	spin_unlock(ptl);
@@ -895,7 +913,7 @@ static inline pmd_t *gmap_pmd_op_walk(struct gmap *gmap, unsigned long gaddr)
 	 * suffices to take the pte lock later on. Thus we can unlock
 	 * the guest_table_lock here.
 	 */
-	if (!pmd_large(*pmdp) && !gmap_is_shadow(gmap))
+	if (!gmap_pmd_is_split(pmdp) && !pmd_large(*pmdp) && !gmap_is_shadow(gmap))
 		spin_unlock(&gmap->guest_table_lock);
 	return pmdp;
 }
@@ -907,10 +925,20 @@ static inline pmd_t *gmap_pmd_op_walk(struct gmap *gmap, unsigned long gaddr)
  */
 static inline void gmap_pmd_op_end(struct gmap *gmap, pmd_t *pmdp)
 {
-	if (pmd_large(*pmdp) || gmap_is_shadow(gmap))
+	if (gmap_pmd_is_split(pmdp) || pmd_large(*pmdp) || gmap_is_shadow(gmap))
 		spin_unlock(&gmap->guest_table_lock);
 }
 
+static pte_t *gmap_pte_from_pmd(struct gmap *gmap, pmd_t *pmdp,
+				unsigned long addr, spinlock_t **ptl)
+{
+	if (likely(!gmap_pmd_is_split(pmdp)))
+		return pte_alloc_map_lock(gmap->mm, pmdp, addr, ptl);
+
+	*ptl = NULL;
+	return pte_offset_map(pmdp, addr);
+}
+
 /**
  * gmap_pmdp_transfer_prot - transfer protection of guest pmd to host pmd
  * @mm: the memory context
@@ -953,6 +981,18 @@ static void gmap_pmdp_transfer_prot(struct mm_struct *mm, unsigned long addr,
 	*hpmdp = new;
 }
 
+static void gmap_pte_transfer_prot(struct mm_struct *mm, unsigned long addr,
+				   pte_t *gptep, pmd_t *hpmdp)
+{
+	pmd_t mpmd = __pmd(0);
+
+	if (pte_val(*gptep) & _PAGE_PROTECT)
+		pmd_val(mpmd) |= _SEGMENT_ENTRY_PROTECT;
+	if (pte_val(*gptep) & _PAGE_INVALID)
+		pmd_val(mpmd) |= _SEGMENT_ENTRY_INVALID;
+	gmap_pmdp_transfer_prot(mm, addr, &mpmd, hpmdp);
+}
+
 /**
  * gmap_pmdp_force_prot - change access rights of a locked pmd
  * @mm: pointer to the process mm_struct
@@ -989,6 +1029,63 @@ static int gmap_pmdp_force_prot(struct gmap *gmap, unsigned long addr,
 	return 0;
 }
 
+/**
+ * gmap_pmd_split_free - Free a split pmd's page table
+ * @pmdp: the split pmd whose page table is freed
+ *
+ * If the userspace pmds are exchanged, we'll remove the gmap pmds as
+ * well, so we fault on them and link them again. We would leak
+ * memory if we didn't free split pmds here.
+ */
+static inline void gmap_pmd_split_free(pmd_t *pmdp)
+{
+	unsigned long pgt = pmd_val(*pmdp) & _SEGMENT_ENTRY_ORIGIN;
+	struct page *page;
+
+	if (gmap_pmd_is_split(pmdp)) {
+		page = pfn_to_page(pgt >> PAGE_SHIFT);
+		list_del(&page->lru);
+		page_table_free_pgste(page);
+	}
+}
+
+/**
+ * gmap_pmd_split - Split a huge gmap pmd and use a page table instead
+ * @gmap: pointer to guest mapping meta data structure
+ * @gaddr: virtual address in the guest address space
+ * @pmdp: pointer to the pmd that will be split
+ *
+ * When splitting gmap pmds, we have to make the resulting page table
+ * look like it's a normal one to be able to use the common pte
+ * handling functions. Also we need to track these new tables as they
+ * aren't tracked anywhere else.
+ */
+static int gmap_pmd_split(struct gmap *gmap, unsigned long gaddr, pmd_t *pmdp)
+{
+	unsigned long *table;
+	struct page *page;
+	pmd_t new;
+	int i;
+
+	page = page_table_alloc_pgste(gmap->mm);
+	if (!page)
+		return -ENOMEM;
+	table = (unsigned long *) page_to_phys(page);
+	for (i = 0; i < 256; i++) {
+		table[i] = (pmd_val(*pmdp) & HPAGE_MASK) + i * PAGE_SIZE;
+		/* pmd_large() implies pmd/pte_present() */
+		table[i] |=  _PAGE_PRESENT | _PAGE_READ | _PAGE_WRITE;
+		/* ptes are directly marked as dirty */
+		table[i + PTRS_PER_PTE] |= PGSTE_UC_BIT;
+	}
+
+	pmd_val(new) = ((unsigned long)table | _SEGMENT_ENTRY |
+			(_SEGMENT_ENTRY_GMAP_SPLIT));
+	list_add(&page->lru, &gmap->split_list);
+	gmap_pmdp_xchg(gmap, pmdp, new, gaddr);
+	return 0;
+}
+
 /*
  * gmap_protect_pte - remove access rights to memory and set pgste bits
  * @gmap: pointer to guest mapping meta data structure
@@ -1004,7 +1101,8 @@ static int gmap_pmdp_force_prot(struct gmap *gmap, unsigned long addr,
  * guest_table_lock held for shadow gmaps.
  */
 static int gmap_protect_pte(struct gmap *gmap, unsigned long gaddr,
-			    pmd_t *pmdp, int prot, unsigned long bits)
+			    unsigned long vmaddr, pmd_t *pmdp, pmd_t *hpmdp,
+			    int prot, unsigned long bits)
 {
 	int rc;
 	pte_t *ptep;
@@ -1015,7 +1113,7 @@ static int gmap_protect_pte(struct gmap *gmap, unsigned long gaddr,
 	if (pmd_val(*pmdp) & _SEGMENT_ENTRY_INVALID)
 		return -EAGAIN;
 
-	ptep = pte_alloc_map_lock(gmap->mm, pmdp, gaddr, &ptl);
+	ptep = gmap_pte_from_pmd(gmap, pmdp, gaddr, &ptl);
 	if (!ptep)
 		return -ENOMEM;
 
@@ -1024,6 +1122,8 @@ static int gmap_protect_pte(struct gmap *gmap, unsigned long gaddr,
 	/* Protect and unlock. */
 	rc = ptep_force_prot(gmap->mm, gaddr, ptep, prot, pbits);
 	gmap_pte_op_end(ptl);
+	if (!rc && gmap_pmd_is_split(pmdp))
+		gmap_pte_transfer_prot(gmap->mm, vmaddr, ptep, hpmdp);
 	return rc;
 }
 
@@ -1048,6 +1148,14 @@ static int gmap_protect_pmd(struct gmap *gmap, unsigned long gaddr,
 
 	sbits |= (bits & GMAP_NOTIFY_MPROT) ? _SEGMENT_ENTRY_GMAP_IN : 0;
 	sbits |= (bits & GMAP_NOTIFY_SHADOW) ? _SEGMENT_ENTRY_GMAP_VSIE : 0;
+
+	if (((prot != PROT_WRITE) && (bits & GMAP_NOTIFY_SHADOW))) {
+		ret = gmap_pmd_split(gmap, gaddr, pmdp);
+		if (ret)
+			return ret;
+		return -EFAULT;
+	}
+
 	/* Protect gmap pmd */
 	ret = gmap_pmdp_force_prot(gmap, gaddr, pmdp, prot, sbits);
 	/*
@@ -1081,20 +1189,27 @@ static int gmap_protect_range(struct gmap *gmap, unsigned long gaddr,
 	spinlock_t *ptl;
 	unsigned long vmaddr, dist;
 	pmd_t *pmdp, *hpmdp;
-	int rc;
+	int rc = 0;
 
 	while (len) {
 		rc = -EAGAIN;
 		vmaddr = __gmap_translate(gmap, gaddr);
 		hpmdp = (pmd_t *)huge_pte_offset(gmap->mm, vmaddr, HPAGE_SIZE);
+		if (!hpmdp)
+			BUG();
 		/* Do we need tests here? */
 		ptl = pmd_lock(gmap->mm, hpmdp);
 
 		pmdp = gmap_pmd_op_walk(gmap, gaddr);
 		if (pmdp) {
 			if (!pmd_large(*pmdp)) {
-				rc = gmap_protect_pte(gmap, gaddr, pmdp, prot,
-						      bits);
+				if (gmap_pmd_is_split(pmdp) &&
+				    (bits & GMAP_NOTIFY_MPROT)) {
+					pmd_val(*pmdp) |= _SEGMENT_ENTRY_GMAP_IN;
+				}
+
+				rc = gmap_protect_pte(gmap, gaddr, vmaddr,
+						      pmdp, hpmdp, prot, bits);
 				if (!rc) {
 					len -= PAGE_SIZE;
 					gaddr += PAGE_SIZE;
@@ -1111,7 +1226,9 @@ static int gmap_protect_range(struct gmap *gmap, unsigned long gaddr,
 			gmap_pmd_op_end(gmap, pmdp);
 		}
 		spin_unlock(ptl);
-		if (rc) {
+		if (rc == -EFAULT)
+			continue;
+		if (rc == -EAGAIN) {
 			vmaddr = __gmap_translate(gmap, gaddr);
 			if (IS_ERR_VALUE(vmaddr))
 				return vmaddr;
@@ -1179,7 +1296,7 @@ int gmap_read_table(struct gmap *gmap, unsigned long gaddr, unsigned long *val,
 		pmdp = gmap_pmd_op_walk(gmap, gaddr);
 		if (pmdp) {
 			if (!pmd_large(*pmdp)) {
-				ptep = pte_alloc_map_lock(gmap->mm, pmdp, gaddr, &ptl);
+				ptep = gmap_pte_from_pmd(gmap, pmdp, gaddr, &ptl);
 				if (ptep) {
 					pte = *ptep;
 					if (pte_present(pte) && (pte_val(pte) & _PAGE_READ)) {
@@ -1188,6 +1305,8 @@ int gmap_read_table(struct gmap *gmap, unsigned long gaddr, unsigned long *val,
 						*val = *(unsigned long *) address;
 						pte_val(*ptep) |= _PAGE_YOUNG;
 						/* Do *NOT* clear the _PAGE_INVALID bit! */
+						if (gmap_pmd_is_split(pmdp))
+							*fc = 1;
 						rc = 0;
 					}
 					gmap_pte_op_end(ptl);
@@ -1277,7 +1396,7 @@ static int gmap_protect_rmap_pte(struct gmap *sg, struct gmap_rmap *rmap,
 	if (pmd_val(*pmdp) & _SEGMENT_ENTRY_INVALID)
 		return -EAGAIN;
 
-	ptep = pte_alloc_map_lock(sg->parent->mm, pmdp, paddr, &ptl);
+	ptep = gmap_pte_from_pmd(sg->parent, pmdp, paddr, &ptl);
 	if (ptep) {
 		spin_lock(&sg->guest_table_lock);
 		rc = ptep_force_prot(sg->parent->mm, paddr, ptep, prot,
@@ -1356,12 +1475,12 @@ static int gmap_protect_rmap(struct gmap *sg, unsigned long raddr,
 		}
 		spin_unlock(ptl);
 		radix_tree_preload_end();
-		if (rc) {
+		if (rc)
 			kfree(rmap);
+		if (rc == -EAGAIN) {
 			rc = gmap_pte_op_fixup(parent, paddr, vmaddr, prot);
 			if (rc)
 				return rc;
-			continue;
 		}
 	}
 	return 0;
@@ -2244,7 +2363,8 @@ int gmap_shadow_segment(struct gmap *sg, unsigned long saddr, pmd_t pmd)
 	struct gmap *parent;
 	struct gmap_rmap *rmap;
 	unsigned long vmaddr, paddr;
-	pmd_t spmd, tpmd, *spmdp = NULL, *tpmdp;
+	pmd_t spmd, tpmd, *spmdp = NULL, *tpmdp, *hpmdp;
+	spinlock_t *ptl;
 	int prot;
 	int rc;
 
@@ -2264,6 +2384,11 @@ int gmap_shadow_segment(struct gmap *sg, unsigned long saddr, pmd_t pmd)
 			rc = vmaddr;
 			break;
 		}
+		hpmdp = (pmd_t *)huge_pte_offset(sg->mm, vmaddr, HPAGE_SIZE);
+		if (!hpmdp)
+			BUG();
+		/* Do we need tests here? */
+		ptl = pmd_lock(sg->mm, hpmdp);
 		rc = radix_tree_preload(GFP_KERNEL);
 		if (rc)
 			break;
@@ -2272,12 +2397,15 @@ int gmap_shadow_segment(struct gmap *sg, unsigned long saddr, pmd_t pmd)
 		/* Let's look up the parent's mapping */
 		spmdp = gmap_pmd_op_walk(parent, paddr);
 		if (spmdp) {
+			if (!pmd_large(*spmdp))
+				BUG();
 			spin_lock(&sg->guest_table_lock);
 			/* Get shadow segment table pointer */
 			tpmdp = (pmd_t *) gmap_table_walk(sg, saddr, 1);
 			if (!tpmdp) {
 				spin_unlock(&sg->guest_table_lock);
 				gmap_pmd_op_end(parent, spmdp);
+				spin_unlock(ptl);
 				radix_tree_preload_end();
 				break;
 			}
@@ -2286,6 +2414,7 @@ int gmap_shadow_segment(struct gmap *sg, unsigned long saddr, pmd_t pmd)
 				rc = 0;	/* already shadowed */
 				spin_unlock(&sg->guest_table_lock);
 				gmap_pmd_op_end(parent, spmdp);
+				spin_unlock(ptl);
 				radix_tree_preload_end();
 				break;
 			}
@@ -2309,6 +2438,7 @@ int gmap_shadow_segment(struct gmap *sg, unsigned long saddr, pmd_t pmd)
 			spin_unlock(&sg->guest_table_lock);
 			gmap_pmd_op_end(parent, spmdp);
 		}
+		spin_unlock(ptl);
 		radix_tree_preload_end();
 		if (!rc)
 			break;
@@ -2388,7 +2518,7 @@ int gmap_shadow_page(struct gmap *sg, unsigned long saddr, pte_t pte)
 				rmap = NULL;
 				rc = 0;
 			} else {
-				sptep = pte_alloc_map_lock(parent->mm, spmdp, paddr, &ptl);
+				sptep = gmap_pte_from_pmd(parent, spmdp, paddr, &ptl);
 				if (sptep) {
 					rc = ptep_shadow_pte(sg->mm, saddr, sptep, tptep, pte);
 					if (rc > 0) {
@@ -2538,6 +2668,9 @@ static void gmap_shadow_notify(struct gmap *sg, unsigned long vmaddr,
 		case _SHADOW_RMAP_REGION3:
 			gmap_unshadow_sgt(sg, raddr);
 			break;
+		case _SHADOW_RMAP_SEGMENT_LP:
+			gmap_unshadow_segment(sg, raddr);
+			break;
 		case _SHADOW_RMAP_SEGMENT:
 			gmap_unshadow_pgt(sg, raddr);
 			break;
@@ -2550,6 +2683,46 @@ static void gmap_shadow_notify(struct gmap *sg, unsigned long vmaddr,
 	spin_unlock(&sg->guest_table_lock);
 }
 
+/*
+ * ptep_notify_gmap - call all invalidation callbacks for a specific pte of a gmap
+ * @mm: pointer to the process mm_struct
+ * @addr: virtual address in the process address space
+ * @pte: pointer to the page table entry
+ * @bits: bits from the pgste that caused the notify call
+ *
+ * This function is assumed to be called with the guest_table_lock held.
+ */
+void ptep_notify_gmap(struct mm_struct *mm, unsigned long vmaddr,
+		      pte_t *pte, unsigned long bits)
+{
+	unsigned long offset, gaddr = 0;
+	unsigned long *table;
+	struct gmap *gmap, *sg, *next;
+
+	offset = ((unsigned long) pte) & (255 * sizeof(pte_t));
+	offset = offset * (4096 / sizeof(pte_t));
+	rcu_read_lock();
+	list_for_each_entry_rcu(gmap, &mm->context.gmap_list, list) {
+		table = radix_tree_lookup(&gmap->host_to_guest,
+					  vmaddr >> PMD_SHIFT);
+		if (table)
+			gaddr = __gmap_segment_gaddr(table) + offset;
+		else
+			continue;
+
+		if (!list_empty(&gmap->children) && (bits & PGSTE_VSIE_BIT)) {
+			spin_lock(&gmap->shadow_lock);
+			list_for_each_entry_safe(sg, next,
+						 &gmap->children, list)
+				gmap_shadow_notify(sg, vmaddr, gaddr);
+			spin_unlock(&gmap->shadow_lock);
+		}
+		if (bits & PGSTE_IN_BIT)
+			gmap_call_notifier(gmap, gaddr, gaddr + PAGE_SIZE - 1);
+	}
+	rcu_read_unlock();
+}
+
 /**
  * ptep_notify - call all invalidation callbacks for a specific pte.
  * @mm: pointer to the process mm_struct
@@ -2612,10 +2785,12 @@ static void pmdp_notify_gmap(struct gmap *gmap, unsigned long gaddr)
 	table = gmap_table_walk(gmap, gaddr, 1);
 	if (!table)
 		return;
-	bits = *table & (_SEGMENT_ENTRY_GMAP_IN | _SEGMENT_ENTRY_GMAP_VSIE);
+	bits = *table & _SEGMENT_ENTRY_GMAP_IN;
+	if (pmd_large(__pmd(*table)) && (*table & _SEGMENT_ENTRY_GMAP_VSIE))
+		bits |= _SEGMENT_ENTRY_GMAP_VSIE;
 	if (!bits)
 		return;
-	*table ^= bits;
+	*table &= ~bits;
 	vmaddr = __gmap_translate(gmap, gaddr);
 	if (!list_empty(&gmap->children) && (bits & _SEGMENT_ENTRY_GMAP_VSIE)
 	    && (*table & _SEGMENT_ENTRY_PROTECT)) {
@@ -2629,6 +2804,23 @@ static void pmdp_notify_gmap(struct gmap *gmap, unsigned long gaddr)
 		gmap_call_notifier(gmap, gaddr, gaddr + HPAGE_SIZE - 1);
 }
 
+static void pmdp_notify_split(struct mm_struct *mm, unsigned long vmaddr,
+			      unsigned long *table)
+{
+	int i = 0;
+	unsigned long bits;
+	unsigned long *ptep = (unsigned long *)(*table & PAGE_MASK);
+	unsigned long *pgste = ptep + PTRS_PER_PTE;
+
+	for (; i < 256; i++, vmaddr += PAGE_SIZE, ptep++, pgste++) {
+		bits = *pgste & (PGSTE_IN_BIT | PGSTE_VSIE_BIT);
+		if (bits) {
+			*pgste ^= bits;
+			ptep_notify_gmap(mm, vmaddr, (pte_t *)ptep, bits);
+		}
+	}
+}
+
 /**
  * pmdp_notify - call all invalidation callbacks for a specific pmd
  * @mm: pointer to the process mm_struct
@@ -2650,8 +2842,17 @@ void pmdp_notify(struct mm_struct *mm, unsigned long vmaddr)
 			spin_unlock(&gmap->guest_table_lock);
 			continue;
 		}
-		bits = *table & (_SEGMENT_ENTRY_GMAP_IN | _SEGMENT_ENTRY_GMAP_VSIE);
-		*table ^= bits;
+
+		if (gmap_pmd_is_split((pmd_t *)table)) {
+			pmdp_notify_split(mm, vmaddr, table);
+			spin_unlock(&gmap->guest_table_lock);
+			continue;
+		}
+
+		bits = *table & (_SEGMENT_ENTRY_GMAP_IN);
+		if (pmd_large(__pmd(*table)) && (*table & _SEGMENT_ENTRY_GMAP_VSIE))
+			bits |= _SEGMENT_ENTRY_GMAP_VSIE;
+		*table &= ~bits;
 		gaddr = __gmap_segment_gaddr(table);
 		spin_unlock(&gmap->guest_table_lock);
 		if (!list_empty(&gmap->children) && (bits & _SEGMENT_ENTRY_GMAP_VSIE)) {
@@ -2682,6 +2883,7 @@ static void gmap_pmdp_clear(struct mm_struct *mm, unsigned long vmaddr,
 		if (pmdp) {
 			if (purge)
 				__pmdp_csp(pmdp);
+			gmap_pmd_split_free(pmdp);
 			pmd_val(*pmdp) = _SEGMENT_ENTRY_EMPTY;
 		}
 		spin_unlock(&gmap->guest_table_lock);
@@ -2738,6 +2940,7 @@ void gmap_pmdp_idte_local(struct mm_struct *mm, unsigned long vmaddr)
 			else if (MACHINE_HAS_IDTE)
 				__pmdp_idte(gaddr, pmdp, 0, 0,
 					    IDTE_LOCAL);
+			gmap_pmd_split_free(pmdp);
 			*entry = _SEGMENT_ENTRY_EMPTY;
 		}
 		spin_unlock(&gmap->guest_table_lock);
@@ -2774,6 +2977,8 @@ void gmap_pmdp_idte_global(struct mm_struct *mm, unsigned long vmaddr)
 					    IDTE_GLOBAL);
 			else
 				__pmdp_csp(pmdp);
+
+			gmap_pmd_split_free(pmdp);
 			*entry = _SEGMENT_ENTRY_EMPTY;
 		}
 		spin_unlock(&gmap->guest_table_lock);
@@ -2852,6 +3057,7 @@ void gmap_sync_dirty_log_pmd(struct gmap *gmap, unsigned long bitmap[4],
 	pmd_t *pmdp, *hpmdp;
 	spinlock_t *ptl;
 
+	/* Protection against gmap_link vsie unprotection. */
 	hpmdp = (pmd_t *)huge_pte_offset(gmap->mm, vmaddr, HPAGE_SIZE);
 	if (!hpmdp)
 		return;
@@ -2867,9 +3073,17 @@ void gmap_sync_dirty_log_pmd(struct gmap *gmap, unsigned long bitmap[4],
 						      gaddr, vmaddr))
 			memset(bitmap, 0xFF, 32);
 	} else {
-		for (; i < _PAGE_ENTRIES; i++, vmaddr += PAGE_SIZE) {
-			if (test_and_clear_guest_dirty(gmap->mm, vmaddr))
-				set_bit_le(i, bitmap);
+		/* We handle this here, as it's of the records from mm. */
+		if (unlikely(gmap_pmd_is_split(pmdp))) {
+			for (; i < _PAGE_ENTRIES; i++, vmaddr += PAGE_SIZE) {
+				if (test_and_clear_guest_dirty_split(gmap->mm, pmdp, vmaddr))
+					set_bit_le(i, bitmap);
+			}
+		} else {
+			for (; i < _PAGE_ENTRIES; i++, vmaddr += PAGE_SIZE) {
+				if (test_and_clear_guest_dirty(gmap->mm, vmaddr))
+					set_bit_le(i, bitmap);
+			}
 		}
 	}
 	gmap_pmd_op_end(gmap, pmdp);
diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
index d18b80e..c0408b2 100644
--- a/arch/s390/mm/pgtable.c
+++ b/arch/s390/mm/pgtable.c
@@ -763,6 +763,57 @@ bool test_and_clear_guest_dirty(struct mm_struct *mm, unsigned long addr)
 }
 EXPORT_SYMBOL_GPL(test_and_clear_guest_dirty);
 
+void ptep_remove_dirty_protection_split(struct mm_struct *mm,
+					pte_t *ptep, unsigned long vmaddr)
+{
+	pte_t unprot = __pte(pte_val(*ptep) & ~_PAGE_PROTECT);
+	pgste_t pgste;
+	unsigned long bits;
+
+	pgste = pgste_get_lock(ptep);
+	pgste_val(pgste) |= PGSTE_UC_BIT;
+
+	bits = pgste_val(pgste) & (PGSTE_IN_BIT | PGSTE_VSIE_BIT);
+	pgste_val(pgste) ^= bits;
+	ptep_notify_gmap(mm, vmaddr, ptep, bits);
+	ptep_ipte_global(mm, vmaddr, ptep, 0);
+
+	*ptep = unprot;
+	pgste_set_unlock(ptep, pgste);
+}
+EXPORT_SYMBOL_GPL(ptep_remove_dirty_protection_split);
+
+bool test_and_clear_guest_dirty_split(struct mm_struct *mm, pmd_t *pmdp,
+				      unsigned long vmaddr)
+{
+	bool dirty;
+	pte_t *ptep, pte;
+	pgste_t pgste;
+	unsigned long bits;
+
+	ptep = pte_offset_map(pmdp, vmaddr);
+	pgste = pgste_get_lock(ptep);
+	dirty = !!(pgste_val(pgste) & PGSTE_UC_BIT);
+	pgste_val(pgste) &= ~PGSTE_UC_BIT;
+	pte = *ptep;
+	if (dirty) {
+		bits = pgste_val(pgste) & (PGSTE_IN_BIT | PGSTE_VSIE_BIT);
+		if (bits) {
+			pgste_val(pgste) ^= bits;
+			ptep_notify_gmap(mm, vmaddr, ptep, bits);
+		}
+		ptep_ipte_global(mm, vmaddr, ptep, 0);
+		if (MACHINE_HAS_ESOP || !(pte_val(pte) & _PAGE_WRITE))
+			pte_val(pte) |= _PAGE_PROTECT;
+		else
+			pte_val(pte) |= _PAGE_INVALID;
+		*ptep = pte;
+	}
+	pgste_set_unlock(ptep, pgste);
+	return dirty;
+}
+EXPORT_SYMBOL_GPL(test_and_clear_guest_dirty_split);
+
 int set_guest_storage_key(struct mm_struct *mm, unsigned long addr,
 			  unsigned char key, bool nq)
 {
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [RFC/PATCH v2 20/22] s390/mm: Enable gmap huge pmd support
  2017-12-13 12:53 [RFC/PATCH v2 00/22] KVM/s390: Hugetlbfs enablement Janosch Frank
                   ` (18 preceding siblings ...)
  2017-12-13 12:53 ` [RFC/PATCH v2 19/22] s390/mm: Split huge pages if granular protection is needed Janosch Frank
@ 2017-12-13 12:53 ` Janosch Frank
  2017-12-13 12:53 ` [RFC/PATCH v2 21/22] KVM: s390: Add KVM HPAGE capability Janosch Frank
                   ` (4 subsequent siblings)
  24 siblings, 0 replies; 67+ messages in thread
From: Janosch Frank @ 2017-12-13 12:53 UTC (permalink / raw)
  To: kvm; +Cc: schwidefsky, borntraeger, david, dominik.dingel, linux-s390

Now that we have everything in place, let's allow huge (1M) pmds for
gmap linking, effectively enabling hugetlbfs-backed guests. Transparent
huge pages and 2G huge pages are *not* supported through this change.
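
As a sketch only, and not part of this patch: this is roughly what a
hugetlbfs backed memory slot looks like from the userspace side. Once a
slot is backed like this, __gmap_link() sees pmd_large() host entries,
which the removed check used to reject. The function name and the
anonymous MAP_HUGETLB mapping are illustrative assumptions; QEMU would
use a hugetlbfs file backend instead.

#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/kvm.h>

/* Illustrative helper, not taken from QEMU or the kernel tree. */
static int back_slot_with_huge_pages(int vm_fd, __u64 guest_addr, size_t len)
{
	struct kvm_userspace_memory_region region;
	void *mem;

	/* 1M huge page backing, analogous to a hugetlbfs mapping */
	mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
	if (mem == MAP_FAILED)
		return -1;

	memset(&region, 0, sizeof(region));
	region.slot = 0;
	region.guest_phys_addr = guest_addr;
	region.memory_size = len;
	region.userspace_addr = (unsigned long) mem;
	return ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &region);
}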

Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
---
 arch/s390/mm/gmap.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
index 1c15a98..cb03646 100644
--- a/arch/s390/mm/gmap.c
+++ b/arch/s390/mm/gmap.c
@@ -2,8 +2,10 @@
 /*
  *  KVM guest address space mapping code
  *
- *    Copyright IBM Corp. 2007, 2016
+ *    Copyright IBM Corp. 2007, 2016, 2017
  *    Author(s): Martin Schwidefsky <schwidefsky@de.ibm.com>
+ *		 David Hildenbrand <david@redhat.com>
+ *		 Janosch Frank <frankja@linux.vnet.ibm.com>
  */
 
 #include <linux/kernel.h>
@@ -597,9 +599,6 @@ int __gmap_link(struct gmap *gmap, unsigned long gaddr, unsigned long vmaddr)
 		return -EFAULT;
 	pmd = pmd_offset(pud, vmaddr);
 	VM_BUG_ON(pmd_none(*pmd));
-	/* large pmds cannot yet be handled */
-	if (pmd_large(*pmd))
-		return -EFAULT;
 	/* Link gmap segment table entry location to page table. */
 	rc = radix_tree_preload(GFP_KERNEL);
 	if (rc)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [RFC/PATCH v2 21/22] KVM: s390: Add KVM HPAGE capability
  2017-12-13 12:53 [RFC/PATCH v2 00/22] KVM/s390: Hugetlbfs enablement Janosch Frank
                   ` (19 preceding siblings ...)
  2017-12-13 12:53 ` [RFC/PATCH v2 20/22] s390/mm: Enable gmap huge pmd support Janosch Frank
@ 2017-12-13 12:53 ` Janosch Frank
  2017-12-20 13:02   ` Cornelia Huck
  2017-12-13 12:53 ` [RFC/PATCH v2 22/22] RFC: s390/mm: Add gmap lock classes Janosch Frank
                   ` (3 subsequent siblings)
  24 siblings, 1 reply; 67+ messages in thread
From: Janosch Frank @ 2017-12-13 12:53 UTC (permalink / raw)
  To: kvm; +Cc: schwidefsky, borntraeger, david, dominik.dingel, linux-s390

KVM huge page backing support cannot be easily tested under
s390. Currently, testing is only possible after most of the guest has
already been set up.

To indicate that KVM has huge page backing support, we add the
KVM_CAP_S390_HPAGE capability. This does not mean that transparent
huge pages are supported.
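
As an illustration only, and not part of the proposed interface
documentation: userspace could probe the new capability roughly like
this, assuming headers that already carry the KVM_CAP_S390_HPAGE define
proposed by this patch.

#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

#ifndef KVM_CAP_S390_HPAGE
#define KVM_CAP_S390_HPAGE 151	/* value proposed by this patch */
#endif

int main(void)
{
	int kvm = open("/dev/kvm", O_RDWR);

	if (kvm < 0)
		return 1;
	/* KVM_CHECK_EXTENSION returns > 0 if the capability is present */
	if (ioctl(kvm, KVM_CHECK_EXTENSION, KVM_CAP_S390_HPAGE) > 0)
		printf("KVM supports 1M hugetlbfs backed guests\n");
	else
		printf("no huge page backing support\n");
	return 0;
}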

Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com>
---
 Documentation/virtual/kvm/api.txt | 10 ++++++++++
 arch/s390/kvm/kvm-s390.c          |  1 +
 include/uapi/linux/kvm.h          |  1 +
 3 files changed, 12 insertions(+)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 57d3ee9..a56b0af 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -4369,3 +4369,13 @@ Parameters: none
 This capability indicates if the flic device will be able to get/set the
 AIS states for migration via the KVM_DEV_FLIC_AISM_ALL attribute and allows
 to discover this without having to create a flic device.
+
+8.14 KVM_CAP_S390_HPAGE
+
+Architectures: s390
+This capability, if KVM_CHECK_EXTENSION indicates that it is
+available, means that KVM supports VMs that are memory backed through
+hugetlbfs with 1 megabyte pages.
+
+While it is generally possible to create and start a VM without this
+support, the VM will not be functional.
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 73fb3bc..8951ad4 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -393,6 +393,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_S390_CMMA_MIGRATION:
 	case KVM_CAP_S390_AIS:
 	case KVM_CAP_S390_AIS_MIGRATION:
+	case KVM_CAP_S390_HPAGE:
 		r = 1;
 		break;
 	case KVM_CAP_S390_MEM_OP:
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 496e59a..aa3d707 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -932,6 +932,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_HYPERV_SYNIC2 148
 #define KVM_CAP_HYPERV_VP_INDEX 149
 #define KVM_CAP_S390_AIS_MIGRATION 150
+#define KVM_CAP_S390_HPAGE 151
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [RFC/PATCH v2 22/22] RFC: s390/mm: Add gmap lock classes
  2017-12-13 12:53 [RFC/PATCH v2 00/22] KVM/s390: Hugetlbfs enablement Janosch Frank
                   ` (20 preceding siblings ...)
  2017-12-13 12:53 ` [RFC/PATCH v2 21/22] KVM: s390: Add KVM HPAGE capability Janosch Frank
@ 2017-12-13 12:53 ` Janosch Frank
  2017-12-20 12:24   ` Christian Borntraeger
  2017-12-20 12:23 ` [RFC/PATCH v2 00/22] KVM/s390: Hugetlbfs enablement Christian Borntraeger
                   ` (2 subsequent siblings)
  24 siblings, 1 reply; 67+ messages in thread
From: Janosch Frank @ 2017-12-13 12:53 UTC (permalink / raw)
  To: kvm; +Cc: schwidefsky, borntraeger, david, dominik.dingel, linux-s390

A shadow gmap and its parent are locked right after each other when
doing VSIE management. Lockdep can't differentiate between the two
classes without some help.

TODO: Not sure yet if I have to annotate all and if gmap_pmd_walk will
be used by both shadow and parent
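
A minimal sketch of what the annotation buys us; the function below is
made up for illustration and is not part of the patch. Both
guest_table_locks belong to the same lock class, so without distinct
subclasses lockdep would report the nested acquisition as a possible
deadlock even though the order is intentional.

#include <linux/spinlock.h>
#include <asm/gmap.h>

/* Hypothetical caller, assuming the gmap.c context of this series. */
static void touch_shadow_tables(struct gmap *parent, struct gmap *sg)
{
	/* two locks of the same class, disambiguated by subclass */
	spin_lock_nested(&parent->guest_table_lock, GMAP_LOCK_PARENT);
	spin_lock_nested(&sg->guest_table_lock, GMAP_LOCK_SHADOW);
	/* ... update shadow DAT tables derived from the parent ... */
	spin_unlock(&sg->guest_table_lock);
	spin_unlock(&parent->guest_table_lock);
}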

Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com>
---
 arch/s390/include/asm/gmap.h |  6 ++++++
 arch/s390/mm/gmap.c          | 40 +++++++++++++++++++---------------------
 2 files changed, 25 insertions(+), 21 deletions(-)

diff --git a/arch/s390/include/asm/gmap.h b/arch/s390/include/asm/gmap.h
index a187033..6287aca 100644
--- a/arch/s390/include/asm/gmap.h
+++ b/arch/s390/include/asm/gmap.h
@@ -20,6 +20,12 @@
 #define _SEGMENT_ENTRY_GMAP_UC		0x4000	/* user dirty (migration) */
 #define _SEGMENT_ENTRY_GMAP_VSIE	0x8000	/* vsie bit */
 
+
+enum gmap_lock_class {
+	GMAP_LOCK_PARENT,
+	GMAP_LOCK_SHADOW
+};
+
 /**
  * struct gmap_struct - guest address space
  * @list: list head for the mm->context gmap list
diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
index cb03646..86a12f3 100644
--- a/arch/s390/mm/gmap.c
+++ b/arch/s390/mm/gmap.c
@@ -199,10 +199,8 @@ static void gmap_free(struct gmap *gmap)
 	gmap_radix_tree_free(&gmap->host_to_guest);
 
 	/* Free split pmd page tables */
-	spin_lock(&gmap->guest_table_lock);
 	list_for_each_entry_safe(page, next, &gmap->split_list, lru)
 		page_table_free_pgste(page);
-	spin_unlock(&gmap->guest_table_lock);
 
 	/* Free additional data for a shadow gmap */
 	if (gmap_is_shadow(gmap)) {
@@ -1373,7 +1371,7 @@ static int gmap_protect_rmap_pmd(struct gmap *sg, struct gmap_rmap *rmap,
 	if (pmd_val(*pmdp) & _SEGMENT_ENTRY_INVALID)
 		return -EAGAIN;
 
-	spin_lock(&sg->guest_table_lock);
+	spin_lock_nested(&sg->guest_table_lock, GMAP_LOCK_SHADOW);
 	rc = gmap_protect_pmd(sg->parent, paddr, vmaddr, pmdp, hpmdp,
 			      prot, GMAP_NOTIFY_SHADOW);
 	if (!rc)
@@ -1397,7 +1395,7 @@ static int gmap_protect_rmap_pte(struct gmap *sg, struct gmap_rmap *rmap,
 
 	ptep = gmap_pte_from_pmd(sg->parent, pmdp, paddr, &ptl);
 	if (ptep) {
-		spin_lock(&sg->guest_table_lock);
+		spin_lock_nested(&sg->guest_table_lock, GMAP_LOCK_SHADOW);
 		rc = ptep_force_prot(sg->parent->mm, paddr, ptep, prot,
 				     PGSTE_VSIE_BIT);
 		if (!rc)
@@ -1913,7 +1911,7 @@ struct gmap *gmap_shadow(struct gmap *parent, unsigned long asce,
 		/* only allow one real-space gmap shadow */
 		list_for_each_entry(sg, &parent->children, list) {
 			if (sg->orig_asce & _ASCE_REAL_SPACE) {
-				spin_lock(&sg->guest_table_lock);
+				spin_lock_nested(&sg->guest_table_lock, GMAP_LOCK_SHADOW);
 				gmap_unshadow(sg);
 				spin_unlock(&sg->guest_table_lock);
 				list_del(&sg->list);
@@ -1985,7 +1983,7 @@ int gmap_shadow_r2t(struct gmap *sg, unsigned long saddr, unsigned long r2t,
 		page->index |= GMAP_SHADOW_FAKE_TABLE;
 	s_r2t = (unsigned long *) page_to_phys(page);
 	/* Install shadow region second table */
-	spin_lock(&sg->guest_table_lock);
+	spin_lock_nested(&sg->guest_table_lock, GMAP_LOCK_SHADOW);
 	table = gmap_table_walk(sg, saddr, 4); /* get region-1 pointer */
 	if (!table) {
 		rc = -EAGAIN;		/* Race with unshadow */
@@ -2018,7 +2016,7 @@ int gmap_shadow_r2t(struct gmap *sg, unsigned long saddr, unsigned long r2t,
 	offset = ((r2t & _REGION_ENTRY_OFFSET) >> 6) * PAGE_SIZE;
 	len = ((r2t & _REGION_ENTRY_LENGTH) + 1) * PAGE_SIZE - offset;
 	rc = gmap_protect_rmap(sg, raddr, origin + offset, len, PROT_READ);
-	spin_lock(&sg->guest_table_lock);
+	spin_lock_nested(&sg->guest_table_lock, GMAP_LOCK_SHADOW);
 	if (!rc) {
 		table = gmap_table_walk(sg, saddr, 4);
 		if (!table || (*table & _REGION_ENTRY_ORIGIN) !=
@@ -2069,7 +2067,7 @@ int gmap_shadow_r3t(struct gmap *sg, unsigned long saddr, unsigned long r3t,
 		page->index |= GMAP_SHADOW_FAKE_TABLE;
 	s_r3t = (unsigned long *) page_to_phys(page);
 	/* Install shadow region second table */
-	spin_lock(&sg->guest_table_lock);
+	spin_lock_nested(&sg->guest_table_lock, GMAP_LOCK_SHADOW);
 	table = gmap_table_walk(sg, saddr, 3); /* get region-2 pointer */
 	if (!table) {
 		rc = -EAGAIN;		/* Race with unshadow */
@@ -2101,7 +2099,7 @@ int gmap_shadow_r3t(struct gmap *sg, unsigned long saddr, unsigned long r3t,
 	offset = ((r3t & _REGION_ENTRY_OFFSET) >> 6) * PAGE_SIZE;
 	len = ((r3t & _REGION_ENTRY_LENGTH) + 1) * PAGE_SIZE - offset;
 	rc = gmap_protect_rmap(sg, raddr, origin + offset, len, PROT_READ);
-	spin_lock(&sg->guest_table_lock);
+	spin_lock_nested(&sg->guest_table_lock, GMAP_LOCK_SHADOW);
 	if (!rc) {
 		table = gmap_table_walk(sg, saddr, 3);
 		if (!table || (*table & _REGION_ENTRY_ORIGIN) !=
@@ -2152,7 +2150,7 @@ int gmap_shadow_sgt(struct gmap *sg, unsigned long saddr, unsigned long sgt,
 		page->index |= GMAP_SHADOW_FAKE_TABLE;
 	s_sgt = (unsigned long *) page_to_phys(page);
 	/* Install shadow region second table */
-	spin_lock(&sg->guest_table_lock);
+	spin_lock_nested(&sg->guest_table_lock, GMAP_LOCK_SHADOW);
 	table = gmap_table_walk(sg, saddr, 2); /* get region-3 pointer */
 	if (!table) {
 		rc = -EAGAIN;		/* Race with unshadow */
@@ -2185,7 +2183,7 @@ int gmap_shadow_sgt(struct gmap *sg, unsigned long saddr, unsigned long sgt,
 	offset = ((sgt & _REGION_ENTRY_OFFSET) >> 6) * PAGE_SIZE;
 	len = ((sgt & _REGION_ENTRY_LENGTH) + 1) * PAGE_SIZE - offset;
 	rc = gmap_protect_rmap(sg, raddr, origin + offset, len, PROT_READ);
-	spin_lock(&sg->guest_table_lock);
+	spin_lock_nested(&sg->guest_table_lock, GMAP_LOCK_SHADOW);
 	if (!rc) {
 		table = gmap_table_walk(sg, saddr, 2);
 		if (!table || (*table & _REGION_ENTRY_ORIGIN) !=
@@ -2241,7 +2239,7 @@ int gmap_shadow_sgt_lookup(struct gmap *sg, unsigned long saddr,
 	int rc = -EAGAIN;
 
 	BUG_ON(!gmap_is_shadow(sg));
-	spin_lock(&sg->guest_table_lock);
+	spin_lock_nested(&sg->guest_table_lock, GMAP_LOCK_SHADOW);
 	if (sg->asce & _ASCE_TYPE_MASK) {
 		/* >2 GB guest */
 		r3e = (unsigned long *) gmap_table_walk(sg, saddr, 2);
@@ -2308,7 +2306,7 @@ int gmap_shadow_pgt(struct gmap *sg, unsigned long saddr, unsigned long pgt,
 		page->index |= GMAP_SHADOW_FAKE_TABLE;
 	s_pgt = (unsigned long *) page_to_phys(page);
 	/* Install shadow page table */
-	spin_lock(&sg->guest_table_lock);
+	spin_lock_nested(&sg->guest_table_lock, GMAP_LOCK_SHADOW);
 	table = gmap_table_walk(sg, saddr, 1); /* get segment pointer */
 	if (!table) {
 		rc = -EAGAIN;		/* Race with unshadow */
@@ -2336,7 +2334,7 @@ int gmap_shadow_pgt(struct gmap *sg, unsigned long saddr, unsigned long pgt,
 	raddr = (saddr & _SEGMENT_MASK) | _SHADOW_RMAP_SEGMENT;
 	origin = pgt & _SEGMENT_ENTRY_ORIGIN & PAGE_MASK;
 	rc = gmap_protect_rmap(sg, raddr, origin, PAGE_SIZE, PROT_READ);
-	spin_lock(&sg->guest_table_lock);
+	spin_lock_nested(&sg->guest_table_lock, GMAP_LOCK_SHADOW);
 	if (!rc) {
 		table = gmap_table_walk(sg, saddr, 1);
 		if (!table || (*table & _SEGMENT_ENTRY_ORIGIN) !=
@@ -2398,7 +2396,7 @@ int gmap_shadow_segment(struct gmap *sg, unsigned long saddr, pmd_t pmd)
 		if (spmdp) {
 			if (!pmd_large(*spmdp))
 				BUG();
-			spin_lock(&sg->guest_table_lock);
+			spin_lock_nested(&sg->guest_table_lock, GMAP_LOCK_SHADOW);
 			/* Get shadow segment table pointer */
 			tpmdp = (pmd_t *) gmap_table_walk(sg, saddr, 1);
 			if (!tpmdp) {
@@ -2496,7 +2494,7 @@ int gmap_shadow_page(struct gmap *sg, unsigned long saddr, pte_t pte)
 		rc = -EAGAIN;
 		spmdp = gmap_pmd_op_walk(parent, paddr);
 		if (spmdp && !(pmd_val(*spmdp) & _SEGMENT_ENTRY_INVALID)) {
-			spin_lock(&sg->guest_table_lock);
+			spin_lock_nested(&sg->guest_table_lock, GMAP_LOCK_SHADOW);
 			/* Get page table pointer */
 			tptep = (pte_t *) gmap_table_walk(sg, saddr, 0);
 			if (!tptep) {
@@ -2574,7 +2572,7 @@ static void gmap_shadow_notify_pmd(struct gmap *sg, unsigned long vmaddr,
 
 	BUG_ON(!gmap_is_shadow(sg));
 
-	spin_lock(&sg->guest_table_lock);
+	spin_lock_nested(&sg->guest_table_lock, GMAP_LOCK_SHADOW);
 	if (sg->removed) {
 		spin_unlock(&sg->guest_table_lock);
 		return;
@@ -2635,7 +2633,7 @@ static void gmap_shadow_notify(struct gmap *sg, unsigned long vmaddr,
 
 	BUG_ON(!gmap_is_shadow(sg));
 
-	spin_lock(&sg->guest_table_lock);
+	spin_lock_nested(&sg->guest_table_lock, GMAP_LOCK_SHADOW);
 	if (sg->removed) {
 		spin_unlock(&sg->guest_table_lock);
 		return;
@@ -2876,7 +2874,7 @@ static void gmap_pmdp_clear(struct mm_struct *mm, unsigned long vmaddr,
 
 	rcu_read_lock();
 	list_for_each_entry_rcu(gmap, &mm->context.gmap_list, list) {
-		spin_lock(&gmap->guest_table_lock);
+		spin_lock_nested(&gmap->guest_table_lock, GMAP_LOCK_PARENT);
 		pmdp = (pmd_t *)radix_tree_delete(&gmap->host_to_guest,
 						   vmaddr >> PMD_SHIFT);
 		if (pmdp) {
@@ -2926,7 +2924,7 @@ void gmap_pmdp_idte_local(struct mm_struct *mm, unsigned long vmaddr)
 
 	rcu_read_lock();
 	list_for_each_entry_rcu(gmap, &mm->context.gmap_list, list) {
-		spin_lock(&gmap->guest_table_lock);
+		spin_lock_nested(&gmap->guest_table_lock, GMAP_LOCK_PARENT);
 		entry = radix_tree_delete(&gmap->host_to_guest,
 					  vmaddr >> PMD_SHIFT);
 		if (entry) {
@@ -2961,7 +2959,7 @@ void gmap_pmdp_idte_global(struct mm_struct *mm, unsigned long vmaddr)
 
 	rcu_read_lock();
 	list_for_each_entry_rcu(gmap, &mm->context.gmap_list, list) {
-		spin_lock(&gmap->guest_table_lock);
+		spin_lock_nested(&gmap->guest_table_lock, GMAP_LOCK_PARENT);
 		entry = radix_tree_delete(&gmap->host_to_guest,
 					  vmaddr >> PMD_SHIFT);
 		if (entry) {
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* Re: [RFC/PATCH v2 00/22] KVM/s390: Hugetlbfs enablement
  2017-12-13 12:53 [RFC/PATCH v2 00/22] KVM/s390: Hugetlbfs enablement Janosch Frank
                   ` (21 preceding siblings ...)
  2017-12-13 12:53 ` [RFC/PATCH v2 22/22] RFC: s390/mm: Add gmap lock classes Janosch Frank
@ 2017-12-20 12:23 ` Christian Borntraeger
  2017-12-21 12:00   ` David Hildenbrand
  2018-01-22 11:23 ` David Hildenbrand
  2018-01-23 21:15 ` David Hildenbrand
  24 siblings, 1 reply; 67+ messages in thread
From: Christian Borntraeger @ 2017-12-20 12:23 UTC (permalink / raw)
  To: Janosch Frank, kvm; +Cc: schwidefsky, david, dominik.dingel, linux-s390

FWIW, this patch set has survived some testing on my side (with storage
keys, with VSIE, with both), so I would say that after a respin with some 
patch squashing we give it some more days for the last reviews and then apply
the whole thing via a topic branch in Martin's s390 tree. I will merge
this branch as well to solve the potential conflicts with
David's patch (s390x/mm: cleanup gmap_pte_op_walk()), which is still
pending in my tree.

Christian





On 12/13/2017 01:53 PM, Janosch Frank wrote:
> Since the z10 s390 does support 1M pages, but whereas hugetlbfs
> support was added quite fast, KVM always used standard 4k pages for
> guest backings.
> 
> This patchset adds full support for 1M huge page backings for s390
> KVM guests. I.e. we also support VSIE (nested vms) for these guests
> and are therefore able to run all combinations of backings for all
> layers of guests.
> 
> When running a VSIE guest in a huge page backed guest, we need to
> split some huge pages to be able to set granular protection. This way
> we avoid a prot/unprot cycle if prefixes and VSIE pages containing
> level 3 gmap DAT tables share the same segment, as the prefix has to
> be accessible at all times and the VSIE page has to be write
> protected.
> 
> TODO:
> * Cleanups & Documentation
> * Refactoring to get rid of a lot of indents
> * Find a way to reduce or beautify bit checks on table entries
> * Storage key support for split pages (will be a separate bugfix)
> * Regression testing
> * Testing large setups
> * Testing multi level VSIE
> 
> V2:
> 	* Incorporated changes from David's cleanup
> 	* Now flushing with IDTE_NODAT for protection transfers.
> 	* Added RRBE huge page handling for g2 -> g3 skey emulation
> 	* Added documentation for capability
> 	* Renamed GMAP_ENTRY_* constants
> 	* Added SEGMENT hardware bits constants
> 	* Improved some patch descriptions
> 	* General small improvements
> 	* Introduced pte_from_pmd function
> 
> Accomplished testing:
> l2: KVM guest
> l3: nested KVM guest
> 
> * 1m l2 guests
> * VSIE (l3) 4k and 1m guests on 1m l2
> * 1m l2 -> l2 migration with 4k/1m l3 guests
> * l3 -> l2 migration
> * postcopy works every second try, seems to be QEMU or my setup
> 
> 
> The initial prototype was started by Dominik Dingel. I had the
> pleasure of adding the VSIE part, the protection transfers and the
> optimizations. A huge thanks to Christian and Martin who review(ed)
> and helped debugging/designing.
> 
> Dominik Dingel (2):
>   s390/mm: hugetlb pages within a gmap can not be freed
>   s390/mm: clear huge page storage keys on enable_skey
> 
> Janosch Frank (20):
>   s390/mm: make gmap_protect_range more modular
>   s390/mm: Abstract gmap notify bit setting
>   s390/mm: add gmap PMD invalidation notification
>   s390/mm: Add gmap pmd invalidation and clearing
>   s390/mm: Introduce gmap_pmdp_xchg
>   RFC: s390/mm: Transfer guest pmd protection to host
>   s390/mm: Add huge page dirty sync support
>   s390/mm: Add huge pmd storage key handling
>   s390/mm: Remove superfluous parameter
>   s390/mm: Add gmap_protect_large read protection support
>   s390/mm: Make gmap_read_table EDAT1 compatible
>   s390/mm: Make protect_rmap EDAT1 compatible
>   s390/mm: GMAP read table extensions
>   s390/mm: Add shadow segment code
>   s390/mm: Add VSIE reverse fake case
>   s390/mm: Remove gmap_pte_op_walk
>   s390/mm: Split huge pages if granular protection is needed
>   s390/mm: Enable gmap huge pmd support
>   KVM: s390: Add KVM HPAGE capability
>   RFC: s390/mm: Add gmap lock classes
> 
>  Documentation/virtual/kvm/api.txt |   10 +
>  arch/s390/include/asm/gmap.h      |   39 +-
>  arch/s390/include/asm/pgtable.h   |   18 +-
>  arch/s390/kvm/gaccess.c           |   64 +-
>  arch/s390/kvm/kvm-s390.c          |   19 +-
>  arch/s390/mm/fault.c              |   10 +-
>  arch/s390/mm/gmap.c               | 1275 +++++++++++++++++++++++++++++++++----
>  arch/s390/mm/pageattr.c           |    6 +-
>  arch/s390/mm/pgtable.c            |  176 ++++-
>  include/uapi/linux/kvm.h          |    1 +
>  10 files changed, 1445 insertions(+), 173 deletions(-)
> 

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC/PATCH v2 22/22] RFC: s390/mm: Add gmap lock classes
  2017-12-13 12:53 ` [RFC/PATCH v2 22/22] RFC: s390/mm: Add gmap lock classes Janosch Frank
@ 2017-12-20 12:24   ` Christian Borntraeger
  2017-12-20 12:36     ` Janosch Frank
  0 siblings, 1 reply; 67+ messages in thread
From: Christian Borntraeger @ 2017-12-20 12:24 UTC (permalink / raw)
  To: Janosch Frank, kvm; +Cc: schwidefsky, david, dominik.dingel, linux-s390

On 12/13/2017 01:53 PM, Janosch Frank wrote:
> A shadow gmap and its parent are locked right after each other when
> doing VSIE management. Lockdep can't differentiate between the two
> classes without some help.
> 
> TODO: Not sure yet if I have to annotate all and if gmap_pmd_walk will
> be used by both shadow and parent


I think the annotations are just fine here.

> 
> Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com>
> ---
>  arch/s390/include/asm/gmap.h |  6 ++++++
>  arch/s390/mm/gmap.c          | 40 +++++++++++++++++++---------------------
>  2 files changed, 25 insertions(+), 21 deletions(-)
> 
> diff --git a/arch/s390/include/asm/gmap.h b/arch/s390/include/asm/gmap.h
> index a187033..6287aca 100644
> --- a/arch/s390/include/asm/gmap.h
> +++ b/arch/s390/include/asm/gmap.h
> @@ -20,6 +20,12 @@
>  #define _SEGMENT_ENTRY_GMAP_UC		0x4000	/* user dirty (migration) */
>  #define _SEGMENT_ENTRY_GMAP_VSIE	0x8000	/* vsie bit */
> 
> +
> +enum gmap_lock_class {
> +	GMAP_LOCK_PARENT,
> +	GMAP_LOCK_SHADOW
> +};
> +
>  /**
>   * struct gmap_struct - guest address space
>   * @list: list head for the mm->context gmap list
> diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
> index cb03646..86a12f3 100644
> --- a/arch/s390/mm/gmap.c
> +++ b/arch/s390/mm/gmap.c
> @@ -199,10 +199,8 @@ static void gmap_free(struct gmap *gmap)
>  	gmap_radix_tree_free(&gmap->host_to_guest);
> 
>  	/* Free split pmd page tables */
> -	spin_lock(&gmap->guest_table_lock);
>  	list_for_each_entry_safe(page, next, &gmap->split_list, lru)
>  		page_table_free_pgste(page);
> -	spin_unlock(&gmap->guest_table_lock);

Any reason why you only remove these?

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC/PATCH v2 22/22] RFC: s390/mm: Add gmap lock classes
  2017-12-20 12:24   ` Christian Borntraeger
@ 2017-12-20 12:36     ` Janosch Frank
  0 siblings, 0 replies; 67+ messages in thread
From: Janosch Frank @ 2017-12-20 12:36 UTC (permalink / raw)
  To: Christian Borntraeger, kvm; +Cc: schwidefsky, david, dominik.dingel, linux-s390


[-- Attachment #1.1: Type: text/plain, Size: 1378 bytes --]

On 20.12.2017 13:24, Christian Borntraeger wrote:
> On 12/13/2017 01:53 PM, Janosch Frank wrote:
>> A shadow gmap and its parent are locked right after each other when
>> doing VSIE management. Lockdep can't differentiate between the two
>> classes without some help.
>>  /**
>>   * struct gmap_struct - guest address space
>>   * @list: list head for the mm->context gmap list
>> diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
>> index cb03646..86a12f3 100644
>> --- a/arch/s390/mm/gmap.c
>> +++ b/arch/s390/mm/gmap.c
>> @@ -199,10 +199,8 @@ static void gmap_free(struct gmap *gmap)
>>  	gmap_radix_tree_free(&gmap->host_to_guest);
>>
>>  	/* Free split pmd page tables */
>> -	spin_lock(&gmap->guest_table_lock);
>>  	list_for_each_entry_safe(page, next, &gmap->split_list, lru)
>>  		page_table_free_pgste(page);
>> -	spin_unlock(&gmap->guest_table_lock);
> 
> Any reason why you only remove these?

They were inserted when I had some locking problems on split pages.
After I spoke to Martin and checked that we absolutely have no reference
to the gmap anymore, and hence cannot end up in gmap_split_free at the
same time, I decided to remove them.

However, they should never have been introduced in the split patch in
the first place, rather than being removed here... My current internal
branch has this fixed, as well as some other rebasing mistakes.


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC/PATCH v2 21/22] KVM: s390: Add KVM HPAGE capability
  2017-12-13 12:53 ` [RFC/PATCH v2 21/22] KVM: s390: Add KVM HPAGE capability Janosch Frank
@ 2017-12-20 13:02   ` Cornelia Huck
  2017-12-20 13:17     ` Janosch Frank
  0 siblings, 1 reply; 67+ messages in thread
From: Cornelia Huck @ 2017-12-20 13:02 UTC (permalink / raw)
  To: Janosch Frank
  Cc: kvm, schwidefsky, borntraeger, david, dominik.dingel, linux-s390

On Wed, 13 Dec 2017 13:53:32 +0100
Janosch Frank <frankja@linux.vnet.ibm.com> wrote:

> KVM huge page backing support can not be easily tested under
> s390. Currently testing is only possible after most of the guest has
> already been set up.
> 
> To indicate, that KVM has huge page backing support, we add the
> KVM_CAP_S390_HPAGE capability. This does not mean, that transparent
> huge pages are supported.

Do you expect to use a different cap for non-1MB huge pages? If yes,
this should probably be mentioned here.

> 
> Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com>
> ---
>  Documentation/virtual/kvm/api.txt | 10 ++++++++++
>  arch/s390/kvm/kvm-s390.c          |  1 +
>  include/uapi/linux/kvm.h          |  1 +
>  3 files changed, 12 insertions(+)
> 
> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> index 57d3ee9..a56b0af 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -4369,3 +4369,13 @@ Parameters: none
>  This capability indicates if the flic device will be able to get/set the
>  AIS states for migration via the KVM_DEV_FLIC_AISM_ALL attribute and allows
>  to discover this without having to create a flic device.
> +
> +8.14 KVM_CAP_S390_HPAGE
> +
> +Architectures: s390
> +This capability, if KVM_CHECK_EXTENSION indicates that it is
> +available, means that KVM supports VMs that are memory backed through
> +hugetlbfs with 1 megabyte pages.
> +
> +While it is generally possible to create and start a VM without this
> +support, the VM will not be functional.

This sentence applies only to the hugepage case, doesn't it?

> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> index 73fb3bc..8951ad4 100644
> --- a/arch/s390/kvm/kvm-s390.c
> +++ b/arch/s390/kvm/kvm-s390.c
> @@ -393,6 +393,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>  	case KVM_CAP_S390_CMMA_MIGRATION:
>  	case KVM_CAP_S390_AIS:
>  	case KVM_CAP_S390_AIS_MIGRATION:
> +	case KVM_CAP_S390_HPAGE:
>  		r = 1;
>  		break;
>  	case KVM_CAP_S390_MEM_OP:
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 496e59a..aa3d707 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -932,6 +932,7 @@ struct kvm_ppc_resize_hpt {
>  #define KVM_CAP_HYPERV_SYNIC2 148
>  #define KVM_CAP_HYPERV_VP_INDEX 149
>  #define KVM_CAP_S390_AIS_MIGRATION 150
> +#define KVM_CAP_S390_HPAGE 151
>  
>  #ifdef KVM_CAP_IRQ_ROUTING
>  

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC/PATCH v2 21/22] KVM: s390: Add KVM HPAGE capability
  2017-12-20 13:02   ` Cornelia Huck
@ 2017-12-20 13:17     ` Janosch Frank
  2017-12-20 13:21       ` Cornelia Huck
  0 siblings, 1 reply; 67+ messages in thread
From: Janosch Frank @ 2017-12-20 13:17 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: kvm, schwidefsky, borntraeger, david, dominik.dingel, linux-s390


[-- Attachment #1.1: Type: text/plain, Size: 2210 bytes --]

On 20.12.2017 14:02, Cornelia Huck wrote:
> On Wed, 13 Dec 2017 13:53:32 +0100
> Janosch Frank <frankja@linux.vnet.ibm.com> wrote:
> 
>> KVM huge page backing support can not be easily tested under
>> s390. Currently testing is only possible after most of the guest has
>> already been set up.
>>
>> To indicate, that KVM has huge page backing support, we add the
>> KVM_CAP_S390_HPAGE capability. This does not mean, that transparent
>> huge pages are supported.
> 
> Do you expect to use a different cap for non-1MB huge pages? If yes,
> this should probably be mentioned here.

Yes, probably KVM_CAP_S390_HPAGE2, but this will not come in the near future.
However, this commit message lacks the information that we only support
1M pages; I'll add that.

1M guest backing pages ought to be enough for everybody (TM), and the
pain of supporting 2G pages is expected to be an order of magnitude
bigger than the one for this patchset.

> 
>>
>> Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com>
>> ---
>>  Documentation/virtual/kvm/api.txt | 10 ++++++++++
>>  arch/s390/kvm/kvm-s390.c          |  1 +
>>  include/uapi/linux/kvm.h          |  1 +
>>  3 files changed, 12 insertions(+)
>>
>> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
>> index 57d3ee9..a56b0af 100644
>> --- a/Documentation/virtual/kvm/api.txt
>> +++ b/Documentation/virtual/kvm/api.txt
>> @@ -4369,3 +4369,13 @@ Parameters: none
>>  This capability indicates if the flic device will be able to get/set the
>>  AIS states for migration via the KVM_DEV_FLIC_AISM_ALL attribute and allows
>>  to discover this without having to create a flic device.
>> +
>> +8.14 KVM_CAP_S390_HPAGE
>> +
>> +Architectures: s390
>> +This capability, if KVM_CHECK_EXTENSION indicates that it is
>> +available, means that KVM supports VMs that are memory backed through
>> +hugetlbfs with 1 megabyte pages.
>> +
>> +While it is generally possible to create and start a VM without this
>> +support, the VM will not be functional.
> 
> This sentence applies only to the hugepage case, doesn't it?

Yes, of course.
I'll do a: s/a/such a/

Expect a QEMU fencing patch next year.


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC/PATCH v2 21/22] KVM: s390: Add KVM HPAGE capability
  2017-12-20 13:17     ` Janosch Frank
@ 2017-12-20 13:21       ` Cornelia Huck
  0 siblings, 0 replies; 67+ messages in thread
From: Cornelia Huck @ 2017-12-20 13:21 UTC (permalink / raw)
  To: Janosch Frank
  Cc: kvm, schwidefsky, borntraeger, david, dominik.dingel, linux-s390

On Wed, 20 Dec 2017 14:17:24 +0100
Janosch Frank <frankja@linux.vnet.ibm.com> wrote:

> On 20.12.2017 14:02, Cornelia Huck wrote:
> > On Wed, 13 Dec 2017 13:53:32 +0100
> > Janosch Frank <frankja@linux.vnet.ibm.com> wrote:
> >   
> >> KVM huge page backing support can not be easily tested under
> >> s390. Currently testing is only possible after most of the guest has
> >> already been set up.
> >>
> >> To indicate, that KVM has huge page backing support, we add the
> >> KVM_CAP_S390_HPAGE capability. This does not mean, that transparent
> >> huge pages are supported.  
> > 
> > Do you expect to use a different cap for non-1MB huge pages? If yes,
> > this should probably be mentioned here.  
> 
> Yes probably KVM_CAP_S390_HPAGE2, but this will not come in the near future.
> However this commit message lacks the information, that we only support
> 1m pages, I'll add that.

OK

> 
> 1m guest backing pages ought to be enough for everybody (TM) 

:)

> and the
> pain to support 2g pages is expected to be a magnitude bigger than the
> one for this patchset.

This patchset already looks complicated enough to me...

> 
> >   
> >>
> >> Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com>
> >> ---
> >>  Documentation/virtual/kvm/api.txt | 10 ++++++++++
> >>  arch/s390/kvm/kvm-s390.c          |  1 +
> >>  include/uapi/linux/kvm.h          |  1 +
> >>  3 files changed, 12 insertions(+)
> >>
> >> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> >> index 57d3ee9..a56b0af 100644
> >> --- a/Documentation/virtual/kvm/api.txt
> >> +++ b/Documentation/virtual/kvm/api.txt
> >> @@ -4369,3 +4369,13 @@ Parameters: none
> >>  This capability indicates if the flic device will be able to get/set the
> >>  AIS states for migration via the KVM_DEV_FLIC_AISM_ALL attribute and allows
> >>  to discover this without having to create a flic device.
> >> +
> >> +8.14 KVM_CAP_S390_HPAGE
> >> +
> >> +Architectures: s390
> >> +This capability, if KVM_CHECK_EXTENSION indicates that it is
> >> +available, means that KVM supports VMs that are memory backed through
> >> +hugetlbfs with 1 megabyte pages.
> >> +
> >> +While it is generally possible to create and start a VM without this
> >> +support, the VM will not be functional.  
> > 
> > This sentence applies only to the hugepage case, doesn't it?  
> 
> Yes, of course.
> I'll do a: s/a/such a/
> 
> Expect a QEMU fencing patch next year.

Sounds good!

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC/PATCH v2 11/22] s390/mm: Remove superfluous parameter
  2017-12-13 12:53 ` [RFC/PATCH v2 11/22] s390/mm: Remove superfluous parameter Janosch Frank
@ 2017-12-21  9:22   ` Janosch Frank
  2018-01-16 12:39     ` Janosch Frank
  2018-01-16 13:11   ` David Hildenbrand
  2018-01-22 13:14   ` Christian Borntraeger
  2 siblings, 1 reply; 67+ messages in thread
From: Janosch Frank @ 2017-12-21  9:22 UTC (permalink / raw)
  To: kvm; +Cc: schwidefsky, borntraeger, david, dominik.dingel, linux-s390


[-- Attachment #1.1: Type: text/plain, Size: 1321 bytes --]

On 13.12.2017 13:53, Janosch Frank wrote:
> It seems it hasn't even been used before the last cleanup and was
> overlooked.

This one is an independent cleanup.

Can we move this out of this series and schedule it for earlier
inclusion into mainline?

> 
> Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com>
> ---
>  arch/s390/mm/gmap.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
> index ffc11d8..d396da8 100644
> --- a/arch/s390/mm/gmap.c
> +++ b/arch/s390/mm/gmap.c
> @@ -2237,7 +2237,7 @@ EXPORT_SYMBOL_GPL(gmap_shadow_page);
>   * Called with sg->parent->shadow_lock.
>   */
>  static void gmap_shadow_notify(struct gmap *sg, unsigned long vmaddr,
> -			       unsigned long gaddr, pte_t *pte)
> +			       unsigned long gaddr)
>  {
>  	struct gmap_rmap *rmap, *rnext, *head;
>  	unsigned long start, end, bits, raddr;
> @@ -2322,7 +2322,7 @@ void ptep_notify(struct mm_struct *mm, unsigned long vmaddr,
>  			spin_lock(&gmap->shadow_lock);
>  			list_for_each_entry_safe(sg, next,
>  						 &gmap->children, list)
> -				gmap_shadow_notify(sg, vmaddr, gaddr, pte);
> +				gmap_shadow_notify(sg, vmaddr, gaddr);
>  			spin_unlock(&gmap->shadow_lock);
>  		}
>  		if (bits & PGSTE_IN_BIT)
> 



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC/PATCH v2 03/22] s390/mm: add gmap PMD invalidation notification
  2017-12-13 12:53 ` [RFC/PATCH v2 03/22] s390/mm: add gmap PMD invalidation notification Janosch Frank
@ 2017-12-21  9:24   ` Janosch Frank
  2018-01-22 11:46   ` David Hildenbrand
  2018-01-22 11:56   ` David Hildenbrand
  2 siblings, 0 replies; 67+ messages in thread
From: Janosch Frank @ 2017-12-21  9:24 UTC (permalink / raw)
  To: kvm; +Cc: schwidefsky, borntraeger, david, dominik.dingel, linux-s390


[-- Attachment #1.1: Type: text/plain, Size: 1468 bytes --]

On 13.12.2017 13:53, Janosch Frank wrote:
[...]
>  #define GMAP_NOTIFY_SHADOW	0x2
>  #define GMAP_NOTIFY_MPROT	0x1
> 
> +/* Status bits in the gmap segment entry. */
> +#define _SEGMENT_ENTRY_GMAP_IN		0x0001	/* invalidation notify bit */
> +
>  /**
>   * struct gmap_struct - guest address space
>   * @list: list head for the mm->context gmap list
> diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
> index 57d7bc9..ba3840c 100644
> --- a/arch/s390/include/asm/pgtable.h
> +++ b/arch/s390/include/asm/pgtable.h
> @@ -269,8 +269,10 @@ static inline int is_module_addr(void *addr)
>  #define _REGION_ENTRY_BITS_LARGE 0xffffffff8000fe2fUL
> 
>  /* Bits in the segment table entry */
> -#define _SEGMENT_ENTRY_BITS	0xfffffffffffffe33UL
> -#define _SEGMENT_ENTRY_BITS_LARGE 0xfffffffffff0ff33UL
> +#define _SEGMENT_ENTRY_BITS			0xfffffffffffffe33UL
> +#define _SEGMENT_ENTRY_BITS_LARGE 		0xfffffffffff0ff33UL
> +#define _SEGMENT_ENTRY_HARDWARE_BITS		0xfffffffffffffe30UL
> +#define _SEGMENT_ENTRY_HARDWARE_BITS_LARGE 	0xfffffffffff00730UL
>  #define _SEGMENT_ENTRY_ORIGIN_LARGE ~0xfffffUL /* large page address	    */
>  #define _SEGMENT_ENTRY_ORIGIN	~0x7ffUL/* page table origin		    */
>  #define _SEGMENT_ENTRY_PROTECT	0x200	/* segment protection bit	    */

@Martin: Are these constants fine with you?

The whitespace damage that gets fixed in the next patch is already
addressed in the next version :)


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC/PATCH v2 00/22] KVM/s390: Hugetlbfs enablement
  2017-12-20 12:23 ` [RFC/PATCH v2 00/22] KVM/s390: Hugetlbfs enablement Christian Borntraeger
@ 2017-12-21 12:00   ` David Hildenbrand
  2017-12-22  9:08     ` Christian Borntraeger
  0 siblings, 1 reply; 67+ messages in thread
From: David Hildenbrand @ 2017-12-21 12:00 UTC (permalink / raw)
  To: Christian Borntraeger, Janosch Frank, kvm
  Cc: schwidefsky, dominik.dingel, linux-s390

On 20.12.2017 13:23, Christian Borntraeger wrote:
> FWIW, this patch set has survived some testing on my side (with storage
> keys, with VSIE, with both), so I would say that after a respin with some 
> patch sqashing we give it some more days for the last reviews and then apply
> the whole thing via a topic branch via Martins s390 tree. I will merge
> this branch as well to solve the potential conflicts with 
> Davids patch ( s390x/mm: cleanup gmap_pte_op_walk() ) which is still
> pending in my tree
> 

"postcopy works every second try, seems to be QEMU or my setup".
Shouldn't we first understand why? gmap is a very sensitive topic and we
should rather spend more time understanding everything. We don't want
guest escalation bugs.

I'd love to spend more time reviewing this, but it won't happen within
the next 2 weeks (Christmas holidays).

Thanks!

> Christian
> 
> 
> 
> 
> 
> On 12/13/2017 01:53 PM, Janosch Frank wrote:
>> Since the z10 s390 does support 1M pages, but whereas hugetlbfs
>> support was added quite fast, KVM always used standard 4k pages for
>> guest backings.
>>
>> This patchset adds full support for 1M huge page backings for s390
>> KVM guests. I.e. we also support VSIE (nested vms) for these guests
>> and are therefore able to run all combinations of backings for all
>> layers of guests.
>>
>> When running a VSIE guest in a huge page backed guest, we need to
>> split some huge pages to be able to set granular protection. This way
>> we avoid a prot/unprot cycle if prefixes and VSIE pages containing
>> level 3 gmap DAT tables share the same segment, as the prefix has to
>> be accessible at all times and the VSIE page has to be write
>> protected.
>>
>> TODO:
>> * Cleanups & Documentation
>> * Refactoring to get rid of a lot of indents
>> * Find a way to reduce or beautify bit checks on table entries
>> * Storage key support for split pages (will be a separate bugfix)
>> * Regression testing
>> * Testing large setups
>> * Testing multi level VSIE
>>
>> V2:
>> 	* Incorporated changes from David's cleanup
>> 	* Now flushing with IDTE_NODAT for protection transfers.
>> 	* Added RRBE huge page handling for g2 -> g3 skey emulation
>> 	* Added documentation for capability
>> 	* Renamed GMAP_ENTRY_* constants
>> 	* Added SEGMENT hardware bits constants
>> 	* Improved some patch descriptions
>> 	* General small improvements
>> 	* Introduced pte_from_pmd function
>>
>> Accomplished testing:
>> l2: KVM guest
>> l3: nested KVM guest
>>
>> * 1m l2 guests
>> * VSIE (l3) 4k and 1m guests on 1m l2
>> * 1m l2 -> l2 migration with 4k/1m l3 guests
>> * l3 -> l2 migration
>> * postcopy works every second try, seems to be QEMU or my setup
>>
>>
>> The initial prototype was started by Dominik Dingel. I had the
>> pleasure of adding the VSIE part, the protection transfers and the
>> optimizations. A huge thanks to Christian and Martin who review(ed)
>> and helped debugging/designing.
>>
>> Dominik Dingel (2):
>>   s390/mm: hugetlb pages within a gmap can not be freed
>>   s390/mm: clear huge page storage keys on enable_skey
>>
>> Janosch Frank (20):
>>   s390/mm: make gmap_protect_range more modular
>>   s390/mm: Abstract gmap notify bit setting
>>   s390/mm: add gmap PMD invalidation notification
>>   s390/mm: Add gmap pmd invalidation and clearing
>>   s390/mm: Introduce gmap_pmdp_xchg
>>   RFC: s390/mm: Transfer guest pmd protection to host
>>   s390/mm: Add huge page dirty sync support
>>   s390/mm: Add huge pmd storage key handling
>>   s390/mm: Remove superfluous parameter
>>   s390/mm: Add gmap_protect_large read protection support
>>   s390/mm: Make gmap_read_table EDAT1 compatible
>>   s390/mm: Make protect_rmap EDAT1 compatible
>>   s390/mm: GMAP read table extensions
>>   s390/mm: Add shadow segment code
>>   s390/mm: Add VSIE reverse fake case
>>   s390/mm: Remove gmap_pte_op_walk
>>   s390/mm: Split huge pages if granular protection is needed
>>   s390/mm: Enable gmap huge pmd support
>>   KVM: s390: Add KVM HPAGE capability
>>   RFC: s390/mm: Add gmap lock classes
>>
>>  Documentation/virtual/kvm/api.txt |   10 +
>>  arch/s390/include/asm/gmap.h      |   39 +-
>>  arch/s390/include/asm/pgtable.h   |   18 +-
>>  arch/s390/kvm/gaccess.c           |   64 +-
>>  arch/s390/kvm/kvm-s390.c          |   19 +-
>>  arch/s390/mm/fault.c              |   10 +-
>>  arch/s390/mm/gmap.c               | 1275 +++++++++++++++++++++++++++++++++----
>>  arch/s390/mm/pageattr.c           |    6 +-
>>  arch/s390/mm/pgtable.c            |  176 ++++-
>>  include/uapi/linux/kvm.h          |    1 +
>>  10 files changed, 1445 insertions(+), 173 deletions(-)
>>
> 


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC/PATCH v2 00/22] KVM/s390: Hugetlbfs enablement
  2017-12-21 12:00   ` David Hildenbrand
@ 2017-12-22  9:08     ` Christian Borntraeger
  2018-01-02  0:02       ` Janosch Frank
  0 siblings, 1 reply; 67+ messages in thread
From: Christian Borntraeger @ 2017-12-22  9:08 UTC (permalink / raw)
  To: David Hildenbrand, Janosch Frank, kvm
  Cc: schwidefsky, dominik.dingel, linux-s390



On 12/21/2017 01:00 PM, David Hildenbrand wrote:
> On 20.12.2017 13:23, Christian Borntraeger wrote:
>> FWIW, this patch set has survived some testing on my side (with storage
>> keys, with VSIE, with both), so I would say that after a respin with some 
>> patch sqashing we give it some more days for the last reviews and then apply
>> the whole thing via a topic branch via Martins s390 tree. I will merge
>> this branch as well to solve the potential conflicts with 
>> Davids patch ( s390x/mm: cleanup gmap_pte_op_walk() ) which is still
>> pending in my tree
>>
> 
> "postcopy works every second try, seems to be QEMU or my setup".
> Shouldn't we first understand why? gmap is a very sensible topic and we
> should rather spend more time understanding everything. We don't want
> guest escalation bugs.


I somehow missed that in the cover letter. Yes, we should make sure that this works
(on the other hand, I usually blame postcopy, because userfault makes several assumptions
which are somewhat "brave"; see the empty zero page issue, which could in theory also break
if a postcopy guest does some clever ballooning in and out). But anyway, I will have a look
into postcopy after Christmas.

FWIW, Martin and I looked over the patches and they seem good enough (after we have fixed postcopy).
I also did some testing on that and it seems to work fine so far (including classic migration).

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC/PATCH v2 00/22] KVM/s390: Hugetlbfs enablement
  2017-12-22  9:08     ` Christian Borntraeger
@ 2018-01-02  0:02       ` Janosch Frank
  0 siblings, 0 replies; 67+ messages in thread
From: Janosch Frank @ 2018-01-02  0:02 UTC (permalink / raw)
  To: Christian Borntraeger, David Hildenbrand, kvm
  Cc: schwidefsky, dominik.dingel, linux-s390


[-- Attachment #1.1: Type: text/plain, Size: 2285 bytes --]

On 22.12.2017 10:08, Christian Borntraeger wrote:
> 
> 
> On 12/21/2017 01:00 PM, David Hildenbrand wrote:
>> On 20.12.2017 13:23, Christian Borntraeger wrote:
>>> FWIW, this patch set has survived some testing on my side (with storage
>>> keys, with VSIE, with both), so I would say that after a respin with some 
>>> patch sqashing we give it some more days for the last reviews and then apply
>>> the whole thing via a topic branch via Martins s390 tree. I will merge
>>> this branch as well to solve the potential conflicts with 
>>> Davids patch ( s390x/mm: cleanup gmap_pte_op_walk() ) which is still
>>> pending in my tree
>>>
>>
>> "postcopy works every second try, seems to be QEMU or my setup".
>> Shouldn't we first understand why? gmap is a very sensible topic and we
>> should rather spend more time understanding everything. We don't want
>> guest escalation bugs.
> 
> 
> I somehow missed that in the cover letter. Yes we should make sure that this works (
> on the other hand I usually blame postcopy because userfault makes several assumptions
> which are somewhat "brave". See the empty zero page issue, which could in theory also break
> if a postcopy guest does some clever ballooning in and out). But anyway I will have a look
> after christmas into postcopy.
> 
> FWIW; Martin and I looked over the patches and they seem good enough (after we fixed postcopy).
> I also did some testing on that and it seems to work fine so far (including classic migration).
> 

Postcopy fails with the following message on the source and results in a
paused VM, which can be resumed without any problem:
qemu-system-s390x: RP: Received invalid message 0x0000 length 0x0000

This doesn't tell me anything, but Peter Xu seems to have a QEMU series
which improves postcopy error handling. I'll try it later today.

A second postcopy migration with the resumed VM succeeds. The VM and its
VSIE guests survive a memtester run without logging any errors, and the
destination does not log any errors either.

All in all, I'm not really concerned about postcopy, although it certainly
needs fixing; I'll have a closer look this week.

@David: The complicated parts are the new VSIE modes and the splitting.
If you could have a look at that I'd really appreciate it.


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC/PATCH v2 11/22] s390/mm: Remove superfluous parameter
  2017-12-21  9:22   ` Janosch Frank
@ 2018-01-16 12:39     ` Janosch Frank
  0 siblings, 0 replies; 67+ messages in thread
From: Janosch Frank @ 2018-01-16 12:39 UTC (permalink / raw)
  To: kvm; +Cc: schwidefsky, borntraeger, david, dominik.dingel, linux-s390


[-- Attachment #1.1: Type: text/plain, Size: 1500 bytes --]

On 21.12.2017 10:22, Janosch Frank wrote:
> On 13.12.2017 13:53, Janosch Frank wrote:
>> It seems it hasn't even been used before the last cleanup and was
>> overlooked.
> 
> This one is an independent cleanup.
> 
> Can we move this out of this series and schedule it for earlier
> inclusion into mainline?

Polite ping
The pte parameter has never been used, not even when it was introduced by Martin.

> 
>>
>> Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com>
>> ---
>>  arch/s390/mm/gmap.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
>> index ffc11d8..d396da8 100644
>> --- a/arch/s390/mm/gmap.c
>> +++ b/arch/s390/mm/gmap.c
>> @@ -2237,7 +2237,7 @@ EXPORT_SYMBOL_GPL(gmap_shadow_page);
>>   * Called with sg->parent->shadow_lock.
>>   */
>>  static void gmap_shadow_notify(struct gmap *sg, unsigned long vmaddr,
>> -			       unsigned long gaddr, pte_t *pte)
>> +			       unsigned long gaddr)
>>  {
>>  	struct gmap_rmap *rmap, *rnext, *head;
>>  	unsigned long start, end, bits, raddr;
>> @@ -2322,7 +2322,7 @@ void ptep_notify(struct mm_struct *mm, unsigned long vmaddr,
>>  			spin_lock(&gmap->shadow_lock);
>>  			list_for_each_entry_safe(sg, next,
>>  						 &gmap->children, list)
>> -				gmap_shadow_notify(sg, vmaddr, gaddr, pte);
>> +				gmap_shadow_notify(sg, vmaddr, gaddr);
>>  			spin_unlock(&gmap->shadow_lock);
>>  		}
>>  		if (bits & PGSTE_IN_BIT)
>>
> 
> 



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC/PATCH v2 11/22] s390/mm: Remove superfluous parameter
  2017-12-13 12:53 ` [RFC/PATCH v2 11/22] s390/mm: Remove superfluous parameter Janosch Frank
  2017-12-21  9:22   ` Janosch Frank
@ 2018-01-16 13:11   ` David Hildenbrand
  2018-01-22 13:14   ` Christian Borntraeger
  2 siblings, 0 replies; 67+ messages in thread
From: David Hildenbrand @ 2018-01-16 13:11 UTC (permalink / raw)
  To: Janosch Frank, kvm; +Cc: schwidefsky, borntraeger, dominik.dingel, linux-s390

On 13.12.2017 13:53, Janosch Frank wrote:
> It seems it hasn't even been used before the last cleanup and was
> overlooked.
> 
> Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com>
> ---
>  arch/s390/mm/gmap.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
> index ffc11d8..d396da8 100644
> --- a/arch/s390/mm/gmap.c
> +++ b/arch/s390/mm/gmap.c
> @@ -2237,7 +2237,7 @@ EXPORT_SYMBOL_GPL(gmap_shadow_page);
>   * Called with sg->parent->shadow_lock.
>   */
>  static void gmap_shadow_notify(struct gmap *sg, unsigned long vmaddr,
> -			       unsigned long gaddr, pte_t *pte)
> +			       unsigned long gaddr)
>  {
>  	struct gmap_rmap *rmap, *rnext, *head;
>  	unsigned long start, end, bits, raddr;
> @@ -2322,7 +2322,7 @@ void ptep_notify(struct mm_struct *mm, unsigned long vmaddr,
>  			spin_lock(&gmap->shadow_lock);
>  			list_for_each_entry_safe(sg, next,
>  						 &gmap->children, list)
> -				gmap_shadow_notify(sg, vmaddr, gaddr, pte);
> +				gmap_shadow_notify(sg, vmaddr, gaddr);
>  			spin_unlock(&gmap->shadow_lock);
>  		}
>  		if (bits & PGSTE_IN_BIT)
> 

Could be from an earlier prototype, can be removed.

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC/PATCH v2 00/22] KVM/s390: Hugetlbfs enablement
  2017-12-13 12:53 [RFC/PATCH v2 00/22] KVM/s390: Hugetlbfs enablement Janosch Frank
                   ` (22 preceding siblings ...)
  2017-12-20 12:23 ` [RFC/PATCH v2 00/22] KVM/s390: Hugetlbfs enablement Christian Borntraeger
@ 2018-01-22 11:23 ` David Hildenbrand
  2018-01-22 11:56   ` Christian Borntraeger
  2018-01-23 21:15 ` David Hildenbrand
  24 siblings, 1 reply; 67+ messages in thread
From: David Hildenbrand @ 2018-01-22 11:23 UTC (permalink / raw)
  To: Janosch Frank, kvm; +Cc: schwidefsky, borntraeger, dominik.dingel, linux-s390

On 13.12.2017 13:53, Janosch Frank wrote:
> Since the z10 s390 does support 1M pages, but whereas hugetlbfs
> support was added quite fast, KVM always used standard 4k pages for
> guest backings.
> 
> This patchset adds full support for 1M huge page backings for s390
> KVM guests. I.e. we also support VSIE (nested vms) for these guests
> and are therefore able to run all combinations of backings for all
> layers of guests.
> 
> When running a VSIE guest in a huge page backed guest, we need to
> split some huge pages to be able to set granular protection. This way
> we avoid a prot/unprot cycle if prefixes and VSIE pages containing
> level 3 gmap DAT tables share the same segment, as the prefix has to
> be accessible at all times and the VSIE page has to be write
> protected.
> 
> TODO:
> * Cleanups & Documentation
> * Refactoring to get rid of a lot of indents
> * Find a way to reduce or beautify bit checks on table entries
> * Storage key support for split pages (will be a separate bugfix)
> * Regression testing
> * Testing large setups
> * Testing multi level VSIE
> 
> V2:
> 	* Incorporated changes from David's cleanup
> 	* Now flushing with IDTE_NODAT for protection transfers.
> 	* Added RRBE huge page handling for g2 -> g3 skey emulation
> 	* Added documentation for capability
> 	* Renamed GMAP_ENTRY_* constants
> 	* Added SEGMENT hardware bits constants
> 	* Improved some patch descriptions
> 	* General small improvements
> 	* Introduced pte_from_pmd function
> 
> Accomplished testing:
> l2: KVM guest
> l3: nested KVM guest
> 
> * 1m l2 guests
> * VSIE (l3) 4k and 1m guests on 1m l2
> * 1m l2 -> l2 migration with 4k/1m l3 guests
> * l3 -> l2 migration
> * postcopy works every second try, seems to be QEMU or my setup
> 
> 
> The initial prototype was started by Dominik Dingel. I had the
> pleasure of adding the VSIE part, the protection transfers and the
> optimizations. A huge thanks to Christian and Martin who review(ed)
> and helped debugging/designing.
> 

Do you have a branch somewhere? I can't find a branch where this applies
cleanly. Thanks


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC/PATCH v2 01/22] s390/mm: make gmap_protect_range more modular
  2017-12-13 12:53 ` [RFC/PATCH v2 01/22] s390/mm: make gmap_protect_range more modular Janosch Frank
@ 2018-01-22 11:33   ` David Hildenbrand
  2018-01-22 12:31     ` Janosch Frank
  0 siblings, 1 reply; 67+ messages in thread
From: David Hildenbrand @ 2018-01-22 11:33 UTC (permalink / raw)
  To: Janosch Frank, kvm; +Cc: schwidefsky, borntraeger, dominik.dingel, linux-s390

On 13.12.2017 13:53, Janosch Frank wrote:
> This patch reworks the gmap_protect_range logic and extracts the pte
> handling into an own function. Also we do now walk to the pmd and make
> it accessible in the function for later use. This way we can add huge
> page handling logic more easily.
> 
> Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com>
> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
> Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
> ---
>  arch/s390/mm/gmap.c | 102 ++++++++++++++++++++++++++++++++++++++++++++++------
>  1 file changed, 92 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
> index 05d459b..8de8bf9 100644
> --- a/arch/s390/mm/gmap.c
> +++ b/arch/s390/mm/gmap.c
> @@ -874,7 +874,88 @@ static int gmap_pte_op_fixup(struct gmap *gmap, unsigned long gaddr,
>   */
>  static void gmap_pte_op_end(spinlock_t *ptl)
>  {
> -	spin_unlock(ptl);
> +	if (ptl)
> +		spin_unlock(ptl);
> +}
> +
> +/**
> + * gmap_pmd_op_walk - walk the gmap tables, get the guest table lock
> + *		      and return the pmd pointer
> + * @gmap: pointer to guest mapping meta data structure
> + * @gaddr: virtual address in the guest address space
> + *
> + * Returns a pointer to the pmd for a guest address, or NULL
> + */
> +static inline pmd_t *gmap_pmd_op_walk(struct gmap *gmap, unsigned long gaddr)
> +{
> +	pmd_t *pmdp;
> +
> +	spin_lock(&gmap->guest_table_lock);
> +	pmdp = (pmd_t *) gmap_table_walk(gmap, gaddr, 1);
> +
> +	/*
> +	 * Empty pmds can become large after we give up the
> +	 * guest_table_lock, so we have to check for pmd_none
> +	 * here.
> +	 */

Don't understand that comment. We give up the lock after we're done with
the pmd either way. So I think this comment can go.

> +	if (!pmdp || pmd_none(*pmdp)) {
> +		spin_unlock(&gmap->guest_table_lock);
> +		return NULL;
> +	}
> +	/*
> +	 * For plain 4k guests that do not run under the vsie it
> +	 * suffices to take the pte lock later on. Thus we can unlock
> +	 * the guest_table_lock here.
> +	 */

As discussed, the gmap_is_shadow() check is not needed. The comment
should be something like

/* 4k page table entries are locked via the pte (pte_alloc_map_lock). */

> +	if (!pmd_large(*pmdp) && !gmap_is_shadow(gmap))
> +		spin_unlock(&gmap->guest_table_lock);
> +	return pmdp;
> +}
> +
> +/**
> + * gmap_pmd_op_end - release the guest_table_lock if needed
> + * @gmap: pointer to the guest mapping meta data structure
> + * @pmdp: pointer to the pmd
> + */
> +static inline void gmap_pmd_op_end(struct gmap *gmap, pmd_t *pmdp)
> +{
> +	if (pmd_large(*pmdp) || gmap_is_shadow(gmap))

As discussed, gmap_is_shadow() can go.

> +		spin_unlock(&gmap->guest_table_lock);
> +}
> +
> +/*
> + * gmap_protect_pte - remove access rights to memory and set pgste bits
> + * @gmap: pointer to guest mapping meta data structure
> + * @gaddr: virtual address in the guest address space
> + * @pmdp: pointer to the pmd associated with the pte
> + * @prot: indicates access rights: PROT_NONE, PROT_READ or PROT_WRITE
> + * @bits: pgste notification bits to set
> + *
> + * Returns 0 if successfully protected, -ENOMEM if out of memory and
> + * -EAGAIN if a fixup is needed.
> + *
> + * Expected to be called with sg->mm->mmap_sem in read and
> + * guest_table_lock held for shadow gmaps.
> + */
> +static int gmap_protect_pte(struct gmap *gmap, unsigned long gaddr,
> +			    pmd_t *pmdp, int prot, unsigned long bits)
> +{
> +	int rc;
> +	pte_t *ptep;
> +	spinlock_t *ptl = NULL;
> +
> +	/* We have no upper segment, let's go back and fix this up. */
> +	if (pmd_val(*pmdp) & _SEGMENT_ENTRY_INVALID)
> +		return -EAGAIN;

This is essentially pmd_none(*pmdp), which you already verified in
gmap_pmd_op_walk().

I suggest requiring that the entry is valid when this function is called
(which is always the case) and getting rid of the -EAGAIN return code.
That makes this function simpler.
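
Roughly something like this (untested sketch, just to illustrate what I mean):

static int gmap_protect_pte(struct gmap *gmap, unsigned long gaddr,
			    pmd_t *pmdp, int prot, unsigned long bits)
{
	spinlock_t *ptl = NULL;
	pte_t *ptep;
	int rc;

	/* The caller went through gmap_pmd_op_walk(), so *pmdp is usable. */
	ptep = pte_alloc_map_lock(gmap->mm, pmdp, gaddr, &ptl);
	if (!ptep)
		return -ENOMEM;

	/* Protect and unlock. */
	rc = ptep_force_prot(gmap->mm, gaddr, ptep, prot, bits);
	gmap_pte_op_end(ptl);
	return rc;
}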

> +
> +	ptep = pte_alloc_map_lock(gmap->mm, pmdp, gaddr, &ptl);
> +	if (!ptep)
> +		return -ENOMEM;
> +
> +	/* Protect and unlock. */
> +	rc = ptep_force_prot(gmap->mm, gaddr, ptep, prot, bits);
> +	gmap_pte_op_end(ptl);
> +	return rc;
>  }
>  
>  /*
> @@ -896,16 +977,20 @@ static int gmap_protect_range(struct gmap *gmap, unsigned long gaddr,
>  			      unsigned long len, int prot, unsigned long bits)
>  {
>  	unsigned long vmaddr;
> -	spinlock_t *ptl;
> -	pte_t *ptep;
> +	pmd_t *pmdp;
>  	int rc;
>  
>  	while (len) {
>  		rc = -EAGAIN;
> -		ptep = gmap_pte_op_walk(gmap, gaddr, &ptl);
> -		if (ptep) {
> -			rc = ptep_force_prot(gmap->mm, gaddr, ptep, prot, bits);
> -			gmap_pte_op_end(ptl);
> +		pmdp = gmap_pmd_op_walk(gmap, gaddr);
> +		if (pmdp) {
> +			rc = gmap_protect_pte(gmap, gaddr, pmdp, prot,
> +					      bits);
> +			if (!rc) {
> +				len -= PAGE_SIZE;
> +				gaddr += PAGE_SIZE;
> +			}
> +			gmap_pmd_op_end(gmap, pmdp);

This change looks good to me.

>  		}
>  		if (rc) {
>  			vmaddr = __gmap_translate(gmap, gaddr);
> @@ -914,10 +999,7 @@ static int gmap_protect_range(struct gmap *gmap, unsigned long gaddr,
>  			rc = gmap_pte_op_fixup(gmap, gaddr, vmaddr, prot);
>  			if (rc)
>  				return rc;
> -			continue;
>  		}
> -		gaddr += PAGE_SIZE;
> -		len -= PAGE_SIZE;
>  	}
>  	return 0;
>  }
> 


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC/PATCH v2 02/22] s390/mm: Abstract gmap notify bit setting
  2017-12-13 12:53 ` [RFC/PATCH v2 02/22] s390/mm: Abstract gmap notify bit setting Janosch Frank
@ 2018-01-22 11:34   ` David Hildenbrand
  0 siblings, 0 replies; 67+ messages in thread
From: David Hildenbrand @ 2018-01-22 11:34 UTC (permalink / raw)
  To: Janosch Frank, kvm; +Cc: schwidefsky, borntraeger, dominik.dingel, linux-s390

On 13.12.2017 13:53, Janosch Frank wrote:
> Currently we use the software PGSTE bits PGSTE_IN_BIT and PGSTE_VSIE_BIT
> to notify before an invalidation occurs on a prefix page or a VSIE page
> respectively. Both bits only work for a PGSTE, which only exists for
> page tables.
> 
> For huge page support we also need such bits for segments (pmds) so
> let's introduce abstract GMAP_NOTIFY_* bits that will be realized into
> the respective bits when gmap DAT table entries are protected.
> 
> Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com>
> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
> ---
>  arch/s390/include/asm/gmap.h |  4 ++++
>  arch/s390/mm/gmap.c          | 13 ++++++++-----
>  2 files changed, 12 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/s390/include/asm/gmap.h b/arch/s390/include/asm/gmap.h
> index e07cce8..c1bc563 100644
> --- a/arch/s390/include/asm/gmap.h
> +++ b/arch/s390/include/asm/gmap.h
> @@ -9,6 +9,10 @@
>  #ifndef _ASM_S390_GMAP_H
>  #define _ASM_S390_GMAP_H
>  
> +/* Generic bits for GMAP notification on DAT table entry changes. */
> +#define GMAP_NOTIFY_SHADOW	0x2
> +#define GMAP_NOTIFY_MPROT	0x1
> +
>  /**
>   * struct gmap_struct - guest address space
>   * @list: list head for the mm->context gmap list
> diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
> index 8de8bf9..e7825d2 100644
> --- a/arch/s390/mm/gmap.c
> +++ b/arch/s390/mm/gmap.c
> @@ -929,7 +929,7 @@ static inline void gmap_pmd_op_end(struct gmap *gmap, pmd_t *pmdp)
>   * @gaddr: virtual address in the guest address space
>   * @pmdp: pointer to the pmd associated with the pte
>   * @prot: indicates access rights: PROT_NONE, PROT_READ or PROT_WRITE
> - * @bits: pgste notification bits to set
> + * @bits: notification bits to set
>   *
>   * Returns 0 if successfully protected, -ENOMEM if out of memory and
>   * -EAGAIN if a fixup is needed.
> @@ -943,6 +943,7 @@ static int gmap_protect_pte(struct gmap *gmap, unsigned long gaddr,
>  	int rc;
>  	pte_t *ptep;
>  	spinlock_t *ptl = NULL;
> +	unsigned long pbits = 0;
>  
>  	/* We have no upper segment, let's go back and fix this up. */
>  	if (pmd_val(*pmdp) & _SEGMENT_ENTRY_INVALID)
> @@ -952,8 +953,10 @@ static int gmap_protect_pte(struct gmap *gmap, unsigned long gaddr,
>  	if (!ptep)
>  		return -ENOMEM;
>  
> +	pbits |= (bits & GMAP_NOTIFY_MPROT) ? PGSTE_IN_BIT : 0;
> +	pbits |= (bits & GMAP_NOTIFY_SHADOW) ? PGSTE_VSIE_BIT : 0;
>  	/* Protect and unlock. */
> -	rc = ptep_force_prot(gmap->mm, gaddr, ptep, prot, bits);
> +	rc = ptep_force_prot(gmap->mm, gaddr, ptep, prot, pbits);
>  	gmap_pte_op_end(ptl);
>  	return rc;
>  }
> @@ -1028,7 +1031,7 @@ int gmap_mprotect_notify(struct gmap *gmap, unsigned long gaddr,
>  	if (!MACHINE_HAS_ESOP && prot == PROT_READ)
>  		return -EINVAL;
>  	down_read(&gmap->mm->mmap_sem);
> -	rc = gmap_protect_range(gmap, gaddr, len, prot, PGSTE_IN_BIT);
> +	rc = gmap_protect_range(gmap, gaddr, len, prot, GMAP_NOTIFY_MPROT);
>  	up_read(&gmap->mm->mmap_sem);
>  	return rc;
>  }
> @@ -1150,7 +1153,7 @@ static int gmap_protect_rmap(struct gmap *sg, unsigned long raddr,
>  		if (ptep) {
>  			spin_lock(&sg->guest_table_lock);
>  			rc = ptep_force_prot(parent->mm, paddr, ptep, prot,
> -					     PGSTE_VSIE_BIT);
> +					     GMAP_NOTIFY_SHADOW);
>  			if (!rc)
>  				gmap_insert_rmap(sg, vmaddr, rmap);
>  			spin_unlock(&sg->guest_table_lock);
> @@ -1616,7 +1619,7 @@ struct gmap *gmap_shadow(struct gmap *parent, unsigned long asce,
>  	down_read(&parent->mm->mmap_sem);
>  	rc = gmap_protect_range(parent, asce & _ASCE_ORIGIN,
>  				((asce & _ASCE_TABLE_LENGTH) + 1) * PAGE_SIZE,
> -				PROT_READ, PGSTE_VSIE_BIT);
> +				PROT_READ, GMAP_NOTIFY_SHADOW);
>  	up_read(&parent->mm->mmap_sem);
>  	spin_lock(&parent->shadow_lock);
>  	new->initialized = true;
> 

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC/PATCH v2 03/22] s390/mm: add gmap PMD invalidation notification
  2017-12-13 12:53 ` [RFC/PATCH v2 03/22] s390/mm: add gmap PMD invalidation notification Janosch Frank
  2017-12-21  9:24   ` Janosch Frank
@ 2018-01-22 11:46   ` David Hildenbrand
  2018-01-22 13:13     ` Janosch Frank
  2018-01-22 11:56   ` David Hildenbrand
  2 siblings, 1 reply; 67+ messages in thread
From: David Hildenbrand @ 2018-01-22 11:46 UTC (permalink / raw)
  To: Janosch Frank, kvm; +Cc: schwidefsky, borntraeger, dominik.dingel, linux-s390

On 13.12.2017 13:53, Janosch Frank wrote:
> For later migration of huge pages we want to write-protect guest
> PMDs. While doing this, we have to make absolutely sure, that the
> guest's lowcore is always accessible when the VCPU is running. With
> PTEs, this is solved by marking the PGSTEs of the lowcore pages with
> the invalidation notification bit and kicking the guest out of the SIE
> via a notifier function if we need to invalidate such a page.
> 
> With PMDs we do not have PGSTEs or some other bits we could use in the
> host PMD. Instead we pick one of the free bits in the gmap PMD. Every
> time a host pmd will be invalidated, we will check if the respective
> gmap PMD has the bit set and in that case fire up the notifier.
> 
> In the first step we only support setting the invalidation bit, but we
> do not support restricting access of guest pmds. It will follow
> shortly.

I am wondering if we could avoid having invalidation bits on PMDs
completely by always splitting up a PMD huge page into PTEs.

I assume this would make the code easier, as we need to split up PMDs
either way when protecting for the shadow gmap.

This would imply that our notification handler would only have to be
called for 4k pages, which also makes that part easier.

This would mean that the 1MB segments where the prefixes live would
always be split into 4k pages - but do we care?

I somehow dislike that somebody registers a notifier for some subregion
(e.g. 8k) but gets notified about a huge page (1mb).

Opinions?
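
To make the idea a bit more concrete, the protection loop could then look
roughly like the sketch below. gmap_pmd_split() is a made-up name here, the
real split logic would have to come from the split code later in the series:

		pmdp = gmap_pmd_op_walk(gmap, gaddr);
		if (pmdp) {
			rc = 0;
			if (pmd_large(*pmdp))
				/* made-up helper: replace the huge pmd by a pte table */
				rc = gmap_pmd_split(gmap, gaddr, pmdp);
			if (!rc)
				rc = gmap_protect_pte(gmap, gaddr, pmdp, prot, bits);
			if (!rc) {
				len -= PAGE_SIZE;
				gaddr += PAGE_SIZE;
			}
			gmap_pmd_op_end(gmap, pmdp);
		}

That way gmap_protect_pmd() and the pmd notification bits would not be
needed at all; everything funnels through the existing pte path.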

> 
> Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com>
> ---
>  arch/s390/include/asm/gmap.h    |  3 ++
>  arch/s390/include/asm/pgtable.h |  7 +++-
>  arch/s390/mm/gmap.c             | 92 ++++++++++++++++++++++++++++++++++++-----
>  arch/s390/mm/pgtable.c          |  4 ++
>  4 files changed, 94 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/s390/include/asm/gmap.h b/arch/s390/include/asm/gmap.h
> index c1bc563..21bb658 100644
> --- a/arch/s390/include/asm/gmap.h
> +++ b/arch/s390/include/asm/gmap.h
> @@ -13,6 +13,9 @@
>  #define GMAP_NOTIFY_SHADOW	0x2
>  #define GMAP_NOTIFY_MPROT	0x1
>  
> +/* Status bits in the gmap segment entry. */
> +#define _SEGMENT_ENTRY_GMAP_IN		0x0001	/* invalidation notify bit */
> +
>  /**
>   * struct gmap_struct - guest address space
>   * @list: list head for the mm->context gmap list
> diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
> index 57d7bc9..ba3840c 100644
> --- a/arch/s390/include/asm/pgtable.h
> +++ b/arch/s390/include/asm/pgtable.h
> @@ -269,8 +269,10 @@ static inline int is_module_addr(void *addr)
>  #define _REGION_ENTRY_BITS_LARGE 0xffffffff8000fe2fUL
>  
>  /* Bits in the segment table entry */
> -#define _SEGMENT_ENTRY_BITS	0xfffffffffffffe33UL
> -#define _SEGMENT_ENTRY_BITS_LARGE 0xfffffffffff0ff33UL
> +#define _SEGMENT_ENTRY_BITS			0xfffffffffffffe33UL
> +#define _SEGMENT_ENTRY_BITS_LARGE 		0xfffffffffff0ff33UL
> +#define _SEGMENT_ENTRY_HARDWARE_BITS		0xfffffffffffffe30UL
> +#define _SEGMENT_ENTRY_HARDWARE_BITS_LARGE 	0xfffffffffff00730UL
>  #define _SEGMENT_ENTRY_ORIGIN_LARGE ~0xfffffUL /* large page address	    */
>  #define _SEGMENT_ENTRY_ORIGIN	~0x7ffUL/* page table origin		    */
>  #define _SEGMENT_ENTRY_PROTECT	0x200	/* segment protection bit	    */
> @@ -1093,6 +1095,7 @@ void ptep_set_pte_at(struct mm_struct *mm, unsigned long addr,
>  void ptep_set_notify(struct mm_struct *mm, unsigned long addr, pte_t *ptep);
>  void ptep_notify(struct mm_struct *mm, unsigned long addr,
>  		 pte_t *ptep, unsigned long bits);
> +void pmdp_notify(struct mm_struct *mm, unsigned long addr);
>  int ptep_force_prot(struct mm_struct *mm, unsigned long gaddr,
>  		    pte_t *ptep, int prot, unsigned long bit);
>  void ptep_zap_unused(struct mm_struct *mm, unsigned long addr,
> diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
> index e7825d2..ff7fe24 100644
> --- a/arch/s390/mm/gmap.c
> +++ b/arch/s390/mm/gmap.c
> @@ -596,10 +596,15 @@ int __gmap_link(struct gmap *gmap, unsigned long gaddr, unsigned long vmaddr)
>  	if (*table == _SEGMENT_ENTRY_EMPTY) {
>  		rc = radix_tree_insert(&gmap->host_to_guest,
>  				       vmaddr >> PMD_SHIFT, table);
> -		if (!rc)
> -			*table = pmd_val(*pmd);
> -	} else
> -		rc = 0;
> +		if (!rc) {
> +			if (pmd_large(*pmd)) {
> +				*table = pmd_val(*pmd) &
> +					_SEGMENT_ENTRY_HARDWARE_BITS_LARGE;
> +			} else
> +				*table = pmd_val(*pmd) &
> +					_SEGMENT_ENTRY_HARDWARE_BITS;
> +		}
> +	}
>  	spin_unlock(&gmap->guest_table_lock);
>  	spin_unlock(ptl);
>  	radix_tree_preload_end();
> @@ -962,6 +967,33 @@ static int gmap_protect_pte(struct gmap *gmap, unsigned long gaddr,
>  }
>  
>  /*
> + * gmap_protect_pmd - set pmd notification bits
> + * @pmdp: pointer to the pmd to be protected
> + * @prot: indicates access rights: PROT_NONE, PROT_READ or PROT_WRITE
> + * @bits: notification bits to set
> + *
> + * Returns 0 if successfully protected, -ENOMEM if out of memory and
> + * -EAGAIN if a fixup is needed.
> + *
> + * Expected to be called with sg->mm->mmap_sem in read and
> + * guest_table_lock held.
> + */
> +static int gmap_protect_pmd(struct gmap *gmap, unsigned long gaddr,
> +			    pmd_t *pmdp, int prot, unsigned long bits)
> +{
> +	const int pmd_i = pmd_val(*pmdp) & _SEGMENT_ENTRY_INVALID;
> +	const int pmd_p = pmd_val(*pmdp) & _SEGMENT_ENTRY_PROTECT;
> +
> +	/* Fixup needed */
> +	if ((pmd_i && (prot != PROT_NONE)) || (pmd_p && (prot & PROT_WRITE)))
> +		return -EAGAIN;
> +
> +	if (bits & GMAP_NOTIFY_MPROT)
> +		pmd_val(*pmdp) |=  _SEGMENT_ENTRY_GMAP_IN;
> +	return 0;
> +}
> +
> +/*
>   * gmap_protect_range - remove access rights to memory and set pgste bits
>   * @gmap: pointer to guest mapping meta data structure
>   * @gaddr: virtual address in the guest address space
> @@ -979,7 +1011,7 @@ static int gmap_protect_pte(struct gmap *gmap, unsigned long gaddr,
>  static int gmap_protect_range(struct gmap *gmap, unsigned long gaddr,
>  			      unsigned long len, int prot, unsigned long bits)
>  {
> -	unsigned long vmaddr;
> +	unsigned long vmaddr, dist;
>  	pmd_t *pmdp;
>  	int rc;
>  
> @@ -987,11 +1019,21 @@ static int gmap_protect_range(struct gmap *gmap, unsigned long gaddr,
>  		rc = -EAGAIN;
>  		pmdp = gmap_pmd_op_walk(gmap, gaddr);
>  		if (pmdp) {
> -			rc = gmap_protect_pte(gmap, gaddr, pmdp, prot,
> -					      bits);
> -			if (!rc) {
> -				len -= PAGE_SIZE;
> -				gaddr += PAGE_SIZE;
> +			if (!pmd_large(*pmdp)) {
> +				rc = gmap_protect_pte(gmap, gaddr, pmdp, prot,
> +						      bits);
> +				if (!rc) {
> +					len -= PAGE_SIZE;
> +					gaddr += PAGE_SIZE;
> +				}
> +			} else {
> +				rc = gmap_protect_pmd(gmap, gaddr, pmdp, prot,
> +						      bits);
> +				if (!rc) {
> +					dist = HPAGE_SIZE - (gaddr & ~HPAGE_MASK);
> +					len = len < dist ? 0 : len - dist;
> +					gaddr = (gaddr & HPAGE_MASK) + HPAGE_SIZE;
> +				}
>  			}
>  			gmap_pmd_op_end(gmap, pmdp);
>  		}
> @@ -2185,6 +2227,36 @@ void ptep_notify(struct mm_struct *mm, unsigned long vmaddr,
>  }
>  EXPORT_SYMBOL_GPL(ptep_notify);
>  
> +/**
> + * pmdp_notify - call all invalidation callbacks for a specific pmd
> + * @mm: pointer to the process mm_struct
> + * @vmaddr: virtual address in the process address space
> + *
> + * This function is expected to be called with mmap_sem held in read.
> + */
> +void pmdp_notify(struct mm_struct *mm, unsigned long vmaddr)
> +{
> +	unsigned long *table, gaddr;
> +	struct gmap *gmap;
> +
> +	rcu_read_lock();
> +	list_for_each_entry_rcu(gmap, &mm->context.gmap_list, list) {
> +		spin_lock(&gmap->guest_table_lock);
> +		table = radix_tree_lookup(&gmap->host_to_guest,
> +					  vmaddr >> PMD_SHIFT);
> +		if (!table || !(*table & _SEGMENT_ENTRY_GMAP_IN)) {
> +			spin_unlock(&gmap->guest_table_lock);
> +			continue;
> +		}
> +		gaddr = __gmap_segment_gaddr(table);
> +		*table &= ~_SEGMENT_ENTRY_GMAP_IN;
> +		spin_unlock(&gmap->guest_table_lock);
> +		gmap_call_notifier(gmap, gaddr, gaddr + HPAGE_SIZE - 1);
> +	}
> +	rcu_read_unlock();
> +}
> +EXPORT_SYMBOL_GPL(pmdp_notify);
> +
>  static inline void thp_split_mm(struct mm_struct *mm)
>  {
>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
> index 4f2b65d..a6cc540 100644
> --- a/arch/s390/mm/pgtable.c
> +++ b/arch/s390/mm/pgtable.c
> @@ -405,6 +405,8 @@ pmd_t pmdp_xchg_direct(struct mm_struct *mm, unsigned long addr,
>  	pmd_t old;
>  
>  	preempt_disable();
> +	if (mm_has_pgste(mm))
> +		pmdp_notify(mm, addr);
>  	old = pmdp_flush_direct(mm, addr, pmdp);
>  	*pmdp = new;
>  	preempt_enable();
> @@ -418,6 +420,8 @@ pmd_t pmdp_xchg_lazy(struct mm_struct *mm, unsigned long addr,
>  	pmd_t old;
>  
>  	preempt_disable();
> +	if (mm_has_pgste(mm))
> +		pmdp_notify(mm, addr);
>  	old = pmdp_flush_lazy(mm, addr, pmdp);
>  	*pmdp = new;
>  	preempt_enable();
> 


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC/PATCH v2 03/22] s390/mm: add gmap PMD invalidation notification
  2017-12-13 12:53 ` [RFC/PATCH v2 03/22] s390/mm: add gmap PMD invalidation notification Janosch Frank
  2017-12-21  9:24   ` Janosch Frank
  2018-01-22 11:46   ` David Hildenbrand
@ 2018-01-22 11:56   ` David Hildenbrand
  2018-01-22 12:09     ` Janosch Frank
  2 siblings, 1 reply; 67+ messages in thread
From: David Hildenbrand @ 2018-01-22 11:56 UTC (permalink / raw)
  To: Janosch Frank, kvm; +Cc: schwidefsky, borntraeger, dominik.dingel, linux-s390

On 13.12.2017 13:53, Janosch Frank wrote:
> For later migration of huge pages we want to write-protect guest
> PMDs. While doing this, we have to make absolutely sure, that the
> guest's lowcore is always accessible when the VCPU is running. With
> PTEs, this is solved by marking the PGSTEs of the lowcore pages with
> the invalidation notification bit and kicking the guest out of the SIE
> via a notifier function if we need to invalidate such a page.
> 
> With PMDs we do not have PGSTEs or some other bits we could use in the
> host PMD. Instead we pick one of the free bits in the gmap PMD. Every
> time a host pmd will be invalidated, we will check if the respective
> gmap PMD has the bit set and in that case fire up the notifier.
> 
> In the first step we only support setting the invalidation bit, but we
> do not support restricting access of guest pmds. It will follow
> shortly.
> 
> Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com>
> ---
>  arch/s390/include/asm/gmap.h    |  3 ++
>  arch/s390/include/asm/pgtable.h |  7 +++-
>  arch/s390/mm/gmap.c             | 92 ++++++++++++++++++++++++++++++++++++-----
>  arch/s390/mm/pgtable.c          |  4 ++
>  4 files changed, 94 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/s390/include/asm/gmap.h b/arch/s390/include/asm/gmap.h
> index c1bc563..21bb658 100644
> --- a/arch/s390/include/asm/gmap.h
> +++ b/arch/s390/include/asm/gmap.h
> @@ -13,6 +13,9 @@
>  #define GMAP_NOTIFY_SHADOW	0x2
>  #define GMAP_NOTIFY_MPROT	0x1
>  
> +/* Status bits in the gmap segment entry. */
> +#define _SEGMENT_ENTRY_GMAP_IN		0x0001	/* invalidation notify bit */
> +

_SEGMENT_ENTRY_READ -> 0x0001

Is it even okay to reuse that bit?

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC/PATCH v2 00/22] KVM/s390: Hugetlbfs enablement
  2018-01-22 11:23 ` David Hildenbrand
@ 2018-01-22 11:56   ` Christian Borntraeger
  0 siblings, 0 replies; 67+ messages in thread
From: Christian Borntraeger @ 2018-01-22 11:56 UTC (permalink / raw)
  To: David Hildenbrand, Janosch Frank, kvm
  Cc: schwidefsky, dominik.dingel, linux-s390

On 01/22/2018 12:23 PM, David Hildenbrand wrote:
> On 13.12.2017 13:53, Janosch Frank wrote:
>> Since the z10 s390 does support 1M pages, but whereas hugetlbfs
>> support was added quite fast, KVM always used standard 4k pages for
>> guest backings.
>>
>> This patchset adds full support for 1M huge page backings for s390
>> KVM guests. I.e. we also support VSIE (nested vms) for these guests
>> and are therefore able to run all combinations of backings for all
>> layers of guests.
>>
>> When running a VSIE guest in a huge page backed guest, we need to
>> split some huge pages to be able to set granular protection. This way
>> we avoid a prot/unprot cycle if prefixes and VSIE pages containing
>> level 3 gmap DAT tables share the same segment, as the prefix has to
>> be accessible at all times and the VSIE page has to be write
>> protected.
>>
>> TODO:
>> * Cleanups & Documentation
>> * Refactoring to get rid of a lot of indents
>> * Find a way to reduce or beautify bit checks on table entries
>> * Storage key support for split pages (will be a separate bugfix)
>> * Regression testing
>> * Testing large setups
>> * Testing multi level VSIE
>>
>> V2:
>> 	* Incorporated changes from David's cleanup
>> 	* Now flushing with IDTE_NODAT for protection transfers.
>> 	* Added RRBE huge page handling for g2 -> g3 skey emulation
>> 	* Added documentation for capability
>> 	* Renamed GMAP_ENTRY_* constants
>> 	* Added SEGMENT hardware bits constants
>> 	* Improved some patch descriptions
>> 	* General small improvements
>> 	* Introduced pte_from_pmd function
>>
>> Accomplished testing:
>> l2: KVM guest
>> l3: nested KVM guest
>>
>> * 1m l2 guests
>> * VSIE (l3) 4k and 1m guests on 1m l2
>> * 1m l2 -> l2 migration with 4k/1m l3 guests
>> * l3 -> l2 migration
>> * postcopy works every second try, seems to be QEMU or my setup
>>
>>
>> The initial prototype was started by Dominik Dingel. I had the
>> pleasure of adding the VSIE part, the protection transfers and the
>> optimizations. A huge thanks to Christian and Martin who review(ed)
>> and helped debugging/designing.
>>
> 
> Do you have a branch somewhere? I can't find a branch where this applies
> cleanly. Thanks

Pushed to 

git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux.git hlp_vsie

https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux.git/log/?h=hlp_vsie

Thanks for looking into that.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC/PATCH v2 03/22] s390/mm: add gmap PMD invalidation notification
  2018-01-22 11:56   ` David Hildenbrand
@ 2018-01-22 12:09     ` Janosch Frank
  2018-01-22 12:12       ` David Hildenbrand
  0 siblings, 1 reply; 67+ messages in thread
From: Janosch Frank @ 2018-01-22 12:09 UTC (permalink / raw)
  To: David Hildenbrand, kvm
  Cc: schwidefsky, borntraeger, dominik.dingel, linux-s390


[-- Attachment #1.1: Type: text/plain, Size: 2001 bytes --]

On 22.01.2018 12:56, David Hildenbrand wrote:
> On 13.12.2017 13:53, Janosch Frank wrote:
>> For later migration of huge pages we want to write-protect guest
>> PMDs. While doing this, we have to make absolutely sure, that the
>> guest's lowcore is always accessible when the VCPU is running. With
>> PTEs, this is solved by marking the PGSTEs of the lowcore pages with
>> the invalidation notification bit and kicking the guest out of the SIE
>> via a notifier function if we need to invalidate such a page.
>>
>> With PMDs we do not have PGSTEs or some other bits we could use in the
>> host PMD. Instead we pick one of the free bits in the gmap PMD. Every
>> time a host pmd will be invalidated, we will check if the respective
>> gmap PMD has the bit set and in that case fire up the notifier.
>>
>> In the first step we only support setting the invalidation bit, but we
>> do not support restricting access of guest pmds. It will follow
>> shortly.
>>
>> Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com>
>> ---
>>  arch/s390/include/asm/gmap.h    |  3 ++
>>  arch/s390/include/asm/pgtable.h |  7 +++-
>>  arch/s390/mm/gmap.c             | 92 ++++++++++++++++++++++++++++++++++++-----
>>  arch/s390/mm/pgtable.c          |  4 ++
>>  4 files changed, 94 insertions(+), 12 deletions(-)
>>
>> diff --git a/arch/s390/include/asm/gmap.h b/arch/s390/include/asm/gmap.h
>> index c1bc563..21bb658 100644
>> --- a/arch/s390/include/asm/gmap.h
>> +++ b/arch/s390/include/asm/gmap.h
>> @@ -13,6 +13,9 @@
>>  #define GMAP_NOTIFY_SHADOW	0x2
>>  #define GMAP_NOTIFY_MPROT	0x1
>>  
>> +/* Status bits in the gmap segment entry. */
>> +#define _SEGMENT_ENTRY_GMAP_IN		0x0001	/* invalidation notify bit */
>> +
> 
> _SEGMENT_ENTRY_READ -> 0x0001
> 
> Is it even okay to reuse that bit?
> 

It's in the GMAP segment entry, not in the process' segment entry.
That's why we throw away all software bits from the process entry when
linking into the gmap table.
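
Concretely, the __gmap_link() hunk in this patch only copies the hardware
bits, so the low software bits are guaranteed to be zero in the gmap copy
and are free for gmap-private use (simplified excerpt):

	if (pmd_large(*pmd))
		*table = pmd_val(*pmd) & _SEGMENT_ENTRY_HARDWARE_BITS_LARGE;
	else
		*table = pmd_val(*pmd) & _SEGMENT_ENTRY_HARDWARE_BITS;
	/* 0xfffffffffff00730 & 0x0001 == 0, so _SEGMENT_ENTRY_GMAP_IN starts out clear */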


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC/PATCH v2 03/22] s390/mm: add gmap PMD invalidation notification
  2018-01-22 12:09     ` Janosch Frank
@ 2018-01-22 12:12       ` David Hildenbrand
  0 siblings, 0 replies; 67+ messages in thread
From: David Hildenbrand @ 2018-01-22 12:12 UTC (permalink / raw)
  To: Janosch Frank, kvm; +Cc: schwidefsky, borntraeger, dominik.dingel, linux-s390

On 22.01.2018 13:09, Janosch Frank wrote:
> On 22.01.2018 12:56, David Hildenbrand wrote:
>> On 13.12.2017 13:53, Janosch Frank wrote:
>>> For later migration of huge pages we want to write-protect guest
>>> PMDs. While doing this, we have to make absolutely sure, that the
>>> guest's lowcore is always accessible when the VCPU is running. With
>>> PTEs, this is solved by marking the PGSTEs of the lowcore pages with
>>> the invalidation notification bit and kicking the guest out of the SIE
>>> via a notifier function if we need to invalidate such a page.
>>>
>>> With PMDs we do not have PGSTEs or some other bits we could use in the
>>> host PMD. Instead we pick one of the free bits in the gmap PMD. Every
>>> time a host pmd will be invalidated, we will check if the respective
>>> gmap PMD has the bit set and in that case fire up the notifier.
>>>
>>> In the first step we only support setting the invalidation bit, but we
>>> do not support restricting access of guest pmds. It will follow
>>> shortly.
>>>
>>> Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com>
>>> ---
>>>  arch/s390/include/asm/gmap.h    |  3 ++
>>>  arch/s390/include/asm/pgtable.h |  7 +++-
>>>  arch/s390/mm/gmap.c             | 92 ++++++++++++++++++++++++++++++++++++-----
>>>  arch/s390/mm/pgtable.c          |  4 ++
>>>  4 files changed, 94 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/arch/s390/include/asm/gmap.h b/arch/s390/include/asm/gmap.h
>>> index c1bc563..21bb658 100644
>>> --- a/arch/s390/include/asm/gmap.h
>>> +++ b/arch/s390/include/asm/gmap.h
>>> @@ -13,6 +13,9 @@
>>>  #define GMAP_NOTIFY_SHADOW	0x2
>>>  #define GMAP_NOTIFY_MPROT	0x1
>>>  
>>> +/* Status bits in the gmap segment entry. */
>>> +#define _SEGMENT_ENTRY_GMAP_IN		0x0001	/* invalidation notify bit */
>>> +
>>
>> _SEGMENT_ENTRY_READ -> 0x0001
>>
>> Is it even okay to reuse that bit?
>>
> 
> It's in the GMAP segment entry, not in the process' segment entry.
> That's why we throw away all software bits from the process entry when
> linking into the gmap table.
> 

Ah right, we only share page tables but not complete segments. Thanks.

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC/PATCH v2 01/22] s390/mm: make gmap_protect_range more modular
  2018-01-22 11:33   ` David Hildenbrand
@ 2018-01-22 12:31     ` Janosch Frank
  2018-01-22 12:50       ` David Hildenbrand
  0 siblings, 1 reply; 67+ messages in thread
From: Janosch Frank @ 2018-01-22 12:31 UTC (permalink / raw)
  To: David Hildenbrand, kvm
  Cc: schwidefsky, borntraeger, dominik.dingel, linux-s390


[-- Attachment #1.1: Type: text/plain, Size: 4782 bytes --]

On 22.01.2018 12:33, David Hildenbrand wrote:
> On 13.12.2017 13:53, Janosch Frank wrote:
>> This patch reworks the gmap_protect_range logic and extracts the pte
>> handling into an own function. Also we do now walk to the pmd and make
>> it accessible in the function for later use. This way we can add huge
>> page handling logic more easily.
>>
>> Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com>
>> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
>> Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
>> ---
>>  arch/s390/mm/gmap.c | 102 ++++++++++++++++++++++++++++++++++++++++++++++------
>>  1 file changed, 92 insertions(+), 10 deletions(-)
>>
>> diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
>> index 05d459b..8de8bf9 100644
>> --- a/arch/s390/mm/gmap.c
>> +++ b/arch/s390/mm/gmap.c
>> @@ -874,7 +874,88 @@ static int gmap_pte_op_fixup(struct gmap *gmap, unsigned long gaddr,
>>   */
>>  static void gmap_pte_op_end(spinlock_t *ptl)
>>  {
>> -	spin_unlock(ptl);
>> +	if (ptl)
>> +		spin_unlock(ptl);
>> +}
>> +
>> +/**
>> + * gmap_pmd_op_walk - walk the gmap tables, get the guest table lock
>> + *		      and return the pmd pointer
>> + * @gmap: pointer to guest mapping meta data structure
>> + * @gaddr: virtual address in the guest address space
>> + *
>> + * Returns a pointer to the pmd for a guest address, or NULL
>> + */
>> +static inline pmd_t *gmap_pmd_op_walk(struct gmap *gmap, unsigned long gaddr)
>> +{
>> +	pmd_t *pmdp;
>> +
>> +	spin_lock(&gmap->guest_table_lock);
>> +	pmdp = (pmd_t *) gmap_table_walk(gmap, gaddr, 1);
>> +
>> +	/*
>> +	 * Empty pmds can become large after we give up the
>> +	 * guest_table_lock, so we have to check for pmd_none
>> +	 * here.
>> +	 */
> 
> Don't understand that comment. We give up the lock after we're done with
> the pmd either way. So I think this comment can go.

TBH I'm currently not able to recall the reason for this comment.

> 
>> +	if (!pmdp || pmd_none(*pmdp)) {
>> +		spin_unlock(&gmap->guest_table_lock);
>> +		return NULL;
>> +	}
>> +	/*
>> +	 * For plain 4k guests that do not run under the vsie it
>> +	 * suffices to take the pte lock later on. Thus we can unlock
>> +	 * the guest_table_lock here.
>> +	 */
> 
> As discussed, the gmap_is_shadow() check is not needed. The comment
> should be something like

IFF we'll never use this function to walk shadow tables, then you are
right. We can make it a policy and throw in a BUG_ON.

[...]
>> +static int gmap_protect_pte(struct gmap *gmap, unsigned long gaddr,
>> +			    pmd_t *pmdp, int prot, unsigned long bits)
>> +{
>> +	int rc;
>> +	pte_t *ptep;
>> +	spinlock_t *ptl = NULL;
>> +
>> +	/* We have no upper segment, let's go back and fix this up. */
>> +	if (pmd_val(*pmdp) & _SEGMENT_ENTRY_INVALID)
>> +		return -EAGAIN;
> 
> This is essentially pmd_none(*pmdp), which you already verified in
> gmap_pmd_op_walk().

Well, not really: pmd_none() means entry == ENTRY_EMPTY (only the I bit
set), not entry & I.
Is there a path where we have an I bit on a pmd entry which has a valid pto?

> 
> I suggest requiring for this function that the entry is valid (which is
> always the case) and getting rid of the -EAGAIN return code. Makes this
> function simpler.
> 
>> +
>> +	ptep = pte_alloc_map_lock(gmap->mm, pmdp, gaddr, &ptl);
>> +	if (!ptep)
>> +		return -ENOMEM;
>> +
>> +	/* Protect and unlock. */
>> +	rc = ptep_force_prot(gmap->mm, gaddr, ptep, prot, bits);
>> +	gmap_pte_op_end(ptl);
>> +	return rc;
>>  }
>>  
>>  /*
>> @@ -896,16 +977,20 @@ static int gmap_protect_range(struct gmap *gmap, unsigned long gaddr,
>>  			      unsigned long len, int prot, unsigned long bits)
>>  {
>>  	unsigned long vmaddr;
>> -	spinlock_t *ptl;
>> -	pte_t *ptep;
>> +	pmd_t *pmdp;
>>  	int rc;
>>  
>>  	while (len) {
>>  		rc = -EAGAIN;
>> -		ptep = gmap_pte_op_walk(gmap, gaddr, &ptl);
>> -		if (ptep) {
>> -			rc = ptep_force_prot(gmap->mm, gaddr, ptep, prot, bits);
>> -			gmap_pte_op_end(ptl);
>> +		pmdp = gmap_pmd_op_walk(gmap, gaddr);
>> +		if (pmdp) {
>> +			rc = gmap_protect_pte(gmap, gaddr, pmdp, prot,
>> +					      bits);
>> +			if (!rc) {
>> +				len -= PAGE_SIZE;
>> +				gaddr += PAGE_SIZE;
>> +			}
>> +			gmap_pmd_op_end(gmap, pmdp);
> 
> This change looks good to me.

Great :)

> 
>>  		}
>>  		if (rc) {
>>  			vmaddr = __gmap_translate(gmap, gaddr);
>> @@ -914,10 +999,7 @@ static int gmap_protect_range(struct gmap *gmap, unsigned long gaddr,
>>  			rc = gmap_pte_op_fixup(gmap, gaddr, vmaddr, prot);
>>  			if (rc)
>>  				return rc;
>> -			continue;
>>  		}
>> -		gaddr += PAGE_SIZE;
>> -		len -= PAGE_SIZE;
>>  	}
>>  	return 0;
>>  }
>>
> 
> 



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC/PATCH v2 01/22] s390/mm: make gmap_protect_range more modular
  2018-01-22 12:31     ` Janosch Frank
@ 2018-01-22 12:50       ` David Hildenbrand
  2018-01-22 13:02         ` Janosch Frank
  0 siblings, 1 reply; 67+ messages in thread
From: David Hildenbrand @ 2018-01-22 12:50 UTC (permalink / raw)
  To: Janosch Frank, kvm; +Cc: schwidefsky, borntraeger, dominik.dingel, linux-s390


>>
>>> +	if (!pmdp || pmd_none(*pmdp)) {
>>> +		spin_unlock(&gmap->guest_table_lock);
>>> +		return NULL;
>>> +	}
>>> +	/*
>>> +	 * For plain 4k guests that do not run under the vsie it
>>> +	 * suffices to take the pte lock later on. Thus we can unlock
>>> +	 * the guest_table_lock here.
>>> +	 */
>>
>> As discussed, the gmap_is_shadow() check is not needed. The comment
>> should be something like
> 
> IFF we'll never use this function to walk shadow tables, then you are
> right. We can make it a policy and throw in a BUG_ON.

Right. We never protect anything on a shadow gmap. We only mirror the
access rights requested by the guest (which are then valid in the host).

> 
> [...]
>>> +static int gmap_protect_pte(struct gmap *gmap, unsigned long gaddr,
>>> +			    pmd_t *pmdp, int prot, unsigned long bits)
>>> +{
>>> +	int rc;
>>> +	pte_t *ptep;
>>> +	spinlock_t *ptl = NULL;
>>> +
>>> +	/* We have no upper segment, let's go back and fix this up. */
>>> +	if (pmd_val(*pmdp) & _SEGMENT_ENTRY_INVALID)
>>> +		return -EAGAIN;
>>
>> This is essentially pmd_none(*pmdp), which you already verified in
>> gmap_pmd_op_walk().
> 
> Well, not really: pmd_none() means entry == ENTRY_EMPTY (only the I bit
> set), not entry & I.
> Is there a path where we have an I bit on a pmd entry which has a valid pto?
> 

The thing is, idte only sets the invalid bit. But can this check then go
into gmap_pmd_op_walk() (replacing pmd_none())?

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC/PATCH v2 01/22] s390/mm: make gmap_protect_range more modular
  2018-01-22 12:50       ` David Hildenbrand
@ 2018-01-22 13:02         ` Janosch Frank
  0 siblings, 0 replies; 67+ messages in thread
From: Janosch Frank @ 2018-01-22 13:02 UTC (permalink / raw)
  To: David Hildenbrand, kvm
  Cc: schwidefsky, borntraeger, dominik.dingel, linux-s390


[-- Attachment #1.1: Type: text/plain, Size: 1771 bytes --]

On 22.01.2018 13:50, David Hildenbrand wrote:
> 
>>>
>>>> +	if (!pmdp || pmd_none(*pmdp)) {
>>>> +		spin_unlock(&gmap->guest_table_lock);
>>>> +		return NULL;
>>>> +	}
>>>> +	/*
>>>> +	 * For plain 4k guests that do not run under the vsie it
>>>> +	 * suffices to take the pte lock later on. Thus we can unlock
>>>> +	 * the guest_table_lock here.
>>>> +	 */
>>>
>>> As discussed, the gmap_is_shadow() check is not needed. The comment
>>> should be something like
>>
>> IFF we'll never use this function to walk shadow tables, then you are
>> right. We can make it a policy and throw in a BUG_ON.
> 
> Right. We never protect anything on a shadow gmap. We only mirror the
> access rights requested by the guest (which are then valid in the host).

For now I'll introduce a comment and a BUG_ON, and get rid of the check.

> 
>>
>> [...]
>>>> +static int gmap_protect_pte(struct gmap *gmap, unsigned long gaddr,
>>>> +			    pmd_t *pmdp, int prot, unsigned long bits)
>>>> +{
>>>> +	int rc;
>>>> +	pte_t *ptep;
>>>> +	spinlock_t *ptl = NULL;
>>>> +
>>>> +	/* We have no upper segment, let's go back and fix this up. */
>>>> +	if (pmd_val(*pmdp) & _SEGMENT_ENTRY_INVALID)
>>>> +		return -EAGAIN;
>>>
>>> This is essentially pmd_none(*pmdp), which you already verified in
>>> gmap_pmd_op_walk().
>>
>> Well, not really: pmd_none() means entry == ENTRY_EMPTY (only the I bit
>> set), not entry & I.
>> Is there a path where we have an I bit on a pmd entry which has a valid pto?
>>
> 
> The thing is, idte only sets the invalid bit. But can this check then go
> into gmap_pmd_op_walk() (replacing pmd_none())?

I'll have to think about that, but quite possibly yes.
The problem comes from Martin's wish to properly handle prot_none entries.
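
Rough sketch of what I have in mind (untested, and it does not answer the
prot_none question yet):

static inline pmd_t *gmap_pmd_op_walk(struct gmap *gmap, unsigned long gaddr)
{
	pmd_t *pmdp;

	/* Callers must never pass a shadow gmap here. */
	BUG_ON(gmap_is_shadow(gmap));

	spin_lock(&gmap->guest_table_lock);
	pmdp = (pmd_t *) gmap_table_walk(gmap, gaddr, 1);
	/* Reject empty and invalidated entries right away. */
	if (!pmdp || (pmd_val(*pmdp) & _SEGMENT_ENTRY_INVALID)) {
		spin_unlock(&gmap->guest_table_lock);
		return NULL;
	}
	/* 4k page table entries are locked via the pte (pte_alloc_map_lock). */
	if (!pmd_large(*pmdp))
		spin_unlock(&gmap->guest_table_lock);
	return pmdp;
}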




[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC/PATCH v2 03/22] s390/mm: add gmap PMD invalidation notification
  2018-01-22 11:46   ` David Hildenbrand
@ 2018-01-22 13:13     ` Janosch Frank
  2018-01-22 13:29       ` David Hildenbrand
  0 siblings, 1 reply; 67+ messages in thread
From: Janosch Frank @ 2018-01-22 13:13 UTC (permalink / raw)
  To: David Hildenbrand, kvm
  Cc: schwidefsky, borntraeger, dominik.dingel, linux-s390


[-- Attachment #1.1: Type: text/plain, Size: 2174 bytes --]

On 22.01.2018 12:46, David Hildenbrand wrote:
> On 13.12.2017 13:53, Janosch Frank wrote:
>> For later migration of huge pages we want to write-protect guest
>> PMDs. While doing this, we have to make absolutely sure, that the
>> guest's lowcore is always accessible when the VCPU is running. With
>> PTEs, this is solved by marking the PGSTEs of the lowcore pages with
>> the invalidation notification bit and kicking the guest out of the SIE
>> via a notifier function if we need to invalidate such a page.
>>
>> With PMDs we do not have PGSTEs or some other bits we could use in the
>> host PMD. Instead we pick one of the free bits in the gmap PMD. Every
>> time a host pmd will be invalidated, we will check if the respective
>> gmap PMD has the bit set and in that case fire up the notifier.
>>
>> In the first step we only support setting the invalidation bit, but we
>> do not support restricting access of guest pmds. It will follow
>> shortly.
> 
> I am wondering if we could avoid having invalidation bits on PMDs
> completely by always splitting up a PMD huge page into PTEs.
> 
> I assume this would make the code easier, as we need to split up PMDs
> either way when protecting for the shadow gmap.
> 
> This would imply that our notification handler would only have to be
> called for 4k pages, which also makes that part easier.

Except for 1MB shadowed segments which still need an invalidation handler.

> 
> This would mean that the 1MB segments where the prefixes live would
> always be split into 4k pages - but do we care?

Hmm, I currently don't see a medium to huge benefit from it.

> 
> I somehow dislike that somebody registers a notifier for some subregion
> (e.g. 8k) but gets notified about a huge page (1mb).
> 
> Opinions?

Well, if you start bending reality to your will, things get messy.

Having two tables with different depths is a risk, as I can't be sure if
at some point a common code function will touch a split pte and set a
problematic value. But then again, we already do that anyhow.

As I'd need to rearrange patches again, I'd either do that after this
series or leave it be.


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC/PATCH v2 11/22] s390/mm: Remove superfluous parameter
  2017-12-13 12:53 ` [RFC/PATCH v2 11/22] s390/mm: Remove superfluous parameter Janosch Frank
  2017-12-21  9:22   ` Janosch Frank
  2018-01-16 13:11   ` David Hildenbrand
@ 2018-01-22 13:14   ` Christian Borntraeger
  2018-01-22 13:24     ` Martin Schwidefsky
  2 siblings, 1 reply; 67+ messages in thread
From: Christian Borntraeger @ 2018-01-22 13:14 UTC (permalink / raw)
  To: Janosch Frank, kvm; +Cc: schwidefsky, david, dominik.dingel, linux-s390

Thanks, applied.
Martin, unless you have concerns I will take this via the kvm tree.

On 12/13/2017 01:53 PM, Janosch Frank wrote:
> It seems it hasn't even been used before the last cleanup and was
> overlooked.
> 
> Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com>
> ---
>  arch/s390/mm/gmap.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
> index ffc11d8..d396da8 100644
> --- a/arch/s390/mm/gmap.c
> +++ b/arch/s390/mm/gmap.c
> @@ -2237,7 +2237,7 @@ EXPORT_SYMBOL_GPL(gmap_shadow_page);
>   * Called with sg->parent->shadow_lock.
>   */
>  static void gmap_shadow_notify(struct gmap *sg, unsigned long vmaddr,
> -			       unsigned long gaddr, pte_t *pte)
> +			       unsigned long gaddr)
>  {
>  	struct gmap_rmap *rmap, *rnext, *head;
>  	unsigned long start, end, bits, raddr;
> @@ -2322,7 +2322,7 @@ void ptep_notify(struct mm_struct *mm, unsigned long vmaddr,
>  			spin_lock(&gmap->shadow_lock);
>  			list_for_each_entry_safe(sg, next,
>  						 &gmap->children, list)
> -				gmap_shadow_notify(sg, vmaddr, gaddr, pte);
> +				gmap_shadow_notify(sg, vmaddr, gaddr);
>  			spin_unlock(&gmap->shadow_lock);
>  		}
>  		if (bits & PGSTE_IN_BIT)
> 

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC/PATCH v2 11/22] s390/mm: Remove superfluous parameter
  2018-01-22 13:14   ` Christian Borntraeger
@ 2018-01-22 13:24     ` Martin Schwidefsky
  0 siblings, 0 replies; 67+ messages in thread
From: Martin Schwidefsky @ 2018-01-22 13:24 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Janosch Frank, kvm, david, dominik.dingel, linux-s390

On Mon, 22 Jan 2018 14:14:56 +0100
Christian Borntraeger <borntraeger@de.ibm.com> wrote:

Yes, please take this via the kvm tree.

>  thanks applied. 
> Martin unless you have concerns I will take this via the kvm tree.
> 
> On 12/13/2017 01:53 PM, Janosch Frank wrote:
> > It seems it hasn't even been used before the last cleanup and was
> > overlooked.
> > 
> > Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com>
> > ---
> >  arch/s390/mm/gmap.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
> > index ffc11d8..d396da8 100644
> > --- a/arch/s390/mm/gmap.c
> > +++ b/arch/s390/mm/gmap.c
> > @@ -2237,7 +2237,7 @@ EXPORT_SYMBOL_GPL(gmap_shadow_page);
> >   * Called with sg->parent->shadow_lock.
> >   */
> >  static void gmap_shadow_notify(struct gmap *sg, unsigned long vmaddr,
> > -			       unsigned long gaddr, pte_t *pte)
> > +			       unsigned long gaddr)
> >  {
> >  	struct gmap_rmap *rmap, *rnext, *head;
> >  	unsigned long start, end, bits, raddr;
> > @@ -2322,7 +2322,7 @@ void ptep_notify(struct mm_struct *mm, unsigned long vmaddr,
> >  			spin_lock(&gmap->shadow_lock);
> >  			list_for_each_entry_safe(sg, next,
> >  						 &gmap->children, list)
> > -				gmap_shadow_notify(sg, vmaddr, gaddr, pte);
> > +				gmap_shadow_notify(sg, vmaddr, gaddr);
> >  			spin_unlock(&gmap->shadow_lock);
> >  		}
> >  		if (bits & PGSTE_IN_BIT)
> >   


-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC/PATCH v2 03/22] s390/mm: add gmap PMD invalidation notification
  2018-01-22 13:13     ` Janosch Frank
@ 2018-01-22 13:29       ` David Hildenbrand
  2018-01-22 14:04         ` Janosch Frank
  0 siblings, 1 reply; 67+ messages in thread
From: David Hildenbrand @ 2018-01-22 13:29 UTC (permalink / raw)
  To: Janosch Frank, kvm; +Cc: schwidefsky, borntraeger, dominik.dingel, linux-s390

On 22.01.2018 14:13, Janosch Frank wrote:
> On 22.01.2018 12:46, David Hildenbrand wrote:
>> On 13.12.2017 13:53, Janosch Frank wrote:
>>> For later migration of huge pages we want to write-protect guest
>>> PMDs. While doing this, we have to make absolutely sure, that the
>>> guest's lowcore is always accessible when the VCPU is running. With
>>> PTEs, this is solved by marking the PGSTEs of the lowcore pages with
>>> the invalidation notification bit and kicking the guest out of the SIE
>>> via a notifier function if we need to invalidate such a page.
>>>
>>> With PMDs we do not have PGSTEs or some other bits we could use in the
>>> host PMD. Instead we pick one of the free bits in the gmap PMD. Every
>>> time a host pmd will be invalidated, we will check if the respective
>>> gmap PMD has the bit set and in that case fire up the notifier.
>>>
>>> In the first step we only support setting the invalidation bit, but we
>>> do not support restricting access of guest pmds. It will follow
>>> shortly.
>>
>> I am wondering if we could avoid having invalidation bits on PMDs
>> completely by always splitting up a PMD huge page into PTEs.
>>
>> I assume this would make the code easier, as we need to split up PMDs
>> either way when protecting for the shadow gmap.
>>
>> This would imply that our notification handler would only have to be
>> called for 4k pages, which also makes that part easier.
> 
> Except for 1MB shadowed segments which still need an invalidation handler.

But that doesn't go via gmap_protect_range() / gmap_call_notifier(), if I
am not mistaken.


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC/PATCH v2 03/22] s390/mm: add gmap PMD invalidation notification
  2018-01-22 13:29       ` David Hildenbrand
@ 2018-01-22 14:04         ` Janosch Frank
  0 siblings, 0 replies; 67+ messages in thread
From: Janosch Frank @ 2018-01-22 14:04 UTC (permalink / raw)
  To: David Hildenbrand, kvm
  Cc: schwidefsky, borntraeger, dominik.dingel, linux-s390


[-- Attachment #1.1: Type: text/plain, Size: 1753 bytes --]

On 22.01.2018 14:29, David Hildenbrand wrote:
> On 22.01.2018 14:13, Janosch Frank wrote:
>> On 22.01.2018 12:46, David Hildenbrand wrote:
>>> On 13.12.2017 13:53, Janosch Frank wrote:
>>>> For later migration of huge pages we want to write-protect guest
>>>> PMDs. While doing this, we have to make absolutely sure, that the
>>>> guest's lowcore is always accessible when the VCPU is running. With
>>>> PTEs, this is solved by marking the PGSTEs of the lowcore pages with
>>>> the invalidation notification bit and kicking the guest out of the SIE
>>>> via a notifier function if we need to invalidate such a page.
>>>>
>>>> With PMDs we do not have PGSTEs or some other bits we could use in the
>>>> host PMD. Instead we pick one of the free bits in the gmap PMD. Every
>>>> time a host pmd will be invalidated, we will check if the respective
>>>> gmap PMD has the bit set and in that case fire up the notifier.
>>>>
>>>> In the first step we only support setting the invalidation bit, but we
>>>> do not support restricting access of guest pmds. It will follow
>>>> shortly.
>>>
>>> I am wondering if we could avoid having invalidation bits on PMDs
>>> completely by always splitting up a PMD huge page into PTEs.
>>>
>>> I assume this would make the code easier - as we need split up of PMDs
>>> either way when protecting for the shadow gmap.
>>>
>>> This would imply that also our notification handler only has to be
>>> called for 4k pages, which also makes that part easier.
>>
>> Except for 1MB shadowed segments which still need an invalidation handler.
> 
> But that doesn't go via gmap_protect_range() / gmap_call_notifier() if I
> am not mistaken.

Yes, my point was that we'd still need pmdp_notify.


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC/PATCH v2 00/22] KVM/s390: Hugetlbfs enablement
  2017-12-13 12:53 [RFC/PATCH v2 00/22] KVM/s390: Hugetlbfs enablement Janosch Frank
                   ` (23 preceding siblings ...)
  2018-01-22 11:23 ` David Hildenbrand
@ 2018-01-23 21:15 ` David Hildenbrand
  2018-01-24  9:01   ` Janosch Frank
  24 siblings, 1 reply; 67+ messages in thread
From: David Hildenbrand @ 2018-01-23 21:15 UTC (permalink / raw)
  To: Janosch Frank, kvm; +Cc: schwidefsky, borntraeger, dominik.dingel, linux-s390

On 13.12.2017 13:53, Janosch Frank wrote:
> Since the z10 s390 does support 1M pages, but whereas hugetlbfs
> support was added quite fast, KVM always used standard 4k pages for
> guest backings.
> 
> This patchset adds full support for 1M huge page backings for s390
> KVM guests. I.e. we also support VSIE (nested vms) for these guests
> and are therefore able to run all combinations of backings for all
> layers of guests.
> 
> When running a VSIE guest in a huge page backed guest, we need to
> split some huge pages to be able to set granular protection. This way
> we avoid a prot/unprot cycle if prefixes and VSIE pages containing
> level 3 gmap DAT tables share the same segment, as the prefix has to
> be accessible at all times and the VSIE page has to be write
> protected.
> 
> TODO:
> * Cleanups & Documentation
> * Refactoring to get rid of a lot of indents
> * Find a way to reduce or beautify bit checks on table entries
> * Storage key support for split pages (will be a separate bugfix)
> * Regression testing
> * Testing large setups
> * Testing multi level VSIE
> 
> V2:
> 	* Incorporated changes from David's cleanup
> 	* Now flushing with IDTE_NODAT for protection transfers.
> 	* Added RRBE huge page handling for g2 -> g3 skey emulation
> 	* Added documentation for capability
> 	* Renamed GMAP_ENTRY_* constants
> 	* Added SEGMENT hardware bits constants
> 	* Improved some patch descriptions
> 	* General small improvements
> 	* Introduced pte_from_pmd function
> 
> Accomplished testing:
> l2: KVM guest
> l3: nested KVM guest
> 
> * 1m l2 guests
> * VSIE (l3) 4k and 1m guests on 1m l2
> * 1m l2 -> l2 migration with 4k/1m l3 guests
> * l3 -> l2 migration
> * postcopy works every second try, seems to be QEMU or my setup
> 

Please correct me if I'm wrong (this stuff is complicated):


Right now we have to split huge pages under the following condition:

a) We are write protecting (prot != PROT_WRITE) ...
b) ... and we are doing it during shadow page table creation
(GMAP_NOTIFY_SHADOW)

-> gmap_protect_pmd()


This is to work around issues (RW vs. RO) when
a) G2 puts G2->G3 DAT tables on same huge page as a G2 prefix
b) Guest G2->G3 DAT tables on same huge page as G2->G3 pages referenced
in such a table

"we cannot have RO and RW at the same time if things depend on each other".


Now, the interesting thing is, for shadow page tables
(GMAP_NOTIFY_SHADOW), we only protect RO: via gmap_protect_rmap() and
gmap_protect_range().

So basically for all shadow page table housekeeping, we never protect on
pmds but only on ptes. -> We always split huge pages

This implies an important insight: _SEGMENT_ENTRY_GMAP_VSIE is never
used. (I will prepare a cleanup patch to make PROT_READ implicit on
e.g. gmap_protect_rmap(), because that clarifies things a lot.)


Right now, the only case where we protect huge pages without splitting
them up is the prefix, as I already mentioned. And as discussed, I doubt
this is really worth it. We can get rid of a lot of code this way.


Long story short:

If we simply split up huge pages when protecting the prefix, we don't
need gmap_protect_pmd() anymore, and therefore also (at least) not

- s390/mm: Abstract gmap notify bit setting
- s390/mm: add gmap PMD invalidation notification


So I think doing proper sub-hugepage protection right from the beginning
makes perfect sense.
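
To make that concrete, the per-segment part of gmap_protect_range()
could then boil down to something like this (untested sketch, reusing
gmap_pmd_split() and gmap_protect_pte() from this series; locking and
the fixup/retry path are omitted):

		pmdp = gmap_pmd_op_walk(gmap, gaddr);
		if (pmdp) {
			if (pmd_large(*pmdp)) {
				/* always fall back to 4k granularity */
				rc = gmap_pmd_split(gmap, gaddr, pmdp);
				if (!rc)
					rc = -EAGAIN;	/* redo this chunk on ptes */
			} else {
				rc = gmap_protect_pte(gmap, gaddr, vmaddr,
						      pmdp, hpmdp, prot, bits);
				if (!rc) {
					len -= PAGE_SIZE;
					gaddr += PAGE_SIZE;
				}
			}
			gmap_pmd_op_end(gmap, pmdp);
		}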

@Martin, Christian, am I missing something? What's your take on this?

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC/PATCH v2 00/22] KVM/s390: Hugetlbfs enablement
  2018-01-23 21:15 ` David Hildenbrand
@ 2018-01-24  9:01   ` Janosch Frank
  2018-01-24  9:14     ` David Hildenbrand
  0 siblings, 1 reply; 67+ messages in thread
From: Janosch Frank @ 2018-01-24  9:01 UTC (permalink / raw)
  To: David Hildenbrand, kvm
  Cc: schwidefsky, borntraeger, dominik.dingel, linux-s390


[-- Attachment #1.1: Type: text/plain, Size: 2522 bytes --]

On 23.01.2018 22:15, David Hildenbrand wrote:
> On 13.12.2017 13:53, Janosch Frank wrote:
> Please correct me if I'm wrong (this stuff is complicated):
> 
> 
> Right now we have to split huge pages under the following condition:
> 
> a) We are write protecting (prot != PROT_WRITE) ...
> b) ... and we are doing it during shadow page table creation
> (GMAP_NOTIFY_SHADOW)
> 
> -> gmap_protect_pmd()

Yes

> 
> 
> This is to work around issues (RW vs. RO) when
> a) G2 puts G2->G3 DAT tables on same huge page as a G2 prefix
> b) Guest G2->G3 DAT tables on same huge page as G2->G3 pages referenced
> in such a table
> 
> "we cannot have RO and RW at the same time if things depend on each other".

Yes

> 
> 
> Now, the interesting thing is, for shadow page tables
> (GMAP_NOTIFY_SHADOW), we only protect RO: via gmap_protect_rmap() and
> gmap_protect_range().
> 
> So basically for all shadow page table housekeeping, we never protect on
> pmds but only on ptes. -> We always split huge pages
> 
> This implies an important insight: _SEGMENT_ENTRY_GMAP_VSIE is never
> used. (I will prepare a cleanup patch to make PROT_READ implicit on
> e.g. gmap_protect_rmap(), because that clarifies things a lot.)

Yes, I guess _SEGMENT_ENTRY_GMAP_VSIE is a leftover from before the
splitting.

> 
> 
> Right now, the only case where we protect huge pages without splitting
> them up is the prefix, as I already mentioned. And as discussed, I doubt
> this is really worth it. We can get rid of a lot of code this way.

See next answer

> 
> 
> Long story short:
> 
> If we simply split up huge pages when protecting the prefix, we don't
> need gmap_protect_pmd() anymore, and therefore also (at least) not

We need it for the dirty tracking, no?

> 
> - s390/mm: Abstract gmap notify bit setting

Yes, that's not needed then.

> - s390/mm: add gmap PMD invalidation notification

We need that one (in parts) because of the protection transfer to user
space. We will be notified on mm pmds. Even if we split a pmd, we will
be notified on a pmd, not on a pte. So we need at least a skeleton that
calls pmdp_notify_split.
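
Roughly, pmdp_notify() then just needs a branch like this for every
affected gmap (untested sketch; gmap_pmd_is_split() is the split helper
from this series, and the pmdp_notify_split() argument list is only my
guess here):

		if (gmap_pmd_is_split((pmd_t *)table)) {
			/* argument list assumed, the real one may differ */
			pmdp_notify_split(mm, vmaddr, table);
			spin_unlock(&gmap->guest_table_lock);
			continue;
		}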

I'm currently preparing a patch that rips out the pmd protection done
via software bits. I'll attach it when finished, so we can have a look
at what can go.

> 
> 
> So I think doing proper sub-hugepage protection right from the beginning
> makes perfect sense.
> 
> @Martin, Christian, am I missing something? What's your take on this?
> 



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC/PATCH v2 00/22] KVM/s390: Hugetlbfs enablement
  2018-01-24  9:01   ` Janosch Frank
@ 2018-01-24  9:14     ` David Hildenbrand
  2018-01-25 15:33       ` [PATCH 0/2] Huge page pte protection Janosch Frank
  2018-01-26 10:34       ` [PATCH v2] mm: s390: Only notify on 4k pages Janosch Frank
  0 siblings, 2 replies; 67+ messages in thread
From: David Hildenbrand @ 2018-01-24  9:14 UTC (permalink / raw)
  To: Janosch Frank, kvm; +Cc: schwidefsky, borntraeger, dominik.dingel, linux-s390


>> If we simply split up huge pages when protecting the prefix, we don't
>> need gmap_protect_pmd() anymore, and therefore also (at least) not
> 
> We need it for the dirty tracking, no?

Indeed, I missed that call. But we don't set any notifier bits there;
that's the important part.
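
i.e. in gmap_test_and_clear_dirty_segment() the call can end up as
plain (sketch of the direction, not the final signature):

	/* reset protection for dirty tracking, no notification bits */
	gmap_protect_pmd(gmap, gaddr, vmaddr, pmdp, hpmdp, PROT_READ);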

> 
>>
>> - s390/mm: Abstract gmap notify bit setting
> 
> Yes, that's not needed then.
> 
>> - s390/mm: add gmap PMD invalidation notification
> 
> We need that one (in parts) because of the protection transfer to user
> space. We will be notified on mm pmds. Even if we split a pmd, we will
> be notified on a pmd, not on a pte. So we need at least a skeleton that
> calls pmdp_notify_split.
> 
> I'm currently preparing a patch that rips out pmd protection with
> software bits. I'll attach it when finished, so we can have a look what
> can go.
> 

Yes, parts of it. Especially without notifier bits.


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC/PATCH v2 05/22] s390/mm: hugetlb pages within a gmap can not be freed
  2017-12-13 12:53 ` [RFC/PATCH v2 05/22] s390/mm: hugetlb pages within a gmap can not be freed Janosch Frank
@ 2018-01-24 13:45   ` David Hildenbrand
  2018-01-24 13:56     ` Janosch Frank
  0 siblings, 1 reply; 67+ messages in thread
From: David Hildenbrand @ 2018-01-24 13:45 UTC (permalink / raw)
  To: Janosch Frank, kvm; +Cc: schwidefsky, borntraeger, dominik.dingel, linux-s390

On 13.12.2017 13:53, Janosch Frank wrote:
> From: Dominik Dingel <dingel@linux.vnet.ibm.com>
> 
> Guests backed by huge pages could theoretically free unused pages via
> the diagnose 10 instruction. We currently don't allow that, so we
> don't have to refault it once it's needed again.
> 
> Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
> Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
> ---
>  arch/s390/mm/gmap.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
> index aceaeb5..056acfc 100644
> --- a/arch/s390/mm/gmap.c
> +++ b/arch/s390/mm/gmap.c
> @@ -695,6 +695,9 @@ void gmap_discard(struct gmap *gmap, unsigned long from, unsigned long to)
>  		vmaddr |= gaddr & ~PMD_MASK;
>  		/* Find vma in the parent mm */
>  		vma = find_vma(gmap->mm, vmaddr);
> +		/* We do not discard pages that are backed by hugetlbfs */
> +		if (vma && is_vm_hugetlb_page(vma))
> +			continue;
>  		size = min(to - gaddr, PMD_SIZE - (gaddr & ~PMD_MASK));
>  		zap_page_range(vma, vmaddr, size);
>  	}
> 

This check does not care about split huge pages, correct? (because we're
checking the VMA?)

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC/PATCH v2 05/22] s390/mm: hugetlb pages within a gmap can not be freed
  2018-01-24 13:45   ` David Hildenbrand
@ 2018-01-24 13:56     ` Janosch Frank
  0 siblings, 0 replies; 67+ messages in thread
From: Janosch Frank @ 2018-01-24 13:56 UTC (permalink / raw)
  To: David Hildenbrand, kvm
  Cc: schwidefsky, borntraeger, dominik.dingel, linux-s390


[-- Attachment #1.1: Type: text/plain, Size: 1286 bytes --]

On 24.01.2018 14:45, David Hildenbrand wrote:
> On 13.12.2017 13:53, Janosch Frank wrote:
>> From: Dominik Dingel <dingel@linux.vnet.ibm.com>
>>
>> Guests backed by huge pages could theoretically free unused pages via
>> the diagnose 10 instruction. We currently don't allow that, so we
>> don't have to refault it once it's needed again.
>>
>> Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
>> Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
>> ---
>>  arch/s390/mm/gmap.c | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
>> index aceaeb5..056acfc 100644
>> --- a/arch/s390/mm/gmap.c
>> +++ b/arch/s390/mm/gmap.c
>> @@ -695,6 +695,9 @@ void gmap_discard(struct gmap *gmap, unsigned long from, unsigned long to)
>>  		vmaddr |= gaddr & ~PMD_MASK;
>>  		/* Find vma in the parent mm */
>>  		vma = find_vma(gmap->mm, vmaddr);
>> +		/* We do not discard pages that are backed by hugetlbfs */
>> +		if (vma && is_vm_hugetlb_page(vma))
>> +			continue;
>>  		size = min(to - gaddr, PMD_SIZE - (gaddr & ~PMD_MASK));
>>  		zap_page_range(vma, vmaddr, size);
>>  	}
>>
> 
> This check does not care about split huge pages, correct? (because we're
> checking the VMA?)
> 

Correct


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC/PATCH v2 19/22] s390/mm: Split huge pages if granular protection is needed
  2017-12-13 12:53 ` [RFC/PATCH v2 19/22] s390/mm: Split huge pages if granular protection is needed Janosch Frank
@ 2018-01-25  7:16   ` Janosch Frank
  2018-01-25 14:39     ` David Hildenbrand
  0 siblings, 1 reply; 67+ messages in thread
From: Janosch Frank @ 2018-01-25  7:16 UTC (permalink / raw)
  To: kvm; +Cc: schwidefsky, borntraeger, david, dominik.dingel, linux-s390


[-- Attachment #1.1: Type: text/plain, Size: 3165 bytes --]

On 13.12.2017 13:53, Janosch Frank wrote:
> A guest can put DAT tables for a lower level guest in the same huge
> segment as one of its prefixes or a g3 page. This would make it
> necessary for the segment to be unprotected (because of the prefix)
> and protected (because of the shadowing) at the same time. This is not
> possible in this universe.
> 
> Hence we split the affected huge segment, so we can protect on a
> per-page basis. Such gmap segments are special and get a new software
> bit, that helps us handling this edge case.
> 
> Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com>
> ---
>  arch/s390/include/asm/gmap.h    |  13 ++
>  arch/s390/include/asm/pgtable.h |   7 +-
>  arch/s390/mm/fault.c            |  10 +-
>  arch/s390/mm/gmap.c             | 256 ++++++++++++++++++++++++++++++++++++----
>  arch/s390/mm/pgtable.c          |  51 ++++++++
>  5 files changed, 313 insertions(+), 24 deletions(-)

> @@ -1081,20 +1189,27 @@ static int gmap_protect_range(struct gmap *gmap, unsigned long gaddr,
>  	spinlock_t *ptl;
>  	unsigned long vmaddr, dist;
>  	pmd_t *pmdp, *hpmdp;
> -	int rc;
> +	int rc = 0;
> 
>  	while (len) {
>  		rc = -EAGAIN;
>  		vmaddr = __gmap_translate(gmap, gaddr);
>  		hpmdp = (pmd_t *)huge_pte_offset(gmap->mm, vmaddr, HPAGE_SIZE);
> +		if (!hpmdp)
> +			BUG();
>  		/* Do we need tests here? */
>  		ptl = pmd_lock(gmap->mm, hpmdp);
> 
>  		pmdp = gmap_pmd_op_walk(gmap, gaddr);
>  		if (pmdp) {
>  			if (!pmd_large(*pmdp)) {
> -				rc = gmap_protect_pte(gmap, gaddr, pmdp, prot,
> -						      bits);
> +				if (gmap_pmd_is_split(pmdp) &&
> +				    (bits & GMAP_NOTIFY_MPROT)) {
> +					pmd_val(*pmdp) |= _SEGMENT_ENTRY_GMAP_IN;
> +				}

@David:
This currently breaks my brain. There *was* a reason why I put this
there and I was quite insistent that we needed it. Something about
notification areas on splits, but I absolutely can't remember it. Sigh,
should've made a comment.

This might be a leftover from earlier versions, but could also keep us
from doing mprot notification on pte's.

> +
> +				rc = gmap_protect_pte(gmap, gaddr, vmaddr,
> +						      pmdp, hpmdp, prot, bits);
>  				if (!rc) {
>  					len -= PAGE_SIZE;
>  					gaddr += PAGE_SIZE;
> @@ -1111,7 +1226,9 @@ static int gmap_protect_range(struct gmap *gmap, unsigned long gaddr,

[...]

> @@ -2774,6 +2977,8 @@ void gmap_pmdp_idte_global(struct mm_struct *mm, unsigned long vmaddr)
>  					    IDTE_GLOBAL);
>  			else
>  				__pmdp_csp(pmdp);
> +
> +			gmap_pmd_split_free(pmdp);
>  			*entry = _SEGMENT_ENTRY_EMPTY;
>  		}
>  		spin_unlock(&gmap->guest_table_lock);
> @@ -2852,6 +3057,7 @@ void gmap_sync_dirty_log_pmd(struct gmap *gmap, unsigned long bitmap[4],
>  	pmd_t *pmdp, *hpmdp;
>  	spinlock_t *ptl;
> 
> +	/* Protection against gmap_link vsie unprotection. */
>  	hpmdp = (pmd_t *)huge_pte_offset(gmap->mm, vmaddr, HPAGE_SIZE);
>  	if (!hpmdp)
>  		return;
> @@ -2867,9 +3073,17 @@ void gmap_sync_dirty_log_pmd(struct gmap *gmap, unsigned long bitmap[4],
>  						      gaddr, vmaddr))
>  			memset(bitmap, 0xFF, 32);

s/0xFF/0xff/



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC/PATCH v2 19/22] s390/mm: Split huge pages if granular protection is needed
  2018-01-25  7:16   ` Janosch Frank
@ 2018-01-25 14:39     ` David Hildenbrand
  2018-01-25 14:55       ` Janosch Frank
  0 siblings, 1 reply; 67+ messages in thread
From: David Hildenbrand @ 2018-01-25 14:39 UTC (permalink / raw)
  To: Janosch Frank, kvm; +Cc: schwidefsky, borntraeger, dominik.dingel, linux-s390

On 25.01.2018 08:16, Janosch Frank wrote:
> On 13.12.2017 13:53, Janosch Frank wrote:
>> A guest can put DAT tables for a lower level guest in the same huge
>> segment as one of its prefixes or a g3 page. This would make it
>> necessary for the segment to be unprotected (because of the prefix)
>> and protected (because of the shadowing) at the same time. This is not
>> possible in this universe.
>>
>> Hence we split the affected huge segment, so we can protect on a
>> per-page basis. Such gmap segments are special and get a new software
>> bit, that helps us handling this edge case.
>>
>> Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com>
>> ---
>>  arch/s390/include/asm/gmap.h    |  13 ++
>>  arch/s390/include/asm/pgtable.h |   7 +-
>>  arch/s390/mm/fault.c            |  10 +-
>>  arch/s390/mm/gmap.c             | 256 ++++++++++++++++++++++++++++++++++++----
>>  arch/s390/mm/pgtable.c          |  51 ++++++++
>>  5 files changed, 313 insertions(+), 24 deletions(-)
> 
>> @@ -1081,20 +1189,27 @@ static int gmap_protect_range(struct gmap *gmap, unsigned long gaddr,
>>  	spinlock_t *ptl;
>>  	unsigned long vmaddr, dist;
>>  	pmd_t *pmdp, *hpmdp;
>> -	int rc;
>> +	int rc = 0;
>>
>>  	while (len) {
>>  		rc = -EAGAIN;
>>  		vmaddr = __gmap_translate(gmap, gaddr);
>>  		hpmdp = (pmd_t *)huge_pte_offset(gmap->mm, vmaddr, HPAGE_SIZE);
>> +		if (!hpmdp)
>> +			BUG();
>>  		/* Do we need tests here? */
>>  		ptl = pmd_lock(gmap->mm, hpmdp);
>>
>>  		pmdp = gmap_pmd_op_walk(gmap, gaddr);
>>  		if (pmdp) {
>>  			if (!pmd_large(*pmdp)) {
>> -				rc = gmap_protect_pte(gmap, gaddr, pmdp, prot,
>> -						      bits);
>> +				if (gmap_pmd_is_split(pmdp) &&
>> +				    (bits & GMAP_NOTIFY_MPROT)) {
>> +					pmd_val(*pmdp) |= _SEGMENT_ENTRY_GMAP_IN;
>> +				}
> 
> @David:
> This currently breaks my brain. There *was* a reason why I put this
> there and I was quite insistent that we needed it. Something about
> notification areas on splits, but I absolutely can't remember it. Sigh,
> should've made a comment.
> 
> This might be a leftover from earlier versions, but could also keep us
> from doing mprot notification on pte's.
> 

This is indeed confusing.

Somebody wants to protect a certain memory area (e.g. the PREFIX) and get
notified on changes to this subpart.

We have a split pmd
 -> we have huge pages in our user process tables
 -> we have 4k pages in our GMAP tables

Now, we protect a subpart of this huge page via PTEs. My assumption is
that your "mirroring code" might reduce access rights to the PMD in the
user process tables.

So e.g. via user space tables -> 1MB write protected
Via GMAP: only 8K write protected


In addition, you are setting _SEGMENT_ENTRY_GMAP_IN on the GMAP PMD.
This means that pmdp_notify_gmap() calls all notifiers
(gmap_call_notifier) in case the whole PMD is changed (e.g. by write
protecting for migration).

So if we get a gmap_pmdp_xchg(), we would see _SEGMENT_ENTRY_GMAP_IN and
trigger a notification. The PTEs remain unchecked (bad!).


Now, if I understand this correctly, what you would have to do is:

gmap_pmdp_xchg() (or rather pmdp_notify_gmap()) has to check whether it
is a real huge page or a split pmd. If split, it has to call the PTE
invalidators of the page table accordingly.

This makes this manual hack unnecessary.
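
Something along these lines (only to illustrate the structure, the
actual pte walk is left out):

	if (gmap_pmd_is_split(pmdp)) {
		/*
		 * Not a real huge page: walk the 4k page table behind
		 * the split pmd and run the pte invalidation/notification
		 * for every mapped pte instead of notifying on the pmd.
		 */
		return;
	}
	/* real huge pmd: notify on the whole segment as before */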

Hope this helps :)

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC/PATCH v2 19/22] s390/mm: Split huge pages if granular protection is needed
  2018-01-25 14:39     ` David Hildenbrand
@ 2018-01-25 14:55       ` Janosch Frank
  0 siblings, 0 replies; 67+ messages in thread
From: Janosch Frank @ 2018-01-25 14:55 UTC (permalink / raw)
  To: David Hildenbrand, kvm
  Cc: schwidefsky, borntraeger, dominik.dingel, linux-s390


[-- Attachment #1.1: Type: text/plain, Size: 4446 bytes --]

On 25.01.2018 15:39, David Hildenbrand wrote:
> On 25.01.2018 08:16, Janosch Frank wrote:
>> On 13.12.2017 13:53, Janosch Frank wrote:
>>> A guest can put DAT tables for a lower level guest in the same huge
>>> segment as one of its prefixes or a g3 page. This would make it
>>> necessary for the segment to be unprotected (because of the prefix)
>>> and protected (because of the shadowing) at the same time. This is not
>>> possible in this universe.
>>>
>>> Hence we split the affected huge segment, so we can protect on a
>>> per-page basis. Such gmap segments are special and get a new software
>>> bit, that helps us handling this edge case.
>>>
>>> Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com>
>>> ---
>>>  arch/s390/include/asm/gmap.h    |  13 ++
>>>  arch/s390/include/asm/pgtable.h |   7 +-
>>>  arch/s390/mm/fault.c            |  10 +-
>>>  arch/s390/mm/gmap.c             | 256 ++++++++++++++++++++++++++++++++++++----
>>>  arch/s390/mm/pgtable.c          |  51 ++++++++
>>>  5 files changed, 313 insertions(+), 24 deletions(-)
>>
>>> @@ -1081,20 +1189,27 @@ static int gmap_protect_range(struct gmap *gmap, unsigned long gaddr,
>>>  	spinlock_t *ptl;
>>>  	unsigned long vmaddr, dist;
>>>  	pmd_t *pmdp, *hpmdp;
>>> -	int rc;
>>> +	int rc = 0;
>>>
>>>  	while (len) {
>>>  		rc = -EAGAIN;
>>>  		vmaddr = __gmap_translate(gmap, gaddr);
>>>  		hpmdp = (pmd_t *)huge_pte_offset(gmap->mm, vmaddr, HPAGE_SIZE);
>>> +		if (!hpmdp)
>>> +			BUG();
>>>  		/* Do we need tests here? */
>>>  		ptl = pmd_lock(gmap->mm, hpmdp);
>>>
>>>  		pmdp = gmap_pmd_op_walk(gmap, gaddr);
>>>  		if (pmdp) {
>>>  			if (!pmd_large(*pmdp)) {
>>> -				rc = gmap_protect_pte(gmap, gaddr, pmdp, prot,
>>> -						      bits);
>>> +				if (gmap_pmd_is_split(pmdp) &&
>>> +				    (bits & GMAP_NOTIFY_MPROT)) {
>>> +					pmd_val(*pmdp) |= _SEGMENT_ENTRY_GMAP_IN;
>>> +				}
>>
>> @David:
>> This currently breaks my brain. There *was* a reason why I put this
>> there and I was quite insistent that we needed it. Something about
>> notification areas on splits, but I absolutely can't remember it. Sigh,
>> should've made a comment.
>>
>> This might be a leftover from earlier versions, but could also keep us
>> from doing mprot notification on pte's.
>>
> 
> This is indeed confusing.
> 
> Somebody wants to protect a certain memory area (e.g. the PREFIX) and get
> notified on changes to this subpart.
> 
> We have a split pmd
>  -> we have huge pages in our user process tables
>  -> we have 4k pages in our GMAP tables
> 
> Now, we protect a subpart of this huge page via PTEs. My assumption is,
> that your "mirroring code" might reduce access rights to the PMD in the
> user process tables.
Yes, it does at the end of gmap_protect_pte

> 
> So e.g. via user space tables -> 1MB write protected
> Via GMAP: only 8K write protected

Yes, and if userspace touches the pmd, we end up in pmdp_notify, which
will call the split notifier (pmdp_notify_split) and do the pte notification.

> 
> 
> In addition, you are setting the _SEGMENT_ENTRY_GMAP_IN on the GMAP PMD.
> This means, that the pmdp_notify_gmap() calls all notifiers
> (gmap_call_notifier) in case the whole PMD is changed (e.g. by write
> protecting for migration)
> 
> So if we get a gmap_pmdp_xchg(), we would see _SEGMENT_ENTRY_GMAP_IN and
> trigger a notification. The PTEs remain unchecked (bad!).

Split pmds have their own handling functions for setting and clearing RO,
which work on a pte basis:

ptep_remove_dirty_protection_split
test_and_clear_guest_dirty_split
ptep_notify_gmap

> 
> 
> Now, If I understand this correctly, what you would have to do:
> 
> gmap_pmdp_xchg() (or rather pmdp_notify_gmap()) has to check whether it
> is a real huge page or a split pmd. If split, it has to call the PTE
> invalidators of the page table accordingly.

If I didn't just confuse myself completely, we don't need that. We never
call xchg on a split pmd, except for setting up the split, and
pmdp_notify_gmap is never called on a split pmd either, as it's not a pmd
anymore (from the gmap's point of view; for mm we do ptep_notify_gmap_split).

> 
> This makes this manual hack unnecessary.
> 
> Hope this helps :)

A tiny bit.
Anyway, the change seems to be stable; I'll send it as a reply when I
come back from the next meeting. I also had no postcopy problems.

When we agree on the change I'll do a new version.




[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 67+ messages in thread

* [PATCH 0/2] Huge page pte protection
  2018-01-24  9:14     ` David Hildenbrand
@ 2018-01-25 15:33       ` Janosch Frank
  2018-01-25 15:33         ` [PATCH 1/2] mm: s390: Only notify on 4k pages Janosch Frank
  2018-01-25 15:33         ` [PATCH 2/2] mm: s390: Rename gmap_pte_op_fixup Janosch Frank
  2018-01-26 10:34       ` [PATCH v2] mm: s390: Only notify on 4k pages Janosch Frank
  1 sibling, 2 replies; 67+ messages in thread
From: Janosch Frank @ 2018-01-25 15:33 UTC (permalink / raw)
  To: kvm; +Cc: schwidefsky, borntraeger, david, dominik.dingel, linux-s390

So, this was stable in both precopy and postcopy, at least until I
stopped postcopy and hit my BUG_ON in gmap_shadow_notify_pmd. For some
reason, migration of my 10g l2 with small and big l3 guests takes 60s
instead of ~30s now. For the next version it will need some more love.

The second patch will be in the next version one way or another. I
just felt that gmap_pte_op_fixup looked weird in a pmd handling
function.

Janosch Frank (2):
  mm: s390: Only notify on 4k pages
  mm: s390: Rename gmap_pte_op_fixup

 arch/s390/include/asm/gmap.h |  5 +--
 arch/s390/mm/gmap.c          | 90 +++++++++++---------------------------------
 2 files changed, 23 insertions(+), 72 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 67+ messages in thread

* [PATCH 1/2] mm: s390: Only notify on 4k pages
  2018-01-25 15:33       ` [PATCH 0/2] Huge page pte protection Janosch Frank
@ 2018-01-25 15:33         ` Janosch Frank
  2018-01-25 16:04           ` David Hildenbrand
  2018-01-25 15:33         ` [PATCH 2/2] mm: s390: Rename gmap_pte_op_fixup Janosch Frank
  1 sibling, 1 reply; 67+ messages in thread
From: Janosch Frank @ 2018-01-25 15:33 UTC (permalink / raw)
  To: kvm; +Cc: schwidefsky, borntraeger, david, dominik.dingel, linux-s390

Let's try this

Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com>
---
 arch/s390/include/asm/gmap.h |  5 ++-
 arch/s390/mm/gmap.c          | 72 ++++++++------------------------------------
 2 files changed, 14 insertions(+), 63 deletions(-)

diff --git a/arch/s390/include/asm/gmap.h b/arch/s390/include/asm/gmap.h
index 6287aca..4120360 100644
--- a/arch/s390/include/asm/gmap.h
+++ b/arch/s390/include/asm/gmap.h
@@ -13,9 +13,8 @@
 #define GMAP_NOTIFY_SHADOW	0x2
 #define GMAP_NOTIFY_MPROT	0x1
 
-/* Status bits in huge and non-huge gmap segment entries. */
-#define _SEGMENT_ENTRY_GMAP_IN		0x0001	/* invalidation notify bit */
-#define _SEGMENT_ENTRY_GMAP_SPLIT	0x0002  /* split huge pmd */
+/* Status bit in huge and non-huge gmap segment entries. */
+#define _SEGMENT_ENTRY_GMAP_SPLIT	0x0001  /* split huge pmd */
 /* Status bits only for huge segment entries */
 #define _SEGMENT_ENTRY_GMAP_UC		0x4000	/* user dirty (migration) */
 #define _SEGMENT_ENTRY_GMAP_VSIE	0x8000	/* vsie bit */
diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
index 10e0690..c47964f 100644
--- a/arch/s390/mm/gmap.c
+++ b/arch/s390/mm/gmap.c
@@ -998,7 +998,7 @@ static void gmap_pte_transfer_prot(struct mm_struct *mm, unsigned long addr,
  * and requested access rights are incompatible.
  */
 static int gmap_pmdp_force_prot(struct gmap *gmap, unsigned long addr,
-				pmd_t *pmdp, int prot, unsigned long bits)
+				pmd_t *pmdp, int prot)
 {
 	int pmd_i = pmd_val(*pmdp) & _SEGMENT_ENTRY_INVALID;
 	int pmd_p = pmd_val(*pmdp) & _SEGMENT_ENTRY_PROTECT;
@@ -1018,7 +1018,6 @@ static int gmap_pmdp_force_prot(struct gmap *gmap, unsigned long addr,
 		pmd_val(new) |= _SEGMENT_ENTRY_PROTECT;
 		gmap_pmdp_xchg(gmap, pmdp, new, addr);
 	}
-	pmd_val(*pmdp) |=  bits;
 	return 0;
 }
 
@@ -1136,21 +1135,18 @@ static int gmap_protect_pmd(struct gmap *gmap, unsigned long gaddr,
 			    unsigned long vmaddr, pmd_t *pmdp, pmd_t *hpmdp,
 			    int prot, unsigned long bits)
 {
-	unsigned long sbits = 0;
 	int ret = 0;
 
-	sbits |= (bits & GMAP_NOTIFY_MPROT) ? _SEGMENT_ENTRY_GMAP_IN : 0;
-	sbits |= (bits & GMAP_NOTIFY_SHADOW) ? _SEGMENT_ENTRY_GMAP_VSIE : 0;
-
-	if (((prot != PROT_WRITE) && (bits & GMAP_NOTIFY_SHADOW))) {
+	/* We notify only on the smallest possible frame size, a 4k page. */
+	if (bits) {
 		ret = gmap_pmd_split(gmap, gaddr, pmdp);
 		if (ret)
 			return ret;
 		return -EFAULT;
 	}
 
-	/* Protect gmap pmd */
-	ret = gmap_pmdp_force_prot(gmap, gaddr, pmdp, prot, sbits);
+	/* Protect gmap pmd for dirty tracking. */
+	ret = gmap_pmdp_force_prot(gmap, gaddr, pmdp, prot);
 	/*
 	 * Transfer protection back to the host pmd, so userspace has
 	 * never more access rights than the VM.
@@ -1167,7 +1163,7 @@ static int gmap_protect_pmd(struct gmap *gmap, unsigned long gaddr,
  * @gaddr: virtual address in the guest address space
  * @len: size of area
  * @prot: indicates access rights: PROT_NONE, PROT_READ or PROT_WRITE
- * @bits: pgste notification bits to set
+ * @bits: notification bits to set
  *
  * Returns 0 if successfully protected, -ENOMEM if out of memory and
  * -EFAULT if gaddr is invalid (or mapping for shadows is missing).
@@ -1196,11 +1192,6 @@ static int gmap_protect_range(struct gmap *gmap, unsigned long gaddr,
 		pmdp = gmap_pmd_op_walk(gmap, gaddr);
 		if (pmdp) {
 			if (!pmd_large(*pmdp)) {
-				if (gmap_pmd_is_split(pmdp) &&
-				    (bits & GMAP_NOTIFY_MPROT)) {
-					pmd_val(*pmdp) |= _SEGMENT_ENTRY_GMAP_IN;
-				}
-
 				rc = gmap_protect_pte(gmap, gaddr, vmaddr,
 						      pmdp, hpmdp, prot, bits);
 				if (!rc) {
@@ -2562,53 +2553,20 @@ static void gmap_shadow_notify_pmd(struct gmap *sg, unsigned long vmaddr,
 				   unsigned long gaddr)
 {
 	struct gmap_rmap *rmap, *rnext, *head;
-	unsigned long start, end, bits, raddr;
+	unsigned long bits, raddr;
 
 
 	BUG_ON(!gmap_is_shadow(sg));
 
 	spin_lock_nested(&sg->guest_table_lock, GMAP_LOCK_SHADOW);
-	if (sg->removed) {
-		spin_unlock(&sg->guest_table_lock);
-		return;
-	}
-	/* Check for top level table */
-	start = sg->orig_asce & _ASCE_ORIGIN;
-	end = start + ((sg->orig_asce & _ASCE_TABLE_LENGTH) + 1) * 4096;
-	if (!(sg->orig_asce & _ASCE_REAL_SPACE) && gaddr >= start &&
-	    gaddr < ((end & HPAGE_MASK) + HPAGE_SIZE - 1)) {
-		/* The complete shadow table has to go */
-		gmap_unshadow(sg);
-		spin_unlock(&sg->guest_table_lock);
-		list_del(&sg->list);
-		gmap_put(sg);
-		return;
-	}
-	/* Remove the page table tree from on specific entry */
 	head = radix_tree_delete(&sg->host_to_rmap, (vmaddr & HPAGE_MASK) >> PAGE_SHIFT);
 	gmap_for_each_rmap_safe(rmap, rnext, head) {
 		bits = rmap->raddr & _SHADOW_RMAP_MASK;
 		raddr = rmap->raddr ^ bits;
-		switch (bits) {
-		case _SHADOW_RMAP_REGION1:
-			gmap_unshadow_r2t(sg, raddr);
-			break;
-		case _SHADOW_RMAP_REGION2:
-			gmap_unshadow_r3t(sg, raddr);
-			break;
-		case _SHADOW_RMAP_REGION3:
-			gmap_unshadow_sgt(sg, raddr);
-			break;
-		case _SHADOW_RMAP_SEGMENT_LP:
+		if (bits ==  _SHADOW_RMAP_SEGMENT_LP)
 			gmap_unshadow_segment(sg, raddr);
-			break;
-		case _SHADOW_RMAP_SEGMENT:
-			gmap_unshadow_pgt(sg, raddr);
-			break;
-		case _SHADOW_RMAP_PGTABLE:
-			gmap_unshadow_page(sg, raddr);
-			break;
-		}
+		else
+			BUG_ON(1);
 		kfree(rmap);
 	}
 	spin_unlock(&sg->guest_table_lock);
@@ -2777,9 +2735,8 @@ static void pmdp_notify_gmap(struct gmap *gmap, unsigned long gaddr)
 	table = gmap_table_walk(gmap, gaddr, 1);
 	if (!table)
 		return;
-	bits = *table & _SEGMENT_ENTRY_GMAP_IN;
 	if (pmd_large(__pmd(*table)) && (*table & _SEGMENT_ENTRY_GMAP_VSIE))
-		bits |= _SEGMENT_ENTRY_GMAP_VSIE;
+		bits = _SEGMENT_ENTRY_GMAP_VSIE;
 	if (!bits)
 		return;
 	*table &= ~bits;
@@ -2792,8 +2749,6 @@ static void pmdp_notify_gmap(struct gmap *gmap, unsigned long gaddr)
 			gmap_shadow_notify_pmd(sg, vmaddr, gaddr);
 		spin_unlock(&gmap->shadow_lock);
 	}
-	if (bits & _SEGMENT_ENTRY_GMAP_IN)
-		gmap_call_notifier(gmap, gaddr, gaddr + HPAGE_SIZE - 1);
 }
 
 static void pmdp_notify_split(struct mm_struct *mm, unsigned long vmaddr,
@@ -2841,9 +2796,8 @@ void pmdp_notify(struct mm_struct *mm, unsigned long vmaddr)
 			continue;
 		}
 
-		bits = *table & (_SEGMENT_ENTRY_GMAP_IN);
 		if (pmd_large(__pmd(*table)) && (*table & _SEGMENT_ENTRY_GMAP_VSIE))
-			bits |= _SEGMENT_ENTRY_GMAP_VSIE;
+			bits = _SEGMENT_ENTRY_GMAP_VSIE;
 		*table &= ~bits;
 		gaddr = __gmap_segment_gaddr(table);
 		spin_unlock(&gmap->guest_table_lock);
@@ -2854,8 +2808,6 @@ void pmdp_notify(struct mm_struct *mm, unsigned long vmaddr)
 				gmap_shadow_notify_pmd(sg, vmaddr, gaddr);
 			spin_unlock(&gmap->shadow_lock);
 		}
-		if (bits & _SEGMENT_ENTRY_GMAP_IN)
-			gmap_call_notifier(gmap, gaddr, gaddr + HPAGE_SIZE - 1);
 	}
 	rcu_read_unlock();
 }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH 2/2] mm: s390: Rename gmap_pte_op_fixup
  2018-01-25 15:33       ` [PATCH 0/2] Huge page pte protection Janosch Frank
  2018-01-25 15:33         ` [PATCH 1/2] mm: s390: Only notify on 4k pages Janosch Frank
@ 2018-01-25 15:33         ` Janosch Frank
  1 sibling, 0 replies; 67+ messages in thread
From: Janosch Frank @ 2018-01-25 15:33 UTC (permalink / raw)
  To: kvm; +Cc: schwidefsky, borntraeger, david, dominik.dingel, linux-s390

Now we also fixup segments (pmds), so it should rather be called
gmap_fixup.

Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com>
---
 arch/s390/mm/gmap.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
index c47964f..337fddd 100644
--- a/arch/s390/mm/gmap.c
+++ b/arch/s390/mm/gmap.c
@@ -843,7 +843,7 @@ static inline unsigned long *gmap_table_walk(struct gmap *gmap,
 }
 
 /**
- * gmap_pte_op_fixup - force a page in and connect the gmap page table
+ * gmap_fixup - force memory in and connect the gmap table entry
  * @gmap: pointer to guest mapping meta data structure
  * @gaddr: virtual address in the guest address space
  * @vmaddr: address in the host process address space
@@ -851,10 +851,10 @@ static inline unsigned long *gmap_table_walk(struct gmap *gmap,
  *
  * Returns 0 if the caller can retry __gmap_translate (might fail again),
  * -ENOMEM if out of memory and -EFAULT if anything goes wrong while fixing
- * up or connecting the gmap page table.
+ * up or connecting the gmap table entry.
  */
-static int gmap_pte_op_fixup(struct gmap *gmap, unsigned long gaddr,
-			     unsigned long vmaddr, int prot)
+static int gmap_fixup(struct gmap *gmap, unsigned long gaddr,
+		      unsigned long vmaddr, int prot)
 {
 	struct mm_struct *mm = gmap->mm;
 	unsigned int fault_flags;
@@ -1216,7 +1216,7 @@ static int gmap_protect_range(struct gmap *gmap, unsigned long gaddr,
 			vmaddr = __gmap_translate(gmap, gaddr);
 			if (IS_ERR_VALUE(vmaddr))
 				return vmaddr;
-			rc = gmap_pte_op_fixup(gmap, gaddr, vmaddr, prot);
+			rc = gmap_fixup(gmap, gaddr, vmaddr, prot);
 			if (rc)
 				return rc;
 		}
@@ -1314,7 +1314,7 @@ int gmap_read_table(struct gmap *gmap, unsigned long gaddr, unsigned long *val,
 			rc = vmaddr;
 			break;
 		}
-		rc = gmap_pte_op_fixup(gmap, gaddr, vmaddr, PROT_READ);
+		rc = gmap_fixup(gmap, gaddr, vmaddr, PROT_READ);
 		if (rc)
 			break;
 	}
@@ -1462,7 +1462,7 @@ static int gmap_protect_rmap(struct gmap *sg, unsigned long raddr,
 		if (rc)
 			kfree(rmap);
 		if (rc == -EAGAIN) {
-			rc = gmap_pte_op_fixup(parent, paddr, vmaddr, prot);
+			rc = gmap_fixup(parent, paddr, vmaddr, prot);
 			if (rc)
 				return rc;
 		}
@@ -2426,7 +2426,7 @@ int gmap_shadow_segment(struct gmap *sg, unsigned long saddr, pmd_t pmd)
 		radix_tree_preload_end();
 		if (!rc)
 			break;
-		rc = gmap_pte_op_fixup(parent, paddr, vmaddr, prot);
+		rc = gmap_fixup(parent, paddr, vmaddr, prot);
 		if (rc)
 			break;
 	}
@@ -2519,7 +2519,7 @@ int gmap_shadow_page(struct gmap *sg, unsigned long saddr, pte_t pte)
 		radix_tree_preload_end();
 		if (!rc)
 			break;
-		rc = gmap_pte_op_fixup(parent, paddr, vmaddr, prot);
+		rc = gmap_fixup(parent, paddr, vmaddr, prot);
 		if (rc)
 			break;
 	}
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* Re: [PATCH 1/2] mm: s390: Only notify on 4k pages
  2018-01-25 15:33         ` [PATCH 1/2] mm: s390: Only notify on 4k pages Janosch Frank
@ 2018-01-25 16:04           ` David Hildenbrand
  2018-01-26 10:31             ` Janosch Frank
  0 siblings, 1 reply; 67+ messages in thread
From: David Hildenbrand @ 2018-01-25 16:04 UTC (permalink / raw)
  To: Janosch Frank, kvm; +Cc: schwidefsky, borntraeger, dominik.dingel, linux-s390

On 25.01.2018 16:33, Janosch Frank wrote:
> Let's try this
> 
> Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com>
> ---
>  arch/s390/include/asm/gmap.h |  5 ++-
>  arch/s390/mm/gmap.c          | 72 ++++++++------------------------------------
>  2 files changed, 14 insertions(+), 63 deletions(-)
> 
> diff --git a/arch/s390/include/asm/gmap.h b/arch/s390/include/asm/gmap.h
> index 6287aca..4120360 100644
> --- a/arch/s390/include/asm/gmap.h
> +++ b/arch/s390/include/asm/gmap.h
> @@ -13,9 +13,8 @@
>  #define GMAP_NOTIFY_SHADOW	0x2
>  #define GMAP_NOTIFY_MPROT	0x1
>  
> -/* Status bits in huge and non-huge gmap segment entries. */
> -#define _SEGMENT_ENTRY_GMAP_IN		0x0001	/* invalidation notify bit */
> -#define _SEGMENT_ENTRY_GMAP_SPLIT	0x0002  /* split huge pmd */
> +/* Status bit in huge and non-huge gmap segment entries. */
> +#define _SEGMENT_ENTRY_GMAP_SPLIT	0x0001  /* split huge pmd */
>  /* Status bits only for huge segment entries */
>  #define _SEGMENT_ENTRY_GMAP_UC		0x4000	/* user dirty (migration) */
>  #define _SEGMENT_ENTRY_GMAP_VSIE	0x8000	/* vsie bit */
> diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
> index 10e0690..c47964f 100644
> --- a/arch/s390/mm/gmap.c
> +++ b/arch/s390/mm/gmap.c
> @@ -998,7 +998,7 @@ static void gmap_pte_transfer_prot(struct mm_struct *mm, unsigned long addr,
>   * and requested access rights are incompatible.
>   */
>  static int gmap_pmdp_force_prot(struct gmap *gmap, unsigned long addr,
> -				pmd_t *pmdp, int prot, unsigned long bits)
> +				pmd_t *pmdp, int prot)
>  {
>  	int pmd_i = pmd_val(*pmdp) & _SEGMENT_ENTRY_INVALID;
>  	int pmd_p = pmd_val(*pmdp) & _SEGMENT_ENTRY_PROTECT;
> @@ -1018,7 +1018,6 @@ static int gmap_pmdp_force_prot(struct gmap *gmap, unsigned long addr,
>  		pmd_val(new) |= _SEGMENT_ENTRY_PROTECT;
>  		gmap_pmdp_xchg(gmap, pmdp, new, addr);
>  	}
> -	pmd_val(*pmdp) |=  bits;
>  	return 0;
>  }
>  
> @@ -1136,21 +1135,18 @@ static int gmap_protect_pmd(struct gmap *gmap, unsigned long gaddr,
>  			    unsigned long vmaddr, pmd_t *pmdp, pmd_t *hpmdp,
>  			    int prot, unsigned long bits)
>  {
> -	unsigned long sbits = 0;
>  	int ret = 0;
>  
> -	sbits |= (bits & GMAP_NOTIFY_MPROT) ? _SEGMENT_ENTRY_GMAP_IN : 0;
> -	sbits |= (bits & GMAP_NOTIFY_SHADOW) ? _SEGMENT_ENTRY_GMAP_VSIE : 0;
> -
> -	if (((prot != PROT_WRITE) && (bits & GMAP_NOTIFY_SHADOW))) {
> +	/* We notify only on the smallest possible frame size, a 4k page. */
> +	if (bits) {
>  		ret = gmap_pmd_split(gmap, gaddr, pmdp);
>  		if (ret)
>  			return ret;
>  		return -EFAULT;
>  	}

See below, I think we should move that to the caller.

In particular, gmap_protect_rmap_pmd() should no longer be needed then
(if I am not messing things up).

>  
> -	/* Protect gmap pmd */
> -	ret = gmap_pmdp_force_prot(gmap, gaddr, pmdp, prot, sbits);
> +	/* Protect gmap pmd for dirty tracking. */
> +	ret = gmap_pmdp_force_prot(gmap, gaddr, pmdp, prot);
>  	/*
>  	 * Transfer protection back to the host pmd, so userspace has
>  	 * never more access rights than the VM.
> @@ -1167,7 +1163,7 @@ static int gmap_protect_pmd(struct gmap *gmap, unsigned long gaddr,
>   * @gaddr: virtual address in the guest address space
>   * @len: size of area
>   * @prot: indicates access rights: PROT_NONE, PROT_READ or PROT_WRITE
> - * @bits: pgste notification bits to set
> + * @bits: notification bits to set
>   *
>   * Returns 0 if successfully protected, -ENOMEM if out of memory and
>   * -EFAULT if gaddr is invalid (or mapping for shadows is missing).
> @@ -1196,11 +1192,6 @@ static int gmap_protect_range(struct gmap *gmap, unsigned long gaddr,
>  		pmdp = gmap_pmd_op_walk(gmap, gaddr);
>  		if (pmdp) {
>  			if (!pmd_large(*pmdp)) {
> -				if (gmap_pmd_is_split(pmdp) &&
> -				    (bits & GMAP_NOTIFY_MPROT)) {
> -					pmd_val(*pmdp) |= _SEGMENT_ENTRY_GMAP_IN;
> -				}
> -

Actually we can reduce this code here quite a lot by simply checking for

if (pmd_large(*pmdp)) {
	// split up into 4k ptes
	rc = -EAGAIN;
}

No need to call gmap_protect_pmd().

I think it makes sense to move the split handling completely out of
gmap_protect_pmd() and only call it at places where we need it.

So only gmap_test_and_clear_dirty_segment() should end up calling it.

We can then also get rid of the "bits" parameter here, which is nice.

>  				rc = gmap_protect_pte(gmap, gaddr, vmaddr,
>  						      pmdp, hpmdp, prot, bits);
>  				if (!rc) {
> @@ -2562,53 +2553,20 @@ static void gmap_shadow_notify_pmd(struct gmap *sg, unsigned long vmaddr,
>  				   unsigned long gaddr)
>  {
>  	struct gmap_rmap *rmap, *rnext, *head;
> -	unsigned long start, end, bits, raddr;
> +	unsigned long bits, raddr;
>  
>  
>  	BUG_ON(!gmap_is_shadow(sg));
>  
>  	spin_lock_nested(&sg->guest_table_lock, GMAP_LOCK_SHADOW);
> -	if (sg->removed) {
> -		spin_unlock(&sg->guest_table_lock);
> -		return;
> -	}
> -	/* Check for top level table */
> -	start = sg->orig_asce & _ASCE_ORIGIN;
> -	end = start + ((sg->orig_asce & _ASCE_TABLE_LENGTH) + 1) * 4096;
> -	if (!(sg->orig_asce & _ASCE_REAL_SPACE) && gaddr >= start &&
> -	    gaddr < ((end & HPAGE_MASK) + HPAGE_SIZE - 1)) {
> -		/* The complete shadow table has to go */
> -		gmap_unshadow(sg);
> -		spin_unlock(&sg->guest_table_lock);
> -		list_del(&sg->list);
> -		gmap_put(sg);
> -		return;
> -	}
> -	/* Remove the page table tree from on specific entry */
>  	head = radix_tree_delete(&sg->host_to_rmap, (vmaddr & HPAGE_MASK) >> PAGE_SHIFT);
>  	gmap_for_each_rmap_safe(rmap, rnext, head) {
>  		bits = rmap->raddr & _SHADOW_RMAP_MASK;
>  		raddr = rmap->raddr ^ bits;
> -		switch (bits) {
> -		case _SHADOW_RMAP_REGION1:
> -			gmap_unshadow_r2t(sg, raddr);
> -			break;
> -		case _SHADOW_RMAP_REGION2:
> -			gmap_unshadow_r3t(sg, raddr);
> -			break;
> -		case _SHADOW_RMAP_REGION3:
> -			gmap_unshadow_sgt(sg, raddr);
> -			break;
> -		case _SHADOW_RMAP_SEGMENT_LP:
> +		if (bits ==  _SHADOW_RMAP_SEGMENT_LP)
>  			gmap_unshadow_segment(sg, raddr);
> -			break;
> -		case _SHADOW_RMAP_SEGMENT:
> -			gmap_unshadow_pgt(sg, raddr);
> -			break;
> -		case _SHADOW_RMAP_PGTABLE:
> -			gmap_unshadow_page(sg, raddr);
> -			break;
> -		}

Now this looks much better. Do we still need the _SHADOW_RMAP_SEGMENT_LP
check in gmap_shadow_notify()? I don't think so.

> +		else
> +			BUG_ON(1);
>  		kfree(rmap);
>  	}
>  	spin_unlock(&sg->guest_table_lock);
> @@ -2777,9 +2735,8 @@ static void pmdp_notify_gmap(struct gmap *gmap, unsigned long gaddr)
>  	table = gmap_table_walk(gmap, gaddr, 1);
>  	if (!table)
>  		return;
> -	bits = *table & _SEGMENT_ENTRY_GMAP_IN;
>  	if (pmd_large(__pmd(*table)) && (*table & _SEGMENT_ENTRY_GMAP_VSIE))
> -		bits |= _SEGMENT_ENTRY_GMAP_VSIE;
> +		bits = _SEGMENT_ENTRY_GMAP_VSIE;
>  	if (!bits)
>  		return;
>  	*table &= ~bits;
> @@ -2792,8 +2749,6 @@ static void pmdp_notify_gmap(struct gmap *gmap, unsigned long gaddr)
>  			gmap_shadow_notify_pmd(sg, vmaddr, gaddr);
>  		spin_unlock(&gmap->shadow_lock);
>  	}
> -	if (bits & _SEGMENT_ENTRY_GMAP_IN)
> -		gmap_call_notifier(gmap, gaddr, gaddr + HPAGE_SIZE - 1);
>  }
>  
>  static void pmdp_notify_split(struct mm_struct *mm, unsigned long vmaddr,
> @@ -2841,9 +2796,8 @@ void pmdp_notify(struct mm_struct *mm, unsigned long vmaddr)
>  			continue;
>  		}
>  
> -		bits = *table & (_SEGMENT_ENTRY_GMAP_IN);
>  		if (pmd_large(__pmd(*table)) && (*table & _SEGMENT_ENTRY_GMAP_VSIE))
> -			bits |= _SEGMENT_ENTRY_GMAP_VSIE;
> +			bits = _SEGMENT_ENTRY_GMAP_VSIE;
>  		*table &= ~bits;
>  		gaddr = __gmap_segment_gaddr(table);
>  		spin_unlock(&gmap->guest_table_lock);
> @@ -2854,8 +2808,6 @@ void pmdp_notify(struct mm_struct *mm, unsigned long vmaddr)
>  				gmap_shadow_notify_pmd(sg, vmaddr, gaddr);
>  			spin_unlock(&gmap->shadow_lock);
>  		}
> -		if (bits & _SEGMENT_ENTRY_GMAP_IN)
> -			gmap_call_notifier(gmap, gaddr, gaddr + HPAGE_SIZE - 1);
>  	}
>  	rcu_read_unlock();
>  }
> 


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 1/2] mm: s390: Only notify on 4k pages
  2018-01-25 16:04           ` David Hildenbrand
@ 2018-01-26 10:31             ` Janosch Frank
  0 siblings, 0 replies; 67+ messages in thread
From: Janosch Frank @ 2018-01-26 10:31 UTC (permalink / raw)
  To: David Hildenbrand, kvm
  Cc: schwidefsky, borntraeger, dominik.dingel, linux-s390


[-- Attachment #1.1: Type: text/plain, Size: 1836 bytes --]

On 25.01.2018 17:04, David Hildenbrand wrote:
> On 25.01.2018 16:33, Janosch Frank wrote:
>> Let's try this
> 
> Actually we can reduce this code here quite a lot by simply checking for
> 
> if (pmd_large(*pmdp)) {
> 	// split up into 4k ptes
> 	rc = -EAGAIN;
> }

Yes, we can. (and I did)
It actually looks a lot better now, thanks!

I'm still dreading the rebase to put the changes into the original
patch, though. :)

>> -	/* Remove the page table tree from on specific entry */
>>  	head = radix_tree_delete(&sg->host_to_rmap, (vmaddr & HPAGE_MASK) >> PAGE_SHIFT);
>>  	gmap_for_each_rmap_safe(rmap, rnext, head) {
>>  		bits = rmap->raddr & _SHADOW_RMAP_MASK;
>>  		raddr = rmap->raddr ^ bits;
>> -		switch (bits) {
>> -		case _SHADOW_RMAP_REGION1:
>> -			gmap_unshadow_r2t(sg, raddr);
>> -			break;
>> -		case _SHADOW_RMAP_REGION2:
>> -			gmap_unshadow_r3t(sg, raddr);
>> -			break;
>> -		case _SHADOW_RMAP_REGION3:
>> -			gmap_unshadow_sgt(sg, raddr);
>> -			break;
>> -		case _SHADOW_RMAP_SEGMENT_LP:
>> +		if (bits ==  _SHADOW_RMAP_SEGMENT_LP)
>>  			gmap_unshadow_segment(sg, raddr);
>> -			break;
>> -		case _SHADOW_RMAP_SEGMENT:
>> -			gmap_unshadow_pgt(sg, raddr);
>> -			break;
>> -		case _SHADOW_RMAP_PGTABLE:
>> -			gmap_unshadow_page(sg, raddr);
>> -			break;
>> -		}
> 
> Now this looks much better. Do we still need the _SHADOW_RMAP_SEGMENT_LP
> check in gmap_shadow_notify()? I don't think so.

Well, the l2 big -> l3 little case also needs _SHADOW_RMAP_PGTABLE, so we
need at least an if/else. I forgot about that case yesterday, but I just
fixed it up...
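
i.e. the rmap loop in gmap_shadow_notify_pmd() now roughly does:

	if (bits == _SHADOW_RMAP_SEGMENT_LP)
		gmap_unshadow_segment(sg, raddr);
	else
		gmap_unshadow_page(sg, raddr);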

Also, I'm seeing increased migration time: the 10g l2 takes 1m20s with
postcopy (1m30s with precopy) to migrate, where it took 30-45s before.
However, I have not run into any problems yet, so we've got that goin'
for us, which is nice.


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 67+ messages in thread

* [PATCH v2] mm: s390: Only notify on 4k pages
  2018-01-24  9:14     ` David Hildenbrand
  2018-01-25 15:33       ` [PATCH 0/2] Huge page pte protection Janosch Frank
@ 2018-01-26 10:34       ` Janosch Frank
  2018-01-30 10:19         ` David Hildenbrand
  1 sibling, 1 reply; 67+ messages in thread
From: Janosch Frank @ 2018-01-26 10:34 UTC (permalink / raw)
  To: kvm; +Cc: schwidefsky, borntraeger, david, dominik.dingel, linux-s390

Let's try this

Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com>
---
 arch/s390/include/asm/gmap.h |   5 +-
 arch/s390/mm/gmap.c          | 142 ++++++++-----------------------------------
 2 files changed, 28 insertions(+), 119 deletions(-)

diff --git a/arch/s390/include/asm/gmap.h b/arch/s390/include/asm/gmap.h
index 6287aca..4120360 100644
--- a/arch/s390/include/asm/gmap.h
+++ b/arch/s390/include/asm/gmap.h
@@ -13,9 +13,8 @@
 #define GMAP_NOTIFY_SHADOW	0x2
 #define GMAP_NOTIFY_MPROT	0x1
 
-/* Status bits in huge and non-huge gmap segment entries. */
-#define _SEGMENT_ENTRY_GMAP_IN		0x0001	/* invalidation notify bit */
-#define _SEGMENT_ENTRY_GMAP_SPLIT	0x0002  /* split huge pmd */
+/* Status bit in huge and non-huge gmap segment entries. */
+#define _SEGMENT_ENTRY_GMAP_SPLIT	0x0001  /* split huge pmd */
 /* Status bits only for huge segment entries */
 #define _SEGMENT_ENTRY_GMAP_UC		0x4000	/* user dirty (migration) */
 #define _SEGMENT_ENTRY_GMAP_VSIE	0x8000	/* vsie bit */
diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
index 66a68af..2f5c8ee 100644
--- a/arch/s390/mm/gmap.c
+++ b/arch/s390/mm/gmap.c
@@ -998,7 +998,7 @@ static void gmap_pte_transfer_prot(struct mm_struct *mm, unsigned long addr,
  * and requested access rights are incompatible.
  */
 static int gmap_pmdp_force_prot(struct gmap *gmap, unsigned long addr,
-				pmd_t *pmdp, int prot, unsigned long bits)
+				pmd_t *pmdp, int prot)
 {
 	int pmd_i = pmd_val(*pmdp) & _SEGMENT_ENTRY_INVALID;
 	int pmd_p = pmd_val(*pmdp) & _SEGMENT_ENTRY_PROTECT;
@@ -1018,7 +1018,6 @@ static int gmap_pmdp_force_prot(struct gmap *gmap, unsigned long addr,
 		pmd_val(new) |= _SEGMENT_ENTRY_PROTECT;
 		gmap_pmdp_xchg(gmap, pmdp, new, addr);
 	}
-	pmd_val(*pmdp) |=  bits;
 	return 0;
 }
 
@@ -1102,10 +1101,6 @@ static int gmap_protect_pte(struct gmap *gmap, unsigned long gaddr,
 	spinlock_t *ptl = NULL;
 	unsigned long pbits = 0;
 
-	/* We have no upper segment, let's go back and fix this up. */
-	if (pmd_val(*pmdp) & _SEGMENT_ENTRY_INVALID)
-		return -EAGAIN;
-
 	ptep = gmap_pte_from_pmd(gmap, pmdp, gaddr, &ptl);
 	if (!ptep)
 		return -ENOMEM;
@@ -1134,30 +1129,18 @@ static int gmap_protect_pte(struct gmap *gmap, unsigned long gaddr,
  */
 static int gmap_protect_pmd(struct gmap *gmap, unsigned long gaddr,
 			    unsigned long vmaddr, pmd_t *pmdp, pmd_t *hpmdp,
-			    int prot, unsigned long bits)
+			    int prot)
 {
-	unsigned long sbits = 0;
 	int ret = 0;
 
-	sbits |= (bits & GMAP_NOTIFY_MPROT) ? _SEGMENT_ENTRY_GMAP_IN : 0;
-	sbits |= (bits & GMAP_NOTIFY_SHADOW) ? _SEGMENT_ENTRY_GMAP_VSIE : 0;
-
-	if (((prot != PROT_WRITE) && (bits & GMAP_NOTIFY_SHADOW))) {
-		ret = gmap_pmd_split(gmap, gaddr, pmdp);
-		if (ret)
-			return ret;
-		return -EFAULT;
-	}
-
-	/* Protect gmap pmd */
-	ret = gmap_pmdp_force_prot(gmap, gaddr, pmdp, prot, sbits);
+	/* Protect gmap pmd for dirty tracking. */
+	ret = gmap_pmdp_force_prot(gmap, gaddr, pmdp, prot);
 	/*
 	 * Transfer protection back to the host pmd, so userspace has
 	 * never more access rights than the VM.
 	 */
 	if (!ret)
 		gmap_pmdp_transfer_prot(gmap->mm, vmaddr, pmdp, hpmdp);
-
 	return ret;
 }
 
@@ -1167,7 +1150,7 @@ static int gmap_protect_pmd(struct gmap *gmap, unsigned long gaddr,
  * @gaddr: virtual address in the guest address space
  * @len: size of area
  * @prot: indicates access rights: PROT_NONE, PROT_READ or PROT_WRITE
- * @bits: pgste notification bits to set
+ * @bits: notification bits to set
  *
  * Returns 0 if successfully protected, -ENOMEM if out of memory and
  * -EFAULT if gaddr is invalid (or mapping for shadows is missing).
@@ -1180,7 +1163,7 @@ static int gmap_protect_range(struct gmap *gmap, unsigned long gaddr,
 			      unsigned long len, int prot, unsigned long bits)
 {
 	spinlock_t *ptl;
-	unsigned long vmaddr, dist;
+	unsigned long vmaddr;
 	pmd_t *pmdp, *hpmdp;
 	int rc = 0;
 
@@ -1194,13 +1177,8 @@ static int gmap_protect_range(struct gmap *gmap, unsigned long gaddr,
 		ptl = pmd_lock(gmap->mm, hpmdp);
 
 		pmdp = gmap_pmd_op_walk(gmap, gaddr);
-		if (pmdp) {
+		if (pmdp && !(pmd_val(*pmdp) & _SEGMENT_ENTRY_INVALID)) {
 			if (!pmd_large(*pmdp)) {
-				if (gmap_pmd_is_split(pmdp) &&
-				    (bits & GMAP_NOTIFY_MPROT)) {
-					pmd_val(*pmdp) |= _SEGMENT_ENTRY_GMAP_IN;
-				}
-
 				rc = gmap_protect_pte(gmap, gaddr, vmaddr,
 						      pmdp, hpmdp, prot, bits);
 				if (!rc) {
@@ -1208,13 +1186,9 @@ static int gmap_protect_range(struct gmap *gmap, unsigned long gaddr,
 					gaddr += PAGE_SIZE;
 				}
 			} else {
-				rc =  gmap_protect_pmd(gmap, gaddr, vmaddr,
-						       pmdp, hpmdp, prot, bits);
-				if (!rc) {
-					dist = HPAGE_SIZE - (gaddr & ~HPAGE_MASK);
-					len = len < dist ? 0 : len - dist;
-					gaddr = (gaddr & HPAGE_MASK) + HPAGE_SIZE;
-				}
+				rc = gmap_pmd_split(gmap, gaddr, pmdp);
+				if (!rc)
+					rc = -EFAULT;
 			}
 			gmap_pmd_op_end(gmap, pmdp);
 		}
@@ -1357,29 +1331,9 @@ static inline void gmap_insert_rmap(struct gmap *sg, unsigned long vmaddr,
 	}
 }
 
-static int gmap_protect_rmap_pmd(struct gmap *sg, struct gmap_rmap *rmap,
-				 unsigned long paddr, unsigned long vmaddr,
-				 pmd_t *pmdp, pmd_t *hpmdp, int prot)
-{
-	int rc = 0;
-
-	/* We have no upper segment, let's go back and fix this up. */
-	if (pmd_val(*pmdp) & _SEGMENT_ENTRY_INVALID)
-		return -EAGAIN;
-
-	spin_lock_nested(&sg->guest_table_lock, GMAP_LOCK_SHADOW);
-	rc = gmap_protect_pmd(sg->parent, paddr, vmaddr, pmdp, hpmdp,
-			      prot, GMAP_NOTIFY_SHADOW);
-	if (!rc)
-		gmap_insert_rmap(sg, vmaddr & HPAGE_MASK, rmap);
-
-	spin_unlock(&sg->guest_table_lock);
-	return rc;
-}
-
 static int gmap_protect_rmap_pte(struct gmap *sg, struct gmap_rmap *rmap,
 				 unsigned long paddr, unsigned long vmaddr,
-				 pmd_t *pmdp, int prot)
+				 pmd_t *pmdp, pmd_t *hpmdp, int prot)
 {
 	int rc = 0;
 	pte_t *ptep = NULL;
@@ -1392,8 +1346,8 @@ static int gmap_protect_rmap_pte(struct gmap *sg, struct gmap_rmap *rmap,
 	ptep = gmap_pte_from_pmd(sg->parent, pmdp, paddr, &ptl);
 	if (ptep) {
 		spin_lock_nested(&sg->guest_table_lock, GMAP_LOCK_SHADOW);
-		rc = ptep_force_prot(sg->parent->mm, paddr, ptep, prot,
-				     PGSTE_VSIE_BIT);
+		rc = gmap_protect_pte(sg->parent, paddr, vmaddr, pmdp, hpmdp,
+				      prot, GMAP_NOTIFY_SHADOW);
 		if (!rc)
 			gmap_insert_rmap(sg, vmaddr, rmap);
 		spin_unlock(&sg->guest_table_lock);
@@ -1418,7 +1372,7 @@ static int gmap_protect_rmap(struct gmap *sg, unsigned long raddr,
 {
 	struct gmap *parent;
 	struct gmap_rmap *rmap;
-	unsigned long vmaddr, dist;
+	unsigned long vmaddr;
 	pmd_t *pmdp, *hpmdp;
 	spinlock_t *ptl;
 	int rc;
@@ -1446,23 +1400,19 @@ static int gmap_protect_rmap(struct gmap *sg, unsigned long raddr,
 		}
 		rc = -EAGAIN;
 		pmdp = gmap_pmd_op_walk(parent, paddr);
-		if (pmdp) {
+		if (pmdp && !(pmd_val(*pmdp) & _SEGMENT_ENTRY_INVALID)) {
 			if (!pmd_large(*pmdp)) {
 				rc = gmap_protect_rmap_pte(sg, rmap, paddr,
-							   vmaddr, pmdp, prot);
+							   vmaddr, pmdp, hpmdp,
+							   prot);
 				if (!rc) {
 					paddr += PAGE_SIZE;
 					len -= PAGE_SIZE;
 				}
 			} else {
-				rc = gmap_protect_rmap_pmd(sg, rmap, paddr,
-							   vmaddr, pmdp,
-							   hpmdp, prot);
-				if (!rc) {
-					dist = HPAGE_SIZE - (paddr & ~HPAGE_MASK);
-					len = len < dist ? 0 : len - dist;
-					paddr = (paddr & HPAGE_MASK) + HPAGE_SIZE;
-				}
+				rc = gmap_pmd_split(parent, paddr, pmdp);
+				if (!rc)
+					rc = -EFAULT;
 			}
 			gmap_pmd_op_end(parent, pmdp);
 		}
@@ -2562,53 +2512,19 @@ static void gmap_shadow_notify_pmd(struct gmap *sg, unsigned long vmaddr,
 				   unsigned long gaddr)
 {
 	struct gmap_rmap *rmap, *rnext, *head;
-	unsigned long start, end, bits, raddr;
-
+	unsigned long bits, raddr;
 
 	BUG_ON(!gmap_is_shadow(sg));
 
 	spin_lock_nested(&sg->guest_table_lock, GMAP_LOCK_SHADOW);
-	if (sg->removed) {
-		spin_unlock(&sg->guest_table_lock);
-		return;
-	}
-	/* Check for top level table */
-	start = sg->orig_asce & _ASCE_ORIGIN;
-	end = start + ((sg->orig_asce & _ASCE_TABLE_LENGTH) + 1) * 4096;
-	if (!(sg->orig_asce & _ASCE_REAL_SPACE) && gaddr >= start &&
-	    gaddr < ((end & HPAGE_MASK) + HPAGE_SIZE - 1)) {
-		/* The complete shadow table has to go */
-		gmap_unshadow(sg);
-		spin_unlock(&sg->guest_table_lock);
-		list_del(&sg->list);
-		gmap_put(sg);
-		return;
-	}
-	/* Remove the page table tree from on specific entry */
 	head = radix_tree_delete(&sg->host_to_rmap, (vmaddr & HPAGE_MASK) >> PAGE_SHIFT);
 	gmap_for_each_rmap_safe(rmap, rnext, head) {
 		bits = rmap->raddr & _SHADOW_RMAP_MASK;
 		raddr = rmap->raddr ^ bits;
-		switch (bits) {
-		case _SHADOW_RMAP_REGION1:
-			gmap_unshadow_r2t(sg, raddr);
-			break;
-		case _SHADOW_RMAP_REGION2:
-			gmap_unshadow_r3t(sg, raddr);
-			break;
-		case _SHADOW_RMAP_REGION3:
-			gmap_unshadow_sgt(sg, raddr);
-			break;
-		case _SHADOW_RMAP_SEGMENT_LP:
+		if (bits == _SHADOW_RMAP_SEGMENT_LP)
 			gmap_unshadow_segment(sg, raddr);
-			break;
-		case _SHADOW_RMAP_SEGMENT:
-			gmap_unshadow_pgt(sg, raddr);
-			break;
-		case _SHADOW_RMAP_PGTABLE:
+		else
 			gmap_unshadow_page(sg, raddr);
-			break;
-		}
 		kfree(rmap);
 	}
 	spin_unlock(&sg->guest_table_lock);
@@ -2777,9 +2693,8 @@ static void pmdp_notify_gmap(struct gmap *gmap, unsigned long gaddr)
 	table = gmap_table_walk(gmap, gaddr, 1);
 	if (!table)
 		return;
-	bits = *table & _SEGMENT_ENTRY_GMAP_IN;
 	if (pmd_large(__pmd(*table)) && (*table & _SEGMENT_ENTRY_GMAP_VSIE))
-		bits |= _SEGMENT_ENTRY_GMAP_VSIE;
+		bits = _SEGMENT_ENTRY_GMAP_VSIE;
 	if (!bits)
 		return;
 	*table &= ~bits;
@@ -2792,8 +2707,6 @@ static void pmdp_notify_gmap(struct gmap *gmap, unsigned long gaddr)
 			gmap_shadow_notify_pmd(sg, vmaddr, gaddr);
 		spin_unlock(&gmap->shadow_lock);
 	}
-	if (bits & _SEGMENT_ENTRY_GMAP_IN)
-		gmap_call_notifier(gmap, gaddr, gaddr + HPAGE_SIZE - 1);
 }
 
 static void pmdp_notify_split(struct mm_struct *mm, unsigned long vmaddr,
@@ -2841,9 +2754,8 @@ void pmdp_notify(struct mm_struct *mm, unsigned long vmaddr)
 			continue;
 		}
 
-		bits = *table & (_SEGMENT_ENTRY_GMAP_IN);
 		if (pmd_large(__pmd(*table)) && (*table & _SEGMENT_ENTRY_GMAP_VSIE))
-			bits |= _SEGMENT_ENTRY_GMAP_VSIE;
+			bits = _SEGMENT_ENTRY_GMAP_VSIE;
 		*table &= ~bits;
 		gaddr = __gmap_segment_gaddr(table);
 		spin_unlock(&gmap->guest_table_lock);
@@ -2854,8 +2766,6 @@ void pmdp_notify(struct mm_struct *mm, unsigned long vmaddr)
 				gmap_shadow_notify_pmd(sg, vmaddr, gaddr);
 			spin_unlock(&gmap->shadow_lock);
 		}
-		if (bits & _SEGMENT_ENTRY_GMAP_IN)
-			gmap_call_notifier(gmap, gaddr, gaddr + HPAGE_SIZE - 1);
 	}
 	rcu_read_unlock();
 }
@@ -3028,7 +2938,7 @@ bool gmap_test_and_clear_dirty_segment(struct gmap *gmap, pmd_t *pmdp,
 
 	/* Clear UC indication and reset protection */
 	pmd_val(*pmdp) &= ~_SEGMENT_ENTRY_GMAP_UC;
-	gmap_protect_pmd(gmap, gaddr, vmaddr, pmdp, hpmdp, PROT_READ, 0);
+	gmap_protect_pmd(gmap, gaddr, vmaddr, pmdp, hpmdp, PROT_READ);
 	return true;
 }
 
-- 
2.7.4

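For readers skimming the hunks above, here is a stand-alone toy model of the loop that the gmap_protect_range()/gmap_protect_rmap() changes converge on: large pmds are no longer protected (or notified) as a whole; they are split into 4k ptes and the loop retries. Everything below — the types, helper names and return-code handling — is a simplified stand-in for illustration, not the kernel code.

/*
 * Toy model of the post-split protection loop.  All helpers are stubs
 * that only mimic the return codes seen in the hunks above.
 */
#include <errno.h>
#include <stdbool.h>

#define PAGE_SIZE 4096UL

struct pmd { bool invalid; bool large; };

/* stub: "protect" one 4k pte, always succeeds */
static int protect_pte(unsigned long gaddr) { (void)gaddr; return 0; }

/* stub: split a huge entry into 4k ptes */
static int split_pmd(struct pmd *pmd) { pmd->large = false; return 0; }

/* stub: walk to the segment entry covering gaddr */
static struct pmd *pmd_walk(unsigned long gaddr)
{
	static struct pmd p = { .invalid = false, .large = true };

	(void)gaddr;
	return &p;
}

static int protect_range(unsigned long gaddr, unsigned long len)
{
	struct pmd *pmd;
	int rc;

	while (len) {
		rc = -EAGAIN;
		pmd = pmd_walk(gaddr);
		if (pmd && !pmd->invalid) {
			if (!pmd->large) {
				/* 4k granularity only: protect one pte, advance */
				rc = protect_pte(gaddr);
				if (!rc) {
					gaddr += PAGE_SIZE;
					len -= PAGE_SIZE;
				}
			} else {
				/* huge entry: split it, then force a retry */
				rc = split_pmd(pmd);
				if (!rc)
					rc = -EFAULT;
			}
		}
		if (rc && rc != -EAGAIN && rc != -EFAULT)
			return rc;
		/*
		 * -EAGAIN/-EFAULT: the real code fixes up the host mapping
		 * (fault-in, shadow fixup) under the proper locks and retries.
		 */
	}
	return 0;
}

int main(void)
{
	/* protect 16k starting at 0; first pass splits, then 4k steps */
	return protect_range(0, 4 * PAGE_SIZE) ? 1 : 0;
}

The real functions additionally take the pmd and guest-table locks and go through the userspace fault fixup path before retrying, as visible in the hunks.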
^ permalink raw reply related	[flat|nested] 67+ messages in thread

* Re: [PATCH v2] mm: s390: Only notify on 4k pages
  2018-01-26 10:34       ` [PATCH v2] mm: s390: Only notify on 4k pages Janosch Frank
@ 2018-01-30 10:19         ` David Hildenbrand
  0 siblings, 0 replies; 67+ messages in thread
From: David Hildenbrand @ 2018-01-30 10:19 UTC (permalink / raw)
  To: Janosch Frank, kvm; +Cc: schwidefsky, borntraeger, dominik.dingel, linux-s390

On 26.01.2018 11:34, Janosch Frank wrote:
> Let's try this
> 
> Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com>
> ---
>  arch/s390/include/asm/gmap.h |   5 +-
>  arch/s390/mm/gmap.c          | 142 ++++++++-----------------------------------
>  2 files changed, 28 insertions(+), 119 deletions(-)
> 
> diff --git a/arch/s390/include/asm/gmap.h b/arch/s390/include/asm/gmap.h
> index 6287aca..4120360 100644
> --- a/arch/s390/include/asm/gmap.h
> +++ b/arch/s390/include/asm/gmap.h
> @@ -13,9 +13,8 @@
>  #define GMAP_NOTIFY_SHADOW	0x2
>  #define GMAP_NOTIFY_MPROT	0x1
>  
> -/* Status bits in huge and non-huge gmap segment entries. */
> -#define _SEGMENT_ENTRY_GMAP_IN		0x0001	/* invalidation notify bit */
> -#define _SEGMENT_ENTRY_GMAP_SPLIT	0x0002  /* split huge pmd */
> +/* Status bit in huge and non-huge gmap segment entries. */
> +#define _SEGMENT_ENTRY_GMAP_SPLIT	0x0001  /* split huge pmd */
>  /* Status bits only for huge segment entries */
>  #define _SEGMENT_ENTRY_GMAP_UC		0x4000	/* user dirty (migration) */
>  #define _SEGMENT_ENTRY_GMAP_VSIE	0x8000	/* vsie bit */
> diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
> index 66a68af..2f5c8ee 100644
> --- a/arch/s390/mm/gmap.c
> +++ b/arch/s390/mm/gmap.c
> @@ -998,7 +998,7 @@ static void gmap_pte_transfer_prot(struct mm_struct *mm, unsigned long addr,
>   * and requested access rights are incompatible.
>   */
>  static int gmap_pmdp_force_prot(struct gmap *gmap, unsigned long addr,
> -				pmd_t *pmdp, int prot, unsigned long bits)
> +				pmd_t *pmdp, int prot)
>  {
>  	int pmd_i = pmd_val(*pmdp) & _SEGMENT_ENTRY_INVALID;
>  	int pmd_p = pmd_val(*pmdp) & _SEGMENT_ENTRY_PROTECT;
> @@ -1018,7 +1018,6 @@ static int gmap_pmdp_force_prot(struct gmap *gmap, unsigned long addr,
>  		pmd_val(new) |= _SEGMENT_ENTRY_PROTECT;
>  		gmap_pmdp_xchg(gmap, pmdp, new, addr);
>  	}
> -	pmd_val(*pmdp) |=  bits;
>  	return 0;
>  }
>  
> @@ -1102,10 +1101,6 @@ static int gmap_protect_pte(struct gmap *gmap, unsigned long gaddr,
>  	spinlock_t *ptl = NULL;
>  	unsigned long pbits = 0;
>  
> -	/* We have no upper segment, let's go back and fix this up. */
> -	if (pmd_val(*pmdp) & _SEGMENT_ENTRY_INVALID)
> -		return -EAGAIN;
> -
>  	ptep = gmap_pte_from_pmd(gmap, pmdp, gaddr, &ptl);
>  	if (!ptep)
>  		return -ENOMEM;
> @@ -1134,30 +1129,18 @@ static int gmap_protect_pte(struct gmap *gmap, unsigned long gaddr,
>   */
>  static int gmap_protect_pmd(struct gmap *gmap, unsigned long gaddr,
>  			    unsigned long vmaddr, pmd_t *pmdp, pmd_t *hpmdp,
> -			    int prot, unsigned long bits)
> +			    int prot)
>  {
> -	unsigned long sbits = 0;
>  	int ret = 0;
>  
> -	sbits |= (bits & GMAP_NOTIFY_MPROT) ? _SEGMENT_ENTRY_GMAP_IN : 0;
> -	sbits |= (bits & GMAP_NOTIFY_SHADOW) ? _SEGMENT_ENTRY_GMAP_VSIE : 0;
> -
> -	if (((prot != PROT_WRITE) && (bits & GMAP_NOTIFY_SHADOW))) {
> -		ret = gmap_pmd_split(gmap, gaddr, pmdp);
> -		if (ret)
> -			return ret;
> -		return -EFAULT;
> -	}
> -
> -	/* Protect gmap pmd */
> -	ret = gmap_pmdp_force_prot(gmap, gaddr, pmdp, prot, sbits);
> +	/* Protect gmap pmd for dirty tracking. */
> +	ret = gmap_pmdp_force_prot(gmap, gaddr, pmdp, prot);
>  	/*
>  	 * Transfer protection back to the host pmd, so userspace has
>  	 * never more access rights than the VM.
>  	 */
>  	if (!ret)
>  		gmap_pmdp_transfer_prot(gmap->mm, vmaddr, pmdp, hpmdp);
> -
>  	return ret;
>  }
>  
> @@ -1167,7 +1150,7 @@ static int gmap_protect_pmd(struct gmap *gmap, unsigned long gaddr,
>   * @gaddr: virtual address in the guest address space
>   * @len: size of area
>   * @prot: indicates access rights: PROT_NONE, PROT_READ or PROT_WRITE
> - * @bits: pgste notification bits to set
> + * @bits: notification bits to set
>   *
>   * Returns 0 if successfully protected, -ENOMEM if out of memory and
>   * -EFAULT if gaddr is invalid (or mapping for shadows is missing).
> @@ -1180,7 +1163,7 @@ static int gmap_protect_range(struct gmap *gmap, unsigned long gaddr,
>  			      unsigned long len, int prot, unsigned long bits)
>  {
>  	spinlock_t *ptl;
> -	unsigned long vmaddr, dist;
> +	unsigned long vmaddr;
>  	pmd_t *pmdp, *hpmdp;
>  	int rc = 0;
>  
> @@ -1194,13 +1177,8 @@ static int gmap_protect_range(struct gmap *gmap, unsigned long gaddr,
>  		ptl = pmd_lock(gmap->mm, hpmdp);
>  
>  		pmdp = gmap_pmd_op_walk(gmap, gaddr);
> -		if (pmdp) {
> +		if (pmdp && !(pmd_val(*pmdp) & _SEGMENT_ENTRY_INVALID)) {
>  			if (!pmd_large(*pmdp)) {
> -				if (gmap_pmd_is_split(pmdp) &&
> -				    (bits & GMAP_NOTIFY_MPROT)) {
> -					pmd_val(*pmdp) |= _SEGMENT_ENTRY_GMAP_IN;
> -				}
> -
>  				rc = gmap_protect_pte(gmap, gaddr, vmaddr,
>  						      pmdp, hpmdp, prot, bits);
>  				if (!rc) {
> @@ -1208,13 +1186,9 @@ static int gmap_protect_range(struct gmap *gmap, unsigned long gaddr,
>  					gaddr += PAGE_SIZE;
>  				}
>  			} else {
> -				rc =  gmap_protect_pmd(gmap, gaddr, vmaddr,
> -						       pmdp, hpmdp, prot, bits);
> -				if (!rc) {
> -					dist = HPAGE_SIZE - (gaddr & ~HPAGE_MASK);
> -					len = len < dist ? 0 : len - dist;
> -					gaddr = (gaddr & HPAGE_MASK) + HPAGE_SIZE;
> -				}
> +				rc = gmap_pmd_split(gmap, gaddr, pmdp);
> +				if (!rc)
> +					rc = -EFAULT;
>  			}
>  			gmap_pmd_op_end(gmap, pmdp);
>  		}
> @@ -1357,29 +1331,9 @@ static inline void gmap_insert_rmap(struct gmap *sg, unsigned long vmaddr,
>  	}
>  }
>  
> -static int gmap_protect_rmap_pmd(struct gmap *sg, struct gmap_rmap *rmap,
> -				 unsigned long paddr, unsigned long vmaddr,
> -				 pmd_t *pmdp, pmd_t *hpmdp, int prot)
> -{
> -	int rc = 0;
> -
> -	/* We have no upper segment, let's go back and fix this up. */
> -	if (pmd_val(*pmdp) & _SEGMENT_ENTRY_INVALID)
> -		return -EAGAIN;
> -
> -	spin_lock_nested(&sg->guest_table_lock, GMAP_LOCK_SHADOW);
> -	rc = gmap_protect_pmd(sg->parent, paddr, vmaddr, pmdp, hpmdp,
> -			      prot, GMAP_NOTIFY_SHADOW);
> -	if (!rc)
> -		gmap_insert_rmap(sg, vmaddr & HPAGE_MASK, rmap);
> -
> -	spin_unlock(&sg->guest_table_lock);
> -	return rc;
> -}
> -
>  static int gmap_protect_rmap_pte(struct gmap *sg, struct gmap_rmap *rmap,
>  				 unsigned long paddr, unsigned long vmaddr,
> -				 pmd_t *pmdp, int prot)
> +				 pmd_t *pmdp, pmd_t *hpmdp, int prot)
>  {
>  	int rc = 0;
>  	pte_t *ptep = NULL;
> @@ -1392,8 +1346,8 @@ static int gmap_protect_rmap_pte(struct gmap *sg, struct gmap_rmap *rmap,
>  	ptep = gmap_pte_from_pmd(sg->parent, pmdp, paddr, &ptl);
>  	if (ptep) {
>  		spin_lock_nested(&sg->guest_table_lock, GMAP_LOCK_SHADOW);
> -		rc = ptep_force_prot(sg->parent->mm, paddr, ptep, prot,
> -				     PGSTE_VSIE_BIT);
> +		rc = gmap_protect_pte(sg->parent, paddr, vmaddr, pmdp, hpmdp,
> +				      prot, GMAP_NOTIFY_SHADOW);
>  		if (!rc)
>  			gmap_insert_rmap(sg, vmaddr, rmap);
>  		spin_unlock(&sg->guest_table_lock);
> @@ -1418,7 +1372,7 @@ static int gmap_protect_rmap(struct gmap *sg, unsigned long raddr,
>  {
>  	struct gmap *parent;
>  	struct gmap_rmap *rmap;
> -	unsigned long vmaddr, dist;
> +	unsigned long vmaddr;
>  	pmd_t *pmdp, *hpmdp;
>  	spinlock_t *ptl;
>  	int rc;
> @@ -1446,23 +1400,19 @@ static int gmap_protect_rmap(struct gmap *sg, unsigned long raddr,
>  		}
>  		rc = -EAGAIN;
>  		pmdp = gmap_pmd_op_walk(parent, paddr);
> -		if (pmdp) {
> +		if (pmdp && !(pmd_val(*pmdp) & _SEGMENT_ENTRY_INVALID)) {
>  			if (!pmd_large(*pmdp)) {
>  				rc = gmap_protect_rmap_pte(sg, rmap, paddr,
> -							   vmaddr, pmdp, prot);
> +							   vmaddr, pmdp, hpmdp,
> +							   prot);
>  				if (!rc) {
>  					paddr += PAGE_SIZE;
>  					len -= PAGE_SIZE;
>  				}
>  			} else {
> -				rc = gmap_protect_rmap_pmd(sg, rmap, paddr,
> -							   vmaddr, pmdp,
> -							   hpmdp, prot);
> -				if (!rc) {
> -					dist = HPAGE_SIZE - (paddr & ~HPAGE_MASK);
> -					len = len < dist ? 0 : len - dist;
> -					paddr = (paddr & HPAGE_MASK) + HPAGE_SIZE;
> -				}
> +				rc = gmap_pmd_split(parent, paddr, pmdp);
> +				if (!rc)
> +					rc = -EFAULT;
>  			}
>  			gmap_pmd_op_end(parent, pmdp);
>  		}
> @@ -2562,53 +2512,19 @@ static void gmap_shadow_notify_pmd(struct gmap *sg, unsigned long vmaddr,
>  				   unsigned long gaddr)
>  {
>  	struct gmap_rmap *rmap, *rnext, *head;
> -	unsigned long start, end, bits, raddr;
> -
> +	unsigned long bits, raddr;
>  
>  	BUG_ON(!gmap_is_shadow(sg));
>  
>  	spin_lock_nested(&sg->guest_table_lock, GMAP_LOCK_SHADOW);
> -	if (sg->removed) {
> -		spin_unlock(&sg->guest_table_lock);
> -		return;
> -	}
> -	/* Check for top level table */
> -	start = sg->orig_asce & _ASCE_ORIGIN;
> -	end = start + ((sg->orig_asce & _ASCE_TABLE_LENGTH) + 1) * 4096;
> -	if (!(sg->orig_asce & _ASCE_REAL_SPACE) && gaddr >= start &&
> -	    gaddr < ((end & HPAGE_MASK) + HPAGE_SIZE - 1)) {
> -		/* The complete shadow table has to go */
> -		gmap_unshadow(sg);
> -		spin_unlock(&sg->guest_table_lock);
> -		list_del(&sg->list);
> -		gmap_put(sg);
> -		return;
> -	}
> -	/* Remove the page table tree from on specific entry */
>  	head = radix_tree_delete(&sg->host_to_rmap, (vmaddr & HPAGE_MASK) >> PAGE_SHIFT);
>  	gmap_for_each_rmap_safe(rmap, rnext, head) {
>  		bits = rmap->raddr & _SHADOW_RMAP_MASK;
>  		raddr = rmap->raddr ^ bits;
> -		switch (bits) {
> -		case _SHADOW_RMAP_REGION1:
> -			gmap_unshadow_r2t(sg, raddr);
> -			break;
> -		case _SHADOW_RMAP_REGION2:
> -			gmap_unshadow_r3t(sg, raddr);
> -			break;
> -		case _SHADOW_RMAP_REGION3:
> -			gmap_unshadow_sgt(sg, raddr);
> -			break;
> -		case _SHADOW_RMAP_SEGMENT_LP:
> +		if (bits == _SHADOW_RMAP_SEGMENT_LP)
>  			gmap_unshadow_segment(sg, raddr);
> -			break;
> -		case _SHADOW_RMAP_SEGMENT:
> -			gmap_unshadow_pgt(sg, raddr);
> -			break;
> -		case _SHADOW_RMAP_PGTABLE:
> +		else
>  			gmap_unshadow_page(sg, raddr);
> -			break;
> -		}
>  		kfree(rmap);
>  	}
>  	spin_unlock(&sg->guest_table_lock);
> @@ -2777,9 +2693,8 @@ static void pmdp_notify_gmap(struct gmap *gmap, unsigned long gaddr)
>  	table = gmap_table_walk(gmap, gaddr, 1);
>  	if (!table)
>  		return;
> -	bits = *table & _SEGMENT_ENTRY_GMAP_IN;
>  	if (pmd_large(__pmd(*table)) && (*table & _SEGMENT_ENTRY_GMAP_VSIE))
> -		bits |= _SEGMENT_ENTRY_GMAP_VSIE;
> +		bits = _SEGMENT_ENTRY_GMAP_VSIE;
>  	if (!bits)
>  		return;
>  	*table &= ~bits;
> @@ -2792,8 +2707,6 @@ static void pmdp_notify_gmap(struct gmap *gmap, unsigned long gaddr)
>  			gmap_shadow_notify_pmd(sg, vmaddr, gaddr);
>  		spin_unlock(&gmap->shadow_lock);
>  	}
> -	if (bits & _SEGMENT_ENTRY_GMAP_IN)
> -		gmap_call_notifier(gmap, gaddr, gaddr + HPAGE_SIZE - 1);
>  }
>  
>  static void pmdp_notify_split(struct mm_struct *mm, unsigned long vmaddr,
> @@ -2841,9 +2754,8 @@ void pmdp_notify(struct mm_struct *mm, unsigned long vmaddr)
>  			continue;
>  		}
>  
> -		bits = *table & (_SEGMENT_ENTRY_GMAP_IN);
>  		if (pmd_large(__pmd(*table)) && (*table & _SEGMENT_ENTRY_GMAP_VSIE))
> -			bits |= _SEGMENT_ENTRY_GMAP_VSIE;
> +			bits = _SEGMENT_ENTRY_GMAP_VSIE;
>  		*table &= ~bits;
>  		gaddr = __gmap_segment_gaddr(table);
>  		spin_unlock(&gmap->guest_table_lock);
> @@ -2854,8 +2766,6 @@ void pmdp_notify(struct mm_struct *mm, unsigned long vmaddr)
>  				gmap_shadow_notify_pmd(sg, vmaddr, gaddr);
>  			spin_unlock(&gmap->shadow_lock);
>  		}
> -		if (bits & _SEGMENT_ENTRY_GMAP_IN)
> -			gmap_call_notifier(gmap, gaddr, gaddr + HPAGE_SIZE - 1);
>  	}
>  	rcu_read_unlock();
>  }
> @@ -3028,7 +2938,7 @@ bool gmap_test_and_clear_dirty_segment(struct gmap *gmap, pmd_t *pmdp,
>  
>  	/* Clear UC indication and reset protection */
>  	pmd_val(*pmdp) &= ~_SEGMENT_ENTRY_GMAP_UC;
> -	gmap_protect_pmd(gmap, gaddr, vmaddr, pmdp, hpmdp, PROT_READ, 0);
> +	gmap_protect_pmd(gmap, gaddr, vmaddr, pmdp, hpmdp, PROT_READ);
>  	return true;
>  }
>  
> 

Yes, looks much better to me!

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 67+ messages in thread
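A similarly hedged toy model of what remains of pmdp_notify_gmap()/pmdp_notify() once _SEGMENT_ENTRY_GMAP_IN is gone: on a large segment entry only the VSIE status bit can still request work, so the gmap_call_notifier() call for huge ranges drops out and dirty/mprot notification happens purely at 4k pte level. The bit values and entry layout below are placeholders, not the real segment-table format.

#include <stdio.h>

#define ENTRY_LARGE 0x0400UL	/* placeholder for the large-segment bit */
#define ENTRY_VSIE  0x8000UL	/* placeholder for _SEGMENT_ENTRY_GMAP_VSIE */

/* stand-in for the shadow (VSIE) cleanup done per rmap entry */
static void shadow_notify(unsigned long gaddr)
{
	printf("unshadow segment at 0x%lx\n", gaddr);
}

static void pmd_notify(unsigned long *entry, unsigned long gaddr)
{
	unsigned long bits = 0;

	/* only the VSIE bit is left to check on large entries */
	if ((*entry & ENTRY_LARGE) && (*entry & ENTRY_VSIE))
		bits = ENTRY_VSIE;
	if (!bits)
		return;			/* no huge-page notifier call any more */
	*entry &= ~bits;
	shadow_notify(gaddr);
}

int main(void)
{
	unsigned long entry = ENTRY_LARGE | ENTRY_VSIE;

	pmd_notify(&entry, 0x100000);	/* clears the bit, notifies shadows */
	pmd_notify(&entry, 0x100000);	/* second call is a no-op */
	return 0;
}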

end of thread (newest message: 2018-01-30 10:19 UTC)

Thread overview: 67+ messages
2017-12-13 12:53 [RFC/PATCH v2 00/22] KVM/s390: Hugetlbfs enablement Janosch Frank
2017-12-13 12:53 ` [RFC/PATCH v2 01/22] s390/mm: make gmap_protect_range more modular Janosch Frank
2018-01-22 11:33   ` David Hildenbrand
2018-01-22 12:31     ` Janosch Frank
2018-01-22 12:50       ` David Hildenbrand
2018-01-22 13:02         ` Janosch Frank
2017-12-13 12:53 ` [RFC/PATCH v2 02/22] s390/mm: Abstract gmap notify bit setting Janosch Frank
2018-01-22 11:34   ` David Hildenbrand
2017-12-13 12:53 ` [RFC/PATCH v2 03/22] s390/mm: add gmap PMD invalidation notification Janosch Frank
2017-12-21  9:24   ` Janosch Frank
2018-01-22 11:46   ` David Hildenbrand
2018-01-22 13:13     ` Janosch Frank
2018-01-22 13:29       ` David Hildenbrand
2018-01-22 14:04         ` Janosch Frank
2018-01-22 11:56   ` David Hildenbrand
2018-01-22 12:09     ` Janosch Frank
2018-01-22 12:12       ` David Hildenbrand
2017-12-13 12:53 ` [RFC/PATCH v2 04/22] s390/mm: Add gmap pmd invalidation and clearing Janosch Frank
2017-12-13 12:53 ` [RFC/PATCH v2 05/22] s390/mm: hugetlb pages within a gmap can not be freed Janosch Frank
2018-01-24 13:45   ` David Hildenbrand
2018-01-24 13:56     ` Janosch Frank
2017-12-13 12:53 ` [RFC/PATCH v2 06/22] s390/mm: Introduce gmap_pmdp_xchg Janosch Frank
2017-12-13 12:53 ` [RFC/PATCH v2 07/22] RFC: s390/mm: Transfer guest pmd protection to host Janosch Frank
2017-12-13 12:53 ` [RFC/PATCH v2 08/22] s390/mm: Add huge page dirty sync support Janosch Frank
2017-12-13 12:53 ` [RFC/PATCH v2 09/22] s390/mm: clear huge page storage keys on enable_skey Janosch Frank
2017-12-13 12:53 ` [RFC/PATCH v2 10/22] s390/mm: Add huge pmd storage key handling Janosch Frank
2017-12-13 12:53 ` [RFC/PATCH v2 11/22] s390/mm: Remove superfluous parameter Janosch Frank
2017-12-21  9:22   ` Janosch Frank
2018-01-16 12:39     ` Janosch Frank
2018-01-16 13:11   ` David Hildenbrand
2018-01-22 13:14   ` Christian Borntraeger
2018-01-22 13:24     ` Martin Schwidefsky
2017-12-13 12:53 ` [RFC/PATCH v2 12/22] s390/mm: Add gmap_protect_large read protection support Janosch Frank
2017-12-13 12:53 ` [RFC/PATCH v2 13/22] s390/mm: Make gmap_read_table EDAT1 compatible Janosch Frank
2017-12-13 12:53 ` [RFC/PATCH v2 14/22] s390/mm: Make protect_rmap " Janosch Frank
2017-12-13 12:53 ` [RFC/PATCH v2 15/22] s390/mm: GMAP read table extensions Janosch Frank
2017-12-13 12:53 ` [RFC/PATCH v2 16/22] s390/mm: Add shadow segment code Janosch Frank
2017-12-13 12:53 ` [RFC/PATCH v2 17/22] s390/mm: Add VSIE reverse fake case Janosch Frank
2017-12-13 12:53 ` [RFC/PATCH v2 18/22] s390/mm: Remove gmap_pte_op_walk Janosch Frank
2017-12-13 12:53 ` [RFC/PATCH v2 19/22] s390/mm: Split huge pages if granular protection is needed Janosch Frank
2018-01-25  7:16   ` Janosch Frank
2018-01-25 14:39     ` David Hildenbrand
2018-01-25 14:55       ` Janosch Frank
2017-12-13 12:53 ` [RFC/PATCH v2 20/22] s390/mm: Enable gmap huge pmd support Janosch Frank
2017-12-13 12:53 ` [RFC/PATCH v2 21/22] KVM: s390: Add KVM HPAGE capability Janosch Frank
2017-12-20 13:02   ` Cornelia Huck
2017-12-20 13:17     ` Janosch Frank
2017-12-20 13:21       ` Cornelia Huck
2017-12-13 12:53 ` [RFC/PATCH v2 22/22] RFC: s390/mm: Add gmap lock classes Janosch Frank
2017-12-20 12:24   ` Christian Borntraeger
2017-12-20 12:36     ` Janosch Frank
2017-12-20 12:23 ` [RFC/PATCH v2 00/22] KVM/s390: Hugetlbfs enablement Christian Borntraeger
2017-12-21 12:00   ` David Hildenbrand
2017-12-22  9:08     ` Christian Borntraeger
2018-01-02  0:02       ` Janosch Frank
2018-01-22 11:23 ` David Hildenbrand
2018-01-22 11:56   ` Christian Borntraeger
2018-01-23 21:15 ` David Hildenbrand
2018-01-24  9:01   ` Janosch Frank
2018-01-24  9:14     ` David Hildenbrand
2018-01-25 15:33       ` [PATCH 0/2] Huge page pte protection Janosch Frank
2018-01-25 15:33         ` [PATCH 1/2] mm: s390: Only notify on 4k pages Janosch Frank
2018-01-25 16:04           ` David Hildenbrand
2018-01-26 10:31             ` Janosch Frank
2018-01-25 15:33         ` [PATCH 2/2] mm: s390: Rename gmap_pte_op_fixup Janosch Frank
2018-01-26 10:34       ` [PATCH v2] mm: s390: Only notify on 4k pages Janosch Frank
2018-01-30 10:19         ` David Hildenbrand
