* [PATCH 0/8] s390/kvm fixes
@ 2013-05-17 12:41 Christian Borntraeger
  2013-05-17 12:41 ` [PATCH 1/8] s390/pgtable: fix ipte notify bit Christian Borntraeger
                   ` (9 more replies)
  0 siblings, 10 replies; 12+ messages in thread
From: Christian Borntraeger @ 2013-05-17 12:41 UTC
  To: Marcelo Tossati, Gleb Natapov, Paolo Bonzini
  Cc: Cornelia Huck, Heiko Carstens, Martin Schwidefsky, KVM,
	linux-s390, Christian Borntraeger

Gleb, Paolo, Marcelo,

Here are some low-level changes to KVM on s390 that we have been
cooking for a while now.

Patch "s390/pgtable: fix ipte notify bit" will go via Martin's
tree into 3.10, but is included here to reduce the number of merge
conflicts.

Patch "s390: fix gmap_ipte_notifier vs. software dirty pages"
will also go via Martin's tree into 3.10; it fixes a hang with
heavy host paging and KVM. It is optional for merging, but
makes testing on kvm/next easier.

This series addresses two problems:
- paging of guest prefix page
- RCU timeouts

The first problem is that the host pte for the guest prefix pages
must never be invalid or read-only. (Everything else uses fully
nested paging, but the prefix pages must not cause host faults.)
Pinning the page is not enough; the pte also has to be writable all
the time. Mlocking is not enough either, due to memory compaction,
malicious unmapping, etc.
We use the existing callback mechanism of the s390 page table
functions to kick guests out of SIE and hold them there until the
change is done. We can't use the existing kick functions, since we
must hold a pgste lock while we wait for SIE to exit, and IPIs might
deadlock.

The second problem is that with KVM on s390 we have seen very long
RCU stalls, because an interrupt did not make SIE exit to the kvm
module: the interrupt handler simply went back into SIE. Instead of
returning to SIE, we now force an exit into the kvm module, which
then does the guest exit/enter handling and thereby fixes the RCU
stalls.

The whole bunch is probably too complex for 3.10, so please queue it
for 3.11.

Christian Borntraeger (5):
  s390/pgtable: fix ipte notify bit
  s390/kvm: Mark if a cpu is in SIE
  s390/kvm: Provide a way to prevent reentering SIE
  s390/kvm: Kick guests out of sie if prefix page host pte is touched
  s390: fix gmap_ipte_notifier vs. software dirty pages

Martin Schwidefsky (3):
  s390/kvm: fix psw rewinding in handle_skey
  s390/kvm: rename RCP_xxx defines to PGSTE_xxx
  s390/kvm: avoid automatic sie reentry

 arch/s390/include/asm/kvm_host.h |  8 +++-
 arch/s390/include/asm/pgtable.h  | 83 +++++++++++++++++++---------------------
 arch/s390/kernel/asm-offsets.c   |  3 ++
 arch/s390/kernel/entry64.S       | 80 ++++++++++++++++++--------------------
 arch/s390/kvm/intercept.c        | 39 +------------------
 arch/s390/kvm/kvm-s390.c         | 81 ++++++++++++++++++++++++++++++++++++++-
 arch/s390/kvm/kvm-s390.h         |  5 +++
 arch/s390/kvm/priv.c             |  3 +-
 arch/s390/mm/pgtable.c           |  5 +--
 9 files changed, 179 insertions(+), 128 deletions(-)

-- 
1.8.1.4

* [PATCH 1/8] s390/pgtable: fix ipte notify bit
From: Christian Borntraeger @ 2013-05-17 12:41 UTC
  To: Marcelo Tossati, Gleb Natapov, Paolo Bonzini
  Cc: Cornelia Huck, Heiko Carstens, Martin Schwidefsky, KVM,
	linux-s390, Christian Borntraeger

Don't use the same bit as the user referenced bit (KVM_UR_BIT).
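To make the collision concrete: in the 64-bit layout shown in the hunk below, the old RCP_IN_BIT value is identical to KVM_UR_BIT, so arming the IPTE notifier also looked like a user-referenced page, and clearing the user-referenced bit silently disarmed the notifier. A minimal check using the 64-bit values from the patch (the OLD_/NEW_ prefixes and the helper are hypothetical names for this sketch only):

```c
#include <assert.h>
#include <stdint.h>

/* 64-bit PGSTE values taken from the patch. The OLD_/NEW_ prefixes are
 * illustrative names for this sketch, not kernel identifiers. */
#define OLD_RCP_IN_BIT	0x0000800000000000ULL
#define NEW_RCP_IN_BIT	0x0000200000000000ULL
#define KVM_UR_BIT	0x0000800000000000ULL
#define KVM_UC_BIT	0x0000400000000000ULL

/* Returns nonzero if the IPTE notify bit overlaps either KVM user bit. */
static int in_bit_collides(uint64_t in_bit)
{
	return (in_bit & (KVM_UR_BIT | KVM_UC_BIT)) != 0;
}
```

The same reasoning applies to the 31-bit values (0x00008000UL versus 0x00002000UL).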

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
 arch/s390/include/asm/pgtable.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index 4105b82..0f0de30 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -306,7 +306,7 @@ extern unsigned long MODULES_END;
 #define RCP_HC_BIT	0x00200000UL
 #define RCP_GR_BIT	0x00040000UL
 #define RCP_GC_BIT	0x00020000UL
-#define RCP_IN_BIT	0x00008000UL	/* IPTE notify bit */
+#define RCP_IN_BIT	0x00002000UL	/* IPTE notify bit */
 
 /* User dirty / referenced bit for KVM's migration feature */
 #define KVM_UR_BIT	0x00008000UL
@@ -374,7 +374,7 @@ extern unsigned long MODULES_END;
 #define RCP_HC_BIT	0x0020000000000000UL
 #define RCP_GR_BIT	0x0004000000000000UL
 #define RCP_GC_BIT	0x0002000000000000UL
-#define RCP_IN_BIT	0x0000800000000000UL	/* IPTE notify bit */
+#define RCP_IN_BIT	0x0000200000000000UL	/* IPTE notify bit */
 
 /* User dirty / referenced bit for KVM's migration feature */
 #define KVM_UR_BIT	0x0000800000000000UL
-- 
1.8.1.4

* [PATCH 2/8] s390/kvm: fix psw rewinding in handle_skey
From: Christian Borntraeger @ 2013-05-17 12:41 UTC
  To: Marcelo Tossati, Gleb Natapov, Paolo Bonzini
  Cc: Cornelia Huck, Heiko Carstens, Martin Schwidefsky, KVM,
	linux-s390, Christian Borntraeger

From: Martin Schwidefsky <schwidefsky@de.ibm.com>

The PSW address can wrap around if the guest has been running in the
24-bit or 31-bit addressing mode. Use __rewind_psw() to find the
correct address.
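The difference matters when the instruction sits at the very start of the address space: a plain `addr -= 4` produces a 64-bit value with bits set above the 24-bit or 31-bit range. A standalone sketch of the wrapping arithmetic, assuming the addressing mode is passed in explicitly (the enum and function are hypothetical; this is not the kernel's __rewind_psw(), which derives the mode from the PSW mask):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical addressing-mode selector for this sketch. */
enum amode { AMODE_24, AMODE_31, AMODE_64 };

/* Step the instruction address back by ilc bytes, wrapping within the
 * address space of the given addressing mode. */
static uint64_t rewind_addr(uint64_t addr, uint64_t ilc, enum amode mode)
{
	uint64_t mask;

	switch (mode) {
	case AMODE_24:
		mask = (1ULL << 24) - 1;
		break;
	case AMODE_31:
		mask = (1ULL << 31) - 1;
		break;
	default:
		mask = ~0ULL;	/* 64-bit mode: no extra wrapping */
		break;
	}
	return (addr - ilc) & mask;
}
```

For example, rewinding a 4-byte instruction at address 2 in 31-bit mode yields 0x7ffffffe rather than 0xfffffffffffffffe.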

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
 arch/s390/kvm/priv.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/s390/kvm/priv.c b/arch/s390/kvm/priv.c
index 6bbd7b5..ecc58a6 100644
--- a/arch/s390/kvm/priv.c
+++ b/arch/s390/kvm/priv.c
@@ -105,7 +105,8 @@ static int handle_store_cpu_address(struct kvm_vcpu *vcpu)
 static int handle_skey(struct kvm_vcpu *vcpu)
 {
 	vcpu->stat.instruction_storage_key++;
-	vcpu->arch.sie_block->gpsw.addr -= 4;
+	vcpu->arch.sie_block->gpsw.addr =
+		__rewind_psw(vcpu->arch.sie_block->gpsw, 4);
 	VCPU_EVENT(vcpu, 4, "%s", "retrying storage key operation");
 	return 0;
 }
-- 
1.8.1.4

* [PATCH 3/8] s390/kvm: rename RCP_xxx defines to PGSTE_xxx
From: Christian Borntraeger @ 2013-05-17 12:41 UTC
  To: Marcelo Tossati, Gleb Natapov, Paolo Bonzini
  Cc: Cornelia Huck, Heiko Carstens, Martin Schwidefsky, KVM,
	linux-s390, Christian Borntraeger

From: Martin Schwidefsky <schwidefsky@de.ibm.com>

The RCP byte is part of the PGSTE value, so the existing RCP_xxx names
are inaccurate. As the defines describe bits and pieces of the PGSTE,
the names should start with PGSTE_. KVM_UR_BIT and KVM_UC_BIT are
part of the PGSTE too, so give them better names as well.
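The renamed bits line up with the shift constants in pgste_update_all(): the storage key's referenced/changed bits are shifted up into GR/GC (`<< 48`), the host HR/HC bits are shifted down to the same positions (`>> 52`), and the combined value is shifted into UR/UC (`<< 45`). A small check of that arithmetic with the 64-bit values from this patch, assuming the usual storage key layout with referenced = 0x04 and changed = 0x02 (KEY_R, KEY_C and the helper names are hypothetical):

```c
#include <assert.h>
#include <stdint.h>

#define PGSTE_HR_BIT	0x0040000000000000ULL
#define PGSTE_HC_BIT	0x0020000000000000ULL
#define PGSTE_GR_BIT	0x0004000000000000ULL
#define PGSTE_GC_BIT	0x0002000000000000ULL
#define PGSTE_UR_BIT	0x0000800000000000ULL
#define PGSTE_UC_BIT	0x0000400000000000ULL

/* Assumed storage key bits: referenced and changed. */
#define KEY_R	0x04ULL
#define KEY_C	0x02ULL

/* Storage key R/C -> guest bits (the "<< 48" in pgste_update_all). */
static uint64_t key_to_guest(uint64_t key_bits)
{
	return key_bits << 48;
}

/* Host HR/HC in the PGSTE -> storage key R/C positions (the ">> 52"). */
static uint64_t host_to_key(uint64_t pgste)
{
	return (pgste & (PGSTE_HR_BIT | PGSTE_HC_BIT)) >> 52;
}

/* R/C positions -> KVM user bits for migration (the "<< 45"). */
static uint64_t key_to_user(uint64_t key_bits)
{
	return key_bits << 45;
}
```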

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
 arch/s390/include/asm/pgtable.h | 82 ++++++++++++++++++++---------------------
 arch/s390/mm/pgtable.c          |  2 +-
 2 files changed, 40 insertions(+), 44 deletions(-)

diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index 0f0de30..1fc68d9 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -299,18 +299,16 @@ extern unsigned long MODULES_END;
 #define _SEGMENT_ENTRY_EMPTY	(_SEGMENT_ENTRY_INV)
 
 /* Page status table bits for virtualization */
-#define RCP_ACC_BITS	0xf0000000UL
-#define RCP_FP_BIT	0x08000000UL
-#define RCP_PCL_BIT	0x00800000UL
-#define RCP_HR_BIT	0x00400000UL
-#define RCP_HC_BIT	0x00200000UL
-#define RCP_GR_BIT	0x00040000UL
-#define RCP_GC_BIT	0x00020000UL
-#define RCP_IN_BIT	0x00002000UL	/* IPTE notify bit */
-
-/* User dirty / referenced bit for KVM's migration feature */
-#define KVM_UR_BIT	0x00008000UL
-#define KVM_UC_BIT	0x00004000UL
+#define PGSTE_ACC_BITS	0xf0000000UL
+#define PGSTE_FP_BIT	0x08000000UL
+#define PGSTE_PCL_BIT	0x00800000UL
+#define PGSTE_HR_BIT	0x00400000UL
+#define PGSTE_HC_BIT	0x00200000UL
+#define PGSTE_GR_BIT	0x00040000UL
+#define PGSTE_GC_BIT	0x00020000UL
+#define PGSTE_UR_BIT	0x00008000UL
+#define PGSTE_UC_BIT	0x00004000UL	/* user dirty (migration) */
+#define PGSTE_IN_BIT	0x00002000UL	/* IPTE notify bit */
 
 #else /* CONFIG_64BIT */
 
@@ -367,18 +365,16 @@ extern unsigned long MODULES_END;
 				 | _SEGMENT_ENTRY_SPLIT | _SEGMENT_ENTRY_CO)
 
 /* Page status table bits for virtualization */
-#define RCP_ACC_BITS	0xf000000000000000UL
-#define RCP_FP_BIT	0x0800000000000000UL
-#define RCP_PCL_BIT	0x0080000000000000UL
-#define RCP_HR_BIT	0x0040000000000000UL
-#define RCP_HC_BIT	0x0020000000000000UL
-#define RCP_GR_BIT	0x0004000000000000UL
-#define RCP_GC_BIT	0x0002000000000000UL
-#define RCP_IN_BIT	0x0000200000000000UL	/* IPTE notify bit */
-
-/* User dirty / referenced bit for KVM's migration feature */
-#define KVM_UR_BIT	0x0000800000000000UL
-#define KVM_UC_BIT	0x0000400000000000UL
+#define PGSTE_ACC_BITS	0xf000000000000000UL
+#define PGSTE_FP_BIT	0x0800000000000000UL
+#define PGSTE_PCL_BIT	0x0080000000000000UL
+#define PGSTE_HR_BIT	0x0040000000000000UL
+#define PGSTE_HC_BIT	0x0020000000000000UL
+#define PGSTE_GR_BIT	0x0004000000000000UL
+#define PGSTE_GC_BIT	0x0002000000000000UL
+#define PGSTE_UR_BIT	0x0000800000000000UL
+#define PGSTE_UC_BIT	0x0000400000000000UL	/* user dirty (migration) */
+#define PGSTE_IN_BIT	0x0000200000000000UL	/* IPTE notify bit */
 
 #endif /* CONFIG_64BIT */
 
@@ -618,8 +614,8 @@ static inline pgste_t pgste_get_lock(pte_t *ptep)
 	asm(
 		"	lg	%0,%2\n"
 		"0:	lgr	%1,%0\n"
-		"	nihh	%0,0xff7f\n"	/* clear RCP_PCL_BIT in old */
-		"	oihh	%1,0x0080\n"	/* set RCP_PCL_BIT in new */
+		"	nihh	%0,0xff7f\n"	/* clear PCL bit in old */
+		"	oihh	%1,0x0080\n"	/* set PCL bit in new */
 		"	csg	%0,%1,%2\n"
 		"	jl	0b\n"
 		: "=&d" (old), "=&d" (new), "=Q" (ptep[PTRS_PER_PTE])
@@ -632,7 +628,7 @@ static inline void pgste_set_unlock(pte_t *ptep, pgste_t pgste)
 {
 #ifdef CONFIG_PGSTE
 	asm(
-		"	nihh	%1,0xff7f\n"	/* clear RCP_PCL_BIT */
+		"	nihh	%1,0xff7f\n"	/* clear PCL bit */
 		"	stg	%1,%0\n"
 		: "=Q" (ptep[PTRS_PER_PTE])
 		: "d" (pgste_val(pgste)), "Q" (ptep[PTRS_PER_PTE]) : "cc");
@@ -657,14 +653,14 @@ static inline pgste_t pgste_update_all(pte_t *ptep, pgste_t pgste)
 	else if (bits)
 		page_reset_referenced(address);
 	/* Transfer page changed & referenced bit to guest bits in pgste */
-	pgste_val(pgste) |= bits << 48;		/* RCP_GR_BIT & RCP_GC_BIT */
+	pgste_val(pgste) |= bits << 48;		/* GR bit & GC bit */
 	/* Get host changed & referenced bits from pgste */
-	bits |= (pgste_val(pgste) & (RCP_HR_BIT | RCP_HC_BIT)) >> 52;
+	bits |= (pgste_val(pgste) & (PGSTE_HR_BIT | PGSTE_HC_BIT)) >> 52;
 	/* Transfer page changed & referenced bit to kvm user bits */
-	pgste_val(pgste) |= bits << 45;		/* KVM_UR_BIT & KVM_UC_BIT */
+	pgste_val(pgste) |= bits << 45;		/* PGSTE_UR_BIT & PGSTE_UC_BIT */
 	/* Clear relevant host bits in pgste. */
-	pgste_val(pgste) &= ~(RCP_HR_BIT | RCP_HC_BIT);
-	pgste_val(pgste) &= ~(RCP_ACC_BITS | RCP_FP_BIT);
+	pgste_val(pgste) &= ~(PGSTE_HR_BIT | PGSTE_HC_BIT);
+	pgste_val(pgste) &= ~(PGSTE_ACC_BITS | PGSTE_FP_BIT);
 	/* Copy page access key and fetch protection bit to pgste */
 	pgste_val(pgste) |=
 		(unsigned long) (skey & (_PAGE_ACC_BITS | _PAGE_FP_BIT)) << 56;
@@ -685,15 +681,15 @@ static inline pgste_t pgste_update_young(pte_t *ptep, pgste_t pgste)
 	/* Get referenced bit from storage key */
 	young = page_reset_referenced(pte_val(*ptep) & PAGE_MASK);
 	if (young)
-		pgste_val(pgste) |= RCP_GR_BIT;
+		pgste_val(pgste) |= PGSTE_GR_BIT;
 	/* Get host referenced bit from pgste */
-	if (pgste_val(pgste) & RCP_HR_BIT) {
-		pgste_val(pgste) &= ~RCP_HR_BIT;
+	if (pgste_val(pgste) & PGSTE_HR_BIT) {
+		pgste_val(pgste) &= ~PGSTE_HR_BIT;
 		young = 1;
 	}
 	/* Transfer referenced bit to kvm user bits and pte */
 	if (young) {
-		pgste_val(pgste) |= KVM_UR_BIT;
+		pgste_val(pgste) |= PGSTE_UR_BIT;
 		pte_val(*ptep) |= _PAGE_SWR;
 	}
 #endif
@@ -712,7 +708,7 @@ static inline void pgste_set_key(pte_t *ptep, pgste_t pgste, pte_t entry)
 	okey = nkey = page_get_storage_key(address);
 	nkey &= ~(_PAGE_ACC_BITS | _PAGE_FP_BIT);
 	/* Set page access key and fetch protection bit from pgste */
-	nkey |= (pgste_val(pgste) & (RCP_ACC_BITS | RCP_FP_BIT)) >> 56;
+	nkey |= (pgste_val(pgste) & (PGSTE_ACC_BITS | PGSTE_FP_BIT)) >> 56;
 	if (okey != nkey)
 		page_set_storage_key(address, nkey, 0);
 #endif
@@ -801,8 +797,8 @@ static inline pgste_t pgste_ipte_notify(struct mm_struct *mm,
 					pte_t *ptep, pgste_t pgste)
 {
 #ifdef CONFIG_PGSTE
-	if (pgste_val(pgste) & RCP_IN_BIT) {
-		pgste_val(pgste) &= ~RCP_IN_BIT;
+	if (pgste_val(pgste) & PGSTE_IN_BIT) {
+		pgste_val(pgste) &= ~PGSTE_IN_BIT;
 		gmap_do_ipte_notify(mm, addr, ptep);
 	}
 #endif
@@ -970,8 +966,8 @@ static inline int ptep_test_and_clear_user_dirty(struct mm_struct *mm,
 	if (mm_has_pgste(mm)) {
 		pgste = pgste_get_lock(ptep);
 		pgste = pgste_update_all(ptep, pgste);
-		dirty = !!(pgste_val(pgste) & KVM_UC_BIT);
-		pgste_val(pgste) &= ~KVM_UC_BIT;
+		dirty = !!(pgste_val(pgste) & PGSTE_UC_BIT);
+		pgste_val(pgste) &= ~PGSTE_UC_BIT;
 		pgste_set_unlock(ptep, pgste);
 		return dirty;
 	}
@@ -990,8 +986,8 @@ static inline int ptep_test_and_clear_user_young(struct mm_struct *mm,
 	if (mm_has_pgste(mm)) {
 		pgste = pgste_get_lock(ptep);
 		pgste = pgste_update_young(ptep, pgste);
-		young = !!(pgste_val(pgste) & KVM_UR_BIT);
-		pgste_val(pgste) &= ~KVM_UR_BIT;
+		young = !!(pgste_val(pgste) & PGSTE_UR_BIT);
+		pgste_val(pgste) &= ~PGSTE_UR_BIT;
 		pgste_set_unlock(ptep, pgste);
 	}
 	return young;
diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
index 7805ddc..5ca7568 100644
--- a/arch/s390/mm/pgtable.c
+++ b/arch/s390/mm/pgtable.c
@@ -690,7 +690,7 @@ int gmap_ipte_notify(struct gmap *gmap, unsigned long start, unsigned long len)
 		entry = *ptep;
 		if ((pte_val(entry) & (_PAGE_INVALID | _PAGE_RO)) == 0) {
 			pgste = pgste_get_lock(ptep);
-			pgste_val(pgste) |= RCP_IN_BIT;
+			pgste_val(pgste) |= PGSTE_IN_BIT;
 			pgste_set_unlock(ptep, pgste);
 			start += PAGE_SIZE;
 			len -= PAGE_SIZE;
-- 
1.8.1.4

* [PATCH 4/8] s390/kvm: Mark if a cpu is in SIE
From: Christian Borntraeger @ 2013-05-17 12:41 UTC
  To: Marcelo Tossati, Gleb Natapov, Paolo Bonzini
  Cc: Cornelia Huck, Heiko Carstens, Martin Schwidefsky, KVM,
	linux-s390, Christian Borntraeger

Let's track in a private bit whether the SIE control block is active.
We want to track this as closely as possible, so we also have to
instrument the interrupt and program check handlers. Let's use the
existing HANDLE_SIE_INTERCEPT macro for that.
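Two details of the assembler change may be worth spelling out. The interrupt path decides whether the interrupted address lies inside the SIE section with one unsigned compare of `address - start` against the section length (for program checks the macro uses `jh` instead of `jhe`, so the address just past the section counts too). And `oi`/`ni` on `__SIE_PROG0C+3` touch the lowest-order byte of the big-endian 32-bit prog0c word, which is where PROG_IN_SIE (1<<0) lives. A C sketch of both, with hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>

#define PROG_IN_SIE (1u << 0)

/*
 * A single unsigned compare covers both bounds: if addr < start, the
 * subtraction wraps to a huge value and fails the test. For program
 * checks, addr == start + len is accepted as well, because suppressing
 * exceptions can leave the PSW pointing past the SIE instruction.
 */
static int in_sie_section(uint64_t addr, uint64_t start, uint64_t len,
			  int pgmcheck)
{
	uint64_t off = addr - start;

	return pgmcheck ? off <= len : off < len;
}

/* Read a big-endian 32-bit word from 4 bytes, most significant first.
 * Byte 3 holds the least significant bits, hence "+3" in the asm. */
static uint32_t load_be32(const uint8_t b[4])
{
	return (uint32_t)b[0] << 24 | (uint32_t)b[1] << 16 |
	       (uint32_t)b[2] << 8 | b[3];
}
```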

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
---
 arch/s390/include/asm/kvm_host.h |  5 ++++-
 arch/s390/kernel/asm-offsets.c   |  2 ++
 arch/s390/kernel/entry64.S       | 10 +++++++---
 3 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 16bd5d1..962b92e 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -68,7 +68,10 @@ struct sca_block {
 struct kvm_s390_sie_block {
 	atomic_t cpuflags;		/* 0x0000 */
 	__u32	prefix;			/* 0x0004 */
-	__u8	reserved8[32];		/* 0x0008 */
+	__u8	reserved08[4];		/* 0x0008 */
+#define PROG_IN_SIE (1<<0)
+	__u32	prog0c;			/* 0x000c */
+	__u8	reserved10[24];		/* 0x0010 */
 	__u64	cputm;			/* 0x0028 */
 	__u64	ckc;			/* 0x0030 */
 	__u64	epoch;			/* 0x0038 */
diff --git a/arch/s390/kernel/asm-offsets.c b/arch/s390/kernel/asm-offsets.c
index 7a82f9f..6456bbe 100644
--- a/arch/s390/kernel/asm-offsets.c
+++ b/arch/s390/kernel/asm-offsets.c
@@ -7,6 +7,7 @@
 #define ASM_OFFSETS_C
 
 #include <linux/kbuild.h>
+#include <linux/kvm_host.h>
 #include <linux/sched.h>
 #include <asm/cputime.h>
 #include <asm/vdso.h>
@@ -161,6 +162,7 @@ int main(void)
 	DEFINE(__LC_PGM_TDB, offsetof(struct _lowcore, pgm_tdb));
 	DEFINE(__THREAD_trap_tdb, offsetof(struct task_struct, thread.trap_tdb));
 	DEFINE(__GMAP_ASCE, offsetof(struct gmap, asce));
+	DEFINE(__SIE_PROG0C, offsetof(struct kvm_s390_sie_block, prog0c));
 #endif /* CONFIG_32BIT */
 	return 0;
 }
diff --git a/arch/s390/kernel/entry64.S b/arch/s390/kernel/entry64.S
index 4c17eec..c2e81b4 100644
--- a/arch/s390/kernel/entry64.S
+++ b/arch/s390/kernel/entry64.S
@@ -84,7 +84,7 @@ _TIF_EXIT_SIE = (_TIF_SIGPENDING | _TIF_NEED_RESCHED | _TIF_MCCK_PENDING)
 	.macro	HANDLE_SIE_INTERCEPT scratch,pgmcheck
 #if defined(CONFIG_KVM) || defined(CONFIG_KVM_MODULE)
 	tmhh	%r8,0x0001		# interrupting from user ?
-	jnz	.+42
+	jnz	.+52
 	lgr	\scratch,%r9
 	slg	\scratch,BASED(.Lsie_loop)
 	clg	\scratch,BASED(.Lsie_length)
@@ -92,12 +92,14 @@ _TIF_EXIT_SIE = (_TIF_SIGPENDING | _TIF_NEED_RESCHED | _TIF_MCCK_PENDING)
 	# Some program interrupts are suppressing (e.g. protection).
 	# We must also check the instruction after SIE in that case.
 	# do_protection_exception will rewind to rewind_pad
-	jh	.+22
+	jh	.+32
 	.else
-	jhe	.+22
+	jhe	.+32
 	.endif
 	lg	%r9,BASED(.Lsie_loop)
 	LPP	BASED(.Lhost_id)	# set host id
+	lg	%r14,__SF_EMPTY(%r15)	# get control block pointer
+	ni	__SIE_PROG0C+3(%r14),0xfe	# no longer in SIE
 #endif
 	.endm
 
@@ -956,10 +958,12 @@ sie_loop:
 	lctlg	%c1,%c1,__GMAP_ASCE(%r14)	# load primary asce
 sie_gmap:
 	lg	%r14,__SF_EMPTY(%r15)		# get control block pointer
+	oi	__SIE_PROG0C+3(%r14),1		# we are in SIE now
 	LPP	__SF_EMPTY(%r15)		# set guest id
 	sie	0(%r14)
 sie_done:
 	LPP	__SF_EMPTY+16(%r15)		# set host id
+	ni	__SIE_PROG0C+3(%r14),0xfe	# no longer in SIE
 	lg	%r14,__LC_THREAD_INFO		# pointer thread_info struct
 sie_exit:
 	lctlg	%c1,%c1,__LC_USER_ASCE		# load primary asce
-- 
1.8.1.4

* [PATCH 5/8] s390/kvm: Provide a way to prevent reentering SIE
From: Christian Borntraeger @ 2013-05-17 12:41 UTC
  To: Marcelo Tossati, Gleb Natapov, Paolo Bonzini
  Cc: Cornelia Huck, Heiko Carstens, Martin Schwidefsky, KVM,
	linux-s390, Christian Borntraeger

Let's provide functions to prevent KVM from reentering SIE and
to kick CPUs out of SIE. We cannot use the common kvm_vcpu_kick code,
since we need to kick out guests in places that hold
architecture-specific locks (e.g. the pgste lock) which the other
CPUs might also need, so waiting is not possible.

So let's provide a bit in a private field of the SIE control block
that acts as a gatekeeper and is checked after we have claimed to be
in SIE. Please note that we do not reuse prog0c, since we want to
access that bit without atomic ops.
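The ordering is the essential part: the entry path sets the in-SIE bit before testing the block bit, and the kicker sets the block bit before waiting for the in-SIE bit to clear, so at least one side always sees the other. Below is a user-space model of that handshake; all names are hypothetical, and sequentially consistent C11 atomics stand in for whatever ordering the real s390 code relies on.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of the flags; not kernel code. */
struct vcpu_model {
	atomic_bool in_sie;	/* role of PROG_IN_SIE in prog0c */
	atomic_bool blocked;	/* role of PROG_BLOCK_SIE in prog20 */
	atomic_bool stop_req;	/* stand-in for CPUSTAT_STOP_INT */
};

static void vcpu_model_init(struct vcpu_model *v)
{
	atomic_init(&v->in_sie, false);
	atomic_init(&v->blocked, false);
	atomic_init(&v->stop_req, false);
}

/* Entry path: announce "in SIE" first, then check the gate. */
static bool try_enter_sie(struct vcpu_model *v)
{
	atomic_store(&v->in_sie, true);
	if (atomic_load(&v->blocked)) {
		atomic_store(&v->in_sie, false);	/* like "jnz sie_done" */
		return false;
	}
	return true;	/* the SIE instruction would run here */
}

static void leave_sie(struct vcpu_model *v)
{
	atomic_store(&v->in_sie, false);
}

/* Kicker: close the gate, request an exit, wait until SIE is left. */
static void exit_sie_sync_model(struct vcpu_model *v)
{
	atomic_store(&v->blocked, true);
	atomic_store(&v->stop_req, true);
	while (atomic_load(&v->in_sie))
		;	/* cpu_relax() in the kernel */
}

static void unblock(struct vcpu_model *v)
{
	atomic_store(&v->blocked, false);
}
```

The kicker never waits for the vCPU to acknowledge anything beyond leaving SIE, which is what allows it to be called with the pgste lock held.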

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
---
 arch/s390/include/asm/kvm_host.h |  5 ++++-
 arch/s390/kernel/asm-offsets.c   |  1 +
 arch/s390/kernel/entry64.S       |  4 +++-
 arch/s390/kvm/kvm-s390.c         | 28 ++++++++++++++++++++++++++++
 arch/s390/kvm/kvm-s390.h         |  4 ++++
 5 files changed, 40 insertions(+), 2 deletions(-)

diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 962b92e..9a809f9 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -71,7 +71,10 @@ struct kvm_s390_sie_block {
 	__u8	reserved08[4];		/* 0x0008 */
 #define PROG_IN_SIE (1<<0)
 	__u32	prog0c;			/* 0x000c */
-	__u8	reserved10[24];		/* 0x0010 */
+	__u8	reserved10[16];		/* 0x0010 */
+#define PROG_BLOCK_SIE 0x00000001
+	atomic_t prog20;		/* 0x0020 */
+	__u8	reserved24[4];		/* 0x0024 */
 	__u64	cputm;			/* 0x0028 */
 	__u64	ckc;			/* 0x0030 */
 	__u64	epoch;			/* 0x0038 */
diff --git a/arch/s390/kernel/asm-offsets.c b/arch/s390/kernel/asm-offsets.c
index 6456bbe..78db633 100644
--- a/arch/s390/kernel/asm-offsets.c
+++ b/arch/s390/kernel/asm-offsets.c
@@ -163,6 +163,7 @@ int main(void)
 	DEFINE(__THREAD_trap_tdb, offsetof(struct task_struct, thread.trap_tdb));
 	DEFINE(__GMAP_ASCE, offsetof(struct gmap, asce));
 	DEFINE(__SIE_PROG0C, offsetof(struct kvm_s390_sie_block, prog0c));
+	DEFINE(__SIE_PROG20, offsetof(struct kvm_s390_sie_block, prog20));
 #endif /* CONFIG_32BIT */
 	return 0;
 }
diff --git a/arch/s390/kernel/entry64.S b/arch/s390/kernel/entry64.S
index c2e81b4..c7daeef 100644
--- a/arch/s390/kernel/entry64.S
+++ b/arch/s390/kernel/entry64.S
@@ -958,7 +958,9 @@ sie_loop:
 	lctlg	%c1,%c1,__GMAP_ASCE(%r14)	# load primary asce
 sie_gmap:
 	lg	%r14,__SF_EMPTY(%r15)		# get control block pointer
-	oi	__SIE_PROG0C+3(%r14),1		# we are in SIE now
+	oi	__SIE_PROG0C+3(%r14),1		# we are going into SIE now
+	tm	__SIE_PROG20+3(%r14),1		# last exit...
+	jnz	sie_done
 	LPP	__SF_EMPTY(%r15)		# set guest id
 	sie	0(%r14)
 sie_done:
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index c1c7c68..ef4ef21 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -454,6 +454,34 @@ int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)
 	return 0;
 }
 
+void s390_vcpu_block(struct kvm_vcpu *vcpu)
+{
+	atomic_set_mask(PROG_BLOCK_SIE, &vcpu->arch.sie_block->prog20);
+}
+
+void s390_vcpu_unblock(struct kvm_vcpu *vcpu)
+{
+	atomic_clear_mask(PROG_BLOCK_SIE, &vcpu->arch.sie_block->prog20);
+}
+
+/*
+ * Kick a guest cpu out of SIE and wait until SIE is not running.
+ * If the CPU is not running (e.g. waiting as idle) the function will
+ * return immediately. */
+void exit_sie(struct kvm_vcpu *vcpu)
+{
+	atomic_set_mask(CPUSTAT_STOP_INT, &vcpu->arch.sie_block->cpuflags);
+	while (vcpu->arch.sie_block->prog0c & PROG_IN_SIE)
+		cpu_relax();
+}
+
+/* Kick a guest cpu out of SIE and prevent SIE-reentry */
+void exit_sie_sync(struct kvm_vcpu *vcpu)
+{
+	s390_vcpu_block(vcpu);
+	exit_sie(vcpu);
+}
+
 int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)
 {
 	/* kvm common code refers to this, but never calls it */
diff --git a/arch/s390/kvm/kvm-s390.h b/arch/s390/kvm/kvm-s390.h
index efc14f6..7a8abfd 100644
--- a/arch/s390/kvm/kvm-s390.h
+++ b/arch/s390/kvm/kvm-s390.h
@@ -133,6 +133,10 @@ int kvm_s390_handle_sigp(struct kvm_vcpu *vcpu);
 /* implemented in kvm-s390.c */
 int kvm_s390_vcpu_store_status(struct kvm_vcpu *vcpu,
 				 unsigned long addr);
+void s390_vcpu_block(struct kvm_vcpu *vcpu);
+void s390_vcpu_unblock(struct kvm_vcpu *vcpu);
+void exit_sie(struct kvm_vcpu *vcpu);
+void exit_sie_sync(struct kvm_vcpu *vcpu);
 /* implemented in diag.c */
 int kvm_s390_handle_diag(struct kvm_vcpu *vcpu);
 
-- 
1.8.1.4

* [PATCH 6/8] s390/kvm: Kick guests out of sie if prefix page host pte is touched
From: Christian Borntraeger @ 2013-05-17 12:41 UTC
  To: Marcelo Tossati, Gleb Natapov, Paolo Bonzini
  Cc: Cornelia Huck, Heiko Carstens, Martin Schwidefsky, KVM,
	linux-s390, Christian Borntraeger

The guest prefix pages must be mapped writable (pinned at the pte
level) all the time while SIE is running; otherwise the guest might
see random behaviour. It turns out that mlocking is not enough: the
page table entry (not the page) might change or become r/o. This
patch uses the gmap notifiers to kick guest CPUs out of SIE.
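Because kvm_s390_set_prefix() masks the prefix to an 8 KiB boundary below 2 GiB, the two 4 KiB prefix pages differ only in address bit 12 (0x1000). That is why kvm_gmap_notifier() can match both pages with `address & ~0x1000UL`. A small sketch of that arithmetic, assuming page-aligned notifier addresses (function names are hypothetical):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same mask as kvm_s390_set_prefix(): 8 KiB aligned, below 2 GiB. */
static uint32_t prefix_from_guest(uint32_t prefix)
{
	return prefix & 0x7fffe000u;
}

/*
 * Does a (page-aligned) notifier address hit either 4 KiB prefix page?
 * Clearing bit 12 folds the second page onto the first, so a single
 * compare matches both, as in kvm_gmap_notifier().
 */
static bool hits_prefix(uint32_t prefix, unsigned long address)
{
	return prefix == (address & ~0x1000UL);
}
```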

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
---
 arch/s390/include/asm/pgtable.h |  1 +
 arch/s390/kvm/intercept.c       | 39 ++------------------------------
 arch/s390/kvm/kvm-s390.c        | 49 +++++++++++++++++++++++++++++++++++++++++
 arch/s390/kvm/kvm-s390.h        |  1 +
 4 files changed, 53 insertions(+), 37 deletions(-)

diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index 1fc68d9..1d0ad7d 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -739,6 +739,7 @@ struct gmap {
 	struct mm_struct *mm;
 	unsigned long *table;
 	unsigned long asce;
+	void *private;
 	struct list_head crst_list;
 };
 
diff --git a/arch/s390/kvm/intercept.c b/arch/s390/kvm/intercept.c
index b7d1b2e..f0b8be0 100644
--- a/arch/s390/kvm/intercept.c
+++ b/arch/s390/kvm/intercept.c
@@ -174,47 +174,12 @@ static int handle_stop(struct kvm_vcpu *vcpu)
 
 static int handle_validity(struct kvm_vcpu *vcpu)
 {
-	unsigned long vmaddr;
 	int viwhy = vcpu->arch.sie_block->ipb >> 16;
-	int rc;
 
 	vcpu->stat.exit_validity++;
 	trace_kvm_s390_intercept_validity(vcpu, viwhy);
-	if (viwhy == 0x37) {
-		vmaddr = gmap_fault(vcpu->arch.sie_block->prefix,
-				    vcpu->arch.gmap);
-		if (IS_ERR_VALUE(vmaddr)) {
-			rc = -EOPNOTSUPP;
-			goto out;
-		}
-		rc = fault_in_pages_writeable((char __user *) vmaddr,
-			 PAGE_SIZE);
-		if (rc) {
-			/* user will receive sigsegv, exit to user */
-			rc = -EOPNOTSUPP;
-			goto out;
-		}
-		vmaddr = gmap_fault(vcpu->arch.sie_block->prefix + PAGE_SIZE,
-				    vcpu->arch.gmap);
-		if (IS_ERR_VALUE(vmaddr)) {
-			rc = -EOPNOTSUPP;
-			goto out;
-		}
-		rc = fault_in_pages_writeable((char __user *) vmaddr,
-			 PAGE_SIZE);
-		if (rc) {
-			/* user will receive sigsegv, exit to user */
-			rc = -EOPNOTSUPP;
-			goto out;
-		}
-	} else
-		rc = -EOPNOTSUPP;
-
-out:
-	if (rc)
-		VCPU_EVENT(vcpu, 2, "unhandled validity intercept code %d",
-			   viwhy);
-	return rc;
+	WARN_ONCE(true, "kvm: unhandled validity intercept 0x%x\n", viwhy);
+	return -EOPNOTSUPP;
 }
 
 static int handle_instruction(struct kvm_vcpu *vcpu)
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index ef4ef21..08227c1 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -84,6 +84,7 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
 };
 
 static unsigned long long *facilities;
+static struct gmap_notifier gmap_notifier;
 
 /* Section: not file related */
 int kvm_arch_hardware_enable(void *garbage)
@@ -96,13 +97,18 @@ void kvm_arch_hardware_disable(void *garbage)
 {
 }
 
+static void kvm_gmap_notifier(struct gmap *gmap, unsigned long address);
+
 int kvm_arch_hardware_setup(void)
 {
+	gmap_notifier.notifier_call = kvm_gmap_notifier;
+	gmap_register_ipte_notifier(&gmap_notifier);
 	return 0;
 }
 
 void kvm_arch_hardware_unsetup(void)
 {
+	gmap_unregister_ipte_notifier(&gmap_notifier);
 }
 
 void kvm_arch_check_processor_compat(void *rtn)
@@ -239,6 +245,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 		kvm->arch.gmap = gmap_alloc(current->mm);
 		if (!kvm->arch.gmap)
 			goto out_nogmap;
+		kvm->arch.gmap->private = kvm;
 	}
 
 	kvm->arch.css_support = 0;
@@ -309,6 +316,7 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
 		vcpu->arch.gmap = gmap_alloc(current->mm);
 		if (!vcpu->arch.gmap)
 			return -ENOMEM;
+		vcpu->arch.gmap->private = vcpu->kvm;
 		return 0;
 	}
 
@@ -482,6 +490,22 @@ void exit_sie_sync(struct kvm_vcpu *vcpu)
 	exit_sie(vcpu);
 }
 
+static void kvm_gmap_notifier(struct gmap *gmap, unsigned long address)
+{
+	int i;
+	struct kvm *kvm = gmap->private;
+	struct kvm_vcpu *vcpu;
+
+	kvm_for_each_vcpu(i, vcpu, kvm) {
+		/* match against both prefix pages */
+		if (vcpu->arch.sie_block->prefix == (address & ~0x1000UL)) {
+			VCPU_EVENT(vcpu, 2, "gmap notifier for %lx", address);
+			kvm_make_request(KVM_REQ_MMU_RELOAD, vcpu);
+			exit_sie_sync(vcpu);
+		}
+	}
+}
+
 int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)
 {
 	/* kvm common code refers to this, but never calls it */
@@ -634,6 +658,27 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
 	return -EINVAL; /* not implemented yet */
 }
 
+static int kvm_s390_handle_requests(struct kvm_vcpu *vcpu)
+{
+	/*
+	 * We use MMU_RELOAD just to re-arm the ipte notifier for the
+	 * guest prefix page. gmap_ipte_notify will wait on the ptl lock.
+	 * This ensures that the ipte instruction for this request has
+	 * already finished. We might race against a second unmapper that
+	 * wants to set the blocking bit. Lets just retry the request loop.
+	 */
+	while (kvm_check_request(KVM_REQ_MMU_RELOAD, vcpu)) {
+		int rc;
+		rc = gmap_ipte_notify(vcpu->arch.gmap,
+				      vcpu->arch.sie_block->prefix,
+				      PAGE_SIZE * 2);
+		if (rc)
+			return rc;
+		s390_vcpu_unblock(vcpu);
+	}
+	return 0;
+}
+
 static int __vcpu_run(struct kvm_vcpu *vcpu)
 {
 	int rc;
@@ -649,6 +694,10 @@ static int __vcpu_run(struct kvm_vcpu *vcpu)
 	if (!kvm_is_ucontrol(vcpu->kvm))
 		kvm_s390_deliver_pending_interrupts(vcpu);
 
+	rc = kvm_s390_handle_requests(vcpu);
+	if (rc)
+		return rc;
+
 	vcpu->arch.sie_block->icptcode = 0;
 	preempt_disable();
 	kvm_guest_enter();
diff --git a/arch/s390/kvm/kvm-s390.h b/arch/s390/kvm/kvm-s390.h
index 7a8abfd..269b523 100644
--- a/arch/s390/kvm/kvm-s390.h
+++ b/arch/s390/kvm/kvm-s390.h
@@ -63,6 +63,7 @@ static inline void kvm_s390_set_prefix(struct kvm_vcpu *vcpu, u32 prefix)
 {
 	vcpu->arch.sie_block->prefix = prefix & 0x7fffe000u;
 	vcpu->arch.sie_block->ihcpu  = 0xffff;
+	kvm_make_request(KVM_REQ_MMU_RELOAD, vcpu);
 }
 
 static inline u64 kvm_s390_get_base_disp_s(struct kvm_vcpu *vcpu)
-- 
1.8.1.4

* [PATCH 7/8] s390/kvm: avoid automatic sie reentry
From: Christian Borntraeger @ 2013-05-17 12:41 UTC
  To: Marcelo Tossati, Gleb Natapov, Paolo Bonzini
  Cc: Cornelia Huck, Heiko Carstens, Martin Schwidefsky, KVM,
	linux-s390, Christian Borntraeger

From: Martin Schwidefsky <schwidefsky@de.ibm.com>

Do not automatically restart the sie instruction in entry64.S after an
interrupt; return to the caller with a reason code instead. That allows
us to deal with RCU and other conditions in C code.
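
The reason code travels through the sie64a stack frame. sie64a clears 16 bytes at __SF_EMPTY+16 (host id plus reason), and HANDLE_SIE_INTERCEPT stores the reason with mvi into __SF_EMPTY+31. On big-endian s390 that byte is the least significant byte of the doubleword at __SF_EMPTY+24, which sie_exit loads into %r2. sie_fault instead stores a full 64-bit -EFAULT. The C model below sketches this. The model_ names and the enum are hypothetical (entry64.S uses bare numbers 1-4), and the caller's fault handling (ucontrol exit or program interrupt) is reduced to returning the negative value.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Reasons passed to HANDLE_SIE_INTERCEPT in this patch (names hypothetical). */
enum model_sie_exit_reason {
	MODEL_SIE_EXIT_NORMAL = 0,  /* left as cleared by xc in sie64a */
	MODEL_SIE_EXIT_PGM    = 1,  /* program check */
	MODEL_SIE_EXIT_IO     = 2,  /* I/O interrupt */
	MODEL_SIE_EXIT_EXT    = 3,  /* external interrupt */
	MODEL_SIE_EXIT_MCCK   = 4,  /* machine check */
};

/* bytes[0..7]: host id at __SF_EMPTY+16, bytes[8..15]: reason at +24. */
struct model_sie_frame {
	uint8_t bytes[16];
};

/* xc __SF_EMPTY+16(16,%r15): zero host id and reason. */
static void model_sie_enter(struct model_sie_frame *f)
{
	memset(f->bytes, 0, sizeof(f->bytes));
}

/* mvi __SF_EMPTY+31(%r15),reason: write only the last byte. */
static void model_set_reason_byte(struct model_sie_frame *f, uint8_t reason)
{
	f->bytes[15] = reason;
}

/* stg %r14,__SF_EMPTY+24(%r15): store a full doubleword, big-endian. */
static void model_set_reason_u64(struct model_sie_frame *f, int64_t value)
{
	uint64_t u = (uint64_t)value;

	for (int i = 0; i < 8; i++)
		f->bytes[8 + i] = (uint8_t)(u >> (56 - 8 * i));
}

/* lg %r2,__SF_EMPTY+24(%r15): load the doubleword as big-endian. */
static int64_t model_sie_return(const struct model_sie_frame *f)
{
	uint64_t u = 0;

	for (int i = 0; i < 8; i++)
		u = (u << 8) | f->bytes[8 + i];
	return (int64_t)u;
}

/* Caller side, as in the __vcpu_run hunk: positive reasons are
 * ordinary exits, negative values are faults. */
static int model_vcpu_run_rc(int64_t rc)
{
	if (rc > 0)
		rc = 0;
	assert(rc <= 0);
	return (int)rc;
}
```

A single-byte store is enough for the interrupt reasons because they are all small positive values, while the fault path needs the full doubleword to encode a negative errno.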

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
 arch/s390/kernel/entry64.S | 76 ++++++++++++++++++++--------------------------
 arch/s390/kvm/kvm-s390.c   |  4 ++-
 2 files changed, 36 insertions(+), 44 deletions(-)

diff --git a/arch/s390/kernel/entry64.S b/arch/s390/kernel/entry64.S
index c7daeef..51d99ac 100644
--- a/arch/s390/kernel/entry64.S
+++ b/arch/s390/kernel/entry64.S
@@ -47,7 +47,6 @@ _TIF_WORK_INT = (_TIF_SIGPENDING | _TIF_NOTIFY_RESUME | _TIF_NEED_RESCHED | \
 		 _TIF_MCCK_PENDING)
 _TIF_TRACE    = (_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | _TIF_SECCOMP | \
 		 _TIF_SYSCALL_TRACEPOINT)
-_TIF_EXIT_SIE = (_TIF_SIGPENDING | _TIF_NEED_RESCHED | _TIF_MCCK_PENDING)
 
 #define BASED(name) name-system_call(%r13)
 
@@ -81,25 +80,27 @@ _TIF_EXIT_SIE = (_TIF_SIGPENDING | _TIF_NEED_RESCHED | _TIF_MCCK_PENDING)
 #endif
 	.endm
 
-	.macro	HANDLE_SIE_INTERCEPT scratch,pgmcheck
+	.macro	HANDLE_SIE_INTERCEPT scratch,reason
 #if defined(CONFIG_KVM) || defined(CONFIG_KVM_MODULE)
 	tmhh	%r8,0x0001		# interrupting from user ?
-	jnz	.+52
+	jnz	.+62
 	lgr	\scratch,%r9
-	slg	\scratch,BASED(.Lsie_loop)
-	clg	\scratch,BASED(.Lsie_length)
-	.if	\pgmcheck
+	slg	\scratch,BASED(.Lsie_critical)
+	clg	\scratch,BASED(.Lsie_critical_length)
+	.if	\reason==1
 	# Some program interrupts are suppressing (e.g. protection).
 	# We must also check the instruction after SIE in that case.
 	# do_protection_exception will rewind to rewind_pad
-	jh	.+32
+	jh	.+42
 	.else
-	jhe	.+32
+	jhe	.+42
 	.endif
-	lg	%r9,BASED(.Lsie_loop)
-	LPP	BASED(.Lhost_id)	# set host id
-	lg	%r14,__SF_EMPTY(%r15)	# get control block pointer
+	lg	%r14,__SF_EMPTY(%r15)		# get control block pointer
+	LPP	__SF_EMPTY+16(%r15)		# set host id
 	ni	__SIE_PROG0C+3(%r14),0xfe	# no longer in SIE
+	lctlg	%c1,%c1,__LC_USER_ASCE		# load primary asce
+	larl	%r9,sie_exit			# skip forward to sie_exit
+	mvi	__SF_EMPTY+31(%r15),\reason	# set exit reason
 #endif
 	.endm
 
@@ -452,7 +453,7 @@ ENTRY(io_int_handler)
 	lg	%r12,__LC_THREAD_INFO
 	larl	%r13,system_call
 	lmg	%r8,%r9,__LC_IO_OLD_PSW
-	HANDLE_SIE_INTERCEPT %r14,0
+	HANDLE_SIE_INTERCEPT %r14,2
 	SWITCH_ASYNC __LC_SAVE_AREA_ASYNC,__LC_ASYNC_STACK,STACK_SHIFT
 	tmhh	%r8,0x0001		# interrupting from user?
 	jz	io_skip
@@ -597,7 +598,7 @@ ENTRY(ext_int_handler)
 	lg	%r12,__LC_THREAD_INFO
 	larl	%r13,system_call
 	lmg	%r8,%r9,__LC_EXT_OLD_PSW
-	HANDLE_SIE_INTERCEPT %r14,0
+	HANDLE_SIE_INTERCEPT %r14,3
 	SWITCH_ASYNC __LC_SAVE_AREA_ASYNC,__LC_ASYNC_STACK,STACK_SHIFT
 	tmhh	%r8,0x0001		# interrupting from user ?
 	jz	ext_skip
@@ -645,7 +646,7 @@ ENTRY(mcck_int_handler)
 	lg	%r12,__LC_THREAD_INFO
 	larl	%r13,system_call
 	lmg	%r8,%r9,__LC_MCK_OLD_PSW
-	HANDLE_SIE_INTERCEPT %r14,0
+	HANDLE_SIE_INTERCEPT %r14,4
 	tm	__LC_MCCK_CODE,0x80	# system damage?
 	jo	mcck_panic		# yes -> rest of mcck code invalid
 	lghi	%r14,__LC_CPU_TIMER_SAVE_AREA
@@ -939,19 +940,8 @@ ENTRY(sie64a)
 	stmg	%r6,%r14,__SF_GPRS(%r15)	# save kernel registers
 	stg	%r2,__SF_EMPTY(%r15)		# save control block pointer
 	stg	%r3,__SF_EMPTY+8(%r15)		# save guest register save area
-	xc	__SF_EMPTY+16(8,%r15),__SF_EMPTY+16(%r15) # host id == 0
+	xc	__SF_EMPTY+16(16,%r15),__SF_EMPTY+16(%r15) # host id & reason
 	lmg	%r0,%r13,0(%r3)			# load guest gprs 0-13
-# some program checks are suppressing. C code (e.g. do_protection_exception)
-# will rewind the PSW by the ILC, which is 4 bytes in case of SIE. Other
-# instructions in the sie_loop should not cause program interrupts. So
-# lets use a nop (47 00 00 00) as a landing pad.
-# See also HANDLE_SIE_INTERCEPT
-rewind_pad:
-	nop	0
-sie_loop:
-	lg	%r14,__LC_THREAD_INFO		# pointer thread_info struct
-	tm	__TI_flags+7(%r14),_TIF_EXIT_SIE
-	jnz	sie_exit
 	lg	%r14,__LC_GMAP			# get gmap pointer
 	ltgr	%r14,%r14
 	jz	sie_gmap
@@ -966,33 +956,33 @@ sie_gmap:
 sie_done:
 	LPP	__SF_EMPTY+16(%r15)		# set host id
 	ni	__SIE_PROG0C+3(%r14),0xfe	# no longer in SIE
-	lg	%r14,__LC_THREAD_INFO		# pointer thread_info struct
-sie_exit:
 	lctlg	%c1,%c1,__LC_USER_ASCE		# load primary asce
+# some program checks are suppressing. C code (e.g. do_protection_exception)
+# will rewind the PSW by the ILC, which is 4 bytes in case of SIE. Other
+# instructions between sie64a and sie_done should not cause program
+# interrupts. So let's use a nop (47 00 00 00) as a landing pad.
+# See also HANDLE_SIE_INTERCEPT
+rewind_pad:
+	nop	0
+sie_exit:
 	lg	%r14,__SF_EMPTY+8(%r15)		# load guest register save area
 	stmg	%r0,%r13,0(%r14)		# save guest gprs 0-13
 	lmg	%r6,%r14,__SF_GPRS(%r15)	# restore kernel registers
-	lghi	%r2,0
+	lg	%r2,__SF_EMPTY+24(%r15)		# return exit reason code
 	br	%r14
 sie_fault:
-	lctlg	%c1,%c1,__LC_USER_ASCE		# load primary asce
-	lg	%r14,__LC_THREAD_INFO		# pointer thread_info struct
-	lg	%r14,__SF_EMPTY+8(%r15)		# load guest register save area
-	stmg	%r0,%r13,0(%r14)		# save guest gprs 0-13
-	lmg	%r6,%r14,__SF_GPRS(%r15)	# restore kernel registers
-	lghi	%r2,-EFAULT
-	br	%r14
+	lghi	%r14,-EFAULT
+	stg	%r14,__SF_EMPTY+24(%r15)	# set exit reason code
+	j	sie_exit
 
 	.align	8
-.Lsie_loop:
-	.quad	sie_loop
-.Lsie_length:
-	.quad	sie_done - sie_loop
-.Lhost_id:
-	.quad	0
+.Lsie_critical:
+	.quad	sie_gmap
+.Lsie_critical_length:
+	.quad	sie_done - sie_gmap
 
 	EX_TABLE(rewind_pad,sie_fault)
-	EX_TABLE(sie_loop,sie_fault)
+	EX_TABLE(sie_exit,sie_fault)
 #endif
 
 		.section .rodata, "a"
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 08227c1..93444c4 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -707,7 +707,9 @@ static int __vcpu_run(struct kvm_vcpu *vcpu)
 	trace_kvm_s390_sie_enter(vcpu,
 				 atomic_read(&vcpu->arch.sie_block->cpuflags));
 	rc = sie64a(vcpu->arch.sie_block, vcpu->run->s.regs.gprs);
-	if (rc) {
+	if (rc > 0)
+		rc = 0;
+	if (rc < 0) {
 		if (kvm_is_ucontrol(vcpu->kvm)) {
 			rc = SIE_INTERCEPT_UCONTROL;
 		} else {
-- 
1.8.1.4
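
HANDLE_SIE_INTERCEPT decides whether an interrupt hit the SIE critical section with a single unsigned compare. It subtracts .Lsie_critical (sie_gmap) from the interrupted PSW address and compares the result against .Lsie_critical_length (sie_done - sie_gmap). An address below sie_gmap wraps to a huge offset, so one compare covers both bounds. For program checks (reason 1) the macro uses jh instead of jhe, so sie_done itself also counts: after a suppressing program check the PSW points past the SIE instruction. The C function below is a hypothetical rendering of that test only. The preceding "interrupting from user?" check is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical C equivalent of:
 *   lgr  scratch,%r9                      ; interrupted PSW address
 *   slg  scratch,.Lsie_critical           ; offset = addr - sie_gmap
 *   clg  scratch,.Lsie_critical_length
 *   jhe  / jh  (program check)            ; branch away if outside
 */
static bool model_in_sie_critical(uint64_t psw_addr, uint64_t start,
				  uint64_t length, bool pgm_check)
{
	/* Unsigned wrap-around turns "addr < start" into a huge offset. */
	uint64_t offset = psw_addr - start;

	assert(length > 0);
	if (pgm_check)
		return offset <= length;  /* jh: also accept start + length */
	return offset < length;           /* jhe: [start, start + length) */
}
```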

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 8/8] s390: fix gmap_ipte_notifier vs. software dirty pages
  2013-05-17 12:41 [PATCH 0/8] s390/kvm fixes Christian Borntraeger
                   ` (6 preceding siblings ...)
  2013-05-17 12:41 ` [PATCH 7/8] s390/kvm: avoid automatic sie reentry Christian Borntraeger
@ 2013-05-17 12:41 ` Christian Borntraeger
  2013-05-19  8:49 ` [PATCH 0/8] s390/kvm fixes Gleb Natapov
  2013-05-21  8:56 ` Gleb Natapov
  9 siblings, 0 replies; 12+ messages in thread
From: Christian Borntraeger @ 2013-05-17 12:41 UTC (permalink / raw)
  To: Marcelo Tossati, Gleb Natapov, Paolo Bonzini
  Cc: Cornelia Huck, Heiko Carstens, Martin Schwidefsky, KVM,
	linux-s390, Christian Borntraeger

On heavy paging load some guest cpus started to loop in gmap_ipte_notify.
This was visible as stalled cpus inside the guest. The gmap_ipte_notifier
tries to map a user page and then makes sure that the pte is valid and
writable. It turns out that with software change bit tracking the pte
can become read-only (and only software writable) if the page is clean.
Since we loop in this code, the page would stay clean and, therefore,
never become writable again.
Let us just use fixup_user_fault, which is guaranteed to call handle_mm_fault.
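
The hang can be illustrated with a heavily simplified model. Assume a clean page whose pte is logically writable (software write bit set) but hardware read-only until a write fault marks it dirty. A lookup that only faults when the page does not already look writable never clears the hardware protection, so a "map, then check" loop spins. A path that always runs the fault handler terminates after one pass. All model_ names below are hypothetical stand-ins for get_user_pages(), fixup_user_fault() and handle_mm_fault(), not their real behavior. The loop is bounded only so the model can report the hang.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical pte with software change-bit tracking. */
struct model_pte {
	bool present;
	bool sw_write;  /* mapping allows writes */
	bool dirty;     /* software change bit */
	bool hw_ro;     /* hardware read-only while the page is clean */
};

/* Stand-in for the write-fault path: dirty the page, drop protection. */
static void model_write_fault(struct model_pte *pte)
{
	pte->present = true;
	pte->dirty = true;
	pte->hw_ro = false;
}

/* Old behavior, modeled loosely on get_user_pages(): no fault is
 * taken if the page already looks present and writable. */
static int model_gup_like(struct model_pte *pte)
{
	if (!(pte->present && pte->sw_write))
		model_write_fault(pte);
	return 0;
}

/* New behavior, modeled on fixup_user_fault(): always fault. */
static int model_fixup_like(struct model_pte *pte)
{
	model_write_fault(pte);
	return 0;
}

/* Notifier-style loop: map the page, then require a valid, hardware
 * writable pte. Returns the attempt count, or -1 after max_tries. */
static int model_notify_loop(struct model_pte *pte,
			     int (*map)(struct model_pte *), int max_tries)
{
	assert(max_tries > 0);
	for (int i = 1; i <= max_tries; i++) {
		if (map(pte))
			return -1;
		if (pte->present && !pte->hw_ro)
			return i;
	}
	return -1;
}
```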

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
 arch/s390/mm/pgtable.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
index 5ca7568..1e0c438 100644
--- a/arch/s390/mm/pgtable.c
+++ b/arch/s390/mm/pgtable.c
@@ -677,8 +677,7 @@ int gmap_ipte_notify(struct gmap *gmap, unsigned long start, unsigned long len)
 			break;
 		}
 		/* Get the page mapped */
-		if (get_user_pages(current, gmap->mm, addr, 1, 1, 0,
-				   NULL, NULL) != 1) {
+		if (fixup_user_fault(current, gmap->mm, addr, FAULT_FLAG_WRITE)) {
 			rc = -EFAULT;
 			break;
 		}
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/8] s390/kvm fixes
  2013-05-17 12:41 [PATCH 0/8] s390/kvm fixes Christian Borntraeger
                   ` (7 preceding siblings ...)
  2013-05-17 12:41 ` [PATCH 8/8] s390: fix gmap_ipte_notifier vs. software dirty pages Christian Borntraeger
@ 2013-05-19  8:49 ` Gleb Natapov
  2013-05-21  6:57   ` Martin Schwidefsky
  2013-05-21  8:56 ` Gleb Natapov
  9 siblings, 1 reply; 12+ messages in thread
From: Gleb Natapov @ 2013-05-19  8:49 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Marcelo Tossati, Paolo Bonzini, Cornelia Huck, Heiko Carstens,
	Martin Schwidefsky, KVM, linux-s390

Hi Christian,

On Fri, May 17, 2013 at 02:41:30PM +0200, Christian Borntraeger wrote:
> Gleb, Paolo, Marcelo,
> 
> here are some low level changes to kvm on s390 that we have been
> cooking for a while now.
> 
> Patch "s390/pgtable: fix ipte notify bit" will go via Martins
> tree into 3.10, but is included to reduce the amount of merge
> conflicts. 
> 
> Patch "s390: fix gmap_ipte_notifier vs. software dirty pages"
> will also go via Martins tree into 3.10 and it fixes a hang with
> heavy host paging and KVM. This is optional for merging, but
> makes testing on kvm/next easier.
> 
Can I add Martin's ACKs to those two then?

> This series addresses 2 problems:
> - paging of guest prefix page
> - RCU timeouts
> 
> The first problem is basically that we must not have the host pte
> invalid or r/o for the guest prefix pages. (everything else has fully
> nested paging but the prefix page must not cause host faults).
> It is not enough to pin the page, also the pte has to be r/w all the
> time. Mlocking is not enough due to memory compaction, malicious
> unmapping etc.
> We use the existing callback mechanism of the s390 page table functions
> to kick guests out of SIE and hold them until this is done. We cant 
> use the existing kick functions since we must hold a pgste lock while
> we wait for SIE to exit and IPIs might dead lock.
> 
> The second problem is that with KVM on s390 we have seen very long
> RCU stalls due to SIE not exiting on interrupts. Instead of returning
> to SIE, we now force an exit into the kvm module, which then does the
> guest exit/enter magic, fixing rcu.
> 
> The whole bunch is probably too complex for 3.10, so please queue for
> 3.11
> 
> Christian Borntraeger (5):
>   s390/pgtable: fix ipte notify bit
>   s390/kvm: Mark if a cpu is in SIE
>   s390/kvm: Provide a way to prevent reentering SIE
>   s390/kvm: Kick guests out of sie if prefix page host pte is touched
>   s390: fix gmap_ipte_notifier vs. software dirty pages
> 
> Martin Schwidefsky (3):
>   s390/kvm: fix psw rewinding in handle_skey
>   s390/kvm: rename RCP_xxx defines to PGSTE_xxx
>   s390/kvm: avoid automatic sie reentry
> 
>  arch/s390/include/asm/kvm_host.h |  8 +++-
>  arch/s390/include/asm/pgtable.h  | 83 +++++++++++++++++++---------------------
>  arch/s390/kernel/asm-offsets.c   |  3 ++
>  arch/s390/kernel/entry64.S       | 80 ++++++++++++++++++--------------------
>  arch/s390/kvm/intercept.c        | 39 +------------------
>  arch/s390/kvm/kvm-s390.c         | 81 ++++++++++++++++++++++++++++++++++++++-
>  arch/s390/kvm/kvm-s390.h         |  5 +++
>  arch/s390/kvm/priv.c             |  3 +-
>  arch/s390/mm/pgtable.c           |  5 +--
>  9 files changed, 179 insertions(+), 128 deletions(-)
> 
> -- 
> 1.8.1.4

--
			Gleb.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/8] s390/kvm fixes
  2013-05-19  8:49 ` [PATCH 0/8] s390/kvm fixes Gleb Natapov
@ 2013-05-21  6:57   ` Martin Schwidefsky
  0 siblings, 0 replies; 12+ messages in thread
From: Martin Schwidefsky @ 2013-05-21  6:57 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: Christian Borntraeger, Marcelo Tossati, Paolo Bonzini,
	Cornelia Huck, Heiko Carstens, KVM, linux-s390

On Sun, 19 May 2013 11:49:43 +0300
Gleb Natapov <gleb@redhat.com> wrote:

> Hi Christian,
> 
> On Fri, May 17, 2013 at 02:41:30PM +0200, Christian Borntraeger wrote:
> > Gleb, Paolo, Marcelo,
> > 
> > here are some low level changes to kvm on s390 that we have been
> > cooking for a while now.
> > 
> > Patch "s390/pgtable: fix ipte notify bit" will go via Martins
> > tree into 3.10, but is included to reduce the amount of merge
> > conflicts. 
> > 
> > Patch "s390: fix gmap_ipte_notifier vs. software dirty pages"
> > will also go via Martins tree into 3.10 and it fixes a hang with
> > heavy host paging and KVM. This is optional for merging, but
> > makes testing on kvm/next easier.
> > 
> Can I add Martin's ACKs to those two then?

Yes.

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/8] s390/kvm fixes
  2013-05-17 12:41 [PATCH 0/8] s390/kvm fixes Christian Borntraeger
                   ` (8 preceding siblings ...)
  2013-05-19  8:49 ` [PATCH 0/8] s390/kvm fixes Gleb Natapov
@ 2013-05-21  8:56 ` Gleb Natapov
  9 siblings, 0 replies; 12+ messages in thread
From: Gleb Natapov @ 2013-05-21  8:56 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Marcelo Tossati, Paolo Bonzini, Cornelia Huck, Heiko Carstens,
	Martin Schwidefsky, KVM, linux-s390

On Fri, May 17, 2013 at 02:41:30PM +0200, Christian Borntraeger wrote:
> Gleb, Paolo, Marcelo,
> 
Applied, thanks.

--
			Gleb.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2013-05-21  8:56 UTC | newest]

Thread overview: 12+ messages
2013-05-17 12:41 [PATCH 0/8] s390/kvm fixes Christian Borntraeger
2013-05-17 12:41 ` [PATCH 1/8] s390/pgtable: fix ipte notify bit Christian Borntraeger
2013-05-17 12:41 ` [PATCH 2/8] s390/kvm: fix psw rewinding in handle_skey Christian Borntraeger
2013-05-17 12:41 ` [PATCH 3/8] s390/kvm: rename RCP_xxx defines to PGSTE_xxx Christian Borntraeger
2013-05-17 12:41 ` [PATCH 4/8] s390/kvm: Mark if a cpu is in SIE Christian Borntraeger
2013-05-17 12:41 ` [PATCH 5/8] s390/kvm: Provide a way to prevent reentering SIE Christian Borntraeger
2013-05-17 12:41 ` [PATCH 6/8] s390/kvm: Kick guests out of sie if prefix page host pte is touched Christian Borntraeger
2013-05-17 12:41 ` [PATCH 7/8] s390/kvm: avoid automatic sie reentry Christian Borntraeger
2013-05-17 12:41 ` [PATCH 8/8] s390: fix gmap_ipte_notifier vs. software dirty pages Christian Borntraeger
2013-05-19  8:49 ` [PATCH 0/8] s390/kvm fixes Gleb Natapov
2013-05-21  6:57   ` Martin Schwidefsky
2013-05-21  8:56 ` Gleb Natapov
