* [PATCH 0/8] s390/kvm fixes
@ 2013-05-17 12:41 Christian Borntraeger
2013-05-17 12:41 ` [PATCH 1/8] s390/pgtable: fix ipte notify bit Christian Borntraeger
` (9 more replies)
0 siblings, 10 replies; 12+ messages in thread
From: Christian Borntraeger @ 2013-05-17 12:41 UTC (permalink / raw)
To: Marcelo Tossati, Gleb Natapov, Paolo Bonzini
Cc: Cornelia Huck, Heiko Carstens, Martin Schwidefsky, KVM,
linux-s390, Christian Borntraeger
Gleb, Paolo, Marcelo,
here are some low level changes to kvm on s390 that we have been
cooking for a while now.
Patch "s390/pgtable: fix ipte notify bit" will go via Martins
tree into 3.10, but is included to reduce the amount of merge
conflicts.
Patch "s390: fix gmap_ipte_notifier vs. software dirty pages"
will also go via Martins tree into 3.10 and it fixes a hang with
heavy host paging and KVM. This is optional for merging, but
makes testing on kvm/next easier.
This series addresses 2 problems:
- paging of guest prefix page
- RCU timeouts
The first problem is basically that we must not have the host pte
invalid or r/o for the guest prefix pages. (everything else has fully
nested paging but the prefix page must not cause host faults).
It is not enough to pin the page, also the pte has to be r/w all the
time. Mlocking is not enough due to memory compaction, malicious
unmapping etc.
We use the existing callback mechanism of the s390 page table functions
to kick guests out of SIE and hold them until this is done. We cant
use the existing kick functions since we must hold a pgste lock while
we wait for SIE to exit and IPIs might dead lock.
The second problem is that with KVM on s390 we have seen very long
RCU stalls due to SIE not exiting on interrupts. Instead of returning
to SIE, we now force an exit into the kvm module, which then does the
guest exit/enter magic, fixing rcu.
The whole bunch is probably too complex for 3.10, so please queue for
3.11
Christian Borntraeger (5):
s390/pgtable: fix ipte notify bit
s390/kvm: Mark if a cpu is in SIE
s390/kvm: Provide a way to prevent reentering SIE
s390/kvm: Kick guests out of sie if prefix page host pte is touched
s390: fix gmap_ipte_notifier vs. software dirty pages
Martin Schwidefsky (3):
s390/kvm: fix psw rewinding in handle_skey
s390/kvm: rename RCP_xxx defines to PGSTE_xxx
s390/kvm: avoid automatic sie reentry
arch/s390/include/asm/kvm_host.h | 8 +++-
arch/s390/include/asm/pgtable.h | 83 +++++++++++++++++++---------------------
arch/s390/kernel/asm-offsets.c | 3 ++
arch/s390/kernel/entry64.S | 80 ++++++++++++++++++--------------------
arch/s390/kvm/intercept.c | 39 +------------------
arch/s390/kvm/kvm-s390.c | 81 ++++++++++++++++++++++++++++++++++++++-
arch/s390/kvm/kvm-s390.h | 5 +++
arch/s390/kvm/priv.c | 3 +-
arch/s390/mm/pgtable.c | 5 +--
9 files changed, 179 insertions(+), 128 deletions(-)
--
1.8.1.4
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH 1/8] s390/pgtable: fix ipte notify bit
2013-05-17 12:41 [PATCH 0/8] s390/kvm fixes Christian Borntraeger
@ 2013-05-17 12:41 ` Christian Borntraeger
2013-05-17 12:41 ` [PATCH 2/8] s390/kvm: fix psw rewinding in handle_skey Christian Borntraeger
` (8 subsequent siblings)
9 siblings, 0 replies; 12+ messages in thread
From: Christian Borntraeger @ 2013-05-17 12:41 UTC (permalink / raw)
To: Marcelo Tossati, Gleb Natapov, Paolo Bonzini
Cc: Cornelia Huck, Heiko Carstens, Martin Schwidefsky, KVM,
linux-s390, Christian Borntraeger
Dont use the same bit as user referenced.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
arch/s390/include/asm/pgtable.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index 4105b82..0f0de30 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -306,7 +306,7 @@ extern unsigned long MODULES_END;
#define RCP_HC_BIT 0x00200000UL
#define RCP_GR_BIT 0x00040000UL
#define RCP_GC_BIT 0x00020000UL
-#define RCP_IN_BIT 0x00008000UL /* IPTE notify bit */
+#define RCP_IN_BIT 0x00002000UL /* IPTE notify bit */
/* User dirty / referenced bit for KVM's migration feature */
#define KVM_UR_BIT 0x00008000UL
@@ -374,7 +374,7 @@ extern unsigned long MODULES_END;
#define RCP_HC_BIT 0x0020000000000000UL
#define RCP_GR_BIT 0x0004000000000000UL
#define RCP_GC_BIT 0x0002000000000000UL
-#define RCP_IN_BIT 0x0000800000000000UL /* IPTE notify bit */
+#define RCP_IN_BIT 0x0000200000000000UL /* IPTE notify bit */
/* User dirty / referenced bit for KVM's migration feature */
#define KVM_UR_BIT 0x0000800000000000UL
--
1.8.1.4
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH 2/8] s390/kvm: fix psw rewinding in handle_skey
2013-05-17 12:41 [PATCH 0/8] s390/kvm fixes Christian Borntraeger
2013-05-17 12:41 ` [PATCH 1/8] s390/pgtable: fix ipte notify bit Christian Borntraeger
@ 2013-05-17 12:41 ` Christian Borntraeger
2013-05-17 12:41 ` [PATCH 3/8] s390/kvm: rename RCP_xxx defines to PGSTE_xxx Christian Borntraeger
` (7 subsequent siblings)
9 siblings, 0 replies; 12+ messages in thread
From: Christian Borntraeger @ 2013-05-17 12:41 UTC (permalink / raw)
To: Marcelo Tossati, Gleb Natapov, Paolo Bonzini
Cc: Cornelia Huck, Heiko Carstens, Martin Schwidefsky, KVM,
linux-s390, Christian Borntraeger
From: Martin Schwidefsky <schwidefsky@de.ibm.com>
The PSW can wrap if the guest has been running in the 24 bit or 31 bit
addressing mode. Use __rewind_psw to find the correct address.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
arch/s390/kvm/priv.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/s390/kvm/priv.c b/arch/s390/kvm/priv.c
index 6bbd7b5..ecc58a6 100644
--- a/arch/s390/kvm/priv.c
+++ b/arch/s390/kvm/priv.c
@@ -105,7 +105,8 @@ static int handle_store_cpu_address(struct kvm_vcpu *vcpu)
static int handle_skey(struct kvm_vcpu *vcpu)
{
vcpu->stat.instruction_storage_key++;
- vcpu->arch.sie_block->gpsw.addr -= 4;
+ vcpu->arch.sie_block->gpsw.addr =
+ __rewind_psw(vcpu->arch.sie_block->gpsw, 4);
VCPU_EVENT(vcpu, 4, "%s", "retrying storage key operation");
return 0;
}
--
1.8.1.4
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH 3/8] s390/kvm: rename RCP_xxx defines to PGSTE_xxx
2013-05-17 12:41 [PATCH 0/8] s390/kvm fixes Christian Borntraeger
2013-05-17 12:41 ` [PATCH 1/8] s390/pgtable: fix ipte notify bit Christian Borntraeger
2013-05-17 12:41 ` [PATCH 2/8] s390/kvm: fix psw rewinding in handle_skey Christian Borntraeger
@ 2013-05-17 12:41 ` Christian Borntraeger
2013-05-17 12:41 ` [PATCH 4/8] s390/kvm: Mark if a cpu is in SIE Christian Borntraeger
` (6 subsequent siblings)
9 siblings, 0 replies; 12+ messages in thread
From: Christian Borntraeger @ 2013-05-17 12:41 UTC (permalink / raw)
To: Marcelo Tossati, Gleb Natapov, Paolo Bonzini
Cc: Cornelia Huck, Heiko Carstens, Martin Schwidefsky, KVM,
linux-s390, Christian Borntraeger
From: Martin Schwidefsky <schwidefsky@de.ibm.com>
The RCP byte is a part of the PGSTE value, the existing RCP_xxx names
are inaccurate. As the defines describe bits and pieces of the PGSTE,
the names should start with PGSTE_. The KVM_UR_BIT and KVM_UC_BIT are
part of the PGSTE as well, give them better names as well.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
arch/s390/include/asm/pgtable.h | 82 ++++++++++++++++++++---------------------
arch/s390/mm/pgtable.c | 2 +-
2 files changed, 40 insertions(+), 44 deletions(-)
diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index 0f0de30..1fc68d9 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -299,18 +299,16 @@ extern unsigned long MODULES_END;
#define _SEGMENT_ENTRY_EMPTY (_SEGMENT_ENTRY_INV)
/* Page status table bits for virtualization */
-#define RCP_ACC_BITS 0xf0000000UL
-#define RCP_FP_BIT 0x08000000UL
-#define RCP_PCL_BIT 0x00800000UL
-#define RCP_HR_BIT 0x00400000UL
-#define RCP_HC_BIT 0x00200000UL
-#define RCP_GR_BIT 0x00040000UL
-#define RCP_GC_BIT 0x00020000UL
-#define RCP_IN_BIT 0x00002000UL /* IPTE notify bit */
-
-/* User dirty / referenced bit for KVM's migration feature */
-#define KVM_UR_BIT 0x00008000UL
-#define KVM_UC_BIT 0x00004000UL
+#define PGSTE_ACC_BITS 0xf0000000UL
+#define PGSTE_FP_BIT 0x08000000UL
+#define PGSTE_PCL_BIT 0x00800000UL
+#define PGSTE_HR_BIT 0x00400000UL
+#define PGSTE_HC_BIT 0x00200000UL
+#define PGSTE_GR_BIT 0x00040000UL
+#define PGSTE_GC_BIT 0x00020000UL
+#define PGSTE_UR_BIT 0x00008000UL
+#define PGSTE_UC_BIT 0x00004000UL /* user dirty (migration) */
+#define PGSTE_IN_BIT 0x00002000UL /* IPTE notify bit */
#else /* CONFIG_64BIT */
@@ -367,18 +365,16 @@ extern unsigned long MODULES_END;
| _SEGMENT_ENTRY_SPLIT | _SEGMENT_ENTRY_CO)
/* Page status table bits for virtualization */
-#define RCP_ACC_BITS 0xf000000000000000UL
-#define RCP_FP_BIT 0x0800000000000000UL
-#define RCP_PCL_BIT 0x0080000000000000UL
-#define RCP_HR_BIT 0x0040000000000000UL
-#define RCP_HC_BIT 0x0020000000000000UL
-#define RCP_GR_BIT 0x0004000000000000UL
-#define RCP_GC_BIT 0x0002000000000000UL
-#define RCP_IN_BIT 0x0000200000000000UL /* IPTE notify bit */
-
-/* User dirty / referenced bit for KVM's migration feature */
-#define KVM_UR_BIT 0x0000800000000000UL
-#define KVM_UC_BIT 0x0000400000000000UL
+#define PGSTE_ACC_BITS 0xf000000000000000UL
+#define PGSTE_FP_BIT 0x0800000000000000UL
+#define PGSTE_PCL_BIT 0x0080000000000000UL
+#define PGSTE_HR_BIT 0x0040000000000000UL
+#define PGSTE_HC_BIT 0x0020000000000000UL
+#define PGSTE_GR_BIT 0x0004000000000000UL
+#define PGSTE_GC_BIT 0x0002000000000000UL
+#define PGSTE_UR_BIT 0x0000800000000000UL
+#define PGSTE_UC_BIT 0x0000400000000000UL /* user dirty (migration) */
+#define PGSTE_IN_BIT 0x0000200000000000UL /* IPTE notify bit */
#endif /* CONFIG_64BIT */
@@ -618,8 +614,8 @@ static inline pgste_t pgste_get_lock(pte_t *ptep)
asm(
" lg %0,%2\n"
"0: lgr %1,%0\n"
- " nihh %0,0xff7f\n" /* clear RCP_PCL_BIT in old */
- " oihh %1,0x0080\n" /* set RCP_PCL_BIT in new */
+ " nihh %0,0xff7f\n" /* clear PCL bit in old */
+ " oihh %1,0x0080\n" /* set PCL bit in new */
" csg %0,%1,%2\n"
" jl 0b\n"
: "=&d" (old), "=&d" (new), "=Q" (ptep[PTRS_PER_PTE])
@@ -632,7 +628,7 @@ static inline void pgste_set_unlock(pte_t *ptep, pgste_t pgste)
{
#ifdef CONFIG_PGSTE
asm(
- " nihh %1,0xff7f\n" /* clear RCP_PCL_BIT */
+ " nihh %1,0xff7f\n" /* clear PCL bit */
" stg %1,%0\n"
: "=Q" (ptep[PTRS_PER_PTE])
: "d" (pgste_val(pgste)), "Q" (ptep[PTRS_PER_PTE]) : "cc");
@@ -657,14 +653,14 @@ static inline pgste_t pgste_update_all(pte_t *ptep, pgste_t pgste)
else if (bits)
page_reset_referenced(address);
/* Transfer page changed & referenced bit to guest bits in pgste */
- pgste_val(pgste) |= bits << 48; /* RCP_GR_BIT & RCP_GC_BIT */
+ pgste_val(pgste) |= bits << 48; /* GR bit & GC bit */
/* Get host changed & referenced bits from pgste */
- bits |= (pgste_val(pgste) & (RCP_HR_BIT | RCP_HC_BIT)) >> 52;
+ bits |= (pgste_val(pgste) & (PGSTE_HR_BIT | PGSTE_HC_BIT)) >> 52;
/* Transfer page changed & referenced bit to kvm user bits */
- pgste_val(pgste) |= bits << 45; /* KVM_UR_BIT & KVM_UC_BIT */
+ pgste_val(pgste) |= bits << 45; /* PGSTE_UR_BIT & PGSTE_UC_BIT */
/* Clear relevant host bits in pgste. */
- pgste_val(pgste) &= ~(RCP_HR_BIT | RCP_HC_BIT);
- pgste_val(pgste) &= ~(RCP_ACC_BITS | RCP_FP_BIT);
+ pgste_val(pgste) &= ~(PGSTE_HR_BIT | PGSTE_HC_BIT);
+ pgste_val(pgste) &= ~(PGSTE_ACC_BITS | PGSTE_FP_BIT);
/* Copy page access key and fetch protection bit to pgste */
pgste_val(pgste) |=
(unsigned long) (skey & (_PAGE_ACC_BITS | _PAGE_FP_BIT)) << 56;
@@ -685,15 +681,15 @@ static inline pgste_t pgste_update_young(pte_t *ptep, pgste_t pgste)
/* Get referenced bit from storage key */
young = page_reset_referenced(pte_val(*ptep) & PAGE_MASK);
if (young)
- pgste_val(pgste) |= RCP_GR_BIT;
+ pgste_val(pgste) |= PGSTE_GR_BIT;
/* Get host referenced bit from pgste */
- if (pgste_val(pgste) & RCP_HR_BIT) {
- pgste_val(pgste) &= ~RCP_HR_BIT;
+ if (pgste_val(pgste) & PGSTE_HR_BIT) {
+ pgste_val(pgste) &= ~PGSTE_HR_BIT;
young = 1;
}
/* Transfer referenced bit to kvm user bits and pte */
if (young) {
- pgste_val(pgste) |= KVM_UR_BIT;
+ pgste_val(pgste) |= PGSTE_UR_BIT;
pte_val(*ptep) |= _PAGE_SWR;
}
#endif
@@ -712,7 +708,7 @@ static inline void pgste_set_key(pte_t *ptep, pgste_t pgste, pte_t entry)
okey = nkey = page_get_storage_key(address);
nkey &= ~(_PAGE_ACC_BITS | _PAGE_FP_BIT);
/* Set page access key and fetch protection bit from pgste */
- nkey |= (pgste_val(pgste) & (RCP_ACC_BITS | RCP_FP_BIT)) >> 56;
+ nkey |= (pgste_val(pgste) & (PGSTE_ACC_BITS | PGSTE_FP_BIT)) >> 56;
if (okey != nkey)
page_set_storage_key(address, nkey, 0);
#endif
@@ -801,8 +797,8 @@ static inline pgste_t pgste_ipte_notify(struct mm_struct *mm,
pte_t *ptep, pgste_t pgste)
{
#ifdef CONFIG_PGSTE
- if (pgste_val(pgste) & RCP_IN_BIT) {
- pgste_val(pgste) &= ~RCP_IN_BIT;
+ if (pgste_val(pgste) & PGSTE_IN_BIT) {
+ pgste_val(pgste) &= ~PGSTE_IN_BIT;
gmap_do_ipte_notify(mm, addr, ptep);
}
#endif
@@ -970,8 +966,8 @@ static inline int ptep_test_and_clear_user_dirty(struct mm_struct *mm,
if (mm_has_pgste(mm)) {
pgste = pgste_get_lock(ptep);
pgste = pgste_update_all(ptep, pgste);
- dirty = !!(pgste_val(pgste) & KVM_UC_BIT);
- pgste_val(pgste) &= ~KVM_UC_BIT;
+ dirty = !!(pgste_val(pgste) & PGSTE_UC_BIT);
+ pgste_val(pgste) &= ~PGSTE_UC_BIT;
pgste_set_unlock(ptep, pgste);
return dirty;
}
@@ -990,8 +986,8 @@ static inline int ptep_test_and_clear_user_young(struct mm_struct *mm,
if (mm_has_pgste(mm)) {
pgste = pgste_get_lock(ptep);
pgste = pgste_update_young(ptep, pgste);
- young = !!(pgste_val(pgste) & KVM_UR_BIT);
- pgste_val(pgste) &= ~KVM_UR_BIT;
+ young = !!(pgste_val(pgste) & PGSTE_UR_BIT);
+ pgste_val(pgste) &= ~PGSTE_UR_BIT;
pgste_set_unlock(ptep, pgste);
}
return young;
diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
index 7805ddc..5ca7568 100644
--- a/arch/s390/mm/pgtable.c
+++ b/arch/s390/mm/pgtable.c
@@ -690,7 +690,7 @@ int gmap_ipte_notify(struct gmap *gmap, unsigned long start, unsigned long len)
entry = *ptep;
if ((pte_val(entry) & (_PAGE_INVALID | _PAGE_RO)) == 0) {
pgste = pgste_get_lock(ptep);
- pgste_val(pgste) |= RCP_IN_BIT;
+ pgste_val(pgste) |= PGSTE_IN_BIT;
pgste_set_unlock(ptep, pgste);
start += PAGE_SIZE;
len -= PAGE_SIZE;
--
1.8.1.4
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH 4/8] s390/kvm: Mark if a cpu is in SIE
2013-05-17 12:41 [PATCH 0/8] s390/kvm fixes Christian Borntraeger
` (2 preceding siblings ...)
2013-05-17 12:41 ` [PATCH 3/8] s390/kvm: rename RCP_xxx defines to PGSTE_xxx Christian Borntraeger
@ 2013-05-17 12:41 ` Christian Borntraeger
2013-05-17 12:41 ` [PATCH 5/8] s390/kvm: Provide a way to prevent reentering SIE Christian Borntraeger
` (5 subsequent siblings)
9 siblings, 0 replies; 12+ messages in thread
From: Christian Borntraeger @ 2013-05-17 12:41 UTC (permalink / raw)
To: Marcelo Tossati, Gleb Natapov, Paolo Bonzini
Cc: Cornelia Huck, Heiko Carstens, Martin Schwidefsky, KVM,
linux-s390, Christian Borntraeger
Lets track in a private bit if the sie control block is active.
We want to track this as closely as possible, so we also have to
instrument the interrupt and program check handler. Lets use the
existing HANDLE_SIE_INTERCEPT macro.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
---
arch/s390/include/asm/kvm_host.h | 5 ++++-
arch/s390/kernel/asm-offsets.c | 2 ++
arch/s390/kernel/entry64.S | 10 +++++++---
3 files changed, 13 insertions(+), 4 deletions(-)
diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 16bd5d1..962b92e 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -68,7 +68,10 @@ struct sca_block {
struct kvm_s390_sie_block {
atomic_t cpuflags; /* 0x0000 */
__u32 prefix; /* 0x0004 */
- __u8 reserved8[32]; /* 0x0008 */
+ __u8 reserved08[4]; /* 0x0008 */
+#define PROG_IN_SIE (1<<0)
+ __u32 prog0c; /* 0x000c */
+ __u8 reserved10[24]; /* 0x0010 */
__u64 cputm; /* 0x0028 */
__u64 ckc; /* 0x0030 */
__u64 epoch; /* 0x0038 */
diff --git a/arch/s390/kernel/asm-offsets.c b/arch/s390/kernel/asm-offsets.c
index 7a82f9f..6456bbe 100644
--- a/arch/s390/kernel/asm-offsets.c
+++ b/arch/s390/kernel/asm-offsets.c
@@ -7,6 +7,7 @@
#define ASM_OFFSETS_C
#include <linux/kbuild.h>
+#include <linux/kvm_host.h>
#include <linux/sched.h>
#include <asm/cputime.h>
#include <asm/vdso.h>
@@ -161,6 +162,7 @@ int main(void)
DEFINE(__LC_PGM_TDB, offsetof(struct _lowcore, pgm_tdb));
DEFINE(__THREAD_trap_tdb, offsetof(struct task_struct, thread.trap_tdb));
DEFINE(__GMAP_ASCE, offsetof(struct gmap, asce));
+ DEFINE(__SIE_PROG0C, offsetof(struct kvm_s390_sie_block, prog0c));
#endif /* CONFIG_32BIT */
return 0;
}
diff --git a/arch/s390/kernel/entry64.S b/arch/s390/kernel/entry64.S
index 4c17eec..c2e81b4 100644
--- a/arch/s390/kernel/entry64.S
+++ b/arch/s390/kernel/entry64.S
@@ -84,7 +84,7 @@ _TIF_EXIT_SIE = (_TIF_SIGPENDING | _TIF_NEED_RESCHED | _TIF_MCCK_PENDING)
.macro HANDLE_SIE_INTERCEPT scratch,pgmcheck
#if defined(CONFIG_KVM) || defined(CONFIG_KVM_MODULE)
tmhh %r8,0x0001 # interrupting from user ?
- jnz .+42
+ jnz .+52
lgr \scratch,%r9
slg \scratch,BASED(.Lsie_loop)
clg \scratch,BASED(.Lsie_length)
@@ -92,12 +92,14 @@ _TIF_EXIT_SIE = (_TIF_SIGPENDING | _TIF_NEED_RESCHED | _TIF_MCCK_PENDING)
# Some program interrupts are suppressing (e.g. protection).
# We must also check the instruction after SIE in that case.
# do_protection_exception will rewind to rewind_pad
- jh .+22
+ jh .+32
.else
- jhe .+22
+ jhe .+32
.endif
lg %r9,BASED(.Lsie_loop)
LPP BASED(.Lhost_id) # set host id
+ lg %r14,__SF_EMPTY(%r15) # get control block pointer
+ ni __SIE_PROG0C+3(%r14),0xfe # no longer in SIE
#endif
.endm
@@ -956,10 +958,12 @@ sie_loop:
lctlg %c1,%c1,__GMAP_ASCE(%r14) # load primary asce
sie_gmap:
lg %r14,__SF_EMPTY(%r15) # get control block pointer
+ oi __SIE_PROG0C+3(%r14),1 # we are in SIE now
LPP __SF_EMPTY(%r15) # set guest id
sie 0(%r14)
sie_done:
LPP __SF_EMPTY+16(%r15) # set host id
+ ni __SIE_PROG0C+3(%r14),0xfe # no longer in SIE
lg %r14,__LC_THREAD_INFO # pointer thread_info struct
sie_exit:
lctlg %c1,%c1,__LC_USER_ASCE # load primary asce
--
1.8.1.4
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH 5/8] s390/kvm: Provide a way to prevent reentering SIE
2013-05-17 12:41 [PATCH 0/8] s390/kvm fixes Christian Borntraeger
` (3 preceding siblings ...)
2013-05-17 12:41 ` [PATCH 4/8] s390/kvm: Mark if a cpu is in SIE Christian Borntraeger
@ 2013-05-17 12:41 ` Christian Borntraeger
2013-05-17 12:41 ` [PATCH 6/8] s390/kvm: Kick guests out of sie if prefix page host pte is touched Christian Borntraeger
` (4 subsequent siblings)
9 siblings, 0 replies; 12+ messages in thread
From: Christian Borntraeger @ 2013-05-17 12:41 UTC (permalink / raw)
To: Marcelo Tossati, Gleb Natapov, Paolo Bonzini
Cc: Cornelia Huck, Heiko Carstens, Martin Schwidefsky, KVM,
linux-s390, Christian Borntraeger
Lets provide functions to prevent KVM from reentering SIE and
to kick cpus out of SIE. We cannot use the common kvm_vcpu_kick code,
since we need to kick out guests in places that hold architecture
specific locks (e.g. pgste lock) which might be necessary on the
other cpus - so no waiting possible.
So lets provide a bit in a private field of the sie control block
that acts as a gate keeper, after we claimed we are in SIE.
Please note that we do not reuse prog0c, since we want to access
that bit without atomic ops.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
---
arch/s390/include/asm/kvm_host.h | 5 ++++-
arch/s390/kernel/asm-offsets.c | 1 +
arch/s390/kernel/entry64.S | 4 +++-
arch/s390/kvm/kvm-s390.c | 28 ++++++++++++++++++++++++++++
arch/s390/kvm/kvm-s390.h | 4 ++++
5 files changed, 40 insertions(+), 2 deletions(-)
diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 962b92e..9a809f9 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -71,7 +71,10 @@ struct kvm_s390_sie_block {
__u8 reserved08[4]; /* 0x0008 */
#define PROG_IN_SIE (1<<0)
__u32 prog0c; /* 0x000c */
- __u8 reserved10[24]; /* 0x0010 */
+ __u8 reserved10[16]; /* 0x0010 */
+#define PROG_BLOCK_SIE 0x00000001
+ atomic_t prog20; /* 0x0020 */
+ __u8 reserved24[4]; /* 0x0024 */
__u64 cputm; /* 0x0028 */
__u64 ckc; /* 0x0030 */
__u64 epoch; /* 0x0038 */
diff --git a/arch/s390/kernel/asm-offsets.c b/arch/s390/kernel/asm-offsets.c
index 6456bbe..78db633 100644
--- a/arch/s390/kernel/asm-offsets.c
+++ b/arch/s390/kernel/asm-offsets.c
@@ -163,6 +163,7 @@ int main(void)
DEFINE(__THREAD_trap_tdb, offsetof(struct task_struct, thread.trap_tdb));
DEFINE(__GMAP_ASCE, offsetof(struct gmap, asce));
DEFINE(__SIE_PROG0C, offsetof(struct kvm_s390_sie_block, prog0c));
+ DEFINE(__SIE_PROG20, offsetof(struct kvm_s390_sie_block, prog20));
#endif /* CONFIG_32BIT */
return 0;
}
diff --git a/arch/s390/kernel/entry64.S b/arch/s390/kernel/entry64.S
index c2e81b4..c7daeef 100644
--- a/arch/s390/kernel/entry64.S
+++ b/arch/s390/kernel/entry64.S
@@ -958,7 +958,9 @@ sie_loop:
lctlg %c1,%c1,__GMAP_ASCE(%r14) # load primary asce
sie_gmap:
lg %r14,__SF_EMPTY(%r15) # get control block pointer
- oi __SIE_PROG0C+3(%r14),1 # we are in SIE now
+ oi __SIE_PROG0C+3(%r14),1 # we are going into SIE now
+ tm __SIE_PROG20+3(%r14),1 # last exit...
+ jnz sie_done
LPP __SF_EMPTY(%r15) # set guest id
sie 0(%r14)
sie_done:
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index c1c7c68..ef4ef21 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -454,6 +454,34 @@ int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)
return 0;
}
+void s390_vcpu_block(struct kvm_vcpu *vcpu)
+{
+ atomic_set_mask(PROG_BLOCK_SIE, &vcpu->arch.sie_block->prog20);
+}
+
+void s390_vcpu_unblock(struct kvm_vcpu *vcpu)
+{
+ atomic_clear_mask(PROG_BLOCK_SIE, &vcpu->arch.sie_block->prog20);
+}
+
+/*
+ * Kick a guest cpu out of SIE and wait until SIE is not running.
+ * If the CPU is not running (e.g. waiting as idle) the function will
+ * return immediately. */
+void exit_sie(struct kvm_vcpu *vcpu)
+{
+ atomic_set_mask(CPUSTAT_STOP_INT, &vcpu->arch.sie_block->cpuflags);
+ while (vcpu->arch.sie_block->prog0c & PROG_IN_SIE)
+ cpu_relax();
+}
+
+/* Kick a guest cpu out of SIE and prevent SIE-reentry */
+void exit_sie_sync(struct kvm_vcpu *vcpu)
+{
+ s390_vcpu_block(vcpu);
+ exit_sie(vcpu);
+}
+
int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)
{
/* kvm common code refers to this, but never calls it */
diff --git a/arch/s390/kvm/kvm-s390.h b/arch/s390/kvm/kvm-s390.h
index efc14f6..7a8abfd 100644
--- a/arch/s390/kvm/kvm-s390.h
+++ b/arch/s390/kvm/kvm-s390.h
@@ -133,6 +133,10 @@ int kvm_s390_handle_sigp(struct kvm_vcpu *vcpu);
/* implemented in kvm-s390.c */
int kvm_s390_vcpu_store_status(struct kvm_vcpu *vcpu,
unsigned long addr);
+void s390_vcpu_block(struct kvm_vcpu *vcpu);
+void s390_vcpu_unblock(struct kvm_vcpu *vcpu);
+void exit_sie(struct kvm_vcpu *vcpu);
+void exit_sie_sync(struct kvm_vcpu *vcpu);
/* implemented in diag.c */
int kvm_s390_handle_diag(struct kvm_vcpu *vcpu);
--
1.8.1.4
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH 6/8] s390/kvm: Kick guests out of sie if prefix page host pte is touched
2013-05-17 12:41 [PATCH 0/8] s390/kvm fixes Christian Borntraeger
` (4 preceding siblings ...)
2013-05-17 12:41 ` [PATCH 5/8] s390/kvm: Provide a way to prevent reentering SIE Christian Borntraeger
@ 2013-05-17 12:41 ` Christian Borntraeger
2013-05-17 12:41 ` [PATCH 7/8] s390/kvm: avoid automatic sie reentry Christian Borntraeger
` (3 subsequent siblings)
9 siblings, 0 replies; 12+ messages in thread
From: Christian Borntraeger @ 2013-05-17 12:41 UTC (permalink / raw)
To: Marcelo Tossati, Gleb Natapov, Paolo Bonzini
Cc: Cornelia Huck, Heiko Carstens, Martin Schwidefsky, KVM,
linux-s390, Christian Borntraeger
The guest prefix pages must be mapped writeable all the time
while SIE is running, otherwise the guest might see random
behaviour. (pinned at the pte level) Turns out that mlocking is
not enough, the page table entry (not the page) might change or
become r/o. This patch uses the gmap notifiers to kick guest
cpus out of SIE.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
---
arch/s390/include/asm/pgtable.h | 1 +
arch/s390/kvm/intercept.c | 39 ++------------------------------
arch/s390/kvm/kvm-s390.c | 49 +++++++++++++++++++++++++++++++++++++++++
arch/s390/kvm/kvm-s390.h | 1 +
4 files changed, 53 insertions(+), 37 deletions(-)
diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index 1fc68d9..1d0ad7d 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -739,6 +739,7 @@ struct gmap {
struct mm_struct *mm;
unsigned long *table;
unsigned long asce;
+ void *private;
struct list_head crst_list;
};
diff --git a/arch/s390/kvm/intercept.c b/arch/s390/kvm/intercept.c
index b7d1b2e..f0b8be0 100644
--- a/arch/s390/kvm/intercept.c
+++ b/arch/s390/kvm/intercept.c
@@ -174,47 +174,12 @@ static int handle_stop(struct kvm_vcpu *vcpu)
static int handle_validity(struct kvm_vcpu *vcpu)
{
- unsigned long vmaddr;
int viwhy = vcpu->arch.sie_block->ipb >> 16;
- int rc;
vcpu->stat.exit_validity++;
trace_kvm_s390_intercept_validity(vcpu, viwhy);
- if (viwhy == 0x37) {
- vmaddr = gmap_fault(vcpu->arch.sie_block->prefix,
- vcpu->arch.gmap);
- if (IS_ERR_VALUE(vmaddr)) {
- rc = -EOPNOTSUPP;
- goto out;
- }
- rc = fault_in_pages_writeable((char __user *) vmaddr,
- PAGE_SIZE);
- if (rc) {
- /* user will receive sigsegv, exit to user */
- rc = -EOPNOTSUPP;
- goto out;
- }
- vmaddr = gmap_fault(vcpu->arch.sie_block->prefix + PAGE_SIZE,
- vcpu->arch.gmap);
- if (IS_ERR_VALUE(vmaddr)) {
- rc = -EOPNOTSUPP;
- goto out;
- }
- rc = fault_in_pages_writeable((char __user *) vmaddr,
- PAGE_SIZE);
- if (rc) {
- /* user will receive sigsegv, exit to user */
- rc = -EOPNOTSUPP;
- goto out;
- }
- } else
- rc = -EOPNOTSUPP;
-
-out:
- if (rc)
- VCPU_EVENT(vcpu, 2, "unhandled validity intercept code %d",
- viwhy);
- return rc;
+ WARN_ONCE(true, "kvm: unhandled validity intercept 0x%x\n", viwhy);
+ return -EOPNOTSUPP;
}
static int handle_instruction(struct kvm_vcpu *vcpu)
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index ef4ef21..08227c1 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -84,6 +84,7 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
};
static unsigned long long *facilities;
+static struct gmap_notifier gmap_notifier;
/* Section: not file related */
int kvm_arch_hardware_enable(void *garbage)
@@ -96,13 +97,18 @@ void kvm_arch_hardware_disable(void *garbage)
{
}
+static void kvm_gmap_notifier(struct gmap *gmap, unsigned long address);
+
int kvm_arch_hardware_setup(void)
{
+ gmap_notifier.notifier_call = kvm_gmap_notifier;
+ gmap_register_ipte_notifier(&gmap_notifier);
return 0;
}
void kvm_arch_hardware_unsetup(void)
{
+ gmap_unregister_ipte_notifier(&gmap_notifier);
}
void kvm_arch_check_processor_compat(void *rtn)
@@ -239,6 +245,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
kvm->arch.gmap = gmap_alloc(current->mm);
if (!kvm->arch.gmap)
goto out_nogmap;
+ kvm->arch.gmap->private = kvm;
}
kvm->arch.css_support = 0;
@@ -309,6 +316,7 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
vcpu->arch.gmap = gmap_alloc(current->mm);
if (!vcpu->arch.gmap)
return -ENOMEM;
+ vcpu->arch.gmap->private = vcpu->kvm;
return 0;
}
@@ -482,6 +490,22 @@ void exit_sie_sync(struct kvm_vcpu *vcpu)
exit_sie(vcpu);
}
+static void kvm_gmap_notifier(struct gmap *gmap, unsigned long address)
+{
+ int i;
+ struct kvm *kvm = gmap->private;
+ struct kvm_vcpu *vcpu;
+
+ kvm_for_each_vcpu(i, vcpu, kvm) {
+ /* match against both prefix pages */
+ if (vcpu->arch.sie_block->prefix == (address & ~0x1000UL)) {
+ VCPU_EVENT(vcpu, 2, "gmap notifier for %lx", address);
+ kvm_make_request(KVM_REQ_MMU_RELOAD, vcpu);
+ exit_sie_sync(vcpu);
+ }
+ }
+}
+
int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)
{
/* kvm common code refers to this, but never calls it */
@@ -634,6 +658,27 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
return -EINVAL; /* not implemented yet */
}
+static int kvm_s390_handle_requests(struct kvm_vcpu *vcpu)
+{
+ /*
+ * We use MMU_RELOAD just to re-arm the ipte notifier for the
+ * guest prefix page. gmap_ipte_notify will wait on the ptl lock.
+ * This ensures that the ipte instruction for this request has
+ * already finished. We might race against a second unmapper that
+ * wants to set the blocking bit. Lets just retry the request loop.
+ */
+ while (kvm_check_request(KVM_REQ_MMU_RELOAD, vcpu)) {
+ int rc;
+ rc = gmap_ipte_notify(vcpu->arch.gmap,
+ vcpu->arch.sie_block->prefix,
+ PAGE_SIZE * 2);
+ if (rc)
+ return rc;
+ s390_vcpu_unblock(vcpu);
+ }
+ return 0;
+}
+
static int __vcpu_run(struct kvm_vcpu *vcpu)
{
int rc;
@@ -649,6 +694,10 @@ static int __vcpu_run(struct kvm_vcpu *vcpu)
if (!kvm_is_ucontrol(vcpu->kvm))
kvm_s390_deliver_pending_interrupts(vcpu);
+ rc = kvm_s390_handle_requests(vcpu);
+ if (rc)
+ return rc;
+
vcpu->arch.sie_block->icptcode = 0;
preempt_disable();
kvm_guest_enter();
diff --git a/arch/s390/kvm/kvm-s390.h b/arch/s390/kvm/kvm-s390.h
index 7a8abfd..269b523 100644
--- a/arch/s390/kvm/kvm-s390.h
+++ b/arch/s390/kvm/kvm-s390.h
@@ -63,6 +63,7 @@ static inline void kvm_s390_set_prefix(struct kvm_vcpu *vcpu, u32 prefix)
{
vcpu->arch.sie_block->prefix = prefix & 0x7fffe000u;
vcpu->arch.sie_block->ihcpu = 0xffff;
+ kvm_make_request(KVM_REQ_MMU_RELOAD, vcpu);
}
static inline u64 kvm_s390_get_base_disp_s(struct kvm_vcpu *vcpu)
--
1.8.1.4
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH 7/8] s390/kvm: avoid automatic sie reentry
2013-05-17 12:41 [PATCH 0/8] s390/kvm fixes Christian Borntraeger
` (5 preceding siblings ...)
2013-05-17 12:41 ` [PATCH 6/8] s390/kvm: Kick guests out of sie if prefix page host pte is touched Christian Borntraeger
@ 2013-05-17 12:41 ` Christian Borntraeger
2013-05-17 12:41 ` [PATCH 8/8] s390: fix gmap_ipte_notifier vs. software dirty pages Christian Borntraeger
` (2 subsequent siblings)
9 siblings, 0 replies; 12+ messages in thread
From: Christian Borntraeger @ 2013-05-17 12:41 UTC (permalink / raw)
To: Marcelo Tossati, Gleb Natapov, Paolo Bonzini
Cc: Cornelia Huck, Heiko Carstens, Martin Schwidefsky, KVM,
linux-s390, Christian Borntraeger
From: Martin Schwidefsky <schwidefsky@de.ibm.com>
Do not automatically restart the sie instruction in entry64.S after an
interrupt, return to the caller with a reason code instead. That allows
to deal with RCU and other conditions in C code.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
arch/s390/kernel/entry64.S | 76 ++++++++++++++++++++--------------------------
arch/s390/kvm/kvm-s390.c | 4 ++-
2 files changed, 36 insertions(+), 44 deletions(-)
diff --git a/arch/s390/kernel/entry64.S b/arch/s390/kernel/entry64.S
index c7daeef..51d99ac 100644
--- a/arch/s390/kernel/entry64.S
+++ b/arch/s390/kernel/entry64.S
@@ -47,7 +47,6 @@ _TIF_WORK_INT = (_TIF_SIGPENDING | _TIF_NOTIFY_RESUME | _TIF_NEED_RESCHED | \
_TIF_MCCK_PENDING)
_TIF_TRACE = (_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | _TIF_SECCOMP | \
_TIF_SYSCALL_TRACEPOINT)
-_TIF_EXIT_SIE = (_TIF_SIGPENDING | _TIF_NEED_RESCHED | _TIF_MCCK_PENDING)
#define BASED(name) name-system_call(%r13)
@@ -81,25 +80,27 @@ _TIF_EXIT_SIE = (_TIF_SIGPENDING | _TIF_NEED_RESCHED | _TIF_MCCK_PENDING)
#endif
.endm
- .macro HANDLE_SIE_INTERCEPT scratch,pgmcheck
+ .macro HANDLE_SIE_INTERCEPT scratch,reason
#if defined(CONFIG_KVM) || defined(CONFIG_KVM_MODULE)
tmhh %r8,0x0001 # interrupting from user ?
- jnz .+52
+ jnz .+62
lgr \scratch,%r9
- slg \scratch,BASED(.Lsie_loop)
- clg \scratch,BASED(.Lsie_length)
- .if \pgmcheck
+ slg \scratch,BASED(.Lsie_critical)
+ clg \scratch,BASED(.Lsie_critical_length)
+ .if \reason==1
# Some program interrupts are suppressing (e.g. protection).
# We must also check the instruction after SIE in that case.
# do_protection_exception will rewind to rewind_pad
- jh .+32
+ jh .+42
.else
- jhe .+32
+ jhe .+42
.endif
- lg %r9,BASED(.Lsie_loop)
- LPP BASED(.Lhost_id) # set host id
- lg %r14,__SF_EMPTY(%r15) # get control block pointer
+ lg %r14,__SF_EMPTY(%r15) # get control block pointer
+ LPP __SF_EMPTY+16(%r15) # set host id
ni __SIE_PROG0C+3(%r14),0xfe # no longer in SIE
+ lctlg %c1,%c1,__LC_USER_ASCE # load primary asce
+ larl %r9,sie_exit # skip forward to sie_exit
+ mvi __SF_EMPTY+31(%r15),\reason # set exit reason
#endif
.endm
@@ -452,7 +453,7 @@ ENTRY(io_int_handler)
lg %r12,__LC_THREAD_INFO
larl %r13,system_call
lmg %r8,%r9,__LC_IO_OLD_PSW
- HANDLE_SIE_INTERCEPT %r14,0
+ HANDLE_SIE_INTERCEPT %r14,2
SWITCH_ASYNC __LC_SAVE_AREA_ASYNC,__LC_ASYNC_STACK,STACK_SHIFT
tmhh %r8,0x0001 # interrupting from user?
jz io_skip
@@ -597,7 +598,7 @@ ENTRY(ext_int_handler)
lg %r12,__LC_THREAD_INFO
larl %r13,system_call
lmg %r8,%r9,__LC_EXT_OLD_PSW
- HANDLE_SIE_INTERCEPT %r14,0
+ HANDLE_SIE_INTERCEPT %r14,3
SWITCH_ASYNC __LC_SAVE_AREA_ASYNC,__LC_ASYNC_STACK,STACK_SHIFT
tmhh %r8,0x0001 # interrupting from user ?
jz ext_skip
@@ -645,7 +646,7 @@ ENTRY(mcck_int_handler)
lg %r12,__LC_THREAD_INFO
larl %r13,system_call
lmg %r8,%r9,__LC_MCK_OLD_PSW
- HANDLE_SIE_INTERCEPT %r14,0
+ HANDLE_SIE_INTERCEPT %r14,4
tm __LC_MCCK_CODE,0x80 # system damage?
jo mcck_panic # yes -> rest of mcck code invalid
lghi %r14,__LC_CPU_TIMER_SAVE_AREA
@@ -939,19 +940,8 @@ ENTRY(sie64a)
stmg %r6,%r14,__SF_GPRS(%r15) # save kernel registers
stg %r2,__SF_EMPTY(%r15) # save control block pointer
stg %r3,__SF_EMPTY+8(%r15) # save guest register save area
- xc __SF_EMPTY+16(8,%r15),__SF_EMPTY+16(%r15) # host id == 0
+ xc __SF_EMPTY+16(16,%r15),__SF_EMPTY+16(%r15) # host id & reason
lmg %r0,%r13,0(%r3) # load guest gprs 0-13
-# some program checks are suppressing. C code (e.g. do_protection_exception)
-# will rewind the PSW by the ILC, which is 4 bytes in case of SIE. Other
-# instructions in the sie_loop should not cause program interrupts. So
-# lets use a nop (47 00 00 00) as a landing pad.
-# See also HANDLE_SIE_INTERCEPT
-rewind_pad:
- nop 0
-sie_loop:
- lg %r14,__LC_THREAD_INFO # pointer thread_info struct
- tm __TI_flags+7(%r14),_TIF_EXIT_SIE
- jnz sie_exit
lg %r14,__LC_GMAP # get gmap pointer
ltgr %r14,%r14
jz sie_gmap
@@ -966,33 +956,33 @@ sie_gmap:
sie_done:
LPP __SF_EMPTY+16(%r15) # set host id
ni __SIE_PROG0C+3(%r14),0xfe # no longer in SIE
- lg %r14,__LC_THREAD_INFO # pointer thread_info struct
-sie_exit:
lctlg %c1,%c1,__LC_USER_ASCE # load primary asce
+# some program checks are suppressing. C code (e.g. do_protection_exception)
+# will rewind the PSW by the ILC, which is 4 bytes in case of SIE. Other
+# instructions beween sie64a and sie_done should not cause program
+# interrupts. So lets use a nop (47 00 00 00) as a landing pad.
+# See also HANDLE_SIE_INTERCEPT
+rewind_pad:
+ nop 0
+sie_exit:
lg %r14,__SF_EMPTY+8(%r15) # load guest register save area
stmg %r0,%r13,0(%r14) # save guest gprs 0-13
lmg %r6,%r14,__SF_GPRS(%r15) # restore kernel registers
- lghi %r2,0
+ lg %r2,__SF_EMPTY+24(%r15) # return exit reason code
br %r14
sie_fault:
- lctlg %c1,%c1,__LC_USER_ASCE # load primary asce
- lg %r14,__LC_THREAD_INFO # pointer thread_info struct
- lg %r14,__SF_EMPTY+8(%r15) # load guest register save area
- stmg %r0,%r13,0(%r14) # save guest gprs 0-13
- lmg %r6,%r14,__SF_GPRS(%r15) # restore kernel registers
- lghi %r2,-EFAULT
- br %r14
+ lghi %r14,-EFAULT
+ stg %r14,__SF_EMPTY+24(%r15) # set exit reason code
+ j sie_exit
.align 8
-.Lsie_loop:
- .quad sie_loop
-.Lsie_length:
- .quad sie_done - sie_loop
-.Lhost_id:
- .quad 0
+.Lsie_critical:
+ .quad sie_gmap
+.Lsie_critical_length:
+ .quad sie_done - sie_gmap
EX_TABLE(rewind_pad,sie_fault)
- EX_TABLE(sie_loop,sie_fault)
+ EX_TABLE(sie_exit,sie_fault)
#endif
.section .rodata, "a"
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 08227c1..93444c4 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -707,7 +707,9 @@ static int __vcpu_run(struct kvm_vcpu *vcpu)
trace_kvm_s390_sie_enter(vcpu,
atomic_read(&vcpu->arch.sie_block->cpuflags));
rc = sie64a(vcpu->arch.sie_block, vcpu->run->s.regs.gprs);
- if (rc) {
+ if (rc > 0)
+ rc = 0;
+ if (rc < 0) {
if (kvm_is_ucontrol(vcpu->kvm)) {
rc = SIE_INTERCEPT_UCONTROL;
} else {
--
1.8.1.4
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH 8/8] s390: fix gmap_ipte_notifier vs. software dirty pages
2013-05-17 12:41 [PATCH 0/8] s390/kvm fixes Christian Borntraeger
` (6 preceding siblings ...)
2013-05-17 12:41 ` [PATCH 7/8] s390/kvm: avoid automatic sie reentry Christian Borntraeger
@ 2013-05-17 12:41 ` Christian Borntraeger
2013-05-19 8:49 ` [PATCH 0/8] s390/kvm fixes Gleb Natapov
2013-05-21 8:56 ` Gleb Natapov
9 siblings, 0 replies; 12+ messages in thread
From: Christian Borntraeger @ 2013-05-17 12:41 UTC (permalink / raw)
To: Marcelo Tossati, Gleb Natapov, Paolo Bonzini
Cc: Cornelia Huck, Heiko Carstens, Martin Schwidefsky, KVM,
linux-s390, Christian Borntraeger
On heavy paging load some guest cpus started to loop in gmap_ipte_notify.
This was visible as stalled cpus inside the guest. The gmap_ipte_notifier
tries to map a user page and then made sure that the pte is valid and
writable. Turns out that with the software change bit tracking the pte
can become read-only (and only software writable) if the page is clean.
Since we loop in this code, the page would stay clean and, therefore,
be never writable again.
Let us just use fixup_user_fault, that guarantees to call handle_mm_fault.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
arch/s390/mm/pgtable.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
index 5ca7568..1e0c438 100644
--- a/arch/s390/mm/pgtable.c
+++ b/arch/s390/mm/pgtable.c
@@ -677,8 +677,7 @@ int gmap_ipte_notify(struct gmap *gmap, unsigned long start, unsigned long len)
break;
}
/* Get the page mapped */
- if (get_user_pages(current, gmap->mm, addr, 1, 1, 0,
- NULL, NULL) != 1) {
+ if (fixup_user_fault(current, gmap->mm, addr, FAULT_FLAG_WRITE)) {
rc = -EFAULT;
break;
}
--
1.8.1.4
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH 0/8] s390/kvm fixes
2013-05-17 12:41 [PATCH 0/8] s390/kvm fixes Christian Borntraeger
` (7 preceding siblings ...)
2013-05-17 12:41 ` [PATCH 8/8] s390: fix gmap_ipte_notifier vs. software dirty pages Christian Borntraeger
@ 2013-05-19 8:49 ` Gleb Natapov
2013-05-21 6:57 ` Martin Schwidefsky
2013-05-21 8:56 ` Gleb Natapov
9 siblings, 1 reply; 12+ messages in thread
From: Gleb Natapov @ 2013-05-19 8:49 UTC (permalink / raw)
To: Christian Borntraeger
Cc: Marcelo Tossati, Paolo Bonzini, Cornelia Huck, Heiko Carstens,
Martin Schwidefsky, KVM, linux-s390
Hi Christian,
On Fri, May 17, 2013 at 02:41:30PM +0200, Christian Borntraeger wrote:
> Gleb, Paolo, Marcelo,
>
> here are some low level changes to kvm on s390 that we have been
> cooking for a while now.
>
> Patch "s390/pgtable: fix ipte notify bit" will go via Martins
> tree into 3.10, but is included to reduce the amount of merge
> conflicts.
>
> Patch "s390: fix gmap_ipte_notifier vs. software dirty pages"
> will also go via Martins tree into 3.10 and it fixes a hang with
> heavy host paging and KVM. This is optional for merging, but
> makes testing on kvm/next easier.
>
Can I add Martin's ACKs to those two then?
> This series addresses 2 problems:
> - paging of guest prefix page
> - RCU timeouts
>
> The first problem is basically that we must not have the host pte
> invalid or r/o for the guest prefix pages. (everything else has fully
> nested paging but the prefix page must not cause host faults).
> It is not enough to pin the page, also the pte has to be r/w all the
> time. Mlocking is not enough due to memory compaction, malicious
> unmapping etc.
> We use the existing callback mechanism of the s390 page table functions
> to kick guests out of SIE and hold them until this is done. We cant
> use the existing kick functions since we must hold a pgste lock while
> we wait for SIE to exit and IPIs might dead lock.
>
> The second problem is that with KVM on s390 we have seen very long
> RCU stalls due to SIE not exiting on interrupts. Instead of returning
> to SIE, we now force an exit into the kvm module, which then does the
> guest exit/enter magic, fixing rcu.
>
> The whole bunch is probably too complex for 3.10, so please queue for
> 3.11
>
> Christian Borntraeger (5):
> s390/pgtable: fix ipte notify bit
> s390/kvm: Mark if a cpu is in SIE
> s390/kvm: Provide a way to prevent reentering SIE
> s390/kvm: Kick guests out of sie if prefix page host pte is touched
> s390: fix gmap_ipte_notifier vs. software dirty pages
>
> Martin Schwidefsky (3):
> s390/kvm: fix psw rewinding in handle_skey
> s390/kvm: rename RCP_xxx defines to PGSTE_xxx
> s390/kvm: avoid automatic sie reentry
>
> arch/s390/include/asm/kvm_host.h | 8 +++-
> arch/s390/include/asm/pgtable.h | 83 +++++++++++++++++++---------------------
> arch/s390/kernel/asm-offsets.c | 3 ++
> arch/s390/kernel/entry64.S | 80 ++++++++++++++++++--------------------
> arch/s390/kvm/intercept.c | 39 +------------------
> arch/s390/kvm/kvm-s390.c | 81 ++++++++++++++++++++++++++++++++++++++-
> arch/s390/kvm/kvm-s390.h | 5 +++
> arch/s390/kvm/priv.c | 3 +-
> arch/s390/mm/pgtable.c | 5 +--
> 9 files changed, 179 insertions(+), 128 deletions(-)
>
> --
> 1.8.1.4
--
Gleb.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 0/8] s390/kvm fixes
2013-05-19 8:49 ` [PATCH 0/8] s390/kvm fixes Gleb Natapov
@ 2013-05-21 6:57 ` Martin Schwidefsky
0 siblings, 0 replies; 12+ messages in thread
From: Martin Schwidefsky @ 2013-05-21 6:57 UTC (permalink / raw)
To: Gleb Natapov
Cc: Christian Borntraeger, Marcelo Tossati, Paolo Bonzini,
Cornelia Huck, Heiko Carstens, KVM, linux-s390
On Sun, 19 May 2013 11:49:43 +0300
Gleb Natapov <gleb@redhat.com> wrote:
> Hi Christian,
>
> On Fri, May 17, 2013 at 02:41:30PM +0200, Christian Borntraeger wrote:
> > Gleb, Paolo, Marcelo,
> >
> > here are some low level changes to kvm on s390 that we have been
> > cooking for a while now.
> >
> > Patch "s390/pgtable: fix ipte notify bit" will go via Martins
> > tree into 3.10, but is included to reduce the amount of merge
> > conflicts.
> >
> > Patch "s390: fix gmap_ipte_notifier vs. software dirty pages"
> > will also go via Martins tree into 3.10 and it fixes a hang with
> > heavy host paging and KVM. This is optional for merging, but
> > makes testing on kvm/next easier.
> >
> Can I add Martin's ACKs to those two then?
Yes.
--
blue skies,
Martin.
"Reality continues to ruin my life." - Calvin.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 0/8] s390/kvm fixes
2013-05-17 12:41 [PATCH 0/8] s390/kvm fixes Christian Borntraeger
` (8 preceding siblings ...)
2013-05-19 8:49 ` [PATCH 0/8] s390/kvm fixes Gleb Natapov
@ 2013-05-21 8:56 ` Gleb Natapov
9 siblings, 0 replies; 12+ messages in thread
From: Gleb Natapov @ 2013-05-21 8:56 UTC (permalink / raw)
To: Christian Borntraeger
Cc: Marcelo Tossati, Paolo Bonzini, Cornelia Huck, Heiko Carstens,
Martin Schwidefsky, KVM, linux-s390
On Fri, May 17, 2013 at 02:41:30PM +0200, Christian Borntraeger wrote:
> Gleb, Paolo, Marcelo,
>
Applied, thanks.
> here are some low level changes to kvm on s390 that we have been
> cooking for a while now.
>
> Patch "s390/pgtable: fix ipte notify bit" will go via Martins
> tree into 3.10, but is included to reduce the amount of merge
> conflicts.
>
> Patch "s390: fix gmap_ipte_notifier vs. software dirty pages"
> will also go via Martins tree into 3.10 and it fixes a hang with
> heavy host paging and KVM. This is optional for merging, but
> makes testing on kvm/next easier.
>
> This series addresses 2 problems:
> - paging of guest prefix page
> - RCU timeouts
>
> The first problem is basically that we must not have the host pte
> invalid or r/o for the guest prefix pages. (everything else has fully
> nested paging but the prefix page must not cause host faults).
> It is not enough to pin the page, also the pte has to be r/w all the
> time. Mlocking is not enough due to memory compaction, malicious
> unmapping etc.
> We use the existing callback mechanism of the s390 page table functions
> to kick guests out of SIE and hold them until this is done. We cant
> use the existing kick functions since we must hold a pgste lock while
> we wait for SIE to exit and IPIs might dead lock.
>
> The second problem is that with KVM on s390 we have seen very long
> RCU stalls due to SIE not exiting on interrupts. Instead of returning
> to SIE, we now force an exit into the kvm module, which then does the
> guest exit/enter magic, fixing rcu.
>
> The whole bunch is probably too complex for 3.10, so please queue for
> 3.11
>
> Christian Borntraeger (5):
> s390/pgtable: fix ipte notify bit
> s390/kvm: Mark if a cpu is in SIE
> s390/kvm: Provide a way to prevent reentering SIE
> s390/kvm: Kick guests out of sie if prefix page host pte is touched
> s390: fix gmap_ipte_notifier vs. software dirty pages
>
> Martin Schwidefsky (3):
> s390/kvm: fix psw rewinding in handle_skey
> s390/kvm: rename RCP_xxx defines to PGSTE_xxx
> s390/kvm: avoid automatic sie reentry
>
> arch/s390/include/asm/kvm_host.h | 8 +++-
> arch/s390/include/asm/pgtable.h | 83 +++++++++++++++++++---------------------
> arch/s390/kernel/asm-offsets.c | 3 ++
> arch/s390/kernel/entry64.S | 80 ++++++++++++++++++--------------------
> arch/s390/kvm/intercept.c | 39 +------------------
> arch/s390/kvm/kvm-s390.c | 81 ++++++++++++++++++++++++++++++++++++++-
> arch/s390/kvm/kvm-s390.h | 5 +++
> arch/s390/kvm/priv.c | 3 +-
> arch/s390/mm/pgtable.c | 5 +--
> 9 files changed, 179 insertions(+), 128 deletions(-)
>
> --
> 1.8.1.4
--
Gleb.
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2013-05-21 8:56 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-05-17 12:41 [PATCH 0/8] s390/kvm fixes Christian Borntraeger
2013-05-17 12:41 ` [PATCH 1/8] s390/pgtable: fix ipte notify bit Christian Borntraeger
2013-05-17 12:41 ` [PATCH 2/8] s390/kvm: fix psw rewinding in handle_skey Christian Borntraeger
2013-05-17 12:41 ` [PATCH 3/8] s390/kvm: rename RCP_xxx defines to PGSTE_xxx Christian Borntraeger
2013-05-17 12:41 ` [PATCH 4/8] s390/kvm: Mark if a cpu is in SIE Christian Borntraeger
2013-05-17 12:41 ` [PATCH 5/8] s390/kvm: Provide a way to prevent reentering SIE Christian Borntraeger
2013-05-17 12:41 ` [PATCH 6/8] s390/kvm: Kick guests out of sie if prefix page host pte is touched Christian Borntraeger
2013-05-17 12:41 ` [PATCH 7/8] s390/kvm: avoid automatic sie reentry Christian Borntraeger
2013-05-17 12:41 ` [PATCH 8/8] s390: fix gmap_ipte_notifier vs. software dirty pages Christian Borntraeger
2013-05-19 8:49 ` [PATCH 0/8] s390/kvm fixes Gleb Natapov
2013-05-21 6:57 ` Martin Schwidefsky
2013-05-21 8:56 ` Gleb Natapov
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.