* [PATCH v3 0/2] s390/kvm: fix MVPG when in VSIE
@ 2021-02-09 15:43 Claudio Imbrenda
  2021-02-09 15:43 ` [PATCH v3 1/2] s390/kvm: extend kvm_s390_shadow_fault to return entry pointer Claudio Imbrenda
  2021-02-09 15:43 ` [PATCH v3 2/2] s390/kvm: VSIE: correctly handle MVPG when in VSIE Claudio Imbrenda
  0 siblings, 2 replies; 8+ messages in thread
From: Claudio Imbrenda @ 2021-02-09 15:43 UTC (permalink / raw)
  To: linux-kernel; +Cc: borntraeger, frankja, david, cohuck, kvm, linux-s390

The current handling of the MVPG instruction when executed in a nested
guest is wrong, and can lead to the nested guest hanging.

This patchset makes the behaviour more architecturally correct and fixes
the observed hangs.

v2->v3
* improved some comments
* improved some variable and parameter names for increased readability
* fixed missing handling of page faults in the MVPG handler
* small readability improvements

v1->v2
* complete rewrite

Claudio Imbrenda (2):
  s390/kvm: extend kvm_s390_shadow_fault to return entry pointer
  s390/kvm: VSIE: correctly handle MVPG when in VSIE

 arch/s390/kvm/gaccess.c |  30 ++++++++++--
 arch/s390/kvm/gaccess.h |   5 +-
 arch/s390/kvm/vsie.c    | 101 ++++++++++++++++++++++++++++++++++++----
 3 files changed, 121 insertions(+), 15 deletions(-)

-- 
2.26.2



* [PATCH v3 1/2] s390/kvm: extend kvm_s390_shadow_fault to return entry pointer
  2021-02-09 15:43 [PATCH v3 0/2] s390/kvm: fix MVPG when in VSIE Claudio Imbrenda
@ 2021-02-09 15:43 ` Claudio Imbrenda
  2021-02-11  8:35   ` Janosch Frank
  2021-02-11  9:18   ` David Hildenbrand
  2021-02-09 15:43 ` [PATCH v3 2/2] s390/kvm: VSIE: correctly handle MVPG when in VSIE Claudio Imbrenda
  1 sibling, 2 replies; 8+ messages in thread
From: Claudio Imbrenda @ 2021-02-09 15:43 UTC (permalink / raw)
  To: linux-kernel; +Cc: borntraeger, frankja, david, cohuck, kvm, linux-s390, stable

Extend kvm_s390_shadow_fault to return the pointer to the valid leaf
DAT table entry, or to the invalid entry.

Also return some flags in the lower bits of the address:
DAT_PROT: indicates that DAT protection applies because of the
          protection bit in the segment (or, if EDAT, region) tables
NOT_PTE: indicates that the address of the DAT table entry returned
         does not refer to a PTE, but to a segment or region table.
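
As an illustration only (not part of this patch; vcpu, sg and saddr stand
in for whatever the caller has at hand, and the local names are made up),
a caller interested in the flags could consume the new output roughly
like this:

	unsigned long entry = 0;
	int rc;

	rc = kvm_s390_shadow_fault(vcpu, sg, saddr, &entry);
	if (rc > 0) {
		/* entry points to the first invalid DAT table entry */
		bool prot = entry & DAT_PROT;	/* DAT protection applies */
		bool not_pte = entry & NOT_PTE;	/* an STE/RTE, not a PTE */
		unsigned long addr = entry & ~7UL;	/* address of the entry */

		/* ... use rc, prot, not_pte and addr as needed ... */
	}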

Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Cc: stable@vger.kernel.org
---
 arch/s390/kvm/gaccess.c | 30 +++++++++++++++++++++++++-----
 arch/s390/kvm/gaccess.h |  5 ++++-
 arch/s390/kvm/vsie.c    |  8 ++++----
 3 files changed, 33 insertions(+), 10 deletions(-)

diff --git a/arch/s390/kvm/gaccess.c b/arch/s390/kvm/gaccess.c
index 6d6b57059493..e0ab83f051d2 100644
--- a/arch/s390/kvm/gaccess.c
+++ b/arch/s390/kvm/gaccess.c
@@ -976,7 +976,9 @@ int kvm_s390_check_low_addr_prot_real(struct kvm_vcpu *vcpu, unsigned long gra)
  * kvm_s390_shadow_tables - walk the guest page table and create shadow tables
  * @sg: pointer to the shadow guest address space structure
  * @saddr: faulting address in the shadow gmap
- * @pgt: pointer to the page table address result
+ * @pgt: pointer to the beginning of the page table for the given address if
+ *       successful (return value 0), or to the first invalid DAT entry in
+ *       case of exceptions (return value > 0)
  * @fake: pgt references contiguous guest memory block, not a pgtable
  */
 static int kvm_s390_shadow_tables(struct gmap *sg, unsigned long saddr,
@@ -1034,6 +1036,7 @@ static int kvm_s390_shadow_tables(struct gmap *sg, unsigned long saddr,
 			rfte.val = ptr;
 			goto shadow_r2t;
 		}
+		*pgt = ptr + vaddr.rfx * 8;
 		rc = gmap_read_table(parent, ptr + vaddr.rfx * 8, &rfte.val);
 		if (rc)
 			return rc;
@@ -1060,6 +1063,7 @@ static int kvm_s390_shadow_tables(struct gmap *sg, unsigned long saddr,
 			rste.val = ptr;
 			goto shadow_r3t;
 		}
+		*pgt = ptr + vaddr.rsx * 8;
 		rc = gmap_read_table(parent, ptr + vaddr.rsx * 8, &rste.val);
 		if (rc)
 			return rc;
@@ -1087,6 +1091,7 @@ static int kvm_s390_shadow_tables(struct gmap *sg, unsigned long saddr,
 			rtte.val = ptr;
 			goto shadow_sgt;
 		}
+		*pgt = ptr + vaddr.rtx * 8;
 		rc = gmap_read_table(parent, ptr + vaddr.rtx * 8, &rtte.val);
 		if (rc)
 			return rc;
@@ -1123,6 +1128,7 @@ static int kvm_s390_shadow_tables(struct gmap *sg, unsigned long saddr,
 			ste.val = ptr;
 			goto shadow_pgt;
 		}
+		*pgt = ptr + vaddr.sx * 8;
 		rc = gmap_read_table(parent, ptr + vaddr.sx * 8, &ste.val);
 		if (rc)
 			return rc;
@@ -1157,6 +1163,8 @@ static int kvm_s390_shadow_tables(struct gmap *sg, unsigned long saddr,
  * @vcpu: virtual cpu
  * @sg: pointer to the shadow guest address space structure
  * @saddr: faulting address in the shadow gmap
+ * @datptr: will contain the address of the faulting DAT table entry, or of
+ *          the valid leaf, plus some flags
  *
  * Returns: - 0 if the shadow fault was successfully resolved
  *	    - > 0 (pgm exception code) on exceptions while faulting
@@ -1165,11 +1173,11 @@ static int kvm_s390_shadow_tables(struct gmap *sg, unsigned long saddr,
  *	    - -ENOMEM if out of memory
  */
 int kvm_s390_shadow_fault(struct kvm_vcpu *vcpu, struct gmap *sg,
-			  unsigned long saddr)
+			  unsigned long saddr, unsigned long *datptr)
 {
 	union vaddress vaddr;
 	union page_table_entry pte;
-	unsigned long pgt;
+	unsigned long pgt = 0;
 	int dat_protection, fake;
 	int rc;
 
@@ -1191,8 +1199,20 @@ int kvm_s390_shadow_fault(struct kvm_vcpu *vcpu, struct gmap *sg,
 		pte.val = pgt + vaddr.px * PAGE_SIZE;
 		goto shadow_page;
 	}
-	if (!rc)
-		rc = gmap_read_table(sg->parent, pgt + vaddr.px * 8, &pte.val);
+
+	switch (rc) {
+	case PGM_SEGMENT_TRANSLATION:
+	case PGM_REGION_THIRD_TRANS:
+	case PGM_REGION_SECOND_TRANS:
+	case PGM_REGION_FIRST_TRANS:
+		pgt |= NOT_PTE;
+		break;
+	case 0:
+		pgt += vaddr.px * 8;
+		rc = gmap_read_table(sg->parent, pgt, &pte.val);
+	}
+	if (datptr)
+		*datptr = pgt | dat_protection * DAT_PROT;
 	if (!rc && pte.i)
 		rc = PGM_PAGE_TRANSLATION;
 	if (!rc && pte.z)
diff --git a/arch/s390/kvm/gaccess.h b/arch/s390/kvm/gaccess.h
index f4c51756c462..fec26bbb17ba 100644
--- a/arch/s390/kvm/gaccess.h
+++ b/arch/s390/kvm/gaccess.h
@@ -359,7 +359,10 @@ void ipte_unlock(struct kvm_vcpu *vcpu);
 int ipte_lock_held(struct kvm_vcpu *vcpu);
 int kvm_s390_check_low_addr_prot_real(struct kvm_vcpu *vcpu, unsigned long gra);
 
+#define DAT_PROT 2
+#define NOT_PTE 4
+
 int kvm_s390_shadow_fault(struct kvm_vcpu *vcpu, struct gmap *shadow,
-			  unsigned long saddr);
+			  unsigned long saddr, unsigned long *datptr);
 
 #endif /* __KVM_S390_GACCESS_H */
diff --git a/arch/s390/kvm/vsie.c b/arch/s390/kvm/vsie.c
index c5d0a58b2c29..7db022141db3 100644
--- a/arch/s390/kvm/vsie.c
+++ b/arch/s390/kvm/vsie.c
@@ -619,10 +619,10 @@ static int map_prefix(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
 	/* with mso/msl, the prefix lies at offset *mso* */
 	prefix += scb_s->mso;
 
-	rc = kvm_s390_shadow_fault(vcpu, vsie_page->gmap, prefix);
+	rc = kvm_s390_shadow_fault(vcpu, vsie_page->gmap, prefix, NULL);
 	if (!rc && (scb_s->ecb & ECB_TE))
 		rc = kvm_s390_shadow_fault(vcpu, vsie_page->gmap,
-					   prefix + PAGE_SIZE);
+					   prefix + PAGE_SIZE, NULL);
 	/*
 	 * We don't have to mprotect, we will be called for all unshadows.
 	 * SIE will detect if protection applies and trigger a validity.
@@ -913,7 +913,7 @@ static int handle_fault(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
 				    current->thread.gmap_addr, 1);
 
 	rc = kvm_s390_shadow_fault(vcpu, vsie_page->gmap,
-				   current->thread.gmap_addr);
+				   current->thread.gmap_addr, NULL);
 	if (rc > 0) {
 		rc = inject_fault(vcpu, rc,
 				  current->thread.gmap_addr,
@@ -935,7 +935,7 @@ static void handle_last_fault(struct kvm_vcpu *vcpu,
 {
 	if (vsie_page->fault_addr)
 		kvm_s390_shadow_fault(vcpu, vsie_page->gmap,
-				      vsie_page->fault_addr);
+				      vsie_page->fault_addr, NULL);
 	vsie_page->fault_addr = 0;
 }
 
-- 
2.26.2



* [PATCH v3 2/2] s390/kvm: VSIE: correctly handle MVPG when in VSIE
  2021-02-09 15:43 [PATCH v3 0/2] s390/kvm: fix MVPG when in VSIE Claudio Imbrenda
  2021-02-09 15:43 ` [PATCH v3 1/2] s390/kvm: extend kvm_s390_shadow_fault to return entry pointer Claudio Imbrenda
@ 2021-02-09 15:43 ` Claudio Imbrenda
  2021-02-11  8:46   ` Janosch Frank
  1 sibling, 1 reply; 8+ messages in thread
From: Claudio Imbrenda @ 2021-02-09 15:43 UTC (permalink / raw)
  To: linux-kernel; +Cc: borntraeger, frankja, david, cohuck, kvm, linux-s390, stable

Correctly handle the MVPG instruction when issued by a VSIE guest.

If shadowing of one of the two operand pages fails, either retry,
forward the partial-execution intercept to the guest, or inject the
appropriate exception, depending on the type of fault and on whether
EDAT is available.

Fixes: a3508fbe9dc6d ("KVM: s390: vsie: initial support for nested virtualization")
Cc: stable@vger.kernel.org
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
---
 arch/s390/kvm/vsie.c | 93 +++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 88 insertions(+), 5 deletions(-)

diff --git a/arch/s390/kvm/vsie.c b/arch/s390/kvm/vsie.c
index 7db022141db3..7dbb0d93c25f 100644
--- a/arch/s390/kvm/vsie.c
+++ b/arch/s390/kvm/vsie.c
@@ -416,11 +416,6 @@ static void unshadow_scb(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
 		memcpy((void *)((u64)scb_o + 0xc0),
 		       (void *)((u64)scb_s + 0xc0), 0xf0 - 0xc0);
 		break;
-	case ICPT_PARTEXEC:
-		/* MVPG only */
-		memcpy((void *)((u64)scb_o + 0xc0),
-		       (void *)((u64)scb_s + 0xc0), 0xd0 - 0xc0);
-		break;
 	}
 
 	if (scb_s->ihcpu != 0xffffU)
@@ -982,6 +977,90 @@ static int handle_stfle(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
 	return 0;
 }
 
+static u64 vsie_get_register(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page, u8 reg)
+{
+	reg &= 0xf;
+	switch (reg) {
+	case 15:
+		return vsie_page->scb_s.gg15;
+	case 14:
+		return vsie_page->scb_s.gg14;
+	default:
+		return vcpu->run->s.regs.gprs[reg];
+	}
+}
+
+static int vsie_handle_mvpg(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
+{
+	struct kvm_s390_sie_block *scb_s = &vsie_page->scb_s;
+	unsigned long pei_dest, pei_src, src, dest, mask = PAGE_MASK;
+	u64 *pei_block = &vsie_page->scb_o->mcic;
+	int edat, rc_dest, rc_src;
+	union ctlreg0 cr0;
+
+	cr0.val = vcpu->arch.sie_block->gcr[0];
+	edat = cr0.edat && test_kvm_facility(vcpu->kvm, 8);
+	if (psw_bits(scb_s->gpsw).eaba == PSW_BITS_AMODE_24BIT)
+		mask = 0xfff000;
+	else if (psw_bits(scb_s->gpsw).eaba == PSW_BITS_AMODE_31BIT)
+		mask = 0x7ffff000;
+
+	dest = vsie_get_register(vcpu, vsie_page, scb_s->ipb >> 16) & mask;
+	src = vsie_get_register(vcpu, vsie_page, scb_s->ipb >> 20) & mask;
+
+	rc_dest = kvm_s390_shadow_fault(vcpu, vsie_page->gmap, dest, &pei_dest);
+	rc_src = kvm_s390_shadow_fault(vcpu, vsie_page->gmap, src, &pei_src);
+	/*
+	 * Either everything went well, or something non-critical went wrong
+	 * e.g. because of a race. In either case, simply retry.
+	 */
+	if (rc_dest == -EAGAIN || rc_src == -EAGAIN || (!rc_dest && !rc_src)) {
+		retry_vsie_icpt(vsie_page);
+		return -EAGAIN;
+	}
+	/* Something more serious went wrong, propagate the error */
+	if (rc_dest < 0)
+		return rc_dest;
+	if (rc_src < 0)
+		return rc_src;
+
+	/* The only possible suppressing exception: just deliver it */
+	if (rc_dest == PGM_TRANSLATION_SPEC || rc_src == PGM_TRANSLATION_SPEC) {
+		clear_vsie_icpt(vsie_page);
+		rc_dest = kvm_s390_inject_program_int(vcpu, PGM_TRANSLATION_SPEC);
+		WARN_ON_ONCE(rc_dest);
+		return 1;
+	}
+
+	/*
+	 * Forward the PEI intercept to the guest if it was a page fault, or
+	 * also for segment and region table faults if EDAT applies.
+	 */
+	if (edat) {
+		rc_dest = rc_dest == PGM_ASCE_TYPE ? rc_dest : 0;
+		rc_src = rc_src == PGM_ASCE_TYPE ? rc_src : 0;
+	} else {
+		rc_dest = rc_dest != PGM_PAGE_TRANSLATION ? rc_dest : 0;
+		rc_src = rc_src != PGM_PAGE_TRANSLATION ? rc_src : 0;
+	}
+	if (!rc_dest && !rc_src) {
+		pei_block[0] = pei_dest;
+		pei_block[1] = pei_src;
+		return 1;
+	}
+
+	retry_vsie_icpt(vsie_page);
+
+	/*
+	 * The host has edat, and the guest does not, or it was an ASCE type
+	 * exception. The host needs to inject the appropriate DAT interrupts
+	 * into the guest.
+	 */
+	if (rc_dest)
+		return inject_fault(vcpu, rc_dest, dest, 1);
+	return inject_fault(vcpu, rc_src, src, 0);
+}
+
 /*
  * Run the vsie on a shadow scb and a shadow gmap, without any further
  * sanity checks, handling SIE faults.
@@ -1068,6 +1147,10 @@ static int do_vsie_run(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
 		if ((scb_s->ipa & 0xf000) != 0xf000)
 			scb_s->ipa += 0x1000;
 		break;
+	case ICPT_PARTEXEC:
+		if (scb_s->ipa == 0xb254)
+			rc = vsie_handle_mvpg(vcpu, vsie_page);
+		break;
 	}
 	return rc;
 }
-- 
2.26.2



* Re: [PATCH v3 1/2] s390/kvm: extend kvm_s390_shadow_fault to return entry pointer
  2021-02-09 15:43 ` [PATCH v3 1/2] s390/kvm: extend kvm_s390_shadow_fault to return entry pointer Claudio Imbrenda
@ 2021-02-11  8:35   ` Janosch Frank
  2021-02-11  9:18   ` David Hildenbrand
  1 sibling, 0 replies; 8+ messages in thread
From: Janosch Frank @ 2021-02-11  8:35 UTC (permalink / raw)
  To: Claudio Imbrenda, linux-kernel
  Cc: borntraeger, david, cohuck, kvm, linux-s390, stable

On 2/9/21 4:43 PM, Claudio Imbrenda wrote:
> Extend kvm_s390_shadow_fault to return the pointer to the valid leaf
> DAT table entry, or to the invalid entry.
> 
> Also return some flags in the lower bits of the address:
> DAT_PROT: indicates that DAT protection applies because of the
>           protection bit in the segment (or, if EDAT, region) tables
> NOT_PTE: indicates that the address of the DAT table entry returned
>          does not refer to a PTE, but to a segment or region table.
> 
> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
> Cc: stable@vger.kernel.org

Reviewed-by: Janosch Frank <frankja@de.ibm.com>

Small nit below.

> ---
>  arch/s390/kvm/gaccess.c | 30 +++++++++++++++++++++++++-----
>  arch/s390/kvm/gaccess.h |  5 ++++-
>  arch/s390/kvm/vsie.c    |  8 ++++----
>  3 files changed, 33 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/s390/kvm/gaccess.c b/arch/s390/kvm/gaccess.c
> index 6d6b57059493..e0ab83f051d2 100644
> --- a/arch/s390/kvm/gaccess.c
> +++ b/arch/s390/kvm/gaccess.c
> @@ -976,7 +976,9 @@ int kvm_s390_check_low_addr_prot_real(struct kvm_vcpu *vcpu, unsigned long gra)
>   * kvm_s390_shadow_tables - walk the guest page table and create shadow tables
>   * @sg: pointer to the shadow guest address space structure
>   * @saddr: faulting address in the shadow gmap
> - * @pgt: pointer to the page table address result
> + * @pgt: pointer to the beginning of the page table for the given address if
> + *       successful (return value 0), or to the first invalid DAT entry in
> + *       case of exceptions (return value > 0)
>   * @fake: pgt references contiguous guest memory block, not a pgtable
>   */
>  static int kvm_s390_shadow_tables(struct gmap *sg, unsigned long saddr,
> @@ -1034,6 +1036,7 @@ static int kvm_s390_shadow_tables(struct gmap *sg, unsigned long saddr,
>  			rfte.val = ptr;
>  			goto shadow_r2t;
>  		}
> +		*pgt = ptr + vaddr.rfx * 8;
>  		rc = gmap_read_table(parent, ptr + vaddr.rfx * 8, &rfte.val);
>  		if (rc)
>  			return rc;
> @@ -1060,6 +1063,7 @@ static int kvm_s390_shadow_tables(struct gmap *sg, unsigned long saddr,
>  			rste.val = ptr;
>  			goto shadow_r3t;
>  		}
> +		*pgt = ptr + vaddr.rsx * 8;
>  		rc = gmap_read_table(parent, ptr + vaddr.rsx * 8, &rste.val);
>  		if (rc)
>  			return rc;
> @@ -1087,6 +1091,7 @@ static int kvm_s390_shadow_tables(struct gmap *sg, unsigned long saddr,
>  			rtte.val = ptr;
>  			goto shadow_sgt;
>  		}
> +		*pgt = ptr + vaddr.rtx * 8;
>  		rc = gmap_read_table(parent, ptr + vaddr.rtx * 8, &rtte.val);
>  		if (rc)
>  			return rc;
> @@ -1123,6 +1128,7 @@ static int kvm_s390_shadow_tables(struct gmap *sg, unsigned long saddr,
>  			ste.val = ptr;
>  			goto shadow_pgt;
>  		}
> +		*pgt = ptr + vaddr.sx * 8;
>  		rc = gmap_read_table(parent, ptr + vaddr.sx * 8, &ste.val);
>  		if (rc)
>  			return rc;
> @@ -1157,6 +1163,8 @@ static int kvm_s390_shadow_tables(struct gmap *sg, unsigned long saddr,
>   * @vcpu: virtual cpu
>   * @sg: pointer to the shadow guest address space structure
>   * @saddr: faulting address in the shadow gmap
> + * @datptr: will contain the address of the faulting DAT table entry, or of
> + *          the valid leaf, plus some flags
>   *
>   * Returns: - 0 if the shadow fault was successfully resolved
>   *	    - > 0 (pgm exception code) on exceptions while faulting
> @@ -1165,11 +1173,11 @@ static int kvm_s390_shadow_tables(struct gmap *sg, unsigned long saddr,
>   *	    - -ENOMEM if out of memory
>   */
>  int kvm_s390_shadow_fault(struct kvm_vcpu *vcpu, struct gmap *sg,
> -			  unsigned long saddr)
> +			  unsigned long saddr, unsigned long *datptr)
>  {
>  	union vaddress vaddr;
>  	union page_table_entry pte;
> -	unsigned long pgt;
> +	unsigned long pgt = 0;
>  	int dat_protection, fake;
>  	int rc;
>  
> @@ -1191,8 +1199,20 @@ int kvm_s390_shadow_fault(struct kvm_vcpu *vcpu, struct gmap *sg,
>  		pte.val = pgt + vaddr.px * PAGE_SIZE;
>  		goto shadow_page;
>  	}
> -	if (!rc)
> -		rc = gmap_read_table(sg->parent, pgt + vaddr.px * 8, &pte.val);
> +
> +	switch (rc) {
> +	case PGM_SEGMENT_TRANSLATION:
> +	case PGM_REGION_THIRD_TRANS:
> +	case PGM_REGION_SECOND_TRANS:
> +	case PGM_REGION_FIRST_TRANS:
> +		pgt |= NOT_PTE;
> +		break;
> +	case 0:
> +		pgt += vaddr.px * 8;
> +		rc = gmap_read_table(sg->parent, pgt, &pte.val);
> +	}
> +	if (datptr)
> +		*datptr = pgt | dat_protection * DAT_PROT;
>  	if (!rc && pte.i)
>  		rc = PGM_PAGE_TRANSLATION;
>  	if (!rc && pte.z)
> diff --git a/arch/s390/kvm/gaccess.h b/arch/s390/kvm/gaccess.h
> index f4c51756c462..fec26bbb17ba 100644
> --- a/arch/s390/kvm/gaccess.h
> +++ b/arch/s390/kvm/gaccess.h
> @@ -359,7 +359,10 @@ void ipte_unlock(struct kvm_vcpu *vcpu);
>  int ipte_lock_held(struct kvm_vcpu *vcpu);
>  int kvm_s390_check_low_addr_prot_real(struct kvm_vcpu *vcpu, unsigned long gra);
>  
> +#define DAT_PROT 2
> +#define NOT_PTE 4

I'd like to have a PEI prefix and a short comment where this comes from,
something like:
"MVPG PEI indication bits"

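For example, just as a sketch of the naming I have in mind:

/* MVPG PEI indication bits */
#define PEI_DAT_PROT 2
#define PEI_NOT_PTE  4
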
> +
>  int kvm_s390_shadow_fault(struct kvm_vcpu *vcpu, struct gmap *shadow,
> -			  unsigned long saddr);
> +			  unsigned long saddr, unsigned long *datptr);
>  
>  #endif /* __KVM_S390_GACCESS_H */
> diff --git a/arch/s390/kvm/vsie.c b/arch/s390/kvm/vsie.c
> index c5d0a58b2c29..7db022141db3 100644
> --- a/arch/s390/kvm/vsie.c
> +++ b/arch/s390/kvm/vsie.c
> @@ -619,10 +619,10 @@ static int map_prefix(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
>  	/* with mso/msl, the prefix lies at offset *mso* */
>  	prefix += scb_s->mso;
>  
> -	rc = kvm_s390_shadow_fault(vcpu, vsie_page->gmap, prefix);
> +	rc = kvm_s390_shadow_fault(vcpu, vsie_page->gmap, prefix, NULL);
>  	if (!rc && (scb_s->ecb & ECB_TE))
>  		rc = kvm_s390_shadow_fault(vcpu, vsie_page->gmap,
> -					   prefix + PAGE_SIZE);
> +					   prefix + PAGE_SIZE, NULL);
>  	/*
>  	 * We don't have to mprotect, we will be called for all unshadows.
>  	 * SIE will detect if protection applies and trigger a validity.
> @@ -913,7 +913,7 @@ static int handle_fault(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
>  				    current->thread.gmap_addr, 1);
>  
>  	rc = kvm_s390_shadow_fault(vcpu, vsie_page->gmap,
> -				   current->thread.gmap_addr);
> +				   current->thread.gmap_addr, NULL);
>  	if (rc > 0) {
>  		rc = inject_fault(vcpu, rc,
>  				  current->thread.gmap_addr,
> @@ -935,7 +935,7 @@ static void handle_last_fault(struct kvm_vcpu *vcpu,
>  {
>  	if (vsie_page->fault_addr)
>  		kvm_s390_shadow_fault(vcpu, vsie_page->gmap,
> -				      vsie_page->fault_addr);
> +				      vsie_page->fault_addr, NULL);
>  	vsie_page->fault_addr = 0;
>  }
>  
> 



* Re: [PATCH v3 2/2] s390/kvm: VSIE: correctly handle MVPG when in VSIE
  2021-02-09 15:43 ` [PATCH v3 2/2] s390/kvm: VSIE: correctly handle MVPG when in VSIE Claudio Imbrenda
@ 2021-02-11  8:46   ` Janosch Frank
  0 siblings, 0 replies; 8+ messages in thread
From: Janosch Frank @ 2021-02-11  8:46 UTC (permalink / raw)
  To: Claudio Imbrenda, linux-kernel
  Cc: borntraeger, david, cohuck, kvm, linux-s390, stable

On 2/9/21 4:43 PM, Claudio Imbrenda wrote:
> Correctly handle the MVPG instruction when issued by a VSIE guest.
> 
> Fixes: a3508fbe9dc6d ("KVM: s390: vsie: initial support for nested virtualization")
> Cc: stable@vger.kernel.org
> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
> ---

Thanks for fixing this.

Acked-by: Janosch Frank <frankja@linux.ibm.com>

>  arch/s390/kvm/vsie.c | 93 +++++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 88 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/s390/kvm/vsie.c b/arch/s390/kvm/vsie.c
> index 7db022141db3..7dbb0d93c25f 100644
> --- a/arch/s390/kvm/vsie.c
> +++ b/arch/s390/kvm/vsie.c
> @@ -416,11 +416,6 @@ static void unshadow_scb(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
>  		memcpy((void *)((u64)scb_o + 0xc0),
>  		       (void *)((u64)scb_s + 0xc0), 0xf0 - 0xc0);
>  		break;
> -	case ICPT_PARTEXEC:
> -		/* MVPG only */
> -		memcpy((void *)((u64)scb_o + 0xc0),
> -		       (void *)((u64)scb_s + 0xc0), 0xd0 - 0xc0);
> -		break;
>  	}
>  
>  	if (scb_s->ihcpu != 0xffffU)
> @@ -982,6 +977,90 @@ static int handle_stfle(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
>  	return 0;
>  }
>  
> +static u64 vsie_get_register(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page, u8 reg)
> +{
> +	reg &= 0xf;
> +	switch (reg) {
> +	case 15:
> +		return vsie_page->scb_s.gg15;
> +	case 14:
> +		return vsie_page->scb_s.gg14;
> +	default:
> +		return vcpu->run->s.regs.gprs[reg];
> +	}
> +}
> +
> +static int vsie_handle_mvpg(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
> +{
> +	struct kvm_s390_sie_block *scb_s = &vsie_page->scb_s;
> +	unsigned long pei_dest, pei_src, src, dest, mask = PAGE_MASK;
> +	u64 *pei_block = &vsie_page->scb_o->mcic;
> +	int edat, rc_dest, rc_src;
> +	union ctlreg0 cr0;
> +
> +	cr0.val = vcpu->arch.sie_block->gcr[0];
> +	edat = cr0.edat && test_kvm_facility(vcpu->kvm, 8);
> +	if (psw_bits(scb_s->gpsw).eaba == PSW_BITS_AMODE_24BIT)
> +		mask = 0xfff000;
> +	else if (psw_bits(scb_s->gpsw).eaba == PSW_BITS_AMODE_31BIT)
> +		mask = 0x7ffff000;
> +
> +	dest = vsie_get_register(vcpu, vsie_page, scb_s->ipb >> 16) & mask;
> +	src = vsie_get_register(vcpu, vsie_page, scb_s->ipb >> 20) & mask;
> +
> +	rc_dest = kvm_s390_shadow_fault(vcpu, vsie_page->gmap, dest, &pei_dest);
> +	rc_src = kvm_s390_shadow_fault(vcpu, vsie_page->gmap, src, &pei_src);
> +	/*
> +	 * Either everything went well, or something non-critical went wrong
> +	 * e.g. because of a race. In either case, simply retry.
> +	 */
> +	if (rc_dest == -EAGAIN || rc_src == -EAGAIN || (!rc_dest && !rc_src)) {
> +		retry_vsie_icpt(vsie_page);
> +		return -EAGAIN;
> +	}
> +	/* Something more serious went wrong, propagate the error */
> +	if (rc_dest < 0)
> +		return rc_dest;
> +	if (rc_src < 0)
> +		return rc_src;
> +
> +	/* The only possible suppressing exception: just deliver it */
> +	if (rc_dest == PGM_TRANSLATION_SPEC || rc_src == PGM_TRANSLATION_SPEC) {
> +		clear_vsie_icpt(vsie_page);
> +		rc_dest = kvm_s390_inject_program_int(vcpu, PGM_TRANSLATION_SPEC);
> +		WARN_ON_ONCE(rc_dest);
> +		return 1;
> +	}
> +
> +	/*
> +	 * Forward the PEI intercept to the guest if it was a page fault, or
> +	 * also for segment and region table faults if EDAT applies.
> +	 */
> +	if (edat) {
> +		rc_dest = rc_dest == PGM_ASCE_TYPE ? rc_dest : 0;
> +		rc_src = rc_src == PGM_ASCE_TYPE ? rc_src : 0;
> +	} else {
> +		rc_dest = rc_dest != PGM_PAGE_TRANSLATION ? rc_dest : 0;
> +		rc_src = rc_src != PGM_PAGE_TRANSLATION ? rc_src : 0;
> +	}
> +	if (!rc_dest && !rc_src) {
> +		pei_block[0] = pei_dest;
> +		pei_block[1] = pei_src;
> +		return 1;
> +	}
> +
> +	retry_vsie_icpt(vsie_page);
> +
> +	/*
> +	 * The host has edat, and the guest does not, or it was an ASCE type
> +	 * exception. The host needs to inject the appropriate DAT interrupts
> +	 * into the guest.
> +	 */
> +	if (rc_dest)
> +		return inject_fault(vcpu, rc_dest, dest, 1);
> +	return inject_fault(vcpu, rc_src, src, 0);
> +}
> +
>  /*
>   * Run the vsie on a shadow scb and a shadow gmap, without any further
>   * sanity checks, handling SIE faults.
> @@ -1068,6 +1147,10 @@ static int do_vsie_run(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
>  		if ((scb_s->ipa & 0xf000) != 0xf000)
>  			scb_s->ipa += 0x1000;
>  		break;
> +	case ICPT_PARTEXEC:
> +		if (scb_s->ipa == 0xb254)
> +			rc = vsie_handle_mvpg(vcpu, vsie_page);
> +		break;
>  	}
>  	return rc;
>  }
> 



* Re: [PATCH v3 1/2] s390/kvm: extend kvm_s390_shadow_fault to return entry pointer
  2021-02-09 15:43 ` [PATCH v3 1/2] s390/kvm: extend kvm_s390_shadow_fault to return entry pointer Claudio Imbrenda
  2021-02-11  8:35   ` Janosch Frank
@ 2021-02-11  9:18   ` David Hildenbrand
  2021-02-11 12:57     ` Claudio Imbrenda
  1 sibling, 1 reply; 8+ messages in thread
From: David Hildenbrand @ 2021-02-11  9:18 UTC (permalink / raw)
  To: Claudio Imbrenda, linux-kernel
  Cc: borntraeger, frankja, cohuck, kvm, linux-s390, stable

On 09.02.21 16:43, Claudio Imbrenda wrote:
> Extend kvm_s390_shadow_fault to return the pointer to the valid leaf
> DAT table entry, or to the invalid entry.
> 
> Also return some flags in the lower bits of the address:
> DAT_PROT: indicates that DAT protection applies because of the
>            protection bit in the segment (or, if EDAT, region) tables
> NOT_PTE: indicates that the address of the DAT table entry returned
>           does not refer to a PTE, but to a segment or region table.
> 

I've been thinking about one possible issue, but I think it's not 
actually an issue. Just sharing so others can verify:

In case our guest uses huge pages / gigantic pages / ASCE R, we create 
fake page tables (GMAP_SHADOW_FAKE_TABLE).

So, it could be that kvm_s390_shadow_fault()->gmap_shadow_pgt_lookup()
succeeds, however, we have a fake PTE in our hands. We lost the
actual guest STE/RTE address. (I think it could be recovered somehow via 
page->index, though)

But I guess, if there is a fake PTE, then there is not actually
something that could go wrong in gmap_shadow_page() anymore that could
lead us to respond with something wrong to the guest. We can only really
fail with -EINVAL, -ENOMEM or -EFAULT.

So if the guest changed anything in the meantime (e.g., zap a segment), 
we would have unshadowed the whole fake page table hierarchy and would
simply retry.

> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
> Cc: stable@vger.kernel.org
> ---
>   arch/s390/kvm/gaccess.c | 30 +++++++++++++++++++++++++-----
>   arch/s390/kvm/gaccess.h |  5 ++++-
>   arch/s390/kvm/vsie.c    |  8 ++++----
>   3 files changed, 33 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/s390/kvm/gaccess.c b/arch/s390/kvm/gaccess.c
> index 6d6b57059493..e0ab83f051d2 100644
> --- a/arch/s390/kvm/gaccess.c
> +++ b/arch/s390/kvm/gaccess.c
> @@ -976,7 +976,9 @@ int kvm_s390_check_low_addr_prot_real(struct kvm_vcpu *vcpu, unsigned long gra)
>    * kvm_s390_shadow_tables - walk the guest page table and create shadow tables
>    * @sg: pointer to the shadow guest address space structure
>    * @saddr: faulting address in the shadow gmap
> - * @pgt: pointer to the page table address result
> + * @pgt: pointer to the beginning of the page table for the given address if
> + *       successful (return value 0), or to the first invalid DAT entry in
> + *       case of exceptions (return value > 0)
>    * @fake: pgt references contiguous guest memory block, not a pgtable
>    */
>   static int kvm_s390_shadow_tables(struct gmap *sg, unsigned long saddr,
> @@ -1034,6 +1036,7 @@ static int kvm_s390_shadow_tables(struct gmap *sg, unsigned long saddr,
>   			rfte.val = ptr;
>   			goto shadow_r2t;
>   		}
> +		*pgt = ptr + vaddr.rfx * 8;
>   		rc = gmap_read_table(parent, ptr + vaddr.rfx * 8, &rfte.val);

Using

gmap_read_table(parent, *pgt, &rfte.val);

or similar with a local variable might then be even clearer. But no 
strong opinion.
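
I.e., roughly (just a sketch; "entry" is a name I made up):

		unsigned long entry = ptr + vaddr.rfx * 8;

		*pgt = entry;
		rc = gmap_read_table(parent, entry, &rfte.val);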

>   		if (rc)
>   			return rc;
> @@ -1060,6 +1063,7 @@ static int kvm_s390_shadow_tables(struct gmap *sg, unsigned long saddr,
>   			rste.val = ptr;
>   			goto shadow_r3t;
>   		}
> +		*pgt = ptr + vaddr.rsx * 8;
>   		rc = gmap_read_table(parent, ptr + vaddr.rsx * 8, &rste.val);
>   		if (rc)
>   			return rc;
> @@ -1087,6 +1091,7 @@ static int kvm_s390_shadow_tables(struct gmap *sg, unsigned long saddr,
>   			rtte.val = ptr;
>   			goto shadow_sgt;
>   		}
> +		*pgt = ptr + vaddr.rtx * 8;
>   		rc = gmap_read_table(parent, ptr + vaddr.rtx * 8, &rtte.val);
>   		if (rc)
>   			return rc;
> @@ -1123,6 +1128,7 @@ static int kvm_s390_shadow_tables(struct gmap *sg, unsigned long saddr,
>   			ste.val = ptr;
>   			goto shadow_pgt;
>   		}
> +		*pgt = ptr + vaddr.sx * 8;
>   		rc = gmap_read_table(parent, ptr + vaddr.sx * 8, &ste.val);
>   		if (rc)
>   			return rc;
> @@ -1157,6 +1163,8 @@ static int kvm_s390_shadow_tables(struct gmap *sg, unsigned long saddr,
>    * @vcpu: virtual cpu
>    * @sg: pointer to the shadow guest address space structure
>    * @saddr: faulting address in the shadow gmap
> + * @datptr: will contain the address of the faulting DAT table entry, or of
> + *          the valid leaf, plus some flags
>    *
>    * Returns: - 0 if the shadow fault was successfully resolved
>    *	    - > 0 (pgm exception code) on exceptions while faulting
> @@ -1165,11 +1173,11 @@ static int kvm_s390_shadow_tables(struct gmap *sg, unsigned long saddr,
>    *	    - -ENOMEM if out of memory
>    */
>   int kvm_s390_shadow_fault(struct kvm_vcpu *vcpu, struct gmap *sg,
> -			  unsigned long saddr)
> +			  unsigned long saddr, unsigned long *datptr)
>   {
>   	union vaddress vaddr;
>   	union page_table_entry pte;
> -	unsigned long pgt;
> +	unsigned long pgt = 0;
>   	int dat_protection, fake;
>   	int rc;
>   
> @@ -1191,8 +1199,20 @@ int kvm_s390_shadow_fault(struct kvm_vcpu *vcpu, struct gmap *sg,
>   		pte.val = pgt + vaddr.px * PAGE_SIZE;
>   		goto shadow_page;
>   	}
> -	if (!rc)
> -		rc = gmap_read_table(sg->parent, pgt + vaddr.px * 8, &pte.val);
> +
> +	switch (rc) {
> +	case PGM_SEGMENT_TRANSLATION:
> +	case PGM_REGION_THIRD_TRANS:
> +	case PGM_REGION_SECOND_TRANS:
> +	case PGM_REGION_FIRST_TRANS:
> +		pgt |= NOT_PTE;
> +		break;
> +	case 0:
> +		pgt += vaddr.px * 8;
> +		rc = gmap_read_table(sg->parent, pgt, &pte.val);
> +	}
> +	if (datptr)
> +		*datptr = pgt | dat_protection * DAT_PROT;
>   	if (!rc && pte.i)
>   		rc = PGM_PAGE_TRANSLATION;
>   	if (!rc && pte.z)
> diff --git a/arch/s390/kvm/gaccess.h b/arch/s390/kvm/gaccess.h
> index f4c51756c462..fec26bbb17ba 100644
> --- a/arch/s390/kvm/gaccess.h
> +++ b/arch/s390/kvm/gaccess.h
> @@ -359,7 +359,10 @@ void ipte_unlock(struct kvm_vcpu *vcpu);
>   int ipte_lock_held(struct kvm_vcpu *vcpu);
>   int kvm_s390_check_low_addr_prot_real(struct kvm_vcpu *vcpu, unsigned long gra);
>   
> +#define DAT_PROT 2
> +#define NOT_PTE 4

What if our guest is using ASCE.R ?

-- 
Thanks,

David / dhildenb



* Re: [PATCH v3 1/2] s390/kvm: extend kvm_s390_shadow_fault to return entry pointer
  2021-02-11  9:18   ` David Hildenbrand
@ 2021-02-11 12:57     ` Claudio Imbrenda
  2021-02-11 13:15       ` David Hildenbrand
  0 siblings, 1 reply; 8+ messages in thread
From: Claudio Imbrenda @ 2021-02-11 12:57 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, borntraeger, frankja, cohuck, kvm, linux-s390, stable

On Thu, 11 Feb 2021 10:18:56 +0100
David Hildenbrand <david@redhat.com> wrote:

> On 09.02.21 16:43, Claudio Imbrenda wrote:
> > Extend kvm_s390_shadow_fault to return the pointer to the valid leaf
> > DAT table entry, or to the invalid entry.
> > 
> > Also return some flags in the lower bits of the address:
> > DAT_PROT: indicates that DAT protection applies because of the
> >            protection bit in the segment (or, if EDAT, region)
> > tables NOT_PTE: indicates that the address of the DAT table entry
> > returned does not refer to a PTE, but to a segment or region table.
> >   
> 
> I've been thinking about one possible issue, but I think it's not 
> actually an issue. Just sharing so others can verify:
> 
> In case our guest uses huge pages / gigantic pages / ASCE R, we
> create fake page tables (GMAP_SHADOW_FAKE_TABLE).
> 
> So, it could be that kvm_s390_shadow_fault()->gmap_shadow_pgt_lookup()
> succeeds, however, we have a fake PTE in our hands. We lost the
> actual guest STE/RTE address. (I think it could be recovered somehow
> via page->index, though)
> 
> But I guess, if there is a fake PTE, then there is not actually
> something that could go wrong in gmap_shadow_page() anymore that could
> lead us to respond with something wrong to the guest. We can only really
> fail with -EINVAL, -ENOMEM or -EFAULT.

this was also my reasoning

> So if the guest changed anything in the meantime (e.g., zap a
> segment), we would have unshadowed the whole fake page table
> hierarchy and would simply retry.
> 
> > Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
> > Cc: stable@vger.kernel.org
> > ---
> >   arch/s390/kvm/gaccess.c | 30 +++++++++++++++++++++++++-----
> >   arch/s390/kvm/gaccess.h |  5 ++++-
> >   arch/s390/kvm/vsie.c    |  8 ++++----
> >   3 files changed, 33 insertions(+), 10 deletions(-)
> > 
> > diff --git a/arch/s390/kvm/gaccess.c b/arch/s390/kvm/gaccess.c
> > index 6d6b57059493..e0ab83f051d2 100644
> > --- a/arch/s390/kvm/gaccess.c
> > +++ b/arch/s390/kvm/gaccess.c
> > @@ -976,7 +976,9 @@ int kvm_s390_check_low_addr_prot_real(struct
> > kvm_vcpu *vcpu, unsigned long gra)
> >    * kvm_s390_shadow_tables - walk the guest page table and create
> > shadow tables
> >    * @sg: pointer to the shadow guest address space structure
> >    * @saddr: faulting address in the shadow gmap
> > - * @pgt: pointer to the page table address result
> > + * @pgt: pointer to the beginning of the page table for the given
> > address if
> > + *       successful (return value 0), or to the first invalid DAT
> > entry in
> > + *       case of exceptions (return value > 0)
> >    * @fake: pgt references contiguous guest memory block, not a
> > pgtable */
> >   static int kvm_s390_shadow_tables(struct gmap *sg, unsigned long
> > saddr, @@ -1034,6 +1036,7 @@ static int
> > kvm_s390_shadow_tables(struct gmap *sg, unsigned long saddr,
> > rfte.val = ptr; goto shadow_r2t;
> >   		}
> > +		*pgt = ptr + vaddr.rfx * 8;
> >   		rc = gmap_read_table(parent, ptr + vaddr.rfx * 8,
> > &rfte.val);  
> 
> Using
> 
> gmap_read_table(parent, *pgt, &rfte.val);
> 
> or similar with a local variable might then be even clearer. But no 
> strong opinion.

that's also something I had thought about, in the end this minimizes
the number of lines / variables while still being readable

> >   		if (rc)
> >   			return rc;
> > @@ -1060,6 +1063,7 @@ static int kvm_s390_shadow_tables(struct gmap
> > *sg, unsigned long saddr, rste.val = ptr;
> >   			goto shadow_r3t;
> >   		}
> > +		*pgt = ptr + vaddr.rsx * 8;
> >   		rc = gmap_read_table(parent, ptr + vaddr.rsx * 8,
> > &rste.val); if (rc)
> >   			return rc;
> > @@ -1087,6 +1091,7 @@ static int kvm_s390_shadow_tables(struct gmap
> > *sg, unsigned long saddr, rtte.val = ptr;
> >   			goto shadow_sgt;
> >   		}
> > +		*pgt = ptr + vaddr.rtx * 8;
> >   		rc = gmap_read_table(parent, ptr + vaddr.rtx * 8,
> > &rtte.val); if (rc)
> >   			return rc;
> > @@ -1123,6 +1128,7 @@ static int kvm_s390_shadow_tables(struct gmap
> > *sg, unsigned long saddr, ste.val = ptr;
> >   			goto shadow_pgt;
> >   		}
> > +		*pgt = ptr + vaddr.sx * 8;
> >   		rc = gmap_read_table(parent, ptr + vaddr.sx * 8,
> > &ste.val); if (rc)
> >   			return rc;
> > @@ -1157,6 +1163,8 @@ static int kvm_s390_shadow_tables(struct gmap
> > *sg, unsigned long saddr,
> >    * @vcpu: virtual cpu
> >    * @sg: pointer to the shadow guest address space structure
> >    * @saddr: faulting address in the shadow gmap
> > + * @datptr: will contain the address of the faulting DAT table
> > entry, or of
> > + *          the valid leaf, plus some flags
> >    *
> >    * Returns: - 0 if the shadow fault was successfully resolved
> >    *	    - > 0 (pgm exception code) on exceptions while
> > faulting @@ -1165,11 +1173,11 @@ static int
> > kvm_s390_shadow_tables(struct gmap *sg, unsigned long saddr,
> >    *	    - -ENOMEM if out of memory
> >    */
> >   int kvm_s390_shadow_fault(struct kvm_vcpu *vcpu, struct gmap *sg,
> > -			  unsigned long saddr)
> > +			  unsigned long saddr, unsigned long
> > *datptr) {
> >   	union vaddress vaddr;
> >   	union page_table_entry pte;
> > -	unsigned long pgt;
> > +	unsigned long pgt = 0;
> >   	int dat_protection, fake;
> >   	int rc;
> >   
> > @@ -1191,8 +1199,20 @@ int kvm_s390_shadow_fault(struct kvm_vcpu
> > *vcpu, struct gmap *sg, pte.val = pgt + vaddr.px * PAGE_SIZE;
> >   		goto shadow_page;
> >   	}
> > -	if (!rc)
> > -		rc = gmap_read_table(sg->parent, pgt + vaddr.px *
> > 8, &pte.val); +
> > +	switch (rc) {
> > +	case PGM_SEGMENT_TRANSLATION:
> > +	case PGM_REGION_THIRD_TRANS:
> > +	case PGM_REGION_SECOND_TRANS:
> > +	case PGM_REGION_FIRST_TRANS:
> > +		pgt |= NOT_PTE;
> > +		break;
> > +	case 0:
> > +		pgt += vaddr.px * 8;
> > +		rc = gmap_read_table(sg->parent, pgt, &pte.val);
> > +	}
> > +	if (datptr)
> > +		*datptr = pgt | dat_protection * DAT_PROT;
> >   	if (!rc && pte.i)
> >   		rc = PGM_PAGE_TRANSLATION;
> >   	if (!rc && pte.z)
> > diff --git a/arch/s390/kvm/gaccess.h b/arch/s390/kvm/gaccess.h
> > index f4c51756c462..fec26bbb17ba 100644
> > --- a/arch/s390/kvm/gaccess.h
> > +++ b/arch/s390/kvm/gaccess.h
> > @@ -359,7 +359,10 @@ void ipte_unlock(struct kvm_vcpu *vcpu);
> >   int ipte_lock_held(struct kvm_vcpu *vcpu);
> >   int kvm_s390_check_low_addr_prot_real(struct kvm_vcpu *vcpu,
> > unsigned long gra); 
> > +#define DAT_PROT 2
> > +#define NOT_PTE 4  
> 
> What if our guest is using ASCE.R ?

then we don't care.

if the guest is using ASCE.R, then shadowing will always succeed, and
the VSIE MVPG handler will retry right away.

if you are worried about the lowest order bit, it can only be set
if a specific feature is enabled in the host, and KVM doesn't use /
support it, so the guest can't use it for its guest.





* Re: [PATCH v3 1/2] s390/kvm: extend kvm_s390_shadow_fault to return entry pointer
  2021-02-11 12:57     ` Claudio Imbrenda
@ 2021-02-11 13:15       ` David Hildenbrand
  0 siblings, 0 replies; 8+ messages in thread
From: David Hildenbrand @ 2021-02-11 13:15 UTC (permalink / raw)
  To: Claudio Imbrenda
  Cc: linux-kernel, borntraeger, frankja, cohuck, kvm, linux-s390, stable

On 11.02.21 13:57, Claudio Imbrenda wrote:
> On Thu, 11 Feb 2021 10:18:56 +0100
> David Hildenbrand <david@redhat.com> wrote:
> 
>> On 09.02.21 16:43, Claudio Imbrenda wrote:
>>> Extend kvm_s390_shadow_fault to return the pointer to the valid leaf
>>> DAT table entry, or to the invalid entry.
>>>
>>> Also return some flags in the lower bits of the address:
>>> DAT_PROT: indicates that DAT protection applies because of the
>>>             protection bit in the segment (or, if EDAT, region)
>>> tables NOT_PTE: indicates that the address of the DAT table entry
>>> returned does not refer to a PTE, but to a segment or region table.
>>>    
>>
>> I've been thinking about one possible issue, but I think it's not
>> actually an issue. Just sharing so others can verify:
>>
>> In case our guest uses huge pages / gigantic pages / ASCE R, we
>> create fake page tables (GMAP_SHADOW_FAKE_TABLE).
>>
>> So, it could be that kvm_s390_shadow_fault()->gmap_shadow_pgt_lookup()
>> succeeds, however, we have a fake PTE in our hands. We lost the
>> actual guest STE/RTE address. (I think it could be recovered somehow
>> via page->index, though)
>>
>> But I guess, if there is a fake PTE, then there is not actually
>> something that could go wrong in gmap_shadow_page() anymore that could
>> lead us to respond with something wrong to the guest. We can only really
>> fail with -EINVAL, -ENOMEM or -EFAULT.
> 
> this was also my reasoning
> 
>> So if the guest changed anything in the meantime (e.g., zap a
>> segment), we would have unshadowed the whole fake page table
>> hierarchy and would simply retry.
>>
>>> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
>>> Cc: stable@vger.kernel.org
>>> ---
>>>    arch/s390/kvm/gaccess.c | 30 +++++++++++++++++++++++++-----
>>>    arch/s390/kvm/gaccess.h |  5 ++++-
>>>    arch/s390/kvm/vsie.c    |  8 ++++----
>>>    3 files changed, 33 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/arch/s390/kvm/gaccess.c b/arch/s390/kvm/gaccess.c
>>> index 6d6b57059493..e0ab83f051d2 100644
>>> --- a/arch/s390/kvm/gaccess.c
>>> +++ b/arch/s390/kvm/gaccess.c
>>> @@ -976,7 +976,9 @@ int kvm_s390_check_low_addr_prot_real(struct
>>> kvm_vcpu *vcpu, unsigned long gra)
>>>     * kvm_s390_shadow_tables - walk the guest page table and create
>>> shadow tables
>>>     * @sg: pointer to the shadow guest address space structure
>>>     * @saddr: faulting address in the shadow gmap
>>> - * @pgt: pointer to the page table address result
>>> + * @pgt: pointer to the beginning of the page table for the given
>>> address if
>>> + *       successful (return value 0), or to the first invalid DAT
>>> entry in
>>> + *       case of exceptions (return value > 0)
>>>     * @fake: pgt references contiguous guest memory block, not a
>>> pgtable */
>>>    static int kvm_s390_shadow_tables(struct gmap *sg, unsigned long
>>> saddr, @@ -1034,6 +1036,7 @@ static int
>>> kvm_s390_shadow_tables(struct gmap *sg, unsigned long saddr,
>>> rfte.val = ptr; goto shadow_r2t;
>>>    		}
>>> +		*pgt = ptr + vaddr.rfx * 8;
>>>    		rc = gmap_read_table(parent, ptr + vaddr.rfx * 8,
>>> &rfte.val);
>>
>> Using
>>
>> gmap_read_table(parent, *pgt, &rfte.val);
>>
>> or similar with a local variable might then be even clearer. But no
>> strong opinion.
> 
> that's also something I had thought about, in the end this minimizes
> the number of lines / variables while still being readable
> 
>>>    		if (rc)
>>>    			return rc;
>>> @@ -1060,6 +1063,7 @@ static int kvm_s390_shadow_tables(struct gmap
>>> *sg, unsigned long saddr, rste.val = ptr;
>>>    			goto shadow_r3t;
>>>    		}
>>> +		*pgt = ptr + vaddr.rsx * 8;
>>>    		rc = gmap_read_table(parent, ptr + vaddr.rsx * 8,
>>> &rste.val); if (rc)
>>>    			return rc;
>>> @@ -1087,6 +1091,7 @@ static int kvm_s390_shadow_tables(struct gmap
>>> *sg, unsigned long saddr, rtte.val = ptr;
>>>    			goto shadow_sgt;
>>>    		}
>>> +		*pgt = ptr + vaddr.rtx * 8;
>>>    		rc = gmap_read_table(parent, ptr + vaddr.rtx * 8,
>>> &rtte.val); if (rc)
>>>    			return rc;
>>> @@ -1123,6 +1128,7 @@ static int kvm_s390_shadow_tables(struct gmap
>>> *sg, unsigned long saddr, ste.val = ptr;
>>>    			goto shadow_pgt;
>>>    		}
>>> +		*pgt = ptr + vaddr.sx * 8;
>>>    		rc = gmap_read_table(parent, ptr + vaddr.sx * 8,
>>> &ste.val); if (rc)
>>>    			return rc;
>>> @@ -1157,6 +1163,8 @@ static int kvm_s390_shadow_tables(struct gmap
>>> *sg, unsigned long saddr,
>>>     * @vcpu: virtual cpu
>>>     * @sg: pointer to the shadow guest address space structure
>>>     * @saddr: faulting address in the shadow gmap
>>> + * @datptr: will contain the address of the faulting DAT table
>>> entry, or of
>>> + *          the valid leaf, plus some flags
>>>     *
>>>     * Returns: - 0 if the shadow fault was successfully resolved
>>>     *	    - > 0 (pgm exception code) on exceptions while
>>> faulting @@ -1165,11 +1173,11 @@ static int
>>> kvm_s390_shadow_tables(struct gmap *sg, unsigned long saddr,
>>>     *	    - -ENOMEM if out of memory
>>>     */
>>>    int kvm_s390_shadow_fault(struct kvm_vcpu *vcpu, struct gmap *sg,
>>> -			  unsigned long saddr)
>>> +			  unsigned long saddr, unsigned long
>>> *datptr) {
>>>    	union vaddress vaddr;
>>>    	union page_table_entry pte;
>>> -	unsigned long pgt;
>>> +	unsigned long pgt = 0;
>>>    	int dat_protection, fake;
>>>    	int rc;
>>>    
>>> @@ -1191,8 +1199,20 @@ int kvm_s390_shadow_fault(struct kvm_vcpu
>>> *vcpu, struct gmap *sg, pte.val = pgt + vaddr.px * PAGE_SIZE;
>>>    		goto shadow_page;
>>>    	}
>>> -	if (!rc)
>>> -		rc = gmap_read_table(sg->parent, pgt + vaddr.px *
>>> 8, &pte.val); +
>>> +	switch (rc) {
>>> +	case PGM_SEGMENT_TRANSLATION:
>>> +	case PGM_REGION_THIRD_TRANS:
>>> +	case PGM_REGION_SECOND_TRANS:
>>> +	case PGM_REGION_FIRST_TRANS:
>>> +		pgt |= NOT_PTE;
>>> +		break;
>>> +	case 0:
>>> +		pgt += vaddr.px * 8;
>>> +		rc = gmap_read_table(sg->parent, pgt, &pte.val);
>>> +	}
>>> +	if (datptr)
>>> +		*datptr = pgt | dat_protection * DAT_PROT;
>>>    	if (!rc && pte.i)
>>>    		rc = PGM_PAGE_TRANSLATION;
>>>    	if (!rc && pte.z)
>>> diff --git a/arch/s390/kvm/gaccess.h b/arch/s390/kvm/gaccess.h
>>> index f4c51756c462..fec26bbb17ba 100644
>>> --- a/arch/s390/kvm/gaccess.h
>>> +++ b/arch/s390/kvm/gaccess.h
>>> @@ -359,7 +359,10 @@ void ipte_unlock(struct kvm_vcpu *vcpu);
>>>    int ipte_lock_held(struct kvm_vcpu *vcpu);
>>>    int kvm_s390_check_low_addr_prot_real(struct kvm_vcpu *vcpu,
>>> unsigned long gra);
>>> +#define DAT_PROT 2
>>> +#define NOT_PTE 4
>>
>> What if our guest is using ASCE.R ?
> 
> then we don't care.
> 
> if the guest is using ASCE.R, then shadowing will always succeed, and
> the VSIE MVPG handler will retry right away.
> 
> if you are worried about the lowest order bit, it can only be set
> if a specific feature is enabled in the host, and KVM doesn't use /
> support it, so the guest can't use it for its guest.

Got it, thanks! :)

Reviewed-by: David Hildenbrand <david@redhat.com>


-- 
Thanks,

David / dhildenb

