linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v1 0/5] KVM: s390: vsie: fixes and cleanups.
@ 2020-04-02 18:48 David Hildenbrand
  2020-04-02 18:48 ` [PATCH v1 1/5] KVM: s390: vsie: Fix region 1 ASCE sanity shadow address checks David Hildenbrand
                   ` (4 more replies)
  0 siblings, 5 replies; 14+ messages in thread
From: David Hildenbrand @ 2020-04-02 18:48 UTC (permalink / raw)
  To: kvm
  Cc: linux-s390, linux-kernel, Vasily Gorbik, Heiko Carstens,
	Cornelia Huck, Janosch Frank, Christian Borntraeger,
	David Hildenbrand

Some vsie/gmap fixes and two cleanups/improvements.

Patch #1 fixes an issue reported by Janosch. It was never observed so far,
because KVM usually doesn't use a region 1 table for it's guest (unless
memory would be exceeding something like 16 EB, which isn't even supported
by the HW). Older QEMU+KVM or other hypervisors can trigger this.

Patch #2 fixes a code path that probably was never taken and will most
probably not be taken very often in the future - unless somebody really
messes up the page tables for a guest (or writes a test for it). At some
point, a test case for this would be nice.

Patch #3 fixes a rare possible race. Don't think this is stable material.

Gave it some testing with my limited access to somewhat-fast s390x
machines. Booted a Linux kernel, supplying all possible number of
page table hiearchies.

David Hildenbrand (5):
  KVM: s390: vsie: Fix region 1 ASCE sanity shadow address checks
  KVM: s390: vsie: Fix delivery of addressing exceptions
  KVM: s390: vsie: Fix possible race when shadowing region 3 tables
  KVM: s390: vsie: Move conditional reschedule
  KVM: s390: vsie: gmap_table_walk() simplifications

 arch/s390/kvm/vsie.c |  4 ++--
 arch/s390/mm/gmap.c  | 14 ++++++++++----
 2 files changed, 12 insertions(+), 6 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v1 1/5] KVM: s390: vsie: Fix region 1 ASCE sanity shadow address checks
  2020-04-02 18:48 [PATCH v1 0/5] KVM: s390: vsie: fixes and cleanups David Hildenbrand
@ 2020-04-02 18:48 ` David Hildenbrand
  2020-04-03 12:07   ` David Hildenbrand
  2020-04-03 13:41   ` Janosch Frank
  2020-04-02 18:48 ` [PATCH v1 2/5] KVM: s390: vsie: Fix delivery of addressing exceptions David Hildenbrand
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 14+ messages in thread
From: David Hildenbrand @ 2020-04-02 18:48 UTC (permalink / raw)
  To: kvm
  Cc: linux-s390, linux-kernel, Vasily Gorbik, Heiko Carstens,
	Cornelia Huck, Janosch Frank, Christian Borntraeger,
	David Hildenbrand, stable

In case we have a region 1 ASCE, our shadow/g3 address can have any value.
Unfortunately, (-1UL << 64) is undefined and triggers sometimes,
rejecting valid shadow addresses when trying to walk our shadow table
hierarchy.

The result is that the prefix cannot get mapped and will loop basically
forever trying to map it (-EAGAIN loop).

After all, the broken check is only a sanity check, our table shadowing
code in kvm_s390_shadow_tables() already checks these conditions, injecting
proper translation exceptions. Turn it into a WARN_ON_ONCE().

Fixes: 4be130a08420 ("s390/mm: add shadow gmap support")
Cc: <stable@vger.kernel.org> # v4.8+
Reported-by: Janosch Frank <frankja@linux.ibm.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 arch/s390/mm/gmap.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
index 2fbece47ef6f..f3dbc5bdde50 100644
--- a/arch/s390/mm/gmap.c
+++ b/arch/s390/mm/gmap.c
@@ -787,14 +787,18 @@ static void gmap_call_notifier(struct gmap *gmap, unsigned long start,
 static inline unsigned long *gmap_table_walk(struct gmap *gmap,
 					     unsigned long gaddr, int level)
 {
+	const int asce_type = gmap->asce & _ASCE_TYPE_MASK;
 	unsigned long *table;
 
 	if ((gmap->asce & _ASCE_TYPE_MASK) + 4 < (level * 4))
 		return NULL;
 	if (gmap_is_shadow(gmap) && gmap->removed)
 		return NULL;
-	if (gaddr & (-1UL << (31 + ((gmap->asce & _ASCE_TYPE_MASK) >> 2)*11)))
+
+	if (WARN_ON_ONCE(asce_type != _ASCE_TYPE_REGION1) &&
+			 gaddr & (-1UL << (31 + (asce_type >> 2) * 11)))
 		return NULL;
+
 	table = gmap->table;
 	switch (gmap->asce & _ASCE_TYPE_MASK) {
 	case _ASCE_TYPE_REGION1:
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v1 2/5] KVM: s390: vsie: Fix delivery of addressing exceptions
  2020-04-02 18:48 [PATCH v1 0/5] KVM: s390: vsie: fixes and cleanups David Hildenbrand
  2020-04-02 18:48 ` [PATCH v1 1/5] KVM: s390: vsie: Fix region 1 ASCE sanity shadow address checks David Hildenbrand
@ 2020-04-02 18:48 ` David Hildenbrand
  2020-04-06 13:17   ` Christian Borntraeger
  2020-04-02 18:48 ` [PATCH v1 3/5] KVM: s390: vsie: Fix possible race when shadowing region 3 tables David Hildenbrand
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 14+ messages in thread
From: David Hildenbrand @ 2020-04-02 18:48 UTC (permalink / raw)
  To: kvm
  Cc: linux-s390, linux-kernel, Vasily Gorbik, Heiko Carstens,
	Cornelia Huck, Janosch Frank, Christian Borntraeger,
	David Hildenbrand, stable

Whenever we get an -EFAULT, we failed to read in guest 2 physical
address space. Such addressing exceptions are reported via a program
intercept to the nested hypervisor.

We faked the intercept, we have to return to guest 2. Instead, right
now we would be returning -EFAULT from the intercept handler, eventually
crashing the VM.

Addressing exceptions can only happen if the g2->g3 page tables
reference invalid g2 addresses (say, either a table or the final page is
not accessible - so something that basically never happens in sane
environments.

Identified by manual code inspection.

Fixes: a3508fbe9dc6 ("KVM: s390: vsie: initial support for nested virtualization")
Cc: <stable@vger.kernel.org> # v4.8+
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 arch/s390/kvm/vsie.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/s390/kvm/vsie.c b/arch/s390/kvm/vsie.c
index 076090f9e666..4f6c22d72072 100644
--- a/arch/s390/kvm/vsie.c
+++ b/arch/s390/kvm/vsie.c
@@ -1202,6 +1202,7 @@ static int vsie_run(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
 		scb_s->iprcc = PGM_ADDRESSING;
 		scb_s->pgmilc = 4;
 		scb_s->gpsw.addr = __rewind_psw(scb_s->gpsw, 4);
+		rc = 1;
 	}
 	return rc;
 }
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v1 3/5] KVM: s390: vsie: Fix possible race when shadowing region 3 tables
  2020-04-02 18:48 [PATCH v1 0/5] KVM: s390: vsie: fixes and cleanups David Hildenbrand
  2020-04-02 18:48 ` [PATCH v1 1/5] KVM: s390: vsie: Fix region 1 ASCE sanity shadow address checks David Hildenbrand
  2020-04-02 18:48 ` [PATCH v1 2/5] KVM: s390: vsie: Fix delivery of addressing exceptions David Hildenbrand
@ 2020-04-02 18:48 ` David Hildenbrand
  2020-04-06 13:47   ` Christian Borntraeger
  2020-04-02 18:48 ` [PATCH v1 4/5] KVM: s390: vsie: Move conditional reschedule David Hildenbrand
  2020-04-02 18:48 ` [PATCH v1 5/5] KVM: s390: vsie: gmap_table_walk() simplifications David Hildenbrand
  4 siblings, 1 reply; 14+ messages in thread
From: David Hildenbrand @ 2020-04-02 18:48 UTC (permalink / raw)
  To: kvm
  Cc: linux-s390, linux-kernel, Vasily Gorbik, Heiko Carstens,
	Cornelia Huck, Janosch Frank, Christian Borntraeger,
	David Hildenbrand

We have to properly retry again by returning -EINVAL immediately in case
somebody else instantiated the table concurrently. We missed to add the
goto in this function only. The code now matches the other, similar
shadowing functions.

We are overwriting an existing region 2 table entry. All allocated pages
are added to the crst_list to be freed later, so they are not lost
forever. However, when unshadowing the region 2 table, we wouldn't trigger
unshadowing of the original shadowed region 3 table that we replaced. It
would get unshadowed when the original region 3 table is modified. As it's
not connected to the page table hierarchy anymore, it's not going to get
used anymore. However, for a limited time, this page table will stick
around, so it's in some sense a temporary memory leak.

Identified by manual code inspection. I don't think this classifies as
stable material.

Fixes: 998f637cc4b9 ("s390/mm: avoid races on region/segment/page table shadowing")
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 arch/s390/mm/gmap.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
index f3dbc5bdde50..fd32ab566f57 100644
--- a/arch/s390/mm/gmap.c
+++ b/arch/s390/mm/gmap.c
@@ -1844,6 +1844,7 @@ int gmap_shadow_r3t(struct gmap *sg, unsigned long saddr, unsigned long r3t,
 		goto out_free;
 	} else if (*table & _REGION_ENTRY_ORIGIN) {
 		rc = -EAGAIN;		/* Race with shadow */
+		goto out_free;
 	}
 	crst_table_init(s_r3t, _REGION3_ENTRY_EMPTY);
 	/* mark as invalid as long as the parent table is not protected */
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v1 4/5] KVM: s390: vsie: Move conditional reschedule
  2020-04-02 18:48 [PATCH v1 0/5] KVM: s390: vsie: fixes and cleanups David Hildenbrand
                   ` (2 preceding siblings ...)
  2020-04-02 18:48 ` [PATCH v1 3/5] KVM: s390: vsie: Fix possible race when shadowing region 3 tables David Hildenbrand
@ 2020-04-02 18:48 ` David Hildenbrand
  2020-04-02 18:48 ` [PATCH v1 5/5] KVM: s390: vsie: gmap_table_walk() simplifications David Hildenbrand
  4 siblings, 0 replies; 14+ messages in thread
From: David Hildenbrand @ 2020-04-02 18:48 UTC (permalink / raw)
  To: kvm
  Cc: linux-s390, linux-kernel, Vasily Gorbik, Heiko Carstens,
	Cornelia Huck, Janosch Frank, Christian Borntraeger,
	David Hildenbrand

Let's move it to the outer loop, in case we ever run again into long
loops, trying to map the prefix. While at it, convert it to cond_resched().

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 arch/s390/kvm/vsie.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/s390/kvm/vsie.c b/arch/s390/kvm/vsie.c
index 4f6c22d72072..ef05b4e167fb 100644
--- a/arch/s390/kvm/vsie.c
+++ b/arch/s390/kvm/vsie.c
@@ -1000,8 +1000,6 @@ static int do_vsie_run(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
 
 	handle_last_fault(vcpu, vsie_page);
 
-	if (need_resched())
-		schedule();
 	if (test_cpu_flag(CIF_MCCK_PENDING))
 		s390_handle_mcck();
 
@@ -1185,6 +1183,7 @@ static int vsie_run(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
 		    kvm_s390_vcpu_has_irq(vcpu, 0) ||
 		    kvm_s390_vcpu_sie_inhibited(vcpu))
 			break;
+		cond_resched();
 	}
 
 	if (rc == -EFAULT) {
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v1 5/5] KVM: s390: vsie: gmap_table_walk() simplifications
  2020-04-02 18:48 [PATCH v1 0/5] KVM: s390: vsie: fixes and cleanups David Hildenbrand
                   ` (3 preceding siblings ...)
  2020-04-02 18:48 ` [PATCH v1 4/5] KVM: s390: vsie: Move conditional reschedule David Hildenbrand
@ 2020-04-02 18:48 ` David Hildenbrand
  2020-04-03 13:03   ` Janosch Frank
  4 siblings, 1 reply; 14+ messages in thread
From: David Hildenbrand @ 2020-04-02 18:48 UTC (permalink / raw)
  To: kvm
  Cc: linux-s390, linux-kernel, Vasily Gorbik, Heiko Carstens,
	Cornelia Huck, Janosch Frank, Christian Borntraeger,
	David Hildenbrand

Let's use asce_type where applicable. Also, simplify our sanity check for
valid table levels and convert it into a WARN_ON_ONCE(). Check if we even
have a valid gmap shadow as the very first step.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 arch/s390/mm/gmap.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
index fd32ab566f57..3c801dae7988 100644
--- a/arch/s390/mm/gmap.c
+++ b/arch/s390/mm/gmap.c
@@ -790,17 +790,18 @@ static inline unsigned long *gmap_table_walk(struct gmap *gmap,
 	const int asce_type = gmap->asce & _ASCE_TYPE_MASK;
 	unsigned long *table;
 
-	if ((gmap->asce & _ASCE_TYPE_MASK) + 4 < (level * 4))
-		return NULL;
 	if (gmap_is_shadow(gmap) && gmap->removed)
 		return NULL;
 
+	if (WARN_ON_ONCE(level > (asce_type >> 2) + 1))
+		return NULL;
+
 	if (WARN_ON_ONCE(asce_type != _ASCE_TYPE_REGION1) &&
 			 gaddr & (-1UL << (31 + (asce_type >> 2) * 11)))
 		return NULL;
 
 	table = gmap->table;
-	switch (gmap->asce & _ASCE_TYPE_MASK) {
+	switch (asce_type) {
 	case _ASCE_TYPE_REGION1:
 		table += (gaddr & _REGION1_INDEX) >> _REGION1_SHIFT;
 		if (level == 4)
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v1 1/5] KVM: s390: vsie: Fix region 1 ASCE sanity shadow address checks
  2020-04-02 18:48 ` [PATCH v1 1/5] KVM: s390: vsie: Fix region 1 ASCE sanity shadow address checks David Hildenbrand
@ 2020-04-03 12:07   ` David Hildenbrand
  2020-04-03 13:41   ` Janosch Frank
  1 sibling, 0 replies; 14+ messages in thread
From: David Hildenbrand @ 2020-04-03 12:07 UTC (permalink / raw)
  To: kvm
  Cc: linux-s390, linux-kernel, Vasily Gorbik, Heiko Carstens,
	Cornelia Huck, Janosch Frank, Christian Borntraeger, stable

On 02.04.20 20:48, David Hildenbrand wrote:
> In case we have a region 1 ASCE, our shadow/g3 address can have any value.
> Unfortunately, (-1UL << 64) is undefined and triggers sometimes,
> rejecting valid shadow addresses when trying to walk our shadow table
> hierarchy.
> 
> The result is that the prefix cannot get mapped and will loop basically
> forever trying to map it (-EAGAIN loop).
> 
> After all, the broken check is only a sanity check, our table shadowing
> code in kvm_s390_shadow_tables() already checks these conditions, injecting
> proper translation exceptions. Turn it into a WARN_ON_ONCE().
> 
> Fixes: 4be130a08420 ("s390/mm: add shadow gmap support")
> Cc: <stable@vger.kernel.org> # v4.8+
> Reported-by: Janosch Frank <frankja@linux.ibm.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  arch/s390/mm/gmap.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
> index 2fbece47ef6f..f3dbc5bdde50 100644
> --- a/arch/s390/mm/gmap.c
> +++ b/arch/s390/mm/gmap.c
> @@ -787,14 +787,18 @@ static void gmap_call_notifier(struct gmap *gmap, unsigned long start,
>  static inline unsigned long *gmap_table_walk(struct gmap *gmap,
>  					     unsigned long gaddr, int level)
>  {
> +	const int asce_type = gmap->asce & _ASCE_TYPE_MASK;
>  	unsigned long *table;
>  
>  	if ((gmap->asce & _ASCE_TYPE_MASK) + 4 < (level * 4))
>  		return NULL;
>  	if (gmap_is_shadow(gmap) && gmap->removed)
>  		return NULL;
> -	if (gaddr & (-1UL << (31 + ((gmap->asce & _ASCE_TYPE_MASK) >> 2)*11)))
> +
> +	if (WARN_ON_ONCE(asce_type != _ASCE_TYPE_REGION1) &&
> +			 gaddr & (-1UL << (31 + (asce_type >> 2) * 11)))
>  		return NULL;

This has to be

if (WARN_ON_ONCE(asce_type != _ASCE_TYPE_REGION1 &&
		 gaddr & (-1UL << (31 + (asce_type >> 2) * 11))))

otherwise we'll get wrong warnings

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v1 5/5] KVM: s390: vsie: gmap_table_walk() simplifications
  2020-04-02 18:48 ` [PATCH v1 5/5] KVM: s390: vsie: gmap_table_walk() simplifications David Hildenbrand
@ 2020-04-03 13:03   ` Janosch Frank
  2020-04-03 13:09     ` David Hildenbrand
  0 siblings, 1 reply; 14+ messages in thread
From: Janosch Frank @ 2020-04-03 13:03 UTC (permalink / raw)
  To: David Hildenbrand, kvm
  Cc: linux-s390, linux-kernel, Vasily Gorbik, Heiko Carstens,
	Cornelia Huck, Christian Borntraeger


[-- Attachment #1.1: Type: text/plain, Size: 1387 bytes --]

On 4/2/20 8:48 PM, David Hildenbrand wrote:
> Let's use asce_type where applicable. Also, simplify our sanity check for
> valid table levels and convert it into a WARN_ON_ONCE(). Check if we even
> have a valid gmap shadow as the very first step.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  arch/s390/mm/gmap.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
> index fd32ab566f57..3c801dae7988 100644
> --- a/arch/s390/mm/gmap.c
> +++ b/arch/s390/mm/gmap.c
> @@ -790,17 +790,18 @@ static inline unsigned long *gmap_table_walk(struct gmap *gmap,
>  	const int asce_type = gmap->asce & _ASCE_TYPE_MASK;
>  	unsigned long *table;
>  
> -	if ((gmap->asce & _ASCE_TYPE_MASK) + 4 < (level * 4))
> -		return NULL;
>  	if (gmap_is_shadow(gmap) && gmap->removed)
>  		return NULL;
>  
> +	if (WARN_ON_ONCE(level > (asce_type >> 2) + 1))
> +		return NULL;
> +
>  	if (WARN_ON_ONCE(asce_type != _ASCE_TYPE_REGION1) &&
>  			 gaddr & (-1UL << (31 + (asce_type >> 2) * 11)))
>  		return NULL;
>  
>  	table = gmap->table;

We could also initialize this variable at the top, no?

> -	switch (gmap->asce & _ASCE_TYPE_MASK) {
> +	switch (asce_type) {
>  	case _ASCE_TYPE_REGION1:
>  		table += (gaddr & _REGION1_INDEX) >> _REGION1_SHIFT;
>  		if (level == 4)
> 



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v1 5/5] KVM: s390: vsie: gmap_table_walk() simplifications
  2020-04-03 13:03   ` Janosch Frank
@ 2020-04-03 13:09     ` David Hildenbrand
  0 siblings, 0 replies; 14+ messages in thread
From: David Hildenbrand @ 2020-04-03 13:09 UTC (permalink / raw)
  To: Janosch Frank, kvm
  Cc: linux-s390, linux-kernel, Vasily Gorbik, Heiko Carstens,
	Cornelia Huck, Christian Borntraeger

On 03.04.20 15:03, Janosch Frank wrote:
> On 4/2/20 8:48 PM, David Hildenbrand wrote:
>> Let's use asce_type where applicable. Also, simplify our sanity check for
>> valid table levels and convert it into a WARN_ON_ONCE(). Check if we even
>> have a valid gmap shadow as the very first step.
>>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>>  arch/s390/mm/gmap.c | 7 ++++---
>>  1 file changed, 4 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
>> index fd32ab566f57..3c801dae7988 100644
>> --- a/arch/s390/mm/gmap.c
>> +++ b/arch/s390/mm/gmap.c
>> @@ -790,17 +790,18 @@ static inline unsigned long *gmap_table_walk(struct gmap *gmap,
>>  	const int asce_type = gmap->asce & _ASCE_TYPE_MASK;
>>  	unsigned long *table;
>>  
>> -	if ((gmap->asce & _ASCE_TYPE_MASK) + 4 < (level * 4))
>> -		return NULL;
>>  	if (gmap_is_shadow(gmap) && gmap->removed)
>>  		return NULL;
>>  
>> +	if (WARN_ON_ONCE(level > (asce_type >> 2) + 1))
>> +		return NULL;
>> +
>>  	if (WARN_ON_ONCE(asce_type != _ASCE_TYPE_REGION1) &&
>>  			 gaddr & (-1UL << (31 + (asce_type >> 2) * 11)))
>>  		return NULL;
>>  
>>  	table = gmap->table;
> 
> We could also initialize this variable at the top, no?

very right, add that.


-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v1 1/5] KVM: s390: vsie: Fix region 1 ASCE sanity shadow address checks
  2020-04-02 18:48 ` [PATCH v1 1/5] KVM: s390: vsie: Fix region 1 ASCE sanity shadow address checks David Hildenbrand
  2020-04-03 12:07   ` David Hildenbrand
@ 2020-04-03 13:41   ` Janosch Frank
  1 sibling, 0 replies; 14+ messages in thread
From: Janosch Frank @ 2020-04-03 13:41 UTC (permalink / raw)
  To: David Hildenbrand, kvm
  Cc: linux-s390, linux-kernel, Vasily Gorbik, Heiko Carstens,
	Cornelia Huck, Christian Borntraeger, stable


[-- Attachment #1.1: Type: text/plain, Size: 2010 bytes --]

On 4/2/20 8:48 PM, David Hildenbrand wrote:
> In case we have a region 1 ASCE, our shadow/g3 address can have any value.
> Unfortunately, (-1UL << 64) is undefined and triggers sometimes,
> rejecting valid shadow addresses when trying to walk our shadow table
> hierarchy.
> 
> The result is that the prefix cannot get mapped and will loop basically
> forever trying to map it (-EAGAIN loop).
> 
> After all, the broken check is only a sanity check, our table shadowing
> code in kvm_s390_shadow_tables() already checks these conditions, injecting
> proper translation exceptions. Turn it into a WARN_ON_ONCE().
> 
> Fixes: 4be130a08420 ("s390/mm: add shadow gmap support")
> Cc: <stable@vger.kernel.org> # v4.8+
> Reported-by: Janosch Frank <frankja@linux.ibm.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>

With the WARN_ON_ONCE fix applied I don't run into stalls or warnings
anymore, so:
Tested-by: Janosch Frank <frankja@linux.ibm.com>

> ---
>  arch/s390/mm/gmap.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
> index 2fbece47ef6f..f3dbc5bdde50 100644
> --- a/arch/s390/mm/gmap.c
> +++ b/arch/s390/mm/gmap.c
> @@ -787,14 +787,18 @@ static void gmap_call_notifier(struct gmap *gmap, unsigned long start,
>  static inline unsigned long *gmap_table_walk(struct gmap *gmap,
>  					     unsigned long gaddr, int level)
>  {
> +	const int asce_type = gmap->asce & _ASCE_TYPE_MASK;
>  	unsigned long *table;
>  
>  	if ((gmap->asce & _ASCE_TYPE_MASK) + 4 < (level * 4))
>  		return NULL;
>  	if (gmap_is_shadow(gmap) && gmap->removed)
>  		return NULL;
> -	if (gaddr & (-1UL << (31 + ((gmap->asce & _ASCE_TYPE_MASK) >> 2)*11)))
> +
> +	if (WARN_ON_ONCE(asce_type != _ASCE_TYPE_REGION1) &&
> +			 gaddr & (-1UL << (31 + (asce_type >> 2) * 11)))
>  		return NULL;
> +
>  	table = gmap->table;
>  	switch (gmap->asce & _ASCE_TYPE_MASK) {
>  	case _ASCE_TYPE_REGION1:
> 



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v1 2/5] KVM: s390: vsie: Fix delivery of addressing exceptions
  2020-04-02 18:48 ` [PATCH v1 2/5] KVM: s390: vsie: Fix delivery of addressing exceptions David Hildenbrand
@ 2020-04-06 13:17   ` Christian Borntraeger
  2020-04-06 13:22     ` David Hildenbrand
  0 siblings, 1 reply; 14+ messages in thread
From: Christian Borntraeger @ 2020-04-06 13:17 UTC (permalink / raw)
  To: David Hildenbrand, kvm
  Cc: linux-s390, linux-kernel, Vasily Gorbik, Heiko Carstens,
	Cornelia Huck, Janosch Frank, stable



On 02.04.20 20:48, David Hildenbrand wrote:
> Whenever we get an -EFAULT, we failed to read in guest 2 physical
> address space. Such addressing exceptions are reported via a program
> intercept to the nested hypervisor.
> 
> We faked the intercept, we have to return to guest 2. Instead, right
> now we would be returning -EFAULT from the intercept handler, eventually
> crashing the VM.
> 
> Addressing exceptions can only happen if the g2->g3 page tables
> reference invalid g2 addresses (say, either a table or the final page is
> not accessible - so something that basically never happens in sane
> environments.
> 
> Identified by manual code inspection.
> 
> Fixes: a3508fbe9dc6 ("KVM: s390: vsie: initial support for nested virtualization")
> Cc: <stable@vger.kernel.org> # v4.8+
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  arch/s390/kvm/vsie.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/arch/s390/kvm/vsie.c b/arch/s390/kvm/vsie.c
> index 076090f9e666..4f6c22d72072 100644
> --- a/arch/s390/kvm/vsie.c
> +++ b/arch/s390/kvm/vsie.c
> @@ -1202,6 +1202,7 @@ static int vsie_run(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
>  		scb_s->iprcc = PGM_ADDRESSING;
>  		scb_s->pgmilc = 4;
>  		scb_s->gpsw.addr = __rewind_psw(scb_s->gpsw, 4);
> +		rc = 1;


kvm_s390_handle_vsie has 

 return rc < 0 ? rc : 0;


so rc = 0 would result in the same behaviour, correct?
Since we DO handle everything as we should, why rc = 1 ?


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v1 2/5] KVM: s390: vsie: Fix delivery of addressing exceptions
  2020-04-06 13:17   ` Christian Borntraeger
@ 2020-04-06 13:22     ` David Hildenbrand
  2020-04-06 13:24       ` Christian Borntraeger
  0 siblings, 1 reply; 14+ messages in thread
From: David Hildenbrand @ 2020-04-06 13:22 UTC (permalink / raw)
  To: Christian Borntraeger, kvm
  Cc: linux-s390, linux-kernel, Vasily Gorbik, Heiko Carstens,
	Cornelia Huck, Janosch Frank, stable

On 06.04.20 15:17, Christian Borntraeger wrote:
> 
> 
> On 02.04.20 20:48, David Hildenbrand wrote:
>> Whenever we get an -EFAULT, we failed to read in guest 2 physical
>> address space. Such addressing exceptions are reported via a program
>> intercept to the nested hypervisor.
>>
>> We faked the intercept, we have to return to guest 2. Instead, right
>> now we would be returning -EFAULT from the intercept handler, eventually
>> crashing the VM.
>>
>> Addressing exceptions can only happen if the g2->g3 page tables
>> reference invalid g2 addresses (say, either a table or the final page is
>> not accessible - so something that basically never happens in sane
>> environments.
>>
>> Identified by manual code inspection.
>>
>> Fixes: a3508fbe9dc6 ("KVM: s390: vsie: initial support for nested virtualization")
>> Cc: <stable@vger.kernel.org> # v4.8+
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>>  arch/s390/kvm/vsie.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/arch/s390/kvm/vsie.c b/arch/s390/kvm/vsie.c
>> index 076090f9e666..4f6c22d72072 100644
>> --- a/arch/s390/kvm/vsie.c
>> +++ b/arch/s390/kvm/vsie.c
>> @@ -1202,6 +1202,7 @@ static int vsie_run(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
>>  		scb_s->iprcc = PGM_ADDRESSING;
>>  		scb_s->pgmilc = 4;
>>  		scb_s->gpsw.addr = __rewind_psw(scb_s->gpsw, 4);
>> +		rc = 1;
> 
> 
> kvm_s390_handle_vsie has 
> 
>  return rc < 0 ? rc : 0;
> 
> 
> so rc = 0 would result in the same behaviour, correct?

yes

> Since we DO handle everything as we should, why rc = 1 ?

rc == 1 is the internal representation of "we have to go back into g2".
rc == 0, in contrast, means "we can go back into g2 (via a NULL
intercept) or continue executing g3". Returning rc == 1 instead of rc ==
0 at this point is just consistency.

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v1 2/5] KVM: s390: vsie: Fix delivery of addressing exceptions
  2020-04-06 13:22     ` David Hildenbrand
@ 2020-04-06 13:24       ` Christian Borntraeger
  0 siblings, 0 replies; 14+ messages in thread
From: Christian Borntraeger @ 2020-04-06 13:24 UTC (permalink / raw)
  To: David Hildenbrand, kvm
  Cc: linux-s390, linux-kernel, Vasily Gorbik, Heiko Carstens,
	Cornelia Huck, Janosch Frank, stable



On 06.04.20 15:22, David Hildenbrand wrote:
> On 06.04.20 15:17, Christian Borntraeger wrote:
>>
>>
>> On 02.04.20 20:48, David Hildenbrand wrote:
>>> Whenever we get an -EFAULT, we failed to read in guest 2 physical
>>> address space. Such addressing exceptions are reported via a program
>>> intercept to the nested hypervisor.
>>>
>>> We faked the intercept, we have to return to guest 2. Instead, right
>>> now we would be returning -EFAULT from the intercept handler, eventually
>>> crashing the VM.
>>>
>>> Addressing exceptions can only happen if the g2->g3 page tables
>>> reference invalid g2 addresses (say, either a table or the final page is
>>> not accessible - so something that basically never happens in sane
>>> environments.
>>>
>>> Identified by manual code inspection.
>>>
>>> Fixes: a3508fbe9dc6 ("KVM: s390: vsie: initial support for nested virtualization")
>>> Cc: <stable@vger.kernel.org> # v4.8+
>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>> ---
>>>  arch/s390/kvm/vsie.c | 1 +
>>>  1 file changed, 1 insertion(+)
>>>
>>> diff --git a/arch/s390/kvm/vsie.c b/arch/s390/kvm/vsie.c
>>> index 076090f9e666..4f6c22d72072 100644
>>> --- a/arch/s390/kvm/vsie.c
>>> +++ b/arch/s390/kvm/vsie.c
>>> @@ -1202,6 +1202,7 @@ static int vsie_run(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
>>>  		scb_s->iprcc = PGM_ADDRESSING;
>>>  		scb_s->pgmilc = 4;
>>>  		scb_s->gpsw.addr = __rewind_psw(scb_s->gpsw, 4);
>>> +		rc = 1;
>>
>>
>> kvm_s390_handle_vsie has 
>>
>>  return rc < 0 ? rc : 0;
>>
>>
>> so rc = 0 would result in the same behaviour, correct?
> 
> yes
> 
>> Since we DO handle everything as we should, why rc = 1 ?
> 
> rc == 1 is the internal representation of "we have to go back into g2".
> rc == 0, in contrast, means "we can go back into g2 (via a NULL
> intercept) or continue executing g3". Returning rc == 1 instead of rc ==
> 0 at this point is just consistency.

Ok, I will add something to the patch description.
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v1 3/5] KVM: s390: vsie: Fix possible race when shadowing region 3 tables
  2020-04-02 18:48 ` [PATCH v1 3/5] KVM: s390: vsie: Fix possible race when shadowing region 3 tables David Hildenbrand
@ 2020-04-06 13:47   ` Christian Borntraeger
  0 siblings, 0 replies; 14+ messages in thread
From: Christian Borntraeger @ 2020-04-06 13:47 UTC (permalink / raw)
  To: David Hildenbrand, kvm
  Cc: linux-s390, linux-kernel, Vasily Gorbik, Heiko Carstens,
	Cornelia Huck, Janosch Frank



On 02.04.20 20:48, David Hildenbrand wrote:
> We have to properly retry again by returning -EINVAL immediately in case
> somebody else instantiated the table concurrently. We missed to add the
> goto in this function only. The code now matches the other, similar
> shadowing functions.
> 
> We are overwriting an existing region 2 table entry. All allocated pages
> are added to the crst_list to be freed later, so they are not lost
> forever. However, when unshadowing the region 2 table, we wouldn't trigger
> unshadowing of the original shadowed region 3 table that we replaced. It
> would get unshadowed when the original region 3 table is modified. As it's
> not connected to the page table hierarchy anymore, it's not going to get
> used anymore. However, for a limited time, this page table will stick
> around, so it's in some sense a temporary memory leak.
> 
> Identified by manual code inspection. I don't think this classifies as
> stable material.
> 
> Fixes: 998f637cc4b9 ("s390/mm: avoid races on region/segment/page table shadowing")
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  arch/s390/mm/gmap.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
> index f3dbc5bdde50..fd32ab566f57 100644
> --- a/arch/s390/mm/gmap.c
> +++ b/arch/s390/mm/gmap.c
> @@ -1844,6 +1844,7 @@ int gmap_shadow_r3t(struct gmap *sg, unsigned long saddr, unsigned long r3t,
>  		goto out_free;
>  	} else if (*table & _REGION_ENTRY_ORIGIN) {
>  		rc = -EAGAIN;		/* Race with shadow */
> +		goto out_free;
>  	}
>  	crst_table_init(s_r3t, _REGION3_ENTRY_EMPTY);
>  	/* mark as invalid as long as the parent table is not protected */

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2020-04-06 13:47 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-02 18:48 [PATCH v1 0/5] KVM: s390: vsie: fixes and cleanups David Hildenbrand
2020-04-02 18:48 ` [PATCH v1 1/5] KVM: s390: vsie: Fix region 1 ASCE sanity shadow address checks David Hildenbrand
2020-04-03 12:07   ` David Hildenbrand
2020-04-03 13:41   ` Janosch Frank
2020-04-02 18:48 ` [PATCH v1 2/5] KVM: s390: vsie: Fix delivery of addressing exceptions David Hildenbrand
2020-04-06 13:17   ` Christian Borntraeger
2020-04-06 13:22     ` David Hildenbrand
2020-04-06 13:24       ` Christian Borntraeger
2020-04-02 18:48 ` [PATCH v1 3/5] KVM: s390: vsie: Fix possible race when shadowing region 3 tables David Hildenbrand
2020-04-06 13:47   ` Christian Borntraeger
2020-04-02 18:48 ` [PATCH v1 4/5] KVM: s390: vsie: Move conditional reschedule David Hildenbrand
2020-04-02 18:48 ` [PATCH v1 5/5] KVM: s390: vsie: gmap_table_walk() simplifications David Hildenbrand
2020-04-03 13:03   ` Janosch Frank
2020-04-03 13:09     ` David Hildenbrand

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).