All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH v2 0/1] KVM: s390: pv: fix clock comparator late after suspend/resume
@ 2022-08-29  7:56 Nico Boehr
  2022-08-29  7:56 ` [RFC PATCH v2 1/1] KVM: s390: pv: don't allow userspace to set the clock under PV Nico Boehr
  0 siblings, 1 reply; 6+ messages in thread
From: Nico Boehr @ 2022-08-29  7:56 UTC (permalink / raw)
  To: kvm; +Cc: frankja, imbrenda, borntraeger

v1->v2:
---
- fix broken migration due to deadlock

After a PV guest in QEMU has been paused and resumed, clock comparator
interrupts are delivered to the guest much too late.

This is caused by QEMU's tod-kvm device restoring the guest's TOD clock
upon guest resume. This is not possible with PV, since the guest's TOD
clock is controlled by the ultravisor.

Even if not allowed under PV, KVM allowed the respective call from
userspace (VM attribute KVM_S390_VM_TOD) and updated its internal data
structures on this call. This can make the ultravisor's and KVM's view
of the guest TOD clock inconsistent. This in turn can lead to the late
delivery of clock comparator interrupts when KVM calculates when to wake
the guest.

This fixes the kernel portion of the problem by disallowing the vm attr
call for the guest TOD clock so userspace cannot mess up KVM's view of
the guest TOD. This fix causes an ugly warning in QEMU though, hence
another fix is due for QEMU to simply not even attempt to set the guest
TOD on resume.

Nico Boehr (1):
  KVM: s390: pv: don't allow userspace to set the clock under PV

 arch/s390/kvm/kvm-s390.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

-- 
2.36.1


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [RFC PATCH v2 1/1] KVM: s390: pv: don't allow userspace to set the clock under PV
  2022-08-29  7:56 [RFC PATCH v2 0/1] KVM: s390: pv: fix clock comparator late after suspend/resume Nico Boehr
@ 2022-08-29  7:56 ` Nico Boehr
  2022-08-31 13:26   ` Nico Boehr
                     ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Nico Boehr @ 2022-08-29  7:56 UTC (permalink / raw)
  To: kvm; +Cc: frankja, imbrenda, borntraeger

When running under PV, the guest's TOD clock is under control of the
ultravisor and the hypervisor isn't allowed to change it. Hence, don't
allow userspace to change the guest's TOD clock by returning
-EOPNOTSUPP.

When userspace changes the guest's TOD clock, KVM updates its
kvm.arch.epoch field and, in addition, the epoch field in all state
descriptions of all VCPUs.

But, under PV, the ultravisor will ignore the epoch field in the state
description and simply overwrite it on next SIE exit with the actual
guest epoch. This leads to KVM having an incorrect view of the guest's
TOD clock: it has updated its internal kvm.arch.epoch field, but the
ultravisor ignores the field in the state description.

Whenever a guest is now waiting for a clock comparator, KVM will
incorrectly calculate the time when the guest should wake up, possibly
causing the guest to sleep for much longer than expected.

With this change, kvm_s390_set_tod() will now take the kvm->lock to be
able to call kvm_s390_pv_is_protected(). Since kvm_s390_set_tod_clock()
also takes kvm->lock, use __kvm_s390_set_tod_clock() instead.

Fixes: 0f3035047140 ("KVM: s390: protvirt: Do only reset registers that are accessible")
Signed-off-by: Nico Boehr <nrb@linux.ibm.com>
---
 arch/s390/kvm/kvm-s390.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index edfd4bbd0cba..f4ee88e787fe 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -1207,6 +1207,8 @@ static int kvm_s390_vm_get_migration(struct kvm *kvm,
 	return 0;
 }
 
+static void __kvm_s390_set_tod_clock(struct kvm *kvm, const struct kvm_s390_vm_tod_clock *gtod);
+
 static int kvm_s390_set_tod_ext(struct kvm *kvm, struct kvm_device_attr *attr)
 {
 	struct kvm_s390_vm_tod_clock gtod;
@@ -1216,7 +1218,7 @@ static int kvm_s390_set_tod_ext(struct kvm *kvm, struct kvm_device_attr *attr)
 
 	if (!test_kvm_facility(kvm, 139) && gtod.epoch_idx)
 		return -EINVAL;
-	kvm_s390_set_tod_clock(kvm, &gtod);
+	__kvm_s390_set_tod_clock(kvm, &gtod);
 
 	VM_EVENT(kvm, 3, "SET: TOD extension: 0x%x, TOD base: 0x%llx",
 		gtod.epoch_idx, gtod.tod);
@@ -1247,7 +1249,7 @@ static int kvm_s390_set_tod_low(struct kvm *kvm, struct kvm_device_attr *attr)
 			   sizeof(gtod.tod)))
 		return -EFAULT;
 
-	kvm_s390_set_tod_clock(kvm, &gtod);
+	__kvm_s390_set_tod_clock(kvm, &gtod);
 	VM_EVENT(kvm, 3, "SET: TOD base: 0x%llx", gtod.tod);
 	return 0;
 }
@@ -1259,6 +1261,12 @@ static int kvm_s390_set_tod(struct kvm *kvm, struct kvm_device_attr *attr)
 	if (attr->flags)
 		return -EINVAL;
 
+	mutex_lock(&kvm->lock);
+	if (kvm_s390_pv_is_protected(kvm)) {
+		ret = -EOPNOTSUPP;
+		goto out_unlock;
+	}
+
 	switch (attr->attr) {
 	case KVM_S390_VM_TOD_EXT:
 		ret = kvm_s390_set_tod_ext(kvm, attr);
@@ -1273,6 +1281,9 @@ static int kvm_s390_set_tod(struct kvm *kvm, struct kvm_device_attr *attr)
 		ret = -ENXIO;
 		break;
 	}
+
+out_unlock:
+	mutex_unlock(&kvm->lock);
 	return ret;
 }
 
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH v2 1/1] KVM: s390: pv: don't allow userspace to set the clock under PV
  2022-08-29  7:56 ` [RFC PATCH v2 1/1] KVM: s390: pv: don't allow userspace to set the clock under PV Nico Boehr
@ 2022-08-31 13:26   ` Nico Boehr
  2022-09-20 11:07   ` Christian Borntraeger
  2022-10-05 15:08   ` Claudio Imbrenda
  2 siblings, 0 replies; 6+ messages in thread
From: Nico Boehr @ 2022-08-31 13:26 UTC (permalink / raw)
  To: kvm; +Cc: frankja, imbrenda, borntraeger, Marc Hartmayer

Quoting Nico Boehr (2022-08-29 09:56:02)
> When running under PV, the guest's TOD clock is under control of the
> ultravisor and the hypervisor isn't allowed to change it. Hence, don't
> allow userspace to change the guest's TOD clock by returning
> -EOPNOTSUPP.
> 
> When userspace changes the guest's TOD clock, KVM updates its
> kvm.arch.epoch field and, in addition, the epoch field in all state
> descriptions of all VCPUs.
> 
> But, under PV, the ultravisor will ignore the epoch field in the state
> description and simply overwrite it on next SIE exit with the actual
> guest epoch. This leads to KVM having an incorrect view of the guest's
> TOD clock: it has updated its internal kvm.arch.epoch field, but the
> ultravisor ignores the field in the state description.
> 
> Whenever a guest is now waiting for a clock comparator, KVM will
> incorrectly calculate the time when the guest should wake up, possibly
> causing the guest to sleep for much longer than expected.
> 
> With this change, kvm_s390_set_tod() will now take the kvm->lock to be
> able to call kvm_s390_pv_is_protected(). Since kvm_s390_set_tod_clock()
> also takes kvm->lock, use __kvm_s390_set_tod_clock() instead.
> 
> Fixes: 0f3035047140 ("KVM: s390: protvirt: Do only reset registers that are accessible")

I missed a 

Reported-by: Marc Hartmayer <mhartmay@linux.ibm.com>

here. Sorry Marc.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH v2 1/1] KVM: s390: pv: don't allow userspace to set the clock under PV
  2022-08-29  7:56 ` [RFC PATCH v2 1/1] KVM: s390: pv: don't allow userspace to set the clock under PV Nico Boehr
  2022-08-31 13:26   ` Nico Boehr
@ 2022-09-20 11:07   ` Christian Borntraeger
  2022-10-05 13:55     ` Nico Boehr
  2022-10-05 15:08   ` Claudio Imbrenda
  2 siblings, 1 reply; 6+ messages in thread
From: Christian Borntraeger @ 2022-09-20 11:07 UTC (permalink / raw)
  To: Nico Boehr, kvm; +Cc: frankja, imbrenda

Am 29.08.22 um 09:56 schrieb Nico Boehr:
> When running under PV, the guest's TOD clock is under control of the
> ultravisor and the hypervisor isn't allowed to change it. Hence, don't
> allow userspace to change the guest's TOD clock by returning
> -EOPNOTSUPP.
> 
> When userspace changes the guest's TOD clock, KVM updates its
> kvm.arch.epoch field and, in addition, the epoch field in all state
> descriptions of all VCPUs.
> 
> But, under PV, the ultravisor will ignore the epoch field in the state
> description and simply overwrite it on next SIE exit with the actual
> guest epoch. This leads to KVM having an incorrect view of the guest's
> TOD clock: it has updated its internal kvm.arch.epoch field, but the
> ultravisor ignores the field in the state description.
> 
> Whenever a guest is now waiting for a clock comparator, KVM will
> incorrectly calculate the time when the guest should wake up, possibly
> causing the guest to sleep for much longer than expected.
> 
> With this change, kvm_s390_set_tod() will now take the kvm->lock to be
> able to call kvm_s390_pv_is_protected(). Since kvm_s390_set_tod_clock()
> also takes kvm->lock, use __kvm_s390_set_tod_clock() instead.
> 
> Fixes: 0f3035047140 ("KVM: s390: protvirt: Do only reset registers that are accessible")
> Signed-off-by: Nico Boehr <nrb@linux.ibm.com>

Has this patch already been sent to the upstream list yet and have we solved all existing problems of the previous version?
> ---
>   arch/s390/kvm/kvm-s390.c | 15 +++++++++++++--
>   1 file changed, 13 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> index edfd4bbd0cba..f4ee88e787fe 100644
> --- a/arch/s390/kvm/kvm-s390.c
> +++ b/arch/s390/kvm/kvm-s390.c
> @@ -1207,6 +1207,8 @@ static int kvm_s390_vm_get_migration(struct kvm *kvm,
>   	return 0;
>   }
>   
> +static void __kvm_s390_set_tod_clock(struct kvm *kvm, const struct kvm_s390_vm_tod_clock *gtod);
> +
>   static int kvm_s390_set_tod_ext(struct kvm *kvm, struct kvm_device_attr *attr)
>   {
>   	struct kvm_s390_vm_tod_clock gtod;
> @@ -1216,7 +1218,7 @@ static int kvm_s390_set_tod_ext(struct kvm *kvm, struct kvm_device_attr *attr)
>   
>   	if (!test_kvm_facility(kvm, 139) && gtod.epoch_idx)
>   		return -EINVAL;
> -	kvm_s390_set_tod_clock(kvm, &gtod);
> +	__kvm_s390_set_tod_clock(kvm, &gtod);
>   
>   	VM_EVENT(kvm, 3, "SET: TOD extension: 0x%x, TOD base: 0x%llx",
>   		gtod.epoch_idx, gtod.tod);
> @@ -1247,7 +1249,7 @@ static int kvm_s390_set_tod_low(struct kvm *kvm, struct kvm_device_attr *attr)
>   			   sizeof(gtod.tod)))
>   		return -EFAULT;
>   
> -	kvm_s390_set_tod_clock(kvm, &gtod);
> +	__kvm_s390_set_tod_clock(kvm, &gtod);
>   	VM_EVENT(kvm, 3, "SET: TOD base: 0x%llx", gtod.tod);
>   	return 0;
>   }
> @@ -1259,6 +1261,12 @@ static int kvm_s390_set_tod(struct kvm *kvm, struct kvm_device_attr *attr)
>   	if (attr->flags)
>   		return -EINVAL;
>   
> +	mutex_lock(&kvm->lock);
> +	if (kvm_s390_pv_is_protected(kvm)) {
> +		ret = -EOPNOTSUPP;
> +		goto out_unlock;
> +	}
> +
>   	switch (attr->attr) {
>   	case KVM_S390_VM_TOD_EXT:
>   		ret = kvm_s390_set_tod_ext(kvm, attr);
> @@ -1273,6 +1281,9 @@ static int kvm_s390_set_tod(struct kvm *kvm, struct kvm_device_attr *attr)
>   		ret = -ENXIO;
>   		break;
>   	}
> +
> +out_unlock:
> +	mutex_unlock(&kvm->lock);
>   	return ret;
>   }
>   

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH v2 1/1] KVM: s390: pv: don't allow userspace to set the clock under PV
  2022-09-20 11:07   ` Christian Borntraeger
@ 2022-10-05 13:55     ` Nico Boehr
  0 siblings, 0 replies; 6+ messages in thread
From: Nico Boehr @ 2022-10-05 13:55 UTC (permalink / raw)
  To: Christian Borntraeger, kvm; +Cc: frankja, imbrenda

Quoting Christian Borntraeger (2022-09-20 13:07:58)
> Am 29.08.22 um 09:56 schrieb Nico Boehr:
> > When running under PV, the guest's TOD clock is under control of the
> > ultravisor and the hypervisor isn't allowed to change it. Hence, don't
> > allow userspace to change the guest's TOD clock by returning
> > -EOPNOTSUPP.
> > 
> > When userspace changes the guest's TOD clock, KVM updates its
> > kvm.arch.epoch field and, in addition, the epoch field in all state
> > descriptions of all VCPUs.
> > 
> > But, under PV, the ultravisor will ignore the epoch field in the state
> > description and simply overwrite it on next SIE exit with the actual
> > guest epoch. This leads to KVM having an incorrect view of the guest's
> > TOD clock: it has updated its internal kvm.arch.epoch field, but the
> > ultravisor ignores the field in the state description.
> > 
> > Whenever a guest is now waiting for a clock comparator, KVM will
> > incorrectly calculate the time when the guest should wake up, possibly
> > causing the guest to sleep for much longer than expected.
> > 
> > With this change, kvm_s390_set_tod() will now take the kvm->lock to be
> > able to call kvm_s390_pv_is_protected(). Since kvm_s390_set_tod_clock()
> > also takes kvm->lock, use __kvm_s390_set_tod_clock() instead.
> > 
> > Fixes: 0f3035047140 ("KVM: s390: protvirt: Do only reset registers that are accessible")
> > Signed-off-by: Nico Boehr <nrb@linux.ibm.com>
> 
> Has this patch already been sent to the upstream list yet and have we solved all existing problems of the previous version?

Yes, the patch works and the issues I identified should be fixed now. I can resend as a non-RFC if you like.

Note that a QEMU change is also needed to silence this message:

warning: Unable to set KVM guest TOD clock: Operation not supported

But I wanted to get the kernel side done first.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH v2 1/1] KVM: s390: pv: don't allow userspace to set the clock under PV
  2022-08-29  7:56 ` [RFC PATCH v2 1/1] KVM: s390: pv: don't allow userspace to set the clock under PV Nico Boehr
  2022-08-31 13:26   ` Nico Boehr
  2022-09-20 11:07   ` Christian Borntraeger
@ 2022-10-05 15:08   ` Claudio Imbrenda
  2 siblings, 0 replies; 6+ messages in thread
From: Claudio Imbrenda @ 2022-10-05 15:08 UTC (permalink / raw)
  To: Nico Boehr; +Cc: kvm, frankja, borntraeger

On Mon, 29 Aug 2022 09:56:02 +0200
Nico Boehr <nrb@linux.ibm.com> wrote:

> When running under PV, the guest's TOD clock is under control of the
> ultravisor and the hypervisor isn't allowed to change it. Hence, don't
> allow userspace to change the guest's TOD clock by returning
> -EOPNOTSUPP.
> 
> When userspace changes the guest's TOD clock, KVM updates its
> kvm.arch.epoch field and, in addition, the epoch field in all state
> descriptions of all VCPUs.
> 
> But, under PV, the ultravisor will ignore the epoch field in the state
> description and simply overwrite it on next SIE exit with the actual
> guest epoch. This leads to KVM having an incorrect view of the guest's
> TOD clock: it has updated its internal kvm.arch.epoch field, but the
> ultravisor ignores the field in the state description.
> 
> Whenever a guest is now waiting for a clock comparator, KVM will
> incorrectly calculate the time when the guest should wake up, possibly
> causing the guest to sleep for much longer than expected.
> 
> With this change, kvm_s390_set_tod() will now take the kvm->lock to be
> able to call kvm_s390_pv_is_protected(). Since kvm_s390_set_tod_clock()
> also takes kvm->lock, use __kvm_s390_set_tod_clock() instead.
> 
> Fixes: 0f3035047140 ("KVM: s390: protvirt: Do only reset registers that are accessible")
> Signed-off-by: Nico Boehr <nrb@linux.ibm.com>
> ---
>  arch/s390/kvm/kvm-s390.c | 15 +++++++++++++--
>  1 file changed, 13 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> index edfd4bbd0cba..f4ee88e787fe 100644
> --- a/arch/s390/kvm/kvm-s390.c
> +++ b/arch/s390/kvm/kvm-s390.c
> @@ -1207,6 +1207,8 @@ static int kvm_s390_vm_get_migration(struct kvm *kvm,
>  	return 0;
>  }
>  
> +static void __kvm_s390_set_tod_clock(struct kvm *kvm, const struct kvm_s390_vm_tod_clock *gtod);

are there any users of kvm_s390_set_tod_clock left after this patch?

maybe that wrapper can be removed (also from the headers), and we can
keep the function with the __

it's perhaps cleaner to have a second patch to remove the unused stuff

> +
>  static int kvm_s390_set_tod_ext(struct kvm *kvm, struct kvm_device_attr *attr)
>  {
>  	struct kvm_s390_vm_tod_clock gtod;
> @@ -1216,7 +1218,7 @@ static int kvm_s390_set_tod_ext(struct kvm *kvm, struct kvm_device_attr *attr)
>  
>  	if (!test_kvm_facility(kvm, 139) && gtod.epoch_idx)
>  		return -EINVAL;
> -	kvm_s390_set_tod_clock(kvm, &gtod);
> +	__kvm_s390_set_tod_clock(kvm, &gtod);
>  
>  	VM_EVENT(kvm, 3, "SET: TOD extension: 0x%x, TOD base: 0x%llx",
>  		gtod.epoch_idx, gtod.tod);
> @@ -1247,7 +1249,7 @@ static int kvm_s390_set_tod_low(struct kvm *kvm, struct kvm_device_attr *attr)
>  			   sizeof(gtod.tod)))
>  		return -EFAULT;
>  
> -	kvm_s390_set_tod_clock(kvm, &gtod);
> +	__kvm_s390_set_tod_clock(kvm, &gtod);
>  	VM_EVENT(kvm, 3, "SET: TOD base: 0x%llx", gtod.tod);
>  	return 0;
>  }
> @@ -1259,6 +1261,12 @@ static int kvm_s390_set_tod(struct kvm *kvm, struct kvm_device_attr *attr)
>  	if (attr->flags)
>  		return -EINVAL;
>  
> +	mutex_lock(&kvm->lock);
> +	if (kvm_s390_pv_is_protected(kvm)) {
> +		ret = -EOPNOTSUPP;
> +		goto out_unlock;
> +	}
> +
>  	switch (attr->attr) {
>  	case KVM_S390_VM_TOD_EXT:
>  		ret = kvm_s390_set_tod_ext(kvm, attr);
> @@ -1273,6 +1281,9 @@ static int kvm_s390_set_tod(struct kvm *kvm, struct kvm_device_attr *attr)
>  		ret = -ENXIO;
>  		break;
>  	}
> +
> +out_unlock:
> +	mutex_unlock(&kvm->lock);
>  	return ret;
>  }
>  


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2022-10-05 15:09 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-29  7:56 [RFC PATCH v2 0/1] KVM: s390: pv: fix clock comparator late after suspend/resume Nico Boehr
2022-08-29  7:56 ` [RFC PATCH v2 1/1] KVM: s390: pv: don't allow userspace to set the clock under PV Nico Boehr
2022-08-31 13:26   ` Nico Boehr
2022-09-20 11:07   ` Christian Borntraeger
2022-10-05 13:55     ` Nico Boehr
2022-10-05 15:08   ` Claudio Imbrenda

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.