All of lore.kernel.org
 help / color / mirror / Atom feed
* KVM: x86: workaround SuSE's 2.6.16 pvclock vs masterclock issue
@ 2015-01-20 17:54 Marcelo Tosatti
  2015-01-20 19:40 ` Paolo Bonzini
  2015-01-21 14:09 ` Radim Krčmář
  0 siblings, 2 replies; 11+ messages in thread
From: Marcelo Tosatti @ 2015-01-20 17:54 UTC (permalink / raw)
  To: kvm-devel; +Cc: Paolo Bonzini


SuSE's 2.6.16 kernel fails to boot if the delta between tsc_timestamp
and rdtsc is larger than a given threshold:

 * If we get more than the below threshold into the future, we rerequest
 * the real time from the host again which has only little offset then
 * that we need to adjust using the TSC.
 *
 * For now that threshold is 1/5th of a jiffie. That should be good
 * enough accuracy for completely broken systems, but also give us swing
 * to not call out to the host all the time.
 */
#define PVCLOCK_DELTA_MAX ((1000000000ULL / HZ) / 5)

Disable masterclock support (which increases said delta) in case the
boot vcpu does not use MSR_KVM_SYSTEM_TIME_NEW.

Upstreams kernels which support pvclock vsyscalls (and therefore make
use of PVCLOCK_STABLE_BIT) use MSR_KVM_SYSTEM_TIME_NEW.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 7c492ed..9a099f6 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -613,6 +613,8 @@ struct kvm_arch {
 	#ifdef CONFIG_KVM_MMU_AUDIT
 	int audit_point;
 	#endif
+
+	bool boot_vcpu_runs_old_kvmclock;
 };
 
 struct kvm_vm_stat {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8f1e22d..1d8a4f6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1494,7 +1494,8 @@ static void pvclock_update_vm_gtod_copy(struct kvm *kvm)
 					&ka->master_cycle_now);
 
 	ka->use_master_clock = host_tsc_clocksource && vcpus_matched
-				&& !backwards_tsc_observed;
+				&& !backwards_tsc_observed
+				&& !ka->boot_vcpu_runs_old_kvmclock;
 
 	if (ka->use_master_clock)
 		atomic_set(&kvm_guest_has_master_clock, 1);
@@ -2106,8 +2107,20 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case MSR_KVM_SYSTEM_TIME_NEW:
 	case MSR_KVM_SYSTEM_TIME: {
 		u64 gpa_offset;
+		struct kvm_arch *ka = &vcpu->kvm->arch;
+
 		kvmclock_reset(vcpu);
 
+		if (vcpu->vcpu_id == 0 && !msr_info->host_initiated) {
+			bool tmp = (msr == MSR_KVM_SYSTEM_TIME);
+
+			if (ka->boot_vcpu_runs_old_kvmclock != tmp)
+				set_bit(KVM_REQ_MASTERCLOCK_UPDATE,
+					&vcpu->requests);
+
+			ka->boot_vcpu_runs_old_kvmclock = tmp;
+		}
+
 		vcpu->arch.time = data;
 		kvm_make_request(KVM_REQ_GLOBAL_CLOCK_UPDATE, vcpu);
 

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: KVM: x86: workaround SuSE's 2.6.16 pvclock vs masterclock issue
  2015-01-20 17:54 KVM: x86: workaround SuSE's 2.6.16 pvclock vs masterclock issue Marcelo Tosatti
@ 2015-01-20 19:40 ` Paolo Bonzini
  2015-01-21 14:09 ` Radim Krčmář
  1 sibling, 0 replies; 11+ messages in thread
From: Paolo Bonzini @ 2015-01-20 19:40 UTC (permalink / raw)
  To: Marcelo Tosatti, kvm-devel



On 20/01/2015 18:54, Marcelo Tosatti wrote:
> 
> SuSE's 2.6.16 kernel fails to boot if the delta between tsc_timestamp
> and rdtsc is larger than a given threshold:
> 
>  * If we get more than the below threshold into the future, we rerequest
>  * the real time from the host again which has only little offset then
>  * that we need to adjust using the TSC.
>  *
>  * For now that threshold is 1/5th of a jiffie. That should be good
>  * enough accuracy for completely broken systems, but also give us swing
>  * to not call out to the host all the time.
>  */
> #define PVCLOCK_DELTA_MAX ((1000000000ULL / HZ) / 5)
> 
> Disable masterclock support (which increases said delta) in case the
> boot vcpu does not use MSR_KVM_SYSTEM_TIME_NEW.

Makes sense.

Applied to queue, thanks.

Paolo

> Upstreams kernels which support pvclock vsyscalls (and therefore make
> use of PVCLOCK_STABLE_BIT) use MSR_KVM_SYSTEM_TIME_NEW.
> 
> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 7c492ed..9a099f6 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -613,6 +613,8 @@ struct kvm_arch {
>  	#ifdef CONFIG_KVM_MMU_AUDIT
>  	int audit_point;
>  	#endif
> +
> +	bool boot_vcpu_runs_old_kvmclock;
>  };
>  
>  struct kvm_vm_stat {
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 8f1e22d..1d8a4f6 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1494,7 +1494,8 @@ static void pvclock_update_vm_gtod_copy(struct kvm *kvm)
>  					&ka->master_cycle_now);
>  
>  	ka->use_master_clock = host_tsc_clocksource && vcpus_matched
> -				&& !backwards_tsc_observed;
> +				&& !backwards_tsc_observed
> +				&& !ka->boot_vcpu_runs_old_kvmclock;
>  
>  	if (ka->use_master_clock)
>  		atomic_set(&kvm_guest_has_master_clock, 1);
> @@ -2106,8 +2107,20 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  	case MSR_KVM_SYSTEM_TIME_NEW:
>  	case MSR_KVM_SYSTEM_TIME: {
>  		u64 gpa_offset;
> +		struct kvm_arch *ka = &vcpu->kvm->arch;
> +
>  		kvmclock_reset(vcpu);
>  
> +		if (vcpu->vcpu_id == 0 && !msr_info->host_initiated) {
> +			bool tmp = (msr == MSR_KVM_SYSTEM_TIME);
> +
> +			if (ka->boot_vcpu_runs_old_kvmclock != tmp)
> +				set_bit(KVM_REQ_MASTERCLOCK_UPDATE,
> +					&vcpu->requests);
> +
> +			ka->boot_vcpu_runs_old_kvmclock = tmp;
> +		}
> +
>  		vcpu->arch.time = data;
>  		kvm_make_request(KVM_REQ_GLOBAL_CLOCK_UPDATE, vcpu);
>  
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: KVM: x86: workaround SuSE's 2.6.16 pvclock vs masterclock issue
  2015-01-20 17:54 KVM: x86: workaround SuSE's 2.6.16 pvclock vs masterclock issue Marcelo Tosatti
  2015-01-20 19:40 ` Paolo Bonzini
@ 2015-01-21 14:09 ` Radim Krčmář
  2015-01-21 14:16   ` Marcelo Tosatti
  1 sibling, 1 reply; 11+ messages in thread
From: Radim Krčmář @ 2015-01-21 14:09 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: kvm-devel, Paolo Bonzini

2015-01-20 15:54-0200, Marcelo Tosatti:
> SuSE's 2.6.16 kernel fails to boot if the delta between tsc_timestamp
> and rdtsc is larger than a given threshold:
[...]
> Disable masterclock support (which increases said delta) in case the
> boot vcpu does not use MSR_KVM_SYSTEM_TIME_NEW.

Why do we care about 2.6.16 bugs in upstream KVM?

The code to benefit tradeoff of this patch seems bad to me ...
MSR_KVM_SYSTEM_TIME is deprecated -- we could remove it now, with
MSR_KVM_WALL_CLOCK (after hiding KVM_FEATURE_CLOCKSOURCE) if we want
to support old guests.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: KVM: x86: workaround SuSE's 2.6.16 pvclock vs masterclock issue
  2015-01-21 14:09 ` Radim Krčmář
@ 2015-01-21 14:16   ` Marcelo Tosatti
  2015-01-21 17:00     ` Radim Krčmář
  0 siblings, 1 reply; 11+ messages in thread
From: Marcelo Tosatti @ 2015-01-21 14:16 UTC (permalink / raw)
  To: Radim Krčmář; +Cc: kvm-devel, Paolo Bonzini

On Wed, Jan 21, 2015 at 03:09:27PM +0100, Radim Krčmář wrote:
> 2015-01-20 15:54-0200, Marcelo Tosatti:
> > SuSE's 2.6.16 kernel fails to boot if the delta between tsc_timestamp
> > and rdtsc is larger than a given threshold:
> [...]
> > Disable masterclock support (which increases said delta) in case the
> > boot vcpu does not use MSR_KVM_SYSTEM_TIME_NEW.
> 
> Why do we care about 2.6.16 bugs in upstream KVM?

Because people do use 2.6.16 guests.

> The code to benefit tradeoff of this patch seems bad to me ...

Can you state the tradeoff and then explain why it is bad ?

> MSR_KVM_SYSTEM_TIME is deprecated -- we could remove it now, with
> MSR_KVM_WALL_CLOCK (after hiding KVM_FEATURE_CLOCKSOURCE) if we want
> to support old guests.

What is the benefit of removing support for MSR_KVM_SYSTEM_TIME ?

Supporting old guests is important.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: KVM: x86: workaround SuSE's 2.6.16 pvclock vs masterclock issue
  2015-01-21 14:16   ` Marcelo Tosatti
@ 2015-01-21 17:00     ` Radim Krčmář
  2015-01-22  1:40       ` Marcelo Tosatti
  2015-01-22  8:10       ` Paolo Bonzini
  0 siblings, 2 replies; 11+ messages in thread
From: Radim Krčmář @ 2015-01-21 17:00 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: kvm-devel, Paolo Bonzini

2015-01-21 12:16-0200, Marcelo Tosatti:
> On Wed, Jan 21, 2015 at 03:09:27PM +0100, Radim Krčmář wrote:
> > 2015-01-20 15:54-0200, Marcelo Tosatti:
> > > SuSE's 2.6.16 kernel fails to boot if the delta between tsc_timestamp
> > > and rdtsc is larger than a given threshold:
> > [...]
> > > Disable masterclock support (which increases said delta) in case the
> > > boot vcpu does not use MSR_KVM_SYSTEM_TIME_NEW.
> > 
> > Why do we care about 2.6.16 bugs in upstream KVM?
> 
> Because people do use 2.6.16 guests.

(Those people probably won't use 3.19+ host ...
 Is this patch intended for stable?)

> > The code to benefit tradeoff of this patch seems bad to me ...
> 
> Can you state the tradeoff and then explain why it is bad ?

Additional code needs time to understand and is a source of bugs,
yet we still include it because we want to achieve something.

I meant the tradeoff between perceived value of "something" and
acceptability of the code.  (Ideally, computer programs would be a
shorter version of "Do what I want.\nEOF".)

There are three main points that made think it is bad,
1) The bug happens because a guest expects greater precision.
   I consider that as a guest problem.  kvmclock never guaranteed
   anything, so unmet expectations should be a recoverable error.

2) With time, the probability that 2.6.16 is used is getting lower,
   while people looking at KVM's code appear.
   - At what point are we going to drop 2.6.16 support?
     (We shouldn't let mistakes drag us down forever ...
      Or are we dooming KVM on purpose?)

3) The patch made me ask more silly questions than it answered :)
   (Why can't other software depend on previous behavior?
    Why can't kvmclock without master clock still fail?
    Why can't we improve the master clock?)

> > MSR_KVM_SYSTEM_TIME is deprecated -- we could remove it now, with
> > MSR_KVM_WALL_CLOCK (after hiding KVM_FEATURE_CLOCKSOURCE) if we want
> > to support old guests.
> 
> What is the benefit of removing support for MSR_KVM_SYSTEM_TIME ?

The maintainability of the code increases.  It would look as if we never
made the mistake with MSR_KVM_SYSTEM_TIME & MSR_KVM_WALL_CLOCK.
(I like when old code looks as if we wrote it from scratch.)

After comparing the (imperfectly evaluated) benefit of both variants,
 original patch:
   + 2.6.16 SUSE guests work
   - MSR_KVM_SYSTEM_TIME guests don't use master clock
   - KVM code is worse
 removal of KVM_FEATURE_CLOCKSOURCE:
   + 2.6.16 SUSE guests likely work
   + KVM code is better
   - MSR_KVM_SYSTEM_TIME guests use even worse clocksource

As KVM_FEATURE_CLOCKSOURCE2 was introduced in 2010, I found the removal
better even without waiting for the last MSR_KVM_SYSTEM_TIME guest to
perish.

> Supporting old guests is important.

It comes at a price.
(Mutually exclusive goals are important as well.)

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: KVM: x86: workaround SuSE's 2.6.16 pvclock vs masterclock issue
  2015-01-21 17:00     ` Radim Krčmář
@ 2015-01-22  1:40       ` Marcelo Tosatti
  2015-01-22 13:59         ` Radim Krčmář
  2015-01-22  8:10       ` Paolo Bonzini
  1 sibling, 1 reply; 11+ messages in thread
From: Marcelo Tosatti @ 2015-01-22  1:40 UTC (permalink / raw)
  To: Radim Krčmář; +Cc: kvm-devel, Paolo Bonzini

On Wed, Jan 21, 2015 at 06:00:37PM +0100, Radim Krčmář wrote:
> 2015-01-21 12:16-0200, Marcelo Tosatti:
> > On Wed, Jan 21, 2015 at 03:09:27PM +0100, Radim Krčmář wrote:
> > > 2015-01-20 15:54-0200, Marcelo Tosatti:
> > > > SuSE's 2.6.16 kernel fails to boot if the delta between tsc_timestamp
> > > > and rdtsc is larger than a given threshold:
> > > [...]
> > > > Disable masterclock support (which increases said delta) in case the
> > > > boot vcpu does not use MSR_KVM_SYSTEM_TIME_NEW.
> > > 
> > > Why do we care about 2.6.16 bugs in upstream KVM?
> > 
> > Because people do use 2.6.16 guests.
> 
> (Those people probably won't use 3.19+ host ...
>  Is this patch intended for stable?)

Yes.

> > > The code to benefit tradeoff of this patch seems bad to me ...
> > 
> > Can you state the tradeoff and then explain why it is bad ?
> 
> Additional code needs time to understand and is a source of bugs,
> yet we still include it because we want to achieve something.
> 
> I meant the tradeoff between perceived value of "something" and
> acceptability of the code.  (Ideally, computer programs would be a
> shorter version of "Do what I want.\nEOF".)
> 
> There are three main points that made think it is bad,
> 1) The bug happens because a guest expects greater precision.
>    I consider that as a guest problem.  kvmclock never guaranteed
>    anything, so unmet expectations should be a recoverable error.

delta = pvclock_data.tsc_timestamp - RDTSC

Guest expects delta to be smaller than a given threshold. It does
not expect greater precision.

Size of delta does not affect precision.

> 2) With time, the probability that 2.6.16 is used is getting lower,
>    while people looking at KVM's code appear.
>    - At what point are we going to drop 2.6.16 support?
>      (We shouldn't let mistakes drag us down forever ...
>       Or are we dooming KVM on purpose?)

One of the features of virtualization is to be able to run old 
operating systems?

> 3) The patch made me ask more silly questions than it answered :)
>    (Why can't other software depend on previous behavior?

Documentation/virtual/kvm/msr.txt:

        whose data will be filled in by the hypervisor periodically.
        Only one write, or registration, is needed for each VCPU. The interval
        between updates of this structure is arbitrary and implementation-dependent.
        The hypervisor may update this structure at any time it sees fit until
        anything with bit0 == 0 is written to it.

>     Why can't kvmclock without master clock still fail?

It can, given a loaded system.

>     Why can't we improve the master clock?)

Out of context question.

> > > MSR_KVM_SYSTEM_TIME is deprecated -- we could remove it now, with
> > > MSR_KVM_WALL_CLOCK (after hiding KVM_FEATURE_CLOCKSOURCE) if we want
> > > to support old guests.
> > 
> > What is the benefit of removing support for MSR_KVM_SYSTEM_TIME ?
> 
> The maintainability of the code increases.  It would look as if we never
> made the mistake with MSR_KVM_SYSTEM_TIME & MSR_KVM_WALL_CLOCK.
> (I like when old code looks as if we wrote it from scratch.)
> 
> After comparing the (imperfectly evaluated) benefit of both variants,
>  original patch:
>    + 2.6.16 SUSE guests work
>    - MSR_KVM_SYSTEM_TIME guests don't use master clock
>    - KVM code is worse
>  removal of KVM_FEATURE_CLOCKSOURCE:
>    + 2.6.16 SUSE guests likely work

All guests which depend on KVM_FEATURE_CLOCKSOURCE will timedrift.

>    + KVM code is better
>    - MSR_KVM_SYSTEM_TIME guests use even worse clocksource
> 
> As KVM_FEATURE_CLOCKSOURCE2 was introduced in 2010, I found the removal
> better even without waiting for the last MSR_KVM_SYSTEM_TIME guest to
> perish.
> 
> > Supporting old guests is important.
> 
> It comes at a price.
> (Mutually exclusive goals are important as well.)

This phrase is awkward. Overlapping goals are negative,
then? (think of a large number of totally overlapping goals).



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: KVM: x86: workaround SuSE's 2.6.16 pvclock vs masterclock issue
  2015-01-21 17:00     ` Radim Krčmář
  2015-01-22  1:40       ` Marcelo Tosatti
@ 2015-01-22  8:10       ` Paolo Bonzini
  2015-01-22 17:02         ` Radim Krčmář
  1 sibling, 1 reply; 11+ messages in thread
From: Paolo Bonzini @ 2015-01-22  8:10 UTC (permalink / raw)
  To: Radim Krčmář, Marcelo Tosatti; +Cc: kvm-devel



On 21/01/2015 18:00, Radim Krčmář wrote:
> 2015-01-21 12:16-0200, Marcelo Tosatti:
>> On Wed, Jan 21, 2015 at 03:09:27PM +0100, Radim Krčmář wrote:
>>> 2015-01-20 15:54-0200, Marcelo Tosatti:
>>>> SuSE's 2.6.16 kernel fails to boot if the delta between tsc_timestamp
>>>> and rdtsc is larger than a given threshold:
>>> [...]
>>>> Disable masterclock support (which increases said delta) in case the
>>>> boot vcpu does not use MSR_KVM_SYSTEM_TIME_NEW.
>>>
>>> Why do we care about 2.6.16 bugs in upstream KVM?
>>
>> Because people do use 2.6.16 guests.
> 
> (Those people probably won't use 3.19+ host ...

Why not? If you are a cloud provider, you cannot really know what guests
your customer run.

Also, the masterclock support has been in there for a while.  It was not
working as well as it could, which is why we noticed it only now, but
still it was broken before as well.  1/5th of a jiffy is a really really
low amount.

>>> MSR_KVM_SYSTEM_TIME is deprecated -- we could remove it now, with
>>> MSR_KVM_WALL_CLOCK (after hiding KVM_FEATURE_CLOCKSOURCE) if we want
>>> to support old guests.
>>
>> What is the benefit of removing support for MSR_KVM_SYSTEM_TIME ?
> 
> The maintainability of the code increases.  It would look as if we never
> made the mistake with MSR_KVM_SYSTEM_TIME & MSR_KVM_WALL_CLOCK.
> (I like when old code looks as if we wrote it from scratch.)

Everybody does, and everybody obsesses over splitting patches so that
they look as if the code had been split that way from scratch.  Heck, I
probably spend over half of my development time inside "git rebase -i".

But it's just not how reality works, and it must show sooner or later.
:)  Anyway, it's just two case labels or so (before this patch).

>> Supporting old guests is important.
> 
> It comes at a price.
> (Mutually exclusive goals are important as well.)

Marcelo's patch is not too high a price.  Is it ugly?  Yes.  Could it be
any better?  No, because the ugliness is not his fault, it's intrinsic
in the problem it solves.

Paolo

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: KVM: x86: workaround SuSE's 2.6.16 pvclock vs masterclock issue
  2015-01-22  1:40       ` Marcelo Tosatti
@ 2015-01-22 13:59         ` Radim Krčmář
  2015-01-22 18:12           ` Marcelo Tosatti
  0 siblings, 1 reply; 11+ messages in thread
From: Radim Krčmář @ 2015-01-22 13:59 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: kvm-devel, Paolo Bonzini

2015-01-21 23:40-0200, Marcelo Tosatti:
> On Wed, Jan 21, 2015 at 06:00:37PM +0100, Radim Krčmář wrote:
> > 1) The bug happens because a guest expects greater precision.
> >    I consider that as a guest problem.  kvmclock never guaranteed
> >    anything, so unmet expectations should be a recoverable error.
> 
> delta = pvclock_data.tsc_timestamp - RDTSC
> 
> Guest expects delta to be smaller than a given threshold. It does
> not expect greater precision.
> 
> Size of delta does not affect precision.

I don't understand what the guest wants to achieve with the delta.
I thought that checking this only made sense if the guest didn't believe
that PV clock works with large delta.  And they only want precision.
(What else is there on a clock?)

Disclaimer: I haven't read the code. (It wasn't in vanilla 2.6.16.)

> > 2) With time, the probability that 2.6.16 is used is getting lower,
> >    while people looking at KVM's code appear.
> >    - At what point are we going to drop 2.6.16 support?
> >      (We shouldn't let mistakes drag us down forever ...
> >       Or are we dooming KVM on purpose?)
> 
> One of the features of virtualization is to be able to run old 
> operating systems?

True, I'll assign higher priority to it.

> > 3) The patch made me ask more silly questions than it answered :)
> >    (Why can't other software depend on previous behavior?
> 
> Documentation/virtual/kvm/msr.txt:
> 
>         whose data will be filled in by the hypervisor periodically.
>         Only one write, or registration, is needed for each VCPU. The interval
>         between updates of this structure is arbitrary and implementation-dependent.
>         The hypervisor may update this structure at any time it sees fit until
>         anything with bit0 == 0 is written to it.

Exactly, this made me think it is not a KVM problem.
(And I wondered why wouldn't we yield to other misuses of it.)

> > > Supporting old guests is important.
> > 
> > It comes at a price.
> > (Mutually exclusive goals are important as well.)
> 
> This phrase is awkward. Overlapping goals are negative,
> then? (think of a large number of totally overlapping goals).

Even if both mutually exclusive goals are positive, we can only choose
one.  (Sorry, I don't see the neccessity between overlapping goals and
negativity.)

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: KVM: x86: workaround SuSE's 2.6.16 pvclock vs masterclock issue
  2015-01-22  8:10       ` Paolo Bonzini
@ 2015-01-22 17:02         ` Radim Krčmář
  0 siblings, 0 replies; 11+ messages in thread
From: Radim Krčmář @ 2015-01-22 17:02 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Marcelo Tosatti, kvm-devel

2015-01-22 09:10+0100, Paolo Bonzini:
> On 21/01/2015 18:00, Radim Krčmář wrote:
> > 2015-01-21 12:16-0200, Marcelo Tosatti:
> >> On Wed, Jan 21, 2015 at 03:09:27PM +0100, Radim Krčmář wrote:
> >>> 2015-01-20 15:54-0200, Marcelo Tosatti:
> >>>> SuSE's 2.6.16 kernel fails to boot if the delta between tsc_timestamp
> >>>> and rdtsc is larger than a given threshold:
> >>> [...]
> >>>> Disable masterclock support (which increases said delta) in case the
> >>>> boot vcpu does not use MSR_KVM_SYSTEM_TIME_NEW.
> >>>
> >>> Why do we care about 2.6.16 bugs in upstream KVM?
> >>
> >> Because people do use 2.6.16 guests.
> > 
> > (Those people probably won't use 3.19+ host ...
> 
> Why not? If you are a cloud provider, you cannot really know what guests
> your customer run.

People running decade old kernels are likely conservative and changing
the host is unsafe too.  (This bug was introduced later.)
I doubt they would risk VMs on a cloud that doesn't ensure stability.

(It's a weak reason, I should have argued that the buggy guest code
 wasn't in Linux 2.6.16 and probably only dwells in a distribution whose
 general support ended on 2013-07-31.)

> >> What is the benefit of removing support for MSR_KVM_SYSTEM_TIME ?
> > 
> > The maintainability of the code increases.  It would look as if we never
> > made the mistake with MSR_KVM_SYSTEM_TIME & MSR_KVM_WALL_CLOCK.
> > (I like when old code looks as if we wrote it from scratch.)
> 
> Everybody does, and everybody obsesses over splitting patches so that
> they look as if the code had been split that way from scratch.  Heck, I
> probably spend over half of my development time inside "git rebase -i".

(+ most of the rest is verification and testing :)

> But it's just not how reality works, and it must show sooner or later.

(Yeah, I am keenly observing and trying the predict the outcome.)

> >> Supporting old guests is important.
> > 
> > It comes at a price.
> > (Mutually exclusive goals are important as well.)
> 
> Marcelo's patch is not too high a price.  Is it ugly?  Yes.  Could it be
> any better?  No, because the ugliness is not his fault, it's intrinsic
> in the problem it solves.

Agreed, I've learned the circumstances that make it the best solution.
(I didn't acknowledge it as a problem and documentation states that
 KVM_FEATURE_CLOCKSOURCE "may be removed in the future".
 The the-future in the future.)

Thanks to you and Marcelo.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: KVM: x86: workaround SuSE's 2.6.16 pvclock vs masterclock issue
  2015-01-22 13:59         ` Radim Krčmář
@ 2015-01-22 18:12           ` Marcelo Tosatti
  2015-01-22 19:33             ` Radim Krčmář
  0 siblings, 1 reply; 11+ messages in thread
From: Marcelo Tosatti @ 2015-01-22 18:12 UTC (permalink / raw)
  To: Radim Krčmář; +Cc: kvm-devel, Paolo Bonzini

On Thu, Jan 22, 2015 at 02:59:28PM +0100, Radim Krčmář wrote:
> 2015-01-21 23:40-0200, Marcelo Tosatti:
> > On Wed, Jan 21, 2015 at 06:00:37PM +0100, Radim Krčmář wrote:
> > > 1) The bug happens because a guest expects greater precision.
> > >    I consider that as a guest problem.  kvmclock never guaranteed
> > >    anything, so unmet expectations should be a recoverable error.
> > 
> > delta = pvclock_data.tsc_timestamp - RDTSC
> > 
> > Guest expects delta to be smaller than a given threshold. It does
> > not expect greater precision.
> > 
> > Size of delta does not affect precision.
> 
> I don't understand what the guest wants to achieve with the delta.

Neither do I. It seems to assume that TSC delta is unreliable, while 
system_timestamp is reliable.

Therefore if TSC delta is large, request a system_timestamp update,
which keeps TSC delta small.

> I thought that checking this only made sense if the guest didn't believe
> that PV clock works with large delta.  And they only want precision.
> (What else is there on a clock?)

Ok right they assumed TSC was not reliable?

> Disclaimer: I haven't read the code. (It wasn't in vanilla 2.6.16.)
> 
> > > 2) With time, the probability that 2.6.16 is used is getting lower,
> > >    while people looking at KVM's code appear.
> > >    - At what point are we going to drop 2.6.16 support?
> > >      (We shouldn't let mistakes drag us down forever ...
> > >       Or are we dooming KVM on purpose?)
> > 
> > One of the features of virtualization is to be able to run old 
> > operating systems?
> 
> True, I'll assign higher priority to it.
> 
> > > 3) The patch made me ask more silly questions than it answered :)
> > >    (Why can't other software depend on previous behavior?
> > 
> > Documentation/virtual/kvm/msr.txt:
> > 
> >         whose data will be filled in by the hypervisor periodically.
> >         Only one write, or registration, is needed for each VCPU. The interval
> >         between updates of this structure is arbitrary and implementation-dependent.
> >         The hypervisor may update this structure at any time it sees fit until
> >         anything with bit0 == 0 is written to it.
> 
> Exactly, this made me think it is not a KVM problem.
> (And I wondered why wouldn't we yield to other misuses of it.)
> 
> > > > Supporting old guests is important.
> > > 
> > > It comes at a price.
> > > (Mutually exclusive goals are important as well.)
> > 
> > This phrase is awkward. Overlapping goals are negative,
> > then? (think of a large number of totally overlapping goals).
> 
> Even if both mutually exclusive goals are positive, we can only choose
> one.  (Sorry, I don't see the neccessity between overlapping goals and
> negativity.)

I get your point about "Mutually exclusive goals". Just being annoying.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: KVM: x86: workaround SuSE's 2.6.16 pvclock vs masterclock issue
  2015-01-22 18:12           ` Marcelo Tosatti
@ 2015-01-22 19:33             ` Radim Krčmář
  0 siblings, 0 replies; 11+ messages in thread
From: Radim Krčmář @ 2015-01-22 19:33 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: kvm-devel, Paolo Bonzini

2015-01-22 16:12-0200, Marcelo Tosatti:
> On Thu, Jan 22, 2015 at 02:59:28PM +0100, Radim Krčmář wrote:
> > I don't understand what the guest wants to achieve with the delta.
> 
> Neither do I. It seems to assume that TSC delta is unreliable, while 
> system_timestamp is reliable.
> 
> Therefore if TSC delta is large, request a system_timestamp update,
> which keeps TSC delta small.

I was wondering how they do that ... kvm clock has like 1 MSR interface;
multiple registrations are supposed to work?

> > I thought that checking this only made sense if the guest didn't believe
> > that PV clock works with large delta.  And they only want precision.
> > (What else is there on a clock?)
> 
> Ok right they assumed TSC was not reliable?

Most likely.  (And that doing complicated magic is going to help it --
doesn't VMENTRY do pretty much everyting that can be done with
unreliable TSC?)

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2015-01-22 19:33 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-01-20 17:54 KVM: x86: workaround SuSE's 2.6.16 pvclock vs masterclock issue Marcelo Tosatti
2015-01-20 19:40 ` Paolo Bonzini
2015-01-21 14:09 ` Radim Krčmář
2015-01-21 14:16   ` Marcelo Tosatti
2015-01-21 17:00     ` Radim Krčmář
2015-01-22  1:40       ` Marcelo Tosatti
2015-01-22 13:59         ` Radim Krčmář
2015-01-22 18:12           ` Marcelo Tosatti
2015-01-22 19:33             ` Radim Krčmář
2015-01-22  8:10       ` Paolo Bonzini
2015-01-22 17:02         ` Radim Krčmář

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.