* What's kvmclock's custom sched_clock for?
@ 2016-01-07  7:18 Andy Lutomirski
  2016-01-07  8:41 ` Andy Lutomirski
  2016-01-07 10:56 ` Marcelo Tosatti
  0 siblings, 2 replies; 14+ messages in thread
From: Andy Lutomirski @ 2016-01-07  7:18 UTC (permalink / raw)
  To: Marcelo Tosatti, Radim Krcmar, kvm list

AFAICT KVM reliably passes a monotonic TSC through to guests, even if
the host suspends.  That's all that sched_clock needs, I think.

So why does kvmclock have a custom sched_clock?

On a related note, KVM doesn't pass the "invariant TSC" feature
through to guests on my machine even though "invtsc" is set in QEMU
and the kernel host code appears to support it.  What gives?

--Andy


* Re: What's kvmclock's custom sched_clock for?
  2016-01-07  7:18 What's kvmclock's custom sched_clock for? Andy Lutomirski
@ 2016-01-07  8:41 ` Andy Lutomirski
  2016-01-07 10:59   ` Marcelo Tosatti
  2016-01-07 15:18   ` Radim Krcmar
  2016-01-07 10:56 ` Marcelo Tosatti
  1 sibling, 2 replies; 14+ messages in thread
From: Andy Lutomirski @ 2016-01-07  8:41 UTC (permalink / raw)
  To: Marcelo Tosatti, Radim Krcmar, kvm list

On Wed, Jan 6, 2016 at 11:18 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> AFAICT KVM reliably passes a monotonic TSC through to guests, even if
> the host suspends.  That's all that sched_clock needs, I think.
>
> So why does kvmclock have a custom sched_clock?
>
> On a related note, KVM doesn't pass the "invariant TSC" feature
> through to guests on my machine even though "invtsc" is set in QEMU
> and the kernel host code appears to support it.  What gives?

I think I solved part of the puzzle.  KVM doesn't like to advertise
invtsc by default because that breaks migration.  (Oddly, the end
result seems wrong -- with migration, the TSC doesn't stop, but it's
not constant, and X86_FEATURE_CONSTANT_TSC is nonetheless set, but
whatever.)  So the scheduler clock doesn't get marked stable.

Is that it?

This still doesn't explain why even explicitly trying to set invtsc
doesn't seem to work.

--Andy


* Re: What's kvmclock's custom sched_clock for?
  2016-01-07  7:18 What's kvmclock's custom sched_clock for? Andy Lutomirski
  2016-01-07  8:41 ` Andy Lutomirski
@ 2016-01-07 10:56 ` Marcelo Tosatti
  1 sibling, 0 replies; 14+ messages in thread
From: Marcelo Tosatti @ 2016-01-07 10:56 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: Radim Krcmar, kvm list

On Wed, Jan 06, 2016 at 11:18:51PM -0800, Andy Lutomirski wrote:
> AFAICT KVM reliably passes a monotonic TSC through to guests, 

It does not.

> even if the host suspends. That's all that sched_clock needs, I think.
>
> So why does kvmclock have a custom sched_clock?

Migration between hosts with different TSC frequencies.

> On a related note, KVM doesn't pass the "invariant TSC" feature
> through to guests on my machine even though "invtsc" is set in QEMU
> and the kernel host code appears to support it.  What gives?
> 
> --Andy


* Re: What's kvmclock's custom sched_clock for?
  2016-01-07  8:41 ` Andy Lutomirski
@ 2016-01-07 10:59   ` Marcelo Tosatti
  2016-01-07 15:18   ` Radim Krcmar
  1 sibling, 0 replies; 14+ messages in thread
From: Marcelo Tosatti @ 2016-01-07 10:59 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: Radim Krcmar, kvm list

On Thu, Jan 07, 2016 at 12:41:34AM -0800, Andy Lutomirski wrote:
> On Wed, Jan 6, 2016 at 11:18 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> > AFAICT KVM reliably passes a monotonic TSC through to guests, even if
> > the host suspends.  That's all that sched_clock needs, I think.
> >
> > So why does kvmclock have a custom sched_clock?
> >
> > On a related note, KVM doesn't pass the "invariant TSC" feature
> > through to guests on my machine even though "invtsc" is set in QEMU
> > and the kernel host code appears to support it.  What gives?
> 
> I think I solved part of the puzzle.  KVM doesn't like to advertise
> invtsc by default because that breaks migration.  (Oddly, the end
> result seems wrong -- with migration, the TSC doesn't stop, but it's
> not constant, and X86_FEATURE_CONSTANT_TSC is nonetheless set, but
> whatever.)  So the scheduler clock doesn't get marked stable.

Can you break down this sentence?

QEMU commit 68bfd0ad4a1dcc4c328d5db85dc746b49c1ec07e

    target-i386: block migration and savevm if invariant tsc is exposed
    
    Invariant TSC documentation mentions that "invariant TSC will run at a
    constant rate in all ACPI P-, C-. and T-states".
    
    This is not the case if migration to a host with different TSC frequency
    is allowed, or if savevm is performed. So block migration/savevm.


> Is that it?
> 
> This still doesn't explain why even explicitly trying to set invtsc
> doesn't seem to work.
> 
> --Andy


* Re: What's kvmclock's custom sched_clock for?
  2016-01-07  8:41 ` Andy Lutomirski
  2016-01-07 10:59   ` Marcelo Tosatti
@ 2016-01-07 15:18   ` Radim Krcmar
  2016-01-07 17:27     ` Andy Lutomirski
  2016-01-07 20:10     ` Marcelo Tosatti
  1 sibling, 2 replies; 14+ messages in thread
From: Radim Krcmar @ 2016-01-07 15:18 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: Marcelo Tosatti, kvm list

2016-01-07 00:41-0800, Andy Lutomirski:
> On Wed, Jan 6, 2016 at 11:18 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>> AFAICT KVM reliably passes a monotonic TSC through to guests, even if
>> the host suspends.  That's all that sched_clock needs, I think.
>>
>> So why does kvmclock have a custom sched_clock?

If the host CPU has enough features, then yes, KVM can take care of
everything and kvmclock has no advantage over TSC, even when migrating
to a host whose TSC runs at a different frequency, as modern CPUs
support TSC offset + scaling in guests.

The problem is with antiques.  Guests on old CPUs need to have more
information on top of TSC to be able to get useful system time.
And old KVM doesn't provide good information, so we have legacy layers
everywhere.

kvmclock in the guest can simply be equal to rdtsc() on modern CPUs, but
we still want to use the kvmclock wrapper, because kvmclock can provide a
stable clock regardless of the underlying TSC (in theory).
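
To make the "wrapper" concrete, here is a minimal sketch of the
arithmetic a guest does to turn a TSC reading into nanoseconds from the
per-vCPU parameters the host publishes.  The field names follow the
pvclock ABI; scale_for_hz() is a hypothetical helper for illustration
and is not the routine the host really uses to choose its multiplier
and shift.

```python
from dataclasses import dataclass

MASK64 = (1 << 64) - 1  # emulate unsigned 64-bit arithmetic


@dataclass
class PvclockInfo:
    tsc_timestamp: int      # guest TSC value when system_time was sampled
    system_time: int        # nanoseconds at tsc_timestamp
    tsc_to_system_mul: int  # 32.32 fixed-point multiplier
    tsc_shift: int          # signed pre-shift applied to the TSC delta


def pvclock_ns(info, tsc):
    """Convert a raw TSC reading to nanoseconds using the host's parameters."""
    delta = (tsc - info.tsc_timestamp) & MASK64
    if info.tsc_shift >= 0:
        delta = (delta << info.tsc_shift) & MASK64
    else:
        delta >>= -info.tsc_shift
    return info.system_time + ((delta * info.tsc_to_system_mul) >> 32)


def scale_for_hz(tsc_hz):
    """Hypothetical helper: pick (mul, shift) with mul below 2**32."""
    shift = 0
    while (10**9 << 32) // (tsc_hz << shift) >= 1 << 32:
        shift += 1
    return (10**9 << 32) // (tsc_hz << shift), shift


# Before migration: a 2 GHz TSC.  One second of ticks is one second of time.
mul, shift = scale_for_hz(2_000_000_000)
before = PvclockInfo(tsc_timestamp=1000, system_time=5_000_000,
                     tsc_to_system_mul=mul, tsc_shift=shift)
ns_before = pvclock_ns(before, 1000 + 2_000_000_000)

# After migration to a 3 GHz host only the published parameters change.
mul, shift = scale_for_hz(3_000_000_000)
after = PvclockInfo(tsc_timestamp=0, system_time=ns_before,
                    tsc_to_system_mul=mul, tsc_shift=shift)
ns_after = pvclock_ns(after, 3_000_000_000)
```

The guest code path stays the same across the frequency change; only
the published multiplier, shift and base values differ, which is the
indirection the wrapper buys on hosts without TSC scaling.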

>> On a related note, KVM doesn't pass the "invariant TSC" feature
>> through to guests on my machine even though "invtsc" is set in QEMU
>> and the kernel host code appears to support it.  What gives?
> 
> I think I solved part of the puzzle.  KVM doesn't like to advertise
> invtsc by default because that breaks migration.  (Oddly, the end
> result seems wrong -- with migration, the TSC doesn't stop, but it's
> not constant, and X86_FEATURE_CONSTANT_TSC is nonetheless set, but
> whatever.)

QEMU probably missed that because X86_FEATURE_CONSTANT_TSC is a function
of family/model.  (CONSTANT_TSC is the same as invariant TSC as KVM
guests don't have c-states.)

>             So the scheduler clock doesn't get marked stable.

Stable sched clock is quite unrelated to TSC features.  KVMs from the
last few years should always give a good enough result to allow a stable
sched clock.  We wanted realtime guests, and realtime Linux needs
nohz_full, which depends on a stable sched clock.  The result is a huge
hack.

We'd need to say that migration creates powerful gravity fields to
faithfully migrate constant/invariant TSC, but stable sched clock
doesn't have such strict expectations about time.

> Is that it?
> 
> This still doesn't explain why even explicitly trying to set invtsc
> doesn't seem to work.

Seems like a bug.  My cpuid is
   0x80000007 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000100
and QEMU says
  warning: host doesn't support requested feature: CPUID.80000007H:EDX.invtsc [bit 8]

I'll see if it's in KVM or QEMU.  (We should only forbid migrations to
hosts with different frequency and without guest TSC scaling.)
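
For reference, a small sketch of how the dump above maps to the
feature QEMU complains about: invtsc is bit 8 of EDX in leaf
0x80000007.  The function names are made up for illustration, and the
parser only assumes the "reg=0x..." layout shown in the dump.

```python
import re

INVTSC_BIT = 8  # CPUID.80000007H:EDX bit 8


def parse_cpuid_line(line):
    """Extract the eax/ebx/ecx/edx values from one line of a cpuid dump."""
    return {reg: int(val, 16)
            for reg, val in re.findall(r"(e[abcd]x)=0x([0-9a-fA-F]+)", line)}


def has_invariant_tsc(edx):
    """True if the invariant TSC bit is set in leaf 0x80000007 EDX."""
    return bool((edx >> INVTSC_BIT) & 1)


line = ("0x80000007 0x00: eax=0x00000000 ebx=0x00000000 "
        "ecx=0x00000000 edx=0x00000100")
regs = parse_cpuid_line(line)
host_has_invtsc = has_invariant_tsc(regs["edx"])
```

With edx=0x00000100 the bit is set, so the host does report the
feature even though the warning says otherwise.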


* Re: What's kvmclock's custom sched_clock for?
  2016-01-07 15:18   ` Radim Krcmar
@ 2016-01-07 17:27     ` Andy Lutomirski
  2016-01-07 17:48       ` Radim Krcmar
  2016-01-07 20:15       ` Marcelo Tosatti
  2016-01-07 20:10     ` Marcelo Tosatti
  1 sibling, 2 replies; 14+ messages in thread
From: Andy Lutomirski @ 2016-01-07 17:27 UTC (permalink / raw)
  To: Radim Krcmar; +Cc: Marcelo Tosatti, kvm list

On Thu, Jan 7, 2016 at 2:56 AM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> On Wed, Jan 06, 2016 at 11:18:51PM -0800, Andy Lutomirski wrote:
>> AFAICT KVM reliably passes a monotonic TSC through to guests,
>
> It does not.

Under what circumstances does it go backwards?  All hosts support tsc
offsets, I think, and the host code knows how to prevent the clock
from going backwards even on host suspend.

Does migration make the TSC go backwards?  If so, that's impolite and
it would be nice to fix it.

On Thu, Jan 7, 2016 at 7:18 AM, Radim Krcmar <rkrcmar@redhat.com> wrote:
> 2016-01-07 00:41-0800, Andy Lutomirski:
>> On Wed, Jan 6, 2016 at 11:18 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>>> AFAICT KVM reliably passes a monotonic TSC through to guests, even if
>>> the host suspends.  That's all that sched_clock needs, I think.
>>>
>>> So why does kvmclock have a custom sched_clock?
>
> If the host CPU has enough features, then yes, KVM can take care of
> everything and kvmclock has no advantage over TSC, even when migrating
> to TSC with different frequency as modern CPUs support TSC offset +
> scaling in guests.
>
> The problem is with antiques.  Guests on old CPUs need to have more
> information on top of TSC to be able to get useful system time.
> And old KVM doesn't provide good information, so we have legacy layers
> everywhere.
>
> kvmclock in the guest can just equal to rdtsc() with modern CPUs, but we
> still want to use kvmclock wrapper, because kvmclock can provide an
> stable clock regardless of underlying TSC (in theory).

OK, makes sense.

>
>>> On a related note, KVM doesn't pass the "invariant TSC" feature
>>> through to guests on my machine even though "invtsc" is set in QEMU
>>> and the kernel host code appears to support it.  What gives?
>>
>> I think I solved part of the puzzle.  KVM doesn't like to advertise
>> invtsc by default because that breaks migration.  (Oddly, the end
>> result seems wrong -- with migration, the TSC doesn't stop, but it's
>> not constant, and X86_FEATURE_CONSTANT_TSC is nonetheless set, but
>> whatever.)
>
> QEMU probably missed that because X86_FEATURE_CONSTANT_TSC is a function
> of family/model.  (CONSTANT_TSC is the same as invariant TSC as KVM
> guests don't have c-states.)
>
>>             So the scheduler clock doesn't get marked stable.
>
> Stable sched clock is quite unrelated to TSC features.  KVMs from last
> few years should always give good enough result to allow stable sched
> clock.  We wanted realtime guests and realtime linux needs no_hz=full
> that depends on stable sched clock.  The result is huge hack.
>
> We'd need to say that migration creates powerful gravity fields to
> faithfully migrate constant/invariant TSC, but stable sched clock
> doesn't have that strict expectations about time.
>
>> Is that it?
>>
>> This still doesn't explain why even explicitly trying to set invtsc
>> doesn't seem to work.
>
> Seems like a bug.  Mine cpuid is
>    0x80000007 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000100
> and QEMU says
>   warning: host doesn't support requested feature: CPUID.80000007H:EDX.invtsc [bit 8]
>
> I'll see if it's in KVM or QEMU.  (We should only forbid migrations to
> hosts with different frequency and without guest TSC scaling.)

If I do -cpu host,migratable=off,+invtsc, then it works.  Maybe QEMU
is just being too strict.  This is Skylake.

-- 
Andy Lutomirski
AMA Capital Management, LLC


* Re: What's kvmclock's custom sched_clock for?
  2016-01-07 17:27     ` Andy Lutomirski
@ 2016-01-07 17:48       ` Radim Krcmar
  2016-01-07 20:15       ` Marcelo Tosatti
  1 sibling, 0 replies; 14+ messages in thread
From: Radim Krcmar @ 2016-01-07 17:48 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: Marcelo Tosatti, kvm list

2016-01-07 09:27-0800, Andy Lutomirski:
> On Thu, Jan 7, 2016 at 7:18 AM, Radim Krcmar <rkrcmar@redhat.com> wrote:
> > 2016-01-07 00:41-0800, Andy Lutomirski:
>>> This still doesn't explain why even explicitly trying to set invtsc
>>> doesn't seem to work.
>>
>> Seems like a bug.  Mine cpuid is
>>    0x80000007 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000100
>> and QEMU says
>>   warning: host doesn't support requested feature: CPUID.80000007H:EDX.invtsc [bit 8]
>>
>> I'll see if it's in KVM or QEMU.  (We should only forbid migrations to
>> hosts with different frequency and without guest TSC scaling.)
> 
> If I do -cpu host,migratable=off,+invtsc, then it works.  Maybe QEMU
> is just being too strict.  This is Skylake.

It does, thanks.  It's mainly a misleading warning then;  stripping
flags at the beginning instead of denying migration later on makes
some sense.


* Re: What's kvmclock's custom sched_clock for?
  2016-01-07 15:18   ` Radim Krcmar
  2016-01-07 17:27     ` Andy Lutomirski
@ 2016-01-07 20:10     ` Marcelo Tosatti
  2016-01-08 14:13       ` Radim Krcmar
  1 sibling, 1 reply; 14+ messages in thread
From: Marcelo Tosatti @ 2016-01-07 20:10 UTC (permalink / raw)
  To: Radim Krcmar; +Cc: Andy Lutomirski, kvm list

On Thu, Jan 07, 2016 at 04:18:11PM +0100, Radim Krcmar wrote:
> 2016-01-07 00:41-0800, Andy Lutomirski:
> > On Wed, Jan 6, 2016 at 11:18 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> >> AFAICT KVM reliably passes a monotonic TSC through to guests, even if
> >> the host suspends.  That's all that sched_clock needs, I think.
> >>
> >> So why does kvmclock have a custom sched_clock?
> 
> If the host CPU has enough features, then yes, KVM can take care of
> everything and kvmclock has no advantage over TSC, even when migrating
> to TSC with different frequency as modern CPUs support TSC offset +
> scaling in guests.
> 
> The problem is with antiques.  Guests on old CPUs need to have more
> information on top of TSC to be able to get useful system time.
> And old KVM doesn't provide good information, so we have legacy layers
> everywhere.
> 
> kvmclock in the guest can just equal to rdtsc() with modern CPUs, but we
> still want to use kvmclock wrapper, because kvmclock can provide an
> stable clock regardless of underlying TSC (in theory).
> 
> >> On a related note, KVM doesn't pass the "invariant TSC" feature
> >> through to guests on my machine even though "invtsc" is set in QEMU
> >> and the kernel host code appears to support it.  What gives?
> > 
> > I think I solved part of the puzzle.  KVM doesn't like to advertise
> > invtsc by default because that breaks migration.  (Oddly, the end
> > result seems wrong -- with migration, the TSC doesn't stop, but it's
> > not constant, and X86_FEATURE_CONSTANT_TSC is nonetheless set, but
> > whatever.)
> 
> QEMU probably missed that because X86_FEATURE_CONSTANT_TSC is a function
> of family/model.  (CONSTANT_TSC is the same as invariant TSC as KVM
> guests don't have c-states.)
> 
> >             So the scheduler clock doesn't get marked stable.
> 
> Stable sched clock is quite unrelated to TSC features.  KVMs from last
> few years should always give good enough result to allow stable sched
> clock.  We wanted realtime guests and realtime linux needs no_hz=full
> that depends on stable sched clock.  The result is huge hack.
> 
> We'd need to say that migration creates powerful gravity fields to
> faithfully migrate constant/invariant TSC, but stable sched clock
> doesn't have that strict expectations about time.

Was that supposed to be a joke? 

> > Is that it?
> > 
> > This still doesn't explain why even explicitly trying to set invtsc
> > doesn't seem to work.
> 
> Seems like a bug.  Mine cpuid is
>    0x80000007 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000100
> and QEMU says
>   warning: host doesn't support requested feature: CPUID.80000007H:EDX.invtsc [bit 8]
> 
> I'll see if it's in KVM or QEMU.  (We should only forbid migrations to
> hosts with different frequency and without guest TSC scaling.)


* Re: What's kvmclock's custom sched_clock for?
  2016-01-07 17:27     ` Andy Lutomirski
  2016-01-07 17:48       ` Radim Krcmar
@ 2016-01-07 20:15       ` Marcelo Tosatti
  1 sibling, 0 replies; 14+ messages in thread
From: Marcelo Tosatti @ 2016-01-07 20:15 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: Radim Krcmar, kvm list

On Thu, Jan 07, 2016 at 09:27:30AM -0800, Andy Lutomirski wrote:
> On Thu, Jan 7, 2016 at 2:56 AM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> > On Wed, Jan 06, 2016 at 11:18:51PM -0800, Andy Lutomirski wrote:
> >> AFAICT KVM reliably passes a monotonic TSC through to guests,
> >
> > It does not.
> 
> Under what circumstances does it go backwards?  All hosts support tsc
> offsets, I think, and the host code knows how to prevent the clock
> from going backwards even on host suspend.
> 
> Does migration make the TSC go backwards?  If so, that's impolite and
> it would be nice to fix it.

TSC clocksource in the host is required for TSC masterclock scheme.

A change from TSC clocksource to a different clocksource, in the host,
invalidates TSC masterclock scheme.

If you change from TSC clocksource to HPET clocksource, for example,
the TSC masterclock scheme stops functioning and it's necessary to
stop exposing PVCLOCK_TSC_STABLE_BIT.
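
As a rough sketch of what the stable flag means on the guest side:
when the flag is set the guest can trust the per-vCPU reading as is,
and when it is cleared the guest has to enforce monotonicity itself by
remembering the largest value returned so far.  The class below is a
single-threaded illustration with a made-up name; the real guest code
does the equivalent with an atomic compare-and-exchange shared across
CPUs.

```python
PVCLOCK_TSC_STABLE_BIT = 1 << 0  # flag bit in the pvclock flags field


class GuestClock:
    """Illustrative guest-side reader; not the kernel implementation."""

    def __init__(self):
        self.last_value = 0  # largest value handed out so far

    def read(self, raw_ns, flags):
        if flags & PVCLOCK_TSC_STABLE_BIT:
            # Host guarantees a consistent clock across vCPUs.
            return raw_ns
        if raw_ns < self.last_value:
            # Another vCPU already returned a later time; do not go back.
            return self.last_value
        self.last_value = raw_ns
        return raw_ns


clock = GuestClock()
readings = [clock.read(100, 0), clock.read(90, 0), clock.read(120, 0)]
```

Here the second reading is clamped to 100 instead of going backwards,
which is the cost the guest pays once the host stops advertising the
flag.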

Please send a fix, your patch is causing breakage.



* Re: What's kvmclock's custom sched_clock for?
  2016-01-07 20:10     ` Marcelo Tosatti
@ 2016-01-08 14:13       ` Radim Krcmar
  2016-01-11 21:00         ` Marcelo Tosatti
  0 siblings, 1 reply; 14+ messages in thread
From: Radim Krcmar @ 2016-01-08 14:13 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Andy Lutomirski, kvm list

2016-01-07 18:10-0200, Marcelo Tosatti:
> On Thu, Jan 07, 2016 at 04:18:11PM +0100, Radim Krcmar wrote:
>> Stable sched clock is quite unrelated to TSC features.  KVMs from last
>> few years should always give good enough result to allow stable sched
>> clock.  We wanted realtime guests and realtime linux needs no_hz=full
>> that depends on stable sched clock.  The result is huge hack.
>> 
>> We'd need to say that migration creates powerful gravity fields to
>> faithfully migrate constant/invariant TSC, but stable sched clock
>> doesn't have that strict expectations about time.
> 
> Was that supposed to be a joke?

Yes, if you mean the first sentence of the second paragraph.
(I think that we'll use a different disclaimer when we enable
 best-effort migration with invariant TSC.)


* Re: What's kvmclock's custom sched_clock for?
  2016-01-08 14:13       ` Radim Krcmar
@ 2016-01-11 21:00         ` Marcelo Tosatti
  2016-01-12 15:33           ` Radim Krcmar
  0 siblings, 1 reply; 14+ messages in thread
From: Marcelo Tosatti @ 2016-01-11 21:00 UTC (permalink / raw)
  To: Radim Krcmar; +Cc: Andy Lutomirski, kvm list

On Fri, Jan 08, 2016 at 03:13:16PM +0100, Radim Krcmar wrote:
> 2016-01-07 18:10-0200, Marcelo Tosatti:
> > On Thu, Jan 07, 2016 at 04:18:11PM +0100, Radim Krcmar wrote:
> >> Stable sched clock is quite unrelated to TSC features.  KVMs from last
> >> few years should always give good enough result to allow stable sched
> >> clock.  We wanted realtime guests and realtime linux needs no_hz=full
> >> that depends on stable sched clock.  The result is huge hack.
> >> 
> >> We'd need to say that migration creates powerful gravity fields to
> >> faithfully migrate constant/invariant TSC, but stable sched clock
> >> doesn't have that strict expectations about time.
> > 
> > Was that supposed to be a joke?
> 
> Yes, if you mean the first sentence of the second paragraph.
> (I think that we'll use a different disclaimer when we enable
>  best-effort migration with invariant TSC.)

About getting rid of kvmclock: the problem is steal time.  We should
separate steal time reporting from the rest of kvmclock, so that you
can use the TSC clocksource and still have steal time reporting.

Also, it's very clear why migration was disabled; the commit message
quotes the invariant TSC documentation:

QEMU commit 68bfd0ad4a1dcc4c328d5db85dc746b49c1ec07e

    target-i386: block migration and savevm if invariant tsc is exposed

    Invariant TSC documentation mentions that "invariant TSC will run at a
    constant rate in all ACPI P-, C-. and T-states".

    This is not the case if migration to a host with different TSC frequency
    is allowed, or if savevm is performed. So block migration/savevm.

The issue is, even with migration to a host with the
proper frequency, TSC counting will stop for the duration of migration.

But I suppose you can document the fact (that "invariant TSC" behaviour
as documented is different from what is exposed by virtualization), and
go for it.





* Re: What's kvmclock's custom sched_clock for?
  2016-01-11 21:00         ` Marcelo Tosatti
@ 2016-01-12 15:33           ` Radim Krcmar
  2016-01-12 20:48             ` Marcelo Tosatti
  0 siblings, 1 reply; 14+ messages in thread
From: Radim Krcmar @ 2016-01-12 15:33 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Andy Lutomirski, kvm list

2016-01-11 19:00-0200, Marcelo Tosatti:
> On Fri, Jan 08, 2016 at 03:13:16PM +0100, Radim Krcmar wrote:
>> 2016-01-07 18:10-0200, Marcelo Tosatti:
>>> On Thu, Jan 07, 2016 at 04:18:11PM +0100, Radim Krcmar wrote:
>>>> Stable sched clock is quite unrelated to TSC features.  KVMs from last
>>>> few years should always give good enough result to allow stable sched
>>>> clock.  We wanted realtime guests and realtime linux needs no_hz=full
>>>> that depends on stable sched clock.  The result is huge hack.
>>>> 
>>>> We'd need to say that migration creates powerful gravity fields to
>>>> faithfully migrate constant/invariant TSC, but stable sched clock
>>>> doesn't have that strict expectations about time.
>>> 
>>> Was that supposed to be a joke?
>> 
>> Yes, if you mean the first sentence of the second paragraph.
>> (I think that we'll use a different disclaimer when we enable
>>  best-effort migration with invariant TSC.)
> 
> About getting rid of kvmclock,

I never wanted to get rid of kvmclock.  In the first part of the email
in question, I meant that the shift and scale can be accelerated by
VMX-TSC hardware, leaving only a check that kvmclock is in the expected
mode and an rdtsc to get the result.

>                                problem is steal time. Should
> separate steal time reporting from rest of kvmclock, so that you
> can use TSC clocksource and have steal time reporting.

We can already do that; steal time doesn't depend on the guest sched
clock.  Steal time uses an MSR+memory-based interface that is related to
kvmclock only by a shared notion of a second.

> Also, its very clear why migration was disabled, because 
> invariant tsc man page says:
> 
> QEMU commit 68bfd0ad4a1dcc4c328d5db85dc746b49c1ec07e
> 
>     target-i386: block migration and savevm if invariant tsc is exposed
> 
>     Invariant TSC documentation mentions that "invariant TSC will run at a
>     constant rate in all ACPI P-, C-. and T-states".
> 
>     This is not the case if migration to a host with different TSC frequency
>     is allowed, or if savevm is performed. So block migration/savevm.
> 
> The issue is, even with migration to a host with 
> proper frequency, TSC counting will stop for the duration of migration.

Stopping is the easiest solution.  We can also try to mitigate the
difference by synchronizing time on source and destination hosts,
sharing what UTC/TAI/... time there was at one TSC read on the source,
and setting the appropriate TSC shift on the destination.  (And solve
accumulation of the error, maybe by always using the initial pair.)

The result should be less off than when stopping, and the guest couldn't
tell that the TSC rate varied, as it can't have a more reliable time
source than the host.
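
A back-of-the-envelope sketch of that mitigation, under two
assumptions that are mine and not settled in this thread: the
destination runs the guest TSC at the same rate as the source
(natively or through scaling), and guest TSC equals host TSC plus an
offset.  The function and parameter names are hypothetical.

```python
def dest_tsc_offset(src_guest_tsc, src_wall_ns, dst_wall_ns,
                    dst_host_tsc, guest_tsc_hz):
    """Offset to program on the destination so the guest TSC appears to
    have kept counting during the migration."""
    elapsed_ns = dst_wall_ns - src_wall_ns
    # Ticks the guest TSC should have advanced while the guest was away.
    missed_ticks = elapsed_ns * guest_tsc_hz // 10**9
    expected_guest_tsc = src_guest_tsc + missed_ticks
    return expected_guest_tsc - dst_host_tsc


# Guest TSC was 1,000,000 at wall time 0 on the source; 50 ms later the
# destination host TSC reads 7,000,000,000 and the guest runs at 2 GHz.
offset = dest_tsc_offset(src_guest_tsc=1_000_000, src_wall_ns=0,
                         dst_wall_ns=50_000_000,
                         dst_host_tsc=7_000_000_000,
                         guest_tsc_hz=2_000_000_000)
guest_tsc_on_resume = 7_000_000_000 + offset
```

The error in the result is bounded by how well the two hosts agree on
wall-clock time, which is why always reusing the initial (time, TSC)
pair is suggested above to keep it from accumulating.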

The issue doesn't have a good solution and I think that some people will
prefer drawbacks associated with invariant TSC migration.
(They do so for other time sources and all have the issue + we already
 migrate constant TSC, which can only match the spec if we make some
 excuses, like "migration forces CPUs into a deep C-state".)

> But i suppose you can document the fact (that "invariant TSC" behaviour
> as documented is different than what exposed by virtualization),

Yep, that generic explanation is quite likely, next to no documentation.

(There are some lawyerish explanations that don't need to violate the
 spec, but I prefer the physics-based one.)

>                                                                  and 
> go for it.

I definitely won't be proactive.


* Re: What's kvmclock's custom sched_clock for?
  2016-01-12 15:33           ` Radim Krcmar
@ 2016-01-12 20:48             ` Marcelo Tosatti
  2016-01-13 14:59               ` Radim Krcmar
  0 siblings, 1 reply; 14+ messages in thread
From: Marcelo Tosatti @ 2016-01-12 20:48 UTC (permalink / raw)
  To: Radim Krcmar; +Cc: Andy Lutomirski, kvm list

On Tue, Jan 12, 2016 at 04:33:28PM +0100, Radim Krcmar wrote:
> 2016-01-11 19:00-0200, Marcelo Tosatti:
> > On Fri, Jan 08, 2016 at 03:13:16PM +0100, Radim Krcmar wrote:
> >> 2016-01-07 18:10-0200, Marcelo Tosatti:
> >>> On Thu, Jan 07, 2016 at 04:18:11PM +0100, Radim Krcmar wrote:
> >>>> Stable sched clock is quite unrelated to TSC features.  KVMs from last
> >>>> few years should always give good enough result to allow stable sched
> >>>> clock.  We wanted realtime guests and realtime linux needs no_hz=full
> >>>> that depends on stable sched clock.  The result is huge hack.
> >>>> 
> >>>> We'd need to say that migration creates powerful gravity fields to
> >>>> faithfully migrate constant/invariant TSC, but stable sched clock
> >>>> doesn't have that strict expectations about time.
> >>> 
> >>> Was that supposed to be a joke?
> >> 
> >> Yes, if you mean the first sentence of the second paragraph.
> >> (I think that we'll use a different disclaimer when we enable
> >>  best-effort migration with invariant TSC.)
> > 
> > About getting rid of kvmclock,
> 
> I never wanted to get rid of kvmclock.  In the first part of the email
> in question, I meant that the shift and scale can be accelerated by
> VMX-TSC hardware, leaving only a check that kvmclock in expected mode
> and rdtsc to get the result.

If the host TSC can be used, then it's not necessary to have the
kvmclock complication.

> >                                problem is steal time. Should
> > separate steal time reporting from rest of kvmclock, so that you
> > can use TSC clocksource and have steal time reporting.
> 
> We can already do that, steal time doesn't depend on guest sched clock.
> Steal time uses a MSR+memory based interface that is related to kvmclock
> only by shared notion of a second.

Err, I meant "guest stop notification", which is done via the flags field.

> > Also, its very clear why migration was disabled, because 
> > invariant tsc man page says:
> > 
> > QEMU commit 68bfd0ad4a1dcc4c328d5db85dc746b49c1ec07e
> > 
> >     target-i386: block migration and savevm if invariant tsc is exposed
> > 
> >     Invariant TSC documentation mentions that "invariant TSC will run at a
> >     constant rate in all ACPI P-, C-. and T-states".
> > 
> >     This is not the case if migration to a host with different TSC frequency
> >     is allowed, or if savevm is performed. So block migration/savevm.
> > 
> > The issue is, even with migration to a host with 
> > proper frequency, TSC counting will stop for the duration of migration.
> 
> Stopping is the easiest solution.  We can also try to mitigate the
> difference by synchronizing time on source and destination hosts,
> sharing what UTC/TAI/... time there was at one TSC read on the source,
> and setting the appropriate TSC shift on the destination.  (And solve
> accumulation of the error, maybe by always using the initial pair.)
> 
> The result should be less off than when stopping and the guest couldn't
> tell that TSC rate varied as it can't have more reliable time source
> than the host.
> 
> The issue doesn't have a good solution and I think that some people will
> prefer drawbacks associated with invariant TSC migration.
> (They do so for other time sources and all have the issue + we already
>  migrate constant TSC, which can only match the spec if we make some
>  excuses, like "migration forces CPUs into a deep C-state".)
> 
> > But i suppose you can document the fact (that "invariant TSC" behaviour
> > as documented is different than what exposed by virtualization),
> 
> Yep, that generic explanation is quite likely, next to no documentation.
> 
> (There are some lawyerish explanations that don't need to violate the
>  spec, but I prefer the physics-based one.)
> 
> >                                                                  and 
> > go for it.
> 
> I definitely won't be proactive.


* Re: What's kvmclock's custom sched_clock for?
  2016-01-12 20:48             ` Marcelo Tosatti
@ 2016-01-13 14:59               ` Radim Krcmar
  0 siblings, 0 replies; 14+ messages in thread
From: Radim Krcmar @ 2016-01-13 14:59 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Andy Lutomirski, kvm list

2016-01-12 18:48-0200, Marcelo Tosatti:
> On Tue, Jan 12, 2016 at 04:33:28PM +0100, Radim Krcmar wrote:
>> 2016-01-11 19:00-0200, Marcelo Tosatti:
>> > About getting rid of kvmclock,
>> 
>> I never wanted to get rid of kvmclock.  In the first part of the email
>> in question, I meant that the shift and scale can be accelerated by
>> VMX-TSC hardware, leaving only a check that kvmclock in expected mode
>> and rdtsc to get the result.
> 
> If host TSC can be used, then its not necessary to have the kvmclock
> complication.

Yes, it's just easier to have an indirection until all hosts can be
used.  (And that condition may never be true, so we'll just hide
obsoleted code in an unlikely path.)

>> >                                problem is steal time. Should
>> > separate steal time reporting from rest of kvmclock, so that you
>> > can use TSC clocksource and have steal time reporting.
>> 
>> We can already do that, steal time doesn't depend on guest sched clock.
>> Steal time uses a MSR+memory based interface that is related to kvmclock
>> only by shared notion of a second.
> 
> Err, i meant "guest stop notification" which is done via flags field.

True, we read the bit without looking at time, so a split wouldn't be
unnatural.  (The current code probably works with any clocksource if
kvmclock is set up first :/)
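
A tiny sketch of why the split is natural: checking the stop
notification only needs the flags byte, not the time fields.  The dict
stands in for the shared per-vCPU page and the function name is
illustrative.

```python
PVCLOCK_GUEST_STOPPED = 1 << 1  # flag bit in the pvclock flags field


def check_and_clear_guest_stopped(shared):
    """Report whether the host paused the guest, and acknowledge it."""
    stopped = bool(shared["flags"] & PVCLOCK_GUEST_STOPPED)
    if stopped:
        # Clear only the notification bit; leave other flags untouched.
        shared["flags"] &= ~PVCLOCK_GUEST_STOPPED
    return stopped


shared_page = {"flags": 0b11}  # stable bit and stopped bit both set
first = check_and_clear_guest_stopped(shared_page)
second = check_and_clear_guest_stopped(shared_page)
```

Nothing here reads a timestamp, so the same check could in principle
run with the TSC clocksource as long as the shared page has been set
up.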

