From: Randy Dunlap <randy.dunlap@oracle.com>
To: Glauber Costa <glommer@redhat.com>
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, avi@redhat.com
Subject: Re: [PATCH 5/5] add documentation about kvmclock
Date: Thu, 15 Apr 2010 12:28:36 -0700
Message-ID: <20100415122836.27f1e255.randy.dunlap@oracle.com>
In-Reply-To: <1271356648-5108-6-git-send-email-glommer@redhat.com>

On Thu, 15 Apr 2010 14:37:28 -0400 Glauber Costa wrote:

> This patch adds a new file, kvm/kvmclock.txt, describing
> the mechanism we use in kvmclock.
> 
> Signed-off-by: Glauber Costa <glommer@redhat.com>
> ---
>  Documentation/kvm/kvmclock.txt |  138 ++++++++++++++++++++++++++++++++++++++++
>  1 files changed, 138 insertions(+), 0 deletions(-)
>  create mode 100644 Documentation/kvm/kvmclock.txt
> 
> diff --git a/Documentation/kvm/kvmclock.txt b/Documentation/kvm/kvmclock.txt
> new file mode 100644
> index 0000000..21008bb
> --- /dev/null
> +++ b/Documentation/kvm/kvmclock.txt
> @@ -0,0 +1,138 @@
> +KVM Paravirtual Clocksource driver
> +Glauber Costa, Red Hat Inc.
> +==================================
> +
> +1. General Description
> +=======================
> +
...
> +
> +2. kvmclock basics 
> +===========================
> +
> +When supported by the hypervisor, guests can register a memory page
> +to contain kvmclock data. This page has to be present in guest's address space
> +throughout its whole life. The hypervisor continues to write to it until it is
> +explicitly disabled or the guest is turned off.
> +
> +2.1 kvmclock availability
> +-------------------------
> +
> +Guests that want to take advantage of kvmclock should first check its
> +availability through cpuid.
> +
> +kvm features are presented to the guest in leaf 0x40000001. Bit 3 indicates
> +the present of kvmclock. Bit 0 indicates that kvmclock is present, but the

       presence
But this is confusing:  is it bit 3 or bit 0?  As written, both seem to indicate the same thing.

> +old MSR set must be used. See section 2.3 for details.

"old MSR set":  what does this mean?

> +
> +2.2 kvmclock functionality
> +--------------------------
> +
> +Two MSRs are provided by the hypervisor, controlling kvmclock operation:
> +
> + * MSR_KVM_WALL_CLOCK, value 0x4b564d00 and
> + * MSR_KVM_SYSTEM_TIME, value 0x4b564d01.
> +
> +The first one is only used in rare situations, like boot-time and a
> +suspend-resume cycle. Data is disposable, and after used, the guest
> +may use it for something else. This is hardly a hot path for anything.
> +The Hypervisor fills in the address provided through this MSR with the
> +following structure:
> +
> +struct pvclock_wall_clock {
> +        u32   version;
> +        u32   sec;
> +        u32   nsec;
> +} __attribute__((__packed__));
> +
> +Guest should only trust data to be valid when version haven't changed before

                                                         has not

> +and after reads of sec and nsec. Besides not changing, it has to be an even
> +number. Hypervisor may write an odd number to version field to indicate that
> +an update is in progress.
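
A short example of the read loop might make this paragraph easier to follow.
Here is a minimal sketch, assuming the guest can see the structure directly;
read_wall_clock() and the compiler-only barrier are my own placeholders, not
anything from the patch:

```c
#include <stdint.h>

/* Layout as given in the patch text above. */
struct pvclock_wall_clock {
        uint32_t version;
        uint32_t sec;
        uint32_t nsec;
} __attribute__((__packed__));

/* Compiler barrier only (assumption); a real guest would use a read barrier. */
#define barrier() __asm__ __volatile__("" ::: "memory")

/*
 * Hypothetical helper: copy sec/nsec out of the hypervisor-written structure.
 * Retry while version is odd (update in progress) or changed during the read.
 */
static void read_wall_clock(const volatile struct pvclock_wall_clock *wc,
                            uint32_t *sec, uint32_t *nsec)
{
        uint32_t v;

        do {
                v = wc->version;
                barrier();
                *sec = wc->sec;
                *nsec = wc->nsec;
                barrier();
        } while ((v & 1) || v != wc->version);
}
```

The same loop shape should apply to struct pvclock_vcpu_time_info, since its
version field is described as playing the same role.
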
> +
> +MSR_KVM_SYSTEM_TIME, on the other hand, has persistent data, and is
> +constantly updated by the hypervisor with time information. The data
> +written in this MSR contains two pieces of information: the address in which
> +the guests expects time data to be present 4-byte aligned or'ed with an
> +enabled bit. If one wants to shutdown kvmclock, it just needs to write
> +anything that has 0 as its last bit.
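
It might also help to show how the MSR value is built.  A minimal sketch,
assuming bit 0 is the enabled bit as the text implies; the macro and function
names are made up for illustration, and the actual wrmsr is left out:

```c
#include <stdint.h>

/* Hypothetical name for the low "enabled" bit described in the text. */
#define KVMCLOCK_ENABLE_BIT 1ULL

/*
 * Build the value to write to MSR_KVM_SYSTEM_TIME.
 * Returns 0 (bit 0 clear, so kvmclock stays disabled) if the address
 * is not 4-byte aligned.
 */
static uint64_t system_time_msr_value(uint64_t addr)
{
        if (addr & 3)
                return 0;
        return addr | KVMCLOCK_ENABLE_BIT;
}
```

Writing any value with bit 0 clear, for example plain 0, would then shut
kvmclock down again.
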
> +
> +Time information presented by the hypervisor follows the structure:
> +
> +struct pvclock_vcpu_time_info {
> +        u32   version;
> +        u32   pad0;
> +        u64   tsc_timestamp;
> +        u64   system_time;
> +        u32   tsc_to_system_mul;
> +        s8    tsc_shift;
> +        u8    pad[3];
> +} __attribute__((__packed__)); 
> +
> +The version field plays the same role as with the one in struct
> +pvclock_wall_clock. The other fields, are:
> +
> + a. tsc_timestamp: the guest-visible tsc (result of rdtsc + tsc_offset) of
> +    this cpu at the moment we recorded system_time. Note that some time is

            CPU (please)

> +    inevitably spent between system_time and tsc_timestamp measurements.
> +    Guests can subtract this quantity from the current value of tsc to obtain
> +    a delta to be added to system_time

                           to system_time.

> +
> + b. system_time: this is the most recent host-time we could be provided with.
> +    host gets it through ktime_get_ts, using whichever clocksource is
> +    registered at the moment

                         moment.

> +
> + c. tsc_to_system_mul: this is the number that tsc delta has to be multiplied
> +    by in order to obtain time in nanoseconds. Hypervisor is free to change
> +    this value in face of events like cpu frequency change, pcpu migration,

                                         CPU

> +    etc.
> + 
> + d. tsc_shift: guests must shift 

missing text??

> +
> +With this information available, guest calculates current time as:
> +
> +  T = kt + to_nsec(tsc - tsc_0)
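
The formula would be clearer with the conversion spelled out.  Below is a
sketch of how I read it, with kt = system_time and tsc_0 = tsc_timestamp.
Since the tsc_shift description is cut off above, the shift handling here is
an assumption (shift the delta left for positive values, right for negative
ones, then treat tsc_to_system_mul as a 32.32 fixed-point factor), modelled
on what the existing pvclock code does.  kvmclock_now() is a made-up name:

```c
#include <stdint.h>

/* Layout as given in the patch text above. */
struct pvclock_vcpu_time_info {
        uint32_t version;
        uint32_t pad0;
        uint64_t tsc_timestamp;
        uint64_t system_time;
        uint32_t tsc_to_system_mul;
        int8_t   tsc_shift;
        uint8_t  pad[3];
} __attribute__((__packed__));

/* Assumed scaling: apply tsc_shift, then compute (delta * mul) >> 32. */
static uint64_t to_nsec(uint64_t delta, uint32_t mul, int8_t shift)
{
        if (shift < 0)
                delta >>= -shift;
        else
                delta <<= shift;

        /* Split the multiply so the intermediate product fits in 64 bits. */
        return (delta >> 32) * mul + (((delta & 0xffffffffULL) * mul) >> 32);
}

/* T = kt + to_nsec(tsc - tsc_0); caller must have validated version. */
static uint64_t kvmclock_now(const struct pvclock_vcpu_time_info *ti,
                             uint64_t tsc)
{
        return ti->system_time +
               to_nsec(tsc - ti->tsc_timestamp, ti->tsc_to_system_mul,
                       ti->tsc_shift);
}
```

With tsc_to_system_mul = 0x80000000 and tsc_shift = 0 (half a nanosecond per
tick), a delta of 2000 ticks comes out as 1000 ns.
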
> +
> +2.3 Compatibility MSRs
> +----------------------
> +
> +Guests running on top of older hypervisors may have to use a different set of
> +MSRs. This is because originally, kvmclock MSRs were exported within a
> +reserved range by accident. Guests should check cpuid leaf 0x40000001 for the
> +presence of kvmclock. If bit 3 is disabled, but bit 0 is enabled, guests can
> +have access to kvmclock functionality through
> +
> + * MSR_KVM_WALL_CLOCK_OLD, value 0x11 and
> + * MSR_KVM_SYSTEM_TIME_OLD, value 0x12.
> +
> +Note, however, that this is deprecated.
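
If the bit 3 / bit 0 relationship is what I think it is, a small selection
routine would document it better than prose.  A sketch, assuming "features"
is the feature word the guest already read from cpuid leaf 0x40000001 (the
cpuid call itself is omitted); the struct and function names are hypothetical:

```c
#include <stdint.h>

/* MSR numbers as listed in the patch text. */
#define MSR_KVM_WALL_CLOCK        0x4b564d00u
#define MSR_KVM_SYSTEM_TIME       0x4b564d01u
#define MSR_KVM_WALL_CLOCK_OLD    0x11u
#define MSR_KVM_SYSTEM_TIME_OLD   0x12u

/* Hypothetical names for the two feature bits discussed above. */
#define KVMCLOCK_BIT_NEW_MSRS     3
#define KVMCLOCK_BIT_OLD_MSRS     0

struct kvmclock_msrs {
        uint32_t wall_clock;
        uint32_t system_time;
};

/*
 * Pick the MSR pair to use.  Prefer the new MSRs when bit 3 is set and
 * fall back to the deprecated pair when only bit 0 is set.
 * Returns 0 on success, -1 if kvmclock is not advertised at all.
 */
static int kvmclock_pick_msrs(uint32_t features, struct kvmclock_msrs *m)
{
        if (features & (1u << KVMCLOCK_BIT_NEW_MSRS)) {
                m->wall_clock = MSR_KVM_WALL_CLOCK;
                m->system_time = MSR_KVM_SYSTEM_TIME;
                return 0;
        }
        if (features & (1u << KVMCLOCK_BIT_OLD_MSRS)) {
                m->wall_clock = MSR_KVM_WALL_CLOCK_OLD;
                m->system_time = MSR_KVM_SYSTEM_TIME_OLD;
                return 0;
        }
        return -1;
}
```
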
> +
> +3. Migration
> +============
> +
> +Two ioctls are provided to aid the task of migration: 
> +
> + * KVM_GET_CLOCK and
> + * KVM_SET_CLOCK
> +
> +Their aim is to control an offset that can be summed to system_time, in order
> +to guarantee monotonicity on the time over guest migration. Source host
> +executes KVM_GET_CLOCK, obtaining the last valid timestamp in this host, while
> +destination sets it with KVM_SET_CLOCK. It's the destination responsibility to
> +never return time that is less than that.
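
A tiny model of the offset idea could help readers here.  The following is
only a toy illustration of "an offset that can be summed to system_time", not
the KVM ioctl interface: struct clock_model and both functions are invented
for the example, and a real user would call the ioctls named above instead:

```c
#include <stdint.h>

/* Toy per-VM clock state; not a KVM structure. */
struct clock_model {
        uint64_t host_time_ns;  /* this host's own notion of time */
        int64_t  offset_ns;     /* offset summed to system_time */
};

/* Stand-in for what the source host would obtain via KVM_GET_CLOCK. */
static uint64_t model_get_clock(const struct clock_model *c)
{
        return c->host_time_ns + (uint64_t)c->offset_ns;
}

/*
 * Stand-in for KVM_SET_CLOCK on the destination: choose the offset so
 * that guest time resumes from the value saved on the source.
 */
static void model_set_clock(struct clock_model *c, uint64_t saved_ns)
{
        c->offset_ns = (int64_t)(saved_ns - c->host_time_ns);
}
```

In this model, a guest saved at 10000 ns on the source and restored on a
destination whose own clock reads 300 ns gets an offset of 9700 ns, so its
time continues from 10000 ns and never goes backwards.
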


---
~Randy

