KVM ARM Archive on lore.kernel.org
From: Christophe de Dinechin <christophe.de.dinechin@gmail.com>
To: Steven Price <steven.price@arm.com>
Cc: Christophe de Dinechin <christophe.de.dinechin@gmail.com>,
	KVM list <kvm@vger.kernel.org>,
	Catalin Marinas <catalin.marinas@arm.com>,
	linux-doc@vger.kernel.org, Russell King <linux@armlinux.org.uk>,
	open list <linux-kernel@vger.kernel.org>,
	Marc Zyngier <maz@kernel.org>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Will Deacon <will@kernel.org>,
	kvmarm@lists.cs.columbia.edu,
	linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCH 1/9] KVM: arm64: Document PV-time interface
Date: Wed, 7 Aug 2019 16:28:11 +0200
Message-ID: <9F77FA64-C71B-4025-A58D-3AC07E6688DE@dinechin.org> (raw)
In-Reply-To: <ff2d038d-d866-65fa-655d-b9865bf14016@arm.com>

> On 7 Aug 2019, at 15:21, Steven Price <steven.price@arm.com> wrote:
> 
> On 05/08/2019 17:40, Christophe de Dinechin wrote:
>> 
>> Steven Price writes:
>> 
>>> Introduce a paravirtualization interface for KVM/arm64 based on the
>>> "Arm Paravirtualized Time for Arm-based Systems" specification DEN 0057A.
>>> 
>>> This only adds the details about "Stolen Time" as the details of "Live
>>> Physical Time" have not been fully agreed.
>>> 
>> [...]
>> 
>>> +
>>> +Stolen Time
>>> +-----------
>>> +
>>> +The structure pointed to by the PV_TIME_ST hypercall is as follows:
>>> +
>>> +  Field       | Byte Length | Byte Offset | Description
>>> +  ----------- | ----------- | ----------- | --------------------------
>>> +  Revision    |      4      |      0      | Must be 0 for version 0.1
>>> +  Attributes  |      4      |      4      | Must be 0
>>> +  Stolen time |      8      |      8      | Stolen time in unsigned
>>> +              |             |             | nanoseconds indicating how
>>> +              |             |             | much time this VCPU thread
>>> +              |             |             | was involuntarily not
>>> +              |             |             | running on a physical CPU.
>> 
>> I know very little about the topic, but I don't understand how the spec
>> as proposed allows an accurate reading of the relation between physical
>> time and stolen time simultaneously. In other words, could you draw
>> Figure 1 of the spec from within the guest? Or is it a non-objective?
> 
> Figure 1 is mostly attempting to explain Live Physical Time (LPT), which
> is not part of this patch series. But it does touch on stolen time by
> the difference between "live physical time" and "virtual time".
> 
> I'm not sure what you mean by "from within the guest". From the
> perspective of the guest the parts of the diagram where the guest isn't
> running don't exist (therefore there are discontinuities in the
> "physical time" and "live physical time" lines).

I meant: if I run code within the guest that attempts to draw Figure 1,
race conditions may occasionally cause the diagram drawn by the guest
program to look completely wrong.

> This patch series doesn't attempt to provide the guest with a view of
> "physical time" (or LPT) - but it might be able to observe that by
> consulting something external (e.g. an NTP server, or an emulated RTC
> which reports wall-clock time).

… with what appears to be a built-in race condition, as you correctly
identified. I was wondering whether that race condition was deliberate
and/or necessary, or whether it was irrelevant for the planned uses of the value.

> What it does provide is a mechanism for obtaining the difference (as
> reported by the host) between "live physical time" and "virtual time" -
> this is reported in nanoseconds in the above structure.
> 
>> For example, if you read the stolen time before you read CNTVCT_EL0,
>> isn't it possible for a lengthy event like a migration to occur between
>> the two reads, causing the stolen time to be obsolete and off by seconds?
> 
> "Lengthy events" like migration are represented by the "paused" state in
> the diagram - i.e. it's the difference between "physical time" and "live
> physical time". So stolen time doesn't attempt to represent that.
> 
> And yes, there is a race between reading CNTVCT_EL0 and reading stolen
> time - but in practice this doesn't really matter. The usual pseudo-code
> way of using stolen time is:

I’m assuming this is the guest scheduler you are talking about,
and I’m assuming virtualization can preempt that code anywhere.
Maybe that’s where I’m wrong?

For the sake of the argument, assume there is a 1s pause.
Not completely unreasonable in a migration scenario.

>  * scheduler captures stolen time from structure and CNTVCT_EL0:
>      before_timer = CNTVCT_EL0

[insert optional 1s pause here, case A]

>      before_stolen = stolen
>  * schedule in process
>  * process is pre-empted (or blocked in some way)
>  * scheduler captures stolen time from structure and CNTVCT_EL0:
>      after_timer = CNTVCT_EL0

[insert optional 1s pause here, case B]

>      after_stolen = stolen
>      time = to_nsecs(after_timer - before_timer) -
>             (after_stolen - before_stolen)

In case A, time is too big by one second. In case B, it is too small,
to the point where your code might need to be ready for
“time” unexpectedly showing up as negative.
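
To make the two cases concrete, here is a minimal sketch in C that replays
the quoted pseudo-code against a simulated clock. Everything in it is
hypothetical (the `sim_clock` structure, the helper names, and the
assumption that both the timer and the stolen time advance by the full
pause); it only illustrates the arithmetic, not how a real guest
scheduler is written.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

/* Simulated values in nanoseconds (hypothetical stand-ins for
 * to_nsecs(CNTVCT_EL0) and for the shared "stolen time" field). */
struct sim_clock {
	int64_t timer_ns;
	int64_t stolen_ns;
};

/* The guest runs for ns nanoseconds: only the timer advances. */
static void sim_run(struct sim_clock *c, int64_t ns)
{
	assert(ns >= 0);
	c->timer_ns += ns;
}

/* The VCPU is not running for ns nanoseconds: in this model both the
 * timer and the stolen time advance by the same amount. */
static void sim_pause(struct sim_clock *c, int64_t ns)
{
	assert(ns >= 0);
	c->timer_ns += ns;
	c->stolen_ns += ns;
}

enum pause_case { NO_PAUSE, CASE_A, CASE_B };

/* Replay the quoted pseudo-code with a 1s pause in the chosen window
 * and return the time that would be charged to the process. */
static int64_t charged_time(enum pause_case where, int64_t run_ns)
{
	struct sim_clock c = { 0, 0 };
	int64_t before_timer, before_stolen, after_timer, after_stolen;

	before_timer = c.timer_ns;
	if (where == CASE_A)
		sim_pause(&c, NSEC_PER_SEC);
	before_stolen = c.stolen_ns;

	sim_run(&c, run_ns);	/* the process actually runs */

	after_timer = c.timer_ns;
	if (where == CASE_B)
		sim_pause(&c, NSEC_PER_SEC);
	after_stolen = c.stolen_ns;

	return (after_timer - before_timer) - (after_stolen - before_stolen);
}
```

With a 10 ms run, `charged_time(NO_PAUSE, 10000000)` gives 10 ms,
`charged_time(CASE_A, 10000000)` gives 1.01 s, and
`charged_time(CASE_B, 10000000)` gives -0.99 s under this model.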

> 
> The scheduler can then charge the process for "time" nanoseconds of
> time. This ensures that a process isn't unfairly penalised if the host
> doesn't schedule the VCPU while it is supposed to be running.
> 
> The race is very small in comparison to the time the process is running,
> and in the worst case just means the process is charged slightly more
> (or less) than it should be.

At this point, what I don’t understand is why the race would be
“very small” or why you would only be charged “slightly” more or less?

> I guess if you're really worried about it, you could do a dance like:
> 
> 	do {
> 		before = stolen
> 		timer = CNTVCT_EL0
> 		after = stolen
> 	} while (before != after);

That will work as long as nothing in that loop does anything that
would itself cause `stolen` to jump. If there is such a guarantee,
then it is even efficient, because in most cases the loop
would only run once, at the cost of one extra read and one test.
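
As an illustration of that loop, here is a minimal C sketch run against a
simulated host. The `fake_host` structure, the reader functions and the
`max_tries` bound are all assumptions made for the example; the bound is
there only so that the sketch terminates even if `stolen` were to jump
on every iteration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical simulated host; both values are in nanoseconds. */
struct fake_host {
	uint64_t stolen_ns;
	uint64_t timer_ns;
	uint64_t pending_pause_ns;	/* pause injected at the next timer read */
	unsigned int timer_reads;
};

static uint64_t read_stolen(struct fake_host *h)
{
	return h->stolen_ns;
}

static uint64_t read_timer(struct fake_host *h)
{
	/* Model the VCPU being descheduled just before the timer read. */
	h->timer_ns += h->pending_pause_ns;
	h->stolen_ns += h->pending_pause_ns;
	h->pending_pause_ns = 0;
	h->timer_reads++;
	return h->timer_ns;
}

struct snapshot {
	uint64_t stolen_ns;
	uint64_t timer_ns;
};

/* Read the timer between two reads of stolen time and retry while the
 * two stolen values differ. Returns 0 on success, -1 if no stable pair
 * was seen within max_tries attempts. */
static int read_consistent(struct fake_host *h, struct snapshot *out,
			   int max_tries)
{
	assert(max_tries > 0);

	for (int i = 0; i < max_tries; i++) {
		uint64_t before = read_stolen(h);

		out->timer_ns = read_timer(h);
		out->stolen_ns = read_stolen(h);
		if (before == out->stolen_ns)
			return 0;
	}
	return -1;
}
```

In the common case the loop body runs once. If a pause lands between the
first stolen-time read and the timer read, the second stolen-time read
differs and the pair is simply read again.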

> But I don't see the need to have such an accurate view of elapsed time
> that the VCPU was scheduled. And of course at the moment (without this
> series) the guest has no idea about time stolen by the host.

I’m certainly not arguing that exposing stolen time is a bad idea,
I’m only wondering whether the proposed solution is racy and, if so,
whether that is intentional.

If it’s indeed racy, the problem could be mitigated in a number of
ways:

a) document your loop or something similar as being the recommended
way to avoid the race, and then ensure that the loop actually
will always work as intended. The upside is that it’s just a change in
some comments or documentation.

b) provide a single interface that exposes multiple time values. For example,
you could have a copy of CNTVCT_EL0 written alongside stolen time,
and then the scheduler could use that copy for its decision.
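
As a rough sketch of what option b) could look like, the C fragment below
pairs a host-written copy of the counter with the stolen time. The layout
is hypothetical and is not the structure from the patch; in particular the
`seq` field (a sequence counter in the style of a seqlock) is an addition
made here so that the guest can tell when it has read both values from
the same host update.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical shared record, NOT the layout proposed in the patch. */
struct pv_time_pair {
	atomic_uint seq;		/* odd while the host is mid-update */
	_Atomic uint64_t counter;	/* host's copy of CNTVCT_EL0 */
	_Atomic uint64_t stolen_ns;	/* stolen time at that same instant */
};

struct pv_time_sample {
	uint64_t counter;
	uint64_t stolen_ns;
};

/* Host side (single writer): publish both values as one logical update. */
static void host_publish(struct pv_time_pair *p, uint64_t counter,
			 uint64_t stolen_ns)
{
	unsigned int s = atomic_load_explicit(&p->seq, memory_order_relaxed);

	assert((s & 1) == 0);	/* no update already in progress */
	atomic_store_explicit(&p->seq, s + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&p->counter, counter, memory_order_relaxed);
	atomic_store_explicit(&p->stolen_ns, stolen_ns, memory_order_relaxed);
	atomic_store_explicit(&p->seq, s + 2, memory_order_release);
}

/* Guest side: retry until both fields come from the same update. */
static struct pv_time_sample guest_sample(struct pv_time_pair *p)
{
	struct pv_time_sample out;
	unsigned int s1, s2;

	do {
		s1 = atomic_load_explicit(&p->seq, memory_order_acquire);
		out.counter = atomic_load_explicit(&p->counter,
						   memory_order_relaxed);
		out.stolen_ns = atomic_load_explicit(&p->stolen_ns,
						     memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
		s2 = atomic_load_explicit(&p->seq, memory_order_relaxed);
	} while ((s1 & 1) || s1 != s2);

	return out;
}
```

Because the counter and the stolen time are written by the host in one
update, a pause that happens while the guest is sampling does not split
the pair: the scheduler would subtract two samples taken this way instead
of mixing its own CNTVCT_EL0 read with a separately read stolen time.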


Thanks
Christophe


