All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jeremy Fitzhardinge <jeremy@goop.org>
To: Alexander Graf <agraf@suse.de>
Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>,
	linux-doc@vger.kernel.org, Peter Zijlstra <peterz@infradead.org>,
	Jan Kiszka <jan.kiszka@siemens.com>,
	Virtualization <virtualization@lists.linux-foundation.org>,
	Paul Mackerras <paulus@samba.org>,
	"H. Peter Anvin" <hpa@zytor.com>,
	Stefano Stabellini <stefano.stabellini@eu.citrix.com>,
	Xen <xen-devel@lists.xensource.com>,
	Dave Jiang <dave.jiang@intel.com>, KVM <kvm@vger.kernel.org>,
	Glauber Costa <glommer@redhat.com>, X86 <x86@kernel.org>,
	Ingo Molnar <mingo@redhat.com>, Avi Kivity <avi@redhat.com>,
	Rik van Riel <riel@redhat.com>,
	Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
	Greg Kroah-Hartman <gregkh@suse.de>,
	Sasha Levin <levinsasha928@gmail.com>,
	Sedat Dilek <sedat.dilek@gmail.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	LKML <linux-kernel@vger.kernel.org>,
	Dave Hansen <dave@linux.vnet.ibm.com>, Suzuki Poulose <
Subject: Re: [PATCH RFC V4 0/5] kvm : Paravirt-spinlock support for KVM guests
Date: Tue, 17 Jan 2012 11:30:04 +1100	[thread overview]
Message-ID: <4F14C10C.3010303@goop.org> (raw)
In-Reply-To: <0B6BF918-45C7-45C6-AAA4-FB73EFAB9FDB@suse.de>

On 01/16/2012 09:24 PM, Alexander Graf wrote:
> This is true in case you're spinning. If on overcommit spinlocks would
> instead of spin just yield(), we wouldn't have any vcpu running that's
> just waiting for a late ticket.

Yes, but the reality is that most spinlocks are held for a short period
of time and there's a low likelihood of being preempted while within a
spinlock critical section.  Therefore if someone else tries to get the
spinlock and there's contention, it's always worth spinning for a little
while because the lock will likely become free soon.

At least that's the case if the lock has low contention (shallow queue
depth and not in slow state).  Again, maybe it makes sense to never spin
for deep queues or already slowstate locks.

> We still have an issue finding the point in time when a vcpu could run again, which is what this whole series is about. My point above was that instead of doing a count loop, we could just do the normal spin dance and set the threshold to when we enable the magic to have another spin lock notify us in the CPU. That way we
>
>   * don't change the uncontended case

I don't follow you.  What do you mean by "the normal spin dance"?  What
do you mean by "have another spinlock notify us in the CPU"?  Don't
change which uncontended case?  Do you mean in the locking path?  Or the
unlock path?  Or both?

>   * can set the threshold on the host, which knows how contended the system is

Hm, I'm not convinced that knowing how contended the system is is all
that useful overall.  What's important is how contended a particular
lock is, and what state the current holder is in.  If it's not currently
running, then knowing the overall system contention would give you some
idea about how long you need to wait for it to be rescheduled, but
that's getting pretty indirect.

I think the "slowpath if preempted while spinning" idea I mentioned in
the other mail is probably worth following up, since that give specific
actionable information to the guest from the hypervisor.  But lots of
caveats.

[[
A possible mechanism:

  * register ranges of [er]ips with the hypervisor
  * each range is paired with a "resched handler block"
  * if vcpu is preempted within such a range, make sure it is
    rescheduled in the resched handler block

This is obviously akin to the exception mechanism, but it is partially
implemented by the hypervisor.  It allows the spinlock code to be
unchanged from native, but make use of a resched rather than an explicit
counter to determine when to slowpath the lock.  And it's a nice general
mechanism that could be potentially useful elsewhere.

Unfortunately, it doesn't change the unlock path at all; it still needs
to explicitly test if a VCPU needs to be kicked on unlock.
]]


> And since we control what spin locks look like, we can for example always keep the pointer to it in a specific register so that we can handle pv_lock_ops.lock_spinning() inside there and fetch all the information we need from our pt_regs.

You've left a pile of parts of an idea lying around, but I'm not sure
what shape you intend it to be.

>>> Speaking of which - have you benchmarked performance degradation of pv ticket locks on bare metal? Last time I checked, enabling all the PV ops did incur significant slowdown which is why I went though the work to split the individual pv ops features up to only enable a few for KVM guests.
>> The whole point of the pv-ticketlock work is to keep the pvops hooks out of the locking fast path, so that the calls are only made on the slow path - that is, when spinning too long on a contended lock, and when releasing a lock that's in a "slow" state.  In the fast path case of no contention, there are no pvops, and the executed code path is almost identical to native.
> You're still changing a tight loop that does nothing (CPU detects it and saves power) into something that performs calculations.

It still has a "pause" instruction in that loop, so that CPU mechanism
will still come into play.  "pause" doesn't directly "save power"; it's
more about making sure that memory dependence cycles are broken and that
two competing threads will make similar progress.  Besides I'm not sure
adding a dec+test to a loop that's already got a memory read and compare
in it is adding much in the way of "calculations".

    J

WARNING: multiple messages have this Message-ID (diff)
From: Jeremy Fitzhardinge <jeremy@goop.org>
To: Alexander Graf <agraf@suse.de>
Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>,
	linux-doc@vger.kernel.org, Peter Zijlstra <peterz@infradead.org>,
	Jan Kiszka <jan.kiszka@siemens.com>,
	Virtualization <virtualization@lists.linux-foundation.org>,
	Paul Mackerras <paulus@samba.org>,
	"H. Peter Anvin" <hpa@zytor.com>,
	Stefano Stabellini <stefano.stabellini@eu.citrix.com>,
	Xen <xen-devel@lists.xensource.com>,
	Dave Jiang <dave.jiang@intel.com>, KVM <kvm@vger.kernel.org>,
	Glauber Costa <glommer@redhat.com>, X86 <x86@kernel.org>,
	Ingo Molnar <mingo@redhat.com>, Avi Kivity <avi@redhat.com>,
	Rik van Riel <riel@redhat.com>,
	Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
	Greg Kroah-Hartman <gregkh@suse.de>,
	Sasha Levin <levinsasha928@gmail.com>,
	Sedat Dilek <sedat.dilek@gmail.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	LKML <linux-kernel@vger.kernel.org>,
	Dave Hansen <dave@linux.vnet.ibm.com>
Subject: Re: [PATCH RFC V4 0/5] kvm : Paravirt-spinlock support for KVM guests
Date: Tue, 17 Jan 2012 11:30:04 +1100	[thread overview]
Message-ID: <4F14C10C.3010303@goop.org> (raw)
In-Reply-To: <0B6BF918-45C7-45C6-AAA4-FB73EFAB9FDB@suse.de>

On 01/16/2012 09:24 PM, Alexander Graf wrote:
> This is true in case you're spinning. If on overcommit spinlocks would
> instead of spin just yield(), we wouldn't have any vcpu running that's
> just waiting for a late ticket.

Yes, but the reality is that most spinlocks are held for a short period
of time and there's a low likelihood of being preempted while within a
spinlock critical section.  Therefore if someone else tries to get the
spinlock and there's contention, it's always worth spinning for a little
while because the lock will likely become free soon.

At least that's the case if the lock has low contention (shallow queue
depth and not in slow state).  Again, maybe it makes sense to never spin
for deep queues or already slowstate locks.

> We still have an issue finding the point in time when a vcpu could run again, which is what this whole series is about. My point above was that instead of doing a count loop, we could just do the normal spin dance and set the threshold to when we enable the magic to have another spin lock notify us in the CPU. That way we
>
>   * don't change the uncontended case

I don't follow you.  What do you mean by "the normal spin dance"?  What
do you mean by "have another spinlock notify us in the CPU"?  Don't
change which uncontended case?  Do you mean in the locking path?  Or the
unlock path?  Or both?

>   * can set the threshold on the host, which knows how contended the system is

Hm, I'm not convinced that knowing how contended the system is is all
that useful overall.  What's important is how contended a particular
lock is, and what state the current holder is in.  If it's not currently
running, then knowing the overall system contention would give you some
idea about how long you need to wait for it to be rescheduled, but
that's getting pretty indirect.

I think the "slowpath if preempted while spinning" idea I mentioned in
the other mail is probably worth following up, since that give specific
actionable information to the guest from the hypervisor.  But lots of
caveats.

[[
A possible mechanism:

  * register ranges of [er]ips with the hypervisor
  * each range is paired with a "resched handler block"
  * if vcpu is preempted within such a range, make sure it is
    rescheduled in the resched handler block

This is obviously akin to the exception mechanism, but it is partially
implemented by the hypervisor.  It allows the spinlock code to be
unchanged from native, but make use of a resched rather than an explicit
counter to determine when to slowpath the lock.  And it's a nice general
mechanism that could be potentially useful elsewhere.

Unfortunately, it doesn't change the unlock path at all; it still needs
to explicitly test if a VCPU needs to be kicked on unlock.
]]


> And since we control what spin locks look like, we can for example always keep the pointer to it in a specific register so that we can handle pv_lock_ops.lock_spinning() inside there and fetch all the information we need from our pt_regs.

You've left a pile of parts of an idea lying around, but I'm not sure
what shape you intend it to be.

>>> Speaking of which - have you benchmarked performance degradation of pv ticket locks on bare metal? Last time I checked, enabling all the PV ops did incur significant slowdown which is why I went though the work to split the individual pv ops features up to only enable a few for KVM guests.
>> The whole point of the pv-ticketlock work is to keep the pvops hooks out of the locking fast path, so that the calls are only made on the slow path - that is, when spinning too long on a contended lock, and when releasing a lock that's in a "slow" state.  In the fast path case of no contention, there are no pvops, and the executed code path is almost identical to native.
> You're still changing a tight loop that does nothing (CPU detects it and saves power) into something that performs calculations.

It still has a "pause" instruction in that loop, so that CPU mechanism
will still come into play.  "pause" doesn't directly "save power"; it's
more about making sure that memory dependence cycles are broken and that
two competing threads will make similar progress.  Besides I'm not sure
adding a dec+test to a loop that's already got a memory read and compare
in it is adding much in the way of "calculations".

    J

  reply	other threads:[~2012-01-17  0:30 UTC|newest]

Thread overview: 139+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-01-14 18:25 [PATCH RFC V4 0/5] kvm : Paravirt-spinlock support for KVM guests Raghavendra K T
2012-01-14 18:25 ` Raghavendra K T
2012-01-14 18:25 ` Raghavendra K T
2012-01-14 18:25 ` [PATCH RFC V4 1/5] debugfs: Add support to print u32 array in debugfs Raghavendra K T
2012-01-14 18:25   ` Raghavendra K T
2012-01-14 18:25   ` Raghavendra K T
2012-01-14 18:25 ` [PATCH RFC V4 2/5] kvm hypervisor : Add a hypercall to KVM hypervisor to support pv-ticketlocks Raghavendra K T
2012-01-14 18:25   ` Raghavendra K T
2012-01-14 18:25   ` Raghavendra K T
2012-01-16  3:24   ` Alexander Graf
2012-01-16  3:24     ` Alexander Graf
2012-01-16  8:43     ` Raghavendra K T
2012-01-16  8:43       ` Raghavendra K T
2012-01-16  9:03   ` Avi Kivity
2012-01-16  9:03     ` Avi Kivity
2012-01-16  9:55     ` Raghavendra K T
2012-01-16  9:55       ` Raghavendra K T
2012-01-14 18:26 ` [PATCH RFC V4 3/5] kvm guest : Added configuration support to enable debug information for KVM Guests Raghavendra K T
2012-01-14 18:26   ` Raghavendra K T
2012-01-14 18:26   ` Raghavendra K T
2012-01-14 18:26 ` [PATCH RFC V4 4/5] kvm : pv-ticketlocks support for linux guests running on KVM hypervisor Raghavendra K T
2012-01-14 18:26   ` Raghavendra K T
2012-01-14 18:26   ` Raghavendra K T
2012-01-16  3:12   ` Alexander Graf
2012-01-16  3:12     ` Alexander Graf
2012-01-16  7:25     ` Raghavendra K T
2012-01-16  7:25       ` Raghavendra K T
2012-01-16  9:05   ` Avi Kivity
2012-01-16  9:05     ` Avi Kivity
2012-01-16 14:13     ` Raghavendra K T
2012-01-16 14:13       ` Raghavendra K T
2012-01-16 14:47       ` Avi Kivity
2012-01-16 14:47         ` Avi Kivity
2012-01-16 23:49         ` Jeremy Fitzhardinge
2012-01-16 23:49           ` Jeremy Fitzhardinge
2012-01-17 11:02   ` Marcelo Tosatti
2012-01-17 11:02     ` Marcelo Tosatti
2012-01-17 11:33     ` Srivatsa Vaddagiri
2012-01-17 11:33       ` Srivatsa Vaddagiri
2012-01-18  1:34       ` Jeremy Fitzhardinge
2012-01-18  1:34         ` Jeremy Fitzhardinge
2012-01-18 13:54         ` Srivatsa Vaddagiri
2012-01-18 13:54           ` Srivatsa Vaddagiri
2012-01-18 21:52           ` Jeremy Fitzhardinge
2012-01-18 21:52             ` Jeremy Fitzhardinge
2012-01-24 14:08             ` Avi Kivity
2012-01-24 14:08               ` Avi Kivity
2012-01-24 18:51               ` Raghavendra K T
2012-01-24 18:51                 ` Raghavendra K T
2012-01-17 18:57     ` Raghavendra K T
2012-01-17 18:57       ` Raghavendra K T
2012-01-24 19:01       ` Raghavendra K T
2012-01-14 18:27 ` [PATCH RFC V4 5/5] Documentation/kvm : Add documentation on Hypercalls and features used for PV spinlock Raghavendra K T
2012-01-14 18:27 ` Raghavendra K T
2012-01-14 18:27   ` Raghavendra K T
2012-01-14 18:27   ` Raghavendra K T
2012-01-16  3:23   ` Alexander Graf
2012-01-16  3:23     ` Alexander Graf
2012-01-16  3:51     ` Srivatsa Vaddagiri
2012-01-16  3:51       ` Srivatsa Vaddagiri
2012-01-16  4:00       ` Alexander Graf
2012-01-16  4:00         ` Alexander Graf
2012-01-16  8:47         ` Avi Kivity
2012-01-16  8:44     ` Raghavendra K T
2012-01-16  8:44       ` Raghavendra K T
2012-01-16 10:26       ` Alexander Graf
2012-01-16 10:26         ` Alexander Graf
2012-01-16  9:00   ` Avi Kivity
2012-01-16  9:00     ` Avi Kivity
2012-01-16  9:40     ` Srivatsa Vaddagiri
2012-01-16 10:14       ` Avi Kivity
2012-01-16 14:11         ` Srivatsa Vaddagiri
2012-01-17  9:14           ` Gleb Natapov
2012-01-17  9:14             ` Gleb Natapov
2012-01-17 12:26             ` Srivatsa Vaddagiri
2012-01-17 12:26               ` Srivatsa Vaddagiri
2012-01-17 12:51               ` Gleb Natapov
2012-01-17 12:51                 ` Gleb Natapov
2012-01-17 13:11                 ` Srivatsa Vaddagiri
2012-01-17 13:11                   ` Srivatsa Vaddagiri
2012-01-17 13:20                   ` Gleb Natapov
2012-01-17 13:20                     ` Gleb Natapov
2012-01-17 14:28                     ` Srivatsa Vaddagiri
2012-01-17 14:28                       ` Srivatsa Vaddagiri
2012-01-17 15:32                       ` Gleb Natapov
2012-01-17 15:32                         ` Gleb Natapov
2012-01-17 15:53                         ` Marcelo Tosatti
2012-01-17 15:53                           ` Marcelo Tosatti
2012-01-20 15:09                           ` Srivatsa Vaddagiri
2012-01-17 13:13                 ` Raghavendra K T
2012-01-17 13:13                   ` Raghavendra K T
2012-01-16  3:57 ` [PATCH RFC V4 0/5] kvm : Paravirt-spinlock support for KVM guests Alexander Graf
2012-01-16  3:57   ` Alexander Graf
2012-01-16  6:40   ` Jeremy Fitzhardinge
2012-01-16  6:40     ` Jeremy Fitzhardinge
2012-01-16  8:55     ` Avi Kivity
2012-01-16  8:55       ` Avi Kivity
2012-01-16 23:59       ` Jeremy Fitzhardinge
2012-01-16 23:59         ` Jeremy Fitzhardinge
2012-01-18 10:48         ` Raghavendra K T
2012-01-18 10:48           ` Raghavendra K T
2012-01-16 10:24     ` Alexander Graf
2012-01-16 10:24       ` Alexander Graf
2012-01-17  0:30       ` Jeremy Fitzhardinge [this message]
2012-01-17  0:30         ` Jeremy Fitzhardinge
2012-01-18 10:23         ` Raghavendra K T
2012-01-18 10:23           ` Raghavendra K T
2012-01-16 13:43   ` Raghavendra K T
2012-01-16 13:43     ` Raghavendra K T
2012-01-16 13:49     ` Avi Kivity
2012-01-16 13:49       ` Avi Kivity
2012-01-16 18:48       ` Raghavendra K T
2012-01-16 18:48         ` Raghavendra K T
2012-01-16 14:20   ` Srivatsa Vaddagiri
2012-01-16 14:20     ` Srivatsa Vaddagiri
2012-01-16 14:23     ` Alexander Graf
2012-01-16 14:23       ` Alexander Graf
2012-01-16 18:38       ` Raghavendra K T
2012-01-16 18:38         ` Raghavendra K T
2012-01-16 18:42         ` Alexander Graf
2012-01-16 18:42           ` Alexander Graf
2012-01-17 17:27           ` Raghavendra K T
2012-01-17 17:27             ` Raghavendra K T
2012-01-17 17:39             ` Alexander Graf
2012-01-17 17:39               ` Alexander Graf
2012-01-17 18:36               ` Raghavendra K T
2012-01-17 18:36                 ` Raghavendra K T
2012-01-17 21:57                 ` Dave Hansen
2012-01-17 21:57                   ` Dave Hansen
2012-01-18  2:27                   ` Raghavendra K T
2012-01-18  2:27                     ` Raghavendra K T
2012-01-25  8:55                 ` Raghavendra K T
2012-01-25  8:55                   ` Raghavendra K T
2012-01-25 16:35                   ` Konrad Rzeszutek Wilk
2012-01-25 16:35                     ` Konrad Rzeszutek Wilk
2012-01-25 17:45                     ` Raghavendra K T
2012-01-25 17:45                       ` Raghavendra K T
2012-01-25 19:05                       ` Konrad Rzeszutek Wilk
2012-01-25 19:05                         ` Konrad Rzeszutek Wilk

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4F14C10C.3010303@goop.org \
    --to=jeremy@goop.org \
    --cc=agraf@suse.de \
    --cc=avi@redhat.com \
    --cc=dave.jiang@intel.com \
    --cc=dave@linux.vnet.ibm.com \
    --cc=glommer@redhat.com \
    --cc=gregkh@suse.de \
    --cc=hpa@zytor.com \
    --cc=jan.kiszka@siemens.com \
    --cc=konrad.wilk@oracle.com \
    --cc=kvm@vger.kernel.org \
    --cc=levinsasha928@gmail.com \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=paulus@samba.org \
    --cc=peterz@infradead.org \
    --cc=raghavendra.kt@linux.vnet.ibm.com \
    --cc=riel@redhat.com \
    --cc=sedat.dilek@gmail.com \
    --cc=stefano.stabellini@eu.citrix.com \
    --cc=tglx@linutronix.de \
    --cc=virtualization@lists.linux-foundation.org \
    --cc=x86@kernel.org \
    --cc=xen-devel@lists.xensource.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.