Re: Preemptable Ticket Spinlock

* Re: Preemptable Ticket Spinlock
       [not found] <CAJocwccu5QQyuKRvgNyPSFz2K_rzCW419W9-XdSUYOL7+KqQKg@mail.gmail.com>
@ 2013-04-21 21:12 ` Rik van Riel
  2013-04-21 23:07   ` Jiannan Ouyang
                     ` (2 more replies)
  2013-04-22  5:58 ` Raghavendra K T
  2013-04-26 20:10 ` Andrew Theurer
  2 siblings, 3 replies; 32+ messages in thread
From: Rik van Riel @ 2013-04-21 21:12 UTC (permalink / raw)
  To: Jiannan Ouyang
  Cc: LKML, Raghavendra K T, Peter Zijlstra, Avi Kivity, Gleb Natapov,
	Ingo Molnar, Marcelo Tosatti, Srikar, H. Peter Anvin,
	Nikunj A. Dadhania, KVM, Thomas Gleixner, Chegu Vinod,
	Andrew M. Theurer, Srivatsa Vaddagiri, Andrew Jones,
	Gleb Natapov, Karen Noel

On 04/20/2013 06:12 PM, Jiannan Ouyang wrote:
> Hello Everyone,
>
> I recently came up with a spinlock algorithm that can adapt to
> preemption, which you may be interested in. The intuition is to
> downgrade a fair lock to an unfair lock automatically upon preemption,
> and preserve the fairness otherwise. It is a guest side optimization,
> and can be used as a complementary technique to host side optimizations
> like co-scheduling and Pause-Loop Exiting.
>
> In my experiments, it improves VM performance by 5:32X on average, when
> running on a non paravirtual VMM, and by 7:91X when running on a VMM
> that supports a paravirtual locking interface (using a pv preemptable
> ticket spinlock), when executing a set of microbenchmarks as well as a
> realistic e-commerce benchmark.
>
> A detailed algorithm description can be found in my VEE 2013 paper,
> Preemptable Ticket Spinlocks: Improving Consolidated Performance in the
> Cloud
> Jiannan Ouyang, John R. Lange
> ouyang,jacklange@cs.pitt.edu <mailto:jacklange@cs.pitt.edu>
> University of Pittsburgh
> http://people.cs.pitt.edu/~ouyang/files/publication/preemptable_lock-ouyang-vee13.pdf

Your algorithm is very clever, and very promising.

However, it does increase the size of the struct spinlock, and adds
an additional atomic operation to spin_unlock, neither of which I
suspect are necessary.

If we always incremented the ticket number by 2 (instead of 1), then
we could use the lower bit of the ticket number as the spinlock.

If we do NOT run virtualized, we simply increment the ticket by 2
in spin_unlock, and the code can remain otherwise the same.

If we do run virtualized, we take that spinlock after acquiring
the ticket (or timing out), just like in your code. In the
virtualized spin_unlock, we can then release the spinlock and
increment the ticket in one operation: by simply increasing the
ticket by 1.

In other words, we should be able to keep the overhead of this
to an absolute minimum, and keep spin_unlock to be always the
same cost it is today.

-- 
All rights reversed

^ permalink raw reply	[flat|nested] 32+ messages in thread