From: Andrew Cooper <andrew.cooper3@citrix.com>
To: George Dunlap <george.dunlap@citrix.com>,
	Malcolm Crossley <malcolm.crossley@citrix.com>,
	JBeulich@suse.com, ian.campbell@citrix.com,
	Marcos.Matsunaga@oracle.com, keir@xen.org,
	konrad.wilk@oracle.com, george.dunlap@eu.citrix.com
Cc: xen-devel@lists.xenproject.org, stefano.stabellini@citrix.com
Subject: Re: [PATCHv2 0/3] Implement per-cpu reader-writer locks
Date: Tue, 24 Nov 2015 18:32:50 +0000	[thread overview]
Message-ID: <5654AD52.1080804@citrix.com> (raw)
In-Reply-To: <5654A98D.3050801@citrix.com>

On 24/11/15 18:16, George Dunlap wrote:
> On 20/11/15 16:03, Malcolm Crossley wrote:
>> This patch series adds per-cpu reader-writer locks as a generic lock
>> implementation and then converts the grant table and p2m rwlocks to
>> use the percpu rwlocks, in order to improve multi-socket host performance.
>>
>> CPU profiling has revealed that the rwlocks themselves suffer from severe cache
>> line bouncing, due to the cmpxchg operation used even when taking a read lock.
>> Multiqueue paravirtualised I/O results in heavy contention on the grant table
>> and p2m read locks of a specific domain, and so I/O throughput is bottlenecked
>> by the overhead of the cache line bouncing itself.
>>
>> Per-cpu read locks avoid lock cache line bouncing by using a per-cpu data
>> area to record that a CPU has taken the read lock. Correctness is enforced for
>> the write lock by using a per-lock barrier, which forces the per-cpu read lock
>> to revert to using a standard read lock. The write lock then polls all of
>> the per-cpu data areas until all active readers for the lock have exited.
>>
>> Removing the cache line bouncing on a multi-socket Haswell-EP system
>> dramatically improves performance, with 16 vCPU network I/O performance going
>> from 15 Gb/s to 64 Gb/s! The host under test was fully utilising all 40
>> logical CPUs at 64 Gb/s, so a host with more logical CPUs may see an even
>> better I/O improvement.
> Impressive -- thanks for doing this work.
>
> One question: Your description here sounds like you've tested with a
> single large domain, but what happens with multiple domains?

The test was with two domUs, doing network traffic between them.

>
> It looks like the "per-cpu-rwlock" is shared by *all* locks of a
> particular type (e.g., all domains share the per-cpu p2m rwlock).
> (Correct me if I'm wrong here.)

The per-pcpu pointer is shared by all locks of a particular type, but the
individual lock can still be distinguished by following the pointer back.

Therefore, the locks are not actually shared.
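
For clarity, a rough sketch of the data layout (illustrative names and
simplified types, not the verbatim patch):

/* One per-cpu slot per lock *type*, e.g. one for p2m, one for grant table. */
typedef struct percpu_rwlock {
    rwlock_t rwlock;            /* fallback rwlock used on the slow path    */
    bool writer_activating;     /* set while a writer is acquiring the lock */
} percpu_rwlock_t;

DECLARE_PER_CPU(percpu_rwlock_t *, p2m_percpu_rwlock);

/*
 * A fast-path reader stores the address of the specific percpu_rwlock_t it
 * is taking into its own slot, so a writer walking the slots can tell
 * "reading my lock" apart from "reading another lock of the same type".
 */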

>
> Which means two things:
>
>  1) *Any* writer will have to wait for the rwlock of that type to be
> released on *all* domains before being able to write.  Is there any risk
> that on a busy system, this will be an unusually long wait?

No.  The write-locker looks through all pcpus and ignores any whose
per-cpu pointer is not pointing at the intended percpu_rwlock.

From _percpu_write_lock():

/*
 * Remove any percpu readers not contending on this rwlock
 * from our check mask.
 */
if ( per_cpu_ptr(per_cpudata, cpu) != percpu_rwlock )
    cpumask_clear_cpu(cpu, &this_cpu(percpu_rwlock_readers));

As a result, two calls to _percpu_write_lock() against different
percpu_rwlock_t's will not interact.
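
For context, the surrounding logic is roughly as follows (simplified
sketch; see the patch itself for the precise form):

unsigned int cpu;

write_lock(&percpu_rwlock->rwlock);     /* exclude other writers            */
percpu_rwlock->writer_activating = 1;   /* divert new readers to slow path  */
smp_mb();
cpumask_copy(&this_cpu(percpu_rwlock_readers), &cpu_online_map);

do {
    for_each_cpu ( cpu, &this_cpu(percpu_rwlock_readers) )
    {
        /*
         * Remove any percpu readers not contending on this rwlock
         * from our check mask.
         */
        if ( per_cpu_ptr(per_cpudata, cpu) != percpu_rwlock )
            cpumask_clear_cpu(cpu, &this_cpu(percpu_rwlock_readers));
    }
    cpu_relax();                        /* poll until all readers have left */
} while ( !cpumask_empty(&this_cpu(percpu_rwlock_readers)) );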

>
> 2) *All* domains will have to take the slow path for reading when *any*
> domain has or is trying to acquire the write lock.  What is the
> probability that, on a busy system, that turns out to be "most of the time"?

_percpu_read_lock() will only take the slow path if:

1) The intended lock is (or is in the process of being) taken for writing
2) The call path has nested _percpu_read_lock() calls on the same type of lock

By observation, 2) does not currently occur in the code.
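
Roughly, the reader side looks like this (simplified sketch, illustrative
rather than the verbatim patch):

/* Case 2: this pcpu already holds a lock of this type -- use plain rwlock. */
if ( this_cpu_ptr(per_cpudata) != NULL )
{
    read_lock(&percpu_rwlock->rwlock);
    return;
}

/* Fast path: advertise which lock we are reading, then re-check for writers. */
this_cpu_ptr(per_cpudata) = percpu_rwlock;
smp_mb();

/* Case 1: a writer is active -- back off and queue behind it. */
if ( percpu_rwlock->writer_activating )
{
    this_cpu_ptr(per_cpudata) = NULL;
    read_lock(&percpu_rwlock->rwlock);
    this_cpu_ptr(per_cpudata) = percpu_rwlock;
    read_unlock(&percpu_rwlock->rwlock);
}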

~Andrew


Thread overview: 19+ messages
2015-11-20 16:03 [PATCHv2 0/3] Implement per-cpu reader-writer locks Malcolm Crossley
2015-11-20 16:03 ` [PATCHv2 1/3] rwlock: Add " Malcolm Crossley
2015-11-25 11:12   ` George Dunlap
2015-11-26 12:17   ` Jan Beulich
2015-11-20 16:03 ` [PATCHv2 2/3] grant_table: convert grant table rwlock to percpu rwlock Malcolm Crossley
2015-11-25 12:35   ` Jan Beulich
2015-11-25 13:43     ` Malcolm Crossley
2015-11-25 13:53       ` Jan Beulich
2015-11-25 14:11         ` Malcolm Crossley
2015-11-20 16:03 ` [PATCHv2 3/3] p2m: " Malcolm Crossley
2015-11-25 12:00   ` George Dunlap
2015-11-25 12:38   ` Jan Beulich
2015-11-25 12:54     ` Malcolm Crossley
2015-11-24 18:16 ` [PATCHv2 0/3] Implement per-cpu reader-writer locks George Dunlap
2015-11-24 18:30   ` George Dunlap
2015-11-25  8:58     ` Malcolm Crossley
2015-11-25  9:49       ` George Dunlap
2015-11-26  9:48       ` Dario Faggioli
2015-11-24 18:32   ` Andrew Cooper [this message]
