* Possible Xen grant table locking improvements
@ 2014-10-20 17:35 David Vrabel
  2014-10-20 18:06 ` Matt Wilson
  2014-10-23  9:46 ` Tim Deegan
  0 siblings, 2 replies; 5+ messages in thread
From: David Vrabel @ 2014-10-20 17:35 UTC (permalink / raw)
  To: Xen-devel; +Cc: Tim Deegan, Keir Fraser, Jan Beulich, Matt Wilson

A while ago Matt Wilson posted patches refactoring some of the grant
table locking[1].  We (XenServer) have found that they significantly
improve performance.

These patches split the single grant table lock into three.

1. The grant table lock protecting grant table resizes.

2. The maptrack lock, protecting the domain's maptrack table.

3. Per-active entry locks.

But some performance bottlenecks remain, which we believe are due to
the maptrack lock.

### Solution

This is a proposal for removing this bottleneck.  This proposal has a
number of drawbacks (see the end), which I would appreciate feedback on.

Most guests do not map a grant reference more than twice (Linux, for
example, will typically map a gref once in the kernel address space, or
twice if a userspace mapping is required).  The maptrack entries for
these two mappings can be stored in the active entry (the "fast"
entries).  If more than two mappings are required, the existing maptrack
table can be used (the "slow" entries).

A maptrack handle for a "fast" entry is encoded as:

    31 30          16  15            0
  +---+---------------+---------------+
  | F | domid         | gref          |
  +---+---------------+---------------+

F is set for a "fast" entry, and clear for a "slow" one. Grant
references above 2^16 will have to be tracked with "slow" entries.
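
For illustration, encoding and decoding such a handle might look like
this (a sketch only; these helpers are not existing Xen code):

    #include <stdint.h>

    #define MAPTRACK_FAST (1u << 31)  /* F: set for a "fast" entry */

    /* Pack (domid, gref) into a "fast" handle.  The caller must have
     * checked that gref < 2^16; domids always fit in 15 bits. */
    static inline uint32_t fast_handle(uint16_t domid, uint32_t gref)
    {
        return MAPTRACK_FAST | ((uint32_t)domid << 16) | gref;
    }

    static inline int handle_is_fast(uint32_t h)
    {
        return !!(h & MAPTRACK_FAST);
    }

    static inline uint16_t handle_domid(uint32_t h)
    {
        return (h >> 16) & 0x7fff;  /* bits 30..16 */
    }

    static inline uint32_t handle_gref(uint32_t h)
    {
        return h & 0xffff;          /* bits 15..0 */
    }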

We can omit taking the grant table lock to check the validity of a grant
ref or maptrack handle since these tables only grow and do not shrink.
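
A sketch of the grow-only argument, using C11 atomics in place of Xen's
own barrier primitives:

    #include <stdatomic.h>

    struct grow_only_table {
        _Atomic unsigned int limit; /* count of valid entries; only grows */
        /* ... the entries themselves ... */
    };

    /* Because the table never shrinks, any handle below a limit we have
     * already observed stays valid forever.  The acquire load pairs with
     * a release store in the grow path, which publishes the new limit
     * only after the new entries are fully initialised. */
    static int handle_valid(struct grow_only_table *t, unsigned int h)
    {
        return h < atomic_load_explicit(&t->limit, memory_order_acquire);
    }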

If strict IOMMU mode is used, IOMMU mappings are updated on every grant
map/unmap.  These are currently set up such that BFN == MFN, which
requires reference counting the IOMMU mappings so they are only torn
down when all grefs for that MFN are unmapped.  This requires an
expensive mapcount() operation that iterates over the whole maptrack table.

There is no requirement for BFN == MFN so each grant map can create its
own IOMMU mapping.  This will require a region of bus address space that
does not overlap with RAM.
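
For example, the (unique) maptrack handle itself could supply the
offset into such a region (a sketch; the region base is a made-up
placeholder):

    #include <stdint.h>

    /* Hypothetical base of a bus-address window that the IOMMU code
     * guarantees does not overlap RAM. */
    #define GRANT_BUS_BASE_BFN 0x100000000ULL

    /* One IOMMU mapping per grant map: deriving the BFN from the unique
     * maptrack handle gives every mapping its own bus address, so no
     * reference counting -- and no mapcount() walk -- is needed when a
     * single mapping is torn down. */
    static inline uint64_t grant_map_bfn(uint32_t maptrack_handle)
    {
        return GRANT_BUS_BASE_BFN + maptrack_handle;
    }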

### Possible Problems

1. The "fast" maptrack entries cause a problem when destroying domains.

A domain being destroyed needs to tear down all active grant maps it
has.  It currently does this with an O(M) operation -- iterating over
all the maptrack entries.

With the "fast" maptrack entries being stored in the remote domain's
active grant entry tables, a walk of all map track entries must iterate
over /every/ domains' active grant entry tables /and/ the local maptrack
table).  This is O(D * G + M), which is maybe too expensive?

(D = number of domains, G = mean grant table size, M = number of local
maptrack entries).

2. The "fast" maptrack entries means we cannot[2] limit the total number
of grant maps a domain can make.

However, I view this as a positive since it removes a hard scalability
limit (the maptrack table limit).

David

[1] http://lists.xen.org/archives/html/xen-devel/2013-11/msg01517.html

[2] We could, but that would require synchronizing access to a
per-domain map count, which I'm trying to avoid.


* Re: Possible Xen grant table locking improvements
  2014-10-20 17:35 Possible Xen grant table locking improvements David Vrabel
@ 2014-10-20 18:06 ` Matt Wilson
  2014-10-21  9:32   ` Jan Beulich
  2014-10-23  9:46 ` Tim Deegan
  1 sibling, 1 reply; 5+ messages in thread
From: Matt Wilson @ 2014-10-20 18:06 UTC (permalink / raw)
  To: David Vrabel; +Cc: Keir Fraser, Tim Deegan, Matt Wilson, Jan Beulich, Xen-devel

On Mon, Oct 20, 2014 at 06:35:47PM +0100, David Vrabel wrote:
> A while ago Matt Wilson posted patches refactoring some of the grant
> table locking[1].  We (XenServer) have found that they significantly
> improve performance.

Fantastic! Thank you for having a look at this work. We definitely
intended to get back to splitting the patch up for the 4.5 cycle, but
other projects were competing for the time. So at this point we were
expecting to come back to this when 4.6 development starts in earnest.

> These patches split the single grant table lock into three.
> 
> 1. The grant table lock protecting grant table resizes.
> 
> 2. The maptrack lock, protecting the domain's maptrack table.
> 
> 3. Per-active entry locks.
> 
> But some performance bottlenecks remain, which we believe are due to
> the maptrack lock.

I vaguely remember this being a problem as well. Unfortunately it's
been about a year since I was deeply in the code.

> ### Solution
> 
> This is a proposal for removing this bottleneck.  This proposal has a
> number of drawbacks (see the end), which I would appreciate feedback on.
> 
> Most guests do not map a grant reference more than twice (Linux, for
> example, will typically map a gref once in the kernel address space, or
> twice if a userspace mapping is required).  The maptrack entries for
> these two mappings can be stored in the active entry (the "fast"
> entries).  If more than two mappings are required, the existing maptrack
> table can be used (the "slow" entries).
> 
> A maptrack handle for a "fast" entry is encoded as:
> 
>     31 30          16  15            0
>   +---+---------------+---------------+
>   | F | domid         | gref          |
>   +---+---------------+---------------+
> 
> F is set for a "fast" entry, and clear for a "slow" one. Grant
> references above 2^16 will have to be tracked with "slow" entries.
>
> We can omit taking the grant table lock to check the validity of a grant
> ref or maptrack handle since these tables only grow and do not shrink.
> 
> If strict IOMMU mode is used, IOMMU mappings are updated on every grant
> map/unmap.  These are currently set up such that BFN == MFN, which
> requires reference counting the IOMMU mappings so they are only torn
> down when all grefs for that MFN are unmapped.  This requires an
> expensive mapcount() operation that iterates over the whole maptrack table.
> 
> There is no requirement for BFN == MFN so each grant map can create its
> own IOMMU mapping.  This will require a region of bus address space that
> does not overlap with RAM.
> 
> ### Possible Problems
> 
> 1. The "fast" maptrack entries cause a problem when destroying domains.
> 
> A domain being destroyed needs to tear down all active grant maps it
> has.  It currently does this with an O(M) operation -- iterating over
> all the maptrack entries.
> 
> With the "fast" maptrack entries being stored in the remote domain's
> active grant entry tables, a walk of all map track entries must iterate
> over /every/ domains' active grant entry tables /and/ the local maptrack
> table).  This is O(D * G + M), which is maybe too expensive?
> 
> (D = number of domains, G = mean grant table size, M = number of local
> maptrack entries).

Hmmmm, that doesn't sound ideal. I suppose we would also need to walk
all of the active grant entry tables under a lock, which might make
the overall runtime of this operation unpredictable under various load
conditions.

> 2. The "fast" maptrack entries means we cannot[2] limit the total number
> of grant maps a domain can make.
> 
> However, I view this as a positive since it removes a hard scalability
> limit (the maptrack table limit).

I'm a little less worried about this point.

It's probably time for me to revisit the proposed patches so that I
can provide more comprehensive feedback on this approach.

--msw

> David
> 
> [1] http://lists.xen.org/archives/html/xen-devel/2013-11/msg01517.html
> 
> [2] We could, but that would require synchronizing access to a
> per-domain map count, which I'm trying to avoid.


* Re: Possible Xen grant table locking improvements
  2014-10-20 18:06 ` Matt Wilson
@ 2014-10-21  9:32   ` Jan Beulich
  0 siblings, 0 replies; 5+ messages in thread
From: Jan Beulich @ 2014-10-21  9:32 UTC (permalink / raw)
  To: David Vrabel, Matt Wilson; +Cc: Tim Deegan, Keir Fraser, Matt Wilson, Xen-devel

>>> On 20.10.14 at 20:06, <msw@linux.com> wrote:
> On Mon, Oct 20, 2014 at 06:35:47PM +0100, David Vrabel wrote:
>> ### Possible Problems
>> 
>> 1. The "fast" maptrack entries cause a problem when destroying domains.
>> 
>> A domain being destroyed needs to tear down all active grant maps it
>> has.  It currently does this with an O(M) operation -- iterating over
>> all the maptrack entries.
>> 
>> With the "fast" maptrack entries being stored in the remote domain's
>> active grant entry tables, a walk of all map track entries must iterate
>> over /every/ domains' active grant entry tables /and/ the local maptrack
>> table).  This is O(D * G + M), which is maybe too expensive?
>> 
>> (D = number of domains, G = mean grant table size, M = number of local
>> maptrack entries).
> 
> Hmmmm, that doesn't sound ideal. I suppose we would also need to walk
> all of the active grant entry tables under a lock, which might make
> the overall runtime of this operation unpredictable under various load
> conditions.

It would seem possible to at least limit the set of domains to look at:
we could introduce a bitmap (or rangeset) in which we track which other
domains a given one has mapped grants from.  Assuming that domains
other than Dom0 would likely serve only a strict subset of all domains,
this might already be a win (and domain destruction is slow anyway).
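
Roughly like this (an illustrative sketch, not actual code; a full
bitmap costs 4KiB per domain with 15-bit domids):

    #include <limits.h>
    #include <stdint.h>

    #define MAX_DOMID     32768u  /* domids fit in 15 bits */
    #define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

    /* Bit d set => this domain has mapped at least one grant from
     * domain d.  Set on grant map and never cleared, so it
     * over-approximates, which is safe: the destruction walk merely
     * skips domains whose bit is clear. */
    struct grant_peers {
        unsigned long bits[MAX_DOMID / BITS_PER_WORD];
    };

    static inline void peers_note(struct grant_peers *p, uint16_t d)
    {
        p->bits[d / BITS_PER_WORD] |= 1UL << (d % BITS_PER_WORD);
    }

    static inline int peers_test(const struct grant_peers *p, uint16_t d)
    {
        return (p->bits[d / BITS_PER_WORD] >> (d % BITS_PER_WORD)) & 1;
    }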

Jan


* Re: Possible Xen grant table locking improvements
  2014-10-20 17:35 Possible Xen grant table locking improvements David Vrabel
  2014-10-20 18:06 ` Matt Wilson
@ 2014-10-23  9:46 ` Tim Deegan
  2014-11-07 15:41   ` David Vrabel
  1 sibling, 1 reply; 5+ messages in thread
From: Tim Deegan @ 2014-10-23  9:46 UTC (permalink / raw)
  To: David Vrabel; +Cc: Keir Fraser, Jan Beulich, Matt Wilson, Xen-devel

Hi,

At 18:35 +0100 on 20 Oct (1413826547), David Vrabel wrote:
> Most guests do not map a grant reference more than twice (Linux, for
> example, will typically map a gref once in the kernel address space, or
> twice if a userspace mapping is required).  The maptrack entries for
> these two mappings can be stored in the active entry (the "fast"
> entries).  If more than two mappings are required, the existing maptrack
> table can be used (the "slow" entries).

Sounds good, as long as the hit rate is indeed high.  Do you know if
the BSD/Windows client code behaves this way too?

> A maptrack handle for a "fast" entry is encoded as:
> 
>     31 30          16  15            0
>   +---+---------------+---------------+
>   | F | domid         | gref          |
>   +---+---------------+---------------+
> 
> F is set for a "fast" entry, and clear for a "slow" one. Grant
> references above 2^16 will have to be tracked with "slow" entries.

How restricting is that limit?  Would 2^15½ be good -- i.e. 15 bits of
gref, plus a bit encoding which of the two entries to look at?

> We can omit taking the grant table lock to check the validity of a grant
> ref or maptrack handle since these tables only grow and do not shrink.

Can you also avoid the lock for accessing the entry itself, with a bit
of RCU magic?  Maybe that's overengineering things.

> If strict IOMMU mode is used, IOMMU mappings are updated on every grant
> map/unmap.  These are currently set up such that BFN == MFN, which
> requires reference counting the IOMMU mappings so they are only torn
> down when all grefs for that MFN are unmapped.  This requires an
> expensive mapcount() operation that iterates over the whole maptrack table.
> 
> There is no requirement for BFN == MFN so each grant map can create its
> own IOMMU mapping.  This will require a region of bus address space that
> does not overlap with RAM.

Hrmn.  That could be tricky to arrange.  And the reference counting
might end up being cheaper than the extra IOMMU flush operations.
(Also, how much would you bet that clients actually use the returned
BFN correctly?)

Would it be enough to optimise mapcount() a bit?  We could organise the
in-use maptrack entries as a hash table instead of (or as well as) a
single linked list.
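
Something like the following sketch (bucket count and hash function
illustrative):

    #include <stdint.h>
    #include <stddef.h>

    #define MAPHASH_BUCKETS 1024u  /* power of two; size illustrative */

    struct maptrack_entry {
        uint64_t mfn;                /* frame this entry maps */
        struct maptrack_entry *next; /* chain within one bucket */
        /* ... domid, gref, flags ... */
    };

    static struct maptrack_entry *maphash[MAPHASH_BUCKETS];

    static inline unsigned int maphash_idx(uint64_t mfn)
    {
        /* Fibonacci hashing: multiply, keep the top 10 bits. */
        return (unsigned int)((mfn * 0x9e3779b97f4a7c15ULL) >> 54) &
               (MAPHASH_BUCKETS - 1);
    }

    /* mapcount() then scans one bucket instead of the whole table. */
    static unsigned int mapcount(uint64_t mfn)
    {
        unsigned int n = 0;
        struct maptrack_entry *e;

        for ( e = maphash[maphash_idx(mfn)]; e != NULL; e = e->next )
            n += (e->mfn == mfn);
        return n;
    }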

On similar lines, would it be worth fragmenting the maptrack itself
(e.g. with per-page locks) to reduce locking contention instead of
moving maptrack entries into the active entry?  It might be Good
Enough[tm], and simpler to build/maintain than this proposal.

> ### Possible Problems
> 
> 1. The "fast" maptrack entries cause a problem when destroying domains.
> 
> A domain being destroyed needs to tear down all active grant maps it
> has.  It currently does this with an O(M) operation -- iterating over
> all the maptrack entries.
> 
> With the "fast" maptrack entries being stored in the remote domain's
> active grant entry tables, a walk of all map track entries must iterate
> over /every/ domains' active grant entry tables /and/ the local maptrack
> table).  This is O(D * G + M), which is maybe too expensive?

It seems OK to me, since it's trivially interruptible for softirqs &c.
(with maybe a bit of extra plumbing around the domain hash table).

> (D = number of domains, G = mean grant table size, M = number of local
> maptrack entries).
> 
> 2. The "fast" maptrack entries means we cannot[2] limit the total number
> of grant maps a domain can make.

Yeah, I think that's fine. :)

Cheers,

Tim.


* Re: Possible Xen grant table locking improvements
  2014-10-23  9:46 ` Tim Deegan
@ 2014-11-07 15:41   ` David Vrabel
  0 siblings, 0 replies; 5+ messages in thread
From: David Vrabel @ 2014-11-07 15:41 UTC (permalink / raw)
  To: Tim Deegan, David Vrabel; +Cc: Keir Fraser, Matt Wilson, Jan Beulich, Xen-devel

On 23/10/14 10:46, Tim Deegan wrote:
> Hi,
> 
> At 18:35 +0100 on 20 Oct (1413826547), David Vrabel wrote:
>> Most guests do not map a grant reference more than twice (Linux, for
>> example, will typically map a gref once in the kernel address space, or
>> twice if a userspace mapping is required).  The maptrack entries for
>> these two mappings can be stored in the active entry (the "fast"
>> entries).  If more than two mappings are required, the existing maptrack
>> table can be used (the "slow" entries).
> 
> Sounds good, as long as the hit rate is indeed high.  Do you know if
> the BSD/windows client code behaves this way too?

I don't know about BSD but XenServer's Windows PV drivers don't support
mapping grants (only granting access).

>> A maptrack handle for a "fast" entry is encoded as:
>>
>>     31 30          16  15            0
>>   +---+---------------+---------------+
>>   | F | domid         | gref          |
>>   +---+---------------+---------------+
>>
>> F is set for a "fast" entry, and clear for a "slow" one. Grant
>> references above 2^16 will have to be tracked with "slow" entries.
> 
> How restricting is that limit?  Would 2^15½ be good -- i.e. 15 bits of
> gref, plus a bit encoding which of the two entries to look at?

Oh,  I forgot about the bit for the entry index.

2^15 entries allows for, e.g., 8 multiqueue VIFs in an 8 VCPU guest
(8 VIFs x 8 queues x 2 rings x 256 slots per ring = 2^15), which is not
a huge number and not a limit I would like to introduce.

One possibility would be for guests wanting to use the fast path to use
a new grant unmap hypercall that also passes the original grant ref and
domain.
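
The argument structure for such a call might look like this (entirely
hypothetical; no such hypercall exists today):

    #include <stdint.h>

    /* Hypothetical unmap variant for "fast" mappings: the guest hands
     * back the (domid, gref) it originally mapped, so Xen can locate
     * the maptrack state in the granter's active entry without
     * squeezing domid and gref into a 32-bit handle. */
    struct gnttab_unmap_fast {
        /* IN */
        uint64_t host_addr; /* mapping to remove */
        uint32_t ref;       /* original grant reference */
        uint16_t dom;       /* domain the grant was mapped from */
        /* OUT */
        int16_t status;     /* GNTST_* */
    };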

>> We can omit taking the grant table lock to check the validity of a grant
>> ref or maptrack handle since these tables only grow and do not shrink.
> 
> Can you also avoid the lock for accessing the entry itself, with a bit
> of RCU magic?  Maybe that's overengineering things.

I don't think this will be necessary -- the active entry lock won't be
contended.

>> If strict IOMMU mode is used, IOMMU mappings are updated on every grant
>> map/unmap.  These are currently set up such that BFN == MFN, which
>> requires reference counting the IOMMU mappings so they are only torn
>> down when all grefs for that MFN are unmapped.  This requires an
>> expensive mapcount() operation that iterates over the whole maptrack table.
>>
>> There is no requirement for BFN == MFN so each grant map can create its
>> own IOMMU mapping.  This will require a region of bus address space that
>> does not overlap with RAM.
> 
> Hrmn.  That could be tricky to arrange.  And the reference counting
> might end up being cheaper than the extra IOMMU flush operations.
> (Also, how much would you bet that clients actually use the returned
> BFN correctly?)

Hmm. Yes, if a guest has assumed that BFN == MFN, it could read the PTE
and get the BFN that way.

> Would it be enough to optimise mapcount() a bit?  We could organise the
> in-use maptrack entries as a hash table instead of (or as well as) a
> single linked list.
> 
> On similar lines, would it be worth fragmenting the maptrack itself
> (e.g. with per-page locks) to reduce locking contention instead of
> moving maptrack entries into the active entry?  If might be Good
> Enough[tm], and simpler to build/maintain than this proposal.

With a per-maptrack-page lock you would still need a per-domain
maptrack lock protecting the maptrack free list.

A better idea may be to hash the domain and grant ref and have a
maptrack table per hash bucket.  Each maptrack table would have its own
maptrack lock.

The maptrack handle could be:

    31               16  15            0
    +-------------------+---------------+
    | bucket            | index         |
    +-------------------+---------------+
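
In code, roughly (bucket count and hash are illustrative):

    #include <stdint.h>

    #define MT_BUCKETS 64u  /* illustrative; each bucket gets its own lock */

    /* Spread unrelated (domid, gref) pairs across per-bucket maptrack
     * tables so they contend on different locks. */
    static inline uint16_t mt_bucket(uint16_t domid, uint32_t gref)
    {
        return (uint16_t)(((uint32_t)domid * 31u + gref) % MT_BUCKETS);
    }

    /* Handle: bucket in bits 31..16, index within the bucket's table in
     * bits 15..0. */
    static inline uint32_t mt_handle(uint16_t bucket, uint16_t index)
    {
        return ((uint32_t)bucket << 16) | index;
    }

    static inline uint16_t mt_handle_bucket(uint32_t h) { return h >> 16; }
    static inline uint16_t mt_handle_index(uint32_t h)  { return h & 0xffff; }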

We should probably try something simpler like this before getting
carried away...

David


