* Any work on sharing of large multi-page segments?
@ 2015-03-15  8:37 Andrew Warkentin
  2015-03-16 11:49 ` Jan Beulich
  0 siblings, 1 reply; 8+ messages in thread
From: Andrew Warkentin @ 2015-03-15  8:37 UTC (permalink / raw)
  To: xen-devel

Has anyone done any work on sharing of large multiple-page segments 
between domains? The current grant table implementation is unsuitable 
for this because it only allows sharing single pages and is limited to a 
relatively small number of entries (and passing large numbers of 
single-page grant references between domains for a segment might get slow).

I am planning to write a paravirtualized OpenGL implementation that will 
use a ring buffer for commands and shared memory for passing buffers 
between front end and back end, rather than trying to push everything 
through a socket as VMGL and Chromium do (using sockets to push OpenGL 
buffers between VMs on the same machine is kind of ridiculous when you 
could just use shared memory segments). Something like this pretty much 
requires the ability to grant multi-page segments. The only other option 
I could see would be limiting the backend to running on dom0 (so it can 
use xc_map_foreign_pages), which I don't consider acceptable, since some 
graphics drivers have issues with running on dom0, and I also want to be 
able to run the backend on an arbitrary domU (such as a dedicated 
graphics domain, or passing the graphics card to a Windows domU and 
having other VMs display through it - that's about the only way that I 
know of to have Direct3D support under Windows simultaneously with 
OpenGL on other VMs other than using Wine Direct3D, which is somewhat 
limited).

Another use of segment grants would be sharing of code between driver 
domains to reduce memory footprint. I want to put together a Xen 
system where dom0 is a stub domain (running a single-process embedded 
OS) that just provides xenstore and starts and stops domains (based on 
requests from a domU), and there is a separate domain for each driver 
(based on a cut-down Linux system, and maybe also optionally on the same 
single-process OS as dom0, with the NetBSD rump kernel PCI drivers 
ported to it). Since there will be several service domains, using 
segments to share code between them would reduce the memory footprint 
somewhat (for sharing kernel code these would presumably require some 
kind of small bootstrap to map the grant reference).

If nobody has patched Xen to support multi-page segment grants (I 
couldn't find anything of the sort), I plan to write my own patches. 
Would there be any major limitations of the current design that would 
prevent supporting them without a major redesign? I'd like to add 
support for multiple-page grants in a way that requires as few changes 
to existing DomU code as possible. Maybe a new hypercall that maps a 
multiple-page grant so that existing grant-handling code won't break if 
it attempts to map a multiple-page grant reference, and some kind of 
"grant length table" to allow domains that use v1 grant tables to grant 
multi-page segments (for v2 grant tables, a new type flag and entry 
format could probably be used)?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Any work on sharing of large multi-page segments?
  2015-03-15  8:37 Any work on sharing of large multi-page segments? Andrew Warkentin
@ 2015-03-16 11:49 ` Jan Beulich
  2015-03-17  0:46   ` Andrew Warkentin
  0 siblings, 1 reply; 8+ messages in thread
From: Jan Beulich @ 2015-03-16 11:49 UTC (permalink / raw)
  To: Andrew Warkentin; +Cc: xen-devel

>>> On 15.03.15 at 09:37, <andreww591@gmail.com> wrote:
> Has anyone done any work on sharing of large multiple-page segments 
> between domains? The current grant table implementation is unsuitable 
> for this because it only allows sharing single pages and is limited to a 
> relatively small number of entries (and passing large numbers of 
> single-page grant references between domains for a segment might get slow).
> 
> I am planning to write a paravirtualized OpenGL implementation that will 
> use a ring buffer for commands and shared memory for passing buffers 
> between front end and back end, rather than trying to push everything 
> through a socket as VMGL and Chromium do (using sockets to push OpenGL 
> buffers between VMs on the same machine is kind of ridiculous when you 
> could just use shared memory segments). Something like this pretty much 
> requires the ability to grant multi-page segments.

So where do you expect the major performance / scalability
improvement to be gained? Internally to Xen, each page will need
to be tracked separately anyway, as what appears physically
contiguous in the granting guest may (and likely will) not be
contiguous in machine memory (i.e. from Xen's perspective).
Furthermore the public interface is currently written such that
grant lengths must be less than 64k. I.e. already at the very
simple first steps you'd be faced with implementing bigger length
counterparts of (almost) all existing interfaces.

> Another use of segment grants would be sharing of code between driver 
> domains to reduce memory footprint. I want to put together a Xen 
> system where dom0 is a stub domain (running a single-process embedded 
> OS) that just provides xenstore and starts and stops domains (based on 
> requests from a domU), and there is a separate domain for each driver 
> (based on a cut-down Linux system, and maybe also optionally on the same 
> single-process OS as dom0, with the NetBSD rump kernel PCI drivers 
> ported to it). Since there will be several service domains, using 
> segments to share code between them would reduce the memory footprint 
> somewhat (for sharing kernel code these would presumably require some 
> kind of small bootstrap to map the grant reference).

No, grants are explicitly not intended to be used for code (or data)
sharing, only for data exchange: They're being mapped with the
executable flag clear. For code and (read-only) data sharing you'll
want to use the page sharing facility instead.

Jan


* Re: Any work on sharing of large multi-page segments?
  2015-03-16 11:49 ` Jan Beulich
@ 2015-03-17  0:46   ` Andrew Warkentin
  2015-03-17  8:54     ` Jan Beulich
  2015-03-17 11:48     ` George Dunlap
  0 siblings, 2 replies; 8+ messages in thread
From: Andrew Warkentin @ 2015-03-17  0:46 UTC (permalink / raw)
  To: xen-devel

On 3/16/15, Jan Beulich <JBeulich@suse.com> wrote:
> So where do you expect the major performance / scalability
> improvement to be gained? Internally to Xen, each page will need
> to be tracked separately anyway, as what appears physically
> contiguous in the granting guest may (and likely will) not be
> contiguous in machine memory (i.e. from Xen's perspective).
> Furthermore the public interface is currently written such that
> grant lengths must be less than 64k. I.e. already at the very
> simple first steps you'd be faced with implementing bigger length
> counterparts of (almost) all existing interfaces.

Since I'm looking to grant OpenGL buffers that can be many megabytes
in size, I would think there would be a fair bit of overhead if the
backend domain had to make a hypercall to map every single page of a
buffer. I guess what I could do would be to add a hypercall that maps
a contiguous group of grant entries (contiguous in the grant table,
not necessarily contiguous in memory).

> No, grants are explicitly not intended to be used for code (or data)
> sharing, only for data exchange: They're being mapped with the
> executable flag clear. For code and (read-only) data sharing you'll
> want to use the page sharing facility instead.

I was thinking more of explicit sharing of code rather than automatic
deduplication (at least I'm assuming you're talking about the
deduplication support that was added a while back). I would imagine
there would be some overhead associated with deduplication whereas
explicit sharing would incur relatively little overhead. For a desktop
that's running relatively few VMs that are often going to be
dissimilar, the overhead of deduplication might not be worth it. Also,
I want to make my disaggregated Xen system entirely
memory-resident except for the storage domain (it should be
lightweight enough), and it would make more sense to create a service
domain by mapping everything into its address space when building it,
rather than copying it from dom0 and waiting for the deduplication to
kick in.


* Re: Any work on sharing of large multi-page segments?
  2015-03-17  0:46   ` Andrew Warkentin
@ 2015-03-17  8:54     ` Jan Beulich
  2015-03-17 23:45       ` Andrew Warkentin
  2015-03-17 11:48     ` George Dunlap
  1 sibling, 1 reply; 8+ messages in thread
From: Jan Beulich @ 2015-03-17  8:54 UTC (permalink / raw)
  To: Andrew Warkentin; +Cc: xen-devel

>>> On 17.03.15 at 01:46, <andreww591@gmail.com> wrote:
> On 3/16/15, Jan Beulich <JBeulich@suse.com> wrote:
>> So where do you expect the major performance / scalability
>> improvement to be gained? Internally to Xen, each page will need
>> to be tracked separately anyway, as what appears physically
>> contiguous in the granting guest may (and likely will) not be
>> contiguous in machine memory (i.e. from Xen's perspective).
>> Furthermore the public interface is currently written such that
>> grant lengths must be less than 64k. I.e. already at the very
>> simple first steps you'd be faced with implementing bigger length
>> counterparts of (almost) all existing interfaces.
> 
> Since I'm looking to grant OpenGL buffers that can be many megabytes
> in size, I would think there would be a fair bit of overhead if the
> backend domain had to make a hypercall to map every single page of a
> buffer. I guess what I could do would be to add a hypercall that maps
> a contiguous group of grant entries (contiguous in the grant table,
> not necessarily contiguous in memory).

And how would that be significantly different from the batching
that's already built into the grant table hypercall?

Jan


* Re: Any work on sharing of large multi-page segments?
  2015-03-17  0:46   ` Andrew Warkentin
  2015-03-17  8:54     ` Jan Beulich
@ 2015-03-17 11:48     ` George Dunlap
  1 sibling, 0 replies; 8+ messages in thread
From: George Dunlap @ 2015-03-17 11:48 UTC (permalink / raw)
  To: Andrew Warkentin; +Cc: xen-devel

On Tue, Mar 17, 2015 at 12:46 AM, Andrew Warkentin <andreww591@gmail.com> wrote:
> I was thinking more of explicit sharing of code rather than automatic
> deduplication (at least I'm assuming you're talking about the
> deduplication support that was added a while back).

Automatic deduplication has two halves: automatically identifying
pages which are duplicated, and then the actual mechanism of sharing
them.

Any deduplication code would run as a process, probably in domain 0,
and may be somewhat slow; but the actual mechanism of sharing is a
generic mechanism in the hypervisor which any client can use.  Jan is
suggesting that you might be able to use that interface to
pro-actively tell Xen about the memory pages shared between your
various domains.

 -George


* Re: Any work on sharing of large multi-page segments?
  2015-03-17  8:54     ` Jan Beulich
@ 2015-03-17 23:45       ` Andrew Warkentin
  2015-03-18 10:16         ` Ian Campbell
  2015-03-18 16:40         ` George Dunlap
  0 siblings, 2 replies; 8+ messages in thread
From: Andrew Warkentin @ 2015-03-17 23:45 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel

On 3/17/15, Jan Beulich <JBeulich@suse.com> wrote:
> And how would that be significantly different from the batching
> that's already built into the grant table hypercall?
>
I guess it does do more or less what I want already. I was looking
more at the inner mapping/unmapping functions, rather than the
wrappers around them that implement the actual hypercalls.

A useful addition would be support for granting 2M pages. That would
eliminate any problem with running out of grant table slots.

On 3/17/15, George Dunlap <George.Dunlap@eu.citrix.com> wrote:
> Any deduplication code would run as a process, probably in domain 0,
> and may be somewhat slow; but the actual mechanism of sharing is a
> generic mechanism in the hypervisor which any client can use.  Jan is
> suggesting that you might be able to use that interface to
> pro-actively tell Xen about the memory pages shared between your
> various domains.
>

I wasn't quite sure if it's generic enough to use to implement shared
segments, or if it was specific to deduplication at the hypervisor
level.


* Re: Any work on sharing of large multi-page segments?
  2015-03-17 23:45       ` Andrew Warkentin
@ 2015-03-18 10:16         ` Ian Campbell
  2015-03-18 16:40         ` George Dunlap
  1 sibling, 0 replies; 8+ messages in thread
From: Ian Campbell @ 2015-03-18 10:16 UTC (permalink / raw)
  To: Andrew Warkentin; +Cc: Jan Beulich, xen-devel

On Tue, 2015-03-17 at 17:45 -0600, Andrew Warkentin wrote:
> On 3/17/15, Jan Beulich <JBeulich@suse.com> wrote:
> > And how would that be significantly different from the batching
> > that's already built into the grant table hypercall?
> >
> I guess it does do more or less what I want already. I was looking
> more at the inner mapping/unmapping functions, rather than the
> wrappers around them that implement the actual hypercalls.
> 
> A useful addition would be support for granting 2M pages. That would
> eliminate any problem with running out of grant table slots.

It's related but not quite the same: on ARM we are probably going to
want to look into support for 64K grant mappings at some point.

On ARM there are several options for the basic leaf page ("granule")
size: 4K, 16K and 64K. It seems that at least some OS/distro vendors
are going to ship their kernels with 64K pages by default, so we need
to be able to support such guests.

Short term we can deal with this in the front/backend by using multiple
grants per guest page (i.e. 16 grants per 64K page) but that is quite
wasteful of ring space. Longer term I think it would be worth
investigating adding grant table extensions to allow for >4K granule
sizes.

From there it's not too much of a stretch to imagine supporting
superpages of whichever granule size (i.e. 2M in the 4K case might be
sane, 32M in the 64K case perhaps less useful). 

Even without that, Linux will use compound pages e.g. for network frags
(up to order 3 == 32k, I think), so supporting higher-order grants
might be useful even on x86.

Internally we'd still need to deal with all this in Xen as 4K mappings
in the second-stage page tables, but that's doable I think.

It's not clear when this would float to the top of priority list for Xen
on ARM though, so I wouldn't wait for us ;-)

Ian.


* Re: Any work on sharing of large multi-page segments?
  2015-03-17 23:45       ` Andrew Warkentin
  2015-03-18 10:16         ` Ian Campbell
@ 2015-03-18 16:40         ` George Dunlap
  1 sibling, 0 replies; 8+ messages in thread
From: George Dunlap @ 2015-03-18 16:40 UTC (permalink / raw)
  To: Andrew Warkentin; +Cc: Jan Beulich, xen-devel

On Tue, Mar 17, 2015 at 11:45 PM, Andrew Warkentin <andreww591@gmail.com> wrote:
> On 3/17/15, George Dunlap <George.Dunlap@eu.citrix.com> wrote:
>> Any deduplication code would run as a process, probably in domain 0,
>> and may be somewhat slow; but the actual mechanism of sharing is a
>> generic mechanism in the hypervisor which any client can use.  Jan is
>> suggesting that you might be able to use that interface to
>> pro-actively tell Xen about the memory pages shared between your
>> various domains.
>>
>
> I wasn't quite sure if it's generic enough to use to implement shared
> segments, or if it was specific to deduplication at the hypervisor
> level.

I haven't used the shared memory interface myself, but when we designed
it I'm pretty sure that was the intention.  One idea that was discussed,
for instance, was that scanning random pages for duplicates has turned
out to be of little benefit for VMware; a more promising avenue might
be hooking into the block layer, so that if VM A and VM B have disks
that share a base image, and block X is shared, then if VM A reads
block X and then VM B reads block X, instead of issuing a second DMA
into VM B's memory, the disk layer can just tell Xen, "please share
this page copy-on-write between VM A and VM B".

Remember also that Jan was recommending this method for sharing
read-only shared data and executable segments, not for areas for
communication (i.e., where both are going to want to write to the
memory and have the other side see the updates).  For that you've got
to go with grant tables.

Anyway, I think it's worth looking into.

 -George

