* [hybrid] : mmap pfn space...
@ 2012-03-23 18:01 Mukesh Rathor
  2012-03-26 10:37 ` Stefano Stabellini
  0 siblings, 1 reply; 11+ messages in thread
From: Mukesh Rathor @ 2012-03-23 18:01 UTC (permalink / raw)
  To: Xen-devel, Ian Campbell, stefano.stabellini

Hi Ian/Stefano,

So, I'm back to using pfn space from maxphysaddr below. Stefano, you
suggested ballooning, but that would be just too slow. There are a lot of
pages to be mapped, 4k at a time during guest creation, and I am afraid
ballooning and hypercalls to populate EPT will be pretty slow.

OTOH, there is tons of address space available between max-physaddr and
max pfn in dom0. Stefano, your concern was stuff mapped there
causing problems in the future. But we can always look at the e820 for
conflicts. Keeping things fast is important for us.

Please let me know if you still have issues with my approach. I
believe this is what Ian is doing on the ARM port.

thanks,
Mukesh


* Re: [hybrid] : mmap pfn space...
  2012-03-23 18:01 [hybrid] : mmap pfn space Mukesh Rathor
@ 2012-03-26 10:37 ` Stefano Stabellini
  2012-03-26 10:40   ` Ian Campbell
  2012-04-14  1:47   ` Mukesh Rathor
  0 siblings, 2 replies; 11+ messages in thread
From: Stefano Stabellini @ 2012-03-26 10:37 UTC (permalink / raw)
  To: Mukesh Rathor; +Cc: Xen-devel, Ian Campbell, Stefano Stabellini

On Fri, 23 Mar 2012, Mukesh Rathor wrote:
> Hi Ian/Stefano,
> 
> So, I'm back to using pfn space from maxphysaddr below. Stefano, you
> suggested ballooning, but that would be just too slow. There are a lot of
> pages to be mapped, 4k at a time during guest creation, and I am afraid
> ballooning and hypercalls to populate EPT will be pretty slow.
>
> OTOH, there is tons of address space available between max-physaddr and
> max pfn in dom0. Stefano, your concern was stuff mapped there
> causing problems in the future. But we can always look at the e820 for
> conflicts. Keeping things fast is important for us.
>
> Please let me know if you still have issues with my approach. I
> believe this is what Ian is doing on the ARM port.

I think that we should explicitly allocate these pages/addresses and
not rely on the fact that they are at a specific location that we deem
safe for now.
So if we explicitly introduce a new region at the end of the e820 that
we mark as reserved and we use it for this, I would be OK with that.
However we need to be careful because editing the e820 has proved to be
challenging in the past.
Also we would need to figure out a way to tell Linux that these
reserved addresses are actually OK to be used. Maybe we need a new
command line or hypercall for that.


* Re: [hybrid] : mmap pfn space...
  2012-03-26 10:37 ` Stefano Stabellini
@ 2012-03-26 10:40   ` Ian Campbell
  2012-04-14  1:47   ` Mukesh Rathor
  1 sibling, 0 replies; 11+ messages in thread
From: Ian Campbell @ 2012-03-26 10:40 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: Xen-devel

On Mon, 2012-03-26 at 11:37 +0100, Stefano Stabellini wrote:
> On Fri, 23 Mar 2012, Mukesh Rathor wrote:
> > Hi Ian/Stefano,
> > 
> > So, I'm back to using pfn space from maxphysaddr below. Stefano, you
> > suggested ballooning, but that would be just too slow. There are a lot of
> > pages to be mapped, 4k at a time during guest creation, and I am afraid
> > ballooning and hypercalls to populate EPT will be pretty slow.
> >
> > OTOH, there is tons of address space available between max-physaddr and
> > max pfn in dom0. Stefano, your concern was stuff mapped there
> > causing problems in the future. But we can always look at the e820 for
> > conflicts. Keeping things fast is important for us.
> >
> > Please let me know if you still have issues with my approach. I
> > believe this is what Ian is doing on the ARM port.
> 
> I think that we should explicitly allocate these pages/addresses and
> not rely on the fact that they are at a specific location that we deem
> safe for now.

Agreed. In the context of the thing I'm doing on ARM this is entirely a
short term hack until we figure out something better.

Ian.

> So if we explicitly introduce a new region at the end of the e820 that
> we mark as reserved and we use it for this, I would be OK with that.
> However we need to be careful because editing the e820 has proved to be
> challenging in the past.
> Also we would need to figure out a way to tell Linux that these
> reserved addresses are actually OK to be used. Maybe we need a new
> command line or hypercall for that.


* Re: [hybrid] : mmap pfn space...
  2012-03-26 10:37 ` Stefano Stabellini
  2012-03-26 10:40   ` Ian Campbell
@ 2012-04-14  1:47   ` Mukesh Rathor
  2012-04-16 14:14     ` Ian Campbell
  2012-04-16 14:39     ` Daniel De Graaf
  1 sibling, 2 replies; 11+ messages in thread
From: Mukesh Rathor @ 2012-04-14  1:47 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: Xen-devel, Ian Campbell

On Mon, 26 Mar 2012 11:37:46 +0100
Stefano Stabellini <stefano.stabellini@eu.citrix.com> wrote:
> I think that we should explicitly allocate these pages/addresses and
> not rely on the fact that they are at a specific location that we deem
> safe for now.
> So if we explicitly introduce a new region at the end of the e820 that
> we mark as reserved and we use it for this, I would be OK with that.
> However we need to be careful because editing the e820 has proved to
> be challenging in the past.
> Also we would need to figure out a way to tell Linux that these
> reserved addresses are actually OK to be used. Maybe we need a new
> command line or hypercall for that.

That sounds like a reasonable approach. Let's do it as part of phase II.
I wanna get some basic code in.

So, to give an update of where I am, good news, I've got guests 
finally booting using hybrid dom0. So, that means I am almost there 
now!!!! Yeay...

But, the pfn space management for privcmd mapping is still a hack.
Running into many issues. Basically, it is forcing me to write a slab
allocator for the reserved pfn space, which I am trying to avoid. During
guest creation, the xl process maps about 10k foreign pages, and xenstored 1.

I was thinking of just dividing my pfn space into say 10 chunks, each
with 10k pages, so 10 guest creations can happen simultaneously. But
then I found out that xl is not the only process doing the mapping: xenstored
also needs to map domU frames. Otherwise, I could just do one chunk
per process. Also, I am breaking mmap semantics somewhat by hooking
via privcmd_mmap, because the unmaps don't follow any order. So my last
unmap frees the entire 10k chunk it's using.

In a nutshell, I am still trying to figure out how to allocate reserved pfns
for privcmd without writing a slab allocator. I think using mmap makes
it harder; can't we just use ioctl to get the VA? Then I could nicely
do something like:
  xl: 
    - open(privcmd file)
    - ioctl(get rsvd/e820 pfn handle)
    - ioctl(get VA using above handle) /* alternate to mmap */
    - ioctl(get VA1 using above handle) /* alternate to mmap */
    ...
    - ioctl(release handle)
    - ioctl(release VA)
    - close file

Is that an option (to change mmap to ioctl)? 
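
The handle-based flow listed above can be pictured with a small user-space
model. This is only a sketch in plain Python, not kernel code: the class and
method names (ReservedPfnPool, get_handle, map_va, release_handle) are
hypothetical, the base pfn is an arbitrary illustrative value, and none of
this corresponds to an existing privcmd ioctl. It assumes one handle owns one
10k-page chunk, mappings are carved out of the chunk in order, and releasing
the handle returns the whole chunk.

```python
# Toy model of the proposed handle-based flow. All names are hypothetical.

class ReservedPfnPool:
    def __init__(self, base_pfn, chunk_pages, nr_chunks):
        self.chunk_pages = chunk_pages
        # Each free chunk is identified by its first pfn.
        self.free_chunks = [base_pfn + i * chunk_pages for i in range(nr_chunks)]
        self.handles = {}  # handle id -> [chunk start pfn, pages used so far]
        self.next_handle = 1

    def get_handle(self):
        """Models ioctl(get rsvd/e820 pfn handle): reserve one whole chunk."""
        if not self.free_chunks:
            raise MemoryError("no free chunk in the reserved pfn space")
        handle = self.next_handle
        self.next_handle += 1
        self.handles[handle] = [self.free_chunks.pop(0), 0]
        return handle

    def map_va(self, handle, nr_pages):
        """Models ioctl(get VA using handle): take the next pfns in the chunk."""
        start, used = self.handles[handle]
        if used + nr_pages > self.chunk_pages:
            raise MemoryError("chunk exhausted")
        self.handles[handle][1] = used + nr_pages
        return start + used  # first pfn backing this mapping

    def release_handle(self, handle):
        """Models ioctl(release handle): give the whole chunk back at once."""
        start, _ = self.handles.pop(handle)
        self.free_chunks.append(start)


# Arbitrary base pfn, 10 chunks of 10k pages as in the message above.
pool = ReservedPfnPool(base_pfn=0x100000, chunk_pages=10_000, nr_chunks=10)
h = pool.get_handle()
va0_pfn = pool.map_va(h, 4)    # first mapping
va1_pfn = pool.map_va(h, 16)   # second mapping, placed right after the first
pool.release_handle(h)
```

Because the handle has an explicit lifetime, nothing needs to be freed page by
page in this model, which is the property the ioctl idea is after.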

Hope that makes sense,

thanks,
Mukesh


* Re: [hybrid] : mmap pfn space...
  2012-04-14  1:47   ` Mukesh Rathor
@ 2012-04-16 14:14     ` Ian Campbell
  2012-04-16 16:22       ` Stefano Stabellini
  2012-04-16 14:39     ` Daniel De Graaf
  1 sibling, 1 reply; 11+ messages in thread
From: Ian Campbell @ 2012-04-16 14:14 UTC (permalink / raw)
  To: Mukesh Rathor; +Cc: Xen-devel, Stefano Stabellini

On Sat, 2012-04-14 at 02:47 +0100, Mukesh Rathor wrote:
> On Mon, 26 Mar 2012 11:37:46 +0100
> Stefano Stabellini <stefano.stabellini@eu.citrix.com> wrote:
> > I think that we should explicitly allocate these pages/addresses and
> > not rely on the fact that they are at a specific location that we deem
> > safe for now.
> > So if we explicitly introduce a new region at the end of the e820 that
> > we mark as reserved and we use it for this, I would be OK with that.
> > However we need to be careful because editing the e820 has proved to
> > be challenging in the past.
> > Also we would need to figure out a way to tell Linux that these
> > reserved addresses are actually OK to be used. Maybe we need a new
> > command line or hypercall for that.
> 
> That sounds like a reasonable approach. Let's do it as part of phase II.
> I wanna get some basic code in.
> 
> So, to give an update of where I am, good news, I've got guests 
> finally booting using hybrid dom0. So, that means I am almost there 
> now!!!! Yeay...

Awesome news!

> But, the pfn space management for privcmd mapping is still a hack.
> Running into many issues. Basically, it is forcing me to write a slab
> allocator for the reserved pfn space, which I am trying to avoid. During
> guest creation, the xl process maps about 10k foreign pages, and xenstored 1.

10k simultaneously or over the life of a domain build?

> I was thinking of just dividing my pfn space into say 10 chunks, each
> with 10k pages, so 10 guest creations can happen simultaneously. But
> then I found out that xl is not the only process doing the mapping: xenstored
> also needs to map domU frames. Otherwise, I could just do one chunk
> per process. Also, I am breaking mmap semantics somewhat by hooking
> via privcmd_mmap, because the unmaps don't follow any order. So my last
> unmap frees the entire 10k chunk it's using.

Presumably that's mostly just an issue of doing more accounting/tracking
in the privcmd driver (like the gntdev device does) so you can properly
release things at the right time/place?
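
One way to picture that kind of accounting is the toy model below. It is
plain Python rather than driver code, and the names (PfnTracker, map, unmap)
are hypothetical, not taken from privcmd or gntdev. The assumption is that
each mapping is recorded under its virtual address together with the pfns
backing it, so an unmap arriving in any order gives back exactly the pages it
covered instead of the whole chunk.

```python
# Toy model of per-mapping tracking. All names are hypothetical.

class PfnTracker:
    def __init__(self, first_pfn, nr_pages):
        self.free = list(range(first_pfn, first_pfn + nr_pages))
        self.live = {}  # vaddr -> list of pfns backing that mapping

    def map(self, vaddr, nr_pages):
        if vaddr in self.live:
            raise ValueError("vaddr already mapped")
        if nr_pages > len(self.free):
            raise MemoryError("not enough free pfns")
        pfns = [self.free.pop() for _ in range(nr_pages)]
        self.live[vaddr] = pfns
        return pfns

    def unmap(self, vaddr):
        # Only the pfns recorded for this mapping go back to the free list.
        pfns = self.live.pop(vaddr)
        self.free.extend(pfns)
        return len(pfns)


tracker = PfnTracker(first_pfn=0x100000, nr_pages=8)  # arbitrary values
tracker.map(0x7000, 3)
tracker.map(0x9000, 2)
tracker.map(0xb000, 3)

# Unmaps arrive in no particular order.
freed_first = tracker.unmap(0x9000)
free_after_first = len(tracker.free)
tracker.unmap(0x7000)
tracker.unmap(0xb000)
```

The cost is one bookkeeping entry per live mapping, which is much less than a
general-purpose allocator for the pfn space.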

> In a nutshell, I am still trying to figure out how to allocate reserved pfns
> for privcmd without writing a slab allocator.

Can't you just use the core get_page function (or
alloc_xenballooned_pages) and move the associated mfn aside temporarily
(or not if using alloc_xenballooned_pages)?

> I think using mmap makes
> it harder; can't we just use ioctl to get the VA? Then I could nicely
> do something like:
>   xl: 
>     - open(privcmd file)
>     - ioctl(get rsvd/e820 pfn handle)
>     - ioctl(get VA using above handle) /* alternate to mmap */
>     - ioctl(get VA1 using above handle) /* alternate to mmap */
>     ...
>     - ioctl(release handle)
>     - ioctl(release VA)
>     - close file
> 
> Is that an option (to change mmap to ioctl)? 
> 
> Hope that makes sense,
> 
> thanks,
> Mukesh
> 


* Re: [hybrid] : mmap pfn space...
  2012-04-14  1:47   ` Mukesh Rathor
  2012-04-16 14:14     ` Ian Campbell
@ 2012-04-16 14:39     ` Daniel De Graaf
  2012-04-16 14:59       ` Ian Campbell
  1 sibling, 1 reply; 11+ messages in thread
From: Daniel De Graaf @ 2012-04-16 14:39 UTC (permalink / raw)
  To: Mukesh Rathor; +Cc: Xen-devel, Ian Campbell, Stefano Stabellini

On 04/13/2012 09:47 PM, Mukesh Rathor wrote:
> On Mon, 26 Mar 2012 11:37:46 +0100
> Stefano Stabellini <stefano.stabellini@eu.citrix.com> wrote:
>> I think that we should explicitly allocate these pages/addresses and
>> not rely on the fact that they are at a specific location that we deem
>> safe for now.
>> So if we explicitly introduce a new region at the end of the e820 that
>> we mark as reserved and we use it for this, I would be OK with that.
>> However we need to be careful because editing the e820 has proved to
>> be challenging in the past.
>> Also we would need to figure out a way to tell Linux that these
>> reserved addresses are actually OK to be used. Maybe we need a new
>> command line or hypercall for that.
> 
> That sounds like a reasonable approach. Let's do it as part of phase II.
> I wanna get some basic code in.
> 
> So, to give an update of where I am, good news, I've got guests 
> finally booting using hybrid dom0. So, that means I am almost there 
> now!!!! Yeay...
> 
> But, the pfn space management for privcmd mapping is still a hack.
> Running into many issues. Basically, it is forcing me to write a slab
> allocator for the reserved pfn space, which I am trying to avoid. During
> guest creation, the xl process maps about 10k foreign pages, and xenstored 1.
>
> I was thinking of just dividing my pfn space into say 10 chunks, each
> with 10k pages, so 10 guest creations can happen simultaneously. But
> then I found out that xl is not the only process doing the mapping: xenstored
> also needs to map domU frames.

With Xen 4.2, xenstored should be using the grant table for its shared
page. Similar changes can be made to xenconsoled so that only the domain
build/migrate processes use map_foreign_range. I have a patch to xenconsoled
without the fallback to map_foreign_range sitting around; I was planning to
post it with proper fallback (which I may do soon, looks simple enough).

> Otherwise, I could just do one chunk
> per process. Also, I am breaking mmap semantics somewhat by hooking
> via privcmd_mmap, because the unmaps don't follow any order. So my last
> unmap frees the entire 10k chunk it's using.
>
> In a nutshell, I am still trying to figure out how to allocate reserved pfns
> for privcmd without writing a slab allocator. I think using mmap makes
> it harder; can't we just use ioctl to get the VA? Then I could nicely
> do something like:
>   xl: 
>     - open(privcmd file)
>     - ioctl(get rsvd/e820 pfn handle)
>     - ioctl(get VA using above handle) /* alternate to mmap */
>     - ioctl(get VA1 using above handle) /* alternate to mmap */
>     ...
>     - ioctl(release handle)
>     - ioctl(release VA)
>     - close file
> 
> Is that an option (to change mmap to ioctl)? 
> 
> Hope that makes sense,
> 
> thanks,
> Mukesh
> 
> 


-- 
Daniel De Graaf
National Security Agency


* Re: [hybrid] : mmap pfn space...
  2012-04-16 14:39     ` Daniel De Graaf
@ 2012-04-16 14:59       ` Ian Campbell
  0 siblings, 0 replies; 11+ messages in thread
From: Ian Campbell @ 2012-04-16 14:59 UTC (permalink / raw)
  To: Daniel De Graaf; +Cc: Xen-devel, Stefano Stabellini

On Mon, 2012-04-16 at 15:39 +0100, Daniel De Graaf wrote:
> On 04/13/2012 09:47 PM, Mukesh Rathor wrote:
> > I was thinking of just dividing my pfn space into say 10 chunks, each
> > with 10k pages, so 10 guest creations can happen simultaneously. But
> > then I found out that xl is not the only process doing the mapping: xenstored
> > also needs to map domU frames.
> 
> With Xen 4.2, xenstored should be using the grant table for its shared
> page. Similar changes can be made to xenconsoled so that only the domain
> build/migrate processes use map_foreign_range. I have a patch to xenconsoled
> without the fallback to map_foreign_range sitting around; I was planning to
> post it with proper fallback (which I may do soon, looks simple enough).

That sounds like a good thing to have, although I don't think we'd take
it for 4.2 at this point so you've got some time.

I think the privcmd stuff needs to still assume that domain build is not
the only privileged mapper of pages and do proper tracking of what it
has mapped where. Various debug utilities etc also use this interface,
i.e. xenctx (and gdbsx? I suppose Mukesh would know ;-))

Ian.


* Re: [hybrid] : mmap pfn space...
  2012-04-16 14:14     ` Ian Campbell
@ 2012-04-16 16:22       ` Stefano Stabellini
  2012-04-16 16:26         ` Ian Campbell
  2012-04-18  1:20         ` Mukesh Rathor
  0 siblings, 2 replies; 11+ messages in thread
From: Stefano Stabellini @ 2012-04-16 16:22 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Xen-devel, Stefano Stabellini

On Mon, 16 Apr 2012, Ian Campbell wrote:
> > In a nutshell, I am still trying to figure out how to allocate reserved pfns
> > for privcmd without writing a slab allocator.
> 
> Can't you just use the core get_page function (or
> alloc_xenballooned_pages) and move the associated mfn aside temporarily
> (or not if using alloc_xenballooned_pages)?

I think that is a good suggestion: if we are trying to get in something
that works but might not be the best solution, then using
alloc_xenballooned_pages to get some pages and then changing the p2m is
the best option: it wastes a non-trivial amount of memory in dom0 but at
least it is known to work well and it wouldn't be a "hack".

Take a look at gntdev_alloc_map, gnttab_map_refs and m2p_add_override
for an example.


* Re: [hybrid] : mmap pfn space...
  2012-04-16 16:22       ` Stefano Stabellini
@ 2012-04-16 16:26         ` Ian Campbell
  2012-04-18  1:20         ` Mukesh Rathor
  1 sibling, 0 replies; 11+ messages in thread
From: Ian Campbell @ 2012-04-16 16:26 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: Xen-devel

On Mon, 2012-04-16 at 17:22 +0100, Stefano Stabellini wrote:
> On Mon, 16 Apr 2012, Ian Campbell wrote:
> > > In a nutshell, I am still trying to figure out how to allocate reserved pfns
> > > for privcmd without writing a slab allocator.
> > 
> > Can't you just use the core get_page function (or
> > alloc_xenballooned_pages) and move the associated mfn aside temporarily
> > (or not if using alloc_xenballooned_pages)?
> 
> I think that is a good suggestion: if we are trying to get in something
> that works but might not be the best solution, then using
> alloc_xenballooned_pages to get some pages and then changing the p2m is
> the best option: it wastes a non-trivial amount of memory in dom0 but at
> least it is known to work well and it wouldn't be a "hack".

I don't think it wastes all that much -- even 10k pages is only a few
tens of megabytes for the duration of the build.
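
As a rough check of that estimate, a short calculation, assuming the 4k page
size mentioned earlier in the thread and the "about 10k" foreign pages that
xl maps during a build. The variable names are only for illustration.

```python
PAGE_SIZE = 4096          # bytes; assumes 4k pages
pages_per_build = 10_000  # "about 10k" foreign pages mapped by xl

footprint_bytes = pages_per_build * PAGE_SIZE
footprint_mib = footprint_bytes / (1024 * 1024)

print(f"{footprint_bytes} bytes = {footprint_mib:.1f} MiB")
```

That comes to roughly 39 MiB per build under these assumptions.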

Also free_xenballooned_pages does the right thing if
alloc_xenballooned_pages had to explicitly free some pages to satisfy
the allocation. i.e. it will shrink the balloon and re-add those pages
to the allocator, it won't leave them in the balloon or something.

Ian.

> 
> Take a look at gntdev_alloc_map, gnttab_map_refs and m2p_add_override
> for an example.


* Re: [hybrid] : mmap pfn space...
  2012-04-16 16:22       ` Stefano Stabellini
  2012-04-16 16:26         ` Ian Campbell
@ 2012-04-18  1:20         ` Mukesh Rathor
  2012-04-18  8:33           ` Ian Campbell
  1 sibling, 1 reply; 11+ messages in thread
From: Mukesh Rathor @ 2012-04-18  1:20 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: Xen-devel, Ian Campbell

On Mon, 16 Apr 2012 17:22:14 +0100
Stefano Stabellini <stefano.stabellini@eu.citrix.com> wrote:

> On Mon, 16 Apr 2012, Ian Campbell wrote:
> > > In a nutshell, I am still trying to figure out how to allocate reserved
> > > pfns for privcmd without writing a slab allocator.
> > 
> > Can't you just use the core get_page function (or
> > alloc_xenballooned_pages) and move the associated mfn aside
> > temporarily (or not if using alloc_xenballooned_pages)?
> 
> I think that is a good suggestion: if we are trying to get in
> something that works but might not be the best solution, then using
> alloc_xenballooned_pages to get some pages and then changing the p2m
> is the best option: it wastes a non-trivial amount of memory in dom0
> but at least it is known to work well and it wouldn't be a "hack".
>
> Take a look at gntdev_alloc_map, gnttab_map_refs and m2p_add_override
> for an example.


Ok. I changed to using alloc_xenballooned_pages. In the future, if we run
into problems, we can look into alternatives. In the past we've had
problems with limits being reached when ballooning down. We run with a small dom0.

thanks,
Mukesh


* Re: [hybrid] : mmap pfn space...
  2012-04-18  1:20         ` Mukesh Rathor
@ 2012-04-18  8:33           ` Ian Campbell
  0 siblings, 0 replies; 11+ messages in thread
From: Ian Campbell @ 2012-04-18  8:33 UTC (permalink / raw)
  To: Mukesh Rathor; +Cc: Xen-devel, Stefano Stabellini

On Wed, 2012-04-18 at 02:20 +0100, Mukesh Rathor wrote:
> On Mon, 16 Apr 2012 17:22:14 +0100
> Stefano Stabellini <stefano.stabellini@eu.citrix.com> wrote:
> 
> > On Mon, 16 Apr 2012, Ian Campbell wrote:
> > > > In a nutshell, I am still trying to figure out how to allocate reserved
> > > > pfns for privcmd without writing a slab allocator.
> > > 
> > > Can't you just use the core get_page function (or
> > > alloc_xenballooned_pages) and move the associated mfn aside
> > > temporarily (or not if using alloc_xenballooned_pages)?
> > 
> > I think that is a good suggestion: if we are trying to get in
> > something that works but might not be the best solution, then using
> > alloc_xenballooned_pages to get some pages and then changing the p2m
> > is the best option: it wastes a non-trivial amount of memory in dom0
> > but at least it is known to work well and it wouldn't be a "hack".
> >
> > Take a look at gntdev_alloc_map, gnttab_map_refs and m2p_add_override
> > for an example.
> 
> 
> Ok. I changed to using alloc_xenballooned_pages. In the future, if we run
> into problems, we can look into alternatives. In the past we've had
> problems with limits being reached when ballooning down. We run with a small dom0.

You don't really need to increase the size of dom0, just the size of the
balloon. e.g. if you run dom0_mem=512M,max:1024M then you get a dom0
with 512M of RAM, but a total PFN space of 1024M, which means you have
512M of balloon available for alloc_xenballooned_pages.

If you just do dom0_mem=512M then I believe you get 512M of RAM but PFN
space sized for the entire host, which is going to give you more than
enough balloon space on any typical host (but there are obviously
downsides if the host has lots of RAM relative to 512M!)
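
To put numbers on the dom0_mem=512M,max:1024M case, a small calculation,
assuming 4k pages and the figure of about 10k pages per domain build from
earlier in the thread. It ignores any other users of the balloon, so treat
the result as an upper bound rather than a guarantee.

```python
MIB = 1024 * 1024
PAGE_SIZE = 4096          # bytes; assumes 4k pages

dom0_mem = 512 * MIB      # RAM given to dom0
dom0_max = 1024 * MIB     # total PFN space (the max: value)

balloon_bytes = dom0_max - dom0_mem
balloon_pages = balloon_bytes // PAGE_SIZE

pages_per_build = 10_000  # "about 10k" pages mapped per build
concurrent_builds = balloon_pages // pages_per_build

print(balloon_pages, concurrent_builds)
```

Under these assumptions the balloon holds 131072 pages, enough for about 13
builds' worth of foreign mappings at once.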

Ian.


