* Improving hvm IO performance by using self IO emulator (YA io-emu?)
@ 2007-02-22  5:23 Tristan Gingold
  2007-02-22  7:59 ` Keir Fraser
                   ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: Tristan Gingold @ 2007-02-22  5:23 UTC (permalink / raw)
  To: xen-devel

Summary: I am proposing a new method to improve hvm IO emulation: the IO
requests are reflected to the domain firmware, which emulates the IO using PV
drivers.  The pros of this method are minor hypervisor modifications, a smooth
transition, a performance improvement and convergence with the PV model.


Discussion:

The current IO emulator (the ioemu process in dom-0) is a well-known bottleneck
for hvm performance because the IO request path is long and crosses many rings.

Many ideas to improve the emulation have been proposed.  None of them have
been adopted because their approaches are too disruptive.

Based on my recent firmware experience I'd like to propose a new method.

The principle is rather simple: the hvm domain does all the work.  IO requests
are simply reflected to the domain.  When the hypervisor decodes an IO
request it sends it to the domain using a SMI(x86)/PMI(ia64)-like
interruption.  This reflection saves some registers, puts the parameters (the IO
request) into registers and calls the firmware at a defined address in a defined
mode (physical mode should be the best).  The firmware handles the IO request as
ioemu does but uses PV drivers (net, blk, fb...) to access external
resources.  It then resumes the domain execution through a hypercall which
restores the registers and the mode.
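
A rough sketch of this reflection path, in C-like pseudo-code (the helper
names are illustrative only, not existing Xen interfaces):

    /* Hypervisor side: on a trapped IO access, reflect it into the guest
     * firmware instead of forwarding it to ioemu in dom0. */
    struct io_req {
        int           dir;    /* read or write          */
        unsigned long addr;   /* port or iomem address  */
        unsigned int  size;
        unsigned long data;
    };

    static void reflect_io(struct vcpu *v, const struct io_req *req)
    {
        save_guest_state(v);                 /* SMI/PMI-like entry        */
        load_req_into_registers(v, req);     /* IO request passed in regs */
        switch_to_physical_mode(v);          /* paging off, flat mode     */
        set_guest_entry(v, FIRMWARE_IO_ENTRY);
        /* Reenter the guest: the firmware emulates the access with its PV
         * drivers, then issues a "resume" hypercall which restores the
         * saved state and the previous mode. */
    }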

I think there are many pros to this approach:

* the changes in the hypervisor are rather small: only the code to do the
reflection has to be added.  This is a well-known and light mechanism.

* the transition can be smooth: this new method can co-exist in several ways
with the current method.  First, it can be used only when enabled.  Then, once
the reflection code is added in the hypervisor, the firmware can simply forward
the IO request to ioemu as the hypervisor already does.  The in-domain IO
emulation can then be added driver by driver (eg: IDE disk first, then network,
then fb).
This smooth transition is a major advantage for evaluating this new method early.

* Because all the emulation work is done in the domain, the work is accounted
to this domain and not to another domain (dom0 today).  This is good for
management and for security.

* From the hypervisor point of view such an hvm domain looks like a PV domain:
only the creation differs.  This IO emulation method unifies the domain models.
This will simplify save & restore and Xen in general.

* Performance should be improved compared to the current IO emulation method:
the IO request path is shorter.  If we want to work on performance we could
later handle some IO requests directly in the hypervisor (I am thinking of
ports or iomem which don't have side effects).
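
As an illustration of that last point, a hypothetical fast path in the
hypervisor could answer side-effect-free accesses directly and reflect
everything else (this is a sketch; the names and the chosen port are made up):

    /* Return true if the access was handled entirely inside Xen. */
    static bool try_fast_io(struct vcpu *v, const struct io_req *req)
    {
        /* e.g. a read-only, side-effect-free emulated register */
        if (req->dir == IO_READ && req->addr == FAST_STATUS_PORT) {
            set_io_result(v, read_emulated_status(v));
            return true;
        }
        return false;   /* reflect to the in-domain firmware emulator */
    }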


I don't see a lot of cons; the major one is 'porting' the ioemu code to
firmware code.  This is the challenge.  But qemu seems to be well structured:
most of the files might be ported without changes, though the core of course
has to be rewritten.  The PV drivers would also have to be ported.

SMP can first be handled with a global lock, and concurrent accesses may be
allowed later.  This may improve performance compared to ioemu, which is almost
single-threaded.
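
A first cut of that SMP handling could simply be the sketch below (hypothetical
names), with the single lock later split per device model to allow concurrency:

    static spinlock_t emu_lock;      /* one global lock: correctness first */

    static void firmware_handle_io(const struct io_req *req)
    {
        spin_lock(&emu_lock);
        dispatch_to_device_model(req);   /* IDE, net, fb, ... */
        spin_unlock(&emu_lock);
    }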

I don't know yet how to use the PV-on-HVM drivers.  There is currently only
one page to communicate with xenstore.  We can try to share this page
between the firmware and the PV-on-HVM drivers or we may create a second
page.


I thought of this new IO emulation method during my work on the EFI guest
firmware for ia64.  Recently I have looked more deeply into the sources and I
can't see any showstopper yet.  Unless someone has a strong point against this
method I hope I will be able to work on it shortly (ia64 first - sorry!).

Comments are *very* welcome.

Tristan.


* Re: Improving hvm IO performance by using self IO emulator (YA io-emu?)
  2007-02-22  5:23 Improving hvm IO performance by using self IO emulator (YA io-emu?) Tristan Gingold
@ 2007-02-22  7:59 ` Keir Fraser
  2007-02-22  9:33   ` tgingold
  2007-02-22 10:34 ` Improving hvm IO performance by using self IO emulator(YA io-emu?) Guy Zana
  2007-02-22 16:06 ` Improving hvm IO performance by using self IO emulator (YA io-emu?) Anthony Liguori
  2 siblings, 1 reply; 22+ messages in thread
From: Keir Fraser @ 2007-02-22  7:59 UTC (permalink / raw)
  To: Tristan Gingold, xen-devel

On 22/2/07 05:23, "Tristan Gingold" <tgingold@free.fr> wrote:

> The current IO emulator (ioemu process in dom-0) is a well known bottleneck
> for hvm performance because IO requests travel is long and cross many rings.
> 
> Many ideas to improve the emulation have been proposed.  None of them have
> been adopted because their approach are too disruptive.
> 
> Based on my recent firmware experience I'd like to propose a new method.

This sounds plausible. It probably depends on what kind of 'firmware'
environment you plan to drop the ioemu code into? The general idea of
emulated devices looking to the control stack like PV I/O is one that we
want for x86 as well. So any xend changes to that effect are welcome.

> * From the hypervisor point of view such an hvm domain looks like a PV domain:
> only the creation differs.  This IO emulation method unifies the domain.  This
> will simplify save & restore and Xen in general.

I don't know the specifics of ia64 VTi, but I'd expect that Xen will still
need to be aware of VTi? I'd be surprised if the differences can be hidden
safely and efficiently. The model you propose sounds much more to me like a
VTi (non-PV) domain with PV extensions in an extended firmware module.

 -- Keir


* Re: Improving hvm IO performance by using self IO emulator (YA io-emu?)
  2007-02-22  7:59 ` Keir Fraser
@ 2007-02-22  9:33   ` tgingold
  2007-02-22 10:23     ` Keir Fraser
  0 siblings, 1 reply; 22+ messages in thread
From: tgingold @ 2007-02-22  9:33 UTC (permalink / raw)
  To: Keir Fraser; +Cc: Tristan Gingold, xen-devel

Quoting Keir Fraser <Keir.Fraser@cl.cam.ac.uk>:

> On 22/2/07 05:23, "Tristan Gingold" <tgingold@free.fr> wrote:
>
> > The current IO emulator (ioemu process in dom-0) is a well known bottleneck
> > for hvm performance because IO requests travel is long and cross many
> rings.
> >
> > Many ideas to improve the emulation have been proposed.  None of them have
> > been adopted because their approach are too disruptive.
> >
> > Based on my recent firmware experience I'd like to propose a new method.
>
> This sounds plausible. It probably depends on what kind of 'firmware'
> environment you plan to drop the ioemu code into? The general idea of
> emulated devices looking to the control stack like PV I/O is one that we
> want for x86 as well.
Yes that's the idea.

> So any xend changes to that effect are welcome.

> > * From the hypervisor point of view such an hvm domain looks like a PV
> domain:
> > only the creation differs.  This IO emulation method unifies the domain.
> This
> > will simplify save & restore and Xen in general.
>
> I don't know the specifics of ia64 VTi, but I'd expect that Xen will still
> need to be aware of VTi?
Sure.
> I'd be surprised if the differences can be hidden
> safely and efficiently.
If we can get rid of the ioemu process, the differences between hvm and PV will
be small, won't they?

> The model you propose sounds much more to me like a
> VTi (non-PV) domain with PV extensions in an extended firmware module.
Yes, but this model should work with the ioemu process.

Tristan.


* Re: Improving hvm IO performance by using self IO emulator (YA io-emu?)
  2007-02-22  9:33   ` tgingold
@ 2007-02-22 10:23     ` Keir Fraser
  0 siblings, 0 replies; 22+ messages in thread
From: Keir Fraser @ 2007-02-22 10:23 UTC (permalink / raw)
  To: tgingold; +Cc: xen-devel

On 22/2/07 09:33, "tgingold@free.fr" <tgingold@free.fr> wrote:

>> I don't know the specifics of ia64 VTi, but I'd expect that Xen will still
>> need to be aware of VTi?
> Sure.
>> I'd be surprised if the differences can be hidden
>> safely and efficiently.
> If we can get rid of the ioemu process the differences between hvm and PV will
> be small, won't they ?

From the perspective of dom0, yes. From the perspective of Xen, maybe not so
much. :-)

 -- Keir


* RE: Improving hvm IO performance by using self IO emulator(YA io-emu?)
  2007-02-22  5:23 Improving hvm IO performance by using self IO emulator (YA io-emu?) Tristan Gingold
  2007-02-22  7:59 ` Keir Fraser
@ 2007-02-22 10:34 ` Guy Zana
  2007-02-22 16:06 ` Improving hvm IO performance by using self IO emulator (YA io-emu?) Anthony Liguori
  2 siblings, 0 replies; 22+ messages in thread
From: Guy Zana @ 2007-02-22 10:34 UTC (permalink / raw)
  To: Tristan Gingold, xen-devel

Are you suggesting writing EFI PV drivers?

This method sounds very promising, but there are some limitations: Windows Vista 32-bit does not support EFI (so I've read).
It's like PV-on-HVM, but it also eliminates the need to install the regular PV-on-HVM drivers.


* Re: Improving hvm IO performance by using self IO emulator (YA io-emu?)
  2007-02-22  5:23 Improving hvm IO performance by using self IO emulator (YA io-emu?) Tristan Gingold
  2007-02-22  7:59 ` Keir Fraser
  2007-02-22 10:34 ` Improving hvm IO performance by using self IO emulator(YA io-emu?) Guy Zana
@ 2007-02-22 16:06 ` Anthony Liguori
  2007-02-22 20:58   ` tgingold
  2 siblings, 1 reply; 22+ messages in thread
From: Anthony Liguori @ 2007-02-22 16:06 UTC (permalink / raw)
  To: Tristan Gingold; +Cc: xen-devel

Hi Tristan,

Thanks for posting this.

Tristan Gingold wrote:
> Summary: I am proposing  a new method to improve hvm IO emulation: the IO
> requests are reflected to the domain firmware which emulates the IO using PV
> drivers.  The pros of this method are minor hypervisor modifications, smooth
> transition, performance improvement and convergence with PV model
> 
> 
> Discussion:
> 
> The current IO emulator (ioemu process in dom-0) is a well known bottleneck
> for hvm performance because IO requests travel is long and cross many rings.

I'm not quite sure that I agree this is the bottleneck.  If IO latency 
were the problem, then a major reduction in IO latency ought to 
significantly improve performance, right?

KVM has a pretty much optimal path from the kernel to userspace.  The 
overhead of going to userspace is roughly two syscalls (and we've 
measured this overhead).  Yet it makes almost no difference in IO 
throughput.

The big problem with disk emulation isn't IO latency, but the fact that 
the IDE emulation can only have one outstanding request at a time.  The 
SCSI emulation helps this a lot.
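
To put a number on that (the 8 ms figure is an assumed typical disk access
time, not something measured in this thread): with IDE's queue depth of one,
each request must complete before the next can be issued, so a seek-bound
workload tops out around 1 / 8 ms = ~125 requests per second no matter how
cheap the exit path is, whereas a SCSI model with tagged queuing lets the
guest keep several requests in flight and lets the host reorder and merge them.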

I don't know what the bottleneck is in network emulation, but I suspect 
the number of copies we have in the path has a great deal to do with it.

> Many ideas to improve the emulation have been proposed.  None of them have
> been adopted because their approach are too disruptive.
> 
> Based on my recent firmware experience I'd like to propose a new method.
> 
> The principle is rather simple: the hvm domain does all the work.  IO requests
> are simply reflected to the domain.  When the hypervisor decodes an IO
> request it sends it to the domain using a SMI(x86)/PMI(ia64)-like
> interruption.  This reflection saves some registers, put parameters (IO req)
> into registers and call the firmware at defined address using a defined mode
> (physical mode should be the best).  The firmware handles the IO request like
> ioemu does but use PV drivers (net, blk, fb...) to access to external
> resources.  It then resumes the domain execution through an hypercall which
> restores registers and mode.
> 
> I think there are many pros to this approach:
> 
> * the changes in the hypervisor are rather small: only the code to do the
> reflection has to be added.  This is a well-known and light mechanism.
> 
> * the transition can be smooth: this new method can co-exist in several way
> with the current method.  First it can be used only when enabled.  Then once
> the reflection code is added in the hypervisor the firmware can just send the
> IO request to ioemu like the hypervisor already does.  The in domain IO
> emulation can be added driver per driver (eg: IDE disk first, then network,
> then fb).
> This smooth transition is a major advantage to early evaluate this new method.
> 
> * Because all the emulation work is done in the domain the work in accounted
> to this domain and not to another domain (dom0 today).  This is good for
> management and for security.
> 
> * From the hypervisor point of view such an hvm domain looks like a PV domain:
> only the creation differs.  This IO emulation method unifies the domain.  This
> will simplify save & restore and Xen in general.
> 
> * Performance should be improved compared to the current io emulation method:
> the IO request travel is shorter.  If we want to work on performance we could
> later handle directly some IO requests in the hypervisor (I think of ports or
> iomem which don't have side-effect).
> 
> 
> I don't see a lot of cons, the major one is 'porting' ioemu code to
> firmware code.  This is the challenge.  But qemu seems to be well structured.
> Most of the files might be ported without changes, the core has of course to
> be rewritten.  The PV drivers should also be ported.
> 
> SMP can be first handled with a global lock and later concurrent accesses may
> be allowed.  This may improve performance compared to ioemu which is almost
> single threaded.

There's a lot to like about this sort of approach.  It's not a silver 
bullet wrt performance but I think the model is elegant in many ways. 
An interesting place to start would be lapic/pit emulation.  Removing 
this code from the hypervisor would be pretty useful and there is no 
need to address PV-on-HVM issues.

Can you provide more details on how the reflecting works?  Have you 
measured the cost of reflection?  Do you just setup a page table that 
maps physical memory 1-1 and then reenter the guest?

Does the firmware get loaded as an option ROM or is it a special portion 
of guest memory that isn't normally reachable?

Regards,

Anthony Liguori



* Re: Improving hvm IO performance by using self IO emulator (YA io-emu?)
  2007-02-22 16:06 ` Improving hvm IO performance by using self IO emulator (YA io-emu?) Anthony Liguori
@ 2007-02-22 20:58   ` tgingold
  2007-02-22 21:23     ` Anthony Liguori
  2007-02-22 21:24     ` Mark Williamson
  0 siblings, 2 replies; 22+ messages in thread
From: tgingold @ 2007-02-22 20:58 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Tristan Gingold, xen-devel

Quoting Anthony Liguori <aliguori@us.ibm.com>:

> Hi Tristan,
>
> Thanks for posting this.
[...]
> I'm not quite sure that I agree this is the bottleneck.  If IO latency
> were the problem, then a major reduction in IO latency ought to
> significantly improve performance right?
Sure.

It is interesting to note that you don't agree; this appeared so obvious to me.
Maybe I should do measurements first and think only after :-)

> KVM has a pretty much optimal path from the kernel to userspace.  The
> overhead of going to userspace is roughly two syscalls (and we've
> measured this overhead).  Yet it makes almost no difference in IO
> throughput.
The path can be split into 2 parts: from trap to ioemu and from ioemu to
real hardware (the return is the same).  ioemu to hardware should be roughly
the same with KVM and Xen.  Is trap to ioemu that different between Xen and
KVM?

Honestly I don't know.  Does anyone have figures?

It would be interesting to compare disk (or net) performance between:
* linux
* dom0
* driver domain
* PV-on-HVM drivers
* ioemu

Does such a comparison exist?

> The big problem with disk emulation isn't IO latency, but the fact that
> the IDE emulation can only have one outstanding request at a time.  The
> SCSI emulation helps this a lot.
IIRC, a real IDE can only have one outstanding request too (this may have
changed with AHCI).  This is really IIRC :-(

BTW on ia64 there is no REP IN/OUT.  When Windows uses IDE in PIO mode (during
install and crash dump), performance is horrible.  There is a patch which
adds special handling for PIO mode and really improves the data rate.

> I don't know what the bottle neck is in network emulation, but I suspect
> the number of copies we have in the path has a great deal to do with it.
This reason seems obvious.


[...]
> There's a lot to like about this sort of approach.  It's not a silver
> bullet wrt performance but I think the model is elegant in many ways.
> An interesting place to start would be lapic/pit emulation.  Removing
> this code from the hypervisor would be pretty useful and there is no
> need to address PV-on-HVM issues.
Indeed this is the simplest code to move.  But why would it be useful?

> Can you provide more details on how the reflecting works?  Have you
> measured the cost of reflection?  Do you just setup a page table that
> maps physical memory 1-1 and then reenter the guest?
Yes: disable PG, set up flat mode and reenter the guest.
Cost not yet measured!

> Does the firmware get loaded as an option ROM or is it a special portion
> of guest memory that isn't normally reachable?
IMHO it should come with hvmloader.  No need to make it unreachable.

Tristan.


* Re: Improving hvm IO performance by using self IO emulator (YA io-emu?)
  2007-02-22 20:58   ` tgingold
@ 2007-02-22 21:23     ` Anthony Liguori
  2007-02-22 21:41       ` Mark Williamson
  2007-02-24  6:07       ` Tristan Gingold
  2007-02-22 21:24     ` Mark Williamson
  1 sibling, 2 replies; 22+ messages in thread
From: Anthony Liguori @ 2007-02-22 21:23 UTC (permalink / raw)
  To: tgingold; +Cc: xen-devel

tgingold@free.fr wrote:
>> KVM has a pretty much optimal path from the kernel to userspace.  The
>> overhead of going to userspace is roughly two syscalls (and we've
>> measured this overhead).  Yet it makes almost no difference in IO
>> throughput.
>>     
> The path can be split into 2 parts: from trap to ioemu and from ioemu to
> real hardware (the return is the same).  ioemu to hardware should be roughly
> the same with KVM and Xen.  Is trap to ioemu that different between Xen and
> KVM ?
>   

Yup.  With KVM, there is no scheduler involvement.  qemu does a blocking 
ioctl to the Linux kernel, and the Linux kernel does a vmrun.  Provided 
the time slice hasn't been exhausted, Linux returns directly to qemu 
after a vmexit.

Xen uses event channels which involve domain switches and 
select()'ing.  A lot of the time, the path is pretty optimal.  However, 
quite a bit of the time, you run into worst-case scenarios with the 
various schedulers and the latency skyrockets.
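
For reference, the KVM userspace path is essentially a blocking ioctl loop
along the lines of the (abridged) sketch below; setup and error handling are
omitted and emulate_pio() stands in for qemu's device dispatch:

    #include <linux/kvm.h>
    #include <sys/ioctl.h>

    /* vcpu_fd and the mmap'ed struct kvm_run are set up during init. */
    static void vcpu_loop(int vcpu_fd, struct kvm_run *run)
    {
        for (;;) {
            ioctl(vcpu_fd, KVM_RUN, 0);           /* blocks until a vmexit */
            if (run->exit_reason == KVM_EXIT_IO)  /* trapped PIO access    */
                emulate_pio(run);                 /* then back to vmrun    */
        }
    }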

> Honestly I don't know.  Does anyone have figures ?
>   

Yeah, it varies a lot on different hardware.  For reference:

if a round trip to a null int80 syscall is 150 nsec, a round trip vmexit 
to userspace in KVM may be 2500 nsec.  On bare metal, it may cost 1700 
nsec to do a PIO operation to an IDE port, so 2500 really isn't that bad.

Xen is usually around there too but every so often, it spikes to 
something awful (100,000s of nsecs) and that skews the average cost.

> It would be interesting to compare disk (or net) performances between:
> * linux
> * dom0
> * driver domain
> * PV-on-HVM drivers
> * ioemu
>
> Does such a comparaison exist ?
>   

Not that I know of.  I've done a lot of benchmarking but not of PV-on-HVM.

Xen can typically get pretty close to native for disk IO.

>> The big problem with disk emulation isn't IO latency, but the fact that
>> the IDE emulation can only have one outstanding request at a time.  The
>> SCSI emulation helps this a lot.
>>     
> IIRC, a real IDE can only have one outstanding request too (this may have
> changed with AHCI).  This is really IIRC :-(
>   

You recall correctly.  IDE can only have one outstanding DMA request at 
a time.

> BTW on ia64 there is no REP IN/OUT.  When Windows use IDE in PIO mode (during
> install and crash dump), performances are horrible.  There is a patch which
> adds a special handling for PIO mode and really improve data rate.
>   

Ouch :-(  Fortunately, OS's won't use PIO very often.

>> I don't know what the bottle neck is in network emulation, but I suspect
>> the number of copies we have in the path has a great deal to do with it.
>>     
> This reason seems obvious.
>
>
> [...]
>   
>> There's a lot to like about this sort of approach.  It's not a silver
>> bullet wrt performance but I think the model is elegant in many ways.
>> An interesting place to start would be lapic/pit emulation.  Removing
>> this code from the hypervisor would be pretty useful and there is no
>> need to address PV-on-HVM issues.
>>     
> Indeed this is the simpler code to move.  But why would it be useful ?
>   

Removing code from the hypervisor reduces the TCB, so it's a win.  Having 
it in firmware within the HVM domain is even better, wrt the TCB, than 
having it in dom0.

>> Can you provide more details on how the reflecting works?  Have you
>> measured the cost of reflection?  Do you just setup a page table that
>> maps physical memory 1-1 and then reenter the guest?
>>     
> Yes, set disable PG, set up flat mode and reenter the guest.
> Cost not yet measured!
>   

That would be very useful to measure.  My chief concern would be that 
disabling PG would be considerably more costly than entering with paging 
enabled.  That may not be the case on VT today since there are no ASIDs, 
so it would be useful to test on SVM too.

>> Does the firmware get loaded as an option ROM or is it a special portion
>> of guest memory that isn't normally reachable?
>>     
> IMHO it should come with hvmload.  No needs to make it unreachable.
>   

It would be nice to get rid of hvmloader in the long term IMHO.  Any 
initialization should be done in the BIOS.

Regards,

Anthony Liguori

> Tristan.
>   


* Re: Re: Improving hvm IO performance by using self IO emulator (YA io-emu?)
  2007-02-22 20:58   ` tgingold
  2007-02-22 21:23     ` Anthony Liguori
@ 2007-02-22 21:24     ` Mark Williamson
  2007-02-22 21:33       ` Anthony Liguori
                         ` (2 more replies)
  1 sibling, 3 replies; 22+ messages in thread
From: Mark Williamson @ 2007-02-22 21:24 UTC (permalink / raw)
  To: xen-devel; +Cc: tgingold, Anthony Liguori

> > The big problem with disk emulation isn't IO latency, but the fact that
> > the IDE emulation can only have one outstanding request at a time.  The
> > SCSI emulation helps this a lot.
>
> IIRC, a real IDE can only have one outstanding request too (this may have
> changed with AHCI).  This is really IIRC :-(

Can SATA drives queue multiple outstanding requests?  Thought some newer rev 
could, but I may well be misremembering - in any case we'd want something 
that was well supported.

> > I don't know what the bottle neck is in network emulation, but I suspect
> > the number of copies we have in the path has a great deal to do with it.
>
> This reason seems obvious.

Latency may matter more to the network performance than it did to block, 
actually (especially given our current setup is fairly pessimal wrt 
latency!).  It would be interesting to see how much difference this makes.

In any case, copies are bad too :-)  Presumably, hooking directly into the 
paravirt network channel would improve this situation too.

Perhaps the network device ought to be the first to move?

> > There's a lot to like about this sort of approach.  It's not a silver
> > bullet wrt performance but I think the model is elegant in many ways.
> > An interesting place to start would be lapic/pit emulation.  Removing
> > this code from the hypervisor would be pretty useful and there is no
> > need to address PV-on-HVM issues.
>
> Indeed this is the simpler code to move.  But why would it be useful ?

It might be a good proof of concept, and it simplifies the hypervisor (and the 
migration / suspend process) at the same time.

> > Does the firmware get loaded as an option ROM or is it a special portion
> > of guest memory that isn't normally reachable?
>
> IMHO it should come with hvmload.  No needs to make it unreachable.

Mmmm.  It's not like the guest can break security if it tampers with the 
device models in its own memory space.

Question: how does this compare with using a "stub domain" to run the device 
models?  The previously proposed approach was to automatically switch to the 
stub domain on trapping an IO by the HVM guest, and have that stub domain run 
the device models, etc.

You seem to be actually proposing running the code within the HVM guest 
itself.  The two approaches aren't actually that different, IMO, since the 
guest still effectively has two different execution contexts.  It does seem 
to me that running within the HVM guest itself might be more flexible.

A cool little trick that this strategy could enable is to run a full Qemu 
instruction emulator within the device model - I'd imagine this could be 
useful on IA64, for instance, in order to provide support for running legacy 
OSes (e.g. for x86, or *cough* PPC ;-))

Cheers,
Mark

-- 
Dave: Just a question. What use is a unicyle with no seat?  And no pedals!
Mark: To answer a question with a question: What use is a skateboard?
Dave: Skateboards have wheels.
Mark: My wheel has a wheel!


* Re: Re: Improving hvm IO performance by using self IO emulator (YA io-emu?)
  2007-02-22 21:24     ` Mark Williamson
@ 2007-02-22 21:33       ` Anthony Liguori
  2007-02-23  0:15         ` Mark Williamson
                           ` (2 more replies)
  2007-02-23  0:32       ` Alan
  2007-02-24  6:12       ` Tristan Gingold
  2 siblings, 3 replies; 22+ messages in thread
From: Anthony Liguori @ 2007-02-22 21:33 UTC (permalink / raw)
  To: Mark Williamson; +Cc: tgingold, xen-devel

Mark Williamson wrote:
>>> The big problem with disk emulation isn't IO latency, but the fact that
>>> the IDE emulation can only have one outstanding request at a time.  The
>>> SCSI emulation helps this a lot.
>>>       
>> IIRC, a real IDE can only have one outstanding request too (this may have
>> changed with AHCI).  This is really IIRC :-(
>>     
>
> Can SATA drives queue multiple outstanding requests?  Thought some newer rev 
> could, but I may well be misremembering - in any case we'd want something 
> that was well supported.
>   

SATA can, yes.  However, as you mention, SATA is very poorly supported.

The LSI SCSI adapter seems to work quite nicely with Windows and Linux.  
And it supports TCQ.  And it's already implemented :-)  Can't really 
beat that :-)

>   
>>> I don't know what the bottle neck is in network emulation, but I suspect
>>> the number of copies we have in the path has a great deal to do with it.
>>>       
>> This reason seems obvious.
>>     
>
> Latency may matter more to the network performance than it did to block, 
> actually (especially given our current setup is fairly pessimal wrt 
> latency!).  It would be interesting to see how much difference this makes.
>
> In any case, copies are bad too :-)  Presumably, hooking directly into the 
> paravirt network channel would improve this situation too.
>
> Perhaps the network device ought to be the first to move?
>   

Can't say.  I haven't done much research on network performance.

>>> There's a lot to like about this sort of approach.  It's not a silver
>>> bullet wrt performance but I think the model is elegant in many ways.
>>> An interesting place to start would be lapic/pit emulation.  Removing
>>> this code from the hypervisor would be pretty useful and there is no
>>> need to address PV-on-HVM issues.
>>>       
>> Indeed this is the simpler code to move.  But why would it be useful ?
>>     
>
> It might be a good proof of concept, and it simplifies the hypervisor (and the 
> migration / suspend process) at the same time.
>
>   
>>> Does the firmware get loaded as an option ROM or is it a special portion
>>> of guest memory that isn't normally reachable?
>>>       
>> IMHO it should come with hvmload.  No needs to make it unreachable.
>>     
>
> Mmmm.  It's not like the guest can break security if it tampers with the 
> device models in its own memory space.
>
> Question: how does this compare with using a "stub domain" to run the device 
> models?  The previous proposed approach was to automatically switch to the 
> stub domain on trapping an IO by the HVM guest, and have that stub domain run 
> the device models, etc.
>   

Reflecting is a bit more expensive than doing a stub domain.  There is 
no way to wire up the VMEXITs to go directly into the guest so you're 
always going to have to pay the cost of going from guest => host => 
guest => host => guest for every PIO.  The guest is incapable of 
reenabling PG on its own hence the extra host => guest transition.

Compare to stub domain where, if done correctly, you can go from guest 
=> host/0 => host/3 => host/0 => guest.  The question would be, is 
host/0 => host/3 => host/0 fundamentally faster than host => guest => host.

I know that guest => host => guest typically costs *at least* 1000 nsecs 
on SVM.  A null sysenter syscall (that's host/3 => host/0 => host/3) is 
roughly 75 nsecs.

So my expectation is that stub domain can actually be made to be faster 
than reflecting.
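
Putting those figures together as a rough lower bound, and ignoring the cost
of the emulation work itself: reflection pays two guest <=> host round trips
per PIO, about 2 x 1000 ns = 2000 ns of transition overhead, while the stub
domain path pays one guest <=> host round trip plus a host ring 0 <=> ring 3
switch, about 1000 + 75 ns = ~1075 ns.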

Regards,

Anthony Liguori

> You seem to be actually proposing running the code within the HVM guest 
> itself.  The two approaches aren't actually that different, IMO, since the 
> guest still effectively has two different execution contexts.  It does seem 
> to me that running within the HVM guest itself might be more flexible.
>
> A cool little trick that this strategy could enable is to run a full Qemu 
> instruction emulator within the device model - I'd imagine this could be 
> useful on IA64, for instance, in order to provide support for running legacy 
> OSes (e.g. for x86, or *cough* PPC ;-))
>
> Cheers,
> Mark
>
>   


* Re: Re: Improving hvm IO performance by using self IO emulator (YA io-emu?)
  2007-02-22 21:23     ` Anthony Liguori
@ 2007-02-22 21:41       ` Mark Williamson
  2007-02-24  6:19         ` Tristan Gingold
  2007-02-24  6:07       ` Tristan Gingold
  1 sibling, 1 reply; 22+ messages in thread
From: Mark Williamson @ 2007-02-22 21:41 UTC (permalink / raw)
  To: xen-devel; +Cc: tgingold, Anthony Liguori

While I'm thinking about it, I wonder how returning to the guest from the 
emulator would work...

We'd want to hypercall to transfer back to it...  do we need specific Xen 
support for this or could (for instance) Gerd's work on domU kexec be 
leveraged here?

Perhaps it would be worth evaluating some kind of "send these events and then 
switch back to guest code" hypercall so that the emulator doesn't have to 
bounce in and out of Xen so much.  Remains to be seen whether this makes much 
difference to overall performance but it seems somehow civilised ;-)

Cheers,
Mark

-- 
Dave: Just a question. What use is a unicyle with no seat?  And no pedals!
Mark: To answer a question with a question: What use is a skateboard?
Dave: Skateboards have wheels.
Mark: My wheel has a wheel!


* Re: Re: Improving hvm IO performance by using self IO emulator (YA io-emu?)
  2007-02-23  0:26         ` Alan
@ 2007-02-23  0:12           ` Anthony Liguori
  2007-02-23 12:57             ` Alan
  0 siblings, 1 reply; 22+ messages in thread
From: Anthony Liguori @ 2007-02-23  0:12 UTC (permalink / raw)
  To: Alan; +Cc: tgingold, xen-devel, Mark Williamson

Alan wrote:
>> SATA can, yes.  However, as you mention, SATA is very poorly supported.
>>     
>
> By what - it works very nicely in current Linux kernels,

But it isn't supported by older kernels and most versions of Windows.  A 
major use of virtualization is running older operating systems so 
depending on newer kernels is not really an option (if we have a new 
kernel, we'd prefer to use a paravirtual driver anyway).

>  including AHCI
> with NCQ and multiple outstanding commands. The fact Xen isn't merged
> and is living in prehistory is I'm afraid a Xen problem.
>   

This discussion is independent of Xen.  It's equally applicable to KVM 
and QEMU so please don't assume this has anything to do with Xen's merge 
status.

Regards,

Anthony Liguori


* Re: Re: Improving hvm IO performance by using self IO emulator (YA io-emu?)
  2007-02-22 21:33       ` Anthony Liguori
@ 2007-02-23  0:15         ` Mark Williamson
  2007-02-23  0:26         ` Alan
  2007-02-24  6:17         ` Tristan Gingold
  2 siblings, 0 replies; 22+ messages in thread
From: Mark Williamson @ 2007-02-23  0:15 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: tgingold, xen-devel

> > Can SATA drives queue multiple outstanding requests?  Thought some newer
> > rev could, but I may well be misremembering - in any case we'd want
> > something that was well supported.
>
> SATA can, yes.  However, as you mention, SATA is very poorly supported.
>
> The LSI scsi adapter seems to work quite nicely with Windows and Linux.
> And it supports TCQ.  And it's already implemented :-)  Can't really
> beat that :-)

LSI wins :-)  Supporting TCQ is cool too (but can we actually leverage that 
through the PV interface?)

> > Perhaps the network device ought to be the first to move?
>
> Can't say.  I haven't done much research on network performance.

Network was the hard device to virtualise anyway, so I suspect efficiency may 
matter more here....  although we'd have to test whether it was significant 
compared to other factors (is the device we're emulating at least well suited 
to efficient batching behaviour or should we be looking at that too?)

> Reflecting is a bit more expensive than doing a stub domain.  There is
> no way to wire up the VMEXITs to go directly into the guest so you're
> always going to have to pay the cost of going from guest => host =>
> guest => host => guest for every PIO.  The guest is incapable of
> reenabling PG on its own hence the extra host => guest transition.

VMEXITs still go to ring 0 though, right?  So you still need the ring 
transition into the guest and back?

What you wouldn't need if leveraging HVM is the pagetable switch - although I 
don't know if this is the case for VT-i which is somewhat different to VT-x 
in design.

> I know that guest => host => guest typically costs *at least* 1000 nsecs
> on SVM.  A null sysenter syscall (that's host/3 => host/0 => host/3) is
> roughly 75 nsecs.
>
> So my expectation is that stub domain can actually be made to be faster
> than reflecting.
>

Interesting.  The code should be fairly common to both though, so maybe we can 
do a bakeoff!

Cheers,
Mark

> Regards,
>
> Anthony Liguori
>
> > You seem to be actually proposing running the code within the HVM guest
> > itself.  The two approaches aren't actually that different, IMO, since
> > the guest still effectively has two different execution contexts.  It
> > does seem to me that running within the HVM guest itself might be more
> > flexible.
> >
> > A cool little trick that this strategy could enable is to run a full Qemu
> > instruction emulator within the device model - I'd imagine this could be
> > useful on IA64, for instance, in order to provide support for running
> > legacy OSes (e.g. for x86, or *cough* PPC ;-))
> >
> > Cheers,
> > Mark

-- 
Dave: Just a question. What use is a unicyle with no seat?  And no pedals!
Mark: To answer a question with a question: What use is a skateboard?
Dave: Skateboards have wheels.
Mark: My wheel has a wheel!


* Re: Re: Improving hvm IO performance by using self IO emulator (YA io-emu?)
  2007-02-22 21:33       ` Anthony Liguori
  2007-02-23  0:15         ` Mark Williamson
@ 2007-02-23  0:26         ` Alan
  2007-02-23  0:12           ` Anthony Liguori
  2007-02-24  6:17         ` Tristan Gingold
  2 siblings, 1 reply; 22+ messages in thread
From: Alan @ 2007-02-23  0:26 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: tgingold, xen-devel, Mark Williamson

> SATA can, yes.  However, as you mention, SATA is very poorly supported.

By what - it works very nicely in current Linux kernels, including AHCI
with NCQ and multiple outstanding commands. The fact Xen isn't merged
and is living in prehistory is I'm afraid a Xen problem.


* Re: Re: Improving hvm IO performance by using self IO emulator (YA io-emu?)
  2007-02-22 21:24     ` Mark Williamson
  2007-02-22 21:33       ` Anthony Liguori
@ 2007-02-23  0:32       ` Alan
  2007-02-24  6:12       ` Tristan Gingold
  2 siblings, 0 replies; 22+ messages in thread
From: Alan @ 2007-02-23  0:32 UTC (permalink / raw)
  To: Mark Williamson; +Cc: tgingold, Anthony Liguori, xen-devel

> Can SATA drives queue multiple outstanding requests?  Thought some newer rev 
> could, but I may well be misremembering - in any case we'd want something 
> that was well supported.

Most SATA drives support NCQ, which is sensible queuing and all the rest
of it. Some early ones get it badly wrong. You need a controller
interface that handles it and a device that handles it. Devices
pretending to be PATA controllers lack the brains, but AHCI is a current
Intel standard used by many vendors for this role and intended to replace
SFF-style IDE.

Alan


* Re: Re: Improving hvm IO performance by using self IO emulator (YA io-emu?)
  2007-02-23  0:12           ` Anthony Liguori
@ 2007-02-23 12:57             ` Alan
  2007-02-23 18:56               ` Anthony Liguori
  0 siblings, 1 reply; 22+ messages in thread
From: Alan @ 2007-02-23 12:57 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: tgingold, xen-devel, Mark Williamson

> But it isn't supported by older kernels and most versions of Windows.  A 

Linux 2.4.x AHCI drivers exist. Windows 95/98 are lacking them, as is NT,
that much is true, but Win2K and later support AHCI. AHCI is also very
nice from a virtualisation point of view as you get commands in queues
and you can batch them up sensibly.

For older Windows there is the ADMA interface, which is saner to emulate
than SFF but not very sane.

> This discussion is independent of Xen.  It's equally applicable to KVM 
> and QEMU so please don't assume this has anything to do with Xen's merge 
> status.

Don't even get me started on qemu. The qemu "emulation" of ATAPI is a good
reason to use anything else as an interface.

Alan


* Re: Re: Improving hvm IO performance by using self IO emulator (YA io-emu?)
  2007-02-23 12:57             ` Alan
@ 2007-02-23 18:56               ` Anthony Liguori
  0 siblings, 0 replies; 22+ messages in thread
From: Anthony Liguori @ 2007-02-23 18:56 UTC (permalink / raw)
  To: Alan; +Cc: tgingold, xen-devel, Mark Williamson

Alan wrote:
>> But it isn't supported by older kernels and most versions of Windows.  A 
>>     
>
> Linux 2.4.x AHCI drivers exist. Windows 95/98 are lacking them as is NT
> that much is true, but Win2K and later support AHCI. AHCI is also very
> nice from a virtualisation point of view as you get commands in queues
> and you can batch them up sensibly.
>
> For older windows there is the ADMA interface which is saner to emulate
> than SFF but not very sane.
>
>   
>> This discussion is independent of Xen.  It's equally applicable to KVM 
>> and QEMU so please don't assume this has anything to do with Xen's merge 
>> status.
>>     
>
> Don't even get me started on qemu. The qemu "emulation" of ATAPI is a good
> reason to use anything else as an interface.
>   

Feel free to submit patches.

Regards,

Anthony Liguori

> Alan
>   


* Re: Improving hvm IO performance by using self IO emulator (YA io-emu?)
  2007-02-22 21:23     ` Anthony Liguori
  2007-02-22 21:41       ` Mark Williamson
@ 2007-02-24  6:07       ` Tristan Gingold
  1 sibling, 0 replies; 22+ messages in thread
From: Tristan Gingold @ 2007-02-24  6:07 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: tgingold, xen-devel

On Thu, Feb 22, 2007 at 03:23:03PM -0600, Anthony Liguori wrote:
> tgingold@free.fr wrote:
[... overhead ...]
> Yup.  With KVM, there is no scheduler involvement.  qemu does a blocking 
> ioctl to the Linux kernel, and the Linux kernel does a vmrun.  Provided 
> the time slice hasn't been exhausted, Linux returns directly to qemu 
> after a vmexit.
Ok, thank you for the details.

> Xen uses event channels which involved domain switches and 
> select()'ing.  A lot of the time, the path is pretty optimal.  However, 
> quite a bit of the time, you run into worst case scenarios with the 
> various schedulers and the latency sky rockets.
> 
> >Honestly I don't know.  Does anyone have figures ?
> >  
> 
> Yeah, it varies a lot on different hardware.  For reference:
> 
> if round trip to a null int80 syscall is 150 nsec, a round trip vmexit 
> to userspace in KVM may be 2500 nsec.  On bare metal, it may cost 1700 
> nsec to do a PIO operation to a IDE port so 2500 really isn't that bad.
> 
> Xen is usually around there too but every so often, it spikes to 
> something awful (100ks of nsecs) and that skews the average cost.
That explains the latency.

[...]
> >>The big problem with disk emulation isn't IO latency, but the fact that
> >>the IDE emulation can only have one outstanding request at a time.  The
> >>SCSI emulation helps this a lot.
> >>    
> >IIRC, a real IDE can only have one outstanding request too (this may have
> >changed with AHCI).  This is really IIRC :-(
> >  
> 
> You recall correctly.  IDE can only have one type of outstanding DMA 
> request.
So there is something I do not understand: KVM IDE accesses are almost as
fast as bare metal (2500 ns vs 1700 ns).  Is KVM IO performance awful compared
to bare metal?  If so, why?

[...]
> Removing code from the hypervisor reduces the TCB so it's a win.  Having 
> it in firmware within the HVM domain is even better than having it in 
> dom0 too wrt the TCB.
Ok.

> >>Can you provide more details on how the reflecting works?  Have you
> >>measured the cost of reflection?  Do you just setup a page table that
> >>maps physical memory 1-1 and then reenter the guest?
> >>    
> >Yes, set disable PG, set up flat mode and reenter the guest.
> >Cost not yet measured!
> 
> That would be very useful to measure.  My chief concern would be that 
> disabling PG would be considerably more costly than entering with paging 
> enabled.  That may not be the case on VT today since there is no ASIDs 
> so it would be useful to test on SVM too.
Switching to physical mode shouldn't be slow on ia64 (sorry, I am more
familiar with Xen/ia64).  Anyway, this is a detail.

> >>Does the firmware get loaded as an option ROM or is it a special portion
> >>of guest memory that isn't normally reachable?
> >>    
> >IMHO it should come with hvmload.  No needs to make it unreachable.
> >  
> 
> It would be nice to get rid of hvmloader in the long term IMHO.  Any 
> initialization should be done in the BIOS.
Again I am not very familiar with hvmloader and these are implementation
details IMHO.

Tristan.


* Re: Re: Improving hvm IO performance by using self IO emulator (YA io-emu?)
  2007-02-22 21:24     ` Mark Williamson
  2007-02-22 21:33       ` Anthony Liguori
  2007-02-23  0:32       ` Alan
@ 2007-02-24  6:12       ` Tristan Gingold
  2007-02-27 12:14         ` Mark Williamson
  2 siblings, 1 reply; 22+ messages in thread
From: Tristan Gingold @ 2007-02-24  6:12 UTC (permalink / raw)
  To: Mark Williamson; +Cc: tgingold, Anthony Liguori, xen-devel

On Thu, Feb 22, 2007 at 09:24:15PM +0000, Mark Williamson wrote:
[...]
> Perhaps the network device ought to be the first to move?
I think I will start with the simplest device :-)

> > > Does the firmware get loaded as an option ROM or is it a special portion
> > > of guest memory that isn't normally reachable?
> >
> > IMHO it should come with hvmload.  No needs to make it unreachable.
> 
> Mmmm.  It's not like the guest can break security if it tampers with the 
> device models in its own memory space.
[Maybe I don't catch all the English here]
How can the guest break security WRT a usual PV domain?

> Question: how does this compare with using a "stub domain" to run the device 
> models?  The previous proposed approach was to automatically switch to the 
> stub domain on trapping an IO by the HVM guest, and have that stub domain run 
> the device models, etc.
Is there a partial/full implementation of a stub domain?
The pro of the firmware approach compared to a stub domain is that it is easy to
do: it doesn't require a lot of modification in the HV.

> You seem to be actually proposing running the code within the HVM guest 
> itself.  The two approaches aren't actually that different, IMO, since the 
> guest still effectively has two different execution contexts.  It does seem 
> to me that running within the HVM guest itself might be more flexible.
I fully agree.

> A cool little trick that this strategy could enable is to run a full Qemu 
> instruction emulator within the device model - I'd imagine this could be 
> useful on IA64, for instance, in order to provide support for running legacy 
> OSes (e.g. for x86, or *cough* PPC ;-))
That's something I'd like to have too.

Tristan.


* Re: Re: Improving hvm IO performance by using self IO emulator (YA io-emu?)
  2007-02-22 21:33       ` Anthony Liguori
  2007-02-23  0:15         ` Mark Williamson
  2007-02-23  0:26         ` Alan
@ 2007-02-24  6:17         ` Tristan Gingold
  2 siblings, 0 replies; 22+ messages in thread
From: Tristan Gingold @ 2007-02-24  6:17 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: tgingold, xen-devel, Mark Williamson

On Thu, Feb 22, 2007 at 03:33:21PM -0600, Anthony Liguori wrote:
> Mark Williamson wrote:
[...]
> >Mmmm.  It's not like the guest can break security if it tampers with the 
> >device models in its own memory space.
> >
> >Question: how does this compare with using a "stub domain" to run the 
> >device models?  The previous proposed approach was to automatically switch 
> >to the stub domain on trapping an IO by the HVM guest, and have that stub 
> >domain run the device models, etc.
> >  
> 
> Reflecting is a bit more expensive than doing a stub domain.  There is 
> no way to wire up the VMEXITs to go directly into the guest so you're 
> always going to have to pay the cost of going from guest => host => 
> guest => host => guest for every PIO.  The guest is incapable of 
> reenabling PG on its own hence the extra host => guest transition.
> 
> Compare to stub domain where, if done correctly, you can go from guest 
> => host/0 => host/3 => host/0 => guest.  The question would be, is 
> host/0 => host/3 => host/0 fundamentally faster than host => guest => host.
> 
> I know that guest => host => guest typically costs *at least* 1000 nsecs 
> on SVM.  A null sysenter syscall (that's host/3 => host/0 => host/3) is 
> roughly 75 nsecs.
> 
> So my expectation is that stub domain can actually be made to be faster 
> than reflecting.
Ok.  Unfortunately I don't have the figures for ia64.

With the firmware approach, strictly speaking, we don't need to reenter
guest mode during the reflection.  That would be very much like a stub domain.
[I really have to look at the stub-domain implementation, if there is one.]

Tristan.


* Re: Re: Improving hvm IO performance by using self IO emulator (YA io-emu?)
  2007-02-22 21:41       ` Mark Williamson
@ 2007-02-24  6:19         ` Tristan Gingold
  0 siblings, 0 replies; 22+ messages in thread
From: Tristan Gingold @ 2007-02-24  6:19 UTC (permalink / raw)
  To: Mark Williamson; +Cc: tgingold, Anthony Liguori, xen-devel

On Thu, Feb 22, 2007 at 09:41:20PM +0000, Mark Williamson wrote:
> While I'm thinking about it, I wonder how returning to the guest from the 
> emulator would work...
> 
> We'd want to hypercall to transfer back to it...  do we need specific Xen 
> support for this or could (for instance) Gerd's work on domU kexec be 
> leveraged here?
> 
> Perhaps it would be worth evaluating some kind of "send these events and then 
> switch back to guest code" hypercall so that the emulator doesn't have to 
> bounce in and out of Xen so much.  Remains to be seen whether this makes much 
> diffecence to overall performance but it seems somehow civilised ;-)
For sure, there are a lot of possible minor optimization points...

Tristan.


* Re: Re: Improving hvm IO performance by using self IO emulator (YA io-emu?)
  2007-02-24  6:12       ` Tristan Gingold
@ 2007-02-27 12:14         ` Mark Williamson
  0 siblings, 0 replies; 22+ messages in thread
From: Mark Williamson @ 2007-02-27 12:14 UTC (permalink / raw)
  To: Tristan Gingold; +Cc: Anthony Liguori, xen-devel

> > Perhaps the network device ought to be the first to move?
>
> I think I will start with the most simple device :-)

Good plan :-)

> > > IMHO it should come with hvmload.  No needs to make it unreachable.
> >
> > Mmmm.  It's not like the guest can break security if it tampers with the
> > device models in its own memory space.
>
> [Maybe I don't catch all the english here]
> How can the guest break security WRT an usual PV domain ?

It can't - I just meant that it's no worse having the emulators in the domain 
itself than having a paravirtualised domain.  It doesn't imply an increase in 
trust, so there's no particular reason not to put emulators in the guest.

> > Question: how does this compare with using a "stub domain" to run the
> > device models?  The previous proposed approach was to automatically
> > switch to the stub domain on trapping an IO by the HVM guest, and have
> > that stub domain run the device models, etc.
>
> Is there a partial/full implementation of stub domain ?
> The pro of firmware approach compared to stub domain is the easy way to do
> it: it doesn't requires of lot of modification in the HV.

I believe some folks are working on this, but I'm not sure there's a "proper" 
stub domain with emulators linked to mini-os (as per the original plan) yet.

It's nice that modification isn't required - I suspect it also means fewer 
changes in the tools, etc, are necessary.

> > A cool little trick that this strategy could enable is to run a full Qemu
> > instruction emulator within the device model - I'd imagine this could be
> > useful on IA64, for instance, in order to provide support for running
> > legacy OSes (e.g. for x86, or *cough* PPC ;-))
>
> That's something I'd like to have too.

This is probably another discussion, but there are some interesting design 
questions here regarding how the CPU emulation would fit in.  For instance, 
whether it would need to be done in such a way that the guest could be 
save/restored (or even live migrated!) between x86 and IA64 machines...  
Being able to do this would be cool, but I'm not sure how useful it would be 
in the real world!  This also has implications for whether the PV drivers 
would use the host architecture protocol or the guest architecture 
protocol...  This stuff could get fun :-)

In any case, getting it working in any form would be an advance.

Cheers,
Mark

-- 
Dave: Just a question. What use is a unicyle with no seat?  And no pedals!
Mark: To answer a question with a question: What use is a skateboard?
Dave: Skateboards have wheels.
Mark: My wheel has a wheel!

