* Re: Xen-devel Digest, Vol 25, Issue 93
       [not found] <E1HQkNQ-0002f5-Pl@host-192-168-0-1-bcn-london>
@ 2007-03-12 16:10 ` PUCCETTI Armand
  2007-03-12 16:19   ` Petersson, Mats
  0 siblings, 1 reply; 35+ messages in thread
From: PUCCETTI Armand @ 2007-03-12 16:10 UTC (permalink / raw)
  To: xen-devel


>> When the system boots, the processor is normally in "real-mode", and
>> it's definitely not got paging enabled. So we have to "make 
>> the guest OS
>> believe this is the case". But at the same time, the guest OS is most
>> likely not loaded at address zero in memory, so we need paging enabled
>> to remap the GUEST PHYSICAL address to match the machine physical
>> address. So we have a "linear map" to translate the "address zero" to
>> the "start of guest memory", and so on for every page of memory in the
>> guest.
>>
>> This is not hard to do, since the AMD-V/VT feature of the processor
>> expects the paging-bit to be different between what the guest "thinks"
>> and the actual case. In the AMD-V, there's even support to 
>> run real-mode
>> with paging enabled, so all the BIOS-code and such will be running in
>> this mode. VT has to do a bunch of tricky stuff to work around that
>> problem.
>>
>> Ok fine, does this argument hold true even for non-VT and 
>> non-Pacifica enabled processors?
>> I doubt it.
>>     
>
> Not precisely. I'm talking only about HVM mode, which is "full
> virtualization". PV-mode uses a different paging interface, which at
> least for most parts, comprise of changing the whole area of code in the
> kernel that updates the page-tables, by adding code that is aware of the
> THREE types of address (guest-virtual, guest-physical and
> machine-physical). This means that there's no real need for the
> "read-only page-tables" and "shadow-mode" - the page-table just contains
> the right value for the machine-physical address. [That's not to say
> that read-only page-tables can't be used in a PV system too - I'm not
> 100% sure how the page-table management works in the PV mode]. 
>   
That is very interesting info on the paging system. Mats, could you please
explain a bit how PV paging works? How do the guest and host page tables
work together? What does the guest page table point to, i.e. how and when
is it mapped onto the host page table?

I have seen in the code that there are different cases of guest+host
page-table heights. Why?

thanks. Armand
>>> I hope I made myself clear.
>>> Please enlighten me :-).
>>>
>>> When paging is enabled, we use a shadow page-table, which is
>>> essentially
>>> that the GUEST sees one page-table, and the processor another
>>> (thanks to
>>> the fact that the hypervisor intercepts the CR3 read/write 
>>>       
>> operations,
>>     
>>> and when CR3 is read back by the guest, we don't send back the value
>>> it's ACTUALLY POINTING TO IN THE PROCESSOR, but the value 
>>>       
>> that was set
>>     
>>> by the guest). So there are two page-tables.
>>>
>>> Got this well, thanks Mats :).
>>>
>>> To make the page-table updates by the guest visible to the 
>>>       
>> hypervisor,
>>     
>>> all of the guest-page-tables are made read-only (by scanning
>>> the new CR3
>>> value whenever one is set).
>>>
>>> I didn't get this well either :(
>>> Sorry, but do you mean CR3 for the guest or for the
>>> processor? I hope you mean the guest?
>>>       
>> Yes, scan the guest-CR3 to see where it placed the page-tables.
>>
>>     
>>> Whenever a page-fault happens, the hypervisor has "first look", and
>>> determines if the update is for a page-table or not. If it is a
>>> page-table update, the guest operation is emulated (in 
>>>       
>> x86_emulate.c),
>>     
>>> and the result is written to the shadow-page-table AND the
>>>
>>> Why do we need emulation? Is there some peculiar reason for emulating?
>>> Do you mean to say that if I am running a 32-bit domU on top of a
>>> 64-bit processor, the guest operation for updating the page
>>> table is emulated by the hypervisor? Am I right?
>>>       
>> No, it's simply because we need to see the result of the 
>> instruction and
>> write it to two places (with some modification in one of 
>> those places).
>> So if the code is doing, for example: "*pte |= 1;" (set a
>> page-table-entry to "present"), we need to mark both the
>> guest-page-table-entry to "present", and mark our 
>> shadow-entry "present"
>> (and perhaps do some other work too, but that's the minimum work
>> needed).
>>
>> This brings one more question to my mind. Why do we use pinning then?
>>     
>
> I believe there are two types of pinning! Page-pinning, which is blocking
> a page from being accessed in an incorrect way [again, I'm not 100% sure
> how this works, or exactly what it does - just that it's a term used in
> the general way I described in the previous sentence]. 
>
>   
>> As I see it: to avoid shadow page tables being swapped out 
>> before the page tables they actually point to are swapped. Am I right?
>>
>> But according to the interface manual, to bind a VCPU to a 
>> specific CPU in an SMP environment we use pinning. But these two 
>> look like pretty orthogonal statements to me, which means I may be 
>> wrong :(.
>> Can somebody help me in this regard?
>>     
>
> CPU pinning is to tie a VCPU to a (set of) processor(s). For example,
> you may want to pin Dom0 to run only on CPU0, and pin a DomU to run on
> CPUs 1, 2 and 3. That way, Dom0 is ALWAYS able to run on its own CPU,
> and it's never in contention about which CPU to use, and the DomU can run on
> three CPUs as much as it likes. You could have another DomU pinned to
> CPU 3 if you wish. That means that CPUs 1 and 2 are exclusively for the
> first DomU, whilst the second DomU shares CPU3 with the first DomU (so
> they both get half the performance of one CPU - on average over a
> reasonable amount of time). 
>
> --
>   


* RE: Re: Xen-devel Digest, Vol 25, Issue 93
  2007-03-12 16:10 ` Xen-devel Digest, Vol 25, Issue 93 PUCCETTI Armand
@ 2007-03-12 16:19   ` Petersson, Mats
  2007-03-12 16:23     ` Keir Fraser
  0 siblings, 1 reply; 35+ messages in thread
From: Petersson, Mats @ 2007-03-12 16:19 UTC (permalink / raw)
  To: PUCCETTI Armand, xen-devel

> -----Original Message-----
> From: xen-devel-bounces@lists.xensource.com 
> [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of 
> PUCCETTI Armand
> Sent: 12 March 2007 16:11
> To: xen-devel@lists.xensource.com
> Subject: [Xen-devel] Re: Xen-devel Digest, Vol 25, Issue 93
> 
> 
> >> When the system boots, the processor is normally in 
> "real-mode", and
> >> it's definitely not got paging enabled. So we have to "make 
> >> the guest OS
> >> believe this is the case". But at the same time, the guest 
> OS is most
> >> likely not loaded at address zero in memory, so we need 
> paging enabled
> >> to remap the GUEST PHYSICAL address to match the machine physical
> >> address. So we have a "linear map" to translate the 
> "address zero" to
> >> the "start of guest memory", and so on for every page of 
> memory in the
> >> guest.
> >>
> >> This is not hard to do, since the AMD-V/VT feature of the processor
> >> expects the paging-bit to be different between what the 
> guest "thinks"
> >> and the actual case. In the AMD-V, there's even support to 
> >> run real-mode
> >> with paging enabled, so all the BIOS-code and such will be 
> running in
> >> this mode. VT has to do a bunch of tricky stuff to work around that
> >> problem.
> >>
> >> Ok fine, does this argument holds true for even non-VT and 
> >> non-Pacifica enabled processors?
> >> I doubt it.
> >>     
> >
> > Not precisely. I'm talking only about HVM mode, which is "full
> > virtualization". PV-mode uses a different paging interface, which at
> > least for most parts, comprise of changing the whole area 
> of code in the
> > kernel that updates the page-tables, by adding code that is 
> aware of the
> > THREE types of address (guest-virtual, guest-physical and
> > machine-physical). This means that there's no real need for the
> > "read-only page-tables" and "shadow-mode" - the page-table 
> just contains
> > the right value for the machine-physical address. [That's not to say
> > that read-only page-tables can't be used in a PV system too 
> - I'm not
> > 100% sure how the page-table management works in the PV mode]. 
> >   
> That is very interesting info on the paging system. Mats, 
> could you please
> explain a bit the working of the PV paging? How do the the guest+host 
> page tables work
> together? What does the guest page table point to, i.e. 
> how+when is it 
> mapped onto the host page table?
> 
> I have seen in the code that there are different cases of guest+host 
> paging table heights. Why?

I'm sorry, I don't quite know this. I believe that the page-table has to
have the same number of levels in both Xen and the PV guest. 

There's been some recent work to implement 32-bit PV on 64-bit HV, which
I think changes this by allowing a 32-bit PAE guest to run on a 64-bit
hypervisor. Someone else who works more on PV is probably better placed to
answer this... 

In HVM, you definitely can have 32-bit guests, both PAE and non-PAE, on a
64-bit HV, which obviously means a different number of page-table levels
(2, 3 or 4 for non-PAE, PAE and 64-bit respectively). 

--
Mats


> 
> thanks. Armand


* Re: Re: Xen-devel Digest, Vol 25, Issue 93
  2007-03-12 16:19   ` Petersson, Mats
@ 2007-03-12 16:23     ` Keir Fraser
  2007-03-12 16:26       ` More page-table questions Petersson, Mats
  0 siblings, 1 reply; 35+ messages in thread
From: Keir Fraser @ 2007-03-12 16:23 UTC (permalink / raw)
  To: Petersson, Mats, PUCCETTI Armand, xen-devel

On 12/3/07 16:19, "Petersson, Mats" <Mats.Petersson@amd.com> wrote:

>> I have seen in the code that there are different cases of guest+host
>> paging table heights. Why?
> 
> I'm sorry, I don't quite know this. I believe that the page-table has to
> be the same number of levels in both Xen and the PV guest.
> 
> There's been some recent work to implement 32-bit PV on 64-bit HV, which
> I think changes this by allowing a 32-bit PAE guest to run on a 64-bit
> hypervisor. Someone else who works more on PV is probably better to
> answer this... 

For PV guests, there are no separate Xen/shadow page tables. Xen reserves a
bit of space at the top end of guest pagetables to map itself. Hence
normally the guest and Xen pagetables must be the same height as they are
actually the same pagetables.

Supporting a PAE guest on 64-bit Xen is the only exception. Xen maintains a
hidden top-level page directory and one of the entries in that directory
points at the guest's three-level pagetable. But again there is no shadowing
of the guest's three-level pagetables: they are directly hooked into the hidden
top-level directory, and the real physical %cr3 points at that hidden
directory.

 -- Keir


* RE: More page-table questions.
  2007-03-12 16:23     ` Keir Fraser
@ 2007-03-12 16:26       ` Petersson, Mats
  2007-03-12 16:32         ` Keir Fraser
  0 siblings, 1 reply; 35+ messages in thread
From: Petersson, Mats @ 2007-03-12 16:26 UTC (permalink / raw)
  To: Keir Fraser, PUCCETTI Armand, xen-devel

 

> -----Original Message-----
> From: Keir Fraser [mailto:keir@xensource.com] 
> Sent: 12 March 2007 16:23
> To: Petersson, Mats; PUCCETTI Armand; xen-devel@lists.xensource.com
> Subject: Re: [Xen-devel] Re: Xen-devel Digest, Vol 25, Issue 93
> 
> On 12/3/07 16:19, "Petersson, Mats" <Mats.Petersson@amd.com> wrote:
> 
> >> I have seen in the code that there are different cases of 
> guest+host
> >> paging table heights. Why?
> > 
> > I'm sorry, I don't quite know this. I believe that the 
> page-table has to
> > be the same number of levels in both Xen and the PV guest.
> > 
> > There's been some recent work to implement 32-bit PV on 
> 64-bit HV, which
> > I think changes this by allowing a 32-bit PAE guest to run 
> on a 64-bit
> > hypervisor. Someone else who works more on PV is probably better to
> > answer this... 
> 
> For PV guests, there are no separate Xen/shadow page tables. 
> Xen reserves a
> bit of space at the top end of guest pagetables to map itself. Hence
> normally the guest and Xen pagetables must be the same height 
> as they are
> actually the same pagetables.
> 
> Supporting PAE guest on 64-bit Xen is the only exception. Xen 
> maintains a
> hidden top-level page directory and one of the entries in 
> that directory
> points at the guest's three-level pagetable. But again there 
> is no shadowing
> of the guest three-level pagetable: they are directly hooked 
> into the hidden
> top-level directory, and the real physical %cr3 points at that hidden
> directory.

Are the page-tables ever updated directly by the guest, or is it all
done via hyper-calls?

--
Mats

>  -- Keir
> 
> 
> 
> 


* Re: More page-table questions.
  2007-03-12 16:26       ` More page-table questions Petersson, Mats
@ 2007-03-12 16:32         ` Keir Fraser
  2007-03-12 16:35           ` Petersson, Mats
  2007-03-12 17:27           ` More page-table questions PUCCETTI Armand
  0 siblings, 2 replies; 35+ messages in thread
From: Keir Fraser @ 2007-03-12 16:32 UTC (permalink / raw)
  To: Petersson, Mats, Keir Fraser, PUCCETTI Armand, xen-devel

On 12/3/07 16:26, "Petersson, Mats" <Mats.Petersson@amd.com> wrote:

> Are the page-tables ever updated directly by the guest, or is it all
> done via hyper-calls?

Leaf PTEs (i.e., really just PTEs, not PDEs) can be directly written from
the point-of-view of the guest. In fact they are trapped and emulated by
Xen. The guest is somewhat aware of this because it has explicitly
write-protected all its pagetables, so if it were to attempt the direct
write on native hardware in these circumstances it would receive a page
fault.

 -- Keir


* RE: More page-table questions.
  2007-03-12 16:32         ` Keir Fraser
@ 2007-03-12 16:35           ` Petersson, Mats
  2007-03-12 16:38             ` Keir Fraser
  2007-03-15 22:15             ` Questions about device/event channels in Xen Liang Yang
  2007-03-12 17:27           ` More page-table questions PUCCETTI Armand
  1 sibling, 2 replies; 35+ messages in thread
From: Petersson, Mats @ 2007-03-12 16:35 UTC (permalink / raw)
  To: Keir Fraser, PUCCETTI Armand, xen-devel

 

> -----Original Message-----
> From: Keir Fraser [mailto:keir@xensource.com] 
> Sent: 12 March 2007 16:32
> To: Petersson, Mats; Keir Fraser; PUCCETTI Armand; 
> xen-devel@lists.xensource.com
> Subject: Re: [Xen-devel] More page-table questions.
> 
> On 12/3/07 16:26, "Petersson, Mats" <Mats.Petersson@amd.com> wrote:
> 
> > Are the page-tables ever updated directly by the guest, or is it all
> > done via hyper-calls?
> 
> Leaf PTEs (i.e., really just PTEs, not PDEs) can be directly 
> written from
> the point-of-view of the guest. In fact they are trapped and 
> emulated by
> Xen. The guest is somewhat aware of this because it has explicitly
> write-protected all its pagetables, so if it were to attempt 
> the direct
> write on native hardware in these circumstances it would 
> receive a page
> fault.

So in one way or another, the hypervisor knows about every write to the
page-table, yes?

--
Mats
> 
>  -- Keir
> 
> 
> 
> 


* Re: More page-table questions.
  2007-03-12 16:35           ` Petersson, Mats
@ 2007-03-12 16:38             ` Keir Fraser
  2007-03-15 22:15             ` Questions about device/event channels in Xen Liang Yang
  1 sibling, 0 replies; 35+ messages in thread
From: Keir Fraser @ 2007-03-12 16:38 UTC (permalink / raw)
  To: Petersson, Mats, Keir Fraser, PUCCETTI Armand, xen-devel

On 12/3/07 16:35, "Petersson, Mats" <Mats.Petersson@amd.com> wrote:

> So in one way or another, the hypervisor knows about every write to the
> page-table, yes?

Only the hypervisor ever actually updates pagetables. Guest attempts are
trapped and emulated, or the guest explicitly executes a hypercall.

 -- Keir


* Re: More page-table questions.
  2007-03-12 16:32         ` Keir Fraser
  2007-03-12 16:35           ` Petersson, Mats
@ 2007-03-12 17:27           ` PUCCETTI Armand
  2007-03-12 17:42             ` Petersson, Mats
  1 sibling, 1 reply; 35+ messages in thread
From: PUCCETTI Armand @ 2007-03-12 17:27 UTC (permalink / raw)
  To: Keir Fraser; +Cc: Petersson, Mats, xen-devel

Keir Fraser wrote:
> On 12/3/07 16:26, "Petersson, Mats" <Mats.Petersson@amd.com> wrote:
>
>   
>> Are the page-tables ever updated directly by the guest, or is it all
>> done via hyper-calls?
>>     
>
> Leaf PTEs (i.e., really just PTEs, not PDEs) can be directly written from
> the point-of-view of the guest. In fact they are trapped and emulated by
> Xen. The guest is somewhat aware of this because it has explicitly
> write-protected all its pagetables, so if it were to attempt the direct
> write on native hardware in these circumstances it would receive a page
> fault.
>
>  -- Keir
>
>
>   
This is unclear to me: does "a guest believes he can write PTEs" mean that
its source code for accessing the page tables is left unchanged between
the legacy and PV versions?

The hypervisor merely traps the guest's accesses to the page tables, to
check what it is doing (e.g. not overlapping any other domain's pages) and
to allow or deny the writes. This should apply to any page-table level,
so why block writes only to PTEs?

This is for 4K pages, but how are 2M pages handled? Or do we assume that
every domain's pages are 4K?

Armand


* RE: More page-table questions.
  2007-03-12 17:27           ` More page-table questions PUCCETTI Armand
@ 2007-03-12 17:42             ` Petersson, Mats
  2007-03-13 16:25               ` Mark Williamson
  0 siblings, 1 reply; 35+ messages in thread
From: Petersson, Mats @ 2007-03-12 17:42 UTC (permalink / raw)
  To: PUCCETTI Armand, Keir Fraser; +Cc: xen-devel

 

> -----Original Message-----
> From: PUCCETTI Armand [mailto:armand.puccetti@cea.fr] 
> Sent: 12 March 2007 17:27
> To: Keir Fraser
> Cc: Petersson, Mats; xen-devel@lists.xensource.com
> Subject: Re: [Xen-devel] More page-table questions.
> 
> Keir Fraser a écrit :
> > On 12/3/07 16:26, "Petersson, Mats" <Mats.Petersson@amd.com> wrote:
> >
> >   
> >> Are the page-tables ever updated directly by the guest, or 
> is it all
> >> done via hyper-calls?
> >>     
> >
> > Leaf PTEs (i.e., really just PTEs, not PDEs) can be 
> directly written from
> > the point-of-view of the guest. In fact they are trapped 
> and emulated by
> > Xen. The guest is somewhat aware of this because it has explicitly
> > write-protected all its pagetables, so if it were to 
> attempt the direct
> > write on native hardware in these circumstances it would 
> receive a page
> > fault.
> >
> >  -- Keir
> >
> >
> >   
> This is unclear to me: "a guest believes he can write PTEs" means that
> his source code to access the page tables is left unchanged between 
> legacy and PV version?
> 
> Merely, the hypervisor traps the guest's accesses to the page 
> tables, to 
> control
> what he is doing (e.g. not overlapping any other domain's pages) and 
> allowing or denying
> any writes. This should apply to any page table level, so why only 
> blocking writes to PTEs?

No, it's the other way around (and I'm sure Keir will correct me if I'm wrong). The guest is not allowed to write AT ALL to the upper levels of the page-table (aside from via hypercalls). So, code in the guest can be unmodified as long as it's touching just the bottom level of page-table (i.e. the individual 4K page).

> 
> This is for 4K pages, but how are 2M pages mixed? or do we 
> assume that 
> every domain pages
> are 4K?

As far as I know, Xen _ONLY_ supports small pages (4K), no large page support at present. 

--
Mats
> 
> Armand
> 
> 
> 
> 


* Re: More page-table questions.
  2007-03-12 17:42             ` Petersson, Mats
@ 2007-03-13 16:25               ` Mark Williamson
  0 siblings, 0 replies; 35+ messages in thread
From: Mark Williamson @ 2007-03-13 16:25 UTC (permalink / raw)
  To: xen-devel; +Cc: Petersson, Mats, PUCCETTI Armand, Keir Fraser

> > This is unclear to me: "a guest believes he can write PTEs" means that
> > his source code to access the page tables is left unchanged between
> > legacy and PV version?
> >
> > Merely, the hypervisor traps the guest's accesses to the page
> > tables, to
> > control
> > what he is doing (e.g. not overlapping any other domain's pages) and
> > allowing or denying
> > any writes. This should apply to any page table level, so why only
> > blocking writes to PTEs?
>
> No, it's the other way around (and I'm sure Keir will correct me if I'm
> wrong). The guest is not allowed to write AT ALL to the upper levels of the
> page-table (aside from via hypercalls). So, code in the guest can be
> unmodified as long as it's touching just the bottom level of page-table
> (i.e. the individual 4K page).

The guest doesn't actually do explicit hypercalls in PV these days; it tries 
to write to the page table leaf nodes and these writes cause a fault (because 
the page tables must be mapped read only).  Xen then validates the change 
being made and applies it to the page table.

Guests have to be modified to translate pseudophysical->machine addresses and 
to map pagetables readonly, but they don't make explicit hypercalls anymore 
(although the effect is much the same).
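
To make that concrete, here is a rough sketch of what a single leaf-PTE update
looks like from the PV guest's side. The identifiers (pfn_to_mfn, pfn_pte_ma,
HYPERVISOR_mmu_update, etc.) follow the XenLinux conventions but are only
illustrative here, not exact quotations from the tree:

    /* Build the new PTE: it must contain the MACHINE frame number, so the
     * guest translates its pseudo-physical frame via the p2m table first. */
    unsigned long pfn = page_to_pfn(page);          /* pseudo-physical frame */
    unsigned long mfn = pfn_to_mfn(pfn);            /* machine frame         */
    pte_t newpte = pfn_pte_ma(mfn, PAGE_KERNEL);

    /* Option 1: a plain store.  The page table itself is mapped read-only,
     * so this write faults into Xen, which validates and applies it.       */
    set_pte(ptep, newpte);

    /* Option 2: an explicit (batchable) hypercall, with the same end result. */
    struct mmu_update u = {
        .ptr = virt_to_machine(ptep),   /* machine address of the PTE slot */
        .val = pte_val_ma(newpte),      /* new PTE contents                */
    };
    HYPERVISOR_mmu_update(&u, 1, NULL, DOMID_SELF);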

> > This is for 4K pages, but how are 2M pages mixed? or do we
> > assume that
> > every domain pages
> > are 4K?
>
> As far as I know, Xen _ONLY_ supports small pages (4K), no large page
> support at present.

Large page support hasn't been figured out yet, so the 4K page size is fixed on x86.  
I think the IA64 guys (and maybe PPC?) may have considered large pages (IA64 
at least has a far wider range of allowed page sizes than x86).

Cheers,
Mark

-- 
Dave: Just a question. What use is a unicyle with no seat?  And no pedals!
Mark: To answer a question with a question: What use is a skateboard?
Dave: Skateboards have wheels.
Mark: My wheel has a wheel!


* Questions about device/event channels in Xen.
  2007-03-12 16:35           ` Petersson, Mats
  2007-03-12 16:38             ` Keir Fraser
@ 2007-03-15 22:15             ` Liang Yang
  2007-03-16  0:34               ` Mark Williamson
                                 ` (2 more replies)
  1 sibling, 3 replies; 35+ messages in thread
From: Liang Yang @ 2007-03-15 22:15 UTC (permalink / raw)
  To: xen-devel; +Cc: Petersson, Mats

Hello,

I just have several questions about device and event channels:
1. From the implementation point of view, are device and event channels the 
same (i.e. both based on shared memory)?

2. In Xen papers, it is said that up to 1024 channels are supported per domain. 
Does 1024 include both device channels and event channels?

3. Are these device/event channels allocated dynamically or statically for 
each domain?

4. It seems I need to allocate one device channel per device, is this true?

Thanks,

Liang


* Re: Questions about device/event channels in Xen.
  2007-03-15 22:15             ` Questions about device/event channels in Xen Liang Yang
@ 2007-03-16  0:34               ` Mark Williamson
  2007-03-16  6:02                   ` Liang Yang
  2007-03-16  3:17               ` Questions about device/event channels in Xen Daniel Stodden
  2007-03-16  8:38               ` Petersson, Mats
  2 siblings, 1 reply; 35+ messages in thread
From: Mark Williamson @ 2007-03-16  0:34 UTC (permalink / raw)
  To: xen-devel; +Cc: Petersson, Mats, Liang Yang

The terminology may be confusing you here, so let me just say: Device channels 
are not like Event channels.  They're different concepts...  let me 
elaborate:

> I just have several questions about device and event channel:
> 1. From the implementation point of view, are device and event channel the
> same (i.e. both based on shared memory)?

Event channels don't use interdomain shared memory.  They're like an 
interdomain interrupt line, provided as a service by Xen.  Basically a way 
for a pair of domains to "poke" each other to say "Something just happened 
and there's work for you to do".

The "device channel" uses interdomain shared memory (using grant tables) and 
event channels to emulate the functionality of a device.  For instance, the 
blkfront and blkback drivers do something like the following:

1. blkfront wants to access a block of data
   -> queue a "read request" into memory it shares with blkback
   -> notify blkback in dom0 using an event channel
2. blkback experiences an "interrupt" as a result of the event sent to it
   -> looks in the shared memory to find the request
   -> executes the read operation
   -> puts a response in shared memory
   -> notifies blkfront in the domU using an event channel
3. blkfront experiences an "interrupt" due to the event sent to it
   -> completes processing of the new data

The combination of the shared memory (containing a ring buffer for requests 
and responses) and the event channel provides the facilities for the front 
and back drivers to talk to each other; this is the device channel.
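
As a sketch of what the frontend side of that loop looks like in code (this
uses the generic ring macros from Xen's public io/ring.h; the ring variable
and the notify_backend() call are placeholders for this example, and the
request fields are simplified):

    blkif_request_t *req;
    int notify;

    /* Put a read request on the shared ring (step 1 above). */
    req = RING_GET_REQUEST(&blkif_ring, blkif_ring.req_prod_pvt);
    req->operation = BLKIF_OP_READ;
    req->id        = my_request_id;   /* echoed back in the response */
    /* ... fill in the sector number and the grant references that let
     *     the backend map the data pages ... */
    blkif_ring.req_prod_pvt++;

    /* Make the request visible and "poke" the backend over the event
     * channel, but only if it isn't already processing the ring.      */
    RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(&blkif_ring, notify);
    if (notify)
        notify_backend();   /* e.g. send an event on the bound channel */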

> 2. In Xen papers, it is said up to 1024 channels are supported per domain.
> Does 1024 include both device channel and event channel?

This should be answered by the text above; device channels are a different 
thing, built using event channels.

> 3. Are these device/event channels allocated dynamically or statically for
> each domain?

XenLinux virtual device drivers bind event channels dynamically when they set 
up their communications with another domain.

I think there are some statically allocated event channels for essential 
services (e.g. for XenStore and the domain's console).

> 4. It seems I need to allocate one device channel per device, is this true?

Yes, but the device channel is something you build yourself using shared 
memory and event channels - it's up to you how you implement it.

In summary: event channels and shared memory are concrete services provided by 
Xen using an API.  A "device channel" is a high level term for the way 
drivers use these facilities to communicate.

I hope this helps, please ask if you need any clarification.

Cheers,
Mark


* Re: Questions about device/event channels in Xen.
  2007-03-15 22:15             ` Questions about device/event channels in Xen Liang Yang
  2007-03-16  0:34               ` Mark Williamson
@ 2007-03-16  3:17               ` Daniel Stodden
  2007-03-16  8:38               ` Petersson, Mats
  2 siblings, 0 replies; 35+ messages in thread
From: Daniel Stodden @ 2007-03-16  3:17 UTC (permalink / raw)
  To: Liang Yang; +Cc: Xen Developers

On Thu, 2007-03-15 at 15:15 -0700, Liang Yang wrote:
> Hello,
> 
> I just have several questions about device and event channel:
> 1. From the implementation point of view, are device and event channel the 
> same (i.e. both based on shared memory)?
> 
> 2. In Xen papers, it is said up to 1024 channels are supported per domain. 
> Does 1024 include both device channel and event channel?

Actually it depends on the architecture; on 64-bit systems it's 4096.
There's a page of memory every domain shares with Xen. This specific
limitation is due to the length of a bit vector in which every event channel
marked pending sets a unique bit to 1, according to its port number (you
may think of this as a 'channel number', but actually the number depends
on who's holding the endpoint, similar to TCP/UDP connections: two
numbers connecting two domains by one channel).

The length of the bit vector in turn is more or less fixed, due to the
way it is indexed to speed up searches a little. When interrupted,
domains receiving events search the vector in order to determine which
device sent the notification.
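
Schematically, the guest's upcall handler does something like the following
to find the pending channels (the field names are from the shared_info
structure in xen/include/public/xen.h; the helper names are placeholders):

    struct vcpu_info *v = &shared_info->vcpu_info[cpu];
    unsigned long sel, pending;
    int word, port;

    sel = xchg(&v->evtchn_pending_sel, 0);   /* which words have pending bits */
    while (sel != 0) {
        word = first_set_bit(sel);
        sel &= ~(1UL << word);
        pending = shared_info->evtchn_pending[word] &
                 ~shared_info->evtchn_mask[word];
        while (pending != 0) {
            port = word * BITS_PER_LONG + first_set_bit(pending);
            pending &= pending - 1;          /* clear lowest set bit */
            /* (the real code also atomically clears the pending bit) */
            dispatch_event(port);            /* placeholder: call the driver */
        }
    }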

> 3. Are these device/event channels allocated dynamically or statically for 
> each domain?

The channel itself is allocated dynamically. It's actually the port
numbers per domain that are limited, but that is not much space.

> 4. It seems I need to allocate one device channel per device, is this true?

Yes, as Mark correctly explained. It's equivalent to the way different
interrupt lines in a physical host would be assigned to different
devices. One *may* share them, but it's tedious, and event channels are
cheaper than actual wire. :)

Note: correctly termed, there's no such thing as a 'device channel'.
There are 'devices', consisting of an event channel (for notification) and
shared memory (for the data).

regards,
daniel

-- 
Daniel Stodden
LRR     -      Lehrstuhl für Rechnertechnik und Rechnerorganisation
Institut für Informatik der TU München             D-85748 Garching
http://www.lrr.in.tum.de/~stodden         mailto:stodden@cs.tum.edu
PGP Fingerprint: F5A4 1575 4C56 E26A 0B33  3D80 457E 82AE B0D8 735B


* RE: Questions about device/event channels in Xen.
@ 2007-03-16  6:02                   ` Liang Yang
  2007-03-16  8:45                     ` Keir Fraser
  0 siblings, 1 reply; 35+ messages in thread
From: Liang Yang @ 2007-03-16  6:02 UTC (permalink / raw)
  To: 'Mark Williamson', xen-devel
  Cc: 'Petersson, Mats', 'Daniel Stodden'

Hi Mark,

Thanks for your clarification. It is clear now. But I still have several
questions.

First: it seems Xen uses at least two different types of event "channels".
The first type is for interrupt notification (upcall, uni-directional) and
the second is for notification of queued descriptors (bi-directional).
So is the type of an event channel fixed when Xen allocates it, or not fixed
(for the same device)? E.g. event channel 2 was a uni-directional type and
can later be changed to a bi-directional type.

Second: as these events are handled asynchronously, does Xen treat different
types of event differently? For example, does Xen always respond to an
interrupt event immediately (rather than queuing more descriptors and then
setting up an event)?

Third: for a PCIe device, I can choose to use MSI or the legacy line-based
interrupt. Does the type of interrupt handling mechanism affect the
event channel set-up?

Liang

 
-----Original Message-----
From: M.A. Williamson [mailto:maw48@hermes.cam.ac.uk] On Behalf Of Mark
Williamson
Sent: Thursday, March 15, 2007 5:34 PM
To: xen-devel@lists.xensource.com
Cc: Liang Yang; Petersson, Mats
Subject: Re: [Xen-devel] Questions about device/event channels in Xen.

The terminology may be confusing you here, so let me just say: Device
channels 
are not like Event channels.  They're different concepts...  let me 
elaborate:

> I just have several questions about device and event channel:
> 1. From the implementation point of view, are device and event channel the
> same (i.e. both based on shared memory)?

Event channels don't use interdomain shared memory.  They're like an 
interdomain interrupt line, provided as a service by Xen.  Basically a way 
for a pair of domains to "poke" each other to say "Something just happened 
and there's work for you to do".

The "device channel" uses interdomain shared memory (using grant tables) and

event channels to emulate the functionality of a device.  For instance, the 
blkfront and blkback drivers do something like the following:

1. blkfront wants to access a block of data
   -> queue a "read request" into memory it shares with blkback
   -> notify blkback in dom0 using an event channel
2. blkback experiences an "interrupt" as a result of the event sent to it
   -> looks in the shared memory to find the request
   -> executes the read operation
   -> puts a response in shared memory
   -> notifies blkfront in the domU using an event channel
3. blkfront experiences an "interrupt" due to the event sent to it
   -> completes processing of the new data

The combination of the shared memory (containing a ring buffer for requests 
and responses) and the event channel provides the facilities for the front 
and back drivers to talk to each other; this is the device channel.

> 2. In Xen papers, it is said up to 1024 channels are supported per domain.
> Does 1024 include both device channel and event channel?

This should be answered by the text above; device channels are a different 
thing, built using event channels.

> 3. Are these device/event channels allocated dynamically or statically for
> each domain?

XenLinux virtual device drivers bind event channels dynamically when they
set 
up their communications with another domain.

I think there are some statically allocated event channels for essential 
services (e.g. for XenStore and the domain's console).

> 4. It seems I need to allocate one device channel per device, is this
true?

Yes, but the device channel is something you build yourself using shared 
memory and event channels - it's up to you how you implement it.

In summary: event channels and shared memory are concrete services provided
by 
Xen using an API.  A "device channel" is a high level term for the way 
drivers use these facilities to communicate.

I hope this helps, please ask if you need any clarification.

Cheers,
Mark


* RE: Questions about device/event channels in Xen.
  2007-03-15 22:15             ` Questions about device/event channels in Xen Liang Yang
  2007-03-16  0:34               ` Mark Williamson
  2007-03-16  3:17               ` Questions about device/event channels in Xen Daniel Stodden
@ 2007-03-16  8:38               ` Petersson, Mats
  2 siblings, 0 replies; 35+ messages in thread
From: Petersson, Mats @ 2007-03-16  8:38 UTC (permalink / raw)
  To: Liang Yang, xen-devel

I have no idea about any of the questions below. Perhaps you may want to
send it to xen-devel. 

--
Mats 

> -----Original Message-----
> From: Liang Yang [mailto:multisyncfe991@hotmail.com] 
> Sent: 15 March 2007 22:15
> To: xen-devel@lists.xensource.com
> Cc: Petersson, Mats
> Subject: Questions about device/event channels in Xen.
> 
> Hello,
> 
> I just have several questions about device and event channel:
> 1. From the implementation point of view, are device and 
> event channel the 
> same (i.e. both based on shared memory)?
> 
> 2. In Xen papers, it is said up to 1024 channels are 
> supported per domain. 
> Does 1024 include both device channel and event channel?
> 
> 3. Are these device/event channels allocated dynamically or 
> statically for 
> each domain?
> 
> 4. It seems I need to allocate one device channel per device, 
> is this true?
> 
> Thanks,
> 
> Liang
> 
> 
> 
> 
> 
> 


* Re: Questions about device/event channels in Xen.
  2007-03-16  6:02                   ` Liang Yang
@ 2007-03-16  8:45                     ` Keir Fraser
  2007-03-16 17:30                       ` Does Dom0 always get interrupts first before they are delivered to other guest domains? Liang Yang
  0 siblings, 1 reply; 35+ messages in thread
From: Keir Fraser @ 2007-03-16  8:45 UTC (permalink / raw)
  To: Liang Yang, 'Mark Williamson', xen-devel
  Cc: 'Petersson, Mats', 'Daniel Stodden'

On 16/3/07 06:02, "Liang Yang" <multisyncfe991@hotmail.com> wrote:

> First: it seems Xen uses at least two different types of even "channels".
> First type is for interrupt notification (upper call or uni-directional) and
> the second if for the notification of queued descriptors (bi-directional).
> So is the type of event channel fixed when Xen allocate them or not fixed
> (for the same device), e.g. event channel 2 was a uni-directional type and
> later can be changed to bi-directional type.

An event channel can be allocated/deallocated many times during a domain's
lifetime. The type of an event channel can change across allocations, but is
fixed at allocation time for a particular allocate-to-deallocate period.

> Second: as these events are handled asynchronously, does Xen treat different
> type of event differently?  For example, does Xen always respond to
> interrupt event immediately (unlike queuing more descriptors and then set up
> event)?

Xen doesn't treat event delivery differently depending on the type of event
channel. What changes is the reason for kicking the event channel.

> Third: for a PCIe device, I can choose to use MSI or the legacy line-based
> interrupt. Does different type of interrupt handling mechanism affect the
> event channel set-up?

We don't support MSI yet, but the event-channel interface will not change
when MSI is supported. The event channel will still be bound to a 'pirq'.

 -- Keir


* Does Dom0 always get interrupts first before they are delivered to other guest domains?
  2007-03-16  8:45                     ` Keir Fraser
@ 2007-03-16 17:30                       ` Liang Yang
  2007-03-16 17:40                         ` Petersson, Mats
  2007-03-19 16:33                         ` Does Xen also plan to move the back-end driver to the stub domain for HVM? Liang Yang
  0 siblings, 2 replies; 35+ messages in thread
From: Liang Yang @ 2007-03-16 17:30 UTC (permalink / raw)
  To: xen-devel

Hello,

It seems that if HVM domains access devices using emulation mode with the 
device model in domain0, the Xen hypervisor will send the interrupt event to 
domain0 first, and then the device model in domain0 will send an event to the 
HVM domains.

However, if I'm using the split driver model and I only run the BE driver in 
domain0, does domain0 still get the interrupt first (assuming this interrupt 
is not owned by the Xen hypervisor, e.g. the local APIC timer), or will the 
Xen hypervisor send the event directly to the HVM domain, bypassing domain0, 
for the split driver model?

Another question is: for interrupt delivery, does Xen treat para-virtualized 
domains differently from HVM domains, considering the use of the device model 
versus the split driver model?

Thanks a lot,

Liang


* RE: Does Dom0 always get interrupts first before they are delivered to other guest domains?
  2007-03-16 17:30                       ` Does Dom0 always get interrupts first before they are delivered to other guest domains? Liang Yang
@ 2007-03-16 17:40                         ` Petersson, Mats
  2007-03-16 18:48                           ` Liang Yang
  2007-03-19 16:33                         ` Does Xen also plan to move the back-end driver to the stub domain for HVM? Liang Yang
  1 sibling, 1 reply; 35+ messages in thread
From: Petersson, Mats @ 2007-03-16 17:40 UTC (permalink / raw)
  To: Liang Yang, xen-devel

 

> -----Original Message-----
> From: xen-devel-bounces@lists.xensource.com 
> [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Liang Yang
> Sent: 16 March 2007 17:30
> To: xen-devel@lists.xensource.com
> Subject: [Xen-devel] Does Dom0 always get interrupts first 
> before they are delivered to other guest domains?
> 
> Hello,
> 
> It seems if HVM domains access device using emulation mode  
> w/ device model 
> in domain0, Xen hypervisor will send the interrupt event to 
> domain0 first 
> and then the device model in domain0 will send event to HVM domains.

Ok, so let's see if I've understood your question first:
If we do a disk-read (for example), the actual disk-read operation
itself will generate an interrupt, which goes into Xen HV where it's
converted to an event that goes to Dom0, which in turn wakes up the
pending call to read (in this case) that was requesting the disk IO, and
then when the read-call is finished an event is sent to the HVM DomU. Is
this the sequence of events that you're talking about?

If that's what you are talking about, it must be done this way. 
> 
> However, if I'm using split driver model and I only run BE driver on 
> domain0. Does domain0 still get the interrupt first (assume 
> this interupt is 
> not owned by the Xen hypervisor ,e.g. local APIC timer) or 
> Xen hypervisor 
> will send event directly to HVM domain bypass domain0 for 
> split driver 
> model?

Not in the above type of scenario. The interrupt must go to the
driver-domain (normally Dom0) to indicate that the hardware is ready to
deliver the data. This will wake up the user-mode call that waited for
the data, and then the data can be delivered to the guest domain from
there (which in turn is awakened by the event sent from the driver
domain). 

There is no difference in the number of events in these two cases. 

There is however a big difference in the number of hypervisor-to-dom0
events that occur: the HVM model will require something in the order of
5 writes to the IDE controller to perform one disk read/write operation.
Each of those will incur one event to wake up qemu-dm, and one event to
wake the domu (which will most likely just go one or two instructions
forward to hit the next write to the IDE controller). 
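
To make that concrete: programming one 28-bit LBA read on an emulated IDE
controller involves a sequence of port writes roughly like the one below
(this is ordinary ATA PIO programming, not Xen-specific code; under HVM each
outb() is intercepted and forwarded to qemu-dm as a separate I/O request):

    outb(count,                        0x1F2);  /* sector count             */
    outb(lba         & 0xFF,           0x1F3);  /* LBA bits 0-7             */
    outb((lba >> 8)  & 0xFF,           0x1F4);  /* LBA bits 8-15            */
    outb((lba >> 16) & 0xFF,           0x1F5);  /* LBA bits 16-23           */
    outb(0xE0 | ((lba >> 24) & 0x0F),  0x1F6);  /* drive select + LBA 24-27 */
    outb(0x20,                         0x1F7);  /* command: READ SECTORS    */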

> 
> Another question is: for interrupt delivery, does Xen treat 
> para-virtualized 
> domain differently from HVM domain considering using device 
> model and split 
> driver model?

Not in interrupt delivery, no. Except for the fact that HVM domains
obviously have full hardware interfaces for interrupt controllers etc,
which adds a little bit of overhead (because each interrupt needs to be
acknowledged/cancelled on the interrupt controller, for example). 

--
Mats
> 
> Thanks a lot,
> 
> Liang
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel
> 
> 
> 


* Re: Does Dom0 always get interrupts first before they are delivered to other guest domains?
  2007-03-16 17:40                         ` Petersson, Mats
@ 2007-03-16 18:48                           ` Liang Yang
  2007-03-21  0:37                             ` Mark Williamson
  0 siblings, 1 reply; 35+ messages in thread
From: Liang Yang @ 2007-03-16 18:48 UTC (permalink / raw)
  To: Petersson, Mats, xen-devel

Hi Mats,

Thanks. I still have two more questions:

First, you once gave another excellent explanation about the communication 
between the HVM domain and the HV (15 Feb 2007). Here I quote part of it:
"...Since these IO events are synchronous in a real processor, the 
hypervisor will wait for a "return event" before the guest is allowed to 
continue. Qemu-dm runs as a normal user-process in Dom0..."
My question is about those synchronous I/O events. Why can't we make them 
asynchronous? E.g. whenever the I/O is done, we can interrupt the HV again and 
let it resume I/O processing. Is there any specific limitation that forces the 
Xen hypervisor to do I/O in synchronous mode?

Second, you just mentioned there is a big difference in the number of 
HV-to-domain0 events between the device model and the split driver model. 
Could you elaborate on how the split driver model reduces the number of 
HV-to-domain0 events compared with the qemu device model?

Have a wonderful weekend,

Liang

----- Original Message ----- 
From: "Petersson, Mats" <Mats.Petersson@amd.com>
To: "Liang Yang" <multisyncfe991@hotmail.com>; 
<xen-devel@lists.xensource.com>
Sent: Friday, March 16, 2007 10:40 AM
Subject: RE: [Xen-devel] Does Dom0 always get interrupts first before they 
are delivered to other guest domains?




> -----Original Message-----
> From: xen-devel-bounces@lists.xensource.com
> [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Liang Yang
> Sent: 16 March 2007 17:30
> To: xen-devel@lists.xensource.com
> Subject: [Xen-devel] Does Dom0 always get interrupts first
> before they are delivered to other guest domains?
>
> Hello,
>
> It seems if HVM domains access device using emulation mode
> w/ device model
> in domain0, Xen hypervisor will send the interrupt event to
> domain0 first
> and then the device model in domain0 will send event to HVM domains.

Ok, so let's see if I've understood your question first:
If we do a disk-read (for example), the actual disk-read operation
itself will generate an interrupt, which goes into Xen HV where it's
converted to an event that goes to Dom0, which in turn wakes up the
pending call to read (in this case) that was requesting the disk IO, and
then when the read-call is finished an event is sent to the HVM DomU. Is
this the sequence of events that you're talking about?

If that's what you are talking about, it must be done this way.
>
> However, if I'm using split driver model and I only run BE driver on
> domain0. Does domain0 still get the interrupt first (assume
> this interupt is
> not owned by the Xen hypervisor ,e.g. local APIC timer) or
> Xen hypervisor
> will send event directly to HVM domain bypass domain0 for
> split driver
> model?

Not in the above type of scenario. The interrupt must go to the
driver-domain (normally Dom0) to indicate that the hardware is ready to
deliver the data. This will wake up the user-mode call that waited for
the data, and then the data can be delivered to the guest domain from
there (which in turn is awakened by the event sent from the driver
domain).

There is no difference in the number of events in these two cases.

There is however a big difference in the number of hypervisor-to-dom0
events that occur: the HVM model will require something in the order of
5 writes to the IDE controller to perform one disk read/write operation.
Each of those will incur one event to wake up qemu-dm, and one event to
wake the domu (which will most likely just to one or two instructions
forward to hit the next write to the IDE controller).

>
> Another question is: for interrupt delivery, does Xen treat
> para-virtualized
> domain differently from HVM domain considering using device
> model and split
> driver model?

Not in interrupt delivery, no. Except for the fact that HVM domains
obviously have full hardware interfaces for interrupt controllers etc,
which adds a little bit of overhead (because each interrupt needs to be
acknowledged/cancelled on the interrupt controller, for example).

--
Mats
>
> Thanks a lot,
>
> Liang
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel
>
>
>


* Does Xen also plan to move the back-end driver to the stub domain for HVM?
  2007-03-16 17:30                       ` Does Dom0 always get interrupts first before they are delivered to other guest domains? Liang Yang
  2007-03-16 17:40                         ` Petersson, Mats
@ 2007-03-19 16:33                         ` Liang Yang
  2007-03-19 16:45                           ` Petersson, Mats
  2007-03-19 18:20                           ` Anthony Liguori
  1 sibling, 2 replies; 35+ messages in thread
From: Liang Yang @ 2007-03-19 16:33 UTC (permalink / raw)
  To: xen-devel

Hi,

Based on the roadmap from the Xen summit, there is a plan to move QEMU and let 
it run in a stub domain to improve HVM performance. However, compared with the 
QEMU device model, it would be much easier to move the BE driver and let it 
run in the stub domain instead of dom0, as the BE part runs in kernel space 
(QEMU runs in user space).

But I'm a little bit confused about the relationship between the stub domain 
and the guest domain. Is the stub domain part of the guest domain? Does each 
guest domain have a stub domain which is created when the guest domain is 
created?

If the stub domain is part of the guest domain, does porting the device model 
to the stub domain compromise the original design purpose of an isolated 
device domain?

Thanks,

Liang


* RE: Does Xen also plan to move the back-end driver to the stub domain for HVM?
  2007-03-19 16:33                         ` Does Xen also plan to move the back-end driver to the stub domain for HVM? Liang Yang
@ 2007-03-19 16:45                           ` Petersson, Mats
  2007-03-19 18:20                           ` Anthony Liguori
  1 sibling, 0 replies; 35+ messages in thread
From: Petersson, Mats @ 2007-03-19 16:45 UTC (permalink / raw)
  To: Liang Yang, xen-devel

 

> -----Original Message-----
> From: xen-devel-bounces@lists.xensource.com 
> [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Liang Yang
> Sent: 19 March 2007 16:34
> To: xen-devel@lists.xensource.com
> Subject: [Xen-devel] Does Xen also plan to move the back-end 
> driver to the stub domain for HVM?
> 
> Hi,
> 
> Based on the roadmap on Xen summit, there is a plan to move 
> QEMU and let it 
> run on the stub domain to improve HVM performance. However, 
> comparing with 
> QEMU device model, it will be much easier to move BE driver 
> and let it run 
> in stub domain instead of dom0 as BE part is running on the 
> kernel space 
> (QEMU is running on user space).

But that wouldn't serve the same purpose. What would you solve by
doing this? 

The purpose of the stub-domain is to ensure that QEMU-DM runs on the
same CPU as the domain needing the device-model, which in turn serves
several purposes:
1. It reduces the load on Dom0. Dom0 can end up being the bottleneck
quite quickly for an HVM system with many domains. 
2. It reduces the latency in switching (because there is no OTHER
processor to wake up, wait for qemu-dm to react, etc, etc). 

The back-end driver, on the other hand, is there to serve as a bridge
between the virtual device in the guest and the hardware owner (dom0).
Since there's no plan to let guest domains go straight onto hardware
(besides what's currently allowed with the pci-hide and
pci-passthrough - where the guest domain OWNS that hardware
exclusively), there's still a need to communicate from DomU to Dom0 (or
whichever domain it is that owns the hardware involved). 
> 
> but I'm little bit confused about the relationship between 
> stub domain and 
> guest domain. Is the stub domain part of guest domain? Does 
> each guest 
> domain have a stub domain which is created when the guest 
> domain is created?

Yes, each guest domain will have a stub-domain, according to what I
understand. 
> 
> If the stub domain is part of guest domain, does porting 
> device model to 
> stub domain compromise the orginial design purpose of 
> isoloated devide 
> domain?

No, because the stub-domain will still communicate with Dom0 once it's
got a full IO request packet (cf. our discussion on the IDE controller,
for example). 

The purpose of the stub-domain is primarily to reduce the overhead of
Dom0. There are quite a few IO requests that can be resolved almost
entirely in the qemu-dm itself, which means that the Dom0 wouldn't have
to be bothered at all. Other requests do require that Dom0 is involved.
But if 1 in 4 requests go to Dom0, that means that the stub-domain can
solve 3 in 4 requests without going through Dom0 - that's where the big
saving is. 

--
Mats
> 
> Thanks,
> 
> Liang
> 
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel
> 
> 
> 


* Re: Does Xen also plan to move the back-end driver to the stub domain for HVM?
  2007-03-19 16:33                         ` Does Xen also plan to move the back-end driver to the stub domain for HVM? Liang Yang
  2007-03-19 16:45                           ` Petersson, Mats
@ 2007-03-19 18:20                           ` Anthony Liguori
  2007-03-19 19:21                             ` Liang Yang
  1 sibling, 1 reply; 35+ messages in thread
From: Anthony Liguori @ 2007-03-19 18:20 UTC (permalink / raw)
  To: Liang Yang; +Cc: xen-devel

Liang Yang wrote:
> Hi,
> 
> Based on the roadmap on Xen summit, there is a plan to move QEMU and let 
> it run on the stub domain to improve HVM performance.

Using a stub domain won't improve HVM performance.  It will improve 
accountability and scalability, but a single running HVM guest shouldn't 
see any improvement.

> However, comparing 
> with QEMU device model, it will be much easier to move BE driver and let 
> it run in stub domain instead of dom0 as BE part is running on the 
> kernel space (QEMU is running on user space).

Actually, this cannot make performance better since you're technically 
adding another layer of indirection in the picture.  Within dom0, 
qemu-dm has direct access to the hardware.  Fortunately, the Xen BE/FE 
model is quite good performance wise so there shouldn't be a performance 
regression here.

> but I'm little bit confused about the relationship between stub domain 
> and guest domain. Is the stub domain part of guest domain? Does each 
> guest domain have a stub domain which is created when the guest domain 
> is created?

A lot of this is still being worked out.  From a user perspective, the 
idea would be that creating an HVM domain would be identical to how it's 
done today.  What happens under the covers though remains to be seen.

Regards,

Anthony Liguori

> If the stub domain is part of guest domain, does porting device model to 
> stub domain compromise the orginial design purpose of isoloated devide 
> domain?
> 
> Thanks,
> 
> Liang


* Re: Re: Does Xen also plan to move the back-end driver to the stub domain for HVM?
  2007-03-19 18:20                           ` Anthony Liguori
@ 2007-03-19 19:21                             ` Liang Yang
  2007-03-19 20:20                               ` Anthony Liguori
  2007-03-20 10:03                               ` Re: Does Xen also plan to move the back-end driver to the stub domain for HVM? Petersson, Mats
  0 siblings, 2 replies; 35+ messages in thread
From: Liang Yang @ 2007-03-19 19:21 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: xen-devel

"QEMU has direct access to hardware", does this mean the QEMU device model 
does not need to communicate with the native device driver which is also 
sitting in dom0?


----- Original Message ----- 
From: "Anthony Liguori" <aliguori@us.ibm.com>
To: "Liang Yang" <multisyncfe991@hotmail.com>
Cc: <xen-devel@lists.xensource.com>
Sent: Monday, March 19, 2007 11:20 AM
Subject: [Xen-devel] Re: Does Xen also plan to move the back-end driver to 
the stub domain for HVM?


> Liang Yang wrote:
>> Hi,
>>
>> Based on the roadmap on Xen summit, there is a plan to move QEMU and let 
>> it run on the stub domain to improve HVM performance.
>
> Using a stub domain won't improve HVM performance.  It will improve 
> accountability and scalability but running a single HVM guest shouldn't 
> see any improvement.
>
>> However, comparing with QEMU device model, it will be much easier to move 
>> BE driver and let it run in stub domain instead of dom0 as BE part is 
>> running on the kernel space (QEMU is running on user space).
>
> Actually, this cannot make performance better since you're technically 
> adding another layer of indirection in the picture.  Within dom0, qemu-dm 
> has direct access to the hardware.  Fortunately, the Xen BE/FE model is 
> quite good performance wise so there shouldn't be a performance regression 
> here.
>
>> but I'm little bit confused about the relationship between stub domain 
>> and guest domain. Is the stub domain part of guest domain? Does each 
>> guest domain have a stub domain which is created when the guest domain is 
>> created?
>
> A lot of this is still being worked out.  From a user perspective, the 
> idea would be that creating an HVM domain would be identical to how it's 
> done today.  What happens under the covers though remains to be seen.
>
> Regards,
>
> Anthony Liguori
>
>> If the stub domain is part of guest domain, does porting device model to 
>> stub domain compromise the orginial design purpose of isoloated devide 
>> domain?
>>
>> Thanks,
>>
>> Liang
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel
> 


* Re: Re: Does Xen also plan to move the back-end driver to the stub domain for HVM?
  2007-03-19 19:21                             ` Liang Yang
@ 2007-03-19 20:20                               ` Anthony Liguori
  2007-03-19 21:56                                 ` Question about reserving one CPU for the Xen hypervisor in case of vm exit Liang Yang
  2007-03-20 10:03                               ` Re: Does Xen also plan to move the back-end driver to the stub domain for HVM? Petersson, Mats
  1 sibling, 1 reply; 35+ messages in thread
From: Anthony Liguori @ 2007-03-19 20:20 UTC (permalink / raw)
  To: Liang Yang; +Cc: xen-devel

Liang Yang wrote:
> "QEMU has direct access to hardware", does this mean the QEMU device 
> model does not need to communicate with the native device driver which 
> is also sitting in dom0?
>

No, it means that it communicates with the native device drivers 
directly instead of going through another indirection layer (namely, the 
front and backend drivers).
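
To make that concrete, here is a very rough C sketch (illustrative only, not 
actual qemu-dm code; the names are made up): in dom0 the device model is just 
a user process, so a guest disk read ends up as an ordinary system call 
against the backing file or block device, serviced by the native dom0 driver.

/* Minimal sketch: a dom0 user-space device model satisfying a guest's
 * 512-byte sector read through the native block layer.  Error handling
 * omitted for brevity. */
#include <fcntl.h>
#include <unistd.h>
#include <stdint.h>

static int disk_fd;

int open_backing_store(const char *path)   /* e.g. a disk image or /dev/sdb */
{
    disk_fd = open(path, O_RDWR);
    return disk_fd;
}

ssize_t read_guest_sector(uint64_t sector, void *buf)
{
    /* pread() goes straight to dom0's native driver; no frontend/backend
     * ring is involved on this path. */
    return pread(disk_fd, buf, 512, (off_t)(sector * 512));
}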

Regards,

Anthony Liguori

> ----- Original Message ----- From: "Anthony Liguori" 
> <aliguori@us.ibm.com>
> To: "Liang Yang" <multisyncfe991@hotmail.com>
> Cc: <xen-devel@lists.xensource.com>
> Sent: Monday, March 19, 2007 11:20 AM
> Subject: [Xen-devel] Re: Does Xen also plan to move the back-end 
> driver to the stub domain for HVM?
>
>
>> Liang Yang wrote:
>>> Hi,
>>>
>>> Based on the roadmap on Xen summit, there is a plan to move QEMU and 
>>> let it run on the stub domain to improve HVM performance.
>>
>> Using a stub domain won't improve HVM performance.  It will improve 
>> accountability and scalability but running a single HVM guest 
>> shouldn't see any improvement.
>>
>>> However, comparing with QEMU device model, it will be much easier to 
>>> move BE driver and let it run in stub domain instead of dom0 as BE 
>>> part is running on the kernel space (QEMU is running on user space).
>>
>> Actually, this cannot make performance better since you're 
>> technically adding another layer of indirection in the picture.  
>> Within dom0, qemu-dm has direct access to the hardware.  Fortunately, 
>> the Xen BE/FE model is quite good performance wise so there shouldn't 
>> be a performance regression here.
>>
>>> but I'm little bit confused about the relationship between stub 
>>> domain and guest domain. Is the stub domain part of guest domain? 
>>> Does each guest domain have a stub domain which is created when the 
>>> guest domain is created?
>>
>> A lot of this is still being worked out.  From a user perspective, 
>> the idea would be that creating an HVM domain would be identical to 
>> how it's done today.  What happens under the covers though remains to 
>> be seen.
>>
>> Regards,
>>
>> Anthony Liguori
>>
>>> If the stub domain is part of the guest domain, does porting the device 
>>> model to the stub domain compromise the original design purpose of an 
>>> isolated device domain?
>>>
>>> Thanks,
>>>
>>> Liang
>>
>>
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@lists.xensource.com
>> http://lists.xensource.com/xen-devel
>>
>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Question about reserving one CPU for the Xen hypervisor in case of vm exit.
  2007-03-19 20:20                               ` Anthony Liguori
@ 2007-03-19 21:56                                 ` Liang Yang
  2007-03-20 10:13                                   ` Petersson, Mats
  0 siblings, 1 reply; 35+ messages in thread
From: Liang Yang @ 2007-03-19 21:56 UTC (permalink / raw)
  To: xen-devel

Hi,

My platform has two dual-core processors with VT-x enabled. Suppose I use 
the "xm vcpu-pin" command to set up a fixed mapping between each physical 
processor/core and a virtual CPU (to avoid possible migration).

I have three domains: one is dom0, the second is domUP and the third is 
domUF (an HVM domain). I give each domain one CPU and reserve one for the 
hypervisor. What I want to do is to always keep one CPU idle (reserved 
for the VMM), so the Xen hypervisor can always use this idle CPU whenever a "vm 
exit" happens and the guest HVM domain still has its own CPU to do some 
overlapping processing (to improve performance).

Is this feasible?

Thanks,

Liang

^ permalink raw reply	[flat|nested] 35+ messages in thread

* RE: Re: Does Xen also plan to move the back-end driver to the stub domain for HVM?
  2007-03-19 19:21                             ` Liang Yang
  2007-03-19 20:20                               ` Anthony Liguori
@ 2007-03-20 10:03                               ` Petersson, Mats
  1 sibling, 0 replies; 35+ messages in thread
From: Petersson, Mats @ 2007-03-20 10:03 UTC (permalink / raw)
  To: Liang Yang, Anthony Liguori; +Cc: xen-devel

> -----Original Message-----
> From: xen-devel-bounces@lists.xensource.com 
> [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Liang Yang
> Sent: 19 March 2007 19:21
> To: Anthony Liguori
> Cc: xen-devel@lists.xensource.com
> Subject: Re: [Xen-devel] Re: Does Xen also plan to move the 
> back-end driver to the stub domain for HVM?
> 
> "QEMU has direct access to hardware", does this mean the QEMU 
> device model 
> does not need to communicate with the native device driver 
> which is also 
> sitting in dom0?

No, it needs the Dom0 device driver. 

--
Mats
> 
> 
> ----- Original Message ----- 
> From: "Anthony Liguori" <aliguori@us.ibm.com>
> To: "Liang Yang" <multisyncfe991@hotmail.com>
> Cc: <xen-devel@lists.xensource.com>
> Sent: Monday, March 19, 2007 11:20 AM
> Subject: [Xen-devel] Re: Does Xen also plan to move the 
> back-end driver to 
> the stub domain for HVM?
> 
> 
> > Liang Yang wrote:
> >> Hi,
> >>
> >> Based on the roadmap on Xen summit, there is a plan to 
> move QEMU and let 
> >> it run on the stub domain to improve HVM performance.
> >
> > Using a stub domain won't improve HVM performance.  It will improve 
> > accountability and scalability but running a single HVM 
> guest shouldn't 
> > see any improvement.
> >
> >> However, comparing with QEMU device model, it will be much 
> easier to move 
> >> BE driver and let it run in stub domain instead of dom0 as 
> BE part is 
> >> running on the kernel space (QEMU is running on user space).
> >
> > Actually, this cannot make performance better since you're 
> technically 
> > adding another layer of indirection in the picture.  Within 
> dom0, qemu-dm 
> > has direct access to the hardware.  Fortunately, the Xen 
> BE/FE model is 
> > quite good performance wise so there shouldn't be a 
> performance regression 
> > here.
> >
> >> but I'm little bit confused about the relationship between 
> stub domain 
> >> and guest domain. Is the stub domain part of guest domain? 
> Does each 
> >> guest domain have a stub domain which is created when the 
> guest domain is 
> >> created?
> >
> > A lot of this is still being worked out.  From a user 
> perspective, the 
> > idea would be that creating an HVM domain would be 
> identical to how it's 
> > done today.  What happens under the covers though remains 
> to be seen.
> >
> > Regards,
> >
> > Anthony Liguori
> >
> >> If the stub domain is part of guest domain, does porting 
> device model to 
> >> stub domain compromise the orginial design purpose of 
> isoloated devide 
> >> domain?
> >>
> >> Thanks,
> >>
> >> Liang
> >
> >
> > _______________________________________________
> > Xen-devel mailing list
> > Xen-devel@lists.xensource.com
> > http://lists.xensource.com/xen-devel
> > 
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel
> 
> 
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* RE: Question about reserving one CPU for the Xen hypervisor in case of vm exit.
  2007-03-19 21:56                                 ` Question about reserving one CPU for the Xen hypervisor in case of vm exit Liang Yang
@ 2007-03-20 10:13                                   ` Petersson, Mats
  0 siblings, 0 replies; 35+ messages in thread
From: Petersson, Mats @ 2007-03-20 10:13 UTC (permalink / raw)
  To: Liang Yang, xen-devel

 

> -----Original Message-----
> From: xen-devel-bounces@lists.xensource.com 
> [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Liang Yang
> Sent: 19 March 2007 21:56
> To: xen-devel@lists.xensource.com
> Subject: [Xen-devel] Question about reserving one CPU for the 
> Xen hypervisor in case of vm exit.
> 
> Hi,
> 
> My platform has two dual-core processors with VT-x enabled. 
> Suppose I use 
> "xm vcpu-pin" command to set up a fixed mapping between each physical 
> processor/core to virtual cpu (to avoid possible migration).
> 
> I have three domains, one is dom0, the second is domUP and 
> the third is 
> domUF (HVM domain). I give each domain one CPU and reserve one for 
> hypervisor. What I want to do is to always keep one CPU idle 
> (reserving it 
> for VMM), Xen hyperviso can thus always use this idle CPU 
> whenever a "vm 
> exit" happens and the guest HVM domain still has its own CPU 
> to do some 
> overlapping processing (to improve performance).

That will leave you with one CPU sitting there doing absolutely nothing,
as the VMEXIT handling is all done on the CPU that causes the VMEXIT in
the first place. 

The same applies for hypercalls from the PV side. They all happen on the
same CPU that the guest is running on. 

It's a good idea to allow Dom0 to have its own CPU, but beyond that,
you're better off sharing the three CPUs between your two guests in one
way or another - obviously, you can't give one and a half CPUs to a
guest, so you probably will have to give both guests two CPUs to make
efficient use of the system. Or give one CPU to one guest and two to the
other guest. 

--
Mats
> 
> Is this feasible?
> 
> Thanks,
> 
> Liang
> 
> 
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel
> 
> 
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Does Dom0 always get interrupts first before they are delivered to other guest domains?
  2007-03-16 18:48                           ` Liang Yang
@ 2007-03-21  0:37                             ` Mark Williamson
  2007-03-21  1:23                                 ` Liang Yang
  0 siblings, 1 reply; 35+ messages in thread
From: Mark Williamson @ 2007-03-21  0:37 UTC (permalink / raw)
  To: xen-devel; +Cc: Petersson, Mats, Liang Yang

Hi,

> First, you once gave another excellent explanation about the communication
> between HVM domain and HV (15 Feb 2007 ). Here I quote part of it
> "...Since these IO events are synchronous in a real processor, the
> hypervisor will wait for a "return event" before the guest is allowed to
> continue. Qemu-dm runs as a normal user-process in Dom0..."
> My question is about those Synchronous I/O events. Why can't we make them
> asynchronous? e.g. whenever I/O are done, we can interrupt HV again and let
> HV resume I/O processing. Is there any specific limiation to force Xen
> hypervisor do I/O in synchronous mode?

Was this talking about IO port reads / writes?

The problem with IO port reads is that the guest expects the hardware to have 
responded to an IO port read and for the result to be available as soon as 
the inb (or whatever) instruction has finished...  Therefore in a virtual 
machine, we can't return to the guest until we've figured out (by emulating 
using the device model) what that read should return.
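
As a rough sketch of why the read is blocking (simplified C, with invented 
names and structure - the real Xen/qemu-dm interface looks different), the 
handling of a single inb has to wait for the device model's answer before 
the VCPU that executed it can be resumed:

#include <stdint.h>

/* Invented, cut-down request record, not the real ioreq structure. */
struct io_request {
    uint16_t port;
    uint32_t value;   /* filled in by the device model */
    int      done;
};

/* Toy stand-in for qemu-dm: emulate a read of a serial status port. */
static void device_model_service(struct io_request *req)
{
    req->value = (req->port == 0x3fd) ? 0x20 : 0xffffffff;
    req->done  = 1;
}

/* The guest VCPU cannot be resumed until 'value' exists, because the
 * instruction after inb may already consume the result. */
static uint32_t handle_guest_inb(uint16_t port)
{
    struct io_request req = { .port = port, .done = 0 };

    device_model_service(&req);   /* in reality: notify dom0, then block */
    while (!req.done)
        ;                         /* other VCPUs could keep running here */

    return req.value;
}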

Consecutive writes can potentially be batched, I believe, and there has been 
talk of implementing that.

I don't see any reason why other VCPUs shouldn't keep running in the meantime, 
though.

> Second,  you just mentioned there is big difference between the number of
> HV-to-domain0 events for device model and split driver model. Could you
> elaborate the details about how split driver model can reduce the
> HV-to-domain0 events compared with using qemu device model?

The PV split drivers are designed to minimise events: they'll queue up a load 
of IO requests in a batch and then notify dom0 that the IO requests are 
ready.

In contrast, the FV device emulation can't do this: we have to consult dom0 
for the emulation of any device operations the guest does (e.g. each IO port 
read the guest does) so the batching is less efficient.
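
A cut-down sketch of the batching idea (invented structures; the real 
blkfront/blkback shared ring and event-channel calls are more involved) is 
simply "queue many requests, kick dom0 once":

#include <stdint.h>

#define RING_SIZE 32

struct blk_request { uint64_t sector; uint32_t nr_sectors; int write; };

struct shared_ring {
    struct blk_request req[RING_SIZE];
    unsigned int prod;                 /* producer index, owned by frontend */
};

static void notify_backend(void)
{
    /* stands in for a single event-channel notification to dom0 */
}

/* Queue a whole batch of requests, then send one notification. */
static void submit_batch(struct shared_ring *ring,
                         const struct blk_request *batch, unsigned int n)
{
    for (unsigned int i = 0; i < n; i++)
        ring->req[(ring->prod + i) % RING_SIZE] = batch[i];
    ring->prod += n;

    notify_backend();   /* one dom0 event for n requests, not n events */
}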

Cheers,
Mark

> Have a wonderful weekend,
>
> Liang
>
> ----- Original Message -----
> From: "Petersson, Mats" <Mats.Petersson@amd.com>
> To: "Liang Yang" <multisyncfe991@hotmail.com>;
> <xen-devel@lists.xensource.com>
> Sent: Friday, March 16, 2007 10:40 AM
> Subject: RE: [Xen-devel] Does Dom0 always get interrupts first before they
> are delivered to other guest domains?
>
> > -----Original Message-----
> > From: xen-devel-bounces@lists.xensource.com
> > [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Liang Yang
> > Sent: 16 March 2007 17:30
> > To: xen-devel@lists.xensource.com
> > Subject: [Xen-devel] Does Dom0 always get interrupts first
> > before they are delivered to other guest domains?
> >
> > Hello,
> >
> > It seems if HVM domains access device using emulation mode
> > w/ device model
> > in domain0, Xen hypervisor will send the interrupt event to
> > domain0 first
> > and then the device model in domain0 will send event to HVM domains.
>
> Ok, so let's see if I've understood your question first:
> If we do a disk-read (for example), the actual disk-read operation
> itself will generate an interrupt, which goes into Xen HV where it's
> converted to an event that goes to Dom0, which in turn wakes up the
> pending call to read (in this case) that was requesting the disk IO, and
> then when the read-call is finished an event is sent to the HVM DomU. Is
> this the sequence of events that you're talking about?
>
> If that's what you are talking about, it must be done this way.
>
> > However, if I'm using split driver model and I only run BE driver on
> > domain0. Does domain0 still get the interrupt first (assume
> > this interupt is
> > not owned by the Xen hypervisor ,e.g. local APIC timer) or
> > Xen hypervisor
> > will send event directly to HVM domain bypass domain0 for
> > split driver
> > model?
>
> Not in the above type of scenario. The interrupt must go to the
> driver-domain (normally Dom0) to indicate that the hardware is ready to
> deliver the data. This will wake up the user-mode call that waited for
> the data, and then the data can be delivered to the guest domain from
> there (which in turn is awakened by the event sent from the driver
> domain).
>
> There is no difference in the number of events in these two cases.
>
> There is however a big difference in the number of hypervisor-to-dom0
> events that occur: the HVM model will require something in the order of
> 5 writes to the IDE controller to perform one disk read/write operation.
> Each of those will incur one event to wake up qemu-dm, and one event to
> wake the domu (which will most likely just to one or two instructions
> forward to hit the next write to the IDE controller).
>
> > Another question is: for interrupt delivery, does Xen treat
> > para-virtualized
> > domain differently from HVM domain considering using device
> > model and split
> > driver model?
>
> Not in interrupt delivery, no. Except for the fact that HVM domains
> obviously have full hardware interfaces for interrupt controllers etc,
> which adds a little bit of overhead (because each interrupt needs to be
> acknowledged/cancelled on the interrupt controller, for example).
>
> --
> Mats
>
> > Thanks a lot,
> >
> > Liang
> >
> >
> > _______________________________________________
> > Xen-devel mailing list
> > Xen-devel@lists.xensource.com
> > http://lists.xensource.com/xen-devel
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel

-- 
Dave: Just a question. What use is a unicycle with no seat?  And no pedals!
Mark: To answer a question with a question: What use is a skateboard?
Dave: Skateboards have wheels.
Mark: My wheel has a wheel!

^ permalink raw reply	[flat|nested] 35+ messages in thread

* RE: Does Dom0 always get interrupts first before they are delivered to other guest domains?
@ 2007-03-21  1:23                                 ` Liang Yang
  0 siblings, 0 replies; 35+ messages in thread
From: Liang Yang @ 2007-03-21  1:23 UTC (permalink / raw)
  To: 'Mark Williamson', xen-devel; +Cc: 'Petersson, Mats'

Hi Mark,

Thanks. 

I have another question about using VT-x and hypercalls to support
para-virtualized and fully-virtualized domains simultaneously:

It seems Xen does not need to use a hypercall to replace every problematic
instruction (e.g. HLT, POPF etc.). For example, there is an instruction
called CLTS. Instead of replacing it with a hypercall, the Xen hypervisor
first delegates it to ring 0 when a GP fault occurs and then runs it from
there to solve the ring aliasing issue.
(http://www.linuxjournal.com/comment/reply/8909 talked about this).

Now my first question comes up: if I'm running both a para-virtualized and a
full-virtualized domain on a single CPU (I think the Xen hypervisor will set
up the exception bitmap for the CLTS instruction for the HVM domain), won't
the Xen hypervisor be confused and not know how to handle CLTS when it is
run in ring 1?

Does the Xen hypervisor do a VM EXIT, or does it still delegate CLTS to ring
0? How does the Xen hypervisor tell whether the instruction comes from a
para-virtualized domain or from a full-virtualized domain? Does Xen have to
replace all problematic instructions with hypercalls for a para-domain (even
CLTS)? Why does Xen need different strategies in a para-virtualized domain
for handling CLTS (delegation to ring 0) and the other problematic
instructions (hypercalls)?


My second question:
It seems each processor has its own exception bitmap. If I have multiple
processors (VT-x enabled), does the Xen hypervisor use the same exception
bitmap on all processors, or does Xen allow each processor to have its own
(maybe different) exception bitmap?

Best regards,

Liang

-----Original Message-----
From: M.A. Williamson [mailto:maw48@hermes.cam.ac.uk] On Behalf Of Mark
Williamson
Sent: Tuesday, March 20, 2007 5:37 PM
To: xen-devel@lists.xensource.com
Cc: Liang Yang; Petersson, Mats
Subject: Re: [Xen-devel] Does Dom0 always get interrupts first before they
are delivered to other guest domains?

Hi,

> First, you once gave another excellent explanation about the communication
> between HVM domain and HV (15 Feb 2007 ). Here I quote part of it
> "...Since these IO events are synchronous in a real processor, the
> hypervisor will wait for a "return event" before the guest is allowed to
> continue. Qemu-dm runs as a normal user-process in Dom0..."
> My question is about those Synchronous I/O events. Why can't we make them
> asynchronous? e.g. whenever I/O are done, we can interrupt HV again and
let
> HV resume I/O processing. Is there any specific limiation to force Xen
> hypervisor do I/O in synchronous mode?

Was this talking about IO port reads / writes?

The problem with IO port reads is that the guest expects the hardware to
have 
responded to an IO port read and for the result to be available as soon as 
the inb (or whatever) instruction has finished...  Therefore in a virtual 
machine, we can't return to the guest until we've figured out (by emulating 
using the device model) what that read should return.

Consecutive writes can potentially be batched, I believe, and there has been

talk of implementing that.

I don't see any reason why other VCPUs shouldn't keep running in the
meantime, 
though.

> Second,  you just mentioned there is big difference between the number of
> HV-to-domain0 events for device model and split driver model. Could you
> elaborate the details about how split driver model can reduce the
> HV-to-domain0 events compared with using qemu device model?

The PV split drivers are designed to minimise events: they'll queue up a
load 
of IO requests in a batch and then notify dom0 that the IO requests are 
ready.

In contrast, the FV device emulation can't do this: we have to consult dom0 
for the emulation of any device operations the guest does (e.g. each IO port

read the guest does) so the batching is less efficient.

Cheers,
Mark

> Have a wonderful weekend,
>
> Liang
>
> ----- Original Message -----
> From: "Petersson, Mats" <Mats.Petersson@amd.com>
> To: "Liang Yang" <multisyncfe991@hotmail.com>;
> <xen-devel@lists.xensource.com>
> Sent: Friday, March 16, 2007 10:40 AM
> Subject: RE: [Xen-devel] Does Dom0 always get interrupts first before they
> are delivered to other guest domains?
>
> > -----Original Message-----
> > From: xen-devel-bounces@lists.xensource.com
> > [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Liang Yang
> > Sent: 16 March 2007 17:30
> > To: xen-devel@lists.xensource.com
> > Subject: [Xen-devel] Does Dom0 always get interrupts first
> > before they are delivered to other guest domains?
> >
> > Hello,
> >
> > It seems if HVM domains access device using emulation mode
> > w/ device model
> > in domain0, Xen hypervisor will send the interrupt event to
> > domain0 first
> > and then the device model in domain0 will send event to HVM domains.
>
> Ok, so let's see if I've understood your question first:
> If we do a disk-read (for example), the actual disk-read operation
> itself will generate an interrupt, which goes into Xen HV where it's
> converted to an event that goes to Dom0, which in turn wakes up the
> pending call to read (in this case) that was requesting the disk IO, and
> then when the read-call is finished an event is sent to the HVM DomU. Is
> this the sequence of events that you're talking about?
>
> If that's what you are talking about, it must be done this way.
>
> > However, if I'm using split driver model and I only run BE driver on
> > domain0. Does domain0 still get the interrupt first (assume
> > this interupt is
> > not owned by the Xen hypervisor ,e.g. local APIC timer) or
> > Xen hypervisor
> > will send event directly to HVM domain bypass domain0 for
> > split driver
> > model?
>
> Not in the above type of scenario. The interrupt must go to the
> driver-domain (normally Dom0) to indicate that the hardware is ready to
> deliver the data. This will wake up the user-mode call that waited for
> the data, and then the data can be delivered to the guest domain from
> there (which in turn is awakened by the event sent from the driver
> domain).
>
> There is no difference in the number of events in these two cases.
>
> There is however a big difference in the number of hypervisor-to-dom0
> events that occur: the HVM model will require something in the order of
> 5 writes to the IDE controller to perform one disk read/write operation.
> Each of those will incur one event to wake up qemu-dm, and one event to
> wake the domu (which will most likely just to one or two instructions
> forward to hit the next write to the IDE controller).
>
> > Another question is: for interrupt delivery, does Xen treat
> > para-virtualized
> > domain differently from HVM domain considering using device
> > model and split
> > driver model?
>
> Not in interrupt delivery, no. Except for the fact that HVM domains
> obviously have full hardware interfaces for interrupt controllers etc,
> which adds a little bit of overhead (because each interrupt needs to be
> acknowledged/cancelled on the interrupt controller, for example).
>
> --
> Mats
>
> > Thanks a lot,
> >
> > Liang
> >
> >
> > _______________________________________________
> > Xen-devel mailing list
> > Xen-devel@lists.xensource.com
> > http://lists.xensource.com/xen-devel
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel

-- 
Dave: Just a question. What use is a unicyle with no seat?  And no pedals!
Mark: To answer a question with a question: What use is a skateboard?
Dave: Skateboards have wheels.
Mark: My wheel has a wheel!

^ permalink raw reply	[flat|nested] 35+ messages in thread

* RE: Does Dom0 always get interrupts first before they aredelivered to other guest domains?
@ 2007-03-21  8:31                                 ` Tian, Kevin
  0 siblings, 0 replies; 35+ messages in thread
From: Tian, Kevin @ 2007-03-21  8:31 UTC (permalink / raw)
  To: Liang Yang, Mark Williamson, xen-devel; +Cc: Petersson, Mats

>From: Liang Yang
>Sent: 21 March 2007 9:23
>
>Now my first question comes up: if I 'm running both para-virtualized and
>full-virtualized domain on single CPU (I think Xen hypervisor will set up
>the exception bitmap for CLTS instruction for HVM domain). Then Xen
>hypervisor will be confused and does not know how to handle it when
>running
>CLTS in ring 1.

Whenever the Xen hypervisor is running, there's always a current vcpu 
context, from which Xen can easily tell whether the current domain is 
para-virtualized or not.

Para-virtualized and HVM guests have different entry points for the 
above CLTS example. For a para-virtualized guest, it's the GP fault 
handler of Xen that is invoked at that point. For an HVM guest, it's the 
VM-EXIT handler that is invoked, with a detailed exit reason. While 
running inside the guest, the hardware knows whether the running 
environment has hardware virtualization assist or not, and can therefore 
decide which path to enter when the fault happens.
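
A very rough C sketch of the two paths (made-up names; the real Xen
handlers are of course far more involved):

struct vcpu { int is_hvm; unsigned long guest_cr0; };

static void emulate_clts(struct vcpu *v)
{
    v->guest_cr0 &= ~(1UL << 3);   /* clear CR0.TS on behalf of the guest */
}

/* Path for a PV guest: CLTS in ring 1 raises #GP, so Xen's GP fault
 * handler runs; 'current' is known to be a PV vcpu here. */
static void do_guest_gp_fault(struct vcpu *current_vcpu)
{
    /* decode the faulting instruction, find CLTS, emulate it */
    emulate_clts(current_vcpu);
}

/* Path for an HVM guest: the hardware raises a VMEXIT with an exit
 * reason; 'current' is known to be an HVM vcpu here. */
static void do_vmexit(struct vcpu *current_vcpu, int exit_reason)
{
    (void)exit_reason;             /* would be decoded in real code */
    emulate_clts(current_vcpu);
}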

Thanks,
Kevin

^ permalink raw reply	[flat|nested] 35+ messages in thread

* RE: Does Dom0 always get interrupts first before they are delivered to other guest domains?
@ 2007-03-21  9:13                                 ` Petersson, Mats
  0 siblings, 0 replies; 35+ messages in thread
From: Petersson, Mats @ 2007-03-21  9:13 UTC (permalink / raw)
  To: Liang Yang, Mark Williamson, xen-devel

First off: forgive me for top-posting, but I think this message should be
seen by all, and it isn't really a response to the post below anyway.

Could you (Liang Yang) please avoid sending the SAME question to both me
privately and the mailing list. It's called cross-posting and not a
"nice" thing, as I may not realize that it's been posted to two
different places. 

To everyone else: I've already answered the questions below (aside from
the bit that wasn't in the mail to me, but that's been answered by Kevin
anyway). 

--
Mats
> -----Original Message-----
> From: Liang Yang [mailto:multisyncfe991@hotmail.com] 
> Sent: 21 March 2007 01:23
> To: 'Mark Williamson'; xen-devel@lists.xensource.com
> Cc: Petersson, Mats
> Subject: RE: [Xen-devel] Does Dom0 always get interrupts 
> first before they are delivered to other guest domains?
> 
> Hi Mark,
> 
> Thanks. 
> 
> I have another question about using VT-X and Hypercall to support
> para-virtualized and full-virtualized domain simultaneously:
> 
> It seems Xen does not need to use hypercall to replace all problematic
> instructions (e.g. HLT, POPF etc.). For example, there is an 
> instruction
> called CLTS. Instead of replacing it with a hypercall, Xen 
> hypervisor will
> first delegate it to ring 0 when a GP fault occurs and then 
> run it from
> there to solve ring aliasing issue.
> (http://www.linuxjournal.com/comment/reply/8909 talked about this).
> 
> Now my first question comes up: if I 'm running both 
> para-virtualized and
> full-virtualized domain on single CPU (I think Xen hypervisor 
> will set up
> the exception bitmap for CLTS instruction for HVM domain). Then Xen
> hypervisor will be confused and does not know how to handle 
> it when running
> CLTS in ring 1. 
> 
> Does Xen hypervisor do a VM EXIT or still delegate CLTS to 
> ring 0? How does
> Xen hypervisor distinguish the instruction is from 
> para-virtualized domain
> or is from a full-virtualized domain? Does Xen have to replace all
> problematic instructions with hypercalls for Para-domain 
> (even for CLTS)?
> Why does Xen need to use different strategies in 
> para-virtualized domain to
> handle CLTS (delegation to ring 0) and other problematic instructions
> (hypercall)?
> 
> 
> My second question:
> It seems each processor has its own exception bitmap. If I have
> multi-processors (vt-x enabled), does Xen hypervisor use the 
> same exception
> bitmap in all processors or does Xen allow different 
> processor have its own
> (maybe different) exception bitmap?
> 
> Best regards,
> 
> Liang
> 
> -----Original Message-----
> From: M.A. Williamson [mailto:maw48@hermes.cam.ac.uk] On 
> Behalf Of Mark
> Williamson
> Sent: Tuesday, March 20, 2007 5:37 PM
> To: xen-devel@lists.xensource.com
> Cc: Liang Yang; Petersson, Mats
> Subject: Re: [Xen-devel] Does Dom0 always get interrupts 
> first before they
> are delivered to other guest domains?
> 
> Hi,
> 
> > First, you once gave another excellent explanation about 
> the communication
> > between HVM domain and HV (15 Feb 2007 ). Here I quote part of it
> > "...Since these IO events are synchronous in a real processor, the
> > hypervisor will wait for a "return event" before the guest 
> is allowed to
> > continue. Qemu-dm runs as a normal user-process in Dom0..."
> > My question is about those Synchronous I/O events. Why 
> can't we make them
> > asynchronous? e.g. whenever I/O are done, we can interrupt 
> HV again and
> let
> > HV resume I/O processing. Is there any specific limiation 
> to force Xen
> > hypervisor do I/O in synchronous mode?
> 
> Was this talking about IO port reads / writes?
> 
> The problem with IO port reads is that the guest expects the 
> hardware to
> have 
> responded to an IO port read and for the result to be 
> available as soon as 
> the inb (or whatever) instruction has finished...  Therefore 
> in a virtual 
> machine, we can't return to the guest until we've figured out 
> (by emulating 
> using the device model) what that read should return.
> 
> Consecutive writes can potentially be batched, I believe, and 
> there has been
> 
> talk of implementing that.
> 
> I don't see any reason why other VCPUs shouldn't keep running in the
> meantime, 
> though.
> 
> > Second,  you just mentioned there is big difference between 
> the number of
> > HV-to-domain0 events for device model and split driver 
> model. Could you
> > elaborate the details about how split driver model can reduce the
> > HV-to-domain0 events compared with using qemu device model?
> 
> The PV split drivers are designed to minimise events: they'll 
> queue up a
> load 
> of IO requests in a batch and then notify dom0 that the IO 
> requests are 
> ready.
> 
> In contrast, the FV device emulation can't do this: we have 
> to consult dom0 
> for the emulation of any device operations the guest does 
> (e.g. each IO port
> 
> read the guest does) so the batching is less efficient.
> 
> Cheers,
> Mark
> 
> > Have a wonderful weekend,
> >
> > Liang
> >
> > ----- Original Message -----
> > From: "Petersson, Mats" <Mats.Petersson@amd.com>
> > To: "Liang Yang" <multisyncfe991@hotmail.com>;
> > <xen-devel@lists.xensource.com>
> > Sent: Friday, March 16, 2007 10:40 AM
> > Subject: RE: [Xen-devel] Does Dom0 always get interrupts 
> first before they
> > are delivered to other guest domains?
> >
> > > -----Original Message-----
> > > From: xen-devel-bounces@lists.xensource.com
> > > [mailto:xen-devel-bounces@lists.xensource.com] On Behalf 
> Of Liang Yang
> > > Sent: 16 March 2007 17:30
> > > To: xen-devel@lists.xensource.com
> > > Subject: [Xen-devel] Does Dom0 always get interrupts first
> > > before they are delivered to other guest domains?
> > >
> > > Hello,
> > >
> > > It seems if HVM domains access device using emulation mode
> > > w/ device model
> > > in domain0, Xen hypervisor will send the interrupt event to
> > > domain0 first
> > > and then the device model in domain0 will send event to 
> HVM domains.
> >
> > Ok, so let's see if I've understood your question first:
> > If we do a disk-read (for example), the actual disk-read operation
> > itself will generate an interrupt, which goes into Xen HV where it's
> > converted to an event that goes to Dom0, which in turn wakes up the
> > pending call to read (in this case) that was requesting the 
> disk IO, and
> > then when the read-call is finished an event is sent to the 
> HVM DomU. Is
> > this the sequence of events that you're talking about?
> >
> > If that's what you are talking about, it must be done this way.
> >
> > > However, if I'm using split driver model and I only run 
> BE driver on
> > > domain0. Does domain0 still get the interrupt first (assume
> > > this interupt is
> > > not owned by the Xen hypervisor ,e.g. local APIC timer) or
> > > Xen hypervisor
> > > will send event directly to HVM domain bypass domain0 for
> > > split driver
> > > model?
> >
> > Not in the above type of scenario. The interrupt must go to the
> > driver-domain (normally Dom0) to indicate that the hardware 
> is ready to
> > deliver the data. This will wake up the user-mode call that 
> waited for
> > the data, and then the data can be delivered to the guest 
> domain from
> > there (which in turn is awakened by the event sent from the driver
> > domain).
> >
> > There is no difference in the number of events in these two cases.
> >
> > There is however a big difference in the number of 
> hypervisor-to-dom0
> > events that occur: the HVM model will require something in 
> the order of
> > 5 writes to the IDE controller to perform one disk 
> read/write operation.
> > Each of those will incur one event to wake up qemu-dm, and 
> one event to
> > wake the domu (which will most likely just to one or two 
> instructions
> > forward to hit the next write to the IDE controller).
> >
> > > Another question is: for interrupt delivery, does Xen treat
> > > para-virtualized
> > > domain differently from HVM domain considering using device
> > > model and split
> > > driver model?
> >
> > Not in interrupt delivery, no. Except for the fact that HVM domains
> > obviously have full hardware interfaces for interrupt 
> controllers etc,
> > which adds a little bit of overhead (because each interrupt 
> needs to be
> > acknowledged/cancelled on the interrupt controller, for example).
> >
> > --
> > Mats
> >
> > > Thanks a lot,
> > >
> > > Liang
> > >
> > >
> > > _______________________________________________
> > > Xen-devel mailing list
> > > Xen-devel@lists.xensource.com
> > > http://lists.xensource.com/xen-devel
> >
> > _______________________________________________
> > Xen-devel mailing list
> > Xen-devel@lists.xensource.com
> > http://lists.xensource.com/xen-devel
> 
> -- 
> Dave: Just a question. What use is a unicyle with no seat?  
> And no pedals!
> Mark: To answer a question with a question: What use is a skateboard?
> Dave: Skateboards have wheels.
> Mark: My wheel has a wheel!
> 
> 
> 
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* RE: Does Dom0 always get interrupts first before they are delivered to other guest domains?
@ 2007-04-07 16:59                                 ` Mark Williamson
  2007-04-12  0:20                                   ` Does Dom0 always get interrupts first before theyare " Liang Yang
  0 siblings, 1 reply; 35+ messages in thread
From: Mark Williamson @ 2007-04-07 16:59 UTC (permalink / raw)
  To: Liang Yang; +Cc: 'Petersson, Mats', xen-devel

> I have another question about using VT-X and Hypercall to support
> para-virtualized and full-virtualized domain simultaneously:

Sure, sorry for the delay...

> It seems Xen does not need to use hypercall to replace all problematic
> instructions (e.g. HLT, POPF etc.). For example, there is an instruction
> called CLTS. Instead of replacing it with a hypercall, Xen hypervisor will
> first delegate it to ring 0 when a GP fault occurs and then run it from
> there to solve ring aliasing issue.
> (http://www.linuxjournal.com/comment/reply/8909 talked about this).
> 

If instructions are trappable then Xen can catch their execution and
emulate them - it sometimes does this, even for paravirt guests.  Since
a GPF occurs, it's possible to catch the CLTS instruction.  Some
instructions fail silently when run outside ring 0, which is one case
where a hypercall is more important (broadly speaking, the other cases
for using hypercalls being performance and improved manageability).
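
For a feel of the difference: CLTS traps with a GPF and so can be fixed up
behind the guest's back, whereas something like POPF restoring the interrupt
flag just silently does nothing in ring 1, so a PV kernel has to be changed
to not rely on it.  A sketch of the usual paravirtual answer (simplified,
invented struct name - the real interface is the per-vcpu info page shared
with Xen) is to keep a virtual interrupt-disable flag in shared memory:

#include <stdint.h>

/* Stand-in for the per-vcpu info Xen shares with a PV guest. */
struct shared_vcpu_info { uint8_t upcall_mask; };

static struct shared_vcpu_info vcpu0;

/* PV replacements for cli/sti: flip a flag Xen checks before delivering
 * virtual interrupts, instead of executing a privileged (or
 * silently-failing) instruction. */
static void pv_local_irq_disable(void) { vcpu0.upcall_mask = 1; }
static void pv_local_irq_enable(void)  { vcpu0.upcall_mask = 0; }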

> Now my first question comes up: if I 'm running both para-virtualized and
> full-virtualized domain on single CPU (I think Xen hypervisor will set up
> the exception bitmap for CLTS instruction for HVM domain). Then Xen
> hypervisor will be confused and does not know how to handle it when running
> CLTS in ring 1. 

It'll know which form of handling is required because it changes the
necessary data structures when context switching between the two
domains.

The other stuff is a bit too specific in HVM-land for me to answer
fully, but I vaguely remember Mats having already responded.

Cheers,
Mark

> Does Xen hypervisor do a VM EXIT or still delegate CLTS to ring 0? How does
> Xen hypervisor distinguish the instruction is from para-virtualized domain
> or is from a full-virtualized domain? Does Xen have to replace all
> problematic instructions with hypercalls for Para-domain (even for CLTS)?
> Why does Xen need to use different strategies in para-virtualized domain to
> handle CLTS (delegation to ring 0) and other problematic instructions
> (hypercall)?
> 
> 
> My second question:
> It seems each processor has its own exception bitmap. If I have
> multi-processors (vt-x enabled), does Xen hypervisor use the same exception
> bitmap in all processors or does Xen allow different processor have its own
> (maybe different) exception bitmap?
> 
> Best regards,
> 
> Liang
> 
> -----Original Message-----
> From: M.A. Williamson [mailto:maw48@hermes.cam.ac.uk] On Behalf Of Mark
> Williamson
> Sent: Tuesday, March 20, 2007 5:37 PM
> To: xen-devel@lists.xensource.com
> Cc: Liang Yang; Petersson, Mats
> Subject: Re: [Xen-devel] Does Dom0 always get interrupts first before they
> are delivered to other guest domains?
> 
> Hi,
> 
> > First, you once gave another excellent explanation about the communication
> > between HVM domain and HV (15 Feb 2007 ). Here I quote part of it
> > "...Since these IO events are synchronous in a real processor, the
> > hypervisor will wait for a "return event" before the guest is allowed to
> > continue. Qemu-dm runs as a normal user-process in Dom0..."
> > My question is about those Synchronous I/O events. Why can't we make them
> > asynchronous? e.g. whenever I/O are done, we can interrupt HV again and
> let
> > HV resume I/O processing. Is there any specific limiation to force Xen
> > hypervisor do I/O in synchronous mode?
> 
> Was this talking about IO port reads / writes?
> 
> The problem with IO port reads is that the guest expects the hardware to
> have 
> responded to an IO port read and for the result to be available as soon as 
> the inb (or whatever) instruction has finished...  Therefore in a virtual 
> machine, we can't return to the guest until we've figured out (by emulating 
> using the device model) what that read should return.
> 
> Consecutive writes can potentially be batched, I believe, and there has been
> 
> talk of implementing that.
> 
> I don't see any reason why other VCPUs shouldn't keep running in the
> meantime, 
> though.
> 
> > Second,  you just mentioned there is big difference between the number of
> > HV-to-domain0 events for device model and split driver model. Could you
> > elaborate the details about how split driver model can reduce the
> > HV-to-domain0 events compared with using qemu device model?
> 
> The PV split drivers are designed to minimise events: they'll queue up a
> load 
> of IO requests in a batch and then notify dom0 that the IO requests are 
> ready.
> 
> In contrast, the FV device emulation can't do this: we have to consult dom0 
> for the emulation of any device operations the guest does (e.g. each IO port
> 
> read the guest does) so the batching is less efficient.
> 
> Cheers,
> Mark
> 
> > Have a wonderful weekend,
> >
> > Liang
> >
> > ----- Original Message -----
> > From: "Petersson, Mats" <Mats.Petersson@amd.com>
> > To: "Liang Yang" <multisyncfe991@hotmail.com>;
> > <xen-devel@lists.xensource.com>
> > Sent: Friday, March 16, 2007 10:40 AM
> > Subject: RE: [Xen-devel] Does Dom0 always get interrupts first before they
> > are delivered to other guest domains?
> >
> > > -----Original Message-----
> > > From: xen-devel-bounces@lists.xensource.com
> > > [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Liang Yang
> > > Sent: 16 March 2007 17:30
> > > To: xen-devel@lists.xensource.com
> > > Subject: [Xen-devel] Does Dom0 always get interrupts first
> > > before they are delivered to other guest domains?
> > >
> > > Hello,
> > >
> > > It seems if HVM domains access device using emulation mode
> > > w/ device model
> > > in domain0, Xen hypervisor will send the interrupt event to
> > > domain0 first
> > > and then the device model in domain0 will send event to HVM domains.
> >
> > Ok, so let's see if I've understood your question first:
> > If we do a disk-read (for example), the actual disk-read operation
> > itself will generate an interrupt, which goes into Xen HV where it's
> > converted to an event that goes to Dom0, which in turn wakes up the
> > pending call to read (in this case) that was requesting the disk IO, and
> > then when the read-call is finished an event is sent to the HVM DomU. Is
> > this the sequence of events that you're talking about?
> >
> > If that's what you are talking about, it must be done this way.
> >
> > > However, if I'm using split driver model and I only run BE driver on
> > > domain0. Does domain0 still get the interrupt first (assume
> > > this interupt is
> > > not owned by the Xen hypervisor ,e.g. local APIC timer) or
> > > Xen hypervisor
> > > will send event directly to HVM domain bypass domain0 for
> > > split driver
> > > model?
> >
> > Not in the above type of scenario. The interrupt must go to the
> > driver-domain (normally Dom0) to indicate that the hardware is ready to
> > deliver the data. This will wake up the user-mode call that waited for
> > the data, and then the data can be delivered to the guest domain from
> > there (which in turn is awakened by the event sent from the driver
> > domain).
> >
> > There is no difference in the number of events in these two cases.
> >
> > There is however a big difference in the number of hypervisor-to-dom0
> > events that occur: the HVM model will require something in the order of
> > 5 writes to the IDE controller to perform one disk read/write operation.
> > Each of those will incur one event to wake up qemu-dm, and one event to
> > wake the domu (which will most likely just to one or two instructions
> > forward to hit the next write to the IDE controller).
> >
> > > Another question is: for interrupt delivery, does Xen treat
> > > para-virtualized
> > > domain differently from HVM domain considering using device
> > > model and split
> > > driver model?
> >
> > Not in interrupt delivery, no. Except for the fact that HVM domains
> > obviously have full hardware interfaces for interrupt controllers etc,
> > which adds a little bit of overhead (because each interrupt needs to be
> > acknowledged/cancelled on the interrupt controller, for example).
> >
> > --
> > Mats
> >
> > > Thanks a lot,
> > >
> > > Liang
> > >
> > >
> > > _______________________________________________
> > > Xen-devel mailing list
> > > Xen-devel@lists.xensource.com
> > > http://lists.xensource.com/xen-devel
> >
> > _______________________________________________
> > Xen-devel mailing list
> > Xen-devel@lists.xensource.com
> > http://lists.xensource.com/xen-devel
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Does Dom0 always get interrupts first before theyare delivered to other guest domains?
  2007-04-07 16:59                                 ` Mark Williamson
@ 2007-04-12  0:20                                   ` Liang Yang
  2007-04-12 14:00                                     ` Petersson, Mats
  0 siblings, 1 reply; 35+ messages in thread
From: Liang Yang @ 2007-04-12  0:20 UTC (permalink / raw)
  To: Mark Williamson; +Cc: xen-devel

Hi Mark,

Thanks for your reply. I still have questions about the switch overhead 
between rings. It seems the HW support of VT-x is not as efficient as 
expected, as there are many conditions to check on each vmexit and 
vm-reentry. But I don't know how to quantify the overhead comparison 
between a VT-x based context switch and a hypercall based context switch.

If I consider just the pure context switch overhead, which one is bigger: 
using HW vmexit/vmentry to do the root/non-root mode switch by programming 
the VT-x vector, or using a SW hypercall to inject an interrupt and switch 
from ring 1 to ring 0 (or ring 3 to ring 0 for a 64-bit OS)? Does the 
switch between ring 1 and ring 0 have the same overhead as the switch 
between ring 3 and ring 0?

BTW, both root and non-root mode have four rings. If ring 0 and ring 3 in 
non-root mode are used for the guest OS kernel and user applications, which 
ring level in root mode will be used when a vmexit happens? Can I jump from 
ring 3 in non-root mode directly to ring 0 in root mode?

Thanks,

Liang

----- Original Message ----- 
From: "Mark Williamson" <mark.williamson@cl.cam.ac.uk>
To: "Liang Yang" <multisyncfe991@hotmail.com>
Cc: <xen-devel@lists.xensource.com>; "'Petersson, Mats'" 
<Mats.Petersson@amd.com>
Sent: Saturday, April 07, 2007 9:59 AM
Subject: RE: [Xen-devel] Does Dom0 always get interrupts first before 
theyare delivered to other guest domains?


>> I have another question about using VT-X and Hypercall to support
>> para-virtualized and full-virtualized domain simultaneously:
>
> Sure, sorry for the delay...
>
>> It seems Xen does not need to use hypercall to replace all problematic
>> instructions (e.g. HLT, POPF etc.). For example, there is an instruction
>> called CLTS. Instead of replacing it with a hypercall, Xen hypervisor 
>> will
>> first delegate it to ring 0 when a GP fault occurs and then run it from
>> there to solve ring aliasing issue.
>> (http://www.linuxjournal.com/comment/reply/8909 talked about this).
>>
>
> If instructions are trappable then Xen can catch their execution and
> emulate them - it sometimes does this, even for paravirt guests.  Since
> a GPF occurs it's possible to catch the CLTS instruction.  Some
> instructions fail silently when run outside ring 0, which is one cas
> ewhere a hypercall is more important (broadly speaking, the other cases
> for using hypercalls being performance and improved manageability).
>
>> Now my first question comes up: if I 'm running both para-virtualized and
>> full-virtualized domain on single CPU (I think Xen hypervisor will set up
>> the exception bitmap for CLTS instruction for HVM domain). Then Xen
>> hypervisor will be confused and does not know how to handle it when 
>> running
>> CLTS in ring 1.
>
> It'll know which form of handling is required because it changes the
> necessary data structures when context switching between the two
> domains.
>
> The other stuff is a bit too specific in HVM-land for me to answer
> fully, but I vaguely remember Mats having already responded.
>
> Cheers,
> Mark
>
>> Does Xen hypervisor do a VM EXIT or still delegate CLTS to ring 0? How 
>> does
>> Xen hypervisor distinguish the instruction is from para-virtualized 
>> domain
>> or is from a full-virtualized domain? Does Xen have to replace all
>> problematic instructions with hypercalls for Para-domain (even for CLTS)?
>> Why does Xen need to use different strategies in para-virtualized domain 
>> to
>> handle CLTS (delegation to ring 0) and other problematic instructions
>> (hypercall)?
>>
>>
>> My second question:
>> It seems each processor has its own exception bitmap. If I have
>> multi-processors (vt-x enabled), does Xen hypervisor use the same 
>> exception
>> bitmap in all processors or does Xen allow different processor have its 
>> own
>> (maybe different) exception bitmap?
>>
>> Best regards,
>>
>> Liang
>>
>> -----Original Message-----
>> From: M.A. Williamson [mailto:maw48@hermes.cam.ac.uk] On Behalf Of Mark
>> Williamson
>> Sent: Tuesday, March 20, 2007 5:37 PM
>> To: xen-devel@lists.xensource.com
>> Cc: Liang Yang; Petersson, Mats
>> Subject: Re: [Xen-devel] Does Dom0 always get interrupts first before 
>> they
>> are delivered to other guest domains?
>>
>> Hi,
>>
>> > First, you once gave another excellent explanation about the 
>> > communication
>> > between HVM domain and HV (15 Feb 2007 ). Here I quote part of it
>> > "...Since these IO events are synchronous in a real processor, the
>> > hypervisor will wait for a "return event" before the guest is allowed 
>> > to
>> > continue. Qemu-dm runs as a normal user-process in Dom0..."
>> > My question is about those Synchronous I/O events. Why can't we make 
>> > them
>> > asynchronous? e.g. whenever I/O are done, we can interrupt HV again and
>> let
>> > HV resume I/O processing. Is there any specific limiation to force Xen
>> > hypervisor do I/O in synchronous mode?
>>
>> Was this talking about IO port reads / writes?
>>
>> The problem with IO port reads is that the guest expects the hardware to
>> have
>> responded to an IO port read and for the result to be available as soon 
>> as
>> the inb (or whatever) instruction has finished...  Therefore in a virtual
>> machine, we can't return to the guest until we've figured out (by 
>> emulating
>> using the device model) what that read should return.
>>
>> Consecutive writes can potentially be batched, I believe, and there has 
>> been
>>
>> talk of implementing that.
>>
>> I don't see any reason why other VCPUs shouldn't keep running in the
>> meantime,
>> though.
>>
>> > Second,  you just mentioned there is big difference between the number 
>> > of
>> > HV-to-domain0 events for device model and split driver model. Could you
>> > elaborate the details about how split driver model can reduce the
>> > HV-to-domain0 events compared with using qemu device model?
>>
>> The PV split drivers are designed to minimise events: they'll queue up a
>> load
>> of IO requests in a batch and then notify dom0 that the IO requests are
>> ready.
>>
>> In contrast, the FV device emulation can't do this: we have to consult 
>> dom0
>> for the emulation of any device operations the guest does (e.g. each IO 
>> port
>>
>> read the guest does) so the batching is less efficient.
>>
>> Cheers,
>> Mark
>>
>> > Have a wonderful weekend,
>> >
>> > Liang
>> >
>> > ----- Original Message -----
>> > From: "Petersson, Mats" <Mats.Petersson@amd.com>
>> > To: "Liang Yang" <multisyncfe991@hotmail.com>;
>> > <xen-devel@lists.xensource.com>
>> > Sent: Friday, March 16, 2007 10:40 AM
>> > Subject: RE: [Xen-devel] Does Dom0 always get interrupts first before 
>> > they
>> > are delivered to other guest domains?
>> >
>> > > -----Original Message-----
>> > > From: xen-devel-bounces@lists.xensource.com
>> > > [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Liang 
>> > > Yang
>> > > Sent: 16 March 2007 17:30
>> > > To: xen-devel@lists.xensource.com
>> > > Subject: [Xen-devel] Does Dom0 always get interrupts first
>> > > before they are delivered to other guest domains?
>> > >
>> > > Hello,
>> > >
>> > > It seems if HVM domains access device using emulation mode
>> > > w/ device model
>> > > in domain0, Xen hypervisor will send the interrupt event to
>> > > domain0 first
>> > > and then the device model in domain0 will send event to HVM domains.
>> >
>> > Ok, so let's see if I've understood your question first:
>> > If we do a disk-read (for example), the actual disk-read operation
>> > itself will generate an interrupt, which goes into Xen HV where it's
>> > converted to an event that goes to Dom0, which in turn wakes up the
>> > pending call to read (in this case) that was requesting the disk IO, 
>> > and
>> > then when the read-call is finished an event is sent to the HVM DomU. 
>> > Is
>> > this the sequence of events that you're talking about?
>> >
>> > If that's what you are talking about, it must be done this way.
>> >
>> > > However, if I'm using split driver model and I only run BE driver on
>> > > domain0. Does domain0 still get the interrupt first (assume
>> > > this interupt is
>> > > not owned by the Xen hypervisor ,e.g. local APIC timer) or
>> > > Xen hypervisor
>> > > will send event directly to HVM domain bypass domain0 for
>> > > split driver
>> > > model?
>> >
>> > Not in the above type of scenario. The interrupt must go to the
>> > driver-domain (normally Dom0) to indicate that the hardware is ready to
>> > deliver the data. This will wake up the user-mode call that waited for
>> > the data, and then the data can be delivered to the guest domain from
>> > there (which in turn is awakened by the event sent from the driver
>> > domain).
>> >
>> > There is no difference in the number of events in these two cases.
>> >
>> > There is however a big difference in the number of hypervisor-to-dom0
>> > events that occur: the HVM model will require something in the order of
>> > 5 writes to the IDE controller to perform one disk read/write 
>> > operation.
>> > Each of those will incur one event to wake up qemu-dm, and one event to
>> > wake the domu (which will most likely just to one or two instructions
>> > forward to hit the next write to the IDE controller).
>> >
>> > > Another question is: for interrupt delivery, does Xen treat
>> > > para-virtualized
>> > > domain differently from HVM domain considering using device
>> > > model and split
>> > > driver model?
>> >
>> > Not in interrupt delivery, no. Except for the fact that HVM domains
>> > obviously have full hardware interfaces for interrupt controllers etc,
>> > which adds a little bit of overhead (because each interrupt needs to be
>> > acknowledged/cancelled on the interrupt controller, for example).
>> >
>> > --
>> > Mats
>> >
>> > > Thanks a lot,
>> > >
>> > > Liang
>> > >
>> > >
>> > > _______________________________________________
>> > > Xen-devel mailing list
>> > > Xen-devel@lists.xensource.com
>> > > http://lists.xensource.com/xen-devel
>> >
>> > _______________________________________________
>> > Xen-devel mailing list
>> > Xen-devel@lists.xensource.com
>> > http://lists.xensource.com/xen-devel
>>
>
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* RE: Does Dom0 always get interrupts first before theyare delivered to other guest domains?
  2007-04-12  0:20                                   ` Does Dom0 always get interrupts first before theyare " Liang Yang
@ 2007-04-12 14:00                                     ` Petersson, Mats
  2007-04-12 20:15                                       ` Does Dom0 always get interrupts first beforetheyare " Liang Yang
  0 siblings, 1 reply; 35+ messages in thread
From: Petersson, Mats @ 2007-04-12 14:00 UTC (permalink / raw)
  To: Liang Yang, Mark Williamson; +Cc: xen-devel

 

> -----Original Message-----
> From: xen-devel-bounces@lists.xensource.com 
> [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Liang Yang
> Sent: 12 April 2007 01:21
> To: Mark Williamson
> Cc: xen-devel@lists.xensource.com
> Subject: Re: [Xen-devel] Does Dom0 always get interrupts 
> first before theyare delivered to other guest domains?
> 
> Hi Mark,

I'm not Mark, but I'll try to give some answers... 
> 
> Thanks for your reply. I still have questions about the 
> switch overhead 
> between rings. It seems HW support of VT-x is not as 
> efficient as expected 
> as there are too many conditions to check for each vmexit and 
> vm-reentry. 
> But I don't know how to quantify the overhead comparison of 
> vt-x based 
> context switch and
> hypercall based context switch.

The HVM context switch will be longer. How much longer depends on so
many factors that it's probably easier to measure the difference (in
some way) than to try to guesstimate it by reading documentation or
anything of that sort. 
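
If you do want a number, one crude way (run inside the HVM guest; it assumes
the guest's RDTSC isn't itself intercepted or offset, so treat the result as
a rough indication only) is to time an instruction that always causes a
VMEXIT, such as CPUID:

#include <stdint.h>
#include <stdio.h>

static inline uint64_t rdtsc(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
}

static inline void force_vmexit(void)
{
    uint32_t eax = 0, ebx, ecx, edx;
    __asm__ __volatile__("cpuid"
                         : "+a"(eax), "=b"(ebx), "=c"(ecx), "=d"(edx));
}

int main(void)
{
    uint64_t best = ~0ULL;
    for (int i = 0; i < 100000; i++) {
        uint64_t t0 = rdtsc();
        force_vmexit();              /* CPUID exits unconditionally on VT-x */
        uint64_t t1 = rdtsc();
        if (t1 - t0 < best)
            best = t1 - t0;          /* best case ~= exit + handle + reentry */
    }
    printf("~%llu cycles round trip (including CPUID itself)\n",
           (unsigned long long)best);
    return 0;
}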

The reason that HVM (AMD-V/Intel VT-x) isn't as "good" as the
para-virtual case has very little to do with interrupt handling, I
should think (unless you're doing something very peculiar in your
guest), but much more to do with how the guest hardware accesses are
performed. For example, an interrupt that leads back to the guest will
most likely lead to several DomU VMEXITs just in the interrupt handler.
Take an IDE interrupt that indicates to dom0 that a sector
requested by an HVM DomU is ready:

Assuming that the HVM guest is currently running, the following is the
set of events:
1. VMEXIT for disk-related IRQ in real hardware. Hypervisor forwards the
IRQ 14 to Dom0 (actually, there's nothing that the hypervisor actually
needs to do here, but the guest needs to exit so that Dom0 can run, and
of course, eventually the guest will have to be restarted)
2. QEMU receives the data from the read() function requesting the
disk-data for DomU. Once the data is in QEMU, QEMU will signal the IRQ
to guest. 
3. Guest is restarted with Virtual IRQ pending. 
4. Guest takes the interrupt (assuming the interrupt mask and the EFLAGS
interrupt enable flag allow interrupts to be taken). Processor looks up the IDT
entry for the corresponding IRQ and jumps to the location indicated. 
5. IRQ handler checks the status of the IDE controller -> VMEXIT IOIO. 
6. VMEXIT IOIO leads to QEMU-operation -> Dom0 needs to run -> QEMU
signals the result back to guest, guest is restarted.
7. IRQ handler retrieves the data [a] -> VMEXIT IOIO.
8. VMEXIT IOIO for the IO read/write of the data [if the driver uses
INS/OUTS this is a single VMEXIT IOIO; if it's a "stupid" driver using
individual IN/OUT instructions, it will take 256 VMEXITs (16 bits per
transfer)]. Again, this leads to QEMU/Dom0 being scheduled and an event back
to the guest when done.
9. IRQ handler acknowledges the interrupt -> VMEXIT IOIO. [b]
10. VMEXIT IOIO/MMIO (page fault) due to the access to the interrupt
controller. This time we just perform the relevant [A]PIC management inside
the hypervisor, as it has models for the interrupt controllers (8259 and
APIC). The guest is restarted when the interrupt controller access is
finished.
11. Done. 

That is four VMEXIT operations for one disk interrupt (more if the driver
transfers the data with individual IN/OUT instructions).

[a] The IRQ handler itself may not actually retrieve the data, but some
thread/process that is awakened by the IRQ handler - this is not really
important for the discussion or for the number of VMEXITs, but it will of
course have some impact on the interrupt latency, as interrupts are
disabled within the IRQ handler but not in the worker thread that reads
the data. The exact order of the events described above is also
different, but the net number of VMEXITs is unchanged.
[b] On an "old-style" PC, there will be 2 interrupt-acknowledge IO
operations, because the IDE controller is wired to IRQ 14/15, which is on
the second 8259 PIC, which means that both the master and the slave
need an ACK operation.
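
To make the sequence above concrete, here is a rough guest-side sketch of
the port accesses involved, assuming the legacy primary-IDE ports
(0x1F0-0x1F7) and the 8259 EOI ports; it only illustrates where the VMEXIT
IOIO traps occur and is not actual driver or Xen code:

#include <stdint.h>

static inline uint8_t inb(uint16_t port)
{
    uint8_t v;
    __asm__ volatile ("inb %1, %0" : "=a"(v) : "Nd"(port));
    return v;
}

static inline void outb(uint16_t port, uint8_t v)
{
    __asm__ volatile ("outb %0, %1" : : "a"(v), "Nd"(port));
}

static inline void insw(uint16_t port, void *buf, unsigned long count)
{
    __asm__ volatile ("rep insw"
                      : "+D"(buf), "+c"(count) : "d"(port) : "memory");
}

void ide_primary_irq_handler(void)
{
    uint16_t sector[256];                /* one 512-byte sector */

    /* Step 5: read the IDE status register -> VMEXIT IOIO, qemu-dm runs. */
    uint8_t status = inb(0x1F7);

    if (status & 0x08) {                 /* DRQ: data is ready */
        /* Steps 7/8: one REP INSW -> a single VMEXIT IOIO; a driver doing
         * 256 separate 16-bit IN instructions would take 256 exits. */
        insw(0x1F0, sector, 256);
    }

    /* Steps 9/10: acknowledge the interrupt on both PICs (the IDE IRQ is
     * on the slave) -> further IOIO exits, handled by the hypervisor's
     * built-in PIC model. */
    outb(0xA0, 0x20);                    /* EOI to slave PIC  */
    outb(0x20, 0x20);                    /* EOI to master PIC */
}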

> 
> If I just consider the pure context-switch overhead, which one is
> bigger: using the HW vmexit/vmentry to do the root/non-root mode switch
> by programming the VT-x vectors, or using a SW hypercall to inject an
> interrupt and switch from ring 1 to ring 0 (or ring 3 to ring 0 for a
> 64-bit OS)? Does the switch between ring 1 and ring 0 have the same
> overhead as the switch between ring 3 and ring 0?

Ring-switch has the same overhead regardless of which rings the switch
is between [at least in the sense that the processor does exactly the
same thing when switching from ring 2 to 1 or ring 3 to 0 - the exact
time it takes to switch rings is harder to determine, because it depends
on alignments, cache hit/miss rates, and various other things]. 
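
For comparison with the HVM exit cost, the plain ring 3 -> ring 0 -> ring 3
round trip can be estimated the same way by timing a trivial system call in
a loop - a sketch, with the caveat that it measures the kernel's whole
entry/exit path, not just the bare ring switch:

#define _GNU_SOURCE
#include <stdint.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

static inline uint64_t rdtsc(void)
{
    uint32_t lo, hi;
    __asm__ volatile ("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
}

int main(void)
{
    enum { N = 100000 };
    uint64_t start = rdtsc();
    for (int i = 0; i < N; i++)
        syscall(SYS_getpid);   /* force a real trap into the kernel */
    uint64_t end = rdtsc();
    printf("~%llu cycles per null syscall round trip\n",
           (unsigned long long)((end - start) / N));
    return 0;
}
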
> 
> BTW, both root and non-root mode have four rings. If ring 0 and ring 3
> in non-root mode are used for the guest OS kernel and user
> applications, which ring level in root mode will be used when a vmexit
> happens?

The VMEXIT will end up in ring 0 in the hypervisor [on AMD processors,
VMEXIT "returns" to the instruction after VMRUN, whilst on Intel
processors the exit address is taken from a dedicated field set up by the
hypervisor beforehand; either way the processor essentially returns to a
state that is identical to the one prior to the VMRUN/VMLAUNCH/VMRESUME
instruction that got the guest code running in the first place - very
similar to a call instruction].
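
Structurally, the AMD-V side of this looks roughly like the sketch below: a
loop that executes VMRUN and then, back in ring 0, dispatches on the exit
code recorded in the VMCB. The exit-code values are from the AMD SVM
documentation, but the struct layout and the handlers are placeholders
rather than Xen's actual code:

#include <stdint.h>

struct vmcb {
    uint64_t exitcode;       /* why the guest exited (IOIO, INTR, NPF, ...) */
    uint64_t exitinfo1;      /* exit-specific detail, e.g. the IO port      */
    /* ... guest register state, intercept bitmaps, etc. ...                */
};

#define VMEXIT_INTR  0x60    /* physical interrupt arrived                  */
#define VMEXIT_IOIO  0x7b    /* intercepted IN/OUT/INS/OUTS                 */

static inline void vmrun(uint64_t vmcb_pa)
{
    /* VMRUN takes the VMCB's physical address in rAX; when the guest hits
     * an intercept, control falls through to the instruction after VMRUN,
     * still in ring 0 of the hypervisor. */
    __asm__ volatile ("vmrun" : : "a"(vmcb_pa) : "memory");
}

void run_guest(struct vmcb *vmcb, uint64_t vmcb_pa)
{
    for (;;) {
        vmrun(vmcb_pa);                /* enter the guest (non-root mode)   */

        switch (vmcb->exitcode) {      /* back in the hypervisor, ring 0    */
        case VMEXIT_IOIO:
            /* forward the port access to the device model (qemu-dm) ...   */
            break;
        case VMEXIT_INTR:
            /* a real interrupt arrived; let Xen/dom0 handle it ...        */
            break;
        default:
            /* other intercepts: CR accesses, HLT, nested page faults, ... */
            break;
        }
    }
}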

> Can I jump from 
> ring 3 in non-root mode directly to ring 0 in root mode?

Yes, that's perfectly possible (in fact, it's most likely what ALWAYS
happens). 

--
Mats
> 
> Thanks,
> 
> Liang
> 
> ----- Original Message ----- 
> From: "Mark Williamson" <mark.williamson@cl.cam.ac.uk>
> To: "Liang Yang" <multisyncfe991@hotmail.com>
> Cc: <xen-devel@lists.xensource.com>; "'Petersson, Mats'" 
> <Mats.Petersson@amd.com>
> Sent: Saturday, April 07, 2007 9:59 AM
> Subject: RE: [Xen-devel] Does Dom0 always get interrupts first before 
> theyare delivered to other guest domains?
> 
> 
> >> I have another question about using VT-X and Hypercall to support
> >> para-virtualized and full-virtualized domain simultaneously:
> >
> > Sure, sorry for the delay...
> >
> >> It seems Xen does not need to use hypercall to replace all 
> problematic
> >> instructions (e.g. HLT, POPF etc.). For example, there is 
> an instruction
> >> called CLTS. Instead of replacing it with a hypercall, Xen 
> hypervisor 
> >> will
> >> first delegate it to ring 0 when a GP fault occurs and 
> then run it from
> >> there to solve ring aliasing issue.
> >> (http://www.linuxjournal.com/comment/reply/8909 talked about this).
> >>
> >
> > If instructions are trappable then Xen can catch their execution and
> > emulate them - it sometimes does this, even for paravirt 
> guests.  Since
> > a GPF occurs it's possible to catch the CLTS instruction.  Some
> > instructions fail silently when run outside ring 0, which is one case
> > where a hypercall is more important (broadly speaking, the 
> other cases
> > for using hypercalls being performance and improved manageability).
> >
> >> Now my first question comes up: if I'm running both 
> para-virtualized and
> >> full-virtualized domain on single CPU (I think Xen 
> hypervisor will set up
> >> the exception bitmap for CLTS instruction for HVM domain). Then Xen
> >> hypervisor will be confused and does not know how to 
> handle it when 
> >> running
> >> CLTS in ring 1.
> >
> > It'll know which form of handling is required because it changes the
> > necessary data structures when context switching between the two
> > domains.
> >
> > The other stuff is a bit too specific in HVM-land for me to answer
> > fully, but I vaguely remember Mats having already responded.
> >
> > Cheers,
> > Mark
> >
> >> Does Xen hypervisor do a VM EXIT or still delegate CLTS to 
> ring 0? How 
> >> does
> >> Xen hypervisor distinguish the instruction is from 
> para-virtualized 
> >> domain
> >> or is from a full-virtualized domain? Does Xen have to replace all
> >> problematic instructions with hypercalls for Para-domain 
> (even for CLTS)?
> >> Why does Xen need to use different strategies in 
> para-virtualized domain 
> >> to
> >> handle CLTS (delegation to ring 0) and other problematic 
> instructions
> >> (hypercall)?
> >>
> >>
> >> My second question:
> >> It seems each processor has its own exception bitmap. If I have
> >> multi-processors (vt-x enabled), does Xen hypervisor use the same 
> >> exception
> >> bitmap in all processors, or does Xen allow each processor to have
> >> its own (maybe different) exception bitmap?
> >>
> >> Best regards,
> >>
> >> Liang
> >>
> >> -----Original Message-----
> >> From: M.A. Williamson [mailto:maw48@hermes.cam.ac.uk] On 
> Behalf Of Mark
> >> Williamson
> >> Sent: Tuesday, March 20, 2007 5:37 PM
> >> To: xen-devel@lists.xensource.com
> >> Cc: Liang Yang; Petersson, Mats
> >> Subject: Re: [Xen-devel] Does Dom0 always get interrupts 
> first before 
> >> they
> >> are delivered to other guest domains?
> >>
> >> Hi,
> >>
> >> > First, you once gave another excellent explanation about the 
> >> > communication
> >> > between HVM domain and HV (15 Feb 2007 ). Here I quote part of it
> >> > "...Since these IO events are synchronous in a real 
> processor, the
> >> > hypervisor will wait for a "return event" before the 
> guest is allowed 
> >> > to
> >> > continue. Qemu-dm runs as a normal user-process in Dom0..."
> >> > My question is about those Synchronous I/O events. Why 
> can't we make 
> >> > them
> >> > asynchronous? e.g. whenever the I/O is done, we can 
> interrupt HV again and
> >> let
> >> > HV resume I/O processing. Is there any specific 
> >> > limitation that forces the Xen
> >> > hypervisor to do I/O in synchronous mode?
> >>
> >> Was this talking about IO port reads / writes?
> >>
> >> The problem with IO port reads is that the guest expects 
> the hardware to
> >> have
> >> responded to an IO port read and for the result to be 
> available as soon 
> >> as
> >> the inb (or whatever) instruction has finished...  
> Therefore in a virtual
> >> machine, we can't return to the guest until we've figured out (by 
> >> emulating
> >> using the device model) what that read should return.
> >>
> >> Consecutive writes can potentially be batched, I believe, and there
> >> has been talk of implementing that.
> >>
> >> I don't see any reason why other VCPUs shouldn't keep 
> running in the
> >> meantime,
> >> though.
> >>
> >> > Second, you just mentioned there is a big difference 
> between the number 
> >> > of
> >> > HV-to-domain0 events for device model and split driver 
> model. Could you
> >> > elaborate the details about how split driver model can reduce the
> >> > HV-to-domain0 events compared with using qemu device model?
> >>
> >> The PV split drivers are designed to minimise events: they'll queue
> >> up a load of IO requests in a batch and then notify dom0 that the IO
> >> requests are ready.
> >>
> >> In contrast, the FV device emulation can't do this: we have to
> >> consult dom0 for the emulation of any device operations the guest
> >> does (e.g. each IO port read the guest does), so the batching is
> >> less efficient.
> >>
> >> Cheers,
> >> Mark
> >>
> >> > Have a wonderful weekend,
> >> >
> >> > Liang
> >> >
> >> > ----- Original Message -----
> >> > From: "Petersson, Mats" <Mats.Petersson@amd.com>
> >> > To: "Liang Yang" <multisyncfe991@hotmail.com>;
> >> > <xen-devel@lists.xensource.com>
> >> > Sent: Friday, March 16, 2007 10:40 AM
> >> > Subject: RE: [Xen-devel] Does Dom0 always get interrupts 
> first before 
> >> > they
> >> > are delivered to other guest domains?
> >> >
> >> > > -----Original Message-----
> >> > > From: xen-devel-bounces@lists.xensource.com
> >> > > [mailto:xen-devel-bounces@lists.xensource.com] On 
> Behalf Of Liang 
> >> > > Yang
> >> > > Sent: 16 March 2007 17:30
> >> > > To: xen-devel@lists.xensource.com
> >> > > Subject: [Xen-devel] Does Dom0 always get interrupts first
> >> > > before they are delivered to other guest domains?
> >> > >
> >> > > Hello,
> >> > >
> >> > > It seems if HVM domains access device using emulation mode
> >> > > w/ device model
> >> > > in domain0, Xen hypervisor will send the interrupt event to
> >> > > domain0 first
> >> > > and then the device model in domain0 will send event 
> to HVM domains.
> >> >
> >> > Ok, so let's see if I've understood your question first:
> >> > If we do a disk-read (for example), the actual disk-read 
> operation
> >> > itself will generate an interrupt, which goes into Xen 
> HV where it's
> >> > converted to an event that goes to Dom0, which in turn 
> wakes up the
> >> > pending call to read (in this case) that was requesting 
> the disk IO, 
> >> > and
> >> > then when the read-call is finished an event is sent to 
> the HVM DomU. 
> >> > Is
> >> > this the sequence of events that you're talking about?
> >> >
> >> > If that's what you are talking about, it must be done this way.
> >> >
> >> > > However, if I'm using split driver model and I only 
> run BE driver on
> >> > > domain0. Does domain0 still get the interrupt first (assume
> >> > > this interrupt is not owned by the Xen hypervisor, e.g. the
> >> > > local APIC timer), or will the Xen hypervisor send the event
> >> > > directly to the HVM domain, bypassing domain0, for the split
> >> > > driver model?
> >> >
> >> > Not in the above type of scenario. The interrupt must go to the
> >> > driver-domain (normally Dom0) to indicate that the 
> hardware is ready to
> >> > deliver the data. This will wake up the user-mode call 
> that waited for
> >> > the data, and then the data can be delivered to the 
> guest domain from
> >> > there (which in turn is awakened by the event sent from 
> the driver
> >> > domain).
> >> >
> >> > There is no difference in the number of events in these 
> two cases.
> >> >
> >> > There is however a big difference in the number of 
> hypervisor-to-dom0
> >> > events that occur: the HVM model will require something 
> in the order of
> >> > 5 writes to the IDE controller to perform one disk read/write 
> >> > operation.
> >> > Each of those will incur one event to wake up qemu-dm, 
> and one event to
> >> > wake the domu (which will most likely just run one or two 
> instructions
> >> > forward to hit the next write to the IDE controller).
> >> >
> >> > > Another question is: for interrupt delivery, does Xen treat
> >> > > para-virtualized
> >> > > domain differently from HVM domain considering using device
> >> > > model and split
> >> > > driver model?
> >> >
> >> > Not in interrupt delivery, no. Except for the fact that 
> HVM domains
> >> > obviously have full hardware interfaces for interrupt 
> controllers etc,
> >> > which adds a little bit of overhead (because each 
> interrupt needs to be
> >> > acknowledged/cancelled on the interrupt controller, for example).
> >> >
> >> > --
> >> > Mats
> >> >
> >> > > Thanks a lot,
> >> > >
> >> > > Liang
> >> > >
> >> > >
> >> > > _______________________________________________
> >> > > Xen-devel mailing list
> >> > > Xen-devel@lists.xensource.com
> >> > > http://lists.xensource.com/xen-devel
> >> >
> >> > _______________________________________________
> >> > Xen-devel mailing list
> >> > Xen-devel@lists.xensource.com
> >> > http://lists.xensource.com/xen-devel
> >>
> >
> > 
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel
> 
> 
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Does Dom0 always get interrupts first beforetheyare delivered to other guest domains?
  2007-04-12 14:00                                     ` Petersson, Mats
@ 2007-04-12 20:15                                       ` Liang Yang
  0 siblings, 0 replies; 35+ messages in thread
From: Liang Yang @ 2007-04-12 20:15 UTC (permalink / raw)
  To: Petersson, Mats, Mark Williamson; +Cc: xen-devel

Hi Mats,

Thank you for your always prompt and knowledgeable reply. I will vote you as
one of the MVPs of this mailing list :)

Best regards,

Liang

----- Original Message ----- 
From: "Petersson, Mats" <Mats.Petersson@amd.com>
To: "Liang Yang" <multisyncfe991@hotmail.com>; "Mark Williamson" 
<mark.williamson@cl.cam.ac.uk>
Cc: <xen-devel@lists.xensource.com>
Sent: Thursday, April 12, 2007 7:00 AM
Subject: RE: [Xen-devel] Does Dom0 always get interrupts first beforetheyare 
delivered to other guest domains?


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2007-04-12 20:15 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <E1HQkNQ-0002f5-Pl@host-192-168-0-1-bcn-london>
2007-03-12 16:10 ` Xen-devel Digest, Vol 25, Issue 93 PUCCETTI Armand
2007-03-12 16:19   ` Petersson, Mats
2007-03-12 16:23     ` Keir Fraser
2007-03-12 16:26       ` More page-table questions Petersson, Mats
2007-03-12 16:32         ` Keir Fraser
2007-03-12 16:35           ` Petersson, Mats
2007-03-12 16:38             ` Keir Fraser
2007-03-15 22:15             ` Questions about device/event channels in Xen Liang Yang
2007-03-16  0:34               ` Mark Williamson
2007-03-16  6:02                 ` Liang Yang
2007-03-16  6:02                   ` Liang Yang
2007-03-16  8:45                     ` Keir Fraser
2007-03-16 17:30                       ` Does Dom0 always get interrupts first before they are delivered to other guest domains? Liang Yang
2007-03-16 17:40                         ` Petersson, Mats
2007-03-16 18:48                           ` Liang Yang
2007-03-21  0:37                             ` Mark Williamson
2007-03-21  1:23                               ` Liang Yang
2007-03-21  1:23                                 ` Liang Yang
2007-03-21  8:31                                 ` Does Dom0 always get interrupts first before they aredelivered " Tian, Kevin
2007-03-21  9:13                                 ` Does Dom0 always get interrupts first before they are delivered " Petersson, Mats
2007-04-07 16:59                                 ` Mark Williamson
2007-04-12  0:20                                   ` Does Dom0 always get interrupts first before theyare " Liang Yang
2007-04-12 14:00                                     ` Petersson, Mats
2007-04-12 20:15                                       ` Does Dom0 always get interrupts first beforetheyare " Liang Yang
2007-03-19 16:33                         ` Does Xen also plan to move the back-end driver to the stub domain for HVM? Liang Yang
2007-03-19 16:45                           ` Petersson, Mats
2007-03-19 18:20                           ` Anthony Liguori
2007-03-19 19:21                             ` Liang Yang
2007-03-19 20:20                               ` Anthony Liguori
2007-03-19 21:56                                 ` Question about reserving one CPU for the Xen hypervisor in case of vm exit Liang Yang
2007-03-20 10:13                                   ` Petersson, Mats
2007-03-20 10:03                               ` Re: Does Xen also plan to move the back-end driver to the stub domain for HVM? Petersson, Mats
2007-03-16  3:17               ` Questions about device/event channels in Xen Daniel Stodden
2007-03-16  8:38               ` Petersson, Mats
2007-03-12 17:27           ` More page-table questions PUCCETTI Armand
2007-03-12 17:42             ` Petersson, Mats
2007-03-13 16:25               ` Mark Williamson
