Re: Xen-devel Digest, Vol 25, Issue 93

* Re: Xen-devel Digest, Vol 25, Issue 93
       [not found] <E1HQkNQ-0002f5-Pl@host-192-168-0-1-bcn-london>
@ 2007-03-12 16:10 ` PUCCETTI Armand
  2007-03-12 16:19   ` Petersson, Mats
  0 siblings, 1 reply; 35+ messages in thread
From: PUCCETTI Armand @ 2007-03-12 16:10 UTC (permalink / raw)
  To: xen-devel

>> When the system boots, the processor is normally in "real-mode", and
>> it's definitely not got paging enabled. So we have to "make 
>> the guest OS
>> believe this is the case". But at the same time, the guest OS is most
>> likely not loaded at address zero in memory, so we need paging enabled
>> to remap the GUEST PHYSICAL address to match the machine physical
>> address. So we have a "linear map" to translate the "address zero" to
>> the "start of guest memory", and so on for every page of memory in the
>> guest.
>>
>> This is not hard to do, since the AMD-V/VT feature of the processor
>> expects the paging-bit to be different between what the guest "thinks"
>> and the actual case. In the AMD-V, there's even support to 
>> run real-mode
>> with paging enabled, so all the BIOS-code and such will be running in
>> this mode. VT has to do a bunch of tricky stuff to work around that
>> problem.
>>
>> Ok fine, does this argument holds true for even non-VT and 
>> non-Pacifica enabled processors?
>> I doubt it.
>>     
>
> Not precisely. I'm talking only about HVM mode, which is "full
> virtualization". PV-mode uses a different paging interface, which at
> least for most parts, comprise of changing the whole area of code in the
> kernel that updates the page-tables, by adding code that is aware of the
> THREE types of address (guest-virtual, guest-physical and
> machine-physical). This means that there's no real need for the
> "read-only page-tables" and "shadow-mode" - the page-table just contains
> the right value for the machine-physical address. [That's not to say
> that read-only page-tables can't be used in a PV system too - I'm not
> 100% sure how the page-table management works in the PV mode]. 
>   
That is very interesting info on the paging system. Mats, could you please
explain a bit the working of the PV paging? How do the the guest+host 
page tables work
together? What does the guest page table point to, i.e. how+when is it 
mapped onto the host page table?

I have seen in the code that there are different cases of guest+host 
paging table heights. Why?

thanks. Armand
>>> I hope i made myself clear.
>>> Please enlighten me :-).
>>>
>>> When paging is enabled, we use a shadow page-table, which is
>>> essentially
>>> that the GUEST sees one page-table, and the processor another
>>> (thanks to
>>> the fact that the hypervisor intercepts the CR3 read/write 
>>>       
>> operations,
>>     
>>> and when CR3 is read back by the guest, we don't send back the value
>>> it's ACTUALLY POINTING TO IN THE PROCESSOR, but the value 
>>>       
>> that was set
>>     
>>> by the guest). So there are two page-tables.
>>>
>>> Got this well, thanks Mats :).
>>>
>>> To make the page-table updates by the guest visible to the 
>>>       
>> hypervisor,
>>     
>>> all of the guest-page-tables are made read-only (by scanning
>>> the new CR3
>>> value whenever one is set).
>>>
>>> I didn't get this either well :(
>>> sorry, but do you mean CR3 for the guest or for the
>>> processor? i hope you mean guest?
>>>       
>> Yes, scan the guest-CR3 to see where it placed the page-tables.
>>
>>     
>>> Whenever a page-fault happens, the hypervisor has "first look", and
>>> determines if the update is for a page-table or not. If it is a
>>> page-table update, the guest operation is emulated (in 
>>>       
>> x86_emulate.c),
>>     
>>> and the result is written to the shadow-page-table AND the
>>>
>>> Why do we need emulation?some peculiar reason for emulating?
>>> Do you mean to say if i am running a 32 bit domU on top of a
>>> 64 bit processor, the guest operation for updating the page
>>> table is emulated by the hypervisor.am i right?
>>>       
>> No, it's simply because we need to see the result of the 
>> instruction and
>> write it to two places (with some modification in one of 
>> those places).
>> So if the code is doing, for example: "*pte |= 1;" (set a
>> page-table-entry to "present"), we need to mark both the
>> guest-page-table-entry to "present", and mark our 
>> shadow-entry "present"
>> (and perhaps do some other work too, but that's the minimum work
>> needed).
>>
>> This brings one more question in my mind.Why do we use pinning then?
>>     
>
> I believe there's two types of pinning! Page-pinning, which is blocking
> a page from being accessed in an incorrect way [again, I'm not 100% sure
> how this works, or exactly what it does - just that it's a term used in
> the general way I described in the previous sentence]. 
>
>   
>> As i see at it.To avoid shadow page tables to be swapped out 
>> before the page tables they actually point to are swapped.Am i right?
>>
>> But according to interface manual,-> to bind a vcpu to a 
>> specific CPU in a SMP environment we use pining.But these two 
>> look pretty orthogonal statements to me, which means i may be 
>> wrong :(.
>> Can somebody help me in this regard?
>>     
>
> CPU pinning is to tie a VCPU to a (set of) processor(s). For example,
> you may want to pin Dom0 to run only on CPU0, and pin a DomU to run on
> CPU's 1,2 and 3. That way, Dom0 is ALWAYS able to run on it's own CPU,
> and it's never in contention about which CPU to use, and DomU can run on
> three CPU's as much as it likes. You could have another DomU pinned to
> CPU 3 if you wish. That means that CPU 1, 2 are exclusively for the
> first DomU, whilst the second DomU shares CPU3 with the first DomU (so
> they both get half the CPU performance of one CPU - on average over a
> reasonable amount of time). 
>
> --
>   

^ permalink raw reply	[flat|nested] 35+ messages in thread