* [PATCH/PFC 0/2] s390 host support
@ 2007-04-27 13:40 Carsten Otte
  2007-04-27 16:19 ` Hollis Blanchard
       [not found] ` <1177681224.5770.20.camel-WIxn4w2hgUz3YA32ykw5MLlKpX0K8NHHQQ4Iyu8u01E@public.gmane.org>
  0 siblings, 2 replies; 31+ messages in thread
From: Carsten Otte @ 2007-04-27 13:40 UTC (permalink / raw)
  To: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f
  Cc: cborntra-tA70FqPdS9bQT0dZR+AlfA, schwidefsky-tA70FqPdS9bQT0dZR+AlfA

This series of patches adds support for s390host. This interface can be
used to run virtual machines using s390's hardware virtualization
capability SIE. Similar to the kvm interface on x86, this interface is
hardware dependent at this time. Patches apply against 2.6.21, please
review.
We intend to move to a common arch-independent kernel interface and
userspace with kvm.



-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH/PFC 0/2] s390 host support
       [not found] ` <1177681224.5770.20.camel-WIxn4w2hgUz3YA32ykw5MLlKpX0K8NHHQQ4Iyu8u01E@public.gmane.org>
@ 2007-04-27 15:14   ` Carsten Otte
  2007-04-28  6:27   ` Avi Kivity
  1 sibling, 0 replies; 31+ messages in thread
From: Carsten Otte @ 2007-04-27 15:14 UTC (permalink / raw)
  To: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f
  Cc: cborntra-tA70FqPdS9bQT0dZR+AlfA, schwidefsky-tA70FqPdS9bQT0dZR+AlfA

On Friday, 27.04.2007 at 15:40 +0200, Carsten Otte wrote:
> This series of patches adds support for s390host.
I forgot to mention: this set of patches is licensed under the terms of
the GNU General Public License. Future versions of these patches will
contain the corresponding header.




^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH/PFC 0/2] s390 host support
  2007-04-27 13:40 [PATCH/PFC 0/2] s390 host support Carsten Otte
@ 2007-04-27 16:19 ` Hollis Blanchard
       [not found]   ` <pan.2007.04.27.16.18.10.889473-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
       [not found] ` <1177681224.5770.20.camel-WIxn4w2hgUz3YA32ykw5MLlKpX0K8NHHQQ4Iyu8u01E@public.gmane.org>
  1 sibling, 1 reply; 31+ messages in thread
From: Hollis Blanchard @ 2007-04-27 16:19 UTC (permalink / raw)
  To: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

On Fri, 27 Apr 2007 15:40:24 +0200, Carsten Otte wrote:

> This series of patches adds support for s390host. This interface can be
> used to run virtual machines using s390's hardware virtualization
> capability SIE. Similar to the kvm interface on x86, this interface is
> hardware dependent at this time. Patches apply against 2.6.21, please
> review.
> We intend to move to a common arch-independent kernel interface and
> userspace with kvm.

So if I understand correctly, this work is totally separate from KVM, but
you're posting for comment and intend to merge with KVM in the future.

To investigate PowerPC support I did some work to refactor the KVM code
into x86 vs shared bits. It was based on the in-kernel KVM code, so I need
to rebase that work on kvm.git. I guess you guys have the same
problem; have you done any work in that area?

-- 
Hollis Blanchard
IBM Linux Technology Center




^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH/PFC 0/2] s390 host support
       [not found]   ` <pan.2007.04.27.16.18.10.889473-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
@ 2007-04-27 19:58     ` Carsten Otte
       [not found]       ` <463255F3.2000500-tA70FqPdS9bQT0dZR+AlfA@public.gmane.org>
  2007-04-29  8:09     ` Heiko Carstens
  1 sibling, 1 reply; 31+ messages in thread
From: Carsten Otte @ 2007-04-27 19:58 UTC (permalink / raw)
  To: Hollis Blanchard; +Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

Hollis Blanchard wrote:
> So if I understand correctly, this work is totally separate from KVM, but
> you're posting for comment and intend to merge with KVM in the future.
Actually, we started working on this as prototype work back in 2005,
well before kvm came up. Our approach, however, seems fairly similar
to kvm's; that's why we intend to merge. At this time, it is totally
separate.

> To investigate PowerPC support I did some work to refactor the KVM code
> into x86 vs shared bits. It was based on the in-kernel KVM code, so I need
> to rebase that work on kvm.git. I guess you guys have the same
> problem; have you done any work in that area?
So far, we have spent some time reading the kvm kernel code. We wanted to give
people the chance to do the same with ours. We haven't started integrating
just yet; we would rather agree on which direction to aim for first.


so long
Carsten


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH/PFC 0/2] s390 host support
       [not found]       ` <463255F3.2000500-tA70FqPdS9bQT0dZR+AlfA@public.gmane.org>
@ 2007-04-27 22:34         ` Dong, Eddie
  0 siblings, 0 replies; 31+ messages in thread
From: Dong, Eddie @ 2007-04-27 22:34 UTC (permalink / raw)
  To: carsteno-tA70FqPdS9bQT0dZR+AlfA, Hollis Blanchard
  Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

Carsten Otte wrote:
> Hollis Blanchard wrote:
>> So if I understand correctly, this work is totally separate from
>> KVM, but you're posting for comment and intend to merge with KVM in
>> the future. 
> Actually, we started working on this as prototype work back in 2005,
> well before kvm came up. Our approach, however, seems fairly similar
> to kvm's; that's why we intend to merge. At this time, it is totally
> separate.
We also have an in-house, host-based VMM solution for IA-64 and
want to bring that support to KVM. A discussion of cross-architecture
support comes at the right time :-)
Thx, eddie


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH/PFC 0/2] s390 host support
       [not found] ` <1177681224.5770.20.camel-WIxn4w2hgUz3YA32ykw5MLlKpX0K8NHHQQ4Iyu8u01E@public.gmane.org>
  2007-04-27 15:14   ` Carsten Otte
@ 2007-04-28  6:27   ` Avi Kivity
       [not found]     ` <4632E94C.20904-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
  1 sibling, 1 reply; 31+ messages in thread
From: Avi Kivity @ 2007-04-28  6:27 UTC (permalink / raw)
  To: Carsten Otte
  Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
	cborntra-tA70FqPdS9bQT0dZR+AlfA,
	schwidefsky-tA70FqPdS9bQT0dZR+AlfA

Carsten Otte wrote:
> This series of patches adds support for s390host. This interface can be
> used to run virtual machines using s390's hardware virtualization
> capability SIE. 

I must say I'm pleasantly surprised by this.  I keep thinking of ppc and
ia64 as additional ports, while ignoring the big daddy of virtualization.


> Similar to the kvm interface on x86, this interface is
> hardware dependent at this time. Patches apply against 2.6.21, please
> review.
>   

It's all greek to me.

> We intend to move to a common arch-independent kernel interface and
> userspace with kvm.
>
>   

The address space and vcpu management are rather different from kvm's,
however your approach is better and we'll want to move kvm in your
direction rather than the other way round (specifically the tight vcpu
<-> task coupling; mmu is more difficult).

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH/PFC 0/2] s390 host support
       [not found]     ` <4632E94C.20904-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
@ 2007-04-28  8:45       ` Carsten Otte
       [not found]         ` <4633099D.3020709-tA70FqPdS9bQT0dZR+AlfA@public.gmane.org>
  2007-04-29  8:11       ` Heiko Carstens
  1 sibling, 1 reply; 31+ messages in thread
From: Carsten Otte @ 2007-04-28  8:45 UTC (permalink / raw)
  To: Avi Kivity
  Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
	Christian Borntraeger, mschwid2-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8

Avi Kivity wrote:
> I must say I'm pleasantly surprised by this.  I keep thinking of ppc and
> ia64 as additional ports, while ignoring the big daddy of virtualization.
Thank you very much for the warm welcome :-).

>> Similar to the kvm interface on x86, this interface is
>> hardware dependent at this time. Patches apply against 2.6.21, please
>> review.
> 
> It's all greek to me.
Actually I felt the same about reading much of the kvm kernel code. 
Both architectures are rather different, and our task will be to 
identify spots where we can abstract to get code that is common for 
all architectures and find out where we need architecture dependent code.

> The address space and vcpu management are rather different from kvm's,
> however your approach is better and we'll want to move kvm in your
> direction rather than the other way round (specifically the tight vcpu
> <-> task coupling; mmu is more difficult).
We have tried a file based approach for the cpus before too.

With regard to memory, I do not quite understand why regular
pageable user space memory doesn't work with VT and SVM. We would
definitely prefer to keep our virtual machine's memory pageable on
s390, so I guess we need an arch-dependent hook that
allocates the memory. This would boil down to a regular anonymous
allocation on s390, and to specially allocated memory on x86.
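A minimal sketch of such an arch-dependent allocation hook, assuming a
per-architecture ops table. `struct guest_mem_ops`, `guest_mem_alloc` and
`s390_guest_mem_ops` are hypothetical names, not part of kvm or of the
s390host patches. Only the s390-style backend (plain anonymous, pageable
memory from mmap(2)) is shown, since the mail does not say how the x86
allocation would be done.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical per-architecture hook for allocating guest memory.
 * The names are illustrative; neither kvm nor s390host defines them. */
struct guest_mem_ops {
    void *(*alloc)(size_t size);
    void (*release)(void *mem, size_t size);
};

/* s390-style backend: ordinary anonymous user memory, which the host
 * kernel is free to page out like any other process memory. */
static void *anon_alloc(size_t size)
{
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

static void anon_release(void *mem, size_t size)
{
    munmap(mem, size);
}

const struct guest_mem_ops s390_guest_mem_ops = {
    .alloc   = anon_alloc,
    .release = anon_release,
};

/* Arch-independent code only ever goes through the ops table; an x86
 * backend would plug in its own alloc/release pair here. */
void *guest_mem_alloc(const struct guest_mem_ops *ops, size_t size)
{
    assert(ops != NULL && ops->alloc != NULL);
    return ops->alloc(size);
}
```

Common code never needs to know which backend it got; that is the whole
point of keeping the plug arch-dependent.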

so long,
Carsten



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH/PFC 0/2] s390 host support
       [not found]   ` <pan.2007.04.27.16.18.10.889473-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
  2007-04-27 19:58     ` Carsten Otte
@ 2007-04-29  8:09     ` Heiko Carstens
  1 sibling, 0 replies; 31+ messages in thread
From: Heiko Carstens @ 2007-04-29  8:09 UTC (permalink / raw)
  To: Hollis Blanchard; +Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

> To investigate PowerPC support I did some work to refactor the KVM code
> into x86 vs shared bits. It was based on the in-kernel KVM code, so I need
> to rebase that work on kvm.git. I guess you guys have the same
> problem; have you done any work in that area?

We haven't done anything yet to make kvm non-x86 friendly. If you have
patches please send them.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH/PFC 0/2] s390 host support
       [not found]     ` <4632E94C.20904-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
  2007-04-28  8:45       ` Carsten Otte
@ 2007-04-29  8:11       ` Heiko Carstens
       [not found]         ` <20070429081157.GC8332-5VkHqLvV2o3MbYB6QlFGEg@public.gmane.org>
  1 sibling, 1 reply; 31+ messages in thread
From: Heiko Carstens @ 2007-04-29  8:11 UTC (permalink / raw)
  To: Avi Kivity
  Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
	cborntra-tA70FqPdS9bQT0dZR+AlfA,
	schwidefsky-tA70FqPdS9bQT0dZR+AlfA

> > We intend to move to a common arch-independent kernel interface and
> > userspace with kvm.
> The address space and vcpu management are rather different from kvm's,
> however your approach is better and we'll want to move kvm in your
> direction rather than the other way round (specifically the tight vcpu
> <-> task coupling; mmu is more difficult).

How do we continue from here? Add new architectures to the ioctl-based
approach, or change kvm to a syscall interface? Also, IMHO it would be better
to move the code away from drivers/ into kernel/ or virt/, with arch-dependent
backends.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH/PFC 0/2] s390 host support
       [not found]         ` <20070429081157.GC8332-5VkHqLvV2o3MbYB6QlFGEg@public.gmane.org>
@ 2007-04-29  8:45           ` Avi Kivity
  2007-04-30 18:58             ` Hollis Blanchard
  0 siblings, 1 reply; 31+ messages in thread
From: Avi Kivity @ 2007-04-29  8:45 UTC (permalink / raw)
  To: Heiko Carstens
  Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
	cborntra-tA70FqPdS9bQT0dZR+AlfA,
	schwidefsky-tA70FqPdS9bQT0dZR+AlfA

Heiko Carstens wrote:
>>> We intend to move to a common arch-independent kernel interface and
>>> userspace with kvm.
>>>       
>> The address space and vcpu management are rather different from kvm's,
>> however your approach is better and we'll want to move kvm in your
>> direction rather than the other way round (specifically the tight vcpu
>> <-> task coupling; mmu is more difficult).
>>     
>
> How do we continue from here? Add new architectures to the ioctl-based
> approach, or change kvm to a syscall interface? 

I think we can start the syscall-based API (with compatibility ioctls
for x86), now that we have all four archs looking at it.

> Also, IMHO it would be better
> to move the code away from drivers/ into kernel/ or virt/, with arch-dependent
> backends.
>   

I agree.  I'll do that some time after the merge window closes.

-- 
error compiling committee.c: too many arguments to function



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH/PFC 0/2] s390 host support
       [not found]         ` <4633099D.3020709-tA70FqPdS9bQT0dZR+AlfA@public.gmane.org>
@ 2007-04-29  9:13           ` Avi Kivity
       [not found]             ` <463461B1.7060406-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 31+ messages in thread
From: Avi Kivity @ 2007-04-29  9:13 UTC (permalink / raw)
  To: carsteno-tA70FqPdS9bQT0dZR+AlfA
  Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
	Christian Borntraeger, mschwid2-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8

Carsten Otte wrote:
>
>> The address space and vcpu management are rather different from kvm's,
>> however your approach is better and we'll want to move kvm in your
>> direction rather than the other way round (specifically the tight vcpu
>> <-> task coupling; mmu is more difficult).
> We have tried a file based approach for the cpus before too.

We'll want to keep a vcpu fd.  If the vcpu is idle we'll be asleep in 
poll() or the like, and we need some kind of wakeup mechanism.

>
> With regard to memory, I do not quite understand why regular
> pageable user space memory doesn't work with VT and SVM. We would
> definitely prefer to keep our virtual machine's memory pageable on
> s390, so I guess we need an arch-dependent hook that
> allocates the memory. This would boil down to a regular anonymous
> allocation on s390, and to specially allocated memory on x86.

I guess some of the difference stems from the fact that on x86, the 
Linux pagetables are actually the hardware pagetables.  VT and SVM use a 
separate page table for the guest which cannot be shared with the host. 
This means that

- we need to teach the Linux mm to look at shadow page tables when 
transferring dirty bits
- when Linux wants to write protect a page, it has to modify the shadow 
page tables too (and flush the guest tlbs, which is again a bit different)
- this means rmap has to be extended to include kvm

I think that non-x86 architectures have purely software page tables; maybe this
makes things easier.
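A toy C model of the three points above, assuming exactly one host pte
paired with one shadow pte. The direct pairing stands in for the rmap
lookup a real implementation would need; only the present, writable and
dirty bits are modeled, and none of the names come from kvm.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: three pte bits, nothing else. */
enum { PTE_PRESENT = 1u << 0, PTE_WRITE = 1u << 1, PTE_DIRTY = 1u << 6 };

struct toy_mapping {
    unsigned host_pte;    /* Linux page table entry (what mm looks at) */
    unsigned shadow_pte;  /* entry the hardware walks while the guest runs */
};

/* Guest write through the shadow entry: hardware sets the dirty bit
 * only in the shadow pte, never in the host pte. */
bool guest_write(struct toy_mapping *m)
{
    if (!(m->shadow_pte & PTE_PRESENT) || !(m->shadow_pte & PTE_WRITE))
        return false;             /* would exit to the hypervisor */
    m->shadow_pte |= PTE_DIRTY;
    return true;
}

/* Point 1: mm must also consult the shadow pte for dirty state. */
bool page_is_dirty(const struct toy_mapping *m)
{
    return (m->host_pte | m->shadow_pte) & PTE_DIRTY;
}

/* Point 2: write protection must reach the shadow pte as well.
 * A real implementation would also flush the guest TLB here. */
void write_protect(struct toy_mapping *m)
{
    if (m->shadow_pte & PTE_DIRTY)
        m->host_pte |= PTE_DIRTY; /* transfer before clearing */
    m->host_pte   &= ~PTE_WRITE;
    m->shadow_pte &= ~(PTE_WRITE | PTE_DIRTY);
}
```

If mm only looked at `host_pte`, a page the guest had dirtied would look
clean and could be dropped without writeback.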

-- 
error compiling committee.c: too many arguments to function



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH/PFC 0/2] s390 host support
       [not found]             ` <463461B1.7060406-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
@ 2007-04-29 10:24               ` Carsten Otte
       [not found]                 ` <4634726F.10705-tA70FqPdS9bQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 31+ messages in thread
From: Carsten Otte @ 2007-04-29 10:24 UTC (permalink / raw)
  To: Avi Kivity
  Cc: carsteno-tA70FqPdS9bQT0dZR+AlfA,
	kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
	Christian Borntraeger, mschwid2-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8


Avi Kivity wrote:
> We'll want to keep a vcpu fd.  If the vcpu is idle we'll be asleep in 
> poll() or the like, and we need some kind of wakeup mechanism.
Our userspace does idle/wakeup differently:
When a cpu exits sys_s390host_sie and the intercept code indicates a
halt with interrupts enabled (the cpu idle loop), userland parks our
vcpu thread in pthread_cond_wait. When we want to wake this thread up,
either by an interprocessor signal (need_resched and such) or due to an
I/O interrupt, we call pthread_cond_signal. The thread then re-enters
sys_s390host_sie and, after entering the vcpu context, executes the
interrupt handler first.
The advantage I see in waiting in userland is that userspace can deliver
interrupts to idle CPUs without kernel intervention. On the other
hand, my brain hurts when I think about userland passing vcpu fds to
other threads/processes, and about sys_fork().
In the end, you make the decision and we'll follow your lead.
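A minimal userspace sketch of the park/wake scheme described above,
assuming one mutex/condition-variable pair per vcpu. `struct vcpu_park`,
`vcpu_idle`, `vcpu_kick` and `vcpu_thread_once` are hypothetical names;
the actual sys_s390host_sie call is left out.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical per-vcpu wait state; not taken from the patches. */
struct vcpu_park {
    pthread_mutex_t lock;
    pthread_cond_t  wake;
    bool            irq_pending;  /* set by whoever injects an interrupt */
};

void vcpu_park_init(struct vcpu_park *p)
{
    pthread_mutex_init(&p->lock, NULL);
    pthread_cond_init(&p->wake, NULL);
    p->irq_pending = false;
}

/* Called by the vcpu thread after the kernel reported "halt with
 * interrupts enabled". Loops to cope with spurious wakeups. */
void vcpu_idle(struct vcpu_park *p)
{
    pthread_mutex_lock(&p->lock);
    while (!p->irq_pending)
        pthread_cond_wait(&p->wake, &p->lock);
    p->irq_pending = false;       /* consumed; caller re-enters SIE next */
    pthread_mutex_unlock(&p->lock);
}

/* Called from an I/O thread or another vcpu to deliver an interrupt. */
void vcpu_kick(struct vcpu_park *p)
{
    pthread_mutex_lock(&p->lock);
    p->irq_pending = true;
    pthread_cond_signal(&p->wake);
    pthread_mutex_unlock(&p->lock);
}

/* Stand-in vcpu thread body: a real one would loop around the
 * sys_s390host_sie call; this one idles once and returns. */
void *vcpu_thread_once(void *arg)
{
    vcpu_idle(arg);
    return arg;
}
```

The `irq_pending` flag matters: a kick that arrives before the vcpu thread
starts waiting is remembered instead of lost.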

> I guess some of the difference stems from the fact that on x86, the 
> Linux pagetables are actually the hardware pagetables.  VT and SVM use a 
> separate page table for the guest which cannot be shared with the host. 
> This means that
> 
> - we need to teach the Linux mm to look at shadow page tables when 
> transferring dirty bits
> - when Linux wants to write protect a page, it has to modify the shadow 
> page tables too (and flush the guest tlbs, which is again a bit different)
> - this means rmap has to be extended to include kvm
> 
> I think that non-x86 architectures have purely software page tables; maybe this
> makes things easier.
We do use hardware page tables too. Our hardware does know about 
multiple levels of page translation, and does its part of maintaining 
different sets of dirty/reference bits for guest and host while 
running in the virtual machine context. This process is transparent 
for both virtual machine and host.
For the x86 part, I will spend some time reading the kvm code a little
more.

so long,
Carsten


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH/PFC 0/2] s390 host support
       [not found]                 ` <4634726F.10705-tA70FqPdS9bQT0dZR+AlfA@public.gmane.org>
@ 2007-04-29 10:48                   ` Avi Kivity
       [not found]                     ` <463477EE.3000406-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 31+ messages in thread
From: Avi Kivity @ 2007-04-29 10:48 UTC (permalink / raw)
  To: carsteno-tA70FqPdS9bQT0dZR+AlfA
  Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
	Christian Borntraeger, mschwid2-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8

Carsten Otte wrote:
>
> Avi Kivity wrote:
>> We'll want to keep a vcpu fd.  If the vcpu is idle we'll be asleep in 
>> poll() or the like, and we need some kind of wakeup mechanism.
> Our userspace does idle/wakeup differently:
> When a cpu exits sys_s390host_sie and the intercept code indicates a
> halt with interrupts enabled (the cpu idle loop), userland parks our
> vcpu thread in pthread_cond_wait. When we want to wake this thread up,
> either by an interprocessor signal (need_resched and such) or due to an
> I/O interrupt, we call pthread_cond_signal. The thread then re-enters
> sys_s390host_sie and, after entering the vcpu context, executes the
> interrupt handler first.
> The advantage I see in waiting in userland is that userspace can deliver
> interrupts to idle CPUs without kernel intervention. On the other
> hand, my brain hurts when I think about userland passing vcpu fds to
> other threads/processes, and about sys_fork().

In both cases you wait in the kernel; with an fd you wait in the kernel 
and with pthread_cond_wait you wait in futex(FUTEX_WAIT) or a close 
relative.

Can one do the equivalent of a futex wakeup from the kernel easily?
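To illustrate that pthread_cond_wait ends up sleeping in the kernel, here
is a minimal sketch of the futex(2) calls it boils down to, made through
raw syscall(2) because glibc provides no futex() wrapper. The helper names
are illustrative only, and this shows the userspace side, not a
kernel-initiated wakeup.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Sleeps only if *uaddr still equals expected; otherwise returns -1
 * at once with errno == EAGAIN. */
static long futex_wait(atomic_int *uaddr, int expected)
{
    return syscall(SYS_futex, uaddr, FUTEX_WAIT, expected, NULL, NULL, 0);
}

/* Wakes up to nwaiters sleepers; returns how many were woken. */
static long futex_wake(atomic_int *uaddr, int nwaiters)
{
    return syscall(SYS_futex, uaddr, FUTEX_WAKE, nwaiters, NULL, NULL, 0);
}

/* Idle until *flag becomes nonzero; re-checks after every wakeup. */
void wait_for_flag(atomic_int *flag)
{
    while (atomic_load(flag) == 0)
        futex_wait(flag, 0);
}

/* Publish the flag first, then wake one sleeper. */
void set_flag_and_wake(atomic_int *flag)
{
    atomic_store(flag, 1);
    futex_wake(flag, 1);
}
```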

> In the end, you make the decision and we'll follow your lead.
>

My primary concern is not to lock userspace into one way of working.  
This is really another sad side effect of the kernel providing a 
bazillion sleep/wakeup methods.

>> I guess some of the difference stems from the fact that on x86, the 
>> Linux pagetables are actually the hardware pagetables.  VT and SVM 
>> use a separate page table for the guest which cannot be shared with 
>> the host. This means that
>>
>> - we need to teach the Linux mm to look at shadow page tables when 
>> transferring dirty bits
>> - when Linux wants to write protect a page, it has to modify the 
>> shadow page tables too (and flush the guest tlbs, which is again a 
>> bit different)
>> - this means rmap has to be extended to include kvm
>>
>> I think that non-x86 architectures have purely software page tables;
>> maybe this makes things easier.
> We do use hardware page tables too. Our hardware does know about 
> multiple levels of page translation, and does its part of maintaining 
> different sets of dirty/reference bits for guest and host while 
> running in the virtual machine context. This process is transparent 
> for both virtual machine and host.

Nested page tables/extended page tables also provide this facility, with 
some caveats:

- on 32-bit hosts (or 64-bit hosts with 32-bit userspace), host 
userspace virtual address space is not enough to contain the guest 
physical address space.
- there is no way to protect the host userspace from the guest
- some annoying linker scripts need to be used to compile the host 
userspace to move it out of the guest userspace area, making it more 
difficult to write kvm userspace

I think there's a way to work around these issues on 64-bit npt 
hardware: allocate a pgd entry (at a non-zero offset) to hold guest 
physical memory, and copy this pgd entry into a guest-only pgd at offset 
zero.

Of course, there are many millions of non-npt/ept processors out there, 
and we can't leave them out in the cold, so we'll have to work something 
out for classical shadow page tables.

-- 
error compiling committee.c: too many arguments to function



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH/PFC 0/2] s390 host support
       [not found]                     ` <463477EE.3000406-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
@ 2007-04-29 11:15                       ` Carsten Otte
       [not found]                         ` <46347E6D.90409-tA70FqPdS9bQT0dZR+AlfA@public.gmane.org>
  2007-04-29 12:13                       ` Heiko Carstens
  1 sibling, 1 reply; 31+ messages in thread
From: Carsten Otte @ 2007-04-29 11:15 UTC (permalink / raw)
  To: Avi Kivity
  Cc: carsteno-tA70FqPdS9bQT0dZR+AlfA,
	kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
	Christian Borntraeger, mschwid2-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8


Avi Kivity wrote:
> In both cases you wait in the kernel; with an fd you wait in the kernel 
> and with pthread_cond_wait you wait in futex(FUTEX_WAIT) or a close 
> relative.
That is a good point indeed ;-).

> Can one do the equivalent of a futex wakeup from the kernel easily?
No, we have not needed to do that so far. Now that you mention it, we'd
want to move interprocessor signal handling into the kernel anyway for
performance reasons. That would raise the need to wake up from the kernel.
Conversely, how do you intend to wake a thread that sleeps in
poll() or similar from userspace?

> Nested page tables/extended page tables also provide this facility, with 
> some caveats:
> 
> - on 32-bit hosts (or 64-bit hosts with 32-bit userspace), host 
> userspace virtual address space is not enough to contain the guest 
> physical address space.
> - there is no way to protect the host userspace from the guest
> - some annoying linker scripts need to be used to compile the host 
> userspace to move it out of the guest userspace area, making it more 
> difficult to write kvm userspace
> 
> I think there's a way to work around these issues on 64-bit npt 
> hardware: allocate a pgd entry (at a non-zero offset) to hold guest 
> physical memory, and copy this pgd entry into a guest-only pgd at offset 
> zero.
> 
> Of course, there are many millions of non-npt/ept processors out there, 
> and we can't leave them out in the cold, so we'll have to work something 
> out for classical shadow page tables.
No, of course not. The nested pagetable approach sounds neat to me.
Doesn't the fact that there will be no security barrier between guest
userspace and the virtual machine require running kvm as a non-privileged
user in the end?

Our implementation uses action bits passed to sys_s390host_sie
to update the hardware control blocks for the virtual machine. The
hardware control blocks would be mapped read-only into the user address
space. This way, the kernel can prevent the user from messing things
up, which allows running as a non-privileged user (userid johndoe
instead of root). Would this approach be reasonable on x86 too?

so long,
Carsten


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH/PFC 0/2] s390 host support
       [not found]                         ` <46347E6D.90409-tA70FqPdS9bQT0dZR+AlfA@public.gmane.org>
@ 2007-04-29 11:49                           ` Avi Kivity
       [not found]                             ` <46348661.6000909-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 31+ messages in thread
From: Avi Kivity @ 2007-04-29 11:49 UTC (permalink / raw)
  To: carsteno-tA70FqPdS9bQT0dZR+AlfA
  Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
	Christian Borntraeger, mschwid2-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8

Carsten Otte wrote:
>
>
>> Can one do the equivalent of a futex wakeup from the kernel easily?
> No, we have not needed to do that so far. Now that you mention it, we'd
> want to move interprocessor signal handling into the kernel anyway for
> performance reasons. That would raise the need to wake up from the kernel.
> Conversely, how do you intend to wake a thread that sleeps in
> poll() or similar from userspace?
>

Write to a pipe, or send a signal (signals are quite fast if you mask 
them in userspace and use ppoll()).
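A minimal sketch of the pipe variant, assuming a vcpu thread that idles in
poll(2) on the read end. `struct waker` and its helpers are hypothetical
names; the signal plus ppoll() variant is not shown.

```c
#include <assert.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* Self-pipe wakeup: the idle thread polls rd, kickers write to wr. */
struct waker { int rd, wr; };

int waker_init(struct waker *w)
{
    int fds[2];
    if (pipe(fds) < 0)
        return -1;
    /* Non-blocking write end: a kick must never stall the kicker,
     * and a full pipe already guarantees a pending wakeup. */
    if (fcntl(fds[1], F_SETFL, O_NONBLOCK) < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    w->rd = fds[0];
    w->wr = fds[1];
    return 0;
}

void waker_kick(struct waker *w)
{
    char c = 1;
    ssize_t r = write(w->wr, &c, 1);  /* EAGAIN (pipe full) is harmless */
    (void)r;
}

/* Returns 1 if woken, 0 on timeout, -1 on error. */
int waker_wait(struct waker *w, int timeout_ms)
{
    struct pollfd pfd = { .fd = w->rd, .events = POLLIN };
    int n = poll(&pfd, 1, timeout_ms);
    if (n <= 0)
        return n;
    char buf[64];
    ssize_t r = read(w->rd, buf, sizeof buf);  /* drain pending kicks */
    (void)r;
    return 1;
}
```

If more than 64 kicks pile up, the next wait simply returns early, which
the idle loop has to tolerate anyway.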

>> Nested page tables/extended page tables also provide this facility, 
>> with some caveats:
>>
>> - on 32-bit hosts (or 64-bit hosts with 32-bit userspace), host 
>> userspace virtual address space is not enough to contain the guest 
>> physical address space.
>> - there is no way to protect the host userspace from the guest
>> - some annoying linker scripts need to be used to compile the host 
>> userspace to move it out of the guest userspace area, making it more 
>> difficult to write kvm userspace
>>
>> I think there's a way to work around these issues on 64-bit npt 
>> hardware: allocate a pgd entry (at a non-zero offset) to hold guest 
>> physical memory, and copy this pgd entry into a guest-only pgd at 
>> offset zero.
>>
>> Of course, there are many millions of non-npt/ept processors out 
>> there, and we can't leave them out in the cold, so we'll have to work 
>> something out for classical shadow page tables.
> No, of course not. The nested pagetable approach sounds neat to me.
> Doesn't the fact that there will be no security barrier between guest
> userspace and the virtual machine require running kvm as a non-privileged
> user in the end?

The trick I mentioned (copying a pgd entry) means:

- guest physical and host userspace are different (have different pgds)
- guest physical (offset 0) is aliased to host userspace (offset $bignum)
- guest address space is limited to 2^(12+9*3) bytes (512 GiB)
- the pte dirty and accessed bits are shared

so guest userspace is not exposed, but the guest ptes _are_ shared.

In a way, this is similar to shared memory, if shared page tables are 
ever implemented.  Think of a shared memory segment mapped at two 
different offsets, but aligned at a pud boundary so everything below the 
pgd entry is sharable.
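A small C sketch of the arithmetic behind the pgd trick, assuming x86-64
four-level paging and an arbitrarily chosen host pgd slot (`GUEST_SLOT`,
hypothetical). With 12 offset bits and 9 bits per level below the pgd, one
pgd entry spans 2^39 bytes, which is where the limit comes from; aliasing
that entry at slot 0 of a guest-only pgd makes guest physical address gpa
appear at host virtual address GUEST_SLOT * 2^39 + gpa.

```c
#include <assert.h>
#include <stdint.h>

/* x86-64 four-level paging: 12 offset bits + 9 bits per level. One pgd
 * entry therefore spans 2^(12 + 9*3) = 2^39 bytes = 512 GiB. */
#define PAGE_SHIFT 12
#define LEVEL_BITS 9
#define PGD_SHIFT  (PAGE_SHIFT + 3 * LEVEL_BITS)   /* 39 */
#define PGD_SPAN   (UINT64_C(1) << PGD_SHIFT)

/* Which pgd slot a virtual address falls into. */
static unsigned pgd_index(uint64_t va)
{
    return (unsigned)((va >> PGD_SHIFT) & ((1u << LEVEL_BITS) - 1));
}

/* Hypothetical choice: host userspace keeps guest RAM in pgd slot 3;
 * the guest-only pgd copies that same entry into slot 0. */
#define GUEST_SLOT 3u

/* Guest physical address -> aliased host userspace virtual address.
 * Both pgd slots point at the same lower-level tables, so the ptes
 * (and their dirty/accessed bits) are shared. */
uint64_t gpa_to_hva(uint64_t gpa)
{
    assert(gpa < PGD_SPAN);   /* guest RAM must fit in one pgd entry */
    return (uint64_t)GUEST_SLOT * PGD_SPAN + gpa;
}
```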

>
> Our implementation uses action bits passed to sys_s390host_sie
> to update the hardware control blocks for the virtual machine. The
> hardware control blocks would be mapped read-only into the user address
> space. This way, the kernel can prevent the user from messing things
> up, which allows running as a non-privileged user (userid johndoe
> instead of root). Would this approach be reasonable on x86 too?

Allowing the guest to hack the host userspace exposes the rest of the 
user's processes to a malicious guest, and allows the guest to open 
network connections through the host, no?

-- 
error compiling committee.c: too many arguments to function



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH/PFC 0/2] s390 host support
       [not found]                     ` <463477EE.3000406-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
  2007-04-29 11:15                       ` Carsten Otte
@ 2007-04-29 12:13                       ` Heiko Carstens
       [not found]                         ` <20070429121351.GA8254-5VkHqLvV2o3MbYB6QlFGEg@public.gmane.org>
  1 sibling, 1 reply; 31+ messages in thread
From: Heiko Carstens @ 2007-04-29 12:13 UTC (permalink / raw)
  To: Avi Kivity
  Cc: carsteno-tA70FqPdS9bQT0dZR+AlfA,
	kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
	Christian Borntraeger, mschwid2-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8

> Nested page tables/extended page tables also provide this facility, with 
> some caveats:
> 
> - on 32-bit hosts (or 64-bit hosts with 32-bit userspace), host 
> userspace virtual address space is not enough to contain the guest 
> physical address space.
> - there is no way to protect the host userspace from the guest

Sorry, but are you saying that it is currently possible to access
(read and/or write) host userspace address space from the guest?


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH/PFC 0/2] s390 host support
       [not found]                         ` <20070429121351.GA8254-5VkHqLvV2o3MbYB6QlFGEg@public.gmane.org>
@ 2007-04-29 12:27                           ` Avi Kivity
  0 siblings, 0 replies; 31+ messages in thread
From: Avi Kivity @ 2007-04-29 12:27 UTC (permalink / raw)
  To: Heiko Carstens
  Cc: carsteno-tA70FqPdS9bQT0dZR+AlfA,
	kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
	Christian Borntraeger, mschwid2-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8

Heiko Carstens wrote:
>> Nested page tables/extended page tables also provide this facility, with 
>> some caveats:
>>
>> - on 32-bit hosts (or 64-bit hosts with 32-bit userspace), host 
>> userspace virtual address space is not enough to contain the guest 
>> physical address space.
>> - there is no way to protect the host userspace from the guest
>>     
>
> Sorry, but are you saying that it is currently possible to access
> (read and/or write) host userspace address space from the guest?
>   

No.

First, we don't yet have support for npt (I'm promised a patch by AMD).

Second, the way I first planned it, guest physical and host userspace 
would be completely unrelated address spaces, with guest physical 
mmap()ed into host userspace.  This is how non-npt is implemented right now.

Third, our conversation gave rise to an idea of how to implement guest 
physical as a strict subset of host userspace.  This (a) preserves 
isolation, and (b) allows the Linux mm to operate unmodified [1] on the 
guest ptes.

I was being unclear: npt/ept _allows_ one to do this, but you're not 
_forced_ to.  The strict subset thing is a kind of mix between the two 
that still preserves isolation.


[1]  We'd still need to teach it how to invalidate guest tlb entries, 
unfortunately.

-- 
error compiling committee.c: too many arguments to function



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH/PFC 0/2] s390 host support
       [not found]                             ` <46348661.6000909-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
@ 2007-04-29 14:27                               ` Carsten Otte
       [not found]                                 ` <4634AB6C.4020901-tA70FqPdS9bQT0dZR+AlfA@public.gmane.org>
  2007-04-30 14:48                               ` Carsten Otte
  1 sibling, 1 reply; 31+ messages in thread
From: Carsten Otte @ 2007-04-29 14:27 UTC (permalink / raw)
  To: Avi Kivity
  Cc: carsteno-tA70FqPdS9bQT0dZR+AlfA,
	kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
	Christian Borntraeger, mschwid2-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8


Avi Kivity wrote:
> The trick I mentioned (copying a pgd entry) means:
> 
> - guest physical and host userspace are different (have different pgds)
> - guest physical (offset 0) is aliased to host userspace (offset $bignum)
> - guest address space is limited to 2^(12+9*3)
> - the pte dirty and accessed bits are shared
> 
> so guest userspace is not exposed, but the guest ptes _are_ shared.
> 
> In a way, this is similar to shared memory, if shared page tables are 
> ever implemented.  Think of a shared memory segment mapped at two 
> different offsets, but aligned at a pud boundary so everything below the 
> pgd entry is sharable.
Nasty. We could also mask out host userspace while having guest memory 
mapped into it: our hardware control block allows setting an offset and a 
memory size for the virtual image. This way, we can let the virtual 
machine work on a subset of the host user address space. At this 
time, our userspace does not use that feature because we had a 
different security model in mind (see below).

>> Our implementation uses action bits presented to sys_s390host_sie 
>> to update the hardware control blocks for the virtual machine. The 
>> hardware control blocks would be mapped read-only into user address 
>> space. This way, the kernel can prevent the user from messing things 
>> up, which allows running non-privileged user code (userid johndoe 
>> instead of root). Would this approach be reasonable on x86 too?
> 
> Allowing the guest to hack the host userspace exposes the rest of the 
> user's processes to a malicious guest, and allows the guest to open 
> network connections through the host, no?
The security model we had in mind was that the user who starts the
userspace program equals root on the guest system but does not equal
root on the host.
This way, users are separated by the regular kernel security 
barriers in the host Linux: the user johndoe can mess 
with his own virtual machines and other resources, but cannot 
mess with virtual machines and other resources belonging to other 
users. If the guest root chooses to be malicious, he might well be able 
to take over the userspace and mess with whatever the user is 
allowed to do on the host.
Frankly, this does not mean we intentionally want to leave the door 
open for the guest, just that a security bug there would not be an 
integral security issue for the hosting Linux.
It looks to me like we have different security models today. If our 
model of starting guests as a regular user worked on all platforms 
without a performance penalty, I think it would be worth the 
extra effort. If not, we could also implement the current kvm 
security barrier and reduce the complexity of our s390host code.

so long,
Carsten


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH/PFC 0/2] s390 host support
       [not found]                                 ` <4634AB6C.4020901-tA70FqPdS9bQT0dZR+AlfA@public.gmane.org>
@ 2007-04-29 15:06                                   ` Avi Kivity
  0 siblings, 0 replies; 31+ messages in thread
From: Avi Kivity @ 2007-04-29 15:06 UTC (permalink / raw)
  To: carsteno-tA70FqPdS9bQT0dZR+AlfA
  Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
	Christian Borntraeger, mschwid2-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8

Carsten Otte wrote:
>
>>> Our implementation uses action bits presented to sys_s390host_sie 
>>> to update the hardware control blocks for the virtual machine. The 
>>> hardware control blocks would be mapped read-only into user address 
>>> space. This way, the kernel can prevent the user from messing things 
>>> up, which allows running non-privileged user code (userid johndoe 
>>> instead of root). Would this approach be reasonable on x86 too?
>>
>> Allowing the guest to hack the host userspace exposes the rest of the 
>> user's processes to a malicious guest, and allows the guest to open 
>> network connections through the host, no?
> The security model we had in mind was that the user who starts the
> userspace program equals root on the guest system but does not equal
> root on the host.
> This way, users are separated by the regular kernel security 
> barriers in the host Linux: the user johndoe can mess 
> with his own virtual machines and other resources, but cannot 
> mess with virtual machines and other resources belonging to other 
> users. If the guest root chooses to be malicious, he might well be able 
> to take over the userspace and mess with whatever the user is 
> allowed to do on the host.
> Frankly, this does not mean we intentionally want to leave the door 
> open for the guest, just that a security bug there would not be an 
> integral security issue for the hosting Linux.

I don't know what your usage model is, but it seems to me that leaving 
the host userspace at the mercy of the guest is a fairly large security 
hole:

- the guest can modify the user's files, and read other users' files
- the guest can access the host's network, possibly bypassing any 
firewalling that is set up for the guest
- the guest can access other virtual machines on the host

So, if the guest is broken into, or if you download an untrusted guest 
image ("virtual appliance"), then potentially large amounts of data are 
at risk, even if you run as a regular user.  Does your usage model allow 
this?

> It looks to me like we have different security models today. If our 
> model of starting guests as a regular user worked on all platforms 
> without a performance penalty, I think it would be worth the 
> extra effort. If not, we could also implement the current kvm 
> security barrier and reduce the complexity of our s390host code.

kvm/x86 also allows running as a regular user.

-- 
error compiling committee.c: too many arguments to function



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH/PFC 0/2] s390 host support
       [not found]                             ` <46348661.6000909-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
  2007-04-29 14:27                               ` Carsten Otte
@ 2007-04-30 14:48                               ` Carsten Otte
       [not found]                                 ` <463601A3.3070206-tA70FqPdS9bQT0dZR+AlfA@public.gmane.org>
  1 sibling, 1 reply; 31+ messages in thread
From: Carsten Otte @ 2007-04-30 14:48 UTC (permalink / raw)
  To: Avi Kivity
  Cc: carsteno-tA70FqPdS9bQT0dZR+AlfA,
	kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
	Christian Borntraeger, mschwid2-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8

Avi Kivity wrote:
> Carsten Otte wrote:
>> No, we did not have the need to do that. Now that you mention it, we'd 
>> want to move interprocessor signal handling into the kernel anyway for 
>> performance reasons. That would raise the need to wake up from the kernel. 
>> The other way round, how do you intend to wake a thread that uses 
>> poll() or similar from userspace?
>>
> 
> Write to a pipe, or send a signal (signals are quite fast if you mask 
> them in userspace and use ppoll()).
Signals have the disadvantage that they wake all guest CPUs (unless 
one dedicates a signal per vcpu, which doesn't scale). I think we need 
a wakeup mechanism that can be used to send an interrupt to a specific 
idle cpu from both kernel and userland. Pipes and poll (one per cpu) 
would allow that, but it seems to me like there must be better options.
After sleeping on it, I think that our idle/wakeup mechanism for 
s390host is a mess. I will try to come up with an idea for this.

 > I don't know what your usage model is, but it seems to me that
 > leaving the host userspace at the mercy of the guest is a fairly
 > large security hole:
 > - the guest can modify the user's files, and read other users' files
 > - the guest can access the host's network, possibly bypassing any
 > firewalling that is set up for the guest
 > - the guest can access other virtual machines on the host
 >
 > So, if the guest is broken into, or if you download an untrusted
 > guest image ("virtual appliance"), then potentially large amounts of
 > data are at risk, even if you run as a regular user.  Does your
 > usage model allow this?
Okay, I am convinced. We need to secure both. That will cause some 
rework of the IO device drivers in our prototype, which today don't 
exactly bother to check input data coming from the guest in userspace.
Also, I am going to figure out why kvm doesn't run as non-root on my 
local x86 installation.

so long,
Carsten


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH/PFC 0/2] s390 host support
       [not found]                                 ` <463601A3.3070206-tA70FqPdS9bQT0dZR+AlfA@public.gmane.org>
@ 2007-04-30 14:56                                   ` Avi Kivity
       [not found]                                     ` <463603B6.3010105-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 31+ messages in thread
From: Avi Kivity @ 2007-04-30 14:56 UTC (permalink / raw)
  To: carsteno-tA70FqPdS9bQT0dZR+AlfA
  Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
	Christian Borntraeger, mschwid2-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8

Carsten Otte wrote:
> Avi Kivity wrote:
>> Carsten Otte wrote:
>>> No, we did not have the need to do that. Now that you mention it, 
>>> we'd want to move interprocessor signal handling into the kernel 
>>> anyway for performance reasons. That would raise the need to wake up 
>>> from the kernel. The other way round, how do you intend to wake a thread 
>>> that uses poll() or similar from userspace?
>>>
>>
>> Write to a pipe, or send a signal (signals are quite fast if you mask 
>> them in userspace and use ppoll()).
> Signals have the disadvantage that they wake all guest CPUs (unless 
> one dedicates a signal per vcpu, which doesn't scale). I think we need 
> a wakeup mechanism that can be used to send an interrupt to a specific 
> idle cpu from both kernel and userland. 

You can send a signal to a specific thread.  See tkill(2).

> Pipes and poll (one per cpu) would allow that, but it seems to me like 
> there must be better options.
> After sleeping on it, I think that our idle/wakeup mechanism for 
> s390host is a mess. I will try to come up with an idea for this.

If the eventfd patchset is merged, then file descriptors will become the 
standard Linux handle type, and poll (or rather, epoll) will become the 
standard way of waiting for something to happen.  But of course if you 
come up with something better we'll use that.

Having an fd will also simplify the kernel/userspace communication area 
setup (current kvm does a similar thing).

>
> Also, I am going to figure out why kvm doesn't run as non-root on my 
> local x86 installation.
>

chmod 0666 /dev/kvm should be enough, if you don't use bridged networking.

(the long way is to add a kvm group, add yourself to that, and add a 
udev rule setting /dev/kvm's group to kvm).

-- 
error compiling committee.c: too many arguments to function



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH/PFC 0/2] s390 host support
  2007-04-29  8:45           ` Avi Kivity
@ 2007-04-30 18:58             ` Hollis Blanchard
       [not found]               ` <pan.2007.04.30.18.58.56.432063-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 31+ messages in thread
From: Hollis Blanchard @ 2007-04-30 18:58 UTC (permalink / raw)
  To: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

On Sun, 29 Apr 2007 11:45:33 +0300, Avi Kivity wrote:

> Heiko Carstens wrote:
>>>> We intend to move to a common arch-independent kernel interface and
>>>> userspace with kvm.
>>>>       
>>> The address space and vcpu management are rather different from kvm's,
>>> however your approach is better and we'll want to move kvm in your
>>> direction rather than the other way round (specifically the tight vcpu
>>> <-> task coupling; mmu is more difficult).
>>>     
>>>     
>> How do we continue from here? Adding new architectures to the ioctl
>> based approach or changing kvm to a syscall interface?
> 
> I think we can start the syscall based API (with compatibility ioctls
> for x86),  now that we have all four archs looking at it.

It would probably make sense for the IA64 and S390 folks, who already have
syscall-based implementations, to put up a straw man interface for comment?

I was looking at refactoring the ioctl interface, but since we're dropping
it then I'm glad I haven't put too much time into it. :)

-- 
Hollis Blanchard
IBM Linux Technology Center




^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH/PFC 0/2] s390 host support
       [not found]               ` <pan.2007.04.30.18.58.56.432063-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
@ 2007-05-01  6:43                 ` Avi Kivity
  2007-05-01 14:53                   ` Hollis Blanchard
  0 siblings, 1 reply; 31+ messages in thread
From: Avi Kivity @ 2007-05-01  6:43 UTC (permalink / raw)
  To: Hollis Blanchard; +Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

Hollis Blanchard wrote:
>>
>> I think we can start the syscall based API (with compatibility ioctls
>> for x86),  now that we have all four archs looking at it.
>>     
>
> It would probably make sense for the IA64 and S390 folks, who already have
> syscall-based implementations, to put up a straw man interface for comment?
>   

Yes please.

> I was looking at refactoring the ioctl interface, but since we're dropping
> it then I'm glad I haven't put too much time into it. :)
>   

Well, that's needed anyway.  The ioctl interface isn't going away.

-- 
error compiling committee.c: too many arguments to function



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH/PFC 0/2] s390 host support
  2007-05-01  6:43                 ` Avi Kivity
@ 2007-05-01 14:53                   ` Hollis Blanchard
       [not found]                     ` <pan.2007.05.01.14.53.20.257696-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 31+ messages in thread
From: Hollis Blanchard @ 2007-05-01 14:53 UTC (permalink / raw)
  To: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

On Tue, 01 May 2007 09:43:57 +0300, Avi Kivity wrote:

> Hollis Blanchard wrote:
>>>
>>> I think we can start the syscall based API (with compatibility ioctls
>>> for x86),  now that we have all four archs looking at it.
>> 
>> I was looking at refactoring the ioctl interface, but since we're
>> dropping it then I'm glad I haven't put too much time into it. :)
>>      
> Well, that's needed anyway.  The ioctl interface isn't going away.

Maybe I misunderstood. When you said this:
> I think we can start the syscall based API (with compatibility ioctls
> for x86),  now that we have all four archs looking at it.

I thought you meant that all architectures would use the syscall
interface, and x86 would only continue to support the ioctls as a legacy
interface. In that case, I think kvm_main.c would basically need a rename
to kvm_ioctl.c, and it would be built only for x86 so wouldn't need any
portability.

-- 
Hollis Blanchard
IBM Linux Technology Center




^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH/PFC 0/2] s390 host support
       [not found]                     ` <pan.2007.05.01.14.53.20.257696-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
@ 2007-05-01 14:57                       ` Avi Kivity
  0 siblings, 0 replies; 31+ messages in thread
From: Avi Kivity @ 2007-05-01 14:57 UTC (permalink / raw)
  To: Hollis Blanchard; +Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

Hollis Blanchard wrote:
> On Tue, 01 May 2007 09:43:57 +0300, Avi Kivity wrote:
>
>   
>> Hollis Blanchard wrote:
>>     
>>>> I think we can start the syscall based API (with compatibility ioctls
>>>> for x86),  now that we have all four archs looking at it.
>>>>         
>>> I was looking at refactoring the ioctl interface, but since we're
>>> dropping it then I'm glad I haven't put too much time into it. :)
>>>      
>>>       
>> Well, that's needed anyway.  The ioctl interface isn't going away.
>>     
>
> Maybe I misunderstood. When you said this:
>   
>> I think we can start the syscall based API (with compatibility ioctls
>> for x86),  now that we have all four archs looking at it.
>>     
>
> I thought you meant that all architectures would use the syscall
> interface, and x86 would only continue to support the ioctls as a legacy
> interface. In that case, I think kvm_main.c would basically need a rename
> to kvm_ioctl.c, and it would be built only for x86 so wouldn't need any
> portability.
>   

That is what I meant.  But we wouldn't want to duplicate all that code, 
would we?  The ioctl interface and the syscall interface have to call 
the same internal API, so at least some refactoring is needed.

-- 
error compiling committee.c: too many arguments to function



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH/PFC 0/2] s390 host support
       [not found]                                     ` <463603B6.3010105-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
@ 2007-05-14 14:17                                       ` Carsten Otte
       [not found]                                         ` <46486F89.3080609-tA70FqPdS9bQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 31+ messages in thread
From: Carsten Otte @ 2007-05-14 14:17 UTC (permalink / raw)
  To: Avi Kivity
  Cc: carsteno-tA70FqPdS9bQT0dZR+AlfA,
	kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
	Christian Borntraeger, mschwid2-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8

Avi Kivity wrote:
> If the eventfd patchset is merged, then file descriptors will become the 
> standard Linux handle type, and poll (or rather, epoll) will become the 
> standard way of waiting for something to happen.  But of course if you 
> come up with something better we'll use that.
Triggered by this discussion, I have spent quite some time thinking 
about signaling, idle cpus, interrupts, signal processor (read: IPI) 
and such lately. It has become clear to me that sleeping in userspace 
has been a bad design point I've made. The way kvm deals with this 
(idle cpu thread sleeps interruptible in kernel) is clearly preferable.

Our sie system call has quite a complex userspace interface that 
allows the user to modify various bits and pieces of our virtual cpu 
control block. All this is needed only because we do a lot of wrong 
things in userspace, like signal processor (read: IPI). I will go 
ahead and put these things into our kernel module. That should 
simplify our user<->kernel interface a lot.
One problem is that we need to inject interrupts from userland. This 
requires waking up idle CPUs. I want to see how it works out with a 
new system call for irqs rather than using tkill(). We could have the 
kernel choose the vcpu that is enabled for this interrupt, for example. 
And the kernel can do optimizations like preferably sending irqs to 
idle cpus. The user could supply a CPU mask that specifies which CPUs 
are candidates for the irq.

Another neat advantage of moving SIE specifics into the kernel module 
is that our userspace will be left with device drivers only. We can 
then put those into kvm/qemu or switch to other paravirtual device 
drivers and discard our userspace code.

I believe once we've changed that, merging with kvm on both kernel and 
user side should become easier than it is today.

so long,
Carsten


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH/PFC 0/2] s390 host support
       [not found]                                         ` <46486F89.3080609-tA70FqPdS9bQT0dZR+AlfA@public.gmane.org>
@ 2007-05-14 14:50                                           ` Avi Kivity
       [not found]                                             ` <4648774E.2060304-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 31+ messages in thread
From: Avi Kivity @ 2007-05-14 14:50 UTC (permalink / raw)
  To: carsteno-tA70FqPdS9bQT0dZR+AlfA
  Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
	Christian Borntraeger, mschwid2-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8

Carsten Otte wrote:
> Avi Kivity wrote:
>> If the eventfd patchset is merged, then file descriptors will become 
>> the standard Linux handle type, and poll (or rather, epoll) will 
>> become the standard way of waiting for something to happen.  But of 
>> course if you come up with something better we'll use that.
> Triggered by this discussion, I have spent quite some time thinking 
> about signaling, idle cpus, interrupts, signal processor (read: IPI) 
> and such lately. It has become clear to me that sleeping in userspace 
> has been a bad design point I've made. The way kvm deals with this 
> (idle cpu thread sleeps interruptible in kernel) is clearly preferable.

kvm doesn't do this directly.  A hlt instruction (which is used on 
x86 to signal an idle cpu) is trapped and echoed to userspace, which 
then sleeps using select(2).

We thought of having hlt sleep in the kernel, but that meant that we 
would need to specify the exit conditions from sleep (signals, fd 
readiness, aio readiness).

(I think you're comparing to your pthread way of sleeping and waking, 
just making sure we're all on the same page here)



-- 
error compiling committee.c: too many arguments to function



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH/PFC 0/2] s390 host support
       [not found]                                             ` <4648774E.2060304-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
@ 2007-05-14 15:26                                               ` Carsten Otte
       [not found]                                                 ` <46487FA5.4090905-tA70FqPdS9bQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 31+ messages in thread
From: Carsten Otte @ 2007-05-14 15:26 UTC (permalink / raw)
  To: Avi Kivity
  Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
	Christian Borntraeger, carsteno-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	mschwid2-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8

Avi Kivity wrote:
> kvm doesn't do this directly.  A hlt instruction (which is used on 
> x86 to signal an idle cpu) is trapped and echoed to userspace, which 
> then sleeps using select(2).
I've read that part. I still don't like this approach; it just doesn't 
fit our signal processor instruction without interesting race conditions.
Our SIGP instruction does provide a condition code on the source cpu 
which indicates if the interprocessor signal was accepted by the 
target cpu.
- When the target CPU is going idle, but has not yet called signal(), 
how can we figure out from kernel space whether it has masked this interrupt? 
We would want to figure that out quickly, to be able to reenter VM context 
on the initiating CPU asap.
- Also, this requires synchronization: our architecture allows at most 
one external interrupt pending per target CPU at a given time. 
How do we synchronize if both user and kernel can inject interrupts?

> We thought of having hlt sleep in the kernel, but that meant that we 
> would need to specify the exit conditions from sleep (signals, fd 
> readiness, aio readiness).
Yes, that is required indeed. I think pending signals should make the
syscall exit. AIO translates to SIGIO, and file descriptors should be 
checked by another pthread via poll.

> (I think you're comparing to your pthread way of sleeping and waking, 
> just making sure we're all on the same page here)
Yes we are. Sorry for confusion.

so long,
Carsten


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH/PFC 0/2] s390 host support
       [not found]                                                 ` <46487FA5.4090905-tA70FqPdS9bQT0dZR+AlfA@public.gmane.org>
@ 2007-05-14 15:29                                                   ` Carsten Otte
       [not found]                                                     ` <46488047.8090404-tA70FqPdS9bQT0dZR+AlfA@public.gmane.org>
  2007-05-14 15:53                                                   ` Avi Kivity
  1 sibling, 1 reply; 31+ messages in thread
From: Carsten Otte @ 2007-05-14 15:29 UTC (permalink / raw)
  To: carsteno-tA70FqPdS9bQT0dZR+AlfA
  Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
	Christian Borntraeger, carsteno-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	mschwid2-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8

Carsten Otte wrote:
> - When the target CPU is going idle, but has not yet called signal(), 
> how can we determine from kernel space whether it has masked this interrupt?
*Ouch*. Should be select(), not signal().

* Re: [PATCH/PFC 0/2] s390 host support
       [not found]                                                 ` <46487FA5.4090905-tA70FqPdS9bQT0dZR+AlfA@public.gmane.org>
  2007-05-14 15:29                                                   ` Carsten Otte
@ 2007-05-14 15:53                                                   ` Avi Kivity
  1 sibling, 0 replies; 31+ messages in thread
From: Avi Kivity @ 2007-05-14 15:53 UTC (permalink / raw)
  To: carsteno-tA70FqPdS9bQT0dZR+AlfA
  Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
	Christian Borntraeger, carsteno-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	mschwid2-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8

Carsten Otte wrote:
> Avi Kivity wrote:
>> kvm doesn't do this directly.  A hlt instruction (which is used on 
>> x86 to signal an idle cpu) is trapped and echoed to userspace, which 
>> then sleeps using select(2).
> I've read that part. I still don't like this approach; it just doesn't 
> fit our signal processor instruction without interesting race 
> conditions.

x86 and s390 don't have to be the same when hardware differences 
warrant.  But see below.
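The x86 flow described above (hlt trapped, reported to userspace, userspace sleeps in select(2)) might look roughly like this on the userspace side. enum exit_reason, EXIT_HLT, handle_exit() and sleep_on_hlt() are hypothetical stand-ins for illustration; this is not the actual kvm userspace API.

```c
#include <assert.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical exit reasons reported by the kernel when the vcpu stops. */
enum exit_reason { EXIT_IO, EXIT_HLT };

/* Guest executed hlt: sleep in select(2) until one of the device fds is
   readable. Returns the first ready fd, or -1 on error. */
static int sleep_on_hlt(const int *fds, int nfds)
{
    fd_set rd;
    int maxfd = -1;

    FD_ZERO(&rd);
    for (int i = 0; i < nfds; i++) {
        FD_SET(fds[i], &rd);
        if (fds[i] > maxfd)
            maxfd = fds[i];
    }
    if (select(maxfd + 1, &rd, NULL, NULL, NULL) < 0)
        return -1;
    for (int i = 0; i < nfds; i++)
        if (FD_ISSET(fds[i], &rd))
            return fds[i];
    return -1;
}

/* Userspace exit handler: only the hlt path is sketched here. */
static int handle_exit(enum exit_reason why, const int *fds, int nfds)
{
    if (why == EXIT_HLT)
        return sleep_on_hlt(fds, nfds);
    return -1;   /* other exit reasons are out of scope for this sketch */
}
```

After select() returns, userspace would service the ready device and re-enter the guest, which is where the x86 design and an s390 SIGP-driven wakeup start to differ.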

> Our SIGP instruction does provide a condition code on the source cpu 
> which indicates if the interprocessor signal was accepted by the 
> target cpu.
> - When the target CPU is going idle, but has not yet called signal(), 
> how can we determine from kernel space whether it has masked this interrupt? 
> We want to determine that quickly so we can re-enter VM context on 
> the initiating CPU as soon as possible.

I don't understand the context here.  Why would the target cpu call 
signal()?

> - This also requires synchronization: our architecture allows at most 
> one external interrupt pending per target CPU at any given time. 
> How do we synchronize if both userspace and the kernel can inject interrupts?

With the code that's going into kvm now, userspace posts the interrupt 
to the kernel, and the kernel injects it.  So the kernel is the 
synchronization point (x86 has the same constraint).
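A sketch of the kernel acting as the single injection point under the one-pending-external-interrupt rule, assuming C11 atomics. The accepted/busy result mirrors the idea of SIGP telling the source CPU whether the target took the signal; struct vcpu, inject_ext(), deliver_ext() and the two-value result are hypothetical and do not reproduce SIGP's real condition-code encoding.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical result reported back to the injecting (source) side. */
enum inject_result { INJECT_ACCEPTED, INJECT_BUSY };

struct vcpu {
    /* 0 = no external interrupt pending; otherwise a nonzero code. */
    atomic_uint pending_ext;
};

/* Single injection path for requests from both kernel and userspace:
   the compare-and-swap only succeeds if the slot is empty, so at most
   one external interrupt can be pending per target vcpu. */
static enum inject_result inject_ext(struct vcpu *target, unsigned code)
{
    unsigned expected = 0;
    if (atomic_compare_exchange_strong(&target->pending_ext, &expected, code))
        return INJECT_ACCEPTED;
    return INJECT_BUSY;
}

/* Called on the target vcpu when it delivers the interrupt to the guest;
   emptying the slot lets the next injection be accepted. */
static unsigned deliver_ext(struct vcpu *v)
{
    return atomic_exchange(&v->pending_ext, 0);
}
```

Because every injector goes through the same atomic slot, the source learns immediately whether its interrupt was accepted, without needing to know whether the target is running, idle, or in userspace.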

>
>> We thought of having hlt sleep in the kernel, but that meant that we 
>> would need to specify the exit conditions from sleep (signals, fd 
>> readiness, aio readiness).
> Yes, that is required indeed. I think pending signals should make the
> syscall exit. AIO translates to SIGIO, and file descriptors should be 
> checked by another pthread via poll.
>

Currently qemu multiplexes fd readiness and vcpu execution on the same 
(and only) thread, but it may make sense to have completions reaped by 
an I/O thread, which then dispatches interrupts to the appropriate vcpu, 
if necessary.  That avoids unnecessary exits, especially if we have 
interrupt mitigation and guest smp.
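A minimal sketch of the I/O-thread arrangement suggested above, assuming POSIX threads: one thread waits for a completion with poll(2), reaps it, and posts an interrupt to a chosen vcpu's mailbox, while the idle vcpu sleeps on a condition variable until something is posted. struct vcpu_mailbox, io_thread(), vcpu_wait_for_irq() and demo_dispatch() are hypothetical names, and a pipe stands in for a real completion source.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <poll.h>
#include <pthread.h>
#include <unistd.h>

/* Hypothetical per-vcpu mailbox: the I/O thread posts, the vcpu consumes. */
struct vcpu_mailbox {
    pthread_mutex_t lock;
    pthread_cond_t  wake;
    unsigned        irq;          /* 0 = nothing pending */
};

struct io_thread_args {
    int completion_fd;            /* becomes readable when I/O completes */
    struct vcpu_mailbox *target;  /* vcpu chosen to take the interrupt */
    unsigned irq;                 /* interrupt to raise on completion */
};

/* I/O thread: wait for a completion, reap it, then raise the interrupt. */
static void *io_thread(void *p)
{
    struct io_thread_args *a = p;
    struct pollfd pfd = { .fd = a->completion_fd, .events = POLLIN };
    char buf;

    if (poll(&pfd, 1, -1) == 1 && read(a->completion_fd, &buf, 1) == 1) {
        pthread_mutex_lock(&a->target->lock);
        a->target->irq = a->irq;
        pthread_cond_signal(&a->target->wake);
        pthread_mutex_unlock(&a->target->lock);
    }
    return NULL;
}

/* Idle vcpu: sleep until an interrupt is posted, then take it. */
static unsigned vcpu_wait_for_irq(struct vcpu_mailbox *m)
{
    unsigned irq;

    pthread_mutex_lock(&m->lock);
    while (m->irq == 0)
        pthread_cond_wait(&m->wake, &m->lock);
    irq = m->irq;
    m->irq = 0;
    pthread_mutex_unlock(&m->lock);
    return irq;
}

/* Demo: one completion on a pipe is routed to the vcpu as 'irq'. */
static unsigned demo_dispatch(unsigned irq)
{
    struct vcpu_mailbox m = { PTHREAD_MUTEX_INITIALIZER,
                              PTHREAD_COND_INITIALIZER, 0 };
    struct io_thread_args a;
    pthread_t tid;
    unsigned got;
    ssize_t n;
    int p[2];

    if (pipe(p) != 0)
        return 0;
    a.completion_fd = p[0];
    a.target = &m;
    a.irq = irq;
    pthread_create(&tid, NULL, io_thread, &a);
    n = write(p[1], "c", 1);      /* simulate an I/O completion */
    (void)n;
    got = vcpu_wait_for_irq(&m);
    pthread_join(tid, NULL);
    close(p[0]);
    close(p[1]);
    return got;
}
```

The vcpu never touches the fd set here; only the I/O thread does, which is what keeps fd readiness from forcing exits on every vcpu.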

-- 
error compiling committee.c: too many arguments to function

* Re: [PATCH/PFC 0/2] s390 host support
       [not found]                                                     ` <46488047.8090404-tA70FqPdS9bQT0dZR+AlfA@public.gmane.org>
@ 2007-05-14 15:55                                                       ` Avi Kivity
  0 siblings, 0 replies; 31+ messages in thread
From: Avi Kivity @ 2007-05-14 15:55 UTC (permalink / raw)
  To: carsteno-tA70FqPdS9bQT0dZR+AlfA
  Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
	Christian Borntraeger, carsteno-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	mschwid2-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8

Carsten Otte wrote:
> Carsten Otte wrote:
>> - When the target CPU is going idle, but has not yet called signal(), 
>> how can we determine from kernel space whether it has masked this interrupt?
> *Ouch*. Should be select(), not signal().

Ah ok.  The kernel signal (or fd readiness) logic takes care of this and 
avoids unnecessary wakeups.
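One userspace idiom that relies on this kernel behavior is pselect(2): the wake-up signal stays blocked while the idle decision is made, and pselect() unblocks it and sleeps in one atomic step, so a signal that arrives between the check and the sleep interrupts the sleep instead of being lost. A minimal sketch; KICK_SIGNAL, kicked, install_kick_handler() and idle_until_kick() are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical signal meaning "an interrupt is pending for this vcpu". */
#define KICK_SIGNAL SIGUSR1

static volatile sig_atomic_t kicked;

static void on_kick(int sig) { (void)sig; kicked = 1; }

static void install_kick_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_kick;
    sigemptyset(&sa.sa_mask);
    sigaction(KICK_SIGNAL, &sa, NULL);
}

/* Go idle without losing a kick that races with the decision to sleep.
   Returns 1 if woken by a kick, 0 if 'fd' became readable, -1 on error. */
static int idle_until_kick(int fd)
{
    sigset_t block, orig, sleep_mask;
    fd_set rd;
    int rc;

    /* 1. Block the kick while deciding whether to sleep. */
    sigemptyset(&block);
    sigaddset(&block, KICK_SIGNAL);
    sigprocmask(SIG_BLOCK, &block, &orig);

    /* 2. Check for work; a kick arriving from here on stays pending. */
    if (kicked) {
        rc = 1;
    } else {
        /* 3. Unblock the kick and sleep atomically. */
        sleep_mask = orig;
        sigdelset(&sleep_mask, KICK_SIGNAL);
        FD_ZERO(&rd);
        FD_SET(fd, &rd);
        rc = pselect(fd + 1, &rd, NULL, NULL, NULL, &sleep_mask);
        if (rc > 0)
            rc = 0;
        else if (rc < 0 && errno == EINTR)
            rc = kicked ? 1 : -1;   /* only our handler counts as a kick */
        else
            rc = -1;
    }
    if (rc == 1)
        kicked = 0;   /* consume the kick while it is still blocked */
    sigprocmask(SIG_SETMASK, &orig, NULL);
    return rc;
}
```

With plain select() instead, a kick handled between step 2 and the call would leave the thread asleep until some unrelated event arrived.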

-- 
error compiling committee.c: too many arguments to function

Thread overview: 31+ messages
2007-04-27 13:40 [PATCH/PFC 0/2] s390 host support Carsten Otte
2007-04-27 16:19 ` Hollis Blanchard
     [not found]   ` <pan.2007.04.27.16.18.10.889473-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2007-04-27 19:58     ` Carsten Otte
     [not found]       ` <463255F3.2000500-tA70FqPdS9bQT0dZR+AlfA@public.gmane.org>
2007-04-27 22:34         ` Dong, Eddie
2007-04-29  8:09     ` Heiko Carstens
     [not found] ` <1177681224.5770.20.camel-WIxn4w2hgUz3YA32ykw5MLlKpX0K8NHHQQ4Iyu8u01E@public.gmane.org>
2007-04-27 15:14   ` Carsten Otte
2007-04-28  6:27   ` Avi Kivity
     [not found]     ` <4632E94C.20904-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-04-28  8:45       ` Carsten Otte
     [not found]         ` <4633099D.3020709-tA70FqPdS9bQT0dZR+AlfA@public.gmane.org>
2007-04-29  9:13           ` Avi Kivity
     [not found]             ` <463461B1.7060406-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-04-29 10:24               ` Carsten Otte
     [not found]                 ` <4634726F.10705-tA70FqPdS9bQT0dZR+AlfA@public.gmane.org>
2007-04-29 10:48                   ` Avi Kivity
     [not found]                     ` <463477EE.3000406-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-04-29 11:15                       ` Carsten Otte
     [not found]                         ` <46347E6D.90409-tA70FqPdS9bQT0dZR+AlfA@public.gmane.org>
2007-04-29 11:49                           ` Avi Kivity
     [not found]                             ` <46348661.6000909-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-04-29 14:27                               ` Carsten Otte
     [not found]                                 ` <4634AB6C.4020901-tA70FqPdS9bQT0dZR+AlfA@public.gmane.org>
2007-04-29 15:06                                   ` Avi Kivity
2007-04-30 14:48                               ` Carsten Otte
     [not found]                                 ` <463601A3.3070206-tA70FqPdS9bQT0dZR+AlfA@public.gmane.org>
2007-04-30 14:56                                   ` Avi Kivity
     [not found]                                     ` <463603B6.3010105-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-05-14 14:17                                       ` Carsten Otte
     [not found]                                         ` <46486F89.3080609-tA70FqPdS9bQT0dZR+AlfA@public.gmane.org>
2007-05-14 14:50                                           ` Avi Kivity
     [not found]                                             ` <4648774E.2060304-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-05-14 15:26                                               ` Carsten Otte
     [not found]                                                 ` <46487FA5.4090905-tA70FqPdS9bQT0dZR+AlfA@public.gmane.org>
2007-05-14 15:29                                                   ` Carsten Otte
     [not found]                                                     ` <46488047.8090404-tA70FqPdS9bQT0dZR+AlfA@public.gmane.org>
2007-05-14 15:55                                                       ` Avi Kivity
2007-05-14 15:53                                                   ` Avi Kivity
2007-04-29 12:13                       ` Heiko Carstens
     [not found]                         ` <20070429121351.GA8254-5VkHqLvV2o3MbYB6QlFGEg@public.gmane.org>
2007-04-29 12:27                           ` Avi Kivity
2007-04-29  8:11       ` Heiko Carstens
     [not found]         ` <20070429081157.GC8332-5VkHqLvV2o3MbYB6QlFGEg@public.gmane.org>
2007-04-29  8:45           ` Avi Kivity
2007-04-30 18:58             ` Hollis Blanchard
     [not found]               ` <pan.2007.04.30.18.58.56.432063-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2007-05-01  6:43                 ` Avi Kivity
2007-05-01 14:53                   ` Hollis Blanchard
     [not found]                     ` <pan.2007.05.01.14.53.20.257696-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2007-05-01 14:57                       ` Avi Kivity