All of lore.kernel.org
 help / color / mirror / Atom feed
* [ARM] Native application design and discussion (I hope)
@ 2017-04-06 20:21 Volodymyr Babchuk
  2017-04-06 21:31 ` Stefano Stabellini
  0 siblings, 1 reply; 82+ messages in thread
From: Volodymyr Babchuk @ 2017-04-06 20:21 UTC (permalink / raw)
  To: Xen Devel, Artem Mygaiev, Julien Grall, Stefano Stabellini

Hello all,

I want to discuss EL0 (native) applications for XEN. This will be relatively
long e-mail with requirements, proposed design and my PoC results.

So, why we want XEN native applications in the first place? I see the following
reasons:

1. Isolation. I see XEN as a sort of micro-kernel, so there are no place for
device drivers, emulators, specific SMC handlers, hypervisor extension, etc..

2. Modularity. Just look at Linux kernel. Obviously, for different devices we
can load different drivers.

3. Performance. Native application should be faster than stub domain, or there
will be no sense in it.

4. Ease of use. I want to make call to EL0 app as easy as possible.
Ideally - as a function call.

Actually, no one wants extra code in hypervisor, so reasons (1) and (2) are most
important. I know that there was tries to do such thing in x86 but with
different approach. I want describe my idea for arm64.

Native application is an another domain type. It has own vCPU (only one at this
moment) Native app is loaded as any other kernel, using ELF loader.
It looks like another stub-domain such as MiniOS, but there are two big
differences:

1. MiniOS has event loop that serves requests from hypervisor. Native
application does not has such loop. It has one entry point where you jump every
time when you need something from it.

2. Native application runs in EL0 mode, so it does not have access to MMU,
it can't handle vIQRs, exceptions and so on. XEN does all this for it.

You can find example native application at [1]. I used exactly this one to
benchmark my implementation. Mostly it is inspired by approach used in TEE.
Actually, I took some code directly from OP-TEE Trusted Application library.
In app_entry.c you can find entry point - __app_entry(). It takes function
number and some parameters that will be passed to a function. I probably going
to change ABI a bit, but basic idea will be the same.

Function number will be something like APP_INIT, APP_HANDLE_SMC
or APP_HANDLE_MMIO... I think you got the idea. I also implemented two syscalls
(via old plain SVC instruction). app_log() writes to XEN log, app_return() exits
from application back to hypervisor. We will need other syscalls like
app_call_smc(), app_map_guest_page(), app_map_io(), etc.

Now, back to XEN. Classic way to handle something with stubdomain is to
write request to a ring buffer, fire an event through event channel, that will
trigger vIRQ in stubdomain and stubdomain's vCPU will be scheduled to handle
a request. Problem it that you can't control scheduler, so you don't know
when your request will be really handled, which in not fine in some
embedded use cases.

There is how I see handling requests with native application:

0. Hypervisor pauses requester vCPU
1. Hypervisor either passes parameters via registers or writes request to a
shared page/ring buffer.
2. Then in sets PC of native app vCPU to entry point and initializes r0-r7
with function code and other parameters.
3. Hypervisor switches context to native app vCPU
4. When native app finishes request handling it calls special syscall app_exit()
5. Hypervisor analyses return code, updates requester vCPU state (if needed),
switches back to that vCPU, unpauses it.

Most of that was done at [2]. Most interesting part is in arch/arm/domain.c
There are functions call_el0_app() and return_from_el0_app() that do most
of the work. Also I have added syscalls handlers (in the same way,
as hypercalls are handled). You can find them at xen/arch/arm/syscall.c

At this moment entry point is hardcoded and you need to update it every time
you rebuild native application. Also there are no actual parameters passed.
Also, whole code is a piece of gosa, because it was first time I hacked XEN.

I don't want to repeat benchmark results, because they already was posted in ML.
You can find them at [3].

I understand that I have missed many things:

1. How to ship and load native app, because some of them will be needed even
before dom0 is created.

2. How to distinguish multiple native apps

3. Concurrency in native apps

4. How to restart misbehaved apps.

But at this moment I want to discuss basic approach. If there are will be no
objections against basic concept, then we can develop details.

[1] https://github.com/lorc/xen_app_stub - native app
[2] https://github.com/lorc/xen/tree/el0_app - my branch with PoC
[3] http://marc.info/?l=xen-devel&m=149088856116797&w=2 - benchmark results


-- 
WBR Volodymyr Babchuk aka lorc [+380976646013]
mailto: vlad.babchuk@gmail.com

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [ARM] Native application design and discussion (I hope)
  2017-04-06 20:21 [ARM] Native application design and discussion (I hope) Volodymyr Babchuk
@ 2017-04-06 21:31 ` Stefano Stabellini
  2017-04-07 11:03   ` Volodymyr Babchuk
  0 siblings, 1 reply; 82+ messages in thread
From: Stefano Stabellini @ 2017-04-06 21:31 UTC (permalink / raw)
  To: Volodymyr Babchuk
  Cc: Julien Grall, Artem Mygaiev, Stefano Stabellini, Xen Devel

On Thu, 6 Apr 2017, Volodymyr Babchuk wrote:
> Hello all,
> 
> I want to discuss EL0 (native) applications for XEN. This will be relatively
> long e-mail with requirements, proposed design and my PoC results.
> 
> So, why we want XEN native applications in the first place? I see the following
> reasons:
> 
> 1. Isolation. I see XEN as a sort of micro-kernel, so there are no place for
> device drivers, emulators, specific SMC handlers, hypervisor extension, etc..
> 
> 2. Modularity. Just look at Linux kernel. Obviously, for different devices we
> can load different drivers.
> 
> 3. Performance. Native application should be faster than stub domain, or there
> will be no sense in it.
> 
> 4. Ease of use. I want to make call to EL0 app as easy as possible.
> Ideally - as a function call.
> 
> Actually, no one wants extra code in hypervisor, so reasons (1) and (2) are most
> important. I know that there was tries to do such thing in x86 but with
> different approach. I want describe my idea for arm64.
> 
> Native application is an another domain type. It has own vCPU (only one at this
> moment) Native app is loaded as any other kernel, using ELF loader.
> It looks like another stub-domain such as MiniOS, but there are two big
> differences:

Could you describe the reason why you are suggesting it? Unless strictly
necessary, I wouldn't go down the vcpu route, because as soon as we
bring a vcpu into the picture, we have a number of problems, including
scheduling, affinity, etc. It is also user-visible (xl vcpu-list) which
I don't think it should be.

I understand that one of the goals is "Modularity", which makes us think
of an ELF loader, such as the one for a new domain. I agree that
modularity is important, but I would solve it as a second step. In first
instance, I would limit the scope to run some code under
/xen/arch/arm/apps or, better, /apps (for example) in a lower privilege
mode. After that is done and working, I am sure we can find a way to
dynamically load more apps at run time.

 
> 1. MiniOS has event loop that serves requests from hypervisor. Native
> application does not has such loop. It has one entry point where you jump every
> time when you need something from it.
> 
> 2. Native application runs in EL0 mode, so it does not have access to MMU,
> it can't handle vIQRs, exceptions and so on. XEN does all this for it.
>
> You can find example native application at [1]. I used exactly this one to
> benchmark my implementation. Mostly it is inspired by approach used in TEE.
> Actually, I took some code directly from OP-TEE Trusted Application library.
> In app_entry.c you can find entry point - __app_entry(). It takes function
> number and some parameters that will be passed to a function. I probably going
> to change ABI a bit, but basic idea will be the same.
> 
> Function number will be something like APP_INIT, APP_HANDLE_SMC
> or APP_HANDLE_MMIO... I think you got the idea. I also implemented two syscalls
> (via old plain SVC instruction). app_log() writes to XEN log, app_return() exits
> from application back to hypervisor. We will need other syscalls like
> app_call_smc(), app_map_guest_page(), app_map_io(), etc.
> 
> Now, back to XEN. Classic way to handle something with stubdomain is to
> write request to a ring buffer, fire an event through event channel, that will
> trigger vIRQ in stubdomain and stubdomain's vCPU will be scheduled to handle
> a request. Problem it that you can't control scheduler, so you don't know
> when your request will be really handled, which in not fine in some
> embedded use cases.
> 
> There is how I see handling requests with native application:
> 
> 0. Hypervisor pauses requester vCPU
> 1. Hypervisor either passes parameters via registers or writes request to a
> shared page/ring buffer.
> 2. Then in sets PC of native app vCPU to entry point and initializes r0-r7
> with function code and other parameters.
> 3. Hypervisor switches context to native app vCPU
> 4. When native app finishes request handling it calls special syscall app_exit()
> 5. Hypervisor analyses return code, updates requester vCPU state (if needed),
> switches back to that vCPU, unpauses it.
> 
> Most of that was done at [2]. Most interesting part is in arch/arm/domain.c
> There are functions call_el0_app() and return_from_el0_app() that do most
> of the work. Also I have added syscalls handlers (in the same way,
> as hypercalls are handled). You can find them at xen/arch/arm/syscall.c

This workflow is actually kind of OK. I would not use the term "vcpu"
for anything related to an el0 app. Maybe we need to introduce a new
concept, such as "app_context" for example. But I would not want to
confuse "vcpu" which is the runtime environment exposed to guests, with
the el0 Xen context.

A vcpu is expected to be running simultenously with other vcpus of the
same domain or different domains. The scheduler is expected to choose
when it is supposed to be running. On the other end, an el0 app runs to
handle/emulate a single request from a guest vcpu, which will be paused
until the el0 app finishes. After that, the guest vcpu will resume.


> At this moment entry point is hardcoded and you need to update it every time
> you rebuild native application. Also there are no actual parameters passed.
> Also, whole code is a piece of gosa, because it was first time I hacked XEN.

:-)
I would start by introducing a proper way to pass parameters and return
values.


> I don't want to repeat benchmark results, because they already was posted in ML.
> You can find them at [3].
> 
> I understand that I have missed many things:
> 
> 1. How to ship and load native app, because some of them will be needed even
> before dom0 is created.

I envision something like Linux's insmod, but I suggest postponing this
problem. At the moment, it would be fine to assume that all apps need to
be built statically and cannot be loaded at runtime.


> 2. How to distinguish multiple native apps

Each apps need to specify a range of MMIO/SMC handlers. Xen will invoke
the right one.


> 3. Concurrency in native apps

This is an interesting problem: what do we do if two guest vcpus make
simultenous requests that need to be handled by the same app?
Technically, we could run the same app twice on two different pcpus
simultenously. But then, the apps would need to be able to cope with
concurrency (spin_locks, etc.) From Xen point of view, it should be OK
though.


> 4. How to restart misbehaved apps.

A related question is the following: do we expect to allocate each app
once at boot or once per guest? Apps need to have some per-domain
context, but it could be passed from Xen to the app on a shared page,
possibly reducing the need for allocating the same app once per guest? 


> But at this moment I want to discuss basic approach. If there are will be no
> objections against basic concept, then we can develop details.
> 
> [1] https://github.com/lorc/xen_app_stub - native app
> [2] https://github.com/lorc/xen/tree/el0_app - my branch with PoC
> [3] http://marc.info/?l=xen-devel&m=149088856116797&w=2 - benchmark results

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [ARM] Native application design and discussion (I hope)
  2017-04-06 21:31 ` Stefano Stabellini
@ 2017-04-07 11:03   ` Volodymyr Babchuk
  2017-04-07 23:36     ` Stefano Stabellini
  0 siblings, 1 reply; 82+ messages in thread
From: Volodymyr Babchuk @ 2017-04-07 11:03 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: Julien Grall, Artem Mygaiev, Xen Devel

Hi Stefano,


On 7 April 2017 at 00:31, Stefano Stabellini <sstabellini@kernel.org> wrote:
> On Thu, 6 Apr 2017, Volodymyr Babchuk wrote:
>> Hello all,
>>
>> I want to discuss EL0 (native) applications for XEN. This will be relatively
>> long e-mail with requirements, proposed design and my PoC results.
>>
>> So, why we want XEN native applications in the first place? I see the following
>> reasons:
>>
>> 1. Isolation. I see XEN as a sort of micro-kernel, so there are no place for
>> device drivers, emulators, specific SMC handlers, hypervisor extension, etc..
>>
>> 2. Modularity. Just look at Linux kernel. Obviously, for different devices we
>> can load different drivers.
>>
>> 3. Performance. Native application should be faster than stub domain, or there
>> will be no sense in it.
>>
>> 4. Ease of use. I want to make call to EL0 app as easy as possible.
>> Ideally - as a function call.
>>
>> Actually, no one wants extra code in hypervisor, so reasons (1) and (2) are most
>> important. I know that there was tries to do such thing in x86 but with
>> different approach. I want describe my idea for arm64.
>>
>> Native application is an another domain type. It has own vCPU (only one at this
>> moment) Native app is loaded as any other kernel, using ELF loader.
>> It looks like another stub-domain such as MiniOS, but there are two big
>> differences:
>
> Could you describe the reason why you are suggesting it? Unless strictly
> necessary, I wouldn't go down the vcpu route, because as soon as we
> bring a vcpu into the picture, we have a number of problems, including
> scheduling, affinity, etc. It is also user-visible (xl vcpu-list) which
> I don't think it should be.
I used this in my PoC because I didn't want to do extra work. Also this looks
very natural. Domain is actually the same as a process, vcpu is like a thread.
But yes, I already had some issues with scheduler. Manageable, thought.

> I understand that one of the goals is "Modularity", which makes us think
> of an ELF loader, such as the one for a new domain. I agree that
> modularity is important, but I would solve it as a second step. In first
> instance, I would limit the scope to run some code under
> /xen/arch/arm/apps or, better, /apps (for example) in a lower privilege
> mode. After that is done and working, I am sure we can find a way to
> dynamically load more apps at run time.
Again, use of existing domain framework was the easiest way. I needed
some container to hold app and domain fits perfectly. I need to map pages
there, need routines to copy to and from its memory, need p2m code, etc.

But, yes, if we are going to implement this in right way, then maybe we need
separate identities like 'app_container' and 'app_thread'. See below.

>> 1. MiniOS has event loop that serves requests from hypervisor. Native
>> application does not has such loop. It has one entry point where you jump every
>> time when you need something from it.
>>
>> 2. Native application runs in EL0 mode, so it does not have access to MMU,
>> it can't handle vIQRs, exceptions and so on. XEN does all this for it.
>>
>> You can find example native application at [1]. I used exactly this one to
>> benchmark my implementation. Mostly it is inspired by approach used in TEE.
>> Actually, I took some code directly from OP-TEE Trusted Application library.
>> In app_entry.c you can find entry point - __app_entry(). It takes function
>> number and some parameters that will be passed to a function. I probably going
>> to change ABI a bit, but basic idea will be the same.
>>
>> Function number will be something like APP_INIT, APP_HANDLE_SMC
>> or APP_HANDLE_MMIO... I think you got the idea. I also implemented two syscalls
>> (via old plain SVC instruction). app_log() writes to XEN log, app_return() exits
>> from application back to hypervisor. We will need other syscalls like
>> app_call_smc(), app_map_guest_page(), app_map_io(), etc.
>>
>> Now, back to XEN. Classic way to handle something with stubdomain is to
>> write request to a ring buffer, fire an event through event channel, that will
>> trigger vIRQ in stubdomain and stubdomain's vCPU will be scheduled to handle
>> a request. Problem it that you can't control scheduler, so you don't know
>> when your request will be really handled, which in not fine in some
>> embedded use cases.
>>
>> There is how I see handling requests with native application:
>>
>> 0. Hypervisor pauses requester vCPU
>> 1. Hypervisor either passes parameters via registers or writes request to a
>> shared page/ring buffer.
>> 2. Then in sets PC of native app vCPU to entry point and initializes r0-r7
>> with function code and other parameters.
>> 3. Hypervisor switches context to native app vCPU
>> 4. When native app finishes request handling it calls special syscall app_exit()
>> 5. Hypervisor analyses return code, updates requester vCPU state (if needed),
>> switches back to that vCPU, unpauses it.
>>
>> Most of that was done at [2]. Most interesting part is in arch/arm/domain.c
>> There are functions call_el0_app() and return_from_el0_app() that do most
>> of the work. Also I have added syscalls handlers (in the same way,
>> as hypercalls are handled). You can find them at xen/arch/arm/syscall.c
>
> This workflow is actually kind of OK. I would not use the term "vcpu"
> for anything related to an el0 app. Maybe we need to introduce a new
> concept, such as "app_context" for example. But I would not want to
> confuse "vcpu" which is the runtime environment exposed to guests, with
> the el0 Xen context.
>
> A vcpu is expected to be running simultenously with other vcpus of the
> same domain or different domains. The scheduler is expected to choose
> when it is supposed to be running. On the other end, an el0 app runs to
> handle/emulate a single request from a guest vcpu, which will be paused
> until the el0 app finishes. After that, the guest vcpu will resume.
Okay, but what should be stored in `current` while el0 application is running?
Remember, that it can issue syscalls, which will be handled in hypervisor.

We can create separates types for native applications. But then we can end
having two parallel and mostly identical frameworks. One for domains and
another one - for apps. What do you think?

>> At this moment entry point is hardcoded and you need to update it every time
>> you rebuild native application. Also there are no actual parameters passed.
>> Also, whole code is a piece of gosa, because it was first time I hacked XEN.
>
> :-)
> I would start by introducing a proper way to pass parameters and return
> values.
>
>> I don't want to repeat benchmark results, because they already was posted in ML.
>> You can find them at [3].
>>
>> I understand that I have missed many things:
>>
>> 1. How to ship and load native app, because some of them will be needed even
>> before dom0 is created.
>
> I envision something like Linux's insmod, but I suggest postponing this
> problem. At the moment, it would be fine to assume that all apps need to
> be built statically and cannot be loaded at runtime.
Okay. Then we need to hold them in special sections of hypervisor image
and also we need some sort of loader in hypervisor.

>> 2. How to distinguish multiple native apps
>
> Each apps need to specify a range of MMIO/SMC handlers. Xen will invoke
> the right one.
What about device drivers? Consider power management for example. This is
crucial if we want to use XEN in mobile devices. Our (there, in EPAM) idea is
to hold drivers for PM, drivers for coprocessors and so on in native apps.
Probably we will need different types of apps: SMC handler, MMIO handler,
PM driver, and so on.

>> 3. Concurrency in native apps
>
> This is an interesting problem: what do we do if two guest vcpus make
> simultenous requests that need to be handled by the same app?
> Technically, we could run the same app twice on two different pcpus
> simultenously. But then, the apps would need to be able to cope with
> concurrency (spin_locks, etc.) From Xen point of view, it should be OK
> though.
Yes. Probably, we can pass id of pcpu to app, so it can have per-cpu storage
if it wants to. Plus spin_locks and no blocking syscalls.

>
>> 4. How to restart misbehaved apps.
>
> A related question is the following: do we expect to allocate each app
> once at boot or once per guest? Apps need to have some per-domain
> context, but it could be passed from Xen to the app on a shared page,
> possibly reducing the need for allocating the same app once per guest?
SMC handler needs to be cross-domain for example. Emulators can be
tied to guests, I think. Device drivers should be cross-domain also.

>
>> But at this moment I want to discuss basic approach. If there are will be no
>> objections against basic concept, then we can develop details.
>>
>> [1] https://github.com/lorc/xen_app_stub - native app
>> [2] https://github.com/lorc/xen/tree/el0_app - my branch with PoC
>> [3] http://marc.info/?l=xen-devel&m=149088856116797&w=2 - benchmark results



-- 
WBR Volodymyr Babchuk aka lorc [+380976646013]
mailto: vlad.babchuk@gmail.com

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [ARM] Native application design and discussion (I hope)
  2017-04-07 11:03   ` Volodymyr Babchuk
@ 2017-04-07 23:36     ` Stefano Stabellini
  2017-04-11 20:32       ` Stefano Stabellini
  0 siblings, 1 reply; 82+ messages in thread
From: Stefano Stabellini @ 2017-04-07 23:36 UTC (permalink / raw)
  To: Volodymyr Babchuk
  Cc: Julien Grall, Stefano Stabellini, Artem Mygaiev, Xen Devel

On Fri, 7 Apr 2017, Volodymyr Babchuk wrote:
> >> Native application is an another domain type. It has own vCPU (only one at this
> >> moment) Native app is loaded as any other kernel, using ELF loader.
> >> It looks like another stub-domain such as MiniOS, but there are two big
> >> differences:
> >
> > Could you describe the reason why you are suggesting it? Unless strictly
> > necessary, I wouldn't go down the vcpu route, because as soon as we
> > bring a vcpu into the picture, we have a number of problems, including
> > scheduling, affinity, etc. It is also user-visible (xl vcpu-list) which
> > I don't think it should be.
> I used this in my PoC because I didn't want to do extra work. Also this looks
> very natural. Domain is actually the same as a process, vcpu is like a thread.
> But yes, I already had some issues with scheduler. Manageable, thought.
> 
> > I understand that one of the goals is "Modularity", which makes us think
> > of an ELF loader, such as the one for a new domain. I agree that
> > modularity is important, but I would solve it as a second step. In first
> > instance, I would limit the scope to run some code under
> > /xen/arch/arm/apps or, better, /apps (for example) in a lower privilege
> > mode. After that is done and working, I am sure we can find a way to
> > dynamically load more apps at run time.
> Again, use of existing domain framework was the easiest way. I needed
> some container to hold app and domain fits perfectly. I need to map pages
> there, need routines to copy to and from its memory, need p2m code, etc.
> 
> But, yes, if we are going to implement this in right way, then maybe we need
> separate identities like 'app_container' and 'app_thread'. See below.
> 
> >
> > A vcpu is expected to be running simultenously with other vcpus of the
> > same domain or different domains. The scheduler is expected to choose
> > when it is supposed to be running. On the other end, an el0 app runs to
> > handle/emulate a single request from a guest vcpu, which will be paused
> > until the el0 app finishes. After that, the guest vcpu will resume.
> Okay, but what should be stored in `current` while el0 application is running?
> Remember, that it can issue syscalls, which will be handled in hypervisor.
> 
> We can create separates types for native applications. But then we can end
> having two parallel and mostly identical frameworks. One for domains and
> another one - for apps. What do you think?

This is a great topic for the Xen Hackathon.

This is the most difficult problem that we need to solve as part of this
work. It is difficult to have the right answer at the beginning, before
seeing any code. If the app_container/app_thread approach causes too
much duplication of work, the alternative would be to fix/improve
stubdoms (minios) until they match what we need. Specifically, these
would be the requirements:

1) Determinism: a stubdom servicing a given guest needs to be scheduled
   immediately after the guest vcpu traps into Xen. It needs to
   deterministic. The stubdom vcpu has to be scheduled on the same pcpu.
   This is probably the most important missing thing at the moment.

2) Accounting: memory and cpu time of a stubdom should be accounted
   agaist the domain it is servicing. Otherwise it's not fair.

3) Visibility: stub domains and vcpus should be marked differently from other
   vcpus as not to confuse the user. Otherwise "xl list" becomes
   confusing.


1) and 2) are particularly important. If we had them, we would not need
el0 apps. I believe stubdoms would be as fast as el0 apps too.



> >> At this moment entry point is hardcoded and you need to update it every time
> >> you rebuild native application. Also there are no actual parameters passed.
> >> Also, whole code is a piece of gosa, because it was first time I hacked XEN.
> >
> > :-)
> > I would start by introducing a proper way to pass parameters and return
> > values.
> >
> >> I don't want to repeat benchmark results, because they already was posted in ML.
> >> You can find them at [3].
> >>
> >> I understand that I have missed many things:
> >>
> >> 1. How to ship and load native app, because some of them will be needed even
> >> before dom0 is created.
> >
> > I envision something like Linux's insmod, but I suggest postponing this
> > problem. At the moment, it would be fine to assume that all apps need to
> > be built statically and cannot be loaded at runtime.
> Okay. Then we need to hold them in special sections of hypervisor image
> and also we need some sort of loader in hypervisor.
> 
> >> 2. How to distinguish multiple native apps
> >
> > Each apps need to specify a range of MMIO/SMC handlers. Xen will invoke
> > the right one.
> What about device drivers? Consider power management for example. This is
> crucial if we want to use XEN in mobile devices. Our (there, in EPAM) idea is
> to hold drivers for PM, drivers for coprocessors and so on in native apps.
> Probably we will need different types of apps: SMC handler, MMIO handler,
> PM driver, and so on.

Yes, something like that.


> >> 3. Concurrency in native apps
> >
> > This is an interesting problem: what do we do if two guest vcpus make
> > simultenous requests that need to be handled by the same app?
> > Technically, we could run the same app twice on two different pcpus
> > simultenously. But then, the apps would need to be able to cope with
> > concurrency (spin_locks, etc.) From Xen point of view, it should be OK
> > though.
> Yes. Probably, we can pass id of pcpu to app, so it can have per-cpu storage
> if it wants to. Plus spin_locks and no blocking syscalls.
> 
> >
> >> 4. How to restart misbehaved apps.
> >
> > A related question is the following: do we expect to allocate each app
> > once at boot or once per guest? Apps need to have some per-domain
> > context, but it could be passed from Xen to the app on a shared page,
> > possibly reducing the need for allocating the same app once per guest?
> SMC handler needs to be cross-domain for example. Emulators can be
> tied to guests, I think. Device drivers should be cross-domain also.
> 
> >
> >> But at this moment I want to discuss basic approach. If there are will be no
> >> objections against basic concept, then we can develop details.
> >>
> >> [1] https://github.com/lorc/xen_app_stub - native app
> >> [2] https://github.com/lorc/xen/tree/el0_app - my branch with PoC
> >> [3] http://marc.info/?l=xen-devel&m=149088856116797&w=2 - benchmark results

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [ARM] Native application design and discussion (I hope)
  2017-04-07 23:36     ` Stefano Stabellini
@ 2017-04-11 20:32       ` Stefano Stabellini
  2017-04-12 18:13         ` Dario Faggioli
  0 siblings, 1 reply; 82+ messages in thread
From: Stefano Stabellini @ 2017-04-11 20:32 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Volodymyr Babchuk, dario.faggioli, george.dunlap, Xen Devel,
	Julien Grall, Artem Mygaiev

On Fri, 7 Apr 2017, Stefano Stabellini wrote:
> On Fri, 7 Apr 2017, Volodymyr Babchuk wrote:
> > >> Native application is an another domain type. It has own vCPU (only one at this
> > >> moment) Native app is loaded as any other kernel, using ELF loader.
> > >> It looks like another stub-domain such as MiniOS, but there are two big
> > >> differences:
> > >
> > > Could you describe the reason why you are suggesting it? Unless strictly
> > > necessary, I wouldn't go down the vcpu route, because as soon as we
> > > bring a vcpu into the picture, we have a number of problems, including
> > > scheduling, affinity, etc. It is also user-visible (xl vcpu-list) which
> > > I don't think it should be.
> > I used this in my PoC because I didn't want to do extra work. Also this looks
> > very natural. Domain is actually the same as a process, vcpu is like a thread.
> > But yes, I already had some issues with scheduler. Manageable, thought.
> > 
> > > I understand that one of the goals is "Modularity", which makes us think
> > > of an ELF loader, such as the one for a new domain. I agree that
> > > modularity is important, but I would solve it as a second step. In first
> > > instance, I would limit the scope to run some code under
> > > /xen/arch/arm/apps or, better, /apps (for example) in a lower privilege
> > > mode. After that is done and working, I am sure we can find a way to
> > > dynamically load more apps at run time.
> > Again, use of existing domain framework was the easiest way. I needed
> > some container to hold app and domain fits perfectly. I need to map pages
> > there, need routines to copy to and from its memory, need p2m code, etc.
> > 
> > But, yes, if we are going to implement this in right way, then maybe we need
> > separate identities like 'app_container' and 'app_thread'. See below.
> > 
> > >
> > > A vcpu is expected to be running simultenously with other vcpus of the
> > > same domain or different domains. The scheduler is expected to choose
> > > when it is supposed to be running. On the other end, an el0 app runs to
> > > handle/emulate a single request from a guest vcpu, which will be paused
> > > until the el0 app finishes. After that, the guest vcpu will resume.
> > Okay, but what should be stored in `current` while el0 application is running?
> > Remember, that it can issue syscalls, which will be handled in hypervisor.
> > 
> > We can create separates types for native applications. But then we can end
> > having two parallel and mostly identical frameworks. One for domains and
> > another one - for apps. What do you think?
> 
> This is a great topic for the Xen Hackathon.
> 
> This is the most difficult problem that we need to solve as part of this
> work. It is difficult to have the right answer at the beginning, before
> seeing any code. If the app_container/app_thread approach causes too
> much duplication of work, the alternative would be to fix/improve
> stubdoms (minios) until they match what we need. Specifically, these
> would be the requirements:
> 
> 1) Determinism: a stubdom servicing a given guest needs to be scheduled
>    immediately after the guest vcpu traps into Xen. It needs to
>    deterministic. The stubdom vcpu has to be scheduled on the same pcpu.
>    This is probably the most important missing thing at the moment.
> 
> 2) Accounting: memory and cpu time of a stubdom should be accounted
>    agaist the domain it is servicing. Otherwise it's not fair.
> 
> 3) Visibility: stub domains and vcpus should be marked differently from other
>    vcpus as not to confuse the user. Otherwise "xl list" becomes
>    confusing.
> 
> 
> 1) and 2) are particularly important. If we had them, we would not need
> el0 apps. I believe stubdoms would be as fast as el0 apps too.

CC'ing George and Dario. I was speaking with George about this topic,
I'll let him explain his view as scheduler maintainer, but he suggested
to avoid scheduler modifications (all schedulers would need to be
taught to handle this) and extend struct vcpu for el0 apps instead.


> > >> At this moment entry point is hardcoded and you need to update it every time
> > >> you rebuild native application. Also there are no actual parameters passed.
> > >> Also, whole code is a piece of gosa, because it was first time I hacked XEN.
> > >
> > > :-)
> > > I would start by introducing a proper way to pass parameters and return
> > > values.
> > >
> > >> I don't want to repeat benchmark results, because they already was posted in ML.
> > >> You can find them at [3].
> > >>
> > >> I understand that I have missed many things:
> > >>
> > >> 1. How to ship and load native app, because some of them will be needed even
> > >> before dom0 is created.
> > >
> > > I envision something like Linux's insmod, but I suggest postponing this
> > > problem. At the moment, it would be fine to assume that all apps need to
> > > be built statically and cannot be loaded at runtime.
> > Okay. Then we need to hold them in special sections of hypervisor image
> > and also we need some sort of loader in hypervisor.
> > 
> > >> 2. How to distinguish multiple native apps
> > >
> > > Each apps need to specify a range of MMIO/SMC handlers. Xen will invoke
> > > the right one.
> > What about device drivers? Consider power management for example. This is
> > crucial if we want to use XEN in mobile devices. Our (there, in EPAM) idea is
> > to hold drivers for PM, drivers for coprocessors and so on in native apps.
> > Probably we will need different types of apps: SMC handler, MMIO handler,
> > PM driver, and so on.
> 
> Yes, something like that.
> 
> 
> > >> 3. Concurrency in native apps
> > >
> > > This is an interesting problem: what do we do if two guest vcpus make
> > > simultenous requests that need to be handled by the same app?
> > > Technically, we could run the same app twice on two different pcpus
> > > simultenously. But then, the apps would need to be able to cope with
> > > concurrency (spin_locks, etc.) From Xen point of view, it should be OK
> > > though.
> > Yes. Probably, we can pass id of pcpu to app, so it can have per-cpu storage
> > if it wants to. Plus spin_locks and no blocking syscalls.
> > 
> > >
> > >> 4. How to restart misbehaved apps.
> > >
> > > A related question is the following: do we expect to allocate each app
> > > once at boot or once per guest? Apps need to have some per-domain
> > > context, but it could be passed from Xen to the app on a shared page,
> > > possibly reducing the need for allocating the same app once per guest?
> > SMC handler needs to be cross-domain for example. Emulators can be
> > tied to guests, I think. Device drivers should be cross-domain also.
> > 
> > >
> > >> But at this moment I want to discuss basic approach. If there are will be no
> > >> objections against basic concept, then we can develop details.
> > >>
> > >> [1] https://github.com/lorc/xen_app_stub - native app
> > >> [2] https://github.com/lorc/xen/tree/el0_app - my branch with PoC
> > >> [3] http://marc.info/?l=xen-devel&m=149088856116797&w=2 - benchmark results
> 

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [ARM] Native application design and discussion (I hope)
  2017-04-11 20:32       ` Stefano Stabellini
@ 2017-04-12 18:13         ` Dario Faggioli
  2017-04-12 19:17           ` Stefano Stabellini
  0 siblings, 1 reply; 82+ messages in thread
From: Dario Faggioli @ 2017-04-12 18:13 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Volodymyr Babchuk, Julien Grall, Artem Mygaiev, george.dunlap, Xen Devel


[-- Attachment #1.1: Type: text/plain, Size: 3370 bytes --]

On Tue, 2017-04-11 at 13:32 -0700, Stefano Stabellini wrote:
> On Fri, 7 Apr 2017, Stefano Stabellini wrote:
> > 
> > This is the most difficult problem that we need to solve as part of
> > this
> > work. It is difficult to have the right answer at the beginning,
> > before
> > seeing any code. If the app_container/app_thread approach causes
> > too
> > much duplication of work, the alternative would be to fix/improve
> > stubdoms (minios) until they match what we need. Specifically,
> > these
> > would be the requirements:
> > 
>
IMO, this stubdom way, is really really really interesting! :-)

> > 1) Determinism: a stubdom servicing a given guest needs to be
> > scheduled
> >    immediately after the guest vcpu traps into Xen. It needs to
> >    deterministic.
>
Something like this is in my plan since long time. Being able to /
having help for making it happen would be *great*!

So, if I'm the scheduler, can you tell  me exactly when a vcpu blocks
waiting for a service from the vcpu of an app/stubdom (as opposed to,
going to sleep, waiting for other, unrelated, event, etc), and which
one?

If yes... That'd be a good start.

> >  The stubdom vcpu has to be scheduled on the same pcpu.
> >    This is probably the most important missing thing at the moment.
> > 
That's interesting --similar to what I had in mind, even-- but needs
thinking.

E.g., if the stubdom/app is multi-vcpu, which of its vcpu would you
schedule? And how can we be sure that what will run on that vcpu of the
stubdom is _exactly_ the process that will deal with the request the
"real gust" is waiting on?

TBH, this is much more of an issue if we think of doing something like
this for driver domain too, while in the stubdom case it indeed
shouldn't be impossible, but still...

(And stubdoms, especially minios ones, are the ones I know less, so
bear with me a bit.)

> > 2) Accounting: memory and cpu time of a stubdom should be accounted
> >    agaist the domain it is servicing. Otherwise it's not fair.
> > 
Absolutely.

> > 3) Visibility: stub domains and vcpus should be marked differently
> > from other
> >    vcpus as not to confuse the user. Otherwise "xl list" becomes
> >    confusing.
> > 
Well, may seem unrelated, but will you schedule the subdom _only_ in
this kind of "donated time slots" way?

> > 1) and 2) are particularly important. If we had them, we would not
> > need
> > el0 apps. I believe stubdoms would be as fast as el0 apps too.
> 
> CC'ing George and Dario. I was speaking with George about this topic,
> I'll let him explain his view as scheduler maintainer, but he
> suggested
> to avoid scheduler modifications (all schedulers would need to be
> taught to handle this) and extend struct vcpu for el0 apps instead.
> 
Yeah, thanks Stefano. I'm back today after being sick for a couple of
days, so I need to catch up with this thread, and I will.

In general, I like the idea of enhancing stubdoms for this, and I'll
happily participate in design and development of that.

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

[-- Attachment #2: Type: text/plain, Size: 127 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [ARM] Native application design and discussion (I hope)
  2017-04-12 18:13         ` Dario Faggioli
@ 2017-04-12 19:17           ` Stefano Stabellini
  2017-04-20 20:20             ` Volodymyr Babchuk
  0 siblings, 1 reply; 82+ messages in thread
From: Stefano Stabellini @ 2017-04-12 19:17 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: Stefano Stabellini, Volodymyr Babchuk, george.dunlap, Xen Devel,
	Julien Grall, Artem Mygaiev

[-- Attachment #1: Type: TEXT/PLAIN, Size: 3757 bytes --]

On Wed, 12 Apr 2017, Dario Faggioli wrote:
> On Tue, 2017-04-11 at 13:32 -0700, Stefano Stabellini wrote:
> > On Fri, 7 Apr 2017, Stefano Stabellini wrote:
> > > 
> > > This is the most difficult problem that we need to solve as part of
> > > this
> > > work. It is difficult to have the right answer at the beginning,
> > > before
> > > seeing any code. If the app_container/app_thread approach causes
> > > too
> > > much duplication of work, the alternative would be to fix/improve
> > > stubdoms (minios) until they match what we need. Specifically,
> > > these
> > > would be the requirements:
> > > 
> >
> IMO, this stubdom way, is really really really interesting! :-)
> 
> > > 1) Determinism: a stubdom servicing a given guest needs to be
> > > scheduled
> > >    immediately after the guest vcpu traps into Xen. It needs to
> > >    deterministic.
> >
> Something like this is in my plan since long time. Being able to /
> having help for making it happen would be *great*!
> 
> So, if I'm the scheduler, can you tell  me exactly when a vcpu blocks
> waiting for a service from the vcpu of an app/stubdom (as opposed to,
> going to sleep, waiting for other, unrelated, event, etc), and which
> one?
> 
> If yes... That'd be a good start.

Yes, I think so.


> > >  The stubdom vcpu has to be scheduled on the same pcpu.
> > >    This is probably the most important missing thing at the moment.
> > > 
> That's interesting --similar to what I had in mind, even-- but needs
> thinking.
> 
> E.g., if the stubdom/app is multi-vcpu, which of its vcpu would you
> schedule? And how can we be sure that what will run on that vcpu of the
> stubdom is _exactly_ the process that will deal with the request the
> "real gust" is waiting on?
> 
> TBH, this is much more of an issue if we think of doing something like
> this for driver domain too, while in the stubdom case it indeed
> shouldn't be impossible, but still...
> 
> (And stubdoms, especially minios ones, are the ones I know less, so
> bear with me a bit.)

We would have one app per emulator. Each app would register an MMIO
range or instruction set to emulate. On a guest trap, Xen figures out
which app it needs to run.

With the app model, we would run the app on the same physical cpu where
the guest vcpu trapped, always starting from the same entry point of the
app. You could run as many app instances concurrently as the number of
guest vcpus on different pcpus.

There are no stubdom processes, only a single entry point and a single
address space.


> > > 2) Accounting: memory and cpu time of a stubdom should be accounted
> > >    agaist the domain it is servicing. Otherwise it's not fair.
> > > 
> Absolutely.
> 
> > > 3) Visibility: stub domains and vcpus should be marked differently
> > > from other
> > >    vcpus as not to confuse the user. Otherwise "xl list" becomes
> > >    confusing.
> > > 
> Well, may seem unrelated, but will you schedule the subdom _only_ in
> this kind of "donated time slots" way?

Yes


> > > 1) and 2) are particularly important. If we had them, we would not
> > > need
> > > el0 apps. I believe stubdoms would be as fast as el0 apps too.
> > 
> > CC'ing George and Dario. I was speaking with George about this topic,
> > I'll let him explain his view as scheduler maintainer, but he
> > suggested
> > to avoid scheduler modifications (all schedulers would need to be
> > taught to handle this) and extend struct vcpu for el0 apps instead.
> > 
> Yeah, thanks Stefano. I'm back today after being sick for a couple of
> days, so I need to catch up with this thread, and I will.
> 
> In general, I like the idea of enhancing stubdoms for this, and I'll
> happily participate in design and development of that.

That would be great!

[-- Attachment #2: Type: text/plain, Size: 127 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [ARM] Native application design and discussion (I hope)
  2017-04-12 19:17           ` Stefano Stabellini
@ 2017-04-20 20:20             ` Volodymyr Babchuk
  2017-04-21 14:42               ` Andrii Anisov
  2017-04-21 15:57               ` Julien Grall
  0 siblings, 2 replies; 82+ messages in thread
From: Volodymyr Babchuk @ 2017-04-20 20:20 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Dario Faggioli, Artem Mygaiev, Julien Grall, george.dunlap, Xen Devel

Hi Stefano,

On 12 April 2017 at 22:17, Stefano Stabellini <sstabellini@kernel.org> wrote:
> On Wed, 12 Apr 2017, Dario Faggioli wrote:
>> On Tue, 2017-04-11 at 13:32 -0700, Stefano Stabellini wrote:
>> > On Fri, 7 Apr 2017, Stefano Stabellini wrote:
>> > >
>> > > This is the most difficult problem that we need to solve as part of
>> > > this
>> > > work. It is difficult to have the right answer at the beginning,
>> > > before
>> > > seeing any code. If the app_container/app_thread approach causes
>> > > too
>> > > much duplication of work, the alternative would be to fix/improve
>> > > stubdoms (minios) until they match what we need. Specifically,
>> > > these
>> > > would be the requirements:
>> > >
>> >
>> IMO, this stubdom way, is really really really interesting! :-)
>>
>> > > 1) Determinism: a stubdom servicing a given guest needs to be
>> > > scheduled
>> > >    immediately after the guest vcpu traps into Xen. It needs to
>> > >    deterministic.
>> >
>> Something like this is in my plan since long time. Being able to /
>> having help for making it happen would be *great*!
>>
>> So, if I'm the scheduler, can you tell  me exactly when a vcpu blocks
>> waiting for a service from the vcpu of an app/stubdom (as opposed to,
>> going to sleep, waiting for other, unrelated, event, etc), and which
>> one?
>>
>> If yes... That'd be a good start.
>
> Yes, I think so.
Yep, it would be great to have support for apps from scheduler. At this moment
I have no clear vision on how that should look, thought.
>
>> > >  The stubdom vcpu has to be scheduled on the same pcpu.
>> > >    This is probably the most important missing thing at the moment.
>> > >
>> That's interesting --similar to what I had in mind, even-- but needs
>> thinking.
>>
>> E.g., if the stubdom/app is multi-vcpu, which of its vcpu would you
>> schedule? And how can we be sure that what will run on that vcpu of the
>> stubdom is _exactly_ the process that will deal with the request the
>> "real gust" is waiting on?
>>
>> TBH, this is much more of an issue if we think of doing something like
>> this for driver domain too, while in the stubdom case it indeed
>> shouldn't be impossible, but still...
>>
>> (And stubdoms, especially minios ones, are the ones I know less, so
>> bear with me a bit.)
>
> We would have one app per emulator. Each app would register an MMIO
> range or instruction set to emulate. On a guest trap, Xen figures out
> which app it needs to run.
I't is not best approach, I think. For example we need one SMC handler for
all domains. Because that SMC handler should track execution state of different
guests to help TEE with scheduling. You know, TEE can't block in secure state,
so it returns back and blocks in kernel driver. SMC handler need to know
which guest it needs to wake up when times comes.

The same story with virtual coprocessors, I think.

On other hand, MMIO handler can be one per domain. So, it should be
configurable. Or, maybe we need per-app MMIO handler and one global SMC handler.
Perhaps, we need to think about all possible use cases.

> With the app model, we would run the app on the same physical cpu where
> the guest vcpu trapped, always starting from the same entry point of the
> app. You could run as many app instances concurrently as the number of
> guest vcpus on different pcpus.
>
> There are no stubdom processes, only a single entry point and a single
> address space.
Right

>
>> > > 2) Accounting: memory and cpu time of a stubdom should be accounted
>> > >    agaist the domain it is servicing. Otherwise it's not fair.
>> > >
>> Absolutely.
>>
>> > > 3) Visibility: stub domains and vcpus should be marked differently
>> > > from other
>> > >    vcpus as not to confuse the user. Otherwise "xl list" becomes
>> > >    confusing.
>> > >
>> Well, may seem unrelated, but will you schedule the subdom _only_ in
>> this kind of "donated time slots" way?
>
> Yes
>
>
>> > > 1) and 2) are particularly important. If we had them, we would not
>> > > need
>> > > el0 apps. I believe stubdoms would be as fast as el0 apps too.
>> >
>> > CC'ing George and Dario. I was speaking with George about this topic,
>> > I'll let him explain his view as scheduler maintainer, but he
>> > suggested
>> > to avoid scheduler modifications (all schedulers would need to be
>> > taught to handle this) and extend struct vcpu for el0 apps instead.
>> >
>> Yeah, thanks Stefano. I'm back today after being sick for a couple of
>> days, so I need to catch up with this thread, and I will.
>>
>> In general, I like the idea of enhancing stubdoms for this, and I'll
>> happily participate in design and development of that.
>
> That would be great!



-- 
WBR Volodymyr Babchuk aka lorc [+380976646013]
mailto: vlad.babchuk@gmail.com

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [ARM] Native application design and discussion (I hope)
  2017-04-20 20:20             ` Volodymyr Babchuk
@ 2017-04-21 14:42               ` Andrii Anisov
  2017-04-21 15:49                 ` Julien Grall
  2017-04-21 20:58                 ` Stefano Stabellini
  2017-04-21 15:57               ` Julien Grall
  1 sibling, 2 replies; 82+ messages in thread
From: Andrii Anisov @ 2017-04-21 14:42 UTC (permalink / raw)
  To: Volodymyr Babchuk, Stefano Stabellini
  Cc: Dario Faggioli, Xen Devel, Julien Grall, george.dunlap, Artem Mygaiev


[-- Attachment #1.1: Type: text/plain, Size: 2074 bytes --]

Hello,

On 20.04.17 23:20, Volodymyr Babchuk wrote:
> Hi Stefano,
>
> On 12 April 2017 at 22:17, Stefano Stabellini <sstabellini@kernel.org> wrote:
>> On Wed, 12 Apr 2017, Dario Faggioli wrote:
>>> On Tue, 2017-04-11 at 13:32 -0700, Stefano Stabellini wrote:
>>>> On Fri, 7 Apr 2017, Stefano Stabellini wrote:
>>>>> This is the most difficult problem that we need to solve as part of
>>>>> this
>>>>> work. It is difficult to have the right answer at the beginning,
>>>>> before
>>>>> seeing any code. If the app_container/app_thread approach causes
>>>>> too
>>>>> much duplication of work, the alternative would be to fix/improve
>>>>> stubdoms (minios) until they match what we need. Specifically,
>>>>> these
>>>>> would be the requirements:
>>>>>
>>> IMO, this stubdom way, is really really really interesting! :-)
>>>
>>>>> 1) Determinism: a stubdom servicing a given guest needs to be
>>>>> scheduled
>>>>>     immediately after the guest vcpu traps into Xen. It needs to
>>>>>     deterministic.
We will also need another type of application: one which is periodically 
called by XEN itself, not actually servicing any domain request. This is 
needed for a coprocessor sharing framework scheduler implementation.


-- 

*Andrii Anisov*

*Lead Systems Engineer*

*Office: *+380 44 390 5457 <tel:+380%2044%20390%205457> *x* 66766 
<tel:66766>*Cell: *+380 50 5738852 <tel:+380%2050%205738852>*Email: 
*andrii_anisov@epam.com <mailto:andrii_anisov@epam.com>

*Kyiv**,* *Ukraine *(GMT+3)*epam.com <http://www.epam.com>*

CONFIDENTIALITY CAUTION AND DISCLAIMER
This message is intended only for the use of the individual(s) or 
entity(ies) to which it is addressed and contains information that is 
legally privileged and confidential. If you are not the intended 
recipient, or the person responsible for delivering the message to the 
intended recipient, you are hereby notified that any dissemination, 
distribution or copying of this communication is strictly prohibited. 
All unintended recipients are obliged to delete this message and destroy 
any printed copies.


[-- Attachment #1.2: Type: text/html, Size: 6555 bytes --]

[-- Attachment #2: Type: text/plain, Size: 127 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [ARM] Native application design and discussion (I hope)
  2017-04-21 14:42               ` Andrii Anisov
@ 2017-04-21 15:49                 ` Julien Grall
  2017-04-21 16:08                   ` Volodymyr Babchuk
  2017-04-21 16:20                   ` Andrii Anisov
  2017-04-21 20:58                 ` Stefano Stabellini
  1 sibling, 2 replies; 82+ messages in thread
From: Julien Grall @ 2017-04-21 15:49 UTC (permalink / raw)
  To: Andrii Anisov, Volodymyr Babchuk, Stefano Stabellini
  Cc: Dario Faggioli, Xen Devel, george.dunlap, Artem Mygaiev



On 21/04/17 15:42, Andrii Anisov wrote:
> Hello,

Hi,

> On 20.04.17 23:20, Volodymyr Babchuk wrote:
>> Hi Stefano,
>>
>> On 12 April 2017 at 22:17, Stefano Stabellini <sstabellini@kernel.org> wrote:
>>> On Wed, 12 Apr 2017, Dario Faggioli wrote:
>>>> On Tue, 2017-04-11 at 13:32 -0700, Stefano Stabellini wrote:
>>>>> On Fri, 7 Apr 2017, Stefano Stabellini wrote:
>>>>>> This is the most difficult problem that we need to solve as part of
>>>>>> this
>>>>>> work. It is difficult to have the right answer at the beginning,
>>>>>> before
>>>>>> seeing any code. If the app_container/app_thread approach causes
>>>>>> too
>>>>>> much duplication of work, the alternative would be to fix/improve
>>>>>> stubdoms (minios) until they match what we need. Specifically,
>>>>>> these
>>>>>> would be the requirements:
>>>>>>
>>>> IMO, this stubdom way, is really really really interesting! :-)
>>>>
>>>>>> 1) Determinism: a stubdom servicing a given guest needs to be
>>>>>> scheduled
>>>>>>    immediately after the guest vcpu traps into Xen. It needs to
>>>>>>    deterministic.
> We will also need another type of application: one which is periodically
> called by XEN itself, not actually servicing any domain request. This is
> needed for a coprocessor sharing framework scheduler implementation.

I don't think we should think in term of type of application supported. 
We should aim to have a generic interface we can maintain based on the 
needs.

We can further restrict access to some interface for a given app. But I 
would rather avoid to have different interfaces for each type of 
application.

Cheers,

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [ARM] Native application design and discussion (I hope)
  2017-04-20 20:20             ` Volodymyr Babchuk
  2017-04-21 14:42               ` Andrii Anisov
@ 2017-04-21 15:57               ` Julien Grall
  2017-04-21 16:16                 ` Volodymyr Babchuk
  1 sibling, 1 reply; 82+ messages in thread
From: Julien Grall @ 2017-04-21 15:57 UTC (permalink / raw)
  To: Volodymyr Babchuk, Stefano Stabellini
  Cc: Dario Faggioli, Artem Mygaiev, george.dunlap, Xen Devel

Hello Volodymyr,

On 20/04/17 21:20, Volodymyr Babchuk wrote:
> On 12 April 2017 at 22:17, Stefano Stabellini <sstabellini@kernel.org> wrote:
>> On Wed, 12 Apr 2017, Dario Faggioli wrote:
>>> On Tue, 2017-04-11 at 13:32 -0700, Stefano Stabellini wrote:
>>>> On Fri, 7 Apr 2017, Stefano Stabellini wrote:
>> We would have one app per emulator. Each app would register an MMIO
>> range or instruction set to emulate. On a guest trap, Xen figures out
>> which app it needs to run.
> I't is not best approach, I think. For example we need one SMC handler for
> all domains. Because that SMC handler should track execution state of different
> guests to help TEE with scheduling. You know, TEE can't block in secure state,
> so it returns back and blocks in kernel driver. SMC handler need to know
> which guest it needs to wake up when times comes.
>
> The same story with virtual coprocessors, I think.
>
> On other hand, MMIO handler can be one per domain. So, it should be
> configurable. Or, maybe we need per-app MMIO handler and one global SMC handler.
> Perhaps, we need to think about all possible use cases.

Could you explain what would be the benefits to run this global SMC 
handler in EL0?

After all, it will require access to the host SMC. So what will you 
protect against?

Cheers,

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [ARM] Native application design and discussion (I hope)
  2017-04-21 15:49                 ` Julien Grall
@ 2017-04-21 16:08                   ` Volodymyr Babchuk
  2017-04-21 16:20                   ` Andrii Anisov
  1 sibling, 0 replies; 82+ messages in thread
From: Volodymyr Babchuk @ 2017-04-21 16:08 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, Andrii Anisov, Dario Faggioli, george.dunlap,
	Xen Devel, Artem Mygaiev

Hi Julien,

On 21 April 2017 at 18:49, Julien Grall <julien.grall@arm.com> wrote:
>
>
> On 21/04/17 15:42, Andrii Anisov wrote:
>>
>> Hello,
>
>
> Hi,
>
>> On 20.04.17 23:20, Volodymyr Babchuk wrote:
>>>
>>> Hi Stefano,
>>>
>>> On 12 April 2017 at 22:17, Stefano Stabellini <sstabellini@kernel.org>
>>> wrote:
>>>>
>>>> On Wed, 12 Apr 2017, Dario Faggioli wrote:
>>>>>
>>>>> On Tue, 2017-04-11 at 13:32 -0700, Stefano Stabellini wrote:
>>>>>>
>>>>>> On Fri, 7 Apr 2017, Stefano Stabellini wrote:
>>>>>>>
>>>>>>> This is the most difficult problem that we need to solve as part of
>>>>>>> this
>>>>>>> work. It is difficult to have the right answer at the beginning,
>>>>>>> before
>>>>>>> seeing any code. If the app_container/app_thread approach causes
>>>>>>> too
>>>>>>> much duplication of work, the alternative would be to fix/improve
>>>>>>> stubdoms (minios) until they match what we need. Specifically,
>>>>>>> these
>>>>>>> would be the requirements:
>>>>>>>
>>>>> IMO, this stubdom way, is really really really interesting! :-)
>>>>>
>>>>>>> 1) Determinism: a stubdom servicing a given guest needs to be
>>>>>>> scheduled
>>>>>>>    immediately after the guest vcpu traps into Xen. It needs to
>>>>>>>    deterministic.
>>
>> We will also need another type of application: one which is periodically
>> called by XEN itself, not actually servicing any domain request. This is
>> needed for a coprocessor sharing framework scheduler implementation.
>
> I don't think we should think in term of type of application supported. We
> should aim to have a generic interface we can maintain based on the needs.

> We can further restrict access to some interface for a given app. But I
> would rather avoid to have different interfaces for each type of
> application.
Probably, we can try another approach: allow application to register hooks
in hypervisor: i.e. hook on MMIO, hook on SMC, hook on timer and so on.

-- 
WBR Volodymyr Babchuk aka lorc [+380976646013]
mailto: vlad.babchuk@gmail.com

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [ARM] Native application design and discussion (I hope)
  2017-04-21 15:57               ` Julien Grall
@ 2017-04-21 16:16                 ` Volodymyr Babchuk
  2017-04-21 16:47                   ` Julien Grall
  0 siblings, 1 reply; 82+ messages in thread
From: Volodymyr Babchuk @ 2017-04-21 16:16 UTC (permalink / raw)
  To: Julien Grall
  Cc: Dario Faggioli, Stefano Stabellini, george.dunlap, Artem Mygaiev,
	Xen Devel

Hi Julien,

On 21 April 2017 at 18:57, Julien Grall <julien.grall@arm.com> wrote:
> Hello Volodymyr,
>
> On 20/04/17 21:20, Volodymyr Babchuk wrote:
>>
>> On 12 April 2017 at 22:17, Stefano Stabellini <sstabellini@kernel.org>
>> wrote:
>>>
>>> On Wed, 12 Apr 2017, Dario Faggioli wrote:
>>>>
>>>> On Tue, 2017-04-11 at 13:32 -0700, Stefano Stabellini wrote:
>>>>>
>>>>> On Fri, 7 Apr 2017, Stefano Stabellini wrote:
>>>
>>> We would have one app per emulator. Each app would register an MMIO
>>> range or instruction set to emulate. On a guest trap, Xen figures out
>>> which app it needs to run.
>>
>> I't is not best approach, I think. For example we need one SMC handler for
>> all domains. Because that SMC handler should track execution state of
>> different
>> guests to help TEE with scheduling. You know, TEE can't block in secure
>> state,
>> so it returns back and blocks in kernel driver. SMC handler need to know
>> which guest it needs to wake up when times comes.
>>
>> The same story with virtual coprocessors, I think.
>>
>> On other hand, MMIO handler can be one per domain. So, it should be
>> configurable. Or, maybe we need per-app MMIO handler and one global SMC
>> handler.
>> Perhaps, we need to think about all possible use cases.
>
>
> Could you explain what would be the benefits to run this global SMC handler
> in EL0?
>
> After all, it will require access to the host SMC. So what will you protect
> against?
Yes, it will require access to host SMC. Idea is not to protect (but,
it can protect also).
I want to allow different guests to work with one TEE. Imagine that
multiple guests need
protected storage, accelerated cryptography or other TEE services.
All SMCs will be trapped to app, app will alter(or block) request and
forward it to TEE. This is the most basic use case, which we want to
implement.


-- 
WBR Volodymyr Babchuk aka lorc [+380976646013]
mailto: vlad.babchuk@gmail.com

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [ARM] Native application design and discussion (I hope)
  2017-04-21 15:49                 ` Julien Grall
  2017-04-21 16:08                   ` Volodymyr Babchuk
@ 2017-04-21 16:20                   ` Andrii Anisov
  1 sibling, 0 replies; 82+ messages in thread
From: Andrii Anisov @ 2017-04-21 16:20 UTC (permalink / raw)
  To: Julien Grall, Volodymyr Babchuk, Stefano Stabellini
  Cc: Dario Faggioli, Xen Devel, george.dunlap, Artem Mygaiev


[-- Attachment #1.1: Type: text/plain, Size: 1265 bytes --]

Julien,

> I don't think we should think in term of type of application 
> supported. We should aim to have a generic interface we can maintain 
> based on the needs.
I was just pointing out the use-case missed in the discussion of the 
interface intended to be generic.
It introduces new requirements in vcpu(app context) and its scheduling 
areas of the topic, comparing to the raw domain's requests servicing.

-- 

*Andrii Anisov*

*Lead Systems Engineer*

*Office: *+380 44 390 5457 <tel:+380%2044%20390%205457> *x* 66766 
<tel:66766>*Cell: *+380 50 5738852 <tel:+380%2050%205738852>*Email: 
*andrii_anisov@epam.com <mailto:andrii_anisov@epam.com>

*Kyiv**,* *Ukraine *(GMT+3)*epam.com <http://www.epam.com>*

CONFIDENTIALITY CAUTION AND DISCLAIMER
This message is intended only for the use of the individual(s) or 
entity(ies) to which it is addressed and contains information that is 
legally privileged and confidential. If you are not the intended 
recipient, or the person responsible for delivering the message to the 
intended recipient, you are hereby notified that any dissemination, 
distribution or copying of this communication is strictly prohibited. 
All unintended recipients are obliged to delete this message and destroy 
any printed copies.


[-- Attachment #1.2: Type: text/html, Size: 5119 bytes --]

[-- Attachment #2: Type: text/plain, Size: 127 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [ARM] Native application design and discussion (I hope)
  2017-04-21 16:16                 ` Volodymyr Babchuk
@ 2017-04-21 16:47                   ` Julien Grall
  2017-04-21 17:04                     ` Volodymyr Babchuk
  0 siblings, 1 reply; 82+ messages in thread
From: Julien Grall @ 2017-04-21 16:47 UTC (permalink / raw)
  To: Volodymyr Babchuk
  Cc: Dario Faggioli, Stefano Stabellini, george.dunlap, Artem Mygaiev,
	Xen Devel

On 21/04/17 17:16, Volodymyr Babchuk wrote:
> Hi Julien,

Hi Volodymyr,

> On 21 April 2017 at 18:57, Julien Grall <julien.grall@arm.com> wrote:
>> Hello Volodymyr,
>>
>> On 20/04/17 21:20, Volodymyr Babchuk wrote:
>>>
>>> On 12 April 2017 at 22:17, Stefano Stabellini <sstabellini@kernel.org>
>>> wrote:
>>>>
>>>> On Wed, 12 Apr 2017, Dario Faggioli wrote:
>>>>>
>>>>> On Tue, 2017-04-11 at 13:32 -0700, Stefano Stabellini wrote:
>>>>>>
>>>>>> On Fri, 7 Apr 2017, Stefano Stabellini wrote:
>>>>
>>>> We would have one app per emulator. Each app would register an MMIO
>>>> range or instruction set to emulate. On a guest trap, Xen figures out
>>>> which app it needs to run.
>>>
>>> I't is not best approach, I think. For example we need one SMC handler for
>>> all domains. Because that SMC handler should track execution state of
>>> different
>>> guests to help TEE with scheduling. You know, TEE can't block in secure
>>> state,
>>> so it returns back and blocks in kernel driver. SMC handler need to know
>>> which guest it needs to wake up when times comes.
>>>
>>> The same story with virtual coprocessors, I think.
>>>
>>> On other hand, MMIO handler can be one per domain. So, it should be
>>> configurable. Or, maybe we need per-app MMIO handler and one global SMC
>>> handler.
>>> Perhaps, we need to think about all possible use cases.
>>
>>
>> Could you explain what would be the benefits to run this global SMC handler
>> in EL0?
>>
>> After all, it will require access to the host SMC. So what will you protect
>> against?
> Yes, it will require access to host SMC. Idea is not to protect (but,
> it can protect also).
> I want to allow different guests to work with one TEE. Imagine that
> multiple guests need
> protected storage, accelerated cryptography or other TEE services.
> All SMCs will be trapped to app, app will alter(or block) request and
> forward it to TEE. This is the most basic use case, which we want to
> implement.

I am sorry, but I don't understand it. I envision EL0 as a way to limit 
the attack vector to Xen and the host. If you give full access to SMC, 
then you cannot protect.

If the idea is not to protect, why do you want to move the code in EL0? 
What is the point to add an overhead (even if it is small) in this case?

Cheers,

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [ARM] Native application design and discussion (I hope)
  2017-04-21 16:47                   ` Julien Grall
@ 2017-04-21 17:04                     ` Volodymyr Babchuk
  2017-04-21 17:38                       ` Julien Grall
  0 siblings, 1 reply; 82+ messages in thread
From: Volodymyr Babchuk @ 2017-04-21 17:04 UTC (permalink / raw)
  To: Julien Grall
  Cc: Dario Faggioli, Stefano Stabellini, george.dunlap, Artem Mygaiev,
	Xen Devel

Julien,

On 21 April 2017 at 19:47, Julien Grall <julien.grall@arm.com> wrote:
> On 21/04/17 17:16, Volodymyr Babchuk wrote:
>>
>> Hi Julien,
>
>
> Hi Volodymyr,
>
>
>> On 21 April 2017 at 18:57, Julien Grall <julien.grall@arm.com> wrote:
>>>
>>> Hello Volodymyr,
>>>
>>> On 20/04/17 21:20, Volodymyr Babchuk wrote:
>>>>
>>>>
>>>> On 12 April 2017 at 22:17, Stefano Stabellini <sstabellini@kernel.org>
>>>> wrote:
>>>>>
>>>>>
>>>>> On Wed, 12 Apr 2017, Dario Faggioli wrote:
>>>>>>
>>>>>>
>>>>>> On Tue, 2017-04-11 at 13:32 -0700, Stefano Stabellini wrote:
>>>>>>>
>>>>>>>
>>>>>>> On Fri, 7 Apr 2017, Stefano Stabellini wrote:
>>>>>
>>>>>
>>>>> We would have one app per emulator. Each app would register an MMIO
>>>>> range or instruction set to emulate. On a guest trap, Xen figures out
>>>>> which app it needs to run.
>>>>
>>>>
>>>> I't is not best approach, I think. For example we need one SMC handler
>>>> for
>>>> all domains. Because that SMC handler should track execution state of
>>>> different
>>>> guests to help TEE with scheduling. You know, TEE can't block in secure
>>>> state,
>>>> so it returns back and blocks in kernel driver. SMC handler need to know
>>>> which guest it needs to wake up when times comes.
>>>>
>>>> The same story with virtual coprocessors, I think.
>>>>
>>>> On other hand, MMIO handler can be one per domain. So, it should be
>>>> configurable. Or, maybe we need per-app MMIO handler and one global SMC
>>>> handler.
>>>> Perhaps, we need to think about all possible use cases.
>>>
>>>
>>>
>>> Could you explain what would be the benefits to run this global SMC
>>> handler
>>> in EL0?
>>>
>>> After all, it will require access to the host SMC. So what will you
>>> protect
>>> against?
>>
>> Yes, it will require access to host SMC. Idea is not to protect (but,
>> it can protect also).
>> I want to allow different guests to work with one TEE. Imagine that
>> multiple guests need
>> protected storage, accelerated cryptography or other TEE services.
>> All SMCs will be trapped to app, app will alter(or block) request and
>> forward it to TEE. This is the most basic use case, which we want to
>> implement.
>
>
> I am sorry, but I don't understand it. I envision EL0 as a way to limit the
> attack vector to Xen and the host. If you give full access to SMC, then you
> cannot protect.
In any case it will limit the attack surface. Filtered SMC request is
not as destructive as
arbitrary SMC from a guest.

> If the idea is not to protect, why do you want to move the code in EL0? What
> is the point to add an overhead (even if it is small) in this case?
There are many reasons:
1. Community is reluctant to add OP-TEE (or any other TEE) handler
right into hypervisor codebase.
2. Modularity. You can detect running TEE during boot and load
appropriate TEE handler app (honestly, it is not a big deal, because
you know on which system will work your hypervisor and TEE type can be
hardcoded in build).
3. Some degree of protection. Bug in EL0 handler will not bring down
whole hypervisor.

Andrii can correct me, but the same reasons apply to virtual
coprocessors framework and drivers.

-- 
WBR Volodymyr Babchuk aka lorc [+380976646013]
mailto: vlad.babchuk@gmail.com

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [ARM] Native application design and discussion (I hope)
  2017-04-21 17:04                     ` Volodymyr Babchuk
@ 2017-04-21 17:38                       ` Julien Grall
  2017-04-21 18:35                         ` Volodymyr Babchuk
  2017-04-21 21:24                         ` Stefano Stabellini
  0 siblings, 2 replies; 82+ messages in thread
From: Julien Grall @ 2017-04-21 17:38 UTC (permalink / raw)
  To: Volodymyr Babchuk
  Cc: Dario Faggioli, Stefano Stabellini, george.dunlap, Artem Mygaiev,
	Xen Devel

Hi Volodymyr,

On 21/04/17 18:04, Volodymyr Babchuk wrote:
> On 21 April 2017 at 19:47, Julien Grall <julien.grall@arm.com> wrote:
>> On 21/04/17 17:16, Volodymyr Babchuk wrote:
>>> On 21 April 2017 at 18:57, Julien Grall <julien.grall@arm.com> wrote:
>>>>
>>>> Hello Volodymyr,
>>>>
>>>> On 20/04/17 21:20, Volodymyr Babchuk wrote:
>>>>>
>>>>>
>>>>> On 12 April 2017 at 22:17, Stefano Stabellini <sstabellini@kernel.org>
>>>>> wrote:
>>>>>>
>>>>>>
>>>>>> On Wed, 12 Apr 2017, Dario Faggioli wrote:
>>>>>>>
>>>>>>>
>>>>>>> On Tue, 2017-04-11 at 13:32 -0700, Stefano Stabellini wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, 7 Apr 2017, Stefano Stabellini wrote:
>>>>>>
>>>>>>
>>>>>> We would have one app per emulator. Each app would register an MMIO
>>>>>> range or instruction set to emulate. On a guest trap, Xen figures out
>>>>>> which app it needs to run.
>>>>>
>>>>>
>>>>> I't is not best approach, I think. For example we need one SMC handler
>>>>> for
>>>>> all domains. Because that SMC handler should track execution state of
>>>>> different
>>>>> guests to help TEE with scheduling. You know, TEE can't block in secure
>>>>> state,
>>>>> so it returns back and blocks in kernel driver. SMC handler need to know
>>>>> which guest it needs to wake up when times comes.
>>>>>
>>>>> The same story with virtual coprocessors, I think.
>>>>>
>>>>> On other hand, MMIO handler can be one per domain. So, it should be
>>>>> configurable. Or, maybe we need per-app MMIO handler and one global SMC
>>>>> handler.
>>>>> Perhaps, we need to think about all possible use cases.
>>>>
>>>>
>>>>
>>>> Could you explain what would be the benefits to run this global SMC
>>>> handler
>>>> in EL0?
>>>>
>>>> After all, it will require access to the host SMC. So what will you
>>>> protect
>>>> against?
>>>
>>> Yes, it will require access to host SMC. Idea is not to protect (but,
>>> it can protect also).
>>> I want to allow different guests to work with one TEE. Imagine that
>>> multiple guests need
>>> protected storage, accelerated cryptography or other TEE services.
>>> All SMCs will be trapped to app, app will alter(or block) request and
>>> forward it to TEE. This is the most basic use case, which we want to
>>> implement.
>>
>>
>> I am sorry, but I don't understand it. I envision EL0 as a way to limit the
>> attack vector to Xen and the host. If you give full access to SMC, then you
>> cannot protect.
> In any case it will limit the attack surface. Filtered SMC request is
> not as destructive as
> arbitrary SMC from a guest.

I agree with that. But why in EL0? I think you answer partly below.

>
>> If the idea is not to protect, why do you want to move the code in EL0? What
>> is the point to add an overhead (even if it is small) in this case?
> There are many reasons:
> 1. Community is reluctant to add OP-TEE (or any other TEE) handler
> right into hypervisor codebase.

Well, I think I was the only one to be reluctant. And I asked you to 
look at different solutions and come up with suggestion are saying why 
you solution is better.

Whilst I agree that EL0 app is a solution for a lot of emulation. We 
should be careful before moving code to EL0 and evaluating the impact. I 
am expecting to see the interface very small and the application to be 
standalone (e.g not requiring much interaction with Xen or the host 
hardware). But you seem to have a different view (see your e-mail with: 
"Probably, we can try another approach: allow application to register 
hooks in hypervisor: i.e. hook on MMIO, hook on SMC, hook on timer and 
so on.").

If you introduce EL0 but require a big interface, then I believe you 
don't limit the surface attack.

> 2. Modularity. You can detect running TEE during boot and load
> appropriate TEE handler app (honestly, it is not a big deal, because
> you know on which system will work your hypervisor and TEE type can be
> hardcoded in build).

Well, you could make Xen modular like Linux and still run everything in 
EL2. (Disclaimer, I am not saying we should do that...)

> 3. Some degree of protection. Bug in EL0 handler will not bring down
> whole hypervisor.

If you have a single app handling all the domains using SMC, then you 
will bring down all thoses domains. I agree it does not take down the 
hypervisor, but it will render unusable a part of the platform. In this 
case, how do you plan to restore the services?

Also, a bug in the EL0 handler may give the opportunity, in your use 
case, to get access to the firmware or data from another guest. How this 
will bring more protection?

If you handle only one guest per app, then it is very easy to kill that 
app and domain. It will only harm itself.

Cheers,

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [ARM] Native application design and discussion (I hope)
  2017-04-21 17:38                       ` Julien Grall
@ 2017-04-21 18:35                         ` Volodymyr Babchuk
  2017-04-24 11:00                           ` Julien Grall
  2017-04-21 21:24                         ` Stefano Stabellini
  1 sibling, 1 reply; 82+ messages in thread
From: Volodymyr Babchuk @ 2017-04-21 18:35 UTC (permalink / raw)
  To: Julien Grall
  Cc: Dario Faggioli, Stefano Stabellini, george.dunlap, Artem Mygaiev,
	Xen Devel

Hi Julien,

On 21 April 2017 at 20:38, Julien Grall <julien.grall@arm.com> wrote:
> Hi Volodymyr,
>
> On 21/04/17 18:04, Volodymyr Babchuk wrote:
>>
>> On 21 April 2017 at 19:47, Julien Grall <julien.grall@arm.com> wrote:
>>>
>>> On 21/04/17 17:16, Volodymyr Babchuk wrote:
>>>>
>>>> On 21 April 2017 at 18:57, Julien Grall <julien.grall@arm.com> wrote:
>>>>>
>>>>>
>>>>> Hello Volodymyr,
>>>>>
>>>>> On 20/04/17 21:20, Volodymyr Babchuk wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 12 April 2017 at 22:17, Stefano Stabellini <sstabellini@kernel.org>
>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, 12 Apr 2017, Dario Faggioli wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, 2017-04-11 at 13:32 -0700, Stefano Stabellini wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Fri, 7 Apr 2017, Stefano Stabellini wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> We would have one app per emulator. Each app would register an MMIO
>>>>>>> range or instruction set to emulate. On a guest trap, Xen figures out
>>>>>>> which app it needs to run.
>>>>>>
>>>>>>
>>>>>>
>>>>>> I't is not best approach, I think. For example we need one SMC handler
>>>>>> for
>>>>>> all domains. Because that SMC handler should track execution state of
>>>>>> different
>>>>>> guests to help TEE with scheduling. You know, TEE can't block in
>>>>>> secure
>>>>>> state,
>>>>>> so it returns back and blocks in kernel driver. SMC handler need to
>>>>>> know
>>>>>> which guest it needs to wake up when times comes.
>>>>>>
>>>>>> The same story with virtual coprocessors, I think.
>>>>>>
>>>>>> On other hand, MMIO handler can be one per domain. So, it should be
>>>>>> configurable. Or, maybe we need per-app MMIO handler and one global
>>>>>> SMC
>>>>>> handler.
>>>>>> Perhaps, we need to think about all possible use cases.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Could you explain what would be the benefits to run this global SMC
>>>>> handler
>>>>> in EL0?
>>>>>
>>>>> After all, it will require access to the host SMC. So what will you
>>>>> protect
>>>>> against?
>>>>
>>>>
>>>> Yes, it will require access to host SMC. Idea is not to protect (but,
>>>> it can protect also).
>>>> I want to allow different guests to work with one TEE. Imagine that
>>>> multiple guests need
>>>> protected storage, accelerated cryptography or other TEE services.
>>>> All SMCs will be trapped to app, app will alter(or block) request and
>>>> forward it to TEE. This is the most basic use case, which we want to
>>>> implement.
>>>
>>>
>>>
>>> I am sorry, but I don't understand it. I envision EL0 as a way to limit
>>> the
>>> attack vector to Xen and the host. If you give full access to SMC, then
>>> you
>>> cannot protect.
>>
>> In any case it will limit the attack surface. Filtered SMC request is
>> not as destructive as
>> arbitrary SMC from a guest.
>
>
> I agree with that. But why in EL0? I think you answer partly below.
Yes.

>>
>>> If the idea is not to protect, why do you want to move the code in EL0?
>>> What
>>> is the point to add an overhead (even if it is small) in this case?
>>
>> There are many reasons:
>> 1. Community is reluctant to add OP-TEE (or any other TEE) handler
>> right into hypervisor codebase.
>
> Well, I think I was the only one to be reluctant. And I asked you to look at
> different solutions and come up with suggestion are saying why you solution
> is better.
Frankly, I'll be glad to put TEE handler right into the hypervisor. It
will ease up a lot
of things.

> Whilst I agree that EL0 app is a solution for a lot of emulation. We should
> be careful before moving code to EL0 and evaluating the impact. I am
> expecting to see the interface very small and the application to be
> standalone (e.g not requiring much interaction with Xen or the host
> hardware). But you seem to have a different view (see your e-mail with:
> "Probably, we can try another approach: allow application to register hooks
> in hypervisor: i.e. hook on MMIO, hook on SMC, hook on timer and so on.").
>
> If you introduce EL0 but require a big interface, then I believe you don't
> limit the surface attack.
Yes, actually we see EL0 apps as a way to extend hypervisor in a some
manageable way.
Isolation is a bonus. We here, at EPAM want to use EL0 apps for two things:
1. TEE support
2. vcoproc drivers.
Also apps can be used for
3. Device emulation (PL011 is the obvious candidate).

Obviously, 1 and 2 are not safe by their nature, even if they can be
executed in EL0.
In this cases apps can provide only modularity. And we will be happy to put them
right into hypervisor, if there are no objections.

>> 2. Modularity. You can detect running TEE during boot and load
>> appropriate TEE handler app (honestly, it is not a big deal, because
>> you know on which system will work your hypervisor and TEE type can be
>> hardcoded in build).
>
> Well, you could make Xen modular like Linux and still run everything in EL2.
> (Disclaimer, I am not saying we should do that...)
Yes, we could. And this will be a lot faster, than to run something in EL0.
This approach has own benefit: it will be crossplatform feature.

>> 3. Some degree of protection. Bug in EL0 handler will not bring down
>> whole hypervisor.
> If you have a single app handling all the domains using SMC, then you will
> bring down all thoses domains. I agree it does not take down the hypervisor,
> but it will render unusable a part of the platform.
This will bring down TEE interface, right. Domains will lose ability
to communicate
with TEE, but other functionality will remain intact.

> In this case, how do you plan to restore the services?
The obvious solution is to reboot the whole platform. It will be more
controlled process, than hypervisor crash.
But there are more soft ways.

For example, SMC app can be restarted. Then, I am almost sure that I
can ask OP-TEE to abort all opened sessions. After that, requests from
a new domains can be processed as usual, but we can't rely on state of
old domains, so they should be gracefully restarted. There will be
problem with dom0/domD, though.

> Also, a bug in the EL0 handler may give the opportunity, in your use case,
> to get access to the firmware or data from another guest. How this will
> bring more protection?
On other hand, bug in EL2 handler will give access to whole supervisor.

> If you handle only one guest per app, then it is very easy to kill that app
> and domain. It will only harm itself.
Yes, I agree. It is great for emulators (and can be used it this
case). But, unfortunately, TEE handler needs shared state. I can't see
how to implement OP-TEE handler without shared knowledge about wait
queues in all guests.

It just came to me that it can be possible to move most of this stuff
to OP-TEE. Can S-EL1 request two stage table walk for a given guest?
We can do this in software, anyways. Probably I can minimize TEE
handler it hypervisor, make it almost generic. Need to think more
about this...

-- 
WBR Volodymyr Babchuk aka lorc [+380976646013]
mailto: vlad.babchuk@gmail.com

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [ARM] Native application design and discussion (I hope)
  2017-04-21 14:42               ` Andrii Anisov
  2017-04-21 15:49                 ` Julien Grall
@ 2017-04-21 20:58                 ` Stefano Stabellini
  2017-04-21 21:17                   ` Stefano Stabellini
  2017-04-24 16:56                   ` Andrii Anisov
  1 sibling, 2 replies; 82+ messages in thread
From: Stefano Stabellini @ 2017-04-21 20:58 UTC (permalink / raw)
  To: Andrii Anisov
  Cc: Stefano Stabellini, Volodymyr Babchuk, Dario Faggioli,
	george.dunlap, Xen Devel, Julien Grall, Artem Mygaiev

Hello Andrii,

could you please use plain text (not HTML) in your emails?

On Fri, 21 Apr 2017, Andrii Anisov wrote:
> 
> Hello,
> 
> On 20.04.17 23:20, Volodymyr Babchuk wrote:
> 
> Hi Stefano,
> 
> On 12 April 2017 at 22:17, Stefano Stabellini <sstabellini@kernel.org> wrote:
> 
> On Wed, 12 Apr 2017, Dario Faggioli wrote:
> 
> On Tue, 2017-04-11 at 13:32 -0700, Stefano Stabellini wrote:
> 
> On Fri, 7 Apr 2017, Stefano Stabellini wrote:
> 
> This is the most difficult problem that we need to solve as part of
> this
> work. It is difficult to have the right answer at the beginning,
> before
> seeing any code. If the app_container/app_thread approach causes
> too
> much duplication of work, the alternative would be to fix/improve
> stubdoms (minios) until they match what we need. Specifically,
> these
> would be the requirements:
> 
> IMO, this stubdom way, is really really really interesting! :-)
> 
> 1) Determinism: a stubdom servicing a given guest needs to be
> scheduled
>    immediately after the guest vcpu traps into Xen. It needs to
>    deterministic.
> 
> We will also need another type of application: one which is periodically called by XEN itself, not actually servicing any domain request. This is needed for a
> coprocessor sharing framework scheduler implementation.

EL0 apps can be a powerful new tool for us to use, but they are not the
solution to everything. This is where I would draw the line: if the
workload needs to be scheduled periodically, then it is not a good fit
for an EL0 app. In that case, stubdoms or regular driver domains are a
better choice.

EL0 apps are a natural fit for emulators, when you have one instance per
VM.

I am not completely convinced that they could be used for cases where
you need one instance for all domains (even without periodic
scheduling). I am not sure they could be made to work well that way.
Stubdom or driver domains could be better fit for that use case too.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [ARM] Native application design and discussion (I hope)
  2017-04-21 20:58                 ` Stefano Stabellini
@ 2017-04-21 21:17                   ` Stefano Stabellini
  2017-04-24 16:56                   ` Andrii Anisov
  1 sibling, 0 replies; 82+ messages in thread
From: Stefano Stabellini @ 2017-04-21 21:17 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Andrii Anisov, Volodymyr Babchuk, Dario Faggioli, george.dunlap,
	Xen Devel, Julien Grall, Artem Mygaiev

On Fri, 21 Apr 2017, Stefano Stabellini wrote:
> Hello Andrii,
> 
> could you please use plain text (not HTML) in your emails?
> 
> On Fri, 21 Apr 2017, Andrii Anisov wrote:
> > 
> > Hello,
> > 
> > On 20.04.17 23:20, Volodymyr Babchuk wrote:
> > 
> > Hi Stefano,
> > 
> > On 12 April 2017 at 22:17, Stefano Stabellini <sstabellini@kernel.org> wrote:
> > 
> > On Wed, 12 Apr 2017, Dario Faggioli wrote:
> > 
> > On Tue, 2017-04-11 at 13:32 -0700, Stefano Stabellini wrote:
> > 
> > On Fri, 7 Apr 2017, Stefano Stabellini wrote:
> > 
> > This is the most difficult problem that we need to solve as part of
> > this
> > work. It is difficult to have the right answer at the beginning,
> > before
> > seeing any code. If the app_container/app_thread approach causes
> > too
> > much duplication of work, the alternative would be to fix/improve
> > stubdoms (minios) until they match what we need. Specifically,
> > these
> > would be the requirements:
> > 
> > IMO, this stubdom way, is really really really interesting! :-)
> > 
> > 1) Determinism: a stubdom servicing a given guest needs to be
> > scheduled
> >    immediately after the guest vcpu traps into Xen. It needs to
> >    deterministic.
> > 
> > We will also need another type of application: one which is periodically called by XEN itself, not actually servicing any domain request. This is needed for a
> > coprocessor sharing framework scheduler implementation.
> 
> EL0 apps can be a powerful new tool for us to use, but they are not the
> solution to everything. This is where I would draw the line: if the
> workload needs to be scheduled periodically, then it is not a good fit
> for an EL0 app. In that case, stubdoms or regular driver domains are a
> better choice.
> 
> EL0 apps are a natural fit for emulators, when you have one instance per
> VM.
> 
> I am not completely convinced that they could be used for cases where
> you need one instance for all domains (even without periodic
> scheduling). I am not sure they could be made to work well that way.
> Stubdom or driver domains could be better fit for that use case too.

Although I'll add that some emulators running as EL0 apps might have to
register a timer with Xen because they might need to send periodical
interrupts to the guest (think of emulating a timer).

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [ARM] Native application design and discussion (I hope)
  2017-04-21 17:38                       ` Julien Grall
  2017-04-21 18:35                         ` Volodymyr Babchuk
@ 2017-04-21 21:24                         ` Stefano Stabellini
  2017-04-24 16:14                           ` Andrii Anisov
                                             ` (2 more replies)
  1 sibling, 3 replies; 82+ messages in thread
From: Stefano Stabellini @ 2017-04-21 21:24 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, Volodymyr Babchuk, Dario Faggioli,
	george.dunlap, Xen Devel, Artem Mygaiev

On Fri, 21 Apr 2017, Julien Grall wrote:
> Hi Volodymyr,
> 
> On 21/04/17 18:04, Volodymyr Babchuk wrote:
> > On 21 April 2017 at 19:47, Julien Grall <julien.grall@arm.com> wrote:
> > > On 21/04/17 17:16, Volodymyr Babchuk wrote:
> > > > On 21 April 2017 at 18:57, Julien Grall <julien.grall@arm.com> wrote:
> > > > > 
> > > > > Hello Volodymyr,
> > > > > 
> > > > > On 20/04/17 21:20, Volodymyr Babchuk wrote:
> > > > > > 
> > > > > > 
> > > > > > On 12 April 2017 at 22:17, Stefano Stabellini
> > > > > > <sstabellini@kernel.org>
> > > > > > wrote:
> > > > > > > 
> > > > > > > 
> > > > > > > On Wed, 12 Apr 2017, Dario Faggioli wrote:
> > > > > > > > 
> > > > > > > > 
> > > > > > > > On Tue, 2017-04-11 at 13:32 -0700, Stefano Stabellini wrote:
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > On Fri, 7 Apr 2017, Stefano Stabellini wrote:
> > > > > > > 
> > > > > > > 
> > > > > > > We would have one app per emulator. Each app would register an
> > > > > > > MMIO
> > > > > > > range or instruction set to emulate. On a guest trap, Xen figures
> > > > > > > out
> > > > > > > which app it needs to run.
> > > > > > 
> > > > > > 
> > > > > > I't is not best approach, I think. For example we need one SMC
> > > > > > handler
> > > > > > for
> > > > > > all domains. Because that SMC handler should track execution state
> > > > > > of
> > > > > > different
> > > > > > guests to help TEE with scheduling. You know, TEE can't block in
> > > > > > secure
> > > > > > state,
> > > > > > so it returns back and blocks in kernel driver. SMC handler need to
> > > > > > know
> > > > > > which guest it needs to wake up when times comes.
> > > > > > 
> > > > > > The same story with virtual coprocessors, I think.
> > > > > > 
> > > > > > On other hand, MMIO handler can be one per domain. So, it should be
> > > > > > configurable. Or, maybe we need per-app MMIO handler and one global
> > > > > > SMC
> > > > > > handler.
> > > > > > Perhaps, we need to think about all possible use cases.
> > > > > 
> > > > > 
> > > > > 
> > > > > Could you explain what would be the benefits to run this global SMC
> > > > > handler
> > > > > in EL0?
> > > > > 
> > > > > After all, it will require access to the host SMC. So what will you
> > > > > protect
> > > > > against?
> > > > 
> > > > Yes, it will require access to host SMC. Idea is not to protect (but,
> > > > it can protect also).
> > > > I want to allow different guests to work with one TEE. Imagine that
> > > > multiple guests need
> > > > protected storage, accelerated cryptography or other TEE services.
> > > > All SMCs will be trapped to app, app will alter(or block) request and
> > > > forward it to TEE. This is the most basic use case, which we want to
> > > > implement.
> > > 
> > > 
> > > I am sorry, but I don't understand it. I envision EL0 as a way to limit
> > > the
> > > attack vector to Xen and the host. If you give full access to SMC, then
> > > you
> > > cannot protect.
> > In any case it will limit the attack surface. Filtered SMC request is
> > not as destructive as
> > arbitrary SMC from a guest.
> 
> I agree with that. But why in EL0? I think you answer partly below.
> 
> > 
> > > If the idea is not to protect, why do you want to move the code in EL0?
> > > What
> > > is the point to add an overhead (even if it is small) in this case?
> > There are many reasons:
> > 1. Community is reluctant to add OP-TEE (or any other TEE) handler
> > right into hypervisor codebase.
> 
> Well, I think I was the only one to be reluctant. And I asked you to look at
> different solutions and come up with suggestion are saying why you solution is
> better.
> 
> Whilst I agree that EL0 app is a solution for a lot of emulation. We should be
> careful before moving code to EL0 and evaluating the impact. I am expecting to
> see the interface very small and the application to be standalone (e.g not
> requiring much interaction with Xen or the host hardware).

I also had this understanding


> But you seem to have a different view (see your e-mail with:
> "Probably, we can try another approach: allow application to register
> hooks in hypervisor: i.e. hook on MMIO, hook on SMC, hook on timer and
> so on.").
> 
> If you introduce EL0 but require a big interface, then I believe you don't
> limit the surface attack.

Also, it is difficult to maintain a large EL0-Xen interface.

The idea is basically to register an MMIO range to emulate, submit the
request to the EL0 app, which would take care of the emulation. The
interface with Xen would be mostly limited to map/unmap guest memory
(only the guest it is servicing) and send interrupts to the guest. It
would always and only be run immediately after the guest vcpu in its own
time slot.

The key is to be simple. If it becomes complex, then we are reinventing
stubdoms.

If the workload needs hardware access, periodical scheduling and it's
only once instance for all domains, then it's probably better as a
stubdom.

If the workloads doesn't need hardware access, it's one instance per
domain, at most it needs to register a timer with Xen, then it would be
fine as EL0 app. For the EL0 app framework, I'll start from there.


For example, I can see that something as complex as TEE emulation could
have one component running as an EL0 app and another component in a
stubdom, coexisting peacefully.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [ARM] Native application design and discussion (I hope)
  2017-04-21 18:35                         ` Volodymyr Babchuk
@ 2017-04-24 11:00                           ` Julien Grall
  2017-04-24 21:29                             ` Volodymyr Babchuk
  0 siblings, 1 reply; 82+ messages in thread
From: Julien Grall @ 2017-04-24 11:00 UTC (permalink / raw)
  To: Volodymyr Babchuk
  Cc: Dario Faggioli, Stefano Stabellini, george.dunlap, Artem Mygaiev,
	Xen Devel



On 21/04/17 19:35, Volodymyr Babchuk wrote:
> Hi Julien,

Hi Volodymyr,


> On 21 April 2017 at 20:38, Julien Grall <julien.grall@arm.com> wrote:
>>> 3. Some degree of protection. Bug in EL0 handler will not bring down
>>> whole hypervisor.
>> If you have a single app handling all the domains using SMC, then you will
>> bring down all thoses domains. I agree it does not take down the hypervisor,
>> but it will render unusable a part of the platform.
> This will bring down TEE interface, right. Domains will lose ability
> to communicate
> with TEE, but other functionality will remain intact.

I don't think so. SMC are synchronous, so if a vCPU do an SMC call it 
will block until the handler has finished.

So if the EL0 app crash, then you will render the vCPU unusable. I guess 
this could be fixed by a timeout, but how do define the timeout? And 
then, you may have some state store in the EL0 app, they will be lost if 
the app crash. So how do you restore the state?

>
>> In this case, how do you plan to restore the services?
> The obvious solution is to reboot the whole platform. It will be more
> controlled process, than hypervisor crash.
> But there are more soft ways.
>
> For example, SMC app can be restarted. Then, I am almost sure that I
> can ask OP-TEE to abort all opened sessions. After that, requests from
> a new domains can be processed as usual, but we can't rely on state of
> old domains, so they should be gracefully restarted. There will be
> problem with dom0/domD, though.

What is domD? I guess you mean Driver Domain. If so, the goal of using 
driving domain is to restart it easily if a driver crash.

>
>> Also, a bug in the EL0 handler may give the opportunity, in your use case,
>> to get access to the firmware or data from another guest. How this will
>> bring more protection?
> On other hand, bug in EL2 handler will give access to whole supervisor.
>
>> If you handle only one guest per app, then it is very easy to kill that app
>> and domain. It will only harm itself.
> Yes, I agree. It is great for emulators (and can be used it this
> case). But, unfortunately, TEE handler needs shared state. I can't see
> how to implement OP-TEE handler without shared knowledge about wait
> queues in all guests.
>
> It just came to me that it can be possible to move most of this stuff
> to OP-TEE. Can S-EL1 request two stage table walk for a given guest?
> We can do this in software, anyways. Probably I can minimize TEE
> handler it hypervisor, make it almost generic. Need to think more
> about this...

I am not sure how S-EL1 could request stage-2 table walk, the SMC may be 
handled on a different pCPU than the guest vCPU or even the guest vCPU 
may have been descheduled whilst waiting the answer.

I think you still need the hypervisor to do the translation IPA -> PA 
for your guest and also potential bounce buffer if the guest buffer span 
across multiple pages. You will also need the hypervisor to do basic 
sanity check such as for preventing to the buffer to live in a foreign 
mapping and making sure the page does not disappear under your feet 
(likely via incrementing the refcount on the page) when handling the SMC.

Cheers,
-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [ARM] Native application design and discussion (I hope)
  2017-04-21 21:24                         ` Stefano Stabellini
@ 2017-04-24 16:14                           ` Andrii Anisov
  2017-04-24 16:46                           ` Andrii Anisov
  2017-04-27 15:25                           ` George Dunlap
  2 siblings, 0 replies; 82+ messages in thread
From: Andrii Anisov @ 2017-04-24 16:14 UTC (permalink / raw)
  To: Stefano Stabellini, Julien Grall
  Cc: Volodymyr Babchuk, Dario Faggioli, Artem Mygaiev, george.dunlap,
	Xen Devel

Stefano,


On 22.04.17 00:24, Stefano Stabellini wrote:
> The idea is basically to register an MMIO range to emulate, submit the
> request to the EL0 app, which would take care of the emulation. The
> interface with Xen would be mostly limited to map/unmap guest memory
> (only the guest it is servicing) and send interrupts to the guest. It
> would always and only be run immediately after the guest vcpu in its own
> time slot.
>
> The key is to be simple. If it becomes complex, then we are reinventing
> stubdoms.
My impression is that you stick at the emulators as the only application 
of this new concept.
So that you are suggesting rather emulators specific interface not 
simple but generic one.

-- 

*Andrii Anisov*



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [ARM] Native application design and discussion (I hope)
  2017-04-21 21:24                         ` Stefano Stabellini
  2017-04-24 16:14                           ` Andrii Anisov
@ 2017-04-24 16:46                           ` Andrii Anisov
  2017-04-27 15:25                           ` George Dunlap
  2 siblings, 0 replies; 82+ messages in thread
From: Andrii Anisov @ 2017-04-24 16:46 UTC (permalink / raw)
  To: Stefano Stabellini, Julien Grall
  Cc: Volodymyr Babchuk, Dario Faggioli, Artem Mygaiev, george.dunlap,
	Xen Devel

Stefano,


On 22.04.17 00:24, Stefano Stabellini wrote:
> The key is to be simple. If it becomes complex, then we are reinventing
> stubdoms.
>
> If the workload needs hardware access, periodical scheduling and it's
> only once instance for all domains, then it's probably better as a
> stubdom.
>
> If the workloads doesn't need hardware access, it's one instance per
> domain, at most it needs to register a timer with Xen, then it would be
> fine as EL0 app. For the EL0 app framework, I'll start from there.
If we are speaking about shared coprocessors framework, we need here 
several things:
  - MMIO access emulation
  - periodic actions (scheduling) which at least will include IOMMU 
reconfiguration and some actions with coprocessor hardware itself in runtime
  - coprocessor interrupts handling and redistribution to target domains

 From one and we would like to have that stuff encapsulated in some 
container (application), somehow separated from XEN. From other hand 
minimal overhead is desired.

-- 

*Andrii Anisov*



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [ARM] Native application design and discussion (I hope)
  2017-04-21 20:58                 ` Stefano Stabellini
  2017-04-21 21:17                   ` Stefano Stabellini
@ 2017-04-24 16:56                   ` Andrii Anisov
  2017-04-24 18:08                     ` Stefano Stabellini
  2017-04-24 19:11                     ` Julien Grall
  1 sibling, 2 replies; 82+ messages in thread
From: Andrii Anisov @ 2017-04-24 16:56 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Volodymyr Babchuk, Dario Faggioli, george.dunlap, Xen Devel,
	Julien Grall, Artem Mygaiev

On 21.04.17 23:58, Stefano Stabellini wrote:
> Hello Andrii,
>
> could you please use plain text (not HTML) in your emails?
My bad. Will be checking delivery format settings carefully.

> On Fri, 21 Apr 2017, Andrii Anisov wrote:
>> We will also need another type of application: one which is periodically called by XEN itself, not actually servicing any domain request. This is needed for a
>> coprocessor sharing framework scheduler implementation.
> EL0 apps can be a powerful new tool for us to use, but they are not the
> solution to everything. This is where I would draw the line: if the
> workload needs to be scheduled periodically, then it is not a good fit
> for an EL0 app.
 From my last conversation with Volodymyr I've got a feeling that 
notions "EL0" and "XEN native application" must be pretty orthogonal.
In [1] Volodymyr got no performance gain from changing domain's 
exception level from EL1 to EL0.
Only when Volodymyr stripped the domain's context  abstraction (i.e. 
dropped GIC context store/restore) some noticeable results were reached.
So I treat his results as a "light domain" experiment.
For me it is interesting how configurable and light the "domain" 
abstraction could be and when it starts to be treated as a "native 
application"?

> In that case, stubdoms or regular driver domains are a
> better choice.
Sorry for my ignorance, but is there any difference between regular 
domains and stub domains from hypervisor point of view?

[1]http://marc.info/?l=xen-devel&m=149088856116797&w=2  - benchmark results

-- 

*Andrii Anisov*



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [ARM] Native application design and discussion (I hope)
  2017-04-24 16:56                   ` Andrii Anisov
@ 2017-04-24 18:08                     ` Stefano Stabellini
  2017-04-25 10:15                       ` Andrii Anisov
                                         ` (2 more replies)
  2017-04-24 19:11                     ` Julien Grall
  1 sibling, 3 replies; 82+ messages in thread
From: Stefano Stabellini @ 2017-04-24 18:08 UTC (permalink / raw)
  To: Andrii Anisov
  Cc: Stefano Stabellini, Volodymyr Babchuk, Dario Faggioli,
	george.dunlap, Xen Devel, Julien Grall, Artem Mygaiev

On Mon, 24 Apr 2017, Andrii Anisov wrote:
> On 21.04.17 23:58, Stefano Stabellini wrote:
> > Hello Andrii,
> > 
> > could you please use plain text (not HTML) in your emails?
> My bad. Will be checking delivery format settings carefully.
> 
> > On Fri, 21 Apr 2017, Andrii Anisov wrote:
> > > We will also need another type of application: one which is periodically
> > > called by XEN itself, not actually servicing any domain request. This is
> > > needed for a
> > > coprocessor sharing framework scheduler implementation.
> > EL0 apps can be a powerful new tool for us to use, but they are not the
> > solution to everything. This is where I would draw the line: if the
> > workload needs to be scheduled periodically, then it is not a good fit
> > for an EL0 app.
> From my last conversation with Volodymyr I've got a feeling that notions "EL0"
> and "XEN native application" must be pretty orthogonal.
> In [1] Volodymyr got no performance gain from changing domain's exception
> level from EL1 to EL0.
> Only when Volodymyr stripped the domain's context  abstraction (i.e. dropped
> GIC context store/restore) some noticeable results were reached.
> So I treat his results as a "light domain" experiment.
> For me it is interesting how configurable and light the "domain" abstraction
> could be and when it starts to be treated as a "native application"?
> 
> > In that case, stubdoms or regular driver domains are a
> > better choice.
> Sorry for my ignorance, but is there any difference between regular domains
> and stub domains from hypervisor point of view?
> 
> [1]http://marc.info/?l=xen-devel&m=149088856116797&w=2  - benchmark results

Let me add more context and information.


Stubdomains (stubdoms in short) are small domains, each running a single
application. Typically they run unikernels rather than a full fledged
operating system. A classic example is QEMU stubdoms on x86: one QEMU
stubdoms is started for each regular guest domain. Each QEMU stubdom
instance provides emulation for one guest - it runs one instance of
QEMU.

From Xen point of view, they are regular domain, except that they are
privilege in regards to one particular guest only (they can map a page
of the guest they are servicing, but they cannot map any random page in
memory). If you do "xl list" you would see stubdoms in the output.

The advantages of using stubdoms are:
- they already exist
- their security properties are well known
- you could run almost anything, including Linux or a full distro, as a
  stubdom
- they could access hardware resources if you assign them to stubdoms
- there could be one for all guests (like DomD), or many
- they can run in parallel to the guest vcpus

The disadvantages are:
- they are scheduled independently: the Xen scheduler has no special
  code to handle stubdoms. Time spent in a stubdom is not accounted to
  the guest that triggered it.
- they potentially run on any pcpu: because they are scheduled
  independently, they could be run anywhere
- slower to context switch
- we don't have a good (any?) infrastructure to build stubdoms today
- they show up on "xl list", most users find it confusing


In this thread, we are discussing whether it makes sense to introduce a
*new* model, one that has different properties, therefore different
advantages and disadvantages. Let's call it EL0 apps.

An EL0 app is an application that runs on top of the Xen hypervisor
directly in EL0 mode. It is not a domain, and it doesn't show up on "xl
list". It runs on the same timeslot of the guest vcpu it is servicing.
It is scheduled deterministically: right after a guest vcpu traps into
the hypervisor. The build is simple. Writing an EL0 app should be
(almost) as simple as writing a regular emulator for Xen. An EL0
app is not scheduled, but it could register a timer with Xen. The
interface between EL0 apps and Xen is small: the number of
functionalities exposed are very limited. They don't access hardware.

The advantages of using EL0 apps are:
- scheduled deterministically
- faster context switch
- lower and deterministic latency
- EL0 apps execution time is accounted appropriately to the guest that
  they are servicing
- they run on the same pcpu as the guest vcpu
- writing and building EL0 apps should be easier

The disadvantages are:
- support for EL0 apps still needs to be written
- they only expose limited functionalities (map guest pages, send
  interrupts to guest, register timer)
- they don't have access to hardware
- one for each guest (not one for all guests)


As you can imagine, depending on what you need to write, one model could
be a better fit than the other. Although stubdoms on x86 are mostly used
to run QEMU, to provide emulation to guests, actually, emulators would
be a better fit as EL0 apps.

Something more complex that requires interaction with the hardware is
likely better as a standalone stubdom.


> If we are speaking about shared coprocessors framework, we need here several
> things:
>  - MMIO access emulation

This could be run as EL0 app.


>  - periodic actions (scheduling) which at least will include IOMMU
> reconfiguration and some actions with coprocessor hardware itself in runtime
>  - coprocessor interrupts handling and redistribution to target domains

These would be better as stubdoms. Or, if they are simple enough, in
the hypervisor directly.

Assuming that MMIO access emulation is done in an EL0 app, how many
types of calls does it need to make to these two components? If the
answer is none, or *very* few, then it is OK to run MMIO access
emulation as an EL0 app. These two components could even be in Xen, if
they are simple enough.

If these two pieces are run as a stubdom, we could do everything in a
stubdom (including MMIO access emulation), or we could still run MMIO
access emulation in an EL0 app, and introduce a ring buffer connecting
EL0 apps with the stubdom. EL0 apps could send requests to it that way.

If these two pieces are in Xen, we would need a very strong and
well-thought-out set of checks to protect Xen from bad EL0 calls. This
is why we are saying that if the number of EL0 calls is too high, then
it might be not be feasible to use an EL0 app for your use case.

As far as I can tell, without looking at the existing code, the options
are:
1) everything in a stubdom
2) MMIO access emulation as EL0 app, the rest in a stubdom
3) MMIO access emulation as EL0 app, the rest in Xen

The right approach depends on the type of interactions between MMIO
access emulation and the rest of the shared coprocessors framework. Does
this make sense?


> From one and we would like to have that stuff encapsulated in some container
> (application), somehow separated from XEN. From other hand minimal overhead is
> desired.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [ARM] Native application design and discussion (I hope)
  2017-04-24 16:56                   ` Andrii Anisov
  2017-04-24 18:08                     ` Stefano Stabellini
@ 2017-04-24 19:11                     ` Julien Grall
  2017-04-24 21:41                       ` Volodymyr Babchuk
  2017-04-25  8:52                       ` Andrii Anisov
  1 sibling, 2 replies; 82+ messages in thread
From: Julien Grall @ 2017-04-24 19:11 UTC (permalink / raw)
  To: Andrii Anisov, Stefano Stabellini
  Cc: Volodymyr Babchuk, Dario Faggioli, george.dunlap, Xen Devel, nd,
	Artem Mygaiev

Hi Andrii,

On 24/04/2017 17:56, Andrii Anisov wrote:
> On 21.04.17 23:58, Stefano Stabellini wrote:
>> On Fri, 21 Apr 2017, Andrii Anisov wrote:
>>> We will also need another type of application: one which is
>>> periodically called by XEN itself, not actually servicing any domain
>>> request. This is needed for a
>>> coprocessor sharing framework scheduler implementation.
>> EL0 apps can be a powerful new tool for us to use, but they are not the
>> solution to everything. This is where I would draw the line: if the
>> workload needs to be scheduled periodically, then it is not a good fit
>> for an EL0 app.
> From my last conversation with Volodymyr I've got a feeling that notions
> "EL0" and "XEN native application" must be pretty orthogonal.
> In [1] Volodymyr got no performance gain from changing domain's
> exception level from EL1 to EL0.
> Only when Volodymyr stripped the domain's context  abstraction (i.e.
> dropped GIC context store/restore) some noticeable results were reached.

Do you have numbers for part that take times in the save/restore? You 
mention GIC and I am a bit surprised you don't mention FPU.

I would have a look at optimizing the context switch path. Some ideas:
	- there are a lot of unnecessary isb/dsb. The registers used by the 
guests only will be synchronized by eret.
	- FPU is taking time to save/restore, you could make it lazy
	- It might be possible to limit the number of LRs saved/restored 
depending on the number of LRs used by a domain.
	- ...

If the numbers are still bad, then we can start stripping some part (for 
instance you may not need FPU).

Cheers,

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [ARM] Native application design and discussion (I hope)
  2017-04-24 11:00                           ` Julien Grall
@ 2017-04-24 21:29                             ` Volodymyr Babchuk
  0 siblings, 0 replies; 82+ messages in thread
From: Volodymyr Babchuk @ 2017-04-24 21:29 UTC (permalink / raw)
  To: Julien Grall
  Cc: Dario Faggioli, Stefano Stabellini, george.dunlap, Artem Mygaiev,
	Xen Devel

Hello Julien,

>>>>
>>>> 3. Some degree of protection. Bug in EL0 handler will not bring down
>>>> whole hypervisor.
>>>
>>> If you have a single app handling all the domains using SMC, then you
>>> will
>>> bring down all thoses domains. I agree it does not take down the
>>> hypervisor,
>>> but it will render unusable a part of the platform.
>>
>> This will bring down TEE interface, right. Domains will lose ability
>> to communicate
>> with TEE, but other functionality will remain intact.
>
>
> I don't think so. SMC are synchronous, so if a vCPU do an SMC call it will
> block until the handler has finished.
>
> So if the EL0 app crash, then you will render the vCPU unusable. I guess
> this could be fixed by a timeout, but how do define the timeout? And then,
> you may have some state store in the EL0 app, they will be lost if the app
> crash. So how do you restore the state?
When app will crash, hypervisor should handle this as an exception.
Then SMC framework in the hypervisor will return and error as result
of SMC.
As I said, state int app will be lost. So we will need to reboot
guests that use SMC. Or just any subsequent SMC from them. Only newly
booted guest will be allowed to do SMC.

>>
>>> In this case, how do you plan to restore the services?
>>
>> The obvious solution is to reboot the whole platform. It will be more
>> controlled process, than hypervisor crash.
>> But there are more soft ways.
>>
>> For example, SMC app can be restarted. Then, I am almost sure that I
>> can ask OP-TEE to abort all opened sessions. After that, requests from
>> a new domains can be processed as usual, but we can't rely on state of
>> old domains, so they should be gracefully restarted. There will be
>> problem with dom0/domD, though.
>
>
> What is domD? I guess you mean Driver Domain. If so, the goal of using
> driving domain is to restart it easily if a driver crash.
Yep, driver domain. In theory yes. But as I can see from practice, it
is very difficult to restart a device on a running system.

>>
>>> Also, a bug in the EL0 handler may give the opportunity, in your use
>>> case,
>>> to get access to the firmware or data from another guest. How this will
>>> bring more protection?
>>
>> On other hand, bug in EL2 handler will give access to whole supervisor.
>>
>>> If you handle only one guest per app, then it is very easy to kill that
>>> app
>>> and domain. It will only harm itself.
>>
>> Yes, I agree. It is great for emulators (and can be used it this
>> case). But, unfortunately, TEE handler needs shared state. I can't see
>> how to implement OP-TEE handler without shared knowledge about wait
>> queues in all guests.
>>
>> It just came to me that it can be possible to move most of this stuff
>> to OP-TEE. Can S-EL1 request two stage table walk for a given guest?
>> We can do this in software, anyways. Probably I can minimize TEE
>> handler it hypervisor, make it almost generic. Need to think more
>> about this...
>
>
> I am not sure how S-EL1 could request stage-2 table walk, the SMC may be
> handled on a different pCPU than the guest vCPU or even the guest vCPU may
> have been descheduled whilst waiting the answer.
If it can be different pCPU, then yes, this is a problem. But, someone
erlier mentioned that he want to execute EL0 app at the same pCPU,
wheren requesting vCPU ran.

> I think you still need the hypervisor to do the translation IPA -> PA for
Yes, my intention is to do this in EL0 app (or directly in
hypervisor). I just wanted to to consider idea of doing this in TEE.
But you are right, XEN can remove pages in any time, so we need to pin
them.
> your guest and also potential bounce buffer if the guest buffer span across
> multiple pages.
That should be done on TEE driver side. Actually, I'm currently
upstreaming necessary patches to OP-TEE.

> You will also need the hypervisor to do basic sanity check
> such as for preventing to the buffer to live in a foreign mapping and making
> sure the page does not disappear under your feet (likely via incrementing
> the refcount on the page) when handling the SMC.
That is a very good point. Thank you.

-- 
WBR Volodymyr Babchuk aka lorc [+380976646013]
mailto: vlad.babchuk@gmail.com

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [ARM] Native application design and discussion (I hope)
  2017-04-24 19:11                     ` Julien Grall
@ 2017-04-24 21:41                       ` Volodymyr Babchuk
  2017-04-25 11:43                         ` Julien Grall
  2017-04-25  8:52                       ` Andrii Anisov
  1 sibling, 1 reply; 82+ messages in thread
From: Volodymyr Babchuk @ 2017-04-24 21:41 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, Andrii Anisov, Dario Faggioli, george.dunlap,
	Xen Devel, nd, Artem Mygaiev

Julien,

>>>> We will also need another type of application: one which is
>>>> periodically called by XEN itself, not actually servicing any domain
>>>> request. This is needed for a
>>>> coprocessor sharing framework scheduler implementation.
>>>
>>> EL0 apps can be a powerful new tool for us to use, but they are not the
>>> solution to everything. This is where I would draw the line: if the
>>> workload needs to be scheduled periodically, then it is not a good fit
>>> for an EL0 app.
>>
>> From my last conversation with Volodymyr I've got a feeling that notions
>> "EL0" and "XEN native application" must be pretty orthogonal.
>> In [1] Volodymyr got no performance gain from changing domain's
>> exception level from EL1 to EL0.
>> Only when Volodymyr stripped the domain's context  abstraction (i.e.
>> dropped GIC context store/restore) some noticeable results were reached.
>
>
> Do you have numbers for part that take times in the save/restore? You
> mention GIC and I am a bit surprised you don't mention FPU.
I did it in the other thread. Check out [1]. The most speed up I got
after removing vGIC context handling

> I would have a look at optimizing the context switch path. Some ideas:
>         - there are a lot of unnecessary isb/dsb. The registers used by the
> guests only will be synchronized by eret.
I have removed (almost) all of them. No significant changes in latency.

>         - FPU is taking time to save/restore, you could make it lazy
This also does not takes much time.

>         - It might be possible to limit the number of LRs saved/restored
> depending on the number of LRs used by a domain.
Excuse me, what is LR in this context?

You can take a look at my context switch routines at [2].

[1] http://marc.info/?l=xen-devel&m=149088856116797&w=2
[2] https://github.com/lorc/xen/blob/el0_app/xen/arch/arm/domain.c#L257


-- 
WBR Volodymyr Babchuk aka lorc [+380976646013]
mailto: vlad.babchuk@gmail.com

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [ARM] Native application design and discussion (I hope)
  2017-04-24 19:11                     ` Julien Grall
  2017-04-24 21:41                       ` Volodymyr Babchuk
@ 2017-04-25  8:52                       ` Andrii Anisov
  1 sibling, 0 replies; 82+ messages in thread
From: Andrii Anisov @ 2017-04-25  8:52 UTC (permalink / raw)
  To: Julien Grall, Stefano Stabellini
  Cc: Volodymyr Babchuk, Dario Faggioli, george.dunlap, Xen Devel, nd,
	Artem Mygaiev

Dear Julien,

I just read Volodymyr's emails, looked through his code and we had a 
discussion.
So I'm not ready to present technical details. I guess Volodymyr is a 
right person while he is up to this topic.

On 24.04.17 22:11, Julien Grall wrote:
> Do you have numbers for part that take times in the save/restore? You 
> mention GIC and I am a bit surprised you don't mention FPU.
Volodymyr mentioned GIC in his email with benchmark results as an 
example of the major change.

-- 

*Andrii Anisov*


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [ARM] Native application design and discussion (I hope)
  2017-04-24 18:08                     ` Stefano Stabellini
@ 2017-04-25 10:15                       ` Andrii Anisov
  2017-05-05 10:51                       ` Andrii Anisov
  2017-05-05 11:09                       ` [ARM] Native application design and discussion (I hope) Andrii Anisov
  2 siblings, 0 replies; 82+ messages in thread
From: Andrii Anisov @ 2017-04-25 10:15 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Volodymyr Babchuk, Dario Faggioli, george.dunlap, Xen Devel,
	Julien Grall, Artem Mygaiev

Dear Stefano,


Thank you for such a wide explanation.

I have to read it carefully and consider it. Also some internal 
discussion is needed.

So I'll get back later with comments.


-- 

*Andrii Anisov*



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [ARM] Native application design and discussion (I hope)
  2017-04-24 21:41                       ` Volodymyr Babchuk
@ 2017-04-25 11:43                         ` Julien Grall
  2017-04-26 21:44                           ` Volodymyr Babchuk
  0 siblings, 1 reply; 82+ messages in thread
From: Julien Grall @ 2017-04-25 11:43 UTC (permalink / raw)
  To: Volodymyr Babchuk
  Cc: Stefano Stabellini, Andrii Anisov, Dario Faggioli, george.dunlap,
	Xen Devel, nd, Artem Mygaiev



On 24/04/17 22:41, Volodymyr Babchuk wrote:
> Julien,

Hi Volodymyr,

>>>>> We will also need another type of application: one which is
>>>>> periodically called by XEN itself, not actually servicing any domain
>>>>> request. This is needed for a
>>>>> coprocessor sharing framework scheduler implementation.
>>>>
>>>> EL0 apps can be a powerful new tool for us to use, but they are not the
>>>> solution to everything. This is where I would draw the line: if the
>>>> workload needs to be scheduled periodically, then it is not a good fit
>>>> for an EL0 app.
>>>
>>> From my last conversation with Volodymyr I've got a feeling that notions
>>> "EL0" and "XEN native application" must be pretty orthogonal.
>>> In [1] Volodymyr got no performance gain from changing domain's
>>> exception level from EL1 to EL0.
>>> Only when Volodymyr stripped the domain's context  abstraction (i.e.
>>> dropped GIC context store/restore) some noticeable results were reached.
>>
>>
>> Do you have numbers for part that take times in the save/restore? You
>> mention GIC and I am a bit surprised you don't mention FPU.
> I did it in the other thread. Check out [1]. The most speed up I got
> after removing vGIC context handling

Oh, yes. Sorry I forgot this thread. Continuing on that, you said that 
"Now profiler shows that hypervisor spends time in spinlocks and p2m code."

Could you expand here? How the EL0 app will spend time in p2m code?

Similarly, why spinlocks take time? Are they contented?

>
>> I would have a look at optimizing the context switch path. Some ideas:
>>         - there are a lot of unnecessary isb/dsb. The registers used by the
>> guests only will be synchronized by eret.
> I have removed (almost) all of them. No significant changes in latency.
>
>>         - FPU is taking time to save/restore, you could make it lazy
> This also does not takes much time.
>
>>         - It might be possible to limit the number of LRs saved/restored
>> depending on the number of LRs used by a domain.
> Excuse me, what is LR in this context?

Sorry I meant GIC LRs (see GIC save/restore code). They are used to list 
the interrupts injected to the guest. All of they may not be used at the 
time of the context switch.

>
> You can take a look at my context switch routines at [2].

I had a quick look and I am not sure which context switch you exactly 
used as you split it into 2 helpers but also modify the current one.

Could you briefly describe the context switch you do for EL0 app here?


> [1] http://marc.info/?l=xen-devel&m=149088856116797&w=2
> [2] https://github.com/lorc/xen/blob/el0_app/xen/arch/arm/domain.c#L257


Cheers,

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [ARM] Native application design and discussion (I hope)
  2017-04-25 11:43                         ` Julien Grall
@ 2017-04-26 21:44                           ` Volodymyr Babchuk
  2017-04-27 17:26                             ` Volodymyr Babchuk
  2017-05-02 12:42                             ` Julien Grall
  0 siblings, 2 replies; 82+ messages in thread
From: Volodymyr Babchuk @ 2017-04-26 21:44 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, Andrii Anisov, Dario Faggioli, george.dunlap,
	Xen Devel, nd, Artem Mygaiev

Hi Julien,

On 25 April 2017 at 14:43, Julien Grall <julien.grall@arm.com> wrote:
>>>>>> We will also need another type of application: one which is
>>>>>> periodically called by XEN itself, not actually servicing any domain
>>>>>> request. This is needed for a
>>>>>> coprocessor sharing framework scheduler implementation.
>>>>>
>>>>>
>>>>> EL0 apps can be a powerful new tool for us to use, but they are not the
>>>>> solution to everything. This is where I would draw the line: if the
>>>>> workload needs to be scheduled periodically, then it is not a good fit
>>>>> for an EL0 app.
>>>>
>>>>
>>>> From my last conversation with Volodymyr I've got a feeling that notions
>>>> "EL0" and "XEN native application" must be pretty orthogonal.
>>>> In [1] Volodymyr got no performance gain from changing domain's
>>>> exception level from EL1 to EL0.
>>>> Only when Volodymyr stripped the domain's context  abstraction (i.e.
>>>> dropped GIC context store/restore) some noticeable results were reached.
>>>
>>>
>>>
>>> Do you have numbers for part that take times in the save/restore? You
>>> mention GIC and I am a bit surprised you don't mention FPU.
>>
>> I did it in the other thread. Check out [1]. The most speed up I got
>> after removing vGIC context handling
>
>
> Oh, yes. Sorry I forgot this thread. Continuing on that, you said that "Now
> profiler shows that hypervisor spends time in spinlocks and p2m code."
>
> Could you expand here? How the EL0 app will spend time in p2m code?
I don't quite remember. It was somewhere around p2m save/restore
context functions.
I'll try to restore that setup and will provide more details.

> Similarly, why spinlocks take time? Are they contented?
Problem is that my profiler does not show stack, so I can't say which
spinlock causes this. But profiler didn't showed that CPU spend much
time in spinlock wait loop. So looks like there are no contention.

>>
>>> I would have a look at optimizing the context switch path. Some ideas:
>>>         - there are a lot of unnecessary isb/dsb. The registers used by
>>> the
>>> guests only will be synchronized by eret.
>>
>> I have removed (almost) all of them. No significant changes in latency.
>>
>>>         - FPU is taking time to save/restore, you could make it lazy
>>
>> This also does not takes much time.
>>
>>>         - It might be possible to limit the number of LRs saved/restored
>>> depending on the number of LRs used by a domain.
>>
>> Excuse me, what is LR in this context?
>
>
> Sorry I meant GIC LRs (see GIC save/restore code). They are used to list the
> interrupts injected to the guest. All of they may not be used at the time of
> the context switch.
As I said, I don't call GIC save and restore routines, So, that should
no be an issue (if I got that right).

>>
>> You can take a look at my context switch routines at [2].
>
>
> I had a quick look and I am not sure which context switch you exactly used
> as you split it into 2 helpers but also modify the current one.
>
> Could you briefly describe the context switch you do for EL0 app here?
As I said, I tried to reuse all existing services. My PoC hosts app in
separate domain. Also this domain have own vcpu. So, at first I used
the plain old ctxt_switch_from()/ctxt_switch_to() pair from domain.c.
You know that those two functions save/restore almost all state of
vCPU except pc, sp, lr and other general purpose registers. The
remaining context is saved/restored in entry.S
I just made v->arch.cpu_info->guest_cpu_user_regs.pc to point to app
entry point and changed saved cpsr, to switch right into el0.

Then I copied  ctxt_switch_from()/ctxt_switch_to() to
ctxt_switch_from_partial()/ctxt_switch_to_partial() and began to
remove all unneeded code (dsb()'s\isb()'s, gic context handling, etc).
So, overall flow is following:

0. If it is the first call, then I create 1:1 VM mapping and program
ttbr0, ttbrc, mair  registers of app vcpu.
1. I pause a calling vcpu
2. I program saved pc of app vcpu to point to the app entry point, sp
to point to a top of a stack, cpsr to entry in el0 mode.
3. I call ctxt_switch_from_partial() to save context of calling vcpu
4. I enable TGE bit
5. I call ctx_switch_to_partial() to restore context of app vcpu
6. I call __save_context() to save rest of the context of calling vcpu
(pc, sp, lr, r0-r31).
7. I invoke switch_stack_and_jump() to restore rest of the context of app vcpu
8. Now I'm in EL0 app. Hooray! App does something, invokes syscalls
(which are handled in hypervisor) and so on.
9. App invoke syscall named app_exit()
10.I use  ctxt_switch_from_partial() to save app state (actually it is
not needed, I think)
11. I use ctxt_swtich_to_partial() to restore calling vcpu state
12. I unpause calling vcpu and drop TGE bit.
13. I call __restore_context() to restore pc, lr and friends. At this
time code jumps back to p.6 (because I saved pc there). But it checks
flag variable and sees that it is actually exit from app.
14. ... so it exits back to calling domain.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [ARM] Native application design and discussion (I hope)
  2017-04-21 21:24                         ` Stefano Stabellini
  2017-04-24 16:14                           ` Andrii Anisov
  2017-04-24 16:46                           ` Andrii Anisov
@ 2017-04-27 15:25                           ` George Dunlap
  2017-05-02 12:45                             ` Julien Grall
  2 siblings, 1 reply; 82+ messages in thread
From: George Dunlap @ 2017-04-27 15:25 UTC (permalink / raw)
  To: Stefano Stabellini, Julien Grall
  Cc: Volodymyr Babchuk, Dario Faggioli, Artem Mygaiev, Xen Devel

On 21/04/17 22:24, Stefano Stabellini wrote:
> On Fri, 21 Apr 2017, Julien Grall wrote:
>> Hi Volodymyr,
>>
>> On 21/04/17 18:04, Volodymyr Babchuk wrote:
>>> On 21 April 2017 at 19:47, Julien Grall <julien.grall@arm.com> wrote:
>>>> On 21/04/17 17:16, Volodymyr Babchuk wrote:
>>>>> On 21 April 2017 at 18:57, Julien Grall <julien.grall@arm.com> wrote:
>>>>>>
>>>>>> Hello Volodymyr,
>>>>>>
>>>>>> On 20/04/17 21:20, Volodymyr Babchuk wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 12 April 2017 at 22:17, Stefano Stabellini
>>>>>>> <sstabellini@kernel.org>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, 12 Apr 2017, Dario Faggioli wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, 2017-04-11 at 13:32 -0700, Stefano Stabellini wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Fri, 7 Apr 2017, Stefano Stabellini wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> We would have one app per emulator. Each app would register an
>>>>>>>> MMIO
>>>>>>>> range or instruction set to emulate. On a guest trap, Xen figures
>>>>>>>> out
>>>>>>>> which app it needs to run.
>>>>>>>
>>>>>>>
>>>>>>> I't is not best approach, I think. For example we need one SMC
>>>>>>> handler
>>>>>>> for
>>>>>>> all domains. Because that SMC handler should track execution state
>>>>>>> of
>>>>>>> different
>>>>>>> guests to help TEE with scheduling. You know, TEE can't block in
>>>>>>> secure
>>>>>>> state,
>>>>>>> so it returns back and blocks in kernel driver. SMC handler need to
>>>>>>> know
>>>>>>> which guest it needs to wake up when times comes.
>>>>>>>
>>>>>>> The same story with virtual coprocessors, I think.
>>>>>>>
>>>>>>> On other hand, MMIO handler can be one per domain. So, it should be
>>>>>>> configurable. Or, maybe we need per-app MMIO handler and one global
>>>>>>> SMC
>>>>>>> handler.
>>>>>>> Perhaps, we need to think about all possible use cases.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Could you explain what would be the benefits to run this global SMC
>>>>>> handler
>>>>>> in EL0?
>>>>>>
>>>>>> After all, it will require access to the host SMC. So what will you
>>>>>> protect
>>>>>> against?
>>>>>
>>>>> Yes, it will require access to host SMC. Idea is not to protect (but,
>>>>> it can protect also).
>>>>> I want to allow different guests to work with one TEE. Imagine that
>>>>> multiple guests need
>>>>> protected storage, accelerated cryptography or other TEE services.
>>>>> All SMCs will be trapped to app, app will alter(or block) request and
>>>>> forward it to TEE. This is the most basic use case, which we want to
>>>>> implement.
>>>>
>>>>
>>>> I am sorry, but I don't understand it. I envision EL0 as a way to limit
>>>> the
>>>> attack vector to Xen and the host. If you give full access to SMC, then
>>>> you
>>>> cannot protect.
>>> In any case it will limit the attack surface. Filtered SMC request is
>>> not as destructive as
>>> arbitrary SMC from a guest.
>>
>> I agree with that. But why in EL0? I think you answer partly below.
>>
>>>
>>>> If the idea is not to protect, why do you want to move the code in EL0?
>>>> What
>>>> is the point to add an overhead (even if it is small) in this case?
>>> There are many reasons:
>>> 1. Community is reluctant to add OP-TEE (or any other TEE) handler
>>> right into hypervisor codebase.
>>
>> Well, I think I was the only one to be reluctant. And I asked you to look at
>> different solutions and come up with suggestion are saying why you solution is
>> better.
>>
>> Whilst I agree that EL0 app is a solution for a lot of emulation. We should be
>> careful before moving code to EL0 and evaluating the impact. I am expecting to
>> see the interface very small and the application to be standalone (e.g not
>> requiring much interaction with Xen or the host hardware).
> 
> I also had this understanding
> 
> 
>> But you seem to have a different view (see your e-mail with:
>> "Probably, we can try another approach: allow application to register
>> hooks in hypervisor: i.e. hook on MMIO, hook on SMC, hook on timer and
>> so on.").
>>
>> If you introduce EL0 but require a big interface, then I believe you don't
>> limit the surface attack.
> 
> Also, it is difficult to maintain a large EL0-Xen interface.
> 
> The idea is basically to register an MMIO range to emulate, submit the
> request to the EL0 app, which would take care of the emulation. The
> interface with Xen would be mostly limited to map/unmap guest memory
> (only the guest it is servicing) and send interrupts to the guest. It
> would always and only be run immediately after the guest vcpu in its own
> time slot.
> 
> The key is to be simple. If it becomes complex, then we are reinventing
> stubdoms.

A couple of notes:

- I think these things will inevitably end up being somewhat
complicated.  We should always strive for simplicity and flexibility,
but the main thing is that we should use the right tool for the right
job: This is for handling synchronous events from a single vcpu in at
vcpu's context (both scheduling and permission-wise).  Handling things
from multiple domains should be handled with a classic domain.

- This looks a lot like the the deprivileged emulator work done by that
intern many years ago -- whoever ends up implementing this, it might be
worth looking at those patches.

 -George


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [ARM] Native application design and discussion (I hope)
  2017-04-26 21:44                           ` Volodymyr Babchuk
@ 2017-04-27 17:26                             ` Volodymyr Babchuk
  2017-05-02 12:52                               ` Julien Grall
  2017-05-02 12:42                             ` Julien Grall
  1 sibling, 1 reply; 82+ messages in thread
From: Volodymyr Babchuk @ 2017-04-27 17:26 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, Andrii Anisov, Dario Faggioli, george.dunlap,
	Xen Devel, nd, Artem Mygaiev

Hi Julien,

I'm back with profiler results.

>> Oh, yes. Sorry I forgot this thread. Continuing on that, you said that "Now
>> profiler shows that hypervisor spends time in spinlocks and p2m code."
>>
>> Could you expand here? How the EL0 app will spend time in p2m code?
> I don't quite remember. It was somewhere around p2m save/restore
> context functions.
> I'll try to restore that setup and will provide more details.
So, there are top 5 functions:

p2m_restore_state - 10.6%
spin_lock - 8.4%
spin_unlock_irqrestore - 6%
ctxt_switch_to_partial - 5.7%
gicv2_hcr_status - 4.6%

per-source-file statistics:

spinlock.c - 22%
entry.S - 15%
arm/domain.c - 11.6%

-- 
WBR Volodymyr Babchuk aka lorc [+380976646013]
mailto: vlad.babchuk@gmail.com

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [ARM] Native application design and discussion (I hope)
  2017-04-26 21:44                           ` Volodymyr Babchuk
  2017-04-27 17:26                             ` Volodymyr Babchuk
@ 2017-05-02 12:42                             ` Julien Grall
  1 sibling, 0 replies; 82+ messages in thread
From: Julien Grall @ 2017-05-02 12:42 UTC (permalink / raw)
  To: Volodymyr Babchuk
  Cc: Stefano Stabellini, Andrii Anisov, Dario Faggioli, george.dunlap,
	Xen Devel, nd, Artem Mygaiev



On 26/04/17 22:44, Volodymyr Babchuk wrote:
> Hi Julien,

Hi Volodymyr,

> On 25 April 2017 at 14:43, Julien Grall <julien.grall@arm.com> wrote:
>>>>>>> We will also need another type of application: one which is
>>>>>>> periodically called by XEN itself, not actually servicing any domain
>>>>>>> request. This is needed for a
>>>>>>> coprocessor sharing framework scheduler implementation.
>>>>>>
>>>>>>
>>>>>> EL0 apps can be a powerful new tool for us to use, but they are not the
>>>>>> solution to everything. This is where I would draw the line: if the
>>>>>> workload needs to be scheduled periodically, then it is not a good fit
>>>>>> for an EL0 app.
>>>>>
>>>>>
>>>>> From my last conversation with Volodymyr I've got a feeling that notions
>>>>> "EL0" and "XEN native application" must be pretty orthogonal.
>>>>> In [1] Volodymyr got no performance gain from changing domain's
>>>>> exception level from EL1 to EL0.
>>>>> Only when Volodymyr stripped the domain's context  abstraction (i.e.
>>>>> dropped GIC context store/restore) some noticeable results were reached.
>>>>
>>>>
>>>>
>>>> Do you have numbers for part that take times in the save/restore? You
>>>> mention GIC and I am a bit surprised you don't mention FPU.
>>>
>>> I did it in the other thread. Check out [1]. The most speed up I got
>>> after removing vGIC context handling
>>
>>
>> Oh, yes. Sorry I forgot this thread. Continuing on that, you said that "Now
>> profiler shows that hypervisor spends time in spinlocks and p2m code."
>>
>> Could you expand here? How the EL0 app will spend time in p2m code?
> I don't quite remember. It was somewhere around p2m save/restore
> context functions.
> I'll try to restore that setup and will provide more details.
>
>> Similarly, why spinlocks take time? Are they contented?
> Problem is that my profiler does not show stack, so I can't say which
> spinlock causes this. But profiler didn't showed that CPU spend much
> time in spinlock wait loop. So looks like there are no contention.
>
>>>
>>>> I would have a look at optimizing the context switch path. Some ideas:
>>>>         - there are a lot of unnecessary isb/dsb. The registers used by
>>>> the
>>>> guests only will be synchronized by eret.
>>>
>>> I have removed (almost) all of them. No significant changes in latency.
>>>
>>>>         - FPU is taking time to save/restore, you could make it lazy
>>>
>>> This also does not takes much time.
>>>
>>>>         - It might be possible to limit the number of LRs saved/restored
>>>> depending on the number of LRs used by a domain.
>>>
>>> Excuse me, what is LR in this context?
>>
>>
>> Sorry I meant GIC LRs (see GIC save/restore code). They are used to list the
>> interrupts injected to the guest. All of they may not be used at the time of
>> the context switch.
> As I said, I don't call GIC save and restore routines, So, that should
> no be an issue (if I got that right).

Well, my point was that maybe you can limit the time in gic save/restore 
code rather than completely ignore them.

For instance, if you don't save/restore the GIC you will need to disable 
the vGIC (GICH_HCR.En) to avoid interrupt injection when running the EL0 
app. I don't see this code here.

Cheers,

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [ARM] Native application design and discussion (I hope)
  2017-04-27 15:25                           ` George Dunlap
@ 2017-05-02 12:45                             ` Julien Grall
  0 siblings, 0 replies; 82+ messages in thread
From: Julien Grall @ 2017-05-02 12:45 UTC (permalink / raw)
  To: George Dunlap, Stefano Stabellini
  Cc: Volodymyr Babchuk, Dario Faggioli, Artem Mygaiev, Xen Devel

Hi,

On 27/04/17 16:25, George Dunlap wrote:
> A couple of notes:
>
> - I think these things will inevitably end up being somewhat
> complicated.  We should always strive for simplicity and flexibility,
> but the main thing is that we should use the right tool for the right
> job: This is for handling synchronous events from a single vcpu in at
> vcpu's context (both scheduling and permission-wise).  Handling things
> from multiple domains should be handled with a classic domain.
>
> - This looks a lot like the the deprivileged emulator work done by that
> intern many years ago -- whoever ends up implementing this, it might be
> worth looking at those patches.

I agree here, I was actually expecting a similar work for ARM. For 
reference, George is speaking about:

https://lists.xenproject.org/archives/html/xen-devel/2016-07/msg00336.html
https://lists.xen.org/archives/html/xen-devel/2015-07/msg03507.html

Cheers,

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [ARM] Native application design and discussion (I hope)
  2017-04-27 17:26                             ` Volodymyr Babchuk
@ 2017-05-02 12:52                               ` Julien Grall
  0 siblings, 0 replies; 82+ messages in thread
From: Julien Grall @ 2017-05-02 12:52 UTC (permalink / raw)
  To: Volodymyr Babchuk
  Cc: Stefano Stabellini, Andrii Anisov, Dario Faggioli, george.dunlap,
	Xen Devel, nd, Artem Mygaiev



On 27/04/17 18:26, Volodymyr Babchuk wrote:
> Hi Julien,

Hi Volodymyr,

> I'm back with profiler results.
>
>>> Oh, yes. Sorry I forgot this thread. Continuing on that, you said that "Now
>>> profiler shows that hypervisor spends time in spinlocks and p2m code."
>>>
>>> Could you expand here? How the EL0 app will spend time in p2m code?
>> I don't quite remember. It was somewhere around p2m save/restore
>> context functions.
>> I'll try to restore that setup and will provide more details.
> So, there are top 5 functions:
>
> p2m_restore_state - 10.6%

This is with your el0_app branch? Or did you make some changes?

For instance, the p2m_restore_state has been reworked for Xen 4.9 and 
some of the isb() could be dropped.

> spin_lock - 8.4%
> spin_unlock_irqrestore - 6%

I am a bit confused on what you are profiling. Are you only profiling 
context save/restore? Or do you also profile the rest of the hypervisor?

E.g, how many time spin_*lock are called... That would help to know if 
the problem is because of the number of locks taken or potential 
optimization that we didn't implement.

> ctxt_switch_to_partial - 5.7%
> gicv2_hcr_status - 4.6%

Cheers,

>
> per-source-file statistics:
>
> spinlock.c - 22%
> entry.S - 15%
> arm/domain.c - 11.6%
>

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [ARM] Native application design and discussion (I hope)
  2017-04-24 18:08                     ` Stefano Stabellini
  2017-04-25 10:15                       ` Andrii Anisov
@ 2017-05-05 10:51                       ` Andrii Anisov
  2017-05-05 19:28                         ` Stefano Stabellini
  2017-05-05 11:09                       ` [ARM] Native application design and discussion (I hope) Andrii Anisov
  2 siblings, 1 reply; 82+ messages in thread
From: Andrii Anisov @ 2017-05-05 10:51 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Volodymyr Babchuk, Dario Faggioli, george.dunlap, Xen Devel,
	Julien Grall, Artem Mygaiev

Hello Stefano,

On 24.04.17 21:08, Stefano Stabellini wrote:
> Stubdomains (stubdoms in short) are small domains, each running a single
> application. Typically they run unikernels rather than a full fledged
> operating system. A classic example is QEMU stubdoms on x86: one QEMU
> stubdoms is started for each regular guest domain. Each QEMU stubdom
> instance provides emulation for one guest - it runs one instance of
> QEMU.
I'm wondering if there are any examples of practical usage of stub 
domains with ARM?

>  From Xen point of view, they are regular domain, except that they are
> privilege in regards to one particular guest only (they can map a page
> of the guest they are servicing, but they cannot map any random page in
> memory). If you do "xl list" you would see stubdoms in the output.
So they are the regular XEN domains with sort of specific permissions.
> The advantages of using stubdoms are:
> - they already exist
> - their security properties are well known
Could you please point me to some up to date documentation about 
stubdoms and their security properties?
> In this thread, we are discussing whether it makes sense to introduce a
> *new* model, one that has different properties, therefore different
> advantages and disadvantages. Let's call it EL0 apps.
>
> An EL0 app is an application that runs on top of the Xen hypervisor
> directly in EL0 mode. It is not a domain, and it doesn't show up on "xl
> list". It runs on the same timeslot of the guest vcpu it is servicing.
> It is scheduled deterministically: right after a guest vcpu traps into
> the hypervisor. The build is simple. Writing an EL0 app should be
> (almost) as simple as writing a regular emulator for Xen. An EL0
> app is not scheduled, but it could register a timer with Xen. The
> interface between EL0 apps and Xen is small: the number of
> functionalities exposed are very limited.
Any reason to have an interface between XEN and EL0 app to be bound to 
an app functionality?
Why not to introduce a generic (simplistic) interface and do not limit 
the functionality of the EL0 app?

> The advantages of using EL0 apps are:
> - scheduled deterministically
> - faster context switch
> - lower and deterministic latency
> - EL0 apps execution time is accounted appropriately to the guest that
>    they are servicing
Can't the EL0 app be servicing XEN itself?

*
*

*
*

*Andrii Anisov*



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [ARM] Native application design and discussion (I hope)
  2017-04-24 18:08                     ` Stefano Stabellini
  2017-04-25 10:15                       ` Andrii Anisov
  2017-05-05 10:51                       ` Andrii Anisov
@ 2017-05-05 11:09                       ` Andrii Anisov
  2 siblings, 0 replies; 82+ messages in thread
From: Andrii Anisov @ 2017-05-05 11:09 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Volodymyr Babchuk, Dario Faggioli, george.dunlap, Xen Devel,
	Julien Grall, Artem Mygaiev


On 24.04.17 21:08, Stefano Stabellini wrote:
>
>> If we are speaking about shared coprocessors framework, we need here several
>> things:
>>   - MMIO access emulation
> This could be run as EL0 app.
But even now, the MMIO access emulation has something to do with the 
real hardware under the circumstances.

>>   - periodic actions (scheduling) which at least will include IOMMU
>> reconfiguration and some actions with coprocessor hardware itself in runtime
>>   - coprocessor interrupts handling and redistribution to target domains
> These would be better as stubdoms. Or, if they are simple enough, in
> the hypervisor directly.
>
> Assuming that MMIO access emulation is done in an EL0 app, how many
> types of calls does it need to make to these two components?
With current implementation - none, but I guess it should be able to 
trigger scheduling actions.
Also access emulation does collect some data what is used by context 
switching process.

> As far as I can tell, without looking at the existing code, the options
> are:
> 1) everything in a stubdom
> 2) MMIO access emulation as EL0 app, the rest in a stubdom
> 3) MMIO access emulation as EL0 app, the rest in Xen
>
> The right approach depends on the type of interactions between MMIO
> access emulation and the rest of the shared coprocessors framework. Does
> this make sense?
Thank you for your ideas and comments.
We will see what would be the EL0 app and how would evolve our vision of 
SCF along further implementation.
Currently we have pretty much unclarity here to get the decision postponed.

-- 

*Andrii Anisov*



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [ARM] Native application design and discussion (I hope)
  2017-05-05 10:51                       ` Andrii Anisov
@ 2017-05-05 19:28                         ` Stefano Stabellini
  2017-05-08 10:46                           ` George Dunlap
  2017-05-09 10:13                           ` Dario Faggioli
  0 siblings, 2 replies; 82+ messages in thread
From: Stefano Stabellini @ 2017-05-05 19:28 UTC (permalink / raw)
  To: Andrii Anisov
  Cc: Stefano Stabellini, Volodymyr Babchuk, Dario Faggioli,
	george.dunlap, Xen Devel, Julien Grall, Artem Mygaiev

On Fri, 5 May 2017, Andrii Anisov wrote:
> Hello Stefano,
> 
> On 24.04.17 21:08, Stefano Stabellini wrote:
> > Stubdomains (stubdoms in short) are small domains, each running a single
> > application. Typically they run unikernels rather than a full fledged
> > operating system. A classic example is QEMU stubdoms on x86: one QEMU
> > stubdoms is started for each regular guest domain. Each QEMU stubdom
> > instance provides emulation for one guest - it runs one instance of
> > QEMU.
> I'm wondering if there are any examples of practical usage of stub domains
> with ARM?

Good question. I don't think so: there have been practical examples of
unikernels running on Xen on ARM, but not stubdoms, because we haven't
needed to run large emulation pieces yet.


> >  From Xen point of view, they are regular domain, except that they are
> > privilege in regards to one particular guest only (they can map a page
> > of the guest they are servicing, but they cannot map any random page in
> > memory). If you do "xl list" you would see stubdoms in the output.
> So they are the regular XEN domains with sort of specific permissions.
> > The advantages of using stubdoms are:
> > - they already exist
> > - their security properties are well known
> Could you please point me to some up to date documentation about stubdoms and
> their security properties?

Stubdoms have been talked about in the Xen community for a very long
time:

https://hal.inria.fr/inria-00329969/PDF/final.pdf
http://www.cs.ubc.ca/~andy/papers/xoar-sosp-final.pdf
https://wiki.xen.org/wiki/Dom0_Disaggregation

Both OpenXT and Qubes OS use stubdoms.


> > In this thread, we are discussing whether it makes sense to introduce a
> > *new* model, one that has different properties, therefore different
> > advantages and disadvantages. Let's call it EL0 apps.
> > 
> > An EL0 app is an application that runs on top of the Xen hypervisor
> > directly in EL0 mode. It is not a domain, and it doesn't show up on "xl
> > list". It runs on the same timeslot of the guest vcpu it is servicing.
> > It is scheduled deterministically: right after a guest vcpu traps into
> > the hypervisor. The build is simple. Writing an EL0 app should be
> > (almost) as simple as writing a regular emulator for Xen. An EL0
> > app is not scheduled, but it could register a timer with Xen. The
> > interface between EL0 apps and Xen is small: the number of
> > functionalities exposed are very limited.
> Any reason to have an interface between XEN and EL0 app to be bound to an app
> functionality?
> Why not to introduce a generic (simplistic) interface and do not limit the
> functionality of the EL0 app?

Because if we did that there would be no security benefits in having EL0
apps: we might as well run the emulator in the hypervisor.


> > The advantages of using EL0 apps are:
> > - scheduled deterministically
> > - faster context switch
> > - lower and deterministic latency
> > - EL0 apps execution time is accounted appropriately to the guest that
> >    they are servicing
> Can't the EL0 app be servicing XEN itself?

Short answer: no.

Long answer follows. EL0 apps will run in a different context. It was
suggested to keep track of their state in the guest vcpu struct, which
looks like a good idea to me. If we did that, the only way to have an
EL0 app running without being bound to a specific guest, would be to run
it on the idle vcpu, which I think is a bad idea.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [ARM] Native application design and discussion (I hope)
  2017-05-05 19:28                         ` Stefano Stabellini
@ 2017-05-08 10:46                           ` George Dunlap
  2017-05-08 18:31                             ` Stefano Stabellini
  2017-05-09 10:13                           ` Dario Faggioli
  1 sibling, 1 reply; 82+ messages in thread
From: George Dunlap @ 2017-05-08 10:46 UTC (permalink / raw)
  To: Stefano Stabellini, Andrii Anisov
  Cc: Volodymyr Babchuk, Dario Faggioli, Artem Mygaiev, Julien Grall,
	Xen Devel

On 05/05/17 20:28, Stefano Stabellini wrote:
> On Fri, 5 May 2017, Andrii Anisov wrote:
>> Hello Stefano,
>>
>> On 24.04.17 21:08, Stefano Stabellini wrote:
>>> Stubdomains (stubdoms in short) are small domains, each running a single
>>> application. Typically they run unikernels rather than a full fledged
>>> operating system. A classic example is QEMU stubdoms on x86: one QEMU
>>> stubdoms is started for each regular guest domain. Each QEMU stubdom
>>> instance provides emulation for one guest - it runs one instance of
>>> QEMU.
>> I'm wondering if there are any examples of practical usage of stub domains
>> with ARM?
> 
> Good question. I don't think so: there have been practical examples of
> unikernels running on Xen on ARM, but not stubdoms, because we haven't
> needed to run large emulation pieces yet.

So often when we say "stub domains" we mean specifically, "devicemodel
stub domains".  But there are many other stub domains for other
purposes.  You can run xenstored in a stubdomain rather than in dom0,
for instance; I think this probably already works on ARM.  I believe
that the PV vTPM architecture also has one vTPM "worker" per guest,
along with a "global" domain to control the physical TPM and multiplex
it over the various vTPMs.

 -George



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [ARM] Native application design and discussion (I hope)
  2017-05-08 10:46                           ` George Dunlap
@ 2017-05-08 18:31                             ` Stefano Stabellini
  2017-05-08 18:33                               ` Julien Grall
  2017-05-09  8:53                               ` George Dunlap
  0 siblings, 2 replies; 82+ messages in thread
From: Stefano Stabellini @ 2017-05-08 18:31 UTC (permalink / raw)
  To: George Dunlap
  Cc: Stefano Stabellini, Andrii Anisov, Volodymyr Babchuk,
	Dario Faggioli, Xen Devel, Julien Grall, Artem Mygaiev

On Mon, 8 May 2017, George Dunlap wrote:
> On 05/05/17 20:28, Stefano Stabellini wrote:
> > On Fri, 5 May 2017, Andrii Anisov wrote:
> >> Hello Stefano,
> >>
> >> On 24.04.17 21:08, Stefano Stabellini wrote:
> >>> Stubdomains (stubdoms in short) are small domains, each running a single
> >>> application. Typically they run unikernels rather than a full fledged
> >>> operating system. A classic example is QEMU stubdoms on x86: one QEMU
> >>> stubdoms is started for each regular guest domain. Each QEMU stubdom
> >>> instance provides emulation for one guest - it runs one instance of
> >>> QEMU.
> >> I'm wondering if there are any examples of practical usage of stub domains
> >> with ARM?
> > 
> > Good question. I don't think so: there have been practical examples of
> > unikernels running on Xen on ARM, but not stubdoms, because we haven't
> > needed to run large emulation pieces yet.
> 
> So often when we say "stub domains" we mean specifically, "devicemodel
> stub domains".  But there are many other stub domains for other
> purposes.  You can run xenstored in a stubdomain rather than in dom0,
> for instance; I think this probably already works on ARM.  I believe
> that the PV vTPM architecture also has one vTPM "worker" per guest,
> along with a "global" domain to control the physical TPM and multiplex
> it over the various vTPMs.

TPM is an x86 concept, but xenstored stubdom is possible.

Althought they don't have to, stubdoms are typically based on mini-os
(git://xenbits.xen.org/mini-os.git) which only has 32-bit ARM support
today. However, it should be possible to run a 32-bit stubdom on a
64-bit host.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [ARM] Native application design and discussion (I hope)
  2017-05-08 18:31                             ` Stefano Stabellini
@ 2017-05-08 18:33                               ` Julien Grall
  2017-05-09  8:53                               ` George Dunlap
  1 sibling, 0 replies; 82+ messages in thread
From: Julien Grall @ 2017-05-08 18:33 UTC (permalink / raw)
  To: Stefano Stabellini, George Dunlap
  Cc: Volodymyr Babchuk, Dario Faggioli, Artem Mygaiev, Andrii Anisov,
	Xen Devel



On 05/08/2017 07:31 PM, Stefano Stabellini wrote:
> On Mon, 8 May 2017, George Dunlap wrote:
>> On 05/05/17 20:28, Stefano Stabellini wrote:
>>> On Fri, 5 May 2017, Andrii Anisov wrote:
>>>> Hello Stefano,
>>>>
>>>> On 24.04.17 21:08, Stefano Stabellini wrote:
>>>>> Stubdomains (stubdoms in short) are small domains, each running a single
>>>>> application. Typically they run unikernels rather than a full fledged
>>>>> operating system. A classic example is QEMU stubdoms on x86: one QEMU
>>>>> stubdoms is started for each regular guest domain. Each QEMU stubdom
>>>>> instance provides emulation for one guest - it runs one instance of
>>>>> QEMU.
>>>> I'm wondering if there are any examples of practical usage of stub domains
>>>> with ARM?
>>>
>>> Good question. I don't think so: there have been practical examples of
>>> unikernels running on Xen on ARM, but not stubdoms, because we haven't
>>> needed to run large emulation pieces yet.
>>
>> So often when we say "stub domains" we mean specifically, "devicemodel
>> stub domains".  But there are many other stub domains for other
>> purposes.  You can run xenstored in a stubdomain rather than in dom0,
>> for instance; I think this probably already works on ARM.  I believe
>> that the PV vTPM architecture also has one vTPM "worker" per guest,
>> along with a "global" domain to control the physical TPM and multiplex
>> it over the various vTPMs.
>
> TPM is an x86 concept, but xenstored stubdom is possible.
>
> Althought they don't have to, stubdoms are typically based on mini-os
> (git://xenbits.xen.org/mini-os.git) which only has 32-bit ARM support
> today. However, it should be possible to run a 32-bit stubdom on a
> 64-bit host.

The 32-bit ARM support in Mini-OS has never been completed. Hopefully
someone will finish it soon.

Cheers,

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [ARM] Native application design and discussion (I hope)
  2017-05-08 18:31                             ` Stefano Stabellini
  2017-05-08 18:33                               ` Julien Grall
@ 2017-05-09  8:53                               ` George Dunlap
  2017-05-10 16:38                                 ` Andrii Anisov
  1 sibling, 1 reply; 82+ messages in thread
From: George Dunlap @ 2017-05-09  8:53 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Andrii Anisov, Volodymyr Babchuk, Dario Faggioli, Xen Devel,
	Julien Grall, Artem Mygaiev

On 08/05/17 19:31, Stefano Stabellini wrote:
> On Mon, 8 May 2017, George Dunlap wrote:
>> On 05/05/17 20:28, Stefano Stabellini wrote:
>>> On Fri, 5 May 2017, Andrii Anisov wrote:
>>>> Hello Stefano,
>>>>
>>>> On 24.04.17 21:08, Stefano Stabellini wrote:
>>>>> Stubdomains (stubdoms in short) are small domains, each running a single
>>>>> application. Typically they run unikernels rather than a full fledged
>>>>> operating system. A classic example is QEMU stubdoms on x86: one QEMU
>>>>> stubdoms is started for each regular guest domain. Each QEMU stubdom
>>>>> instance provides emulation for one guest - it runs one instance of
>>>>> QEMU.
>>>> I'm wondering if there are any examples of practical usage of stub domains
>>>> with ARM?
>>>
>>> Good question. I don't think so: there have been practical examples of
>>> unikernels running on Xen on ARM, but not stubdoms, because we haven't
>>> needed to run large emulation pieces yet.
>>
>> So often when we say "stub domains" we mean specifically, "devicemodel
>> stub domains".  But there are many other stub domains for other
>> purposes.  You can run xenstored in a stubdomain rather than in dom0,
>> for instance; I think this probably already works on ARM.  I believe
>> that the PV vTPM architecture also has one vTPM "worker" per guest,
>> along with a "global" domain to control the physical TPM and multiplex
>> it over the various vTPMs.
> 
> TPM is an x86 concept, but xenstored stubdom is possible.

A few years ago I'd have said ACPI was an x86 concept as well. :-)  But
my point was mainly to give examples to Andrii of other ways stubdomains
were used.

 -George


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [ARM] Native application design and discussion (I hope)
  2017-05-05 19:28                         ` Stefano Stabellini
  2017-05-08 10:46                           ` George Dunlap
@ 2017-05-09 10:13                           ` Dario Faggioli
  2017-05-09 10:32                             ` Julien Grall
  1 sibling, 1 reply; 82+ messages in thread
From: Dario Faggioli @ 2017-05-09 10:13 UTC (permalink / raw)
  To: Stefano Stabellini, Andrii Anisov
  Cc: Volodymyr Babchuk, Julien Grall, Artem Mygaiev, george.dunlap, Xen Devel


[-- Attachment #1.1: Type: text/plain, Size: 1963 bytes --]

On Fri, 2017-05-05 at 12:28 -0700, Stefano Stabellini wrote:
> On Fri, 5 May 2017, Andrii Anisov wrote:
> > On 24.04.17 21:08, Stefano Stabellini wrote:
> > > The advantages of using EL0 apps are:
> > > - scheduled deterministically
> > > - faster context switch
> > > - lower and deterministic latency
> > > - EL0 apps execution time is accounted appropriately to the guest
> > > that
> > >    they are servicing
> > 
> > Can't the EL0 app be servicing XEN itself?
> 
> Short answer: no.
> 
> Long answer follows. EL0 apps will run in a different context. 
>
I still feel like I am missing something (most likely, due to my
limited knowledge of ARM arch and XenOnARM code). Can you try to
clarify a bit for me what it "in a different context" in this case, and
 why it is important?

> It was
> suggested to keep track of their state in the guest vcpu struct,
> which
> looks like a good idea to me. If we did that, the only way to have an
> EL0 app running without being bound to a specific guest, would be to
> run
> it on the idle vcpu, which I think is a bad idea.
>
Which, FTR, is what we do in Xen for a bunch of things already, i.e.,
softirqs and tasklets.

It's actually a rather effective way of executing some piece of Xen
code synchronously with some event (as softirqs are always checked 'on
the way back' from the hypervisor), which I guess in your case could be
 the trap from the guest vCPU requesting service.

And it should not be hard to give such code access to the context of
the vCPU that was previously running (in x86, given we implement what
we call lazy context switch, it's most likely still loaded in the
pCPU!).

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

[-- Attachment #2: Type: text/plain, Size: 127 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [ARM] Native application design and discussion (I hope)
  2017-05-09 10:13                           ` Dario Faggioli
@ 2017-05-09 10:32                             ` Julien Grall
  2017-05-09 11:08                               ` Dario Faggioli
  0 siblings, 1 reply; 82+ messages in thread
From: Julien Grall @ 2017-05-09 10:32 UTC (permalink / raw)
  To: Dario Faggioli, Stefano Stabellini, Andrii Anisov
  Cc: Volodymyr Babchuk, Artem Mygaiev, george.dunlap, Xen Devel

Hi Dario,

On 05/09/2017 11:13 AM, Dario Faggioli wrote:
> On Fri, 2017-05-05 at 12:28 -0700, Stefano Stabellini wrote:
>> On Fri, 5 May 2017, Andrii Anisov wrote:
>>> On 24.04.17 21:08, Stefano Stabellini wrote:
>>>> The advantages of using EL0 apps are:
>>>> - scheduled deterministically
>>>> - faster context switch
>>>> - lower and deterministic latency
>>>> - EL0 apps execution time is accounted appropriately to the guest
>>>> that
>>>>    they are servicing
>>>
>>> Can't the EL0 app be servicing XEN itself?
>>
>> Short answer: no.
>>
>> Long answer follows. EL0 apps will run in a different context.
>>
> I still feel like I am missing something (most likely, due to my
> limited knowledge of ARM arch and XenOnARM code). Can you try to
> clarify a bit for me what it "in a different context" in this case, and
>  why it is important?

We want to run it in a different exception level to limit the surface 
attack of the hypervisor if the application is buggy.


>> It was
>> suggested to keep track of their state in the guest vcpu struct,
>> which
>> looks like a good idea to me. If we did that, the only way to have an
>> EL0 app running without being bound to a specific guest, would be to
>> run
>> it on the idle vcpu, which I think is a bad idea.
>>
> Which, FTR, is what we do in Xen for a bunch of things already, i.e.,
> softirqs and tasklets.

No, we don't switch to the idle vCPU to handle tasklets or softirqs. 
They will be done before entering to the guest and still in the 
hypervisor context.

>
> It's actually a rather effective way of executing some piece of Xen
> code synchronously with some event (as softirqs are always checked 'on
> the way back' from the hypervisor), which I guess in your case could be
>  the trap from the guest vCPU requesting service.
>
> And it should not be hard to give such code access to the context of
> the vCPU that was previously running (in x86, given we implement what
> we call lazy context switch, it's most likely still loaded in the
> pCPU!).

I agree with Stefano, switching to the idle vCPU is a pretty bad idea.

the idle vCPU is a fake vCPU on ARM to stick with the common code (we 
never leave the hypervisor). In the case of the EL0 app, we want to 
change exception level to run the code with lower privilege.

Also IHMO, it should only be used when there are nothing to run and not 
re-purposed for running EL0 app.

Cheers,

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [ARM] Native application design and discussion (I hope)
  2017-05-09 10:32                             ` Julien Grall
@ 2017-05-09 11:08                               ` Dario Faggioli
  2017-05-09 11:19                                 ` Julien Grall
  2017-05-09 18:29                                 ` Stefano Stabellini
  0 siblings, 2 replies; 82+ messages in thread
From: Dario Faggioli @ 2017-05-09 11:08 UTC (permalink / raw)
  To: Julien Grall, Stefano Stabellini, Andrii Anisov
  Cc: Volodymyr Babchuk, Artem Mygaiev, george.dunlap, Xen Devel


[-- Attachment #1.1: Type: text/plain, Size: 3212 bytes --]

On Tue, 2017-05-09 at 11:32 +0100, Julien Grall wrote:
> Hi Dario,
> 
Hey,

> On 05/09/2017 11:13 AM, Dario Faggioli wrote:
> > 
> > Which, FTR, is what we do in Xen for a bunch of things already,
> > i.e.,
> > softirqs and tasklets.
> 
> No, we don't switch to the idle vCPU to handle tasklets or softirqs. 
> They will be done before entering to the guest and still in the 
> hypervisor context.
> 
Mmm... I don't know who's "we" here, but even if it's "you ARM people",
you actually do.

In fact, this is common code:

static struct task_slice
csched_schedule(
    const struct scheduler *ops, s_time_t now, bool_t tasklet_work_scheduled)
{
    [...]
    /* Choices, choices:
     * - If we have a tasklet, we need to run the idle vcpu no matter what.
    [...]
    /* Tasklet work (which runs in idle VCPU context) overrides all else. */
    if ( tasklet_work_scheduled )
    {
        TRACE_0D(TRC_CSCHED_SCHED_TASKLET);
        snext = CSCHED_VCPU(idle_vcpu[cpu]);
        snext->pri = CSCHED_PRI_TS_BOOST;
    }
    [...]
}

And this is "your" idle loop:

void idle_loop(void)
{
    for ( ; ; )
    {
        [...]
        local_irq_disable();
        if ( cpu_is_haltable(smp_processor_id()) )
        {                            
            dsb(sy);
            wfi();
        }
        local_irq_enable();
        do_tasklet();
        do_softirq();
        [...]
    }
}

Actually, yes, it was a bit inaccurate of me to cite both softirqs and
tasklets, together, like I did. Softirqs indeed are checked and handled
before leaving Xen, as you say, as well as, in the idle loop, as shown
above.

But for tasklet (and, to be 100% precise, for vCPU context tasklet),
it's actually the case that we force the idle vCPU in execution to run
them.

> > And it should not be hard to give such code access to the context
> > of
> > the vCPU that was previously running (in x86, given we implement
> > what
> > we call lazy context switch, it's most likely still loaded in the
> > pCPU!).
> 
> I agree with Stefano, switching to the idle vCPU is a pretty bad
> idea.
> 
> the idle vCPU is a fake vCPU on ARM to stick with the common code
> (we 
> never leave the hypervisor). In the case of the EL0 app, we want to 
> change exception level to run the code with lower privilege.
> 
> Also IHMO, it should only be used when there are nothing to run and
> not 
> re-purposed for running EL0 app.
> 
It's already purposed for running when there is nothing to do _or_ when
there are tasklets.

I do see your point about privilege level, though. And I agree with
George that it looks very similar to when, in the x86 world, we tried
to put the infra together for switching to Ring3 to run some pieces of
Xen code.

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

[-- Attachment #2: Type: text/plain, Size: 127 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [ARM] Native application design and discussion (I hope)
  2017-05-09 11:08                               ` Dario Faggioli
@ 2017-05-09 11:19                                 ` Julien Grall
  2017-05-09 18:29                                 ` Stefano Stabellini
  1 sibling, 0 replies; 82+ messages in thread
From: Julien Grall @ 2017-05-09 11:19 UTC (permalink / raw)
  To: Dario Faggioli, Stefano Stabellini, Andrii Anisov
  Cc: Volodymyr Babchuk, Artem Mygaiev, george.dunlap, Xen Devel



On 05/09/2017 12:08 PM, Dario Faggioli wrote:
> On Tue, 2017-05-09 at 11:32 +0100, Julien Grall wrote:
>> Hi Dario,
>>
> Hey,
>
>> On 05/09/2017 11:13 AM, Dario Faggioli wrote:
>>>
>>> Which, FTR, is what we do in Xen for a bunch of things already,
>>> i.e.,
>>> softirqs and tasklets.
>>
>> No, we don't switch to the idle vCPU to handle tasklets or softirqs.
>> They will be done before entering to the guest and still in the
>> hypervisor context.
>>
> Mmm... I don't know who's "we" here, but even if it's "you ARM people",
> you actually do.
>
> In fact, this is common code:
>
> static struct task_slice
> csched_schedule(
>     const struct scheduler *ops, s_time_t now, bool_t tasklet_work_scheduled)
> {
>     [...]
>     /* Choices, choices:
>      * - If we have a tasklet, we need to run the idle vcpu no matter what.
>     [...]
>     /* Tasklet work (which runs in idle VCPU context) overrides all else. */
>     if ( tasklet_work_scheduled )
>     {
>         TRACE_0D(TRC_CSCHED_SCHED_TASKLET);
>         snext = CSCHED_VCPU(idle_vcpu[cpu]);
>         snext->pri = CSCHED_PRI_TS_BOOST;
>     }
>     [...]
> }
>
> And this is "your" idle loop:
>
> void idle_loop(void)
> {
>     for ( ; ; )
>     {
>         [...]
>         local_irq_disable();
>         if ( cpu_is_haltable(smp_processor_id()) )
>         {
>             dsb(sy);
>             wfi();
>         }
>         local_irq_enable();
>         do_tasklet();
>         do_softirq();
>         [...]
>     }
> }
> Actually, yes, it was a bit inaccurate of me to cite both softirqs and
> tasklets, together, like I did. Softirqs indeed are checked and handled
> before leaving Xen, as you say, as well as, in the idle loop, as shown
> above.
>
> But for tasklet (and, to be 100% precise, for vCPU context tasklet),
> it's actually the case that we force the idle vCPU in execution to run
> them.

I am a bit confused. When I read the softirq code, I saw there is a 
softirq tasklet which call do_tasklet_work.

But it sounds like that tasklet can be either scheduled in softirq or in 
idle vCPU depending of the type of the tasklet (is_softirq).

Thank you for the explanation :).

>
>>> And it should not be hard to give such code access to the context
>>> of
>>> the vCPU that was previously running (in x86, given we implement
>>> what
>>> we call lazy context switch, it's most likely still loaded in the
>>> pCPU!).
>>
>> I agree with Stefano, switching to the idle vCPU is a pretty bad
>> idea.
>>
>> the idle vCPU is a fake vCPU on ARM to stick with the common code
>> (we
>> never leave the hypervisor). In the case of the EL0 app, we want to
>> change exception level to run the code with lower privilege.
>>
>> Also IHMO, it should only be used when there are nothing to run and
>> not
>> re-purposed for running EL0 app.
>>
> It's already purposed for running when there is nothing to do _or_ when
> there are tasklets.
>
> I do see your point about privilege level, though. And I agree with
> George that it looks very similar to when, in the x86 world, we tried
> to put the infra together for switching to Ring3 to run some pieces of
> Xen code.

We would like to do exactly the same for ARM. On another part of the 
thread I suggested to Volodymyr to look at what has been done there.

Cheers,

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [ARM] Native application design and discussion (I hope)
  2017-05-09 11:08                               ` Dario Faggioli
  2017-05-09 11:19                                 ` Julien Grall
@ 2017-05-09 18:29                                 ` Stefano Stabellini
  2017-05-10  9:56                                   ` George Dunlap
  1 sibling, 1 reply; 82+ messages in thread
From: Stefano Stabellini @ 2017-05-09 18:29 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: Stefano Stabellini, Andrii Anisov, Volodymyr Babchuk,
	george.dunlap, Xen Devel, Julien Grall, Artem Mygaiev

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1307 bytes --]

On Tue, 9 May 2017, Dario Faggioli wrote:
> > > And it should not be hard to give such code access to the context
> > > of
> > > the vCPU that was previously running (in x86, given we implement
> > > what
> > > we call lazy context switch, it's most likely still loaded in the
> > > pCPU!).
> > 
> > I agree with Stefano, switching to the idle vCPU is a pretty bad
> > idea.
> > 
> > the idle vCPU is a fake vCPU on ARM to stick with the common code
> > (we 
> > never leave the hypervisor). In the case of the EL0 app, we want to 
> > change exception level to run the code with lower privilege.
> > 
> > Also IHMO, it should only be used when there are nothing to run and
> > not 
> > re-purposed for running EL0 app.
> > 
> It's already purposed for running when there is nothing to do _or_ when
> there are tasklets.
> 
> I do see your point about privilege level, though. And I agree with
> George that it looks very similar to when, in the x86 world, we tried
> to put the infra together for switching to Ring3 to run some pieces of
> Xen code.

Right, and just to add to it, context switching to the idle vcpu has a
cost, but it doesn't give us any security benefits whatsever. If Xen is
going to spend time on context switching, it is better to do it in a
way that introduces a security boundary.

[-- Attachment #2: Type: text/plain, Size: 127 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [ARM] Native application design and discussion (I hope)
  2017-05-09 18:29                                 ` Stefano Stabellini
@ 2017-05-10  9:56                                   ` George Dunlap
  2017-05-10 10:00                                     ` Julien Grall
  2017-05-10 18:08                                     ` Andrii Anisov
  0 siblings, 2 replies; 82+ messages in thread
From: George Dunlap @ 2017-05-10  9:56 UTC (permalink / raw)
  To: Stefano Stabellini, Dario Faggioli
  Cc: Volodymyr Babchuk, Julien Grall, Artem Mygaiev, Andrii Anisov, Xen Devel

On 09/05/17 19:29, Stefano Stabellini wrote:
> On Tue, 9 May 2017, Dario Faggioli wrote:
>>>> And it should not be hard to give such code access to the context
>>>> of
>>>> the vCPU that was previously running (in x86, given we implement
>>>> what
>>>> we call lazy context switch, it's most likely still loaded in the
>>>> pCPU!).
>>>
>>> I agree with Stefano, switching to the idle vCPU is a pretty bad
>>> idea.
>>>
>>> the idle vCPU is a fake vCPU on ARM to stick with the common code
>>> (we 
>>> never leave the hypervisor). In the case of the EL0 app, we want to 
>>> change exception level to run the code with lower privilege.
>>>
>>> Also IHMO, it should only be used when there are nothing to run and
>>> not 
>>> re-purposed for running EL0 app.
>>>
>> It's already purposed for running when there is nothing to do _or_ when
>> there are tasklets.
>>
>> I do see your point about privilege level, though. And I agree with
>> George that it looks very similar to when, in the x86 world, we tried
>> to put the infra together for switching to Ring3 to run some pieces of
>> Xen code.
> 
> Right, and just to add to it, context switching to the idle vcpu has a
> cost, but it doesn't give us any security benefits whatsever. If Xen is
> going to spend time on context switching, it is better to do it in a
> way that introduces a security boundary.

"Context switching" to the idle vcpu doesn't actually save or change any
registers, nor does it flush the TLB.  It's more or less just accounting
for the scheduler.  So it has a cost (going through the scheduler) but
not a very large one.

But the context here is that Andrii asked something about whether this
"EL0 App" functionality could be used to service Xen as well as a
domain.  You said it didn't make sense, and Dario (as I understand it)
was pointing out that we already did something similar with tasklets.
If there was a need to be able to "upload" user-specified routines that
would handle events generated by the hypervisor rather than events
generated by a guest, that would indeed be a possibility.  It would
essentially be the equivalent of a deprivileged, untrusted tasklet.

At the moment I can't foresee the need for such a mechanism, and I don't
particularly think that we should keep that use case in mind when
designing the "App" interface.  But it is an interesting idea to keep in
our back pockets in case a use case comes up later.

 -George


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [ARM] Native application design and discussion (I hope)
  2017-05-10  9:56                                   ` George Dunlap
@ 2017-05-10 10:00                                     ` Julien Grall
  2017-05-10 10:03                                       ` George Dunlap
  2017-05-10 18:08                                     ` Andrii Anisov
  1 sibling, 1 reply; 82+ messages in thread
From: Julien Grall @ 2017-05-10 10:00 UTC (permalink / raw)
  To: George Dunlap, Stefano Stabellini, Dario Faggioli
  Cc: Volodymyr Babchuk, Artem Mygaiev, Andrii Anisov, Xen Devel



On 05/10/2017 10:56 AM, George Dunlap wrote:
> On 09/05/17 19:29, Stefano Stabellini wrote:
>> On Tue, 9 May 2017, Dario Faggioli wrote:
>>>>> And it should not be hard to give such code access to the context
>>>>> of
>>>>> the vCPU that was previously running (in x86, given we implement
>>>>> what
>>>>> we call lazy context switch, it's most likely still loaded in the
>>>>> pCPU!).
>>>>
>>>> I agree with Stefano, switching to the idle vCPU is a pretty bad
>>>> idea.
>>>>
>>>> the idle vCPU is a fake vCPU on ARM to stick with the common code
>>>> (we
>>>> never leave the hypervisor). In the case of the EL0 app, we want to
>>>> change exception level to run the code with lower privilege.
>>>>
>>>> Also IHMO, it should only be used when there are nothing to run and
>>>> not
>>>> re-purposed for running EL0 app.
>>>>
>>> It's already purposed for running when there is nothing to do _or_ when
>>> there are tasklets.
>>>
>>> I do see your point about privilege level, though. And I agree with
>>> George that it looks very similar to when, in the x86 world, we tried
>>> to put the infra together for switching to Ring3 to run some pieces of
>>> Xen code.
>>
>> Right, and just to add to it, context switching to the idle vcpu has a
>> cost, but it doesn't give us any security benefits whatsever. If Xen is
>> going to spend time on context switching, it is better to do it in a
>> way that introduces a security boundary.
>
> "Context switching" to the idle vcpu doesn't actually save or change any
> registers, nor does it flush the TLB.  It's more or less just accounting
> for the scheduler.  So it has a cost (going through the scheduler) but
> not a very large one.

It depends on the architecture. For ARM we don't yet support lazy 
context switch. So effectively, the cost to "context switch" to the idle 
vCPU will be quite high.

Cheers,

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [ARM] Native application design and discussion (I hope)
  2017-05-10 10:00                                     ` Julien Grall
@ 2017-05-10 10:03                                       ` George Dunlap
  2017-05-10 10:48                                         ` Julien Grall
  0 siblings, 1 reply; 82+ messages in thread
From: George Dunlap @ 2017-05-10 10:03 UTC (permalink / raw)
  To: Julien Grall, Stefano Stabellini, Dario Faggioli
  Cc: Volodymyr Babchuk, Artem Mygaiev, Andrii Anisov, Xen Devel

On 10/05/17 11:00, Julien Grall wrote:
> 
> 
> On 05/10/2017 10:56 AM, George Dunlap wrote:
>> On 09/05/17 19:29, Stefano Stabellini wrote:
>>> On Tue, 9 May 2017, Dario Faggioli wrote:
>>>>>> And it should not be hard to give such code access to the context
>>>>>> of
>>>>>> the vCPU that was previously running (in x86, given we implement
>>>>>> what
>>>>>> we call lazy context switch, it's most likely still loaded in the
>>>>>> pCPU!).
>>>>>
>>>>> I agree with Stefano, switching to the idle vCPU is a pretty bad
>>>>> idea.
>>>>>
>>>>> the idle vCPU is a fake vCPU on ARM to stick with the common code
>>>>> (we
>>>>> never leave the hypervisor). In the case of the EL0 app, we want to
>>>>> change exception level to run the code with lower privilege.
>>>>>
>>>>> Also IHMO, it should only be used when there are nothing to run and
>>>>> not
>>>>> re-purposed for running EL0 app.
>>>>>
>>>> It's already purposed for running when there is nothing to do _or_ when
>>>> there are tasklets.
>>>>
>>>> I do see your point about privilege level, though. And I agree with
>>>> George that it looks very similar to when, in the x86 world, we tried
>>>> to put the infra together for switching to Ring3 to run some pieces of
>>>> Xen code.
>>>
>>> Right, and just to add to it, context switching to the idle vcpu has a
>>> cost, but it doesn't give us any security benefits whatsever. If Xen is
>>> going to spend time on context switching, it is better to do it in a
>>> way that introduces a security boundary.
>>
>> "Context switching" to the idle vcpu doesn't actually save or change any
>> registers, nor does it flush the TLB.  It's more or less just accounting
>> for the scheduler.  So it has a cost (going through the scheduler) but
>> not a very large one.
> 
> It depends on the architecture. For ARM we don't yet support lazy
> context switch. So effectively, the cost to "context switch" to the idle
> vCPU will be quite high.

Oh, right.  Sorry, I thought I had seen code implementing lazy context
switch in ARM, but I must have imagined it.  That is indeed a material
consideration.

Is there a particular reason that lazy context switch is difficult on
ARM?  If not it should be a fairly important bit of low-hanging fruit
from a performance perspective.

 -George


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [ARM] Native application design and discussion (I hope)
  2017-05-10 10:03                                       ` George Dunlap
@ 2017-05-10 10:48                                         ` Julien Grall
  2017-05-10 17:37                                           ` Volodymyr Babchuk
  0 siblings, 1 reply; 82+ messages in thread
From: Julien Grall @ 2017-05-10 10:48 UTC (permalink / raw)
  To: George Dunlap, Stefano Stabellini, Dario Faggioli
  Cc: Volodymyr Babchuk, Artem Mygaiev, Andrii Anisov, Xen Devel

Hi George,

On 05/10/2017 11:03 AM, George Dunlap wrote:
> On 10/05/17 11:00, Julien Grall wrote:
>>
>>
>> On 05/10/2017 10:56 AM, George Dunlap wrote:
>>> On 09/05/17 19:29, Stefano Stabellini wrote:
>>>> On Tue, 9 May 2017, Dario Faggioli wrote:
>>>>>>> And it should not be hard to give such code access to the context
>>>>>>> of
>>>>>>> the vCPU that was previously running (in x86, given we implement
>>>>>>> what
>>>>>>> we call lazy context switch, it's most likely still loaded in the
>>>>>>> pCPU!).
>>>>>>
>>>>>> I agree with Stefano, switching to the idle vCPU is a pretty bad
>>>>>> idea.
>>>>>>
>>>>>> the idle vCPU is a fake vCPU on ARM to stick with the common code
>>>>>> (we
>>>>>> never leave the hypervisor). In the case of the EL0 app, we want to
>>>>>> change exception level to run the code with lower privilege.
>>>>>>
>>>>>> Also IHMO, it should only be used when there are nothing to run and
>>>>>> not
>>>>>> re-purposed for running EL0 app.
>>>>>>
>>>>> It's already purposed for running when there is nothing to do _or_ when
>>>>> there are tasklets.
>>>>>
>>>>> I do see your point about privilege level, though. And I agree with
>>>>> George that it looks very similar to when, in the x86 world, we tried
>>>>> to put the infra together for switching to Ring3 to run some pieces of
>>>>> Xen code.
>>>>
>>>> Right, and just to add to it, context switching to the idle vcpu has a
>>>> cost, but it doesn't give us any security benefits whatsever. If Xen is
>>>> going to spend time on context switching, it is better to do it in a
>>>> way that introduces a security boundary.
>>>
>>> "Context switching" to the idle vcpu doesn't actually save or change any
>>> registers, nor does it flush the TLB.  It's more or less just accounting
>>> for the scheduler.  So it has a cost (going through the scheduler) but
>>> not a very large one.
>>
>> It depends on the architecture. For ARM we don't yet support lazy
>> context switch. So effectively, the cost to "context switch" to the idle
>> vCPU will be quite high.
>
> Oh, right.  Sorry, I thought I had seen code implementing lazy context
> switch in ARM, but I must have imagined it.  That is indeed a material
> consideration.
>
> Is there a particular reason that lazy context switch is difficult on
> ARM?  If not it should be a fairly important bit of low-hanging fruit
> from a performance perspective.

I am not entirely sure what you are doing on x86. Let me explain what we 
do and why context switch is heavy on ARM.

In the case of ARM, when entering to the hypervisor, we only save the 
bare minimum (all non-banked registers + registers useful for handling 
guest request),  and left the rest untouched.

Our save/restore functions are quite big because it involving 
saving/restore state of the interrupt controller, FPU... So we have a 
fast exit/entry but slow context switch.

What we currently do is avoiding save/restore the idle vCPU because we 
always stay in the hypervisor exception level. However we still restore 
all the registers of the previous running vCPU and restore the one of 
the next running vCPU.

This has a big impact on the workload when running vCPU and waiting for 
interrupts (hence the patch from Stefano to limit entering in the 
hypervisor though it is not by default).

I made the assumption the idle vCPU is only running when nothing has to 
be done. But as you mentioned tasklet can be done there too. So running 
tasklet on Xen ARM will have an high cost.

A list of optimization we could do on ARM is:
	- Avoiding restore if the vCPU stay the same before and after idle vPCU
	- Avoiding save/restore if vCPU is dedicated to a pCPU

Do you have any other optimization on x86?

Cheers,

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [ARM] Native application design and discussion (I hope)
  2017-05-09  8:53                               ` George Dunlap
@ 2017-05-10 16:38                                 ` Andrii Anisov
  0 siblings, 0 replies; 82+ messages in thread
From: Andrii Anisov @ 2017-05-10 16:38 UTC (permalink / raw)
  To: George Dunlap, Stefano Stabellini
  Cc: Volodymyr Babchuk, Dario Faggioli, Artem Mygaiev, Julien Grall,
	Xen Devel

Hello George,


On 09.05.17 11:53, George Dunlap wrote:
> A few years ago I'd have said ACPI was an x86 concept as well. :-)  But
> my point was mainly to give examples to Andrii of other ways stubdomains
> were used.
>
>   -George

Thanks for mentioning the PV vTPM in this thread.
It looks really interesting to me both from native app prospective as 
well as SCF point of view.

-- 

*Andrii Anisov*



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [ARM] Native application design and discussion (I hope)
  2017-05-10 10:48                                         ` Julien Grall
@ 2017-05-10 17:37                                           ` Volodymyr Babchuk
  2017-05-10 18:05                                             ` Stefano Stabellini
  2017-05-10 19:04                                             ` Julien Grall
  0 siblings, 2 replies; 82+ messages in thread
From: Volodymyr Babchuk @ 2017-05-10 17:37 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, Andrii Anisov, Dario Faggioli, George Dunlap,
	Xen Devel, Artem Mygaiev

Hi Julien,

Returning back to Native apps, I think we can make ctx switch even
faster by dropping p2m code. Imagine that we already created stage 1
MMU for native application. Then to switch to app it we need only:

1. Enable TGE bit in HCR
2. Disable VM bit in HCR
3. Save/Program EL1_TTBR and friends
3.5 (optionally) save/restore FPU state
4. Save/Restore general purpose registers + SP + CSR + PC to jump to
an app in EL0 state.

This can be done in "real" vcpu or in idle vcpu context. No differences there.

Exception handling in hypervisor would became tricky because of vcpu
absence for native app. Current implementation of entry.S always says
general purpose registers to a vcpu structure. Basically, we should
teach entry.S and traps.c about native apps.
Am I missing something?




On 10 May 2017 at 13:48, Julien Grall <julien.grall@arm.com> wrote:
> Hi George,
>
>
> On 05/10/2017 11:03 AM, George Dunlap wrote:
>>
>> On 10/05/17 11:00, Julien Grall wrote:
>>>
>>>
>>>
>>> On 05/10/2017 10:56 AM, George Dunlap wrote:
>>>>
>>>> On 09/05/17 19:29, Stefano Stabellini wrote:
>>>>>
>>>>> On Tue, 9 May 2017, Dario Faggioli wrote:
>>>>>>>>
>>>>>>>> And it should not be hard to give such code access to the context
>>>>>>>> of
>>>>>>>> the vCPU that was previously running (in x86, given we implement
>>>>>>>> what
>>>>>>>> we call lazy context switch, it's most likely still loaded in the
>>>>>>>> pCPU!).
>>>>>>>
>>>>>>>
>>>>>>> I agree with Stefano, switching to the idle vCPU is a pretty bad
>>>>>>> idea.
>>>>>>>
>>>>>>> the idle vCPU is a fake vCPU on ARM to stick with the common code
>>>>>>> (we
>>>>>>> never leave the hypervisor). In the case of the EL0 app, we want to
>>>>>>> change exception level to run the code with lower privilege.
>>>>>>>
>>>>>>> Also IHMO, it should only be used when there are nothing to run and
>>>>>>> not
>>>>>>> re-purposed for running EL0 app.
>>>>>>>
>>>>>> It's already purposed for running when there is nothing to do _or_
>>>>>> when
>>>>>> there are tasklets.
>>>>>>
>>>>>> I do see your point about privilege level, though. And I agree with
>>>>>> George that it looks very similar to when, in the x86 world, we tried
>>>>>> to put the infra together for switching to Ring3 to run some pieces of
>>>>>> Xen code.
>>>>>
>>>>>
>>>>> Right, and just to add to it, context switching to the idle vcpu has a
>>>>> cost, but it doesn't give us any security benefits whatsever. If Xen is
>>>>> going to spend time on context switching, it is better to do it in a
>>>>> way that introduces a security boundary.
>>>>
>>>>
>>>> "Context switching" to the idle vcpu doesn't actually save or change any
>>>> registers, nor does it flush the TLB.  It's more or less just accounting
>>>> for the scheduler.  So it has a cost (going through the scheduler) but
>>>> not a very large one.
>>>
>>>
>>> It depends on the architecture. For ARM we don't yet support lazy
>>> context switch. So effectively, the cost to "context switch" to the idle
>>> vCPU will be quite high.
>>
>>
>> Oh, right.  Sorry, I thought I had seen code implementing lazy context
>> switch in ARM, but I must have imagined it.  That is indeed a material
>> consideration.
>>
>> Is there a particular reason that lazy context switch is difficult on
>> ARM?  If not it should be a fairly important bit of low-hanging fruit
>> from a performance perspective.
>
>
> I am not entirely sure what you are doing on x86. Let me explain what we do
> and why context switch is heavy on ARM.
>
> In the case of ARM, when entering to the hypervisor, we only save the bare
> minimum (all non-banked registers + registers useful for handling guest
> request),  and left the rest untouched.
>
> Our save/restore functions are quite big because it involving saving/restore
> state of the interrupt controller, FPU... So we have a fast exit/entry but
> slow context switch.
>
> What we currently do is avoiding save/restore the idle vCPU because we
> always stay in the hypervisor exception level. However we still restore all
> the registers of the previous running vCPU and restore the one of the next
> running vCPU.
>
> This has a big impact on the workload when running vCPU and waiting for
> interrupts (hence the patch from Stefano to limit entering in the hypervisor
> though it is not by default).
>
> I made the assumption the idle vCPU is only running when nothing has to be
> done. But as you mentioned tasklet can be done there too. So running tasklet
> on Xen ARM will have an high cost.
>
> A list of optimization we could do on ARM is:
>         - Avoiding restore if the vCPU stay the same before and after idle
> vPCU
>         - Avoiding save/restore if vCPU is dedicated to a pCPU
>
> Do you have any other optimization on x86?
>
> Cheers,
>
> --
> Julien Grall



-- 
WBR Volodymyr Babchuk aka lorc [+380976646013]
mailto: vlad.babchuk@gmail.com

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [ARM] Native application design and discussion (I hope)
  2017-05-10 17:37                                           ` Volodymyr Babchuk
@ 2017-05-10 18:05                                             ` Stefano Stabellini
  2017-05-10 19:04                                             ` Julien Grall
  1 sibling, 0 replies; 82+ messages in thread
From: Stefano Stabellini @ 2017-05-10 18:05 UTC (permalink / raw)
  To: Volodymyr Babchuk
  Cc: Stefano Stabellini, Andrii Anisov, Dario Faggioli, George Dunlap,
	Xen Devel, Julien Grall, Artem Mygaiev

On Wed, 10 May 2017, Volodymyr Babchuk wrote:
> Hi Julien,
> 
> Returning back to Native apps, I think we can make ctx switch even
> faster by dropping p2m code. Imagine that we already created stage 1
> MMU for native application. Then to switch to app it we need only:
> 
> 1. Enable TGE bit in HCR
> 2. Disable VM bit in HCR
> 3. Save/Program EL1_TTBR and friends
> 3.5 (optionally) save/restore FPU state
> 4. Save/Restore general purpose registers + SP + CSR + PC to jump to
> an app in EL0 state.
> 
> This can be done in "real" vcpu or in idle vcpu context. No differences there.
> 
> Exception handling in hypervisor would became tricky because of vcpu
> absence for native app. Current implementation of entry.S always says
> general purpose registers to a vcpu structure. Basically, we should
> teach entry.S and traps.c about native apps.
> Am I missing something?

The nicest way to do this is probably to create another saved_context
in arch_vcpu for EL0 apps. That way, changes to traps.c and entry.S will
be almost nothing.


> 
> On 10 May 2017 at 13:48, Julien Grall <julien.grall@arm.com> wrote:
> > Hi George,
> >
> >
> > On 05/10/2017 11:03 AM, George Dunlap wrote:
> >>
> >> On 10/05/17 11:00, Julien Grall wrote:
> >>>
> >>>
> >>>
> >>> On 05/10/2017 10:56 AM, George Dunlap wrote:
> >>>>
> >>>> On 09/05/17 19:29, Stefano Stabellini wrote:
> >>>>>
> >>>>> On Tue, 9 May 2017, Dario Faggioli wrote:
> >>>>>>>>
> >>>>>>>> And it should not be hard to give such code access to the context
> >>>>>>>> of
> >>>>>>>> the vCPU that was previously running (in x86, given we implement
> >>>>>>>> what
> >>>>>>>> we call lazy context switch, it's most likely still loaded in the
> >>>>>>>> pCPU!).
> >>>>>>>
> >>>>>>>
> >>>>>>> I agree with Stefano, switching to the idle vCPU is a pretty bad
> >>>>>>> idea.
> >>>>>>>
> >>>>>>> the idle vCPU is a fake vCPU on ARM to stick with the common code
> >>>>>>> (we
> >>>>>>> never leave the hypervisor). In the case of the EL0 app, we want to
> >>>>>>> change exception level to run the code with lower privilege.
> >>>>>>>
> >>>>>>> Also IHMO, it should only be used when there are nothing to run and
> >>>>>>> not
> >>>>>>> re-purposed for running EL0 app.
> >>>>>>>
> >>>>>> It's already purposed for running when there is nothing to do _or_
> >>>>>> when
> >>>>>> there are tasklets.
> >>>>>>
> >>>>>> I do see your point about privilege level, though. And I agree with
> >>>>>> George that it looks very similar to when, in the x86 world, we tried
> >>>>>> to put the infra together for switching to Ring3 to run some pieces of
> >>>>>> Xen code.
> >>>>>
> >>>>>
> >>>>> Right, and just to add to it, context switching to the idle vcpu has a
> >>>>> cost, but it doesn't give us any security benefits whatsever. If Xen is
> >>>>> going to spend time on context switching, it is better to do it in a
> >>>>> way that introduces a security boundary.
> >>>>
> >>>>
> >>>> "Context switching" to the idle vcpu doesn't actually save or change any
> >>>> registers, nor does it flush the TLB.  It's more or less just accounting
> >>>> for the scheduler.  So it has a cost (going through the scheduler) but
> >>>> not a very large one.
> >>>
> >>>
> >>> It depends on the architecture. For ARM we don't yet support lazy
> >>> context switch. So effectively, the cost to "context switch" to the idle
> >>> vCPU will be quite high.
> >>
> >>
> >> Oh, right.  Sorry, I thought I had seen code implementing lazy context
> >> switch in ARM, but I must have imagined it.  That is indeed a material
> >> consideration.
> >>
> >> Is there a particular reason that lazy context switch is difficult on
> >> ARM?  If not it should be a fairly important bit of low-hanging fruit
> >> from a performance perspective.
> >
> >
> > I am not entirely sure what you are doing on x86. Let me explain what we do
> > and why context switch is heavy on ARM.
> >
> > In the case of ARM, when entering to the hypervisor, we only save the bare
> > minimum (all non-banked registers + registers useful for handling guest
> > request),  and left the rest untouched.
> >
> > Our save/restore functions are quite big because it involving saving/restore
> > state of the interrupt controller, FPU... So we have a fast exit/entry but
> > slow context switch.
> >
> > What we currently do is avoiding save/restore the idle vCPU because we
> > always stay in the hypervisor exception level. However we still restore all
> > the registers of the previous running vCPU and restore the one of the next
> > running vCPU.
> >
> > This has a big impact on the workload when running vCPU and waiting for
> > interrupts (hence the patch from Stefano to limit entering in the hypervisor
> > though it is not by default).
> >
> > I made the assumption the idle vCPU is only running when nothing has to be
> > done. But as you mentioned tasklet can be done there too. So running tasklet
> > on Xen ARM will have an high cost.
> >
> > A list of optimization we could do on ARM is:
> >         - Avoiding restore if the vCPU stay the same before and after idle
> > vPCU
> >         - Avoiding save/restore if vCPU is dedicated to a pCPU
> >
> > Do you have any other optimization on x86?
> >
> > Cheers,
> >
> > --
> > Julien Grall
> 
> 
> 
> -- 
> WBR Volodymyr Babchuk aka lorc [+380976646013]
> mailto: vlad.babchuk@gmail.com
> 

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [ARM] Native application design and discussion (I hope)
  2017-05-10  9:56                                   ` George Dunlap
  2017-05-10 10:00                                     ` Julien Grall
@ 2017-05-10 18:08                                     ` Andrii Anisov
  2017-05-10 18:24                                       ` Stefano Stabellini
  1 sibling, 1 reply; 82+ messages in thread
From: Andrii Anisov @ 2017-05-10 18:08 UTC (permalink / raw)
  To: George Dunlap, Stefano Stabellini, Dario Faggioli
  Cc: Volodymyr Babchuk, Julien Grall, Artem Mygaiev, Xen Devel

Hello All,


On 10.05.17 12:56, George Dunlap wrote:
> But the context here is that Andrii asked something about whether this
> "EL0 App" functionality could be used to service Xen as well as a
> domain.  You said it didn't make sense, and Dario (as I understand it)
> was pointing out that we already did something similar with tasklets.
> If there was a need to be able to "upload" user-specified routines that
> would handle events generated by the hypervisor rather than events
> generated by a guest, that would indeed be a possibility.  It would
> essentially be the equivalent of a deprivileged, untrusted tasklet.
Actually that is what we are heavily interested in.
One more pro for a generic EL0 apps is that they could have different 
from XEN license. I.e. proprietary one.

> At the moment I can't foresee the need for such a mechanism, and I don't
> particularly think that we should keep that use case in mind when
> designing the "App" interface.  But it is an interesting idea to keep in
> our back pockets in case a use case comes up later.
I would provide few examples we have on the table:

  * fdtlib mentioned here [1] - just an example of a piece of some
    untrusted but virtually needed code.
  * a coprocessor platform support for SCF [2][3] - probably will be a
    piece of proprietary code, due to such IP specific functionalities
    like coprocessor task switching sequence and mmio access emulation.
  * some tee support code - support of trustee or mshield - proprietary one.

[1] 
https://lists.xenproject.org/archives/html/xen-devel/2017-05/msg00381.html
[2] 
https://lists.xenproject.org/archives/html/xen-devel/2016-10/msg01966.html
[3] 
https://lists.xenproject.org/archives/html/xen-devel/2017-05/msg00348.html

-- 

*Andrii Anisov*



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [ARM] Native application design and discussion (I hope)
  2017-05-10 18:08                                     ` Andrii Anisov
@ 2017-05-10 18:24                                       ` Stefano Stabellini
  2017-05-11 15:19                                         ` Volodymyr Babchuk
  0 siblings, 1 reply; 82+ messages in thread
From: Stefano Stabellini @ 2017-05-10 18:24 UTC (permalink / raw)
  To: Andrii Anisov
  Cc: Stefano Stabellini, Volodymyr Babchuk, Dario Faggioli,
	George Dunlap, Xen Devel, Julien Grall, Artem Mygaiev

On Wed, 10 May 2017, Andrii Anisov wrote:
> On 10.05.17 12:56, George Dunlap wrote:
> > But the context here is that Andrii asked something about whether this
> > "EL0 App" functionality could be used to service Xen as well as a
> > domain.  You said it didn't make sense, and Dario (as I understand it)
> > was pointing out that we already did something similar with tasklets.
> > If there was a need to be able to "upload" user-specified routines that
> > would handle events generated by the hypervisor rather than events
> > generated by a guest, that would indeed be a possibility.  It would
> > essentially be the equivalent of a deprivileged, untrusted tasklet.

I just want to point out that the comparision with tasklets is not
helpful. Tasklets involve the idle vcpu, which we are trying to step away
from because it increases irq latency. Tasklets don't provide any
isolation. The context switch model for the idle vcpu and for EL0 apps
is different, thus it has a different cost.

I think we shouldn't mention tasklets in this thread any longer.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [ARM] Native application design and discussion (I hope)
  2017-05-10 17:37                                           ` Volodymyr Babchuk
  2017-05-10 18:05                                             ` Stefano Stabellini
@ 2017-05-10 19:04                                             ` Julien Grall
  2017-05-11 10:07                                               ` Julien Grall
  1 sibling, 1 reply; 82+ messages in thread
From: Julien Grall @ 2017-05-10 19:04 UTC (permalink / raw)
  To: Volodymyr Babchuk
  Cc: Stefano Stabellini, Andrii Anisov, Steve Capper, Dario Faggioli,
	George Dunlap, Xen Devel, Artem Mygaiev

On 05/10/2017 06:37 PM, Volodymyr Babchuk wrote:
> Hi Julien,

Hi Volodymyr,

> Returning back to Native apps, I think we can make ctx switch even
> faster by dropping p2m code. Imagine that we already created stage 1
> MMU for native application. Then to switch to app it we need only:
>
> 1. Enable TGE bit in HCR
> 2. Disable VM bit in HCR
> 3. Save/Program EL1_TTBR and friends
> 3.5 (optionally) save/restore FPU state
> 4. Save/Restore general purpose registers + SP + CSR + PC to jump to
> an app in EL0 state.
>
> This can be done in "real" vcpu or in idle vcpu context. No differences there.
>
> Exception handling in hypervisor would became tricky because of vcpu
> absence for native app. Current implementation of entry.S always says
> general purpose registers to a vcpu structure. Basically, we should
> teach entry.S and traps.c about native apps.
> Am I missing something?

HCR_EL2.VM is allowed to be cached in the TLBs so for correctness you 
have to flush the TLBs everytime you change this bit (see D4.8.3 in ARM 
DDI 0487A.k_iss10775).

Furthermore, as I mentioned earlier (see [1]) there are dependencies on 
the VMID even when stage-2 is disabled (see D4-1823 in ARM DDI 
0487A.k_iss10775) so you have to program correctly VTTBR_EL2.VMID. This 
also means that if you use a different EL0 app, you have to ther use a 
different VMID or flush the TLBs.

Bottom line, if you don't use stage-2 page table you have to flush the 
TLBs. Likely this will have an higher impact on the platform than using 
stage-2 page table.

Virtual memory is quite tricky, someone needs to look at the ARM ARM and 
check all the behaviors when disabling either stage-1 or stage-2. There 
are memory attribute implications that may make tricky to move an EL0 
app between pCPU.

CC Steve on the discussion to get more feedback. We are trying to 
disable completely EL1.

Cheers,

[1] https://lists.xen.org/archives/html/xen-devel/2017-03/msg04374.html

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [ARM] Native application design and discussion (I hope)
  2017-05-10 19:04                                             ` Julien Grall
@ 2017-05-11 10:07                                               ` Julien Grall
  2017-05-11 11:28                                                 ` Volodymyr Babchuk
  0 siblings, 1 reply; 82+ messages in thread
From: Julien Grall @ 2017-05-11 10:07 UTC (permalink / raw)
  To: Volodymyr Babchuk
  Cc: Stefano Stabellini, Andrii Anisov, Steve Capper, Dario Faggioli,
	George Dunlap, Xen Devel, Artem Mygaiev

Hello,

On 10/05/17 20:04, Julien Grall wrote:
> On 05/10/2017 06:37 PM, Volodymyr Babchuk wrote:
>> Hi Julien,
>
> Hi Volodymyr,
>
>> Returning back to Native apps, I think we can make ctx switch even
>> faster by dropping p2m code. Imagine that we already created stage 1
>> MMU for native application. Then to switch to app it we need only:
>>
>> 1. Enable TGE bit in HCR
>> 2. Disable VM bit in HCR
>> 3. Save/Program EL1_TTBR and friends
>> 3.5 (optionally) save/restore FPU state
>> 4. Save/Restore general purpose registers + SP + CSR + PC to jump to
>> an app in EL0 state.
>>
>> This can be done in "real" vcpu or in idle vcpu context. No
>> differences there.
>>
>> Exception handling in hypervisor would became tricky because of vcpu
>> absence for native app. Current implementation of entry.S always says
>> general purpose registers to a vcpu structure. Basically, we should
>> teach entry.S and traps.c about native apps.
>> Am I missing something?
>
> HCR_EL2.VM is allowed to be cached in the TLBs so for correctness you
> have to flush the TLBs everytime you change this bit (see D4.8.3 in ARM
> DDI 0487A.k_iss10775).
>
> Furthermore, as I mentioned earlier (see [1]) there are dependencies on
> the VMID even when stage-2 is disabled (see D4-1823 in ARM DDI
> 0487A.k_iss10775) so you have to program correctly VTTBR_EL2.VMID. This
> also means that if you use a different EL0 app, you have to ther use a
> different VMID or flush the TLBs.
>
> Bottom line, if you don't use stage-2 page table you have to flush the
> TLBs. Likely this will have an higher impact on the platform than using
> stage-2 page table.
>
> Virtual memory is quite tricky, someone needs to look at the ARM ARM and
> check all the behaviors when disabling either stage-1 or stage-2. There
> are memory attribute implications that may make tricky to move an EL0
> app between pCPU.

Looking again at the documentation and chatting with other ARM folks. I 
was wrong on some part, sorry for the confusion.

It turns out that if you don't need to flush the TLBs when disabling the 
HCR_EL2.VM (this is what Linux does for KVM). So disabling stage-2 for 
EL0 app would be ok.

But you still need to allocate a VMID per EL0 app as TLBs will still 
depend on it even with stage-2 disabled.

Even if we keep stage-2 enabled, we would have to create dummy page 
tables of stage-1 because the memory attribute would impact performance 
and at least not allow the EL0 app to move (see D4.2.8 in ARM DDI 
0487A.k_iss10775). In this case, 1:1 page tables with a block map (e.g 
1GB) would be sufficient and rely on stage-2 page tables.

Lastly, can you remind me with platform you are using for testing?

I hope this helps.

Cheers,

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [ARM] Native application design and discussion (I hope)
  2017-05-11 10:07                                               ` Julien Grall
@ 2017-05-11 11:28                                                 ` Volodymyr Babchuk
  0 siblings, 0 replies; 82+ messages in thread
From: Volodymyr Babchuk @ 2017-05-11 11:28 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, Andrii Anisov, Steve Capper, Dario Faggioli,
	George Dunlap, Xen Devel, Artem Mygaiev

Hi Julien,



On 11 May 2017 at 13:07, Julien Grall <julien.grall@arm.com> wrote:
> Looking again at the documentation and chatting with other ARM folks. I was
> wrong on some part, sorry for the confusion.
Thank you for this investigation. One can't fit whole ARMv8 TRM in one's head :)

> It turns out that if you don't need to flush the TLBs when disabling the
> HCR_EL2.VM (this is what Linux does for KVM). So disabling stage-2 for EL0
> app would be ok.
Aha, these are good news.

> But you still need to allocate a VMID per EL0 app as TLBs will still depend
> on it even with stage-2 disabled.
I see. But I will need to allocate VMID in any case, right?

> Even if we keep stage-2 enabled, we would have to create dummly page tables
> of stage-1 because the memory attribute would impact performance and at
> least not allow the EL0 app to move (see D4.2.8 in ARM DDI
> 0487A.k_iss10775). In this case, 1:1 page tables with a block map (e.g 1GB)
> would be sufficient and rely on stage-2 page tables.
Yes, I did exactly this in my PoC.  I'm was curious on disabling
stage-2 because I don't want to mess with p2m context save/restore
functions. Good to know that it is possible.

> Lastly, can you remind me with platform you are using for testing?
It is Renesas Rcar Gen3. It is Big-Little platform, but currently we
use only four A57 cores.

-- 
WBR Volodymyr Babchuk aka lorc [+380976646013]
mailto: vlad.babchuk@gmail.com

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [ARM] Native application design and discussion (I hope)
  2017-05-10 18:24                                       ` Stefano Stabellini
@ 2017-05-11 15:19                                         ` Volodymyr Babchuk
  2017-05-11 15:35                                           ` Modules support in Xen (WAS: Re: [ARM] Native application design and discussion (I hope)) Julien Grall
  0 siblings, 1 reply; 82+ messages in thread
From: Volodymyr Babchuk @ 2017-05-11 15:19 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Andrii Anisov, Dario Faggioli, George Dunlap, Xen Devel,
	Julien Grall, Artem Mygaiev

Hi Stefano,

On 10 May 2017 at 21:24, Stefano Stabellini <sstabellini@kernel.org> wrote:
> I just want to point out that the comparision with tasklets is not
> helpful. Tasklets involve the idle vcpu, which we are trying to step away
> from because it increases irq latency. Tasklets don't provide any
> isolation. The context switch model for the idle vcpu and for EL0 apps
> is different, thus it has a different cost.
>
> I think we shouldn't mention tasklets in this thread any longer.
Yep, you are right. Let's forget about tasklets and focus on EL0 apps.

I want summarize political (opposed to technical) part of the discussion.

We, here at EPAM, viewed EL0 apps primarily as a way to extend
hypervisor. Because when it comes to embedded and automotive, there
arise some ugly things, that are needed at hypervisor level:
TEE mediators (OP-TEE is a good TEE, but for example there is TI's
MSHIELD with deeply proprietary license), device drivers for vcopros,
device drivers for cpufreq, and so on.
Some of this things can't be included in hypervisor due to legal
issues, some - because of code size or complexity. And we can't run
them in stubdoms, because stubdoms are slow for certain use-cases, in
some cases they are insecure, in some cases they just don't fit at
all.

On other hand you consider EL0 apps as ideal host for emulators only.
I can see your point, because XEN was always viewed as hypervisor for
servers.
But servers have different requirements in comparison to embedded
applications. Traditional servers does not use hardware accelerated
video decoders, they don't need to disable cpu's or scale frequencies
to preserve energy (okay, they need to, but it is not as pressing, as
on battery-powered device), there almost no proprietary code (or even
proprietary blobs, argh!).
Looks like virtualization on embedded is the next big thing. Linux
kernel was able to satisfy both parties. I hope that XEN can do the
same.

So, going back to EL0 apps. Honestly, I'd prefer not to use them as
extension mechanism. Yes, they provide isolation, but interfacing with
them will be painful. Probably we can leave them to emulators only
(but as I can see, PL011 emulator is going to be merged right into
hypervisor. Will be there need for other emulators?).
What I really want to ask: what do you thing about old good modules
like ones in linux kernel? There will be no isolation, this is bad.
But:
 - you can load proprietary modules if you want to
 - they are fast
 - you can interface with them in a nativest way possible: just call a function

Artem, could you please comment from your side?

-- 
WBR Volodymyr Babchuk aka lorc [+380976646013]
mailto: vlad.babchuk@gmail.com

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Modules support in Xen (WAS: Re: [ARM] Native application design and discussion (I hope))
  2017-05-11 15:19                                         ` Volodymyr Babchuk
@ 2017-05-11 15:35                                           ` Julien Grall
  2017-05-11 16:35                                             ` George Dunlap
                                                               ` (2 more replies)
  0 siblings, 3 replies; 82+ messages in thread
From: Julien Grall @ 2017-05-11 15:35 UTC (permalink / raw)
  To: Volodymyr Babchuk, Stefano Stabellini
  Cc: Tim Deegan, Andrii Anisov, Andrew Cooper, Dario Faggioli,
	Ian Jackson, George Dunlap, Xen Devel, Jan Beulich, Wei Liu,
	Artem Mygaiev

Renaming the subject + adding more people in the conversation as this is 
not related to only ARM anymore.

On 11/05/17 16:19, Volodymyr Babchuk wrote:
> Hi Stefano,
>
> On 10 May 2017 at 21:24, Stefano Stabellini <sstabellini@kernel.org> wrote:
>> I just want to point out that the comparision with tasklets is not
>> helpful. Tasklets involve the idle vcpu, which we are trying to step away
>> from because it increases irq latency. Tasklets don't provide any
>> isolation. The context switch model for the idle vcpu and for EL0 apps
>> is different, thus it has a different cost.
>>
>> I think we shouldn't mention tasklets in this thread any longer.
> Yep, you are right. Let's forget about tasklets and focus on EL0 apps.
>
> I want summarize political (opposed to technical) part of the discussion.
>
> We, here at EPAM, viewed EL0 apps primarily as a way to extend
> hypervisor. Because when it comes to embedded and automotive, there
> arise some ugly things, that are needed at hypervisor level:
> TEE mediators (OP-TEE is a good TEE, but for example there is TI's
> MSHIELD with deeply proprietary license), device drivers for vcopros,
> device drivers for cpufreq, and so on.
> Some of this things can't be included in hypervisor due to legal
> issues, some - because of code size or complexity. And we can't run
> them in stubdoms, because stubdoms are slow for certain use-cases, in
> some cases they are insecure, in some cases they just don't fit at
> all.
>
> On other hand you consider EL0 apps as ideal host for emulators only.
> I can see your point, because XEN was always viewed as hypervisor for
> servers.
> But servers have different requirements in comparison to embedded
> applications. Traditional servers does not use hardware accelerated
> video decoders, they don't need to disable cpu's or scale frequencies
> to preserve energy (okay, they need to, but it is not as pressing, as
> on battery-powered device), there almost no proprietary code (or even
> proprietary blobs, argh!).
> Looks like virtualization on embedded is the next big thing. Linux
> kernel was able to satisfy both parties. I hope that XEN can do the
> same.
>
> So, going back to EL0 apps. Honestly, I'd prefer not to use them as
> extension mechanism. Yes, they provide isolation, but interfacing with
> them will be painful. Probably we can leave them to emulators only
> (but as I can see, PL011 emulator is going to be merged right into
> hypervisor. Will be there need for other emulators?).
> What I really want to ask: what do you thing about old good modules
> like ones in linux kernel? There will be no isolation, this is bad.
> But:
>  - you can load proprietary modules if you want to
>  - they are fast
>  - you can interface with them in a nativest way possible: just call a function
>
> Artem, could you please comment from your side?
>

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: Modules support in Xen (WAS: Re: [ARM] Native application design and discussion (I hope))
  2017-05-11 15:35                                           ` Modules support in Xen (WAS: Re: [ARM] Native application design and discussion (I hope)) Julien Grall
@ 2017-05-11 16:35                                             ` George Dunlap
  2017-05-11 17:14                                               ` Volodymyr Babchuk
  2017-05-11 17:14                                             ` George Dunlap
  2017-05-11 18:04                                             ` Stefano Stabellini
  2 siblings, 1 reply; 82+ messages in thread
From: George Dunlap @ 2017-05-11 16:35 UTC (permalink / raw)
  To: Julien Grall, Volodymyr Babchuk, Stefano Stabellini
  Cc: Tim Deegan, Andrii Anisov, Andrew Cooper, Dario Faggioli,
	Ian Jackson, Xen Devel, Jan Beulich, Wei Liu, Artem Mygaiev

On 11/05/17 16:35, Julien Grall wrote:
> On 11/05/17 16:19, Volodymyr Babchuk wrote:
>> Hi Stefano,
>>
>> On 10 May 2017 at 21:24, Stefano Stabellini <sstabellini@kernel.org>
>> wrote:
>>> I just want to point out that the comparision with tasklets is not
>>> helpful. Tasklets involve the idle vcpu, which we are trying to step
>>> away
>>> from because it increases irq latency. Tasklets don't provide any
>>> isolation. The context switch model for the idle vcpu and for EL0 apps
>>> is different, thus it has a different cost.
>>>
>>> I think we shouldn't mention tasklets in this thread any longer.
>> Yep, you are right. Let's forget about tasklets and focus on EL0 apps.
>>
>> I want summarize political (opposed to technical) part of the discussion.
>>
>> We, here at EPAM, viewed EL0 apps primarily as a way to extend
>> hypervisor. Because when it comes to embedded and automotive, there
>> arise some ugly things, that are needed at hypervisor level:
>> TEE mediators (OP-TEE is a good TEE, but for example there is TI's
>> MSHIELD with deeply proprietary license), device drivers for vcopros,
>> device drivers for cpufreq, and so on.
>> Some of this things can't be included in hypervisor due to legal
>> issues, some - because of code size or complexity. And we can't run
>> them in stubdoms, because stubdoms are slow for certain use-cases, in
>> some cases they are insecure, in some cases they just don't fit at
>> all.
>>
>> On other hand you consider EL0 apps as ideal host for emulators only.
>> I can see your point, because XEN was always viewed as hypervisor for
>> servers.
>> But servers have different requirements in comparison to embedded
>> applications. Traditional servers does not use hardware accelerated
>> video decoders, they don't need to disable cpu's or scale frequencies
>> to preserve energy (okay, they need to, but it is not as pressing, as
>> on battery-powered device), there almost no proprietary code (or even
>> proprietary blobs, argh!).
>> Looks like virtualization on embedded is the next big thing. Linux
>> kernel was able to satisfy both parties. I hope that XEN can do the
>> same.
>>
>> So, going back to EL0 apps. Honestly, I'd prefer not to use them as
>> extension mechanism. Yes, they provide isolation, but interfacing with
>> them will be painful. Probably we can leave them to emulators only
>> (but as I can see, PL011 emulator is going to be merged right into
>> hypervisor. Will be there need for other emulators?).
>> What I really want to ask: what do you thing about old good modules
>> like ones in linux kernel? There will be no isolation, this is bad.
>> But:
>>  - you can load proprietary modules if you want to
>>  - they are fast
>>  - you can interface with them in a nativest way possible: just call a
>> function

Even better would be to skip the module-loading step entirely, and just
compile proprietary code directly into your Xen binary.

Both solutions, unfortunately, are illegal.*

 -George

* I am not a lawyer, and this is not legal advice; but see this
presentation for a bit more information:
http://www.kroah.com/log/linux/ols_2006_keynote.html


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: Modules support in Xen (WAS: Re: [ARM] Native application design and discussion (I hope))
  2017-05-11 15:35                                           ` Modules support in Xen (WAS: Re: [ARM] Native application design and discussion (I hope)) Julien Grall
  2017-05-11 16:35                                             ` George Dunlap
@ 2017-05-11 17:14                                             ` George Dunlap
  2017-05-11 17:16                                               ` George Dunlap
  2017-05-11 18:13                                               ` Volodymyr Babchuk
  2017-05-11 18:04                                             ` Stefano Stabellini
  2 siblings, 2 replies; 82+ messages in thread
From: George Dunlap @ 2017-05-11 17:14 UTC (permalink / raw)
  To: Julien Grall, Volodymyr Babchuk, Stefano Stabellini
  Cc: Tim Deegan, Andrii Anisov, Andrew Cooper, Dario Faggioli,
	Ian Jackson, Xen Devel, Jan Beulich, Wei Liu, Artem Mygaiev

On 11/05/17 16:35, Julien Grall wrote:
> Renaming the subject + adding more people in the conversation as this is
> not related to only ARM anymore.
> 
> On 11/05/17 16:19, Volodymyr Babchuk wrote:
>> Hi Stefano,
>>
>> On 10 May 2017 at 21:24, Stefano Stabellini <sstabellini@kernel.org>
>> wrote:
>>> I just want to point out that the comparision with tasklets is not
>>> helpful. Tasklets involve the idle vcpu, which we are trying to step
>>> away
>>> from because it increases irq latency. Tasklets don't provide any
>>> isolation. The context switch model for the idle vcpu and for EL0 apps
>>> is different, thus it has a different cost.
>>>
>>> I think we shouldn't mention tasklets in this thread any longer.
>> Yep, you are right. Let's forget about tasklets and focus on EL0 apps.
>>
>> I want summarize political (opposed to technical) part of the discussion.
>>
>> We, here at EPAM, viewed EL0 apps primarily as a way to extend
>> hypervisor. Because when it comes to embedded and automotive, there
>> arise some ugly things, that are needed at hypervisor level:
>> TEE mediators (OP-TEE is a good TEE, but for example there is TI's
>> MSHIELD with deeply proprietary license),

If you're going to use a deeply proprietary TEE mediator, then you need
to find yourself a deeply proprietary hypervisor to go along with it --
either one you pay a license fee for or one you develop yourself.  It
would almost certainly be cheaper to improve the open-source one than to
do either of those.

Or you can try mixing the two and see what happens; but that doesn't
seem like a very sound legal strategy to me.

>> ...some [things can't be included in hypervisor] because of code
>> size or complexity.

Sorry, just to be clear: below you mentioned modules as a solution, and
given the context this would be included.  So can you expand on what you
mean that there are things that 1) can't be included in the hypervisor
because of code size or complexity, but for which 2) loadable modules
would be a suitable solution?

>> And we can't run
>> them in stubdoms, because stubdoms are slow for certain use-cases, in
>> some cases they are insecure, in some cases they just don't fit at
>> all.
>> On other hand you consider EL0 apps as ideal host for emulators only.
>> I can see your point, because XEN was always viewed as hypervisor for
>> servers.
>> But servers have different requirements in comparison to embedded
>> applications. Traditional servers does not use hardware accelerated
>> video decoders, they don't need to disable cpu's or scale frequencies
>> to preserve energy (okay, they need to, but it is not as pressing, as
>> on battery-powered device), there almost no proprietary code (or even
>> proprietary blobs, argh!).
>> Looks like virtualization on embedded is the next big thing. Linux
>> kernel was able to satisfy both parties. I hope that XEN can do the
>> same.

For many of these, there are probably technical solutions that we could
come up with that would allow proprietary content (such as video
decoders &c) that would have suitable performance without needing access
to the Xen address space.

Maybe I'm just not familiar with things, but it's hard for me to imagine
why you'd need proprietary blobs to disable cpus or scale frequency.
Are these really such complex activities that it's worth investing
thousands of hours of developer work into developing proprietary
solutions that you license?

Loading proprietary modules into Linux is as illegal as it would be in
Xen.  Many people obviously do it anyway, but you are really putting
yourself at a risk of meeting a guy like Patrick McHardy[1], a private
individual with copyright on the Linux kernel who by some estimates has
made almost EUR 2m in the last few years suing companies for GPL violations.

 -George

[1] https://lwn.net/Articles/721458/

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: Modules support in Xen (WAS: Re: [ARM] Native application design and discussion (I hope))
  2017-05-11 16:35                                             ` George Dunlap
@ 2017-05-11 17:14                                               ` Volodymyr Babchuk
  2017-05-11 17:20                                                 ` George Dunlap
  0 siblings, 1 reply; 82+ messages in thread
From: Volodymyr Babchuk @ 2017-05-11 17:14 UTC (permalink / raw)
  To: George Dunlap
  Cc: Tim Deegan, Stefano Stabellini, Andrii Anisov, Andrew Cooper,
	Dario Faggioli, Ian Jackson, Xen Devel, Julien Grall,
	Jan Beulich, Wei Liu, Artem Mygaiev

Hi George,

On 11 May 2017 at 19:35, George Dunlap <george.dunlap@citrix.com> wrote:
> Even better would be to skip the module-loading step entirely, and just
> compile proprietary code directly into your Xen binary.
>
> Both solutions, unfortunately, are illegal.*
Look, I don't saying we want to produce closed-source modules or apps.
We want to write open source code. Just imagine, that certain header
files have some proprietary license (e.g. some device interface
definition and this interface is IP of company which developed it).
AFAIK, it can't be included into Xen distribution. I thought, that it
can be included in some module with different (but still open source)
license.  But if you say that it can't... Then I don't know. It is out
of my competence. I'm not lawyer also.

-- 
WBR Volodymyr Babchuk aka lorc [+380976646013]
mailto: vlad.babchuk@gmail.com

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: Modules support in Xen (WAS: Re: [ARM] Native application design and discussion (I hope))
  2017-05-11 17:14                                             ` George Dunlap
@ 2017-05-11 17:16                                               ` George Dunlap
  2017-05-11 18:13                                               ` Volodymyr Babchuk
  1 sibling, 0 replies; 82+ messages in thread
From: George Dunlap @ 2017-05-11 17:16 UTC (permalink / raw)
  To: Julien Grall, Volodymyr Babchuk, Stefano Stabellini
  Cc: Andrii Anisov, Ian Jackson, Dario Faggioli, Tim Deegan,
	Xen Devel, Jan Beulich, Andrew Cooper, Wei Liu, Artem Mygaiev

On Thu, May 11, 2017 at 6:14 PM, George Dunlap <george.dunlap@citrix.com> wrote:
> yourself at a risk of meeting a guy like Patrick McHardy[1], a private
> individual with copyright on the Linux kernel

This should be "copyright on *code in the* Linux Kernel".  Obviously
he doesn't own a copyright on the whole thing, just a decent chunk of
it.

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: Modules support in Xen (WAS: Re: [ARM] Native application design and discussion (I hope))
  2017-05-11 17:14                                               ` Volodymyr Babchuk
@ 2017-05-11 17:20                                                 ` George Dunlap
  2017-05-11 17:53                                                   ` Lars Kurth
  0 siblings, 1 reply; 82+ messages in thread
From: George Dunlap @ 2017-05-11 17:20 UTC (permalink / raw)
  To: Volodymyr Babchuk
  Cc: Stefano Stabellini, Andrii Anisov, Ian Jackson, Dario Faggioli,
	Tim Deegan, Xen Devel, Julien Grall, Jan Beulich, Andrew Cooper,
	Wei Liu, Artem Mygaiev

On Thu, May 11, 2017 at 6:14 PM, Volodymyr Babchuk
<vlad.babchuk@gmail.com> wrote:
> Hi George,
>
> On 11 May 2017 at 19:35, George Dunlap <george.dunlap@citrix.com> wrote:
>> Even better would be to skip the module-loading step entirely, and just
>> compile proprietary code directly into your Xen binary.
>>
>> Both solutions, unfortunately, are illegal.*
> Look, I don't saying we want to produce closed-source modules or apps.
> We want to write open source code. Just imagine, that certain header
> files have some proprietary license (e.g. some device interface
> definition and this interface is IP of company which developed it).
> AFAIK, it can't be included into Xen distribution. I thought, that it
> can be included in some module with different (but still open source)
> license.  But if you say that it can't... Then I don't know. It is out
> of my competence. I'm not lawyer also.

I see.  That's good to know, but it doesn't change the legal aspect of
things. :-0

It used to be held that the information contained in headers --
constants, interface definitions, and so on -- weren't copyrightable;
in which case you could just include the header (or a modified version
of it) without any problems.  Unfortunately Oracle v Google may have
changed that.  But you'd have to ask a lawyer about that...

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: Modules support in Xen (WAS: Re: [ARM] Native application design and discussion (I hope))
  2017-05-11 17:20                                                 ` George Dunlap
@ 2017-05-11 17:53                                                   ` Lars Kurth
  0 siblings, 0 replies; 82+ messages in thread
From: Lars Kurth @ 2017-05-11 17:53 UTC (permalink / raw)
  To: George Dunlap
  Cc: Stefano Stabellini, Andrii Anisov, Volodymyr Babchuk, Tim Deegan,
	Dario Faggioli, Ian Jackson, Xen Devel, Julien Grall,
	Jan Beulich, Andrew Cooper, Wei Liu, Artem Mygaiev


> On 11 May 2017, at 18:20, George Dunlap <george.dunlap@citrix.com> wrote:
> 
> On Thu, May 11, 2017 at 6:14 PM, Volodymyr Babchuk
> <vlad.babchuk@gmail.com> wrote:
>> Hi George,
>> 
>> On 11 May 2017 at 19:35, George Dunlap <george.dunlap@citrix.com> wrote:
>>> Even better would be to skip the module-loading step entirely, and just
>>> compile proprietary code directly into your Xen binary.
>>> 
>>> Both solutions, unfortunately, are illegal.*
>> Look, I don't saying we want to produce closed-source modules or apps.
>> We want to write open source code. Just imagine, that certain header
>> files have some proprietary license (e.g. some device interface
>> definition and this interface is IP of company which developed it).
>> AFAIK, it can't be included into Xen distribution. I thought, that it
>> can be included in some module with different (but still open source)
>> license.  But if you say that it can't... Then I don't know. It is out
>> of my competence. I'm not lawyer also.
> 
> I see.  That's good to know, but it doesn't change the legal aspect of
> things. :-0

The legal issues would be similar to those with Linux Kernel Modules. For more information, see http://www.ifross.org/en/artikel/license-incompatibility-and-linux-kernel-modules-reloaded

Best Regards
Lars
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: Modules support in Xen (WAS: Re: [ARM] Native application design and discussion (I hope))
  2017-05-11 15:35                                           ` Modules support in Xen (WAS: Re: [ARM] Native application design and discussion (I hope)) Julien Grall
  2017-05-11 16:35                                             ` George Dunlap
  2017-05-11 17:14                                             ` George Dunlap
@ 2017-05-11 18:04                                             ` Stefano Stabellini
  2017-05-11 18:39                                               ` Volodymyr Babchuk
  2 siblings, 1 reply; 82+ messages in thread
From: Stefano Stabellini @ 2017-05-11 18:04 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, Andrii Anisov, Volodymyr Babchuk,
	Andrew Cooper, Dario Faggioli, Tim Deegan, George Dunlap,
	Xen Devel, Jan Beulich, Wei Liu, Ian Jackson, Artem Mygaiev

On 11/05/17 16:19, Volodymyr Babchuk wrote:
> Hi Stefano,
> 
> On 10 May 2017 at 21:24, Stefano Stabellini <sstabellini@kernel.org> wrote:
> > I just want to point out that the comparision with tasklets is not
> > helpful. Tasklets involve the idle vcpu, which we are trying to step away
> > from because it increases irq latency. Tasklets don't provide any
> > isolation. The context switch model for the idle vcpu and for EL0 apps
> > is different, thus it has a different cost.
> > 
> > I think we shouldn't mention tasklets in this thread any longer.
> Yep, you are right. Let's forget about tasklets and focus on EL0 apps.
> 
> I want summarize political (opposed to technical) part of the discussion.
> 
> We, here at EPAM, viewed EL0 apps primarily as a way to extend
> hypervisor. Because when it comes to embedded and automotive, there
> arise some ugly things, that are needed at hypervisor level:
> TEE mediators (OP-TEE is a good TEE, but for example there is TI's
> MSHIELD with deeply proprietary license), device drivers for vcopros,
> device drivers for cpufreq, and so on.
> Some of this things can't be included in hypervisor due to legal
> issues, some - because of code size or complexity. And we can't run
> them in stubdoms, because stubdoms are slow for certain use-cases, in
> some cases they are insecure, in some cases they just don't fit at
> all.

I can see that stubdoms can be slow if you require very low latencies.
Scheduler optimizations (giving stubdoms an higher priority) might be
able to improve on those.

But they are not insecure. Also, in what cases they don't fit at all?


> On other hand you consider EL0 apps as ideal host for emulators only.

Yes, EL0 apps are ideal for emulators, but not just emulators, anything
that runs deterministically after a guest trap or a timer event could be
a decent fit for an EL0 app. The issue is the interface between EL0 app
and Xen, but that can be discussed and designed in a way to satisfy all
parties.

But we need to start from somewhere. I suggest you write a simple design
document to explain the use-case for EL0 apps and their interfaces to
the rest of the system. We can take the discussion from there. We might
be able to reach a consensus on a design that works for everybody.

We need a concrete proposal to start from though.


> I can see your point, because XEN was always viewed as hypervisor for
> servers.
>
> But servers have different requirements in comparison to embedded
> applications. Traditional servers does not use hardware accelerated
> video decoders, they don't need to disable cpu's or scale frequencies
> to preserve energy (okay, they need to, but it is not as pressing, as
> on battery-powered device), there almost no proprietary code (or even
> proprietary blobs, argh!).
> Looks like virtualization on embedded is the next big thing. Linux
> kernel was able to satisfy both parties. I hope that XEN can do the
> same.

I think that this has not much to do with embedded vs server; it's more
about the need of supporting new, more complex, hardware and firmware
interfaces.


> So, going back to EL0 apps. Honestly, I'd prefer not to use them as
> extension mechanism. Yes, they provide isolation, but interfacing with
> them will be painful. Probably we can leave them to emulators only
> (but as I can see, PL011 emulator is going to be merged right into
> hypervisor. Will be there need for other emulators?).
> What I really want to ask: what do you thing about old good modules
> like ones in linux kernel? There will be no isolation, this is bad.
> But:
>  - you can load proprietary modules if you want to
>  - they are fast
>  - you can interface with them in a nativest way possible: just call a
> function

Proprietary modules are a legal minefield. They are best avoided even in
Linux. Fortunately, both EL0 apps and stubdoms could be proprietary.
Thus, especially if you have a requirement for running proprietary
code, it is key to do EL0 apps and/or stubdoms in Xen on ARM.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: Modules support in Xen (WAS: Re: [ARM] Native application design and discussion (I hope))
  2017-05-11 17:14                                             ` George Dunlap
  2017-05-11 17:16                                               ` George Dunlap
@ 2017-05-11 18:13                                               ` Volodymyr Babchuk
  2017-05-12 11:48                                                 ` George Dunlap
  1 sibling, 1 reply; 82+ messages in thread
From: Volodymyr Babchuk @ 2017-05-11 18:13 UTC (permalink / raw)
  To: George Dunlap
  Cc: Tim Deegan, Stefano Stabellini, Andrii Anisov, Andrew Cooper,
	Dario Faggioli, Ian Jackson, Xen Devel, Julien Grall,
	Jan Beulich, Wei Liu, Artem Mygaiev

George,

On 11 May 2017 at 20:14, George Dunlap <george.dunlap@citrix.com> wrote:
>>> We, here at EPAM, viewed EL0 apps primarily as a way to extend
>>> hypervisor. Because when it comes to embedded and automotive, there
>>> arise some ugly things, that are needed at hypervisor level:
>>> TEE mediators (OP-TEE is a good TEE, but for example there is TI's
>>> MSHIELD with deeply proprietary license),
>
> If you're going to use a deeply proprietary TEE mediator, then you need
> to find yourself a deeply proprietary hypervisor to go along with it --
> either one you pay a license fee for or one you develop yourself.  It
> would almost certainly be cheaper to improve the open-source one than to
> do either of those.
> Or you can try mixing the two and see what happens; but that doesn't
> seem like a very sound legal strategy to me.
Okay, point taken.

>>> ...some [things can't be included in hypervisor] because of code
>>> size or complexity.
>
> Sorry, just to be clear: below you mentioned modules as a solution, and
> given the context this would be included.  So can you expand on what you
> mean that there are things that 1) can't be included in the hypervisor
> because of code size or complexity, but for which 2) loadable modules
> would be a suitable solution?
Well... Device drives? Emulators? For example, if I will write bunch
of good and neat GPL drivers for some SoC and I'll promise to maintain
them, will you include them into upstream?
Or I will write emulator for some arcane device, will it be merged
into upstream?
Real case: I will write OP-TEE mediator for one client and Google
Trusty mediator for other client. Every will have, say, 2,000 lines of
code. Are there changes, that they both will be merged into
hypervisor?

>>> And we can't run
>>> them in stubdoms, because stubdoms are slow for certain use-cases, in
>>> some cases they are insecure, in some cases they just don't fit at
>>> all.
>>> On other hand you consider EL0 apps as ideal host for emulators only.
>>> I can see your point, because XEN was always viewed as hypervisor for
>>> servers.
>>> But servers have different requirements in comparison to embedded
>>> applications. Traditional servers does not use hardware accelerated
>>> video decoders, they don't need to disable cpu's or scale frequencies
>>> to preserve energy (okay, they need to, but it is not as pressing, as
>>> on battery-powered device), there almost no proprietary code (or even
>>> proprietary blobs, argh!).
>>> Looks like virtualization on embedded is the next big thing. Linux
>>> kernel was able to satisfy both parties. I hope that XEN can do the
>>> same.

> For many of these, there are probably technical solutions that we could
> come up with that would allow proprietary content (such as video
> decoders &c) that would have suitable performance without needing access
> to the Xen address space.
Yes, we probably can. But any such solution will require some changes
in hypervisor to accommodate it. So, what we are currently doing? We
are discussing such solutions.

> Maybe I'm just not familiar with things, but it's hard for me to imagine
> why you'd need proprietary blobs to disable cpus or scale frequency.
> Are these really such complex activities that it's worth investing
> thousands of hours of developer work into developing proprietary
> solutions that you license?
Okay, I don't know no platform where you need proprietary blob to
scale frequency. And I hope, I never will encounter one.
But I can imagine it: some firmware binary that needs to be uploaded
into PMIC. Can we store this firmware in the hypervisor? I don't know.
I'm not a lawyer.

> Loading proprietary modules into Linux is as illegal as it would be in
> Xen.  Many people obviously do it anyway, but you are really putting
> yourself at a risk of meeting a guy like Patrick McHardy[1], a private
> individual with copyright on the Linux kernel who by some estimates has
> made almost EUR 2m in the last few years suing companies for GPL violations.
Okay, I didn't know that it is illegal to load non-gpl modules into
Linux kernel. Thank you for sharing this knowledge. But now I'm
curios, why there are EXPORT_SYMBOL_GPL() and plain EXPORT_SYMBOL()? I
though it was intended to separate GPL and non-GPL code.
BTW, "non-GPL code" does not mean "closed-source code". It can be
LGPL, MIT, BSD, or Copyleft license. I can imagine proprietary license
which is compatible with BSD, but is incompatible with GPLv2.

Anyways, I have taken your point. No proprietary code in modules. What
about other parts of discussion? Are you against loadable modules in
any fashion? What about native apps?

-- 
WBR Volodymyr Babchuk aka lorc [+380976646013]
mailto: vlad.babchuk@gmail.com

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: Modules support in Xen (WAS: Re: [ARM] Native application design and discussion (I hope))
  2017-05-11 18:04                                             ` Stefano Stabellini
@ 2017-05-11 18:39                                               ` Volodymyr Babchuk
  0 siblings, 0 replies; 82+ messages in thread
From: Volodymyr Babchuk @ 2017-05-11 18:39 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Andrii Anisov, Tim Deegan, Dario Faggioli, Ian Jackson,
	George Dunlap, Xen Devel, Julien Grall, Jan Beulich,
	Andrew Cooper, Wei Liu, Artem Mygaiev

Stefano,

>> We, here at EPAM, viewed EL0 apps primarily as a way to extend
>> hypervisor. Because when it comes to embedded and automotive, there
>> arise some ugly things, that are needed at hypervisor level:
>> TEE mediators (OP-TEE is a good TEE, but for example there is TI's
>> MSHIELD with deeply proprietary license), device drivers for vcopros,
>> device drivers for cpufreq, and so on.
>> Some of this things can't be included in hypervisor due to legal
>> issues, some - because of code size or complexity. And we can't run
>> them in stubdoms, because stubdoms are slow for certain use-cases, in
>> some cases they are insecure, in some cases they just don't fit at
>> all.
> I can see that stubdoms can be slow if you require very low latencies.
> Scheduler optimizations (giving stubdoms an higher priority) might be
> able to improve on those.
Yeah, when I wrote "slow" I actually meant "have high latency". Thank
you for correction. Yep, they can be improved to a certain limit.

> But they are not insecure. Also, in what cases they don't fit at all?
About security... I had in mind that case, that we discussed on
community call: secure boot. Say, we put OP-TEE mediator into stubdom.
But as it is sensitive thing, we need to a) check it's signature, b)
create this stubdom before dom0 construction. From other points it as
secure as any other domain.

Regarding "don't fit at all": the virtual coprocessors use-case. I
don't see how we can put vGPU driver into a stubdomain.


>> On other hand you consider EL0 apps as ideal host for emulators only.
>
> Yes, EL0 apps are ideal for emulators, but not just emulators, anything
> that runs deterministically after a guest trap or a timer event could be
> a decent fit for an EL0 app. The issue is the interface between EL0 app
> and Xen, but that can be discussed and designed in a way to satisfy all
> parties.
Okay, we need to discuss it, but looks like this definition covers all
our use-cases.

> But we need to start from somewhere. I suggest you write a simple design
> document to explain the use-case for EL0 apps and their interfaces to
> the rest of the system. We can take the discussion from there. We might
> be able to reach a consensus on a design that works for everybody.
>
> We need a concrete proposal to start from though.
Yes, I agree. Now, when we have discussed this in ML, I have better
vision on this topic. I'll try to present some design document in the
next week.

>> I can see your point, because XEN was always viewed as hypervisor for
>> servers.
>>
>> But servers have different requirements in comparison to embedded
>> applications. Traditional servers does not use hardware accelerated
>> video decoders, they don't need to disable cpu's or scale frequencies
>> to preserve energy (okay, they need to, but it is not as pressing, as
>> on battery-powered device), there almost no proprietary code (or even
>> proprietary blobs, argh!).
>> Looks like virtualization on embedded is the next big thing. Linux
>> kernel was able to satisfy both parties. I hope that XEN can do the
>> same.
> I think that this has not much to do with embedded vs server; it's more
> about the need of supporting new, more complex, hardware and firmware
> interfaces.
Yep, this is more precise.

>
>> So, going back to EL0 apps. Honestly, I'd prefer not to use them as
>> extension mechanism. Yes, they provide isolation, but interfacing with
>> them will be painful. Probably we can leave them to emulators only
>> (but as I can see, PL011 emulator is going to be merged right into
>> hypervisor. Will be there need for other emulators?).
>> What I really want to ask: what do you thing about old good modules
>> like ones in linux kernel? There will be no isolation, this is bad.
>> But:
>>  - you can load proprietary modules if you want to
>>  - they are fast
>>  - you can interface with them in a nativest way possible: just call a
>> function
>
> Proprietary modules are a legal minefield. They are best avoided even in
> Linux. Fortunately, both EL0 apps and stubdoms could be proprietary.
> Thus, especially if you have a requirement for running proprietary
> code, it is key to do EL0 apps and/or stubdoms in Xen on ARM.
As you can see, we already discussed this :) Under "proprietary" I
have meant something like "open source, but not compatible with GPL",
and not it-is-so-secret-that-you-need-to-sign-100-NDAs. Anyways, this
is a hard topic and we need to consider it carefully.


-- 
WBR Volodymyr Babchuk aka lorc [+380976646013]
mailto: vlad.babchuk@gmail.com

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: Modules support in Xen (WAS: Re: [ARM] Native application design and discussion (I hope))
  2017-05-11 18:13                                               ` Volodymyr Babchuk
@ 2017-05-12 11:48                                                 ` George Dunlap
  2017-05-12 18:43                                                   ` Stefano Stabellini
  0 siblings, 1 reply; 82+ messages in thread
From: George Dunlap @ 2017-05-12 11:48 UTC (permalink / raw)
  To: Volodymyr Babchuk
  Cc: Stefano Stabellini, Andrii Anisov, Ian Jackson, Dario Faggioli,
	Tim Deegan, Xen Devel, Julien Grall, Jan Beulich, Andrew Cooper,
	Wei Liu, Artem Mygaiev

[reordering slightly to make the response easier]

On Thu, May 11, 2017 at 7:13 PM, Volodymyr Babchuk
<vlad.babchuk@gmail.com> wrote:
>> Maybe I'm just not familiar with things, but it's hard for me to imagine
>> why you'd need proprietary blobs to disable cpus or scale frequency.
>> Are these really such complex activities that it's worth investing
>> thousands of hours of developer work into developing proprietary
>> solutions that you license?
> Okay, I don't know no platform where you need proprietary blob to
> scale frequency. And I hope, I never will encounter one.
> But I can imagine it: some firmware binary that needs to be uploaded
> into PMIC. Can we store this firmware in the hypervisor? I don't know.
> I'm not a lawyer.

On x86, we do microcode updates, which are (as I understand it) binary
blobs that get passed through the hypervisor to the cpus.  This blob
isn't executed by Xen, so it doesn't seem like you would be able to
argue that passing a binary blob through the hypervisor creates a
derivative / combined work.  In that case the blobs are stored as
files on disk and passed to Xen at boot time (via grub), not compiled
into the Xen binary.  Whether compiling such things into the binary
constitutes a "derived work" is something you'd probably better ask a
lawyer. :-)

If configuring the bootloader to pass extra files to Xen isn't
suitable on ARM for some reason we can probably come up with some
other way of packaging things together which honors the GPL suitably.

>>>> ...some [things can't be included in hypervisor] because of code
>>>> size or complexity.
>>
>> Sorry, just to be clear: below you mentioned modules as a solution, and
>> given the context this would be included.  So can you expand on what you
>> mean that there are things that 1) can't be included in the hypervisor
>> because of code size or complexity, but for which 2) loadable modules
>> would be a suitable solution?
> Well... Device drives? Emulators? For example, if I will write bunch
> of good and neat GPL drivers for some SoC and I'll promise to maintain
> them, will you include them into upstream?
> Or I will write emulator for some arcane device, will it be merged
> into upstream?
> Real case: I will write OP-TEE mediator for one client and Google
> Trusty mediator for other client. Every will have, say, 2,000 lines of
> code. Are there changes, that they both will be merged into
> hypervisor?

[snip]

> Anyways, I have taken your point. No proprietary code in modules. What
> about other parts of discussion? Are you against loadable modules in
> any fashion? What about native apps?

There are several different questions we're getting slightly mixed up here:
1. Should some bit of functionality (like a TEE mediator or device
emulation) live in the xen.git tree?
2. Should that functionality run in the hypervisor address space?
3. Should that functionality be loaded via a loadable module?
4. What place to proprietary components have in a Xen system?

Let me address #4 first.  There are lots of examples of proprietary
*components* of Xen systems.  XenClient used to have a proprietary
device model (a process running in dom0) for helping virtualize
graphics cards; a number of companies have proprietary drivers for
memory sharing or VM introspection.  But all of those are outside of
the Xen address space, interacting with Xen via hypercalls.  As long
as "native apps" (I think we probably need a better name here) are
analogous to a devicemodel stubdomain -- in a separate address space
and acting through a well-defined hypercal interface -- I don't have
any objection to having proprietary ones.

Regarding #1-2, let me first say that how specific it is to a
particular platform or use case isn't actually important to any of
these questions.  The considerations are partly technical, and partly
practical -- how much benefit does it give to the project as a whole
vs the cost?

For a long time there were only two functional schedulers in Xen --
the Credit scheduler (now called "credit1" to distinguish it from
"credit2"), and the ARINC653 scheduler, which is a real-time scheduler
targeted at a very specific use case and industry.  As far as I know
there is only one user.  But it was checked into the Xen tree because
it would obviously be useful to them (benefit) and almost no impact on
anyone else (cost); and it ran inside the hypervisor because that's
the only place to run a scheduler.

So given your examples, I see no reason not to have several
implementations of different mediators or emulated devices in tree, or
in a XenProject-managed git repo (like mini-os.git).  I don't know the
particulars about mediators or the devices you have in mind, but if
you can show technical reasons why they need to be run in the
hypervisor rather than somewhere else (for performance or security
sake, for instance), there's no reason in principle not to add them to
the hypervisor code; and if they're in the hypervisor, then they
should be in xen.git.

Regarding modules (#3): The problem that loadable modules were
primarily introduced to solve in Linux wasn't "How to deal with
proprietary drivers", or even "how to deal with out-of-tree drivers".
The problem was, "How to we allow software providers to 1) have a
single kernel binary, which 2) has drivers for all the different
systems on which it needs to run, but 3) not take a massive amount of
memory or space on systems, given that any given system will not need
the vast majority of drivers?"

Suppose hypothetically that we decided that the mediators you describe
need to run in the hypervisor.  As long as Kconfig is sufficient for
people to enable or disable what they need to make a functional and
efficient system, then there's no need to introduce modules.  If we
reached a point where people wanted a single binary that could do
either or OP-TEE mediator or the Google mediator, or both, or neither,
but didn't to include all of them in the core binary (perhaps because
of memory constraints), then loadable modules would be a good solution
to consider.  But either way, if we decided they should run in the
hypervisor, then all things being equal it would still be better to
have both implementations in-tree.

There are a couple of reasons for the push-back on loadable modules.
The first is the extra complication and infrastructure it adds.  But
the second is that people have a strong temptation to use them for
out-of-tree and proprietary code, both of which we'd like to avoid if
possible.  If there comes a point in time where loadable modules are
the only reasonable solution to the problem, I will support having
them; but until that time I will look for other solutions if I can.

Does that make sense?

BTW I've been saying "I" throughout this response; hopefully that
makes it clear that I'm mainly speaking for myself here.

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: Modules support in Xen (WAS: Re: [ARM] Native application design and discussion (I hope))
  2017-05-12 11:48                                                 ` George Dunlap
@ 2017-05-12 18:43                                                   ` Stefano Stabellini
  2017-05-12 19:04                                                     ` Volodymyr Babchuk
  0 siblings, 1 reply; 82+ messages in thread
From: Stefano Stabellini @ 2017-05-12 18:43 UTC (permalink / raw)
  To: George Dunlap
  Cc: Stefano Stabellini, Andrii Anisov, Volodymyr Babchuk,
	Andrew Cooper, Dario Faggioli, Tim Deegan, Xen Devel,
	Julien Grall, Jan Beulich, Wei Liu, Ian Jackson, Artem Mygaiev

On Fri, 12 May 2017, George Dunlap wrote:
> So given your examples, I see no reason not to have several
> implementations of different mediators or emulated devices in tree, or
> in a XenProject-managed git repo (like mini-os.git).  I don't know the
> particulars about mediators or the devices you have in mind, but if
> you can show technical reasons why they need to be run in the
> hypervisor rather than somewhere else (for performance or security
> sake, for instance), there's no reason in principle not to add them to
> the hypervisor code; and if they're in the hypervisor, then they
> should be in xen.git.

On the topic of the technical reasons for being out of the hypervisor
(EL0 app or stubdom), I'll spend a couple of words on security.

How large are these components? If they increase the hypervisor code
size too much, it's best if they are run elsewhere.

What is their guest-exposed attack surface? If it's large it's best to
run them out of the hypervisor.

My gut feeling is that both these points might be a problem.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: Modules support in Xen (WAS: Re: [ARM] Native application design and discussion (I hope))
  2017-05-12 18:43                                                   ` Stefano Stabellini
@ 2017-05-12 19:04                                                     ` Volodymyr Babchuk
  2017-05-15 11:21                                                       ` George Dunlap
  0 siblings, 1 reply; 82+ messages in thread
From: Volodymyr Babchuk @ 2017-05-12 19:04 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Andrii Anisov, Tim Deegan, Dario Faggioli, Ian Jackson,
	George Dunlap, Xen Devel, Julien Grall, Jan Beulich,
	Andrew Cooper, Wei Liu, Artem Mygaiev

Stefano,

On 12 May 2017 at 21:43, Stefano Stabellini <sstabellini@kernel.org> wrote:

> On the topic of the technical reasons for being out of the hypervisor
> (EL0 app or stubdom), I'll spend a couple of words on security.
>
> How large are these components? If they increase the hypervisor code
> size too much, it's best if they are run elsewhere.
I'm talking about OP-TEE now.
"Large" as "large code base"? I have shared my PoC driver. Here it is
[1]. My expectation: 1,000-2,000 lines of code for mediator + some
OP-TEE headers.

> What is their guest-exposed attack surface? If it's large it's best to
> run them out of the hypervisor.
OP-TEE mediator will trap SMC calls and parse parameter buffers
according to OP-TEE ABI specification. ABI is very simple, so I can't
say that there will be attack surface.

> My gut feeling is that both these points might be a problem.
The real problem, that is needs the same privileges, as hypervisor
itself. I wrote this in parallel thread:
it needs to pin guest pages (to ensure that page will be not
transferred to another domain, while OP-TEE uses it), it needs to map
guest page so it can do IPA->PA translation in a command buffer, it
needs to execute SMCs (but we can limit it there, thanks to SMCCC),
probably it will need to inject vIRQ to guest to wake it up.

[1] https://github.com/lorc/xen/tree/staging-4.7/xen/arch/arm/optee
-- 
WBR Volodymyr Babchuk aka lorc [+380976646013]
mailto: vlad.babchuk@gmail.com

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: Modules support in Xen (WAS: Re: [ARM] Native application design and discussion (I hope))
  2017-05-12 19:04                                                     ` Volodymyr Babchuk
@ 2017-05-15 11:21                                                       ` George Dunlap
  2017-05-15 17:32                                                         ` Stefano Stabellini
  0 siblings, 1 reply; 82+ messages in thread
From: George Dunlap @ 2017-05-15 11:21 UTC (permalink / raw)
  To: Volodymyr Babchuk
  Cc: Stefano Stabellini, Wei Liu, Xen Devel, Julien Grall,
	Andrii Anisov, Artem Mygaiev

[Reducing CC list now that we're off the topic of modules]

On Fri, May 12, 2017 at 8:04 PM, Volodymyr Babchuk
<vlad.babchuk@gmail.com> wrote:
> Stefano,
>
> On 12 May 2017 at 21:43, Stefano Stabellini <sstabellini@kernel.org> wrote:
>
>> On the topic of the technical reasons for being out of the hypervisor
>> (EL0 app or stubdom), I'll spend a couple of words on security.
>>
>> How large are these components? If they increase the hypervisor code
>> size too much, it's best if they are run elsewhere.
> I'm talking about OP-TEE now.
> "Large" as "large code base"? I have shared my PoC driver. Here it is
> [1]. My expectation: 1,000-2,000 lines of code for mediator + some
> OP-TEE headers.
>
>> What is their guest-exposed attack surface? If it's large it's best to
>> run them out of the hypervisor.
> OP-TEE mediator will trap SMC calls and parse parameter buffers
> according to OP-TEE ABI specification. ABI is very simple, so I can't
> say that there will be attack surface.
>
>> My gut feeling is that both these points might be a problem.
> The real problem, that is needs the same privileges, as hypervisor
> itself. I wrote this in parallel thread:
> it needs to pin guest pages (to ensure that page will be not
> transferred to another domain, while OP-TEE uses it), it needs to map
> guest page so it can do IPA->PA translation in a command buffer, it
> needs to execute SMCs (but we can limit it there, thanks to SMCCC),
> probably it will need to inject vIRQ to guest to wake it up.

Xen is different than Linux in that it attempts to take a "practical
microkernel" approach.  "Microkernel" meaning that we prefer to do as
much *outside* of the hypervisor as possible.  "Practical" meaning, if
running it outside the hypervisor causes too much complexity or too
much performance overhead, then we don't stand on ideology but allow
things to run inside of Xen.

With the exception of SMCs (which I don't know anything about), device
models (e.g., QEMU) already have  of this functionality on x86,
running from dom0 or from a stubdomain.

Do OP-TEE mediators require a lot of performance?  I.e., do the
operations happen very frequently and/or are they particularly
latency-sensitive?  If not then it might be worth implementing it as a
dom0 device model first, and then exploring higher-performing options
if that turns out to be too slow.

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: Modules support in Xen (WAS: Re: [ARM] Native application design and discussion (I hope))
  2017-05-15 11:21                                                       ` George Dunlap
@ 2017-05-15 17:32                                                         ` Stefano Stabellini
  0 siblings, 0 replies; 82+ messages in thread
From: Stefano Stabellini @ 2017-05-15 17:32 UTC (permalink / raw)
  To: George Dunlap
  Cc: Stefano Stabellini, Wei Liu, Volodymyr Babchuk, Xen Devel,
	Julien Grall, Andrii Anisov, Artem Mygaiev

On Mon, 15 May 2017, George Dunlap wrote:
> [Reducing CC list now that we're off the topic of modules]
> 
> On Fri, May 12, 2017 at 8:04 PM, Volodymyr Babchuk
> <vlad.babchuk@gmail.com> wrote:
> > Stefano,
> >
> > On 12 May 2017 at 21:43, Stefano Stabellini <sstabellini@kernel.org> wrote:
> >
> >> On the topic of the technical reasons for being out of the hypervisor
> >> (EL0 app or stubdom), I'll spend a couple of words on security.
> >>
> >> How large are these components? If they increase the hypervisor code
> >> size too much, it's best if they are run elsewhere.
> > I'm talking about OP-TEE now.
> > "Large" as "large code base"? I have shared my PoC driver. Here it is
> > [1]. My expectation: 1,000-2,000 lines of code for mediator + some
> > OP-TEE headers.
> >
> >> What is their guest-exposed attack surface? If it's large it's best to
> >> run them out of the hypervisor.
> > OP-TEE mediator will trap SMC calls and parse parameter buffers
> > according to OP-TEE ABI specification. ABI is very simple, so I can't
> > say that there will be attack surface.
> >
> >> My gut feeling is that both these points might be a problem.
> > The real problem, that is needs the same privileges, as hypervisor
> > itself. I wrote this in parallel thread:
> > it needs to pin guest pages (to ensure that page will be not
> > transferred to another domain, while OP-TEE uses it), it needs to map
> > guest page so it can do IPA->PA translation in a command buffer, it
> > needs to execute SMCs (but we can limit it there, thanks to SMCCC),
> > probably it will need to inject vIRQ to guest to wake it up.
> 
> Xen is different than Linux in that it attempts to take a "practical
> microkernel" approach.  "Microkernel" meaning that we prefer to do as
> much *outside* of the hypervisor as possible.  "Practical" meaning, if
> running it outside the hypervisor causes too much complexity or too
> much performance overhead, then we don't stand on ideology but allow
> things to run inside of Xen.
> 
> With the exception of SMCs (which I don't know anything about), device
> models (e.g., QEMU) already have  of this functionality on x86,
> running from dom0 or from a stubdomain.
> 
> Do OP-TEE mediators require a lot of performance?  I.e., do the
> operations happen very frequently and/or are they particularly
> latency-sensitive?  If not then it might be worth implementing it as a
> dom0 device model first, and then exploring higher-performing options
> if that turns out to be too slow.

The whole discussion started from the need for something that has lower
latency, and more importantly, more deterministic latency, than a dom0
device model.

Any use-cases with even the weakest of real-time requirements won't be
satisfied by a dom0 device model, where the max latency is basically
infinite.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [ARM] Native application design and discussion (I hope)
  2017-05-15 12:51 ` George Dunlap
@ 2017-05-15 17:35   ` Stefano Stabellini
  0 siblings, 0 replies; 82+ messages in thread
From: Stefano Stabellini @ 2017-05-15 17:35 UTC (permalink / raw)
  To: George Dunlap
  Cc: Stefano Stabellini, Andrii Anisov, Volodymyr Babchuk, Tim Deegan,
	Dario Faggioli, Ian Jackson, Xen Devel, Julien Grall,
	Jan Beulich, Andrew Cooper, Wei Liu, Artem Mygaiev

On Mon, 15 May 2017, George Dunlap wrote:
> On Fri, May 12, 2017 at 7:47 PM, Volodymyr Babchuk
> <vlad.babchuk@gmail.com> wrote:
> >> Regarding modules (#3): The problem that loadable modules were
> >> primarily introduced to solve in Linux wasn't "How to deal with
> >> proprietary drivers", or even "how to deal with out-of-tree drivers".
> >> The problem was, "How to we allow software providers to 1) have a
> >> single kernel binary, which 2) has drivers for all the different
> >> systems on which it needs to run, but 3) not take a massive amount of
> >> memory or space on systems, given that any given system will not need
> >> the vast majority of drivers?"
> >>
> >> Suppose hypothetically that we decided that the mediators you describe
> >> need to run in the hypervisor.  As long as Kconfig is sufficient for
> >> people to enable or disable what they need to make a functional and
> >> efficient system, then there's no need to introduce modules.  If we
> >> reached a point where people wanted a single binary that could do
> >> either or OP-TEE mediator or the Google mediator, or both, or neither,
> >> but didn't to include all of them in the core binary (perhaps because
> >> of memory constraints), then loadable modules would be a good solution
> >> to consider.  But either way, if we decided they should run in the
> >> hypervisor, then all things being equal it would still be better to
> >> have both implementations in-tree.
> >>
> >> There are a couple of reasons for the push-back on loadable modules.
> >> The first is the extra complication and infrastructure it adds.  But
> >> the second is that people have a strong temptation to use them for
> >> out-of-tree and proprietary code, both of which we'd like to avoid if
> >> possible.  If there comes a point in time where loadable modules are
> >> the only reasonable solution to the problem, I will support having
> >> them; but until that time I will look for other solutions if I can.
> >>
> >> Does that make sense?
> > Yes, thank you. Legal questions is not my best side. Looks like I was
> > too quick, when proposed modules as a solution to our needs. Sorry, I
> > had to investigate this topic further before talking about it.
> >
> > So, let's get back to native apps. We had internal discussion about
> > possible use cases and want to share our conclusions.
> >
> > 1. Emulators. As Stefano pointed, this is ideal use case for small,
> > fast native apps that are accounted in a calling vcpu time slice.
> >
> > 2. Virtual coprocessor backend/driver. The part that does actual job:
> > makes coprocessor to save or restore context. It is also small,
> > straightforward app, but it should have access to a real HW.
> >
> > 3. TEE mediators. They need so much privileges, so there actually are
> > no sense in putting them into native apps. For example, to work
> > properly OP-TEE mediator needs to: pin guest pages, map guest pages to
> > perform IPA->MPA translation, send vIRQs to guests, issue real SMCs.
> 
> As I think I've said elsewhere, apart from "issue real SMCs", all of
> that functionality is already available to device models running in
> domain 0, in the sense that there are interfaces which cause Xen to
> make those things happen: when the devicemodel maps a page, that
> increases the refcount and effectively pins it; the devicemodel
> accesses *all* guest pages in terms of guest memory addresses, but (I
> believe) can ask Xen for a p->m translation of a particular page in
> memory; and it can set vIRQs pending to the guest.  It seems likely
> that a suitable hypervisor interface could be made to expose SMC
> functionality to device models as well.

I'll repeat here for convenience. The discussion started from the need
for something that has lower latency, and more importantly, more
deterministic latency, than a dom0 device model. A dom0 device model
cannot guarantee even the weakest of latency requirements.

On ARM there are no dom0 device models today, and given their critical
limitations, I prefer to introduce something different from the start.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [ARM] Native application design and discussion (I hope)
  2017-05-12 18:47 Volodymyr Babchuk
  2017-05-15 12:51 ` George Dunlap
@ 2017-05-15 13:54 ` Andrii Anisov
  1 sibling, 0 replies; 82+ messages in thread
From: Andrii Anisov @ 2017-05-15 13:54 UTC (permalink / raw)
  To: Volodymyr Babchuk, George Dunlap
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Dario Faggioli,
	Tim Deegan, Xen Devel, Julien Grall, Jan Beulich, Ian Jackson,
	Artem Mygaiev


On 12.05.17 21:47, Volodymyr Babchuk wrote:
> vcoproc driver should be able to work with real HW, probably it will
> handle real IRQs from device, we need one instance of driver per
> device, not per domain. Andrii can correct me, but vcoproc framework
> is not tied to vcpus, so it can work in context of any vcpu. Thus, it
> will be accounted for that vcpu, that happened to execute at current
> moment. Probably, this is not fair.
I guess the mmio access emulation should be accounted per domain's vcpu. 
Context switching - per idle vcpu (xen itself).
Should it be two different "native apps"?

> Can we run vcoproc driver in a stubdomain? Probably yes, if we can
> guarantee latency (as if in real time system). Just one example: 60FPS
> displays are standard at this time. 1/60 gives us 16ms to deliver
> frame to a display. 16ms should be enough to render next frame,
> compose it, deliver to a display controller. Actually it is plenty of
> time (most of the time). Now imagine that we want to share one GPU
> between two domains. Actual render tasks can be very fast, lets say 1
> ms for each domain. But to render both of them, we need to switch GPU
> context at least two times (one time to render Guest A task, one time
> to render Guest B task). This gives us 8ms between switches. If we
> will put vcoproc driver to a stubdomain, we will be at mercy of vCPU
> scheduler. It is good scheduler, but I don't know if it suits for this
> use case. 8ms is an upper bound. If there will be three domains
> sharing GPU, limit will be 6 ms. And, actually, one slice per domain
> is not enough, because domain may be willing to render own portion
> later. So, 1 ms will be more realistic requirement. I mean, that
> stubdom with coproc driver should be scheduled every 1ms not matter of
> what.
> With native apps (or some light stubdomain) which will be scheduled
> right when it is needed - this is much easier task.
>
> At least, this is my vision of vcoproc driver problem. Andrii can
> correct me, if I'm terribly wrong.
All above is correct enough.

-- 

*Andrii Anisov*



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [ARM] Native application design and discussion (I hope)
  2017-05-12 18:47 Volodymyr Babchuk
@ 2017-05-15 12:51 ` George Dunlap
  2017-05-15 17:35   ` Stefano Stabellini
  2017-05-15 13:54 ` Andrii Anisov
  1 sibling, 1 reply; 82+ messages in thread
From: George Dunlap @ 2017-05-15 12:51 UTC (permalink / raw)
  To: Volodymyr Babchuk
  Cc: Stefano Stabellini, Andrii Anisov, Tim Deegan, Dario Faggioli,
	Ian Jackson, Xen Devel, Julien Grall, Jan Beulich, Andrew Cooper,
	Wei Liu, Artem Mygaiev

On Fri, May 12, 2017 at 7:47 PM, Volodymyr Babchuk
<vlad.babchuk@gmail.com> wrote:
>> Regarding modules (#3): The problem that loadable modules were
>> primarily introduced to solve in Linux wasn't "How to deal with
>> proprietary drivers", or even "how to deal with out-of-tree drivers".
>> The problem was, "How to we allow software providers to 1) have a
>> single kernel binary, which 2) has drivers for all the different
>> systems on which it needs to run, but 3) not take a massive amount of
>> memory or space on systems, given that any given system will not need
>> the vast majority of drivers?"
>>
>> Suppose hypothetically that we decided that the mediators you describe
>> need to run in the hypervisor.  As long as Kconfig is sufficient for
>> people to enable or disable what they need to make a functional and
>> efficient system, then there's no need to introduce modules.  If we
>> reached a point where people wanted a single binary that could do
>> either or OP-TEE mediator or the Google mediator, or both, or neither,
>> but didn't to include all of them in the core binary (perhaps because
>> of memory constraints), then loadable modules would be a good solution
>> to consider.  But either way, if we decided they should run in the
>> hypervisor, then all things being equal it would still be better to
>> have both implementations in-tree.
>>
>> There are a couple of reasons for the push-back on loadable modules.
>> The first is the extra complication and infrastructure it adds.  But
>> the second is that people have a strong temptation to use them for
>> out-of-tree and proprietary code, both of which we'd like to avoid if
>> possible.  If there comes a point in time where loadable modules are
>> the only reasonable solution to the problem, I will support having
>> them; but until that time I will look for other solutions if I can.
>>
>> Does that make sense?
> Yes, thank you. Legal questions is not my best side. Looks like I was
> too quick, when proposed modules as a solution to our needs. Sorry, I
> had to investigate this topic further before talking about it.
>
> So, let's get back to native apps. We had internal discussion about
> possible use cases and want to share our conclusions.
>
> 1. Emulators. As Stefano pointed, this is ideal use case for small,
> fast native apps that are accounted in a calling vcpu time slice.
>
> 2. Virtual coprocessor backend/driver. The part that does actual job:
> makes coprocessor to save or restore context. It is also small,
> straightforward app, but it should have access to a real HW.
>
> 3. TEE mediators. They need so much privileges, so there actually are
> no sense in putting them into native apps. For example, to work
> properly OP-TEE mediator needs to: pin guest pages, map guest pages to
> perform IPA->MPA translation, send vIRQs to guests, issue real SMCs.

As I think I've said elsewhere, apart from "issue real SMCs", all of
that functionality is already available to device models running in
domain 0, in the sense that there are interfaces which cause Xen to
make those things happen: when the devicemodel maps a page, that
increases the refcount and effectively pins it; the devicemodel
accesses *all* guest pages in terms of guest memory addresses, but (I
believe) can ask Xen for a p->m translation of a particular page in
memory; and it can set vIRQs pending to the guest.  It seems likely
that a suitable hypervisor interface could be made to expose SMC
functionality to device models as well.

(Unless I've misunderstood something somewhere.)

Running it outside of dom0 could potentially be a security advantage
if you don't want to trust dom0 100%.

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [ARM] Native application design and discussion (I hope)
@ 2017-05-12 18:47 Volodymyr Babchuk
  2017-05-15 12:51 ` George Dunlap
  2017-05-15 13:54 ` Andrii Anisov
  0 siblings, 2 replies; 82+ messages in thread
From: Volodymyr Babchuk @ 2017-05-12 18:47 UTC (permalink / raw)
  To: George Dunlap
  Cc: Stefano Stabellini, Andrii Anisov, Ian Jackson, Dario Faggioli,
	Tim Deegan, Xen Devel, Julien Grall, Jan Beulich, Andrew Cooper,
	Wei Liu, Artem Mygaiev

Hi George,

On 12 May 2017 at 14:48, George Dunlap <george.dunlap@citrix.com> wrote:
> [reordering slightly to make the response easier]

>> Okay, I don't know no platform where you need proprietary blob to
>> scale frequency. And I hope, I never will encounter one.
>> But I can imagine it: some firmware binary that needs to be uploaded
>> into PMIC. Can we store this firmware in the hypervisor? I don't know.
>> I'm not a lawyer.
>
> On x86, we do microcode updates, which are (as I understand it) binary
> blobs that get passed through the hypervisor to the cpus.  This blob
> isn't executed by Xen, so it doesn't seem like you would be able to
> argue that passing a binary blob through the hypervisor creates a
> derivative / combined work.  In that case the blobs are stored as
> files on disk and passed to Xen at boot time (via grub), not compiled
> into the Xen binary.  Whether compiling such things into the binary
> constitutes a "derived work" is something you'd probably better ask a
> lawyer. :-)
Yeah, there are always legal ways to do this.

>>> Sorry, just to be clear: below you mentioned modules as a solution, and
>>> given the context this would be included.  So can you expand on what you
>>> mean that there are things that 1) can't be included in the hypervisor
>>> because of code size or complexity, but for which 2) loadable modules
>>> would be a suitable solution?
>> Well... Device drives? Emulators? For example, if I will write bunch
>> of good and neat GPL drivers for some SoC and I'll promise to maintain
>> them, will you include them into upstream?
>> Or I will write emulator for some arcane device, will it be merged
>> into upstream?
>> Real case: I will write OP-TEE mediator for one client and Google
>> Trusty mediator for other client. Every will have, say, 2,000 lines of
>> code. Are there changes, that they both will be merged into
>> hypervisor?
>
> [snip]
>
>> Anyways, I have taken your point. No proprietary code in modules. What
>> about other parts of discussion? Are you against loadable modules in
>> any fashion? What about native apps?
>
> There are several different questions we're getting slightly mixed up here:
> 1. Should some bit of functionality (like a TEE mediator or device
> emulation) live in the xen.git tree?
> 2. Should that functionality run in the hypervisor address space?
> 3. Should that functionality be loaded via a loadable module?
> 4. What place to proprietary components have in a Xen system?
>
> Let me address #4 first.  There are lots of examples of proprietary
> *components* of Xen systems.  XenClient used to have a proprietary
> device model (a process running in dom0) for helping virtualize
> graphics cards; a number of companies have proprietary drivers for
> memory sharing or VM introspection.  But all of those are outside of
> the Xen address space, interacting with Xen via hypercalls.  As long
> as "native apps" (I think we probably need a better name here) are
> analogous to a devicemodel stubdomain -- in a separate address space
> and acting through a well-defined hypercal interface -- I don't have
> any objection to having proprietary ones.
Yes, native apps will use almost the same mechanism (actually, it will
be syscalls instead of hypercalls, but basic idea is the same). They
are not linked to a hypervisor in any way.

> Regarding #1-2, let me first say that how specific it is to a
> particular platform or use case isn't actually important to any of
> these questions.  The considerations are partly technical, and partly
> practical -- how much benefit does it give to the project as a whole
> vs the cost?
>
> For a long time there were only two functional schedulers in Xen --
> the Credit scheduler (now called "credit1" to distinguish it from
> "credit2"), and the ARINC653 scheduler, which is a real-time scheduler
> targeted at a very specific use case and industry.  As far as I know
> there is only one user.  But it was checked into the Xen tree because
> it would obviously be useful to them (benefit) and almost no impact on
> anyone else (cost); and it ran inside the hypervisor because that's
> the only place to run a scheduler.
>
> So given your examples, I see no reason not to have several
> implementations of different mediators or emulated devices in tree, or
> in a XenProject-managed git repo (like mini-os.git).  I don't know the
> particulars about mediators or the devices you have in mind, but if
> you can show technical reasons why they need to be run in the
> hypervisor rather than somewhere else (for performance or security
> sake, for instance), there's no reason in principle not to add them to
> the hypervisor code; and if they're in the hypervisor, then they
> should be in xen.git.
This is question that bothered me. Thank you for clarification. Going
to specific use cases, yes, there are reasons why OP-TEE mediator
should run in hypervisor (or in a very privileged app).

> Regarding modules (#3): The problem that loadable modules were
> primarily introduced to solve in Linux wasn't "How to deal with
> proprietary drivers", or even "how to deal with out-of-tree drivers".
> The problem was, "How to we allow software providers to 1) have a
> single kernel binary, which 2) has drivers for all the different
> systems on which it needs to run, but 3) not take a massive amount of
> memory or space on systems, given that any given system will not need
> the vast majority of drivers?"
>
> Suppose hypothetically that we decided that the mediators you describe
> need to run in the hypervisor.  As long as Kconfig is sufficient for
> people to enable or disable what they need to make a functional and
> efficient system, then there's no need to introduce modules.  If we
> reached a point where people wanted a single binary that could do
> either or OP-TEE mediator or the Google mediator, or both, or neither,
> but didn't to include all of them in the core binary (perhaps because
> of memory constraints), then loadable modules would be a good solution
> to consider.  But either way, if we decided they should run in the
> hypervisor, then all things being equal it would still be better to
> have both implementations in-tree.
>
> There are a couple of reasons for the push-back on loadable modules.
> The first is the extra complication and infrastructure it adds.  But
> the second is that people have a strong temptation to use them for
> out-of-tree and proprietary code, both of which we'd like to avoid if
> possible.  If there comes a point in time where loadable modules are
> the only reasonable solution to the problem, I will support having
> them; but until that time I will look for other solutions if I can.
>
> Does that make sense?
Yes, thank you. Legal questions is not my best side. Looks like I was
too quick, when proposed modules as a solution to our needs. Sorry, I
had to investigate this topic further before talking about it.

So, let's get back to native apps. We had internal discussion about
possible use cases and want to share our conclusions.

1. Emulators. As Stefano pointed, this is ideal use case for small,
fast native apps that are accounted in a calling vcpu time slice.

2. Virtual coprocessor backend/driver. The part that does actual job:
makes coprocessor to save or restore context. It is also small,
straightforward app, but it should have access to a real HW.

3. TEE mediators. They need so much privileges, so there actually are
no sense in putting them into native apps. For example, to work
properly OP-TEE mediator needs to: pin guest pages, map guest pages to
perform IPA->MPA translation, send vIRQs to guests, issue real SMCs.

4. Any other uses?

So, as you can see, emulator have no privileges at all and can be
domain-bound (e.g. one emulator instance per guest). vcoproc driver
needs privileges to work with certain MMIOs (and possibly, IRQs). TEE
mediator is actually should work at EL2 level, there are no benefits
in putting it into EL0 app.

If there are no objections, I propose to put TEE topic aside for now.
Just to be clear: I really like your idea to put TEE mediators into
hypervisor tree and use Kconfig to choose needed one.

So, there are emulators and vcoproc drivers left. Emulator is a quite
simple thing: it should handle MMIO read/write and issue vIRQ
sometimes. Also it should be configured somehow. As Stefano said, we
should have multiple instances of the same emulator. One for each
domain.

vcoproc driver should be able to work with real HW, probably it will
handle real IRQs from device, we need one instance of driver per
device, not per domain. Andrii can correct me, but vcoproc framework
is not tied to vcpus, so it can work in context of any vcpu. Thus, it
will be accounted for that vcpu, that happened to execute at current
moment. Probably, this is not fair.

Can we run vcoproc driver in a stubdomain? Probably yes, if we can
guarantee latency (as if in real time system). Just one example: 60FPS
displays are standard at this time. 1/60 gives us 16ms to deliver
frame to a display. 16ms should be enough to render next frame,
compose it, deliver to a display controller. Actually it is plenty of
time (most of the time). Now imagine that we want to share one GPU
between two domains. Actual render tasks can be very fast, lets say 1
ms for each domain. But to render both of them, we need to switch GPU
context at least two times (one time to render Guest A task, one time
to render Guest B task). This gives us 8ms between switches. If we
will put vcoproc driver to a stubdomain, we will be at mercy of vCPU
scheduler. It is good scheduler, but I don't know if it suits for this
use case. 8ms is an upper bound. If there will be three domains
sharing GPU, limit will be 6 ms. And, actually, one slice per domain
is not enough, because domain may be willing to render own portion
later. So, 1 ms will be more realistic requirement. I mean, that
stubdom with coproc driver should be scheduled every 1ms not matter of
what.
With native apps (or some light stubdomain) which will be scheduled
right when it is needed - this is much easier task.

At least, this is my vision of vcoproc driver problem. Andrii can
correct me, if I'm terribly wrong.

> BTW I've been saying "I" throughout this response; hopefully that
> makes it clear that I'm mainly speaking for myself here.
Yeah, I understand this.

-- 
WBR Volodymyr Babchuk aka lorc [+380976646013]
mailto: vlad.babchuk@gmail.com

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

end of thread, other threads:[~2017-05-15 17:35 UTC | newest]

Thread overview: 82+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-06 20:21 [ARM] Native application design and discussion (I hope) Volodymyr Babchuk
2017-04-06 21:31 ` Stefano Stabellini
2017-04-07 11:03   ` Volodymyr Babchuk
2017-04-07 23:36     ` Stefano Stabellini
2017-04-11 20:32       ` Stefano Stabellini
2017-04-12 18:13         ` Dario Faggioli
2017-04-12 19:17           ` Stefano Stabellini
2017-04-20 20:20             ` Volodymyr Babchuk
2017-04-21 14:42               ` Andrii Anisov
2017-04-21 15:49                 ` Julien Grall
2017-04-21 16:08                   ` Volodymyr Babchuk
2017-04-21 16:20                   ` Andrii Anisov
2017-04-21 20:58                 ` Stefano Stabellini
2017-04-21 21:17                   ` Stefano Stabellini
2017-04-24 16:56                   ` Andrii Anisov
2017-04-24 18:08                     ` Stefano Stabellini
2017-04-25 10:15                       ` Andrii Anisov
2017-05-05 10:51                       ` Andrii Anisov
2017-05-05 19:28                         ` Stefano Stabellini
2017-05-08 10:46                           ` George Dunlap
2017-05-08 18:31                             ` Stefano Stabellini
2017-05-08 18:33                               ` Julien Grall
2017-05-09  8:53                               ` George Dunlap
2017-05-10 16:38                                 ` Andrii Anisov
2017-05-09 10:13                           ` Dario Faggioli
2017-05-09 10:32                             ` Julien Grall
2017-05-09 11:08                               ` Dario Faggioli
2017-05-09 11:19                                 ` Julien Grall
2017-05-09 18:29                                 ` Stefano Stabellini
2017-05-10  9:56                                   ` George Dunlap
2017-05-10 10:00                                     ` Julien Grall
2017-05-10 10:03                                       ` George Dunlap
2017-05-10 10:48                                         ` Julien Grall
2017-05-10 17:37                                           ` Volodymyr Babchuk
2017-05-10 18:05                                             ` Stefano Stabellini
2017-05-10 19:04                                             ` Julien Grall
2017-05-11 10:07                                               ` Julien Grall
2017-05-11 11:28                                                 ` Volodymyr Babchuk
2017-05-10 18:08                                     ` Andrii Anisov
2017-05-10 18:24                                       ` Stefano Stabellini
2017-05-11 15:19                                         ` Volodymyr Babchuk
2017-05-11 15:35                                           ` Modules support in Xen (WAS: Re: [ARM] Native application design and discussion (I hope)) Julien Grall
2017-05-11 16:35                                             ` George Dunlap
2017-05-11 17:14                                               ` Volodymyr Babchuk
2017-05-11 17:20                                                 ` George Dunlap
2017-05-11 17:53                                                   ` Lars Kurth
2017-05-11 17:14                                             ` George Dunlap
2017-05-11 17:16                                               ` George Dunlap
2017-05-11 18:13                                               ` Volodymyr Babchuk
2017-05-12 11:48                                                 ` George Dunlap
2017-05-12 18:43                                                   ` Stefano Stabellini
2017-05-12 19:04                                                     ` Volodymyr Babchuk
2017-05-15 11:21                                                       ` George Dunlap
2017-05-15 17:32                                                         ` Stefano Stabellini
2017-05-11 18:04                                             ` Stefano Stabellini
2017-05-11 18:39                                               ` Volodymyr Babchuk
2017-05-05 11:09                       ` [ARM] Native application design and discussion (I hope) Andrii Anisov
2017-04-24 19:11                     ` Julien Grall
2017-04-24 21:41                       ` Volodymyr Babchuk
2017-04-25 11:43                         ` Julien Grall
2017-04-26 21:44                           ` Volodymyr Babchuk
2017-04-27 17:26                             ` Volodymyr Babchuk
2017-05-02 12:52                               ` Julien Grall
2017-05-02 12:42                             ` Julien Grall
2017-04-25  8:52                       ` Andrii Anisov
2017-04-21 15:57               ` Julien Grall
2017-04-21 16:16                 ` Volodymyr Babchuk
2017-04-21 16:47                   ` Julien Grall
2017-04-21 17:04                     ` Volodymyr Babchuk
2017-04-21 17:38                       ` Julien Grall
2017-04-21 18:35                         ` Volodymyr Babchuk
2017-04-24 11:00                           ` Julien Grall
2017-04-24 21:29                             ` Volodymyr Babchuk
2017-04-21 21:24                         ` Stefano Stabellini
2017-04-24 16:14                           ` Andrii Anisov
2017-04-24 16:46                           ` Andrii Anisov
2017-04-27 15:25                           ` George Dunlap
2017-05-02 12:45                             ` Julien Grall
2017-05-12 18:47 Volodymyr Babchuk
2017-05-15 12:51 ` George Dunlap
2017-05-15 17:35   ` Stefano Stabellini
2017-05-15 13:54 ` Andrii Anisov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.