* [RFC] Shared coprocessor framework
@ 2016-10-28 17:53 Andrii Anisov
  2016-10-28 18:38 ` Andrew Cooper
  0 siblings, 1 reply; 15+ messages in thread
From: Andrii Anisov @ 2016-10-28 17:53 UTC (permalink / raw)
  To: xen-devel, embedded-pv-devel, Artem Mygaiev, Oleks Tysh

Dear All,

We are currently designing a shared coprocessor framework to be introduced
into the XEN hypervisor.
Please review the initial draft of the requirements and design overview.

=======================================================================

Shared coprocessor framework
============================

The main idea of coprocessor sharing is to let different domains use a
coprocessor concurrently and independently, each within its own time share.

The goal is for different domains to be able to run different firmware on the
coproc concurrently. This implies that there is a way to switch the coproc
execution context externally, from the main CPU which is running the
hypervisor.

The complexity and predictability of the externally issued coprocessor context
switch determine the accuracy of coproc scheduling and of resource allocation
to domains, as well as the scheduling overhead.

Shared coprocessor framework requirements
=========================================

Functional
----------
The framework shall:
* provide sharing of coprocessor resources between different domains
* support a number of coprocessor instances shared between domains at runtime
* support configuration of the shareable coprocessors at system startup
  (number of instances, type of instances, scheduling algorithm per
  coprocessor instance, etc.)
* support a per-domain list of shared coprocessors, configurable at domain
  startup
* support runtime configuration of the scheduling algorithm per shared
  coprocessor instance

Non-functional
--------------
* The framework will provide an interface to integrate different coprocessor
  support implementations.
* The framework will provide an interface to integrate different scheduling
  algorithm implementations.
* (Optional) The coprocessor firmware and drivers should not be changed in
  order to support the sharing.

Operation environment
---------------------
* XEN 4.8+

Assumptions and dependencies
----------------------------
* MMU-enabled SoC with MMU-protected coprocessor(s)

Design overview
===============

The shared coprocessor framework will be implemented as part of the XEN
hypervisor. Design and development will be based on a master branch no earlier
than 4.8.0-rc2, targeting upstreaming before the XEN 4.9 release.

The following sections give a design overview addressing each requirement:

Coprocessor resource sharing mechanism
--------------------------------------

Coprocessor resources will be shared by switching context (the per-domain
instance of running firmware) on the coprocessor, with the switch issued by
the main processor. The switch is performed by the hypervisor, which runs on
the main CPU and is aware of the domains, their capabilities and needs.

The hypervisor is aware of the coprocessors to be shared. Each shared
coprocessor has an assigned timer, a scheduler and a list of per-domain
vcoproc instances.

A vcoproc scheduling mechanism will be implemented independently of the XEN
VCPU scheduling mechanism. The reason is the difference in nature between a
vcoproc and a vcpu:
* vcoprocs are not treated as a pool of equal vcoprocs running on a pool of
  equal coprocessors (SMP-like behavior)
* because the scheduler does not run on the coprocessor itself:
    * a vcoproc scheduler does not preempt the coprocessor context
    * the coprocessor could deny a context switch at the moment the scheduler
      decides on one, so the scheduling mechanism should adapt to that
* there could be a number of different coprocessor instances, each with its
  own scheduler and vcoproc queue
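
To illustrate the point about denied context switches, here is a minimal
sketch in C of how a timer-driven scheduling step might cope with a refusal.
It is only an assumption about one possible shape of the code: every name in
it (coproc_schedule_tick, try_switch, hw_busy, demo_try_switch) is
hypothetical and none of them is an existing XEN interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names throughout; none of this is an existing XEN interface. */

enum switch_result { SWITCH_DONE, SWITCH_DENIED };

struct vcoproc {
    int domid;                /* owning domain */
};

struct coproc {
    struct vcoproc *current;  /* context loaded on the hardware, or NULL */
    struct vcoproc *pending;  /* switch target the hardware has refused so far */
    bool hw_busy;             /* stand-in for a "cannot switch now" status bit */
    /* Platform callback: ask the hardware to switch; it may refuse. */
    enum switch_result (*try_switch)(struct coproc *cp, struct vcoproc *next);
};

/* Demo platform callback: refuses the switch while the hardware is busy. */
enum switch_result demo_try_switch(struct coproc *cp, struct vcoproc *next)
{
    (void)next;
    return cp->hw_busy ? SWITCH_DENIED : SWITCH_DONE;
}

/*
 * Called from the per-coprocessor timer with the scheduler's choice.
 * Returns true if a non-NULL "next" is the loaded context on return.
 */
bool coproc_schedule_tick(struct coproc *cp, struct vcoproc *next)
{
    assert(cp != NULL && cp->try_switch != NULL);

    if (next == NULL || next == cp->current) {
        cp->pending = NULL;           /* nothing to switch to */
        return next != NULL;
    }

    if (cp->try_switch(cp, next) == SWITCH_DENIED) {
        /* No preemption possible: remember the target, retry on next tick. */
        cp->pending = next;
        return false;
    }

    cp->current = next;
    cp->pending = NULL;
    return true;
}
```

In this sketch a denied switch simply leaves the current context running and
records the target, so the next timer tick can retry. A real scheduler would
probably also want to account for the extra time the current vcoproc received.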

Because a domain that has been assigned a vcoproc could access the coproc
while it is running another domain's context, the framework will implement
iomem access emulation for domains which do not own the coproc at the moment
of access.
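
One way to picture the iomem access emulation is a per-vcoproc shadow copy of
the registers. The C sketch below is an assumption made for illustration, not
the planned implementation: the register window size and all names (NR_REGS,
vcoproc_mmio_write, vcoproc_mmio_read, coproc_switch_to) are hypothetical, and
the "hw" array merely stands in for real MMIO registers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names and sizes; not an existing XEN interface. */
#define NR_REGS 4             /* assumed size of the register window */

struct vcoproc {
    uint32_t shadow[NR_REGS]; /* register state while the context is saved */
};

struct coproc {
    uint32_t hw[NR_REGS];     /* stand-in for the real MMIO registers */
    struct vcoproc *current;  /* context currently loaded on the hardware */
};

/* Trapped register write from the domain owning "v". */
bool vcoproc_mmio_write(struct coproc *cp, struct vcoproc *v,
                        size_t reg, uint32_t val)
{
    assert(cp != NULL && v != NULL);
    if (reg >= NR_REGS)
        return false;
    if (cp->current == v)
        cp->hw[reg] = val;    /* domain owns the coproc: touch hardware */
    else
        v->shadow[reg] = val; /* otherwise emulate against the saved state */
    return true;
}

/* Trapped register read, same routing as the write path. */
bool vcoproc_mmio_read(struct coproc *cp, struct vcoproc *v,
                       size_t reg, uint32_t *val)
{
    assert(cp != NULL && v != NULL && val != NULL);
    if (reg >= NR_REGS)
        return false;
    *val = (cp->current == v) ? cp->hw[reg] : v->shadow[reg];
    return true;
}

/* Context switch: save the outgoing registers, load the incoming ones. */
void coproc_switch_to(struct coproc *cp, struct vcoproc *next)
{
    assert(cp != NULL && next != NULL);
    if (cp->current == next)
        return;
    for (size_t i = 0; i < NR_REGS; i++) {
        if (cp->current != NULL)
            cp->current->shadow[i] = cp->hw[i];
        cp->hw[i] = next->shadow[i];
    }
    cp->current = next;
}
```

The sketch treats every register as plain storage. Registers with side
effects (command doorbells, status bits) would need per-register handling in
the platform code.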

Interrupts from the coprocessor will be delivered to the domain whose vcoproc
context is running at the moment of the interrupt.

Because context switching is a coprocessor- and SoC-specific procedure, the
framework will not encapsulate it, but will provide an interface to register
a platform implementation of the context switching procedure.

It is assumed that the coprocessor has an associated iommu, or that there is a
corresponding system-level iommu (e.g. arm smmu) owned by the hypervisor. The
iommu setup is also platform-specific code and part of context switching.

If there is only a coprocessor-associated iommu, platform code is also
responsible for handling that iommu's setup (e.g. by means of iomem access
emulation).

Support a number of coprocessor instances shared between domains at runtime
---------------------------------------------------------------------------

The coprocessor framework will have a list of coprocessors available for
sharing during system runtime. Any running domain could be using none, one or
several shared coprocessors.

System startup configuration
----------------------------

The compile-time configuration will provide a separate config option for the
shared coprocessor framework, config options enabling scheduling algorithms,
and config options enabling coprocessor platform support.

The system startup configuration will be described in the system device tree
in the xen "chosen" node. The configuration should at least describe which
coprocessors within the system will be shared with the domains, the scheduling
algorithm assigned to each shared coprocessor, and an identifier of each
particular coprocessor instance. The hypervisor will read the corresponding
nodes and configure the coprocessor framework with platform implementations on
startup.

Domain startup configuration
----------------------------

The domain configuration file will specify which particular coprocessor
instances should be shared with this domain.

The shared coprocessor domain configuration should be read by tools/xen; on
domain creation a vcoproc should be created and added to the vcoproc instances
list of the particular coprocessor. From this point on the vcoproc is assumed
to be ready to be scheduled as soon as the domain starts using it.

If vcoproc creation fails, domain creation should be aborted.
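
The create-or-abort rule could be sketched in C roughly as follows. This is an
illustration under assumptions only: the fixed-size list, the limit
MAX_VCOPROCS and the functions vcoproc_create, vcoproc_destroy and
domain_attach_coprocs are all hypothetical names, not existing XEN or
toolstack interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names and limits; not an existing XEN interface. */
#define MAX_VCOPROCS 4        /* assumed per-coprocessor limit */

struct coproc {
    int vcoproc_domids[MAX_VCOPROCS]; /* vcoproc instances list, by domain */
    size_t nr_vcoprocs;
};

/* Create a vcoproc for "domid" on "cp"; fails when the list is full. */
bool vcoproc_create(struct coproc *cp, int domid)
{
    assert(cp != NULL);
    if (cp->nr_vcoprocs >= MAX_VCOPROCS)
        return false;
    cp->vcoproc_domids[cp->nr_vcoprocs++] = domid;
    return true;
}

/* Remove the vcoproc of "domid" from "cp", if it has one. */
void vcoproc_destroy(struct coproc *cp, int domid)
{
    assert(cp != NULL);
    for (size_t i = 0; i < cp->nr_vcoprocs; i++) {
        if (cp->vcoproc_domids[i] == domid) {
            /* Order is not preserved: move the last entry into the gap. */
            cp->vcoproc_domids[i] = cp->vcoproc_domids[--cp->nr_vcoprocs];
            return;
        }
    }
}

/*
 * Create one vcoproc per configured coprocessor. On any failure, undo the
 * ones already created and report failure so domain creation can abort.
 */
bool domain_attach_coprocs(int domid, struct coproc **cps, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (!vcoproc_create(cps[i], domid)) {
            while (i-- > 0)
                vcoproc_destroy(cps[i], domid);
            return false;
        }
    }
    return true;
}
```

The rollback loop means a failed domain creation leaves no stale vcoproc in
any coprocessor's list, which keeps the scheduler queues consistent.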

Runtime configuration of scheduling algorithm per instance of shared
--------------------------------------------------------------------
coprocessor
-----------

A tool for the initial domain will be implemented, providing a CLI for runtime
monitoring of the available shared coprocessor instances and of the per-domain
and per-coproc lists of vcoprocs, and for runtime setup of the scheduler of
each coprocessor instance.

Integration of different coprocessor support implementation
-----------------------------------------------------------

The coprocessor sharing framework is a platform-independent entity which
implements generic flows and actions. It needs platform-specific hooks which
implement the actual coprocessor management operations.
The platform-specific coprocessor implementations will provide:
* domain create/destroy hooks
* context switch hooks (coprocessor start, stop, context load/save)
* iomem handler(s)
* interrupt handler(s)
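
The hook list above could map onto a table of function pointers that each
platform implementation registers with the framework. The C sketch below is
one assumed shape for such an interface: struct coproc_platform_ops, the
"compatible" match string, coproc_register_platform and coproc_find_platform
are all hypothetical and not an existing XEN API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical interface sketch; not an existing XEN API. */
struct vcoproc;                          /* opaque per-domain context */

struct coproc_platform_ops {
    const char *compatible;              /* assumed match string */
    /* domain create/destroy hooks */
    int  (*domain_create)(int domid);
    void (*domain_destroy)(int domid);
    /* context switch hooks */
    int  (*start)(struct vcoproc *v);
    int  (*stop)(struct vcoproc *v);
    int  (*ctx_save)(struct vcoproc *v);
    int  (*ctx_load)(struct vcoproc *v);
    /* iomem and interrupt handlers */
    int  (*mmio)(struct vcoproc *v, uint64_t addr, uint32_t *val, int is_write);
    void (*irq)(unsigned int irq);
};

#define MAX_PLATFORMS 8
static const struct coproc_platform_ops *platforms[MAX_PLATFORMS];
static size_t nr_platforms;

/* Register an implementation; the context save/load hooks are mandatory. */
int coproc_register_platform(const struct coproc_platform_ops *ops)
{
    if (ops == NULL || ops->compatible == NULL ||
        ops->ctx_save == NULL || ops->ctx_load == NULL)
        return -1;
    if (nr_platforms >= MAX_PLATFORMS)
        return -1;
    platforms[nr_platforms++] = ops;
    return 0;
}

/* Look up the implementation for a coprocessor by its match string. */
const struct coproc_platform_ops *coproc_find_platform(const char *compatible)
{
    assert(compatible != NULL);
    for (size_t i = 0; i < nr_platforms; i++)
        if (strcmp(platforms[i]->compatible, compatible) == 0)
            return platforms[i];
    return NULL;
}

/* Placeholder hook so the sketch can be exercised without hardware. */
int demo_ctx_nop(struct vcoproc *v)
{
    (void)v;
    return 0;
}
```

Making only the context save/load hooks mandatory is a choice of this sketch.
Which hooks are really required would depend on the coprocessor.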

Integration of different scheduling algorithms implementation
-------------------------------------------------------------

There are a number of possible coprocessor sharing scenarios, with different
coprocessor resource allocation needs for different domains. In the current
design, the allocation of coprocessor resources to domains is a matter of the
scheduling algorithm and its configuration. The framework will use the
scheduling algorithm to determine whether a context switch should be done and
which vcoproc is to be run next on the coprocessor.

The algorithm implementation will provide:
* initialization/deinitialization hooks
* scheduling operations hooks
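
As with the platform hooks, the scheduler hooks could be expressed as an
operations table. The C sketch below assumes a hypothetical
struct coproc_sched_ops and fills it with a trivial round-robin policy
(rr_init, rr_deinit, rr_pick_next), chosen only because it is the simplest
example; the names are not an existing XEN API and no particular policy is
implied by the design.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical interface sketch; not an existing XEN API. */
struct vcoproc {
    int domid;
};

struct coproc_sched_ops {
    const char *name;
    /* initialization/deinitialization hooks */
    int  (*init)(void *priv);
    void (*deinit)(void *priv);
    /* scheduling operation: choose the next vcoproc, or NULL for none */
    struct vcoproc *(*pick_next)(void *priv, struct vcoproc **queue, size_t n);
};

/* Round-robin, used only as the simplest possible example policy. */
struct rr_state {
    size_t next_idx;          /* queue position to hand out next */
};

int rr_init(void *priv)
{
    assert(priv != NULL);
    ((struct rr_state *)priv)->next_idx = 0;
    return 0;
}

void rr_deinit(void *priv)
{
    (void)priv;               /* nothing to release in this sketch */
}

struct vcoproc *rr_pick_next(void *priv, struct vcoproc **queue, size_t n)
{
    struct rr_state *st = priv;
    struct vcoproc *v;

    assert(st != NULL);
    if (n == 0)
        return NULL;          /* empty queue: nothing to run */
    v = queue[st->next_idx % n];
    st->next_idx = (st->next_idx + 1) % n;
    return v;
}

const struct coproc_sched_ops rr_sched_ops = {
    .name = "rr",
    .init = rr_init,
    .deinit = rr_deinit,
    .pick_next = rr_pick_next,
};
```

The framework would call pick_next from the per-coprocessor timer and compare
the result with the running vcoproc to decide whether a context switch is
needed.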

(Optional) The coprocessor firmware and drivers should not be changed in order
------------------------------------------------------------------------------
to support the sharing.
-----------------------

This requirement could be relaxed for implementations which support
coprocessors not designed for externally issued context switching. In that
case, context switch support would be introduced into the firmware and the
driver of the coprocessor.

Sincerely,
Andrii Anisov.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


* Re: [RFC] Shared coprocessor framework
  2016-10-28 17:53 [RFC] Shared coprocessor framework Andrii Anisov
@ 2016-10-28 18:38 ` Andrew Cooper
  2016-10-31 19:31   ` Andrii Anisov
  0 siblings, 1 reply; 15+ messages in thread
From: Andrew Cooper @ 2016-10-28 18:38 UTC (permalink / raw)
  To: Andrii Anisov, xen-devel, embedded-pv-devel, Artem Mygaiev, Oleks Tysh

On 28/10/16 18:53, Andrii Anisov wrote:
> Dear All,
>
> At this moment we are designing a shared coprocessor framework to be
> introduced into XEN hypervisor.
> Please review an initial draft of the requirements and design overview.

Thank you for the design doc.  An immediate +1 from me, simply for the
doc existing :)

>
> =======================================================================
>
> Shared coprocessor framework
> ============================
>
> The main idea of the coprocessor sharing is to let different domains use
> coprocessor concurrently and independently within some time share.

Forgive my ignorance (I am an x86 person, and given the CC list, I guess
this is talking about ARM systems), but what are coprocessors and what
might I do with one?

I think you might want a brief introduction sentence or two describing
what kind of systems this is applicable to, and an example of the kind
of thing which might want to be shared.

>
> It is targeted capability of different domains to run concurrently different
> firmware on the coproc.

I can't parse this sentence.  I presume you mean that the purpose of this
framework is to provide a mechanism for Xen to share a coprocessor's
resources between multiple domains?

> This target implies that there is a way to switch
> coproc execution context externally from the main CPU which is running a
> hypervisor.
>
> Complexity and predictability of a coprocessor externally issued context
> switching process defines the accuracy of the coproc scheduling and resources
> allocation for domains as well as scheduling overhead.
>
> Shared coprocessor framework requirements
> =========================================
>
> Functional
> ----------
> The framework shall:
> * provide a coprocessor resources sharing between different domains

Grammar nit.  Either "Provide coprocessor resource sharing between..."
or "Provide sharing of coprocessor resources between..."

> * support a number of coprocessor instances shared between domains in runtime
> * support configurable on system startup shareable coprocessors configuration
>   (number of instances, type of instances, scheduling algorithm per
>   coprocessor instance, etc)

Does it need to only be configurable at system startup?  There is often
more flexibility by having a default configuration at system start (so
dom0 can use the resources), which can later be altered by toolstack policy.

Considering the latter option, even if you don't implement support at
first, tends to lead to a cleaner design, but of course it does depend
heavily on the details of the situation.

> * support configurable on domain startup list of coprocessors shared to this
>   particular domain
> * support runtime configuration of scheduling algorithm per instance of shared
>   coprocessor
>
> Non-functional
> --------------
> * The framework will provide an interface to integrate different coprocessor
>   support implementation
> * The framework will provide an interface to integrate different scheduling
>   algorithms implementation.
> * (Optional) The coprocessor firmware and drivers should not be changed in
>   order to support the sharing.
>
> Operation environment
> ---------------------
> * XEN 4.8+
>
> Assumptions and dependencies
> ----------------------------
> * MMU-enabled SoC with MMU-protected coprocessor(s)

Right - definitely ARM then, but it took me until half way through the
document to work this out.

>
> Design overview
> ===============
>
> The Shared coprocessor framework will be implemented as a part of the XEN
> hypervisor. The framework design and development will be based on master
> branch not earlier than 4.8.0-rc2 targeting upstreaming before XEN 4.9 release.
>
> Following are described design overview addressing requirements:
>
> Coprocessors resources sharing mechanism
> ----------------------------------------
>
> The coprocessor resources sharing will be done in a way of switching context
> (per domain instance of firmware running) on the coprocessor issued by the
> main processor. It is done by a hypervisor running on the main cpu and aware
> about domains, their capabilities and needs.
>
> The hypervisor is aware about coprocessors to be shared. For each shared
> coprocessor there are: an assigned timer, scheduler and per-domain vcoproc
> instances list.
>
> It will be implemented a vcoproc scheduling mechanism independent of XEN VCPU
> scheduling mechanism. It is caused by the difference in vcoproc and vcpu
> nature:
> * it is not considered pool of equal vcoproc running on a pool of equal
>   coprocessors (SMP like behavior)
> * due to fact that scheduler is not running on the coprocessor itself:
>     * a vcoproc scheduler does not preempt the coprocessor context
>     * coprocessor could deny context switching at the moment scheduler decide
>     so, scheduling mechanism should adopt that
> * there could be number of different coprocessors instances with scheduler and
>   vcoproc queue associated
>
> Due to the fact that some domain assigned a vcoproc could access coproc when
> it is running another domain context, framework will implement iomem access
> emulation for domains which are not provided coproc at the moment of access.
>
> Interrupts from coprocessor will be effectively delivered to the domain which
> vcoproc context is being running at the moment of the interrupt.
>
> While the context switching is coprocessor and SoC specific procedure, the
> framework would not encapsulate such procedure, but provides interface to
> register platform implementation of context switching procedure.
>
> It is assumed that the coprocessor has an iommu associated or corresponding
> system level iommu (i.e. arm smmu) owned by hypervisor. The iommu setup is
> also platform specific code and part of context switching.
>
> In case of only having coprocessor associated iommu, platform code is also
> responsible to handle that iommu setup (i.e. in a way of iomem access
> emulation).
>
> Support a number of coprocessor instances shared between domains in runtime
> ---------------------------------------------------------------------------
>
> The coprocessor framework will have a list of coprocessors available for
> sharing during system runtime. Any running domain could be using none, one or
> several shared coprocessors.

I would be tempted to extend this slightly, and specify that there
should be a mechanism for the toolstack to query all of this information
at arbitrary points in time.

>
> System startup configuration
> ----------------------------
>
> The compilation time configuration will provide separate config for the shared
> coprocessor framework support, configs enabling scheduling algorithms, configs
> enabling support of the coprocessor platform support.
>
> The system startup configuration will be described in the system device tree
> in the xen "choosen" node. The configuration should at least describe which
> coprocessors within the system will be shared to the domains, a scheduling
> algorithm assigned to each shared coprocessor, and an identificator of this
> particular coprocessor instance. Hypervisor will read corresponding nodes and
> configure the coprocessor framework with platform implementations on startup.
>
> Domain startup configuration
> ----------------------------
>
> In a domain configuration file it will be specified which particular instance
> of the coprocessor should be shared to this domain.
>
> The shared coprocessor domain configuration should be read by tools/xen and on
> domain creation a vcoproc should be created and assigned to vcoproc instances
> list of the particular coprocessor. Starting from here it is assumed that
> vcoproc is ready to be scheduled as soon as domain starts using it.
>
> In case of failure of vcoproc creation, the domain creation should be aborted.
>
> Runtime configuration of scheduling algorithm per instance of shared
> --------------------------------------------------------------------
> coprocessor
> -----------
>
> There will be a initial domain tool implemented which will provide a CLI for
> runtime monitoring of shared coprocessors instances available, per-domain/
> per-coproc instance list of vcoproc and runtime setup of the scheduler per
> each coprocessor instance.
>
> Integration of different coprocessor support implementation
> -----------------------------------------------------------
>
> The coprocessor sharing framework is a platform independent entity which
> implements generic flows and actions. It does need platform specific hooks
> which implement exact coprocessor management operations.
> The coprocessor platform specific implementations will provide:
> * domain create/destroy hooks
> * context switch hooks (coprocessor start, stop, context load/save)
> * iomem handler(s)
> * interrupt handler(s)
>
> Integration of different scheduling algorithms implementation
> -------------------------------------------------------------
>
> There are number of possible coprocessor sharing scenarios with different
> needs of coprocessor resources allocation for different domains. Within
> current design the coprocessor resource allocation to domains is a matter of
> scheduling algorithm and its configuration. The framework will use the
> scheduling algorithm to determine if context switching should be done and what
> is the next vcoproc to be ran on coprocessor.
>
> The algorithm impelementation will provide:
> * initialization/deinitialization hooks
> * scheduling operations hooks
>
> (Optional) The coprocessor firmware and drivers should not be changed in order
> ------------------------------------------------------------------------------
> to support the sharing.
> -----------------------
>
> This requirement could be compromised for implementations which support
> coprocessors not designed for externally issued context switching. Such
> context switch function support would be introduced into the firmware and the
> driver of the coprocessor.

Do you mean that, ideally, Xen can fully context switch a coprocessor
behind the back of domU, and the domU driver need not know or care about
the difference?

And, where that isn't possible, some virtual hooks could be introduced
to the domU driver so domU can opt into sharing when it has a compatible
driver?

~Andrew


* Re: [RFC] Shared coprocessor framework
  2016-10-28 18:38 ` Andrew Cooper
@ 2016-10-31 19:31   ` Andrii Anisov
  2016-11-01 13:57     ` Artem Mygaiev
  0 siblings, 1 reply; 15+ messages in thread
From: Andrii Anisov @ 2016-10-31 19:31 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Oleks Tysh, embedded-pv-devel, xen-devel, Artem Mygaiev

> Thankyou for the design doc.  An immediate +1 from me, simply for the
> doc existing :)

Thank you for your interest and comments.

> Forgive my ignorance (I am an x86 person, and given the CC list, I guess
> this is talking about ARM systems), but what are coprocessors and what
> might I do with one?

Well, a coprocessor could be any processing unit inside a SoC which runs
some firmware and supplements the primary processor's functions,
e.g. a GPU, a DSP, or an FPGA inside a SoC.
A living example is GPU sharing implemented for an ARM-based SoC.
BTW, xengt implements a pretty close approach and is a pure x86
world solution.

> > It is targeted capability of different domains to run concurrently different
> > firmware on the coproc.
> I cant parse this sentence.  I presume you mean that the purpose of this
> framework is to provide a mechanism for Xen to share a coprocessors
> resource between multiple domains?

Maybe it should be reworded. I mean that coprocessors are entities
which run some firmware to perform their tasks, so different domains
could run different versions of firmware on the same coprocessor, each
in its own time slice.
It is mentioned here to stress that domain contexts are totally
independent (both for processed data and for firmware code).

> Grammar nit.  Either "Provide coprocessor resource sharing between..."
> or "Provide sharing of coprocessor resources between..."

Will take "Provide sharing of coprocessor resources between...".


> Does it need to only be configurable at system startup?  There is often
> more flexibility by having a default configuration at system start (so
> dom0 can use the resources), which can later be altered by toolstack policy.

I meant that the hypervisor, which starts first, checks which
coprocessors within the system are to be shared, takes ownership of
them and provides them to the framework. Providing some of those
resources to dom0 would not be a big deal: just assign a vcoproc to the
domain. And yes, it could be a default configuration.

>
> Considering the latter option, even if you don't implement support at
> first, tends to lead to a cleaner design, but of course it does depend
> heavily on the details of the situation.

Definitely we would tailor the design along with an implementation.

> > * MMU-enabled SoC with MMU-protected coprocessor(s)
> Right - definitely ARM then, but it took me until half way through the
> document to work this out.

Actually, an ARM-based SoC was specified here originally. At some point
that dependency was removed, inspired by the already mentioned xengt.

> I would be tempted to extend this slightly, and specify that there
> should be a mechanism for the toolstack to query all of this information
> at arbitrary points in time.

It should be covered here:
>
> > Runtime configuration of scheduling algorithm per instance of shared
> > --------------------------------------------------------------------
> > coprocessor
> > -----------
> >
> > There will be a initial domain tool implemented which will provide a CLI for
> > runtime monitoring of shared coprocessors instances available, per-domain/
> > per-coproc instance list of vcoproc and runtime setup of the scheduler per
> > each coprocessor instance.

Maybe it needs some rewording.

> Do you mean that, ideally, Xen can fully context switch a coprocessor
> behind the back of domU, and the domU driver need not know or care about
> the difference?
You've got the point.

> And, where that isn't possible, some virtual hooks could be introduced
> to the domU driver so domU can opt into sharing when it has a compatible
> driver?
Yes, due to the specifics of a SoC implementation a pure solution could
be inefficient, insecure or even impossible. In this case
drivers/firmware could be modified to cooperate with the coprocessor
sharing framework in XEN.
This part could be architecture (coprocessor) specific, or machine
(SoC) specific.

Sincerely,
Andrii Anisov.


* Re: [RFC] Shared coprocessor framework
  2016-10-31 19:31   ` Andrii Anisov
@ 2016-11-01 13:57     ` Artem Mygaiev
  2016-11-03 11:31       ` Andrii Anisov
  2016-11-11 20:43       ` Konrad Rzeszutek Wilk
  0 siblings, 2 replies; 15+ messages in thread
From: Artem Mygaiev @ 2016-11-01 13:57 UTC (permalink / raw)
  To: Andrii Anisov; +Cc: Oleks Tysh, Andrew Cooper, xen-devel, embedded-pv-devel

Let me explain a bit of the background of this work.

We see a growing number of use cases for different "co-processors" like
 - GPUs (inside of most modern SoCs)
 - low-power side CPU cores (like ARM Cortex M or R on board with
Cortex A cores to handle PM or other tasks)
 - DSPs (for example, TI C6000 family DSP core inside of Jacinto 6 SoC)
 - FPGAs (Altera or Xilinx SoCs = ARM+FPGA)

These cores are most often used standalone for some specific function,
but quite often there is a need to "virtualize" such a co-processor
together with the main CPU cores. For example, they may be used in a
virtualized heterogeneous computing environment, so there has to be
some sort of independent co-processor "context" associated with each
VM which interacts with it. In some cases, to address security and
stability requirements, "context" means not only "data" but also
"code" (firmware); i.e. when VMs switch on the main CPU, both code and
data memory shall switch on the co-processor.

A couple of examples where VMs run on the same SoC and both want to use
a co-processor for data-intensive tasks, with different data sets and
different firmware images, while ensuring isolation (no data is leaked
or code corrupted through the co-processor's memory access) and
stability (a restart of one system does not lead to a crash of the
other):
1. one VM uses the GPU for GL rendering, another for NN state computing
2. one VM uses the DSP for HW-accelerated media decoding, another for
video image processing (object recognition, etc.)

We already see a need to enable such cases in the Embedded/Automotive
space (mostly dominated by ARM now), but this might also fit generic
computing in heterogeneous environments: different co-processors are
now deployed alongside generic CPUs in server environments (Google uses
its own tensor processors for NN computing acceleration, Microsoft used
Altera's FPGAs in project Catapult, there are deployments of GPU
computing nodes in some clouds, etc.)

Hope that makes sense

Best regards,
Artem Mygaiev




* Re: [RFC] Shared coprocessor framework
  2016-11-01 13:57     ` Artem Mygaiev
@ 2016-11-03 11:31       ` Andrii Anisov
  2016-11-11 20:43       ` Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 15+ messages in thread
From: Andrii Anisov @ 2016-11-03 11:31 UTC (permalink / raw)
  To: xen-devel, embedded-pv-devel; +Cc: Oleks Tysh, Andrew Cooper, Artem Mygaiev

Dear All,

Could you please share more comments, ideas or criticism on the topic?

Sincerely,
Andrii Anisov.


* Re: [RFC] Shared coprocessor framework
  2016-11-01 13:57     ` Artem Mygaiev
  2016-11-03 11:31       ` Andrii Anisov
@ 2016-11-11 20:43       ` Konrad Rzeszutek Wilk
  2016-11-12 12:04         ` Artem Mygaiev
  1 sibling, 1 reply; 15+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-11-11 20:43 UTC (permalink / raw)
  To: Artem Mygaiev
  Cc: Oleks Tysh, Andrew Cooper, embedded-pv-devel, xen-devel, Andrii Anisov

On Tue, Nov 01, 2016 at 03:57:13PM +0200, Artem Mygaiev wrote:
> Let me explain a bit background of this work.
> 
> We see growing amount of use cases for different "co-processors" like
>  - GPUs (inside of most modern SoCs)
>  - low-power side CPU cores (like ARM Cortex M or R on board with
> Cortex A cores to handle PM or other tasks)
>  - DSPs (for example, TI C6000 family DSP core inside of Jacinto 6 SoC)
>  - FPGAs (Altera or Xilinx SoCs = ARM+FPGA)
> 
> These cores are most often used standalone for some specific function, but
> quite often there is a need to "virtualize" such co-processor together
> with the main CPU cores. For example, they may be used in some
> virtualized heterogeneous computing environment so there shall be some
> sort of an independent "context" of a co-processor associated with a
> VM which interacts with it. In some cases, addressing security and
> stability requirements, "context" means not only "data" but also
> "code" (firmware); i.e. when VMs switch on main CPU, both code and
> data memory shall switch on co-processor.
> 
> A couple of examples where VMs running on the same SoC both want to use some
> co-processor for data-intensive tasks, with different data sets and
> different firmware images, while ensuring isolation (no data is leaked or
> code corrupted through the co-processor's memory access) and stability
> (a restart of one system does not crash the other):
> 1. use the GPU for GL rendering in one VM and for NN state computing in another
> 2. use the DSP for HW-accelerated media decoding in one VM and for video image
> processing (object recognition, etc.) in another
> 

Thanks for the examples, they make it easier to grok what 'coprocessors'
mean. Though, still being a newbie to ARM-land, I have some extra
questions:

Does this also mean that the hypervisor has to know the co-processors?
As in how to start/stop them? And how to tell them to save/restore
guest context? Or is there some generic specification for doing this?


* Re: [RFC] Shared coprocessor framework
  2016-11-11 20:43       ` Konrad Rzeszutek Wilk
@ 2016-11-12 12:04         ` Artem Mygaiev
  2016-11-15 19:23           ` Konrad Rzeszutek Wilk
  2016-11-16 12:39           ` Andrii Anisov
  0 siblings, 2 replies; 15+ messages in thread
From: Artem Mygaiev @ 2016-11-12 12:04 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Oleks Tysh, Andrew Cooper, embedded-pv-devel, xen-devel, Andrii Anisov

On Fri, Nov 11, 2016 at 10:43 PM, Konrad Rzeszutek Wilk
<konrad.wilk@oracle.com> wrote:
> Does this also mean that the hypervisor has to know the co-processors?
> As in how to start/stop them? And how to tell them to save/restore
> guest context? Or is there some generic specification for doing this?

Unfortunately there is no single way to switch context on
co-processors, so yes, the hypervisor has to know the co-processors.
The situation is not as bad as having a full-scope driver (which is
implemented in some proprietary hypervisors); we only need to:
1. stop
2. flush registers
3. switch memory context <--- implemented by SMMU in ARM
4. restore registers
5. start
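
As a rough illustration of the five steps above (this is not code from the
thread; every name is hypothetical and the hardware is modelled as a plain
struct rather than real registers or a real SMMU), the switch could be
sketched in C as:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define COPROC_NREGS 4

/* Toy model of a physical coprocessor: a few registers plus the memory
 * context (the SMMU configuration in the ARM case) it currently uses. */
struct coproc_hw {
    bool running;
    uint32_t regs[COPROC_NREGS];
    int mem_ctx;
};

/* Per-domain state kept while that domain is off the coprocessor. */
struct vcoproc_ctx {
    uint32_t saved_regs[COPROC_NREGS];
    int mem_ctx;
};

/* Switch the coprocessor from the context of 'prev' to that of 'next'. */
static void coproc_ctx_switch(struct coproc_hw *hw, struct vcoproc_ctx *prev,
                              const struct vcoproc_ctx *next)
{
    assert(hw && prev && next);

    hw->running = false;                                   /* 1. stop */
    memcpy(prev->saved_regs, hw->regs, sizeof(hw->regs));  /* 2. flush registers */
    hw->mem_ctx = next->mem_ctx;                           /* 3. switch memory context */
    memcpy(hw->regs, next->saved_regs, sizeof(hw->regs));  /* 4. restore registers */
    hw->running = true;                                    /* 5. start */
}
```

On real hardware each numbered line would be a device-specific operation,
which is why the hypervisor still has to know the co-processor.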

Best regards,
Artem Mygaiev

* Re: [RFC] Shared coprocessor framework
  2016-11-12 12:04         ` Artem Mygaiev
@ 2016-11-15 19:23           ` Konrad Rzeszutek Wilk
  2016-11-16 13:42             ` Andrii Anisov
  2016-11-16 12:39           ` Andrii Anisov
  1 sibling, 1 reply; 15+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-11-15 19:23 UTC (permalink / raw)
  To: Artem Mygaiev
  Cc: Oleks Tysh, Andrew Cooper, embedded-pv-devel, xen-devel, Andrii Anisov

On Sat, Nov 12, 2016 at 02:04:25PM +0200, Artem Mygaiev wrote:
> On Fri, Nov 11, 2016 at 10:43 PM, Konrad Rzeszutek Wilk
> <konrad.wilk@oracle.com> wrote:
> > Does this also mean that the hypervisor has to know the co-processors?
> > As in how to start/stop them? And how to tell them to save/restore
> > guest context? Or is there some generic specification for doing this?
> 
> Unfortunately there is no single way to switch context on
> co-processors, so yes, the hypervisor has to know the co-processors.
> The situation is not as bad as having a full-scope driver (which is
> implemented in some proprietary hypervisors); we only need to:
> 1. stop
> 2. flush registers
> 3. switch memory context <--- implemented by SMMU in ARM
> 4. restore registers
> 5. start

So it looks like there could be a generic API to deal with
these various operations. And I think you are thinking of hooking
it up to the scheduler so that when a guest switches you can
follow with that (similar to how Intel IOMMU (VT-d) Posted Interrupts
are done). But I think the design also mentioned asynchronous
jobs, so there may be situations where there is a doorbell
to wake up a guest?

But I think in my x86-poisoned PoV this is similar to a
PCIe device that has its own MMU.

Which brings up some more questions - how do we erect the barriers
such that this "coprocessor" does not destabilize the system
in case the firmware on the "coprocessors" ends up blowing up?

Are there some other operations to allow the coprocessors
only to touch specific memory regions?

Thanks.
> 
> Best regards,
> Artem Mygaiev

* Re: [RFC] Shared coprocessor framework
  2016-11-12 12:04         ` Artem Mygaiev
  2016-11-15 19:23           ` Konrad Rzeszutek Wilk
@ 2016-11-16 12:39           ` Andrii Anisov
  2016-11-22  1:50             ` Stefano Stabellini
  1 sibling, 1 reply; 15+ messages in thread
From: Andrii Anisov @ 2016-11-16 12:39 UTC (permalink / raw)
  To: Artem Mygaiev; +Cc: Oleks Tysh, Andrew Cooper, embedded-pv-devel, xen-devel

AM> The situation is not as bad as having a full-scope driver (which is
AM> implemented in some proprietary hypervisors), we only need to:
AM> 1. stop
AM> 2. flush registers
AM> 3. switch memory context <--- implemented by SMMU in ARM
AM> 4. restore registers
AM> 5. start

Well, we also need to take care of the following:

AA> Due to the fact that some domain assigned a vcoproc could access coproc when
AA> it is running another domain context, framework will implement iomem access
AA> emulation for domains which are not provided coproc at the moment of access.

Sincerely,
Andrii Anisov.


On Sat, Nov 12, 2016 at 2:04 PM, Artem Mygaiev <joculator@gmail.com> wrote:
> On Fri, Nov 11, 2016 at 10:43 PM, Konrad Rzeszutek Wilk
> <konrad.wilk@oracle.com> wrote:
>> Does this also mean that the hypervisor has to know the co-processors?
>> As in how to start/stop them? And how to tell them to save/restore
>> guest context? Or is there some generic specification for doing this?
>
> Unfortunately there is no single way to switch context on
> co-processors, so yes, the hypervisor has to know the co-processors.
> The situation is not as bad as having a full-scope driver (which is
> implemented in some proprietary hypervisors); we only need to:
> 1. stop
> 2. flush registers
> 3. switch memory context <--- implemented by SMMU in ARM
> 4. restore registers
> 5. start
>
> Best regards,
> Artem Mygaiev

* Re: [RFC] Shared coprocessor framework
  2016-11-15 19:23           ` Konrad Rzeszutek Wilk
@ 2016-11-16 13:42             ` Andrii Anisov
  0 siblings, 0 replies; 15+ messages in thread
From: Andrii Anisov @ 2016-11-16 13:42 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Oleks Tysh, Andrew Cooper, embedded-pv-devel, xen-devel, Artem Mygaiev

> So it looks like there could be a generic API to deal with
> these various operations.

Currently it is designed to be a generic API with platform
(coprocessor) specific hooks.
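
To make the split concrete, here is a minimal sketch of a generic core that
only calls through a hook table. The structure and function names are
hypothetical (they are not the framework's actual interface), and the
"platform" part is a toy whose whole hardware state is one integer:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hook table filled in by coprocessor-specific code. */
struct coproc_ops {
    int (*ctx_save)(void *hw, void *ctx);
    int (*ctx_restore)(void *hw, const void *ctx);
};

struct coproc_dev {
    const char *name;
    const struct coproc_ops *ops;
    void *hw;                       /* driver-private hardware state */
};

/* Generic part: knows nothing about the hardware, only the hooks. */
static int coproc_switch(struct coproc_dev *dev, void *prev_ctx,
                         const void *next_ctx)
{
    int rc;

    assert(dev != NULL && dev->ops != NULL);
    rc = dev->ops->ctx_save(dev->hw, prev_ctx);
    if (rc != 0)
        return rc;                  /* on failure, leave the old context loaded */
    return dev->ops->ctx_restore(dev->hw, next_ctx);
}

/* Toy platform-specific part. */
static int toy_save(void *hw, void *ctx)
{
    *(int *)ctx = *(int *)hw;
    return 0;
}

static int toy_restore(void *hw, const void *ctx)
{
    *(int *)hw = *(const int *)ctx;
    return 0;
}

static const struct coproc_ops toy_ops = {
    .ctx_save = toy_save,
    .ctx_restore = toy_restore,
};
```

A GPU or DSP implementation would supply its own table in place of toy_ops
while coproc_switch() stays unchanged.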

> And I think you are thinking to hook it up to the scheduler so that when a guest
> switches you can follow with that

If you mean hooking into the vcpu scheduler, then no, we will not. An SMP
system can run vcpus from different domains at the same
time, so we cannot rely on which domain is currently running.
vcoproc scheduling will be independent of vcpu scheduling.
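
A minimal sketch of what an independent vcoproc scheduler could look like,
assuming one run queue per physical coprocessor and a plain round-robin
policy. Both the policy and all names are assumptions for illustration; the
RFC only says the algorithm is configurable per instance:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_VCOPROCS 8

struct vcoproc {
    int domid;
    bool runnable;              /* has work queued for the coprocessor */
};

/* One run queue per physical coprocessor, separate from any vcpu run queue. */
struct coproc_sched {
    struct vcoproc *q[MAX_VCOPROCS];
    size_t count;
    size_t cur;                 /* index of the vcoproc that ran last */
};

/* Round-robin: return the next runnable vcoproc after the current one,
 * or NULL if nothing is runnable. */
static struct vcoproc *coproc_sched_pick(struct coproc_sched *s)
{
    assert(s != NULL && s->count <= MAX_VCOPROCS);

    for (size_t i = 1; i <= s->count; i++) {
        size_t idx = (s->cur + i) % s->count;

        if (s->q[idx]->runnable) {
            s->cur = idx;
            return s->q[idx];
        }
    }
    return NULL;
}
```

Nothing here looks at which vcpu is running on which CPU; the decision
depends only on the state of the vcoprocs attached to this coprocessor.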

> But I think the design also mentioned asynchronous jobs so there may be situations
> where there is a doorbell to wake up a guest?
I'm not really sure I've got the question. Coprocessor-generated
interrupts would be routed to the domain whose context is running on
the coprocessor at the moment of the interrupt.
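
In other words, the routing decision is just a lookup of the currently loaded
context. A small sketch, with hypothetical names and a counter standing in for
the actual virtual interrupt injection:

```c
#include <assert.h>
#include <stddef.h>

struct vcoproc_irq_ctx {
    int domid;
    unsigned int pending_irqs;  /* stand-in for injecting a virtual IRQ */
};

struct coproc_irq_state {
    struct vcoproc_irq_ctx *current;    /* context loaded on the hardware, or NULL */
};

/* Called when the physical coprocessor raises an interrupt.
 * Returns the domain id it was delivered to, or -1 if no context is loaded
 * (what to do in that case is left open here). */
static int coproc_route_irq(struct coproc_irq_state *hw)
{
    assert(hw != NULL);

    if (hw->current == NULL)
        return -1;
    hw->current->pending_irqs++;
    return hw->current->domid;
}
```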

> But I think in my x86-poisoned PoV this is similar to a
> PCIe device that has its own MMU.
I'm sorry, I cannot compare. I know little about the x86 world.

> Which brings up some more questions - how do we erect the barriers
> such that this "coprocessor" does not destabilize the system
> in case the firmware on the "coprocessors" ends up blowing up?
>
> Are there some other operations to allow the coprocessors
> only to touch specific memory regions?
The IOMMU should take care of that. IOMMU setup is planned to be a part of
the "context", so the framework will take care of switching it.

Sincerely,
Andrii Anisov.

* Re: [RFC] Shared coprocessor framework
  2016-11-16 12:39           ` Andrii Anisov
@ 2016-11-22  1:50             ` Stefano Stabellini
  2016-11-23 13:54               ` Andrii Anisov
  0 siblings, 1 reply; 15+ messages in thread
From: Stefano Stabellini @ 2016-11-22  1:50 UTC (permalink / raw)
  To: Andrii Anisov
  Cc: Oleks Tysh, Andrew Cooper, embedded-pv-devel, xen-devel, Artem Mygaiev

Hi Andrii,

Thank you for the document, I think it is a very good start. I also see the
need for this framework. Please add more details about the proposed
interface (Xen API, hypercalls, etc) in the next version; I am looking
forward to it.


On Wed, 16 Nov 2016, Andrii Anisov wrote:
> AM> The situation is not as bad as having a full-scope driver (which is
> AM> implemented in some proprietary hypervisors), we only need to:
> AM> 1. stop
> AM> 2. flush registers
> AM> 3. switch memory context <--- implemented by SMMU in ARM
> AM> 4. restore registers
> AM> 5. start
> 
> Well, we also need to take care of the following:
> 
> AA> Due to the fact that some domain assigned a vcoproc could access coproc when
> AA> it is running another domain context, framework will implement iomem access
> AA> emulation for domains which are not provided coproc at the moment of access.

This is certainly going to be the hardest part. I take it the framework is
just going to provide a generic API for implementing a coprocessor
emulator and it is going to be up to each coprocessor implementation to
provide the code.

Is the emulator going to live in the Xen hypervisor?

It would be nice to provide a simple coprocessor example, if you have one.


> On Sat, Nov 12, 2016 at 2:04 PM, Artem Mygaiev <joculator@gmail.com> wrote:
> > On Fri, Nov 11, 2016 at 10:43 PM, Konrad Rzeszutek Wilk
> > <konrad.wilk@oracle.com> wrote:
> >> Does this also mean that the hypervisor has to know the co-processors?
> >> As in how to start/stop them? And how to tell them to save/restore
> >> guest context? Or is there some generic specification for doing this?
> >
> > Unfortunately there is no single way to switch context on
> > co-processors, so yes, the hypervisor has to know the co-processors.
> > The situation is not as bad as having a full-scope driver (which is
> > implemented in some proprietary hypervisors); we only need to:
> > 1. stop
> > 2. flush registers
> > 3. switch memory context <--- implemented by SMMU in ARM
> > 4. restore registers
> > 5. start
> >
> > Best regards,
> > Artem Mygaiev
> 

* Re: [RFC] Shared coprocessor framework
  2016-11-22  1:50             ` Stefano Stabellini
@ 2016-11-23 13:54               ` Andrii Anisov
  2016-11-23 18:56                 ` Stefano Stabellini
  0 siblings, 1 reply; 15+ messages in thread
From: Andrii Anisov @ 2016-11-23 13:54 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Oleks Tysh, Andrew Cooper, embedded-pv-devel, xen-devel, Artem Mygaiev

Stefano,

Please see my answers below:
>
> Thank you for the document, I think it is a very good start. I also see the
> need for this framework. Please add more details about the proposed
> interface (Xen API, hypercalls, etc) in the next version; I am looking
> forward to it.

We will come up with the document update once we have it agreed internally.

>> AA> Due to the fact that some domain assigned a vcoproc could access coproc when
>> AA> it is running another domain context, framework will implement iomem access
>> AA> emulation for domains which are not provided coproc at the moment of access.
>
> This is certainly going to be the hardest part. I take it the framework is
> just going to provide a generic API for implementing a coprocessor
> emulator and it is going to be up to each coprocessor implementation to
> provide the code.

This piece, together with the context switching logic, is definitely
platform-specific, and its complexity could differ from
coprocessor to coprocessor.
Register access emulation for a non-running vcoproc is expected to be
not very complex in our case:
    - return the saved context on register reads
    - queue writes to be executed when switching context to that vcoproc
    - rare, more complex corner cases
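
A minimal sketch of the first two cases, assuming a small fixed register file
and a bounded write queue. All names and sizes are hypothetical, and a real
implementation would sit behind the hypervisor's MMIO trap handling:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VCOPROC_NREGS      4
#define VCOPROC_MAX_WRITES 16

struct pending_write {
    size_t reg;
    uint32_t val;
};

struct vcoproc_emul {
    uint32_t saved_regs[VCOPROC_NREGS];             /* image from last switch-out */
    struct pending_write wq[VCOPROC_MAX_WRITES];    /* writes trapped while off the hardware */
    size_t nr_writes;
};

/* Trapped read from a domain whose vcoproc is not on the hardware:
 * answer from the saved context. */
static uint32_t vcoproc_emul_read(const struct vcoproc_emul *v, size_t reg)
{
    assert(reg < VCOPROC_NREGS);
    return v->saved_regs[reg];
}

/* Trapped write: queue it for later. Returns -1 if the queue is full. */
static int vcoproc_emul_write(struct vcoproc_emul *v, size_t reg, uint32_t val)
{
    assert(reg < VCOPROC_NREGS);
    if (v->nr_writes == VCOPROC_MAX_WRITES)
        return -1;
    v->wq[v->nr_writes++] = (struct pending_write){ reg, val };
    return 0;
}

/* On switch-in: load the saved image, then replay queued writes in order. */
static void vcoproc_switch_in(struct vcoproc_emul *v,
                              uint32_t hw_regs[VCOPROC_NREGS])
{
    for (size_t i = 0; i < VCOPROC_NREGS; i++)
        hw_regs[i] = v->saved_regs[i];
    for (size_t i = 0; i < v->nr_writes; i++)
        hw_regs[v->wq[i].reg] = v->wq[i].val;
    v->nr_writes = 0;
}
```

Note that in this sketch a read that follows a queued write to the same
register still returns the old saved value; whether that is acceptable
depends on the register, and is one example of the corner cases mentioned
above.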

> Is the emulator going to live in the Xen hypervisor?
That is the idea.

> It would be nice to provide a simple coprocessor example, if you have one.
I'm not really sure about a simple functional example. We are
targeting GPU sharing for the first drop.

Sincerely,
Andrii Anisov.

* Re: [RFC] Shared coprocessor framework
  2016-11-23 13:54               ` Andrii Anisov
@ 2016-11-23 18:56                 ` Stefano Stabellini
  2016-11-23 19:07                   ` Andrii Anisov
  2016-11-23 19:23                   ` Andrii Anisov
  0 siblings, 2 replies; 15+ messages in thread
From: Stefano Stabellini @ 2016-11-23 18:56 UTC (permalink / raw)
  To: Andrii Anisov
  Cc: Stefano Stabellini, Andrew Cooper, Oleks Tysh, embedded-pv-devel,
	xen-devel, Artem Mygaiev

On Wed, 23 Nov 2016, Andrii Anisov wrote:
> Stefano,
> 
> Please see my answers below:
> >
> > Thank you for the document, I think it is a very good start. I also see the
> > need for this framework. Please add more details about the proposed
> > interface (Xen API, hypercalls, etc) in the next version; I am looking
> > forward to it.
> 
> We will come up with the document update once we have it agreed internally.
> 
> >> AA> Due to the fact that some domain assigned a vcoproc could access coproc when
> >> AA> it is running another domain context, framework will implement iomem access
> >> AA> emulation for domains which are not provided coproc at the moment of access.
> >
> > This is certainly going to be the hardest part. I take it the framework is
> > just going to provide a generic API for implementing a coprocessor
> > emulator and it is going to be up to each coprocessor implementation to
> > provide the code.
> 
> This piece, together with the context switching logic, is definitely
> platform-specific, and its complexity could differ from
> coprocessor to coprocessor.
> Register access emulation for a non-running vcoproc is expected to be
> not very complex in our case:
>     - return the saved context on register reads
>     - queue writes to be executed when switching context to that vcoproc
>     - rare, more complex corner cases
> 
> > Is the emulator going to live in the Xen hypervisor?
> That is the idea.
> 
> > It would be nice to provide a simple coprocessor example, if you have one.
> I'm not really sure about a simple functional example. We are
> targeting GPU sharing for the first drop.
 
I was thinking of something trivial but enough to prove the point.
Something like a very simple accelerator, maybe a data copy accelerator.
A GPU is certainly not trivial :-)

* Re: [RFC] Shared coprocessor framework
  2016-11-23 18:56                 ` Stefano Stabellini
@ 2016-11-23 19:07                   ` Andrii Anisov
  2016-11-23 19:23                   ` Andrii Anisov
  1 sibling, 0 replies; 15+ messages in thread
From: Andrii Anisov @ 2016-11-23 19:07 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Oleks Tysh, Andrew Cooper, embedded-pv-devel, xen-devel, Artem Mygaiev

> I was thinking of something trivial but enough to prove the point.
> Something like a very simple accelerator, maybe a data copy accelerator.
> A GPU is certainly not trivial :-)
Indeed.
But we still have targets to reach and a shortage of resources to spend
on simple examples ;)

Sincerely,
Andrii Anisov.

* Re: [RFC] Shared coprocessor framework
  2016-11-23 18:56                 ` Stefano Stabellini
  2016-11-23 19:07                   ` Andrii Anisov
@ 2016-11-23 19:23                   ` Andrii Anisov
  1 sibling, 0 replies; 15+ messages in thread
From: Andrii Anisov @ 2016-11-23 19:23 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Oleks Tysh, Andrew Cooper, embedded-pv-devel, xen-devel, Artem Mygaiev

> I was thinking of something trivial but enough to prove the point.
It is already implemented in a hack'n'slash way.
So we are pretty confident in the approach and looking forward to making
a generic and scalable implementation.
And an upstreamable one, of course.

Sincerely,
Andrii Anisov.

Thread overview: 15+ messages
2016-10-28 17:53 [RFC] Shared coprocessor framework Andrii Anisov
2016-10-28 18:38 ` Andrew Cooper
2016-10-31 19:31   ` Andrii Anisov
2016-11-01 13:57     ` Artem Mygaiev
2016-11-03 11:31       ` Andrii Anisov
2016-11-11 20:43       ` Konrad Rzeszutek Wilk
2016-11-12 12:04         ` Artem Mygaiev
2016-11-15 19:23           ` Konrad Rzeszutek Wilk
2016-11-16 13:42             ` Andrii Anisov
2016-11-16 12:39           ` Andrii Anisov
2016-11-22  1:50             ` Stefano Stabellini
2016-11-23 13:54               ` Andrii Anisov
2016-11-23 18:56                 ` Stefano Stabellini
2016-11-23 19:07                   ` Andrii Anisov
2016-11-23 19:23                   ` Andrii Anisov
