* [Draft C] Xen on ARM vITS Handling
@ 2015-05-27 11:48 Ian Campbell
  2015-05-27 16:44 ` Vijay Kilari
  2015-05-29 13:40 ` Julien Grall
  0 siblings, 2 replies; 15+ messages in thread
From: Ian Campbell @ 2015-05-27 11:48 UTC (permalink / raw)
  To: xen-devel; +Cc: manish.jaggi, Julien Grall, Stefano Stabellini, Vijay Kilari

Here follows draft C based on previous feedback.

Also at:

http://xenbits.xen.org/people/ianc/vits/draftC.{pdf,html}

I think I've captured most of the previous discussion, except where
explicitly noted by XXX or in other replies, but please do point out
places where I've missed something.

One area where I am pretty sure I've dropped the ball is on the
completion and update of `CREADR`. That conversation ended up
bifurcating along the 1:N vs N:N mapping scheme lines, and I didn't
manage to get the various proposals straight. Since we've now agreed on
N:N hopefully we can reach a conclusion (no pun intended) on the
completion aspect too (sorry that this probably means rehashing at
least a subset of the previous thread).

Ian.

% Xen on ARM vITS Handling
% Ian Campbell <ian.campbell@citrix.com>
% Draft C

# Changelog

## Since Draft B

* Details of command translation (thanks to Julien and Vijay)
* Added background on LPI Translation and Pending tables
* Added background on Collections
* Settled on `N:N` scheme for vITS:pITS mapping.
* Rejigged section nesting a bit.
* Since we now think translation should be cheap, settled on
  translation at scheduling time.
* Lazy `INVALL` and `SYNC`

## Since Draft A

* Added discussion of when/where command translation occurs.
* Contention on scheduler lock, suggestion to use SOFTIRQ.
* Handling of domain shutdown.
* More detailed discussion of multiple vs single vits pros/cons.

# Introduction

ARM systems containing a GIC version 3 or later may contain one or
more ITS logical blocks. An ITS is used to route Message Signalled
Interrupts from devices to an LPI injection on the processor.

The following summarises the ITS hardware design and serves as a set
of assumptions for the vITS software design. (XXX it is entirely
possible I've horribly misunderstood how this stuff fits
together). For full details of the ITS see the "GIC Architecture
Specification".

## Device Identifiers

Each device using the ITS is associated with a unique identifier.

The device IDs are typically described via system firmware, e.g. the
ACPI IORT table or via device tree.

The number of device ids is variable and can be discovered via
`GITS_TYPER.Devbits`. This field allows an ITS to support up to 2^32
devices.

## Interrupt Collections

Each interrupt is a member of an Interrupt Collection. This allows
software to manage large numbers of physical interrupts with a small
number of commands rather than issuing one command per interrupt.

On a system with N processors, the ITS must provide at least N+1
collections.

## Target Addresses

The Target Address corresponds to a specific GIC re-distributor. The
format of this field depends on the value of the `GITS_TYPER.PTA` bit:

* 1: the base address of the re-distributor target is used
* 0: a unique processor number is used. The mapping between the
  processor affinity value (`MPIDR`) and the processor number is
  discoverable via `GICR_TYPER.ProcessorNumber`.

## ITS Translation Table

Message signalled interrupts are translated into an LPI via an ITS
translation table which must be configured for each device which can
generate an MSI.

The ITS translation table maps the device id of the originating device
into an Interrupt Collection and then into a target address.

## ITS Configuration

The ITS is configured and managed, including establishing and
configuring a Translation Table for each device, via an in-memory ring
shared between the CPU and the ITS controller. The ring is managed via
the `GITS_CBASER` register and indexed by the `GITS_CWRITER` and
`GITS_CREADR` registers.

A processor adds commands to the shared ring and then updates
`GITS_CWRITER` to make them visible to the ITS controller.

The ITS controller processes commands from the ring and then updates
`GITS_CREADR` to indicate to the processor that the command has been
processed.

Commands are processed sequentially.

Commands sent on the ring include operational commands:

* Routing interrupts to processors;
* Generating interrupts;
* Clearing the pending state of interrupts;
* Synchronising the command queue

and maintenance commands:

* Map device/collection/processor;
* Map virtual interrupt;
* Clean interrupts;
* Discard interrupts;

The field `GITS_CBASER.Size` encodes the number of 4KB pages, minus
one, comprising the command queue. This field is 8 bits, which means
the maximum size is 2^8 * 4KB = 1MB. Given that each command is 32
bytes, there is a maximum of 32768 commands in the queue.
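The arithmetic above can be checked with a small helper (a sketch;
`cq_bytes` and `cq_slots` are hypothetical names):

```c
#include <assert.h>
#include <stdint.h>

/* GITS_CBASER.Size encodes the number of 4KB pages minus one. */
static uint32_t cq_bytes(uint8_t size_field)
{
    return ((uint32_t)size_field + 1) * 4096;
}

/* Each ITS command occupies 32 bytes. */
static uint32_t cq_slots(uint8_t size_field)
{
    return cq_bytes(size_field) / 32;
}
```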

The ITS provides no specific completion notification
mechanism. Completion is monitored by a combination of a `SYNC`
command and either polling `GITS_CREADR` or notification via an
interrupt generated via the `INT` command.

Note that the interrupt generation via `INT` requires an originating
device ID to be supplied (which is then translated via the ITS into an
LPI). No specific device ID is defined for this purpose and so the OS
software is expected to fabricate one.

Possible ways of inventing such a device ID are:

* Enumerate all device ids in the system and pick another one;
* Use a PCI BDF associated with a non-existent device function (such
  as an unused one relating to the PCI root-bridge) and translate that
  (via firmware tables) into a suitable device id;
* ???

## LPI Configuration Table

Each LPI has an associated configuration byte in the LPI Configuration
Table (managed via the GIC Redistributor and placed at
`GICR_PROPBASER` or `GICR_VPROPBASER`). This byte configures:

* The LPI's priority;
* Whether the LPI is enabled or disabled.

Software updates the Configuration Table directly but must then issue
an invalidate command (per-device `INV` ITS command, global `INVALL`
ITS command or a write to `GICR_INVLPIR`) for the effect to be
guaranteed to become visible (possibly requiring an ITS `SYNC` command
to ensure completion of the `INV` or `INVALL`). Note that it is valid
for an implementation to reread the configuration table at any time
(IOW it is _not_ guaranteed that a change to the LPI Configuration
Table won't be visible until an invalidate is issued).
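As a concrete sketch of building such a configuration byte, assuming
the GICv3 layout (bit 0 enable, bit 1 reserved, bits [7:2] priority):

```c
#include <assert.h>
#include <stdint.h>

#define LPI_CFG_ENABLE (1u << 0)

/* Build an LPI Configuration Table byte: priority in bits [7:2],
 * enable in bit 0, bit 1 left as zero (reserved). */
static uint8_t lpi_cfg_byte(uint8_t priority, int enabled)
{
    return (uint8_t)((priority & 0xfc) | (enabled ? LPI_CFG_ENABLE : 0));
}
```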

## LPI Pending Table

Each LPI also has an associated bit in the LPI Pending Table (managed
by the GIC redistributor). This bit signals whether the LPI is pending
or not.

# vITS

A guest domain which is allowed to use ITS functionality (i.e. has
been assigned pass-through devices which can generate MSIs) will be
presented with a virtualised ITS.

Accesses to the vITS registers will trap to Xen and be emulated, and a
virtualised Command Queue will be provided.

Commands entered onto the virtual Command Queue will be translated
into physical commands, as described later in this document.

XXX there are other aspects to virtualising the ITS (LPI collection
management, assignment of LPI ranges to guests, device
management). However these are not currently considered here. XXX
Should they be/do they need to be?

# Requirements

Emulation should not block in the hypervisor for extended periods. In
particular Xen should not busy wait on the physical ITS. Doing so
blocks the physical CPU from doing anything else (such as scheduling
other VCPUs).

There may be multiple guests which have a vITS, all targeting the same
underlying pITS. A single guest VCPU should not be able to monopolise
the pITS via its vITS and all guests should be able to make forward
progress.

# vITS to pITS mapping

A physical system may have multiple physical ITSs.

We assume that a given device is only associated with one pITS.

A guest which is given access to multiple devices associated with
multiple pITSs will need to be given virtualised access to all
associated pITSs.

There are several possible models for achieving this:

* `1:N`: One virtual ITS tied to multiple physical ITSs.
* `N:N`: One virtual ITS per physical ITS.
* `M:N`: Multiple virtual ITS tied to a differing number of physical ITSs.

This design assumes an `N:N` model, which is thought to be simpler on
the Xen side since it avoids questions of how to fairly schedule
commands in the `1:N` model while avoiding starvation as well as
simplifying the virtualisation of global commands such as `INVALL` or
`SYNC`.

The `N:N` model is also a better fit for I/O NUMA systems.

Since the choice of model is internal to the hypervisor/tools and is
communicated to the guest via firmware tables we are not tied to this
model as an ABI if we decide to change.

New toolstack domctls or extension to existing domctls will likely be
required to allow the toolstack to determine the number of vITS which
will be required for the guest and to determine the mapping for
passed-through devices.

# LPI Configuration Table Virtualisation

A guest's write accesses to its LPI Configuration Table (which is just
an area of guest RAM which the guest has nominated) will be trapped to
the hypervisor, using stage 2 MMU permissions, in order for changes to
be propagated into the physical LPI Configuration Table.

A host wide LPI dirty bit map, with 1 bit per LPI, will be maintained
which indicates whether an update to the physical LPI Configuration
Table has been flushed (via an invalidate command). The corresponding
bit will be set whenever a guest changes the configuration of an LPI.

This dirty bit map will be used during the handling of relevant ITS
Commands (`INV`, `INVALL` etc).

Note that no invalidate is required during the handling of an LPI
Configuration Table trap.

# Command Queue Virtualisation

The command queue of each vITS is represented by a data structure:

    struct vits_cq {
        list_head schedule_list; /* Queued onto pits.schedule_list */
        uint32_t creadr;         /* Virtual creadr */
        uint32_t cwriter;        /* Virtual cwriter */
        uint32_t progress;       /* Index of last command queued to pits */
        [ Reference to command queue memory ]
    };

Each pITS has an associated data structure:

    struct pits {
        list_head schedule_list; /* Contains list of vits_cq.schedule_lists */
        uint32_t last_creadr;
    };

On write to the virtual `CWRITER` the cwriter field is updated and, if
that results in there being new outstanding requests, the vits_cq is
enqueued onto the pITS' schedule_list (unless it is already there).

On read from the virtual `CREADR`, iff the vits_cq is such that
commands are outstanding, a scheduling pass is attempted (in order
to update `vits_cq.creadr`). The current value of `vits_cq.creadr` is
then returned.
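A minimal sketch of the `CWRITER` trap handling described above (names
are hypothetical; the `scheduled` flag stands in for membership of
`pits.schedule_list`, and locking is elided):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct vits_cq_sketch {
    uint32_t creadr, cwriter, progress;
    bool scheduled; /* stands in for being on pits.schedule_list */
};

/* Commands are outstanding when the virtual write index has moved
 * past the virtual read index. */
static bool vits_cq_pending(const struct vits_cq_sketch *v)
{
    return v->cwriter != v->creadr;
}

/* Emulate a guest write to the virtual CWRITER register. */
static void vits_cwriter_write(struct vits_cq_sketch *v, uint32_t val)
{
    v->cwriter = val;
    if (vits_cq_pending(v) && !v->scheduled)
        v->scheduled = true; /* enqueue onto the pITS schedule_list */
}
```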

## Command translation

In order to virtualise the Command Queue each command must be
translated (this is described in the GIC spec).

Translation of certain commands is potentially expensive, however we
will attempt to arrange things (data structures etc) such that the
overhead at translation time is minimised (see later).

Translation can be done in two places:

* During scheduling.
* On write to `CWRITER`, into a per `vits_cq` queue which the
  scheduler then propagates to the pits.

Doing the translate during scheduling means that potentially expensive
operations may be accounted to `current`, who may have nothing to do
with those operations (this is true whether it is IRQ context or
SOFTIRQ context).

Doing the translate during `CWRITER` emulation accounts it to the
right place, but introduces a potentially long synchronous operation
which ties down a VCPU. Introducing batching here means we have
essentially the same issue wrt when to replenish the translated queue
as doing translate during scheduling.

Translate during `CWRITER` also has memory overheads. Unclear if they
are at a problematic scale or not.

Since we have arranged for translation overheads to be minimised it
seems that translation during scheduling should be tolerable.

## pITS Scheduling

A pITS scheduling pass is attempted:

* On write to any virtual `CWRITER` iff that write results in there
  being new outstanding requests for that vits;
* On read from a virtual `CREADR` iff there are commands outstanding
  on that vits;
* On receipt of an interrupt notification arising from Xen's own use
  of `INT`; (see discussion under Completion)
* On any interrupt injection arising from a guest's use of the `INT`
  command; (XXX perhaps, see discussion under Completion)

This may result in lots of contention on the scheduler
locking. Therefore we consider that in each case all that happens is
the triggering of a softirq, which will be processed on return to the
guest, and just once even for multiple events.

Such deferral could be considered OK (XXX ???) for the `CREADR` case
because at worst the value read will be one cycle out of date. A guest
which receives an `INT` notification might reasonably expect a
subsequent read of `CREADR` to reflect that. However that should be
covered by the softirq processing which would occur on entry to the
guest to inject the `INT`.

Each scheduling pass will:

* Read the physical `CREADR`;
* For each command between `pits.last_creadr` and the new `CREADR`
  value process completion of that command and update the
  corresponding `vits_cq.creadr`.
* Attempt to refill the pITS Command Queue (see below).
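The completion walk in the first two steps can be sketched as follows
(indices advance in 32-byte command slots and wrap at the queue size;
names are hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Advance a command queue offset by one 32-byte command, wrapping. */
static uint32_t cq_next(uint32_t offset, uint32_t qsize_bytes)
{
    return (offset + 32) % qsize_bytes;
}

/* Count commands completed between the previously seen CREADR
 * (pits.last_creadr) and the newly read CREADR value. */
static uint32_t cq_completed(uint32_t last_creadr, uint32_t creadr,
                             uint32_t qsize_bytes)
{
    uint32_t n = 0, off = last_creadr;
    while (off != creadr) {
        off = cq_next(off, qsize_bytes);
        n++; /* here the real code would update the owning vits_cq.creadr */
    }
    return n;
}
```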

## Domain Shutdown

We can't free a `vits_cq` while it has things on the physical command
queue, and we cannot cancel things which are on that queue.

So we must wait.

Obviously we don't enqueue anything new onto the pITS if `d->is_dying`.

`domain_relinquish_resources()` waits (somehow, with suitable
continuations etc) for anything which the `vits_cq` has outstanding to
be completed so that the datastructures can be cleared.

## Filling the pITS Command Queue.

Various algorithms could be used here. For now a simple proposal is
to traverse the `pits.schedule_list` starting from where the last
refill finished (i.e. not from the top of the list each time).

In order to simplify bookkeeping and to bound the amount of time spent
on a single scheduling pass, each `vits_cq` will only have a single
batch of commands enqueued with the pITS at a time.

If a `vits_cq` has no pending commands then it is removed from the
list.

If a `vits_cq` already has commands enqueued with the pITS Command
Queue then it is skipped.

If a `vits_cq` has some pending commands then `min(pits-free-slots,
vits-outstanding, VITS_BATCH_SIZE)` will be taken from the vITS
command queue, translated and placed onto the pITS
queue. `vits_cq.progress` will be updated to reflect this.
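The batch-size computation above is just a three-way minimum (a
sketch; the value of `VITS_BATCH_SIZE` here is illustrative only):

```c
#include <assert.h>
#include <stdint.h>

#define VITS_BATCH_SIZE 8 /* illustrative; final value TBD */

/* How many commands to take from a vits_cq on this refill. */
static uint32_t refill_batch(uint32_t pits_free_slots,
                             uint32_t vits_outstanding)
{
    uint32_t n = pits_free_slots < vits_outstanding ? pits_free_slots
                                                    : vits_outstanding;
    return n < VITS_BATCH_SIZE ? n : VITS_BATCH_SIZE;
}
```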

Each `vits_cq` is handled in turn in this way until the pITS Command
Queue is full, there are no more outstanding commands or each active
`vits_cq` has commands enqueued with the pITS.

There will likely need to be a data structure which shadows the pITS
Command Queue slots with references to the `vits_cq` which has a
command currently occupying that slot and the corresponding index into
the virtual command queue, for use when completing a command.

`VITS_BATCH_SIZE` should be small, TBD say 4 or 8.

## Completion

It is expected that commands will normally be completed (resulting in
an update of the corresponding `vits_cq.creadr`) via guest read from
`CREADR`. This will trigger a scheduling pass which will ensure the
`vits_cq.creadr` value is up to date before it is returned.

A guest which does completion via the use of `INT` cannot observe
`CREADR` without reading it, so updating on read from `CREADR`
suffices from the point of view of the guest's observation of the
state. (Of course we will inject the interrupt at the designated point
and the guest may well then read `CREADR`.)

However in order to keep the pITS Command Queue moving along we need
to consider what happens if there are no `INT` based events nor reads
from `CREADR` to drive completion and therefore refilling of the Queue
with other outstanding commands.

A guest which enqueues some commands and then never checks for
completion cannot itself block things because any other guest which
reads `CREADR` will drive completion. However if _no_ guest reads from
`CREADR` then completion will not occur and this must be dealt with.

Even if we include completion on `INT`-based interrupt injection then
it is possible that the pITS queue may not contain any such
interrupts, either because no guest is using them or because the
batching means that none of them are enqueued on the active ring at
the moment.

So we need a fallback to ensure that the queue keeps moving. There are
several options:

* A periodic timer in Xen which runs whenever there are outstanding
  commands in the pITS. This is simple but pretty sucky.
* Xen injects its own `INT` commands into the pITS ring. This requires
  figuring out a device ID to use.

The second option is likely to be preferable if the issue of selecting
a device ID can be addressed.

A secondary question is when these `INT` commands should be inserted
into the command stream:

* After each batch taken from a single `vits_cq`;
* After each scheduling pass;
* One active in the command stream at any given time;

The latter should be sufficient: by arranging to insert an `INT` into
the stream at the end of any scheduling pass which occurs while there
is not a currently outstanding `INT`, we have a sufficient backstop to
allow us to refill the ring.
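The "one active `INT` at a time" backstop can be sketched with a
single flag per pITS (names hypothetical):

```c
#include <assert.h>
#include <stdbool.h>

struct pits_sketch {
    bool int_outstanding;     /* a Xen-issued backstop INT is in flight */
    unsigned queued_commands; /* commands currently on the physical ring */
};

/* At the end of a scheduling pass: decide whether to append a
 * backstop INT to the physical command stream. */
static bool should_emit_backstop_int(struct pits_sketch *p)
{
    if (p->queued_commands == 0 || p->int_outstanding)
        return false;
    p->int_outstanding = true; /* cleared when the INT's interrupt fires */
    return true;
}
```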

This assumes that there is no particular benefit to keeping the
`CWRITER` rolling ahead of the pITS's actual processing. This is true
because the ITS operates on commands in the order they appear in the
queue, so there is no need to maintain a runway ahead of the ITS
processing. (XXX If this is a concern perhaps the INT could be
inserted at the head of the final batch of commands in a scheduling
pass instead of the tail).

Xen itself should never need to issue an associated `SYNC` command,
since the individual guests would need to issue those themselves when
they care. The `INT` only serves to allow Xen to enqueue new commands
when there is space on the ring; Xen has no interest in the actual
completion itself.

## Locking

It may be preferable to use `atomic_t` types for various fields
(e.g. `vits_cq.creadr`) in order to reduce the amount and scope of
locking required.

# ITS Command Translation

This section is based on section 5.13 of the GICv3 specification
(PRD03-GENC-010745 24.0). The goal is to provide insight into the cost
of emulating ITS commands in Xen.

The ITS provides 12 commands in order to manage interrupt collections,
devices and interrupts. Possible command parameters are the device ID
(`ID`), Event ID (`vID`), Collection ID (`vCID`) and Target Address
(`vTA`).

These parameters need to be validated and translated from Virtual to
Physical.

## Parameter Validation / Translation

Each command contains parameters that need to be validated before any
use in Xen or before passing them to the hardware.

### Device ID (`ID`)

This parameter is used by commands which manage a specific device and
the interrupts associated with that device. Checking if a device is
present and retrieving the data structure must be fast.

The device identifiers may not be assigned contiguously and the maximum
number is very high (2^32).

XXX In the context of virtualised device ids this may not be the case,
e.g. we can arrange for (mostly) contiguous device ids and we know the
bound is significantly lower than 2^32

Possible efficient data structures would be:

1. List: The lookup/deletion is O(n) and the cost of insertion depends
   on whether the devices should be kept sorted by identifier. The
   memory overhead is 18 bytes per element.
2. Red-black tree: All the operations are O(log(n)). The memory
   overhead is 24 bytes per element.

A Red-black tree seems more suitable for fast deviceID validation,
even though the memory overhead is a bit higher compared to the list.
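In Xen the lookup would use the existing rb-tree support; as a
self-contained illustration of the O(log n) lookup cost, binary search
over device IDs behaves the same way (all names here are
hypothetical):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct its_device_sketch {
    uint32_t devid;
    /* ... per-device state (ITT details, vlpi map, ...) ... */
};

/* O(log n) lookup by device ID over a sorted array
 * (stand-in for an rb-tree keyed by devid). */
static struct its_device_sketch *find_device(struct its_device_sketch *devs,
                                             size_t n, uint32_t devid)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (devs[mid].devid == devid)
            return &devs[mid];
        if (devs[mid].devid < devid)
            lo = mid + 1;
        else
            hi = mid;
    }
    return NULL; /* device not mapped: command is invalid */
}
```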

### Event ID (`vID`)

This is the per-device Interrupt identifier (i.e. the MSI index). It
is configured by the device driver software.

It is not necessary to translate a `vID`, however they may need to be
represented in various data structures given to the pITS.

XXX is any of this true?

### Interrupt Collection (`vCID`)

This parameter is used in commands which manage collections and
interrupts in order to move them from one CPU to another. The ITS is
only mandated to implement N + 1 collections, where N is the number of
processors on the platform (i.e. the max number of VCPUs for a given
guest). Furthermore, the identifiers are always contiguous.

If we decide to implement the strict minimum (i.e. N + 1), an array is
enough and will allow operations in O(1).
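With the strict minimum of N + 1 collections the vCID lookup is just a
bounds check plus an array index (a sketch, hypothetical names):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct vcoll_sketch {
    uint32_t vcpu_id; /* target of this collection */
};

/* Valid vCIDs are 0 .. nr_vcpus inclusive (N + 1 collections),
 * so the lookup is O(1). */
static struct vcoll_sketch *get_collection(struct vcoll_sketch *colls,
                                           uint32_t nr_vcpus, uint32_t vcid)
{
    if (vcid > nr_vcpus)
        return NULL; /* out of range: command is invalid */
    return &colls[vcid];
}
```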

XXX Could forgo array and go straight to vcpu_info/domain_info.

### Target Address (`vTA`)

This parameter is used in commands which manage collections. It is a
unique identifier per processor. The format differs according to the
value of the `GITS_TYPER.PTA` bit. The value of the field is fixed by
the ITS implementation and software has to handle both cases.

A solution with `GITS_TYPER.PTA` set to one will require some
computation in order to find the VCPU associated with the
redistributor address. It will be similar to `get_vcpu_from_rdist` in
the vGICv3 emulation (xen/arch/arm/vgic-v3.c).

On the other hand, setting `GITS_TYPER.PTA` to zero will give us
control to decide the linear processor number, which could simply be
the vcpu_id (always linear).
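With `GITS_TYPER.PTA` exposed as zero the translation then collapses
to a range check (a sketch; it assumes the processor number is simply
the linear `vcpu_id` as suggested above):

```c
#include <assert.h>
#include <stdint.h>

/* PTA == 0: the target address is a linear processor number,
 * which we choose to be the vcpu_id. */
static int vta_to_vcpu_id(uint32_t vta, uint32_t nr_vcpus)
{
    return vta < nr_vcpus ? (int)vta : -1; /* -1: invalid target */
}
```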

XXX Non-linear VCPUs e.g. via AFFR1 hierarchy?

## Command Translation

Of the existing GICv3 ITS commands, `MAPC`, `MAPD` and `MAPVI`/`MAPI`
are potentially time-consuming, as these commands create entries in
the Xen ITS structures which are used to validate other ITS commands.

`INVALL` and `SYNC` are global and potentially disruptive to other
guests and so need consideration.

All other ITS commands, like `MOVI`, `DISCARD`, `INV`, `INT` and
`CLEAR`, just validate their parameters and generate a physical
command.

### `MAPC` command translation

Format: `MAPC vCID, vTA`

- `MAPC pCID, pTA` physical ITS command is generated

### `MAPD` Command translation

Format: `MAPD device, Valid, ITT IPA, ITT Size`

`MAPD` is sent with the `Valid` bit set when a device is to be added
and with it clear when the device is removed.

If `Valid` bit is set:

- Allocate memory for `its_device` struct
- Validate ITT IPA & ITT size and update the its_device struct
- Find the number of vectors (nrvecs) for this device by querying a
  PCI helper function
- Allocate nrvecs LPIs. XXX nrvecs is a function of `ITT Size`?
- Allocate memory for `struct vlpi_map` for this device. This
  `vlpi_map` holds mapping of Virtual LPI to Physical LPI and ID.
- Find physical ITS node with which this device is associated
- Call `p2m_lookup` on ITT IPA addr and get physical ITT address
- Validate ITT Size
- Generate/format physical ITS command: `MAPD, ITT PA, ITT Size`

Here the overhead is the memory allocation for `its_device` and
`vlpi_map`.

XXX Suggestion was to preallocate some of those at device passthrough
setup time?
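The ITT sizing referred to above might be computed as follows (a
sketch; it assumes the `Size` field of `MAPD` encodes the number of
EventID bits minus one, which would bound `nrvecs`, and the per-entry
size is a hypothetical parameter):

```c
#include <assert.h>
#include <stdint.h>

/* Number of events the ITT can map: 2^(Size + 1). */
static uint64_t itt_nr_events(uint8_t size_field)
{
    return 1ULL << (size_field + 1);
}

/* Bytes of ITT memory, for a given (implementation-defined)
 * per-entry size. */
static uint64_t itt_bytes(uint8_t size_field, uint32_t entry_bytes)
{
    return itt_nr_events(size_field) * entry_bytes;
}
```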

If the `Valid` bit is not set:

- Validate that the device exists by checking the vITS device list
- Clear all `vlpis` assigned for this device
- Remove this device from vITS list
- Free memory

XXX If preallocation presumably shouldn't free here either.

### `MAPVI`/`MAPI` Command translation

Format: `MAPVI device, ID, vID, vCID`

- Validate that the device exists by checking the vITS device list
- Validate vCID and get the pCID by searching the cid_map

- If vID does not have an entry in `vlpi_entries` of this device,
  allocate a new pID from the `vlpi_map` of this device and update
  `vlpi_entries` with the new pID
- Allocate an irq descriptor and add it to the RB tree
- Call `route_irq_to_guest()` for this pID
- Generate/format physical ITS command: `MAPVI device ID, pID, pCID`

Here the overhead is allocating the physical ID, allocating memory for
the irq descriptor and routing the interrupt.

XXX Suggested to preallocate?

### `INVALL` Command translation

A physical `INVALL` is only generated if the LPI dirty bitmap has any
bits set. Otherwise it is skipped.
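The dirty-map check described above is just a scan for any set bit (a
sketch; a real implementation would use Xen's bitmap helpers and
appropriate locking):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Return true (and clear the map) iff any LPI's configuration was
 * changed since the last flush; a physical INVALL is only needed
 * in that case. */
static bool lpi_dirty_test_and_clear(uint64_t *map, size_t nwords)
{
    bool dirty = false;
    for (size_t i = 0; i < nwords; i++) {
        if (map[i]) {
            dirty = true;
            map[i] = 0;
        }
    }
    return dirty;
}
```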

XXX Perhaps bitmap should just be a simple counter?

XXX bitmap is host global, a per-domain bitmap would allow us to elide
`INVALL` unless an LPI associated with the guest making the request
was dirty. Would also need some sort of "ITS INVALL" clock in order
that other guests can elide their own `INVALL` if one has already
happened. Complexity not worth it at this stage?

### `SYNC` Command translation

Can be omitted from the physical command stream if the previous
command was also a `SYNC`, i.e. due to a guest sending a series of
`SYNC` commands, or one guest's batch ending with one and the next's
beginning with one.
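The elision rule above only needs the type of the last command emitted
onto the physical queue (a sketch, hypothetical names):

```c
#include <assert.h>
#include <stdbool.h>

enum its_cmd_sketch { CMD_SYNC, CMD_OTHER };

/* Skip a guest SYNC if the previous physical command was already
 * a SYNC: back-to-back SYNCs are redundant. */
static bool should_emit_sync(enum its_cmd_sketch last_emitted)
{
    return last_emitted != CMD_SYNC;
}
```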

XXX TBD can we do anything more? e.g. omit sync if the guest hasn't
done anything of importance since the last sync?

# GICv4 Direct Interrupt Injection

GICv4 will directly mark the LPIs pending in the virtual pending table
which is per-redistributor (i.e per-vCPU).

LPIs will be received by the guest in the same way as SPIs, i.e. trap
to IRQ mode then read `ICC_IAR1_EL1` (for GICv3).

Therefore GICv4 will not require one vITS per pITS.

# Event Channels

It has been proposed that it might be nice to inject event channels as
LPIs in the future. Whether or not that would involve any sort of vITS
is unclear, but if it did then it would likely be a separate emulation
to the vITS emulation used with a pITS and as such is not considered
further here.

# Glossary

* _MSI_: Message Signalled Interrupt
* _ITS_: Interrupt Translation Service
* _GIC_: Generic Interrupt Controller
* _LPI_: Locality-specific Peripheral Interrupt

# References

"GIC Architecture Specification" PRD03-GENC-010745 24.0


* Re: [Draft C] Xen on ARM vITS Handling
  2015-05-27 11:48 [Draft C] Xen on ARM vITS Handling Ian Campbell
@ 2015-05-27 16:44 ` Vijay Kilari
  2015-05-29 14:06   ` Julien Grall
  2015-06-01 12:11   ` Ian Campbell
  2015-05-29 13:40 ` Julien Grall
  1 sibling, 2 replies; 15+ messages in thread
From: Vijay Kilari @ 2015-05-27 16:44 UTC (permalink / raw)
  To: Ian Campbell; +Cc: manish.jaggi, Julien Grall, Stefano Stabellini, xen-devel

On Wed, May 27, 2015 at 5:18 PM, Ian Campbell <ian.campbell@citrix.com> wrote:
> Here follows draft C based on previous feedback.
>
> Also at:
>
> http://xenbits.xen.org/people/ianc/vits/draftC.{pdf,html}
>
> I think I've captured most of the previous discussion, except where
> explicitly noted by XXX or in other replies, but please do point out
> places where I've missed something.
>
> One area where I am pretty sure I've dropped the ball is on the
> completion and update of `CREADR`. That conversation ended up
> bifurcating along the 1:N vs N:N mapping scheme lines, and I didn't
> manage to get the various proposals straight. Since we've now agreed on
> N:N hopefully we can reach a conclusion (no pun intended) on the
> completion aspect too (sorry that this probably means rehasing at least
> a subset of the previous thread).
>
> Ian.
>
> % Xen on ARM vITS Handling
> % Ian Campbell <ian.campbell@citrix.com>
> % Draft C
>
> # Changelog
>
> ## Since Draft B
>
> * Details of command translation (thanks to Julien and Vijay)
> * Added background on LPI Translation and Pending tablesd
> * Added background on Collections
> * Settled on `N:N` scheme for vITS:pITS mapping.
> * Rejigged section nesting a bit.
> * Since we now thing translation should be cheap, settle on
>   translation at scheduling time.
> * Lazy `INVALL` and `SYNC`
>
> ## Since Draft A
>
> * Added discussion of when/where command translation occurs.
> * Contention on scheduler lock, suggestion to use SOFTIRQ.
> * Handling of domain shutdown.
> * More detailed discussion of multiple vs single vits pros/cons.
>
> # Introduction
>
> ARM systems containing a GIC version 3 or later may contain one or
> more ITS logical blocks. An ITS is used to route Message Signalled
> interrupts from devices into an LPI injection on the processor.
>
> The following summarises the ITS hardware design and serves as a set
> of assumptions for the vITS software design. (XXX it is entirely
> possible I've horribly misunderstood how this stuff fits
> together). For full details of the ITS see the "GIC Architecture
> Specification".
>
> ## Device Identifiers
>
> Each device using the ITS is associated with a unique identifier.
>
> The device IDs are typically described via system firmware, e.g. the
> ACPI IORT table or via device tree.
>
> The number of device ids is variable and can be discovered via
> `GITS_TYPER.Devbits`. This field allows an ITS to have up to 2^32
> device.
>
> ## Interrupt Collections
>
> Each interrupt is a member of an Interrupt Collection. This allows
> software to manage large numbers of physical interrupts with a small
> number of commands rather than issuing one command per interrupt.
>
> On a system with N processors, the ITS must provide at least N+1
> collections.
>
> ## Target Addresses
>
> The Target Address correspond to a specific GIC re-distributor. The format
> of this field depends on the value of the `GITS_TYPER.PTA` bit:
>
> * 1: the base address of the re-distributor target is used
> * 0: a unique processor number is used. The mapping between the
>   processor affinity value (`MPIDR`) and the processor number is
>   discoverable via `GICR_TYPER.ProcessorNumber`.
>
> ## ITS Translation Table
>
> Message signalled interrupts are translated into an LPI via an ITS
> translation table which must be configured for each device which can
> generate an MSI.
>
> The ITS translation table maps the device id of the originating devic
> into an Interrupt Collection and then into a target address.
>
> ## ITS Configuration
>
> The ITS is configured and managed, including establishing and
> configuring Translation Table for each device, via an in memory ring
> shared between the CPU and the ITS controller. The ring is managed via
> the `GITS_CBASER` register and indexed by `GITS_CWRITER` and
> `GITS_CREADR` registers.
>
> A processor adds commands to the shared ring and then updates
> `GITS_CWRITER` to make them visible to the ITS controller.
>
> The ITS controller processes commands from the ring and then updates
> `GITS_CREADR` to indicate the the processor that the command has been
> processed.
>
> Commands are processed sequentially.
>
> Commands sent on the ring include operational commands:
>
> * Routing interrupts to processors;
> * Generating interrupts;
> * Clearing the pending state of interrupts;
> * Synchronising the command queue
>
> and maintenance commands:
>
> * Map device/collection/processor;
> * Map virtual interrupt;
> * Clean interrupts;
> * Discard interrupts;
>
> The field `GITS_CBASER.Size` encodes the number of 4KB pages minus 0
> consisting of the command queue. This field is 8 bits which means the
> maximum size is 2^8 * 4KB = 1MB. Given that each command is 32 bytes,
> there is a maximum of 32768 commands in the queue.
>
> The ITS provides no specific completion notification
> mechanism. Completion is monitored by a combination of a `SYNC`
> command and either polling `GITS_CREADR` or notification via an
> interrupt generated via the `INT` command.
>
> Note that the interrupt generation via `INT` requires an originating
> device ID to be supplied (which is then translated via the ITS into an
> LPI). No specific device ID is defined for this purpose and so the OS
> software is expected to fabricate one.
>
> Possible ways of inventing such a device ID are:
>
> * Enumerate all device ids in the system and pick another one;
> * Use a PCI BDF associated with a non-existent device function (such
>   as an unused one relating to the PCI root-bridge) and translate that
>   (via firmware tables) into a suitable device id;
> * ???
>
> ## LPI Configuration Table
>
> Each LPI has an associated configuration byte in the LPI Configuration
> Table (managed via the GIC Redistributor and placed at
> `GICR_PROPBASER` or `GICR_VPROPBASER`). This byte configures:
>
> * The LPI's priority;
> * Whether the LPI is enabled or disabled.
>
> Software updates the Configuration Table directly but must then issue
> an invalidate command (per-device `INV` ITS command, global `INVALL`
> ITS command or write `GICR_INVLPIR`) for the effect to be guaranteed
> to become visible (possibly requiring an ITS `SYNC` command to ensure
> completion of the `INV` or `INVALL`). Note that it is valid for an
> implementation to reread the configuration table at any time (IOW it
> is _not_ guaranteed that a change to the LPI Configuration Table won't
> be visible until an invalidate is issued).
>
> ## LPI Pending Table
>
> Each LPI also has an associated bit in the LPI Pending Table (managed
> by the GIC redistributor). This bit signals whether the LPI is pending
> or not.
>
> # vITS
>
> A guest domain which is allowed to use ITS functionality (i.e. has
> been assigned pass-through devices which can generate MSIs) will be
> presented with a virtualised ITS.
>
> Accesses to the vITS registers will trap to Xen and be emulated and a
> virtualised Command Queue will be provided.
>
> Commands entered onto the virtual Command Queue will be translated
> into physical commands, as described later in this document.
>
> XXX there are other aspects to virtualising the ITS (LPI collection
> management, assignment of LPI ranges to guests, device
> management). However these are not currently considered here. XXX
> Should they be/do they need to be?
>
> # Requirements
>
> Emulation should not block in the hypervisor for extended periods. In
> particular Xen should not busy wait on the physical ITS. Doing so
> blocks the physical CPU from doing anything else (such as scheduling
> other VCPUs).
>
> There may be multiple guests which have a vITS, all targeting the same
> underlying pITS. A single guest VCPU should not be able to monopolise
> the pITS via its vITS and all guests should be able to make forward
> progress.
>
> # vITS to pITS mapping
>
> A physical system may have multiple physical ITSs.
>
> We assume that a given device is only associated with one pITS.
>
> A guest which is given access to multiple devices associated with
> multiple pITSs will need to be given virtualised access to all
> associated pITSs.
>
> There are several possible models for achieving this:
>
> * `1:N`: One virtual ITS tied to multiple physical ITSs.
> * `N:N`: One virtual ITS per physical ITS.
> * `M:N`: Multiple virtual ITS tied to a differing number of physical ITSs.
>
> This design assumes an `N:N` model, which is thought to be simpler on
> the Xen side since it avoids questions of how to fairly schedule
> commands in the `1:N` model while avoiding starvation as well as
> simplifying the virtualisation of global commands such as `INVALL` or
> `SYNC`.
>
> The `N:N` model is also a better fit for I/O NUMA systems.
>
> Since the choice of model is internal to the hypervisor/tools and is
> communicated to the guest via firmware tables we are not tied to this
> model as an ABI if we decide to change.
>
> New toolstack domctls or extension to existing domctls will likely be
> required to allow the toolstack to determine the number of vITS which
> will be required for the guest and to determine the mapping for
> passed-through devices.
>
> # LPI Configuration Table Virtualisation
>
> A guest's write accesses to its LPI Configuration Table (which is just
> an area of guest RAM which the guest has nominated) will be trapped to
> the hypervisor, using stage 2 MMU permissions, in order for changes to
> be propagated into the physical LPI Configuration Table.
>
> A host wide LPI dirty bit map, with 1 bit per LPI, will be maintained
> which indicates whether an update to the physical LPI Configuration
> Table has been flushed (via an invalidate command). The corresponding
> bit will be set whenever a guest changes the configuration of an LPI.
>
> This dirty bit map will be used during the handling of relevant ITS
> Commands (`INV`, `INVALL` etc).
>
> Note that no invalidate is required during the handling of an LPI
> Configuration Table trap.
>
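The trap handling and dirty-map bookkeeping described above could be sketched as follows (a standalone illustration only, not actual Xen code; `MAX_HOST_LPIS`, `lpi_dirty_bitmap` and `vgic_lpi_prop_write` are invented names):

```c
#include <limits.h>
#include <stdint.h>

#define MAX_HOST_LPIS 8192
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Host-wide dirty map: one bit per LPI, set on guest config writes,
 * cleared when the change is flushed by an INV/INVALL. */
static unsigned long lpi_dirty_bitmap[MAX_HOST_LPIS / BITS_PER_WORD];

static void set_lpi_dirty(uint32_t plpi)
{
    lpi_dirty_bitmap[plpi / BITS_PER_WORD] |= 1UL << (plpi % BITS_PER_WORD);
}

static int test_and_clear_lpi_dirty(uint32_t plpi)
{
    unsigned long mask = 1UL << (plpi % BITS_PER_WORD);
    int was_set = !!(lpi_dirty_bitmap[plpi / BITS_PER_WORD] & mask);
    lpi_dirty_bitmap[plpi / BITS_PER_WORD] &= ~mask;
    return was_set;
}

/* Stage-2 write trap on the guest's LPI Configuration Table: propagate
 * the byte into the physical table and mark the physical LPI dirty.
 * No invalidate is issued here; that is deferred to INV/INVALL
 * handling, as described above. */
static void vgic_lpi_prop_write(uint8_t *phys_prop_table,
                                uint32_t plpi, uint8_t val)
{
    phys_prop_table[plpi] = val;
    set_lpi_dirty(plpi);
}
```

(The `INV`/`INVALL` emulation would then use `test_and_clear_lpi_dirty` to decide whether a physical invalidate is actually required.)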
> # Command Queue Virtualisation
>
> The command queue of each vITS is represented by a data structure:
>
>     struct vits_cq {
>         list_head schedule_list; /* Queued onto pits.schedule_list */
>         uint32_t creadr;         /* Virtual creadr */
>         uint32_t cwriter;        /* Virtual cwriter */
>         uint32_t progress;       /* Index of last command queued to pits */
>         [ Reference to command queue memory ]
>     };
>
> Each pITS has an associated data structure:
>
>     struct pits {
>         list_head schedule_list; /* Contains list of vitq_cq.schedule_lists */
>         uint32_t last_creadr;
>     };
>
> On write to the virtual `CWRITER` the cwriter field is updated and if
> that results in there being new outstanding requests then the vits_cq
> is enqueued onto pITS' schedule_list (unless it is already there).
>
> On read from the virtual `CREADR`, if the vits_cq has outstanding
> commands then a scheduling pass is attempted (in order to update
> `vits_cq.creadr`). The current value of `vits_cq.creadr` is then
> returned.
>
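The `CWRITER`/`CREADR` emulation described above might look roughly like this (a simplified sketch: the real structures are the `vits_cq`/`pits` ones above, list handling and locking are omitted, and `vits_schedule` stands in for raising the scheduling softirq):

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for struct vits_cq above (no locking shown). */
struct vits_cq {
    uint32_t creadr;   /* virtual CREADR, byte offset into the queue */
    uint32_t cwriter;  /* virtual CWRITER, byte offset into the queue */
    uint32_t qsize;    /* size of the virtual command queue in bytes */
    bool     on_schedule_list;
};

static bool vits_cq_has_outstanding(const struct vits_cq *cq)
{
    return cq->creadr != cq->cwriter;
}

/* Would enqueue the vits_cq on pits.schedule_list and raise a softirq;
 * stubbed out here as just setting the flag. */
static void vits_schedule(struct vits_cq *cq)
{
    if (!cq->on_schedule_list && vits_cq_has_outstanding(cq))
        cq->on_schedule_list = true;
}

/* Emulated write to the virtual GITS_CWRITER. */
static void vits_cwriter_write(struct vits_cq *cq, uint32_t val)
{
    cq->cwriter = val % cq->qsize;  /* offsets wrap at the queue size */
    if (vits_cq_has_outstanding(cq))
        vits_schedule(cq);
}

/* Emulated read from the virtual GITS_CREADR: attempt a scheduling
 * pass first so the returned value is as fresh as possible. */
static uint32_t vits_creadr_read(struct vits_cq *cq)
{
    if (vits_cq_has_outstanding(cq))
        vits_schedule(cq);
    return cq->creadr;
}
```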
> ## Command translation
>
> In order to virtualise the Command Queue each command must be
> translated (this is described in the GIC spec).
>
> Translation of certain commands is potentially expensive, however we
> will attempt to arrange things (data structures etc) such that the
> overhead at translation time is minimised (see later).
>
> Translation can be done in two places:
>
> * During scheduling.
> * On write to `CWRITER`, into a per `vits_cq` queue which the
>   scheduler then propagates to the pits.
>
> Doing the translate during scheduling means that potentially expensive
> operations may be accounted to `current`, who may have nothing to do
> with those operations (this is true whether it is IRQ context or
> SOFTIRQ context).
>
> Doing the translate during `CWRITER` emulation accounts it to the
> right place, but introduces a potentially long synchronous operation
> which ties down a VCPU. Introducing batching here means we have
> essentially the same issue wrt when to replenish the translated queue
> as doing translate during scheduling.
>
> Translate during `CWRITER` also has memory overheads. Unclear if they
> are at a problematic scale or not.
>
> Since we have arranged for translation overheads to be minimised it
> seems that translation during scheduling should be tolerable.
>
> ## pITS Scheduling
>
> A pITS scheduling pass is attempted:
>
> * On write to any virtual `CWRITER` iff that write results in there
>   being new outstanding requests for that vits;

   You mean, a scheduling pass (softirq trigger) is triggered iff there are no
ongoing requests from that vits?

> * On read from a virtual `CREADR` iff there are commands outstanding
>   on that vits;
> * On receipt of an interrupt notification arising from Xen's own use
>   of `INT`; (see discussion under Completion)
> * On any interrupt injection arising from a guests use of the `INT`
>   command; (XXX perhaps, see discussion under Completion)
>
> This may result in lots of contention on the scheduler
> locking. Therefore we consider that in each case all which happens is
> triggering of a softirq which will be processed on return to guest,
> and just once even for multiple events.

Is it required to have all the cases trigger a scheduling pass?
Is just CWRITER (when there is no ongoing request) plus Xen's own
completion INT not sufficient?

>
> Such deferral could be considered OK (XXX ???) for the `CREADR` case
> because at worst the value read will be one cycle out of date. A guest
> which receives an `INT` notification might reasonably expect a
> subsequent read of `CREADR` to reflect that. However that should be
> covered by the softirq processing which would occur on entry to the
> guest to inject the `INT`.
>
> Each scheduling pass will:
>
> * Read the physical `CREADR`;
> * For each command between `pits.last_creadr` and the new `CREADR`
>   value process completion of that command and update the
>   corresponding `vits_cq.creadr`.
> * Attempt to refill the pITS Command Queue (see below).
>
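The completion sweep of such a scheduling pass could be sketched like this (a standalone illustration; `pits_slot`, `complete_vcmd` and the queue depth are assumptions for the sketch, not actual Xen interfaces):

```c
#include <stdint.h>

#define CMD_SIZE   32u      /* each ITS command is 32 bytes */
#define PITS_SLOTS 256u     /* illustrative pITS queue depth */

struct vits_cq;             /* as defined earlier in the document */

/* Shadow of the pITS Command Queue: which vits_cq (if any) owns each
 * slot, and the index into its virtual queue, for completion. */
struct pits_slot {
    struct vits_cq *owner;  /* NULL for a Xen-internal command (INT) */
    uint32_t        vidx;   /* index into the owner's virtual queue */
};

struct pits {
    uint32_t         last_creadr;  /* byte offset, multiple of 32 */
    struct pits_slot shadow[PITS_SLOTS];
};

/* Stub: would update the owning vits_cq.creadr for this command. */
static void complete_vcmd(struct pits_slot *slot) { (void)slot; }

/* One completion sweep: process every command the pITS has retired
 * between pits.last_creadr and the freshly read physical CREADR. */
static unsigned int pits_process_completions(struct pits *p, uint32_t creadr)
{
    unsigned int done = 0;
    while (p->last_creadr != creadr) {
        complete_vcmd(&p->shadow[p->last_creadr / CMD_SIZE]);
        p->last_creadr = (p->last_creadr + CMD_SIZE)
                         % (PITS_SLOTS * CMD_SIZE);
        done++;
    }
    return done;
}
```

After the sweep the pass would go on to refill the pITS queue as described in the next section.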
> ## Domain Shutdown
>
> We can't free a `vits_cq` while it has things on the physical command
> queue, and we cannot cancel things which are on the command queue.
>
> So we must wait.
>
> Obviously don't enqueue anything new onto the pits if `d->is_dying`.
>
> `domain_relinquish_resources()` waits (somehow, with suitable
> continuations etc) for anything which the `vits_cq` has outstanding to
> be completed so that the datastructures can be cleared.
>
> ## Filling the pITS Command Queue.
>
> Various algorithms could be used here. For now a simple proposal is
> to traverse the `pits.schedule_list` starting from where the last
> refill finished (i.e not from the top of the list each time).
>
> In order to simplify bookkeeping and to bound the amount of time spent
> on a single scheduling pass each `vits_cq` will only have a single
> batch of commands enqueued with the pITS at a time.
>
> If a `vits_cq` has no pending commands then it is removed from the
> list.
>
> If a `vits_cq` already has commands enqueued with the pITS Command
> Queue then it is skipped.
>
> If a `vits_cq` has some pending commands then `min(pits-free-slots,
> vits-outstanding, VITS_BATCH_SIZE)` will be taken from the vITS
> command queue, translated and placed onto the pITS
> queue. `vits_cq.progress` will be updated to reflect this.
>
> Each `vits_cq` is handled in turn in this way until the pITS Command
> Queue is full, there are no more outstanding commands or each active
> `vits_cq` has commands enqueued with the pITS.
>
> There will likely need to be a data structure which shadows the pITS
> Command Queue slots with references to the `vits_cq` which has a
> command currently occupying that slot and the corresponding index into
> the virtual command queue, for use when completing a command.
>
> `VITS_BATCH_SIZE` should be small, TBD say 4 or 8.
>
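The refill algorithm above can be sketched as follows (an illustration under simplifying assumptions: the schedule list is modelled as an array with a round-robin cursor, and the translation step itself is elided):

```c
#include <stdbool.h>
#include <stdint.h>

#define VITS_BATCH_SIZE 4u

/* Simplified view of one vITS queue for the purposes of the loop. */
struct vcq {
    uint32_t outstanding;   /* commands not yet passed to the pITS */
    bool     enqueued;      /* already has a batch on the pITS queue */
};

static uint32_t min3(uint32_t a, uint32_t b, uint32_t c)
{
    uint32_t m = a < b ? a : b;
    return m < c ? m : c;
}

/*
 * One refill pass: round-robin over the schedule list starting at
 * *cursor (i.e. not from the top each time), taking at most one batch
 * per vITS.  Returns the number of commands placed on the pITS queue.
 */
static uint32_t pits_refill(struct vcq *vcqs, unsigned int n,
                            unsigned int *cursor, uint32_t free_slots)
{
    uint32_t placed = 0;
    for (unsigned int seen = 0; seen < n && free_slots; seen++) {
        struct vcq *cq = &vcqs[*cursor];
        *cursor = (*cursor + 1) % n;
        if (cq->enqueued || !cq->outstanding)
            continue;       /* batch already in flight, or idle */
        uint32_t batch = min3(free_slots, cq->outstanding,
                              VITS_BATCH_SIZE);
        /* ... translate `batch` commands onto the pITS queue ... */
        cq->outstanding -= batch;
        cq->enqueued = true;
        free_slots -= batch;
        placed += batch;
    }
    return placed;
}
```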
> ## Completion
>
> It is expected that commands will normally be completed (resulting in
> an update of the corresponding `vits_cq.creadr`) via guest read from
> `CREADR`. This will trigger a scheduling pass which will ensure the
> `vits_cq.creadr` value is up to date before it is returned.
>
> A guest which does completion via the use of `INT` cannot observe
> `CREADR` without reading it, so updating on read from `CREADR`
> suffices from the point of view of the guests observation of the
> state. (Of course we will inject the interrupt at the designated point
> and the guest may well then read `CREADR`)
>
> However in order to keep the pITS Command Queue moving along we need
> to consider what happens if there are no `INT` based events nor reads
> from `CREADR` to drive completion and therefore refilling of the Queue
> with other outstanding commands.
>
> A guest which enqueues some commands and then never checks for
> completion cannot itself block things because any other guest which
> reads `CREADR` will drive completion. However if _no_ guest reads from
> `CREADR` then completion will not occur and this must be dealt with.
>
> Even if we include completion on `INT`-based interrupt injection then
> it is possible that the pITS queue may not contain any such
> interrupts, either because no guest is using them or because the
> batching means that none of them are enqueued on the active ring at
> the moment.
>
> So we need a fallback to ensure that the queue keeps moving. There are
> several options:
>
> * A periodic timer in Xen which runs whenever there are outstanding
>   commands in the pITS. This is simple but pretty sucky.
> * Xen injects its own `INT` commands into the pITS ring. This requires
>   figuring out a device ID to use.
>
> The second option is likely to be preferable if the issue of selecting
> a device ID can be addressed.
>
> A secondary question is when these `INT` commands should be inserted
> into the command stream:
>
> * After each batch taken from a single `vits_cq`;

   Is this not enough? Because a scheduling pass just sends one batch of
commands with Xen's INT command.

> * After each scheduling pass;
> * One active in the command stream at any given time;
>
> The latter should be sufficient: by arranging to insert an `INT` into
> the stream at the end of any scheduling pass which occurs while there
> is not a currently outstanding `INT` we have a sufficient backstop to
> allow us to refill the ring.
>
> This assumes that there is no particular benefit to keeping the
> `CWRITER` rolling ahead of the pITS's actual processing. This is true
> because the ITS operates on commands in the order they appear in the
> queue, so there is no need to maintain a runway ahead of the ITS
> processing. (XXX If this is a concern perhaps the INT could be
> inserted at the head of the final batch of commands in a scheduling
> pass instead of the tail).
>
> Xen itself should never need to issue an associated `SYNC` command,
> since the individual guests would need to issue those themselves when
> they care. The `INT` only serves to allow Xen to enqueue new commands
> when there is space on the ring; Xen has no interest in the actual
> completion itself.
>
> ## Locking
>
> It may be preferable to use `atomic_t` types for various fields
> (e.g. `vits_cq.creadr`) in order to reduce the amount and scope of
> locking required.
>
> # ITS Command Translation
>
> This section is based on section 5.13 of the GICv3 specification
> (PRD03-GENC-010745 24.0). The goal is to provide insight into the cost
> of emulating ITS commands in Xen.
>
> The ITS provides 12 commands in order to manage interrupt collections,
> devices and interrupts. The possible command parameters are Device ID
> (`ID`), Event ID (`vID`), Collection ID (`vCID`) and Target Address
> (`vTA`).
>
> These parameters need to be validated and translated from Virtual to
> Physical.
>
> ## Parameter Validation / Translation
>
> Each command contains parameters that need to be validated before any
> use in Xen or before being passed to the hardware.
>
> ### Device ID (`ID`)
>
> This parameter is used by commands which manage a specific device and
> the interrupts associated with that device. Checking if a device is
> present and retrieving the data structure must be fast.
>
> The device identifiers may not be assigned contiguously and the maximum
> number is very high (2^32).
>
> XXX In the context of virtualised device ids this may not be the case,
> e.g. we can arrange for (mostly) contiguous device ids and we know the
> bound is significantly lower than 2^32
>
> Possible efficient data structures would be:
>
> 1. List: Lookup/deletion is O(n) and the cost of insertion depends on
>    whether the list is kept sorted by identifier. The memory overhead
>    is 18 bytes per element.
> 2. Red-black tree: All the operations are O(log(n)). The memory
>    overhead is 24 bytes per element.
>
> A Red-black tree seems the more suitable for fast deviceID validation,
> even though the memory overhead is a bit higher compared to the list.
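The validation itself is then an ordered-tree descent. A minimal sketch (using a bare binary search tree purely for illustration; the real implementation would use Xen's red-black tree support, which adds the rebalancing on insert/delete):

```c
#include <stddef.h>
#include <stdint.h>

/* Per-device state, keyed by device ID.  In Xen this would hang off a
 * struct rb_node; a bare BST node is used here to show the O(log n)
 * lookup when the tree is balanced. */
struct its_device {
    uint32_t devid;
    struct its_device *left, *right;
    /* ... ITT address, vlpi_map, owning domain, etc ... */
};

/* Validate a device ID by descending the tree. */
static struct its_device *its_device_find(struct its_device *root,
                                          uint32_t devid)
{
    while (root) {
        if (devid == root->devid)
            return root;
        root = devid < root->devid ? root->left : root->right;
    }
    return NULL;  /* unknown device: the command is invalid */
}
```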

When PHYSDEVOP_pci_device_add is called, memory for the its_device
structure and other needed structures for this device is allocated and
the device is added to the RB-tree with all necessary information.

>
> ### Event ID (`vID`)
>
> This is the per-device Interrupt identifier (i.e. the MSI index). It
> is configured by the device driver software.
>
> It is not necessary to translate a `vID`, however they may need to be
> represented in various data structures given to the pITS.
>
> XXX is any of this true?
>
> ### Interrupt Collection (`vCID`)
>
> This parameter is used in commands which manage collections and
> interrupts, in order to move them from one CPU to another. The ITS is
> only mandated to implement N + 1 collections where N is the number of
> processors on the platform (i.e. the max number of VCPUs for a given
> guest). Furthermore, the identifiers are always contiguous.
>
> If we decide to implement the strict minimum (i.e N + 1), an array is
> enough and will allow operations in O(1).
>
> XXX Could forgo array and go straight to vcpu_info/domain_info.
>
> ### Target Address (`vTA`)
>
> This parameter is used in commands which manage collections. It is a
> unique identifier per processor. The format differs depending on the
> value of the `GITS_TYPER.PTA` bit. The value of the field is fixed by
> the ITS implementation and the software has to handle both cases.
>
> A solution with `GITS_TYPER.PTA` set to one will require some
> computation in order to find the VCPU associated with the
> redistributor address. It will be similar to get_vcpu_from_rdist in
> the vGICv3 emulation (xen/arch/arm/vgic-v3.c).
>
> On the other hand, setting `GITS_TYPER.PTA` to zero gives us control
> over the linear processor number, which could simply be the vcpu_id
> (always linear).
>
> XXX Non-linear VCPUs e.g. via AFFR1 hierarchy?
>
> ## Command Translation
>
> Of the existing GICv3 ITS commands, `MAPC`, `MAPD`, `MAPVI`/`MAPI` are
> potentially time consuming commands as they create entries in the Xen
> ITS structures, which are used to validate other ITS commands.
>
> `INVALL` and `SYNC` are global and potentially disruptive to other
> guests and so need consideration.
>
> All other ITS commands, such as `MOVI`, `DISCARD`, `INV`, `INT` and
> `CLEAR`, just validate their parameters and generate a physical
> command.
>
> ### `MAPC` command translation
>
> Format: `MAPC vCID, vTA`
>
   - `GITS_TYPER.PTA` is emulated as 0, hence vTA always represents a
     vcpu number. vTA is validated against physical Collection IDs by
     querying the ITS driver and the corresponding Physical Collection
     ID (pCID) is retrieved.
   - Each vITS will have a cid_map (struct cid_mapping) which holds the
     mapping between Virtual Collection ID (vCID), Virtual Target
     Address (vTA) and Physical Collection ID (pCID). If a vCID entry
     already exists in cid_map then that mapping is updated with the new
     pCID and vTA, otherwise a new entry is made in cid_map.
   - A `MAPC pCID, pTA` physical ITS command is generated.

   There is no overhead here: the cid_map entries are preallocated with
   size nr_cpus for the platform.


> - `MAPC pCID, pTA` physical ITS command is generated
>
> ### `MAPD` Command translation
>
> Format: `MAPD device, Valid, ITT IPA, ITT Size`
>
> `MAPD` is sent with `Valid` bit set if device needs to be added and reset
> when device is removed.
>
> If `Valid` bit is set:
>
> - Allocate memory for `its_device` struct
> - Validate ITT IPA & ITT size and update its_device struct
> - Find number of vectors(nrvecs) for this device by querying PCI
>   helper function
> - Allocate nrvecs number of LPI XXX nrvecs is a function of `ITT Size`?
> - Allocate memory for `struct vlpi_map` for this device. This
>   `vlpi_map` holds mapping of Virtual LPI to Physical LPI and ID.
> - Find physical ITS node with which this device is associated
> - Call `p2m_lookup` on ITT IPA addr and get physical ITT address
> - Validate ITT Size
> - Generate/format physical ITS command: `MAPD, ITT PA, ITT Size`
>
> Here the overhead is with memory allocation for `its_device` and `vlpi_map`
>
> XXX Suggestion was to preallocate some of those at device passthrough
> setup time?

If the Valid bit is set:
   - Query the its_device tree and get the its_device structure for this
     device.
   - (XXX: If the pci device is hidden from dom0, is this device still
     added with the PHYSDEVOP_pci_device_add hypercall?)
   - If the device does not exist, return.
   - If the device exists in the RB-tree then:
       - Validate ITT IPA & ITT size and update the its_device struct.
       - Check if the device is already assigned to the domain; if not:
           - Find the number of vectors (nrvecs) for this device.
           - Allocate nrvecs LPIs.
           - Fetch the vlpi_map for this device (preallocated at the
             time of adding this device to Xen). This vlpi_map holds
             the mapping of Virtual LPI to Physical LPI and ID.
           - Call p2m_lookup on the ITT IPA address and get the
             physical ITT address.
           - Assign this device to the domain and mark it enabled.
       - If the device already exists with the domain (the domain is
         remapping the device):
           - Validate ITT IPA & ITT size and update the its_device
             struct.
           - Call p2m_lookup on the ITT IPA address and get the
             physical ITT address.
           - Disable all the LPIs of this device by searching through
             the vlpi_map and LPI configuration table.
       - Generate/format the physical ITS command: MAPD, ITT PA, ITT Size

>
> If the `Valid` bit is not set:
>
> - Validate that the device exists by checking the vITS device list
> - Clear all `vlpis` assigned for this device
> - Remove this device from vITS list
> - Free memory
>
> XXX If preallocation presumably shouldn't free here either.
>

If the Valid bit is not set:
    - Validate that the device exists by checking the RB-tree and is
      assigned to this domain
    - Disable all the LPIs associated with this device and release the irq
    - Clear all vlpi mappings for this device
    - Remove this device from the domain

> ### `MAPVI`/`MAPI` Command translation
>
> Format: `MAPVI device, ID, vID, vCID`
>
> - Validate that the device exists by checking the vITS device list
> - Validate vCID and get pCID by searching cid_map
>
> - if vID does not have entry in `vlpi_entries` of this device allocate
>   a new pID from `vlpi_map` of this device and update `vlpi_entries`
>   with new pID
> - Allocate irq descriptor and add to RB tree
> - call `route_irq_to_guest()` for this pID
> - Generate/format physical ITS command: `MAPVI device ID, pID, pCID`
>

- Validate that the device exists by checking the vITS device RB-tree.
- Validate vCID and get pCID by searching cid_map.
- If vID does not have an entry in the vlpi_entries of this device:
    - Allot a pID from the vlpi_map of this device and update
      vlpi_entries with the new pID.
    - Allocate an irq descriptor and add it to the RB tree.
    - Call route_irq_to_guest() for this pID.
- If an entry exists:
    - If vCID is different (remapping the interrupt to a different
      collection):
        - Disable the LPI.
        - Update the vlpi_map.
          (XXX: Enable the LPI on guest request?)
- Generate/format the physical ITS command: MAPVI device ID, pID, pCID

> Here the overhead is allocating physical ID, allocate memory for irq
> descriptor and routing interrupt.
>
> XXX Suggested to preallocate?
>
> ### `INVALL` Command translation
>
> A physical `INVALL` is only generated if the LPI dirty bitmap has any
> bits set. Otherwise it is skipped.
>
> XXX Perhaps bitmap should just be a simple counter?
>
> XXX bitmap is host global, a per-domain bitmap would allow us to elide
> `INVALL` unless an LPI associated with the guest making the request
> was dirty. Would also need some sort of "ITS INVALL" clock in order
> that other guests can elide their own `INVALL` if one has already
> happened. Complexity not worth it at this stage?
>
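The "simple counter" alternative mooted above could be as little as a host-wide generation number (a sketch; all names are invented):

```c
#include <stdbool.h>
#include <stdint.h>

/* Alternative to the bitmap: a host-wide generation counter, bumped on
 * every LPI configuration change.  A physical INVALL is emitted for a
 * virtual INVALL only if something changed since the last one. */
static uint64_t lpi_cfg_generation;
static uint64_t last_invall_generation;

/* Called from the LPI Configuration Table trap handler. */
static void note_lpi_cfg_change(void)
{
    lpi_cfg_generation++;
}

/* Returns true iff a physical INVALL must be queued for this vINVALL. */
static bool vinvall_needs_pinvall(void)
{
    if (last_invall_generation == lpi_cfg_generation)
        return false;               /* nothing dirty: elide the INVALL */
    last_invall_generation = lpi_cfg_generation;
    return true;
}
```

This loses the ability to skip a per-LPI `INV` for clean LPIs (which the bitmap gives), but makes the `INVALL` elision trivial.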
> ### `SYNC` Command translation
>
> Can be omitted from the physical command stream if the previous
> command was also a `SYNC`, i.e. due to a guest sending a series of
> `SYNC` commands or one guest's batch ending with one and the next's
> beginning with one.
>
> XXX TBD can we do anything more? e.g. omit sync if the guest hasn't
> done anything of importance since the last sync?
>
> # GICv4 Direct Interrupt Injection
>
> GICv4 will directly mark the LPIs pending in the virtual pending table
> which is per-redistributor (i.e per-vCPU).
>
> LPIs will be received by the guest in the same way as SPIs, i.e. trap
> in IRQ mode then read `ICC_IAR1_EL1` (for GICv3).
>
> Therefore GICv4 will not require one vITS per pITS.
>
> # Event Channels
>
> It has been proposed that it might be nice to inject event channels as
> LPIs in the future. Whether or not that would involve any sort of vITS
> is unclear, but if it did then it would likely be a separate emulation
> to the vITS emulation used with a pITS and as such is not considered
> further here.
>
> # Glossary
>
> * _MSI_: Message Signalled Interrupt
> * _ITS_: Interrupt Translation Service
> * _GIC_: Generic Interrupt Controller
> * _LPI_: Locality-specific Peripheral Interrupt
>
> # References
>
> "GIC Architecture Specification" PRD03-GENC-010745 24.0
>
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Draft C] Xen on ARM vITS Handling
  2015-05-27 11:48 [Draft C] Xen on ARM vITS Handling Ian Campbell
  2015-05-27 16:44 ` Vijay Kilari
@ 2015-05-29 13:40 ` Julien Grall
  2015-06-01 13:12   ` Ian Campbell
  1 sibling, 1 reply; 15+ messages in thread
From: Julien Grall @ 2015-05-29 13:40 UTC (permalink / raw)
  To: Ian Campbell, xen-devel
  Cc: manish.jaggi, Julien Grall, Stefano Stabellini, Vijay Kilari

Hi Ian,

NIT: You used my Linaro email which I think is de-activated now :).

On 27/05/2015 13:48, Ian Campbell wrote:
> Here follows draft C based on previous feedback.
>
> Also at:
>
> http://xenbits.xen.org/people/ianc/vits/draftC.{pdf,html}
>
> I think I've captured most of the previous discussion, except where
> explicitly noted by XXX or in other replies, but please do point out
> places where I've missed something.
>
> One area where I am pretty sure I've dropped the ball is on the
> completion and update of `CREADR`. That conversation ended up
> bifurcating along the 1:N vs N:N mapping scheme lines, and I didn't
> manage to get the various proposals straight. Since we've now agreed on
> N:N hopefully we can reach a conclusion (no pun intended) on the
> completion aspect too (sorry that this probably means rehasing at least
> a subset of the previous thread).
>
> Ian.
>
> % Xen on ARM vITS Handling
> % Ian Campbell <ian.campbell@citrix.com>
> % Draft C
>
> # Changelog
>
> ## Since Draft B
>
> * Details of command translation (thanks to Julien and Vijay)
> * Added background on LPI Translation and Pending tablesd
> * Added background on Collections
> * Settled on `N:N` scheme for vITS:pITS mapping.
> * Rejigged section nesting a bit.
> * Since we now thing translation should be cheap, settle on
>    translation at scheduling time.
> * Lazy `INVALL` and `SYNC`
>
> ## Since Draft A
>
> * Added discussion of when/where command translation occurs.
> * Contention on scheduler lock, suggestion to use SOFTIRQ.
> * Handling of domain shutdown.
> * More detailed discussion of multiple vs single vits pros/cons.
>
> # Introduction
>
> ARM systems containing a GIC version 3 or later may contain one or
> more ITS logical blocks. An ITS is used to route Message Signalled
> interrupts from devices into an LPI injection on the processor.
>
> The following summarises the ITS hardware design and serves as a set
> of assumptions for the vITS software design. (XXX it is entirely
> possible I've horribly misunderstood how this stuff fits
> together). For full details of the ITS see the "GIC Architecture
> Specification".
>
> ## Device Identifiers
>
> Each device using the ITS is associated with a unique identifier.
>
> The device IDs are typically described via system firmware, e.g. the
> ACPI IORT table or via device tree.
>
> The number of device ids is variable and can be discovered via
> `GITS_TYPER.Devbits`. This field allows an ITS to have up to 2^32
> device.
>
> ## Interrupt Collections
>
> Each interrupt is a member of an Interrupt Collection. This allows
> software to manage large numbers of physical interrupts with a small
> number of commands rather than issuing one command per interrupt.
>
> On a system with N processors, the ITS must provide at least N+1
> collections.
>
> ## Target Addresses
>
> The Target Address correspond to a specific GIC re-distributor. The format
> of this field depends on the value of the `GITS_TYPER.PTA` bit:
>
> * 1: the base address of the re-distributor target is used
> * 0: a unique processor number is used. The mapping between the
>    processor affinity value (`MPIDR`) and the processor number is
>    discoverable via `GICR_TYPER.ProcessorNumber`.
>
> ## ITS Translation Table
>
> Message signalled interrupts are translated into an LPI via an ITS
> translation table which must be configured for each device which can
> generate an MSI.

I'm not sure what the ITS Translation Table is. Did you mean Interrupt
Translation Table?

>
> The ITS translation table maps the device id of the originating devic

s/devic/device/?

> into an Interrupt Collection and then into a target address.
>
> ## ITS Configuration
>
> The ITS is configured and managed, including establishing and
> configuring Translation Table for each device, via an in memory ring
> shared between the CPU and the ITS controller. The ring is managed via
> the `GITS_CBASER` register and indexed by `GITS_CWRITER` and
> `GITS_CREADR` registers.
>
> A processor adds commands to the shared ring and then updates
> `GITS_CWRITER` to make them visible to the ITS controller.
>
> The ITS controller processes commands from the ring and then updates
> `GITS_CREADR` to indicate the the processor that the command has been
> processed.
>
> Commands are processed sequentially.
>
> Commands sent on the ring include operational commands:
>
> * Routing interrupts to processors;
> * Generating interrupts;
> * Clearing the pending state of interrupts;
> * Synchronising the command queue
>
> and maintenance commands:
>
> * Map device/collection/processor;
> * Map virtual interrupt;
> * Clean interrupts;
> * Discard interrupts;
>
> The field `GITS_CBASER.Size` encodes the number of 4KB pages minus 0
> consisting of the command queue. This field is 8 bits which means the
> maximum size is 2^8 * 4KB = 1MB. Given that each command is 32 bytes,
> there is a maximum of 32768 commands in the queue.
>
> The ITS provides no specific completion notification
> mechanism. Completion is monitored by a combination of a `SYNC`
> command and either polling `GITS_CREADR` or notification via an
> interrupt generated via the `INT` command.
>
> Note that the interrupt generation via `INT` requires an originating
> device ID to be supplied (which is then translated via the ITS into an
> LPI). No specific device ID is defined for this purpose and so the OS
> software is expected to fabricate one.
>
> Possible ways of inventing such a device ID are:
>
> * Enumerate all device ids in the system and pick another one;
> * Use a PCI BDF associated with a non-existent device function (such
>    as an unused one relating to the PCI root-bridge) and translate that
>    (via firmware tables) into a suitable device id;
> * ???
>
> ## LPI Configuration Table
>
> Each LPI has an associated configuration byte in the LPI Configuration
> Table (managed via the GIC Redistributor and placed at
> `GICR_PROPBASER` or `GICR_VPROPBASER`). This byte configures:
>
> * The LPI's priority;
> * Whether the LPI is enabled or disabled.
>
> Software updates the Configuration Table directly but must then issue
> an invalidate command (per-device `INV` ITS command, global `INVALL`
> ITS command or write `GICR_INVLPIR`) for the affect to be guaranteed
> to become visible (possibly requiring an ITS `SYNC` command to ensure
> completion of the `INV` or `INVALL`). Note that it is valid for an
> implementaiton to reread the configuration table at any time (IOW it

s/implementaition/implementation/

> is _not_ guarenteed that a change to the LPI Configuration Table won't

s/guarenteed/guaranteed/? Or maybe the first use of this word was wrong?

> be visible until an invalidate is issued).
>
> ## LPI Pending Table
>
> Each LPI also has an associated bit in the LPI Pending Table (managed
> by the GIC redistributor). This bit signals whether the LPI is pending
> or not.
>
> # vITS
>
> A guest domain which is allowed to use ITS functionality (i.e. has
> been assigned pass-through devices which can generate MSIs) will be
> presented with a virtualised ITS.
>
> Accesses to the vITS registers will trap to Xen and be emulated and a
> virtualised Command Queue will be provided.
>
> Commands entered onto the virtual Command Queue will be translated
> into physical commands, as described later in this document.
>
> XXX there are other aspects to virtualising the ITS (LPI collection
> management, assignment of LPI ranges to guests, device
> management). However these are not currently considered here. XXX
> Should they be/do they need to be?

I think we began to cover these aspects with the section "command emulation".

> # Requirements
>
> Emulation should not block in the hypervisor for extended periods. In
> particular Xen should not busy wait on the physical ITS. Doing so
> blocks the physical CPU from doing anything else (such as scheduling
> other VCPUs).
>
> There may be multiple guests which have a vITS, all targeting the same
> underlying pITS. A single guest VCPU should not be able to monopolise
> the pITS via its vITS and all guests should be able to make forward
> progress.
>
> # vITS to pITS mapping
>
> A physical system may have multiple physical ITSs.
>
> We assume that a given device is only associated with one pITS.
>
> A guest which is given access to multiple devices associated with
> multiple pITSs will need to be given virtualised access to all
> associated pITSs.
>
> There are several possible models for achieving this:
>
> * `1:N`: One virtual ITS tied to multiple physical ITSs.
> * `N:N`: One virtual ITS per physical ITS.
> * `M:N`: Multiple virtual ITS tied to a differing number of physical ITSs.
>
> This design assumes an `N:N` model, which is thought to be simpler on
> the Xen side since it avoids questions of how to fairly schedule
> commands in the `1:N` model while avoiding starvation as well as
> simplifying the virtualisation of global commands such as `INVALL` or
> `SYNC`.
>
> The `N:N` model is also a better fit for I/O NUMA systems.
>
> Since the choice of model is internal to the hypervisor/tools and is
> communicated to the guest via firmware tables we are not tied to this
> model as an ABI if we decide to change.
>
> New toolstack domctls or extension to existing domctls will likely be
> required to allow the toolstack to determine the number of vITS which
> will be required for the guest and to determine the mapping for
> passed-through devices.
>
> # LPI Configuration Table Virtualistion

s/Virtualistion/Virtualisation/

>
> A guest's write accesses to its LPI Configuration Table (which is just
> an area of guest RAM which the guest has nominated) will be trapped to
> the hypervisor, using stage 2 MMU permissions, in order for changes to
> be propagated into the physical LPI Configuration Table.
>
> A host wide LPI dirty bit map, with 1 bit per LPI, will be maintained
> which indicates whether an update to the physical LPI Configuration
> Table has been flushed (via an invalidate command). The corresponding
> bit will be set whenever a guest changes the configuration of an LPI.
>
> This dirty bit map will be used during the handling of relevant ITS
> Commands (`INV`, `INVALL` etc).
>
> Note that no invalidate is required during the handling of an LPI
> Configuration Table trap.
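The trap handling implied here is cheap: propagate the byte and mark the LPI dirty. A minimal sketch, with all names hypothetical and locking/bounds checks elided:

```c
#include <stdint.h>

#define NR_HOST_LPIS 8192
#define BITS_PER_LONG (8 * sizeof(unsigned long))

static uint8_t phys_lpi_cfg[NR_HOST_LPIS];  /* physical LPI Configuration Table */
static unsigned long lpi_dirty[NR_HOST_LPIS / BITS_PER_LONG];

static inline void set_bit_(unsigned int n, unsigned long *bm)
{
    bm[n / BITS_PER_LONG] |= 1UL << (n % BITS_PER_LONG);
}

/*
 * Stage-2 write fault on the guest's LPI Configuration Table: copy the
 * byte through to the physical table and record that an invalidate will
 * be needed before the change is guaranteed visible.  No INV is issued
 * here; that happens lazily when handling INV/INVALL commands.
 */
void vlpi_cfg_write(unsigned int plpi, uint8_t val)
{
    phys_lpi_cfg[plpi] = val;
    set_bit_(plpi, lpi_dirty);
}

int lpi_is_dirty(unsigned int plpi)
{
    return !!(lpi_dirty[plpi / BITS_PER_LONG] &
              (1UL << (plpi % BITS_PER_LONG)));
}
```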
>
> # Command Queue Virtualisation
>
> The command queue of each vITS is represented by a data structure:
>
>      struct vits_cq {
>          list_head schedule_list; /* Queued onto pits.schedule_list */
>          uint32_t creadr;         /* Virtual creadr */
>          uint32_t cwriter;        /* Virtual cwriter */
>          uint32_t progress;       /* Index of last command queued to pits */
>          [ Reference to command queue memory ]
>      };
>
> Each pITS has an associated data structure:
>
>      struct pits {
>          list_head schedule_list; /* Contains list of vits_cq.schedule_lists */
>          uint32_t last_creadr;
>      };
>
> On write to the virtual `CWRITER` the cwriter field is updated and if
> that results in there being new outstanding requests then the vits_cq
> is enqueued onto pITS' schedule_list (unless it is already there).
>
> On read from the virtual `CREADR` iff the vits_cq is such that
> commands are outstanding then a scheduling pass is attempted (in order
> to update `vits_cq.creadr`). The current value of `vits_cq.creadr` is
> then returned.
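A sketch of the `CWRITER` emulation just described, with the list handling and the SOFTIRQ replaced by simple flags (all names hypothetical):

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified version of the structures above (list handling elided). */
struct vits_cq {
    bool on_schedule_list;
    uint32_t creadr, cwriter, progress;
    uint32_t qsize;              /* command queue size in bytes */
};

static bool vits_needs_scheduling;  /* stands in for raising a SOFTIRQ */

/* Commands are outstanding iff creadr != cwriter. */
static inline bool vits_cq_outstanding(const struct vits_cq *cq)
{
    return cq->creadr != cq->cwriter;
}

/*
 * Emulate a guest write to CWRITER: update the virtual register and, if
 * that leaves new outstanding commands, queue the vits_cq for a
 * scheduling pass (here just a flag; really list_add + SOFTIRQ).
 */
void vgic_its_cwriter_write(struct vits_cq *cq, uint32_t val)
{
    cq->cwriter = val % cq->qsize;
    if (vits_cq_outstanding(cq) && !cq->on_schedule_list) {
        cq->on_schedule_list = true;
        vits_needs_scheduling = true;
    }
}
```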
>
> ## Command translation
>
> In order to virtualise the Command Queue each command must be
> translated (this is described in the GIC spec).
>
> Translation of certain commands is potentially expensive, however we
> will attempt to arrange things (data structures etc) such that the
> overhead at translation time is minimised (see later).
>
> Translation can be done in two places:
>
> * During scheduling.
> * On write to `CWRITER`, into a per `vits_cq` queue which the
>    scheduler then propagates to the pits.
>
> Doing the translation during scheduling means that potentially expensive
> operations may be accounted to `current`, who may have nothing to do
> with those operations (this is true whether it is IRQ context or
> SOFTIRQ context).
>
> Doing the translation during `CWRITER` emulation accounts it to the
> right place, but introduces a potentially long synchronous operation
> which ties down a VCPU. Introducing batching here means we have
> essentially the same issue wrt when to replenish the translated queue
> as doing translate during scheduling.
>
> Translation during `CWRITER` also has memory overheads. It is unclear
> whether they are at a problematic scale or not.
>
> Since we have arranged for translation overheads to be minimised it
> seems that translation during scheduling should be tollerable.

s/tollerable/tolerable/

>
> ## pITS Scheduling
>
> A pITS scheduling pass is attempted:
>
> * On write to any virtual `CWRITER` iff that write results in there
>    being new outstanding requests for that vits;
> * On read from a virtual `CREADR` iff there are commands outstanding
>    on that vits;
> * On receipt of an interrupt notification arising from Xen's own use
>    of `INT`; (see discussion under Completion)
> * On any interrupt injection arising from a guests use of the `INT`
>    command; (XXX perhaps, see discussion under Completion)
>
> This may result in lots of contention on the scheduler
> locking. Therefore we consider that in each case all that happens is
> the triggering of a softirq, which will be processed on return to the
> guest, and just once even for multiple events.
>
> Such deferral could be considered OK (XXX ???) for the `CREADR` case
> because at worst the value read will be one cycle out of date. A guest
> which receives an `INT` notification might reasonably expect a
> subsequent read of `CREADR` to reflect that. However that should be
> covered by the softirq processing which would occur on entry to the
> guest to inject the `INT`.
>
> Each scheduling pass will:
>
> * Read the physical `CREADR`;
> * For each command between `pits.last_creadr` and the new `CREADR`
>    value process completion of that command and update the
>    corresponding `vits_cq.creadr`.
> * Attempt to refill the pITS Command Queue (see below).
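The completion half of such a pass might look like the following, using the shadow structure described under "Filling the pITS Command Queue" below, and slot-granularity indices for simplicity (all names hypothetical):

```c
#include <stdint.h>

#define PITS_CQ_NR_SLOTS 256

struct vits_cq { uint32_t creadr; };

/*
 * Shadow of each physical Command Queue slot: which vits_cq issued the
 * command in that slot and the virtual CREADR value to expose once it
 * completes.
 */
struct pits_slot {
    struct vits_cq *owner;       /* NULL for Xen's own commands (e.g. INT) */
    uint32_t vcreadr;
};

struct pits {
    uint32_t last_creadr;        /* in slots, not bytes, for simplicity */
    struct pits_slot shadow[PITS_CQ_NR_SLOTS];
};

/*
 * Consume completions between the CREADR we saw last time and the value
 * just read from hardware, updating each owning vits_cq's virtual
 * CREADR along the way.  A refill attempt would follow.
 */
void pits_process_completions(struct pits *p, uint32_t hw_creadr)
{
    while (p->last_creadr != hw_creadr) {
        struct pits_slot *s = &p->shadow[p->last_creadr];
        if (s->owner)
            s->owner->creadr = s->vcreadr;
        p->last_creadr = (p->last_creadr + 1) % PITS_CQ_NR_SLOTS;
    }
}
```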
>
> ## Domain Shutdown
>
> We can't free a `vits_cq` while it has things on the physical control
> queue, and we cannot cancel things which are on the control queue.
>
> So we must wait.
>
> Obviously don't enqueue anything new onto the pits if `d->is_dying`.
>
> `domain_relinquish_resources()` waits (somehow, with suitable
> continuations etc) for anything which the `vits_cq` has outstanding to
> be completed so that the datastructures can be cleared.
>
> ## Filling the pITS Command Queue.
>
> Various algorithms could be used here. For now a simple proposal is
> to traverse the `pits.schedule_list` starting from where the last
>    refill finished (i.e. not from the top of the list each time).
>
> In order to simplify bookkeeping and to bound the amount of time spent
> on a single scheduling pass, each `vits_cq` will only have a single
> batch of commands enqueued with the pITS at a time.
>
> If a `vits_cq` has no pending commands then it is removed from the
> list.
>
> If a `vits_cq` already has commands enqueued with the pITS Command
> Queue then it is skipped.
>
> If a `vits_cq` has some pending commands then `min(pits-free-slots,
> vits-outstanding, VITS_BATCH_SIZE)` will be taken from the vITS
> command queue, translated and placed onto the pITS
> queue. `vits_cq.progress` will be updated to reflect this.
>
> Each `vits_cq` is handled in turn in this way until the pITS Command
> Queue is full, there are no more outstanding commands or each active
> `vits_cq` has commands enqueued with the pITS.
>
> There will likely need to be a data structure which shadows the pITS
> Command Queue slots with references to the `vits_cq` which has a
> command currently occupying that slot and the corresponding index into
> the virtual command queue, for use when completing a command.
>
> `VITS_BATCH_SIZE` should be small, TBD say 4 or 8.
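The batch size computation described above is just a three-way minimum; as a sketch:

```c
#include <stdint.h>

#define VITS_BATCH_SIZE 8u

static inline uint32_t min_u32(uint32_t a, uint32_t b)
{
    return a < b ? a : b;
}

/*
 * How many commands to take from one vits_cq during a refill: bounded
 * by free pITS slots, by what the guest has queued, and by the
 * per-vits batch size.
 */
uint32_t vits_batch_take(uint32_t pits_free_slots, uint32_t vits_outstanding)
{
    return min_u32(min_u32(pits_free_slots, vits_outstanding),
                   VITS_BATCH_SIZE);
}
```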
>
> ## Completion
>
> It is expected that commands will normally be completed (resulting in
> an update of the corresponding `vits_cq.creadr`) via guest read from
> `CREADR`. This will trigger a scheduling pass which will ensure the
> `vits_cq.creadr` value is up to date before it is returned.
>
> A guest which does completion via the use of `INT` cannot observe
> `CREADR` without reading it, so updating on read from `CREADR`
> suffices from the point of view of the guests observation of the
> state. (Of course we will inject the interrupt at the designated point
> and the guest may well then read `CREADR`)
>
> However in order to keep the pITS Command Queue moving along we need
> to consider what happens if there are no `INT` based events nor reads
> from `CREADR` to drive completion and therefore refilling of the Queue
> with other outstanding commands.
>
> A guest which enqueues some commands and then never checks for
> completion cannot itself block things because any other guest which
> reads `CREADR` will drive completion. However if _no_ guest reads from
> `CREADR` then completion will not occur and this must be dealt with.
>
> Even if we include completion on `INT`-based interrupt injection then
> it is possible that the pITS queue may not contain any such
> interrupts, either because no guest is using them or because the
> batching means that none of them are enqueued on the active ring at
> the moment.
>
> So we need a fallback to ensure that the queue keeps moving. There are
> several options:
>
> * A periodic timer in Xen which runs whenever there are outstanding
>    commands in the pITS. This is simple but pretty sucky.
> * Xen injects its own `INT` commands into the pITS ring. This requires
>    figuring out a device ID to use.
>
> The second option is likely to be preferable if the issue of selecting
> a device ID can be addressed.
>
> A secondary question is when these `INT` commands should be inserted
> into the command stream:
>
> * After each batch taken from a single `vits_cq`;
> * After each scheduling pass;
> * One active in the command stream at any given time;
>
> The latter should be sufficient: by arranging to insert an `INT` into
> the stream at the end of any scheduling pass which occurs while there
> is not a currently outstanding `INT`, we have a sufficient backstop to
> allow us to refill the ring.
>
> This assumes that there is no particular benefit to keeping the
> `CWRITER` rolling ahead of the pITS's actual processing. This is true
> because the ITS operates on commands in the order they appear in the
> queue, so there is no need to maintain a runway ahead of the ITS
> processing. (XXX If this is a concern perhaps the INT could be
> inserted at the head of the final batch of commands in a scheduling
> pass instead of the tail).
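A sketch of the "one active `INT`" rule; the enqueue hook here is a stand-in for real command generation (names hypothetical):

```c
#include <stdbool.h>

struct pits {
    bool xen_int_outstanding;    /* one of our own INTs still in flight? */
    unsigned int cmds_batched;   /* guest commands queued by this pass */
};

static unsigned int xen_ints_sent;

/* Stand-in for formatting "INT xen_devid, xen_eventid" onto the ring. */
static void pits_enqueue_xen_int(struct pits *p)
{
    (void)p;
    xen_ints_sent++;
}

/*
 * End of a scheduling pass: keep at most one Xen INT in the command
 * stream, as a backstop to drive completion when no guest reads CREADR
 * and no guest INT happens to be on the ring.
 */
void pits_maybe_append_int(struct pits *p)
{
    if (p->cmds_batched && !p->xen_int_outstanding) {
        pits_enqueue_xen_int(p);
        p->xen_int_outstanding = true;
    }
}
```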
>
> Xen itself should never need to issue an associated `SYNC` command,
> since the individual guests would need to issue those themselves when
> they care. The `INT` only serves to allow Xen to enqueue new commands
> when there is space on the ring; Xen has no interest itself in the
> actual completion.
>
> ## Locking
>
> It may be preferable to use `atomic_t` types for various fields
> (e.g. `vits_cq.creadr`) in order to reduce the amount and scope of
> locking required.
>
> # ITS Command Translation
>
> This section is based on the section 5.13 of GICv3 specification
> (PRD03-GENC-010745 24.0). The goal is to provide insight of the cost
> to emulate ITS commands in Xen.
>
> The ITS provides 12 commands in order to manage interrupt collections,
> devices and interrupts. Possible command parameters are the device ID
> (`ID`), Event ID (`vID`), Collection ID (`vCID`) and Target Address
> (`vTA`).
>
> These parameters need to be validated and translated from Virtual to
> Physical.
>
> ## Parameter Validation / Translation
>
> Each command contains parameters that need to be validated before any
> usage in Xen or passing to the hardware.
>
> ### Device ID (`ID`)
>
> This parameter is used by commands which manage a specific device and
> the interrupts associated with that device. Checking if a device is
> present and retrieving the data structure must be fast.
>
> The device identifiers may not be assigned contiguously and the maximum
> number is very high (2^32).
>
> XXX In the context of virtualised device ids this may not be the case,
> e.g. we can arrange for (mostly) contiguous device ids and we know the
> bound is significantly lower than 2^32

Well, the deviceID is computed from the BDF and some DMA alias. As the
algorithm can't be tweaked, it's very likely that we will have
non-contiguous Device ID. See pci_for_each_dma_alias in Linux
(drivers/pci/search.c).

> Possible efficient data structures would be:
>
> 1. List: Lookup/deletion is O(n) and the cost of insertion depends on
>     whether the devices are kept sorted by identifier. The
>     memory overhead is 18 bytes per element.
> 2. Red-black tree: All the operations are O(log(n)). The memory
>     overhead is 24 bytes per element.
>
> A Red-black tree seems more suitable for having fast deviceID
> validation even though the memory overhead is a bit higher compared to
> the list.
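As a sketch, the lookup walk itself is simple; a real implementation would embed Xen's rb_node to keep the tree balanced, but a plain BST walk shows the O(log n) idea (names hypothetical):

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Illustrative node: the real implementation would embed a struct
 * rb_node for self-balancing; the lookup walk is the same.
 */
struct its_device {
    uint32_t devid;
    struct its_device *left, *right;
};

/* Validate a deviceID by walking the tree; NULL means "no such device". */
struct its_device *its_device_find(struct its_device *root, uint32_t devid)
{
    while (root) {
        if (devid == root->devid)
            return root;
        root = devid < root->devid ? root->left : root->right;
    }
    return NULL;
}
```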
>
> ### Event ID (`vID`)
>
> This is the per-device Interrupt identifier (i.e. the MSI index). It
> is configured by the device driver software.
>
> It is not necessary to translate a `vID`, however they may need to be
> represented in various data structures given to the pITS.
>
> XXX is any of this true?


Right, the vID will always be equal to the pID. Although you will need
to associate a physical LPI with every pair (vID, DevID).

> ### Interrupt Collection (`vCID`)
>
> This parameter is used in commands which manage collections and
> interrupts in order to move them from one CPU to another. The ITS is
> only mandated to implement N + 1 collections where N is the number of
> processors on the platform (i.e. max number of VCPUs for a given
> guest). Furthermore, the identifiers are always contiguous.
>
> If we decide to implement the strict minimum (i.e N + 1), an array is
> enough and will allow operations in O(1).
>
> XXX Could forgo array and go straight to vcpu_info/domain_info.

Not really, the number of collections is always one higher than the
number of VCPUs. How would you store the last collection?
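A minimal sketch of such an array-based vCID map, allocated with nr_vcpus + 1 entries as noted above (all names hypothetical):

```c
#include <stdint.h>
#include <stdlib.h>

#define INVALID_PCID UINT16_MAX

/*
 * Per-domain collection map: N + 1 entries for a guest with N VCPUs,
 * indexed directly by vCID for O(1) lookup.
 */
struct vits_cid_map {
    unsigned int nr;             /* nr_vcpus + 1 */
    uint16_t pcid[];             /* vCID -> pCID */
};

struct vits_cid_map *cid_map_alloc(unsigned int nr_vcpus)
{
    unsigned int nr = nr_vcpus + 1;
    struct vits_cid_map *m =
        calloc(1, sizeof(*m) + nr * sizeof(m->pcid[0]));
    if (!m)
        return NULL;
    m->nr = nr;
    for (unsigned int i = 0; i < nr; i++)
        m->pcid[i] = INVALID_PCID;
    return m;
}

/* O(1) validation + translation of a guest collection ID. */
int cid_map_lookup(const struct vits_cid_map *m, uint32_t vcid, uint16_t *pcid)
{
    if (vcid >= m->nr || m->pcid[vcid] == INVALID_PCID)
        return -1;
    *pcid = m->pcid[vcid];
    return 0;
}
```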

>
> ### Target Address (`vTA`)
>
> This parameter is used in commands which manage collections. It is a
> unique identifier per processor. The format differs depending on the
> value of the `GITS_TYPER.PTA` bit. The value of the field is fixed by
> the ITS implementation and the software has to handle both cases.
>
> A solution with `GITS_TYPER.PTA` set to one will require some
> computation in order to find the VCPU associated with the
> redistributor address. It will be similar to get_vcpu_from_rdist in
> the vGICv3 emulation (xen/arch/arm/vgic-v3.c).
>
> On the other hand, setting GITS_TYPER.PTA to zero will give us control to
> decide the linear processor number, which could simply be the vcpu_id
> (always linear).
>
> XXX Non-linear VCPUs e.g. via AFFR1 hierarchy?

No matter of the number of affinity levels we will handle, the vcpu_id
will always be linear as it's used as an index in an array.

> ## Command Translation
>
> Of the existing GICv3 ITS commands, `MAPC`, `MAPD` and `MAPVI`/`MAPI` are
> potentially time consuming commands as these commands create entries in
> the Xen ITS structures, which are used to validate other ITS commands.
>
> `INVALL` and `SYNC` are global and potentially disruptive to other
> guests and so need consideration.

INVALL and SYNC are not global. They both take a parameter: vCID for
INVALL and vTarget for SYNC.

INVALL ensures that any interrupts in the specified collection are
reloaded. SYNC ensures that all previous commands, and all outstanding
physical actions relating to the specified redistributor, are completed.

> All other ITS commands like `MOVI`, `DISCARD`, `INV`, `INT` and `CLEAR`
> just validate parameters and generate a physical command.
>
> ### `MAPC` command translation
>
> Format: `MAPC vCID, vTA`
>
> - `MAPC pCID, pTA` physical ITS command is generated

We should not send any MAPC command to the physical ITS. The collection
is already mapped during Xen boot.

This command should only assign a pCID to the vCID.

>
> ### `MAPD` Command translation
>
> Format: `MAPD device, Valid, ITT IPA, ITT Size`
>
> `MAPD` is sent with `Valid` bit set if device needs to be added and reset
> when device is removed.

Another case: the ITT is replaced. This use case needs more care because
we need to ensure that all the interrupts are disabled before switching
to the new ITT.

> If `Valid` bit is set:
>
> - Allocate memory for `its_device` struct
> - Validate ITT IPA & ITT size and update its_device struct
> - Find number of vectors(nrvecs) for this device by querying PCI
>    helper function
> - Allocate nrvecs number of LPI XXX nrvecs is a function of `ITT Size`?
> - Allocate memory for `struct vlpi_map` for this device. This
>    `vlpi_map` holds mapping of Virtual LPI to Physical LPI and ID.
> - Find physical ITS node with which this device is associated

XXX: The MAPD command is using a virtual DevID which is different from
the pDevID (because the BDF is not the same). How do you find the
corresponding translation?

> - Call `p2m_lookup` on ITT IPA addr and get physical ITT address
> - Validate ITT Size
> - Generate/format physical ITS command: `MAPD, ITT PA, ITT Size`

I had some thoughts about the other validation problem with the ITT. The
region will be used by the ITS to store the mapping between the ID and
the LPI along with some other information.

I guess that if the OS is playing with the ITT (such as writing in it)
the ITS will behave badly. We have to ensure that the guest will never
write to it and, at the same time, that the same region is not passed
to 2 devices.

> Here the overhead is with memory allocation for `its_device` and `vlpi_map`
>
> XXX Suggestion was to preallocate some of those at device passthrough
> setup time?

Some of the information can even be set up when the PCI device is added
to Xen (such as the number of MSIs supported and the physical LPI chunk).

> If Validation bit is not set:
>
> - Validate if the device exists by checking vITS device list
> - Clear all `vlpis` assigned for this device
> - Remove this device from vITS list
> - Free memory
>
> XXX If preallocation presumably shouldn't free here either.

Right. We could use a field to say if the device is activated or not.

>
> ### `MAPVI`/`MAPI` Command translation
>
> Format: `MAPVI device, ID, vID, vCID`

Actually the 2 commands are completely different:
	- MAPI maps a (DevID, ID) to a collection
	- MAPVI maps a (DevID, ID) to a collection and an LPI.

The process described below is only about MAPVI.

Also what about interrupt re-mapping?

>
> - Validate if the device exits by checking vITS device list

s/exits/exists/

> - Validate vCID and get pCID by searching cid_map
>
> - if vID does not have entry in `vlpi_entries` of this device allocate
>    a new pID from `vlpi_map` of this device and update `vlpi_entries`
>    with new pID

What if the vID is already used by another mapping?

> - Allocate irq descriptor and add to RB tree
> - call `route_irq_to_guest()` for this pID
> - Generate/format physical ITS command: `MAPVI device ID, pID, pCID`


> Here the overhead is allocating physical ID, allocate memory for irq
> descriptor and routing interrupt.
>
> XXX Suggested to preallocate?

Right. We may also need a separate routing function for LPIs as the
current one takes quite a long time to execute.

I was thinking of routing the interrupt at device assignment
(assuming we allocate the pLPIs at that time), and only setting the mapping
to vLPIs when the MAPI is called.

>
> ### `INVALL` Command translation

The format of INVALL is INVALL collection

> A physical `INVALL` is only generated if the LPI dirty bitmap has any
> bits set. Otherwise it is skipped.
>
> XXX Perhaps bitmap should just be a simple counter?

We would need to handle it per collection.

> XXX bitmap is host global, a per-domain bitmap would allow us to elide
> `INVALL` unless an LPI associated with the guest making the request
> was dirty. Would also need some sort of "ITS INVALL" clock in order
> that other guests can elide their own `INVALL` if one has already
> happened. Complexity not worth it at this stage?

Given that I just discovered that INVALL is also taking a collection in
parameter, it will likely be more complex.
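A sketch of per-collection elision using the counter suggestion above (illustrative only; `NR_COLLECTIONS` and all names are hypothetical):

```c
#include <stdbool.h>

#define NR_COLLECTIONS 9         /* nr_cpus + 1 on a hypothetical 8-CPU host */

/*
 * Per-collection count of LPI configuration writes not yet flushed by
 * an invalidate, following the "simple counter" idea but kept per
 * collection since INVALL takes a collection parameter.
 */
static unsigned int lpi_cfg_dirty[NR_COLLECTIONS];

void lpi_cfg_mark_dirty(unsigned int cid)
{
    lpi_cfg_dirty[cid]++;
}

/*
 * Emulating a guest INVALL on a vCID mapped to pCID: only emit a
 * physical INVALL if something in that collection is actually stale.
 */
bool invall_needed(unsigned int pcid)
{
    if (!lpi_cfg_dirty[pcid])
        return false;
    lpi_cfg_dirty[pcid] = 0;     /* the INVALL we emit will flush these */
    return true;
}
```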

>
> ### `SYNC` Command translation

The format of SYNC is SYNC target. It only ensures completion for a
re-distributor.
Although the pseudo-code (see perform_sync in 5.13.22 in
PRD03-GENC-010745 24.0) seems to say it waits for all re-distributors...
I'm not sure which to trust.

> Can be omitted from the physical command stream if the previous
> command was also a `SYNC`, i.e. due to a guest sending a series of
> `SYNC` commands or one guest's batch ending with one and the nexts
> beggining.

s/beggining/beginning/

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Draft C] Xen on ARM vITS Handling
  2015-05-27 16:44 ` Vijay Kilari
@ 2015-05-29 14:06   ` Julien Grall
  2015-06-01 13:36     ` Ian Campbell
  2015-06-01 12:11   ` Ian Campbell
  1 sibling, 1 reply; 15+ messages in thread
From: Julien Grall @ 2015-05-29 14:06 UTC (permalink / raw)
  To: Vijay Kilari, Ian Campbell
  Cc: manish.jaggi, Julien Grall, Stefano Stabellini, xen-devel

Hi Vijay,

On 27/05/15 17:44, Vijay Kilari wrote:
>> ## Command Translation
>>
>> Of the existing GICv3 ITS commands, `MAPC`, `MAPD`, `MAPVI`/`MAPI` are
>> potentially time consuming commands as these commands creates entry in
>> the Xen ITS structures, which are used to validate other ITS commands.
>>
>> `INVALL` and `SYNC` are global and potentially disruptive to other
>> guests and so need consideration.
>>
>> All other ITS command like `MOVI`, `DISCARD`, `INV`, `INT`, `CLEAR`
>> just validate and generate physical command.
>>
>> ### `MAPC` command translation
>>
>> Format: `MAPC vCID, vTA`
>>
>    -  The GITS_TYPER.PAtype is emulated as 0. Hence vTA always represents
>       a vcpu number. The vTA is validated against physical Collection IDs
>       by querying the ITS driver and the corresponding Physical Collection
>       ID is retrieved.
>    -  Each vITS will have cid_map (struct cid_mapping) which holds mapping of

Why do you speak about each vITS? The emulation is only related to one
vITS and not shared...

>       Virtual Collection ID (vCID), Virtual Target address (vTA) and
>       Physical Collection ID (pCID).
>       If a vCID entry already exists in cid_map, then that particular
>       mapping is updated with the new pCID and vTA; else a new entry is
>       made in cid_map.

When you move a collection, you also have to make sure that all the
interrupts associated to it will be delivered to the new target.

I'm not sure what you are suggesting for that...


>    -  MAPC pCID, pTA physical ITS command is generated

We should not send any MAPC command to the physical ITS. The collection
is already mapped during Xen boot and the guest should not be able to
move the physical collections (they are shared between all the guests and
Xen).


> 
>    Here there is no overhead; the cid_map entries are preallocated
>    with size nr_cpus for the platform.

As said, the number of collections should be at least nr_cpus + 1.

> 
>> - `MAPC pCID, pTA` physical ITS command is generated
>>
>> ### `MAPD` Command translation
>>
>> Format: `MAPD device, Valid, ITT IPA, ITT Size`
>>
>> `MAPD` is sent with `Valid` bit set if device needs to be added and reset
>> when device is removed.
>>
>> If `Valid` bit is set:
>>
>> - Allocate memory for `its_device` struct
>> - Validate ITT IPA & ITT size and update its_device struct
>> - Find number of vectors(nrvecs) for this device by querying PCI
>>   helper function
>> - Allocate nrvecs number of LPI XXX nrvecs is a function of `ITT Size`?
>> - Allocate memory for `struct vlpi_map` for this device. This
>>   `vlpi_map` holds mapping of Virtual LPI to Physical LPI and ID.
>> - Find physical ITS node with which this device is associated
>> - Call `p2m_lookup` on ITT IPA addr and get physical ITT address
>> - Validate ITT Size
>> - Generate/format physical ITS command: `MAPD, ITT PA, ITT Size`
>>
>> Here the overhead is with memory allocation for `its_device` and `vlpi_map`
>>
>> XXX Suggestion was to preallocate some of those at device passthrough
>> setup time?
> 
> If Validation bit is set:
>    - Query the its_device tree and get the its_device structure for this device.
>    - (XXX: If the pci device is hidden from dom0, is this device added
>        with the PHYSDEVOP_pci_device_add hypercall?)
>    - If the device does not exist, return
>    - If device exists in RB-tree then
>           - Validate ITT IPA & ITT size and update its_device struct

To validate the ITT size you need to know the number of interrupt IDs.

>           - Check if device is already assigned to the domain,
>             if not then
>                - Find the number of vectors (nrvecs) for this device.
>                - Allocate nrvecs LPIs
>                - Fetch the vlpi_map for this device (preallocated at the
>                  time of adding this device to Xen). This vlpi_map holds
>                  the mapping of Virtual LPI to Physical LPI and ID.
>                - Call p2m_lookup on ITT IPA addr and get physical ITT address
>                - Assign this device to this domain and mark as enabled
>           - If this device already exists with the domain (Domain is
>             remapping the device)
>                - Validate ITT IPA & ITT size and update its_device struct
>                - Call p2m_lookup on ITT IPA addr and get physical ITT address
>                - Disable all the LPIs of this device by searching
>                  through the vlpi_map and LPI configuration table

Disabling all the LPIs associated with a device can be time consuming
because you have to unroute them and make sure that the physical ITS
has effectively disabled them before sending the MAPD command.

Given that the software would be buggy if it sent a MAPD command without
releasing all the associated interrupts, we could ignore the command if
any interrupt is still enabled.

> 
>           - Generate/format physical ITS command: MAPD, ITT PA, ITT Size
> 
>>
>> If Validation bit is not set:
>>
>> - Validate if the device exits by checking vITS device list
>> - Clear all `vlpis` assigned for this device
>> - Remove this device from vITS list
>> - Free memory
>>
>> XXX If preallocation presumably shouldn't free here either.
>>
> 
> If Validation bit is not set:
>     - Validate if the device exits by checking RB-tree and is assigned

exists

> to this domain
>     - Disable all the LPIs associated with this device and release irq

That could be very expensive... We should deny the command if the
associated interrupts are not disabled.

>     - Clear all vlpis mapping for this device
>     - Remove this device from the domain
> 
>> ### `MAPVI`/`MAPI` Command translation
>>
>> Format: `MAPVI device, ID, vID, vCID`
>>
>> - Validate if the device exits by checking vITS device list
>> - Validate vCID and get pCID by searching cid_map
>>
>> - if vID does not have entry in `vlpi_entries` of this device allocate
>>   a new pID from `vlpi_map` of this device and update `vlpi_entries`
>>   with new pID
>> - Allocate irq descriptor and add to RB tree
>> - call `route_irq_to_guest()` for this pID
>> - Generate/format physical ITS command: `MAPVI device ID, pID, pCID`
>>
> 
> - Validate if the device exists by checking vITS device RB-tree.
> - Validate vCID and get pCID by searching cid_map
> - if vID does not have an entry in vlpi_entries of this device
>       - Allot a pID from the vlpi_map of this device and update
>         vlpi_entries with the new pID.
>       - Allocate irq descriptor and add to RB tree
>       - call route_irq_to_guest() for this pID
>   If it exists,
>      - If vCID is different (remapping interrupts to a different collection),
>             - Disable LPI

You have to ensure that the LPI is disabled with an INV/SYNC.

Although, as suggested for the previous commands, we may want to deny the
command if the interrupt is not disabled.

>             - Update the vlpi_map
>              (XXX: Enable LPI on guest request?)

Read the spec...

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Draft C] Xen on ARM vITS Handling
  2015-05-27 16:44 ` Vijay Kilari
  2015-05-29 14:06   ` Julien Grall
@ 2015-06-01 12:11   ` Ian Campbell
  2015-06-01 12:24     ` Julien Grall
  1 sibling, 1 reply; 15+ messages in thread
From: Ian Campbell @ 2015-06-01 12:11 UTC (permalink / raw)
  To: Vijay Kilari; +Cc: manish.jaggi, Julien Grall, Stefano Stabellini, xen-devel

On Wed, 2015-05-27 at 22:14 +0530, Vijay Kilari wrote:
> > ## pITS Scheduling
> >
> > A pITS scheduling pass is attempted:
> >
> > * On write to any virtual `CWRITER` iff that write results in there
> >   being new outstanding requests for that vits;
> 
>    You mean, a scheduling pass (softirq trigger) is triggered iff there are no
> ongoing requests from that vits?

Yes, this has changed with the switch to only a single outstanding
batch. I went with:
        
        * On write to any virtual `CWRITER` iff that write results in there
          being new outstanding requests for that vits which could be consumed
          by the pits (i.e. subject to only a single batch being
          permitted by the scheduler);
        

Although implementation-wise it may be OK to defer that decision to the
scheduler, rather than trying to figure it out in the mmio trap.

> 
> > * On read from a virtual `CREADR` iff there are commands outstanding
> >   on that vits;
> > * On receipt of an interrupt notification arising from Xen's own use
> >   of `INT`; (see discussion under Completion)
> > * On any interrupt injection arising from a guests use of the `INT`
> >   command; (XXX perhaps, see discussion under Completion)
> >
> > This may result in lots of contention on the scheduler
> > locking. Therefore we consider that in each case all which happens is
> > triggering of a softirq which will be processed on return to guest,
> > and just once even for multiple events.
> 
> Is it required to have all the cases to trigger scheduling pass?
> Just on CWRITER if no ongoing request and on Xen's own completion INT
> is not sufficient?

I think CREADR is needed too, so the guest sees up to date info.

And on injection arising from the guest use of INT is marked as optional
here and considered later on. Whether it is needed depends on the
decision there.

> [...]
> > The second option is likely to be preferable if the issue of selecting
> > a device ID can be addressed.
> >
> > A secondary question is when these `INT` commands should be inserted
> > into the command stream:

(Nb, this is a list of options, not a list of places where it must be
done)

> >
> > * After each batch taken from a single `vits_cq`;
> 
>    Is this not enough? Because a scheduling pass just sends one batch of
> commands with Xen's INT command.

It is almost certainly _sufficient_; the question is more whether it is
_necessary_, or whether we can reduce the number of interrupts required
for correct emulation of a vITS, i.e. whether we can get away with one
of the other two options.

The following text argues that only one Xen INT is needed in the stream
at any given moment.

> > ### Device ID (`ID`)
> >
> > This parameter is used by commands which manage a specific device and
> > the interrupts associated with that device. Checking if a device is
> > present and retrieving the data structure must be fast.
> >
> > The device identifiers may not be assigned contiguously and the maximum
> > number is very high (2^32).
> >
> > XXX In the context of virtualised device ids this may not be the case,
> > e.g. we can arrange for (mostly) contiguous device ids and we know the
> > bound is significantly lower than 2^32
> >
> > Possible efficient data structures would be:
> >
> > 1. List: The lookup/deletion is in O(n) and the insertion will depend
> >    if the device should be sorted following their identifier. The
> >    memory overhead is 18 bytes per element.
> > 2. Red-black tree: All the operations are O(log(n)). The memory
> >    overhead is 24 bytes per element.
> >
> > A Red-black tree seems the more suitable for having fast deviceID
> > validation even though the memory overhead is a bit higher compare to
> > the list.
> 
> When PHYSDEVOP_pci_device_add is called, memory for its_device structure
> and other needed structure for this device is allocated added to RB-tree
> with all necessary information

Sounds like a reasonable time to do it. I added something based on your
words.

[...]
> > Format: `MAPC vCID, vTA`
> >
>    -  The GITS_TYPER.PAtype is emulated as 0.

ITYM `GITS_TYPER.PTA`?

I've updated various introductory section to reflect the decision to
emulate as 0.

> 
> > - `MAPC pCID, pTA` physical ITS command is generated
> >
> > ### `MAPD` Command translation
> >
> > Format: `MAPD device, Valid, ITT IPA, ITT Size`
> >
> > `MAPD` is sent with `Valid` bit set if device needs to be added and reset
> > when device is removed.
> >
> > If `Valid` bit is set:
> >
> > - Allocate memory for `its_device` struct
> > - Validate ITT IPA & ITT size and update its_device struct
> > - Find number of vectors(nrvecs) for this device by querying PCI
> >   helper function
> > - Allocate nrvecs number of LPI XXX nrvecs is a function of `ITT Size`?
> > - Allocate memory for `struct vlpi_map` for this device. This
> >   `vlpi_map` holds mapping of Virtual LPI to Physical LPI and ID.
> > - Find physical ITS node with which this device is associated
> > - Call `p2m_lookup` on ITT IPA addr and get physical ITT address
> > - Validate ITT Size
> > - Generate/format physical ITS command: `MAPD, ITT PA, ITT Size`
> >
> > Here the overhead is with memory allocation for `its_device` and `vlpi_map`
> >
> > XXX Suggestion was to preallocate some of those at device passthrough
> > setup time?
> 
> If Validation bit is set:

I think the proper name for this is just "Valid bit"; it simply
indicates that the entry should be made valid or invalid (i.e. added or
removed) rather than requesting any kind of validation.

>    - Query its_device tree and get its_device structure for this device.
>    - (XXX: If pci device is hidden from dom0, does this device is added
>        with PHYSDEVOP_pci_device_add hypercall?)

We do/will not hide PCI devices from dom0 in the same way we do for
platform devices; we should do as for x86 and rely on xen-pciback to
bind to the PCI device and keep real drivers away, rather than on
actual hiding (which would be complex).

So there should always be a PHYSDEVOP_pci_device_add.

If a PCI host-bridge/controller itself is hidden from dom0 then devices
behind it will not be available for passthrough.

>    - If device does not exists return
>    - If device exists in RB-tree then
>           - Validate ITT IPA & ITT size and update its_device struct
>           - Check if device is already assigned to the domain,

If a device isn't assigned to the domain then we shouldn't be assigning
it.

I suspect you actually meant something like: if the device is already
mapped in the (v)ITS table. But then the following text seems to
discount updating the table with a second MAPD, since it only
considers the case where the device is not already mapped.

I didn't update the remainder of this entry since I'm not sure what to
change to account for the above.

Ian.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Draft C] Xen on ARM vITS Handling
  2015-06-01 12:11   ` Ian Campbell
@ 2015-06-01 12:24     ` Julien Grall
  2015-06-01 13:45       ` Ian Campbell
  2015-06-03  7:25       ` Vijay Kilari
  0 siblings, 2 replies; 15+ messages in thread
From: Julien Grall @ 2015-06-01 12:24 UTC (permalink / raw)
  To: Ian Campbell, Vijay Kilari
  Cc: manish.jaggi, Julien Grall, Stefano Stabellini, xen-devel

On 01/06/15 13:11, Ian Campbell wrote:
>>> ### Device ID (`ID`)
>>>
>>> This parameter is used by commands which manage a specific device and
>>> the interrupts associated with that device. Checking if a device is
>>> present and retrieving the data structure must be fast.
>>>
>>> The device identifiers may not be assigned contiguously and the maximum
>>> number is very high (2^32).
>>>
>>> XXX In the context of virtualised device ids this may not be the case,
>>> e.g. we can arrange for (mostly) contiguous device ids and we know the
>>> bound is significantly lower than 2^32
>>>
>>> Possible efficient data structures would be:
>>>
>>> 1. List: The lookup/deletion is in O(n) and the insertion will depend
>>>    if the device should be sorted following their identifier. The
>>>    memory overhead is 18 bytes per element.
>>> 2. Red-black tree: All the operations are O(log(n)). The memory
>>>    overhead is 24 bytes per element.
>>>
>>> A Red-black tree seems the more suitable for having fast deviceID
>>> validation even though the memory overhead is a bit higher compare to
>>> the list.
>>
>> When PHYSDEVOP_pci_device_add is called, memory for its_device structure
>> and other needed structure for this device is allocated added to RB-tree
>> with all necessary information
> 
> Sounds like a reasonable time to do it. I added something based on your
> words.

Hmmm... The RB-tree suggested is per domain, not host-wide, and is
indexed by the vDevID.

This is the only way to know quickly whether the domain is able to use
the device, and to retrieve the device quickly. Indeed, the vDevID
won't be equal to the pDevID, as the vBDF will differ from the pBDF.

PHYSDEVOP_pci_device_add asks Xen to manage the PCI device. At that
time we don't know to which domain the device will be passed through.

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Draft C] Xen on ARM vITS Handling
  2015-05-29 13:40 ` Julien Grall
@ 2015-06-01 13:12   ` Ian Campbell
  2015-06-01 15:29     ` Julien Grall
  0 siblings, 1 reply; 15+ messages in thread
From: Ian Campbell @ 2015-06-01 13:12 UTC (permalink / raw)
  To: Julien Grall
  Cc: manish.jaggi, Julien Grall, Stefano Stabellini, Vijay Kilari, xen-devel

On Fri, 2015-05-29 at 14:40 +0100, Julien Grall wrote:
> Hi Ian,
> 
> NIT: You used my Linaro email which I think is de-activated now :).

I keep finding new address books with that address in them!

> > ## ITS Translation Table
> >
> > Message signalled interrupts are translated into an LPI via an ITS
> > translation table which must be configured for each device which can
> > generate an MSI.
> 
> I'm not sure what is the ITS Table Table. Did you mean Interrupt
> Translation Table?

I don't think I wrote Table Table anywhere.

I'm referring to the tables which are established by e.g. the MAPD
command and friends, e.g. the thing shown in "4.9.12 Notional ITS Table
Structure".

> > is _not_ guarenteed that a change to the LPI Configuration Table won't
> 
> s/guarenteed/guaranteed/? Or may the first use of this word was wrong?

guaranteed is correct, I can never remember it though.

> > XXX there are other aspects to virtualising the ITS (LPI collection
> > management, assignment of LPI ranges to guests, device
> > management). However these are not currently considered here. XXX
> > Should they be/do they need to be?
> 
> I think we began to cover these aspect with the section "command emulation".

Some aspects, yes. I went with:

        There are other aspects to virtualising the ITS (LPI collection
        management, assignment of LPI ranges to guests, device
        management). However these are only considered here to the extent
        needed for describing the vITS emulation.

> > XXX In the context of virtualised device ids this may not be the case,
> > e.g. we can arrange for (mostly) contiguous device ids and we know the
> > bound is significantly lower than 2^32
> 
> Well, the deviceID is computed from the BDF and some DMA alias. As the
> algorithm can't be tweaked, it's very likely that we will have
> non-contiguous Device ID. See pci_for_each_dma_alias in Linux
> (drivers/pci/search.c).

The implication here is that the deviceID is fixed in hardware and is
used by driver domain software in contexts where we do not get the
opportunity to translate it, is that right? What contexts are those?

Note that the BDF is also something which we could in principle
virtualise (we already do so for domU). Perhaps that is infeasible for
dom0 though?

That gives me two thoughts.

The first is that although device identifiers are not necessarily
contiguous, they are generally at least grouped and not allocated at
random through the 2^32 options. For example a PCI Host bridge typically
has a range of device ids associated with it and each device has a
device id derived from that.

I'm not sure if we can leverage that into a more useful data structure
than an R-B tree, or, for example, arrange for the R-B tree to allow
the translation of a device within a span into the parent span and do
the lookup from there. Specifically, when looking up a device ID
corresponding to a PCI device, we could arrange to find the PCI host
bridge first and then find the actual device from there. This would
keep the R-B tree much smaller and therefore perhaps quicker? Of
course that depends on what the lookup from PCI host bridge to a
device looks like.

The second is that perhaps we can do something simpler for the domU
case, if we were willing to tolerate it being different from dom0.

> > Possible efficient data structures would be:
> >
> > 1. List: The lookup/deletion is in O(n) and the insertion will depend
> >     if the device should be sorted following their identifier. The
> >     memory overhead is 18 bytes per element.
> > 2. Red-black tree: All the operations are O(log(n)). The memory
> >     overhead is 24 bytes per element.
> >
> > A Red-black tree seems the more suitable for having fast deviceID
> > validation even though the memory overhead is a bit higher compare to
> > the list.
> >
> > ### Event ID (`vID`)
> >
> > This is the per-device Interrupt identifier (i.e. the MSI index). It
> > is configured by the device driver software.
> >
> > It is not necessary to translate a `vID`, however they may need to be
> > represented in various data structures given to the pITS.
> >
> > XXX is any of this true?
> 
> 
> Right, the vID will always be equal to the pID. Although you will need
> to associate a physical LPI for every pair (vID, DevID).

I think in the terms defined by this document that is (`ID`, `vID`) =>
an LPI. Right?

Have we considered how this mapping will be tracked?
 
> > ### Interrupt Collection (`vCID`)
> >
> > This parameter is used in commands which manage collections and
> > interrupt in order to move them for one CPU to another. The ITS is
> > only mandated to implement N + 1 collections where N is the number of
> > processor on the platform (i.e max number of VCPUs for a given
> > guest). Furthermore, the identifiers are always contiguous.
> >
> > If we decide to implement the strict minimum (i.e N + 1), an array is
> > enough and will allow operations in O(1).
> >
> > XXX Could forgo array and go straight to vcpu_info/domain_info.
> 
> Not really, the number of collection is always one higher than the
> number of VCPUs. How would you store the last collection?

In domain_info. What I meant was:

    if ( vcid == domain->nr_vcpus )
         return domain->interrupt_collection;
    else if ( vcid < domain->nr_vcpus )
         return domain->vcpus[vcid]->interrupt_collection;
    else
         /* invalid vcid */

Similar to how SPI vs PPI interrupts are handled.

> > ## Command Translation
> >
> > Of the existing GICv3 ITS commands, `MAPC`, `MAPD`, `MAPVI`/`MAPI` are
> > potentially time consuming commands as these commands creates entry in
> > the Xen ITS structures, which are used to validate other ITS commands.
> >
> > `INVALL` and `SYNC` are global and potentially disruptive to other
> > guests and so need consideration.
> 
> INVALL and SYNC are not global. They both take a parameter: vCID for
> INVALL and vTarget for SYNC.

By global I meant not associated with a specific device. I went with:

        `INVALL` and `SYNC` are not specific to a given device (they are per
        collection per target respectively) and are therefore potentially
        disruptive to other guests and so need consideration.

> INVALL ensures that any interrupts in the specified collection are
> re-load. SYNC ensures that all the previous command, and all outstanding
> physical actions relating to the specified re-distributor are completed.

> 
> > All other ITS command like `MOVI`, `DISCARD`, `INV`, `INT`, `CLEAR`
> > just validate and generate physical command.
> >
> > ### `MAPC` command translation
> >
> > Format: `MAPC vCID, vTA`
> >
> > - `MAPC pCID, pTA` physical ITS command is generated
> 
> We should not send any MAPC command to the physical ITS. The collection
> is already mapped during Xen boot.

What is the plan for this start of day mapping? One collection per pCPU
and ignore the rest?

It seems (section 4.9.2) that there are two potential kinds of
collections, ones internal to the ITS and others where data is held in
external memory. The numbers of both are limited by the hardware.

I suppose the internal ones will be faster.

Supposing that a guest is likely to use collections to map interrupts to
specific vcpus, and that the physical collections will be mapped to
pcpus, I suppose this means we will need to do some relatively expensive
remapping (corresponding to moving the IRQ to another collection) in
arch_move_irqs? Is that the best we can do?

> This command should only assign a pCID to the vCID.

Does it not also need to remap some interrupts to that new pCID?


> >
> > ### `MAPD` Command translation
> >
> > Format: `MAPD device, Valid, ITT IPA, ITT Size`
> >
> > `MAPD` is sent with `Valid` bit set if device needs to be added and reset
> > when device is removed.
> 
> Another case: The ITT is replaced. This use case needs more care because
> we need to ensure that all the interrupt are disabled before switching
> to the new ITT.

I've added a note since I think this is going to be a discussion in the
other sub thread.

> 
> > If `Valid` bit is set:
> >
> > - Allocate memory for `its_device` struct
> > - Validate ITT IPA & ITT size and update its_device struct
> > - Find number of vectors(nrvecs) for this device by querying PCI
> >    helper function
> > - Allocate nrvecs number of LPI XXX nrvecs is a function of `ITT Size`?
> > - Allocate memory for `struct vlpi_map` for this device. This
> >    `vlpi_map` holds mapping of Virtual LPI to Physical LPI and ID.
> > - Find physical ITS node with which this device is associated
> 
> XXX: The MAPD command is using a virtual DevID which is different that
> the pDevID (because the BDF is not the same). How do you find the
> corresponding translation?

Not sure; do we need a per-domain thing mapping vBDF to $something? Do
we already have such a thing, e.g. in the SMMU code?

I've added a note.

> 
> > - Call `p2m_lookup` on ITT IPA addr and get physical ITT address
> > - Validate ITT Size
> > - Generate/format physical ITS command: `MAPD, ITT PA, ITT Size`
> 
> I had some though about the other validation problem with the ITT. The
> region will be used by the ITS to store the mapping between the ID and
> the LPI as long as some others information.

ITYM "as well as some other information"?

> I guess that if the OS is playing with the ITT (such as writing in it)
> the ITS will behave badly. We have to ensure to the guest will never
> write in it and by the same occasion that the same region is not passed
> to 2 devices.

I don't think we will be exposing the physical ITT to the guest, will
we? That will live in memory which Xen owns and controls and doesn't
share with any guest.

In fact, I don't know that a vITS will need an ITT memory at all, i.e.
most of our GITS_BASERn will be unimplemented.

In theory we could use these registers to offload some of the data
structure storage requirements to the guest, but that would require
great care to validate any time we touched it (or perhaps just making
it p2m==r/o). I think it is probably not worth the stress if we can
just use regular hypervisor-side data structures instead? (This stuff
is just there for the h/w ITS, which doesn't have the luxury of
xmalloc.)

> 
> > Here the overhead is with memory allocation for `its_device` and `vlpi_map`
> >
> > XXX Suggestion was to preallocate some of those at device passthrough
> > setup time?
> 
> Some of the informations can even be setup when the PCI device is added
> to Xen (such as the number of MSI supported and physical LPIs chunk).

Yes, assuming there are sufficient LPIs to allocate in this way. That's
not clear though, is it?

> > If Validation bit is not set:
> >
> > - Validate if the device exits by checking vITS device list
> > - Clear all `vlpis` assigned for this device
> > - Remove this device from vITS list
> > - Free memory
> >
> > XXX If preallocation presumably shouldn't free here either.
> 
> Right. We could use a field to say if the device is activated or not.
> 
> >
> > ### `MAPVI`/`MAPI` Command translation
> >
> > Format: `MAPVI device, ID, vID, vCID`
> 
> Actually the 2 commands are completely different:
> 	- MAPI maps a (DevID, ID) to a collection
> 	- MAVI maps a (DevID, ID) to a collection and an LPI.

MAPVI for the second one I think?

The difference is that MAPI lacks the vID argument?

> The process described below is only about MAPVI.

OK. I've left a placeholder for `MAPI`.

> Also what about interrupt re-mapping?

I don't know, what about it?

> > - Validate vCID and get pCID by searching cid_map
> >
> > - if vID does not have entry in `vlpi_entries` of this device allocate
> >    a new pID from `vlpi_map` of this device and update `vlpi_entries`
> >    with new pID
> 
> What if the vID is already used by another

I think Vijay's updates already addressed this.

> > - Allocate irq descriptor and add to RB tree
> > - call `route_irq_to_guest()` for this pID
> > - Generate/format physical ITS command: `MAPVI device ID, pID, pCID`
> 
> 
> > Here the overhead is allocating physical ID, allocate memory for irq
> > descriptor and routing interrupt.
> >
> > XXX Suggested to preallocate?
> 
> Right. We may also need to have a separate routing for LPIs as the
> current function is quite long to execute.
> 
> I was thinking into routing the interrupt at device assignation
> (assuming we allocate the pLPIs at that time). And only set the mapping
> to vLPIs when the MAPI is called.

Please propose concrete modifications to the text, since I can't figure
out what you mean to change here.

> 
> >
> > ### `INVALL` Command translation
> 
> The format of INVALL is INVALL collection
> 
> > A physical `INVALL` is only generated if the LPI dirty bitmap has any
> > bits set. Otherwise it is skipped.
> >
> > XXX Perhaps bitmap should just be a simple counter?
> 
> We would need to handle it per collection.

Hrm, this complicates things a bit. Don't we need to invalidate any
pCID which has a routing of an interrupt to the vCID? i.e. potentially
multiple INVALL commands?

> > XXX bitmap is host global, a per-domain bitmap would allow us to elide
> > `INVALL` unless an LPI associated with the guest making the request
> > was dirty. Would also need some sort of "ITS INVALL" clock in order
> > that other guests can elide their own `INVALL` if one has already
> > happened. Complexity not worth it at this stage?
> 
> Given that I just discovered that INVALL is also taking a collection in
> parameter, it will likely be more complex.

Yes.

> 
> >
> > ### `SYNC` Command translation
> 
> The format of SYNC is SYNC target. It's only ensure the completion for a
> re-distributor.
> Although, the pseudo-code (see perform_sync in 5.13.22 in
> PRD03-GENC-010745 24.0) seems to say it waits for all re-distributor...
> I'm not sure what to trust.

Yes, it's confusing but the first sentence of 5.13.22 says:
        This command specifies that the ITS must wait for completion of
        internal effects of all previous commands, and all
        outstanding physical actions relating to the specified
        re-distributor.
        
So, by my reading, all redistributors need to have seen the effect of
any command issued to the given redistributor (not all commands given to
any redistributor).

Example: given command cA issued to redistributor rA and command cB
issued to redistributor rB, issuing SYNC(rA) must ensure that cA is
visible to _both_ rA and rB, but it says nothing about cB at all.

Ian.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Draft C] Xen on ARM vITS Handling
  2015-05-29 14:06   ` Julien Grall
@ 2015-06-01 13:36     ` Ian Campbell
  2015-06-02 10:46       ` Julien Grall
  0 siblings, 1 reply; 15+ messages in thread
From: Ian Campbell @ 2015-06-01 13:36 UTC (permalink / raw)
  To: Julien Grall
  Cc: manish.jaggi, Julien Grall, Stefano Stabellini, Vijay Kilari, xen-devel

On Fri, 2015-05-29 at 15:06 +0100, Julien Grall wrote:
> Hi Vijay,
> 
> On 27/05/15 17:44, Vijay Kilari wrote:
> >> ## Command Translation
> >>
> >> Of the existing GICv3 ITS commands, `MAPC`, `MAPD`, `MAPVI`/`MAPI` are
> >> potentially time consuming commands as these commands creates entry in
> >> the Xen ITS structures, which are used to validate other ITS commands.
> >>
> >> `INVALL` and `SYNC` are global and potentially disruptive to other
> >> guests and so need consideration.
> >>
> >> All other ITS command like `MOVI`, `DISCARD`, `INV`, `INT`, `CLEAR`
> >> just validate and generate physical command.
> >>
> >> ### `MAPC` command translation
> >>
> >> Format: `MAPC vCID, vTA`
> >>
> >    -  The GITS_TYPER.PAtype is emulated as 0. Hence vTA is always represents
> >       vcpu number. Hence vTA is validated against physical Collection
> > IDs by querying
> >       ITS driver and corresponding Physical Collection ID is retrieved.
> >    -  Each vITS will have cid_map (struct cid_mapping) which holds mapping of
> 
> Why do you speak about each vITS? The emulation is only related to one
> vITS and not shared...

And each vITS will have a cid_map, which is used. This seems like a
reasonable way to express this concept in this context.

Perhaps there is a need to include discussion of some of the secondary
data structures alongside the definition of `vits_cq`. In which case we
could talk about "its associated `cid_map`" and so on.

> >       Virtual Collection ID(vCID), Virtual Target address(vTA) and
> >       Physical Collection ID (pCID).
> >       If vCID entry already exists in cid_map, then that particular
> > mapping is updated with
> >       the new pCID and vTA else new entry is made in cid_map
> 
> When you move a collection, you also have to make sure that all the
> interrupts associated to it will be delivered to the new target.
> 
> I'm not sure what you are suggesting for that...

This is going to be rather painful I fear.

> >    -  MAPC pCID, pTA physical ITS command is generated
> 
> We should not send any MAPC command to the physical ITS. The collection
> is already mapped during Xen boot and the guest should not be able to
> move the physical collection (they are shared between all the guests and
> Xen).

This needs discussion in the background section, to describe the
physical setup about which the virtual side can make assumptions.

> >    Here there is no overhead, the cid_map entries are preallocated
> > with size of nr_cpus
> >    in the platform.
> 
> As said the number of collection should be at least nr_cpus + 1.

FWIW I read this as "with size appropriate for nr_cpus", which leaves
the +1 as implicit. I added the +1 nevertheless.

> >> - `MAPC pCID, pTA` physical ITS command is generated
> >>
> >> ### `MAPD` Command translation
> >>
> >> Format: `MAPD device, Valid, ITT IPA, ITT Size`
> >>
> >> `MAPD` is sent with `Valid` bit set if device needs to be added and reset
> >> when device is removed.
> >>
> >> If `Valid` bit is set:
> >>
> >> - Allocate memory for `its_device` struct
> >> - Validate ITT IPA & ITT size and update its_device struct
> >> - Find number of vectors(nrvecs) for this device by querying PCI
> >>   helper function
> >> - Allocate nrvecs number of LPI XXX nrvecs is a function of `ITT Size`?
> >> - Allocate memory for `struct vlpi_map` for this device. This
> >>   `vlpi_map` holds mapping of Virtual LPI to Physical LPI and ID.
> >> - Find physical ITS node with which this device is associated
> >> - Call `p2m_lookup` on ITT IPA addr and get physical ITT address
> >> - Validate ITT Size
> >> - Generate/format physical ITS command: `MAPD, ITT PA, ITT Size`
> >>
> >> Here the overhead is with memory allocation for `its_device` and `vlpi_map`
> >>
> >> XXX Suggestion was to preallocate some of those at device passthrough
> >> setup time?
> > 
> > If Validation bit is set:
> >    - Query its_device tree and get its_device structure for this device.
> >    - (XXX: If pci device is hidden from dom0, does this device is added
> >        with PHYSDEVOP_pci_device_add hypercall?)
> >    - If device does not exists return
> >    - If device exists in RB-tree then
> >           - Validate ITT IPA & ITT size and update its_device struct
> 
> To validate the ITT size you need to know the number of interrupt ID.

Please could you get into the habit of making concrete suggestions for
changes to the text; I've no idea what change I should make based on
this observation. If not concrete suggestions, please try to make the
implications of what you are saying clear.


> 
> >           - Check if device is already assigned to the domain,
> >             if not then
> >                - Find number of vectors(nrvecs) for this device.
> >                - Allocate nrvecs number of LPI
> >                - Fetch vlpi_map for this device (preallocated at the
> > time of adding
> >                  this device to Xen). This vlpi_map holds mapping of
> > Virtual LPI to
> >                  Physical LPI and ID.
> >                - Call p2m_lookup on ITT IPA addr and get physical ITT address
> >                - Assign this device to this domain and mark as enabled
> >           - If this device already exists with the domain (Domain is
> > remapping the device)
> >                - Validate ITT IPA & ITT size and update its_device struct
> >                - Call p2m_lookup on ITT IPA addr and get physical ITT address
> >                - Disable all the LPIs of this device by searching
> > through vlpi_map and LPI
> >                  configuration table
> 
> Disabling all the LPIs associated to a device can be time consuming
> because you have to unroute them and make sure that the physical ITS
> effectively disabled it before sending the MAPD command.
> 
> Given that the software would be buggy if it send a MAPD command without
> releasing all the associated interrupt we could ignore the command if
> any interrupt is still enabled.

Releasing how? Did you mean disable?


> >     - Clear all vlpis mapping for this device
> >     - Remove this device from the domain
> > 
> >> ### `MAPVI`/`MAPI` Command translation
> >>
> >> Format: `MAPVI device, ID, vID, vCID`
> >>
> >> - Validate if the device exits by checking vITS device list
> >> - Validate vCID and get pCID by searching cid_map
> >>
> >> - if vID does not have entry in `vlpi_entries` of this device allocate
> >>   a new pID from `vlpi_map` of this device and update `vlpi_entries`
> >>   with new pID
> >> - Allocate irq descriptor and add to RB tree
> >> - call `route_irq_to_guest()` for this pID
> >> - Generate/format physical ITS command: `MAPVI device ID, pID, pCID`
> >>
> > 
> > - Validate if the device exists by checking vITS device RB-tree.
> > - Validate vCID and get pCID by searching cid_map
> > - if vID does not have entry in vlpi_entries of this device
> >       -  Allot pID from vlpi_map of this device and update
> > vlpi_entries with new pID.
> >       - Allocate irq descriptor and add to RB tree
> >       - call route_irq_to_guest() for this pID
> >   If exists,
> >      - If vCID is different ( remapping interrupts to differnt collection ),
> >             - Disable LPI
> 
> You have to ensure the the LPI is disabled with an INV/SYNC.

Can we not rely on the subsequent INV/SYNC from the guest relating to
the MAPI here?

> Although as suggested on the previous commands, we may want to deny the
> command if the interrupt is not disabled.
> 
> >             - Update the vlpi_map
> >              (XXX: Enable LPI on guest request?)
> 
> Read the spec...

If you know the answer then please share it rather than making people
guess what you are talking about.

Ian.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Draft C] Xen on ARM vITS Handling
  2015-06-01 12:24     ` Julien Grall
@ 2015-06-01 13:45       ` Ian Campbell
  2015-06-03  7:25       ` Vijay Kilari
  1 sibling, 0 replies; 15+ messages in thread
From: Ian Campbell @ 2015-06-01 13:45 UTC (permalink / raw)
  To: Julien Grall
  Cc: manish.jaggi, Julien Grall, Stefano Stabellini, Vijay Kilari, xen-devel

On Mon, 2015-06-01 at 13:24 +0100, Julien Grall wrote:
> On 01/06/15 13:11, Ian Campbell wrote:
> >>> ### Device ID (`ID`)
> >>>
> >>> This parameter is used by commands which manage a specific device and
> >>> the interrupts associated with that device. Checking if a device is
> >>> present and retrieving the data structure must be fast.
> >>>
> >>> The device identifiers may not be assigned contiguously and the maximum
> >>> number is very high (2^32).
> >>>
> >>> XXX In the context of virtualised device ids this may not be the case,
> >>> e.g. we can arrange for (mostly) contiguous device ids and we know the
> >>> bound is significantly lower than 2^32
> >>>
> >>> Possible efficient data structures would be:
> >>>
> >>> 1. List: The lookup/deletion is in O(n) and the insertion will depend
> >>>    if the device should be sorted following their identifier. The
> >>>    memory overhead is 18 bytes per element.
> >>> 2. Red-black tree: All the operations are O(log(n)). The memory
> >>>    overhead is 24 bytes per element.
> >>>
> >>> A Red-black tree seems the more suitable for having fast deviceID
> >>> validation even though the memory overhead is a bit higher compare to
> >>> the list.
> >>
> >> When PHYSDEVOP_pci_device_add is called, memory for its_device structure
> >> and other needed structure for this device is allocated added to RB-tree
> >> with all necessary information
> > 
> > Sounds like a reasonable time to do it. I added something based on your
> > words.
> 
> Hmmm... The RB-tree suggested is per domain not the host and indexed
> with the vDevID.

I added "The `ID` is per domain and therefore the datastructure should
be too." before "Possible efficient..."

> This is the only way to know quickly if the domain is able to use the
> device and to retrieve it. Indeed, the vDevID won't be equal to the
> pDevID as the vBDF will be different from the pBDF.
> 
> PHYSDEVOP_pci_device_add asks Xen to manage the PCI device. At that
> time we don't know to which domain the device will be passed through.

Yes, I suppose we can allocate at PHYSDEVOP_pci_device_add time, but
linking it into the R-B tree will have to happen at assignment time.

This section now ends:

        When `PHYSDEVOP_pci_device_add` is called, memory for the its_device
        structure and other needed structures for this device is allocated.
        
        When `XEN_DOMCTL_assign_device` is called the device will be added to
        the per domain RB-tree with all necessary information.
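
As a rough sketch of that two-phase flow (all structure and function
names below are hypothetical, not the actual Xen API):

```c
/* Hedged sketch: two-phase registration of an ITS device. All names
 * here are illustrative, not the real Xen interfaces. */
#include <assert.h>
#include <stdlib.h>

struct its_device {
    unsigned int pdevid;   /* physical DeviceID, known at pci_device_add */
    unsigned int vdevid;   /* virtual DeviceID, known only at assignment */
    int assigned;
};

/* Phase 1: PHYSDEVOP_pci_device_add - allocate and fill what we know. */
static struct its_device *its_device_add(unsigned int pdevid)
{
    struct its_device *dev = calloc(1, sizeof(*dev));
    if (dev)
        dev->pdevid = pdevid;
    return dev;
}

/* Phase 2: XEN_DOMCTL_assign_device - the vDevID (from the vBDF) is now
 * known, so the device can be linked into the per-domain lookup tree. */
static void its_device_assign(struct its_device *dev, unsigned int vdevid)
{
    dev->vdevid = vdevid;
    dev->assigned = 1;
    /* rb_insert(&d->vdevid_tree, dev); -- per-domain RB-tree insert */
}
```

i.e. the allocation cost is paid at add time, and assignment only fills
in the virtual identifier and links the node.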

Ian.


* Re: [Draft C] Xen on ARM vITS Handling
  2015-06-01 13:12   ` Ian Campbell
@ 2015-06-01 15:29     ` Julien Grall
  2015-06-02  9:41       ` Ian Campbell
  0 siblings, 1 reply; 15+ messages in thread
From: Julien Grall @ 2015-06-01 15:29 UTC (permalink / raw)
  To: Ian Campbell
  Cc: manish.jaggi, Julien Grall, Stefano Stabellini, Vijay Kilari, xen-devel

On 01/06/15 14:12, Ian Campbell wrote:
> On Fri, 2015-05-29 at 14:40 +0100, Julien Grall wrote:
>> Hi Ian,

Hi Ian,

>> NIT: You used my Linaro email which I think is de-activated now :).
> 
> I keep finding new address books with that address  in them!
> 
>>> ## ITS Translation Table
>>>
>>> Message signalled interrupts are translated into an LPI via an ITS
>>> translation table which must be configured for each device which can
>>> generate an MSI.
>>
>> I'm not sure what is the ITS Table Table. Did you mean Interrupt
>> Translation Table?
> 
> I don't think I wrote Table Table anywhere.

Sorry I meant "ITS translation table"

> I'm referring to the tables which are established by e.g. the MAPD
> command and friends, e.g. the thing shown in "4.9.12 Notional ITS Table
> Structure".

In the previous paragraph you are referring specifically to the "Interrupt
Translation Table". This is the only table that is configured per device.

[..]

>>> XXX there are other aspects to virtualising the ITS (LPI collection
>>> management, assignment of LPI ranges to guests, device
>>> management). However these are not currently considered here. XXX
>>> Should they be/do they need to be?
>>
>> I think we began to cover these aspect with the section "command emulation".
> 
> Some aspects, yes. I went with:
> 
>         There are other aspects to virtualising the ITS (LPI collection
>         management, assignment of LPI ranges to guests, device
>         management). However these are only considered here to the extent
>         needed for describing the vITS emulation.
> 
>>> XXX In the context of virtualised device ids this may not be the case,
>>> e.g. we can arrange for (mostly) contiguous device ids and we know the
>>> bound is significantly lower than 2^32
>>
>> Well, the deviceID is computed from the BDF and some DMA alias. As the
>> algorithm can't be tweaked, it's very likely that we will have
>> non-contiguous Device ID. See pci_for_each_dma_alias in Linux
>> (drivers/pci/search.c).
> 
> The implication here is that deviceID is fixed in hardware and is used
> by driver domain software in contexts where we do not get the
> opportunity to translate is that right? What contexts are those?

No, the driver domain software will always use a virtual DeviceID (based
on the vBDF and other things). The problem I wanted to raise is how to
translate back the vDeviceID to a physical deviceID/BDF.

> > Note that the BDF is also something which we could in principle
> virtualise (we already do for domU). Perhaps that is infeasible for dom0
> though?

For DOM0 the virtual BDF is equal to the physical BDF. So both
deviceIDs (physical and virtual) will be the same.

We may decide to do vBDF == pBDF for guests too in order to simplify the
code.

> That gives me two thoughts.
> 
> The first is that although device identifiers are not necessarily
> contiguous, they are generally at least grouped and not allocated at
> random through the 2^32 options. For example a PCI Host bridge typically
> has a range of device ids associated with it and each device has a
> device id derived from that.

Usually it's one per (device, function).

> 
> I'm not sure if we can leverage that into a more useful data structure
> than an R-B tree, or for example to arrange for the R-B to allow for the
> translation of a device within a span into the parent span and from
> there do the lookup. Specifically when looking up a device ID
> corresponding to a PCI device we could arrange to find the PCI host
> bridge and find the actual device from there. This would keep the RB
> tree much smaller and therefore perhaps quicker? Of course that depends
> on what the lookup from PCI host bridge to a device looked like.

I'm not sure why you are speaking about PCI host bridge. AFAIK, the
guest doesn't have a physical host bridge.

Although, this is an optimization that we can think about later. The
R-B tree will already be fast enough for a first implementation. My main
point was about the translation vDeviceID => pDeviceID.

> The second is that perhaps we can do something simpler for the domU
> case, if we were willing to tolerate it being different from dom0.
> 
>>> Possible efficient data structures would be:
>>>
>>> 1. List: The lookup/deletion is in O(n) and the insertion will depend
>>>     on whether the devices should be kept sorted by identifier. The
>>>     memory overhead is 18 bytes per element.
>>> 2. Red-black tree: All the operations are O(log(n)). The memory
>>>     overhead is 24 bytes per element.
>>>
>>> A Red-black tree seems more suitable for having fast deviceID
>>> validation even though the memory overhead is a bit higher compared to
>>> the list.
>>>
>>> ### Event ID (`vID`)
>>>
>>> This is the per-device Interrupt identifier (i.e. the MSI index). It
>>> is configured by the device driver software.
>>>
>>> It is not necessary to translate a `vID`, however they may need to be
>>> represented in various data structures given to the pITS.
>>>
>>> XXX is any of this true?
>>
>>
>> Right, the vID will always be equal to the pID. Although you will need
>> to associate a physical LPI for every pair (vID, DevID).
> 
> I think in the terms defined by this document that is (`ID`, `vID`) =>
> an LPI. Right?

Well, by `ID` do you mean deviceID? That reminds me that the second
paragraph of "# ITS Command Translation" is confusing (copy below):

"The ITS provides 12 commands in order to manage interrupt collections,
devices and interrupts. Possible command parameters are device ID
(`ID`), Event ID (`vID`), Collection ID (`vCID`), Target Address
(`vTA`) parameters."

The spec is using `ID` for "interrupt ID" and `device` for deviceID.

In general, the section "command emulation" is using 'v' to refer to the
virtual identifier and 'p' for the physical identifier (see pCID vs
vCID; both are Collection IDs).

> 
> Have we considered how this mapping will be tracked?

I don't think so. We need to track the list of `event ID`s enabled for
this deviceID and the associated `vLPI` and `LPI`.

Although, if we allocate a contiguous chunk of `LPI`s it won't be
necessary to track the latter.

>>> ### Interrupt Collection (`vCID`)
>>>
>>> This parameter is used in commands which manage collections and
>>> interrupts in order to move them from one CPU to another. The ITS is
>>> only mandated to implement N + 1 collections where N is the number of
>>> processors on the platform (i.e. max number of VCPUs for a given
>>> guest). Furthermore, the identifiers are always contiguous.
>>>
>>> If we decide to implement the strict minimum (i.e N + 1), an array is
>>> enough and will allow operations in O(1).
>>>
>>> XXX Could forgo array and go straight to vcpu_info/domain_info.
>>
>> Not really, the number of collections is always one higher than the
>> number of VCPUs. How would you store the last collection?
> 
> In domain_info. What I meant was:
> 
>     if ( vcid == domain->nr_vcpus )
>          return domain->interrupt_collection
>     else if ( vcid < domain->nr_vcpus )
>          return domain->vcpus[vcid]->interrupt_collection
>     else
>          invalid vcid.
> 
> Similar to how SPI vs PPI interrupts are handled.

Sorry, I didn't read your suggestion that way.

I think that can work, although the resulting code may be difficult to
read/understand because a collection can be moved from one vCPU to another.
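
For concreteness, the lookup being described could be sketched in C
roughly as follows (structure and field names are hypothetical):

```c
/* Hedged C sketch of the vCID -> collection lookup: collections
 * 0..N-1 live on the vCPUs, collection N in domain_info. All
 * structure and field names are illustrative. */
#include <assert.h>
#include <stddef.h>

#define NR_VCPUS 4

struct collection { int id; };

struct vcpu { struct collection col; };

struct domain {
    unsigned int nr_vcpus;
    struct vcpu vcpus[NR_VCPUS];
    struct collection extra_col;   /* the N+1'th collection */
};

static struct collection *vcid_to_collection(struct domain *d,
                                             unsigned int vcid)
{
    if (vcid == d->nr_vcpus)
        return &d->extra_col;          /* last collection, in domain_info */
    if (vcid < d->nr_vcpus)
        return &d->vcpus[vcid].col;    /* per-vCPU collection */
    return NULL;                       /* invalid vCID */
}
```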

> 
>>> ## Command Translation
>>>
>>> Of the existing GICv3 ITS commands, `MAPC`, `MAPD`, `MAPVI`/`MAPI` are
>>> potentially time consuming commands as these commands create entries in
>>> the Xen ITS structures, which are used to validate other ITS commands.
>>>
>>> `INVALL` and `SYNC` are global and potentially disruptive to other
>>> guests and so need consideration.
>>
>> INVALL and SYNC are not global. They both take a parameter: vCID for
>> INVALL and vTarget for SYNC.
> 
> By global I meant not associated with a specific device. I went with:
> 
>         `INVALL` and `SYNC` are not specific to a given device (they are per
>         collection and per target respectively) and are therefore potentially
>         disruptive to other guests and so need consideration.

Thanks for the clarification.

> 
>> INVALL ensures that any interrupts in the specified collection are
>> re-loaded. SYNC ensures that all the previous commands, and all outstanding
>> physical actions relating to the specified re-distributor are completed.
> 
>>
>>> All other ITS command like `MOVI`, `DISCARD`, `INV`, `INT`, `CLEAR`
>>> just validate and generate physical command.
>>>
>>> ### `MAPC` command translation
>>>
>>> Format: `MAPC vCID, vTA`
>>>
>>> - `MAPC pCID, pTA` physical ITS command is generated
>>
>> We should not send any MAPC command to the physical ITS. The collection
>> is already mapped during Xen boot.
> 
> What is the plan for this start of day mapping? One collection per pCPU
> and ignore the rest?

To map only one collection per pCPU. We can look at improvements later.

> It seems (section 4.9.2) that there are two potential kinds of
> collections, ones internal to the ITS and others where data is held in
> external memory. The numbers of both are limited by the hardware.

They are all the same. The only difference is that for external
collections the memory must be allocated by software.

> I suppose the internal ones will be faster.

No idea.

> Supposing that a guest is likely to use collections to map interrupts to
> specific vcpus, and that the physical collections will be mapped to
> pcpus, I suppose this means we will need to do some relatively expensive
> remapping (corresponding to moving the IRQ to another collection) in
> arch_move_irqs? Is that the best we can do?

If we use the same solution as SPI/PPIs (i.e. arch_move_irqs), yes. But
I'm not in favor of this solution, which is already expensive for
SPI/PPIs (after talking with Dario, vCPUs may move often between pCPUs).

We could decide to implement a lazy solution (i.e. moving the interrupt
only when it fires) but that would require sending ITS commands from Xen.

Another idea is to never move the interrupt collection from one pCPU to
another. The vCID would always be equal to the same pCID, but the
interrupt would be injected to a different processor depending on the
associated vTA.

> 
>> This command should only assign a pCID to the vCID.
> 
> Does it not also need to remap some interrupts to that new pCID?

That would be very expensive. Another solution would be to always have
the same mapping vCID <=> pCID (allocated at boot) and only update the
vTA. We would lose the possibility to have LPIs follow a vCPU but
that would be a small drawback compared to the cost of the ITS.

>>>
>>> ### `MAPD` Command translation
>>>
>>> Format: `MAPD device, Valid, ITT IPA, ITT Size`
>>>
>>> `MAPD` is sent with `Valid` bit set if device needs to be added and reset
>>> when device is removed.
>>
>> Another case: The ITT is replaced. This use case needs more care because
>> we need to ensure that all the interrupts are disabled before switching
>> to the new ITT.
> 
> I've added a note since I think this is going to be a discussion in the
> other sub thread.
> 
>>
>>> If `Valid` bit is set:
>>>
>>> - Allocate memory for `its_device` struct
>>> - Validate ITT IPA & ITT size and update its_device struct
>>> - Find number of vectors(nrvecs) for this device by querying PCI
>>>    helper function
>>> - Allocate nrvecs number of LPI XXX nrvecs is a function of `ITT Size`?
>>> - Allocate memory for `struct vlpi_map` for this device. This
>>>    `vlpi_map` holds mapping of Virtual LPI to Physical LPI and ID.
>>> - Find physical ITS node with which this device is associated
>>
>> XXX: The MAPD command is using a virtual DevID which is different from
>> the pDevID (because the BDF is not the same). How do you find the
>> corresponding translation?
> 
> Not sure, do we need a per-domain thing mapping vBDF to $something? Do
> we already have such a thing, e.g. in the SMMU code?

The current SMMU code doesn't have anything related to it. I need to
think about it.

> I've added a note.

Thanks.

>>
>>> - Call `p2m_lookup` on ITT IPA addr and get physical ITT address
>>> - Validate ITT Size
>>> - Generate/format physical ITS command: `MAPD, ITT PA, ITT Size`
>>
>> I had some thoughts about the other validation problem with the ITT. The
>> region will be used by the ITS to store the mapping between the ID and
>> the LPI as long as some others information.
> 
> ITYM "as well as some other information"?

Yes.

> 
>> I guess that if the OS is playing with the ITT (such as writing in it)
>> the ITS will behave badly. We have to ensure that the guest will never
>> write to it, and at the same time that the same region is not passed
>> to 2 devices.
> 
> I don't think we will be exposing the physical ITT to the guest, will
> we? That will live in memory which Xen owns and controls and doesn't
> share with any guest.

That needs to be clarified. The RFC sent by Vijay lets the guest
feed the physical ITT.

But I just thought that it won't work well: the ITT has to be contiguous
in physical memory.

> In fact, I don't know that a vITS will need an ITT memory at all, i.e.
> most of our GITS_BASERn will be unimplemented.
> In theory we could use these registers to offload some of the data
> structure storage requirements to the guest, but that would require
> great care to validate any time we touched it (or perhaps just
> p2m==r/o), I think it is probably not worth the stress if we can just
> use regular hypervisor side data structures instead? (This stuff is just
> there for the h/w ITS which doesn't have the luxury of xmalloc).

GITS_BASERn is only used to feed internal ITS tables such as Collections,
Devices, Virtual Processors.

For the ITT, the address is passed via MAPD and has to be allocated by
the guest.

We could ignore the address provided by the guest but that would mean
the ITT is allocated twice (the guest one unused, and the Xen one
used to feed the pITS). Also, as pointed out above, the ITT memory
region allocated by the guest may not be contiguous in physical memory.

Furthermore, if we allocate the ITT in Xen, this command will be quick
to emulate (we could send the MAPD command when the device is attached).

So less possible security issue.

>>> Here the overhead is with memory allocation for `its_device` and `vlpi_map`
>>>
>>> XXX Suggestion was to preallocate some of those at device passthrough
>>> setup time?
>>
>> Some of the information can even be set up when the PCI device is added
>> to Xen (such as the number of MSI supported and physical LPIs chunk).
> 
> Yes, assuming there are sufficient LPIs to allocate in this way. That's
> not clear though, is it?

IMHO it's not an issue. If the ITS doesn't provide enough LPIs a
baremetal OS will likely not be able to use the device.

FWIW, Linux is always allocating the LPIs when the device is added based
on the PCI cfg.

> 
>>> If Validation bit is not set:
>>>
>>> - Validate if the device exists by checking the vITS device list
>>> - Clear all `vlpis` assigned for this device
>>> - Remove this device from vITS list
>>> - Free memory
>>>
>>> XXX If preallocation presumably shouldn't free here either.
>>
>> Right. We could use a field to say if the device is activated or not.
>>
>>>
>>> ### `MAPVI`/`MAPI` Command translation
>>>
>>> Format: `MAPVI device, ID, vID, vCID`
>>
>> Actually the 2 commands are completely different:
>> 	- MAPI maps a (DevID, ID) to a collection
>> 	- MAVI maps a (DevID, ID) to a collection and an LPI.
> 
> MAPVI for the second one I think?

Right.

> 
> The difference is that MAPI lacks the vID argument?

Yes. And MAPI doesn't bind the (DevID, ID) to an LPI. The first one can
be used to move an IRQ from one collection to another.

>> The process described below is only about MAPVI.
> 
> OK. I've left a placeholder for `MAPI`.
> 
>> Also what about interrupt re-mapping?
> 
> I don't know, what about it?

It was an open question. I answered it in the reply to Vijay's mail.

> 
>>> - Validate vCID and get pCID by searching cid_map
>>>
>>> - if vID does not have entry in `vlpi_entries` of this device allocate
>>>    a new pID from `vlpi_map` of this device and update `vlpi_entries`
>>>    with new pID
>>
>> What if the vID is already used by another
> 
> I think Vijay's updates already addressed this.

I will take a look.

> 
>>> - Allocate irq descriptor and add to RB tree
>>> - call `route_irq_to_guest()` for this pID
>>> - Generate/format physical ITS command: `MAPVI device ID, pID, pCID`
>>
>>
>>> Here the overhead is allocating physical ID, allocate memory for irq
>>> descriptor and routing interrupt.
>>>
>>> XXX Suggested to preallocate?
>>
>> Right. We may also need to have a separate routing for LPIs as the
>> current function is quite long to execute.
>>
>> I was thinking of routing the interrupt at device assignment
>> (assuming we allocate the pLPIs at that time). And only setting the mapping
>> to vLPIs when the MAPI is called.
> 
> Please propose concrete modifications to the text, since I can't figure
> out what you mean to change here.

It's not yet fully clear in my mind.

If we assume that we have a chunk of LPIs allocated for the device
when it's assigned to the guest, the only missing information is the
corresponding vLPIs.

We could decide to set up the internal routing structure (allocation of
the irq_guest) and defer the setting of the vLPI. This would work
because the interrupts are disabled at boot time and won't be enabled
until a MAPVI command is sent.
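
A rough sketch of that deferral, under the assumption that pLPIs are
pre-allocated and routed at assignment time (all names hypothetical):

```c
/* Hedged sketch: route the pre-allocated pLPI at device assignment,
 * and only fill in the vLPI when MAPVI arrives. Names are
 * illustrative, not the real Xen structures. */
#include <assert.h>

#define INVALID_LPI 0

struct lpi_route {
    unsigned int plpi;   /* physical LPI, routed at assignment time */
    unsigned int vlpi;   /* virtual LPI, unknown until MAPVI */
};

/* Device assignment: do the expensive routing now, leave the vLPI
 * unset. The interrupt stays disabled, so nothing can fire yet. */
static void route_at_assign(struct lpi_route *r, unsigned int plpi)
{
    r->plpi = plpi;
    r->vlpi = INVALID_LPI;
}

/* MAPVI emulation: just record the vLPI - the routing work already
 * happened, so the command is cheap to emulate. */
static void set_vlpi_on_mapvi(struct lpi_route *r, unsigned int vlpi)
{
    r->vlpi = vlpi;
}
```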



> 
>>
>>>
>>> ### `INVALL` Command translation
>>
>> The format of INVALL is INVALL collection
>>
>>> A physical `INVALL` is only generated if the LPI dirty bitmap has any
>>> bits set. Otherwise it is skipped.
>>>
>>> XXX Perhaps bitmap should just be a simple counter?
>>
>> We would need to handle it per collection.
> 
> Hrm, this complicates things a bit. Don't we need to invalidate any pCID
> which has a routing of an interrupt to vCID? i.e. potentially multiple
> INVALL?

Not necessarily. If we keep the interrupts of a vCID always in the same
pCID, we would need to send only one INVALL.
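
A minimal sketch of that lazy-INVALL bookkeeping (a per-collection
dirty counter rather than a bitmap; all names hypothetical):

```c
/* Hedged sketch: track per-collection dirty state and only emit a
 * physical INVALL when some LPI configuration actually changed.
 * Names are illustrative. */
#include <assert.h>
#include <stdbool.h>

struct vcollection {
    unsigned int dirty;   /* LPI config changes since the last INVALL */
};

/* Called whenever the guest changes an LPI's configuration. */
static void lpi_config_changed(struct vcollection *c)
{
    c->dirty++;
}

/* Returns true if a physical INVALL must be queued for the backing
 * pCID; false means the expensive pINVALL can be skipped entirely. */
static bool vits_handle_invall(struct vcollection *c)
{
    if (!c->dirty)
        return false;
    c->dirty = 0;
    return true;
}
```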

>>
>>>
>>> ### `SYNC` Command translation
>>
>> The format of SYNC is SYNC target. It only ensures the completion for a
>> re-distributor.
>> Although, the pseudo-code (see perform_sync in 5.13.22 in
>> PRD03-GENC-010745 24.0) seems to say it waits for all re-distributors...
>> I'm not sure what to trust.
> 
> Yes, it's confusing but the first sentence of 5.13.22 says:
>         This command specifies that the ITS must wait for completion of
>         internal effects of all previous commands, and all
>         outstanding physical actions relating to the specified
>         re-distributor.
>         
> So, by my reading, all redistributors need to have seen the effect of
> any command issued to the given redistributor (not all commands given to
> any redistributor).
> 
> Example: given command cA issued to redistributor rA and command cB
> issued to redistributor rB and then issuing SYNC(rA) must ensure that cA
> is visible to _both_ rA and rB, but doesn't say anything regarding cB at
> all.

Thanks for the explanation.

Regards,

-- 
Julien Grall


* Re: [Draft C] Xen on ARM vITS Handling
  2015-06-01 15:29     ` Julien Grall
@ 2015-06-02  9:41       ` Ian Campbell
  0 siblings, 0 replies; 15+ messages in thread
From: Ian Campbell @ 2015-06-02  9:41 UTC (permalink / raw)
  To: Julien Grall; +Cc: manish.jaggi, Stefano Stabellini, Vijay Kilari, xen-devel

On Mon, 2015-06-01 at 16:29 +0100, Julien Grall wrote:
> On 01/06/15 14:12, Ian Campbell wrote:
> > On Fri, 2015-05-29 at 14:40 +0100, Julien Grall wrote:
> >> Hi Ian,
> 
> Hi Ian,
> 
> >> NIT: You used my Linaro email which I think is de-activated now :).
> > 
> > I keep finding new address books with that address  in them!
> > 
> >>> ## ITS Translation Table
> >>>
> >>> Message signalled interrupts are translated into an LPI via an ITS
> >>> translation table which must be configured for each device which can
> >>> generate an MSI.
> >>
> >> I'm not sure what is the ITS Table Table. Did you mean Interrupt
> >> Translation Table?
> > 
> > I don't think I wrote Table Table anywhere.
> 
> Sorry I meant "ITS translation table"
> 
> > I'm referring to the tables which are established by e.g. the MAPD
> > command and friends, e.g. the thing shown in "4.9.12 Notional ITS Table
> > Structure".
> 
> In the previous paragraph you are referring specifically to the "Interrupt
> Translation Table". This is the only table that is configured per device.

I'm afraid I'm still not getting your point. Please quote the exact text
which you think is wrong and if possible suggest an alternative.

> [..]
> 
> >>> XXX there are other aspects to virtualising the ITS (LPI collection
> >>> management, assignment of LPI ranges to guests, device
> >>> management). However these are not currently considered here. XXX
> >>> Should they be/do they need to be?
> >>
> >> I think we began to cover these aspect with the section "command emulation".
> > 
> > Some aspects, yes. I went with:
> > 
> >         There are other aspects to virtualising the ITS (LPI collection
> >         management, assignment of LPI ranges to guests, device
> >         management). However these are only considered here to the extent
> >         needed for describing the vITS emulation.
> > 
> >>> XXX In the context of virtualised device ids this may not be the case,
> >>> e.g. we can arrange for (mostly) contiguous device ids and we know the
> >>> bound is significantly lower than 2^32
> >>
> >> Well, the deviceID is computed from the BDF and some DMA alias. As the
> >> algorithm can't be tweaked, it's very likely that we will have
> >> non-contiguous Device ID. See pci_for_each_dma_alias in Linux
> >> (drivers/pci/search.c).
> > 
> > The implication here is that deviceID is fixed in hardware and is used
> > by driver domain software in contexts where we do not get the
> > opportunity to translate is that right? What contexts are those?
> 
> No, the driver domain software will always use a virtual DeviceID (based
> on the vBDF and other things). The problem I wanted to raise is how to
> translate back the vDeviceID to a physical deviceID/BDF.

Right, so this goes back to my original point, which is that if we
completely control the translation from vDeviceID to pDeviceID/BDF then
the vDeviceId space need not be sparse and need not utilise the entire
2^32 space, at least for domU uses.

> > Note that the BDF is also something which we could in principle
> > virtualise (we already do for domU). Perhaps that is infeasible for dom0
> > though?
> 
> For DOM0 the virtual BDF is equal to the physical BDF. So both
> deviceIDs (physical and virtual) will be the same.
> 
> We may decide to do vBDF == pBDF for guests too in order to simplify the
> code.

It seems to me that choosing vBDF such that the vDeviceId space is to
our liking would be a good idea.

> > That gives me two thoughts.
> > 
> > The first is that although device identifiers are not necessarily
> > contiguous, they are generally at least grouped and not allocated at
> > random through the 2^32 options. For example a PCI Host bridge typically
> > has a range of device ids associated with it and each device has a
> > device id derived from that.
> 
> Usually it's one per (device, function).

Yes, but my point is that they are generally grouped by bus. The bus is
assigned a (contiguous) range and individual (device,function)=> device
id mappings are based on a formula applied to the base address.

i.e. for a given PCI bus the device ids are in the range 1000..1000+N,
not N random numbers selected from the 2^32 space.
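
That observation suggests a hedged alternative to per-device tree nodes:
a small table of per-bridge device-id spans (all names below are
hypothetical):

```c
/* Hedged sketch: device ids come in per-bus spans (e.g. 1000..1000+N),
 * so a span table can locate the owning bridge and the offset within
 * it. All names are illustrative. */
#include <assert.h>
#include <stddef.h>

struct devid_span {
    unsigned int base, len;   /* span of device ids owned by one bridge */
    void *bridge;             /* opaque handle for the host bridge */
};

/* Find the bridge owning devid; *offset gets the index within its
 * span. Returns NULL if no span contains devid. */
static void *devid_to_bridge(const struct devid_span *spans, size_t n,
                             unsigned int devid, unsigned int *offset)
{
    for (size_t i = 0; i < n; i++) {
        if (devid >= spans[i].base &&
            devid < spans[i].base + spans[i].len) {
            *offset = devid - spans[i].base;
            return spans[i].bridge;
        }
    }
    return NULL;
}
```

The span table stays small (one entry per bridge), so even a linear
scan may beat a per-device tree; the per-bridge lookup by offset can
then be a simple array index.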

> 
> > 
> > I'm not sure if we can leverage that into a more useful data structure
> > than an R-B tree, or for example to arrange for the R-B to allow for the
> > translation of a device within a span into the parent span and from
> > there do the lookup. Specifically when looking up a device ID
> > corresponding to a PCI device we could arrange to find the PCI host
> > bridge and find the actual device from there. This would keep the RB
> > tree much smaller and therefore perhaps quicker? Of course that depends
> > on what the lookup from PCI host bridge to a device looked like.
> 
> I'm not sure why you are speaking about PCI host bridge. AFAIK, the
> guest doesn't have a physical host bridge.

It has a virtual one provided by the pciif/pcifront+back thing. Any PCI
bus is behind some sort of host bridge, whether physical, virtual or
"notional".

> Although, this is an optimization that we can think about later. The
> R-B tree will already be fast enough for a first implementation. My main
> point was about the translation vDeviceID => pDeviceID.
> 
> > The second is that perhaps we can do something simpler for the domU
> > case, if we were willing to tolerate it being different from dom0.
> > 
> >>> Possible efficient data structures would be:
> >>>
> >>> 1. List: The lookup/deletion is in O(n) and the insertion will depend
> >>>     on whether the devices should be kept sorted by identifier. The
> >>>     memory overhead is 18 bytes per element.
> >>> 2. Red-black tree: All the operations are O(log(n)). The memory
> >>>     overhead is 24 bytes per element.
> >>>
> >>> A Red-black tree seems more suitable for having fast deviceID
> >>> validation even though the memory overhead is a bit higher compared to
> >>> the list.
> >>>
> >>> ### Event ID (`vID`)
> >>>
> >>> This is the per-device Interrupt identifier (i.e. the MSI index). It
> >>> is configured by the device driver software.
> >>>
> >>> It is not necessary to translate a `vID`, however they may need to be
> >>> represented in various data structures given to the pITS.
> >>>
> >>> XXX is any of this true?
> >>
> >>
> >> Right, the vID will always be equal to the pID. Although you will need
> >> to associate a physical LPI for every pair (vID, DevID).
> > 
> > I think in the terms defined by this document that is (`ID`, `vID`) =>
> > an LPI. Right?
> 
> Well, by `ID` do you mean deviceID?

I think this section has suffered from multiple authors using
inconsistent terminology and from some of the naming being confusingly
similar. I'm going to do a pass and switch everything to consistently
use the names used in the spec, with a p- or v- prefix as necessary.

> > Have we considered how this mapping will be tracked?
> 
> I don't think so. We need to track the list of `event ID`s enabled for
> this deviceID and the associated `vLPI` and `LPI`.
> 
> Although, if we allocate a contiguous chunk of `LPI`s it won't be
> necessary to track the latter.

Is the deviceID space for a given device linear?

> 
> >>> ### Interrupt Collection (`vCID`)
> >>>
> >>> This parameter is used in commands which manage collections and
> >>> interrupts in order to move them from one CPU to another. The ITS is
> >>> only mandated to implement N + 1 collections where N is the number of
> >>> processors on the platform (i.e. max number of VCPUs for a given
> >>> guest). Furthermore, the identifiers are always contiguous.
> >>>
> >>> If we decide to implement the strict minimum (i.e N + 1), an array is
> >>> enough and will allow operations in O(1).
> >>>
> >>> XXX Could forgo array and go straight to vcpu_info/domain_info.
> >>
> >> Not really, the number of collections is always one higher than the
> >> number of VCPUs. How would you store the last collection?
> > 
> > In domain_info. What I meant was:
> > 
> >     if ( vcid == domain->nr_vcpus )
> >          return domain->interrupt_collection
> >     else if ( vcid < domain->nr_vcpus )
> >          return domain->vcpus[vcid]->interrupt_collection
> >     else
> >          invalid vcid.
> > 
> > Similar to how SPI vs PPI interrupts are handled.
> 
> Sorry, I didn't understand your suggestion like that.
> 
> I think that can work, although the resulting code may be difficult to
> read/understand because a collection can be moved from one vCPU to another.

Actually, I think the guest is allowed to point multiple collections at
the same vcpu (it's dumb, but allowed), so this scheme probably doesn't
work.

> > 
> >> INVALL ensures that any interrupts in the specified collection are
> >> re-loaded. SYNC ensures that all the previous commands, and all outstanding
> >> physical actions relating to the specified re-distributor are completed.
> > 
> >>
> >>> All other ITS command like `MOVI`, `DISCARD`, `INV`, `INT`, `CLEAR`
> >>> just validate and generate physical command.
> >>>
> >>> ### `MAPC` command translation
> >>>
> >>> Format: `MAPC vCID, vTA`
> >>>
> >>> - `MAPC pCID, pTA` physical ITS command is generated
> >>
> >> We should not send any MAPC command to the physical ITS. The collection
> >> is already mapped during Xen boot.
> > 
> > What is the plan for this start of day mapping? One collection per pCPU
> > and ignore the rest?
> 
> To map only one collection per pCPU. We can look at improvements later.

If you are making assumptions like this which are not written down then
please point out explicitly that an assumption needs to be documented.

> > Supposing that a guest is likely to use collections to map interrupts to
> > specific vcpus, and that the physical collections will be mapped to
> > pcpus, I suppose this means we will need to do some relatively expensive
> > remapping (corresponding to moving the IRQ to another collection) in
> > arch_move_irqs? Is that the best we can do?
> 
> If we use the same solution as SPI/PPIs (i.e. arch_move_irqs), yes. But
> I'm not in favor of this solution, which is already expensive for
> SPI/PPIs (after talking with Dario, vCPUs may move often between pCPUs).
> 
> We could decide to implement a lazy solution (i.e. moving the interrupt
> only when it fires) but that would require sending ITS commands from Xen.
> 
> Another idea is to never move the interrupt collection from one pCPU to
> another. The vCID would always be equal to the same pCID, but the
> interrupt would be injected to a different processor depending on the
> associated vTA.

I have an idea here, but I'm going to try and update the document for
the current feedback before proposing a radical change.


> > 
> >> I guess that if the OS is playing with the ITT (such as writing in it)
> >> the ITS will behave badly. We have to ensure that the guest will never
> >> write to it, and at the same time that the same region is not passed
> >> to 2 devices.
> > 
> > I don't think we will be exposing the physical ITT to the guest, will
> > we? That will live in memory which Xen owns and controls and doesn't
> > share with any guest.
> 
> That needs to be clarified. The RFC sent by Vijay lets the guest
> feed the physical ITT.
>
> But I just thought that it won't work well: the ITT has to be contiguous
> in physical memory.
> 
> > In fact, I don't know that a vITS will need an ITT memory at all, i.e.
> > most of our GITS_BASERn will be unimplemented.
> > In theory we could use these registers to offload some of the data
> > structure storage requirements to the guest, but that would require
> > great care to validate any time we touched it (or perhaps just
> > p2m==r/o), I think it is probably not worth the stress if we can just
> > use regular hypervisor side data structures instead? (This stuff is just
> > there for the h/w ITS which doesn't have the luxury of xmalloc).
> 
> GITS_BASERn is only used to feed internal ITS table such as Collections,
> Devices, Virtual Processors.
> 
> For the ITT, the address is passed via MAPD and has to be allocated by
> the guest.
> 
> We could ignore the address provided by the guest but that would mean
> the ITT is allocated twice (one by the guest unused, and the Xen one
> used to feed the pITS). Although, as pointed above the ITT memory region
> allocated by the guest may not be contiguous in physical memory

There is absolutely no way that we are going to be giving guest
controlled memory to the pITS. We absolutely have to allocate and use
Xen memory for the pITT.

The vITT needs to be "shadowed" into the pITT after translation and
verification.
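To make the shadowing idea concrete, here is a minimal sketch (all names
are hypothetical, and `calloc` stands in for Xen's `xzalloc`/xenheap
allocator, which a real implementation would use):

```c
#include <stdlib.h>
#include <stdint.h>

/* Hypothetical per-device state; a real struct its_device would carry more. */
struct its_device {
    uint64_t vitt_ipa;   /* guest-provided ITT address, never given to h/w */
    size_t   itt_size;   /* size in bytes implied by the MAPD "Size" field */
    void    *pitt;       /* Xen-owned memory actually handed to the pITS */
};

/* On a guest MAPD, record the vITT for bookkeeping but back the real
 * mapping with hypervisor-allocated, physically contiguous memory. */
static int vits_shadow_mapd(struct its_device *dev,
                            uint64_t vitt_ipa, size_t itt_size)
{
    dev->vitt_ipa = vitt_ipa;
    dev->itt_size = itt_size;
    dev->pitt = calloc(1, itt_size);   /* stand-in for xzalloc */
    /* ...a real version would then emit "MAPD device, Valid, pitt PA,
     * size" to the physical ITS; the guest's memory is never touched. */
    return dev->pitt ? 0 : -1;
}
```

The guest-supplied address is only kept so that later commands referring
to the vITT can be sanity-checked.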

> Furthermore, if we allocate the ITT in Xen, this command will be quick
> to emulate (we could send the MAPD command when the device is attached).
> 
> So less possible security issue.

Far less, yes.

> >>> Here the overhead is with memory allocation for `its_device` and `vlpi_map`
> >>>
> >>> XXX Suggestion was to preallocate some of those at device passthrough
> >>> setup time?
> >>
> >> Some of the informations can even be setup when the PCI device is added
> >> to Xen (such as the number of MSI supported and physical LPIs chunk).
> > 
> > Yes, assuming there are sufficient LPIs to allocate in this way. That's
> > not clear though, is it?
> 
> IMHO it's not an issue. If the ITS doesn't provide enough LPIs, a
> baremetal OS will likely not be able to use a device.
> 
> FWIW, Linux is always allocating the LPIs when the device is added based
> on the PCI cfg.

The sort of thing I'm worried about is a device (i.e. some off the shelf
IP) with some number of MSIs supported integrated in a device where, for
whatever reason, the designer has decided that only a smaller set of
MSIs are worth supporting and sizing the ITS accordingly (i.e. deciding
to only support 8 queues in a NIC instead of the full 16).

In such a device the sum(all devices' MSIs) > num LPIs.

Maybe we can defer worrying about this till later.
> > 
> >>> - Allocate irq descriptor and add to RB tree
> >>> - call `route_irq_to_guest()` for this pID
> >>> - Generate/format physical ITS command: `MAPVI device ID, pID, pCID`
> >>
> >>
> >>> Here the overhead is allocating physical ID, allocate memory for irq
> >>> descriptor and routing interrupt.
> >>>
> >>> XXX Suggested to preallocate?
> >>
> >> Right. We may also need to have a separate routing for LPIs as the
> >> current function is quite long to execute.
> >>
> >> I was thinking into routing the interrupt at device assignation
> >> (assuming we allocate the pLPIs at that time). And only set the mapping
> >> to vLPIs when the MAPI is called.
> > 
> > Please propose concrete modifications to the text, since I can't figure
> > out what you mean to change here.
> 
> It's not yet fully clear in my mind.
> 
> If we assuming that we have a chunk of LPIs allocate for a the device
> when it's assigned to the guest, the only missing information is the
> corresponding vLPIs.
> 
> We could decide to setup the internal routing structure (allocation of
> the irq_guest) and defer the setting of the vLPI. This would work
> because the interrupt are disabled at boot time and won't be enabled as
> long as a MAPVI command is sent.

I'm going to try and write some introductory bits about these data
structures and early allocations to try and bring some clarity here so
we can decide what to actually do.
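For reference, Julien's deferral idea above might look roughly like this
(an illustrative sketch; the names `irq_guest`, `route_at_assign` etc.
are placeholders, not existing Xen code):

```c
#include <stdint.h>
#include <stdbool.h>

#define VLPI_INVALID 0   /* safe sentinel: real LPI IDs start at 8192 */

/* The pLPI -> guest routing structure is created at device assignment
 * time, with the vLPI left unset; MAPVI only fills in the vLPI.  This
 * is safe because the LPI stays disabled until the guest enables it. */
struct irq_guest {
    uint32_t plpi;
    uint32_t vlpi;   /* VLPI_INVALID until the guest sends MAPVI */
};

static void route_at_assign(struct irq_guest *g, uint32_t plpi)
{
    g->plpi = plpi;
    g->vlpi = VLPI_INVALID;   /* routing exists, target not yet known */
}

static void set_vlpi_on_mapvi(struct irq_guest *g, uint32_t vlpi)
{
    g->vlpi = vlpi;           /* cheap: no allocation at MAPVI time */
}

static bool can_inject(const struct irq_guest *g)
{
    return g->vlpi != VLPI_INVALID;
}
```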

> >>> ### `INVALL` Command translation
> >>
> >> The format of INVALL is INVALL collection
> >>
> >>> A physical `INVALL` is only generated if the LPI dirty bitmap has any
> >>> bits set. Otherwise it is skipped.
> >>>
> >>> XXX Perhaps bitmap should just be a simple counter?
> >>
> >> We would need to handle it per collection.
> > 
> > Hrm, this complicates things a bit. Don't we need to invalidate any pCID
> > which has a routing of an interrupt to vCID? i.e. potentially multiple
> > INVALL?
> 
> Not necessarily. If we keep the interrupts of a vCID always in the same
> pCID, we would need to send only one INVALL.

I think I/we need to spec out the vCID->pCID mapping a bit more clearly
so we can see how this will fit together.
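As a starting point, something along these lines for the per-vITS map
(an illustrative C sketch; the fixed sizing and all names are
assumptions, not existing Xen code):

```c
#include <stdint.h>

#define NR_CPUS 8                    /* assumption for the sketch */
#define NR_COLLECTIONS (NR_CPUS + 1) /* mandatory minimum per the spec */
#define INVALID_CID 0xffffffffu

/* Hypothetical per-vITS collection map: vCID -> (vTA, pCID).  Fixed
 * size, so MAPC emulation is O(1) and needs no allocation. */
struct cid_mapping {
    uint32_t vta[NR_COLLECTIONS];    /* virtual target (vcpu id) */
    uint32_t pcid[NR_COLLECTIONS];   /* physical collection backing vCID */
};

static int cid_map_set(struct cid_mapping *m, uint32_t vcid,
                       uint32_t vta, uint32_t pcid)
{
    if ( vcid >= NR_COLLECTIONS )
        return -1;                   /* reject out-of-range vCID */
    m->vta[vcid] = vta;
    m->pcid[vcid] = pcid;
    return 0;
}

static uint32_t cid_map_to_pcid(const struct cid_mapping *m, uint32_t vcid)
{
    return vcid < NR_COLLECTIONS ? m->pcid[vcid] : INVALID_CID;
}
```

Whether pcid stays constant per vCID (as discussed above) or changes on
remap is exactly the policy question that still needs settling.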

Ian.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Draft C] Xen on ARM vITS Handling
  2015-06-01 13:36     ` Ian Campbell
@ 2015-06-02 10:46       ` Julien Grall
  2015-06-02 11:09         ` Ian Campbell
  0 siblings, 1 reply; 15+ messages in thread
From: Julien Grall @ 2015-06-02 10:46 UTC (permalink / raw)
  To: Ian Campbell, Julien Grall
  Cc: manish.jaggi, Julien Grall, Stefano Stabellini, Vijay Kilari, xen-devel

Hi Ian,

On 01/06/15 14:36, Ian Campbell wrote:
> On Fri, 2015-05-29 at 15:06 +0100, Julien Grall wrote:
>> Hi Vijay,
>>
>> On 27/05/15 17:44, Vijay Kilari wrote:
>>>> ## Command Translation
>>>>
>>>> Of the existing GICv3 ITS commands, `MAPC`, `MAPD`, `MAPVI`/`MAPI` are
>>>> potentially time consuming commands as these commands creates entry in
>>>> the Xen ITS structures, which are used to validate other ITS commands.
>>>>
>>>> `INVALL` and `SYNC` are global and potentially disruptive to other
>>>> guests and so need consideration.
>>>>
>>>> All other ITS command like `MOVI`, `DISCARD`, `INV`, `INT`, `CLEAR`
>>>> just validate and generate physical command.
>>>>
>>>> ### `MAPC` command translation
>>>>
>>>> Format: `MAPC vCID, vTA`
>>>>
>>>    -  The GITS_TYPER.PAtype is emulated as 0. Hence vTA is always represents
>>>       vcpu number. Hence vTA is validated against physical Collection
>>> IDs by querying
>>>       ITS driver and corresponding Physical Collection ID is retrieved.
>>>    -  Each vITS will have cid_map (struct cid_mapping) which holds mapping of
>>
>> Why do you speak about each vITS? The emulation is only related to one
>> vITS and not shared...
>
> And each vITS will have a cid_map, which is used. This seems like a
> reasonable way to express this concept in the context.

This is rather strange when everything in the command emulation is per-vITS.

> Perhaps there is a need to include discussion of some of the secondary
> data structures alongside the definition of `cits_cq`. In which case we
> could talk about "its associated `cid_map`" and things.
>
>>>       Virtual Collection ID(vCID), Virtual Target address(vTA) and
>>>       Physical Collection ID (pCID).
>>>       If vCID entry already exists in cid_map, then that particular
>>> mapping is updated with
>>>       the new pCID and vTA else new entry is made in cid_map
>>
>> When you move a collection, you also have to make sure that all the
>> interrupts associated to it will be delivered to the new target.
>>
>> I'm not sure what you are suggesting for that...
>
> This is going to be rather painful I fear.
>
>>>    -  MAPC pCID, pTA physical ITS command is generated
>>
>> We should not send any MAPC command to the physical ITS. The collection
>> is already mapped during Xen boot and the guest should not be able to
>> move the physical collection (they are shared between all the guests and
>> Xen).
>
> This needs discussion in the background section, to describe the
> physical setup which the virtual stuff can make assumption of.

I don't think this is a background section. The physical number of
collections is limited (the mandatory number of collections is nr_cpus +
1). Those collections will likely be shared between Xen and the different
guests.

If we let the guest move the physical collection, we will also move all
the associated interrupts, which is wrong.

>
>>>    Here there is no overhead, the cid_map entries are preallocated
>>> with size of nr_cpus
>>>    in the platform.
>>
>> As said the number of collection should be at least nr_cpus + 1.
>
> FWIW I read this as "with size appropriate for nr_cpus", which leaves
> the +1 as implicit. I added the +1 nevertheless.

I wanted to make it clear: his implementation was only considering
nr_cpus collections.

>
>>>> - `MAPC pCID, pTA` physical ITS command is generated
>>>>
>>>> ### `MAPD` Command translation
>>>>
>>>> Format: `MAPD device, Valid, ITT IPA, ITT Size`
>>>>
>>>> `MAPD` is sent with `Valid` bit set if device needs to be added and reset
>>>> when device is removed.
>>>>
>>>> If `Valid` bit is set:
>>>>
>>>> - Allocate memory for `its_device` struct
>>>> - Validate ITT IPA & ITT size and update its_device struct
>>>> - Find number of vectors(nrvecs) for this device by querying PCI
>>>>   helper function
>>>> - Allocate nrvecs number of LPI XXX nrvecs is a function of `ITT Size`?
>>>> - Allocate memory for `struct vlpi_map` for this device. This
>>>>   `vlpi_map` holds mapping of Virtual LPI to Physical LPI and ID.
>>>> - Find physical ITS node with which this device is associated
>>>> - Call `p2m_lookup` on ITT IPA addr and get physical ITT address
>>>> - Validate ITT Size
>>>> - Generate/format physical ITS command: `MAPD, ITT PA, ITT Size`
>>>>
>>>> Here the overhead is with memory allocation for `its_device` and `vlpi_map`
>>>>
>>>> XXX Suggestion was to preallocate some of those at device passthrough
>>>> setup time?
>>>
>>> If Validation bit is set:
>>>    - Query its_device tree and get its_device structure for this device.
>>>    - (XXX: If pci device is hidden from dom0, does this device is added
>>>        with PHYSDEVOP_pci_device_add hypercall?)
>>>    - If device does not exists return
>>>    - If device exists in RB-tree then
>>>           - Validate ITT IPA & ITT size and update its_device struct
>>
>> To validate the ITT size you need to know the number of interrupt ID.
>
> Please could you get into the habit of making concrete suggestions for
> changes to the text. I've no idea what change I should make based on
> this observation. If not concrete suggestions please try and make the
> implications of what you are saying clear.

The size of the ITT is based on the number of interrupts supported by the
device.

The only way to validate the size is to get the number of interrupts
first, i.e.:

	- Find the number of MSI for this device
	- Compute the theoretical size of the ITT
	- Validate the ITT IPA and the ITT size
	- Update the its_device struct

Note: this is based on this conversation and doesn't take into account
the other threads.
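The steps above amount to a check like this (a sketch; the per-entry
size would really come from GITS_TYPER.ITT_entry_size on the physical
ITS, and the function name is made up):

```c
#include <stdint.h>
#include <stdbool.h>

/* Minimum ITT size is one entry per supported event (MSI); reject a
 * guest-supplied size too small to hold them all. */
static bool vits_itt_size_ok(unsigned int nr_msis,
                             unsigned int itt_entry_size,
                             uint64_t guest_itt_size)
{
    uint64_t needed = (uint64_t)nr_msis * itt_entry_size;
    return guest_itt_size >= needed;
}
```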

>
>
>>
>>>           - Check if device is already assigned to the domain,
>>>             if not then
>>>                - Find number of vectors(nrvecs) for this device.
>>>                - Allocate nrvecs number of LPI
>>>                - Fetch vlpi_map for this device (preallocated at the
>>> time of adding
>>>                  this device to Xen). This vlpi_map holds mapping of
>>> Virtual LPI to
>>>                  Physical LPI and ID.
>>>                - Call p2m_lookup on ITT IPA addr and get physical ITT address
>>>                - Assign this device to this domain and mark as enabled
>>>           - If this device already exists with the domain (Domain is
>>> remapping the device)
>>>                - Validate ITT IPA & ITT size and update its_device struct
>>>                - Call p2m_lookup on ITT IPA addr and get physical ITT address
>>>                - Disable all the LPIs of this device by searching
>>> through vlpi_map and LPI
>>>                  configuration table
>>
>> Disabling all the LPIs associated to a device can be time consuming
>> because you have to unroute them and make sure that the physical ITS
>> effectively disabled it before sending the MAPD command.
>>
>> Given that the software would be buggy if it sent a MAPD command without
>> releasing all the associated interrupts, we could ignore the command if
>> any interrupt is still enabled.
>
> Releasing how? Did you mean disable?

I think disable is enough (see 4.9.18).

>
>>>     - Clear all vlpis mapping for this device
>>>     - Remove this device from the domain
>>>
>>>> ### `MAPVI`/`MAPI` Command translation
>>>>
>>>> Format: `MAPVI device, ID, vID, vCID`
>>>>
>>>> - Validate if the device exits by checking vITS device list
>>>> - Validate vCID and get pCID by searching cid_map
>>>>
>>>> - if vID does not have entry in `vlpi_entries` of this device allocate
>>>>   a new pID from `vlpi_map` of this device and update `vlpi_entries`
>>>>   with new pID
>>>> - Allocate irq descriptor and add to RB tree
>>>> - call `route_irq_to_guest()` for this pID
>>>> - Generate/format physical ITS command: `MAPVI device ID, pID, pCID`
>>>>
>>>
>>> - Validate if the device exists by checking vITS device RB-tree.
>>> - Validate vCID and get pCID by searching cid_map
>>> - if vID does not have entry in vlpi_entries of this device
>>>       -  Allot pID from vlpi_map of this device and update
>>> vlpi_entries with new pID.
>>>       - Allocate irq descriptor and add to RB tree
>>>       - call route_irq_to_guest() for this pID
>>>   If exists,
>>>      - If vCID is different ( remapping interrupts to differnt collection ),
>>>             - Disable LPI
>>
>> You have to ensure the LPI is disabled with an INV/SYNC.
>
> Can we not rely on the subsequent INV/SYNC from the guest relating to
> the MAPI here?

I think you meant MAPVI here.

It's not mandatory that MAPVI is followed by an INV. INV is only used
when the software modifies the LPI configuration table.

Furthermore, it's not clear what the behavior of the ITS would be if the
interrupt is not correctly disabled (i.e. not having an INV/SYNC before
MAPI).

Although, thinking about it more, Xen should not disable/enable any
IRQ without an explicit request from the guest. Based on 4.9.17, I think
we can ignore the command if the IRQ has not been disabled (i.e. LPI
disabled in the LPI conf table + INV/INVALL sent).

>
>> Although as suggested on the previous commands, we may want to deny the
>> command if the interrupt is not disabled.
>>
>>>             - Update the vlpi_map
>>>              (XXX: Enable LPI on guest request?)
>>
>> Read the spec...
>
> If you know the answer then please share it rather than making people
> guess what you are talking about.

The spec is pretty clear on it (see 4.9.17). The guest is managing the
interrupt itself via the LPI configuration table; we should not try to
do more than the guest asks.

The question to enable/disable the IRQ will depend on the virtual LPI
configuration table and if we already replicated the value to the
physical one.

Although, as we don't have any vLPI -> pLPI mapping at this time, it
likely means that we have to check the virtual LPI configuration table,
replicate the value in the physical table, and then send an INV/SYNC (if
the value has been toggled) to ensure completion.

This is because it would be valid, even if it's pointless, to enable an 
LPI before mapping a (DevID, ID).
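That replication step could look roughly like this (illustrative only;
the table layout and helper names are made up, and queueing INV/SYNC to
the pITS is stubbed out):

```c
#include <stdint.h>
#include <stdbool.h>

#define LPI_ENABLE 0x1  /* enable bit in an LPI configuration table entry */

static unsigned int cmds_sent;  /* stand-in for queueing INV+SYNC to the pITS */
static void queue_inv_sync(uint32_t plpi) { (void)plpi; cmds_sent++; }

/* Replicate the guest's view of one LPI into the physical configuration
 * table, sending INV/SYNC only when the enable bit actually toggled. */
static void replicate_lpi_cfg(uint8_t *phys_cfg, const uint8_t *virt_cfg,
                              uint32_t vlpi_idx, uint32_t plpi_idx)
{
    bool was = phys_cfg[plpi_idx] & LPI_ENABLE;
    bool now = virt_cfg[vlpi_idx] & LPI_ENABLE;

    phys_cfg[plpi_idx] = virt_cfg[vlpi_idx];
    if ( was != now )
        queue_inv_sync(plpi_idx);   /* ensure the ITS observes the change */
}
```

The "only on toggle" condition is what keeps a redundant MAPVI from
generating pointless physical commands.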

Regards,

-- 
Julien Grall



* Re: [Draft C] Xen on ARM vITS Handling
  2015-06-02 10:46       ` Julien Grall
@ 2015-06-02 11:09         ` Ian Campbell
  0 siblings, 0 replies; 15+ messages in thread
From: Ian Campbell @ 2015-06-02 11:09 UTC (permalink / raw)
  To: Julien Grall
  Cc: manish.jaggi, Julien Grall, Stefano Stabellini, Vijay Kilari, xen-devel

On Tue, 2015-06-02 at 11:46 +0100, Julien Grall wrote:
> Hi Ian,
> 
> On 01/06/15 14:36, Ian Campbell wrote:
> > On Fri, 2015-05-29 at 15:06 +0100, Julien Grall wrote:
> >> Hi Vijay,
> >>
> >> On 27/05/15 17:44, Vijay Kilari wrote:
> >>>> ## Command Translation
> >>>>
> >>>> Of the existing GICv3 ITS commands, `MAPC`, `MAPD`, `MAPVI`/`MAPI` are
> >>>> potentially time consuming commands as these commands creates entry in
> >>>> the Xen ITS structures, which are used to validate other ITS commands.
> >>>>
> >>>> `INVALL` and `SYNC` are global and potentially disruptive to other
> >>>> guests and so need consideration.
> >>>>
> >>>> All other ITS command like `MOVI`, `DISCARD`, `INV`, `INT`, `CLEAR`
> >>>> just validate and generate physical command.
> >>>>
> >>>> ### `MAPC` command translation
> >>>>
> >>>> Format: `MAPC vCID, vTA`
> >>>>
> >>>    -  The GITS_TYPER.PAtype is emulated as 0. Hence vTA is always represents
> >>>       vcpu number. Hence vTA is validated against physical Collection
> >>> IDs by querying
> >>>       ITS driver and corresponding Physical Collection ID is retrieved.
> >>>    -  Each vITS will have cid_map (struct cid_mapping) which holds mapping of
> >>
> >> Why do you speak about each vITS? The emulation is only related to one
> >> vITS and not shared...
> >
> > And each vITS will have a cid_map, which is used. This seems like a
> > reasonable way to express this concept in the context.
> 
> This is rather strange when everything in the command emulation is per-vits.

I'm afraid you are going to have to say more explicitly what you find
strange here.

> > Perhaps there is a need to include discussion of some of the secondary
> > data structures alongside the definition of `cits_cq`. In which case we
> > could talk about "its associated `cid_map`" and things.
> >
> >>>       Virtual Collection ID(vCID), Virtual Target address(vTA) and
> >>>       Physical Collection ID (pCID).
> >>>       If vCID entry already exists in cid_map, then that particular
> >>> mapping is updated with
> >>>       the new pCID and vTA else new entry is made in cid_map
> >>
> >> When you move a collection, you also have to make sure that all the
> >> interrupts associated to it will be delivered to the new target.
> >>
> >> I'm not sure what you are suggesting for that...
> >
> > This is going to be rather painful I fear.
> >
> >>>    -  MAPC pCID, pTA physical ITS command is generated
> >>
> >> We should not send any MAPC command to the physical ITS. The collection
> >> is already mapped during Xen boot and the guest should not be able to
> >> move the physical collection (they are shared between all the guests and
> >> Xen).
> >
> > This needs discussion in the background section, to describe the
> > physical setup which the virtual stuff can make assumption of.
> 
> I don't think this is a background section. The physical number of
> collection is limited (the mandatory number of collections is nr_cpus +
> 1). Those collection will likely be shared between Xen and the different
> guests.

Right, and this needs to be explained in the document as an assumption
upon which other things can draw, so that the document is (so far as
possible) a coherent whole...

> If we let the guest moving the physical collection we will also move all
> the interrupts which is wrong.

... and therefore things like this would become apparent.

> >>>> - `MAPC pCID, pTA` physical ITS command is generated
> >>>>
> >>>> ### `MAPD` Command translation
> >>>>
> >>>> Format: `MAPD device, Valid, ITT IPA, ITT Size`
> >>>>
> >>>> `MAPD` is sent with `Valid` bit set if device needs to be added and reset
> >>>> when device is removed.
> >>>>
> >>>> If `Valid` bit is set:
> >>>>
> >>>> - Allocate memory for `its_device` struct
> >>>> - Validate ITT IPA & ITT size and update its_device struct
> >>>> - Find number of vectors(nrvecs) for this device by querying PCI
> >>>>   helper function
> >>>> - Allocate nrvecs number of LPI XXX nrvecs is a function of `ITT Size`?
> >>>> - Allocate memory for `struct vlpi_map` for this device. This
> >>>>   `vlpi_map` holds mapping of Virtual LPI to Physical LPI and ID.
> >>>> - Find physical ITS node with which this device is associated
> >>>> - Call `p2m_lookup` on ITT IPA addr and get physical ITT address
> >>>> - Validate ITT Size
> >>>> - Generate/format physical ITS command: `MAPD, ITT PA, ITT Size`
> >>>>
> >>>> Here the overhead is with memory allocation for `its_device` and `vlpi_map`
> >>>>
> >>>> XXX Suggestion was to preallocate some of those at device passthrough
> >>>> setup time?
> >>>
> >>> If Validation bit is set:
> >>>    - Query its_device tree and get its_device structure for this device.
> >>>    - (XXX: If pci device is hidden from dom0, does this device is added
> >>>        with PHYSDEVOP_pci_device_add hypercall?)
> >>>    - If device does not exists return
> >>>    - If device exists in RB-tree then
> >>>           - Validate ITT IPA & ITT size and update its_device struct
> >>
> >> To validate the ITT size you need to know the number of interrupt ID.
> >
> > Please could you get into the habit of making concrete suggestions for
> > changes to the text. I've no idea what change I should make based on
> > this observation. If not concrete suggestions please try and make the
> > implications of what you are saying clear.
> 
> The size of the ITT is based on the number of Interrupt supported by the
> device.
> 
> The only way to validate the size getting the number of Interrupt
> before. i.e
> 
> 	- Find the number of MSI for this device

This is the only thing which I think is currently problematic (the rest
we know).

AIUI the intention for PCI devices was that we would learn this around
the time dom0 makes the PHYSDEVOP_pci_device_add call (or perhaps
XEN_DOMCTL_assign_device or some other function for non-PCI devices).

The doc at the moment says:

- Find number of vectors (nrvecs) for this device by querying PCI
  helper function

I don't recall if this was in draftC or is based on feedback since then.
In any case I've changed it to say

- Find number of vectors from `its_device` struct.

and added a note:

XXX The number of vectors contained in `struct its_device` needs to have
been communicated from dom0 at some point, e.g. by the
PHYSDEVOP_pci_device_add call or XEN_DOMCTL_assign_device or some
other scheme.

> 	- Compute the theorical size of the ITT
> 	- Validate the ITT IPA and the ITT size

Do we need to validate against the theoretical size, or just against the
amount of memory the guest has?

If we don't actually store anything in the ITT then we may be able to
just not care about the size.


* Re: [Draft C] Xen on ARM vITS Handling
  2015-06-01 12:24     ` Julien Grall
  2015-06-01 13:45       ` Ian Campbell
@ 2015-06-03  7:25       ` Vijay Kilari
  2015-06-03  8:45         ` Ian Campbell
  1 sibling, 1 reply; 15+ messages in thread
From: Vijay Kilari @ 2015-06-03  7:25 UTC (permalink / raw)
  To: Julien Grall
  Cc: manish.jaggi, Julien Grall, Stefano Stabellini, Ian Campbell, xen-devel

On Mon, Jun 1, 2015 at 5:54 PM, Julien Grall <julien.grall@citrix.com> wrote:
> On 01/06/15 13:11, Ian Campbell wrote:
>>>> ### Device ID (`ID`)
>>>>
>>>> This parameter is used by commands which manage a specific device and
>>>> the interrupts associated with that device. Checking if a device is
>>>> present and retrieving the data structure must be fast.
>>>>
>>>> The device identifiers may not be assigned contiguously and the maximum
>>>> number is very high (2^32).
>>>>
>>>> XXX In the context of virtualised device ids this may not be the case,
>>>> e.g. we can arrange for (mostly) contiguous device ids and we know the
>>>> bound is significantly lower than 2^32
>>>>
>>>> Possible efficient data structures would be:
>>>>
>>>> 1. List: The lookup/deletion is in O(n) and the insertion will depend
>>>>    if the device should be sorted following their identifier. The
>>>>    memory overhead is 18 bytes per element.
>>>> 2. Red-black tree: All the operations are O(log(n)). The memory
>>>>    overhead is 24 bytes per element.

How about using radix-tree instead of RB-tree?

>>>>
>>>> A Red-black tree seems the more suitable for having fast deviceID
>>>> validation even though the memory overhead is a bit higher compare to
>>>> the list.
>>>
>>> When PHYSDEVOP_pci_device_add is called, memory for its_device structure
>>> and other needed structure for this device is allocated added to RB-tree
>>> with all necessary information
>>
>> Sounds like a reasonable time to do it. I added something based on your
>> words.
>
> Hmmm... The RB-tree suggested is per domain not the host and indexed
> with the vDevID.
>
> This is the only way to know quickly if the domain is able to use the
> device and retrieving a device. Indeed, the vDevID won't be equal to the
> pDevID as the vBDF will be different to the pBDF.

Yes, the vBDF is converted to the pBDF to match the DevID.

>
> PHYSDEVOP_pci_device_add is to ask Xen managing the PCI device. At that
> time we don't know to which domain the device will be passthrough.

PHYSDEVOP_pci_device_add will only add the its_device to the global radix tree.

When MAPD is received, the its_device is removed from the global list and
added to the per-domain list. When the domain releases the device, the
its_device is added back to the global list. Is that OK?
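To spell out what I mean, something like this (illustrative C only;
simple singly linked lists stand in for the radix/RB trees discussed
above, and the helper names are made up):

```c
#include <stddef.h>

/* On PHYSDEVOP_pci_device_add the device sits on a global list; MAPD
 * moves it to the owning domain's list; releasing the device moves it
 * back to the global list. */
struct its_device {
    unsigned int devid;
    struct its_device *next;
};

static void dev_push(struct its_device **list, struct its_device *d)
{
    d->next = *list;
    *list = d;
}

/* Unlink and return the device with the given id, or NULL. */
static struct its_device *dev_take(struct its_device **list, unsigned int devid)
{
    for ( struct its_device **pp = list; *pp; pp = &(*pp)->next )
        if ( (*pp)->devid == devid )
        {
            struct its_device *d = *pp;
            *pp = d->next;
            d->next = NULL;
            return d;
        }
    return NULL;
}
```

MAPD emulation would then be roughly `dev_push(&domain_list,
dev_take(&global_list, devid))`, plus handling of the lookup-failure case.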


* Re: [Draft C] Xen on ARM vITS Handling
  2015-06-03  7:25       ` Vijay Kilari
@ 2015-06-03  8:45         ` Ian Campbell
  0 siblings, 0 replies; 15+ messages in thread
From: Ian Campbell @ 2015-06-03  8:45 UTC (permalink / raw)
  To: Vijay Kilari
  Cc: Julien Grall, manish.jaggi, Julien Grall, Stefano Stabellini, xen-devel

On Wed, 2015-06-03 at 12:55 +0530, Vijay Kilari wrote:
> On Mon, Jun 1, 2015 at 5:54 PM, Julien Grall <julien.grall@citrix.com> wrote:
> > On 01/06/15 13:11, Ian Campbell wrote:
> >>>> ### Device ID (`ID`)
> >>>>
> >>>> This parameter is used by commands which manage a specific device and
> >>>> the interrupts associated with that device. Checking if a device is
> >>>> present and retrieving the data structure must be fast.
> >>>>
> >>>> The device identifiers may not be assigned contiguously and the maximum
> >>>> number is very high (2^32).
> >>>>
> >>>> XXX In the context of virtualised device ids this may not be the case,
> >>>> e.g. we can arrange for (mostly) contiguous device ids and we know the
> >>>> bound is significantly lower than 2^32
> >>>>
> >>>> Possible efficient data structures would be:
> >>>>
> >>>> 1. List: The lookup/deletion is in O(n) and the insertion will depend
> >>>>    if the device should be sorted following their identifier. The
> >>>>    memory overhead is 18 bytes per element.
> >>>> 2. Red-black tree: All the operations are O(log(n)). The memory
> >>>>    overhead is 24 bytes per element.
> 
> How about using radix-tree instead of RB-tree?
> 
> >>>>
> >>>> A Red-black tree seems the more suitable for having fast deviceID
> >>>> validation even though the memory overhead is a bit higher compare to
> >>>> the list.
> >>>
> >>> When PHYSDEVOP_pci_device_add is called, memory for its_device structure
> >>> and other needed structure for this device is allocated added to RB-tree
> >>> with all necessary information
> >>
> >> Sounds like a reasonable time to do it. I added something based on your
> >> words.
> >
> > Hmmm... The RB-tree suggested is per domain not the host and indexed
> > with the vDevID.
> >
> > This is the only way to know quickly if the domain is able to use the
> > device and retrieving a device. Indeed, the vDevID won't be equal to the
> > pDevID as the vBDF will be different to the pBDF.
> 
> Yes, vBDF is converted to pBDF to match DevID
> 
> >
> > PHYSDEVOP_pci_device_add is to ask Xen managing the PCI device. At that
> > time we don't know to which domain the device will be passthrough.
> 
> PHYSDEVOP_pci_device_add will only add its_device to global radix tree list.
> 
> When MAPD is received, its_device is removed from global list and added
> to per domain list. When domain releases the device, its_device is added back
> to global list. is it ok?

I suspect we might need two list (or tree) entries for each its_device,
one for the pDevice mapping and one for the vDevice mapping. We may even
want a third for vCollection membership, I'm not sure.
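I.e. something shaped like this (a sketch; `rb_node` here is a
placeholder for Xen's struct rb_node and the field names are
illustrative):

```c
#include <stdint.h>

/* Placeholder for Xen's struct rb_node. */
struct rb_node { struct rb_node *left, *right, *parent; };

/* One its_device, several tree memberships, each keyed differently. */
struct its_device {
    uint32_t pdevid;            /* physical DeviceID */
    uint32_t vdevid;            /* virtual DeviceID seen by the guest */
    struct rb_node pdev_node;   /* entry in the host-wide pDevID tree */
    struct rb_node vdev_node;   /* entry in the per-domain vDevID tree */
    /* possibly a third node for vCollection membership */
};
```

The intrusive-node approach means one allocation per device regardless
of how many indexes it appears in.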

Either way I don't think it'll be a big deal, the need or not for each
of those will fall out in the wash from the rest of the design, I think.

Based on the amount of discussion on draftC and the fact that we are
still finding new areas of complexity I'm going to take a step back and
try something simpler and see if I can come up with something which we
can get done for 4.6. I'll try and get a new draft reflecting that out
ASAP.

(I have my edits from the feedback on draftC so far in git, so if it
doesn't work we can always take up this one again...)


end of thread, other threads:[~2015-06-03  8:45 UTC | newest]

Thread overview: 15+ messages
2015-05-27 11:48 [Draft C] Xen on ARM vITS Handling Ian Campbell
2015-05-27 16:44 ` Vijay Kilari
2015-05-29 14:06   ` Julien Grall
2015-06-01 13:36     ` Ian Campbell
2015-06-02 10:46       ` Julien Grall
2015-06-02 11:09         ` Ian Campbell
2015-06-01 12:11   ` Ian Campbell
2015-06-01 12:24     ` Julien Grall
2015-06-01 13:45       ` Ian Campbell
2015-06-03  7:25       ` Vijay Kilari
2015-06-03  8:45         ` Ian Campbell
2015-05-29 13:40 ` Julien Grall
2015-06-01 13:12   ` Ian Campbell
2015-06-01 15:29     ` Julien Grall
2015-06-02  9:41       ` Ian Campbell
