* [Draft F] Xen on ARM vITS Handling
@ 2015-06-11  9:40 Ian Campbell
From: Ian Campbell @ 2015-06-11  9:40 UTC (permalink / raw)
  To: xen-devel; +Cc: manish.jaggi, Julien Grall, Stefano Stabellini, Vijay Kilari

Draft F follows. Also at:
http://xenbits.xen.org/people/ianc/vits/draftF.{pdf,html}

Here's a quick update based on feedback prior to meeting on #xenarm at
12:00PM BST / 7:00AM EDT / 4:30PM IST (which is ~1:20 from now)

Ian.

% Xen on ARM vITS Handling
% Ian Campbell <ian.campbell@citrix.com>
% Draft F

# Changelog

## Since Draft E

* Discussion of `struct pending_irq`
* Fix handling of enable/disable, requiring switching back to trapping
  the virtual cfg table again. get_vlpi_cfg is no longer needed.
* Fix p2m_lookup to also use get_page_from_gfn.

## Since Draft D

* Fixed assumptions about vLPI->pLPI mapping, which is not
  possible. This led to changes to the model for enabling and
  disabling pLPI and vLPI and the handling of the virtual LPI
  configuration table, resolving _Unresolved Issue 1_.
* Made the pLPI and vLPI interrupt priorities explicit.
* Attempted to clarify the trust issues regarding in-guest data
  structures.
* Mandate a particular cacheability for tables in guest memory.

## Since Draft C

* _Major_ rework, in an attempt to simplify everything into something
  more likely to be achievable for 4.6.
    * Made some simplifying assumptions.
    * Reduced the scope of some support.
    * Command emulation is now mostly trivial.
    * Expanded detail on host setup, allowing other assumptions to be
      made during emulation.
* Many other things lost in the noise of the above.

## Since Draft B

* Details of command translation (thanks to Julien and Vijay)
* Added background on LPI Translation and Pending tables
* Added background on Collections
* Settled on `N:N` scheme for vITS:pITS mapping.
* Rejigged section nesting a bit.
* Since we now think translation should be cheap, settle on
  translation at scheduling time.
* Lazy `INVALL` and `SYNC`

## Since Draft A

* Added discussion of when/where command translation occurs.
* Contention on scheduler lock, suggestion to use SOFTIRQ.
* Handling of domain shutdown.
* More detailed discussion of multiple vs single vits pros/cons.

# Introduction

ARM systems containing a GIC version 3 or later may contain one or
more ITS logical blocks. An ITS is used to route Message Signalled
interrupts from devices into an LPI injection on the processor.

The following summarises the ITS hardware design and serves as a set
of assumptions for the vITS software design. For full details of the
ITS see the "GIC Architecture Specification".

## Locality-specific Peripheral Interrupts (`LPI`)

This is a new class of message signalled interrupts introduced in
GICv3. They occupy the interrupt ID space from `8192..(2^32)-1`.

The number of LPIs supported by an ITS is exposed via
`GITS_TYPER.IDbits` (as number of bits - 1), it may be up to
2^32. _Note_: This field also contains the number of Event IDs
supported by the ITS.

### LPI Configuration Table

Each LPI has an associated configuration byte in the LPI Configuration
Table (managed via the GIC Redistributor and placed at
`GICR_PROPBASER` or `GICR_VPROPBASER`). This byte configures:

* The LPI's priority;
* Whether the LPI is enabled or disabled.

Software updates the Configuration Table directly but must then issue
an invalidate command (per-device `INV` ITS command, global `INVALL`
ITS command or write `GICR_INVLPIR`) for the effect to be guaranteed
to become visible (possibly requiring an ITS `SYNC` command to ensure
completion of the `INV` or `INVALL`). Note that it is valid for an
implementation to reread the configuration table at any time (IOW it
is _not_ guaranteed that a change to the LPI Configuration Table won't
be visible until an invalidate is issued).

### LPI Pending Table

Each LPI also has an associated bit in the LPI Pending Table (managed
by the GIC redistributor). This bit signals whether the LPI is pending
or not.

This region may contain out of date information and the mechanism to
synchronise is `IMPLEMENTATION DEFINED`.

## Interrupt Translation Service (`ITS`)

### Device Identifiers

Each device using the ITS is associated with a unique "Device
Identifier".

The device IDs are properties of the implementation and are typically
described via system firmware, e.g. the ACPI IORT table or via device
tree.

The number of device ids in a system depends on the implementation and
can be discovered via `GITS_TYPER.Devbits`. This field allows an ITS
to have up to 2^32 devices.

### Events

Each device can generate "Events" (called `ID` in the spec). These
correspond to possible interrupt sources in the device (e.g. MSI
offset).

The maximum number of interrupt sources is device specific. It is
usually discovered either from firmware tables (e.g. DT or ACPI) or
from bus specific mechanisms (e.g. PCI config space).

The maximum number of event IDs supported by an ITS is exposed via
`GITS_TYPER.IDbits` (as number of bits - 1), it may be up to
2^32. _Note_: This field also contains the number of `LPIs` supported
by the ITS.

### Interrupt Collections

Each interrupt is a member of an "Interrupt Collection". This allows
software to manage large numbers of physical interrupts with a small
number of commands rather than issuing one command per interrupt.

On a system with N processors, the ITS must provide at least N+1
collections.

An ITS may support some number of internal collections (indicated by
`GITS_TYPER.HCC`) and external ones which require memory provisioned
by the Operating System via a `GITS_BASERn` register.

### Target Addresses

The Target Address corresponds to a specific GIC re-distributor. The
format of this field depends on the value of the `GITS_TYPER.PTA` bit:

* 1: the base address of the re-distributor target is used
* 0: a unique processor number is used. The mapping between the
  processor affinity value (`MPIDR`) and the processor number is
  discoverable via `GICR_TYPER.ProcessorNumber`.

This value is up to the ITS implementer (`GITS_TYPER` is a read-only
register).

### Device Table

A Device Table is configured in each ITS which maps incoming device
identifiers into an ITS Interrupt Translation Table.

### Interrupt Translation Table (`ITT`) and Collection Table

An `Event` generated by a `Device` is translated into an `LPI` via a
per-Device Interrupt Translation Table. The structure of this table is
described in GIC Spec 4.9.12.

The ITS translation table maps an event ID from the originating device
into a physical interrupt (`LPI`) and an Interrupt Collection.

The Collection is in turn looked up in the Collection Table to produce
a Target Address, indicating a redistributor (AKA CPU) to which the
LPI is delivered.

### OS Provisioned Memory Regions

The ITS hardware design provides mechanisms for an ITS to be provided
with various blocks of memory by the OS for ITS internal use. These
include the per-device ITT (established with `MAPD`) and memory
regions for Device Tables, Virtual Processors and Interrupt
Collections. Up to 8 such regions can be requested by the ITS and
provisioned by the OS via the `GITS_BASERn` registers.

### ITS Configuration

The ITS is configured and managed, including establishing and
configuring the Translation Tables and Collection Table, via an in
memory ring shared between the CPU and the ITS controller. The ring is
managed via the `GITS_CBASER` register and indexed by `GITS_CWRITER`
and `GITS_CREADR` registers.

A processor adds commands to the shared ring and then updates
`GITS_CWRITER` to make them visible to the ITS controller.

The ITS controller processes commands from the ring and then updates
`GITS_CREADR` to indicate to the processor that the command has been
processed.

Commands are processed sequentially.

Commands sent on the ring include operational commands:

* Routing interrupts to processors;
* Generating interrupts;
* Clearing the pending state of interrupts;
* Synchronising the command queue

and maintenance commands:

* Map device/collection/processor;
* Map virtual interrupt;
* Clean interrupts;
* Discard interrupts;

The field `GITS_CBASER.Size` encodes the number of 4KB pages minus 1
making up the command queue. This field is 8 bits which means the
maximum size is 2^8 * 4KB = 1MB. Given that each command is 32 bytes,
there is a maximum of 32768 commands in the queue.
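
The arithmetic above can be checked with a small C sketch. It assumes the
field holds the number of 4KB pages minus one (so a raw value of 0 means
one page); the names `its_cmdq_bytes` and `its_cmdq_slots` are invented
for this illustration and are not Xen functions.

```c
#include <stdint.h>

#define ITS_PAGE_SIZE 4096u  /* command queue is sized in 4KB pages */
#define ITS_CMD_SIZE    32u  /* each ITS command occupies 32 bytes */

/* size_field: raw 8-bit GITS_CBASER.Size value (assumed: pages minus one). */
static uint32_t its_cmdq_bytes(uint8_t size_field)
{
    return ((uint32_t)size_field + 1) * ITS_PAGE_SIZE;
}

/* Number of 32-byte command slots in a queue of that size. */
static uint32_t its_cmdq_slots(uint8_t size_field)
{
    return its_cmdq_bytes(size_field) / ITS_CMD_SIZE;
}
```

With the largest field value (`0xff`) this gives 1MB and 32768 slots,
matching the figures quoted above.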

The ITS provides no specific completion notification
mechanism. Completion is monitored by a combination of a `SYNC`
command and either polling `GITS_CREADR` or notification via an
interrupt generated via the `INT` command.

Note that the interrupt generation via `INT` requires an originating
device ID to be supplied (which is then translated via the ITS into an
LPI). No specific device ID is defined for this purpose and so the OS
software is expected to fabricate one.

Possible ways of inventing such a device ID are:

* Enumerate all device ids in the system and pick another one;
* Use a PCI BDF associated with a non-existent device function (such
  as an unused one relating to the PCI root-bridge) and translate that
  (via firmware tables) into a suitable device id;
* ???

# LPI Handling in Xen

## IRQ descriptors

Currently all SGI/PPI/SPI interrupts are covered by a single static
array of `struct irq_desc` with ~1024 entries (the maximum interrupt
number in that set of interrupt types).

The addition of LPIs in GICv3 means that the largest potential
interrupt specifier is much larger.

Therefore a second dynamically allocated array will be added to cover
the range `8192..nr_lpis`. The `irq_to_desc` function will determine
which array to use (static `0..1024` or dynamic `8192..end` lpi desc
array) based on the input irq number. Two arrays are used to avoid a
wasteful allocation covering the (unused/unusable) `1024..8191` range.
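
A minimal sketch of the two-array lookup, under stated assumptions:
`struct irq_desc_sketch`, `static_desc`, `lpi_desc` and
`irq_to_desc_sketch` are stand-ins invented here (the real `struct
irq_desc` and `irq_to_desc` differ), and `nr_lpis` is treated as a count
of LPIs starting at 8192.

```c
#include <stddef.h>
#include <stdint.h>

#define NR_STATIC_IRQS 1024u  /* SGI/PPI/SPI range covered by the static array */
#define FIRST_LPI      8192u  /* first interrupt ID in the LPI range */

/* Hypothetical, cut-down stand-in for struct irq_desc. */
struct irq_desc_sketch {
    uint32_t irq;
};

static struct irq_desc_sketch static_desc[NR_STATIC_IRQS];
static struct irq_desc_sketch *lpi_desc;  /* allocated once nr_lpis is known */
static uint32_t nr_lpis;                  /* assumed: number of LPIs from 8192 */

/* Pick the array by interrupt number; the 1024..8191 gap has no descriptor. */
static struct irq_desc_sketch *irq_to_desc_sketch(uint32_t irq)
{
    if ( irq < NR_STATIC_IRQS )
        return &static_desc[irq];

    if ( irq >= FIRST_LPI && lpi_desc != NULL && irq - FIRST_LPI < nr_lpis )
        return &lpi_desc[irq - FIRST_LPI];

    return NULL;
}
```

Returning `NULL` for the gap and for out-of-range LPIs is a choice made
for this sketch; callers would need to handle it.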

## Virtual LPI interrupt injection

A physical interrupt which is routed to a guest vCPU has the
`_IRQ_GUEST` flag set in the `irq_desc` status mask. Such interrupts
have an associated instance of `struct irq_guest` which contains the
target `struct domain` pointer and virtual interrupt number.

In Xen a virtual interrupt (either arising from a physical interrupt
or completely virtual) is ultimately injected to a VCPU using the
`vgic_vcpu_inject_irq` function, or `vgic_vcpu_inject_lpi`.

This mechanism will likely need updating to handle the injection of
virtual LPIs. In particular, rather than via `GICD_ITARGETSRn` or
`GICD_IROUTERn`, routing of LPIs is performed via the ITS collections
mechanism. This is discussed below (in _vITS_:_Virtual LPI injection_).

# Scope

The ITS is rather complicated, especially when combined with
virtualisation. To simplify things we initially omit the following
functionality:

- Interrupt -> vCPU -> pCPU affinity. The management of physical vs
  virtual Collections is a feature of GICv4, thus is omitted in this
  design for GICv3. Physical interrupts which occur on a pCPU where
  the target vCPU is not already resident will be forwarded (via IPI)
  to the correct pCPU for injection via the existing
  `vgic_vcpu_inject_irq` mechanism (extended to handle LPI injection
  correctly).
- Clearing of the pending state of an LPI under various circumstances
  (`MOVI`, `DISCARD`, `CLEAR` commands) is not done. This will result
  in guests seeing some perhaps spurious interrupts.
- vITS functionality will only be available on 64-bit ARM hosts,
  avoiding the need to worry about fast access to guest owned data
  structures (64-bit uses a direct map). (NB: 32-bit guests on 64-bit
  hosts can be considered to have access)

# pITS

## Assumptions

It is assumed that `GITS_TYPER.IDbits` is large enough that there are
sufficient LPIs available to cover the sum of the number of possible
events generated by each device in the system (that is the sum of the
actual events for each bit of hardware, rather than the notional
per-device maximum from `GITS_TYPER.IDbits`).

This assumption avoids the need to do memory allocations and interrupt
routing at run time, e.g. during command processing by allowing us to
setup everything up front.

## Driver

The physical driver will provide functions for enabling, disabling,
routing, etc. a specified interrupt, via the usual Xen APIs for doing
such things.

This will likely involve interacting with the physical ITS command
queue etc. In this document such interactions are considered internal
to the driver (i.e. we care that the API to enable an interrupt
exists, not how it is implemented).

The physical ITS will be provisioned with whatever tables it requests
via its `GITS_BASERn` registers.

## Collections

The `pITS` will be configured at start of day with 1 Collection mapped
to each physical processor, using the `MAPC` command on the physical
ITS.

## Per Device Information

Each physical device in the system which can be used together with an
ITS (whether using passthrough or not) will have associated with it a
data structure:

    struct its_device {
        struct pits *pits;
        uintNN_t phys_device_id;
        uintNN_t virt_device_id;
        unsigned int *events;
        unsigned int nr_events;
        struct page_info *pitt;
        unsigned int nr_pitt_pages;
        /* Other fields relating to pITS maintenance but unrelated to vITS */
    };

Where:

- `pits`: Pointer to the associated physical ITS.
- `phys_device_id`: The physical device ID of the physical device
- `virt_device_id`: The virtual device ID if the device is accessible
  to a domain
- `events`: An array mapping a per-device event number into a physical
  LPI.
- `nr_events`: The number of events which this device is able to
  generate.
- `pitt`, `nr_pitt_pages`: Records allocation of pages for physical
  ITT (not directly accessible).

During its lifetime this structure may be referenced by several
different mappings (e.g. physical and virtual device id maps, virtual
collection device id).

## Device Discovery/Registration and Configuration

Per device information will be discovered based on firmware tables (DT
or ACPI) and information provided by dom0 (e.g. reading associated PCI
cfg space, registration via PHYSDEVOP_pci_device_add or new custom
hypercalls).

This information shall include at least:

- The Device ID of the device.
- The maximum number of Events which the device is capable of
  generating.

When a device is discovered/registered (i.e. when all necessary
information is available) then:

- `struct its_device` and the embedded `events` array will be
  allocated (the latter with `nr_events` elements).
- The `struct its_device` will be inserted into a mapping (possibly an
  R-B tree) from its physical Device ID to the `struct its_device`.
- `nr_events` physical LPIs will be allocated and recorded in the
  `events` array.
- An ITT table will be allocated for the device and the appropriate
  `MAPD` command will be issued to the physical ITS. The location will
  be recorded in `struct its_device.pitt`.
- Each Event which the device may generate will be mapped to the
  corresponding LPI in the `events` array and a collection, by issuing
  a series of `MAPVI` commands. Events will be assigned to physical
  collections in a round-robin fashion.
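
The last step above might look roughly like the following. This is only
a sketch: `struct event_mapping` and `assign_events` are names made up
for the example, and sharing one cursor across devices (so the
round-robin continues from device to device) is an assumption, not
something this document specifies.

```c
#include <stdint.h>

/* Hypothetical record of the arguments one MAPVI command would carry. */
struct event_mapping {
    uint32_t event;       /* per-device event number */
    uint32_t plpi;        /* physical LPI taken from the events[] array */
    uint32_t collection;  /* physical collection (one per pCPU) */
};

/*
 * Fill out[0..nr_events-1] with one mapping per event.
 * events[] maps event number to pLPI, as in struct its_device.
 * *next_collection is a cursor carried between calls; nr_collections
 * must be non-zero.
 */
static void assign_events(struct event_mapping *out, const uint32_t *events,
                          uint32_t nr_events, uint32_t nr_collections,
                          uint32_t *next_collection)
{
    uint32_t ev;

    for ( ev = 0; ev < nr_events; ev++ )
    {
        out[ev].event = ev;
        out[ev].plpi = events[ev];
        out[ev].collection = *next_collection;
        *next_collection = (*next_collection + 1) % nr_collections;
    }
}
```

A real driver would issue a `MAPVI` for each entry rather than storing
them in an array.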

This setup must occur for a given device before any ITS interrupts may
be configured for the device and certainly before a device is passed
through to a guest. This implies that dom0 cannot use MSIs on a PCI
device before having called `PHYSDEVOP_pci_device_add`.

# Device Assignment

Each domain will have an associated mapping from virtual device ids
into a data structure describing the physical device, including a
reference to the relevant `struct its_device`.

The number of possible device IDs may be large so a simple array or
list is likely unsuitable. A tree (e.g. Red-Black) may be a suitable
data structure. Currently we do not need to perform lookups in this
tree on any hot paths.

_Note_: In the context of virtualised device ids (especially for domU)
it may be possible to arrange for the upper bound on the number of
device IDs to be lower allowing a more efficient data structure to be
used. This is left for a future improvement.

When a device is assigned to a domain (including to domain 0) the
mapping for the new virtual device ID will be entered into the tree.

During assignment all LPIs associated with the device will be routed
to the guest (i.e. `route_irq_to_guest` will be called for each LPI in
the `struct its_device.events` array) and the pLPI will be enabled in
the physical LPI configuration table with a priority of `GIC_PRI_IRQ`
(not any priority from the guest).

# vITS

A guest domain which is allowed to use ITS functionality (i.e. has
been assigned pass-through devices which can generate MSIs) will be
presented with a virtualised ITS.

Accesses to the vITS registers will trap to Xen and be emulated and a
virtualised Command Queue will be provided.

Commands entered onto the virtual Command Queue will be translated
into physical commands, as described later in this document.

There are other aspects to virtualising the ITS (LPI collection
management, assignment of LPI ranges to guests, device
management). However these are only considered here to the extent
needed for describing the vITS emulation.

## Xen interaction with guest OS provisioned vITS memory

Memory which the guest provisions to the vITS (ITT via `MAPD` or other
tables via `GITS_BASERn`) needs careful handling in Xen.

### Trust

Xen cannot trust data in data structures contained in such memory
since a guest can trample over it at will. Therefore Xen either
must take great care when accessing data structures stored in such
memory to validate the contents e.g. not trust that values are within
the required limits or it must take steps to restrict guest access to
the memory when it is provisioned. Since the data structures are
simple and most accessors need to do bounds check anyway it is
considered sufficient to simply do the necessary checks on access.

**Any information read from memory which has been provisioned by the guest
   OS should not be trusted and must be carefully checked (e.g. ranges
   etc) before use.**

### Mapping

Most data structures stored in this shared memory are accessed on the
hot interrupt injection path and must therefore be quickly accessible
from within Xen. Since we have restricted vits support to 64-bit hosts
only, `map_domain_page` is fast enough to be used on the fly and
therefore we do not need to be concerned about unbounded amounts of
permanently mapped memory consumed by each `MAPD` command.

Although `map_domain_page` is fast, `p2m_lookup` (translation from IPA
to PA) is not necessarily so. For now we accept this; as a future
extension, a sparse mapping of the guest device table in vmap space
could be considered, with limits on the total amount of vmap space which
we allow each domain to consume.

The `GITS_BASERn` registers allow for the guest to specify cache
attributes for the memory. For now we require that these have the same
attributes as hypercall arguments in general (see `public/arch-arm.h`).

In addition while `GITS_BASERn` allows the Cacheability to be
specified as `Device-nGnRnE` we require that the tables provided be in
normal guest RAM (not MMIO, not granted memory etc), that is it must
have type `p2m_ram_rw`.

## vITS properties

The vITS implementation shall have:

- `GITS_TYPER.HCC == nr_vcpus + 1`.
- `GITS_TYPER.PTA == 0`. Target addresses are linear processor numbers.
- `GITS_TYPER.Devbits == See below`.
- `GITS_TYPER.IDbits == See below`.
- `GITS_TYPER.ITT Entry Size == 7`, meaning 8 bytes, which is the size
  of `struct vitt` (defined below).

`GITS_TYPER.Devbits` and `GITS_TYPER.IDbits` will need to be chosen to
reflect the host and guest configurations (number of LPIs, maximum
device ID etc).

Other fields (not mentioned here) will be set to some sensible (or
mandated) value.

The `GITS_BASER0` will be setup to request sufficient memory for a
device table consisting of entries of:

    struct vdevice_table {
        uint64_t vitt_ipa;
        uint32_t vitt_size;
        uint32_t padding;
    };
    BUILD_BUG_ON(sizeof(struct vdevice_table) != 16);

On write to `GITS_BASER0` the relevant details of the Device Table
(IPA, size, cache attributes to use when mapping) will be recorded in
`struct domain`.

All other `GITS_BASERn.Valid == 0`.

## vITS to pITS mapping

A physical system may have multiple physical ITSs.

With the simplified vits command model presented here only a single
`vits` is required.

In the future a more complex arrangement may be desired. Since the
choice of model is internal to the hypervisor/tools and is
communicated to the guest via firmware tables we are not tied to this
model as an ABI if we decide to change.

When constructing dom0 it will therefore be necessary to rewrite any
DTS properties which refer to an ITS to point to the single provided
ITS, as well as dropping all ITS nodes and replacing them with a
single node representing the vITS.

## Mapping from `vLPI` back to `pLPI`

While we have arranged for a (`pDevice`,`pEvent`) to map to a single
`pLPI` we cannot guarantee that a given `vLPI` is mapped by a single
(`vDevice`,`vEvent`) since the guest may setup multiple ITT tables
such that this is not the case. Enforcing that this is the case is
prohibitively expensive.

Therefore it is not in general possible to associate a `vLPI` with a
`pLPI`.

## Per-domain `struct pending_irq` for `vLPI`s

Internally Xen uses a `struct pending_irq` to track the status of any
pending virtual IRQ, including a virtual LPI.

Upon domain creation an array of such `struct pending_irq`'s will be
allocated to cover the range `8192..nr_lpis` (for the number of LPIs
which the guest is configured with) and a pointer to this array will be
stored in the `struct domain`. The function `irq_to_pending` will be
modified to lookup interrupts in the LPI range in this array.

## Handling of unrouted/spurious LPIs

Since there is no 1:1 link between a `vLPI` and `pLPI` enabling and
disabling of physical LPIs cannot be driven from the state of an
associated vLPI.

Each `pLPI` is routed and enabled during device assignment, therefore
it is possible to receive a physical LPI which has yet to be routed
(via a `vITS`) to a `vLPI`.

Similarly if a guest routes multiple Events to a single `vLPI` the
interrupt may already be pending when we attempt to deliver it.

Such `pLPI`s shall be ignored and left in the priority dropped state
(per the read from `GICC_IAR`). They will not be `EOI`-d in order to
avoid a possible interrupt storm.

On device deassignment (including as part of domain destroy) after
resetting the device it will be necessary to EOI any interrupts in
such a state by walking over all events in the corresponding `struct
its_device`.

## Enabling and disabling LPIs

Two new functions `vgic_enable_lpi` and `vgic_disable_lpi` will be
provided which are analogous to `vgic_enable_irqs` and
`vgic_disable_irqs` but work for the LPI interface. (Alternatively,
refactoring the existing functions to work for all cases would be
acceptable too).

A `vLPI` which has not yet been enabled will automatically be queued, by
the existing vgic injection machinery, until a call to
`vgic_enable_lpi` is made (in response to a trapped access to the
virtual cfg table).

## LPI Configuration Table Virtualisation

A guest's write accesses to its LPI Configuration Table (which is just
an area of guest RAM which the guest has nominated) will be trapped to
the hypervisor, using stage 2 MMU permissions, in order for changes to
be propagated into the host interrupt configuration.

On write `bit[0]` of the written byte is the enable/disable state for
the irq and is handled thus, for each byte in the written value:

    lpi = lpi corresponding to byte offset (addr - table_base);

    pending_irq = irq_to_pending(lpi);
    pending_irq->priority = byte & 0xfc; /* XXX: or byte >> 2 */

    if ( byte & 0x1 )
        vgic_enable_lpi(current, lpi);
    else
        vgic_disable_lpi(current, lpi);

Note that physical interrupts are always configured with a priority of
`GIC_PRI_IRQ`, regardless of the priority of any virtual interrupt.
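
As an illustration only, the per-byte handling above could be applied to
a multi-byte trapped write as in the following C sketch. `struct
pending_irq_sketch` and `vlpi_cfg_write` are hypothetical names; the
sketch assumes index 0 of the per-domain array corresponds to LPI 8192
and records the enable bit directly instead of calling
`vgic_enable_lpi`/`vgic_disable_lpi`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, cut-down stand-in for Xen's struct pending_irq. */
struct pending_irq_sketch {
    uint8_t priority;
    bool enabled;
};

/*
 * Apply a trapped write of len bytes at byte offset off into the virtual
 * LPI configuration table. pending[] has nr_lpis entries and entry i is
 * assumed to describe LPI 8192 + i. Bytes which fall beyond the table are
 * ignored. Returns the number of bytes applied.
 */
static size_t vlpi_cfg_write(struct pending_irq_sketch *pending, size_t nr_lpis,
                             size_t off, const uint8_t *data, size_t len)
{
    size_t i;

    if ( off >= nr_lpis )
        return 0;
    if ( len > nr_lpis - off )
        len = nr_lpis - off;   /* clamp without risking overflow of off+len */

    for ( i = 0; i < len; i++ )
    {
        pending[off + i].priority = data[i] & 0xfc;      /* bits [7:2] */
        pending[off + i].enabled = (data[i] & 0x1) != 0; /* bit [0] */
    }

    return len;
}
```

The clamp is written as `len > nr_lpis - off` so that a guest-controlled
offset cannot wrap the bounds check.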

## LPI Pending Table Virtualisation

According to GIC spec 4.8.5 this table is not necessarily in sync and
the mechanism to force a sync is `IMPLEMENTATION DEFINED`, hence we
don't need to do anything.

## Device Table Virtualisation

The IPA, size and cacheability attributes of the guest device table
will be recorded in `struct domain` upon write to `GITS_BASER0`.

In order to lookup an entry for `device`:

    define {get,set}_vdevice_entry(domain, device, struct vdevice_table *entry):
        offset = device*sizeof(struct vdevice_table)
        if offset + sizeof(struct vdevice_table) > <DT size>: error

        dt_entry = <DT base IPA> + device*sizeof(struct vdevice_table)
        paddr = p2m_lookup(domain, dt_entry, p2m_ram)
        page = get_page_from_gfn(current->domain, paddr>>PAGE_SHIFT, &p2mt, P2M_ALLOC);
        if !page: error
        if !page_is_ram(p2mt): put_page(page); error;

        dt_mapping = map_domain_page(page)

        if (set)
             dt_mapping[<appropriate page offset from device>] = *entry;
        else
             *entry = dt_mapping[<appropriate page offset>];

        unmap_domain_page(dt_mapping)
        put_page(page)

Since everything is based upon IPA (guest addresses) a malicious guest
can only reference its own RAM here.
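
The range check is the part most easily got wrong, so here is a small
self-contained C sketch of it. An ordinary pointer stands in for the
mapped guest page, and `vdevice_entry_read` is a name invented for this
example; the p2m lookup and page reference counting are deliberately left
out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct vdevice_table {
    uint64_t vitt_ipa;
    uint32_t vitt_size;
    uint32_t padding;
};

/*
 * Copy entry number device out of a device table of dt_size bytes starting
 * at dt_base (a plain pointer standing in for mapped guest memory).
 * Returns 0 on success, -1 if the whole entry does not fit in the table.
 */
static int vdevice_entry_read(const uint8_t *dt_base, size_t dt_size,
                              uint32_t device, struct vdevice_table *entry)
{
    /* Only whole entries count; a trailing partial entry is not addressable. */
    size_t nr_entries = dt_size / sizeof(*entry);

    if ( device >= nr_entries )
        return -1;

    /* Copy into a private buffer so later checks see a stable snapshot. */
    memcpy(entry, dt_base + (size_t)device * sizeof(*entry), sizeof(*entry));

    return 0;
}
```

Comparing `device` against an entry count, rather than comparing a byte
offset against the size, avoids both the off-by-one at the end of the
table and any multiplication overflow.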

## ITT Virtualisation

The location of a vITT will have been recorded in the domain Device
Table by a `MAPD` command and is looked up as above.

The `vitt` is a `struct vitt`:

    struct vitt {
        uint16_t valid:1;
        uint16_t pad:15;
        uint16_t collection;
        uint32_t vlpi;
    };
    BUILD_BUG_ON(sizeof(struct vitt) != 8);

A lookup occurs similarly to that for a device table; the offset is
range checked against the `vitt_size` from the device table. To lookup
`event` on `device`:

    define {get,set}_vitt_entry(domain, device, event, struct vitt *entry):
        get_vdevice_entry(domain, device, &dt)

        offset = event*sizeof(struct vitt);
        if offset + sizeof(struct vitt) > dt.vitt_size: error

        vitt_entry = dt.vitt_ipa + event*sizeof(struct vitt)
        paddr = p2m_lookup(domain, vitt_entry, p2m_ram)
        page = get_page_from_gfn(current->domain, paddr>>PAGE_SHIFT, &p2mt, P2M_ALLOC);
        if !page: error
        if !page_is_ram(p2mt): put_page(page); error;

        vitt_mapping = map_domain_page(page)

        if (set)
             vitt_mapping[<appropriate page offset from event>] = *entry;
        else
             *entry = vitt_mapping[<appropriate page offset>];

        unmap_domain_page(vitt_mapping)
        put_page(page)

Again since this is IPA based a malicious guest can only point things
to its own ram.

## Collection Table Virtualisation

A pointer to a dynamically allocated array `its_collections` mapping
collection ID to vcpu ID will be added to `struct domain`. The array
shall have `nr_vcpus + 1` entries and resets to ~0 (or another
explicitly invalid vcpu nr).

## Virtual LPI injection

As discussed above the `vgic_vcpu_inject_irq` functionality will need
to be extended to cover this new case, most likely via a new
`vgic_vcpu_inject_lpi` frontend function. `vgic_vcpu_inject_irq` will
also require some refactoring to allow the priority to be passed in
from the caller (since `LPI` priority comes from the `LPI` CFG table,
while `SPI` and `PPI` priority is configured via other means).

`vgic_vcpu_inject_lpi` receives a `struct domain *` and a virtual
interrupt number (corresponding to a vLPI) and needs to figure out
which vcpu this should map to.

To do this it must look up the Collection ID associated (via the vITS)
with that LPI.

Proposal: Add a new `its_device` field to `struct irq_guest`, a
pointer to the associated `struct its_device`. The existing `struct
irq_guest.virq` field contains the event ID (perhaps use a `union`
to give a more appropriate name) and _not_ the virtual LPI. Injection
then consists of:

        d = irq_guest->domain
        virq = irq_guest->virq
        its_device = irq_guest->its_device

        get_vitt_entry(d, its_device->virt_device_id, virq, &vitt)
        vcpu = d->its_collections[vitt.collection]

        if !is_valid_lpi(vitt.vlpi): error

        vgic_vcpu_inject_lpi(&d->vcpus[vcpu], vitt.vlpi)

If the LPI is currently disabled then it will be queued by
`vgic_vcpu_inject_lpi` and injected in response to a subsequent
`vgic_enable_lpi` call.
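
To make the validation on this path concrete, the following C sketch
resolves an event to a (vcpu, vLPI) pair from guest-controlled data.
Everything except `struct vitt` is hypothetical: `resolve_injection` and
`struct inject_target` are invented names, plain arrays stand in for the
mapped vITT and for `its_collections`, and the LPI range check assumes
`nr_lpis` LPIs starting at 8192.

```c
#include <stddef.h>
#include <stdint.h>

#define FIRST_LPI 8192u

struct vitt {
    uint16_t valid:1;
    uint16_t pad:15;
    uint16_t collection;
    uint32_t vlpi;
};

/* Hypothetical result type: where to inject and which vLPI. */
struct inject_target {
    unsigned int vcpu;
    uint32_t vlpi;
};

/*
 * vitt/vitt_size: the device's vITT and its size in bytes.
 * collections: nr_vcpus + 1 entries mapping collection ID to vcpu ID.
 * Returns 0 and fills *out on success, -1 if any guest-supplied value
 * fails validation (in which case the interrupt would be dropped).
 */
static int resolve_injection(const struct vitt *vitt, size_t vitt_size,
                             uint32_t event, const unsigned int *collections,
                             unsigned int nr_vcpus, uint32_t nr_lpis,
                             struct inject_target *out)
{
    struct vitt e;
    unsigned int vcpu;

    if ( event >= vitt_size / sizeof(*vitt) )
        return -1;

    e = vitt[event];  /* snapshot: the guest may rewrite the table under us */

    if ( !e.valid || e.collection > nr_vcpus )
        return -1;

    vcpu = collections[e.collection];
    if ( vcpu >= nr_vcpus )  /* unmapped collections hold an invalid vcpu */
        return -1;

    if ( e.vlpi < FIRST_LPI || e.vlpi - FIRST_LPI >= nr_lpis )
        return -1;

    out->vcpu = vcpu;
    out->vlpi = e.vlpi;
    return 0;
}
```

Checking `valid` here goes slightly beyond the pseudocode above, which
only checks the vLPI; it is included because the entry comes from memory
the guest can modify at any time.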

## Command Queue Virtualisation

The command translation/emulation in this design has been arranged to
be as cheap as possible (e.g. in many cases the actions are NOPs),
avoiding previous concerns about the length of time which an emulated
write to a `CWRITER` register may block the vcpu.

The vits will simply track its reader and writer pointers. On write
to `CWRITER` it will immediately and synchronously process all
commands in the queue and update its state accordingly.

It might be possible to implement a rudimentary form of preemption by
periodically (as determined by `hypercall_preempt_check()`) returning
to the guest without incrementing PC but with updated internal
`CREADR` state, meaning it will reexecute the write to `CWRITER` and
we can pickup where we left off for another iteration. This at least
lets us schedule other vcpus etc and prevents a monopoly.
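
A rough C sketch of that loop follows. It is not the proposed
implementation: `struct vits_cmdq`, `vits_process` and `count_cmd` are
invented for illustration, a simple command budget stands in for
`hypercall_preempt_check()`, and validation of the guest-written
`CWRITER` value (alignment, range) is assumed to have happened already.

```c
#include <stddef.h>
#include <stdint.h>

#define ITS_CMD_SIZE 32u

/* Hypothetical per-vITS queue state. */
struct vits_cmdq {
    const uint8_t *base;  /* mapped queue memory (stand-in for guest RAM) */
    uint32_t size;        /* queue size in bytes, multiple of ITS_CMD_SIZE */
    uint32_t creadr;      /* byte offset of the next command to process */
    uint32_t cwriter;     /* byte offset just past the last queued command */
};

typedef void (*vits_cmd_handler)(const uint8_t *cmd, void *opaque);

/*
 * Process queued commands until the reader catches up with the writer or
 * budget commands have been handled. Returns 1 if commands remain (the
 * caller would return to the guest and let it re-execute the CWRITER
 * write), 0 if the queue has been drained.
 */
static int vits_process(struct vits_cmdq *q, unsigned int budget,
                        vits_cmd_handler handle, void *opaque)
{
    while ( q->creadr != q->cwriter )
    {
        if ( budget == 0 )
            return 1;
        budget--;

        handle(q->base + q->creadr, opaque);
        q->creadr = (q->creadr + ITS_CMD_SIZE) % q->size;  /* ring wrap */
    }

    return 0;
}

/* Example handler: count commands instead of emulating them. */
static void count_cmd(const uint8_t *cmd, void *opaque)
{
    (void)cmd;
    ++*(unsigned int *)opaque;
}
```

Because `creadr` is updated after every command, resuming after a
preemption needs no extra state beyond what the vITS already tracks.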

## ITS Command Translation

This section is based on the section 5.13 of GICv3 specification
(PRD03-GENC-010745 24.0) and provides concrete ideas of how this can
be interpreted for Xen.

The ITS provides 12 commands in order to manage interrupt collections,
devices and interrupts. Possible command parameters are:

- Device ID (`Device`) (called `device` in the spec).
- Event ID (`Event`) (called `ID` in the spec). This is an index into
  a device's `ITT`.
- Collection ID (`Collection`) (called `collection` in the spec)
- LPI ID (`LPI`) (called `pID` in the spec)
- Target Address (`TA`) (called `TA` in the spec)

These parameters need to be validated and translated from Virtual (`v`
prefix) to Physical (`p` prefix).

Note, we differ from the naming in the GIC spec for clarity, in
particular we use `Event` not `ID` and `LPI` not `pID` to reduce
confusion, especially when `v` and `p` prefixes are used due to
virtualisation.

### Parameter Validation / Translation

Each command contains parameters that need to be validated before any
usage in Xen or passing to the hardware.

#### Device ID (`Device`)

The corresponding ITT is obtained by a lookup as described above.

The physical `struct its_device` can be found by looking up in the
domain's device map.

If lookup fails or the resulting device table entry is invalid then
the Device is invalid.

#### Event ID (`Event`)

Validated against emulated `GITS_TYPER.IDbits`.

It is not necessary to translate a `vEvent`.

#### LPI (`LPI`)

Validated against emulated `GITS_TYPER.IDbits`.

It is not necessary to translate a `vLPI` into a `pLPI` since the
tables all contain `vLPI`. (Translation from `pLPI` to `vLPI` happens
via `struct irq_guest` when we receive the IRQ).

#### Interrupt Collection (`Collection`)

The `Collection` is validated against the size of the per-domain
`its_collections` array (i.e. nr_vcpus + 1) and then translated by a
simple lookup in that array.

     vcpu_nr = d->its_collections[Collection]

A result >= `nr_vcpus` is invalid.

#### Target Address (`TA`)

This parameter is used in commands which manage collections. It is a
unique identifier per processor.

We have chosen to implement `GITS_TYPER.PTA` as 0, hence `vTA` simply
corresponds to the `vcpu_id`, so only needs bounds checking against
`nr_vcpus`.
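
The Collection and Target Address checks can be sketched together in C.
This is an illustration under assumptions: `struct vits_collections` and
the three functions are hypothetical, `malloc` stands in for Xen's
allocator, and `~0u` is used as the explicitly invalid vcpu number.

```c
#include <stdint.h>
#include <stdlib.h>

#define VITS_INVALID_VCPU (~0u)

/* Hypothetical wrapper around the per-domain its_collections array. */
struct vits_collections {
    unsigned int *map;      /* collection ID -> vcpu ID */
    unsigned int nr_vcpus;  /* map[] has nr_vcpus + 1 entries */
};

/* Allocate the array with every collection initially unmapped. */
static int vits_collections_init(struct vits_collections *c,
                                 unsigned int nr_vcpus)
{
    unsigned int i;

    c->map = malloc(((size_t)nr_vcpus + 1) * sizeof(*c->map));
    if ( c->map == NULL )
        return -1;

    c->nr_vcpus = nr_vcpus;
    for ( i = 0; i <= nr_vcpus; i++ )
        c->map[i] = VITS_INVALID_VCPU;

    return 0;
}

/* Record collection -> vTA after bounds checking both parameters. */
static int vits_collection_map(struct vits_collections *c,
                               unsigned int collection, unsigned int vta)
{
    if ( collection > c->nr_vcpus || vta >= c->nr_vcpus )
        return -1;

    c->map[collection] = vta;
    return 0;
}

/* Translate a Collection to a vcpu number; fails if out of range or unmapped. */
static int vits_collection_to_vcpu(const struct vits_collections *c,
                                   unsigned int collection, unsigned int *vcpu)
{
    if ( collection > c->nr_vcpus || c->map[collection] >= c->nr_vcpus )
        return -1;

    *vcpu = c->map[collection];
    return 0;
}
```

Since `~0u` is never smaller than `nr_vcpus`, the single comparison in
the lookup rejects unmapped collections as well as stale or corrupt
entries.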

### Commands

To be read with reference to spec for each command (which includes
error checks etc which are omitted here).

It is assumed that inputs will be bounds and validity checked as
described above, thus error handling is omitted for brevity (i.e. if
get and/or set fail then so be it). In general invalid commands are
simply ignored.

#### `MAPD`: Map a physical device to an ITT.

_Format_: `MAPD Device, Valid, ITT Address, ITT Size`.

_Spec_: 5.13.11

`MAPD` is sent with the `Valid` bit set if the mapping is to be added and
clear if the mapping is to be removed.

When the `Valid` bit is set the range `ITT Address` to `ITT
Address` + `ITT Size` need not be validated here, since this is done in
`{get,set}_vdevice_entry` when calling the `p2m_lookup`
function. Validating the memory at `MAPD` time would serve no purpose
since the guest could subsequently balloon it out or grant map over it etc.

The domain's device table is updated with the provided information.

The `vitt_mapd` field is set according to the `Valid` flag in the
command:

    dt_entry.vitt_mapd = Valid
    dt_entry.vitt_ipa = ITT Address
    dt_entry.vitt_size = ITT Size
    set_vdevice_entry(current->domain, Device, &dt_entry)
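
As a rough illustration, the `MAPD` emulation could look like the sketch below. The entry layout, `struct vdevice_table` and `vits_mapd` are hypothetical, and `set_vdevice_entry` here is a stand-in that writes to a plain array rather than to guest memory via `p2m_lookup`. In keeping with the text, the ITT range itself is not validated at this point.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout of one virtual device table entry. */
struct vdevice_entry {
    uint64_t vitt_ipa;   /* guest address of the ITT */
    uint32_t vitt_size;  /* ITT Size as supplied by the guest */
    bool vitt_mapd;      /* mirrors the Valid flag of the last MAPD */
};

/* Stand-in for the guest-provided device table. */
struct vdevice_table {
    struct vdevice_entry *entries;
    uint32_t nr_entries;
};

/* Stand-in for set_vdevice_entry(): bounds check, then store the entry. */
int set_vdevice_entry(struct vdevice_table *dt, uint32_t device,
                      const struct vdevice_entry *e)
{
    if (device >= dt->nr_entries)
        return -1;
    dt->entries[device] = *e;
    return 0;
}

/* Emulate MAPD: record where the ITT lives and whether it is mapped. */
int vits_mapd(struct vdevice_table *dt, uint32_t device, bool valid,
              uint64_t itt_addr, uint32_t itt_size)
{
    struct vdevice_entry e = {
        .vitt_ipa = itt_addr,
        .vitt_size = itt_size,
        .vitt_mapd = valid,
    };

    return set_vdevice_entry(dt, device, &e);
}
```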

#### `MAPC`: Map an interrupt collection to a target processor

_Format_: `MAPC Collection, TA`

_Spec_: 5.13.12

The updated `vTA` (a vcpu number) is recorded in the `its_collections`
array of the domain struct:

    d->its_collections[Collection] = TA
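
Combined with the bounds checks described under _Parameter Validation / Translation_, `MAPC` might be sketched as follows. `struct vits_domain` and `vits_mapc` are hypothetical names, and returning -1 stands in for "the command is ignored".

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for the per-domain vITS state. */
struct vits_domain {
    unsigned int nr_vcpus;
    uint16_t *its_collections;  /* nr_vcpus + 1 entries, each a vcpu number */
};

/* Emulate MAPC: point a virtual Collection at a vcpu. */
int vits_mapc(struct vits_domain *d, uint32_t collection, uint32_t vta)
{
    /* The Collection must index into the nr_vcpus + 1 entry array. */
    if (collection >= d->nr_vcpus + 1)
        return -1;

    /* With GITS_TYPER.PTA == 0 the vTA is simply a vcpu_id. */
    if (vta >= d->nr_vcpus)
        return -1;

    d->its_collections[collection] = (uint16_t)vta;
    return 0;
}
```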

#### `MAPI`: Map an interrupt to an interrupt collection.

_Format_: `MAPI Device, LPI, Collection`

_Spec_: 5.13.13

After validation:

    vitt.valid = True
    vitt.collection = Collection
    vitt.vlpi = LPI
    set_vitt_entry(current->domain, Device, LPI, &vitt)

#### `MAPVI`: Map an input identifier to a physical interrupt and an interrupt collection.

_Format_: `MAPVI Device, Event, LPI, Collection`

    vitt.valid = True
    vitt.collection = Collection
    vitt.vlpi = LPI
    set_vitt_entry(current->domain, Device, Event, &vitt)
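
`MAPI` and `MAPVI` differ only in which value indexes the ITT, which the sketch below tries to make explicit. The `struct vitt` layout is the one used for the virtual ITT in this draft; `struct vitt_table`, the array-backed `set_vitt_entry` and the `vits_*` names are hypothetical simplifications (the real accessor takes a domain and Device and goes through the guest's memory).

```c
#include <stdint.h>

/* Virtual ITT entry layout used in this draft. */
struct vitt {
    uint16_t valid:1;
    uint16_t pad:15;
    uint16_t collection;
    uint32_t vlpi;
};

/* Hypothetical stand-in for one device's ITT once located and mapped. */
struct vitt_table {
    struct vitt *entries;
    uint32_t nr_entries;
};

/* Stand-in for set_vitt_entry(): bounds check, then store the entry. */
int set_vitt_entry(struct vitt_table *t, uint32_t event, const struct vitt *e)
{
    if (event >= t->nr_entries)
        return -1;
    t->entries[event] = *e;
    return 0;
}

/* Emulate MAPVI: map (Device, Event) to a vLPI and a virtual Collection. */
int vits_mapvi(struct vitt_table *t, uint32_t event, uint32_t lpi,
               uint16_t collection)
{
    struct vitt e = { .valid = 1, .collection = collection, .vlpi = lpi };

    return set_vitt_entry(t, event, &e);
}

/* MAPI is MAPVI with the LPI doubling as the Event. */
int vits_mapi(struct vitt_table *t, uint32_t lpi, uint16_t collection)
{
    return vits_mapvi(t, lpi, lpi, collection);
}
```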

#### `MOVI`: Redirect interrupt to an interrupt collection

_Format_: `MOVI Device, Event, Collection`

_Spec_: 5.13.15

    get_vitt_entry(current->domain, Device, Event, &vitt)
    vitt.collection = Collection
    set_vitt_entry(current->domain, Device, Event, &vitt)

    XXX consider helper which sets field without mapping/unmapping
    twice.

This command is supposed to move any pending interrupts associated
with `Event` to the vcpu implied by the new `Collection`, which is
tricky. For now we ignore this requirement (as we do for
`GICD_IROUTERn` and `GICD_ITARGETSRn` for other interrupt types).

#### `DISCARD`: Discard interrupt requests

_Format_: `DISCARD Device, Event`

_Spec_: 5.13.16

    get_vitt_entry(current->domain, Device, Event, &vitt)
    vitt.valid = False
    set_vitt_entry(current->domain, Device, Event, &vitt)

    XXX consider helper which sets field without mapping/unmapping
    twice.

This command is supposed to clear the pending state of any associated
interrupt. This requirement is ignored (guest may see a spurious
interrupt).
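
Both `MOVI` and `DISCARD` are read-modify-write operations on a single entry, so the helper suggested in the XXX notes could hand back a pointer to the entry and let the caller update one field in place. The sketch below assumes a hypothetical array-backed `struct vitt_table`; in Xen the pointer helper is where the ITT page would be looked up and mapped once. Rejecting entries which are not valid is also an assumption, in line with invalid commands being ignored.

```c
#include <stddef.h>
#include <stdint.h>

/* Virtual ITT entry layout used in this draft. */
struct vitt {
    uint16_t valid:1;
    uint16_t pad:15;
    uint16_t collection;
    uint32_t vlpi;
};

/* Hypothetical stand-in for one device's ITT once located and mapped. */
struct vitt_table {
    struct vitt *entries;
    uint32_t nr_entries;
};

/* Return a pointer to the entry for `event`, or NULL if out of range. */
struct vitt *vitt_entry_ptr(struct vitt_table *t, uint32_t event)
{
    return event < t->nr_entries ? &t->entries[event] : NULL;
}

/* Emulate MOVI: retarget a mapped Event to another virtual Collection. */
int vits_movi(struct vitt_table *t, uint32_t event, uint16_t collection)
{
    struct vitt *e = vitt_entry_ptr(t, event);

    if (e == NULL || !e->valid)
        return -1;
    e->collection = collection;
    return 0;
}

/* Emulate DISCARD: drop the mapping for an Event. */
int vits_discard(struct vitt_table *t, uint32_t event)
{
    struct vitt *e = vitt_entry_ptr(t, event);

    if (e == NULL || !e->valid)
        return -1;
    e->valid = 0;
    return 0;
}
```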

#### `INV`: Clean any caches associated with interrupt

_Format_: `INV Device, Event`

_Spec_: 5.13.17

Since LPI Configuration table updates are handled synchronously, there
is nothing to do here.

#### `INVALL`: Clean any caches associated with an interrupt collection

_Format_: `INVALL Collection`

_Spec_: 5.13.19

Since LPI Configuration table updates are handled synchronously, there
is nothing to do here.

#### `INT`: Generate an interrupt

_Format_: `INT Device, Event`

_Spec_: 5.13.20

The `vitt` entry corresponding to `Device,Event` is looked up and then:

    get_vitt_entry(current->domain, Device, Event, &vitt)
    vgic_vcpu_inject_lpi(current->domain, vitt.vlpi)

_Note_: Where (Device,Event) is real, this may need consideration of
interactions with real LPIs being delivered: Julien had concerns about
Xen's internal IRQ state tracking. If this is a problem then it may need
changes to IRQ state tracking, or to inject as a real IRQ and let
physical IRQ injection handle it, or write to `GICR_SETLPIR`?

#### `CLEAR`: Clear the pending state of an interrupt

_Format_: `CLEAR Device, Event`

_Spec_: 5.13.21

Should clear pending state of LPI. Ignore (guest may see a spurious
interrupt).

#### `SYNC`: Wait for completion of any outstanding ITS actions for collection

_Format_: `SYNC TA`

_Spec_: 5.13.22

This command can be ignored.

# GICv4 Direct Interrupt Injection

GICv4 will directly mark the LPIs pending in the virtual pending table
which is per-redistributor (i.e. per-vCPU).

LPIs will be received by the guest the same way as SPIs, i.e. trap in
IRQ mode then read `ICC_IAR1_EL1` (for GICv3).

Therefore GICv4 will not require one vITS per pITS.

# Event Channels

It has been proposed that it might be nice to inject event channels as
LPIs in the future. Whether or not that would involve any sort of vITS
is unclear, but if it did then it would likely be a separate emulation
to the vITS emulation used with a pITS and as such is not considered
further here.

# Glossary

* _MSI_: Message Signalled Interrupt
* _ITS_: Interrupt Translation Service
* _GIC_: Generic Interrupt Controller
* _LPI_: Locality-specific Peripheral Interrupt

# References

"GIC Architecture Specification" PRD03-GENC-010745 24.0.

"IO Remapping Table System Software on ARM® Platforms" ARM DEN 0049A.


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


* Re: [Draft F] Xen on ARM vITS Handling
  2015-06-11  9:40 [Draft F] Xen on ARM vITS Handling Ian Campbell
@ 2015-06-11 12:02 ` Ian Campbell
  2015-06-12  8:37 ` Vijay Kilari
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 18+ messages in thread
From: Ian Campbell @ 2015-06-11 12:02 UTC (permalink / raw)
  To: xen-devel; +Cc: manish.jaggi, Julien Grall, Stefano Stabellini, Vijay Kilari

On Thu, 2015-06-11 at 10:40 +0100, Ian Campbell wrote:
> Here's a quick update based on feedback prior to meeting on #xenarm at
> 12:00AM BST / 7:00AM EDT / 4:30PM IST (which is ~1:20 from now)

Here is the log.

(12:02:38) ijc: VK: So, are you happy that the design doc is something which could be implemented?
(12:03:26) VK: ijc: I have some doubts as conclusion is not done in some cases or I have missed to follow up
(12:03:53) ijc: OK, shall we go through them then?
(12:04:12) ijc: I'll be working from the Draft F I sent out an hour ago
(12:04:24) VK: ijc: I have listed down queries and go through topic wise as per draft E
(12:04:44) ijc: ok
(12:05:59) VK: ijc: 2.2.8 - Xen will use completion INT mechanism and trigger softIRQ for scheduling.
(12:05:59) VK:    and one completion INT per domain is allocated for mapping completion INT to domain's
(12:05:59) VK:    vITS. OK?
(12:06:41) ijc: VK: Why is that needed? AFAICT the pITS driver can either poll or use a single host wide completion interrupt
(12:06:52) ijc: From the PoV of vits we don't care how the pits driver gets completions I think
(12:07:01) ijc: Or at least this design does not require it 
(12:08:07) ijc: There is no softirq nor ITS scheduling in draft E, so I don't think it is needed, do you understand differently?
(12:08:58) VK: ijc:  As there is no info about ITS scheduling in draft E, I want some clarification.
(12:09:29) ijc: VK: Everything is done synchronously in the GITS_CWRITER handler
(12:09:48) ijc: THings have been arranged such that the commands are all cheap enough to do this
(12:10:16) ijc: The section "Command Queue Virtualisation" covers this I think
(12:10:29) ijc: 7.11 in draft E
(12:10:41) VK: ijc:  Ok. then it is same as what I have done RFC v2 patch
(12:10:49) ijc: and 7.14 in draft F
(12:11:21) ijc: VK: At least that aspect may be I think, I don't know though.
(12:11:56) VK: ijc: OK
(12:12:28) julieng: VK: At the difference that there is no physical command send to the ITS
(12:13:00) julieng: neither allocation
(12:13:39) ijc: julieng: Right, the intention was to do as much stuff at setup or assignment time such that the command processing was cheap
(12:15:10) VK: ijc:  But VCPU still polls right and for that you have proposed rudimentary form of preemption
(12:15:55) VK: ijc: in 7.14 in draft F. I am not aware of this any guidance on this?
(12:16:01) ijc: VK I think the polling and the preemption are unrelated. The write to GITS_CWRITER is processed synchronously, so the vcpu cannot be polling then, I think?
(12:16:26) ijc: The premption thing is an optional extension to consider to allow that synchronous processing to be split up e.g. to allow other vcpus to run
(12:17:22) ijc: If other vcpus are reading GITS_CREADR then I suppose we would want them to see progress, i.e. by updating the internal CREADR stepwise rather than all at once at the end.
(12:18:36) VK: ijc: but when VCPU posts command on CWRITER write,  then VCPU polls for completion in pITS driver
(12:19:04) julieng: VK: There is no command send to the physical ITS.
(12:19:20) ijc: If a command generates a request to the generic code which results in a call to the pits driver then it is up to the pits driver how to deal with that and polling would be a valid response
(12:19:27) ijc: s/response/way to implement that/
(12:19:52) ijc: VK: I've deliberately decupled the vits and pits here (via the abstraction of the generic code) so that from a vits PoV you are not required to worry about it
(12:20:22) julieng: ijc: AFAICT, there no command requiring physical command anymore
(12:20:37) ijc: julieng: that would be even better ;-)
(12:20:44) ijc: and I think you are right
(12:21:01) julieng: If not, this would be a concern as a guest would be able to block a pCPU for a while.
(12:21:20) ijc: (I wouldn't be too worried about that in the end, but it is moot anyway)
(12:22:43) ijc: VK: Does that resolve your concern?
(12:23:54) VK: ijc: I am not getting it here. How vITS command does _not_ translate to physical ITS command
(12:24:13) ijc: VK: Everything is setup at start of day, so there is nothing to do during vits command processing
(12:24:36) ijc: Look through 7.15.2.* and you should see no calls to anything which interacts with the physical its
(12:25:28) ijc: (NB: 7.15.2.7 and .8 should read "Since LPI Configuration table updates are handled synchronously, there
(12:25:28) ijc: is nothing to do here." in Draft F, I missed updating them
(12:32:22) VK: ijc:  when you say it set up start of the day. you mean the guest Device Table and ITT table is directly updated by Xen instead of sending physical ITS command?
(12:32:59) ijc: VK: http://xenbits.xen.org/people/ianc/vits/draftF.html#device-discoveryregistration-and-configuration
(12:33:11) ijc: and the following section "6 Device Assignment"
(12:33:28) ijc: All of the events are routed to pLPIs during setup (either xen boot or during device assignment)
(12:37:36) VK: ijc: OK, on PHYSDEVOPS_pci_assign_device, MAPD & MAPVI commands required for this device is sent for all events of this device
(12:38:24) ijc: VK: According to 5.5 that happens upon discovery/registration, i.e. pci_device_add, rather than during assign.
(12:38:53) ijc: Since the physical ITT mapping doesn't depend on the specific domain I don't think it needs to be deferred
(12:39:29) VK: ijc: if so then only routing of interrupts will be changed to assigned domain on device assignment right?
(12:40:02) ijc: right. the (Device,Event)=>(pLPI) mapping is always there. On assignment what changes is what Xen does with the pLPI
(12:41:32) VK: ijc: you have also mentioned "Events will be assigned to physical collections in a round-robin fashion" . why?. round-robin is chosen just for distributing event fairly?
(12:42:50) ijc: VK: It was arbitrary, but better than "all to collection 0" or something
(12:45:11) VK: ijc:  next is on 7.11 in draft F ( ITT Vritualisation)
(12:45:19) VK: struct vitt {uint16_t valid:1;uint16_t pad:15;uint16_t collection;uint32_t vlpi;}
(12:46:19) VK: ijc: Is it ok to store pLPI and virtual collection in vitt?. Because this helps to easily map vLPI to pLPI
(12:47:15) ijc: There is no 1:1 map from vLPI to pLPI, so no. What need do you foresee for this mapping?
(12:47:46) ijc: the collection in vitt is already virtual
(12:48:18) ijc: s/collection/vcollection/ done on that struct and the uses
(12:50:20) VK: ijc: I think (Device, vID) is mapped to pLPI
(12:51:05) ijc: VK: Where?
(12:53:41) VK: ijc: OK. because in this design, we are generated pLPI based on (Device, Event), pLPI is not mapped to vLPI.
(12:53:53) ijc: Right
(12:54:38) ijc: I'm quite likely to get preempted by another thing shortly after 1pm BST (i.e. between 6 and 15 mins from now).
(12:54:43) ijc: Is there anything else we need to cover?
(12:55:43) VK: ijc: now vLPI and pLPI is mapped using Event
(12:56:07) ijc: For draftG I've got updates for 7.15.2.* mentioned above, a change to vitt to contain vcollection not collection and I need to update 7.14 ("Command Queue Virt") to consider multiple vcpus all pounding GITS_CREADR/CWRITER in parallel and how that should work (which I need to think about a bit)
(12:56:43) ijc: VK: mapped using Event> I'm not sure I follow, or was that just finishing your previous thought?
(12:57:06) ijc: Shall I post minutes (i.e. the IRC log) to xen-devel? VK and julieng are you OK with that?
(12:58:21) julieng: ijc: I'm fine with that. Thanks
(12:58:22) VK: ijc:  OK. I will ping you whenever I have some queries tomorrow.


* Re: [Draft F] Xen on ARM vITS Handling
  2015-06-11  9:40 [Draft F] Xen on ARM vITS Handling Ian Campbell
  2015-06-11 12:02 ` Ian Campbell
@ 2015-06-12  8:37 ` Vijay Kilari
  2015-06-12  8:52   ` Ian Campbell
  2015-06-12 12:55 ` Ian Campbell
  2015-06-16 14:50 ` Vijay Kilari
  3 siblings, 1 reply; 18+ messages in thread
From: Vijay Kilari @ 2015-06-12  8:37 UTC (permalink / raw)
  To: Ian Campbell; +Cc: manish.jaggi, Julien Grall, Stefano Stabellini, xen-devel

On Thu, Jun 11, 2015 at 3:10 PM, Ian Campbell <ian.campbell@citrix.com> wrote:
> Draft F follows. Also at:
> http://xenbits.xen.org/people/ianc/vits/draftF.{pdf,html}
>
> Here's a quick update based on feedback prior to meeting on #xenarm at
> 12:00AM BST / 7:00AM EDT / 4:30PM IST (which is ~1:20 from now)
>
> Ian.
>
> % Xen on ARM vITS Handling
> % Ian Campbell <ian.campbell@citrix.com>
> % Draft F
>
> # Changelog
>
> ## Since Draft E
>
> * Discussion of `struct pending_irq`
> * Fix handling of enable/disable, requiring switching back to trapping
>   the virtual cfg table again. get_vlpi_cfg is no longer needed.
> * Fix p2m_lookup to also use get_page_from_gfn.
>
> ## Since Draft D
>
> * Fixed assumptions about vLPI->pLPI mapping, which is not
>   possible. This led to changes to the model for enabling and
>   disabling pLPI and vLPI and the handling of the virtual LPI
>   configuration table, resolving _Unresolved Issue 1_.
> * Made the pLPI and vLPI interrupt priorities explicit.
> * Attempted to clarify the trust issues regarding in-guest data
>   structures.
> * Mandate a particular cacheability for tables in guest memory.
>
> ## Since Draft C
>
> * _Major_ rework, in an attempt to simplify everything into something
>   more likely to be achievable for 4.6.
>     * Made some simplifying assumptions.
>     * Reduced the scope of some support.
>     * Command emulation is now mostly trivial.
>     * Expanded detail on host setup, allowing other assumptions to be
>       made during emulation.
> * Many other things lost in the noise of the above.
>
> ## Since Draft B
>
> * Details of command translation (thanks to Julien and Vijay)
> * Added background on LPI Translation and Pending tables
> * Added background on Collections
> * Settled on `N:N` scheme for vITS:pITS mapping.
> * Rejigged section nesting a bit.
> * Since we now think translation should be cheap, settle on
>   translation at scheduling time.
> * Lazy `INVALL` and `SYNC`
>
> ## Since Draft A
>
> * Added discussion of when/where command translation occurs.
> * Contention on scheduler lock, suggestion to use SOFTIRQ.
> * Handling of domain shutdown.
> * More detailed discussion of multiple vs single vits pros/cons.
>
> # Introduction
>
> ARM systems containing a GIC version 3 or later may contain one or
> more ITS logical blocks. An ITS is used to route Message Signalled
> interrupts from devices into an LPI injection on the processor.
>
> The following summarises the ITS hardware design and serves as a set
> of assumptions for the vITS software design. For full details of the
> ITS see the "GIC Architecture Specification".
>
> ## Locality-specific Peripheral Interrupts (`LPI`)
>
> This is a new class of message signalled interrupts introduced in
> GICv3. They occupy the interrupt ID space from `8192..(2^32)-1`.
>
> The number of LPIs supported by an ITS is exposed via
> `GITS_TYPER.IDbits` (as number of bits - 1), it may be up to
> 2^32. _Note_: This field also contains the number of Event IDs
> supported by the ITS.
>
> ### LPI Configuration Table
>
> Each LPI has an associated configuration byte in the LPI Configuration
> Table (managed via the GIC Redistributor and placed at
> `GICR_PROPBASER` or `GICR_VPROPBASER`). This byte configures:
>
> * The LPI's priority;
> * Whether the LPI is enabled or disabled.
>
> Software updates the Configuration Table directly but must then issue
> an invalidate command (per-device `INV` ITS command, global `INVALL`
> ITS command or write `GICR_INVLPIR`) for the effect to be guaranteed
> to become visible (possibly requiring an ITS `SYNC` command to ensure
> completion of the `INV` or `INVALL`). Note that it is valid for an
> implementation to reread the configuration table at any time (IOW it
> is _not_ guaranteed that a change to the LPI Configuration Table won't
> be visible until an invalidate is issued).
>
> ### LPI Pending Table
>
> Each LPI also has an associated bit in the LPI Pending Table (managed
> by the GIC redistributor). This bit signals whether the LPI is pending
> or not.
>
> This region may contain out of date information and the mechanism to
> synchronise is `IMPLEMENTATION DEFINED`.
>
> ## Interrupt Translation Service (`ITS`)
>
> ### Device Identifiers
>
> Each device using the ITS is associated with a unique "Device
> Identifier".
>
> The device IDs are properties of the implementation and are typically
> described via system firmware, e.g. the ACPI IORT table or via device
> tree.
>
> The number of device ids in a system depends on the implementation and
> can be discovered via `GITS_TYPER.Devbits`. This field allows an ITS
> to have up to 2^32 devices.
>
> ### Events
>
> Each device can generate "Events" (called `ID` in the spec) these
> correspond to possible interrupt sources in the device (e.g. MSI
> offset).
>
> The maximum number of interrupt sources is device specific. It is
> usually discovered either from firmware tables (e.g. DT or ACPI) or
> from bus specific mechanisms (e.g. PCI config space).
>
> The maximum number of event IDs supported by an ITS is exposed via
> `GITS_TYPER.IDbits` (as number of bits - 1), it may be up to
> 2^32. _Note_: This field also contains the number of `LPIs` supported
> by the ITS.
>
> ### Interrupt Collections
>
> Each interrupt is a member of an "Interrupt Collection". This allows
> software to manage large numbers of physical interrupts with a small
> number of commands rather than issuing one command per interrupt.
>
> On a system with N processors, the ITS must provide at least N+1
> collections.
>
> An ITS may support some number of internal collections (indicated by
> `GITS_TYPER.HCC`) and external ones which require memory provisioned
> by the Operating System via a `GITS_BASERn` register.
>
> ### Target Addresses
>
> The Target Address corresponds to a specific GIC re-distributor. The
> format of this field depends on the value of the `GITS_TYPER.PTA` bit:
>
> * 1: the base address of the re-distributor target is used
> * 0: a unique processor number is used. The mapping between the
>   processor affinity value (`MPIDR`) and the processor number is
>   discoverable via `GICR_TYPER.ProcessorNumber`.
>
> This value is up to the ITS implementer (`GITS_TYPER` is a read-only
> register).
>
> ### Device Table
>
> A Device Table is configured in each ITS which maps incoming device
> identifiers into an ITS Interrupt Translation Table.
>
> ### Interrupt Translation Table (`ITT`) and Collection Table
>
> An `Event` generated by a `Device` is translated into an `LPI` via a
> per-Device Interrupt Translation Table. The structure of this table is
> described in GIC Spec 4.9.12.
>
> The ITS translation table maps the device id of the originating device
> into a physical interrupt (`LPI`) and an Interrupt Collection.
>
> The Collection is in turn looked up in the Collection Table to produce
> a Target Address, indicating a redistributor (AKA CPU) to which the
> LPI is delivered.
>
> ### OS Provisioned Memory Regions
>
> The ITS hardware design provides mechanisms for an ITS to be provided
> with various blocks of memory by the OS for ITS internal use. These
> include the per-device ITT (established with `MAPD`) and memory
> regions for Device Tables, Virtual Processors and Interrupt
> Collections. Up to 8 such regions can be requested by the ITS and
> provisioned by the OS via the `GITS_BASERn` registers.
>
> ### ITS Configuration
>
> The ITS is configured and managed, including establishing and
> configuring the Translation Tables and Collection Table, via an in
> memory ring shared between the CPU and the ITS controller. The ring is
> managed via the `GITS_CBASER` register and indexed by `GITS_CWRITER`
> and `GITS_CREADR` registers.
>
> A processor adds commands to the shared ring and then updates
> `GITS_CWRITER` to make them visible to the ITS controller.
>
> The ITS controller processes commands from the ring and then updates
> `GITS_CREADR` to indicate to the processor that the command has been
> processed.
>
> Commands are processed sequentially.
>
> Commands sent on the ring include operational commands:
>
> * Routing interrupts to processors;
> * Generating interrupts;
> * Clearing the pending state of interrupts;
> * Synchronising the command queue
>
> and maintenance commands:
>
> * Map device/collection/processor;
> * Map virtual interrupt;
> * Clean interrupts;
> * Discard interrupts;
>
> The field `GITS_CBASER.Size` encodes the number of 4KB pages minus 1
> comprising the command queue. This field is 8 bits which means the
> maximum size is 2^8 * 4KB = 1MB. Given that each command is 32 bytes,
> there is a maximum of 32768 commands in the queue.
>
> The ITS provides no specific completion notification
> mechanism. Completion is monitored by a combination of a `SYNC`
> command and either polling `GITS_CREADR` or notification via an
> interrupt generated via the `INT` command.
>
> Note that the interrupt generation via `INT` requires an originating
> device ID to be supplied (which is then translated via the ITS into an
> LPI). No specific device ID is defined for this purpose and so the OS
> software is expected to fabricate one.
>
> Possible ways of inventing such a device ID are:
>
> * Enumerate all device ids in the system and pick another one;
> * Use a PCI BDF associated with a non-existent device function (such
>   as an unused one relating to the PCI root-bridge) and translate that
>   (via firmware tables) into a suitable device id;
> * ???
>
> # LPI Handling in Xen
>
> ## IRQ descriptors
>
> Currently all SGI/PPI/SPI interrupts are covered by a single static
> array of `struct irq_desc` with ~1024 entries (the maximum interrupt
> number in that set of interrupt types).
>
> The addition of LPIs in GICv3 means that the largest potential
> interrupt specifier is much larger.
>
> Therefore a second dynamically allocated array will be added to cover
> the range `8192..nr_lpis`. The `irq_to_desc` function will determine
> which array to use (static `0..1024` or dynamic `8192..end` lpi desc
> array) based on the input irq number. Two arrays are used to avoid a
> wasteful allocation covering the (unused/unusable) `1024..8191` range.
>
> ## Virtual LPI interrupt injection
>
> A physical interrupt which is routed to a guest vCPU has the
> `_IRQ_GUEST` flag set in the `irq_desc` status mask. Such interrupts
> have an associated instance of `struct irq_guest` which contains the
> target `struct domain` pointer and virtual interrupt number.
>
> In Xen a virtual interrupt (either arising from a physical interrupt
> or completely virtual) is ultimately injected to a VCPU using the
> `vgic_vcpu_inject_irq` function, or `vgic_vcpu_inject_lpi`.
>
> This mechanism will likely need updating to handle the injection of
> virtual LPIs. In particular rather than `GICD_ITARGETSRn` or
> `GICD_IROUTERn` routing of LPIs is performed via the ITS collections
> mechanism. This is discussed below (In _vITS_:_Virtual LPI injection_).
>
> # Scope
>
> The ITS is rather complicated, especially when combined with
> virtualisation. To simplify things we initially omit the following
> functionality:
>
> - Interrupt -> vCPU -> pCPU affinity. The management of physical vs
>   virtual Collections is a feature of GICv4, thus is omitted in this
>   design for GICv3. Physical interrupts which occur on a pCPU where
>   the target vCPU is not already resident will be forwarded (via IPI)
>   to the correct pCPU for injection via the existing
>   `vgic_vcpu_inject_irq` mechanism (extended to handle LPI injection
>   correctly).
> - Clearing of the pending state of an LPI under various circumstances
>   (`MOVI`, `DISCARD`, `CLEAR` commands) is not done. This will result
>   in guests seeing some perhaps spurious interrupts.
> - vITS functionality will only be available on 64-bit ARM hosts,
>   avoiding the need to worry about fast access to guest owned data
>   structures (64-bit uses a direct map). (NB: 32-bit guests on 64-bit
>   hosts can be considered to have access)
>
> # pITS
>
> ## Assumptions
>
> It is assumed that `GITS_TYPER.IDbits` is large enough that there are
> sufficient LPIs available to cover the sum of the number of possible
> events generated by each device in the system (that is the sum of the
> actual events for each bit of hardware, rather than the notional
> per-device maximum from `GITS_TYPER.IDbits`).
>
> This assumption avoids the need to do memory allocations and interrupt
> routing at run time, e.g. during command processing by allowing us to
> setup everything up front.
>
> ## Driver
>
> The physical driver will provide functions for enabling, disabling
> routing etc a specified interrupt, via the usual Xen APIs for doing
> such things.
>
> This will likely involve interacting with the physical ITS command
> queue etc. In this document such interactions are considered internal
> to the driver (i.e. we care that the API to enable an interrupt
> exists, not how it is implemented).
>
> The physical ITS will be provisioned with whatever tables it requests
> via its `GITS_BASERn` registers.
>
> ## Collections
>
> The `pITS` will be configured at start of day with 1 Collection mapped
> to each physical processor, using the `MAPC` command on the physical
> ITS.
>
> ## Per Device Information
>
> Each physical device in the system which can be used together with an
> ITS (whether using passthrough or not) will have associated with it a
> data structure:
>
>     struct its_device {
>         struct pits *pits;
>         uintNN_t phys_device_id;
>         uintNN_t virt_device_id;
>         unsigned int *events;
>         unsigned int nr_events;
>         struct page_info *pitt;
>         unsigned int nr_pitt_pages;
>         /* Other fields relating to pITS maintenance but unrelated to vITS */
>     };
>
> Where:
>
> - `pits`: Pointer to the associated physical ITS.
> - `phys_device_id`: The physical device ID of the physical device
> - `virt_device_id`: The virtual device ID if the device is accessible
>   to a domain
> - `events`: An array mapping a per-device event number into a physical
>   LPI.
> - `nr_events`: The number of events which this device is able to
>   generate.
> - `pitt`, `nr_pitt_pages`: Records allocation of pages for physical
>   ITT (not directly accessible).
>
> During its lifetime this structure may be referenced by several
> different mappings (e.g. physical and virtual device id maps, virtual
> collection device id).
>
> ## Device Discovery/Registration and Configuration
>
> Per device information will be discovered based on firmware tables (DT
> or ACPI) and information provided by dom0 (e.g. reading associated PCI
> cfg space, registration via PHYSDEVOP_pci_device_add or new custom
> hypercalls).
>
> This information shall include at least:
>
> - The Device ID of the device.
> - The maximum number of Events which the device is capable of
>   generating.
>
> When a device is discovered/registered (i.e. when all necessary
> information is available) then:
>
> - `struct its_device` and the embedded `events` array will be
>   allocated (the latter with `nr_events` elements).
> - The `struct its_device` will be inserted into a mapping (possibly an
>   R-B tree) from its physical Device ID to the `struct its_device`.
> - `nr_events` physical LPIs will be allocated and recorded in the
>   `events` array.
> - An ITT table will be allocated for the device and the appropriate
>   `MAPD` command will be issued to the physical ITS. The location will
>   be recorded in `struct its_device.pitt`.
> - Each Event which the device may generate will be mapped to the
>   corresponding LPI in the `events` array and a collection, by issuing
>   a series of `MAPVI` commands. Events will be assigned to physical
>   collections in a round-robin fashion.
>
> This setup must occur for a given device before any ITS interrupts may
> be configured for the device and certainly before a device is passed
> through to a guest. This implies that dom0 cannot use MSIs on a PCI
> device before having called `PHYSDEVOP_pci_device_add`.
>
> # Device Assignment
>
> Each domain will have an associated mapping from virtual device ids
> into a data structure describing the physical device, including a
> reference to the relevant `struct its_device`.
>
> The number of possible device IDs may be large so a simple array or
> list is likely unsuitable. A tree (e.g. Red-Black) may be a suitable
> data structure. Currently we do not need to perform lookups in this
> tree on any hot paths.
>
> _Note_: In the context of virtualised device ids (especially for domU)
> it may be possible to arrange for the upper bound on the number of
> device IDs to be lower allowing a more efficient data structure to be
> used. This is left for a future improvement.
>
> When a device is assigned to a domain (including to domain 0) the
> mapping for the new virtual device ID will be entered into the tree.
>
> During assignment all LPIs associated with the device will be routed
> to the guest (i.e. `route_irq_to_guest` will be called for each LPI in
> the `struct its_device.events` array) and the pLPI will be enabled in
> the physical LPI configuration table with a priority of `GIC_PRI_IRQ`
> (not any priority from the guest).
>
> # vITS
>
> A guest domain which is allowed to use ITS functionality (i.e. has
> been assigned pass-through devices which can generate MSIs) will be
> presented with a virtualised ITS.
>
> Accesses to the vITS registers will trap to Xen and be emulated and a
> virtualised Command Queue will be provided.
>
> Commands entered onto the virtual Command Queue will be translated
> into physical commands, as described later in this document.
>
> There are other aspects to virtualising the ITS (LPI collection
> management, assignment of LPI ranges to guests, device
> management). However these are only considered here to the extent
> needed for describing the vITS emulation.
>
> ## Xen interaction with guest OS provisioned vITS memory
>
> Memory which the guest provisions to the vITS (ITT via `MAPD` or other
> tables via `GITS_BASERn`) needs careful handling in Xen.
>
> ### Trust
>
> Xen cannot trust data in data structures contained in such memory,
> since a guest can trample over it at will. Therefore Xen either
> must take great care when accessing data structures stored in such
> memory to validate the contents e.g. not trust that values are within
> the required limits or it must take steps to restrict guest access to
> the memory when it is provisioned. Since the data structures are
> simple and most accessors need to do bounds check anyway it is
> considered sufficient to simply do the necessary checks on access.
>
> **Any information read from memory which has been provisioned by the guest
>    OS should not be trusted and must be carefully checked (e.g. ranges
>    etc) before use.**
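
To illustrate the kind of checking this implies, here is a minimal sketch in C. The helper name `vitt_entry_usable`, the `FIRST_LPI` constant and the treatment of `nr_lpis` as an exclusive upper bound are assumptions made for illustration, not part of the draft; `struct vitt` is copied from the definition later in the document.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the `struct vitt` layout given later in the document (8 bytes). */
struct vitt {
    uint16_t valid:1;
    uint16_t pad:15;
    uint16_t collection;
    uint32_t vlpi;
};

#define FIRST_LPI 8192u  /* LPI interrupt IDs start at 8192 */

/*
 * Hypothetical helper: decide whether an entry copied out of guest memory
 * may be used. nr_lpis is assumed to be the exclusive upper bound of the
 * guest's LPI range and nr_collections the size of its_collections
 * (nr_vcpus + 1).
 */
static bool vitt_entry_usable(const struct vitt *e,
                              uint32_t nr_lpis, uint32_t nr_collections)
{
    assert(e != NULL);

    if ( !e->valid )
        return false;
    if ( e->vlpi < FIRST_LPI || e->vlpi >= nr_lpis )
        return false;
    if ( e->collection >= nr_collections )
        return false;
    return true;
}
```

The point is only that every field is range checked on each access, since the guest may have rewritten the entry since it was last looked at.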
>
> ### Mapping
>
> Most data structures stored in this shared memory are accessed on the
> hot interrupt injection path and must therefore be quickly accessible
> from within Xen. Since we have restricted vits support to 64-bit hosts
> only `map_domain_page` is fast enough to be used on the fly and
> therefore we do not need to be concerned about unbounded amounts of
> permanently mapped memory consumed by each `MAPD` command.
>
> Although `map_domain_page` is fast, `p2m_lookup` (translation from IPA
> to PA) is not necessarily so. For now we accept this. As a future
> extension, a sparse mapping of the guest device table in vmap space
> could be considered, with limits on the total amount of vmap space which
> we allow each domain to consume.
>
> The `GITS_BASERn` registers allow for the guest to specify cache
> attributes for the memory. For now we require that these have the same
> attributes as hypercall arguments in general (see `public/arch-arm.h`).
>
> In addition while `GITS_BASERn` allows the Cacheability to be
> specified as `Device-nGnRnE` we require that the tables provided be in
> normal guest RAM (not MMIO, not granted memory etc), that is it must
> have type `p2m_ram_rw`.
>
> ## vITS properties
>
> The vITS implementation shall have:
>
> - `GITS_TYPER.HCC == nr_vcpus + 1`.
> - `GITS_TYPER.PTA == 0`. Target addresses are linear processor numbers.
> - `GITS_TYPER.Devbits == See below`.
> - `GITS_TYPER.IDbits == See below`.
> - `GITS_TYPER.ITT Entry Size == 7`, meaning 8 bytes, which is the size
>   of `struct vitt` (defined below).
>
> `GITS_TYPER.Devbits` and `GITS_TYPER.IDbits` will need to be chosen to
> reflect the host and guest configurations (number of LPIs, maximum
> device ID etc).
>
> Other fields (not mentioned here) will be set to some sensible (or
> mandated) value.
>
> The `GITS_BASER0` will be setup to request sufficient memory for a
> device table consisting of entries of:
>
>     struct vdevice_table {
>         uint64_t vitt_ipa;
>         uint32_t vitt_size;
>         uint32_t padding;
>     };

      How about adding valid bit to know if the entry is valid or not?

>     BUILD_BUG_ON(sizeof(struct vdevice_table) != 16);
>
> On write to `GITS_BASER0` the relevant details of the Device Table
> (IPA, size, cache attributes to use when mapping) will be recorded in
> `struct domain`.
>
> All other `GITS_BASERn.Valid == 0`.
>
> ## vITS to pITS mapping
>
> A physical system may have multiple physical ITSs.
>
> With the simplified vits command model presented here only a single
> `vits` is required.
>
> In the future a more complex arrangement may be desired. Since the
> choice of model is internal to the hypervisor/tools and is
> communicated to the guest via firmware tables we are not tied to this
> model as an ABI if we decide to change.
>
> When constructing dom0 it will therefore be necessary to rewrite any
> DTS properties which refer to an ITS to point to the single provided
> ITS, as well as dropping all ITS nodes and replacing them with a
> single node representing the vITS.
>
> ## Mapping from `vLPI` back to `pLPI`
>
> While we have arranged for a (`pDevice`,`pEvent`) to map to a single
> `pLPI` we cannot guarantee that a given `vLPI` is mapped by a single
> (`vDevice`,`vEvent`) since the guest may setup multiple ITT tables
> such that this is not the case. Enforcing that this is the case is
> prohibitively expensive.
>
> Therefore it is not in general possible to associate a `vLPI` with a
> `pLPI`.
>
> ## Per-domain `struct pending_irq` for `vLPI`s
>
> Internally Xen uses a `struct pending_irq` to track the status of any
> pending virtual IRQ, including a virtual LPI.
>
> Upon domain creation an array of such `struct pending_irq`'s will be
> allocated to cover the range `8192..nr_lpis` (for the number of LPIs
> which the guest is configured with) and a pointer to this array will be
> stored in the `struct domain`. The function `irq_to_pending` will be
> modified to lookup interrupts in the LPI range in this array.
>
> ## Handling of unrouted/spurious LPIs
>
> Since there is no 1:1 link between a `vLPI` and `pLPI` enabling and
> disabling of physical LPIs cannot be driven from the state of an
> associated vLPI.
>
> Each `pLPI` is routed and enabled during device assignment, therefore
> it is possible to receive a physical LPI which has yet to be routed
> (via a `vITS`) to a `vLPI`.

Why do we need to enable LPIs during device assignment?
Can't we do it only on LPI configuration update, which is trapped in
Xen as mentioned in 7.8? ( ## Enabling and disabling LPIs)

>
> Similarly if a guest routes multiple Events to a single `vLPI` the
> interrupt may already be pending when we attempt to deliver it.
>
> Such `pLPI`s shall be ignored and left in the priority dropped state
> (per the read from `GICC_IAR`). They will not be `EOI`-d in order to
> avoid a possible interrupt storm.
>
> On device deassignment (including as part of domain destroy) after
> resetting the device it will be necessary to EOI any interrupts in
> such a state by walking over all events in the corresponding `struct
> its_device`.
>
> ## Enabling and disabling LPIs
>
> Two new functions `vgic_enable_lpi` and `vgic_disable_lpi` will be
> provided which are analogous to `vgic_enable_irqs` and
> `vgic_disable_irqs` but work for the LPI interface. (Alternatively,
> refactoring the existing functions to work for all cases would be
> acceptable too).
>
> A `vLPI` which has not yet been enabled will automatically be queued, by
> the existing vgic injection machinery, until a call to
> `vgic_enable_lpi` is made (in response to a trapped access to the
> virtual cfg table).
>
> ## LPI Configuration Table Virtualisation
>
> A guest's write accesses to its LPI Configuration Table (which is just
> an area of guest RAM which the guest has nominated) will be trapped to
> the hypervisor, using stage 2 MMU permissions, in order for changes to
> be propagated into the host interrupt configuration.
>
> On write `bit[0]` of the written byte is the enable/disable state for
> the irq and is handled thus, for each byte in the written value:
>
>     lpi = lpi corresponding to byte offset (addr - table_base);
>
>     pending_irq = irq_to_pending(lpi);
>     pending_irq->priority = byte & 0xfc; /* XXX: or byte >> 2 */
>
>     if ( byte & 0x1 )
>         vgic_enable_lpi(current, lpi);
>     else
>         vgic_disable_lpi(current, lpi);
>
> Note that physical interrupts are always configured with a priority of
> `GIC_PRI_IRQ`, regardless of the priority of any virtual interrupt.
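
For reference, a small C sketch of decoding one configuration byte. The names `decode_lpi_cfg`, `struct lpi_cfg` and `lpi_from_cfg_offset` are hypothetical. Regarding the XXX above, the sketch assumes the priority is kept as the masked value (`byte & 0xfc`) so that it stays in the same 8-bit format as other GIC priorities, which is a choice and not something the draft settles.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LPI_CFG_ENABLE    0x01u  /* bit[0]: enable */
#define LPI_CFG_PRIO_MASK 0xfcu  /* bits[7:2]: priority */
#define FIRST_LPI         8192u  /* byte 0 of the table describes LPI 8192 */

struct lpi_cfg {
    uint8_t priority;  /* masked, i.e. still an 8-bit priority value */
    bool enabled;
};

/* Split one LPI configuration byte into its priority and enable parts. */
static struct lpi_cfg decode_lpi_cfg(uint8_t byte)
{
    struct lpi_cfg cfg = {
        .priority = (uint8_t)(byte & LPI_CFG_PRIO_MASK),
        .enabled = (byte & LPI_CFG_ENABLE) != 0,
    };
    return cfg;
}

/* One configuration byte per LPI, so the byte offset selects the LPI. */
static uint32_t lpi_from_cfg_offset(uint64_t addr, uint64_t table_base)
{
    assert(addr >= table_base);
    return FIRST_LPI + (uint32_t)(addr - table_base);
}
```

A trapped multi-byte write would apply this once per byte, incrementing the address each time.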
>
> ## LPI Pending Table Virtualisation
>
> According to GIC spec 4.8.5 this table is not necessarily in sync and
> the mechanism to force a sync is `IMPLEMENTATION DEFINED`, hence we
> don't need to do anything.
>
> ## Device Table Virtualisation
>
> The IPA, size and cacheability attributes of the guest device table
> will be recorded in `struct domain` upon write to `GITS_BASER0`.
>
> In order to lookup an entry for `device`:
>
>     define {get,set}_vdevice_entry(domain, device, struct vdevice_table *entry):
>         offset = device*sizeof(struct vdevice_table)
>         if offset + sizeof(struct vdevice_table) > <DT size>: error
>
>         dt_entry = <DT base IPA> + device*sizeof(struct vdevice_table)
>         paddr = p2m_lookup(domain, dt_entry, p2m_ram)
>         page = get_page_from_gfn(current->domain, paddr>>PAGE_SHIFT, &p2mt, P2M_ALLOC);
>         if !page: error
>         if !page_is_ram(p2mt): put_page(page); error;
>
>         dt_mapping = map_domain_page(page)
>
>         if (set)
>              dt_mapping[<appropriate page offset from device>] = *entry;
>         else
>              *entry = dt_mapping[<appropriate page offset>];
>
>         unmap_domain_page(dt_mapping)
>         put_page(page)
>
> Since everything is based upon IPA (guest addresses) a malicious guest
> can only reference its own RAM here.
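
The address arithmetic in the lookup above can be sketched in C as follows. `vdevice_entry_slot` and `struct dt_slot` are hypothetical names, 4K pages are assumed, and the table base is assumed to be aligned to the entry size so that no entry straddles a page boundary. The sketch covers only the bounds check and the split into frame number and page offset, not the p2m lookup or the mapping itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1u << PAGE_SHIFT)

struct vdevice_table {
    uint64_t vitt_ipa;
    uint32_t vitt_size;
    uint32_t padding;
};

/* Where one entry lives: the guest frame to map and the offset within it. */
struct dt_slot {
    uint64_t gfn;
    uint32_t page_offset;
};

/*
 * Hypothetical helper: locate the entry for `device` in a device table of
 * dt_size bytes starting at IPA dt_base. Returns false if the entry would
 * not lie entirely within the table.
 */
static bool vdevice_entry_slot(uint64_t dt_base, uint64_t dt_size,
                               uint32_t device, struct dt_slot *slot)
{
    const uint64_t entry_size = sizeof(struct vdevice_table);
    uint64_t offset = (uint64_t)device * entry_size;
    uint64_t ipa;

    assert(slot != NULL);

    /* Written this way so that the comparison cannot overflow. */
    if ( dt_size < entry_size || offset > dt_size - entry_size )
        return false;

    ipa = dt_base + offset;
    slot->gfn = ipa >> PAGE_SHIFT;
    slot->page_offset = (uint32_t)(ipa & (PAGE_SIZE - 1));
    return true;
}
```

The same calculation applies to the vITT lookup, with `vitt_ipa`, `vitt_size` and `sizeof(struct vitt)` in place of the device table values.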
>
> ## ITT Virtualisation
>
> The location of a vITT will have been recorded in the domain Device
> Table by a `MAPD` command and is looked up as above.
>
> The `vitt` is a `struct vitt`:
>
>     struct vitt {
>         uint16_t valid:1;
>         uint16_t pad:15;
>         uint16_t collection;
>         uint32_t vlpi;
>     };
>     BUILD_BUG_ON(sizeof(struct vitt) != 8);
>
> A lookup is performed in the same way as for the device table, with
> the offset range checked against the `vitt_size` from the device
> table. To lookup `event` on `device`:
>
>     define {get,set}_vitt_entry(domain, device, event, struct vitt *entry):
>         get_vdevice_entry(domain, device, &dt)
>
>         offset = event*sizeof(struct vitt);
>         if offset + sizeof(struct vitt) > dt.vitt_size: error
>
>         vitt_entry = dt.vitt_ipa + event*sizeof(struct vitt)
>         paddr = p2m_lookup(domain, vitt_entry, p2m_ram)
>         page = get_page_from_gfn(current->domain, paddr>>PAGE_SHIFT, &p2mt, P2M_ALLOC);
>         if !page: error
>         if !page_is_ram(p2mt): put_page(page); error;
>
>         vitt_mapping = map_domain_page(page)
>
>         if (set)
>              vitt_mapping[<appropriate page offset from event>] = *entry;
>         else
>              *entry = vitt_mapping[<appropriate page offset>];
>
>         unmap_domain_page(vitt_mapping)
>         put_page(page)
>
> Again since this is IPA based a malicious guest can only point things
> to its own ram.
>
> ## Collection Table Virtualisation
>
> A pointer to a dynamically allocated array `its_collections` mapping
> collection ID to vcpu ID will be added to `struct domain`. The array
> shall have `nr_vcpus + 1` entries and resets to ~0 (or another
> explicitly invalid vcpu nr).
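
A possible shape for this array, as a self-contained C sketch. `struct vits_collections`, `INVALID_VCPU_ID` and the three function names are hypothetical, and `malloc` stands in for whatever allocator the hypervisor would use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define INVALID_VCPU_ID (~0u)

struct vits_collections {
    unsigned int nr_vcpus;
    unsigned int nr_entries;   /* nr_vcpus + 1 */
    unsigned int *vcpu;        /* collection ID -> vcpu ID */
};

/* Allocate the array and reset every collection to "unmapped". */
static bool collections_init(struct vits_collections *c, unsigned int nr_vcpus)
{
    unsigned int i;

    assert(c != NULL);
    c->nr_vcpus = nr_vcpus;
    c->nr_entries = nr_vcpus + 1;
    c->vcpu = malloc(c->nr_entries * sizeof(*c->vcpu));
    if ( c->vcpu == NULL )
        return false;
    for ( i = 0; i < c->nr_entries; i++ )
        c->vcpu[i] = INVALID_VCPU_ID;
    return true;
}

/* MAPC: record the target vcpu for a collection, after bounds checks. */
static bool collections_map(struct vits_collections *c,
                            unsigned int collection, unsigned int vcpu_id)
{
    if ( collection >= c->nr_entries || vcpu_id >= c->nr_vcpus )
        return false;
    c->vcpu[collection] = vcpu_id;
    return true;
}

/* Translate a collection to a vcpu; false if out of range or unmapped. */
static bool collections_lookup(const struct vits_collections *c,
                               unsigned int collection, unsigned int *vcpu_id)
{
    if ( collection >= c->nr_entries || c->vcpu[collection] >= c->nr_vcpus )
        return false;
    *vcpu_id = c->vcpu[collection];
    return true;
}
```

Because the reset value is larger than any valid vcpu ID, the single `>= nr_vcpus` test in the lookup rejects both unmapped and corrupt entries.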
>
> ## Virtual LPI injection
>
> As discussed above the `vgic_vcpu_inject_irq` functionality will need
> to be extended to cover this new case, most likely via a new
> `vgic_vcpu_inject_lpi` frontend function. `vgic_vcpu_inject_irq` will
> also require some refactoring to allow the priority to be passed in
> from the caller (since `LPI` priority comes from the `LPI` CFG table,
> while `SPI` and `PPI` priority is configured via other means).
>
> `vgic_vcpu_inject_lpi` receives a `struct domain *` and a virtual
> interrupt number (corresponding to a vLPI) and needs to figure out
> which vcpu this should map to.
>
> To do this it must look up the Collection ID associated (via the vITS)
> with that LPI.
>
> Proposal: Add a new `its_device` field to `struct irq_guest`, a
> pointer to the associated `struct its_device`. The existing `struct
> irq_guest.virq` field contains the event ID (perhaps use a `union`
> to give a more appropriate name) and _not_ the virtual LPI. Injection
> then consists of:
>
>         d = irq_guest->domain
>         virq = irq_guest->virq
>         its_device = irq_guest->its_device
>
>         get_vitt_entry(d, its_device->virt_device_id, virq, &vitt)
>         vcpu = d->its_collections[vitt.collection]
>
>         if !is_valid_lpi(vitt.vlpi): error
>
>         vgic_vcpu_inject_lpi(&d->vcpus[vcpu], vitt.vlpi)
>
> If the LPI is currently disabled then it will be queued by
> `vgic_vcpu_inject_lpi` and injected in response to a subsequent
> `vgic_enable_lpi` call.
>
> ## Command Queue Virtualisation
>
> The command translation/emulation in this design has been arranged to
> be as cheap as possible (e.g. in many cases the actions are NOPs),
> avoiding previous concerns about the length of time which an emulated
> write to a `CWRITER` register may block the vcpu.
>
> The vits will simply track its reader and writer pointers. On write
> to `CWRITER` it will immediately and synchronously process all
> commands in the queue and update its state accordingly.
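
The synchronous processing described here amounts to draining a circular buffer. A minimal C sketch, assuming the offsets are kept in bytes and relying on ITS commands being 32 bytes each; `struct vits_cmdq`, `vits_cwriter_write` and the `handle_cmd` callback are hypothetical names standing in for the real emulation state and per-command handler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ITS_CMD_SIZE 32u  /* each ITS command occupies 32 bytes */

struct vits_cmdq {
    uint32_t size;     /* queue size in bytes, a multiple of ITS_CMD_SIZE */
    uint32_t creadr;   /* offset of the next command to process */
    uint32_t cwriter;  /* offset one past the last command the guest queued */
};

/*
 * Hypothetical handler for a trapped write to CWRITER: consume every
 * command between the read and write offsets, wrapping at the end of the
 * ring. Returns the number of commands processed, or -1 if the offset
 * supplied by the guest is rejected.
 */
static int vits_cwriter_write(struct vits_cmdq *q, uint32_t new_cwriter,
                              void (*handle_cmd)(uint32_t offset))
{
    int processed = 0;

    assert(q != NULL && q->size != 0 && q->size % ITS_CMD_SIZE == 0);

    /* The value comes from the guest, so check it before use. */
    if ( new_cwriter >= q->size || new_cwriter % ITS_CMD_SIZE != 0 )
        return -1;

    q->cwriter = new_cwriter;
    while ( q->creadr != q->cwriter )
    {
        if ( handle_cmd != NULL )
            handle_cmd(q->creadr);
        q->creadr = (q->creadr + ITS_CMD_SIZE) % q->size;
        processed++;
    }
    return processed;
}
```

The loop terminates only when the read offset equals the write offset, which is why both must be kept on command boundaries and within the ring.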
>
> It might be possible to implement a rudimentary form of preemption by
> periodically (as determined by `hypercall_preempt_check()`) returning
> to the guest without incrementing PC but with updated internal
> `CREADR` state, meaning it will reexecute the write to `CWRITER` and
> we can pickup where we left off for another iteration. This at least
> lets us schedule other vcpus etc and prevents a monopoly.
>
> ## ITS Command Translation
>
> This section is based on the section 5.13 of GICv3 specification
> (PRD03-GENC-010745 24.0) and provides concrete ideas of how this can
> be interpreted for Xen.
>
> The ITS provides 12 commands in order to manage interrupt collections,
> devices and interrupts. Possible command parameters are:
>
> - Device ID (`Device`) (called `device` in the spec).
> - Event ID (`Event`) (called `ID` in the spec). This is an index into
>   a device's `ITT`.
> - Collection ID (`Collection`) (called `collection` in the spec)
> - LPI ID (`LPI`) (called `pID` in the spec)
> - Target Address (`TA`) (called `TA` in the spec)
>
> These parameters need to be validated and translated from Virtual (`v`
> prefix) to Physical (`p` prefix).
>
> Note, we differ from the naming in the GIC spec for clarity, in
> particular we use `Event` not `ID` and `LPI` not `pID` to reduce
> confusion, especially when `v` and `p` prefixes are used due to
> virtualisation.
>
> ### Parameter Validation / Translation
>
> Each command contains parameters that need to be validated before any
> usage in Xen or passing to the hardware.
>
> #### Device ID (`Device`)
>
> Corresponding ITT obtained by looking up as described above.
>
> The physical `struct its_device` can be found by looking up in the
> domain's device map.
>
> If lookup fails or the resulting device table entry is invalid then
> the Device is invalid.
>
> #### Event ID (`Event`)
>
> Validated against emulated `GITS_TYPER.IDbits`.
>
> It is not necessary to translate a `vEvent`.
>
> #### LPI (`LPI`)
>
> Validated against emulated `GITS_TYPER.IDbits`.
>
> It is not necessary to translate a `vLPI` into a `pLPI` since the
> tables all contain `vLPI`. (Translation from `pLPI` to `vLPI` happens
> via `struct irq_guest` when we receive the IRQ).
>
> #### Interrupt Collection (`Collection`)
>
> The `Collection` is validated against the size of the per-domain
> `its_collections` array (i.e. nr_vcpus + 1) and then translated by a
> simple lookup in that array.
>
>      vcpu_nr = d->its_collections[Collection]
>
> A result >= `nr_vcpus` is invalid.
>
> #### Target Address (`TA`)
>
> This parameter is used in commands which manage collections. It is a
> unique identifier per processor.
>
> We have chosen to implement `GITS_TYPER.PTA` as 0, hence `vTA` simply
> corresponds to the `vcpu_id`, so only needs bounds checking against
> `nr_vcpus`.
>
> ### Commands
>
> To be read with reference to spec for each command (which includes
> error checks etc which are omitted here).
>
> It is assumed that inputs will be bounds and validity checked as
> described above, thus error handling is omitted for brevity (i.e. if
> get and/or set fail then so be it). In general invalid commands are
> simply ignored.
>
> #### `MAPD`: Map a physical device to an ITT.
>
> _Format_: `MAPD Device, Valid, ITT Address, ITT Size`.
>
> _Spec_: 5.13.11
>
> `MAPD` is sent with `Valid` bit set if the mapping is to be added and
> reset when mapping is removed.
>
> When the `Valid` bit is set then the range `ITT Address` to `ITT
> Address` + `ITT Size` need not be validated, this is done in
> `{get,set}_vdevice_entry` when calling the `p2m_lookup`
> function. Validating the memory at `MAPD` time would serve no purpose
> since the guest could subsequently balloon it out or grant map over it etc.
>
> The domain's device table is updated with the provided information.
>
> The `vitt_mapd` field is set according to the `Valid` flag in the
> command:
>
>     dt_entry.vitt_ipa = ITT Address
>     dt_entry.vitt_size = ITT Size
>     set_vdevice_entry(current->domain, Device, &dt_entry)
>
> #### `MAPC`: Map an interrupt collection to a target processor
>
> _Format_: `MAPC Collection, TA`
>
> _Spec_: 5.13.12
>
> The updated `vTA` (a vcpu number) is recorded in the `its_collections`
> array of the domain struct:
>
>     d->its_collections[Collection] = TA
>
> #### `MAPI`: Map an interrupt to an interrupt collection.
>
> _Format_: `MAPI Device, LPI, Collection`
>
> _Spec_: 5.13.13
>
> After validation:
>
>     vitt.valid = True
>     vitt.collection = Collection
>     vitt.vlpi = LPI
>     set_vitt_entry(current->domain, Device, LPI, &vitt)
>
> #### `MAPVI`: Map an input identifier to a physical interrupt and an interrupt collection.
>
> _Format_: `MAPVI Device, Event, LPI, Collection`
>
>     vitt.valid = True
>     vitt.collection = Collection
>     vitt.vlpi = LPI
>     set_vitt_entry(current->domain, Device, Event, &vitt)
>
> #### `MOVI`: Redirect interrupt to an interrupt collection
>
> _Format_: `MOVI Device, Event, Collection`
>
> _Spec_: 5.13.15
>
>     get_vitt_entry(current->domain, Device, Event, &vitt)
>     vitt.collection = Collection
>     set_vitt_entry(current->domain, Device, Event, &vitt)
>
>     XXX consider helper which sets field without mapping/unmapping
>     twice.
>
> This command is supposed to move any pending interrupts associated
> with `Event` to the vcpu implied by the new `Collection`, which is
> tricky. For now we ignore this requirement (as we do for
> `GICD_IROUTERn` and `GICD_TARGETRn` for other interrupt types).
>
> #### `DISCARD`: Discard interrupt requests
>
> _Format_: `DISCARD Device, Event`
>
> _Spec_: 5.13.16
>
>     get_vitt_entry(current->domain, Device, Event, &vitt)
>     vitt.valid = False
>     set_vitt_entry(current->domain, Device, Event, &vitt)
>
>     XXX consider helper which sets field without mapping/unmapping
>     twice.
>
> This command is supposed to clear the pending state of any associated
> interrupt. This requirement is ignored (guest may see a spurious
> interrupt).
>
> #### `INV`: Clean any caches associated with interrupt
>
> _Format_: `INV Device, Event`
>
> _Spec_: 5.13.17
>
> Since LPI Configuration table updates are trapped and acted upon at
> the time of the write, there is nothing to do here.
>
> #### `INVALL`: Clean any caches associated with an interrupt collection
>
> _Format_: `INVALL Collection`
>
> _Spec_: 5.13.19
>
> Since LPI Configuration table updates are trapped and acted upon at
> the time of the write, there is nothing to do here.
>
> #### `INT`: Generate an interrupt
>
> _Format_: `INT Device, Event`
>
> _Spec_: 5.13.20
>
> The `vitt` entry corresponding to `Device,Event` is looked up and then:
>
>     get_vitt_entry(current->domain, Device, Event, &vitt)
>     vgic_vcpu_inject_lpi(current->domain, vitt.vlpi)
>
> _Note_: Where (Device,Event) is real may need consideration of
> interactions with real LPIs being delivered: Julien had concerns about
> Xen's internal IRQ State tracking. If this is a problem then may need
> changes to IRQ state tracking, or to inject as a real IRQ and let
> physical IRQ injection handle it, or write to `GICR_SETLPIR`?
>
> #### `CLEAR`: Clear the pending state of an interrupt
>
> _Format_: `CLEAR Device, Event`
>
> _Spec_: 5.13.21
>
> Should clear pending state of LPI. Ignore (guest may see a spurious
> interrupt).
>
> #### `SYNC`: Wait for completion of any outstanding ITS actions for collection
>
> _Format_: `SYNC TA`
>
> _Spec_: 5.13.22
>
> This command can be ignored.
>
> # GICv4 Direct Interrupt Injection
>
> GICv4 will directly mark the LPIs pending in the virtual pending table
> which is per-redistributor (i.e. per-vCPU).
>
> LPIs will be received by the guest in the same way as SPIs, i.e. trap in
> IRQ mode then read `ICC_IAR1_EL1` (for GICv3).
>
> Therefore GICv4 will not require one vITS per pITS.
>
> # Event Channels
>
> It has been proposed that it might be nice to inject event channels as
> LPIs in the future. Whether or not that would involve any sort of vITS
> is unclear, but if it did then it would likely be a separate emulation
> to the vITS emulation used with a pITS and as such is not considered
> further here.
>
> # Glossary
>
> * _MSI_: Message Signalled Interrupt
> * _ITS_: Interrupt Translation Service
> * _GIC_: Generic Interrupt Controller
> * _LPI_: Locality-specific Peripheral Interrupt
>
> # References
>
> "GIC Architecture Specification" PRD03-GENC-010745 24.0.
>
> "IO Remapping Table System Software on ARM® Platforms" ARM DEN 0049A.
>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Draft F] Xen on ARM vITS Handling
  2015-06-12  8:37 ` Vijay Kilari
@ 2015-06-12  8:52   ` Ian Campbell
  2015-06-12 13:09     ` Julien Grall
  0 siblings, 1 reply; 18+ messages in thread
From: Ian Campbell @ 2015-06-12  8:52 UTC (permalink / raw)
  To: Vijay Kilari; +Cc: manish.jaggi, Julien Grall, Stefano Stabellini, xen-devel

On Fri, 2015-06-12 at 14:07 +0530, Vijay Kilari wrote:

Please could you trim your quotes to only include the bit you are
referring to. Otherwise there is a high chance I will miss a one line
comment in the middle of the thousand lines of quoted matter.

> > The `GITS_BASER0` will be setup to request sufficient memory for a
> > device table consisting of entries of:
> >
> >     struct vdevice_table {
> >         uint64_t vitt_ipa;
> >         uint32_t vitt_size;
> >         uint32_t padding;
> >     };
> 
>       How about adding valid bit to know if the entry is valid or not?

I suggest to use vitt_ipa == INVALID_PADDR to signal this rather than
using another bit.
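
A minimal sketch of that suggestion in C. `VITT_IPA_INVALID` is a stand-in for Xen's `INVALID_PADDR` so that the fragment is self-contained, and the helper names `vdevice_entry_mapd` and `vdevice_entry_valid` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for Xen's INVALID_PADDR so that the sketch is self-contained. */
#define VITT_IPA_INVALID (~(uint64_t)0)

struct vdevice_table {
    uint64_t vitt_ipa;
    uint32_t vitt_size;
    uint32_t padding;
};

/* MAPD emulation: Valid == 0 unmaps by storing the sentinel address. */
static void vdevice_entry_mapd(struct vdevice_table *e, bool valid,
                               uint64_t itt_ipa, uint32_t itt_size)
{
    assert(e != NULL);
    e->vitt_ipa = valid ? itt_ipa : VITT_IPA_INVALID;
    e->vitt_size = valid ? itt_size : 0;
    e->padding = 0;
}

/* An entry is usable only if it does not hold the sentinel. */
static bool vdevice_entry_valid(const struct vdevice_table *e)
{
    return e->vitt_ipa != VITT_IPA_INVALID;
}
```

One thing this would leave open is that memory freshly provisioned by the guest will not contain the sentinel, so an untouched entry would look valid until the later range checks reject it.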

> > ## Handling of unrouted/spurious LPIs
> >
> > Since there is no 1:1 link between a `vLPI` and `pLPI` enabling and
> > disabling of physical LPIs cannot be driven from the state of an
> > associated vLPI.
> >
> > Each `pLPI` is routed and enabled during device assignment, therefore
> > it is possible to receive a physical LPI which has yet to be routed
> > (via a `vITS`) to a `vLPI`.
> 
> Why do we need to enable LPIs during device assignment?
> Can't we do it only on LPI configuration update, which is trapped in
> Xen as mentioned in 7.8? ( ## Enabling and disabling LPIs)

Quoting the first sentence/paragraph of this section:
        Since there is no 1:1 link between a `vLPI` and `pLPI` enabling
        and disabling of physical LPIs cannot be driven from the state
        of an associated vLPI.

To expand on that: The vITT can map multiple (vDevice,vEvent) pairs to
the same LPI, and each of those (vDevice,vEvent) pairs is related to a
different (pDevice,pEvent) which in turn has a unique pLPI associated
with it. Thus a vLPI can be associated with more than one pLPI.

Enumerating all pLPIs associated with a given vLPI would be expensive (a
complete walk of the vITT).

In addition if it were possible to do so we would also need to manage
enabling/disabling the pLPI in several other places than in vLPI cfg
traps, specifically MAPC and MAPD at least.

So pLPIs must be routed at device assignment time because in the vLPI
configuration table trap there is no mapping back to a single pLPI.

Ian.


* Re: [Draft F] Xen on ARM vITS Handling
  2015-06-11  9:40 [Draft F] Xen on ARM vITS Handling Ian Campbell
  2015-06-11 12:02 ` Ian Campbell
  2015-06-12  8:37 ` Vijay Kilari
@ 2015-06-12 12:55 ` Ian Campbell
  2015-06-12 13:14   ` Julien Grall
  2015-06-16 14:50 ` Vijay Kilari
  3 siblings, 1 reply; 18+ messages in thread
From: Ian Campbell @ 2015-06-12 12:55 UTC (permalink / raw)
  To: xen-devel; +Cc: manish.jaggi, Julien Grall, Stefano Stabellini, Vijay Kilari

On Thu, 2015-06-11 at 10:40 +0100, Ian Campbell wrote:
> ## Command Queue Virtualisation
> 
> The command translation/emulation in this design has been arranged to
> be as cheap as possible (e.g. in many cases the actions are NOPs),
> avoiding previous concerns about the length of time which an emulated
> write to a `CWRITER` register may block the vcpu.
> 
> The vits will simply track its reader and writer pointers. On write
> to `CWRITER` it will immediately and synchronously process all
> commands in the queue and update its state accordingly.
> 
> It might be possible to implement a rudimentary form of preemption by
> periodically (as determined by `hypercall_preempt_check()`) returning
> to the guest without incrementing PC but with updated internal
> `CREADR` state, meaning it will reexecute the write to `CWRITER` and
> we can pickup where we left off for another iteration. This at least
> lets us schedule other vcpus etc and prevents a monopoly.

In the presence of multiple VCPUs writing to GITS_CWRITER preemption
actually gets pretty subtle. I suggest leaving it out for now.

Ian.


* Re: [Draft F] Xen on ARM vITS Handling
  2015-06-12  8:52   ` Ian Campbell
@ 2015-06-12 13:09     ` Julien Grall
  2015-06-12 13:16       ` Ian Campbell
  0 siblings, 1 reply; 18+ messages in thread
From: Julien Grall @ 2015-06-12 13:09 UTC (permalink / raw)
  To: Ian Campbell, Vijay Kilari
  Cc: manish.jaggi, Julien Grall, Stefano Stabellini, xen-devel

Hi Ian,

On 12/06/2015 04:52, Ian Campbell wrote:
> On Fri, 2015-06-12 at 14:07 +0530, Vijay Kilari wrote:
> So pLPIs must be routed at device assignment time because in the vLPI
> configuration table trap there is no mapping back to a single pLPI.

I just remembered the exact reason that made use to differ SPI enabling.
When the device is assigned, the domain VCPUs are still down (even VCPU0).

If we receive an interrupt before the VCPU0 is unpaused, the interrupt 
will be lost. Same if the interrupt is not yet configured (i.e before 
the vITS setup correctly the table) with your proposal.

This could happen when the device is not quiescent. We had this issue on 
the vexpress at boot time when the network card was trying to send an 
interrupt before DOM0 is setup.

Regards,

-- 
Julien Grall


* Re: [Draft F] Xen on ARM vITS Handling
  2015-06-12 12:55 ` Ian Campbell
@ 2015-06-12 13:14   ` Julien Grall
  2015-06-12 13:26     ` Ian Campbell
  0 siblings, 1 reply; 18+ messages in thread
From: Julien Grall @ 2015-06-12 13:14 UTC (permalink / raw)
  To: Ian Campbell, xen-devel
  Cc: manish.jaggi, Julien Grall, Stefano Stabellini, Vijay Kilari



On 12/06/2015 08:55, Ian Campbell wrote:
> On Thu, 2015-06-11 at 10:40 +0100, Ian Campbell wrote:
>> ## Command Queue Virtualisation
>>
>> The command translation/emulation in this design has been arranged to
>> be as cheap as possible (e.g. in many cases the actions are NOPs),
>> avoiding previous concerns about the length of time which an emulated
>> write to a `CWRITER` register may block the vcpu.
>>
>> The vits will simply track its reader and writer pointers. On write
>> to `CWRITER` it will immediately and synchronously process all
>> commands in the queue and update its state accordingly.
>>
>> It might be possible to implement a rudimentary form of preemption by
>> periodically (as determined by `hypercall_preempt_check()`) returning
>> to the guest without incrementing PC but with updated internal
>> `CREADR` state, meaning it will reexecute the write to `CWRITER` and
>> we can pickup where we left off for another iteration. This at least
>> lets us schedule other vcpus etc and prevents a monopoly.
>
> In the presence of multiple VCPUs writing to GITS_CWRITER preemption
> actually gets pretty subtle. I suggest leaving it out for now.

Would it be possible to do it with re-doing the write to the GITS_CWRITER?

Regards,

-- 
Julien Grall


* Re: [Draft F] Xen on ARM vITS Handling
  2015-06-12 13:09     ` Julien Grall
@ 2015-06-12 13:16       ` Ian Campbell
  2015-06-12 13:32         ` Julien Grall
  0 siblings, 1 reply; 18+ messages in thread
From: Ian Campbell @ 2015-06-12 13:16 UTC (permalink / raw)
  To: Julien Grall
  Cc: manish.jaggi, Julien Grall, Stefano Stabellini, Vijay Kilari, xen-devel

On Fri, 2015-06-12 at 09:09 -0400, Julien Grall wrote:
> Hi Ian,
> 
> On 12/06/2015 04:52, Ian Campbell wrote:
> > On Fri, 2015-06-12 at 14:07 +0530, Vijay Kilari wrote:
> > So pLPIs must be routed at device assignment time because in the vLPI
> > configuration table trap there is no mapping back to a single pLPI.
> 
> I just remembered the exact reason that made use to differ SPI enabling.

I can't parse this sentence, differ how?

> When the device is assigned, the domain VCPUs are still down (even VCPU0).
> 
> If we receive an interrupt before the VCPU0 is unpaused, the interrupt 
> will be lost. Same if the interrupt is not yet configured (i.e before 
> the vITS setup correctly the table) with your proposal.

Is this any different to booting with the ITT not setup?

(SPIs are a slightly different case because they don't need h/w routing)

> This could happen when the device is not quiescent. We had this issue on 
> the vexpress at boot time when the network card was trying to send an 
> interrupt before DOM0 is setup.

I don't fully understand the issue you are trying to describe, but do
you want to propose a change to the spec?

Ian.


* Re: [Draft F] Xen on ARM vITS Handling
  2015-06-12 13:14   ` Julien Grall
@ 2015-06-12 13:26     ` Ian Campbell
  0 siblings, 0 replies; 18+ messages in thread
From: Ian Campbell @ 2015-06-12 13:26 UTC (permalink / raw)
  To: Julien Grall
  Cc: manish.jaggi, Julien Grall, Stefano Stabellini, Vijay Kilari, xen-devel

On Fri, 2015-06-12 at 09:14 -0400, Julien Grall wrote:
> 
> On 12/06/2015 08:55, Ian Campbell wrote:
> > On Thu, 2015-06-11 at 10:40 +0100, Ian Campbell wrote:
> >> ## Command Queue Virtualisation
> >>
> >> The command translation/emulation in this design has been arranged to
> >> be as cheap as possible (e.g. in many cases the actions are NOPs),
> >> avoiding previous concerns about the length of time which an emulated
> >> write to a `CWRITER` register may block the vcpu.
> >>
> >> The vits will simply track its reader and writer pointers. On write
> >> to `CWRITER` it will immediately and synchronously process all
> >> commands in the queue and update its state accordingly.
> >>
> >> It might be possible to implement a rudimentary form of preemption by
> >> periodically (as determined by `hypercall_preempt_check()`) returning
> >> to the guest without incrementing PC but with updated internal
> >> `CREADR` state, meaning it will reexecute the write to `CWRITER` and
> >> we can pickup where we left off for another iteration. This at least
> >> lets us schedule other vcpus etc and prevents a monopoly.
> >
> > In the presence of multiple VCPUs writing to GITS_CWRITER preemption
> > actually gets pretty subtle. I suggest leaving it out for now.
> 
> Would it be possible to do it with re-doing the write to the GITS_CWRITER?

Not easily.

For example one subtle case is:

      * VCPUA writes W1 to GITS_CWRITER, processes a proportion and is
        preempted.
      * VCPUB writes W2 (>W1) to GITS_CWRITER and processes a proportion
        such that W1 < GITS_CREADR < W2, is preempted.
      * VCPUA resumes, sees GITS_CREADR != W1, processes up to W2, and
        then keeps going processing junk until it wraps around to W1
        again.

The != is necessary because the command queue is a circular buffer, so
GITS_CREADR < GITS_CWRITER is not a reliable way to determine if there
is anything on the ring.
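
The three steps above can be replayed with a small model. This is only a
sketch under stated assumptions: `QUEUE_SLOTS`, `struct vits_queue`,
`process_until` and `replay_race` are hypothetical names, the pointers are
slot indexes rather than byte offsets, and "processing" a command is reduced
to advancing the read pointer.

```c
#include <assert.h>
#include <stdint.h>

#define QUEUE_SLOTS 8u   /* hypothetical ring size, in commands */

/* Hypothetical model of the vITS queue state shared by all vcpus. */
struct vits_queue {
    uint32_t creadr;     /* next slot to process */
};

/*
 * Process commands until creadr reaches `target`, or until `budget`
 * commands have been handled (which models being preempted).
 * The loop test is != because the ring wraps, so creadr > target
 * can be a perfectly valid state. Returns the number processed.
 */
static unsigned int process_until(struct vits_queue *q, uint32_t target,
                                  unsigned int budget)
{
    unsigned int done = 0;

    assert(target < QUEUE_SLOTS);
    while (q->creadr != target && done < budget) {
        q->creadr = (q->creadr + 1) % QUEUE_SLOTS;  /* "execute" one command */
        done++;
    }
    return done;
}

/* Replay the race; returns how many commands VCPUA handles after resuming. */
static unsigned int replay_race(void)
{
    struct vits_queue q = { .creadr = 0 };
    const uint32_t w1 = 2, w2 = 5;

    process_until(&q, w1, 1);   /* VCPUA: one command, then preempted */
    process_until(&q, w2, 2);   /* VCPUB: CREADR is now 3, W1 < 3 < W2 */

    /* VCPUA resumes with its stale target W1 and runs all the way round. */
    return process_until(&q, w1, 2 * QUEUE_SLOTS);
}
```

In this model only two valid commands remain (slots 3 and 4) when VCPUA
resumes, but it keeps going past W2 and around the ring until the read
pointer wraps back to W1.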

Another case I haven't decided if I'm concerned about yet is an OS which
issues two writes to GITS_CWRITER in what it thinks is the proper order
(maybe it's using barriers or atomic variables or something to try and
ensure this) but the trap for the first takes longer to get to the
handler than the second (maybe it takes an interrupt on the way in, or
more cache misses etc). This may or may not be an issue, I've not
decided yet (mainly because I haven't decided if that OS behaviour is
valid or not).

Unlike with proper hypercall preemption we do not have the ability to
record progress in the hypercall arguments, nor to indicate that a
particular call is a continuation rather than a fresh call, both of
which make this harder to deal with. It's possible something could be
constructed with some per-vcpu data, I'm still thinking it through.

Ian.


* Re: [Draft F] Xen on ARM vITS Handling
  2015-06-12 13:16       ` Ian Campbell
@ 2015-06-12 13:32         ` Julien Grall
  2015-06-12 14:05           ` Julien Grall
  2015-06-12 14:24           ` Ian Campbell
  0 siblings, 2 replies; 18+ messages in thread
From: Julien Grall @ 2015-06-12 13:32 UTC (permalink / raw)
  To: Ian Campbell
  Cc: manish.jaggi, Julien Grall, Stefano Stabellini, Vijay Kilari, xen-devel



On 12/06/2015 09:16, Ian Campbell wrote:
> On Fri, 2015-06-12 at 09:09 -0400, Julien Grall wrote:
>> Hi Ian,
>>
>> On 12/06/2015 04:52, Ian Campbell wrote:
>>> On Fri, 2015-06-12 at 14:07 +0530, Vijay Kilari wrote:
>>> So pLPIs must be routed at device assignment time because in the vLPI
>>> configuration table trap there is no mapping back to a single pLPI.
>>
>> I just remembered the exact reason that made use to differ SPI enabling.
>
> I can't parse this sentence, differ how?

deferring sorry.

>
>> When the device is assigned, the domain VCPUs are still down (even VCPU0).
>>
>> If we receive an interrupt before the VCPU0 is unpaused, the interrupt
>> will be lost. Same if the interrupt is not yet configured (i.e before
>> the vITS setup correctly the table) with your proposal.
>
> Is this any different to booting with the ITT not setup?

I don't understand your question.

> (SPIs are a slightly different case because they don't need h/w routing)

I think you mixed PPIs with SPIs. SPIs (shared private interrupt) 
requires h/w routing.

>
>> This could happen when the device is not quiescent. We had this issue on
>> the vexpress at boot time when the network card was trying to send an
>> interrupt before DOM0 is setup.
>
> I don't fully understand the issue you are trying to describe, but do
> you want to propose a change to the spec?

I actually don't know how to modify it. So it's an open question.

vgic_vcpu_inject_irq doesn't queue the interrupt if a VCPU is down. I 
think this is because the state of the VCPU wouldn't be correct.

The process would be something like:

     - Creation of the domain
	=> All vCPUs are down

     - Device is assigned to the guest
         => Enable physical LPIs

     * physical LPI is received *
	=> Will be ignored and not EOIed (VCPU0 is down)
         => The LPI will never fire again during the guest's life

     -  Domain is started by the toolstack
          => VCPU0 is online
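
The sequence above can be modelled roughly as follows. This is a sketch of
the behaviour described in this mail, not Xen code: `struct phys_irq`,
`struct vcpu_model`, `inject` and `replay` are hypothetical names, and the
"stays active until EOI" rule is taken as an assumption from the
description above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one physical interrupt. */
struct phys_irq {
    bool active;              /* taken by the hypervisor, not yet EOIed */
};

/* Hypothetical model of the target vcpu. */
struct vcpu_model {
    bool online;
    unsigned int pending;     /* interrupts that reached the vcpu */
};

/* The device raises the interrupt; returns true if it is signalled. */
static bool phys_irq_fire(struct phys_irq *irq)
{
    if (irq->active)
        return false;         /* assumed: not signalled again while active */
    irq->active = true;
    return true;
}

/* Models an inject path that drops, rather than queues, for a down vcpu. */
static bool inject(struct vcpu_model *v)
{
    if (!v->online)
        return false;         /* dropped, nothing is queued */
    v->pending++;
    return true;
}

/* The guest's EOI is the only thing that deactivates the interrupt here. */
static void guest_eoi(struct phys_irq *irq)
{
    irq->active = false;
}

/*
 * Fire once at device assignment and once after the domain has started.
 * Returns how many interrupts the guest ever saw.
 */
static unsigned int replay(bool vcpu0_online_at_assignment)
{
    struct phys_irq lpi = { .active = false };
    struct vcpu_model vcpu0 = { .online = vcpu0_online_at_assignment };

    if (phys_irq_fire(&lpi) && inject(&vcpu0))
        guest_eoi(&lpi);      /* only a running guest gets to EOI */

    vcpu0.online = true;      /* the toolstack starts the domain */

    if (phys_irq_fire(&lpi) && inject(&vcpu0))
        guest_eoi(&lpi);

    return vcpu0.pending;
}
```

With VCPU0 down at assignment time the first interrupt is dropped without
an EOI, so the second one is never signalled and the guest sees nothing.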

Regards,

-- 
Julien Grall


* Re: [Draft F] Xen on ARM vITS Handling
  2015-06-12 13:32         ` Julien Grall
@ 2015-06-12 14:05           ` Julien Grall
  2015-06-12 14:12             ` Ian Campbell
  2015-06-12 14:24           ` Ian Campbell
  1 sibling, 1 reply; 18+ messages in thread
From: Julien Grall @ 2015-06-12 14:05 UTC (permalink / raw)
  To: Julien Grall, Ian Campbell
  Cc: manish.jaggi, Julien Grall, Stefano Stabellini, Vijay Kilari, xen-devel



On 12/06/2015 09:32, Julien Grall wrote:
>
>
> On 12/06/2015 09:16, Ian Campbell wrote:
>> On Fri, 2015-06-12 at 09:09 -0400, Julien Grall wrote:
>>> Hi Ian,
>>>
>>> On 12/06/2015 04:52, Ian Campbell wrote:
>>>> On Fri, 2015-06-12 at 14:07 +0530, Vijay Kilari wrote:
>>>> So pLPIs must be routed at device assignment time because in the vLPI
>>>> configuration table trap there is no mapping back to a single pLPI.
>>>
>>> I just remembered the exact reason that made use to differ SPI enabling.
>>
>> I can't parse this sentence, differ how?
>
> deferring sorry.
>
>>
>>> When the device is assigned, the domain VCPUs are still down (even
>>> VCPU0).
>>>
>>> If we receive an interrupt before the VCPU0 is unpaused, the interrupt
>>> will be lost. Same if the interrupt is not yet configured (i.e before
>>> the vITS setup correctly the table) with your proposal.
>>
>> Is this any different to booting with the ITT not setup?
>
> I don't understand your question.
>
>> (SPIs are a slightly different case because they don't need h/w routing)
>
> I think you mixed PPIs with SPIs. SPIs (shared private interrupt)

s/private/processor/

> requires h/w routing.
>
>>
>>> This could happen when the device is not quiescent. We had this issue on
>>> the vexpress at boot time when the network card was trying to send an
>>> interrupt before DOM0 is setup.
>>
>> I don't fully understand the issue you are trying to describe, but do
>> you want to propose a change to the spec?
>
> I actually don't know how to modify it. So it's an open question.
>
> vgic_vcpu_inject_irq doesn't queue the interrupt if a VCPU is down. I
> think this is because the state of the VCPU wouldn't be correct.
>
> The process would be something like:
>
>      - Creation of the domain
>      => All vCPUs are down
>
>      - Device is assigned to the guest
>          => Enable physical LPIs
>
>      * physical LPI is received *
>      => Will be ignored and not EOIed (VCPU0 is down)
>          => The LPI will never fired again during the guest life
>
>      -  Domain is started by the toolstack
>           => VCPU0 is online
>
> Regards,
>

-- 
Julien Grall


* Re: [Draft F] Xen on ARM vITS Handling
  2015-06-12 14:05           ` Julien Grall
@ 2015-06-12 14:12             ` Ian Campbell
  0 siblings, 0 replies; 18+ messages in thread
From: Ian Campbell @ 2015-06-12 14:12 UTC (permalink / raw)
  To: Julien Grall
  Cc: Vijay Kilari, manish.jaggi, Julien Grall, xen-devel,
	Julien Grall, Stefano Stabellini

On Fri, 2015-06-12 at 10:05 -0400, Julien Grall wrote:
> 
> On 12/06/2015 09:32, Julien Grall wrote:
> >
> >
> > On 12/06/2015 09:16, Ian Campbell wrote:
> >> On Fri, 2015-06-12 at 09:09 -0400, Julien Grall wrote:
> >>> Hi Ian,
> >>>
> >>> On 12/06/2015 04:52, Ian Campbell wrote:
> >>>> On Fri, 2015-06-12 at 14:07 +0530, Vijay Kilari wrote:
> >>>> So pLPIs must be routed at device assignment time because in the vLPI
> >>>> configuration table trap there is no mapping back to a single pLPI.
> >>>
> >>> I just remembered the exact reason that made use to differ SPI enabling.
> >>
> >> I can't parse this sentence, differ how?
> >
> > deferring sorry.
> >
> >>
> >>> When the device is assigned, the domain VCPUs are still down (even
> >>> VCPU0).
> >>>
> >>> If we receive an interrupt before the VCPU0 is unpaused, the interrupt
> >>> will be lost. Same if the interrupt is not yet configured (i.e before
> >>> the vITS setup correctly the table) with your proposal.
> >>
> >> Is this any different to booting with the ITT not setup?
> >
> > I don't understand your question.
> >
> >> (SPIs are a slightly different case because they don't need h/w routing)
> >
> > I think you mixed PPIs with SPIs. SPIs (shared private interrupt)
> 
> s/private/processor/

It's "peripheral" ;-)

> 
> > requires h/w routing.
> >
> >>
> >>> This could happen when the device is not quiescent. We had this issue on
> >>> the vexpress at boot time when the network card was trying to send an
> >>> interrupt before DOM0 is setup.
> >>
> >> I don't fully understand the issue you are trying to describe, but do
> >> you want to propose a change to the spec?
> >
> > I actually don't know how to modify it. So it's an open question.
> >
> > vgic_vcpu_inject_irq doesn't queue the interrupt if a VCPU is down. I
> > think this is because the state of the VCPU wouldn't be correct.
> >
> > The process would be something like:
> >
> >      - Creation of the domain
> >      => All vCPUs are down
> >
> >      - Device is assigned to the guest
> >          => Enable physical LPIs
> >
> >      * physical LPI is received *
> >      => Will be ignored and not EOIed (VCPU0 is down)
> >          => The LPI will never fired again during the guest life
> >
> >      -  Domain is started by the toolstack
> >           => VCPU0 is online
> >
> > Regards,
> >
> 


* Re: [Draft F] Xen on ARM vITS Handling
  2015-06-12 13:32         ` Julien Grall
  2015-06-12 14:05           ` Julien Grall
@ 2015-06-12 14:24           ` Ian Campbell
  2015-06-12 17:55             ` Julien Grall
  1 sibling, 1 reply; 18+ messages in thread
From: Ian Campbell @ 2015-06-12 14:24 UTC (permalink / raw)
  To: Julien Grall
  Cc: manish.jaggi, Julien Grall, Stefano Stabellini, Vijay Kilari, xen-devel

On Fri, 2015-06-12 at 09:32 -0400, Julien Grall wrote:
> 
> On 12/06/2015 09:16, Ian Campbell wrote:
> > On Fri, 2015-06-12 at 09:09 -0400, Julien Grall wrote:
> >> Hi Ian,
> >>
> >> On 12/06/2015 04:52, Ian Campbell wrote:
> >>> On Fri, 2015-06-12 at 14:07 +0530, Vijay Kilari wrote:
> >>> So pLPIs must be routed at device assignment time because in the vLPI
> >>> configuration table trap there is no mapping back to a single pLPI.
> >>
> >> I just remembered the exact reason that made use to differ SPI enabling.
> >
> > I can't parse this sentence, differ how?
> 
> deferring sorry.
> 
> >
> >> When the device is assigned, the domain VCPUs are still down (even VCPU0).
> >>
> >> If we receive an interrupt before the VCPU0 is unpaused, the interrupt
> >> will be lost. Same if the interrupt is not yet configured (i.e before
> >> the vITS setup correctly the table) with your proposal.
> >
> > Is this any different to booting with the ITT not setup?
> 
> I don't understand your question.

During boot the ITT is not configured and a spurious event will go
undelivered to an LPI then too, even on native.

> > (SPIs are a slightly different case because they don't need h/w routing)
> 
> I think you mixed PPIs with SPIs. SPIs (shared private interrupt) 
> requires h/w routing.

I don't think they do, GICD_ICFGR (or the GICv3 equivalent) comes up in a
state where the interrupt will go _somewhere_, which differs from things
injected via the ITS.

> >> This could happen when the device is not quiescent. We had this issue on
> >> the vexpress at boot time when the network card was trying to send an
> >> interrupt before DOM0 is setup.
> >
> > I don't fully understand the issue you are trying to describe, but do
> > you want to propose a change to the spec?
> 
> I actually don't know how to modify it. So it's an open question.

For SPI too, or just for LPI?

> vgic_vcpu_inject_irq doesn't queue the interrupt if a VCPU is down. I 
> think this is because the state of the VCPU wouldn't be correct.
> 
> The process would be something like:
> 
>      - Creation of the domain
> 	=> All vCPUs are down
> 
>      - Device is assigned to the guest
>          => Enable physical LPIs
> 
>      * physical LPI is received *
> 	=> Will be ignored and not EOIed (VCPU0 is down)
>          => The LPI will never fired again during the guest life
> 
>      -  Domain is started by the toolstack
>           => VCPU0 is online

Is it sufficient to queue interrupts even for VCPUs which are down? How
does the lack of a vITT entry when this interrupt occurred affect this?


Ian.


* Re: [Draft F] Xen on ARM vITS Handling
  2015-06-12 14:24           ` Ian Campbell
@ 2015-06-12 17:55             ` Julien Grall
  2015-06-16 15:10               ` Ian Campbell
  0 siblings, 1 reply; 18+ messages in thread
From: Julien Grall @ 2015-06-12 17:55 UTC (permalink / raw)
  To: Ian Campbell
  Cc: manish.jaggi, Julien Grall, Stefano Stabellini, Vijay Kilari, xen-devel



On 12/06/2015 10:24, Ian Campbell wrote:
> On Fri, 2015-06-12 at 09:32 -0400, Julien Grall wrote:
>>
>> On 12/06/2015 09:16, Ian Campbell wrote:
>>> On Fri, 2015-06-12 at 09:09 -0400, Julien Grall wrote:
>>>> Hi Ian,
>>>>
>>>> On 12/06/2015 04:52, Ian Campbell wrote:
>>>>> On Fri, 2015-06-12 at 14:07 +0530, Vijay Kilari wrote:
>>>>> So pLPIs must be routed at device assignment time because in the vLPI
>>>>> configuration table trap there is no mapping back to a single pLPI.
>>>>
>>>> I just remembered the exact reason that made use to differ SPI enabling.
>>>
>>> I can't parse this sentence, differ how?
>>
>> deferring sorry.
>>
>>>
>>>> When the device is assigned, the domain VCPUs are still down (even VCPU0).
>>>>
>>>> If we receive an interrupt before the VCPU0 is unpaused, the interrupt
>>>> will be lost. Same if the interrupt is not yet configured (i.e before
>>>> the vITS setup correctly the table) with your proposal.
>>>
>>> Is this any different to booting with the ITT not setup?
>>
>> I don't understand your question.
>
> During boot the ITT is not configured and a spurious event will go
> undelivered to an LPI then too, even on native.

It's different. In the case of native, the event is not recorded by the 
ITS so it can fire up again later when the ITT is setup (for instance 
because the device has been reset).

With your proposal, the interrupt will go in "Active" state (from the 
CPU POV see 4.8.3) in the GIC. If you don't EOI it, it will never fire 
again when the guest has setup the vITT.

Although, the device won't know where it has to write the event ID (i.e 
in GITS_TRANSLATER) because it should not have been configured. So it 
will get ignored, right?

>>> (SPIs are a slightly different case because they don't need h/w routing)
>>
>> I think you mixed PPIs with SPIs. SPIs (shared private interrupt)
>> requires h/w routing.
>
> I don't think they do, GICD_ICFGR (or the GICv3 equivalent) come up in a
> state where the interrupt will go _somewhere_, which differs from things
> injected via the ITS.

Right, I was confused with the h/w routing meaning.

>
>>>> This could happen when the device is not quiescent. We had this issue on
>>>> the vexpress at boot time when the network card was trying to send an
>>>> interrupt before DOM0 is setup.
>>>
>>> I don't fully understand the issue you are trying to describe, but do
>>> you want to propose a change to the spec?
>>
>> I actually don't know how to modify it. So it's an open question.
>
> For SPI too, or just for LPI?

Only LPI.

>> vgic_vcpu_inject_irq doesn't queue the interrupt if a VCPU is down. I
>> think this is because the state of the VCPU wouldn't be correct.
>>
>> The process would be something like:
>>
>>       - Creation of the domain
>> 	=> All vCPUs are down
>>
>>       - Device is assigned to the guest
>>           => Enable physical LPIs
>>
>>       * physical LPI is received *
>> 	=> Will be ignored and not EOIed (VCPU0 is down)
>>           => The LPI will never fired again during the guest life
>>
>>       -  Domain is started by the toolstack
>>            => VCPU0 is online
>
> Is it sufficient to queue interrupts even for VCPUs which are down? How
> does the lack of a vITT entry when this interrupt occurred affect this?

Well, in this case we don't know which vLPI we have to inject. But 
as said above, I guess we don't care if we ensure that the device can't 
send an event (by ensuring that the device doesn't know what the 
GITS_TRANSLATER address is).

Regards,

-- 
Julien Grall


* Re: [Draft F] Xen on ARM vITS Handling
  2015-06-11  9:40 [Draft F] Xen on ARM vITS Handling Ian Campbell
                   ` (2 preceding siblings ...)
  2015-06-12 12:55 ` Ian Campbell
@ 2015-06-16 14:50 ` Vijay Kilari
  2015-06-16 15:07   ` Ian Campbell
  3 siblings, 1 reply; 18+ messages in thread
From: Vijay Kilari @ 2015-06-16 14:50 UTC (permalink / raw)
  To: Ian Campbell; +Cc: manish.jaggi, Julien Grall, Stefano Stabellini, xen-devel

On Thu, Jun 11, 2015 at 3:10 PM, Ian Campbell <ian.campbell@citrix.com> wrote:
> Draft F follows. Also at:
> http://xenbits.xen.org/people/ianc/vits/draftF.{pdf,html}
>
>
> ## Per-domain `struct pending_irq` for `vLPI`s
>
> Internally Xen uses a `struct pending_irq` to track the status of any
> pending virtual IRQ, including a virtual LPI.
>
> Upon domain creation an array of such `struct pending_irq`'s will be
> allocated to cover the range `8192..nr_lpis` (for the number of LPIs
> which the guest is configured with) and a pointer this array will be
> stored in the `struct domain`. The function `irq_to_pending` will be
> modified to lookup interupts in the LPI range in this array.
>.

nr_lpis can be large if more devices are assigned to the domain.
As I suggested on the #xenarm chat, is it OK to use an RB-tree instead of an array?

What should the value of nr_lpis be?

Regards
Vijay


* Re: [Draft F] Xen on ARM vITS Handling
  2015-06-16 14:50 ` Vijay Kilari
@ 2015-06-16 15:07   ` Ian Campbell
  0 siblings, 0 replies; 18+ messages in thread
From: Ian Campbell @ 2015-06-16 15:07 UTC (permalink / raw)
  To: Vijay Kilari; +Cc: manish.jaggi, Julien Grall, Stefano Stabellini, xen-devel

On Tue, 2015-06-16 at 20:20 +0530, Vijay Kilari wrote:
> On Thu, Jun 11, 2015 at 3:10 PM, Ian Campbell <ian.campbell@citrix.com> wrote:
> > Draft F follows. Also at:
> > http://xenbits.xen.org/people/ianc/vits/draftF.{pdf,html}
> >
> >
> > ## Per-domain `struct pending_irq` for `vLPI`s
> >
> > Internally Xen uses a `struct pending_irq` to track the status of any
> > pending virtual IRQ, including a virtual LPI.
> >
> > Upon domain creation an array of such `struct pending_irq`'s will be
> > allocated to cover the range `8192..nr_lpis` (for the number of LPIs
> > which the guest is configured with) and a pointer this array will be
> > stored in the `struct domain`. The function `irq_to_pending` will be
> > modified to lookup interupts in the LPI range in this array.
> >.
> 
> nr_lpis can be large if more devices are assigned to domain.
> As I was suggesting on #xenarm chat, is it ok to use RB-tree instead of array?
> 
> what should be value for nr_lpis?

It should be user configurable and default to the sum of the number of
events on all devices at start of day.

I think this removes the need for it to be an R-B tree, an array is
tolerable here.

Adding an R-B tree not only has a memory overhead, but it then needs
more complex management when inserting, searching, etc.
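
As a rough illustration of why the array is tolerable, the lookup reduces
to a bounds check and an index. This is a minimal sketch, assuming
`nr_lpis` is a count of LPIs starting at 8192; `struct domain_model`,
`struct pending_irq_model`, `domain_lpi_init` and `lpi_to_pending` are
hypothetical stand-ins, not the real `struct domain`, `struct pending_irq`
or `irq_to_pending`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define FIRST_LPI 8192u       /* LPI numbering starts at 8192 */

/* Hypothetical stand-in for struct pending_irq. */
struct pending_irq_model {
    unsigned int irq;
};

/* Hypothetical stand-in for the per-domain state. */
struct domain_model {
    unsigned int nr_lpis;                    /* assumed to be a count */
    struct pending_irq_model *pending_lpis;  /* one entry per vLPI */
};

/* Allocate the array once, at domain creation. Returns 0 on success. */
static int domain_lpi_init(struct domain_model *d, unsigned int nr_lpis)
{
    unsigned int i;

    assert(d != NULL);
    d->nr_lpis = 0;
    d->pending_lpis = calloc(nr_lpis, sizeof(*d->pending_lpis));
    if (d->pending_lpis == NULL && nr_lpis != 0)
        return -1;

    d->nr_lpis = nr_lpis;
    for (i = 0; i < nr_lpis; i++)
        d->pending_lpis[i].irq = FIRST_LPI + i;
    return 0;
}

/* O(1) lookup: no insertion, rebalancing or search to manage. */
static struct pending_irq_model *lpi_to_pending(struct domain_model *d,
                                                unsigned int irq)
{
    if (irq < FIRST_LPI || irq - FIRST_LPI >= d->nr_lpis)
        return NULL;          /* outside the configured LPI range */
    return &d->pending_lpis[irq - FIRST_LPI];
}
```

The cost is `nr_lpis * sizeof(struct pending_irq)` of memory per domain,
which is bounded once `nr_lpis` is fixed at start of day.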

Ian.


* Re: [Draft F] Xen on ARM vITS Handling
  2015-06-12 17:55             ` Julien Grall
@ 2015-06-16 15:10               ` Ian Campbell
  2015-06-16 16:14                 ` Julien Grall
  0 siblings, 1 reply; 18+ messages in thread
From: Ian Campbell @ 2015-06-16 15:10 UTC (permalink / raw)
  To: Julien Grall
  Cc: manish.jaggi, Julien Grall, Stefano Stabellini, Vijay Kilari, xen-devel

On Fri, 2015-06-12 at 13:55 -0400, Julien Grall wrote:
> 
> On 12/06/2015 10:24, Ian Campbell wrote:
> > On Fri, 2015-06-12 at 09:32 -0400, Julien Grall wrote:
> >>
> >> On 12/06/2015 09:16, Ian Campbell wrote:
> >>> On Fri, 2015-06-12 at 09:09 -0400, Julien Grall wrote:
> >>>> Hi Ian,
> >>>>
> >>>> On 12/06/2015 04:52, Ian Campbell wrote:
> >>>>> On Fri, 2015-06-12 at 14:07 +0530, Vijay Kilari wrote:
> >>>>> So pLPIs must be routed at device assignment time because in the vLPI
> >>>>> configuration table trap there is no mapping back to a single pLPI.
> >>>>
> >>>> I just remembered the exact reason that made use to differ SPI enabling.
> >>>
> >>> I can't parse this sentence, differ how?
> >>
> >> deferring sorry.
> >>
> >>>
> >>>> When the device is assigned, the domain VCPUs are still down (even VCPU0).
> >>>>
> >>>> If we receive an interrupt before the VCPU0 is unpaused, the interrupt
> >>>> will be lost. Same if the interrupt is not yet configured (i.e before
> >>>> the vITS setup correctly the table) with your proposal.
> >>>
> >>> Is this any different to booting with the ITT not setup?
> >>
> >> I don't understand your question.
> >
> > During boot the ITT is not configured and a spurious event will go
> > undelivered to an LPI then too, even on native.
> 
> It's different. In the case of native, the event is not recorded by the 
> ITS so it can fire up again later when the ITT is setup (for instance 
> because the device has been reset).
> 
> With your proposal, the interrupt will go in "Active" state (from the 
> CPU POV see 4.8.3) in the GIC. If you don't EOI it, it will never fire 
> again when the guest has setup the vITT.

Our other option is to EOI it, which has its own issues (namely possible
interrupt storms).

For PCI passthrough it may be that we can mitigate that somewhat by
resetting the device ourselves or something.

> Although, the device won't know where it has to write the event ID (i.e 
> in GITS_TRANSLATER) because it should not have been configured. So it 
> will get ignored, right?

I think so, yes.

> >
> >>>> This could happen when the device is not quiescent. We had this issue on
> >>>> the vexpress at boot time when the network card was trying to send an
> >>>> interrupt before DOM0 is setup.
> >>>
> >>> I don't fully understand the issue you are trying to describe, but do
> >>> you want to propose a change to the spec?
> >>
> >> I actually don't know how to modify it. So it's an open question.
> >
> > For SPI too, or just for LPI?
> 
> Only LPI.

How is it fixed for SPI?

> >> vgic_vcpu_inject_irq doesn't queue the interrupt if a VCPU is down. I
> >> think this is because the state of the VCPU wouldn't be correct.
> >>
> >> The process would be something like:
> >>
> >>       - Creation of the domain
> >> 	=> All vCPUs are down
> >>
> >>       - Device is assigned to the guest
> >>           => Enable physical LPIs
> >>
> >>       * physical LPI is received *
> >> 	=> Will be ignored and not EOIed (VCPU0 is down)
> >>           => The LPI will never fired again during the guest life
> >>
> >>       -  Domain is started by the toolstack
> >>            => VCPU0 is online
> >
> > Is it sufficient to queue interrupts even for VCPUs which are down? How
> > does the lack of a vITT entry when this interrupt occurred affect this?
> 
> Well, in this case we don't know on which vLPI we have to inject it. But 
> as said above, I guess we don't care if we ensure that the device can't 
> send an event (by ensuring that the device doesn't know the 
> GITS_TRANSLATER address is).

Yes, I think that works.

I'm not sure where to spell that out though.

Ian.


* Re: [Draft F] Xen on ARM vITS Handling
  2015-06-16 15:10               ` Ian Campbell
@ 2015-06-16 16:14                 ` Julien Grall
  0 siblings, 0 replies; 18+ messages in thread
From: Julien Grall @ 2015-06-16 16:14 UTC (permalink / raw)
  To: Ian Campbell, Julien Grall
  Cc: manish.jaggi, Julien Grall, Stefano Stabellini, Vijay Kilari, xen-devel

On 16/06/15 16:10, Ian Campbell wrote:
>>>
>>>>>> This could happen when the device is not quiescent. We had this issue on
>>>>>> the vexpress at boot time when the network card was trying to send an
>>>>>> interrupt before DOM0 is setup.
>>>>>
>>>>> I don't fully understand the issue you are trying to describe, but do
>>>>> you want to propose a change to the spec?
>>>>
>>>> I actually don't know how to modify it. So it's an open question.
>>>
>>> For SPI too, or just for LPI?
>>
>> Only LPI.
> 
> How is it fixed for SPI?

We queue the SPI but don't inject it (gic_raise_guest_irq) into the guest.

The injection will be done when the guest enables the IRQ.

We can't use this for LPIs as the vLPI -> pLPI mapping may not yet exist.
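
The queue-then-inject-on-enable behaviour for SPIs can be sketched as
below. This is an illustrative model only; `struct virq_model`,
`virq_raise` and `virq_enable` are hypothetical names and do not reflect
the actual vgic code paths.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one virtual SPI. */
struct virq_model {
    bool enabled;             /* has the guest enabled this IRQ? */
    bool queued;              /* raised but not yet injected */
    unsigned int injected;    /* times it reached the guest */
};

/* The physical interrupt arrives: remember it, inject only if enabled. */
static void virq_raise(struct virq_model *v)
{
    assert(v != NULL);
    v->queued = true;
    if (v->enabled) {
        v->queued = false;
        v->injected++;
    }
}

/* The guest enables the IRQ: anything queued earlier is injected now. */
static void virq_enable(struct virq_model *v)
{
    assert(v != NULL);
    v->enabled = true;
    if (v->queued) {
        v->queued = false;
        v->injected++;
    }
}
```

This works for an SPI because the virtual IRQ it maps to is known up
front, so there is somewhere to record the queued state. For an LPI raised
before the mapping exists there is no such slot to queue it in.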

>>>> vgic_vcpu_inject_irq doesn't queue the interrupt if a VCPU is down. I
>>>> think this is because the state of the VCPU wouldn't be correct.
>>>>
>>>> The process would be something like:
>>>>
>>>>       - Creation of the domain
>>>> 	=> All vCPUs are down
>>>>
>>>>       - Device is assigned to the guest
>>>>           => Enable physical LPIs
>>>>
>>>>       * physical LPI is received *
>>>> 	=> Will be ignored and not EOIed (VCPU0 is down)
>>>>           => The LPI will never fired again during the guest life
>>>>
>>>>       -  Domain is started by the toolstack
>>>>            => VCPU0 is online
>>>
>>> Is it sufficient to queue interrupts even for VCPUs which are down? How
>>> does the lack of a vITT entry when this interrupt occurred affect this?
>>
>> Well, in this case we don't know on which vLPI we have to inject it. But 
>> as said above, I guess we don't care if we ensure that the device can't 
>> send an event (by ensuring that the device doesn't know the 
>> GITS_TRANSLATER address is).
> 
> Yes, I think that works.
> 
> I'm not sure where to spell that out though.

We want to find a way to prevent the PCI device from sending an event. I see
multiple possibilities:
	1) Mask the event. MSI-X allows masking a specific event. There is a
similar feature for MSI but it's optional.
	2) Compose the MSI message only when the vLPI is enabled. That requires
Xen/PCI-back to delay the write to the config space.

Neither of these solutions satisfies me. But I don't have much knowledge of how
PCI-passthrough works on Xen.
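
For reference, option 1 relies on the per-vector Mask Bit in the MSI-X
table: each entry is 16 bytes and bit 0 of the Vector Control word at
offset 12 masks that vector. The sketch below operates on an in-memory
copy of one entry. `struct msix_entry_model` and the helper names are
hypothetical, and how Xen or PCI-back would actually reach the table is
not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MSIX_VECTOR_CTRL_MASKBIT (1u << 0)   /* bit 0 of Vector Control */

/* In-memory model of one MSI-X table entry (16 bytes). */
struct msix_entry_model {
    uint32_t addr_lo;         /* Message Address, offset 0 */
    uint32_t addr_hi;         /* Message Upper Address, offset 4 */
    uint32_t data;            /* Message Data, offset 8 */
    uint32_t vector_ctrl;     /* Vector Control, offset 12 */
};

/* Set or clear the per-vector mask without touching the other bits. */
static void msix_set_masked(struct msix_entry_model *e, bool masked)
{
    assert(e != NULL);
    if (masked)
        e->vector_ctrl |= MSIX_VECTOR_CTRL_MASKBIT;
    else
        e->vector_ctrl &= ~MSIX_VECTOR_CTRL_MASKBIT;
}

/* A masked vector must not be sent by the device. */
static bool msix_is_masked(const struct msix_entry_model *e)
{
    assert(e != NULL);
    return (e->vector_ctrl & MSIX_VECTOR_CTRL_MASKBIT) != 0;
}
```

Under this option the vector would stay masked from device assignment
until the guest enables the corresponding vLPI.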

Regards,

-- 
Julien Grall
