* [PATCH v10 00/32] arm64: Dom0 ITS emulation
@ 2017-05-26 17:35 Andre Przywara
  2017-05-26 17:35 ` [PATCH v10 01/32] ARM: vGIC: avoid rank lock when reading priority Andre Przywara
                   ` (31 more replies)
  0 siblings, 32 replies; 96+ messages in thread
From: Andre Przywara @ 2017-05-26 17:35 UTC (permalink / raw)
  To: Julien Grall, Stefano Stabellini
  Cc: xen-devel, Vijaya Kumar K, Vijay Kilari, Shanker Donthineni

Hi,

yet another rework, addressing comments on v9.
The same restriction as for v9 still applies: the locking is considered
somewhat insufficient and will be fixed by an upcoming rework.

Most of the fixes are smaller changes requested by the reviewers.
Some preparatory patches have been moved earlier in the series, and there
are some new patches: 04/32 to prepare gic_remove_from_queues() for
reuse, 05/32 to add gic_remove_irq(), 09/32 as a split from the former 05/28,
and finally the VGIC rank lock fix I posted earlier this week, which I put
up front in this series (01/32). It is not necessary for the ITS, but it
creates merge conflicts and I expect it to be queued earlier.
A rework removes the VCPU ID from the host LPI entry, because this information
is already in the struct pending_irq and the redundancy caused some headache.
This leads to changes in patch 11/32 in the way we inject LPIs, and also
removes some code from the command emulation where the host LPI structure
was updated.

For a detailed changelog see below.

Cheers,
Andre

----------------------------------
This series adds support for emulating an ARM GICv3 ITS interrupt
controller. For hardware which relies on the ITS to provide interrupts for
its peripherals, this code is needed to get a machine booted into Dom0 at
all. ITS emulation for DomUs is only really useful with PCI passthrough,
which is not yet available for ARM. It is expected that this feature
will be co-developed with the ITS DomU code. However, this code drop
already takes DomU emulation into account, to keep later architectural
changes to a minimum.

This is a technical preview version to allow early testing of the feature.
Things not (properly) addressed in this release:
- There is only support for Dom0 at the moment. DomU support is only really
useful with PCI passthrough, which is not there yet for ARM.
- The MOVALL command is not emulated. In our case there is really nothing
to do here. We might need to revisit this in the future for DomU support.
- The INVALL command might need some rework to be more efficient. Currently
we iterate over all mapped LPIs, which can take quite some time for a large
number of LPIs.
- Indirect tables are not supported. This affects both the host and the
virtual side.
- The ITS tables inside (Dom0) guest memory cannot easily be protected
at the moment (without restricting access to Xen as well). So for now
we trust Dom0 not to touch this memory (which the spec forbids as well).
- With malicious guests (DomUs) there is a possibility of an interrupt
storm triggered by a device. We would need to investigate what that means
for Xen and whether there is a nice way to prevent this. Disabling the LPI on
the host side would require queuing commands, which has its downsides when
issued at runtime.
- Dom0 should make sure that the ITS resources (number of LPIs, devices,
events) later handed to a DomU are really limited, as a large number of
them could mean much time spent in Xen initializing, freeing or handling those.
It is expected that the toolstack sets up a tailored ITS with just enough
resources to accommodate the needs of the actual passed-through device(s).
- The command queue locking is currently suboptimal and should be made more
fine-grained in the future, if possible.
- Support for running with an IOMMU, to map the doorbell page to all
devices, is not yet provided.


Some generic design principles:

* The current GIC code statically allocates structures for each supported
IRQ (both for the host and the guest), which is not feasible to replicate
for the ITS, given the potentially millions of LPI interrupts.
So we refrain from introducing the ITS as a first class Xen interrupt
controller, and we don't hold struct irq_desc's or struct pending_irq's
for each possible LPI.
Fortunately LPIs are only interesting to guests, so we get away with
storing only the virtual IRQ number and the guest VCPU for each allocated
host LPI, which can be stashed into one uint64_t. This data is stored in
a two-level table, which is both memory efficient and quick to access.
We hook into the existing IRQ handling and VGIC code to avoid accessing
the normal structures, providing alternative methods for getting the
needed information (priority, is enabled?) for LPIs.
Whenever a guest maps a device, we allocate the maximum required number
of struct pending_irq's, so that any triggering LPI can find its data
structure. Upon the guest actually mapping an LPI, a pointer to the
corresponding pending_irq is entered into a radix tree, so that it can
be quickly looked up.
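
To illustrate the idea, here is a rough sketch of such a two-level host
LPI table (not the actual Xen code; the names, the page-sized chunks and
the on-demand allocation are assumptions made purely for this example):

    /* Sketch of a two-level host LPI table; all names are illustrative. */
    #include <stdint.h>
    #include <stdlib.h>

    union host_lpi {
        uint64_t data;              /* allows atomic access to one entry */
        struct {
            uint32_t virt_lpi;      /* virtual LPI number injected into the guest */
            uint16_t dom_id;        /* domain this LPI is mapped to */
            uint16_t pad;           /* v10 drops the VCPU ID from this entry */
        };
    };

    #define MAX_HOST_LPI_BITS   20U
    #define LPIS_PER_CHUNK      (4096U / sizeof(union host_lpi))
    #define NR_CHUNKS           (((1U << MAX_HOST_LPI_BITS) + LPIS_PER_CHUNK - 1) / \
                                 LPIS_PER_CHUNK)

    /* First level: pointers to page-sized chunks, allocated on demand. */
    static union host_lpi *lpi_chunks[NR_CHUNKS];

    static union host_lpi *host_lpi_entry(uint32_t host_lpi)
    {
        uint32_t idx, chunk;

        if ( host_lpi < 8192 )              /* LPI interrupt IDs start at 8192 */
            return NULL;

        idx = host_lpi - 8192;
        chunk = idx / LPIS_PER_CHUNK;
        if ( chunk >= NR_CHUNKS )
            return NULL;

        if ( !lpi_chunks[chunk] )
            lpi_chunks[chunk] = calloc(LPIS_PER_CHUNK, sizeof(union host_lpi));
        if ( !lpi_chunks[chunk] )
            return NULL;

        return &lpi_chunks[chunk][idx % LPIS_PER_CHUNK];
    }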

* On the guest side we will (later) have to deal with malicious guests
trying to hog Xen with mapping requests for a lot of LPIs, for instance.
As the ITS actually uses system memory for storing status information,
we use this memory (which the guest has to provide) to naturally limit
a guest. Whenever we need information from any of the ITS tables, we
temporarily map them (which is cheap on arm64) and copy the required data.
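
As a rough illustration of that access pattern (the helper names are
hypothetical and stand in for Xen's p2m lookup plus map_domain_page()
machinery; this is not the code from the series), reading one ITS table
entry could look like this:

    /* Sketch only: read one table entry from guest memory. */
    static int read_its_table_entry(struct domain *d, paddr_t table_base,
                                    unsigned int index, void *entry, size_t size)
    {
        paddr_t addr = table_base + (paddr_t)index * size;
        void *page;

        /* The series rejects accesses crossing a page boundary. */
        if ( (addr & ~PAGE_MASK) + size > PAGE_SIZE )
            return -EINVAL;

        page = map_guest_page(d, addr & PAGE_MASK);        /* hypothetical */
        if ( !page )
            return -EINVAL;

        memcpy(entry, (char *)page + (addr & ~PAGE_MASK), size);
        unmap_guest_page(page);                            /* hypothetical */

        return 0;
    }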

* An obvious approach to handling some guest ITS commands would be to
propagate them to the host, for instance to map devices and LPIs and
to enable or disable LPIs.
However this would (later, with DomU support) create an attack vector, as
a malicious guest could try to fill the host command queue with
propagated commands.
So we try to avoid this situation: Dom0 sending a device mapping (MAPD)
command is the only time we allow queuing commands to the host ITS command
queue, as this seems to be the only reliable way of getting the
required information at the moment. At the same time we already map all
events to LPIs and enable them. This avoids sending commands later at
runtime, as we can deal with mappings and LPI enabling/disabling
internally.
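
A hedged pseudo-code sketch of that MAPD strategy (all function names are
invented for illustration; this is not the code from the series):

    /* Sketch: Dom0 maps a device, the only point where host commands are queued. */
    static int handle_guest_mapd(struct domain *d, uint32_t devid,
                                 unsigned int nr_events)
    {
        unsigned int event;
        int ret;

        /* Propagate the device mapping to the host ITS (queues a MAPD). */
        ret = host_its_map_device(d, devid, nr_events);          /* hypothetical */
        if ( ret )
            return ret;

        /*
         * Map and enable all events up front, so no host commands are
         * needed later when the guest maps or enables individual LPIs.
         */
        for ( event = 0; event < nr_events; event++ )
        {
            ret = host_its_map_event_to_lpi(d, devid, event);    /* hypothetical */
            if ( ret )
                return ret;
        }

        return 0;
    }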

To accommodate the tech preview nature of this feature at the moment, there
is a Kconfig option to enable it. Also it is supported on arm64 only, which
will most likely not change in the future.
This leads to some hideous constructs like an #ifdef'ed header file with
empty function stubs to accommodate arm32 and non-ITS builds, which share
some generic code paths with the ITS emulation.
The number of supported LPIs can be limited on the command line, in case
the number reported by the hardware is too high. As Xen cannot foresee how
many interrupts the guests will need, we cater for as many as possible.
The command line parameter is called max-lpi-bits and expresses the number
of bits required to hold an interrupt ID. It defaults to 20, if that is
lower than the number supported by the hardware.
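
For example, to cap the number of interrupt ID bits at 16, Xen could be
booted with something like the following on its command line (illustrative
only, please check the option documentation for the exact syntax):

    max-lpi-bits=16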

This code boots Dom0 on an ARM Fast Model with ITS support. I tried to
address the issues seen by people running the previous versions on real
hardware, though I couldn't verify this here myself.
So any testing, bug reports (and possibly even fixes) are very welcome.

The code can also be found on the its/v10 branch here:
git://linux-arm.org/xen-ap.git
http://www.linux-arm.org/git?p=xen-ap.git;a=shortlog;h=refs/heads/its/v10

Cheers,
Andre

Changelog v9 ... v10:
no changes to {06, 07, 12, 15, 20, 29, 30, 32}/32
- [01/32]: fix for rank lock deadlock as posted before
- [02/32]: replace arbitrary 16 bits for DomU interrupt IDs with 10 bits
- [03/32]: handle functions not dealing with LPIs as well
- [04/32]: new patch to rename gic_remove_from_queues and remove lock
- [05/32]: new patch to introduce helper for removing IRQs from the VGIC
- [06/32]: adapt to previous changes
- [08/32]: use memset to clear the whole structure
- [09/32]: split off from former 05/28
- [10/32]: moved up front, initialize VCPU ID
- [11/32]: remove VCPU ID from host entry, rework LPI injection, moved
           out part dealing with LPI priority caching
- [13/32]: add a hint about boolean variables
- [14/32]: fix bool_t type usage
- [16/32]: replace magic value, always use intid_bits for TYPER generation,
           add memory barrier
- [17/32]: add comments about ITS table layout, remove le64_to_cpu(),
           simplify CTLR read and remove lock, use atomic read and add comment
           in CREADR read, add TODO and ASSERT about crafting ITS tables,
           add empty lines in switch/case, move code block
- [18/32]: consistent use of data types, comments moved out to earlier patch
- [19/32]: fold in get_host_lpi(), rename veventid identifier
- [21/32]: move variable assignment
- [22/32]: add TODO locking comment, use new gic_remove_irq() function
- [23/32]: remove no longer needed VCPU ID from host LPI functions, add
           locking TODO, use goto out
- [24/32]: explain reason for LR check, make LRs unsigned, move PRISTINE
           check into one place
- [25/32]: mention MAPI, remove VCPU ID from host LPI updates, use atomic
           write for priority update, remove outdated comment, explain
           error handling path, check for valid property table
- [26/32]: remove update of VCPU ID in the host_lpi structure, add locking TODO
- [27/32]: fix error path
- [28/32]: add comment about physical LPI, use generic function to remove IRQ,
           remove redundant clear_bit
- [31/32]: make vgic_v3_its_init_virtual() static (and move it), move comment,
           remove unneeded call to vgic_v3_its_free_domain()

Changelog v8 ... v9:
- [01/28]: initialize number of interrupt IDs for DomUs also
- [02/28]: move priority reading back up front
- [03/28]: enumerate all call sites in commit message, add ASSERTs,
           add "unlikely" hints, avoid skipping ASSERTs, add comment to
           irq_to_pending() definition
- [04/28]: explain expectation of device state while destroying domain
- [05/28]: document case of invalid LPI, change dummy priority to 0xff
- [08/28]: check cross page boundary condition early in function
- [10/28]: initialize status and lr member as well
- [11/28]: check lpi_vcpu_id to cover all virtual CPUs
- [12/28]: add spin lock ASSERT
- [13/28]: introduce types for our ITS table entries, fix error messages
- [14/28]: use new ITS table entry types
- [15/28]: new patch to introduce pending_irq lookup function
- [17/28]: verify size of collection table entry
- [18/28]: use new pending_irq lookup function
- [19/28]: use new pending_irq lookup function, collection table type and
           vgic_init_pending_irq, add Dom0 ASSERT and unmap devices for DomUs
- [20/28]: document PRISTINE_LPI flag, fix typo, avoid double insertion of
           the same LPI into different LRs
- [21/28]: use new pending_irq lookup function, avoid explicit LPI number
           parameter
- [22/28]: add physical affinity TODO, use new table type and pending_irq
           lookup function, fix error message
- [24/28]: use pending_irq lookup function, drop explicit LPI number parameter
- [25/28]: drop explicit LPI number parameter
- [27/28]: use new ITS table entry type

Changelog v7 ... v8:
- drop list parameter and rename to gicv3_its_make_hwdom_dt_nodes()
- remove rebase artifacts
- add irq_enter/irq_exit() calls
- propagate number of host LPIs and number of event IDs to Dom0
- add proper coverage of all addresses in ITS MMIO handler
- avoid vcmd_lock for CBASER writes
- fix missing irqsave/irqrestore on VGIC VCPU lock
- move struct pending_irq use under the VGIC VCPU lock
- protect gic_raise_guest_irq() against NULL pending_irq
- improve device and collection table entry size documentation
- count number of ITSes to increase mmio_count
- rework MAPD, DISCARD, MAPTI and MOVI to take proper locks
- properly rollback failing MAPD and MAPTI calls
- rework functions to update property table
- return error on vgic_access_guest_memory crossing page boundary
- make sure CREADR access is atomic

Changelog v5 ... v6:
- reordered patches to allow splitting the series
- introduced functions later to avoid warnings on intermediate builds
- refactored common code changes into separate patches
- dropped GENMASK_ULL and BIT_ULL (both patches and their usage later)
- rework locking in MMIO register reads and writes
- protect new code from being executed without an ITS being configured
- fix vgic_access_guest_memory (now a separate patch)
- some more comments and TODOs

Changelog v4 ... v5:
- adding many comments
- spinlock asserts
- rename r_host_lpis to max_host_lpi_ids
- remove max_its_device_bits command line
- add warning on high number of LPIs
- avoid potential leak on host MAPD
- properly handle nr_events rounding
- remove unmap_all_devices(), replace with ASSERT
- add barriers for (lockless) host LPI lookups
- add proper locking in ITS and redist MMIO register handling
- rollback failing device mapping
- fix various printks
- add vgic_access_guest_memory() and use it
- (getting rid of page mapping functions and helpers)
- drop table mapping / unmapping on redist/ITS enable/disable
- minor reworks in functions as per review comments
- fix ITS enablement check
- move lpi_to_pending() and lpi_get_priority() to vgic_ops
- move do_LPI() to gic_hw_ops
- whitespace and hard tabs fixes
- introduce ITS domain init function (and use it for the rbtree)
- enable IRQs around do_LPI
- implement TODOs for later optimizations
- add "v" prefix to variables holding virtual properties
- provide locked and normal versions of read/write_itte
- only CLEAR LPI if not already guest visible (plus comment)
- update LPI property on MAPTI
- store vcpu_id in pending_irq for LPIs (helps INVALL)
- improve INVALL implementation to only cover LPIs on this VCPU
- improve virtual BASE register initialization
- limit number of virtual LPIs to 24 bits (Linux bug at 32??)
- only inject LPIs if redistributor is actually enabled

Changelog v3 .. v4:
- make HAS_ITS depend on EXPERT
- introduce new patch 02 to initialize host ITS early
- fix cmd_lock init position
- introduce warning on high number of LPI allocations
- various int -> unsigned fixes
- adding and improving comments
- rate limit ITS command queue full msg
- drop unneeded checks
- validate against allowed number of device IDs
- avoid memory leaks when removing devices
- improve algorithm for finding free host LPI
- convert unmap_all_devices from goto to while loop
- add message on remapping ITS device
- name virtual device / event IDs properly
- use atomic read when reading ITT entry

Changelog v2 .. v3:
- preallocate struct pending_irq's
- map ITS and redistributor tables only on demand
- store property, enable and pending bit in struct pending_irq
- improve error checking and handling
- add comments

Changelog v1 .. v2:
- clean up header file inclusion
- rework host ITS table allocation: observe attributes, many fixes
- remove patch 1 to export __flush_dcache_area, use existing function instead
- use number of LPIs internally instead of number of bits
- keep host_its_list as private as possible
- keep struct its_devices private
- rework gicv3_its_map_guest_devices
- fix rbtree issues
- more error handling and propagation
- cope with GICv4 implementations (but no virtual LPI features!)
- abstract host and guest ITSes by using doorbell addresses
- join per-redistributor variables into one per-CPU structure
- fix data types (unsigned int)
- many minor bug fixes

(Rough) changelog RFC-v2 .. v1:
- split host ITS driver into gic-v3-lpi.c and gic-v3-its.c part
- rename virtual ITS driver file to vgic-v3-its.c
- use macros and named constants for all magic numbers
- use atomic accessors for accessing the host LPI data
- remove leftovers from connecting virtual and host ITSes
- bail out if host ITS is disabled in the DT
- rework map/unmap_guest_pages():
    - split off p2m part as get/put_guest_pages (to be done on allocation)
    - get rid of vmap, using map_domain_page() instead
- delay allocation of virtual tables until actual LPI/ITS enablement
- properly size both virtual and physical tables upon allocation
- fix put_domain() locking issues in physdev_op and LPI handling code
- add and extend comments in various areas
- fix lotsa coding style and white space issues, including comment style
- add locking to data structures not yet covered
- fix various locking issues
- use an rbtree to deal with ITS devices (instead of a list)
- properly handle memory attributes for ITS tables
- handle cacheable/non-cacheable ITS table mappings
- sanitize guest provided ITS/LPI table attributes
- fix breakage on non-GICv2 compatible host GICv3 controllers
- add command line parameters on top of Kconfig options
- properly wait for an ITS to become quiescent before enabling it
- handle host ITS command queue errors
- actually wait for host ITS command completion (CREADR == CWRITER)
- fix ARM32 compilation
- various patch splits and reorderings

Andre Przywara (31):
  ARM: vGIC: avoid rank lock when reading priority
  ARM: GICv3: setup number of LPI bits for a GICv3 guest
  ARM: vGIC: move irq_to_pending() calls under the VGIC VCPU lock
  ARM: vGIC: rework gic_remove_from_queues()
  ARM: vGIC: introduce gic_remove_irq()
  ARM: GIC: Add checks for NULL pointer pending_irq's
  ARM: GICv3: introduce separate pending_irq structs for LPIs
  ARM: GIC: export and extend vgic_init_pending_irq()
  ARM: vGIC: cache virtual LPI priority in struct pending_irq
  ARM: vGIC: add LPI VCPU ID to struct pending_irq
  ARM: GICv3: forward pending LPIs to guests
  ARM: GICv3: enable ITS and LPIs on the host
  ARM: vGICv3: handle virtual LPI pending and property tables
  ARM: vGICv3: re-use vgic_reg64_check_access
  ARM: vGIC: advertise LPI support
  ARM: vITS: add command handling stub and MMIO emulation
  ARM: vITS: introduce translation table walks
  ARM: vITS: provide access to struct pending_irq
  ARM: vITS: handle INT command
  ARM: vITS: handle MAPC command
  ARM: vITS: handle CLEAR command
  ARM: vITS: handle MAPD command
  ARM: GICv3: handle unmapped LPIs
  ARM: vITS: handle MAPTI/MAPI command
  ARM: vITS: handle MOVI command
  ARM: vITS: handle DISCARD command
  ARM: vITS: handle INV command
  ARM: vITS: handle INVALL command
  ARM: vITS: increase mmio_count for each ITS
  ARM: vITS: create and initialize virtual ITSes for Dom0
  ARM: vITS: create ITS subnodes for Dom0 DT

Vijaya Kumar K (1):
  ARM: introduce vgic_access_guest_memory()

 xen/arch/arm/gic-v2.c            |    7 +
 xen/arch/arm/gic-v3-its.c        |  180 +++++
 xen/arch/arm/gic-v3-lpi.c        |   93 ++-
 xen/arch/arm/gic-v3.c            |   29 +-
 xen/arch/arm/gic.c               |   99 ++-
 xen/arch/arm/vgic-v2.c           |   15 +
 xen/arch/arm/vgic-v3-its.c       | 1462 +++++++++++++++++++++++++++++++++++++-
 xen/arch/arm/vgic-v3.c           |  303 +++++++-
 xen/arch/arm/vgic.c              |  140 +++-
 xen/include/asm-arm/domain.h     |   16 +-
 xen/include/asm-arm/event.h      |    3 +
 xen/include/asm-arm/gic.h        |    5 +-
 xen/include/asm-arm/gic_v3_its.h |   42 ++
 xen/include/asm-arm/vgic-emul.h  |    9 +
 xen/include/asm-arm/vgic.h       |   18 +-
 15 files changed, 2353 insertions(+), 68 deletions(-)

-- 
2.9.0



* [PATCH v10 01/32] ARM: vGIC: avoid rank lock when reading priority
  2017-05-26 17:35 [PATCH v10 00/32] arm64: Dom0 ITS emulation Andre Przywara
@ 2017-05-26 17:35 ` Andre Przywara
  2017-05-30 10:47   ` Julien Grall
  2017-05-26 17:35 ` [PATCH v10 02/32] ARM: GICv3: setup number of LPI bits for a GICv3 guest Andre Przywara
                   ` (30 subsequent siblings)
  31 siblings, 1 reply; 96+ messages in thread
From: Andre Przywara @ 2017-05-26 17:35 UTC (permalink / raw)
  To: Julien Grall, Stefano Stabellini
  Cc: xen-devel, Vijaya Kumar K, Vijay Kilari, Shanker Donthineni

So far, when reading the priority value of a virtual interrupt, we were
taking the respective rank lock.
However for forwarded interrupts (Dom0 only so far) this may lead to a
deadlock with the following call chain:
- MMIO access to change the IRQ affinity, calling the ITARGETSR handler
- this handler takes the appropriate rank lock and calls vgic_store_itargetsr()
- vgic_store_itargetsr() will eventually call vgic_migrate_irq()
- if this IRQ is already in-flight, it will remove it from the old
  VCPU and inject it into the new one, by calling vgic_vcpu_inject_irq()
- vgic_vcpu_inject_irq() will call vgic_get_virq_priority()
- vgic_get_virq_priority() tries to take the rank lock - again!
It seems like this code path has never been exercised before.

Fix this by avoiding taking the lock in vgic_get_virq_priority() (like we
do in vgic_get_target_vcpu()).
Actually we are just reading one byte, and priority changes while
interrupts are handled are a benign race that can happen on real hardware
too. So it looks safe to just use read_atomic() instead.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
---
 xen/arch/arm/vgic.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
index 83569b0..54b2aad 100644
--- a/xen/arch/arm/vgic.c
+++ b/xen/arch/arm/vgic.c
@@ -227,14 +227,8 @@ struct vcpu *vgic_get_target_vcpu(struct vcpu *v, unsigned int virq)
 static int vgic_get_virq_priority(struct vcpu *v, unsigned int virq)
 {
     struct vgic_irq_rank *rank = vgic_rank_irq(v, virq);
-    unsigned long flags;
-    int priority;
-
-    vgic_lock_rank(v, rank, flags);
-    priority = rank->priority[virq & INTERRUPT_RANK_MASK];
-    vgic_unlock_rank(v, rank, flags);
 
-    return priority;
+    return read_atomic(&rank->priority[virq & INTERRUPT_RANK_MASK]);
 }
 
 bool vgic_migrate_irq(struct vcpu *old, struct vcpu *new, unsigned int irq)
-- 
2.9.0



* [PATCH v10 02/32] ARM: GICv3: setup number of LPI bits for a GICv3 guest
  2017-05-26 17:35 [PATCH v10 00/32] arm64: Dom0 ITS emulation Andre Przywara
  2017-05-26 17:35 ` [PATCH v10 01/32] ARM: vGIC: avoid rank lock when reading priority Andre Przywara
@ 2017-05-26 17:35 ` Andre Przywara
  2017-05-30 10:54   ` Julien Grall
  2017-05-26 17:35 ` [PATCH v10 03/32] ARM: vGIC: move irq_to_pending() calls under the VGIC VCPU lock Andre Przywara
                   ` (29 subsequent siblings)
  31 siblings, 1 reply; 96+ messages in thread
From: Andre Przywara @ 2017-05-26 17:35 UTC (permalink / raw)
  To: Julien Grall, Stefano Stabellini
  Cc: xen-devel, Vijaya Kumar K, Vijay Kilari, Shanker Donthineni

The host supports a certain number of LPI identifiers, as stored in
the GICD_TYPER register.
Store this number from the hardware register in vgic_v3_hw to allow
injecting the very same number into a guest (Dom0).
DomUs get the legacy number of 10 bits here, since for now they only see
SPIs, so they do not need more. This should be revisited once we get
proper DomU ITS support.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
---
 xen/arch/arm/gic-v3.c        |  6 +++++-
 xen/arch/arm/vgic-v3.c       | 10 +++++++++-
 xen/include/asm-arm/domain.h |  1 +
 xen/include/asm-arm/vgic.h   |  3 ++-
 4 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/xen/arch/arm/gic-v3.c b/xen/arch/arm/gic-v3.c
index a559e5e..29c8964 100644
--- a/xen/arch/arm/gic-v3.c
+++ b/xen/arch/arm/gic-v3.c
@@ -1579,6 +1579,7 @@ static int __init gicv3_init(void)
 {
     int res, i;
     uint32_t reg;
+    unsigned int intid_bits;
 
     if ( !cpu_has_gicv3 )
     {
@@ -1622,8 +1623,11 @@ static int __init gicv3_init(void)
                i, r->base, r->base + r->size);
     }
 
+    reg = readl_relaxed(GICD + GICD_TYPER);
+    intid_bits = GICD_TYPE_ID_BITS(reg);
+
     vgic_v3_setup_hw(dbase, gicv3.rdist_count, gicv3.rdist_regions,
-                     gicv3.rdist_stride);
+                     gicv3.rdist_stride, intid_bits);
     gicv3_init_v2();
 
     spin_lock_init(&gicv3.lock);
diff --git a/xen/arch/arm/vgic-v3.c b/xen/arch/arm/vgic-v3.c
index d10757a..87f5fb3 100644
--- a/xen/arch/arm/vgic-v3.c
+++ b/xen/arch/arm/vgic-v3.c
@@ -57,18 +57,21 @@ static struct {
     unsigned int nr_rdist_regions;
     const struct rdist_region *regions;
     uint32_t rdist_stride; /* Re-distributor stride */
+    unsigned int intid_bits;  /* Number of interrupt ID bits */
 } vgic_v3_hw;
 
 void vgic_v3_setup_hw(paddr_t dbase,
                       unsigned int nr_rdist_regions,
                       const struct rdist_region *regions,
-                      uint32_t rdist_stride)
+                      uint32_t rdist_stride,
+                      unsigned int intid_bits)
 {
     vgic_v3_hw.enabled = 1;
     vgic_v3_hw.dbase = dbase;
     vgic_v3_hw.nr_rdist_regions = nr_rdist_regions;
     vgic_v3_hw.regions = regions;
     vgic_v3_hw.rdist_stride = rdist_stride;
+    vgic_v3_hw.intid_bits = intid_bits;
 }
 
 static struct vcpu *vgic_v3_irouter_to_vcpu(struct domain *d, uint64_t irouter)
@@ -1482,6 +1485,8 @@ static int vgic_v3_domain_init(struct domain *d)
 
             first_cpu += size / d->arch.vgic.rdist_stride;
         }
+
+        d->arch.vgic.intid_bits = vgic_v3_hw.intid_bits;
     }
     else
     {
@@ -1497,6 +1502,9 @@ static int vgic_v3_domain_init(struct domain *d)
         d->arch.vgic.rdist_regions[0].base = GUEST_GICV3_GICR0_BASE;
         d->arch.vgic.rdist_regions[0].size = GUEST_GICV3_GICR0_SIZE;
         d->arch.vgic.rdist_regions[0].first_cpu = 0;
+
+        /* TODO: only SPIs for now, adjust this when guests need LPIs */
+        d->arch.vgic.intid_bits = 10;
     }
 
     ret = vgic_v3_its_init_domain(d);
diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h
index 6de8082..7c3829d 100644
--- a/xen/include/asm-arm/domain.h
+++ b/xen/include/asm-arm/domain.h
@@ -111,6 +111,7 @@ struct arch_domain
         uint32_t rdist_stride;              /* Re-Distributor stride */
         struct rb_root its_devices;         /* Devices mapped to an ITS */
         spinlock_t its_devices_lock;        /* Protects the its_devices tree */
+        unsigned int intid_bits;
 #endif
     } vgic;
 
diff --git a/xen/include/asm-arm/vgic.h b/xen/include/asm-arm/vgic.h
index 544867a..df75064 100644
--- a/xen/include/asm-arm/vgic.h
+++ b/xen/include/asm-arm/vgic.h
@@ -346,7 +346,8 @@ struct rdist_region;
 void vgic_v3_setup_hw(paddr_t dbase,
                       unsigned int nr_rdist_regions,
                       const struct rdist_region *regions,
-                      uint32_t rdist_stride);
+                      uint32_t rdist_stride,
+                      unsigned int intid_bits);
 #endif
 
 #endif /* __ASM_ARM_VGIC_H__ */
-- 
2.9.0



* [PATCH v10 03/32] ARM: vGIC: move irq_to_pending() calls under the VGIC VCPU lock
  2017-05-26 17:35 [PATCH v10 00/32] arm64: Dom0 ITS emulation Andre Przywara
  2017-05-26 17:35 ` [PATCH v10 01/32] ARM: vGIC: avoid rank lock when reading priority Andre Przywara
  2017-05-26 17:35 ` [PATCH v10 02/32] ARM: GICv3: setup number of LPI bits for a GICv3 guest Andre Przywara
@ 2017-05-26 17:35 ` Andre Przywara
  2017-05-30 11:08   ` Julien Grall
  2017-05-26 17:35 ` [PATCH v10 04/32] ARM: vGIC: rework gic_remove_from_queues() Andre Przywara
                   ` (28 subsequent siblings)
  31 siblings, 1 reply; 96+ messages in thread
From: Andre Przywara @ 2017-05-26 17:35 UTC (permalink / raw)
  To: Julien Grall, Stefano Stabellini
  Cc: xen-devel, Vijaya Kumar K, Vijay Kilari, Shanker Donthineni

So far irq_to_pending() is just a convenience function to look up
statically allocated arrays. This will change with LPIs, which are
more dynamic, so the memory for their struct pending_irq might go away.
The proper answer to the issue of preventing stale pointers is
ref-counting, which requires more changes and will be introduced in
a later rework.
For now move the irq_to_pending() calls that are used with LPIs under the
VGIC VCPU lock, and only use the returned pointer while holding the lock.
This prevents the memory from being freed while we use it.
For the sake of completeness we take care of all irq_to_pending()
users, even those which will never deal with LPIs.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
---
 xen/arch/arm/gic.c  |  5 ++++-
 xen/arch/arm/vgic.c | 39 ++++++++++++++++++++++++++++++---------
 2 files changed, 34 insertions(+), 10 deletions(-)

diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index da19130..dcb1783 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -402,10 +402,13 @@ static inline void gic_add_to_lr_pending(struct vcpu *v, struct pending_irq *n)
 
 void gic_remove_from_queues(struct vcpu *v, unsigned int virtual_irq)
 {
-    struct pending_irq *p = irq_to_pending(v, virtual_irq);
+    struct pending_irq *p;
     unsigned long flags;
 
     spin_lock_irqsave(&v->arch.vgic.lock, flags);
+
+    p = irq_to_pending(v, virtual_irq);
+
     if ( !list_empty(&p->lr_queue) )
         list_del_init(&p->lr_queue);
     spin_unlock_irqrestore(&v->arch.vgic.lock, flags);
diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
index 54b2aad..69d732b 100644
--- a/xen/arch/arm/vgic.c
+++ b/xen/arch/arm/vgic.c
@@ -234,23 +234,29 @@ static int vgic_get_virq_priority(struct vcpu *v, unsigned int virq)
 bool vgic_migrate_irq(struct vcpu *old, struct vcpu *new, unsigned int irq)
 {
     unsigned long flags;
-    struct pending_irq *p = irq_to_pending(old, irq);
+    struct pending_irq *p;
+
+    spin_lock_irqsave(&old->arch.vgic.lock, flags);
+
+    p = irq_to_pending(old, irq);
 
     /* nothing to do for virtual interrupts */
     if ( p->desc == NULL )
+    {
+        spin_unlock_irqrestore(&old->arch.vgic.lock, flags);
         return true;
+    }
 
     /* migration already in progress, no need to do anything */
     if ( test_bit(GIC_IRQ_GUEST_MIGRATING, &p->status) )
     {
         gprintk(XENLOG_WARNING, "irq %u migration failed: requested while in progress\n", irq);
+        spin_unlock_irqrestore(&old->arch.vgic.lock, flags);
         return false;
     }
 
     perfc_incr(vgic_irq_migrates);
 
-    spin_lock_irqsave(&old->arch.vgic.lock, flags);
-
     if ( list_empty(&p->inflight) )
     {
         irq_set_affinity(p->desc, cpumask_of(new->processor));
@@ -285,6 +291,13 @@ void arch_move_irqs(struct vcpu *v)
     struct vcpu *v_target;
     int i;
 
+    /*
+     * We don't migrate LPIs at the moment.
+     * If we ever do, we must make sure that the struct pending_irq does
+     * not go away, as there is no lock preventing this here.
+     */
+    ASSERT(!is_lpi(vgic_num_irqs(d) - 1));
+
     for ( i = 32; i < vgic_num_irqs(d); i++ )
     {
         v_target = vgic_get_target_vcpu(v, i);
@@ -299,6 +312,7 @@ void vgic_disable_irqs(struct vcpu *v, uint32_t r, int n)
 {
     const unsigned long mask = r;
     struct pending_irq *p;
+    struct irq_desc *desc;
     unsigned int irq;
     unsigned long flags;
     int i = 0;
@@ -307,14 +321,19 @@ void vgic_disable_irqs(struct vcpu *v, uint32_t r, int n)
     while ( (i = find_next_bit(&mask, 32, i)) < 32 ) {
         irq = i + (32 * n);
         v_target = vgic_get_target_vcpu(v, irq);
+
+        spin_lock_irqsave(&v_target->arch.vgic.lock, flags);
         p = irq_to_pending(v_target, irq);
         clear_bit(GIC_IRQ_GUEST_ENABLED, &p->status);
         gic_remove_from_queues(v_target, irq);
-        if ( p->desc != NULL )
+        desc = p->desc;
+        spin_unlock_irqrestore(&v_target->arch.vgic.lock, flags);
+
+        if ( desc != NULL )
         {
-            spin_lock_irqsave(&p->desc->lock, flags);
-            p->desc->handler->disable(p->desc);
-            spin_unlock_irqrestore(&p->desc->lock, flags);
+            spin_lock_irqsave(&desc->lock, flags);
+            desc->handler->disable(desc);
+            spin_unlock_irqrestore(&desc->lock, flags);
         }
         i++;
     }
@@ -349,9 +368,9 @@ void vgic_enable_irqs(struct vcpu *v, uint32_t r, int n)
     while ( (i = find_next_bit(&mask, 32, i)) < 32 ) {
         irq = i + (32 * n);
         v_target = vgic_get_target_vcpu(v, irq);
+        spin_lock_irqsave(&v_target->arch.vgic.lock, flags);
         p = irq_to_pending(v_target, irq);
         set_bit(GIC_IRQ_GUEST_ENABLED, &p->status);
-        spin_lock_irqsave(&v_target->arch.vgic.lock, flags);
         if ( !list_empty(&p->inflight) && !test_bit(GIC_IRQ_GUEST_VISIBLE, &p->status) )
             gic_raise_guest_irq(v_target, irq, p->priority);
         spin_unlock_irqrestore(&v_target->arch.vgic.lock, flags);
@@ -460,7 +479,7 @@ void vgic_clear_pending_irqs(struct vcpu *v)
 void vgic_vcpu_inject_irq(struct vcpu *v, unsigned int virq)
 {
     uint8_t priority;
-    struct pending_irq *iter, *n = irq_to_pending(v, virq);
+    struct pending_irq *iter, *n;
     unsigned long flags;
     bool running;
 
@@ -468,6 +487,8 @@ void vgic_vcpu_inject_irq(struct vcpu *v, unsigned int virq)
 
     spin_lock_irqsave(&v->arch.vgic.lock, flags);
 
+    n = irq_to_pending(v, virq);
+
     /* vcpu offline */
     if ( test_bit(_VPF_down, &v->pause_flags) )
     {
-- 
2.9.0



* [PATCH v10 04/32] ARM: vGIC: rework gic_remove_from_queues()
  2017-05-26 17:35 [PATCH v10 00/32] arm64: Dom0 ITS emulation Andre Przywara
                   ` (2 preceding siblings ...)
  2017-05-26 17:35 ` [PATCH v10 03/32] ARM: vGIC: move irq_to_pending() calls under the VGIC VCPU lock Andre Przywara
@ 2017-05-26 17:35 ` Andre Przywara
  2017-05-30 11:15   ` Julien Grall
  2017-05-26 17:35 ` [PATCH v10 05/32] ARM: vGIC: introduce gic_remove_irq() Andre Przywara
                   ` (27 subsequent siblings)
  31 siblings, 1 reply; 96+ messages in thread
From: Andre Przywara @ 2017-05-26 17:35 UTC (permalink / raw)
  To: Julien Grall, Stefano Stabellini
  Cc: xen-devel, Vijaya Kumar K, Vijay Kilari, Shanker Donthineni

The function name gic_remove_from_queues() was a bit of a misnomer,
since it just removes an IRQ from the pending queue, not both queues.
Rename the function to make this clearer, pass it a pointer to
a struct pending_irq directly, and rely on the VGIC VCPU lock being
already taken, so it can be used in more places.
Replace the list removal in gic_clear_pending_irqs() with a call to
this function.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
---
 xen/arch/arm/gic.c        | 12 +++---------
 xen/arch/arm/vgic.c       |  2 +-
 xen/include/asm-arm/gic.h |  2 +-
 3 files changed, 5 insertions(+), 11 deletions(-)

diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index dcb1783..9dde146 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -400,18 +400,12 @@ static inline void gic_add_to_lr_pending(struct vcpu *v, struct pending_irq *n)
     list_add_tail(&n->lr_queue, &v->arch.vgic.lr_pending);
 }
 
-void gic_remove_from_queues(struct vcpu *v, unsigned int virtual_irq)
+void gic_remove_from_lr_pending(struct vcpu *v, struct pending_irq *p)
 {
-    struct pending_irq *p;
-    unsigned long flags;
-
-    spin_lock_irqsave(&v->arch.vgic.lock, flags);
-
-    p = irq_to_pending(v, virtual_irq);
+    ASSERT(spin_is_locked(&v->arch.vgic.lock));
 
     if ( !list_empty(&p->lr_queue) )
         list_del_init(&p->lr_queue);
-    spin_unlock_irqrestore(&v->arch.vgic.lock, flags);
 }
 
 void gic_raise_inflight_irq(struct vcpu *v, unsigned int virtual_irq)
@@ -612,7 +606,7 @@ void gic_clear_pending_irqs(struct vcpu *v)
 
     v->arch.lr_mask = 0;
     list_for_each_entry_safe ( p, t, &v->arch.vgic.lr_pending, lr_queue )
-        list_del_init(&p->lr_queue);
+        gic_remove_from_lr_pending(v, p);
 }
 
 int gic_events_need_delivery(void)
diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
index 69d732b..3993965 100644
--- a/xen/arch/arm/vgic.c
+++ b/xen/arch/arm/vgic.c
@@ -325,7 +325,7 @@ void vgic_disable_irqs(struct vcpu *v, uint32_t r, int n)
         spin_lock_irqsave(&v_target->arch.vgic.lock, flags);
         p = irq_to_pending(v_target, irq);
         clear_bit(GIC_IRQ_GUEST_ENABLED, &p->status);
-        gic_remove_from_queues(v_target, irq);
+        gic_remove_from_lr_pending(v_target, p);
         desc = p->desc;
         spin_unlock_irqrestore(&v_target->arch.vgic.lock, flags);
 
diff --git a/xen/include/asm-arm/gic.h b/xen/include/asm-arm/gic.h
index 836a103..3130634 100644
--- a/xen/include/asm-arm/gic.h
+++ b/xen/include/asm-arm/gic.h
@@ -243,7 +243,7 @@ extern void init_maintenance_interrupt(void);
 extern void gic_raise_guest_irq(struct vcpu *v, unsigned int irq,
         unsigned int priority);
 extern void gic_raise_inflight_irq(struct vcpu *v, unsigned int virtual_irq);
-extern void gic_remove_from_queues(struct vcpu *v, unsigned int virtual_irq);
+extern void gic_remove_from_lr_pending(struct vcpu *v, struct pending_irq *p);
 
 /* Accept an interrupt from the GIC and dispatch its handler */
 extern void gic_interrupt(struct cpu_user_regs *regs, int is_fiq);
-- 
2.9.0



* [PATCH v10 05/32] ARM: vGIC: introduce gic_remove_irq()
  2017-05-26 17:35 [PATCH v10 00/32] arm64: Dom0 ITS emulation Andre Przywara
                   ` (3 preceding siblings ...)
  2017-05-26 17:35 ` [PATCH v10 04/32] ARM: vGIC: rework gic_remove_from_queues() Andre Przywara
@ 2017-05-26 17:35 ` Andre Przywara
  2017-05-30 11:31   ` Julien Grall
  2017-05-26 17:35 ` [PATCH v10 06/32] ARM: GIC: Add checks for NULL pointer pending_irq's Andre Przywara
                   ` (26 subsequent siblings)
  31 siblings, 1 reply; 96+ messages in thread
From: Andre Przywara @ 2017-05-26 17:35 UTC (permalink / raw)
  To: Julien Grall, Stefano Stabellini
  Cc: xen-devel, Vijaya Kumar K, Vijay Kilari, Shanker Donthineni

To avoid code duplication in a later patch, introduce a generic function
to remove a virtual IRQ from the VGIC.
Call that function instead of the open-coded version in vgic_migrate_irq().

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
---
 xen/arch/arm/gic.c        | 9 +++++++++
 xen/arch/arm/vgic.c       | 1 +
 xen/include/asm-arm/gic.h | 1 +
 3 files changed, 11 insertions(+)

diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index 9dde146..3a6184a 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -408,6 +408,15 @@ void gic_remove_from_lr_pending(struct vcpu *v, struct pending_irq *p)
         list_del_init(&p->lr_queue);
 }
 
+void gic_remove_irq(struct vcpu *v, struct pending_irq *p)
+{
+    ASSERT(spin_is_locked(&v->arch.vgic.lock));
+
+    clear_bit(GIC_IRQ_GUEST_QUEUED, &p->status);
+    list_del_init(&p->inflight);
+    gic_remove_from_lr_pending(v, p);
+}
+
 void gic_raise_inflight_irq(struct vcpu *v, unsigned int virtual_irq)
 {
     struct pending_irq *n = irq_to_pending(v, virtual_irq);
diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
index 3993965..b2acc99 100644
--- a/xen/arch/arm/vgic.c
+++ b/xen/arch/arm/vgic.c
@@ -266,6 +266,7 @@ bool vgic_migrate_irq(struct vcpu *old, struct vcpu *new, unsigned int irq)
     /* If the IRQ is still lr_pending, re-inject it to the new vcpu */
     if ( !list_empty(&p->lr_queue) )
     {
+        gic_remove_irq(old, p);
         clear_bit(GIC_IRQ_GUEST_QUEUED, &p->status);
         list_del_init(&p->lr_queue);
         list_del_init(&p->inflight);
diff --git a/xen/include/asm-arm/gic.h b/xen/include/asm-arm/gic.h
index 3130634..5d5b4cc 100644
--- a/xen/include/asm-arm/gic.h
+++ b/xen/include/asm-arm/gic.h
@@ -244,6 +244,7 @@ extern void gic_raise_guest_irq(struct vcpu *v, unsigned int irq,
         unsigned int priority);
 extern void gic_raise_inflight_irq(struct vcpu *v, unsigned int virtual_irq);
 extern void gic_remove_from_lr_pending(struct vcpu *v, struct pending_irq *p);
+extern void gic_remove_irq(struct vcpu *v, struct pending_irq *p);
 
 /* Accept an interrupt from the GIC and dispatch its handler */
 extern void gic_interrupt(struct cpu_user_regs *regs, int is_fiq);
-- 
2.9.0



* [PATCH v10 06/32] ARM: GIC: Add checks for NULL pointer pending_irq's
  2017-05-26 17:35 [PATCH v10 00/32] arm64: Dom0 ITS emulation Andre Przywara
                   ` (4 preceding siblings ...)
  2017-05-26 17:35 ` [PATCH v10 05/32] ARM: vGIC: introduce gic_remove_irq() Andre Przywara
@ 2017-05-26 17:35 ` Andre Przywara
  2017-05-30 11:38   ` Julien Grall
  2017-05-26 17:35 ` [PATCH v10 07/32] ARM: GICv3: introduce separate pending_irq structs for LPIs Andre Przywara
                   ` (25 subsequent siblings)
  31 siblings, 1 reply; 96+ messages in thread
From: Andre Przywara @ 2017-05-26 17:35 UTC (permalink / raw)
  To: Julien Grall, Stefano Stabellini
  Cc: xen-devel, Vijaya Kumar K, Vijay Kilari, Shanker Donthineni

For LPIs the struct pending_irq's are dynamically allocated and the
pointers will be stored in a radix tree. Since an LPI can be "unmapped"
at any time, teach the VGIC how to deal with irq_to_pending() returning
a NULL pointer.
In this case we just do nothing, or clean up the LR if the virtual LPI
number was still in an LR.

Those are all call sites for irq_to_pending(), as per:
"git grep irq_to_pending", and their evaluations:
(PROTECTED means: added NULL check and bailing out)

    xen/arch/arm/gic.c:
gic_route_irq_to_guest(): only called for SPIs, added ASSERT()
gic_remove_irq_from_guest(): only called for SPIs, added ASSERT()
gic_remove_from_queues(): PROTECTED, called within VCPU VGIC lock
gic_raise_inflight_irq(): PROTECTED, called under VCPU VGIC lock
gic_raise_guest_irq(): PROTECTED, called under VCPU VGIC lock
gic_update_one_lr(): PROTECTED, called under VCPU VGIC lock

    xen/arch/arm/vgic.c:
vgic_migrate_irq(): not called for LPIs (virtual IRQs), added ASSERT()
arch_move_irqs(): not iterating over LPIs, added ASSERT()
vgic_disable_irqs(): not called for LPIs, added ASSERT()
vgic_enable_irqs(): not called for LPIs, added ASSERT()
vgic_vcpu_inject_irq(): PROTECTED, moved under VCPU VGIC lock

    xen/include/asm-arm/event.h:
local_events_need_delivery_nomask(): only called for a PPI, added ASSERT()

    xen/include/asm-arm/vgic.h:
(prototype)

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
---
 xen/arch/arm/gic.c          | 26 ++++++++++++++++++++++++--
 xen/arch/arm/vgic.c         | 21 +++++++++++++++++++++
 xen/include/asm-arm/event.h |  3 +++
 3 files changed, 48 insertions(+), 2 deletions(-)

diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index 3a6184a..957d516 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -148,6 +148,7 @@ int gic_route_irq_to_guest(struct domain *d, unsigned int virq,
     /* Caller has already checked that the IRQ is an SPI */
     ASSERT(virq >= 32);
     ASSERT(virq < vgic_num_irqs(d));
+    ASSERT(!is_lpi(virq));
 
     vgic_lock_rank(v_target, rank, flags);
 
@@ -184,6 +185,7 @@ int gic_remove_irq_from_guest(struct domain *d, unsigned int virq,
     ASSERT(spin_is_locked(&desc->lock));
     ASSERT(test_bit(_IRQ_GUEST, &desc->status));
     ASSERT(p->desc == desc);
+    ASSERT(!is_lpi(virq));
 
     vgic_lock_rank(v_target, rank, flags);
 
@@ -421,6 +423,10 @@ void gic_raise_inflight_irq(struct vcpu *v, unsigned int virtual_irq)
 {
     struct pending_irq *n = irq_to_pending(v, virtual_irq);
 
+    /* If an LPI has been removed meanwhile, there is nothing left to raise. */
+    if ( unlikely(!n) )
+        return;
+
     ASSERT(spin_is_locked(&v->arch.vgic.lock));
 
     if ( list_empty(&n->lr_queue) )
@@ -440,20 +446,25 @@ void gic_raise_guest_irq(struct vcpu *v, unsigned int virtual_irq,
 {
     int i;
     unsigned int nr_lrs = gic_hw_ops->info->nr_lrs;
+    struct pending_irq *p = irq_to_pending(v, virtual_irq);
 
     ASSERT(spin_is_locked(&v->arch.vgic.lock));
 
+    if ( unlikely(!p) )
+        /* An unmapped LPI does not need to be raised. */
+        return;
+
     if ( v == current && list_empty(&v->arch.vgic.lr_pending) )
     {
         i = find_first_zero_bit(&this_cpu(lr_mask), nr_lrs);
         if (i < nr_lrs) {
             set_bit(i, &this_cpu(lr_mask));
-            gic_set_lr(i, irq_to_pending(v, virtual_irq), GICH_LR_PENDING);
+            gic_set_lr(i, p, GICH_LR_PENDING);
             return;
         }
     }
 
-    gic_add_to_lr_pending(v, irq_to_pending(v, virtual_irq));
+    gic_add_to_lr_pending(v, p);
 }
 
 static void gic_update_one_lr(struct vcpu *v, int i)
@@ -468,6 +479,17 @@ static void gic_update_one_lr(struct vcpu *v, int i)
     gic_hw_ops->read_lr(i, &lr_val);
     irq = lr_val.virq;
     p = irq_to_pending(v, irq);
+    /* An LPI might have been unmapped, in which case we just clean up here. */
+    if ( unlikely(!p) )
+    {
+        ASSERT(is_lpi(irq));
+
+        gic_hw_ops->clear_lr(i);
+        clear_bit(i, &this_cpu(lr_mask));
+
+        return;
+    }
+
     if ( lr_val.state & GICH_LR_ACTIVE )
     {
         set_bit(GIC_IRQ_GUEST_ACTIVE, &p->status);
diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
index b2acc99..1c0da7d 100644
--- a/xen/arch/arm/vgic.c
+++ b/xen/arch/arm/vgic.c
@@ -236,6 +236,9 @@ bool vgic_migrate_irq(struct vcpu *old, struct vcpu *new, unsigned int irq)
     unsigned long flags;
     struct pending_irq *p;
 
+    /* This will never be called for an LPI, as we don't migrate them. */
+    ASSERT(!is_lpi(irq));
+
     spin_lock_irqsave(&old->arch.vgic.lock, flags);
 
     p = irq_to_pending(old, irq);
@@ -319,6 +322,9 @@ void vgic_disable_irqs(struct vcpu *v, uint32_t r, int n)
     int i = 0;
     struct vcpu *v_target;
 
+    /* LPIs will never be disabled via this function. */
+    ASSERT(!is_lpi(32 * n + 31));
+
     while ( (i = find_next_bit(&mask, 32, i)) < 32 ) {
         irq = i + (32 * n);
         v_target = vgic_get_target_vcpu(v, irq);
@@ -366,6 +372,9 @@ void vgic_enable_irqs(struct vcpu *v, uint32_t r, int n)
     struct vcpu *v_target;
     struct domain *d = v->domain;
 
+    /* LPIs will never be enabled via this function. */
+    ASSERT(!is_lpi(32 * n + 31));
+
     while ( (i = find_next_bit(&mask, 32, i)) < 32 ) {
         irq = i + (32 * n);
         v_target = vgic_get_target_vcpu(v, irq);
@@ -446,6 +455,12 @@ bool vgic_to_sgi(struct vcpu *v, register_t sgir, enum gic_sgi_mode irqmode,
     return true;
 }
 
+/*
+ * Returns the pointer to the struct pending_irq belonging to the given
+ * interrupt.
+ * This can return NULL if called for an LPI which has been unmapped
+ * meanwhile.
+ */
 struct pending_irq *irq_to_pending(struct vcpu *v, unsigned int irq)
 {
     struct pending_irq *n;
@@ -489,6 +504,12 @@ void vgic_vcpu_inject_irq(struct vcpu *v, unsigned int virq)
     spin_lock_irqsave(&v->arch.vgic.lock, flags);
 
     n = irq_to_pending(v, virq);
+    /* If an LPI has been removed, there is nothing to inject here. */
+    if ( unlikely(!n) )
+    {
+        spin_unlock_irqrestore(&v->arch.vgic.lock, flags);
+        return;
+    }
 
     /* vcpu offline */
     if ( test_bit(_VPF_down, &v->pause_flags) )
diff --git a/xen/include/asm-arm/event.h b/xen/include/asm-arm/event.h
index 5330dfe..caefa50 100644
--- a/xen/include/asm-arm/event.h
+++ b/xen/include/asm-arm/event.h
@@ -19,6 +19,9 @@ static inline int local_events_need_delivery_nomask(void)
     struct pending_irq *p = irq_to_pending(current,
                                            current->domain->arch.evtchn_irq);
 
+    /* Does not work for LPIs. */
+    ASSERT(!is_lpi(current->domain->arch.evtchn_irq));
+
     /* XXX: if the first interrupt has already been delivered, we should
      * check whether any other interrupts with priority higher than the
      * one in GICV_IAR are in the lr_pending queue or in the LR
-- 
2.9.0



* [PATCH v10 07/32] ARM: GICv3: introduce separate pending_irq structs for LPIs
  2017-05-26 17:35 [PATCH v10 00/32] arm64: Dom0 ITS emulation Andre Przywara
                   ` (5 preceding siblings ...)
  2017-05-26 17:35 ` [PATCH v10 06/32] ARM: GIC: Add checks for NULL pointer pending_irq's Andre Przywara
@ 2017-05-26 17:35 ` Andre Przywara
  2017-05-26 17:35 ` [PATCH v10 08/32] ARM: GIC: export and extend vgic_init_pending_irq() Andre Przywara
                   ` (24 subsequent siblings)
  31 siblings, 0 replies; 96+ messages in thread
From: Andre Przywara @ 2017-05-26 17:35 UTC (permalink / raw)
  To: Julien Grall, Stefano Stabellini
  Cc: xen-devel, Vijaya Kumar K, Vijay Kilari, Shanker Donthineni

For the same reason that allocating a struct irq_desc for each
possible LPI is not an option, having a struct pending_irq for each LPI
is also not feasible. We only care about mapped LPIs, so we can get away
with having struct pending_irq's only for them.
Maintain a radix tree per domain where we store the pointer to the
respective pending_irq. The index used is the virtual LPI number.
The memory for the actual structures has already been allocated per
device at device mapping time.
Teach the existing VGIC functions to find the right pointer when being
given a virtual LPI number.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
---
 xen/arch/arm/vgic-v2.c       |  8 ++++++++
 xen/arch/arm/vgic-v3.c       | 30 ++++++++++++++++++++++++++++++
 xen/arch/arm/vgic.c          |  2 ++
 xen/include/asm-arm/domain.h |  2 ++
 xen/include/asm-arm/vgic.h   |  2 ++
 5 files changed, 44 insertions(+)

diff --git a/xen/arch/arm/vgic-v2.c b/xen/arch/arm/vgic-v2.c
index dc9f95b..0587569 100644
--- a/xen/arch/arm/vgic-v2.c
+++ b/xen/arch/arm/vgic-v2.c
@@ -702,10 +702,18 @@ static void vgic_v2_domain_free(struct domain *d)
     /* Nothing to be cleanup for this driver */
 }
 
+static struct pending_irq *vgic_v2_lpi_to_pending(struct domain *d,
+                                                  unsigned int vlpi)
+{
+    /* Dummy function, no LPIs on a VGICv2. */
+    BUG();
+}
+
 static const struct vgic_ops vgic_v2_ops = {
     .vcpu_init   = vgic_v2_vcpu_init,
     .domain_init = vgic_v2_domain_init,
     .domain_free = vgic_v2_domain_free,
+    .lpi_to_pending = vgic_v2_lpi_to_pending,
     .max_vcpus = 8,
 };
 
diff --git a/xen/arch/arm/vgic-v3.c b/xen/arch/arm/vgic-v3.c
index 87f5fb3..89be7c0 100644
--- a/xen/arch/arm/vgic-v3.c
+++ b/xen/arch/arm/vgic-v3.c
@@ -1454,6 +1454,9 @@ static int vgic_v3_domain_init(struct domain *d)
     d->arch.vgic.nr_regions = rdist_count;
     d->arch.vgic.rdist_regions = rdist_regions;
 
+    rwlock_init(&d->arch.vgic.pend_lpi_tree_lock);
+    radix_tree_init(&d->arch.vgic.pend_lpi_tree);
+
     /*
      * Domain 0 gets the hardware address.
      * Guests get the virtual platform layout.
@@ -1536,14 +1539,41 @@ static int vgic_v3_domain_init(struct domain *d)
 static void vgic_v3_domain_free(struct domain *d)
 {
     vgic_v3_its_free_domain(d);
+    /*
+     * It is expected that at this point all actual ITS devices have been
+     * cleaned up already. The struct pending_irq's, for which the pointers
+     * have been stored in the radix tree, are allocated and freed by device.
+     * On device unmapping all the entries are removed from the tree and
+     * the backing memory is freed.
+     */
+    radix_tree_destroy(&d->arch.vgic.pend_lpi_tree, NULL);
     xfree(d->arch.vgic.rdist_regions);
 }
 
+/*
+ * Looks up a virtual LPI number in our tree of mapped LPIs. This will return
+ * the corresponding struct pending_irq, which we also use to store the
+ * enabled and pending bit plus the priority.
+ * Returns NULL if an LPI cannot be found (or no LPIs are supported).
+ */
+static struct pending_irq *vgic_v3_lpi_to_pending(struct domain *d,
+                                                  unsigned int lpi)
+{
+    struct pending_irq *pirq;
+
+    read_lock(&d->arch.vgic.pend_lpi_tree_lock);
+    pirq = radix_tree_lookup(&d->arch.vgic.pend_lpi_tree, lpi);
+    read_unlock(&d->arch.vgic.pend_lpi_tree_lock);
+
+    return pirq;
+}
+
 static const struct vgic_ops v3_ops = {
     .vcpu_init   = vgic_v3_vcpu_init,
     .domain_init = vgic_v3_domain_init,
     .domain_free = vgic_v3_domain_free,
     .emulate_reg  = vgic_v3_emulate_reg,
+    .lpi_to_pending = vgic_v3_lpi_to_pending,
     /*
      * We use both AFF1 and AFF0 in (v)MPIDR. Thus, the max number of CPU
      * that can be supported is up to 4096(==256*16) in theory.
diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
index 1c0da7d..b38c990 100644
--- a/xen/arch/arm/vgic.c
+++ b/xen/arch/arm/vgic.c
@@ -468,6 +468,8 @@ struct pending_irq *irq_to_pending(struct vcpu *v, unsigned int irq)
      * are used for SPIs; the rests are used for per cpu irqs */
     if ( irq < 32 )
         n = &v->arch.vgic.pending_irqs[irq];
+    else if ( is_lpi(irq) )
+        n = v->domain->arch.vgic.handler->lpi_to_pending(v->domain, irq);
     else
         n = &v->domain->arch.vgic.pending_irqs[irq - 32];
     return n;
diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h
index 7c3829d..3d8e84c 100644
--- a/xen/include/asm-arm/domain.h
+++ b/xen/include/asm-arm/domain.h
@@ -111,6 +111,8 @@ struct arch_domain
         uint32_t rdist_stride;              /* Re-Distributor stride */
         struct rb_root its_devices;         /* Devices mapped to an ITS */
         spinlock_t its_devices_lock;        /* Protects the its_devices tree */
+        struct radix_tree_root pend_lpi_tree; /* Stores struct pending_irq's */
+        rwlock_t pend_lpi_tree_lock;        /* Protects the pend_lpi_tree */
         unsigned int intid_bits;
 #endif
     } vgic;
diff --git a/xen/include/asm-arm/vgic.h b/xen/include/asm-arm/vgic.h
index df75064..c9075a9 100644
--- a/xen/include/asm-arm/vgic.h
+++ b/xen/include/asm-arm/vgic.h
@@ -134,6 +134,8 @@ struct vgic_ops {
     void (*domain_free)(struct domain *d);
     /* vGIC sysreg/cpregs emulate */
     bool (*emulate_reg)(struct cpu_user_regs *regs, union hsr hsr);
+    /* lookup the struct pending_irq for a given LPI interrupt */
+    struct pending_irq *(*lpi_to_pending)(struct domain *d, unsigned int vlpi);
     /* Maximum number of vCPU supported */
     const unsigned int max_vcpus;
 };
-- 
2.9.0



* [PATCH v10 08/32] ARM: GIC: export and extend vgic_init_pending_irq()
  2017-05-26 17:35 [PATCH v10 00/32] arm64: Dom0 ITS emulation Andre Przywara
                   ` (6 preceding siblings ...)
  2017-05-26 17:35 ` [PATCH v10 07/32] ARM: GICv3: introduce separate pending_irq structs for LPIs Andre Przywara
@ 2017-05-26 17:35 ` Andre Przywara
  2017-05-26 17:35 ` [PATCH v10 09/32] ARM: vGIC: cache virtual LPI priority in struct pending_irq Andre Przywara
                   ` (23 subsequent siblings)
  31 siblings, 0 replies; 96+ messages in thread
From: Andre Przywara @ 2017-05-26 17:35 UTC (permalink / raw)
  To: Julien Grall, Stefano Stabellini
  Cc: xen-devel, Vijaya Kumar K, Vijay Kilari, Shanker Donthineni

For LPIs we later want to dynamically allocate struct pending_irqs.
So besides needing to initialize the struct from there, we also need
to clean it up and re-initialize it later on.
Export vgic_init_pending_irq() and extend it to be reusable.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
---
 xen/arch/arm/vgic.c        | 5 ++++-
 xen/include/asm-arm/vgic.h | 1 +
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
index b38c990..6109d99 100644
--- a/xen/arch/arm/vgic.c
+++ b/xen/arch/arm/vgic.c
@@ -60,11 +60,14 @@ struct vgic_irq_rank *vgic_rank_irq(struct vcpu *v, unsigned int irq)
     return vgic_get_rank(v, rank);
 }
 
-static void vgic_init_pending_irq(struct pending_irq *p, unsigned int virq)
+void vgic_init_pending_irq(struct pending_irq *p, unsigned int virq)
 {
+    memset(p, 0, sizeof(*p));
+
     INIT_LIST_HEAD(&p->inflight);
     INIT_LIST_HEAD(&p->lr_queue);
     p->irq = virq;
+    p->lr = GIC_INVALID_LR;
 }
 
 static void vgic_rank_init(struct vgic_irq_rank *rank, uint8_t index,
diff --git a/xen/include/asm-arm/vgic.h b/xen/include/asm-arm/vgic.h
index c9075a9..a59be6d 100644
--- a/xen/include/asm-arm/vgic.h
+++ b/xen/include/asm-arm/vgic.h
@@ -300,6 +300,7 @@ extern struct vcpu *vgic_get_target_vcpu(struct vcpu *v, unsigned int virq);
 extern void vgic_vcpu_inject_irq(struct vcpu *v, unsigned int virq);
 extern void vgic_vcpu_inject_spi(struct domain *d, unsigned int virq);
 extern void vgic_clear_pending_irqs(struct vcpu *v);
+extern void vgic_init_pending_irq(struct pending_irq *p, unsigned int virq);
 extern struct pending_irq *irq_to_pending(struct vcpu *v, unsigned int irq);
 extern struct pending_irq *spi_to_pending(struct domain *d, unsigned int irq);
 extern struct vgic_irq_rank *vgic_rank_offset(struct vcpu *v, int b, int n, int s);
-- 
2.9.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 96+ messages in thread

* [PATCH v10 09/32] ARM: vGIC: cache virtual LPI priority in struct pending_irq
  2017-05-26 17:35 [PATCH v10 00/32] arm64: Dom0 ITS emulation Andre Przywara
                   ` (7 preceding siblings ...)
  2017-05-26 17:35 ` [PATCH v10 08/32] ARM: GIC: export and extend vgic_init_pending_irq() Andre Przywara
@ 2017-05-26 17:35 ` Andre Przywara
  2017-05-26 17:35 ` [PATCH v10 10/32] ARM: vGIC: add LPI VCPU ID to " Andre Przywara
                   ` (22 subsequent siblings)
  31 siblings, 0 replies; 96+ messages in thread
From: Andre Przywara @ 2017-05-26 17:35 UTC (permalink / raw)
  To: Julien Grall, Stefano Stabellini
  Cc: xen-devel, Vijaya Kumar K, Vijay Kilari, Shanker Donthineni

We enhance struct pending_irq to cache the priority information
for LPIs. Reading the information from there is faster than accessing
the property table from guest memory. Also it uses some padding area in
the struct, so it does not require more memory.
This introduces a function to retrieve the LPI priority as a new vgic_ops
member.
Also this moves the vgic_get_virq_priority() call in
vgic_vcpu_inject_irq() to happen after the NULL check of the pending_irq
pointer, so we can rely on the pointer in the new function.
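
A minimal standalone sketch of the dispatch idea (types and names invented
here, not part of the patch): ordinary interrupts keep using the rank
table, while LPIs go through a per-GIC-model callback:

#include <stdint.h>

struct toy_vgic_ops {
    int (*lpi_get_priority)(unsigned int vlpi);
};

static int toy_v3_lpi_get_priority(unsigned int vlpi)
{
    return 0xa0;                  /* would read pending_irq->lpi_priority */
}

static const struct toy_vgic_ops toy_v3_ops = {
    .lpi_get_priority = toy_v3_lpi_get_priority,
};

int toy_get_virq_priority(const struct toy_vgic_ops *ops, unsigned int virq)
{
    if ( virq >= 8192 )           /* LPI INTIDs start at 8192 */
        return ops->lpi_get_priority(virq);

    return 0;                     /* rank-based lookup elided here */
}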

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
---
 xen/arch/arm/vgic-v2.c     |  7 +++++++
 xen/arch/arm/vgic-v3.c     | 11 +++++++++++
 xen/arch/arm/vgic.c        | 10 +++++++---
 xen/include/asm-arm/vgic.h |  2 ++
 4 files changed, 27 insertions(+), 3 deletions(-)

diff --git a/xen/arch/arm/vgic-v2.c b/xen/arch/arm/vgic-v2.c
index 0587569..df91940 100644
--- a/xen/arch/arm/vgic-v2.c
+++ b/xen/arch/arm/vgic-v2.c
@@ -709,11 +709,18 @@ static struct pending_irq *vgic_v2_lpi_to_pending(struct domain *d,
     BUG();
 }
 
+static int vgic_v2_lpi_get_priority(struct domain *d, unsigned int vlpi)
+{
+    /* Dummy function, no LPIs on a VGICv2. */
+    BUG();
+}
+
 static const struct vgic_ops vgic_v2_ops = {
     .vcpu_init   = vgic_v2_vcpu_init,
     .domain_init = vgic_v2_domain_init,
     .domain_free = vgic_v2_domain_free,
     .lpi_to_pending = vgic_v2_lpi_to_pending,
+    .lpi_get_priority = vgic_v2_lpi_get_priority,
     .max_vcpus = 8,
 };
 
diff --git a/xen/arch/arm/vgic-v3.c b/xen/arch/arm/vgic-v3.c
index 89be7c0..c857550 100644
--- a/xen/arch/arm/vgic-v3.c
+++ b/xen/arch/arm/vgic-v3.c
@@ -1568,12 +1568,23 @@ static struct pending_irq *vgic_v3_lpi_to_pending(struct domain *d,
     return pirq;
 }
 
+/* Retrieve the priority of an LPI from its struct pending_irq. */
+static int vgic_v3_lpi_get_priority(struct domain *d, uint32_t vlpi)
+{
+    struct pending_irq *p = vgic_v3_lpi_to_pending(d, vlpi);
+
+    ASSERT(p);
+
+    return p->lpi_priority;
+}
+
 static const struct vgic_ops v3_ops = {
     .vcpu_init   = vgic_v3_vcpu_init,
     .domain_init = vgic_v3_domain_init,
     .domain_free = vgic_v3_domain_free,
     .emulate_reg  = vgic_v3_emulate_reg,
     .lpi_to_pending = vgic_v3_lpi_to_pending,
+    .lpi_get_priority = vgic_v3_lpi_get_priority,
     /*
      * We use both AFF1 and AFF0 in (v)MPIDR. Thus, the max number of CPU
      * that can be supported is up to 4096(==256*16) in theory.
diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
index 6109d99..5872152 100644
--- a/xen/arch/arm/vgic.c
+++ b/xen/arch/arm/vgic.c
@@ -229,8 +229,13 @@ struct vcpu *vgic_get_target_vcpu(struct vcpu *v, unsigned int virq)
 
 static int vgic_get_virq_priority(struct vcpu *v, unsigned int virq)
 {
-    struct vgic_irq_rank *rank = vgic_rank_irq(v, virq);
+    struct vgic_irq_rank *rank;
+
+    /* LPIs don't have a rank; their priority is stored separately. */
+    if ( is_lpi(virq) )
+        return v->domain->arch.vgic.handler->lpi_get_priority(v->domain, virq);
 
+    rank = vgic_rank_irq(v, virq);
     return read_atomic(&rank->priority[virq & INTERRUPT_RANK_MASK]);
 }
 
@@ -504,8 +509,6 @@ void vgic_vcpu_inject_irq(struct vcpu *v, unsigned int virq)
     unsigned long flags;
     bool running;
 
-    priority = vgic_get_virq_priority(v, virq);
-
     spin_lock_irqsave(&v->arch.vgic.lock, flags);
 
     n = irq_to_pending(v, virq);
@@ -531,6 +534,7 @@ void vgic_vcpu_inject_irq(struct vcpu *v, unsigned int virq)
         goto out;
     }
 
+    priority = vgic_get_virq_priority(v, virq);
     n->priority = priority;
 
     /* the irq is enabled */
diff --git a/xen/include/asm-arm/vgic.h b/xen/include/asm-arm/vgic.h
index a59be6d..d51e277 100644
--- a/xen/include/asm-arm/vgic.h
+++ b/xen/include/asm-arm/vgic.h
@@ -72,6 +72,7 @@ struct pending_irq
 #define GIC_INVALID_LR         (uint8_t)~0
     uint8_t lr;
     uint8_t priority;
+    uint8_t lpi_priority;       /* Caches the priority if this is an LPI. */
     /* inflight is used to append instances of pending_irq to
      * vgic.inflight_irqs */
     struct list_head inflight;
@@ -136,6 +137,7 @@ struct vgic_ops {
     bool (*emulate_reg)(struct cpu_user_regs *regs, union hsr hsr);
     /* lookup the struct pending_irq for a given LPI interrupt */
     struct pending_irq *(*lpi_to_pending)(struct domain *d, unsigned int vlpi);
+    int (*lpi_get_priority)(struct domain *d, uint32_t vlpi);
     /* Maximum number of vCPU supported */
     const unsigned int max_vcpus;
 };
-- 
2.9.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 96+ messages in thread

* [PATCH v10 10/32] ARM: vGIC: add LPI VCPU ID to struct pending_irq
  2017-05-26 17:35 [PATCH v10 00/32] arm64: Dom0 ITS emulation Andre Przywara
                   ` (8 preceding siblings ...)
  2017-05-26 17:35 ` [PATCH v10 09/32] ARM: vGIC: cache virtual LPI priority in struct pending_irq Andre Przywara
@ 2017-05-26 17:35 ` Andre Przywara
  2017-05-26 17:35 ` [PATCH v10 11/32] ARM: GICv3: forward pending LPIs to guests Andre Przywara
                   ` (21 subsequent siblings)
  31 siblings, 0 replies; 96+ messages in thread
From: Andre Przywara @ 2017-05-26 17:35 UTC (permalink / raw)
  To: Julien Grall, Stefano Stabellini
  Cc: xen-devel, Vijaya Kumar K, Vijay Kilari, Shanker Donthineni

The target CPU for an LPI is encoded in the interrupt translation table
entry, so it can't easily be derived from just an LPI number (short of
walking *all* tables and finding the matching LPI).
To avoid this in case we need to know the VCPU (for the INVALL command,
for instance), put the VCPU ID in the struct pending_irq, so that it is
easily accessible.
We use the remaining 8 bits of padding space for that to avoid enlarging
the size of struct pending_irq. The number of VCPUs is limited to 127
at the moment anyway, which we also confirm with a BUILD_BUG_ON.
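
For illustration only (the limit below is an example value picked by me,
not Xen's constant), the compile-time check boils down to something like:

#include <stdint.h>

#define EXAMPLE_MAX_VIRT_CPUS 128   /* stand-in value */

struct toy_pending_irq {
    uint8_t lpi_vcpu_id;
};

/* Fails to build if the field can no longer hold every possible VCPU ID. */
_Static_assert((1UL << (sizeof(((struct toy_pending_irq *)0)->lpi_vcpu_id) * 8))
               >= EXAMPLE_MAX_VIRT_CPUS,
               "lpi_vcpu_id too small for the VCPU ID range");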

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
---
 xen/arch/arm/vgic.c        | 4 ++++
 xen/include/asm-arm/vgic.h | 1 +
 2 files changed, 5 insertions(+)

diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
index 5872152..6b5c0b1 100644
--- a/xen/arch/arm/vgic.c
+++ b/xen/arch/arm/vgic.c
@@ -62,12 +62,16 @@ struct vgic_irq_rank *vgic_rank_irq(struct vcpu *v, unsigned int irq)
 
 void vgic_init_pending_irq(struct pending_irq *p, unsigned int virq)
 {
+    /* The lpi_vcpu_id field must be big enough to hold a VCPU ID. */
+    BUILD_BUG_ON(BIT(sizeof(p->lpi_vcpu_id) * 8) < MAX_VIRT_CPUS);
+
     memset(p, 0, sizeof(*p));
 
     INIT_LIST_HEAD(&p->inflight);
     INIT_LIST_HEAD(&p->lr_queue);
     p->irq = virq;
     p->lr = GIC_INVALID_LR;
+    p->lpi_vcpu_id = INVALID_VCPU_ID;
 }
 
 static void vgic_rank_init(struct vgic_irq_rank *rank, uint8_t index,
diff --git a/xen/include/asm-arm/vgic.h b/xen/include/asm-arm/vgic.h
index d51e277..07b56dc 100644
--- a/xen/include/asm-arm/vgic.h
+++ b/xen/include/asm-arm/vgic.h
@@ -73,6 +73,7 @@ struct pending_irq
     uint8_t lr;
     uint8_t priority;
     uint8_t lpi_priority;       /* Caches the priority if this is an LPI. */
+    uint8_t lpi_vcpu_id;        /* The VCPU for an LPI. */
     /* inflight is used to append instances of pending_irq to
      * vgic.inflight_irqs */
     struct list_head inflight;
-- 
2.9.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 96+ messages in thread

* [PATCH v10 11/32] ARM: GICv3: forward pending LPIs to guests
  2017-05-26 17:35 [PATCH v10 00/32] arm64: Dom0 ITS emulation Andre Przywara
                   ` (9 preceding siblings ...)
  2017-05-26 17:35 ` [PATCH v10 10/32] ARM: vGIC: add LPI VCPU ID to " Andre Przywara
@ 2017-05-26 17:35 ` Andre Przywara
  2017-05-30 11:56   ` Julien Grall
  2017-05-26 17:35 ` [PATCH v10 12/32] ARM: GICv3: enable ITS and LPIs on the host Andre Przywara
                   ` (20 subsequent siblings)
  31 siblings, 1 reply; 96+ messages in thread
From: Andre Przywara @ 2017-05-26 17:35 UTC (permalink / raw)
  To: Julien Grall, Stefano Stabellini
  Cc: xen-devel, Vijaya Kumar K, Vijay Kilari, Shanker Donthineni

Upon receiving an LPI on the host, we need to find the right VCPU and
virtual IRQ number to get this IRQ injected.
Iterate our two-level LPI table to find the domain ID and the virtual
LPI number quickly when the host takes an LPI. We then look up the
right VCPU in the struct pending_irq.
We use the existing injection function to let the GIC emulation deal
with this interrupt.
This introduces a do_LPI() as a hardware gic_ops.
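
To make the lookup path easier to follow, here is a standalone sketch
(the layout mirrors the union used in the patch, but the helper and names
are mine) of decoding one host LPI entry after an atomic 64-bit read:

#include <stdint.h>
#include <stdbool.h>

#define TOY_INVALID_LPI 0          /* stand-in for the real INVALID_LPI */

union toy_host_lpi {
    uint64_t data;                 /* read/written atomically as one word */
    struct {
        uint32_t virt_lpi;         /* virtual LPI the guest mapped, or invalid */
        uint16_t dom_id;           /* domain owning this event */
    };
};

/* Returns true and fills *dom/*vlpi if the event is mapped to a guest. */
bool toy_decode_host_lpi(uint64_t snapshot, uint16_t *dom, uint32_t *vlpi)
{
    union toy_host_lpi e = { .data = snapshot };

    if ( e.virt_lpi == TOY_INVALID_LPI )
        return false;

    *dom = e.dom_id;
    *vlpi = e.virt_lpi;
    return true;
}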

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
---
 xen/arch/arm/gic-v2.c            |  7 ++++
 xen/arch/arm/gic-v3-lpi.c        | 76 ++++++++++++++++++++++++++++++++++++++--
 xen/arch/arm/gic-v3.c            |  1 +
 xen/arch/arm/gic.c               |  8 ++++-
 xen/include/asm-arm/domain.h     |  3 +-
 xen/include/asm-arm/gic.h        |  2 ++
 xen/include/asm-arm/gic_v3_its.h |  8 +++++
 7 files changed, 101 insertions(+), 4 deletions(-)

diff --git a/xen/arch/arm/gic-v2.c b/xen/arch/arm/gic-v2.c
index 270a136..ffbe47c 100644
--- a/xen/arch/arm/gic-v2.c
+++ b/xen/arch/arm/gic-v2.c
@@ -1217,6 +1217,12 @@ static int __init gicv2_init(void)
     return 0;
 }
 
+static void gicv2_do_LPI(unsigned int lpi)
+{
+    /* No LPIs in a GICv2 */
+    BUG();
+}
+
 const static struct gic_hw_operations gicv2_ops = {
     .info                = &gicv2_info,
     .init                = gicv2_init,
@@ -1244,6 +1250,7 @@ const static struct gic_hw_operations gicv2_ops = {
     .make_hwdom_madt     = gicv2_make_hwdom_madt,
     .map_hwdom_extra_mappings = gicv2_map_hwdown_extra_mappings,
     .iomem_deny_access   = gicv2_iomem_deny_access,
+    .do_LPI              = gicv2_do_LPI,
 };
 
 /* Set up the GIC */
diff --git a/xen/arch/arm/gic-v3-lpi.c b/xen/arch/arm/gic-v3-lpi.c
index 292f2d0..438bbfe 100644
--- a/xen/arch/arm/gic-v3-lpi.c
+++ b/xen/arch/arm/gic-v3-lpi.c
@@ -47,7 +47,6 @@ union host_lpi {
     struct {
         uint32_t virt_lpi;
         uint16_t dom_id;
-        uint16_t vcpu_id;
     };
 };
 
@@ -136,6 +135,80 @@ uint64_t gicv3_get_redist_address(unsigned int cpu, bool use_pta)
         return per_cpu(lpi_redist, cpu).redist_id << 16;
 }
 
+static void vgic_vcpu_inject_lpi(struct domain *d, unsigned int virq)
+{
+    struct pending_irq *p = irq_to_pending(d->vcpu[0], virq);
+    struct vcpu *v = NULL;
+
+    if ( !p )
+        return;
+
+    if ( p->lpi_vcpu_id < d->max_vcpus )
+        v = d->vcpu[read_atomic(&p->lpi_vcpu_id)];
+
+    if ( v )
+        vgic_vcpu_inject_irq(v, virq);
+}
+
+/*
+ * Handle incoming LPIs, which are a bit special, because they are potentially
+ * numerous and also only get injected into guests. Treat them specially here,
+ * by just looking up their target vCPU and virtual LPI number and hand it
+ * over to the injection function.
+ * Please note that LPIs are edge-triggered only, also have no active state,
+ * so spurious interrupts on the host side are no issue (we can just ignore
+ * them).
+ * Also a guest cannot expect that firing interrupts that haven't been
+ * fully configured yet will reach the CPU, so we don't need to care about
+ * this special case.
+ */
+void gicv3_do_LPI(unsigned int lpi)
+{
+    struct domain *d;
+    union host_lpi *hlpip, hlpi;
+
+    irq_enter();
+
+    /* EOI the LPI already. */
+    WRITE_SYSREG32(lpi, ICC_EOIR1_EL1);
+
+    /* Find out if a guest mapped something to this physical LPI. */
+    hlpip = gic_get_host_lpi(lpi);
+    if ( !hlpip )
+        goto out;
+
+    hlpi.data = read_u64_atomic(&hlpip->data);
+
+    /*
+     * Unmapped events are marked with an invalid LPI ID. We can safely
+     * ignore them, as they have no further state and no-one can expect
+     * to see them if they have not been mapped.
+     */
+    if ( hlpi.virt_lpi == INVALID_LPI )
+        goto out;
+
+    d = rcu_lock_domain_by_id(hlpi.dom_id);
+    if ( !d )
+        goto out;
+
+    /*
+     * TODO: Investigate what to do here for potential interrupt storms.
+     * As we keep all host LPIs enabled, for disabling LPIs we would need
+     * to queue a ITS host command, which we avoid so far during a guest's
+     * runtime. Also re-enabling would trigger a host command upon the
+     * guest sending a command, which could be an attack vector for
+     * hogging the host command queue.
+     * See the thread around here for some background:
+     * https://lists.xen.org/archives/html/xen-devel/2016-12/msg00003.html
+     */
+    vgic_vcpu_inject_lpi(d, hlpi.virt_lpi);
+
+    rcu_unlock_domain(d);
+
+out:
+    irq_exit();
+}
+
 static int gicv3_lpi_allocate_pendtable(uint64_t *reg)
 {
     uint64_t val;
@@ -417,7 +490,6 @@ int gicv3_allocate_host_lpi_block(struct domain *d, uint32_t *first_lpi)
          */
         hlpi.virt_lpi = INVALID_LPI;
         hlpi.dom_id = d->domain_id;
-        hlpi.vcpu_id = INVALID_VCPU_ID;
         write_u64_atomic(&lpi_data.host_lpis[chunk][lpi_idx + i].data,
                          hlpi.data);
 
diff --git a/xen/arch/arm/gic-v3.c b/xen/arch/arm/gic-v3.c
index 29c8964..8140c5f 100644
--- a/xen/arch/arm/gic-v3.c
+++ b/xen/arch/arm/gic-v3.c
@@ -1674,6 +1674,7 @@ static const struct gic_hw_operations gicv3_ops = {
     .make_hwdom_dt_node  = gicv3_make_hwdom_dt_node,
     .make_hwdom_madt     = gicv3_make_hwdom_madt,
     .iomem_deny_access   = gicv3_iomem_deny_access,
+    .do_LPI              = gicv3_do_LPI,
 };
 
 static int __init gicv3_dt_preinit(struct dt_device_node *node, const void *data)
diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index 957d516..9e32103 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -733,7 +733,13 @@ void gic_interrupt(struct cpu_user_regs *regs, int is_fiq)
             do_IRQ(regs, irq, is_fiq);
             local_irq_disable();
         }
-        else if (unlikely(irq < 16))
+        else if ( is_lpi(irq) )
+        {
+            local_irq_enable();
+            gic_hw_ops->do_LPI(irq);
+            local_irq_disable();
+        }
+        else if ( unlikely(irq < 16) )
         {
             do_sgi(regs, irq);
         }
diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h
index 3d8e84c..ebaea35 100644
--- a/xen/include/asm-arm/domain.h
+++ b/xen/include/asm-arm/domain.h
@@ -260,7 +260,8 @@ struct arch_vcpu
 
         /* GICv3: redistributor base and flags for this vCPU */
         paddr_t rdist_base;
-#define VGIC_V3_RDIST_LAST  (1 << 0)        /* last vCPU of the rdist */
+#define VGIC_V3_RDIST_LAST      (1 << 0)        /* last vCPU of the rdist */
+#define VGIC_V3_LPIS_ENABLED    (1 << 1)
         uint8_t flags;
     } vgic;
 
diff --git a/xen/include/asm-arm/gic.h b/xen/include/asm-arm/gic.h
index 5d5b4cc..783937b 100644
--- a/xen/include/asm-arm/gic.h
+++ b/xen/include/asm-arm/gic.h
@@ -367,6 +367,8 @@ struct gic_hw_operations {
     int (*map_hwdom_extra_mappings)(struct domain *d);
     /* Deny access to GIC regions */
     int (*iomem_deny_access)(const struct domain *d);
+    /* Handle LPIs, which require special handling */
+    void (*do_LPI)(unsigned int lpi);
 };
 
 void register_gic_ops(const struct gic_hw_operations *ops);
diff --git a/xen/include/asm-arm/gic_v3_its.h b/xen/include/asm-arm/gic_v3_its.h
index 29559a3..7470779 100644
--- a/xen/include/asm-arm/gic_v3_its.h
+++ b/xen/include/asm-arm/gic_v3_its.h
@@ -134,6 +134,8 @@ void gicv3_its_dt_init(const struct dt_device_node *node);
 
 bool gicv3_its_host_has_its(void);
 
+void gicv3_do_LPI(unsigned int lpi);
+
 int gicv3_lpi_init_rdist(void __iomem * rdist_base);
 
 /* Initialize the host structures for LPIs and the host ITSes. */
@@ -175,6 +177,12 @@ static inline bool gicv3_its_host_has_its(void)
     return false;
 }
 
+static inline void gicv3_do_LPI(unsigned int lpi)
+{
+    /* We don't enable LPIs without an ITS. */
+    BUG();
+}
+
 static inline int gicv3_lpi_init_rdist(void __iomem * rdist_base)
 {
     return -ENODEV;
-- 
2.9.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 96+ messages in thread

* [PATCH v10 12/32] ARM: GICv3: enable ITS and LPIs on the host
  2017-05-26 17:35 [PATCH v10 00/32] arm64: Dom0 ITS emulation Andre Przywara
                   ` (10 preceding siblings ...)
  2017-05-26 17:35 ` [PATCH v10 11/32] ARM: GICv3: forward pending LPIs to guests Andre Przywara
@ 2017-05-26 17:35 ` Andre Przywara
  2017-05-30 11:58   ` Julien Grall
  2017-05-26 17:35 ` [PATCH v10 13/32] ARM: vGICv3: handle virtual LPI pending and property tables Andre Przywara
                   ` (19 subsequent siblings)
  31 siblings, 1 reply; 96+ messages in thread
From: Andre Przywara @ 2017-05-26 17:35 UTC (permalink / raw)
  To: Julien Grall, Stefano Stabellini
  Cc: xen-devel, Vijaya Kumar K, Vijay Kilari, Shanker Donthineni

Now that the host part of the ITS code is in place, we can enable the
ITS and also LPIs on each redistributor to get the show rolling.
At this point there would be no LPIs mapped, as guests don't know about
the ITS yet.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
---
 xen/arch/arm/gic-v3-its.c |  4 ++++
 xen/arch/arm/gic-v3.c     | 18 ++++++++++++++++++
 2 files changed, 22 insertions(+)

diff --git a/xen/arch/arm/gic-v3-its.c b/xen/arch/arm/gic-v3-its.c
index 07280b3..aebc257 100644
--- a/xen/arch/arm/gic-v3-its.c
+++ b/xen/arch/arm/gic-v3-its.c
@@ -505,6 +505,10 @@ static int gicv3_its_init_single_its(struct host_its *hw_its)
         return -ENOMEM;
     writeq_relaxed(0, hw_its->its_base + GITS_CWRITER);
 
+    /* Now enable interrupt translation and command processing on that ITS. */
+    reg = readl_relaxed(hw_its->its_base + GITS_CTLR);
+    writel_relaxed(reg | GITS_CTLR_ENABLE, hw_its->its_base + GITS_CTLR);
+
     return 0;
 }
 
diff --git a/xen/arch/arm/gic-v3.c b/xen/arch/arm/gic-v3.c
index 8140c5f..d539d6c 100644
--- a/xen/arch/arm/gic-v3.c
+++ b/xen/arch/arm/gic-v3.c
@@ -620,6 +620,21 @@ static int gicv3_enable_redist(void)
     return 0;
 }
 
+/* Enable LPIs on this redistributor (only useful when the host has an ITS). */
+static bool gicv3_enable_lpis(void)
+{
+    uint32_t val;
+
+    val = readl_relaxed(GICD_RDIST_BASE + GICR_TYPER);
+    if ( !(val & GICR_TYPER_PLPIS) )
+        return false;
+
+    val = readl_relaxed(GICD_RDIST_BASE + GICR_CTLR);
+    writel_relaxed(val | GICR_CTLR_ENABLE_LPIS, GICD_RDIST_BASE + GICR_CTLR);
+
+    return true;
+}
+
 static int __init gicv3_populate_rdist(void)
 {
     int i;
@@ -731,11 +746,14 @@ static int gicv3_cpu_init(void)
     if ( gicv3_enable_redist() )
         return -ENODEV;
 
+    /* If the host has any ITSes, enable LPIs now. */
     if ( gicv3_its_host_has_its() )
     {
         ret = gicv3_its_setup_collection(smp_processor_id());
         if ( ret )
             return ret;
+        if ( !gicv3_enable_lpis() )
+            return -EBUSY;
     }
 
     /* Set priority on PPI and SGI interrupts */
-- 
2.9.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 96+ messages in thread

* [PATCH v10 13/32] ARM: vGICv3: handle virtual LPI pending and property tables
  2017-05-26 17:35 [PATCH v10 00/32] arm64: Dom0 ITS emulation Andre Przywara
                   ` (11 preceding siblings ...)
  2017-05-26 17:35 ` [PATCH v10 12/32] ARM: GICv3: enable ITS and LPIs on the host Andre Przywara
@ 2017-05-26 17:35 ` Andre Przywara
  2017-05-26 17:35 ` [PATCH v10 14/32] ARM: introduce vgic_access_guest_memory() Andre Przywara
                   ` (18 subsequent siblings)
  31 siblings, 0 replies; 96+ messages in thread
From: Andre Przywara @ 2017-05-26 17:35 UTC (permalink / raw)
  To: Julien Grall, Stefano Stabellini
  Cc: xen-devel, Vijaya Kumar K, Vijay Kilari, Shanker Donthineni

Allow a guest to provide the address and size for the memory regions
it has reserved for the GICv3 pending and property tables.
We sanitise the various fields of the respective redistributor
registers.
The MMIO read and write accesses are protected by locks, to avoid any
changing of the property or pending table address while a redistributor
is live and also to protect the non-atomic vgic_reg64_extract() function
on the MMIO read side.
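
The sanitising pattern used below can be pictured with a small standalone
example (the mask/shift values and encodings here are placeholders, not the
real GICR_* definitions): extract a field, map forbidden encodings to
allowed ones, and merge it back into the register value:

#include <stdint.h>

/* Placeholder layout: 2-bit "shareability" field at bits [11:10]. */
#define TOY_SHAREABILITY_SHIFT 10
#define TOY_SHAREABILITY_MASK  (0x3ULL << TOY_SHAREABILITY_SHIFT)
#define TOY_INNER_SHAREABLE    1
#define TOY_OUTER_SHAREABLE    2

static uint64_t toy_sanitise_shareability(uint64_t field)
{
    /* Outer shareable is not supported; silently degrade to inner. */
    return (field == TOY_OUTER_SHAREABLE) ? TOY_INNER_SHAREABLE : field;
}

uint64_t toy_sanitise_field(uint64_t reg, uint64_t mask, int shift,
                            uint64_t (*fn)(uint64_t))
{
    uint64_t field = fn((reg & mask) >> shift) << shift;

    return (reg & ~mask) | field;
}

/* Usage: reg = toy_sanitise_field(reg, TOY_SHAREABILITY_MASK,
 *                                 TOY_SHAREABILITY_SHIFT,
 *                                 toy_sanitise_shareability); */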

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Julien Grall <julien.grall@arm.com>
---
 xen/arch/arm/vgic-v3.c       | 164 +++++++++++++++++++++++++++++++++++++++----
 xen/include/asm-arm/domain.h |   9 +++
 2 files changed, 161 insertions(+), 12 deletions(-)

diff --git a/xen/arch/arm/vgic-v3.c b/xen/arch/arm/vgic-v3.c
index c857550..b966fc9 100644
--- a/xen/arch/arm/vgic-v3.c
+++ b/xen/arch/arm/vgic-v3.c
@@ -233,12 +233,29 @@ static int __vgic_v3_rdistr_rd_mmio_read(struct vcpu *v, mmio_info_t *info,
         goto read_reserved;
 
     case VREG64(GICR_PROPBASER):
-        /* LPI's not implemented */
-        goto read_as_zero_64;
+        if ( !v->domain->arch.vgic.has_its )
+            goto read_as_zero_64;
+        if ( !vgic_reg64_check_access(dabt) ) goto bad_width;
+
+        vgic_lock(v);
+        *r = vgic_reg64_extract(v->domain->arch.vgic.rdist_propbase, info);
+        vgic_unlock(v);
+        return 1;
 
     case VREG64(GICR_PENDBASER):
-        /* LPI's not implemented */
-        goto read_as_zero_64;
+    {
+        unsigned long flags;
+
+        if ( !v->domain->arch.vgic.has_its )
+            goto read_as_zero_64;
+        if ( !vgic_reg64_check_access(dabt) ) goto bad_width;
+
+        spin_lock_irqsave(&v->arch.vgic.lock, flags);
+        *r = vgic_reg64_extract(v->arch.vgic.rdist_pendbase, info);
+        *r &= ~GICR_PENDBASER_PTZ;       /* WO, reads as 0 */
+        spin_unlock_irqrestore(&v->arch.vgic.lock, flags);
+        return 1;
+    }
 
     case 0x0080:
         goto read_reserved;
@@ -335,11 +352,95 @@ read_unknown:
     return 1;
 }
 
+static uint64_t vgic_sanitise_field(uint64_t reg, uint64_t field_mask,
+                                    int field_shift,
+                                    uint64_t (*sanitise_fn)(uint64_t))
+{
+    uint64_t field = (reg & field_mask) >> field_shift;
+
+    field = sanitise_fn(field) << field_shift;
+
+    return (reg & ~field_mask) | field;
+}
+
+/* We want to avoid outer shareable. */
+static uint64_t vgic_sanitise_shareability(uint64_t field)
+{
+    switch ( field )
+    {
+    case GIC_BASER_OuterShareable:
+        return GIC_BASER_InnerShareable;
+    default:
+        return field;
+    }
+}
+
+/* Avoid any inner non-cacheable mapping. */
+static uint64_t vgic_sanitise_inner_cacheability(uint64_t field)
+{
+    switch ( field )
+    {
+    case GIC_BASER_CACHE_nCnB:
+    case GIC_BASER_CACHE_nC:
+        return GIC_BASER_CACHE_RaWb;
+    default:
+        return field;
+    }
+}
+
+/* Non-cacheable or same-as-inner are OK. */
+static uint64_t vgic_sanitise_outer_cacheability(uint64_t field)
+{
+    switch ( field )
+    {
+    case GIC_BASER_CACHE_SameAsInner:
+    case GIC_BASER_CACHE_nC:
+        return field;
+    default:
+        return GIC_BASER_CACHE_nC;
+    }
+}
+
+static uint64_t sanitize_propbaser(uint64_t reg)
+{
+    reg = vgic_sanitise_field(reg, GICR_PROPBASER_SHAREABILITY_MASK,
+                              GICR_PROPBASER_SHAREABILITY_SHIFT,
+                              vgic_sanitise_shareability);
+    reg = vgic_sanitise_field(reg, GICR_PROPBASER_INNER_CACHEABILITY_MASK,
+                              GICR_PROPBASER_INNER_CACHEABILITY_SHIFT,
+                              vgic_sanitise_inner_cacheability);
+    reg = vgic_sanitise_field(reg, GICR_PROPBASER_OUTER_CACHEABILITY_MASK,
+                              GICR_PROPBASER_OUTER_CACHEABILITY_SHIFT,
+                              vgic_sanitise_outer_cacheability);
+
+    reg &= ~GICR_PROPBASER_RES0_MASK;
+
+    return reg;
+}
+
+static uint64_t sanitize_pendbaser(uint64_t reg)
+{
+    reg = vgic_sanitise_field(reg, GICR_PENDBASER_SHAREABILITY_MASK,
+                              GICR_PENDBASER_SHAREABILITY_SHIFT,
+                              vgic_sanitise_shareability);
+    reg = vgic_sanitise_field(reg, GICR_PENDBASER_INNER_CACHEABILITY_MASK,
+                              GICR_PENDBASER_INNER_CACHEABILITY_SHIFT,
+                              vgic_sanitise_inner_cacheability);
+    reg = vgic_sanitise_field(reg, GICR_PENDBASER_OUTER_CACHEABILITY_MASK,
+                              GICR_PENDBASER_OUTER_CACHEABILITY_SHIFT,
+                              vgic_sanitise_outer_cacheability);
+
+    reg &= ~GICR_PENDBASER_RES0_MASK;
+
+    return reg;
+}
+
 static int __vgic_v3_rdistr_rd_mmio_write(struct vcpu *v, mmio_info_t *info,
                                           uint32_t gicr_reg,
                                           register_t r)
 {
     struct hsr_dabt dabt = info->dabt;
+    uint64_t reg;
 
     switch ( gicr_reg )
     {
@@ -370,36 +471,75 @@ static int __vgic_v3_rdistr_rd_mmio_write(struct vcpu *v, mmio_info_t *info,
         goto write_impl_defined;
 
     case VREG64(GICR_SETLPIR):
-        /* LPI is not implemented */
+        /* LPIs without an ITS are not implemented */
         goto write_ignore_64;
 
     case VREG64(GICR_CLRLPIR):
-        /* LPI is not implemented */
+        /* LPIs without an ITS are not implemented */
         goto write_ignore_64;
 
     case 0x0050:
         goto write_reserved;
 
     case VREG64(GICR_PROPBASER):
-        /* LPI is not implemented */
-        goto write_ignore_64;
+        if ( !v->domain->arch.vgic.has_its )
+            goto write_ignore_64;
+        if ( !vgic_reg64_check_access(dabt) ) goto bad_width;
+
+        vgic_lock(v);
+
+        /*
+         * Writing PROPBASER with any redistributor having LPIs enabled
+         * is UNPREDICTABLE.
+         */
+        if ( !(v->domain->arch.vgic.rdists_enabled) )
+        {
+            reg = v->domain->arch.vgic.rdist_propbase;
+            vgic_reg64_update(&reg, r, info);
+            reg = sanitize_propbaser(reg);
+            v->domain->arch.vgic.rdist_propbase = reg;
+        }
+
+        vgic_unlock(v);
+
+        return 1;
 
     case VREG64(GICR_PENDBASER):
-        /* LPI is not implemented */
-        goto write_ignore_64;
+    {
+        unsigned long flags;
+
+        if ( !v->domain->arch.vgic.has_its )
+            goto write_ignore_64;
+        if ( !vgic_reg64_check_access(dabt) ) goto bad_width;
+
+        spin_lock_irqsave(&v->arch.vgic.lock, flags);
+
+        /* Writing PENDBASER with LPIs enabled is UNPREDICTABLE. */
+        if ( !(v->arch.vgic.flags & VGIC_V3_LPIS_ENABLED) )
+        {
+            reg = v->arch.vgic.rdist_pendbase;
+            vgic_reg64_update(&reg, r, info);
+            reg = sanitize_pendbaser(reg);
+            v->arch.vgic.rdist_pendbase = reg;
+        }
+
+        spin_unlock_irqrestore(&v->arch.vgic.lock, flags);
+
+        return 1;
+    }
 
     case 0x0080:
         goto write_reserved;
 
     case VREG64(GICR_INVLPIR):
-        /* LPI is not implemented */
+        /* LPIs without an ITS are not implemented */
         goto write_ignore_64;
 
     case 0x00A8:
         goto write_reserved;
 
     case VREG64(GICR_INVALLR):
-        /* LPI is not implemented */
+        /* LPIs without an ITS are not implemented */
         goto write_ignore_64;
 
     case 0x00B8:
diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h
index ebaea35..b33f54a 100644
--- a/xen/include/asm-arm/domain.h
+++ b/xen/include/asm-arm/domain.h
@@ -109,11 +109,19 @@ struct arch_domain
         } *rdist_regions;
         int nr_regions;                     /* Number of rdist regions */
         uint32_t rdist_stride;              /* Re-Distributor stride */
+        unsigned long int nr_lpis;
+        uint64_t rdist_propbase;
         struct rb_root its_devices;         /* Devices mapped to an ITS */
         spinlock_t its_devices_lock;        /* Protects the its_devices tree */
         struct radix_tree_root pend_lpi_tree; /* Stores struct pending_irq's */
         rwlock_t pend_lpi_tree_lock;        /* Protects the pend_lpi_tree */
         unsigned int intid_bits;
+        /*
+         * TODO: if there are more bool's being added below, consider
+         * a flags variable instead.
+         */
+        bool rdists_enabled;                /* Is any redistributor enabled? */
+        bool has_its;
 #endif
     } vgic;
 
@@ -260,6 +268,7 @@ struct arch_vcpu
 
         /* GICv3: redistributor base and flags for this vCPU */
         paddr_t rdist_base;
+        uint64_t rdist_pendbase;
 #define VGIC_V3_RDIST_LAST      (1 << 0)        /* last vCPU of the rdist */
 #define VGIC_V3_LPIS_ENABLED    (1 << 1)
         uint8_t flags;
-- 
2.9.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 96+ messages in thread

* [PATCH v10 14/32] ARM: introduce vgic_access_guest_memory()
  2017-05-26 17:35 [PATCH v10 00/32] arm64: Dom0 ITS emulation Andre Przywara
                   ` (12 preceding siblings ...)
  2017-05-26 17:35 ` [PATCH v10 13/32] ARM: vGICv3: handle virtual LPI pending and property tables Andre Przywara
@ 2017-05-26 17:35 ` Andre Przywara
  2017-05-26 17:35 ` [PATCH v10 15/32] ARM: vGICv3: re-use vgic_reg64_check_access Andre Przywara
                   ` (17 subsequent siblings)
  31 siblings, 0 replies; 96+ messages in thread
From: Andre Przywara @ 2017-05-26 17:35 UTC (permalink / raw)
  To: Julien Grall, Stefano Stabellini
  Cc: xen-devel, Vijaya Kumar K, Vijay Kilari, Shanker Donthineni

From: Vijaya Kumar K <Vijaya.Kumar@caviumnetworks.com>

This function allows copying a chunk of data from and to guest physical
memory. It looks up the associated page in the guest's p2m tree
and maps this page temporarily for the duration of the access.
This function was originally written by Vijaya as part of an earlier series:
https://patchwork.kernel.org/patch/8177251
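
A hedged usage sketch (not part of the patch; the domain pointer and guest
table address are assumed to exist, error handling is left to the caller):
reading one 8-byte table entry with the new helper, which is safe because a
naturally aligned 8-byte entry never crosses a page boundary:

/* Assumed context: 'd' is the domain, 'table_base' the guest table address. */
static int toy_read_table_entry(struct domain *d, paddr_t table_base,
                                unsigned int idx, uint64_t *entry)
{
    /* false == read from guest memory into '*entry' */
    return vgic_access_guest_memory(d, table_base + idx * sizeof(*entry),
                                    entry, sizeof(*entry), false);
}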

Signed-off-by: Vijaya Kumar K <Vijaya.Kumar@caviumnetworks.com>
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Julien Grall <julien.grall@arm.com>
---
 xen/arch/arm/vgic.c        | 50 ++++++++++++++++++++++++++++++++++++++++++++++
 xen/include/asm-arm/vgic.h |  3 +++
 2 files changed, 53 insertions(+)

diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
index 6b5c0b1..5751d79 100644
--- a/xen/arch/arm/vgic.c
+++ b/xen/arch/arm/vgic.c
@@ -20,6 +20,7 @@
 #include <xen/bitops.h>
 #include <xen/lib.h>
 #include <xen/init.h>
+#include <xen/domain_page.h>
 #include <xen/softirq.h>
 #include <xen/irq.h>
 #include <xen/sched.h>
@@ -637,6 +638,55 @@ void vgic_free_virq(struct domain *d, unsigned int virq)
 }
 
 /*
+ * Temporarily map one physical guest page and copy data to or from it.
+ * The data to be copied cannot cross a page boundary.
+ */
+int vgic_access_guest_memory(struct domain *d, paddr_t gpa, void *buf,
+                             uint32_t size, bool is_write)
+{
+    struct page_info *page;
+    uint64_t offset = gpa & ~PAGE_MASK;  /* Offset within the mapped page */
+    p2m_type_t p2mt;
+    void *p;
+
+    /* Do not cross a page boundary. */
+    if ( size > (PAGE_SIZE - offset) )
+    {
+        printk(XENLOG_G_ERR "d%d: vITS: memory access would cross page boundary\n",
+               d->domain_id);
+        return -EINVAL;
+    }
+
+    page = get_page_from_gfn(d, paddr_to_pfn(gpa), &p2mt, P2M_ALLOC);
+    if ( !page )
+    {
+        printk(XENLOG_G_ERR "d%d: vITS: Failed to get table entry\n",
+               d->domain_id);
+        return -EINVAL;
+    }
+
+    if ( !p2m_is_ram(p2mt) )
+    {
+        put_page(page);
+        printk(XENLOG_G_ERR "d%d: vITS: memory used by the ITS should be RAM.",
+               d->domain_id);
+        return -EINVAL;
+    }
+
+    p = __map_domain_page(page);
+
+    if ( is_write )
+        memcpy(p + offset, buf, size);
+    else
+        memcpy(buf, p + offset, size);
+
+    unmap_domain_page(p);
+    put_page(page);
+
+    return 0;
+}
+
+/*
  * Local variables:
  * mode: C
  * c-file-style: "BSD"
diff --git a/xen/include/asm-arm/vgic.h b/xen/include/asm-arm/vgic.h
index 07b56dc..02732db 100644
--- a/xen/include/asm-arm/vgic.h
+++ b/xen/include/asm-arm/vgic.h
@@ -315,6 +315,9 @@ extern void register_vgic_ops(struct domain *d, const struct vgic_ops *ops);
 int vgic_v2_init(struct domain *d, int *mmio_count);
 int vgic_v3_init(struct domain *d, int *mmio_count);
 
+int vgic_access_guest_memory(struct domain *d, paddr_t gpa, void *buf,
+                             uint32_t size, bool is_write);
+
 extern int domain_vgic_register(struct domain *d, int *mmio_count);
 extern int vcpu_vgic_free(struct vcpu *v);
 extern bool vgic_to_sgi(struct vcpu *v, register_t sgir,
-- 
2.9.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 96+ messages in thread

* [PATCH v10 15/32] ARM: vGICv3: re-use vgic_reg64_check_access
  2017-05-26 17:35 [PATCH v10 00/32] arm64: Dom0 ITS emulation Andre Przywara
                   ` (13 preceding siblings ...)
  2017-05-26 17:35 ` [PATCH v10 14/32] ARM: introduce vgic_access_guest_memory() Andre Przywara
@ 2017-05-26 17:35 ` Andre Przywara
  2017-05-26 17:35 ` [PATCH v10 16/32] ARM: vGIC: advertise LPI support Andre Przywara
                   ` (16 subsequent siblings)
  31 siblings, 0 replies; 96+ messages in thread
From: Andre Przywara @ 2017-05-26 17:35 UTC (permalink / raw)
  To: Julien Grall, Stefano Stabellini
  Cc: xen-devel, Vijaya Kumar K, Vijay Kilari, Shanker Donthineni

vgic_reg64_check_access() checks for a valid access width of a 64-bit
MMIO register, which is useful beyond the current GICv3 emulation only.
Move this function to the vgic-emul.h to be easily reusable.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Julien Grall <julien.grall@arm.com>
---
 xen/arch/arm/vgic-v3.c          | 9 ---------
 xen/include/asm-arm/vgic-emul.h | 9 +++++++++
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/xen/arch/arm/vgic-v3.c b/xen/arch/arm/vgic-v3.c
index b966fc9..e2e5bc1 100644
--- a/xen/arch/arm/vgic-v3.c
+++ b/xen/arch/arm/vgic-v3.c
@@ -161,15 +161,6 @@ static void vgic_store_irouter(struct domain *d, struct vgic_irq_rank *rank,
     }
 }
 
-static inline bool vgic_reg64_check_access(struct hsr_dabt dabt)
-{
-    /*
-     * 64 bits registers can be accessible using 32-bit and 64-bit unless
-     * stated otherwise (See 8.1.3 ARM IHI 0069A).
-     */
-    return ( dabt.size == DABT_DOUBLE_WORD || dabt.size == DABT_WORD );
-}
-
 static int __vgic_v3_rdistr_rd_mmio_read(struct vcpu *v, mmio_info_t *info,
                                          uint32_t gicr_reg,
                                          register_t *r)
diff --git a/xen/include/asm-arm/vgic-emul.h b/xen/include/asm-arm/vgic-emul.h
index 184a1f0..e52fbaa 100644
--- a/xen/include/asm-arm/vgic-emul.h
+++ b/xen/include/asm-arm/vgic-emul.h
@@ -12,6 +12,15 @@
 #define VRANGE32(start, end) start ... end + 3
 #define VRANGE64(start, end) start ... end + 7
 
+/*
+ * 64 bits registers can be accessible using 32-bit and 64-bit unless
+ * stated otherwise (See 8.1.3 ARM IHI 0069A).
+ */
+static inline bool vgic_reg64_check_access(struct hsr_dabt dabt)
+{
+    return ( dabt.size == DABT_DOUBLE_WORD || dabt.size == DABT_WORD );
+}
+
 #endif /* __ASM_ARM_VGIC_EMUL_H__ */
 
 /*
-- 
2.9.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 96+ messages in thread

* [PATCH v10 16/32] ARM: vGIC: advertise LPI support
  2017-05-26 17:35 [PATCH v10 00/32] arm64: Dom0 ITS emulation Andre Przywara
                   ` (14 preceding siblings ...)
  2017-05-26 17:35 ` [PATCH v10 15/32] ARM: vGICv3: re-use vgic_reg64_check_access Andre Przywara
@ 2017-05-26 17:35 ` Andre Przywara
  2017-05-30 12:59   ` Julien Grall
  2017-05-26 17:35 ` [PATCH v10 17/32] ARM: vITS: add command handling stub and MMIO emulation Andre Przywara
                   ` (15 subsequent siblings)
  31 siblings, 1 reply; 96+ messages in thread
From: Andre Przywara @ 2017-05-26 17:35 UTC (permalink / raw)
  To: Julien Grall, Stefano Stabellini
  Cc: xen-devel, Vijaya Kumar K, Vijay Kilari, Shanker Donthineni

To let a guest know about the availability of virtual LPIs, set the
respective bits in the virtual GIC registers and let a guest control
the LPI enable bit.
Only report the LPI capability if the host has initialized at least
one ITS.
For Dom0 we report the same number of interrupt identifiers as the
host, whereas DomUs get a number fixed at 10 bits for the moment, which
covers all SPIs. Also we fix a slight inaccuracy here, since the
number of interrupt identifiers specified in GICD_TYPER depends on the
stream interface and is independent of the number of actually wired
SPIs.
This also removes a "TBD" comment, as we now populate the processor
number in the GICR_TYPER register.
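
For orientation, a standalone sketch of how the per-redistributor TYPER
value is composed (bit positions follow my reading of the GICv3 spec; the
constants and names are stand-ins, not the Xen definitions):

#include <stdint.h>
#include <stdbool.h>

#define TOY_TYPER_PLPIS        (1ULL << 0)   /* physical LPIs supported */
#define TOY_TYPER_LAST         (1ULL << 4)   /* last redistributor in region */
#define TOY_TYPER_PROC_SHIFT   8             /* processor number, bits [23:8] */

uint64_t toy_make_rdist_typer(uint32_t packed_affinity, unsigned int vcpu_id,
                              bool last, bool has_its)
{
    uint64_t typer = (uint64_t)packed_affinity << 32; /* AFF3..AFF0 in [63:32] */

    typer |= (uint64_t)vcpu_id << TOY_TYPER_PROC_SHIFT;

    if ( last )
        typer |= TOY_TYPER_LAST;
    if ( has_its )
        typer |= TOY_TYPER_PLPIS;

    return typer;
}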

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
---
 xen/arch/arm/vgic-v3.c | 76 +++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 69 insertions(+), 7 deletions(-)

diff --git a/xen/arch/arm/vgic-v3.c b/xen/arch/arm/vgic-v3.c
index e2e5bc1..134e1b0 100644
--- a/xen/arch/arm/vgic-v3.c
+++ b/xen/arch/arm/vgic-v3.c
@@ -170,8 +170,19 @@ static int __vgic_v3_rdistr_rd_mmio_read(struct vcpu *v, mmio_info_t *info,
     switch ( gicr_reg )
     {
     case VREG32(GICR_CTLR):
-        /* We have not implemented LPI's, read zero */
-        goto read_as_zero_32;
+    {
+        unsigned long flags;
+
+        if ( !v->domain->arch.vgic.has_its )
+            goto read_as_zero_32;
+        if ( dabt.size != DABT_WORD ) goto bad_width;
+
+        spin_lock_irqsave(&v->arch.vgic.lock, flags);
+        *r = vgic_reg32_extract(!!(v->arch.vgic.flags & VGIC_V3_LPIS_ENABLED),
+                                info);
+        spin_unlock_irqrestore(&v->arch.vgic.lock, flags);
+        return 1;
+    }
 
     case VREG32(GICR_IIDR):
         if ( dabt.size != DABT_WORD ) goto bad_width;
@@ -183,16 +194,20 @@ static int __vgic_v3_rdistr_rd_mmio_read(struct vcpu *v, mmio_info_t *info,
         uint64_t typer, aff;
 
         if ( !vgic_reg64_check_access(dabt) ) goto bad_width;
-        /* TBD: Update processor id in [23:8] when ITS support is added */
         aff = (MPIDR_AFFINITY_LEVEL(v->arch.vmpidr, 3) << 56 |
                MPIDR_AFFINITY_LEVEL(v->arch.vmpidr, 2) << 48 |
                MPIDR_AFFINITY_LEVEL(v->arch.vmpidr, 1) << 40 |
                MPIDR_AFFINITY_LEVEL(v->arch.vmpidr, 0) << 32);
         typer = aff;
+        /* We use the VCPU ID as the redistributor ID in bits[23:8] */
+        typer |= v->vcpu_id << GICR_TYPER_PROC_NUM_SHIFT;
 
         if ( v->arch.vgic.flags & VGIC_V3_RDIST_LAST )
             typer |= GICR_TYPER_LAST;
 
+        if ( v->domain->arch.vgic.has_its )
+            typer |= GICR_TYPER_PLPIS;
+
         *r = vgic_reg64_extract(typer, info);
 
         return 1;
@@ -426,6 +441,33 @@ static uint64_t sanitize_pendbaser(uint64_t reg)
     return reg;
 }
 
+static void vgic_vcpu_enable_lpis(struct vcpu *v)
+{
+    uint64_t reg = v->domain->arch.vgic.rdist_propbase;
+    unsigned int nr_lpis = BIT((reg & 0x1f) + 1);
+
+    /* rdists_enabled is protected by the domain lock. */
+    ASSERT(spin_is_locked(&v->domain->arch.vgic.lock));
+
+    if ( nr_lpis < LPI_OFFSET )
+        nr_lpis = 0;
+    else
+        nr_lpis -= LPI_OFFSET;
+
+    if ( !v->domain->arch.vgic.rdists_enabled )
+    {
+        v->domain->arch.vgic.nr_lpis = nr_lpis;
+        v->domain->arch.vgic.rdists_enabled = true;
+        /*
+         * Make sure the per-domain rdists_enabled flag has been set before
+         * enabling this particular redistributor.
+         */
+        smp_mb();
+    }
+
+    v->arch.vgic.flags |= VGIC_V3_LPIS_ENABLED;
+}
+
 static int __vgic_v3_rdistr_rd_mmio_write(struct vcpu *v, mmio_info_t *info,
                                           uint32_t gicr_reg,
                                           register_t r)
@@ -436,8 +478,26 @@ static int __vgic_v3_rdistr_rd_mmio_write(struct vcpu *v, mmio_info_t *info,
     switch ( gicr_reg )
     {
     case VREG32(GICR_CTLR):
-        /* LPI's not implemented */
-        goto write_ignore_32;
+    {
+        unsigned long flags;
+
+        if ( !v->domain->arch.vgic.has_its )
+            goto write_ignore_32;
+        if ( dabt.size != DABT_WORD ) goto bad_width;
+
+        vgic_lock(v);                   /* protects rdists_enabled */
+        spin_lock_irqsave(&v->arch.vgic.lock, flags);
+
+        /* LPIs can only be enabled once, but never disabled again. */
+        if ( (r & GICR_CTLR_ENABLE_LPIS) &&
+             !(v->arch.vgic.flags & VGIC_V3_LPIS_ENABLED) )
+            vgic_vcpu_enable_lpis(v);
+
+        spin_unlock_irqrestore(&v->arch.vgic.lock, flags);
+        vgic_unlock(v);
+
+        return 1;
+    }
 
     case VREG32(GICR_IIDR):
         /* RO */
@@ -1045,7 +1105,6 @@ static int vgic_v3_distr_mmio_read(struct vcpu *v, mmio_info_t *info,
          * Number of interrupt identifier bits supported by the GIC
          * Stream Protocol Interface
          */
-        unsigned int irq_bits = get_count_order(vgic_num_irqs(v->domain));
         /*
          * Number of processors that may be used as interrupt targets when ARE
          * bit is zero. The maximum is 8.
@@ -1058,7 +1117,10 @@ static int vgic_v3_distr_mmio_read(struct vcpu *v, mmio_info_t *info,
         typer = ((ncpus - 1) << GICD_TYPE_CPUS_SHIFT |
                  DIV_ROUND_UP(v->domain->arch.vgic.nr_spis, 32));
 
-        typer |= (irq_bits - 1) << GICD_TYPE_ID_BITS_SHIFT;
+        if ( v->domain->arch.vgic.has_its )
+            typer |= GICD_TYPE_LPIS;
+
+        typer |= (v->domain->arch.vgic.intid_bits - 1) << GICD_TYPE_ID_BITS_SHIFT;
 
         *r = vgic_reg32_extract(typer, info);
 
-- 
2.9.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 96+ messages in thread

* [PATCH v10 17/32] ARM: vITS: add command handling stub and MMIO emulation
  2017-05-26 17:35 [PATCH v10 00/32] arm64: Dom0 ITS emulation Andre Przywara
                   ` (15 preceding siblings ...)
  2017-05-26 17:35 ` [PATCH v10 16/32] ARM: vGIC: advertise LPI support Andre Przywara
@ 2017-05-26 17:35 ` Andre Przywara
  2017-06-01 18:13   ` Julien Grall
  2017-06-08  9:57   ` Julien Grall
  2017-05-26 17:35 ` [PATCH v10 18/32] ARM: vITS: introduce translation table walks Andre Przywara
                   ` (14 subsequent siblings)
  31 siblings, 2 replies; 96+ messages in thread
From: Andre Przywara @ 2017-05-26 17:35 UTC (permalink / raw)
  To: Julien Grall, Stefano Stabellini
  Cc: xen-devel, Vijaya Kumar K, Vijay Kilari, Shanker Donthineni

Emulate the memory mapped ITS registers and provide a stub to introduce
the ITS command handling framework (but without actually emulating any
commands at this time).
This fixes a misnomer in our virtual ITS structure, where the spec is
confusingly using ID_bits in GITS_TYPER to denote the number of event IDs
(in contrast to GICD_TYPER, where it means number of LPIs).

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
---
 xen/arch/arm/vgic-v3-its.c       | 578 ++++++++++++++++++++++++++++++++++++++-
 xen/include/asm-arm/gic_v3_its.h |   3 +
 2 files changed, 580 insertions(+), 1 deletion(-)

diff --git a/xen/arch/arm/vgic-v3-its.c b/xen/arch/arm/vgic-v3-its.c
index 065ffe2..2c7f22f 100644
--- a/xen/arch/arm/vgic-v3-its.c
+++ b/xen/arch/arm/vgic-v3-its.c
@@ -19,6 +19,16 @@
  * along with this program; If not, see <http://www.gnu.org/licenses/>.
  */
 
+/*
+ * Locking order:
+ *
+ * its->vcmd_lock                        (protects the command queue)
+ *     its->its_lock                     (protects the translation tables)
+ *         d->its_devices_lock           (protects the device RB tree)
+ *             v->vgic.lock              (protects the struct pending_irq)
+ *                 d->pend_lpi_tree_lock (protects the radix tree)
+ */
+
 #include <xen/bitops.h>
 #include <xen/config.h>
 #include <xen/domain_page.h>
@@ -43,7 +53,7 @@
 struct virt_its {
     struct domain *d;
     unsigned int devid_bits;
-    unsigned int intid_bits;
+    unsigned int evid_bits;
     spinlock_t vcmd_lock;       /* Protects the virtual command buffer, which */
     uint64_t cwriter;           /* consists of CWRITER and CREADR and those   */
     uint64_t creadr;            /* shadow variables cwriter and creadr. */
@@ -53,6 +63,7 @@ struct virt_its {
     uint64_t baser_dev, baser_coll;     /* BASER0 and BASER1 for the guest */
     unsigned int max_collections;
     unsigned int max_devices;
+    /* changing "enabled" requires to hold *both* the vcmd_lock and its_lock */
     bool enabled;
 };
 
@@ -67,6 +78,571 @@ struct vits_itte
     uint16_t pad;
 };
 
+/*
+ * Our collection table encoding:
+ * Each entry just contains the VCPU ID of the respective vCPU.
+ */
+typedef uint16_t coll_table_entry_t;
+
+/*
+ * Our device table encodings:
+ * Contains the guest physical address of the Interrupt Translation Table in
+ * bits [51:8], and the size of it is encoded as the number of bits minus one
+ * in the lowest 5 bits of the word.
+ */
+typedef uint64_t dev_table_entry_t;
+#define DEV_TABLE_ITT_ADDR(x) ((x) & GENMASK(51, 8))
+#define DEV_TABLE_ITT_SIZE(x) (BIT(((x) & GENMASK(4, 0)) + 1))
+#define DEV_TABLE_ENTRY(addr, bits)                     \
+        (((addr) & GENMASK(51, 8)) | (((bits) - 1) & GENMASK(4, 0)))
+
+#define GITS_BASER_RO_MASK       (GITS_BASER_TYPE_MASK | \
+                                  (0x1fL << GITS_BASER_ENTRY_SIZE_SHIFT))
+
+/**************************************
+ * Functions that handle ITS commands *
+ **************************************/
+
+static uint64_t its_cmd_mask_field(uint64_t *its_cmd, unsigned int word,
+                                   unsigned int shift, unsigned int size)
+{
+    return (its_cmd[word] >> shift) & (BIT(size) - 1);
+}
+
+#define its_cmd_get_command(cmd)        its_cmd_mask_field(cmd, 0,  0,  8)
+#define its_cmd_get_deviceid(cmd)       its_cmd_mask_field(cmd, 0, 32, 32)
+#define its_cmd_get_size(cmd)           its_cmd_mask_field(cmd, 1,  0,  5)
+#define its_cmd_get_id(cmd)             its_cmd_mask_field(cmd, 1,  0, 32)
+#define its_cmd_get_physical_id(cmd)    its_cmd_mask_field(cmd, 1, 32, 32)
+#define its_cmd_get_collection(cmd)     its_cmd_mask_field(cmd, 2,  0, 16)
+#define its_cmd_get_target_addr(cmd)    its_cmd_mask_field(cmd, 2, 16, 32)
+#define its_cmd_get_validbit(cmd)       its_cmd_mask_field(cmd, 2, 63,  1)
+#define its_cmd_get_ittaddr(cmd)        (its_cmd_mask_field(cmd, 2, 8, 44) << 8)
+
+#define ITS_CMD_BUFFER_SIZE(baser)      ((((baser) & 0xff) + 1) << 12)
+#define ITS_CMD_OFFSET(reg)             ((reg) & GENMASK(19, 5))
+
+/*
+ * Must be called with the vcmd_lock held.
+ * TODO: Investigate whether we can be smarter here and don't need to hold
+ * the lock all of the time.
+ */
+static int vgic_its_handle_cmds(struct domain *d, struct virt_its *its)
+{
+    paddr_t addr = its->cbaser & GENMASK(51, 12);
+    uint64_t command[4];
+
+    ASSERT(spin_is_locked(&its->vcmd_lock));
+
+    if ( its->cwriter >= ITS_CMD_BUFFER_SIZE(its->cbaser) )
+        return -1;
+
+    while ( its->creadr != its->cwriter )
+    {
+        int ret;
+
+        ret = vgic_access_guest_memory(d, addr + its->creadr,
+                                       command, sizeof(command), false);
+        if ( ret )
+            return ret;
+
+        switch ( its_cmd_get_command(command) )
+        {
+        case GITS_CMD_SYNC:
+            /* We handle ITS commands synchronously, so we ignore SYNC. */
+            break;
+        default:
+            gdprintk(XENLOG_WARNING, "vGITS: unhandled ITS command %lu\n",
+                     its_cmd_get_command(command));
+            break;
+        }
+
+        write_u64_atomic(&its->creadr, (its->creadr + ITS_CMD_SIZE) %
+                         ITS_CMD_BUFFER_SIZE(its->cbaser));
+
+        if ( ret )
+            gdprintk(XENLOG_WARNING,
+                     "vGITS: ITS command error %d while handling command %lu\n",
+                     ret, its_cmd_get_command(command));
+    }
+
+    return 0;
+}
+
+/*****************************
+ * ITS registers read access *
+ *****************************/
+
+/* Identifying as an ARM IP, using "X" as the product ID. */
+#define GITS_IIDR_VALUE                 0x5800034c
+
+static int vgic_v3_its_mmio_read(struct vcpu *v, mmio_info_t *info,
+                                 register_t *r, void *priv)
+{
+    struct virt_its *its = priv;
+    uint64_t reg;
+
+    switch ( info->gpa & 0xffff )
+    {
+    case VREG32(GITS_CTLR):
+    {
+        /*
+         * We try to avoid waiting for the command queue lock and report
+         * non-quiescent if that lock is already taken.
+         */
+        bool have_cmd_lock;
+
+        if ( info->dabt.size != DABT_WORD ) goto bad_width;
+
+        have_cmd_lock = spin_trylock(&its->vcmd_lock);
+        reg = its->enabled ? GITS_CTLR_ENABLE : 0;
+
+        if ( have_cmd_lock && its->cwriter == its->creadr )
+            reg |= GITS_CTLR_QUIESCENT;
+
+        if ( have_cmd_lock )
+            spin_unlock(&its->vcmd_lock);
+
+        *r = vgic_reg32_extract(reg, info);
+        break;
+    }
+
+    case VREG32(GITS_IIDR):
+        if ( info->dabt.size != DABT_WORD ) goto bad_width;
+        *r = vgic_reg32_extract(GITS_IIDR_VALUE, info);
+        break;
+
+    case VREG64(GITS_TYPER):
+        if ( !vgic_reg64_check_access(info->dabt) ) goto bad_width;
+
+        reg = GITS_TYPER_PHYSICAL;
+        reg |= (sizeof(struct vits_itte) - 1) << GITS_TYPER_ITT_SIZE_SHIFT;
+        reg |= (its->evid_bits - 1) << GITS_TYPER_IDBITS_SHIFT;
+        reg |= (its->devid_bits - 1) << GITS_TYPER_DEVIDS_SHIFT;
+        *r = vgic_reg64_extract(reg, info);
+        break;
+
+    case VRANGE32(0x0018, 0x001C):
+        goto read_reserved;
+    case VRANGE32(0x0020, 0x003C):
+        goto read_impl_defined;
+    case VRANGE32(0x0040, 0x007C):
+        goto read_reserved;
+
+    case VREG64(GITS_CBASER):
+        if ( !vgic_reg64_check_access(info->dabt) ) goto bad_width;
+        spin_lock(&its->its_lock);
+        *r = vgic_reg64_extract(its->cbaser, info);
+        spin_unlock(&its->its_lock);
+        break;
+
+    case VREG64(GITS_CWRITER):
+        if ( !vgic_reg64_check_access(info->dabt) ) goto bad_width;
+
+        /* CWRITER is only written by the guest, so no extra locking here. */
+        reg = its->cwriter;
+        *r = vgic_reg64_extract(reg, info);
+        break;
+
+    case VREG64(GITS_CREADR):
+        if ( !vgic_reg64_check_access(info->dabt) ) goto bad_width;
+
+        /*
+         * Lockless access, to avoid waiting for the whole command queue to be
+         * finished completely. Xen updates its->creadr atomically after each
+         * command has been handled, this allows other VCPUs to monitor the
+         * progress.
+         */
+        reg = read_u64_atomic(&its->creadr);
+        *r = vgic_reg64_extract(reg, info);
+        break;
+
+    case VRANGE64(0x0098, 0x00F8):
+        goto read_reserved;
+
+    case VREG64(GITS_BASER0):           /* device table */
+        if ( !vgic_reg64_check_access(info->dabt) ) goto bad_width;
+        spin_lock(&its->its_lock);
+        *r = vgic_reg64_extract(its->baser_dev, info);
+        spin_unlock(&its->its_lock);
+        break;
+
+    case VREG64(GITS_BASER1):           /* collection table */
+        if ( !vgic_reg64_check_access(info->dabt) ) goto bad_width;
+        spin_lock(&its->its_lock);
+        *r = vgic_reg64_extract(its->baser_coll, info);
+        spin_unlock(&its->its_lock);
+        break;
+
+    case VRANGE64(GITS_BASER2, GITS_BASER7):
+        goto read_as_zero_64;
+    case VRANGE32(0x0140, 0xBFFC):
+        goto read_reserved;
+    case VRANGE32(0xC000, 0xFFCC):
+        goto read_impl_defined;
+    case VRANGE32(0xFFD0, 0xFFE4):
+        goto read_impl_defined;
+
+    case VREG32(GITS_PIDR2):
+        if ( info->dabt.size != DABT_WORD ) goto bad_width;
+        *r = vgic_reg32_extract(GIC_PIDR2_ARCH_GICv3, info);
+        break;
+
+    case VRANGE32(0xFFEC, 0xFFFC):
+        goto read_impl_defined;
+
+    default:
+        printk(XENLOG_G_ERR
+               "%pv: vGITS: unhandled read r%d offset %#04lx\n",
+               v, info->dabt.reg, (unsigned long)info->gpa & 0xffff);
+        return 0;
+    }
+
+    return 1;
+
+read_as_zero_64:
+    if ( !vgic_reg64_check_access(info->dabt) ) goto bad_width;
+    *r = 0;
+
+    return 1;
+
+read_impl_defined:
+    printk(XENLOG_G_DEBUG
+           "%pv: vGITS: RAZ on implementation defined register offset %#04lx\n",
+           v, info->gpa & 0xffff);
+    *r = 0;
+    return 1;
+
+read_reserved:
+    printk(XENLOG_G_DEBUG
+           "%pv: vGITS: RAZ on reserved register offset %#04lx\n",
+           v, info->gpa & 0xffff);
+    *r = 0;
+    return 1;
+
+bad_width:
+    printk(XENLOG_G_ERR "vGITS: bad read width %d r%d offset %#04lx\n",
+           info->dabt.size, info->dabt.reg, (unsigned long)info->gpa & 0xffff);
+    domain_crash_synchronous();
+
+    return 0;
+}
+
+/******************************
+ * ITS registers write access *
+ ******************************/
+
+static unsigned int its_baser_table_size(uint64_t baser)
+{
+    unsigned int ret, page_size[4] = {SZ_4K, SZ_16K, SZ_64K, SZ_64K};
+
+    ret = page_size[(baser >> GITS_BASER_PAGE_SIZE_SHIFT) & 3];
+
+    return ret * ((baser & GITS_BASER_SIZE_MASK) + 1);
+}
+
+static unsigned int its_baser_nr_entries(uint64_t baser)
+{
+    unsigned int entry_size = GITS_BASER_ENTRY_SIZE(baser);
+
+    return its_baser_table_size(baser) / entry_size;
+}
+
+/* Must be called with the ITS lock held. */
+static bool vgic_v3_verify_its_status(struct virt_its *its, bool status)
+{
+    ASSERT(spin_is_locked(&its->its_lock));
+
+    if ( !status )
+        return false;
+
+    if ( !(its->cbaser & GITS_VALID_BIT) ||
+         !(its->baser_dev & GITS_VALID_BIT) ||
+         !(its->baser_coll & GITS_VALID_BIT) )
+    {
+        printk(XENLOG_G_WARNING "d%d tried to enable ITS without having the tables configured.\n",
+               its->d->domain_id);
+        return false;
+    }
+
+    /*
+     * TODO: Protect against a guest crafting ITS tables.
+     * The spec says that "at the time of the new allocation for use by the ITS"
+     * all tables must contain zeroes. We could enforce this here by clearing
+     * all the tables, but this would be moot since at the moment the guest
+     * can change the tables at any point in time anyway. Right now there are
+     * expectations about the tables being consistent (a VCPU lock protecting
+     * an LPI), which should go away with proper per-IRQ locking.
+     * So for now we ignore this issue and rely on Dom0 not doing bad things.
+     */
+    ASSERT(is_hardware_domain(its->d));
+
+    return true;
+}
+
+static void sanitize_its_base_reg(uint64_t *reg)
+{
+    uint64_t r = *reg;
+
+    /* Avoid outer shareable. */
+    switch ( (r >> GITS_BASER_SHAREABILITY_SHIFT) & 0x03 )
+    {
+    case GIC_BASER_OuterShareable:
+        r &= ~GITS_BASER_SHAREABILITY_MASK;
+        r |= GIC_BASER_InnerShareable << GITS_BASER_SHAREABILITY_SHIFT;
+        break;
+    default:
+        break;
+    }
+
+    /* Avoid any inner non-cacheable mapping. */
+    switch ( (r >> GITS_BASER_INNER_CACHEABILITY_SHIFT) & 0x07 )
+    {
+    case GIC_BASER_CACHE_nCnB:
+    case GIC_BASER_CACHE_nC:
+        r &= ~GITS_BASER_INNER_CACHEABILITY_MASK;
+        r |= GIC_BASER_CACHE_RaWb << GITS_BASER_INNER_CACHEABILITY_SHIFT;
+        break;
+    default:
+        break;
+    }
+
+    /* Only allow non-cacheable or same-as-inner. */
+    switch ( (r >> GITS_BASER_OUTER_CACHEABILITY_SHIFT) & 0x07 )
+    {
+    case GIC_BASER_CACHE_SameAsInner:
+    case GIC_BASER_CACHE_nC:
+        break;
+    default:
+        r &= ~GITS_BASER_OUTER_CACHEABILITY_MASK;
+        r |= GIC_BASER_CACHE_nC << GITS_BASER_OUTER_CACHEABILITY_SHIFT;
+        break;
+    }
+
+    *reg = r;
+}
+
+static int vgic_v3_its_mmio_write(struct vcpu *v, mmio_info_t *info,
+                                  register_t r, void *priv)
+{
+    struct domain *d = v->domain;
+    struct virt_its *its = priv;
+    uint64_t reg;
+    uint32_t reg32;
+
+    switch ( info->gpa & 0xffff )
+    {
+    case VREG32(GITS_CTLR):
+    {
+        uint32_t ctlr;
+
+        if ( info->dabt.size != DABT_WORD ) goto bad_width;
+
+        /*
+         * We need to take the vcmd_lock to prevent a guest from disabling
+         * the ITS while commands are still being processed.
+         */
+        spin_lock(&its->vcmd_lock);
+        spin_lock(&its->its_lock);
+        ctlr = its->enabled ? GITS_CTLR_ENABLE : 0;
+        reg32 = ctlr;
+        vgic_reg32_update(&reg32, r, info);
+
+        if ( ctlr ^ reg32 )
+            its->enabled = vgic_v3_verify_its_status(its,
+                                                     reg32 & GITS_CTLR_ENABLE);
+        spin_unlock(&its->its_lock);
+        spin_unlock(&its->vcmd_lock);
+        return 1;
+    }
+
+    case VREG32(GITS_IIDR):
+        goto write_ignore_32;
+
+    case VREG32(GITS_TYPER):
+        goto write_ignore_32;
+
+    case VRANGE32(0x0018, 0x001C):
+        goto write_reserved;
+    case VRANGE32(0x0020, 0x003C):
+        goto write_impl_defined;
+    case VRANGE32(0x0040, 0x007C):
+        goto write_reserved;
+
+    case VREG64(GITS_CBASER):
+        if ( !vgic_reg64_check_access(info->dabt) ) goto bad_width;
+
+        spin_lock(&its->its_lock);
+        /* Changing base registers with the ITS enabled is UNPREDICTABLE. */
+        if ( its->enabled )
+        {
+            spin_unlock(&its->its_lock);
+            gdprintk(XENLOG_WARNING,
+                     "vGITS: tried to change CBASER with the ITS enabled.\n");
+            return 1;
+        }
+
+        reg = its->cbaser;
+        vgic_reg64_update(&reg, r, info);
+        sanitize_its_base_reg(&reg);
+
+        its->cbaser = reg;
+        its->creadr = 0;
+        spin_unlock(&its->its_lock);
+
+        return 1;
+
+    case VREG64(GITS_CWRITER):
+        if ( !vgic_reg64_check_access(info->dabt) ) goto bad_width;
+
+        spin_lock(&its->vcmd_lock);
+        reg = ITS_CMD_OFFSET(its->cwriter);
+        vgic_reg64_update(&reg, r, info);
+        its->cwriter = ITS_CMD_OFFSET(reg);
+
+        if ( its->enabled )
+            if ( vgic_its_handle_cmds(d, its) )
+                gdprintk(XENLOG_WARNING, "error handling ITS commands\n");
+
+        spin_unlock(&its->vcmd_lock);
+
+        return 1;
+
+    case VREG64(GITS_CREADR):
+        goto write_ignore_64;
+
+    case VRANGE32(0x0098, 0x00FC):
+        goto write_reserved;
+
+    case VREG64(GITS_BASER0):           /* device table */
+        if ( !vgic_reg64_check_access(info->dabt) ) goto bad_width;
+
+        spin_lock(&its->its_lock);
+
+        /*
+         * Changing base registers with the ITS enabled is UNPREDICTABLE;
+         * we choose to ignore it, but warn.
+         */
+        if ( its->enabled )
+        {
+            spin_unlock(&its->its_lock);
+            gdprintk(XENLOG_WARNING, "vGITS: tried to change BASER with the ITS enabled.\n");
+
+            return 1;
+        }
+
+        reg = its->baser_dev;
+        vgic_reg64_update(&reg, r, info);
+
+        /* We don't support indirect tables for now. */
+        reg &= ~(GITS_BASER_RO_MASK | GITS_BASER_INDIRECT);
+        reg |= (sizeof(dev_table_entry_t) - 1) << GITS_BASER_ENTRY_SIZE_SHIFT;
+        reg |= GITS_BASER_TYPE_DEVICE << GITS_BASER_TYPE_SHIFT;
+        sanitize_its_base_reg(&reg);
+
+        if ( reg & GITS_VALID_BIT )
+        {
+            its->max_devices = its_baser_nr_entries(reg);
+            if ( its->max_devices > BIT(its->devid_bits) )
+                its->max_devices = BIT(its->devid_bits);
+        }
+        else
+            its->max_devices = 0;
+
+        its->baser_dev = reg;
+        spin_unlock(&its->its_lock);
+        return 1;
+
+    case VREG64(GITS_BASER1):           /* collection table */
+        if ( !vgic_reg64_check_access(info->dabt) ) goto bad_width;
+
+        spin_lock(&its->its_lock);
+        /*
+         * Changing base registers with the ITS enabled is UNPREDICTABLE;
+         * we choose to ignore it, but warn.
+         */
+        if ( its->enabled )
+        {
+            spin_unlock(&its->its_lock);
+            gdprintk(XENLOG_WARNING, "vGITS: tried to change BASER with the ITS enabled.\n");
+            return 1;
+        }
+
+        reg = its->baser_coll;
+        vgic_reg64_update(&reg, r, info);
+        /* No indirect tables for the collection table. */
+        reg &= ~(GITS_BASER_RO_MASK | GITS_BASER_INDIRECT);
+        reg |= (sizeof(coll_table_entry_t) - 1) << GITS_BASER_ENTRY_SIZE_SHIFT;
+        reg |= GITS_BASER_TYPE_COLLECTION << GITS_BASER_TYPE_SHIFT;
+        sanitize_its_base_reg(&reg);
+
+        if ( reg & GITS_VALID_BIT )
+            its->max_collections = its_baser_nr_entries(reg);
+        else
+            its->max_collections = 0;
+        its->baser_coll = reg;
+        spin_unlock(&its->its_lock);
+        return 1;
+
+    case VRANGE64(GITS_BASER2, GITS_BASER7):
+        goto write_ignore_64;
+
+    case VRANGE32(0x0140, 0xBFFC):
+        goto write_reserved;
+    case VRANGE32(0xC000, 0xFFCC):
+        goto write_impl_defined;
+    case VRANGE32(0xFFD0, 0xFFE4):      /* IMPDEF identification registers */
+        goto write_impl_defined;
+
+    case VREG32(GITS_PIDR2):
+        goto write_ignore_32;
+
+    case VRANGE32(0xFFEC, 0xFFFC):      /* IMPDEF identification registers */
+        goto write_impl_defined;
+
+    default:
+        printk(XENLOG_G_ERR
+               "%pv: vGITS: unhandled write r%d offset %#04lx\n",
+               v, info->dabt.reg, (unsigned long)info->gpa & 0xffff);
+        return 0;
+    }
+
+    return 1;
+
+write_ignore_64:
+    if ( !vgic_reg64_check_access(info->dabt) ) goto bad_width;
+    return 1;
+
+write_ignore_32:
+    if ( info->dabt.size != DABT_WORD ) goto bad_width;
+    return 1;
+
+write_impl_defined:
+    printk(XENLOG_G_DEBUG
+           "%pv: vGITS: WI on implementation defined register offset %#04lx\n",
+           v, info->gpa & 0xffff);
+    return 1;
+
+write_reserved:
+    printk(XENLOG_G_DEBUG
+           "%pv: vGITS: WI on reserved register offset %#04lx\n",
+           v, info->gpa & 0xffff);
+    return 1;
+
+bad_width:
+    printk(XENLOG_G_ERR "vGITS: bad write width %d r%d offset %#08lx\n",
+           info->dabt.size, info->dabt.reg, (unsigned long)info->gpa & 0xffff);
+
+    domain_crash_synchronous();
+
+    return 0;
+}
+
+static const struct mmio_handler_ops vgic_its_mmio_handler = {
+    .read  = vgic_v3_its_mmio_read,
+    .write = vgic_v3_its_mmio_write,
+};
+
 int vgic_v3_its_init_domain(struct domain *d)
 {
     spin_lock_init(&d->arch.vgic.its_devices_lock);
diff --git a/xen/include/asm-arm/gic_v3_its.h b/xen/include/asm-arm/gic_v3_its.h
index 7470779..40f4ef5 100644
--- a/xen/include/asm-arm/gic_v3_its.h
+++ b/xen/include/asm-arm/gic_v3_its.h
@@ -35,6 +35,7 @@
 #define GITS_BASER5                     0x128
 #define GITS_BASER6                     0x130
 #define GITS_BASER7                     0x138
+#define GITS_PIDR2                      GICR_PIDR2
 
 /* Register bits */
 #define GITS_VALID_BIT                  BIT(63)
@@ -57,6 +58,7 @@
 #define GITS_TYPER_ITT_SIZE_MASK        (0xfUL << GITS_TYPER_ITT_SIZE_SHIFT)
 #define GITS_TYPER_ITT_SIZE(r)          ((((r) & GITS_TYPER_ITT_SIZE_MASK) >> \
                                                  GITS_TYPER_ITT_SIZE_SHIFT) + 1)
+#define GITS_TYPER_PHYSICAL             (1U << 0)
 
 #define GITS_BASER_INDIRECT             BIT(62)
 #define GITS_BASER_INNER_CACHEABILITY_SHIFT        59
@@ -76,6 +78,7 @@
                         (((reg >> GITS_BASER_ENTRY_SIZE_SHIFT) & 0x1f) + 1)
 #define GITS_BASER_SHAREABILITY_SHIFT   10
 #define GITS_BASER_PAGE_SIZE_SHIFT      8
+#define GITS_BASER_SIZE_MASK            0xff
 #define GITS_BASER_SHAREABILITY_MASK   (0x3ULL << GITS_BASER_SHAREABILITY_SHIFT)
 #define GITS_BASER_OUTER_CACHEABILITY_MASK   (0x7ULL << GITS_BASER_OUTER_CACHEABILITY_SHIFT)
 #define GITS_BASER_INNER_CACHEABILITY_MASK   (0x7ULL << GITS_BASER_INNER_CACHEABILITY_SHIFT)
-- 
2.9.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 96+ messages in thread

* [PATCH v10 18/32] ARM: vITS: introduce translation table walks
  2017-05-26 17:35 [PATCH v10 00/32] arm64: Dom0 ITS emulation Andre Przywara
                   ` (16 preceding siblings ...)
  2017-05-26 17:35 ` [PATCH v10 17/32] ARM: vITS: add command handling stub and MMIO emulation Andre Przywara
@ 2017-05-26 17:35 ` Andre Przywara
  2017-06-02 16:25   ` Julien Grall
  2017-06-08  9:35   ` Julien Grall
  2017-05-26 17:35 ` [PATCH v10 19/32] ARM: vITS: provide access to struct pending_irq Andre Przywara
                   ` (13 subsequent siblings)
  31 siblings, 2 replies; 96+ messages in thread
From: Andre Przywara @ 2017-05-26 17:35 UTC (permalink / raw)
  To: Julien Grall, Stefano Stabellini
  Cc: xen-devel, Vijaya Kumar K, Vijay Kilari, Shanker Donthineni

The ITS stores the target (v)CPU and the (virtual) LPI number in tables.
Introduce functions to walk those tables and translate a device ID /
event ID pair into a pair of virtual LPI and vCPU.
We map those tables on demand - which is cheap on arm64 - and copy the
respective entries before using them, to avoid the guest tampering with
them in the meantime.

To allow compiling without warnings, we declare two functions as
non-static for the moment, which two later patches will fix.
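
For illustration, a condensed, self-contained sketch of that translation
(plain C with host pointers and hypothetical minimal types; the real code
reads every entry through vgic_access_guest_memory() instead):

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified ITT entry: collection ID plus virtual LPI. */
struct itte_sketch { uint16_t collection; uint32_t vlpi; };

/*
 * Walk device table -> ITT -> collection table to turn a (devid, evid)
 * pair into a (vcpu_id, vlpi) pair. The emulation performs the same walk,
 * but copies each entry out of guest memory before using it.
 */
static bool translate_event(struct itte_sketch **device_table,
                            size_t nr_devices, size_t nr_events,
                            const uint16_t *collection_table,
                            size_t nr_collections,
                            uint32_t devid, uint32_t evid,
                            uint16_t *vcpu_id, uint32_t *vlpi)
{
    struct itte_sketch entry;

    if ( devid >= nr_devices || !device_table[devid] )
        return false;                       /* no MAPD for this device */

    if ( evid >= nr_events )
        return false;                       /* event ID beyond the ITT */

    entry = device_table[devid][evid];      /* copy before using it */

    if ( entry.collection >= nr_collections )
        return false;                       /* no MAPC for this collection */

    *vcpu_id = collection_table[entry.collection];
    *vlpi = entry.vlpi;

    return true;
}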

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
---
 xen/arch/arm/vgic-v3-its.c | 167 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 167 insertions(+)

diff --git a/xen/arch/arm/vgic-v3-its.c b/xen/arch/arm/vgic-v3-its.c
index 2c7f22f..8f4a9e1 100644
--- a/xen/arch/arm/vgic-v3-its.c
+++ b/xen/arch/arm/vgic-v3-its.c
@@ -96,9 +96,176 @@ typedef uint64_t dev_table_entry_t;
 #define DEV_TABLE_ENTRY(addr, bits)                     \
         (((addr) & GENMASK(51, 8)) | (((bits) - 1) & GENMASK(4, 0)))
 
+#define UNMAPPED_COLLECTION      ((coll_table_entry_t)~0)
 #define GITS_BASER_RO_MASK       (GITS_BASER_TYPE_MASK | \
                                   (0x1fL << GITS_BASER_ENTRY_SIZE_SHIFT))
 
+/*
+ * The physical address is encoded slightly differently depending on
+ * the used page size: the highest four bits are stored in the lowest
+ * four bits of the field for 64K pages.
+ */
+static paddr_t get_baser_phys_addr(uint64_t reg)
+{
+    if ( reg & BIT(9) )
+        return (reg & GENMASK(47, 16)) |
+                ((reg & GENMASK(15, 12)) << 36);
+    else
+        return reg & GENMASK(47, 12);
+}
+
+/* Must be called with the ITS lock held. */
+static struct vcpu *get_vcpu_from_collection(struct virt_its *its,
+                                             uint16_t collid)
+{
+    paddr_t addr = get_baser_phys_addr(its->baser_coll);
+    coll_table_entry_t vcpu_id;
+    int ret;
+
+    ASSERT(spin_is_locked(&its->its_lock));
+
+    if ( collid >= its->max_collections )
+        return NULL;
+
+    ret = vgic_access_guest_memory(its->d,
+                                   addr + collid * sizeof(coll_table_entry_t),
+                                   &vcpu_id, sizeof(coll_table_entry_t), false);
+    if ( ret )
+        return NULL;
+
+    if ( vcpu_id == UNMAPPED_COLLECTION || vcpu_id >= its->d->max_vcpus )
+        return NULL;
+
+    return its->d->vcpu[vcpu_id];
+}
+
+/*
+ * Look up the address of the Interrupt Translation Table associated with
+ * a given device ID.
+ * TODO: add support for walking indirect tables.
+ */
+static int its_get_itt(struct virt_its *its, uint32_t devid,
+                       dev_table_entry_t *itt)
+{
+    paddr_t addr = get_baser_phys_addr(its->baser_dev);
+
+    if ( devid >= its->max_devices )
+        return -EINVAL;
+
+    return vgic_access_guest_memory(its->d,
+                                    addr + devid * sizeof(dev_table_entry_t),
+                                    itt, sizeof(*itt), false);
+}
+
+/*
+ * Look up the address of the Interrupt Translation Table associated with
+ * a device ID and return the address of the ITTE belonging to the event ID
+ * (which is an index into that table).
+ */
+static paddr_t its_get_itte_address(struct virt_its *its,
+                                    uint32_t devid, uint32_t evid)
+{
+    dev_table_entry_t itt;
+    int ret;
+
+    ret = its_get_itt(its, devid, &itt);
+    if ( ret )
+        return INVALID_PADDR;
+
+    if ( evid >= DEV_TABLE_ITT_SIZE(itt) ||
+         DEV_TABLE_ITT_ADDR(itt) == INVALID_PADDR )
+        return INVALID_PADDR;
+
+    return DEV_TABLE_ITT_ADDR(itt) + evid * sizeof(struct vits_itte);
+}
+
+/*
+ * Queries the collection and device tables to get the vCPU and virtual
+ * LPI number for a given guest event. This first accesses the guest memory
+ * to resolve the address of the ITTE, then reads the ITTE entry at this
+ * address and puts the result in vcpu_ptr and vlpi_ptr.
+ * Must be called with the ITS lock held.
+ */
+static bool read_itte_locked(struct virt_its *its, uint32_t devid,
+                             uint32_t evid, struct vcpu **vcpu_ptr,
+                             uint32_t *vlpi_ptr)
+{
+    paddr_t addr;
+    struct vits_itte itte;
+    struct vcpu *vcpu;
+
+    ASSERT(spin_is_locked(&its->its_lock));
+
+    addr = its_get_itte_address(its, devid, evid);
+    if ( addr == INVALID_PADDR )
+        return false;
+
+    if ( vgic_access_guest_memory(its->d, addr, &itte, sizeof(itte), false) )
+        return false;
+
+    vcpu = get_vcpu_from_collection(its, itte.collection);
+    if ( !vcpu )
+        return false;
+
+    *vcpu_ptr = vcpu;
+    *vlpi_ptr = itte.vlpi;
+    return true;
+}
+
+/*
+ * This function takes care of the locking by taking the its_lock itself, so
+ * a caller shall not hold this. Before returning, the lock is dropped again.
+ */
+bool read_itte(struct virt_its *its, uint32_t devid, uint32_t evid,
+               struct vcpu **vcpu_ptr, uint32_t *vlpi_ptr)
+{
+    bool ret;
+
+    spin_lock(&its->its_lock);
+    ret = read_itte_locked(its, devid, evid, vcpu_ptr, vlpi_ptr);
+    spin_unlock(&its->its_lock);
+
+    return ret;
+}
+
+/*
+ * Queries the collection and device tables to translate the device ID and
+ * event ID and find the appropriate ITTE. The given collection ID and the
+ * virtual LPI number are then stored into that entry.
+ * If vcpu_ptr is provided, returns the VCPU belonging to that collection.
+ * Must be called with the ITS lock held.
+ */
+bool write_itte_locked(struct virt_its *its, uint32_t devid,
+                       uint32_t evid, uint32_t collid, uint32_t vlpi,
+                       struct vcpu **vcpu_ptr)
+{
+    paddr_t addr;
+    struct vits_itte itte;
+
+    ASSERT(spin_is_locked(&its->its_lock));
+
+    if ( collid >= its->max_collections )
+        return false;
+
+    if ( vlpi >= its->d->arch.vgic.nr_lpis )
+        return false;
+
+    addr = its_get_itte_address(its, devid, evid);
+    if ( addr == INVALID_PADDR )
+        return false;
+
+    itte.collection = collid;
+    itte.vlpi = vlpi;
+
+    if ( vgic_access_guest_memory(its->d, addr, &itte, sizeof(itte), true) )
+        return false;
+
+    if ( vcpu_ptr )
+        *vcpu_ptr = get_vcpu_from_collection(its, collid);
+
+    return true;
+}
+
 /**************************************
  * Functions that handle ITS commands *
  **************************************/
-- 
2.9.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 96+ messages in thread

* [PATCH v10 19/32] ARM: vITS: provide access to struct pending_irq
  2017-05-26 17:35 [PATCH v10 00/32] arm64: Dom0 ITS emulation Andre Przywara
                   ` (17 preceding siblings ...)
  2017-05-26 17:35 ` [PATCH v10 18/32] ARM: vITS: introduce translation table walks Andre Przywara
@ 2017-05-26 17:35 ` Andre Przywara
  2017-06-02 16:32   ` Julien Grall
  2017-05-26 17:35 ` [PATCH v10 20/32] ARM: vITS: handle INT command Andre Przywara
                   ` (12 subsequent siblings)
  31 siblings, 1 reply; 96+ messages in thread
From: Andre Przywara @ 2017-05-26 17:35 UTC (permalink / raw)
  To: Julien Grall, Stefano Stabellini
  Cc: xen-devel, Vijaya Kumar K, Vijay Kilari, Shanker Donthineni

For each device we allocate one struct pending_irq for each virtual
event (MSI).
Provide a helper function which returns the pointer to the appropriate
struct, to be able to find the right struct when given a virtual
deviceID/eventID pair.
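
In essence this is just an array index per device once the device itself
has been looked up; a minimal sketch with hypothetical types (the real
code finds the device in an rb-tree keyed by doorbell address and device
ID, under the its_devices lock):

#include <stddef.h>
#include <stdint.h>

struct pending_irq_sketch { uint32_t irq; unsigned long status; };

struct its_device_sketch {
    uint32_t eventids;                        /* events allocated for it */
    struct pending_irq_sketch *pend_irqs;     /* one entry per event ID  */
};

/* Return the pending_irq backing an event, or NULL if the ID is invalid. */
static struct pending_irq_sketch *
event_to_pending_irq(struct its_device_sketch *dev, uint32_t eventid)
{
    if ( !dev || eventid >= dev->eventids )
        return NULL;

    return &dev->pend_irqs[eventid];
}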

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
---
 xen/arch/arm/gic-v3-its.c        | 59 ++++++++++++++++++++++++++++++++++++++++
 xen/include/asm-arm/gic_v3_its.h |  4 +++
 2 files changed, 63 insertions(+)

diff --git a/xen/arch/arm/gic-v3-its.c b/xen/arch/arm/gic-v3-its.c
index aebc257..38f0840 100644
--- a/xen/arch/arm/gic-v3-its.c
+++ b/xen/arch/arm/gic-v3-its.c
@@ -800,6 +800,65 @@ out:
     return ret;
 }
 
+/* Must be called with the its_device_lock held. */
+static struct its_device *get_its_device(struct domain *d, paddr_t vdoorbell,
+                                         uint32_t vdevid)
+{
+    struct rb_node *node = d->arch.vgic.its_devices.rb_node;
+    struct its_device *dev;
+
+    ASSERT(spin_is_locked(&d->arch.vgic.its_devices_lock));
+
+    while ( node )
+    {
+        int cmp;
+
+        dev = rb_entry(node, struct its_device, rbnode);
+        cmp = compare_its_guest_devices(dev, vdoorbell, vdevid);
+
+        if ( !cmp )
+            return dev;
+
+        if ( cmp > 0 )
+            node = node->rb_left;
+        else
+            node = node->rb_right;
+    }
+
+    return NULL;
+}
+
+static struct pending_irq *get_event_pending_irq(struct domain *d,
+                                                 paddr_t vdoorbell_address,
+                                                 uint32_t vdevid,
+                                                 uint32_t eventid,
+                                                 uint32_t *host_lpi)
+{
+    struct its_device *dev;
+    struct pending_irq *pirq = NULL;
+
+    spin_lock(&d->arch.vgic.its_devices_lock);
+    dev = get_its_device(d, vdoorbell_address, vdevid);
+    if ( dev && eventid < dev->eventids )
+    {
+        pirq = &dev->pend_irqs[eventid];
+        if ( host_lpi )
+            *host_lpi = dev->host_lpi_blocks[eventid / LPI_BLOCK] +
+                        (eventid % LPI_BLOCK);
+    }
+    spin_unlock(&d->arch.vgic.its_devices_lock);
+
+    return pirq;
+}
+
+struct pending_irq *gicv3_its_get_event_pending_irq(struct domain *d,
+                                                    paddr_t vdoorbell_address,
+                                                    uint32_t vdevid,
+                                                    uint32_t eventid)
+{
+    return get_event_pending_irq(d, vdoorbell_address, vdevid, eventid, NULL);
+}
+
 /* Scan the DT for any ITS nodes and create a list of host ITSes out of it. */
 void gicv3_its_dt_init(const struct dt_device_node *node)
 {
diff --git a/xen/include/asm-arm/gic_v3_its.h b/xen/include/asm-arm/gic_v3_its.h
index 40f4ef5..d162e89 100644
--- a/xen/include/asm-arm/gic_v3_its.h
+++ b/xen/include/asm-arm/gic_v3_its.h
@@ -169,6 +169,10 @@ int gicv3_its_map_guest_device(struct domain *d,
 int gicv3_allocate_host_lpi_block(struct domain *d, uint32_t *first_lpi);
 void gicv3_free_host_lpi_block(uint32_t first_lpi);
 
+struct pending_irq *gicv3_its_get_event_pending_irq(struct domain *d,
+                                                    paddr_t vdoorbell_address,
+                                                    uint32_t vdevid,
+                                                    uint32_t veventid);
 #else
 
 static inline void gicv3_its_dt_init(const struct dt_device_node *node)
-- 
2.9.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 96+ messages in thread

* [PATCH v10 20/32] ARM: vITS: handle INT command
  2017-05-26 17:35 [PATCH v10 00/32] arm64: Dom0 ITS emulation Andre Przywara
                   ` (18 preceding siblings ...)
  2017-05-26 17:35 ` [PATCH v10 19/32] ARM: vITS: provide access to struct pending_irq Andre Przywara
@ 2017-05-26 17:35 ` Andre Przywara
  2017-06-02 16:37   ` Julien Grall
  2017-05-26 17:35 ` [PATCH v10 21/32] ARM: vITS: handle MAPC command Andre Przywara
                   ` (11 subsequent siblings)
  31 siblings, 1 reply; 96+ messages in thread
From: Andre Przywara @ 2017-05-26 17:35 UTC (permalink / raw)
  To: Julien Grall, Stefano Stabellini
  Cc: xen-devel, Vijaya Kumar K, Vijay Kilari, Shanker Donthineni

The INT command sets a given LPI identified by a DeviceID/EventID pair
as pending and thus triggers it to be injected.
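
All command handlers pick their fields out of the 32-byte command in the
same way; a sketch of the accessor scheme (field positions for INT as in
the GICv3 ITS command encoding, field sizes below 64 bits assumed):

#include <stdint.h>

/* An ITS command is four 64-bit words; extract a field from one of them. */
static uint64_t cmd_mask_field(const uint64_t *cmd, unsigned int word,
                               unsigned int shift, unsigned int size)
{
    return (cmd[word] >> shift) & ((1ULL << size) - 1);
}

/* INT: DeviceID in word 0, bits[63:32]; EventID in word 1, bits[31:0]. */
#define cmd_get_deviceid(cmd)   cmd_mask_field(cmd, 0, 32, 32)
#define cmd_get_eventid(cmd)    cmd_mask_field(cmd, 1,  0, 32)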

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
---
 xen/arch/arm/vgic-v3-its.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/xen/arch/arm/vgic-v3-its.c b/xen/arch/arm/vgic-v3-its.c
index 8f4a9e1..80a4625 100644
--- a/xen/arch/arm/vgic-v3-its.c
+++ b/xen/arch/arm/vgic-v3-its.c
@@ -286,6 +286,24 @@ static uint64_t its_cmd_mask_field(uint64_t *its_cmd, unsigned int word,
 #define its_cmd_get_validbit(cmd)       its_cmd_mask_field(cmd, 2, 63,  1)
 #define its_cmd_get_ittaddr(cmd)        (its_cmd_mask_field(cmd, 2, 8, 44) << 8)
 
+static int its_handle_int(struct virt_its *its, uint64_t *cmdptr)
+{
+    uint32_t devid = its_cmd_get_deviceid(cmdptr);
+    uint32_t eventid = its_cmd_get_id(cmdptr);
+    struct vcpu *vcpu;
+    uint32_t vlpi;
+
+    if ( !read_itte(its, devid, eventid, &vcpu, &vlpi) )
+        return -1;
+
+    if ( vlpi == INVALID_LPI )
+        return -1;
+
+    vgic_vcpu_inject_irq(vcpu, vlpi);
+
+    return 0;
+}
+
 #define ITS_CMD_BUFFER_SIZE(baser)      ((((baser) & 0xff) + 1) << 12)
 #define ITS_CMD_OFFSET(reg)             ((reg) & GENMASK(19, 5))
 
@@ -315,6 +333,9 @@ static int vgic_its_handle_cmds(struct domain *d, struct virt_its *its)
 
         switch ( its_cmd_get_command(command) )
         {
+        case GITS_CMD_INT:
+            ret = its_handle_int(its, command);
+            break;
         case GITS_CMD_SYNC:
             /* We handle ITS commands synchronously, so we ignore SYNC. */
             break;
-- 
2.9.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 96+ messages in thread

* [PATCH v10 21/32] ARM: vITS: handle MAPC command
  2017-05-26 17:35 [PATCH v10 00/32] arm64: Dom0 ITS emulation Andre Przywara
                   ` (19 preceding siblings ...)
  2017-05-26 17:35 ` [PATCH v10 20/32] ARM: vITS: handle INT command Andre Przywara
@ 2017-05-26 17:35 ` Andre Przywara
  2017-05-26 17:35 ` [PATCH v10 22/32] ARM: vITS: handle CLEAR command Andre Przywara
                   ` (10 subsequent siblings)
  31 siblings, 0 replies; 96+ messages in thread
From: Andre Przywara @ 2017-05-26 17:35 UTC (permalink / raw)
  To: Julien Grall, Stefano Stabellini
  Cc: xen-devel, Vijaya Kumar K, Vijay Kilari, Shanker Donthineni

The MAPC command associates a given collection ID with a given
redistributor, thus mapping collections to VCPUs.
We just store the vcpu_id in the collection table for that.
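
A rough sketch of what that boils down to, with the collection table
modelled as a flat array of 16-bit entries (one per collection ID, ~0
marking an unmapped collection, as in the emulation; guest memory access
and range checks omitted):

#include <stdbool.h>
#include <stdint.h>

typedef uint16_t coll_entry_t;
#define UNMAPPED_COLL   ((coll_entry_t)~0)

/* MAPC with V=1 records the target VCPU, with V=0 it unmaps the collection. */
static void mapc(coll_entry_t *coll_table, uint16_t collid,
                 uint16_t vcpu_id, bool valid)
{
    coll_table[collid] = valid ? vcpu_id : UNMAPPED_COLL;
}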

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Julien Grall <julien.grall@arm.com>
---
 xen/arch/arm/vgic-v3-its.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 45 insertions(+)

diff --git a/xen/arch/arm/vgic-v3-its.c b/xen/arch/arm/vgic-v3-its.c
index 80a4625..77205bc 100644
--- a/xen/arch/arm/vgic-v3-its.c
+++ b/xen/arch/arm/vgic-v3-its.c
@@ -115,6 +115,25 @@ static paddr_t get_baser_phys_addr(uint64_t reg)
 }
 
 /* Must be called with the ITS lock held. */
+static int its_set_collection(struct virt_its *its, uint16_t collid,
+                              coll_table_entry_t vcpu_id)
+{
+    paddr_t addr = get_baser_phys_addr(its->baser_coll);
+
+    /* The collection table entry must be able to store a VCPU ID. */
+    BUILD_BUG_ON(BIT(sizeof(coll_table_entry_t) * 8) < MAX_VIRT_CPUS);
+
+    ASSERT(spin_is_locked(&its->its_lock));
+
+    if ( collid >= its->max_collections )
+        return -ENOENT;
+
+    return vgic_access_guest_memory(its->d,
+                                    addr + collid * sizeof(coll_table_entry_t),
+                                    &vcpu_id, sizeof(vcpu_id), true);
+}
+
+/* Must be called with the ITS lock held. */
 static struct vcpu *get_vcpu_from_collection(struct virt_its *its,
                                              uint16_t collid)
 {
@@ -304,6 +323,29 @@ static int its_handle_int(struct virt_its *its, uint64_t *cmdptr)
     return 0;
 }
 
+static int its_handle_mapc(struct virt_its *its, uint64_t *cmdptr)
+{
+    uint32_t collid = its_cmd_get_collection(cmdptr);
+    uint64_t rdbase = its_cmd_mask_field(cmdptr, 2, 16, 44);
+
+    if ( collid >= its->max_collections )
+        return -1;
+
+    if ( rdbase >= its->d->max_vcpus )
+        return -1;
+
+    spin_lock(&its->its_lock);
+
+    if ( its_cmd_get_validbit(cmdptr) )
+        its_set_collection(its, collid, rdbase);
+    else
+        its_set_collection(its, collid, UNMAPPED_COLLECTION);
+
+    spin_unlock(&its->its_lock);
+
+    return 0;
+}
+
 #define ITS_CMD_BUFFER_SIZE(baser)      ((((baser) & 0xff) + 1) << 12)
 #define ITS_CMD_OFFSET(reg)             ((reg) & GENMASK(19, 5))
 
@@ -336,6 +378,9 @@ static int vgic_its_handle_cmds(struct domain *d, struct virt_its *its)
         case GITS_CMD_INT:
             ret = its_handle_int(its, command);
             break;
+        case GITS_CMD_MAPC:
+            ret = its_handle_mapc(its, command);
+            break;
         case GITS_CMD_SYNC:
             /* We handle ITS commands synchronously, so we ignore SYNC. */
             break;
-- 
2.9.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 96+ messages in thread

* [PATCH v10 22/32] ARM: vITS: handle CLEAR command
  2017-05-26 17:35 [PATCH v10 00/32] arm64: Dom0 ITS emulation Andre Przywara
                   ` (20 preceding siblings ...)
  2017-05-26 17:35 ` [PATCH v10 21/32] ARM: vITS: handle MAPC command Andre Przywara
@ 2017-05-26 17:35 ` Andre Przywara
  2017-06-02 16:40   ` Julien Grall
  2017-05-26 17:35 ` [PATCH v10 23/32] ARM: vITS: handle MAPD command Andre Przywara
                   ` (9 subsequent siblings)
  31 siblings, 1 reply; 96+ messages in thread
From: Andre Przywara @ 2017-05-26 17:35 UTC (permalink / raw)
  To: Julien Grall, Stefano Stabellini
  Cc: xen-devel, Vijaya Kumar K, Vijay Kilari, Shanker Donthineni

This introduces the ITS command handler for the CLEAR command, which
clears the pending state of an LPI.
This removes a not-yet injected, but already queued IRQ from a VCPU.
As read_itte() now gets its first user, we add the static keyword.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
---
 xen/arch/arm/vgic-v3-its.c | 59 ++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 57 insertions(+), 2 deletions(-)

diff --git a/xen/arch/arm/vgic-v3-its.c b/xen/arch/arm/vgic-v3-its.c
index 77205bc..3ba49db 100644
--- a/xen/arch/arm/vgic-v3-its.c
+++ b/xen/arch/arm/vgic-v3-its.c
@@ -52,6 +52,7 @@
  */
 struct virt_its {
     struct domain *d;
+    paddr_t doorbell_address;
     unsigned int devid_bits;
     unsigned int evid_bits;
     spinlock_t vcmd_lock;       /* Protects the virtual command buffer, which */
@@ -235,8 +236,8 @@ static bool read_itte_locked(struct virt_its *its, uint32_t devid,
  * This function takes care of the locking by taking the its_lock itself, so
  * a caller shall not hold this. Before returning, the lock is dropped again.
  */
-bool read_itte(struct virt_its *its, uint32_t devid, uint32_t evid,
-               struct vcpu **vcpu_ptr, uint32_t *vlpi_ptr)
+static bool read_itte(struct virt_its *its, uint32_t devid, uint32_t evid,
+                      struct vcpu **vcpu_ptr, uint32_t *vlpi_ptr)
 {
     bool ret;
 
@@ -346,6 +347,57 @@ static int its_handle_mapc(struct virt_its *its, uint64_t *cmdptr)
     return 0;
 }
 
+/* CLEAR removes the pending state from an LPI. */
+static int its_handle_clear(struct virt_its *its, uint64_t *cmdptr)
+{
+    uint32_t devid = its_cmd_get_deviceid(cmdptr);
+    uint32_t eventid = its_cmd_get_id(cmdptr);
+    struct pending_irq *p;
+    struct vcpu *vcpu;
+    uint32_t vlpi;
+    unsigned long flags;
+    int ret = -1;
+
+    spin_lock(&its->its_lock);
+
+    /* Translate the DevID/EvID pair into a vCPU/vLPI pair. */
+    if ( !read_itte_locked(its, devid, eventid, &vcpu, &vlpi) )
+        goto out_unlock;
+
+    p = gicv3_its_get_event_pending_irq(its->d, its->doorbell_address,
+                                        devid, eventid);
+    /* Protect against an invalid LPI number. */
+    if ( unlikely(!p) )
+        goto out_unlock;
+
+    /*
+     * TODO: This relies on the VCPU being correct in the ITS tables.
+     * This can be fixed by either using a per-IRQ lock or by using
+     * the VCPU ID from the pending_irq instead.
+     */
+    spin_lock_irqsave(&vcpu->arch.vgic.lock, flags);
+
+    /*
+     * If the LPI is already visible on the guest, it is too late to
+     * clear the pending state. However this is a benign race that can
+     * happen on real hardware, too: If the LPI has already been forwarded
+     * to a CPU interface, a CLEAR request reaching the redistributor has
+     * no effect on that LPI anymore. Since LPIs are edge triggered and
+     * have no active state, we don't need to care about this here.
+     */
+    if ( !test_bit(GIC_IRQ_GUEST_VISIBLE, &p->status) )
+        gic_remove_irq(vcpu, p);
+
+    spin_unlock_irqrestore(&vcpu->arch.vgic.lock, flags);
+    ret = 0;
+
+out_unlock:
+    spin_unlock(&its->its_lock);
+
+    return ret;
+}
+
 #define ITS_CMD_BUFFER_SIZE(baser)      ((((baser) & 0xff) + 1) << 12)
 #define ITS_CMD_OFFSET(reg)             ((reg) & GENMASK(19, 5))
 
@@ -375,6 +427,9 @@ static int vgic_its_handle_cmds(struct domain *d, struct virt_its *its)
 
         switch ( its_cmd_get_command(command) )
         {
+        case GITS_CMD_CLEAR:
+            ret = its_handle_clear(its, command);
+            break;
         case GITS_CMD_INT:
             ret = its_handle_int(its, command);
             break;
-- 
2.9.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 96+ messages in thread

* [PATCH v10 23/32] ARM: vITS: handle MAPD command
  2017-05-26 17:35 [PATCH v10 00/32] arm64: Dom0 ITS emulation Andre Przywara
                   ` (21 preceding siblings ...)
  2017-05-26 17:35 ` [PATCH v10 22/32] ARM: vITS: handle CLEAR command Andre Przywara
@ 2017-05-26 17:35 ` Andre Przywara
  2017-06-02 16:46   ` Julien Grall
  2017-05-26 17:35 ` [PATCH v10 24/32] ARM: GICv3: handle unmapped LPIs Andre Przywara
                   ` (8 subsequent siblings)
  31 siblings, 1 reply; 96+ messages in thread
From: Andre Przywara @ 2017-05-26 17:35 UTC (permalink / raw)
  To: Julien Grall, Stefano Stabellini
  Cc: xen-devel, Vijaya Kumar K, Vijay Kilari, Shanker Donthineni

The MAPD command maps a device by associating a memory region for
storing ITEs with a certain device ID. Since it features a valid bit,
MAPD also covers the "unmap" functionality, which we handle here as well.
We store the given guest physical address in the device table, and, if
this command comes from Dom0, tell the host ITS driver about this new
mapping, so it can issue the corresponding host MAPD command and create
the required tables. We take care of rolling back actions should one
step fail.
Upon unmapping a device we make sure we clean up all associated
resources and release the memory again.
We use our existing guest memory access function to find the right ITT
entry and store the mapping there (in guest memory).
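
The device table entry written here packs both pieces of information into
one 64-bit word; a sketch of the encoding this emulation uses (ITT address
in bits[51:8], number of event ID bits minus one in bits[4:0]):

#include <stdint.h>

#define MASK_ULL(h, l)  (((~0ULL) >> (63 - (h))) & ((~0ULL) << (l)))

/* Build a device table entry from the ITT address and the event ID width. */
static uint64_t dev_table_entry(uint64_t itt_addr, unsigned int evid_bits)
{
    return (itt_addr & MASK_ULL(51, 8)) | ((evid_bits - 1) & MASK_ULL(4, 0));
}

/* Recover the ITT address and the number of events from such an entry. */
static uint64_t entry_itt_addr(uint64_t entry)
{
    return entry & MASK_ULL(51, 8);
}

static uint64_t entry_nr_events(uint64_t entry)
{
    return 1ULL << ((entry & MASK_ULL(4, 0)) + 1);
}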

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
---
 xen/arch/arm/gic-v3-its.c        |  17 +++++
 xen/arch/arm/gic-v3-lpi.c        |  17 +++++
 xen/arch/arm/vgic-v3-its.c       | 143 +++++++++++++++++++++++++++++++++++++++
 xen/include/asm-arm/gic_v3_its.h |   5 ++
 4 files changed, 182 insertions(+)

diff --git a/xen/arch/arm/gic-v3-its.c b/xen/arch/arm/gic-v3-its.c
index 38f0840..8864e0b 100644
--- a/xen/arch/arm/gic-v3-its.c
+++ b/xen/arch/arm/gic-v3-its.c
@@ -859,6 +859,23 @@ struct pending_irq *gicv3_its_get_event_pending_irq(struct domain *d,
     return get_event_pending_irq(d, vdoorbell_address, vdevid, eventid, NULL);
 }
 
+int gicv3_remove_guest_event(struct domain *d, paddr_t vdoorbell_address,
+                             uint32_t vdevid, uint32_t eventid)
+{
+    uint32_t host_lpi = INVALID_LPI;
+
+    if ( !get_event_pending_irq(d, vdoorbell_address, vdevid, eventid,
+                                &host_lpi) )
+        return -EINVAL;
+
+    if ( host_lpi == INVALID_LPI )
+        return -EINVAL;
+
+    gicv3_lpi_update_host_entry(host_lpi, d->domain_id, INVALID_LPI);
+
+    return 0;
+}
+
 /* Scan the DT for any ITS nodes and create a list of host ITSes out of it. */
 void gicv3_its_dt_init(const struct dt_device_node *node)
 {
diff --git a/xen/arch/arm/gic-v3-lpi.c b/xen/arch/arm/gic-v3-lpi.c
index 438bbfe..5f6e855 100644
--- a/xen/arch/arm/gic-v3-lpi.c
+++ b/xen/arch/arm/gic-v3-lpi.c
@@ -209,6 +209,23 @@ out:
     irq_exit();
 }
 
+void gicv3_lpi_update_host_entry(uint32_t host_lpi, int domain_id,
+                                 uint32_t virt_lpi)
+{
+    union host_lpi *hlpip, hlpi;
+
+    ASSERT(host_lpi >= LPI_OFFSET);
+
+    host_lpi -= LPI_OFFSET;
+
+    hlpip = &lpi_data.host_lpis[host_lpi / HOST_LPIS_PER_PAGE][host_lpi % HOST_LPIS_PER_PAGE];
+
+    hlpi.virt_lpi = virt_lpi;
+    hlpi.dom_id = domain_id;
+
+    write_u64_atomic(&hlpip->data, hlpi.data);
+}
+
 static int gicv3_lpi_allocate_pendtable(uint64_t *reg)
 {
     uint64_t val;
diff --git a/xen/arch/arm/vgic-v3-its.c b/xen/arch/arm/vgic-v3-its.c
index 3ba49db..e9b1cb4 100644
--- a/xen/arch/arm/vgic-v3-its.c
+++ b/xen/arch/arm/vgic-v3-its.c
@@ -159,6 +159,21 @@ static struct vcpu *get_vcpu_from_collection(struct virt_its *its,
     return its->d->vcpu[vcpu_id];
 }
 
+/* Set the address of an ITT for a given device ID. */
+static int its_set_itt_address(struct virt_its *its, uint32_t devid,
+                               paddr_t itt_address, uint32_t nr_bits)
+{
+    paddr_t addr = get_baser_phys_addr(its->baser_dev);
+    dev_table_entry_t itt_entry = DEV_TABLE_ENTRY(itt_address, nr_bits);
+
+    if ( devid >= its->max_devices )
+        return -ENOENT;
+
+    return vgic_access_guest_memory(its->d,
+                                    addr + devid * sizeof(dev_table_entry_t),
+                                    &itt_entry, sizeof(itt_entry), true);
+}
+
 /*
  * Lookup the address of the Interrupt Translation Table associated with
  * that device ID.
@@ -398,6 +413,131 @@ out_unlock:
     return ret;
 }
 
+/* Must be called with the ITS lock held. */
+static int its_discard_event(struct virt_its *its,
+                             uint32_t vdevid, uint32_t vevid)
+{
+    struct pending_irq *p;
+    unsigned long flags;
+    struct vcpu *vcpu;
+    uint32_t vlpi;
+
+    ASSERT(spin_is_locked(&its->its_lock));
+
+    if ( !read_itte_locked(its, vdevid, vevid, &vcpu, &vlpi) )
+        return -ENOENT;
+
+    if ( vlpi == INVALID_LPI )
+        return -ENOENT;
+
+    /*
+     * TODO: This relies on the VCPU being correct in the ITS tables.
+     * This can be fixed by either using a per-IRQ lock or by using
+     * the VCPU ID from the pending_irq instead.
+     */
+    spin_lock_irqsave(&vcpu->arch.vgic.lock, flags);
+
+    /* Remove the pending_irq from the tree. */
+    write_lock(&its->d->arch.vgic.pend_lpi_tree_lock);
+    p = radix_tree_delete(&its->d->arch.vgic.pend_lpi_tree, vlpi);
+    write_unlock(&its->d->arch.vgic.pend_lpi_tree_lock);
+
+    if ( !p )
+    {
+        spin_unlock_irqrestore(&vcpu->arch.vgic.lock, flags);
+
+        return -ENOENT;
+    }
+
+    /* Cleanup the pending_irq and disconnect it from the LPI. */
+    list_del_init(&p->inflight);
+    list_del_init(&p->lr_queue);
+    vgic_init_pending_irq(p, INVALID_LPI);
+
+    spin_unlock_irqrestore(&vcpu->arch.vgic.lock, flags);
+
+    /* Remove the corresponding host LPI entry */
+    return gicv3_remove_guest_event(its->d, its->doorbell_address,
+                                    vdevid, vevid);
+}
+
+static void its_unmap_device(struct virt_its *its, uint32_t devid)
+{
+    dev_table_entry_t itt;
+    uint64_t evid;
+
+    spin_lock(&its->its_lock);
+
+    if ( its_get_itt(its, devid, &itt) )
+        goto out;
+
+    /*
+     * For DomUs we need to check that the number of events per device
+     * is really limited, otherwise looping over all events can take too
+     * long for a guest. This ASSERT can then be removed if that is
+     * covered.
+     */
+    ASSERT(is_hardware_domain(its->d));
+
+    for ( evid = 0; evid < DEV_TABLE_ITT_SIZE(itt); evid++ )
+        /* Don't care about errors here, clean up as much as possible. */
+        its_discard_event(its, devid, evid);
+
+out:
+    spin_unlock(&its->its_lock);
+}
+
+static int its_handle_mapd(struct virt_its *its, uint64_t *cmdptr)
+{
+    /* size and devid get validated by the functions called below. */
+    uint32_t devid = its_cmd_get_deviceid(cmdptr);
+    unsigned int size = its_cmd_get_size(cmdptr) + 1;
+    bool valid = its_cmd_get_validbit(cmdptr);
+    paddr_t itt_addr = its_cmd_get_ittaddr(cmdptr);
+    int ret;
+
+    /* Sanitize the number of events. */
+    if ( valid && (size > its->evid_bits) )
+        return -1;
+
+    if ( !valid )
+        /* Discard all events and remove pending LPIs. */
+        its_unmap_device(its, devid);
+
+    /*
+     * There is no easy and clean way for Xen to know the ITS device ID of a
+     * particular (PCI) device, so we have to rely on the guest telling
+     * us about it. For *now* we are just using the device ID *Dom0* uses,
+     * because the driver there has the actual knowledge.
+     * Eventually this will be replaced with a dedicated hypercall to
+     * announce pass-through of devices.
+     */
+    if ( is_hardware_domain(its->d) )
+    {
+
+        /*
+         * Dom0's ITSes are mapped 1:1, so both addresses are the same.
+         * Also the device IDs are equal.
+         */
+        ret = gicv3_its_map_guest_device(its->d, its->doorbell_address, devid,
+                                         its->doorbell_address, devid,
+                                         BIT(size), valid);
+        if ( ret && valid )
+            return ret;
+    }
+
+    spin_lock(&its->its_lock);
+
+    if ( valid )
+        ret = its_set_itt_address(its, devid, itt_addr, size);
+    else
+        ret = its_set_itt_address(its, devid, INVALID_PADDR, 1);
+
+    spin_unlock(&its->its_lock);
+
+    return ret;
+}
+
 #define ITS_CMD_BUFFER_SIZE(baser)      ((((baser) & 0xff) + 1) << 12)
 #define ITS_CMD_OFFSET(reg)             ((reg) & GENMASK(19, 5))
 
@@ -436,6 +576,9 @@ static int vgic_its_handle_cmds(struct domain *d, struct virt_its *its)
         case GITS_CMD_MAPC:
             ret = its_handle_mapc(its, command);
             break;
+        case GITS_CMD_MAPD:
+            ret = its_handle_mapd(its, command);
+            break;
         case GITS_CMD_SYNC:
             /* We handle ITS commands synchronously, so we ignore SYNC. */
             break;
diff --git a/xen/include/asm-arm/gic_v3_its.h b/xen/include/asm-arm/gic_v3_its.h
index d162e89..e78dadf 100644
--- a/xen/include/asm-arm/gic_v3_its.h
+++ b/xen/include/asm-arm/gic_v3_its.h
@@ -173,6 +173,11 @@ struct pending_irq *gicv3_its_get_event_pending_irq(struct domain *d,
                                                     paddr_t vdoorbell_address,
                                                     uint32_t vdevid,
                                                     uint32_t veventid);
+int gicv3_remove_guest_event(struct domain *d, paddr_t vdoorbell_address,
+                                     uint32_t vdevid, uint32_t veventid);
+void gicv3_lpi_update_host_entry(uint32_t host_lpi, int domain_id,
+                                 uint32_t virt_lpi);
+
 #else
 
 static inline void gicv3_its_dt_init(const struct dt_device_node *node)
-- 
2.9.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 96+ messages in thread

* [PATCH v10 24/32] ARM: GICv3: handle unmapped LPIs
  2017-05-26 17:35 [PATCH v10 00/32] arm64: Dom0 ITS emulation Andre Przywara
                   ` (22 preceding siblings ...)
  2017-05-26 17:35 ` [PATCH v10 23/32] ARM: vITS: handle MAPD command Andre Przywara
@ 2017-05-26 17:35 ` Andre Przywara
  2017-06-02 16:55   ` Julien Grall
  2017-06-08  9:45   ` Julien Grall
  2017-05-26 17:35 ` [PATCH v10 25/32] ARM: vITS: handle MAPTI/MAPI command Andre Przywara
                   ` (7 subsequent siblings)
  31 siblings, 2 replies; 96+ messages in thread
From: Andre Przywara @ 2017-05-26 17:35 UTC (permalink / raw)
  To: Julien Grall, Stefano Stabellini
  Cc: xen-devel, Vijaya Kumar K, Vijay Kilari, Shanker Donthineni

When LPIs get unmapped by a guest, they might still be in some LR of
some VCPU. Nevertheless we remove the corresponding pending_irq
(possibly freeing it), and detect this case (irq_to_pending() returns
NULL) when the LR gets cleaned up later.
However a *new* LPI may get mapped with the same number while the old
LPI is *still* in some LR. To avoid getting the wrong state, we mark
every newly mapped LPI as PRISTINE, which means: has never been in an
LR before. If we detect the LPI in an LR anyway, it must have been an
older one, which we can simply retire.
Before inserting such a PRISTINE LPI into an LR, we must make sure that
it's not already in another LR, as the architecture forbids two
interrupts with the same virtual IRQ number on one CPU.
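
A condensed sketch of the flag's life cycle (plain C bit operations
standing in for Xen's atomic set_bit/test_bit helpers, hypothetical
struct, no locking shown):

#include <stdbool.h>

#define PRISTINE_LPI    (1u << 5)

struct virq_sketch { unsigned int status; };

/* MAPTI: a freshly mapped LPI has never been in any list register. */
static void on_map(struct virq_sketch *p)     { p->status |= PRISTINE_LPI; }

/* Writing the LPI into an LR clears the mark: LR contents can be trusted. */
static void on_set_lr(struct virq_sketch *p)  { p->status &= ~PRISTINE_LPI; }

/*
 * When folding an LR back: if the pending_irq is still marked PRISTINE,
 * whatever the LR holds belongs to an older, meanwhile unmapped LPI with
 * the same number and can simply be retired.
 */
static bool lr_contents_are_stale(struct virq_sketch *p)
{
    bool stale = p->status & PRISTINE_LPI;

    p->status &= ~PRISTINE_LPI;
    return stale;
}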

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
---
 xen/arch/arm/gic.c         | 51 ++++++++++++++++++++++++++++++++++++++++++----
 xen/include/asm-arm/vgic.h |  6 ++++++
 2 files changed, 53 insertions(+), 4 deletions(-)

diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index 9e32103..da5ba86 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -375,6 +375,8 @@ static inline void gic_set_lr(int lr, struct pending_irq *p,
 {
     ASSERT(!local_irq_is_enabled());
 
+    clear_bit(GIC_IRQ_GUEST_PRISTINE_LPI, &p->status);
+
     gic_hw_ops->update_lr(lr, p, state);
 
     set_bit(GIC_IRQ_GUEST_VISIBLE, &p->status);
@@ -441,6 +443,40 @@ void gic_raise_inflight_irq(struct vcpu *v, unsigned int virtual_irq)
 #endif
 }
 
+/*
+ * Find an unused LR to insert an IRQ into, starting with the LR given
+ * by @lr. If this new interrupt is a PRISTINE LPI, scan the other LRs to
+ * avoid inserting the same IRQ twice. This situation can occur when an
+ * event gets discarded while the LPI is in an LR, and a new LPI with the
+ * same number gets mapped quickly afterwards.
+ */
+static unsigned int gic_find_unused_lr(struct vcpu *v,
+                                       struct pending_irq *p,
+                                       unsigned int lr)
+{
+    unsigned int nr_lrs = gic_hw_ops->info->nr_lrs;
+    unsigned long *lr_mask = (unsigned long *) &this_cpu(lr_mask);
+    struct gic_lr lr_val;
+
+    ASSERT(spin_is_locked(&v->arch.vgic.lock));
+
+    if ( test_bit(GIC_IRQ_GUEST_PRISTINE_LPI, &p->status) )
+    {
+        unsigned int used_lr = 0;
+
+        while ( (used_lr = find_next_bit(lr_mask, nr_lrs, used_lr)) < nr_lrs )
+        {
+            gic_hw_ops->read_lr(used_lr, &lr_val);
+            if ( lr_val.virq == p->irq )
+                return used_lr;
+        }
+    }
+
+    lr = find_next_zero_bit(lr_mask, nr_lrs, lr);
+
+    return lr;
+}
+
 void gic_raise_guest_irq(struct vcpu *v, unsigned int virtual_irq,
         unsigned int priority)
 {
@@ -456,7 +492,8 @@ void gic_raise_guest_irq(struct vcpu *v, unsigned int virtual_irq,
 
     if ( v == current && list_empty(&v->arch.vgic.lr_pending) )
     {
-        i = find_first_zero_bit(&this_cpu(lr_mask), nr_lrs);
+        i = gic_find_unused_lr(v, p, 0);
+
         if (i < nr_lrs) {
             set_bit(i, &this_cpu(lr_mask));
             gic_set_lr(i, p, GICH_LR_PENDING);
@@ -479,8 +516,14 @@ static void gic_update_one_lr(struct vcpu *v, int i)
     gic_hw_ops->read_lr(i, &lr_val);
     irq = lr_val.virq;
     p = irq_to_pending(v, irq);
-    /* An LPI might have been unmapped, in which case we just clean up here. */
-    if ( unlikely(!p) )
+    /*
+     * An LPI might have been unmapped, in which case we just clean up here.
+     * If that LPI is marked as PRISTINE, the information in the LR is bogus,
+     * as it belongs to a previous, already unmapped LPI. So we discard it
+     * here as well.
+     */
+    if ( unlikely(!p) ||
+         test_and_clear_bit(GIC_IRQ_GUEST_PRISTINE_LPI, &p->status) )
     {
         ASSERT(is_lpi(irq));
 
@@ -590,7 +633,7 @@ static void gic_restore_pending_irqs(struct vcpu *v)
     inflight_r = &v->arch.vgic.inflight_irqs;
     list_for_each_entry_safe ( p, t, &v->arch.vgic.lr_pending, lr_queue )
     {
-        lr = find_next_zero_bit(&this_cpu(lr_mask), nr_lrs, lr);
+        lr = gic_find_unused_lr(v, p, lr);
         if ( lr >= nr_lrs )
         {
             /* No more free LRs: find a lower priority irq to evict */
diff --git a/xen/include/asm-arm/vgic.h b/xen/include/asm-arm/vgic.h
index 02732db..3fc4ceb 100644
--- a/xen/include/asm-arm/vgic.h
+++ b/xen/include/asm-arm/vgic.h
@@ -60,12 +60,18 @@ struct pending_irq
      * vcpu while it is still inflight and on an GICH_LR register on the
      * old vcpu.
      *
+     * GIC_IRQ_GUEST_PRISTINE_LPI: the IRQ is a newly mapped LPI, which
+     * has never been in an LR before. This means that any trace of an
+     * LPI with the same number in an LR must be from an older LPI, which
+     * has been unmapped before.
+     *
      */
 #define GIC_IRQ_GUEST_QUEUED   0
 #define GIC_IRQ_GUEST_ACTIVE   1
 #define GIC_IRQ_GUEST_VISIBLE  2
 #define GIC_IRQ_GUEST_ENABLED  3
 #define GIC_IRQ_GUEST_MIGRATING   4
+#define GIC_IRQ_GUEST_PRISTINE_LPI  5
     unsigned long status;
     struct irq_desc *desc; /* only set it the irq corresponds to a physical irq */
     unsigned int irq;
-- 
2.9.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 96+ messages in thread

* [PATCH v10 25/32] ARM: vITS: handle MAPTI/MAPI command
  2017-05-26 17:35 [PATCH v10 00/32] arm64: Dom0 ITS emulation Andre Przywara
                   ` (23 preceding siblings ...)
  2017-05-26 17:35 ` [PATCH v10 24/32] ARM: GICv3: handle unmapped LPIs Andre Przywara
@ 2017-05-26 17:35 ` Andre Przywara
  2017-06-02 17:12   ` Julien Grall
  2017-05-26 17:35 ` [PATCH v10 26/32] ARM: vITS: handle MOVI command Andre Przywara
                   ` (6 subsequent siblings)
  31 siblings, 1 reply; 96+ messages in thread
From: Andre Przywara @ 2017-05-26 17:35 UTC (permalink / raw)
  To: Julien Grall, Stefano Stabellini
  Cc: xen-devel, Vijaya Kumar K, Vijay Kilari, Shanker Donthineni

The MAPTI command associates a DeviceID/EventID pair with an LPI/CPU
pair and actually instantiates LPI interrupts. MAPI is just a variant
of this command, where the LPI ID is the same as the event ID.
We connect the already allocated host LPI to this virtual LPI, so that
any triggering LPI on the host can be quickly forwarded to a guest.
Beside entering the domain and the virtual LPI number in the respective
host LPI entry, we also initialize and add the already allocated
struct pending_irq to our radix tree, so that we can now easily find it
by its virtual LPI number.
We also read the property table to update the enabled bit and the
priority for our new LPI, as we might have missed this during an earlier
INVALL call (which only checks mapped LPIs). But we make sure that the
property table is actually valid, as all redistributors might still
be disabled at this point.
Since write_itte_locked() now sees its first usage, we change the
declaration to static.
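
The per-LPI configuration read back here is a single byte in the guest's
property table; a sketch of how it is interpreted (bit layout as in the
GICv3 LPI configuration table, exact mask values illustrative):

#include <stdbool.h>
#include <stdint.h>

#define LPI_PROP_ENABLED    (1 << 0)     /* bit 0: LPI enabled    */
#define LPI_PROP_PRIO_MASK  0xfc         /* bits[7:2]: priority   */

static bool lpi_is_enabled(uint8_t prop)  { return prop & LPI_PROP_ENABLED; }
static uint8_t lpi_priority(uint8_t prop) { return prop & LPI_PROP_PRIO_MASK; }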

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
---
 xen/arch/arm/gic-v3-its.c        |  27 ++++++++
 xen/arch/arm/vgic-v3-its.c       | 138 ++++++++++++++++++++++++++++++++++++++-
 xen/include/asm-arm/gic_v3_its.h |   3 +
 3 files changed, 165 insertions(+), 3 deletions(-)

diff --git a/xen/arch/arm/gic-v3-its.c b/xen/arch/arm/gic-v3-its.c
index 8864e0b..41fff64 100644
--- a/xen/arch/arm/gic-v3-its.c
+++ b/xen/arch/arm/gic-v3-its.c
@@ -876,6 +876,33 @@ int gicv3_remove_guest_event(struct domain *d, paddr_t vdoorbell_address,
     return 0;
 }
 
+/*
+ * Connects the event ID for an already assigned device to the given VCPU/vLPI
+ * pair. The corresponding physical LPI is already mapped on the host side
+ * (when assigning the physical device to the guest), so we just connect the
+ * target VCPU/vLPI pair to that interrupt to inject it properly if it fires.
+ * Returns a pointer to the already allocated struct pending_irq that is
+ * meant to be used by that event.
+ */
+struct pending_irq *gicv3_assign_guest_event(struct domain *d,
+                                             paddr_t vdoorbell_address,
+                                             uint32_t vdevid, uint32_t eventid,
+                                             uint32_t virt_lpi)
+{
+    struct pending_irq *pirq;
+    uint32_t host_lpi = 0;
+
+    pirq = get_event_pending_irq(d, vdoorbell_address, vdevid, eventid,
+                                 &host_lpi);
+
+    if ( !pirq )
+        return NULL;
+
+    gicv3_lpi_update_host_entry(host_lpi, d->domain_id, virt_lpi);
+
+    return pirq;
+}
+
 /* Scan the DT for any ITS nodes and create a list of host ITSes out of it. */
 void gicv3_its_dt_init(const struct dt_device_node *node)
 {
diff --git a/xen/arch/arm/vgic-v3-its.c b/xen/arch/arm/vgic-v3-its.c
index e9b1cb4..c350fa5 100644
--- a/xen/arch/arm/vgic-v3-its.c
+++ b/xen/arch/arm/vgic-v3-its.c
@@ -270,9 +270,9 @@ static bool read_itte(struct virt_its *its, uint32_t devid, uint32_t evid,
  * If vcpu_ptr is provided, returns the VCPU belonging to that collection.
  * Must be called with the ITS lock held.
  */
-bool write_itte_locked(struct virt_its *its, uint32_t devid,
-                       uint32_t evid, uint32_t collid, uint32_t vlpi,
-                       struct vcpu **vcpu_ptr)
+static bool write_itte_locked(struct virt_its *its, uint32_t devid,
+                              uint32_t evid, uint32_t collid, uint32_t vlpi,
+                              struct vcpu **vcpu_ptr)
 {
     paddr_t addr;
     struct vits_itte itte;
@@ -413,6 +413,42 @@ out_unlock:
     return ret;
 }
 
+/*
+ * For a given virtual LPI read the enabled bit and priority from the virtual
+ * property table and update the virtual IRQ's state in the given pending_irq.
+ * Must be called with the respective VGIC VCPU lock held.
+ */
+static int update_lpi_property(struct domain *d, struct pending_irq *p)
+{
+    paddr_t addr;
+    uint8_t property;
+    int ret;
+
+    /*
+     * If no redistributor has its LPIs enabled yet, we can't access the
+     * property table. In this case we just can't update the properties,
+     * but this should not be an error from an ITS point of view.
+     */
+    if ( !read_atomic(&d->arch.vgic.rdists_enabled) )
+        return 0;
+
+    addr = d->arch.vgic.rdist_propbase & GENMASK(51, 12);
+
+    ret = vgic_access_guest_memory(d, addr + p->irq - LPI_OFFSET,
+                                   &property, sizeof(property), false);
+    if ( ret )
+        return ret;
+
+    write_atomic(&p->lpi_priority, property & LPI_PROP_PRIO_MASK);
+
+    if ( property & LPI_PROP_ENABLED )
+        set_bit(GIC_IRQ_GUEST_ENABLED, &p->status);
+    else
+        clear_bit(GIC_IRQ_GUEST_ENABLED, &p->status);
+
+    return 0;
+}
+
 /* Must be called with the ITS lock held. */
 static int its_discard_event(struct virt_its *its,
                              uint32_t vdevid, uint32_t vevid)
@@ -538,6 +574,98 @@ static int its_handle_mapd(struct virt_its *its, uint64_t *cmdptr)
     return ret;
 }
 
+static int its_handle_mapti(struct virt_its *its, uint64_t *cmdptr)
+{
+    uint32_t devid = its_cmd_get_deviceid(cmdptr);
+    uint32_t eventid = its_cmd_get_id(cmdptr);
+    uint32_t intid = its_cmd_get_physical_id(cmdptr), _intid;
+    uint16_t collid = its_cmd_get_collection(cmdptr);
+    struct pending_irq *pirq;
+    struct vcpu *vcpu = NULL;
+    int ret = -1;
+
+    if ( its_cmd_get_command(cmdptr) == GITS_CMD_MAPI )
+        intid = eventid;
+
+    spin_lock(&its->its_lock);
+    /*
+     * Check whether there is a valid existing mapping. If yes, behavior is
+     * unpredictable, we choose to ignore this command here.
+     * This makes sure we start with a pristine pending_irq below.
+     */
+    if ( read_itte_locked(its, devid, eventid, &vcpu, &_intid) &&
+         _intid != INVALID_LPI )
+    {
+        spin_unlock(&its->its_lock);
+        return -1;
+    }
+
+    /* Enter the mapping in our virtual ITS tables. */
+    if ( !write_itte_locked(its, devid, eventid, collid, intid, &vcpu) )
+    {
+        spin_unlock(&its->its_lock);
+        return -1;
+    }
+
+    spin_unlock(&its->its_lock);
+
+    /*
+     * Connect this virtual LPI to the corresponding host LPI, which is
+     * determined by the same device ID and event ID on the host side.
+     * This returns us the corresponding, still unused pending_irq.
+     */
+    pirq = gicv3_assign_guest_event(its->d, its->doorbell_address,
+                                    devid, eventid, intid);
+    if ( !pirq )
+        goto out_remove_mapping;
+
+    vgic_init_pending_irq(pirq, intid);
+
+    /*
+     * Now read the guest's property table to initialize our cached state.
+     * We don't need the VGIC VCPU lock here, because the pending_irq isn't
+     * in the radix tree yet.
+     */
+    ret = update_lpi_property(its->d, pirq);
+    if ( ret )
+        goto out_remove_host_entry;
+
+    pirq->lpi_vcpu_id = vcpu->vcpu_id;
+    /*
+     * Mark this LPI as new, so any older (now unmapped) LPI in any LR
+     * can be easily recognised as such.
+     */
+    set_bit(GIC_IRQ_GUEST_PRISTINE_LPI, &pirq->status);
+
+    /*
+     * Now insert the pending_irq into the domain's LPI tree, so that
+     * it becomes live.
+     */
+    write_lock(&its->d->arch.vgic.pend_lpi_tree_lock);
+    ret = radix_tree_insert(&its->d->arch.vgic.pend_lpi_tree, intid, pirq);
+    write_unlock(&its->d->arch.vgic.pend_lpi_tree_lock);
+
+    if ( !ret )
+        return 0;
+
+    /*
+     * radix_tree_insert() returns an error either due to an internal
+     * condition (like memory allocation failure) or because the LPI already
+     * existed in the tree. We don't support the latter case, so we always
+     * clean up and return an error here.
+     */
+out_remove_host_entry:
+    gicv3_remove_guest_event(its->d, its->doorbell_address, devid, eventid);
+
+out_remove_mapping:
+    spin_lock(&its->its_lock);
+    write_itte_locked(its, devid, eventid,
+                      UNMAPPED_COLLECTION, INVALID_LPI, NULL);
+    spin_unlock(&its->its_lock);
+
+    return ret;
+}
+
 #define ITS_CMD_BUFFER_SIZE(baser)      ((((baser) & 0xff) + 1) << 12)
 #define ITS_CMD_OFFSET(reg)             ((reg) & GENMASK(19, 5))
 
@@ -579,6 +707,10 @@ static int vgic_its_handle_cmds(struct domain *d, struct virt_its *its)
         case GITS_CMD_MAPD:
             ret = its_handle_mapd(its, command);
             break;
+        case GITS_CMD_MAPI:
+        case GITS_CMD_MAPTI:
+            ret = its_handle_mapti(its, command);
+            break;
         case GITS_CMD_SYNC:
             /* We handle ITS commands synchronously, so we ignore SYNC. */
             break;
diff --git a/xen/include/asm-arm/gic_v3_its.h b/xen/include/asm-arm/gic_v3_its.h
index e78dadf..c6404ea 100644
--- a/xen/include/asm-arm/gic_v3_its.h
+++ b/xen/include/asm-arm/gic_v3_its.h
@@ -175,6 +175,9 @@ struct pending_irq *gicv3_its_get_event_pending_irq(struct domain *d,
                                                     uint32_t veventid);
 int gicv3_remove_guest_event(struct domain *d, paddr_t vdoorbell_address,
                                      uint32_t vdevid, uint32_t veventid);
+struct pending_irq *gicv3_assign_guest_event(struct domain *d, paddr_t doorbell,
+                                             uint32_t devid, uint32_t eventid,
+                                             uint32_t virt_lpi);
 void gicv3_lpi_update_host_entry(uint32_t host_lpi, int domain_id,
                                  uint32_t virt_lpi);
 
-- 
2.9.0



* [PATCH v10 26/32] ARM: vITS: handle MOVI command
  2017-05-26 17:35 [PATCH v10 00/32] arm64: Dom0 ITS emulation Andre Przywara
                   ` (24 preceding siblings ...)
  2017-05-26 17:35 ` [PATCH v10 25/32] ARM: vITS: handle MAPTI/MAPI command Andre Przywara
@ 2017-05-26 17:35 ` Andre Przywara
  2017-05-30 22:35   ` Stefano Stabellini
  2017-05-26 17:35 ` [PATCH v10 27/32] ARM: vITS: handle DISCARD command Andre Przywara
                   ` (5 subsequent siblings)
  31 siblings, 1 reply; 96+ messages in thread
From: Andre Przywara @ 2017-05-26 17:35 UTC (permalink / raw)
  To: Julien Grall, Stefano Stabellini
  Cc: xen-devel, Vijaya Kumar K, Vijay Kilari, Shanker Donthineni

The MOVI command moves the interrupt affinity from one redistributor
(read: VCPU) to another.
Migration of "live" LPIs is not yet implemented; for now we just store
the changed affinity in our virtual ITTE and in the pending_irq.
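
For illustration, a minimal sketch of how the MOVI fields could be decoded
from the four 64-bit doublewords of an ITS command. The offsets follow the
architected GICv3 command layout; the helper names below are made up for
this sketch and are not the actual Xen helpers:

/* Sketch only: field offsets as per the GICv3 ITS command encoding. */
static uint32_t example_cmd_get_deviceid(const uint64_t *cmdptr)
{
    return cmdptr[0] >> 32;                     /* DeviceID: DW0[63:32] */
}

static uint32_t example_cmd_get_id(const uint64_t *cmdptr)
{
    return cmdptr[1] & GENMASK(31, 0);          /* EventID:  DW1[31:0]  */
}

static uint16_t example_cmd_get_collection(const uint64_t *cmdptr)
{
    return cmdptr[2] & GENMASK(15, 0);          /* ICID:     DW2[15:0]  */
}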

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
---
 xen/arch/arm/vgic-v3-its.c | 66 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 66 insertions(+)

diff --git a/xen/arch/arm/vgic-v3-its.c b/xen/arch/arm/vgic-v3-its.c
index c350fa5..3332c09 100644
--- a/xen/arch/arm/vgic-v3-its.c
+++ b/xen/arch/arm/vgic-v3-its.c
@@ -666,6 +666,66 @@ out_remove_mapping:
     return ret;
 }
 
+static int its_handle_movi(struct virt_its *its, uint64_t *cmdptr)
+{
+    uint32_t devid = its_cmd_get_deviceid(cmdptr);
+    uint32_t eventid = its_cmd_get_id(cmdptr);
+    uint16_t collid = its_cmd_get_collection(cmdptr);
+    unsigned long flags;
+    struct pending_irq *p;
+    struct vcpu *ovcpu, *nvcpu;
+    uint32_t vlpi;
+    int ret = -1;
+
+    spin_lock(&its->its_lock);
+    /* Check for a mapped LPI and get the LPI number. */
+    if ( !read_itte_locked(its, devid, eventid, &ovcpu, &vlpi) )
+        goto out_unlock;
+
+    if ( vlpi == INVALID_LPI )
+        goto out_unlock;
+
+    /* Check the new collection ID and get the new VCPU pointer */
+    nvcpu = get_vcpu_from_collection(its, collid);
+    if ( !nvcpu )
+        goto out_unlock;
+
+    p = gicv3_its_get_event_pending_irq(its->d, its->doorbell_address,
+                                        devid, eventid);
+    if ( unlikely(!p) )
+        goto out_unlock;
+
+    /*
+     * TODO: This relies on the VCPU being correct in the ITS tables.
+     * This can be fixed by either using a per-IRQ lock or by using
+     * the VCPU ID from the pending_irq instead.
+     */
+    spin_lock_irqsave(&ovcpu->arch.vgic.lock, flags);
+
+    /* Update our cached vcpu_id in the pending_irq. */
+    p->lpi_vcpu_id = nvcpu->vcpu_id;
+
+    spin_unlock_irqrestore(&ovcpu->arch.vgic.lock, flags);
+
+    /*
+     * TODO: lookup currently-in-guest virtual IRQs and migrate them,
+     * as the locking may be fragile otherwise.
+     * This is not easy to do at the moment, but should become easier
+     * with the introduction of a per-IRQ lock.
+     */
+
+    /* Now store the new collection in the translation table. */
+    if ( !write_itte_locked(its, devid, eventid, collid, vlpi, &nvcpu) )
+        goto out_unlock;
+
+    ret = 0;
+
+out_unlock:
+    spin_unlock(&its->its_lock);
+
+    return ret;
+}
+
 #define ITS_CMD_BUFFER_SIZE(baser)      ((((baser) & 0xff) + 1) << 12)
 #define ITS_CMD_OFFSET(reg)             ((reg) & GENMASK(19, 5))
 
@@ -711,6 +771,12 @@ static int vgic_its_handle_cmds(struct domain *d, struct virt_its *its)
         case GITS_CMD_MAPTI:
             ret = its_handle_mapti(its, command);
             break;
+        case GITS_CMD_MOVALL:
+            gdprintk(XENLOG_G_INFO, "vGITS: ignoring MOVALL command\n");
+            break;
+        case GITS_CMD_MOVI:
+            ret = its_handle_movi(its, command);
+            break;
         case GITS_CMD_SYNC:
             /* We handle ITS commands synchronously, so we ignore SYNC. */
             break;
-- 
2.9.0



* [PATCH v10 27/32] ARM: vITS: handle DISCARD command
  2017-05-26 17:35 [PATCH v10 00/32] arm64: Dom0 ITS emulation Andre Przywara
                   ` (25 preceding siblings ...)
  2017-05-26 17:35 ` [PATCH v10 26/32] ARM: vITS: handle MOVI command Andre Przywara
@ 2017-05-26 17:35 ` Andre Przywara
  2017-06-02 17:21   ` Julien Grall
  2017-05-26 17:35 ` [PATCH v10 28/32] ARM: vITS: handle INV command Andre Przywara
                   ` (4 subsequent siblings)
  31 siblings, 1 reply; 96+ messages in thread
From: Andre Przywara @ 2017-05-26 17:35 UTC (permalink / raw)
  To: Julien Grall, Stefano Stabellini
  Cc: xen-devel, Vijaya Kumar K, Vijay Kilari, Shanker Donthineni

The DISCARD command drops the connection between a DeviceID/EventID
and an LPI/collection pair.
We mark the respective structure entries as not allocated and make
sure that any queued IRQs are removed.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
---
 xen/arch/arm/vgic-v3-its.c | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/xen/arch/arm/vgic-v3-its.c b/xen/arch/arm/vgic-v3-its.c
index 3332c09..773e5fb 100644
--- a/xen/arch/arm/vgic-v3-its.c
+++ b/xen/arch/arm/vgic-v3-its.c
@@ -726,6 +726,30 @@ out_unlock:
     return ret;
 }
 
+static int its_handle_discard(struct virt_its *its, uint64_t *cmdptr)
+{
+    uint32_t devid = its_cmd_get_deviceid(cmdptr);
+    uint32_t eventid = its_cmd_get_id(cmdptr);
+    int ret;
+
+    spin_lock(&its->its_lock);
+
+    /* Remove from the radix tree and remove the host entry. */
+    ret = its_discard_event(its, devid, eventid);
+    if ( ret )
+        goto out_unlock;
+
+    /* Remove from the guest's ITTE. */
+    if ( !write_itte_locked(its, devid, eventid,
+                            UNMAPPED_COLLECTION, INVALID_LPI, NULL) )
+        ret = -1;
+
+out_unlock:
+    spin_unlock(&its->its_lock);
+
+    return ret;
+}
+
 #define ITS_CMD_BUFFER_SIZE(baser)      ((((baser) & 0xff) + 1) << 12)
 #define ITS_CMD_OFFSET(reg)             ((reg) & GENMASK(19, 5))
 
@@ -758,6 +782,9 @@ static int vgic_its_handle_cmds(struct domain *d, struct virt_its *its)
         case GITS_CMD_CLEAR:
             ret = its_handle_clear(its, command);
             break;
+        case GITS_CMD_DISCARD:
+            ret = its_handle_discard(its, command);
+            break;
         case GITS_CMD_INT:
             ret = its_handle_int(its, command);
             break;
-- 
2.9.0



* [PATCH v10 28/32] ARM: vITS: handle INV command
  2017-05-26 17:35 [PATCH v10 00/32] arm64: Dom0 ITS emulation Andre Przywara
                   ` (26 preceding siblings ...)
  2017-05-26 17:35 ` [PATCH v10 27/32] ARM: vITS: handle DISCARD command Andre Przywara
@ 2017-05-26 17:35 ` Andre Przywara
  2017-05-30 22:23   ` Stefano Stabellini
  2017-05-26 17:35 ` [PATCH v10 29/32] ARM: vITS: handle INVALL command Andre Przywara
                   ` (3 subsequent siblings)
  31 siblings, 1 reply; 96+ messages in thread
From: Andre Przywara @ 2017-05-26 17:35 UTC (permalink / raw)
  To: Julien Grall, Stefano Stabellini
  Cc: xen-devel, Vijaya Kumar K, Vijay Kilari, Shanker Donthineni

The INV command instructs the ITS to update the configuration data for
a given LPI by re-reading its entry from the property table.
The priority value needs no special handling, but enabling or disabling
an LPI has side effects: we remove or push virtual LPIs to their VCPUs,
and check the virtual pending bit when an LPI gets enabled.
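
As background, each LPI's entry in the guest's property table is a single
byte. A sketch of how such a byte splits into priority and enable bit,
assuming the architected GICv3 layout (priority in bits [7:2], enable in
bit 0); the macro and function names here are illustrative only:

#define EXAMPLE_LPI_PROP_ENABLED    (1U << 0)   /* bit 0: LPI enabled    */
#define EXAMPLE_LPI_PROP_PRIO_MASK  0xfcU       /* bits [7:2]: priority  */

/* Sketch only: split one property table byte into its two fields. */
static void example_decode_lpi_prop(uint8_t prop, uint8_t *prio, bool *enabled)
{
    *prio    = prop & EXAMPLE_LPI_PROP_PRIO_MASK;
    *enabled = prop & EXAMPLE_LPI_PROP_ENABLED;
}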

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
---
 xen/arch/arm/vgic-v3-its.c | 70 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 70 insertions(+)

diff --git a/xen/arch/arm/vgic-v3-its.c b/xen/arch/arm/vgic-v3-its.c
index 773e5fb..14fde83 100644
--- a/xen/arch/arm/vgic-v3-its.c
+++ b/xen/arch/arm/vgic-v3-its.c
@@ -449,6 +449,73 @@ static int update_lpi_property(struct domain *d, struct pending_irq *p)
     return 0;
 }
 
+/*
+ * Checks whether an LPI that got enabled or disabled needs to change
+ * something in the VGIC (added or removed from the LR or queues).
+ * We don't disable the underlying physical LPI, because this requires
+ * queueing a host LPI command, which we can't afford to do on behalf
+ * of a guest.
+ * Must be called with the VCPU VGIC lock held.
+ */
+static void update_lpi_vgic_status(struct vcpu *v, struct pending_irq *p)
+{
+    ASSERT(spin_is_locked(&v->arch.vgic.lock));
+
+    if ( test_bit(GIC_IRQ_GUEST_ENABLED, &p->status) )
+    {
+        if ( !list_empty(&p->inflight) &&
+             !test_bit(GIC_IRQ_GUEST_VISIBLE, &p->status) )
+            gic_raise_guest_irq(v, p->irq, p->lpi_priority);
+    }
+    else
+        gic_remove_from_lr_pending(v, p);
+}
+
+static int its_handle_inv(struct virt_its *its, uint64_t *cmdptr)
+{
+    struct domain *d = its->d;
+    uint32_t devid = its_cmd_get_deviceid(cmdptr);
+    uint32_t eventid = its_cmd_get_id(cmdptr);
+    struct pending_irq *p;
+    unsigned long flags;
+    struct vcpu *vcpu;
+    uint32_t vlpi;
+    int ret = -1;
+
+    spin_lock(&its->its_lock);
+
+    /* Translate the event into a vCPU/vLPI pair. */
+    if ( !read_itte_locked(its, devid, eventid, &vcpu, &vlpi) )
+        goto out_unlock_its;
+
+    if ( vlpi == INVALID_LPI )
+        goto out_unlock_its;
+
+    p = gicv3_its_get_event_pending_irq(d, its->doorbell_address,
+                                        devid, eventid);
+    if ( unlikely(!p) )
+        goto out_unlock_its;
+
+    spin_lock_irqsave(&vcpu->arch.vgic.lock, flags);
+
+    /* Read the property table and update our cached status. */
+    if ( update_lpi_property(d, p) )
+        goto out_unlock;
+
+    /* Check whether the LPI needs to go on a VCPU. */
+    update_lpi_vgic_status(vcpu, p);
+
+    ret = 0;
+
+out_unlock:
+    spin_unlock_irqrestore(&vcpu->arch.vgic.lock, flags);
+
+out_unlock_its:
+    spin_unlock(&its->its_lock);
+
+    return ret;
+}
+
 /* Must be called with the ITS lock held. */
 static int its_discard_event(struct virt_its *its,
                              uint32_t vdevid, uint32_t vevid)
@@ -788,6 +855,9 @@ static int vgic_its_handle_cmds(struct domain *d, struct virt_its *its)
         case GITS_CMD_INT:
             ret = its_handle_int(its, command);
             break;
+        case GITS_CMD_INV:
+            ret = its_handle_inv(its, command);
+            break;
         case GITS_CMD_MAPC:
             ret = its_handle_mapc(its, command);
             break;
-- 
2.9.0



* [PATCH v10 29/32] ARM: vITS: handle INVALL command
  2017-05-26 17:35 [PATCH v10 00/32] arm64: Dom0 ITS emulation Andre Przywara
                   ` (27 preceding siblings ...)
  2017-05-26 17:35 ` [PATCH v10 28/32] ARM: vITS: handle INV command Andre Przywara
@ 2017-05-26 17:35 ` Andre Przywara
  2017-06-02 17:27   ` Julien Grall
  2017-05-26 17:35 ` [PATCH v10 30/32] ARM: vITS: increase mmio_count for each ITS Andre Przywara
                   ` (2 subsequent siblings)
  31 siblings, 1 reply; 96+ messages in thread
From: Andre Przywara @ 2017-05-26 17:35 UTC (permalink / raw)
  To: Julien Grall, Stefano Stabellini
  Cc: xen-devel, Vijaya Kumar K, Vijay Kilari, Shanker Donthineni

The INVALL command instructs an ITS to invalidate the configuration
data for all LPIs associated with a given redistributor (read: VCPU).
This is nasty to emulate exactly with our architecture, so we just
iterate over all mapped LPIs and filter for those from that particular
VCPU.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
---
 xen/arch/arm/vgic-v3-its.c | 66 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 66 insertions(+)

diff --git a/xen/arch/arm/vgic-v3-its.c b/xen/arch/arm/vgic-v3-its.c
index 14fde83..76e0467 100644
--- a/xen/arch/arm/vgic-v3-its.c
+++ b/xen/arch/arm/vgic-v3-its.c
@@ -516,6 +516,69 @@ out_unlock_its:
     return ret;
 }
 
+/*
+ * INVALL updates the per-LPI configuration status for every LPI mapped to
+ * a particular redistributor.
+ * We iterate over all mapped LPIs in our radix tree and update those.
+ */
+static int its_handle_invall(struct virt_its *its, uint64_t *cmdptr)
+{
+    uint32_t collid = its_cmd_get_collection(cmdptr);
+    struct vcpu *vcpu;
+    struct pending_irq *pirqs[16];
+    uint64_t vlpi = 0;          /* 64-bit to catch overflows */
+    unsigned int nr_lpis, i;
+    unsigned long flags;
+    int ret = 0;
+
+    /*
+     * As this implementation walks over all mapped LPIs, it might take
+     * too long for a real guest, so we might want to revisit this
+     * implementation for DomUs.
+     * However this command is very rare and we don't expect many
+     * LPIs to actually be mapped, so it's fine for Dom0 to use.
+     */
+    ASSERT(is_hardware_domain(its->d));
+
+    spin_lock(&its->its_lock);
+    vcpu = get_vcpu_from_collection(its, collid);
+    spin_unlock(&its->its_lock);
+
+    spin_lock_irqsave(&vcpu->arch.vgic.lock, flags);
+    read_lock(&its->d->arch.vgic.pend_lpi_tree_lock);
+
+    do
+    {
+        nr_lpis = radix_tree_gang_lookup(&its->d->arch.vgic.pend_lpi_tree,
+                                         (void **)pirqs, vlpi,
+                                         ARRAY_SIZE(pirqs));
+
+        for ( i = 0; i < nr_lpis; i++ )
+        {
+            /* We only care about LPIs on our VCPU. */
+            if ( pirqs[i]->lpi_vcpu_id != vcpu->vcpu_id )
+                continue;
+
+            vlpi = pirqs[i]->irq;
+            /* If that fails for a single LPI, carry on to handle the rest. */
+            ret = update_lpi_property(its->d, pirqs[i]);
+            if ( !ret )
+                update_lpi_vgic_status(vcpu, pirqs[i]);
+        }
+    /*
+     * Loop over the next gang of pending_irqs until we reach the end of
+     * a (fully populated) tree or the lookup function returns fewer LPIs
+     * than it was asked for.
+     */
+    } while ( (++vlpi < its->d->arch.vgic.nr_lpis) &&
+              (nr_lpis == ARRAY_SIZE(pirqs)) );
+
+    read_unlock(&its->d->arch.vgic.pend_lpi_tree_lock);
+    spin_unlock_irqrestore(&vcpu->arch.vgic.lock, flags);
+
+    return ret;
+}
+
 /* Must be called with the ITS lock held. */
 static int its_discard_event(struct virt_its *its,
                              uint32_t vdevid, uint32_t vevid)
@@ -858,6 +921,9 @@ static int vgic_its_handle_cmds(struct domain *d, struct virt_its *its)
         case GITS_CMD_INV:
             ret = its_handle_inv(its, command);
             break;
+        case GITS_CMD_INVALL:
+            ret = its_handle_invall(its, command);
+            break;
         case GITS_CMD_MAPC:
             ret = its_handle_mapc(its, command);
             break;
-- 
2.9.0



* [PATCH v10 30/32] ARM: vITS: increase mmio_count for each ITS
  2017-05-26 17:35 [PATCH v10 00/32] arm64: Dom0 ITS emulation Andre Przywara
                   ` (28 preceding siblings ...)
  2017-05-26 17:35 ` [PATCH v10 29/32] ARM: vITS: handle INVALL command Andre Przywara
@ 2017-05-26 17:35 ` Andre Przywara
  2017-05-26 17:35 ` [PATCH v10 31/32] ARM: vITS: create and initialize virtual ITSes for Dom0 Andre Przywara
  2017-05-26 17:35 ` [PATCH v10 32/32] ARM: vITS: create ITS subnodes for Dom0 DT Andre Przywara
  31 siblings, 0 replies; 96+ messages in thread
From: Andre Przywara @ 2017-05-26 17:35 UTC (permalink / raw)
  To: Julien Grall, Stefano Stabellini
  Cc: xen-devel, Vijaya Kumar K, Vijay Kilari, Shanker Donthineni

Increase the count of MMIO regions needed by one for each ITS Dom0 has
to emulate. We emulate the ITSes 1:1 from the hardware, so the number
is the number of host ITSes.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Julien Grall <julien.grall@arm.com>
---
 xen/arch/arm/vgic-v3-its.c       | 15 +++++++++++++++
 xen/arch/arm/vgic-v3.c           |  3 +++
 xen/include/asm-arm/gic_v3_its.h |  7 +++++++
 3 files changed, 25 insertions(+)

diff --git a/xen/arch/arm/vgic-v3-its.c b/xen/arch/arm/vgic-v3-its.c
index 76e0467..5682a38 100644
--- a/xen/arch/arm/vgic-v3-its.c
+++ b/xen/arch/arm/vgic-v3-its.c
@@ -1435,6 +1435,21 @@ static const struct mmio_handler_ops vgic_its_mmio_handler = {
     .write = vgic_v3_its_mmio_write,
 };
 
+unsigned int vgic_v3_its_count(const struct domain *d)
+{
+    struct host_its *hw_its;
+    unsigned int ret = 0;
+
+    /* Only Dom0 can use emulated ITSes so far. */
+    if ( !is_hardware_domain(d) )
+        return 0;
+
+    list_for_each_entry(hw_its, &host_its_list, entry)
+        ret++;
+
+    return ret;
+}
+
 int vgic_v3_its_init_domain(struct domain *d)
 {
     spin_lock_init(&d->arch.vgic.its_devices_lock);
diff --git a/xen/arch/arm/vgic-v3.c b/xen/arch/arm/vgic-v3.c
index 134e1b0..03d9a8c 100644
--- a/xen/arch/arm/vgic-v3.c
+++ b/xen/arch/arm/vgic-v3.c
@@ -1798,6 +1798,9 @@ int vgic_v3_init(struct domain *d, int *mmio_count)
     /* GICD region + number of Redistributors */
     *mmio_count = vgic_v3_rdist_count(d) + 1;
 
+    /* one region per ITS */
+    *mmio_count += vgic_v3_its_count(d);
+
     register_vgic_ops(d, &v3_ops);
 
     return 0;
diff --git a/xen/include/asm-arm/gic_v3_its.h b/xen/include/asm-arm/gic_v3_its.h
index c6404ea..9f70980 100644
--- a/xen/include/asm-arm/gic_v3_its.h
+++ b/xen/include/asm-arm/gic_v3_its.h
@@ -137,6 +137,8 @@ void gicv3_its_dt_init(const struct dt_device_node *node);
 
 bool gicv3_its_host_has_its(void);
 
+unsigned int vgic_v3_its_count(const struct domain *d);
+
 void gicv3_do_LPI(unsigned int lpi);
 
 int gicv3_lpi_init_rdist(void __iomem * rdist_base);
@@ -192,6 +194,11 @@ static inline bool gicv3_its_host_has_its(void)
     return false;
 }
 
+static inline unsigned int vgic_v3_its_count(const struct domain *d)
+{
+    return 0;
+}
+
 static inline void gicv3_do_LPI(unsigned int lpi)
 {
     /* We don't enable LPIs without an ITS. */
-- 
2.9.0



* [PATCH v10 31/32] ARM: vITS: create and initialize virtual ITSes for Dom0
  2017-05-26 17:35 [PATCH v10 00/32] arm64: Dom0 ITS emulation Andre Przywara
                   ` (29 preceding siblings ...)
  2017-05-26 17:35 ` [PATCH v10 30/32] ARM: vITS: increase mmio_count for each ITS Andre Przywara
@ 2017-05-26 17:35 ` Andre Przywara
  2017-06-02 17:31   ` Julien Grall
  2017-05-26 17:35 ` [PATCH v10 32/32] ARM: vITS: create ITS subnodes for Dom0 DT Andre Przywara
  31 siblings, 1 reply; 96+ messages in thread
From: Andre Przywara @ 2017-05-26 17:35 UTC (permalink / raw)
  To: Julien Grall, Stefano Stabellini
  Cc: xen-devel, Vijaya Kumar K, Vijay Kilari, Shanker Donthineni

For each hardware ITS create and initialize a virtual ITS for Dom0.
We use the same memory mapped address to keep the doorbell working.
This introduces a function to initialize a virtual ITS.
We maintain a list of virtual ITSes, at the moment only so that we can
free them again later.
We configure each virtual ITS to match its hardware counterpart, that is
we keep the number of device ID bits and event ID bits the same as the
host ITS.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
---
 xen/arch/arm/vgic-v3-its.c   | 77 ++++++++++++++++++++++++++++++++++++++++++++
 xen/include/asm-arm/domain.h |  1 +
 2 files changed, 78 insertions(+)

diff --git a/xen/arch/arm/vgic-v3-its.c b/xen/arch/arm/vgic-v3-its.c
index 5682a38..7567f15 100644
--- a/xen/arch/arm/vgic-v3-its.c
+++ b/xen/arch/arm/vgic-v3-its.c
@@ -52,6 +52,7 @@
  */
 struct virt_its {
     struct domain *d;
+    struct list_head vits_list;
     paddr_t doorbell_address;
     unsigned int devid_bits;
     unsigned int evid_bits;
@@ -1435,6 +1436,46 @@ static const struct mmio_handler_ops vgic_its_mmio_handler = {
     .write = vgic_v3_its_mmio_write,
 };
 
+static int vgic_v3_its_init_virtual(struct domain *d, paddr_t guest_addr,
+                                    unsigned int devid_bits,
+                                    unsigned int evid_bits)
+{
+    struct virt_its *its;
+    uint64_t base_attr;
+
+    its = xzalloc(struct virt_its);
+    if ( !its )
+        return -ENOMEM;
+
+    base_attr  = GIC_BASER_InnerShareable << GITS_BASER_SHAREABILITY_SHIFT;
+    base_attr |= GIC_BASER_CACHE_SameAsInner << GITS_BASER_OUTER_CACHEABILITY_SHIFT;
+    base_attr |= GIC_BASER_CACHE_RaWaWb << GITS_BASER_INNER_CACHEABILITY_SHIFT;
+
+    its->cbaser  = base_attr;
+    base_attr |= 0ULL << GITS_BASER_PAGE_SIZE_SHIFT;    /* 4K pages */
+    its->baser_dev = GITS_BASER_TYPE_DEVICE << GITS_BASER_TYPE_SHIFT;
+    its->baser_dev |= (sizeof(dev_table_entry_t) - 1) <<
+                      GITS_BASER_ENTRY_SIZE_SHIFT;
+    its->baser_dev |= base_attr;
+    its->baser_coll  = GITS_BASER_TYPE_COLLECTION << GITS_BASER_TYPE_SHIFT;
+    its->baser_coll |= (sizeof(coll_table_entry_t) - 1) <<
+                       GITS_BASER_ENTRY_SIZE_SHIFT;
+    its->baser_coll |= base_attr;
+    its->d = d;
+    its->doorbell_address = guest_addr + ITS_DOORBELL_OFFSET;
+    its->devid_bits = devid_bits;
+    its->evid_bits = evid_bits;
+    spin_lock_init(&its->vcmd_lock);
+    spin_lock_init(&its->its_lock);
+
+    register_mmio_handler(d, &vgic_its_mmio_handler, guest_addr, SZ_64K, its);
+
+    /* Register the virtual ITS to be able to clean it up later. */
+    list_add_tail(&its->vits_list, &d->arch.vgic.vits_list);
+
+    return 0;
+}
+
 unsigned int vgic_v3_its_count(const struct domain *d)
 {
     struct host_its *hw_its;
@@ -1450,16 +1491,52 @@ unsigned int vgic_v3_its_count(const struct domain *d)
     return ret;
 }
 
+/*
+ * For a hardware domain, this will iterate over the host ITSes
+ * and map one virtual ITS per host ITS at the same address.
+ */
 int vgic_v3_its_init_domain(struct domain *d)
 {
+    int ret;
+
+    INIT_LIST_HEAD(&d->arch.vgic.vits_list);
     spin_lock_init(&d->arch.vgic.its_devices_lock);
     d->arch.vgic.its_devices = RB_ROOT;
 
+    if ( is_hardware_domain(d) )
+    {
+        struct host_its *hw_its;
+
+        list_for_each_entry(hw_its, &host_its_list, entry)
+        {
+            /*
+             * For each host ITS create a virtual ITS using the same
+             * base and thus doorbell address.
+             * Use the same number of device ID and event ID bits as the host.
+             */
+            ret = vgic_v3_its_init_virtual(d, hw_its->addr,
+                                           hw_its->devid_bits,
+                                           hw_its->evid_bits);
+            if ( ret )
+                return ret;
+            else
+                d->arch.vgic.has_its = true;
+        }
+    }
+
     return 0;
 }
 
 void vgic_v3_its_free_domain(struct domain *d)
 {
+    struct virt_its *pos, *temp;
+
+    list_for_each_entry_safe( pos, temp, &d->arch.vgic.vits_list, vits_list )
+    {
+        list_del(&pos->vits_list);
+        xfree(pos);
+    }
+
     ASSERT(RB_EMPTY_ROOT(&d->arch.vgic.its_devices));
 }
 
diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h
index b33f54a..8dfc1d1 100644
--- a/xen/include/asm-arm/domain.h
+++ b/xen/include/asm-arm/domain.h
@@ -115,6 +115,7 @@ struct arch_domain
         spinlock_t its_devices_lock;        /* Protects the its_devices tree */
         struct radix_tree_root pend_lpi_tree; /* Stores struct pending_irq's */
         rwlock_t pend_lpi_tree_lock;        /* Protects the pend_lpi_tree */
+        struct list_head vits_list;         /* List of virtual ITSes */
         unsigned int intid_bits;
         /*
          * TODO: if there are more bool's being added below, consider
-- 
2.9.0



* [PATCH v10 32/32] ARM: vITS: create ITS subnodes for Dom0 DT
  2017-05-26 17:35 [PATCH v10 00/32] arm64: Dom0 ITS emulation Andre Przywara
                   ` (30 preceding siblings ...)
  2017-05-26 17:35 ` [PATCH v10 31/32] ARM: vITS: create and initialize virtual ITSes for Dom0 Andre Przywara
@ 2017-05-26 17:35 ` Andre Przywara
  2017-06-02 17:33   ` Julien Grall
  31 siblings, 1 reply; 96+ messages in thread
From: Andre Przywara @ 2017-05-26 17:35 UTC (permalink / raw)
  To: Julien Grall, Stefano Stabellini
  Cc: xen-devel, Vijaya Kumar K, Vijay Kilari, Shanker Donthineni

Dom0 expects all ITSes in the system to be propagated into its device
tree to be able to use MSIs.
Create a Dom0 DT node for each hardware ITS, keeping the register frame
address the same, as the doorbell address that the Dom0 drivers program
into the BARs has to match the hardware.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
---
 xen/arch/arm/gic-v3-its.c        | 73 ++++++++++++++++++++++++++++++++++++++++
 xen/arch/arm/gic-v3.c            |  4 ++-
 xen/include/asm-arm/gic_v3_its.h | 12 +++++++
 3 files changed, 88 insertions(+), 1 deletion(-)

diff --git a/xen/arch/arm/gic-v3-its.c b/xen/arch/arm/gic-v3-its.c
index 41fff64..90b7462 100644
--- a/xen/arch/arm/gic-v3-its.c
+++ b/xen/arch/arm/gic-v3-its.c
@@ -20,6 +20,7 @@
 
 #include <xen/lib.h>
 #include <xen/delay.h>
+#include <xen/libfdt/libfdt.h>
 #include <xen/mm.h>
 #include <xen/rbtree.h>
 #include <xen/sched.h>
@@ -903,6 +904,78 @@ struct pending_irq *gicv3_assign_guest_event(struct domain *d,
     return pirq;
 }
 
+/*
+ * Create the respective guest DT nodes from a list of host ITSes.
+ * This copies the reg property, so the guest sees the ITS at the same address
+ * as the host.
+ */
+int gicv3_its_make_hwdom_dt_nodes(const struct domain *d,
+                                  const struct dt_device_node *gic,
+                                  void *fdt)
+{
+    uint32_t len;
+    int res;
+    const void *prop = NULL;
+    const struct dt_device_node *its = NULL;
+    const struct host_its *its_data;
+
+    if ( list_empty(&host_its_list) )
+        return 0;
+
+    /* The sub-nodes require the ranges property */
+    prop = dt_get_property(gic, "ranges", &len);
+    if ( !prop )
+    {
+        printk(XENLOG_ERR "Can't find ranges property for the gic node\n");
+        return -FDT_ERR_XEN(ENOENT);
+    }
+
+    res = fdt_property(fdt, "ranges", prop, len);
+    if ( res )
+        return res;
+
+    list_for_each_entry(its_data, &host_its_list, entry)
+    {
+        its = its_data->dt_node;
+
+        res = fdt_begin_node(fdt, its->name);
+        if ( res )
+            return res;
+
+        res = fdt_property_string(fdt, "compatible", "arm,gic-v3-its");
+        if ( res )
+            return res;
+
+        res = fdt_property(fdt, "msi-controller", NULL, 0);
+        if ( res )
+            return res;
+
+        if ( its->phandle )
+        {
+            res = fdt_property_cell(fdt, "phandle", its->phandle);
+            if ( res )
+                return res;
+        }
+
+        /* Use the same reg regions as the ITS node in host DTB. */
+        prop = dt_get_property(its, "reg", &len);
+        if ( !prop )
+        {
+            printk(XENLOG_ERR "GICv3: Can't find ITS reg property.\n");
+            res = -FDT_ERR_XEN(ENOENT);
+            return res;
+        }
+
+        res = fdt_property(fdt, "reg", prop, len);
+        if ( res )
+            return res;
+
+        fdt_end_node(fdt);
+    }
+
+    return res;
+}
+
 /* Scan the DT for any ITS nodes and create a list of host ITSes out of it. */
 void gicv3_its_dt_init(const struct dt_device_node *node)
 {
diff --git a/xen/arch/arm/gic-v3.c b/xen/arch/arm/gic-v3.c
index d539d6c..c927306 100644
--- a/xen/arch/arm/gic-v3.c
+++ b/xen/arch/arm/gic-v3.c
@@ -1172,8 +1172,10 @@ static int gicv3_make_hwdom_dt_node(const struct domain *d,
 
     res = fdt_property(fdt, "reg", new_cells, len);
     xfree(new_cells);
+    if ( res )
+        return res;
 
-    return res;
+    return gicv3_its_make_hwdom_dt_nodes(d, gic, fdt);
 }
 
 static const hw_irq_controller gicv3_host_irq_type = {
diff --git a/xen/include/asm-arm/gic_v3_its.h b/xen/include/asm-arm/gic_v3_its.h
index 9f70980..5fb2364 100644
--- a/xen/include/asm-arm/gic_v3_its.h
+++ b/xen/include/asm-arm/gic_v3_its.h
@@ -158,6 +158,11 @@ int gicv3_its_setup_collection(unsigned int cpu);
 int vgic_v3_its_init_domain(struct domain *d);
 void vgic_v3_its_free_domain(struct domain *d);
 
+/* Create the appropriate DT nodes for a hardware domain. */
+int gicv3_its_make_hwdom_dt_nodes(const struct domain *d,
+                                  const struct dt_device_node *gic,
+                                  void *fdt);
+
 /*
  * Map a device on the host by allocating an ITT on the host (ITS).
  * "nr_event" specifies how many events (interrupts) this device will need.
@@ -240,6 +245,13 @@ static inline void vgic_v3_its_free_domain(struct domain *d)
 {
 }
 
+static inline int gicv3_its_make_hwdom_dt_nodes(const struct domain *d,
+                                                const struct dt_device_node *gic,
+                                                void *fdt)
+{
+    return 0;
+}
+
 #endif /* CONFIG_HAS_ITS */
 
 #endif
-- 
2.9.0



* Re: [PATCH v10 01/32] ARM: vGIC: avoid rank lock when reading priority
  2017-05-26 17:35 ` [PATCH v10 01/32] ARM: vGIC: avoid rank lock when reading priority Andre Przywara
@ 2017-05-30 10:47   ` Julien Grall
  2017-05-30 21:39     ` Stefano Stabellini
  2017-06-06 17:06     ` Andre Przywara
  0 siblings, 2 replies; 96+ messages in thread
From: Julien Grall @ 2017-05-30 10:47 UTC (permalink / raw)
  To: Andre Przywara, Stefano Stabellini
  Cc: xen-devel, Vijaya Kumar K, Vijay Kilari, Shanker Donthineni

Hi Andre,

On 26/05/17 18:35, Andre Przywara wrote:
> When reading the priority value of a virtual interrupt, we were taking
> the respective rank lock so far.
> However for forwarded interrupts (Dom0 only so far) this may lead to a
> deadlock with the following call chain:
> - MMIO access to change the IRQ affinity, calling the ITARGETSR handler
> - this handler takes the appropriate rank lock and calls vgic_store_itargetsr()
> - vgic_store_itargetsr() will eventually call vgic_migrate_irq()
> - if this IRQ is already in-flight, it will remove it from the old
>   VCPU and inject it into the new one, by calling vgic_vcpu_inject_irq()
> - vgic_vcpu_inject_irq will call vgic_get_virq_priority()
> - vgic_get_virq_priority() tries to take the rank lock - again!
> It seems like this code path has never been exercised before.
>
> Fix this by avoiding taking the lock in vgic_get_virq_priority() (like we
> do in vgic_get_target_vcpu()).
> Actually we are just reading one byte, and priority changes while
> interrupts are handled are a benign race that can happen on real hardware
> too. So it looks safe to just use read_atomic() instead.
>
> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> ---
>  xen/arch/arm/vgic.c | 8 +-------
>  1 file changed, 1 insertion(+), 7 deletions(-)
>
> diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
> index 83569b0..54b2aad 100644
> --- a/xen/arch/arm/vgic.c
> +++ b/xen/arch/arm/vgic.c
> @@ -227,14 +227,8 @@ struct vcpu *vgic_get_target_vcpu(struct vcpu *v, unsigned int virq)
>  static int vgic_get_virq_priority(struct vcpu *v, unsigned int virq)
>  {
>      struct vgic_irq_rank *rank = vgic_rank_irq(v, virq);
> -    unsigned long flags;
> -    int priority;
> -
> -    vgic_lock_rank(v, rank, flags);
> -    priority = rank->priority[virq & INTERRUPT_RANK_MASK];
> -    vgic_unlock_rank(v, rank, flags);
>
> -    return priority;
> +    return read_atomic(&rank->priority[virq & INTERRUPT_RANK_MASK]);

The write to rank->priority will not be atomic (see the vgic_reg_update
implementation): the register is first masked, then the priority is set.

So you may end up reading 0 (which is the highest priority) by mistake.

We should probably think about making the vgic_reg_* helpers atomic.
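
To illustrate the race (simplified, these are not the real vgic_reg_*
helpers): a masked read-modify-write leaves a window in which a lock-free
reader observes the intermediate value, while a single atomic byte store
does not.

/* Sketch only: 'slot' stands in for &rank->priority[...]. */
static void example_masked_update(uint8_t *slot, uint8_t new_prio)
{
    *slot &= 0x00;      /* mask step: a concurrent reader now sees 0     */
    *slot |= new_prio;  /* set step: only now is the new value visible   */
}

static void example_atomic_update(uint8_t *slot, uint8_t new_prio)
{
    write_atomic(slot, new_prio);   /* single store, no torn value */
}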

>  }
>
>  bool vgic_migrate_irq(struct vcpu *old, struct vcpu *new, unsigned int irq)
>

Cheers,

-- 
Julien Grall


* Re: [PATCH v10 02/32] ARM: GICv3: setup number of LPI bits for a GICv3 guest
  2017-05-26 17:35 ` [PATCH v10 02/32] ARM: GICv3: setup number of LPI bits for a GICv3 guest Andre Przywara
@ 2017-05-30 10:54   ` Julien Grall
  2017-06-06 10:19     ` Andre Przywara
  0 siblings, 1 reply; 96+ messages in thread
From: Julien Grall @ 2017-05-30 10:54 UTC (permalink / raw)
  To: Andre Przywara, Stefano Stabellini
  Cc: xen-devel, Vijaya Kumar K, Vijay Kilari, Shanker Donthineni

Hi Andre,

On 26/05/17 18:35, Andre Przywara wrote:
> The host supports a certain number of LPI identifiers, as stored in
> the GICD_TYPER register.
> Store this number from the hardware register in vgic_v3_hw to allow
> injecting the very same number into a guest (Dom0).
> DomUs get the legacy number of 10 bits here, since for now they only see
> SPIs, so they do not need more. This should be revisited once we get
> proper DomU ITS support.
>
> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> ---
>  xen/arch/arm/gic-v3.c        |  6 +++++-
>  xen/arch/arm/vgic-v3.c       | 10 +++++++++-
>  xen/include/asm-arm/domain.h |  1 +
>  xen/include/asm-arm/vgic.h   |  3 ++-
>  4 files changed, 17 insertions(+), 3 deletions(-)
>
> diff --git a/xen/arch/arm/gic-v3.c b/xen/arch/arm/gic-v3.c
> index a559e5e..29c8964 100644
> --- a/xen/arch/arm/gic-v3.c
> +++ b/xen/arch/arm/gic-v3.c
> @@ -1579,6 +1579,7 @@ static int __init gicv3_init(void)
>  {
>      int res, i;
>      uint32_t reg;
> +    unsigned int intid_bits;
>
>      if ( !cpu_has_gicv3 )
>      {
> @@ -1622,8 +1623,11 @@ static int __init gicv3_init(void)
>                 i, r->base, r->base + r->size);
>      }
>
> +    reg = readl_relaxed(GICD + GICD_TYPER);
> +    intid_bits = GICD_TYPE_ID_BITS(reg);
> +
>      vgic_v3_setup_hw(dbase, gicv3.rdist_count, gicv3.rdist_regions,
> -                     gicv3.rdist_stride);
> +                     gicv3.rdist_stride, intid_bits);
>      gicv3_init_v2();
>
>      spin_lock_init(&gicv3.lock);
> diff --git a/xen/arch/arm/vgic-v3.c b/xen/arch/arm/vgic-v3.c
> index d10757a..87f5fb3 100644
> --- a/xen/arch/arm/vgic-v3.c
> +++ b/xen/arch/arm/vgic-v3.c
> @@ -57,18 +57,21 @@ static struct {
>      unsigned int nr_rdist_regions;
>      const struct rdist_region *regions;
>      uint32_t rdist_stride; /* Re-distributor stride */
> +    unsigned int intid_bits;  /* Number of interrupt ID bits */
>  } vgic_v3_hw;
>
>  void vgic_v3_setup_hw(paddr_t dbase,
>                        unsigned int nr_rdist_regions,
>                        const struct rdist_region *regions,
> -                      uint32_t rdist_stride)
> +                      uint32_t rdist_stride,
> +                      unsigned int intid_bits)
>  {
>      vgic_v3_hw.enabled = 1;
>      vgic_v3_hw.dbase = dbase;
>      vgic_v3_hw.nr_rdist_regions = nr_rdist_regions;
>      vgic_v3_hw.regions = regions;
>      vgic_v3_hw.rdist_stride = rdist_stride;
> +    vgic_v3_hw.intid_bits = intid_bits;
>  }
>
>  static struct vcpu *vgic_v3_irouter_to_vcpu(struct domain *d, uint64_t irouter)
> @@ -1482,6 +1485,8 @@ static int vgic_v3_domain_init(struct domain *d)
>
>              first_cpu += size / d->arch.vgic.rdist_stride;
>          }
> +
> +        d->arch.vgic.intid_bits = vgic_v3_hw.intid_bits;
>      }
>      else
>      {
> @@ -1497,6 +1502,9 @@ static int vgic_v3_domain_init(struct domain *d)
>          d->arch.vgic.rdist_regions[0].base = GUEST_GICV3_GICR0_BASE;
>          d->arch.vgic.rdist_regions[0].size = GUEST_GICV3_GICR0_SIZE;
>          d->arch.vgic.rdist_regions[0].first_cpu = 0;
> +
> +        /* TODO: only SPIs for now, adjust this when guests need LPIs */
> +        d->arch.vgic.intid_bits = 10;

What if the guest supports no SPIs? Shouldn't you base this number on
the number of SPIs supported by the guest?

>      }
>
>      ret = vgic_v3_its_init_domain(d);
> diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h
> index 6de8082..7c3829d 100644
> --- a/xen/include/asm-arm/domain.h
> +++ b/xen/include/asm-arm/domain.h
> @@ -111,6 +111,7 @@ struct arch_domain
>          uint32_t rdist_stride;              /* Re-Distributor stride */
>          struct rb_root its_devices;         /* Devices mapped to an ITS */
>          spinlock_t its_devices_lock;        /* Protects the its_devices tree */
> +        unsigned int intid_bits;
>  #endif
>      } vgic;
>
> diff --git a/xen/include/asm-arm/vgic.h b/xen/include/asm-arm/vgic.h
> index 544867a..df75064 100644
> --- a/xen/include/asm-arm/vgic.h
> +++ b/xen/include/asm-arm/vgic.h
> @@ -346,7 +346,8 @@ struct rdist_region;
>  void vgic_v3_setup_hw(paddr_t dbase,
>                        unsigned int nr_rdist_regions,
>                        const struct rdist_region *regions,
> -                      uint32_t rdist_stride);
> +                      uint32_t rdist_stride,
> +                      unsigned int intid_bits);
>  #endif
>
>  #endif /* __ASM_ARM_VGIC_H__ */
>

Cheers,

-- 
Julien Grall


* Re: [PATCH v10 03/32] ARM: vGIC: move irq_to_pending() calls under the VGIC VCPU lock
  2017-05-26 17:35 ` [PATCH v10 03/32] ARM: vGIC: move irq_to_pending() calls under the VGIC VCPU lock Andre Przywara
@ 2017-05-30 11:08   ` Julien Grall
  2017-05-30 21:46     ` Stefano Stabellini
  2017-06-06 17:24     ` Andre Przywara
  0 siblings, 2 replies; 96+ messages in thread
From: Julien Grall @ 2017-05-30 11:08 UTC (permalink / raw)
  To: Andre Przywara, Stefano Stabellini
  Cc: xen-devel, Vijaya Kumar K, Vijay Kilari, Shanker Donthineni

Hi Andre,

On 26/05/17 18:35, Andre Przywara wrote:
> So far irq_to_pending() is just a convenience function to lookup
> statically allocated arrays. This will change with LPIs, which are
> more dynamic, so the memory for their struct pending_irq might go away.
> The proper answer to the issue of preventing stale pointers is
> ref-counting, which requires more changes and will be introduced with
> a later rework.
> For now move the irq_to_pending() calls that are used with LPIs under the
> VGIC VCPU lock, and only use the returned pointer while holding the lock.
> This prevents the memory from being freed while we use it.
> For the sake of completeness we take care about all irq_to_pending()
> users, even those which later will never deal with LPIs.
>
> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> ---
>  xen/arch/arm/gic.c  |  5 ++++-
>  xen/arch/arm/vgic.c | 39 ++++++++++++++++++++++++++++++---------
>  2 files changed, 34 insertions(+), 10 deletions(-)
>
> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> index da19130..dcb1783 100644
> --- a/xen/arch/arm/gic.c
> +++ b/xen/arch/arm/gic.c
> @@ -402,10 +402,13 @@ static inline void gic_add_to_lr_pending(struct vcpu *v, struct pending_irq *n)
>
>  void gic_remove_from_queues(struct vcpu *v, unsigned int virtual_irq)
>  {
> -    struct pending_irq *p = irq_to_pending(v, virtual_irq);
> +    struct pending_irq *p;
>      unsigned long flags;
>
>      spin_lock_irqsave(&v->arch.vgic.lock, flags);
> +
> +    p = irq_to_pending(v, virtual_irq);
> +
>      if ( !list_empty(&p->lr_queue) )
>          list_del_init(&p->lr_queue);
>      spin_unlock_irqrestore(&v->arch.vgic.lock, flags);
> diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
> index 54b2aad..69d732b 100644
> --- a/xen/arch/arm/vgic.c
> +++ b/xen/arch/arm/vgic.c
> @@ -234,23 +234,29 @@ static int vgic_get_virq_priority(struct vcpu *v, unsigned int virq)
>  bool vgic_migrate_irq(struct vcpu *old, struct vcpu *new, unsigned int irq)
>  {
>      unsigned long flags;
> -    struct pending_irq *p = irq_to_pending(old, irq);
> +    struct pending_irq *p;
> +
> +    spin_lock_irqsave(&old->arch.vgic.lock, flags);
> +
> +    p = irq_to_pending(old, irq);
>
>      /* nothing to do for virtual interrupts */
>      if ( p->desc == NULL )
> +    {
> +        spin_unlock_irqrestore(&old->arch.vgic.lock, flags);
>          return true;
> +    }
>
>      /* migration already in progress, no need to do anything */
>      if ( test_bit(GIC_IRQ_GUEST_MIGRATING, &p->status) )
>      {
>          gprintk(XENLOG_WARNING, "irq %u migration failed: requested while in progress\n", irq);
> +        spin_unlock_irqrestore(&old->arch.vgic.lock, flags);
>          return false;
>      }
>
>      perfc_incr(vgic_irq_migrates);
>
> -    spin_lock_irqsave(&old->arch.vgic.lock, flags);
> -
>      if ( list_empty(&p->inflight) )
>      {
>          irq_set_affinity(p->desc, cpumask_of(new->processor));
> @@ -285,6 +291,13 @@ void arch_move_irqs(struct vcpu *v)
>      struct vcpu *v_target;
>      int i;
>
> +    /*
> +     * We don't migrate LPIs at the moment.
> +     * If we ever do, we must make sure that the struct pending_irq does
> +     * not go away, as there is no lock preventing this here.
> +     */
> +    ASSERT(!is_lpi(vgic_num_irqs(d) - 1));

Hmmm? This raises the question of why vgic_num_irqs does not include the
LPIs today...

> +
>      for ( i = 32; i < vgic_num_irqs(d); i++ )
>      {
>          v_target = vgic_get_target_vcpu(v, i);
> @@ -299,6 +312,7 @@ void vgic_disable_irqs(struct vcpu *v, uint32_t r, int n)
>  {
>      const unsigned long mask = r;
>      struct pending_irq *p;
> +    struct irq_desc *desc;
>      unsigned int irq;
>      unsigned long flags;
>      int i = 0;
> @@ -307,14 +321,19 @@ void vgic_disable_irqs(struct vcpu *v, uint32_t r, int n)
>      while ( (i = find_next_bit(&mask, 32, i)) < 32 ) {
>          irq = i + (32 * n);
>          v_target = vgic_get_target_vcpu(v, irq);
> +
> +        spin_lock_irqsave(&v_target->arch.vgic.lock, flags);
>          p = irq_to_pending(v_target, irq);
>          clear_bit(GIC_IRQ_GUEST_ENABLED, &p->status);
>          gic_remove_from_queues(v_target, irq);

gic_remove_from_queues takes the v_target vGIC lock, so you just
introduced a deadlock here. You remove it in the next patch, but still, we
should not introduce a regression even temporarily. This would make it
difficult to bisect the series.
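
To spell the deadlock out: Xen spinlocks are not recursive, so taking the
same vGIC lock twice on one call path simply hangs. A reduced sketch of
the pattern introduced above (not the full function):

/* Reduced sketch of the problematic call path quoted above. */
static void example_disable_one(struct vcpu *v_target, unsigned int irq)
{
    unsigned long flags;

    spin_lock_irqsave(&v_target->arch.vgic.lock, flags);
    /* gic_remove_from_queues() tries to take the very same lock: deadlock. */
    gic_remove_from_queues(v_target, irq);
    spin_unlock_irqrestore(&v_target->arch.vgic.lock, flags);
}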

TBH, I am not a big fan of spreading the mess of vGIC locking when we 
are going to rework the vGIC and know that this code will not be called 
for LPIs.

BTW, this series is not bisectable because the host ITS is only enabled 
from patch #12.

> -        if ( p->desc != NULL )
> +        desc = p->desc;
> +        spin_unlock_irqrestore(&v_target->arch.vgic.lock, flags);
> +
> +        if ( desc != NULL )
>          {
> -            spin_lock_irqsave(&p->desc->lock, flags);
> -            p->desc->handler->disable(p->desc);
> -            spin_unlock_irqrestore(&p->desc->lock, flags);
> +            spin_lock_irqsave(&desc->lock, flags);
> +            desc->handler->disable(desc);
> +            spin_unlock_irqrestore(&desc->lock, flags);
>          }
>          i++;
>      }
> @@ -349,9 +368,9 @@ void vgic_enable_irqs(struct vcpu *v, uint32_t r, int n)
>      while ( (i = find_next_bit(&mask, 32, i)) < 32 ) {
>          irq = i + (32 * n);
>          v_target = vgic_get_target_vcpu(v, irq);
> +        spin_lock_irqsave(&v_target->arch.vgic.lock, flags);
>          p = irq_to_pending(v_target, irq);
>          set_bit(GIC_IRQ_GUEST_ENABLED, &p->status);
> -        spin_lock_irqsave(&v_target->arch.vgic.lock, flags);
>          if ( !list_empty(&p->inflight) && !test_bit(GIC_IRQ_GUEST_VISIBLE, &p->status) )
>              gic_raise_guest_irq(v_target, irq, p->priority);
>          spin_unlock_irqrestore(&v_target->arch.vgic.lock, flags);
> @@ -460,7 +479,7 @@ void vgic_clear_pending_irqs(struct vcpu *v)
>  void vgic_vcpu_inject_irq(struct vcpu *v, unsigned int virq)
>  {
>      uint8_t priority;
> -    struct pending_irq *iter, *n = irq_to_pending(v, virq);
> +    struct pending_irq *iter, *n;
>      unsigned long flags;
>      bool running;
>
> @@ -468,6 +487,8 @@ void vgic_vcpu_inject_irq(struct vcpu *v, unsigned int virq)
>
>      spin_lock_irqsave(&v->arch.vgic.lock, flags);
>
> +    n = irq_to_pending(v, virq);
> +
>      /* vcpu offline */
>      if ( test_bit(_VPF_down, &v->pause_flags) )
>      {
>

Cheers,

-- 
Julien Grall


* Re: [PATCH v10 04/32] ARM: vGIC: rework gic_remove_from_queues()
  2017-05-26 17:35 ` [PATCH v10 04/32] ARM: vGIC: rework gic_remove_from_queues() Andre Przywara
@ 2017-05-30 11:15   ` Julien Grall
  0 siblings, 0 replies; 96+ messages in thread
From: Julien Grall @ 2017-05-30 11:15 UTC (permalink / raw)
  To: Andre Przywara, Stefano Stabellini
  Cc: xen-devel, Vijaya Kumar K, Vijay Kilari, Shanker Donthineni

Hi Andre,

On 26/05/17 18:35, Andre Przywara wrote:
> The function name gic_remove_from_queues() was a bit of a misnomer,
> since it just removes an IRQ from the pending queue, not both queues.
> Rename the function to make this more clear, also give it a pointer to
> a struct pending_irq directly and rely on the VGIC VCPU lock to be
> already taken, so this can be used in more places.
> Replace the list removal in gic_clear_pending_irqs() with a call to
> this function.
>
> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> ---
>  xen/arch/arm/gic.c        | 12 +++---------
>  xen/arch/arm/vgic.c       |  2 +-
>  xen/include/asm-arm/gic.h |  2 +-
>  3 files changed, 5 insertions(+), 11 deletions(-)
>
> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> index dcb1783..9dde146 100644
> --- a/xen/arch/arm/gic.c
> +++ b/xen/arch/arm/gic.c
> @@ -400,18 +400,12 @@ static inline void gic_add_to_lr_pending(struct vcpu *v, struct pending_irq *n)
>      list_add_tail(&n->lr_queue, &v->arch.vgic.lr_pending);
>  }
>
> -void gic_remove_from_queues(struct vcpu *v, unsigned int virtual_irq)
> +void gic_remove_from_lr_pending(struct vcpu *v, struct pending_irq *p)
>  {
> -    struct pending_irq *p;
> -    unsigned long flags;
> -
> -    spin_lock_irqsave(&v->arch.vgic.lock, flags);
> -
> -    p = irq_to_pending(v, virtual_irq);
> +    ASSERT(spin_is_locked(&v->arch.vgic.lock));

Likely something before was wrong if you can replace the lock with an
ASSERT without even adding the lock in the current callers, such as the
deadlock introduced in the previous patch...

>
>      if ( !list_empty(&p->lr_queue) )

Whilst you rework gic_remove_from_queues, you could probably remove this 
check.
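
The check is redundant because list_del_init() is safe on a node that is
not on any list: an entry that points to itself (after INIT_LIST_HEAD() or
a previous list_del_init()) is simply re-linked to itself. A minimal
sketch:

/* Sketch only: list_del_init() on an already-removed node is harmless. */
static void example_list_del(void)
{
    struct list_head node;

    INIT_LIST_HEAD(&node);   /* node.next == node.prev == &node        */
    list_del_init(&node);    /* effectively a no-op, no emptiness check needed */
    list_del_init(&node);    /* still fine if called again             */
}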

>          list_del_init(&p->lr_queue);
> -    spin_unlock_irqrestore(&v->arch.vgic.lock, flags);
>  }
>
>  void gic_raise_inflight_irq(struct vcpu *v, unsigned int virtual_irq)
> @@ -612,7 +606,7 @@ void gic_clear_pending_irqs(struct vcpu *v)
>
>      v->arch.lr_mask = 0;
>      list_for_each_entry_safe ( p, t, &v->arch.vgic.lr_pending, lr_queue )
> -        list_del_init(&p->lr_queue);
> +        gic_remove_from_lr_pending(v, p);
>  }
>
>  int gic_events_need_delivery(void)
> diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
> index 69d732b..3993965 100644
> --- a/xen/arch/arm/vgic.c
> +++ b/xen/arch/arm/vgic.c
> @@ -325,7 +325,7 @@ void vgic_disable_irqs(struct vcpu *v, uint32_t r, int n)
>          spin_lock_irqsave(&v_target->arch.vgic.lock, flags);
>          p = irq_to_pending(v_target, irq);
>          clear_bit(GIC_IRQ_GUEST_ENABLED, &p->status);
> -        gic_remove_from_queues(v_target, irq);
> +        gic_remove_from_lr_pending(v_target, p);
>          desc = p->desc;
>          spin_unlock_irqrestore(&v_target->arch.vgic.lock, flags);
>
> diff --git a/xen/include/asm-arm/gic.h b/xen/include/asm-arm/gic.h
> index 836a103..3130634 100644
> --- a/xen/include/asm-arm/gic.h
> +++ b/xen/include/asm-arm/gic.h
> @@ -243,7 +243,7 @@ extern void init_maintenance_interrupt(void);
>  extern void gic_raise_guest_irq(struct vcpu *v, unsigned int irq,
>          unsigned int priority);
>  extern void gic_raise_inflight_irq(struct vcpu *v, unsigned int virtual_irq);
> -extern void gic_remove_from_queues(struct vcpu *v, unsigned int virtual_irq);
> +extern void gic_remove_from_lr_pending(struct vcpu *v, struct pending_irq *p);
>
>  /* Accept an interrupt from the GIC and dispatch its handler */
>  extern void gic_interrupt(struct cpu_user_regs *regs, int is_fiq);
>

Cheers,

-- 
Julien Grall


* Re: [PATCH v10 05/32] ARM: vGIC: introduce gic_remove_irq()
  2017-05-26 17:35 ` [PATCH v10 05/32] ARM: vGIC: introduce gic_remove_irq() Andre Przywara
@ 2017-05-30 11:31   ` Julien Grall
  2017-06-06 10:19     ` Andre Przywara
  0 siblings, 1 reply; 96+ messages in thread
From: Julien Grall @ 2017-05-30 11:31 UTC (permalink / raw)
  To: Andre Przywara, Stefano Stabellini
  Cc: xen-devel, Vijaya Kumar K, Vijay Kilari, Shanker Donthineni

Hi Andre,

On 26/05/17 18:35, Andre Przywara wrote:
> To avoid code duplication in a later patch, introduce a generic function
> to remove a virtual IRQ from the VGIC.
> Call that function instead of the open-coded version in vgic_migrate_irq().
>
> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> ---
>  xen/arch/arm/gic.c        | 9 +++++++++
>  xen/arch/arm/vgic.c       | 1 +
>  xen/include/asm-arm/gic.h | 1 +
>  3 files changed, 11 insertions(+)
>
> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> index 9dde146..3a6184a 100644
> --- a/xen/arch/arm/gic.c
> +++ b/xen/arch/arm/gic.c
> @@ -408,6 +408,15 @@ void gic_remove_from_lr_pending(struct vcpu *v, struct pending_irq *p)
>          list_del_init(&p->lr_queue);
>  }
>
> +void gic_remove_irq(struct vcpu *v, struct pending_irq *p)

The name is too generic. Remove IRQ from what?

> +{
> +    ASSERT(spin_is_locked(&v->arch.vgic.lock));
> +
> +    clear_bit(GIC_IRQ_GUEST_QUEUED, &p->status);
> +    list_del_init(&p->inflight);
> +    gic_remove_from_lr_pending(v, p);
> +}
> +
>  void gic_raise_inflight_irq(struct vcpu *v, unsigned int virtual_irq)
>  {
>      struct pending_irq *n = irq_to_pending(v, virtual_irq);
> diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
> index 3993965..b2acc99 100644
> --- a/xen/arch/arm/vgic.c
> +++ b/xen/arch/arm/vgic.c
> @@ -266,6 +266,7 @@ bool vgic_migrate_irq(struct vcpu *old, struct vcpu *new, unsigned int irq)
>      /* If the IRQ is still lr_pending, re-inject it to the new vcpu */
>      if ( !list_empty(&p->lr_queue) )
>      {
> +        gic_remove_irq(old, p);
>          clear_bit(GIC_IRQ_GUEST_QUEUED, &p->status);
>          list_del_init(&p->lr_queue);
>          list_del_init(&p->inflight);

I was expecting those 3 lines to be removed since you now call 
gic_remove_irq.

> diff --git a/xen/include/asm-arm/gic.h b/xen/include/asm-arm/gic.h
> index 3130634..5d5b4cc 100644
> --- a/xen/include/asm-arm/gic.h
> +++ b/xen/include/asm-arm/gic.h
> @@ -244,6 +244,7 @@ extern void gic_raise_guest_irq(struct vcpu *v, unsigned int irq,
>          unsigned int priority);
>  extern void gic_raise_inflight_irq(struct vcpu *v, unsigned int virtual_irq);
>  extern void gic_remove_from_lr_pending(struct vcpu *v, struct pending_irq *p);
> +extern void gic_remove_irq(struct vcpu *v, struct pending_irq *p);
>
>  /* Accept an interrupt from the GIC and dispatch its handler */
>  extern void gic_interrupt(struct cpu_user_regs *regs, int is_fiq);
>

Cheers,

-- 
Julien Grall


* Re: [PATCH v10 06/32] ARM: GIC: Add checks for NULL pointer pending_irq's
  2017-05-26 17:35 ` [PATCH v10 06/32] ARM: GIC: Add checks for NULL pointer pending_irq's Andre Przywara
@ 2017-05-30 11:38   ` Julien Grall
  2017-06-06 10:19     ` Andre Przywara
  0 siblings, 1 reply; 96+ messages in thread
From: Julien Grall @ 2017-05-30 11:38 UTC (permalink / raw)
  To: Andre Przywara, Stefano Stabellini
  Cc: xen-devel, Vijaya Kumar K, Vijay Kilari, Shanker Donthineni

Hi Andre,

On 26/05/17 18:35, Andre Przywara wrote:
> For LPIs the struct pending_irq's are dynamically allocated and the
> pointers will be stored in a radix tree. Since an LPI can be "unmapped"
> at any time, teach the VGIC how to deal with irq_to_pending() returning
> a NULL pointer.
> We just do nothing in this case or clean up the LR if the virtual LPI
> number was still in an LR.
>
> Those are all call sites for irq_to_pending(), as per:
> "git grep irq_to_pending", and their evaluations:
> (PROTECTED means: added NULL check and bailing out)
>
>     xen/arch/arm/gic.c:
> gic_route_irq_to_guest(): only called for SPIs, added ASSERT()
> gic_remove_irq_from_guest(): only called for SPIs, added ASSERT()
> gic_remove_from_queues(): PROTECTED, called within VCPU VGIC lock

This function has been renamed in patch #4.

> gic_raise_inflight_irq(): PROTECTED, called under VCPU VGIC lock
> gic_raise_guest_irq(): PROTECTED, called under VCPU VGIC lock
> gic_update_one_lr(): PROTECTED, called under VCPU VGIC lock
>
>     xen/arch/arm/vgic.c:
> vgic_migrate_irq(): not called for LPIs (virtual IRQs), added ASSERT()
> arch_move_irqs(): not iterating over LPIs, added ASSERT()

I don't see any ASSERT added in arch_move_irqs.

> vgic_disable_irqs(): not called for LPIs, added ASSERT()
> vgic_enable_irqs(): not called for LPIs, added ASSERT()
> vgic_vcpu_inject_irq(): PROTECTED, moved under VCPU VGIC lock
>
>     xen/include/asm-arm/event.h:
> local_events_need_delivery_nomask(): only called for a PPI, added ASSERT()
>
>     xen/include/asm-arm/vgic.h:
> (prototype)

Cheers,

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v10 11/32] ARM: GICv3: forward pending LPIs to guests
  2017-05-26 17:35 ` [PATCH v10 11/32] ARM: GICv3: forward pending LPIs to guests Andre Przywara
@ 2017-05-30 11:56   ` Julien Grall
  2017-05-30 22:07     ` Stefano Stabellini
  0 siblings, 1 reply; 96+ messages in thread
From: Julien Grall @ 2017-05-30 11:56 UTC (permalink / raw)
  To: Andre Przywara, Stefano Stabellini
  Cc: xen-devel, Vijaya Kumar K, Vijay Kilari, Shanker Donthineni

Hi Andre,

On 26/05/17 18:35, Andre Przywara wrote:
> Upon receiving an LPI on the host, we need to find the right VCPU and
> virtual IRQ number to get this IRQ injected.
> Iterate our two-level LPI table to find the domain ID and the virtual
> LPI number quickly when the host takes an LPI. We then look up the
> right VCPU in the struct pending_irq.
> We use the existing injection function to let the GIC emulation deal
> with this interrupt.
> This introduces a do_LPI() as a hardware gic_ops.
>
> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> ---
>  xen/arch/arm/gic-v2.c            |  7 ++++
>  xen/arch/arm/gic-v3-lpi.c        | 76 ++++++++++++++++++++++++++++++++++++++--
>  xen/arch/arm/gic-v3.c            |  1 +
>  xen/arch/arm/gic.c               |  8 ++++-
>  xen/include/asm-arm/domain.h     |  3 +-
>  xen/include/asm-arm/gic.h        |  2 ++
>  xen/include/asm-arm/gic_v3_its.h |  8 +++++
>  7 files changed, 101 insertions(+), 4 deletions(-)
>
> diff --git a/xen/arch/arm/gic-v2.c b/xen/arch/arm/gic-v2.c
> index 270a136..ffbe47c 100644
> --- a/xen/arch/arm/gic-v2.c
> +++ b/xen/arch/arm/gic-v2.c
> @@ -1217,6 +1217,12 @@ static int __init gicv2_init(void)
>      return 0;
>  }
>
> +static void gicv2_do_LPI(unsigned int lpi)
> +{
> +    /* No LPIs in a GICv2 */
> +    BUG();
> +}
> +
>  const static struct gic_hw_operations gicv2_ops = {
>      .info                = &gicv2_info,
>      .init                = gicv2_init,
> @@ -1244,6 +1250,7 @@ const static struct gic_hw_operations gicv2_ops = {
>      .make_hwdom_madt     = gicv2_make_hwdom_madt,
>      .map_hwdom_extra_mappings = gicv2_map_hwdown_extra_mappings,
>      .iomem_deny_access   = gicv2_iomem_deny_access,
> +    .do_LPI              = gicv2_do_LPI,
>  };
>
>  /* Set up the GIC */
> diff --git a/xen/arch/arm/gic-v3-lpi.c b/xen/arch/arm/gic-v3-lpi.c
> index 292f2d0..438bbfe 100644
> --- a/xen/arch/arm/gic-v3-lpi.c
> +++ b/xen/arch/arm/gic-v3-lpi.c
> @@ -47,7 +47,6 @@ union host_lpi {
>      struct {
>          uint32_t virt_lpi;
>          uint16_t dom_id;
> -        uint16_t vcpu_id;

You don't explain why you remove vcpu_id from host_lpi. This likely 
requires a separate patch anyway.

Also, I would prefer if you made the padding in the structure explicit 
(i.e. using pad0).
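
(Purely to illustrate the suggestion, a hypothetical sketch of the layout 
with the suggested pad0 spelled out:)

    struct {
        uint32_t virt_lpi;
        uint16_t dom_id;
        uint16_t pad0;   /* was vcpu_id; keeps the 64-bit layout explicit */
    };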

>      };
>  };
>
> @@ -136,6 +135,80 @@ uint64_t gicv3_get_redist_address(unsigned int cpu, bool use_pta)
>          return per_cpu(lpi_redist, cpu).redist_id << 16;
>  }
>
> +static void vgic_vcpu_inject_lpi(struct domain *d, unsigned int virq)
> +{
> +    struct pending_irq *p = irq_to_pending(d->vcpu[0], virq);
> +    struct vcpu *v = NULL;
> +
> +    if ( !p )
> +        return;
> +
> +    if ( p->lpi_vcpu_id < d->max_vcpus )
> +        v = d->vcpu[read_atomic(&p->lpi_vcpu_id)];

Hmmm, what prevents lpi_vcpu_id from changing between the check and the 
read?

> +
> +    if ( v )

v will always be valid if you read d->vcpu[....], but the way you wrote 
the code is very confusing.

It would be clearer if you do:

if ( p->lpi_vcpu_id >= d->max_vcpus )
   return;

v = ....
vgic_vcpu_inject_irq(v, irq);
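
Roughly, as a hypothetical sketch (reusing the names from the patch, with 
the VCPU ID read once so the check and the use see the same value):

    static void vgic_vcpu_inject_lpi(struct domain *d, unsigned int virq)
    {
        struct pending_irq *p = irq_to_pending(d->vcpu[0], virq);
        unsigned int vcpu_id;

        if ( !p )
            return;

        /* Read the VCPU ID once, then validate and use that local copy. */
        vcpu_id = read_atomic(&p->lpi_vcpu_id);
        if ( vcpu_id >= d->max_vcpus )
            return;

        vgic_vcpu_inject_irq(d->vcpu[vcpu_id], virq);
    }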

Cheers,

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v10 12/32] ARM: GICv3: enable ITS and LPIs on the host
  2017-05-26 17:35 ` [PATCH v10 12/32] ARM: GICv3: enable ITS and LPIs on the host Andre Przywara
@ 2017-05-30 11:58   ` Julien Grall
  0 siblings, 0 replies; 96+ messages in thread
From: Julien Grall @ 2017-05-30 11:58 UTC (permalink / raw)
  To: Andre Przywara, Stefano Stabellini
  Cc: xen-devel, Vijaya Kumar K, Vijay Kilari, Shanker Donthineni

Hi Andre,

On 26/05/17 18:35, Andre Przywara wrote:
> Now that the host part of the ITS code is in place, we can enable the
> ITS and also LPIs on each redistributor to get the show rolling.
> At this point there would be no LPIs mapped, as guests don't know about
> the ITS yet.

In that case, why don't we move this patch to the beginning of the 
series? That would make the series easier to bisect, as without it the 
ITS will not process commands and therefore initialization will fail.

>
> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> Acked-by: Stefano Stabellini <sstabellini@kernel.org>
> ---
>  xen/arch/arm/gic-v3-its.c |  4 ++++
>  xen/arch/arm/gic-v3.c     | 18 ++++++++++++++++++
>  2 files changed, 22 insertions(+)
>
> diff --git a/xen/arch/arm/gic-v3-its.c b/xen/arch/arm/gic-v3-its.c
> index 07280b3..aebc257 100644
> --- a/xen/arch/arm/gic-v3-its.c
> +++ b/xen/arch/arm/gic-v3-its.c
> @@ -505,6 +505,10 @@ static int gicv3_its_init_single_its(struct host_its *hw_its)
>          return -ENOMEM;
>      writeq_relaxed(0, hw_its->its_base + GITS_CWRITER);
>
> +    /* Now enable interrupt translation and command processing on that ITS. */
> +    reg = readl_relaxed(hw_its->its_base + GITS_CTLR);
> +    writel_relaxed(reg | GITS_CTLR_ENABLE, hw_its->its_base + GITS_CTLR);
> +
>      return 0;
>  }
>
> diff --git a/xen/arch/arm/gic-v3.c b/xen/arch/arm/gic-v3.c
> index 8140c5f..d539d6c 100644
> --- a/xen/arch/arm/gic-v3.c
> +++ b/xen/arch/arm/gic-v3.c
> @@ -620,6 +620,21 @@ static int gicv3_enable_redist(void)
>      return 0;
>  }
>
> +/* Enable LPIs on this redistributor (only useful when the host has an ITS). */
> +static bool gicv3_enable_lpis(void)
> +{
> +    uint32_t val;
> +
> +    val = readl_relaxed(GICD_RDIST_BASE + GICR_TYPER);
> +    if ( !(val & GICR_TYPER_PLPIS) )
> +        return false;
> +
> +    val = readl_relaxed(GICD_RDIST_BASE + GICR_CTLR);
> +    writel_relaxed(val | GICR_CTLR_ENABLE_LPIS, GICD_RDIST_BASE + GICR_CTLR);
> +
> +    return true;
> +}
> +
>  static int __init gicv3_populate_rdist(void)
>  {
>      int i;
> @@ -731,11 +746,14 @@ static int gicv3_cpu_init(void)
>      if ( gicv3_enable_redist() )
>          return -ENODEV;
>
> +    /* If the host has any ITSes, enable LPIs now. */
>      if ( gicv3_its_host_has_its() )
>      {
>          ret = gicv3_its_setup_collection(smp_processor_id());
>          if ( ret )
>              return ret;
> +        if ( !gicv3_enable_lpis() )
> +            return -EBUSY;
>      }
>
>      /* Set priority on PPI and SGI interrupts */
>

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v10 16/32] ARM: vGIC: advertise LPI support
  2017-05-26 17:35 ` [PATCH v10 16/32] ARM: vGIC: advertise LPI support Andre Przywara
@ 2017-05-30 12:59   ` Julien Grall
  0 siblings, 0 replies; 96+ messages in thread
From: Julien Grall @ 2017-05-30 12:59 UTC (permalink / raw)
  To: Andre Przywara, Stefano Stabellini
  Cc: xen-devel, Vijaya Kumar K, Vijay Kilari, Shanker Donthineni

Hi Andre,

On 26/05/17 18:35, Andre Przywara wrote:
> To let a guest know about the availability of virtual LPIs, set the
> respective bits in the virtual GIC registers and let a guest control
> the LPI enable bit.
> Only report the LPI capability if the host has initialized at least
> one ITS.

I am not sure I understand this sentence. You report the LPI capability if 
the domain has an ITS exposed. It does not matter whether the host has one.

> For Dom0 we report the same number of interrupt identifiers as the
> host, whereas DomUs get a number fixed at 10 bits for the moment, which
> covers all SPIs. Also we fix a slight inaccuracy here, since the
> number of interrupt identifiers specified in GICD_TYPER depends on the
> stream interface and is independent of the number of actually wired
> SPIs.

This justification would have been useful in patch #2.

> This also removes a "TBD" comment, as we now populate the processor
> number in the GICR_TYPER register.

Again, I think it would be worth explaining that you populate
GICR_TYPER.Process_Number because the ITS will use it later on.

>
> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> ---
>  xen/arch/arm/vgic-v3.c | 76 +++++++++++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 69 insertions(+), 7 deletions(-)
>
> diff --git a/xen/arch/arm/vgic-v3.c b/xen/arch/arm/vgic-v3.c
> index e2e5bc1..134e1b0 100644
> --- a/xen/arch/arm/vgic-v3.c
> +++ b/xen/arch/arm/vgic-v3.c
> @@ -170,8 +170,19 @@ static int __vgic_v3_rdistr_rd_mmio_read(struct vcpu *v, mmio_info_t *info,
>      switch ( gicr_reg )
>      {
>      case VREG32(GICR_CTLR):
> -        /* We have not implemented LPI's, read zero */
> -        goto read_as_zero_32;
> +    {
> +        unsigned long flags;
> +
> +        if ( !v->domain->arch.vgic.has_its )
> +            goto read_as_zero_32;
> +        if ( dabt.size != DABT_WORD ) goto bad_width;
> +
> +        spin_lock_irqsave(&v->arch.vgic.lock, flags);
> +        *r = vgic_reg32_extract(!!(v->arch.vgic.flags & VGIC_V3_LPIS_ENABLED),
> +                                info);
> +        spin_unlock_irqrestore(&v->arch.vgic.lock, flags);
> +        return 1;
> +    }
>
>      case VREG32(GICR_IIDR):
>          if ( dabt.size != DABT_WORD ) goto bad_width;
> @@ -183,16 +194,20 @@ static int __vgic_v3_rdistr_rd_mmio_read(struct vcpu *v, mmio_info_t *info,
>          uint64_t typer, aff;
>
>          if ( !vgic_reg64_check_access(dabt) ) goto bad_width;
> -        /* TBD: Update processor id in [23:8] when ITS support is added */
>          aff = (MPIDR_AFFINITY_LEVEL(v->arch.vmpidr, 3) << 56 |
>                 MPIDR_AFFINITY_LEVEL(v->arch.vmpidr, 2) << 48 |
>                 MPIDR_AFFINITY_LEVEL(v->arch.vmpidr, 1) << 40 |
>                 MPIDR_AFFINITY_LEVEL(v->arch.vmpidr, 0) << 32);
>          typer = aff;
> +        /* We use the VCPU ID as the redistributor ID in bits[23:8] */
> +        typer |= v->vcpu_id << GICR_TYPER_PROC_NUM_SHIFT;
>
>          if ( v->arch.vgic.flags & VGIC_V3_RDIST_LAST )
>              typer |= GICR_TYPER_LAST;
>
> +        if ( v->domain->arch.vgic.has_its )
> +            typer |= GICR_TYPER_PLPIS;
> +
>          *r = vgic_reg64_extract(typer, info);
>
>          return 1;
> @@ -426,6 +441,33 @@ static uint64_t sanitize_pendbaser(uint64_t reg)
>      return reg;
>  }
>
> +static void vgic_vcpu_enable_lpis(struct vcpu *v)
> +{
> +    uint64_t reg = v->domain->arch.vgic.rdist_propbase;
> +    unsigned int nr_lpis = BIT((reg & 0x1f) + 1);
> +
> +    /* rdists_enabled is protected by the domain lock. */
> +    ASSERT(spin_is_locked(&v->domain->arch.vgic.lock));
> +
> +    if ( nr_lpis < LPI_OFFSET )
> +        nr_lpis = 0;
> +    else
> +        nr_lpis -= LPI_OFFSET;
> +
> +    if ( !v->domain->arch.vgic.rdists_enabled )
> +    {
> +        v->domain->arch.vgic.nr_lpis = nr_lpis;
> +        v->domain->arch.vgic.rdists_enabled = true;
> +        /*
> +         * Make sure the per-domain rdists_enabled flag has been set before
> +         * enabling this particular redistributor.

Hmmm, you also use nr_lpis in the vITS code without any lock. So this 
barrier is not only there for rdists_enabled.

> +         */
> +        smp_mb();

I would have expected a bit more documentation based on the discussion 
we had on the previous version. For instance, why do you need this 
memory barrier?...
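
(To illustrate the question, a hypothetical sketch of the pairing one would 
expect to see documented; the reader side here is made up:)

    /* Writer, in vgic_vcpu_enable_lpis(): */
    v->domain->arch.vgic.nr_lpis = nr_lpis;
    v->domain->arch.vgic.rdists_enabled = true;
    smp_mb();   /* order the domain-wide state before the per-vCPU flag */
    v->arch.vgic.flags |= VGIC_V3_LPIS_ENABLED;

    /* Reader, e.g. somewhere in the vITS code: */
    if ( v->arch.vgic.flags & VGIC_V3_LPIS_ENABLED )
    {
        smp_rmb();  /* pairs with the smp_mb() above */
        /* rdists_enabled and nr_lpis are now guaranteed to be visible */
    }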

> +    }
> +
> +    v->arch.vgic.flags |= VGIC_V3_LPIS_ENABLED;
> +}
> +
>  static int __vgic_v3_rdistr_rd_mmio_write(struct vcpu *v, mmio_info_t *info,
>                                            uint32_t gicr_reg,
>                                            register_t r)
> @@ -436,8 +478,26 @@ static int __vgic_v3_rdistr_rd_mmio_write(struct vcpu *v, mmio_info_t *info,
>      switch ( gicr_reg )
>      {
>      case VREG32(GICR_CTLR):
> -        /* LPI's not implemented */
> -        goto write_ignore_32;
> +    {
> +        unsigned long flags;
> +
> +        if ( !v->domain->arch.vgic.has_its )
> +            goto write_ignore_32;
> +        if ( dabt.size != DABT_WORD ) goto bad_width;
> +
> +        vgic_lock(v);                   /* protects rdists_enabled */
> +        spin_lock_irqsave(&v->arch.vgic.lock, flags);
> +
> +        /* LPIs can only be enabled once, but never disabled again. */
> +        if ( (r & GICR_CTLR_ENABLE_LPIS) &&
> +             !(v->arch.vgic.flags & VGIC_V3_LPIS_ENABLED) )
> +            vgic_vcpu_enable_lpis(v);
> +
> +        spin_unlock_irqrestore(&v->arch.vgic.lock, flags);
> +        vgic_unlock(v);
> +
> +        return 1;
> +    }
>
>      case VREG32(GICR_IIDR):
>          /* RO */
> @@ -1045,7 +1105,6 @@ static int vgic_v3_distr_mmio_read(struct vcpu *v, mmio_info_t *info,
>           * Number of interrupt identifier bits supported by the GIC
>           * Stream Protocol Interface
>           */
> -        unsigned int irq_bits = get_count_order(vgic_num_irqs(v->domain));
>          /*
>           * Number of processors that may be used as interrupt targets when ARE
>           * bit is zero. The maximum is 8.
> @@ -1058,7 +1117,10 @@ static int vgic_v3_distr_mmio_read(struct vcpu *v, mmio_info_t *info,
>          typer = ((ncpus - 1) << GICD_TYPE_CPUS_SHIFT |
>                   DIV_ROUND_UP(v->domain->arch.vgic.nr_spis, 32));
>
> -        typer |= (irq_bits - 1) << GICD_TYPE_ID_BITS_SHIFT;
> +        if ( v->domain->arch.vgic.has_its )
> +            typer |= GICD_TYPE_LPIS;
> +
> +        typer |= (v->domain->arch.vgic.intid_bits - 1) << GICD_TYPE_ID_BITS_SHIFT;
>
>          *r = vgic_reg32_extract(typer, info);
>
>

Cheers,

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v10 01/32] ARM: vGIC: avoid rank lock when reading priority
  2017-05-30 10:47   ` Julien Grall
@ 2017-05-30 21:39     ` Stefano Stabellini
  2017-05-31 10:42       ` Julien Grall
  2017-06-06 17:06     ` Andre Przywara
  1 sibling, 1 reply; 96+ messages in thread
From: Stefano Stabellini @ 2017-05-30 21:39 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, Vijay Kilari, Andre Przywara, Vijaya Kumar K,
	xen-devel, Shanker Donthineni

On Tue, 30 May 2017, Julien Grall wrote:
> Hi Andre,
> 
> On 26/05/17 18:35, Andre Przywara wrote:
> > When reading the priority value of a virtual interrupt, we were taking
> > the respective rank lock so far.
> > However for forwarded interrupts (Dom0 only so far) this may lead to a
> > deadlock with the following call chain:
> > - MMIO access to change the IRQ affinity, calling the ITARGETSR handler
> > - this handler takes the appropriate rank lock and calls
> > vgic_store_itargetsr()
> > - vgic_store_itargetsr() will eventually call vgic_migrate_irq()
> > - if this IRQ is already in-flight, it will remove it from the old
> >   VCPU and inject it into the new one, by calling vgic_vcpu_inject_irq()
> > - vgic_vcpu_inject_irq will call vgic_get_virq_priority()
> > - vgic_get_virq_priority() tries to take the rank lock - again!
> > It seems like this code path has never been exercised before.
> > 
> > Fix this by avoiding taking the lock in vgic_get_virq_priority() (like we
> > do in vgic_get_target_vcpu()).
> > Actually we are just reading one byte, and priority changes while
> > interrupts are handled are a benign race that can happen on real hardware
> > too. So it looks safe to just use read_atomic() instead.
> > 
> > Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> > ---
> >  xen/arch/arm/vgic.c | 8 +-------
> >  1 file changed, 1 insertion(+), 7 deletions(-)
> > 
> > diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
> > index 83569b0..54b2aad 100644
> > --- a/xen/arch/arm/vgic.c
> > +++ b/xen/arch/arm/vgic.c
> > @@ -227,14 +227,8 @@ struct vcpu *vgic_get_target_vcpu(struct vcpu *v,
> > unsigned int virq)
> >  static int vgic_get_virq_priority(struct vcpu *v, unsigned int virq)
> >  {
> >      struct vgic_irq_rank *rank = vgic_rank_irq(v, virq);
> > -    unsigned long flags;
> > -    int priority;
> > -
> > -    vgic_lock_rank(v, rank, flags);
> > -    priority = rank->priority[virq & INTERRUPT_RANK_MASK];
> > -    vgic_unlock_rank(v, rank, flags);
> > 
> > -    return priority;
> > +    return read_atomic(&rank->priority[virq & INTERRUPT_RANK_MASK]);
> 
> The write in rank->priority will not be atomic (see vgic_reg_update
> implementation): the register is first masked, then the priority is set.
> 
> So you may end up reading 0 (which is the highest priority) by mistake.
> 
> We should probably think to make vgic_reg_* helper atomic.

Right! That's why I wrote
alpine.DEB.2.10.1705231135130.18759@sstabellini-ThinkPad-X260
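
(To spell the hazard out, hypothetically the masked read-modify-write can 
be seen as two separate stores:)

    rank->priority[idx] = 0;               /* intermediate state after masking */
    /* a concurrent read_atomic() here observes priority 0, i.e. the highest */
    rank->priority[idx] = new_priority;    /* final value written afterwards */

One possible direction is to build the new value locally and publish it with 
a single write_atomic(&rank->priority[idx], new_priority).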


> >  }
> > 
> >  bool vgic_migrate_irq(struct vcpu *old, struct vcpu *new, unsigned int irq)
> > 
> 
> Cheers,
> 
> -- 
> Julien Grall
> 

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v10 03/32] ARM: vGIC: move irq_to_pending() calls under the VGIC VCPU lock
  2017-05-30 11:08   ` Julien Grall
@ 2017-05-30 21:46     ` Stefano Stabellini
  2017-05-31 10:44       ` Julien Grall
  2017-06-06 17:24     ` Andre Przywara
  1 sibling, 1 reply; 96+ messages in thread
From: Stefano Stabellini @ 2017-05-30 21:46 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, Vijay Kilari, Andre Przywara, Vijaya Kumar K,
	xen-devel, Shanker Donthineni

On Tue, 30 May 2017, Julien Grall wrote:
> Hi Andre,
> 
> On 26/05/17 18:35, Andre Przywara wrote:
> > So far irq_to_pending() is just a convenience function to lookup
> > statically allocated arrays. This will change with LPIs, which are
> > more dynamic, so the memory for their struct pending_irq might go away.
> > The proper answer to the issue of preventing stale pointers is
> > ref-counting, which requires more rework and will be introduced with
> > a later rework.
> > For now move the irq_to_pending() calls that are used with LPIs under the
> > VGIC VCPU lock, and only use the returned pointer while holding the lock.
> > This prevents the memory from being freed while we use it.
> > For the sake of completeness we take care about all irq_to_pending()
> > users, even those which later will never deal with LPIs.
> > 
> > Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> > ---
> >  xen/arch/arm/gic.c  |  5 ++++-
> >  xen/arch/arm/vgic.c | 39 ++++++++++++++++++++++++++++++---------
> >  2 files changed, 34 insertions(+), 10 deletions(-)
> > 
> > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> > index da19130..dcb1783 100644
> > --- a/xen/arch/arm/gic.c
> > +++ b/xen/arch/arm/gic.c
> > @@ -402,10 +402,13 @@ static inline void gic_add_to_lr_pending(struct vcpu
> > *v, struct pending_irq *n)
> > 
> >  void gic_remove_from_queues(struct vcpu *v, unsigned int virtual_irq)
> >  {
> > -    struct pending_irq *p = irq_to_pending(v, virtual_irq);
> > +    struct pending_irq *p;
> >      unsigned long flags;
> > 
> >      spin_lock_irqsave(&v->arch.vgic.lock, flags);
> > +
> > +    p = irq_to_pending(v, virtual_irq);
> > +
> >      if ( !list_empty(&p->lr_queue) )
> >          list_del_init(&p->lr_queue);
> >      spin_unlock_irqrestore(&v->arch.vgic.lock, flags);
> > diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
> > index 54b2aad..69d732b 100644
> > --- a/xen/arch/arm/vgic.c
> > +++ b/xen/arch/arm/vgic.c
> > @@ -234,23 +234,29 @@ static int vgic_get_virq_priority(struct vcpu *v,
> > unsigned int virq)
> >  bool vgic_migrate_irq(struct vcpu *old, struct vcpu *new, unsigned int irq)
> >  {
> >      unsigned long flags;
> > -    struct pending_irq *p = irq_to_pending(old, irq);
> > +    struct pending_irq *p;
> > +
> > +    spin_lock_irqsave(&old->arch.vgic.lock, flags);
> > +
> > +    p = irq_to_pending(old, irq);
> > 
> >      /* nothing to do for virtual interrupts */
> >      if ( p->desc == NULL )
> > +    {
> > +        spin_unlock_irqrestore(&old->arch.vgic.lock, flags);
> >          return true;
> > +    }
> > 
> >      /* migration already in progress, no need to do anything */
> >      if ( test_bit(GIC_IRQ_GUEST_MIGRATING, &p->status) )
> >      {
> >          gprintk(XENLOG_WARNING, "irq %u migration failed: requested while
> > in progress\n", irq);
> > +        spin_unlock_irqrestore(&old->arch.vgic.lock, flags);
> >          return false;
> >      }
> > 
> >      perfc_incr(vgic_irq_migrates);
> > 
> > -    spin_lock_irqsave(&old->arch.vgic.lock, flags);
> > -
> >      if ( list_empty(&p->inflight) )
> >      {
> >          irq_set_affinity(p->desc, cpumask_of(new->processor));
> > @@ -285,6 +291,13 @@ void arch_move_irqs(struct vcpu *v)
> >      struct vcpu *v_target;
> >      int i;
> > 
> > +    /*
> > +     * We don't migrate LPIs at the moment.
> > +     * If we ever do, we must make sure that the struct pending_irq does
> > +     * not go away, as there is no lock preventing this here.
> > +     */
> > +    ASSERT(!is_lpi(vgic_num_irqs(d) - 1));
> 
> Hmmm? This raise a question of why vgic_num_irqs does not include the LPIs
> today...
> 
> > +
> >      for ( i = 32; i < vgic_num_irqs(d); i++ )
> >      {
> >          v_target = vgic_get_target_vcpu(v, i);
> > @@ -299,6 +312,7 @@ void vgic_disable_irqs(struct vcpu *v, uint32_t r, int
> > n)
> >  {
> >      const unsigned long mask = r;
> >      struct pending_irq *p;
> > +    struct irq_desc *desc;
> >      unsigned int irq;
> >      unsigned long flags;
> >      int i = 0;
> > @@ -307,14 +321,19 @@ void vgic_disable_irqs(struct vcpu *v, uint32_t r, int
> > n)
> >      while ( (i = find_next_bit(&mask, 32, i)) < 32 ) {
> >          irq = i + (32 * n);
> >          v_target = vgic_get_target_vcpu(v, irq);
> > +
> > +        spin_lock_irqsave(&v_target->arch.vgic.lock, flags);
> >          p = irq_to_pending(v_target, irq);
> >          clear_bit(GIC_IRQ_GUEST_ENABLED, &p->status);
> >          gic_remove_from_queues(v_target, irq);
> 
> gic_remove_from_queues takes the v_target vGIC lock. So you just introduced a
> deadlock. You remove it in the next patch but still, we should not introduce a
> regression even temporarily. That would make it difficult to bisect the
> series.

Yeah, we cannot introduce regressions like this one.


> TBH, I am not a big fan of spreading the mess of vGIC locking when we are
> going to rework the vGIC and know that this code will not be called for LPIs.

I asked for this in
alpine.DEB.2.10.1705191729560.18759@sstabellini-ThinkPad-X260, this way
we covered all bases. The double lock is bad, but the rest of the
changes look OK to me.


> BTW, this series is not bisectable because the host ITS is only enabled from
> patch #12.
> 
> > -        if ( p->desc != NULL )
> > +        desc = p->desc;
> > +        spin_unlock_irqrestore(&v_target->arch.vgic.lock, flags);
> > +
> > +        if ( desc != NULL )
> >          {
> > -            spin_lock_irqsave(&p->desc->lock, flags);
> > -            p->desc->handler->disable(p->desc);
> > -            spin_unlock_irqrestore(&p->desc->lock, flags);
> > +            spin_lock_irqsave(&desc->lock, flags);
> > +            desc->handler->disable(desc);
> > +            spin_unlock_irqrestore(&desc->lock, flags);
> >          }
> >          i++;
> >      }
> > @@ -349,9 +368,9 @@ void vgic_enable_irqs(struct vcpu *v, uint32_t r, int n)
> >      while ( (i = find_next_bit(&mask, 32, i)) < 32 ) {
> >          irq = i + (32 * n);
> >          v_target = vgic_get_target_vcpu(v, irq);
> > +        spin_lock_irqsave(&v_target->arch.vgic.lock, flags);
> >          p = irq_to_pending(v_target, irq);
> >          set_bit(GIC_IRQ_GUEST_ENABLED, &p->status);
> > -        spin_lock_irqsave(&v_target->arch.vgic.lock, flags);
> >          if ( !list_empty(&p->inflight) && !test_bit(GIC_IRQ_GUEST_VISIBLE,
> > &p->status) )
> >              gic_raise_guest_irq(v_target, irq, p->priority);
> >          spin_unlock_irqrestore(&v_target->arch.vgic.lock, flags);
> > @@ -460,7 +479,7 @@ void vgic_clear_pending_irqs(struct vcpu *v)
> >  void vgic_vcpu_inject_irq(struct vcpu *v, unsigned int virq)
> >  {
> >      uint8_t priority;
> > -    struct pending_irq *iter, *n = irq_to_pending(v, virq);
> > +    struct pending_irq *iter, *n;
> >      unsigned long flags;
> >      bool running;
> > 
> > @@ -468,6 +487,8 @@ void vgic_vcpu_inject_irq(struct vcpu *v, unsigned int
> > virq)
> > 
> >      spin_lock_irqsave(&v->arch.vgic.lock, flags);
> > 
> > +    n = irq_to_pending(v, virq);
> > +
> >      /* vcpu offline */
> >      if ( test_bit(_VPF_down, &v->pause_flags) )
> >      {
> > 
> 
> Cheers,
> 
> -- 
> Julien Grall
> 

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v10 11/32] ARM: GICv3: forward pending LPIs to guests
  2017-05-30 11:56   ` Julien Grall
@ 2017-05-30 22:07     ` Stefano Stabellini
  2017-05-31 11:09       ` Julien Grall
  0 siblings, 1 reply; 96+ messages in thread
From: Stefano Stabellini @ 2017-05-30 22:07 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, Vijay Kilari, Andre Przywara, Vijaya Kumar K,
	xen-devel, Shanker Donthineni

On Tue, 30 May 2017, Julien Grall wrote:
> Hi Andre,
> 
> On 26/05/17 18:35, Andre Przywara wrote:
> > Upon receiving an LPI on the host, we need to find the right VCPU and
> > virtual IRQ number to get this IRQ injected.
> > Iterate our two-level LPI table to find the domain ID and the virtual
> > LPI number quickly when the host takes an LPI. We then look up the
> > right VCPU in the struct pending_irq.
> > We use the existing injection function to let the GIC emulation deal
> > with this interrupt.
> > This introduces a do_LPI() as a hardware gic_ops.
> > 
> > Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> > ---
> >  xen/arch/arm/gic-v2.c            |  7 ++++
> >  xen/arch/arm/gic-v3-lpi.c        | 76
> > ++++++++++++++++++++++++++++++++++++++--
> >  xen/arch/arm/gic-v3.c            |  1 +
> >  xen/arch/arm/gic.c               |  8 ++++-
> >  xen/include/asm-arm/domain.h     |  3 +-
> >  xen/include/asm-arm/gic.h        |  2 ++
> >  xen/include/asm-arm/gic_v3_its.h |  8 +++++
> >  7 files changed, 101 insertions(+), 4 deletions(-)
> > 
> > diff --git a/xen/arch/arm/gic-v2.c b/xen/arch/arm/gic-v2.c
> > index 270a136..ffbe47c 100644
> > --- a/xen/arch/arm/gic-v2.c
> > +++ b/xen/arch/arm/gic-v2.c
> > @@ -1217,6 +1217,12 @@ static int __init gicv2_init(void)
> >      return 0;
> >  }
> > 
> > +static void gicv2_do_LPI(unsigned int lpi)
> > +{
> > +    /* No LPIs in a GICv2 */
> > +    BUG();
> > +}
> > +
> >  const static struct gic_hw_operations gicv2_ops = {
> >      .info                = &gicv2_info,
> >      .init                = gicv2_init,
> > @@ -1244,6 +1250,7 @@ const static struct gic_hw_operations gicv2_ops = {
> >      .make_hwdom_madt     = gicv2_make_hwdom_madt,
> >      .map_hwdom_extra_mappings = gicv2_map_hwdown_extra_mappings,
> >      .iomem_deny_access   = gicv2_iomem_deny_access,
> > +    .do_LPI              = gicv2_do_LPI,
> >  };
> > 
> >  /* Set up the GIC */
> > diff --git a/xen/arch/arm/gic-v3-lpi.c b/xen/arch/arm/gic-v3-lpi.c
> > index 292f2d0..438bbfe 100644
> > --- a/xen/arch/arm/gic-v3-lpi.c
> > +++ b/xen/arch/arm/gic-v3-lpi.c
> > @@ -47,7 +47,6 @@ union host_lpi {
> >      struct {
> >          uint32_t virt_lpi;
> >          uint16_t dom_id;
> > -        uint16_t vcpu_id;
> 
> You don't explain why you remove vcpu_id from host_lpi. This likely require a
> separate patch anyway.
> 
> Also, I would prefer if you make the padding in the structure explicit (i.e
> using pad0).
> 
> >      };
> >  };
> > 
> > @@ -136,6 +135,80 @@ uint64_t gicv3_get_redist_address(unsigned int cpu,
> > bool use_pta)
> >          return per_cpu(lpi_redist, cpu).redist_id << 16;
> >  }
> > 
> > +static void vgic_vcpu_inject_lpi(struct domain *d, unsigned int virq)
> > +{
> > +    struct pending_irq *p = irq_to_pending(d->vcpu[0], virq);
> > +    struct vcpu *v = NULL;
> > +
> > +    if ( !p )
> > +        return;
> > +
> > +    if ( p->lpi_vcpu_id < d->max_vcpus )
> > +        v = d->vcpu[read_atomic(&p->lpi_vcpu_id)];
> 
> Hmmm, what does prevent lpi_vcpu_id to change between the check and the read?

Presumably we are only going to set lpi_vcpu_id to valid values? Meaning
that we do the lpi_vcpu_id checks at the time of setting
lpi_vcpu_id. Thus, even if lpi_vcpu_id changes, it is not a problem. In
fact, if that is true, can we even drop the if ( p->lpi_vcpu_id <
d->max_vcpus ) test here?


> > +
> > +    if ( v )
> 
> v will always be valid if you read d->vcpu[....] and the way you wrote the
> code is very confusing.
> 
> It would be clearer if you do:
> 
> if ( p->lpi_vcpu_id >= d->max_vcpus )
>   return;
> 
> v = ....
> vgic_vcpu_inject_irq(v, irq);

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v10 28/32] ARM: vITS: handle INV command
  2017-05-26 17:35 ` [PATCH v10 28/32] ARM: vITS: handle INV command Andre Przywara
@ 2017-05-30 22:23   ` Stefano Stabellini
  0 siblings, 0 replies; 96+ messages in thread
From: Stefano Stabellini @ 2017-05-30 22:23 UTC (permalink / raw)
  To: Andre Przywara
  Cc: Stefano Stabellini, Vijay Kilari, Vijaya Kumar K, Julien Grall,
	xen-devel, Shanker Donthineni

On Fri, 26 May 2017, Andre Przywara wrote:
> The INV command instructs the ITS to update the configuration data for
> a given LPI by re-reading its entry from the property table.
> We don't need to care so much about the priority value, but enabling
> or disabling an LPI has some effect: We remove or push virtual LPIs
> to their VCPUs, also check the virtual pending bit if an LPI gets enabled.
> 
> Signed-off-by: Andre Przywara <andre.przywara@arm.com>

Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>

> ---
>  xen/arch/arm/vgic-v3-its.c | 70 ++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 70 insertions(+)
> 
> diff --git a/xen/arch/arm/vgic-v3-its.c b/xen/arch/arm/vgic-v3-its.c
> index 773e5fb..14fde83 100644
> --- a/xen/arch/arm/vgic-v3-its.c
> +++ b/xen/arch/arm/vgic-v3-its.c
> @@ -449,6 +449,73 @@ static int update_lpi_property(struct domain *d, struct pending_irq *p)
>      return 0;
>  }
>  
> +/*
> + * Checks whether an LPI that got enabled or disabled needs to change
> + * something in the VGIC (added or removed from the LR or queues).
> + * We don't disable the underlying physical LPI, because this requires
> + * queueing a host LPI command, which we can't afford to do on behalf
> + * of a guest.
> + * Must be called with the VCPU VGIC lock held.
> + */
> +static void update_lpi_vgic_status(struct vcpu *v, struct pending_irq *p)
> +{
> +    ASSERT(spin_is_locked(&v->arch.vgic.lock));
> +
> +    if ( test_bit(GIC_IRQ_GUEST_ENABLED, &p->status) )
> +    {
> +        if ( !list_empty(&p->inflight) &&
> +             !test_bit(GIC_IRQ_GUEST_VISIBLE, &p->status) )
> +            gic_raise_guest_irq(v, p->irq, p->lpi_priority);
> +    }
> +    else
> +        gic_remove_from_lr_pending(v, p);
> +}
> +
> +static int its_handle_inv(struct virt_its *its, uint64_t *cmdptr)
> +{
> +    struct domain *d = its->d;
> +    uint32_t devid = its_cmd_get_deviceid(cmdptr);
> +    uint32_t eventid = its_cmd_get_id(cmdptr);
> +    struct pending_irq *p;
> +    unsigned long flags;
> +    struct vcpu *vcpu;
> +    uint32_t vlpi;
> +    int ret = -1;
> +
> +    spin_lock(&its->its_lock);
> +
> +    /* Translate the event into a vCPU/vLPI pair. */
> +    if ( !read_itte_locked(its, devid, eventid, &vcpu, &vlpi) )
> +        goto out_unlock_its;
> +
> +    if ( vlpi == INVALID_LPI )
> +        goto out_unlock_its;
> +
> +    p = gicv3_its_get_event_pending_irq(d, its->doorbell_address,
> +                                        devid, eventid);
> +    if ( unlikely(!p) )
> +        goto out_unlock_its;
> +
> +    spin_lock_irqsave(&vcpu->arch.vgic.lock, flags);
> +
> +    /* Read the property table and update our cached status. */
> +    if ( update_lpi_property(d, p) )
> +        goto out_unlock;
> +
> +    /* Check whether the LPI needs to go on a VCPU. */
> +    update_lpi_vgic_status(vcpu, p);
> +
> +    ret = 0;
> +
> +out_unlock:
> +    spin_unlock_irqrestore(&vcpu->arch.vgic.lock, flags);
> +
> +out_unlock_its:
> +    spin_unlock(&its->its_lock);
> +
> +    return ret;
> +}
> +
>  /* Must be called with the ITS lock held. */
>  static int its_discard_event(struct virt_its *its,
>                               uint32_t vdevid, uint32_t vevid)
> @@ -788,6 +855,9 @@ static int vgic_its_handle_cmds(struct domain *d, struct virt_its *its)
>          case GITS_CMD_INT:
>              ret = its_handle_int(its, command);
>              break;
> +        case GITS_CMD_INV:
> +            ret = its_handle_inv(its, command);
> +            break;
>          case GITS_CMD_MAPC:
>              ret = its_handle_mapc(its, command);
>              break;
> -- 
> 2.9.0
> 

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v10 26/32] ARM: vITS: handle MOVI command
  2017-05-26 17:35 ` [PATCH v10 26/32] ARM: vITS: handle MOVI command Andre Przywara
@ 2017-05-30 22:35   ` Stefano Stabellini
  2017-05-31 11:23     ` Julien Grall
  0 siblings, 1 reply; 96+ messages in thread
From: Stefano Stabellini @ 2017-05-30 22:35 UTC (permalink / raw)
  To: Andre Przywara
  Cc: Stefano Stabellini, Vijay Kilari, Vijaya Kumar K, Julien Grall,
	xen-devel, Shanker Donthineni

On Fri, 26 May 2017, Andre Przywara wrote:
> The MOVI command moves the interrupt affinity from one redistributor
> (read: VCPU) to another.
> For now migration of "live" LPIs is not yet implemented, but we store
> the changed affinity in our virtual ITTE and the pending_irq.
> 
> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> ---
>  xen/arch/arm/vgic-v3-its.c | 66 ++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 66 insertions(+)
> 
> diff --git a/xen/arch/arm/vgic-v3-its.c b/xen/arch/arm/vgic-v3-its.c
> index c350fa5..3332c09 100644
> --- a/xen/arch/arm/vgic-v3-its.c
> +++ b/xen/arch/arm/vgic-v3-its.c
> @@ -666,6 +666,66 @@ out_remove_mapping:
>      return ret;
>  }
>  
> +static int its_handle_movi(struct virt_its *its, uint64_t *cmdptr)
> +{
> +    uint32_t devid = its_cmd_get_deviceid(cmdptr);
> +    uint32_t eventid = its_cmd_get_id(cmdptr);
> +    uint16_t collid = its_cmd_get_collection(cmdptr);
> +    unsigned long flags;
> +    struct pending_irq *p;
> +    struct vcpu *ovcpu, *nvcpu;
> +    uint32_t vlpi;
> +    int ret = -1;
> +
> +    spin_lock(&its->its_lock);
> +    /* Check for a mapped LPI and get the LPI number. */
> +    if ( !read_itte_locked(its, devid, eventid, &ovcpu, &vlpi) )
> +        goto out_unlock;
> +
> +    if ( vlpi == INVALID_LPI )
> +        goto out_unlock;
> +
> +    /* Check the new collection ID and get the new VCPU pointer */
> +    nvcpu = get_vcpu_from_collection(its, collid);
> +    if ( !nvcpu )
> +        goto out_unlock;
> +
> +    p = gicv3_its_get_event_pending_irq(its->d, its->doorbell_address,
> +                                        devid, eventid);
> +    if ( unlikely(!p) )
> +        goto out_unlock;
> +
> +    /*
> +     * TODO: This relies on the VCPU being correct in the ITS tables.
> +     * This can be fixed by either using a per-IRQ lock or by using
> +     * the VCPU ID from the pending_irq instead.
> +     */
> +    spin_lock_irqsave(&ovcpu->arch.vgic.lock, flags);
> +
> +    /* Update our cached vcpu_id in the pending_irq. */
> +    p->lpi_vcpu_id = nvcpu->vcpu_id;
> +
> +    spin_unlock_irqrestore(&ovcpu->arch.vgic.lock, flags);
> +
> +    /*
> +     * TODO: lookup currently-in-guest virtual IRQs and migrate them,
> +     * as the locking may be fragile otherwise.
> +     * This is not easy to do at the moment, but should become easier
> +     * with the introduction of a per-IRQ lock.
> +     */

Sure, but at least we can handle the inflight-but-not-in-guest case. It
is just a matter of adding (within the arch.vgic.lock locked region):

    if ( !list_empty(&p->lr_queue) )
    {
        gic_remove_irq(old, p);
        clear_bit(GIC_IRQ_GUEST_QUEUED, &p->status);
        list_del_init(&p->lr_queue);
        list_del_init(&p->inflight);

        spin_unlock_irqrestore(&old->arch.vgic.lock, flags);
        vgic_vcpu_inject_irq(new, irq);
    }

That is simple and there are no problems with locking. The problem is
with the other case: !list_empty(&p->inflight) &&
list_empty(&p->lr_queue), which is the one for which you need to keep
this TODO comment.


> +    /* Now store the new collection in the translation table. */
> +    if ( !write_itte_locked(its, devid, eventid, collid, vlpi, &nvcpu) )
> +        goto out_unlock;
> +
> +    ret = 0;
> +
> +out_unlock:
> +    spin_unlock(&its->its_lock);
> +
> +    return ret;
> +}
> +
>  #define ITS_CMD_BUFFER_SIZE(baser)      ((((baser) & 0xff) + 1) << 12)
>  #define ITS_CMD_OFFSET(reg)             ((reg) & GENMASK(19, 5))
>  
> @@ -711,6 +771,12 @@ static int vgic_its_handle_cmds(struct domain *d, struct virt_its *its)
>          case GITS_CMD_MAPTI:
>              ret = its_handle_mapti(its, command);
>              break;
> +        case GITS_CMD_MOVALL:
> +            gdprintk(XENLOG_G_INFO, "vGITS: ignoring MOVALL command\n");
> +            break;
> +        case GITS_CMD_MOVI:
> +            ret = its_handle_movi(its, command);
> +            break;
>          case GITS_CMD_SYNC:
>              /* We handle ITS commands synchronously, so we ignore SYNC. */
>              break;
> -- 
> 2.9.0
> 

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v10 01/32] ARM: vGIC: avoid rank lock when reading priority
  2017-05-30 21:39     ` Stefano Stabellini
@ 2017-05-31 10:42       ` Julien Grall
  2017-06-02 17:44         ` Julien Grall
  0 siblings, 1 reply; 96+ messages in thread
From: Julien Grall @ 2017-05-31 10:42 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Andre Przywara, Vijaya Kumar K, Vijay Kilari, Shanker Donthineni,
	xen-devel

Hi Stefano,

On 30/05/17 22:39, Stefano Stabellini wrote:
> On Tue, 30 May 2017, Julien Grall wrote:
>> Hi Andre,
>>
>> On 26/05/17 18:35, Andre Przywara wrote:
>>> When reading the priority value of a virtual interrupt, we were taking
>>> the respective rank lock so far.
>>> However for forwarded interrupts (Dom0 only so far) this may lead to a
>>> deadlock with the following call chain:
>>> - MMIO access to change the IRQ affinity, calling the ITARGETSR handler
>>> - this handler takes the appropriate rank lock and calls
>>> vgic_store_itargetsr()
>>> - vgic_store_itargetsr() will eventually call vgic_migrate_irq()
>>> - if this IRQ is already in-flight, it will remove it from the old
>>>   VCPU and inject it into the new one, by calling vgic_vcpu_inject_irq()
>>> - vgic_vcpu_inject_irq will call vgic_get_virq_priority()
>>> - vgic_get_virq_priority() tries to take the rank lock - again!
>>> It seems like this code path has never been exercised before.
>>>
>>> Fix this by avoiding taking the lock in vgic_get_virq_priority() (like we
>>> do in vgic_get_target_vcpu()).
>>> Actually we are just reading one byte, and priority changes while
>>> interrupts are handled are a benign race that can happen on real hardware
>>> too. So it looks safe to just use read_atomic() instead.
>>>
>>> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
>>> ---
>>>  xen/arch/arm/vgic.c | 8 +-------
>>>  1 file changed, 1 insertion(+), 7 deletions(-)
>>>
>>> diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
>>> index 83569b0..54b2aad 100644
>>> --- a/xen/arch/arm/vgic.c
>>> +++ b/xen/arch/arm/vgic.c
>>> @@ -227,14 +227,8 @@ struct vcpu *vgic_get_target_vcpu(struct vcpu *v,
>>> unsigned int virq)
>>>  static int vgic_get_virq_priority(struct vcpu *v, unsigned int virq)
>>>  {
>>>      struct vgic_irq_rank *rank = vgic_rank_irq(v, virq);
>>> -    unsigned long flags;
>>> -    int priority;
>>> -
>>> -    vgic_lock_rank(v, rank, flags);
>>> -    priority = rank->priority[virq & INTERRUPT_RANK_MASK];
>>> -    vgic_unlock_rank(v, rank, flags);
>>>
>>> -    return priority;
>>> +    return read_atomic(&rank->priority[virq & INTERRUPT_RANK_MASK]);
>>
>> The write in rank->priority will not be atomic (see vgic_reg_update
>> implementation): the register is first masked, then the priority is set.
>>
>> So you may end up reading 0 (which is the highest priority) by mistake.
>>
>> We should probably think to make vgic_reg_* helper atomic.
>
> Right! That's why I wrote
> alpine.DEB.2.10.1705231135130.18759@sstabellini-ThinkPad-X260

It was not obvious from this e-mail why you wanted this. It looked more 
like a request for improvement than a potential bug. Hence my 
suggestion to delay it.

Cheers,

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v10 03/32] ARM: vGIC: move irq_to_pending() calls under the VGIC VCPU lock
  2017-05-30 21:46     ` Stefano Stabellini
@ 2017-05-31 10:44       ` Julien Grall
  0 siblings, 0 replies; 96+ messages in thread
From: Julien Grall @ 2017-05-31 10:44 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Andre Przywara, Vijaya Kumar K, Vijay Kilari, Shanker Donthineni,
	xen-devel

Hi Stefano,

On 30/05/17 22:46, Stefano Stabellini wrote:
> On Tue, 30 May 2017, Julien Grall wrote
>> On 26/05/17 18:35, Andre Przywara wrote:
>> TBH, I am not a big fan of spreading the mess of vGIC locking when we are
>> going to rework the vGIC and know that this code will not be called for LPIs.
>
> I asked for this in
> alpine.DEB.2.10.1705191729560.18759@sstabellini-ThinkPad-X260, this way
> we covered all basis. The double lock is bad, but the rest of the
> changes look OK to me.

It is just adding more churn that will have to be reworked when fixing 
the vGIC. Anyway, I am not going to argue about that.

Cheers,

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v10 11/32] ARM: GICv3: forward pending LPIs to guests
  2017-05-30 22:07     ` Stefano Stabellini
@ 2017-05-31 11:09       ` Julien Grall
  2017-05-31 17:56         ` Stefano Stabellini
  0 siblings, 1 reply; 96+ messages in thread
From: Julien Grall @ 2017-05-31 11:09 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Andre Przywara, Vijaya Kumar K, Vijay Kilari, Shanker Donthineni,
	xen-devel

Hi Stefano,

On 30/05/17 23:07, Stefano Stabellini wrote:
> On Tue, 30 May 2017, Julien Grall wrote:
>>>      };
>>>  };
>>>
>>> @@ -136,6 +135,80 @@ uint64_t gicv3_get_redist_address(unsigned int cpu,
>>> bool use_pta)
>>>          return per_cpu(lpi_redist, cpu).redist_id << 16;
>>>  }
>>>
>>> +static void vgic_vcpu_inject_lpi(struct domain *d, unsigned int virq)
>>> +{
>>> +    struct pending_irq *p = irq_to_pending(d->vcpu[0], virq);
>>> +    struct vcpu *v = NULL;
>>> +
>>> +    if ( !p )
>>> +        return;
>>> +
>>> +    if ( p->lpi_vcpu_id < d->max_vcpus )
>>> +        v = d->vcpu[read_atomic(&p->lpi_vcpu_id)];
>>
>> Hmmm, what does prevent lpi_vcpu_id to change between the check and the read?
>
> Supposedly we are going to set lpi_vcpu_id only to good values? Meaning
> that we are going to do the lpi_vcpu_id checks at the time of setting
> lpi_vcpu_id. Thus, even if lpi_vcpu_id changes, it is not a problem. In
> fact, if that is true, can we even drop the if ( p->lpi_vcpu_id <
> d->max_vcpus ) test here?

I am not confident enough to say lpi_vcpu_id will always be valid with the 
current locking in the vGIC.

There is a potential race between its_discard_event and this function: 
the former may reset the pending_irq whilst the latter is reading 
lpi_vcpu_id, as we cannot take the vCPU lock yet.

But all of this is racy anyway because of the locking. This will get 
solved by the vGIC rework after the merge.

So for the time being I would keep the check. We can revisit it later if 
necessary.

Cheers,

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v10 26/32] ARM: vITS: handle MOVI command
  2017-05-30 22:35   ` Stefano Stabellini
@ 2017-05-31 11:23     ` Julien Grall
  2017-05-31 17:53       ` Stefano Stabellini
  0 siblings, 1 reply; 96+ messages in thread
From: Julien Grall @ 2017-05-31 11:23 UTC (permalink / raw)
  To: Stefano Stabellini, Andre Przywara
  Cc: xen-devel, Vijaya Kumar K, Vijay Kilari, Shanker Donthineni

Hi Stefano,

On 30/05/17 23:35, Stefano Stabellini wrote:
>> +    /*
>> +     * TODO: lookup currently-in-guest virtual IRQs and migrate them,
>> +     * as the locking may be fragile otherwise.
>> +     * This is not easy to do at the moment, but should become easier
>> +     * with the introduction of a per-IRQ lock.
>> +     */
>
> Sure but at least we can handle the inflight, but not in guest, case. It
> is just a matter of adding (withing the arch.vgic.lock locked region):
>
>     if ( !list_empty(&p->lr_queue) )
>     {
>         gic_remove_irq(old, p);
>         clear_bit(GIC_IRQ_GUEST_QUEUED, &p->status);
>         list_del_init(&p->lr_queue);
>         list_del_init(&p->inflight);
>
>         spin_unlock_irqrestore(&old->arch.vgic.lock, flags);
>         vgic_vcpu_inject_irq(new, irq);
>     }

Please no more hardcoding of migration code. If we are going to support 
migration we should rework the current function vgic_migrate_irq to 
support LPIs.

But ... I don't see any drawback in not supporting this today. Per the 
specification, when you migrate an interrupt the only thing you have to 
ensure is that the pending interrupt fires only once, either on the old or the new vCPU.
vCPU.

IHMO, if the interrupt has already been queued then it is too late. We 
should aim at simplifying the code if there are no drawback to do it. In 
this case, what would be the drawback to leave pending on the old vCPU?

Cheers,

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v10 26/32] ARM: vITS: handle MOVI command
  2017-05-31 11:23     ` Julien Grall
@ 2017-05-31 17:53       ` Stefano Stabellini
  2017-05-31 18:49         ` Julien Grall
  0 siblings, 1 reply; 96+ messages in thread
From: Stefano Stabellini @ 2017-05-31 17:53 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, Vijay Kilari, Andre Przywara, Vijaya Kumar K,
	xen-devel, Shanker Donthineni

On Wed, 31 May 2017, Julien Grall wrote:
> Hi Stefano,
> 
> On 30/05/17 23:35, Stefano Stabellini wrote:
> > > +    /*
> > > +     * TODO: lookup currently-in-guest virtual IRQs and migrate them,
> > > +     * as the locking may be fragile otherwise.
> > > +     * This is not easy to do at the moment, but should become easier
> > > +     * with the introduction of a per-IRQ lock.
> > > +     */
> > 
> > Sure but at least we can handle the inflight, but not in guest, case. It
> > is just a matter of adding (withing the arch.vgic.lock locked region):
> > 
> >     if ( !list_empty(&p->lr_queue) )
> >     {
> >         gic_remove_irq(old, p);
> >         clear_bit(GIC_IRQ_GUEST_QUEUED, &p->status);
> >         list_del_init(&p->lr_queue);
> >         list_del_init(&p->inflight);
> > 
> >         spin_unlock_irqrestore(&old->arch.vgic.lock, flags);
> >         vgic_vcpu_inject_irq(new, irq);
> >     }
> 
> Please no more hardcoding of migration code. If we are going to support
> migration we should rework the current function vgic_migrate_irq to support
> LPIs.

I agree, it is true; in fact, that is what I suggested last
time. My preference would be to just call vgic_migrate_irq, providing an
empty implementation of irq_set_affinity for LPIs.
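
A hypothetical sketch of that idea (the hook name and placement are made up 
here, only to show the shape of it):

    /* In the LPI hw_irq_controller: the affinity of a physical LPI is not
     * changed here, it follows the ITS collection mapping instead, so
     * vgic_migrate_irq() could simply be reused for virtual LPIs. */
    static void gicv3_lpi_set_affinity(struct irq_desc *desc,
                                       const cpumask_t *mask)
    {
        /* Nothing to do. */
    }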


> But ... I don't see any drawback to not support this today. Per the
> specification, when you migrate an interrupt the only things you have to
> ensure if the pending interrupt only fire once either on the old or new vCPU.
> 
> IHMO, if the interrupt has already been queued then it is too late. We should
> aim at simplifying the code if there are no drawback to do it. In this case,
> what would be the drawback to leave pending on the old vCPU?

It is inconsistent with what we do elsewhere (vgic_migrate_irq). It is
also inconsistent with the TODO comment (!list_empty(&p->lr_queue)
interrupts are not yet currently-in-guest).

I don't want to introduce any more hardcoding, I just would like the
existing vgic_migrate_irq to be called.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v10 11/32] ARM: GICv3: forward pending LPIs to guests
  2017-05-31 11:09       ` Julien Grall
@ 2017-05-31 17:56         ` Stefano Stabellini
  2017-05-31 18:39           ` Julien Grall
  0 siblings, 1 reply; 96+ messages in thread
From: Stefano Stabellini @ 2017-05-31 17:56 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, Vijay Kilari, Andre Przywara, Vijaya Kumar K,
	xen-devel, Shanker Donthineni

On Wed, 31 May 2017, Julien Grall wrote:
> Hi Stefano,
> 
> On 30/05/17 23:07, Stefano Stabellini wrote:
> > On Tue, 30 May 2017, Julien Grall wrote:
> > > >      };
> > > >  };
> > > > 
> > > > @@ -136,6 +135,80 @@ uint64_t gicv3_get_redist_address(unsigned int cpu,
> > > > bool use_pta)
> > > >          return per_cpu(lpi_redist, cpu).redist_id << 16;
> > > >  }
> > > > 
> > > > +static void vgic_vcpu_inject_lpi(struct domain *d, unsigned int virq)
> > > > +{
> > > > +    struct pending_irq *p = irq_to_pending(d->vcpu[0], virq);
> > > > +    struct vcpu *v = NULL;
> > > > +
> > > > +    if ( !p )
> > > > +        return;
> > > > +
> > > > +    if ( p->lpi_vcpu_id < d->max_vcpus )
> > > > +        v = d->vcpu[read_atomic(&p->lpi_vcpu_id)];
> > > 
> > > Hmmm, what does prevent lpi_vcpu_id to change between the check and the
> > > read?
> > 
> > Supposedly we are going to set lpi_vcpu_id only to good values? Meaning
> > that we are going to do the lpi_vcpu_id checks at the time of setting
> > lpi_vcpu_id. Thus, even if lpi_vcpu_id changes, it is not a problem. In
> > fact, if that is true, can we even drop the if ( p->lpi_vcpu_id <
> > d->max_vcpus ) test here?
> 
> I am not too confident to say lpi_vcpu_id will always be valid with the
> current locking in the vGIC.

Fair enough, but I wouldn't solve a bug with unnecessary code that will soon
be removed. The problem lies elsewhere and should be fixed elsewhere. We
could even add an ASSERT(p->lpi_vcpu_id < d->max_vcpus) here; would
that work for you?


> There is a potential race between its_discard_event and this function. The
> former may reset pending_irq whilst reading lpi_vcpu_id as we cannot take the
> vCPU lock yet.
> 
> But all of this is racy anyway because of the locking. This will get solved by
> the vGIC rework after the merge.
> 
> So For the time being I would keep the check. We can revisit it later if
> necessary.

This is one of those things that are unimportant because they work
either way.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v10 11/32] ARM: GICv3: forward pending LPIs to guests
  2017-05-31 17:56         ` Stefano Stabellini
@ 2017-05-31 18:39           ` Julien Grall
  0 siblings, 0 replies; 96+ messages in thread
From: Julien Grall @ 2017-05-31 18:39 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Vijay Kilari, Andre Przywara, Vijaya Kumar K, xen-devel, nd,
	Shanker Donthineni

Hi Stefano,

On 31/05/2017 18:56, Stefano Stabellini wrote:
> On Wed, 31 May 2017, Julien Grall wrote:
>> Hi Stefano,
>>
>> On 30/05/17 23:07, Stefano Stabellini wrote:
>>> On Tue, 30 May 2017, Julien Grall wrote:
>>>>>      };
>>>>>  };
>>>>>
>>>>> @@ -136,6 +135,80 @@ uint64_t gicv3_get_redist_address(unsigned int cpu,
>>>>> bool use_pta)
>>>>>          return per_cpu(lpi_redist, cpu).redist_id << 16;
>>>>>  }
>>>>>
>>>>> +static void vgic_vcpu_inject_lpi(struct domain *d, unsigned int virq)
>>>>> +{
>>>>> +    struct pending_irq *p = irq_to_pending(d->vcpu[0], virq);
>>>>> +    struct vcpu *v = NULL;
>>>>> +
>>>>> +    if ( !p )
>>>>> +        return;
>>>>> +
>>>>> +    if ( p->lpi_vcpu_id < d->max_vcpus )
>>>>> +        v = d->vcpu[read_atomic(&p->lpi_vcpu_id)];
>>>>
>>>> Hmmm, what does prevent lpi_vcpu_id to change between the check and the
>>>> read?
>>>
>>> Supposedly we are going to set lpi_vcpu_id only to good values? Meaning
>>> that we are going to do the lpi_vcpu_id checks at the time of setting
>>> lpi_vcpu_id. Thus, even if lpi_vcpu_id changes, it is not a problem. In
>>> fact, if that is true, can we even drop the if ( p->lpi_vcpu_id <
>>> d->max_vcpus ) test here?
>>
>> I am not too confident to say lpi_vcpu_id will always be valid with the
>> current locking in the vGIC.
>
> Fair enough, but I wouldn't solve a bug with unnecessary code soon to be
> removed. The problem lies elsewhere and should be fixed elsewhere. We
> could even add an ASSERT(p->lpi_vcpu_id < d->max_vcpus) here, would
> that work for you?

The problem lies in this code because we don't have per-IRQ locking. The 
ASSERT is not a solution to make sure a race does not happen (it could 
easily happen if the LPI is pending whilst the guest is discarding it).

If you add an ASSERT you will potentially get your platform crashing 
from time to time on discard. So I don't think the ASSERT is acceptable here.

>
>
>> There is a potential race between its_discard_event and this function. The
>> former may reset pending_irq whilst reading lpi_vcpu_id as we cannot take the
>> vCPU lock yet.
>>
>> But all of this is racy anyway because of the locking. This will get solved by
>> the vGIC rework after the merge.
>>
>> So For the time being I would keep the check. We can revisit it later if
>> necessary.
>
> This is one of those things that are unimportant because they work
> either way.

I am sorry, but it matters when you can easily prevent a crash and limit 
the race.

Cheers,

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v10 26/32] ARM: vITS: handle MOVI command
  2017-05-31 17:53       ` Stefano Stabellini
@ 2017-05-31 18:49         ` Julien Grall
  2017-06-02 17:17           ` Julien Grall
  0 siblings, 1 reply; 96+ messages in thread
From: Julien Grall @ 2017-05-31 18:49 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Vijay Kilari, Andre Przywara, Vijaya Kumar K, xen-devel, nd,
	Shanker Donthineni

Hi Stefano,

On 31/05/2017 18:53, Stefano Stabellini wrote:
> On Wed, 31 May 2017, Julien Grall wrote:
>> Hi Stefano,
>>
>> On 30/05/17 23:35, Stefano Stabellini wrote:
>>>> +    /*
>>>> +     * TODO: lookup currently-in-guest virtual IRQs and migrate them,
>>>> +     * as the locking may be fragile otherwise.
>>>> +     * This is not easy to do at the moment, but should become easier
>>>> +     * with the introduction of a per-IRQ lock.
>>>> +     */
>>>
>>> Sure but at least we can handle the inflight, but not in guest, case. It
>>> is just a matter of adding (withing the arch.vgic.lock locked region):
>>>
>>>     if ( !list_empty(&p->lr_queue) )
>>>     {
>>>         gic_remove_irq(old, p);
>>>         clear_bit(GIC_IRQ_GUEST_QUEUED, &p->status);
>>>         list_del_init(&p->lr_queue);
>>>         list_del_init(&p->inflight);
>>>
>>>         spin_unlock_irqrestore(&old->arch.vgic.lock, flags);
>>>         vgic_vcpu_inject_irq(new, irq);
>>>     }
>>
>> Please no more hardcoding of migration code. If we are going to support
>> migration we should rework the current function vgic_migrate_irq to support
>> LPIs.
>
> I agree, it is true, in fact that is what I actually suggested last
> time. My preference would be to just call vgic_migrate_irq, providing an
> empty implementation of irq_set_affinity for LPIs.
>
>
>> But ... I don't see any drawback to not supporting this today. Per the
>> specification, when you migrate an interrupt the only thing you have to
>> ensure is that the pending interrupt fires only once, either on the old or the new vCPU.
>>
>> IMHO, if the interrupt has already been queued then it is too late. We should
>> aim at simplifying the code if there is no drawback to doing so. In this case,
>> what would be the drawback of leaving it pending on the old vCPU?
>
> It is inconsistent with what we do elsewhere (vgic_migrate_irq).

So? Just because the current code does something does not mean we should 
keep the same behavior here, which BTW cannot be noticed by a guest anyway.

> It is
> also inconsistent with the TODO comment (!list_empty(&p->lr_queue)
> interrupts are not yet currently-in-guest).

A TODO can easily be updated.

>
> I don't want to introduce any more hardcoding, I just would like the
> existing vgic_migrate_irq to be called.

It is going to need a bit of rework to get it working with LPIs, as the 
code is currently gated on (p->desc). For what benefit?

Not much, as the current code is already working well on migration...
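
(For the record, and purely as a hypothetical sketch rather than a real
patch: the rework would roughly mean no longer bailing out early for
interrupts without a physical irq_desc, and instead keeping only the
hardware affinity update conditional, along the lines of

    /* LPIs have no irq_desc behind them, so only the queue handling
     * would run for them; the hardware side stays untouched. */
    if ( p->desc )
        irq_set_affinity(p->desc, cpumask_of(new->processor));

which is more churn than this case warrants right now.)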

Cheers,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v10 17/32] ARM: vITS: add command handling stub and MMIO emulation
  2017-05-26 17:35 ` [PATCH v10 17/32] ARM: vITS: add command handling stub and MMIO emulation Andre Przywara
@ 2017-06-01 18:13   ` Julien Grall
  2017-06-08  9:57   ` Julien Grall
  1 sibling, 0 replies; 96+ messages in thread
From: Julien Grall @ 2017-06-01 18:13 UTC (permalink / raw)
  To: Andre Przywara, Stefano Stabellini
  Cc: xen-devel, Vijaya Kumar K, Vijay Kilari, Shanker Donthineni

Hi Andre,

On 26/05/17 18:35, Andre Przywara wrote:
> Emulate the memory mapped ITS registers and provide a stub to introduce
> the ITS command handling framework (but without actually emulating any
> commands at this time).
> This fixes a misnomer in our virtual ITS structure, where the spec is
> confusingly using ID_bits in GITS_TYPER to denote the number of event IDs
> (in contrast to GICD_TYPER, where it means number of LPIs).
>
> Signed-off-by: Andre Przywara <andre.przywara@arm.com>

Acked-by: Julien Grall <julien.grall@arm.com>

Cheers,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v10 18/32] ARM: vITS: introduce translation table walks
  2017-05-26 17:35 ` [PATCH v10 18/32] ARM: vITS: introduce translation table walks Andre Przywara
@ 2017-06-02 16:25   ` Julien Grall
  2017-06-08  9:35   ` Julien Grall
  1 sibling, 0 replies; 96+ messages in thread
From: Julien Grall @ 2017-06-02 16:25 UTC (permalink / raw)
  To: Andre Przywara, Stefano Stabellini
  Cc: xen-devel, Vijaya Kumar K, Vijay Kilari, Shanker Donthineni

Hi Andre,

On 05/26/2017 06:35 PM, Andre Przywara wrote:
> The ITS stores the target (v)CPU and the (virtual) LPI number in tables.
> Introduce functions to walk those tables and translate a device ID /
> event ID pair into a pair of virtual LPI and vCPU.
> We map those tables on demand - which is cheap on arm64 - and copy the
> respective entries before using them, to avoid the guest tampering with
> them meanwhile.
> 
> To allow compiling without warnings, we declare two functions as
> non-static for the moment, which two later patches will fix.
> 
> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> ---
>   xen/arch/arm/vgic-v3-its.c | 167 +++++++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 167 insertions(+)
> 
> diff --git a/xen/arch/arm/vgic-v3-its.c b/xen/arch/arm/vgic-v3-its.c
> index 2c7f22f..8f4a9e1 100644
> --- a/xen/arch/arm/vgic-v3-its.c
> +++ b/xen/arch/arm/vgic-v3-its.c
> @@ -96,9 +96,176 @@ typedef uint64_t dev_table_entry_t;
>   #define DEV_TABLE_ENTRY(addr, bits)                     \
>           (((addr) & GENMASK(51, 8)) | (((bits) - 1) & GENMASK(4, 0)))
>   
> +#define UNMAPPED_COLLECTION      ((coll_table_entry_t)~0)

Again, this should be defined just after/before the typedef 
coll_table_entry_t.

With that fixed:

Acked-by: Julien Grall <julien.grall@arm.com>

Cheers,


>   #define GITS_BASER_RO_MASK       (GITS_BASER_TYPE_MASK | \
>                                     (0x1fL << GITS_BASER_ENTRY_SIZE_SHIFT))
>   
> +/*
> + * The physical address is encoded slightly differently depending on
> + * the used page size: the highest four bits are stored in the lowest
> + * four bits of the field for 64K pages.
> + */
> +static paddr_t get_baser_phys_addr(uint64_t reg)
> +{
> +    if ( reg & BIT(9) )
> +        return (reg & GENMASK(47, 16)) |
> +                ((reg & GENMASK(15, 12)) << 36);
> +    else
> +        return reg & GENMASK(47, 12);
> +}
> +
> +/* Must be called with the ITS lock held. */
> +static struct vcpu *get_vcpu_from_collection(struct virt_its *its,
> +                                             uint16_t collid)
> +{
> +    paddr_t addr = get_baser_phys_addr(its->baser_coll);
> +    coll_table_entry_t vcpu_id;
> +    int ret;
> +
> +    ASSERT(spin_is_locked(&its->its_lock));
> +
> +    if ( collid >= its->max_collections )
> +        return NULL;
> +
> +    ret = vgic_access_guest_memory(its->d,
> +                                   addr + collid * sizeof(coll_table_entry_t),
> +                                   &vcpu_id, sizeof(coll_table_entry_t), false);
> +    if ( ret )
> +        return NULL;
> +
> +    if ( vcpu_id == UNMAPPED_COLLECTION || vcpu_id >= its->d->max_vcpus )
> +        return NULL;
> +
> +    return its->d->vcpu[vcpu_id];
> +}
> +
> +/*
> + * Lookup the address of the Interrupt Translation Table associated with
> + * that device ID.
> + * TODO: add support for walking indirect tables.
> + */
> +static int its_get_itt(struct virt_its *its, uint32_t devid,
> +                       dev_table_entry_t *itt)
> +{
> +    paddr_t addr = get_baser_phys_addr(its->baser_dev);
> +
> +    if ( devid >= its->max_devices )
> +        return -EINVAL;
> +
> +    return vgic_access_guest_memory(its->d,
> +                                    addr + devid * sizeof(dev_table_entry_t),
> +                                    itt, sizeof(*itt), false);
> +}
> +
> +/*
> + * Lookup the address of the Interrupt Translation Table associated with
> + * a device ID and return the address of the ITTE belonging to the event ID
> + * (which is an index into that table).
> + */
> +static paddr_t its_get_itte_address(struct virt_its *its,
> +                                    uint32_t devid, uint32_t evid)
> +{
> +    dev_table_entry_t itt;
> +    int ret;
> +
> +    ret = its_get_itt(its, devid, &itt);
> +    if ( ret )
> +        return INVALID_PADDR;
> +
> +    if ( evid >= DEV_TABLE_ITT_SIZE(itt) ||
> +         DEV_TABLE_ITT_ADDR(itt) == INVALID_PADDR )
> +        return INVALID_PADDR;
> +
> +    return DEV_TABLE_ITT_ADDR(itt) + evid * sizeof(struct vits_itte);
> +}
> +
> +/*
> + * Queries the collection and device tables to get the vCPU and virtual
> + * LPI number for a given guest event. This first accesses the guest memory
> + * to resolve the address of the ITTE, then reads the ITTE entry at this
> + * address and puts the result in vcpu_ptr and vlpi_ptr.
> + * Must be called with the ITS lock held.
> + */
> +static bool read_itte_locked(struct virt_its *its, uint32_t devid,
> +                             uint32_t evid, struct vcpu **vcpu_ptr,
> +                             uint32_t *vlpi_ptr)
> +{
> +    paddr_t addr;
> +    struct vits_itte itte;
> +    struct vcpu *vcpu;
> +
> +    ASSERT(spin_is_locked(&its->its_lock));
> +
> +    addr = its_get_itte_address(its, devid, evid);
> +    if ( addr == INVALID_PADDR )
> +        return false;
> +
> +    if ( vgic_access_guest_memory(its->d, addr, &itte, sizeof(itte), false) )
> +        return false;
> +
> +    vcpu = get_vcpu_from_collection(its, itte.collection);
> +    if ( !vcpu )
> +        return false;
> +
> +    *vcpu_ptr = vcpu;
> +    *vlpi_ptr = itte.vlpi;
> +    return true;
> +}
> +
> +/*
> + * This function takes care of the locking by taking the its_lock itself, so
> + * a caller shall not hold this. Before returning, the lock is dropped again.
> + */
> +bool read_itte(struct virt_its *its, uint32_t devid, uint32_t evid,
> +               struct vcpu **vcpu_ptr, uint32_t *vlpi_ptr)
> +{
> +    bool ret;
> +
> +    spin_lock(&its->its_lock);
> +    ret = read_itte_locked(its, devid, evid, vcpu_ptr, vlpi_ptr);
> +    spin_unlock(&its->its_lock);
> +
> +    return ret;
> +}
> +
> +/*
> + * Queries the collection and device tables to translate the device ID and
> + * event ID and find the appropriate ITTE. The given collection ID and the
> + * virtual LPI number are then stored into that entry.
> + * If vcpu_ptr is provided, returns the VCPU belonging to that collection.
> + * Must be called with the ITS lock held.
> + */
> +bool write_itte_locked(struct virt_its *its, uint32_t devid,
> +                       uint32_t evid, uint32_t collid, uint32_t vlpi,
> +                       struct vcpu **vcpu_ptr)
> +{
> +    paddr_t addr;
> +    struct vits_itte itte;
> +
> +    ASSERT(spin_is_locked(&its->its_lock));
> +
> +    if ( collid >= its->max_collections )
> +        return false;
> +
> +    if ( vlpi >= its->d->arch.vgic.nr_lpis )
> +        return false;
> +
> +    addr = its_get_itte_address(its, devid, evid);
> +    if ( addr == INVALID_PADDR )
> +        return false;
> +
> +    itte.collection = collid;
> +    itte.vlpi = vlpi;
> +
> +    if ( vgic_access_guest_memory(its->d, addr, &itte, sizeof(itte), true) )
> +        return false;
> +
> +    if ( vcpu_ptr )
> +        *vcpu_ptr = get_vcpu_from_collection(its, collid);
> +
> +    return true;
> +}
> +
>   /**************************************
>    * Functions that handle ITS commands *
>    **************************************/
> 

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v10 19/32] ARM: vITS: provide access to struct pending_irq
  2017-05-26 17:35 ` [PATCH v10 19/32] ARM: vITS: provide access to struct pending_irq Andre Przywara
@ 2017-06-02 16:32   ` Julien Grall
  2017-06-02 16:45     ` Julien Grall
  2017-06-06 10:19     ` Andre Przywara
  0 siblings, 2 replies; 96+ messages in thread
From: Julien Grall @ 2017-06-02 16:32 UTC (permalink / raw)
  To: Andre Przywara, Stefano Stabellini
  Cc: xen-devel, Vijaya Kumar K, Vijay Kilari, Shanker Donthineni

Hi Andre,

On 05/26/2017 06:35 PM, Andre Przywara wrote:
> For each device we allocate one struct pending_irq for each virtual
> event (MSI).
> Provide a helper function which returns the pointer to the appropriate
> struct, to be able to find the right struct when given a virtual
> deviceID/eventID pair.
> 
> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> ---
>   xen/arch/arm/gic-v3-its.c        | 59 ++++++++++++++++++++++++++++++++++++++++
>   xen/include/asm-arm/gic_v3_its.h |  4 +++
>   2 files changed, 63 insertions(+)
> 
> diff --git a/xen/arch/arm/gic-v3-its.c b/xen/arch/arm/gic-v3-its.c
> index aebc257..38f0840 100644
> --- a/xen/arch/arm/gic-v3-its.c
> +++ b/xen/arch/arm/gic-v3-its.c
> @@ -800,6 +800,65 @@ out:
>       return ret;
>   }
>   
> +/* Must be called with the its_device_lock held. */
> +static struct its_device *get_its_device(struct domain *d, paddr_t vdoorbell,
> +                                         uint32_t vdevid)
> +{
> +    struct rb_node *node = d->arch.vgic.its_devices.rb_node;
> +    struct its_device *dev;
> +
> +    ASSERT(spin_is_locked(&d->arch.vgic.its_devices_lock));
> +
> +    while (node)
> +    {
> +        int cmp;
> +
> +        dev = rb_entry(node, struct its_device, rbnode);
> +        cmp = compare_its_guest_devices(dev, vdoorbell, vdevid);
> +
> +        if ( !cmp )
> +            return dev;
> +
> +        if ( cmp > 0 )
> +            node = node->rb_left;
> +        else
> +            node = node->rb_right;
> +    }
> +
> +    return NULL;
> +}
> +
> +static struct pending_irq *get_event_pending_irq(struct domain *d,
> +                                                 paddr_t vdoorbell_address,
> +                                                 uint32_t vdevid,
> +                                                 uint32_t eventid,
> +                                                 uint32_t *host_lpi)
> +{
> +    struct its_device *dev;
> +    struct pending_irq *pirq = NULL;
> +
> +    spin_lock(&d->arch.vgic.its_devices_lock);
> +    dev = get_its_device(d, vdoorbell_address, vdevid);
> +    if ( dev && eventid < dev->eventids )
> +    {
> +        pirq = &dev->pend_irqs[eventid];
> +        if ( host_lpi )
> +            *host_lpi = dev->host_lpi_blocks[eventid / LPI_BLOCK] +
> +                        (eventid % LPI_BLOCK);
> +    }
> +    spin_unlock(&d->arch.vgic.its_devices_lock);
> +
> +    return pirq;
> +}
> +
> +struct pending_irq *gicv3_its_get_event_pending_irq(struct domain *d,
> +                                                    paddr_t vdoorbell_address,
> +                                                    uint32_t vdevid,
> +                                                    uint32_t eventid)
> +{

It is quite rude to ignore my question:

"So you never envision someone requiring the host LPI even for debug 
purpose?

AFAICT, there are no other way to get the host LPI if necessary. It 
really does not hurt to expose it and provide a wrapper.

As you may know I am all in favor of more helpers over the cost of one 
unconditional branch (see the callback example) when it results to a 
better code design.

But here it is not about code design, it is more about what kind of 
information would you need outside (see above)."
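
To make it concrete, something as small as this (hypothetical name, just
to illustrate what "expose it and provide a wrapper" would mean) would
already do:

    struct pending_irq *gicv3_its_get_event_pending_irq_lpi(struct domain *d,
                                                             paddr_t vdoorbell_address,
                                                             uint32_t vdevid,
                                                             uint32_t eventid,
                                                             uint32_t *host_lpi)
    {
        /* Same lookup as above, but also hand the host LPI back. */
        return get_event_pending_irq(d, vdoorbell_address, vdevid, eventid,
                                     host_lpi);
    }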


> +    return get_event_pending_irq(d, vdoorbell_address, vdevid, eventid, NULL);
> +}
> +
>   /* Scan the DT for any ITS nodes and create a list of host ITSes out of it. */
>   void gicv3_its_dt_init(const struct dt_device_node *node)
>   {
> diff --git a/xen/include/asm-arm/gic_v3_its.h b/xen/include/asm-arm/gic_v3_its.h
> index 40f4ef5..d162e89 100644
> --- a/xen/include/asm-arm/gic_v3_its.h
> +++ b/xen/include/asm-arm/gic_v3_its.h
> @@ -169,6 +169,10 @@ int gicv3_its_map_guest_device(struct domain *d,
>   int gicv3_allocate_host_lpi_block(struct domain *d, uint32_t *first_lpi);
>   void gicv3_free_host_lpi_block(uint32_t first_lpi);
>   
> +struct pending_irq *gicv3_its_get_event_pending_irq(struct domain *d,
> +                                                    paddr_t vdoorbell_address,
> +                                                    uint32_t vdevid,
> +                                                    uint32_t veventid);
>   #else
>   
>   static inline void gicv3_its_dt_init(const struct dt_device_node *node)
> 

Cheers,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v10 20/32] ARM: vITS: handle INT command
  2017-05-26 17:35 ` [PATCH v10 20/32] ARM: vITS: handle INT command Andre Przywara
@ 2017-06-02 16:37   ` Julien Grall
  0 siblings, 0 replies; 96+ messages in thread
From: Julien Grall @ 2017-06-02 16:37 UTC (permalink / raw)
  To: Andre Przywara, Stefano Stabellini
  Cc: xen-devel, Vijaya Kumar K, Vijay Kilari, Shanker Donthineni

Hi Andre,

On 05/26/2017 06:35 PM, Andre Przywara wrote:
> The INT command sets a given LPI identified by a DeviceID/EventID pair
> as pending and thus triggers it to be injected.
> 
> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> ---
>   xen/arch/arm/vgic-v3-its.c | 21 +++++++++++++++++++++
>   1 file changed, 21 insertions(+)
> 
> diff --git a/xen/arch/arm/vgic-v3-its.c b/xen/arch/arm/vgic-v3-its.c
> index 8f4a9e1..80a4625 100644
> --- a/xen/arch/arm/vgic-v3-its.c
> +++ b/xen/arch/arm/vgic-v3-its.c
> @@ -286,6 +286,24 @@ static uint64_t its_cmd_mask_field(uint64_t *its_cmd, unsigned int word,
>   #define its_cmd_get_validbit(cmd)       its_cmd_mask_field(cmd, 2, 63,  1)
>   #define its_cmd_get_ittaddr(cmd)        (its_cmd_mask_field(cmd, 2, 8, 44) << 8)
>   
> +static int its_handle_int(struct virt_its *its, uint64_t *cmdptr)
> +{
> +    uint32_t devid = its_cmd_get_deviceid(cmdptr);
> +    uint32_t eventid = its_cmd_get_id(cmdptr);
> +    struct vcpu *vcpu;
> +    uint32_t vlpi;
> +
> +    if ( !read_itte(its, devid, eventid, &vcpu, &vlpi) )
> +        return -1;
> +
> +    if ( vlpi == INVALID_LPI )
> +        return -1;
> +
> +    vgic_vcpu_inject_irq(vcpu, vlpi);

Should you not call the newly introduced vgic_vcpu_inject_lpi here? This 
would avoid relying on the vcpu in the ITTE.
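
I.e. (a sketch of what I mean, reusing the helper from the earlier patch):

    /* Resolve the target vCPU from the pending_irq instead of trusting
     * the vcpu read back from guest memory. */
    vgic_vcpu_inject_lpi(its->d, vlpi);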

But I guess this is not strictly important if the table is protected. So:

Reviewed-by: Julien Grall <julien.grall@arm.com>

> +
> +    return 0;
> +}
> +
>   #define ITS_CMD_BUFFER_SIZE(baser)      ((((baser) & 0xff) + 1) << 12)
>   #define ITS_CMD_OFFSET(reg)             ((reg) & GENMASK(19, 5))
>   
> @@ -315,6 +333,9 @@ static int vgic_its_handle_cmds(struct domain *d, struct virt_its *its)
>   
>           switch ( its_cmd_get_command(command) )
>           {
> +        case GITS_CMD_INT:
> +            ret = its_handle_int(its, command);
> +            break;
>           case GITS_CMD_SYNC:
>               /* We handle ITS commands synchronously, so we ignore SYNC. */
>               break;
> 

Cheers,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v10 22/32] ARM: vITS: handle CLEAR command
  2017-05-26 17:35 ` [PATCH v10 22/32] ARM: vITS: handle CLEAR command Andre Przywara
@ 2017-06-02 16:40   ` Julien Grall
  0 siblings, 0 replies; 96+ messages in thread
From: Julien Grall @ 2017-06-02 16:40 UTC (permalink / raw)
  To: Andre Przywara, Stefano Stabellini
  Cc: xen-devel, Vijaya Kumar K, Vijay Kilari, Shanker Donthineni

Hi Andre,

On 05/26/2017 06:35 PM, Andre Przywara wrote:
> This introduces the ITS command handler for the CLEAR command, which
> clears the pending state of an LPI.
> This removes a not-yet injected, but already queued IRQ from a VCPU.
> As read_itte() is now eventually used, we add the static keyword.
> 
> Signed-off-by: Andre Przywara <andre.przywara@arm.com>

Acked-by: Julien Grall <julien.grall@arm.com>

Cheers,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v10 19/32] ARM: vITS: provide access to struct pending_irq
  2017-06-02 16:32   ` Julien Grall
@ 2017-06-02 16:45     ` Julien Grall
  2017-06-06 10:19     ` Andre Przywara
  1 sibling, 0 replies; 96+ messages in thread
From: Julien Grall @ 2017-06-02 16:45 UTC (permalink / raw)
  To: Andre Przywara, Stefano Stabellini
  Cc: xen-devel, Vijaya Kumar K, Vijay Kilari, Shanker Donthineni



On 06/02/2017 05:32 PM, Julien Grall wrote:
>>   /* Scan the DT for any ITS nodes and create a list of host ITSes out 
>> of it. */
>>   void gicv3_its_dt_init(const struct dt_device_node *node)
>>   {
>> diff --git a/xen/include/asm-arm/gic_v3_its.h 
>> b/xen/include/asm-arm/gic_v3_its.h
>> index 40f4ef5..d162e89 100644
>> --- a/xen/include/asm-arm/gic_v3_its.h
>> +++ b/xen/include/asm-arm/gic_v3_its.h
>> @@ -169,6 +169,10 @@ int gicv3_its_map_guest_device(struct domain *d,
>>   int gicv3_allocate_host_lpi_block(struct domain *d, uint32_t 
>> *first_lpi);
>>   void gicv3_free_host_lpi_block(uint32_t first_lpi);
>> +struct pending_irq *gicv3_its_get_event_pending_irq(struct domain *d,
>> +                                                    paddr_t 
>> vdoorbell_address,
>> +                                                    uint32_t vdevid,
>> +                                                    uint32_t veventid);

NIT: Also, you fixed the declaration but not for the prototype:

s/veventid/eventid/

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v10 23/32] ARM: vITS: handle MAPD command
  2017-05-26 17:35 ` [PATCH v10 23/32] ARM: vITS: handle MAPD command Andre Przywara
@ 2017-06-02 16:46   ` Julien Grall
  0 siblings, 0 replies; 96+ messages in thread
From: Julien Grall @ 2017-06-02 16:46 UTC (permalink / raw)
  To: Andre Przywara, Stefano Stabellini
  Cc: xen-devel, Vijaya Kumar K, Vijay Kilari, Shanker Donthineni

Hi Andre,

On 05/26/2017 06:35 PM, Andre Przywara wrote:
> diff --git a/xen/include/asm-arm/gic_v3_its.h b/xen/include/asm-arm/gic_v3_its.h
> index d162e89..e78dadf 100644
> --- a/xen/include/asm-arm/gic_v3_its.h
> +++ b/xen/include/asm-arm/gic_v3_its.h
> @@ -173,6 +173,11 @@ struct pending_irq *gicv3_its_get_event_pending_irq(struct domain *d,
>                                                       paddr_t vdoorbell_address,
>                                                       uint32_t vdevid,
>                                                       uint32_t veventid);
> +int gicv3_remove_guest_event(struct domain *d, paddr_t vdoorbell_address,
> +                                     uint32_t vdevid, uint32_t veventid);

NIT: s/veventid/eventid/ as you did for the declaration.

With that fixed:

Acked-by: Julien Grall <julien.grall@arm.com>

> +void gicv3_lpi_update_host_entry(uint32_t host_lpi, int domain_id,
> +                                 uint32_t virt_lpi);
> +
>   #else
>   
>   static inline void gicv3_its_dt_init(const struct dt_device_node *node)
> 

Cheers,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v10 24/32] ARM: GICv3: handle unmapped LPIs
  2017-05-26 17:35 ` [PATCH v10 24/32] ARM: GICv3: handle unmapped LPIs Andre Przywara
@ 2017-06-02 16:55   ` Julien Grall
  2017-06-02 20:45     ` Stefano Stabellini
  2017-06-08  9:45   ` Julien Grall
  1 sibling, 1 reply; 96+ messages in thread
From: Julien Grall @ 2017-06-02 16:55 UTC (permalink / raw)
  To: Andre Przywara, Stefano Stabellini
  Cc: xen-devel, Vijaya Kumar K, Vijay Kilari, Shanker Donthineni

Hi Andre,

On 05/26/2017 06:35 PM, Andre Przywara wrote:
> @@ -441,6 +443,40 @@ void gic_raise_inflight_irq(struct vcpu *v, unsigned int virtual_irq)
>   #endif
>   }
>   
> +/*
> + * Find an unused LR to insert an IRQ into, starting with the LR given
> + * by @lr. If this new interrupt is a PRISTINE LPI, scan the other LRs to
> + * avoid inserting the same IRQ twice. This situation can occur when an
> + * event gets discarded while the LPI is in an LR, and a new LPI with the
> + * same number gets mapped quickly afterwards.
> + */
> +static unsigned int gic_find_unused_lr(struct vcpu *v,
> +                                       struct pending_irq *p,
> +                                       unsigned int lr)
> +{
> +    unsigned int nr_lrs = gic_hw_ops->info->nr_lrs;
> +    unsigned long *lr_mask = (unsigned long *) &this_cpu(lr_mask);
> +    struct gic_lr lr_val;
> +
> +    ASSERT(spin_is_locked(&v->arch.vgic.lock));
> +
> +    if ( test_bit(GIC_IRQ_GUEST_PRISTINE_LPI, &p->status) )

Stefano suggested putting an unlikely() here, and ...

> @@ -479,8 +516,14 @@ static void gic_update_one_lr(struct vcpu *v, int i)
>       gic_hw_ops->read_lr(i, &lr_val);
>       irq = lr_val.virq;
>       p = irq_to_pending(v, irq);
> -    /* An LPI might have been unmapped, in which case we just clean up here. */
> -    if ( unlikely(!p) )
> +    /*
> +     * An LPI might have been unmapped, in which case we just clean up here.
> +     * If that LPI is marked as PRISTINE, the information in the LR is bogus,
> +     * as it belongs to a previous, already unmapped LPI. So we discard it
> +     * here as well.
> +     */
> +    if ( unlikely(!p) ||
> +         test_and_clear_bit(GIC_IRQ_GUEST_PRISTINE_LPI, &p->status) )

... I think we should put one here too.
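
I.e., spelling it out:

    if ( unlikely(test_bit(GIC_IRQ_GUEST_PRISTINE_LPI, &p->status)) )

for the first hunk, and

    if ( unlikely(!p) ||
         unlikely(test_and_clear_bit(GIC_IRQ_GUEST_PRISTINE_LPI, &p->status)) )

for this one.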

With that, I think the patch looks good. I will let Stefano confirm he 
is happy with that too.

Cheers,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v10 25/32] ARM: vITS: handle MAPTI/MAPI command
  2017-05-26 17:35 ` [PATCH v10 25/32] ARM: vITS: handle MAPTI/MAPI command Andre Przywara
@ 2017-06-02 17:12   ` Julien Grall
  2017-06-07 17:49     ` Andre Przywara
  2017-06-09 11:17     ` Andre Przywara
  0 siblings, 2 replies; 96+ messages in thread
From: Julien Grall @ 2017-06-02 17:12 UTC (permalink / raw)
  To: Andre Przywara, Stefano Stabellini
  Cc: xen-devel, Vijaya Kumar K, Vijay Kilari, Shanker Donthineni

Hi Andre,

On 05/26/2017 06:35 PM, Andre Przywara wrote:
> The MAPTI command associates a DeviceID/EventID pair with an LPI/CPU
> pair and actually instantiates LPI interrupts. MAPI is just a variant
> of this command, where the LPI ID is the same as the event ID.
> We connect the already allocated host LPI to this virtual LPI, so that
> any triggering LPI on the host can be quickly forwarded to a guest.
> Beside entering the domain and the virtual LPI number in the respective
> host LPI entry, we also initialize and add the already allocated
> struct pending_irq to our radix tree, so that we can now easily find it
> by its virtual LPI number.
> We also read the property table to update the enabled bit and the
> priority for our new LPI, as we might have missed this during an earlier
> INVALL call (which only checks mapped LPIs). But we make sure that the
> property table is actually valid, as all redistributors might still
> be disabled at this point.
> Since write_itte_locked() now sees its first usage, we change the
> declaration to static.
> 
> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> ---
>   xen/arch/arm/gic-v3-its.c        |  27 ++++++++
>   xen/arch/arm/vgic-v3-its.c       | 138 ++++++++++++++++++++++++++++++++++++++-
>   xen/include/asm-arm/gic_v3_its.h |   3 +
>   3 files changed, 165 insertions(+), 3 deletions(-)
> 
> diff --git a/xen/arch/arm/gic-v3-its.c b/xen/arch/arm/gic-v3-its.c
> index 8864e0b..41fff64 100644
> --- a/xen/arch/arm/gic-v3-its.c
> +++ b/xen/arch/arm/gic-v3-its.c
> @@ -876,6 +876,33 @@ int gicv3_remove_guest_event(struct domain *d, paddr_t vdoorbell_address,
>       return 0;
>   }
>   
> +/*
> + * Connects the event ID for an already assigned device to the given VCPU/vLPI
> + * pair. The corresponding physical LPI is already mapped on the host side
> + * (when assigning the physical device to the guest), so we just connect the
> + * target VCPU/vLPI pair to that interrupt to inject it properly if it fires.
> + * Returns a pointer to the already allocated struct pending_irq that is
> + * meant to be used by that event.
> + */
> +struct pending_irq *gicv3_assign_guest_event(struct domain *d,
> +                                             paddr_t vdoorbell_address,
> +                                             uint32_t vdevid, uint32_t eventid,
> +                                             uint32_t virt_lpi)
> +{
> +    struct pending_irq *pirq;
> +    uint32_t host_lpi = 0;

This should be INVALID_LPI and not 0.

[...]

> +/*
> + * For a given virtual LPI read the enabled bit and priority from the virtual
> + * property table and update the virtual IRQ's state in the given pending_irq.
> + * Must be called with the respective VGIC VCPU lock held.
> + */
> +static int update_lpi_property(struct domain *d, struct pending_irq *p)
> +{
> +    paddr_t addr;
> +    uint8_t property;
> +    int ret;
> +
> +    /*
> +     * If no redistributor has its LPIs enabled yet, we can't access the
> +     * property table. In this case we just can't update the properties,
> +     * but this should not be an error from an ITS point of view.
> +     */
> +    if ( !read_atomic(&d->arch.vgic.rdists_enabled) )
> +        return 0;

AFAICT, there is no other place where the property would be updated. 
Should we put a warning to let the user know that the property will not 
be updated? All the more so as you don't return an error, so there is no 
easy way to know what's going on.
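
Something as simple as this would do (sketch only, wording of the
message up to you):

    if ( !read_atomic(&d->arch.vgic.rdists_enabled) )
    {
        gdprintk(XENLOG_G_WARNING,
                 "vLPI %u: property not updated, no redistributor enabled yet\n",
                 p->irq);
        return 0;
    }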

> +
> +    addr = d->arch.vgic.rdist_propbase & GENMASK(51, 12);
> +
> +    ret = vgic_access_guest_memory(d, addr + p->irq - LPI_OFFSET,
> +                                   &property, sizeof(property), false);
> +    if ( ret )
> +        return ret;
> +
> +    write_atomic(&p->lpi_priority, property & LPI_PROP_PRIO_MASK);
> +
> +    if ( property & LPI_PROP_ENABLED )
> +        set_bit(GIC_IRQ_GUEST_ENABLED, &p->status);
> +    else
> +        clear_bit(GIC_IRQ_GUEST_ENABLED, &p->status);
> +
> +    return 0;
> +}
> +
>   /* Must be called with the ITS lock held. */
>   static int its_discard_event(struct virt_its *its,
>                                uint32_t vdevid, uint32_t vevid)
> @@ -538,6 +574,98 @@ static int its_handle_mapd(struct virt_its *its, uint64_t *cmdptr)
>       return ret;
>   }
>   
> +static int its_handle_mapti(struct virt_its *its, uint64_t *cmdptr)
> +{

[...]

> +    /*
> +     * radix_tree_insert() returns an error either due to an internal
> +     * condition (like memory allocation failure) or because the LPI already
> +     * existed in the tree. We don't support the latter case, so we always
> +     * cleanup and return an error here in any case.
> +     */
> +out_remove_host_entry:
> +    gicv3_remove_guest_event(its->d, its->doorbell_address, devid, eventid);

Can gicv3_remove_guest_event fail? If yes, should we check the return 
value / add a comment? If not, then we should have an ASSERT(...).
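
E.g. (sketch, assuming it indeed cannot fail on this error path):

    if ( gicv3_remove_guest_event(its->d, its->doorbell_address, devid,
                                  eventid) )
        ASSERT_UNREACHABLE();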

Cheers,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v10 26/32] ARM: vITS: handle MOVI command
  2017-05-31 18:49         ` Julien Grall
@ 2017-06-02 17:17           ` Julien Grall
  2017-06-02 20:36             ` Stefano Stabellini
  0 siblings, 1 reply; 96+ messages in thread
From: Julien Grall @ 2017-06-02 17:17 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Vijay Kilari, Andre Przywara, Vijaya Kumar K, xen-devel, nd,
	Shanker Donthineni

Hi,

On 05/31/2017 07:49 PM, Julien Grall wrote:
>> I don't want to introduce any more hardcoding, I just would like the
>> existing vgic_migrate_irq to be called.
> 
> It is going to need a bit of rework to get it working with LPIs, as the 
> code is currently gated on (p->desc). For what benefit?
> 
> Not much, as the current code is already working well on migration...

So I had a chat with Stefano about that. The consensus is, correct me 
if I misunderstood, to either fully implement the migration using 
vgic_migrate_irq or not do it at all.

AFAICT, the current approach is not implementing the migration but the 
comment is quite confusing. So if we don't implement it, the comment 
needs to be updated.

Cheers.

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v10 27/32] ARM: vITS: handle DISCARD command
  2017-05-26 17:35 ` [PATCH v10 27/32] ARM: vITS: handle DISCARD command Andre Przywara
@ 2017-06-02 17:21   ` Julien Grall
  0 siblings, 0 replies; 96+ messages in thread
From: Julien Grall @ 2017-06-02 17:21 UTC (permalink / raw)
  To: Andre Przywara, Stefano Stabellini
  Cc: xen-devel, Vijaya Kumar K, Vijay Kilari, Shanker Donthineni

Hi Andre,

On 05/26/2017 06:35 PM, Andre Przywara wrote:
> The DISCARD command drops the connection between a DeviceID/EventID
> and an LPI/collection pair.
> We mark the respective structure entries as not allocated and make
> sure that any queued IRQs are removed.
> 
> Signed-off-by: Andre Przywara <andre.przywara@arm.com>

Acked-by: Julien Grall <julien.grall@arm.com>

Cheers,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v10 29/32] ARM: vITS: handle INVALL command
  2017-05-26 17:35 ` [PATCH v10 29/32] ARM: vITS: handle INVALL command Andre Przywara
@ 2017-06-02 17:27   ` Julien Grall
  0 siblings, 0 replies; 96+ messages in thread
From: Julien Grall @ 2017-06-02 17:27 UTC (permalink / raw)
  To: Andre Przywara, Stefano Stabellini
  Cc: xen-devel, Vijaya Kumar K, Vijay Kilari, Shanker Donthineni

Hi Andre,

On 05/26/2017 06:35 PM, Andre Przywara wrote:
> +    do
> +    {
> +        nr_lpis = radix_tree_gang_lookup(&its->d->arch.vgic.pend_lpi_tree,
> +                                         (void **)pirqs, vlpi,
> +                                         ARRAY_SIZE(pirqs));
> +
> +        for ( i = 0; i < nr_lpis; i++ )
> +        {
> +            /* We only care about LPIs on our VCPU. */
> +            if ( pirqs[i]->lpi_vcpu_id != vcpu->vcpu_id )
> +                continue;
> +
> +            vlpi = pirqs[i]->irq;
> +            /* If that fails for a single LPI, carry on to handle the rest. */
> +            ret = update_lpi_property(its->d, pirqs[i]);
> +            if ( !ret )
> +                update_lpi_vgic_status(vcpu, pirqs[i]);
> +        }
> +    /*
> +     * Loop over the next gang of pending_irqs until we reached the end of
> +     * a (fully populated) tree or the lookup function returns less LPIs than
> +     * it has been asked for.
> +     */
> +    } while ( (++vlpi < its->d->arch.vgic.nr_lpis) &&
> +              (nr_lpis == ARRAY_SIZE(pirqs)) );
> +
> +    read_unlock(&its->d->arch.vgic.pend_lpi_tree_lock);
> +    spin_unlock_irqrestore(&vcpu->arch.vgic.lock, flags);
> +
> +    return ret;

The implementation looks good. However, one question: ret will be equal 
to the result of the latest LPI update. So even if all LPIs but the last 
one have failed, you will still return 0. Is that what you want?
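
If the intention is to report any failure, one way to do it (sketch only)
would be to accumulate the errors instead of overwriting ret each time:

    int err = 0;
    ...
            ret = update_lpi_property(its->d, pirqs[i]);
            if ( ret )
                err = ret;      /* remember that at least one update failed */
            else
                update_lpi_vgic_status(vcpu, pirqs[i]);
    ...
    return err;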

Cheers,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v10 31/32] ARM: vITS: create and initialize virtual ITSes for Dom0
  2017-05-26 17:35 ` [PATCH v10 31/32] ARM: vITS: create and initialize virtual ITSes for Dom0 Andre Przywara
@ 2017-06-02 17:31   ` Julien Grall
  0 siblings, 0 replies; 96+ messages in thread
From: Julien Grall @ 2017-06-02 17:31 UTC (permalink / raw)
  To: Andre Przywara, Stefano Stabellini
  Cc: xen-devel, Vijaya Kumar K, Vijay Kilari, Shanker Donthineni

Hi Andre,

On 05/26/2017 06:35 PM, Andre Przywara wrote:
> For each hardware ITS create and initialize a virtual ITS for Dom0.
> We use the same memory mapped address to keep the doorbell working.
> This introduces a function to initialize a virtual ITS.
> We maintain a list of virtual ITSes, at the moment for the only
> purpose of later being able to free them again.
> We configure the virtual ITSes to match the hardware ones, that is we
> keep the number of device ID bits and event ID bits the same as the host
> ITS.
> 
> Signed-off-by: Andre Przywara <andre.przywara@arm.com>

Acked-by: Julien Grall <julien.grall@arm.com>

Cheers,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v10 32/32] ARM: vITS: create ITS subnodes for Dom0 DT
  2017-05-26 17:35 ` [PATCH v10 32/32] ARM: vITS: create ITS subnodes for Dom0 DT Andre Przywara
@ 2017-06-02 17:33   ` Julien Grall
  0 siblings, 0 replies; 96+ messages in thread
From: Julien Grall @ 2017-06-02 17:33 UTC (permalink / raw)
  To: Andre Przywara, Stefano Stabellini
  Cc: xen-devel, Vijaya Kumar K, Vijay Kilari, Shanker Donthineni

Hi Andre,

On 05/26/2017 06:35 PM, Andre Przywara wrote:
> Dom0 expects all ITSes in the system to be propagated to be able to
> use MSIs.
> Create Dom0 DT nodes for each hardware ITS, keeping the register frame
> address the same, as the doorbell address that the Dom0 drivers program
> into the BARs has to match the hardware.
> 
> Signed-off-by: Andre Przywara <andre.przywara@arm.com>

Acked-by: Julien Grall <julien.grall@arm.com>

Cheers,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v10 01/32] ARM: vGIC: avoid rank lock when reading priority
  2017-05-31 10:42       ` Julien Grall
@ 2017-06-02 17:44         ` Julien Grall
  0 siblings, 0 replies; 96+ messages in thread
From: Julien Grall @ 2017-06-02 17:44 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Andre Przywara, Vijaya Kumar K, Vijay Kilari, Shanker Donthineni,
	xen-devel



On 05/31/2017 11:42 AM, Julien Grall wrote:
> Hi Stefano,
> 
> On 30/05/17 22:39, Stefano Stabellini wrote:
>> On Tue, 30 May 2017, Julien Grall wrote:
>>> Hi Andre,
>>>
>>> On 26/05/17 18:35, Andre Przywara wrote:
>>>> When reading the priority value of a virtual interrupt, we were taking
>>>> the respective rank lock so far.
>>>> However for forwarded interrupts (Dom0 only so far) this may lead to a
>>>> deadlock with the following call chain:
>>>> - MMIO access to change the IRQ affinity, calling the ITARGETSR handler
>>>> - this handler takes the appropriate rank lock and calls
>>>> vgic_store_itargetsr()
>>>> - vgic_store_itargetsr() will eventually call vgic_migrate_irq()
>>>> - if this IRQ is already in-flight, it will remove it from the old
>>>>   VCPU and inject it into the new one, by calling 
>>>> vgic_vcpu_inject_irq()
>>>> - vgic_vcpu_inject_irq will call vgic_get_virq_priority()
>>>> - vgic_get_virq_priority() tries to take the rank lock - again!
>>>> It seems like this code path has never been exercised before.
>>>>
>>>> Fix this by avoiding taking the lock in vgic_get_virq_priority() 
>>>> (like we
>>>> do in vgic_get_target_vcpu()).
>>>> Actually we are just reading one byte, and priority changes while
>>>> interrupts are handled are a benign race that can happen on real 
>>>> hardware
>>>> too. So it looks safe to just use read_atomic() instead.
>>>>
>>>> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
>>>> ---
>>>>  xen/arch/arm/vgic.c | 8 +-------
>>>>  1 file changed, 1 insertion(+), 7 deletions(-)
>>>>
>>>> diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
>>>> index 83569b0..54b2aad 100644
>>>> --- a/xen/arch/arm/vgic.c
>>>> +++ b/xen/arch/arm/vgic.c
>>>> @@ -227,14 +227,8 @@ struct vcpu *vgic_get_target_vcpu(struct vcpu *v,
>>>> unsigned int virq)
>>>>  static int vgic_get_virq_priority(struct vcpu *v, unsigned int virq)
>>>>  {
>>>>      struct vgic_irq_rank *rank = vgic_rank_irq(v, virq);
>>>> -    unsigned long flags;
>>>> -    int priority;
>>>> -
>>>> -    vgic_lock_rank(v, rank, flags);
>>>> -    priority = rank->priority[virq & INTERRUPT_RANK_MASK];
>>>> -    vgic_unlock_rank(v, rank, flags);
>>>>
>>>> -    return priority;
>>>> +    return read_atomic(&rank->priority[virq & INTERRUPT_RANK_MASK]);
>>>
>>> The write in rank->priority will not be atomic (see vgic_reg_update
>>> implementation): the register is first masked, then the priority is set.
>>>
>>> So you may end up reading 0 (which is the highest priority) by mistake.
>>>
>>> We should probably think to make vgic_reg_* helper atomic.
>>
>> Right! That's why I wrote
>> alpine.DEB.2.10.1705231135130.18759@sstabellini-ThinkPad-X260
> 
> It was not obvious from this e-mail why you wanted this. It looked more 
> like a request for improvement than a potential bug. Hence my 
> suggestion to delay it.

FYI, I wrote a couple of patches to make vgic_reg* atomic. It is not 
too invasive and would make the write to the register atomic. However, 
it will not prevent two writes racing with each other. So the lock would 
still be needed in some places.
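
The write side, roughly (a sketch of the direction, not the actual
patches), becomes a single atomic store of the byte, e.g.:

    /* The read side already uses read_atomic(); publish the new priority
     * in one go so a reader never observes a half-updated value. */
    write_atomic(&rank->priority[virq & INTERRUPT_RANK_MASK], new_priority);

(new_priority here is just a placeholder for the value computed from the
guest write.)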

I am still doing some testing. I will send a version hopefully Monday.

Cheers,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v10 26/32] ARM: vITS: handle MOVI command
  2017-06-02 17:17           ` Julien Grall
@ 2017-06-02 20:36             ` Stefano Stabellini
  0 siblings, 0 replies; 96+ messages in thread
From: Stefano Stabellini @ 2017-06-02 20:36 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, Vijay Kilari, Andre Przywara, Vijaya Kumar K,
	xen-devel, nd, Shanker Donthineni

On Fri, 2 Jun 2017, Julien Grall wrote:
> Hi,
> 
> On 05/31/2017 07:49 PM, Julien Grall wrote:
> > > I don't want to introduce any more hardcoding, I just would like the
> > > existing vgic_migrate_irq to be called.
> > 
> > It is going to need a bit of rework to get it working with LPIs, as the code
> > is currently gated on (p->desc). For what benefit?
> > 
> > Not much, as the current code is already working well on migration...
> 
> So I had a chat with Stefano about that. The consensus is, correct me if I
> misunderstood, to either fully implement the migration using vgic_migrate_irq
> or not do it at all.
> 
> AFAICT, the current approach is not implementing the migration but the comment
> is quite confusing. So if we don't implement it, the comment needs to be
> updated.

Yes

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v10 24/32] ARM: GICv3: handle unmapped LPIs
  2017-06-02 16:55   ` Julien Grall
@ 2017-06-02 20:45     ` Stefano Stabellini
  0 siblings, 0 replies; 96+ messages in thread
From: Stefano Stabellini @ 2017-06-02 20:45 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, Vijay Kilari, Andre Przywara, Vijaya Kumar K,
	xen-devel, Shanker Donthineni

On Fri, 2 Jun 2017, Julien Grall wrote:
> Hi Andre,
> 
> On 05/26/2017 06:35 PM, Andre Przywara wrote:
> > @@ -441,6 +443,40 @@ void gic_raise_inflight_irq(struct vcpu *v, unsigned
> > int virtual_irq)
> >   #endif
> >   }
> >   +/*
> > + * Find an unused LR to insert an IRQ into, starting with the LR given
> > + * by @lr. If this new interrupt is a PRISTINE LPI, scan the other LRs to
> > + * avoid inserting the same IRQ twice. This situation can occur when an
> > + * event gets discarded while the LPI is in an LR, and a new LPI with the
> > + * same number gets mapped quickly afterwards.
> > + */
> > +static unsigned int gic_find_unused_lr(struct vcpu *v,
> > +                                       struct pending_irq *p,
> > +                                       unsigned int lr)
> > +{
> > +    unsigned int nr_lrs = gic_hw_ops->info->nr_lrs;
> > +    unsigned long *lr_mask = (unsigned long *) &this_cpu(lr_mask);
> > +    struct gic_lr lr_val;
> > +
> > +    ASSERT(spin_is_locked(&v->arch.vgic.lock));
> > +
> > +    if ( test_bit(GIC_IRQ_GUEST_PRISTINE_LPI, &p->status) )
> 
> Stefano suggested to put an unlikely and ...
> 
> > @@ -479,8 +516,14 @@ static void gic_update_one_lr(struct vcpu *v, int i)
> >       gic_hw_ops->read_lr(i, &lr_val);
> >       irq = lr_val.virq;
> >       p = irq_to_pending(v, irq);
> > -    /* An LPI might have been unmapped, in which case we just clean up
> > here. */
> > -    if ( unlikely(!p) )
> > +    /*
> > +     * An LPI might have been unmapped, in which case we just clean up
> > here.
> > +     * If that LPI is marked as PRISTINE, the information in the LR is
> > bogus,
> > +     * as it belongs to a previous, already unmapped LPI. So we discard it
> > +     * here as well.
> > +     */
> > +    if ( unlikely(!p) ||
> > +         test_and_clear_bit(GIC_IRQ_GUEST_PRISTINE_LPI, &p->status) )
> 
> ... I think we should put one here too.
> 
> With that, I think the patch looks good. I will let Stefano confirm he is
> happy with that too.

Yes, I think it is OK

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v10 02/32] ARM: GICv3: setup number of LPI bits for a GICv3 guest
  2017-05-30 10:54   ` Julien Grall
@ 2017-06-06 10:19     ` Andre Przywara
  0 siblings, 0 replies; 96+ messages in thread
From: Andre Przywara @ 2017-06-06 10:19 UTC (permalink / raw)
  To: Julien Grall, Stefano Stabellini
  Cc: xen-devel, Vijaya Kumar K, Vijay Kilari, Shanker Donthineni

Hi,

On 30/05/17 11:54, Julien Grall wrote:
> Hi Andre,
> 
> On 26/05/17 18:35, Andre Przywara wrote:
>> The host supports a certain number of LPI identifiers, as stored in
>> the GICD_TYPER register.
>> Store this number from the hardware register in vgic_v3_hw to allow
>> injecting the very same number into a guest (Dom0).
>> DomUs get the legacy number of 10 bits here, since for now they only see
>> SPIs, so they do not need more. This should be revisited once we get
>> proper DomU ITS support.
>>
>> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
>> ---
>>  xen/arch/arm/gic-v3.c        |  6 +++++-
>>  xen/arch/arm/vgic-v3.c       | 10 +++++++++-
>>  xen/include/asm-arm/domain.h |  1 +
>>  xen/include/asm-arm/vgic.h   |  3 ++-
>>  4 files changed, 17 insertions(+), 3 deletions(-)
>>
>> diff --git a/xen/arch/arm/gic-v3.c b/xen/arch/arm/gic-v3.c
>> index a559e5e..29c8964 100644
>> --- a/xen/arch/arm/gic-v3.c
>> +++ b/xen/arch/arm/gic-v3.c
>> @@ -1579,6 +1579,7 @@ static int __init gicv3_init(void)
>>  {
>>      int res, i;
>>      uint32_t reg;
>> +    unsigned int intid_bits;
>>
>>      if ( !cpu_has_gicv3 )
>>      {
>> @@ -1622,8 +1623,11 @@ static int __init gicv3_init(void)
>>                 i, r->base, r->base + r->size);
>>      }
>>
>> +    reg = readl_relaxed(GICD + GICD_TYPER);
>> +    intid_bits = GICD_TYPE_ID_BITS(reg);
>> +
>>      vgic_v3_setup_hw(dbase, gicv3.rdist_count, gicv3.rdist_regions,
>> -                     gicv3.rdist_stride);
>> +                     gicv3.rdist_stride, intid_bits);
>>      gicv3_init_v2();
>>
>>      spin_lock_init(&gicv3.lock);
>> diff --git a/xen/arch/arm/vgic-v3.c b/xen/arch/arm/vgic-v3.c
>> index d10757a..87f5fb3 100644
>> --- a/xen/arch/arm/vgic-v3.c
>> +++ b/xen/arch/arm/vgic-v3.c
>> @@ -57,18 +57,21 @@ static struct {
>>      unsigned int nr_rdist_regions;
>>      const struct rdist_region *regions;
>>      uint32_t rdist_stride; /* Re-distributor stride */
>> +    unsigned int intid_bits;  /* Number of interrupt ID bits */
>>  } vgic_v3_hw;
>>
>>  void vgic_v3_setup_hw(paddr_t dbase,
>>                        unsigned int nr_rdist_regions,
>>                        const struct rdist_region *regions,
>> -                      uint32_t rdist_stride)
>> +                      uint32_t rdist_stride,
>> +                      unsigned int intid_bits)
>>  {
>>      vgic_v3_hw.enabled = 1;
>>      vgic_v3_hw.dbase = dbase;
>>      vgic_v3_hw.nr_rdist_regions = nr_rdist_regions;
>>      vgic_v3_hw.regions = regions;
>>      vgic_v3_hw.rdist_stride = rdist_stride;
>> +    vgic_v3_hw.intid_bits = intid_bits;
>>  }
>>
>>  static struct vcpu *vgic_v3_irouter_to_vcpu(struct domain *d,
>> uint64_t irouter)
>> @@ -1482,6 +1485,8 @@ static int vgic_v3_domain_init(struct domain *d)
>>
>>              first_cpu += size / d->arch.vgic.rdist_stride;
>>          }
>> +
>> +        d->arch.vgic.intid_bits = vgic_v3_hw.intid_bits;
>>      }
>>      else
>>      {
>> @@ -1497,6 +1502,9 @@ static int vgic_v3_domain_init(struct domain *d)
>>          d->arch.vgic.rdist_regions[0].base = GUEST_GICV3_GICR0_BASE;
>>          d->arch.vgic.rdist_regions[0].size = GUEST_GICV3_GICR0_SIZE;
>>          d->arch.vgic.rdist_regions[0].first_cpu = 0;
>> +
>> +        /* TODO: only SPIs for now, adjust this when guests need LPIs */
>> +        d->arch.vgic.intid_bits = 10;
> 
> What if the guest supports no SPIs? Should you not base this number on
> the number of SPIs supported by the guest?

No, I think this is a misconception in the existing code. The number of
ID bits supported is actually related to the stream protocol, something
which we don't model here. It is independent from the number of actually
implemented SPIs, which is held in the lower 5 bits of GICD_TYPER.
See chapter 2.2 "INTIDs" in the GICv3 spec, where it says:
"If LPIs are not supported, the ID space in the Distributor is limited
to 10 bits."

Cheers,
Andre.

> 
>>      }
>>
>>      ret = vgic_v3_its_init_domain(d);
>> diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h
>> index 6de8082..7c3829d 100644
>> --- a/xen/include/asm-arm/domain.h
>> +++ b/xen/include/asm-arm/domain.h
>> @@ -111,6 +111,7 @@ struct arch_domain
>>          uint32_t rdist_stride;              /* Re-Distributor stride */
>>          struct rb_root its_devices;         /* Devices mapped to an
>> ITS */
>>          spinlock_t its_devices_lock;        /* Protects the
>> its_devices tree */
>> +        unsigned int intid_bits;
>>  #endif
>>      } vgic;
>>
>> diff --git a/xen/include/asm-arm/vgic.h b/xen/include/asm-arm/vgic.h
>> index 544867a..df75064 100644
>> --- a/xen/include/asm-arm/vgic.h
>> +++ b/xen/include/asm-arm/vgic.h
>> @@ -346,7 +346,8 @@ struct rdist_region;
>>  void vgic_v3_setup_hw(paddr_t dbase,
>>                        unsigned int nr_rdist_regions,
>>                        const struct rdist_region *regions,
>> -                      uint32_t rdist_stride);
>> +                      uint32_t rdist_stride,
>> +                      unsigned int intid_bits);
>>  #endif
>>
>>  #endif /* __ASM_ARM_VGIC_H__ */
>>
> 
> Cheers,
> 

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v10 05/32] ARM: vGIC: introduce gic_remove_irq()
  2017-05-30 11:31   ` Julien Grall
@ 2017-06-06 10:19     ` Andre Przywara
  0 siblings, 0 replies; 96+ messages in thread
From: Andre Przywara @ 2017-06-06 10:19 UTC (permalink / raw)
  To: Julien Grall, Stefano Stabellini
  Cc: xen-devel, Vijaya Kumar K, Vijay Kilari, Shanker Donthineni

Hi,

On 30/05/17 12:31, Julien Grall wrote:
> Hi Andre,
> 
> On 26/05/17 18:35, Andre Przywara wrote:
>> To avoid code duplication in a later patch, introduce a generic function
>> to remove a virtual IRQ from the VGIC.
>> Call that function instead of the open-coded version in
>> vgic_migrate_irq().
>>
>> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
>> ---
>>  xen/arch/arm/gic.c        | 9 +++++++++
>>  xen/arch/arm/vgic.c       | 1 +
>>  xen/include/asm-arm/gic.h | 1 +
>>  3 files changed, 11 insertions(+)
>>
>> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
>> index 9dde146..3a6184a 100644
>> --- a/xen/arch/arm/gic.c
>> +++ b/xen/arch/arm/gic.c
>> @@ -408,6 +408,15 @@ void gic_remove_from_lr_pending(struct vcpu *v,
>> struct pending_irq *p)
>>          list_del_init(&p->lr_queue);
>>  }
>>
>> +void gic_remove_irq(struct vcpu *v, struct pending_irq *p)
> 
> The name is too generic. Remove IRQ from what?
> 
>> +{
>> +    ASSERT(spin_is_locked(&v->arch.vgic.lock));
>> +
>> +    clear_bit(GIC_IRQ_GUEST_QUEUED, &p->status);
>> +    list_del_init(&p->inflight);
>> +    gic_remove_from_lr_pending(v, p);
>> +}
>> +
>>  void gic_raise_inflight_irq(struct vcpu *v, unsigned int virtual_irq)
>>  {
>>      struct pending_irq *n = irq_to_pending(v, virtual_irq);
>> diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
>> index 3993965..b2acc99 100644
>> --- a/xen/arch/arm/vgic.c
>> +++ b/xen/arch/arm/vgic.c
>> @@ -266,6 +266,7 @@ bool vgic_migrate_irq(struct vcpu *old, struct
>> vcpu *new, unsigned int irq)
>>      /* If the IRQ is still lr_pending, re-inject it to the new vcpu */
>>      if ( !list_empty(&p->lr_queue) )
>>      {
>> +        gic_remove_irq(old, p);
>>          clear_bit(GIC_IRQ_GUEST_QUEUED, &p->status);
>>          list_del_init(&p->lr_queue);
>>          list_del_init(&p->inflight);
> 
> I was expecting those 3 lines to be removed since you now call
> gic_remove_irq.

Argh, of course. I actually fixed this, but somehow it sneaked in again
when I moved this patch around in the series.
Sorry for that.

Cheers,
Andre.

>> diff --git a/xen/include/asm-arm/gic.h b/xen/include/asm-arm/gic.h
>> index 3130634..5d5b4cc 100644
>> --- a/xen/include/asm-arm/gic.h
>> +++ b/xen/include/asm-arm/gic.h
>> @@ -244,6 +244,7 @@ extern void gic_raise_guest_irq(struct vcpu *v,
>> unsigned int irq,
>>          unsigned int priority);
>>  extern void gic_raise_inflight_irq(struct vcpu *v, unsigned int
>> virtual_irq);
>>  extern void gic_remove_from_lr_pending(struct vcpu *v, struct
>> pending_irq *p);
>> +extern void gic_remove_irq(struct vcpu *v, struct pending_irq *p);
>>
>>  /* Accept an interrupt from the GIC and dispatch its handler */
>>  extern void gic_interrupt(struct cpu_user_regs *regs, int is_fiq);
>>
> 
> Cheers,
> 

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v10 06/32] ARM: GIC: Add checks for NULL pointer pending_irq's
  2017-05-30 11:38   ` Julien Grall
@ 2017-06-06 10:19     ` Andre Przywara
  2017-06-07 11:19       ` Julien Grall
  0 siblings, 1 reply; 96+ messages in thread
From: Andre Przywara @ 2017-06-06 10:19 UTC (permalink / raw)
  To: Julien Grall, Stefano Stabellini
  Cc: xen-devel, Vijaya Kumar K, Vijay Kilari, Shanker Donthineni

Hi,

On 30/05/17 12:38, Julien Grall wrote:
> Hi Andre,
> 
> On 26/05/17 18:35, Andre Przywara wrote:
>> For LPIs the struct pending_irq's are dynamically allocated and the
>> pointers will be stored in a radix tree. Since an LPI can be "unmapped"
>> at any time, teach the VGIC how to deal with irq_to_pending() returning
>> a NULL pointer.
>> We just do nothing in this case or clean up the LR if the virtual LPI
>> number was still in an LR.
>>
>> Those are all call sites for irq_to_pending(), as per:
>> "git grep irq_to_pending", and their evaluations:
>> (PROTECTED means: added NULL check and bailing out)
>>
>>     xen/arch/arm/gic.c:
>> gic_route_irq_to_guest(): only called for SPIs, added ASSERT()
>> gic_remove_irq_from_guest(): only called for SPIs, added ASSERT()
>> gic_remove_from_queues(): PROTECTED, called within VCPU VGIC lock
> 
> This function has been renamed in patch #4.
> 
>> gic_raise_inflight_irq(): PROTECTED, called under VCPU VGIC lock
>> gic_raise_guest_irq(): PROTECTED, called under VCPU VGIC lock
>> gic_update_one_lr(): PROTECTED, called under VCPU VGIC lock
>>
>>     xen/arch/arm/vgic.c:
>> vgic_migrate_irq(): not called for LPIs (virtual IRQs), added ASSERT()
>> arch_move_irqs(): not iterating over LPIs, added ASSERT()
> 
> I don't see any ASSERT added in arch_move_irqs.

Because this just checks that it is not called with an LPI, and that was
introduced in patch 03/32 already. I can adjust the commit message.

Cheers,
Andre.

>> vgic_disable_irqs(): not called for LPIs, added ASSERT()
>> vgic_enable_irqs(): not called for LPIs, added ASSERT()
>> vgic_vcpu_inject_irq(): PROTECTED, moved under VCPU VGIC lock
>>
>>     xen/include/asm-arm/event.h:
>> local_events_need_delivery_nomask(): only called for a PPI, added
>> ASSERT()
>>
>>     xen/include/asm-arm/vgic.h:
>> (prototype)
> 
> Cheers,
> 

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v10 19/32] ARM: vITS: provide access to struct pending_irq
  2017-06-02 16:32   ` Julien Grall
  2017-06-02 16:45     ` Julien Grall
@ 2017-06-06 10:19     ` Andre Przywara
  2017-06-06 11:13       ` Julien Grall
  1 sibling, 1 reply; 96+ messages in thread
From: Andre Przywara @ 2017-06-06 10:19 UTC (permalink / raw)
  To: Julien Grall, Stefano Stabellini
  Cc: xen-devel, Vijaya Kumar K, Vijay Kilari, Shanker Donthineni

Hi,

On 02/06/17 17:32, Julien Grall wrote:
> Hi Andre,
> 
> On 05/26/2017 06:35 PM, Andre Przywara wrote:
>> For each device we allocate one struct pending_irq for each virtual
>> event (MSI).
>> Provide a helper function which returns the pointer to the appropriate
>> struct, to be able to find the right struct when given a virtual
>> deviceID/eventID pair.
>>
>> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
>> ---
>>   xen/arch/arm/gic-v3-its.c        | 59
>> ++++++++++++++++++++++++++++++++++++++++
>>   xen/include/asm-arm/gic_v3_its.h |  4 +++
>>   2 files changed, 63 insertions(+)
>>
>> diff --git a/xen/arch/arm/gic-v3-its.c b/xen/arch/arm/gic-v3-its.c
>> index aebc257..38f0840 100644
>> --- a/xen/arch/arm/gic-v3-its.c
>> +++ b/xen/arch/arm/gic-v3-its.c
>> @@ -800,6 +800,65 @@ out:
>>       return ret;
>>   }
>>   +/* Must be called with the its_device_lock held. */
>> +static struct its_device *get_its_device(struct domain *d, paddr_t
>> vdoorbell,
>> +                                         uint32_t vdevid)
>> +{
>> +    struct rb_node *node = d->arch.vgic.its_devices.rb_node;
>> +    struct its_device *dev;
>> +
>> +    ASSERT(spin_is_locked(&d->arch.vgic.its_devices_lock));
>> +
>> +    while (node)
>> +    {
>> +        int cmp;
>> +
>> +        dev = rb_entry(node, struct its_device, rbnode);
>> +        cmp = compare_its_guest_devices(dev, vdoorbell, vdevid);
>> +
>> +        if ( !cmp )
>> +            return dev;
>> +
>> +        if ( cmp > 0 )
>> +            node = node->rb_left;
>> +        else
>> +            node = node->rb_right;
>> +    }
>> +
>> +    return NULL;
>> +}
>> +
>> +static struct pending_irq *get_event_pending_irq(struct domain *d,
>> +                                                 paddr_t
>> vdoorbell_address,
>> +                                                 uint32_t vdevid,
>> +                                                 uint32_t eventid,
>> +                                                 uint32_t *host_lpi)
>> +{
>> +    struct its_device *dev;
>> +    struct pending_irq *pirq = NULL;
>> +
>> +    spin_lock(&d->arch.vgic.its_devices_lock);
>> +    dev = get_its_device(d, vdoorbell_address, vdevid);
>> +    if ( dev && eventid < dev->eventids )
>> +    {
>> +        pirq = &dev->pend_irqs[eventid];
>> +        if ( host_lpi )
>> +            *host_lpi = dev->host_lpi_blocks[eventid / LPI_BLOCK] +
>> +                        (eventid % LPI_BLOCK);
>> +    }
>> +    spin_unlock(&d->arch.vgic.its_devices_lock);
>> +
>> +    return pirq;
>> +}
>> +
>> +struct pending_irq *gicv3_its_get_event_pending_irq(struct domain *d,
>> +                                                    paddr_t
>> vdoorbell_address,
>> +                                                    uint32_t vdevid,
>> +                                                    uint32_t eventid)
>> +{
> 
> It is quite rude to ignore my question:
> 
> "So you never envision someone requiring the host LPI even for debug
> purpose?
> 
> AFAICT, there are no other way to get the host LPI if necessary. It
> really does not hurt to expose it and provide a wrapper.
> 
> As you may know I am all in favor of more helpers over the cost of one
> unconditional branch (see the callback example) when it results in a
> better code design.
> 
> But here it is not about code design, it is more about what kind of
> information you would need outside (see above)."

Sorry, I forgot to send a reply on Friday.

So I am not convinced that a *potential* debug output justifies breaking
the abstraction here. The host LPI is of no concern for the guest side
of the emulation code. If someone is in dire need of this information,
the debug output can easily be inserted into the wrapper function or
this export can be done just for this debugging session.
So I'd rather not do this - as the patch demonstrated ;-)
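
Just to illustrate: for a debugging session one could fold the print into
the wrapper itself, along these lines (a throw-away sketch, not part of the
series; the message format is made up):

struct pending_irq *gicv3_its_get_event_pending_irq(struct domain *d,
                                                    paddr_t vdoorbell_address,
                                                    uint32_t vdevid,
                                                    uint32_t eventid)
{
    uint32_t host_lpi = INVALID_LPI;
    struct pending_irq *pirq;

    pirq = get_event_pending_irq(d, vdoorbell_address, vdevid, eventid,
                                 &host_lpi);

    /* Debug-only: dump the host LPI here instead of exporting it. */
    gdprintk(XENLOG_DEBUG, "vGITS: devid %#x eventid %#x -> host LPI %u\n",
             vdevid, eventid, host_lpi);

    return pirq;
}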

Cheers,
Andre.

>> +    return get_event_pending_irq(d, vdoorbell_address, vdevid,
>> eventid, NULL);
>> +}
>> +
>>   /* Scan the DT for any ITS nodes and create a list of host ITSes out
>> of it. */
>>   void gicv3_its_dt_init(const struct dt_device_node *node)
>>   {
>> diff --git a/xen/include/asm-arm/gic_v3_its.h
>> b/xen/include/asm-arm/gic_v3_its.h
>> index 40f4ef5..d162e89 100644
>> --- a/xen/include/asm-arm/gic_v3_its.h
>> +++ b/xen/include/asm-arm/gic_v3_its.h
>> @@ -169,6 +169,10 @@ int gicv3_its_map_guest_device(struct domain *d,
>>   int gicv3_allocate_host_lpi_block(struct domain *d, uint32_t
>> *first_lpi);
>>   void gicv3_free_host_lpi_block(uint32_t first_lpi);
>>   +struct pending_irq *gicv3_its_get_event_pending_irq(struct domain *d,
>> +                                                    paddr_t
>> vdoorbell_address,
>> +                                                    uint32_t vdevid,
>> +                                                    uint32_t veventid);
>>   #else
>>     static inline void gicv3_its_dt_init(const struct dt_device_node
>> *node)
>>
> 
> Cheers,
> 


* Re: [PATCH v10 19/32] ARM: vITS: provide access to struct pending_irq
  2017-06-06 10:19     ` Andre Przywara
@ 2017-06-06 11:13       ` Julien Grall
  0 siblings, 0 replies; 96+ messages in thread
From: Julien Grall @ 2017-06-06 11:13 UTC (permalink / raw)
  To: Andre Przywara, Stefano Stabellini
  Cc: xen-devel, Vijaya Kumar K, Vijay Kilari, Shanker Donthineni

Hi Andre,

On 06/06/17 11:19, Andre Przywara wrote:
> Sorry, I forgot to send a reply on Friday.
>
> So I am not convinced that a *potential* debug output justifies breaking
> the abstraction here. The host LPI is of no concern for the guest side
> of the emulation code. If someone is in dire need of this information,
> the debug output can easily be inserted into the wrapper function or
> this export can be done just for this debugging session.
> So I'd rather not do this - as the patch demonstrated ;-)

Fair enough. So:

Acked-by: Julien Grall <julien.grall@arm.com>

Cheers,

-- 
Julien Grall


* Re: [PATCH v10 01/32] ARM: vGIC: avoid rank lock when reading priority
  2017-05-30 10:47   ` Julien Grall
  2017-05-30 21:39     ` Stefano Stabellini
@ 2017-06-06 17:06     ` Andre Przywara
  2017-06-06 17:11       ` Julien Grall
  1 sibling, 1 reply; 96+ messages in thread
From: Andre Przywara @ 2017-06-06 17:06 UTC (permalink / raw)
  To: Julien Grall, Stefano Stabellini
  Cc: xen-devel, Vijaya Kumar K, Vijay Kilari, Shanker Donthineni

Hi,

On 30/05/17 11:47, Julien Grall wrote:
> Hi Andre,
> 
> On 26/05/17 18:35, Andre Przywara wrote:
>> When reading the priority value of a virtual interrupt, we were taking
>> the respective rank lock so far.
>> However for forwarded interrupts (Dom0 only so far) this may lead to a
>> deadlock with the following call chain:
>> - MMIO access to change the IRQ affinity, calling the ITARGETSR handler
>> - this handler takes the appropriate rank lock and calls
>> vgic_store_itargetsr()
>> - vgic_store_itargetsr() will eventually call vgic_migrate_irq()
>> - if this IRQ is already in-flight, it will remove it from the old
>>   VCPU and inject it into the new one, by calling vgic_vcpu_inject_irq()
>> - vgic_vcpu_inject_irq will call vgic_get_virq_priority()
>> - vgic_get_virq_priority() tries to take the rank lock - again!
>> It seems like this code path has never been exercised before.
>>
>> Fix this by avoiding taking the lock in vgic_get_virq_priority() (like we
>> do in vgic_get_target_vcpu()).
>> Actually we are just reading one byte, and priority changes while
>> interrupts are handled are a benign race that can happen on real hardware
>> too. So it looks safe to just use read_atomic() instead.
>>
>> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
>> ---
>>  xen/arch/arm/vgic.c | 8 +-------
>>  1 file changed, 1 insertion(+), 7 deletions(-)
>>
>> diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
>> index 83569b0..54b2aad 100644
>> --- a/xen/arch/arm/vgic.c
>> +++ b/xen/arch/arm/vgic.c
>> @@ -227,14 +227,8 @@ struct vcpu *vgic_get_target_vcpu(struct vcpu *v,
>> unsigned int virq)
>>  static int vgic_get_virq_priority(struct vcpu *v, unsigned int virq)
>>  {
>>      struct vgic_irq_rank *rank = vgic_rank_irq(v, virq);
>> -    unsigned long flags;
>> -    int priority;
>> -
>> -    vgic_lock_rank(v, rank, flags);
>> -    priority = rank->priority[virq & INTERRUPT_RANK_MASK];
>> -    vgic_unlock_rank(v, rank, flags);
>>
>> -    return priority;
>> +    return read_atomic(&rank->priority[virq & INTERRUPT_RANK_MASK]);
> 
> The write in rank->priority will not be atomic (see vgic_reg_update
> implementation): the register is first masked, then the priority set.
> 
> So you may end up to read 0 (which is the higher priority) by mistake.
> 
> We should probably think to make vgic_reg_* helper atomic.

The patch you sent yesterday may be a bit over the top for this.

What about this one (in xen/arch/arm/vgic-v[23].c):

static int vgic_v2_distr_mmio_write(struct vcpu *v, ...
{
-    uint32_t *ipriorityr;
+    uint32_t *ipriorityr, priority;
....
     vgic_lock_rank(v, rank, flags);
     ipriorityr = &rank->ipriorityr[REG_RANK_INDEX(8,
                                            gicd_reg - GICD_IPRIORITYR,
                                            DABT_WORD)];
-    vgic_reg32_update(ipriorityr, r, info);
+    priority = ACCESS_ONCE(*ipriorityr);
+    vgic_reg32_update(&priority, r, info);
+    ACCESS_ONCE(*ipriorityr) = priority;
     vgic_unlock_rank(v, rank, flags);
....

Concurrent writes are protected by the rank lock, and reads are
guaranteed to be atomic due to ACCESS_ONCE and the architectural
guarantee of an aligned uint32_t access being atomic.
We can't do anything about the benign race when the priority gets
changed while we read it, but the above should make sure that we either
read the old or the new value and nothing in between.
The other read accesses (in vgic_v[23]_distr_mmio_read() and in vgic.c
here in this patch) get protected by an ACCESS_ONCE().
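
For the read side in vgic.c that would then just be (a sketch, assuming
rank->priority[] stays a byte array as today):

static int vgic_get_virq_priority(struct vcpu *v, unsigned int virq)
{
    struct vgic_irq_rank *rank = vgic_rank_irq(v, virq);

    /* Aligned single-byte read, architecturally atomic; no rank lock. */
    return ACCESS_ONCE(rank->priority[virq & INTERRUPT_RANK_MASK]);
}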

Does that make sense?

Cheers,
Andre.

> 
>>  }
>>
>>  bool vgic_migrate_irq(struct vcpu *old, struct vcpu *new, unsigned
>> int irq)
>>
> 
> Cheers,
> 


* Re: [PATCH v10 01/32] ARM: vGIC: avoid rank lock when reading priority
  2017-06-06 17:06     ` Andre Przywara
@ 2017-06-06 17:11       ` Julien Grall
  2017-06-06 17:20         ` Andre Przywara
  0 siblings, 1 reply; 96+ messages in thread
From: Julien Grall @ 2017-06-06 17:11 UTC (permalink / raw)
  To: Andre Przywara, Stefano Stabellini
  Cc: xen-devel, Vijaya Kumar K, Vijay Kilari, Shanker Donthineni



On 06/06/17 18:06, Andre Przywara wrote:
> On 30/05/17 11:47, Julien Grall wrote:
>> Hi Andre,
>>
>> On 26/05/17 18:35, Andre Przywara wrote:
>>> When reading the priority value of a virtual interrupt, we were taking
>>> the respective rank lock so far.
>>> However for forwarded interrupts (Dom0 only so far) this may lead to a
>>> deadlock with the following call chain:
>>> - MMIO access to change the IRQ affinity, calling the ITARGETSR handler
>>> - this handler takes the appropriate rank lock and calls
>>> vgic_store_itargetsr()
>>> - vgic_store_itargetsr() will eventually call vgic_migrate_irq()
>>> - if this IRQ is already in-flight, it will remove it from the old
>>>   VCPU and inject it into the new one, by calling vgic_vcpu_inject_irq()
>>> - vgic_vcpu_inject_irq will call vgic_get_virq_priority()
>>> - vgic_get_virq_priority() tries to take the rank lock - again!
>>> It seems like this code path has never been exercised before.
>>>
>>> Fix this by avoiding taking the lock in vgic_get_virq_priority() (like we
>>> do in vgic_get_target_vcpu()).
>>> Actually we are just reading one byte, and priority changes while
>>> interrupts are handled are a benign race that can happen on real hardware
>>> too. So it looks safe to just use read_atomic() instead.
>>>
>>> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
>>> ---
>>>  xen/arch/arm/vgic.c | 8 +-------
>>>  1 file changed, 1 insertion(+), 7 deletions(-)
>>>
>>> diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
>>> index 83569b0..54b2aad 100644
>>> --- a/xen/arch/arm/vgic.c
>>> +++ b/xen/arch/arm/vgic.c
>>> @@ -227,14 +227,8 @@ struct vcpu *vgic_get_target_vcpu(struct vcpu *v,
>>> unsigned int virq)
>>>  static int vgic_get_virq_priority(struct vcpu *v, unsigned int virq)
>>>  {
>>>      struct vgic_irq_rank *rank = vgic_rank_irq(v, virq);
>>> -    unsigned long flags;
>>> -    int priority;
>>> -
>>> -    vgic_lock_rank(v, rank, flags);
>>> -    priority = rank->priority[virq & INTERRUPT_RANK_MASK];
>>> -    vgic_unlock_rank(v, rank, flags);
>>>
>>> -    return priority;
>>> +    return read_atomic(&rank->priority[virq & INTERRUPT_RANK_MASK]);
>>
>> The write in rank->priority will not be atomic (see vgic_reg_update
>> implementation): the register is first masked, then the priority set.
>>
>> So you may end up to read 0 (which is the higher priority) by mistake.
>>
>> We should probably think to make vgic_reg_* helper atomic.
>
> The patch you sent yesterday may be a bit over the top for this.

Patch which I only sent internally to get feedback. ;)

>
> What about this one (in xen/arch/arm/vgic-v[23].c):
>
> static int vgic_v2_distr_mmio_write(struct vcpu *v, ...
> {
> -    uint32_t *ipriorityr;
> +    uint32_t *ipriorityr, priority;
> ....
>      vgic_lock_rank(v, rank, flags);
>      ipriorityr = &rank->ipriorityr[REG_RANK_INDEX(8,
>                                             gicd_reg - GICD_IPRIORITYR,
>                                             DABT_WORD)];
> -    vgic_reg32_update(ipriorityr, r, info);
> +    priority = ACCESS_ONCE(*ipriorityr);
> +    vgic_reg32_update(&priority, r, info);
> +    ACCESS_ONCE(*ipriorityr) = priority;
>      vgic_unlock_rank(v, rank, flags);
> ....

What you do is very similar to my patch but open-codes it. It might be
possible to have other places which could take advantage of a generic
solution, so I am not convinced this would be the right way.

>
> Concurrent writes are protected by the rank lock, and reads are
> guaranteed to be atomic due to ACCESS_ONCE and the architectural
> guarantee of an aligned uint32_t access being atomic.
> We can't do anything about the benign race when the priority gets
> changed while we read it, but the above should make sure that we either
> read the old or the new value and nothing in between.
> The other read accesses (in vgic_v[23]_distr_mmio_read() and in vgic.c
> here in this patch) get protected by an ACCESS_ONCE().
>
> Does that make sense?

I was trying to address Stefano's suggestion to drop the rank lock in some
places and also make the solution generic.

I failed to address the lock drop properly, but have some ideas how to do it.

Cheers,

-- 
Julien Grall


* Re: [PATCH v10 01/32] ARM: vGIC: avoid rank lock when reading priority
  2017-06-06 17:11       ` Julien Grall
@ 2017-06-06 17:20         ` Andre Przywara
  2017-06-06 17:21           ` Julien Grall
  0 siblings, 1 reply; 96+ messages in thread
From: Andre Przywara @ 2017-06-06 17:20 UTC (permalink / raw)
  To: Julien Grall, Stefano Stabellini
  Cc: xen-devel, Vijaya Kumar K, Vijay Kilari, Shanker Donthineni

Hi,

On 06/06/17 18:11, Julien Grall wrote:
> 
> 
> On 06/06/17 18:06, Andre Przywara wrote:
>> On 30/05/17 11:47, Julien Grall wrote:
>>> Hi Andre,
>>>
>>> On 26/05/17 18:35, Andre Przywara wrote:
>>>> When reading the priority value of a virtual interrupt, we were taking
>>>> the respective rank lock so far.
>>>> However for forwarded interrupts (Dom0 only so far) this may lead to a
>>>> deadlock with the following call chain:
>>>> - MMIO access to change the IRQ affinity, calling the ITARGETSR handler
>>>> - this handler takes the appropriate rank lock and calls
>>>> vgic_store_itargetsr()
>>>> - vgic_store_itargetsr() will eventually call vgic_migrate_irq()
>>>> - if this IRQ is already in-flight, it will remove it from the old
>>>>   VCPU and inject it into the new one, by calling
>>>> vgic_vcpu_inject_irq()
>>>> - vgic_vcpu_inject_irq will call vgic_get_virq_priority()
>>>> - vgic_get_virq_priority() tries to take the rank lock - again!
>>>> It seems like this code path has never been exercised before.
>>>>
>>>> Fix this by avoiding taking the lock in vgic_get_virq_priority()
>>>> (like we
>>>> do in vgic_get_target_vcpu()).
>>>> Actually we are just reading one byte, and priority changes while
>>>> interrupts are handled are a benign race that can happen on real
>>>> hardware
>>>> too. So it looks safe to just use read_atomic() instead.
>>>>
>>>> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
>>>> ---
>>>>  xen/arch/arm/vgic.c | 8 +-------
>>>>  1 file changed, 1 insertion(+), 7 deletions(-)
>>>>
>>>> diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
>>>> index 83569b0..54b2aad 100644
>>>> --- a/xen/arch/arm/vgic.c
>>>> +++ b/xen/arch/arm/vgic.c
>>>> @@ -227,14 +227,8 @@ struct vcpu *vgic_get_target_vcpu(struct vcpu *v,
>>>> unsigned int virq)
>>>>  static int vgic_get_virq_priority(struct vcpu *v, unsigned int virq)
>>>>  {
>>>>      struct vgic_irq_rank *rank = vgic_rank_irq(v, virq);
>>>> -    unsigned long flags;
>>>> -    int priority;
>>>> -
>>>> -    vgic_lock_rank(v, rank, flags);
>>>> -    priority = rank->priority[virq & INTERRUPT_RANK_MASK];
>>>> -    vgic_unlock_rank(v, rank, flags);
>>>>
>>>> -    return priority;
>>>> +    return read_atomic(&rank->priority[virq & INTERRUPT_RANK_MASK]);
>>>
>>> The write in rank->priority will not be atomic (see vgic_reg_update
>>> implementation): the register is first masked, then the priority set.
>>>
>>> So you may end up to read 0 (which is the higher priority) by mistake.
>>>
>>> We should probably think to make vgic_reg_* helper atomic.
>>
>> The patch you sent yesterday may be a bit over the top for this.
> 
> Patch which I only sent internally to get feedback. ;)

Ah, indeed, sorry for dragging you into the light ;-)

For the record: I am not saying that your patch was wrong, I am just a
bit reluctant to change such a central function (vgic_reg_update) when
we only care about the priority.

>>
>> What about this one (in xen/arch/arm/vgic-v[23].c):
>>
>> static int vgic_v2_distr_mmio_write(struct vcpu *v, ...
>> {
>> -    uint32_t *ipriorityr;
>> +    uint32_t *ipriorityr, priority;
>> ....
>>      vgic_lock_rank(v, rank, flags);
>>      ipriorityr = &rank->ipriorityr[REG_RANK_INDEX(8,
>>                                             gicd_reg - GICD_IPRIORITYR,
>>                                             DABT_WORD)];
>> -    vgic_reg32_update(ipriorityr, r, info);
>> +    priority = ACCESS_ONCE(*ipriorityr);
>> +    vgic_reg32_update(&priority, r, info);
>> +    ACCESS_ONCE(*ipriorityr) = priority;
>>      vgic_unlock_rank(v, rank, flags);
>> ....
> 
> What you do is very similar to my patch but open-codes it. It might be
> possible to have other places which could take advantage of a generic
> solution, so I am not convinced this would be the right way.

My idea was to confine this fix to what we care for right now.
I guess eventually it doesn't matter as we will probably get rid of the
rank and its lock anyway.

>>
>> Concurrent writes are protected by the rank lock, and reads are
>> guaranteed to be atomic due to ACCESS_ONCE and the architectural
>> guarantee of an aligned uint32_t access being atomic.
>> We can't do anything about the benign race when the priority gets
>> changed while we read it, but the above should make sure that we either
>> read the old or the new value and nothing in between.
>> The other read accesses (in vgic_v[23]_distr_mmio_read() and in vgic.c
>> here in this patch) get protected by an ACCESS_ONCE().
>>
>> Does that make sense?
> 
> I was trying to address Stefano's suggestion to drop the rank lock in some
> places and also make the solution generic.
>
> I failed to address the lock drop properly, but have some ideas how to do
> it.

Looking forward to it ....

Cheers,
Andre.


* Re: [PATCH v10 01/32] ARM: vGIC: avoid rank lock when reading priority
  2017-06-06 17:20         ` Andre Przywara
@ 2017-06-06 17:21           ` Julien Grall
  2017-06-06 18:39             ` Stefano Stabellini
  0 siblings, 1 reply; 96+ messages in thread
From: Julien Grall @ 2017-06-06 17:21 UTC (permalink / raw)
  To: Andre Przywara, Stefano Stabellini
  Cc: xen-devel, Vijaya Kumar K, Vijay Kilari, Shanker Donthineni

Hi Andre,

On 06/06/17 18:20, Andre Przywara wrote:
>>>
>>> What about this one (in xen/arch/arm/vgic-v[23].c):
>>>
>>> static int vgic_v2_distr_mmio_write(struct vcpu *v, ...
>>> {
>>> -    uint32_t *ipriorityr;
>>> +    uint32_t *ipriorityr, priority;
>>> ....
>>>      vgic_lock_rank(v, rank, flags);
>>>      ipriorityr = &rank->ipriorityr[REG_RANK_INDEX(8,
>>>                                             gicd_reg - GICD_IPRIORITYR,
>>>                                             DABT_WORD)];
>>> -    vgic_reg32_update(ipriorityr, r, info);
>>> +    priority = ACCESS_ONCE(*ipriorityr);
>>> +    vgic_reg32_update(&priority, r, info);
>>> +    ACCESS_ONCE(*ipriorityr) = priority;
>>>      vgic_unlock_rank(v, rank, flags);
>>> ....
>>
>> What you do is very similar to my patch but open-code it. It might be
>> possible to have other place which could take advantage of the a generic
>> solution, so I am not convinced this would be the right way.
>
> My idea was to confine this fix to what we care for right now.
> I guess eventually it doesn't matter as we will probably get rid of the
> rank and its lock anyway.

That's a good point. So we may want to wait until the vGIC rework to see 
what we can do here. Stefano, any opinion?

Cheers,

-- 
Julien Grall


* Re: [PATCH v10 03/32] ARM: vGIC: move irq_to_pending() calls under the VGIC VCPU lock
  2017-05-30 11:08   ` Julien Grall
  2017-05-30 21:46     ` Stefano Stabellini
@ 2017-06-06 17:24     ` Andre Przywara
  2017-06-06 18:46       ` Stefano Stabellini
  1 sibling, 1 reply; 96+ messages in thread
From: Andre Przywara @ 2017-06-06 17:24 UTC (permalink / raw)
  To: Julien Grall, Stefano Stabellini
  Cc: xen-devel, Vijaya Kumar K, Vijay Kilari, Shanker Donthineni

Hi,

On 30/05/17 12:08, Julien Grall wrote:
> Hi Andre,
> 
> On 26/05/17 18:35, Andre Przywara wrote:
>> So far irq_to_pending() is just a convenience function to lookup
>> statically allocated arrays. This will change with LPIs, which are
>> more dynamic, so the memory for their struct pending_irq might go away.
>> The proper answer to the issue of preventing stale pointers is
>> ref-counting, which requires more rework and will be introduced with
>> a later rework.
>> For now move the irq_to_pending() calls that are used with LPIs under the
>> VGIC VCPU lock, and only use the returned pointer while holding the lock.
>> This prevents the memory from being freed while we use it.
>> For the sake of completeness we take care about all irq_to_pending()
>> users, even those which later will never deal with LPIs.
>>
>> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
>> ---
>>  xen/arch/arm/gic.c  |  5 ++++-
>>  xen/arch/arm/vgic.c | 39 ++++++++++++++++++++++++++++++---------
>>  2 files changed, 34 insertions(+), 10 deletions(-)
>>
>> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
>> index da19130..dcb1783 100644
>> --- a/xen/arch/arm/gic.c
>> +++ b/xen/arch/arm/gic.c
>> @@ -402,10 +402,13 @@ static inline void gic_add_to_lr_pending(struct
>> vcpu *v, struct pending_irq *n)
>>
>>  void gic_remove_from_queues(struct vcpu *v, unsigned int virtual_irq)
>>  {
>> -    struct pending_irq *p = irq_to_pending(v, virtual_irq);
>> +    struct pending_irq *p;
>>      unsigned long flags;
>>
>>      spin_lock_irqsave(&v->arch.vgic.lock, flags);
>> +
>> +    p = irq_to_pending(v, virtual_irq);
>> +
>>      if ( !list_empty(&p->lr_queue) )
>>          list_del_init(&p->lr_queue);
>>      spin_unlock_irqrestore(&v->arch.vgic.lock, flags);
>> diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
>> index 54b2aad..69d732b 100644
>> --- a/xen/arch/arm/vgic.c
>> +++ b/xen/arch/arm/vgic.c
>> @@ -234,23 +234,29 @@ static int vgic_get_virq_priority(struct vcpu
>> *v, unsigned int virq)
>>  bool vgic_migrate_irq(struct vcpu *old, struct vcpu *new, unsigned
>> int irq)
>>  {
>>      unsigned long flags;
>> -    struct pending_irq *p = irq_to_pending(old, irq);
>> +    struct pending_irq *p;
>> +
>> +    spin_lock_irqsave(&old->arch.vgic.lock, flags);
>> +
>> +    p = irq_to_pending(old, irq);
>>
>>      /* nothing to do for virtual interrupts */
>>      if ( p->desc == NULL )
>> +    {
>> +        spin_unlock_irqrestore(&old->arch.vgic.lock, flags);
>>          return true;
>> +    }
>>
>>      /* migration already in progress, no need to do anything */
>>      if ( test_bit(GIC_IRQ_GUEST_MIGRATING, &p->status) )
>>      {
>>          gprintk(XENLOG_WARNING, "irq %u migration failed: requested
>> while in progress\n", irq);
>> +        spin_unlock_irqrestore(&old->arch.vgic.lock, flags);
>>          return false;
>>      }
>>
>>      perfc_incr(vgic_irq_migrates);
>>
>> -    spin_lock_irqsave(&old->arch.vgic.lock, flags);
>> -
>>      if ( list_empty(&p->inflight) )
>>      {
>>          irq_set_affinity(p->desc, cpumask_of(new->processor));
>> @@ -285,6 +291,13 @@ void arch_move_irqs(struct vcpu *v)
>>      struct vcpu *v_target;
>>      int i;
>>
>> +    /*
>> +     * We don't migrate LPIs at the moment.
>> +     * If we ever do, we must make sure that the struct pending_irq does
>> +     * not go away, as there is no lock preventing this here.
>> +     */
>> +    ASSERT(!is_lpi(vgic_num_irqs(d) - 1));
> 
> Hmmm? This raises the question of why vgic_num_irqs does not include the
> LPIs today...

I admit that this line is a bit puzzling. I guess vgic_num_irqs() should
be renamed to something like vgic_max_spis(), but this wouldn't be
exactly the right name either.
I will extend the comment to note that the current code assumes
vgic_num_irqs() only covers SPIs. Is that OK?
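
Something like this maybe (just a sketch of the extended comment, the code
stays as in the patch):

    /*
     * We don't migrate LPIs at the moment.
     * If we ever do, we must make sure that the struct pending_irq does
     * not go away, as there is no lock preventing this here.
     * Note that vgic_num_irqs() only covers SPIs right now, so LPIs are
     * never visited by the loop below anyway.
     */
    ASSERT(!is_lpi(vgic_num_irqs(d) - 1));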

>> +
>>      for ( i = 32; i < vgic_num_irqs(d); i++ )
>>      {
>>          v_target = vgic_get_target_vcpu(v, i);
>> @@ -299,6 +312,7 @@ void vgic_disable_irqs(struct vcpu *v, uint32_t r,
>> int n)
>>  {
>>      const unsigned long mask = r;
>>      struct pending_irq *p;
>> +    struct irq_desc *desc;
>>      unsigned int irq;
>>      unsigned long flags;
>>      int i = 0;
>> @@ -307,14 +321,19 @@ void vgic_disable_irqs(struct vcpu *v, uint32_t
>> r, int n)
>>      while ( (i = find_next_bit(&mask, 32, i)) < 32 ) {
>>          irq = i + (32 * n);
>>          v_target = vgic_get_target_vcpu(v, irq);
>> +
>> +        spin_lock_irqsave(&v_target->arch.vgic.lock, flags);
>>          p = irq_to_pending(v_target, irq);
>>          clear_bit(GIC_IRQ_GUEST_ENABLED, &p->status);
>>          gic_remove_from_queues(v_target, irq);
> 
> gic_remove_from_queues is taking the v_target vGIC lock. So you just
> introduced a deadlock. You remove it in the next patch but still, we
> should not introduce regressions even temporarily. This would make it
> difficult to bisect the series.

Good point, thanks for spotting this. I will try to address this. I
compile-tested every single patch, but didn't try to boot every one.

> TBH, I am not a big fan of spreading the mess of vGIC locking when we
> are going to rework the vGIC and know that this code will not be called
> for LPIs.
> 
> BTW, this series is not bisectable because the host ITS is only enabled
> from patch #12.

I moved that to the front now.
TBH I spotted this issue before, and had a simpler version of patch #12
to plug this. Maybe we should consider merging this one for 4.9 still,
as currently enabling the ITS in .config and running it on an ITS
machine will fail to boot Dom0.

Cheers,
Andre.


> 
>> -        if ( p->desc != NULL )
>> +        desc = p->desc;
>> +        spin_unlock_irqrestore(&v_target->arch.vgic.lock, flags);
>> +
>> +        if ( desc != NULL )
>>          {
>> -            spin_lock_irqsave(&p->desc->lock, flags);
>> -            p->desc->handler->disable(p->desc);
>> -            spin_unlock_irqrestore(&p->desc->lock, flags);
>> +            spin_lock_irqsave(&desc->lock, flags);
>> +            desc->handler->disable(desc);
>> +            spin_unlock_irqrestore(&desc->lock, flags);
>>          }
>>          i++;
>>      }
>> @@ -349,9 +368,9 @@ void vgic_enable_irqs(struct vcpu *v, uint32_t r,
>> int n)
>>      while ( (i = find_next_bit(&mask, 32, i)) < 32 ) {
>>          irq = i + (32 * n);
>>          v_target = vgic_get_target_vcpu(v, irq);
>> +        spin_lock_irqsave(&v_target->arch.vgic.lock, flags);
>>          p = irq_to_pending(v_target, irq);
>>          set_bit(GIC_IRQ_GUEST_ENABLED, &p->status);
>> -        spin_lock_irqsave(&v_target->arch.vgic.lock, flags);
>>          if ( !list_empty(&p->inflight) &&
>> !test_bit(GIC_IRQ_GUEST_VISIBLE, &p->status) )
>>              gic_raise_guest_irq(v_target, irq, p->priority);
>>          spin_unlock_irqrestore(&v_target->arch.vgic.lock, flags);
>> @@ -460,7 +479,7 @@ void vgic_clear_pending_irqs(struct vcpu *v)
>>  void vgic_vcpu_inject_irq(struct vcpu *v, unsigned int virq)
>>  {
>>      uint8_t priority;
>> -    struct pending_irq *iter, *n = irq_to_pending(v, virq);
>> +    struct pending_irq *iter, *n;
>>      unsigned long flags;
>>      bool running;
>>
>> @@ -468,6 +487,8 @@ void vgic_vcpu_inject_irq(struct vcpu *v, unsigned
>> int virq)
>>
>>      spin_lock_irqsave(&v->arch.vgic.lock, flags);
>>
>> +    n = irq_to_pending(v, virq);
>> +
>>      /* vcpu offline */
>>      if ( test_bit(_VPF_down, &v->pause_flags) )
>>      {
>>
> 
> Cheers,
> 


* Re: [PATCH v10 01/32] ARM: vGIC: avoid rank lock when reading priority
  2017-06-06 17:21           ` Julien Grall
@ 2017-06-06 18:39             ` Stefano Stabellini
  0 siblings, 0 replies; 96+ messages in thread
From: Stefano Stabellini @ 2017-06-06 18:39 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, Vijay Kilari, Andre Przywara, Vijaya Kumar K,
	xen-devel, Shanker Donthineni

On Tue, 6 Jun 2017, Julien Grall wrote:
> Hi Andre,
> 
> On 06/06/17 18:20, Andre Przywara wrote:
> > > > 
> > > > What about this one (in xen/arch/arm/vgic-v[23].c):
> > > > 
> > > > static int vgic_v2_distr_mmio_write(struct vcpu *v, ...
> > > > {
> > > > -    uint32_t *ipriorityr;
> > > > +    uint32_t *ipriorityr, priority;
> > > > ....
> > > >      vgic_lock_rank(v, rank, flags);
> > > >      ipriorityr = &rank->ipriorityr[REG_RANK_INDEX(8,
> > > >                                             gicd_reg - GICD_IPRIORITYR,
> > > >                                             DABT_WORD)];
> > > > -    vgic_reg32_update(ipriorityr, r, info);
> > > > +    priority = ACCESS_ONCE(*ipriorityr);
> > > > +    vgic_reg32_update(&priority, r, info);
> > > > +    ACCESS_ONCE(*ipriorityr) = priority;
> > > >      vgic_unlock_rank(v, rank, flags);
> > > > ....
> > > 
> > > What you do is very similar to my patch but open-codes it. It might be
> > > possible to have other places which could take advantage of a generic
> > > solution, so I am not convinced this would be the right way.
> > 
> > My idea was to confine this fix to what we care for right now.
> > I guess eventually it doesn't matter as we will probably get rid of the
> > rank and its lock anyway.
> 
> That's a good point. So we may want to wait until the vGIC rework to see what
> we can do here. Stefano, any opinion?

Either would work. Since we are at it, I would have done what Julien has
done, but doing the minimum required now is also OK. I don't think
the order of changes is important in this case given that the end result
will be the same.


* Re: [PATCH v10 03/32] ARM: vGIC: move irq_to_pending() calls under the VGIC VCPU lock
  2017-06-06 17:24     ` Andre Przywara
@ 2017-06-06 18:46       ` Stefano Stabellini
  2017-06-07 10:49         ` Andre Przywara
  0 siblings, 1 reply; 96+ messages in thread
From: Stefano Stabellini @ 2017-06-06 18:46 UTC (permalink / raw)
  To: Andre Przywara
  Cc: Stefano Stabellini, Vijay Kilari, Vijaya Kumar K, Julien Grall,
	xen-devel, Shanker Donthineni

On Tue, 6 Jun 2017, Andre Przywara wrote:
> Maybe we should consider merging this one for 4.9 still,
> as currently enabling the ITS in .config and running it on an ITS
> machine will fail to boot Dom0.

Here, you are talking about this patch, patch #3, right?

Although it should be "safe", it touches a lot of common code. I think
it is too risky to commit it now to fix an experimental feature. I would
rather wait until it is committed to staging in the 4.10 dev window,
then backport it.

 

> >> -        if ( p->desc != NULL )
> >> +        desc = p->desc;
> >> +        spin_unlock_irqrestore(&v_target->arch.vgic.lock, flags);
> >> +
> >> +        if ( desc != NULL )
> >>          {
> >> -            spin_lock_irqsave(&p->desc->lock, flags);
> >> -            p->desc->handler->disable(p->desc);
> >> -            spin_unlock_irqrestore(&p->desc->lock, flags);
> >> +            spin_lock_irqsave(&desc->lock, flags);
> >> +            desc->handler->disable(desc);
> >> +            spin_unlock_irqrestore(&desc->lock, flags);
> >>          }
> >>          i++;
> >>      }
> >> @@ -349,9 +368,9 @@ void vgic_enable_irqs(struct vcpu *v, uint32_t r,
> >> int n)
> >>      while ( (i = find_next_bit(&mask, 32, i)) < 32 ) {
> >>          irq = i + (32 * n);
> >>          v_target = vgic_get_target_vcpu(v, irq);
> >> +        spin_lock_irqsave(&v_target->arch.vgic.lock, flags);
> >>          p = irq_to_pending(v_target, irq);
> >>          set_bit(GIC_IRQ_GUEST_ENABLED, &p->status);
> >> -        spin_lock_irqsave(&v_target->arch.vgic.lock, flags);
> >>          if ( !list_empty(&p->inflight) &&
> >> !test_bit(GIC_IRQ_GUEST_VISIBLE, &p->status) )
> >>              gic_raise_guest_irq(v_target, irq, p->priority);
> >>          spin_unlock_irqrestore(&v_target->arch.vgic.lock, flags);
> >> @@ -460,7 +479,7 @@ void vgic_clear_pending_irqs(struct vcpu *v)
> >>  void vgic_vcpu_inject_irq(struct vcpu *v, unsigned int virq)
> >>  {
> >>      uint8_t priority;
> >> -    struct pending_irq *iter, *n = irq_to_pending(v, virq);
> >> +    struct pending_irq *iter, *n;
> >>      unsigned long flags;
> >>      bool running;
> >>
> >> @@ -468,6 +487,8 @@ void vgic_vcpu_inject_irq(struct vcpu *v, unsigned
> >> int virq)
> >>
> >>      spin_lock_irqsave(&v->arch.vgic.lock, flags);
> >>
> >> +    n = irq_to_pending(v, virq);
> >> +
> >>      /* vcpu offline */
> >>      if ( test_bit(_VPF_down, &v->pause_flags) )
> >>      {
> >>


* Re: [PATCH v10 03/32] ARM: vGIC: move irq_to_pending() calls under the VGIC VCPU lock
  2017-06-06 18:46       ` Stefano Stabellini
@ 2017-06-07 10:49         ` Andre Przywara
  0 siblings, 0 replies; 96+ messages in thread
From: Andre Przywara @ 2017-06-07 10:49 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: xen-devel, Julien Grall, Vijay Kilari, Shanker Donthineni,
	Vijaya Kumar K

Hi,

On 06/06/17 19:46, Stefano Stabellini wrote:
> On Tue, 6 Jun 2017, Andre Przywara wrote:
>> Maybe we should consider merging this one for 4.9 still,
>> as currently enabling the ITS in .config and running it on an ITS
>> machine will fail to boot Dom0.
> 
> Here, you are talking about this patch, patch #3, right?

No, sorry, I meant patch 12.

> Although it should be "safe", it touches a lot of common code. I think
> it is too risky to commit it now to fix an experimental feature. I would
> rather wait until it is committed to staging in the 4.10 dev window,
> then backport it.

Well, if you compile origin/staging with the ITS configured in and run
this on a machine with an ITS in the DT, it will not boot (regardless of
the ITS not being used anyway). This is because we try to map
collections on the host (using MAPC commands), but don't enable the host
ITS, so those commands are not executed and we time out.
I discovered this shortly after you merged the first ITS patches and
made a simple patch, but for some reasons this didn't end up on the
public list.

Let me send this out and we can decide whether we need this for 4.9 still.

Cheers,
Andre.

>>>> -        if ( p->desc != NULL )
>>>> +        desc = p->desc;
>>>> +        spin_unlock_irqrestore(&v_target->arch.vgic.lock, flags);
>>>> +
>>>> +        if ( desc != NULL )
>>>>          {
>>>> -            spin_lock_irqsave(&p->desc->lock, flags);
>>>> -            p->desc->handler->disable(p->desc);
>>>> -            spin_unlock_irqrestore(&p->desc->lock, flags);
>>>> +            spin_lock_irqsave(&desc->lock, flags);
>>>> +            desc->handler->disable(desc);
>>>> +            spin_unlock_irqrestore(&desc->lock, flags);
>>>>          }
>>>>          i++;
>>>>      }
>>>> @@ -349,9 +368,9 @@ void vgic_enable_irqs(struct vcpu *v, uint32_t r,
>>>> int n)
>>>>      while ( (i = find_next_bit(&mask, 32, i)) < 32 ) {
>>>>          irq = i + (32 * n);
>>>>          v_target = vgic_get_target_vcpu(v, irq);
>>>> +        spin_lock_irqsave(&v_target->arch.vgic.lock, flags);
>>>>          p = irq_to_pending(v_target, irq);
>>>>          set_bit(GIC_IRQ_GUEST_ENABLED, &p->status);
>>>> -        spin_lock_irqsave(&v_target->arch.vgic.lock, flags);
>>>>          if ( !list_empty(&p->inflight) &&
>>>> !test_bit(GIC_IRQ_GUEST_VISIBLE, &p->status) )
>>>>              gic_raise_guest_irq(v_target, irq, p->priority);
>>>>          spin_unlock_irqrestore(&v_target->arch.vgic.lock, flags);
>>>> @@ -460,7 +479,7 @@ void vgic_clear_pending_irqs(struct vcpu *v)
>>>>  void vgic_vcpu_inject_irq(struct vcpu *v, unsigned int virq)
>>>>  {
>>>>      uint8_t priority;
>>>> -    struct pending_irq *iter, *n = irq_to_pending(v, virq);
>>>> +    struct pending_irq *iter, *n;
>>>>      unsigned long flags;
>>>>      bool running;
>>>>
>>>> @@ -468,6 +487,8 @@ void vgic_vcpu_inject_irq(struct vcpu *v, unsigned
>>>> int virq)
>>>>
>>>>      spin_lock_irqsave(&v->arch.vgic.lock, flags);
>>>>
>>>> +    n = irq_to_pending(v, virq);
>>>> +
>>>>      /* vcpu offline */
>>>>      if ( test_bit(_VPF_down, &v->pause_flags) )
>>>>      {
>>>>


* Re: [PATCH v10 06/32] ARM: GIC: Add checks for NULL pointer pending_irq's
  2017-06-06 10:19     ` Andre Przywara
@ 2017-06-07 11:19       ` Julien Grall
  0 siblings, 0 replies; 96+ messages in thread
From: Julien Grall @ 2017-06-07 11:19 UTC (permalink / raw)
  To: Andre Przywara, Stefano Stabellini
  Cc: xen-devel, Vijaya Kumar K, Vijay Kilari, Shanker Donthineni



On 06/06/17 11:19, Andre Przywara wrote:
> Hi,
>
> On 30/05/17 12:38, Julien Grall wrote:
>> Hi Andre,
>>
>> On 26/05/17 18:35, Andre Przywara wrote:
>>> For LPIs the struct pending_irq's are dynamically allocated and the
>>> pointers will be stored in a radix tree. Since an LPI can be "unmapped"
>>> at any time, teach the VGIC how to deal with irq_to_pending() returning
>>> a NULL pointer.
>>> We just do nothing in this case or clean up the LR if the virtual LPI
>>> number was still in an LR.
>>>
>>> Those are all call sites for irq_to_pending(), as per:
>>> "git grep irq_to_pending", and their evaluations:
>>> (PROTECTED means: added NULL check and bailing out)
>>>
>>>     xen/arch/arm/gic.c:
>>> gic_route_irq_to_guest(): only called for SPIs, added ASSERT()
>>> gic_remove_irq_from_guest(): only called for SPIs, added ASSERT()
>>> gic_remove_from_queues(): PROTECTED, called within VCPU VGIC lock
>>
>> This function has been renamed in patch #4.
>>
>>> gic_raise_inflight_irq(): PROTECTED, called under VCPU VGIC lock
>>> gic_raise_guest_irq(): PROTECTED, called under VCPU VGIC lock
>>> gic_update_one_lr(): PROTECTED, called under VCPU VGIC lock
>>>
>>>     xen/arch/arm/vgic.c:
>>> vgic_migrate_irq(): not called for LPIs (virtual IRQs), added ASSERT()
>>> arch_move_irqs(): not iterating over LPIs, added ASSERT()
>>
>> I don't see any ASSERT added in arch_move_irqs.
>
> Because this is just checking for not being called with an LPI, it was
> introduced in patch 03/32 already. I can adjust the commit message.

Yes please.

>
> Cheers,
> Andre.
>
>>> vgic_disable_irqs(): not called for LPIs, added ASSERT()
>>> vgic_enable_irqs(): not called for LPIs, added ASSERT()
>>> vgic_vcpu_inject_irq(): PROTECTED, moved under VCPU VGIC lock
>>>
>>>     xen/include/asm-arm/event.h:
>>> local_events_need_delivery_nomask(): only called for a PPI, added
>>> ASSERT()
>>>
>>>     xen/include/asm-arm/vgic.h:
>>> (prototype)
>>
>> Cheers,
>>

-- 
Julien Grall


* Re: [PATCH v10 25/32] ARM: vITS: handle MAPTI/MAPI command
  2017-06-02 17:12   ` Julien Grall
@ 2017-06-07 17:49     ` Andre Przywara
  2017-06-12 16:33       ` Julien Grall
  2017-06-09 11:17     ` Andre Przywara
  1 sibling, 1 reply; 96+ messages in thread
From: Andre Przywara @ 2017-06-07 17:49 UTC (permalink / raw)
  To: Julien Grall, Stefano Stabellini
  Cc: xen-devel, Vijaya Kumar K, Vijay Kilari, Shanker Donthineni

Hi,

On 02/06/17 18:12, Julien Grall wrote:
> Hi Andre,
> 
> On 05/26/2017 06:35 PM, Andre Przywara wrote:
>> The MAPTI commands associates a DeviceID/EventID pair with a LPI/CPU
>> pair and actually instantiates LPI interrupts. MAPI is just a variant
>> of this command, where the LPI ID is the same as the event ID.
>> We connect the already allocated host LPI to this virtual LPI, so that
>> any triggering LPI on the host can be quickly forwarded to a guest.
>> Beside entering the domain and the virtual LPI number in the respective
>> host LPI entry, we also initialize and add the already allocated
>> struct pending_irq to our radix tree, so that we can now easily find it
>> by its virtual LPI number.
>> We also read the property table to update the enabled bit and the
>> priority for our new LPI, as we might have missed this during an earlier
>> INVALL call (which only checks mapped LPIs). But we make sure that the
>> property table is actually valid, as all redistributors might still
>> be disabled at this point.
>> Since write_itte_locked() now sees its first usage, we change the
>> declaration to static.
>>
>> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
>> ---
>>   xen/arch/arm/gic-v3-its.c        |  27 ++++++++
>>   xen/arch/arm/vgic-v3-its.c       | 138
>> ++++++++++++++++++++++++++++++++++++++-
>>   xen/include/asm-arm/gic_v3_its.h |   3 +
>>   3 files changed, 165 insertions(+), 3 deletions(-)
>>
>> diff --git a/xen/arch/arm/gic-v3-its.c b/xen/arch/arm/gic-v3-its.c
>> index 8864e0b..41fff64 100644
>> --- a/xen/arch/arm/gic-v3-its.c
>> +++ b/xen/arch/arm/gic-v3-its.c
>> @@ -876,6 +876,33 @@ int gicv3_remove_guest_event(struct domain *d,
>> paddr_t vdoorbell_address,
>>       return 0;
>>   }
>>   +/*
>> + * Connects the event ID for an already assigned device to the given
>> VCPU/vLPI
>> + * pair. The corresponding physical LPI is already mapped on the host
>> side
>> + * (when assigning the physical device to the guest), so we just
>> connect the
>> + * target VCPU/vLPI pair to that interrupt to inject it properly if
>> it fires.
>> + * Returns a pointer to the already allocated struct pending_irq that is
>> + * meant to be used by that event.
>> + */
>> +struct pending_irq *gicv3_assign_guest_event(struct domain *d,
>> +                                             paddr_t vdoorbell_address,
>> +                                             uint32_t vdevid,
>> uint32_t eventid,
>> +                                             uint32_t virt_lpi)
>> +{
>> +    struct pending_irq *pirq;
>> +    uint32_t host_lpi = 0;
> This should be INVALID_LPI and not 0.
> 
> [...]
> 
>> +/*
>> + * For a given virtual LPI read the enabled bit and priority from the
>> virtual
>> + * property table and update the virtual IRQ's state in the given
>> pending_irq.
>> + * Must be called with the respective VGIC VCPU lock held.
>> + */
>> +static int update_lpi_property(struct domain *d, struct pending_irq *p)
>> +{
>> +    paddr_t addr;
>> +    uint8_t property;
>> +    int ret;
>> +
>> +    /*
>> +     * If no redistributor has its LPIs enabled yet, we can't access the
>> +     * property table. In this case we just can't update the properties,
>> +     * but this should not be an error from an ITS point of view.
>> +     */
>> +    if ( !read_atomic(&d->arch.vgic.rdists_enabled) )
>> +        return 0;
> 
> AFAICT, there is no other place where the property would be updated.
> Should we put a warning to let the user know that the property will not be
> updated? Moreover, you don't return an error, so there is no easy way to
> know what's going on.

So I thought about this again: If we handle an INV command while the
respective redistributor has LPIs off, even hardware can't do anything,
as having LPIs disabled means that there is no valid property table.
So a hardware redistributor would probably just ignore this request.

I see us only calling update_lpi_property during command handling, so I
think it is safe to just silently ignore it, as hardware would do the same?

>> +
>> +    addr = d->arch.vgic.rdist_propbase & GENMASK(51, 12);
>> +
>> +    ret = vgic_access_guest_memory(d, addr + p->irq - LPI_OFFSET,
>> +                                   &property, sizeof(property), false);
>> +    if ( ret )
>> +        return ret;
>> +
>> +    write_atomic(&p->lpi_priority, property & LPI_PROP_PRIO_MASK);
>> +
>> +    if ( property & LPI_PROP_ENABLED )
>> +        set_bit(GIC_IRQ_GUEST_ENABLED, &p->status);
>> +    else
>> +        clear_bit(GIC_IRQ_GUEST_ENABLED, &p->status);
>> +
>> +    return 0;
>> +}
>> +
>>   /* Must be called with the ITS lock held. */
>>   static int its_discard_event(struct virt_its *its,
>>                                uint32_t vdevid, uint32_t vevid)
>> @@ -538,6 +574,98 @@ static int its_handle_mapd(struct virt_its *its,
>> uint64_t *cmdptr)
>>       return ret;
>>   }
>>   +static int its_handle_mapti(struct virt_its *its, uint64_t *cmdptr)
>> +{
> 
> [...]
> 
>> +    /*
>> +     * radix_tree_insert() returns an error either due to an internal
>> +     * condition (like memory allocation failure) or because the LPI
>> already
>> +     * existed in the tree. We don't support the latter case, so we
>> always
>> +     * cleanup and return an error here in any case.
>> +     */
>> +out_remove_host_entry:
>> +    gicv3_remove_guest_event(its->d, its->doorbell_address, devid,
>> eventid);
> 
> Can gicv3_remove_guest_event fail? If yes, should we check the
> return/add comment? If not, then we should have an ASSERT(....).

Well, as we have an "if ( !ret ) return 0;" above, we only get here with
ret being != 0, so this is in an error path already.
I am not sure how we should react here then; I think reporting the first
error is probably more meaningful.
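
So the cleanup stays a best-effort call and we keep reporting the original
error, roughly (a sketch; I am assuming the function still ends with
"return ret;" as in the patch):

out_remove_host_entry:
    /* Best-effort cleanup; ret still holds the original error. */
    gicv3_remove_guest_event(its->d, its->doorbell_address, devid, eventid);

    return ret;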

Cheers,
Andre.


* Re: [PATCH v10 18/32] ARM: vITS: introduce translation table walks
  2017-05-26 17:35 ` [PATCH v10 18/32] ARM: vITS: introduce translation table walks Andre Przywara
  2017-06-02 16:25   ` Julien Grall
@ 2017-06-08  9:35   ` Julien Grall
  2017-06-08  9:45     ` Andre Przywara
  1 sibling, 1 reply; 96+ messages in thread
From: Julien Grall @ 2017-06-08  9:35 UTC (permalink / raw)
  To: Andre Przywara, Stefano Stabellini
  Cc: xen-devel, Vijaya Kumar K, Vijay Kilari, Shanker Donthineni

Hi Andre,

On 26/05/17 18:35, Andre Przywara wrote:
> +/*
> + * Queries the collection and device tables to translate the device ID and
> + * event ID and find the appropriate ITTE. The given collection ID and the
> + * virtual LPI number are then stored into that entry.
> + * If vcpu_ptr is provided, returns the VCPU belonging to that collection.
> + * Must be called with the ITS lock held.
> + */
> +bool write_itte_locked(struct virt_its *its, uint32_t devid,
> +                       uint32_t evid, uint32_t collid, uint32_t vlpi,
> +                       struct vcpu **vcpu_ptr)
> +{
> +    paddr_t addr;
> +    struct vits_itte itte;
> +
> +    ASSERT(spin_is_locked(&its->its_lock));
> +
> +    if ( collid >= its->max_collections )
> +        return false;

This check will always fail with the command DISCARD because collid == 
UNMAPPED_COLLECTION (~0).

Looking at the code, I am not sure why you need to validate collid and
vlpi in write_itte_locked. This should have been done by the caller.

> +
> +    if ( vlpi >= its->d->arch.vgic.nr_lpis )
> +        return false;
> +
> +    addr = its_get_itte_address(its, devid, evid);
> +    if ( addr == INVALID_PADDR )
> +        return false;
> +
> +    itte.collection = collid;
> +    itte.vlpi = vlpi;
> +
> +    if ( vgic_access_guest_memory(its->d, addr, &itte, sizeof(itte), true) )
> +        return false;
> +
> +    if ( vcpu_ptr )
> +        *vcpu_ptr = get_vcpu_from_collection(its, collid);

I guess this is why you check the collection in this function. However, 
I am not sure how this is related to write_itte_locked.

Cheers,

-- 
Julien Grall


* Re: [PATCH v10 24/32] ARM: GICv3: handle unmapped LPIs
  2017-05-26 17:35 ` [PATCH v10 24/32] ARM: GICv3: handle unmapped LPIs Andre Przywara
  2017-06-02 16:55   ` Julien Grall
@ 2017-06-08  9:45   ` Julien Grall
  2017-06-08 13:51     ` Andre Przywara
  1 sibling, 1 reply; 96+ messages in thread
From: Julien Grall @ 2017-06-08  9:45 UTC (permalink / raw)
  To: Andre Przywara, Stefano Stabellini
  Cc: xen-devel, Vijaya Kumar K, Vijay Kilari, Shanker Donthineni

Hi Andre,

On 26/05/17 18:35, Andre Przywara wrote:
> +/*
> + * Find an unused LR to insert an IRQ into, starting with the LR given
> + * by @lr. If this new interrupt is a PRISTINE LPI, scan the other LRs to
> + * avoid inserting the same IRQ twice. This situation can occur when an
> + * event gets discarded while the LPI is in an LR, and a new LPI with the
> + * same number gets mapped quickly afterwards.
> + */
> +static unsigned int gic_find_unused_lr(struct vcpu *v,
> +                                       struct pending_irq *p,
> +                                       unsigned int lr)
> +{
> +    unsigned int nr_lrs = gic_hw_ops->info->nr_lrs;
> +    unsigned long *lr_mask = (unsigned long *) &this_cpu(lr_mask);
> +    struct gic_lr lr_val;
> +
> +    ASSERT(spin_is_locked(&v->arch.vgic.lock));
> +
> +    if ( test_bit(GIC_IRQ_GUEST_PRISTINE_LPI, &p->status) )
> +    {
> +        unsigned int used_lr = 0;
> +
> +        while ( (used_lr = find_next_bit(lr_mask, nr_lrs, used_lr)) < nr_lrs )

This loop is incorrect. find_next_bit will find the next set bit start 
at the offset used_lr. So if used_lr is set and does match the virq, 
this will turned into an infinite loop.

I would use the macro for_each_set_bit (see xen/bitops.h) here that will 
handle the problem for you:

for_each_set_bit( used_lr, lr_mask, nr_lrs )
{
  ....
}
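
Assuming the loop body stays as in your patch (read back each LR and reuse
it when the vLPI matches), the whole scan would then look roughly like this
(only a sketch):

    if ( test_bit(GIC_IRQ_GUEST_PRISTINE_LPI, &p->status) )
    {
        unsigned int used_lr;

        for_each_set_bit( used_lr, lr_mask, nr_lrs )
        {
            gic_hw_ops->read_lr(used_lr, &lr_val);
            if ( lr_val.virq == p->irq )
                return used_lr;
        }
    }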

Cheers,

-- 
Julien Grall


* Re: [PATCH v10 18/32] ARM: vITS: introduce translation table walks
  2017-06-08  9:35   ` Julien Grall
@ 2017-06-08  9:45     ` Andre Przywara
  0 siblings, 0 replies; 96+ messages in thread
From: Andre Przywara @ 2017-06-08  9:45 UTC (permalink / raw)
  To: Julien Grall, Stefano Stabellini
  Cc: xen-devel, Vijaya Kumar K, Vijay Kilari, Shanker Donthineni

Hi,

On 08/06/17 10:35, Julien Grall wrote:
> Hi Andre,
> 
> On 26/05/17 18:35, Andre Przywara wrote:
>> +/*
>> + * Queries the collection and device tables to translate the device
>> ID and
>> + * event ID and find the appropriate ITTE. The given collection ID
>> and the
>> + * virtual LPI number are then stored into that entry.
>> + * If vcpu_ptr is provided, returns the VCPU belonging to that
>> collection.
>> + * Must be called with the ITS lock held.
>> + */
>> +bool write_itte_locked(struct virt_its *its, uint32_t devid,
>> +                       uint32_t evid, uint32_t collid, uint32_t vlpi,
>> +                       struct vcpu **vcpu_ptr)
>> +{
>> +    paddr_t addr;
>> +    struct vits_itte itte;
>> +
>> +    ASSERT(spin_is_locked(&its->its_lock));
>> +
>> +    if ( collid >= its->max_collections )
>> +        return false;
> 
> This check will always fail with the command DISCARD because collid ==
> UNMAPPED_COLLECTION (~0).

Good point, thanks for catching this.

> Looking at the code, I am not sure why you need to validate collid and
> vlpi in write_itte_locked. This should have been done by the caller.

Indeed, I just fixed that.

>> +
>> +    if ( vlpi >= its->d->arch.vgic.nr_lpis )
>> +        return false;
>> +
>> +    addr = its_get_itte_address(its, devid, evid);
>> +    if ( addr == INVALID_PADDR )
>> +        return false;
>> +
>> +    itte.collection = collid;
>> +    itte.vlpi = vlpi;
>> +
>> +    if ( vgic_access_guest_memory(its->d, addr, &itte, sizeof(itte),
>> true) )
>> +        return false;
>> +
>> +    if ( vcpu_ptr )
>> +        *vcpu_ptr = get_vcpu_from_collection(its, collid);
> 
> I guess this is why you check the collection in this function. However,
> I am not sure how this is related to write_itte_locked.

Looks like an artefact from a previous revision of the code
(possibly due to some locking, where someone was calling write_itte(),
which would drop the lock before returning).

But indeed this is independent and actually there is only one user of
this, so I removed the vcpu_ptr parameter and do the lookup outside of
this function.

On the way I removed this whole {read,write}_itte_locked naming, just
providing functions which assume we hold the lock. There is only one
caller which didn't hold the lock, so I can just do the locking there.
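
For that one caller the locking then simply wraps the call, something like
(a sketch, assuming write_itte() keeps the current arguments minus the
vcpu_ptr one):

    spin_lock(&its->its_lock);
    ret = write_itte(its, devid, eventid, collid, vlpi);
    spin_unlock(&its->its_lock);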

Thanks for pointing this out!

Cheers,
Andre.


* Re: [PATCH v10 17/32] ARM: vITS: add command handling stub and MMIO emulation
  2017-05-26 17:35 ` [PATCH v10 17/32] ARM: vITS: add command handling stub and MMIO emulation Andre Przywara
  2017-06-01 18:13   ` Julien Grall
@ 2017-06-08  9:57   ` Julien Grall
  1 sibling, 0 replies; 96+ messages in thread
From: Julien Grall @ 2017-06-08  9:57 UTC (permalink / raw)
  To: Andre Przywara, Stefano Stabellini
  Cc: xen-devel, Vijaya Kumar K, Vijay Kilari, Shanker Donthineni

Hi Andre,

On 26/05/17 18:35, Andre Przywara wrote:
> +/**************************************
> + * Functions that handle ITS commands *
> + **************************************/
> +
> +static uint64_t its_cmd_mask_field(uint64_t *its_cmd, unsigned int word,
> +                                   unsigned int shift, unsigned int size)
> +{
> +    return ((its_cmd[word])) >> (shift & (BIT(size) - 1));

This code is wrong. The mask applies to shift and not to the shifted 
result. I fixed it with:

return (its_cmd[word] >> shift) & (BIT(size) - 1);

Though, BIT(size) - 1 would not work if size == 64. I would replace it with 
GENMASK(size - 1, 0) to be safe in all cases.
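
Putting both together, a corrected helper could look roughly like this
(just a sketch, assuming Xen's GENMASK(h, l) macro):

static uint64_t its_cmd_mask_field(uint64_t *its_cmd, unsigned int word,
                                   unsigned int shift, unsigned int size)
{
    /* Shift the field down first, then mask it to the requested width. */
    return (its_cmd[word] >> shift) & GENMASK(size - 1, 0);
}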

[...]

> +static int vgic_its_handle_cmds(struct domain *d, struct virt_its *its)
> +{
> +    paddr_t addr = its->cbaser & GENMASK(51, 12);
> +    uint64_t command[4];
> +
> +    ASSERT(spin_is_locked(&its->vcmd_lock));
> +
> +    if ( its->cwriter >= ITS_CMD_BUFFER_SIZE(its->cbaser) )
> +        return -1;
> +
> +    while ( its->creadr != its->cwriter )
> +    {
> +        int ret;
> +
> +        ret = vgic_access_guest_memory(d, addr + its->creadr,
> +                                       command, sizeof(command), false);
> +        if ( ret )
> +            return ret;
> +
> +        switch ( its_cmd_get_command(command) )
> +        {
> +        case GITS_CMD_SYNC:
> +            /* We handle ITS commands synchronously, so we ignore SYNC. */
> +            break;
> +        default:
> +            gdprintk(XENLOG_WARNING, "vGITS: unhandled ITS command %lu\n",
> +                     its_cmd_get_command(command));

Would it be possible to dump the full command (i.e. command[0...3]) here 
and ...

> +            break;
> +        }
> +
> +        write_u64_atomic(&its->creadr, (its->creadr + ITS_CMD_SIZE) %
> +                         ITS_CMD_BUFFER_SIZE(its->cbaser));
> +
> +        if ( ret )
> +            gdprintk(XENLOG_WARNING,
> +                     "vGITS: ITS command error %d while handling command %lu\n",
> +                     ret, its_cmd_get_command(command));

... here?

This could be helpful whilst debugging vITS command emulation.
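
Something along these lines would do (just a sketch, format to taste):

gdprintk(XENLOG_WARNING,
         "vGITS: unhandled ITS command: %016lx %016lx %016lx %016lx\n",
         command[0], command[1], command[2], command[3]);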

Cheers,

-- 
Julien Grall


* Re: [PATCH v10 24/32] ARM: GICv3: handle unmapped LPIs
  2017-06-08  9:45   ` Julien Grall
@ 2017-06-08 13:51     ` Andre Przywara
  0 siblings, 0 replies; 96+ messages in thread
From: Andre Przywara @ 2017-06-08 13:51 UTC (permalink / raw)
  To: Julien Grall, Stefano Stabellini
  Cc: xen-devel, Vijaya Kumar K, Vijay Kilari, Shanker Donthineni

Hi,

On 08/06/17 10:45, Julien Grall wrote:
> Hi Andre,
> 
> On 26/05/17 18:35, Andre Przywara wrote:
>> +/*
>> + * Find an unused LR to insert an IRQ into, starting with the LR given
>> + * by @lr. If this new interrupt is a PRISTINE LPI, scan the other
>> LRs to
>> + * avoid inserting the same IRQ twice. This situation can occur when an
>> + * event gets discarded while the LPI is in an LR, and a new LPI with
>> the
>> + * same number gets mapped quickly afterwards.
>> + */
>> +static unsigned int gic_find_unused_lr(struct vcpu *v,
>> +                                       struct pending_irq *p,
>> +                                       unsigned int lr)
>> +{
>> +    unsigned int nr_lrs = gic_hw_ops->info->nr_lrs;
>> +    unsigned long *lr_mask = (unsigned long *) &this_cpu(lr_mask);
>> +    struct gic_lr lr_val;
>> +
>> +    ASSERT(spin_is_locked(&v->arch.vgic.lock));
>> +
>> +    if ( test_bit(GIC_IRQ_GUEST_PRISTINE_LPI, &p->status) )
>> +    {
>> +        unsigned int used_lr = 0;
>> +
>> +        while ( (used_lr = find_next_bit(lr_mask, nr_lrs, used_lr)) <
>> nr_lrs )
> 
> This loop is incorrect. find_next_bit will find the next set bit starting
> at the offset used_lr. So if the bit at used_lr is set but its LR does not
> match the virq, this will turn into an infinite loop.

Thanks for catching this! Fixed as hinted below.

Cheers,
Andre.


> I would use the macro for_each_set_bit (see xen/bitops.h) here that will
> handle the problem for you:
> 
> for_each_set_bit( used_lr, lr_mask, nr_lrs )
> {
>  ....
> }
> 
> Cheers,
> 


* Re: [PATCH v10 25/32] ARM: vITS: handle MAPTI/MAPI command
  2017-06-02 17:12   ` Julien Grall
  2017-06-07 17:49     ` Andre Przywara
@ 2017-06-09 11:17     ` Andre Przywara
  2017-06-09 19:14       ` Stefano Stabellini
  1 sibling, 1 reply; 96+ messages in thread
From: Andre Przywara @ 2017-06-09 11:17 UTC (permalink / raw)
  To: Julien Grall, Stefano Stabellini
  Cc: xen-devel, Vijaya Kumar K, Vijay Kilari, Shanker Donthineni

Hi,

On 02/06/17 18:12, Julien Grall wrote:
> Hi Andre,
> 
> On 05/26/2017 06:35 PM, Andre Przywara wrote:
>> The MAPTI command associates a DeviceID/EventID pair with an LPI/CPU
>> pair and actually instantiates LPI interrupts. MAPI is just a variant
>> of this command, where the LPI ID is the same as the event ID.
>> We connect the already allocated host LPI to this virtual LPI, so that
>> any triggering LPI on the host can be quickly forwarded to a guest.
>> Besides entering the domain and the virtual LPI number in the respective
>> host LPI entry, we also initialize and add the already allocated
>> struct pending_irq to our radix tree, so that we can now easily find it
>> by its virtual LPI number.
>> We also read the property table to update the enabled bit and the
>> priority for our new LPI, as we might have missed this during an earlier
>> INVALL call (which only checks mapped LPIs). But we make sure that the
>> property table is actually valid, as all redistributors might still
>> be disabled at this point.
>> Since write_itte_locked() now sees its first usage, we change the
>> declaration to static.
>>
>> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
>> ---
>>   xen/arch/arm/gic-v3-its.c        |  27 ++++++++
>>   xen/arch/arm/vgic-v3-its.c       | 138
>> ++++++++++++++++++++++++++++++++++++++-
>>   xen/include/asm-arm/gic_v3_its.h |   3 +
>>   3 files changed, 165 insertions(+), 3 deletions(-)
>>
>> diff --git a/xen/arch/arm/gic-v3-its.c b/xen/arch/arm/gic-v3-its.c
>> index 8864e0b..41fff64 100644
>> --- a/xen/arch/arm/gic-v3-its.c
>> +++ b/xen/arch/arm/gic-v3-its.c
>> @@ -876,6 +876,33 @@ int gicv3_remove_guest_event(struct domain *d,
>> paddr_t vdoorbell_address,
>>       return 0;
>>   }
>>   +/*
>> + * Connects the event ID for an already assigned device to the given
>> VCPU/vLPI
>> + * pair. The corresponding physical LPI is already mapped on the host
>> side
>> + * (when assigning the physical device to the guest), so we just
>> connect the
>> + * target VCPU/vLPI pair to that interrupt to inject it properly if
>> it fires.
>> + * Returns a pointer to the already allocated struct pending_irq that is
>> + * meant to be used by that event.
>> + */
>> +struct pending_irq *gicv3_assign_guest_event(struct domain *d,
>> +                                             paddr_t vdoorbell_address,
>> +                                             uint32_t vdevid,
>> uint32_t eventid,
>> +                                             uint32_t virt_lpi)
>> +{
>> +    struct pending_irq *pirq;
>> +    uint32_t host_lpi = 0;
> This should be INVALID_LPI and not 0.
> 
> [...]
> 
>> +/*
>> + * For a given virtual LPI read the enabled bit and priority from the
>> virtual
>> + * property table and update the virtual IRQ's state in the given
>> pending_irq.
>> + * Must be called with the respective VGIC VCPU lock held.
>> + */
>> +static int update_lpi_property(struct domain *d, struct pending_irq *p)
>> +{
>> +    paddr_t addr;
>> +    uint8_t property;
>> +    int ret;
>> +
>> +    /*
>> +     * If no redistributor has its LPIs enabled yet, we can't access the
>> +     * property table. In this case we just can't update the properties,
>> +     * but this should not be an error from an ITS point of view.
>> +     */
>> +    if ( !read_atomic(&d->arch.vgic.rdists_enabled) )
>> +        return 0;

I was just looking at rdists_enabled, and think that using read_atomic()
is a red herring.
First rdists_enabled is a bool, so I have a hard time to imagine how it
could be read non-atomically.
I think the intention of making this read "somewhat special" was to
cater for the fact that we write it under the domain lock, but read it
here without taking it. But I think for this case we don't need any
special read version, and anyway an *atomic* read would not help here.

What we want is to make sure that rdist_propbase is valid before we see
rdists_enabled gets true, this is what this check here is for. This
should be solved by a write barrier between the two on the other side.

Looking at Linux' memory_barriers.txt my understanding is that the
matching barrier on the read side does not necessarily need to be an
explicit barrier instruction, it could be a control flow dependency as
well. And here we have that: we check rdists_enabled and bail out if
it's not set, so neither the compiler nor the CPU can reorder this (as
this would violate program semantics).

Also rdists_enabled is a bit special in that it never gets reset once it
became true, very much like the LPI enablement in the GICv3 spec.

So I think we can really go with a normal read plus a comment.
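
As a rough sketch of the pairing I have in mind (illustrative only, reusing
the existing rdist_propbase/rdists_enabled fields and Xen's smp_wmb(); "reg"
stands for the guest-written PROPBASER value):

/* Writer side, under the domain lock: publish the table before the flag. */
d->arch.vgic.rdist_propbase = reg;
smp_wmb();
d->arch.vgic.rdists_enabled = true;

/* Reader side (update_lpi_property()): the early return is the control
 * flow dependency mentioned above, so a plain read would do. */
if ( !d->arch.vgic.rdists_enabled )
    return 0;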

Does that make sense?

Cheers,
Andre.


* Re: [PATCH v10 25/32] ARM: vITS: handle MAPTI/MAPI command
  2017-06-09 11:17     ` Andre Przywara
@ 2017-06-09 19:14       ` Stefano Stabellini
  2017-06-12 16:10         ` Andre Przywara
  0 siblings, 1 reply; 96+ messages in thread
From: Stefano Stabellini @ 2017-06-09 19:14 UTC (permalink / raw)
  To: Andre Przywara
  Cc: Stefano Stabellini, Vijay Kilari, Vijaya Kumar K, Julien Grall,
	xen-devel, Shanker Donthineni

On Fri, 9 Jun 2017, Andre Przywara wrote:
> On 02/06/17 18:12, Julien Grall wrote:
> > Hi Andre,
> > 
> > On 05/26/2017 06:35 PM, Andre Przywara wrote:
> >> The MAPTI command associates a DeviceID/EventID pair with an LPI/CPU
> >> pair and actually instantiates LPI interrupts. MAPI is just a variant
> >> of this command, where the LPI ID is the same as the event ID.
> >> We connect the already allocated host LPI to this virtual LPI, so that
> >> any triggering LPI on the host can be quickly forwarded to a guest.
> >> Besides entering the domain and the virtual LPI number in the respective
> >> host LPI entry, we also initialize and add the already allocated
> >> struct pending_irq to our radix tree, so that we can now easily find it
> >> by its virtual LPI number.
> >> We also read the property table to update the enabled bit and the
> >> priority for our new LPI, as we might have missed this during an earlier
> >> INVALL call (which only checks mapped LPIs). But we make sure that the
> >> property table is actually valid, as all redistributors might still
> >> be disabled at this point.
> >> Since write_itte_locked() now sees its first usage, we change the
> >> declaration to static.
> >>
> >> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> >> ---
> >>   xen/arch/arm/gic-v3-its.c        |  27 ++++++++
> >>   xen/arch/arm/vgic-v3-its.c       | 138
> >> ++++++++++++++++++++++++++++++++++++++-
> >>   xen/include/asm-arm/gic_v3_its.h |   3 +
> >>   3 files changed, 165 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/xen/arch/arm/gic-v3-its.c b/xen/arch/arm/gic-v3-its.c
> >> index 8864e0b..41fff64 100644
> >> --- a/xen/arch/arm/gic-v3-its.c
> >> +++ b/xen/arch/arm/gic-v3-its.c
> >> @@ -876,6 +876,33 @@ int gicv3_remove_guest_event(struct domain *d,
> >> paddr_t vdoorbell_address,
> >>       return 0;
> >>   }
> >>   +/*
> >> + * Connects the event ID for an already assigned device to the given
> >> VCPU/vLPI
> >> + * pair. The corresponding physical LPI is already mapped on the host
> >> side
> >> + * (when assigning the physical device to the guest), so we just
> >> connect the
> >> + * target VCPU/vLPI pair to that interrupt to inject it properly if
> >> it fires.
> >> + * Returns a pointer to the already allocated struct pending_irq that is
> >> + * meant to be used by that event.
> >> + */
> >> +struct pending_irq *gicv3_assign_guest_event(struct domain *d,
> >> +                                             paddr_t vdoorbell_address,
> >> +                                             uint32_t vdevid,
> >> uint32_t eventid,
> >> +                                             uint32_t virt_lpi)
> >> +{
> >> +    struct pending_irq *pirq;
> >> +    uint32_t host_lpi = 0;
> > This should be INVALID_LPI and not 0.
> > 
> > [...]
> > 
> >> +/*
> >> + * For a given virtual LPI read the enabled bit and priority from the
> >> virtual
> >> + * property table and update the virtual IRQ's state in the given
> >> pending_irq.
> >> + * Must be called with the respective VGIC VCPU lock held.
> >> + */
> >> +static int update_lpi_property(struct domain *d, struct pending_irq *p)
> >> +{
> >> +    paddr_t addr;
> >> +    uint8_t property;
> >> +    int ret;
> >> +
> >> +    /*
> >> +     * If no redistributor has its LPIs enabled yet, we can't access the
> >> +     * property table. In this case we just can't update the properties,
> >> +     * but this should not be an error from an ITS point of view.
> >> +     */
> >> +    if ( !read_atomic(&d->arch.vgic.rdists_enabled) )
> >> +        return 0;
> 
> I was just looking at rdists_enabled, and think that using read_atomic()
> is a red herring.
> First rdists_enabled is a bool, so I have a hard time to imagine how it
> could be read non-atomically.

This is not a good argument, because if we want the read to be atomic,
then we need to be using one of the _atomic functions regardless of the
type.


> I think the intention of making this read "somewhat special" was to
> cater for the fact that we write it under the domain lock, but read it
> here without taking it.

I haven't looked at the specifics of rdists_enabled in this
implementation, but be aware that in general writing a variable under a
lock and reading it atomically is not safe. You either read and write
under a lock, or read and write atomically.


> But I think for this case we don't need any
> special read version, and anyway an *atomic* read would not help here.
> 
> What we want is to make sure that rdist_propbase is valid before we see
> rdists_enabled gets true, this is what this check here is for. This
> should be solved by a write barrier between the two on the other side.
> 
> Looking at Linux' memory_barriers.txt my understanding is that the
> matching barrier on the read side does not necessarily need to be an
> explicit barrier instruction, it could be a control flow dependency as
> well. And here we have that: we check rdists_enabled and bail out if
> it's not set, so neither the compiler nor the CPU can reorder this (as
> this would violate program semantics).

I think that this is true.


> Also rdists_enabled is a bit special in that it never gets reset once it
> became true, very much like the LPI enablement in the GICv3 spec.
> 
> So I think we can really go with a normal read plus a comment.
> 
> Does that make sense?

The first motivation isn't right, but I think that the latter
explanation makes sense.


* Re: [PATCH v10 25/32] ARM: vITS: handle MAPTI/MAPI command
  2017-06-09 19:14       ` Stefano Stabellini
@ 2017-06-12 16:10         ` Andre Przywara
  0 siblings, 0 replies; 96+ messages in thread
From: Andre Przywara @ 2017-06-12 16:10 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: xen-devel, Julien Grall, Vijay Kilari, Shanker Donthineni,
	Vijaya Kumar K

Hi,

On 09/06/17 20:14, Stefano Stabellini wrote:
> On Fri, 9 Jun 2017, Andre Przywara wrote:
>> On 02/06/17 18:12, Julien Grall wrote:
>>> Hi Andre,
>>>
>>> On 05/26/2017 06:35 PM, Andre Przywara wrote:
>>>> The MAPTI command associates a DeviceID/EventID pair with an LPI/CPU
>>>> pair and actually instantiates LPI interrupts. MAPI is just a variant
>>>> of this command, where the LPI ID is the same as the event ID.
>>>> We connect the already allocated host LPI to this virtual LPI, so that
>>>> any triggering LPI on the host can be quickly forwarded to a guest.
>>>> Besides entering the domain and the virtual LPI number in the respective
>>>> host LPI entry, we also initialize and add the already allocated
>>>> struct pending_irq to our radix tree, so that we can now easily find it
>>>> by its virtual LPI number.
>>>> We also read the property table to update the enabled bit and the
>>>> priority for our new LPI, as we might have missed this during an earlier
>>>> INVALL call (which only checks mapped LPIs). But we make sure that the
>>>> property table is actually valid, as all redistributors might still
>>>> be disabled at this point.
>>>> Since write_itte_locked() now sees its first usage, we change the
>>>> declaration to static.
>>>>
>>>> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
>>>> ---
>>>>   xen/arch/arm/gic-v3-its.c        |  27 ++++++++
>>>>   xen/arch/arm/vgic-v3-its.c       | 138
>>>> ++++++++++++++++++++++++++++++++++++++-
>>>>   xen/include/asm-arm/gic_v3_its.h |   3 +
>>>>   3 files changed, 165 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/xen/arch/arm/gic-v3-its.c b/xen/arch/arm/gic-v3-its.c
>>>> index 8864e0b..41fff64 100644
>>>> --- a/xen/arch/arm/gic-v3-its.c
>>>> +++ b/xen/arch/arm/gic-v3-its.c
>>>> @@ -876,6 +876,33 @@ int gicv3_remove_guest_event(struct domain *d,
>>>> paddr_t vdoorbell_address,
>>>>       return 0;
>>>>   }
>>>>   +/*
>>>> + * Connects the event ID for an already assigned device to the given
>>>> VCPU/vLPI
>>>> + * pair. The corresponding physical LPI is already mapped on the host
>>>> side
>>>> + * (when assigning the physical device to the guest), so we just
>>>> connect the
>>>> + * target VCPU/vLPI pair to that interrupt to inject it properly if
>>>> it fires.
>>>> + * Returns a pointer to the already allocated struct pending_irq that is
>>>> + * meant to be used by that event.
>>>> + */
>>>> +struct pending_irq *gicv3_assign_guest_event(struct domain *d,
>>>> +                                             paddr_t vdoorbell_address,
>>>> +                                             uint32_t vdevid,
>>>> uint32_t eventid,
>>>> +                                             uint32_t virt_lpi)
>>>> +{
>>>> +    struct pending_irq *pirq;
>>>> +    uint32_t host_lpi = 0;
>>> This should be INVALID_LPI and not 0.
>>>
>>> [...]
>>>
>>>> +/*
>>>> + * For a given virtual LPI read the enabled bit and priority from the
>>>> virtual
>>>> + * property table and update the virtual IRQ's state in the given
>>>> pending_irq.
>>>> + * Must be called with the respective VGIC VCPU lock held.
>>>> + */
>>>> +static int update_lpi_property(struct domain *d, struct pending_irq *p)
>>>> +{
>>>> +    paddr_t addr;
>>>> +    uint8_t property;
>>>> +    int ret;
>>>> +
>>>> +    /*
>>>> +     * If no redistributor has its LPIs enabled yet, we can't access the
>>>> +     * property table. In this case we just can't update the properties,
>>>> +     * but this should not be an error from an ITS point of view.
>>>> +     */
>>>> +    if ( !read_atomic(&d->arch.vgic.rdists_enabled) )
>>>> +        return 0;
>>
>> I was just looking at rdists_enabled, and think that using read_atomic()
>> is a red herring.
>> First rdists_enabled is a bool, so I have a hard time to imagine how it
>> could be read non-atomically.
> 
> This is not a good argument, because if we want the read to be atomic,
> then we need to be using one of the _atomic functions regardless of the
> type.

OK, I see your point there. For the record (and to explain my ignorance
;-) I think it applies to a strict C standard point of view only. For
all practical purposes I think a read into a variable of a native data type
(especially that of a 1-byte sized bool) is always atomic on arm and
arm64 - especially with the load/store architecture of ARM. Also I have
a hard time imagining intermediate values for a bool ;-), especially
since rdists_enabled only goes from false to true once in a domain's
lifetime.
But nevertheless I can see and agree that to be C standard compliant we
should use read_atomic().
Which makes me wonder whether that would be true for other places in the
code as well ...

Another point of confusion may be that read_atomic() on its own does not
seem to be enough here, since we also need the barrier mechanism, maybe
even ACCESS_ONCE. But as I mentioned below this should be covered by the
control flow guarantee in this case.

I wonder if we should brainstorm whether the usage of the atomic operations,
the barriers and ACCESS_ONCE is really correct in the current Xen code.
Comparing those to their Linux counterparts at least shows some differences.

>> I think the intention of making this read "somewhat special" was to
>> cater for the fact that we write it under the domain lock, but read it
>> here without taking it.
> 
> I haven't looked at the specifics of rdists_enabled in this
> implementation, but be aware that in general writing a variable under a
> lock and reading it atomically is not safe. You either read and write
> under a lock, or read and write atomically.

Agreed. rdists_enabled may be special here because it's a bool and only
goes from false (initialized value) to true once (there is only one
rdists_enabled assignment, which sets it to true).

>> But I think for this case we don't need any
>> special read version, and anyway an *atomic* read would not help here.
>>
>> What we want is to make sure that rdist_propbase is valid before we see
>> rdists_enabled gets true, this is what this check here is for. This
>> should be solved by a write barrier between the two on the other side.
>>
>> Looking at Linux' memory_barriers.txt my understanding is that the
>> matching barrier on the read side does not necessarily need to be an
>> explicit barrier instruction, it could be a control flow dependency as
>> well. And here we have that: we check rdists_enabled and bail out if
>> it's not set, so neither the compiler nor the CPU can reorder this (as
>> this would violate program semantics).
> 
> I think that this is true.
> 
> 
>> Also rdists_enabled is a bit special in that it never gets reset once it
>> became true, very much like the LPI enablement in the GICv3 spec.
>>
>> So I think we can really go with a normal read plus a comment.
>>
>> Does that make sense?
> 
> The first motivation isn't right, but I think that the latter
> explanation makes sense.

OK, I think I agree with that.

Cheers,
Andre.


* Re: [PATCH v10 25/32] ARM: vITS: handle MAPTI/MAPI command
  2017-06-07 17:49     ` Andre Przywara
@ 2017-06-12 16:33       ` Julien Grall
  0 siblings, 0 replies; 96+ messages in thread
From: Julien Grall @ 2017-06-12 16:33 UTC (permalink / raw)
  To: Andre Przywara, Stefano Stabellini
  Cc: xen-devel, Vijaya Kumar K, Vijay Kilari, Shanker Donthineni

Hi Andre,

On 07/06/17 18:49, Andre Przywara wrote:
> On 02/06/17 18:12, Julien Grall wrote:
>> On 05/26/2017 06:35 PM, Andre Przywara wrote:
>>> +    /*
>>> +     * radix_tree_insert() returns an error either due to an internal
>>> +     * condition (like memory allocation failure) or because the LPI
>>> already
>>> +     * existed in the tree. We don't support the latter case, so we
>>> always
>>> +     * cleanup and return an error here in any case.
>>> +     */
>>> +out_remove_host_entry:
>>> +    gicv3_remove_guest_event(its->d, its->doorbell_address, devid,
>>> eventid);
>>
>> Can gicv3_remove_guest_event fail? If yes, should we check the
>> return/add comment? If not, then we should have an ASSERT(....).
>
> Well, as we have an "if ( !ret ) return 0;" above, we only get here with
> ret being != 0, so this is in an error path already.
> I am not sure how we should react here then; I think reporting the first
> error is probably more meaningful.

Good point. I missed the if ( !ret ) return 0; in the code.

Cheers,

-- 
Julien Grall


end of thread, other threads:[~2017-06-12 16:33 UTC | newest]

Thread overview: 96+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-26 17:35 [PATCH v10 00/32] arm64: Dom0 ITS emulation Andre Przywara
2017-05-26 17:35 ` [PATCH v10 01/32] ARM: vGIC: avoid rank lock when reading priority Andre Przywara
2017-05-30 10:47   ` Julien Grall
2017-05-30 21:39     ` Stefano Stabellini
2017-05-31 10:42       ` Julien Grall
2017-06-02 17:44         ` Julien Grall
2017-06-06 17:06     ` Andre Przywara
2017-06-06 17:11       ` Julien Grall
2017-06-06 17:20         ` Andre Przywara
2017-06-06 17:21           ` Julien Grall
2017-06-06 18:39             ` Stefano Stabellini
2017-05-26 17:35 ` [PATCH v10 02/32] ARM: GICv3: setup number of LPI bits for a GICv3 guest Andre Przywara
2017-05-30 10:54   ` Julien Grall
2017-06-06 10:19     ` Andre Przywara
2017-05-26 17:35 ` [PATCH v10 03/32] ARM: vGIC: move irq_to_pending() calls under the VGIC VCPU lock Andre Przywara
2017-05-30 11:08   ` Julien Grall
2017-05-30 21:46     ` Stefano Stabellini
2017-05-31 10:44       ` Julien Grall
2017-06-06 17:24     ` Andre Przywara
2017-06-06 18:46       ` Stefano Stabellini
2017-06-07 10:49         ` Andre Przywara
2017-05-26 17:35 ` [PATCH v10 04/32] ARM: vGIC: rework gic_remove_from_queues() Andre Przywara
2017-05-30 11:15   ` Julien Grall
2017-05-26 17:35 ` [PATCH v10 05/32] ARM: vGIC: introduce gic_remove_irq() Andre Przywara
2017-05-30 11:31   ` Julien Grall
2017-06-06 10:19     ` Andre Przywara
2017-05-26 17:35 ` [PATCH v10 06/32] ARM: GIC: Add checks for NULL pointer pending_irq's Andre Przywara
2017-05-30 11:38   ` Julien Grall
2017-06-06 10:19     ` Andre Przywara
2017-06-07 11:19       ` Julien Grall
2017-05-26 17:35 ` [PATCH v10 07/32] ARM: GICv3: introduce separate pending_irq structs for LPIs Andre Przywara
2017-05-26 17:35 ` [PATCH v10 08/32] ARM: GIC: export and extend vgic_init_pending_irq() Andre Przywara
2017-05-26 17:35 ` [PATCH v10 09/32] ARM: vGIC: cache virtual LPI priority in struct pending_irq Andre Przywara
2017-05-26 17:35 ` [PATCH v10 10/32] ARM: vGIC: add LPI VCPU ID to " Andre Przywara
2017-05-26 17:35 ` [PATCH v10 11/32] ARM: GICv3: forward pending LPIs to guests Andre Przywara
2017-05-30 11:56   ` Julien Grall
2017-05-30 22:07     ` Stefano Stabellini
2017-05-31 11:09       ` Julien Grall
2017-05-31 17:56         ` Stefano Stabellini
2017-05-31 18:39           ` Julien Grall
2017-05-26 17:35 ` [PATCH v10 12/32] ARM: GICv3: enable ITS and LPIs on the host Andre Przywara
2017-05-30 11:58   ` Julien Grall
2017-05-26 17:35 ` [PATCH v10 13/32] ARM: vGICv3: handle virtual LPI pending and property tables Andre Przywara
2017-05-26 17:35 ` [PATCH v10 14/32] ARM: introduce vgic_access_guest_memory() Andre Przywara
2017-05-26 17:35 ` [PATCH v10 15/32] ARM: vGICv3: re-use vgic_reg64_check_access Andre Przywara
2017-05-26 17:35 ` [PATCH v10 16/32] ARM: vGIC: advertise LPI support Andre Przywara
2017-05-30 12:59   ` Julien Grall
2017-05-26 17:35 ` [PATCH v10 17/32] ARM: vITS: add command handling stub and MMIO emulation Andre Przywara
2017-06-01 18:13   ` Julien Grall
2017-06-08  9:57   ` Julien Grall
2017-05-26 17:35 ` [PATCH v10 18/32] ARM: vITS: introduce translation table walks Andre Przywara
2017-06-02 16:25   ` Julien Grall
2017-06-08  9:35   ` Julien Grall
2017-06-08  9:45     ` Andre Przywara
2017-05-26 17:35 ` [PATCH v10 19/32] ARM: vITS: provide access to struct pending_irq Andre Przywara
2017-06-02 16:32   ` Julien Grall
2017-06-02 16:45     ` Julien Grall
2017-06-06 10:19     ` Andre Przywara
2017-06-06 11:13       ` Julien Grall
2017-05-26 17:35 ` [PATCH v10 20/32] ARM: vITS: handle INT command Andre Przywara
2017-06-02 16:37   ` Julien Grall
2017-05-26 17:35 ` [PATCH v10 21/32] ARM: vITS: handle MAPC command Andre Przywara
2017-05-26 17:35 ` [PATCH v10 22/32] ARM: vITS: handle CLEAR command Andre Przywara
2017-06-02 16:40   ` Julien Grall
2017-05-26 17:35 ` [PATCH v10 23/32] ARM: vITS: handle MAPD command Andre Przywara
2017-06-02 16:46   ` Julien Grall
2017-05-26 17:35 ` [PATCH v10 24/32] ARM: GICv3: handle unmapped LPIs Andre Przywara
2017-06-02 16:55   ` Julien Grall
2017-06-02 20:45     ` Stefano Stabellini
2017-06-08  9:45   ` Julien Grall
2017-06-08 13:51     ` Andre Przywara
2017-05-26 17:35 ` [PATCH v10 25/32] ARM: vITS: handle MAPTI/MAPI command Andre Przywara
2017-06-02 17:12   ` Julien Grall
2017-06-07 17:49     ` Andre Przywara
2017-06-12 16:33       ` Julien Grall
2017-06-09 11:17     ` Andre Przywara
2017-06-09 19:14       ` Stefano Stabellini
2017-06-12 16:10         ` Andre Przywara
2017-05-26 17:35 ` [PATCH v10 26/32] ARM: vITS: handle MOVI command Andre Przywara
2017-05-30 22:35   ` Stefano Stabellini
2017-05-31 11:23     ` Julien Grall
2017-05-31 17:53       ` Stefano Stabellini
2017-05-31 18:49         ` Julien Grall
2017-06-02 17:17           ` Julien Grall
2017-06-02 20:36             ` Stefano Stabellini
2017-05-26 17:35 ` [PATCH v10 27/32] ARM: vITS: handle DISCARD command Andre Przywara
2017-06-02 17:21   ` Julien Grall
2017-05-26 17:35 ` [PATCH v10 28/32] ARM: vITS: handle INV command Andre Przywara
2017-05-30 22:23   ` Stefano Stabellini
2017-05-26 17:35 ` [PATCH v10 29/32] ARM: vITS: handle INVALL command Andre Przywara
2017-06-02 17:27   ` Julien Grall
2017-05-26 17:35 ` [PATCH v10 30/32] ARM: vITS: increase mmio_count for each ITS Andre Przywara
2017-05-26 17:35 ` [PATCH v10 31/32] ARM: vITS: create and initialize virtual ITSes for Dom0 Andre Przywara
2017-06-02 17:31   ` Julien Grall
2017-05-26 17:35 ` [PATCH v10 32/32] ARM: vITS: create ITS subnodes for Dom0 DT Andre Przywara
2017-06-02 17:33   ` Julien Grall
