* PCI Pass-through in Xen ARM: Draft 4
From: Manish Jaggi @ 2015-08-13  9:42 UTC
  To: Xen Devel, Konrad Rzeszutek Wilk, Stefano Stabellini,
	Manish Jaggi, "Kumar, Vijaya",
	Julien Grall, Ian Campbell
  Cc: Prasun.kapoor

               -----------------------------
              | PCI Pass-through in Xen ARM |
               -----------------------------
              manish.jaggi@caviumnetworks.com
              -------------------------------

                       Draft-4


  -----------------------------------------------------------------------------
  Introduction
  -----------------------------------------------------------------------------
  This document describes the design of PCI passthrough support in Xen on
  ARM. The target system is a 64-bit ARM SoC with GICv3, SMMUv2 and PCIe
  devices.

  -----------------------------------------------------------------------------
  Revision History
  -----------------------------------------------------------------------------
  Changes from Draft-1:
  ---------------------
  a) map_mmio hypercall removed from earlier draft
  b) device bar mapping into guest not 1:1
  c) Reserved Area in guest address space for mapping PCI-EP BARs in Stage2.
  d) Xenstore Update: For each PCI-EP BAR (IPA-PA mapping info).

  Changes from Draft-2:
  ---------------------
  a) DomU boot information updated with boot-time device assignment and
  hotplug.
  b) SMMU description added
  c) Mapping between streamID - bdf - deviceID.
  d) assign_device hypercall to include virtual(guest) sbdf.
  Toolstack to generate guest sbdf rather than pciback.

  Changes from Draft-3:
  ---------------------
  a) Fixed typos and added more description
  b) NUMA and PCI passthrough description removed for now.
  c) Added example from Ian's mail.

  -----------------------------------------------------------------------------
  Index
  -----------------------------------------------------------------------------
    (1) Background

    (2) Basic PCI Support in Xen ARM
    (2.1) pci_hostbridge and pci_hostbridge_ops
    (2.2) PHYSDEVOP_HOSTBRIDGE_ADD hypercall
    (2.3) XEN Internal API

    (3) SMMU programming
    (3.1) Additions for PCI Passthrough
    (3.2) Mapping between streamID - deviceID - pci sbdf - requesterID

    (4) Assignment of PCI device
    (4.1) Dom0
    (4.1.1) Stage 2 Mapping of GITS_ITRANSLATER space (64k)
    (4.1.1.1) For Dom0
    (4.1.1.2) For DomU
    (4.1.1.2.1) Hypercall Details: XEN_DOMCTL_get_itranslater_space

    (4.2) DomU
    (4.2.1) Mapping BAR regions in guest address space
    (4.2.2) Xenstore Update: For each PCI-EP BAR (IPA-PA mapping info).
    (4.2.3) Hypercall Modification for bdf mapping notification to xen

    (5) DomU FrontEnd Bus Changes
    (5.1) Change in Linux PCI frontend bus and gicv3-its node binding for domU

    (6) Glossary

    (7) References
  -----------------------------------------------------------------------------

  1.    Background of PCI passthrough
  -----------------------------------------------------------------------------
  Passthrough refers to assigning a PCI device to a guest domain (domU)
  such that the guest has full control over the device. The MMIO space and
  interrupts are managed by the guest itself, close to how a bare-metal
  kernel manages a device.

  The device's access to the guest address space needs to be isolated and
  protected. The SMMU (System MMU, the IOMMU on ARM) is programmed by the
  Xen hypervisor to allow the device to access guest memory for data
  transfers and for sending MSI/MSI-X interrupts. The message signalled
  interrupt writes generated by PCI devices target guest addresses, which
  are also translated by the SMMU.

  For this reason the GITS Interrupt Translation Register space (part of
  the ITS address space) is mapped into the guest address space.

  2.    Basic PCI Support for ARM
  -----------------------------------------------------------------------------
  The APIs to read and write the PCI configuration space are based on
  segment:bdf. How an sbdf maps to a physical address is the business of
  the PCI host controller.

  ARM PCI support in Xen introduces a PCI host controller abstraction
  similar to the one in Linux. Host controller drivers register callbacks,
  which are invoked when the compatible property of a pci device tree node
  matches.

  Note: as pci devices are enumerable (and thus have no device tree nodes
  of their own), the pci node in the device tree refers to the host
  controller.

  (TODO: ACPI support is not yet covered.)

  2.1    pci_hostbridge and pci_hostbridge_ops
  -----------------------------------------------------------------------------
  The init function of a PCI host driver calls the following to register
  the hostbridge callbacks:

  int pci_hostbridge_register(pci_hostbridge_t *pcihb);

  struct pci_hostbridge_ops {
      u32 (*pci_conf_read)(struct pci_hostbridge*, u32 bus, u32 devfn,
                                  u32 reg, u32 bytes);
      void (*pci_conf_write)(struct pci_hostbridge*, u32 bus, u32 devfn,
                                  u32 reg, u32 bytes, u32 val);
  };

  struct pci_hostbridge{
      u32 segno;
      paddr_t cfg_base;
      paddr_t cfg_size;
      struct dt_device_node *dt_node;
      struct pci_hostbridge_ops ops;
      struct list_head list;
  };

  A PCI conf_read function would internally be as follows:

  u32 pcihb_conf_read(u32 seg, u32 bus, u32 devfn, u32 reg, u32 bytes)
  {
      pci_hostbridge_t *pcihb;

      list_for_each_entry(pcihb, &pci_hostbridge_list, list)
      {
          if ( pcihb->segno == seg )
              return pcihb->ops.pci_conf_read(pcihb, bus, devfn, reg, bytes);
      }
      return -1;
  }
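
  As an illustration, a host controller driver would register itself when
  the compatible property of its pci node matches. The sketch below is a
  rough example only; the driver name and the ECAM-style accessor are
  assumptions, not part of this design:

  static u32 myhost_conf_read(struct pci_hostbridge *pcihb, u32 bus,
                              u32 devfn, u32 reg, u32 bytes)
  {
      /* ECAM-style offset: each bus:devfn gets a 4K config window */
      paddr_t addr = pcihb->cfg_base + (((bus << 8) | devfn) << 12) + reg;
      /* ... perform a 'bytes'-wide read from 'addr' ... */
      return 0;
  }

  static int myhost_dt_init(struct dt_device_node *node)
  {
      static pci_hostbridge_t pcihb;

      pcihb.dt_node = node;                     /* matched pci node */
      pcihb.ops.pci_conf_read = myhost_conf_read;
      /* pcihb.ops.pci_conf_write, cfg_base/cfg_size from the DT "reg" */
      return pci_hostbridge_register(&pcihb);
  }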

  2.2    PHYSDEVOP_pci_host_bridge_add hypercall
  -----------------------------------------------------------------------------
  Xen code accesses PCI configuration space based on the sbdf received from
  the guest. The order in which the pci device tree nodes appear may not
  match the order of device enumeration in dom0. Thus there needs to be a
  mechanism to bind the segment number assigned by dom0 to the pci host
  controller. The following hypercall is introduced:

  #define PHYSDEVOP_pci_host_bridge_add    <<>>
  struct physdev_pci_host_bridge_add {
      /* IN */
      uint16_t seg;
      uint64_t cfg_base;
      uint64_t cfg_size;
  };

  This hypercall is invoked before dom0 invokes the PHYSDEVOP_pci_device_add
  hypercall.

  To understand the requirement in detail, Ian's example is quoted below:
-- Ref: [1]
  Imagine we have two PCI host bridges, one with CFG space at 0xA0000000 and
  a second with CFG space at 0xB0000000.

  Xen discovers these and assigns segment 0=0xA0000000 and segment
  1=0xB0000000.

  Dom0 discovers them too but assigns segment 1=0xA0000000 and segment
  0=0xB0000000 (i.e. the other way).

  Now Dom0 makes a hypercall referring to a device as (segment=1,BDF), i.e.
  the device with BDF behind the root bridge at 0xA0000000. (Perhaps this is
  the PHYSDEVOP_manage_pci_add_ext call).

  But Xen thinks it is talking about the device with BDF behind the root
  bridge at 0xB0000000 because Dom0 and Xen do not agree on what the
  segments mean. Now Xen will use the wrong device ID in the IOMMU (since
  that is associated with the host bridge), or poke the wrong configuration
  space, or whatever.

  Or maybe Xen chose 42=0xB0000000 and 43=0xA0000000 so when Dom0 starts
  talking about segment=0 and =1 it has no idea what is going on.

  PHYSDEVOP_pci_host_bridge_add is intended to allow Dom0 to say "Segment 0
  is the host bridge at 0xB0000000" and "Segment 1 is the host bridge at
  0xA0000000". With this there is no confusion between Xen and Dom0 because
  Xen isn't picking a segment ID, it is being told what it is by Dom0 which
  has done the picking.
--

  The handler invokes the following to update the segment number in the
  pci_hostbridge:

  int pci_hostbridge_setup(uint32_t segno, uint64_t cfg_base, uint64_t
  cfg_size);

  Subsequent calls to pci_conf_read/write are completed by the
  pci_hostbridge_ops of the respective pci_hostbridge.
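
  A minimal sketch of pci_hostbridge_setup, assuming the bridge is matched
  by its config window (the matching rule is an assumption, not settled by
  this design):

  int pci_hostbridge_setup(uint32_t segno, uint64_t cfg_base,
                           uint64_t cfg_size)
  {
      pci_hostbridge_t *pcihb;

      list_for_each_entry(pcihb, &pci_hostbridge_list, list)
      {
          if ( pcihb->cfg_base == cfg_base && pcihb->cfg_size == cfg_size )
          {
              pcihb->segno = segno;  /* adopt the dom0-chosen segment */
              return 0;
          }
      }
      return -ENODEV;  /* no hostbridge registered at that window */
  }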

  2.3    XEN Internal API
  -----------------------------------------------------------------------------
  a) pci_hostbridge_dt_node

  struct dt_device_node* pci_hostbridge_dt_node(uint32_t segno);

  Returns the device tree node pointer of the pci node bound to the passed
  segment number. The API can be called subsequent to
  pci_hostbridge_setup.
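
  A plausible implementation, given the pci_hostbridge list above (a
  sketch, not final code):

  struct dt_device_node* pci_hostbridge_dt_node(uint32_t segno)
  {
      pci_hostbridge_t *pcihb;

      list_for_each_entry(pcihb, &pci_hostbridge_list, list)
      {
          if ( pcihb->segno == segno )
              return pcihb->dt_node;
      }
      return NULL;  /* segment not yet bound via pci_hostbridge_setup */
  }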

  3.    SMMU programming
  -----------------------------------------------------------------------------

  3.1.    Additions for PCI Passthrough
  -----------------------------------------------------------------------------

  3.1.1 - add_device in iommu_ops is implemented.
  -----------------------------------------------------------------------------

  This is called when PHYSDEVOP_pci_device_add / PHYSDEVOP_manage_pci_add_ext
  is called from dom0.

  .add_device = arm_smmu_add_dom0_dev,

  static int arm_smmu_add_dom0_dev(u8 devfn, struct device *dev)
  {
      if ( dev_is_pci(dev) )
      {
          struct pci_dev *pdev = to_pci_dev(dev);
          return arm_smmu_assign_dev(pdev->domain, devfn, dev);
      }
      return -1;
  }

  3.1.2 - remove_device in iommu_ops is implemented.
  -----------------------------------------------------------------------------
  This is called when PHYSDEVOP_pci_device_remove is called from dom0/domU.

  .remove_device = arm_smmu_remove_dev.
  TODO: add implementation details of arm_smmu_remove_dev.

  3.1.3 dev_get_dev_node is modified for pci devices.
  -----------------------------------------------------------------------------
  The function is modified to return the dt_node of the pci hostbridge from
  the device tree. This is required because non-dt devices such as PCI
  devices need a way to find out which smmu they are attached to.

  static struct arm_smmu_device *find_smmu_for_device(struct device *dev)
  {
          struct device_node *dev_node = dev_get_dev_node(dev);
  ....

  static struct device_node *dev_get_dev_node(struct device *dev)
  {
          if (dev_is_pci(dev)) {
                  struct pci_dev *pdev = to_pci_dev(dev);
                  return pci_hostbridge_dt_node(pdev->seg);
          }
  ...


  3.2.    Mapping between streamID - deviceID - pci sbdf - requesterID
  -----------------------------------------------------------------------------
  In the simplest case all of these are equal to the BDF. But some devices
  use the wrong requester ID for DMA transactions; the Linux kernel has PCI
  quirks for these. Whether the same quirks should be implemented in Xen,
  or a different approach taken, is a TODO here.

  Until then, the basic implementation assumes that all of them are equal
  to BDF.
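
  Under that assumption the conversion collapses to an identity mapping on
  the BDF; the helper below is a sketch (the names are illustrative, not an
  agreed API):

  static inline u32 sbdf_to_streamid(u32 seg, u32 bus, u32 devfn)
  {
      (void)seg;                  /* segment not encoded for now */
      return (bus << 8) | devfn;  /* streamID == deviceID == BDF */
  }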

  4.    Assignment of PCI device
  -----------------------------------------------------------------------------

  4.1    Dom0
  -----------------------------------------------------------------------------
  All PCI devices are assigned to dom0 unless hidden by the pciback.hide
  bootarg in dom0. Dom0 enumerates the PCI devices. For each device the
  MMIO space has to be mapped in the Stage 2 translation for dom0. For
  dom0, Xen maps the ranges from the device tree pci nodes in the stage 2
  translation during boot.

  While processing the PHYSDEVOP_pci_device_add hypercall,
  its_add_device(machine_sbdf) should be called. This will allocate the
  ITS-specific data structures for the device. (Reference [2])
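
  A sketch of that call site in the hypercall handler follows; the
  surrounding code and exact signatures are assumptions:

      /* PHYSDEVOP_pci_device_add: make the device known to Xen,
       * then allocate its per-device ITS state */
      ret = pci_add_device(add.seg, add.bus, add.devfn, &pdev_info);
      if ( !ret )
          ret = its_add_device(machine_sbdf);  /* see reference [2] */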


  4.1.1 Stage 2 Mapping of GITS_ITRANSLATER space (64k)
  -----------------------------------------------------------------------------

  The GITS_ITRANSLATER space (64k) must be mapped in the Stage 2
  translation so that the SMMU can translate MSI(-X) writes from the
  device using the page tables of the domain.

  4.1.1.1 For Dom0
  -----------------------------------------------------------------------------
  GITS_ITRANSLATER address space is mapped 1:1 during dom0 boot. For 
dom0 this
  mapping is done in the vgic driver. For domU the mapping is done by
  toolstack.

  4.1.1.2 For DomU
  -----------------------------------------------------------------------------
  For domU, while creating the domain, the toolstack reads the IPA from the
  macro GITS_ITRANSLATER_SPACE in xen/include/public/arch-arm.h. The PA is
  obtained from a new hypercall which returns the PA of
  GITS_ITRANSLATER_SPACE.

  Subsequently the toolstack issues a hypercall to create the stage 2
  mapping.

  Hypercall Details: XEN_DOMCTL_get_itranslater_space

  /* XEN_DOMCTL_get_itranslater_space */
  struct xen_domctl_get_itranslater_space {
      /* OUT variables. */
      uint64_aligned_t start_addr;
      uint64_aligned_t size;
  };
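
  A toolstack-side sketch of that flow: xc_domain_get_itranslater_space is
  a hypothetical libxc wrapper for the new domctl, while the mapping uses
  the existing xc_domain_memory_mapping:

  struct xen_domctl_get_itranslater_space its;

  if ( xc_domain_get_itranslater_space(xch, domid, &its) )  /* assumed */
      return ERROR_FAIL;

  /* Stage 2 map: fixed guest IPA (from arch-arm.h) -> PA from Xen */
  rc = xc_domain_memory_mapping(xch, domid,
                                GITS_ITRANSLATER_SPACE >> XC_PAGE_SHIFT,
                                its.start_addr >> XC_PAGE_SHIFT,
                                its.size >> XC_PAGE_SHIFT,
                                DPCI_ADD_MAPPING);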

  4.2 DomU
  -----------------------------------------------------------------------------

  4.2.1 Mapping BAR regions in guest address space
  -----------------------------------------------------------------------------
  When a PCI-EP device is assigned to a domU the toolstack will read the
  pci configuration space BAR registers. The toolstack allocates a virtual
  BAR region for each BAR region, from the area reserved in the guest
  address space for mapping BARs, referred to as the Guest BAR area. This
  area is defined in public/arch-arm.h:

  /* For 32bit BARs*/
  #define GUEST_BAR_BASE_32 <<>>
  #define GUEST_BAR_SIZE_32 <<>>

  /* For 64bit BARs*/
  #define GUEST_BAR_BASE_64 <<>>
  #define GUEST_BAR_SIZE_64 <<>>

  The toolstack then invokes the domctl xc_domain_memory_mapping to map the
  region in the stage 2 translation. If a BAR region address is 32-bit the
  BASE_32 area is used, otherwise the 64-bit one. Support for a combination
  of both is TODO.

  The toolstack manages these areas and allocates from them. Allocation
  and deallocation are done using APIs similar to malloc and free.
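
  A sketch of the per-BAR flow; guest_bar_alloc and the BAR_AREA_* flags
  are hypothetical names for the malloc-like allocator over the reserved
  areas:

  uint64_t pbar = bar_addr & ~(uint64_t)0xf;   /* physical BAR base */
  uint64_t vbar = guest_bar_alloc(bar_size,    /* assumed helper */
                                  is_64bit ? BAR_AREA_64 : BAR_AREA_32);

  rc = xc_domain_memory_mapping(xch, domid,
                                vbar >> XC_PAGE_SHIFT,
                                pbar >> XC_PAGE_SHIFT,
                                bar_size >> XC_PAGE_SHIFT,
                                DPCI_ADD_MAPPING);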

  4.2.2    Xenstore Update: For each PCI-EP BAR (IPA-PA mapping info).
  ----------------------------------------------------------------------------
  The toolstack also updates the xenstore information for the device
  (virtual bar:physical bar). This information is read by xen-pciback and
  returned to the domU pcifront driver on configuration space reads of the
  BARs.

  Entries created are as follows:
  /local/domain/0/backend/pci/1/0
  vdev-N
      BDF = ""
      BAR-0-IPA = ""
      BAR-0-PA = ""
      BAR-0-SIZE = ""
      ...
      BAR-M-IPA = ""
      BAR-M-PA = ""
      BAR-M-SIZE = ""

  Note: if BAR-M-SIZE is 0, the entry is not valid.
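
  A sketch of how the toolstack could populate these keys with the
  standard xs_write API (paths follow the layout above; the variable names
  are illustrative):

  char path[128], val[32];

  snprintf(path, sizeof(path),
           "/local/domain/0/backend/pci/%u/0/vdev-%u/BAR-%u-IPA",
           domid, n, i);
  snprintf(val, sizeof(val), "0x%"PRIx64, vbar);
  xs_write(xsh, XBT_NULL, path, val, strlen(val));
  /* ... likewise for BAR-i-PA and BAR-i-SIZE ... */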

  4.2.3 Hypercall Modification (XEN_DOMCTL_assign_device)
  ----------------------------------------------------------------------------
  For each machine:sbdf a guest:sbdf needs to be generated when a device is
  assigned to a domU. Currently this is done by xen-pciback. As per
  discussions [4] on xen-devel, the d.f generation should be done by the
  toolstack rather than by xen-pciback.

  Since there is only one pci-frontend bus in the domU, the s:b:d.f is
  0:0:d.f. This design document proposes that the d.f generation be done
  by the toolstack and that the xenstore keys be created by the toolstack.

  Following guest_sbdf generation, the domctl to assign the device is
  invoked. This hypercall is updated to include the *guest_sbdf*. The Xen
  ITS driver can store the mapping domID:guest_sbdf:machine_sbdf and use
  it later.

  struct xen_domctl_assign_device {
     uint32_t dev;   /* XEN_DOMCTL_DEV_* */
     union {
         struct {
             uint32_t machine_sbdf; /* machine PCI ID of assigned device */
             uint32_t guest_sbdf;   /* guest PCI ID of assigned device */
         } pci;
         struct {
             uint32_t size; /* Length of the path */
             XEN_GUEST_HANDLE_64(char) path; /* path to device tree node */
         } dt;
     } u;
  };

  In the handler of this hypercall the internal API

  its_assign_device(domid, machine_sbdf, guest_sbdf)

  (Reference [2]) is called, which stores the mapping between machine_sbdf
  and guest_sbdf.
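
  Given the 0:0:d.f constraint described above, generating the guest_sbdf
  reduces to packing d.f; the bit layout below mirrors the usual sbdf
  encoding and is an assumption here:

  static uint32_t make_guest_sbdf(unsigned int dev, unsigned int fn)
  {
      /* seg[31:16] = 0, bus[15:8] = 0, dev[7:3], fn[2:0] */
      return ((dev & 0x1f) << 3) | (fn & 0x7);
  }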

  5.    Change in Linux PCI frontend-backend driver for MSI/X programming
  -----------------------------------------------------------------------------

  5.1 pci-frontend bus and gicv3-its node binding for domU
  -----------------------------------------------------------------------------
  It is assumed that the toolstack generates a gicv3-its node in the domU
  device tree. As of now the ARM PCI passthrough design supports device
  assignment only to guests which have gicv3-its support; PCI passthrough
  with a gicv2 guest is not supported.

  All the devices assigned to a domU are enumerated on a PCI frontend bus.
  On this bus the interrupt parent is set to the gicv3-its for ARM systems.
  As the gicv3-its is emulated in xen, all accesses by the domU driver are
  trapped. This allows configuration and direct injection of MSIs (LPIs)
  into the guest, so the frontend-backend communication for MSI is no
  longer required.

  Frontend-backend communication is required only for reading the PCI
  configuration space by dom0 on behalf of the domU.

  6.    Glossary
  -----------------------------------------------------------------------------
  MSI: Message Signalled Interrupt
  ITS: Interrupt Translation Service
  GIC: Generic Interrupt Controller
  LPI: Locality-specific Peripheral Interrupt


  7.    References
  -----------------------------------------------------------------------------
  [1]. http://osdir.com/ml/general/2015-08/msg15346.html
  [2]. http://lists.xen.org/archives/html/xen-devel/2015-07/msg01984.html
  [3]. http://xenbits.xen.org/people/ianc/vits/draftG.html
  [4]. http://lists.xen.org/archives/html/xen-devel/2015-07/msg05513.html


* Re: PCI Pass-through in Xen ARM: Draft 4
From: Julien Grall @ 2015-08-13 10:37 UTC
  To: Manish Jaggi, Xen Devel, Konrad Rzeszutek Wilk,
	Stefano Stabellini, "Manish Jaggi", "Kumar, Vijaya",
	Julien Grall, Ian Campbell
  Cc: Prasun.kapoor

Hi Manish,

I left Linaro 4 months ago. Can you please use my Citrix email
(julien.grall@citrix.com)?

On 13/08/15 10:42, Manish Jaggi wrote:
>  -----------------------------------------------------------------------------
> 
> 
>  1.    Background of PCI passthrough
>  -----------------------------------------------------------------------------
> 
>  Passthrough refers to assigning a PCI device to a guest domain (domU) such
>  that the guest has full control over the device. The MMIO space /
> interrupts
>  are managed by the guest itself, close to how a bare kernel manages a
> device.
> 
>  Device's access to guest address space needs to be isolated and protected.
>  SMMU (System MMU - IOMMU in ARM) is programmed by xen hypervisor to allow
>  device access guest memory for data transfer and sending MSI/X interrupts.
>  PCI devices generated message signalled interrupt writes are within guest
>  address spaces which are also translated using SMMU.

In all this design you are only speaking about MSI interrupts. But what
about legacy interrupts?

AFAICT, it's possible for Linux to use them, either because MSIs are not
supported or because the user asked for it.

>  4.2.2    Xenstore Update: For each PCI-EP BAR (IPA-PA mapping info).
>  ----------------------------------------------------------------------------
> 
>  Toolstack also updates the xenstore information for the device
>  (virtualbar:physical bar).This information is read by xen-pciback and
>  returned to the domU-pcifront driver configuration space reads for BAR.
> 
>  Entries created are as follows:
>  /local/domain/0/backend/pci/1/0
>  vdev-N
>      BDF = ""
>      BAR-0-IPA = ""
>      BAR-0-PA = ""
>      BAR-0-SIZE = ""
>      ...
>      BAR-M-IPA = ""
>      BAR-M-PA = ""
>      BAR-M-SIZE = ""
> 
>  Note: If BAR M SIZE is 0, it is not a valid entry.

How would you describe the ROM in xenstore?


regards,

-- 
Julien Grall


* Re: PCI Pass-through in Xen ARM: Draft 4
From: Jan Beulich @ 2015-08-13 15:29 UTC
  To: Manish Jaggi
  Cc: Prasun.kapoor, Ian Campbell, stefano.stabellini, Vijaya Kumar,
	Julien Grall, Xen Devel

>>> On 13.08.15 at 11:42, <mjaggi@caviumnetworks.com> wrote:
>   2.1    pci_hostbridge and pci_hostbridge_ops
>   -----------------------------------------------------------------------------
>   The init function in the PCI host driver calls to register hostbridge
>   callbacks:
> 
>   int pci_hostbridge_register(pci_hostbridge_t *pcihb);
> 
>   struct pci_hostbridge_ops {
>       u32 (*pci_conf_read)(struct pci_hostbridge*, u32 bus, u32 devfn,
>                                   u32 reg, u32 bytes);
>       void (*pci_conf_write)(struct pci_hostbridge*, u32 bus, u32 devfn,
>                                   u32 reg, u32 bytes, u32 val);
>   };
> 
>   struct pci_hostbridge{
>       u32 segno;
>       paddr_t cfg_base;
>       paddr_t cfg_size;
>       struct dt_device_node *dt_node;
>       struct pci_hostbridge_ops ops;
>       struct list_head list;
>   };
> 
>   A PCI conf_read function would internally be as follows:
>   u32 pcihb_conf_read(u32 seg, u32 bus, u32 devfn,u32 reg, u32 bytes)
>   {
>       pci_hostbridge_t *pcihb;
>       list_for_each_entry(pcihb, &pci_hostbridge_list, list)
>       {
>           if(pcihb->segno == seg)
>               return pcihb->ops.pci_conf_read(pcihb, bus, devfn, reg, bytes);
>       }
>       return -1;
>   }

Which implies 1 segment per host bridge, which doesn't seem too
nice to me: I can't see why a bridge might not cover more than one
segment, and I also can't see why you shouldn't be able to put
multiple bridges in the same segment when the number of busses
they have is small.

>   2.2    PHYSDEVOP_pci_host_bridge_add hypercall
>   -----------------------------------------------------------------------------
>   Xen code accesses PCI configuration space based on the sbdf received from
>   the guest.

Guest? Not Dom0?

>   4.1.1.2 For DomU
>   -----------------------------------------------------------------------------
>   For domU, while creating the domain, the toolstack reads the IPA from the
>   macro GITS_ITRANSLATER_SPACE from xen/include/public/arch-arm.h. The PA is
>   read from a new hypercall which returns PA of GITS_ITRANSLATER_SPACE.
> 
>   Subsequently toolstack sends a hypercall to create a stage 2 mapping.
> 
>   Hypercall Details: XEN_DOMCTL_get_itranslater_space
> 
>   /* XEN_DOMCTL_get_itranslater_space */
>   struct xen_domctl_get_itranslater_space {
>       /* OUT variables. */
>       uint64_aligned_t start_addr;
>       uint64_aligned_t size;
>   };

Is this a domain specific rather than a globally unique range?

>   4.2.1 Mapping BAR regions in guest address space
>   -----------------------------------------------------------------------------
>   When a PCI-EP device is assigned to a domU the toolstack will read the pci
>   configuration space BAR registers. Toolstack allocates a virtual BAR
>   region for each BAR region, from the area reserved in guest address 
> space for
>   mapping BARs referred to as Guest BAR area. This area is defined in
>   public/arch-arm.h
> 
>   /* For 32bit BARs*/
>   #define GUEST_BAR_BASE_32 <<>>
>   #define GUEST_BAR_SIZE_32 <<>>
> 
>   /* For 64bit BARs*/
>   #define GUEST_BAR_BASE_64 <<>>
>   #define GUEST_BAR_SIZE_64 <<>>

This sounds like you want to use a pair of fixed (forever, since putting
it in the ABI) ranges - is that really a good idea? Wouldn't it be possible
(and necessary when assigning many or large devices) to establish
these ranges dynamically?

Furthermore OSes can generally reassign BARs as they see fit (as
long as the chosen addresses don't collide with anything else).

>   4.2.2    Xenstore Update: For each PCI-EP BAR (IPA-PA mapping info).
>   ----------------------------------------------------------------------------
>   Toolstack also updates the xenstore information for the device
>   (virtualbar:physical bar).This information is read by xen-pciback and
>   returned to the domU-pcifront driver configuration space reads for BAR.
> 
>   Entries created are as follows:
>   /local/domain/0/backend/pci/1/0
>   vdev-N
>       BDF = ""
>       BAR-0-IPA = ""
>       BAR-0-PA = ""
>       BAR-0-SIZE = ""
>       ...
>       BAR-M-IPA = ""
>       BAR-M-PA = ""
>       BAR-M-SIZE = ""

This again looks like it wouldn't cope with the guest reassigning
BARs (since how would the toolstack know). Or did you mean to
say that in case of reassignment pciback also updates these
entries?

But I wonder in general why this would need to be done through
xenstore: pciback knows the real BAR base, and with the help of
the hypervisor it ought to be able to also know the corresponding
guest address.

>   4.2.3 Hypercall Modification (XEN_DOMCTL_assign_device)
>   ----------------------------------------------------------------------------
>   For machine:sbdf guest:sbdf needs to be generated when a device is 
> assigned
>   to a domU. Currently this is done by xen-pciback. As per discussions [4]
>   on xen-devel the df generation should be done by toolstack rather than
>   the xen-pciback.

And this is already the case for HVM guests (just that the assignment
is done in qemu, which still is part of the tool stack). Iirc only PV
passthrough uses pciback for this.

>   Since there is only one pci-frontend bus in domU, s:b:d.f is 0:0:d.f
>   It is proposed in this design document that the df generation be done by
>   toolstack and the xenstore keys be created by toolstack.
> 
>   Folowing guest_sbdf generation the domctl to assign the device is invoked.
>   This hypercall is updated to include *guest_sbdf*. Xen ITS driver can 
> store
>   this mapping domID: guest_sbdf: machine_sbdf and can be used later.

Mind explaining why the hypervisor needs to know about the guest
topology? Is that because you want to continue to not involve qemu
in any of the emulation?

Jan


* Re: PCI Pass-through in Xen ARM: Draft 4
From: Ian Campbell @ 2015-08-13 17:01 UTC
  To: Jan Beulich, Manish Jaggi
  Cc: Prasun.kapoor, Vijaya Kumar, Julien Grall, Xen Devel, stefano.stabellini

On Thu, 2015-08-13 at 09:29 -0600, Jan Beulich wrote:

A bunch of your questions seem to be about things which have been discussed
at length in previous postings, it's probably worth reviewing some of those
discussions.

> > 
> >   /* For 32bit BARs*/
> >   #define GUEST_BAR_BASE_32 <<>>
> >   #define GUEST_BAR_SIZE_32 <<>>
> > 
> >   /* For 64bit BARs*/
> >   #define GUEST_BAR_BASE_64 <<>>
> >   #define GUEST_BAR_SIZE_64 <<>>
> 
> This sounds like you want to use a pair of fixed (forever, since putting
> it in the ABI) ranges - is that really a good idea? Wouldn't it be possible
> (and necessary when assigning many or large devices) to establish
> these ranges dynamically?

The guest physical address space layout is explicitly not an ABI on ARM.
The relevant bits of arch-arm.h have the appropriate tools-and-xen-only
ifdefs.

> 
[...]
> Furthermore OSes can generally reassign BARs as they see fit (as
> long as the chosen addresses don't collide with anything else).

Not with PV pcifront/back based setups, the BARs are fixed then I believe,
even for x86/PV. That model is being followed on ARM too.

We don't want to involve QEMU just to emulate this sort of thing, and nor
do we really want to add such functionality to pciback.


> But I wonder in general why this would need to be done through
> xenstore: pciback knows the real BAR base, and with the help of
> the hypervisor it ought to be able to also know the corresponding
> guest address.

How? Previously adding a new hypercall was proposed for this, I didn't like
that approach because it added a new frozen ABI to the hypercall interface
and IMHO the tools is the right place to have the address space layout
stuff anyway.


> >   Since there is only one pci-frontend bus in domU, s:b:d.f is 0:0:d.f
> >   It is proposed in this design document that the df generation be done 
> > by
> >   toolstack and the xenstore keys be created by toolstack.
> > 
> >   Folowing guest_sbdf generation the domctl to assign the device is 
> > invoked.
> >   This hypercall is updated to include *guest_sbdf*. Xen ITS driver can 
> > 
> > store
> >   this mapping domID: guest_sbdf: machine_sbdf and can be used later.
> 
> Mind explaining why the hypervisor needs to know about the guest
> topology? Is that because you want to continue to not involve qemu
> in any of the emulation?

Some virtualised hardware (i.e. the interrupt translation unit) takes
device-ids which are derived from the SBDF; since Xen deals with
configuring that h/w it needs to know the guest's SBDF.

Ian.


* Re: PCI Pass-through in Xen ARM: Draft 4
From: Jan Beulich @ 2015-08-14  9:26 UTC
  To: Ian Campbell
  Cc: Prasun.kapoor, stefano.stabellini, Manish Jaggi, Vijaya Kumar,
	Julien Grall, Xen Devel

>>> On 13.08.15 at 19:01, <ian.campbell@citrix.com> wrote:
> On Thu, 2015-08-13 at 09:29 -0600, Jan Beulich wrote:
> 
> A bunch of your questions seem to be about things which have been discussed
> at length in previous postings, it's probably worth reviewing some of those
> discussions.

With the huge amount of mail to be read after returning from vacation I
had hoped to avoid that, assuming that the results of such discussion
would (should) manifest themselves in the new draft (i.e. I specifically
avoided joining the tail of the draft 3 thread).

>> Furthermore OSes can generally reassign BARs as they see fit (as
>> long as the chosen addresses don't collide with anything else).
> 
> Not with PV pcifront/back based setups, the BARs are fixed then I believe,
> even for x86/PV. That model is being followed on ARM too.

I'm unaware of the mechanism to avoid such re-assignment in Linux;
it's even less clear how you'd expect to suppress this in other guest
OSes (Windows - afaik - being able to run on ARM certainly makes it
a candidate guest, even if maybe this doesn't work today), namely
such not having any pcifront. And I hope this design discussion isn't
limiting itself to Linux guests.

>> But I wonder in general why this would need to be done through
>> xenstore: pciback knows the real BAR base, and with the help of
>> the hypervisor it ought to be able to also know the corresponding
>> guest address.
> 
> How? Previously adding a new hypercall was proposed for this, I didn't like
> that approach because it added a new frozen ABI to the hypercall interface
> and IMHO the tools is the right place to have the address space layout
> stuff anyway.

Yeah - it's a balancing issue between maintaining redundant information
(i.e. the risk of it getting out of sync) or having an unchangeable
interface.

Jan


* Re: PCI Pass-through in Xen ARM: Draft 4
From: Stefano Stabellini @ 2015-08-14 13:21 UTC
  To: Jan Beulich
  Cc: Prasun.kapoor, Ian Campbell, stefano.stabellini, Manish Jaggi,
	Vijaya Kumar, Julien Grall, Xen Devel

On Fri, 14 Aug 2015, Jan Beulich wrote:
> >>> On 13.08.15 at 19:01, <ian.campbell@citrix.com> wrote:
> > On Thu, 2015-08-13 at 09:29 -0600, Jan Beulich wrote:
> > 
> > A bunch of your questions seem to be about things which have been discussed
> > at length in previous postings, it's probably worth reviewing some of those
> > discussions.
> 
> With the huge amount of mail to be read after returning from vacation
> I had hoped to avoid that, assuming that results of such discussion
> would (should) manifest themselves in the new draft i.e. I specifically
> avoided to join the tail of the draft 3 thread).
> 
> >> Furthermore OSes can generally reassign BARs as they see fit (as
> >> long as the chosen addresses don't collide with anything else).
> > 
> > Not with PV pcifront/back based setups, the BARs are fixed then I believe,
> > even for x86/PV. That model is being followed on ARM too.
> 
> I'm unaware of the mechanism to avoid such re-assignment in Linux;

They simply don't work. The code to reassign BARs has never been
written.


> it's even less clear how you'd expect to suppress this in other guest
> OSes (Windows - afaik - being able to run on ARM certainly makes it
> a candidate guest, even if maybe this doesn't work today), namely
> such not having any pcifront. And I hope this design discussion isn't
> limiting itself to Linux guests.

We'll write down in the ABI documentation that BARs reassignments are
not supported.


* Re: PCI Pass-through in Xen ARM: Draft 4
From: Jan Beulich @ 2015-08-14 13:58 UTC
  To: stefano.stabellini
  Cc: Prasun.kapoor, Ian Campbell, Manish Jaggi, Vijaya Kumar,
	Julien Grall, Xen Devel

>>> On 14.08.15 at 15:21, <stefano.stabellini@eu.citrix.com> wrote:
> On Fri, 14 Aug 2015, Jan Beulich wrote:
>> it's even less clear how you'd expect to suppress this in other guest
>> OSes (Windows - afaik - being able to run on ARM certainly makes it
>> a candidate guest, even if maybe this doesn't work today), namely
>> such not having any pcifront. And I hope this design discussion isn't
>> limiting itself to Linux guests.
> 
> We'll write down in the ABI documentation that BARs reassignments are
> not supported.

I.e. guests doing so Will Not Work (TM), with users (usually not reading
ABI docs) learning it the hard way. Not nice, but a way to handle it.

Jan


* Re: PCI Pass-through in Xen ARM: Draft 4
From: Stefano Stabellini @ 2015-08-14 14:03 UTC
  To: Jan Beulich
  Cc: Prasun.kapoor, Ian Campbell, Manish Jaggi, Vijaya Kumar,
	Julien Grall, Xen Devel, stefano.stabellini

On Fri, 14 Aug 2015, Jan Beulich wrote:
> >>> On 14.08.15 at 15:21, <stefano.stabellini@eu.citrix.com> wrote:
> > On Fri, 14 Aug 2015, Jan Beulich wrote:
> >> it's even less clear how you'd expect to suppress this in other guest
> >> OSes (Windows - afaik - being able to run on ARM certainly makes it
> >> a candidate guest, even if maybe this doesn't work today), namely
> >> such not having any pcifront. And I hope this design discussion isn't
> >> limiting itself to Linux guests.
> > 
> > We'll write down in the ABI documentation that BARs reassignments are
> > not supported.
> 
> I.e. guests doing so Will Not Work (TM), with users (usually not reading
> ABI docs) learning it the hard way. Not nice, but a way to handle it.

The target of the ABI docs is not users, but kernel developers, who
should most definitely read them and fix their kernels.


* Re: PCI Pass-through in Xen ARM: Draft 4
From: Jan Beulich @ 2015-08-14 14:34 UTC
  To: stefano.stabellini
  Cc: Prasun.kapoor, Ian Campbell, Manish Jaggi, Vijaya Kumar,
	Julien Grall, Xen Devel

>>> On 14.08.15 at 16:03, <stefano.stabellini@eu.citrix.com> wrote:
> On Fri, 14 Aug 2015, Jan Beulich wrote:
>> >>> On 14.08.15 at 15:21, <stefano.stabellini@eu.citrix.com> wrote:
>> > On Fri, 14 Aug 2015, Jan Beulich wrote:
>> >> it's even less clear how you'd expect to suppress this in other guest
>> >> OSes (Windows - afaik - being able to run on ARM certainly makes it
>> >> a candidate guest, even if maybe this doesn't work today), namely
>> >> such not having any pcifront. And I hope this design discussion isn't
>> >> limiting itself to Linux guests.
>> > 
>> > We'll write down in the ABI documentation that BARs reassignments are
>> > not supported.
>> 
>> I.e. guests doing so Will Not Work (TM), with users (usually not reading
>> ABI docs) learning it the hard way. Not nice, but a way to handle it.
> 
> The target of the ABI docs is not users, but kernel developers, who
> should most definitely read them and fix their kernels.

??? (We're talking of unmodified, closed source OSes here.)

Jan


* Re: PCI Pass-through in Xen ARM: Draft 4
From: Stefano Stabellini @ 2015-08-14 14:37 UTC
  To: Jan Beulich
  Cc: Prasun.kapoor, Ian Campbell, Manish Jaggi, Vijaya Kumar,
	Julien Grall, Xen Devel, stefano.stabellini

On Fri, 14 Aug 2015, Jan Beulich wrote:
> >>> On 14.08.15 at 16:03, <stefano.stabellini@eu.citrix.com> wrote:
> > On Fri, 14 Aug 2015, Jan Beulich wrote:
> >> >>> On 14.08.15 at 15:21, <stefano.stabellini@eu.citrix.com> wrote:
> >> > On Fri, 14 Aug 2015, Jan Beulich wrote:
> >> >> it's even less clear how you'd expect to suppress this in other guest
> >> >> OSes (Windows - afaik - being able to run on ARM certainly makes it
> >> >> a candidate guest, even if maybe this doesn't work today), namely
> >> >> such not having any pcifront. And I hope this design discussion isn't
> >> >> limiting itself to Linux guests.
> >> > 
> >> > We'll write down in the ABI documentation that BARs reassignments are
> >> > not supported.
> >> 
> >> I.e. guests doing so Will Not Work (TM), with users (usually not reading
> >> ABI docs) learning it the hard way. Not nice, but a way to handle it.
> > 
> > The target of the ABI docs is not users, but kernel developers, who
> > should most definitely read them and fix their kernels.
> 
> ??? (We're talking of unmodified, closed source OSes here.)

If you are thinking of Windows for ARM64, there isn't one yet.  When/if
it will become available, there are going to be a number of issues to
address before we can run it as a guest on Xen on ARM with the current
architecture which doesn't do emulation. I am hopeful that we'll be able
to have a discussion on ABI issues such as this one.


* Re: PCI Pass-through in Xen ARM: Draft 4
From: Julien Grall @ 2015-08-14 14:45 UTC
  To: Stefano Stabellini, Jan Beulich
  Cc: Prasun.kapoor, Ian Campbell, Manish Jaggi, Vijaya Kumar,
	Julien Grall, Xen Devel

On 14/08/15 15:37, Stefano Stabellini wrote:
> On Fri, 14 Aug 2015, Jan Beulich wrote:
>>>>> On 14.08.15 at 16:03, <stefano.stabellini@eu.citrix.com> wrote:
>>> On Fri, 14 Aug 2015, Jan Beulich wrote:
>>>>>>> On 14.08.15 at 15:21, <stefano.stabellini@eu.citrix.com> wrote:
>>>>> On Fri, 14 Aug 2015, Jan Beulich wrote:
>>>>>> it's even less clear how you'd expect to suppress this in other guest
>>>>>> OSes (Windows - afaik - being able to run on ARM certainly makes it
>>>>>> a candidate guest, even if maybe this doesn't work today), namely
>>>>>> such not having any pcifront. And I hope this design discussion isn't
>>>>>> limiting itself to Linux guests.
>>>>>
>>>>> We'll write down in the ABI documentation that BARs reassignments are
>>>>> not supported.
>>>>
>>>> I.e. guests doing so Will Not Work (TM), with users (usually not reading
>>>> ABI docs) learning it the hard way. Not nice, but a way to handle it.
>>>
>>> The target of the ABI docs is not users, but kernel developers, who
>>> should most definitely read them and fix their kernels.
>>
>> ??? (We're talking of unmodified, closed source OSes here.)
> 
> If you are thinking of Windows for ARM64, there isn't one yet.  When/if
> it will become available, there are going to be a number of issues to
> address before we can run it as a guest on Xen on ARM with the current
> architecture which doesn't do emulation. I am hopeful that we'll be able
> to have a discussion on ABI issues such as this one.

It may be worth reading [1], where the Xen ARM architecture is explained.

Regards,

[1]
http://wiki.xen.org/wiki/Xen_ARM_with_Virtualization_Extensions_whitepaper

-- 
Julien Grall


* Re: PCI Pass-through in Xen ARM: Draft 4
From: Jan Beulich @ 2015-08-14 15:15 UTC
  To: Julien Grall, Stefano Stabellini
  Cc: Prasun.kapoor, Ian Campbell, Manish Jaggi, Vijaya Kumar,
	Julien Grall, Xen Devel

>>> On 14.08.15 at 16:45, <julien.grall@citrix.com> wrote:
> On 14/08/15 15:37, Stefano Stabellini wrote:
>> If you are thinking of Windows for ARM64, there isn't one yet.  When/if
>> it will become available, there are going to be a number of issues to
>> address before we can run it as a guest on Xen on ARM with the current
>> architecture which doesn't do emulation. I am hopeful that we'll be able
>> to have a discussion on ABI issues such as this one.
> 
> It may be worth to read [1] where the Xen ARM architecture is explained.
> 
> Regards,
> 
> [1]
> http://wiki.xen.org/wiki/Xen_ARM_with_Virtualization_Extensions_whitepaper 

Neither Stefano's nor your reply really make a lot of sense to me: I
realize you currently require a cooperating guest, but when
designing something like pass-through it would seem natural to me to
try to allow for other guest kinds in the future.

Jan


* Re: PCI Pass-through in Xen ARM: Draft 4
From: Stefano Stabellini @ 2015-08-14 15:24 UTC
  To: Jan Beulich
  Cc: Prasun.kapoor, Julien Grall, Ian Campbell, Manish Jaggi,
	Vijaya Kumar, Stefano Stabellini, Xen Devel

On Fri, 14 Aug 2015, Jan Beulich wrote:
> >>> On 14.08.15 at 16:45, <julien.grall@citrix.com> wrote:
> > On 14/08/15 15:37, Stefano Stabellini wrote:
> >> If you are thinking of Windows for ARM64, there isn't one yet.  When/if
> >> it will become available, there are going to be a number of issues to
> >> address before we can run it as a guest on Xen on ARM with the current
> >> architecture which doesn't do emulation. I am hopeful that we'll be able
> >> to have a discussion on ABI issues such as this one.
> > 
> > It may be worth to read [1] where the Xen ARM architecture is explained.
> > 
> > Regards,
> > 
> > [1]
> > http://wiki.xen.org/wiki/Xen_ARM_with_Virtualization_Extensions_whitepaper 
> 
> Neither Stefano's nor your reply really make a lot of sense to me: I
> realize you currently require a cooperating guest, but when
> designing something like pass-through it would seem natural to me to
> try to allow for other guest kinds in the future.

Xen x86 was designed to work around existing guests. In that context you
had un-cooperative guests, such as Windows, and cooperative guests, such
as Linux, which could be extensively modified to run on Xen.

This is not the case on ARM. Please scratch all preconceptions.

On ARM we only have one kind of guests: operating systems that have been
ported to the published Xen ABIs. There are no legacy OSes to work
around. If a new OS comes around, free or proprietary, I expect it to be
ported to the Xen ABI. And that is not much of an effort, because we
designed the interface from the start to be clean, nice and to "make
sense". Linux and FreeBSD didn't have to be extensively modified to run
on Xen on ARM. This is why I am also opposed to some of your comments on
ACPI for ARM, for example.


* Re: PCI Pass-through in Xen ARM: Draft 4
From: Stefano Stabellini @ 2015-08-14 15:38 UTC
  To: Jan Beulich
  Cc: Prasun.kapoor, Ian Campbell, stefano.stabellini, Manish Jaggi,
	Vijaya Kumar, Julien Grall, Xen Devel

On Thu, 13 Aug 2015, Jan Beulich wrote:
> >>> On 13.08.15 at 11:42, <mjaggi@caviumnetworks.com> wrote:
> >   2.1    pci_hostbridge and pci_hostbridge_ops
> >   -----------------------------------------------------------------------------
> >   The init function in the PCI host driver calls to register hostbridge
> >   callbacks:
> > 
> >   int pci_hostbridge_register(pci_hostbridge_t *pcihb);
> > 
> >   struct pci_hostbridge_ops {
> >       u32 (*pci_conf_read)(struct pci_hostbridge*, u32 bus, u32 devfn,
> >                                   u32 reg, u32 bytes);
> >       void (*pci_conf_write)(struct pci_hostbridge*, u32 bus, u32 devfn,
> >                                   u32 reg, u32 bytes, u32 val);
> >   };
> > 
> >   struct pci_hostbridge{
> >       u32 segno;
> >       paddr_t cfg_base;
> >       paddr_t cfg_size;
> >       struct dt_device_node *dt_node;
> >       struct pci_hostbridge_ops ops;
> >       struct list_head list;
> >   };
> > 
> >   A PCI conf_read function would internally be as follows:
> >   u32 pcihb_conf_read(u32 seg, u32 bus, u32 devfn,u32 reg, u32 bytes)
> >   {
> >       pci_hostbridge_t *pcihb;
> >       list_for_each_entry(pcihb, &pci_hostbridge_list, list)
> >       {
> >           if(pcihb->segno == seg)
> >               return pcihb->ops.pci_conf_read(pcihb, bus, devfn, reg, bytes);
> >       }
> >       return -1;
> >   }
> 
> Which implies 1 segment per host bridge, which doesn't seem too
> nice to me: I can't see why a bridge might not cover more than one
> segment, and I also can't see why you shouldn't be able to put
> multiple bridges in the same segment when the number of busses
> they have is small.

1 segment per host bridge is an ARM IORT requirement
(http://infocenter.arm.com/help/topic/com.arm.doc.den0049a/DEN0049A_IO_Remapping_Table.pdf).
It is also pretty much in the spirit of the MCFG table, even though it
is not spelled out so clearly there. I know that we are talking about
device tree here, but I think it is safe to keep the same constraint.


* Re: PCI Pass-through in Xen ARM: Draft 4
From: Jaggi, Manish @ 2015-08-14 18:58 UTC
  To: Stefano Stabellini, Jan Beulich
  Cc: Prasun.kapoor, Kumar, Vijaya, Julien Grall, Ian Campbell, Xen Devel

How about having a short discussion on Monday during the Xen Summit to
discuss these points? We have a talk Tuesday morning and I would update it
based on the Monday discussion.

Regards,
Manish Jaggi

________________________________________
From: xen-devel-bounces@lists.xen.org <xen-devel-bounces@lists.xen.org> on behalf of Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Sent: Friday, August 14, 2015 9:08 PM
To: Jan Beulich
Cc: Prasun.kapoor@cavium.com; Ian Campbell; stefano.stabellini@eu.citrix.com; Jaggi, Manish; Kumar, Vijaya; Julien Grall; Xen Devel
Subject: Re: [Xen-devel] PCI Pass-through in Xen ARM: Draft 4

On Thu, 13 Aug 2015, Jan Beulich wrote:
> >>> On 13.08.15 at 11:42, <mjaggi@caviumnetworks.com> wrote:
> >   2.1    pci_hostbridge and pci_hostbridge_ops
> >   -----------------------------------------------------------------------------
> >   The init function in the PCI host driver calls to register hostbridge
> >   callbacks:
> >
> >   int pci_hostbridge_register(pci_hostbridge_t *pcihb);
> >
> >   struct pci_hostbridge_ops {
> >       u32 (*pci_conf_read)(struct pci_hostbridge*, u32 bus, u32 devfn,
> >                                   u32 reg, u32 bytes);
> >       void (*pci_conf_write)(struct pci_hostbridge*, u32 bus, u32 devfn,
> >                                   u32 reg, u32 bytes, u32 val);
> >   };
> >
> >   struct pci_hostbridge{
> >       u32 segno;
> >       paddr_t cfg_base;
> >       paddr_t cfg_size;
> >       struct dt_device_node *dt_node;
> >       struct pci_hostbridge_ops ops;
> >       struct list_head list;
> >   };
> >
> >   A PCI conf_read function would internally be as follows:
> >   u32 pcihb_conf_read(u32 seg, u32 bus, u32 devfn,u32 reg, u32 bytes)
> >   {
> >       pci_hostbridge_t *pcihb;
> >       list_for_each_entry(pcihb, &pci_hostbridge_list, list)
> >       {
> >           if(pcihb->segno == seg)
> >               return pcihb->ops.pci_conf_read(pcihb, bus, devfn, reg, bytes);
> >       }
> >       return -1;
> >   }
>
> Which implies 1 segment per host bridge, which doesn't seem too
> nice to me: I can't see why a bridge might not cover more than one
> segment, and I also can't see why you shouldn't be able to put
> multiple bridges in the same segment when the number of busses
> they have is small.

1 segment per host bridge is an ARM IORT requirement
(http://infocenter.arm.com/help/topic/com.arm.doc.den0049a/DEN0049A_IO_Remapping_Table.pdf).
It is also pretty much in the spirit of the MCFG table, even though it
is not spelled out so clearly there. I know that we are talking about
device tree here, but I think it is safe to keep the same constraint.



* Re: PCI Pass-through in Xen ARM: Draft 4
From: Stefano Stabellini @ 2015-08-16 23:59 UTC
  To: Jaggi, Manish
  Cc: Prasun.kapoor, Ian Campbell, Stefano Stabellini, Kumar, Vijaya,
	Julien Grall, Xen Devel, Jan Beulich

Sure, I would be happy to, but unfortunately Jan won't be attending.

On Fri, 14 Aug 2015, Jaggi, Manish wrote:
> How about having  a short discussion on Monday during Xen Summit to discuss on the points.
> We have a talk tuesday morning and I would update it based on the monday discussion.
> 
> Regards,
> Manish Jaggi
> 
> ________________________________________
> From: xen-devel-bounces@lists.xen.org <xen-devel-bounces@lists.xen.org> on behalf of Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> Sent: Friday, August 14, 2015 9:08 PM
> To: Jan Beulich
> Cc: Prasun.kapoor@cavium.com; Ian Campbell; stefano.stabellini@eu.citrix.com; Jaggi, Manish; Kumar, Vijaya; Julien Grall; Xen Devel
> Subject: Re: [Xen-devel] PCI Pass-through in Xen ARM: Draft 4
> 
> On Thu, 13 Aug 2015, Jan Beulich wrote:
> > >>> On 13.08.15 at 11:42, <mjaggi@caviumnetworks.com> wrote:
> > >   2.1    pci_hostbridge and pci_hostbridge_ops
> > >   -----------------------------------------------------------------------------
> > >   The init function in the PCI host driver calls to register hostbridge
> > >   callbacks:
> > >
> > >   int pci_hostbridge_register(pci_hostbridge_t *pcihb);
> > >
> > >   struct pci_hostbridge_ops {
> > >       u32 (*pci_conf_read)(struct pci_hostbridge*, u32 bus, u32 devfn,
> > >                                   u32 reg, u32 bytes);
> > >       void (*pci_conf_write)(struct pci_hostbridge*, u32 bus, u32 devfn,
> > >                                   u32 reg, u32 bytes, u32 val);
> > >   };
> > >
> > >   struct pci_hostbridge{
> > >       u32 segno;
> > >       paddr_t cfg_base;
> > >       paddr_t cfg_size;
> > >       struct dt_device_node *dt_node;
> > >       struct pci_hostbridge_ops ops;
> > >       struct list_head list;
> > >   };
> > >
> > >   A PCI conf_read function would internally be as follows:
> > >   u32 pcihb_conf_read(u32 seg, u32 bus, u32 devfn,u32 reg, u32 bytes)
> > >   {
> > >       pci_hostbridge_t *pcihb;
> > >       list_for_each_entry(pcihb, &pci_hostbridge_list, list)
> > >       {
> > >           if(pcihb->segno == seg)
> > >               return pcihb->ops.pci_conf_read(pcihb, bus, devfn, reg, bytes);
> > >       }
> > >       return -1;
> > >   }
> >
> > Which implies 1 segment per host bridge, which doesn't seem too
> > nice to me: I can't see why a bridge might not cover more than one
> > segment, and I also can't see why you shouldn't be able to put
> > multiple bridges in the same segment when the number of busses
> > they have is small.
> 
> 1 segment per host bridge is an ARM IORT requirement
> (http://infocenter.arm.com/help/topic/com.arm.doc.den0049a/DEN0049A_IO_Remapping_Table.pdf).
> It is also pretty much in the spirit of the MCFG table, even though it
> is not spelled out so clearly there. I know that we are talking about
> device tree here, but I think it is safe to keep the same constraint.
> 


* Re: PCI Pass-through in Xen ARM: Draft 4
From: Manish Jaggi @ 2015-08-31 12:36 UTC
  To: Xen Devel, Konrad Rzeszutek Wilk, Stefano Stabellini,
	Manish Jaggi, "Kumar, Vijaya",
	Julien Grall, Ian Campbell
  Cc: Prasun.kapoor



On Thursday 13 August 2015 03:12 PM, Manish Jaggi wrote:
>               -----------------------------
>              | PCI Pass-through in Xen ARM |
>               -----------------------------
>              manish.jaggi@caviumnetworks.com
>              -------------------------------
>
>                       Draft-4
> [snip]
>
>  4.2.1 Mapping BAR regions in guest address space
>  ----------------------------------------------------------------------------- 
>
>  When a PCI-EP device is assigned to a domU the toolstack will read 
> the pci
>  configuration space BAR registers. Toolstack allocates a virtual BAR
>  region for each BAR region, from the area reserved in guest address 
> space for
>  mapping BARs referred to as Guest BAR area. This area is defined in
>  public/arch-arm.h
>
>  /* For 32bit BARs*/
>  #define GUEST_BAR_BASE_32 <<>>
>  #define GUEST_BAR_SIZE_32 <<>>
>
>  /* For 64bit BARs*/
>  #define GUEST_BAR_BASE_64 <<>>
>  #define GUEST_BAR_SIZE_64 <<>>
>
>  Toolstack then invokes domctl xc_domain_memory_mapping to map in stage2
>  translation. If a BAR region address is 32b BASE_32 area would be used,
>  otherwise 64b. If a combination of both is required the support is TODO.
>
>  Toolstack manages these areas and allocate from these area. The 
> allocation
>  and deallocation is done using APIs similar to malloc and free.
>

To implement this feature in the xl tools an allocator (malloc/free
style) over the reserved area is required.
Could XEN_DOMCTL_memory_mapping be extended with a flag, say
ALLOCATE/FREE_FROM_BAR_AREA? When this flag is passed xen would add or
remove the stage-2 mapping for the domain, picking the guest address
from the reserved BAR area itself.
This would reuse code already present in xen.
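
A rough sketch of what such an extension could look like, purely for
illustration (the flag values are placeholders, and reusing the struct's
trailing padding field as a flags field is an assumption, not an agreed
ABI):

    /* Hypothetical extension of the existing domctl argument. */
    #define XEN_MEM_MAPPING_ALLOC_FROM_BAR_AREA  (1u << 0)
    #define XEN_MEM_MAPPING_FREE_FROM_BAR_AREA   (1u << 1)

    struct xen_domctl_memory_mapping {
        uint64_aligned_t first_gfn;  /* OUT when ALLOC_FROM_BAR_AREA set */
        uint64_aligned_t first_mfn;
        uint64_aligned_t nr_mfns;
        uint32_t add_mapping;
        uint32_t flags;              /* was: padding */
    };

With ALLOC_FROM_BAR_AREA set, the toolstack would leave first_gfn unset
and xen would pick a free IPA range inside the GUEST_BAR_* window,
returning it in first_gfn.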

Any reservations with this approach?

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: PCI Pass-through in Xen ARM: Draft 4
  2015-08-31 12:36 ` Manish Jaggi
@ 2015-09-01  7:32   ` Jan Beulich
  2015-09-02 12:08     ` Manish Jaggi
  0 siblings, 1 reply; 40+ messages in thread
From: Jan Beulich @ 2015-09-01  7:32 UTC (permalink / raw)
  To: Manish Jaggi
  Cc: Prasun.kapoor, Ian Campbell, stefano.stabellini, Vijaya Kumar,
	Julien Grall, Xen Devel

>>> On 31.08.15 at 14:36, <mjaggi@caviumnetworks.com> wrote:
> On Thursday 13 August 2015 03:12 PM, Manish Jaggi wrote:
>>  4.2.1 Mapping BAR regions in guest address space
>>  ----------------------------------------------------------------------------- 
>>
>>  When a PCI-EP device is assigned to a domU the toolstack will read 
>> the pci
>>  configuration space BAR registers. Toolstack allocates a virtual BAR
>>  region for each BAR region, from the area reserved in guest address 
>> space for
>>  mapping BARs referred to as Guest BAR area. This area is defined in
>>  public/arch-arm.h
>>
>>  /* For 32bit BARs*/
>>  #define GUEST_BAR_BASE_32 <<>>
>>  #define GUEST_BAR_SIZE_32 <<>>
>>
>>  /* For 64bit BARs*/
>>  #define GUEST_BAR_BASE_64 <<>>
>>  #define GUEST_BAR_SIZE_64 <<>>
>>
>>  Toolstack then invokes domctl xc_domain_memory_mapping to map in stage2
>>  translation. If a BAR region address is 32b BASE_32 area would be used,
>>  otherwise 64b. If a combination of both is required the support is TODO.
>>
>>  Toolstack manages these areas and allocates from them. The allocation
>>  and deallocation are done using APIs similar to malloc and free.
>>
> 
> To implement this feature in the xl tools an allocator (malloc/free
> style) over the reserved area is required.
> Could XEN_DOMCTL_memory_mapping be extended with a flag, say
> ALLOCATE/FREE_FROM_BAR_AREA? When this flag is passed xen would add or
> remove the stage-2 mapping for the domain.
> This would reuse code already present in xen.

Above it was said that the tool stack manages this area (including
allocations from it). Why would this require a new hypercall?

Jan

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: PCI Pass-through in Xen ARM: Draft 4
  2015-08-13  9:42 PCI Pass-through in Xen ARM: Draft 4 Manish Jaggi
                   ` (2 preceding siblings ...)
  2015-08-31 12:36 ` Manish Jaggi
@ 2015-09-01 16:15 ` Stefano Stabellini
  2015-09-10  1:12 ` Julien Grall
  4 siblings, 0 replies; 40+ messages in thread
From: Stefano Stabellini @ 2015-09-01 16:15 UTC (permalink / raw)
  To: Manish Jaggi
  Cc: Prasun.kapoor, Ian Campbell, Andrew Cooper, Kumar, Vijaya,
	Julien Grall, Xen Devel, David Vrabel, Stefano Stabellini,
	Roger Pau Monne

CC'ing a few other x86 people as this is likely to be the same approach
that will be taken by PVH.
        
    
On Thu, 13 Aug 2015, Manish Jaggi wrote:
>               -----------------------------
>              | PCI Pass-through in Xen ARM |
>               -----------------------------
>              manish.jaggi@caviumnetworks.com
>              -------------------------------
> 
>                       Draft-4
> 
> 
>  -----------------------------------------------------------------------------
>  Introduction
>  -----------------------------------------------------------------------------
>  This document describes the design for the PCI passthrough support in Xen
>  ARM. The target system is an ARM 64bit SoC with GICv3 and SMMU v2 and PCIe
>  devices.
> 
>  -----------------------------------------------------------------------------
>  Revision History
>  -----------------------------------------------------------------------------
>  Changes from Draft-1:
>  ---------------------
>  a) map_mmio hypercall removed from earlier draft
>  b) device bar mapping into guest not 1:1
>  c) Reserved Area in guest address space for mapping PCI-EP BARs in Stage2.
>  d) Xenstore Update: For each PCI-EP BAR (IPA-PA mapping info).
> 
>  Changes from Draft-2:
>  ---------------------
>  a) DomU boot information updated with boot-time device assignment and
>  hotplug.
>  b) SMMU description added
>  c) Mapping between streamID - bdf - deviceID.
>  d) assign_device hypercall to include virtual(guest) sbdf.
>  Toolstack to generate guest sbdf rather than pciback.
> 
>  Changes from Draft-3:
>  ---------------------
>  a) Fixed typos and added more description
>  b) NUMA and PCI passthrough description removed for now.
>  c) Added example from Ian's Mail
> 
>  -----------------------------------------------------------------------------
>  Index
>  -----------------------------------------------------------------------------
>    (1) Background
> 
>    (2) Basic PCI Support in Xen ARM
>    (2.1) pci_hostbridge and pci_hostbridge_ops
>    (2.2) PHYSDEVOP_HOSTBRIDGE_ADD hypercall
>    (2.3) XEN Internal API
> 
>    (3) SMMU programming
>    (3.1) Additions for PCI Passthrough
>    (3.2) Mapping between streamID - deviceID - pci sbdf - requesterID
> 
>    (4) Assignment of PCI device
>    (4.1) Dom0
>    (4.1.1) Stage 2 Mapping of GITS_ITRANSLATER space (4k)
>    (4.1.1.1) For Dom0
>    (4.1.1.2) For DomU
>    (4.1.1.2.1) Hypercall Details: XEN_DOMCTL_get_itranslater_space
> 
>    (4.2) DomU
>    (4.2.1) Reserved Areas in guest memory space
>    (4.2.2) Xenstore Update: For each PCI-EP BAR (IPA-PA mapping info).
>    (4.2.3) Hypercall Modification for bdf mapping notification to xen
> 
>    (5) DomU FrontEnd Bus Changes
>    (5.1) Change in Linux PCI frontend bus and gicv3-its node binding for domU
> 
>    (6) Glossary
> 
>    (7) References
>  -----------------------------------------------------------------------------
> 
>  1.    Background of PCI passthrough
>  -----------------------------------------------------------------------------
>  Passthrough refers to assigning a PCI device to a guest domain (domU) such
>  that the guest has full control over the device. The MMIO space / interrupts
>  are managed by the guest itself, close to how a bare kernel manages a device.
> 
>  Device's access to guest address space needs to be isolated and protected.
>  The SMMU (System MMU - the IOMMU in ARM) is programmed by the xen
>  hypervisor to allow the device to access guest memory for data transfer
>  and for sending MSI/X interrupts. Message signalled interrupt writes
>  generated by PCI devices target guest address space and are likewise
>  translated by the SMMU.
> 
>  For this reason the GITS (ITS address space) Interrupt Translation Register
>  space is mapped in the guest address space.
> 
>  2.    Basic PCI Support for ARM
>  -----------------------------------------------------------------------------
>  The APIs to read/write the PCI configuration space are based on
>  segment:bdf. How the sbdf is mapped to a physical address is the
>  responsibility of the PCI host controller.
> 
>  ARM PCI support in Xen introduces a PCI host controller, similar to
>  what exists in Linux. Host controller drivers register callbacks, which
>  are invoked on matching the compatible property in the pci device tree
>  node.
> 
>  Note: as pci devices are enumerated by dom0, the pci node in the device
>  tree refers to the host controller rather than to individual devices.
> 
>  (TODO: ACPI support is unimplemented.)
> 
>  2.1    pci_hostbridge and pci_hostbridge_ops
>  -----------------------------------------------------------------------------
>  The init function in the PCI host driver calls the following to
>  register hostbridge callbacks:
> 
>  int pci_hostbridge_register(pci_hostbridge_t *pcihb);
> 
>  struct pci_hostbridge_ops {
>      u32 (*pci_conf_read)(struct pci_hostbridge*, u32 bus, u32 devfn,
>                                  u32 reg, u32 bytes);
>      void (*pci_conf_write)(struct pci_hostbridge*, u32 bus, u32 devfn,
>                                  u32 reg, u32 bytes, u32 val);
>  };
> 
>  struct pci_hostbridge{
>      u32 segno;
>      paddr_t cfg_base;
>      paddr_t cfg_size;
>      struct dt_device_node *dt_node;
>      struct pci_hostbridge_ops ops;
>      struct list_head list;
>  };
> 
>  A PCI conf_read function would internally be as follows:
>  u32 pcihb_conf_read(u32 seg, u32 bus, u32 devfn, u32 reg, u32 bytes)
>  {
>      pci_hostbridge_t *pcihb;
>      list_for_each_entry(pcihb, &pci_hostbridge_list, list)
>      {
>          if(pcihb->segno == seg)
>              return pcihb->ops.pci_conf_read(pcihb, bus, devfn, reg, bytes);
>      }
>      return -1;
>  }
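> 
>  For illustration, a host bridge driver registration might look like
>  this (addresses and names are examples only):
> 
>  static u32 my_conf_read(struct pci_hostbridge *hb, u32 bus, u32 devfn,
>                          u32 reg, u32 bytes);
>  static void my_conf_write(struct pci_hostbridge *hb, u32 bus, u32 devfn,
>                            u32 reg, u32 bytes, u32 val);
> 
>  static struct pci_hostbridge my_hb = {
>      .cfg_base = 0xA0000000,   /* from the pci device tree node */
>      .cfg_size = 0x10000000,
>      .ops      = {
>          .pci_conf_read  = my_conf_read,
>          .pci_conf_write = my_conf_write,
>      },
>  };
> 
>  /* from the host driver's init function, on a compatible match;
>   * segno is bound later via pci_hostbridge_setup (see 2.2) */
>  pci_hostbridge_register(&my_hb);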
> 
>  2.2    PHYSDEVOP_pci_host_bridge_add hypercall
>  -----------------------------------------------------------------------------
>  Xen code accesses PCI configuration space based on the sbdf received from
>  the guest. The order in which the pci device tree nodes appear may not
>  match the order of device enumeration in dom0. Thus there needs to be a
>  mechanism to bind the segment number assigned by dom0 to the pci host
>  controller. The following hypercall is introduced:
> 
>  #define PHYSDEVOP_pci_host_bridge_add    <<>>
>  struct physdev_pci_host_bridge_add {
>      /* IN */
>      uint16_t seg;
>      uint64_t cfg_base;
>      uint64_t cfg_size;
>  };
> 
>  This hypercall is invoked before dom0 invokes the PHYSDEVOP_pci_device_add
>  hypercall.
> 
>  To understand the requirement in detail, Ian's example is listed below:
> -- Ref: [1]
>  Imagine we have two PCI host bridges, one with CFG space at 0xA0000000 and
>  a second with CFG space at 0xB0000000.
> 
>  Xen discovers these and assigns segment 0=0xA0000000 and segment
>  1=0xB0000000.
> 
>  Dom0 discovers them too but assigns segment 1=0xA0000000 and segment
>  0=0xB0000000 (i.e. the other way).
> 
>  Now Dom0 makes a hypercall referring to a device as (segment=1,BDF), i.e.
>  the device with BDF behind the root bridge at 0xA0000000. (Perhaps this is
>  the PHYSDEVOP_manage_pci_add_ext call).
> 
>  But Xen thinks it is talking about the device with BDF behind the root
>  bridge at 0xB0000000 because Dom0 and Xen do not agree on what the segments
>  mean. Now Xen will use the wrong device ID in the IOMMU (since that is
>  associated with the host bridge), or poke the wrong configuration space, or
>  whatever.
> 
>  Or maybe Xen chose 42=0xB0000000 and 43=0xA0000000 so when Dom0 starts
>  talking about segment=0 and =1 it has no idea what is going on.
> 
>  PHYSDEVOP_pci_host_bridge_add is intended to allow Dom0 to say "Segment 0
>  is the host bridge at 0xB0000000" and "Segment 1 is the host bridge at
>  0xA0000000". With this there is no confusion between Xen and Dom0 because
>  Xen isn't picking a segment ID, it is being told what it is by Dom0 which
>  has done the picking.
> --
> 
>  The handler code invokes the following to update the segment number in
>  the pci_hostbridge:
> 
>  int pci_hostbridge_setup(uint32_t segno, uint64_t cfg_base, uint64_t
>  cfg_size);
> 
>  Subsequent calls to pci_conf_read/write are completed by the
>  pci_hostbridge_ops of the respective pci_hostbridge.
> 
>  2.3    XEN Internal API
>  -----------------------------------------------------------------------------
>  a) pci_hostbridge_dt_node
> 
>  struct dt_device_node* pci_hostbridge_dt_node(uint32_t segno);
> 
>  Returns the device tree node pointer of the pci node which is bound to
>  the passed segment number. The API can be called subsequent to
>  pci_hostbridge_setup.
> 
>  3.    SMMU programming
>  -----------------------------------------------------------------------------
> 
>  3.1.    Additions for PCI Passthrough
>  -----------------------------------------------------------------------------
> 
>  3.1.1 - add_device in iommu_ops is implemented.
>  -----------------------------------------------------------------------------
> 
>  This is called when PHYSDEVOP_pci_device_add / PHYSDEVOP_manage_pci_add_ext
>  is called from dom0.
> 
>  .add_device = arm_smmu_add_dom0_dev,
>  static int arm_smmu_add_dom0_dev(u8 devfn, struct device *dev)
>  {
>          if (dev_is_pci(dev)) {
>              struct pci_dev *pdev = to_pci_dev(dev);
>              return arm_smmu_assign_dev(pdev->domain, devfn, dev);
>          }
>          return -1;
>  }
> 
>  3.1.2 - remove_device in iommu_ops is implemented.
>  -----------------------------------------------------------------------------
>  This is called when PHYSDEVOP_pci_device_remove is called from dom0/domU.
> 
>  .remove_device = arm_smmu_remove_dev.
>  TODO: add implementation details of arm_smmu_remove_dev.
> 
>  3.1.3 dev_get_dev_node is modified for pci devices.
>  -----------------------------------------------------------------------------
>  The function is modified to return the dt_node of the pci hostbridge from
>  the device tree. This is required as non-dt devices need a way to find
>  which smmu they are attached to.
> 
>  static struct arm_smmu_device *find_smmu_for_device(struct device *dev)
>  {
>          struct device_node *dev_node = dev_get_dev_node(dev);
>  ....
> 
>  static struct device_node *dev_get_dev_node(struct device *dev)
>  {
>          if (dev_is_pci(dev)) {
>                  struct pci_dev *pdev = to_pci_dev(dev);
>                  return pci_hostbridge_dt_node(pdev->seg);
>          }
>  ...
> 
> 
>  3.2.    Mapping between streamID - deviceID - pci sbdf - requesterID
>  -----------------------------------------------------------------------------
>  In the simple case all of these are equal to the BDF. But there are
>  some devices that use the wrong requester ID for DMA transactions; the
>  Linux kernel has PCI quirks for these. Whether the same can be
>  implemented in Xen, or a different approach has to be taken, is a TODO
>  here.
> 
>  Till that time, for basic implementation it is assumed that all are equal
>  to BDF.
> 
>  4.    Assignment of PCI device
>  -----------------------------------------------------------------------------
> 
>  4.1    Dom0
>  -----------------------------------------------------------------------------
>  All PCI devices are assigned to dom0 unless hidden by the pciback.hide
>  boot argument in dom0. Dom0 enumerates the PCI devices. For each device
>  the MMIO space has
>  to be mapped in the Stage2 translation for dom0. For dom0 Xen maps ranges
>  from device tree pci nodes in stage 2 translation during boot.
> 
>  While processing the PHYSDEVOP_pci_device_add hypercall,
>  its_add_device(machine_sbdf) should be called. This will allocate ITS
>  specific data structures for the device. (Reference [2])
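> 
>  For illustration, the handler flow might look as follows (argument
>  copying and error handling elided; the helper names stand in for the
>  existing common code):
> 
>  case PHYSDEVOP_pci_device_add: {
>      struct physdev_pci_device_add add;
>      /* ... copy_from_guest(&add, arg, 1) and existing handling ... */
>      u32 machine_sbdf = PCI_SBDF(add.seg, add.bus, add.devfn);
>      ret = its_add_device(machine_sbdf);  /* new: per-device ITS state */
>      break;
>  }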
> 
> 
>  4.1.1 Stage 2 Mapping of GITS_ITRANSLATER space (64k)
>  -----------------------------------------------------------------------------
> 
>  GITS_ITRANSLATER space (64k) must be programmed in Stage2 translation so
>  that SMMU can translate MSI(x) from the device using the page table of the
>  domain.
> 
>  4.1.1.1 For Dom0
>  -----------------------------------------------------------------------------
>  GITS_ITRANSLATER address space is mapped 1:1 during dom0 boot. For dom0 this
>  mapping is done in the vgic driver. For domU the mapping is done by
>  toolstack.
> 
>  4.1.1.2 For DomU
>  -----------------------------------------------------------------------------
>  For domU, while creating the domain, the toolstack reads the IPA from the
>  macro GITS_ITRANSLATER_SPACE from xen/include/public/arch-arm.h. The PA is
>  read from a new hypercall which returns PA of GITS_ITRANSLATER_SPACE.
> 
>  Subsequently toolstack sends a hypercall to create a stage 2 mapping.
> 
>  Hypercall Details: XEN_DOMCTL_get_itranslater_space
> 
>  /* XEN_DOMCTL_get_itranslater_space */
>  struct xen_domctl_get_itranslater_space {
>      /* OUT variables. */
>      uint64_aligned_t start_addr;
>      uint64_aligned_t size;
>  };
> 
>  4.2 DomU
>  -----------------------------------------------------------------------------
> 
>  4.2.1 Mapping BAR regions in guest address space
>  -----------------------------------------------------------------------------
>  When a PCI-EP device is assigned to a domU the toolstack will read the pci
>  configuration space BAR registers. Toolstack allocates a virtual BAR
>  region for each BAR region, from the area reserved in guest address space for
>  mapping BARs, referred to as the Guest BAR area. This area is defined
>  in public/arch-arm.h:
> 
>  /* For 32bit BARs*/
>  #define GUEST_BAR_BASE_32 <<>>
>  #define GUEST_BAR_SIZE_32 <<>>
> 
>  /* For 64bit BARs*/
>  #define GUEST_BAR_BASE_64 <<>>
>  #define GUEST_BAR_SIZE_64 <<>>
> 
>  Toolstack then invokes domctl xc_domain_memory_mapping to map in stage2
>  translation. If a BAR region address is 32-bit, the BASE_32 area is
>  used, otherwise the 64-bit one. Support for a combination of both is
>  TODO.
> 
>  Toolstack manages these areas and allocates from them. The allocation
>  and deallocation are done using APIs similar to malloc and free.
> 
>  4.2.2    Xenstore Update: For each PCI-EP BAR (IPA-PA mapping info).
>  ----------------------------------------------------------------------------
>  Toolstack also updates the xenstore information for the device
>  (virtual bar : physical bar). This information is read by xen-pciback
>  and returned to the domU pcifront driver for configuration space reads
>  of the BARs.
> 
>  Entries created are as follows:
>  /local/domain/0/backend/pci/1/0
>  vdev-N
>      BDF = ""
>      BAR-0-IPA = ""
>      BAR-0-PA = ""
>      BAR-0-SIZE = ""
>      ...
>      BAR-M-IPA = ""
>      BAR-M-PA = ""
>      BAR-M-SIZE = ""
> 
>  Note: If BAR M SIZE is 0, it is not a valid entry.
> 
>  4.2.3 Hypercall Modification (XEN_DOMCTL_assign_device)
>  ----------------------------------------------------------------------------
>  For each machine sbdf a guest sbdf needs to be generated when a device
>  is assigned to a domU. Currently this is done by xen-pciback. As per
>  discussions [4] on xen-devel, the d.f generation should be done by the
>  toolstack rather than by xen-pciback.
> 
>  Since there is only one pci-frontend bus in the domU, the guest s:b:d.f
>  is 0:0:d.f. It is proposed in this design document that the d.f
>  generation be done by the toolstack and that the xenstore keys also be
>  created by the toolstack.
> 
>  Following guest_sbdf generation, the domctl to assign the device is
>  invoked. This hypercall is updated to include *guest_sbdf*. The Xen ITS
>  driver can store the mapping domID : guest_sbdf : machine_sbdf for
>  later use.
> 
>  struct xen_domctl_assign_device {
>     uint32_t dev;   /* XEN_DOMCTL_DEV_* */
>     union {
>         struct {
>             uint32_t machine_sbdf;   /* machine PCI ID of assigned device */
>             uint32_t guest_sbdf;   /* guest PCI ID of assigned device */
>         } pci;
>         struct {
>             uint32_t size; /* Length of the path */
>             XEN_GUEST_HANDLE_64(char) path; /* path to the device tree node */
>         } dt;
>     } u;
>  };
> 
>  In the handler of this hypercall an internal API function
>  its_assign_device(domid, machine_sbdf, guest_sbdf)
>  (Reference [2])
> 
>  is called, which stores the mapping between machine_sbdf and
>  guest_sbdf.
> 
>  5. Change in Linux PCI frontEnd - backend driver for MSI/X programming
>  -----------------------------------------------------------------------------
> 
>  5.1 pci-frontend bus and gicv3-its node binding for domU
>  -----------------------------------------------------------------------------
>  It is assumed that toolstack would generate a gicv3-its node in domU device
>  tree. As of now the ARM PCI passthrough design supports device assignment to
>  the guests which have gicv3-its support. PCI passthrough with a gicv2 guest
>  is not supported.
> 
>  All the devices assigned to a domU are enumerated on a PCI frontend
>  bus. On this bus the interrupt parent is set to gicv3-its for ARM
>  systems. As the gicv3-its is emulated in xen, all accesses by the domU
>  driver are trapped. This allows configuration and direct injection of
>  MSIs (LPIs) into the guest, so the frontend-backend communication for
>  MSI is no longer required.
> 
>  Frontend-backend communication is required only for reading PCI configuration
>  space by dom0 on behalf of domU.
> 
>  6.    Glossary
>  -----------------------------------------------------------------------------
>  MSI: Message Signalled Interrupt
>  ITS: Interrupt Translation Service
>  GIC: Generic Interrupt Controller
>  LPI: Locality-specific Peripheral Interrupt
> 
> 
>  7.    References
>  -----------------------------------------------------------------------------
>  [1]. http://osdir.com/ml/general/2015-08/msg15346.html
>  [2]. http://lists.xen.org/archives/html/xen-devel/2015-07/msg01984.html
>  [3]. http://xenbits.xen.org/people/ianc/vits/draftG.html
>  [4]. http://lists.xen.org/archives/html/xen-devel/2015-07/msg05513.html
> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: PCI Pass-through in Xen ARM: Draft 4
  2015-09-01  7:32   ` Jan Beulich
@ 2015-09-02 12:08     ` Manish Jaggi
  2015-09-02 12:59       ` Julien Grall
  0 siblings, 1 reply; 40+ messages in thread
From: Manish Jaggi @ 2015-09-02 12:08 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Prasun.kapoor, Ian Campbell, stefano.stabellini, Vijaya Kumar,
	Julien Grall, Xen Devel



On Tuesday 01 September 2015 01:02 PM, Jan Beulich wrote:
>>>> On 31.08.15 at 14:36, <mjaggi@caviumnetworks.com> wrote:
>> On Thursday 13 August 2015 03:12 PM, Manish Jaggi wrote:
>>>   4.2.1 Mapping BAR regions in guest address space
>>>   -----------------------------------------------------------------------------
>>>
>>>   When a PCI-EP device is assigned to a domU the toolstack will read
>>> the pci
>>>   configuration space BAR registers. Toolstack allocates a virtual BAR
>>>   region for each BAR region, from the area reserved in guest address
>>> space for
>>>   mapping BARs referred to as Guest BAR area. This area is defined in
>>>   public/arch-arm.h
>>>
>>>   /* For 32bit BARs*/
>>>   #define GUEST_BAR_BASE_32 <<>>
>>>   #define GUEST_BAR_SIZE_32 <<>>
>>>
>>>   /* For 64bit BARs*/
>>>   #define GUEST_BAR_BASE_64 <<>>
>>>   #define GUEST_BAR_SIZE_64 <<>>
>>>
>>>   Toolstack then invokes domctl xc_domain_memory_mapping to map in stage2
>>>   translation. If a BAR region address is 32b BASE_32 area would be used,
>>>   otherwise 64b. If a combination of both is required the support is TODO.
>>>
>>>   Toolstack manages these areas and allocates from them. The
>>>   allocation and deallocation are done using APIs similar to malloc
>>>   and free.
>>>
>> To implement this feature in the xl tools an allocator (malloc/free
>> style) over the reserved area is required.
>> Could XEN_DOMCTL_memory_mapping be extended with a flag, say
>> ALLOCATE/FREE_FROM_BAR_AREA? When this flag is passed xen would add or
>> remove the stage-2 mapping for the domain.
>> This would reuse code already present in xen.
> Above it was said that the tool stack manages this area (including
> allocations from it). Why would this require a new hypercall?
As a rule the xl tools should manage the guest memory map; whether they
do so themselves or merely initiate it is another matter.
Allocating an area for a PCI BAR from the reserved area, and freeing it
again, would require adding allocator code to the xl tools.
Since xen already knows about the area (it is defined in the public
header file) and code already exists in xen, I believe it makes sense to
use that rather than adding the same in the xl tools.


> Jan
>
>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: PCI Pass-through in Xen ARM: Draft 4
  2015-09-02 12:08     ` Manish Jaggi
@ 2015-09-02 12:59       ` Julien Grall
  2015-09-02 13:46         ` Ian Campbell
  2015-09-02 15:03         ` Ian Campbell
  0 siblings, 2 replies; 40+ messages in thread
From: Julien Grall @ 2015-09-02 12:59 UTC (permalink / raw)
  To: Manish Jaggi, Jan Beulich
  Cc: Prasun.kapoor, Ian Campbell, stefano.stabellini, Vijaya Kumar,
	Julien Grall, Xen Devel

On 02/09/15 13:08, Manish Jaggi wrote:
> 
> 
> On Tuesday 01 September 2015 01:02 PM, Jan Beulich wrote:
>>>>> On 31.08.15 at 14:36, <mjaggi@caviumnetworks.com> wrote:
>>> On Thursday 13 August 2015 03:12 PM, Manish Jaggi wrote:
>>>>   4.2.1 Mapping BAR regions in guest address space
>>>>  
>>>> -----------------------------------------------------------------------------
>>>>
>>>>
>>>>   When a PCI-EP device is assigned to a domU the toolstack will read
>>>> the pci
>>>>   configuration space BAR registers. Toolstack allocates a virtual BAR
>>>>   region for each BAR region, from the area reserved in guest address
>>>> space for
>>>>   mapping BARs referred to as Guest BAR area. This area is defined in
>>>>   public/arch-arm.h
>>>>
>>>>   /* For 32bit BARs*/
>>>>   #define GUEST_BAR_BASE_32 <<>>
>>>>   #define GUEST_BAR_SIZE_32 <<>>
>>>>
>>>>   /* For 64bit BARs*/
>>>>   #define GUEST_BAR_BASE_64 <<>>
>>>>   #define GUEST_BAR_SIZE_64 <<>>
>>>>
>>>>   Toolstack then invokes domctl xc_domain_memory_mapping to map in
>>>> stage2
>>>>   translation. If a BAR region address is 32b BASE_32 area would be
>>>> used,
>>>>   otherwise 64b. If a combination of both is required the support is
>>>> TODO.
>>>>
>>>>   Toolstack manages these areas and allocates from them. The
>>>>   allocation and deallocation are done using APIs similar to malloc
>>>>   and free.
>>>>
>>> To implement this feature in the xl tools an allocator (malloc/free
>>> style) over the reserved area is required.
>>> Could XEN_DOMCTL_memory_mapping be extended with a flag, say
>>> ALLOCATE/FREE_FROM_BAR_AREA? When this flag is passed xen would add
>>> or remove the stage-2 mapping for the domain.
>>> This would reuse code already present in xen.
>> Above it was said that the tool stack manages this area (including
>> allocations from it). Why would this require a new hypercall?
> As a rule the xl tools should manage the guest memory map; whether they
> do so themselves or merely initiate it is another matter.
> Allocating an area for a PCI BAR from the reserved area, and freeing it
> again, would require adding allocator code to the xl tools.
> Since xen already knows about the area (it is defined in the public
> header file) and code already exists in xen,

Which code? We don't have any code in Xen to find free space in the
stage-2 table for the PCI region.

Anyway, I think this logic should be done in the toolstack and not in
the hypervisor. Only the toolstack is in charge of the memory layout.
Xen appears to know the memory layout on ARM only because it's
statically defined.

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: PCI Pass-through in Xen ARM: Draft 4
  2015-09-02 12:59       ` Julien Grall
@ 2015-09-02 13:46         ` Ian Campbell
  2015-09-02 15:03           ` Ian Campbell
  2015-09-02 15:03         ` Ian Campbell
  1 sibling, 1 reply; 40+ messages in thread
From: Ian Campbell @ 2015-09-02 13:46 UTC (permalink / raw)
  To: Julien Grall, Manish Jaggi, Jan Beulich
  Cc: Prasun.kapoor, Vijaya Kumar, Julien Grall, Xen Devel, stefano.stabellini

On Wed, 2015-09-02 at 13:59 +0100, Julien Grall wrote:

(I'm not caught up on my mail, so just commenting on this one aspect)

> Anyway, I think this logic should be done in the toolstack and not in
> the hypervisor. Only the toolstack is in charge of the memory layout.
> Xen appears to know the memory layout on ARM because it's statically 
> defined.

The domU address space GUEST_* #defines in xen/include/public/arch-arm.h
are really just a convenience used when the toolstack and hypervisor need
to agree on a value and that value happens, right now, to be static.

The _correct_ interface would be a hypercall (or several) where the
toolstack tells Xen what the values are, but that's code and faff etc, so
where the value the toolstack would choose is static we take a short cut
and add one of these #defines.

So everyone should just think of every GUEST_FOO in there as being
equivalent to:
    struct xen_arch_domainconfig {
	//.... other stuff
        uint64_t foo;
    };
i.e. passed to XEN_DOMCTL_createdomain during domain build (plus,
obviously, a field foo in struct arch_domain where Xen stashes the value
and uses that instead of GUEST_FOO etc).

The actual value of foo would be in tools/libx?/something.h, or decided at
runtime.

Obviously for FOO which is a static value that's a pain, hence the #defines
instead.
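
A hypothetical end-to-end of that equivalence, with invented names for
illustration:

    /* tools/libx?/something.h -- the value the toolstack picked */
    #define LIBXL_GUEST_FOO   0xdeadbee000ULL

    /* at domain build time */
    struct xen_arch_domainconfig config = {
        /* .... other stuff */
        .foo = LIBXL_GUEST_FOO,
    };
    /* config is handed to XEN_DOMCTL_createdomain; xen stashes
     * config.foo in d->arch.foo and uses it wherever it would have
     * used GUEST_FOO before. */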

Ian.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: PCI Pass-through in Xen ARM: Draft 4
  2015-08-14 14:34             ` Jan Beulich
  2015-08-14 14:37               ` Stefano Stabellini
@ 2015-09-02 14:45               ` Ian Campbell
  2015-09-02 14:52                 ` Jan Beulich
  1 sibling, 1 reply; 40+ messages in thread
From: Ian Campbell @ 2015-09-02 14:45 UTC (permalink / raw)
  To: Jan Beulich, stefano.stabellini
  Cc: Prasun.kapoor, Manish Jaggi, Vijaya Kumar, JulienGrall, Xen Devel

On Fri, 2015-08-14 at 08:34 -0600, Jan Beulich wrote:
> > 
> > > > On 14.08.15 at 16:03, <stefano.stabellini@eu.citrix.com> wrote:
> > On Fri, 14 Aug 2015, Jan Beulich wrote:
> > > > > > On 14.08.15 at 15:21, <stefano.stabellini@eu.citrix.com> wrote:
> > > > On Fri, 14 Aug 2015, Jan Beulich wrote:
> > > > > it's even less clear how you'd expect to suppress this in other 
> > > > > guest
> > > > > OSes (Windows - afaik - being able to run on ARM certainly makes 
> > > > > it
> > > > > a candidate guest, even if maybe this doesn't work today), namely
> > > > > such not having any pcifront. And I hope this design discussion 
> > > > > isn't
> > > > > limiting itself to Linux guests.
> > > > 
> > > > We'll write down in the ABI documentation that BARs reassignments 
> > > > are
> > > > not supported.
> > > 
> > > I.e. guests doing so Will Not Work (TM), with users (usually not 
> > > reading
> > > ABI docs) learning it the hard way. Not nice, but a way to handle it.
> > 
> > The target of the ABI docs is not users, but kernel developers, who
> > should most definitely read them and fix their kernels.
> 
> ??? (We're talking of unmodified, closed source OSes here.)

On x86 such unmodified OSes would not use pciif.h/pcifront/back, instead
they would be an HVM guest and get an HVM PCI bus emulated by the device
model, which would (I suppose) support remapping BARs etc, since as you say
unmodified OSes may require that.

AFAIK pcifront/back doesn't work for x86/HVM guests today, or at least not
in general for "unmodified, closed source OSes".

AIUI it is the case today (and has always been the case) that x86/PV guests
using pcifront/pciback cannot rewrite BARS, except to either all 1's or
their original value (this is needed to probe the size or something AIUI).
This is my reading of
linux/drivers/xen/xen-pciback/conf_space_header.c:bar_write at least,
which says:
    /* For the BARs, only allow writes which write ~0 or
     * the correct resource information
     * (Needed for when the driver probes the resource usage)
     */
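
In other words the backend only emulates the BAR sizing protocol and
never lets the guest actually move the device. A minimal sketch of that
filtering logic (not the actual pciback code; names and error handling
are illustrative):

    /* Filter a guest write to a 32-bit BAR.  'orig' is the resource
     * value assigned by dom0; 'sizing' tracks whether the guest last
     * wrote ~0 to probe the BAR size. */
    static int bar_filter_write(u32 val, u32 orig, int *sizing)
    {
        if (val == ~0u) {   /* size probe: emulate, don't pass through */
            *sizing = 1;
            return 0;
        }
        if (val == orig) {  /* restoring after the probe: fine too */
            *sizing = 0;
            return 0;
        }
        return -1;          /* anything else would relocate the BAR */
    }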

ARM here is following the x86/PV model for PCI, not the x86/HVM emulated
one.

Supporting writing of BARs for ARM guests (as for x86/PV guests, I
think) would be hard to achieve. I would justify imposing the same
restriction on ARM on the basis that x86/PV has been living with it for
a decade and barely anyone has ever noticed.

Ian.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: PCI Pass-through in Xen ARM: Draft 4
  2015-08-14  9:26     ` Jan Beulich
  2015-08-14 13:21       ` Stefano Stabellini
@ 2015-09-02 14:47       ` Ian Campbell
  1 sibling, 0 replies; 40+ messages in thread
From: Ian Campbell @ 2015-09-02 14:47 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Prasun.kapoor, stefano.stabellini, Manish Jaggi, Vijaya Kumar,
	JulienGrall, Xen Devel

On Fri, 2015-08-14 at 03:26 -0600, Jan Beulich wrote:
> > 
> > > > On 13.08.15 at 19:01, <ian.campbell@citrix.com> wrote:
> > On Thu, 2015-08-13 at 09:29 -0600, Jan Beulich wrote:
> > 
> > A bunch of your questions seem to be about things which have been 
> > discussed
> > at length in previous postings, it's probably worth reviewing some of 
> > those
> > discussions.
> 
> With the huge amount of mail to be read after returning from vacation
> I had hoped to avoid that,

I think I'd forgotten at the time that you had recently been away, sorry.

>  assuming that results of such discussion
> would (should) manifest themselves in the new draft

Yes, that would also be ideal.

Ian.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: PCI Pass-through in Xen ARM: Draft 4
  2015-09-02 14:45               ` Ian Campbell
@ 2015-09-02 14:52                 ` Jan Beulich
  2015-09-02 15:07                   ` Ian Campbell
  0 siblings, 1 reply; 40+ messages in thread
From: Jan Beulich @ 2015-09-02 14:52 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Prasun.kapoor, stefano.stabellini, Manish Jaggi, Vijaya Kumar,
	JulienGrall, Xen Devel

>>> On 02.09.15 at 16:45, <ian.campbell@citrix.com> wrote:
> On Fri, 2015-08-14 at 08:34 -0600, Jan Beulich wrote:
>> > > > On 14.08.15 at 16:03, <stefano.stabellini@eu.citrix.com> wrote:
>> > On Fri, 14 Aug 2015, Jan Beulich wrote:
>> > > > > > On 14.08.15 at 15:21, <stefano.stabellini@eu.citrix.com> wrote:
>> > > > On Fri, 14 Aug 2015, Jan Beulich wrote:
>> > > > > it's even less clear how you'd expect to suppress this in other 
>> > > > > guest
>> > > > > OSes (Windows - afaik - being able to run on ARM certainly makes 
>> > > > > it
>> > > > > a candidate guest, even if maybe this doesn't work today), namely
>> > > > > such not having any pcifront. And I hope this design discussion 
>> > > > > isn't
>> > > > > limiting itself to Linux guests.
>> > > > 
>> > > > We'll write down in the ABI documentation that BARs reassignments 
>> > > > are
>> > > > not supported.
>> > > 
>> > > I.e. guests doing so Will Not Work (TM), with users (usually not 
>> > > reading
>> > > ABI docs) learning it the hard way. Not nice, but a way to handle it.
>> > 
>> > The target of the ABI docs is not users, but kernel developers, who
>> > should most definitely read them and fix their kernels.
>> 
>> ??? (We're talking of unmodified, closed source OSes here.)
> 
> On x86 such unmodified OSes would not use pciif.h/pcifront/back, instead
> they would be an HVM guest and get an HVM PCI bus emulated by the device
> model, which would (I suppose) support remapping BARs etc, since as you say
> unmodified OSes may require that.
> 
> AFAIK pcifront/back doesn't work for x86/HVM guests today, or at least not
> in general for "unmodified, closed source OSes".
> 
> AIUI it is the case today (and has always been the case) that x86/PV guests
> using pcifront/pciback cannot rewrite BARS, except to either all 1's or
> their original value (this is needed to probe the size or something AIUI).
> This is my reading of linux/drivers/xen/xen
> -pciback/conf_space_header.c:bar_write at least, which says:
>     /* For the BARs, only allow writes which write ~0 or
>      * the correct resource information
>      * (Needed for when the driver probes the resource usage)
>      */
> 
> ARM here is following the x86/PV model for PCI, not the x86/HVM emulated
> one.

I understood it that way, but my point is - how are you ever going to
support e.g. Windows guests on ARM? Are you really making yourself
dependent upon MS adding a Xen PCI frontend to it?

Jan

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: PCI Pass-through in Xen ARM: Draft 4
  2015-08-13 15:29 ` Jan Beulich
  2015-08-13 17:01   ` Ian Campbell
  2015-08-14 15:38   ` Stefano Stabellini
@ 2015-09-02 14:57   ` Ian Campbell
  2015-09-02 15:06     ` Jan Beulich
  2 siblings, 1 reply; 40+ messages in thread
From: Ian Campbell @ 2015-09-02 14:57 UTC (permalink / raw)
  To: Jan Beulich, Manish Jaggi
  Cc: Prasun.kapoor, Vijaya Kumar, Julien Grall, Xen Devel, stefano.stabellini

On Thu, 2015-08-13 at 09:29 -0600, Jan Beulich wrote:
>                >>> On 13.08.15 at 11:42, <mjaggi@caviumnetworks.com> 
> wrote:
> >   2.1    pci_hostbridge and pci_hostbridge_ops
> >   ---------------------------------------------------------------------
> > --------
> >   The init function in the PCI host driver calls to register hostbridge
> >   callbacks:
> > 
> >   int pci_hostbridge_register(pci_hostbridge_t *pcihb);
> > 
> >   struct pci_hostbridge_ops {
> >       u32 (*pci_conf_read)(struct pci_hostbridge*, u32 bus, u32 devfn,
> >                                   u32 reg, u32 bytes);
> >       void (*pci_conf_write)(struct pci_hostbridge*, u32 bus, u32 
> > devfn,
> >                                   u32 reg, u32 bytes, u32 val);
> >   };
> > 
> >   struct pci_hostbridge{
> >       u32 segno;
> >       paddr_t cfg_base;
> >       paddr_t cfg_size;
> >       struct dt_device_node *dt_node;
> >       struct pci_hostbridge_ops ops;
> >       struct list_head list;
> >   };
> > 
> >   A PCI conf_read function would internally be as follows:
> >   u32 pcihb_conf_read(u32 seg, u32 bus, u32 devfn, u32 reg, u32 bytes)
> >   {
> >       pci_hostbridge_t *pcihb;
> >       list_for_each_entry(pcihb, &pci_hostbridge_list, list)
> >       {
> >           if(pcihb->segno == seg)
> >               return pcihb->ops.pci_conf_read(pcihb, bus, devfn, reg,
> > bytes);
> >       }
> >       return -1;
> >   }
> 
> Which implies 1 segment per host bridge, which doesn't seem too
> nice to me: I can't see why a bridge might not cover more than one
> segment, and I also can't see why you shouldn't be able to put
> multiple bridges in the same segment when the number of busses
> they have is small.

Does this imply that:
  #define PHYSDEVOP_pci_host_bridge_add    <<>>
  struct physdev_pci_host_bridge_add {
      /* IN */
      uint16_t seg;
      uint64_t cfg_base;
      uint64_t cfg_size;
  };
(as specified in this draft 4) ought to have a bus field (as proposed in 
http://article.gmane.org/gmane.comp.emulators.xen.devel/233386) or perhaps
even a bus and nr_buses field to give a span?

And then that the lookup functions ought to take those into account, of
course.

In the case where a single bridge covers multiple segments, would we
then need to support calling this function multiple times, i.e. once for
each segment the bridge contains, each call pointing to the same cfg
region? Or something similar.
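
For concreteness, a sketch of the extended argument (the bus fields are
the proposal above, not an existing ABI):

    #define PHYSDEVOP_pci_host_bridge_add    <<>>
    struct physdev_pci_host_bridge_add {
        /* IN */
        uint16_t seg;
        uint8_t  bus;       /* first bus number behind this bridge */
        uint8_t  nr_buses;  /* number of buses the bridge spans */
        uint64_t cfg_base;
        uint64_t cfg_size;
    };

The lookup would then match on (seg, bus) falling within
[bus, bus + nr_buses) of a registered bridge rather than on seg alone.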

Ian.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: PCI Pass-through in Xen ARM: Draft 4
  2015-09-02 13:46         ` Ian Campbell
@ 2015-09-02 15:03           ` Ian Campbell
  0 siblings, 0 replies; 40+ messages in thread
From: Ian Campbell @ 2015-09-02 15:03 UTC (permalink / raw)
  To: Julien Grall, Manish Jaggi, Jan Beulich
  Cc: Prasun.kapoor, Vijaya Kumar, Julien Grall, stefano.stabellini, Xen Devel

On Wed, 2015-09-02 at 14:46 +0100, Ian Campbell wrote:
> On Wed, 2015-09-02 at 13:59 +0100, Julien Grall wrote:
> 
> (I'm not caught up on my mail, so just commenting on this one aspect)
> 
> > Anyway, I think this logic should be done in the toolstack and not in
> > the hypervisor. Only the toolstack is in charge of the memory layout.
> > Xen appears to know the memory layout on ARM because it's statically 
> > define.
> 
> The domU address space GUEST_* #defines in xen/include/public/arch-arm.h
> are really just a convenience used when the toolstack and hypervisor need
> to agree on a value and that value happens, right now, to be static.
> 
> The _correct_ interface would be a hypercall (or several) where the
> toolstack tells Xen what the values are, but that's code and faff etc, so
> where the value the toolstack would choose is static we take a short cut
> and add one of these #defines.
> 
> So everyone should just think of every GUEST_FOO in there as being
> equivalent to:
>     struct xen_arch_domainconfig {
> 	//.... other stuff
>         uint64_t foo;
>     };
> i.e. passed to XEN_DOMCTL_createdomain during domain build (plus,
> obviously, a field foo in struct arch_domain where Xen stashes the
> value and uses that instead of GUEST_FOO etc).
> 
> The actual value of foo would be in tools/libx?/something.h, or decided 
> at
> runtime.
> 
> Obviously for FOO which is a static value that's a pain, hence the 
> #defines
> instead.

I forgot to also say, there are some regions which Xen doesn't actually
need to know about at all, but which are recorded in arch-arm.h so that the
complete memory map is in a single place. GUEST_GNTTAB_* and GUEST_MAGIC_*
for example fall into this category.

I think the GUEST_BAR_* actually fall into this too, since only the
toolstack needs to know those limits, and then tell Xen about specific
subsets which are mapped to specific devices etc.

Ian.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: PCI Pass-through in Xen ARM: Draft 4
  2015-09-02 12:59       ` Julien Grall
  2015-09-02 13:46         ` Ian Campbell
@ 2015-09-02 15:03         ` Ian Campbell
  1 sibling, 0 replies; 40+ messages in thread
From: Ian Campbell @ 2015-09-02 15:03 UTC (permalink / raw)
  To: Julien Grall, Manish Jaggi, Jan Beulich
  Cc: Prasun.kapoor, Vijaya Kumar, Julien Grall, Xen Devel, stefano.stabellini

On Wed, 2015-09-02 at 13:59 +0100, Julien Grall wrote:
> On 02/09/15 13:08, Manish Jaggi wrote:
> > 
> > 
> > On Tuesday 01 September 2015 01:02 PM, Jan Beulich wrote:
> > > > > > On 31.08.15 at 14:36, <mjaggi@caviumnetworks.com> wrote:
> > > > On Thursday 13 August 2015 03:12 PM, Manish Jaggi wrote:
> > > > >   4.2.1 Mapping BAR regions in guest address space
> > > > >  
> > > > > -----------------------------------------------------------------
> > > > > ------------
> > > > > 
> > > > > 
> > > > >   When a PCI-EP device is assigned to a domU the toolstack will 
> > > > > read
> > > > > the pci
> > > > >   configuration space BAR registers. Toolstack allocates a 
> > > > > virtual BAR
> > > > >   region for each BAR region, from the area reserved in guest 
> > > > > address
> > > > > space for
> > > > >   mapping BARs referred to as Guest BAR area. This area is 
> > > > > defined in
> > > > >   public/arch-arm.h
> > > > > 
> > > > >   /* For 32bit BARs*/
> > > > >   #define GUEST_BAR_BASE_32 <<>>
> > > > >   #define GUEST_BAR_SIZE_32 <<>>
> > > > > 
> > > > >   /* For 64bit BARs*/
> > > > >   #define GUEST_BAR_BASE_64 <<>>
> > > > >   #define GUEST_BAR_SIZE_64 <<>>
> > > > > 
> > > > >   Toolstack then invokes domctl xc_domain_memory_mapping to map 
> > > > > in
> > > > > stage2
> > > > >   translation. If a BAR region address is 32b BASE_32 area would 
> > > > > be
> > > > > used,
> > > > >   otherwise 64b. If a combination of both is required the support 
> > > > > is
> > > > > TODO.
> > > > > 
> > > > >   Toolstack manages these areas and allocates from them. The
> > > > >   allocation and deallocation are done using APIs similar to
> > > > >   malloc and free.
> > > > > 
> > > > To implement this feature in the xl tools an allocator
> > > > (malloc/free style) over the reserved area is required.
> > > > Could XEN_DOMCTL_memory_mapping be extended with a flag, say
> > > > ALLOCATE/FREE_FROM_BAR_AREA? When this flag is passed xen would
> > > > add or remove the stage-2 mapping for the domain.
> > > > This would reuse code already present in xen.
> > > Above it was said that the tool stack manages this area (including
> > > allocations from it). Why would this require a new hypercall?
> > As a rule the xl tools should manage the guest memory map; whether
> > they do so themselves or merely initiate it is another matter.
> > Allocating an area for a PCI BAR from the reserved area, and freeing
> > it again, would require adding allocator code to the xl tools.
> > Since xen already knows about the area (it is defined in the public
> > header file) and code already exists in xen,
> 
> Which code? We don't have any code in Xen to find free space in the
> stage-2 table for the PCI region.

I agree, AFAIK there is no such code in Xen.

So, yes, the toolstack should gain code to assign these regions out of the
MMIO window(s) which are set aside for this purpose in arch-arm.h and tell
the other components in the system as necessary.
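
As a sketch of what "gain code" could mean in practice, a first cut
might be a per-domain bump allocator over the reserved window
(illustrative only; a real version would also track freeing, e.g. for
hotplug):

    /* Hypothetical libxl helper: carve size-aligned chunks out of the
     * guest BAR window from arch-arm.h.  'len' must be a power of two,
     * as BAR sizes are. */
    struct bar_window {
        uint64_t base, size;  /* e.g. GUEST_BAR_BASE_64/GUEST_BAR_SIZE_64 */
        uint64_t next;        /* next free IPA, initialised to base */
    };

    static int bar_window_alloc(struct bar_window *w, uint64_t len,
                                uint64_t *ipa_out)
    {
        uint64_t start = (w->next + len - 1) & ~(len - 1);

        if (start + len > w->base + w->size)
            return -1;        /* window exhausted */
        w->next = start + len;
        *ipa_out = start;
        return 0;
    }

The returned IPA is then what the toolstack passes as the guest address
in the xc_domain_memory_mapping call from 4.2.1.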

> Anyway, I think this logic should be done in the toolstack and not in
> the hypervisor. Only the toolstack is in charge of the memory layout.
> Xen appears to know the memory layout on ARM because it's statically 
> defined.

Right, I gave a longer explanation of this in another mail.

Ian.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: PCI Pass-through in Xen ARM: Draft 4
  2015-09-02 14:57   ` Ian Campbell
@ 2015-09-02 15:06     ` Jan Beulich
  0 siblings, 0 replies; 40+ messages in thread
From: Jan Beulich @ 2015-09-02 15:06 UTC (permalink / raw)
  To: Manish Jaggi, Ian Campbell
  Cc: Prasun.kapoor, Vijaya Kumar, JulienGrall, Xen Devel, stefano.stabellini

>>> On 02.09.15 at 16:57, <ian.campbell@citrix.com> wrote:
> On Thu, 2015-08-13 at 09:29 -0600, Jan Beulich wrote:
>>                >>> On 13.08.15 at 11:42, <mjaggi@caviumnetworks.com>>> wrote:
>> >   2.1    pci_hostbridge and pci_hostbridge_ops
>> >   ---------------------------------------------------------------------
>> > --------
>> >   The init function in the PCI host driver calls to register hostbridge
>> >   callbacks:
>> > 
>> >   int pci_hostbridge_register(pci_hostbridge_t *pcihb);
>> > 
>> >   struct pci_hostbridge_ops {
>> >       u32 (*pci_conf_read)(struct pci_hostbridge*, u32 bus, u32 devfn,
>> >                                   u32 reg, u32 bytes);
>> >       void (*pci_conf_write)(struct pci_hostbridge*, u32 bus, u32 
>> > devfn,
>> >                                   u32 reg, u32 bytes, u32 val);
>> >   };
>> > 
>> >   struct pci_hostbridge{
>> >       u32 segno;
>> >       paddr_t cfg_base;
>> >       paddr_t cfg_size;
>> >       struct dt_device_node *dt_node;
>> >       struct pci_hostbridge_ops ops;
>> >       struct list_head list;
>> >   };
>> > 
>> >   A PCI conf_read function would internally be as follows:
>> >   u32 pcihb_conf_read(u32 seg, u32 bus, u32 devfn, u32 reg, u32 bytes)
>> >   {
>> >       pci_hostbridge_t *pcihb;
>> >       list_for_each_entry(pcihb, &pci_hostbridge_list, list)
>> >       {
>> >           if(pcihb->segno == seg)
>> >               return pcihb->ops.pci_conf_read(pcihb, bus, devfn, reg,
>> > bytes);
>> >       }
>> >       return -1;
>> >   }
>> 
>> Which implies 1 segment per host bridge, which doesn't seem too
>> nice to me: I can't see why a bridge might not cover more than one
>> segment, and I also can't see why you shouldn't be able to put
>> multiple bridges in the same segment when the number of busses
>> they have is small.
> 
> Does this imply that:
>   #define PHYSDEVOP_pci_host_bridge_add    <<>>
>   struct physdev_pci_host_bridge_add {
>       /* IN */
>       uint16_t seg;
>       uint64_t cfg_base;
>       uint64_t cfg_size;
>   };
> (as specified in this draft 4) ought to have a bus field (as proposed in 
> http://article.gmane.org/gmane.comp.emulators.xen.devel/233386) or perhaps
> even a bus and nr_buses field to give a span?

The latter, yes.

> And then that the lookup functions ought to take those into account, of
> course.
> 
> In the case where a single bridge covers multiple segments would we then
> need to support calling this function multiple times i.e. once for each
> segment the bridge contains each pointing to the same cfg region? Or
> something similar.

Right.

Jan

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: PCI Pass-through in Xen ARM: Draft 4
  2015-09-02 14:52                 ` Jan Beulich
@ 2015-09-02 15:07                   ` Ian Campbell
  0 siblings, 0 replies; 40+ messages in thread
From: Ian Campbell @ 2015-09-02 15:07 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Prasun.kapoor, stefano.stabellini, Manish Jaggi, Vijaya Kumar,
	JulienGrall, Xen Devel

On Wed, 2015-09-02 at 08:52 -0600, Jan Beulich wrote:
> > 
> > > > On 02.09.15 at 16:45, <ian.campbell@citrix.com> wrote:
> > On Fri, 2015-08-14 at 08:34 -0600, Jan Beulich wrote:
> > > > > > On 14.08.15 at 16:03, <stefano.stabellini@eu.citrix.com> wrote:
> > > > On Fri, 14 Aug 2015, Jan Beulich wrote:
> > > > > > > > On 14.08.15 at 15:21, <stefano.stabellini@eu.citrix.com> 
> > > > > > > > wrote:
> > > > > > On Fri, 14 Aug 2015, Jan Beulich wrote:
> > > > > > > it's even less clear how you'd expect to suppress this in 
> > > > > > > other 
> > > > > > > guest
> > > > > > > OSes (Windows - afaik - being able to run on ARM certainly 
> > > > > > > makes 
> > > > > > > it
> > > > > > > a candidate guest, even if maybe this doesn't work today), 
> > > > > > > namely
> > > > > > > such not having any pcifront. And I hope this design 
> > > > > > > discussion 
> > > > > > > isn't
> > > > > > > limiting itself to Linux guests.
> > > > > > 
> > > > > > We'll write down in the ABI documentation that BARs 
> > > > > > reassignments 
> > > > > > are
> > > > > > not supported.
> > > > > 
> > > > > I.e. guests doing so Will Not Work (TM), with users (usually not 
> > > > > reading
> > > > > ABI docs) learning it the hard way. Not nice, but a way to handle 
> > > > > it.
> > > > 
> > > > The target of the ABI docs is not users, but kernel developers, who
> > > > should most definitely read them and fix their kernels.
> > > 
> > > ??? (We're talking of unmodified, closed source OSes here.)
> > 
> > On x86 such unmodified OSes would not use pciif.h/pcifront/back, 
> > instead
> > they would be an HVM guest and get an HVM PCI bus emulated by the 
> > device
> > model, which would (I suppose) support remapping BARs etc, since as you 
> > say
> > unmodified OSes may require that.
> > 
> > AFAIK pcifront/back doesn't work for x86/HVM guests today, or at least 
> > not
> > in general for "unmodified, closed source OSes".
> > 
> > AIUI it is the case today (and has always been the case) that x86/PV 
> > guests
> > using pcifront/pciback cannot rewrite BARS, except to either all 1's or
> > their original value (this is needed to probe the size or something 
> > AIUI).
> > This is my reading of linux/drivers/xen/xen
> > -pciback/conf_space_header.c:bar_write at least, which says:
> >     /* For the BARs, only allow writes which write ~0 or
> >      * the correct resource information
> >      * (Needed for when the driver probes the resource usage)
> >      */
> > 
> > ARM here is following the x86/PV model for PCI, not the x86/HVM 
> > emulated
> > one.
> 
> I understood it that way, but my point is - how are you ever going to
> support e.g. Windows guests on ARM? Are you really making yourself
> dependent upon MS adding a Xen PCI frontend to it?

Supporting completely unmodified/unaware (so not even PVHVM) OSes on ARM
would require adding the concept of a device model, much like for x86/HVM.
The DM would have to provide the emulated style PCI bus alongside all the
other emulated hardware such a Windows guest would certainly require.

Maybe we can do that as an extension to the existing ARM guest, rather than
adding a second guest type like x86 has (and is moving away from with the
PVH stuff).

Ian.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: PCI Pass-through in Xen ARM: Draft 4
  2015-08-13 10:37 ` Julien Grall
@ 2015-09-02 15:19   ` Ian Campbell
  2015-09-02 15:40     ` Julien Grall
  0 siblings, 1 reply; 40+ messages in thread
From: Ian Campbell @ 2015-09-02 15:19 UTC (permalink / raw)
  To: Julien Grall, Manish Jaggi, Xen Devel, Konrad Rzeszutek Wilk,
	★ Stefano Stabellini, "Manish Jaggi,★ Kumar,
	Vijaya",
	Julien Grall
  Cc: Prasun.kapoor

On Thu, 2015-08-13 at 11:37 +0100, Julien Grall wrote:
> >  4.2.2    Xenstore Update: For each PCI-EP BAR (IPA-PA mapping info).
> >  ----------------------------------------------------------------------
> > ------
> > 
> >  Toolstack also updates the xenstore information for the device
> >  (virtualbar:physical bar).This information is read by xen-pciback and
> >  returned to the domU-pcifront driver configuration space reads for 
> > BAR.
> > 
> >  Entries created are as follows:
> >  /local/domain/0/backend/pci/1/0
> >  vdev-N
> >      BDF = ""
> >      BAR-0-IPA = ""
> >      BAR-0-PA = ""
> >      BAR-0-SIZE = ""
> >      ...
> >      BAR-M-IPA = ""
> >      BAR-M-PA = ""
> >      BAR-M-SIZE = ""
> > 
> >  Note: If BAR M SIZE is 0, it is not a valid entry.
> 
> How would you describe the ROM in xenstore?

You mean the ROM BAR? BAR-ROM-* perhaps? 

> 
> 
> regards,
> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: PCI Pass-through in Xen ARM: Draft 4
  2015-09-02 15:19   ` Ian Campbell
@ 2015-09-02 15:40     ` Julien Grall
  0 siblings, 0 replies; 40+ messages in thread
From: Julien Grall @ 2015-09-02 15:40 UTC (permalink / raw)
  To: Ian Campbell, Manish Jaggi, Xen Devel, Konrad Rzeszutek Wilk,
	★ Stefano Stabellini, "Manish Jaggi,★ Kumar,
	Vijaya",
	Julien Grall
  Cc: Prasun.kapoor

On 02/09/15 16:19, Ian Campbell wrote:
> On Thu, 2015-08-13 at 11:37 +0100, Julien Grall wrote:
>>>  4.2.2    Xenstore Update: For each PCI-EP BAR (IPA-PA mapping info).
>>>  ----------------------------------------------------------------------
>>> ------
>>>
>>>  Toolstack also updates the xenstore information for the device
>>>  (virtualbar:physical bar).This information is read by xen-pciback and
>>>  returned to the domU-pcifront driver configuration space reads for 
>>> BAR.
>>>
>>>  Entries created are as follows:
>>>  /local/domain/0/backend/pci/1/0
>>>  vdev-N
>>>      BDF = ""
>>>      BAR-0-IPA = ""
>>>      BAR-0-PA = ""
>>>      BAR-0-SIZE = ""
>>>      ...
>>>      BAR-M-IPA = ""
>>>      BAR-M-PA = ""
>>>      BAR-M-SIZE = ""
>>>
>>>  Note: If BAR M SIZE is 0, it is not a valid entry.
>>
>> How would you describe the ROM in xenstore?
> 
> You mean the ROM BAR? BAR-ROM-* perhaps? 

Right, I mean the ROM BAR. I think it should be spelled out in the
design document.

Although I don't much care about the name.

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: PCI Pass-through in Xen ARM: Draft 4
  2015-08-13  9:42 PCI Pass-through in Xen ARM: Draft 4 Manish Jaggi
                   ` (3 preceding siblings ...)
  2015-09-01 16:15 ` Stefano Stabellini
@ 2015-09-10  1:12 ` Julien Grall
  2015-09-15 18:58   ` Jaggi, Manish
  4 siblings, 1 reply; 40+ messages in thread
From: Julien Grall @ 2015-09-10  1:12 UTC (permalink / raw)
  To: Manish Jaggi, Xen Devel, Konrad Rzeszutek Wilk,
	★ Stefano Stabellini, Manish Jaggi,★ Kumar, Vijaya,
	Julien Grall, Ian Campbell
  Cc: Prasun.kapoor

Hi Manish,

On 13/08/2015 10:42, Manish Jaggi wrote:
>   3.2.    Mapping between streamID - deviceID - pci sbdf - requesterID
>   -----------------------------------------------------------------------------
>   In the simple case all of these are equal to the BDF. But there are
>   some devices that use the wrong requester ID for DMA transactions;
>   the Linux kernel has PCI quirks for these. Whether the same can be
>   implemented in Xen, or a different approach has to be taken, is a
>   TODO here.
>
>   Till that time, for basic implementation it is assumed that all are equal
>   to BDF.

Back to this streamID = DeviceID = requestID = SBDF again...

I've just found a patch for Linux sent by one of your colleagues about
tweaking the requesterID for the ThunderX board (see [1]). The most
interesting bits are:

static u32 pci_requester_id_ecam(struct pci_dev *dev)
{
	return (((pci_domain_nr(dev->bus) >> 2) << 19) |
		((pci_domain_nr(dev->bus) % 4) << 16) |
		(dev->bus->number << 8) | dev->devfn);
}

static u32 thunder_pci_requester_id(struct pci_dev *dev, u16 alias)
{
	int ret;

	ret = thunder_pem_requester_id(dev);
	if (ret >= 0)
		return (u32)ret;

	return pci_requester_id_ecam(dev);
}

Which is then used to override the default function used by ITS to find 
the deviceID.

AFAICT, this means that you can't safely assume that DeviceID == sBDF 
even for your platform. Is that right?

If so, I'm afraid that you have to handle DeviceID != sBDF (and so on) 
in your design doc, i.e. how do you plan to get the base requester ID?

I can see 2 different solutions:
	1) Let DOM0 pass the first requester ID when registering the bus
	   (see the sketch after this list)
	   Pros:
		* Less per-platform code in Xen
	   Cons:
		* Assumes that the requester IDs are contiguous. (Is it really a cons?)
		* Still requires quirks for buggy devices (i.e. requester ID not correct)
	2) Do it in Xen
	   Pros:
		* We are not relying on DOM0 giving the requester ID
			=> Not assuming contiguous requester ID
  	   Cons:
		* Per PCI bridge code to handle the mapping
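
A sketch of what (1) could look like at the hypercall level (the new
field is invented for illustration):

    struct physdev_pci_host_bridge_add {
        /* IN */
        uint16_t seg;
        uint64_t cfg_base;
        uint64_t cfg_size;
        uint32_t requester_id_base;  /* requester ID of seg:00:00.0 */
    };

with xen computing requester_id_base + ((bus << 8) | devfn) for any
device behind that bridge, under the contiguity assumption above.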

Regards,

[1] https://lkml.org/lkml/2015/7/15/703



-- 
Julien Grall

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: PCI Pass-through in Xen ARM: Draft 4
  2015-09-10  1:12 ` Julien Grall
@ 2015-09-15 18:58   ` Jaggi, Manish
  2015-09-15 21:18     ` David Daney
  2015-09-16 12:58     ` Julien Grall
  0 siblings, 2 replies; 40+ messages in thread
From: Jaggi, Manish @ 2015-09-15 18:58 UTC (permalink / raw)
  To: Julien Grall, Xen Devel, Konrad Rzeszutek Wilk,
	★ Stefano Stabellini, Ian Campbell, Daney, David
  Cc: Prasun.kapoor, Jaggi, Manish, Kumar, Vijaya



On Thursday 10 September 2015 06:42 AM, Julien Grall wrote:
> Hi Manish,
Hi Julien, sorry for the late response.
>
> On 13/08/2015 10:42, Manish Jaggi wrote:
>>   3.2.    Mapping between streamID - deviceID - pci sbdf - requesterID
>> -----------------------------------------------------------------------------
>>   For a simpler case all should be equal to BDF. But there are some
>>   devices that use the wrong requester ID for DMA transactions. The
>>   Linux kernel has PCI quirks for these. Whether the same can be
>>   implemented in Xen or a different approach has to be taken is a TODO
>>   here.
>>
>>   Until then, for the basic implementation it is assumed that all are
>>   equal to BDF.
>
> Back to this streamID = DeviceID = requestID = SBDF again...
>
> I've just found a Linux patch sent by one of your colleagues about
> tweaking the requesterID for the Thunder-X board (see [1]). The most
> interesting bits are:
>
> static u32 pci_requester_id_ecam(struct pci_dev *dev)
> {
>     return (((pci_domain_nr(dev->bus) >> 2) << 19) |
>         ((pci_domain_nr(dev->bus) % 4) << 16) |
>         (dev->bus->number << 8) | dev->devfn);
> }
>
> static u32 thunder_pci_requester_id(struct pci_dev *dev, u16 alias)
> {
>     int ret;
>
>     ret = thunder_pem_requester_id(dev);
>     if (ret >= 0)
>         return (u32)ret;
>
>     return pci_requester_id_ecam(dev);
> }
>
> This is then used to override the default function used by the ITS to
> find the deviceID.
>
> AFAICT, this means that you can't safely assume that DeviceID == sBDF 
> even for your platform. Is that right?
Yes.
>
> If so, I'm afraid that you have to handle DeviceID != sBDF (and so on)
> in your design doc, i.e. how do you plan to get the base requester ID?
>
Right. But on our platform the requesterId is uniquely generated from the
bdf. Adding David to confirm that.
> I can see 2 different solutions:
>     1) Let DOM0 pass the first requester ID when registering the bus
>        Pros:
>         * Less per-platform code in Xen
>        Cons:
>         * Assumes that the requester IDs are contiguous. (Is it really a cons?)
>         * Still requires quirks for buggy devices (i.e. incorrect requester ID)
>     2) Do it in Xen
>        Pros:
>         * We are not relying on DOM0 giving the requester ID
>             => Not assuming contiguous requester IDs
>        Cons:
>         * Per-PCI-bridge code to handle the mapping
>
 We can have a (3): when PHYSDEVOP_pci_add_device is called, both the sbdf
and the requesterID are passed in the hypercall.
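
Something along these lines (a purely illustrative layout; the struct and
field names below are made up for the sake of discussion, not an existing
Xen interface):

/* Illustrative only: a device-add op carrying both IDs. */
struct physdev_pci_device_add_ext {
	/* IN */
	uint16_t seg;
	uint8_t  bus;
	uint8_t  devfn;
	uint32_t requester_id;	/* requesterID as seen by the SMMU/ITS */
};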
> Regards,
>
> [1] https://lkml.org/lkml/2015/7/15/703
>
>
>


* Re: PCI Pass-through in Xen ARM: Draft 4
  2015-09-15 18:58   ` Jaggi, Manish
@ 2015-09-15 21:18     ` David Daney
  2015-09-16 12:58     ` Julien Grall
  1 sibling, 0 replies; 40+ messages in thread
From: David Daney @ 2015-09-15 21:18 UTC (permalink / raw)
  To: Jaggi, Manish
  Cc: Prasun.kapoor, Ian Campbell, Kumar, Vijaya,
	★ Stefano Stabellini, Daney, David, Xen Devel,
	Julien Grall

On 09/15/2015 11:58 AM, Jaggi, Manish wrote:
> 
> 
> On Thursday 10 September 2015 06:42 AM, Julien Grall wrote:
>> Hi Manish,
> Hi Julien, sorry for late response..
>>
>> On 13/08/2015 10:42, Manish Jaggi wrote:
>>>    3.2.    Mapping between streamID - deviceID - pci sbdf - requesterID
>>> -----------------------------------------------------------------------------
>>>    For a simpler case all should be equal to BDF. But there are some
>>>    devices that use the wrong requester ID for DMA transactions. The
>>>    Linux kernel has PCI quirks for these. Whether the same can be
>>>    implemented in Xen or a different approach has to be taken is a
>>>    TODO here.
>>>
>>>    Until then, for the basic implementation it is assumed that all are
>>>    equal to BDF.
>>
>> Back to this streamID = DeviceID = requestID = SBDF again...
>>
>> I've just found a Linux patch sent by one of your colleagues about
>> tweaking the requesterID for the Thunder-X board (see [1]). The most
>> interesting bits are:
>>
>> static u32 pci_requester_id_ecam(struct pci_dev *dev)
>> {
>>      return (((pci_domain_nr(dev->bus) >> 2) << 19) |
>>          ((pci_domain_nr(dev->bus) % 4) << 16) |
>>          (dev->bus->number << 8) | dev->devfn);
>> }
>>
>> static u32 thunder_pci_requester_id(struct pci_dev *dev, u16 alias)
>> {
>>      int ret;
>>
>>      ret = thunder_pem_requester_id(dev);
>>      if (ret >= 0)
>>          return (u32)ret;
>>
>>      return pci_requester_id_ecam(dev);
>> }
>>
>> This is then used to override the default function used by the ITS to
>> find the deviceID.
>>
>> AFAICT, this means that you can't safely assume that DeviceID == sBDF
>> even for your platform. Is that right?
> Yes.
>>
>> If so, I'm afraid that you have to handle DeviceID != sBDF (and so on)
>> in your design doc, i.e. how do you plan to get the base requester ID?
>>
> Right. But on our platform the requesterId is uniquely generated from the
> bdf. Adding David to confirm that.

The requesterId is: (Node << 19) | (ECAM# << 16) | bdf

For the Linux kernel, the Node/ECAM part is going to be read from OF
device-tree or ACPI, not calculated with the above formula.
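
In C terms that is simply (helper spelled out here for clarity; the bit
positions match the ECAM function quoted above):

static inline u32 thunder_requester_id(u32 node, u32 ecam, u32 bdf)
{
	/* Disjoint bit fields: Node# above bit 19, ECAM# at bit 16,
	 * bdf (bus << 8 | devfn) in the low 16 bits. */
	return (node << 19) | (ecam << 16) | bdf;
}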


>> I can see 2 different solutions:
>>      1) Let DOM0 pass the first requester ID when registering the bus
>>         Pros:
>>          * Less per-platform code in Xen
>>         Cons:
>>          * Assumes that the requester IDs are contiguous. (Is it really a cons?)
>>          * Still requires quirks for buggy devices (i.e. incorrect requester ID)
>>      2) Do it in Xen
>>         Pros:
>>          * We are not relying on DOM0 giving the requester ID
>>              => Not assuming contiguous requester IDs
>>         Cons:
>>          * Per-PCI-bridge code to handle the mapping
>>
>    We can have a (3): when PHYSDEVOP_pci_add_device is called, both the
> sbdf and the requesterID are passed in the hypercall.
>> Regards,
>>
>> [1] https://lkml.org/lkml/2015/7/15/703
>>
>>
>>
> 


* Re: PCI Pass-through in Xen ARM: Draft 4
  2015-09-15 18:58   ` Jaggi, Manish
  2015-09-15 21:18     ` David Daney
@ 2015-09-16 12:58     ` Julien Grall
  2015-09-19 20:24       ` Manish Jaggi
  1 sibling, 1 reply; 40+ messages in thread
From: Julien Grall @ 2015-09-16 12:58 UTC (permalink / raw)
  To: Jaggi, Manish, Xen Devel, Konrad Rzeszutek Wilk,
	★ Stefano Stabellini, Ian Campbell, Daney, David
  Cc: Prasun.kapoor, Kumar, Vijaya

On 15/09/15 19:58, Jaggi, Manish wrote:
>> I can see 2 different solutions:
>>     1) Let DOM0 pass the first requester ID when registering the bus
>>        Pros:
>>         * Less per-platform code in Xen
>>        Cons:
>>         * Assumes that the requester IDs are contiguous. (Is it really a cons?)
>>         * Still requires quirks for buggy devices (i.e. incorrect requester ID)
>>     2) Do it in Xen
>>        Pros:
>>         * We are not relying on DOM0 giving the requester ID
>>             => Not assuming contiguous requester IDs
>>        Cons:
>>         * Per-PCI-bridge code to handle the mapping
>>
>   We can have a (3): when PHYSDEVOP_pci_add_device is called, both the
> sbdf and the requesterID are passed in the hypercall.

The name of the physdev operation is PHYSDEVOP_pci_device_add and not
PHYSDEVOP_pci_add_device. Please rename all its usages in the design doc.

In any case, we can't modify PHYSDEVOP_pci_device_add because it's part of
the stable ABI.

Based on David's mail, the requester ID of a given device can be found
using base + devfn where base is the first requesterID of the bus.

IIRC, this also matches the IORT ACPI spec.

So for now, I would extend the physdev op you've introduced to add a host
bridge (PHYSDEVOP_pci_host_bridge_add) to pass the base requesterID.

We can think later about introducing a new physdev op to add a PCI device
if we ever require unique requesterIDs (i.e. non-contiguous under the same
bridge).
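
I.e. something along these lines (only a sketch; the existing fields are
placeholders for whatever PHYSDEVOP_pci_host_bridge_add already carries in
the draft, and base_rid is the proposed addition):

struct physdev_pci_host_bridge_add {
	/* IN: existing fields from the draft (segment, config space). */
	uint16_t seg;
	uint64_t cfg_base;
	uint64_t cfg_size;
	/* IN: NEW - first requesterID behind this host bridge. */
	uint32_t base_rid;
};

Xen would then compute the requesterID of a device as base_rid plus its
offset on the bus (devfn, per David's description), with no per-platform
code.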

Regards,

---
Julien Grall


* Re: PCI Pass-through in Xen ARM: Draft 4
  2015-09-16 12:58     ` Julien Grall
@ 2015-09-19 20:24       ` Manish Jaggi
  2015-09-19 20:48         ` Julien Grall
  0 siblings, 1 reply; 40+ messages in thread
From: Manish Jaggi @ 2015-09-19 20:24 UTC (permalink / raw)
  To: Julien Grall, Jaggi, Manish, Xen Devel, Konrad Rzeszutek Wilk,
	★ Stefano Stabellini, Ian Campbell, Daney, David
  Cc: Prasun.kapoor, Kumar, Vijaya



On Wednesday 16 September 2015 06:28 PM, Julien Grall wrote:
> On 15/09/15 19:58, Jaggi, Manish wrote:
>>> I can see 2 different solutions:
>>>      1) Let DOM0 pass the first requester ID when registering the bus
>>>         Pros:
>>>          * Less per-platform code in Xen
>>>         Cons:
>>>          * Assumes that the requester IDs are contiguous. (Is it really a cons?)
>>>          * Still requires quirks for buggy devices (i.e. incorrect requester ID)
>>>      2) Do it in Xen
>>>         Pros:
>>>          * We are not relying on DOM0 giving the requester ID
>>>              => Not assuming contiguous requester IDs
>>>         Cons:
>>>          * Per-PCI-bridge code to handle the mapping
>>>
>>    We can have a (3): when PHYSDEVOP_pci_add_device is called, both the
>> sbdf and the requesterID are passed in the hypercall.
> The name of the physdev operation is PHYSDEVOP_pci_device_add and not
> PHYSDEVOP_pci_add_device. Please rename all its usages in the design doc.
>
> In any case, we can't modify PHYSDEVOP_pci_device_add because it's part of
> the stable ABI.
>
> Based on David's mail, the requester ID of a given device can be found
> using base + devfn where base is the first requesterID of the bus.
>
> IIRC, this also matches the IORT ACPI spec.
>
> So for now, I would extend the physdev op you've introduced to add a host
> bridge (PHYSDEVOP_pci_host_bridge_add) to pass the base requesterID.
The requester-ID is derived from the Node# and ECAM# as per David. I
guess the ECAM# and Node# can be derived from the cfg_addr.
Each ECAM has a cfg_addr on Thunder, which is mentioned in the pci node
in the device tree.
For Thunder I think we don't need to pass the requester-ID in the physdev op.
>
> We can think later about introducing a new physdev op to add a PCI device
> if we ever require unique requesterIDs (i.e. non-contiguous under the same
> bridge).
>
> Regards,
>
> ---
> Julien Grall


* Re: PCI Pass-through in Xen ARM: Draft 4
  2015-09-19 20:24       ` Manish Jaggi
@ 2015-09-19 20:48         ` Julien Grall
  2015-09-19 21:51           ` Daney, David
  0 siblings, 1 reply; 40+ messages in thread
From: Julien Grall @ 2015-09-19 20:48 UTC (permalink / raw)
  To: Manish Jaggi, Jaggi, Manish, Xen Devel
  Cc: Prasun.kapoor, Ian Campbell, ★ Stefano Stabellini, Kumar,
	Vijaya, Daney, David

Hi Manish,

On 19/09/2015 21:24, Manish Jaggi wrote:
> On Wednesday 16 September 2015 06:28 PM, Julien Grall wrote:
>> On 15/09/15 19:58, Jaggi, Manish wrote:
>>>> I can see 2 different solutions:
>>>>      1) Let DOM0 pass the first requester ID when registering the bus
>>>>         Pros:
>>>>          * Less per-platform code in Xen
>>>>         Cons:
>>>>          * Assumes that the requester IDs are contiguous. (Is it really a cons?)
>>>>          * Still requires quirks for buggy devices (i.e. incorrect requester ID)
>>>>      2) Do it in Xen
>>>>         Pros:
>>>>          * We are not relying on DOM0 giving the requester ID
>>>>              => Not assuming contiguous requester IDs
>>>>         Cons:
>>>>          * Per-PCI-bridge code to handle the mapping
>>>>
>>>    We can have a (3): when PHYSDEVOP_pci_add_device is called, both the
>>> sbdf and the requesterID are passed in the hypercall.
>> The name of the physdev operation is PHYSDEVOP_pci_device_add and not
>> PHYSDEVOP_pci_add_device. Please rename all its usages in the design
>> doc.
>>
>> In any case, we can't modify PHYSDEVOP_pci_device_add because it's part
>> of the stable ABI.
>>
>> Based on David's mail, the requester ID of a given device can be found
>> using base + devfn where base is the first requesterID of the bus.
>>
>> IIRC, this also matches the IORT ACPI spec.
>>
>> So for now, I would extend the physdev op you've introduced to add a host
>> bridge (PHYSDEVOP_pci_host_bridge_add) to pass the base requesterID.
> The requester-ID is derived from the Node# and ECAM# as per David. I
> guess the ECAM# and Node# can be derived from the cfg_addr.

There is no place for "I guess". Any approximation here would lead to
adding more hypercalls because we would have designed the first one on
speculation.

> Each ECAM has a cfg_addr on Thunder, which is mentioned in the pci node
> in the device tree.

IIUC, you are suggesting to retrieve the ECAM# based on the cfg_addr in
Xen, right?

If so, this is the wrong way to go: we should avoid platform-specific
code in Xen as much as possible and get everything either via the
firmware tables (ACPI or DT) or via a hypercall.

> For Thunder I think we don't need to pass the requester-ID in the physdev op.

May I remind you that this design doc is not about Thunder-X but about PCI
passthrough in general. If your platform already requires specific code
in Xen to get the requester ID, what prevents others from doing the same?

So it looks to me like adding the base requester-ID to
PHYSDEVOP_pci_host_bridge_add is the right way to go.

Regards,

-- 
Julien Grall


* Re: PCI Pass-through in Xen ARM: Draft 4
  2015-09-19 20:48         ` Julien Grall
@ 2015-09-19 21:51           ` Daney, David
  2015-09-21 10:17             ` Julien Grall
  0 siblings, 1 reply; 40+ messages in thread
From: Daney, David @ 2015-09-19 21:51 UTC (permalink / raw)
  To: Julien Grall, Jaggi, Manish
  Cc: Prasun.kapoor, Kumar, Vijaya, ★ Stefano Stabellini, Ian Campbell

I apologize for the top post, but for the record:

The Thunder Linux implementation will be obtaining the PCI requester id from the OF device tree or the ACPI IORT table.  It will *not* be derived from any hardware address.

It may make sense to use the same technique to get the PCI requester id in the XEN environment too.

David Daney.

* Re: PCI Pass-through in Xen ARM: Draft 4
  2015-09-19 21:51           ` Daney, David
@ 2015-09-21 10:17             ` Julien Grall
  0 siblings, 0 replies; 40+ messages in thread
From: Julien Grall @ 2015-09-21 10:17 UTC (permalink / raw)
  To: Daney, David, Jaggi, Manish, Xen Devel
  Cc: Prasun.kapoor, Kumar, Vijaya, Ian Campbell, ★ Stefano Stabellini

Hi David,

On 19/09/15 22:51, Daney, David wrote:
> I apologize for the top post, but for the record;
> 
> The Thunder Linux implementation will be obtaining the PCI requester id from the OF device tree or the ACPI IORT table.  It will *not* be derived from any hardware address.

I'm aware of the ACPI IORT table but haven't seen anything about a
device tree binding for the requester ID.

Do you have a link/draft explaining this binding?

Regards,

-- 
Julien Grall

