* [Qemu-devel] [RFC PATCH 00/30] Xen Q35 Bringup patches + support for PCIe Extended Capabilities for passed through devices
@ 2018-03-12 18:33 ` Alexey Gerasimenko
  0 siblings, 0 replies; 183+ messages in thread
From: Alexey Gerasimenko @ 2018-03-12 18:33 UTC (permalink / raw)
  To: xen-devel; +Cc: Alexey Gerasimenko, qemu-devel

This patch series introduces support for Q35 emulation for Xen HVM guests
(via QEMU). This feature is present in other virtualization products, and
Xen can benefit greatly from it as well.

The main goal of implementing Q35 emulation for Xen was to extend PCI/GPU
passthrough capabilities -- the main advantage of Q35 emulation is the
availability of extra features for PCIe device passthrough. The most
important PCIe-specific passthrough feature Q35 provides is support for
ECAM (aka MMCONFIG), the MMIO-based mechanism that allows access to the
extended PCIe config space (offsets above 0xFF). Many PCIe devices and
their drivers make use of PCIe Extended Capabilities, which can be accessed
only via ECAM at offsets of 0x100 and above in PCI config space, so
supporting ECAM is a mandatory feature for PCIe passthrough. Not only does
this allow passed-through PCIe devices to function properly, it also opens
the road to extending Xen's PCIe passthrough features further -- e.g.
providing support for AER. One possible direction is support for PCIe
Resizable BARs -- a feature likely to become common for modern GPUs as
video memory sizes increase.

Q35 emulation may also be useful for other purposes. The emulation of a
more recent chipset partially closes a huge gap between the set of required
platform features and the actual emulated platform capabilities -- a lot of
required functionality is simply missing in a real i440 chipset. Consider
the IGD passthrough support patches from Intel, for example: according to
code comments, they had to create a dummy PCI-ISA bridge at BDF 0:1F.0 to
make the old i440 system look more modern, just to keep it compatible with
the IGD driver. Using Q35 emulation with its own emulated LPC bridge avoids
workarounds like this. i440 is a fairly outdated system on its own and
doesn't really support a lot of things, such as an MMIO hole above 4 GB
(although one is actually emulated). Also, due to the i440 chipset's age,
its mere presence can serve as a reliable way for malicious software to
detect a virtualized environment, especially considering that i440
emulation is shared among multiple virtualization products.

On top of this series I've also implemented a solution to the existing Xen
puzzle with the HVM memory layout -- the handling of VRAM, RMRRs and the
MMIO hole in general. This "puzzle" (the memory layout inconsistency
between libxl/libxc, hvmloader and QEMU) is a fundamental problem which has
plagued Xen for years and, among a few other issues, prevents Xen from
becoming a decent GPU/PCIe passthrough platform (which it should be). This
solution also makes it possible to later resolve current PCI passthrough
incompatibilities, e.g. with Populate-on-Demand. In fact, i440 support has
been added as well, but it's a bit hacky as it uses NB registers which are
not present on a real i440 (well, one more non-existent i440 feature won't
hurt, as there are plenty of them already). I'm planning to send RFC
patches for this solution once the current patches have been reviewed and
the related code has settled, and then rebase the patches on top of it. A
good description will also be required, as the change is rather radical.

The good thing is that providing Q35 support for Xen at this stage neither
breaks any existing functionality nor affects the legacy i440 emulation in
any way -- Q35 emulation can only be enabled on demand, using a new domain
config option. Also, only existing interfaces are used: no new hypercalls
were introduced, no API changes, etc. In the future, though, we'll have to
change some hypercall/QMP/etc. interfaces to remove limitations and extend
Q35/PCIe passthrough support further.

Current features and limitations:
- All basic functionality works normally -- MP, networking, storage (AHCI),
  powering down VMs via ACPI soft-off, etc.
- The Xen Platform Device and PV devices are supported -- PV drivers for
  vbd, vif, etc. may be installed and used.
- PCIe ECAM is fully supported, including allocation of space for PCIEXBAR
  in the MMIO hole, ACPI MCFG generation, etc.
- Xen is limited to a maximum of 4 PIRQs in multiple places, while Q35
  supports 8 PIRQs / PCI router links. This was worked around by describing
  only 4 usable IRQ link entries in the ACPI tables and disabling
  PIRQE..PIRQH -- as if we were on a real system where only some of the 8
  available PIRQs are physically connected on the chipset. Extending the
  number of supported PCI links is trivial, but would change the
  save/migration stream format slightly... although it seems some room was
  actually left for this extension -- e.g. the field uint8_t route[4]
  followed by uint8_t pad0[4] in the hvm_hw_pci_link structure. In any case
  this is not a real problem, as we normally use APIC mode (or MSIs) for
  IRQ delivery; PIC mode with PCI routing is needed only for legacy
  compatibility.
- PCI hotplug is currently implemented via ACPI hotplug, similarly to i440.
  In the future this might be changed to native PCIe hotplug facilities
  (if there is a benefit).
- For PCIe passthrough to work on Windows 7 and above, a specific
  workaround was implemented which allows PCIe device passthrough to be
  used normally on those guest OSes. In the future this should be replaced
  by a new emulated PCI architecture for Xen -- providing support for
  simple PCI hierarchies, nested MMIO spaces, etc. Basically, we need at
  least to support PCI-PCI bridges (PCIe Root Ports in our case). Currently
  Xen is limited to bus 0 in many places, even in hypercall parameters. A
  detailed description of the issue can be found in the patch named
  "xen/pt: Xen PCIe passthrough support for Q35: bypass PCIe topology
  check".
- VM migration was not tested, as the feature primarily targets PCIe
  passthrough, which isn't compatible with migration anyway.

How to use the Q35 feature:

A new domain config option was implemented: device_model_machine. It's
a string with the following possible values:
- "i440" -- i440 emulation (default)
- "q35"  -- emulate a Q35 machine. By default, the storage interface is
  AHCI.

Note that omitting the device_model_machine parameter means an i440 system
by default, so the default behavior doesn't change for old domain config
files.

So, in order to enable Q35 emulation, one needs to specify the following
option in the domain config file:
device_model_machine="q35"

It is recommended to install the guest OS from scratch to avoid issues due
to the emulated platform change.
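For illustration, a minimal domain config sketch with Q35 enabled might
look like the following (every value other than device_model_machine is an
ordinary example xl setting, not something mandated by this series):

```
# example xl HVM guest config (illustrative values)
name    = "q35-guest"
builder = "hvm"
memory  = 2048
vcpus   = 2
disk    = [ "phy:/dev/vg/guest,xvda,w" ]
device_model_machine = "q35"   # enable Q35 emulation (default: i440)
```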

One extra note: if you're going to backport this series to an older QEMU
version, make sure you have the patch for the AHCI DMA bug applied: [1].
Otherwise you will encounter random Q35 guest hangups with a "Bad RAM
offset" message logged in /var/log/xen. Recent QEMU versions already have
this patch committed.

Also, commit [2] needs to be applied (for xen_pt.c) -- it is currently
available in upstream QEMU, but not present in qemu-xen.

This is my first (somewhat) large contribution to Xen, so some mistakes
are to be expected. Most testing was done using a previous version of the
patches and Xen 4.8.x.

I plan to support and extend this series further; for now I'd welcome
comments, suggestions, testing results and bug reports.

[1]: https://lists.xen.org/archives/html/xen-devel/2017-07/msg01077.html
[2]: https://lists.gnu.org/archive/html/qemu-devel/2017-12/msg03572.html

Xen changes:
Alexey Gerasimenko (12):
  libacpi: new DSDT ACPI table for Q35
  Makefile: build and use new DSDT table for Q35
  hvmloader: add function to query an emulated machine type (i440/Q35)
  hvmloader: add ACPI enabling for Q35
  hvmloader: add Q35 DSDT table loading
  hvmloader: add basic Q35 support
  hvmloader: allocate MMCONFIG area in the MMIO hole + minor code
    refactoring
  libxl: Q35 support (new option device_model_machine)
  libxl: Xen Platform device support for Q35
  libacpi: build ACPI MCFG table if requested
  hvmloader: use libacpi to build MCFG table
  docs: provide description for device_model_machine option

 docs/man/xl.cfg.pod.5.in             |  27 ++
 tools/firmware/hvmloader/Makefile    |   2 +-
 tools/firmware/hvmloader/config.h    |   5 +
 tools/firmware/hvmloader/hvmloader.c |  11 +-
 tools/firmware/hvmloader/pci.c       | 289 ++++++++++++------
 tools/firmware/hvmloader/pci_regs.h  |   7 +
 tools/firmware/hvmloader/util.c      | 130 ++++++++-
 tools/firmware/hvmloader/util.h      |  10 +
 tools/libacpi/Makefile               |   9 +-
 tools/libacpi/acpi2_0.h              |  21 ++
 tools/libacpi/build.c                |  42 +++
 tools/libacpi/dsdt_q35.asl           | 551 +++++++++++++++++++++++++++++++++++
 tools/libacpi/libacpi.h              |   4 +
 tools/libxl/libxl_dm.c               |  20 +-
 tools/libxl/libxl_types.idl          |   7 +
 tools/xl/xl_parse.c                  |  14 +
 16 files changed, 1051 insertions(+), 98 deletions(-)
 create mode 100644 tools/libacpi/dsdt_q35.asl

QEMU changes:
Alexey Gerasimenko (18):
  pc/xen: Xen Q35 support: provide IRQ handling for PCI devices
  pc/q35: Apply PCI bus BSEL property for Xen PCI device hotplug
  q35/acpi/xen: Provide ACPI PCI hotplug interface for Xen on Q35
  q35/xen: Add Xen platform device support for Q35
  q35: Fix incorrect values for PCIEXBAR masks
  xen/pt: XenHostPCIDevice: provide functions for PCI Capabilities and
    PCIe Extended Capabilities enumeration
  xen/pt: avoid reading PCIe device type and cap version multiple times
  xen/pt: determine the legacy/PCIe mode for a passed through device
  xen/pt: Xen PCIe passthrough support for Q35: bypass PCIe topology
    check
  xen/pt: add support for PCIe Extended Capabilities and larger config
    space
  xen/pt: handle PCIe Extended Capabilities Next register
  xen/pt: allow to hide PCIe Extended Capabilities
  xen/pt: add Vendor-specific PCIe Extended Capability descriptor and
    sizing
  xen/pt: add fixed-size PCIe Extended Capabilities descriptors
  xen/pt: add AER PCIe Extended Capability descriptor and sizing
  xen/pt: add descriptors and size calculation for
    RCLD/ACS/PMUX/DPA/MCAST/TPH/DPC PCIe Extended Capabilities
  xen/pt: add Resizable BAR PCIe Extended Capability descriptor and
    sizing
  xen/pt: add VC/VC9/MFVC PCIe Extended Capabilities descriptors and
    sizing

 hw/acpi/ich9.c               |   24 +
 hw/acpi/pcihp.c              |    8 +-
 hw/core/machine.c            |   21 +
 hw/i386/pc_q35.c             |   27 +-
 hw/i386/xen/xen-hvm.c        |   32 +-
 hw/isa/lpc_ich9.c            |    4 +
 hw/pci-host/piix.c           |    2 +-
 hw/pci-host/q35.c            |   14 +-
 hw/xen/xen-host-pci-device.c |  110 ++++-
 hw/xen/xen-host-pci-device.h |    6 +-
 hw/xen/xen_pt.c              |   53 +-
 hw/xen/xen_pt.h              |   19 +-
 hw/xen/xen_pt_config_init.c  | 1109 +++++++++++++++++++++++++++++++++++++++---
 include/hw/acpi/ich9.h       |    2 +
 include/hw/acpi/pcihp.h      |    2 +
 include/hw/boards.h          |    1 +
 include/hw/i386/ich9.h       |    1 +
 include/hw/i386/pc.h         |    3 +
 include/hw/pci-host/q35.h    |    4 +-
 include/hw/xen/xen.h         |    5 +-
 qemu-options.hx              |    1 +
 stubs/xen-hvm.c              |    8 +-
 22 files changed, 1333 insertions(+), 123 deletions(-)

-- 
2.11.0

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel


* [RFC PATCH 01/12] libacpi: new DSDT ACPI table for Q35
  2018-03-12 19:38   ` Konrad Rzeszutek Wilk
  2018-03-19 12:43   ` Roger Pau Monné
  -1 siblings, 2 replies; 183+ messages in thread
From: Alexey Gerasimenko @ 2018-03-12 18:33 UTC (permalink / raw)
  To: xen-devel; +Cc: Wei Liu, Ian Jackson, Alexey Gerasimenko, Jan Beulich

This patch adds the DSDT table for Q35 (a new tools/libacpi/dsdt_q35.asl
file). There are not many differences from dsdt.asl (for i440) at the
moment, namely:

- BDF location of LPC Controller
- Minor changes related to FDC detection
- Addition of _OSC method to inform OSPM about PCIe features supported

As we are still using 4 PCI router links and their corresponding
device/register addresses are the same (offset 0x60), there is no need to
change the PCI routing descriptions.

Also, ACPI hotplug is still used to control hot-(un)plugging of
passed-through devices (as it was for i440).

Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
---
 tools/libacpi/dsdt_q35.asl | 551 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 551 insertions(+)
 create mode 100644 tools/libacpi/dsdt_q35.asl

diff --git a/tools/libacpi/dsdt_q35.asl b/tools/libacpi/dsdt_q35.asl
new file mode 100644
index 0000000000..cd02946a07
--- /dev/null
+++ b/tools/libacpi/dsdt_q35.asl
@@ -0,0 +1,551 @@
+/******************************************************************************
+ * DSDT for Xen with Qemu device model (for Q35 machine)
+ *
+ * Copyright (c) 2004, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+DefinitionBlock ("DSDT.aml", "DSDT", 2, "Xen", "HVM", 0)
+{
+    Name (\PMBS, 0x0C00)
+    Name (\PMLN, 0x08)
+    Name (\IOB1, 0x00)
+    Name (\IOL1, 0x00)
+    Name (\APCB, 0xFEC00000)
+    Name (\APCL, 0x00010000)
+    Name (\PUID, 0x00)
+
+
+    Scope (\_SB)
+    {
+
+        /* Fix HCT test for 0x400 pci memory:
+         * - need to report low 640 MB mem as motherboard resource
+         */
+       Device(MEM0)
+       {
+           Name(_HID, EISAID("PNP0C02"))
+           Name(_CRS, ResourceTemplate() {
+               QWordMemory(
+                    ResourceConsumer, PosDecode, MinFixed,
+                    MaxFixed, Cacheable, ReadWrite,
+                    0x00000000,
+                    0x00000000,
+                    0x0009ffff,
+                    0x00000000,
+                    0x000a0000)
+           })
+       }
+
+       Device (PCI0)
+       {
+           Name (_HID, EisaId ("PNP0A03"))
+           Name (_UID, 0x00)
+           Name (_ADR, 0x00)
+           Name (_BBN, 0x00)
+
+           /* _OSC, modified from ASL sample in ACPI spec */
+           Name(SUPP, 0) /* PCI _OSC Support Field value */
+           Name(CTRL, 0) /* PCI _OSC Control Field value */
+           Method(_OSC, 4) {
+               /* Create DWORD-addressable fields from the Capabilities Buffer */
+               CreateDWordField(Arg3, 0, CDW1)
+
+               /* Switch by UUID.
+                * Only PCI Host Bridge Device capabilities UUID used for now
+                */
+               If (LEqual(Arg0, ToUUID("33DB4D5B-1FF7-401C-9657-7441C03DD766"))) {
+                   /* Create DWORD-addressable fields from the Capabilities Buffer */
+                   CreateDWordField(Arg3, 4, CDW2)
+                   CreateDWordField(Arg3, 8, CDW3)
+
+                   /* Save Capabilities DWORD2 & 3 */
+                   Store(CDW2, SUPP)
+                   Store(CDW3, CTRL)
+
+                   /* Validate Revision DWORD */
+                   If (LNotEqual(Arg1, One)) {
+                       /* Unknown revision */
+                       /* Support and Control DWORDs will be returned anyway */
+                       Or(CDW1, 0x08, CDW1)
+                   }
+
+                   /* Control field bits are:
+                    * bit 0    PCI Express Native Hot Plug control
+                    * bit 1    SHPC Native Hot Plug control
+                    * bit 2    PCI Express Native Power Management Events control
+                    * bit 3    PCI Express Advanced Error Reporting control
+                    * bit 4    PCI Express Capability Structure control
+                    */
+
+                   /* Always allow native PME, AER (no dependencies)
+                    * Never allow SHPC (no SHPC controller in this system)
+                    * Do not allow PCIe Capability Structure control for now
+                    * Also, ACPI hotplug is used for now instead of PCIe
+                    * Native Hot Plug
+                    */
+                   And(CTRL, 0x0C, CTRL)
+
+                   If (LNotEqual(CDW3, CTRL)) {
+                       /* Some of Capabilities bits were masked */
+                       Or(CDW1, 0x10, CDW1)
+                   }
+                   /* Update DWORD3 in the buffer */
+                   Store(CTRL, CDW3)
+               } Else {
+                   Or(CDW1, 4, CDW1) /* Unrecognized UUID */
+               }
+               Return (Arg3)
+           }
+           /* end of _OSC */
+
+
+           /* Make Cirrus VGA S3 suspend/resume work in Windows XP/2003 */
+           Device (VGA)
+           {
+               Name (_ADR, 0x00020000)
+
+               Method (_S1D, 0, NotSerialized)
+               {
+                   Return (0x00)
+               }
+               Method (_S2D, 0, NotSerialized)
+               {
+                   Return (0x00)
+               }
+               Method (_S3D, 0, NotSerialized)
+               {
+                   Return (0x00)
+               }
+           }
+
+           Method (_CRS, 0, NotSerialized)
+           {
+               Store (ResourceTemplate ()
+               {
+                   /* bus number is from 0 - 255*/
+                   WordBusNumber(
+                        ResourceProducer, MinFixed, MaxFixed, SubDecode,
+                        0x0000,
+                        0x0000,
+                        0x00FF,
+                        0x0000,
+                        0x0100)
+                    IO (Decode16, 0x0CF8, 0x0CF8, 0x01, 0x08)
+                    WordIO(
+                        ResourceProducer, MinFixed, MaxFixed, PosDecode,
+                        EntireRange,
+                        0x0000,
+                        0x0000,
+                        0x0CF7,
+                        0x0000,
+                        0x0CF8)
+                    WordIO(
+                        ResourceProducer, MinFixed, MaxFixed, PosDecode,
+                        EntireRange,
+                        0x0000,
+                        0x0D00,
+                        0xFFFF,
+                        0x0000,
+                        0xF300)
+
+                    /* reserve memory for pci devices */
+                    DWordMemory(
+                        ResourceProducer, PosDecode, MinFixed, MaxFixed,
+                        WriteCombining, ReadWrite,
+                        0x00000000,
+                        0x000A0000,
+                        0x000BFFFF,
+                        0x00000000,
+                        0x00020000)
+
+                    DWordMemory(
+                        ResourceProducer, PosDecode, MinFixed, MaxFixed,
+                        NonCacheable, ReadWrite,
+                        0x00000000,
+                        0xF0000000,
+                        0xF4FFFFFF,
+                        0x00000000,
+                        0x05000000,
+                        ,, _Y01)
+
+                    QWordMemory (
+                        ResourceProducer, PosDecode, MinFixed, MaxFixed,
+                        NonCacheable, ReadWrite,
+                        0x0000000000000000,
+                        0x0000000FFFFFFFF0,
+                        0x0000000FFFFFFFFF,
+                        0x0000000000000000,
+                        0x0000000000000010,
+                        ,, _Y02)
+
+                }, Local1)
+
+                CreateDWordField(Local1, \_SB.PCI0._CRS._Y01._MIN, MMIN)
+                CreateDWordField(Local1, \_SB.PCI0._CRS._Y01._MAX, MMAX)
+                CreateDWordField(Local1, \_SB.PCI0._CRS._Y01._LEN, MLEN)
+
+                Store(\_SB.PMIN, MMIN)
+                Store(\_SB.PLEN, MLEN)
+                Add(MMIN, MLEN, MMAX)
+                Subtract(MMAX, One, MMAX)
+
+                /*
+                 * WinXP / Win2K3 blue-screen for operations on 64-bit values.
+                 * Therefore we need to split the 64-bit calculations needed
+                 * here, but different iasl versions evaluate name references
+                 * to integers differently:
+                 * Year (approximate)          2006    2008    2012
+                 * \_SB.PCI0._CRS._Y02         zero   valid   valid
+                 * \_SB.PCI0._CRS._Y02._MIN   valid   valid    huge
+                 */
+                If(LEqual(Zero, \_SB.PCI0._CRS._Y02)) {
+                    Subtract(\_SB.PCI0._CRS._Y02._MIN, 14, Local0)
+                } Else {
+                    Store(\_SB.PCI0._CRS._Y02, Local0)
+                }
+                CreateDWordField(Local1, Add(Local0, 14), MINL)
+                CreateDWordField(Local1, Add(Local0, 18), MINH)
+                CreateDWordField(Local1, Add(Local0, 22), MAXL)
+                CreateDWordField(Local1, Add(Local0, 26), MAXH)
+                CreateDWordField(Local1, Add(Local0, 38), LENL)
+                CreateDWordField(Local1, Add(Local0, 42), LENH)
+
+                Store(\_SB.LMIN, MINL)
+                Store(\_SB.HMIN, MINH)
+                Store(\_SB.LLEN, LENL)
+                Store(\_SB.HLEN, LENH)
+                Add(MINL, LENL, MAXL)
+                Add(MINH, LENH, MAXH)
+                If(LLess(MAXL, MINL)) {
+                    Add(MAXH, One, MAXH)
+                }
+                If(LOr(MINH, LENL)) {
+                    If(LEqual(MAXL, 0)) {
+                        Subtract(MAXH, One, MAXH)
+                    }
+                    Subtract(MAXL, One, MAXL)
+                }
+
+                Return (Local1)
+            }
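+The 32-bit split arithmetic above computes the inclusive end of the 64-bit
+window (_MAX = _MIN + _LEN - 1) without any 64-bit AML operations. A C
+sketch of the intended computation (range_end_32() is a hypothetical
+helper; the AML additionally guards the final decrement with LOr() to
+work around old iasl quirks):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper mirroring the AML above: compute the inclusive end
 * of a 64-bit window (_MAX = _MIN + _LEN - 1) using only 32-bit halves,
 * so that no 64-bit operations reach WinXP/Win2K3.
 */
static void range_end_32(uint32_t minl, uint32_t minh,
                         uint32_t lenl, uint32_t lenh,
                         uint32_t *maxl, uint32_t *maxh)
{
    *maxl = minl + lenl;          /* low halves; may wrap around */
    *maxh = minh + lenh;
    if ( *maxl < minl )           /* carry out of the low half */
        *maxh += 1;
    if ( *maxl == 0 )             /* borrow for the final "minus one" */
        *maxh -= 1;
    *maxl -= 1;
}
```

+For example, MIN = 0xF0000000 with LEN = 0x10000000 yields MAX =
+0xFFFFFFFF, matching the high-memory window in the _CRS above.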
+
+            Device(HPET) {
+                Name(_HID,  EISAID("PNP0103"))
+                Name(_UID, 0)
+                Method (_STA, 0, NotSerialized) {
+                    If(LEqual(\_SB.HPET, 0)) {
+                        Return(0x00)
+                    } Else {
+                        Return(0x0F)
+                    }
+                }
+                Name(_CRS, ResourceTemplate() {
+                    DWordMemory(
+                        ResourceConsumer, PosDecode, MinFixed, MaxFixed,
+                        NonCacheable, ReadWrite,
+                        0x00000000,
+                        0xFED00000,
+                        0xFED003FF,
+                        0x00000000,
+                        0x00000400 /* 1K memory: FED00000 - FED003FF */
+                    )
+                })
+            }
+
+
+            /****************************************************************
+             * LPC ISA bridge
+             ****************************************************************/
+
+            Device (ISA)
+            {
+                Name (_ADR, 0x001f0000) /* device 31, fn 0 */
+
+                /* PCI Interrupt Routing Register 1 - PIRQA..PIRQD */
+                OperationRegion(PIRQ, PCI_Config, 0x60, 0x4)
+                Scope(\) {
+                    Field (\_SB.PCI0.ISA.PIRQ, ByteAcc, NoLock, Preserve) {
+                        PIRA, 8,
+                        PIRB, 8,
+                        PIRC, 8,
+                        PIRD, 8
+                    }
+                }
+                /*
+                   PCI Interrupt Routing Register 2 (PIRQE..PIRQH) cannot be
+                   used because of existing Xen IRQ limitations (4 PCI links
+                   only)
+                */
+
+                /* LPC_I/O: I/O Decode Ranges Register */
+                OperationRegion(LPCD, PCI_Config, 0x80, 0x2)
+                Field(LPCD, AnyAcc, NoLock, Preserve) {
+                    COMA,   3,
+                        ,   1,
+                    COMB,   3,
+
+                    Offset(0x01),
+                    LPTD,   2,
+                        ,   2,
+                    FDCD,   2
+                }
+
+                /* LPC_EN: LPC I/F Enables Register */
+                OperationRegion(LPCE, PCI_Config, 0x82, 0x2)
+                Field(LPCE, AnyAcc, NoLock, Preserve) {
+                    CAEN,   1,
+                    CBEN,   1,
+                    LPEN,   1,
+                    FDEN,   1
+                }
+
+                Device (SYSR)
+                {
+                    Name (_HID, EisaId ("PNP0C02"))
+                    Name (_UID, 0x01)
+                    Name (CRS, ResourceTemplate ()
+                    {
+                        /* TODO: list hidden resources */
+                        IO (Decode16, 0x0010, 0x0010, 0x00, 0x10)
+                        IO (Decode16, 0x0022, 0x0022, 0x00, 0x0C)
+                        IO (Decode16, 0x0030, 0x0030, 0x00, 0x10)
+                        IO (Decode16, 0x0044, 0x0044, 0x00, 0x1C)
+                        IO (Decode16, 0x0062, 0x0062, 0x00, 0x02)
+                        IO (Decode16, 0x0065, 0x0065, 0x00, 0x0B)
+                        IO (Decode16, 0x0072, 0x0072, 0x00, 0x0E)
+                        IO (Decode16, 0x0080, 0x0080, 0x00, 0x01)
+                        IO (Decode16, 0x0084, 0x0084, 0x00, 0x03)
+                        IO (Decode16, 0x0088, 0x0088, 0x00, 0x01)
+                        IO (Decode16, 0x008C, 0x008C, 0x00, 0x03)
+                        IO (Decode16, 0x0090, 0x0090, 0x00, 0x10)
+                        IO (Decode16, 0x00A2, 0x00A2, 0x00, 0x1C)
+                        IO (Decode16, 0x00E0, 0x00E0, 0x00, 0x10)
+                        IO (Decode16, 0x08A0, 0x08A0, 0x00, 0x04)
+                        IO (Decode16, 0x0CC0, 0x0CC0, 0x00, 0x10)
+                        IO (Decode16, 0x04D0, 0x04D0, 0x00, 0x02)
+                    })
+                    Method (_CRS, 0, NotSerialized)
+                    {
+                        Return (CRS)
+                    }
+                }
+
+                Device (PIC)
+                {
+                    Name (_HID, EisaId ("PNP0000"))
+                    Name (_CRS, ResourceTemplate ()
+                    {
+                        IO (Decode16, 0x0020, 0x0020, 0x01, 0x02)
+                        IO (Decode16, 0x00A0, 0x00A0, 0x01, 0x02)
+                        IRQNoFlags () {2}
+                    })
+                }
+
+                Device (DMA0)
+                {
+                    Name (_HID, EisaId ("PNP0200"))
+                    Name (_CRS, ResourceTemplate ()
+                    {
+                        DMA (Compatibility, BusMaster, Transfer8) {4}
+                        IO (Decode16, 0x0000, 0x0000, 0x00, 0x10)
+                        IO (Decode16, 0x0081, 0x0081, 0x00, 0x03)
+                        IO (Decode16, 0x0087, 0x0087, 0x00, 0x01)
+                        IO (Decode16, 0x0089, 0x0089, 0x00, 0x03)
+                        IO (Decode16, 0x008F, 0x008F, 0x00, 0x01)
+                        IO (Decode16, 0x00C0, 0x00C0, 0x00, 0x20)
+                        IO (Decode16, 0x0480, 0x0480, 0x00, 0x10)
+                    })
+                }
+
+                Device (TMR)
+                {
+                    Name (_HID, EisaId ("PNP0100"))
+                    Name (_CRS, ResourceTemplate ()
+                    {
+                        IO (Decode16, 0x0040, 0x0040, 0x00, 0x04)
+                        IRQNoFlags () {0}
+                    })
+                }
+
+                Device (RTC)
+                {
+                    Name (_HID, EisaId ("PNP0B00"))
+                    Name (_CRS, ResourceTemplate ()
+                    {
+                        IO (Decode16, 0x0070, 0x0070, 0x00, 0x02)
+                        IRQNoFlags () {8}
+                    })
+                }
+
+                Device (SPKR)
+                {
+                    Name (_HID, EisaId ("PNP0800"))
+                    Name (_CRS, ResourceTemplate ()
+                    {
+                        IO (Decode16, 0x0061, 0x0061, 0x00, 0x01)
+                    })
+                }
+
+                Device (PS2M)
+                {
+                    Name (_HID, EisaId ("PNP0F13"))
+                    Name (_CID, 0x130FD041)
+                    Method (_STA, 0, NotSerialized)
+                    {
+                        Return (0x0F)
+                    }
+
+                    Name (_CRS, ResourceTemplate ()
+                    {
+                        IRQNoFlags () {12}
+                    })
+                }
+
+                Device (PS2K)
+                {
+                    Name (_HID, EisaId ("PNP0303"))
+                    Name (_CID, 0x0B03D041)
+                    Method (_STA, 0, NotSerialized)
+                    {
+                        Return (0x0F)
+                    }
+
+                    Name (_CRS, ResourceTemplate ()
+                    {
+                        IO (Decode16, 0x0060, 0x0060, 0x00, 0x01)
+                        IO (Decode16, 0x0064, 0x0064, 0x00, 0x01)
+                        IRQNoFlags () {1}
+                    })
+                }
+
+                Device(FDC0)
+                {
+                    Name(_HID, EisaId("PNP0700"))
+                    Method(_STA, 0, NotSerialized)
+                    {
+                        Store(FDEN, Local0)
+                        If (LEqual(Local0, 0)) {
+                            Return (0x00)
+                        } Else {
+                            Return (0x0F)
+                        }
+                    }
+
+                    Name(_CRS, ResourceTemplate()
+                    {
+                        IO(Decode16, 0x03F2, 0x03F2, 0x00, 0x04)
+                        IO(Decode16, 0x03F7, 0x03F7, 0x00, 0x01)
+                        IRQNoFlags() { 6 }
+                        DMA(Compatibility, NotBusMaster, Transfer8) { 2 }
+                    })
+                }
+
+                Device (UAR1)
+                {
+                    Name (_HID, EisaId ("PNP0501"))
+                    Name (_UID, 0x01)
+                    Method (_STA, 0, NotSerialized)
+                    {
+                        If(LEqual(\_SB.UAR1, 0)) {
+                            Return(0x00)
+                        } Else {
+                            Return(0x0F)
+                        }
+                    }
+
+                    Name (_CRS, ResourceTemplate()
+                    {
+                        IO (Decode16, 0x03F8, 0x03F8, 8, 8)
+                        IRQNoFlags () {4}
+                    })
+                }
+
+                Device (UAR2)
+                {
+                    Name (_HID, EisaId ("PNP0501"))
+                    Name (_UID, 0x02)
+                    Method (_STA, 0, NotSerialized)
+                    {
+                        If(LEqual(\_SB.UAR2, 0)) {
+                            Return(0x00)
+                        } Else {
+                            Return(0x0F)
+                        }
+                    }
+
+                    Name (_CRS, ResourceTemplate()
+                    {
+                        IO (Decode16, 0x02F8, 0x02F8, 8, 8)
+                        IRQNoFlags () {3}
+                    })
+                }
+
+                Device (LTP1)
+                {
+                    Name (_HID, EisaId ("PNP0400"))
+                    Name (_UID, 0x02)
+                    Method (_STA, 0, NotSerialized)
+                    {
+                        If(LEqual(\_SB.LTP1, 0)) {
+                            Return(0x00)
+                        } Else {
+                            Return(0x0F)
+                        }
+                    }
+
+                    Name (_CRS, ResourceTemplate()
+                    {
+                        IO (Decode16, 0x0378, 0x0378, 0x08, 0x08)
+                        IRQNoFlags () {7}
+                    })
+                }
+
+                Device(VGID) {
+                    Name(_HID, EisaId ("XEN0000"))
+                    Name(_UID, 0x00)
+                    Name(_CID, "VM_Gen_Counter")
+                    Name(_DDN, "VM_Gen_Counter")
+                    Method(_STA, 0, NotSerialized)
+                    {
+                        If(LEqual(\_SB.VGIA, 0x00000000)) {
+                            Return(0x00)
+                        } Else {
+                            Return(0x0F)
+                        }
+                    }
+                    Name(PKG, Package ()
+                    {
+                        0x00000000,
+                        0x00000000
+                    })
+                    Method(ADDR, 0, NotSerialized)
+                    {
+                        Store(\_SB.VGIA, Index(PKG, 0))
+                        Return(PKG)
+                    }
+                }
+            }
+        }
+    }
+    /* _S3 and _S4 are in separate SSDTs */
+    Name (\_S5, Package (0x04) {
+        0x00,  /* PM1a_CNT.SLP_TYP */
+        0x00,  /* PM1b_CNT.SLP_TYP */
+        0x00,  /* reserved */
+        0x00   /* reserved */
+    })
+    Name(PICD, 0)
+    Method(_PIC, 1) {
+        Store(Arg0, PICD)
+    }
+}
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 183+ messages in thread

* [RFC PATCH 02/12] Makefile: build and use new DSDT table for Q35
  2018-03-12 18:33 ` Alexey Gerasimenko
@ 2018-03-12 18:33 ` Alexey Gerasimenko
  2018-03-19 12:46   ` Roger Pau Monné
  2018-03-19 13:07   ` Jan Beulich
  -1 siblings, 2 replies; 183+ messages in thread
From: Alexey Gerasimenko @ 2018-03-12 18:33 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, Ian Jackson, Alexey Gerasimenko, Jan Beulich, Wei Liu

Provide build rules for the newly added dsdt_q35.asl file, similar to
those for dsdt.asl.

Note that the '15cpu' ACPI tables are only applicable to qemu-traditional
(which has no support for Q35), so only the 'anycpu' version is needed.

Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
---
 tools/firmware/hvmloader/Makefile | 2 +-
 tools/libacpi/Makefile            | 9 ++++++++-
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/tools/firmware/hvmloader/Makefile b/tools/firmware/hvmloader/Makefile
index a5b4c32c1a..b8b94bddda 100644
--- a/tools/firmware/hvmloader/Makefile
+++ b/tools/firmware/hvmloader/Makefile
@@ -75,7 +75,7 @@ rombios.o: roms.inc
 smbios.o: CFLAGS += -D__SMBIOS_DATE__="\"$(SMBIOS_REL_DATE)\""
 
 ACPI_PATH = ../../libacpi
-DSDT_FILES = dsdt_anycpu.c dsdt_15cpu.c dsdt_anycpu_qemu_xen.c
+DSDT_FILES = dsdt_anycpu.c dsdt_15cpu.c dsdt_anycpu_qemu_xen.c dsdt_q35_anycpu_qemu_xen.c
 ACPI_OBJS = $(patsubst %.c,%.o,$(DSDT_FILES)) build.o static_tables.o
 $(ACPI_OBJS): CFLAGS += -I. -DLIBACPI_STDUTILS=\"$(CURDIR)/util.h\"
 CFLAGS += -I$(ACPI_PATH)
diff --git a/tools/libacpi/Makefile b/tools/libacpi/Makefile
index a47a658a25..7946284118 100644
--- a/tools/libacpi/Makefile
+++ b/tools/libacpi/Makefile
@@ -21,7 +21,7 @@ endif
 
 MK_DSDT = $(ACPI_BUILD_DIR)/mk_dsdt
 
-C_SRC-$(CONFIG_X86) = dsdt_anycpu.c dsdt_15cpu.c dsdt_anycpu_qemu_xen.c dsdt_pvh.c
+C_SRC-$(CONFIG_X86) = dsdt_anycpu.c dsdt_15cpu.c dsdt_anycpu_qemu_xen.c dsdt_q35_anycpu_qemu_xen.c dsdt_pvh.c
 C_SRC-$(CONFIG_ARM_64) = dsdt_anycpu_arm.c
 DSDT_FILES ?= $(C_SRC-y)
 C_SRC = $(addprefix $(ACPI_BUILD_DIR)/, $(DSDT_FILES))
@@ -56,6 +56,13 @@ $(ACPI_BUILD_DIR)/dsdt_anycpu_qemu_xen.asl: dsdt.asl dsdt_acpi_info.asl $(MK_DSD
 	$(MK_DSDT) --debug=$(debug) --dm-version qemu-xen >> $@.$(TMP_SUFFIX)
 	mv -f $@.$(TMP_SUFFIX) $@
 
+$(ACPI_BUILD_DIR)/dsdt_q35_anycpu_qemu_xen.asl: dsdt_q35.asl dsdt_acpi_info.asl $(MK_DSDT)
+	# Remove last bracket
+	awk 'NR > 1 {print s} {s=$$0}' $< > $@.$(TMP_SUFFIX)
+	cat dsdt_acpi_info.asl >> $@.$(TMP_SUFFIX)
+	$(MK_DSDT) --debug=$(debug) --dm-version qemu-xen >> $@.$(TMP_SUFFIX)
+	mv -f $@.$(TMP_SUFFIX) $@
+
 # NB. awk invocation is a portable alternative to 'head -n -1'
 $(ACPI_BUILD_DIR)/dsdt_%cpu.asl: dsdt.asl dsdt_acpi_info.asl  $(MK_DSDT)
 	# Remove last bracket
-- 
2.11.0



* [RFC PATCH 03/12] hvmloader: add function to query an emulated machine type (i440/Q35)
  2018-03-12 18:33 ` Alexey Gerasimenko
@ 2018-03-12 18:33 ` Alexey Gerasimenko
  2018-03-13 17:26   ` Wei Liu
  2018-03-19 12:56   ` Roger Pau Monné
  -1 siblings, 2 replies; 183+ messages in thread
From: Alexey Gerasimenko @ 2018-03-12 18:33 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, Ian Jackson, Alexey Gerasimenko, Jan Beulich, Wei Liu

This adds a new function, get_pc_machine_type(), which determines the
emulated chipset type. Supported return values:

- MACHINE_TYPE_I440
- MACHINE_TYPE_Q35
- MACHINE_TYPE_UNKNOWN, which results in an error message being printed,
  followed by a BUG() call in hvmloader.
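
The detection boils down to a pure mapping from the host-bridge vendor
and device IDs at 00:00.0 to a machine type. A minimal C sketch
(classify_chipset() is a hypothetical helper; the real
get_pc_machine_type() reads the IDs via pci_readw() and caches the
result):

```c
#include <assert.h>
#include <stdint.h>

#define MACHINE_TYPE_I440     1
#define MACHINE_TYPE_Q35      2
#define MACHINE_TYPE_UNKNOWN  (-1)

/*
 * Pure classification helper (hypothetical): the real get_pc_machine_type()
 * obtains these IDs from the config space of the host bridge at 00:00.0.
 */
static int classify_chipset(uint16_t vendor_id, uint16_t device_id)
{
    if ( vendor_id != 0x8086 )     /* only Intel chipsets are emulated */
        return MACHINE_TYPE_UNKNOWN;

    switch ( device_id )
    {
    case 0x1237:                   /* 82441FX (i440FX) */
        return MACHINE_TYPE_I440;
    case 0x29c0:                   /* Q35 MCH */
        return MACHINE_TYPE_Q35;
    default:
        return MACHINE_TYPE_UNKNOWN;
    }
}
```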

Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
---
 tools/firmware/hvmloader/pci_regs.h |  5 ++++
 tools/firmware/hvmloader/util.c     | 47 +++++++++++++++++++++++++++++++++++++
 tools/firmware/hvmloader/util.h     |  8 +++++++
 3 files changed, 60 insertions(+)

diff --git a/tools/firmware/hvmloader/pci_regs.h b/tools/firmware/hvmloader/pci_regs.h
index 7bf2d873ab..ba498b840e 100644
--- a/tools/firmware/hvmloader/pci_regs.h
+++ b/tools/firmware/hvmloader/pci_regs.h
@@ -107,6 +107,11 @@
 
 #define PCI_INTEL_OPREGION 0xfc /* 4 bits */
 
+#define PCI_VENDOR_ID_INTEL              0x8086
+#define PCI_DEVICE_ID_INTEL_82441        0x1237
+#define PCI_DEVICE_ID_INTEL_Q35_MCH      0x29c0
+
+
 #endif /* __HVMLOADER_PCI_REGS_H__ */
 
 /*
diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c
index 0c3f2d24cd..5739a87628 100644
--- a/tools/firmware/hvmloader/util.c
+++ b/tools/firmware/hvmloader/util.c
@@ -22,6 +22,7 @@
 #include "hypercall.h"
 #include "ctype.h"
 #include "vnuma.h"
+#include "pci_regs.h"
 #include <acpi2_0.h>
 #include <libacpi.h>
 #include <stdint.h>
@@ -735,6 +736,52 @@ void __bug(char *file, int line)
     crash();
 }
 
+
+static int machine_type = MACHINE_TYPE_UNDEFINED;
+
+int get_pc_machine_type(void)
+{
+    uint16_t vendor_id;
+    uint16_t device_id;
+
+    if (machine_type != MACHINE_TYPE_UNDEFINED)
+        return machine_type;
+
+    machine_type = MACHINE_TYPE_UNKNOWN;
+
+    vendor_id = pci_readw(0, PCI_VENDOR_ID);
+    device_id = pci_readw(0, PCI_DEVICE_ID);
+
+    /* only Intel platforms are emulated currently */
+    if (vendor_id == PCI_VENDOR_ID_INTEL)
+    {
+        switch (device_id)
+        {
+        case PCI_DEVICE_ID_INTEL_82441:
+            machine_type = MACHINE_TYPE_I440;
+            printf("Detected i440 chipset\n");
+            break;
+
+        case PCI_DEVICE_ID_INTEL_Q35_MCH:
+            machine_type = MACHINE_TYPE_Q35;
+            printf("Detected Q35 chipset\n");
+            break;
+
+        default:
+            break;
+        }
+    }
+
+    if (machine_type == MACHINE_TYPE_UNKNOWN)
+    {
+        printf("Unknown emulated chipset encountered, VID=%04Xh, DID=%04Xh\n",
+               vendor_id, device_id);
+        BUG();
+    }
+
+    return machine_type;
+}
+
 static void validate_hvm_info(struct hvm_info_table *t)
 {
     uint8_t *ptr = (uint8_t *)t;
diff --git a/tools/firmware/hvmloader/util.h b/tools/firmware/hvmloader/util.h
index 7bca6418d2..7c77bedb00 100644
--- a/tools/firmware/hvmloader/util.h
+++ b/tools/firmware/hvmloader/util.h
@@ -100,6 +100,14 @@ void pci_write(uint32_t devfn, uint32_t reg, uint32_t len, uint32_t val);
 #define pci_writew(devfn, reg, val) pci_write(devfn, reg, 2, (uint16_t)(val))
 #define pci_writel(devfn, reg, val) pci_write(devfn, reg, 4, (uint32_t)(val))
 
+/* Emulated machine types */
+#define MACHINE_TYPE_UNDEFINED      0
+#define MACHINE_TYPE_I440           1
+#define MACHINE_TYPE_Q35            2
+#define MACHINE_TYPE_UNKNOWN        (-1)
+
+int get_pc_machine_type(void);
+
 /* Get a pointer to the shared-info page */
 struct shared_info *get_shared_info(void) __attribute__ ((const));
 
-- 
2.11.0



* [RFC PATCH 04/12] hvmloader: add ACPI enabling for Q35
  2018-03-12 18:33 ` Alexey Gerasimenko
@ 2018-03-12 18:33 ` Alexey Gerasimenko
  2018-03-13 17:26   ` Wei Liu
  2018-03-19 13:01   ` Roger Pau Monné
  -1 siblings, 2 replies; 183+ messages in thread
From: Alexey Gerasimenko @ 2018-03-12 18:33 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, Ian Jackson, Alexey Gerasimenko, Jan Beulich, Wei Liu

In order to enable ACPI for the OS, we need to write a chipset-specific
value to the SMI_CMD register (imitating the APM->ACPI switch on real
systems). Modify the acpi_enable_sci() function to support both i440 and
Q35 emulation.
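
The chipset-specific part is only the byte written to SMI_CMD (port
0xb2): 0xf1 for PIIX4 versus 0x02 for ICH9. A minimal sketch of that
selection (acpi_enable_value() is a hypothetical helper, not part of the
patch):

```c
#include <assert.h>
#include <stdint.h>

#define SMI_CMD_IOPORT       0xb2
#define PIIX4_ACPI_ENABLE    0xf1
#define ICH9_ACPI_ENABLE     0x02

/*
 * Hypothetical helper sketching the selection done in acpi_enable_sci():
 * the returned byte is what the firmware writes to SMI_CMD (0xb2) to
 * request the APM->ACPI handover from the emulated chipset.
 */
static uint8_t acpi_enable_value(int is_q35)
{
    return is_q35 ? ICH9_ACPI_ENABLE : PIIX4_ACPI_ENABLE;
}
```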

Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
---
 tools/firmware/hvmloader/hvmloader.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/tools/firmware/hvmloader/hvmloader.c b/tools/firmware/hvmloader/hvmloader.c
index f603f68ded..070698440e 100644
--- a/tools/firmware/hvmloader/hvmloader.c
+++ b/tools/firmware/hvmloader/hvmloader.c
@@ -257,9 +257,16 @@ static const struct bios_config *detect_bios(void)
 static void acpi_enable_sci(void)
 {
     uint8_t pm1a_cnt_val;
+    uint8_t acpi_enable_val;
 
-#define PIIX4_SMI_CMD_IOPORT 0xb2
+#define SMI_CMD_IOPORT       0xb2
 #define PIIX4_ACPI_ENABLE    0xf1
+#define ICH9_ACPI_ENABLE     0x02
+
+    if (get_pc_machine_type() == MACHINE_TYPE_Q35)
+        acpi_enable_val = ICH9_ACPI_ENABLE;
+    else
+        acpi_enable_val = PIIX4_ACPI_ENABLE;
 
     /*
      * PIIX4 emulation in QEMU has SCI_EN=0 by default. We have no legacy
@@ -267,7 +274,7 @@ static void acpi_enable_sci(void)
      */
     pm1a_cnt_val = inb(ACPI_PM1A_CNT_BLK_ADDRESS_V1);
     if ( !(pm1a_cnt_val & ACPI_PM1C_SCI_EN) )
-        outb(PIIX4_SMI_CMD_IOPORT, PIIX4_ACPI_ENABLE);
+        outb(SMI_CMD_IOPORT, acpi_enable_val);
 
     pm1a_cnt_val = inb(ACPI_PM1A_CNT_BLK_ADDRESS_V1);
     BUG_ON(!(pm1a_cnt_val & ACPI_PM1C_SCI_EN));
-- 
2.11.0



* [RFC PATCH 05/12] hvmloader: add Q35 DSDT table loading
  2018-03-12 18:33 ` Alexey Gerasimenko
@ 2018-03-12 18:33 ` Alexey Gerasimenko
  2018-03-19 14:45   ` Roger Pau Monné
  -1 siblings, 1 reply; 183+ messages in thread
From: Alexey Gerasimenko @ 2018-03-12 18:33 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, Ian Jackson, Alexey Gerasimenko, Jan Beulich, Wei Liu

Allow hvmloader_acpi_build_tables() to select the Q35 DSDT table. The
get_pc_machine_type() function is used to pick the proper table
(i440/Q35).

As we are bound to the qemu-xen device model for Q35, there is no need
to initialize the config->dsdt_15cpu/config->dsdt_15cpu_len fields.

Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
---
 tools/firmware/hvmloader/util.c | 13 +++++++++++--
 tools/firmware/hvmloader/util.h |  2 ++
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c
index 5739a87628..d8db9e3c8e 100644
--- a/tools/firmware/hvmloader/util.c
+++ b/tools/firmware/hvmloader/util.c
@@ -955,8 +955,17 @@ void hvmloader_acpi_build_tables(struct acpi_config *config,
     }
     else if ( !strncmp(s, "qemu_xen", 9) )
     {
-        config->dsdt_anycpu = dsdt_anycpu_qemu_xen;
-        config->dsdt_anycpu_len = dsdt_anycpu_qemu_xen_len;
+        if (get_pc_machine_type() == MACHINE_TYPE_Q35)
+        {
+            config->dsdt_anycpu = dsdt_q35_anycpu_qemu_xen;
+            config->dsdt_anycpu_len = dsdt_q35_anycpu_qemu_xen_len;
+        }
+        else
+        {
+            config->dsdt_anycpu = dsdt_anycpu_qemu_xen;
+            config->dsdt_anycpu_len = dsdt_anycpu_qemu_xen_len;
+        }
+
         config->dsdt_15cpu = NULL;
         config->dsdt_15cpu_len = 0;
     }
diff --git a/tools/firmware/hvmloader/util.h b/tools/firmware/hvmloader/util.h
index 7c77bedb00..fd2d885c96 100644
--- a/tools/firmware/hvmloader/util.h
+++ b/tools/firmware/hvmloader/util.h
@@ -288,7 +288,9 @@ bool check_overlap(uint64_t start, uint64_t size,
                    uint64_t reserved_start, uint64_t reserved_size);
 
 extern const unsigned char dsdt_anycpu_qemu_xen[], dsdt_anycpu[], dsdt_15cpu[];
+extern const unsigned char dsdt_q35_anycpu_qemu_xen[];
 extern const int dsdt_anycpu_qemu_xen_len, dsdt_anycpu_len, dsdt_15cpu_len;
+extern const int dsdt_q35_anycpu_qemu_xen_len;
 
 struct acpi_config;
 void hvmloader_acpi_build_tables(struct acpi_config *config,
-- 
2.11.0



* [RFC PATCH 06/12] hvmloader: add basic Q35 support
  2018-03-12 18:33 ` Alexey Gerasimenko
@ 2018-03-12 18:33 ` Alexey Gerasimenko
  2018-03-19 15:30   ` Roger Pau Monné
  -1 siblings, 1 reply; 183+ messages in thread
From: Alexey Gerasimenko @ 2018-03-12 18:33 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, Ian Jackson, Alexey Gerasimenko, Jan Beulich, Wei Liu

This patch does the following:

1. Moves PCI-device-specific initialization out of pci_setup() into the
newly created class_specific_pci_device_setup() function to simplify the
code.

2. Extends PCI-device-specific initialization with LPC controller
initialization.

3. Initializes PIRQA..{PIRQD, PIRQH} routing according to the emulated
south bridge (located at either PCI_ISA_DEVFN or PCI_ICH9_LPC_DEVFN).
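
For reference, the link-to-IRQ rotation used by pci_setup() can be
modeled as a pure function (route_pci_links() is a hypothetical name;
the real code writes each IRQ into the bridge's PIRQ routing registers
at config offsets 0x60..0x63):

```c
#include <assert.h>

#define PCI_ISA_IRQ_MASK 0x0c20U   /* ISA IRQs 5, 10, 11 are PCI-connected */

/*
 * Model of the link rotation in pci_setup() (route_pci_links() is a
 * hypothetical name): each of the four PCI links is assigned the next
 * ISA IRQ allowed by the mask, wrapping around modulo 16.
 */
static void route_pci_links(unsigned int irq_out[4])
{
    unsigned int isa_irq = 0, link;

    for ( link = 0; link < 4; link++ )
    {
        do {
            isa_irq = (isa_irq + 1) & 15;
        } while ( !(PCI_ISA_IRQ_MASK & (1U << isa_irq)) );
        irq_out[link] = isa_irq;
    }
}
```

With the default mask this assigns IRQs 5, 10, 11 and then wraps back to
5 for the fourth link.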

Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
---
 tools/firmware/hvmloader/config.h |   1 +
 tools/firmware/hvmloader/pci.c    | 162 ++++++++++++++++++++++++--------------
 2 files changed, 104 insertions(+), 59 deletions(-)

diff --git a/tools/firmware/hvmloader/config.h b/tools/firmware/hvmloader/config.h
index 6e00413f2e..6fde6b7b60 100644
--- a/tools/firmware/hvmloader/config.h
+++ b/tools/firmware/hvmloader/config.h
@@ -52,6 +52,7 @@ extern uint8_t ioapic_version;
 
 #define PCI_ISA_DEVFN       0x08    /* dev 1, fn 0 */
 #define PCI_ISA_IRQ_MASK    0x0c20U /* ISA IRQs 5,10,11 are PCI connected */
+#define PCI_ICH9_LPC_DEVFN  0xf8    /* dev 31, fn 0 */
 
 /* MMIO hole: Hardcoded defaults, which can be dynamically expanded. */
 #define PCI_MEM_END         0xfc000000
diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c
index 0b708bf578..033bd20992 100644
--- a/tools/firmware/hvmloader/pci.c
+++ b/tools/firmware/hvmloader/pci.c
@@ -35,6 +35,7 @@ unsigned long pci_mem_end = PCI_MEM_END;
 uint64_t pci_hi_mem_start = 0, pci_hi_mem_end = 0;
 
 enum virtual_vga virtual_vga = VGA_none;
+uint32_t vga_devfn = 256;
 unsigned long igd_opregion_pgbase = 0;
 
 /* Check if the specified range conflicts with any reserved device memory. */
@@ -76,14 +77,93 @@ static int find_next_rmrr(uint32_t base)
     return next_rmrr;
 }
 
+#define SCI_EN_IOPORT  (ACPI_PM1A_EVT_BLK_ADDRESS_V1 + 0x30)
+#define GBL_SMI_EN      (1 << 0)
+#define APMC_EN         (1 << 5)
+
+static void class_specific_pci_device_setup(uint16_t vendor_id,
+                                            uint16_t device_id,
+                                            uint8_t bus, uint8_t devfn)
+{
+    uint16_t class;
+
+    class = pci_readw(devfn, PCI_CLASS_DEVICE);
+
+    switch ( class )
+    {
+    case 0x0300:
+        /* If emulated VGA is found, preserve it as primary VGA. */
+        if ( (vendor_id == 0x1234) && (device_id == 0x1111) )
+        {
+            vga_devfn = devfn;
+            virtual_vga = VGA_std;
+        }
+        else if ( (vendor_id == 0x1013) && (device_id == 0xb8) )
+        {
+            vga_devfn = devfn;
+            virtual_vga = VGA_cirrus;
+        }
+        else if ( virtual_vga == VGA_none )
+        {
+            vga_devfn = devfn;
+            virtual_vga = VGA_pt;
+            if ( vendor_id == 0x8086 )
+            {
+                igd_opregion_pgbase = mem_hole_alloc(IGD_OPREGION_PAGES);
+                /*
+                 * Write the OpRegion offset to give the opregion
+                 * address to the device model. The device model will trap
+                 * and map the OpRegion at the given address.
+                 */
+                pci_writel(vga_devfn, PCI_INTEL_OPREGION,
+                           igd_opregion_pgbase << PAGE_SHIFT);
+            }
+        }
+        break;
+
+    case 0x0680:
+        /* PIIX4 ACPI PM. Special device with special PCI config space. */
+        ASSERT((vendor_id == 0x8086) && (device_id == 0x7113));
+        pci_writew(devfn, 0x20, 0x0000); /* No smb bus IO enable */
+        pci_writew(devfn, 0xd2, 0x0000); /* No smb bus IO enable */
+        pci_writew(devfn, 0x22, 0x0000);
+        pci_writew(devfn, 0x3c, 0x0009); /* Hardcoded IRQ9 */
+        pci_writew(devfn, 0x3d, 0x0001);
+        pci_writel(devfn, 0x40, ACPI_PM1A_EVT_BLK_ADDRESS_V1 | 1);
+        pci_writeb(devfn, 0x80, 0x01); /* enable PM io space */
+        break;
+
+    case 0x0601:
+        /* LPC bridge */
+        if (vendor_id == 0x8086 && device_id == 0x2918)
+        {
+            pci_writeb(devfn, 0x3c, 0x09); /* Hardcoded IRQ9 */
+            pci_writeb(devfn, 0x3d, 0x01);
+            pci_writel(devfn, 0x40, ACPI_PM1A_EVT_BLK_ADDRESS_V1 | 1);
+            pci_writeb(devfn, 0x44, 0x80); /* enable PM io space */
+            outl(SCI_EN_IOPORT, inl(SCI_EN_IOPORT) | GBL_SMI_EN | APMC_EN);
+        }
+        break;
+
+    case 0x0101:
+        if ( vendor_id == 0x8086 )
+        {
+            /* Intel ICHs since PIIX3: enable IDE legacy mode. */
+            pci_writew(devfn, 0x40, 0x8000); /* enable IDE0 */
+            pci_writew(devfn, 0x42, 0x8000); /* enable IDE1 */
+        }
+        break;
+    }
+}
+
 void pci_setup(void)
 {
     uint8_t is_64bar, using_64bar, bar64_relocate = 0;
     uint32_t devfn, bar_reg, cmd, bar_data, bar_data_upper;
     uint64_t base, bar_sz, bar_sz_upper, mmio_total = 0;
-    uint32_t vga_devfn = 256;
-    uint16_t class, vendor_id, device_id;
+    uint16_t vendor_id, device_id;
     unsigned int bar, pin, link, isa_irq;
+    int is_running_on_q35 = 0;
 
     /* Resources assignable to PCI devices via BARs. */
     struct resource {
@@ -130,13 +210,28 @@ void pci_setup(void)
     if ( s )
         mmio_hole_size = strtoll(s, NULL, 0);
 
+    /* Check whether we are running on Q35 and set the flag accordingly */
+    is_running_on_q35 = get_pc_machine_type() == MACHINE_TYPE_Q35;
+
     /* Program PCI-ISA bridge with appropriate link routes. */
     isa_irq = 0;
     for ( link = 0; link < 4; link++ )
     {
         do { isa_irq = (isa_irq + 1) & 15;
         } while ( !(PCI_ISA_IRQ_MASK & (1U << isa_irq)) );
-        pci_writeb(PCI_ISA_DEVFN, 0x60 + link, isa_irq);
+
+        if (is_running_on_q35)
+        {
+            pci_writeb(PCI_ICH9_LPC_DEVFN, 0x60 + link, isa_irq);
+
+            /* PIRQE..PIRQH are unused */
+            pci_writeb(PCI_ICH9_LPC_DEVFN, 0x68 + link, 0x80);
+        }
+        else
+        {
+            pci_writeb(PCI_ISA_DEVFN, 0x60 + link, isa_irq);
+        }
+
         printf("PCI-ISA link %u routed to IRQ%u\n", link, isa_irq);
     }
 
@@ -147,66 +242,13 @@ void pci_setup(void)
     /* Scan the PCI bus and map resources. */
     for ( devfn = 0; devfn < 256; devfn++ )
     {
-        class     = pci_readw(devfn, PCI_CLASS_DEVICE);
         vendor_id = pci_readw(devfn, PCI_VENDOR_ID);
         device_id = pci_readw(devfn, PCI_DEVICE_ID);
         if ( (vendor_id == 0xffff) && (device_id == 0xffff) )
             continue;
 
-        ASSERT((devfn != PCI_ISA_DEVFN) ||
-               ((vendor_id == 0x8086) && (device_id == 0x7000)));
-
-        switch ( class )
-        {
-        case 0x0300:
-            /* If emulated VGA is found, preserve it as primary VGA. */
-            if ( (vendor_id == 0x1234) && (device_id == 0x1111) )
-            {
-                vga_devfn = devfn;
-                virtual_vga = VGA_std;
-            }
-            else if ( (vendor_id == 0x1013) && (device_id == 0xb8) )
-            {
-                vga_devfn = devfn;
-                virtual_vga = VGA_cirrus;
-            }
-            else if ( virtual_vga == VGA_none )
-            {
-                vga_devfn = devfn;
-                virtual_vga = VGA_pt;
-                if ( vendor_id == 0x8086 )
-                {
-                    igd_opregion_pgbase = mem_hole_alloc(IGD_OPREGION_PAGES);
-                    /*
-                     * Write the the OpRegion offset to give the opregion
-                     * address to the device model. The device model will trap 
-                     * and map the OpRegion at the give address.
-                     */
-                    pci_writel(vga_devfn, PCI_INTEL_OPREGION,
-                               igd_opregion_pgbase << PAGE_SHIFT);
-                }
-            }
-            break;
-        case 0x0680:
-            /* PIIX4 ACPI PM. Special device with special PCI config space. */
-            ASSERT((vendor_id == 0x8086) && (device_id == 0x7113));
-            pci_writew(devfn, 0x20, 0x0000); /* No smb bus IO enable */
-            pci_writew(devfn, 0xd2, 0x0000); /* No smb bus IO enable */
-            pci_writew(devfn, 0x22, 0x0000);
-            pci_writew(devfn, 0x3c, 0x0009); /* Hardcoded IRQ9 */
-            pci_writew(devfn, 0x3d, 0x0001);
-            pci_writel(devfn, 0x40, ACPI_PM1A_EVT_BLK_ADDRESS_V1 | 1);
-            pci_writeb(devfn, 0x80, 0x01); /* enable PM io space */
-            break;
-        case 0x0101:
-            if ( vendor_id == 0x8086 )
-            {
-                /* Intel ICHs since PIIX3: enable IDE legacy mode. */
-                pci_writew(devfn, 0x40, 0x8000); /* enable IDE0 */
-                pci_writew(devfn, 0x42, 0x8000); /* enable IDE1 */
-            }
-            break;
-        }
+        class_specific_pci_device_setup(vendor_id, device_id,
+                                        0 /* virt_bus support TBD */, devfn);
 
         /* Map the I/O memory and port resources. */
         for ( bar = 0; bar < 7; bar++ )
@@ -283,7 +325,9 @@ void pci_setup(void)
         {
             /* This is the barber's pole mapping used by Xen. */
             link = ((pin - 1) + (devfn >> 3)) & 3;
-            isa_irq = pci_readb(PCI_ISA_DEVFN, 0x60 + link);
+            isa_irq = pci_readb(is_running_on_q35 ?
+                                PCI_ICH9_LPC_DEVFN : PCI_ISA_DEVFN,
+                                0x60 + link);
             pci_writeb(devfn, PCI_INTERRUPT_LINE, isa_irq);
             printf("pci dev %02x:%x INT%c->IRQ%u\n",
                    devfn>>3, devfn&7, 'A'+pin-1, isa_irq);
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 183+ messages in thread

* [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring
  2018-03-12 18:33 ` Alexey Gerasimenko
                   ` (6 preceding siblings ...)
  (?)
@ 2018-03-12 18:33 ` Alexey Gerasimenko
  2018-03-19 15:58   ` Roger Pau Monné
  2018-05-29 14:23   ` Jan Beulich
  -1 siblings, 2 replies; 183+ messages in thread
From: Alexey Gerasimenko @ 2018-03-12 18:33 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, Ian Jackson, Alexey Gerasimenko, Jan Beulich, Wei Liu

Much like normal PCI BARs or other chipset-specific memory-mapped
resources, the MMCONFIG area needs space in the MMIO hole, so we must
allocate it manually.

The actual MMCONFIG size depends on the number of PCI buses which should
be covered by ECAM. Possible options are 64MB, 128MB and 256MB.
As we are currently limited to bus 0, the lowest possible setting (64MB)
is used, #defined via PCI_MAX_MCFG_BUSES in hvmloader/config.h.
When support for multiple PCI buses is implemented in Xen,
PCI_MAX_MCFG_BUSES may be changed to calculate the number of buses
from the results of PCI device enumeration.

The MMCONFIG range is allocated in the MMIO hole in a way similar to how
other PCI BARs are allocated. The patch extends the 'bars' structure to
make it universal for any arbitrary BAR type -- either IO, MMIO, ROM or
a chipset-specific resource.

One important new field is addr_mask, which tells which bits of the base
address can (should) be written. Different address types (ROM, MMIO BAR,
PCIEXBAR) will have different addr_mask values.
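
As a standalone illustration of how addr_mask interacts with BAR sizing
(a hedged sketch, not hvmloader code -- the helper name is made up):

```c
#include <stdint.h>
#include <assert.h>

/*
 * Classic PCI BAR sizing: write all-ones to the BAR, read it back,
 * mask off the non-address (flag) bits with addr_mask, and the BAR
 * size is the lowest set address bit.  The helper name is hypothetical.
 */
static uint64_t bar_size_from_readback(uint32_t readback, uint32_t addr_mask)
{
    uint64_t sz = readback & addr_mask;

    return sz & ~(sz - 1);      /* isolate the lowest set bit */
}
```

For example, a 1MB memory BAR reads back as 0xFFF00000 after the
all-ones write; masked with the memory addr_mask (~0xF) this yields a
size of 0x100000.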

For every assignable BAR range we store its size, the PCI device BDF
(devfn actually) to which it belongs, the BAR type (mem/io/mem64) and the
corresponding register offset in the device's PCI config space. This way
we can insert an MMCONFIG entry into the bars array in the same manner as
any other BAR. In this case, the devfn field will point to the MCH PCI
device and bar_reg will contain the PCIEXBAR register offset. It will
later be assigned a slot in the MMIO hole in the very same way as plain
PCI BARs, respecting its size alignment.
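
The descending-size insertion used for the bars array can be sketched as
follows (a standalone illustration; the struct and helper names are made
up, mirroring the memmove pattern in pci.c):

```c
#include <stdint.h>
#include <string.h>
#include <assert.h>

/* Simplified stand-in for hvmloader's bars entry. */
struct bar_entry {
    uint32_t devfn;
    uint64_t bar_sz;
};

/*
 * Insert a new entry while keeping the array sorted by descending
 * bar_sz, so larger (more alignment-demanding) ranges are placed first.
 */
static void insert_bar(struct bar_entry *bars, unsigned int *nr_bars,
                       uint32_t devfn, uint64_t bar_sz)
{
    unsigned int i;

    for ( i = 0; i < *nr_bars; i++ )
        if ( bars[i].bar_sz < bar_sz )
            break;

    if ( i != *nr_bars )
        memmove(&bars[i + 1], &bars[i], (*nr_bars - i) * sizeof(*bars));

    bars[i].devfn  = devfn;
    bars[i].bar_sz = bar_sz;
    (*nr_bars)++;
}
```

An MMCONFIG entry simply reuses this path with the MCH devfn and the
PCIEXBAR register offset.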

Also, to reduce code complexity, all long mem/mem64 BAR flag checks are
replaced by simple bars[i] field probing, e.g.:
-        if ( (bar_reg == PCI_ROM_ADDRESS) ||
-             ((bar_data & PCI_BASE_ADDRESS_SPACE) ==
-              PCI_BASE_ADDRESS_SPACE_MEMORY) )
+        if ( bars[i].is_mem )

Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
---
 tools/firmware/hvmloader/config.h   |   4 ++
 tools/firmware/hvmloader/pci.c      | 127 ++++++++++++++++++++++++++++--------
 tools/firmware/hvmloader/pci_regs.h |   2 +
 3 files changed, 106 insertions(+), 27 deletions(-)

diff --git a/tools/firmware/hvmloader/config.h b/tools/firmware/hvmloader/config.h
index 6fde6b7b60..5443ecd804 100644
--- a/tools/firmware/hvmloader/config.h
+++ b/tools/firmware/hvmloader/config.h
@@ -53,10 +53,14 @@ extern uint8_t ioapic_version;
 #define PCI_ISA_DEVFN       0x08    /* dev 1, fn 0 */
 #define PCI_ISA_IRQ_MASK    0x0c20U /* ISA IRQs 5,10,11 are PCI connected */
 #define PCI_ICH9_LPC_DEVFN  0xf8    /* dev 31, fn 0 */
+#define PCI_MCH_DEVFN       0       /* bus 0, dev 0, func 0 */
 
 /* MMIO hole: Hardcoded defaults, which can be dynamically expanded. */
 #define PCI_MEM_END         0xfc000000
 
+/* possible values are: 64, 128, 256 */
+#define PCI_MAX_MCFG_BUSES  64
+
 #define ACPI_TIS_HDR_ADDRESS 0xFED40F00UL
 
 extern unsigned long pci_mem_start, pci_mem_end;
diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c
index 033bd20992..6de124bbd5 100644
--- a/tools/firmware/hvmloader/pci.c
+++ b/tools/firmware/hvmloader/pci.c
@@ -158,9 +158,10 @@ static void class_specific_pci_device_setup(uint16_t vendor_id,
 
 void pci_setup(void)
 {
-    uint8_t is_64bar, using_64bar, bar64_relocate = 0;
+    uint8_t is_64bar, using_64bar, bar64_relocate = 0, is_mem;
     uint32_t devfn, bar_reg, cmd, bar_data, bar_data_upper;
     uint64_t base, bar_sz, bar_sz_upper, mmio_total = 0;
+    uint64_t addr_mask;
     uint16_t vendor_id, device_id;
     unsigned int bar, pin, link, isa_irq;
     int is_running_on_q35 = 0;
@@ -172,10 +173,14 @@ void pci_setup(void)
 
     /* Create a list of device BARs in descending order of size. */
     struct bars {
-        uint32_t is_64bar;
         uint32_t devfn;
         uint32_t bar_reg;
         uint64_t bar_sz;
+        uint64_t addr_mask; /* which bits of the base address can be written */
+        uint32_t bar_data;  /* initial value - BAR flags here */
+        uint8_t  is_64bar;
+        uint8_t  is_mem;
+        uint8_t  padding[2];
     } *bars = (struct bars *)scratch_start;
     unsigned int i, nr_bars = 0;
     uint64_t mmio_hole_size = 0;
@@ -259,13 +264,21 @@ void pci_setup(void)
                 bar_reg = PCI_ROM_ADDRESS;
 
             bar_data = pci_readl(devfn, bar_reg);
+
+            is_mem = !!(((bar_data & PCI_BASE_ADDRESS_SPACE) ==
+                       PCI_BASE_ADDRESS_SPACE_MEMORY) ||
+                       (bar_reg == PCI_ROM_ADDRESS));
+
             if ( bar_reg != PCI_ROM_ADDRESS )
             {
-                is_64bar = !!((bar_data & (PCI_BASE_ADDRESS_SPACE |
-                             PCI_BASE_ADDRESS_MEM_TYPE_MASK)) ==
-                             (PCI_BASE_ADDRESS_SPACE_MEMORY |
+                is_64bar = !!(is_mem &&
+                             ((bar_data & PCI_BASE_ADDRESS_MEM_TYPE_MASK) ==
                              PCI_BASE_ADDRESS_MEM_TYPE_64));
+
                 pci_writel(devfn, bar_reg, ~0);
+
+                addr_mask = is_mem ? PCI_BASE_ADDRESS_MEM_MASK
+                                   : PCI_BASE_ADDRESS_IO_MASK;
             }
             else
             {
@@ -273,28 +286,35 @@ void pci_setup(void)
                 pci_writel(devfn, bar_reg,
                            (bar_data | PCI_ROM_ADDRESS_MASK) &
                            ~PCI_ROM_ADDRESS_ENABLE);
+
+                addr_mask = PCI_ROM_ADDRESS_MASK;
             }
+
             bar_sz = pci_readl(devfn, bar_reg);
             pci_writel(devfn, bar_reg, bar_data);
 
             if ( bar_reg != PCI_ROM_ADDRESS )
-                bar_sz &= (((bar_data & PCI_BASE_ADDRESS_SPACE) ==
-                            PCI_BASE_ADDRESS_SPACE_MEMORY) ?
-                           PCI_BASE_ADDRESS_MEM_MASK :
-                           (PCI_BASE_ADDRESS_IO_MASK & 0xffff));
+                bar_sz &= is_mem ? PCI_BASE_ADDRESS_MEM_MASK :
+                                   (PCI_BASE_ADDRESS_IO_MASK & 0xffff);
             else
                 bar_sz &= PCI_ROM_ADDRESS_MASK;
-            if (is_64bar) {
+
+            if (is_64bar)
+            {
                 bar_data_upper = pci_readl(devfn, bar_reg + 4);
                 pci_writel(devfn, bar_reg + 4, ~0);
                 bar_sz_upper = pci_readl(devfn, bar_reg + 4);
                 pci_writel(devfn, bar_reg + 4, bar_data_upper);
                 bar_sz = (bar_sz_upper << 32) | bar_sz;
             }
+
             bar_sz &= ~(bar_sz - 1);
             if ( bar_sz == 0 )
                 continue;
 
+            /* leave only memtype/enable bits etc */
+            bar_data &= ~addr_mask;
+
             for ( i = 0; i < nr_bars; i++ )
                 if ( bars[i].bar_sz < bar_sz )
                     break;
@@ -302,14 +322,15 @@ void pci_setup(void)
             if ( i != nr_bars )
                 memmove(&bars[i+1], &bars[i], (nr_bars-i) * sizeof(*bars));
 
-            bars[i].is_64bar = is_64bar;
-            bars[i].devfn   = devfn;
-            bars[i].bar_reg = bar_reg;
-            bars[i].bar_sz  = bar_sz;
+            bars[i].is_64bar  = is_64bar;
+            bars[i].is_mem    = is_mem;
+            bars[i].devfn     = devfn;
+            bars[i].bar_reg   = bar_reg;
+            bars[i].bar_sz    = bar_sz;
+            bars[i].addr_mask = addr_mask;
+            bars[i].bar_data  = bar_data;
 
-            if ( ((bar_data & PCI_BASE_ADDRESS_SPACE) ==
-                  PCI_BASE_ADDRESS_SPACE_MEMORY) ||
-                 (bar_reg == PCI_ROM_ADDRESS) )
+            if ( is_mem )
                 mmio_total += bar_sz;
 
             nr_bars++;
@@ -339,6 +360,63 @@ void pci_setup(void)
         pci_writew(devfn, PCI_COMMAND, cmd);
     }
 
+    /*
+     *  Calculate MMCONFIG area size and squeeze it into the bars array
+     *  for assigning a slot in the MMIO hole
+     */
+    if (is_running_on_q35)
+    {
+        /* disable PCIEXBAR decoding for now */
+        pci_writel(PCI_MCH_DEVFN, PCI_MCH_PCIEXBAR, 0);
+        pci_writel(PCI_MCH_DEVFN, PCI_MCH_PCIEXBAR + 4, 0);
+
+#define PCIEXBAR_64_BUSES    (2 << 1)
+#define PCIEXBAR_128_BUSES   (1 << 1)
+#define PCIEXBAR_256_BUSES   (0 << 1)
+#define PCIEXBAR_ENABLE      (1 << 0)
+
+        switch (PCI_MAX_MCFG_BUSES)
+        {
+        case 64:
+            bar_data = PCIEXBAR_64_BUSES | PCIEXBAR_ENABLE;
+            bar_sz = MB(64);
+            break;
+
+        case 128:
+            bar_data = PCIEXBAR_128_BUSES | PCIEXBAR_ENABLE;
+            bar_sz = MB(128);
+            break;
+
+        case 256:
+            bar_data = PCIEXBAR_256_BUSES | PCIEXBAR_ENABLE;
+            bar_sz = MB(256);
+            break;
+
+        default:
+            /* unsupported number of buses specified */
+            BUG();
+        }
+
+        addr_mask = ~(bar_sz - 1);
+
+        for ( i = 0; i < nr_bars; i++ )
+            if ( bars[i].bar_sz < bar_sz )
+                break;
+
+        if ( i != nr_bars )
+            memmove(&bars[i+1], &bars[i], (nr_bars-i) * sizeof(*bars));
+
+        bars[i].is_mem    = 1;
+        bars[i].devfn     = PCI_MCH_DEVFN;
+        bars[i].bar_reg   = PCI_MCH_PCIEXBAR;
+        bars[i].bar_sz    = bar_sz;
+        bars[i].addr_mask = addr_mask;
+        bars[i].bar_data  = bar_data;
+
+        mmio_total += bar_sz;
+        nr_bars++;
+    }
+
     if ( mmio_hole_size )
     {
         uint64_t max_ram_below_4g = GB(4) - mmio_hole_size;
@@ -473,10 +551,10 @@ void pci_setup(void)
          */
         using_64bar = bars[i].is_64bar && bar64_relocate
             && (mmio_total > (mem_resource.max - mem_resource.base));
-        bar_data = pci_readl(devfn, bar_reg);
 
-        if ( (bar_data & PCI_BASE_ADDRESS_SPACE) ==
-             PCI_BASE_ADDRESS_SPACE_MEMORY )
+        bar_data = bars[i].bar_data;
+
+        if ( bars[i].is_mem )
         {
             /* Mapping high memory if PCI device is 64 bits bar */
             if ( using_64bar ) {
@@ -486,18 +564,15 @@ void pci_setup(void)
                 if ( !pci_hi_mem_start )
                     pci_hi_mem_start = high_mem_resource.base;
                 resource = &high_mem_resource;
-                bar_data &= ~PCI_BASE_ADDRESS_MEM_MASK;
             } 
             else {
                 resource = &mem_resource;
-                bar_data &= ~PCI_BASE_ADDRESS_MEM_MASK;
             }
             mmio_total -= bar_sz;
         }
         else
         {
             resource = &io_resource;
-            bar_data &= ~PCI_BASE_ADDRESS_IO_MASK;
         }
 
         base = (resource->base  + bar_sz - 1) & ~(uint64_t)(bar_sz - 1);
@@ -519,7 +594,7 @@ void pci_setup(void)
             }
         }
 
-        bar_data |= (uint32_t)base;
+        bar_data |= (uint32_t) (base & bars[i].addr_mask);
         bar_data_upper = (uint32_t)(base >> 32);
         base += bar_sz;
 
@@ -544,9 +619,7 @@ void pci_setup(void)
 
         /* Now enable the memory or I/O mapping. */
         cmd = pci_readw(devfn, PCI_COMMAND);
-        if ( (bar_reg == PCI_ROM_ADDRESS) ||
-             ((bar_data & PCI_BASE_ADDRESS_SPACE) ==
-              PCI_BASE_ADDRESS_SPACE_MEMORY) )
+        if ( bars[i].is_mem )
             cmd |= PCI_COMMAND_MEMORY;
         else
             cmd |= PCI_COMMAND_IO;
diff --git a/tools/firmware/hvmloader/pci_regs.h b/tools/firmware/hvmloader/pci_regs.h
index ba498b840e..4f1c6d0800 100644
--- a/tools/firmware/hvmloader/pci_regs.h
+++ b/tools/firmware/hvmloader/pci_regs.h
@@ -111,6 +111,8 @@
 #define PCI_DEVICE_ID_INTEL_82441        0x1237
 #define PCI_DEVICE_ID_INTEL_Q35_MCH      0x29c0
 
+#define PCI_MCH_PCIEXBAR                 0x60
+
 
 #endif /* __HVMLOADER_PCI_REGS_H__ */
 
-- 
2.11.0



* [RFC PATCH 08/12] libxl: Q35 support (new option device_model_machine)
  2018-03-12 18:33 ` Alexey Gerasimenko
                   ` (7 preceding siblings ...)
  (?)
@ 2018-03-12 18:33 ` Alexey Gerasimenko
  2018-03-13 17:25   ` Wei Liu
  2018-03-19 17:01   ` Roger Pau Monné
  -1 siblings, 2 replies; 183+ messages in thread
From: Alexey Gerasimenko @ 2018-03-12 18:33 UTC (permalink / raw)
  To: xen-devel; +Cc: Wei Liu, Ian Jackson, Alexey Gerasimenko

Provide a new domain config option, device_model_machine, to select the
emulated machine type. It has the following possible values:
- "i440" - i440 emulation (default)
- "q35" - emulate a Q35 machine. By default, the storage interface is AHCI.

Note that omitting the device_model_machine parameter selects the i440
system by default, so the default behavior doesn't change for existing
domain config files.

Setting device_model_machine to "q35" passes the '-machine q35,accel=xen'
argument to QEMU. Unlike i440, there is no separate machine type
to enable/disable the Xen platform device; it is controlled via a machine
property only. See the 'libxl: Xen Platform device support for Q35' patch
for a detailed description.

Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
---
 tools/libxl/libxl_dm.c      | 16 ++++++++++------
 tools/libxl/libxl_types.idl |  7 +++++++
 tools/xl/xl_parse.c         | 14 ++++++++++++++
 3 files changed, 31 insertions(+), 6 deletions(-)

diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
index a3cddce8b7..7b531050c7 100644
--- a/tools/libxl/libxl_dm.c
+++ b/tools/libxl/libxl_dm.c
@@ -1443,13 +1443,17 @@ static int libxl__build_device_model_args_new(libxl__gc *gc,
             flexarray_append(dm_args, b_info->extra_pv[i]);
         break;
     case LIBXL_DOMAIN_TYPE_HVM:
-        if (!libxl_defbool_val(b_info->u.hvm.xen_platform_pci)) {
-            /* Switching here to the machine "pc" which does not add
-             * the xen-platform device instead of the default "xenfv" machine.
-             */
-            machinearg = libxl__strdup(gc, "pc,accel=xen");
+        if (b_info->device_model_machine == LIBXL_DEVICE_MODEL_MACHINE_Q35) {
+            machinearg = libxl__sprintf(gc, "q35,accel=xen");
         } else {
-            machinearg = libxl__strdup(gc, "xenfv");
+            if (!libxl_defbool_val(b_info->u.hvm.xen_platform_pci)) {
+                /* Switching here to the machine "pc" which does not add
+                 * the xen-platform device instead of the default "xenfv" machine.
+                 */
+                machinearg = libxl__strdup(gc, "pc,accel=xen");
+            } else {
+                machinearg = libxl__strdup(gc, "xenfv");
+            }
         }
         if (b_info->u.hvm.mmio_hole_memkb) {
             uint64_t max_ram_below_4g = (1ULL << 32) -
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 35038120ca..f3ef3cbdde 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -101,6 +101,12 @@ libxl_device_model_version = Enumeration("device_model_version", [
     (2, "QEMU_XEN"),             # Upstream based qemu-xen device model
     ])
 
+libxl_device_model_machine = Enumeration("device_model_machine", [
+    (0, "UNKNOWN"),
+    (1, "I440"),
+    (2, "Q35"),
+    ])
+
 libxl_console_type = Enumeration("console_type", [
     (0, "UNKNOWN"),
     (1, "SERIAL"),
@@ -491,6 +497,7 @@ libxl_domain_build_info = Struct("domain_build_info",[
     ("device_model_ssid_label", string),
     # device_model_user is not ready for use yet
     ("device_model_user", string),
+    ("device_model_machine", libxl_device_model_machine),
 
     # extra parameters pass directly to qemu, NULL terminated
     ("extra",            libxl_string_list),
diff --git a/tools/xl/xl_parse.c b/tools/xl/xl_parse.c
index f6842540ca..a7506a426b 100644
--- a/tools/xl/xl_parse.c
+++ b/tools/xl/xl_parse.c
@@ -2110,6 +2110,20 @@ skip_usbdev:
     xlu_cfg_replace_string(config, "device_model_user",
                            &b_info->device_model_user, 0);
 
+    if (!xlu_cfg_get_string (config, "device_model_machine", &buf, 0)) {
+        if (!strcmp(buf, "i440")) {
+            b_info->device_model_machine = LIBXL_DEVICE_MODEL_MACHINE_I440;
+        } else if (!strcmp(buf, "q35")) {
+            b_info->device_model_machine = LIBXL_DEVICE_MODEL_MACHINE_Q35;
+        } else {
+            fprintf(stderr,
+                    "Unknown device_model_machine \"%s\" specified\n", buf);
+            exit(1);
+        }
+    } else {
+        b_info->device_model_machine = LIBXL_DEVICE_MODEL_MACHINE_UNKNOWN;
+    }
+
 #define parse_extra_args(type)                                            \
     e = xlu_cfg_get_list_as_string_list(config, "device_model_args"#type, \
                                     &b_info->extra##type, 0);            \
-- 
2.11.0



* [RFC PATCH 09/12] libxl: Xen Platform device support for Q35
  2018-03-12 18:33 ` Alexey Gerasimenko
                   ` (8 preceding siblings ...)
  (?)
@ 2018-03-12 18:33 ` Alexey Gerasimenko
  2018-03-19 15:05   ` Alexey G
  -1 siblings, 1 reply; 183+ messages in thread
From: Alexey Gerasimenko @ 2018-03-12 18:33 UTC (permalink / raw)
  To: xen-devel; +Cc: Wei Liu, Ian Jackson, Alexey Gerasimenko

The current Xen/QEMU method to control the Xen Platform device is a bit
odd -- changing the 'xen_platform_device' option value actually modifies
the QEMU emulated machine type, namely xenfv <--> pc.

In order to avoid multiplying machine types, use the new way to control
the Xen Platform device in QEMU -- the xen-platform-dev property. To
maintain backward compatibility with existing Xen/QEMU setups, this is
currently applicable to the q35 machine only. i440 emulation uses the old
method (xenfv/pc machine) to control the Xen Platform device; this may be
changed later to the xen-platform-dev property as well.

Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
---
 tools/libxl/libxl_dm.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
index 7b531050c7..586035aa73 100644
--- a/tools/libxl/libxl_dm.c
+++ b/tools/libxl/libxl_dm.c
@@ -1444,7 +1444,11 @@ static int libxl__build_device_model_args_new(libxl__gc *gc,
         break;
     case LIBXL_DOMAIN_TYPE_HVM:
         if (b_info->device_model_machine == LIBXL_DEVICE_MODEL_MACHINE_Q35) {
-            machinearg = libxl__sprintf(gc, "q35,accel=xen");
+            if (!libxl_defbool_val(b_info->u.hvm.xen_platform_pci)) {
+                machinearg = libxl__sprintf(gc, "q35,accel=xen");
+            } else {
+                machinearg = libxl__sprintf(gc, "q35,accel=xen,xen-platform-dev=on");
+            }
         } else {
             if (!libxl_defbool_val(b_info->u.hvm.xen_platform_pci)) {
                 /* Switching here to the machine "pc" which does not add
-- 
2.11.0



* [RFC PATCH 10/12] libacpi: build ACPI MCFG table if requested
  2018-03-12 18:33 ` Alexey Gerasimenko
                   ` (9 preceding siblings ...)
  (?)
@ 2018-03-12 18:33 ` Alexey Gerasimenko
  2018-03-19 17:33   ` Roger Pau Monné
  2018-05-29 14:36   ` Jan Beulich
  -1 siblings, 2 replies; 183+ messages in thread
From: Alexey Gerasimenko @ 2018-03-12 18:33 UTC (permalink / raw)
  To: xen-devel; +Cc: Wei Liu, Ian Jackson, Alexey Gerasimenko, Jan Beulich

This adds a construct_mcfg() function to libacpi which allows building an
MCFG table for a given mmconfig_addr/mmconfig_len pair if the
ACPI_HAS_MCFG flag was specified in the acpi_config struct.

The maximum bus number is calculated from mmconfig_len using the
MCFG_SIZE_TO_NUM_BUSES macro (1MB of MMIO space per bus).
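
The 1MB-per-bus figure follows from the ECAM layout, where each
function's 4KB configuration space sits at a fixed offset from the
MMCONFIG base (a hedged sketch; the helper name is made up):

```c
#include <stdint.h>
#include <assert.h>

/*
 * ECAM maps a function's config space at:
 *   base + (bus << 20) + (dev << 15) + (fn << 12) + reg
 * i.e. 1MB per bus, which is why MCFG_SIZE_TO_NUM_BUSES(size)
 * is simply (size >> 20).  Helper name is hypothetical.
 */
static uint64_t ecam_cfg_addr(uint64_t mmconfig_base,
                              uint8_t bus, uint8_t dev,
                              uint8_t fn, uint16_t reg)
{
    assert(dev < 32 && fn < 8 && reg < 4096);

    return mmconfig_base
         + ((uint64_t)bus << 20)
         + ((uint64_t)dev << 15)
         + ((uint64_t)fn  << 12)
         + reg;
}
```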

Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
---
 tools/libacpi/acpi2_0.h | 21 +++++++++++++++++++++
 tools/libacpi/build.c   | 42 ++++++++++++++++++++++++++++++++++++++++++
 tools/libacpi/libacpi.h |  4 ++++
 3 files changed, 67 insertions(+)

diff --git a/tools/libacpi/acpi2_0.h b/tools/libacpi/acpi2_0.h
index 2619ba32db..209ad1acd3 100644
--- a/tools/libacpi/acpi2_0.h
+++ b/tools/libacpi/acpi2_0.h
@@ -422,6 +422,25 @@ struct acpi_20_slit {
 };
 
 /*
+ * PCI Express Memory Mapped Configuration Description Table
+ */
+struct mcfg_range_entry {
+    uint64_t base_address;
+    uint16_t pci_segment;
+    uint8_t  start_pci_bus_num;
+    uint8_t  end_pci_bus_num;
+    uint32_t reserved;
+};
+
+struct acpi_mcfg {
+    struct acpi_header header;
+    uint8_t reserved[8];
+    struct mcfg_range_entry entries[1];
+};
+
+#define MCFG_SIZE_TO_NUM_BUSES(size)  ((size) >> 20)
+
+/*
  * Table Signatures.
  */
 #define ACPI_2_0_RSDP_SIGNATURE ASCII64('R','S','D',' ','P','T','R',' ')
@@ -435,6 +454,7 @@ struct acpi_20_slit {
 #define ACPI_2_0_WAET_SIGNATURE ASCII32('W','A','E','T')
 #define ACPI_2_0_SRAT_SIGNATURE ASCII32('S','R','A','T')
 #define ACPI_2_0_SLIT_SIGNATURE ASCII32('S','L','I','T')
+#define ACPI_MCFG_SIGNATURE     ASCII32('M','C','F','G')
 
 /*
  * Table revision numbers.
@@ -449,6 +469,7 @@ struct acpi_20_slit {
 #define ACPI_1_0_FADT_REVISION 0x01
 #define ACPI_2_0_SRAT_REVISION 0x01
 #define ACPI_2_0_SLIT_REVISION 0x01
+#define ACPI_1_0_MCFG_REVISION 0x01
 
 #pragma pack ()
 
diff --git a/tools/libacpi/build.c b/tools/libacpi/build.c
index f9881c9604..5daf1fc5b8 100644
--- a/tools/libacpi/build.c
+++ b/tools/libacpi/build.c
@@ -303,6 +303,37 @@ static struct acpi_20_slit *construct_slit(struct acpi_ctxt *ctxt,
     return slit;
 }
 
+static struct acpi_mcfg *construct_mcfg(struct acpi_ctxt *ctxt,
+                                        const struct acpi_config *config)
+{
+    struct acpi_mcfg *mcfg;
+
+    /* Warning: this code expects that we have only one PCI segment */
+    mcfg = ctxt->mem_ops.alloc(ctxt, sizeof(*mcfg), 16);
+    if (!mcfg)
+        return NULL;
+
+    memset(mcfg, 0, sizeof(*mcfg));
+    mcfg->header.signature    = ACPI_MCFG_SIGNATURE;
+    mcfg->header.revision     = ACPI_1_0_MCFG_REVISION;
+    fixed_strcpy(mcfg->header.oem_id, ACPI_OEM_ID);
+    fixed_strcpy(mcfg->header.oem_table_id, ACPI_OEM_TABLE_ID);
+    mcfg->header.oem_revision = ACPI_OEM_REVISION;
+    mcfg->header.creator_id   = ACPI_CREATOR_ID;
+    mcfg->header.creator_revision = ACPI_CREATOR_REVISION;
+    mcfg->header.length = sizeof(*mcfg);
+
+    mcfg->entries[0].base_address = config->mmconfig_addr;
+    mcfg->entries[0].pci_segment = 0;
+    mcfg->entries[0].start_pci_bus_num = 0;
+    mcfg->entries[0].end_pci_bus_num =
+        MCFG_SIZE_TO_NUM_BUSES(config->mmconfig_len) - 1;
+
+    set_checksum(mcfg, offsetof(struct acpi_header, checksum), sizeof(*mcfg));
+
+    return mcfg;
+}
+
 static int construct_passthrough_tables(struct acpi_ctxt *ctxt,
                                         unsigned long *table_ptrs,
                                         int nr_tables,
@@ -350,6 +381,7 @@ static int construct_secondary_tables(struct acpi_ctxt *ctxt,
     struct acpi_20_hpet *hpet;
     struct acpi_20_waet *waet;
     struct acpi_20_tcpa *tcpa;
+    struct acpi_mcfg *mcfg;
     unsigned char *ssdt;
     static const uint16_t tis_signature[] = {0x0001, 0x0001, 0x0001};
     void *lasa;
@@ -417,6 +449,16 @@ static int construct_secondary_tables(struct acpi_ctxt *ctxt,
         printf("CONV disabled\n");
     }
 
+    /* MCFG */
+    if ( config->table_flags & ACPI_HAS_MCFG )
+    {
+        mcfg = construct_mcfg(ctxt, config);
+        if (!mcfg)
+            return -1;
+
+        table_ptrs[nr_tables++] = ctxt->mem_ops.v2p(ctxt, mcfg);
+    }
+
     /* TPM TCPA and SSDT. */
     if ( (config->table_flags & ACPI_HAS_TCPA) &&
          (config->tis_hdr[0] == tis_signature[0]) &&
diff --git a/tools/libacpi/libacpi.h b/tools/libacpi/libacpi.h
index a2efd23b0b..dd85b928e9 100644
--- a/tools/libacpi/libacpi.h
+++ b/tools/libacpi/libacpi.h
@@ -36,6 +36,7 @@
 #define ACPI_HAS_8042              (1<<13)
 #define ACPI_HAS_CMOS_RTC          (1<<14)
 #define ACPI_HAS_SSDT_LAPTOP_SLATE (1<<15)
+#define ACPI_HAS_MCFG              (1<<16)
 
 struct xen_vmemrange;
 struct acpi_numa {
@@ -96,6 +97,9 @@ struct acpi_config {
     uint32_t ioapic_base_address;
     uint16_t pci_isa_irq_mask;
     uint8_t ioapic_id;
+
+    uint64_t mmconfig_addr;
+    uint32_t mmconfig_len;
 };
 
 int acpi_build_tables(struct acpi_ctxt *ctxt, struct acpi_config *config);
-- 
2.11.0



* [RFC PATCH 11/12] hvmloader: use libacpi to build MCFG table
  2018-03-12 18:33 ` Alexey Gerasimenko
                   ` (10 preceding siblings ...)
  (?)
@ 2018-03-12 18:33 ` Alexey Gerasimenko
  2018-03-14 17:48   ` Alexey G
                     ` (2 more replies)
  -1 siblings, 3 replies; 183+ messages in thread
From: Alexey Gerasimenko @ 2018-03-12 18:33 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, Ian Jackson, Alexey Gerasimenko, Jan Beulich, Wei Liu

This patch extends hvmloader_acpi_build_tables() with code which detects
whether MMCONFIG is available -- i.e. initialized and enabled (and we're
running on Q35) -- obtains its base address and size, and asks libacpi to
build an MCFG table for it by setting the ACPI_HAS_MCFG flag, in a manner
similar to other optional ACPI table building.
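
The PCIEXBAR decode used for this detection can be sketched in isolation
(a hedged illustration; the helper name is made up, the bit layout
follows the register description used in the patch):

```c
#include <stdint.h>
#include <assert.h>

/*
 * PCIEXBAR bits of interest: bit 0 = enable, bits 2:1 = window length
 * (0 = 256MB, 1 = 128MB, 2 = 64MB, 3 = reserved).
 */
static uint32_t pciexbar_window_size(uint32_t reg)
{
    switch ( (reg >> 1) & 3 )
    {
    case 0: return 256u << 20;
    case 1: return 128u << 20;
    case 2: return  64u << 20;
    default: return 0;          /* reserved encoding */
    }
}
```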

Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
---
 tools/firmware/hvmloader/util.c | 70 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 70 insertions(+)

diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c
index d8db9e3c8e..c6fc81d52a 100644
--- a/tools/firmware/hvmloader/util.c
+++ b/tools/firmware/hvmloader/util.c
@@ -782,6 +782,69 @@ int get_pc_machine_type(void)
     return machine_type;
 }
 
+#define PCIEXBAR_ADDR_MASK_64MB     (~((1ULL << 26) - 1))
+#define PCIEXBAR_ADDR_MASK_128MB    (~((1ULL << 27) - 1))
+#define PCIEXBAR_ADDR_MASK_256MB    (~((1ULL << 28) - 1))
+#define PCIEXBAR_LENGTH_BITS(reg)   (((reg) >> 1) & 3)
+#define PCIEXBAREN                  1
+
+static uint64_t mmconfig_get_base(void)
+{
+    uint64_t base;
+    uint32_t reg = pci_readl(PCI_MCH_DEVFN, PCI_MCH_PCIEXBAR);
+
+    base = reg | (uint64_t) pci_readl(PCI_MCH_DEVFN, PCI_MCH_PCIEXBAR+4) << 32;
+
+    switch (PCIEXBAR_LENGTH_BITS(reg))
+    {
+    case 0:
+        base &= PCIEXBAR_ADDR_MASK_256MB;
+        break;
+    case 1:
+        base &= PCIEXBAR_ADDR_MASK_128MB;
+        break;
+    case 2:
+        base &= PCIEXBAR_ADDR_MASK_64MB;
+        break;
+    case 3:
+        BUG();  /* a reserved value encountered */
+    }
+
+    return base;
+}
+
+static uint32_t mmconfig_get_size(void)
+{
+    uint32_t reg = pci_readl(PCI_MCH_DEVFN, PCI_MCH_PCIEXBAR);
+
+    switch (PCIEXBAR_LENGTH_BITS(reg))
+    {
+    case 0: return MB(256);
+    case 1: return MB(128);
+    case 2: return MB(64);
+    case 3:
+        BUG();  /* a reserved value encountered */
+    }
+
+    return 0;
+}
+
+static uint32_t mmconfig_is_enabled(void)
+{
+    return pci_readl(PCI_MCH_DEVFN, PCI_MCH_PCIEXBAR) & PCIEXBAREN;
+}
+
+static int is_mmconfig_used(void)
+{
+    if (get_pc_machine_type() == MACHINE_TYPE_Q35)
+    {
+        if (mmconfig_is_enabled() && mmconfig_get_base())
+            return 1;
+    }
+
+    return 0;
+}
+
 static void validate_hvm_info(struct hvm_info_table *t)
 {
     uint8_t *ptr = (uint8_t *)t;
@@ -993,6 +1056,13 @@ void hvmloader_acpi_build_tables(struct acpi_config *config,
         config->pci_hi_len = pci_hi_mem_end - pci_hi_mem_start;
     }
 
+    if ( is_mmconfig_used() )
+    {
+        config->table_flags |= ACPI_HAS_MCFG;
+        config->mmconfig_addr = mmconfig_get_base();
+        config->mmconfig_len  = mmconfig_get_size();
+    }
+
     s = xenstore_read("platform/generation-id", "0:0");
     if ( s )
     {
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 183+ messages in thread

* [RFC PATCH 12/12] docs: provide description for device_model_machine option
  2018-03-12 18:33 ` Alexey Gerasimenko
                   ` (11 preceding siblings ...)
  (?)
@ 2018-03-12 18:33 ` Alexey Gerasimenko
  -1 siblings, 0 replies; 183+ messages in thread
From: Alexey Gerasimenko @ 2018-03-12 18:33 UTC (permalink / raw)
  To: xen-devel; +Cc: Wei Liu, Ian Jackson, Alexey Gerasimenko

This patch adds a description of the 'device_model_machine' option, which
controls which chipset the device model emulates.

Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
---
 docs/man/xl.cfg.pod.5.in | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/docs/man/xl.cfg.pod.5.in b/docs/man/xl.cfg.pod.5.in
index a699367779..7b8991ab7d 100644
--- a/docs/man/xl.cfg.pod.5.in
+++ b/docs/man/xl.cfg.pod.5.in
@@ -2484,6 +2484,33 @@ you have existing guests then, depending on the nature of the guest
 Operating System, you may wish to force them to use the device
 model which they were installed with.
 
+=item B<device_model_machine="STRING">
+
+Selects which chipset the device model should emulate for this
+guest.
+
+Valid options are:
+
+=over 4
+
+=item B<"i440">
+
+Use i440 emulation (the default setting).
+
+=item B<"q35">
+
+Use Q35/ICH9 emulation. This enables additional features for
+PCIe device passthrough.
+
+=back
+
+Note that omitting the device_model_machine parameter selects the i440
+system by default, so the default behavior doesn't change for existing
+domain config files.
+
+It is recommended to install the guest OS from scratch to avoid issues
+due to the emulated platform change.
+
 =item B<device_model_override="PATH">
 
 Override the path to the binary to be used as the device-model. The
-- 
2.11.0
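
For illustration, a minimal xl HVM guest config fragment using the new
option might look like this (the domain name and memory size are
hypothetical):

```
# hypothetical HVM guest config fragment
name = "q35-guest"
builder = "hvm"
memory = 2048
device_model_version = "qemu-xen"
device_model_machine = "q35"
```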




* [Qemu-devel] [RFC PATCH 13/30] pc/xen: Xen Q35 support: provide IRQ handling for PCI devices
  2018-03-12 18:33 ` Alexey Gerasimenko
@ 2018-03-12 18:33   ` Alexey Gerasimenko
  -1 siblings, 0 replies; 183+ messages in thread
From: Alexey Gerasimenko @ 2018-03-12 18:33 UTC (permalink / raw)
  To: xen-devel
  Cc: Alexey Gerasimenko, qemu-devel, Michael S. Tsirkin,
	Marcel Apfelbaum, Paolo Bonzini, Richard Henderson,
	Eduardo Habkost, Stefano Stabellini, Anthony Perard

The primary difference in PCI device IRQ management between Xen HVM and
QEMU is that Xen PCI IRQs are "device-centric" while QEMU PCI IRQs are
"chipset-centric". Namely, Xen uses the PCI device BDF and INTx pin as
coordinates to assert an IRQ, while QEMU works out which chipset PIRQ the
IRQ is routed to through the hierarchy of PCI buses and manages IRQ
assertion on the chipset side (as PIRQ inputs).

Two callback functions are used for this purpose: .map_irq and .set_irq
(named after the corresponding structure fields). The corresponding
Xen-specific callbacks are xen_pci_slot_get_pirq() and xen_piix3_set_irq().
In the Xen case these functions do not operate on PIRQ pin numbers.
Instead, they use a specific value to pass BDF/INTx information between
.map_irq and .set_irq -- the PCI device devfn and INTx pin number are
combined into a pseudo-PIRQ in xen_pci_slot_get_pirq(), which
xen_piix3_set_irq() later decodes back into devfn and INTx number for
passing to the *set_pci_intx_level() call.

For Xen on Q35 this scheme is still applicable, except that the function
names are now non-descriptive and need to be renamed to reflect their
common i440/Q35 nature. The proposed new names are:

xen_pci_slot_get_pirq --> xen_cmn_pci_slot_get_pirq
xen_piix3_set_irq     --> xen_cmn_set_irq

Another IRQ-related difference between i440 and Q35 is the number of PIRQ
inputs and PIRQ routers (PCI IRQ links in terms of ACPI) available. i440
has 4 PCI interrupt links, while Q35 has 8 (PIRQA...PIRQH).
Currently Xen has support for only 4 PCI links, so we describe only 4 of
the 8 PCI links in the ACPI tables. Also, hvmloader disables PIRQ routing
for PIRQE..PIRQH by writing 80h into the corresponding PIRQ[n]_ROUT
registers.

All this PCI interrupt routing machinery is largely a legacy of the PIC
era. It's hardly worth extending the number of supported PCI links, as we
normally deal with APIC mode and/or MSI interrupts.

The only useful thing to do with PIRQE..PIRQH routing currently is to
check whether the guest actually attempts to use it for some reason
(despite the ACPI PCI routing information provided). In that case, a
warning is logged.

Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
---
 hw/i386/pc_q35.c       | 13 ++++++++++---
 hw/i386/xen/xen-hvm.c  | 32 +++++++++++++++++++++++++++++---
 hw/isa/lpc_ich9.c      |  4 ++++
 hw/pci-host/piix.c     |  2 +-
 include/hw/i386/ich9.h |  1 +
 include/hw/xen/xen.h   |  5 +++--
 stubs/xen-hvm.c        |  8 ++++++--
 7 files changed, 54 insertions(+), 11 deletions(-)

diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index 0c0bc48137..0db670f6d7 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -203,9 +203,16 @@ static void pc_q35_init(MachineState *machine)
     for (i = 0; i < GSI_NUM_PINS; i++) {
         qdev_connect_gpio_out_named(lpc_dev, ICH9_GPIO_GSI, i, pcms->gsi[i]);
     }
-    pci_bus_irqs(host_bus, ich9_lpc_set_irq, ich9_lpc_map_irq, ich9_lpc,
-                 ICH9_LPC_NB_PIRQS);
-    pci_bus_set_route_irq_fn(host_bus, ich9_route_intx_pin_to_irq);
+
+    if (xen_enabled()) {
+        pci_bus_irqs(host_bus, xen_cmn_set_irq, xen_cmn_pci_slot_get_pirq,
+                     ich9_lpc, ICH9_XEN_NUM_IRQ_SOURCES);
+    } else {
+        pci_bus_irqs(host_bus, ich9_lpc_set_irq, ich9_lpc_map_irq, ich9_lpc,
+                     ICH9_LPC_NB_PIRQS);
+        pci_bus_set_route_irq_fn(host_bus, ich9_route_intx_pin_to_irq);
+    }
+
     isa_bus = ich9_lpc->isa_bus;
 
     if (kvm_pic_in_kernel()) {
diff --git a/hw/i386/xen/xen-hvm.c b/hw/i386/xen/xen-hvm.c
index f24b7d4923..40a5c13fa6 100644
--- a/hw/i386/xen/xen-hvm.c
+++ b/hw/i386/xen/xen-hvm.c
@@ -13,6 +13,7 @@
 #include "cpu.h"
 #include "hw/pci/pci.h"
 #include "hw/i386/pc.h"
+#include "hw/i386/ich9.h"
 #include "hw/i386/apic-msidef.h"
 #include "hw/xen/xen_common.h"
 #include "hw/xen/xen_backend.h"
@@ -115,14 +116,14 @@ typedef struct XenIOState {
     Notifier wakeup;
 } XenIOState;
 
-/* Xen specific function for piix pci */
+/* Xen-specific functions for pci dev IRQ handling */
 
-int xen_pci_slot_get_pirq(PCIDevice *pci_dev, int irq_num)
+int xen_cmn_pci_slot_get_pirq(PCIDevice *pci_dev, int irq_num)
 {
     return irq_num + ((pci_dev->devfn >> 3) << 2);
 }
 
-void xen_piix3_set_irq(void *opaque, int irq_num, int level)
+void xen_cmn_set_irq(void *opaque, int irq_num, int level)
 {
     xen_set_pci_intx_level(xen_domid, 0, 0, irq_num >> 2,
                            irq_num & 3, level);
@@ -145,6 +146,31 @@ void xen_piix_pci_write_config_client(uint32_t address, uint32_t val, int len)
     }
 }
 
+void xen_ich9_pci_write_config_client(uint32_t address, uint32_t val, int len)
+{
+    static bool pirqe_f_warned = false;
+
+    if (ranges_overlap(address, len, ICH9_LPC_PIRQA_ROUT, 4)) {
+        /* handle PIRQA..PIRQD routing */
+        xen_piix_pci_write_config_client(address, val, len);
+    } else if (ranges_overlap(address, len, ICH9_LPC_PIRQE_ROUT, 4)) {
+        while (len--) {
+            if (range_covers_byte(ICH9_LPC_PIRQE_ROUT, 4, address) &&
+                (val & 0x80) == 0) {
+                /* print warning only once */
+                if (!pirqe_f_warned) {
+                    pirqe_f_warned = true;
+                    fprintf(stderr, "WARNING: guest domain attempted to use PIRQ%c "
+                            "routing which is not supported for Xen/Q35 currently\n",
+                            (char)(address - ICH9_LPC_PIRQE_ROUT + 'E'));
+                    break;
+                }
+            }
+            address++, val >>= 8;
+        }
+    }
+}
+
 int xen_is_pirq_msi(uint32_t msi_data)
 {
     /* If vector is 0, the msi is remapped into a pirq, passed as
diff --git a/hw/isa/lpc_ich9.c b/hw/isa/lpc_ich9.c
index e692b9fdc1..b17ac82ed6 100644
--- a/hw/isa/lpc_ich9.c
+++ b/hw/isa/lpc_ich9.c
@@ -49,6 +49,7 @@
 #include "qom/cpu.h"
 #include "hw/nvram/fw_cfg.h"
 #include "qemu/cutils.h"
+#include "hw/xen/xen.h"
 
 /*****************************************************************************/
 /* ICH9 LPC PCI to ISA bridge */
@@ -514,6 +515,9 @@ static void ich9_lpc_config_write(PCIDevice *d,
     ICH9LPCState *lpc = ICH9_LPC_DEVICE(d);
     uint32_t rcba_old = pci_get_long(d->config + ICH9_LPC_RCBA);
 
+    if (xen_enabled()) {
+        xen_ich9_pci_write_config_client(addr, val, len);
+    }
     pci_default_write_config(d, addr, val, len);
     if (ranges_overlap(addr, len, ICH9_LPC_PMBASE, 4) ||
         ranges_overlap(addr, len, ICH9_LPC_ACPI_CTRL, 1)) {
diff --git a/hw/pci-host/piix.c b/hw/pci-host/piix.c
index 0e608347c1..2627c06fae 100644
--- a/hw/pci-host/piix.c
+++ b/hw/pci-host/piix.c
@@ -415,7 +415,7 @@ PCIBus *i440fx_init(const char *host_type, const char *pci_type,
         PCIDevice *pci_dev = pci_create_simple_multifunction(b,
                              -1, true, "PIIX3-xen");
         piix3 = PIIX3_PCI_DEVICE(pci_dev);
-        pci_bus_irqs(b, xen_piix3_set_irq, xen_pci_slot_get_pirq,
+        pci_bus_irqs(b, xen_cmn_set_irq, xen_cmn_pci_slot_get_pirq,
                 piix3, XEN_PIIX_NUM_PIRQS);
     } else {
         PCIDevice *pci_dev = pci_create_simple_multifunction(b,
diff --git a/include/hw/i386/ich9.h b/include/hw/i386/ich9.h
index 673d13d28f..3dc42fcbce 100644
--- a/include/hw/i386/ich9.h
+++ b/include/hw/i386/ich9.h
@@ -143,6 +143,7 @@ Object *ich9_lpc_find(void);
 
 #define ICH9_A2_LPC_REVISION                    0x2
 #define ICH9_LPC_NB_PIRQS                       8       /* PCI A-H */
+#define ICH9_XEN_NUM_IRQ_SOURCES                128
 
 #define ICH9_LPC_PMBASE                         0x40
 #define ICH9_LPC_PMBASE_BASE_ADDRESS_MASK       Q35_MASK(32, 15, 7)
diff --git a/include/hw/xen/xen.h b/include/hw/xen/xen.h
index 7efcdaa8fe..55c6cad543 100644
--- a/include/hw/xen/xen.h
+++ b/include/hw/xen/xen.h
@@ -30,9 +30,10 @@ static inline bool xen_enabled(void)
     return xen_allowed;
 }
 
-int xen_pci_slot_get_pirq(PCIDevice *pci_dev, int irq_num);
-void xen_piix3_set_irq(void *opaque, int irq_num, int level);
+int xen_cmn_pci_slot_get_pirq(PCIDevice *pci_dev, int irq_num);
+void xen_cmn_set_irq(void *opaque, int irq_num, int level);
 void xen_piix_pci_write_config_client(uint32_t address, uint32_t val, int len);
+void xen_ich9_pci_write_config_client(uint32_t address, uint32_t val, int len);
 void xen_hvm_inject_msi(uint64_t addr, uint32_t data);
 int xen_is_pirq_msi(uint32_t msi_data);
 
diff --git a/stubs/xen-hvm.c b/stubs/xen-hvm.c
index 0067bcc6db..c1bc45744c 100644
--- a/stubs/xen-hvm.c
+++ b/stubs/xen-hvm.c
@@ -14,12 +14,12 @@
 #include "exec/memory.h"
 #include "qapi/qapi-commands-misc.h"
 
-int xen_pci_slot_get_pirq(PCIDevice *pci_dev, int irq_num)
+int xen_cmn_pci_slot_get_pirq(PCIDevice *pci_dev, int irq_num)
 {
     return -1;
 }
 
-void xen_piix3_set_irq(void *opaque, int irq_num, int level)
+void xen_cmn_set_irq(void *opaque, int irq_num, int level)
 {
 }
 
@@ -27,6 +27,10 @@ void xen_piix_pci_write_config_client(uint32_t address, uint32_t val, int len)
 {
 }
 
+void xen_ich9_pci_write_config_client(uint32_t address, uint32_t val, int len)
+{
+}
+
 void xen_hvm_inject_msi(uint64_t addr, uint32_t data)
 {
 }
-- 
2.11.0


* [Qemu-devel] [RFC PATCH 14/30] pc/q35: Apply PCI bus BSEL property for Xen PCI device hotplug
  2018-03-12 18:33 ` Alexey Gerasimenko
@ 2018-03-12 18:33   ` Alexey Gerasimenko
  -1 siblings, 0 replies; 183+ messages in thread
From: Alexey Gerasimenko @ 2018-03-12 18:33 UTC (permalink / raw)
  To: xen-devel
  Cc: Alexey Gerasimenko, qemu-devel, Michael S. Tsirkin,
	Igor Mammedov, Marcel Apfelbaum

On Q35 we still need to assign the BSEL property to the bus(es) for PCI
device add/hotplug to work.
Extend the acpi_set_pci_info() function to support Q35 as well. Previously
it was limited to a find_i440fx() call; this patch adds a new (trivial)
function, find_q35(), which returns the root PCIBus object on Q35, in a
way similar to what find_i440fx() does.

Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
---
 hw/acpi/pcihp.c      | 6 +++++-
 hw/pci-host/q35.c    | 8 ++++++++
 include/hw/i386/pc.h | 3 +++
 3 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/hw/acpi/pcihp.c b/hw/acpi/pcihp.c
index 91c82fdc7a..f70d8620d7 100644
--- a/hw/acpi/pcihp.c
+++ b/hw/acpi/pcihp.c
@@ -105,7 +105,11 @@ static void acpi_set_pci_info(void)
     }
     bsel_is_set = true;
 
-    bus = find_i440fx(); /* TODO: Q35 support */
+    bus = find_i440fx();
+    if (!bus) {
+        bus = find_q35();
+    }
+
     if (bus) {
         /* Scan all PCI buses. Set property to enable acpi based hotplug. */
         pci_for_each_bus_depth_first(bus, acpi_set_bsel, NULL, &bsel_alloc);
diff --git a/hw/pci-host/q35.c b/hw/pci-host/q35.c
index a36a1195e4..8c1603fce9 100644
--- a/hw/pci-host/q35.c
+++ b/hw/pci-host/q35.c
@@ -258,6 +258,14 @@ static void q35_host_initfn(Object *obj)
             IO_APIC_DEFAULT_ADDRESS - 1);
 }
 
+PCIBus *find_q35(void)
+{
+    PCIHostState *s = OBJECT_CHECK(PCIHostState,
+                                   object_resolve_path("/machine/q35", NULL),
+                                   TYPE_PCI_HOST_BRIDGE);
+    return s ? s->bus : NULL;
+}
+
 static const TypeInfo q35_host_info = {
     .name       = TYPE_Q35_HOST_DEVICE,
     .parent     = TYPE_PCIE_HOST_BRIDGE,
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index bb49165fe0..96d74b35bd 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -302,6 +302,9 @@ PCIBus *find_i440fx(void);
 extern PCIDevice *piix4_dev;
 int piix4_init(PCIBus *bus, ISABus **isa_bus, int devfn);
 
+/* q35.c */
+PCIBus *find_q35(void);
+
 /* pc_sysfw.c */
 void pc_system_firmware_init(MemoryRegion *rom_memory,
                              bool isapc_ram_fw);
-- 
2.11.0



* [Qemu-devel] [RFC PATCH 15/30] q35/acpi/xen: Provide ACPI PCI hotplug interface for Xen on Q35
  2018-03-12 18:33 ` Alexey Gerasimenko
@ 2018-03-12 18:34   ` Alexey Gerasimenko
  -1 siblings, 0 replies; 183+ messages in thread
From: Alexey Gerasimenko @ 2018-03-12 18:34 UTC (permalink / raw)
  To: xen-devel
  Cc: Alexey Gerasimenko, qemu-devel, Michael S. Tsirkin,
	Marcel Apfelbaum, Igor Mammedov

This patch enables ACPI PCI hotplug functionality for Xen on Q35.
All added code is guarded by xen_enabled(), so there is no functional
change for non-Xen usage.

We need to call acpi_set_pci_info() from ich9_pm_init() as well, so it is
made globally visible again (as it was before).

Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
---
 hw/acpi/ich9.c          | 24 ++++++++++++++++++++++++
 hw/acpi/pcihp.c         |  2 +-
 include/hw/acpi/ich9.h  |  2 ++
 include/hw/acpi/pcihp.h |  2 ++
 4 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/hw/acpi/ich9.c b/hw/acpi/ich9.c
index c5d8646abc..62e2582e1a 100644
--- a/hw/acpi/ich9.c
+++ b/hw/acpi/ich9.c
@@ -37,6 +37,7 @@
 
 #include "hw/i386/ich9.h"
 #include "hw/mem/pc-dimm.h"
+#include "hw/xen/xen.h"
 
 //#define DEBUG
 
@@ -258,6 +259,10 @@ static void pm_reset(void *opaque)
     pm->smi_en_wmask = ~0;
 
     acpi_update_sci(&pm->acpi_regs, pm->irq);
+
+    if (xen_enabled()) {
+        acpi_pcihp_reset(&pm->acpi_pci_hotplug);
+    }
 }
 
 static void pm_powerdown_req(Notifier *n, void *opaque)
@@ -300,6 +305,17 @@ void ich9_pm_init(PCIDevice *lpc_pci, ICH9LPCPMRegs *pm,
     pm->powerdown_notifier.notify = pm_powerdown_req;
     qemu_register_powerdown_notifier(&pm->powerdown_notifier);
 
+    if (xen_enabled()) {
+        PCIBus *bus = pci_get_bus(lpc_pci);
+
+        qbus_set_hotplug_handler(BUS(bus), DEVICE(lpc_pci), &error_abort);
+
+        acpi_pcihp_init(OBJECT(lpc_pci), &pm->acpi_pci_hotplug, bus,
+                        pci_address_space_io(lpc_pci), false);
+
+        acpi_set_pci_info();
+    }
+
     legacy_acpi_cpu_hotplug_init(pci_address_space_io(lpc_pci),
         OBJECT(lpc_pci), &pm->gpe_cpu, ICH9_CPU_HOTPLUG_IO_BASE);
 
@@ -496,6 +512,10 @@ void ich9_pm_device_plug_cb(HotplugHandler *hotplug_dev, DeviceState *dev,
             acpi_memory_plug_cb(hotplug_dev, &lpc->pm.acpi_memory_hotplug,
                                 dev, errp);
         }
+    } else if (xen_enabled() &&
+               object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) {
+        acpi_pcihp_device_plug_cb(hotplug_dev, &lpc->pm.acpi_pci_hotplug,
+                                  dev, errp);
     } else if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
         if (lpc->pm.cpu_hotplug_legacy) {
             legacy_acpi_cpu_plug_cb(hotplug_dev, &lpc->pm.gpe_cpu, dev, errp);
@@ -522,6 +542,10 @@ void ich9_pm_device_unplug_request_cb(HotplugHandler *hotplug_dev,
                !lpc->pm.cpu_hotplug_legacy) {
         acpi_cpu_unplug_request_cb(hotplug_dev, &lpc->pm.cpuhp_state,
                                    dev, errp);
+    } else if (xen_enabled() &&
+               object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) {
+        acpi_pcihp_device_unplug_cb(hotplug_dev, &lpc->pm.acpi_pci_hotplug,
+                                    dev, errp);
     } else {
         error_setg(errp, "acpi: device unplug request for not supported device"
                    " type: %s", object_get_typename(OBJECT(dev)));
diff --git a/hw/acpi/pcihp.c b/hw/acpi/pcihp.c
index f70d8620d7..d822f93293 100644
--- a/hw/acpi/pcihp.c
+++ b/hw/acpi/pcihp.c
@@ -94,7 +94,7 @@ static void *acpi_set_bsel(PCIBus *bus, void *opaque)
     return bsel_alloc;
 }
 
-static void acpi_set_pci_info(void)
+void acpi_set_pci_info(void)
 {
     static bool bsel_is_set;
     PCIBus *bus;
diff --git a/include/hw/acpi/ich9.h b/include/hw/acpi/ich9.h
index 59aeb06393..4a47d93745 100644
--- a/include/hw/acpi/ich9.h
+++ b/include/hw/acpi/ich9.h
@@ -26,6 +26,7 @@
 #include "hw/acpi/cpu.h"
 #include "hw/acpi/memory_hotplug.h"
 #include "hw/acpi/acpi_dev_interface.h"
+#include "hw/acpi/pcihp.h"
 #include "hw/acpi/tco.h"
 
 typedef struct ICH9LPCPMRegs {
@@ -52,6 +53,7 @@ typedef struct ICH9LPCPMRegs {
     bool cpu_hotplug_legacy;
     AcpiCpuHotplug gpe_cpu;
     CPUHotplugState cpuhp_state;
+    AcpiPciHpState acpi_pci_hotplug;
 
     MemHotplugState acpi_memory_hotplug;
 
diff --git a/include/hw/acpi/pcihp.h b/include/hw/acpi/pcihp.h
index 8a65f99fc8..0a685dd228 100644
--- a/include/hw/acpi/pcihp.h
+++ b/include/hw/acpi/pcihp.h
@@ -64,6 +64,8 @@ void acpi_pcihp_device_unplug_cb(HotplugHandler *hotplug_dev, AcpiPciHpState *s,
 /* Called on reset */
 void acpi_pcihp_reset(AcpiPciHpState *s);
 
+void acpi_set_pci_info(void);
+
 extern const VMStateDescription vmstate_acpi_pcihp_pci_status;
 
 #define VMSTATE_PCI_HOTPLUG(pcihp, state, test_pcihp) \
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 183+ messages in thread

* [RFC PATCH 15/30] q35/acpi/xen: Provide ACPI PCI hotplug interface for Xen on Q35
@ 2018-03-12 18:34   ` Alexey Gerasimenko
  0 siblings, 0 replies; 183+ messages in thread
From: Alexey Gerasimenko @ 2018-03-12 18:34 UTC (permalink / raw)
  To: xen-devel
  Cc: Marcel Apfelbaum, Igor Mammedov, Michael S. Tsirkin,
	Alexey Gerasimenko, qemu-devel

This patch enables the use of ACPI PCI hotplug functionality for Xen on
Q35. All added code depends on xen_enabled(), so there is no functional
change for non-Xen usage.

We need to call the acpi_set_pci_info function from ich9_pm_init as well,
so it is made globally visible again (as it was before).

Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
---
 hw/acpi/ich9.c          | 24 ++++++++++++++++++++++++
 hw/acpi/pcihp.c         |  2 +-
 include/hw/acpi/ich9.h  |  2 ++
 include/hw/acpi/pcihp.h |  2 ++
 4 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/hw/acpi/ich9.c b/hw/acpi/ich9.c
index c5d8646abc..62e2582e1a 100644
--- a/hw/acpi/ich9.c
+++ b/hw/acpi/ich9.c
@@ -37,6 +37,7 @@
 
 #include "hw/i386/ich9.h"
 #include "hw/mem/pc-dimm.h"
+#include "hw/xen/xen.h"
 
 //#define DEBUG
 
@@ -258,6 +259,10 @@ static void pm_reset(void *opaque)
     pm->smi_en_wmask = ~0;
 
     acpi_update_sci(&pm->acpi_regs, pm->irq);
+
+    if (xen_enabled()) {
+        acpi_pcihp_reset(&pm->acpi_pci_hotplug);
+    }
 }
 
 static void pm_powerdown_req(Notifier *n, void *opaque)
@@ -300,6 +305,17 @@ void ich9_pm_init(PCIDevice *lpc_pci, ICH9LPCPMRegs *pm,
     pm->powerdown_notifier.notify = pm_powerdown_req;
     qemu_register_powerdown_notifier(&pm->powerdown_notifier);
 
+    if (xen_enabled()) {
+        PCIBus *bus = pci_get_bus(lpc_pci);
+
+        qbus_set_hotplug_handler(BUS(bus), DEVICE(lpc_pci), &error_abort);
+
+        acpi_pcihp_init(OBJECT(lpc_pci), &pm->acpi_pci_hotplug, bus,
+                        pci_address_space_io(lpc_pci), false);
+
+        acpi_set_pci_info();
+    }
+
     legacy_acpi_cpu_hotplug_init(pci_address_space_io(lpc_pci),
         OBJECT(lpc_pci), &pm->gpe_cpu, ICH9_CPU_HOTPLUG_IO_BASE);
 
@@ -496,6 +512,10 @@ void ich9_pm_device_plug_cb(HotplugHandler *hotplug_dev, DeviceState *dev,
             acpi_memory_plug_cb(hotplug_dev, &lpc->pm.acpi_memory_hotplug,
                                 dev, errp);
         }
+    } else if (xen_enabled() &&
+               object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) {
+        acpi_pcihp_device_plug_cb(hotplug_dev, &lpc->pm.acpi_pci_hotplug,
+                                  dev, errp);
     } else if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
         if (lpc->pm.cpu_hotplug_legacy) {
             legacy_acpi_cpu_plug_cb(hotplug_dev, &lpc->pm.gpe_cpu, dev, errp);
@@ -522,6 +542,10 @@ void ich9_pm_device_unplug_request_cb(HotplugHandler *hotplug_dev,
                !lpc->pm.cpu_hotplug_legacy) {
         acpi_cpu_unplug_request_cb(hotplug_dev, &lpc->pm.cpuhp_state,
                                    dev, errp);
+    } else if (xen_enabled() &&
+               object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) {
+        acpi_pcihp_device_unplug_cb(hotplug_dev, &lpc->pm.acpi_pci_hotplug,
+                                    dev, errp);
     } else {
         error_setg(errp, "acpi: device unplug request for not supported device"
                    " type: %s", object_get_typename(OBJECT(dev)));
diff --git a/hw/acpi/pcihp.c b/hw/acpi/pcihp.c
index f70d8620d7..d822f93293 100644
--- a/hw/acpi/pcihp.c
+++ b/hw/acpi/pcihp.c
@@ -94,7 +94,7 @@ static void *acpi_set_bsel(PCIBus *bus, void *opaque)
     return bsel_alloc;
 }
 
-static void acpi_set_pci_info(void)
+void acpi_set_pci_info(void)
 {
     static bool bsel_is_set;
     PCIBus *bus;
diff --git a/include/hw/acpi/ich9.h b/include/hw/acpi/ich9.h
index 59aeb06393..4a47d93745 100644
--- a/include/hw/acpi/ich9.h
+++ b/include/hw/acpi/ich9.h
@@ -26,6 +26,7 @@
 #include "hw/acpi/cpu.h"
 #include "hw/acpi/memory_hotplug.h"
 #include "hw/acpi/acpi_dev_interface.h"
+#include "hw/acpi/pcihp.h"
 #include "hw/acpi/tco.h"
 
 typedef struct ICH9LPCPMRegs {
@@ -52,6 +53,7 @@ typedef struct ICH9LPCPMRegs {
     bool cpu_hotplug_legacy;
     AcpiCpuHotplug gpe_cpu;
     CPUHotplugState cpuhp_state;
+    AcpiPciHpState acpi_pci_hotplug;
 
     MemHotplugState acpi_memory_hotplug;
 
diff --git a/include/hw/acpi/pcihp.h b/include/hw/acpi/pcihp.h
index 8a65f99fc8..0a685dd228 100644
--- a/include/hw/acpi/pcihp.h
+++ b/include/hw/acpi/pcihp.h
@@ -64,6 +64,8 @@ void acpi_pcihp_device_unplug_cb(HotplugHandler *hotplug_dev, AcpiPciHpState *s,
 /* Called on reset */
 void acpi_pcihp_reset(AcpiPciHpState *s);
 
+void acpi_set_pci_info(void);
+
 extern const VMStateDescription vmstate_acpi_pcihp_pci_status;
 
 #define VMSTATE_PCI_HOTPLUG(pcihp, state, test_pcihp) \
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 183+ messages in thread

* [Qemu-devel] [RFC PATCH 16/30] q35/xen: Add Xen platform device support for Q35
  2018-03-12 18:33 ` Alexey Gerasimenko
@ 2018-03-12 18:34   ` Alexey Gerasimenko
  -1 siblings, 0 replies; 183+ messages in thread
From: Alexey Gerasimenko @ 2018-03-12 18:34 UTC (permalink / raw)
  To: xen-devel
  Cc: Alexey Gerasimenko, qemu-devel, Eduardo Habkost,
	Marcel Apfelbaum, Paolo Bonzini, Richard Henderson,
	Michael S. Tsirkin

The current Xen/QEMU method of controlling the Xen Platform device on
i440 is a bit odd -- enabling/disabling the Xen Platform device actually
changes the emulated QEMU machine type, namely xenfv <--> pc.

To avoid multiplying machine types, introduce a new way to control the
Xen Platform device in QEMU -- a boolean "xen-platform-dev" machine
property. To maintain backward compatibility with existing Xen/QEMU
setups, it currently applies only to the q35 machine. i440 emulation
still uses the old method (i.e. xenfv/pc machine selection) to control
the Xen Platform device; this may be switched to the xen-platform-dev
property later as well.

This way we can use a single machine type (q35) and merely toggle the
xen-platform-dev value on/off to control the Xen Platform device.

Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
---
 hw/core/machine.c   | 21 +++++++++++++++++++++
 hw/i386/pc_q35.c    | 14 ++++++++++++++
 include/hw/boards.h |  1 +
 qemu-options.hx     |  1 +
 4 files changed, 37 insertions(+)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index 5e2bbcdace..205e7da3ce 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -290,6 +290,20 @@ static void machine_set_igd_gfx_passthru(Object *obj, bool value, Error **errp)
     ms->igd_gfx_passthru = value;
 }
 
+static bool machine_get_xen_platform_dev(Object *obj, Error **errp)
+{
+    MachineState *ms = MACHINE(obj);
+
+    return ms->xen_platform_dev;
+}
+
+static void machine_set_xen_platform_dev(Object *obj, bool value, Error **errp)
+{
+    MachineState *ms = MACHINE(obj);
+
+    ms->xen_platform_dev = value;
+}
+
 static char *machine_get_firmware(Object *obj, Error **errp)
 {
     MachineState *ms = MACHINE(obj);
@@ -595,6 +609,13 @@ static void machine_class_init(ObjectClass *oc, void *data)
     object_class_property_set_description(oc, "igd-passthru",
         "Set on/off to enable/disable igd passthrou", &error_abort);
 
+    object_class_property_add_bool(oc, "xen-platform-dev",
+        machine_get_xen_platform_dev,
+        machine_set_xen_platform_dev, &error_abort);
+    object_class_property_set_description(oc, "xen-platform-dev",
+        "Set on/off to enable/disable Xen Platform device",
+        &error_abort);
+
     object_class_property_add_str(oc, "firmware",
         machine_get_firmware, machine_set_firmware,
         &error_abort);
diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index 0db670f6d7..62caf924cf 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -56,6 +56,18 @@
 /* ICH9 AHCI has 6 ports */
 #define MAX_SATA_PORTS     6
 
+static void q35_xen_hvm_init(MachineState *machine)
+{
+    PCMachineState *pcms = PC_MACHINE(machine);
+
+    if (xen_enabled()) {
+        /* check if Xen Platform device is enabled */
+        if (machine->xen_platform_dev) {
+            pci_create_simple(pcms->bus, -1, "xen-platform");
+        }
+    }
+}
+
 /* PC hardware initialisation */
 static void pc_q35_init(MachineState *machine)
 {
@@ -207,6 +219,8 @@ static void pc_q35_init(MachineState *machine)
     if (xen_enabled()) {
         pci_bus_irqs(host_bus, xen_cmn_set_irq, xen_cmn_pci_slot_get_pirq,
                      ich9_lpc, ICH9_XEN_NUM_IRQ_SOURCES);
+
+        q35_xen_hvm_init(machine);
     } else {
         pci_bus_irqs(host_bus, ich9_lpc_set_irq, ich9_lpc_map_irq, ich9_lpc,
                      ICH9_LPC_NB_PIRQS);
diff --git a/include/hw/boards.h b/include/hw/boards.h
index efb0a9edfd..f35fc1cc03 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -238,6 +238,7 @@ struct MachineState {
     bool usb;
     bool usb_disabled;
     bool igd_gfx_passthru;
+    bool xen_platform_dev;
     char *firmware;
     bool iommu;
     bool suppress_vmdesc;
diff --git a/qemu-options.hx b/qemu-options.hx
index 6585058c6c..cee0b92028 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -38,6 +38,7 @@ DEF("machine", HAS_ARG, QEMU_OPTION_machine, \
     "                dump-guest-core=on|off include guest memory in a core dump (default=on)\n"
     "                mem-merge=on|off controls memory merge support (default: on)\n"
     "                igd-passthru=on|off controls IGD GFX passthrough support (default=off)\n"
+    "                xen-platform-dev=on|off controls Xen Platform device (default=off)\n"
     "                aes-key-wrap=on|off controls support for AES key wrapping (default=on)\n"
     "                dea-key-wrap=on|off controls support for DEA key wrapping (default=on)\n"
     "                suppress-vmdesc=on|off disables self-describing migration (default=off)\n"
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 183+ messages in thread


* [RFC PATCH 17/30] q35: Fix incorrect values for PCIEXBAR masks
  2018-03-12 18:33 ` Alexey Gerasimenko
                   ` (16 preceding siblings ...)
  (?)
@ 2018-03-12 18:34 ` Alexey Gerasimenko
  -1 siblings, 0 replies; 183+ messages in thread
From: Alexey Gerasimenko @ 2018-03-12 18:34 UTC (permalink / raw)
  To: xen-devel
  Cc: Marcel Apfelbaum, Michael S. Tsirkin, Alexey Gerasimenko, qemu-devel

There are two small issues in the PCIEXBAR address mask handling:
- wrong bit positions for the address mask bits (see the PCIEXBAR
  description in the Q35 datasheet)
- incorrect usage of the 64ADMSK mask

Due to this, writing a valid PCIEXBAR address may cause it to shift to
a different address, corrupting the memory layout so that emulated MMIO
regions may overlap real (passed-through) MMIO ranges. Fix this by
providing the correct values.
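
The effect of the mask bits can be sketched in a few lines of standalone
C (an illustration based on the commit message and the Q35 datasheet, not
QEMU's actual code; BIT_RANGE and pciexbar_base are names invented here):
bits 35:28 of PCIEXBAR always form the base, a 128 MiB window additionally
decodes bit 27, and a 64 MiB window decodes bits 27:26.

```c
#include <assert.h>
#include <stdint.h>

/* Bit-range mask helper (simplified stand-in for QEMU's Q35_MASK macro) */
#define BIT_RANGE(hi, lo) \
    (((~0ULL) >> (63 - (hi))) & ~((1ULL << (lo)) - 1))

/* Decode the ECAM base from a PCIEXBAR value for a given window length,
 * using the corrected mask bits from this patch. */
static uint64_t pciexbar_base(uint64_t reg, unsigned length_mib)
{
    uint64_t addr_mask = BIT_RANGE(35, 28);          /* 256 MiB window   */

    if (length_mib == 128) {
        addr_mask |= 1ULL << 27;                     /* 128ADMSK         */
    } else if (length_mib == 64) {
        addr_mask |= (1ULL << 27) | (1ULL << 26);    /* 128ADMSK|64ADMSK */
    }
    return reg & addr_mask;
}
```

With these masks, a 64 MiB base such as 0xb4000000 keeps bit 26 and
decodes unchanged, whereas the old (shifted) mask values would drop that
bit and silently move the window to 0xb0000000.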

Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
---
 hw/pci-host/q35.c         | 6 +++---
 include/hw/pci-host/q35.h | 4 ++--
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/hw/pci-host/q35.c b/hw/pci-host/q35.c
index 8c1603fce9..b9a49721e2 100644
--- a/hw/pci-host/q35.c
+++ b/hw/pci-host/q35.c
@@ -322,12 +322,12 @@ static void mch_update_pciexbar(MCHPCIState *mch)
         break;
     case MCH_HOST_BRIDGE_PCIEXBAR_LENGTH_128M:
         length = 128 * 1024 * 1024;
-        addr_mask |= MCH_HOST_BRIDGE_PCIEXBAR_128ADMSK |
-            MCH_HOST_BRIDGE_PCIEXBAR_64ADMSK;
+        addr_mask |= MCH_HOST_BRIDGE_PCIEXBAR_128ADMSK;
         break;
     case MCH_HOST_BRIDGE_PCIEXBAR_LENGTH_64M:
         length = 64 * 1024 * 1024;
-        addr_mask |= MCH_HOST_BRIDGE_PCIEXBAR_64ADMSK;
+        addr_mask |= MCH_HOST_BRIDGE_PCIEXBAR_64ADMSK |
+            MCH_HOST_BRIDGE_PCIEXBAR_128ADMSK;
         break;
     case MCH_HOST_BRIDGE_PCIEXBAR_LENGTH_RVD:
     default:
diff --git a/include/hw/pci-host/q35.h b/include/hw/pci-host/q35.h
index 8f4ddde393..ec8d77fa8b 100644
--- a/include/hw/pci-host/q35.h
+++ b/include/hw/pci-host/q35.h
@@ -103,8 +103,8 @@ typedef struct Q35PCIHost {
 #define MCH_HOST_BRIDGE_PCIEXBAR_DEFAULT       0xb0000000
 #define MCH_HOST_BRIDGE_PCIEXBAR_MAX           (0x10000000) /* 256M */
 #define MCH_HOST_BRIDGE_PCIEXBAR_ADMSK         Q35_MASK(64, 35, 28)
-#define MCH_HOST_BRIDGE_PCIEXBAR_128ADMSK      ((uint64_t)(1 << 26))
-#define MCH_HOST_BRIDGE_PCIEXBAR_64ADMSK       ((uint64_t)(1 << 25))
+#define MCH_HOST_BRIDGE_PCIEXBAR_128ADMSK      ((uint64_t)(1 << 27))
+#define MCH_HOST_BRIDGE_PCIEXBAR_64ADMSK       ((uint64_t)(1 << 26))
 #define MCH_HOST_BRIDGE_PCIEXBAR_LENGTH_MASK   ((uint64_t)(0x3 << 1))
 #define MCH_HOST_BRIDGE_PCIEXBAR_LENGTH_256M   ((uint64_t)(0x0 << 1))
 #define MCH_HOST_BRIDGE_PCIEXBAR_LENGTH_128M   ((uint64_t)(0x1 << 1))
-- 
2.11.0



^ permalink raw reply related	[flat|nested] 183+ messages in thread

* [Qemu-devel] [RFC PATCH 18/30] xen/pt: XenHostPCIDevice: provide functions for PCI Capabilities and PCIe Extended Capabilities enumeration
  2018-03-12 18:33 ` Alexey Gerasimenko
@ 2018-03-12 18:34   ` Alexey Gerasimenko
  -1 siblings, 0 replies; 183+ messages in thread
From: Alexey Gerasimenko @ 2018-03-12 18:34 UTC (permalink / raw)
  To: xen-devel
  Cc: Alexey Gerasimenko, qemu-devel, Stefano Stabellini, Anthony Perard

This patch introduces two new functions:
- xen_host_pci_find_next_ext_cap (actually a rework of the currently
  unused xen_host_pci_find_ext_cap_offset function)
- xen_host_pci_find_next_cap

These functions allow searching for PCI/PCIe capabilities in a uniform
way. Both can look either for a specific capability or for whichever
capability comes next (by specifying CAP_ID_ANY as the capability ID) --
useful when we merely need to traverse the capability list one by one.
In both functions the 'pos' argument allows the search to continue from
the last position (0 means start from the beginning).

In order not to probe for PCIe Extended Capabilities every time,
xen_host_pci_find_next_ext_cap makes use of the new 'has_pcie_ext_caps'
field in the XenHostPCIDevice structure, which is filled only once (in
xen_host_pci_device_get).
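
The traversal pattern can be modeled with a small standalone sketch (my
own simplified model over a flat 256-byte config-space array -- the
function name and the exact resume semantics of 'pos' are assumptions,
not QEMU's code):

```c
#include <assert.h>
#include <stdint.h>

#define CAP_ID_ANY          (~0u)   /* match any capability ID */
#define PCI_STATUS          0x06
#define PCI_STATUS_CAP_LIST 0x10    /* device implements a capability list */
#define PCI_CAPABILITY_LIST 0x34    /* offset of the list head pointer */
#define PCI_CAP_LIST_ID     0
#define PCI_CAP_LIST_NEXT   1

/* Return the offset of the next capability after 'pos' (0 = start from
 * the head) matching 'cap', or 0 when the list is exhausted. */
static unsigned find_next_cap(const uint8_t cfg[256], unsigned pos,
                              unsigned cap)
{
    unsigned curpos, guard = 48;    /* bound the walk against loops */

    if (!(cfg[PCI_STATUS] & PCI_STATUS_CAP_LIST)) {
        return 0;
    }
    /* 'curpos' always points at a "next capability" pointer byte */
    curpos = (pos < PCI_CAPABILITY_LIST) ? PCI_CAPABILITY_LIST
                                         : pos + PCI_CAP_LIST_NEXT;
    while (guard--) {
        curpos = cfg[curpos];                  /* follow the pointer */
        if (!curpos) {
            break;                             /* end of the list */
        }
        if (cap == CAP_ID_ANY || cfg[curpos + PCI_CAP_LIST_ID] == cap) {
            return curpos;
        }
        curpos += PCI_CAP_LIST_NEXT;           /* this cap's next field */
    }
    return 0;
}
```

Calling this repeatedly with 'pos' set to the previous return value
enumerates every capability in order, which is exactly the one-by-one
traversal the CAP_ID_ANY mode is meant for.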

Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
---
 hw/xen/xen-host-pci-device.c | 95 +++++++++++++++++++++++++++++++++++++-------
 hw/xen/xen-host-pci-device.h |  5 ++-
 2 files changed, 85 insertions(+), 15 deletions(-)

diff --git a/hw/xen/xen-host-pci-device.c b/hw/xen/xen-host-pci-device.c
index eed8cc88e3..9d76b199af 100644
--- a/hw/xen/xen-host-pci-device.c
+++ b/hw/xen/xen-host-pci-device.c
@@ -14,6 +14,7 @@
 
 #define XEN_HOST_PCI_MAX_EXT_CAP \
     ((PCIE_CONFIG_SPACE_SIZE - PCI_CONFIG_SPACE_SIZE) / (PCI_CAP_SIZEOF + 4))
+#define XEN_HOST_PCI_CAP_MAX 48
 
 #ifdef XEN_HOST_PCI_DEVICE_DEBUG
 #  define XEN_HOST_PCI_LOG(f, a...) fprintf(stderr, "%s: " f, __func__, ##a)
@@ -199,6 +200,19 @@ static bool xen_host_pci_dev_is_virtfn(XenHostPCIDevice *d)
     return !stat(path, &buf);
 }
 
+static bool xen_host_pci_dev_has_pcie_ext_caps(XenHostPCIDevice *d)
+{
+    uint32_t header;
+
+    if (xen_host_pci_get_long(d, PCI_CONFIG_SPACE_SIZE, &header))
+        return false;
+
+    if (header == 0 || header == ~0U)
+        return false;
+
+    return true;
+}
+
 static void xen_host_pci_config_open(XenHostPCIDevice *d, Error **errp)
 {
     char path[PATH_MAX];
@@ -297,37 +311,89 @@ int xen_host_pci_set_block(XenHostPCIDevice *d, int pos, uint8_t *buf, int len)
     return xen_host_pci_config_write(d, pos, buf, len);
 }
 
-int xen_host_pci_find_ext_cap_offset(XenHostPCIDevice *d, uint32_t cap)
+int xen_host_pci_find_next_ext_cap(XenHostPCIDevice *d, int pos, uint32_t cap)
 {
     uint32_t header = 0;
     int max_cap = XEN_HOST_PCI_MAX_EXT_CAP;
-    int pos = PCI_CONFIG_SPACE_SIZE;
+
+    if (!d->has_pcie_ext_caps)
+        return 0;
+
+    if (!pos) {
+        pos = PCI_CONFIG_SPACE_SIZE;
+    } else {
+        if (xen_host_pci_get_long(d, pos, &header))
+            return 0;
+
+        pos = PCI_EXT_CAP_NEXT(header);
+    }
 
     do {
-        if (xen_host_pci_get_long(d, pos, &header)) {
+        if (!pos || pos < PCI_CONFIG_SPACE_SIZE)
+            break;
+
+        if (xen_host_pci_get_long(d, pos, &header))
             break;
-        }
         /*
          * If we have no capabilities, this is indicated by cap ID,
          * cap version and next pointer all being 0.
+         * Also check for all F's returned (which means PCIe ext conf space
+         * is unreadable for some reason)
          */
-        if (header == 0) {
+        if (header == 0 || header == ~0U)
             break;
-        }
 
-        if (PCI_EXT_CAP_ID(header) == cap) {
+        if (cap == CAP_ID_ANY)
+            return pos;
+        else if (PCI_EXT_CAP_ID(header) == cap)
             return pos;
-        }
 
         pos = PCI_EXT_CAP_NEXT(header);
-        if (pos < PCI_CONFIG_SPACE_SIZE) {
+    } while (--max_cap);
+
+    return 0;
+}
+
+int xen_host_pci_find_next_cap(XenHostPCIDevice *d, int pos, uint32_t cap)
+{
+    uint8_t id;
+    unsigned max_cap = XEN_HOST_PCI_CAP_MAX;
+    uint8_t status = 0;
+    uint8_t curpos;
+
+    if (xen_host_pci_get_byte(d, PCI_STATUS, &status))
+        return 0;
+
+    if ((status & PCI_STATUS_CAP_LIST) == 0)
+        return 0;
+
+    if (pos < PCI_CAPABILITY_LIST) {
+        curpos = PCI_CAPABILITY_LIST;
+    } else {
+        curpos = (uint8_t) pos;
+    }
+
+    while (max_cap--) {
+        if (xen_host_pci_get_byte(d, curpos, &curpos))
+            break;
+        if (!curpos)
             break;
-        }
 
-        max_cap--;
-    } while (max_cap > 0);
+        if (cap == CAP_ID_ANY)
+            return curpos;
 
-    return -1;
+        if (xen_host_pci_get_byte(d, curpos + PCI_CAP_LIST_ID, &id))
+            break;
+
+        if (id == 0xff)
+            break;
+        else if (id == cap)
+            return curpos;
+
+        curpos += PCI_CAP_LIST_NEXT;
+    }
+
+    return 0;
 }
 
 void xen_host_pci_device_get(XenHostPCIDevice *d, uint16_t domain,
@@ -377,7 +443,8 @@ void xen_host_pci_device_get(XenHostPCIDevice *d, uint16_t domain,
     }
     d->class_code = v;
 
-    d->is_virtfn = xen_host_pci_dev_is_virtfn(d);
+    d->is_virtfn         = xen_host_pci_dev_is_virtfn(d);
+    d->has_pcie_ext_caps = xen_host_pci_dev_has_pcie_ext_caps(d);
 
     return;
 
diff --git a/hw/xen/xen-host-pci-device.h b/hw/xen/xen-host-pci-device.h
index 4d8d34ecb0..37c5614a24 100644
--- a/hw/xen/xen-host-pci-device.h
+++ b/hw/xen/xen-host-pci-device.h
@@ -32,6 +32,7 @@ typedef struct XenHostPCIDevice {
     XenHostPCIIORegion rom;
 
     bool is_virtfn;
+    bool has_pcie_ext_caps;
 
     int config_fd;
 } XenHostPCIDevice;
@@ -53,6 +54,8 @@ int xen_host_pci_set_long(XenHostPCIDevice *d, int pos, uint32_t data);
 int xen_host_pci_set_block(XenHostPCIDevice *d, int pos, uint8_t *buf,
                            int len);
 
-int xen_host_pci_find_ext_cap_offset(XenHostPCIDevice *s, uint32_t cap);
+#define CAP_ID_ANY  (~0U)
+int xen_host_pci_find_next_cap(XenHostPCIDevice *s, int pos, uint32_t cap);
+int xen_host_pci_find_next_ext_cap(XenHostPCIDevice *d, int pos, uint32_t cap);
 
 #endif /* XEN_HOST_PCI_DEVICE_H */
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 183+ messages in thread


* [Qemu-devel] [RFC PATCH 19/30] xen/pt: avoid reading PCIe device type and cap version multiple times
  2018-03-12 18:33 ` Alexey Gerasimenko
@ 2018-03-12 18:34   ` Alexey Gerasimenko
  -1 siblings, 0 replies; 183+ messages in thread
From: Alexey Gerasimenko @ 2018-03-12 18:34 UTC (permalink / raw)
  To: xen-devel
  Cc: Alexey Gerasimenko, qemu-devel, Stefano Stabellini, Anthony Perard

xen_pt_config_init.c reads the Device/Port Type and Capability Version
fields in many places. Two functions are used for this purpose:
get_capability_version and get_device_type. These functions perform a PCI
config space read every time they are called. Another downside is that
these functions know nothing about where the PCI Express Capability is
located, so its offset must be provided explicitly as a function argument.
Their typical usage looks like this:
    uint8_t cap_ver = get_capability_version(s, real_offset - reg->offset);
    uint8_t dev_type = get_device_type(s, real_offset - reg->offset);

To avoid this, the PCI Express Capability register is now read only once
and stored in the XenHostPCIDevice structure (pcie_flags field). The
capability offset parameter is no longer needed, simplifying the
functions' usage. Also, get_device_type and get_capability_version were
renamed to the more descriptive get_pcie_device_type and
get_pcie_capability_version.
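
The decoding of the cached PCI_EXP_FLAGS word can be illustrated with a
small standalone sketch (the helper names are hypothetical; the field
masks match the PCIe spec and Linux's pci_regs.h):

```c
#include <assert.h>
#include <stdint.h>

/* Field masks for the PCI Express Capabilities register; the values
 * match the PCIe spec and Linux's pci_regs.h. */
#define PCI_EXP_FLAGS_VERS 0x000f /* Capability version */
#define PCI_EXP_FLAGS_TYPE 0x00f0 /* Device/Port type */

/* Hypothetical standalone versions of the accessors this patch adds:
 * both fields are decoded from one cached PCI_EXP_FLAGS word, so no
 * repeated config space reads are needed. */
static uint8_t pcie_cap_version(uint16_t pcie_flags)
{
    return (uint8_t)(pcie_flags & PCI_EXP_FLAGS_VERS);
}

static uint8_t pcie_device_type(uint16_t pcie_flags)
{
    return (uint8_t)((pcie_flags & PCI_EXP_FLAGS_TYPE) >> 4);
}
```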

Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
---
 hw/xen/xen-host-pci-device.c | 15 +++++++++++++++
 hw/xen/xen-host-pci-device.h |  1 +
 hw/xen/xen_pt_config_init.c  | 34 ++++++++++++++--------------------
 3 files changed, 30 insertions(+), 20 deletions(-)

diff --git a/hw/xen/xen-host-pci-device.c b/hw/xen/xen-host-pci-device.c
index 9d76b199af..11e9e26d31 100644
--- a/hw/xen/xen-host-pci-device.c
+++ b/hw/xen/xen-host-pci-device.c
@@ -402,6 +402,7 @@ void xen_host_pci_device_get(XenHostPCIDevice *d, uint16_t domain,
 {
     unsigned int v;
     Error *err = NULL;
+    int pcie_cap_pos;
 
     d->config_fd = -1;
     d->domain = domain;
@@ -446,6 +447,20 @@ void xen_host_pci_device_get(XenHostPCIDevice *d, uint16_t domain,
     d->is_virtfn         = xen_host_pci_dev_is_virtfn(d);
     d->has_pcie_ext_caps = xen_host_pci_dev_has_pcie_ext_caps(d);
 
+    /* read and store PCIe Capabilities field for later use */
+    pcie_cap_pos = xen_host_pci_find_next_cap(d, 0, PCI_CAP_ID_EXP);
+
+    if (pcie_cap_pos) {
+        if (xen_host_pci_get_word(d, pcie_cap_pos + PCI_EXP_FLAGS,
+                                  &d->pcie_flags)) {
+            error_setg(&err, "Unable to read from PCI Express capability "
+                       "structure at 0x%x", pcie_cap_pos);
+            goto error;
+        }
+    } else {
+        d->pcie_flags = 0xFFFF;
+    }
+
     return;
 
 error:
diff --git a/hw/xen/xen-host-pci-device.h b/hw/xen/xen-host-pci-device.h
index 37c5614a24..2884c4b4b9 100644
--- a/hw/xen/xen-host-pci-device.h
+++ b/hw/xen/xen-host-pci-device.h
@@ -27,6 +27,7 @@ typedef struct XenHostPCIDevice {
     uint16_t device_id;
     uint32_t class_code;
     int irq;
+    uint16_t pcie_flags;
 
     XenHostPCIIORegion io_regions[PCI_NUM_REGIONS - 1];
     XenHostPCIIORegion rom;
diff --git a/hw/xen/xen_pt_config_init.c b/hw/xen/xen_pt_config_init.c
index a3ce33e78b..02e8c97f3c 100644
--- a/hw/xen/xen_pt_config_init.c
+++ b/hw/xen/xen_pt_config_init.c
@@ -828,24 +828,18 @@ static XenPTRegInfo xen_pt_emu_reg_vendor[] = {
  * PCI Express Capability
  */
 
-static inline uint8_t get_capability_version(XenPCIPassthroughState *s,
-                                             uint32_t offset)
+static inline uint8_t get_pcie_capability_version(XenPCIPassthroughState *s)
 {
-    uint8_t flag;
-    if (xen_host_pci_get_byte(&s->real_device, offset + PCI_EXP_FLAGS, &flag)) {
-        return 0;
-    }
-    return flag & PCI_EXP_FLAGS_VERS;
+    assert(s->real_device.pcie_flags != 0xFFFF);
+
+    return (uint8_t) (s->real_device.pcie_flags & PCI_EXP_FLAGS_VERS);
 }
 
-static inline uint8_t get_device_type(XenPCIPassthroughState *s,
-                                      uint32_t offset)
+static inline uint8_t get_pcie_device_type(XenPCIPassthroughState *s)
 {
-    uint8_t flag;
-    if (xen_host_pci_get_byte(&s->real_device, offset + PCI_EXP_FLAGS, &flag)) {
-        return 0;
-    }
-    return (flag & PCI_EXP_FLAGS_TYPE) >> 4;
+    assert(s->real_device.pcie_flags != 0xFFFF);
+
+    return (uint8_t) ((s->real_device.pcie_flags & PCI_EXP_FLAGS_TYPE) >> 4);
 }
 
 /* initialize Link Control register */
@@ -853,8 +847,8 @@ static int xen_pt_linkctrl_reg_init(XenPCIPassthroughState *s,
                                     XenPTRegInfo *reg, uint32_t real_offset,
                                     uint32_t *data)
 {
-    uint8_t cap_ver = get_capability_version(s, real_offset - reg->offset);
-    uint8_t dev_type = get_device_type(s, real_offset - reg->offset);
+    uint8_t cap_ver  = get_pcie_capability_version(s);
+    uint8_t dev_type = get_pcie_device_type(s);
 
     /* no need to initialize in case of Root Complex Integrated Endpoint
      * with cap_ver 1.x
@@ -871,7 +865,7 @@ static int xen_pt_devctrl2_reg_init(XenPCIPassthroughState *s,
                                     XenPTRegInfo *reg, uint32_t real_offset,
                                     uint32_t *data)
 {
-    uint8_t cap_ver = get_capability_version(s, real_offset - reg->offset);
+    uint8_t cap_ver = get_pcie_capability_version(s);
 
     /* no need to initialize in case of cap_ver 1.x */
     if (cap_ver == 1) {
@@ -886,7 +880,7 @@ static int xen_pt_linkctrl2_reg_init(XenPCIPassthroughState *s,
                                      XenPTRegInfo *reg, uint32_t real_offset,
                                      uint32_t *data)
 {
-    uint8_t cap_ver = get_capability_version(s, real_offset - reg->offset);
+    uint8_t cap_ver = get_pcie_capability_version(s);
     uint32_t reg_field = 0;
 
     /* no need to initialize in case of cap_ver 1.x */
@@ -1586,8 +1580,8 @@ static int xen_pt_pcie_size_init(XenPCIPassthroughState *s,
                                  uint32_t base_offset, uint8_t *size)
 {
     PCIDevice *d = &s->dev;
-    uint8_t version = get_capability_version(s, base_offset);
-    uint8_t type = get_device_type(s, base_offset);
+    uint8_t version = get_pcie_capability_version(s);
+    uint8_t type = get_pcie_device_type(s);
     uint8_t pcie_size = 0;
 
 
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 183+ messages in thread

* [Qemu-devel] [RFC PATCH 20/30] xen/pt: determine the legacy/PCIe mode for a passed through device
  2018-03-12 18:33 ` Alexey Gerasimenko
@ 2018-03-12 18:34   ` Alexey Gerasimenko
  -1 siblings, 0 replies; 183+ messages in thread
From: Alexey Gerasimenko @ 2018-03-12 18:34 UTC (permalink / raw)
  To: xen-devel
  Cc: Alexey Gerasimenko, qemu-devel, Stefano Stabellini, Anthony Perard

Even when we have a real PCIe device passed through to a guest, there
are situations when we cannot use its PCIe features, primarily access
to the extended (>256 bytes) config space.

Basically, we can allow reading PCIe extended config space only if both
the device and the emulated system are PCIe-capable. So it's a combination
of checks:
- PCI Express capability presence
- pci_is_express(device)
- pci_bus_is_express(device bus)

The AND-product of these checks is stored in the pcie_enabled_dev flag
in XenPCIPassthroughState for later use in functions like
xen_pt_pci_config_access_check.

This way we get consistent behavior when the same PCIe device is passed
through to either an i440 domain or a Q35 one.
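
A minimal sketch of this gating logic (the helper names are hypothetical;
in QEMU the three inputs come from pci_is_express(), pci_bus_is_express()
and a PCI Express capability lookup on the host device):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical condensation of the checks described above: PCIe mode
 * is enabled only when all three conditions hold. */
static bool pcie_mode_enabled(bool dev_is_express, bool bus_is_express,
                              bool host_has_exp_cap)
{
    return dev_is_express && bus_is_express && host_has_exp_cap;
}

/* Config space access gate: the legacy 256 bytes are always reachable,
 * offsets 0x100..0xFFF only for a device in PCIe mode. */
static bool config_access_allowed(unsigned int offset, bool pcie_enabled_dev)
{
    return offset < 0x100 || pcie_enabled_dev;
}
```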

Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
---
 hw/xen/xen_pt.c | 28 ++++++++++++++++++++++++++--
 hw/xen/xen_pt.h |  1 +
 2 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/hw/xen/xen_pt.c b/hw/xen/xen_pt.c
index 9b7a960de1..a902a9b685 100644
--- a/hw/xen/xen_pt.c
+++ b/hw/xen/xen_pt.c
@@ -687,6 +687,21 @@ static const MemoryListener xen_pt_io_listener = {
     .priority = 10,
 };
 
+static inline bool xen_pt_dev_is_pcie_mode(PCIDevice *d)
+{
+    XenPCIPassthroughState *s = XEN_PT_DEVICE(d);
+    PCIBus *bus = pci_get_bus(d);
+
+    if (bus != NULL) {
+        if (pci_is_express(d) && pci_bus_is_express(bus) &&
+            xen_host_pci_find_next_cap(&s->real_device, 0, PCI_CAP_ID_EXP)) {
+            return true;
+        }
+    }
+
+    return false;
+}
+
 static void
 xen_igd_passthrough_isa_bridge_create(XenPCIPassthroughState *s,
                                       XenHostPCIDevice *dev)
@@ -794,8 +809,17 @@ static void xen_pt_realize(PCIDevice *d, Error **errp)
                    s->real_device.dev, s->real_device.func);
     }
 
-    /* Initialize virtualized PCI configuration (Extended 256 Bytes) */
-    memset(d->config, 0, PCI_CONFIG_SPACE_SIZE);
+    s->pcie_enabled_dev = xen_pt_dev_is_pcie_mode(d);
+    if (s->pcie_enabled_dev) {
+        XEN_PT_LOG(d, "Host device %04x:%02x:%02x.%d passed thru "
+                   "in PCIe mode\n", s->real_device.domain,
+                    s->real_device.bus, s->real_device.dev,
+                    s->real_device.func);
+    }
+
+    /* Initialize virtualized PCI configuration space (256/4K bytes) */
+    memset(d->config, 0, pci_is_express(d) ? PCIE_CONFIG_SPACE_SIZE
+                                           : PCI_CONFIG_SPACE_SIZE);
 
     s->memory_listener = xen_pt_memory_listener;
     s->io_listener = xen_pt_io_listener;
diff --git a/hw/xen/xen_pt.h b/hw/xen/xen_pt.h
index aa39a9aa5f..1204acbdce 100644
--- a/hw/xen/xen_pt.h
+++ b/hw/xen/xen_pt.h
@@ -212,6 +212,7 @@ struct XenPCIPassthroughState {
 
     PCIHostDeviceAddress hostaddr;
     bool is_virtfn;
+    bool pcie_enabled_dev;
     bool permissive;
     bool permissive_warned;
     XenHostPCIDevice real_device;
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 183+ messages in thread

* [Qemu-devel] [RFC PATCH 21/30] xen/pt: Xen PCIe passthrough support for Q35: bypass PCIe topology check
  2018-03-12 18:33 ` Alexey Gerasimenko
@ 2018-03-12 18:34   ` Alexey Gerasimenko
  -1 siblings, 0 replies; 183+ messages in thread
From: Alexey Gerasimenko @ 2018-03-12 18:34 UTC (permalink / raw)
  To: xen-devel
  Cc: Alexey Gerasimenko, qemu-devel, Stefano Stabellini, Anthony Perard

Compared to the legacy i440 system, there are certain difficulties in
passing through PCIe devices to guest OSes like Windows 7 and above
on platforms with native support for the PCIe bus (in our case Q35). This
problem does not apply to older OSes like Windows XP -- PCIe
passthrough on such OSes works normally, as these OSes have
no support for PCIe-specific features and treat all PCIe devices as legacy
PCI ones.

The problem manifests itself as a "Code 10" error for a passed-through
PCIe device in Windows Device Manager (along with an exclamation mark on
it). A device with this error does not function, despite the fact that
Windows booted successfully while actually using this device, e.g. as
a primary VGA card with VBE features, LFB, etc. working properly during
boot time. It doesn't matter which PCI class the device has -- the problem
is common to GPUs, NICs, USB controllers, etc. At the same time, all these
devices can be passed through successfully using i440 emulation on the
same Windows 7+ OSes.

The actual root cause of the problem lies in the fact that the Windows
kernel (the PnP manager in particular), while processing the StartDevice
IRP, refuses to continue starting the device, and control flow doesn't
even reach the IRP handler in the device driver at all. The real reason
for this typically does not appear at the time the PnP manager tries to
start the device, but happens much earlier -- during the Windows boot
stage, while enumerating devices on a PCI/PCIe bus in the Windows pci.sys
driver. There is a set of checks for every discovered device on the PCIe
bus. Failing some of them leads to marking the discovered PCIe device as
'invalid' by setting a flag. Later on, the StartDevice attempt will fail
due to this flag, finally resulting in the Code 10 error.

The actual check in pci.sys which results in the PCIe device being marked
as 'invalid' in our case is a validation of the upstream PCIe bus
hierarchy to which the passed-through device belongs. Basically, pci.sys
checks whether the PCIe device has parent devices, such as a PCIe Root
Port or an upstream PCIe switch. In our case the PCIe device has no
parents and resides on bus 0 without e.g. a corresponding Root Port.

Therefore, in order to resolve this problem in an architecturally correct
way, we need to introduce to Xen some support for at least a trivial
non-flat PCI bus hierarchy. In the very simplest case - just one virtual
Root Port, on whose secondary bus all physical functions of the real
passed-through device will reside, e.g. a GPU and its HDAudio function.

This solution is not hard to implement technically, but there are multiple
limitations currently present in Xen (many related to each other):

- in many places the code is limited to bus 0 only. This applies
  to both the hypervisor and supplemental modules like hvmloader. This
  limitation is enforced at the API level -- many functions and interfaces
  allow specifying only a devfn argument, with bus 0 being implied.

- a lot of code assumes a Type0 PCI config space layout only, while we
  need to handle Type1 PCI devices as well

- there is currently no way to assign to a guest domain even the simplest
  linked hierarchy of passed-through PCI devices. In some cases we might
  need to pass through a real PCIe Switch/Root Port with its downstream
  child devices.

- in a similar way, Xen/hvmloader lacks the concept of IO/MMIO space
  nesting. Both the code which does MMIO hole sizing and the code which
  allocates BARs in the MMIO hole have no idea of MMIO range nesting and
  relations. In the case of a virtual Root Port we basically have an
  emulated PCI-PCI bridge with some parts of its MMIO range used for real
  MMIO ranges of the passed-through device(s).

So, adding multiple PCI bus support to Xen will require a bit of effort
and discussion regarding the actual design of the feature. Nevertheless,
this task is crucial for the PCI/GPU passthrough features of Xen to work
properly.

To summarize, we need to implement the following things in the future:
1) Get rid of the PCI bus 0 limitation everywhere. This could have been
  the simplest of the subtasks, but in reality it will require changing
  interfaces as well - AFAIR even adding a PCI device via QMP only allows
  specifying a device slot, while we need some way to place the
  device on an arbitrary bus.

2) A fully or partially emulated PCI-PCI bridge which will provide
  a secondary bus for PCIe device placement - there might be a possibility
  to reuse some existing emulation QEMU provides. This also includes Type1
  device support.
  The task becomes more complicated if the necessity arises, for
  example, to control the PCIe link for a passed-through PCIe device. As
  PT device reset is mandatory in most cases, we might
  encounter a situation where we need to retrain the PCIe link to restore
  the PCIe link speed after the reset. In this case there will be a need
  to selectively translate accesses to certain registers of the emulated
  PCIe Switch/Root Port to the corresponding physical upstream PCIe
  Switch/Root Port. This will require some interaction with Dom0;
  hopefully extending xen-pciback will be enough.

3) The concept of I/O and MMIO range nesting, for tasks like MMIO hole
  sizing or PCI BAR allocation. This one should be pretty simple.

The actual implementation is still a matter for discussion, of course.

In the meantime a very simple workaround can be used to bypass the
pci.sys PCIe topology check - there exists one good exception to the
"must have an upstream PCIe parent" rule of pci.sys: chipset-integrated
devices. How can pci.sys tell that it deals with a chipset built-in
device? It checks one of the PCI Express Capability fields in the
device's PCI config space. For chipset built-in devices this field will
state "root complex integrated device", while in our case, for a normal
passed-through PCIe device, there will be a "PCIe endpoint" type. So
that's what the workaround does - it intercepts reads of this particular
field for passed-through devices and returns the "root complex integrated
device" value for PCIe endpoints. This makes pci.sys happy and allows
Windows 7 and above to use a PT device on a PCIe-capable system normally.
So far no negative side effects have been encountered with this approach,
so it's a good temporary solution until multiple PCI bus support is added
to Xen.
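
The field rewrite performed by this workaround can be sketched in
isolation (the constants mirror Linux's pci_regs.h; the helper name is
hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Constants matching the PCIe spec and Linux's pci_regs.h. */
#define PCI_EXP_FLAGS_TYPE    0x00f0 /* Device/Port type field */
#define PCI_EXP_TYPE_ENDPOINT 0x0    /* Express endpoint */
#define PCI_EXP_TYPE_LEG_END  0x1    /* Legacy endpoint */
#define PCI_EXP_TYPE_RC_END   0x9    /* Root Complex integrated endpoint */

/* Hypothetical sketch of the workaround: rewrite the Device/Port Type
 * field of the PCI Express Capabilities register for ordinary endpoints
 * so pci.sys sees a Root Complex integrated device; all other device
 * types are returned unchanged. */
static uint16_t fake_pcie_caps_reg(uint16_t reg_field)
{
    uint8_t dev_type = (reg_field & PCI_EXP_FLAGS_TYPE) >> 4;

    if (dev_type == PCI_EXP_TYPE_ENDPOINT ||
        dev_type == PCI_EXP_TYPE_LEG_END) {
        reg_field &= ~PCI_EXP_FLAGS_TYPE;
        reg_field |= (PCI_EXP_TYPE_RC_END << 4) & PCI_EXP_FLAGS_TYPE;
    }
    return reg_field;
}
```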

Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
---
 hw/xen/xen_pt_config_init.c | 60 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 60 insertions(+)

diff --git a/hw/xen/xen_pt_config_init.c b/hw/xen/xen_pt_config_init.c
index 02e8c97f3c..91de215407 100644
--- a/hw/xen/xen_pt_config_init.c
+++ b/hw/xen/xen_pt_config_init.c
@@ -902,6 +902,55 @@ static int xen_pt_linkctrl2_reg_init(XenPCIPassthroughState *s,
     *data = reg_field;
     return 0;
 }
+/* initialize PCI Express Capabilities register */
+static int xen_pt_pcie_capabilities_reg_init(XenPCIPassthroughState *s,
+                                             XenPTRegInfo *reg,
+                                             uint32_t real_offset,
+                                             uint32_t *data)
+{
+    uint8_t dev_type = get_pcie_device_type(s);
+    uint16_t reg_field;
+
+    if (xen_host_pci_get_word(&s->real_device,
+                             real_offset - reg->offset + PCI_EXP_FLAGS,
+                             &reg_field)) {
+        XEN_PT_ERR(&s->dev, "Error reading PCIe Capabilities reg\n");
+        *data = 0;
+        return 0;
+    }
+
+    /*
+     * Q35 workaround for Win7+ pci.sys PCIe topology check.
+     * As our PT device is currently located on bus 0, fake the
+     * device/port type field to the "Root Complex integrated device"
+     * value to bypass the check
+     */
+    switch (dev_type) {
+    case PCI_EXP_TYPE_ENDPOINT:
+    case PCI_EXP_TYPE_LEG_END:
+        XEN_PT_LOG(&s->dev, "Original PCIe Capabilities reg is 0x%04X\n",
+            reg_field);
+        reg_field &= ~PCI_EXP_FLAGS_TYPE;
+        reg_field |= ((PCI_EXP_TYPE_RC_END /*9*/ << 4) & PCI_EXP_FLAGS_TYPE);
+        XEN_PT_LOG(&s->dev, "Q35 PCIe topology check workaround: "
+                   "faking Capabilities reg to 0x%04X\n", reg_field);
+        break;
+
+    case PCI_EXP_TYPE_ROOT_PORT:
+    case PCI_EXP_TYPE_UPSTREAM:
+    case PCI_EXP_TYPE_DOWNSTREAM:
+    case PCI_EXP_TYPE_PCI_BRIDGE:
+    case PCI_EXP_TYPE_PCIE_BRIDGE:
+    case PCI_EXP_TYPE_RC_END:
+    case PCI_EXP_TYPE_RC_EC:
+    default:
+        /* do nothing, return as is */
+        break;
+    }
+
+    *data = reg_field;
+    return 0;
+}
 
 /* PCI Express Capability Structure reg static information table */
 static XenPTRegInfo xen_pt_emu_reg_pcie[] = {
@@ -916,6 +965,17 @@ static XenPTRegInfo xen_pt_emu_reg_pcie[] = {
         .u.b.read   = xen_pt_byte_reg_read,
         .u.b.write  = xen_pt_byte_reg_write,
     },
+    /* PCI Express Capabilities Register */
+    {
+        .offset     = PCI_EXP_FLAGS,
+        .size       = 2,
+        .init_val   = 0x0000,
+        .ro_mask    = 0xFFFF,
+        .emu_mask   = 0xFFFF,
+        .init       = xen_pt_pcie_capabilities_reg_init,
+        .u.w.read   = xen_pt_word_reg_read,
+        .u.w.write  = xen_pt_word_reg_write,
+    },
     /* Device Capabilities reg */
     {
         .offset     = PCI_EXP_DEVCAP,
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 183+ messages in thread

* [RFC PATCH 21/30] xen/pt: Xen PCIe passthrough support for Q35: bypass PCIe topology check
@ 2018-03-12 18:34   ` Alexey Gerasimenko
  0 siblings, 0 replies; 183+ messages in thread
From: Alexey Gerasimenko @ 2018-03-12 18:34 UTC (permalink / raw)
  To: xen-devel
  Cc: Anthony Perard, Stefano Stabellini, Alexey Gerasimenko, qemu-devel

Compared to legacy i440 system, there are certain difficulties while
passing through PCIe devices to guest OSes like Windows 7 and above
on platforms with native support of PCIe bus (in our case Q35). This
problem is not applicable to older OSes like Windows XP -- PCIe
passthrough on such OSes can be used normally as these OSes have
no support for PCIe-specific features and treat all PCIe devices as legacy
PCI ones.

The problem manifests itself as "Code 10" error for a passed thru PCIe
device in Windows Device Manager (along with exclamation mark on it). The
device with such error do not function no matter the fact that Windows
successfully booted while actually using this device, ex. as a primary VGA
card with VBE features, LFB, etc. working properly during boot time.
It doesn't matter which PCI class the device have -- the problem is common
to GPUs, NIC cards, USB controllers, etc. In the same time, all these
devices can be passed thru successfully using i440 emulation on same
Windows 7+ OSes.

The actual root cause of the problem lies in the fact that Windows kernel
(PnP manager particularly) while processing StartDevice IRP refuses
to continue to start the device and control flow actually doesn't even
reach the IRP handler in the device driver at all. The real reason for
this typically does not appear at the time PnP manager tries to start the
device, but happens much earlier -- during the Windows boot stage, while
enumerating devices on a PCI/PCIe bus in the Windows pci.sys driver. There
is a set of checks for every discovered device on the PCIe bus. Failing
some of them leads to marking the discovered PCIe device as 'invalid'
by setting the flag. Later on, StartDevice attempt will fail due to this
flag, finally resulting in Code 10 error.

The actual check in pci.sys which results in the PCIe device being marked
as 'invalid' in our case is a validation of upstream PCIe bus hierarchy
to which passed through device belongs. Basically, pci.sys checks if the
PCIe device has parent devices, such as PCIe Root Port or upstream PCIe
switch. In our case the PCIe device has no parents and resides on bus
0 without eg. corresponding Root Port.

Therefore, in order to resolve this problem in a architecturally correct
way, we need to introduce to Xen some support of at least trivial non-flat
PCI bus hierarchy. In very simplest case - just one virtual Root Port,
on secondary bus of which all physical functions of the real passed thru
device will reside, eg. GPU and its HDAudio function.

This solution is not hard to implement technically, but there are
multiple limiting factors currently present in Xen (many related to each
other):

- in many places the code is limited to bus 0 only. This applies to both
  the hypervisor and supplemental modules like hvmloader. This limitation
  is enforced at the API level -- many functions and interfaces allow
  specifying only a devfn argument, with bus 0 being implied.

- a lot of code assumes a Type0 PCI config space layout only, while we
  need to handle Type1 PCI devices as well

- currently there is no way to assign to a guest domain even the
  simplest linked hierarchy of passed-through PCI devices. In some cases
  we might need to pass through a real PCIe Switch/Root Port along with
  its downstream child devices.

- in a similar way, Xen/hvmloader lacks the concept of IO/MMIO space
  nesting. Both the code which does MMIO hole sizing and the code which
  allocates BARs in the MMIO hole have no notion of MMIO range nesting
  and the relations between ranges. In the case of a virtual Root Port we
  basically have an emulated PCI-PCI bridge with parts of its MMIO range
  used for the real MMIO ranges of the passed-through device(s).

So, adding multiple PCI bus support to Xen will require a bit of effort
and discussion regarding the actual design of the feature. Nevertheless,
this task is crucial for Xen's PCI/GPU passthrough features to work
properly.

To summarize, we need to implement the following things in the future:
1) Get rid of the PCI bus 0 limitation everywhere. This could have been
  the simplest of the subtasks, but in reality it will require changing
  interfaces as well -- AFAIR even adding a PCI device via QMP only
  allows specifying a device slot, while we need some way to place the
  device on an arbitrary bus.

2) A fully or partially emulated PCI-PCI bridge which will provide
  a secondary bus for PCIe device placement -- there might be
  a possibility to reuse some existing emulation QEMU provides. This
  also includes Type1 device support.
  The task becomes more complicated if the necessity arises, for
  example, to control the PCIe link of a passed-through PCIe device. As
  PT device reset is mandatory in most cases, we may encounter
  a situation where we need to retrain the PCIe link to restore the link
  speed after the reset. In that case accesses to certain registers of
  the emulated PCIe Switch/Root Port will have to be selectively
  translated to the corresponding physical upstream PCIe Switch/Root
  Port. This will require some interaction with Dom0; hopefully
  extending xen-pciback will be enough.

3) The concept of I/O and MMIO range nesting, for tasks like MMIO hole
  sizing or PCI BAR allocation. This one should be pretty simple.

The actual implementation is still a matter for discussion, of course.

In the meantime a very simple workaround can be used to bypass the
pci.sys PCIe topology check -- there exists one good exception to the
"must have an upstream PCIe parent" rule of pci.sys: chipset-integrated
devices. How can pci.sys tell that it is dealing with a chipset built-in
device? It checks one of the PCI Express Capability fields in the
device's PCI config space. For chipset built-in devices this field reads
"Root Complex Integrated Endpoint", while a normal passed through PCIe
device in our case reports the "PCIe Endpoint" type. So that's what the
workaround does -- it intercepts reads of this particular field for
passed through devices and returns the "Root Complex Integrated
Endpoint" value for PCIe endpoints. This makes pci.sys happy and allows
Windows 7 and above to use the PT device normally on a PCIe-capable
system. So far no negative side effects have been encountered with this
approach, so it's a good temporary solution until multiple PCI bus
support is added to Xen.

Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
---
 hw/xen/xen_pt_config_init.c | 60 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 60 insertions(+)

diff --git a/hw/xen/xen_pt_config_init.c b/hw/xen/xen_pt_config_init.c
index 02e8c97f3c..91de215407 100644
--- a/hw/xen/xen_pt_config_init.c
+++ b/hw/xen/xen_pt_config_init.c
@@ -902,6 +902,55 @@ static int xen_pt_linkctrl2_reg_init(XenPCIPassthroughState *s,
     *data = reg_field;
     return 0;
 }
+/* initialize PCI Express Capabilities register */
+static int xen_pt_pcie_capabilities_reg_init(XenPCIPassthroughState *s,
+                                             XenPTRegInfo *reg,
+                                             uint32_t real_offset,
+                                             uint32_t *data)
+{
+    uint8_t dev_type = get_pcie_device_type(s);
+    uint16_t reg_field;
+
+    if (xen_host_pci_get_word(&s->real_device,
+                             real_offset - reg->offset + PCI_EXP_FLAGS,
+                             &reg_field)) {
+        XEN_PT_ERR(&s->dev, "Error reading PCIe Capabilities reg\n");
+        *data = 0;
+        return 0;
+    }
+
+    /*
+     * Q35 workaround for Win7+ pci.sys PCIe topology check.
+     * As our PT device is currently located on bus 0, fake the
+     * device/port type field to the "Root Complex integrated device"
+     * value to bypass the check
+     */
+    switch (dev_type) {
+    case PCI_EXP_TYPE_ENDPOINT:
+    case PCI_EXP_TYPE_LEG_END:
+        XEN_PT_LOG(&s->dev, "Original PCIe Capabilities reg is 0x%04X\n",
+            reg_field);
+        reg_field &= ~PCI_EXP_FLAGS_TYPE;
+        reg_field |= ((PCI_EXP_TYPE_RC_END /*9*/ << 4) & PCI_EXP_FLAGS_TYPE);
+        XEN_PT_LOG(&s->dev, "Q35 PCIe topology check workaround: "
+                   "faking Capabilities reg to 0x%04X\n", reg_field);
+        break;
+
+    case PCI_EXP_TYPE_ROOT_PORT:
+    case PCI_EXP_TYPE_UPSTREAM:
+    case PCI_EXP_TYPE_DOWNSTREAM:
+    case PCI_EXP_TYPE_PCI_BRIDGE:
+    case PCI_EXP_TYPE_PCIE_BRIDGE:
+    case PCI_EXP_TYPE_RC_END:
+    case PCI_EXP_TYPE_RC_EC:
+    default:
+        /* do nothing, return as is */
+        break;
+    }
+
+    *data = reg_field;
+    return 0;
+}
 
 /* PCI Express Capability Structure reg static information table */
 static XenPTRegInfo xen_pt_emu_reg_pcie[] = {
@@ -916,6 +965,17 @@ static XenPTRegInfo xen_pt_emu_reg_pcie[] = {
         .u.b.read   = xen_pt_byte_reg_read,
         .u.b.write  = xen_pt_byte_reg_write,
     },
+    /* PCI Express Capabilities Register */
+    {
+        .offset     = PCI_EXP_FLAGS,
+        .size       = 2,
+        .init_val   = 0x0000,
+        .ro_mask    = 0xFFFF,
+        .emu_mask   = 0xFFFF,
+        .init       = xen_pt_pcie_capabilities_reg_init,
+        .u.w.read   = xen_pt_word_reg_read,
+        .u.w.write  = xen_pt_word_reg_write,
+    },
     /* Device Capabilities reg */
     {
         .offset     = PCI_EXP_DEVCAP,
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 183+ messages in thread

* [Qemu-devel] [RFC PATCH 22/30] xen/pt: add support for PCIe Extended Capabilities and larger config space
  2018-03-12 18:33 ` Alexey Gerasimenko
@ 2018-03-12 18:34   ` Alexey Gerasimenko
  -1 siblings, 0 replies; 183+ messages in thread
From: Alexey Gerasimenko @ 2018-03-12 18:34 UTC (permalink / raw)
  To: xen-devel
  Cc: Alexey Gerasimenko, qemu-devel, Stefano Stabellini, Anthony Perard

This patch provides basic facilities for PCIe Extended Capabilities and
support for controlled (via the s->pcie_enabled_dev flag) access to the
extended PCIe config space (offsets above 0xFF).

PCIe Extended Capabilities make use of a 16-bit capability ID. Also,
a capability's size might exceed 8-bit width. So as the very first step
we need to increase the type size for grp_id, grp_size, etc. -- they
were limited to 8 bits.

The only troublesome issue with PCIe Extended Capability IDs is that
their value range actually overlaps with that of basic PCI capabilities.
E.g. capability ID 3 means the VPD Capability for PCI and at the same
time the Device Serial Number Capability for PCIe Extended caps. This
adds a bit of inconvenience.

In order to distinguish between the two sets of identical capability
IDs, the patch introduces a set of macros to mark a capability ID as
a PCIe Extended one (or to check whether it is basic/extended and to get
the raw ID value):
- PCIE_EXT_CAP_ID(cap_id)
- IS_PCIE_EXT_CAP_ID(grp_id)
- GET_PCIE_EXT_CAP_ID(grp_id)

Here is how it's used:
    /* Intel IGD Opregion group */
    {
        .grp_id      = XEN_PCI_INTEL_OPREGION,  /* no change */
        .grp_type    = XEN_PT_GRP_TYPE_EMU,
        .grp_size    = 0x4,
        .size_init   = xen_pt_reg_grp_size_init,
        .emu_regs    = xen_pt_emu_reg_igd_opregion,
    },
    /* Vendor-specific Extended Capability reg group */
    {
        .grp_id      = PCIE_EXT_CAP_ID(PCI_EXT_CAP_ID_VNDR),
        .grp_type    = XEN_PT_GRP_TYPE_EMU,
        .grp_size    = 0xFF,
        .size_init   = xen_pt_ext_cap_vendor_size_init,
        .emu_regs    = xen_pt_ext_cap_emu_reg_vendor,
    },
By using the PCIE_EXT_CAP_ID() macro it is possible to reuse existing
header files with already defined PCIe Extended Capability ID values.

find_cap_offset() receives a capability ID and checks whether it's an
Extended one using the IS_PCIE_EXT_CAP_ID(cap) macro, passing the real
capability ID value to either xen_host_pci_find_next_ext_cap()
or xen_host_pci_find_next_cap().

Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
---
 hw/xen/xen_pt.c             |  14 +++++-
 hw/xen/xen_pt.h             |  13 +++--
 hw/xen/xen_pt_config_init.c | 113 +++++++++++++++++++++-----------------------
 3 files changed, 74 insertions(+), 66 deletions(-)

diff --git a/hw/xen/xen_pt.c b/hw/xen/xen_pt.c
index a902a9b685..bf098c26b3 100644
--- a/hw/xen/xen_pt.c
+++ b/hw/xen/xen_pt.c
@@ -82,10 +82,20 @@ void xen_pt_log(const PCIDevice *d, const char *f, ...)
 
 /* Config Space */
 
-static int xen_pt_pci_config_access_check(PCIDevice *d, uint32_t addr, int len)
+static int xen_pt_pci_config_access_check(PCIDevice *d,
+                                          uint32_t addr, int len)
 {
+    XenPCIPassthroughState *s = XEN_PT_DEVICE(d);
+
     /* check offset range */
-    if (addr > 0xFF) {
+    if (s->pcie_enabled_dev) {
+        if (addr >= PCIE_CONFIG_SPACE_SIZE) {
+            XEN_PT_ERR(d, "Failed to access register with offset "
+                          "exceeding 0xFFF. (addr: 0x%02x, len: %d)\n",
+                          addr, len);
+            return -1;
+        }
+    } else if (addr >= PCI_CONFIG_SPACE_SIZE) {
         XEN_PT_ERR(d, "Failed to access register with offset exceeding 0xFF. "
                    "(addr: 0x%02x, len: %d)\n", addr, len);
         return -1;
diff --git a/hw/xen/xen_pt.h b/hw/xen/xen_pt.h
index 1204acbdce..5531347ab2 100644
--- a/hw/xen/xen_pt.h
+++ b/hw/xen/xen_pt.h
@@ -31,6 +31,11 @@ void xen_pt_log(const PCIDevice *d, const char *f, ...) GCC_FMT_ATTR(2, 3);
 /* Helper */
 #define XEN_PFN(x) ((x) >> XC_PAGE_SHIFT)
 
+/* Macros for PCIe Extended Capabilities */
+#define PCIE_EXT_CAP_ID(cap_id)     ((cap_id) | (1U << 16))
+#define IS_PCIE_EXT_CAP_ID(grp_id)  ((grp_id) & (1U << 16))
+#define GET_PCIE_EXT_CAP_ID(grp_id) ((grp_id) & 0xFFFF)
+
 typedef const struct XenPTRegInfo XenPTRegInfo;
 typedef struct XenPTReg XenPTReg;
 
@@ -152,13 +157,13 @@ typedef const struct XenPTRegGroupInfo XenPTRegGroupInfo;
 /* emul reg group size initialize method */
 typedef int (*xen_pt_reg_size_init_fn)
     (XenPCIPassthroughState *, XenPTRegGroupInfo *,
-     uint32_t base_offset, uint8_t *size);
+     uint32_t base_offset, uint32_t *size);
 
 /* emulated register group information */
 struct XenPTRegGroupInfo {
-    uint8_t grp_id;
+    uint32_t grp_id;
     XenPTRegisterGroupType grp_type;
-    uint8_t grp_size;
+    uint32_t grp_size;
     xen_pt_reg_size_init_fn size_init;
     XenPTRegInfo *emu_regs;
 };
@@ -168,7 +173,7 @@ typedef struct XenPTRegGroup {
     QLIST_ENTRY(XenPTRegGroup) entries;
     XenPTRegGroupInfo *reg_grp;
     uint32_t base_offset;
-    uint8_t size;
+    uint32_t size;
     QLIST_HEAD(, XenPTReg) reg_tbl_list;
 } XenPTRegGroup;
 
diff --git a/hw/xen/xen_pt_config_init.c b/hw/xen/xen_pt_config_init.c
index 91de215407..9c041fa288 100644
--- a/hw/xen/xen_pt_config_init.c
+++ b/hw/xen/xen_pt_config_init.c
@@ -32,29 +32,42 @@ static int xen_pt_ptr_reg_init(XenPCIPassthroughState *s, XenPTRegInfo *reg,
 /* helper */
 
 /* A return value of 1 means the capability should NOT be exposed to guest. */
-static int xen_pt_hide_dev_cap(const XenHostPCIDevice *d, uint8_t grp_id)
+static int xen_pt_hide_dev_cap(const XenHostPCIDevice *d, uint32_t grp_id)
 {
-    switch (grp_id) {
-    case PCI_CAP_ID_EXP:
-        /* The PCI Express Capability Structure of the VF of Intel 82599 10GbE
-         * Controller looks trivial, e.g., the PCI Express Capabilities
-         * Register is 0. We should not try to expose it to guest.
-         *
-         * The datasheet is available at
-         * http://download.intel.com/design/network/datashts/82599_datasheet.pdf
-         *
-         * See 'Table 9.7. VF PCIe Configuration Space' of the datasheet, the
-         * PCI Express Capability Structure of the VF of Intel 82599 10GbE
-         * Controller looks trivial, e.g., the PCI Express Capabilities
-         * Register is 0, so the Capability Version is 0 and
-         * xen_pt_pcie_size_init() would fail.
-         */
-        if (d->vendor_id == PCI_VENDOR_ID_INTEL &&
-            d->device_id == PCI_DEVICE_ID_INTEL_82599_SFP_VF) {
-            return 1;
+    if (IS_PCIE_EXT_CAP_ID(grp_id)) {
+        switch (GET_PCIE_EXT_CAP_ID(grp_id)) {
+            /* Here can be added device-specific filtering
+             * for PCIe Extended capabilities (those with offset >= 0x100).
+             * This is simply a placeholder as no filtering needed for now.
+             */
+        default:
+            break;
+        }
+    } else {
+        /* basic PCI capability */
+        switch (grp_id) {
+        case PCI_CAP_ID_EXP:
+            /* The PCI Express Capability Structure of the VF of Intel 82599 10GbE
+             * Controller looks trivial, e.g., the PCI Express Capabilities
+             * Register is 0. We should not try to expose it to guest.
+             *
+             * The datasheet is available at
+             * http://download.intel.com/design/network/datashts/82599_datasheet.pdf
+             *
+             * See 'Table 9.7. VF PCIe Configuration Space' of the datasheet, the
+             * PCI Express Capability Structure of the VF of Intel 82599 10GbE
+             * Controller looks trivial, e.g., the PCI Express Capabilities
+             * Register is 0, so the Capability Version is 0 and
+             * xen_pt_pcie_size_init() would fail.
+             */
+            if (d->vendor_id == PCI_VENDOR_ID_INTEL &&
+                d->device_id == PCI_DEVICE_ID_INTEL_82599_SFP_VF) {
+                return 1;
+            }
+            break;
         }
-        break;
     }
+
     return 0;
 }
 
@@ -1622,7 +1635,7 @@ static XenPTRegInfo xen_pt_emu_reg_igd_opregion[] = {
 
 static int xen_pt_reg_grp_size_init(XenPCIPassthroughState *s,
                                     const XenPTRegGroupInfo *grp_reg,
-                                    uint32_t base_offset, uint8_t *size)
+                                    uint32_t base_offset, uint32_t *size)
 {
     *size = grp_reg->grp_size;
     return 0;
@@ -1630,14 +1643,18 @@ static int xen_pt_reg_grp_size_init(XenPCIPassthroughState *s,
 /* get Vendor Specific Capability Structure register group size */
 static int xen_pt_vendor_size_init(XenPCIPassthroughState *s,
                                    const XenPTRegGroupInfo *grp_reg,
-                                   uint32_t base_offset, uint8_t *size)
+                                   uint32_t base_offset, uint32_t *size)
 {
-    return xen_host_pci_get_byte(&s->real_device, base_offset + 0x02, size);
+    uint8_t sz = 0;
+    int ret = xen_host_pci_get_byte(&s->real_device, base_offset + 0x02, &sz);
+
+    *size = sz;
+    return ret;
 }
 /* get PCI Express Capability Structure register group size */
 static int xen_pt_pcie_size_init(XenPCIPassthroughState *s,
                                  const XenPTRegGroupInfo *grp_reg,
-                                 uint32_t base_offset, uint8_t *size)
+                                 uint32_t base_offset, uint32_t *size)
 {
     PCIDevice *d = &s->dev;
     uint8_t version = get_pcie_capability_version(s);
@@ -1709,7 +1726,7 @@ static int xen_pt_pcie_size_init(XenPCIPassthroughState *s,
 /* get MSI Capability Structure register group size */
 static int xen_pt_msi_size_init(XenPCIPassthroughState *s,
                                 const XenPTRegGroupInfo *grp_reg,
-                                uint32_t base_offset, uint8_t *size)
+                                uint32_t base_offset, uint32_t *size)
 {
     uint16_t msg_ctrl = 0;
     uint8_t msi_size = 0xa;
@@ -1737,7 +1754,7 @@ static int xen_pt_msi_size_init(XenPCIPassthroughState *s,
 /* get MSI-X Capability Structure register group size */
 static int xen_pt_msix_size_init(XenPCIPassthroughState *s,
                                  const XenPTRegGroupInfo *grp_reg,
-                                 uint32_t base_offset, uint8_t *size)
+                                 uint32_t base_offset, uint32_t *size)
 {
     int rc = 0;
 
@@ -1920,44 +1937,20 @@ out:
  * Main
  */
 
-static uint8_t find_cap_offset(XenPCIPassthroughState *s, uint8_t cap)
+static uint32_t find_cap_offset(XenPCIPassthroughState *s, uint32_t cap)
 {
-    uint8_t id;
-    unsigned max_cap = XEN_PCI_CAP_MAX;
-    uint8_t pos = PCI_CAPABILITY_LIST;
-    uint8_t status = 0;
+    uint32_t retval = 0;
 
-    if (xen_host_pci_get_byte(&s->real_device, PCI_STATUS, &status)) {
-        return 0;
-    }
-    if ((status & PCI_STATUS_CAP_LIST) == 0) {
-        return 0;
-    }
-
-    while (max_cap--) {
-        if (xen_host_pci_get_byte(&s->real_device, pos, &pos)) {
-            break;
-        }
-        if (pos < PCI_CONFIG_HEADER_SIZE) {
-            break;
+    if (IS_PCIE_EXT_CAP_ID(cap)) {
+        if (s->pcie_enabled_dev) {
+            retval = xen_host_pci_find_next_ext_cap(&s->real_device, 0,
+                                                    GET_PCIE_EXT_CAP_ID(cap));
         }
-
-        pos &= ~3;
-        if (xen_host_pci_get_byte(&s->real_device,
-                                  pos + PCI_CAP_LIST_ID, &id)) {
-            break;
-        }
-
-        if (id == 0xff) {
-            break;
-        }
-        if (id == cap) {
-            return pos;
-        }
-
-        pos += PCI_CAP_LIST_NEXT;
+    } else {
+        retval = xen_host_pci_find_next_cap(&s->real_device, 0, cap);
     }
-    return 0;
+
+    return retval;
 }
 
 static void xen_pt_config_reg_init(XenPCIPassthroughState *s,
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 183+ messages in thread

* [Qemu-devel] [RFC PATCH 23/30] xen/pt: handle PCIe Extended Capabilities Next register
  2018-03-12 18:33 ` Alexey Gerasimenko
@ 2018-03-12 18:34   ` Alexey Gerasimenko
  -1 siblings, 0 replies; 183+ messages in thread
From: Alexey Gerasimenko @ 2018-03-12 18:34 UTC (permalink / raw)
  To: xen-devel
  Cc: Alexey Gerasimenko, qemu-devel, Stefano Stabellini, Anthony Perard

The patch adds a new xen_pt_ext_cap_ptr_reg_init function which is used
to initialize the emulated Next pointer of a PCIe Extended Capability.

The primary purpose of this function is to provide a method to
selectively hide some extended capabilities from the capability linked
list, skipping them by altering the Next capability pointer value.

Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
---
 hw/xen/xen_pt_config_init.c | 73 +++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 71 insertions(+), 2 deletions(-)

diff --git a/hw/xen/xen_pt_config_init.c b/hw/xen/xen_pt_config_init.c
index 9c041fa288..0ce2a033f9 100644
--- a/hw/xen/xen_pt_config_init.c
+++ b/hw/xen/xen_pt_config_init.c
@@ -23,11 +23,14 @@
 
 #define XEN_PT_INVALID_REG          0xFFFFFFFF      /* invalid register value */
 
-/* prototype */
+/* prototypes */
 
 static int xen_pt_ptr_reg_init(XenPCIPassthroughState *s, XenPTRegInfo *reg,
                                uint32_t real_offset, uint32_t *data);
-
+static int xen_pt_ext_cap_ptr_reg_init(XenPCIPassthroughState *s,
+                                       XenPTRegInfo *reg,
+                                       uint32_t real_offset,
+                                       uint32_t *data);
 
 /* helper */
 
@@ -1932,6 +1935,72 @@ out:
     return 0;
 }
 
+#define PCIE_EXT_CAP_NEXT_SHIFT 4
+#define PCIE_EXT_CAP_VER_MASK   0xF
+
+static int xen_pt_ext_cap_ptr_reg_init(XenPCIPassthroughState *s,
+                                       XenPTRegInfo *reg,
+                                       uint32_t real_offset,
+                                       uint32_t *data)
+{
+    int i, rc;
+    XenHostPCIDevice *d = &s->real_device;
+    uint16_t reg_field;
+    uint16_t cur_offset, version, cap_id;
+    uint32_t header;
+
+    if (real_offset < PCI_CONFIG_SPACE_SIZE) {
+        XEN_PT_ERR(&s->dev, "Incorrect PCIe extended capability offset "
+                   "encountered: 0x%04x\n", real_offset);
+        return -EINVAL;
+    }
+
+    rc = xen_host_pci_get_word(d, real_offset, &reg_field);
+    if (rc)
+        return rc;
+
+    /* preserve version field */
+    version    = reg_field & PCIE_EXT_CAP_VER_MASK;
+    cur_offset = reg_field >> PCIE_EXT_CAP_NEXT_SHIFT;
+
+    while (cur_offset && cur_offset != 0xFFF) {
+        rc = xen_host_pci_get_long(d, cur_offset, &header);
+        if (rc) {
+            XEN_PT_ERR(&s->dev, "Failed to read PCIe extended capability "
+                       "@0x%x (rc:%d)\n", cur_offset, rc);
+            return rc;
+        }
+
+        cap_id = PCI_EXT_CAP_ID(header);
+
+        for (i = 0; xen_pt_emu_reg_grps[i].grp_size != 0; i++) {
+            uint32_t cur_grp_id = xen_pt_emu_reg_grps[i].grp_id;
+
+            if (!IS_PCIE_EXT_CAP_ID(cur_grp_id))
+                continue;
+
+            if (xen_pt_hide_dev_cap(d, cur_grp_id))
+                continue;
+
+            if (GET_PCIE_EXT_CAP_ID(cur_grp_id) == cap_id) {
+                if (xen_pt_emu_reg_grps[i].grp_type == XEN_PT_GRP_TYPE_EMU)
+                    goto out;
+
+                /* skip TYPE_HARDWIRED capability, move the ptr to next one */
+                break;
+            }
+        }
+
+        /* next capability */
+        cur_offset = PCI_EXT_CAP_NEXT(header);
+    }
+
+out:
+    *data = (cur_offset << PCIE_EXT_CAP_NEXT_SHIFT) | version;
+    return 0;
+}
+
+
 
 /*************
  * Main
-- 
2.11.0


* [Qemu-devel] [RFC PATCH 24/30] xen/pt: allow to hide PCIe Extended Capabilities
  2018-03-12 18:33 ` Alexey Gerasimenko
@ 2018-03-12 18:34   ` Alexey Gerasimenko
  -1 siblings, 0 replies; 183+ messages in thread
From: Alexey Gerasimenko @ 2018-03-12 18:34 UTC (permalink / raw)
  To: xen-devel
  Cc: Alexey Gerasimenko, qemu-devel, Stefano Stabellini, Anthony Perard

We need to hide some unwanted PCI/PCIe capabilities for passed through
devices.
Normally we do this by marking the capability register group
as XEN_PT_GRP_TYPE_HARDWIRED, which excludes the capability from the
capability list and returns zeroes on attempts to read the capability body.
Skipping a capability in the linked list of capabilities can be done
by changing the Next Capability register to bypass one or more unwanted
capabilities.

One difference between PCI and PCIe Extended capabilities is that we don't
have the list head field anymore. PCIe Extended capabilities always start
at offset 0x100 if they're present. Unfortunately, there are typically
only a few PCIe extended capabilities present, which means there is a chance
that some capability we want to hide will reside at offset 0x100 in PCIe
config space.

The simplest way to hide such capabilities from guest OS or drivers
is faking their capability ID value.

This patch adds a Capability ID register handler which checks
- whether the capability to which this register belongs starts at offset
  0x100 in PCIe config space
- whether this capability is marked as XEN_PT_GRP_TYPE_HARDWIRED

If both conditions hold, a fake Capability ID value is returned.
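A minimal standalone sketch of this decision follows (the names
`visible_cap_id` and `grp_type` are illustrative, not the patch's
identifiers; the actual patch keeps the fake-ID counter in a static
variable seeded with XEN_PCIE_FAKE_CAP_ID_BASE):

```c
#include <assert.h>
#include <stdint.h>

#define PCI_CONFIG_SPACE_SIZE   0x100
#define FAKE_CAP_ID_BASE        0xFE00  /* mirrors XEN_PCIE_FAKE_CAP_ID_BASE */

enum grp_type { GRP_TYPE_HARDWIRED, GRP_TYPE_EMU };

/*
 * Pick the Capability ID the guest sees. A hardwired (hidden) group at
 * offset 0x100 cannot be unlinked from the list, so its ID is replaced
 * with a unique fake value; every other group keeps its real ID.
 */
static uint16_t visible_cap_id(uint16_t real_id, enum grp_type type,
                               uint32_t base_offset, uint16_t *next_fake_id)
{
    if (type == GRP_TYPE_HARDWIRED && base_offset == PCI_CONFIG_SPACE_SIZE) {
        return (*next_fake_id)++;   /* keep fake IDs unique */
    }
    return real_id;
}
```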

Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
---
 hw/xen/xen_pt.c             | 11 +++++++-
 hw/xen/xen_pt.h             |  5 ++++
 hw/xen/xen_pt_config_init.c | 62 ++++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 76 insertions(+), 2 deletions(-)

diff --git a/hw/xen/xen_pt.c b/hw/xen/xen_pt.c
index bf098c26b3..e6a18afa83 100644
--- a/hw/xen/xen_pt.c
+++ b/hw/xen/xen_pt.c
@@ -154,7 +154,16 @@ static uint32_t xen_pt_pci_read_config(PCIDevice *d, uint32_t addr, int len)
     reg_grp_entry = xen_pt_find_reg_grp(s, addr);
     if (reg_grp_entry) {
         /* check 0-Hardwired register group */
-        if (reg_grp_entry->reg_grp->grp_type == XEN_PT_GRP_TYPE_HARDWIRED) {
+        if (reg_grp_entry->reg_grp->grp_type == XEN_PT_GRP_TYPE_HARDWIRED &&
+            /*
+             * For PCIe Extended Capabilities we need to emulate
+             * CapabilityID and NextCapability/Version registers for a
+             * hardwired reg group located at the offset 0x100 in PCIe
+             * config space. This allows us to hide the first extended
+             * capability as well.
+             */
+            !(reg_grp_entry->base_offset == PCI_CONFIG_SPACE_SIZE &&
+            ranges_overlap(addr, len, 0x100, 4))) {
             /* no need to emulate, just return 0 */
             val = 0;
             goto exit;
diff --git a/hw/xen/xen_pt.h b/hw/xen/xen_pt.h
index 5531347ab2..ac45261679 100644
--- a/hw/xen/xen_pt.h
+++ b/hw/xen/xen_pt.h
@@ -78,6 +78,11 @@ typedef int (*xen_pt_conf_byte_read)
 
 #define XEN_PCI_INTEL_OPREGION 0xfc
 
+#define XEN_PCIE_CAP_ID             0
+#define XEN_PCIE_CAP_LIST_NEXT      2
+
+#define XEN_PCIE_FAKE_CAP_ID_BASE   0xFE00
+
 typedef enum {
     XEN_PT_GRP_TYPE_HARDWIRED = 0,  /* 0 Hardwired reg group */
     XEN_PT_GRP_TYPE_EMU,            /* emul reg group */
diff --git a/hw/xen/xen_pt_config_init.c b/hw/xen/xen_pt_config_init.c
index 0ce2a033f9..10f3b67d35 100644
--- a/hw/xen/xen_pt_config_init.c
+++ b/hw/xen/xen_pt_config_init.c
@@ -31,6 +31,10 @@ static int xen_pt_ext_cap_ptr_reg_init(XenPCIPassthroughState *s,
                                        XenPTRegInfo *reg,
                                        uint32_t real_offset,
                                        uint32_t *data);
+static int xen_pt_ext_cap_capid_reg_init(XenPCIPassthroughState *s,
+                                         XenPTRegInfo *reg,
+                                         uint32_t real_offset,
+                                         uint32_t *data);
 
 /* helper */
 
@@ -1630,6 +1634,56 @@ static XenPTRegInfo xen_pt_emu_reg_igd_opregion[] = {
     },
 };
 
+
+/****************************
+ * Emulated registers for
+ * PCIe Extended Capabilities
+ */
+
+static uint16_t fake_cap_id = XEN_PCIE_FAKE_CAP_ID_BASE;
+
+/* PCIe Extended Capability ID reg */
+static int xen_pt_ext_cap_capid_reg_init(XenPCIPassthroughState *s,
+                                         XenPTRegInfo *reg,
+                                         uint32_t real_offset,
+                                         uint32_t *data)
+{
+    uint16_t reg_field;
+    int rc;
+    XenPTRegGroup *reg_grp_entry = NULL;
+
+    /* use real device register's value as initial value */
+    rc = xen_host_pci_get_word(&s->real_device, real_offset, &reg_field);
+    if (rc) {
+        return rc;
+    }
+
+    reg_grp_entry = xen_pt_find_reg_grp(s, real_offset);
+
+    if (reg_grp_entry) {
+        if (reg_grp_entry->reg_grp->grp_type == XEN_PT_GRP_TYPE_HARDWIRED &&
+            reg_grp_entry->base_offset == PCI_CONFIG_SPACE_SIZE) {
+            /*
+             * This is the situation when we were asked to hide (aka
+             * "hardwire to 0") some PCIe ext capability, but it was located
+             * at offset 0x100 in PCIe config space. In this case we can't
+             * simply exclude it from the linked list of capabilities
+             * (as it is the first entry in the list), so we must fake its
+             * Capability ID in PCIe Extended Capability header, leaving
+             * the Next Ptr field intact while returning zeroes on attempts
+             * to read capability body (writes are ignored).
+             */
+            reg_field = fake_cap_id;
+            /* increment the value in order to have unique Capability IDs */
+            fake_cap_id++;
+        }
+    }
+
+    *data = reg_field;
+    return 0;
+}
+
+
 /****************************
  * Capabilities
  */
@@ -2173,7 +2227,13 @@ void xen_pt_config_init(XenPCIPassthroughState *s, Error **errp)
             }
         }
 
-        if (xen_pt_emu_reg_grps[i].grp_type == XEN_PT_GRP_TYPE_EMU) {
+        if (xen_pt_emu_reg_grps[i].grp_type == XEN_PT_GRP_TYPE_EMU ||
+            /*
+             * We need to always emulate the PCIe Extended Capability
+             * header for a hidden capability which starts at offset 0x100
+             */
+            (xen_pt_emu_reg_grps[i].grp_type == XEN_PT_GRP_TYPE_HARDWIRED &&
+            reg_grp_offset == 0x100)) {
             if (xen_pt_emu_reg_grps[i].emu_regs) {
                 int j = 0;
                 XenPTRegInfo *regs = xen_pt_emu_reg_grps[i].emu_regs;
-- 
2.11.0


* [Qemu-devel] [RFC PATCH 25/30] xen/pt: add Vendor-specific PCIe Extended Capability descriptor and sizing
  2018-03-12 18:33 ` Alexey Gerasimenko
@ 2018-03-12 18:34   ` Alexey Gerasimenko
  -1 siblings, 0 replies; 183+ messages in thread
From: Alexey Gerasimenko @ 2018-03-12 18:34 UTC (permalink / raw)
  To: xen-devel
  Cc: Alexey Gerasimenko, qemu-devel, Stefano Stabellini, Anthony Perard

This patch provides the Vendor-specific PCIe Extended Capability description
structure and the corresponding sizing function. In this particular case the
size of the Vendor capability is read from the VSEC Length field.
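For reference, the VSEC header is the dword at capability offset + 4
(PCI_VNDR_HEADER): VSEC ID in bits 0-15, VSEC Rev in bits 16-19, and VSEC
Length in bits 20-31, the length covering the whole capability structure.
A sketch of the field extraction (the macro and helper names here are
illustrative; PCI_VNDR_HEADER_LEN in the QEMU/Linux headers extracts the
same field):

```c
#include <assert.h>
#include <stdint.h>

/* VSEC header dword at capability offset + 4 (PCI_VNDR_HEADER) */
#define VSEC_ID(h)   ((uint16_t)((h) & 0xFFFF))
#define VSEC_REV(h)  (((h) >> 16) & 0xF)
#define VSEC_LEN(h)  (((h) >> 20) & 0xFFF)  /* size of the whole capability */

/* Build a VSEC header dword from its fields (for illustration) */
static uint32_t vsec_header(uint16_t id, uint8_t rev, uint16_t len)
{
    return (uint32_t)id | ((uint32_t)(rev & 0xF) << 16)
                        | ((uint32_t)(len & 0xFFF) << 20);
}
```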

Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
---
 hw/xen/xen_pt_config_init.c | 77 +++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 75 insertions(+), 2 deletions(-)

diff --git a/hw/xen/xen_pt_config_init.c b/hw/xen/xen_pt_config_init.c
index 10f3b67d35..6e99b9ebd7 100644
--- a/hw/xen/xen_pt_config_init.c
+++ b/hw/xen/xen_pt_config_init.c
@@ -129,6 +129,18 @@ static uint32_t get_throughable_mask(const XenPCIPassthroughState *s,
     return throughable_mask & valid_mask;
 }
 
+static void log_pcie_extended_cap(XenPCIPassthroughState *s,
+                                  const char *cap_name,
+                                  uint32_t base_offset, uint32_t size)
+{
+    if (size) {
+        XEN_PT_LOG(&s->dev, "Found PCIe Extended Capability: %s at 0x%04x, "
+                            "size 0x%x bytes\n", cap_name,
+                            (uint16_t) base_offset, size);
+    }
+}
+
+
 /****************
  * general register functions
  */
@@ -1684,6 +1696,44 @@ static int xen_pt_ext_cap_capid_reg_init(XenPCIPassthroughState *s,
 }
 
 
+/* Vendor-specific Ext Capability Structure reg static information table */
+static XenPTRegInfo xen_pt_ext_cap_emu_reg_vendor[] = {
+    {
+        .offset     = XEN_PCIE_CAP_ID,
+        .size       = 2,
+        .init_val   = 0x0000,
+        .ro_mask    = 0xFFFF,
+        .emu_mask   = 0xFFFF,
+        .init       = xen_pt_ext_cap_capid_reg_init,
+        .u.w.read   = xen_pt_word_reg_read,
+        .u.w.write  = xen_pt_word_reg_write,
+    },
+    {
+        .offset     = XEN_PCIE_CAP_LIST_NEXT,
+        .size       = 2,
+        .init_val   = 0x0000,
+        .ro_mask    = 0xFFFF,
+        .emu_mask   = 0xFFFF,
+        .init       = xen_pt_ext_cap_ptr_reg_init,
+        .u.w.read   = xen_pt_word_reg_read,
+        .u.w.write  = xen_pt_word_reg_write,
+    },
+    {
+        .offset     = PCI_VNDR_HEADER,
+        .size       = 4,
+        .init_val   = 0x00000000,
+        .ro_mask    = 0xFFFFFFFF,
+        .emu_mask   = 0x00000000,
+        .init       = xen_pt_common_reg_init,
+        .u.dw.read  = xen_pt_long_reg_read,
+        .u.dw.write = xen_pt_long_reg_write,
+    },
+    {
+        .size = 0,
+    },
+};
+
+
 /****************************
  * Capabilities
  */
@@ -1708,6 +1758,23 @@ static int xen_pt_vendor_size_init(XenPCIPassthroughState *s,
     *size = sz;
     return ret;
 }
+
+static int xen_pt_ext_cap_vendor_size_init(XenPCIPassthroughState *s,
+                                           const XenPTRegGroupInfo *grp_reg,
+                                           uint32_t base_offset,
+                                           uint32_t *size)
+{
+    uint32_t vsec_hdr = 0;
+    int ret = xen_host_pci_get_long(&s->real_device,
+                                    base_offset + PCI_VNDR_HEADER,
+                                    &vsec_hdr);
+
+    *size = PCI_VNDR_HEADER_LEN(vsec_hdr);
+
+    log_pcie_extended_cap(s, "Vendor-specific", base_offset, *size);
+
+    return ret;
+}
 /* get PCI Express Capability Structure register group size */
 static int xen_pt_pcie_size_init(XenPCIPassthroughState *s,
                                  const XenPTRegGroupInfo *grp_reg,
@@ -1934,6 +2001,14 @@ static const XenPTRegGroupInfo xen_pt_emu_reg_grps[] = {
         .size_init   = xen_pt_reg_grp_size_init,
         .emu_regs    = xen_pt_emu_reg_igd_opregion,
     },
+    /* Vendor-specific Extended Capability reg group */
+    {
+        .grp_id      = PCIE_EXT_CAP_ID(PCI_EXT_CAP_ID_VNDR),
+        .grp_type    = XEN_PT_GRP_TYPE_EMU,
+        .grp_size    = 0xFF,
+        .size_init   = xen_pt_ext_cap_vendor_size_init,
+        .emu_regs    = xen_pt_ext_cap_emu_reg_vendor,
+    },
     {
         .grp_size = 0,
     },
@@ -2054,8 +2129,6 @@ out:
     return 0;
 }
 
-
-
 /*************
  * Main
  */
-- 
2.11.0


* [Qemu-devel] [RFC PATCH 26/30] xen/pt: add fixed-size PCIe Extended Capabilities descriptors
  2018-03-12 18:33 ` Alexey Gerasimenko
@ 2018-03-12 18:34   ` Alexey Gerasimenko
  -1 siblings, 0 replies; 183+ messages in thread
From: Alexey Gerasimenko @ 2018-03-12 18:34 UTC (permalink / raw)
  To: xen-devel
  Cc: Alexey Gerasimenko, qemu-devel, Stefano Stabellini, Anthony Perard

This adds description structures for all fixed-size PCIe Extended
Capabilities.

For every capability register group, only 2 registers are emulated
currently: Capability ID (16 bit) and Next Capability Offset/Version
(16 bit). Both are needed to implement selective capability hiding. All
other registers are passed through at the moment (unless they belong to
a "hardwired" capability, which is hidden).
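The ro_mask/emu_mask values in these descriptors decide, bit by bit,
whether a read is satisfied from the emulated value or from the real
device. A simplified sketch of that merge (illustrative only, not the
exact xen_pt read path):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Merge an emulated register value with the host device's value:
 * bits set in emu_mask come from emulation, the rest pass through.
 */
static uint16_t merge_reg_read(uint16_t emu_val, uint16_t host_val,
                               uint16_t emu_mask)
{
    return (uint16_t)((emu_val & emu_mask) | (host_val & ~emu_mask));
}
```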

Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
---
 hw/xen/xen_pt_config_init.c | 183 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 183 insertions(+)

diff --git a/hw/xen/xen_pt_config_init.c b/hw/xen/xen_pt_config_init.c
index 6e99b9ebd7..42296c08cc 100644
--- a/hw/xen/xen_pt_config_init.c
+++ b/hw/xen/xen_pt_config_init.c
@@ -1734,6 +1734,37 @@ static XenPTRegInfo xen_pt_ext_cap_emu_reg_vendor[] = {
 };
 
 
+/* Common reg static information table for all passthru-type
+ * PCIe Extended Capabilities. Only Extended Cap ID and
+ * Next pointer are handled (to support capability hiding).
+ */
+static XenPTRegInfo xen_pt_ext_cap_emu_reg_dummy[] = {
+    {
+        .offset     = XEN_PCIE_CAP_ID,
+        .size       = 2,
+        .init_val   = 0x0000,
+        .ro_mask    = 0xFFFF,
+        .emu_mask   = 0xFFFF,
+        .init       = xen_pt_ext_cap_capid_reg_init,
+        .u.w.read   = xen_pt_word_reg_read,
+        .u.w.write  = xen_pt_word_reg_write,
+    },
+    {
+        .offset     = XEN_PCIE_CAP_LIST_NEXT,
+        .size       = 2,
+        .init_val   = 0x0000,
+        .ro_mask    = 0xFFFF,
+        .emu_mask   = 0xFFFF,
+        .init       = xen_pt_ext_cap_ptr_reg_init,
+        .u.w.read   = xen_pt_word_reg_read,
+        .u.w.write  = xen_pt_word_reg_write,
+    },
+    {
+        .size = 0,
+    },
+};
+
+
 /****************************
  * Capabilities
  */
@@ -2009,6 +2040,158 @@ static const XenPTRegGroupInfo xen_pt_emu_reg_grps[] = {
         .size_init   = xen_pt_ext_cap_vendor_size_init,
         .emu_regs    = xen_pt_ext_cap_emu_reg_vendor,
     },
+    /* Device Serial Number Extended Capability reg group */
+    {
+        .grp_id     = PCIE_EXT_CAP_ID(PCI_EXT_CAP_ID_DSN),
+        .grp_type   = XEN_PT_GRP_TYPE_EMU,
+        .grp_size   = PCI_EXT_CAP_DSN_SIZEOF,       /*0x0C*/
+        .size_init  = xen_pt_reg_grp_size_init,
+        .emu_regs   = xen_pt_ext_cap_emu_reg_dummy,
+    },
+    /* Power Budgeting Extended Capability reg group */
+    {
+        .grp_id     = PCIE_EXT_CAP_ID(PCI_EXT_CAP_ID_PWR),
+        .grp_type   = XEN_PT_GRP_TYPE_EMU,
+        .grp_size   = PCI_EXT_CAP_PWR_SIZEOF,       /*0x10*/
+        .size_init  = xen_pt_reg_grp_size_init,
+        .emu_regs   = xen_pt_ext_cap_emu_reg_dummy,
+    },
+    /* Root Complex Internal Link Control Extended Capability reg group */
+    {
+        .grp_id     = PCIE_EXT_CAP_ID(PCI_EXT_CAP_ID_RCILC),
+        .grp_type   = XEN_PT_GRP_TYPE_EMU,
+        .grp_size   = 0x0C,
+        .size_init  = xen_pt_reg_grp_size_init,
+        .emu_regs   = xen_pt_ext_cap_emu_reg_dummy,
+    },
+    /* Root Complex Event Collector Extended Capability reg group */
+    {
+        .grp_id     = PCIE_EXT_CAP_ID(PCI_EXT_CAP_ID_RCEC),
+        .grp_type   = XEN_PT_GRP_TYPE_EMU,
+        .grp_size   = 0x08,
+        .size_init  = xen_pt_reg_grp_size_init,
+        .emu_regs   = xen_pt_ext_cap_emu_reg_dummy,
+    },
+    /* Root Complex Register Block Extended Capability reg group */
+    {
+        .grp_id     = PCIE_EXT_CAP_ID(PCI_EXT_CAP_ID_RCRB),
+        .grp_type   = XEN_PT_GRP_TYPE_EMU,
+        .grp_size   = 0x14,
+        .size_init  = xen_pt_reg_grp_size_init,
+        .emu_regs   = xen_pt_ext_cap_emu_reg_dummy,
+    },
+    /* Configuration Access Correlation Extended Capability reg group */
+    {
+        .grp_id     = PCIE_EXT_CAP_ID(PCI_EXT_CAP_ID_CAC),
+        .grp_type   = XEN_PT_GRP_TYPE_EMU,
+        .grp_size   = 0x08,
+        .size_init  = xen_pt_reg_grp_size_init,
+        .emu_regs   = xen_pt_ext_cap_emu_reg_dummy,
+    },
+    /* Alternate Routing ID Extended Capability reg group */
+    {
+        .grp_id     = PCIE_EXT_CAP_ID(PCI_EXT_CAP_ID_ARI),
+        .grp_type   = XEN_PT_GRP_TYPE_EMU,
+        .grp_size   = PCI_EXT_CAP_ARI_SIZEOF,
+        .size_init  = xen_pt_reg_grp_size_init,
+        .emu_regs   = xen_pt_ext_cap_emu_reg_dummy,
+    },
+    /* Address Translation Services Extended Capability reg group */
+    {
+        .grp_id     = PCIE_EXT_CAP_ID(PCI_EXT_CAP_ID_ATS),
+        .grp_type   = XEN_PT_GRP_TYPE_EMU,
+        .grp_size   = PCI_EXT_CAP_ATS_SIZEOF,
+        .size_init  = xen_pt_reg_grp_size_init,
+        .emu_regs   = xen_pt_ext_cap_emu_reg_dummy,
+    },
+    /* Single Root I/O Virtualization Extended Capability reg group */
+    {
+        .grp_id     = PCIE_EXT_CAP_ID(PCI_EXT_CAP_ID_SRIOV),
+        .grp_type   = XEN_PT_GRP_TYPE_EMU,
+        .grp_size   = PCI_EXT_CAP_SRIOV_SIZEOF,
+        .size_init  = xen_pt_reg_grp_size_init,
+        .emu_regs   = xen_pt_ext_cap_emu_reg_dummy,
+    },
+    /* Page Request Interface Extended Capability reg group */
+    {
+        .grp_id     = PCIE_EXT_CAP_ID(PCI_EXT_CAP_ID_PRI),
+        .grp_type   = XEN_PT_GRP_TYPE_EMU,
+        .grp_size   = PCI_EXT_CAP_PRI_SIZEOF,
+        .size_init  = xen_pt_reg_grp_size_init,
+        .emu_regs   = xen_pt_ext_cap_emu_reg_dummy,
+    },
+    /* Latency Tolerance Reporting Extended Capability reg group */
+    {
+        .grp_id     = PCIE_EXT_CAP_ID(PCI_EXT_CAP_ID_LTR),
+        .grp_type   = XEN_PT_GRP_TYPE_EMU,
+        .grp_size   = PCI_EXT_CAP_LTR_SIZEOF,
+        .size_init  = xen_pt_reg_grp_size_init,
+        .emu_regs   = xen_pt_ext_cap_emu_reg_dummy,
+    },
+    /* Secondary PCIe Capability Extended Capability reg group */
+    {
+        .grp_id     = PCIE_EXT_CAP_ID(PCI_EXT_CAP_ID_SECPCI),
+        .grp_type   = XEN_PT_GRP_TYPE_EMU,
+        .grp_size   = 0x10,
+        .size_init  = xen_pt_reg_grp_size_init,
+        .emu_regs   = xen_pt_ext_cap_emu_reg_dummy,
+    },
+    /* Process Address Space ID Extended Capability reg group */
+    {
+        .grp_id     = PCIE_EXT_CAP_ID(PCI_EXT_CAP_ID_PASID),
+        .grp_type   = XEN_PT_GRP_TYPE_EMU,
+        .grp_size   = PCI_EXT_CAP_PASID_SIZEOF,
+        .size_init  = xen_pt_reg_grp_size_init,
+        .emu_regs   = xen_pt_ext_cap_emu_reg_dummy,
+    },
+    /* L1 PM Substates Extended Capability reg group */
+    {
+        .grp_id     = PCIE_EXT_CAP_ID(PCI_EXT_CAP_ID_L1SS),
+        .grp_type   = XEN_PT_GRP_TYPE_EMU,
+        .grp_size   = 0x10,
+        .size_init  = xen_pt_reg_grp_size_init,
+        .emu_regs   = xen_pt_ext_cap_emu_reg_dummy,
+    },
+    /* Precision Time Measurement Extended Capability reg group */
+    {
+        .grp_id     = PCIE_EXT_CAP_ID(PCI_EXT_CAP_ID_PTM),
+        .grp_type   = XEN_PT_GRP_TYPE_EMU,
+        .grp_size   = 0x0C,
+        .size_init  = xen_pt_reg_grp_size_init,
+        .emu_regs   = xen_pt_ext_cap_emu_reg_dummy,
+    },
+    /* M-PCIe Extended Capability reg group */
+    {
+        .grp_id     = PCIE_EXT_CAP_ID(0x20),
+        .grp_type   = XEN_PT_GRP_TYPE_EMU,
+        .grp_size   = 0x1C,
+        .size_init  = xen_pt_reg_grp_size_init,
+        .emu_regs   = xen_pt_ext_cap_emu_reg_dummy,
+    },
+    /* LN Requester (LNR) Extended Capability reg group */
+    {
+        .grp_id     = PCIE_EXT_CAP_ID(0x1C),
+        .grp_type   = XEN_PT_GRP_TYPE_EMU,
+        .grp_size   = 0x08,
+        .size_init  = xen_pt_reg_grp_size_init,
+        .emu_regs   = xen_pt_ext_cap_emu_reg_dummy,
+    },
+    /* Function Readiness Status (FRS) Extended Capability reg group */
+    {
+        .grp_id     = PCIE_EXT_CAP_ID(0x21),
+        .grp_type   = XEN_PT_GRP_TYPE_EMU,
+        .grp_size   = 0x10,
+        .size_init  = xen_pt_reg_grp_size_init,
+        .emu_regs   = xen_pt_ext_cap_emu_reg_dummy,
+    },
+    /* Readiness Time Extended Capability reg group */
+    {
+        .grp_id     = PCIE_EXT_CAP_ID(0x22),
+        .grp_type   = XEN_PT_GRP_TYPE_EMU,
+        .grp_size   = 0x0C,
+        .size_init  = xen_pt_reg_grp_size_init,
+        .emu_regs   = xen_pt_ext_cap_emu_reg_dummy,
+    },
     {
         .grp_size = 0,
     },
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 183+ messages in thread

* [Qemu-devel] [RFC PATCH 27/30] xen/pt: add AER PCIe Extended Capability descriptor and sizing
  2018-03-12 18:33 ` Alexey Gerasimenko
@ 2018-03-12 18:34   ` Alexey Gerasimenko
  -1 siblings, 0 replies; 183+ messages in thread
From: Alexey Gerasimenko @ 2018-03-12 18:34 UTC (permalink / raw)
  To: xen-devel
  Cc: Alexey Gerasimenko, qemu-devel, Stefano Stabellini, Anthony Perard

This patch provides the Advanced Error Reporting PCIe Extended
Capability description structure and the corresponding capability
sizing function.

Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
---
 hw/xen/xen_pt_config_init.c | 72 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 72 insertions(+)

diff --git a/hw/xen/xen_pt_config_init.c b/hw/xen/xen_pt_config_init.c
index 42296c08cc..98aae3daca 100644
--- a/hw/xen/xen_pt_config_init.c
+++ b/hw/xen/xen_pt_config_init.c
@@ -1924,6 +1924,70 @@ static int xen_pt_msix_size_init(XenPCIPassthroughState *s,
     return 0;
 }
 
+/* get Advanced Error Reporting Extended Capability register group size */
+#define PCI_ERR_CAP_TLP_PREFIX_LOG      (1U << 11)
+#define PCI_DEVCAP2_END_END_TLP_PREFIX  (1U << 21)
+static int xen_pt_ext_cap_aer_size_init(XenPCIPassthroughState *s,
+                                        const XenPTRegGroupInfo *grp_reg,
+                                        uint32_t base_offset,
+                                        uint32_t *size)
+{
+    uint8_t dev_type = get_pcie_device_type(s);
+    uint32_t aer_caps = 0;
+    uint32_t sz = 0;
+    int pcie_cap_pos;
+    uint32_t devcaps2 = 0;
+    int ret = 0;
+
+    pcie_cap_pos = xen_host_pci_find_next_cap(&s->real_device, 0,
+                                              PCI_CAP_ID_EXP);
+    if (!pcie_cap_pos) {
+        XEN_PT_ERR(&s->dev,
+                   "Cannot find a required PCI Express Capability\n");
+        return -1;
+    }
+
+    if (get_pcie_capability_version(s) > 1) {
+        ret = xen_host_pci_get_long(&s->real_device,
+                                    pcie_cap_pos + PCI_EXP_DEVCAP2,
+                                    &devcaps2);
+        if (ret) {
+            XEN_PT_ERR(&s->dev, "Error while reading Device "
+                       "Capabilities 2 Register\n");
+            return -1;
+        }
+    }
+
+    if (devcaps2 & PCI_DEVCAP2_END_END_TLP_PREFIX) {
+        ret = xen_host_pci_get_long(&s->real_device,
+                                    base_offset + PCI_ERR_CAP,
+                                    &aer_caps);
+        if (ret) {
+            XEN_PT_ERR(&s->dev,
+                       "Error while reading AER Extended Capability\n");
+            return -1;
+        }
+
+        if (aer_caps & PCI_ERR_CAP_TLP_PREFIX_LOG) {
+            sz = 0x48;
+        }
+    }
+
+    if (!sz) {
+        if (dev_type == PCI_EXP_TYPE_ROOT_PORT ||
+            dev_type == PCI_EXP_TYPE_RC_EC) {
+            sz = 0x38;
+        } else {
+            sz = 0x2C;
+        }
+    }
+
+    *size = sz;
+
+    log_pcie_extended_cap(s, "AER", base_offset, *size);
+    return ret;
+}
+
 
 static const XenPTRegGroupInfo xen_pt_emu_reg_grps[] = {
     /* Header Type0 reg group */
@@ -2192,6 +2256,14 @@ static const XenPTRegGroupInfo xen_pt_emu_reg_grps[] = {
         .size_init  = xen_pt_reg_grp_size_init,
         .emu_regs   = xen_pt_ext_cap_emu_reg_dummy,
     },
+    /* Advanced Error Reporting Extended Capability reg group */
+    {
+        .grp_id     = PCIE_EXT_CAP_ID(PCI_EXT_CAP_ID_ERR),
+        .grp_type   = XEN_PT_GRP_TYPE_EMU,
+        .grp_size   = 0xFF,
+        .size_init  = xen_pt_ext_cap_aer_size_init,
+        .emu_regs   = xen_pt_ext_cap_emu_reg_dummy,
+    },
     {
         .grp_size = 0,
     },
-- 
2.11.0


* [Qemu-devel] [RFC PATCH 28/30] xen/pt: add descriptors and size calculation for RCLD/ACS/PMUX/DPA/MCAST/TPH/DPC PCIe Extended Capabilities
  2018-03-12 18:33 ` Alexey Gerasimenko
@ 2018-03-12 18:34   ` Alexey Gerasimenko
  -1 siblings, 0 replies; 183+ messages in thread
From: Alexey Gerasimenko @ 2018-03-12 18:34 UTC (permalink / raw)
  To: xen-devel
  Cc: Alexey Gerasimenko, qemu-devel, Stefano Stabellini, Anthony Perard

Add a few more PCIe Extended Capabilities entries to the
xen_pt_emu_reg_grps[] array along with their corresponding
*_size_init() functions.

All these capabilities have a non-fixed size, but their size
calculation is very simple, hence they are added in a single batch.

For every capability register group, only two registers are emulated at
the moment: Capability ID (16-bit) and Next Capability Offset/Version
(16-bit). Both are needed to implement selective capability hiding. All
other registers are passed through for now (unless they belong to
a capability marked as "hardwired", which is hidden entirely).

Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
---
 hw/xen/xen_pt_config_init.c | 224 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 224 insertions(+)

diff --git a/hw/xen/xen_pt_config_init.c b/hw/xen/xen_pt_config_init.c
index 98aae3daca..326f5671ff 100644
--- a/hw/xen/xen_pt_config_init.c
+++ b/hw/xen/xen_pt_config_init.c
@@ -1988,6 +1988,174 @@ static int xen_pt_ext_cap_aer_size_init(XenPCIPassthroughState *s,
     return ret;
 }
 
+/* get Root Complex Link Declaration Extended Capability register group size */
+#define RCLD_GET_NUM_ENTRIES(x)     (((x) >> 8) & 0xFF)
+static int xen_pt_ext_cap_rcld_size_init(XenPCIPassthroughState *s,
+                                         const XenPTRegGroupInfo *grp_reg,
+                                         uint32_t base_offset,
+                                         uint32_t *size)
+{
+    uint32_t elem_self_descr = 0;
+
+    int ret = xen_host_pci_get_long(&s->real_device,
+                                    base_offset + 4,
+                                    &elem_self_descr);
+
+    *size = 0x10 + RCLD_GET_NUM_ENTRIES(elem_self_descr) * 0x10;
+
+    log_pcie_extended_cap(s, "Root Complex Link Declaration",
+                          base_offset, *size);
+    return ret;
+}
+
+/* get Access Control Services Extended Capability register group size */
+#define ACS_VECTOR_SIZE_BITS(x)    ((((x) >> 8) & 0xFF) ?: 256)
+static int xen_pt_ext_cap_acs_size_init(XenPCIPassthroughState *s,
+                                        const XenPTRegGroupInfo *grp_reg,
+                                        uint32_t base_offset,
+                                        uint32_t *size)
+{
+    uint16_t acs_caps = 0;
+
+    int ret = xen_host_pci_get_word(&s->real_device,
+                                    base_offset + PCI_ACS_CAP,
+                                    &acs_caps);
+
+    if (acs_caps & PCI_ACS_EC) {
+        uint32_t vector_sz = ACS_VECTOR_SIZE_BITS(acs_caps);
+
+        *size = PCI_ACS_EGRESS_CTL_V + ((vector_sz + 7) & ~7) / 8;
+    } else {
+        *size = PCI_ACS_EGRESS_CTL_V;
+    }
+
+    log_pcie_extended_cap(s, "ACS", base_offset, *size);
+    return ret;
+}
+
+/* get Multicast Extended Capability register group size */
+static int xen_pt_ext_cap_multicast_size_init(XenPCIPassthroughState *s,
+                                              const XenPTRegGroupInfo *grp_reg,
+                                              uint32_t base_offset,
+                                              uint32_t *size)
+{
+    uint8_t dev_type = get_pcie_device_type(s);
+
+    switch (dev_type) {
+    case PCI_EXP_TYPE_ENDPOINT:
+    case PCI_EXP_TYPE_LEG_END:
+    case PCI_EXP_TYPE_RC_END:
+    case PCI_EXP_TYPE_RC_EC:
+    default:
+        *size = PCI_EXT_CAP_MCAST_ENDPOINT_SIZEOF;
+        break;
+
+    case PCI_EXP_TYPE_ROOT_PORT:
+    case PCI_EXP_TYPE_UPSTREAM:
+    case PCI_EXP_TYPE_DOWNSTREAM:
+        *size = 0x30;
+        break;
+    }
+
+    log_pcie_extended_cap(s, "Multicast", base_offset, *size);
+    return 0;
+}
+
+/* get Dynamic Power Allocation Extended Capability register group size */
+static int xen_pt_ext_cap_dpa_size_init(XenPCIPassthroughState *s,
+                                        const XenPTRegGroupInfo *grp_reg,
+                                        uint32_t base_offset,
+                                        uint32_t *size)
+{
+    uint32_t dpa_caps = 0;
+    uint32_t num_entries;
+
+    int ret = xen_host_pci_get_long(&s->real_device,
+                                    base_offset + PCI_DPA_CAP,
+                                    &dpa_caps);
+
+    num_entries = (dpa_caps & PCI_DPA_CAP_SUBSTATE_MASK) + 1;
+
+    *size = PCI_DPA_BASE_SIZEOF + num_entries /*byte-size registers*/;
+
+    log_pcie_extended_cap(s, "Dynamic Power Allocation", base_offset, *size);
+    return ret;
+}
+
+/* get TPH Requester Extended Capability register group size */
+static int xen_pt_ext_cap_tph_size_init(XenPCIPassthroughState *s,
+                                        const XenPTRegGroupInfo *grp_reg,
+                                        uint32_t base_offset,
+                                        uint32_t *size)
+{
+    uint32_t tph_caps = 0;
+    uint32_t num_entries;
+
+    int ret = xen_host_pci_get_long(&s->real_device,
+                                    base_offset + PCI_TPH_CAP,
+                                    &tph_caps);
+
+    switch (tph_caps & PCI_TPH_CAP_LOC_MASK) {
+    case PCI_TPH_LOC_CAP:
+        num_entries = (tph_caps & PCI_TPH_CAP_ST_MASK) >> PCI_TPH_CAP_ST_SHIFT;
+        num_entries++;
+        break;
+
+    case PCI_TPH_LOC_NONE:
+    case PCI_TPH_LOC_MSIX:
+    default:
+        /* not in the capability */
+        num_entries = 0;
+    }
+
+    *size = PCI_TPH_BASE_SIZEOF + num_entries * 2;
+
+    log_pcie_extended_cap(s, "TPH Requester", base_offset, *size);
+    return ret;
+}
+
+/* get Downstream Port Containment Extended Capability register group size */
+static int xen_pt_ext_cap_dpc_size_init(XenPCIPassthroughState *s,
+                                        const XenPTRegGroupInfo *grp_reg,
+                                        uint32_t base_offset,
+                                        uint32_t *size)
+{
+    uint16_t dpc_caps = 0;
+
+    int ret = xen_host_pci_get_word(&s->real_device,
+                                    base_offset + PCI_EXP_DPC_CAP,
+                                    &dpc_caps);
+
+    if (dpc_caps & PCI_EXP_DPC_CAP_RP_EXT) {
+        *size = 0x20 + ((dpc_caps & PCI_EXP_DPC_RP_PIO_LOG_SIZE) >> 8) * 4;
+    } else {
+        *size = 0xC;
+    }
+
+    log_pcie_extended_cap(s, "Downstream Port Containment",
+                          base_offset, *size);
+    return ret;
+}
+
+/* get Protocol Multiplexing Extended Capability register group size */
+#define PMUX_GET_NUM_ENTRIES(x)     ((x) & 0x3F)
+static int xen_pt_ext_cap_pmux_size_init(XenPCIPassthroughState *s,
+                                         const XenPTRegGroupInfo *grp_reg,
+                                         uint32_t base_offset,
+                                         uint32_t *size)
+{
+    uint32_t pmux_caps = 0;
+
+    int ret = xen_host_pci_get_long(&s->real_device,
+                                    base_offset + 4,
+                                    &pmux_caps);
+
+    *size = 0x10 + PMUX_GET_NUM_ENTRIES(pmux_caps) * 4;
+
+    log_pcie_extended_cap(s, "PMUX", base_offset, *size);
+    return ret;
+}
+
 
 static const XenPTRegGroupInfo xen_pt_emu_reg_grps[] = {
     /* Header Type0 reg group */
@@ -2264,6 +2432,62 @@ static const XenPTRegGroupInfo xen_pt_emu_reg_grps[] = {
         .size_init  = xen_pt_ext_cap_aer_size_init,
         .emu_regs   = xen_pt_ext_cap_emu_reg_dummy,
     },
+    /* Root Complex Link Declaration Extended Capability reg group */
+    {
+        .grp_id     = PCIE_EXT_CAP_ID(PCI_EXT_CAP_ID_RCLD),
+        .grp_type   = XEN_PT_GRP_TYPE_EMU,
+        .grp_size   = 0xFF,
+        .size_init  = xen_pt_ext_cap_rcld_size_init,
+        .emu_regs   = xen_pt_ext_cap_emu_reg_dummy,
+    },
+    /* Access Control Services Extended Capability reg group */
+    {
+        .grp_id     = PCIE_EXT_CAP_ID(PCI_EXT_CAP_ID_ACS),
+        .grp_type   = XEN_PT_GRP_TYPE_EMU,
+        .grp_size   = 0xFF,
+        .size_init  = xen_pt_ext_cap_acs_size_init,
+        .emu_regs   = xen_pt_ext_cap_emu_reg_dummy,
+    },
+    /* Multicast Extended Capability reg group */
+    {
+        .grp_id     = PCIE_EXT_CAP_ID(PCI_EXT_CAP_ID_MCAST),
+        .grp_type   = XEN_PT_GRP_TYPE_EMU,
+        .grp_size   = 0xFF,
+        .size_init  = xen_pt_ext_cap_multicast_size_init,
+        .emu_regs   = xen_pt_ext_cap_emu_reg_dummy,
+    },
+    /* Dynamic Power Allocation Extended Capability reg group */
+    {
+        .grp_id     = PCIE_EXT_CAP_ID(PCI_EXT_CAP_ID_DPA),
+        .grp_type   = XEN_PT_GRP_TYPE_EMU,
+        .grp_size   = 0xFF,
+        .size_init  = xen_pt_ext_cap_dpa_size_init,
+        .emu_regs   = xen_pt_ext_cap_emu_reg_dummy,
+    },
+    /* TPH Requester Extended Capability reg group */
+    {
+        .grp_id     = PCIE_EXT_CAP_ID(PCI_EXT_CAP_ID_TPH),
+        .grp_type   = XEN_PT_GRP_TYPE_EMU,
+        .grp_size   = 0xFF,
+        .size_init  = xen_pt_ext_cap_tph_size_init,
+        .emu_regs   = xen_pt_ext_cap_emu_reg_dummy,
+    },
+    /* Protocol Multiplexing Extended Capability reg group */
+    {
+        .grp_id     = PCIE_EXT_CAP_ID(PCI_EXT_CAP_ID_PMUX),
+        .grp_type   = XEN_PT_GRP_TYPE_EMU,
+        .grp_size   = 0xFF,
+        .size_init  = xen_pt_ext_cap_pmux_size_init,
+        .emu_regs   = xen_pt_ext_cap_emu_reg_dummy,
+    },
+    /* Downstream Port Containment Extended Capability reg group */
+    {
+        .grp_id     = PCIE_EXT_CAP_ID(PCI_EXT_CAP_ID_DPC),
+        .grp_type   = XEN_PT_GRP_TYPE_EMU,
+        .grp_size   = 0xFF,
+        .size_init  = xen_pt_ext_cap_dpc_size_init,
+        .emu_regs   = xen_pt_ext_cap_emu_reg_dummy,
+    },
     {
         .grp_size = 0,
     },
-- 
2.11.0


* [RFC PATCH 28/30] xen/pt: add descriptors and size calculation for RCLD/ACS/PMUX/DPA/MCAST/TPH/DPC PCIe Extended Capabilities
@ 2018-03-12 18:34   ` Alexey Gerasimenko
  0 siblings, 0 replies; 183+ messages in thread
From: Alexey Gerasimenko @ 2018-03-12 18:34 UTC (permalink / raw)
  To: xen-devel
  Cc: Anthony Perard, Stefano Stabellini, Alexey Gerasimenko, qemu-devel

Add a few more PCIe Extended Capability entries to the
xen_pt_emu_reg_grps[] array, along with their corresponding *_size_init()
functions.

All these capabilities have a variable size, but their size calculation
is very simple, hence they are added in a single batch.

For every capability register group, only two registers are currently
emulated: Capability ID (16-bit) and Next Capability Offset/Version
(16-bit). Both are needed to implement selective capability hiding. All
other registers are passed through at the moment (unless they belong to
a capability marked as "hardwired", which is hidden)

Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
---
 hw/xen/xen_pt_config_init.c | 224 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 224 insertions(+)

diff --git a/hw/xen/xen_pt_config_init.c b/hw/xen/xen_pt_config_init.c
index 98aae3daca..326f5671ff 100644
--- a/hw/xen/xen_pt_config_init.c
+++ b/hw/xen/xen_pt_config_init.c
@@ -1988,6 +1988,174 @@ static int xen_pt_ext_cap_aer_size_init(XenPCIPassthroughState *s,
     return ret;
 }
 
+/* get Root Complex Link Declaration Extended Capability register group size */
+#define RCLD_GET_NUM_ENTRIES(x)     (((x) >> 8) & 0xFF)
+static int xen_pt_ext_cap_rcld_size_init(XenPCIPassthroughState *s,
+                                         const XenPTRegGroupInfo *grp_reg,
+                                         uint32_t base_offset,
+                                         uint32_t *size)
+{
+    uint32_t elem_self_descr = 0;
+
+    int ret = xen_host_pci_get_long(&s->real_device,
+                                    base_offset + 4,
+                                    &elem_self_descr);
+
+    *size = 0x10 + RCLD_GET_NUM_ENTRIES(elem_self_descr) * 0x10;
+
+    log_pcie_extended_cap(s, "Root Complex Link Declaration",
+                          base_offset, *size);
+    return ret;
+}
+
+/* get Access Control Services Extended Capability register group size */
+#define ACS_VECTOR_SIZE_BITS(x)    ((((x) >> 8) & 0xFF) ?: 256)
+static int xen_pt_ext_cap_acs_size_init(XenPCIPassthroughState *s,
+                                        const XenPTRegGroupInfo *grp_reg,
+                                        uint32_t base_offset,
+                                        uint32_t *size)
+{
+    uint16_t acs_caps = 0;
+
+    int ret = xen_host_pci_get_word(&s->real_device,
+                                    base_offset + PCI_ACS_CAP,
+                                    &acs_caps);
+
+    if (acs_caps & PCI_ACS_EC) {
+        uint32_t vector_sz = ACS_VECTOR_SIZE_BITS(acs_caps);
+
+        *size = PCI_ACS_EGRESS_CTL_V + ((vector_sz + 7) & ~7) / 8;
+    } else {
+        *size = PCI_ACS_EGRESS_CTL_V;
+    }
+
+    log_pcie_extended_cap(s, "ACS", base_offset, *size);
+    return ret;
+}
+
+/* get Multicast Extended Capability register group size */
+static int xen_pt_ext_cap_multicast_size_init(XenPCIPassthroughState *s,
+                                              const XenPTRegGroupInfo *grp_reg,
+                                              uint32_t base_offset,
+                                              uint32_t *size)
+{
+    uint8_t dev_type = get_pcie_device_type(s);
+
+    switch (dev_type) {
+    case PCI_EXP_TYPE_ENDPOINT:
+    case PCI_EXP_TYPE_LEG_END:
+    case PCI_EXP_TYPE_RC_END:
+    case PCI_EXP_TYPE_RC_EC:
+    default:
+        *size = PCI_EXT_CAP_MCAST_ENDPOINT_SIZEOF;
+        break;
+
+    case PCI_EXP_TYPE_ROOT_PORT:
+    case PCI_EXP_TYPE_UPSTREAM:
+    case PCI_EXP_TYPE_DOWNSTREAM:
+        *size = 0x30;
+        break;
+    }
+
+    log_pcie_extended_cap(s, "Multicast", base_offset, *size);
+    return 0;
+}
+
+/* get Dynamic Power Allocation Extended Capability register group size */
+static int xen_pt_ext_cap_dpa_size_init(XenPCIPassthroughState *s,
+                                        const XenPTRegGroupInfo *grp_reg,
+                                        uint32_t base_offset,
+                                        uint32_t *size)
+{
+    uint32_t dpa_caps = 0;
+    uint32_t num_entries;
+
+    int ret = xen_host_pci_get_long(&s->real_device,
+                                    base_offset + PCI_DPA_CAP,
+                                    &dpa_caps);
+
+    num_entries = (dpa_caps & PCI_DPA_CAP_SUBSTATE_MASK) + 1;
+
+    *size = PCI_DPA_BASE_SIZEOF + num_entries /*byte-size registers*/;
+
+    log_pcie_extended_cap(s, "Dynamic Power Allocation", base_offset, *size);
+    return ret;
+}
+
+/* get TPH Requester Extended Capability register group size */
+static int xen_pt_ext_cap_tph_size_init(XenPCIPassthroughState *s,
+                                        const XenPTRegGroupInfo *grp_reg,
+                                        uint32_t base_offset,
+                                        uint32_t *size)
+{
+    uint32_t tph_caps = 0;
+    uint32_t num_entries;
+
+    int ret = xen_host_pci_get_long(&s->real_device,
+                                    base_offset + PCI_TPH_CAP,
+                                    &tph_caps);
+
+    switch (tph_caps & PCI_TPH_CAP_LOC_MASK) {
+    case PCI_TPH_LOC_CAP:
+        num_entries = (tph_caps & PCI_TPH_CAP_ST_MASK) >> PCI_TPH_CAP_ST_SHIFT;
+        num_entries++;
+        break;
+
+    case PCI_TPH_LOC_NONE:
+    case PCI_TPH_LOC_MSIX:
+    default:
+        /* not in the capability */
+        num_entries = 0;
+    }
+
+    *size = PCI_TPH_BASE_SIZEOF + num_entries * 2;
+
+    log_pcie_extended_cap(s, "TPH Requester", base_offset, *size);
+    return ret;
+}
+
+/* get Downstream Port Containment Extended Capability register group size */
+static int xen_pt_ext_cap_dpc_size_init(XenPCIPassthroughState *s,
+                                        const XenPTRegGroupInfo *grp_reg,
+                                        uint32_t base_offset,
+                                        uint32_t *size)
+{
+    uint16_t dpc_caps = 0;
+
+    int ret = xen_host_pci_get_word(&s->real_device,
+                                    base_offset + PCI_EXP_DPC_CAP,
+                                    &dpc_caps);
+
+    if (dpc_caps & PCI_EXP_DPC_CAP_RP_EXT) {
+        *size = 0x20 + ((dpc_caps & PCI_EXP_DPC_RP_PIO_LOG_SIZE) >> 8) * 4;
+    } else {
+        *size = 0xC;
+    }
+
+    log_pcie_extended_cap(s, "Downstream Port Containment",
+                          base_offset, *size);
+    return ret;
+}
+
+/* get Protocol Multiplexing Extended Capability register group size */
+#define PMUX_GET_NUM_ENTRIES(x)     ((x) & 0x3F)
+static int xen_pt_ext_cap_pmux_size_init(XenPCIPassthroughState *s,
+                                         const XenPTRegGroupInfo *grp_reg,
+                                         uint32_t base_offset,
+                                         uint32_t *size)
+{
+    uint32_t pmux_caps = 0;
+
+    int ret = xen_host_pci_get_long(&s->real_device,
+                                    base_offset + 4,
+                                    &pmux_caps);
+
+    *size = 0x10 + PMUX_GET_NUM_ENTRIES(pmux_caps) * 4;
+
+    log_pcie_extended_cap(s, "PMUX", base_offset, *size);
+    return ret;
+}
+
 
 static const XenPTRegGroupInfo xen_pt_emu_reg_grps[] = {
     /* Header Type0 reg group */
@@ -2264,6 +2432,62 @@ static const XenPTRegGroupInfo xen_pt_emu_reg_grps[] = {
         .size_init  = xen_pt_ext_cap_aer_size_init,
         .emu_regs   = xen_pt_ext_cap_emu_reg_dummy,
     },
+    /* Root Complex Link Declaration Extended Capability reg group */
+    {
+        .grp_id     = PCIE_EXT_CAP_ID(PCI_EXT_CAP_ID_RCLD),
+        .grp_type   = XEN_PT_GRP_TYPE_EMU,
+        .grp_size   = 0xFF,
+        .size_init  = xen_pt_ext_cap_rcld_size_init,
+        .emu_regs   = xen_pt_ext_cap_emu_reg_dummy,
+    },
+    /* Access Control Services Extended Capability reg group */
+    {
+        .grp_id     = PCIE_EXT_CAP_ID(PCI_EXT_CAP_ID_ACS),
+        .grp_type   = XEN_PT_GRP_TYPE_EMU,
+        .grp_size   = 0xFF,
+        .size_init  = xen_pt_ext_cap_acs_size_init,
+        .emu_regs   = xen_pt_ext_cap_emu_reg_dummy,
+    },
+    /* Multicast Extended Capability reg group */
+    {
+        .grp_id     = PCIE_EXT_CAP_ID(PCI_EXT_CAP_ID_MCAST),
+        .grp_type   = XEN_PT_GRP_TYPE_EMU,
+        .grp_size   = 0xFF,
+        .size_init  = xen_pt_ext_cap_multicast_size_init,
+        .emu_regs   = xen_pt_ext_cap_emu_reg_dummy,
+    },
+    /* Dynamic Power Allocation Extended Capability reg group */
+    {
+        .grp_id     = PCIE_EXT_CAP_ID(PCI_EXT_CAP_ID_DPA),
+        .grp_type   = XEN_PT_GRP_TYPE_EMU,
+        .grp_size   = 0xFF,
+        .size_init  = xen_pt_ext_cap_dpa_size_init,
+        .emu_regs   = xen_pt_ext_cap_emu_reg_dummy,
+    },
+    /* TPH Requester Extended Capability reg group */
+    {
+        .grp_id     = PCIE_EXT_CAP_ID(PCI_EXT_CAP_ID_TPH),
+        .grp_type   = XEN_PT_GRP_TYPE_EMU,
+        .grp_size   = 0xFF,
+        .size_init  = xen_pt_ext_cap_tph_size_init,
+        .emu_regs   = xen_pt_ext_cap_emu_reg_dummy,
+    },
+    /* Protocol Multiplexing Extended Capability reg group */
+    {
+        .grp_id     = PCIE_EXT_CAP_ID(PCI_EXT_CAP_ID_PMUX),
+        .grp_type   = XEN_PT_GRP_TYPE_EMU,
+        .grp_size   = 0xFF,
+        .size_init  = xen_pt_ext_cap_pmux_size_init,
+        .emu_regs   = xen_pt_ext_cap_emu_reg_dummy,
+    },
+    /* Downstream Port Containment Extended Capability reg group */
+    {
+        .grp_id     = PCIE_EXT_CAP_ID(PCI_EXT_CAP_ID_DPC),
+        .grp_type   = XEN_PT_GRP_TYPE_EMU,
+        .grp_size   = 0xFF,
+        .size_init  = xen_pt_ext_cap_dpc_size_init,
+        .emu_regs   = xen_pt_ext_cap_emu_reg_dummy,
+    },
     {
         .grp_size = 0,
     },
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel


* [Qemu-devel] [RFC PATCH 29/30] xen/pt: add Resizable BAR PCIe Extended Capability descriptor and sizing
  2018-03-12 18:33 ` Alexey Gerasimenko
@ 2018-03-12 18:34   ` Alexey Gerasimenko
  -1 siblings, 0 replies; 183+ messages in thread
From: Alexey Gerasimenko @ 2018-03-12 18:34 UTC (permalink / raw)
  To: xen-devel
  Cc: Alexey Gerasimenko, qemu-devel, Stefano Stabellini, Anthony Perard

Unlike other PCIe Extended Capabilities, we currently cannot allow
attempts to use the Resizable BAR Capability. Without specifically
handling BAR resizing we are likely to end up with a corrupted MMIO hole
layout if the guest OS attempts to use this feature. Recent Windows
versions have started to understand and use the Resizable BAR Capability
(see [1]).

For now, we need to hide the Resizable BAR Capability from the guest OS
until BAR resizing emulation support is implemented in Xen. This support
is a pretty much mandatory to-do feature, as the effect of writing to the
Resizable BAR control registers is similar to reprogramming normal BAR
registers -- i.e. it needs to be handled explicitly, resulting in the
corresponding MMIO BAR range(s) being remapped. Until then, mark the
Resizable BAR Capability as XEN_PT_GRP_TYPE_HARDWIRED.

[1]: https://docs.microsoft.com/en-us/windows-hardware/drivers/display/resizable-bar-support

Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
---
 hw/xen/xen_pt_config_init.c | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/hw/xen/xen_pt_config_init.c b/hw/xen/xen_pt_config_init.c
index 326f5671ff..b03b071b22 100644
--- a/hw/xen/xen_pt_config_init.c
+++ b/hw/xen/xen_pt_config_init.c
@@ -2156,6 +2156,26 @@ static int xen_pt_ext_cap_pmux_size_init(XenPCIPassthroughState *s,
     return ret;
 }
 
+/* get Resizable BAR Extended Capability register group size */
+static int xen_pt_ext_cap_rebar_size_init(XenPCIPassthroughState *s,
+                                          const XenPTRegGroupInfo *grp_reg,
+                                          uint32_t base_offset,
+                                          uint32_t *size)
+{
+    uint32_t rebar_ctl = 0;
+    uint32_t num_entries;
+
+    int ret = xen_host_pci_get_long(&s->real_device,
+                                    base_offset + PCI_REBAR_CTRL,
+                                    &rebar_ctl);
+    num_entries =
+        (rebar_ctl & PCI_REBAR_CTRL_NBAR_MASK) >> PCI_REBAR_CTRL_NBAR_SHIFT;
+
+    *size = num_entries * 8 + 4;
+
+    log_pcie_extended_cap(s, "Resizable BAR", base_offset, *size);
+    return ret;
+}
 
 static const XenPTRegGroupInfo xen_pt_emu_reg_grps[] = {
     /* Header Type0 reg group */
@@ -2488,6 +2508,13 @@ static const XenPTRegGroupInfo xen_pt_emu_reg_grps[] = {
         .size_init  = xen_pt_ext_cap_dpc_size_init,
         .emu_regs   = xen_pt_ext_cap_emu_reg_dummy,
     },
+    /* Resizable BAR Extended Capability reg group */
+    {
+        .grp_id     = PCIE_EXT_CAP_ID(PCI_EXT_CAP_ID_REBAR),
+        .grp_type   = XEN_PT_GRP_TYPE_HARDWIRED,
+        .grp_size   = 0xFF,
+        .size_init  = xen_pt_ext_cap_rebar_size_init,
+    },
     {
         .grp_size = 0,
     },
-- 
2.11.0



* [Qemu-devel] [RFC PATCH 30/30] xen/pt: add VC/VC9/MFVC PCIe Extended Capabilities descriptors and sizing
  2018-03-12 18:33 ` Alexey Gerasimenko
@ 2018-03-12 18:34   ` Alexey Gerasimenko
  -1 siblings, 0 replies; 183+ messages in thread
From: Alexey Gerasimenko @ 2018-03-12 18:34 UTC (permalink / raw)
  To: xen-devel
  Cc: Alexey Gerasimenko, qemu-devel, Stefano Stabellini, Anthony Perard

Virtual Channel/MFVC capabilities are of little use to emulate (passing
through accesses to them should be enough in most cases), yet they have
the most complex format of all PCIe Extended Capabilities, mostly because
the VC capability format allows a sparse config space layout, with gaps
between the parts which make up the VC capability.

We have the main capability body followed by a variable number of
entries, where each entry may additionally reference an arbitration
table outside the main capability body. There are no constraints on
these arbitration table offsets -- in theory, they may reside outside
the VC capability range, anywhere in PCIe extended config space. Also,
the size of each arbitration table is not fixed -- it depends on the
current VC/Port Arbitration Select field value.

To simplify things, this patch assumes that changing the VC/Port
Arbitration Select value (i.e. resizing arbitration tables) does not
cause the arbitration table offsets to change. Normally a device must
place its arbitration tables considering their maximum size, not the
current one. The maximum arbitration table size depends on the VC/Port
Arbitration Capability bitmask -- this is what is actually used here to
calculate the arbitration table size.

Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
---
 hw/xen/xen_pt_config_init.c | 192 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 192 insertions(+)

diff --git a/hw/xen/xen_pt_config_init.c b/hw/xen/xen_pt_config_init.c
index b03b071b22..ab9c233d84 100644
--- a/hw/xen/xen_pt_config_init.c
+++ b/hw/xen/xen_pt_config_init.c
@@ -2177,6 +2177,174 @@ static int xen_pt_ext_cap_rebar_size_init(XenPCIPassthroughState *s,
     return ret;
 }
 
+/* get VC/VC9/MFVC Extended Capability register group size */
+static uint32_t get_arb_table_len_max(XenPCIPassthroughState *s,
+                                      uint32_t max_bit_supported,
+                                      uint32_t arb_cap)
+{
+    int n_bit;
+    uint32_t table_max_size = 0;
+
+    if (!arb_cap) {
+        return 0;
+    }
+
+    for (n_bit = 7; n_bit >= 0 && !(arb_cap & (1 << n_bit)); n_bit--);
+
+    if (n_bit > max_bit_supported) {
+        XEN_PT_ERR(&s->dev, "Warning: encountered unknown VC arbitration "
+                   "capability supported: 0x%02x\n", (uint8_t) arb_cap);
+    }
+
+    switch (n_bit) {
+    case 0: break;
+    case 1: return 32;
+    case 2: return 64;
+    case 3: /*128 too*/
+    case 4: return 128;
+    default:
+        table_max_size = 8 << n_bit;
+    }
+
+    return table_max_size;
+}
+
+#define GET_ARB_TABLE_OFFSET(x)           (((x) >> 24) * 0x10)
+#define GET_VC_ARB_CAPABILITY(x)          ((x) & 0xFF)
+#define ARB_TABLE_ENTRY_SIZE_BITS(x)      (1 << (((x) & PCI_VC_CAP1_ARB_SIZE)\
+                                          >> 10))
+static int xen_pt_ext_cap_vchan_size_init(XenPCIPassthroughState *s,
+                                          const XenPTRegGroupInfo *grp_reg,
+                                          uint32_t base_offset,
+                                          uint32_t *size)
+{
+    uint32_t header;
+    uint32_t vc_cap_max_size = PCIE_CONFIG_SPACE_SIZE - base_offset;
+    uint32_t next_ptr;
+    uint32_t arb_table_start_max = 0, arb_table_end_max = 0;
+    uint32_t port_vc_cap1, port_vc_cap2, vc_rsrc_cap;
+    uint32_t ext_vc_count = 0;
+    uint32_t arb_table_entry_size;  /* in bits */
+    const char *cap_name;
+    int ret;
+    int i;
+
+    ret = xen_host_pci_get_long(&s->real_device, base_offset, &header);
+    if (ret) {
+        goto err_read;
+    }
+
+    next_ptr = PCI_EXT_CAP_NEXT(header);
+
+    switch (PCI_EXT_CAP_ID(header)) {
+    case PCI_EXT_CAP_ID_VC:
+    case PCI_EXT_CAP_ID_VC9:
+        cap_name = "Virtual Channel";
+        break;
+    case PCI_EXT_CAP_ID_MFVC:
+        cap_name = "Multi-Function VC";
+        break;
+    default:
+        XEN_PT_ERR(&s->dev, "Unknown VC Extended Capability ID "
+                   "encountered: 0x%04x\n", PCI_EXT_CAP_ID(header));
+        return -1;
+    }
+
+    if (next_ptr && next_ptr > base_offset) {
+        vc_cap_max_size = next_ptr - base_offset;
+    }
+
+    ret = xen_host_pci_get_long(&s->real_device,
+                                base_offset + PCI_VC_PORT_CAP1,
+                                &port_vc_cap1);
+    if (ret) {
+        goto err_read;
+    }
+
+    ret = xen_host_pci_get_long(&s->real_device,
+                                base_offset + PCI_VC_PORT_CAP2,
+                                &port_vc_cap2);
+    if (ret) {
+        goto err_read;
+    }
+
+    ext_vc_count = port_vc_cap1 & PCI_VC_CAP1_EVCC;
+
+    arb_table_start_max = GET_ARB_TABLE_OFFSET(port_vc_cap2);
+
+    /* check arbitration table offset for validity */
+    if (arb_table_start_max >= vc_cap_max_size) {
+        XEN_PT_ERR(&s->dev, "Warning: VC arbitration table offset points "
+                   "outside the expected range: %#04x\n",
+                   (uint16_t) arb_table_start_max);
+        /* skip this arbitration table */
+        arb_table_start_max = 0;
+    }
+
+    if (arb_table_start_max) {
+        uint32_t vc_arb_cap = GET_VC_ARB_CAPABILITY(port_vc_cap2);
+        uint32_t num_phases = get_arb_table_len_max(s, 3, vc_arb_cap);
+        uint32_t arb_tbl_sz = QEMU_ALIGN_UP(num_phases * 4, 32) / 8;
+
+        arb_table_end_max = base_offset + arb_table_start_max + arb_tbl_sz;
+    }
+
+    /* get Function/Port Arbitration Table Entry size */
+    arb_table_entry_size = ARB_TABLE_ENTRY_SIZE_BITS(port_vc_cap1);
+
+    /* process all VC Resource entries */
+    for (i = 0; i < ext_vc_count; i++) {
+        uint32_t arb_table_offset;
+
+        /* read VC Resource Capability */
+        ret = xen_host_pci_get_long(&s->real_device,
+            base_offset + PCI_VC_RES_CAP + i * PCI_CAP_VC_PER_VC_SIZEOF,
+            &vc_rsrc_cap);
+        if (ret) {
+            goto err_read;
+        }
+
+        arb_table_offset = GET_ARB_TABLE_OFFSET(vc_rsrc_cap);
+
+        if (arb_table_offset > arb_table_start_max) {
+            /* check arbitration table offset for validity */
+            if (arb_table_offset >= vc_cap_max_size) {
+                XEN_PT_ERR(&s->dev, "Warning: Port/Function arbitration table "
+                           "offset points outside the expected range: %#04x\n",
+                           (uint16_t) arb_table_offset);
+                /* skip this arbitration table */
+                arb_table_offset = 0;
+            } else {
+                arb_table_start_max = arb_table_offset;
+            }
+
+            if (arb_table_offset) {
+                uint32_t vc_arb_cap = GET_VC_ARB_CAPABILITY(vc_rsrc_cap);
+                uint32_t num_phases = get_arb_table_len_max(s, 5, vc_arb_cap);
+                uint32_t arb_tbl_sz =
+                    QEMU_ALIGN_UP(num_phases * arb_table_entry_size, 32) / 8;
+
+                arb_table_end_max = base_offset + arb_table_offset + arb_tbl_sz;
+            }
+        }
+    }
+
+    if (arb_table_end_max) {
+        *size = arb_table_end_max - base_offset;
+    } else {
+        *size = PCI_CAP_VC_BASE_SIZEOF +
+                ext_vc_count * PCI_CAP_VC_PER_VC_SIZEOF;
+    }
+
+    log_pcie_extended_cap(s, cap_name, base_offset, *size);
+    return 0;
+
+err_read:
+    XEN_PT_ERR(&s->dev, "Error while reading VC Extended Capability\n");
+    return ret;
+}
+
+
 static const XenPTRegGroupInfo xen_pt_emu_reg_grps[] = {
     /* Header Type0 reg group */
     {
@@ -2515,6 +2683,30 @@ static const XenPTRegGroupInfo xen_pt_emu_reg_grps[] = {
         .grp_size   = 0xFF,
         .size_init  = xen_pt_ext_cap_rebar_size_init,
     },
+    /* Virtual Channel Extended Capability reg group (2) */
+    {
+        .grp_id     = PCIE_EXT_CAP_ID(PCI_EXT_CAP_ID_VC),
+        .grp_type   = XEN_PT_GRP_TYPE_EMU,
+        .grp_size   = 0xFF,
+        .size_init  = xen_pt_ext_cap_vchan_size_init,
+        .emu_regs   = xen_pt_ext_cap_emu_reg_dummy,
+    },
+    /* Virtual Channel Extended Capability reg group (9) */
+    {
+        .grp_id     = PCIE_EXT_CAP_ID(PCI_EXT_CAP_ID_VC9),
+        .grp_type   = XEN_PT_GRP_TYPE_EMU,
+        .grp_size   = 0xFF,
+        .size_init  = xen_pt_ext_cap_vchan_size_init,
+        .emu_regs   = xen_pt_ext_cap_emu_reg_dummy,
+    },
+    /* Multi-Function Virtual Channel Extended Capability reg group */
+    {
+        .grp_id     = PCIE_EXT_CAP_ID(PCI_EXT_CAP_ID_MFVC),
+        .grp_type   = XEN_PT_GRP_TYPE_EMU,
+        .grp_size   = 0xFF,
+        .size_init  = xen_pt_ext_cap_vchan_size_init,
+        .emu_regs   = xen_pt_ext_cap_emu_reg_dummy,
+    },
     {
         .grp_size = 0,
     },
-- 
2.11.0


+
+    if (arb_table_end_max) {
+        *size = arb_table_end_max - base_offset;
+    } else {
+        *size = PCI_CAP_VC_BASE_SIZEOF +
+                ext_vc_count * PCI_CAP_VC_PER_VC_SIZEOF;
+    }
+
+    log_pcie_extended_cap(s, cap_name, base_offset, *size);
+    return 0;
+
+err_read:
+    XEN_PT_ERR(&s->dev, "Error while reading VC Extended Capability\n");
+    return ret;
+}
+
+
 static const XenPTRegGroupInfo xen_pt_emu_reg_grps[] = {
     /* Header Type0 reg group */
     {
@@ -2515,6 +2683,30 @@ static const XenPTRegGroupInfo xen_pt_emu_reg_grps[] = {
         .grp_size   = 0xFF,
         .size_init  = xen_pt_ext_cap_rebar_size_init,
     },
+    /* Virtual Channel Extended Capability reg group (2) */
+    {
+        .grp_id     = PCIE_EXT_CAP_ID(PCI_EXT_CAP_ID_VC),
+        .grp_type   = XEN_PT_GRP_TYPE_EMU,
+        .grp_size   = 0xFF,
+        .size_init  = xen_pt_ext_cap_vchan_size_init,
+        .emu_regs   = xen_pt_ext_cap_emu_reg_dummy,
+    },
+    /* Virtual Channel Extended Capability reg group (9) */
+    {
+        .grp_id     = PCIE_EXT_CAP_ID(PCI_EXT_CAP_ID_VC9),
+        .grp_type   = XEN_PT_GRP_TYPE_EMU,
+        .grp_size   = 0xFF,
+        .size_init  = xen_pt_ext_cap_vchan_size_init,
+        .emu_regs   = xen_pt_ext_cap_emu_reg_dummy,
+    },
+    /* Multi-Function Virtual Channel Extended Capability reg group */
+    {
+        .grp_id     = PCIE_EXT_CAP_ID(PCI_EXT_CAP_ID_MFVC),
+        .grp_type   = XEN_PT_GRP_TYPE_EMU,
+        .grp_size   = 0xFF,
+        .size_init  = xen_pt_ext_cap_vchan_size_init,
+        .emu_regs   = xen_pt_ext_cap_emu_reg_dummy,
+    },
     {
         .grp_size = 0,
     },
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

* Re: [RFC PATCH 01/12] libacpi: new DSDT ACPI table for Q35
  2018-03-12 18:33 ` [RFC PATCH 01/12] libacpi: new DSDT ACPI table for Q35 Alexey Gerasimenko
@ 2018-03-12 19:38   ` Konrad Rzeszutek Wilk
  2018-03-12 20:10     ` Alexey G
  2018-03-19 12:43   ` Roger Pau Monné
  1 sibling, 1 reply; 183+ messages in thread
From: Konrad Rzeszutek Wilk @ 2018-03-12 19:38 UTC (permalink / raw)
  To: Alexey Gerasimenko; +Cc: xen-devel, Wei Liu, Ian Jackson, Jan Beulich

On Tue, Mar 13, 2018 at 04:33:46AM +1000, Alexey Gerasimenko wrote:
> This patch adds the DSDT table for Q35 (new tools/libacpi/dsdt_q35.asl
> file). There are not many differences with dsdt.asl (for i440) at the
> moment, namely:
> 
> - BDF location of LPC Controller
> - Minor changes related to FDC detection
> - Addition of _OSC method to inform OSPM about PCIe features supported
> 
> As we are still using 4 PCI router links and their corresponding
> device/register addresses are same (offset 0x60), no need to change PCI
> routing descriptions.
> 
> Also, ACPI hotplug is still used to control passed through device hot
> (un)plug (as it was for i440).
> 
> Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
> ---
>  tools/libacpi/dsdt_q35.asl | 551 +++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 551 insertions(+)
>  create mode 100644 tools/libacpi/dsdt_q35.asl
> 
> diff --git a/tools/libacpi/dsdt_q35.asl b/tools/libacpi/dsdt_q35.asl
> new file mode 100644
> index 0000000000..cd02946a07
> --- /dev/null
> +++ b/tools/libacpi/dsdt_q35.asl
> @@ -0,0 +1,551 @@
> +/******************************************************************************
> + * DSDT for Xen with Qemu device model (for Q35 machine)
> + *
> + * Copyright (c) 2004, Intel Corporation.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU Lesser General Public License as published
> + * by the Free Software Foundation; version 2.1 only. with the special
> + * exception on linking described in file LICENSE.

I don't see the 'LICENSE' file in Xen's directory?

Also, your email does not seem to be coming from Intel, so I have to ask,
where did this file originally come from?
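
Not part of this patch, but for context: the _OSC method mentioned in the commit message typically follows the pattern below in PCIe host bridge DSDTs. This is only an illustrative sketch -- the capability mask (0x1F) and the Supported/Control word handling are placeholders, not the actual contents of dsdt_q35.asl:

```
Scope (\_SB.PCI0)
{
    Name (SUPP, Zero)   /* PCI _OSC Support Field value */

    Method (_OSC, 4, NotSerialized)
    {
        CreateDWordField (Arg3, Zero, CDW1)

        /* Standard PCI Host Bridge _OSC UUID */
        If (LEqual (Arg0, ToUUID ("33db4d5b-1ff7-401c-9657-7441c03dd766")))
        {
            CreateDWordField (Arg3, 4, CDW2)
            CreateDWordField (Arg3, 8, CDW3)
            Store (CDW2, SUPP)
            /* Grant the OS control of only the features the
               emulated platform actually handles */
            And (CDW3, 0x1F, CDW3)
        }
        Else
        {
            /* Unrecognized UUID: flag it in bit 2 of CDW1 */
            Or (CDW1, 0x04, CDW1)
        }
        Return (Arg3)
    }
}
```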

* Re: [Qemu-devel] [RFC PATCH 16/30] q35/xen: Add Xen platform device support for Q35
  2018-03-12 18:34   ` Alexey Gerasimenko
  (?)
@ 2018-03-12 19:44   ` Eduardo Habkost
  2018-03-12 20:56       ` Alexey G
  -1 siblings, 1 reply; 183+ messages in thread
From: Eduardo Habkost @ 2018-03-12 19:44 UTC (permalink / raw)
  To: Alexey Gerasimenko
  Cc: xen-devel, qemu-devel, Marcel Apfelbaum, Paolo Bonzini,
	Richard Henderson, Michael S. Tsirkin

On Tue, Mar 13, 2018 at 04:34:01AM +1000, Alexey Gerasimenko wrote:
> Current Xen/QEMU method to control Xen Platform device on i440 is a bit
> odd -- enabling/disabling Xen platform device actually modifies the QEMU
> emulated machine type, namely xenfv <--> pc.
> 
> In order to avoid multiplying machine types, use a new way to control Xen
> Platform device for QEMU -- "xen-platform-dev" machine property (bool).
> To maintain backward compatibility with existing Xen/QEMU setups, this
> is only applicable to q35 machine currently. i440 emulation still uses the
> old method (i.e. xenfv/pc machine selection) to control Xen Platform
> device, this may be changed later to xen-platform-dev property as well.
> 
> This way we can use a single machine type (q35) and change just
> xen-platform-dev value to on/off to control Xen platform device.
> 
> Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
> ---
[...]
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 6585058c6c..cee0b92028 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -38,6 +38,7 @@ DEF("machine", HAS_ARG, QEMU_OPTION_machine, \
>      "                dump-guest-core=on|off include guest memory in a core dump (default=on)\n"
>      "                mem-merge=on|off controls memory merge support (default: on)\n"
>      "                igd-passthru=on|off controls IGD GFX passthrough support (default=off)\n"
> +    "                xen-platform-dev=on|off controls Xen Platform device (default=off)\n"
>      "                aes-key-wrap=on|off controls support for AES key wrapping (default=on)\n"
>      "                dea-key-wrap=on|off controls support for DEA key wrapping (default=on)\n"
>      "                suppress-vmdesc=on|off disables self-describing migration (default=off)\n"

What are the obstacles preventing "-device xen-platform" from
working?  It would be better than adding a new boolean option to
-machine.
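
For comparison, the two approaches under discussion would look roughly like this on the command line (a sketch only; the -device spelling assumes the xen-platform qdev is made user-creatable, which is exactly the open question here):

```
# proposed in this patch: machine property
qemu-system-x86_64 -machine q35,accel=xen,xen-platform-dev=on ...

# suggested alternative: plain qdev
qemu-system-x86_64 -machine q35,accel=xen -device xen-platform ...
```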

-- 
Eduardo

* Re: [RFC PATCH 01/12] libacpi: new DSDT ACPI table for Q35
  2018-03-12 19:38   ` Konrad Rzeszutek Wilk
@ 2018-03-12 20:10     ` Alexey G
  2018-03-12 20:32       ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 183+ messages in thread
From: Alexey G @ 2018-03-12 20:10 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: xen-devel, Wei Liu, Ian Jackson, Jan Beulich

On Mon, 12 Mar 2018 15:38:03 -0400
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:

>On Tue, Mar 13, 2018 at 04:33:46AM +1000, Alexey Gerasimenko wrote:
>> This patch adds the DSDT table for Q35 (new
>> tools/libacpi/dsdt_q35.asl file). There are not many differences
>> with dsdt.asl (for i440) at the moment, namely:
>> 
>> - BDF location of LPC Controller
>> - Minor changes related to FDC detection
>> - Addition of _OSC method to inform OSPM about PCIe features
>> supported
>> 
>> As we are still using 4 PCI router links and their corresponding
>> device/register addresses are same (offset 0x60), no need to change
>> PCI routing descriptions.
>> 
>> Also, ACPI hotplug is still used to control passed through device hot
>> (un)plug (as it was for i440).
>> 
>> Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
>> ---
>>  tools/libacpi/dsdt_q35.asl | 551
>> +++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 551
>> insertions(+) create mode 100644 tools/libacpi/dsdt_q35.asl
>> 
>> diff --git a/tools/libacpi/dsdt_q35.asl b/tools/libacpi/dsdt_q35.asl
>> new file mode 100644
>> index 0000000000..cd02946a07
>> --- /dev/null
>> +++ b/tools/libacpi/dsdt_q35.asl
>> @@ -0,0 +1,551 @@
>> +/******************************************************************************
>> + * DSDT for Xen with Qemu device model (for Q35 machine)
>> + *
>> + * Copyright (c) 2004, Intel Corporation.
>> + *
>> + * This program is free software; you can redistribute it and/or
>> modify
>> + * it under the terms of the GNU Lesser General Public License as
>> published
>> + * by the Free Software Foundation; version 2.1 only. with the
>> special
>> + * exception on linking described in file LICENSE.  
>
>I don't see the 'LICENSE' file in Xen's directory?
>
>Also, your email does not seem to be coming from Intel, so I have to
>ask, where did this file originally come from?

It's basically Xen's dsdt.asl with some modifications related to Q35.
Currently only a few modifications are needed, but in the future dsdt.asl and
dsdt_q35.asl will diverge more from each other -- that's the reason why
a separate file was forked instead of applying these changes to dsdt.asl
directly, for example, as #ifdef-parts.

* Re: [RFC PATCH 01/12] libacpi: new DSDT ACPI table for Q35
  2018-03-12 20:10     ` Alexey G
@ 2018-03-12 20:32       ` Konrad Rzeszutek Wilk
  2018-03-12 21:19         ` Alexey G
  0 siblings, 1 reply; 183+ messages in thread
From: Konrad Rzeszutek Wilk @ 2018-03-12 20:32 UTC (permalink / raw)
  To: Alexey G; +Cc: xen-devel, Wei Liu, Ian Jackson, Jan Beulich

On Tue, Mar 13, 2018 at 06:10:35AM +1000, Alexey G wrote:
> On Mon, 12 Mar 2018 15:38:03 -0400
> Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> 
> >On Tue, Mar 13, 2018 at 04:33:46AM +1000, Alexey Gerasimenko wrote:
> >> This patch adds the DSDT table for Q35 (new
> >> tools/libacpi/dsdt_q35.asl file). There are not many differences
> >> with dsdt.asl (for i440) at the moment, namely:
> >> 
> >> - BDF location of LPC Controller
> >> - Minor changes related to FDC detection
> >> - Addition of _OSC method to inform OSPM about PCIe features
> >> supported
> >> 
> >> As we are still using 4 PCI router links and their corresponding
> >> device/register addresses are same (offset 0x60), no need to change
> >> PCI routing descriptions.
> >> 
> >> Also, ACPI hotplug is still used to control passed through device hot
> >> (un)plug (as it was for i440).
> >> 
> >> Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
> >> ---
> >>  tools/libacpi/dsdt_q35.asl | 551
> >> +++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 551
> >> insertions(+) create mode 100644 tools/libacpi/dsdt_q35.asl
> >> 
> >> diff --git a/tools/libacpi/dsdt_q35.asl b/tools/libacpi/dsdt_q35.asl
> >> new file mode 100644
> >> index 0000000000..cd02946a07
> >> --- /dev/null
> >> +++ b/tools/libacpi/dsdt_q35.asl
> >> @@ -0,0 +1,551 @@
> >> +/******************************************************************************
> >> + * DSDT for Xen with Qemu device model (for Q35 machine)
> >> + *
> >> + * Copyright (c) 2004, Intel Corporation.
> >> + *
> >> + * This program is free software; you can redistribute it and/or
> >> modify
> >> + * it under the terms of the GNU Lesser General Public License as
> >> published
> >> + * by the Free Software Foundation; version 2.1 only. with the
> >> special
> >> + * exception on linking described in file LICENSE.  
> >
> >I don't see the 'LICENSE' file in Xen's directory?
> >
> >Also, your email does not seem to be coming from Intel, so I have to
> >ask, where did this file originally come from?
> 
> It's basically Xen's dsdt.asl with some modifications related to Q35.
> Currently only few modifications needed, but in the future dsdt.asl and
> dsdt_q35.asl will diverge more from each other -- that's the reason why
> a separate file was forked instead applying these changes to dsdt.asl
> directly, for example, as #ifdef-parts.

OK, in that case you should make a separate patch that adds this file
(completely unmodified) and make sure you CC the Intel folks (Kevin, et al.)
so they can Ack it.

Thank you.


* Re: [Qemu-devel] [RFC PATCH 16/30] q35/xen: Add Xen platform device support for Q35
  2018-03-12 19:44   ` [Qemu-devel] " Eduardo Habkost
@ 2018-03-12 20:56       ` Alexey G
  0 siblings, 0 replies; 183+ messages in thread
From: Alexey G @ 2018-03-12 20:56 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: xen-devel, qemu-devel, Marcel Apfelbaum, Paolo Bonzini,
	Richard Henderson, Michael S. Tsirkin

On Mon, 12 Mar 2018 16:44:06 -0300
Eduardo Habkost <ehabkost@redhat.com> wrote:

>On Tue, Mar 13, 2018 at 04:34:01AM +1000, Alexey Gerasimenko wrote:
>> Current Xen/QEMU method to control Xen Platform device on i440 is a
>> bit odd -- enabling/disabling Xen platform device actually modifies
>> the QEMU emulated machine type, namely xenfv <--> pc.
>> 
>> In order to avoid multiplying machine types, use a new way to
>> control Xen Platform device for QEMU -- "xen-platform-dev" machine
>> property (bool). To maintain backward compatibility with existing
>> Xen/QEMU setups, this is only applicable to q35 machine currently.
>> i440 emulation still uses the old method (i.e. xenfv/pc machine
>> selection) to control Xen Platform device, this may be changed later
>> to xen-platform-dev property as well.
>> 
>> This way we can use a single machine type (q35) and change just
>> xen-platform-dev value to on/off to control Xen platform device.
>> 
>> Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
>> ---  
>[...]
>> diff --git a/qemu-options.hx b/qemu-options.hx
>> index 6585058c6c..cee0b92028 100644
>> --- a/qemu-options.hx
>> +++ b/qemu-options.hx
>> @@ -38,6 +38,7 @@ DEF("machine", HAS_ARG, QEMU_OPTION_machine, \
>>      "                dump-guest-core=on|off include guest memory in
>> a core dump (default=on)\n" "                mem-merge=on|off
>> controls memory merge support (default: on)\n" "
>> igd-passthru=on|off controls IGD GFX passthrough support
>> (default=off)\n"
>> +    "                xen-platform-dev=on|off controls Xen Platform
>> device (default=off)\n" "                aes-key-wrap=on|off
>> controls support for AES key wrapping (default=on)\n"
>> "                dea-key-wrap=on|off controls support for DEA key
>> wrapping (default=on)\n" "                suppress-vmdesc=on|off
>> disables self-describing migration (default=off)\n"  
>
>What are the obstacles preventing "-device xen-platform" from
>working?  It would be better than adding a new boolean option to
>-machine.

I guess the initial assumption was that changing the
xen_platform_device value in Xen's options may cause some additional
changes in platform configuration besides adding (or not) the Xen
Platform device, hence a completely different machine type was chosen
(xenfv).

At the moment pc,accel=xen/xenfv selection mostly governs
only the Xen Platform device presence. Also setting max_cpus to
HVM_MAX_VCPUS depends on it, but this isn't applicable to a
'pc,accel=xen' machine for some reason.

If applying HVM_MAX_VCPUS to max_cpus is really necessary I think it's
better to set it unconditionally for all 'accel=xen' HVM machine
types inside xen_enabled() block. Right now it's missing for
pc,accel=xen and q35,accel=xen.

I'll check if supplying the Xen platform device via the '-device' option
will be ok for all usage cases.

* Re: [RFC PATCH 01/12] libacpi: new DSDT ACPI table for Q35
  2018-03-12 20:32       ` Konrad Rzeszutek Wilk
@ 2018-03-12 21:19         ` Alexey G
  2018-03-13  2:41           ` Tian, Kevin
  0 siblings, 1 reply; 183+ messages in thread
From: Alexey G @ 2018-03-12 21:19 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: xen-devel, Kevin Tian, Wei Liu, Ian Jackson, Jan Beulich

On Mon, 12 Mar 2018 16:32:27 -0400
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:

>On Tue, Mar 13, 2018 at 06:10:35AM +1000, Alexey G wrote:
>> On Mon, 12 Mar 2018 15:38:03 -0400
>> Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
>>   
>> >On Tue, Mar 13, 2018 at 04:33:46AM +1000, Alexey Gerasimenko
>> >wrote:  
>> >> This patch adds the DSDT table for Q35 (new
>> >> tools/libacpi/dsdt_q35.asl file). There are not many differences
>> >> with dsdt.asl (for i440) at the moment, namely:
>> >> 
>> >> - BDF location of LPC Controller
>> >> - Minor changes related to FDC detection
>> >> - Addition of _OSC method to inform OSPM about PCIe features
>> >> supported
>> >> 
>> >> As we are still using 4 PCI router links and their corresponding
>> >> device/register addresses are same (offset 0x60), no need to
>> >> change PCI routing descriptions.
>> >> 
>> >> Also, ACPI hotplug is still used to control passed through device
>> >> hot (un)plug (as it was for i440).
>> >> 
>> >> Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
>> >> ---
>> >>  tools/libacpi/dsdt_q35.asl | 551
>> >> +++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 551
>> >> insertions(+) create mode 100644 tools/libacpi/dsdt_q35.asl
>> >> 
>> >> diff --git a/tools/libacpi/dsdt_q35.asl
>> >> b/tools/libacpi/dsdt_q35.asl new file mode 100644
>> >> index 0000000000..cd02946a07
>> >> --- /dev/null
>> >> +++ b/tools/libacpi/dsdt_q35.asl
>> >> @@ -0,0 +1,551 @@
>> >> +/******************************************************************************
>> >> + * DSDT for Xen with Qemu device model (for Q35 machine)
>> >> + *
>> >> + * Copyright (c) 2004, Intel Corporation.
>> >> + *
>> >> + * This program is free software; you can redistribute it and/or
>> >> modify
>> >> + * it under the terms of the GNU Lesser General Public License as
>> >> published
>> >> + * by the Free Software Foundation; version 2.1 only. with the
>> >> special
>> >> + * exception on linking described in file LICENSE.    
>> >
>> >I don't see the 'LICENSE' file in Xen's directory?
>> >
>> >Also, your email does not seem to be coming from Intel, so I have to
>> >ask, where did this file originally come from?  
>> 
>> It's basically Xen's dsdt.asl with some modifications related to Q35.
>> Currently only few modifications needed, but in the future dsdt.asl
>> and dsdt_q35.asl will diverge more from each other -- that's the
>> reason why a separate file was forked instead applying these changes
>> to dsdt.asl directly, for example, as #ifdef-parts.  
>
>OK, as such you should make a seperate patch that adds this file (and
>be completly unmodified) and make sure you CC Intel folks (Kevin, et
>all) so they can Ack it.

Kevin -- I assume you mean Kevin Tian <kevin.tian@intel.com>? Cc'ing
him.
Please let me know of any other people from Intel who are also responsible;
the MAINTAINERS file doesn't tell much about Intel contacts
for /libacpi, unfortunately.

* Re: [Qemu-devel] [RFC PATCH 16/30] q35/xen: Add Xen platform device support for Q35
  2018-03-12 20:56       ` Alexey G
  (?)
  (?)
@ 2018-03-12 21:44       ` Eduardo Habkost
  2018-03-13 23:49           ` Alexey G
  -1 siblings, 1 reply; 183+ messages in thread
From: Eduardo Habkost @ 2018-03-12 21:44 UTC (permalink / raw)
  To: Alexey G
  Cc: xen-devel, qemu-devel, Marcel Apfelbaum, Paolo Bonzini,
	Richard Henderson, Michael S. Tsirkin

On Tue, Mar 13, 2018 at 06:56:37AM +1000, Alexey G wrote:
> On Mon, 12 Mar 2018 16:44:06 -0300
> Eduardo Habkost <ehabkost@redhat.com> wrote:
> 
> >On Tue, Mar 13, 2018 at 04:34:01AM +1000, Alexey Gerasimenko wrote:
> >> Current Xen/QEMU method to control Xen Platform device on i440 is a
> >> bit odd -- enabling/disabling Xen platform device actually modifies
> >> the QEMU emulated machine type, namely xenfv <--> pc.
> >> 
> >> In order to avoid multiplying machine types, use a new way to
> >> control Xen Platform device for QEMU -- "xen-platform-dev" machine
> >> property (bool). To maintain backward compatibility with existing
> >> Xen/QEMU setups, this is only applicable to q35 machine currently.
> >> i440 emulation still uses the old method (i.e. xenfv/pc machine
> >> selection) to control Xen Platform device, this may be changed later
> >> to xen-platform-dev property as well.
> >> 
> >> This way we can use a single machine type (q35) and change just
> >> xen-platform-dev value to on/off to control Xen platform device.
> >> 
> >> Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
> >> ---  
> >[...]
> >> diff --git a/qemu-options.hx b/qemu-options.hx
> >> index 6585058c6c..cee0b92028 100644
> >> --- a/qemu-options.hx
> >> +++ b/qemu-options.hx
> >> @@ -38,6 +38,7 @@ DEF("machine", HAS_ARG, QEMU_OPTION_machine, \
> >>      "                dump-guest-core=on|off include guest memory in
> >> a core dump (default=on)\n" "                mem-merge=on|off
> >> controls memory merge support (default: on)\n" "
> >> igd-passthru=on|off controls IGD GFX passthrough support
> >> (default=off)\n"
> >> +    "                xen-platform-dev=on|off controls Xen Platform
> >> device (default=off)\n" "                aes-key-wrap=on|off
> >> controls support for AES key wrapping (default=on)\n"
> >> "                dea-key-wrap=on|off controls support for DEA key
> >> wrapping (default=on)\n" "                suppress-vmdesc=on|off
> >> disables self-describing migration (default=off)\n"  
> >
> >What are the obstacles preventing "-device xen-platform" from
> >working?  It would be better than adding a new boolean option to
> >-machine.
> 
> I guess the initial assumption was that changing the
> xen_platform_device value in Xen's options may cause some additional
> changes in platform configuration besides adding (or not) the Xen
> Platform device, hence a completely different machine type was chosen
> (xenfv).
> 
> At the moment pc,accel=xen/xenfv selection mostly governs
> only the Xen Platform device presence. Also setting max_cpus to
> HVM_MAX_VCPUS depends on it, but this doesn't applicable to a
> 'pc,accel=xen' machine for some reason.
> 
> If applying HVM_MAX_VCPUS to max_cpus is really necessary I think it's
> better to set it unconditionally for all 'accel=xen' HVM machine
> types inside xen_enabled() block. Right now it's missing for
> pc,accel=xen and q35,accel=xen.

If you are talking about MachineClass::max_cpus, note that it is
returned by query-machines, so it's supposed to be a static
value.  Changing it at runtime would mean the query-machines value
is incorrect.

Is HVM_MAX_VCPUS higher or lower than 255?  If it's higher, does
it mean the current value on pc and q35 isn't accurate?


> 
> I'll check if supplying the Xen platform device via the '-device' option
> will be ok for all usage cases.

Is HVM_MAX_VCPUS something that needs to be enabled because of
accel=xen or because or the xen-platform device?

If it's just because of accel=xen, we could introduce a
AccelClass::max_cpus() method (we also have KVM-imposed CPU count
limits, currently implemented inside kvm_init()).

-- 
Eduardo

* Re: [RFC PATCH 01/12] libacpi: new DSDT ACPI table for Q35
  2018-03-12 21:19         ` Alexey G
@ 2018-03-13  2:41           ` Tian, Kevin
  0 siblings, 0 replies; 183+ messages in thread
From: Tian, Kevin @ 2018-03-13  2:41 UTC (permalink / raw)
  To: Alexey G, Konrad Rzeszutek Wilk
  Cc: xen-devel, Peng, Chao P, Wei Liu, Ian Jackson, Jan Beulich

> From: Alexey G [mailto:x1917x@gmail.com]
> Sent: Tuesday, March 13, 2018 5:20 AM
> 
> On Mon, 12 Mar 2018 16:32:27 -0400
> Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> 
> >On Tue, Mar 13, 2018 at 06:10:35AM +1000, Alexey G wrote:
> >> On Mon, 12 Mar 2018 15:38:03 -0400
> >> Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> >>
> >> >On Tue, Mar 13, 2018 at 04:33:46AM +1000, Alexey Gerasimenko
> >> >wrote:
> >> >> This patch adds the DSDT table for Q35 (new
> >> >> tools/libacpi/dsdt_q35.asl file). There are not many differences
> >> >> with dsdt.asl (for i440) at the moment, namely:
> >> >>
> >> >> - BDF location of LPC Controller
> >> >> - Minor changes related to FDC detection
> >> >> - Addition of _OSC method to inform OSPM about PCIe features
> >> >> supported
> >> >>
> >> >> As we are still using 4 PCI router links and their corresponding
> >> >> device/register addresses are same (offset 0x60), no need to
> >> >> change PCI routing descriptions.
> >> >>
> >> >> Also, ACPI hotplug is still used to control passed through device
> >> >> hot (un)plug (as it was for i440).
> >> >>
> >> >> Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
> >> >> ---
> >> >>  tools/libacpi/dsdt_q35.asl | 551
> >> >> +++++++++++++++++++++++++++++++++++++++++++++ 1 file changed,
> 551
> >> >> insertions(+) create mode 100644 tools/libacpi/dsdt_q35.asl
> >> >>
> >> >> diff --git a/tools/libacpi/dsdt_q35.asl
> >> >> b/tools/libacpi/dsdt_q35.asl new file mode 100644
> >> >> index 0000000000..cd02946a07
> >> >> --- /dev/null
> >> >> +++ b/tools/libacpi/dsdt_q35.asl
> >> >> @@ -0,0 +1,551 @@
> >> >>
> +/************************************************************
> ******************
> >> >> + * DSDT for Xen with Qemu device model (for Q35 machine)
> >> >> + *
> >> >> + * Copyright (c) 2004, Intel Corporation.
> >> >> + *
> >> >> + * This program is free software; you can redistribute it and/or
> >> >> modify
> >> >> + * it under the terms of the GNU Lesser General Public License as
> >> >> published
> >> >> + * by the Free Software Foundation; version 2.1 only. with the
> >> >> special
> >> >> + * exception on linking described in file LICENSE.
> >> >
> >> >I don't see the 'LICENSE' file in Xen's directory?
> >> >
> >> >Also, your email does not seem to be coming from Intel, so I have to
> >> >ask, where did this file originally come from?
> >>
> >> It's basically Xen's dsdt.asl with some modifications related to Q35.
> >> Currently only a few modifications are needed, but in the future dsdt.asl
> >> and dsdt_q35.asl will diverge more from each other -- that's the
> >> reason why a separate file was forked instead applying these changes
> >> to dsdt.asl directly, for example, as #ifdef-parts.
> >
> >OK, as such you should make a separate patch that adds this file (and
> >leave it completely unmodified) and make sure you CC Intel folks (Kevin, et
> >al.) so they can Ack it.
> 
> Kevin -- I assume you mean Kevin Tian <kevin.tian@intel.com>? Cc'ing
> him.
> Please let me know other persons from Intel who are also responsible,
> the MAINTAINERS file doesn't tell much about Intel people
> regarding /libacpi, unfortunately.

I'm not the maintainer of libacpi (should be Jan?). But I'm CCing my
colleague (Chao Peng) here, who did some study of Q35 support
before and can help review.

Thanks
Kevin

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 00/30] Xen Q35 Bringup patches + support for PCIe Extended Capabilities for passed through devices
  2018-03-12 18:33 ` Alexey Gerasimenko
@ 2018-03-13  9:21   ` Daniel P. Berrangé
  -1 siblings, 0 replies; 183+ messages in thread
From: Daniel P. Berrangé @ 2018-03-13  9:21 UTC (permalink / raw)
  To: Alexey Gerasimenko; +Cc: xen-devel, qemu-devel

The subject line says to expect 30 patches, but you've only sent 18 to
the list here. I eventually figured out that the first 12 patches were
in Xen code and so not sent to qemu-devel.

In future, if you have changes that affect multiple completely separate
projects, send them as separate series, i.e. just send PATCH 00/18 to
qemu-devel so it doesn't look like a bunch of patches have gone missing.

On Tue, Mar 13, 2018 at 04:33:45AM +1000, Alexey Gerasimenko wrote:
> How to use the Q35 feature:
> 
> A new domain config option was implemented: device_model_machine. It's
> a string which has following possible values:
> - "i440" -- i440 emulation (default)
> - "q35"  -- emulate a Q35 machine. By default, the storage interface is
>   AHCI.

Presumably this is mapping to the QEMU -machine arg, so it feels desirable
to keep the same naming scheme, i.e. allow any of the versioned machine
names that QEMU uses: e.g. any of the "pc-q35-2.x" versioned types, or 'q35'
as an alias for the latest, and the "pc-i440fx-2.x" versioned types or 'pc'
as an alias for the latest, rather than 'i440', which needlessly diverges
from the QEMU machine type.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 16/30] q35/xen: Add Xen platform device support for Q35
  2018-03-12 18:34   ` Alexey Gerasimenko
@ 2018-03-13  9:24     ` Daniel P. Berrangé
  -1 siblings, 0 replies; 183+ messages in thread
From: Daniel P. Berrangé @ 2018-03-13  9:24 UTC (permalink / raw)
  To: Alexey Gerasimenko
  Cc: xen-devel, Eduardo Habkost, Michael S. Tsirkin, qemu-devel,
	Marcel Apfelbaum, Paolo Bonzini, Richard Henderson

On Tue, Mar 13, 2018 at 04:34:01AM +1000, Alexey Gerasimenko wrote:
> Current Xen/QEMU method to control Xen Platform device on i440 is a bit
> odd -- enabling/disabling Xen platform device actually modifies the QEMU
> emulated machine type, namely xenfv <--> pc.
> 
> In order to avoid multiplying machine types, use a new way to control Xen
> Platform device for QEMU -- "xen-platform-dev" machine property (bool).
> To maintain backward compatibility with existing Xen/QEMU setups, this
> is only applicable to q35 machine currently. i440 emulation still uses the
> old method (i.e. xenfv/pc machine selection) to control Xen Platform
> device, this may be changed later to xen-platform-dev property as well.

The change you made to q35 is pretty tiny, so I imagine the equivalent
change to the pc machine is equally small. IOW, I think you should just
convert them both straight away rather than providing an inconsistent
configuration approach for q35 vs pc.

> This way we can use a single machine type (q35) and change just
> xen-platform-dev value to on/off to control Xen platform device.
> 
> Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
> ---
>  hw/core/machine.c   | 21 +++++++++++++++++++++
>  hw/i386/pc_q35.c    | 14 ++++++++++++++
>  include/hw/boards.h |  1 +
>  qemu-options.hx     |  1 +
>  4 files changed, 37 insertions(+)
> 
> diff --git a/hw/core/machine.c b/hw/core/machine.c
> index 5e2bbcdace..205e7da3ce 100644
> --- a/hw/core/machine.c
> +++ b/hw/core/machine.c
> @@ -290,6 +290,20 @@ static void machine_set_igd_gfx_passthru(Object *obj, bool value, Error **errp)
>      ms->igd_gfx_passthru = value;
>  }
>  
> +static bool machine_get_xen_platform_dev(Object *obj, Error **errp)
> +{
> +    MachineState *ms = MACHINE(obj);
> +
> +    return ms->xen_platform_dev;
> +}
> +
> +static void machine_set_xen_platform_dev(Object *obj, bool value, Error **errp)
> +{
> +    MachineState *ms = MACHINE(obj);
> +
> +    ms->xen_platform_dev = value;
> +}
> +
>  static char *machine_get_firmware(Object *obj, Error **errp)
>  {
>      MachineState *ms = MACHINE(obj);
> @@ -595,6 +609,13 @@ static void machine_class_init(ObjectClass *oc, void *data)
>      object_class_property_set_description(oc, "igd-passthru",
>          "Set on/off to enable/disable igd passthrou", &error_abort);
>  
> +    object_class_property_add_bool(oc, "xen-platform-dev",
> +        machine_get_xen_platform_dev,
> +        machine_set_xen_platform_dev, &error_abort);
> +    object_class_property_set_description(oc, "xen-platform-dev",
> +        "Set on/off to enable/disable Xen Platform device",
> +        &error_abort);
> +
>      object_class_property_add_str(oc, "firmware",
>          machine_get_firmware, machine_set_firmware,
>          &error_abort);
> diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
> index 0db670f6d7..62caf924cf 100644
> --- a/hw/i386/pc_q35.c
> +++ b/hw/i386/pc_q35.c
> @@ -56,6 +56,18 @@
>  /* ICH9 AHCI has 6 ports */
>  #define MAX_SATA_PORTS     6
>  
> +static void q35_xen_hvm_init(MachineState *machine)
> +{
> +    PCMachineState *pcms = PC_MACHINE(machine);
> +
> +    if (xen_enabled()) {
> +        /* check if Xen Platform device is enabled */
> +        if (machine->xen_platform_dev) {
> +            pci_create_simple(pcms->bus, -1, "xen-platform");
> +        }
> +    }
> +}
> +
>  /* PC hardware initialisation */
>  static void pc_q35_init(MachineState *machine)
>  {
> @@ -207,6 +219,8 @@ static void pc_q35_init(MachineState *machine)
>      if (xen_enabled()) {
>          pci_bus_irqs(host_bus, xen_cmn_set_irq, xen_cmn_pci_slot_get_pirq,
>                       ich9_lpc, ICH9_XEN_NUM_IRQ_SOURCES);
> +
> +        q35_xen_hvm_init(machine);
>      } else {
>          pci_bus_irqs(host_bus, ich9_lpc_set_irq, ich9_lpc_map_irq, ich9_lpc,
>                       ICH9_LPC_NB_PIRQS);
> diff --git a/include/hw/boards.h b/include/hw/boards.h
> index efb0a9edfd..f35fc1cc03 100644
> --- a/include/hw/boards.h
> +++ b/include/hw/boards.h
> @@ -238,6 +238,7 @@ struct MachineState {
>      bool usb;
>      bool usb_disabled;
>      bool igd_gfx_passthru;
> +    bool xen_platform_dev;
>      char *firmware;
>      bool iommu;
>      bool suppress_vmdesc;
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 6585058c6c..cee0b92028 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -38,6 +38,7 @@ DEF("machine", HAS_ARG, QEMU_OPTION_machine, \
>      "                dump-guest-core=on|off include guest memory in a core dump (default=on)\n"
>      "                mem-merge=on|off controls memory merge support (default: on)\n"
>      "                igd-passthru=on|off controls IGD GFX passthrough support (default=off)\n"
> +    "                xen-platform-dev=on|off controls Xen Platform device (default=off)\n"
>      "                aes-key-wrap=on|off controls support for AES key wrapping (default=on)\n"
>      "                dea-key-wrap=on|off controls support for DEA key wrapping (default=on)\n"
>      "                suppress-vmdesc=on|off disables self-describing migration (default=off)\n"
> -- 
> 2.11.0
> 
> 

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 00/30] Xen Q35 Bringup patches + support for PCIe Extended Capabilities for passed through devices
  2018-03-13  9:21   ` Daniel P. Berrangé
@ 2018-03-13 11:37     ` Alexey G
  -1 siblings, 0 replies; 183+ messages in thread
From: Alexey G @ 2018-03-13 11:37 UTC (permalink / raw)
  To: Daniel P. Berrangé; +Cc: xen-devel, qemu-devel

On Tue, 13 Mar 2018 09:21:54 +0000
Daniel P. Berrangé <berrange@redhat.com> wrote:

>The subject line says to expect 30 patches, but you've only sent 18 to
>the list here. I eventually figured out that the first 12 patches were
>in Xen code and so not sent to qemu-devel.
>
>For future if you have changes that affect multiple completely separate
>projects, send them as separate series. ie just send PATCH 00/18 to
>QEMU devel so it doesn't look like a bunch of patches have gone
>missing.

OK, will do for the next versions.

>> A new domain config option was implemented: device_model_machine.
>> It's a string which has following possible values:
>> - "i440" -- i440 emulation (default)
>> - "q35"  -- emulate a Q35 machine. By default, the storage interface
>> is AHCI.  
>
>Presumably this is mapping to the QEMU -machine arg, so it feels
>desirable to keep the same naming scheme. ie allow any of the
>versioned machine names that QEMU uses. eg any of "pc-q35-2.x"
>versioned types, or 'q35' as an alias for latest, and use
>"pc-i440fx-2.x" versioned types of 'pc' as an alias for latest, rather
>than 'i440' which is needlessly divering from the QEMU machine type.

Yes, it is translated into the '-machine' argument.

A direct mapping between the Xen device_model_machine option and QEMU
'-machine' argument won't be accepted by Xen maintainers I guess.

The main problem with this approach is a requirement to have a match
between Xen/libxl and QEMU versions. If, for example,
device_model_machine specifies something like "pc-q35-2.11" and later we
downgrade QEMU to some older version we'll likely have a problem
without changing anything in the domain config. So I guess the "use the
latest available" approach for machine selection (pc, q35, etc) is the
only possible option. Perhaps having a way to specify the exact QEMU
machine name and version in a separate domain config parameter (for
advanced use) might be feasible.

Also, the parameter values do not speak for themselves, I'm afraid. This
way we'll have, for example, device_model_machine="pc" vs
device_model_machine="q35"... a bit unclear, I think. This may be
obvious for a QEMU user, but many Xen users are not used to QEMU
machine types, and some might wonder why "q35" is not "pc" and
why "pc" is precisely an i440 system.

Another obstacle here is the xen_platform_device option, which indirectly
selects the QEMU machine type for i440 at the moment (pc/xenfv), but this
may be addressed by controlling the Xen platform device independently
via a separate machine property or '-device xen-platform', as
Eduardo Habkost suggested.
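
As an aside, the coupling described above can be sketched as a tiny
option-to-argv mapping; the function name and defaults below are
hypothetical, not actual libxl code:

```python
def build_machine_args(device_model_machine="i440", xen_platform_device=True):
    """Sketch of a libxl-style mapping from domain-config options to
    QEMU arguments (hypothetical, not actual libxl code)."""
    if device_model_machine == "q35":
        # Proposed scheme: a single machine type, with the Xen Platform
        # device controlled by a machine property.
        machine = "q35,accel=xen"
        if xen_platform_device:
            machine += ",xen-platform-dev=on"
        return ["-machine", machine]
    # Current i440 scheme: the platform device is implied by the
    # machine type itself (xenfv vs pc,accel=xen).
    return ["-machine", "xenfv" if xen_platform_device else "pc,accel=xen"]
```

The i440 branch shows why xen_platform_device is awkward today: toggling
one device changes the whole machine type.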

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 00/30] Xen Q35 Bringup patches + support for PCIe Extended Capabilities for passed through devices
  2018-03-13 11:37     ` Alexey G
  (?)
@ 2018-03-13 11:44     ` Daniel P. Berrangé
  -1 siblings, 0 replies; 183+ messages in thread
From: Daniel P. Berrangé @ 2018-03-13 11:44 UTC (permalink / raw)
  To: Alexey G; +Cc: xen-devel, qemu-devel

On Tue, Mar 13, 2018 at 09:37:55PM +1000, Alexey G wrote:
> On Tue, 13 Mar 2018 09:21:54 +0000
> Daniel P. Berrangé <berrange@redhat.com> wrote:
> 
> >The subject line says to expect 30 patches, but you've only sent 18 to
> >the list here. I eventually figured out that the first 12 patches were
> >in Xen code and so not sent to qemu-devel.
> >
> >For future if you have changes that affect multiple completely separate
> >projects, send them as separate series. ie just send PATCH 00/18 to
> >QEMU devel so it doesn't look like a bunch of patches have gone
> >missing.
> 
> OK, we'll do for next versions.
> 
> >> A new domain config option was implemented: device_model_machine.
> >> It's a string which has following possible values:
> >> - "i440" -- i440 emulation (default)
> >> - "q35"  -- emulate a Q35 machine. By default, the storage interface
> >> is AHCI.  
> >
> >Presumably this is mapping to the QEMU -machine arg, so it feels
> >desirable to keep the same naming scheme. ie allow any of the
> >versioned machine names that QEMU uses. eg any of "pc-q35-2.x"
> >versioned types, or 'q35' as an alias for latest, and use
> >"pc-i440fx-2.x" versioned types of 'pc' as an alias for latest, rather
> >than 'i440' which is needlessly divering from the QEMU machine type.
> 
> Yes, it is translated into the '-machine' argument.
> 
> A direct mapping between the Xen device_model_machine option and QEMU
> '-machine' argument won't be accepted by Xen maintainers I guess.
> 
> The main problem with this approach is a requirement to have a match
> between Xen/libxl and QEMU versions. If, for example,
> device_model_machine specifies something like "pc-q35-2.11" and later we
> downgrade QEMU to some older version we'll likely have a problem
> without changing anything in the domain config. So I guess the "use the
> latest available" approach for machine selection (pc, q35, etc) is the
> only possible option. Perhaps having a way to specify the exact QEMU
> machine name and version in a separate domain config parameter (for
> advanced use) might be feasible.

At least with plain QEMU or KVM, using the versioned machine type
names is important as that is what guarantees you a stable guest
machine ABI, independent of QEMU version.  If your deployment has
a mixture of QEMU versions on different hosts, then you very much
want to pick a versioned machine type to ensure compatibility for
live migration. With libvirt we accept the short "pc" or "q35"
names on input, but expand them to the fully versioned name
when saving the config file, so no matter which QEMU version is
used each time the guest is launched, the ABI is always the same.

> 
> Also, parameter names do not speak for themselves, I'm afraid. This way
> we'll have, for example, device_model_machine="pc" vs
> device_model_machine="q35"... a bit unclear I think. This may be
> obvious for a QEMU user, but many Xen users aren't used to QEMU
> machine types, and some might wonder why "q35" is not "pc" and
> 
> Another obstacle here is xen_platform_device option which indirectly
> selects QEMU machine type for i440 at the moment (pc/xenfv), but this
> may be addressed by controlling the Xen platform device independently
> via a separate machine property or '-device xen-platform' like
> Eduardo Habkost suggested.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [RFC PATCH 08/12] libxl: Q35 support (new option device_model_machine)
  2018-03-12 18:33 ` [RFC PATCH 08/12] libxl: Q35 support (new option device_model_machine) Alexey Gerasimenko
@ 2018-03-13 17:25   ` Wei Liu
  2018-03-13 17:32     ` Anthony PERARD
  2018-03-19 17:01   ` Roger Pau Monné
  1 sibling, 1 reply; 183+ messages in thread
From: Wei Liu @ 2018-03-13 17:25 UTC (permalink / raw)
  To: Alexey Gerasimenko; +Cc: Anthony PERARD, xen-devel, Ian Jackson, Wei Liu

Cc Anthony

IIRC there are changes needed on QEMU side? Do we need to wait until
that lands?

Wei.

On Tue, Mar 13, 2018 at 04:33:53AM +1000, Alexey Gerasimenko wrote:
> Provide a new domain config option to select the emulated machine type,
> device_model_machine. It has the following possible values:
> - "i440" - i440 emulation (default)
> - "q35" - emulate a Q35 machine. By default, the storage interface is AHCI.
> 
> Note that omitting device_model_machine parameter means i440 system
> by default, so the default behavior doesn't change for existing domain
> config files.
> 
> Setting device_model_machine to "q35" sends '-machine q35,accel=xen'
> argument to QEMU. Unlike i440, there is no separate machine type
> to enable/disable the Xen platform device; it is controlled via a machine
> property only. See 'libxl: Xen Platform device support for Q35' patch for
> a detailed description.
> 
> Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
> ---
>  tools/libxl/libxl_dm.c      | 16 ++++++++++------
>  tools/libxl/libxl_types.idl |  7 +++++++
>  tools/xl/xl_parse.c         | 14 ++++++++++++++
>  3 files changed, 31 insertions(+), 6 deletions(-)
> 
> diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
> index a3cddce8b7..7b531050c7 100644
> --- a/tools/libxl/libxl_dm.c
> +++ b/tools/libxl/libxl_dm.c
> @@ -1443,13 +1443,17 @@ static int libxl__build_device_model_args_new(libxl__gc *gc,
>              flexarray_append(dm_args, b_info->extra_pv[i]);
>          break;
>      case LIBXL_DOMAIN_TYPE_HVM:
> -        if (!libxl_defbool_val(b_info->u.hvm.xen_platform_pci)) {
> -            /* Switching here to the machine "pc" which does not add
> -             * the xen-platform device instead of the default "xenfv" machine.
> -             */
> -            machinearg = libxl__strdup(gc, "pc,accel=xen");
> +        if (b_info->device_model_machine == LIBXL_DEVICE_MODEL_MACHINE_Q35) {
> +            machinearg = libxl__sprintf(gc, "q35,accel=xen");
>          } else {
> -            machinearg = libxl__strdup(gc, "xenfv");
> +            if (!libxl_defbool_val(b_info->u.hvm.xen_platform_pci)) {
> +                /* Switching here to the machine "pc" which does not add
> +                 * the xen-platform device instead of the default "xenfv" machine.
> +                 */
> +                machinearg = libxl__strdup(gc, "pc,accel=xen");
> +            } else {
> +                machinearg = libxl__strdup(gc, "xenfv");
> +            }
>          }
>          if (b_info->u.hvm.mmio_hole_memkb) {
>              uint64_t max_ram_below_4g = (1ULL << 32) -
> diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
> index 35038120ca..f3ef3cbdde 100644
> --- a/tools/libxl/libxl_types.idl
> +++ b/tools/libxl/libxl_types.idl
> @@ -101,6 +101,12 @@ libxl_device_model_version = Enumeration("device_model_version", [
>      (2, "QEMU_XEN"),             # Upstream based qemu-xen device model
>      ])
>  
> +libxl_device_model_machine = Enumeration("device_model_machine", [
> +    (0, "UNKNOWN"),
> +    (1, "I440"),
> +    (2, "Q35"),
> +    ])
> +
>  libxl_console_type = Enumeration("console_type", [
>      (0, "UNKNOWN"),
>      (1, "SERIAL"),
> @@ -491,6 +497,7 @@ libxl_domain_build_info = Struct("domain_build_info",[
>      ("device_model_ssid_label", string),
>      # device_model_user is not ready for use yet
>      ("device_model_user", string),
> +    ("device_model_machine", libxl_device_model_machine),
>  
>      # extra parameters pass directly to qemu, NULL terminated
>      ("extra",            libxl_string_list),
> diff --git a/tools/xl/xl_parse.c b/tools/xl/xl_parse.c
> index f6842540ca..a7506a426b 100644
> --- a/tools/xl/xl_parse.c
> +++ b/tools/xl/xl_parse.c
> @@ -2110,6 +2110,20 @@ skip_usbdev:
>      xlu_cfg_replace_string(config, "device_model_user",
>                             &b_info->device_model_user, 0);
>  
> +    if (!xlu_cfg_get_string (config, "device_model_machine", &buf, 0)) {
> +        if (!strcmp(buf, "i440")) {
> +            b_info->device_model_machine = LIBXL_DEVICE_MODEL_MACHINE_I440;
> +        } else if (!strcmp(buf, "q35")) {
> +            b_info->device_model_machine = LIBXL_DEVICE_MODEL_MACHINE_Q35;
> +        } else {
> +            fprintf(stderr,
> +                    "Unknown device_model_machine \"%s\" specified\n", buf);
> +            exit(1);
> +        }
> +    } else {
> +        b_info->device_model_machine = LIBXL_DEVICE_MODEL_MACHINE_UNKNOWN;
> +    }
> +
>  #define parse_extra_args(type)                                            \
>      e = xlu_cfg_get_list_as_string_list(config, "device_model_args"#type, \
>                                      &b_info->extra##type, 0);            \
> -- 
> 2.11.0
> 
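
For reference, the new option would appear in a domain config roughly like this (an illustrative fragment based on the parsing code in the patch above; the other keys are shown only for context):

```
# xl domain configuration (fragment, illustrative)
builder = "hvm"
device_model_version = "qemu-xen"
device_model_machine = "q35"    # or "i440"; omitting it keeps the i440 default
```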

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [RFC PATCH 03/12] hvmloader: add function to query an emulated machine type (i440/Q35)
  2018-03-12 18:33 ` [RFC PATCH 03/12] hvmloader: add function to query an emulated machine type (i440/Q35) Alexey Gerasimenko
@ 2018-03-13 17:26   ` Wei Liu
  2018-03-13 17:58     ` Alexey G
  2018-03-19 12:56   ` Roger Pau Monné
  1 sibling, 1 reply; 183+ messages in thread
From: Wei Liu @ 2018-03-13 17:26 UTC (permalink / raw)
  To: Alexey Gerasimenko
  Cc: xen-devel, Ian Jackson, Wei Liu, Jan Beulich, Andrew Cooper

On Tue, Mar 13, 2018 at 04:33:48AM +1000, Alexey Gerasimenko wrote:
> This adds a new function get_pc_machine_type() which allows determining
> the emulated chipset type. Supported return values:
> 
> - MACHINE_TYPE_I440
> - MACHINE_TYPE_Q35
> - MACHINE_TYPE_UNKNOWN, results in the error message being printed
>   followed by calling BUG() in hvmloader.
> 
> Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
> ---
>  tools/firmware/hvmloader/pci_regs.h |  5 ++++
>  tools/firmware/hvmloader/util.c     | 47 +++++++++++++++++++++++++++++++++++++
>  tools/firmware/hvmloader/util.h     |  8 +++++++
>  3 files changed, 60 insertions(+)
> 
> diff --git a/tools/firmware/hvmloader/pci_regs.h b/tools/firmware/hvmloader/pci_regs.h
> index 7bf2d873ab..ba498b840e 100644
> --- a/tools/firmware/hvmloader/pci_regs.h
> +++ b/tools/firmware/hvmloader/pci_regs.h
> @@ -107,6 +107,11 @@
>  
>  #define PCI_INTEL_OPREGION 0xfc /* 4 bits */
>  
> +#define PCI_VENDOR_ID_INTEL              0x8086
> +#define PCI_DEVICE_ID_INTEL_82441        0x1237
> +#define PCI_DEVICE_ID_INTEL_Q35_MCH      0x29c0
> +
> +

Too many blank lines.

>  #endif /* __HVMLOADER_PCI_REGS_H__ */
>  
>  /*
> diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c
> index 0c3f2d24cd..5739a87628 100644
> --- a/tools/firmware/hvmloader/util.c
> +++ b/tools/firmware/hvmloader/util.c
> @@ -22,6 +22,7 @@
>  #include "hypercall.h"
>  #include "ctype.h"
>  #include "vnuma.h"
> +#include "pci_regs.h"
>  #include <acpi2_0.h>
>  #include <libacpi.h>
>  #include <stdint.h>
> @@ -735,6 +736,52 @@ void __bug(char *file, int line)
>      crash();
>  }
>  
> +
> +static int machine_type = MACHINE_TYPE_UNDEFINED;
> +
> +int get_pc_machine_type(void)
> +{
> +    uint16_t vendor_id;
> +    uint16_t device_id;
> +
> +    if (machine_type != MACHINE_TYPE_UNDEFINED)
> +        return machine_type;
> +
> +    machine_type = MACHINE_TYPE_UNKNOWN;
> +
> +    vendor_id = pci_readw(0, PCI_VENDOR_ID);
> +    device_id = pci_readw(0, PCI_DEVICE_ID);
> +
> +    /* only Intel platforms are emulated currently */
> +    if (vendor_id == PCI_VENDOR_ID_INTEL)

Coding style.

> +    {
> +        switch (device_id)

Ditto.

And this patch should be folded into its user, unless the patch that
uses it is very big on its own.

Wei.


^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [RFC PATCH 04/12] hvmloader: add ACPI enabling for Q35
  2018-03-12 18:33 ` [RFC PATCH 04/12] hvmloader: add ACPI enabling for Q35 Alexey Gerasimenko
@ 2018-03-13 17:26   ` Wei Liu
  2018-03-19 13:01   ` Roger Pau Monné
  1 sibling, 0 replies; 183+ messages in thread
From: Wei Liu @ 2018-03-13 17:26 UTC (permalink / raw)
  To: Alexey Gerasimenko
  Cc: xen-devel, Ian Jackson, Wei Liu, Jan Beulich, Andrew Cooper

On Tue, Mar 13, 2018 at 04:33:49AM +1000, Alexey Gerasimenko wrote:
> In order to turn on ACPI for the OS, we need to write a chipset-specific value
> to the SMI_CMD register (a sort of imitation of the APM->ACPI switch on real
> systems). Modify the acpi_enable_sci() function to support both i440 and Q35
> emulation.
> 
> Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
> ---
>  tools/firmware/hvmloader/hvmloader.c | 11 +++++++++--
>  1 file changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/firmware/hvmloader/hvmloader.c b/tools/firmware/hvmloader/hvmloader.c
> index f603f68ded..070698440e 100644
> --- a/tools/firmware/hvmloader/hvmloader.c
> +++ b/tools/firmware/hvmloader/hvmloader.c
> @@ -257,9 +257,16 @@ static const struct bios_config *detect_bios(void)
>  static void acpi_enable_sci(void)
>  {
>      uint8_t pm1a_cnt_val;
> +    uint8_t acpi_enable_val;
>  
> -#define PIIX4_SMI_CMD_IOPORT 0xb2
> +#define SMI_CMD_IOPORT       0xb2
>  #define PIIX4_ACPI_ENABLE    0xf1
> +#define ICH9_ACPI_ENABLE     0x02
> +
> +    if (get_pc_machine_type() == MACHINE_TYPE_Q35)

Coding style.

And the previous patch can be folded into this one.

Wei.


^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [RFC PATCH 08/12] libxl: Q35 support (new option device_model_machine)
  2018-03-13 17:25   ` Wei Liu
@ 2018-03-13 17:32     ` Anthony PERARD
  0 siblings, 0 replies; 183+ messages in thread
From: Anthony PERARD @ 2018-03-13 17:32 UTC (permalink / raw)
  To: Wei Liu; +Cc: xen-devel, Ian Jackson, Alexey Gerasimenko

On Tue, Mar 13, 2018 at 05:25:50PM +0000, Wei Liu wrote:
> Cc Anthony
> 
> IIRC there are changes needed on QEMU side?

Yes, there are actually QEMU patches in the patch series. I'm CCed on
them.

> Do we need to wait until that lands?

That depends. I don't have an answer.

-- 
Anthony PERARD


^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [RFC PATCH 03/12] hvmloader: add function to query an emulated machine type (i440/Q35)
  2018-03-13 17:26   ` Wei Liu
@ 2018-03-13 17:58     ` Alexey G
  2018-03-13 18:04       ` Wei Liu
  0 siblings, 1 reply; 183+ messages in thread
From: Alexey G @ 2018-03-13 17:58 UTC (permalink / raw)
  To: Wei Liu; +Cc: xen-devel, Ian Jackson, Jan Beulich, Andrew Cooper

On Tue, 13 Mar 2018 17:26:04 +0000
Wei Liu <wei.liu2@citrix.com> wrote:

>On Tue, Mar 13, 2018 at 04:33:48AM +1000, Alexey Gerasimenko wrote:
>> This adds a new function get_pc_machine_type() which allows to
>> determine the emulated chipset type. Supported return values:
>> 
>> - MACHINE_TYPE_I440
>> - MACHINE_TYPE_Q35
>> - MACHINE_TYPE_UNKNOWN, results in the error message being printed
>>   followed by calling BUG() in hvmloader.
>> 
>> Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
>> ---
>>  tools/firmware/hvmloader/pci_regs.h |  5 ++++
>>  tools/firmware/hvmloader/util.c     | 47
>> +++++++++++++++++++++++++++++++++++++
>> tools/firmware/hvmloader/util.h     |  8 +++++++ 3 files changed, 60
>> insertions(+)
>> 
>> diff --git a/tools/firmware/hvmloader/pci_regs.h
>> b/tools/firmware/hvmloader/pci_regs.h index 7bf2d873ab..ba498b840e
>> 100644 --- a/tools/firmware/hvmloader/pci_regs.h
>> +++ b/tools/firmware/hvmloader/pci_regs.h
>> @@ -107,6 +107,11 @@
>>  
>>  #define PCI_INTEL_OPREGION 0xfc /* 4 bits */
>>  
>> +#define PCI_VENDOR_ID_INTEL              0x8086
>> +#define PCI_DEVICE_ID_INTEL_82441        0x1237
>> +#define PCI_DEVICE_ID_INTEL_Q35_MCH      0x29c0
>> +
>> +  
>
>Too many blank lines.

Will fix.

>> @@ -735,6 +736,52 @@ void __bug(char *file, int line)
>>      crash();
>>  }
>>  
>> +    /* only Intel platforms are emulated currently */
>> +    if (vendor_id == PCI_VENDOR_ID_INTEL)  
>
>Coding style.
>
>Ditto.

Will fix.

>And this patch should be folded into its user, unless the patch that
>uses it is very big on its own.

Hmm, looks like I over-applied the recommendation about making atomic
patches for easier review. There are multiple users of this function;
it was made a separate patch just because of this. In the next
version I'll merge it with some of the patches which use this function
then.


^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [RFC PATCH 03/12] hvmloader: add function to query an emulated machine type (i440/Q35)
  2018-03-13 17:58     ` Alexey G
@ 2018-03-13 18:04       ` Wei Liu
  0 siblings, 0 replies; 183+ messages in thread
From: Wei Liu @ 2018-03-13 18:04 UTC (permalink / raw)
  To: Alexey G; +Cc: xen-devel, Wei Liu, Ian Jackson, Jan Beulich, Andrew Cooper

On Wed, Mar 14, 2018 at 03:58:17AM +1000, Alexey G wrote:
> On Tue, 13 Mar 2018 17:26:04 +0000
> Wei Liu <wei.liu2@citrix.com> wrote:
> 
> >On Tue, Mar 13, 2018 at 04:33:48AM +1000, Alexey Gerasimenko wrote:
> >> This adds a new function get_pc_machine_type() which allows to
> >> determine the emulated chipset type. Supported return values:
> >> 
> >> - MACHINE_TYPE_I440
> >> - MACHINE_TYPE_Q35
> >> - MACHINE_TYPE_UNKNOWN, results in the error message being printed
> >>   followed by calling BUG() in hvmloader.
> >> 
> >> Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
> >> ---
> >>  tools/firmware/hvmloader/pci_regs.h |  5 ++++
> >>  tools/firmware/hvmloader/util.c     | 47
> >> +++++++++++++++++++++++++++++++++++++
> >> tools/firmware/hvmloader/util.h     |  8 +++++++ 3 files changed, 60
> >> insertions(+)
> >> 
> >> diff --git a/tools/firmware/hvmloader/pci_regs.h
> >> b/tools/firmware/hvmloader/pci_regs.h index 7bf2d873ab..ba498b840e
> >> 100644 --- a/tools/firmware/hvmloader/pci_regs.h
> >> +++ b/tools/firmware/hvmloader/pci_regs.h
> >> @@ -107,6 +107,11 @@
> >>  
> >>  #define PCI_INTEL_OPREGION 0xfc /* 4 bits */
> >>  
> >> +#define PCI_VENDOR_ID_INTEL              0x8086
> >> +#define PCI_DEVICE_ID_INTEL_82441        0x1237
> >> +#define PCI_DEVICE_ID_INTEL_Q35_MCH      0x29c0
> >> +
> >> +  
> >
> >Too many blank lines.
> 
> Will fix.
> 
> >> @@ -735,6 +736,52 @@ void __bug(char *file, int line)
> >>      crash();
> >>  }
> >>  
> >> +    /* only Intel platforms are emulated currently */
> >> +    if (vendor_id == PCI_VENDOR_ID_INTEL)  
> >
> >Coding style.
> >
> >Ditto.
> 
> Will fix.
> 
> >And this patch should be folded into its user, unless the patch that
> >uses it is very big on its own.
> 
> Hmm, looks like I over-applied the recommendation about making atomic
> patches for easier review. There are multiple users of this function;
> it was made a separate patch just because of this. In the next
> version I'll merge it with some of the patches which use this function
> then.

It really depends. It will take some back-and-forth to find the right
balance. I can't say I'm very consistent on this either.

If you think leaving it in a separate patch is better, I won't object.

Wei.


^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 16/30] q35/xen: Add Xen platform device support for Q35
  2018-03-12 21:44       ` [Qemu-devel] " Eduardo Habkost
@ 2018-03-13 23:49           ` Alexey G
  0 siblings, 0 replies; 183+ messages in thread
From: Alexey G @ 2018-03-13 23:49 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: xen-devel, qemu-devel, Marcel Apfelbaum, Paolo Bonzini,
	Richard Henderson, Michael S. Tsirkin

On Mon, 12 Mar 2018 18:44:02 -0300
Eduardo Habkost <ehabkost@redhat.com> wrote:

>On Tue, Mar 13, 2018 at 06:56:37AM +1000, Alexey G wrote:
>> On Mon, 12 Mar 2018 16:44:06 -0300
>> Eduardo Habkost <ehabkost@redhat.com> wrote:
>>   
>> >On Tue, Mar 13, 2018 at 04:34:01AM +1000, Alexey Gerasimenko
>> >wrote:  
>> >> Current Xen/QEMU method to control Xen Platform device on i440 is
>> >> a bit odd -- enabling/disabling Xen platform device actually
>> >> modifies the QEMU emulated machine type, namely xenfv <--> pc.
>> >> 
>> >> In order to avoid multiplying machine types, use a new way to
>> >> control Xen Platform device for QEMU -- "xen-platform-dev" machine
>> >> property (bool). To maintain backward compatibility with existing
>> >> Xen/QEMU setups, this is only applicable to q35 machine currently.
>> >> i440 emulation still uses the old method (i.e. xenfv/pc machine
>> >> selection) to control Xen Platform device, this may be changed
>> >> later to xen-platform-dev property as well.
>> >> 
>> >> This way we can use a single machine type (q35) and change just
>> >> xen-platform-dev value to on/off to control Xen platform device.
>> >> 
>> >> Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
>> >> ---    
>> >[...]  
>> >> diff --git a/qemu-options.hx b/qemu-options.hx
>> >> index 6585058c6c..cee0b92028 100644
>> >> --- a/qemu-options.hx
>> >> +++ b/qemu-options.hx
>> >> @@ -38,6 +38,7 @@ DEF("machine", HAS_ARG, QEMU_OPTION_machine, \
>> >>      "                dump-guest-core=on|off include guest memory
>> >> in a core dump (default=on)\n" "                mem-merge=on|off
>> >> controls memory merge support (default: on)\n" "
>> >> igd-passthru=on|off controls IGD GFX passthrough support
>> >> (default=off)\n"
>> >> +    "                xen-platform-dev=on|off controls Xen
>> >> Platform device (default=off)\n" "
>> >> aes-key-wrap=on|off controls support for AES key wrapping
>> >> (default=on)\n" "                dea-key-wrap=on|off controls
>> >> support for DEA key wrapping (default=on)\n" "
>> >> suppress-vmdesc=on|off disables self-describing migration
>> >> (default=off)\n"    
>> >
>> >What are the obstacles preventing "-device xen-platform" from
>> >working?  It would be better than adding a new boolean option to
>> >-machine.  
>> 
>> I guess the initial assumption was that changing the
>> xen_platform_device value in Xen's options may cause some additional
>> changes in platform configuration besides adding (or not) the Xen
>> Platform device, hence a completely different machine type was chosen
>> (xenfv).
>> 
>> At the moment pc,accel=xen/xenfv selection mostly governs
>> only the Xen Platform device presence. Also setting max_cpus to
>> HVM_MAX_VCPUS depends on it, but this doesn't applicable to a
>> 'pc,accel=xen' machine for some reason.
>> 
>> If applying HVM_MAX_VCPUS to max_cpus is really necessary I think
>> it's better to set it unconditionally for all 'accel=xen' HVM machine
>> types inside xen_enabled() block. Right now it's missing for
>> pc,accel=xen and q35,accel=xen.  
>
>If you are talking about MachineClass::max_cpus, note that it is
>returned by query-machines, so it's supposed to be a static
>value.  Changing it at runtime would mean the query-machines value
>is incorrect.
>
>Is HVM_MAX_VCPUS higher or lower than 255?  If it's higher, does
>it mean the current value on pc and q35 isn't accurate?

HVM_MAX_VCPUS is 128 currently, but there is ongoing work from Intel
to support more vCPUs and >8-bit APIC IDs, so this number will likely
change soon.

According to the code, using HVM_MAX_VCPUS in QEMU is a bit excessive as
the maximum number of vcpus is controlled on the Xen side anyway. Currently
HVM_MAX_VCPUS is used in a one-time check of the maxcpus value (which
itself comes from libxl).
I think for future compatibility it's better to set mc->max_cpus to
HVM_MAX_VCPUS for all accel=xen HVM-supported machine types, not just
xenfv.

The '-device' approach you suggested seems preferable to a
machine bool property, I'll try switching to it.

>Is HVM_MAX_VCPUS something that needs to be enabled because of
>accel=xen or because of the xen-platform device?
>
>If it's just because of accel=xen, we could introduce a
>AccelClass::max_cpus() method (we also have KVM-imposed CPU count
>limits, currently implemented inside kvm_init()).

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 13/30] pc/xen: Xen Q35 support: provide IRQ handling for PCI devices
  2018-03-12 18:33   ` Alexey Gerasimenko
  (?)
@ 2018-03-14 10:48   ` Paolo Bonzini
  2018-03-14 11:28       ` Alexey G
  -1 siblings, 1 reply; 183+ messages in thread
From: Paolo Bonzini @ 2018-03-14 10:48 UTC (permalink / raw)
  To: Alexey Gerasimenko, xen-devel
  Cc: qemu-devel, Michael S. Tsirkin, Marcel Apfelbaum,
	Richard Henderson, Eduardo Habkost, Stefano Stabellini,
	Anthony Perard

On 12/03/2018 19:33, Alexey Gerasimenko wrote:
> xen_pci_slot_get_pirq --> xen_cmn_pci_slot_get_pirq
> xen_piix3_set_irq     --> xen_cmn_set_irq

Don't abbrvt names, xen_hvm_ is a better prefix.

> 
> +                    fprintf(stderr, "WARNING: guest domain attempted to use PIRQ%c "
> +                            "routing which is not supported for Xen/Q35 currently\n",
> +                            (char)(address - ICH9_LPC_PIRQE_ROUT + 'E'));

Use error_report instead.

Paolo

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 13/30] pc/xen: Xen Q35 support: provide IRQ handling for PCI devices
  2018-03-14 10:48   ` [Qemu-devel] " Paolo Bonzini
@ 2018-03-14 11:28       ` Alexey G
  0 siblings, 0 replies; 183+ messages in thread
From: Alexey G @ 2018-03-14 11:28 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: xen-devel, qemu-devel, Michael S. Tsirkin, Marcel Apfelbaum,
	Richard Henderson, Eduardo Habkost, Stefano Stabellini,
	Anthony Perard

On Wed, 14 Mar 2018 11:48:46 +0100
Paolo Bonzini <pbonzini@redhat.com> wrote:

>On 12/03/2018 19:33, Alexey Gerasimenko wrote:
>> xen_pci_slot_get_pirq --> xen_cmn_pci_slot_get_pirq
>> xen_piix3_set_irq     --> xen_cmn_set_irq  
>
>Don't abbrvt names, xen_hvm_ is a better prefix.

Agree, will rename xen_cmn_* to xen_hvm_*

>> +                    fprintf(stderr, "WARNING: guest domain
>> attempted to use PIRQ%c "
>> +                            "routing which is not supported for
>> Xen/Q35 currently\n",
>> +                            (char)(address - ICH9_LPC_PIRQE_ROUT +
>> 'E'));  
>
>Use error_report instead.

OK, will change to error_report().
There are multiple fprintf(stderr,...)'s still left in the file though,
an additional cleanup patch to replace all such instances with
error_report() calls might be needed later.

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [RFC PATCH 11/12] hvmloader: use libacpi to build MCFG table
  2018-03-12 18:33 ` [RFC PATCH 11/12] hvmloader: use libacpi to build MCFG table Alexey Gerasimenko
@ 2018-03-14 17:48   ` Alexey G
  2018-03-19 17:49   ` Roger Pau Monné
  2018-05-29 14:46   ` Jan Beulich
  2 siblings, 0 replies; 183+ messages in thread
From: Alexey G @ 2018-03-14 17:48 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Ian Jackson, Wei Liu, Jan Beulich

[-- Attachment #1: Type: text/plain, Size: 1182 bytes --]

On Tue, 13 Mar 2018 04:33:56 +1000
Alexey Gerasimenko <x1917x@gmail.com> wrote:

>This patch extends hvmloader_acpi_build_tables() with code which
>detects if MMCONFIG is available -- i.e. initialized and enabled
>(+we're running on Q35), obtains its base address and size and asks
>libacpi to build MCFG table for it via setting the flag ACPI_HAS_MCFG
>in a manner similar to other optional ACPI tables building.
>
>Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
>---
> tools/firmware/hvmloader/util.c | 70
> +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 70
> insertions(+)

Looks like I missed the patch for reserving the MMCONFIG area in the
E820 map; it is required for Linux guests (otherwise the MMCONFIG info
will be rejected by the Linux kernel). Windows guests can use MMCONFIG
without a corresponding E820 entry.

The following lines need to be added to tools/firmware/hvmloader/e820.c:

+    /* mark MMCONFIG area */
+    if ( is_mmconfig_used() )
+    {
+        e820[nr].addr = mmconfig_get_base();
+        e820[nr].size = mmconfig_get_size();
+        e820[nr].type = E820_RESERVED;
+        nr++;
+    }

The corresponding patch-file is attached, will include it in v2 patches.

[-- Attachment #2: hvmloader-mark-MMCONFIG-in-E820-map.patch --]
[-- Type: application/octet-stream, Size: 2522 bytes --]

From c16186e0c1ad388362f61136b8da2e02e76d1840 Mon Sep 17 00:00:00 2001
From: Alexey Gerasimenko <x1917x@gmail.com>
Date: Thu, 15 Mar 2018 03:06:39 +1000
Subject: [PATCH] hvmloader: mark MMCONFIG in E820 map

---
 tools/firmware/hvmloader/e820.c | 9 +++++++++
 tools/firmware/hvmloader/util.c | 6 +++---
 tools/firmware/hvmloader/util.h | 5 +++++
 3 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/tools/firmware/hvmloader/e820.c b/tools/firmware/hvmloader/e820.c
index 4d1c955a02..9cfe86e78e 100644
--- a/tools/firmware/hvmloader/e820.c
+++ b/tools/firmware/hvmloader/e820.c
@@ -233,6 +233,15 @@ int build_e820_table(struct e820entry *e820,
         nr++;
     }
 
+    /* mark MMCONFIG area */
+    if ( is_mmconfig_used() )
+    {
+        e820[nr].addr = mmconfig_get_base();
+        e820[nr].size = mmconfig_get_size();
+        e820[nr].type = E820_RESERVED;
+        nr++;
+    }
+
     /* Low RAM goes here. Reserve space for special pages. */
     BUG_ON(low_mem_end < MB(2));
 
diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c
index c6fc81d52a..a32ada9613 100644
--- a/tools/firmware/hvmloader/util.c
+++ b/tools/firmware/hvmloader/util.c
@@ -788,7 +788,7 @@ int get_pc_machine_type(void)
 #define PCIEXBAR_LENGTH_BITS(reg)   (((reg) >> 1) & 3)
 #define PCIEXBAREN                  1
 
-static uint64_t mmconfig_get_base(void)
+uint64_t mmconfig_get_base(void)
 {
     uint64_t base;
     uint32_t reg = pci_readl(PCI_MCH_DEVFN, PCI_MCH_PCIEXBAR);
@@ -813,7 +813,7 @@ static uint64_t mmconfig_get_base(void)
     return base;
 }
 
-static uint32_t mmconfig_get_size(void)
+uint32_t mmconfig_get_size(void)
 {
     uint32_t reg = pci_readl(PCI_MCH_DEVFN, PCI_MCH_PCIEXBAR);
 
@@ -834,7 +834,7 @@ static uint32_t mmconfig_is_enabled(void)
     return pci_readl(PCI_MCH_DEVFN, PCI_MCH_PCIEXBAR) & PCIEXBAREN;
 }
 
-static int is_mmconfig_used(void)
+int is_mmconfig_used(void)
 {
     if (get_pc_machine_type() == MACHINE_TYPE_Q35)
     {
diff --git a/tools/firmware/hvmloader/util.h b/tools/firmware/hvmloader/util.h
index fd2d885c96..892ab3897a 100644
--- a/tools/firmware/hvmloader/util.h
+++ b/tools/firmware/hvmloader/util.h
@@ -296,6 +296,11 @@ struct acpi_config;
 void hvmloader_acpi_build_tables(struct acpi_config *config,
                                  unsigned int physical);
 
+/* MMCONFIG-related */
+uint64_t mmconfig_get_base(void);
+uint32_t mmconfig_get_size(void);
+int is_mmconfig_used(void);
+
 #endif /* __HVMLOADER_UTIL_H__ */
 
 /*
-- 
2.11.0



^ permalink raw reply related	[flat|nested] 183+ messages in thread

* Re: [RFC PATCH 00/30] Xen Q35 Bringup patches + support for PCIe Extended Capabilities for passed through devices
  2018-03-12 18:33 ` Alexey Gerasimenko
                   ` (31 preceding siblings ...)
  (?)
@ 2018-03-16 17:34 ` Alexey G
  2018-03-16 18:26   ` Stefano Stabellini
  2018-03-16 18:36   ` Roger Pau Monné
  -1 siblings, 2 replies; 183+ messages in thread
From: Alexey G @ 2018-03-16 17:34 UTC (permalink / raw)
  To: xen-devel; +Cc: Anthony Perard, Stefano Stabellini

A gentle RFC-ping.

Any thoughts on this? Regarding the feature as a whole. So far there
were responses mostly targeting individual patches, while I'd like to
hear about chosen approaches in general, whether the overall direction
is correct (or not), etc. It's just RFC after all, not v11. :)

I can split it into two series if that would be preferable: one for
the general Q35 bring-up and basic access to the PCIe extended config
space via ECAM (this is what the feature was used for initially), and
a second one providing the PCIe Extended Capabilities emulation
infrastructure (hw/xen/xen-pt*.c in QEMU).

On Tue, 13 Mar 2018 04:33:45 +1000
Alexey Gerasimenko <x1917x@gmail.com> wrote:

>This patch series introduces support of Q35 emulation for Xen HVM
>guests (via QEMU). This feature is present in other virtualization
>products and Xen can greatly benefit from this feature as well.
>
>The main goal for implementing Q35 emulation for Xen was extending
>PCI/GPU passthrough capabilities. It's the main advantage of Q35
>emulation
>- availability of extra features for PCIe device passthrough. The most
>important PCIe-specific passthrough feature Q35 provides is a support
>for PCIe config space ECAM (aka MMCONFIG) to allow accesses to
>extended PCIe config space (>256), which is MMIO-based.  Lots of PCIe
>devices and their drivers make use of PCIe Extended Capabilities,
>which can be accessed only using ECAM and offsets above 0x100 in PCI
>config space. Supporting ECAM is a mandatory feature for PCIe
>passthrough. Not only this allows passthrough PCIe devices to function
>properly, but opens a road to extend Xen PCIe passthrough features
>further -- eg. providing support for AER. One of possible directions
>is providing support for PCIe Resizable BARs -- a feature which likely


^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [RFC PATCH 00/30] Xen Q35 Bringup patches + support for PCIe Extended Capabilities for passed through devices
  2018-03-16 17:34 ` Alexey G
@ 2018-03-16 18:26   ` Stefano Stabellini
  2018-03-16 18:36   ` Roger Pau Monné
  1 sibling, 0 replies; 183+ messages in thread
From: Stefano Stabellini @ 2018-03-16 18:26 UTC (permalink / raw)
  To: Alexey G; +Cc: Anthony Perard, xen-devel, Stefano Stabellini

Hi Alexey, thanks for the ping. I think this is a good feature to have
and I would like to check it in when it is ready. I spoke with Anthony
and agreed that he will be reviewing it. Please be patient but we'll get
there :-)

On Sat, 17 Mar 2018, Alexey G wrote:
> A gentle RFC-ping.
> 
> Any thoughts on this? Regarding the feature as a whole. So far there
> were responses mostly targeting individual patches, while I'd like to
> hear about chosen approaches in general, whether the overall direction
> is correct (or not), etc. It's just RFC after all, not v11. :)
> 
> I can split it into two series if that would be preferable, one for
> general Q35 bring up and basic access to PCIe extended config
> space via ECAM (this is what the feature was used for initially) and
> the second part is providing support for PCIe Extended Capabilities
> emulation infrastructure (hw/xen/xen-pt*.c in QEMU).
> 
> On Tue, 13 Mar 2018 04:33:45 +1000
> Alexey Gerasimenko <x1917x@gmail.com> wrote:
> 
> >This patch series introduces support of Q35 emulation for Xen HVM
> >guests (via QEMU). This feature is present in other virtualization
> >products and Xen can greatly benefit from this feature as well.
> >
> >The main goal for implementing Q35 emulation for Xen was extending
> >PCI/GPU passthrough capabilities. It's the main advantage of Q35
> >emulation
> >- availability of extra features for PCIe device passthrough. The most
> >important PCIe-specific passthrough feature Q35 provides is a support
> >for PCIe config space ECAM (aka MMCONFIG) to allow accesses to
> >extended PCIe config space (>256), which is MMIO-based.  Lots of PCIe
> >devices and their drivers make use of PCIe Extended Capabilities,
> >which can be accessed only using ECAM and offsets above 0x100 in PCI
> >config space. Supporting ECAM is a mandatory feature for PCIe
> >passthrough. Not only this allows passthrough PCIe devices to function
> >properly, but opens a road to extend Xen PCIe passthrough features
> >further -- eg. providing support for AER. One of possible directions
> >is providing support for PCIe Resizable BARs -- a feature which likely
> 


^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [RFC PATCH 00/30] Xen Q35 Bringup patches + support for PCIe Extended Capabilities for passed through devices
  2018-03-16 17:34 ` Alexey G
  2018-03-16 18:26   ` Stefano Stabellini
@ 2018-03-16 18:36   ` Roger Pau Monné
  1 sibling, 0 replies; 183+ messages in thread
From: Roger Pau Monné @ 2018-03-16 18:36 UTC (permalink / raw)
  To: Alexey G; +Cc: Anthony Perard, xen-devel, Stefano Stabellini

On Sat, Mar 17, 2018 at 03:34:58AM +1000, Alexey G wrote:
> A gentle RFC-ping.
> 
> Any thoughts on this? Regarding the feature as a whole. So far there
> were responses mostly targeting individual patches, while I'd like to
> hear about chosen approaches in general, whether the overall direction
> is correct (or not), etc. It's just RFC after all, not v11. :)

I plan to look at the series, but in general you should wait at least
7 days (one week) before pinging.

> I can split it into two series if that would be preferable, one for
> general Q35 bring up and basic access to PCIe extended config
> space via ECAM (this is what the feature was used for initially) and
> the second part is providing support for PCIe Extended Capabilities
> emulation infrastructure (hw/xen/xen-pt*.c in QEMU).

I would wait a bit before doing more work; as said, I plan to look at
the series, and others probably are too, it's just that you gave us too
little time ;).

Keep in mind we are approaching feature freeze, and there are some
more mature series that will likely get more review attention than
yours ATM, in order to try to get them in before the freeze.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [RFC PATCH 01/12] libacpi: new DSDT ACPI table for Q35
  2018-03-12 18:33 ` [RFC PATCH 01/12] libacpi: new DSDT ACPI table for Q35 Alexey Gerasimenko
  2018-03-12 19:38   ` Konrad Rzeszutek Wilk
@ 2018-03-19 12:43   ` Roger Pau Monné
  2018-03-19 13:57     ` Alexey G
  1 sibling, 1 reply; 183+ messages in thread
From: Roger Pau Monné @ 2018-03-19 12:43 UTC (permalink / raw)
  To: Alexey Gerasimenko; +Cc: xen-devel, Wei Liu, Ian Jackson, Jan Beulich

On Tue, Mar 13, 2018 at 04:33:46AM +1000, Alexey Gerasimenko wrote:
> This patch adds the DSDT table for Q35 (new tools/libacpi/dsdt_q35.asl
> file). There are not many differences with dsdt.asl (for i440) at the
> moment, namely:
> 
> - BDF location of LPC Controller
> - Minor changes related to FDC detection
> - Addition of _OSC method to inform OSPM about PCIe features supported
> 
> As we are still using 4 PCI router links and their corresponding
> device/register addresses are same (offset 0x60), no need to change PCI
> routing descriptions.
> 
> Also, ACPI hotplug is still used to control passed through device hot
> (un)plug (as it was for i440).
> 
> Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
> ---
>  tools/libacpi/dsdt_q35.asl | 551 +++++++++++++++++++++++++++++++++++++++++++++

So this is basically a modified dupe of the current dsdt.asl? AFAICT
there are a bunch of common bits, which ideally we want to have
defined in a single place.

Can't you factor out the common parts of the dsdt.asl into smaller
parts and include them for both dsdt.asl and dsdt_q35.asl?

I would first have a patch that extracts the common parts of the
dsdt into file(s), and then a second patch which creates a
dsdt_q35.asl based on those common bits plus the specific q35 code.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [RFC PATCH 02/12] Makefile: build and use new DSDT table for Q35
  2018-03-12 18:33 ` [RFC PATCH 02/12] Makefile: build and use new DSDT " Alexey Gerasimenko
@ 2018-03-19 12:46   ` Roger Pau Monné
  2018-03-19 14:18     ` Alexey G
  2018-03-19 13:07   ` Jan Beulich
  1 sibling, 1 reply; 183+ messages in thread
From: Roger Pau Monné @ 2018-03-19 12:46 UTC (permalink / raw)
  To: Alexey Gerasimenko
  Cc: xen-devel, Ian Jackson, Wei Liu, Jan Beulich, Andrew Cooper

On Tue, Mar 13, 2018 at 04:33:47AM +1000, Alexey Gerasimenko wrote:
> Provide building for newly added dsdt_q35.asl file, in a way similar
> to dsdt.asl.
> 
> Note that '15cpu' ACPI tables are only applicable to qemu-traditional
> (which has no support for Q35), so we need to use the 'anycpu' version only.

You should do this in the same patch that adds dsdt_q35.asl; without
this, the previous patch just adds dead code.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [RFC PATCH 03/12] hvmloader: add function to query an emulated machine type (i440/Q35)
  2018-03-12 18:33 ` [RFC PATCH 03/12] hvmloader: add function to query an emulated machine type (i440/Q35) Alexey Gerasimenko
  2018-03-13 17:26   ` Wei Liu
@ 2018-03-19 12:56   ` Roger Pau Monné
  2018-03-19 16:26     ` Alexey G
  1 sibling, 1 reply; 183+ messages in thread
From: Roger Pau Monné @ 2018-03-19 12:56 UTC (permalink / raw)
  To: Alexey Gerasimenko
  Cc: xen-devel, Ian Jackson, Wei Liu, Jan Beulich, Andrew Cooper

On Tue, Mar 13, 2018 at 04:33:48AM +1000, Alexey Gerasimenko wrote:
> This adds a new function get_pc_machine_type() which determines
> the emulated chipset type. Supported return values:
> 
> - MACHINE_TYPE_I440
> - MACHINE_TYPE_Q35
> - MACHINE_TYPE_UNKNOWN, which results in an error message being printed
>   followed by a BUG() in hvmloader.

This is not correct, the return values are strictly MACHINE_TYPE_I440
or MACHINE_TYPE_Q35. Everything else ends up in a BUG().

Also makes me wonder whether this should instead be init_machine_type,
and users should just read machine_type directly.

> 
> Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
> ---
>  tools/firmware/hvmloader/pci_regs.h |  5 ++++
>  tools/firmware/hvmloader/util.c     | 47 +++++++++++++++++++++++++++++++++++++
>  tools/firmware/hvmloader/util.h     |  8 +++++++
>  3 files changed, 60 insertions(+)
> 
> diff --git a/tools/firmware/hvmloader/pci_regs.h b/tools/firmware/hvmloader/pci_regs.h
> index 7bf2d873ab..ba498b840e 100644
> --- a/tools/firmware/hvmloader/pci_regs.h
> +++ b/tools/firmware/hvmloader/pci_regs.h
> @@ -107,6 +107,11 @@
>  
>  #define PCI_INTEL_OPREGION 0xfc /* 4 bits */
>  
> +#define PCI_VENDOR_ID_INTEL              0x8086
> +#define PCI_DEVICE_ID_INTEL_82441        0x1237
> +#define PCI_DEVICE_ID_INTEL_Q35_MCH      0x29c0
> +
> +
>  #endif /* __HVMLOADER_PCI_REGS_H__ */
>  
>  /*
> diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c
> index 0c3f2d24cd..5739a87628 100644
> --- a/tools/firmware/hvmloader/util.c
> +++ b/tools/firmware/hvmloader/util.c
> @@ -22,6 +22,7 @@
>  #include "hypercall.h"
>  #include "ctype.h"
>  #include "vnuma.h"
> +#include "pci_regs.h"
>  #include <acpi2_0.h>
>  #include <libacpi.h>
>  #include <stdint.h>
> @@ -735,6 +736,52 @@ void __bug(char *file, int line)
>      crash();
>  }
>  
> +
> +static int machine_type = MACHINE_TYPE_UNDEFINED;

There's no need to init this, _UNDEFINED is 0 which is the default
value.

> +
> +int get_pc_machine_type(void)

You introduce a function that's not used anywhere, and the commit log
doesn't mention why this is needed at all. In general I prefer
functions to be introduced with at least a caller, or else it needs to
be described in the commit message why this is not the case.

> +{
> +    uint16_t vendor_id;
> +    uint16_t device_id;
> +
> +    if (machine_type != MACHINE_TYPE_UNDEFINED)
> +        return machine_type;
> +
> +    machine_type = MACHINE_TYPE_UNKNOWN;
> +
> +    vendor_id = pci_readw(0, PCI_VENDOR_ID);
> +    device_id = pci_readw(0, PCI_DEVICE_ID);
> +
> +    /* only Intel platforms are emulated currently */
> +    if (vendor_id == PCI_VENDOR_ID_INTEL)

Should this maybe be a BUG_ON(vendor_id != PCI_VENDOR_ID_INTEL) then?
Note that in this case you end up with a BUG later anyway.

> +    {
> +        switch (device_id)
> +        {
> +        case PCI_DEVICE_ID_INTEL_82441:
> +            machine_type = MACHINE_TYPE_I440;
> +            printf("Detected i440 chipset\n");
> +            break;
> +
> +        case PCI_DEVICE_ID_INTEL_Q35_MCH:
> +            machine_type = MACHINE_TYPE_Q35;
> +            printf("Detected Q35 chipset\n");
> +            break;
> +
> +        default:
> +            break;
> +        }
> +    }
> +
> +    if (machine_type == MACHINE_TYPE_UNKNOWN)
> +    {
> +        printf("Unknown emulated chipset encountered, VID=%04Xh, DID=%04Xh\n",
> +               vendor_id, device_id);
> +        BUG();

Why not place this in the default switch label? That would allow you
to get rid of the MACHINE_TYPE_UNKNOWN define also.

> +    }
> +
> +    return machine_type;
> +}
> +
>  static void validate_hvm_info(struct hvm_info_table *t)
>  {
>      uint8_t *ptr = (uint8_t *)t;
> diff --git a/tools/firmware/hvmloader/util.h b/tools/firmware/hvmloader/util.h
> index 7bca6418d2..7c77bedb00 100644
> --- a/tools/firmware/hvmloader/util.h
> +++ b/tools/firmware/hvmloader/util.h
> @@ -100,6 +100,14 @@ void pci_write(uint32_t devfn, uint32_t reg, uint32_t len, uint32_t val);
>  #define pci_writew(devfn, reg, val) pci_write(devfn, reg, 2, (uint16_t)(val))
>  #define pci_writel(devfn, reg, val) pci_write(devfn, reg, 4, (uint32_t)(val))
>  
> +/* Emulated machine types */
> +#define MACHINE_TYPE_UNDEFINED      0
> +#define MACHINE_TYPE_I440           1
> +#define MACHINE_TYPE_Q35            2
> +#define MACHINE_TYPE_UNKNOWN        (-1)

An enum seems better suited for this.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [RFC PATCH 04/12] hvmloader: add ACPI enabling for Q35
  2018-03-12 18:33 ` [RFC PATCH 04/12] hvmloader: add ACPI enabling for Q35 Alexey Gerasimenko
  2018-03-13 17:26   ` Wei Liu
@ 2018-03-19 13:01   ` Roger Pau Monné
  2018-03-19 23:59     ` Alexey G
  1 sibling, 1 reply; 183+ messages in thread
From: Roger Pau Monné @ 2018-03-19 13:01 UTC (permalink / raw)
  To: Alexey Gerasimenko
  Cc: xen-devel, Ian Jackson, Wei Liu, Jan Beulich, Andrew Cooper

On Tue, Mar 13, 2018 at 04:33:49AM +1000, Alexey Gerasimenko wrote:
> In order to turn on ACPI for OS, we need to write a chipset-specific value
> to SMI_CMD register (sort of imitation of the APM->ACPI switch on real
> systems). Modify acpi_enable_sci() function to support both i440 and Q35
> emulation.
> 
> Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
> ---
>  tools/firmware/hvmloader/hvmloader.c | 11 +++++++++--
>  1 file changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/firmware/hvmloader/hvmloader.c b/tools/firmware/hvmloader/hvmloader.c
> index f603f68ded..070698440e 100644
> --- a/tools/firmware/hvmloader/hvmloader.c
> +++ b/tools/firmware/hvmloader/hvmloader.c
> @@ -257,9 +257,16 @@ static const struct bios_config *detect_bios(void)
>  static void acpi_enable_sci(void)
>  {
>      uint8_t pm1a_cnt_val;
> +    uint8_t acpi_enable_val;
>  
> -#define PIIX4_SMI_CMD_IOPORT 0xb2
> +#define SMI_CMD_IOPORT       0xb2
>  #define PIIX4_ACPI_ENABLE    0xf1
> +#define ICH9_ACPI_ENABLE     0x02
> +
> +    if (get_pc_machine_type() == MACHINE_TYPE_Q35)
> +        acpi_enable_val = ICH9_ACPI_ENABLE;
> +    else
> +        acpi_enable_val = PIIX4_ACPI_ENABLE;

Coding style, but I would rather:

switch ( get_pc_machine_type() )
{
case MACHINE_TYPE_Q35:
...
case MACHINE_TYPE_I440:
...
default:
BUG();
}

I think storing the machine type in a global variable is better than
calling get_pc_machine_type each time.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [RFC PATCH 02/12] Makefile: build and use new DSDT table for Q35
  2018-03-12 18:33 ` [RFC PATCH 02/12] Makefile: build and use new DSDT " Alexey Gerasimenko
  2018-03-19 12:46   ` Roger Pau Monné
@ 2018-03-19 13:07   ` Jan Beulich
  2018-03-19 14:10     ` Alexey G
  1 sibling, 1 reply; 183+ messages in thread
From: Jan Beulich @ 2018-03-19 13:07 UTC (permalink / raw)
  To: Alexey Gerasimenko; +Cc: Andrew Cooper, Wei Liu, Ian Jackson, xen-devel

>>> On 12.03.18 at 19:33, <x1917x@gmail.com> wrote:
> --- a/tools/firmware/hvmloader/Makefile
> +++ b/tools/firmware/hvmloader/Makefile
> @@ -75,7 +75,7 @@ rombios.o: roms.inc
>  smbios.o: CFLAGS += -D__SMBIOS_DATE__="\"$(SMBIOS_REL_DATE)\""
>  
>  ACPI_PATH = ../../libacpi
> -DSDT_FILES = dsdt_anycpu.c dsdt_15cpu.c dsdt_anycpu_qemu_xen.c
> +DSDT_FILES = dsdt_anycpu.c dsdt_15cpu.c dsdt_anycpu_qemu_xen.c dsdt_q35_anycpu_qemu_xen.c

Unless you intend to add a second flavor, please omit the "anycpu"
part from the name of the new instance.

> @@ -56,6 +56,13 @@ $(ACPI_BUILD_DIR)/dsdt_anycpu_qemu_xen.asl: dsdt.asl dsdt_acpi_info.asl $(MK_DSD
>  	$(MK_DSDT) --debug=$(debug) --dm-version qemu-xen >> $@.$(TMP_SUFFIX)
>  	mv -f $@.$(TMP_SUFFIX) $@
>  
> +$(ACPI_BUILD_DIR)/dsdt_q35_anycpu_qemu_xen.asl: dsdt_q35.asl dsdt_acpi_info.asl $(MK_DSDT)
> +	# Remove last bracket
> +	awk 'NR > 1 {print s} {s=$$0}' $< > $@.$(TMP_SUFFIX)
> +	cat dsdt_acpi_info.asl >> $@.$(TMP_SUFFIX)
> +	$(MK_DSDT) --debug=$(debug) --dm-version qemu-xen >> $@.$(TMP_SUFFIX)
> +	mv -f $@.$(TMP_SUFFIX) $@

The commands look to be exactly the same as those for
dsdt_anycpu_qemu_xen.asl - please let's not duplicate such
things, but instead use a pattern rule.

Jan



^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [RFC PATCH 01/12] libacpi: new DSDT ACPI table for Q35
  2018-03-19 12:43   ` Roger Pau Monné
@ 2018-03-19 13:57     ` Alexey G
  0 siblings, 0 replies; 183+ messages in thread
From: Alexey G @ 2018-03-19 13:57 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: xen-devel, Wei Liu, Ian Jackson, Jan Beulich

On Mon, 19 Mar 2018 12:43:05 +0000
Roger Pau Monné <roger.pau@citrix.com> wrote:

>On Tue, Mar 13, 2018 at 04:33:46AM +1000, Alexey Gerasimenko wrote:
>> This patch adds the DSDT table for Q35 (new
>> tools/libacpi/dsdt_q35.asl file). There are not many differences
>> with dsdt.asl (for i440) at the moment, namely:
>> 
>> - BDF location of LPC Controller
>> - Minor changes related to FDC detection
>> - Addition of _OSC method to inform OSPM about PCIe features
>> supported
>> 
>> As we are still using 4 PCI router links and their corresponding
>> device/register addresses are same (offset 0x60), no need to change
>> PCI routing descriptions.
>> 
>> Also, ACPI hotplug is still used to control passed through device hot
>> (un)plug (as it was for i440).
>> 
>> Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
>> ---
>>  tools/libacpi/dsdt_q35.asl | 551
>> +++++++++++++++++++++++++++++++++++++++++++++  
>
>So this is basically a modified dupe of the current dsdt.asl? AFAICT
>there are a bunch of common bits, which ideally we want to have
>defined in a single place.
>
>Can't you factor out the common parts of the dsdt.asl into smaller
>parts an include them for both dsdt.asl and dsdt_q35.asl?
>
>I would first have a patch that extract the common parts of the
>dsdt into file(s), and then a second patch which creates a
>dsdt_q35.asl based on those common bits plus the specific q35 code.

Yes, it's a good thing that many registers have the same addresses on
i440 and Q35. Some of the common things I encountered were unexpected
though -- AFAIR the _S5 SLP_TYP value does not correspond to the ICH9
datasheet; a different value is used instead to trigger the ACPI
Soft-Off emulation.

Regarding dsdt.asl/dsdt_q35.asl -- OK, I'll split these files into
common/specific parts.


^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [RFC PATCH 02/12] Makefile: build and use new DSDT table for Q35
  2018-03-19 13:07   ` Jan Beulich
@ 2018-03-19 14:10     ` Alexey G
  0 siblings, 0 replies; 183+ messages in thread
From: Alexey G @ 2018-03-19 14:10 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Wei Liu, Ian Jackson, xen-devel

On Mon, 19 Mar 2018 07:07:34 -0600
"Jan Beulich" <JBeulich@suse.com> wrote:

>>>> On 12.03.18 at 19:33, <x1917x@gmail.com> wrote:  
>> --- a/tools/firmware/hvmloader/Makefile
>> +++ b/tools/firmware/hvmloader/Makefile
>> @@ -75,7 +75,7 @@ rombios.o: roms.inc
>>  smbios.o: CFLAGS += -D__SMBIOS_DATE__="\"$(SMBIOS_REL_DATE)\""
>>  
>>  ACPI_PATH = ../../libacpi
>> -DSDT_FILES = dsdt_anycpu.c dsdt_15cpu.c dsdt_anycpu_qemu_xen.c
>> +DSDT_FILES = dsdt_anycpu.c dsdt_15cpu.c dsdt_anycpu_qemu_xen.c
>> dsdt_q35_anycpu_qemu_xen.c  
>
>Unless you intend to add a second flavor, please omit the "anycpu"
>part from the name of the new instance.

I was just following the same "anycpu/15cpu" naming scheme. There will be
no need for dsdt_q35_15cpu.c, so I guess it's OK to drop the anycpu/15cpu
part of the name -- will rename it.

>> @@ -56,6 +56,13 @@ $(ACPI_BUILD_DIR)/dsdt_anycpu_qemu_xen.asl: dsdt.asl dsdt_acpi_info.asl $(MK_DSDT)
>>  	$(MK_DSDT) --debug=$(debug) --dm-version qemu-xen >> $@.$(TMP_SUFFIX)
>>  	mv -f $@.$(TMP_SUFFIX) $@
>>  
>> +$(ACPI_BUILD_DIR)/dsdt_q35_anycpu_qemu_xen.asl: dsdt_q35.asl dsdt_acpi_info.asl $(MK_DSDT)
>> +	# Remove last bracket
>> +	awk 'NR > 1 {print s} {s=$$0}' $< > $@.$(TMP_SUFFIX)
>> +	cat dsdt_acpi_info.asl >> $@.$(TMP_SUFFIX)
>> +	$(MK_DSDT) --debug=$(debug) --dm-version qemu-xen >> $@.$(TMP_SUFFIX)
>> +	mv -f $@.$(TMP_SUFFIX) $@  
>
>The commands look to be exactly the same as those for
>dsdt_anycpu_qemu_xen.asl - please let's not duplicate such
>things, but instead use a pattern rule.

Agreed, reusing the rule via a pattern will be better.
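For reference, one way to deduplicate the two recipes is a canned recipe shared by both targets -- a sketch only, not part of the actual series (the `$(MK_DSDT)` flags and the awk trick are taken from the quoted hunk; the `mk_dsdt_qemu_xen` name is made up here):

```make
# Canned recipe shared by both DSDT flavours; $< is the source .asl file.
define mk_dsdt_qemu_xen
	# Remove last bracket of the source ASL
	awk 'NR > 1 {print s} {s=$$0}' $< > $@.$(TMP_SUFFIX)
	cat dsdt_acpi_info.asl >> $@.$(TMP_SUFFIX)
	$(MK_DSDT) --debug=$(debug) --dm-version qemu-xen >> $@.$(TMP_SUFFIX)
	mv -f $@.$(TMP_SUFFIX) $@
endef

$(ACPI_BUILD_DIR)/dsdt_anycpu_qemu_xen.asl: dsdt.asl dsdt_acpi_info.asl $(MK_DSDT)
	$(mk_dsdt_qemu_xen)

$(ACPI_BUILD_DIR)/dsdt_q35_anycpu_qemu_xen.asl: dsdt_q35.asl dsdt_acpi_info.asl $(MK_DSDT)
	$(mk_dsdt_qemu_xen)
```

A pattern rule would work too, though the target names (`dsdt_anycpu_qemu_xen` vs `dsdt_q35_anycpu_qemu_xen`) do not share a stem that maps cleanly onto the source names, so the canned recipe avoids that wrinkle.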



* Re: [RFC PATCH 02/12] Makefile: build and use new DSDT table for Q35
  2018-03-19 12:46   ` Roger Pau Monné
@ 2018-03-19 14:18     ` Alexey G
  0 siblings, 0 replies; 183+ messages in thread
From: Alexey G @ 2018-03-19 14:18 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, Ian Jackson, Wei Liu, Jan Beulich, Andrew Cooper

On Mon, 19 Mar 2018 12:46:05 +0000
Roger Pau Monné <roger.pau@citrix.com> wrote:

>On Tue, Mar 13, 2018 at 04:33:47AM +1000, Alexey Gerasimenko wrote:
>> Provide building for newly added dsdt_q35.asl file, in a way similar
>> to dsdt.asl.
>> 
> Note that '15cpu' ACPI tables are only applicable to qemu-traditional
> (which has no support for Q35), so we need to use the 'anycpu' version
> only.  
>
>You should do this in the same patch that adds dsdt_q35.asl; in the
>end, without this the previous patch just adds dead code.
>
>Thanks, Roger.

Agreed, I've abused the recommendation to granulate patches for easier
review. :) Will merge it with the previous patch.



* Re: [RFC PATCH 05/12] hvmloader: add Q35 DSDT table loading
  2018-03-12 18:33 ` [RFC PATCH 05/12] hvmloader: add Q35 DSDT table loading Alexey Gerasimenko
@ 2018-03-19 14:45   ` Roger Pau Monné
  2018-03-20  0:15     ` Alexey G
  0 siblings, 1 reply; 183+ messages in thread
From: Roger Pau Monné @ 2018-03-19 14:45 UTC (permalink / raw)
  To: Alexey Gerasimenko
  Cc: xen-devel, Ian Jackson, Wei Liu, Jan Beulich, Andrew Cooper

On Tue, Mar 13, 2018 at 04:33:50AM +1000, Alexey Gerasimenko wrote:
> Allows selecting the Q35 DSDT table in hvmloader_acpi_build_tables(). The
> function get_pc_machine_type() is used to select the proper table (i440/q35).
> 
> As we are bound to the qemu-xen device model for Q35, no need
> to initialize config->dsdt_15cpu/config->dsdt_15cpu_len fields.
> 
> Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
> ---
>  tools/firmware/hvmloader/util.c | 13 +++++++++++--
>  tools/firmware/hvmloader/util.h |  2 ++
>  2 files changed, 13 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c
> index 5739a87628..d8db9e3c8e 100644
> --- a/tools/firmware/hvmloader/util.c
> +++ b/tools/firmware/hvmloader/util.c
> @@ -955,8 +955,17 @@ void hvmloader_acpi_build_tables(struct acpi_config *config,
>      }
>      else if ( !strncmp(s, "qemu_xen", 9) )
>      {
> -        config->dsdt_anycpu = dsdt_anycpu_qemu_xen;
> -        config->dsdt_anycpu_len = dsdt_anycpu_qemu_xen_len;
> +        if (get_pc_machine_type() == MACHINE_TYPE_Q35)

Coding style (missing spaces between parentheses), and I would prefer
a switch here.


IMO you should add a BUG_ON(Q35) in the qemu_xen_traditional condition
above this one.

> +        {
> +            config->dsdt_anycpu = dsdt_q35_anycpu_qemu_xen;
> +            config->dsdt_anycpu_len = dsdt_q35_anycpu_qemu_xen_len;
> +        }
> +        else
> +        {
> +            config->dsdt_anycpu = dsdt_anycpu_qemu_xen;
> +            config->dsdt_anycpu_len = dsdt_anycpu_qemu_xen_len;
> +        }
> +
>          config->dsdt_15cpu = NULL;
>          config->dsdt_15cpu_len = 0;
>      }
> diff --git a/tools/firmware/hvmloader/util.h b/tools/firmware/hvmloader/util.h
> index 7c77bedb00..fd2d885c96 100644
> --- a/tools/firmware/hvmloader/util.h
> +++ b/tools/firmware/hvmloader/util.h
> @@ -288,7 +288,9 @@ bool check_overlap(uint64_t start, uint64_t size,
>                     uint64_t reserved_start, uint64_t reserved_size);
>  
>  extern const unsigned char dsdt_anycpu_qemu_xen[], dsdt_anycpu[], dsdt_15cpu[];
> +extern const unsigned char dsdt_q35_anycpu_qemu_xen[];
>  extern const int dsdt_anycpu_qemu_xen_len, dsdt_anycpu_len, dsdt_15cpu_len;
> +extern const int dsdt_q35_anycpu_qemu_xen_len;

Since you are adding this, maybe unsigned int? (or size_t?)

Thanks, Roger.



* Re: [RFC PATCH 09/12] libxl: Xen Platform device support for Q35
  2018-03-12 18:33 ` [RFC PATCH 09/12] libxl: Xen Platform device support for Q35 Alexey Gerasimenko
@ 2018-03-19 15:05   ` Alexey G
  2018-03-21 16:32     ` Wei Liu
  0 siblings, 1 reply; 183+ messages in thread
From: Alexey G @ 2018-03-19 15:05 UTC (permalink / raw)
  To: xen-devel; +Cc: Wei Liu, Ian Jackson

On Tue, 13 Mar 2018 04:33:54 +1000
Alexey Gerasimenko <x1917x@gmail.com> wrote:

>Current Xen/QEMU method to control Xen Platform device is a bit odd --
>changing 'xen_platform_device' option value actually modifies QEMU
>emulated machine type, namely xenfv <--> pc.
>
>In order to avoid multiplying machine types, use the new way to control
>Xen Platform device for QEMU -- xen-platform-dev property. To maintain
>backward compatibility with existing Xen/QEMU setups, this is only
>applicable to q35 machine currently. i440 emulation uses the old method
>(xenfv/pc machine) to control Xen Platform device, this may be changed
>later to xen-platform-dev property as well.
>
>Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
>---
> tools/libxl/libxl_dm.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
>diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
>index 7b531050c7..586035aa73 100644
>--- a/tools/libxl/libxl_dm.c
>+++ b/tools/libxl/libxl_dm.c
>@@ -1444,7 +1444,11 @@ static int libxl__build_device_model_args_new(libxl__gc *gc,
>         break;
>     case LIBXL_DOMAIN_TYPE_HVM:
>         if (b_info->device_model_machine == LIBXL_DEVICE_MODEL_MACHINE_Q35) {
>-            machinearg = libxl__sprintf(gc, "q35,accel=xen");
>+            if (!libxl_defbool_val(b_info->u.hvm.xen_platform_pci)) {
>+                machinearg = libxl__sprintf(gc, "q35,accel=xen");
>+            } else {
>+                machinearg = libxl__sprintf(gc,
>+                                            "q35,accel=xen,xen-platform-dev=on");
>+            }
>         } else {
>             if (!libxl_defbool_val(b_info->u.hvm.xen_platform_pci)) {
>                 /* Switching here to the machine "pc" which does not add

Regarding this one -- QEMU maintainers suggested that supplying '-device
xen-platform' directly would be a better approach than a machine
property, so this patch is now somewhat obsolete.

Right now "xenfv" machine usage for qemu-xen seems to be limited to
controlling the Xen platform device and applying the HVM_MAX_VCPUS
value to maxcpus + minor changes related to IGD passthrough. Both
should be applicable for a "pc,accel=xen" machine as well I think, which
in fact currently lacks the HVM_MAX_VCPUS check for some reason.

Adding a distinct method to control the Xen platform device for the q35
machine suggests propagating the same approach to the i440 machine types,
but... it depends on who else may use xenfv for qemu-xen (not to be
confused with xenfv usage on qemu-traditional).

Are there any other toolstacks/code which use the xenfv machine solely
to turn the Xen platform device on/off?



* Re: [RFC PATCH 06/12] hvmloader: add basic Q35 support
  2018-03-12 18:33 ` [RFC PATCH 06/12] hvmloader: add basic Q35 support Alexey Gerasimenko
@ 2018-03-19 15:30   ` Roger Pau Monné
  2018-03-19 23:44     ` Alexey G
  0 siblings, 1 reply; 183+ messages in thread
From: Roger Pau Monné @ 2018-03-19 15:30 UTC (permalink / raw)
  To: Alexey Gerasimenko
  Cc: xen-devel, Ian Jackson, Wei Liu, Jan Beulich, Andrew Cooper

On Tue, Mar 13, 2018 at 04:33:51AM +1000, Alexey Gerasimenko wrote:
> This patch does the following:
> 
> 1. Move PCI-device specific initialization out of pci_setup function
> to the newly created class_specific_pci_device_setup function to simplify
> code.
> 
> 2. PCI-device specific initialization extended with LPC controller
> initialization.
> 
> 3. Initialize PIRQA...{PIRQD, PIRQH} routing according to the emulated
> south bridge (located either at PCI_ISA_DEVFN or PCI_ICH9_LPC_DEVFN).
> 
> Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
> ---
>  tools/firmware/hvmloader/config.h |   1 +
>  tools/firmware/hvmloader/pci.c    | 162 ++++++++++++++++++++++++--------------
>  2 files changed, 104 insertions(+), 59 deletions(-)
> 
> diff --git a/tools/firmware/hvmloader/config.h b/tools/firmware/hvmloader/config.h
> index 6e00413f2e..6fde6b7b60 100644
> --- a/tools/firmware/hvmloader/config.h
> +++ b/tools/firmware/hvmloader/config.h
> @@ -52,6 +52,7 @@ extern uint8_t ioapic_version;
>  
>  #define PCI_ISA_DEVFN       0x08    /* dev 1, fn 0 */
>  #define PCI_ISA_IRQ_MASK    0x0c20U /* ISA IRQs 5,10,11 are PCI connected */
> +#define PCI_ICH9_LPC_DEVFN  0xf8    /* dev 31, fn 0 */
>  
>  /* MMIO hole: Hardcoded defaults, which can be dynamically expanded. */
>  #define PCI_MEM_END         0xfc000000
> diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c
> index 0b708bf578..033bd20992 100644
> --- a/tools/firmware/hvmloader/pci.c
> +++ b/tools/firmware/hvmloader/pci.c
> @@ -35,6 +35,7 @@ unsigned long pci_mem_end = PCI_MEM_END;
>  uint64_t pci_hi_mem_start = 0, pci_hi_mem_end = 0;
>  
>  enum virtual_vga virtual_vga = VGA_none;
> +uint32_t vga_devfn = 256;

uint8_t should be enough to store a devfn. Also, maybe this should be
static?

>  unsigned long igd_opregion_pgbase = 0;
>  
>  /* Check if the specified range conflicts with any reserved device memory. */
> @@ -76,14 +77,93 @@ static int find_next_rmrr(uint32_t base)
>      return next_rmrr;
>  }
>  
> +#define SCI_EN_IOPORT  (ACPI_PM1A_EVT_BLK_ADDRESS_V1 + 0x30)
> +#define GBL_SMI_EN      (1 << 0)
> +#define APMC_EN         (1 << 5)

Alignment.

> +
> +static void class_specific_pci_device_setup(uint16_t vendor_id,
> +                                            uint16_t device_id,
> +                                            uint8_t bus, uint8_t devfn)
> +{
> +    uint16_t class;
> +
> +    class = pci_readw(devfn, PCI_CLASS_DEVICE);
> +
> +    switch ( class )

switch ( pci_readw(devfn, PCI_CLASS_DEVICE) ) ?

I don't see class being used elsewhere.

Also why is vendor_id/device_id provided by the caller but not class?
It seems kind of pointless.

Why not fetch vendor/device from the function itself and move the
(vendor_id == 0xffff) && (device_id == 0xffff) check inside the
function?

Also in this case I think it would be better to have a non-functional
patch that introduces class_specific_pci_device_setup and a second
patch that adds support for ICH9.

Having code movement and new code in the same patch makes it harder to
verify what you are actually moving vs introducing.

> +    {
> +    case 0x0300:

All this values need to be defines documented somewhere.

> +        /* If emulated VGA is found, preserve it as primary VGA. */
> +        if ( (vendor_id == 0x1234) && (device_id == 0x1111) )
> +        {
> +            vga_devfn = devfn;
> +            virtual_vga = VGA_std;
> +        }
> +        else if ( (vendor_id == 0x1013) && (device_id == 0xb8) )
> +        {
> +            vga_devfn = devfn;
> +            virtual_vga = VGA_cirrus;
> +        }
> +        else if ( virtual_vga == VGA_none )
> +        {
> +            vga_devfn = devfn;
> +            virtual_vga = VGA_pt;
> +            if ( vendor_id == 0x8086 )
> +            {
> +                igd_opregion_pgbase = mem_hole_alloc(IGD_OPREGION_PAGES);
> +                /*
> +                 * Write the OpRegion offset to give the opregion
> +                 * address to the device model. The device model will trap
> +                 * and map the OpRegion at the given address.
> +                 */
> +                pci_writel(vga_devfn, PCI_INTEL_OPREGION,
> +                           igd_opregion_pgbase << PAGE_SHIFT);
> +            }
> +        }
> +        break;
> +
> +    case 0x0680:
> +        /* PIIX4 ACPI PM. Special device with special PCI config space. */
> +        ASSERT((vendor_id == 0x8086) && (device_id == 0x7113));
> +        pci_writew(devfn, 0x20, 0x0000); /* No smb bus IO enable */
> +        pci_writew(devfn, 0xd2, 0x0000); /* No smb bus IO enable */
> +        pci_writew(devfn, 0x22, 0x0000);
> +        pci_writew(devfn, 0x3c, 0x0009); /* Hardcoded IRQ9 */
> +        pci_writew(devfn, 0x3d, 0x0001);
> +        pci_writel(devfn, 0x40, ACPI_PM1A_EVT_BLK_ADDRESS_V1 | 1);
> +        pci_writeb(devfn, 0x80, 0x01); /* enable PM io space */
> +        break;
> +
> +    case 0x0601:
> +        /* LPC bridge */
> +        if (vendor_id == 0x8086 && device_id == 0x2918)
> +        {
> +            pci_writeb(devfn, 0x3c, 0x09); /* Hardcoded IRQ9 */
> +            pci_writeb(devfn, 0x3d, 0x01);
> +            pci_writel(devfn, 0x40, ACPI_PM1A_EVT_BLK_ADDRESS_V1 | 1);
> +            pci_writeb(devfn, 0x44, 0x80); /* enable PM io space */
> +            outl(SCI_EN_IOPORT, inl(SCI_EN_IOPORT) | GBL_SMI_EN | APMC_EN);
> +        }
> +        break;
> +
> +    case 0x0101:
> +        if ( vendor_id == 0x8086 )
> +        {
> +            /* Intel ICHs since PIIX3: enable IDE legacy mode. */
> +            pci_writew(devfn, 0x40, 0x8000); /* enable IDE0 */
> +            pci_writew(devfn, 0x42, 0x8000); /* enable IDE1 */
> +        }
> +        break;
> +    }
> +}
> +
>  void pci_setup(void)
>  {
>      uint8_t is_64bar, using_64bar, bar64_relocate = 0;
>      uint32_t devfn, bar_reg, cmd, bar_data, bar_data_upper;
>      uint64_t base, bar_sz, bar_sz_upper, mmio_total = 0;
> -    uint32_t vga_devfn = 256;
> -    uint16_t class, vendor_id, device_id;
> +    uint16_t vendor_id, device_id;
>      unsigned int bar, pin, link, isa_irq;
> +    int is_running_on_q35 = 0;

bool is_running_on_q35 = (get_pc_machine_type() == MACHINE_TYPE_Q35);

>  
>      /* Resources assignable to PCI devices via BARs. */
>      struct resource {
> @@ -130,13 +210,28 @@ void pci_setup(void)
>      if ( s )
>          mmio_hole_size = strtoll(s, NULL, 0);
>  
> +    /* check if we are on Q35 and set the flag if it is the case */
> +    is_running_on_q35 = get_pc_machine_type() == MACHINE_TYPE_Q35;
> +
>      /* Program PCI-ISA bridge with appropriate link routes. */
>      isa_irq = 0;
>      for ( link = 0; link < 4; link++ )
>      {
>          do { isa_irq = (isa_irq + 1) & 15;
>          } while ( !(PCI_ISA_IRQ_MASK & (1U << isa_irq)) );
> -        pci_writeb(PCI_ISA_DEVFN, 0x60 + link, isa_irq);
> +
> +        if (is_running_on_q35)

Coding style.

> +        {
> +            pci_writeb(PCI_ICH9_LPC_DEVFN, 0x60 + link, isa_irq);
> +
> +            /* PIRQE..PIRQH are unused */
> +            pci_writeb(PCI_ICH9_LPC_DEVFN, 0x68 + link, 0x80);

According to the spec 0x80 is the default value for these registers, do
you really need to write it?

Is maybe QEMU not correctly setting the default value?

> +        }
> +        else
> +        {
> +            pci_writeb(PCI_ISA_DEVFN, 0x60 + link, isa_irq);

Is all this magic described somewhere that you can reference?

Thanks, Roger.



* Re: [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring
  2018-03-12 18:33 ` [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring Alexey Gerasimenko
@ 2018-03-19 15:58   ` Roger Pau Monné
  2018-03-19 19:49     ` Alexey G
  2018-05-29 14:23   ` Jan Beulich
  1 sibling, 1 reply; 183+ messages in thread
From: Roger Pau Monné @ 2018-03-19 15:58 UTC (permalink / raw)
  To: Alexey Gerasimenko
  Cc: Wei Liu, Andrew Cooper, Ian Jackson, Paul Durrant, Jan Beulich,
	xen-devel

On Tue, Mar 13, 2018 at 04:33:52AM +1000, Alexey Gerasimenko wrote:
> Much like normal PCI BARs or other chipset-specific memory-mapped
> resources, MMCONFIG area needs space in MMIO hole, so we must allocate
> it manually.
> 
> The actual MMCONFIG size depends on the number of PCI buses which
> should be covered by ECAM. Possible options are 64MB, 128MB and 256MB.
> As we are currently limited to bus 0, the lowest possible setting (64MB)
> is used, #defined via PCI_MAX_MCFG_BUSES in hvmloader/config.h.
> When multiple PCI bus support is implemented for Xen,
> PCI_MAX_MCFG_BUSES may be changed to a calculation of the number of
> buses based on the results of PCI device enumeration.
> 
> The way to allocate MMCONFIG range in MMIO hole is similar to how other
> PCI BARs are allocated. The patch extends 'bars' structure to make
> it universal for any arbitrary BAR type -- either IO, MMIO, ROM or
> a chipset-specific resource.

I'm not sure this is fully correct. The IOREQ interface can
differentiate PCI devices and forward config space accesses to
different emulators (see IOREQ_TYPE_PCI_CONFIG). With this change you
will forward all MCFG accesses to QEMU, which will likely be wrong if
there are multiple PCI-device emulators for the same domain.

Ie: AFAICT Xen needs to know about the MCFG emulation and detect
accesses to it in order to forward them to the right emulators.

Adding Paul who knows more about all this.

> One important new field is addr_mask, which tells which bits of the base
> address can (should) be written. Different address types (ROM, MMIO BAR,
> PCIEXBAR) will have different addr_mask values.
> 
> For every assignable BAR range we store its size, PCI device BDF (devfn
> actually) to which it belongs, BAR type (mem/io/mem64) and the corresponding
> register offset in the device's PCI config space. This way we can insert an
> MMCONFIG entry into the bars array in the same manner as for any other BAR.
> In this case, the devfn field will point to the MCH PCI device and bar_reg
> will contain the PCIEXBAR register offset. It will be assigned a slot in the
> MMIO hole later in the very same way as plain PCI BARs, with respect to its
> size alignment.
> 
> Also, to reduce code complexity, all lengthy mem/mem64 BAR flag checks are
> replaced by simple bars[i] field probing, e.g.:
> -        if ( (bar_reg == PCI_ROM_ADDRESS) ||
> -             ((bar_data & PCI_BASE_ADDRESS_SPACE) ==
> -              PCI_BASE_ADDRESS_SPACE_MEMORY) )
> +        if ( bars[i].is_mem )

This should be a separate change IMO.

> 
> Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
> ---
>  tools/firmware/hvmloader/config.h   |   4 ++
>  tools/firmware/hvmloader/pci.c      | 127 ++++++++++++++++++++++++++++--------
>  tools/firmware/hvmloader/pci_regs.h |   2 +
>  3 files changed, 106 insertions(+), 27 deletions(-)
> 
> diff --git a/tools/firmware/hvmloader/config.h b/tools/firmware/hvmloader/config.h
> index 6fde6b7b60..5443ecd804 100644
> --- a/tools/firmware/hvmloader/config.h
> +++ b/tools/firmware/hvmloader/config.h
> @@ -53,10 +53,14 @@ extern uint8_t ioapic_version;
>  #define PCI_ISA_DEVFN       0x08    /* dev 1, fn 0 */
>  #define PCI_ISA_IRQ_MASK    0x0c20U /* ISA IRQs 5,10,11 are PCI connected */
>  #define PCI_ICH9_LPC_DEVFN  0xf8    /* dev 31, fn 0 */
> +#define PCI_MCH_DEVFN       0       /* bus 0, dev 0, func 0 */
>  
>  /* MMIO hole: Hardcoded defaults, which can be dynamically expanded. */
>  #define PCI_MEM_END         0xfc000000
>  
> +/* possible values are: 64, 128, 256 */
> +#define PCI_MAX_MCFG_BUSES  64

>What's the reasoning behind this value? Do we know which devices need
>ECAM areas?

> +
>  #define ACPI_TIS_HDR_ADDRESS 0xFED40F00UL
>  
>  extern unsigned long pci_mem_start, pci_mem_end;
> diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c
> index 033bd20992..6de124bbd5 100644
> --- a/tools/firmware/hvmloader/pci.c
> +++ b/tools/firmware/hvmloader/pci.c
> @@ -158,9 +158,10 @@ static void class_specific_pci_device_setup(uint16_t vendor_id,
>  
>  void pci_setup(void)
>  {
> -    uint8_t is_64bar, using_64bar, bar64_relocate = 0;
> +    uint8_t is_64bar, using_64bar, bar64_relocate = 0, is_mem;
>      uint32_t devfn, bar_reg, cmd, bar_data, bar_data_upper;
>      uint64_t base, bar_sz, bar_sz_upper, mmio_total = 0;
> +    uint64_t addr_mask;
>      uint16_t vendor_id, device_id;
>      unsigned int bar, pin, link, isa_irq;
>      int is_running_on_q35 = 0;
> @@ -172,10 +173,14 @@ void pci_setup(void)
>  
>      /* Create a list of device BARs in descending order of size. */
>      struct bars {
> -        uint32_t is_64bar;
>          uint32_t devfn;
>          uint32_t bar_reg;
>          uint64_t bar_sz;
> +        uint64_t addr_mask; /* which bits of the base address can be written */
> +        uint32_t bar_data;  /* initial value - BAR flags here */
> +        uint8_t  is_64bar;
> +        uint8_t  is_mem;
> +        uint8_t  padding[2];

>Why are you manually adding padding here? Also why not make these
>fields bool?

>      } *bars = (struct bars *)scratch_start;
>      unsigned int i, nr_bars = 0;
>      uint64_t mmio_hole_size = 0;
> @@ -259,13 +264,21 @@ void pci_setup(void)
>                  bar_reg = PCI_ROM_ADDRESS;
>  
>              bar_data = pci_readl(devfn, bar_reg);
> +
> +            is_mem = !!(((bar_data & PCI_BASE_ADDRESS_SPACE) ==
> +                       PCI_BASE_ADDRESS_SPACE_MEMORY) ||
> +                       (bar_reg == PCI_ROM_ADDRESS));
> +
>              if ( bar_reg != PCI_ROM_ADDRESS )
>              {
> -                is_64bar = !!((bar_data & (PCI_BASE_ADDRESS_SPACE |
> -                             PCI_BASE_ADDRESS_MEM_TYPE_MASK)) ==
> -                             (PCI_BASE_ADDRESS_SPACE_MEMORY |
> +                is_64bar = !!(is_mem &&
> +                             ((bar_data & PCI_BASE_ADDRESS_MEM_TYPE_MASK) ==
>                               PCI_BASE_ADDRESS_MEM_TYPE_64));
> +
>                  pci_writel(devfn, bar_reg, ~0);
> +
> +                addr_mask = is_mem ? PCI_BASE_ADDRESS_MEM_MASK
> +                                   : PCI_BASE_ADDRESS_IO_MASK;
>              }
>              else
>              {
> @@ -273,28 +286,35 @@ void pci_setup(void)
>                  pci_writel(devfn, bar_reg,
>                             (bar_data | PCI_ROM_ADDRESS_MASK) &
>                             ~PCI_ROM_ADDRESS_ENABLE);
> +
> +                addr_mask = PCI_ROM_ADDRESS_MASK;
>              }
> +
>              bar_sz = pci_readl(devfn, bar_reg);
>              pci_writel(devfn, bar_reg, bar_data);
>  
>              if ( bar_reg != PCI_ROM_ADDRESS )
> -                bar_sz &= (((bar_data & PCI_BASE_ADDRESS_SPACE) ==
> -                            PCI_BASE_ADDRESS_SPACE_MEMORY) ?
> -                           PCI_BASE_ADDRESS_MEM_MASK :
> -                           (PCI_BASE_ADDRESS_IO_MASK & 0xffff));
> +                bar_sz &= is_mem ? PCI_BASE_ADDRESS_MEM_MASK :
> +                                   (PCI_BASE_ADDRESS_IO_MASK & 0xffff);
>              else
>                  bar_sz &= PCI_ROM_ADDRESS_MASK;
> -            if (is_64bar) {
> +
> +            if (is_64bar)

Coding style (spaces between parentheses).

> +            {
>                  bar_data_upper = pci_readl(devfn, bar_reg + 4);
>                  pci_writel(devfn, bar_reg + 4, ~0);
>                  bar_sz_upper = pci_readl(devfn, bar_reg + 4);
>                  pci_writel(devfn, bar_reg + 4, bar_data_upper);
>                  bar_sz = (bar_sz_upper << 32) | bar_sz;
>              }
> +
>              bar_sz &= ~(bar_sz - 1);
>              if ( bar_sz == 0 )
>                  continue;
>  
> +            /* leave only memtype/enable bits etc */
> +            bar_data &= ~addr_mask;
> +
>              for ( i = 0; i < nr_bars; i++ )
>                  if ( bars[i].bar_sz < bar_sz )
>                      break;
> @@ -302,14 +322,15 @@ void pci_setup(void)
>              if ( i != nr_bars )
>                  memmove(&bars[i+1], &bars[i], (nr_bars-i) * sizeof(*bars));
>  
> -            bars[i].is_64bar = is_64bar;
> -            bars[i].devfn   = devfn;
> -            bars[i].bar_reg = bar_reg;
> -            bars[i].bar_sz  = bar_sz;
> +            bars[i].is_64bar  = is_64bar;
> +            bars[i].is_mem    = is_mem;
> +            bars[i].devfn     = devfn;
> +            bars[i].bar_reg   = bar_reg;
> +            bars[i].bar_sz    = bar_sz;
> +            bars[i].addr_mask = addr_mask;
> +            bars[i].bar_data  = bar_data;
>  
> -            if ( ((bar_data & PCI_BASE_ADDRESS_SPACE) ==
> -                  PCI_BASE_ADDRESS_SPACE_MEMORY) ||
> -                 (bar_reg == PCI_ROM_ADDRESS) )
> +            if ( is_mem )
>                  mmio_total += bar_sz;
>  
>              nr_bars++;
> @@ -339,6 +360,63 @@ void pci_setup(void)
>          pci_writew(devfn, PCI_COMMAND, cmd);
>      }
>  
> +    /*
> +     *  Calculate MMCONFIG area size and squeeze it into the bars array
> +     *  for assigning a slot in the MMIO hole
> +     */
> +    if (is_running_on_q35)
> +    {
> +        /* disable PCIEXBAR decoding for now */
> +        pci_writel(PCI_MCH_DEVFN, PCI_MCH_PCIEXBAR, 0);
> +        pci_writel(PCI_MCH_DEVFN, PCI_MCH_PCIEXBAR + 4, 0);

I'm afraid I will need some context here, where is the description for
the config space of dev 0 fn 0? I don't seem to be able to find it in
the ich9 spec.

> +
> +#define PCIEXBAR_64_BUSES    (2 << 1)
> +#define PCIEXBAR_128_BUSES   (1 << 1)
> +#define PCIEXBAR_256_BUSES   (0 << 1)
> +#define PCIEXBAR_ENABLE      (1 << 0)

Why those strange definitions? (0 << 1)? (2 << 1) instead of (1 << 2)?

> +
> +        switch (PCI_MAX_MCFG_BUSES)
> +        {
> +        case 64:
> +            bar_data = PCIEXBAR_64_BUSES | PCIEXBAR_ENABLE;
> +            bar_sz = MB(64);
> +            break;
> +
> +        case 128:
> +            bar_data = PCIEXBAR_128_BUSES | PCIEXBAR_ENABLE;
> +            bar_sz = MB(128);
> +            break;
> +
> +        case 256:
> +            bar_data = PCIEXBAR_256_BUSES | PCIEXBAR_ENABLE;
> +            bar_sz = MB(256);
> +            break;
> +
> +        default:
> +            /* unsupported number of buses specified */
> +            BUG();
> +        }

I don't see how PCI_MAX_MCFG_BUSES should be used. Is the user
supposed to know what value to use at compile time? What about distro
packagers?

Thanks, Roger.



* Re: [RFC PATCH 03/12] hvmloader: add function to query an emulated machine type (i440/Q35)
  2018-03-19 12:56   ` Roger Pau Monné
@ 2018-03-19 16:26     ` Alexey G
  0 siblings, 0 replies; 183+ messages in thread
From: Alexey G @ 2018-03-19 16:26 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, Ian Jackson, Wei Liu, Jan Beulich, Andrew Cooper

On Mon, 19 Mar 2018 12:56:51 +0000
Roger Pau Monné <roger.pau@citrix.com> wrote:

>On Tue, Mar 13, 2018 at 04:33:48AM +1000, Alexey Gerasimenko wrote:
>> This adds a new function get_pc_machine_type() which allows
>> determining the emulated chipset type. Supported return values:
>> 
>> - MACHINE_TYPE_I440
>> - MACHINE_TYPE_Q35
>> - MACHINE_TYPE_UNKNOWN, results in the error message being printed
>>   followed by calling BUG() in hvmloader.  
>
>This is not correct, the return values are strictly MACHINE_TYPE_I440
>or MACHINE_TYPE_Q35. Everything else ends up in a BUG().
>
>Also makes me wonder whether this should instead be init_machine_type,
>and users should just read machine_type directly.

Completely agree here, a get_-style function should normally return a
value, not perform extra checks and call BUG().

Renaming the function to init_machine_type() and replacing
get_pc_machine_type() usage with reading the machine_type (extern)
variable should be clearer (or, perhaps, a one-line function returning
its value).

This way we can assume the machine type was successfully validated,
hence there is no need for additional checks for the MACHINE_TYPE_UNKNOWN
value (or for MACHINE_TYPE_UNKNOWN itself).

>>  tools/firmware/hvmloader/pci_regs.h |  5 ++++
>>  tools/firmware/hvmloader/util.c     | 47 +++++++++++++++++++++++++++++++++++++
>>  tools/firmware/hvmloader/util.h     |  8 +++++++
>>  3 files changed, 60 insertions(+)
>> 
>> diff --git a/tools/firmware/hvmloader/pci_regs.h b/tools/firmware/hvmloader/pci_regs.h
>> index 7bf2d873ab..ba498b840e 100644
>> --- a/tools/firmware/hvmloader/pci_regs.h
>> +++ b/tools/firmware/hvmloader/pci_regs.h

>> +static int machine_type = MACHINE_TYPE_UNDEFINED;  
>
>There's no need to init this, _UNDEFINED is 0 which is the default
>value.

Using the explicit initialization with the named constant here merely
improves readability. Comparing the enum-style variable later with
MACHINE_TYPE_UNDEFINED seems better than comparing it with 0. It makes
zero difference to the compiler, but makes a difference to a human. :)

Besides, it will be converted to an enum type anyway, so a named entry
for the 'unassigned' value will be appropriate, I think.

>> +int get_pc_machine_type(void)  
>
>You introduce a function that's not used anywhere, and the commit log
>doesn't mention why this is needed at all. In general I prefer
>functions to be introduced with at least a caller, or else it needs to
>be described in the commit message why this is not the case.

There are multiple users, will merge the function with some
of its callers (Wei suggested the same).

>> +{
>> +    uint16_t vendor_id;
>> +    uint16_t device_id;
>> +
>> +    if (machine_type != MACHINE_TYPE_UNDEFINED)
>> +        return machine_type;
>> +
>> +    machine_type = MACHINE_TYPE_UNKNOWN;
>> +
>> +    vendor_id = pci_readw(0, PCI_VENDOR_ID);
>> +    device_id = pci_readw(0, PCI_DEVICE_ID);
>> +
>> +    /* only Intel platforms are emulated currently */
>> +    if (vendor_id == PCI_VENDOR_ID_INTEL)  
>
>Should this maybe be a BUG_ON(vendor_id != PCI_VENDOR_ID_INTEL) then?
>Note that in this case you end up with a BUG later anyway.

Yes, this is intentional. Non-Intel vendor => unknown machine.

>> +    {
>> +        switch (device_id)
>> +        {
>> +        case PCI_DEVICE_ID_INTEL_82441:
>> +            machine_type = MACHINE_TYPE_I440;
>> +            printf("Detected i440 chipset\n");
>> +            break;
>> +
>> +        case PCI_DEVICE_ID_INTEL_Q35_MCH:
>> +            machine_type = MACHINE_TYPE_Q35;
>> +            printf("Detected Q35 chipset\n");
>> +            break;
>> +
>> +        default:
>> +            break;
>> +        }
>> +    }
>> +
>> +    if (machine_type == MACHINE_TYPE_UNKNOWN)
>> +    {
>> +        printf("Unknown emulated chipset encountered, VID=%04Xh, DID=%04Xh\n",
>> +               vendor_id, device_id);
>> +        BUG();  
>
>Why not place this in the default switch label? That would allow you
>to get rid of the MACHINE_TYPE_UNKNOWN define also.

This check outside the switch covers both cases: (vendor is not Intel)
OR (vendor is Intel but the host bridge is unknown).

I guess it could be moved into the switch, but that would mean two
copies of the printf(VID:DID)/BUG() block -- one for the Vendor ID
check and one for Device ID processing. Placing the check outside
allows reusing it for both cases.

>> +    }
>> +
>> +    return machine_type;
>> +}
>> +
>>  static void validate_hvm_info(struct hvm_info_table *t)
>>  {
>>      uint8_t *ptr = (uint8_t *)t;
>> diff --git a/tools/firmware/hvmloader/util.h b/tools/firmware/hvmloader/util.h
>> index 7bca6418d2..7c77bedb00 100644
>> --- a/tools/firmware/hvmloader/util.h
>> +++ b/tools/firmware/hvmloader/util.h
>> @@ -100,6 +100,14 @@ void pci_write(uint32_t devfn, uint32_t reg, uint32_t len, uint32_t val);
>>  #define pci_writew(devfn, reg, val) pci_write(devfn, reg, 2, (uint16_t)(val))
>>  #define pci_writel(devfn, reg, val) pci_write(devfn, reg, 4, (uint32_t)(val))
>>  
>> +/* Emulated machine types */
>> +#define MACHINE_TYPE_UNDEFINED      0
>> +#define MACHINE_TYPE_I440           1
>> +#define MACHINE_TYPE_Q35            2
>> +#define MACHINE_TYPE_UNKNOWN        (-1)  
>
>An enum seems better suited for this.

Agreed; also, MACHINE_TYPE_UNKNOWN will be dropped in the next version.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [RFC PATCH 08/12] libxl: Q35 support (new option device_model_machine)
  2018-03-12 18:33 ` [RFC PATCH 08/12] libxl: Q35 support (new option device_model_machine) Alexey Gerasimenko
  2018-03-13 17:25   ` Wei Liu
@ 2018-03-19 17:01   ` Roger Pau Monné
  2018-03-19 22:11     ` Alexey G
  1 sibling, 1 reply; 183+ messages in thread
From: Roger Pau Monné @ 2018-03-19 17:01 UTC (permalink / raw)
  To: Alexey Gerasimenko; +Cc: xen-devel, Wei Liu, Ian Jackson

On Tue, Mar 13, 2018 at 04:33:53AM +1000, Alexey Gerasimenko wrote:
> Provide a new domain config option to select the emulated machine type,
> device_model_machine. It has following possible values:
> - "i440" - i440 emulation (default)
> - "q35" - emulate a Q35 machine. By default, the storage interface is AHCI.

I would rather name this machine_chipset or device_model_chipset.

> 
> Note that omitting device_model_machine parameter means i440 system
> by default, so the default behavior doesn't change for existing domain
> config files.
> 
> Setting device_model_machine to "q35" sends '-machine q35,accel=xen'
> argument to QEMU. Unlike i440, there no separate machine type
> to enable/disable Xen platform device, it is controlled via a machine

But I assume the xen_platform_pci option still works as expected?

> property only. See 'libxl: Xen Platform device support for Q35' patch for
> a detailed description.
> 
> Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
> ---
>  tools/libxl/libxl_dm.c      | 16 ++++++++++------
>  tools/libxl/libxl_types.idl |  7 +++++++
>  tools/xl/xl_parse.c         | 14 ++++++++++++++
>  3 files changed, 31 insertions(+), 6 deletions(-)
> 
> diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
> index a3cddce8b7..7b531050c7 100644
> --- a/tools/libxl/libxl_dm.c
> +++ b/tools/libxl/libxl_dm.c
> @@ -1443,13 +1443,17 @@ static int libxl__build_device_model_args_new(libxl__gc *gc,
>              flexarray_append(dm_args, b_info->extra_pv[i]);
>          break;
>      case LIBXL_DOMAIN_TYPE_HVM:
> -        if (!libxl_defbool_val(b_info->u.hvm.xen_platform_pci)) {
> -            /* Switching here to the machine "pc" which does not add
> -             * the xen-platform device instead of the default "xenfv" machine.
> -             */
> -            machinearg = libxl__strdup(gc, "pc,accel=xen");
> +        if (b_info->device_model_machine == LIBXL_DEVICE_MODEL_MACHINE_Q35) {
> +            machinearg = libxl__sprintf(gc, "q35,accel=xen");
>          } else {
> -            machinearg = libxl__strdup(gc, "xenfv");
> +            if (!libxl_defbool_val(b_info->u.hvm.xen_platform_pci)) {
> +                /* Switching here to the machine "pc" which does not add
> +                 * the xen-platform device instead of the default "xenfv" machine.
> +                 */
> +                machinearg = libxl__strdup(gc, "pc,accel=xen");
> +            } else {
> +                machinearg = libxl__strdup(gc, "xenfv");
> +            }
>          }
>          if (b_info->u.hvm.mmio_hole_memkb) {
>              uint64_t max_ram_below_4g = (1ULL << 32) -
> diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
> index 35038120ca..f3ef3cbdde 100644
> --- a/tools/libxl/libxl_types.idl
> +++ b/tools/libxl/libxl_types.idl
> @@ -101,6 +101,12 @@ libxl_device_model_version = Enumeration("device_model_version", [
>      (2, "QEMU_XEN"),             # Upstream based qemu-xen device model
>      ])
>  
> +libxl_device_model_machine = Enumeration("device_model_machine", [
> +    (0, "UNKNOWN"),

Shouldn't this be named DEFAULT?

> +    (1, "I440"),
> +    (2, "Q35"),
> +    ])
> +
>  libxl_console_type = Enumeration("console_type", [
>      (0, "UNKNOWN"),
>      (1, "SERIAL"),
> @@ -491,6 +497,7 @@ libxl_domain_build_info = Struct("domain_build_info",[
>      ("device_model_ssid_label", string),
>      # device_model_user is not ready for use yet
>      ("device_model_user", string),
> +    ("device_model_machine", libxl_device_model_machine),
>  
>      # extra parameters pass directly to qemu, NULL terminated
>      ("extra",            libxl_string_list),
> diff --git a/tools/xl/xl_parse.c b/tools/xl/xl_parse.c
> index f6842540ca..a7506a426b 100644
> --- a/tools/xl/xl_parse.c
> +++ b/tools/xl/xl_parse.c
> @@ -2110,6 +2110,20 @@ skip_usbdev:
>      xlu_cfg_replace_string(config, "device_model_user",
>                             &b_info->device_model_user, 0);
>  
> +    if (!xlu_cfg_get_string (config, "device_model_machine", &buf, 0)) {
> +        if (!strcmp(buf, "i440")) {
> +            b_info->device_model_machine = LIBXL_DEVICE_MODEL_MACHINE_I440;
> +        } else if (!strcmp(buf, "q35")) {
> +            b_info->device_model_machine = LIBXL_DEVICE_MODEL_MACHINE_Q35;
> +        } else {
> +            fprintf(stderr,
> +                    "Unknown device_model_machine \"%s\" specified\n", buf);
> +            exit(1);
> +        }
> +    } else {
> +        b_info->device_model_machine = LIBXL_DEVICE_MODEL_MACHINE_UNKNOWN;

That seems to be its usage. I'm not sure you should explicitly set it
in the default case (DEFAULT == 0 already).

Thanks, Roger.



* Re: [RFC PATCH 10/12] libacpi: build ACPI MCFG table if requested
  2018-03-12 18:33 ` [RFC PATCH 10/12] libacpi: build ACPI MCFG table if requested Alexey Gerasimenko
@ 2018-03-19 17:33   ` Roger Pau Monné
  2018-03-19 21:46     ` Alexey G
  2018-05-29 14:36   ` Jan Beulich
  1 sibling, 1 reply; 183+ messages in thread
From: Roger Pau Monné @ 2018-03-19 17:33 UTC (permalink / raw)
  To: Alexey Gerasimenko; +Cc: xen-devel, Wei Liu, Ian Jackson, Jan Beulich

On Tue, Mar 13, 2018 at 04:33:55AM +1000, Alexey Gerasimenko wrote:
> This adds construct_mcfg() function to libacpi which allows to build MCFG
> table for a given mmconfig_addr/mmconfig_len pair if the ACPI_HAS_MCFG
> flag was specified in acpi_config struct.
> 
> The maximum bus number is calculated from mmconfig_len using
> MCFG_SIZE_TO_NUM_BUSES macro (1MByte of MMIO space per bus).
> 
> Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
> ---
>  tools/libacpi/acpi2_0.h | 21 +++++++++++++++++++++
>  tools/libacpi/build.c   | 42 ++++++++++++++++++++++++++++++++++++++++++
>  tools/libacpi/libacpi.h |  4 ++++
>  3 files changed, 67 insertions(+)
> 
> diff --git a/tools/libacpi/acpi2_0.h b/tools/libacpi/acpi2_0.h
> index 2619ba32db..209ad1acd3 100644
> --- a/tools/libacpi/acpi2_0.h
> +++ b/tools/libacpi/acpi2_0.h
> @@ -422,6 +422,25 @@ struct acpi_20_slit {
>  };
>  
>  /*
> + * PCI Express Memory Mapped Configuration Description Table
> + */
> +struct mcfg_range_entry {
> +    uint64_t base_address;
> +    uint16_t pci_segment;
> +    uint8_t  start_pci_bus_num;
> +    uint8_t  end_pci_bus_num;
> +    uint32_t reserved;
> +};
> +
> +struct acpi_mcfg {
> +    struct acpi_header header;
> +    uint8_t reserved[8];
> +    struct mcfg_range_entry entries[1];
> +};

I would define this as:

struct acpi_10_mcfg {
    struct acpi_header header;
    uint8_t reserved[8];
    struct acpi_10_mcfg_entry {
        uint64_t base_address;
        uint16_t pci_segment;
        uint8_t  start_pci_bus;
        uint8_t  end_pci_bus;
        uint32_t reserved;
    } entries[1];
};

> +
> +#define MCFG_SIZE_TO_NUM_BUSES(size)  ((size) >> 20)

I'm not sure the following macro belongs here. This is not directly
related to ACPI.

> +
> +/*
>   * Table Signatures.
>   */
>  #define ACPI_2_0_RSDP_SIGNATURE ASCII64('R','S','D',' ','P','T','R',' ')
> @@ -435,6 +454,7 @@ struct acpi_20_slit {
>  #define ACPI_2_0_WAET_SIGNATURE ASCII32('W','A','E','T')
>  #define ACPI_2_0_SRAT_SIGNATURE ASCII32('S','R','A','T')
>  #define ACPI_2_0_SLIT_SIGNATURE ASCII32('S','L','I','T')
> +#define ACPI_MCFG_SIGNATURE     ASCII32('M','C','F','G')
>  
>  /*
>   * Table revision numbers.
> @@ -449,6 +469,7 @@ struct acpi_20_slit {
>  #define ACPI_1_0_FADT_REVISION 0x01
>  #define ACPI_2_0_SRAT_REVISION 0x01
>  #define ACPI_2_0_SLIT_REVISION 0x01
> +#define ACPI_1_0_MCFG_REVISION 0x01
>  
>  #pragma pack ()
>  
> diff --git a/tools/libacpi/build.c b/tools/libacpi/build.c
> index f9881c9604..5daf1fc5b8 100644
> --- a/tools/libacpi/build.c
> +++ b/tools/libacpi/build.c
> @@ -303,6 +303,37 @@ static struct acpi_20_slit *construct_slit(struct acpi_ctxt *ctxt,
>      return slit;
>  }
>  
> +static struct acpi_mcfg *construct_mcfg(struct acpi_ctxt *ctxt,
> +                                        const struct acpi_config *config)
> +{
> +    struct acpi_mcfg *mcfg;
> +
> +    /* Warning: this code expects that we have only one PCI segment */
> +    mcfg = ctxt->mem_ops.alloc(ctxt, sizeof(*mcfg), 16);
> +    if (!mcfg)

Coding style.

> +        return NULL;
> +
> +    memset(mcfg, 0, sizeof(*mcfg));
> +    mcfg->header.signature    = ACPI_MCFG_SIGNATURE;
> +    mcfg->header.revision     = ACPI_1_0_MCFG_REVISION;
> +    fixed_strcpy(mcfg->header.oem_id, ACPI_OEM_ID);
> +    fixed_strcpy(mcfg->header.oem_table_id, ACPI_OEM_TABLE_ID);
> +    mcfg->header.oem_revision = ACPI_OEM_REVISION;
> +    mcfg->header.creator_id   = ACPI_CREATOR_ID;
> +    mcfg->header.creator_revision = ACPI_CREATOR_REVISION;
> +    mcfg->header.length = sizeof(*mcfg);

As said before, if you want to align things, please do it for the
whole block.

> +
> +    mcfg->entries[0].base_address = config->mmconfig_addr;
> +    mcfg->entries[0].pci_segment = 0;
> +    mcfg->entries[0].start_pci_bus_num = 0;
> +    mcfg->entries[0].end_pci_bus_num =
> +        MCFG_SIZE_TO_NUM_BUSES(config->mmconfig_len) - 1;

Why not pass the start_bus and end_bus values in acpi_config at least?

> +
> +    set_checksum(mcfg, offsetof(struct acpi_header, checksum), sizeof(*mcfg));
> +
> +    return mcfg;;

Double ;;

> +}
> +
>  static int construct_passthrough_tables(struct acpi_ctxt *ctxt,
>                                          unsigned long *table_ptrs,
>                                          int nr_tables,
> @@ -350,6 +381,7 @@ static int construct_secondary_tables(struct acpi_ctxt *ctxt,
>      struct acpi_20_hpet *hpet;
>      struct acpi_20_waet *waet;
>      struct acpi_20_tcpa *tcpa;
> +    struct acpi_mcfg *mcfg;
>      unsigned char *ssdt;
>      static const uint16_t tis_signature[] = {0x0001, 0x0001, 0x0001};
>      void *lasa;
> @@ -417,6 +449,16 @@ static int construct_secondary_tables(struct acpi_ctxt *ctxt,
>          printf("CONV disabled\n");
>      }
>  
> +    /* MCFG */
> +    if ( config->table_flags & ACPI_HAS_MCFG )
> +    {
> +        mcfg = construct_mcfg(ctxt, config);
> +        if (!mcfg)

Coding style.

> +            return -1;
> +
> +        table_ptrs[nr_tables++] = ctxt->mem_ops.v2p(ctxt, mcfg);
> +    }
> +
>      /* TPM TCPA and SSDT. */
>      if ( (config->table_flags & ACPI_HAS_TCPA) &&
>           (config->tis_hdr[0] == tis_signature[0]) &&
> diff --git a/tools/libacpi/libacpi.h b/tools/libacpi/libacpi.h
> index a2efd23b0b..dd85b928e9 100644
> --- a/tools/libacpi/libacpi.h
> +++ b/tools/libacpi/libacpi.h
> @@ -36,6 +36,7 @@
>  #define ACPI_HAS_8042              (1<<13)
>  #define ACPI_HAS_CMOS_RTC          (1<<14)
>  #define ACPI_HAS_SSDT_LAPTOP_SLATE (1<<15)
> +#define ACPI_HAS_MCFG              (1<<16)
>  
>  struct xen_vmemrange;
>  struct acpi_numa {
> @@ -96,6 +97,9 @@ struct acpi_config {
>      uint32_t ioapic_base_address;
>      uint16_t pci_isa_irq_mask;
>      uint8_t ioapic_id;
> +
> +    uint64_t mmconfig_addr;
> +    uint32_t mmconfig_len;

This interface is quite limited because it only allows us to create a
single MCFG entry, but since this is not a public interface I guess it
doesn't matter that much; it can always be expanded when required.
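
For illustration, an expanded interface might describe each ECAM window
explicitly, e.g. (a sketch, not proposed code; the struct and field
names are made up):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-window descriptor that acpi_config could carry an
 * array of, instead of a single mmconfig_addr/mmconfig_len pair. */
struct mmconfig_range {
    uint64_t addr;        /* base address of the ECAM window */
    uint32_t len;         /* length in bytes, 1 MiB per covered bus */
    uint16_t pci_segment;
    uint8_t  start_bus;
    uint8_t  end_bus;
};
```

Each such range would map one-to-one onto an MCFG table entry.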

Thanks, Roger.



* Re: [RFC PATCH 11/12] hvmloader: use libacpi to build MCFG table
  2018-03-12 18:33 ` [RFC PATCH 11/12] hvmloader: use libacpi to build MCFG table Alexey Gerasimenko
  2018-03-14 17:48   ` Alexey G
@ 2018-03-19 17:49   ` Roger Pau Monné
  2018-03-19 21:20     ` Alexey G
  2018-05-29 14:46   ` Jan Beulich
  2 siblings, 1 reply; 183+ messages in thread
From: Roger Pau Monné @ 2018-03-19 17:49 UTC (permalink / raw)
  To: Alexey Gerasimenko
  Cc: xen-devel, Ian Jackson, Wei Liu, Jan Beulich, Andrew Cooper

On Tue, Mar 13, 2018 at 04:33:56AM +1000, Alexey Gerasimenko wrote:
> This patch extends hvmloader_acpi_build_tables() with code which detects
> if MMCONFIG is available -- i.e. initialized and enabled (+we're running
> on Q35), obtains its base address and size and asks libacpi to build MCFG
> table for it via setting the flag ACPI_HAS_MCFG in a manner similar
> to other optional ACPI tables building.
> 
> Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
> ---
>  tools/firmware/hvmloader/util.c | 70 +++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 70 insertions(+)
> 
> diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c
> index d8db9e3c8e..c6fc81d52a 100644
> --- a/tools/firmware/hvmloader/util.c
> +++ b/tools/firmware/hvmloader/util.c
> @@ -782,6 +782,69 @@ int get_pc_machine_type(void)
>      return machine_type;
>  }
>  
> +#define PCIEXBAR_ADDR_MASK_64MB     (~((1ULL << 26) - 1))
> +#define PCIEXBAR_ADDR_MASK_128MB    (~((1ULL << 27) - 1))
> +#define PCIEXBAR_ADDR_MASK_256MB    (~((1ULL << 28) - 1))
> +#define PCIEXBAR_LENGTH_BITS(reg)   (((reg) >> 1) & 3)
> +#define PCIEXBAREN                  1

PCIEXBAR_ENABLE maybe?

> +
> +static uint64_t mmconfig_get_base(void)
> +{
> +    uint64_t base;
> +    uint32_t reg = pci_readl(PCI_MCH_DEVFN, PCI_MCH_PCIEXBAR);
> +
> +    base = reg | (uint64_t) pci_readl(PCI_MCH_DEVFN, PCI_MCH_PCIEXBAR+4) << 32;

Please add parentheses in the above expression.

> +
> +    switch (PCIEXBAR_LENGTH_BITS(reg))
> +    {
> +    case 0:
> +        base &= PCIEXBAR_ADDR_MASK_256MB;
> +        break;
> +    case 1:
> +        base &= PCIEXBAR_ADDR_MASK_128MB;
> +        break;
> +    case 2:
> +        base &= PCIEXBAR_ADDR_MASK_64MB;
> +        break;

Missing newlines, plus this looks like it wants to use the defines
introduced in patch 7 (PCIEXBAR_{64,128,256}_BUSES). Also any reason
this patch and patch 7 cannot be put sequentially?

They are very related, and in fact I'm not sure why we need to write
this info to the device in patch 7 and then fetch it from the device
here. Isn't there an easier way to pass this information? At the end
this is all in hvmloader.

> +    case 3:

default:

> +        BUG();  /* a reserved value encountered */
> +    }
> +
> +    return base;
> +}
> +
> +static uint32_t mmconfig_get_size(void)

unsigned int or size_t?

> +{
> +    uint32_t reg = pci_readl(PCI_MCH_DEVFN, PCI_MCH_PCIEXBAR);
> +
> +    switch (PCIEXBAR_LENGTH_BITS(reg))
> +    {
> +    case 0: return MB(256);
> +    case 1: return MB(128);
> +    case 2: return MB(64);
> +    case 3:
> +        BUG();  /* a reserved value encountered */

Same comments as above about the labels and the case 3 label.

> +    }
> +
> +    return 0;
> +}
> +
> +static uint32_t mmconfig_is_enabled(void)
> +{
> +    return pci_readl(PCI_MCH_DEVFN, PCI_MCH_PCIEXBAR) & PCIEXBAREN;
> +}
> +
> +static int is_mmconfig_used(void)

bool

> +{
> +    if (get_pc_machine_type() == MACHINE_TYPE_Q35)
> +    {
> +        if (mmconfig_is_enabled() && mmconfig_get_base())

Coding style.

Also you can join the conditions:

if ( get_pc_machine_type() == MACHINE_TYPE_Q35 && mmconfig_is_enabled() &&
     mmconfig_get_base() )
     return true;

Looking at this, is it actually a valid state to have
mmconfig_is_enabled() == true and mmconfig_get_base() == 0?

> +            return 1;
> +    }
> +
> +    return 0;
> +}
> +
>  static void validate_hvm_info(struct hvm_info_table *t)
>  {
>      uint8_t *ptr = (uint8_t *)t;
> @@ -993,6 +1056,13 @@ void hvmloader_acpi_build_tables(struct acpi_config *config,
>          config->pci_hi_len = pci_hi_mem_end - pci_hi_mem_start;
>      }
>  
> +    if ( is_mmconfig_used() )
> +    {
> +        config->table_flags |= ACPI_HAS_MCFG;
> +        config->mmconfig_addr = mmconfig_get_base();
> +        config->mmconfig_len  = mmconfig_get_size();
> +    }
> +
>      s = xenstore_read("platform/generation-id", "0:0");
>      if ( s )
>      {
> -- 
> 2.11.0
> 
> 



* Re: [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring
  2018-03-19 15:58   ` Roger Pau Monné
@ 2018-03-19 19:49     ` Alexey G
  2018-03-20  8:50       ` Roger Pau Monné
  0 siblings, 1 reply; 183+ messages in thread
From: Alexey G @ 2018-03-19 19:49 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Wei Liu, Andrew Cooper, Ian Jackson, Paul Durrant, Jan Beulich,
	xen-devel

On Mon, 19 Mar 2018 15:58:02 +0000
Roger Pau Monné <roger.pau@citrix.com> wrote:

>On Tue, Mar 13, 2018 at 04:33:52AM +1000, Alexey Gerasimenko wrote:
>> Much like normal PCI BARs or other chipset-specific memory-mapped
>> resources, MMCONFIG area needs space in MMIO hole, so we must
>> allocate it manually.
>> 
>> The actual MMCONFIG size depends on a number of PCI buses available
>> which should be covered by ECAM. Possible options are 64MB, 128MB
>> and 256MB. As we are limited to the bus 0 currently, thus using
>> lowest possible setting (64MB), #defined via PCI_MAX_MCFG_BUSES in
>> hvmloader/config.h. When multiple PCI buses support for Xen will be
>> implemented, PCI_MAX_MCFG_BUSES may be changed to calculation of the
>> number of buses according to results of the PCI devices enumeration.
>> 
>> The way to allocate MMCONFIG range in MMIO hole is similar to how
>> other PCI BARs are allocated. The patch extends 'bars' structure to
>> make it universal for any arbitrary BAR type -- either IO, MMIO, ROM
>> or a chipset-specific resource.  
>
>I'm not sure this is fully correct. The IOREQ interface can
>differentiate PCI devices and forward config space accesses to
>different emulators (see IOREQ_TYPE_PCI_CONFIG). With this change you
>will forward all MCFG accesses to QEMU, which will likely be wrong if
>there are multiple PCI-device emulators for the same domain.
>
>Ie: AFAICT Xen needs to know about the MCFG emulation and detect
>accesses to it in order to forward them to the right emulators.
>
>Adding Paul who knows more about all this.

In which use cases are multiple PCI-device emulators used for a single
HVM domain? Is it a proprietary setup?

I assume it is somehow related to this code in xen-hvm.c:
                /* Fake a write to port 0xCF8 so that
                 * the config space access will target the
                 * correct device model.
                 */
                val = (1u << 31) | ((req->addr & 0x0f00) <...>
                do_outp(0xcf8, 4, val);
If yes, a similar thing can be done for IOREQ_TYPE_COPY accesses to
the emulated MMCONFIG if needed.

In the HVM+QEMU case we are not limited to merely passed-through
devices; most of the observable PCI config space devices belong to one
particular QEMU instance. This dictates the overall emulated MMCONFIG
layout for a domain, which should be in sync with what QEMU emulates
via CF8h/CFCh accesses... and between multiple device model instances
(if there are any; I'm still not sure what the multiple PCI-device
emulators you mentioned really are).

Basically, we have an emulated MMCONFIG area of 64/128/256MB size in
the MMIO hole of the guest HVM domain. (BTW, this area itself can be
considered a feature of the chipset the device model emulates.)
It can be relocated to some other place in the MMIO hole; this means
that QEMU will trap accesses to the chipset-specific emulated PCIEXBAR
register and will issue the same MMIO unmap/map calls as for any
normal emulated MMIO range.

On the other hand, it won't be easy to provide emulated MMCONFIG
translation into IOREQ_TYPE_PCI_CONFIG on the Xen side. Xen would need
to know the current emulated MMCONFIG area position and size in order
to translate (or not) accesses to it into the corresponding BDF/reg
pair (+ whether the area is enabled for decoding or not). This will
likely require introducing new hypercall(s).

The question is whether there will be any difference or benefit at all.

It's basically the same emulated MMIO range after all, but in one case
we trap accesses to it in Xen and translate them into
IOREQ_TYPE_PCI_CONFIG requests. We would have to provide some
infrastructure to let Xen know where the device model/guest expects
the MMCONFIG area to be (and its size), and the device model would
need to use this infrastructure, informing Xen of any changes. Also,
due to the nature of MMCONFIG there may be pitfalls, like the
necessity to send multiple IOREQ_TYPE_PCI_CONFIG ioreqs for a single
memory read/write operation.

In the other case, we still have an emulated MMIO range, but Xen sends
plain IOREQ_TYPE_COPY requests to QEMU, which handles them itself. In
that case, all the code needed to handle MMCONFIG accesses is
available for reuse right away (the mmcfg -> pci_* translation in
QEMU); no new functionality is required in either Xen or QEMU.
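
As a side note, the mmcfg -> BDF translation itself is mechanical in
either design. A minimal sketch of ECAM address decoding (illustrative
only, not code from Xen or QEMU; assumes the standard 1 MiB-per-bus
ECAM layout):

```c
#include <assert.h>
#include <stdint.h>

struct ecam_decode {
    uint8_t  bus, dev, fn;
    uint16_t reg;
};

/* Decode an MMIO address inside the MMCONFIG window into a BDF/register
 * pair: bits 27:20 bus, 19:15 device, 14:12 function, 11:0 register. */
static struct ecam_decode ecam_to_bdf(uint64_t mmio_addr, uint64_t mmcfg_base)
{
    uint64_t off = mmio_addr - mmcfg_base;
    struct ecam_decode d = {
        .bus = (off >> 20) & 0xff,
        .dev = (off >> 15) & 0x1f,
        .fn  = (off >> 12) & 0x7,
        .reg = off & 0xfff,
    };
    return d;
}
```

Whichever component performs this decode (Xen or QEMU) also needs to
know the current window base and whether decoding is enabled.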

>> One important new field is addr_mask, which tells which bits of the
>> base address can (should) be written. Different address types (ROM,
>> MMIO BAR, PCIEXBAR) will have different addr_mask values.
>> 
>> For every assignable BAR range we store its size, PCI device BDF
>> (devfn actually) to which it belongs, BAR type (mem/io/mem64) and
>> corresponding register offset in device PCI conf space. This way we
>> can insert MMCONFIG entry into bars array in the same manner like
>> for any other BARs. In this case, the devfn field will point to MCH
>> PCI device and bar_reg will contain PCIEXBAR register offset. It
>> will be assigned a slot in MMIO hole later in a very same way like
>> for plain PCI BARs, with respect to its size alignment.
>> 
>> Also, to reduce code complexity, all long mem/mem64 BAR flags checks
>> are replaced by simple bars[i] field probing, eg.:
>> -        if ( (bar_reg == PCI_ROM_ADDRESS) ||
>> -             ((bar_data & PCI_BASE_ADDRESS_SPACE) ==
>> -              PCI_BASE_ADDRESS_SPACE_MEMORY) )
>> +        if ( bars[i].is_mem )  
>
>This should be a separate change IMO.

OK, no problem.

>>  tools/firmware/hvmloader/config.h   |   4 ++
>>  tools/firmware/hvmloader/pci.c      | 127 ++++++++++++++++++++++++++++--------
>>  tools/firmware/hvmloader/pci_regs.h |   2 +
>>  3 files changed, 106 insertions(+), 27 deletions(-)
>> 
>> diff --git a/tools/firmware/hvmloader/config.h b/tools/firmware/hvmloader/config.h
>> index 6fde6b7b60..5443ecd804 100644
>> --- a/tools/firmware/hvmloader/config.h
>> +++ b/tools/firmware/hvmloader/config.h
>> @@ -53,10 +53,14 @@ extern uint8_t ioapic_version;
>>  #define PCI_ISA_DEVFN       0x08    /* dev 1, fn 0 */
>>  #define PCI_ISA_IRQ_MASK    0x0c20U /* ISA IRQs 5,10,11 are PCI connected */
>>  #define PCI_ICH9_LPC_DEVFN  0xf8    /* dev 31, fn 0 */
>> +#define PCI_MCH_DEVFN       0       /* bus 0, dev 0, func 0 */
>>  
>>  /* MMIO hole: Hardcoded defaults, which can be dynamically expanded. */
>>  #define PCI_MEM_END         0xfc000000
>>  
>> +/* possible values are: 64, 128, 256 */
>> +#define PCI_MAX_MCFG_BUSES  64  
>
>What the reasoning for this value? Do we know which devices need ECAM
>areas?

Yes, Xen is limited to bus 0 emulation currently; the description
states: "When multiple PCI buses support for Xen will be implemented,
PCI_MAX_MCFG_BUSES may be changed to calculation of the number of buses
according to results of the PCI devices enumeration".

I think it might be better to replace 'switch (PCI_MAX_MCFG_BUSES)'
with the real code right away, i.e. change it to

'switch (max_bus_num, aligned up to a 64/128/256 boundary)',

where max_bus_num would be set by the PCI device enumeration code in
pci_setup(). As we are limited to bus 0 currently, we'll just set it
to 0 for now, before/after the PCI device enumeration loop (which
should become multi-bus capable eventually).
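
The "aligned up to 64/128/256" step could be a tiny helper along these
lines (a sketch; max_bus_num is assumed to be the highest bus number
seen during enumeration, and the helper name is illustrative):

```c
#include <assert.h>

/* Round the highest discovered bus number up to the nearest bus count
 * that PCIEXBAR can encode: 64, 128 or 256 buses (64/128/256 MB). */
static unsigned int mcfg_num_buses(unsigned int max_bus_num)
{
    if (max_bus_num < 64)
        return 64;
    if (max_bus_num < 128)
        return 128;
    return 256; /* bus numbers are 8-bit, so 256 always suffices */
}
```

With bus 0 only, this degenerates to mcfg_num_buses(0) == 64, matching
the current PCI_MAX_MCFG_BUSES default.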

>>  #define ACPI_TIS_HDR_ADDRESS 0xFED40F00UL
>>  
>>  extern unsigned long pci_mem_start, pci_mem_end;
>> diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c
>> index 033bd20992..6de124bbd5 100644
>> --- a/tools/firmware/hvmloader/pci.c
>> +++ b/tools/firmware/hvmloader/pci.c
>> @@ -158,9 +158,10 @@ static void class_specific_pci_device_setup(uint16_t vendor_id,
>>  void pci_setup(void)
>>  {
>> -    uint8_t is_64bar, using_64bar, bar64_relocate = 0;
>> +    uint8_t is_64bar, using_64bar, bar64_relocate = 0, is_mem;
>>      uint32_t devfn, bar_reg, cmd, bar_data, bar_data_upper;
>>      uint64_t base, bar_sz, bar_sz_upper, mmio_total = 0;
>> +    uint64_t addr_mask;
>>      uint16_t vendor_id, device_id;
>>      unsigned int bar, pin, link, isa_irq;
>>      int is_running_on_q35 = 0;
>> @@ -172,10 +173,14 @@ void pci_setup(void)
>>  
>>      /* Create a list of device BARs in descending order of size. */
>>      struct bars {
>> -        uint32_t is_64bar;
>>          uint32_t devfn;
>>          uint32_t bar_reg;
>>          uint64_t bar_sz;
>> +        uint64_t addr_mask; /* which bits of the base address can be written */
>> +        uint32_t bar_data;  /* initial value - BAR flags here */
>> +        uint8_t  is_64bar;
>> +        uint8_t  is_mem;
>> +        uint8_t  padding[2];  
>
>Why are you manually adding a padding here? Also why not make this
>fields bool?

Just following the existing code style; hvmloader/pci.c for some
reason prefers uint8_t for boolean variables. OK, will change them
to bools.

>>      } *bars = (struct bars *)scratch_start;
>>      unsigned int i, nr_bars = 0;
>>      uint64_t mmio_hole_size = 0;
>> @@ -259,13 +264,21 @@ void pci_setup(void)
>>                  bar_reg = PCI_ROM_ADDRESS;
>>  
>>              bar_data = pci_readl(devfn, bar_reg);
>> +
>> +            is_mem = !!(((bar_data & PCI_BASE_ADDRESS_SPACE) ==
>> +                       PCI_BASE_ADDRESS_SPACE_MEMORY) ||
>> +                       (bar_reg == PCI_ROM_ADDRESS));
>> +
>>              if ( bar_reg != PCI_ROM_ADDRESS )
>>              {
>> -                is_64bar = !!((bar_data & (PCI_BASE_ADDRESS_SPACE |
>> -                             PCI_BASE_ADDRESS_MEM_TYPE_MASK)) ==
>> -                             (PCI_BASE_ADDRESS_SPACE_MEMORY |
>> -                             PCI_BASE_ADDRESS_MEM_TYPE_64));
>> +                is_64bar = !!(is_mem &&
>> +                             ((bar_data & PCI_BASE_ADDRESS_MEM_TYPE_MASK) ==
>> +                             PCI_BASE_ADDRESS_MEM_TYPE_64));
>> +
>>                  pci_writel(devfn, bar_reg, ~0);
>> +
>> +                addr_mask = is_mem ? PCI_BASE_ADDRESS_MEM_MASK
>> +                                   : PCI_BASE_ADDRESS_IO_MASK;
>>              }
>>              else
>>              {
>> @@ -273,28 +286,35 @@ void pci_setup(void)
>>                  pci_writel(devfn, bar_reg,
>>                             (bar_data | PCI_ROM_ADDRESS_MASK) &
>>                             ~PCI_ROM_ADDRESS_ENABLE);
>> +
>> +                addr_mask = PCI_ROM_ADDRESS_MASK;
>>              }
>> +
>>              bar_sz = pci_readl(devfn, bar_reg);
>>              pci_writel(devfn, bar_reg, bar_data);
>>  
>>              if ( bar_reg != PCI_ROM_ADDRESS )
>> -                bar_sz &= (((bar_data & PCI_BASE_ADDRESS_SPACE) ==
>> -                            PCI_BASE_ADDRESS_SPACE_MEMORY) ?
>> -                           PCI_BASE_ADDRESS_MEM_MASK :
>> -                           (PCI_BASE_ADDRESS_IO_MASK & 0xffff));
>> +                bar_sz &= is_mem ? PCI_BASE_ADDRESS_MEM_MASK :
>> +                                   (PCI_BASE_ADDRESS_IO_MASK & 0xffff);
>>              else
>>                  bar_sz &= PCI_ROM_ADDRESS_MASK;
>> -            if (is_64bar) {
>> +
>> +            if (is_64bar)  
>
>Coding style (spaces between parentheses).

OK, will add.

>> +            {
>>                  bar_data_upper = pci_readl(devfn, bar_reg + 4);
>>                  pci_writel(devfn, bar_reg + 4, ~0);
>>                  bar_sz_upper = pci_readl(devfn, bar_reg + 4);
>>                  pci_writel(devfn, bar_reg + 4, bar_data_upper);
>>                  bar_sz = (bar_sz_upper << 32) | bar_sz;
>>              }
>> +
>>              bar_sz &= ~(bar_sz - 1);
>>              if ( bar_sz == 0 )
>>                  continue;
>>  
>> +            /* leave only memtype/enable bits etc */
>> +            bar_data &= ~addr_mask;
>> +
>>              for ( i = 0; i < nr_bars; i++ )
>>                  if ( bars[i].bar_sz < bar_sz )
>>                      break;
>> @@ -302,14 +322,15 @@ void pci_setup(void)
>>              if ( i != nr_bars )
>>                  memmove(&bars[i+1], &bars[i], (nr_bars-i) * sizeof(*bars));
>> 
>> -            bars[i].is_64bar = is_64bar;
>> -            bars[i].devfn   = devfn;
>> -            bars[i].bar_reg = bar_reg;
>> -            bars[i].bar_sz  = bar_sz;
>> +            bars[i].is_64bar  = is_64bar;
>> +            bars[i].is_mem    = is_mem;
>> +            bars[i].devfn     = devfn;
>> +            bars[i].bar_reg   = bar_reg;
>> +            bars[i].bar_sz    = bar_sz;
>> +            bars[i].addr_mask = addr_mask;
>> +            bars[i].bar_data  = bar_data;
>>  
>> -            if ( ((bar_data & PCI_BASE_ADDRESS_SPACE) ==
>> -                  PCI_BASE_ADDRESS_SPACE_MEMORY) ||
>> -                 (bar_reg == PCI_ROM_ADDRESS) )
>> +            if ( is_mem )
>>                  mmio_total += bar_sz;
>>  
>>              nr_bars++;
>> @@ -339,6 +360,63 @@ void pci_setup(void)
>>          pci_writew(devfn, PCI_COMMAND, cmd);
>>      }
>>  
>> +    /*
>> +     *  Calculate MMCONFIG area size and squeeze it into the bars
>> array
>> +     *  for assigning a slot in the MMIO hole
>> +     */
>> +    if (is_running_on_q35)
>> +    {
>> +        /* disable PCIEXBAR decoding for now */
>> +        pci_writel(PCI_MCH_DEVFN, PCI_MCH_PCIEXBAR, 0);
>> +        pci_writel(PCI_MCH_DEVFN, PCI_MCH_PCIEXBAR + 4, 0);  
>
>I'm afraid I will need some context here, where is the description for
>the config space of dev 0 fn 0? I don't seem to be able to find it in
>the ich9 spec.

ICH9 is a south bridge, you need to check the NB/MCH datasheet, namely
"Intel® 3 Series Express Chipset Family".

>> +
>> +#define PCIEXBAR_64_BUSES    (2 << 1)
>> +#define PCIEXBAR_128_BUSES   (1 << 1)
>> +#define PCIEXBAR_256_BUSES   (0 << 1)
>> +#define PCIEXBAR_ENABLE      (1 << 0)  
>
>Why those strange definitions? (0 << 1)? (2 << 1) instead of (1 << 2)?

These are bitfields. It's just to show their bitfield nature,
bits[2..1] and bit0. I'll change them to something more readable
(like shifts with _BITPOS-defines) in non-RFC patches.

>> +
>> +        switch (PCI_MAX_MCFG_BUSES)
>> +        {
>> +        case 64:
>> +            bar_data = PCIEXBAR_64_BUSES | PCIEXBAR_ENABLE;
>> +            bar_sz = MB(64);
>> +            break;
>> +
>> +        case 128:
>> +            bar_data = PCIEXBAR_128_BUSES | PCIEXBAR_ENABLE;
>> +            bar_sz = MB(128);
>> +            break;
>> +
>> +        case 256:
>> +            bar_data = PCIEXBAR_256_BUSES | PCIEXBAR_ENABLE;
>> +            bar_sz = MB(256);
>> +            break;
>> +
>> +        default:
>> +            /* unsupported number of buses specified */
>> +            BUG();
>> +        }  
>
>I don't see how PCI_MAX_MCFG_BUSES should be used. Is the user
>supposed to know what value to use at compile time? What about distro
>packagers?

Answered above mostly.
We're limited to bus 0 currently. However, it is possible to change the
MMCONFIG size manually for now (eg. to 256MB, which allows covering the
whole 0-FF bus range).

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [RFC PATCH 11/12] hvmloader: use libacpi to build MCFG table
  2018-03-19 17:49   ` Roger Pau Monné
@ 2018-03-19 21:20     ` Alexey G
  2018-03-20  8:58       ` Roger Pau Monné
  2018-03-20  9:36       ` Jan Beulich
  0 siblings, 2 replies; 183+ messages in thread
From: Alexey G @ 2018-03-19 21:20 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, Ian Jackson, Wei Liu, Jan Beulich, Andrew Cooper

On Mon, 19 Mar 2018 17:49:09 +0000
Roger Pau Monné <roger.pau@citrix.com> wrote:

>On Tue, Mar 13, 2018 at 04:33:56AM +1000, Alexey Gerasimenko wrote:
>> This patch extends hvmloader_acpi_build_tables() with code which
>> detects if MMCONFIG is available -- i.e. initialized and enabled
>> (+we're running on Q35), obtains its base address and size and asks
>> libacpi to build MCFG table for it via setting the flag
>> ACPI_HAS_MCFG in a manner similar to other optional ACPI tables
>> building.
>> 
>> Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
>> ---
>>  tools/firmware/hvmloader/util.c | 70
>> +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 70
>> insertions(+)
>> 
>> diff --git a/tools/firmware/hvmloader/util.c
>> b/tools/firmware/hvmloader/util.c index d8db9e3c8e..c6fc81d52a 100644
>> --- a/tools/firmware/hvmloader/util.c
>> +++ b/tools/firmware/hvmloader/util.c
>> @@ -782,6 +782,69 @@ int get_pc_machine_type(void)
>>      return machine_type;
>>  }
>>  
>> +#define PCIEXBAR_ADDR_MASK_64MB     (~((1ULL << 26) - 1))
>> +#define PCIEXBAR_ADDR_MASK_128MB    (~((1ULL << 27) - 1))
>> +#define PCIEXBAR_ADDR_MASK_256MB    (~((1ULL << 28) - 1))
>> +#define PCIEXBAR_LENGTH_BITS(reg)   (((reg) >> 1) & 3)
>> +#define PCIEXBAREN                  1  
>
>PCIEXBAR_ENABLE maybe?

PCIEXBAREN is just the official name of this bit from the
Intel datasheet. :) OK, will rename it to PCIEXBAR_ENABLE.

>> +
>> +static uint64_t mmconfig_get_base(void)
>> +{
>> +    uint64_t base;
>> +    uint32_t reg = pci_readl(PCI_MCH_DEVFN, PCI_MCH_PCIEXBAR);
>> +
>> +    base = reg | (uint64_t) pci_readl(PCI_MCH_DEVFN,
>> PCI_MCH_PCIEXBAR+4) << 32;  
>
>Please add parentheses in the above expression.

Agree, parentheses will make the operator precedence clearer.

>> +
>> +    switch (PCIEXBAR_LENGTH_BITS(reg))
>> +    {
>> +    case 0:
>> +        base &= PCIEXBAR_ADDR_MASK_256MB;
>> +        break;
>> +    case 1:
>> +        base &= PCIEXBAR_ADDR_MASK_128MB;
>> +        break;
>> +    case 2:
>> +        base &= PCIEXBAR_ADDR_MASK_64MB;
>> +        break;  
>
>Missing newlines, plus this looks like it wants to use the defines
>introduced in patch 7 (PCIEXBAR_{64,128,256}_BUSES). Also any reason
>this patch and patch 7 cannot be put sequentially?

I think all these #defines should find a way to pci_regs.h, it seems
like an appropriate place for them.

Regarding the order of hvmloader patches -- will verify this for
the next version.

>They are very related, and in fact I'm not sure why we need to write
>this info to the device in patch 7 and then fetch it from the device
>here. Isn't there an easier way to pass this information? At the end
>this is all in hvmloader.

Well, the hvmloader_acpi_build_tables() function mostly does device
probing (using I/O instructions) and xenstore reads to collect system
information in order to discover which ACPI_HAS_* flags it should pass
to acpi_build_tables(), but using global variables to pass this kind of
information for MMCONFIG will be OK too, I think.

>> +    case 3:  
>
>default:

There is '& 3' for the switch argument, but ok I guess, it's clearer
with 'default'.

>> +        BUG();  /* a reserved value encountered */
>> +    }
>> +
>> +    return base;
>> +}
>> +
>> +static uint32_t mmconfig_get_size(void)  
>
>unsigned int or size_t?

Using types which are common in the existing code.

size_t has almost zero use in hvmloader.

unsigned int instead of uint32_t... well, uint32_t is still used more
often as a type name anyway, but I have no objections to either choice.

>> +{
>> +    uint32_t reg = pci_readl(PCI_MCH_DEVFN, PCI_MCH_PCIEXBAR);
>> +
>> +    switch (PCIEXBAR_LENGTH_BITS(reg))
>> +    {
>> +    case 0: return MB(256);
>> +    case 1: return MB(128);
>> +    case 2: return MB(64);
>> +    case 3:
>> +        BUG();  /* a reserved value encountered */  
>
>Same comments as above about the labels and the case 3 label.
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +static uint32_t mmconfig_is_enabled(void)
>> +{
>> +    return pci_readl(PCI_MCH_DEVFN, PCI_MCH_PCIEXBAR) & PCIEXBAREN;
>> +}
>> +
>> +static int is_mmconfig_used(void)  
>
>bool

OK

>> +{
>> +    if (get_pc_machine_type() == MACHINE_TYPE_Q35)
>> +    {
>> +        if (mmconfig_is_enabled() && mmconfig_get_base())  
>
>Coding style.
>
>Also you can join the conditions:
>
>if ( get_pc_machine_type() == MACHINE_TYPE_Q35 &&
>mmconfig_is_enabled() &&
>     mmconfig_get_base() )
>     return true;
>
>Looking at this, is it actually a valid state to have
>mmconfig_is_enabled() == true and mmconfig_get_base() == 0?

Yes, in theory we can have either PCIEXBAREN=0 and a valid PCIEXBAR
base, or vice versa.
Of course normally we should not encounter a situation where base=0 and
PCIEXBAREN=1; we're just covering the possible cases the register
format allows.

Regarding check merging -- ok, sure. Short-circuit evaluation should
guarantee that these registers are not touched on a different
machine.

>> +            return 1;
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>>  static void validate_hvm_info(struct hvm_info_table *t)
>>  {
>>      uint8_t *ptr = (uint8_t *)t;
>> @@ -993,6 +1056,13 @@ void hvmloader_acpi_build_tables(struct
>> acpi_config *config, config->pci_hi_len = pci_hi_mem_end -
>> pci_hi_mem_start; }
>>  
>> +    if ( is_mmconfig_used() )
>> +    {
>> +        config->table_flags |= ACPI_HAS_MCFG;
>> +        config->mmconfig_addr = mmconfig_get_base();
>> +        config->mmconfig_len  = mmconfig_get_size();
>> +    }
>> +


* Re: [RFC PATCH 10/12] libacpi: build ACPI MCFG table if requested
  2018-03-19 17:33   ` Roger Pau Monné
@ 2018-03-19 21:46     ` Alexey G
  2018-03-20  9:03       ` Roger Pau Monné
  0 siblings, 1 reply; 183+ messages in thread
From: Alexey G @ 2018-03-19 21:46 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: xen-devel, Wei Liu, Ian Jackson, Jan Beulich

On Mon, 19 Mar 2018 17:33:34 +0000
Roger Pau Monné <roger.pau@citrix.com> wrote:

>On Tue, Mar 13, 2018 at 04:33:55AM +1000, Alexey Gerasimenko wrote:
>> This adds construct_mcfg() function to libacpi which allows to build
>> MCFG table for a given mmconfig_addr/mmconfig_len pair if the
>> ACPI_HAS_MCFG flag was specified in acpi_config struct.
>> 
>> The maximum bus number is calculated from mmconfig_len using
>> MCFG_SIZE_TO_NUM_BUSES macro (1MByte of MMIO space per bus).
>> 
>> Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
>> ---
>>  tools/libacpi/acpi2_0.h | 21 +++++++++++++++++++++
>>  tools/libacpi/build.c   | 42
>> ++++++++++++++++++++++++++++++++++++++++++ tools/libacpi/libacpi.h
>> |  4 ++++ 3 files changed, 67 insertions(+)
>> 
>> diff --git a/tools/libacpi/acpi2_0.h b/tools/libacpi/acpi2_0.h
>> index 2619ba32db..209ad1acd3 100644
>> --- a/tools/libacpi/acpi2_0.h
>> +++ b/tools/libacpi/acpi2_0.h
>> @@ -422,6 +422,25 @@ struct acpi_20_slit {
>>  };
>>  
>>  /*
>> + * PCI Express Memory Mapped Configuration Description Table
>> + */
>> +struct mcfg_range_entry {
>> +    uint64_t base_address;
>> +    uint16_t pci_segment;
>> +    uint8_t  start_pci_bus_num;
>> +    uint8_t  end_pci_bus_num;
>> +    uint32_t reserved;
>> +};
>> +
>> +struct acpi_mcfg {
>> +    struct acpi_header header;
>> +    uint8_t reserved[8];
>> +    struct mcfg_range_entry entries[1];
>> +};  
>
>I would define this as:
>
>struct acpi_10_mcfg {
>    struct acpi_header header;
>    uint8_t reserved[8];
>    struct acpi_10_mcfg_entry {
>        uint64_t base_address;
>        uint16_t pci_segment;
>        uint8_t  start_pci_bus;
>        uint8_t  end_pci_bus;
>        uint32_t reserved;
>    } entries[1];
>};

Hmm, a matter of preference, but OK, will move it inside.

>> +
>> +#define MCFG_SIZE_TO_NUM_BUSES(size)  ((size) >> 20)  
>
>I'm not sure the following macro belongs here. This is not directly
>related to ACPI.

Yeah, pci_regs.h might be better I think.

>> +
>> +/*
>>   * Table Signatures.
>>   */
>>  #define ACPI_2_0_RSDP_SIGNATURE ASCII64('R','S','D','
>> ','P','T','R',' ') @@ -435,6 +454,7 @@ struct acpi_20_slit {
>>  #define ACPI_2_0_WAET_SIGNATURE ASCII32('W','A','E','T')
>>  #define ACPI_2_0_SRAT_SIGNATURE ASCII32('S','R','A','T')
>>  #define ACPI_2_0_SLIT_SIGNATURE ASCII32('S','L','I','T')
>> +#define ACPI_MCFG_SIGNATURE     ASCII32('M','C','F','G')
>>  
>>  /*
>>   * Table revision numbers.
>> @@ -449,6 +469,7 @@ struct acpi_20_slit {
>>  #define ACPI_1_0_FADT_REVISION 0x01
>>  #define ACPI_2_0_SRAT_REVISION 0x01
>>  #define ACPI_2_0_SLIT_REVISION 0x01
>> +#define ACPI_1_0_MCFG_REVISION 0x01
>>  
>>  #pragma pack ()
>>  
>> diff --git a/tools/libacpi/build.c b/tools/libacpi/build.c
>> index f9881c9604..5daf1fc5b8 100644
>> --- a/tools/libacpi/build.c
>> +++ b/tools/libacpi/build.c
>> @@ -303,6 +303,37 @@ static struct acpi_20_slit
>> *construct_slit(struct acpi_ctxt *ctxt, return slit;
>>  }
>>  
>> +static struct acpi_mcfg *construct_mcfg(struct acpi_ctxt *ctxt,
>> +                                        const struct acpi_config
>> *config) +{
>> +    struct acpi_mcfg *mcfg;
>> +
>> +    /* Warning: this code expects that we have only one PCI segment
>> */
>> +    mcfg = ctxt->mem_ops.alloc(ctxt, sizeof(*mcfg), 16);
>> +    if (!mcfg)  
>
>Coding style.

OK

>> +        return NULL;
>> +
>> +    memset(mcfg, 0, sizeof(*mcfg));
>> +    mcfg->header.signature    = ACPI_MCFG_SIGNATURE;
>> +    mcfg->header.revision     = ACPI_1_0_MCFG_REVISION;
>> +    fixed_strcpy(mcfg->header.oem_id, ACPI_OEM_ID);
>> +    fixed_strcpy(mcfg->header.oem_table_id, ACPI_OEM_TABLE_ID);
>> +    mcfg->header.oem_revision = ACPI_OEM_REVISION;
>> +    mcfg->header.creator_id   = ACPI_CREATOR_ID;
>> +    mcfg->header.creator_revision = ACPI_CREATOR_REVISION;
>> +    mcfg->header.length = sizeof(*mcfg);  
>
>As said before, if you want to align things, please do it for the
>whole block.

Agree, will reorder lines.

>> +
>> +    mcfg->entries[0].base_address = config->mmconfig_addr;
>> +    mcfg->entries[0].pci_segment = 0;
>> +    mcfg->entries[0].start_pci_bus_num = 0;
>> +    mcfg->entries[0].end_pci_bus_num =
>> +        MCFG_SIZE_TO_NUM_BUSES(config->mmconfig_len) - 1;  
>
>Why not pass the start_bus and end_bus values in acpi_config at least?

start_pci_bus_num will always be 0.

It would be kinda ugly to pass config->mmconfig_addr along with
config->end_pci_bus_num; the baseaddr+size combo looks nicer, I think.

>> +
>> +    set_checksum(mcfg, offsetof(struct acpi_header, checksum),
>> sizeof(*mcfg)); +
>> +    return mcfg;;  
>
>Double ;;

Oops, missed this one.

>> +}
>> +
>>  static int construct_passthrough_tables(struct acpi_ctxt *ctxt,
>>                                          unsigned long *table_ptrs,
>>                                          int nr_tables,
>> @@ -350,6 +381,7 @@ static int construct_secondary_tables(struct
>> acpi_ctxt *ctxt, struct acpi_20_hpet *hpet;
>>      struct acpi_20_waet *waet;
>>      struct acpi_20_tcpa *tcpa;
>> +    struct acpi_mcfg *mcfg;
>>      unsigned char *ssdt;
>>      static const uint16_t tis_signature[] = {0x0001, 0x0001,
>> 0x0001}; void *lasa;
>> @@ -417,6 +449,16 @@ static int construct_secondary_tables(struct
>> acpi_ctxt *ctxt, printf("CONV disabled\n");
>>      }
>>  
>> +    /* MCFG */
>> +    if ( config->table_flags & ACPI_HAS_MCFG )
>> +    {
>> +        mcfg = construct_mcfg(ctxt, config);
>> +        if (!mcfg)  
>
>Coding style.

Will fix.

>> +            return -1;
>> +
>> +        table_ptrs[nr_tables++] = ctxt->mem_ops.v2p(ctxt, mcfg);
>> +    }
>> +
>>      /* TPM TCPA and SSDT. */
>>      if ( (config->table_flags & ACPI_HAS_TCPA) &&
>>           (config->tis_hdr[0] == tis_signature[0]) &&
>> diff --git a/tools/libacpi/libacpi.h b/tools/libacpi/libacpi.h
>> index a2efd23b0b..dd85b928e9 100644
>> --- a/tools/libacpi/libacpi.h
>> +++ b/tools/libacpi/libacpi.h
>> @@ -36,6 +36,7 @@
>>  #define ACPI_HAS_8042              (1<<13)
>>  #define ACPI_HAS_CMOS_RTC          (1<<14)
>>  #define ACPI_HAS_SSDT_LAPTOP_SLATE (1<<15)
>> +#define ACPI_HAS_MCFG              (1<<16)
>>  
>>  struct xen_vmemrange;
>>  struct acpi_numa {
>> @@ -96,6 +97,9 @@ struct acpi_config {
>>      uint32_t ioapic_base_address;
>>      uint16_t pci_isa_irq_mask;
>>      uint8_t ioapic_id;
>> +
>> +    uint64_t mmconfig_addr;
>> +    uint32_t mmconfig_len;  
>
>This interface is quite limited because it only allows us to create a
>single MCFG entry, but since this is not a public interface I guess it
>doesn't matter that much, it can always be expanded when required.

We will be limited to a single MMCONFIG area for a long time I'm
afraid; it will be good to move away from the bus 0 limitation first.


* Re: [RFC PATCH 08/12] libxl: Q35 support (new option device_model_machine)
  2018-03-19 17:01   ` Roger Pau Monné
@ 2018-03-19 22:11     ` Alexey G
  2018-03-20  9:11       ` Roger Pau Monné
  2018-03-21 16:25       ` Wei Liu
  0 siblings, 2 replies; 183+ messages in thread
From: Alexey G @ 2018-03-19 22:11 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: xen-devel, Wei Liu, Ian Jackson

On Mon, 19 Mar 2018 17:01:18 +0000
Roger Pau Monné <roger.pau@citrix.com> wrote:

>On Tue, Mar 13, 2018 at 04:33:53AM +1000, Alexey Gerasimenko wrote:
>> Provide a new domain config option to select the emulated machine
>> type, device_model_machine. It has following possible values:
>> - "i440" - i440 emulation (default)
>> - "q35" - emulate a Q35 machine. By default, the storage interface
>> is AHCI.  
>
>I would rather name this machine_chipset or device_model_chipset.

The device_model_ prefix is a must I think -- multiple device model
related options have names starting with device_model_.

device_model_chipset... well, maybe, but we're actually specifying a
QEMU machine here. On the QEMU mailing list there was even a suggestion
to allow passing a machine version number here, like "pc-q35-2.10".
I think some opinions are needed here.

>> 
>> Note that omitting device_model_machine parameter means i440 system
>> by default, so the default behavior doesn't change for existing
>> domain config files.
>> 
>> Setting device_model_machine to "q35" sends '-machine q35,accel=xen'
>> argument to QEMU. Unlike i440, there no separate machine type
>> to enable/disable Xen platform device, it is controlled via a
>> machine  
>
>But I assume the xen_platform_pci option still works as expected?

Yes, xen_platform_pci should work as before.

>> property only. See 'libxl: Xen Platform device support for Q35'
>> patch for a detailed description.
>> 
>> Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
>> ---
>>  tools/libxl/libxl_dm.c      | 16 ++++++++++------
>>  tools/libxl/libxl_types.idl |  7 +++++++
>>  tools/xl/xl_parse.c         | 14 ++++++++++++++
>>  3 files changed, 31 insertions(+), 6 deletions(-)
>> 
>> diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
>> index a3cddce8b7..7b531050c7 100644
>> --- a/tools/libxl/libxl_dm.c
>> +++ b/tools/libxl/libxl_dm.c
>> @@ -1443,13 +1443,17 @@ static int
>> libxl__build_device_model_args_new(libxl__gc *gc,
>> flexarray_append(dm_args, b_info->extra_pv[i]); break;
>>      case LIBXL_DOMAIN_TYPE_HVM:
>> -        if (!libxl_defbool_val(b_info->u.hvm.xen_platform_pci)) {
>> -            /* Switching here to the machine "pc" which does not add
>> -             * the xen-platform device instead of the default
>> "xenfv" machine.
>> -             */
>> -            machinearg = libxl__strdup(gc, "pc,accel=xen");
>> +        if (b_info->device_model_machine ==
>> LIBXL_DEVICE_MODEL_MACHINE_Q35) {
>> +            machinearg = libxl__sprintf(gc, "q35,accel=xen");
>>          } else {
>> -            machinearg = libxl__strdup(gc, "xenfv");
>> +            if (!libxl_defbool_val(b_info->u.hvm.xen_platform_pci))
>> {
>> +                /* Switching here to the machine "pc" which does
>> not add
>> +                 * the xen-platform device instead of the default
>> "xenfv" machine.
>> +                 */
>> +                machinearg = libxl__strdup(gc, "pc,accel=xen");
>> +            } else {
>> +                machinearg = libxl__strdup(gc, "xenfv");
>> +            }
>>          }
>>          if (b_info->u.hvm.mmio_hole_memkb) {
>>              uint64_t max_ram_below_4g = (1ULL << 32) -
>> diff --git a/tools/libxl/libxl_types.idl
>> b/tools/libxl/libxl_types.idl index 35038120ca..f3ef3cbdde 100644
>> --- a/tools/libxl/libxl_types.idl
>> +++ b/tools/libxl/libxl_types.idl
>> @@ -101,6 +101,12 @@ libxl_device_model_version =
>> Enumeration("device_model_version", [ (2, "QEMU_XEN"),             #
>> Upstream based qemu-xen device model ])
>>  
>> +libxl_device_model_machine = Enumeration("device_model_machine", [
>> +    (0, "UNKNOWN"),  
>
>Shouldn't this be named DEFAULT?

"Unknown" here should be read as "unspecified", but I guess DEFAULT
will be clearer anyway.

>> +    (1, "I440"),
>> +    (2, "Q35"),
>> +    ])
>> +
>>  libxl_console_type = Enumeration("console_type", [
>>      (0, "UNKNOWN"),
>>      (1, "SERIAL"),
>> @@ -491,6 +497,7 @@ libxl_domain_build_info =
>> Struct("domain_build_info",[ ("device_model_ssid_label", string),
>>      # device_model_user is not ready for use yet
>>      ("device_model_user", string),
>> +    ("device_model_machine", libxl_device_model_machine),
>>  
>>      # extra parameters pass directly to qemu, NULL terminated
>>      ("extra",            libxl_string_list),
>> diff --git a/tools/xl/xl_parse.c b/tools/xl/xl_parse.c
>> index f6842540ca..a7506a426b 100644
>> --- a/tools/xl/xl_parse.c
>> +++ b/tools/xl/xl_parse.c
>> @@ -2110,6 +2110,20 @@ skip_usbdev:
>>      xlu_cfg_replace_string(config, "device_model_user",
>>                             &b_info->device_model_user, 0);
>>  
>> +    if (!xlu_cfg_get_string (config, "device_model_machine", &buf,
>> 0)) {
>> +        if (!strcmp(buf, "i440")) {
>> +            b_info->device_model_machine =
>> LIBXL_DEVICE_MODEL_MACHINE_I440;
>> +        } else if (!strcmp(buf, "q35")) {
>> +            b_info->device_model_machine =
>> LIBXL_DEVICE_MODEL_MACHINE_Q35;
>> +        } else {
>> +            fprintf(stderr,
>> +                    "Unknown device_model_machine \"%s\"
>> specified\n", buf);
>> +            exit(1);
>> +        }
>> +    } else {
>> +        b_info->device_model_machine =
>> LIBXL_DEVICE_MODEL_MACHINE_UNKNOWN;  
>
>That seems to be it's usage. I'm not sure you should explicitly set it
>in the default case (DEFAULT == 0 already).

Will check this, although setting the variable value explicitly is good
for code readability I think.


* Re: [RFC PATCH 06/12] hvmloader: add basic Q35 support
  2018-03-19 15:30   ` Roger Pau Monné
@ 2018-03-19 23:44     ` Alexey G
  2018-03-20  9:20       ` Roger Pau Monné
  0 siblings, 1 reply; 183+ messages in thread
From: Alexey G @ 2018-03-19 23:44 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, Ian Jackson, Wei Liu, Jan Beulich, Andrew Cooper

On Mon, 19 Mar 2018 15:30:14 +0000
Roger Pau Monné <roger.pau@citrix.com> wrote:

>On Tue, Mar 13, 2018 at 04:33:51AM +1000, Alexey Gerasimenko wrote:
>> This patch does following:
>> 
>> 1. Move PCI-device specific initialization out of pci_setup function
>> to the newly created class_specific_pci_device_setup function to
>> simplify code.
>> 
>> 2. PCI-device specific initialization extended with LPC controller
>> initialization
>> 
>> 3. Initialize PIRQA...{PIRQD, PIRQH} routing accordingly to the
>> emulated south bridge (either located on PCI_ISA_DEVFN or
>> PCI_ICH9_LPC_DEVFN).
>> 
>> Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
>> ---
>>  tools/firmware/hvmloader/config.h |   1 +
>>  tools/firmware/hvmloader/pci.c    | 162
>> ++++++++++++++++++++++++-------------- 2 files changed, 104
>> insertions(+), 59 deletions(-)
>> 
>> diff --git a/tools/firmware/hvmloader/config.h
>> b/tools/firmware/hvmloader/config.h index 6e00413f2e..6fde6b7b60
>> 100644 --- a/tools/firmware/hvmloader/config.h
>> +++ b/tools/firmware/hvmloader/config.h
>> @@ -52,6 +52,7 @@ extern uint8_t ioapic_version;
>>  
>>  #define PCI_ISA_DEVFN       0x08    /* dev 1, fn 0 */
>>  #define PCI_ISA_IRQ_MASK    0x0c20U /* ISA IRQs 5,10,11 are PCI
>> connected */ +#define PCI_ICH9_LPC_DEVFN  0xf8    /* dev 31, fn 0 */
>>  
>>  /* MMIO hole: Hardcoded defaults, which can be dynamically
>> expanded. */ #define PCI_MEM_END         0xfc000000
>> diff --git a/tools/firmware/hvmloader/pci.c
>> b/tools/firmware/hvmloader/pci.c index 0b708bf578..033bd20992 100644
>> --- a/tools/firmware/hvmloader/pci.c
>> +++ b/tools/firmware/hvmloader/pci.c
>> @@ -35,6 +35,7 @@ unsigned long pci_mem_end = PCI_MEM_END;
>>  uint64_t pci_hi_mem_start = 0, pci_hi_mem_end = 0;
>>  
>>  enum virtual_vga virtual_vga = VGA_none;
>> +uint32_t vga_devfn = 256;  
>
>uint8_t should be enough to store a devfn. Also this should be static
>maybe?

Yep, forgot 'static'. Changing uint32_t to uint8_t here will require
changing the
'    if ( vga_devfn != 256 )' condition as well -- it's a bit out of the
patch scope; a separate tiny patch would probably be better.

>>  unsigned long igd_opregion_pgbase = 0;
>>  
>>  /* Check if the specified range conflicts with any reserved device
>> memory. */ @@ -76,14 +77,93 @@ static int find_next_rmrr(uint32_t
>> base) return next_rmrr;
>>  }
>>  
>> +#define SCI_EN_IOPORT  (ACPI_PM1A_EVT_BLK_ADDRESS_V1 + 0x30)
>> +#define GBL_SMI_EN      (1 << 0)
>> +#define APMC_EN         (1 << 5)  
>
>Alignment.

Will correct.

>> +
>> +static void class_specific_pci_device_setup(uint16_t vendor_id,
>> +                                            uint16_t device_id,
>> +                                            uint8_t bus, uint8_t
>> devfn) +{
>> +    uint16_t class;
>> +
>> +    class = pci_readw(devfn, PCI_CLASS_DEVICE);
>> +
>> +    switch ( class )  
>
>switch ( pci_readw(devfn, PCI_CLASS_DEVICE) ) ?
>
>I don't see class being used elsewhere.

>Also why is vendor_id/device_id provided by the caller but not class?
>It seems kind of pointless.

'class' is not used by pci_setup(), thus moved to
class_specific_pci_device_setup().

pci_readw(devfn, PCI_CLASS_DEVICE) inside the switch condition to drop
the variable -- sure, agree.

Passing the vendor_id/device_id pair via function args avoids reading
vendor_id/device_id from PCI conf space twice -- a bit less
garbage in the polluted PCI setup debug log. It's not a big problem
really, so this can be changed to passing only the BDF to
class_specific_pci_device_setup().

>Why not fetch vendor/device from the function itself and move the
>(vendor_id == 0xffff) && (device_id == 0xffff) check inside the
>function?

Hmm, this is a part of the PCI bus enumeration, not PCI device setup.

>Also in this case I think it would be better to have a non-functional
>patch that introduces class_specific_pci_device_setup and a second
>patch that adds support for ICH9.
>
>Having code movement and new code in the same patch makes it harder to
>very what you are actually moving vs introducing.

Agree, will split these actions into separate patches for the next version.

>> +    {
>> +    case 0x0300:  
>
>All this values need to be defines documented somewhere.

Agree... although it was not me who introduced all these hardcoded PCI
class values. :) I'll change these numbers into newly added pci_regs.h
#defines in the non-functional patch.

>> +        /* If emulated VGA is found, preserve it as primary VGA. */
>> +        if ( (vendor_id == 0x1234) && (device_id == 0x1111) )
>> +        {
>> +            vga_devfn = devfn;
>> +            virtual_vga = VGA_std;
>> +        }
>> +        else if ( (vendor_id == 0x1013) && (device_id == 0xb8) )
>> +        {
>> +            vga_devfn = devfn;
>> +            virtual_vga = VGA_cirrus;
>> +        }
>> +        else if ( virtual_vga == VGA_none )
>> +        {
>> +            vga_devfn = devfn;
>> +            virtual_vga = VGA_pt;
>> +            if ( vendor_id == 0x8086 )
>> +            {
>> +                igd_opregion_pgbase =
>> mem_hole_alloc(IGD_OPREGION_PAGES);
>> +                /*
>> +                 * Write the the OpRegion offset to give the
>> opregion
>> +                 * address to the device model. The device model
>> will trap
>> +                 * and map the OpRegion at the give address.
>> +                 */
>> +                pci_writel(vga_devfn, PCI_INTEL_OPREGION,
>> +                           igd_opregion_pgbase << PAGE_SHIFT);
>> +            }
>> +        }
>> +        break;
>> +
>> +    case 0x0680:
>> +        /* PIIX4 ACPI PM. Special device with special PCI config
>> space. */
>> +        ASSERT((vendor_id == 0x8086) && (device_id == 0x7113));
>> +        pci_writew(devfn, 0x20, 0x0000); /* No smb bus IO enable */
>> +        pci_writew(devfn, 0xd2, 0x0000); /* No smb bus IO enable */
>> +        pci_writew(devfn, 0x22, 0x0000);
>> +        pci_writew(devfn, 0x3c, 0x0009); /* Hardcoded IRQ9 */
>> +        pci_writew(devfn, 0x3d, 0x0001);
>> +        pci_writel(devfn, 0x40, ACPI_PM1A_EVT_BLK_ADDRESS_V1 | 1);
>> +        pci_writeb(devfn, 0x80, 0x01); /* enable PM io space */
>> +        break;
>> +
>> +    case 0x0601:
>> +        /* LPC bridge */
>> +        if (vendor_id == 0x8086 && device_id == 0x2918)
>> +        {
>> +            pci_writeb(devfn, 0x3c, 0x09); /* Hardcoded IRQ9 */
>> +            pci_writeb(devfn, 0x3d, 0x01);
>> +            pci_writel(devfn, 0x40, ACPI_PM1A_EVT_BLK_ADDRESS_V1 |
>> 1);
>> +            pci_writeb(devfn, 0x44, 0x80); /* enable PM io space */
>> +            outl(SCI_EN_IOPORT, inl(SCI_EN_IOPORT) | GBL_SMI_EN |
>> APMC_EN);
>> +        }
>> +        break;
>> +
>> +    case 0x0101:
>> +        if ( vendor_id == 0x8086 )
>> +        {
>> +            /* Intel ICHs since PIIX3: enable IDE legacy mode. */
>> +            pci_writew(devfn, 0x40, 0x8000); /* enable IDE0 */
>> +            pci_writew(devfn, 0x42, 0x8000); /* enable IDE1 */
>> +        }
>> +        break;
>> +    }
>> +}
>> +
>>  void pci_setup(void)
>>  {
>>      uint8_t is_64bar, using_64bar, bar64_relocate = 0;
>>      uint32_t devfn, bar_reg, cmd, bar_data, bar_data_upper;
>>      uint64_t base, bar_sz, bar_sz_upper, mmio_total = 0;
>> -    uint32_t vga_devfn = 256;
>> -    uint16_t class, vendor_id, device_id;
>> +    uint16_t vendor_id, device_id;
>>      unsigned int bar, pin, link, isa_irq;
>> +    int is_running_on_q35 = 0;  
>
>bool is_running_on_q35 = (get_pc_machine_type() == MACHINE_TYPE_Q35);

OK

>>  
>>      /* Resources assignable to PCI devices via BARs. */
>>      struct resource {
>> @@ -130,13 +210,28 @@ void pci_setup(void)
>>      if ( s )
>>          mmio_hole_size = strtoll(s, NULL, 0);
>>  
>> +    /* check if we are on Q35 and set the flag if it is the case */
>> +    is_running_on_q35 = get_pc_machine_type() == MACHINE_TYPE_Q35;
>> +
>>      /* Program PCI-ISA bridge with appropriate link routes. */
>>      isa_irq = 0;
>>      for ( link = 0; link < 4; link++ )
>>      {
>>          do { isa_irq = (isa_irq + 1) & 15;
>>          } while ( !(PCI_ISA_IRQ_MASK & (1U << isa_irq)) );
>> -        pci_writeb(PCI_ISA_DEVFN, 0x60 + link, isa_irq);
>> +
>> +        if (is_running_on_q35)  
>
>Coding style.

OK

>> +        {
>> +            pci_writeb(PCI_ICH9_LPC_DEVFN, 0x60 + link, isa_irq);
>> +
>> +            /* PIRQE..PIRQH are unused */
>> +            pci_writeb(PCI_ICH9_LPC_DEVFN, 0x68 + link, 0x80);  
>
>According to the spec 0x80 is the default value for this registers, do
>you really need to write it?
>
>Is maybe QEMU not correctly setting the default value?

Won't agree here. We're initializing PIRQ[n] routing in this
fragment; it's better not to rely on any default values but simply
initialize all PIRQ[n]_ROUT registers -- this makes it explicit.

Even if it is unnecessary due to the defaults, it's more obvious to set
these registers to our own values than to force a reader to either look
up their emulation in QEMU code or read the ICH9 pdf to confirm
assumptions.

>> +        }
>> +        else
>> +        {
>> +            pci_writeb(PCI_ISA_DEVFN, 0x60 + link, isa_irq);  
>
>Is all this magic described somewhere that you can reference?

It's setting up PCI interrupt routing for PIC mode. All this
PIRQ[n]_ROUT stuff is basically needed for legacy compatibility only;
normally we deal with APIC mode (+MSIs).


* Re: [RFC PATCH 04/12] hvmloader: add ACPI enabling for Q35
  2018-03-19 13:01   ` Roger Pau Monné
@ 2018-03-19 23:59     ` Alexey G
  0 siblings, 0 replies; 183+ messages in thread
From: Alexey G @ 2018-03-19 23:59 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, Ian Jackson, Wei Liu, Jan Beulich, Andrew Cooper

On Mon, 19 Mar 2018 13:01:58 +0000
Roger Pau Monné <roger.pau@citrix.com> wrote:

>On Tue, Mar 13, 2018 at 04:33:49AM +1000, Alexey Gerasimenko wrote:
>> In order to turn on ACPI for OS, we need to write a chipset-specific
>> value to SMI_CMD register (sort of imitation of the APM->ACPI switch
>> on real systems). Modify acpi_enable_sci() function to support both
>> i440 and Q35 emulation.
>> 
>> Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
>> ---
>>  tools/firmware/hvmloader/hvmloader.c | 11 +++++++++--
>>  1 file changed, 9 insertions(+), 2 deletions(-)
>> 
>> diff --git a/tools/firmware/hvmloader/hvmloader.c
>> b/tools/firmware/hvmloader/hvmloader.c index f603f68ded..070698440e
>> 100644 --- a/tools/firmware/hvmloader/hvmloader.c
>> +++ b/tools/firmware/hvmloader/hvmloader.c
>> @@ -257,9 +257,16 @@ static const struct bios_config
>> *detect_bios(void) static void acpi_enable_sci(void)
>>  {
>>      uint8_t pm1a_cnt_val;
>> +    uint8_t acpi_enable_val;
>>  
>> -#define PIIX4_SMI_CMD_IOPORT 0xb2
>> +#define SMI_CMD_IOPORT       0xb2
>>  #define PIIX4_ACPI_ENABLE    0xf1
>> +#define ICH9_ACPI_ENABLE     0x02
>> +
>> +    if (get_pc_machine_type() == MACHINE_TYPE_Q35)
>> +        acpi_enable_val = ICH9_ACPI_ENABLE;
>> +    else
>> +        acpi_enable_val = PIIX4_ACPI_ENABLE;  
>
>Coding style, but I would rather:
>
>switch ( get_pc_machine_type() )
>{
>case MACHINE_TYPE_Q35:
>...
>case MACHINE_TYPE_I440:
>...
>default:
>BUG();
>}

Agreed, that gives better code maintainability.

>I think storing the machine type in a global variable is better than
>calling get_pc_machine_type each time.

OK, will switch to it.


* Re: [RFC PATCH 05/12] hvmloader: add Q35 DSDT table loading
  2018-03-19 14:45   ` Roger Pau Monné
@ 2018-03-20  0:15     ` Alexey G
  0 siblings, 0 replies; 183+ messages in thread
From: Alexey G @ 2018-03-20  0:15 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, Ian Jackson, Wei Liu, Jan Beulich, Andrew Cooper

On Mon, 19 Mar 2018 14:45:29 +0000
Roger Pau Monné <roger.pau@citrix.com> wrote:

>On Tue, Mar 13, 2018 at 04:33:50AM +1000, Alexey Gerasimenko wrote:
>> Allows to select Q35 DSDT table in hvmloader_acpi_build_tables().
>> Function get_pc_machine_type() is used to select a proper table
>> (i440/q35).
>> 
>> As we are bound to the qemu-xen device model for Q35, no need
>> to initialize config->dsdt_15cpu/config->dsdt_15cpu_len fields.
>> 
>> Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
>> ---
>>  tools/firmware/hvmloader/util.c | 13 +++++++++++--
>>  tools/firmware/hvmloader/util.h |  2 ++
>>  2 files changed, 13 insertions(+), 2 deletions(-)
>> 
>> diff --git a/tools/firmware/hvmloader/util.c
>> b/tools/firmware/hvmloader/util.c index 5739a87628..d8db9e3c8e 100644
>> --- a/tools/firmware/hvmloader/util.c
>> +++ b/tools/firmware/hvmloader/util.c
>> @@ -955,8 +955,17 @@ void hvmloader_acpi_build_tables(struct
>> acpi_config *config, }
>>      else if ( !strncmp(s, "qemu_xen", 9) )
>>      {
>> -        config->dsdt_anycpu = dsdt_anycpu_qemu_xen;
>> -        config->dsdt_anycpu_len = dsdt_anycpu_qemu_xen_len;
>> +        if (get_pc_machine_type() == MACHINE_TYPE_Q35)  
>
>Coding style (missing spaces between parentheses), and I would prefer
>a switch here.

OK, will change to a switch.

>IMO you should add a BUG_ON(Q35) in the qemu_xen_traditional condition
>above this one..

AFAIR qemu-traditional knows nothing about Q35 emulation, so we won't
ever encounter a Q35 chipset while using qemu-traditional.

>> +        {
>> +            config->dsdt_anycpu = dsdt_q35_anycpu_qemu_xen;
>> +            config->dsdt_anycpu_len = dsdt_q35_anycpu_qemu_xen_len;
>> +        }
>> +        else
>> +        {
>> +            config->dsdt_anycpu = dsdt_anycpu_qemu_xen;
>> +            config->dsdt_anycpu_len = dsdt_anycpu_qemu_xen_len;
>> +        }
>> +
>>          config->dsdt_15cpu = NULL;
>>          config->dsdt_15cpu_len = 0;
>>      }
>> diff --git a/tools/firmware/hvmloader/util.h
>> b/tools/firmware/hvmloader/util.h index 7c77bedb00..fd2d885c96 100644
>> --- a/tools/firmware/hvmloader/util.h
>> +++ b/tools/firmware/hvmloader/util.h
>> @@ -288,7 +288,9 @@ bool check_overlap(uint64_t start, uint64_t size,
>>                     uint64_t reserved_start, uint64_t reserved_size);
>>  
>>  extern const unsigned char dsdt_anycpu_qemu_xen[], dsdt_anycpu[],
>> dsdt_15cpu[]; +extern const unsigned char dsdt_q35_anycpu_qemu_xen[];
>>  extern const int dsdt_anycpu_qemu_xen_len, dsdt_anycpu_len,
>> dsdt_15cpu_len; +extern const int dsdt_q35_anycpu_qemu_xen_len;  
>
>Since you are adding this, maybe unsigned int? (or size_t?)

No problem, ok.


* Re: [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring
  2018-03-19 19:49     ` Alexey G
@ 2018-03-20  8:50       ` Roger Pau Monné
  2018-03-20  9:25         ` Paul Durrant
  2018-03-21  0:58         ` Alexey G
  0 siblings, 2 replies; 183+ messages in thread
From: Roger Pau Monné @ 2018-03-20  8:50 UTC (permalink / raw)
  To: Alexey G
  Cc: Wei Liu, Andrew Cooper, Ian Jackson, Paul Durrant, Jan Beulich,
	xen-devel

On Tue, Mar 20, 2018 at 05:49:22AM +1000, Alexey G wrote:
> On Mon, 19 Mar 2018 15:58:02 +0000
> Roger Pau Monné <roger.pau@citrix.com> wrote:
> 
> >On Tue, Mar 13, 2018 at 04:33:52AM +1000, Alexey Gerasimenko wrote:
> >> Much like normal PCI BARs or other chipset-specific memory-mapped
> >> resources, MMCONFIG area needs space in MMIO hole, so we must
> >> allocate it manually.
> >> 
> >> The actual MMCONFIG size depends on a number of PCI buses available
> >> which should be covered by ECAM. Possible options are 64MB, 128MB
> >> and 256MB. As we are limited to the bus 0 currently, thus using
> >> lowest possible setting (64MB), #defined via PCI_MAX_MCFG_BUSES in
> >> hvmloader/config.h. When multiple PCI buses support for Xen will be
> >> implemented, PCI_MAX_MCFG_BUSES may be changed to calculation of the
> >> number of buses according to results of the PCI devices enumeration.
> >> 
> >> The way to allocate MMCONFIG range in MMIO hole is similar to how
> >> other PCI BARs are allocated. The patch extends 'bars' structure to
> >> make it universal for any arbitrary BAR type -- either IO, MMIO, ROM
> >> or a chipset-specific resource.  
> >
> >I'm not sure this is fully correct. The IOREQ interface can
> >differentiate PCI devices and forward config space accesses to
> >different emulators (see IOREQ_TYPE_PCI_CONFIG). With this change you
> >will forward all MCFG accesses to QEMU, which will likely be wrong if
> >there are multiple PCI-device emulators for the same domain.
> >
> >Ie: AFAICT Xen needs to know about the MCFG emulation and detect
> >accesses to it in order to forward them to the right emulators.
> >
> >Adding Paul who knows more about all this.
> 
> In which use cases are multiple PCI-device emulators used for a single
> HVM domain? Is it a proprietary setup?

Likely. I think XenGT might be using it. It's a feature of the IOREQ
implementation in Xen.

Traditional PCI config space accesses are not handled as plain IO port
accesses.
The IOREQ code in Xen detects accesses to ports 0xcf8/0xcfc and IOREQ
servers can register devices they would like to receive configuration
space accesses for. QEMU is already making use of this, see for
example xen_map_pcidev in the QEMU code.

By treating MCFG accesses as MMIO you are bypassing the IOREQ PCI
layer, and thus an IOREQ server could register a PCI device and only
receive PCI configuration accesses from the IO port space, while MCFG
accesses would be forwarded somewhere else.

I think you need to make the IOREQ code aware of the MCFG area and
XEN_DMOP_IO_RANGE_PCI needs to forward both IO space and MCFG accesses
to the right IOREQ server.

> I assume it is somehow related to this code in xen-hvm.c:
>                 /* Fake a write to port 0xCF8 so that
>                  * the config space access will target the
>                  * correct device model.
>                  */
>                 val = (1u << 31) | ((req->addr & 0x0f00) <...>
>                 do_outp(0xcf8, 4, val);
> if yes, similar thing can be made for IOREQ_TYPE_COPY accesses to
> the emulated MMCONFIG if needed.

I have to admit I don't know that much about QEMU, and I have no idea
what the chunk above is supposed to accomplish.

> 
> In HVM+QEMU case we are not limited to merely passed through devices,
> most of the observable PCI config space devices belong to one particular
> QEMU instance. This dictates the overall emulated MMCONFIG layout
> for a domain which should be in sync to what QEMU emulates via CF8h/CFCh
> accesses... and between multiple device model instances (if there are
> any, still not sure what multiple PCI-device emulators you mentioned
> really are).

In newer versions of Xen (>4.5 IIRC, Paul knows more), QEMU doesn't
directly trap accesses to the 0xcf8/0xcfc IO ports, it's Xen instead
the one that detects and decodes such accesses, and then forwards them
to the IOREQ server that has been registered to handle them.

You cannot simply forward all MCFG accesses to QEMU as MMIO accesses,
Xen needs to decode them and they need to be handled as
IOREQ_TYPE_PCI_CONFIG requests, not IOREQ_TYPE_COPY IMO.

> 
> Basically, we have an emulated MMCONFIG area of 64/128/256MB size in
> the MMIO hole of the guest HVM domain. (BTW, this area itself can be
> considered a feature of the chipset the device model emulates.)
> It can be relocated to some other place in the MMIO hole; this means that
> QEMU will trap accesses to the chipset-specific emulated
> PCIEXBAR register and will issue the same MMIO unmap/map calls as for
> any normal emulated MMIO range.
> 
> On the other hand, it won't be easy to provide emulated MMCONFIG
> translation into IOREQ_TYPE_PCI_CONFIG from Xen side. Xen should know
> current emulated MMCONFIG area position and size in order to translate
> (or not) accesses to it into corresponding BDF/reg pair (+whether that
> area is enabled for decoding or not). This will likely require to
> introduce new hypercall(s).

Yes, you will have to introduce new hypercalls to tell Xen the
position/size of the MCFG hole. Likely you want to tell it the start
address, the pci segment, start bus and end bus. I know pci segment
and start bus are always going to be 0 ATM, but it would be nice to
have a complete interface.

By your comment above I think you want an interface that allows you to
remove/add those MCFG areas at runtime.

> The question is if there will be any difference or benefit at all.

IMO it's not about benefits or differences, it's about correctness.
Xen currently detects accesses to the PCI configuration space from IO
ports and for consistency it should also detect accesses to this space
by any other means.

> It's basically the same emulated MMIO range after all, but in one case
> we trap accesses to it in Xen and translate them into
> IOREQ_TYPE_PCI_CONFIG requests.
> We have to provide some infrastructure to let Xen know where the device 
> model/guest expects to use the MMCONFIG area (and its size). The
> device model will need to use this infrastructure, informing Xen of
> any changes. Also, due to MMCONFIG nature there might be some pitfalls
> like a necessity to send multiple IOREQ_TYPE_PCI_CONFIG ioreqs caused by
> a single memory read/write operation.

This seems all fine. Why do you expect MCFG access to create multiple
IOREQ_TYPE_PCI_CONFIG but not multiple IOREQ_TYPE_COPY?

> In another case, we still have an emulated MMIO range, but Xen will send
> plain IOREQ_TYPE_COPY requests to QEMU which it handles itself.
> In such case, all code to work with MMCONFIG accesses is available for
> reuse right away (mmcfg -> pci_* translation in QEMU), no new
> functionality required neither in Xen or QEMU.

As I tried to argument above, I think this is not correct, but I would
also like that Paul expresses his opinion as the IOREQ maintainer.

> >>  tools/firmware/hvmloader/config.h   |   4 ++
> >>  tools/firmware/hvmloader/pci.c      | 127
> >> ++++++++++++++++++++++++++++--------
> >> tools/firmware/hvmloader/pci_regs.h |   2 + 3 files changed, 106
> >> insertions(+), 27 deletions(-)
> >> 
> >> diff --git a/tools/firmware/hvmloader/config.h
> >> b/tools/firmware/hvmloader/config.h index 6fde6b7b60..5443ecd804
> >> 100644 --- a/tools/firmware/hvmloader/config.h
> >> +++ b/tools/firmware/hvmloader/config.h
> >> @@ -53,10 +53,14 @@ extern uint8_t ioapic_version;
> >>  #define PCI_ISA_DEVFN       0x08    /* dev 1, fn 0 */
> >>  #define PCI_ISA_IRQ_MASK    0x0c20U /* ISA IRQs 5,10,11 are PCI
> >> connected */ #define PCI_ICH9_LPC_DEVFN  0xf8    /* dev 31, fn 0 */
> >> +#define PCI_MCH_DEVFN       0       /* bus 0, dev 0, func 0 */
> >>  
> >>  /* MMIO hole: Hardcoded defaults, which can be dynamically
> >> expanded. */ #define PCI_MEM_END         0xfc000000
> >>  
> >> +/* possible values are: 64, 128, 256 */
> >> +#define PCI_MAX_MCFG_BUSES  64  
> >
> >What the reasoning for this value? Do we know which devices need ECAM
> >areas?
> 
> Yes, Xen is limited to bus 0 emulation currently, the description
> states "When multiple PCI buses support for Xen will be implemented,
> PCI_MAX_MCFG_BUSES may be changed to calculation of the number of buses
> according to results of the PCI devices enumeration".
> 
> I think it might be better to replace 'switch (PCI_MAX_MCFG_BUSES)'
> with the real code right away, i.e. change it to
> 
> 'switch (max_bus_num, aligned up to 64/128/256 boundary)',
> where max_bus_num should be set in PCI device enumeration code in
> pci_setup(). As we are limited to bus 0 currently, we'll just set it
> to 0 for now, before/after the PCI device enumeration loop (which should
> become multi-bus capable eventually).

I guess this is all pretty much hardcoded to bus 0 in several places,
so I'm not sure it's worth to add PCI_MAX_MCFG_BUSES. IMO if something
like this should be added it should be PCI_MAX_BUSES, and several
places should be changed to make use of it. Or ideally we should find
a way to detect this at runtime, without needing any hardcoded defines.

I think it would be good if you can add a note comment describing the
different MCFG sizes supported by the Q35 chipset (64/128/256).

Thanks, Roger.


* Re: [RFC PATCH 11/12] hvmloader: use libacpi to build MCFG table
  2018-03-19 21:20     ` Alexey G
@ 2018-03-20  8:58       ` Roger Pau Monné
  2018-03-20  9:36       ` Jan Beulich
  1 sibling, 0 replies; 183+ messages in thread
From: Roger Pau Monné @ 2018-03-20  8:58 UTC (permalink / raw)
  To: Alexey G; +Cc: xen-devel, Ian Jackson, Wei Liu, Jan Beulich, Andrew Cooper

On Tue, Mar 20, 2018 at 07:20:53AM +1000, Alexey G wrote:
> On Mon, 19 Mar 2018 17:49:09 +0000
> Roger Pau Monné <roger.pau@citrix.com> wrote:
> 
> >On Tue, Mar 13, 2018 at 04:33:56AM +1000, Alexey Gerasimenko wrote:
> >> This patch extends hvmloader_acpi_build_tables() with code which
> >> detects if MMCONFIG is available -- i.e. initialized and enabled
> >> (+we're running on Q35), obtains its base address and size and asks
> >> libacpi to build MCFG table for it via setting the flag
> >> ACPI_HAS_MCFG in a manner similar to other optional ACPI tables
> >> building.
> >> 
> >> Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
> >> ---
> >>  tools/firmware/hvmloader/util.c | 70
> >> +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 70
> >> insertions(+)
> >> 
> >> diff --git a/tools/firmware/hvmloader/util.c
> >> b/tools/firmware/hvmloader/util.c index d8db9e3c8e..c6fc81d52a 100644
> >> --- a/tools/firmware/hvmloader/util.c
> >> +++ b/tools/firmware/hvmloader/util.c
> >> @@ -782,6 +782,69 @@ int get_pc_machine_type(void)
> >>      return machine_type;
> >>  }
> >>  
> >> +#define PCIEXBAR_ADDR_MASK_64MB     (~((1ULL << 26) - 1))
> >> +#define PCIEXBAR_ADDR_MASK_128MB    (~((1ULL << 27) - 1))
> >> +#define PCIEXBAR_ADDR_MASK_256MB    (~((1ULL << 28) - 1))
> >> +#define PCIEXBAR_LENGTH_BITS(reg)   (((reg) >> 1) & 3)
> >> +#define PCIEXBAREN                  1  
> >
> >PCIEXBAR_ENABLE maybe?
> 
> PCIEXBAREN is just the official name of this bit in the
> Intel datasheet. :) OK, will rename it to PCIEXBAR_ENABLE.

Oh, if that's the name on the spec then leave it as-is. It's always
best to be able to search directly on the spec.

> >> +
> >> +static uint64_t mmconfig_get_base(void)
> >> +{
> >> +    uint64_t base;
> >> +    uint32_t reg = pci_readl(PCI_MCH_DEVFN, PCI_MCH_PCIEXBAR);
> >> +
> >> +    base = reg | (uint64_t) pci_readl(PCI_MCH_DEVFN,
> >> PCI_MCH_PCIEXBAR+4) << 32;  
> >
> >Please add parentheses in the above expression.
> 
> Agree, parentheses will make the operator precedence clearer.
> 
> >> +
> >> +    switch (PCIEXBAR_LENGTH_BITS(reg))
> >> +    {
> >> +    case 0:
> >> +        base &= PCIEXBAR_ADDR_MASK_256MB;
> >> +        break;
> >> +    case 1:
> >> +        base &= PCIEXBAR_ADDR_MASK_128MB;
> >> +        break;
> >> +    case 2:
> >> +        base &= PCIEXBAR_ADDR_MASK_64MB;
> >> +        break;  
> >
> >Missing newlines, plus this looks like it wants to use the defines
> >introduced in patch 7 (PCIEXBAR_{64,128,256}_BUSES). Also any reason
> >this patch and patch 7 cannot be put sequentially?
> 
> I think all these #defines should find a way to pci_regs.h, it seems
> like an appropriate place for them.

Hm, pci_regs.h seems to contain the generic PCI registers. Those
should maybe live in a q35.h header, since it's very device specific
AFAICT.

> Regarding the order of hvmloader patches -- will verify this for
> the next version.
> 
> >They are very related, and in fact I'm not sure why we need to write
> >this info to the device in patch 7 and then fetch it from the device
> >here. Isn't there an easier way to pass this information? At the end
> >this is all in hvmloader.
> 
> Well, the hvmloader_acpi_build_tables() function mostly does device
> probing (using I/O instruction) and xenstore reads to collect system
> information in order to discover which ACPI_HAS_* flags it should pass
> to acpi_build_tables(), but using global variables to pass this kind of
> information for MMCONFIG will be OK too I think.

It was just a suggestion, it seems kind of cumbersome to write
something to a register and then fetch it afterwards, when it's all
done in the same binary.

> >> +    case 3:  
> >
> >default:
> 
> There is '& 3' for the switch argument, but ok I guess, it's clearer
> with 'default'.
> 
> >> +        BUG();  /* a reserved value encountered */
> >> +    }
> >> +
> >> +    return base;
> >> +}
> >> +
> >> +static uint32_t mmconfig_get_size(void)  
> >
> >unsigned int or size_t?
> 
> Using types which are common to the existing code.
> 
> size_t has almost zero use in hvmloader.

If it's available I would rather use it.

> >> +{
> >> +    if (get_pc_machine_type() == MACHINE_TYPE_Q35)
> >> +    {
> >> +        if (mmconfig_is_enabled() && mmconfig_get_base())  
> >
> >Coding style.
> >
> >Also you can join the conditions:
> >
> >if ( get_pc_machine_type() == MACHINE_TYPE_Q35 &&
> >mmconfig_is_enabled() &&
> >     mmconfig_get_base() )
> >     return true;
> >
> >Looking at this, is it actually a valid state to have
> >mmconfig_is_enabled() == true and mmconfig_get_base() == 0?
> 
> Yes, in theory we can have either PCIEXBAREN=0 and a valid PCIEXBAR
> base, or vice versa.
> Of course normally we should not encounter a situation where base=0 and
> PCIEXBAREN=1; I'm just covering the possible cases which the register
> format allows.

But those registers are set by hvmloader, and I don't think hvmloader
will ever set PCIEXBAREN == 1 and PCIEXBAR base == 0?

> Regarding check merging -- ok, sure. Short-circuit evaluation should
> guarantee that these registers are not touched on a different
> machine.

Yes, if you first check for the chipset type.

Thanks, Roger.


* Re: [RFC PATCH 10/12] libacpi: build ACPI MCFG table if requested
  2018-03-19 21:46     ` Alexey G
@ 2018-03-20  9:03       ` Roger Pau Monné
  2018-03-20 21:06         ` Alexey G
  0 siblings, 1 reply; 183+ messages in thread
From: Roger Pau Monné @ 2018-03-20  9:03 UTC (permalink / raw)
  To: Alexey G; +Cc: xen-devel, Wei Liu, Ian Jackson, Jan Beulich

On Tue, Mar 20, 2018 at 07:46:04AM +1000, Alexey G wrote:
> On Mon, 19 Mar 2018 17:33:34 +0000
> Roger Pau Monné <roger.pau@citrix.com> wrote:
> 
> >On Tue, Mar 13, 2018 at 04:33:55AM +1000, Alexey Gerasimenko wrote:
> >> This adds construct_mcfg() function to libacpi which allows to build
> >> MCFG table for a given mmconfig_addr/mmconfig_len pair if the
> >> ACPI_HAS_MCFG flag was specified in acpi_config struct.
> >> 
> >> The maximum bus number is calculated from mmconfig_len using
> >> MCFG_SIZE_TO_NUM_BUSES macro (1MByte of MMIO space per bus).
> >> 
> >> Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
> >> ---
> >>  tools/libacpi/acpi2_0.h | 21 +++++++++++++++++++++
> >>  tools/libacpi/build.c   | 42
> >> ++++++++++++++++++++++++++++++++++++++++++ tools/libacpi/libacpi.h
> >> |  4 ++++ 3 files changed, 67 insertions(+)
> >> 
> >> diff --git a/tools/libacpi/acpi2_0.h b/tools/libacpi/acpi2_0.h
> >> index 2619ba32db..209ad1acd3 100644
> >> --- a/tools/libacpi/acpi2_0.h
> >> +++ b/tools/libacpi/acpi2_0.h
> >> @@ -422,6 +422,25 @@ struct acpi_20_slit {
> >>  };
> >>  
> >>  /*
> >> + * PCI Express Memory Mapped Configuration Description Table
> >> + */
> >> +struct mcfg_range_entry {
> >> +    uint64_t base_address;
> >> +    uint16_t pci_segment;
> >> +    uint8_t  start_pci_bus_num;
> >> +    uint8_t  end_pci_bus_num;
> >> +    uint32_t reserved;
> >> +};
> >> +
> >> +struct acpi_mcfg {
> >> +    struct acpi_header header;
> >> +    uint8_t reserved[8];
> >> +    struct mcfg_range_entry entries[1];
> >> +};  
> >
> >I would define this as:
> >
> >struct acpi_10_mcfg {
> >    struct acpi_header header;
> >    uint8_t reserved[8];
> >    struct acpi_10_mcfg_entry {
> >        uint64_t base_address;
> >        uint16_t pci_segment;
> >        uint8_t  start_pci_bus;
> >        uint8_t  end_pci_bus;
> >        uint32_t reserved;
> >    } entries[1];
> >};
> 
> Hmm, a matter of preference, but OK, will move it inside.

Note the name change also (acpi_10_mcfg). Also I think you can drop
the acpi_10_mcfg_entry name and just use an anonymous struct.

> >> +
> >> +    mcfg->entries[0].base_address = config->mmconfig_addr;
> >> +    mcfg->entries[0].pci_segment = 0;
> >> +    mcfg->entries[0].start_pci_bus_num = 0;
> >> +    mcfg->entries[0].end_pci_bus_num =
> >> +        MCFG_SIZE_TO_NUM_BUSES(config->mmconfig_len) - 1;  
> >
> >Why not pass the start_bus and end_bus values in acpi_config at least?
> 
> start_pci_bus_num will always be 0.
> 
> It will be kinda ugly to pass config->mmconfig_addr along with
> config->end_pci_bus_num; a baseaddr+size combo looks nicer I think.

I'm not going to insist, but ACPI doesn't really care about the size,
it just needs to know the start and end. Seems pointless to write a
value here that later libacpi needs to convert to the value it
actually needs. Also start/end buses are uint8_t, size is uint32_t.

Thanks, Roger.


* Re: [RFC PATCH 08/12] libxl: Q35 support (new option device_model_machine)
  2018-03-19 22:11     ` Alexey G
@ 2018-03-20  9:11       ` Roger Pau Monné
  2018-03-21 16:27         ` Wei Liu
  2018-03-21 16:25       ` Wei Liu
  1 sibling, 1 reply; 183+ messages in thread
From: Roger Pau Monné @ 2018-03-20  9:11 UTC (permalink / raw)
  To: Alexey G; +Cc: xen-devel, Wei Liu, Ian Jackson

On Tue, Mar 20, 2018 at 08:11:49AM +1000, Alexey G wrote:
> On Mon, 19 Mar 2018 17:01:18 +0000
> Roger Pau Monné <roger.pau@citrix.com> wrote:
> 
> >On Tue, Mar 13, 2018 at 04:33:53AM +1000, Alexey Gerasimenko wrote:
> >> Provide a new domain config option to select the emulated machine
> >> type, device_model_machine. It has following possible values:
> >> - "i440" - i440 emulation (default)
> >> - "q35" - emulate a Q35 machine. By default, the storage interface
> >> is AHCI.  
> >
> >I would rather name this machine_chipset or device_model_chipset.
> 
> device_model_ prefix is a must I think -- multiple device model related
> options have names starting with device_model_.
> 
> device_model_chipset... well, maybe, but we're actually specifying a
> QEMU machine here. In QEMU mailing list there was even a suggestion
> to allow to pass a machine version number here, like "pc-q35-2.10".
> I think some opinions are needed here.

I'm not sure what a 'machine' is in QEMU speak, but in my mind I would
consider PC a machine (vs ARM for example).

I think 'chipset' is clearer, but again others should express their
opinion.

Thanks, Roger.


* Re: [RFC PATCH 06/12] hvmloader: add basic Q35 support
  2018-03-19 23:44     ` Alexey G
@ 2018-03-20  9:20       ` Roger Pau Monné
  2018-03-20 21:23         ` Alexey G
  0 siblings, 1 reply; 183+ messages in thread
From: Roger Pau Monné @ 2018-03-20  9:20 UTC (permalink / raw)
  To: Alexey G; +Cc: xen-devel, Ian Jackson, Wei Liu, Jan Beulich, Andrew Cooper

On Tue, Mar 20, 2018 at 09:44:33AM +1000, Alexey G wrote:
> On Mon, 19 Mar 2018 15:30:14 +0000
> Roger Pau Monné <roger.pau@citrix.com> wrote:
> 
> >On Tue, Mar 13, 2018 at 04:33:51AM +1000, Alexey Gerasimenko wrote:
> >> +    {
> >> +    case 0x0300:  
> >
> >All this values need to be defines documented somewhere.
> 
> Agree... although it was not me who introduced all these hardcoded PCI
> class values. :) I'll change these numbers into newly added pci_regs.h
> #defines in the non-functional patch.

Right. I've realized that later. If you place this code movement in a
separate patch without any other modifications I won't complain about
the lack of defines (although it would be nice to have them :)).

> >> +        {
> >> +            pci_writeb(PCI_ICH9_LPC_DEVFN, 0x60 + link, isa_irq);
> >> +
> >> +            /* PIRQE..PIRQH are unused */
> >> +            pci_writeb(PCI_ICH9_LPC_DEVFN, 0x68 + link, 0x80);  
> >
> >According to the spec 0x80 is the default value for this registers, do
> >you really need to write it?
> >
> >Is maybe QEMU not correctly setting the default value?
> 
> Won't agree here. We're initializing PIRQ[n] routing in this
> fragment, it's better not to rely on any values but simply initialize
> all PIRQ[n]_ROUT registers, this makes it explicit.
> 
> Even if it is unnecessary due to defaults it's more obvious to set
> these registers to our own values than to force a reader to either look
> up their emulation in QEMU code or read the ICH9 pdf to confirm
> assumptions.

But if you start doing this, you should do it for all the registers.
Why is PIRQE..PIRQH routing special that you need to re-write the
default value? But not SIRQ_CNTL for example?

I think a comment noting that the default value for those registers is
what we expect (0x80 - Interrupt Routing Disabled) would be better.

Thanks, Roger.


* Re: [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring
  2018-03-20  8:50       ` Roger Pau Monné
@ 2018-03-20  9:25         ` Paul Durrant
  2018-03-21  0:58         ` Alexey G
  1 sibling, 0 replies; 183+ messages in thread
From: Paul Durrant @ 2018-03-20  9:25 UTC (permalink / raw)
  To: Roger Pau Monne, Alexey G
  Cc: xen-devel, Ian Jackson, Wei Liu, Jan Beulich, Andrew Cooper

> -----Original Message-----
> From: Roger Pau Monne
> Sent: 20 March 2018 08:51
> To: Alexey G <x1917x@gmail.com>
> Cc: xen-devel@lists.xenproject.org; Andrew Cooper
> <Andrew.Cooper3@citrix.com>; Ian Jackson <Ian.Jackson@citrix.com>; Jan
> Beulich <jbeulich@suse.com>; Wei Liu <wei.liu2@citrix.com>; Paul Durrant
> <Paul.Durrant@citrix.com>
> Subject: Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG
> area in the MMIO hole + minor code refactoring
> 
> On Tue, Mar 20, 2018 at 05:49:22AM +1000, Alexey G wrote:
> > On Mon, 19 Mar 2018 15:58:02 +0000
> > Roger Pau Monné <roger.pau@citrix.com> wrote:
> >
> > >On Tue, Mar 13, 2018 at 04:33:52AM +1000, Alexey Gerasimenko wrote:
> > >> Much like normal PCI BARs or other chipset-specific memory-mapped
> > >> resources, MMCONFIG area needs space in MMIO hole, so we must
> > >> allocate it manually.
> > >>
> > >> The actual MMCONFIG size depends on a number of PCI buses available
> > >> which should be covered by ECAM. Possible options are 64MB, 128MB
> > >> and 256MB. As we are limited to the bus 0 currently, thus using
> > >> lowest possible setting (64MB), #defined via PCI_MAX_MCFG_BUSES in
> > >> hvmloader/config.h. When multiple PCI buses support for Xen will be
> > >> implemented, PCI_MAX_MCFG_BUSES may be changed to calculation
> of the
> > >> number of buses according to results of the PCI devices enumeration.
> > >>
> > >> The way to allocate MMCONFIG range in MMIO hole is similar to how
> > >> other PCI BARs are allocated. The patch extends 'bars' structure to
> > >> make it universal for any arbitrary BAR type -- either IO, MMIO, ROM
> > >> or a chipset-specific resource.
> > >
> > >I'm not sure this is fully correct. The IOREQ interface can
> > >differentiate PCI devices and forward config space accesses to
> > >different emulators (see IOREQ_TYPE_PCI_CONFIG). With this change
> you
> > >will forward all MCFG accesses to QEMU, which will likely be wrong if
> > >there are multiple PCI-device emulators for the same domain.
> > >
> > >Ie: AFAICT Xen needs to know about the MCFG emulation and detect
> > >accesses to it in order to forward them to the right emulators.
> > >
> > >Adding Paul who knows more about all this.
> >
> > In which use cases multiple PCI-device emulators are used for a single
> > HVM domain? Is it a proprietary setup?
> 
> Likely. I think XenGT might be using it. It's a feature of the IOREQ
> implementation in Xen.
> 

Multiple ioreq servers are a supported use-case for Xen, if only experimental at this point. And indeed xengt is one such use-case.

> Traditional PCI config space accesses are not IO port space accesses.
> The IOREQ code in Xen detects accesses to ports 0xcf8/0xcfc and IOREQ
> servers can register devices they would like to receive configuration
> space accesses for. QEMU is already making use of this, see for
> example xen_map_pcidev in the QEMU code.
> 
> By treating MCFG accesses as MMIO you are bypassing the IOREQ PCI
> layer, and thus a IOREQ server could register a PCI device and only
> receive PCI configuration accesses from the IO port space, while MCFG
> accesses would be forwarded somewhere else.
> 
> I think you need to make the IOREQ code aware of the MCFG area and
> XEN_DMOP_IO_RANGE_PCI needs to forward both IO space and MCFG
> accesses
> to the right IOREQ server.

Yes, Xen must intercept all accesses to PCI config space and route them accordingly.

> 
> > I assume it is somehow related to this code in xen-hvm.c:
> >                 /* Fake a write to port 0xCF8 so that
> >                  * the config space access will target the
> >                  * correct device model.
> >                  */
> >                 val = (1u << 31) | ((req->addr & 0x0f00) <...>
> >                 do_outp(0xcf8, 4, val);
> > if yes, similar thing can be made for IOREQ_TYPE_COPY accesses to
> > the emulated MMCONFIG if needed.
> 
> I have to admit I don't know that much about QEMU, and I have no idea
> what the chunk above is supposed to accomplish.
> 

The easiest way to make QEMU behave appropriately when dealing with a config space ioreq was indeed to make it appear as a write to cf8 followed by a read or write to cfc.
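[Editorial note: the faked CF8h value mentioned above follows the standard PCI configuration-address encoding (bit 31 = enable, bits 23:16 = bus, 15:11 = device, 10:8 = function, 7:2 = dword-aligned register). A minimal illustrative helper, not code from QEMU or Xen:]

```c
#include <stdint.h>

/* Build a standard CF8h config-address dword from bus/dev/fn/offset.
 * Bit 31 enables config decoding; the low two offset bits are dropped
 * because the CFCh access supplies the byte position within the dword. */
static uint32_t pci_cf8(unsigned bus, unsigned dev,
                        unsigned fn, unsigned off)
{
    return (1u << 31) | ((bus & 0xffu) << 16) | ((dev & 0x1fu) << 11) |
           ((fn & 7u) << 8) | (off & 0xfcu);
}
```

Note that this mechanism can only reach the first 256 bytes of config space, which is exactly why ECAM/MMCONFIG is needed for offsets above 0xFF.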

> >
> > In HVM+QEMU case we are not limited to merely passed through devices,
> > most of the observable PCI config space devices belong to one particular
> > QEMU instance. This dictates the overall emulated MMCONFIG layout
> > for a domain which should be in sync to what QEMU emulates via
> CF8h/CFCh
> > accesses... and between multiple device model instances (if there are
> > any, still not sure what multiple PCI-device emulators you mentioned
> > really are).
> 
> In newer versions of Xen (>4.5 IIRC, Paul knows more), QEMU doesn't
> directly trap accesses to the 0xcf8/0xcfc IO ports, it's Xen instead
> the one that detects and decodes such accesses, and then forwards them
> to the IOREQ server that has been registered to handle them.
> 

Correct.

> You cannot simply forward all MCFG accesses to QEMU as MMIO accesses,
> Xen needs to decode them and they need to be handled as
> IOREQ_TYPE_PCI_CONFIG requests, not IOREQ_TYPE_COPY IMO.
> 
> >
> > Basically, we have an emulated MMCONFIG area of 64/128/256MB size in
> > the MMIO hole of the guest HVM domain. (BTW, this area itself can be
> > considered a feature of the chipset the device model emulates.)
> > It can be relocated to some other place in MMIO hole, this means that
> > QEMU will trap accesses to the specific to the emulated chipset
> > PCIEXBAR register and will issue same MMIO unmap/map calls as for
> > any normal emulated MMIO range.
> >
> > On the other hand, it won't be easy to provide emulated MMCONFIG
> > translation into IOREQ_TYPE_PCI_CONFIG from Xen side. Xen should know
> > current emulated MMCONFIG area position and size in order to translate
> > (or not) accesses to it into corresponding BDF/reg pair (+whether that
> > area is enabled for decoding or not). This will likely require to
> > introduce new hypercall(s).
> 
> Yes, you will have to introduce new hypercalls to tell Xen the
> position/size of the MCFG hole. Likely you want to tell it the start
> address, the pci segment, start bus and end bus. I know pci segment
> and start bus is always going to be 0 ATM, but it would be nice to
> have a complete interface.
> 
> By your comment above I think you want an interface that allows you to
> remove/add those MCFG areas at runtime.
> 

We're going to want hotplug eventually, so yes, devices need to appear and disappear dynamically.


> > The question is if there will be any difference or benefit at all.
> 
> IMO it's not about benefits or differences, it's about correctness.
> Xen currently detects accesses to the PCI configuration space from IO
> ports and for consistency it should also detect accesses to this space
> by any other means.
> 

Yes, this is a 'must' rather than a 'should' though.

> > It's basically the same emulated MMIO range after all, but in one case
> > we trap accesses to it in Xen and translate them into
> > IOREQ_TYPE_PCI_CONFIG requests.
> > We have to provide some infrastructure to let Xen know where the device
> > model/guest expects to use the MMCONFIG area (and its size). The
> > device model will need to use this infrastructure, informing Xen of
> > any changes. Also, due to MMCONFIG nature there might be some pitfalls
> > like a necessity to send multiple IOREQ_TYPE_PCI_CONFIG ioreqs caused
> by
> > a single memory read/write operation.
> 
> This seems all fine. Why do you expect MCFG access to create multiple
> IOREQ_TYPE_PCI_CONFIG but not multiple IOREQ_TYPE_COPY?
> 
> > In another case, we still have an emulated MMIO range, but Xen will send
> > plain IOREQ_TYPE_COPY requests to QEMU which it handles itself.
> > In such case, all code to work with MMCONFIG accesses is available for
> > reuse right away (mmcfg -> pci_* translation in QEMU), no new
> > functionality required neither in Xen or QEMU.
> 
> As I tried to argument above, I think this is not correct, but I would
> also like that Paul expresses his opinion as the IOREQ maintainer.

Xen should handle MMCONFIG accesses. All PCI device emulators should register for PCI config space by SBDF, and the mechanism by which Xen intercepts a config access and routes it to the emulator should be none of the emulator's concern. QEMU does not own the PCI bus topology; Xen does, and it has been this way for quite some time (even if the implementation is incomplete).
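[Editorial note: the SBDF decode Xen would need for such intercepts is fixed by the PCIe ECAM layout (1MB of config space per bus, 32KB per device, 4KB per function). An illustrative sketch:]

```c
#include <stdint.h>

/* Decode an address inside the MMCONFIG window into bus/dev/fn/reg.
 * ECAM layout: offset bits 27:20 = bus, 19:15 = device,
 * 14:12 = function, 11:0 = register within the 4KB config space. */
static void ecam_decode(uint64_t mcfg_base, uint64_t addr,
                        unsigned *bus, unsigned *dev,
                        unsigned *fn, unsigned *reg)
{
    uint64_t off = addr - mcfg_base;

    *bus = (off >> 20) & 0xff;
    *dev = (off >> 15) & 0x1f;
    *fn  = (off >> 12) & 0x7;
    *reg = off & 0xfff;
}
```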

  Paul

> 
> > >>  tools/firmware/hvmloader/config.h   |   4 ++
> > >>  tools/firmware/hvmloader/pci.c      | 127
> > >> ++++++++++++++++++++++++++++--------
> > >> tools/firmware/hvmloader/pci_regs.h |   2 + 3 files changed, 106
> > >> insertions(+), 27 deletions(-)
> > >>
> > >> diff --git a/tools/firmware/hvmloader/config.h
> > >> b/tools/firmware/hvmloader/config.h index 6fde6b7b60..5443ecd804
> > >> 100644 --- a/tools/firmware/hvmloader/config.h
> > >> +++ b/tools/firmware/hvmloader/config.h
> > >> @@ -53,10 +53,14 @@ extern uint8_t ioapic_version;
> > >>  #define PCI_ISA_DEVFN       0x08    /* dev 1, fn 0 */
> > >>  #define PCI_ISA_IRQ_MASK    0x0c20U /* ISA IRQs 5,10,11 are PCI
> > >> connected */ #define PCI_ICH9_LPC_DEVFN  0xf8    /* dev 31, fn 0 */
> > >> +#define PCI_MCH_DEVFN       0       /* bus 0, dev 0, func 0 */
> > >>
> > >>  /* MMIO hole: Hardcoded defaults, which can be dynamically
> > >> expanded. */ #define PCI_MEM_END         0xfc000000
> > >>
> > >> +/* possible values are: 64, 128, 256 */
> > >> +#define PCI_MAX_MCFG_BUSES  64
> > >
> > >What the reasoning for this value? Do we know which devices need ECAM
> > >areas?
> >
> > Yes, Xen is limited to bus 0 emulation currently, the description
> > states "When multiple PCI buses support for Xen will be implemented,
> > PCI_MAX_MCFG_BUSES may be changed to calculation of the number of
> buses
> > according to results of the PCI devices enumeration".
> >
> > I think it might be better to replace 'switch (PCI_MAX_MCFG_BUSES)'
> > with the real code right away, i.e. change it to
> >
> > 'switch (max_bus_num, aligned up to 64/128/256 boundary)',
> > where max_bus_num should be set in PCI device enumeration code in
> > pci_setup(). As we are limited to bus 0 currently, we'll just set it
> > to 0 for now, before/after the PCI device enumeration loop (which should
> > became multi-bus capable eventually).
> 
> I guess this is all pretty much hardcoded to bus 0 in several places,
> so I'm not sure it's worth to add PCI_MAX_MCFG_BUSES. IMO if something
> like this should be added it should be PCI_MAX_BUSES, and several
> places should be changed to make use of it. Or ideally we should find
> a way to detect this at runtime, without needed any hardcoded defines.
> 
> I think it would be good if you can add a note comment describing the
> different MCFG sizes supported by the Q35 chipset (64/128/256).
> 
> Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [RFC PATCH 11/12] hvmloader: use libacpi to build MCFG table
  2018-03-19 21:20     ` Alexey G
  2018-03-20  8:58       ` Roger Pau Monné
@ 2018-03-20  9:36       ` Jan Beulich
  2018-03-20 20:53         ` Alexey G
  1 sibling, 1 reply; 183+ messages in thread
From: Jan Beulich @ 2018-03-20  9:36 UTC (permalink / raw)
  To: Alexey G
  Cc: Andrew Cooper, xen-devel, Wei Liu, Ian Jackson, Roger Pau Monné

>>> On 19.03.18 at 22:20, <x1917x@gmail.com> wrote:
> On Mon, 19 Mar 2018 17:49:09 +0000
> Roger Pau Monné <roger.pau@citrix.com> wrote:
>>On Tue, Mar 13, 2018 at 04:33:56AM +1000, Alexey Gerasimenko wrote:
>>> --- a/tools/firmware/hvmloader/util.c
>>> +++ b/tools/firmware/hvmloader/util.c
>>> @@ -782,6 +782,69 @@ int get_pc_machine_type(void)
>>>      return machine_type;
>>>  }
>>>  
>>> +#define PCIEXBAR_ADDR_MASK_64MB     (~((1ULL << 26) - 1))
>>> +#define PCIEXBAR_ADDR_MASK_128MB    (~((1ULL << 27) - 1))
>>> +#define PCIEXBAR_ADDR_MASK_256MB    (~((1ULL << 28) - 1))
>>> +#define PCIEXBAR_LENGTH_BITS(reg)   (((reg) >> 1) & 3)
>>> +#define PCIEXBAREN                  1  
>>
>>PCIEXBAR_ENABLE maybe?
> 
> PCIEXBAREN is just an official name of this bit from the
> Intel datasheet. :) OK, will rename it to PCIEXBAR_ENABLE.

I think using names from the datasheet (where they exist) is
preferable in cases like this one.

>>> +    switch (PCIEXBAR_LENGTH_BITS(reg))
>>> +    {
>>> +    case 0:
>>> +        base &= PCIEXBAR_ADDR_MASK_256MB;
>>> +        break;
>>> +    case 1:
>>> +        base &= PCIEXBAR_ADDR_MASK_128MB;
>>> +        break;
>>> +    case 2:
>>> +        base &= PCIEXBAR_ADDR_MASK_64MB;
>>> +        break;  
>>
>>Missing newlines, plus this looks like it wants to use the defines
>>introduced in patch 7 (PCIEXBAR_{64,128,256}_BUSES). Also any reason
>>this patch and patch 7 cannot be put sequentially?
> 
> I think all these #defines should find a way to pci_regs.h, it seems
> like an appropriate place for them.

I don't think device specific defines belong into pci_regs.h.

Jan


* Re: [RFC PATCH 11/12] hvmloader: use libacpi to build MCFG table
  2018-03-20  9:36       ` Jan Beulich
@ 2018-03-20 20:53         ` Alexey G
  2018-03-21  7:36           ` Jan Beulich
  0 siblings, 1 reply; 183+ messages in thread
From: Alexey G @ 2018-03-20 20:53 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Andrew Cooper, xen-devel, Wei Liu, Ian Jackson, Roger Pau Monné

On Tue, 20 Mar 2018 03:36:57 -0600
"Jan Beulich" <JBeulich@suse.com> wrote:

>>>> On 19.03.18 at 22:20, <x1917x@gmail.com> wrote:  
>> On Mon, 19 Mar 2018 17:49:09 +0000
>> Roger Pau Monné <roger.pau@citrix.com> wrote:  
>>>On Tue, Mar 13, 2018 at 04:33:56AM +1000, Alexey Gerasimenko wrote:  
>>>> --- a/tools/firmware/hvmloader/util.c
>>>> +++ b/tools/firmware/hvmloader/util.c
>>>> @@ -782,6 +782,69 @@ int get_pc_machine_type(void)
>>>>      return machine_type;
>>>>  }
>>>>  
>>>> +#define PCIEXBAR_ADDR_MASK_64MB     (~((1ULL << 26) - 1))
>>>> +#define PCIEXBAR_ADDR_MASK_128MB    (~((1ULL << 27) - 1))
>>>> +#define PCIEXBAR_ADDR_MASK_256MB    (~((1ULL << 28) - 1))
>>>> +#define PCIEXBAR_LENGTH_BITS(reg)   (((reg) >> 1) & 3)
>>>> +#define PCIEXBAREN                  1    
>>>
>>>PCIEXBAR_ENABLE maybe?  
>> 
>> PCIEXBAREN is just an official name of this bit from the
>> Intel datasheet. :) OK, will rename it to PCIEXBAR_ENABLE.  
>
>I think using names from the datasheet (where they exist) is
>preferable in cases like this one.

Leaving it intact then.

>>>> +    switch (PCIEXBAR_LENGTH_BITS(reg))
>>>> +    {
>>>> +    case 0:
>>>> +        base &= PCIEXBAR_ADDR_MASK_256MB;
>>>> +        break;
>>>> +    case 1:
>>>> +        base &= PCIEXBAR_ADDR_MASK_128MB;
>>>> +        break;
>>>> +    case 2:
>>>> +        base &= PCIEXBAR_ADDR_MASK_64MB;
>>>> +        break;    
>>>
>>>Missing newlines, plus this looks like it wants to use the defines
>>>introduced in patch 7 (PCIEXBAR_{64,128,256}_BUSES). Also any reason
>>>this patch and patch 7 cannot be put sequentially?  
>> 
>> I think all these #defines should find a way to pci_regs.h, it seems
>> like an appropriate place for them.  
>
>I don't think device specific defines belong into pci_regs.h.

Will gather all these #defines and macros in a new pci_regs_q35.h
file. Including it from pci_regs.h should do no harm, I think, so
that *.c files need to include only pci_regs.h.
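[Editorial note: putting the #defines from the quoted hunk together, the PCIEXBAR decode under discussion amounts to something like the helper below. It is assembled from the patch fragments in this thread; the treatment of the reserved length encoding is illustrative.]

```c
#include <stdint.h>

#define PCIEXBAR_ADDR_MASK_64MB     (~((1ULL << 26) - 1))
#define PCIEXBAR_ADDR_MASK_128MB    (~((1ULL << 27) - 1))
#define PCIEXBAR_ADDR_MASK_256MB    (~((1ULL << 28) - 1))
#define PCIEXBAR_LENGTH_BITS(reg)   (((reg) >> 1) & 3)
#define PCIEXBAREN                  1

/* Return the MMCONFIG base programmed into PCIEXBAR, or 0 when
 * decoding is disabled or the length field holds the reserved
 * encoding (3). */
static uint64_t pciexbar_base(uint64_t reg)
{
    if (!(reg & PCIEXBAREN))
        return 0;

    switch (PCIEXBAR_LENGTH_BITS(reg)) {
    case 0: return reg & PCIEXBAR_ADDR_MASK_256MB;  /* 256 buses */
    case 1: return reg & PCIEXBAR_ADDR_MASK_128MB;  /* 128 buses */
    case 2: return reg & PCIEXBAR_ADDR_MASK_64MB;   /*  64 buses */
    default: return 0;
    }
}
```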


* Re: [RFC PATCH 10/12] libacpi: build ACPI MCFG table if requested
  2018-03-20  9:03       ` Roger Pau Monné
@ 2018-03-20 21:06         ` Alexey G
  0 siblings, 0 replies; 183+ messages in thread
From: Alexey G @ 2018-03-20 21:06 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: xen-devel, Wei Liu, Ian Jackson, Jan Beulich

On Tue, 20 Mar 2018 09:03:56 +0000
Roger Pau Monné <roger.pau@citrix.com> wrote:

>On Tue, Mar 20, 2018 at 07:46:04AM +1000, Alexey G wrote:
>> On Mon, 19 Mar 2018 17:33:34 +0000
>> Roger Pau Monné <roger.pau@citrix.com> wrote:
>>   
>> >On Tue, Mar 13, 2018 at 04:33:55AM +1000, Alexey Gerasimenko
>> >wrote:  
>> >> This adds construct_mcfg() function to libacpi which allows to
>> >> build MCFG table for a given mmconfig_addr/mmconfig_len pair if
>> >> the ACPI_HAS_MCFG flag was specified in acpi_config struct.
>> >> 
>> >> The maximum bus number is calculated from mmconfig_len using
>> >> MCFG_SIZE_TO_NUM_BUSES macro (1MByte of MMIO space per bus).
>> >> 
>> >> Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
>> >> ---
>> >>  tools/libacpi/acpi2_0.h | 21 +++++++++++++++++++++
>> >>  tools/libacpi/build.c   | 42
>> >> ++++++++++++++++++++++++++++++++++++++++++ tools/libacpi/libacpi.h
>> >> |  4 ++++ 3 files changed, 67 insertions(+)
>> >> 
>> >> diff --git a/tools/libacpi/acpi2_0.h b/tools/libacpi/acpi2_0.h
>> >> index 2619ba32db..209ad1acd3 100644
>> >> --- a/tools/libacpi/acpi2_0.h
>> >> +++ b/tools/libacpi/acpi2_0.h
>> >> @@ -422,6 +422,25 @@ struct acpi_20_slit {
>> >>  };
>> >>  
>> >>  /*
>> >> + * PCI Express Memory Mapped Configuration Description Table
>> >> + */
>> >> +struct mcfg_range_entry {
>> >> +    uint64_t base_address;
>> >> +    uint16_t pci_segment;
>> >> +    uint8_t  start_pci_bus_num;
>> >> +    uint8_t  end_pci_bus_num;
>> >> +    uint32_t reserved;
>> >> +};
>> >> +
>> >> +struct acpi_mcfg {
>> >> +    struct acpi_header header;
>> >> +    uint8_t reserved[8];
>> >> +    struct mcfg_range_entry entries[1];
>> >> +};    
>> >
>> >I would define this as:
>> >
>> >struct acpi_10_mcfg {
>> >    struct acpi_header header;
>> >    uint8_t reserved[8];
>> >    struct acpi_10_mcfg_entry {
>> >        uint64_t base_address;
>> >        uint16_t pci_segment;
>> >        uint8_t  start_pci_bus;
>> >        uint8_t  end_pci_bus;
>> >        uint32_t reserved;
>> >    } entries[1];
>> >};  
>> 
>> Hmm, a choice of preference, but OK, will move it inside.  
>
>Note the name change also (acpi_10_mcfg). Also I think you can drop
>the acpi_10_mcfg_entry name and just use an anonymous struct.
>
>> >> +
>> >> +    mcfg->entries[0].base_address = config->mmconfig_addr;
>> >> +    mcfg->entries[0].pci_segment = 0;
>> >> +    mcfg->entries[0].start_pci_bus_num = 0;
>> >> +    mcfg->entries[0].end_pci_bus_num =
>> >> +        MCFG_SIZE_TO_NUM_BUSES(config->mmconfig_len) - 1;    
>> >
>> >Why not pass the start_bus and end_bus values in acpi_config at
>> >least?  
>> 
>> start_pci_bus_num will be always 0.
>> 
>> It will be kinda ugly to pass config->mmconfig_addr along with
>> config->end_pci_bus_num, baseaddr+size combo looks nicer I think.  
>I'm not going to insist, but ACPI doesn't really care about the size,
>it just needs to know the start and end. Seems pointless to write a
>value here that later libacpi needs to convert to the value it
>actually needs. Also start/end buses are uint8_t, size is uint32_t.

As the underlying implementation is limited to just one PCI segment
and we need to pass only one MCFG range entry, I guess it will be OK to
use the mmconfig_addr + mmconfig_num_buses pair (almost the same as
end_bus, but more size-descriptive).
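[Editorial note: a sketch of what the single-entry MCFG fill would then look like with the mmconfig_addr + mmconfig_num_buses pair. Field names follow the struct quoted above; the helper itself is illustrative, not the patch's code.]

```c
#include <stdint.h>
#include <string.h>

struct mcfg_range_entry {
    uint64_t base_address;
    uint16_t pci_segment;
    uint8_t  start_pci_bus_num;
    uint8_t  end_pci_bus_num;
    uint32_t reserved;
};

/* Fill the single MCFG allocation entry: segment 0, buses
 * 0 .. num_buses-1, with 1MB of ECAM space per bus. */
static void fill_mcfg_entry(struct mcfg_range_entry *e,
                            uint64_t mmconfig_addr,
                            unsigned mmconfig_num_buses)
{
    memset(e, 0, sizeof(*e));
    e->base_address = mmconfig_addr;
    e->pci_segment = 0;
    e->start_pci_bus_num = 0;
    e->end_pci_bus_num = mmconfig_num_buses - 1;
}
```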




* Re: [RFC PATCH 06/12] hvmloader: add basic Q35 support
  2018-03-20  9:20       ` Roger Pau Monné
@ 2018-03-20 21:23         ` Alexey G
  0 siblings, 0 replies; 183+ messages in thread
From: Alexey G @ 2018-03-20 21:23 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, Ian Jackson, Wei Liu, Jan Beulich, Andrew Cooper

On Tue, 20 Mar 2018 09:20:01 +0000
Roger Pau Monné <roger.pau@citrix.com> wrote:

>On Tue, Mar 20, 2018 at 09:44:33AM +1000, Alexey G wrote:
>> On Mon, 19 Mar 2018 15:30:14 +0000
>> Roger Pau Monné <roger.pau@citrix.com> wrote:
>>   
>> >On Tue, Mar 13, 2018 at 04:33:51AM +1000, Alexey Gerasimenko
>> >wrote:  
>> >> +    {
>> >> +    case 0x0300:    
>> >
>> >All this values need to be defines documented somewhere.  
>> 
>> Agree... although it was not me who introduced all these hardcoded
>> PCI class values. :) I'll change these numbers into newly added
>> pci_regs.h #defines in the non-functional patch.  
>
>Right. I've realized that later. If you place this code moment in a
>separate patch without any other modifications I won't complain about
>the lack of defines (although it would be nice to have them :)).

OK, will do.

>> >> +        {
>> >> +            pci_writeb(PCI_ICH9_LPC_DEVFN, 0x60 + link, isa_irq);
>> >> +
>> >> +            /* PIRQE..PIRQH are unused */
>> >> +            pci_writeb(PCI_ICH9_LPC_DEVFN, 0x68 + link,
>> >> 0x80);    
>> >
>> >According to the spec 0x80 is the default value for this registers,
>> >do you really need to write it?
>> >
>> >Is maybe QEMU not correctly setting the default value?  
>> 
>> Won't agree here. We're initializing PIRQ[n] routing in this
>> fragment, it's better not to rely on any values but simply initialize
>> all PIRQ[n]_ROUT registers, this makes it explicit.
>> 
>> Even if it is unnecessary due to defaults it's more obvious to set
>> these registers to our own values than to force a reader to either
>> look up their emulation in QEMU code or read the ICH9 pdf to confirm
>> assumptions.  
>
>But if you start doing this, you should do it for all the registers.
>Why is PIRQE..PIRQH routing special that you need to re-write the
>default value? But not SIRQ_CNTL for example?
>
>I think a comment noting that the default value for those registers is
>what we expect (0x80 - Interrupt Routing Disabled) would be better.

It will depend on future QEMU/hvmloader changes a bit, but I think
switching to the comment about the default values instead of
initialization should be good.

>
>Thanks, Roger.


* Re: [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring
  2018-03-20  8:50       ` Roger Pau Monné
  2018-03-20  9:25         ` Paul Durrant
@ 2018-03-21  0:58         ` Alexey G
  2018-03-21  9:09           ` Roger Pau Monné
  1 sibling, 1 reply; 183+ messages in thread
From: Alexey G @ 2018-03-21  0:58 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Wei Liu, Andrew Cooper, Ian Jackson, Paul Durrant, Jan Beulich,
	xen-devel

On Tue, 20 Mar 2018 08:50:48 +0000
Roger Pau Monné <roger.pau@citrix.com> wrote:

>On Tue, Mar 20, 2018 at 05:49:22AM +1000, Alexey G wrote:
>> On Mon, 19 Mar 2018 15:58:02 +0000
>> Roger Pau Monné <roger.pau@citrix.com> wrote:
>>   
>> >On Tue, Mar 13, 2018 at 04:33:52AM +1000, Alexey Gerasimenko
>> >wrote:  
>> >> Much like normal PCI BARs or other chipset-specific memory-mapped
>> >> resources, MMCONFIG area needs space in MMIO hole, so we must
>> >> allocate it manually.
>> >> 
>> >> The actual MMCONFIG size depends on a number of PCI buses
>> >> available which should be covered by ECAM. Possible options are
>> >> 64MB, 128MB and 256MB. As we are limited to the bus 0 currently,
>> >> thus using lowest possible setting (64MB), #defined via
>> >> PCI_MAX_MCFG_BUSES in hvmloader/config.h. When multiple PCI buses
>> >> support for Xen will be implemented, PCI_MAX_MCFG_BUSES may be
>> >> changed to calculation of the number of buses according to
>> >> results of the PCI devices enumeration.
>> >> 
>> >> The way to allocate MMCONFIG range in MMIO hole is similar to how
>> >> other PCI BARs are allocated. The patch extends 'bars' structure
>> >> to make it universal for any arbitrary BAR type -- either IO,
>> >> MMIO, ROM or a chipset-specific resource.    
>> >
>> >I'm not sure this is fully correct. The IOREQ interface can
>> >differentiate PCI devices and forward config space accesses to
>> >different emulators (see IOREQ_TYPE_PCI_CONFIG). With this change
>> >you will forward all MCFG accesses to QEMU, which will likely be
>> >wrong if there are multiple PCI-device emulators for the same
>> >domain.
>> >
>> >Ie: AFAICT Xen needs to know about the MCFG emulation and detect
>> >accesses to it in order to forward them to the right emulators.
>> >
>> >Adding Paul who knows more about all this.  
>> 
>> In which use cases multiple PCI-device emulators are used for a
>> single HVM domain? Is it a proprietary setup?  
>
>Likely. I think XenGT might be using it. It's a feature of the IOREQ
>implementation in Xen.

According to the public slides for the feature, both PCI conf and MMIO
accesses can be routed to the designated device model. It looks like
for this particular setup it doesn't really matter which ioreq type
is used for MMCONFIG accesses -- either IOREQ_TYPE_PCI_CONFIG or
IOREQ_TYPE_COPY (MMIO accesses) should be acceptable. The only thing
that matters is the ioreq routing itself -- deciding to which device
model the PCI conf/MMIO ioreq should be sent.

>Traditional PCI config space accesses are not IO port space accesses.

(assuming 'not' is mistyped here)

>The IOREQ code in Xen detects accesses to ports 0xcf8/0xcfc and IOREQ
>servers can register devices they would like to receive configuration
>space accesses for. QEMU is already making use of this, see for

That's one of the reasons why the current IOREQ_TYPE_PCI_CONFIG
implementation is a bit inconvenient for MMCONFIG MMIO accesses -- it's
too CF8h/CFCh-centric, and it might be painful to change code which
was intended for CF8h/CFCh handling (and not for MMIO processing).

>example xen_map_pcidev in the QEMU code.
>
>By treating MCFG accesses as MMIO you are bypassing the IOREQ PCI
>layer, and thus a IOREQ server could register a PCI device and only
>receive PCI configuration accesses from the IO port space, while MCFG
>accesses would be forwarded somewhere else.

It will be handled by IOREQ too, just using a different IOREQ type
(the MMIO one). The basic question is why we have to stick to PCI conf
space ioreqs for emulating MMIO accesses to MMCONFIG.

>I think you need to make the IOREQ code aware of the MCFG area and
>XEN_DMOP_IO_RANGE_PCI needs to forward both IO space and MCFG accesses
>to the right IOREQ server.

Right now there is no way to inform Xen where the emulated MMCONFIG
area is located, so it cannot make this decision based on the address
within the MMCONFIG range. A new dmop/hypercall is needed (with args
similar to pci_mmcfg_reserved), along with its usage in QEMU.
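[Editorial note: as an illustration of what such an interface might carry -- the struct name and layout below are hypothetical, merely mirroring the pci_mmcfg_reserved-style fields discussed in this thread (base address, segment, start/end bus, plus an enable flag).]

```c
#include <stdint.h>

/* Hypothetical dmop payload -- NOT an existing Xen interface, just the
 * set of fields the discussion suggests Xen would need in order to
 * route MMCONFIG accesses by SBDF. */
struct xen_dm_op_map_mcfg {
    uint64_t base_addr;  /* guest-physical base of the MMCONFIG window */
    uint16_t segment;    /* PCI segment; always 0 for now */
    uint8_t  start_bus;  /* always 0 for now */
    uint8_t  end_bus;    /* inclusive; 63/127/255 depending on size */
    uint32_t flags;      /* e.g. bit 0 = decoding enabled */
};
```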

I'll try to summarize two different approaches to MMCONFIG
handling. For both approaches the final PCI config host interface for a
passed-through device in QEMU will remain the same as it is now --
the xen_host_pci_* functions in /hw/xen.


Approach #1. Informing Xen about MMCONFIG area changes and letting Xen
translate MMIO accesses into _PCI_CONFIG ioreqs:

1. QEMU will trap accesses to PCIEXBAR, calling Xen via dmop/hypercall
to let the latter know of any MMCONFIG area address/size/status changes

2. Xen will trap MMIO accesses to the current MMCONFIG location and
convert memory accesses into one or several _PCI_CONFIG ioreqs and send
them to a chosen device model

3. QEMU will receive _PCI_CONFIG ioreqs carrying an SBDF and a 12-bit
offset, which it needs to somehow pass to
pci_host_config_{read,write}_common() for emulation. It might require
a few hacks to make the gears turn (due to the QEMU PCI conf
read/write model).
At the moment the emulated CF8h/CFCh ports play a special role
in all this -- xen-hvm.c writes an AMD-style value to the
emulated CF8h port "so that the config space access will target the
correct device model" (quote). Not sure about this and why it is
needed if Xen actually makes the decision to which DM the PCI conf
ioreq should be sent.

One minor note: these new 'set_mmconfig_' dmops/hypercalls have to be
triggered from the chipset-specific emulation code in QEMU (PCIEXBAR
handling in the Q35 case). If another machine type ever needs to
emulate MMCONFIG control differently, we have no choice but to insert
these dmops/hypercalls into its chipset-specific emulation code as
well, e.g. inside the HECBASE emulation code.

Approach #2. Handling the MMCONFIG area inside QEMU using the usual
MMIO emulation:

1. QEMU will trap accesses to PCIEXBAR (or whatever else possibly
supported in the future like HECBASE), eventually asking Xen to map the
MMCONFIG MMIO range for ioreq servicing just like it does for any
other emulated MMIO range, via map_io_range_to_ioreq_server(). All
changes in MMCONFIG placement/status will lead to remapping/unmapping
the MMIO range.

2. Xen will trap MMIO accesses to this area and forward them to QEMU as
MMIO (IOREQ_TYPE_COPY) ioreqs

3. QEMU will receive these accesses and pass them to the existing
MMCONFIG emulation -- pcie_mmcfg_data_read/write handlers, finally
resulting in same xen_host_pci_* function calls as before.

This approach works "right out of the box"; no changes are needed in
either Xen or QEMU. As both _PCI_CONFIG and MMIO-type ioreqs are
processed, either method can be used to access the PCI/extended config
space -- CF8h/CFCh port I/O or MMIO accesses to MMCONFIG.

IOREQ routing for multiple device emulators can be supported too. In
fact, the same mmconfig dmops/hypercalls can be added to let Xen know
where the MMCONFIG area resides; Xen will use this information to
forward MMCONFIG MMIO ioreqs according to the BDF of the address. The
difference from approach #1 is that these interfaces are completely
optional when we use MMIO ioreqs for MMCONFIG on vanilla Xen/QEMU.

The question is why the IOREQ_TYPE_COPY -> IOREQ_TYPE_PCI_CONFIG
translation is a must-have at all. It won't make handling simpler.
For the current QEMU implementation IOREQ_TYPE_COPY (MMIO accesses for
MMCONFIG) would be preferable as it allows reusing the existing code.

I think it will be safe to use MMCONFIG emulation at the MMIO level for
now and later extend it with the 'set_mmconfig_' dmop/hypercall so that
'multiple device emulators' IOREQ_TYPE_COPY routing works the same as
for PCI conf, allowing XenGT etc. to use it on Q35 as well.

After all, all this is Q35-specific and won't harm the existing i440
emulation in any way.
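[Editorial note: one pitfall either approach must handle is that a single wide, misaligned MMCONFIG read/write has to be decomposed before it can be expressed as config-space accesses. An illustrative sketch -- emit() is a hypothetical stand-in for "send one IOREQ_TYPE_PCI_CONFIG ioreq":]

```c
#include <stdint.h>

typedef void (*emit_fn)(uint32_t reg, unsigned size);

/* Split one len-byte access at config-space offset reg into naturally
 * aligned chunks of at most 4 bytes, invoking emit() for each chunk
 * (emit may be NULL when only the count is wanted). Returns the number
 * of chunks, i.e. the number of config-space ioreqs needed. */
static unsigned split_cfg_access(uint32_t reg, unsigned len, emit_fn emit)
{
    unsigned n = 0;

    while (len) {
        unsigned chunk = 4 - (reg & 3);  /* distance to next dword */

        if (chunk > len)
            chunk = len;
        if (emit)
            emit(reg, chunk);
        reg += chunk;
        len -= chunk;
        n++;
    }
    return n;
}
```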

>> I assume it is somehow related to this code in xen-hvm.c:
>>                 /* Fake a write to port 0xCF8 so that
>>                  * the config space access will target the
>>                  * correct device model.
>>                  */
>>                 val = (1u << 31) | ((req->addr & 0x0f00) <...>
>>                 do_outp(0xcf8, 4, val);
>> if yes, similar thing can be made for IOREQ_TYPE_COPY accesses to
>> the emulated MMCONFIG if needed.  
>
>I have to admit I don't know that much about QEMU, and I have no idea
>what the chunk above is supposed to accomplish.
>
>> 
>> In HVM+QEMU case we are not limited to merely passed through devices,
>> most of the observable PCI config space devices belong to one
>> particular QEMU instance. This dictates the overall emulated
>> MMCONFIG layout for a domain which should be in sync to what QEMU
>> emulates via CF8h/CFCh accesses... and between multiple device model
>> instances (if there are any, still not sure what multiple PCI-device
>> emulators you mentioned really are).  
>
>In newer versions of Xen (>4.5 IIRC, Paul knows more), QEMU doesn't
>directly trap accesses to the 0xcf8/0xcfc IO ports, it's Xen instead
>the one that detects and decodes such accesses, and then forwards them
>to the IOREQ server that has been registered to handle them.
>
>You cannot simply forward all MCFG accesses to QEMU as MMIO accesses,
>Xen needs to decode them and they need to be handled as
>IOREQ_TYPE_PCI_CONFIG requests, not IOREQ_TYPE_COPY IMO.
>
>> 
>> Basically, we have an emulated MMCONFIG area of 64/128/256MB size in
>> the MMIO hole of the guest HVM domain. (BTW, this area itself can be
>> considered a feature of the chipset the device model emulates.)
>> It can be relocated to some other place in MMIO hole, this means that
>> QEMU will trap accesses to the specific to the emulated chipset
>> PCIEXBAR register and will issue same MMIO unmap/map calls as for
>> any normal emulated MMIO range.
>> 
>> On the other hand, it won't be easy to provide emulated MMCONFIG
>> translation into IOREQ_TYPE_PCI_CONFIG from Xen side. Xen should know
>> current emulated MMCONFIG area position and size in order to
>> translate (or not) accesses to it into corresponding BDF/reg pair
>> (+whether that area is enabled for decoding or not). This will
>> likely require to introduce new hypercall(s).  
>
>Yes, you will have to introduce new hypercalls to tell Xen the
>position/size of the MCFG hole. Likely you want to tell it the start
>address, the pci segment, start bus and end bus. I know pci segment
>and start bus is always going to be 0 ATM, but it would be nice to
>have a complete interface.
>
>By your comment above I think you want an interface that allows you to
>remove/add those MCFG areas at runtime.
>
>> The question is if there will be any difference or benefit at all.  
>
>IMO it's not about benefits or differences, it's about correctness.
>Xen currently detects accesses to the PCI configuration space from IO
>ports and for consistency it should also detect accesses to this space
>by any other means.
>
>> It's basically the same emulated MMIO range after all, but in one
>> case we trap accesses to it in Xen and translate them into
>> IOREQ_TYPE_PCI_CONFIG requests.
>> We have to provide some infrastructure to let Xen know where the
>> device model/guest expects to use the MMCONFIG area (and its size).
>> The device model will need to use this infrastructure, informing Xen
>> of any changes. Also, due to MMCONFIG nature there might be some
>> pitfalls like a necessity to send multiple IOREQ_TYPE_PCI_CONFIG
>> ioreqs caused by a single memory read/write operation.  
>
>This seems all fine. Why do you expect MCFG access to create multiple
>IOREQ_TYPE_PCI_CONFIG but not multiple IOREQ_TYPE_COPY?
>> In another case, we still have an emulated MMIO range, but Xen will
>> send plain IOREQ_TYPE_COPY requests to QEMU which it handles itself.
>> In such case, all code to work with MMCONFIG accesses is available
>> for reuse right away (mmcfg -> pci_* translation in QEMU), no new
>> functionality required neither in Xen or QEMU.  
>
>As I tried to argue above, I think this is not correct, but I would
>also like Paul to express his opinion as the IOREQ maintainer.
>
>> >>  tools/firmware/hvmloader/config.h   |   4 ++
>> >>  tools/firmware/hvmloader/pci.c      | 127 ++++++++++++++++++++++++++++--------
>> >>  tools/firmware/hvmloader/pci_regs.h |   2 +
>> >>  3 files changed, 106 insertions(+), 27 deletions(-)
>> >> 
>> >> diff --git a/tools/firmware/hvmloader/config.h b/tools/firmware/hvmloader/config.h
>> >> index 6fde6b7b60..5443ecd804 100644
>> >> --- a/tools/firmware/hvmloader/config.h
>> >> +++ b/tools/firmware/hvmloader/config.h
>> >> @@ -53,10 +53,14 @@ extern uint8_t ioapic_version;
>> >>  #define PCI_ISA_DEVFN       0x08    /* dev 1, fn 0 */
>> >>  #define PCI_ISA_IRQ_MASK    0x0c20U /* ISA IRQs 5,10,11 are PCI connected */
>> >>  #define PCI_ICH9_LPC_DEVFN  0xf8    /* dev 31, fn 0 */
>> >> +#define PCI_MCH_DEVFN       0       /* bus 0, dev 0, func 0 */
>> >>  
>> >>  /* MMIO hole: Hardcoded defaults, which can be dynamically expanded. */
>> >>  #define PCI_MEM_END         0xfc000000
>> >>  
>> >> +/* possible values are: 64, 128, 256 */
>> >> +#define PCI_MAX_MCFG_BUSES  64    
>> >
>> >What the reasoning for this value? Do we know which devices need
>> >ECAM areas?  
>> 
>> Yes, Xen is currently limited to bus 0 emulation; the description
>> states "When multiple PCI buses support for Xen will be implemented,
>> PCI_MAX_MCFG_BUSES may be changed to calculation of the number of
>> buses according to results of the PCI devices enumeration".
>> 
>> I think it might be better to replace 'switch (PCI_MAX_MCFG_BUSES)'
>> with the real code right away, i.e. change it to
>> 
>> 'switch (max_bus_num, aligned up to a 64/128/256 boundary)',
>> where max_bus_num should be set by the PCI device enumeration code in
>> pci_setup(). As we are currently limited to bus 0, we'll just set it
>> to 0 for now, before/after the PCI device enumeration loop (which
>> should become multi-bus capable eventually).  
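The 'aligned up to 64/128/256' step could be a small helper called from pci_setup(); a sketch (the helper name is hypothetical, the three window sizes are the ones Q35's PCIEXBAR supports):

```c
#include <assert.h>

/* Round the highest bus number found during PCI enumeration up to one
 * of the MMCONFIG window sizes supported by Q35's PCIEXBAR
 * (64/128/256 buses).  Sketch of a replacement for the hardcoded
 * PCI_MAX_MCFG_BUSES define. */
static unsigned int mcfg_buses_for(unsigned int max_bus_num)
{
    if (max_bus_num < 64)
        return 64;
    if (max_bus_num < 128)
        return 128;
    return 256;
}
```

With bus 0 only, this degenerates to the current 64-bus (64MB) default.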
>
>I guess this is all pretty much hardcoded to bus 0 in several places,
>so I'm not sure it's worth adding PCI_MAX_MCFG_BUSES. IMO if something
>like this should be added it should be PCI_MAX_BUSES, and several
>places should be changed to make use of it. Or ideally we should find
>a way to detect this at runtime, without needing any hardcoded defines.

Getting rid of the bus 0 limitation should have high priority, I'm
afraid; it has become an obstacle for PCIe passthrough.

>I think it would be good if you can add a note comment describing the
>different MCFG sizes supported by the Q35 chipset (64/128/256).

will add "...supported by Q35" here:

>> +/* possible values are: 64, 128, 256 */
>> +#define PCI_MAX_MCFG_BUSES  64    

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [RFC PATCH 11/12] hvmloader: use libacpi to build MCFG table
  2018-03-20 20:53         ` Alexey G
@ 2018-03-21  7:36           ` Jan Beulich
  0 siblings, 0 replies; 183+ messages in thread
From: Jan Beulich @ 2018-03-21  7:36 UTC (permalink / raw)
  To: Alexey G
  Cc: Andrew Cooper, xen-devel, Wei Liu, Ian Jackson, Roger Pau Monné

>>> On 20.03.18 at 21:53, <x1917x@gmail.com> wrote:
> On Tue, 20 Mar 2018 03:36:57 -0600
> "Jan Beulich" <JBeulich@suse.com> wrote:
>>>>> On 19.03.18 at 22:20, <x1917x@gmail.com> wrote:  
>>> On Mon, 19 Mar 2018 17:49:09 +0000
>>> Roger Pau Monné <roger.pau@citrix.com> wrote:  
>>>>On Tue, Mar 13, 2018 at 04:33:56AM +1000, Alexey Gerasimenko wrote:  
>>>>> +    switch (PCIEXBAR_LENGTH_BITS(reg))
>>>>> +    {
>>>>> +    case 0:
>>>>> +        base &= PCIEXBAR_ADDR_MASK_256MB;
>>>>> +        break;
>>>>> +    case 1:
>>>>> +        base &= PCIEXBAR_ADDR_MASK_128MB;
>>>>> +        break;
>>>>> +    case 2:
>>>>> +        base &= PCIEXBAR_ADDR_MASK_64MB;
>>>>> +        break;    
>>>>
>>>>Missing newlines, plus this looks like it wants to use the defines
>>>>introduced in patch 7 (PCIEXBAR_{64,128,256}_BUSES). Also any reason
>>>>this patch and patch 7 cannot be put sequentially?  
>>> 
>>> I think all these #defines should find a way to pci_regs.h, it seems
>>> like an appropriate place for them.  
>>
>>I don't think device specific defines belong into pci_regs.h.
> 
> Will gather all these #defines and macros in a new pci_regs_q35.h
> file. I think it should do no harm to include it from pci_regs.h, so
> that the *.c files need to include only pci_regs.h.

Well, no - no unnecessary dependencies please. If only a single
file needs these definitions, only that file should include the
respective header (if one is warranted in the first place).
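For reference, the decode logic under review could be sketched with the blank lines and a default case added, assuming mask values that follow the Q35 PCIEXBAR layout (bit 0 = enable, bits 2:1 = length field); the macro names mirror the patch, but the values here are illustrative:

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of PCIEXBAR base decoding.  The length field selects how
 * many low address bits are ignored: 256MB (256 buses), 128MB
 * (128 buses) or 64MB (64 buses) windows; encoding 3 is reserved. */
#define PCIEXBAR_LENGTH_BITS(reg)   (((reg) >> 1) & 3)
#define PCIEXBAR_ADDR_MASK_256MB    (~((1ULL << 28) - 1))
#define PCIEXBAR_ADDR_MASK_128MB    (~((1ULL << 27) - 1))
#define PCIEXBAR_ADDR_MASK_64MB     (~((1ULL << 26) - 1))

static uint64_t pciexbar_base(uint64_t reg)
{
    uint64_t base = reg;

    switch (PCIEXBAR_LENGTH_BITS(reg))
    {
    case 0: /* 256 buses */
        base &= PCIEXBAR_ADDR_MASK_256MB;
        break;

    case 1: /* 128 buses */
        base &= PCIEXBAR_ADDR_MASK_128MB;
        break;

    case 2: /* 64 buses */
        base &= PCIEXBAR_ADDR_MASK_64MB;
        break;

    default: /* reserved encoding */
        base = 0;
        break;
    }

    return base;
}
```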

Jan


* Re: [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring
  2018-03-21  0:58         ` Alexey G
@ 2018-03-21  9:09           ` Roger Pau Monné
  2018-03-21  9:36             ` Paul Durrant
  2018-03-21 14:25             ` Alexey G
  0 siblings, 2 replies; 183+ messages in thread
From: Roger Pau Monné @ 2018-03-21  9:09 UTC (permalink / raw)
  To: Alexey G
  Cc: Wei Liu, Andrew Cooper, Ian Jackson, Paul Durrant, Jan Beulich,
	xen-devel

On Wed, Mar 21, 2018 at 10:58:40AM +1000, Alexey G wrote:
> On Tue, 20 Mar 2018 08:50:48 +0000
> Roger Pau Monné <roger.pau@citrix.com> wrote:
> 
> >On Tue, Mar 20, 2018 at 05:49:22AM +1000, Alexey G wrote:
> >> On Mon, 19 Mar 2018 15:58:02 +0000
> >> Roger Pau Monné <roger.pau@citrix.com> wrote:
> >>   
> >> >On Tue, Mar 13, 2018 at 04:33:52AM +1000, Alexey Gerasimenko
> >> >wrote:  
> >> >> Much like normal PCI BARs or other chipset-specific memory-mapped
> >> >> resources, MMCONFIG area needs space in MMIO hole, so we must
> >> >> allocate it manually.
> >> >> 
> >> >> The actual MMCONFIG size depends on a number of PCI buses
> >> >> available which should be covered by ECAM. Possible options are
> >> >> 64MB, 128MB and 256MB. As we are limited to the bus 0 currently,
> >> >> thus using lowest possible setting (64MB), #defined via
> >> >> PCI_MAX_MCFG_BUSES in hvmloader/config.h. When multiple PCI buses
> >> >> support for Xen will be implemented, PCI_MAX_MCFG_BUSES may be
> >> >> changed to calculation of the number of buses according to
> >> >> results of the PCI devices enumeration.
> >> >> 
> >> >> The way to allocate MMCONFIG range in MMIO hole is similar to how
> >> >> other PCI BARs are allocated. The patch extends 'bars' structure
> >> >> to make it universal for any arbitrary BAR type -- either IO,
> >> >> MMIO, ROM or a chipset-specific resource.    
> >> >
> >> >I'm not sure this is fully correct. The IOREQ interface can
> >> >differentiate PCI devices and forward config space accesses to
> >> >different emulators (see IOREQ_TYPE_PCI_CONFIG). With this change
> >> >you will forward all MCFG accesses to QEMU, which will likely be
> >> >wrong if there are multiple PCI-device emulators for the same
> >> >domain.
> >> >
> >> >Ie: AFAICT Xen needs to know about the MCFG emulation and detect
> >> >accesses to it in order to forward them to the right emulators.
> >> >
> >> >Adding Paul who knows more about all this.  
> >> 
> >> In which use cases multiple PCI-device emulators are used for a
> >> single HVM domain? Is it a proprietary setup?  
> >
> >Likely. I think XenGT might be using it. It's a feature of the IOREQ
> >implementation in Xen.
> 
> According to public slides for the feature, both PCI conf and MMIO
> accesses can be routed to the designated device model. It looks like
> for this particular setup it doesn't really matter which particular
> ioreq type must be used for MMCONFIG accesses -- either
> IOREQ_TYPE_PCI_CONFIG or IOREQ_TYPE_COPY (MMIO accesses) should be
> acceptable.

Isn't that going to be quite messy? How is the IOREQ server supposed
to decode a MCFG access received as IOREQ_TYPE_COPY?

I don't think the IOREQ server needs to know the start of the MCFG
region, in which case it won't be able to detect and decode the
access if it's of type IOREQ_TYPE_COPY.

MCFG accesses need to be sent to the IOREQ server as
IOREQ_TYPE_PCI_CONFIG, or else you are forcing each IOREQ server to
know the position of the MCFG area in order to do the decoding. In
your case this would work because QEMU controls the position of the
MCFG region, but there's no need for other IOREQ servers to know the
position of the MCFG area.

> The only thing which matters is ioreq routing itself --
> making decisions to which device model the PCI conf/MMIO ioreq should
> be sent.

Hm, see above, but I'm fairly sure you need to forward those MCFG
accesses as IOREQ_TYPE_PCI_CONFIG to the IOREQ server.

> >Traditional PCI config space accesses are not IO port space accesses.
> 
> (assuming 'not' mistyped here)

Not really, this should instead be:

"Traditional PCI config space accesses are not forwarded to the IOREQ
server as IO port space accesses (IOREQ_TYPE_PIO) but rather as PCI
config space accesses (IOREQ_TYPE_PCI_CONFIG)."

Sorry for the confusion.

> >The IOREQ code in Xen detects accesses to ports 0xcf8/0xcfc and IOREQ
> >servers can register devices they would like to receive configuration
> >space accesses for. QEMU is already making use of this, see for
> 
> That's one of the reasons why the current IOREQ_TYPE_PCI_CONFIG
> implementation is a bit inconvenient for MMCONFIG MMIO accesses --
> it's too CF8h/CFCh-centric, and it might be painful to change code
> which was intended for CF8h/CFCh handling (and not for MMIO
> processing).

I'm not sure I follow. Do you mean that changes should be made to the
ioreq struct in order to forward MCFG accesses using
IOREQ_TYPE_PCI_CONFIG as it's type?

> >example xen_map_pcidev in the QEMU code.
> >
> >By treating MCFG accesses as MMIO you are bypassing the IOREQ PCI
> >layer, and thus a IOREQ server could register a PCI device and only
> >receive PCI configuration accesses from the IO port space, while MCFG
> >accesses would be forwarded somewhere else.
> 
> It will be handled by IOREQ too, just using a different IOREQ type
> (MMIO one). The basic question is why do we have to stick to PCI conf
> space ioreqs for emulating MMIO accesses to MMCONFIG.

Because other IOREQ servers don't need to know about the position/size
of the MCFG area, and cannot register MMIO ranges that cover their
device's PCI configuration space in the MCFG region.

Not to mention that it would be a terrible design flaw to force
IOREQ servers to register PCI devices and the MCFG areas belonging to
those devices separately as MMIO in order to trap all possible PCI
configuration space accesses.

> >I think you need to make the IOREQ code aware of the MCFG area and
> >XEN_DMOP_IO_RANGE_PCI needs to forward both IO space and MCFG accesses
> >to the right IOREQ server.
> 
> Right now there is no way to inform Xen where the emulated MMCONFIG
> area is located in order to make this decision, based on the address
> within MMCONFIG range. A new dmop/hypercall is needed (with args
> similar to pci_mmcfg_reserved) along with its usage in QEMU.
> 
> I'll try to summarize two different approaches to MMCONFIG
> handling. For both approaches the final PCI config host interface for a
> passed through device in QEMU will remain same as at the moment --
> xen_host_pci_* functions in /hw/xen.
> 
> 
> Approach #1. Informing Xen about MMCONFIG area changes and letting Xen
> to translate MMIO accesses to _PCI_CONFIG ioreqs:
> 
> 1. QEMU will trap accesses to PCIEXBAR, calling Xen via dmop/hypercall
> to let the latter know of any MMCONFIG area address/size/status changes
> 
> 2. Xen will trap MMIO accesses to the current MMCONFIG location and
> convert memory accesses into one or several _PCI_CONFIG ioreqs and send
> them to a chosen device model
> 
> 3. QEMU will receive _PCI_CONFIG ioreqs carrying SBDF and 12-bit
> offsets, which it needs to somehow pass to
> pci_host_config_{read,write}_common() for emulation. It might require
> a few hacks to make the gears turn (due to QEMU's pci conf read/write
> model).
> At the moment the emulated CF8h/CFCh ports play a special role
> in all this -- xen-hvm.c writes an AMD-style value to the
> emulated CF8h port "so that the config space access will target the
> correct device model" (quote). Not sure about this, or why it is
> needed if Xen actually makes the decision to which DM the PCI conf
> ioreq should be sent.
> 
> One minor note: these new 'set_mmconfig_' dmops/hypercalls have to be
> triggered inside the chipset-specific emulation code in QEMU (PCIEXBAR
> handling in Q35 case). If there will be another machine which needs to
> emulate MMCONFIG control differently -- we have no choice but to
> insert these dmops/hypercalls into another chipset-specific emulation
> code as well, eg. inside HECBASE emulation code.
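Step 2 of the quoted approach boils down to the standard ECAM address decode (4KB per function, 8 functions per device, 32 devices per bus, 1MB per bus, fixed by the PCIe spec); a minimal sketch of what Xen would compute before building the _PCI_CONFIG ioreq (the struct and function names are illustrative):

```c
#include <assert.h>
#include <stdint.h>

/* Turn a trapped MMCONFIG address into BDF + 12-bit register offset
 * for an IOREQ_TYPE_PCI_CONFIG request.  The bit layout is the ECAM
 * mapping mandated by the PCIe specification. */
struct mmcfg_decode {
    uint8_t  bus, dev, fn;
    uint16_t reg;            /* 0..0xfff */
};

static struct mmcfg_decode mmcfg_decode_addr(uint64_t mmcfg_base,
                                             uint64_t addr)
{
    uint64_t off = addr - mmcfg_base;
    struct mmcfg_decode d = {
        .bus = (off >> 20) & 0xff,
        .dev = (off >> 15) & 0x1f,
        .fn  = (off >> 12) & 0x7,
        .reg = off & 0xfff,
    };
    return d;
}
```

The resulting BDF is also what Xen would feed into its existing XEN_DMOP_IO_RANGE_PCI routing to pick the right IOREQ server.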

Maybe you could detect offsets >= 256 and replay them in QEMU like
mmio accesses? Using the address_space_write or
pcie_mmcfg_data_read/write functions?

I have to admit my knowledge of QEMU is quite limited, so I'm not sure
of the best way to handle this.

Ideally we should find a way that doesn't involve having to modify
each chipset to handle MCFG accesses from Xen. It would be nice to
have some kind of interface inside of QEMU so all chipsets can
register MCFG areas or modify them, but this is out of the scope of
this work.

Regardless of how this ends up being implemented inside of QEMU I
think the above approach is the right one from an architectural PoV.

AFAICT there are still some reserved bits in the ioreq struct that you
could use to signal 'this is a MCFG PCI access' if required.

> Approach #2. Handling MMCONFIG area inside QEMU using usual MMIO
> emulation:
> 
> 1. QEMU will trap accesses to PCIEXBAR (or whatever else possibly
> supported in the future like HECBASE), eventually asking Xen to map the
> MMCONFIG MMIO range for ioreq servicing just like it does for any
> other emulated MMIO range, via map_io_range_to_ioreq_server(). All
> changes in MMCONFIG placement/status will lead to remapping/unmapping
> the MMIO range.
> 
> 2. Xen will trap MMIO accesses to this area and forward them to QEMU as
> MMIO (IOREQ_TYPE_COPY) ioreqs
> 
> 3. QEMU will receive these accesses and pass them to the existing
> MMCONFIG emulation -- pcie_mmcfg_data_read/write handlers, finally
> resulting in same xen_host_pci_* function calls as before.
> 
> This approach works "right out of the box", no changes needed for either
> Xen or QEMU. As both _PCI_CONFIG and MMIO type ioreqs are processed,
> either method can be used to access PCI/extended config space --
> CF8/CFC port I/O or MMIO accesses to MMCONFIG.
> 
> IOREQ routing for multiple device emulators can be supported too. In
> fact, the same mmconfig dmops/hypercalls can be added to let Xen know
> where MMCONFIG area resides, Xen will use this information to forward
> MMCONFIG MMIO ioreqs accordingly to BDF of the address. The difference
> with the approach #1 is that these interfaces are now completely
> optional when we use MMIO ioreqs for MMCONFIG on vanilla Xen/QEMU.
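A sketch of what the PCIEXBAR write handler in step 1 of the quoted approach has to derive before remapping the range via map_io_range_to_ioreq_server(); the helper names and mask handling below are illustrative, not the actual QEMU Q35 code:

```c
#include <assert.h>
#include <stdint.h>

struct mmcfg_window {
    int enabled;
    uint64_t base;
    uint64_t size;
};

/* Derive the guest MMCONFIG window from a PCIEXBAR write (bit 0 =
 * enable, bits 2:1 = window length; encoding 3 is reserved). */
static struct mmcfg_window mmcfg_window_from_pciexbar(uint64_t reg)
{
    static const uint64_t sizes[4] = {
        256ULL << 20, 128ULL << 20, 64ULL << 20, 0
    };
    struct mmcfg_window w = { 0 };

    w.size = sizes[(reg >> 1) & 3];
    w.enabled = (reg & 1) && w.size;
    if (w.enabled)
        w.base = reg & ~(w.size - 1);

    return w;
}

/* Any move/resize/disable means unmapping the old MMIO range before
 * mapping the new one with map_io_range_to_ioreq_server(). */
static int mmcfg_window_needs_remap(const struct mmcfg_window *old,
                                    const struct mmcfg_window *new)
{
    return old->enabled != new->enabled ||
           old->base != new->base || old->size != new->size;
}
```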

As said above, if you forward MCFG accesses as IOREQ_TYPE_COPY you are
forcing each IOREQ server to know the position of the MCFG area in
order to do the decoding, this is not acceptable IMO.

> The question is why IOREQ_TYPE_COPY -> IOREQ_TYPE_PCI_CONFIG
> translation is a must have thing at all? It won't make handling simpler.
> For current QEMU implementation IOREQ_TYPE_COPY (MMIO accesses for
> MMCONFIG) would be preferable as it allows to use the existing code.

Granted it's likely easier to implement, but it's also incorrect. You
seem to have in mind the picture of a single IOREQ server (QEMU)
handling all the devices.

Although this is the most common scenario, it's not the only one
supported by Xen. Your proposed solution breaks the usage of multiple
IOREQ servers as PCI device emulators.

> I think it will be safe to use MMCONFIG emulation on MMIO level for now
> and later extend it with 'set_mmconfig_' dmop/hypercall for the
> 'multiple device emulators' IOREQ_TYPE_COPY routing to work same as for
> PCI conf, so it can be used by XenGT etc on Q35 as well.

>I'm afraid this kind of issue would have been much easier to
>identify if a design document for this feature had been sent to the
>list prior to its implementation.

Regarding whether to accept something like this, I'm not really in
favor, but IMO it depends on how much new code is added to handle this
incorrect usage that would then go away (or would have to be changed)
in order to handle the proper implementation.

Thanks, Roger.


* Re: [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring
  2018-03-21  9:09           ` Roger Pau Monné
@ 2018-03-21  9:36             ` Paul Durrant
  2018-03-21 14:35               ` Alexey G
  2018-03-21 14:25             ` Alexey G
  1 sibling, 1 reply; 183+ messages in thread
From: Paul Durrant @ 2018-03-21  9:36 UTC (permalink / raw)
  To: Roger Pau Monne, Alexey G
  Cc: xen-devel, Ian Jackson, Wei Liu, Jan Beulich, Andrew Cooper

> -----Original Message-----
> 
> > The question is why IOREQ_TYPE_COPY -> IOREQ_TYPE_PCI_CONFIG
> > translation is a must have thing at all? It won't make handling simpler.
> > For current QEMU implementation IOREQ_TYPE_COPY (MMIO accesses for
> > MMCONFIG) would be preferable as it allows to use the existing code.
> 
> Granted it's likely easier to implement, but it's also incorrect. You
> seem to have in mind the picture of a single IOREQ server (QEMU)
> handling all the devices.
> 
> Although this is the most common scenario, it's not the only one
> supported by Xen. Your proposed solution breaks the usage of multiple
> IOREQ servers as PCI device emulators.
> 

Indeed it will, and that is not acceptable even in the short term.

> > I think it will be safe to use MMCONFIG emulation on MMIO level for now
> > and later extend it with 'set_mmconfig_' dmop/hypercall for the
> > 'multiple device emulators' IOREQ_TYPE_COPY routing to work same as for
> > PCI conf, so it can be used by XenGT etc on Q35 as well.
> 

Introducing known breakage is not really on, particularly when it can be avoided with a reasonable amount of extra work.

  Paul

> I'm afraid this kind of issue would have been much easier to
> identify if a design document for this feature had been sent to the
> list prior to its implementation.
> 
> Regarding whether to accept something like this, I'm not really in
> favor, but IMO it depends on how much new code is added to handle this
> incorrect usage that would then go away (or would have to be changed)
> in order to handle the proper implementation.
> 
> Thanks, Roger.


* Re: [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring
  2018-03-21  9:09           ` Roger Pau Monné
  2018-03-21  9:36             ` Paul Durrant
@ 2018-03-21 14:25             ` Alexey G
  2018-03-21 14:54               ` Paul Durrant
  2018-03-21 15:20               ` Roger Pau Monné
  1 sibling, 2 replies; 183+ messages in thread
From: Alexey G @ 2018-03-21 14:25 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Ian Jackson,
	Paul Durrant, Jan Beulich, Anthony Perard, xen-devel

On Wed, 21 Mar 2018 09:09:11 +0000
Roger Pau Monné <roger.pau@citrix.com> wrote:

>On Wed, Mar 21, 2018 at 10:58:40AM +1000, Alexey G wrote:
[...]
>> According to public slides for the feature, both PCI conf and MMIO
>> accesses can be routed to the designated device model. It looks like
>> for this particular setup it doesn't really matter which particular
>> ioreq type must be used for MMCONFIG accesses -- either
>> IOREQ_TYPE_PCI_CONFIG or IOREQ_TYPE_COPY (MMIO accesses) should be
>> acceptable.  
>
>Isn't that going to be quite messy? How is the IOREQ server supposed
>to decode a MCFG access received as IOREQ_TYPE_COPY?

This code is already available and in sync with QEMU legacy PCI conf
emulation infrastructure.

>I don't think the IOREQ server needs to know the start of the MCFG
>region, in which case it won't be able to detect and decode the
>access if it's of type IOREQ_TYPE_COPY.

How do you think Xen will be able to know whether an arbitrary MMIO
access targets the MMCONFIG area, and to which BDF the offset within
that area belongs, without knowing where MMCONFIG is located and what
the PCI bus layout is? It's QEMU that emulates PCIEXBAR and can tell
Xen where MMCONFIG is expected to be.

>MCFG accesses need to be sent to the IOREQ server as
>IOREQ_TYPE_PCI_CONFIG, or else you are forcing each IOREQ server to
>know the position of the MCFG area in order to do the decoding. In
>your case this would work because QEMU controls the position of the
>MCFG region, but there's no need for other IOREQ servers to know the
>position of the MCFG area.
>
>> The only thing which matters is ioreq routing itself --
>> making decisions to which device model the PCI conf/MMIO ioreq should
>> be sent.  
>
>Hm, see above, but I'm fairly sure you need to forward those MCFG
>accesses as IOREQ_TYPE_PCI_CONFIG to the IOREQ server.

(a detailed answer below)

>> >Traditional PCI config space accesses are not IO port space
>> >accesses.  
>> 
>> (assuming 'not' mistyped here)  
>
>Not really, this should instead be:
>
>"Traditional PCI config space accesses are not forwarded to the IOREQ
>server as IO port space accesses (IOREQ_TYPE_PIO) but rather as PCI
>config space accesses (IOREQ_TYPE_PCI_CONFIG)."
>
>Sorry for the confusion.
>
>> >The IOREQ code in Xen detects accesses to ports 0xcf8/0xcfc and
>> >IOREQ servers can register devices they would like to receive
>> >configuration space accesses for. QEMU is already making use of
>> >this, see for  
>> 
>> That's one of the reasons why the current IOREQ_TYPE_PCI_CONFIG
>> implementation is a bit inconvenient for MMCONFIG MMIO accesses --
>> it's too CF8h/CFCh-centric, and it might be painful to change code
>> which was intended for CF8h/CFCh handling (and not for MMIO
>> processing).  
>
>I'm not sure I follow. Do you mean that changes should be made to the
>ioreq struct in order to forward MCFG accesses using
>IOREQ_TYPE_PCI_CONFIG as it's type?

No changes for ioreq structures needed for now.

>> It will be handled by IOREQ too, just using a different IOREQ type
>> (MMIO one). The basic question is why do we have to stick to PCI conf
>> space ioreqs for emulating MMIO accesses to MMCONFIG.  
>
>Because other IOREQ servers don't need to know about the position/size
>of the MCFG area, and cannot register MMIO ranges that cover their
>device's PCI configuration space in the MCFG region.
>
>Not to mention that it would would be a terrible design flaw to force
>IOREQ servers to register PCI devices and MCFG areas belonging to
>those devices separately as MMIO in order to trap all possible PCI
>configuration space accesses.

The PCI conf space layout is shared by the emulated machine, and the
MMCONFIG layout is mandated by this common PCI bus map.

Even if those 'multiple device models' see a different picture of PCI
conf space, their visions of the PCI bus must not overlap, and the
MMCONFIG layout must be consistent between the different device models.

Still, it is a mistake to think of the emulated PCI bus as a set of
distinct PCI devices unrelated to each other -- it is all coupled
together, and this is especially true for PCIe. Many PCIe features
rely on device interaction within the PCIe fabric, eg. PCIe endpoints
may interact with the Root Complex in many ways. This cooperation may
need to be emulated somehow, eg. to provide some support for PM
features, link management or native hotplug facilities. Even for a
real passed-through device, we might need to provide an emulated PCIe
Switch or Root Port for it to function properly within the PCIe
hierarchy.

Dedicating an isolated PCI device to some isolated device model --
that's what might be the design flaw, considering the PCIe world.

[...]
>
>Maybe you could detect offsets >= 256 and replay them in QEMU like
>mmio accesses? Using the address_space_write or
>pcie_mmcfg_data_read/write functions?
>I have to admit my knowledge of QEMU is quite limited, so I'm not sure
>of the best way to handle this.
>
>Ideally we should find a way that doesn't involve having to modify
>each chipset to handle MCFG accesses from Xen. It would be nice to
>have some kind of interface inside of QEMU so all chipsets can
>register MCFG areas or modify them, but this is out of the scope of
>this work.

Roger, Paul,

Here is what you suggest, just to clarify:

1. Add to Xen a new hypercall (+corresponding dmop) so QEMU can tell
Xen where it emulates the machine's MMCONFIG (chipset-specific
emulation of PCIEXBAR/HECBASE/etc mmcfg relocation). Xen will rely on
this information to know to which PCI device an address within
MMCONFIG belongs.

2. Xen will trap this area, and re-trap it at another address whenever
QEMU informs Xen of an emulated PCIEXBAR value change

3. Every MMIO access to the current MMCONFIG range will be converted
into BDF first (by offset within this range, knowing where the range is)

4. Target device model is selected using calculated BDF

5. MMIO read/write accesses are converted into PCI config space ioreqs
(as if it were a CF8h/CFCh operation instead of an MMIO access). The
ioreq structure already allows an extended (12-bit) PCI conf offset to
be specified, so it will fit into a PCI conf ioreq. For now let's
assume that eg. a 64-bit memory operation is either aborted or worked
around by splitting it into multiple PCI conf ioreqs.

6. PCI conf read/write ioreqs are sent to the chosen device model

7. QEMU receives MMCONFIG memory reads/writes as PCI conf reads/writes

8. As these MMCONFIG PCI conf reads arrive out of context (just
address/len/data, with no emulated device attached to them), xen-hvm.c
will have to employ special logic to make them QEMU-friendly -- eg.
right now it replays a received PCI conf access through the
(QEMU-emulated) CF8h/CFCh ports.
Embedding these "naked" accesses into the QEMU infrastructure is a
real problem, and workarounds are required. BTW, find_primary_bus()
was dropped from the QEMU code -- it could have been useful here.
Let's assume some workaround is employed (like storing the required
object pointers in global variables for later use in xen-hvm.c)

9. Existing MMCONFIG-handling code in QEMU will be unused in this
scenario

10. All this needed primarily to make the specific "Multiple device
emulators" feature to work (XenGT was mentioned as its user) on Q35
with MMCONFIG.

Anything wrong/missing here?
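The multi-ioreq pitfall mentioned in step 5 can be illustrated with a small helper that counts how many dword-bounded PCI conf ioreqs a single guest memory operation would need (a sketch; the splitting policy -- at most 4 bytes per ioreq, never crossing a dword boundary -- is an assumption, not defined Xen behaviour):

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Count the PCI conf ioreqs needed for a (reg, size) config access if
 * each ioreq carries at most 4 bytes and must not cross a 4-byte
 * boundary.  A single 64-bit MMIO read of MMCONFIG thus becomes two
 * ioreqs; a misaligned dword access also splits in two. */
static size_t pci_conf_ioreqs_needed(uint16_t reg, size_t bytes)
{
    size_t n = 0;

    while (bytes) {
        size_t chunk = 4 - (reg & 3);   /* stay within the dword */
        if (chunk > bytes)
            chunk = bytes;
        reg += chunk;
        bytes -= chunk;
        n++;
    }
    return n;
}
```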

(Adding Stefano and Anthony as xen-hvm.c mentioned)


Here is another suggestion:

1. QEMU uses the existing facilities to emulate PCIEXBAR for a Q35
machine, calling Xen's map_io_range_to_ioreq_server() API to mark the
MMIO range for emulation, just like for any other emulated MMIO range

2. All accesses to this area will be forwarded to QEMU as MMIO ioreqs
and emulated flawlessly, as everything stays within the QEMU
architecture -- the pci-host/PCIBus/PCIDevice machinery is in place.
No workarounds required for xen-hvm.c

3. CF8/CFC accesses will be forwarded as _PCI_CONFIG ioreqs, as usual.
Both methods stay in sync as they use the common PCI emulation
infrastructure in QEMU

4. At this point absolutely zero changes are required in both Xen and
QEMU code. Only existing interfaces are used. In fact, no related code
changes required at all except a bugfix for PCIEXBAR mask emulation
(provided in this series)

5. But, just to make the 'multiple device emulators' feature work (no
other reason so far), we add the same hypercall/dmop usage to let Xen
know where QEMU emulates MMCONFIG

6. Xen will continue to trap accesses to this range, but instead of
sending a _COPY ioreq immediately it will check the address against
the known MMCONFIG location (in the same manner as above), then
convert the offset within it to a BDF and proceed with the usual
BDF-based ioreq routing for those device-emulator DMs, whatever they
are

7. In fact, MMIO -> PCI conf ioreq translation can be freely used as
well at this stage, if it is more convenient for 'multiple device
emulators' feature users. It can even be made selectable.

So, the question which needs an answer is: why do you think MMIO ->
PCI conf ioreq translation is a must-have for MMCONFIG? Can't we just
add a new hypercall/dmop to make ioreq routing for 'multiple device
emulators' work, while letting QEMU use any API provided for it to do
its tasks?

It's a bit odd to pretend that QEMU doesn't know anything about
MMCONFIG being MMIO when it's QEMU that informs Xen about its memory
address and size.

>Regardless of how this ends up being implemented inside of QEMU I
>think the above approach is the right one from an architectural PoV.
>
>AFAICT there are still some reserved bits in the ioreq struct that you
>could use to signal 'this is a MCFG PCI access' if required.
>
>> Approach #2. Handling MMCONFIG area inside QEMU using usual MMIO
>> emulation:
>> 
>> 1. QEMU will trap accesses to PCIEXBAR (or whatever else possibly
>> supported in the future like HECBASE), eventually asking Xen to map
>> the MMCONFIG MMIO range for ioreq servicing just like it does for any
>> other emulated MMIO range, via map_io_range_to_ioreq_server(). All
>> changes in MMCONFIG placement/status will lead to remapping/unmapping
>> the MMIO range.
>> 
>> 2. Xen will trap MMIO accesses to this area and forward them to QEMU
>> as MMIO (IOREQ_TYPE_COPY) ioreqs
>> 
>> 3. QEMU will receive these accesses and pass them to the existing
>> MMCONFIG emulation -- pcie_mmcfg_data_read/write handlers, finally
>> resulting in same xen_host_pci_* function calls as before.
>> 
>> This approach works "right out of the box", no changes needed for
>> either Xen or QEMU. As both _PCI_CONFIG and MMIO type ioreqs are
>> processed, either method can be used to access PCI/extended config
>> space -- CF8/CFC port I/O or MMIO accesses to MMCONFIG.
>> 
>> IOREQ routing for multiple device emulators can be supported too. In
>> fact, the same mmconfig dmops/hypercalls can be added to let Xen know
>> where MMCONFIG area resides, Xen will use this information to forward
>> MMCONFIG MMIO ioreqs accordingly to BDF of the address. The
>> difference with the approach #1 is that these interfaces are now
>> completely optional when we use MMIO ioreqs for MMCONFIG on vanilla
>> Xen/QEMU.  
>
>As said above, if you forward MCFG accesses as IOREQ_TYPE_COPY you are
>forcing each IOREQ server to know the position of the MCFG area in
>order to do the decoding, this is not acceptable IMO.
>
>> The question is why IOREQ_TYPE_COPY -> IOREQ_TYPE_PCI_CONFIG
>> translation is a must have thing at all? It won't make handling
>> simpler. For current QEMU implementation IOREQ_TYPE_COPY (MMIO
>> accesses for MMCONFIG) would be preferable as it allows to use the
>> existing code.  
>
>Granted it's likely easier to implement, but it's also incorrect. You
>seem to have in mind the picture of a single IOREQ server (QEMU)
>handling all the devices.
>
>Although this is the most common scenario, it's not the only one
>supported by Xen. Your proposed solution breaks the usage of multiple
>IOREQ servers as PCI device emulators.
>
>> I think it will be safe to use MMCONFIG emulation on MMIO level for
>> now and later extend it with 'set_mmconfig_' dmop/hypercall for the
>> 'multiple device emulators' IOREQ_TYPE_COPY routing to work same as
>> for PCI conf, so it can be used by XenGT etc on Q35 as well.  
>
>I'm afraid these kinds of issues would have been far easier to
>identify if a design document for this feature had been sent to the
>list prior to its implementation.
>
>Regarding whether to accept something like this, I'm not really in
>favor, but IMO it depends on how much new code is added to handle this
>incorrect usage that would then go away (or would have to be changed)
>in order to handle the proper implementation.
>
>Thanks, Roger.


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring
  2018-03-21  9:36             ` Paul Durrant
@ 2018-03-21 14:35               ` Alexey G
  2018-03-21 14:58                 ` Paul Durrant
  0 siblings, 1 reply; 183+ messages in thread
From: Alexey G @ 2018-03-21 14:35 UTC (permalink / raw)
  To: Paul Durrant
  Cc: Wei Liu, Andrew Cooper, Jan Beulich, Ian Jackson, xen-devel,
	Roger Pau Monne

On Wed, 21 Mar 2018 09:36:04 +0000
Paul Durrant <Paul.Durrant@citrix.com> wrote:
>> 
>> Although this is the most common scenario, it's not the only one
>> supported by Xen. Your proposed solution breaks the usage of multiple
>> IOREQ servers as PCI device emulators.
>
>Indeed it will, and that is not acceptable even in the short term.

Hmm, what exactly are you rejecting? QEMU's usage of established
interfaces (provided by Xen) for QEMU to use? Is there any particular
reason why QEMU can use map_io_range_to_ioreq_server() in one case but
not in another? It's an API available to QEMU, after all.

If we actually switch to the approach of informing Xen about the
emulated MMCONFIG range (via a new dmop/hypercall), who should prevent
QEMU from actually mapping this range via map_io_range_to_ioreq_server()?
QEMU itself? Or Xen? How would that look -- "QEMU asks us to map this
range as emulated MMIO, but it previously told us that the emulated
PCIEXBAR register points there, so we won't allow it"?
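For context, the PCIEXBAR emulation being argued over amounts to a small register decode: the value QEMU emulates (and would report to Xen) directly yields the MMCONFIG base and window size. A rough sketch, per my reading of the Q35 register layout -- treat the bit positions and encodings as assumptions to be checked against the datasheet and QEMU's hw/pci-host/q35.c:

```python
PCIEXBAREN = 0x1  # assumed: bit 0 enables the MMCONFIG window

def pciexbar_decode(val):
    """Return (enabled, base, size) for a 64-bit PCIEXBAR value."""
    if not (val & PCIEXBAREN):
        return False, 0, 0
    sizes = {0: 256 << 20, 1: 128 << 20, 2: 64 << 20}  # assumed: bits 2:1 select length
    length_field = (val >> 1) & 0x3
    if length_field not in sizes:   # 11b would be a reserved encoding
        return False, 0, 0
    size = sizes[length_field]
    base = val & ~(size - 1)        # base is naturally aligned to the window size
    return True, base, size

# Example: a 256 MiB window enabled at 0xE0000000
assert pciexbar_decode(0xE0000001) == (True, 0xE0000000, 256 << 20)
```

Whichever side does the trapping, this is all the state Xen would need from QEMU: one (base, size) pair per PCIEXBAR update.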

>> > I think it will be safe to use MMCONFIG emulation on MMIO level
>> > for now and later extend it with 'set_mmconfig_' dmop/hypercall
>> > for the 'multiple device emulators' IOREQ_TYPE_COPY routing to
>> > work same as for PCI conf, so it can be used by XenGT etc on Q35
>> > as well.  
>Introducing known breakage is not really on, particularly when it can
>be avoided with a reasonable amount of extra work.

It's hard to break something which doesn't exist. :) The multiple
device emulators feature does not currently support translation/routing
of MMCONFIG MMIO accesses; that support must be designed first.
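For reference, the translation/routing that would need designing boils down to the standard ECAM decode: each function owns a 4 KiB window inside MMCONFIG, so the BDF falls directly out of the offset. A minimal sketch (the MMCONFIG base address used below is an assumed example value):

```python
# Standard PCIe ECAM decode: each function gets a 4 KiB config window,
# so an offset into the MMCONFIG area maps to bus/device/function/register.
def ecam_decode(addr, mmcfg_base):
    """Split an MMIO address inside MMCONFIG into (bus, dev, func, reg)."""
    offset = addr - mmcfg_base
    bus  = (offset >> 20) & 0xFF   # bits 27:20 select the bus
    dev  = (offset >> 15) & 0x1F   # bits 19:15 select the device
    func = (offset >> 12) & 0x07   # bits 14:12 select the function
    reg  = offset & 0xFFF          # bits 11:0 are the config-space offset
    return bus, dev, func, reg

# Example: register 0x100 of 00:02.0 -- an extended-capability offset
# that CF8h/CFCh port I/O cannot reach.
assert ecam_decode(0xE0000000 + (2 << 15) + 0x100, 0xE0000000) == (0, 2, 0, 0x100)
```

Once Xen knows the (base, size) of the emulated MMCONFIG, this decode is all that is needed to turn a trapped MMIO address into a BDF for ioreq routing.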


* Re: [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring
  2018-03-21 14:25             ` Alexey G
@ 2018-03-21 14:54               ` Paul Durrant
  2018-03-21 17:41                 ` Alexey G
  2018-03-21 15:20               ` Roger Pau Monné
  1 sibling, 1 reply; 183+ messages in thread
From: Paul Durrant @ 2018-03-21 14:54 UTC (permalink / raw)
  To: 'Alexey G', Roger Pau Monne
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Jan Beulich,
	Ian Jackson, Anthony Perard, xen-devel

> -----Original Message-----
> From: Alexey G [mailto:x1917x@gmail.com]
> Sent: 21 March 2018 14:26
> To: Roger Pau Monne <roger.pau@citrix.com>
> Cc: xen-devel@lists.xenproject.org; Andrew Cooper
> <Andrew.Cooper3@citrix.com>; Ian Jackson <Ian.Jackson@citrix.com>; Jan
> Beulich <jbeulich@suse.com>; Wei Liu <wei.liu2@citrix.com>; Paul Durrant
> <Paul.Durrant@citrix.com>; Anthony Perard <anthony.perard@citrix.com>;
> Stefano Stabellini <sstabellini@kernel.org>
> Subject: Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG
> area in the MMIO hole + minor code refactoring
> 
> On Wed, 21 Mar 2018 09:09:11 +0000
> Roger Pau Monné <roger.pau@citrix.com> wrote:
> 
> >On Wed, Mar 21, 2018 at 10:58:40AM +1000, Alexey G wrote:
> [...]
> >> According to public slides for the feature, both PCI conf and MMIO
> >> accesses can be routed to the designated device model. It looks like
> >> for this particular setup it doesn't really matter which particular
> >> ioreq type must be used for MMCONFIG accesses -- either
> >> IOREQ_TYPE_PCI_CONFIG or IOREQ_TYPE_COPY (MMIO accesses) should be
> >> acceptable.
> >
> >Isn't that going to be quite messy? How is the IOREQ server supposed
> >to decode a MCFG access received as IOREQ_TYPE_COPY?
> 
> This code is already available and in sync with QEMU legacy PCI conf
> emulation infrastructure.
> 
> >I don't think the IOREQ server needs to know the start of the MCFG
> >region, in which case it won't be able to detect and decode the
> >access if it's of type IOREQ_TYPE_COPY.
> 
> How do you think Xen will be able to know whether an arbitrary MMIO
> access targets the MMCONFIG area, and to which BDF the offset in this
> area belongs, without knowing where MMCONFIG is located and what the
> PCI bus layout is? It's QEMU who emulates PCIEXBAR and can tell Xen
> where MMCONFIG is expected to be.
> 
> >MCFG accesses need to be sent to the IOREQ server as
> >IOREQ_TYPE_PCI_CONFIG, or else you are forcing each IOREQ server to
> >know the position of the MCFG area in order to do the decoding. In
> >your case this would work because QEMU controls the position of the
> >MCFG region, but there's no need for other IOREQ servers to know the
> >position of the MCFG area.
> >
> >> The only thing which matters is ioreq routing itself --
> >> making decisions to which device model the PCI conf/MMIO ioreq should
> >> be sent.
> >
> >Hm, see above, but I'm fairly sure you need to forward those MCFG
> >accesses as IOREQ_TYPE_PCI_CONFIG to the IOREQ server.
> 
> (a detailed answer below)
> 
> >> >Traditional PCI config space accesses are not IO port space
> >> >accesses.
> >>
> >> (assuming 'not' mistyped here)
> >
> >Not really, this should instead be:
> >
> >"Traditional PCI config space accesses are not forwarded to the IOREQ
> >server as IO port space accesses (IOREQ_TYPE_PIO) but rather as PCI
> >config space accesses (IOREQ_TYPE_PCI_CONFIG)."
> >
> >Sorry for the confusion.
> >
> >> >The IOREQ code in Xen detects accesses to ports 0xcf8/0xcfc and
> >> >IOREQ servers can register devices they would like to receive
> >> >configuration space accesses for. QEMU is already making use of
> >> >this, see for
> >>
> >> That's one of the reasons why current IOREQ_TYPE_PCI_CONFIG
> >> implementation is a bit inconvenient for MMCONFIG MMIO accesses --
> >> it's too much CF8h/CFCh-centric in its implementation, might be
> >> painful to change something in the code which was intended for
> >> CF8h/CFCh handling (and not for MMIO processing).
> >
> >I'm not sure I follow. Do you mean that changes should be made to the
> >ioreq struct in order to forward MCFG accesses using
> >IOREQ_TYPE_PCI_CONFIG as it's type?
> 
> No changes for ioreq structures needed for now.
> 
> >> It will be handled by IOREQ too, just using a different IOREQ type
> >> (MMIO one). The basic question is why do we have to stick to PCI conf
> >> space ioreqs for emulating MMIO accesses to MMCONFIG.
> >
> >Because other IOREQ servers don't need to know about the position/size
> >of the MCFG area, and cannot register MMIO ranges that cover their
> >device's PCI configuration space in the MCFG region.
> >
> >Not to mention that it would would be a terrible design flaw to force
> >IOREQ servers to register PCI devices and MCFG areas belonging to
> >those devices separately as MMIO in order to trap all possible PCI
> >configuration space accesses.
> 
> The PCI conf space layout is shared by the emulated machine, and the
> MMCONFIG layout is mandated by this common PCI bus map.
> 
> Even if those 'multiple device models' see a different picture of PCI
> conf space, their views of the PCI bus must not overlap, and the
> MMCONFIG layout must be consistent between the different device models.
> 
> It is a terrible mistake, though, to think about the emulated PCI bus
> as if it were a set of distinct PCI devices unrelated to each other.
> It's all coupled together, and this is especially true for PCIe.
> Many PCIe features rely on PCIe device interaction in the PCIe fabric,
> eg. PCIe endpoints may interact with the Root Complex in many ways. This
> cooperation may need to be emulated somehow, eg. to provide some
> support for PM features, link management or native hotplug facilities.
> Even if we have a real passed-through device, we might need to provide
> an emulated PCIe Switch or a Root Port for it to function properly
> within the PCIe hierarchy.
> 
> Dedicating an isolated PCI device to some isolated device model --
> that's what might be the design flaw, considering the PCIe world.
> 

I think that is the crux of the problem. The current multi-ioreq-server scheme relies on being able to consider PCI devices as isolated from each other... and that is basically fine because we only use a single PCI bus with no bridges. Moving to PCIe will require more emulation in Xen, but I think that is the only way to do it properly.

> [...]
> >
> >Maybe you could detect offsets >= 256 and replay them in QEMU like
> >mmio accesses? Using the address_space_write or
> >pcie_mmcfg_data_read/write functions?
> >I have to admit my knowledge of QEMU is quite limited, so I'm not sure
> >of the best way to handle this.
> >
> >Ideally we should find a way that doesn't involve having to modify
> >each chipset to handle MCFG accesses from Xen. It would be nice to
> >have some kind of interface inside of QEMU so all chipsets can
> >register MCFG areas or modify them, but this is out of the scope of
> >this work.
> 
> Roger, Paul,
> 
> Here is what you suggest, just to clarify:
> 
> 1. Add to Xen a new hypercall (+ corresponding dmop) so QEMU can tell
> Xen where QEMU emulates the machine's MMCONFIG (chipset-specific
> emulation of PCIEXBAR/HECBASE/etc mmcfg relocation). Xen will rely on
> this information to know to which PCI device an address within
> MMCONFIG belongs.
> 
> 2. Xen will trap this area and remap the trapping to another address
> if QEMU informs Xen about an emulated PCIEXBAR value change
> 
> 3. Every MMIO access to the current MMCONFIG range will be converted
> into BDF first (by offset within this range, knowing where the range is)
> 
> 4. Target device model is selected using calculated BDF
> 
> 5. MMIO read/write accesses are converted into PCI config space ioreqs
> (as if it were a CF8/CFCh operation instead of an MMIO access). At this
> point the ioreq structure allows specifying an extended PCI conf offset
> (12-bit), so it will fit into a PCI conf ioreq. For now let's assume that
> eg. a 64-bit memory operation is either aborted or worked around by
> splitting the operation into multiple PCI conf ioreqs.
> 
> 6. PCI conf read/write ioreqs are sent to the chosen device model
> 
> 7. QEMU receives MMCONFIG memory reads/writes as PCI conf reads/writes
> 
> 8. As these MMCONFIG PCI conf reads occur out of context (just
> address/len/data without any emulated device attached), xen-hvm.c
> should employ special logic to make them QEMU-friendly -- eg. right now
> it sends a received PCI conf access to the (QEMU-emulated) CF8h/CFCh
> ports.
> Embedding these "naked" accesses into the QEMU infrastructure is a
> real problem; workarounds are required. BTW, find_primary_bus() was
> dropped from the QEMU code -- it could have been useful here. Let's
> assume some workaround is employed (like storing the required object
> pointers in global variables for later use in xen-hvm.c)
> 
> 9. Existing MMCONFIG-handling code in QEMU will be unused in this
> scenario
> 
> 10. All this needed primarily to make the specific "Multiple device
> emulators" feature to work (XenGT was mentioned as its user) on Q35
> with MMCONFIG.
> 
> Anything wrong/missing here?

That all sounds plausible. All we essentially need to do is make sure the config space transactions make it to the right device model in QEMU. If the emulation in Xen is comprehensive then I guess there is no reason for QEMU's idea of the bus topology and Xen's presentation of the bus topology to the guest to even match.

  Paul
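The dispatch Paul describes -- getting each config-space transaction to the right device model -- can be sketched as a BDF-keyed lookup. This is purely illustrative; the registry and server names below are hypothetical, not Xen's actual ioreq-server API:

```python
# Hypothetical sketch: after decoding an MMCONFIG access to a BDF, Xen
# picks the IOREQ server that claimed that (segment, bus, device,
# function); unclaimed devices fall back to the default server (QEMU).
class IoreqRouter:
    def __init__(self, default_server):
        self.default = default_server
        self.claims = {}            # (seg, bus, dev, func) -> server

    def claim(self, server, seg, bus, dev, func):
        self.claims[(seg, bus, dev, func)] = server

    def route(self, seg, bus, dev, func):
        return self.claims.get((seg, bus, dev, func), self.default)

router = IoreqRouter("qemu")
router.claim("xengt-dm", 0, 0, 2, 0)      # e.g. a vGPU emulator claims 00:02.0
assert router.route(0, 0, 2, 0) == "xengt-dm"
assert router.route(0, 0, 0x1F, 0) == "qemu"
```

The point of routing on BDF rather than on raw MMIO address is exactly that secondary emulators never need to know where the MCFG window sits.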

> 
> (Adding Stefano and Anthony as xen-hvm.c mentioned)
> 
> 
> Here is another suggestion:
> 
> 1. QEMU use existing facilities to emulate PCIEXBAR for a Q35
> machine, calling Xen's map_io_range_to_ioreq_server() API to mark MMIO
> range for emulation, just like for any other emulated MMIO range
> 
> 2. All accesses to this area will be forwarded to QEMU as MMIO ioreqs
> and emulated flawlessly as everything is within QEMU architecture --
> pci-host/PCIBus/PCIDevice machinery in place. No workarounds required
> for xen-hvm.c
> 
> 3. CF8/CFC accesses will be forwarded as _PCI_CONFIG ioreqs, as usually.
> Both methods are in sync as they use common PCI emulation
> infrastructure in QEMU
> 
> 4. At this point absolutely zero changes are required in both Xen and
> QEMU code. Only existing interfaces are used. In fact, no related code
> changes required at all except a bugfix for PCIEXBAR mask emulation
> (provided in this series)
> 
> 5. But, just to make the 'multiple device emulators' feature work (no
> other reasons so far), we add the same hypercall/dmop usage to let
> Xen know where QEMU emulates MMCONFIG
> 
> 6. Xen will continue to trap accesses to this range, but instead of
> sending a _COPY ioreq immediately, it will check the address against
> the known MMCONFIG location (in the same manner as above), then convert
> the offset within it to a BDF, and proceed to the usual BDF-based ioreq
> routing for those device emulator DMs, whatever they are
> 
> 7. In fact, MMIO -> PCI conf ioreq translation can be freely used as
> well at this stage, if it is more convenient for 'multiple device
> emulators' feature users. It can be even made selectable.
> 
> So, the question which needs explanation is: why do you think MMIO->PCI
> conf ioreq translation is mandatory for MMCONFIG? Can't we just add a
> new hypercall/dmop to make ioreq routing for 'multiple device
> emulators' work, while letting QEMU use any API provided to it?
> 
> It's kinda funny to pretend that QEMU doesn't know anything about
> MMCONFIG being MMIO when it's QEMU who informs Xen about its memory
> address and size.
> 
> >Regardless of how this ends up being implemented inside of QEMU I
> >think the above approach is the right one from an architectural PoV.
> >
> >AFAICT there are still some reserved bits in the ioreq struct that you
> >could use to signal 'this is a MCFG PCI access' if required.
> >
> >> Approach #2. Handling MMCONFIG area inside QEMU using usual MMIO
> >> emulation:
> >>
> >> 1. QEMU will trap accesses to PCIEXBAR (or whatever else possibly
> >> supported in the future like HECBASE), eventually asking Xen to map
> >> the MMCONFIG MMIO range for ioreq servicing just like it does for any
> >> other emulated MMIO range, via map_io_range_to_ioreq_server(). All
> >> changes in MMCONFIG placement/status will lead to
> >> remapping/unmapping the MMIO range.
> >>
> >> 2. Xen will trap MMIO accesses to this area and forward them to QEMU
> >> as MMIO (IOREQ_TYPE_COPY) ioreqs
> >>
> >> 3. QEMU will receive these accesses and pass them to the existing
> >> MMCONFIG emulation -- pcie_mmcfg_data_read/write handlers, finally
> >> resulting in same xen_host_pci_* function calls as before.
> >>
> >> This approach works "right out of the box", no changes needed for
> >> either Xen or QEMU. As both _PCI_CONFIG and MMIO type ioreqs are
> >> processed, either method can be used to access PCI/extended config
> >> space -- CF8/CFC port I/O or MMIO accesses to MMCONFIG.
> >>
> >> IOREQ routing for multiple device emulators can be supported too. In
> >> fact, the same mmconfig dmops/hypercalls can be added to let Xen know
> >> where MMCONFIG area resides, Xen will use this information to forward
> >> MMCONFIG MMIO ioreqs accordingly to BDF of the address. The
> >> difference with the approach #1 is that these interfaces are now
> >> completely optional when we use MMIO ioreqs for MMCONFIG on vanilla
> >> Xen/QEMU.
> >
> >As said above, if you forward MCFG accesses as IOREQ_TYPE_COPY you are
> >forcing each IOREQ server to know the position of the MCFG area in
> >order to do the decoding, this is not acceptable IMO.
> >
> >> The question is why IOREQ_TYPE_COPY -> IOREQ_TYPE_PCI_CONFIG
> >> translation is a must have thing at all? It won't make handling
> >> simpler. For current QEMU implementation IOREQ_TYPE_COPY (MMIO
> >> accesses for MMCONFIG) would be preferable as it allows to use the
> >> existing code.
> >
> >Granted it's likely easier to implement, but it's also incorrect. You
> >seem to have in mind the picture of a single IOREQ server (QEMU)
> >handling all the devices.
> >
> >Although this is the most common scenario, it's not the only one
> >supported by Xen. Your proposed solution breaks the usage of multiple
> >IOREQ servers as PCI device emulators.
> >
> >> I think it will be safe to use MMCONFIG emulation on MMIO level for
> >> now and later extend it with 'set_mmconfig_' dmop/hypercall for the
> >> 'multiple device emulators' IOREQ_TYPE_COPY routing to work same as
> >> for PCI conf, so it can be used by XenGT etc on Q35 as well.
> >
> >I'm afraid this kind of issues would have been fairly easier to
> >identify if a design document for this feature was sent to the list
> >prior to it's implementation.
> >
> >Regarding whether to accept something like this, I'm not really in
> >favor, but IMO it depends on how much new code is added to handle this
> >incorrect usage that would then go away (or would have to be changed)
> >in order to handle the proper implementation.
> >
> >Thanks, Roger.



* Re: [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring
  2018-03-21 14:35               ` Alexey G
@ 2018-03-21 14:58                 ` Paul Durrant
  0 siblings, 0 replies; 183+ messages in thread
From: Paul Durrant @ 2018-03-21 14:58 UTC (permalink / raw)
  To: 'Alexey G'
  Cc: Wei Liu, Andrew Cooper, Jan Beulich, Ian Jackson, xen-devel,
	Roger Pau Monne

> -----Original Message-----
> From: Alexey G [mailto:x1917x@gmail.com]
> Sent: 21 March 2018 14:35
> To: Paul Durrant <Paul.Durrant@citrix.com>
> Cc: Roger Pau Monne <roger.pau@citrix.com>; xen-
> devel@lists.xenproject.org; Andrew Cooper <Andrew.Cooper3@citrix.com>;
> Ian Jackson <Ian.Jackson@citrix.com>; Jan Beulich <jbeulich@suse.com>;
> Wei Liu <wei.liu2@citrix.com>
> Subject: Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG
> area in the MMIO hole + minor code refactoring
> 
> On Wed, 21 Mar 2018 09:36:04 +0000
> Paul Durrant <Paul.Durrant@citrix.com> wrote:
> >>
> >> Although this is the most common scenario, it's not the only one
> >> supported by Xen. Your proposed solution breaks the usage of multiple
> >> IOREQ servers as PCI device emulators.
> >
> >Indeed it will, and that is not acceptable even in the short term.
> 
> Hmm, what exactly you are rejecting? QEMU's usage of established (and
> provided by Xen) interfaces for QEMU to use? Any particular reason why
> QEMU can use map_io_range_to_ioreq_server() in one case and can't in
> another? It's API available for QEMU after all.
> 
> If we actually switch to the emulated MMCONFIG range informing approach
> for Xen (via a new dmop/hypercall), who should prevent QEMU to actually
> map this range via map_io_range_to_ioreq_server? QEMU itself? Or Xen?

Xen internal emulation always trumps any external emulator, so even if QEMU maps an MMIO range it will not see any accesses if Xen is handling emulation of that range.

> How to will look, "QEMU asks us to map this range as emulated MMIO, but
> he previously told us that emulated PCIEXBAR register points there, so
> we won't allow him to do it"?
> 
> >> > I think it will be safe to use MMCONFIG emulation on MMIO level
> >> > for now and later extend it with 'set_mmconfig_' dmop/hypercall
> >> > for the 'multiple device emulators' IOREQ_TYPE_COPY routing to
> >> > work same as for PCI conf, so it can be used by XenGT etc on Q35
> >> > as well.
> >Introducing known breakage is not really on, particularly when it can
> >be avoided with a reasonable amount of extra work.
> 
> It's hard to break something which doesn't exist. :) Multiple device
> emulators feature do not support translation/routing of MMCONFIG MMIO
> accesses currently, it must be designed first.

Indeed, but updating to a new chipset emulation that breaks existing functionality is not going to be helpful.

  Paul


* Re: [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring
  2018-03-21 14:25             ` Alexey G
  2018-03-21 14:54               ` Paul Durrant
@ 2018-03-21 15:20               ` Roger Pau Monné
  2018-03-21 16:56                 ` Alexey G
  1 sibling, 1 reply; 183+ messages in thread
From: Roger Pau Monné @ 2018-03-21 15:20 UTC (permalink / raw)
  To: Alexey G
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Ian Jackson,
	Paul Durrant, Jan Beulich, Anthony Perard, xen-devel

On Thu, Mar 22, 2018 at 12:25:40AM +1000, Alexey G wrote:
> Roger, Paul,
> 
> Here is what you suggest, just to clarify:
> 
> 1. Add to Xen a new hypercall (+corresponding dmop) so QEMU can tell
> Xen where QEMU emulates machine's MMCONFIG (chipset-specific emulation
> of PCIEXBAR/HECBASE/etc mmcfg relocation). Xen will rely on this
> information to know to which PCI device the address within MMCONFIG
> belong.
> 
> 2. Xen will trap this area + remap its trapping to other address if QEMU
> will inform Xen about emulated PCIEXBAR value change
> 
> 3. Every MMIO access to the current MMCONFIG range will be converted
> into BDF first (by offset within this range, knowing where the range is)
> 
> 4. Target device model is selected using calculated BDF
> 
> 5. MMIO read/write accesses are converted into PCI config space ioreqs
> (like it was a CF8/CFCh operation instead of MMIO access). At this
> point ioreq structure allows to specify extended PCI conf offset
> (12-bit), so it will fit into PCI conf ioreq. For now let's assume that
> eg. a 64-bit memory operation is either aborted or workarounded by
> splitting this operation into multiple PCI conf ioreqs.

Why can't you just set size = 8 in that case in the ioreq?

QEMU should then reject those if the chipset doesn't support 64bit
accesses. I cannot find in the spec any mention of whether this
chipset supports 64bit MCFG accesses, and according to the PCIe spec
64bit accesses to MCFG should not be used unless the chipset is known
to handle them correctly.
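For what it's worth, the splitting fallback mentioned above is mechanical; a sketch (hypothetical helper, not Xen code) of breaking a wide MCFG access into config-space chunks of at most 4 bytes, as a chipset that only handles 32-bit MCFG cycles would require:

```python
# Split a wide MMCONFIG access into accesses of at most 4 bytes each,
# to be issued as individual PCI-config ioreqs.
def split_access(offset, size):
    """Yield (offset, size) pairs no wider than 4 bytes."""
    parts = []
    while size > 0:
        chunk = min(4, size)
        parts.append((offset, chunk))
        offset += chunk
        size -= chunk
    return parts

# A 64-bit access at offset 0x100 becomes two 32-bit config accesses.
assert split_access(0x100, 8) == [(0x100, 4), (0x104, 4)]
```

Whether the split preserves the guest-visible semantics of a 64-bit cycle is a separate question, which is presumably why rejecting such accesses is also on the table.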

> 6. PCI conf read/write ioreqs are sent to the chosen device model
> 
> 7. QEMU receive MMCONFIG memory reads/writes as PCI conf reads/writes
> 
> 8. As these MMCONFIG PCI conf reads occur out of context (just
> address/len/data without any emulated device attached to it), xen-hvm.c
> should employ special logic to make it QEMU-friendly -- eg. right now
> it sends received PCI conf access into (emulated by QEMU) CF8h/CFCh
> ports.
> There is a real problem to embed these "naked" accesses into QEMU
> infrastructure, workarounds are required. BTW, find_primary_bus() was
> dropped from QEMU code -- it could've been useful here. Let's assume
> some workaround is employed (like storing a required object pointers in
> global variables for later use in xen-hvm.c)

That seems like a minor nit, but why not just use
address_space_{read/write} to replay the MCFG accesses as memory
read/writes?

> 
> 9. Existing MMCONFIG-handling code in QEMU will be unused in this
> scenario

If you replay the read/write I don't think so. In any case this is
irrelevant. QEMU CPU emulation code is also unused when running under
Xen.

> 10. All this needed primarily to make the specific "Multiple device
> emulators" feature to work (XenGT was mentioned as its user) on Q35
> with MMCONFIG.
> 
> Anything wrong/missing here?

I think that's correct.

Thanks, Roger.


* Re: [RFC PATCH 08/12] libxl: Q35 support (new option device_model_machine)
  2018-03-19 22:11     ` Alexey G
  2018-03-20  9:11       ` Roger Pau Monné
@ 2018-03-21 16:25       ` Wei Liu
  1 sibling, 0 replies; 183+ messages in thread
From: Wei Liu @ 2018-03-21 16:25 UTC (permalink / raw)
  To: Alexey G; +Cc: xen-devel, Wei Liu, Ian Jackson, Roger Pau Monné

On Tue, Mar 20, 2018 at 08:11:49AM +1000, Alexey G wrote:
> >>          if (b_info->u.hvm.mmio_hole_memkb) {
> >>              uint64_t max_ram_below_4g = (1ULL << 32) -
> >> diff --git a/tools/libxl/libxl_types.idl
> >> b/tools/libxl/libxl_types.idl index 35038120ca..f3ef3cbdde 100644
> >> --- a/tools/libxl/libxl_types.idl
> >> +++ b/tools/libxl/libxl_types.idl
> >> @@ -101,6 +101,12 @@ libxl_device_model_version =
> >> Enumeration("device_model_version", [ (2, "QEMU_XEN"),             #
> >> Upstream based qemu-xen device model ])
> >>  
> >> +libxl_device_model_machine = Enumeration("device_model_machine", [
> >> +    (0, "UNKNOWN"),  
> >
> >Shouldn't this be named DEFAULT?
> 
> "Unknown" here should be read as "unspecified", but I guess DEFAULT
> will be clearer anyway.
> 

I'm afraid the ship has already sailed. There are far too many UNKNOWNs
in libxl_types.idl so we might as well stick to it here.

Wei.


* Re: [RFC PATCH 08/12] libxl: Q35 support (new option device_model_machine)
  2018-03-20  9:11       ` Roger Pau Monné
@ 2018-03-21 16:27         ` Wei Liu
  2018-03-21 17:03           ` Anthony PERARD
  0 siblings, 1 reply; 183+ messages in thread
From: Wei Liu @ 2018-03-21 16:27 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Ian Jackson, xen-devel, Wei Liu, Alexey G, Anthony PERARD

On Tue, Mar 20, 2018 at 09:11:10AM +0000, Roger Pau Monné wrote:
> On Tue, Mar 20, 2018 at 08:11:49AM +1000, Alexey G wrote:
> > On Mon, 19 Mar 2018 17:01:18 +0000
> > Roger Pau Monné <roger.pau@citrix.com> wrote:
> > 
> > >On Tue, Mar 13, 2018 at 04:33:53AM +1000, Alexey Gerasimenko wrote:
> > >> Provide a new domain config option to select the emulated machine
> > >> type, device_model_machine. It has following possible values:
> > >> - "i440" - i440 emulation (default)
> > >> - "q35" - emulate a Q35 machine. By default, the storage interface
> > >> is AHCI.  
> > >
> > >I would rather name this machine_chipset or device_model_chipset.
> > 
> > The device_model_ prefix is a must, I think -- multiple
> > device-model-related options have names starting with device_model_.
> > 
> > device_model_chipset... well, maybe, but we're actually specifying a
> > QEMU machine here. On the QEMU mailing list there was even a suggestion
> > to allow passing a machine version number here, like "pc-q35-2.10".
> > I think some opinions are needed here.
> 
> I'm not sure what a 'machine' is in QEMU speak, but in my mind I would
> consider PC a machine (vs ARM for example).
> 
> I think 'chipset' is clearer, but again others should express their
> opinion.

AIUI machine is a collection of chipset and peripherals, i.e. it covers
more than the chipset alone.

Cc Anthony for correction.

Wei.

> 
> Thanks, Roger.


* Re: [RFC PATCH 09/12] libxl: Xen Platform device support for Q35
  2018-03-19 15:05   ` Alexey G
@ 2018-03-21 16:32     ` Wei Liu
  0 siblings, 0 replies; 183+ messages in thread
From: Wei Liu @ 2018-03-21 16:32 UTC (permalink / raw)
  To: Alexey G; +Cc: xen-devel, Ian Jackson, Wei Liu

On Tue, Mar 20, 2018 at 01:05:32AM +1000, Alexey G wrote:
> On Tue, 13 Mar 2018 04:33:54 +1000
> Alexey Gerasimenko <x1917x@gmail.com> wrote:
> 
> >Current Xen/QEMU method to control Xen Platform device is a bit odd --
> >changing 'xen_platform_device' option value actually modifies QEMU
> >emulated machine type, namely xenfv <--> pc.
> >
> >In order to avoid multiplying machine types, use the new way to control
> >Xen Platform device for QEMU -- xen-platform-dev property. To maintain
> >backward compatibility with existing Xen/QEMU setups, this is only
> >applicable to q35 machine currently. i440 emulation uses the old method
> >(xenfv/pc machine) to control Xen Platform device, this may be changed
> >later to xen-platform-dev property as well.
> >
> >Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
> >---
> > tools/libxl/libxl_dm.c | 6 +++++-
> > 1 file changed, 5 insertions(+), 1 deletion(-)
> >
> >diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
> >index 7b531050c7..586035aa73 100644
> >--- a/tools/libxl/libxl_dm.c
> >+++ b/tools/libxl/libxl_dm.c
> >@@ -1444,7 +1444,11 @@ static int
> >libxl__build_device_model_args_new(libxl__gc *gc,
> >         break;
> >     case LIBXL_DOMAIN_TYPE_HVM:
> >         if (b_info->device_model_machine ==
> > LIBXL_DEVICE_MODEL_MACHINE_Q35) {
> >-            machinearg = libxl__sprintf(gc, "q35,accel=xen");
> >+            if (!libxl_defbool_val(b_info->u.hvm.xen_platform_pci)) {
> >+                machinearg = libxl__sprintf(gc, "q35,accel=xen");
> >+            } else {
> >+                machinearg = libxl__sprintf(gc,
> >"q35,accel=xen,xen-platform-dev=on");
> >+            }
> >         } else {
> >             if (!libxl_defbool_val(b_info->u.hvm.xen_platform_pci)) {
> >                 /* Switching here to the machine "pc" which does not
> > add
> 
> Regarding this one -- QEMU maintainers suggested that supplying '-device
> xen-platform' directly should be a better approach than a machine
> property, so this patch is kinda obsolete.

I agree with QEMU maintainers.

> 
> Right now "xenfv" machine usage for qemu-xen seems to be limited to
> controlling the Xen platform device and applying the HVM_MAX_VCPUS
> value to maxcpus + minor changes related to IGD passthrough. Both
> should be applicable for a "pc,accel=xen" machine as well I think, which
> in fact currently lacks the HVM_MAX_VCPUS check for some reason.
> 
> Adding a distinct method to control the Xen platform device for the q35
> machine suggests propagating the same approach to i440 machine types,
> but... it depends on who else can use xenfv for qemu-xen (not to be
> confused with xenfv usage on qemu-traditional).
> 
> Is there any other toolstacks/code which use xenfv machine solely to
> turn on/off Xen platform device?

Check libvirt?

Wei.


* Re: [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring
  2018-03-21 15:20               ` Roger Pau Monné
@ 2018-03-21 16:56                 ` Alexey G
  2018-03-21 17:06                   ` Paul Durrant
  2018-03-21 17:15                   ` Roger Pau Monné
  0 siblings, 2 replies; 183+ messages in thread
From: Alexey G @ 2018-03-21 16:56 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Ian Jackson,
	Paul Durrant, Jan Beulich, Anthony Perard, xen-devel

On Wed, 21 Mar 2018 15:20:17 +0000
Roger Pau Monné <roger.pau@citrix.com> wrote:

>On Thu, Mar 22, 2018 at 12:25:40AM +1000, Alexey G wrote:
>> Roger, Paul,
>> 
>> Here is what you suggest, just to clarify:
>> 
>> 1. Add to Xen a new hypercall (+corresponding dmop) so QEMU can tell
>> Xen where QEMU emulates machine's MMCONFIG (chipset-specific
>> emulation of PCIEXBAR/HECBASE/etc mmcfg relocation). Xen will rely
>> on this information to know to which PCI device the address within
>> MMCONFIG belong.
>> 
>> 2. Xen will trap this area + remap its trapping to other address if
>> QEMU will inform Xen about emulated PCIEXBAR value change
>> 
>> 3. Every MMIO access to the current MMCONFIG range will be converted
>> into BDF first (by offset within this range, knowing where the range
>> is)
>> 
>> 4. Target device model is selected using calculated BDF
>> 
>> 5. MMIO read/write accesses are converted into PCI config space
>> ioreqs (like it was a CF8/CFCh operation instead of MMIO access). At
>> this point ioreq structure allows to specify extended PCI conf offset
>> (12-bit), so it will fit into PCI conf ioreq. For now let's assume
>> that eg. a 64-bit memory operation is either aborted or workarounded
>> by splitting this operation into multiple PCI conf ioreqs.  
>
>Why can't you just set size = 8 in that case in the ioreq?
>
>QEMU should then reject those if the chipset doesn't support 64bit
>accesses. I cannot find in the spec any mention of whether this
>chipset supports 64bit MCFG accesses, and according to the PCIe spec
>64bit accesses to MCFG should not be used unless the chipset is known
>to handle them correctly.
Yes, in fact uint64_t should be enough in this particular case, though
the memory nature of MMCONFIG accesses might still require specific
handling.


All right then, so it will be a dmop/hypercall to tell Xen where to
trap MMIO accesses to MMCONFIG as you propose.

The primary device model (QEMU) will be emulating chipset-specific
PCIEXBAR/etc and issuing this new dmop to tell Xen which area it needs
to trap for MMIO MMCONFIG accesses. It's basically what
map_io_range_to_ioreq_server does currently, but I guess a new dedicated
dmop/hypercall is bearable.
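Once Xen knows the MMCONFIG base, the offset-to-BDF conversion in step 3 follows the standard PCIe ECAM layout: bus in bits 27:20 of the window offset, device in 19:15, function in 14:12, register in 11:0. A minimal sketch in C (purely illustrative, not actual Xen code; type and field names are made up):

```c
#include <assert.h>
#include <stdint.h>

/* Decode a trapped MMIO address inside the MMCONFIG window into
 * bus/device/function plus the (extended) config-space register
 * offset, per the standard PCIe ECAM layout. */
typedef struct {
    uint8_t  bus;   /* 0..255 */
    uint8_t  dev;   /* 0..31  */
    uint8_t  fn;    /* 0..7   */
    uint16_t reg;   /* 0..4095, covers extended config space */
} ecam_bdf_t;

static ecam_bdf_t ecam_decode(uint64_t mmcfg_base, uint64_t addr)
{
    uint64_t off = addr - mmcfg_base;
    ecam_bdf_t r = {
        .bus = (off >> 20) & 0xff,
        .dev = (off >> 15) & 0x1f,
        .fn  = (off >> 12) & 0x7,
        .reg = off & 0xfff,
    };
    return r;
}
```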

>> 6. PCI conf read/write ioreqs are sent to the chosen device model
>> 
>> 7. QEMU receive MMCONFIG memory reads/writes as PCI conf reads/writes
>> 
>> 8. As these MMCONFIG PCI conf reads occur out of context (just
>> address/len/data without any emulated device attached to it),
>> xen-hvm.c should employ special logic to make it QEMU-friendly --
>> eg. right now it sends received PCI conf access into (emulated by
>> QEMU) CF8h/CFCh ports.
>> There is a real problem to embed these "naked" accesses into QEMU
>> infrastructure, workarounds are required. BTW, find_primary_bus() was
>> dropped from QEMU code -- it could've been useful here. Let's assume
>> some workaround is employed (like storing a required object pointers
>> in global variables for later use in xen-hvm.c)  
>
>That seems like a minor nit, but why not just use
>address_space_{read/write} to replay the MCFG accesses as memory
>read/writes?

Well, this might work actually. Although the overall scenario will be
overcomplicated a bit for _PCI_CONFIG ioreqs. Here is how it will look:

QEMU receives PCIEXBAR update -> calls the new dmop to tell Xen the new
MMCONFIG address/size -> Xen (re)maps MMIO trapping area -> someone is
accessing this area -> Xen intercepts this MMIO access

But here's what happens next:

Xen translates MMIO access into PCI_CONFIG and sends it to DM ->
DM receives _PCI_CONFIG ioreq -> DM translates BDF/addr info back to
the offset in emulated MMCONFIG range -> DM calls
address_space_read/write to trigger MMIO emulation

I think some parts of this equation can be collapsed, can't they?

The above scenario makes it obvious that, at least for QEMU, the
MMIO->PCI conf translation is a redundant step. Why not allow the DM to
specify whether it prefers to receive MMCONFIG accesses natively (as
MMIO) or as translated PCI conf ioreqs? We can still route either ioreq
type to multiple device emulators accordingly.

This will be the most universal and consistent approach -- either _COPY
or _PCI_CONFIG-type ioreqs can be sent to the DM, whichever it prefers.

>> 9. Existing MMCONFIG-handling code in QEMU will be unused in this
>> scenario  
>
>If you replay the read/write I don't think so. In any case this is
>irrelevant. QEMU CPU emulation code is also unused when running under
>Xen.
>
>> 10. All this needed primarily to make the specific "Multiple device
>> emulators" feature to work (XenGT was mentioned as its user) on Q35
>> with MMCONFIG.
>> 
>> Anything wrong/missing here?  
>
>I think that's correct.
>
>Thanks, Roger.



* Re: [RFC PATCH 08/12] libxl: Q35 support (new option device_model_machine)
  2018-03-21 16:27         ` Wei Liu
@ 2018-03-21 17:03           ` Anthony PERARD
  0 siblings, 0 replies; 183+ messages in thread
From: Anthony PERARD @ 2018-03-21 17:03 UTC (permalink / raw)
  To: Wei Liu; +Cc: xen-devel, Ian Jackson, Alexey G, Roger Pau Monné

On Wed, Mar 21, 2018 at 04:27:43PM +0000, Wei Liu wrote:
> On Tue, Mar 20, 2018 at 09:11:10AM +0000, Roger Pau Monné wrote:
> > On Tue, Mar 20, 2018 at 08:11:49AM +1000, Alexey G wrote:
> > > On Mon, 19 Mar 2018 17:01:18 +0000
> > > Roger Pau Monné <roger.pau@citrix.com> wrote:
> > > 
> > > >On Tue, Mar 13, 2018 at 04:33:53AM +1000, Alexey Gerasimenko wrote:
> > > >> Provide a new domain config option to select the emulated machine
> > > >> type, device_model_machine. It has following possible values:
> > > >> - "i440" - i440 emulation (default)
> > > >> - "q35" - emulate a Q35 machine. By default, the storage interface
> > > >> is AHCI.  
> > > >
> > > >I would rather name this machine_chipset or device_model_chipset.
> > > 
> > > device_model_ prefix is a must I think -- multiple device model related
> > > options have names starting with device_model_.
> > > 
> > > device_model_chipset... well, maybe, but we're actually specifying a
> > > QEMU machine here. In QEMU mailing list there was even a suggestion
> > > to allow to pass a machine version number here, like "pc-q35-2.10".
> > > I think some opinions are needed here.
> > 
> > I'm not sure what a 'machine' is in QEMU speak, but in my mind I would
> > consider PC a machine (vs ARM for example).
> > 
> > I think 'chipset' is clearer, but again others should express their
> > opinion.
> 
> AIUI machine is a collection of chipset and peripherals, i.e. it covers
> more than the chipset alone.

The description of the QEMU machine "q35" is
"Standard PC (Q35 + ICH9, 2009)". So right in the description, Q35 is
not enough to describe what -machine=q35 is about. FYI, the description
of "pc" (pc_piix) is "Standard PC (i440FX + PIIX, 1996)".

Also, we could expand the option to actually allow a user to select the
exact version of the QEMU machine to use. Having "pc-i440fx-2.12" in
the xl config file instead of just "pc" could prevent compatibility
issues for an existing virtual machine.

I don't know what a chipset is in relation to QEMU, besides being a
piece of silicon in hardware. I think a QEMU machine is closer to a
motherboard than just a chipset. Feel free to compare
"qemu.git/hw/i386/pc_piix.c" and "qemu.git/hw/i386/pc_q35.c"
to see the differences between the two machines.

Anyway, I think "device_model_machine" is better than ".._chipset".
"machine" better describes the changes involved when selecting q35.
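For context, the option under discussion would appear in a guest config roughly like this (a sketch inferred from the patch description; only device_model_machine is the new knob):

```
# xl guest config fragment (illustrative)
builder = "hvm"
device_model_version = "qemu-xen"
device_model_machine = "q35"    # proposed option; "i440" is the default
```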

-- 
Anthony PERARD


* Re: [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring
  2018-03-21 16:56                 ` Alexey G
@ 2018-03-21 17:06                   ` Paul Durrant
  2018-03-22  0:31                     ` Alexey G
  2018-03-21 17:15                   ` Roger Pau Monné
  1 sibling, 1 reply; 183+ messages in thread
From: Paul Durrant @ 2018-03-21 17:06 UTC (permalink / raw)
  To: 'Alexey G', Roger Pau Monne
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Jan Beulich,
	Ian Jackson, Anthony Perard, xen-devel

> -----Original Message-----
> From: Alexey G [mailto:x1917x@gmail.com]
> Sent: 21 March 2018 16:57
> To: Roger Pau Monne <roger.pau@citrix.com>
> Cc: xen-devel@lists.xenproject.org; Andrew Cooper
> <Andrew.Cooper3@citrix.com>; Ian Jackson <Ian.Jackson@citrix.com>; Jan
> Beulich <jbeulich@suse.com>; Wei Liu <wei.liu2@citrix.com>; Paul Durrant
> <Paul.Durrant@citrix.com>; Anthony Perard <anthony.perard@citrix.com>;
> Stefano Stabellini <sstabellini@kernel.org>
> Subject: Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG
> area in the MMIO hole + minor code refactoring
> 
> On Wed, 21 Mar 2018 15:20:17 +0000
> Roger Pau Monné <roger.pau@citrix.com> wrote:
> 
> >On Thu, Mar 22, 2018 at 12:25:40AM +1000, Alexey G wrote:
> >> Roger, Paul,
> >>
> >> Here is what you suggest, just to clarify:
> >>
> >> 1. Add to Xen a new hypercall (+corresponding dmop) so QEMU can tell
> >> Xen where QEMU emulates machine's MMCONFIG (chipset-specific
> >> emulation of PCIEXBAR/HECBASE/etc mmcfg relocation). Xen will rely
> >> on this information to know to which PCI device the address within
> >> MMCONFIG belong.
> >>
> >> 2. Xen will trap this area + remap its trapping to other address if
> >> QEMU will inform Xen about emulated PCIEXBAR value change
> >>
> >> 3. Every MMIO access to the current MMCONFIG range will be converted
> >> into BDF first (by offset within this range, knowing where the range
> >> is)
> >>
> >> 4. Target device model is selected using calculated BDF
> >>
> >> 5. MMIO read/write accesses are converted into PCI config space
> >> ioreqs (like it was a CF8/CFCh operation instead of MMIO access). At
> >> this point ioreq structure allows to specify extended PCI conf offset
> >> (12-bit), so it will fit into PCI conf ioreq. For now let's assume
> >> that eg. a 64-bit memory operation is either aborted or workarounded
> >> by splitting this operation into multiple PCI conf ioreqs.
> >
> >Why can't you just set size = 8 in that case in the ioreq?
> >
> >QEMU should then reject those if the chipset doesn't support 64bit
> >accesses. I cannot find in the spec any mention of whether this
> >chipset supports 64bit MCFG accesses, and according to the PCIe spec
> >64bit accesses to MCFG should not be used unless the chipset is known
> >to handle them correctly.
> Yes, in fact uint64_t should be enough in this particular case, though
> the memory nature of MMCONFIG accesses might still require specific
> handling.
> 
> 
> All right then, so it will be a dmop/hypercall to tell Xen where to
> trap MMIO accesses to MMCONFIG as you propose.
> 
> The primary device model (QEMU) will be emulating chipset-specific
> PCIEXBAR/etc and issuing this new dmop to tell Xen which area it needs
> to trap for MMIO MMCONFIG accesses. It's basically what
> map_io_range_to_ioreq_server does currently, but I guess a new dedicated
> dmop/hypercall is bearable.
> 
> >> 6. PCI conf read/write ioreqs are sent to the chosen device model
> >>
> >> 7. QEMU receive MMCONFIG memory reads/writes as PCI conf
> reads/writes
> >>
> >> 8. As these MMCONFIG PCI conf reads occur out of context (just
> >> address/len/data without any emulated device attached to it),
> >> xen-hvm.c should employ special logic to make it QEMU-friendly --
> >> eg. right now it sends received PCI conf access into (emulated by
> >> QEMU) CF8h/CFCh ports.
> >> There is a real problem to embed these "naked" accesses into QEMU
> >> infrastructure, workarounds are required. BTW, find_primary_bus() was
> >> dropped from QEMU code -- it could've been useful here. Let's assume
> >> some workaround is employed (like storing a required object pointers
> >> in global variables for later use in xen-hvm.c)
> >
> >That seems like a minor nit, but why not just use
> >address_space_{read/write} to replay the MCFG accesses as memory
> >read/writes?
> 
> Well, this might work actually. Although the overall scenario will be
> overcomplicated a bit for _PCI_CONFIG ioreqs. Here is how it will look:
> 
> QEMU receives PCIEXBAR update -> calls the new dmop to tell Xen the new
> MMCONFIG address/size -> Xen (re)maps MMIO trapping area -> someone
> is
> accessing this area -> Xen intercepts this MMIO access
> 
> But here's what happens next:
> 
> Xen translates MMIO access into PCI_CONFIG and sends it to DM ->
> DM receives _PCI_CONFIG ioreq -> DM translates BDF/addr info back to
> the offset in emulated MMCONFIG range -> DM calls
> address_space_read/write to trigger MMIO emulation
> 

That would only be true of a dm that cannot handle PCI config ioreqs directly.

  Paul

> I think some parts of this equation can be collapsed, can't they?
> 
> The above scenario makes it obvious that, at least for QEMU, the
> MMIO->PCI conf translation is a redundant step. Why not allow the DM to
> specify whether it prefers to receive MMCONFIG accesses natively (as
> MMIO) or as translated PCI conf ioreqs? We can still route either ioreq
> type to multiple device emulators accordingly.
> 
> This will be the most universal and consistent approach -- either _COPY
> or _PCI_CONFIG-type ioreqs can be sent to the DM, whichever it prefers.
> 
> >> 9. Existing MMCONFIG-handling code in QEMU will be unused in this
> >> scenario
> >
> >If you replay the read/write I don't think so. In any case this is
> >irrelevant. QEMU CPU emulation code is also unused when running under
> >Xen.
> >
> >> 10. All this needed primarily to make the specific "Multiple device
> >> emulators" feature to work (XenGT was mentioned as its user) on Q35
> >> with MMCONFIG.
> >>
> >> Anything wrong/missing here?
> >
> >I think that's correct.
> >
> >Thanks, Roger.


* Re: [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring
  2018-03-21 16:56                 ` Alexey G
  2018-03-21 17:06                   ` Paul Durrant
@ 2018-03-21 17:15                   ` Roger Pau Monné
  2018-03-21 22:49                     ` Alexey G
  1 sibling, 1 reply; 183+ messages in thread
From: Roger Pau Monné @ 2018-03-21 17:15 UTC (permalink / raw)
  To: Alexey G
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Ian Jackson,
	Paul Durrant, Jan Beulich, Anthony Perard, xen-devel

On Thu, Mar 22, 2018 at 02:56:56AM +1000, Alexey G wrote:
> On Wed, 21 Mar 2018 15:20:17 +0000
> Roger Pau Monné <roger.pau@citrix.com> wrote:
> 
> >On Thu, Mar 22, 2018 at 12:25:40AM +1000, Alexey G wrote:
> >> 8. As these MMCONFIG PCI conf reads occur out of context (just
> >> address/len/data without any emulated device attached to it),
> >> xen-hvm.c should employ special logic to make it QEMU-friendly --
> >> eg. right now it sends received PCI conf access into (emulated by
> >> QEMU) CF8h/CFCh ports.
> >> There is a real problem to embed these "naked" accesses into QEMU
> >> infrastructure, workarounds are required. BTW, find_primary_bus() was
> >> dropped from QEMU code -- it could've been useful here. Let's assume
> >> some workaround is employed (like storing a required object pointers
> >> in global variables for later use in xen-hvm.c)  
> >
> >That seems like a minor nit, but why not just use
> >address_space_{read/write} to replay the MCFG accesses as memory
> >read/writes?
> 
> Well, this might work actually. Although the overall scenario will be
> overcomplicated a bit for _PCI_CONFIG ioreqs. Here is how it will look:
> 
> QEMU receives PCIEXBAR update -> calls the new dmop to tell Xen new
> MMCONFIG address/size -> Xen (re)maps MMIO trapping area -> someone is
> accessing this area -> Xen intercepts this MMIO access
> 
> But here's what happens next:
> 
> Xen translates MMIO access into PCI_CONFIG and sends it to DM ->
> DM receives _PCI_CONFIG ioreq -> DM translates BDF/addr info back to
> the offset in emulated MMCONFIG range -> DM calls
> address_space_read/write to trigger MMIO emulation
> 
> I tnink some parts of this equation can be collapsed, isn't it?
> 
> Above scenario makes it obvious that at least for QEMU the MMIO->PCI
> conf translation is a redundant step. Why not to allow specifying for DM
> whether it prefers to receive MMCONFIG accesses as native (MMIO ones)
> or as translated PCI conf ioreqs?

You are just adding an extra level of complexity to an interface
that's fairly simple. You register a PCI device using
XEN_DMOP_IO_RANGE_PCI and you get IOREQ_TYPE_PCI_CONFIG ioreqs.
Getting both IOREQ_TYPE_PCI_CONFIG and IOREQ_TYPE_COPY for PCI config
space access is misleading.

In both cases Xen would have to do the MCFG access decoding in order
to figure out which IOREQ server will handle the request. At which
point the only step that you avoid is the reconstruction of the memory
access from the IOREQ_TYPE_PCI_CONFIG which is trivial.

> We can still route either ioreq
> type to multiple device emulators accordingly.

It's exactly the same that's done for IO space PCI config space
addresses. QEMU gets an IOREQ_TYPE_PCI_CONFIG and it replays the IO
space access using do_outp and cpu_ioreq_pio.

If you think using IOREQ_TYPE_COPY for MCFG accesses is such a benefit
for QEMU, why not just translate the IOREQ_TYPE_PCI_CONFIG into
IOREQ_TYPE_COPY in handle_ioreq and dispatch it using
cpu_ioreq_move?

Thanks, Roger.
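The "trivial reconstruction" mentioned above can be sketched as follows. The packing of the ioreq addr field assumed here (SBDF in the upper 32 bits, 12-bit register offset in the low bits) is an assumption about the ioreq ABI, not something stated in this thread, and the helper name is made up:

```c
#include <assert.h>
#include <stdint.h>

/* Rebuild the MMIO address inside the emulated MMCONFIG window from
 * the addr field of an IOREQ_TYPE_PCI_CONFIG request, so the DM can
 * replay it via address_space_read/write. */
static uint64_t mcfg_mmio_addr(uint64_t mmcfg_base, uint64_t ioreq_addr)
{
    uint32_t sbdf  = ioreq_addr >> 32;    /* segment 31:16, bus 15:8, devfn 7:0 */
    uint16_t reg   = ioreq_addr & 0xfff;  /* 12-bit extended config offset */
    uint8_t  bus   = (sbdf >> 8) & 0xff;
    uint8_t  devfn = sbdf & 0xff;         /* dev in bits 7:3, fn in 2:0 */

    /* ECAM: base + bus<<20 + dev<<15 + fn<<12 + reg,
     * which equals base + bus<<20 + devfn<<12 + reg. */
    return mmcfg_base + ((uint64_t)bus << 20) + ((uint64_t)devfn << 12) + reg;
}
```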


* Re: [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring
  2018-03-21 14:54               ` Paul Durrant
@ 2018-03-21 17:41                 ` Alexey G
  0 siblings, 0 replies; 183+ messages in thread
From: Alexey G @ 2018-03-21 17:41 UTC (permalink / raw)
  To: Paul Durrant
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Jan Beulich,
	Ian Jackson, Anthony Perard, xen-devel, Roger Pau Monne

On Wed, 21 Mar 2018 14:54:16 +0000
Paul Durrant <Paul.Durrant@citrix.com> wrote:

>> -----Original Message-----
>> From: Alexey G [mailto:x1917x@gmail.com]
>> Sent: 21 March 2018 14:26
>> To: Roger Pau Monne <roger.pau@citrix.com>
>> Cc: xen-devel@lists.xenproject.org; Andrew Cooper
>> <Andrew.Cooper3@citrix.com>; Ian Jackson <Ian.Jackson@citrix.com>;
>> Jan Beulich <jbeulich@suse.com>; Wei Liu <wei.liu2@citrix.com>; Paul
>> Durrant <Paul.Durrant@citrix.com>; Anthony Perard
>> <anthony.perard@citrix.com>; Stefano Stabellini
>> <sstabellini@kernel.org> Subject: Re: [Xen-devel] [RFC PATCH 07/12]
>> hvmloader: allocate MMCONFIG area in the MMIO hole + minor code
>> refactoring
>> 
>> On Wed, 21 Mar 2018 09:09:11 +0000
>> Roger Pau Monné <roger.pau@citrix.com> wrote:
>>   
>> >On Wed, Mar 21, 2018 at 10:58:40AM +1000, Alexey G wrote:  
>> [...]  
>> >> According to public slides for the feature, both PCI conf and MMIO
>> >> accesses can be routed to the designated device model. It looks
>> >> like for this particular setup it doesn't really matter which
>> >> particular ioreq type must be used for MMCONFIG accesses -- either
>> >> IOREQ_TYPE_PCI_CONFIG or IOREQ_TYPE_COPY (MMIO accesses) should  
>> be  
>> >> acceptable.  
>> >
>> >Isn't that going to be quite messy? How is the IOREQ server supposed
>> >to decode a MCFG access received as IOREQ_TYPE_COPY?  
>> 
>> This code is already available and in sync with QEMU legacy PCI conf
>> emulation infrastructure.
>>   
>> >I don't think the IOREQ server needs to know the start of the MCFG
>> >region, in which case it won't be able to detect and decode the
>> >access if it's of type IOREQ_TYPE_COPY.  
>> 
>> How do you think Xen will be able to know if arbitrary MMIO
>> access targets MMCONFIG area and to which BDF the offset in this area
>> belongs, without knowing where MMCONFIG is located and what PCI bus
>> layout is? It's QEMU who emulate PCIEXBAR and can tell Xen where
>> MMCONFIG is expected to be.
>>   
>> >MCFG accesses need to be sent to the IOREQ server as
>> >IOREQ_TYPE_PCI_CONFIG, or else you are forcing each IOREQ server to
>> >know the position of the MCFG area in order to do the decoding. In
>> >your case this would work because QEMU controls the position of the
>> >MCFG region, but there's no need for other IOREQ servers to know the
>> >position of the MCFG area.
>> >  
>> >> The only thing which matters is ioreq routing itself --
>> >> making decisions to which device model the PCI conf/MMIO ioreq
>> >> should be sent.  
>> >
>> >Hm, see above, but I'm fairly sure you need to forward those MCFG
>> >accesses as IOREQ_TYPE_PCI_CONFIG to the IOREQ server.  
>> 
>> (a detailed answer below)
>>   
>> >> >Traditional PCI config space accesses are not IO port space
>> >> >accesses.  
>> >>
>> >> (assuming 'not' mistyped here)  
>> >
>> >Not really, this should instead be:
>> >
>> >"Traditional PCI config space accesses are not forwarded to the
>> >IOREQ server as IO port space accesses (IOREQ_TYPE_PIO) but rather
>> >as PCI config space accesses (IOREQ_TYPE_PCI_CONFIG)."
>> >
>> >Sorry for the confusion.
>> >  
>> >> >The IOREQ code in Xen detects accesses to ports 0xcf8/0xcfc and
>> >> >IOREQ servers can register devices they would like to receive
>> >> >configuration space accesses for. QEMU is already making use of
>> >> >this, see for  
>> >>
>> >> That's one of the reasons why current IOREQ_TYPE_PCI_CONFIG
>> >> implementation is a bit inconvenient for MMCONFIG MMIO accesses --
>> >> it's too much CF8h/CFCh-centric in its implementation, might be
>> >> painful to change something in the code which was intended for
>> >> CF8h/CFCh handling (and not for MMIO processing).  
>> >
>> >I'm not sure I follow. Do you mean that changes should be made to
>> >the ioreq struct in order to forward MCFG accesses using
>> >IOREQ_TYPE_PCI_CONFIG as it's type?  
>> 
>> No changes for ioreq structures needed for now.
>>   
>> >> It will be handled by IOREQ too, just using a different IOREQ type
>> >> (MMIO one). The basic question is why do we have to stick to PCI
>> >> conf space ioreqs for emulating MMIO accesses to MMCONFIG.  
>> >
>> >Because other IOREQ servers don't need to know about the
>> >position/size of the MCFG area, and cannot register MMIO ranges
>> >that cover their device's PCI configuration space in the MCFG
>> >region.
>> >
>> >Not to mention that it would would be a terrible design flaw to
>> >force IOREQ servers to register PCI devices and MCFG areas
>> >belonging to those devices separately as MMIO in order to trap all
>> >possible PCI configuration space accesses.  
>> 
>> PCI conf space layout is shared by the emulated machine. And MMCONFIG
>> layout is mandated by this common PCI bus map.
>> 
>> Even if those 'multiple device models' see a different picture of PCI
>> conf space, their visions of PCI bus must not overlap + MMCONFIG
>> layout must be consistent between different device models.
>> 
>> Although it is a terrible mistake to think about the emulated PCI bus
>> like it's a set of distinct PCI devices unrelated to each other. It's
>> all coupled together. And this is especially true for PCIe.
>> Many PCIe features rely on PCIe device interaction in PCIe fabric,
>> eg. PCIe endpoints may interact with Root Complex in many ways. This
>> cooperation may  need to be emulated somehow, eg. to provide some
>> support for PM features, link management or native hotplug
>> facilities. Even if we have a real passed through device, we might
>> need to provide an emulated PCIe Switch or a Root Port for it to
>> function properly within the PCIe hierarchy.
>> 
>> Dedicating an isolated PCI device to some isolated device model --
>> that's what might be the design flaw, considering the PCIe world.
>>   
>
>I think that is the crux of the problem. The current
>multi-ioreq-server relies on being able to consider PCI devices as
>being isolated from each other... and that is basically fine because
>we only use a single PCI bus with no bridges. To move to PCIe will
>require more emulation in Xen, but I think that is the only way to do
>it properly.

Unfortunately, this approach won't work anymore for PCIe. We can't just
separate the PCI bus emulation from the chipset-specific emulation. Even
MMCONFIG is a chipset-specific feature. In order to do this, Xen should
emulate many device model features itself, probably to the point where
QEMU could be safely dropped as DM.

The single bus has become a major issue for passthrough already:
http://lists.gnu.org/archive/html/qemu-devel/2018-03/msg03593.html

I want to replace this workaround with actual multiple bus support,
although now I'm not sure if I can make it through -- there will likely
be a lot of resistance, even though this is a mandatory feature for
PCIe PT.

>> [...]  
>> >
>> >Maybe you could detect offsets >= 256 and replay them in QEMU like
>> >mmio accesses? Using the address_space_write or
>> >pcie_mmcfg_data_read/write functions?
>> >I have to admit my knowledge of QEMU is quite limited, so I'm not
>> >sure of the best way to handle this.
>> >
>> >Ideally we should find a way that doesn't involve having to modify
>> >each chipset to handle MCFG accesses from Xen. It would be nice to
>> >have some kind of interface inside of QEMU so all chipsets can
>> >register MCFG areas or modify them, but this is out of the scope of
>> >this work.  
>> 
>> Roger, Paul,
>> 
>> Here is what you suggest, just to clarify:
>> 
>> 1. Add to Xen a new hypercall (+corresponding dmop) so QEMU can tell
>> Xen where QEMU emulates machine's MMCONFIG (chipset-specific
>> emulation
>> of PCIEXBAR/HECBASE/etc mmcfg relocation). Xen will rely on this
>> information to know to which PCI device the address within MMCONFIG
>> belong.
>> 
>> 2. Xen will trap this area + remap its trapping to other address if
>> QEMU will inform Xen about emulated PCIEXBAR value change
>> 
>> 3. Every MMIO access to the current MMCONFIG range will be converted
>> into BDF first (by offset within this range, knowing where the range
>> is)
>> 
>> 4. Target device model is selected using calculated BDF
>> 
>> 5. MMIO read/write accesses are converted into PCI config space
>> ioreqs (like it was a CF8/CFCh operation instead of MMIO access). At
>> this point ioreq structure allows to specify extended PCI conf offset
>> (12-bit), so it will fit into PCI conf ioreq. For now let's assume
>> that eg. a 64-bit memory operation is either aborted or workarounded
>> by splitting this operation into multiple PCI conf ioreqs.
>> 
>> 6. PCI conf read/write ioreqs are sent to the chosen device model
>> 
>> 7. QEMU receive MMCONFIG memory reads/writes as PCI conf reads/writes
>> 
>> 8. As these MMCONFIG PCI conf reads occur out of context (just
>> address/len/data without any emulated device attached to it),
>> xen-hvm.c should employ special logic to make it QEMU-friendly --
>> eg. right now it sends received PCI conf access into (emulated by
>> QEMU) CF8h/CFCh ports.
>> There is a real problem to embed these "naked" accesses into QEMU
>> infrastructure, workarounds are required. BTW, find_primary_bus() was
>> dropped from QEMU code -- it could've been useful here. Let's assume
>> some workaround is employed (like storing a required object pointers
>> in global variables for later use in xen-hvm.c)
>> 
>> 9. Existing MMCONFIG-handling code in QEMU will be unused in this
>> scenario
>> 
>> 10. All this needed primarily to make the specific "Multiple device
>> emulators" feature to work (XenGT was mentioned as its user) on Q35
>> with MMCONFIG.
>> 
>> Anything wrong/missing here?  
>
>That all sounds plausible. All we essentially need to do is make sure
>the config space transactions make it to the right device model in
>QEMU. If the emulation in Xen is comprehensive then I guess there
>should not even be any reason for QEMU's idea of the bus topology and
>Xen's presentation of the bus topology to the guest to even match.
>  Paul
>
>> 
>> (Adding Stefano and Anthony as xen-hvm.c mentioned)
>> 
>> 
>> Here is another suggestion:
>> 
>> 1. QEMU use existing facilities to emulate PCIEXBAR for a Q35
>> machine, calling Xen's map_io_range_to_ioreq_server() API to mark
>> MMIO range for emulation, just like for any other emulated MMIO range
>> 
>> 2. All accesses to this area will be forwarded to QEMU as MMIO ioreqs
>> and emulated flawlessly as everything is within QEMU architecture --
>> pci-host/PCIBus/PCIDevice machinery in place. No workarounds required
>> for xen-hvm.c
>> 
>> 3. CF8/CFC accesses will be forwarded as _PCI_CONFIG ioreqs, as
>> usually. Both methods are in sync as they use common PCI emulation
>> infrastructure in QEMU
>> 
>> 4. At this point absolutely zero changes are required in both Xen and
>> QEMU code. Only existing interfaces are used. In fact, no related
>> code changes required at all except a bugfix for PCIEXBAR mask
>> emulation (provided in this series)
>> 
>> 5. But. Just to make the 'multiple device emulators' (no extra
>> reasons so far) feature to work, we add the same hypercall/dmop
>> usage to let Xen know where QEMU emulates MMCONFIG
>> 
>> 6. Xen will continue to trap accesses to this range but instead of
>> sending _COPY ioreq immediately, he will check the address against
>> known MMCONFIG location (in the same manner as above), then convert
>> the
>> offset within it to BDF and he can proceed to usual BDF-based ioreq
>> routing for those device emulator DMs, whatever they are
>> 
>> 7. In fact, MMIO -> PCI conf ioreq translation can be freely used as
>> well at this stage, if it is more convenient for 'multiple device
>> emulators' feature users. It can be even made selectable.
>> 
>> So, the question which needs explanation is: why do you think
>> MMIO->PCI conf ioreq translation is mandatory for MMCONFIG? Can't we
>> just add new hypercall/dmop to make ioreq routing for 'multiple
>> device emulators' to work while letting QEMU to use any API provided
>> for him to do its tasks?
>> 
>> It's kinda funny to pretend that QEMU don't know anything about
>> MMCONFIG being MMIO when it's QEMU who inform Xen about its memory
>> address and size.
>>   
>> >Regardless of how this ends up being implemented inside of QEMU I
>> >think the above approach is the right one from an architectural PoV.
>> >
>> >AFAICT there are still some reserved bits in the ioreq struct that
>> >you could use to signal 'this is a MCFG PCI access' if required.
>> >  
>> >> Approach #2. Handling MMCONFIG area inside QEMU using usual MMIO
>> >> emulation:
>> >>
>> >> 1. QEMU will trap accesses to PCIEXBAR (or whatever else possibly
>> >> supported in the future like HECBASE), eventually asking Xen to
>> >> map the MMCONFIG MMIO range for ioreq servicing just like it does
>> >> for any other emulated MMIO range, via
>> >> map_io_range_to_ioreq_server(). All changes in MMCONFIG
>> >> placement/status will lead to  
>> remapping/unmapping  
>> >> the MMIO range.
>> >>
>> >> 2. Xen will trap MMIO accesses to this area and forward them to
>> >> QEMU as MMIO (IOREQ_TYPE_COPY) ioreqs
>> >>
>> >> 3. QEMU will receive these accesses and pass them to the existing
>> >> MMCONFIG emulation -- pcie_mmcfg_data_read/write handlers, finally
>> >> resulting in same xen_host_pci_* function calls as before.
>> >>
>> >> This approach works "right out of the box", no changes needed for
>> >> either Xen or QEMU. As both _PCI_CONFIG and MMIO type ioreqs are
>> >> processed, either method can be used to access PCI/extended config
>> >> space -- CF8/CFC port I/O or MMIO accesses to MMCONFIG.
>> >>
>> >> IOREQ routing for multiple device emulators can be supported too.
>> >> In fact, the same mmconfig dmops/hypercalls can be added to let
>> >> Xen know where MMCONFIG area resides, Xen will use this
>> >> information to forward MMCONFIG MMIO ioreqs accordingly to BDF of
>> >> the address. The difference with the approach #1 is that these
>> >> interfaces are now completely optional when we use MMIO ioreqs
>> >> for MMCONFIG on vanilla Xen/QEMU.  
>> >
>> >As said above, if you forward MCFG accesses as IOREQ_TYPE_COPY you
>> >are forcing each IOREQ server to know the position of the MCFG area
>> >in order to do the decoding, this is not acceptable IMO.
>> >  
>> >> The question is: why is the IOREQ_TYPE_COPY -> IOREQ_TYPE_PCI_CONFIG
>> >> translation a must-have at all? It won't make handling any
>> >> simpler. For the current QEMU implementation, IOREQ_TYPE_COPY (MMIO
>> >> accesses for MMCONFIG) would be preferable, as it allows the
>> >> existing code to be reused.  
>> >
>> >Granted it's likely easier to implement, but it's also incorrect.
>> >You seem to have in mind the picture of a single IOREQ server (QEMU)
>> >handling all the devices.
>> >
>> >Although this is the most common scenario, it's not the only one
>> >supported by Xen. Your proposed solution breaks the usage of
>> >multiple IOREQ servers as PCI device emulators.
>> >  
>> >> I think it will be safe to use MMCONFIG emulation on MMIO level
>> >> for now and later extend it with 'set_mmconfig_' dmop/hypercall
>> >> for the 'multiple device emulators' IOREQ_TYPE_COPY routing to
>> >> work same as for PCI conf, so it can be used by XenGT etc on Q35
>> >> as well.  
>> >
>> >I'm afraid this kind of issue would have been far easier to
>> >identify if a design document for this feature had been sent to the
>> >list prior to its implementation.
>> >
>> >Regarding whether to accept something like this, I'm not really in
>> >favor, but IMO it depends on how much new code is added to handle
>> >this incorrect usage that would then go away (or would have to be
>> >changed) in order to handle the proper implementation.
>> >
>> >Thanks, Roger.  
>


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel


* Re: [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring
  2018-03-21 17:15                   ` Roger Pau Monné
@ 2018-03-21 22:49                     ` Alexey G
  2018-03-22  9:29                       ` Paul Durrant
  2018-03-22  9:57                       ` Roger Pau Monné
  0 siblings, 2 replies; 183+ messages in thread
From: Alexey G @ 2018-03-21 22:49 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Ian Jackson,
	Paul Durrant, Jan Beulich, Anthony Perard, xen-devel

On Wed, 21 Mar 2018 17:15:04 +0000
Roger Pau Monné <roger.pau@citrix.com> wrote:
[...]
>> Above scenario makes it obvious that at least for QEMU the MMIO->PCI
>> conf translation is a redundant step. Why not to allow specifying
>> for DM whether it prefers to receive MMCONFIG accesses as native
>> (MMIO ones) or as translated PCI conf ioreqs?  
>
>You are just adding an extra level of complexity to an interface
>that's fairly simple. You register a PCI device using
>XEN_DMOP_IO_RANGE_PCI and you get IOREQ_TYPE_PCI_CONFIG ioreqs.

Yes, and it is still needed, as we have two distinct (and not
equivalent) interfaces to PCI conf space. Apart from the overlapping
0..FFh range, they can be considered very different interfaces. And
whether the system is real or emulated, we can use either one of these
two interfaces, or both.

For QEMU, zero changes are needed to support MMCONFIG MMIO accesses if
they arrive as MMIO ioreqs -- that is exactly what its MMCONFIG
emulation code expects.
For the (still somewhat hypothetical) users of the multiple-ioreq-servers
capability, we can additionally enable MMIO translation to PCI conf
ioreqs. Note that this is an extra step, compared to forwarding trapped
MMCONFIG MMIO accesses to the selected device model as-is.

>Getting both IOREQ_TYPE_PCI_CONFIG and IOREQ_TYPE_COPY for PCI config
>space access is misleading.

These are very different accesses, both in transport and capabilities.

>In both cases Xen would have to do the MCFG access decoding in order
>to figure out which IOREQ server will handle the request. At which
>point the only step that you avoid is the reconstruction of the memory
>access from the IOREQ_TYPE_PCI_CONFIG which is trivial.

The "reconstruction of the memory access" you mentioned won't actually
be easy. The thing is, address_space_read/write is not all we need.

In order to translate PCI conf ioreqs back to emulated MMIO ops, we
need to be an involved party -- mainly to know where the MMCONFIG area
is located, so we can construct the address within its range from the
BDF. This piece of information is destroyed in the process of
translating the MMIO ioreq to the PCI conf type.

The code which parses PCI conf ioreqs in xen-hvm.c doesn't know anything
about the current emulated MMCONFIG state. The correct way to obtain
this info is to participate in its emulation. As we don't participate,
we have no option other than trying to gain backdoor access to PCIHost
fields via things like object_resolve_*(). This solution is cumbersome
and ugly but will work... and may break at any time due to changes in QEMU.

QEMU maintainers will grin while looking at all this, I'm afraid:
trapped MMIO accesses which are translated to PCI conf accesses which
are in turn translated back to emulated MMIO accesses upon receipt,
along with tedious attempts to gain access to MMCONFIG-related info, as
we're not invited to the MMCONFIG emulation party.

The more I think about it, the more I like the existing
map_io_range_to_ioreq_server() approach. :( It works without doing
anything: no hacks, no new interfaces, and both MMCONFIG and CF8/CFC
work as expected. There is a problem in making it compatible with
the multiple-ioreq-servers feature, but providing a new dmop/hypercall
(which you suggest is a must-have to trap MMCONFIG MMIO, giving QEMU
only the freedom to tell where it is located) allows this problem to be
solved in any possible way, whether MMIO -> PCI conf translation or
anything else.

>> We can still route either ioreq
>> type to multiple device emulators accordingly.  
>
>It's exactly the same that's done for IO space PCI config space
>addresses. QEMU gets an IOREQ_TYPE_PCI_CONFIG and it replays the IO
>space access using do_outp and cpu_ioreq_pio.

...And it is completely limited to basic PCI conf space. I don't know
the context of this line in xen-hvm.c:

val = (1u << 31) | ((req->addr & 0x0f00) << 16) | ((sbdf & 0xffff) << 8)
       | (req->addr & 0xfc);

but it seems current QEMU versions do not expect anything similar to
AMD ECS-style accesses on port 0CF8h. It is limited to basic PCI conf
space only.

>If you think using IOREQ_TYPE_COPY for MCFG accesses is such a benefit
>for QEMU, why not just translate the IOREQ_TYPE_PCI_CONFIG into
>IOREQ_TYPE_COPY in handle_ioreq and dispatch it using
>cpu_ioreq_move?

Answered above: for this step we need access to information which
doesn't belong to us.

>Thanks, Roger.



* Re: [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring
  2018-03-21 17:06                   ` Paul Durrant
@ 2018-03-22  0:31                     ` Alexey G
  2018-03-22  9:04                       ` Jan Beulich
  0 siblings, 1 reply; 183+ messages in thread
From: Alexey G @ 2018-03-22  0:31 UTC (permalink / raw)
  To: Paul Durrant
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Jan Beulich,
	Ian Jackson, Anthony Perard, xen-devel, Roger Pau Monne

On Wed, 21 Mar 2018 17:06:28 +0000
Paul Durrant <Paul.Durrant@citrix.com> wrote:
[...]
>> Well, this might work actually. Although the overall scenario will be
>> overcomplicated a bit for _PCI_CONFIG ioreqs. Here is how it will
>> look:
>> 
>> QEMU receives PCIEXBAR update -> calls the new dmop to tell Xen new
>> MMCONFIG address/size -> Xen (re)maps MMIO trapping area -> someone
>> is
>> accessing this area -> Xen intercepts this MMIO access
>> 
>> But here's what happens next:
>> 
>> Xen translates MMIO access into PCI_CONFIG and sends it to DM ->
>> DM receives _PCI_CONFIG ioreq -> DM translates BDF/addr info back to
>> the offset in emulated MMCONFIG range -> DM calls
>> address_space_read/write to trigger MMIO emulation
>>   
>
>That would only be true of a dm that cannot handle PCI config ioreqs
>directly.

It's just a bit problematic for xen-hvm.c (the Xen ioreq processor in
QEMU).

It receives these PCI conf ioreqs without any context. To work around
this, the existing code issues I/O to the emulated CF8h/CFCh ports in
order to allow QEMU to find their target. But we can't use the same
method for MMCONFIG accesses -- it works for basic PCI conf space only.

We need to either locate the PCIBus/PCIDevice manually via object
lookups and then proceed to something like pci_host_config_read_common(),
or convert the PCI conf access into an emulated MMIO access... again, a
required piece of information is missing: we need to somehow learn the
current MMCONFIG address in order to recreate the memory address to be
emulated.

Let's put it simply: making PCI conf ioreqs reach their MMCONFIG
targets in xen-hvm.c is easily achievable, but it will look like
a hack. MMIO ioreqs are preferable for MMCONFIG -- no extra logic is
needed for them, and we can pass them directly for emulation in a way
somewhat reminiscent of the CF8h/CFCh replay, except for memory.

Ideally there would be PCI conf ioreq translation for supplemental
device emulators, while skipping this translation for QEMU, which
expects PCI config ioreqs only for CF8/CFC accesses. I assume it's
DEMU/VGPU which are of primary concern here, not experimental users
like XenGT.

>  Paul
>
>> I think some parts of this equation can be collapsed, can't they?
>> 
>> Above scenario makes it obvious that at least for QEMU the MMIO->PCI
>> conf translation is a redundant step. Why not to allow specifying
>> for DM whether it prefers to receive MMCONFIG accesses as native
>> (MMIO ones) or as translated PCI conf ioreqs? We can still route
>> either ioreq type to multiple device emulators accordingly.
>> 
>> This will be the most universal and consistent approach -- either
>> _COPY or _PCI_CONFIG-type ioreqs can be sent to DM, whatever it
>> likes more. 
>> >> 9. Existing MMCONFIG-handling code in QEMU will be unused in this
>> >> scenario  
>> >
>> >If you replay the read/write I don't think so. In any case this is
>> >irrelevant. QEMU CPU emulation code is also unused when running
>> >under Xen.
>> >  
>> >> 10. All this needed primarily to make the specific "Multiple
>> >> device emulators" feature to work (XenGT was mentioned as its
>> >> user) on Q35 with MMCONFIG.
>> >>
>> >> Anything wrong/missing here?  
>> >
>> >I think that's correct.
>> >
>> >Thanks, Roger.  
>




* Re: [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring
  2018-03-22  0:31                     ` Alexey G
@ 2018-03-22  9:04                       ` Jan Beulich
  2018-03-22  9:55                         ` Alexey G
  0 siblings, 1 reply; 183+ messages in thread
From: Jan Beulich @ 2018-03-22  9:04 UTC (permalink / raw)
  To: Alexey G
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Paul Durrant,
	xen-devel, Anthony Perard, Ian Jackson, Roger Pau Monne

>>> On 22.03.18 at 01:31, <x1917x@gmail.com> wrote:
> On Wed, 21 Mar 2018 17:06:28 +0000
> Paul Durrant <Paul.Durrant@citrix.com> wrote:
> [...]
>>> Well, this might work actually. Although the overall scenario will be
>>> overcomplicated a bit for _PCI_CONFIG ioreqs. Here is how it will
>>> look:
>>> 
>>> QEMU receives PCIEXBAR update -> calls the new dmop to tell Xen new
>>> MMCONFIG address/size -> Xen (re)maps MMIO trapping area -> someone
>>> is
>>> accessing this area -> Xen intercepts this MMIO access
>>> 
>>> But here's what happens next:
>>> 
>>> Xen translates MMIO access into PCI_CONFIG and sends it to DM ->
>>> DM receives _PCI_CONFIG ioreq -> DM translates BDF/addr info back to
>>> the offset in emulated MMCONFIG range -> DM calls
>>> address_space_read/write to trigger MMIO emulation
>>>   
>>
>>That would only be true of a dm that cannot handle PCI config ioreqs
>>directly.
> 
> It's just a bit problematic for xen-hvm.c (Xen ioreq processor in QEMU).
> 
> It receives these PCI conf ioreqs out of any context. To workaround
> this, existing code issues I/O to emulated CF8h/CFCh ports in order to
> allow QEMU to find their target. But we can't use the same method for
> MMCONFIG accesses -- this works for basic PCI conf space only.

I think you want to view this the other way around: No physical
device would ever get to see MMCFG accesses (or CF8/CFC port
ones). This same layering is what we should have in the
virtualized case.

Jan




* Re: [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring
  2018-03-21 22:49                     ` Alexey G
@ 2018-03-22  9:29                       ` Paul Durrant
  2018-03-22 10:05                         ` Roger Pau Monné
  2018-03-22 10:50                         ` Alexey G
  2018-03-22  9:57                       ` Roger Pau Monné
  1 sibling, 2 replies; 183+ messages in thread
From: Paul Durrant @ 2018-03-22  9:29 UTC (permalink / raw)
  To: 'Alexey G', Roger Pau Monne
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Jan Beulich,
	Ian Jackson, Anthony Perard, xen-devel

> -----Original Message-----
> From: Alexey G [mailto:x1917x@gmail.com]
> Sent: 21 March 2018 22:50
> To: Roger Pau Monne <roger.pau@citrix.com>
> Cc: xen-devel@lists.xenproject.org; Andrew Cooper
> <Andrew.Cooper3@citrix.com>; Ian Jackson <Ian.Jackson@citrix.com>; Jan
> Beulich <jbeulich@suse.com>; Wei Liu <wei.liu2@citrix.com>; Paul Durrant
> <Paul.Durrant@citrix.com>; Anthony Perard <anthony.perard@citrix.com>;
> Stefano Stabellini <sstabellini@kernel.org>
> Subject: Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG
> area in the MMIO hole + minor code refactoring
> 
> On Wed, 21 Mar 2018 17:15:04 +0000
> Roger Pau Monné <roger.pau@citrix.com> wrote:
> [...]
> >> Above scenario makes it obvious that at least for QEMU the MMIO->PCI
> >> conf translation is a redundant step. Why not to allow specifying
> >> for DM whether it prefers to receive MMCONFIG accesses as native
> >> (MMIO ones) or as translated PCI conf ioreqs?
> >
> >You are just adding an extra level of complexity to an interface
> >that's fairly simple. You register a PCI device using
> >XEN_DMOP_IO_RANGE_PCI and you get IOREQ_TYPE_PCI_CONFIG ioreqs.
> 
> Yes, and it is still needed as we have two distinct (and not equal)
> interfaces to PCI conf space. Apart from 0..FFh range overlapping they
> can be considered very different interfaces. And whether it is a real
> system or emulated -- we can use either one of these two interfaces or
> both.
> 
> For QEMU zero changes are needed to support MMCONFIG MMIO accesses
> if
> they come as MMIO ioreqs. It's just what its MMCONFIG emulation code
> expects.
> Anyway, for (kind of vague) users of the multiple ioreq servers
> capability we can enable MMIO translation to PCI conf ioreqs. Note that
> actually this is an extra step, not forwarding trapped MMCONFIG MMIO
> accesses to the selected device model as is.
> 
> >Getting both IOREQ_TYPE_PCI_CONFIG and IOREQ_TYPE_COPY for PCI
> config
> >space access is misleading.
> 
> These are very different accesses, both in transport and capabilities.
> 
> >In both cases Xen would have to do the MCFG access decoding in order
> >to figure out which IOREQ server will handle the request. At which
> >point the only step that you avoid is the reconstruction of the memory
> >access from the IOREQ_TYPE_PCI_CONFIG which is trivial.
> 
> The "reconstruction of the memory access" you mentioned won't be easy
> actually. The thing is, address_space_read/write is not all what we
> need.
> 
> In order to translate PCI conf ioreqs back to emulated MMIO ops, we
> need to be an involved party, mainly to know where MMCONFIG area is
> located so we can construct the address within its range from BDF.
> This piece of information is destroyed in the process of MMIO ioreq
> translation to PCI conf type.
> 
> The code which parse PCI conf ioreqs in xen-hvm.c doesn't know anything
> about the current emulated MMCONFIG state. The correct way to have this
> info is to participate in its emulation. As we don't participate, we
> have no other way than trying to gain backdoor access to PCIHost fields
> via things like object_resolve_*(). This solution is cumbersome and
> ugly but will work... and may break anytime due to changes in QEMU.
> 
> QEMU maintainers will grin while looking at all this I'm afraid --
> trapped MMIO accesses which are translated to PCI conf accesses which
> in turn translated back to emulated MMIO accesses upon receiving, along
> with tedious attempts to gain access to MMCONFIG-related info as we're
> not invited to the MMCONFIG emulation party.
> 
> The more I think about it, the more I like the existing
> map_io_range_to_ioreq_server() approach. :( It works without doing
> anything, no hacks, no new interfaces, both MMCONFIG and CF8/CFC are
> working as expected. There is a problem to make it compatible with
> the specific multiple ioreq servers feature, but providing a new
> dmop/hypercall (which you suggest is a must have thing to trap MMCONFIG
> MMIO to give QEMU only the freedom to tell where it is located) allows
> to solve this problem in any possible way, either MMIO -> PCI conf
> translation or anything else.
> 

I don't think we even want QEMU to have the freedom to say where the MMCONFIG areas are located, do we? QEMU is not in charge of the guest memory map and it is not responsible for building the MCFG table; Xen is. So it should be Xen that decides where the MMCONFIG area goes for each registered PCI device, Xen that adds that to the MCFG table, and Xen that handles the MMCONFIG MMIO accesses, which should be forwarded to QEMU as PCI config IOREQs.
Now, it may be that we need to introduce a Xen-specific mechanism into QEMU to then route those config space transactions to the device models, but that would be an improvement over the current cf8/cfc hackery anyway.

  Paul

> >> We can still route either ioreq
> >> type to multiple device emulators accordingly.
> >
> >It's exactly the same that's done for IO space PCI config space
> >addresses. QEMU gets an IOREQ_TYPE_PCI_CONFIG and it replays the IO
> >space access using do_outp and cpu_ioreq_pio.
> 
> ...And it is completely limited to basic PCI conf space. I don't know
> the context of this line in xen-hvm.c:
> 
> val = (1u << 31) | ((req->addr & 0x0f00) << 16) | ((sbdf & 0xffff) << 8)
>        | (req->addr & 0xfc);
> 
> but seems like current QEMU versions do not expect anything similar to
> AMD ECS-style accesses for 0CF8h. It is limited to basic PCI conf only.
> 
> >If you think using IOREQ_TYPE_COPY for MCFG accesses is such a benefit
> >for QEMU, why not just translate the IOREQ_TYPE_PCI_CONFIG into
> >IOREQ_TYPE_COPY in handle_ioreq and dispatch it using
> >cpu_ioreq_move?
> 
> Answered above, we need to somehow have access to the info which don't
> belong to us for this step.
> 
> >Thanks, Roger.



* Re: [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring
  2018-03-22  9:04                       ` Jan Beulich
@ 2018-03-22  9:55                         ` Alexey G
  2018-03-22 10:06                           ` Paul Durrant
  0 siblings, 1 reply; 183+ messages in thread
From: Alexey G @ 2018-03-22  9:55 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Paul Durrant,
	xen-devel, Anthony Perard, Ian Jackson, Roger Pau Monne

On Thu, 22 Mar 2018 03:04:16 -0600
"Jan Beulich" <JBeulich@suse.com> wrote:

>>>> On 22.03.18 at 01:31, <x1917x@gmail.com> wrote:  
>> On Wed, 21 Mar 2018 17:06:28 +0000
>> Paul Durrant <Paul.Durrant@citrix.com> wrote:
>> [...]  
>>>> Well, this might work actually. Although the overall scenario will
>>>> be overcomplicated a bit for _PCI_CONFIG ioreqs. Here is how it
>>>> will look:
>>>> 
>>>> QEMU receives PCIEXBAR update -> calls the new dmop to tell Xen new
>>>> MMCONFIG address/size -> Xen (re)maps MMIO trapping area -> someone
>>>> is
>>>> accessing this area -> Xen intercepts this MMIO access
>>>> 
>>>> But here's what happens next:
>>>> 
>>>> Xen translates MMIO access into PCI_CONFIG and sends it to DM ->
>>>> DM receives _PCI_CONFIG ioreq -> DM translates BDF/addr info back
>>>> to the offset in emulated MMCONFIG range -> DM calls
>>>> address_space_read/write to trigger MMIO emulation
>>>>     
>>>
>>>That would only be true of a dm that cannot handle PCI config ioreqs
>>>directly.  
>> 
>> It's just a bit problematic for xen-hvm.c (Xen ioreq processor in
>> QEMU).
>> 
>> It receives these PCI conf ioreqs out of any context. To workaround
>> this, existing code issues I/O to emulated CF8h/CFCh ports in order
>> to allow QEMU to find their target. But we can't use the same method
>> for MMCONFIG accesses -- this works for basic PCI conf space only.  
>
>I think you want to view this the other way around: No physical
>device would ever get to see MMCFG accesses (or CF8/CFC port
>ones). This same layering is what we should have in the
>virtualized case.

We have a purely virtual layout of the PCI bus, along with a virtual,
emulated MMCONFIG completely unrelated to the host's -- so what's
exposed? This emulated MMCONFIG is simply a supplement to the virtual
PCI bus, and its layout corresponds to the virtual PCI bus the
guest/QEMU sees.

It's QEMU that controls the chipset-specific PCIEXBAR emulation and
knows the MMCONFIG position and size. QEMU informs Xen where it is, in
order to receive events about R/W accesses to this emulated area -- so
why should it receive these events in the form of a PCI conf BDF/reg
rather than simply as an MMCONFIG offset, if it is basically the same
thing?



* Re: [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring
  2018-03-21 22:49                     ` Alexey G
  2018-03-22  9:29                       ` Paul Durrant
@ 2018-03-22  9:57                       ` Roger Pau Monné
  2018-03-22 12:29                         ` Alexey G
  1 sibling, 1 reply; 183+ messages in thread
From: Roger Pau Monné @ 2018-03-22  9:57 UTC (permalink / raw)
  To: Alexey G
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Ian Jackson,
	Paul Durrant, Jan Beulich, Anthony Perard, xen-devel

On Thu, Mar 22, 2018 at 08:49:58AM +1000, Alexey G wrote:
> On Wed, 21 Mar 2018 17:15:04 +0000
> Roger Pau Monné <roger.pau@citrix.com> wrote:
> [...]
> >> Above scenario makes it obvious that at least for QEMU the MMIO->PCI
> >> conf translation is a redundant step. Why not to allow specifying
> >> for DM whether it prefers to receive MMCONFIG accesses as native
> >> (MMIO ones) or as translated PCI conf ioreqs?  
> >
> >You are just adding an extra level of complexity to an interface
> >that's fairly simple. You register a PCI device using
> >XEN_DMOP_IO_RANGE_PCI and you get IOREQ_TYPE_PCI_CONFIG ioreqs.
> 
> Yes, and it is still needed as we have two distinct (and not equal)
> interfaces to PCI conf space. Apart from 0..FFh range overlapping they
> can be considered very different interfaces. And whether it is a real
> system or emulated -- we can use either one of these two interfaces or
> both.

The legacy PCI config space accesses and the MCFG config space access
are just different methods of accessing the PCI configuration space,
but the data _must_ be exactly the same. I don't see how a device
would care about where the access to the config space originated.

> For QEMU zero changes are needed to support MMCONFIG MMIO accesses if
> they come as MMIO ioreqs. It's just what its MMCONFIG emulation code
> expects.

As I've said many times in this thread, you seem to be focused on
what's best for QEMU only, and this is wrong. The IOREQ interface is
used by QEMU, but it's also used by other device emulators.

I get the feeling that you assume that the correct solution is the one
that involves less changes to Xen and QEMU. This is simply not true.

> Anyway, for (kind of vague) users of the multiple ioreq servers
> capability we can enable MMIO translation to PCI conf ioreqs. Note that
> actually this is an extra step, not forwarding trapped MMCONFIG MMIO
> accesses to the selected device model as is.
>
> >Getting both IOREQ_TYPE_PCI_CONFIG and IOREQ_TYPE_COPY for PCI config
> >space access is misleading.
> 
> These are very different accesses, both in transport and capabilities.
> 
> >In both cases Xen would have to do the MCFG access decoding in order
> >to figure out which IOREQ server will handle the request. At which
> >point the only step that you avoid is the reconstruction of the memory
> >access from the IOREQ_TYPE_PCI_CONFIG which is trivial.
> 
> The "reconstruction of the memory access" you mentioned won't be easy
> actually. The thing is, address_space_read/write is not all what we
> need.
> 
> In order to translate PCI conf ioreqs back to emulated MMIO ops, we
> need to be an involved party, mainly to know where MMCONFIG area is
> located so we can construct the address within its range from BDF.
> This piece of information is destroyed in the process of MMIO ioreq
> translation to PCI conf type.

QEMU certainly knows the position of the MCFG area (because it's the
one that tells Xen about it), so I don't understand your concerns
above.

> The code which parse PCI conf ioreqs in xen-hvm.c doesn't know anything
> about the current emulated MMCONFIG state. The correct way to have this
> info is to participate in its emulation. As we don't participate, we
> have no other way than trying to gain backdoor access to PCIHost fields
> via things like object_resolve_*(). This solution is cumbersome and
> ugly but will work... and may break anytime due to changes in QEMU. 

OK, so you don't want to reconstruct the access, fine.

Then just inject it using pcie_mmcfg_data_{read/write} or some similar
wrapper. My suggestion was just to try to use the easier way to get
this injected into QEMU.

> QEMU maintainers will grin while looking at all this I'm afraid --
> trapped MMIO accesses which are translated to PCI conf accesses which
> in turn translated back to emulated MMIO accesses upon receiving, along
> with tedious attempts to gain access to MMCONFIG-related info as we're
> not invited to the MMCONFIG emulation party.
>
> The more I think about it, the more I like the existing
> map_io_range_to_ioreq_server() approach. :( It works without doing
> anything, no hacks, no new interfaces, both MMCONFIG and CF8/CFC are
> working as expected. There is a problem to make it compatible with
> the specific multiple ioreq servers feature, but providing a new
> dmop/hypercall (which you suggest is a must have thing to trap MMCONFIG
> MMIO to give QEMU only the freedom to tell where it is located) allows
> to solve this problem in any possible way, either MMIO -> PCI conf
> translation or anything else.

I'm sorry, but I'm getting lost.

You complain that using IOREQ_TYPE_PCI_CONFIG is not a good approach
because QEMU needs to know the position of the MCFG area if we want to
reconstruct and forward the MMIO access. And then you are proposing to
use IOREQ_TYPE_COPY which _requires_ QEMU to know the position of the
MCFG area in order to do the decoding of the PCI config space access.

> >> We can still route either ioreq
> >> type to multiple device emulators accordingly.  
> >
> >It's exactly the same that's done for IO space PCI config space
> >addresses. QEMU gets an IOREQ_TYPE_PCI_CONFIG and it replays the IO
> >space access using do_outp and cpu_ioreq_pio.
> 
> ...And it is completely limited to basic PCI conf space. I don't know
> the context of this line in xen-hvm.c:
> 
> val = (1u << 31) | ((req->addr & 0x0f00) << 16) | ((sbdf & 0xffff) << 8)
>        | (req->addr & 0xfc);
> 
> but seems like current QEMU versions do not expect anything similar to
> AMD ECS-style accesses for 0CF8h. It is limited to basic PCI conf only.
> 
> >If you think using IOREQ_TYPE_COPY for MCFG accesses is such a benefit
> >for QEMU, why not just translate the IOREQ_TYPE_PCI_CONFIG into
> >IOREQ_TYPE_COPY in handle_ioreq and dispatch it using
> >cpu_ioreq_move?
> 
> Answered above, we need to somehow have access to the info which don't
> belong to us for this step.

Why not? QEMU tells Xen the position of the MCFG area but then you
complain that QEMU doesn't know the position of the MCFG area?

Roger.



* Re: [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring
  2018-03-22  9:29                       ` Paul Durrant
@ 2018-03-22 10:05                         ` Roger Pau Monné
  2018-03-22 10:09                           ` Paul Durrant
  2018-03-22 10:50                         ` Alexey G
  1 sibling, 1 reply; 183+ messages in thread
From: Roger Pau Monné @ 2018-03-22 10:05 UTC (permalink / raw)
  To: Paul Durrant
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, 'Alexey G',
	Jan Beulich, Ian Jackson, Anthony Perard, xen-devel

On Thu, Mar 22, 2018 at 09:29:44AM +0000, Paul Durrant wrote:
> > The more I think about it, the more I like the existing
> > map_io_range_to_ioreq_server() approach. :( It works without doing
> > anything, no hacks, no new interfaces, both MMCONFIG and CF8/CFC are
> > working as expected. There is a problem to make it compatible with
> > the specific multiple ioreq servers feature, but providing a new
> > dmop/hypercall (which you suggest is a must have thing to trap MMCONFIG
> > MMIO to give QEMU only the freedom to tell where it is located) allows
> > to solve this problem in any possible way, either MMIO -> PCI conf
> > translation or anything else.
> > 
> 
> I don't think we even want QEMU to have the freedom to say where the
> MMCONFIG areas are located, do we?

Sadly, this is how the chipset works. The PCIEXBAR register contains the
position of the MCFG area, and it is emulated by QEMU.

> QEMU is not in charge of the
> guest memory map and it is not responsible for the building the MCFG
> table, Xen is.

Well, the one that builds the MCFG table is hvmloader actually, which
is the one that initially sets the value of PCIEXBAR and thus the
initial position of the MCFG.

> So it should be Xen that decides where the MMCONFIG
> area goes for each registered PCI device and it should be Xen that
> adds that to the MCFG table. It should be Xen that handles the
> MMCONFIG MMIO accesses and these should be forwarded to QEMU as PCI
> config IOREQs.  Now, it may be that we need to introduce a Xen
> specific mechanism into QEMU to then route those config space
> transactions to the device models but that would be an improvement
> over the current cf8/cfc hackery anyway.

I think we need a way for QEMU to tell Xen the position of the MCFG
area, and any changes to it.

I don't think we want to emulate the PCIEXBAR register inside of Xen,
if we do that then we would likely have to emulate the full Express
Chipset inside of Xen.
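
A rough sketch of what such a "QEMU tells Xen where MCFG is" interface
might carry (the structure, field names and the size helper below are
purely illustrative assumptions, not an existing Xen dmop):

```c
#include <stdint.h>

/* Hypothetical payload of a dmop by which the device model would report
 * the guest MCFG location to Xen after a PCIEXBAR write. Names and
 * layout are assumptions for illustration only. */
struct dm_op_report_mcfg {
    uint64_t gpa;       /* base address the guest wrote to PCIEXBAR */
    uint8_t  start_bus; /* first PCI bus number decoded by the area */
    uint8_t  end_bus;   /* last PCI bus number decoded by the area */
};

/* Each decoded bus occupies 1 MiB of ECAM space
 * (32 devices * 8 functions * 4 KiB of config space). */
static inline uint64_t mcfg_size(const struct dm_op_report_mcfg *m)
{
    return ((uint64_t)(m->end_bus - m->start_bus) + 1) << 20;
}
```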

Thanks, Roger.


* Re: [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring
  2018-03-22  9:55                         ` Alexey G
@ 2018-03-22 10:06                           ` Paul Durrant
  2018-03-22 11:56                             ` Alexey G
  0 siblings, 1 reply; 183+ messages in thread
From: Paul Durrant @ 2018-03-22 10:06 UTC (permalink / raw)
  To: 'Alexey G', Jan Beulich
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, xen-devel,
	Anthony Perard, Ian Jackson, Roger Pau Monne

> -----Original Message-----
> From: Alexey G [mailto:x1917x@gmail.com]
> Sent: 22 March 2018 09:55
> To: Jan Beulich <JBeulich@suse.com>
> Cc: Andrew Cooper <Andrew.Cooper3@citrix.com>; Anthony Perard
> <anthony.perard@citrix.com>; Ian Jackson <Ian.Jackson@citrix.com>; Paul
> Durrant <Paul.Durrant@citrix.com>; Roger Pau Monne
> <roger.pau@citrix.com>; Wei Liu <wei.liu2@citrix.com>; Stefano Stabellini
> <sstabellini@kernel.org>; xen-devel@lists.xenproject.org
> Subject: Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG
> area in the MMIO hole + minor code refactoring
> 
> On Thu, 22 Mar 2018 03:04:16 -0600
> "Jan Beulich" <JBeulich@suse.com> wrote:
> 
> >>>> On 22.03.18 at 01:31, <x1917x@gmail.com> wrote:
> >> On Wed, 21 Mar 2018 17:06:28 +0000
> >> Paul Durrant <Paul.Durrant@citrix.com> wrote:
> >> [...]
> >>>> Well, this might work actually. Although the overall scenario will
> >>>> be overcomplicated a bit for _PCI_CONFIG ioreqs. Here is how it
> >>>> will look:
> >>>>
> >>>> QEMU receives PCIEXBAR update -> calls the new dmop to tell Xen
> new
> >>>> MMCONFIG address/size -> Xen (re)maps MMIO trapping area ->
> someone
> >>>> is
> >>>> accessing this area -> Xen intercepts this MMIO access
> >>>>
> >>>> But here's what happens next:
> >>>>
> >>>> Xen translates MMIO access into PCI_CONFIG and sends it to DM ->
> >>>> DM receives _PCI_CONFIG ioreq -> DM translates BDF/addr info back
> >>>> to the offset in emulated MMCONFIG range -> DM calls
> >>>> address_space_read/write to trigger MMIO emulation
> >>>>
> >>>
> >>>That would only be true of a dm that cannot handle PCI config ioreqs
> >>>directly.
> >>
> >> It's just a bit problematic for xen-hvm.c (Xen ioreq processor in
> >> QEMU).
> >>
> >> It receives these PCI conf ioreqs out of any context. To workaround
> >> this, existing code issues I/O to emulated CF8h/CFCh ports in order
> >> to allow QEMU to find their target. But we can't use the same method
> >> for MMCONFIG accesses -- this works for basic PCI conf space only.
> >
> >I think you want to view this the other way around: No physical
> >device would ever get to see MMCFG accesses (or CF8/CFC port
> >ones). This same layering is what we should have in the
> >virtualized case.
> 
> We have purely virtual layout of the PCI bus along with virtual,
> emulated and completely unrelated to host's MMCONFIG -- so what's
> exposed? This emulated MMCONFIG simply a supplement to virtual PCI bus
> and its layout correspond to the virtual PCI bus guest/QEMU see.
> 
> It's QEMU who controls chipset-specific PCIEXBAR emulation and knows
> about MMCONFIG position and size.

...and I think that is the wrong solution for Xen. We only use QEMU as an emulator for peripheral devices; we should not be using it for this kind of emulation... that should be brought into the hypervisor.

> QEMU informs Xen about where it is,

No. Xen should not care where QEMU wants to put it, because the MMIO emulations should not even reach QEMU.

   Paul

> in order to receive events about R/W accesses to this emulated area --
> so, why he should receive these events in a form of PCI conf BDF/reg and
> not simply as MMCONFIG offset directly if it is basically the same
> thing?


* Re: [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring
  2018-03-22 10:05                         ` Roger Pau Monné
@ 2018-03-22 10:09                           ` Paul Durrant
  2018-03-22 11:36                             ` Alexey G
  0 siblings, 1 reply; 183+ messages in thread
From: Paul Durrant @ 2018-03-22 10:09 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, 'Alexey G',
	Jan Beulich, Ian Jackson, Anthony Perard, xen-devel

> -----Original Message-----
> From: Roger Pau Monne
> Sent: 22 March 2018 10:06
> To: Paul Durrant <Paul.Durrant@citrix.com>
> Cc: 'Alexey G' <x1917x@gmail.com>; xen-devel@lists.xenproject.org;
> Andrew Cooper <Andrew.Cooper3@citrix.com>; Ian Jackson
> <Ian.Jackson@citrix.com>; Jan Beulich <jbeulich@suse.com>; Wei Liu
> <wei.liu2@citrix.com>; Anthony Perard <anthony.perard@citrix.com>;
> Stefano Stabellini <sstabellini@kernel.org>
> Subject: Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG
> area in the MMIO hole + minor code refactoring
> 
> On Thu, Mar 22, 2018 at 09:29:44AM +0000, Paul Durrant wrote:
> > > The more I think about it, the more I like the existing
> > > map_io_range_to_ioreq_server() approach. :( It works without doing
> > > anything, no hacks, no new interfaces, both MMCONFIG and CF8/CFC are
> > > working as expected. There is a problem to make it compatible with
> > > the specific multiple ioreq servers feature, but providing a new
> > > dmop/hypercall (which you suggest is a must have thing to trap
> MMCONFIG
> > > MMIO to give QEMU only the freedom to tell where it is located) allows
> > > to solve this problem in any possible way, either MMIO -> PCI conf
> > > translation or anything else.
> > >
> >
> > I don't think we even want QEMU to have the freedom to say where the
> > MMCONFIG areas are located, do we?
> 
> Sadly this how the chipset works. The PCIEXBAR register contains the
> position of the MCFG area. And this is emulated by QEMU.

So we should be emulating that in Xen, not handing it off to QEMU. Our integration with QEMU is already terrible and using QEMU to emulate the PCIe chipset will only make it worse.

> 
> > QEMU is not in charge of the
> > guest memory map and it is not responsible for the building the MCFG
> > table, Xen is.
> 
> Well, the one that builds the MCFG table is hvmloader actually, which
> is the one that initially sets the value of PCIEXBAR and thus the
> initial position of the MCFG.
> 
> > So it should be Xen that decides where the MMCONFIG
> > area goes for each registered PCI device and it should be Xen that
> > adds that to the MCFG table. It should be Xen that handles the
> > MMCONFIG MMIO accesses and these should be forwarded to QEMU as
> PCI
> > config IOREQs.  Now, it may be that we need to introduce a Xen
> > specific mechanism into QEMU to then route those config space
> > transactions to the device models but that would be an improvement
> > over the current cf8/cfc hackery anyway.
> 
> I think we need a way for QEMU to tell Xen the position of the MCFG
> area, and any changes to it.
> 
> I don't think we want to emulate the PCIEXBAR register inside of Xen,
> if we do that then we would likely have to emulate the full Express
> Chipset inside of Xen.
> 

No, that's *exactly* what we should be doing. We should only be using QEMU for emulation of discrete peripheral devices.

  Paul

> Thanks, Roger.


* Re: [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring
  2018-03-22  9:29                       ` Paul Durrant
  2018-03-22 10:05                         ` Roger Pau Monné
@ 2018-03-22 10:50                         ` Alexey G
  1 sibling, 0 replies; 183+ messages in thread
From: Alexey G @ 2018-03-22 10:50 UTC (permalink / raw)
  To: Paul Durrant
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Jan Beulich,
	Ian Jackson, Anthony Perard, xen-devel, Roger Pau Monne

On Thu, 22 Mar 2018 09:29:44 +0000
Paul Durrant <Paul.Durrant@citrix.com> wrote:

>> -----Original Message-----
[...]
>> >In both cases Xen would have to do the MCFG access decoding in order
>> >to figure out which IOREQ server will handle the request. At which
>> >point the only step that you avoid is the reconstruction of the
>> >memory access from the IOREQ_TYPE_PCI_CONFIG which is trivial.  
>> 
>> The "reconstruction of the memory access" you mentioned won't be easy
>> actually. The thing is, address_space_read/write is not all what we
>> need.
>> 
>> In order to translate PCI conf ioreqs back to emulated MMIO ops, we
>> need to be an involved party, mainly to know where MMCONFIG area is
>> located so we can construct the address within its range from BDF.
>> This piece of information is destroyed in the process of MMIO ioreq
>> translation to PCI conf type.
>> 
>> The code which parse PCI conf ioreqs in xen-hvm.c doesn't know
>> anything about the current emulated MMCONFIG state. The correct way
>> to have this info is to participate in its emulation. As we don't
>> participate, we have no other way than trying to gain backdoor
>> access to PCIHost fields via things like object_resolve_*(). This
>> solution is cumbersome and ugly but will work... and may break
>> anytime due to changes in QEMU.
>> 
>> QEMU maintainers will grin while looking at all this I'm afraid --
>> trapped MMIO accesses which are translated to PCI conf accesses which
>> in turn translated back to emulated MMIO accesses upon receiving,
>> along with tedious attempts to gain access to MMCONFIG-related info
>> as we're not invited to the MMCONFIG emulation party.
>> 
>> The more I think about it, the more I like the existing
>> map_io_range_to_ioreq_server() approach. :( It works without doing
>> anything, no hacks, no new interfaces, both MMCONFIG and CF8/CFC are
>> working as expected. There is a problem to make it compatible with
>> the specific multiple ioreq servers feature, but providing a new
>> dmop/hypercall (which you suggest is a must have thing to trap
>> MMCONFIG MMIO to give QEMU only the freedom to tell where it is
>> located) allows to solve this problem in any possible way, either
>> MMIO -> PCI conf translation or anything else.
>>   
>
>I don't think we even want QEMU to have the freedom to say where the
>MMCONFIG areas are located, do we? QEMU is not in charge of the guest
>memory map and it is not responsible for the building the MCFG table,
>Xen is. So it should be Xen that decides where the MMCONFIG area goes
>for each registered PCI device and it should be Xen that adds that to
>the MCFG table. It should be Xen that handles the MMCONFIG MMIO
>accesses and these should be forwarded to QEMU as PCI config IOREQs.
>Now, it may be that we need to introduce a Xen specific mechanism into
>QEMU to then route those config space transactions to the device
>models but that would be an improvement over the current cf8/cfc
>hackery anyway.

Well, MMCONFIG is a chipset-specific thing. We probably can't simply
abstract its usage, merely providing an ACPI MCFG table for it.

Its layout must correspond to the emulated PCI conf space, where the
majority of devices belong to QEMU. Although we could track all of
QEMU's usage of emulated/PT PCI devices and build this layout ourselves,
this design may introduce multiple issues. For QEMU, handling such PCI
conf ioreqs without knowing anything about MMCONFIG becomes worse --
previously it at least knew that those accesses belong to the MMCONFIG
range it emulates, but with PCI conf ioreqs the situation gets a bit
more complicated: either the CF8/CFC workaround or a manual lookup of
the target device from the rather isolated xen-hvm.c. Feasible, yes,
but it will look like a dirty hack -- doing part of QEMU's internal job.

These are merely inconveniences; the main problem at the moment is
OVMF. OVMF relocates MMCONFIG by writing to the PCIEXBAR it knows
about on Q35, and then uses the area at the address it expects. This is
something I want to address in subsequent patches.
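
For what it's worth, the translation being debated here is mechanical:
the ECAM (MMCONFIG) layout is fixed by the PCIe spec, so converting
between an offset within the area and a BDF/register tuple is a few
shifts in either direction. A standalone sketch of both directions (an
illustration, not Xen or QEMU code):

```c
#include <stdint.h>

/* Decode an offset within an ECAM (MMCONFIG) area into BDF + register,
 * per the PCIe spec layout: bus[27:20], device[19:15], function[14:12],
 * register[11:0]. */
static inline uint8_t  ecam_bus(uint64_t off) { return (off >> 20) & 0xff; }
static inline uint8_t  ecam_dev(uint64_t off) { return (off >> 15) & 0x1f; }
static inline uint8_t  ecam_fn(uint64_t off)  { return (off >> 12) & 0x7; }
static inline uint16_t ecam_reg(uint64_t off) { return off & 0xfff; }

/* The reverse mapping (the "reconstruction of the memory access"):
 * given the MCFG base and a BDF/register, rebuild the MMIO address. */
static inline uint64_t ecam_addr(uint64_t base, uint8_t bus, uint8_t dev,
                                 uint8_t fn, uint16_t reg)
{
    return base + ((uint64_t)bus << 20) + ((uint64_t)dev << 15) +
           ((uint64_t)fn << 12) + reg;
}
```

The reverse direction is only possible when the translator knows the
MCFG base, which is precisely the piece of state under discussion.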

>  Paul
>
>> >> We can still route either ioreq
>> >> type to multiple device emulators accordingly.  
>> >
>> >It's exactly the same that's done for IO space PCI config space
>> >addresses. QEMU gets an IOREQ_TYPE_PCI_CONFIG and it replays the IO
>> >space access using do_outp and cpu_ioreq_pio.  
>> 
>> ...And it is completely limited to basic PCI conf space. I don't know
>> the context of this line in xen-hvm.c:
>> 
>> val = (1u << 31) | ((req->addr & 0x0f00) << 16) | ((sbdf & 0xffff)
>> << 8) | (req->addr & 0xfc);
>> 
>> but seems like current QEMU versions do not expect anything similar
>> to AMD ECS-style accesses for 0CF8h. It is limited to basic PCI conf
>> only. 
>> >If you think using IOREQ_TYPE_COPY for MCFG accesses is such a
>> >benefit for QEMU, why not just translate the IOREQ_TYPE_PCI_CONFIG
>> >into IOREQ_TYPE_COPY in handle_ioreq and dispatch it using
>> >cpu_ioreq_move?  
>> 
>> Answered above, we need to somehow have access to the info which
>> don't belong to us for this step.
>>   
>> >Thanks, Roger.  



* Re: [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring
  2018-03-22 10:09                           ` Paul Durrant
@ 2018-03-22 11:36                             ` Alexey G
  0 siblings, 0 replies; 183+ messages in thread
From: Alexey G @ 2018-03-22 11:36 UTC (permalink / raw)
  To: Paul Durrant
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Jan Beulich,
	Ian Jackson, Anthony Perard, xen-devel, Roger Pau Monne

On Thu, 22 Mar 2018 10:09:16 +0000
Paul Durrant <Paul.Durrant@citrix.com> wrote:
[...]
>> > I don't think we even want QEMU to have the freedom to say where
>> > the MMCONFIG areas are located, do we?    
>> 
>> Sadly this how the chipset works. The PCIEXBAR register contains the
>> position of the MCFG area. And this is emulated by QEMU.    

>So we should be emulating that in Xen, not handing it off to QEMU. Our
>integration with QEMU is already terrible and using QEMU to emulate
>the PCIe chipset will only make it worse.  

I guess the QEMU guys will tell you it will actually improve. :)
One of the very first observations I made while learning Xen/QEMU was
that Xen and QEMU behave sort of like stepmother and stepdaughter --
they dislike each other but have to live together in one house for now.
I think better interaction will benefit both.

There are some architectural issues (MMIO hole control for passthrough
needs is one of them) which can be solved by actually improving
coordination with QEMU, while not sacrificing the security in any way.

>> > QEMU is not in charge of the
>> > guest memory map and it is not responsible for the building the
>> > MCFG table, Xen is.    
>> 
>> Well, the one that builds the MCFG table is hvmloader actually, which
>> is the one that initially sets the value of PCIEXBAR and thus the
>> initial position of the MCFG.
>>     
>> > So it should be Xen that decides where the MMCONFIG
>> > area goes for each registered PCI device and it should be Xen that
>> > adds that to the MCFG table. It should be Xen that handles the
>> > MMCONFIG MMIO accesses and these should be forwarded to QEMU as    
>> PCI    
>> > config IOREQs.  Now, it may be that we need to introduce a Xen
>> > specific mechanism into QEMU to then route those config space
>> > transactions to the device models but that would be an improvement
>> > over the current cf8/cfc hackery anyway.    
>> 
>> I think we need a way for QEMU to tell Xen the position of the MCFG
>> area, and any changes to it.
>> 
>> I don't think we want to emulate the PCIEXBAR register inside of Xen,
>> if we do that then we would likely have to emulate the full Express
>> Chipset inside of Xen.
>>     
>No, that's *exactly* what we should be doing. We should only be using
>QEMU for emulation of discrete peripheral devices.  

Can an emulated PCIe switch (basically a PCI-PCI bridge) be considered
a discrete peripheral device which can function alone?

If we are to emulate the whole PCIe bus, where will the dividing line
be between chipset emulation and PCIe hierarchy emulation?


* Re: [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring
  2018-03-22 10:06                           ` Paul Durrant
@ 2018-03-22 11:56                             ` Alexey G
  2018-03-22 12:09                               ` Jan Beulich
  0 siblings, 1 reply; 183+ messages in thread
From: Alexey G @ 2018-03-22 11:56 UTC (permalink / raw)
  To: Paul Durrant
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Jan Beulich,
	xen-devel, Anthony Perard, Ian Jackson, Roger Pau Monne

On Thu, 22 Mar 2018 10:06:09 +0000
Paul Durrant <Paul.Durrant@citrix.com> wrote:

>> -----Original Message-----
>> From: Alexey G [mailto:x1917x@gmail.com]
>> Sent: 22 March 2018 09:55
>> To: Jan Beulich <JBeulich@suse.com>
>> Cc: Andrew Cooper <Andrew.Cooper3@citrix.com>; Anthony Perard
>> <anthony.perard@citrix.com>; Ian Jackson <Ian.Jackson@citrix.com>;
>> Paul Durrant <Paul.Durrant@citrix.com>; Roger Pau Monne
>> <roger.pau@citrix.com>; Wei Liu <wei.liu2@citrix.com>; Stefano
>> Stabellini <sstabellini@kernel.org>; xen-devel@lists.xenproject.org
>> Subject: Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate
>> MMCONFIG area in the MMIO hole + minor code refactoring
>> 
>> On Thu, 22 Mar 2018 03:04:16 -0600
>> "Jan Beulich" <JBeulich@suse.com> wrote:
>>   
>> >>>> On 22.03.18 at 01:31, <x1917x@gmail.com> wrote:  
>> >> On Wed, 21 Mar 2018 17:06:28 +0000
>> >> Paul Durrant <Paul.Durrant@citrix.com> wrote:
>> >> [...]  
>> >>>> Well, this might work actually. Although the overall scenario
>> >>>> will be overcomplicated a bit for _PCI_CONFIG ioreqs. Here is
>> >>>> how it will look:
>> >>>>
>> >>>> QEMU receives PCIEXBAR update -> calls the new dmop to tell
>> >>>> Xen  
>> new  
>> >>>> MMCONFIG address/size -> Xen (re)maps MMIO trapping area ->  
>> someone  
>> >>>> is
>> >>>> accessing this area -> Xen intercepts this MMIO access
>> >>>>
>> >>>> But here's what happens next:
>> >>>>
>> >>>> Xen translates MMIO access into PCI_CONFIG and sends it to DM ->
>> >>>> DM receives _PCI_CONFIG ioreq -> DM translates BDF/addr info
>> >>>> back to the offset in emulated MMCONFIG range -> DM calls
>> >>>> address_space_read/write to trigger MMIO emulation
>> >>>>  
>> >>>
>> >>>That would only be true of a dm that cannot handle PCI config
>> >>>ioreqs directly.  
>> >>
>> >> It's just a bit problematic for xen-hvm.c (Xen ioreq processor in
>> >> QEMU).
>> >>
>> >> It receives these PCI conf ioreqs out of any context. To
>> >> workaround this, existing code issues I/O to emulated CF8h/CFCh
>> >> ports in order to allow QEMU to find their target. But we can't
>> >> use the same method for MMCONFIG accesses -- this works for basic
>> >> PCI conf space only.  
>> >
>> >I think you want to view this the other way around: No physical
>> >device would ever get to see MMCFG accesses (or CF8/CFC port
>> >ones). This same layering is what we should have in the
>> >virtualized case.  
>> 
>> We have purely virtual layout of the PCI bus along with virtual,
>> emulated and completely unrelated to host's MMCONFIG -- so what's
>> exposed? This emulated MMCONFIG simply a supplement to virtual PCI
>> bus and its layout correspond to the virtual PCI bus guest/QEMU see.
>> 
>> It's QEMU who controls chipset-specific PCIEXBAR emulation and knows
>> about MMCONFIG position and size.  
>
>...and I think that it the wrong solution for Xen. We only use QEMU as
>an emulator for peripheral devices; we should not be using it for this
>kind of emulation... that should be brought into the hypervisor.
>
>> QEMU informs Xen about where it is,  
>
>No. Xen should not care where QEMU wants to put it because the MMIO
>emulations should not even read QEMU.

QEMU does a lot of MMIO emulation -- what's so special about the
emulated MMCONFIG? It has absolutely nothing to do with the host's
MMCONFIG, neither in address/size nor in internal layout. None of the
host MMCONFIG-related facilities are touched in any way. It is a purely
virtual thing.

I really don't understand why some people have that fear of emulated
MMCONFIG -- it's really the same thing as any other MMIO range QEMU
already emulates via map_io_range_to_ioreq_server(). No sensitive
information is exposed. It relates only to the emulated PCI conf space,
which QEMU already knows about and uses, providing emulated PCI devices
for it.

>   Paul
>
>> in order to receive events about R/W accesses to this emulated area
>> -- so, why he should receive these events in a form of PCI conf
>> BDF/reg and not simply as MMCONFIG offset directly if it is
>> basically the same thing?  



* Re: [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring
  2018-03-22 11:56                             ` Alexey G
@ 2018-03-22 12:09                               ` Jan Beulich
  2018-03-22 13:05                                 ` Alexey G
  0 siblings, 1 reply; 183+ messages in thread
From: Jan Beulich @ 2018-03-22 12:09 UTC (permalink / raw)
  To: Alexey G
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Paul Durrant,
	xen-devel, Anthony Perard, Ian Jackson, Roger Pau Monne

>>> On 22.03.18 at 12:56, <x1917x@gmail.com> wrote:
> I really don't understand why some people have that fear of emulated
> MMCONFIG -- it's really the same thing as any other MMIO range QEMU
> already emulates via map_io_range_to_ioreq_server(). No sensitive
> information exposed. It is related only to emulated PCI conf space which
> QEMU already knows about and use, providing emulated PCI devices for it.

You continue to ignore the routing requirement multiple ioreq
servers impose.

Jan



* Re: [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring
  2018-03-22  9:57                       ` Roger Pau Monné
@ 2018-03-22 12:29                         ` Alexey G
  2018-03-22 12:44                           ` Roger Pau Monné
  0 siblings, 1 reply; 183+ messages in thread
From: Alexey G @ 2018-03-22 12:29 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Ian Jackson,
	Paul Durrant, Jan Beulich, Anthony Perard, xen-devel

On Thu, 22 Mar 2018 09:57:16 +0000
Roger Pau Monné <roger.pau@citrix.com> wrote:
[...]
>> Yes, and it is still needed as we have two distinct (and not equal)
>> interfaces to PCI conf space. Apart from 0..FFh range overlapping
>> they can be considered very different interfaces. And whether it is
>> a real system or emulated -- we can use either one of these two
>> interfaces or both.  
>
>The legacy PCI config space accesses and the MCFG config space access
>are just different methods of accessing the PCI configuration space,
>but the data _must_ be exactly the same. I don't see how a device
>would care about where the access to the config space originated.

If they were different methods of accessing the same thing, they
could have been used interchangeably. When we get a PCI conf ioreq
with an offset above FFh, we know we cannot just pass it to the
emulated CF8/CFC but have to emulate it specifically.

>> For QEMU zero changes are needed to support MMCONFIG MMIO accesses if
>> they come as MMIO ioreqs. It's just what its MMCONFIG emulation code
>> expects.  
>
>As I said many times in this thread, you seem to be focused around
>what's best for QEMU only, and this is wrong. The IOREQ interface is
>used by QEMU, but it's also used by other device emulators.
>
>I get the feeling that you assume that the correct solution is the one
>that involves less changes to Xen and QEMU. This is simply not true.
>
>> Anyway, for (kind of vague) users of the multiple ioreq servers
>> capability we can enable MMIO translation to PCI conf ioreqs. Note
>> that actually this is an extra step, not forwarding trapped MMCONFIG
>> MMIO accesses to the selected device model as is.
>>  
>> >Getting both IOREQ_TYPE_PCI_CONFIG and IOREQ_TYPE_COPY for PCI
>> >config space access is misleading.  
>> 
>> These are very different accesses, both in transport and
>> capabilities. 
>> >In both cases Xen would have to do the MCFG access decoding in order
>> >to figure out which IOREQ server will handle the request. At which
>> >point the only step that you avoid is the reconstruction of the
>> >memory access from the IOREQ_TYPE_PCI_CONFIG which is trivial.  
>> 
>> The "reconstruction of the memory access" you mentioned won't be easy
>> actually. The thing is, address_space_read/write is not all what we
>> need.
>> 
>> In order to translate PCI conf ioreqs back to emulated MMIO ops, we
>> need to be an involved party, mainly to know where MMCONFIG area is
>> located so we can construct the address within its range from BDF.
>> This piece of information is destroyed in the process of MMIO ioreq
>> translation to PCI conf type.  
>
>QEMU certainly knows the position of the MCFG area (because it's the
>one that tells Xen about it), so I don't understand your concerns
>above.
>> The code which parse PCI conf ioreqs in xen-hvm.c doesn't know
>> anything about the current emulated MMCONFIG state. The correct way
>> to have this info is to participate in its emulation. As we don't
>> participate, we have no other way than trying to gain backdoor
>> access to PCIHost fields via things like object_resolve_*(). This
>> solution is cumbersome and ugly but will work... and may break
>> anytime due to changes in QEMU.   
>
>OK, so you don't want to reconstruct the access, fine.
>
>Then just inject it using pcie_mmcfg_data_{read/write} or some similar
>wrapper. My suggestion was just to try to use the easier way to get
>this injected into QEMU.

QEMU knows its position; the problem is that xen-hvm.c (the ioreq
processor) is rather isolated from the MMCONFIG emulation.

If you check the pcie_mmcfg_data_read/write MMCONFIG handlers in QEMU,
you can see this:

static uint64_t pcie_mmcfg_data_read(void *opaque, <...>
{
    PCIExpressHost *e = opaque;
...

We have this 'opaque' available when we do MMIO-style MMCONFIG handling,
as pcie_mmcfg_data_read/write are the actual handlers.

But xen-hvm.c needs to gain access to PCIExpressHost out of nowhere,
which is possible but would be considered a hack by QEMU. We can also
insert some code into the MMCONFIG emulation which stores the info we
need in global variables, to be used across wildly different and
unrelated modules. It will work, but anyone who sees it will have bad
thoughts on their mind.
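
As a side note, the reason the CF8h/CFCh replay trick cannot simply be
reused for MMCONFIG is visible in the CF8 address format itself: the
register field is only 8 bits wide (with the low two bits ignored), so
extended-capability offsets at 100h and above cannot be encoded at all.
A standalone illustration of the conventional PCI configuration
mechanism #1 encoding (not Xen/QEMU code):

```c
#include <stdint.h>

/* Legacy CF8h address encoding: enable[31], bus[23:16], device[15:11],
 * function[10:8], register[7:2]. An 8-bit register field means offsets
 * >= 0x100 (PCIe extended config space) simply do not fit. */
static inline uint32_t cf8_encode(uint8_t bus, uint8_t dev, uint8_t fn,
                                  uint8_t reg)
{
    return 0x80000000u | ((uint32_t)bus << 16) |
           ((uint32_t)(dev & 0x1f) << 11) | ((uint32_t)(fn & 0x7) << 8) |
           (reg & 0xfcu);
}
```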

>> QEMU maintainers will grin while looking at all this I'm afraid --
>> trapped MMIO accesses which are translated to PCI conf accesses which
>> in turn translated back to emulated MMIO accesses upon receiving,
>> along with tedious attempts to gain access to MMCONFIG-related info
>> as we're not invited to the MMCONFIG emulation party.
>>
>> The more I think about it, the more I like the existing
>> map_io_range_to_ioreq_server() approach. :( It works without doing
>> anything, no hacks, no new interfaces, both MMCONFIG and CF8/CFC are
>> working as expected. There is a problem to make it compatible with
>> the specific multiple ioreq servers feature, but providing a new
>> dmop/hypercall (which you suggest is a must have thing to trap
>> MMCONFIG MMIO to give QEMU only the freedom to tell where it is
>> located) allows to solve this problem in any possible way, either
>> MMIO -> PCI conf translation or anything else.  
>
>I'm sorry, but I'm getting lost.
>
>You complain that using IOREQ_TYPE_PCI_CONFIG is not a good approach
>because QEMU needs to know the position of the MCFG area if we want to
>reconstruct and forward the MMIO access. And then you are proposing to
>use IOREQ_TYPE_COPY which _requires_ QEMU to know the position of the
>MCFG area in order to do the decoding of the PCI config space access.
>> >> We can still route either ioreq
>> >> type to multiple device emulators accordingly.    
>> >
>> >It's exactly the same that's done for IO space PCI config space
>> >addresses. QEMU gets an IOREQ_TYPE_PCI_CONFIG and it replays the IO
>> >space access using do_outp and cpu_ioreq_pio.  
>> 
>> ...And it is completely limited to basic PCI conf space. I don't know
>> the context of this line in xen-hvm.c:
>> 
>> val = (1u << 31) | ((req->addr & 0x0f00) << 16) | ((sbdf & 0xffff)
>> << 8) | (req->addr & 0xfc);
>> 
>> but seems like current QEMU versions do not expect anything similar
>> to AMD ECS-style accesses for 0CF8h. It is limited to basic PCI conf
>> only. 
>> >If you think using IOREQ_TYPE_COPY for MCFG accesses is such a
>> >benefit for QEMU, why not just translate the IOREQ_TYPE_PCI_CONFIG
>> >into IOREQ_TYPE_COPY in handle_ioreq and dispatch it using
>> >cpu_ioreq_move?  
>> 
>> Answered above, we need to somehow have access to the info which
>> don't belong to us for this step.  
>
>Why not? QEMU tells Xen the position of the MCFG area but then you
>complain that QEMU doesn't know the position of the MCFG area?

Answered above.


* Re: [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring
  2018-03-22 12:29                         ` Alexey G
@ 2018-03-22 12:44                           ` Roger Pau Monné
  2018-03-22 15:31                             ` Alexey G
  0 siblings, 1 reply; 183+ messages in thread
From: Roger Pau Monné @ 2018-03-22 12:44 UTC (permalink / raw)
  To: Alexey G
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Ian Jackson,
	Paul Durrant, Jan Beulich, Anthony Perard, xen-devel

On Thu, Mar 22, 2018 at 10:29:22PM +1000, Alexey G wrote:
> On Thu, 22 Mar 2018 09:57:16 +0000
> Roger Pau Monné <roger.pau@citrix.com> wrote:
> [...]
> >> Yes, and it is still needed as we have two distinct (and not equal)
> >> interfaces to PCI conf space. Apart from 0..FFh range overlapping
> >> they can be considered very different interfaces. And whether it is
> >> a real system or emulated -- we can use either one of these two
> >> interfaces or both.  
> >
> >The legacy PCI config space accesses and the MCFG config space access
> >are just different methods of accessing the PCI configuration space,
> >but the data _must_ be exactly the same. I don't see how a device
> >would care about where the access to the config space originated.
> 
> If they were different methods of accessing the same thing, they
> could've been used interchangeably. When we've got a PCI conf ioreq
> which has offset>100h we know we cannot just pass it to emulated
> CF8/CFC but have to emulate this specifically.

This is already not the best approach to dispatching PCI config space
accesses in QEMU. I think the interface in QEMU should be:

pci_conf_space_{read/write}(sbdf, register, size, data)

And this would go directly into the device. But I assume this involves
a non-trivial amount of work to implement. Hence xen-hvm.c's usage of
the IO port access replay.
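A rough sketch of what such an interface might look like, using a made-up device table instead of QEMU's real PCI core; every name below is hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CFG_SPACE_SIZE 4096  /* PCIe extended config space per function */

/* Hypothetical stand-in for device lookup; a real implementation would
 * resolve sbdf through the host bridge and call the device's own
 * config_read/config_write methods. */
struct fake_pci_dev {
    uint32_t sbdf;                  /* segment:bus:dev.fn, packed */
    uint8_t cfg[CFG_SPACE_SIZE];
};

static int pci_conf_space_read(struct fake_pci_dev *devs, size_t n,
                               uint32_t sbdf, uint16_t reg,
                               unsigned size, uint32_t *data)
{
    uint32_t v = 0;

    for (size_t i = 0; i < n; i++) {
        if (devs[i].sbdf != sbdf)
            continue;
        if (reg + size > CFG_SPACE_SIZE)
            return -1;
        for (unsigned b = 0; b < size; b++)   /* little-endian bytes */
            v |= (uint32_t)devs[i].cfg[reg + b] << (8 * b);
        *data = v;
        return 0;
    }
    return -1;                      /* no such device: master abort */
}
```

The point of such an interface is that the sbdf/register pair is the whole address; whether the guest generated it via CF8/CFC or an MMCONFIG page becomes irrelevant at this layer.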

> >OK, so you don't want to reconstruct the access, fine.
> >
> >Then just inject it using pcie_mmcfg_data_{read/write} or some similar
> >wrapper. My suggestion was just to try to use the easier way to get
> >this injected into QEMU.
> 
> QEMU knows its position, the problem is that xen-hvm.c (ioreq
> processor) is rather isolated from MMCONFIG emulation.
> 
> If you check the pcie_mmcfg_data_read/write MMCONFIG handlers in QEMU,
> you can see this:
> 
> static uint64_t pcie_mmcfg_data_read(void *opaque, <...>
> {
>     PCIExpressHost *e = opaque;
> ...
> 
> We know this 'opaque' when we do MMIO-style MMCONFIG handling as
> pcie_mmcfg_data_read/write are actual handlers.
> 
> But xen-hvm.c needs to gain access to PCIExpressHost out of nowhere,
> which is possible but considered a hack by QEMU. We can also insert
> some code to MMCONFIG emulation which will store info we need to some
> global variables to be used across wildly different and unrelated
> modules. It will work, but anyone who see it will have bad thoughts on
> his mind.

Since you need to notify Xen of the MCFG area address, why not just
store the MCFG address while doing this operation? You could do this
with a helper in xen-hvm.c, and keep the variable local to that file.

In any case, this is a QEMU implementation detail. IMO the IOREQ
interface is clear and should not be bent like this just because
'this is easier to implement in QEMU'.
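A minimal sketch of that suggestion -- a file-scope variable plus a helper, with invented names -- might look like:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sketch only: xen-hvm.c keeps the guest MCFG location in a file-scope
 * variable, updated on the same path that notifies Xen.  Names are
 * made up for illustration. */
static uint64_t mcfg_base;
static uint64_t mcfg_size;

static void xen_mcfg_update(uint64_t base, uint64_t size)
{
    mcfg_base = base;
    mcfg_size = size;
    /* ...the real code would also issue the dmop/hypercall telling
     * Xen about the new MCFG placement here... */
}

static bool xen_addr_in_mcfg(uint64_t addr)
{
    return mcfg_size && addr >= mcfg_base && addr - mcfg_base < mcfg_size;
}
```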

Roger.


* Re: [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring
  2018-03-22 12:09                               ` Jan Beulich
@ 2018-03-22 13:05                                 ` Alexey G
  2018-03-22 13:20                                   ` Jan Beulich
  0 siblings, 1 reply; 183+ messages in thread
From: Alexey G @ 2018-03-22 13:05 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Paul Durrant,
	xen-devel, Anthony Perard, Ian Jackson, Roger Pau Monne

On Thu, 22 Mar 2018 06:09:44 -0600
"Jan Beulich" <JBeulich@suse.com> wrote:

>>>> On 22.03.18 at 12:56, <x1917x@gmail.com> wrote:  
>> I really don't understand why some people have that fear of emulated
>> MMCONFIG -- it's really the same thing as any other MMIO range QEMU
>> already emulates via map_io_range_to_ioreq_server(). No sensitive
>> information exposed. It is related only to emulated PCI conf space
>> which QEMU already knows about and use, providing emulated PCI
>> devices for it.  
>
>You continue to ignore the routing requirement multiple ioreq
>servers impose.

If the emulated MMCONFIG approach is modified to become fully
compatible with multiple ioreq servers (whatever they are used for),
can I assume there will be no objections to using emulated MMCONFIG?
I just want to clarify this point -- why do people think that
a completely emulated MMIO range, not related in any way to the
host's MMCONFIG, may compromise something?


* Re: [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring
  2018-03-22 13:05                                 ` Alexey G
@ 2018-03-22 13:20                                   ` Jan Beulich
  2018-03-22 14:34                                     ` Alexey G
  0 siblings, 1 reply; 183+ messages in thread
From: Jan Beulich @ 2018-03-22 13:20 UTC (permalink / raw)
  To: Alexey G
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Paul Durrant,
	xen-devel, Anthony Perard, Ian Jackson, Roger Pau Monne

>>> On 22.03.18 at 14:05, <x1917x@gmail.com> wrote:
> On Thu, 22 Mar 2018 06:09:44 -0600
> "Jan Beulich" <JBeulich@suse.com> wrote:
> 
>>>>> On 22.03.18 at 12:56, <x1917x@gmail.com> wrote:  
>>> I really don't understand why some people have that fear of emulated
>>> MMCONFIG -- it's really the same thing as any other MMIO range QEMU
>>> already emulates via map_io_range_to_ioreq_server(). No sensitive
>>> information exposed. It is related only to emulated PCI conf space
>>> which QEMU already knows about and use, providing emulated PCI
>>> devices for it.  
>>
>>You continue to ignore the routing requirement multiple ioreq
>>servers impose.
> 
> If the emulated MMCONFIG approach will be modified to become
> fully compatible with multiple ioreq servers (whatever they used for), I
> assume there will be no objections that emulated MMCONFIG can't be
> used?
> I just want to clarify this moment -- why people think that
> a completely emulated MMIO range, not related in any
> way to host's MMCONFIG may compromise something.

Compromise? All that was said so far - afair - was that this is the
wrong way round, design-wise.

Jan



* Re: [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring
  2018-03-22 13:20                                   ` Jan Beulich
@ 2018-03-22 14:34                                     ` Alexey G
  2018-03-22 14:42                                       ` Jan Beulich
  0 siblings, 1 reply; 183+ messages in thread
From: Alexey G @ 2018-03-22 14:34 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Paul Durrant,
	xen-devel, Anthony Perard, Ian Jackson, Roger Pau Monne

On Thu, 22 Mar 2018 07:20:00 -0600
"Jan Beulich" <JBeulich@suse.com> wrote:

>>>> On 22.03.18 at 14:05, <x1917x@gmail.com> wrote:  
>> On Thu, 22 Mar 2018 06:09:44 -0600
>> "Jan Beulich" <JBeulich@suse.com> wrote:
>>   
>>>>>> On 22.03.18 at 12:56, <x1917x@gmail.com> wrote:    
>>>> I really don't understand why some people have that fear of
>>>> emulated MMCONFIG -- it's really the same thing as any other MMIO
>>>> range QEMU already emulates via map_io_range_to_ioreq_server(). No
>>>> sensitive information exposed. It is related only to emulated PCI
>>>> conf space which QEMU already knows about and use, providing
>>>> emulated PCI devices for it.    
>>>
>>>You continue to ignore the routing requirement multiple ioreq
>>>servers impose.  
>> 
>> If the emulated MMCONFIG approach will be modified to become
>> fully compatible with multiple ioreq servers (whatever they used
>> for), I assume there will be no objections that emulated MMCONFIG
>> can't be used?
>> I just want to clarify this moment -- why people think that
>> a completely emulated MMIO range, not related in any
>> way to host's MMCONFIG may compromise something.  
>
>Compromise? All that was said so far - afair - was that this is the
>wrong way round design wise.

I assume it's all about emulating some real system for HVM; for other
goals PV/PVH are available. What do you think is a proper, design-wise
way to emulate the MMIO-based MMCONFIG range Q35 provides?

Here is what I've heard so far in this thread:

1. Add a completely new dmop/hypercall so that QEMU can tell Xen where
the emulated MMCONFIG MMIO area is located and at the same time map it
for MMIO trapping to intercept accesses. The latter action is the same
as what map_io_range_to_ioreq_server() does, but let's ignore that for
now because the opinion was that we need to stick to a distinct
hypercall.

2. Upon trapping accesses to this emulated range, Xen will pretend that
QEMU didn't just tell it about the MMCONFIG location and size, and will
instead convert the MMIO access into a PCI conf one and send the ioreq
to QEMU or some other DM.

3. If there is a PCIEXBAR relocation (OVMF currently does it for
MMCONFIG usage, but we must later teach it non-QEMU manners), QEMU must
immediately inform Xen about any changes in MMCONFIG location/status.

4. QEMU receives a PCI conf access while expecting the MMIO address, so
xen-hvm.c has to deal with it somehow, either obtaining the MMCONFIG
base and recreating the emulated MMIO access from BDF/reg, or doing the
dirty work of finding the target PCIBus/PCIDevice itself, as it cannot
use the emulated CF8/CFC ports due to the legacy PCI conf size
limitation.

Please confirm that this is the preferable solution, or point out
anything missing.
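The translation implied by points 2 and 4 -- turning an MMCONFIG MMIO offset into a BDF/register pair and back -- is mechanical under the standard ECAM layout. A sketch with made-up helper names:

```c
#include <assert.h>
#include <stdint.h>

/* ECAM/MMCONFIG layout: one 4 KiB config page per function,
 *   offset = bus << 20 | dev << 15 | fn << 12 | reg.
 * Helper names are illustrative, not from the actual patches. */
static uint64_t mmcfg_addr(uint64_t base, uint8_t bus, uint8_t dev,
                           uint8_t fn, uint16_t reg)
{
    return base + ((uint64_t)bus << 20) + ((uint64_t)(dev & 0x1f) << 15)
                + ((uint64_t)(fn & 0x7) << 12) + (reg & 0xfff);
}

static void mmcfg_decode(uint64_t base, uint64_t addr, uint16_t *bdf,
                         uint16_t *reg)
{
    uint64_t off = addr - base;

    *bdf = (uint16_t)(off >> 12);   /* bus(8) | dev(5) | fn(3) */
    *reg = (uint16_t)(off & 0xfff); /* 0x000..0xfff, incl. extended */
}
```

Either direction needs the MMCONFIG base, which is why step 4's "recreate the MMIO access from BDF/reg" forces xen-hvm.c to know where the area currently sits.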


* Re: [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring
  2018-03-22 14:34                                     ` Alexey G
@ 2018-03-22 14:42                                       ` Jan Beulich
  2018-03-22 15:08                                         ` Alexey G
  0 siblings, 1 reply; 183+ messages in thread
From: Jan Beulich @ 2018-03-22 14:42 UTC (permalink / raw)
  To: Alexey G
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Paul Durrant,
	xen-devel, Anthony Perard, Ian Jackson, Roger Pau Monne

>>> On 22.03.18 at 15:34, <x1917x@gmail.com> wrote:
> On Thu, 22 Mar 2018 07:20:00 -0600
> "Jan Beulich" <JBeulich@suse.com> wrote:
> 
>>>>> On 22.03.18 at 14:05, <x1917x@gmail.com> wrote:  
>>> On Thu, 22 Mar 2018 06:09:44 -0600
>>> "Jan Beulich" <JBeulich@suse.com> wrote:
>>>   
>>>>>>> On 22.03.18 at 12:56, <x1917x@gmail.com> wrote:    
>>>>> I really don't understand why some people have that fear of
>>>>> emulated MMCONFIG -- it's really the same thing as any other MMIO
>>>>> range QEMU already emulates via map_io_range_to_ioreq_server(). No
>>>>> sensitive information exposed. It is related only to emulated PCI
>>>>> conf space which QEMU already knows about and use, providing
>>>>> emulated PCI devices for it.    
>>>>
>>>>You continue to ignore the routing requirement multiple ioreq
>>>>servers impose.  
>>> 
>>> If the emulated MMCONFIG approach will be modified to become
>>> fully compatible with multiple ioreq servers (whatever they used
>>> for), I assume there will be no objections that emulated MMCONFIG
>>> can't be used?
>>> I just want to clarify this moment -- why people think that
>>> a completely emulated MMIO range, not related in any
>>> way to host's MMCONFIG may compromise something.  
>>
>>Compromise? All that was said so far - afair - was that this is the
>>wrong way round design wise.
> 
> I assume it's all about emulating some real system for HVM, for other
> goals PV/PVH are available. What is a proper, design-wise way to
> emulate the MMIO-based MMCONFIG range Q35 provides you think of?
> 
> Here is what I've heard so far in this thread:
> 
> 1. Add a completely new dmop/hypercall so that QEMU can tell Xen where
> emulated MMCONFIG MMIO area is located and in the same time map it for
> MMIO trapping to intercept accesses. Latter action is the same what
> map_io_range_to_ioreq_server() does, but let's ignore it for now
> because there was opinion that we need to stick to a distinct hypercall.
> 
> 2. Upon trapping accesses to this emulated range, Xen will pretend that
> QEMU didn't just told him about MMCONFIG location and size and instead
> convert MMIO access into PCI conf one and send the ioreq to QEMU or
> some other DM.
> 
> 3. If there will be a PCIEXBAR relocation (OVMF does it currently for
> MMCONFIG usage, but we must later teach him non-QEMU manners), QEMU must
> immediately inform Xen about any changes in MMCONFIG location/status.
> 
> 4. QEMU receives PCI conf access while expecting the MMIO address, so
> xen-hvm.c has to deal with it somehow, either obtaining MMCONFIG base
> and recreating emulated MMIO access from BDF/reg or doing the dirty work
> of finding PCIBus/PCIDevice target itself as it cannot use emulated
> CF8/CFC ports due to legacy PCI conf size limitation.
> 
> Please confirm that it is a preferable solution or if something missing.

I'm afraid this is only part of the picture, as you've been told by
others before. We first of all need to settle on who emulates
the core chipset registers. How Xen learns about the MCFG location
inside the guest will depend on that.

Jan



* Re: [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring
  2018-03-22 14:42                                       ` Jan Beulich
@ 2018-03-22 15:08                                         ` Alexey G
  2018-03-23 13:57                                           ` Paul Durrant
  0 siblings, 1 reply; 183+ messages in thread
From: Alexey G @ 2018-03-22 15:08 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Paul Durrant,
	xen-devel, Anthony Perard, Ian Jackson, Roger Pau Monne

On Thu, 22 Mar 2018 08:42:09 -0600
"Jan Beulich" <JBeulich@suse.com> wrote:

>>>> On 22.03.18 at 15:34, <x1917x@gmail.com> wrote:  
>> On Thu, 22 Mar 2018 07:20:00 -0600
>> "Jan Beulich" <JBeulich@suse.com> wrote:
>>   
>>>>>> On 22.03.18 at 14:05, <x1917x@gmail.com> wrote:    
>>>> On Thu, 22 Mar 2018 06:09:44 -0600
>>>> "Jan Beulich" <JBeulich@suse.com> wrote:
>>>>     
>>>>>>>> On 22.03.18 at 12:56, <x1917x@gmail.com> wrote:      
>>>>>> I really don't understand why some people have that fear of
>>>>>> emulated MMCONFIG -- it's really the same thing as any other MMIO
>>>>>> range QEMU already emulates via map_io_range_to_ioreq_server().
>>>>>> No sensitive information exposed. It is related only to emulated
>>>>>> PCI conf space which QEMU already knows about and use, providing
>>>>>> emulated PCI devices for it.      
>>>>>
>>>>>You continue to ignore the routing requirement multiple ioreq
>>>>>servers impose.    
>>>> 
>>>> If the emulated MMCONFIG approach will be modified to become
>>>> fully compatible with multiple ioreq servers (whatever they used
>>>> for), I assume there will be no objections that emulated MMCONFIG
>>>> can't be used?
>>>> I just want to clarify this moment -- why people think that
>>>> a completely emulated MMIO range, not related in any
>>>> way to host's MMCONFIG may compromise something.    
>>>
>>>Compromise? All that was said so far - afair - was that this is the
>>>wrong way round design wise.  
>> 
>> I assume it's all about emulating some real system for HVM, for other
>> goals PV/PVH are available. What is a proper, design-wise way to
>> emulate the MMIO-based MMCONFIG range Q35 provides you think of?
>> 
>> Here is what I've heard so far in this thread:
>> 
>> 1. Add a completely new dmop/hypercall so that QEMU can tell Xen
>> where emulated MMCONFIG MMIO area is located and in the same time
>> map it for MMIO trapping to intercept accesses. Latter action is the
>> same what map_io_range_to_ioreq_server() does, but let's ignore it
>> for now because there was opinion that we need to stick to a
>> distinct hypercall.
>> 
>> 2. Upon trapping accesses to this emulated range, Xen will pretend
>> that QEMU didn't just told him about MMCONFIG location and size and
>> instead convert MMIO access into PCI conf one and send the ioreq to
>> QEMU or some other DM.
>> 
>> 3. If there will be a PCIEXBAR relocation (OVMF does it currently for
>> MMCONFIG usage, but we must later teach him non-QEMU manners), QEMU
>> must immediately inform Xen about any changes in MMCONFIG
>> location/status.
>> 
>> 4. QEMU receives PCI conf access while expecting the MMIO address, so
>> xen-hvm.c has to deal with it somehow, either obtaining MMCONFIG base
>> and recreating emulated MMIO access from BDF/reg or doing the dirty
>> work of finding PCIBus/PCIDevice target itself as it cannot use
>> emulated CF8/CFC ports due to legacy PCI conf size limitation.
>> 
>> Please confirm that it is a preferable solution or if something
>> missing.  
>
>I'm afraid this is only part of the picture, as you've been told by
>others before. We first of all need to settle on who emulates
>the core chipset registers. Depending on that will be how Xen
>would learn about the MCFG location inside the guest.

A few related thoughts:

1. The MMCONFIG address is chipset-specific. On Q35 it's PCIEXBAR, on
other x86 systems it may be HECBASE or something else. So we can
assume it is bound to the emulated machine.

2. We rely on QEMU to emulate different machines for us.

3. There are users that touch the chipset-specific PCIEXBAR directly
if they see a Q35 system (OVMF so far).

It seems we're pretty limited in freedom of choice under these
conditions, I'm afraid.
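To make point 1 concrete: on Q35 the guest-visible MMCONFIG placement is whatever the emulated PCIEXBAR register says. A hedged sketch of its decode, with the field layout assumed from QEMU's Q35 model rather than quoted from a datasheet:

```c
#include <assert.h>
#include <stdint.h>

/* Rough decode of the Q35 PCIEXBAR register (D0:F0, config offset
 * 0x60): bit 0 enables the window, bits 2:1 select its length.
 * Field layout assumed from QEMU's hw/pci-host/q35.c; treat this as
 * a sketch, not a datasheet quote. */
static int pciexbar_decode(uint64_t v, uint64_t *base, uint64_t *size)
{
    static const uint64_t len[4] = {
        256ull << 20, 128ull << 20, 64ull << 20, 0 /* reserved */
    };
    unsigned sel = (unsigned)(v >> 1) & 3;

    if (!(v & 1) || !len[sel])   /* window disabled or reserved code */
        return -1;
    *size = len[sel];
    *base = v & ~(len[sel] - 1); /* base is naturally aligned */
    return 0;
}
```

Whoever ends up emulating this register is also the party that first learns about a relocation, which is the crux of the "who emulates the core chipset registers" question above.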


* Re: [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring
  2018-03-22 12:44                           ` Roger Pau Monné
@ 2018-03-22 15:31                             ` Alexey G
  2018-03-23 10:29                               ` Paul Durrant
  0 siblings, 1 reply; 183+ messages in thread
From: Alexey G @ 2018-03-22 15:31 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Ian Jackson,
	Paul Durrant, Jan Beulich, Anthony Perard, xen-devel

On Thu, 22 Mar 2018 12:44:02 +0000
Roger Pau Monné <roger.pau@citrix.com> wrote:

>On Thu, Mar 22, 2018 at 10:29:22PM +1000, Alexey G wrote:
>> On Thu, 22 Mar 2018 09:57:16 +0000
>> Roger Pau Monné <roger.pau@citrix.com> wrote:
>> [...]  
>> >> Yes, and it is still needed as we have two distinct (and not
>> >> equal) interfaces to PCI conf space. Apart from 0..FFh range
>> >> overlapping they can be considered very different interfaces. And
>> >> whether it is a real system or emulated -- we can use either one
>> >> of these two interfaces or both.    
>> >
>> >The legacy PCI config space accesses and the MCFG config space
>> >access are just different methods of accessing the PCI
>> >configuration space, but the data _must_ be exactly the same. I
>> >don't see how a device would care about where the access to the
>> >config space originated.  
>> 
>> If they were different methods of accessing the same thing, they
>> could've been used interchangeably. When we've got a PCI conf ioreq
>> which has offset>100h we know we cannot just pass it to emulated
>> CF8/CFC but have to emulate this specifically.  
>
>This is already not the best approach to dispatch PCI config space
>access in QEMU. I think the interface in QEMU should be:
>
>pci_conf_space_{read/write}(sbdf, register, size , data)
>
>And this would go directly into the device. But I assume this involves
>a non-trivial amount of work to be implemented. Hence xen-hvm.c usage
>of the IO port access replay.

Yes, it's a helpful shortcut. The only bad thing is that we can't use
it for PCI extended config accesses; a memory address within the
emulated MMCONFIG is much more preferable in the current architecture.

>> >OK, so you don't want to reconstruct the access, fine.
>> >
>> >Then just inject it using pcie_mmcfg_data_{read/write} or some
>> >similar wrapper. My suggestion was just to try to use the easier
>> >way to get this injected into QEMU.  
>> 
>> QEMU knows its position, the problem is that xen-hvm.c (ioreq
>> processor) is rather isolated from MMCONFIG emulation.
>> 
>> If you check the pcie_mmcfg_data_read/write MMCONFIG handlers in
>> QEMU, you can see this:
>> 
>> static uint64_t pcie_mmcfg_data_read(void *opaque, <...>
>> {
>>     PCIExpressHost *e = opaque;
>> ...
>> 
>> We know this 'opaque' when we do MMIO-style MMCONFIG handling as
>> pcie_mmcfg_data_read/write are actual handlers.
>> 
>> But xen-hvm.c needs to gain access to PCIExpressHost out of nowhere,
>> which is possible but considered a hack by QEMU. We can also insert
>> some code to MMCONFIG emulation which will store info we need to some
>> global variables to be used across wildly different and unrelated
>> modules. It will work, but anyone who see it will have bad thoughts
>> on his mind.  
>
>Since you need to notify Xen the MCFG area address, why not just store
>the MCFG address while doing this operation? You could do this with a
>helper in xen-hvm.c, and keep the variable locally to that file.
>
>In any case, this is a QEMU implementation detail. IMO the IOREQ
>interface is clear and should not be bended like this just because
>'this is easier to implement in QEMU'.

A bit of a hack too, but it might work. Anyway, it's extra work we can
avoid if we simply skip the PCI conf translation for MMCONFIG MMIO
ioreqs targeting QEMU. I completely agree that we need to translate
these accesses into PCI conf ioreqs for device DMs, but for QEMU it is
an unwanted and redundant step.

AFAIK (Paul might correct me here) the multiple device emulators
feature already makes use of the distinction between the primary (aka
default) DM and device-specific DMs, so in theory it should be
possible to provide that translation only for device-specific DMs
(which function apart from the emulated machine and cannot use its
facilities).


* Re: [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring
  2018-03-22 15:31                             ` Alexey G
@ 2018-03-23 10:29                               ` Paul Durrant
  2018-03-23 11:38                                 ` Jan Beulich
  0 siblings, 1 reply; 183+ messages in thread
From: Paul Durrant @ 2018-03-23 10:29 UTC (permalink / raw)
  To: 'Alexey G', Roger Pau Monne
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Jan Beulich,
	xen-devel, Anthony Perard, Ian Jackson

> -----Original Message-----
> From: Xen-devel [mailto:xen-devel-bounces@lists.xenproject.org] On Behalf
> Of Alexey G
> Sent: 22 March 2018 15:31
> To: Roger Pau Monne <roger.pau@citrix.com>
> Cc: Stefano Stabellini <sstabellini@kernel.org>; Wei Liu
> <wei.liu2@citrix.com>; Andrew Cooper <Andrew.Cooper3@citrix.com>; Ian
> Jackson <Ian.Jackson@citrix.com>; Paul Durrant <Paul.Durrant@citrix.com>;
> Jan Beulich <jbeulich@suse.com>; Anthony Perard
> <anthony.perard@citrix.com>; xen-devel@lists.xenproject.org
> Subject: Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG
> area in the MMIO hole + minor code refactoring
> 
> On Thu, 22 Mar 2018 12:44:02 +0000
> Roger Pau Monné <roger.pau@citrix.com> wrote:
> 
> >On Thu, Mar 22, 2018 at 10:29:22PM +1000, Alexey G wrote:
> >> On Thu, 22 Mar 2018 09:57:16 +0000
> >> Roger Pau Monné <roger.pau@citrix.com> wrote:
> >> [...]
> >> >> Yes, and it is still needed as we have two distinct (and not
> >> >> equal) interfaces to PCI conf space. Apart from 0..FFh range
> >> >> overlapping they can be considered very different interfaces. And
> >> >> whether it is a real system or emulated -- we can use either one
> >> >> of these two interfaces or both.
> >> >
> >> >The legacy PCI config space accesses and the MCFG config space
> >> >access are just different methods of accessing the PCI
> >> >configuration space, but the data _must_ be exactly the same. I
> >> >don't see how a device would care about where the access to the
> >> >config space originated.
> >>
> >> If they were different methods of accessing the same thing, they
> >> could've been used interchangeably. When we've got a PCI conf ioreq
> >> which has offset>100h we know we cannot just pass it to emulated
> >> CF8/CFC but have to emulate this specifically.
> >
> >This is already not the best approach to dispatch PCI config space
> >access in QEMU. I think the interface in QEMU should be:
> >
> >pci_conf_space_{read/write}(sbdf, register, size , data)
> >
> >And this would go directly into the device. But I assume this involves
> >a non-trivial amount of work to be implemented. Hence xen-hvm.c usage
> >of the IO port access replay.
> 
> Yes, it's a helpful shortcut. The only bad thing that we can't use
> it for PCI extended config accesses, a memory address within emulated
> MMCONFIG much more preferable in current architecture.
> 
> >> >OK, so you don't want to reconstruct the access, fine.
> >> >
> >> >Then just inject it using pcie_mmcfg_data_{read/write} or some
> >> >similar wrapper. My suggestion was just to try to use the easier
> >> >way to get this injected into QEMU.
> >>
> >> QEMU knows its position, the problem is that xen-hvm.c (ioreq
> >> processor) is rather isolated from MMCONFIG emulation.
> >>
> >> If you check the pcie_mmcfg_data_read/write MMCONFIG handlers in
> >> QEMU, you can see this:
> >>
> >> static uint64_t pcie_mmcfg_data_read(void *opaque, <...>
> >> {
> >>     PCIExpressHost *e = opaque;
> >> ...
> >>
> >> We know this 'opaque' when we do MMIO-style MMCONFIG handling as
> >> pcie_mmcfg_data_read/write are actual handlers.
> >>
> >> But xen-hvm.c needs to gain access to PCIExpressHost out of nowhere,
> >> which is possible but considered a hack by QEMU. We can also insert
> >> some code to MMCONFIG emulation which will store info we need to
> some
> >> global variables to be used across wildly different and unrelated
> >> modules. It will work, but anyone who see it will have bad thoughts
> >> on his mind.
> >
> >Since you need to notify Xen the MCFG area address, why not just store
> >the MCFG address while doing this operation? You could do this with a
> >helper in xen-hvm.c, and keep the variable locally to that file.
> >
> >In any case, this is a QEMU implementation detail. IMO the IOREQ
> >interface is clear and should not be bended like this just because
> >'this is easier to implement in QEMU'.
> 
> A bit of hack too, but might work. Anyway, it's an extra work we can
> avoid if we simply skip PCI conf translation for MMCONFIG MMIO ioreqs
> targeting QEMU. I completely agree that we need to translate these
> accesses into PCI conf ioreqs for device DMs, but for QEMU it is an
> unwanted and redundant step.
> 
> AFAIK (Paul might correct me here) the multiple device emulators
> feature already makes use of the primary (aka default) DM and
> device-specific DM distinction, so in theory it should be possible to
> provide that translation only for device-specific DMs (which function
> apart from the emulated machine and cannot use its facilities).
> 

No, that's not quite right. Only qemu-trad (and stubdom) are 'default' ioreq servers. Upstream QEMU has registered individual PCI devices with Xen for some time now, and hence gets proper PCI config IOREQs. Also we really really want default ioreq servers as their interface to Xen is fragile and has only just narrowly avoided being a security issue.

  Paul


* Re: [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring
  2018-03-23 10:29                               ` Paul Durrant
@ 2018-03-23 11:38                                 ` Jan Beulich
  2018-03-23 13:52                                   ` Paul Durrant
  0 siblings, 1 reply; 183+ messages in thread
From: Jan Beulich @ 2018-03-23 11:38 UTC (permalink / raw)
  To: Paul Durrant
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, 'Alexey G',
	xen-devel, Anthony Perard, Ian Jackson, Roger Pau Monne

>>> On 23.03.18 at 11:29, <Paul.Durrant@citrix.com> wrote:
> No, that's not quite right. Only qemu-trad (and stubdom) are 'default' ioreq 
> servers. Upstream QEMU has registered individual PCI devices with Xen for 
> some time now, and hence gets proper PCI config IOREQs. Also we really really 
> want default ioreq servers as their interface to Xen is fragile and has only 
> just narrowly avoided being a security issue.

Did you miss some "don't" or "to go away"?

Jan



* Re: [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring
  2018-03-23 11:38                                 ` Jan Beulich
@ 2018-03-23 13:52                                   ` Paul Durrant
  0 siblings, 0 replies; 183+ messages in thread
From: Paul Durrant @ 2018-03-23 13:52 UTC (permalink / raw)
  To: 'Jan Beulich'
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, 'Alexey G',
	Ian Jackson, Anthony Perard, xen-devel, Roger Pau Monne

> -----Original Message-----
> From: Xen-devel [mailto:xen-devel-bounces@lists.xenproject.org] On Behalf
> Of Jan Beulich
> Sent: 23 March 2018 11:39
> To: Paul Durrant <Paul.Durrant@citrix.com>
> Cc: Stefano Stabellini <sstabellini@kernel.org>; Wei Liu
> <wei.liu2@citrix.com>; Andrew Cooper <Andrew.Cooper3@citrix.com>;
> 'Alexey G' <x1917x@gmail.com>; xen-devel@lists.xenproject.org; Anthony
> Perard <anthony.perard@citrix.com>; Ian Jackson <Ian.Jackson@citrix.com>;
> Roger Pau Monne <roger.pau@citrix.com>
> Subject: Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG
> area in the MMIO hole + minor code refactoring
> 
> >>> On 23.03.18 at 11:29, <Paul.Durrant@citrix.com> wrote:
> > No, that's not quite right. Only qemu-trad (and stubdom) are 'default' ioreq
> > servers. Upstream QEMU has registered individual PCI devices with Xen for
> > some time now, and hence gets proper PCI config IOREQs. Also we really
> really
> > want default ioreq servers as their interface to Xen is fragile and has only
> > just narrowly avoided being a security issue.
> 
> Did you miss some "don't" or "to go away"?
> 

Oops, yes! "to go away" definitely.

  Paul

> Jan
> 
> 

* Re: [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring
  2018-03-22 15:08                                         ` Alexey G
@ 2018-03-23 13:57                                           ` Paul Durrant
  2018-03-23 22:32                                             ` Alexey G
  0 siblings, 1 reply; 183+ messages in thread
From: Paul Durrant @ 2018-03-23 13:57 UTC (permalink / raw)
  To: 'Alexey G', Jan Beulich
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, xen-devel,
	Anthony Perard, Ian Jackson, Roger Pau Monne

> -----Original Message-----
> From: Alexey G [mailto:x1917x@gmail.com]
> Sent: 22 March 2018 15:09
> To: Jan Beulich <JBeulich@suse.com>
> Cc: Andrew Cooper <Andrew.Cooper3@citrix.com>; Anthony Perard
> <anthony.perard@citrix.com>; Ian Jackson <Ian.Jackson@citrix.com>; Paul
> Durrant <Paul.Durrant@citrix.com>; Roger Pau Monne
> <roger.pau@citrix.com>; Wei Liu <wei.liu2@citrix.com>; StefanoStabellini
> <sstabellini@kernel.org>; xen-devel@lists.xenproject.org
> Subject: Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG
> area in the MMIO hole + minor code refactoring
> 
> On Thu, 22 Mar 2018 08:42:09 -0600
> "Jan Beulich" <JBeulich@suse.com> wrote:
> 
> >>>> On 22.03.18 at 15:34, <x1917x@gmail.com> wrote:
> >> On Thu, 22 Mar 2018 07:20:00 -0600
> >> "Jan Beulich" <JBeulich@suse.com> wrote:
> >>
> >>>>>> On 22.03.18 at 14:05, <x1917x@gmail.com> wrote:
> >>>> On Thu, 22 Mar 2018 06:09:44 -0600
> >>>> "Jan Beulich" <JBeulich@suse.com> wrote:
> >>>>
> >>>>>>>> On 22.03.18 at 12:56, <x1917x@gmail.com> wrote:
> >>>>>> I really don't understand why some people have that fear of
> >>>>>> emulated MMCONFIG -- it's really the same thing as any other
> MMIO
> >>>>>> range QEMU already emulates via
> map_io_range_to_ioreq_server().
> >>>>>> No sensitive information exposed. It is related only to emulated
> >>>>>> PCI conf space which QEMU already knows about and use, providing
> >>>>>> emulated PCI devices for it.
> >>>>>
> >>>>>You continue to ignore the routing requirement multiple ioreq
> >>>>>servers impose.
> >>>>
> >>>> If the emulated MMCONFIG approach will be modified to become
> >>>> fully compatible with multiple ioreq servers (whatever they used
> >>>> for), I assume there will be no objections that emulated MMCONFIG
> >>>> can't be used?
> >>>> I just want to clarify this moment -- why people think that
> >>>> a completely emulated MMIO range, not related in any
> >>>> way to host's MMCONFIG may compromise something.
> >>>
> >>>Compromise? All that was said so far - afair - was that this is the
> >>>wrong way round design wise.
> >>
> >> I assume it's all about emulating some real system for HVM, for other
> >> goals PV/PVH are available. What is a proper, design-wise way to
> >> emulate the MMIO-based MMCONFIG range Q35 provides you think of?
> >>
> >> Here is what I've heard so far in this thread:
> >>
> >> 1. Add a completely new dmop/hypercall so that QEMU can tell Xen
> >> where emulated MMCONFIG MMIO area is located and in the same time
> >> map it for MMIO trapping to intercept accesses. Latter action is the
> >> same what map_io_range_to_ioreq_server() does, but let's ignore it
> >> for now because there was opinion that we need to stick to a
> >> distinct hypercall.
> >>
> >> 2. Upon trapping accesses to this emulated range, Xen will pretend
> >> that QEMU didn't just told him about MMCONFIG location and size and
> >> instead convert MMIO access into PCI conf one and send the ioreq to
> >> QEMU or some other DM.
> >>
> >> 3. If there will be a PCIEXBAR relocation (OVMF does it currently for
> >> MMCONFIG usage, but we must later teach him non-QEMU manners),
> QEMU
> >> must immediately inform Xen about any changes in MMCONFIG
> >> location/status.
> >>
> >> 4. QEMU receives PCI conf access while expecting the MMIO address, so
> >> xen-hvm.c has to deal with it somehow, either obtaining MMCONFIG
> base
> >> and recreating emulated MMIO access from BDF/reg or doing the dirty
> >> work of finding PCIBus/PCIDevice target itself as it cannot use
> >> emulated CF8/CFC ports due to legacy PCI conf size limitation.
> >>
> >> Please confirm that it is a preferable solution or if something
> >> missing.
> >
> >I'm afraid this is only part of the picture, as you've been told by
> >others before. We first of all need to settle on who emulates
> >the core chipset registers. Depending on that will be how Xen
> >would learn about the MCFG location inside the guest.
> 
> Few related thoughts:
> 
> 1. MMCONFIG address is chipset-specific. On Q35 it's a PCIEXBAR, on
> other x86 systems it may be HECBASE or else. So we can assume it is
> bound to the emulated machine
> 

Xen emulates the machine so it should be emulating PCIEXBAR. 

> 2. We rely on QEMU to emulate different machines for us.
> 

We should not be. It's a historical artefact that we rely on QEMU for any part of machine emulation.

> 3. There are users which touch chipset-specific PCIEXBAR directly if
> they see a Q35 system (OVMF so far)
> 

And we should squash such accesses. The toolstack should be in sole control of the guest memory map. It should be the only one building the MCFG, so it should decide where the MMCONFIG regions go, not the firmware running in guest context.

> Seems like we're pretty limited in freedom of choice in this
> conditions, I'm afraid.

I don't think so. We're only limited if we use QEMU's Q35 emulation, and what I'm saying is that we should not be doing that (nor should we be using it to emulate any part of the PIIX today).

  Paul


* Re: [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring
  2018-03-23 13:57                                           ` Paul Durrant
@ 2018-03-23 22:32                                             ` Alexey G
  2018-03-26  9:24                                               ` Roger Pau Monné
  0 siblings, 1 reply; 183+ messages in thread
From: Alexey G @ 2018-03-23 22:32 UTC (permalink / raw)
  To: Paul Durrant
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Jan Beulich,
	xen-devel, Anthony Perard, Ian Jackson, Roger Pau Monne

On Fri, 23 Mar 2018 13:57:11 +0000
Paul Durrant <Paul.Durrant@citrix.com> wrote:
[...]
>> Few related thoughts:
>> 
>> 1. MMCONFIG address is chipset-specific. On Q35 it's a PCIEXBAR, on
>> other x86 systems it may be HECBASE or else. So we can assume it is
>> bound to the emulated machine
>
>Xen emulates the machine so it should be emulating PCIEXBAR. 

Actually, Xen currently emulates only a few devices. The others are
provided by QEMU; that's the problem.

>> 2. We rely on QEMU to emulate different machines for us.
>We should not be. It's a historical artefact that we rely on QEMU for
>any part of machine emulation.

HVM guests need to see something more or less close to real hardware to
run. Even if we later install PV drivers for network/storage/etc.
usage, we still need to support system firmware (SeaBIOS/OVMF) and be
able to install (ideally) any OS which expects to be installed only on
real x86 hardware. We also need to be ready to fall back to the
emulated hardware if, e.g., the user boots the OS in safe mode.

It all depends on what you mean by not relying on QEMU for any part
of machine emulation.

There is a number of mandatory devices which should be provided for a
typical x86 system. Xen emulates some of them, but there is a number
which it doesn't. Apart from "classic" devices like the RTC, PIT, KBC,
etc., we need to provide at least storage and network interfaces.

The Windows installer won't be happy to boot from a PV storage device;
it prefers to encounter something like AHCI (Windows 7+), ATA (for
older OSes) or ATAPI if it is an ISO CD.
Providing emulation for the AHCI+ATA+ATAPI trio alone is a non-trivial
task. QEMU itself provides only a partial implementation of these;
many features are unsupported. Another very useful thing to emulate is
USB. Depending on the controller version and device classes required,
this may be far more complex to emulate than AHCI+ATA+ATAPI combined.

So, if you suggest dropping QEMU completely, it means that all this
functionality must be replaced by our own. Not that hard, but still a
lot of effort.


OTOH, if you mean stripping QEMU of general PCI bus control and
replacing its emulated NB/SB with Xen-owned ones -- well, in theory it
should be possible, with patches on the QEMU side.

In fact, the emulated chipset (the NB+SB combo without supplemental
devices) is itself a small part of the required emulation. It's
relatively easy to provide our own analogs of, e.g., the 'mch' and
'ICH9-LPC' QEMU PCIDevices; the problem is to glue all the remaining
parts together.

I assume the final goal in this case is to have only a set of necessary
QEMU PCIDevices for which we will be providing I/O, MMIO and PCI conf
trapping facilities. Only devices such as rtl8139, ich9-ahci and a few
others.

Basically, this means a new, chipset-less QEMU machine type.
Well, in theory it is possible with a bit of effort, I think. The main
question is where the NB/SB/PCI-bus emulating part will reside in this
case. As this part must still have some privileges, it's basically
the same decision problem as with QEMU's dwelling place -- stubdomain,
Dom0 or else.

>> 3. There are users which touch chipset-specific PCIEXBAR directly if
>> they see a Q35 system (OVMF so far)
>
>And we should squash such accesses.
>

Yes, we have that privilege (i.e. allocating all I/O and MMIO bases)
for hvmloader. OVMF should not differ from SeaBIOS in this respect.

>The toolstack should be sole
>control of the guest memory map. It should be the only building MCFG
>so it should decide where the MMCONFIG regions go, not the firmware
>running in guest context.

The HVM memory layout is another problem which needs a solution, BTW.
I had to implement one for my PT goals, but it's very radical, I'm
afraid.

Right now there are wicked issues present in handling the memory layout
between hvmloader and QEMU. They may see a different memory map, even
with overlaps in some cases (depending on the MMIO hole size and
content) -- like an attempt to place an MMIO BAR over memory which is
used as VRAM backing storage by QEMU, causing a variety of issues like
emulated I/O errors (with a storage device) during a guest boot
attempt.

Regarding control of the guest memory map in the toolstack only... The
problem is, only the firmware can see the final memory map at the
moment. And only the device model knows about the invisible "service"
ranges for emulated devices, like the LFB content (aka "VRAM") when it
is not mapped to the guest.

In order to calculate the final memory/MMIO hole split, we need to know:

1) all PCI devices on the PCI bus. At the moment Xen contributes only
devices like PT ones to the final PCI bus (via QMP device_add). The
others are QEMU's. Even the Xen platform PCI device relies on QEMU
emulation. Non-QEMU device emulators are another source of virtual PCI
devices, I guess.

2) all chipset-specific emulated MMIO ranges. MMCONFIG is one of them,
and the largest (up to 256MB per segment). There are a few other,
smaller ranges, e.g. the Root Complex registers. All these ranges
depend on the emulated chipset.

3) all reserved memory ranges (this one the toolstack already knows)

4) all "service" guest memory ranges, like the backing storage for VRAM
in QEMU. Emulated Option ROMs should belong here too, but IIRC xen-hvm.c
either intentionally or by mistake handles them as emulated ranges
currently.

If we miss any of these (like what the chipset-specific ranges and
their size/alignment requirements are) -- we're in trouble. But, if we
know *all* of these, we can pre-calculate the MMIO hole size. Although
this is a bit fragile to do from the toolstack, because the sizing
algorithm in the toolstack and the MMIO BAR allocation code in the
firmware (hvmloader) must be kept synchronized: it is possible to
stuff BARs into the MMIO hole in different ways, especially when
PCI-PCI bridges appear on the scene. Both need to do it in a
consistent way (resulting in a similar set of gaps between allocated
BARs), otherwise the expected MMIO hole sizes won't match, which means
we may need to relocate MMIO BARs to the high MMIO hole, and this in
turn may lead to those overlaps with QEMU memories.

>> Seems like we're pretty limited in freedom of choice in this
>> conditions, I'm afraid.  
>
>I don't think so. We're only limited if we use QEMU's Q35 emulation
>and what I'm saying is that we should not be doing that (nor should be
>we be using it to emulate any part of the PIIX today).
>
>  Paul



* Re: [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring
  2018-03-23 22:32                                             ` Alexey G
@ 2018-03-26  9:24                                               ` Roger Pau Monné
  2018-03-26 19:42                                                 ` Alexey G
  0 siblings, 1 reply; 183+ messages in thread
From: Roger Pau Monné @ 2018-03-26  9:24 UTC (permalink / raw)
  To: Alexey G
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Paul Durrant,
	Jan Beulich, xen-devel, Anthony Perard, Ian Jackson

On Sat, Mar 24, 2018 at 08:32:44AM +1000, Alexey G wrote:
> On Fri, 23 Mar 2018 13:57:11 +0000
> Paul Durrant <Paul.Durrant@citrix.com> wrote:
> [...]
> >> Few related thoughts:
> >> 
> >> 1. MMCONFIG address is chipset-specific. On Q35 it's a PCIEXBAR, on
> >> other x86 systems it may be HECBASE or else. So we can assume it is
> >> bound to the emulated machine
> >
> >Xen emulates the machine so it should be emulating PCIEXBAR. 
> 
> Actually, Xen currently emulates only few devices. Others are
> provided by QEMU, that's the problem.
> 
> >> 2. We rely on QEMU to emulate different machines for us.
> >We should not be. It's a historical artefact that we rely on QEMU for
> >any part of machine emulation.
> 
> HVM guests need to see something more or less close to real hardware to
> run. Even if we later install PV drivers for network/storage/etc usage,
> we still need to support system firmware (SeaBIOS/OVMF) and be able to
> install any (ideally) OS which expects to be installed only on some
> real x86 hw. We also need to be ready to fallback to the emulated hw if
> eg. user will boot OS in the safe mode.

I think Paul means that Xen should be emulating the platform devices
and part of the southbridge/northbridge functionality, but not all the
emulated devices provided to a guest.

> 
> It all depends on what you mean by not relying on QEMU for any part
> of machine emulation.
> 
> There is a number of mandatory devices which should be provided for a
> typical x86 system. Xen emulates some of them, but there is a number
> which he doesn't. Apart from "classic" devices like RTC, PIT, KBC, etc
> we need to provide at least storage and network interfaces.
> 
> Windows installer won't be happy to boot from the PV storage device, he
> prefers to encounter something like AHCI (Windows 7+), ATA (for older
> OSes) or ATAPI if it is an iso cd.
> Providing emulation for the AHCI+ATA+ATAPI trio alone is a non-trivial
> task. QEMU itself provides only partial implementation of these, many
> features are unsupported. Another very useful thing to emulate is USB.
> Depending on the controller version and device classes required, this
> may be far more complex to emulate than AHCI+ATA+ATAPI combined.
> 
> So, if you suggest to drop QEMU completely, it means that all this
> functionality must be replaced by own. Not that hard, but still a lot
> of effort.
> 
> 
> OTOH, if you mean stripping QEMU of general PCI bus control and
> replacing his emulated NB/SB with Xen-owned -- well, it theory it
> should be possible, with patches on QEMU side.
> 
> In fact, the emulated chipset (NB+SB combo without supplemental devices)
> itself is a small part of required emulation. It's relatively easy to
> provide own analogs of for eg. 'mch' and 'ICH9-LPC' QEMU PCIDevice's,
> the problem is to glue all remaining parts together.
> 
> I assume the final goal in this case is to have only a set of necessary
> QEMU PCIDevice's for which we will be providing I/O, MMIO and PCI conf
> trapping facilities. Only devices such as rtl8139, ich9-ahci and few
> others.
> 
> Basically, this means a new, chipset-less QEMU machine type.
> Well, in theory it is possible with a bit of effort I think. The main
> question is where will be the NB/SB/PCIbus emulating part reside in
> this case.

Mostly inside of Xen. Of course the IDE/SATA/USB/Ethernet... part of
the southbridge will be emulated by a device model (ie: QEMU).

As you mention above, I also took a look, and it seems like the amount
of registers that we should emulate for the Q35 DRAM controller
(D0:F0) is fairly minimal based on the current QEMU implementation. We
could even possibly get away with just emulating PCIEXBAR.

> As this part must still have some privileges, it's basically
> the same decision problem as with QEMU's dwelling place -- stubdomain,
> Dom0 or else.
> 
> >> 3. There are users which touch chipset-specific PCIEXBAR directly if
> >> they see a Q35 system (OVMF so far)
> >
> >And we should squash such accesses.
> >
> 
> Yes, we have that privilege (i.e. allocating all IO/MMIO bases) for
> hvmloader. OVMF should not differ in this subject to SeaBIOS.
> 
> >The toolstack should be sole
> >control of the guest memory map. It should be the only building MCFG
> >so it should decide where the MMCONFIG regions go, not the firmware
> >running in guest context.
> 
> HVM memory layout is another problem which needs solution BTW. I had to
> implement one for my PT goals, but it's very radical I'm afraid.
> 
> Right now there are wicked issues present in handling memory layout
> between hvmloader and QEMU. They may see a different memory map, even
> with overlaps in some (depending on MMIO hole size and content) cases --
> like an attempt to place MMIO BAR over memory which is used for vram
> backing storage by QEMU, causing variety of issues like emulated I/O
> errors (with a storage device) during guest boot attempt.
> 
> Regarding control of the guest memory map in the toolstack only... The
> problem is, only firmware can see a final memory map at the moment.
> And only the device model knows about invisible "service" ranges for
> emulated devices, like the LFB content (aka "VRAM") when it is not
> mapped to a guest.
> 
> In order to calculate the final memory/MMIO hole split, we need to know:
> 
> 1) all PCI devices on a PCI bus. At the moment Xen contributes only
> devices like PT to the final PCI bus (via QMP device_add). Others are
> QEMU ones. Even Xen platform PCI device relies on QEMU emulation.
> Non-QEMU device emulators are another source of virtual PCI devices I
> guess.
> 
> 2) all chipset-specific emulated MMIO ranges. MMCONFIG is one of them
> and largest (up to 256Mb for a segment). There are few other smaller
> ranges, eg. Root Complex registers. All this ranges depend on the
> emulated chipset.
> 
> 3) all reserved memory ranges (this one what toolstack already knows)
> 
> 4) all "service" guest memory ranges like backing storage for VRAM in
> QEMU. Emulated Option ROMs should belong here too, but IIRC xen-hvm.c
> either intentionally or by mistake handles them as emulated ranges
> currently.
> 
> If we miss any of these (like what are the chipset-specific ranges and
> their size alignment requirements) -- we're in trouble. But, if we know
> *all* of these, we can pre-calculate the MMIO hole size. Although this
> is a bit fragile to do from the toolstack because both sizing algo in
> the toolstack and MMIO BAR allocation code in the firmware (hvmloader)
> must have their algorithms synchronized, because it is possible to
> stuff BARs to MMIO hole in different ways, especially when PCI-PCI
> bridges will appear on the scene. Both need to do it in a consistent way
> (resulting in similar set of gaps between allocated BARs), otherwise
> expected MMIO hole sizes won't match, which means we may need to
> relocate MMIO BARs to the high MMIO hole and this in turn may lead to
> those overlaps with QEMU memories.

I agree that the current memory layout management (or the lack of it)
is concerning. Although related, I think this should be tackled as a
different issue from the chipset one IMHO.

Since you already posted the Q35 series I would attempt to get that
done first before jumping into the memory layout one.

Roger.


* Re: [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring
  2018-03-26  9:24                                               ` Roger Pau Monné
@ 2018-03-26 19:42                                                 ` Alexey G
  2018-03-27  8:45                                                   ` Roger Pau Monné
  0 siblings, 1 reply; 183+ messages in thread
From: Alexey G @ 2018-03-26 19:42 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Paul Durrant,
	Jan Beulich, xen-devel, Anthony Perard, Ian Jackson

On Mon, 26 Mar 2018 10:24:38 +0100
Roger Pau Monné <roger.pau@citrix.com> wrote:

>On Sat, Mar 24, 2018 at 08:32:44AM +1000, Alexey G wrote:
[...]
>> In fact, the emulated chipset (NB+SB combo without supplemental
>> devices) itself is a small part of required emulation. It's
>> relatively easy to provide own analogs of for eg. 'mch' and
>> 'ICH9-LPC' QEMU PCIDevice's, the problem is to glue all remaining
>> parts together.
>> 
>> I assume the final goal in this case is to have only a set of
>> necessary QEMU PCIDevice's for which we will be providing I/O, MMIO
>> and PCI conf trapping facilities. Only devices such as rtl8139,
>> ich9-ahci and few others.
>> 
>> Basically, this means a new, chipset-less QEMU machine type.
>> Well, in theory it is possible with a bit of effort I think. The main
>> question is where will be the NB/SB/PCIbus emulating part reside in
>> this case.  
>
>Mostly inside of Xen. Of course the IDE/SATA/USB/Ethernet... part of
>the southbridge will be emulated by a device model (ie: QEMU).
>
>As you mention above, I also took a look and it seems like the amount
>of registers that we should emulate for Q35 DRAM controller (D0:F0) is
>fairly minimal based on current QEMU implementation. We could even
>possibly get away by just emulating PCIEXBAR.

MCH emulation alone might not be an option. Besides, some
southbridge-specific features, like emulating ACPI PM facilities for
domain power management (basically, anything at PMBASE), will be
preferable to implement on the Xen side, especially considering the
fact that ACPI tables are already provided by Xen's
libacpi/hvmloader, not the device model.
I think the feature may require covering at least the NB+SB
combination -- at least Q35 MCH + ICH9 for a start, ideally
82441FX+PIIX4 as well. Also, Xen should control emulated/PT PCI device
placement.

Before going this way, it would be good to measure all the risks.
Looks like there are two main directions currently:

I. (conservative) Let the main device model (QEMU) inform Xen about
the current chipset-specific MMCONFIG location, to allow Xen to know
that some MMIO accesses to this area must be forwarded to other ioreq
servers (device emulators) in the form of PCI config read/write
ioreqs, if the BDF corresponding to an MMCONFIG offset points to a PCI
device owned by a device emulator.
In the case of device emulators the conversion of MMIO accesses to PCI
config ones is a mandatory step, while the owner of the MMCONFIG MMIO
range may receive MMIO accesses in their native form without
conversion (a strongly preferable option for QEMU).

This approach assumes introducing a new dmop/hypercall (something
like XEN_DMOP_mmcfg_location_change) to pass the basic MMCONFIG
information to Xen -- address, enabled/disabled status (or simply
address=0 instead) and the size of the MMCONFIG area, e.g. as a number
of buses. This information is enough to select a proper ioreq server
in Xen and to allow multiple device emulators to function properly.
For future compatibility we can also provide the segment and
start/end bus range as arguments.
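The MMIO-to-PCI-config translation itself is mechanical: in the ECAM
layout every function gets a 4KB window, so the BDF and register
offset fall straight out of the address bits. A hypothetical sketch of
the decode/routing step described above (illustration only, invented
names, not actual Xen code):

```python
def ecam_decode(mmio_addr, mmconfig_base):
    """Split an MMCONFIG MMIO address into (bus, dev, fn, reg).

    ECAM layout: bits 20+ select the bus, bits 15-19 the device,
    bits 12-14 the function, bits 0-11 the register offset
    (0x000-0xFFF, i.e. including extended config space above 0x100).
    """
    off = mmio_addr - mmconfig_base
    return ((off >> 20) & 0xFF, (off >> 15) & 0x1F,
            (off >> 12) & 0x7, off & 0xFFF)

def select_ioreq_server(mmio_addr, mmconfig_base, bdf_owners,
                        default_server):
    """Route the access: a BDF claimed by a device emulator gets a
    PCI-config ioreq; everything else goes to the default device
    model (QEMU) as a plain MMIO ioreq, no conversion needed."""
    bus, dev, fn, reg = ecam_decode(mmio_addr, mmconfig_base)
    server = bdf_owners.get((bus, dev, fn))
    if server is not None:
        return server, ("pci_config", (bus, dev, fn), reg)
    return default_server, ("mmio", mmio_addr, None)

# Example: a device emulator owns 00:02.0; QEMU owns the rest.
owners = {(0, 2, 0): "emu1"}
base = 0xE0000000
addr = base | (2 << 15) | 0x100   # 00:02.0, extended register 0x100
```

Here `select_ioreq_server(addr, base, owners, "qemu")` would hand the
access to "emu1" as a PCI-config ioreq, while any other BDF would
reach QEMU as raw MMIO.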

What this approach will require:
--------------------------------

- new notification-style dmop/hypercall to tell Xen about the current
  emulated MMCONFIG location

- trivial changes in QEMU to use this dmop in Q35 PCIEXBAR handling code

- relatively simple Xen changes in ioreq.c to use the provided range
  for ioreq server selection. Also, to provide MMIO -> PCI config ioreq
  translation for supplemental ioreq servers which don't know anything
  about the emulated system

Risks:
------

The risk of breaking anything is minimal in this case.

If QEMU does not provide this information (e.g. due to an outdated
version being installed), only basic PCI config space accesses via
CF8/CFC will be forwarded to a distinct ioreq server. This means that
extended PCI config space accesses won't be forwarded to specific
device emulators. Other than those device emulators, everything else
will continue to work properly in this case. There will be no
difference for guest OSes without PCIe ECAM support in either case.

In general, no breakthrough improvements, no negative side effects.
Just PCIe ECAM working as expected, while compatibility with multiple
ioreq servers is retained.


II. (a new feature) Move chipset emulation to Xen directly.

In this case no separate notification is necessary, as Xen will be
emulating the chosen chipset itself. The MMCONFIG location will be
known from its own PCIEXBAR emulation.

QEMU will be used only to emulate a minimal set of unrelated devices
(e.g. storage/network/VGA). Less dependency on QEMU overall.

More freedom to implement some specific features in the future, like
SMRAM support for EFI firmware needs. Chipset remapping (aka reclaim)
functionality for memory relocation may be implemented under complete
Xen control, avoiding usage of unsafe add_to_physmap hypercalls.

In the future this will also allow moving the passthrough-supporting
code from QEMU (hw/xen/xen-pt*.c) to Xen, merging it with Roger's vpci
series. This will improve e.g. the PT + stubdomain situation a lot --
PCI config space accesses for PT devices will be handled in a uniform
way without Dom0 interaction.
This particular feature can be implemented for the previous approach
as well; still, it is easier to do when Xen controls the emulated
machine.

In general, this is a good long-term direction.

What this approach will require:
--------------------------------

- Changes in the QEMU code to support new chipset-less machine(s). In
  theory this might be possible to implement on top of the "null"
  machine concept

- Major changes in Xen code to implement the actual chipset emulation
  there

- Changes on the toolstack side as the emulated machine will be
  selected and used differently

- Moving passthrough support from QEMU to Xen will likely require
  re-dividing the areas of responsibility for PCI device passthrough
  between xen-pciback and the hypervisor. It might be more convenient
  to perform some tasks of xen-pciback in Xen directly

- strong dependency between Xen/libxl/QEMU/etc versions -- any outdated
  component will be a major problem. Can be resolved by providing some
  compatibility code

- longer implementation time

Risks:
------

- A major architecture change with possible issues encountered during
  the implementation

- Moving the emulation of the machine to Xen creates a non-zero risk of
  introducing a security issue while extending the emulation support
  further. As all emulation will take place on a most trusted level, any
  exploitable bug in the chipset emulation code may compromise the
  whole system

- there is a risk of encountering some dependency on missing chipset
  devices in QEMU. Some QEMU devices (which depend on QEMU chipset
  devices/properties) might not work without extra patches. In theory
  this may be addressed by leaving the dummy MCH/LPC/pci-host devices
  in place while not forwarding any I/O/MMIO/PCI conf accesses to them
  (using them simply as compat placeholders)

- risk of incompatibility with future QEMU versions

In both cases, to address security concerns, PCIEXBAR and other MCH
registers can be made write-once (RO on all further accesses, similar
to a TXT-locked system).
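The write-once behaviour suggested here is simple to model (an
illustrative sketch only, not actual register-emulation code):

```python
class WriteOnceReg:
    """Register that latches the first guest write and silently drops
    all subsequent ones -- comparable to how certain chipset registers
    become read-only once programmed on a TXT-locked system."""

    def __init__(self, reset_value=0):
        self.value = reset_value
        self.locked = False

    def write(self, val):
        if not self.locked:
            self.value = val
            self.locked = True  # effectively RO from now on

    def read(self):
        return self.value

pciexbar = WriteOnceReg()
pciexbar.write(0xE0000001)  # firmware programs and enables MMCONFIG
pciexbar.write(0xF0000001)  # a later (possibly malicious) write is dropped
```

After the second write the register still reads back 0xE0000001, so a
guest cannot move the MMCONFIG window once the firmware has placed it.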

[...]
>> Regarding control of the guest memory map in the toolstack only...
>> The problem is, only firmware can see a final memory map at the
>> moment. And only the device model knows about invisible "service"
>> ranges for emulated devices, like the LFB content (aka "VRAM") when
>> it is not mapped to a guest.
>> 
>> In order to calculate the final memory/MMIO hole split, we need to
>> know:
>> 
>> 1) all PCI devices on a PCI bus. At the moment Xen contributes only
>> devices like PT to the final PCI bus (via QMP device_add). Others are
>> QEMU ones. Even Xen platform PCI device relies on QEMU emulation.
>> Non-QEMU device emulators are another source of virtual PCI devices I
>> guess.
>> 
>> 2) all chipset-specific emulated MMIO ranges. MMCONFIG is one of them
>> and largest (up to 256Mb for a segment). There are few other smaller
>> ranges, eg. Root Complex registers. All this ranges depend on the
>> emulated chipset.
>> 
>> 3) all reserved memory ranges (this one what toolstack already knows)
>> 
>> 4) all "service" guest memory ranges like backing storage for VRAM in
>> QEMU. Emulated Option ROMs should belong here too, but IIRC xen-hvm.c
>> either intentionally or by mistake handles them as emulated ranges
>> currently.
>> 
>> If we miss any of these (like what are the chipset-specific ranges
>> and their size alignment requirements) -- we're in trouble. But, if
>> we know *all* of these, we can pre-calculate the MMIO hole size.
>> Although this is a bit fragile to do from the toolstack because both
>> sizing algo in the toolstack and MMIO BAR allocation code in the
>> firmware (hvmloader) must have their algorithms synchronized,
>> because it is possible to stuff BARs to MMIO hole in different ways,
>> especially when PCI-PCI bridges will appear on the scene. Both need
>> to do it in a consistent way (resulting in similar set of gaps
>> between allocated BARs), otherwise expected MMIO hole sizes won't
>> match, which means we may need to relocate MMIO BARs to the high
>> MMIO hole and this in turn may lead to those overlaps with QEMU
>> memories.  
>
>I agree that the current memory layout management (or the lack of it)
>is concerning. Although related, I think this should be tackled as a
>different issue from the chipset one IMHO.
>
>Since you already posted the Q35 series I would attempt to get that
>done first before jumping into the memory layout one.

It is somewhat related to the chipset, because the memory/MMIO layout
inconsistency can be solved more, well, naturally on Q35.

Basically, we have a non-standard MMIO hole layout where the
start of the high MMIO hole does not match the top of addressable RAM
(due to the invisible ranges of the device model).

Q35 natively has facilities which allow the firmware to modify (via
emulation) or discover such an MMIO hole setup, which can be used for
safe MMIO BAR allocation to avoid overlaps with QEMU-owned invisible
ranges.

It doesn't really matter which registers to pick for this task, but
for Q35 this approach is at least consistent with what a real system
does (PV/PVH people will find this peculiarity pointless, I suppose
:) ).

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring
  2018-03-26 19:42                                                 ` Alexey G
@ 2018-03-27  8:45                                                   ` Roger Pau Monné
  2018-03-27 15:37                                                     ` Alexey G
  0 siblings, 1 reply; 183+ messages in thread
From: Roger Pau Monné @ 2018-03-27  8:45 UTC (permalink / raw)
  To: Alexey G
  Cc: StefanoStabellini, Wei Liu, Andrew Cooper, Paul Durrant,
	Jan Beulich, xen-devel, Anthony Perard, Ian Jackson

On Tue, Mar 27, 2018 at 05:42:11AM +1000, Alexey G wrote:
> On Mon, 26 Mar 2018 10:24:38 +0100
> Roger Pau Monné <roger.pau@citrix.com> wrote:
> 
> >On Sat, Mar 24, 2018 at 08:32:44AM +1000, Alexey G wrote:
> [...]
> >> In fact, the emulated chipset (NB+SB combo without supplemental
> >> devices) itself is a small part of required emulation. It's
> >> relatively easy to provide our own analogs of e.g. the 'mch' and
> >> 'ICH9-LPC' QEMU PCIDevices; the problem is gluing all the
> >> remaining parts together.
> >> 
> >> I assume the final goal in this case is to have only a set of
> >> necessary QEMU PCIDevice's for which we will be providing I/O, MMIO
> >> and PCI conf trapping facilities. Only devices such as rtl8139,
> >> ich9-ahci and few others.
> >> 
> >> Basically, this means a new, chipset-less QEMU machine type.
> >> Well, in theory it is possible with a bit of effort I think. The main
> >> question is where will be the NB/SB/PCIbus emulating part reside in
> >> this case.  
> >
> >Mostly inside of Xen. Of course the IDE/SATA/USB/Ethernet... part of
> >the southbridge will be emulated by a device model (i.e. QEMU).
> >
> >As you mention above, I also took a look and it seems like the amount
> >of registers that we should emulate for Q35 DRAM controller (D0:F0) is
> >fairly minimal based on current QEMU implementation. We could even
> >possibly get away by just emulating PCIEXBAR.
> 
MCH emulation alone might not be an option. Besides, some
> southbridge-specific features like emulating ACPI PM facilities for
> domain power management (basically, anything at PMBASE) will be
> preferable to implement on Xen side, especially considering the fact
> that ACPI tables are already provided by Xen's libacpi/hvmloader, not
> the device model.

Likely, but AFAICT this is kind of already broken, because PM1a and
TMR are already emulated by Xen at hardcoded values. See
xen/arch/x86/hvm/pmtimer.c.

> I think the feature may require to cover at least the NB+SB
> combination, at least Q35 MCH + ICH9 for start, ideally 82441FX+PIIX4
> as well. Also, Xen should control emulated/PT PCI device placement.

Q35 MCH (D0:F0) is required in order to trap accesses to PCIEXBAR.

Could you be more specific about ICH9?

The ICH9 spec contains multiple devices, for example it includes an
ethernet controller and a SATA controller, which we should not emulate
inside of Xen.

> II. (a new feature) Move chipset emulation to Xen directly.
> 
> In this case no separate notification necessary as Xen will be
> emulating the chosen chipset itself. MMCONFIG location will be known
> from own PCIEXBAR emulation.
> 
> QEMU will be used only to emulate a minimal set of unrelated devices
> (eg. storage/network/vga). Less dependency on QEMU overall.
> 
> More freedom to implement some specific features in the future like
> smram support for EFI firmware needs. Chipset remapping (aka reclaim)
> functionality for memory relocation may be implemented under complete
> Xen control, avoiding usage of unsafe add_to_physmap hypercalls.
> 
> In future this will allow to move passthrough-supporting code from QEMU
> (hw/xen/xen-pt*.c) to Xen, merging it with Roger's vpci series.
> This will improve eg. the PT + stubdomain situation a lot -- PCI config
> space accesses for PT devices will be handled in a uniform way without
> Dom0 interaction.
> This particular feature can be implemented for the previous approach as
> well, still it is easier to do when Xen controls the emulated machine
> 
> In general, this is a good long-term direction.
> 
> What this approach will require:
> --------------------------------
> 
> - Changes in QEMU code to support a new chipset-less machine(s). In
>   theory might be possible to implement on top of the "null" machine
>   concept

Not all parts of the chipset should go inside of Xen, ATM I only
foresee Q35 MCH being implemented inside of Xen. So I'm not sure
calling this a chipset-less machine is correct from QEMU PoV.

> - Major changes in Xen code to implement the actual chipset emulation
>   there
> 
> - Changes on the toolstack side as the emulated machine will be
>   selected and used differently
> 
> - Moving passthrough support from QEMU to Xen will likely require
>   re-dividing areas of responsibility for PCI device passthrough
>   between xen-pciback and the hypervisor. It might be more convenient
>   to perform some of xen-pciback's tasks in Xen directly

Moving pci-passthough from QEMU to Xen is IMO a separate project, and
by the text you provide I'm not sure how is that related to the Q35
chipset implementation.

> - strong dependency between Xen/libxl/QEMU/etc versions -- any outdated
>   component will be a major problem. Can be resolved by providing some
>   compatibility code

Well, you would only be able to use the Q35 feature with the right
version of the components.

> - longer implementation time
> 
> Risks:
> ------
> 
> - A major architecture change with possible issues encountered during
>   the implementation
> 
> - Moving the emulation of the machine to Xen creates a non-zero risk of
>   introducing a security issue while extending the emulation support
>   further. As all emulation will take place on a most trusted level, any
>   exploitable bug in the chipset emulation code may compromise the
>   whole system
> 
> - there is a risk of encountering a dependency on missing chipset
>   devices in QEMU. Some QEMU devices (which depend on QEMU chipset
>   devices/properties) might not work without extra patches. In theory
>   this may be addressed by leaving the dummy MCH/LPC/pci-host devices
>   in place while not forwarding any IO/MMIO/PCI conf accesses to them
>   (using them simply as compat placeholders)
> 
> - risk of incompatibility with future QEMU versions
> 
> In both cases, for security concerns PCIEXBAR and other MCH registers
> can be made write-once (RO on all further accesses, similar to a
> TXT-locked system).

I think option II is the right way to move forward.

> It is somewhat related to the chipset because memory/MMIO layout
> inconsistency can be solved more, well, naturally on Q35.
> 
> Basically, we have a non-standard MMIO hole layout where the
> start of the high MMIO hole does not match the top of addressable RAM
> (due to invisible ranges of the device model).

But that's a device model issue then? I'm not sure I'm getting what
you mean here.

> Q35 natively has facilities to allow firmware to modify (via
> emulation) or discover such an MMIO hole setup, which can be used for
> safe MMIO BAR allocation to avoid overlaps with QEMU-owned invisible
> ranges.

IMO a single entity should be in control of the memory layout, and
that's the toolstack.

Ideally we should not allow the firmware to change the layout at all.
What are specifically the registers that you mention?

> It doesn't really matter which registers to pick for this task, but for
> Q35 this approach is at least consistent with what a real system does
> (PV/PVH people will find this peculiarity pointless I suppose :) ).

Right, but I don't think we aim to emulate a fully complete Q35 MCH or
ICH9 for example, which has tons of registers, not even QEMU is trying
to do that. The main goal is to emulate the registers we know are
required for OSes to work.

Thanks, Roger.


* Re: [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring
  2018-03-27  8:45                                                   ` Roger Pau Monné
@ 2018-03-27 15:37                                                     ` Alexey G
  2018-03-28  9:30                                                       ` Roger Pau Monné
  2018-03-28 10:03                                                       ` Paul Durrant
  0 siblings, 2 replies; 183+ messages in thread
From: Alexey G @ 2018-03-27 15:37 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: StefanoStabellini, Wei Liu, Paul Durrant, Jan Beulich, xen-devel,
	Anthony Perard, Ian Jackson

On Tue, 27 Mar 2018 09:45:30 +0100
Roger Pau Monné <roger.pau@citrix.com> wrote:

>On Tue, Mar 27, 2018 at 05:42:11AM +1000, Alexey G wrote:
>> On Mon, 26 Mar 2018 10:24:38 +0100
>> Roger Pau Monné <roger.pau@citrix.com> wrote:
>>   
>> >On Sat, Mar 24, 2018 at 08:32:44AM +1000, Alexey G wrote:  
>> [...]  
>> >> In fact, the emulated chipset (NB+SB combo without supplemental
>> >> devices) itself is a small part of required emulation. It's
>> >> relatively easy to provide our own analogs of e.g. the 'mch' and
>> >> 'ICH9-LPC' QEMU PCIDevices; the problem is gluing all the
>> >> remaining parts together.
>> >> 
>> >> I assume the final goal in this case is to have only a set of
>> >> necessary QEMU PCIDevice's for which we will be providing I/O,
>> >> MMIO and PCI conf trapping facilities. Only devices such as
>> >> rtl8139, ich9-ahci and few others.
>> >> 
>> >> Basically, this means a new, chipset-less QEMU machine type.
>> >> Well, in theory it is possible with a bit of effort I think. The
>> >> main question is where will be the NB/SB/PCIbus emulating part
>> >> reside in this case.    
>> >
>> >Mostly inside of Xen. Of course the IDE/SATA/USB/Ethernet... part of
>> >the southbridge will be emulated by a device model (i.e. QEMU).
>> >
>> >As you mention above, I also took a look and it seems like the
>> >amount of registers that we should emulate for Q35 DRAM controller
>> >(D0:F0) is fairly minimal based on current QEMU implementation. We
>> >could even possibly get away by just emulating PCIEXBAR.  
>> 
>> MCH emulation alone might not be an option. Besides, some
>> southbridge-specific features like emulating ACPI PM facilities for
>> domain power management (basically, anything at PMBASE) will be
>> preferable to implement on Xen side, especially considering the fact
>> that ACPI tables are already provided by Xen's libacpi/hvmloader, not
>> the device model.  
>
>Likely, but AFAICT this is kind of already broken, because PM1a and
>TMR are already emulated by Xen at hardcoded values. See
>xen/arch/x86/hvm/pmtimer.c.

Yes, that should be an argument to try to implement PMBASE emulation in
Xen too. Although this needs to be checked against dependencies in
QEMU first, especially with ACPI-related code.

This way we have better flexibility to use an arbitrary PMBASE
value, not just having to hardcode it to ACPI_PM1A_EVT_BLK_ADDRESS_V1
in all related components.

>> I think the feature may require to cover at least the NB+SB
>> combination, at least Q35 MCH + ICH9 for start, ideally 82441FX+PIIX4
>> as well. Also, Xen should control emulated/PT PCI device placement.  
>
>Q35 MCH (D0:F0) is required in order to trap accesses to PCIEXBAR.

Absolutely.


BTW, another somewhat related problem at the moment is that Xen knows
nothing about chipset-specific MMIO hole(s). Due to this, it is
possible for a guest to map PT BARs outside the MMIO hole, leading to
errors like this:

(XEN) memory_map:remove: dom4 gfn=c8000 mfn=c8000 nr=2000
(XEN) memory_map:add: dom4 gfn=ffffffffc8000 mfn=c8000 nr=2000
(XEN) p2m.c:1121:d0v5 p2m_set_entry: 0xffffffffc8000:9 -> -22 (0xc8000)
(XEN) memory_map:fail: dom4 gfn=ffffffffc8000 mfn=c8000 nr=2000 ret:-22
(XEN) memory_map:remove: dom4 gfn=ffffffffc8000 mfn=c8000 nr=2000
(XEN) p2m.c:1228:d0v5 gfn_to_mfn failed! gfn=ffffffffc8000 type:4
(XEN) memory_map: error -22 removing dom4 access to [c8000,c9fff]
(XEN) memory_map:remove: dom4 gfn=ffffffffc8000 mfn=c8000 nr=2000
(XEN) p2m.c:1228:d0v5 gfn_to_mfn failed! gfn=ffffffffc8000 type:4
(XEN) memory_map: error -22 removing dom4 access to [c8000,c9fff]
(XEN) memory_map:add: dom4 gfn=c8000 mfn=c8000 nr=2000

Note that it was merely a lame BAR sizing attempt from the guest-side SW
(a PCI config space viewing tool) -- writing F's to the high part of the
MMIO BAR first.

If we know the guest's MMIO hole bounds, we can adapt to this
behavior, avoiding erroneous mapping attempts to a wrong address
outside the MMIO hole. Only the MMIO hole designated range can be used
to map PT device BARs.

So, if we actually emulate MCH's MMIO hole related registers
in Xen as well -- we can use them as scratchpad registers (write-once
of course) to pass this kind of information between Xen and other
involved parties as an alternative to eg. a dedicated hypercall.

>Could you be more specific about ICH9?
>
>The ICH9 spec contains multiple devices, for example it includes an
>ethernet controller and a SATA controller, which we should not emulate
>inside of Xen.

ICH built-in devices can, from our PoV, be considered distinct PCI
devices (as long as they're actually distinct devices in PCI config
space).
That's QEMU's approach to them -- these devices can be added to a q35
machine optionally. Only a minimal set of devices is provided
initially, like MCH/LPC/AHCI. An SMBus controller (0:1F.3) is added by
default too, but it's not of much use at the moment.

So, of all the devices a real ICH SB provides, mostly the LPC bridge
(0:1F.0) needs to be considered for emulation.

>> II. (a new feature) Move chipset emulation to Xen directly.
>> 
>> In this case no separate notification necessary as Xen will be
>> emulating the chosen chipset itself. MMCONFIG location will be known
>> from own PCIEXBAR emulation.
>> 
>> QEMU will be used only to emulate a minimal set of unrelated devices
>> (eg. storage/network/vga). Less dependency on QEMU overall.
>> 
>> More freedom to implement some specific features in the future like
>> smram support for EFI firmware needs. Chipset remapping (aka reclaim)
>> functionality for memory relocation may be implemented under complete
>> Xen control, avoiding usage of unsafe add_to_physmap hypercalls.
>> 
>> In future this will allow to move passthrough-supporting code from
>> QEMU (hw/xen/xen-pt*.c) to Xen, merging it with Roger's vpci series.
>> This will improve eg. the PT + stubdomain situation a lot -- PCI
>> config space accesses for PT devices will be handled in a uniform
>> way without Dom0 interaction.
>> This particular feature can be implemented for the previous approach
>> as well, still it is easier to do when Xen controls the emulated
>> machine
>> 
>> In general, this is a good long-term direction.
>> 
>> What this approach will require:
>> --------------------------------
>> 
>> - Changes in QEMU code to support a new chipset-less machine(s). In
>>   theory might be possible to implement on top of the "null" machine
>>   concept  
>
>Not all parts of the chipset should go inside of Xen, ATM I only
>foresee Q35 MCH being implemented inside of Xen. So I'm not sure
>calling this a chipset-less machine is correct from QEMU PoV.

Emulating only the MCH in Xen will still require a lot of changes, but
the overall benefit becomes unclear -- basically, we just move
PCIEXBAR emulation to Xen from QEMU.

>> - Major changes in Xen code to implement the actual chipset emulation
>>   there
>> 
>> - Changes on the toolstack side as the emulated machine will be
>>   selected and used differently
>> 
>> - Moving passthrough support from QEMU to Xen will likely require
>>   re-dividing areas of responsibility for PCI device passthrough
>>   between xen-pciback and the hypervisor. It might be more
>>   convenient to perform some of xen-pciback's tasks in Xen directly
>
>Moving pci-passthough from QEMU to Xen is IMO a separate project, and
>by the text you provide I'm not sure how is that related to the Q35
>chipset implementation.

Yes, it's more a separate feature on top of that approach. 

>> - strong dependency between Xen/libxl/QEMU/etc versions -- any
>> outdated component will be a major problem. Can be resolved by
>> providing some compatibility code  
>
>Well, you would only be able to use the Q35 feature with the right
>version of the components.
>
>> - longer implementation time
>> 
>> Risks:
>> ------
>> 
>> - A major architecture change with possible issues encountered during
>>   the implementation
>> 
>> - Moving the emulation of the machine to Xen creates a non-zero risk
>> of introducing a security issue while extending the emulation support
>>   further. As all emulation will take place on a most trusted level,
>> any exploitable bug in the chipset emulation code may compromise the
>>   whole system
>> 
>> - there is a risk of encountering a dependency on missing chipset
>>   devices in QEMU. Some QEMU devices (which depend on QEMU chipset
>>   devices/properties) might not work without extra patches. In theory
>>   this may be addressed by leaving the dummy MCH/LPC/pci-host devices
>>   in place while not forwarding any IO/MMIO/PCI conf accesses to them
>>   (using them simply as compat placeholders)
>> 
>> - risk of incompatibility with future QEMU versions
>> 
>> In both cases, for security concerns PCIEXBAR and other MCH registers
>> can be made write-once (RO on all further accesses, similar to a
>> TXT-locked system).  
>
>I think option II is the right way to move forward.

Agree, it's a good long-term direction.
Well, the problem is, option 1 can be implemented in a matter of 1-3
days. It will allow MMCONFIG to work with multiple device emulators
while being very light on requirements -- no big code changes
necessary, easy to test/review, etc.

OTOH, option 2 will require some research first, as the change is
non-trivial and may produce all kinds of incompatibility issues
with QEMU.

Emulating just the MCH in Xen while still leaving everything else to
QEMU does not show an obvious advantage. Without extending the
chipset emulation in Xen further, it will be just an overcomplicated
emulation of the PCIEXBAR register. If this is to be the first and only
objective for the feature, then we need some strong justification why
moving the emulation of the guest's PCIEXBAR from QEMU to Xen is
mandatory.

We need to be extra sure that having the MCH emulated in Xen while ICH9
and all the rest remain emulated by QEMU is a good solution for
PCIEXBAR emulation. Otherwise, a split-type chipset emulation
between Xen/QEMU just to handle Q35's PCIEXBAR register is
overkill.

I would personally prefer to implement option 1 first, while
researching and implementing option 2 in the near term.

There is nothing special about PCIEXBAR, it's just one of the emulated
chipset registers, holding the address of the emulated MMIO area. This
register doesn't differ much from e.g. AHCI's ABAR. In fact, it's
actually more harmless -- for MMCONFIG MMIO we merely forward accesses
for PCI config read/write emulation (the same thing as for emulated
CF8/CFC I/O), while handling AHCI ABAR MMIO means that we do serious
things like initiating real block I/O with the host. For PT devices,
MMCONFIG accesses still go through hw/xen-pt*.c for filtering or
emulation.

>> It is somewhat related to the chipset because memory/MMIO layout
>> inconsistency can be solved more, well, naturally on Q35.
>> 
>> Basically, we have a non-standard MMIO hole layout where the
>> start of the high MMIO hole does not match the top of addressable RAM
>> (due to invisible ranges of the device model).  
>
>But that's a device model issue then? I'm not sure I'm getting what
>you mean here.

We currently depend on the device model for the question of where we
can place the start of the high MMIO hole. This also badly affects
memory relocation support, which is required for MMIO hole auto-sizing.
There are multiple options for resolving this problem, e.g. placing
VRAM at some address far beyond 4GB, but this approach is not ideal
either, as the device model cannot know where 64-bit BARs will be
allocated. It is, though, the simplest approach to avoid overlaps and
to have the high MMIO hole base equal to the max guest RAM address.

>> Q35 natively has facilities to allow firmware to modify (via
>> emulation) or discover such an MMIO hole setup, which can be used
>> for safe MMIO BAR allocation to avoid overlaps with QEMU-owned
>> invisible ranges.
>
>IMO a single entity should be in control of the memory layout, and
>that's the toolstack.
>
>Ideally we should not allow the firmware to change the layout at all.

This approach is terribly wrong, I don't know why opinions like this
are so common at Citrix. The toolstack is the least informed side. If
the MMIO/memory layout is to be immutable, it must be calculated
considering all factors, like chipset-specific MMIO ranges or ranges
which cannot be used for the MMIO hole.

We need to know all resource requirements of device-model's and PT
PCI devices, all chipset-specific MMIO ranges (which belong to a device
model), all RMRRs (host's property) and all device-model invisible
ranges like VRAM backing store (another device model's property).
And we need to know in which manner hvmloader will be allocating BARs
to the MMIO hole -- eg. either in a forward direction starting from some
base or moving backwards from the end of 4Gb (minus hardcoded ranges).
Basically this means that we have to depend on hvmloader code/version
too in the toolstack, which is wrong on its own -- we should have the
freedom to modify the BAR allocation algo in hvmloader at any time.

At the moment all this information can be discovered only from
the firmware side. Lots of changes would be needed to gather all the
required information from the toolstack.

>What are specifically the registers that you mention?

Write-once emulation of TOLUD/TOUUD/REMAPBASE/REMAPLIMIT registers for
hvmloader to use. That's the approach I'm actually using to make
'hvmloader/allow-memory-relocate=1' work. Memory relocation without
relying on add_to_physmap hypercall for hvmloader (which it does
currently) while having MMIO/memory layout synchronized between all
parties. There are multiple benefits (mostly for PT needs), including
the MMIO hole auto-sizing support but this approach won't be accepted
well with "toolstack should do everything" attitude I'm afraid.

>> It doesn't really matter which registers to pick for this task, but
>> for Q35 this approach is at least consistent with what a real system
>> does (PV/PVH people will find this peculiarity pointless I
>> suppose :) ).  

>Right, but I don't think we aim to emulate a fully complete Q35 MCH or
>ICH9 for example, which has tons of registers, not even QEMU is trying
>to do that. The main goal is to emulate the registers we know are
>required for OSes to work.


* Re: [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring
  2018-03-27 15:37                                                     ` Alexey G
@ 2018-03-28  9:30                                                       ` Roger Pau Monné
  2018-03-28 11:42                                                         ` Alexey G
  2018-03-28 10:03                                                       ` Paul Durrant
  1 sibling, 1 reply; 183+ messages in thread
From: Roger Pau Monné @ 2018-03-28  9:30 UTC (permalink / raw)
  To: Alexey G
  Cc: StefanoStabellini, Wei Liu, Paul Durrant, Jan Beulich, xen-devel,
	Anthony Perard, Ian Jackson

On Wed, Mar 28, 2018 at 01:37:29AM +1000, Alexey G wrote:
> On Tue, 27 Mar 2018 09:45:30 +0100
> Roger Pau Monné <roger.pau@citrix.com> wrote:
> 
> >On Tue, Mar 27, 2018 at 05:42:11AM +1000, Alexey G wrote:
> >> On Mon, 26 Mar 2018 10:24:38 +0100
> >> Roger Pau Monné <roger.pau@citrix.com> wrote:
> >>   
> >> >On Sat, Mar 24, 2018 at 08:32:44AM +1000, Alexey G wrote:  

> BTW, another somewhat related problem at the moment is that Xen knows
> nothing about a chipset-specific MMIO hole(s). Due to this, it is
> possible for a guest to map PT BARs outside the MMIO hole, leading to
> errors like this:
> 
> (XEN) memory_map:remove: dom4 gfn=c8000 mfn=c8000 nr=2000
> (XEN) memory_map:add: dom4 gfn=ffffffffc8000 mfn=c8000 nr=2000
> (XEN) p2m.c:1121:d0v5 p2m_set_entry: 0xffffffffc8000:9 -> -22 (0xc8000)
> (XEN) memory_map:fail: dom4 gfn=ffffffffc8000 mfn=c8000 nr=2000 ret:-22
> (XEN) memory_map:remove: dom4 gfn=ffffffffc8000 mfn=c8000 nr=2000
> (XEN) p2m.c:1228:d0v5 gfn_to_mfn failed! gfn=ffffffffc8000 type:4
> (XEN) memory_map: error -22 removing dom4 access to [c8000,c9fff]
> (XEN) memory_map:remove: dom4 gfn=ffffffffc8000 mfn=c8000 nr=2000
> (XEN) p2m.c:1228:d0v5 gfn_to_mfn failed! gfn=ffffffffc8000 type:4
> (XEN) memory_map: error -22 removing dom4 access to [c8000,c9fff]
> (XEN) memory_map:add: dom4 gfn=c8000 mfn=c8000 nr=2000
> 
> Note that it was merely a lame BAR sizing attempt from the guest-side SW
> (a PCI config space viewing tool) -- writing F's to the high part of the
> MMIO BAR first.

You should disable memory decoding before attempting to size a BAR.

This error has nothing to do with trying to move a BAR outside of the
MMIO hole, this error is caused by the gfn being bigger than the guest
physical address width AFAICT.

> If we know the guest's MMIO hole bounds, we can adapt to this
> behavior, avoiding erroneous mapping attempts to a wrong address
> outside the MMIO hole. Only the MMIO hole designated range can be used
> to map PT device BARs.
> 
> So, if we actually emulate MCH's MMIO hole related registers
> in Xen as well -- we can use them as scratchpad registers (write-once
> of course) to pass this kind of information between Xen and other
> involved parties as an alternative to eg. a dedicated hypercall.

I'm not sure where this information is stored in MCH, guest OSes tend
to fetch this from the ACPI _CRS method of the host-PCI bridge device.

I also don't see QEMU emulating such registers, but yes, I won't be
opposed to storing/reporting this in some registers if that's indeed
supported. Note that I don't think this should be mandatory for adding
Q35 support though.

> >> What this approach will require:
> >> --------------------------------
> >> 
> >> - Changes in QEMU code to support a new chipset-less machine(s). In
> >>   theory might be possible to implement on top of the "null" machine
> >>   concept  
> >
> >Not all parts of the chipset should go inside of Xen, ATM I only
> >foresee Q35 MCH being implemented inside of Xen. So I'm not sure
> >calling this a chipset-less machine is correct from QEMU PoV.
> 
> Emulating only the MCH in Xen will still require a lot of changes, but
> the overall benefit becomes unclear -- basically, we just move
> PCIEXBAR emulation to Xen from QEMU.

At least it would make Xen the one controlling the MCFG area, which is
important. It would also be the first step into moving other chipset
functionality into Xen.

Not doing it just perpetuates the bad precedent that we already have
with the previous chipset.

> >What are specifically the registers that you mention?
> 
> Write-once emulation of TOLUD/TOUUD/REMAPBASE/REMAPLIMIT registers for
> hvmloader to use. That's the approach I'm actually using to make
> 'hvmloader/allow-memory-relocate=1' to work. Memory relocation without
> relying on add_to_physmap hypercall for hvmloader (which it does
> currently) while having MMIO/memory layout synchronized between all
> parties. There are multiple benefits (mostly for PT needs), including
> the MMIO hole auto-sizing support but this approach won't be accepted
> well with "toolstack should do everything" attitude I'm afraid.

You seem to be trying to fix several issues at the same time, which
just makes this much more complex than needed. The initial aim of this
series was to allow HVM guests to use the Q35 chipset. I think that's
what we should focus on.

As you have listed above (and in other emails) there are many
limitations with the current HVM approach, which I would be more than
happy for you to solve. But IMO not all of them must be solved in
order to add Q35 support.

Since this series and email thread has already gone quite far, would
you mind writing a design document with the approach that we
discussed?

Thanks, Roger.


* Re: [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring
  2018-03-27 15:37                                                     ` Alexey G
  2018-03-28  9:30                                                       ` Roger Pau Monné
@ 2018-03-28 10:03                                                       ` Paul Durrant
  2018-03-28 14:14                                                         ` Alexey G
  1 sibling, 1 reply; 183+ messages in thread
From: Paul Durrant @ 2018-03-28 10:03 UTC (permalink / raw)
  To: 'Alexey G', Roger Pau Monne
  Cc: StefanoStabellini, Wei Liu, Jan Beulich, xen-devel,
	Anthony Perard, Ian Jackson

> -----Original Message-----
> >IMO a single entity should be in control of the memory layout, and
> >that's the toolstack.
> >
> >Ideally we should not allow the firmware to change the layout at all.
> 
> This approach is terribly wrong, I don't know why opinions like this
> are so common at Citrix. The toolstack is the least informed side. If
> the MMIO/memory layout is to be immutable, it must be calculated
> considering all factors, like chipset-specific MMIO ranges or ranges
> which cannot be used for the MMIO hole.
> 

Why is this approach wrong? Code running in the guest is non-privileged and we really don't want it messing around with memory layout. We really want to be in a position to e.g. build ACPI tables in the toolstack and we cannot do this until the layout becomes immutable.

> We need to know all resource requirements of device-model's and PT
> PCI devices, all chipset-specific MMIO ranges (which belong to a device
> model), all RMRRs (host's property) and all device-model invisible
> ranges like VRAM backing store (another device model's property).

Yes, indeed we do.

> And we need to know in which manner hvmloader will be allocating BARs
> to the MMIO hole -- eg. either in a forward direction starting from some
> base or moving backwards from the end of 4Gb (minus hardcoded ranges).

Eventually we want to get rid of hvmloader. Why do we need to know anything about its enumeration of BARs? After all they could be completely re-enumerated by the guest OS during or after boot (and indeed Windows does precisely that).

> Basically this means that we have to depend on hvmloader code/version
> too in the toolstack, which is wrong on its own -- we should have a
> freedom to modify the BAR allocation algo in hvmloader at any time.
> 

It should be irrelevant. The toolstack should decide on the sizes and locations of the MMIO holes and they should remain fixed, and be enforced by Xen. This avoids issues that we currently have such as guests populating RAM inside MMIO holes.

  Paul


* Re: [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring
  2018-03-28  9:30                                                       ` Roger Pau Monné
@ 2018-03-28 11:42                                                         ` Alexey G
  2018-03-28 12:05                                                           ` Paul Durrant
  0 siblings, 1 reply; 183+ messages in thread
From: Alexey G @ 2018-03-28 11:42 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Stefano Stabellini, Wei Liu, Paul Durrant, Jan Beulich, xen-devel,
	Anthony Perard, Ian Jackson

On Wed, 28 Mar 2018 10:30:32 +0100
Roger Pau Monné <roger.pau@citrix.com> wrote:

>On Wed, Mar 28, 2018 at 01:37:29AM +1000, Alexey G wrote:
>> On Tue, 27 Mar 2018 09:45:30 +0100
>> Roger Pau Monné <roger.pau@citrix.com> wrote:
>>   
>> >On Tue, Mar 27, 2018 at 05:42:11AM +1000, Alexey G wrote:  
>> >> On Mon, 26 Mar 2018 10:24:38 +0100
>> >> Roger Pau Monné <roger.pau@citrix.com> wrote:
>> >>     
>> >> >On Sat, Mar 24, 2018 at 08:32:44AM +1000, Alexey G wrote:    
>
>> BTW, another somewhat related problem at the moment is that Xen knows
>> nothing about a chipset-specific MMIO hole(s). Due to this, it is
>> possible for a guest to map PT BARs outside the MMIO hole, leading to
>> errors like this:
>> 
>> (XEN) memory_map:remove: dom4 gfn=c8000 mfn=c8000 nr=2000
>> (XEN) memory_map:add: dom4 gfn=ffffffffc8000 mfn=c8000 nr=2000
>> (XEN) p2m.c:1121:d0v5 p2m_set_entry: 0xffffffffc8000:9 -> -22
>> (0xc8000) (XEN) memory_map:fail: dom4 gfn=ffffffffc8000 mfn=c8000
>> nr=2000 ret:-22 (XEN) memory_map:remove: dom4 gfn=ffffffffc8000
>> mfn=c8000 nr=2000 (XEN) p2m.c:1228:d0v5 gfn_to_mfn failed!
>> gfn=ffffffffc8000 type:4 (XEN) memory_map: error -22 removing dom4
>> access to [c8000,c9fff] (XEN) memory_map:remove: dom4
>> gfn=ffffffffc8000 mfn=c8000 nr=2000 (XEN) p2m.c:1228:d0v5 gfn_to_mfn
>> failed! gfn=ffffffffc8000 type:4 (XEN) memory_map: error -22
>> removing dom4 access to [c8000,c9fff] (XEN) memory_map:add: dom4
>> gfn=c8000 mfn=c8000 nr=2000
>> 
>> Note that it was merely a lame BAR sizing attempt from the
>> guest-side SW (a PCI config space viewing tool) -- writing F's to
>> the high part of the MMIO BAR first.  
>
>You should disable memory decoding before attempting to size a BAR.

The problem is that the PCI config space viewer is not mine. :)
Normally it should disable decoding first, yes, but it doesn't. Yet
there are no problems on the real system, while these errors appear
when it is run in a VM. IIRC, powercycling the guest and triggering
these errors multiple times even had a negative impact on the host's
stability, so it's a good test case.

>This error has nothing to do with trying to move a BAR outside of the
>MMIO hole, this error is caused by the gfn being bigger than the guest
>physical address width AFAICT.

In fact, that's the essence of the error -- an attempt to map a range
where no mapping should be attempted at all. p2m_set_entry is too deep
a place to encounter this error; it should be avoided much earlier. If
we knew the limits of where we can (and cannot) map the PT device
BARs, we could check whether we really need to proceed with the
mapping. This way we could handle the "mid-sizing/mid-change"
condition when only half of a 64-bit mem BAR has been written.

>> If we know the guest's MMIO hole bounds, we can adapt to this
>> behavior, avoiding erroneous mapping attempts to wrong addresses
>> outside the MMIO hole. Only the designated MMIO hole range can be
>> used to map PT device BARs.
>> 
>> So, if we actually emulate MCH's MMIO hole related registers in Xen
>> as well -- we can use them as scratchpad registers (write-once, of
>> course) to pass this kind of information between Xen and the other
>> involved parties, as an alternative to e.g. a dedicated hypercall.  
>
>I'm not sure where this information is stored in MCH, guest OSes tend
>to fetch this from the ACPI _CRS method of the host-pci bridge device.
>
>I also don't see QEMU emulating such registers, but yes, I won't be
>opposed to storing/reporting this in some registers if that's indeed
>supported. Note that I don't think this should be mandatory for adding
>Q35 support though.

This info is needed by Xen, not guest OSes -- in order to avoid errors
like the ones described above. If we end up emulating the MCH in Xen
internally, we can emulate these registers as well. That would be
simpler than introducing a new hypercall to inform Xen about the
established MMIO hole range.

Anyway, you're right, it's a side issue. Just an example of what else
the built-in MCH emulation may be useful for.

>> >> What this approach will require:
>> >> --------------------------------
>> >> 
>> >> - Changes in QEMU code to support a new chipset-less machine(s).
>> >> In theory might be possible to implement on top of the "null"
>> >> machine concept    
>> >
>> >Not all parts of the chipset should go inside of Xen, ATM I only
>> >foresee Q35 MCH being implemented inside of Xen. So I'm not sure
>> >calling this a chipset-less machine is correct from QEMU PoV.  
>> 
>> Emulating only the MCH in Xen will still require a lot of changes,
>> but the overall benefit becomes unclear -- basically, we just move
>> PCIEXBAR emulation from QEMU to Xen.  
>
>At least it would make Xen the one controlling the MCFG area, which is
>important. It would also be the first step into moving other chipset
>functionality into Xen.
>
>Not doing it just perpetuates the bad precedent that we already have
>with the previous chipset.

I think it will be kind of ugly if we emulate just the MCH in Xen and
the ICH9 (+ all the rest) in QEMU at the same time. It looks more like
a temporary solution. It would be good to know whether such an
approach would be approved by the maintainers.

>> >What are specifically the registers that you mention?  
>> 
>> Write-once emulation of TOLUD/TOUUD/REMAPBASE/REMAPLIMIT registers
>> for hvmloader to use. That's the approach I'm actually using to make
>> 'hvmloader/allow-memory-relocate=1' to work. Memory relocation
>> without relying on add_to_physmap hypercall for hvmloader (which it
>> does currently) while having MMIO/memory layout synchronized between
>> all parties. There are multiple benefits (mostly for PT needs),
>> including the MMIO hole auto-sizing support but this approach won't
>> be accepted well with "toolstack should do everything" attitude I'm
>> afraid.  
>
>You seem to be trying to fix several issues at the same time, which
>just makes this much more complex than needed. The initial aim of this
>series was to allow HVM guests to use the Q35 chipset. I think that's
>what we should focus on.

Agree. Initially, the main goal was to allow PCIe extended config
space usage for PT devices. Even this particular feature is not yet in
its final state; there are other patches for hw/xen/xen-pt*.c pending
(dynamic fields support), but those are more generic and not bound to
Q35 alone.

>As you have listed above (and in other emails) there are many
>limitations with the current HVM approach, which I would be more than
>happy for you to solve. But IMO not all of them must be solved in
>order to add Q35 support.
>
>Since this series and email thread has already gone quite far, would
>you mind writing a design document with the approach that we
>discussed?

I think we must all agree on which approach to implement next.
Basically, whether we need to completely discard option #1 for this
series and move on with #2. That lengthy requirements/risks email was
an attempt to provide some ground for comparison.

Leaving only required devices like vga/usb/network/storage to QEMU
while emulating everything else in Xen is a good milestone but, as I
understand it, we are currently targeting less ambitious goals for
option #2 -- emulating only the MCH in Xen while emulating the ICH9
etc. in QEMU.


* Re: [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring
  2018-03-28 11:42                                                         ` Alexey G
@ 2018-03-28 12:05                                                           ` Paul Durrant
  0 siblings, 0 replies; 183+ messages in thread
From: Paul Durrant @ 2018-03-28 12:05 UTC (permalink / raw)
  To: 'Alexey G', Roger Pau Monne
  Cc: Stefano Stabellini, Wei Liu, Jan Beulich, xen-devel,
	Anthony Perard, Ian Jackson

> -----Original Message-----
> 
> I think we must all agree on which approach to implement next.
> Basically, whether we need to completely discard option #1 for this
> series and move on with #2. That lengthy requirements/risks email was
> an attempt to provide some ground for comparison.
> 
> Leaving only required devices like vga/usb/network/storage to QEMU
> while emulating everything else in Xen is a good milestone but, as I
> understand it, we are currently targeting less ambitious goals for
> option #2 -- emulating only the MCH in Xen while emulating the ICH9
> etc. in QEMU.

Option #2 is the right direction architecturally; the trick is figuring out how to get there in stages.

I think the fact that Xen emulation obscures QEMU emulation means that we can start to do this without needing too much, if any, change in QEMU. It looks like handling MMCONFIG inside Xen would be a reasonable first stage. Stage two could be expanding Roger's vpci work to handle PCI pass-through to guests. Not sure what would come next.

  Paul



* Re: [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring
  2018-03-28 10:03                                                       ` Paul Durrant
@ 2018-03-28 14:14                                                         ` Alexey G
  0 siblings, 0 replies; 183+ messages in thread
From: Alexey G @ 2018-03-28 14:14 UTC (permalink / raw)
  To: Paul Durrant
  Cc: Stefano Stabellini, Wei Liu, Jan Beulich, xen-devel,
	Anthony Perard, Ian Jackson, Roger Pau Monne

On Wed, 28 Mar 2018 10:03:29 +0000
Paul Durrant <Paul.Durrant@citrix.com> wrote:
>> >IMO a single entity should be in control of the memory layout, and
>> >that's the toolstack.
>> >
>> >Ideally we should not allow the firmware to change the layout at
>> >all.  
>> 
>> This approach is terribly wrong; I don't know why opinions like this
>> are so common at Citrix. The toolstack is the least informed side. If
>> the MMIO/memory layout should be immutable, it must be calculated
>> considering all factors, like chipset-specific MMIO ranges or ranges
>> which cannot be used for the MMIO hole.
>>   
>
>Why is this approach wrong? Code running in the guest is
>non-privileged and we really don't want it messing around with memory
>layout. We really want to be in a position to e.g. build ACPI tables
>in the toolstack and we cannot do this until the layout becomes
>immutable.

Only firmware code in the guest can correctly determine the guest's
MMIO hole requirements (typically the BIOS, but hvmloader in our case).

It is impossible to do in the toolstack at the moment, because it
doesn't know, at the very least:
- the MMIO BAR sizes of the device model's PCI devices
- the chipset-specific MMIO ranges the DM emulates for a chosen machine
- the way these ranges are allocated in the MMIO hole by guest firmware

Even providing some interface to query all related information from the
device model won't solve the problem of how the firmware will allocate
these ranges within the MMIO hole. Any code (or version) change can
invalidate the toolstack's expectations ->

>> We need to know all resource requirements of device-model's and PT
>> PCI devices, all chipset-specific MMIO ranges (which belong to a
>> device model), all RMRRs (host's property) and all device-model
>> invisible ranges like VRAM backing store (another device model's
>> property).  
>
>Yes, indeed we do.
>
>> And we need to know in which manner hvmloader will be allocating BARs
>> to the MMIO hole -- eg. either in a forward direction starting from
>> some base or moving backwards from the end of 4Gb (minus hardcoded
>> ranges).  
>
>Eventually we want to get rid of hvmloader.

...especially if BAR allocation will be delegated from hvmloader to
other firmware like SeaBIOS/OVMF.

> Why do we need to know
>anything about its enumeration of BARs? After all they could be
>completely re-enumerated by the guest OS during or after boot (and
>indeed Windows does precisely that).

You are probably confusing BAR assignment with BAR enumeration.

Windows reallocates PCI BARs only under specific conditions; they call
this feature 'PCI resource rebalancing'. Normally it sticks to the PCI
BAR allocation setup provided by the firmware (hvmloader in our case).
BAR reallocation doesn't really matter as long as we have a correct
MMIO hole size.

The very last thing a user should have to do is guess a correct value
for the mmio_hole_size parameter -- one that works for all of his PT
devices while not being too large at the same time, so as to leave
more RAM for 32-bit guests.

Those 32-bit guests are the most problematic for MMIO hole sizing. We
should try to keep the MMIO hole as small as possible to reduce RAM
losses, while at the same time we are not permitted to allocate any
BARs to the high MMIO hole -- moving 64-bit BARs above 4Gb will
automatically make such devices non-functional for 32-bit guests.

This means we need to calculate the precise MMIO hole size, according
to all factors. And only firmware code in the guest can do it right.

>> Basically this means that we have to depend on hvmloader code/version
>> too in the toolstack, which is wrong on its own -- we should have a
>> freedom to modify the BAR allocation algo in hvmloader at any time.
>>   
>
>It should be irrelevant. The toolstack should decide on the sizes and
>locations of the MMIO holes and they should remain fixed, and be
>enforced by Xen. This avoids issues that we currently have such as
>guests populating RAM inside MMIO holes.

The toolstack can't do it. There should be some one-time way to
communicate the MMIO hole setup between Xen and hvmloader. After that
we can make it immutable.

A write-once interface via emulated platform registers (either
designated for this purpose or arbitrarily chosen) is a safe approach.
We have full control over what can be provided or allowed to guest
firmware via this interface. A dedicated hypercall would be ok too,
but it's a bit of an overkill.

What is definitely not safe is to allow hvmloader to use the
add_to_physmap hypercall.


* Re: [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring
  2018-03-12 18:33 ` [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring Alexey Gerasimenko
  2018-03-19 15:58   ` Roger Pau Monné
@ 2018-05-29 14:23   ` Jan Beulich
  2018-05-29 17:56     ` Alexey G
  1 sibling, 1 reply; 183+ messages in thread
From: Jan Beulich @ 2018-05-29 14:23 UTC (permalink / raw)
  To: Alexey Gerasimenko; +Cc: Andrew Cooper, Wei Liu, Ian Jackson, xen-devel

>>> On 12.03.18 at 19:33, <x1917x@gmail.com> wrote:
> --- a/tools/firmware/hvmloader/config.h
> +++ b/tools/firmware/hvmloader/config.h
> @@ -53,10 +53,14 @@ extern uint8_t ioapic_version;
>  #define PCI_ISA_DEVFN       0x08    /* dev 1, fn 0 */
>  #define PCI_ISA_IRQ_MASK    0x0c20U /* ISA IRQs 5,10,11 are PCI connected */
>  #define PCI_ICH9_LPC_DEVFN  0xf8    /* dev 31, fn 0 */
> +#define PCI_MCH_DEVFN       0       /* bus 0, dev 0, func 0 */

Just MCH is liable to become ambiguous in the future. Perhaps PCI_Q35_MCH_DEVFN?

> @@ -172,10 +173,14 @@ void pci_setup(void)
>  
>      /* Create a list of device BARs in descending order of size. */
>      struct bars {
> -        uint32_t is_64bar;
>          uint32_t devfn;
>          uint32_t bar_reg;
>          uint64_t bar_sz;
> +        uint64_t addr_mask; /* which bits of the base address can be written */
> +        uint32_t bar_data;  /* initial value - BAR flags here */

Why 32 bits? You only use the low few ones afaics. Also please avoid fixed width
types unless you really need them.

> @@ -259,13 +264,21 @@ void pci_setup(void)
>                  bar_reg = PCI_ROM_ADDRESS;
>  
>              bar_data = pci_readl(devfn, bar_reg);
> +
> +            is_mem = !!(((bar_data & PCI_BASE_ADDRESS_SPACE) ==
> +                       PCI_BASE_ADDRESS_SPACE_MEMORY) ||
> +                       (bar_reg == PCI_ROM_ADDRESS));

Once you make is_mem properly bool, !! won't be needed anymore.

Jan




* Re: [RFC PATCH 10/12] libacpi: build ACPI MCFG table if requested
  2018-03-12 18:33 ` [RFC PATCH 10/12] libacpi: build ACPI MCFG table if requested Alexey Gerasimenko
  2018-03-19 17:33   ` Roger Pau Monné
@ 2018-05-29 14:36   ` Jan Beulich
  2018-05-29 18:20     ` Alexey G
  1 sibling, 1 reply; 183+ messages in thread
From: Jan Beulich @ 2018-05-29 14:36 UTC (permalink / raw)
  To: Alexey Gerasimenko; +Cc: Ian Jackson, Wei Liu, xen-devel

>>> On 12.03.18 at 19:33, <x1917x@gmail.com> wrote:
> --- a/tools/libacpi/acpi2_0.h
> +++ b/tools/libacpi/acpi2_0.h
> @@ -422,6 +422,25 @@ struct acpi_20_slit {
>  };
>  
>  /*
> + * PCI Express Memory Mapped Configuration Description Table
> + */
> +struct mcfg_range_entry {
> +    uint64_t base_address;
> +    uint16_t pci_segment;
> +    uint8_t  start_pci_bus_num;
> +    uint8_t  end_pci_bus_num;
> +    uint32_t reserved;
> +};
> +
> +struct acpi_mcfg {
> +    struct acpi_header header;
> +    uint8_t reserved[8];
> +    struct mcfg_range_entry entries[1];
> +};
> +
> +#define MCFG_SIZE_TO_NUM_BUSES(size)  ((size) >> 20)

In a response to a comment from Roger you suggested moving this to pci_regs.h.
I don't see why it would belong there. I think if ACPI spells out such a formula
somewhere, it's fine to live here. Otherwise, since you need it in a single file only,
please put it into the .c file.

> --- a/tools/libacpi/build.c
> +++ b/tools/libacpi/build.c
> @@ -303,6 +303,37 @@ static struct acpi_20_slit *construct_slit(struct 
> acpi_ctxt *ctxt,
>      return slit;
>  }
>  
> +static struct acpi_mcfg *construct_mcfg(struct acpi_ctxt *ctxt,
> +                                        const struct acpi_config *config)
> +{
> +    struct acpi_mcfg *mcfg;
> +
> +    /* Warning: this code expects that we have only one PCI segment */
> +    mcfg = ctxt->mem_ops.alloc(ctxt, sizeof(*mcfg), 16);
> +    if (!mcfg)
> +        return NULL;
> +
> +    memset(mcfg, 0, sizeof(*mcfg));
> +    mcfg->header.signature    = ACPI_MCFG_SIGNATURE;
> +    mcfg->header.revision     = ACPI_1_0_MCFG_REVISION;
> +    fixed_strcpy(mcfg->header.oem_id, ACPI_OEM_ID);
> +    fixed_strcpy(mcfg->header.oem_table_id, ACPI_OEM_TABLE_ID);
> +    mcfg->header.oem_revision = ACPI_OEM_REVISION;
> +    mcfg->header.creator_id   = ACPI_CREATOR_ID;
> +    mcfg->header.creator_revision = ACPI_CREATOR_REVISION;
> +    mcfg->header.length = sizeof(*mcfg);
> +
> +    mcfg->entries[0].base_address = config->mmconfig_addr;
> +    mcfg->entries[0].pci_segment = 0;
> +    mcfg->entries[0].start_pci_bus_num = 0;
> +    mcfg->entries[0].end_pci_bus_num =
> +        MCFG_SIZE_TO_NUM_BUSES(config->mmconfig_len) - 1;
> +
> +    set_checksum(mcfg, offsetof(struct acpi_header, checksum), sizeof(*mcfg));

Despite the numerous pre-existing examples this isn't really correct.
What you mean is something like

    set_checksum(mcfg, offsetof(typeof(*mcfg), header.checksum), sizeof(*mcfg));

Jan




* Re: [RFC PATCH 11/12] hvmloader: use libacpi to build MCFG table
  2018-03-12 18:33 ` [RFC PATCH 11/12] hvmloader: use libacpi to build MCFG table Alexey Gerasimenko
  2018-03-14 17:48   ` Alexey G
  2018-03-19 17:49   ` Roger Pau Monné
@ 2018-05-29 14:46   ` Jan Beulich
  2018-05-29 17:26     ` Alexey G
  2 siblings, 1 reply; 183+ messages in thread
From: Jan Beulich @ 2018-05-29 14:46 UTC (permalink / raw)
  To: Alexey Gerasimenko; +Cc: Andrew Cooper, Wei Liu, Ian Jackson, xen-devel

>>> On 12.03.18 at 19:33, <x1917x@gmail.com> wrote:
> --- a/tools/firmware/hvmloader/util.c
> +++ b/tools/firmware/hvmloader/util.c
> @@ -782,6 +782,69 @@ int get_pc_machine_type(void)
>      return machine_type;
>  }
>  
> +#define PCIEXBAR_ADDR_MASK_64MB     (~((1ULL << 26) - 1))
> +#define PCIEXBAR_ADDR_MASK_128MB    (~((1ULL << 27) - 1))
> +#define PCIEXBAR_ADDR_MASK_256MB    (~((1ULL << 28) - 1))

I don't see the value of these constants, all the more so since they're
generic 64/128/256 MB masks rather than being PCIEXBAR-specific. They
also have no business living in pci_regs.h imo, including any of ...

> +#define PCIEXBAR_LENGTH_BITS(reg)   (((reg) >> 1) & 3)
> +#define PCIEXBAREN                  1

... these: Only generic fields should be described there. If you want to
collect Q35 definitions in a central place, add q35.h. But if you do,
please properly prefix all of them such that there won't be any risk of
collisions with possible future additions.

> +static uint64_t mmconfig_get_base(void)
> +{
> +    uint64_t base;
> +    uint32_t reg = pci_readl(PCI_MCH_DEVFN, PCI_MCH_PCIEXBAR);
> +
> +    base = reg | (uint64_t) pci_readl(PCI_MCH_DEVFN, PCI_MCH_PCIEXBAR+4) << 32;
> +
> +    switch (PCIEXBAR_LENGTH_BITS(reg))
> +    {
> +    case 0:
> +        base &= PCIEXBAR_ADDR_MASK_256MB;
> +        break;
> +    case 1:
> +        base &= PCIEXBAR_ADDR_MASK_128MB;
> +        break;
> +    case 2:
> +        base &= PCIEXBAR_ADDR_MASK_64MB;
> +        break;
> +    case 3:
> +        BUG();  /* a reserved value encountered */
> +    }

Instead of this switch, why can't you ...

> +    return base;

    return base & ~(mmconfig_get_size() - 1);

here, eliminating (afaics) the need for the constants above?

Jan




* Re: [RFC PATCH 11/12] hvmloader: use libacpi to build MCFG table
  2018-05-29 14:46   ` Jan Beulich
@ 2018-05-29 17:26     ` Alexey G
  0 siblings, 0 replies; 183+ messages in thread
From: Alexey G @ 2018-05-29 17:26 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Wei Liu, Ian Jackson, xen-devel

On Tue, 29 May 2018 08:46:13 -0600
"Jan Beulich" <JBeulich@suse.com> wrote:

>>>> On 12.03.18 at 19:33, <x1917x@gmail.com> wrote:  
>> --- a/tools/firmware/hvmloader/util.c
>> +++ b/tools/firmware/hvmloader/util.c
>> @@ -782,6 +782,69 @@ int get_pc_machine_type(void)
>>      return machine_type;
>>  }
>>  
>> +#define PCIEXBAR_ADDR_MASK_64MB     (~((1ULL << 26) - 1))
>> +#define PCIEXBAR_ADDR_MASK_128MB    (~((1ULL << 27) - 1))
>> +#define PCIEXBAR_ADDR_MASK_256MB    (~((1ULL << 28) - 1))  
>
>I don't see the value of these constants, all the more so since they're
>generic 64/128/256 MB masks rather than being PCIEXBAR-specific. They
>also have no business living in pci_regs.h imo, including any of ...
>
>> +#define PCIEXBAR_LENGTH_BITS(reg)   (((reg) >> 1) & 3)
>> +#define PCIEXBAREN                  1  
>
>... these: Only generic fields should be described there. If you want to
>collect Q35 definitions in a central place, add q35.h. But if you do,
>please properly prefix all of them such that there won't be any risk of
>collisions with possible future additions.

OK, sure.

>> +static uint64_t mmconfig_get_base(void)
>> +{
>> +    uint64_t base;
>> +    uint32_t reg = pci_readl(PCI_MCH_DEVFN, PCI_MCH_PCIEXBAR);
>> +
>> +    base = reg | (uint64_t) pci_readl(PCI_MCH_DEVFN, PCI_MCH_PCIEXBAR+4) << 32;
>> +
>> +    switch (PCIEXBAR_LENGTH_BITS(reg))
>> +    {
>> +    case 0:
>> +        base &= PCIEXBAR_ADDR_MASK_256MB;
>> +        break;
>> +    case 1:
>> +        base &= PCIEXBAR_ADDR_MASK_128MB;
>> +        break;
>> +    case 2:
>> +        base &= PCIEXBAR_ADDR_MASK_64MB;
>> +        break;
>> +    case 3:
>> +        BUG();  /* a reserved value encountered */
>> +    }  
>
>Instead of this switch, why can't you ...
>
>> +    return base;  
>
>    return base & ~(mmconfig_get_size() - 1);
>
>here, eliminating (afaics) the need for the constants above?

I remember some MMCONFIG implementations using a base alignment smaller
than the possible MMCONFIG size; the code style was probably influenced
by that fact. But as we deal with Q35 only, using mmconfig_get_size()
for the base address mask is absolutely valid (and shorter).

In this case it will be nicer, agreed. And we still have an assert for
the unimplemented value (3) via the mmconfig_get_size() call, to catch
errors like an emulator returning 0xFF's on register reads.


* Re: [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring
  2018-05-29 14:23   ` Jan Beulich
@ 2018-05-29 17:56     ` Alexey G
  2018-05-29 18:47       ` Alexey G
  0 siblings, 1 reply; 183+ messages in thread
From: Alexey G @ 2018-05-29 17:56 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Wei Liu, Ian Jackson, xen-devel

On Tue, 29 May 2018 08:23:51 -0600
"Jan Beulich" <JBeulich@suse.com> wrote:

>>>> On 12.03.18 at 19:33, <x1917x@gmail.com> wrote:  
>> --- a/tools/firmware/hvmloader/config.h
>> +++ b/tools/firmware/hvmloader/config.h
>> @@ -53,10 +53,14 @@ extern uint8_t ioapic_version;
>>  #define PCI_ISA_DEVFN       0x08    /* dev 1, fn 0 */
>>  #define PCI_ISA_IRQ_MASK    0x0c20U /* ISA IRQs 5,10,11 are PCI connected */
>>  #define PCI_ICH9_LPC_DEVFN  0xf8    /* dev 31, fn 0 */
>> +#define PCI_MCH_DEVFN       0       /* bus 0, dev 0, func 0 */  
>
>Just MCH is liable to become ambiguous in the future. Perhaps PCI_Q35_MCH_DEVFN?

Agree, PCI_Q35_MCH_DEVFN is more explicit.

>> @@ -172,10 +173,14 @@ void pci_setup(void)
>>  
>>      /* Create a list of device BARs in descending order of size. */
>>      struct bars {
>> -        uint32_t is_64bar;
>>          uint32_t devfn;
>>          uint32_t bar_reg;
>>          uint64_t bar_sz;
>> +        uint64_t addr_mask; /* which bits of the base address can be written */
>> +        uint32_t bar_data;  /* initial value - BAR flags here */  
>
>Why 32 bits? You only use the low few ones afaics. Also please avoid fixed width
>types unless you really need them.

bar_data is supposed to hold only the BAR's kludge bits, like the
'enabled' bit values or the MMCONFIG width bits. All of them occupy
the low dword only, while the BAR's high dword is just a part of the
address which will be replaced by the allocated one (for mem64 BARs),
so there is no need to keep the high half.

So this is a sort of minor optimization -- avoiding a 64-bit operand
size when 32 bits are enough.

>> @@ -259,13 +264,21 @@ void pci_setup(void)
>>                  bar_reg = PCI_ROM_ADDRESS;
>>  
>>              bar_data = pci_readl(devfn, bar_reg);
>> +
>> +            is_mem = !!(((bar_data & PCI_BASE_ADDRESS_SPACE) ==
>> +                       PCI_BASE_ADDRESS_SPACE_MEMORY) ||
>> +                       (bar_reg == PCI_ROM_ADDRESS));  
>
>Once you make is_mem properly bool, !! won't be needed anymore.

OK, will switch to bool.

>Jan
>
>



* Re: [RFC PATCH 10/12] libacpi: build ACPI MCFG table if requested
  2018-05-29 14:36   ` Jan Beulich
@ 2018-05-29 18:20     ` Alexey G
  0 siblings, 0 replies; 183+ messages in thread
From: Alexey G @ 2018-05-29 18:20 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Ian Jackson, Wei Liu, xen-devel

On Tue, 29 May 2018 08:36:49 -0600
"Jan Beulich" <JBeulich@suse.com> wrote:

>>>> On 12.03.18 at 19:33, <x1917x@gmail.com> wrote:  
>> --- a/tools/libacpi/acpi2_0.h
>> +++ b/tools/libacpi/acpi2_0.h
>> @@ -422,6 +422,25 @@ struct acpi_20_slit {
>>  };
>>  
>>  /*
>> + * PCI Express Memory Mapped Configuration Description Table
>> + */
>> +struct mcfg_range_entry {
>> +    uint64_t base_address;
>> +    uint16_t pci_segment;
>> +    uint8_t  start_pci_bus_num;
>> +    uint8_t  end_pci_bus_num;
>> +    uint32_t reserved;
>> +};
>> +
>> +struct acpi_mcfg {
>> +    struct acpi_header header;
>> +    uint8_t reserved[8];
>> +    struct mcfg_range_entry entries[1];
>> +};
>> +
>> +#define MCFG_SIZE_TO_NUM_BUSES(size)  ((size) >> 20)  
>
>In a response to a comment from Roger you suggested to move this to pci_regs.h.
>I don't see why it would belong there. I think if ACPI spells out such a formula
>somewhere, it's fine to liver here. Otherwise, since you need it in a single file only,
>please put it into the .c file.

Agree, it is currently used in one place only, so there's no need for a .h entry.

>> --- a/tools/libacpi/build.c
>> +++ b/tools/libacpi/build.c
>> @@ -303,6 +303,37 @@ static struct acpi_20_slit *construct_slit(struct 
>> acpi_ctxt *ctxt,
>>      return slit;
>>  }
>>  
>> +static struct acpi_mcfg *construct_mcfg(struct acpi_ctxt *ctxt,
>> +                                        const struct acpi_config *config)
>> +{
>> +    struct acpi_mcfg *mcfg;
>> +
>> +    /* Warning: this code expects that we have only one PCI segment */
>> +    mcfg = ctxt->mem_ops.alloc(ctxt, sizeof(*mcfg), 16);
>> +    if (!mcfg)
>> +        return NULL;
>> +
>> +    memset(mcfg, 0, sizeof(*mcfg));
>> +    mcfg->header.signature    = ACPI_MCFG_SIGNATURE;
>> +    mcfg->header.revision     = ACPI_1_0_MCFG_REVISION;
>> +    fixed_strcpy(mcfg->header.oem_id, ACPI_OEM_ID);
>> +    fixed_strcpy(mcfg->header.oem_table_id, ACPI_OEM_TABLE_ID);
>> +    mcfg->header.oem_revision = ACPI_OEM_REVISION;
>> +    mcfg->header.creator_id   = ACPI_CREATOR_ID;
>> +    mcfg->header.creator_revision = ACPI_CREATOR_REVISION;
>> +    mcfg->header.length = sizeof(*mcfg);
>> +
>> +    mcfg->entries[0].base_address = config->mmconfig_addr;
>> +    mcfg->entries[0].pci_segment = 0;
>> +    mcfg->entries[0].start_pci_bus_num = 0;
>> +    mcfg->entries[0].end_pci_bus_num =
>> +        MCFG_SIZE_TO_NUM_BUSES(config->mmconfig_len) - 1;
>> +
>> +    set_checksum(mcfg, offsetof(struct acpi_header, checksum), sizeof(*mcfg));  
>
>Despite the numerous pre-existing examples this isn't really correct.
>What you mean is something like
>
>    set_checksum(mcfg, offsetof(typeof(*mcfg), header.checksum), sizeof(*mcfg));

Yes, all those set_checksum calls rely on the fact that the
acpi_header structure will always be the first field. It will be, but
the code is technically wrong anyway.

I'll update all such set_checksum(...checksum...) instances in the
file for the next version; this is a trivial change.


* Re: [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring
  2018-05-29 17:56     ` Alexey G
@ 2018-05-29 18:47       ` Alexey G
  2018-05-30  4:32         ` Alexey G
  2018-05-30  8:12         ` Jan Beulich
  0 siblings, 2 replies; 183+ messages in thread
From: Alexey G @ 2018-05-29 18:47 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Wei Liu, Ian Jackson, xen-devel

On Wed, 30 May 2018 03:56:07 +1000
Alexey G <x1917x@gmail.com> wrote:

>On Tue, 29 May 2018 08:23:51 -0600
>"Jan Beulich" <JBeulich@suse.com> wrote:
>
>>>>> On 12.03.18 at 19:33, <x1917x@gmail.com> wrote:    
>>> --- a/tools/firmware/hvmloader/config.h
>>> +++ b/tools/firmware/hvmloader/config.h
>>> @@ -53,10 +53,14 @@ extern uint8_t ioapic_version;
>>>  #define PCI_ISA_DEVFN       0x08    /* dev 1, fn 0 */
>>>  #define PCI_ISA_IRQ_MASK    0x0c20U /* ISA IRQs 5,10,11 are PCI connected */
>>>  #define PCI_ICH9_LPC_DEVFN  0xf8    /* dev 31, fn 0 */
>>> +#define PCI_MCH_DEVFN       0       /* bus 0, dev 0, func 0 */    
>>
>>Just MCH is liable to become ambiguous in the future. Perhaps PCI_Q35_MCH_DEVFN?  
>
>Agree, PCI_Q35_MCH_DEVFN is more explicit.
>
>>> @@ -172,10 +173,14 @@ void pci_setup(void)
>>>  
>>>      /* Create a list of device BARs in descending order of size. */
>>>      struct bars {
>>> -        uint32_t is_64bar;
>>>          uint32_t devfn;
>>>          uint32_t bar_reg;
>>>          uint64_t bar_sz;
>>> +        uint64_t addr_mask; /* which bits of the base address can be written */
>>> +        uint32_t bar_data;  /* initial value - BAR flags here */    
>>
>>Why 32 bits? You only use the low few ones afaics. Also please avoid fixed width
>>types unless you really need them.  
>
>bar_data is supposed to hold only BAR's kludge bits like 'enabled' bit
>values or MMCONFIG width bits. All of them occupy the low dword only
>while BAR's high dword is just a part of the address which will be
>replaced by allocated one (for mem64 BARs), thus no need to keep the
>high half.
>
>So this is a sort of minor optimization -- avoiding using 64-bit operand
>size when 32 bit is enough.

Sorry, it looks like I misread the question. You were actually
suggesting making bar_data shorter. 8 bits is enough at the moment, so
bar_data can be changed to uint8_t, yes.

Regarding avoiding bool here -- the only reason was matching the
existing code style. For some reason the existing hvmloader code
prefers uint types for boolean values.

>>> @@ -259,13 +264,21 @@ void pci_setup(void)
>>>                  bar_reg = PCI_ROM_ADDRESS;
>>>  
>>>              bar_data = pci_readl(devfn, bar_reg);
>>> +
>>> +            is_mem = !!(((bar_data & PCI_BASE_ADDRESS_SPACE) ==
>>> +                       PCI_BASE_ADDRESS_SPACE_MEMORY) ||
>>> +                       (bar_reg == PCI_ROM_ADDRESS));    
>>
>>Once you make is_mem properly bool, !! won't be needed anymore.  
>
>OK, will switch to bool.
>
>>Jan
>>
>>  
>




* Re: [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring
  2018-05-29 18:47       ` Alexey G
@ 2018-05-30  4:32         ` Alexey G
  2018-05-30  8:13           ` Jan Beulich
  2018-05-30  8:12         ` Jan Beulich
  1 sibling, 1 reply; 183+ messages in thread
From: Alexey G @ 2018-05-30  4:32 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Wei Liu, Ian Jackson, xen-devel

>On Wed, 30 May 2018 03:56:07 +1000
>Alexey G <x1917x@gmail.com> wrote:
>
>>On Tue, 29 May 2018 08:23:51 -0600
>>"Jan Beulich" <JBeulich@suse.com> wrote:
>>  
>>>>>> On 12.03.18 at 19:33, <x1917x@gmail.com> wrote:      
>>>> --- a/tools/firmware/hvmloader/config.h
>>>> +++ b/tools/firmware/hvmloader/config.h
>>>> @@ -53,10 +53,14 @@ extern uint8_t ioapic_version;
>>>>  #define PCI_ISA_DEVFN       0x08    /* dev 1, fn 0 */
>>>>  #define PCI_ISA_IRQ_MASK    0x0c20U /* ISA IRQs 5,10,11 are PCI connected */
>>>>  #define PCI_ICH9_LPC_DEVFN  0xf8    /* dev 31, fn 0 */
>>>> +#define PCI_MCH_DEVFN       0       /* bus 0, dev 0, func 0 */      
>>>
>>>Just MCH is liable to become ambiguous in the future. Perhaps PCI_Q35_MCH_DEVFN?    
>>
>>Agree, PCI_Q35_MCH_DEVFN is more explicit.

On second thought, we can reuse one MCH BDF #define for multiple
emulated chipsets -- not just for something completely distinct from
Q35, but even for those which mostly require merely changing the PCI
DIDs (like P35 etc.). In that case producing multiple #defines like
PCI_{Q|P|G}35_MCH_DEVFN for the same BDF 0:0.0 might be excessive.

PCI_ICH9_LPC_DEVFN can actually be reused too; its BDF location has
survived many chipset generations, so its #define can be shared as well
(though renamed to something like PCI_LPC_BRIDGE_DEVFN).

>>>> @@ -172,10 +173,14 @@ void pci_setup(void)
>>>>  
>>>>      /* Create a list of device BARs in descending order of size. */
>>>>      struct bars {
>>>> -        uint32_t is_64bar;
>>>>          uint32_t devfn;
>>>>          uint32_t bar_reg;
>>>>          uint64_t bar_sz;
>>>> +        uint64_t addr_mask; /* which bits of the base address can be written */
>>>> +        uint32_t bar_data;  /* initial value - BAR flags here */      
>>>



* Re: [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring
  2018-05-29 18:47       ` Alexey G
  2018-05-30  4:32         ` Alexey G
@ 2018-05-30  8:12         ` Jan Beulich
  2018-05-31  5:15           ` Alexey G
  1 sibling, 1 reply; 183+ messages in thread
From: Jan Beulich @ 2018-05-30  8:12 UTC (permalink / raw)
  To: Alexey Gerasimenko; +Cc: Andrew Cooper, Wei Liu, Ian Jackson, xen-devel

>>> On 29.05.18 at 20:47, <x1917x@gmail.com> wrote:
> On Wed, 30 May 2018 03:56:07 +1000
> Alexey G <x1917x@gmail.com> wrote:
>>On Tue, 29 May 2018 08:23:51 -0600
>>"Jan Beulich" <JBeulich@suse.com> wrote:
>>>>>> On 12.03.18 at 19:33, <x1917x@gmail.com> wrote:    
>>>> @@ -172,10 +173,14 @@ void pci_setup(void)
>>>>  
>>>>      /* Create a list of device BARs in descending order of size. */
>>>>      struct bars {
>>>> -        uint32_t is_64bar;
>>>>          uint32_t devfn;
>>>>          uint32_t bar_reg;
>>>>          uint64_t bar_sz;
>>>> +        uint64_t addr_mask; /* which bits of the base address can be written */
>>>> +        uint32_t bar_data;  /* initial value - BAR flags here */    
>>>
>>>Why 32 bits? You only use the low few ones afaics. Also please avoid fixed width
>>>types unless you really need them.  
>>
>>bar_data is supposed to hold only BAR's kludge bits like 'enabled' bit
>>values or MMCONFIG width bits. All of them occupy the low dword only
>>while BAR's high dword is just a part of the address which will be
>>replaced by allocated one (for mem64 BARs), thus no need to keep the
>>high half.
>>
>>So this is a sort of minor optimization -- avoiding using 64-bit operand
>>size when 32 bit is enough.
> 
> Sorry, looks like I've misread the question. You were actually 
> suggesting to make bar_data shorter. 8 bits is enough at the moment, so
> bar_data can be changed to uint8_t, yes.

Right.

> Regarding avoiding using bool here -- the only reason was adapting to
> the existing code style. For some reason the existing hvmloader code
> prefers to use uint-types for bool values.

And wrongly so. We're slowly moving over, and we'd prefer the issue to
not be widened by new code.

Jan





* Re: [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring
  2018-05-30  4:32         ` Alexey G
@ 2018-05-30  8:13           ` Jan Beulich
  2018-05-31  4:25             ` Alexey G
  0 siblings, 1 reply; 183+ messages in thread
From: Jan Beulich @ 2018-05-30  8:13 UTC (permalink / raw)
  To: Alexey Gerasimenko; +Cc: Andrew Cooper, Wei Liu, Ian Jackson, xen-devel

>>> On 30.05.18 at 06:32, <x1917x@gmail.com> wrote:
>> On Wed, 30 May 2018 03:56:07 +1000
>>Alexey G <x1917x@gmail.com> wrote:
>>
>>>On Tue, 29 May 2018 08:23:51 -0600
>>>"Jan Beulich" <JBeulich@suse.com> wrote:
>>>  
>>>>>>> On 12.03.18 at 19:33, <x1917x@gmail.com> wrote:      
>>>>> --- a/tools/firmware/hvmloader/config.h
>>>>> +++ b/tools/firmware/hvmloader/config.h
>>>>> @@ -53,10 +53,14 @@ extern uint8_t ioapic_version;
>>>>>  #define PCI_ISA_DEVFN       0x08    /* dev 1, fn 0 */
>>>>>  #define PCI_ISA_IRQ_MASK    0x0c20U /* ISA IRQs 5,10,11 are PCI connected */
>>>>>  #define PCI_ICH9_LPC_DEVFN  0xf8    /* dev 31, fn 0 */
>>>>> +#define PCI_MCH_DEVFN       0       /* bus 0, dev 0, func 0 */      
>>>>
>>>>Just MCH is liable to become ambiguous in the future. Perhaps PCI_Q35_MCH_DEVFN?    
>>>
>>>Agree, PCI_Q35_MCH_DEVFN is more explicit.
> 
> On the other thought, we can reuse one MCH BDF #define for multiple
> emulated chipsets, not just for something completely distinct to Q35
> but even for those which mostly require merely changing PCI DIDs (like
> P35 etc.) So in this case producing multiple #defines like
> PCI_{Q|P|G}35_MCH_DEVFN for the same BDF 0:0.0 might be excessive.
> 
> PCI_ICH9_LPC_DEVFN can be actually reused too, its BDF location
> survived many chipset generations so its #define can be shared as well
> (though renamed to something like PCI_LPC_BRIDGE_DEVFN).

PCI_x35_MCH_DEVFN then, with a brief comment explaining the x?

Jan





* Re: [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring
  2018-05-30  8:13           ` Jan Beulich
@ 2018-05-31  4:25             ` Alexey G
  0 siblings, 0 replies; 183+ messages in thread
From: Alexey G @ 2018-05-31  4:25 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Wei Liu, Ian Jackson, xen-devel

On Wed, 30 May 2018 02:13:30 -0600
"Jan Beulich" <JBeulich@suse.com> wrote:

>>>> On 30.05.18 at 06:32, <x1917x@gmail.com> wrote:  
>>> On Wed, 30 May 2018 03:56:07 +1000
>>>Alexey G <x1917x@gmail.com> wrote:
>>>  
>>>>On Tue, 29 May 2018 08:23:51 -0600
>>>>"Jan Beulich" <JBeulich@suse.com> wrote:
>>>>    
>>>>>>>> On 12.03.18 at 19:33, <x1917x@gmail.com> wrote:        
>>>>>> --- a/tools/firmware/hvmloader/config.h
>>>>>> +++ b/tools/firmware/hvmloader/config.h
>>>>>> @@ -53,10 +53,14 @@ extern uint8_t ioapic_version;
>>>>>>  #define PCI_ISA_DEVFN       0x08    /* dev 1, fn 0 */
>>>>>>  #define PCI_ISA_IRQ_MASK    0x0c20U /* ISA IRQs 5,10,11 are PCI connected */
>>>>>>  #define PCI_ICH9_LPC_DEVFN  0xf8    /* dev 31, fn 0 */
>>>>>> +#define PCI_MCH_DEVFN       0       /* bus 0, dev 0, func 0 */        
>>>>>
>>>>>Just MCH is liable to become ambiguous in the future. Perhaps PCI_Q35_MCH_DEVFN?      
>>>>
>>>>Agree, PCI_Q35_MCH_DEVFN is more explicit.  
>> 
>> On the other thought, we can reuse one MCH BDF #define for multiple
>> emulated chipsets, not just for something completely distinct to Q35
>> but even for those which mostly require merely changing PCI DIDs (like
>> P35 etc.) So in this case producing multiple #defines like
>> PCI_{Q|P|G}35_MCH_DEVFN for the same BDF 0:0.0 might be excessive.
>> 
>> PCI_ICH9_LPC_DEVFN can be actually reused too, its BDF location
>> survived many chipset generations so its #define can be shared as well
>> (though renamed to something like PCI_LPC_BRIDGE_DEVFN).  
>
>PCI_x35_MCH_DEVFN then, with a brief comment explaining the x?

Hmm, I'm afraid there are too many chipsets similar to Q35, including
the x31 and x33 series. It might also be confusing given the existence
of X-series chipsets like the Intel X38.

I think it's better to rename this #define to PCI_Q35_MCH_DEVFN for now,
as you suggested, and leave the choice of unified names to whoever (if
anyone) actually adds P35/G35/etc. emulation on top of Q35's. So far
we're limited to Q35 after all.



* Re: [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring
  2018-05-30  8:12         ` Jan Beulich
@ 2018-05-31  5:15           ` Alexey G
  2018-06-01  5:30             ` Jan Beulich
  0 siblings, 1 reply; 183+ messages in thread
From: Alexey G @ 2018-05-31  5:15 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Wei Liu, Ian Jackson, xen-devel

On Wed, 30 May 2018 02:12:37 -0600
"Jan Beulich" <JBeulich@suse.com> wrote:

>>>> On 29.05.18 at 20:47, <x1917x@gmail.com> wrote:  
>> On Wed, 30 May 2018 03:56:07 +1000
>> Alexey G <x1917x@gmail.com> wrote:  
>>>On Tue, 29 May 2018 08:23:51 -0600
>>>"Jan Beulich" <JBeulich@suse.com> wrote:  
>>>>>>> On 12.03.18 at 19:33, <x1917x@gmail.com> wrote:      
>>>>> @@ -172,10 +173,14 @@ void pci_setup(void)
>>>>>  
>>>>>      /* Create a list of device BARs in descending order of size. */
>>>>>      struct bars {
>>>>> -        uint32_t is_64bar;
>>>>>          uint32_t devfn;
>>>>>          uint32_t bar_reg;
>>>>>          uint64_t bar_sz;
>>>>> +        uint64_t addr_mask; /* which bits of the base address can be written */
>>>>> +        uint32_t bar_data;  /* initial value - BAR flags here */      
>>>>
>>>>Why 32 bits? You only use the low few ones afaics. Also please avoid fixed width
>>>>types unless you really need them.    
>>>
>>>bar_data is supposed to hold only BAR's kludge bits like 'enabled' bit
>>>values or MMCONFIG width bits. All of them occupy the low dword only
>>>while BAR's high dword is just a part of the address which will be
>>>replaced by allocated one (for mem64 BARs), thus no need to keep the
>>>high half.
>>>
>>>So this is a sort of minor optimization -- avoiding using 64-bit operand
>>>size when 32 bit is enough.  
>> 
>> Sorry, looks like I've misread the question. You were actually 
>> suggesting to make bar_data shorter. 8 bits is enough at the moment, so
>> bar_data can be changed to uint8_t, yes.  
>
>Right.

Ok, I'll switch to smaller types, though I'm not sure it will make any
significant impact, I'm afraid.

In particular, bar_data will typically be used in 32/64-bit arithmetic.
Using a 32-bit data type means we avoid explicit zero extension for
both 32- and 64-bit operations, while for a uint8_t field the compiler
will have to emit extra MOVZX instructions to embed an 8-bit operand
into 32/64-bit expressions. The 32-bit bar_reg could be made 16-bit in
the same way, but any memory savings would similarly be counteracted by
the need to use 66h-prefixed instructions for it.

Anyway, as the BAR allocation code is neither memory- nor time-critical,
I guess any option will be fine.

>> Regarding avoiding using bool here -- the only reason was adapting to
>> the existing code style. For some reason the existing hvmloader code
>> prefers to use uint-types for bool values.  
>
>And wrongly so. We're slowly moving over, and we'd prefer the issue to
>not be widened by new code.

BTW, there are other changes pending for hvmloader/pci.c which will
(hopefully :) ) replace its BAR allocation and RMRR handling code, so
this patch can be considered a sort of intermediate one -- I'm using a
heavily reworked version of hvmloader/pci.c which I'd like to upstream.



* Re: [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring
  2018-05-31  5:15           ` Alexey G
@ 2018-06-01  5:30             ` Jan Beulich
  2018-06-01 15:53               ` Alexey G
  0 siblings, 1 reply; 183+ messages in thread
From: Jan Beulich @ 2018-06-01  5:30 UTC (permalink / raw)
  To: x1917x; +Cc: andrew.cooper3, wei.liu2, Ian.Jackson, xen-devel

>>> Alexey G <x1917x@gmail.com> 05/31/18 7:15 AM >>>
>On Wed, 30 May 2018 02:12:37 -0600 "Jan Beulich" <JBeulich@suse.com> wrote:
>>>>> On 29.05.18 at 20:47, <x1917x@gmail.com> wrote:  
>>> On Wed, 30 May 2018 03:56:07 +1000
>>> Alexey G <x1917x@gmail.com> wrote:  
>>>>On Tue, 29 May 2018 08:23:51 -0600
>>>>"Jan Beulich" <JBeulich@suse.com> wrote:  
>>>>>>>> On 12.03.18 at 19:33, <x1917x@gmail.com> wrote:      
>>>>>> @@ -172,10 +173,14 @@ void pci_setup(void)
>>>>>>  
>>>>>>      /* Create a list of device BARs in descending order of size. */
>>>>>>      struct bars {
>>>>>> -        uint32_t is_64bar;
>>>>>>          uint32_t devfn;
>>>>>>          uint32_t bar_reg;
>>>>>>          uint64_t bar_sz;
>>>>>> +        uint64_t addr_mask; /* which bits of the base address can be written */
>>>>>> +        uint32_t bar_data;  /* initial value - BAR flags here */      
>>>>>
>>>>>Why 32 bits? You only use the low few ones afaics. Also please avoid fixed width
>>>>>types unless you really need them.    
>>>>
>>>>bar_data is supposed to hold only BAR's kludge bits like 'enabled' bit
>>>>values or MMCONFIG width bits. All of them occupy the low dword only
>>>>while BAR's high dword is just a part of the address which will be
>>>>replaced by allocated one (for mem64 BARs), thus no need to keep the
>>>>high half.
>>>>
>>>>So this is a sort of minor optimization -- avoiding using 64-bit operand
>>>>size when 32 bit is enough.  
>>> 
>>> Sorry, looks like I've misread the question. You were actually 
>>> suggesting to make bar_data shorter. 8 bits is enough at the moment, so
>>> bar_data can be changed to uint8_t, yes.  
>>
>>Right.
>
>Ok, I'll switch to smaller types though not sure if it will make any
>significant impact I'm afraid. 
>
>In particular, bar_data will be typically used in 32/64-bit 
>arithmetics, using a 32-bit datatype means we avoiding explicit zero
>extension for both 32 and 64-bit operations while for an uint8_t field
>the compiler will have to provide extra MOVZX instructions to embed a
>8-bit operand into 32/64-bit expressions. 32-bit bar_reg can be made
>16-bit in the same way but any memory usage improvements will be
>similarly counteracted by a requirement to use 66h-prefixed
>instructions for it.

Hmm, yes, the space saving from using less wide types are probably indeed
not worth it. But then please switch to "unsigned int" instead of uint<N>_t
whenever the exact size doesn't matter.

Jan





* Re: [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring
  2018-06-01  5:30             ` Jan Beulich
@ 2018-06-01 15:53               ` Alexey G
  0 siblings, 0 replies; 183+ messages in thread
From: Alexey G @ 2018-06-01 15:53 UTC (permalink / raw)
  To: Jan Beulich; +Cc: andrew.cooper3, wei.liu2, Ian.Jackson, xen-devel

On Thu, 31 May 2018 23:30:35 -0600
"Jan Beulich" <jbeulich@suse.com> wrote:

>>>> Alexey G <x1917x@gmail.com> 05/31/18 7:15 AM >>>  
>>On Wed, 30 May 2018 02:12:37 -0600 "Jan Beulich" <JBeulich@suse.com> wrote:  
>>>>>> On 29.05.18 at 20:47, <x1917x@gmail.com> wrote:    
>>>> On Wed, 30 May 2018 03:56:07 +1000
>>>> Alexey G <x1917x@gmail.com> wrote:    
>>>>>On Tue, 29 May 2018 08:23:51 -0600
>>>>>"Jan Beulich" <JBeulich@suse.com> wrote:    
>>>>>>>>> On 12.03.18 at 19:33, <x1917x@gmail.com> wrote:        
>>>>>>> @@ -172,10 +173,14 @@ void pci_setup(void)
>>>>>>>  
>>>>>>>      /* Create a list of device BARs in descending order of size. */
>>>>>>>      struct bars {
>>>>>>> -        uint32_t is_64bar;
>>>>>>>          uint32_t devfn;
>>>>>>>          uint32_t bar_reg;
>>>>>>>          uint64_t bar_sz;
>>>>>>> +        uint64_t addr_mask; /* which bits of the base address can be written */
>>>>>>> +        uint32_t bar_data;  /* initial value - BAR flags here */        
>>>>>>
>>>>>>Why 32 bits? You only use the low few ones afaics. Also please avoid fixed width
>>>>>>types unless you really need them.      
>>>>>
>>>>>bar_data is supposed to hold only BAR's kludge bits like 'enabled' bit
>>>>>values or MMCONFIG width bits. All of them occupy the low dword only
>>>>>while BAR's high dword is just a part of the address which will be
>>>>>replaced by allocated one (for mem64 BARs), thus no need to keep the
>>>>>high half.
>>>>>
>>>>>So this is a sort of minor optimization -- avoiding using 64-bit operand
>>>>>size when 32 bit is enough.    
>>>> 
>>>> Sorry, looks like I've misread the question. You were actually 
>>>> suggesting to make bar_data shorter. 8 bits is enough at the moment, so
>>>> bar_data can be changed to uint8_t, yes.    
>>>
>>>Right.  
>>
>>Ok, I'll switch to smaller types though not sure if it will make any
>>significant impact I'm afraid. 
>>
>>In particular, bar_data will be typically used in 32/64-bit 
>>arithmetics, using a 32-bit datatype means we avoiding explicit zero
>>extension for both 32 and 64-bit operations while for an uint8_t field
>>the compiler will have to provide extra MOVZX instructions to embed a
>>8-bit operand into 32/64-bit expressions. 32-bit bar_reg can be made
>>16-bit in the same way but any memory usage improvements will be
>>similarly counteracted by a requirement to use 66h-prefixed
>>instructions for it.  
>
>Hmm, yes, the space saving from using less wide types are probably indeed
>not worth it. But then please switch to "unsigned int" instead of uint<N>_t
>whenever the exact size doesn't matter.

Ok, will do in v2.

>Jan
>
>



end of thread, other threads:[~2018-06-01 15:54 UTC | newest]
