linux-kernel.vger.kernel.org archive mirror
* [PATCH v5 00/14] Add VFIO mediated device support and DEV-MSI support for the idxd driver
@ 2021-02-05 20:52 Dave Jiang
  2021-02-05 20:52 ` [PATCH v5 01/14] vfio/mdev: idxd: add theory of operation documentation for idxd mdev Dave Jiang
                   ` (13 more replies)
  0 siblings, 14 replies; 31+ messages in thread
From: Dave Jiang @ 2021-02-05 20:52 UTC (permalink / raw)
  To: alex.williamson, kwankhede, tglx, vkoul
  Cc: megha.dey, jacob.jun.pan, ashok.raj, jgg, yi.l.liu, baolu.lu,
	kevin.tian, sanjay.k.kumar, tony.luck, dan.j.williams,
	eric.auger, parav, netanelg, shahafs, pbonzini, dmaengine,
	linux-kernel, kvm

- Thomas, thank you for the previous reviews. I've made the appropriate
  changes based on your feedback. Please take a look again at patches 5 and
  11 for IMS setup. I'd really appreciate an ack if they look good.
- Dan and Vinod, I'd really appreciate it if you can review patches 1-3 for
  idxd driver bits and provide an ack if they look good.
- Alex and Kirti, I'd very much appreciate it if you can review the series
  and consider inclusion for 5.13 kernel if everything looks good.
Thank you all!

v5:
- Split out non-driver IMS code to its own series.
- Removed device devsec detection code.
- Reworked irq_entries for IMS so emulated vector is also included.
- Reworked vidxd_send_interrupt() to take irq_entry directly (data ready for
  consumption) (Thomas)
- Removed pointer to msi_entry in irq_entries (Thomas)
- Removed irq_domain check on free entries (Thomas)
- Split out irqbypass management code (Thomas)
- Fix EXPORT_SYMBOL to EXPORT_SYMBOL_GPL (Thomas)
- Refactored code to use auxiliary bus (Jason)

v4:
dev-msi:
- Make interrupt remapping code more readable (Thomas)
- Add flush writes to unmask/write and reset ims slots (Thomas)
- Interrupt Message Storm-> Interrupt Message Store (Thomas)
- Merge in pasid programming code. (Thomas)

mdev:
- Fixed up domain assignment (Thomas)
- Define magic numbers (Thomas)
- Move siov detection code to PCI common (Thomas)
- Remove duplicated MSI entry info (Thomas)
- Convert code to use ims_slot (Thomas)
- Add explanation of pasid programming for IMS entry (Thomas)
- Add int handle release support due to spec 1.1 update.

v3:
Dev-msi:
- No need to add support for 2 different dev-msi irq domains; a common
  one can be used for both cases (with IR enabled/disabled)
- Add arch specific function to specify additions to the msi_prepare
  callback instead of making the callback a weak function
- Call platform ops directly instead of a wrapper function
- Make mask/unmask callbacks void functions
- dev->msi_domain should be updated at the device driver level before
  calling dev_msi_alloc_irqs()
- dev_msi_alloc/free_irqs() cannot be used for PCI devices
- Followed the generic layering scheme: infrastructure bits -> arch
  bits -> enabling bits

Mdev:
- Remove set kvm group notifier (Yan Zhao)
- Fix VFIO irq trigger removal (Yan Zhao)
- Add mmio read flush to ims mask (Jason)

v2:
IMS (now dev-msi):
- With recommendations from Jason/Thomas/Dan on making IMS more generic:
- Pass a non-PCI generic device (struct device) for IMS management
  instead of mdev
- Remove all references to mdev and symbol_get/put
- Remove all references to IMS in common code and replace with dev-msi
- Remove dynamic allocation of platform-msi interrupts: no groups, no
  new msi list or list helpers
- Create a generic dev-msi domain with and without interrupt remapping
enabled.
- Introduce dev_msi_domain_alloc_irqs and dev_msi_domain_free_irqs apis

mdev:
- Removing unrelated bits from SVA enabling that’s not necessary for
the submission. (Kevin)
- Restructured entire mdev driver series to make reviewing easier (Kevin)
- Made rw emulation more robust (Kevin)
- Removed uuid wq type and added single dedicated wq type (Kevin)
- Locking fixes for vdev (Yan Zhao)
- VFIO MSIX trigger fixes (Yan Zhao)

This series matches the support of the 5.6 kernel (stage 1) driver,
but for the guest.

The code has dependency on IMS enabling code:
https://lore.kernel.org/linux-pci/1612385805-3412-1-git-send-email-megha.dey@intel.com/T/#t

Stage 1 of the driver has been accepted in v5.6 kernel. It supports dedicated
workqueue (wq) without Shared Virtual Memory (SVM) support.

Stage 2 of the driver supports shared wq and SVM and has been accepted in
kernel v5.11.

VFIO mediated device framework allows vendor drivers to wrap a portion of
device resources into virtual devices (mdev). Each mdev can be assigned
to a different guest using the same set of VFIO uAPIs as assigning a
physical device. Access to mdev resources is served with mixed policies.
For example, vendor drivers typically mark the data-path interface as
pass-through for fast guest operations, and then trap and mediate the
control-path interface to avoid undesired interference between mdevs.
Some level of emulation is necessary behind a vfio mdev to compose the
virtual device interface.

This series brings mdev to the idxd driver to enable Intel Scalable IOV
(SIOV), a hardware-assisted mediated pass-through technology. SIOV makes
each DSA wq independently assignable through PASID-granular resource/DMA
isolation. It helps improve scalability and reduces mediation complexity
compared to purely software-based mdev implementations. Each assigned wq
is configured by the host and exposed to the guest in a read-only
configuration mode, which allows the guest to use the wq w/o additional
setup. This design greatly reduces the emulation bits needed, focusing
on handling commands from guests.

There are two possible avenues to support virtual device composition:
1. VFIO mediated device (mdev), or 2. User space DMA through a char
device (or UACCE). Given the small amount of emulation needed to satisfy
our needs, and the fact that VFIO mdev already has the infrastructure
to support device passthrough, we feel that VFIO mdev is the better
route. For a more in-depth explanation, see the documentation in
Documentation/driver-api/vfio/mdev-idxd.rst.

This series introduces the “1dwq-v1” mdev type. This mdev type allows
allocation of a single dedicated wq from the available dedicated wqs.
After a workqueue (wq) is enabled, the user generates a uuid. On mdev
creation, the mdev driver code finds a dwq depending on the mdev type.
When the create operation succeeds, the user-generated uuid can be
passed to qemu. When the guest boots up, it should discover a DSA
device during PCI discovery.

For example, for the “1dwq-v1” type:
1. Enable a wq with the “mdev” wq type.
2. Generate a uuid.
3. Write the uuid to the mdev class sysfs path:
echo $UUID > /sys/class/mdev_bus/0000\:00\:0a.0/mdev_supported_types/idxd-1dwq-v1/create
4. Pass the following parameter to qemu:
"-device vfio-pci,sysfsdev=/sys/bus/pci/devices/0000:00:0a.0/$UUID"

The wq exported through mdev will have the read-only config bit set
for configuration. This means that the device does not require the
typical configuration. After enabling the device, the user only needs to
set the wq type and name; that is all that is necessary to enable the wq
and start using it. The single-wq configuration is not the only way to
create the mdev; multi-wq support for mdev is planned as future work.

The mdev utilizes Interrupt Message Store (IMS) [3], a device-specific
MSI implementation, instead of MSI-X for guest interrupts. This
preserves MSI-X for host usages and also allows a significantly larger
number of interrupt vectors for guest usage.

The idxd driver implements IMS as on-device memory mapped unified
storage. Each interrupt message is stored as a DWORD size data payload
and a 64-bit address (same as MSI-X). Access to the IMS is through the
host idxd driver.

The idxd driver makes use of the generic IMS irq chip and domain which
stores the interrupt messages as an array in device memory. Allocation and
freeing of interrupts happens via the generic msi_domain_alloc/free_irqs()
interface. One only needs to ensure the interrupt domain is stored in
the underlying device struct.

[1]: https://lore.kernel.org/lkml/157965011794.73301.15960052071729101309.stgit@djiang5-desk3.ch.intel.com/
[2]: https://software.intel.com/en-us/articles/intel-sdm
[3]: https://software.intel.com/en-us/download/intel-scalable-io-virtualization-technical-specification
[4]: https://software.intel.com/en-us/download/intel-data-streaming-accelerator-preliminary-architecture-specification
[5]: https://01.org/blogs/2019/introducing-intel-data-streaming-accelerator
[6]: https://intel.github.io/idxd/
[7]: https://github.com/intel/idxd-driver idxd-stage2.5

---

Dave Jiang (14):
      vfio/mdev: idxd: add theory of operation documentation for idxd mdev
      dmaengine: idxd: add IMS detection in base driver
      dmaengine: idxd: add device support functions in prep for mdev
      vfio/mdev: idxd: Add auxiliary device plumbing for idxd mdev support
      vfio/mdev: idxd: add basic mdev registration and helper functions
      vfio/mdev: idxd: add mdev type as a new wq type
      vfio/mdev: idxd: add 1dwq-v1 mdev type
      vfio/mdev: idxd: add emulation rw routines
      vfio/mdev: idxd: prep for virtual device commands
      vfio/mdev: idxd: virtual device commands emulation
      vfio/mdev: idxd: ims setup for the vdcm
      vfio/mdev: idxd: add irq bypass for IMS vectors
      vfio/mdev: idxd: add new wq state for mdev
      vfio/mdev: idxd: add error notification from host driver to mediated device


 .../ABI/stable/sysfs-driver-dma-idxd          |    6 +
 MAINTAINERS                                   |    8 +-
 drivers/dma/idxd/Makefile                     |    2 +
 drivers/dma/idxd/cdev.c                       |    6 +-
 drivers/dma/idxd/device.c                     |  137 +-
 drivers/dma/idxd/idxd.h                       |   48 +-
 drivers/dma/idxd/init.c                       |   98 +-
 drivers/dma/idxd/irq.c                        |    8 +-
 drivers/dma/idxd/registers.h                  |   36 +-
 drivers/dma/idxd/sysfs.c                      |   33 +-
 drivers/vfio/mdev/Kconfig                     |   10 +
 drivers/vfio/mdev/Makefile                    |    1 +
 drivers/vfio/mdev/idxd/Makefile               |    4 +
 drivers/vfio/mdev/idxd/mdev.c                 | 1295 +++++++++++++++++
 drivers/vfio/mdev/idxd/mdev.h                 |  119 ++
 drivers/vfio/mdev/idxd/vdev.c                 | 1014 +++++++++++++
 drivers/vfio/mdev/idxd/vdev.h                 |   28 +
 include/uapi/linux/idxd.h                     |    2 +
 kernel/irq/msi.c                              |    2 +
 19 files changed, 2814 insertions(+), 43 deletions(-)
 create mode 100644 drivers/vfio/mdev/idxd/Makefile
 create mode 100644 drivers/vfio/mdev/idxd/mdev.c
 create mode 100644 drivers/vfio/mdev/idxd/mdev.h
 create mode 100644 drivers/vfio/mdev/idxd/vdev.c
 create mode 100644 drivers/vfio/mdev/idxd/vdev.h

--



* [PATCH v5 01/14] vfio/mdev: idxd: add theory of operation documentation for idxd mdev
  2021-02-05 20:52 [PATCH v5 00/14] Add VFIO mediated device support and DEV-MSI support for the idxd driver Dave Jiang
@ 2021-02-05 20:52 ` Dave Jiang
  2021-02-05 20:53 ` [PATCH v5 02/14] dmaengine: idxd: add IMS detection in base driver Dave Jiang
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 31+ messages in thread
From: Dave Jiang @ 2021-02-05 20:52 UTC (permalink / raw)
  To: alex.williamson, kwankhede, tglx, vkoul
  Cc: megha.dey, jacob.jun.pan, ashok.raj, jgg, yi.l.liu, baolu.lu,
	kevin.tian, sanjay.k.kumar, tony.luck, dan.j.williams,
	eric.auger, parav, netanelg, shahafs, pbonzini, dmaengine,
	linux-kernel, kvm

Add idxd vfio mediated device theory of operation documentation.
Provide a description of the mdev design, its usage, and why vfio mdev
was chosen.

Reviewed-by: Ashok Raj <ashok.raj@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
---
 Documentation/driver-api/vfio/mdev-idxd.rst |  397 +++++++++++++++++++++++++++
 MAINTAINERS                                 |    1 
 2 files changed, 398 insertions(+)
 create mode 100644 Documentation/driver-api/vfio/mdev-idxd.rst

diff --git a/Documentation/driver-api/vfio/mdev-idxd.rst b/Documentation/driver-api/vfio/mdev-idxd.rst
new file mode 100644
index 000000000000..9bf93eafc7c8
--- /dev/null
+++ b/Documentation/driver-api/vfio/mdev-idxd.rst
@@ -0,0 +1,397 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=============
+IDXD Overview
+=============
+IDXD (Intel Data Accelerator Driver) is the driver for the Intel Data
+Streaming Accelerator (DSA).  Intel DSA is a high performance data copy
+and transformation accelerator. In addition to data move operations,
+the device also supports data fill, CRC generation, Data Integrity Field
+(DIF), and memory compare and delta generation. Intel DSA supports
+a variety of PCI-SIG defined capabilities such as Address Translation
+Services (ATS), Process address Space ID (PASID), Page Request Interface
+(PRI), Message Signalled Interrupts Extended (MSI-X), and Advanced Error
+Reporting (AER). Some of those capabilities enable the device to support
+Shared Virtual Memory (SVM), also known as Shared Virtual Addressing
+(SVA). Intel DSA also supports Intel Scalable I/O Virtualization (SIOV)
+to improve scalability of device assignment.
+
+
+The Intel DSA device contains the following basic components:
+* Work queue (WQ)
+
+  A WQ is on-device storage used to queue descriptors to the
+  device. Requests are added to a WQ by using new CPU instructions
+  (MOVDIR64B and ENQCMD(S)) to write the memory mapped “portal”
+  associated with each WQ.
+
+* Engine
+
+  Operation unit that pulls descriptors from WQs and processes them.
+
+* Group
+
+  Abstract container to associate one or more engines with one or more WQs.
+
+
+Two types of WQs are supported:
+* Dedicated WQ (DWQ)
+
+  A single client owns this WQ exclusively and can submit work to
+  it. The MOVDIR64B instruction is used to submit descriptors to this
+  type of WQ. The instruction is a posted write, therefore the
+  submitter must ensure it does not exceed the WQ length for
+  submission. The use of PASID is optional with a DWQ. Multiple clients
+  can submit to a DWQ, but synchronization is then required because
+  when the WQ is full, the submission is silently dropped.
+
+* Shared WQ (SWQ)
+
+  Multiple clients can submit work to this WQ. The submitter must use
+  ENQCMDS (from supervisor mode) or ENQCMD (from user mode). These
+  instructions indicate via the EFLAGS.ZF bit whether a submission
+  succeeds. The use of PASID is mandatory to identify the address space
+  of each client.
+
+
+For more information about the new instructions, see [1] and [2].
+
+The IDXD driver is broken down into the following usages:
+* In kernel interface through the dmaengine subsystem API.
+* Userspace DMA support through a character device. mmap(2) is utilized
+  to map directly to the mmio address (or portals) for descriptor
+  submission.
+* VFIO mediated device (mdev) supporting device passthrough usages.
+  This document covers the mdev usage only.
+
+
+=================================
+Assignable Device Interface (ADI)
+=================================
+The term ADI is used to represent the minimal unit of assignment for an
+Intel Scalable IOV device. Each ADI instance refers to the set of device
+backend resources that are allocated, configured and organized as an
+isolated unit.
+
+Intel DSA defines each WQ as an ADI. The MMIO registers of each work queue
+are partitioned into two categories:
+* MMIO registers accessed for data-path operations.
+* MMIO registers accessed for control-path operations.
+
+Data-path MMIO registers of each WQ are contained within
+one or more system page size aligned regions and can be mapped in the
+CPU page table for direct access from the guest. Control-path MMIO
+registers of all WQs are located together but segregated from data-path
+MMIO regions. Therefore, guest updates to control-path registers must
+be intercepted and then go through the host driver to be reflected in
+the device.
+
+Data-path MMIO registers of a DSA WQ are portals for submitting descriptors
+to the device. There are four portals per WQ, each being 64 bytes
+in size and located on a separate 4KB page in BAR2. Each portal has
+different implications regarding interrupt message type (MSI vs. IMS)
+and occupancy control (limited vs. unlimited). It is not necessary to
+map all portals to the guest.
+
+Control-path MMIO registers of a DSA WQ include global configurations
+(shared by all WQs) and WQ-specific configurations. The owner
+(e.g. the guest) of the WQ is expected to only change WQ-specific
+configurations. The Intel DSA spec introduces a “Configuration Support”
+capability which, if cleared, indicates that some fields of WQ
+configuration registers are read-only and the WQ configuration is
+pre-configured by the host.
+
+
+Interrupt Message Store (IMS)
+-----------------------------
+The ADI utilizes Interrupt Message Store (IMS), a device-specific MSI
+implementation, instead of MSI-X for guest interrupts. This preserves
+MSI-X for host usages and also allows a significantly larger number of
+interrupt vectors for a large number of guests.
+
+Intel DSA device implements IMS as on-device memory mapped unified
+storage. Each interrupt message is stored as a DWORD size data payload
+and a 64-bit address (same as MSI-X). Access to the IMS is through the
+host idxd driver.
+
+The idxd driver makes use of the generic IMS irq chip and domain which
+stores the interrupt messages in an array in device memory. Allocation and
+freeing of interrupts happens via the generic msi_domain_alloc/free_irqs()
+interface. The driver only needs to ensure the interrupt domain is
+stored in the underlying device struct.
+
+
+ADI Isolation
+-------------
+Operations or functioning of one ADI must not affect the functioning
+of another ADI or the physical device. Upstream memory requests from
+different ADIs are distinguished using a Process Address Space Identifier
+(PASID). With the support of PASID-granular address translation in Intel
+VT-d, the address space targeted by a request from an ADI can be a Host
+Virtual Address (HVA), Host I/O Virtual Address (HIOVA), Guest Physical
+Address (GPA), Guest Virtual Address (GVA), Guest I/O Virtual Address
+(GIOVA), etc. The PASID identity for an ADI is expected to be accessed
+or modified by privileged software through the host driver.
+
+=========================
+Virtual DSA (vDSA) Device
+=========================
+The DSA WQ itself is not a PCI device and thus must be composed into a
+virtual DSA device presented to the guest.
+
+The composition logic needs to handle four main requirements:
+* Emulate PCI config space.
+* Map data-path portals for direct access from the guest.
+* Emulate control-path MMIO registers and selectively forward WQ
+  configuration requests through host driver to the device.
+* Forward and emulate WQ interrupts to the guest.
+
+The composition logic tells the guest which aspects of the WQ are
+configurable through a combination of capability fields, e.g.:
+* Configuration Support (if cleared, most aspects are not modifiable).
+* WQ Mode Support (if cleared, cannot change between dedicated and
+  shared mode).
+* Dedicated Mode Support.
+* Shared Mode Support.
+* ...
+
+The virtual capability fields are set according to the vDSA
+type. Following is an example of vDSA types and related WQ configurability:
+* Type ‘1dwq-v1’
+   * One DSA gen1 dedicated WQ to this guest
+   * Guest cannot share the WQ between its clients (no guest SVA)
+   * Guest cannot change any WQ configuration
+
+In addition, the composition logic needs to serve administrative
+commands (through the virtual CMD register) via the host driver,
+including:
+* Drain/abort all descriptors submitted by this guest.
+* Drain/abort descriptors associated with a PASID.
+* Enable/disable/reset the WQ (when it’s not shared by multiple VMs).
+* Request interrupt handle.
+
+With this design, vDSA emulation is **greatly simplified**. Most
+registers are emulated in simple READ-ONLY flavor, and handling limited
+configurability is required only for a few registers.
+
+===========================
+VFIO mdev vs. userspace DMA
+===========================
+There are two avenues to support vDSA composition:
+1. VFIO mediated device (mdev)
+2. Userspace DMA through char device
+
+VFIO mdev provides a generic subdevice passthrough framework. Unified
+uAPIs are used for both device and subdevice passthrough, thus any
+userspace VMM which already supports VFIO device passthrough would
+naturally support mdev/subdevice passthrough. The implication of VFIO
+mdev is putting emulation of the device interface in the kernel (part of
+the host driver), which must be carefully scrutinized. Fortunately, vDSA
+composition includes only a small portion of emulation code, due to the
+fact that most registers are simply READ-ONLY to the guest. The majority
+of the logic for handling limited configurability and administrative
+commands is required to sit in the kernel anyway, regardless of which
+kernel uAPI is pursued. In this regard, VFIO mdev is a nice fit for vDSA
+composition.
+
+The IDXD driver provides a char device interface for applications to
+map the WQ portal and directly submit descriptors to do DMA. This
+interface provides only data-path access to userspace and relies on
+the host driver to handle control-path configurations. Expanding such
+interface to support subdevice passthrough allows moving the emulation
+code to userspace. However, significant work is required to grow it from
+an application-oriented interface into a passthrough-oriented interface:
+new uAPIs to handle guest WQ configurability and administrative commands,
+and new uAPIs to handle passthrough specific requirements (e.g. DMA map,
+guest SVA, live migration, posted interrupt, etc.). And once it is done,
+every userspace VMM has to explicitly bind to IDXD specific uAPI, even
+though the real user is in the guest (instead of the VMM itself) in the
+passthrough scenario.
+
+Although some generalization might be possible to reduce the work of
+handling passthrough, we feel the difference between userspace DMA and
+subdevice passthrough is distinct in IDXD. Therefore, following
+discussion at LPC 2020, we chose to build vDSA composition on top of
+the VFIO mdev framework and leave userspace DMA intact.
+
+=============================
+Host Registration and Release
+=============================
+
+Intel DSA reports support for Intel Scalable IOV via a PCI Express
+Designated Vendor Specific Extended Capability (DVSEC). In addition,
+PASID-granular address translation capability is required in the
+IOMMU. During host initialization, the IDXD driver should check the
+presence of both capabilities before calling mdev_register_device()
+to register with the VFIO mdev framework and provide a set of ops
+(struct mdev_parent_ops). The IOMMU capability is indicated by the
+IOMMU_DEV_FEAT_AUX feature flag with iommu_dev_has_feature() and enabled
+with iommu_dev_enable_feature().
+
+On release, iommu_dev_disable_feature() is called after
+mdev_unregister_device() to disable the IOMMU_DEV_FEAT_AUX flag that
+the driver enabled during host initialization.
+
+The mdev_parent_ops data structure is filled out by the driver to provide
+a number of ops called by VFIO mdev framework::
+
+        struct mdev_parent_ops {
+                .supported_type_groups
+                .create
+                .remove
+                .open
+                .release
+                .read
+                .write
+                .mmap
+                .ioctl
+        };
+
+Supported_type_groups
+---------------------
+At the moment only one vDSA type is supported.
+
+“1dwq-v1”:
+  Single dedicated WQ (DSA 1.0) with read-only configuration exposed to
+  the guest. On the guest kernel, a vDSA device shows up with a single
+  WQ that is pre-configured by the host. The configuration for the WQ
+  is entirely read-only and cannot be reconfigured. There is no support
+  of guest SVA on this WQ.
+
+  Interrupt vector 0 is emulated by the host driver to support admin
+  command completion and error reporting. A second interrupt vector is
+  bound to IMS and used for I/O operations. In this implementation,
+  only two vectors are supported.
+
+create
+------
+API function to create the mdev. mdev_set_iommu_device() is called to
+associate the mdev device to the parent PCI device. This function is
+where the driver sets up and initializes the resources to support a single
+mdev device. This is triggered through sysfs to initiate the creation.
+
+remove
+------
+API function that mirrors the create() function and releases all the
+resources backing the mdev.  This is also triggered through sysfs.
+
+open
+----
+API function that is called down from VFIO userspace to indicate to the
+driver that the upper layers are ready to claim and utilize the mdev.
+IMS entries are allocated and set up here.
+
+release
+-------
+The mirror function to open(); called when VFIO userspace releases the
+mdev.
+
+read / write
+------------
+This is where the Intel IDXD driver provides read/write emulation of
+PCI config space and MMIO registers. These paths are the “slow” path
+of the mediated device and emulation is used rather than direct access
+to the hardware resources. Typically configuration and administrative
+commands go through this path. This allows the mdev to show up as a
+virtual PCI device on the guest kernel.
+
+The emulation of PCI config space is nothing special; it is simply
+copied from kvmgt. In the future this part might be consolidated to
+reduce duplication.
+
+Emulating MMIO reads is simply a memory copy. There are no side effects
+to be emulated upon guest reads.
+
+Emulating MMIO writes is required only for a few registers, due to the
+read-only configuration of the ‘1dwq-v1’ type. The majority of the
+composition logic is hooked into the CMD register for performing
+administrative commands such as WQ drain, abort, enable, disable and
+reset operations. The rest of
+the emulation is about handling errors (GENCTRL/SWERROR) and interrupts
+(INTCAUSE/MSIXPERM) on the vDSA device. Future mdev types might allow
+limited WQ configurability, which then requires additional emulation of
+the WQCFG register.
+
+mmap
+----
+This function provides the setup to expose a portion of the hardware,
+known as portals, for direct access for “fast” path operations
+through the mmap() syscall. A limited region of the hardware is mapped
+to the guest for direct I/O submission.
+
+There are four portals per WQ: unlimited MSI-X, limited MSI-X, unlimited
+IMS, limited IMS.  Descriptors submitted to limited portals are subject
+to threshold configuration limitations for shared WQs. The MSI-X portals
+are used for host submissions, and the IMS portals are mapped to the VM
+for guest submission.
+
+ioctl
+-----
+This API function does several things:
+* Provides general device information to VFIO userspace.
+* Provides device region information (PCI, mmio, etc.).
+* Gets interrupt information.
+* Sets up interrupts for the mediated device.
+* Resets the mdev device.
+
+For the Intel idxd driver, Interrupt Message Store (IMS) vectors are
+used for mdev interrupts rather than MSI-X vectors. IMS provides
+additional interrupt vectors outside of the PCI MSI-X specification in
+order to support significantly more vectors. The emulated interrupt (0)
+is connected through a kernel eventfd. When interrupt 0 needs to be
+asserted, the driver signals the eventfd to trigger the vector 0
+interrupt on the guest. The IMS interrupts are set up via eventfd as
+well; however, they utilize the irq bypass manager to inject the
+interrupt directly into the guest.
+
+To allocate IMS, we utilize the IMS array APIs. On host init, we need
+to create the MSI domain::
+
+        struct ims_array_info ims_info;
+        struct device *dev = &pci_dev->dev;
+
+
+        /* assign the device IMS size */
+        ims_info.max_slots = max_ims_size;
+        /* assign the MMIO base address for the IMS table */
+        ims_info.slots = mmio_base + ims_offset;
+        /* assign the MSI domain to the device */
+        dev->msi_domain = pci_ims_array_create_msi_irq_domain(pci_dev, &ims_info);
+
+When we are ready to allocate the interrupts::
+
+        struct device *dev = mdev_dev(mdev);
+
+        irq_domain = pci_dev->dev.msi_domain;
+        /* the irqs are allocated against device of mdev */
+        rc = msi_domain_alloc_irqs(irq_domain, dev, num_vecs);
+
+
+        /* we can retrieve the slot index from msi_entry */
+        for_each_msi_entry(entry, dev) {
+                slot_index = entry->device_msi.hwirq;
+                irq = entry->irq;
+        }
+
+        request_irq(irq, interrupt_handler_function, 0, "ims", context);
+
+
+The DSA device is structured such that MSI-X table entry 0 is used for
+admin commands completion, error reporting, and other misc commands. The
+remaining MSI-X table entries are used for WQ completion. For VM
+support, the virtual device presents a similar layout. Therefore,
+vector 0 is emulated by software. Additional vector(s) are associated
+with IMS.
+
+The index (slot) for the per device IMS entry is managed by the MSI
+core. The index is the “interrupt handle” that the guest kernel
+needs to program into a DMA descriptor. That interrupt handle tells the
+hardware which IMS vector to trigger the interrupt on for the host.
+
+The virtual device presents an admin command called “request interrupt
+handle” that is not supported by the physical device. On probe of
+the DSA device in the guest kernel, the guest driver issues the
+“request interrupt handle” command in order to get the interrupt
+handle for descriptor programming. The host driver returns the
+assigned slot in the IMS entry table to the guest.
+
+==========
+References
+==========
+[1] https://software.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html
+[2] https://software.intel.com/en-us/articles/intel-sdm
+[3] https://software.intel.com/sites/default/files/managed/cc/0e/intel-scalable-io-virtualization-technical-specification.pdf
+[4] https://software.intel.com/en-us/download/intel-data-streaming-accelerator-preliminary-architecture-specification
diff --git a/MAINTAINERS b/MAINTAINERS
index c2114daa6bc7..ae34b0331eb4 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8970,6 +8970,7 @@ INTEL IADX DRIVER
 M:	Dave Jiang <dave.jiang@intel.com>
 L:	dmaengine@vger.kernel.org
 S:	Supported
+F:	Documentation/driver-api/vfio/mdev-idxd.rst
 F:	drivers/dma/idxd/*
 F:	include/uapi/linux/idxd.h
 




* [PATCH v5 02/14] dmaengine: idxd: add IMS detection in base driver
  2021-02-05 20:52 [PATCH v5 00/14] Add VFIO mediated device support and DEV-MSI support for the idxd driver Dave Jiang
  2021-02-05 20:52 ` [PATCH v5 01/14] vfio/mdev: idxd: add theory of operation documentation for idxd mdev Dave Jiang
@ 2021-02-05 20:53 ` Dave Jiang
  2021-02-10 23:30   ` Jason Gunthorpe
  2021-02-05 20:53 ` [PATCH v5 03/14] dmaengine: idxd: add device support functions in prep for mdev Dave Jiang
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 31+ messages in thread
From: Dave Jiang @ 2021-02-05 20:53 UTC (permalink / raw)
  To: alex.williamson, kwankhede, tglx, vkoul
  Cc: megha.dey, jacob.jun.pan, ashok.raj, jgg, yi.l.liu, baolu.lu,
	kevin.tian, sanjay.k.kumar, tony.luck, dan.j.williams,
	eric.auger, parav, netanelg, shahafs, pbonzini, dmaengine,
	linux-kernel, kvm

In preparation for VFIO mediated device support in the idxd driver, add
the enabling for Interrupt Message Store (IMS) interrupts. With IMS
support the idxd driver can dynamically allocate interrupts on a
per-mdev basis, based on how many IMS vectors are mapped to the mdev
device. This commit only provides the detection functions in the base
driver and not the VFIO mdev code utilization.

The commit also has some portal-related changes. A "portal" is a special
location within MMIO BAR2 of the DSA device where descriptors are
submitted via the CPU instruction MOVDIR64B or ENQCMD(S). The offset of
the portal address determines whether the submitted descriptor signals
completion via MSI-X or IMS.

See Intel SIOV spec for more details:
https://software.intel.com/en-us/download/intel-scalable-io-virtualization-technical-specification

Signed-off-by: Dave Jiang <dave.jiang@intel.com>
---
 Documentation/ABI/stable/sysfs-driver-dma-idxd |    6 ++++++
 drivers/dma/idxd/cdev.c                        |    4 ++--
 drivers/dma/idxd/device.c                      |    2 +-
 drivers/dma/idxd/idxd.h                        |   13 +++++++++----
 drivers/dma/idxd/init.c                        |   19 +++++++++++++++++++
 drivers/dma/idxd/registers.h                   |    7 +++++++
 drivers/dma/idxd/sysfs.c                       |    9 +++++++++
 7 files changed, 53 insertions(+), 7 deletions(-)

diff --git a/Documentation/ABI/stable/sysfs-driver-dma-idxd b/Documentation/ABI/stable/sysfs-driver-dma-idxd
index 55285c136cf0..95cd7975f488 100644
--- a/Documentation/ABI/stable/sysfs-driver-dma-idxd
+++ b/Documentation/ABI/stable/sysfs-driver-dma-idxd
@@ -129,6 +129,12 @@ KernelVersion:	5.10.0
 Contact:	dmaengine@vger.kernel.org
 Description:	The last executed device administrative command's status/error.
 
+What:		/sys/bus/dsa/devices/dsa<m>/ims_size
+Date:		Oct 15, 2020
+KernelVersion:	5.11.0
+Contact:	dmaengine@vger.kernel.org
+Description:	The total number of vectors available for Interrupt Message Store.
+
 What:		/sys/bus/dsa/devices/wq<m>.<n>/block_on_fault
 Date:		Oct 27, 2020
 KernelVersion:	5.11.0
diff --git a/drivers/dma/idxd/cdev.c b/drivers/dma/idxd/cdev.c
index 0db9b82ed8cf..b1518106434f 100644
--- a/drivers/dma/idxd/cdev.c
+++ b/drivers/dma/idxd/cdev.c
@@ -205,8 +205,8 @@ static int idxd_cdev_mmap(struct file *filp, struct vm_area_struct *vma)
 		return rc;
 
 	vma->vm_flags |= VM_DONTCOPY;
-	pfn = (base + idxd_get_wq_portal_full_offset(wq->id,
-				IDXD_PORTAL_LIMITED)) >> PAGE_SHIFT;
+	pfn = (base + idxd_get_wq_portal_full_offset(wq->id, IDXD_PORTAL_LIMITED,
+						     IDXD_IRQ_MSIX)) >> PAGE_SHIFT;
 	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
 	vma->vm_private_data = ctx;
 
diff --git a/drivers/dma/idxd/device.c b/drivers/dma/idxd/device.c
index 205156afeb54..d6c447d09a6f 100644
--- a/drivers/dma/idxd/device.c
+++ b/drivers/dma/idxd/device.c
@@ -290,7 +290,7 @@ int idxd_wq_map_portal(struct idxd_wq *wq)
 	resource_size_t start;
 
 	start = pci_resource_start(pdev, IDXD_WQ_BAR);
-	start += idxd_get_wq_portal_full_offset(wq->id, IDXD_PORTAL_LIMITED);
+	start += idxd_get_wq_portal_full_offset(wq->id, IDXD_PORTAL_LIMITED, IDXD_IRQ_MSIX);
 
 	wq->portal = devm_ioremap(dev, start, IDXD_PORTAL_SIZE);
 	if (!wq->portal)
diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h
index a9386a66ab72..90c9458903e1 100644
--- a/drivers/dma/idxd/idxd.h
+++ b/drivers/dma/idxd/idxd.h
@@ -163,6 +163,7 @@ enum idxd_device_flag {
 	IDXD_FLAG_CONFIGURABLE = 0,
 	IDXD_FLAG_CMD_RUNNING,
 	IDXD_FLAG_PASID_ENABLED,
+	IDXD_FLAG_IMS_SUPPORTED,
 };
 
 struct idxd_device {
@@ -190,6 +191,7 @@ struct idxd_device {
 
 	int num_groups;
 
+	u32 ims_offset;
 	u32 msix_perm_offset;
 	u32 wqcfg_offset;
 	u32 grpcfg_offset;
@@ -197,6 +199,7 @@ struct idxd_device {
 
 	u64 max_xfer_bytes;
 	u32 max_batch_size;
+	int ims_size;
 	int max_groups;
 	int max_engines;
 	int max_tokens;
@@ -279,15 +282,17 @@ enum idxd_interrupt_type {
 	IDXD_IRQ_IMS,
 };
 
-static inline int idxd_get_wq_portal_offset(enum idxd_portal_prot prot)
+static inline int idxd_get_wq_portal_offset(enum idxd_portal_prot prot,
+					    enum idxd_interrupt_type irq_type)
 {
-	return prot * 0x1000;
+	return prot * 0x1000 + irq_type * 0x2000;
 }
 
 static inline int idxd_get_wq_portal_full_offset(int wq_id,
-						 enum idxd_portal_prot prot)
+						 enum idxd_portal_prot prot,
+						 enum idxd_interrupt_type irq_type)
 {
-	return ((wq_id * 4) << PAGE_SHIFT) + idxd_get_wq_portal_offset(prot);
+	return ((wq_id * 4) << PAGE_SHIFT) + idxd_get_wq_portal_offset(prot, irq_type);
 }
 
 static inline void idxd_set_type(struct idxd_device *idxd)
diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c
index 0c982337ef84..ee56b92108d8 100644
--- a/drivers/dma/idxd/init.c
+++ b/drivers/dma/idxd/init.c
@@ -254,10 +254,28 @@ static void idxd_read_table_offsets(struct idxd_device *idxd)
 	dev_dbg(dev, "IDXD Work Queue Config Offset: %#x\n", idxd->wqcfg_offset);
 	idxd->msix_perm_offset = offsets.msix_perm * IDXD_TABLE_MULT;
 	dev_dbg(dev, "IDXD MSIX Permission Offset: %#x\n", idxd->msix_perm_offset);
+	idxd->ims_offset = offsets.ims * IDXD_TABLE_MULT;
+	dev_dbg(dev, "IDXD IMS Offset: %#x\n", idxd->ims_offset);
 	idxd->perfmon_offset = offsets.perfmon * IDXD_TABLE_MULT;
 	dev_dbg(dev, "IDXD Perfmon Offset: %#x\n", idxd->perfmon_offset);
 }
 
+static void idxd_check_ims(struct idxd_device *idxd)
+{
+	struct pci_dev *pdev = idxd->pdev;
+
+	/* verify that we have IMS vectors supported by device */
+	if (idxd->hw.gen_cap.max_ims_mult) {
+		idxd->ims_size = idxd->hw.gen_cap.max_ims_mult * 256ULL;
+		dev_dbg(&pdev->dev, "IMS size: %u\n", idxd->ims_size);
+		set_bit(IDXD_FLAG_IMS_SUPPORTED, &idxd->flags);
+		dev_dbg(&pdev->dev, "IMS supported for device\n");
+		return;
+	}
+
+	dev_dbg(&pdev->dev, "IMS unsupported for device\n");
+}
+
 static void idxd_read_caps(struct idxd_device *idxd)
 {
 	struct device *dev = &idxd->pdev->dev;
@@ -276,6 +294,7 @@ static void idxd_read_caps(struct idxd_device *idxd)
 	dev_dbg(dev, "max xfer size: %llu bytes\n", idxd->max_xfer_bytes);
 	idxd->max_batch_size = 1U << idxd->hw.gen_cap.max_batch_shift;
 	dev_dbg(dev, "max batch size: %u\n", idxd->max_batch_size);
+	idxd_check_ims(idxd);
 	if (idxd->hw.gen_cap.config_en)
 		set_bit(IDXD_FLAG_CONFIGURABLE, &idxd->flags);
 
diff --git a/drivers/dma/idxd/registers.h b/drivers/dma/idxd/registers.h
index 5cbf368c7367..c97f700bcf34 100644
--- a/drivers/dma/idxd/registers.h
+++ b/drivers/dma/idxd/registers.h
@@ -385,4 +385,11 @@ union wqcfg {
 #define GRPENGCFG_OFFSET(idxd_dev, n) ((idxd_dev)->grpcfg_offset + (n) * GRPCFG_SIZE + 32)
 #define GRPFLGCFG_OFFSET(idxd_dev, n) ((idxd_dev)->grpcfg_offset + (n) * GRPCFG_SIZE + 40)
 
+#define PCI_EXT_CAP_ID_DVSEC		0x23	/* Designated Vendor-Specific */
+#define PCI_DVSEC_HEADER1		0x4	/* Designated Vendor-Specific Header1 */
+#define PCI_DVSEC_HEADER2		0x8	/* Designated Vendor-Specific Header2 */
+#define PCI_DVSEC_ID_INTEL_SIOV		0x0005
+#define PCI_DVSEC_INTEL_SIOV_CAP	0x0014
+#define PCI_DVSEC_INTEL_SIOV_CAP_IMS	0x00000001
+
 #endif
diff --git a/drivers/dma/idxd/sysfs.c b/drivers/dma/idxd/sysfs.c
index 21c1e23cdf23..ab5c76e1226b 100644
--- a/drivers/dma/idxd/sysfs.c
+++ b/drivers/dma/idxd/sysfs.c
@@ -1444,6 +1444,14 @@ static ssize_t numa_node_show(struct device *dev,
 }
 static DEVICE_ATTR_RO(numa_node);
 
+static ssize_t ims_size_show(struct device *dev, struct device_attribute *attr, char *buf)
+{
+	struct idxd_device *idxd = container_of(dev, struct idxd_device, conf_dev);
+
+	return sprintf(buf, "%u\n", idxd->ims_size);
+}
+static DEVICE_ATTR_RO(ims_size);
+
 static ssize_t max_batch_size_show(struct device *dev,
 				   struct device_attribute *attr, char *buf)
 {
@@ -1639,6 +1647,7 @@ static struct attribute *idxd_device_attributes[] = {
 	&dev_attr_max_work_queues_size.attr,
 	&dev_attr_max_engines.attr,
 	&dev_attr_numa_node.attr,
+	&dev_attr_ims_size.attr,
 	&dev_attr_max_batch_size.attr,
 	&dev_attr_max_transfer_size.attr,
 	&dev_attr_op_cap.attr,



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v5 03/14] dmaengine: idxd: add device support functions in prep for mdev
  2021-02-05 20:52 [PATCH v5 00/14] Add VFIO mediated device support and DEV-MSI support for the idxd driver Dave Jiang
  2021-02-05 20:52 ` [PATCH v5 01/14] vfio/mdev: idxd: add theory of operation documentation for idxd mdev Dave Jiang
  2021-02-05 20:53 ` [PATCH v5 02/14] dmaengine: idxd: add IMS detection in base driver Dave Jiang
@ 2021-02-05 20:53 ` Dave Jiang
  2021-02-05 20:53 ` [PATCH v5 04/14] vfio/mdev: idxd: Add auxiliary device plumbing for idxd mdev support Dave Jiang
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 31+ messages in thread
From: Dave Jiang @ 2021-02-05 20:53 UTC (permalink / raw)
  To: alex.williamson, kwankhede, tglx, vkoul
  Cc: megha.dey, jacob.jun.pan, ashok.raj, jgg, yi.l.liu, baolu.lu,
	kevin.tian, sanjay.k.kumar, tony.luck, dan.j.williams,
	eric.auger, parav, netanelg, shahafs, pbonzini, dmaengine,
	linux-kernel, kvm

Add device support helper functions in preparation for adding VFIO
mdev support.

Signed-off-by: Dave Jiang <dave.jiang@intel.com>
---
 drivers/dma/idxd/device.c    |   61 ++++++++++++++++++++++++++++++++++++++++++
 drivers/dma/idxd/idxd.h      |    4 +++
 drivers/dma/idxd/registers.h |    3 +-
 3 files changed, 67 insertions(+), 1 deletion(-)

diff --git a/drivers/dma/idxd/device.c b/drivers/dma/idxd/device.c
index d6c447d09a6f..2491b27c8125 100644
--- a/drivers/dma/idxd/device.c
+++ b/drivers/dma/idxd/device.c
@@ -306,6 +306,30 @@ void idxd_wq_unmap_portal(struct idxd_wq *wq)
 	devm_iounmap(dev, wq->portal);
 }
 
+int idxd_wq_abort(struct idxd_wq *wq)
+{
+	struct idxd_device *idxd = wq->idxd;
+	struct device *dev = &idxd->pdev->dev;
+	u32 operand, status;
+
+	dev_dbg(dev, "Abort WQ %d\n", wq->id);
+	if (wq->state != IDXD_WQ_ENABLED) {
+		dev_dbg(dev, "WQ %d not active\n", wq->id);
+		return -ENXIO;
+	}
+
+	operand = BIT(wq->id % 16) | ((wq->id / 16) << 16);
+	dev_dbg(dev, "cmd: %u operand: %#x\n", IDXD_CMD_ABORT_WQ, operand);
+	idxd_cmd_exec(idxd, IDXD_CMD_ABORT_WQ, operand, &status);
+	if (status != IDXD_CMDSTS_SUCCESS) {
+		dev_dbg(dev, "WQ abort failed: %#x\n", status);
+		return -ENXIO;
+	}
+
+	dev_dbg(dev, "WQ %d aborted\n", wq->id);
+	return 0;
+}
+
 int idxd_wq_set_pasid(struct idxd_wq *wq, int pasid)
 {
 	struct idxd_device *idxd = wq->idxd;
@@ -412,6 +436,32 @@ void idxd_wq_quiesce(struct idxd_wq *wq)
 	percpu_ref_exit(&wq->wq_active);
 }
 
+void idxd_wq_setup_pasid(struct idxd_wq *wq, int pasid)
+{
+	struct idxd_device *idxd = wq->idxd;
+	int offset;
+
+	lockdep_assert_held(&idxd->dev_lock);
+
+	/* PASID fields are 8 bytes into the WQCFG register */
+	offset = WQCFG_OFFSET(idxd, wq->id, WQCFG_PASID_IDX);
+	wq->wqcfg->pasid = pasid;
+	iowrite32(wq->wqcfg->bits[WQCFG_PASID_IDX], idxd->reg_base + offset);
+}
+
+void idxd_wq_setup_priv(struct idxd_wq *wq, int priv)
+{
+	struct idxd_device *idxd = wq->idxd;
+	int offset;
+
+	lockdep_assert_held(&idxd->dev_lock);
+
+	/* priv field is 8 bytes into the WQCFG register */
+	offset = WQCFG_OFFSET(idxd, wq->id, WQCFG_PRIV_IDX);
+	wq->wqcfg->priv = !!priv;
+	iowrite32(wq->wqcfg->bits[WQCFG_PRIV_IDX], idxd->reg_base + offset);
+}
+
 /* Device control bits */
 static inline bool idxd_is_enabled(struct idxd_device *idxd)
 {
@@ -599,6 +649,17 @@ void idxd_device_drain_pasid(struct idxd_device *idxd, int pasid)
 	dev_dbg(dev, "pasid %d drained\n", pasid);
 }
 
+void idxd_device_abort_pasid(struct idxd_device *idxd, int pasid)
+{
+	struct device *dev = &idxd->pdev->dev;
+	u32 operand;
+
+	operand = pasid;
+	dev_dbg(dev, "cmd: %u operand: %#x\n", IDXD_CMD_ABORT_PASID, operand);
+	idxd_cmd_exec(idxd, IDXD_CMD_ABORT_PASID, operand, NULL);
+	dev_dbg(dev, "pasid %d aborted\n", pasid);
+}
+
 int idxd_device_request_int_handle(struct idxd_device *idxd, int idx, int *handle,
 				   enum idxd_interrupt_type irq_type)
 {
diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h
index 90c9458903e1..a2438b3166db 100644
--- a/drivers/dma/idxd/idxd.h
+++ b/drivers/dma/idxd/idxd.h
@@ -350,6 +350,7 @@ void idxd_device_cleanup(struct idxd_device *idxd);
 int idxd_device_config(struct idxd_device *idxd);
 void idxd_device_wqs_clear_state(struct idxd_device *idxd);
 void idxd_device_drain_pasid(struct idxd_device *idxd, int pasid);
+void idxd_device_abort_pasid(struct idxd_device *idxd, int pasid);
 int idxd_device_load_config(struct idxd_device *idxd);
 int idxd_device_request_int_handle(struct idxd_device *idxd, int idx, int *handle,
 				   enum idxd_interrupt_type irq_type);
@@ -369,6 +370,9 @@ int idxd_wq_set_pasid(struct idxd_wq *wq, int pasid);
 int idxd_wq_disable_pasid(struct idxd_wq *wq);
 void idxd_wq_quiesce(struct idxd_wq *wq);
 int idxd_wq_init_percpu_ref(struct idxd_wq *wq);
+int idxd_wq_abort(struct idxd_wq *wq);
+void idxd_wq_setup_pasid(struct idxd_wq *wq, int pasid);
+void idxd_wq_setup_priv(struct idxd_wq *wq, int priv);
 
 /* submission */
 int idxd_submit_desc(struct idxd_wq *wq, struct idxd_desc *desc);
diff --git a/drivers/dma/idxd/registers.h b/drivers/dma/idxd/registers.h
index c97f700bcf34..d9a732decdd5 100644
--- a/drivers/dma/idxd/registers.h
+++ b/drivers/dma/idxd/registers.h
@@ -347,7 +347,8 @@ union wqcfg {
 	u32 bits[8];
 } __packed;
 
-#define WQCFG_PASID_IDX                2
+#define WQCFG_PASID_IDX		2
+#define WQCFG_PRIV_IDX		2
 
 /*
  * This macro calculates the offset into the WQCFG register



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v5 04/14] vfio/mdev: idxd: Add auxiliary device plumbing for idxd mdev support
  2021-02-05 20:52 [PATCH v5 00/14] Add VFIO mediated device support and DEV-MSI support for the idxd driver Dave Jiang
                   ` (2 preceding siblings ...)
  2021-02-05 20:53 ` [PATCH v5 03/14] dmaengine: idxd: add device support functions in prep for mdev Dave Jiang
@ 2021-02-05 20:53 ` Dave Jiang
  2021-02-10 23:46   ` Jason Gunthorpe
  2021-02-05 20:53 ` [PATCH v5 05/14] vfio/mdev: idxd: add basic mdev registration and helper functions Dave Jiang
                   ` (9 subsequent siblings)
  13 siblings, 1 reply; 31+ messages in thread
From: Dave Jiang @ 2021-02-05 20:53 UTC (permalink / raw)
  To: alex.williamson, kwankhede, tglx, vkoul
  Cc: megha.dey, jacob.jun.pan, ashok.raj, jgg, yi.l.liu, baolu.lu,
	kevin.tian, sanjay.k.kumar, tony.luck, dan.j.williams,
	eric.auger, parav, netanelg, shahafs, pbonzini, dmaengine,
	linux-kernel, kvm

Add the VFIO mediated device driver as an auxiliary driver to the main idxd
driver. This allows the mdev code to live under the VFIO mdev subsystem.

Signed-off-by: Dave Jiang <dave.jiang@intel.com>
---
 MAINTAINERS                     |    8 ++++
 drivers/dma/idxd/Makefile       |    2 +
 drivers/dma/idxd/idxd.h         |    7 ++++
 drivers/dma/idxd/init.c         |   77 +++++++++++++++++++++++++++++++++++++++
 drivers/vfio/mdev/Kconfig       |    9 +++++
 drivers/vfio/mdev/Makefile      |    1 +
 drivers/vfio/mdev/idxd/Makefile |    4 ++
 drivers/vfio/mdev/idxd/mdev.c   |   75 ++++++++++++++++++++++++++++++++++++++
 8 files changed, 182 insertions(+), 1 deletion(-)
 create mode 100644 drivers/vfio/mdev/idxd/Makefile
 create mode 100644 drivers/vfio/mdev/idxd/mdev.c

diff --git a/MAINTAINERS b/MAINTAINERS
index ae34b0331eb4..71862e759075 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8970,7 +8970,6 @@ INTEL IADX DRIVER
 M:	Dave Jiang <dave.jiang@intel.com>
 L:	dmaengine@vger.kernel.org
 S:	Supported
-F:	Documentation/driver-api/vfio/mdev-idxd.rst
 F:	drivers/dma/idxd/*
 F:	include/uapi/linux/idxd.h
 
@@ -18720,6 +18719,13 @@ F:	drivers/vfio/mdev/
 F:	include/linux/mdev.h
 F:	samples/vfio-mdev/
 
+VFIO MEDIATED DEVICE IDXD DRIVER
+M:	Dave Jiang <dave.jiang@intel.com>
+L:	kvm@vger.kernel.org
+S:	Maintained
+F:	Documentation/driver-api/vfio/mdev-idxd.rst
+F:	drivers/vfio/mdev/idxd/
+
 VFIO PLATFORM DRIVER
 M:	Eric Auger <eric.auger@redhat.com>
 L:	kvm@vger.kernel.org
diff --git a/drivers/dma/idxd/Makefile b/drivers/dma/idxd/Makefile
index 8978b898d777..d91d1718efac 100644
--- a/drivers/dma/idxd/Makefile
+++ b/drivers/dma/idxd/Makefile
@@ -1,2 +1,4 @@
+ccflags-y += -DDEFAULT_SYMBOL_NAMESPACE=IDXD
+
 obj-$(CONFIG_INTEL_IDXD) += idxd.o
 idxd-y := init.o irq.o device.o sysfs.o submit.o dma.o cdev.o
diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h
index a2438b3166db..f02c96164515 100644
--- a/drivers/dma/idxd/idxd.h
+++ b/drivers/dma/idxd/idxd.h
@@ -8,6 +8,7 @@
 #include <linux/percpu-rwsem.h>
 #include <linux/wait.h>
 #include <linux/cdev.h>
+#include <linux/auxiliary_bus.h>
 #include "registers.h"
 
 #define IDXD_DRIVER_VERSION	"1.00"
@@ -221,6 +222,8 @@ struct idxd_device {
 	struct work_struct work;
 
 	int *int_handles;
+
+	struct auxiliary_device *mdev_auxdev;
 };
 
 /* IDXD software descriptor */
@@ -282,6 +285,10 @@ enum idxd_interrupt_type {
 	IDXD_IRQ_IMS,
 };
 
+struct idxd_mdev_aux_drv {
+	struct auxiliary_driver auxiliary_drv;
+};
+
 static inline int idxd_get_wq_portal_offset(enum idxd_portal_prot prot,
 					    enum idxd_interrupt_type irq_type)
 {
diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c
index ee56b92108d8..fd57f39e4b7d 100644
--- a/drivers/dma/idxd/init.c
+++ b/drivers/dma/idxd/init.c
@@ -382,6 +382,74 @@ static void idxd_disable_system_pasid(struct idxd_device *idxd)
 	idxd->sva = NULL;
 }
 
+static void idxd_remove_mdev_auxdev(struct idxd_device *idxd)
+{
+	if (!IS_ENABLED(CONFIG_VFIO_MDEV_IDXD))
+		return;
+
+	auxiliary_device_delete(idxd->mdev_auxdev);
+	auxiliary_device_uninit(idxd->mdev_auxdev);
+}
+
+static void idxd_auxdev_release(struct device *dev)
+{
+	struct auxiliary_device *auxdev = to_auxiliary_dev(dev);
+	struct idxd_device *idxd = dev_get_drvdata(dev);
+
+	kfree(auxdev->name);
+	kfree(auxdev);
+	idxd->mdev_auxdev = NULL;
+}
+
+static int idxd_setup_mdev_auxdev(struct idxd_device *idxd)
+{
+	struct auxiliary_device *auxdev;
+	struct device *dev = &idxd->pdev->dev;
+	int rc;
+
+	if (!IS_ENABLED(CONFIG_VFIO_MDEV_IDXD))
+		return 0;
+
+	auxdev = kzalloc(sizeof(*auxdev), GFP_KERNEL);
+	if (!auxdev)
+		return -ENOMEM;
+
+	auxdev->name = kasprintf(GFP_KERNEL, "mdev-%s", idxd_name[idxd->type]);
+	if (!auxdev->name) {
+		rc = -ENOMEM;
+		goto err_name;
+	}
+
+	dev_dbg(&idxd->pdev->dev, "aux dev mdev: %s\n", auxdev->name);
+
+	auxdev->dev.parent = dev;
+	auxdev->dev.release = idxd_auxdev_release;
+	auxdev->id = idxd->id;
+
+	rc = auxiliary_device_init(auxdev);
+	if (rc < 0) {
+		dev_err(dev, "Failed to init aux dev: %d\n", rc);
+		goto err_auxdev;
+	}
+
+	rc = auxiliary_device_add(auxdev);
+	if (rc < 0) {
+		dev_err(dev, "Failed to add aux dev: %d\n", rc);
+		goto err_auxdev;
+	}
+
+	idxd->mdev_auxdev = auxdev;
+	dev_set_drvdata(&auxdev->dev, idxd);
+
+	return 0;
+
+ err_auxdev:
+	kfree(auxdev->name);
+ err_name:
+	kfree(auxdev);
+	return rc;
+}
+
 static int idxd_probe(struct idxd_device *idxd)
 {
 	struct pci_dev *pdev = idxd->pdev;
@@ -434,11 +502,19 @@ static int idxd_probe(struct idxd_device *idxd)
 		goto err_idr_fail;
 	}
 
+	rc = idxd_setup_mdev_auxdev(idxd);
+	if (rc < 0)
+		goto err_auxdev_fail;
+
 	idxd->major = idxd_cdev_get_major(idxd);
 
 	dev_dbg(dev, "IDXD device %d probed successfully\n", idxd->id);
 	return 0;
 
+ err_auxdev_fail:
+	mutex_lock(&idxd_idr_lock);
+	idr_remove(&idxd_idrs[idxd->type], idxd->id);
+	mutex_unlock(&idxd_idr_lock);
  err_idr_fail:
 	idxd_mask_error_interrupts(idxd);
 	idxd_mask_msix_vectors(idxd);
@@ -610,6 +686,7 @@ static void idxd_remove(struct pci_dev *pdev)
 	dev_dbg(&pdev->dev, "%s called\n", __func__);
 	idxd_cleanup_sysfs(idxd);
 	idxd_shutdown(pdev);
+	idxd_remove_mdev_auxdev(idxd);
 	if (device_pasid_enabled(idxd))
 		idxd_disable_system_pasid(idxd);
 	mutex_lock(&idxd_idr_lock);
diff --git a/drivers/vfio/mdev/Kconfig b/drivers/vfio/mdev/Kconfig
index 5da27f2100f9..e9540e43d1f1 100644
--- a/drivers/vfio/mdev/Kconfig
+++ b/drivers/vfio/mdev/Kconfig
@@ -16,3 +16,12 @@ config VFIO_MDEV_DEVICE
 	default n
 	help
 	  VFIO based driver for Mediated devices.
+
+config VFIO_MDEV_IDXD
+	tristate "VFIO Mediated device driver for Intel IDXD"
+	depends on VFIO && VFIO_MDEV && X86_64
+	select AUXILIARY_BUS
+	select IMS_MSI_ARRAY
+	default n
+	help
+	  VFIO-based mediated device driver for Intel Accelerator Devices.
diff --git a/drivers/vfio/mdev/Makefile b/drivers/vfio/mdev/Makefile
index 101516fdf375..338843fa6110 100644
--- a/drivers/vfio/mdev/Makefile
+++ b/drivers/vfio/mdev/Makefile
@@ -4,3 +4,4 @@ mdev-y := mdev_core.o mdev_sysfs.o mdev_driver.o
 
 obj-$(CONFIG_VFIO_MDEV) += mdev.o
 obj-$(CONFIG_VFIO_MDEV_DEVICE) += vfio_mdev.o
+obj-$(CONFIG_VFIO_MDEV_IDXD) += idxd/
diff --git a/drivers/vfio/mdev/idxd/Makefile b/drivers/vfio/mdev/idxd/Makefile
new file mode 100644
index 000000000000..e8f45cb96117
--- /dev/null
+++ b/drivers/vfio/mdev/idxd/Makefile
@@ -0,0 +1,4 @@
+ccflags-y += -I$(srctree)/drivers/dma/idxd -DDEFAULT_SYMBOL_NAMESPACE=IDXD
+
+obj-$(CONFIG_VFIO_MDEV_IDXD) += idxd_mdev.o
+idxd_mdev-y := mdev.o
diff --git a/drivers/vfio/mdev/idxd/mdev.c b/drivers/vfio/mdev/idxd/mdev.c
new file mode 100644
index 000000000000..8b9a6adeb606
--- /dev/null
+++ b/drivers/vfio/mdev/idxd/mdev.c
@@ -0,0 +1,75 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright(c) 2020 Intel Corporation. All rights rsvd. */
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/pci.h>
+#include <linux/device.h>
+#include <linux/auxiliary_bus.h>
+#include <uapi/linux/idxd.h>
+#include "registers.h"
+#include "idxd.h"
+
+static int idxd_mdev_host_init(struct idxd_device *idxd)
+{
+	/* FIXME: Fill in later */
+	return 0;
+}
+
+static int idxd_mdev_host_release(struct idxd_device *idxd)
+{
+	/* FIXME: Fill in later */
+	return 0;
+}
+
+static int idxd_mdev_aux_probe(struct auxiliary_device *auxdev,
+			       const struct auxiliary_device_id *id)
+{
+	struct idxd_device *idxd = dev_get_drvdata(&auxdev->dev);
+	int rc;
+
+	rc = idxd_mdev_host_init(idxd);
+	if (rc < 0) {
+		dev_warn(&auxdev->dev, "mdev host init failed: %d\n", rc);
+		return rc;
+	}
+
+	return 0;
+}
+
+static void idxd_mdev_aux_remove(struct auxiliary_device *auxdev)
+{
+	struct idxd_device *idxd = dev_get_drvdata(&auxdev->dev);
+
+	idxd_mdev_host_release(idxd);
+}
+
+static const struct auxiliary_device_id idxd_mdev_auxbus_id_table[] = {
+	{ .name = "idxd.mdev-dsa" },
+	{ .name = "idxd.mdev-iax" },
+	{},
+};
+MODULE_DEVICE_TABLE(auxiliary, idxd_mdev_auxbus_id_table);
+
+static struct idxd_mdev_aux_drv idxd_mdev_aux_drv = {
+	.auxiliary_drv = {
+		.id_table = idxd_mdev_auxbus_id_table,
+		.probe = idxd_mdev_aux_probe,
+		.remove = idxd_mdev_aux_remove,
+	},
+};
+
+static int idxd_mdev_auxdev_drv_register(struct idxd_mdev_aux_drv *drv)
+{
+	return auxiliary_driver_register(&drv->auxiliary_drv);
+}
+
+static void idxd_mdev_auxdev_drv_unregister(struct idxd_mdev_aux_drv *drv)
+{
+	auxiliary_driver_unregister(&drv->auxiliary_drv);
+}
+
+module_driver(idxd_mdev_aux_drv, idxd_mdev_auxdev_drv_register, idxd_mdev_auxdev_drv_unregister);
+
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR("Intel Corporation");



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v5 05/14] vfio/mdev: idxd: add basic mdev registration and helper functions
  2021-02-05 20:52 [PATCH v5 00/14] Add VFIO mediated device support and DEV-MSI support for the idxd driver Dave Jiang
                   ` (3 preceding siblings ...)
  2021-02-05 20:53 ` [PATCH v5 04/14] vfio/mdev: idxd: Add auxiliary device plumbing for idxd mdev support Dave Jiang
@ 2021-02-05 20:53 ` Dave Jiang
  2021-02-10 23:59   ` Jason Gunthorpe
  2021-02-05 20:53 ` [PATCH v5 06/14] vfio/mdev: idxd: add mdev type as a new wq type Dave Jiang
                   ` (8 subsequent siblings)
  13 siblings, 1 reply; 31+ messages in thread
From: Dave Jiang @ 2021-02-05 20:53 UTC (permalink / raw)
  To: alex.williamson, kwankhede, tglx, vkoul
  Cc: megha.dey, jacob.jun.pan, ashok.raj, jgg, yi.l.liu, baolu.lu,
	kevin.tian, sanjay.k.kumar, tony.luck, dan.j.williams,
	eric.auger, parav, netanelg, shahafs, pbonzini, dmaengine,
	linux-kernel, kvm

Create a mediated device through the VFIO mediated device framework. The
mdev framework allows the driver to create a mediated device backed by a
portion of the parent device's resources. The driver emulates the slow
path, such as the PCI config space, the MMIO BAR, and the command
registers. The descriptor submission portal(s) are mmapped into the guest
so that the guest kernel or applications can submit descriptors directly.
The mediated device support code in the idxd driver will be referred to
as the Virtual Device Composition Module (vdcm). Add basic plumbing to
fill out the mdev_parent_ops struct that VFIO mdev requires to support a
mediated device.

Signed-off-by: Dave Jiang <dave.jiang@intel.com>
---
 drivers/dma/idxd/device.c       |    1 
 drivers/dma/idxd/idxd.h         |    7 
 drivers/dma/idxd/init.c         |    2 
 drivers/vfio/mdev/idxd/Makefile |    2 
 drivers/vfio/mdev/idxd/mdev.c   | 1006 +++++++++++++++++++++++++++++++++++++++
 drivers/vfio/mdev/idxd/mdev.h   |  115 ++++
 drivers/vfio/mdev/idxd/vdev.c   |   75 +++
 drivers/vfio/mdev/idxd/vdev.h   |   19 +
 8 files changed, 1218 insertions(+), 9 deletions(-)
 create mode 100644 drivers/vfio/mdev/idxd/mdev.h
 create mode 100644 drivers/vfio/mdev/idxd/vdev.c
 create mode 100644 drivers/vfio/mdev/idxd/vdev.h

diff --git a/drivers/dma/idxd/device.c b/drivers/dma/idxd/device.c
index 2491b27c8125..89fa2bbe6ebf 100644
--- a/drivers/dma/idxd/device.c
+++ b/drivers/dma/idxd/device.c
@@ -265,6 +265,7 @@ int idxd_wq_disable(struct idxd_wq *wq)
 	dev_dbg(dev, "WQ %d disabled\n", wq->id);
 	return 0;
 }
+EXPORT_SYMBOL_GPL(idxd_wq_disable);
 
 void idxd_wq_drain(struct idxd_wq *wq)
 {
diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h
index f02c96164515..a271942df2be 100644
--- a/drivers/dma/idxd/idxd.h
+++ b/drivers/dma/idxd/idxd.h
@@ -133,6 +133,7 @@ struct idxd_wq {
 	u64 max_xfer_bytes;
 	u32 max_batch_size;
 	bool ats_dis;
+	struct list_head vdcm_list;
 };
 
 struct idxd_engine {
@@ -165,6 +166,7 @@ enum idxd_device_flag {
 	IDXD_FLAG_CMD_RUNNING,
 	IDXD_FLAG_PASID_ENABLED,
 	IDXD_FLAG_IMS_SUPPORTED,
+	IDXD_FLAG_MDEV_ENABLED,
 };
 
 struct idxd_device {
@@ -275,6 +277,11 @@ static inline bool device_swq_supported(struct idxd_device *idxd)
 	return (support_enqcmd && device_pasid_enabled(idxd));
 }
 
+static inline bool device_mdev_enabled(struct idxd_device *idxd)
+{
+	return test_bit(IDXD_FLAG_MDEV_ENABLED, &idxd->flags);
+}
+
 enum idxd_portal_prot {
 	IDXD_PORTAL_UNLIMITED = 0,
 	IDXD_PORTAL_LIMITED,
diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c
index fd57f39e4b7d..cc3b757d300f 100644
--- a/drivers/dma/idxd/init.c
+++ b/drivers/dma/idxd/init.c
@@ -215,7 +215,6 @@ static int idxd_setup_internals(struct idxd_device *idxd)
 
 	for (i = 0; i < idxd->max_wqs; i++) {
 		struct idxd_wq *wq = &idxd->wqs[i];
-		int rc;
 
 		wq->id = i;
 		wq->idxd = idxd;
@@ -227,6 +226,7 @@ static int idxd_setup_internals(struct idxd_device *idxd)
 		if (!wq->wqcfg)
 			return -ENOMEM;
 		init_completion(&wq->wq_dead);
+		INIT_LIST_HEAD(&wq->vdcm_list);
 	}
 
 	for (i = 0; i < idxd->max_engines; i++) {
diff --git a/drivers/vfio/mdev/idxd/Makefile b/drivers/vfio/mdev/idxd/Makefile
index e8f45cb96117..27a08621d120 100644
--- a/drivers/vfio/mdev/idxd/Makefile
+++ b/drivers/vfio/mdev/idxd/Makefile
@@ -1,4 +1,4 @@
 ccflags-y += -I$(srctree)/drivers/dma/idxd -DDEFAULT_SYMBOL_NAMESPACE=IDXD
 
 obj-$(CONFIG_VFIO_MDEV_IDXD) += idxd_mdev.o
-idxd_mdev-y := mdev.o
+idxd_mdev-y := mdev.o vdev.o
diff --git a/drivers/vfio/mdev/idxd/mdev.c b/drivers/vfio/mdev/idxd/mdev.c
index 8b9a6adeb606..384ba5d6bc2b 100644
--- a/drivers/vfio/mdev/idxd/mdev.c
+++ b/drivers/vfio/mdev/idxd/mdev.c
@@ -1,27 +1,1017 @@
 // SPDX-License-Identifier: GPL-2.0
-/* Copyright(c) 2020 Intel Corporation. All rights rsvd. */
+/* Copyright(c) 2019,2020 Intel Corporation. All rights rsvd. */
 #include <linux/init.h>
 #include <linux/kernel.h>
 #include <linux/module.h>
 #include <linux/pci.h>
 #include <linux/device.h>
-#include <linux/auxiliary_bus.h>
+#include <linux/sched/task.h>
+#include <linux/io-64-nonatomic-lo-hi.h>
+#include <linux/mm.h>
+#include <linux/mmu_context.h>
+#include <linux/vfio.h>
+#include <linux/mdev.h>
+#include <linux/msi.h>
+#include <linux/intel-iommu.h>
+#include <linux/intel-svm.h>
+#include <linux/kvm_host.h>
+#include <linux/eventfd.h>
+#include <linux/circ_buf.h>
+#include <linux/irqchip/irq-ims-msi.h>
 #include <uapi/linux/idxd.h>
 #include "registers.h"
 #include "idxd.h"
+#include "../../vfio/pci/vfio_pci_private.h"
+#include "mdev.h"
+#include "vdev.h"
 
-static int idxd_mdev_host_init(struct idxd_device *idxd)
+static u64 idxd_pci_config[] = {
+	0x0010000000008086ULL,
+	0x0080000008800000ULL,
+	0x000000000000000cULL,
+	0x000000000000000cULL,
+	0x0000000000000000ULL,
+	0x2010808600000000ULL,
+	0x0000004000000000ULL,
+	0x000000ff00000000ULL,
+	0x0000060000015011ULL, /* MSI-X capability, hardcoded 2 entries, Encoded as N-1 */
+	0x0000070000000000ULL,
+	0x0000000000920010ULL, /* PCIe capability */
+	0x0000000000000000ULL,
+	0x0000000000000000ULL,
+	0x0000000000000000ULL,
+	0x0000000000000000ULL,
+	0x0000000000000000ULL,
+	0x0000000000000000ULL,
+	0x0000000000000000ULL,
+};
+
+static int idxd_vdcm_set_irqs(struct vdcm_idxd *vidxd, uint32_t flags, unsigned int index,
+			      unsigned int start, unsigned int count, void *data);
+
+static int idxd_mdev_get_pasid(struct mdev_device *mdev, u32 *pasid)
+{
+	struct vfio_group *vfio_group;
+	struct iommu_domain *iommu_domain;
+	struct device *dev = mdev_dev(mdev);
+	struct device *iommu_device = mdev_get_iommu_device(dev);
+	struct vdcm_idxd *vidxd = mdev_get_drvdata(mdev);
+	int mdev_pasid;
+
+	if (!vidxd->vdev.vfio_group) {
+		dev_warn(dev, "Missing vfio_group.\n");
+		return -EINVAL;
+	}
+
+	vfio_group = vidxd->vdev.vfio_group;
+
+	iommu_domain = vfio_group_iommu_domain(vfio_group);
+	if (IS_ERR_OR_NULL(iommu_domain))
+		goto err;
+
+	mdev_pasid = iommu_aux_get_pasid(iommu_domain, iommu_device);
+	if (mdev_pasid < 0)
+		goto err;
+
+	*pasid = (u32)mdev_pasid;
+	return 0;
+
+ err:
+	vfio_group_put_external_user(vfio_group);
+	vidxd->vdev.vfio_group = NULL;
+	return -EFAULT;
+}
+
+static inline void reset_vconfig(struct vdcm_idxd *vidxd)
+{
+	u16 *devid = (u16 *)(vidxd->cfg + PCI_DEVICE_ID);
+	struct idxd_device *idxd = vidxd->idxd;
+
+	memset(vidxd->cfg, 0, VIDXD_MAX_CFG_SPACE_SZ);
+	memcpy(vidxd->cfg, idxd_pci_config, sizeof(idxd_pci_config));
+
+	if (idxd->type == IDXD_TYPE_DSA)
+		*devid = PCI_DEVICE_ID_INTEL_DSA_SPR0;
+	else if (idxd->type == IDXD_TYPE_IAX)
+		*devid = PCI_DEVICE_ID_INTEL_IAX_SPR0;
+}
+
+static inline void reset_vmmio(struct vdcm_idxd *vidxd)
+{
+	memset(&vidxd->bar0, 0, VIDXD_MAX_MMIO_SPACE_SZ);
+}
+
+static void idxd_vdcm_init(struct vdcm_idxd *vidxd)
+{
+	struct idxd_wq *wq = vidxd->wq;
+
+	reset_vconfig(vidxd);
+	reset_vmmio(vidxd);
+
+	vidxd->bar_size[0] = VIDXD_BAR0_SIZE;
+	vidxd->bar_size[1] = VIDXD_BAR2_SIZE;
+
+	vidxd_mmio_init(vidxd);
+
+	if (wq_dedicated(wq) && wq->state == IDXD_WQ_ENABLED)
+		idxd_wq_disable(wq);
+}
+
+static void idxd_vdcm_release(struct mdev_device *mdev)
+{
+	struct vdcm_idxd *vidxd = mdev_get_drvdata(mdev);
+	struct device *dev = mdev_dev(mdev);
+
+	dev_dbg(dev, "vdcm_idxd_release %d\n", vidxd->type->type);
+	mutex_lock(&vidxd->dev_lock);
+	if (!vidxd->refcount)
+		goto out;
+
+	idxd_vdcm_set_irqs(vidxd, VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_TRIGGER,
+			   VFIO_PCI_MSIX_IRQ_INDEX, 0, 0, NULL);
+
+	vidxd_free_ims_entries(vidxd);
+	if (vidxd->vdev.vfio_group) {
+		vfio_group_put_external_user(vidxd->vdev.vfio_group);
+		vidxd->vdev.vfio_group = NULL;
+	}
+
+	/* Re-initialize the VIDXD to a pristine state for re-use */
+	idxd_vdcm_init(vidxd);
+	vidxd->refcount--;
+
+ out:
+	mutex_unlock(&vidxd->dev_lock);
+}
+
+static struct vdcm_idxd *vdcm_vidxd_create(struct idxd_device *idxd, struct mdev_device *mdev,
+					   struct vdcm_idxd_type *type)
+{
+	struct vdcm_idxd *vidxd;
+	struct idxd_wq *wq = NULL;
+	int i;
+
+	/* PLACEHOLDER, wq matching comes later */
+
+	if (!wq)
+		return ERR_PTR(-ENODEV);
+
+	vidxd = kzalloc(sizeof(*vidxd), GFP_KERNEL);
+	if (!vidxd)
+		return ERR_PTR(-ENOMEM);
+
+	mutex_init(&vidxd->dev_lock);
+	vidxd->idxd = idxd;
+	vidxd->vdev.mdev = mdev;
+	vidxd->wq = wq;
+	mdev_set_drvdata(mdev, vidxd);
+	vidxd->type = type;
+	vidxd->num_wqs = VIDXD_MAX_WQS;
+
+	idxd_vdcm_init(vidxd);
+	mutex_lock(&wq->wq_lock);
+	idxd_wq_get(wq);
+	mutex_unlock(&wq->wq_lock);
+
+	for (i = 0; i < VIDXD_MAX_MSIX_ENTRIES; i++) {
+		vidxd->irq_entries[i].vidxd = vidxd;
+		vidxd->irq_entries[i].id = i;
+	}
+
+	return vidxd;
+}
+
+static struct vdcm_idxd_type idxd_mdev_types[IDXD_MDEV_TYPES];
+
+static struct vdcm_idxd_type *idxd_vdcm_find_vidxd_type(struct device *dev,
+							const char *name)
+{
+	int i;
+	char dev_name[IDXD_MDEV_NAME_LEN];
+
+	for (i = 0; i < IDXD_MDEV_TYPES; i++) {
+		snprintf(dev_name, IDXD_MDEV_NAME_LEN, "idxd-%s",
+			 idxd_mdev_types[i].name);
+
+		if (!strncmp(name, dev_name, IDXD_MDEV_NAME_LEN))
+			return &idxd_mdev_types[i];
+	}
+
+	return NULL;
+}
+
+static int idxd_vdcm_create(struct kobject *kobj, struct mdev_device *mdev)
+{
+	struct vdcm_idxd *vidxd;
+	struct vdcm_idxd_type *type;
+	struct device *dev, *parent;
+	struct idxd_device *idxd;
+	struct idxd_wq *wq;
+
+	parent = mdev_parent_dev(mdev);
+	idxd = dev_get_drvdata(parent);
+	dev = mdev_dev(mdev);
+	mdev_set_iommu_device(dev, parent);
+	type = idxd_vdcm_find_vidxd_type(dev, kobject_name(kobj));
+	if (!type) {
+		dev_err(dev, "failed to find type %s to create\n",
+			kobject_name(kobj));
+		return -EINVAL;
+	}
+
+	vidxd = vdcm_vidxd_create(idxd, mdev, type);
+	if (IS_ERR(vidxd)) {
+		dev_err(dev, "failed to create vidxd: %ld\n", PTR_ERR(vidxd));
+		return PTR_ERR(vidxd);
+	}
+
+	wq = vidxd->wq;
+	mutex_lock(&wq->wq_lock);
+	list_add(&vidxd->list, &wq->vdcm_list);
+	mutex_unlock(&wq->wq_lock);
+	dev_dbg(dev, "mdev creation success: %s\n", dev_name(mdev_dev(mdev)));
+
+	return 0;
+}
+
+static int idxd_vdcm_remove(struct mdev_device *mdev)
+{
+	struct vdcm_idxd *vidxd = mdev_get_drvdata(mdev);
+	struct idxd_device *idxd = vidxd->idxd;
+	struct device *dev = &idxd->pdev->dev;
+	struct idxd_wq *wq = vidxd->wq;
+
+	dev_dbg(dev, "%s: removing for wq %d\n", __func__, vidxd->wq->id);
+
+	mutex_lock(&wq->wq_lock);
+	list_del(&vidxd->list);
+	idxd_wq_put(wq);
+	mutex_unlock(&wq->wq_lock);
+
+	kfree(vidxd);
+	return 0;
+}
+
+static int idxd_vdcm_open(struct mdev_device *mdev)
+{
+	struct vdcm_idxd *vidxd = mdev_get_drvdata(mdev);
+	int rc = -EINVAL;
+	struct vdcm_idxd_type *type = vidxd->type;
+	struct device *dev = mdev_dev(mdev);
+	struct vfio_group *vfio_group;
+
+	dev_dbg(dev, "%s: type: %d\n", __func__, type->type);
+
+	mutex_lock(&vidxd->dev_lock);
+	if (vidxd->refcount)
+		goto out;
+
+	vfio_group = vfio_group_get_external_user_from_dev(dev);
+	if (IS_ERR_OR_NULL(vfio_group)) {
+		rc = -EFAULT;
+		goto out;
+	}
+	vidxd->vdev.vfio_group = vfio_group;
+
+	/* allocate and setup IMS entries */
+	rc = vidxd_setup_ims_entries(vidxd);
+	if (rc < 0)
+		goto ims_fail;
+
+	vidxd->refcount++;
+	mutex_unlock(&vidxd->dev_lock);
+
+	return rc;
+
+ ims_fail:
+	vfio_group_put_external_user(vfio_group);
+	vidxd->vdev.vfio_group = NULL;
+ out:
+	mutex_unlock(&vidxd->dev_lock);
+	return rc;
+}
+
+static ssize_t idxd_vdcm_rw(struct mdev_device *mdev, char *buf, size_t count, loff_t *ppos,
+			    enum idxd_vdcm_rw mode)
+{
+	struct vdcm_idxd *vidxd = mdev_get_drvdata(mdev);
+	unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos);
+	u64 pos = *ppos & VFIO_PCI_OFFSET_MASK;
+	struct device *dev = mdev_dev(mdev);
+	int rc = -EINVAL;
+
+	if (index >= VFIO_PCI_NUM_REGIONS) {
+		dev_err(dev, "invalid index: %u\n", index);
+		return -EINVAL;
+	}
+
+	switch (index) {
+	case VFIO_PCI_CONFIG_REGION_INDEX:
+		if (mode == IDXD_VDCM_WRITE)
+			rc = vidxd_cfg_write(vidxd, pos, buf, count);
+		else
+			rc = vidxd_cfg_read(vidxd, pos, buf, count);
+		break;
+	case VFIO_PCI_BAR0_REGION_INDEX:
+	case VFIO_PCI_BAR1_REGION_INDEX:
+		if (mode == IDXD_VDCM_WRITE)
+			rc = vidxd_mmio_write(vidxd, vidxd->bar_val[0] + pos, buf, count);
+		else
+			rc = vidxd_mmio_read(vidxd, vidxd->bar_val[0] + pos, buf, count);
+		break;
+	case VFIO_PCI_BAR2_REGION_INDEX:
+	case VFIO_PCI_BAR3_REGION_INDEX:
+	case VFIO_PCI_BAR4_REGION_INDEX:
+	case VFIO_PCI_BAR5_REGION_INDEX:
+	case VFIO_PCI_VGA_REGION_INDEX:
+	case VFIO_PCI_ROM_REGION_INDEX:
+	default:
+		dev_err(dev, "unsupported region: %u\n", index);
+	}
+
+	return rc == 0 ? count : rc;
+}
+
+static ssize_t idxd_vdcm_read(struct mdev_device *mdev, char __user *buf, size_t count,
+			      loff_t *ppos)
+{
+	struct vdcm_idxd *vidxd = mdev_get_drvdata(mdev);
+	unsigned int done = 0;
+	int rc;
+
+	mutex_lock(&vidxd->dev_lock);
+	while (count) {
+		size_t filled;
+
+		if (count >= 4 && !(*ppos % 4)) {
+			u32 val;
+
+			rc = idxd_vdcm_rw(mdev, (char *)&val, sizeof(val),
+					  ppos, IDXD_VDCM_READ);
+			if (rc <= 0)
+				goto read_err;
+
+			if (copy_to_user(buf, &val, sizeof(val)))
+				goto read_err;
+
+			filled = 4;
+		} else if (count >= 2 && !(*ppos % 2)) {
+			u16 val;
+
+			rc = idxd_vdcm_rw(mdev, (char *)&val, sizeof(val),
+					  ppos, IDXD_VDCM_READ);
+			if (rc <= 0)
+				goto read_err;
+
+			if (copy_to_user(buf, &val, sizeof(val)))
+				goto read_err;
+
+			filled = 2;
+		} else {
+			u8 val;
+
+			rc = idxd_vdcm_rw(mdev, &val, sizeof(val), ppos,
+					  IDXD_VDCM_READ);
+			if (rc <= 0)
+				goto read_err;
+
+			if (copy_to_user(buf, &val, sizeof(val)))
+				goto read_err;
+
+			filled = 1;
+		}
+
+		count -= filled;
+		done += filled;
+		*ppos += filled;
+		buf += filled;
+	}
+
+	mutex_unlock(&vidxd->dev_lock);
+	return done;
+
+ read_err:
+	mutex_unlock(&vidxd->dev_lock);
+	return -EFAULT;
+}
+
+static ssize_t idxd_vdcm_write(struct mdev_device *mdev, const char __user *buf, size_t count,
+			       loff_t *ppos)
+{
+	struct vdcm_idxd *vidxd = mdev_get_drvdata(mdev);
+	unsigned int done = 0;
+	int rc;
+
+	mutex_lock(&vidxd->dev_lock);
+	while (count) {
+		size_t filled;
+
+		if (count >= 4 && !(*ppos % 4)) {
+			u32 val;
+
+			if (copy_from_user(&val, buf, sizeof(val)))
+				goto write_err;
+
+			rc = idxd_vdcm_rw(mdev, (char *)&val, sizeof(val),
+					  ppos, IDXD_VDCM_WRITE);
+			if (rc <= 0)
+				goto write_err;
+
+			filled = 4;
+		} else if (count >= 2 && !(*ppos % 2)) {
+			u16 val;
+
+			if (copy_from_user(&val, buf, sizeof(val)))
+				goto write_err;
+
+			rc = idxd_vdcm_rw(mdev, (char *)&val,
+					  sizeof(val), ppos, IDXD_VDCM_WRITE);
+			if (rc <= 0)
+				goto write_err;
+
+			filled = 2;
+		} else {
+			u8 val;
+
+			if (copy_from_user(&val, buf, sizeof(val)))
+				goto write_err;
+
+			rc = idxd_vdcm_rw(mdev, &val, sizeof(val),
+					  ppos, IDXD_VDCM_WRITE);
+			if (rc <= 0)
+				goto write_err;
+
+			filled = 1;
+		}
+
+		count -= filled;
+		done += filled;
+		*ppos += filled;
+		buf += filled;
+	}
+
+	mutex_unlock(&vidxd->dev_lock);
+	return done;
+
+write_err:
+	mutex_unlock(&vidxd->dev_lock);
+	return -EFAULT;
+}
+
+static int check_vma(struct idxd_wq *wq, struct vm_area_struct *vma)
 {
-	/* FIXME: Fill in later */
+	if (vma->vm_end < vma->vm_start)
+		return -EINVAL;
+	if (!(vma->vm_flags & VM_SHARED))
+		return -EINVAL;
+
 	return 0;
 }
 
-static int idxd_mdev_host_release(struct idxd_device *idxd)
+static int idxd_vdcm_mmap(struct mdev_device *mdev, struct vm_area_struct *vma)
+{
+	unsigned int wq_idx;
+	unsigned long req_size, pgoff = 0, offset;
+	int rc;
+	pgprot_t pg_prot;
+	struct vdcm_idxd *vidxd = mdev_get_drvdata(mdev);
+	struct idxd_wq *wq = vidxd->wq;
+	struct idxd_device *idxd = vidxd->idxd;
+	enum idxd_portal_prot virt_portal, phys_portal;
+	phys_addr_t base = pci_resource_start(idxd->pdev, IDXD_WQ_BAR);
+	struct device *dev = mdev_dev(mdev);
+
+	rc = check_vma(wq, vma);
+	if (rc)
+		return rc;
+
+	pg_prot = vma->vm_page_prot;
+	req_size = vma->vm_end - vma->vm_start;
+	vma->vm_flags |= VM_DONTCOPY;
+
+	offset = (vma->vm_pgoff << PAGE_SHIFT) &
+		 ((1ULL << VFIO_PCI_OFFSET_SHIFT) - 1);
+
+	wq_idx = offset >> (PAGE_SHIFT + 2);
+	if (wq_idx >= 1) {
+		dev_err(dev, "mapping invalid wq %d off %lx\n",
+			wq_idx, offset);
+		return -EINVAL;
+	}
+
+	/*
+	 * Check and see if the guest wants to map to the limited or unlimited portal.
+	 * The driver will allow mapping to the unlimited portal only if the wq is a
+	 * dedicated wq. Otherwise, it goes to limited.
+	 */
+	virt_portal = ((offset >> PAGE_SHIFT) & 0x3) == 1;
+	phys_portal = IDXD_PORTAL_LIMITED;
+	if (virt_portal == IDXD_PORTAL_UNLIMITED && wq_dedicated(wq))
+		phys_portal = IDXD_PORTAL_UNLIMITED;
+
+	/* We always map IMS portals to the guest */
+	pgoff = (base + idxd_get_wq_portal_full_offset(wq->id, phys_portal,
+						       IDXD_IRQ_IMS)) >> PAGE_SHIFT;
+
+	dev_dbg(dev, "mmap %lx %lx %lx %lx\n", vma->vm_start, pgoff, req_size,
+		pgprot_val(pg_prot));
+	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
+	vma->vm_private_data = mdev;
+	vma->vm_pgoff = pgoff;
+
+	return remap_pfn_range(vma, vma->vm_start, pgoff, req_size, pg_prot);
+}
+
+static int idxd_vdcm_get_irq_count(struct vdcm_idxd *vidxd, int type)
 {
-	/* FIXME: Fill in later */
+	/*
+	 * Even though the number of MSIX vectors supported is not tied to the
+	 * number of wqs being exported, the current design allows 1 vector per
+	 * wq for the guest, plus 1 that handles the misc interrupts.
+	 */
+	if (type == VFIO_PCI_MSI_IRQ_INDEX || type == VFIO_PCI_MSIX_IRQ_INDEX)
+		return VIDXD_MAX_MSIX_VECS;
+
 	return 0;
 }
 
+static irqreturn_t idxd_guest_wq_completion(int irq, void *data)
+{
+	struct ims_irq_entry *irq_entry = data;
+
+	vidxd_send_interrupt(irq_entry);
+	return IRQ_HANDLED;
+}
+
+static int msix_trigger_unregister(struct vdcm_idxd *vidxd, int index)
+{
+	struct mdev_device *mdev = vidxd->vdev.mdev;
+	struct device *dev = mdev_dev(mdev);
+	struct ims_irq_entry *irq_entry;
+	int rc;
+
+	if (!vidxd->vdev.msix_trigger[index])
+		return 0;
+
+	dev_dbg(dev, "disable MSIX trigger %d\n", index);
+	if (index) {
+		u32 auxval;
+
+		irq_entry = &vidxd->irq_entries[index];
+		if (irq_entry->irq_set) {
+			free_irq(irq_entry->irq, irq_entry);
+			irq_entry->irq_set = false;
+		}
+
+		auxval = ims_ctrl_pasid_aux(0, false);
+		rc = irq_set_auxdata(irq_entry->irq, IMS_AUXDATA_CONTROL_WORD, auxval);
+		if (rc)
+			return rc;
+	}
+	eventfd_ctx_put(vidxd->vdev.msix_trigger[index]);
+	vidxd->vdev.msix_trigger[index] = NULL;
+
+	return 0;
+}
+
+static int msix_trigger_register(struct vdcm_idxd *vidxd, u32 fd, int index)
+{
+	struct mdev_device *mdev = vidxd->vdev.mdev;
+	struct device *dev = mdev_dev(mdev);
+	struct ims_irq_entry *irq_entry;
+	struct eventfd_ctx *trigger;
+	int rc;
+
+	if (vidxd->vdev.msix_trigger[index])
+		return 0;
+
+	dev_dbg(dev, "enable MSIX trigger %d\n", index);
+	trigger = eventfd_ctx_fdget(fd);
+	if (IS_ERR(trigger)) {
+		dev_warn(dev, "eventfd_ctx_fdget failed %d\n", index);
+		return PTR_ERR(trigger);
+	}
+
+	if (index) {
+		u32 pasid;
+		u32 auxval;
+
+		irq_entry = &vidxd->irq_entries[index];
+		rc = idxd_mdev_get_pasid(mdev, &pasid);
+		if (rc < 0) {
+			eventfd_ctx_put(trigger);
+			return rc;
+		}
+
+		/*
+		 * Program and enable the pasid field in the IMS entry. The programmed pasid
+		 * and enable fields are checked against the pasid and enable fields of the
+		 * work queue configuration and the pasid of the descriptor. A mismatch
+		 * results in a blocked IMS interrupt.
+		 */
+		auxval = ims_ctrl_pasid_aux(pasid, true);
+		rc = irq_set_auxdata(irq_entry->irq, IMS_AUXDATA_CONTROL_WORD, auxval);
+		if (rc < 0) {
+			eventfd_ctx_put(trigger);
+			return rc;
+		}
+
+		rc = request_irq(irq_entry->irq, idxd_guest_wq_completion, 0, "idxd-ims",
+				 irq_entry);
+		if (rc) {
+			dev_warn(dev, "failed to request ims irq\n");
+			eventfd_ctx_put(trigger);
+			auxval = ims_ctrl_pasid_aux(0, false);
+			irq_set_auxdata(irq_entry->irq, IMS_AUXDATA_CONTROL_WORD, auxval);
+			return rc;
+		}
+		irq_entry->irq_set = true;
+	}
+
+	vidxd->vdev.msix_trigger[index] = trigger;
+	return 0;
+}
+
+static int vdcm_idxd_set_msix_trigger(struct vdcm_idxd *vidxd,
+				      unsigned int index, unsigned int start,
+				      unsigned int count, uint32_t flags,
+				      void *data)
+{
+	int i, rc = 0;
+
+	if (count > VIDXD_MAX_MSIX_ENTRIES - 1)
+		count = VIDXD_MAX_MSIX_ENTRIES - 1;
+
+	if (count == 0 && (flags & VFIO_IRQ_SET_DATA_NONE)) {
+		/* Disable all MSIX entries */
+		for (i = 0; i < VIDXD_MAX_MSIX_ENTRIES; i++) {
+			rc = msix_trigger_unregister(vidxd, i);
+			if (rc < 0)
+				return rc;
+		}
+		return 0;
+	}
+
+	for (i = 0; i < count; i++) {
+		if (flags & VFIO_IRQ_SET_DATA_EVENTFD) {
+			u32 fd = *(u32 *)(data + i * sizeof(u32));
+
+			rc = msix_trigger_register(vidxd, fd, i);
+			if (rc < 0)
+				return rc;
+		} else if (flags & VFIO_IRQ_SET_DATA_NONE) {
+			rc = msix_trigger_unregister(vidxd, i);
+			if (rc < 0)
+				return rc;
+		}
+	}
+	return rc;
+}
+
+static int idxd_vdcm_set_irqs(struct vdcm_idxd *vidxd, uint32_t flags,
+			      unsigned int index, unsigned int start,
+			      unsigned int count, void *data)
+{
+	int (*func)(struct vdcm_idxd *vidxd, unsigned int index,
+		    unsigned int start, unsigned int count, uint32_t flags,
+		    void *data) = NULL;
+	struct mdev_device *mdev = vidxd->vdev.mdev;
+	struct device *dev = mdev_dev(mdev);
+
+	switch (index) {
+	case VFIO_PCI_INTX_IRQ_INDEX:
+		dev_warn(dev, "intx interrupts not supported.\n");
+		break;
+	case VFIO_PCI_MSI_IRQ_INDEX:
+		dev_dbg(dev, "msi interrupt.\n");
+		switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) {
+		case VFIO_IRQ_SET_ACTION_MASK:
+		case VFIO_IRQ_SET_ACTION_UNMASK:
+			break;
+		case VFIO_IRQ_SET_ACTION_TRIGGER:
+			func = vdcm_idxd_set_msix_trigger;
+			break;
+		}
+		break;
+	case VFIO_PCI_MSIX_IRQ_INDEX:
+		switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) {
+		case VFIO_IRQ_SET_ACTION_MASK:
+		case VFIO_IRQ_SET_ACTION_UNMASK:
+			break;
+		case VFIO_IRQ_SET_ACTION_TRIGGER:
+			func = vdcm_idxd_set_msix_trigger;
+			break;
+		}
+		break;
+	default:
+		return -ENOTTY;
+	}
+
+	if (!func)
+		return -ENOTTY;
+
+	return func(vidxd, index, start, count, flags, data);
+}
+
+static void vidxd_vdcm_reset(struct vdcm_idxd *vidxd)
+{
+	vidxd_reset(vidxd);
+}
+
+static long idxd_vdcm_ioctl(struct mdev_device *mdev, unsigned int cmd,
+			    unsigned long arg)
+{
+	struct vdcm_idxd *vidxd = mdev_get_drvdata(mdev);
+	unsigned long minsz;
+	int rc = -EINVAL;
+	struct device *dev = mdev_dev(mdev);
+
+	dev_dbg(dev, "vidxd %p ioctl, cmd: %d\n", vidxd, cmd);
+
+	mutex_lock(&vidxd->dev_lock);
+	if (cmd == VFIO_DEVICE_GET_INFO) {
+		struct vfio_device_info info;
+
+		minsz = offsetofend(struct vfio_device_info, num_irqs);
+
+		if (copy_from_user(&info, (void __user *)arg, minsz)) {
+			rc = -EFAULT;
+			goto out;
+		}
+
+		if (info.argsz < minsz) {
+			rc = -EINVAL;
+			goto out;
+		}
+
+		info.flags = VFIO_DEVICE_FLAGS_PCI;
+		info.flags |= VFIO_DEVICE_FLAGS_RESET;
+		info.num_regions = VFIO_PCI_NUM_REGIONS;
+		info.num_irqs = VFIO_PCI_NUM_IRQS;
+
+		if (copy_to_user((void __user *)arg, &info, minsz))
+			rc = -EFAULT;
+		else
+			rc = 0;
+		goto out;
+	} else if (cmd == VFIO_DEVICE_GET_REGION_INFO) {
+		struct vfio_region_info info;
+		struct vfio_info_cap caps = { .buf = NULL, .size = 0 };
+		struct vfio_region_info_cap_sparse_mmap *sparse = NULL;
+		size_t size;
+		int nr_areas = 1;
+		int cap_type_id = 0;
+
+		minsz = offsetofend(struct vfio_region_info, offset);
+
+		if (copy_from_user(&info, (void __user *)arg, minsz)) {
+			rc = -EFAULT;
+			goto out;
+		}
+
+		if (info.argsz < minsz) {
+			rc = -EINVAL;
+			goto out;
+		}
+
+		switch (info.index) {
+		case VFIO_PCI_CONFIG_REGION_INDEX:
+			info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index);
+			info.size = VIDXD_MAX_CFG_SPACE_SZ;
+			info.flags = VFIO_REGION_INFO_FLAG_READ | VFIO_REGION_INFO_FLAG_WRITE;
+			break;
+		case VFIO_PCI_BAR0_REGION_INDEX:
+			info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index);
+			info.size = vidxd->bar_size[info.index];
+			if (!info.size) {
+				info.flags = 0;
+				break;
+			}
+
+			info.flags = VFIO_REGION_INFO_FLAG_READ | VFIO_REGION_INFO_FLAG_WRITE;
+			break;
+		case VFIO_PCI_BAR1_REGION_INDEX:
+			info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index);
+			info.size = 0;
+			info.flags = 0;
+			break;
+		case VFIO_PCI_BAR2_REGION_INDEX:
+			info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index);
+			info.flags = VFIO_REGION_INFO_FLAG_CAPS | VFIO_REGION_INFO_FLAG_MMAP |
+				     VFIO_REGION_INFO_FLAG_READ | VFIO_REGION_INFO_FLAG_WRITE;
+			info.size = vidxd->bar_size[1];
+
+			/*
+			 * Every WQ has two areas for unlimited and limited
+			 * MSI-X portals. IMS portals are not reported.
+			 */
+			nr_areas = 2;
+
+			size = sizeof(*sparse) + (nr_areas * sizeof(*sparse->areas));
+			sparse = kzalloc(size, GFP_KERNEL);
+			if (!sparse) {
+				rc = -ENOMEM;
+				goto out;
+			}
+
+			sparse->header.id = VFIO_REGION_INFO_CAP_SPARSE_MMAP;
+			sparse->header.version = 1;
+			sparse->nr_areas = nr_areas;
+			cap_type_id = VFIO_REGION_INFO_CAP_SPARSE_MMAP;
+
+			/* Unlimited portal */
+			sparse->areas[0].offset = 0;
+			sparse->areas[0].size = PAGE_SIZE;
+
+			/* Limited portal */
+			sparse->areas[1].offset = PAGE_SIZE;
+			sparse->areas[1].size = PAGE_SIZE;
+			break;
+
+		case VFIO_PCI_BAR3_REGION_INDEX ... VFIO_PCI_BAR5_REGION_INDEX:
+			info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index);
+			info.size = 0;
+			info.flags = 0;
+			dev_dbg(dev, "get region info bar:%d\n", info.index);
+			break;
+
+		case VFIO_PCI_ROM_REGION_INDEX:
+		case VFIO_PCI_VGA_REGION_INDEX:
+			dev_dbg(dev, "get region info index:%d\n", info.index);
+			break;
+		default: {
+			if (info.index >= VFIO_PCI_NUM_REGIONS)
+				rc = -EINVAL;
+			else
+				rc = 0;
+			goto out;
+		} /* default */
+		} /* info.index switch */
+
+		if ((info.flags & VFIO_REGION_INFO_FLAG_CAPS) && sparse) {
+			if (cap_type_id == VFIO_REGION_INFO_CAP_SPARSE_MMAP) {
+				rc = vfio_info_add_capability(&caps, &sparse->header,
+							      sizeof(*sparse) + (sparse->nr_areas *
+							      sizeof(*sparse->areas)));
+				kfree(sparse);
+				if (rc)
+					goto out;
+			}
+		}
+
+		if (caps.size) {
+			if (info.argsz < sizeof(info) + caps.size) {
+				info.argsz = sizeof(info) + caps.size;
+				info.cap_offset = 0;
+			} else {
+				vfio_info_cap_shift(&caps, sizeof(info));
+				if (copy_to_user((void __user *)arg + sizeof(info),
+						 caps.buf, caps.size)) {
+					kfree(caps.buf);
+					rc = -EFAULT;
+					goto out;
+				}
+				info.cap_offset = sizeof(info);
+			}
+
+			kfree(caps.buf);
+		}
+		if (copy_to_user((void __user *)arg, &info, minsz))
+			rc = -EFAULT;
+		else
+			rc = 0;
+		goto out;
+	} else if (cmd == VFIO_DEVICE_GET_IRQ_INFO) {
+		struct vfio_irq_info info;
+
+		minsz = offsetofend(struct vfio_irq_info, count);
+
+		if (copy_from_user(&info, (void __user *)arg, minsz)) {
+			rc = -EFAULT;
+			goto out;
+		}
+
+		if (info.argsz < minsz || info.index >= VFIO_PCI_NUM_IRQS) {
+			rc = -EINVAL;
+			goto out;
+		}
+
+		info.flags = VFIO_IRQ_INFO_EVENTFD;
+
+		switch (info.index) {
+		case VFIO_PCI_INTX_IRQ_INDEX:
+			info.flags |= (VFIO_IRQ_INFO_MASKABLE | VFIO_IRQ_INFO_AUTOMASKED);
+			break;
+		case VFIO_PCI_MSI_IRQ_INDEX ... VFIO_PCI_MSIX_IRQ_INDEX:
+		case VFIO_PCI_REQ_IRQ_INDEX:
+			info.flags |= VFIO_IRQ_INFO_NORESIZE;
+			break;
+		case VFIO_PCI_ERR_IRQ_INDEX:
+			info.flags |= VFIO_IRQ_INFO_NORESIZE;
+			if (pci_is_pcie(vidxd->idxd->pdev))
+				break;
+			fallthrough;
+		default:
+			rc = -EINVAL;
+			goto out;
+		} /* switch(info.index) */
+
+		info.count = idxd_vdcm_get_irq_count(vidxd, info.index);
+
+		if (copy_to_user((void __user *)arg, &info, minsz))
+			rc = -EFAULT;
+		else
+			rc = 0;
+		goto out;
+	} else if (cmd == VFIO_DEVICE_SET_IRQS) {
+		struct vfio_irq_set hdr;
+		u8 *data = NULL;
+		size_t data_size = 0;
+
+		minsz = offsetofend(struct vfio_irq_set, count);
+
+		if (copy_from_user(&hdr, (void __user *)arg, minsz)) {
+			rc = -EFAULT;
+			goto out;
+		}
+
+		if (!(hdr.flags & VFIO_IRQ_SET_DATA_NONE)) {
+			int max = idxd_vdcm_get_irq_count(vidxd, hdr.index);
+
+			rc = vfio_set_irqs_validate_and_prepare(&hdr, max, VFIO_PCI_NUM_IRQS,
+								&data_size);
+			if (rc) {
+				dev_err(dev, "vfio_set_irqs_validate_and_prepare failed\n");
+				goto out;
+			}
+			if (data_size) {
+				data = memdup_user((void __user *)(arg + minsz), data_size);
+				if (IS_ERR(data)) {
+					rc = PTR_ERR(data);
+					goto out;
+				}
+			}
+		}
+
+		if (!(hdr.flags & VFIO_IRQ_SET_DATA_NONE) && !data) {
+			rc = -EINVAL;
+			goto out;
+		}
+
+		rc = idxd_vdcm_set_irqs(vidxd, hdr.flags, hdr.index, hdr.start, hdr.count, data);
+		kfree(data);
+		goto out;
+	} else if (cmd == VFIO_DEVICE_RESET) {
+		vidxd_vdcm_reset(vidxd);
+		rc = 0;
+	}
+
+ out:
+	mutex_unlock(&vidxd->dev_lock);
+	return rc;
+}
+
+static const struct mdev_parent_ops idxd_vdcm_ops = {
+	.create			= idxd_vdcm_create,
+	.remove			= idxd_vdcm_remove,
+	.open			= idxd_vdcm_open,
+	.release		= idxd_vdcm_release,
+	.read			= idxd_vdcm_read,
+	.write			= idxd_vdcm_write,
+	.mmap			= idxd_vdcm_mmap,
+	.ioctl			= idxd_vdcm_ioctl,
+};
+
+int idxd_mdev_host_init(struct idxd_device *idxd)
+{
+	struct device *dev = &idxd->pdev->dev;
+	int rc;
+
+	if (!test_bit(IDXD_FLAG_IMS_SUPPORTED, &idxd->flags))
+		return -EOPNOTSUPP;
+
+	if (iommu_dev_has_feature(dev, IOMMU_DEV_FEAT_AUX)) {
+		rc = iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_AUX);
+		if (rc < 0) {
+			dev_warn(dev, "Failed to enable aux-domain: %d\n", rc);
+			return rc;
+		}
+	} else {
+		dev_warn(dev, "No aux-domain feature.\n");
+		return -EOPNOTSUPP;
+	}
+
+	return mdev_register_device(dev, &idxd_vdcm_ops);
+}
+
+void idxd_mdev_host_release(struct idxd_device *idxd)
+{
+	struct device *dev = &idxd->pdev->dev;
+	int rc;
+
+	mdev_unregister_device(dev);
+	if (iommu_dev_has_feature(dev, IOMMU_DEV_FEAT_AUX)) {
+		rc = iommu_dev_disable_feature(dev, IOMMU_DEV_FEAT_AUX);
+		if (rc < 0)
+			dev_warn(dev, "Failed to disable aux-domain: %d\n",
+				 rc);
+	}
+}
+
 static int idxd_mdev_aux_probe(struct auxiliary_device *auxdev,
 			       const struct auxiliary_device_id *id)
 {
@@ -34,6 +1024,7 @@ static int idxd_mdev_aux_probe(struct auxiliary_device *auxdev,
 		return rc;
 	}
 
+	set_bit(IDXD_FLAG_MDEV_ENABLED, &idxd->flags);
 	return 0;
 }
 
@@ -41,6 +1032,7 @@ static void idxd_mdev_aux_remove(struct auxiliary_device *auxdev)
 {
 	struct idxd_device *idxd = dev_get_drvdata(&auxdev->dev);
 
+	clear_bit(IDXD_FLAG_MDEV_ENABLED, &idxd->flags);
 	idxd_mdev_host_release(idxd);
 }
 
@@ -70,6 +1062,6 @@ static void idxd_mdev_auxdev_drv_unregister(struct idxd_mdev_aux_drv *drv)
 }
 
 module_driver(idxd_mdev_aux_drv, idxd_mdev_auxdev_drv_register, idxd_mdev_auxdev_drv_unregister);
-
+MODULE_IMPORT_NS(IDXD);
 MODULE_LICENSE("GPL v2");
 MODULE_AUTHOR("Intel Corporation");
diff --git a/drivers/vfio/mdev/idxd/mdev.h b/drivers/vfio/mdev/idxd/mdev.h
new file mode 100644
index 000000000000..7ca50f054714
--- /dev/null
+++ b/drivers/vfio/mdev/idxd/mdev.h
@@ -0,0 +1,115 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright(c) 2020 Intel Corporation. All rights rsvd. */
+
+#ifndef _IDXD_MDEV_H_
+#define _IDXD_MDEV_H_
+
+/* two 64-bit BARs implemented */
+#define VIDXD_MAX_BARS 2
+#define VIDXD_MAX_CFG_SPACE_SZ 4096
+#define VIDXD_MAX_MMIO_SPACE_SZ 8192
+#define VIDXD_MSIX_TBL_SZ_OFFSET 0x42
+#define VIDXD_CAP_CTRL_SZ 0x100
+#define VIDXD_GRP_CTRL_SZ 0x100
+#define VIDXD_WQ_CTRL_SZ 0x100
+#define VIDXD_WQ_OCPY_INT_SZ 0x20
+#define VIDXD_MSIX_TBL_SZ 0x90
+#define VIDXD_MSIX_PERM_TBL_SZ 0x48
+
+#define VIDXD_MSIX_TABLE_OFFSET 0x600
+#define VIDXD_MSIX_PERM_OFFSET 0x300
+#define VIDXD_GRPCFG_OFFSET 0x400
+#define VIDXD_WQCFG_OFFSET 0x500
+#define VIDXD_IMS_OFFSET 0x1000
+
+#define VIDXD_BAR0_SIZE  0x2000
+#define VIDXD_BAR2_SIZE  0x2000
+#define VIDXD_MAX_MSIX_ENTRIES  (VIDXD_MSIX_TBL_SZ / 0x10)
+#define VIDXD_MAX_WQS	1
+#define VIDXD_MAX_MSIX_VECS	2
+
+#define	VIDXD_ATS_OFFSET 0x100
+#define	VIDXD_PRS_OFFSET 0x110
+#define VIDXD_PASID_OFFSET 0x120
+#define VIDXD_MSIX_PBA_OFFSET 0x700
+
+struct ims_irq_entry {
+	struct vdcm_idxd *vidxd;
+	bool irq_set;
+	int id;
+	int irq;
+};
+
+struct idxd_vdev {
+	struct mdev_device *mdev;
+	struct vfio_group *vfio_group;
+	struct eventfd_ctx *msix_trigger[VIDXD_MAX_MSIX_ENTRIES];
+};
+
+struct vdcm_idxd {
+	struct idxd_device *idxd;
+	struct idxd_wq *wq;
+	struct idxd_vdev vdev;
+	struct vdcm_idxd_type *type;
+	int num_wqs;
+	struct ims_irq_entry irq_entries[VIDXD_MAX_MSIX_ENTRIES];
+
+	/* For VM use case */
+	u64 bar_val[VIDXD_MAX_BARS];
+	u64 bar_size[VIDXD_MAX_BARS];
+	u8 cfg[VIDXD_MAX_CFG_SPACE_SZ];
+	u8 bar0[VIDXD_MAX_MMIO_SPACE_SZ];
+	struct list_head list;
+	struct mutex dev_lock; /* lock for vidxd resources */
+
+	int refcount;
+};
+
+static inline struct vdcm_idxd *to_vidxd(struct idxd_vdev *vdev)
+{
+	return container_of(vdev, struct vdcm_idxd, vdev);
+}
+
+#define IDXD_MDEV_NAME_LEN 64
+
+enum idxd_mdev_type {
+	IDXD_MDEV_TYPE_DSA_1_DWQ = 0,
+	IDXD_MDEV_TYPE_IAX_1_DWQ,
+};
+
+#define IDXD_MDEV_TYPES 2
+
+struct vdcm_idxd_type {
+	char *name;
+	enum idxd_mdev_type type;
+	unsigned int avail_instance;
+};
+
+enum idxd_vdcm_rw {
+	IDXD_VDCM_READ = 0,
+	IDXD_VDCM_WRITE,
+};
+
+static inline u64 get_reg_val(void *buf, int size)
+{
+	u64 val = 0;
+
+	switch (size) {
+	case 8:
+		val = *(u64 *)buf;
+		break;
+	case 4:
+		val = *(u32 *)buf;
+		break;
+	case 2:
+		val = *(u16 *)buf;
+		break;
+	case 1:
+		val = *(u8 *)buf;
+		break;
+	}
+
+	return val;
+}
+
+#endif
diff --git a/drivers/vfio/mdev/idxd/vdev.c b/drivers/vfio/mdev/idxd/vdev.c
new file mode 100644
index 000000000000..766753a2ec53
--- /dev/null
+++ b/drivers/vfio/mdev/idxd/vdev.c
@@ -0,0 +1,75 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright(c) 2019,2020 Intel Corporation. All rights rsvd. */
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/pci.h>
+#include <linux/device.h>
+#include <linux/sched/task.h>
+#include <linux/io-64-nonatomic-lo-hi.h>
+#include <linux/mm.h>
+#include <linux/mmu_context.h>
+#include <linux/vfio.h>
+#include <linux/mdev.h>
+#include <linux/msi.h>
+#include <linux/intel-iommu.h>
+#include <linux/intel-svm.h>
+#include <linux/kvm_host.h>
+#include <linux/eventfd.h>
+#include <uapi/linux/idxd.h>
+#include "registers.h"
+#include "idxd.h"
+#include "../../vfio/pci/vfio_pci_private.h"
+#include "mdev.h"
+#include "vdev.h"
+
+int vidxd_send_interrupt(struct ims_irq_entry *iie)
+{
+	/* PLACEHOLDER */
+	return 0;
+}
+
+int vidxd_mmio_read(struct vdcm_idxd *vidxd, u64 pos, void *buf, unsigned int size)
+{
+	/* PLACEHOLDER */
+	return 0;
+}
+
+int vidxd_mmio_write(struct vdcm_idxd *vidxd, u64 pos, void *buf, unsigned int size)
+{
+	/* PLACEHOLDER */
+	return 0;
+}
+
+int vidxd_cfg_read(struct vdcm_idxd *vidxd, unsigned int pos, void *buf, unsigned int count)
+{
+	/* PLACEHOLDER */
+	return 0;
+}
+
+int vidxd_cfg_write(struct vdcm_idxd *vidxd, unsigned int pos, void *buf, unsigned int size)
+{
+	/* PLACEHOLDER */
+	return 0;
+}
+
+void vidxd_mmio_init(struct vdcm_idxd *vidxd)
+{
+	/* PLACEHOLDER */
+}
+
+void vidxd_reset(struct vdcm_idxd *vidxd)
+{
+	/* PLACEHOLDER */
+}
+
+int vidxd_setup_ims_entries(struct vdcm_idxd *vidxd)
+{
+	/* PLACEHOLDER */
+	return 0;
+}
+
+void vidxd_free_ims_entries(struct vdcm_idxd *vidxd)
+{
+	/* PLACEHOLDER */
+}
diff --git a/drivers/vfio/mdev/idxd/vdev.h b/drivers/vfio/mdev/idxd/vdev.h
new file mode 100644
index 000000000000..cc2ba6ccff7b
--- /dev/null
+++ b/drivers/vfio/mdev/idxd/vdev.h
@@ -0,0 +1,19 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright(c) 2019,2020 Intel Corporation. All rights rsvd. */
+
+#ifndef _IDXD_VDEV_H_
+#define _IDXD_VDEV_H_
+
+#include "mdev.h"
+
+int vidxd_mmio_read(struct vdcm_idxd *vidxd, u64 pos, void *buf, unsigned int size);
+int vidxd_mmio_write(struct vdcm_idxd *vidxd, u64 pos, void *buf, unsigned int size);
+int vidxd_cfg_read(struct vdcm_idxd *vidxd, unsigned int pos, void *buf, unsigned int count);
+int vidxd_cfg_write(struct vdcm_idxd *vidxd, unsigned int pos, void *buf, unsigned int size);
+void vidxd_mmio_init(struct vdcm_idxd *vidxd);
+void vidxd_reset(struct vdcm_idxd *vidxd);
+int vidxd_send_interrupt(struct ims_irq_entry *iie);
+int vidxd_setup_ims_entries(struct vdcm_idxd *vidxd);
+void vidxd_free_ims_entries(struct vdcm_idxd *vidxd);
+
+#endif



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v5 06/14] vfio/mdev: idxd: add mdev type as a new wq type
  2021-02-05 20:52 [PATCH v5 00/14] Add VFIO mediated device support and DEV-MSI support for the idxd driver Dave Jiang
                   ` (4 preceding siblings ...)
  2021-02-05 20:53 ` [PATCH v5 05/14] vfio/mdev: idxd: add basic mdev registration and helper functions Dave Jiang
@ 2021-02-05 20:53 ` Dave Jiang
  2021-02-05 20:53 ` [PATCH v5 07/14] vfio/mdev: idxd: add 1dwq-v1 mdev type Dave Jiang
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 31+ messages in thread
From: Dave Jiang @ 2021-02-05 20:53 UTC (permalink / raw)
  To: alex.williamson, kwankhede, tglx, vkoul
  Cc: megha.dey, jacob.jun.pan, ashok.raj, jgg, yi.l.liu, baolu.lu,
	kevin.tian, sanjay.k.kumar, tony.luck, dan.j.williams,
	eric.auger, parav, netanelg, shahafs, pbonzini, dmaengine,
	linux-kernel, kvm

Add "mdev" wq type and support helpers. The mdev wq type marks the wq
to be utilized as a VFIO mediated device.
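
As a rough sketch of how the new wq type is exercised from userspace (the
sysfs paths and device/wq names below are illustrative assumptions, not
part of this patch):

```shell
# Assumed sysfs layout for an idxd device; actual names depend on the system.
# Mark wq0.0 on device dsa0 for export as a mediated device.
echo mdev > /sys/bus/dsa/devices/dsa0/wq0.0/type

# wq_type_show() now reports the new type.
cat /sys/bus/dsa/devices/dsa0/wq0.0/type
```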

Signed-off-by: Dave Jiang <dave.jiang@intel.com>
---
 drivers/dma/idxd/idxd.h  |    2 ++
 drivers/dma/idxd/sysfs.c |   13 +++++++++++--
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h
index a271942df2be..67428c8d476d 100644
--- a/drivers/dma/idxd/idxd.h
+++ b/drivers/dma/idxd/idxd.h
@@ -73,6 +73,7 @@ enum idxd_wq_type {
 	IDXD_WQT_NONE = 0,
 	IDXD_WQT_KERNEL,
 	IDXD_WQT_USER,
+	IDXD_WQT_MDEV,
 };
 
 struct idxd_cdev {
@@ -344,6 +345,7 @@ void idxd_cleanup_sysfs(struct idxd_device *idxd);
 int idxd_register_driver(void);
 void idxd_unregister_driver(void);
 struct bus_type *idxd_get_bus_type(struct idxd_device *idxd);
+bool is_idxd_wq_mdev(struct idxd_wq *wq);
 
 /* device interrupt control */
 irqreturn_t idxd_irq_handler(int vec, void *data);
diff --git a/drivers/dma/idxd/sysfs.c b/drivers/dma/idxd/sysfs.c
index ab5c76e1226b..13d20cbd4cf6 100644
--- a/drivers/dma/idxd/sysfs.c
+++ b/drivers/dma/idxd/sysfs.c
@@ -14,6 +14,7 @@ static char *idxd_wq_type_names[] = {
 	[IDXD_WQT_NONE]		= "none",
 	[IDXD_WQT_KERNEL]	= "kernel",
 	[IDXD_WQT_USER]		= "user",
+	[IDXD_WQT_MDEV]		= "mdev",
 };
 
 static void idxd_conf_device_release(struct device *dev)
@@ -79,6 +80,11 @@ static inline bool is_idxd_wq_cdev(struct idxd_wq *wq)
 	return wq->type == IDXD_WQT_USER;
 }
 
+inline bool is_idxd_wq_mdev(struct idxd_wq *wq)
+{
+	return wq->type == IDXD_WQT_MDEV ? true : false;
+}
+
 static int idxd_config_bus_match(struct device *dev,
 				 struct device_driver *drv)
 {
@@ -1151,8 +1157,9 @@ static ssize_t wq_type_show(struct device *dev,
 		return sprintf(buf, "%s\n",
 			       idxd_wq_type_names[IDXD_WQT_KERNEL]);
 	case IDXD_WQT_USER:
-		return sprintf(buf, "%s\n",
-			       idxd_wq_type_names[IDXD_WQT_USER]);
+		return sprintf(buf, "%s\n", idxd_wq_type_names[IDXD_WQT_USER]);
+	case IDXD_WQT_MDEV:
+		return sprintf(buf, "%s\n", idxd_wq_type_names[IDXD_WQT_MDEV]);
 	case IDXD_WQT_NONE:
 	default:
 		return sprintf(buf, "%s\n",
@@ -1179,6 +1186,8 @@ static ssize_t wq_type_store(struct device *dev,
 		wq->type = IDXD_WQT_KERNEL;
 	else if (sysfs_streq(buf, idxd_wq_type_names[IDXD_WQT_USER]))
 		wq->type = IDXD_WQT_USER;
+	else if (sysfs_streq(buf, idxd_wq_type_names[IDXD_WQT_MDEV]))
+		wq->type = IDXD_WQT_MDEV;
 	else
 		return -EINVAL;
 




* [PATCH v5 07/14] vfio/mdev: idxd: add 1dwq-v1 mdev type
  2021-02-05 20:52 [PATCH v5 00/14] Add VFIO mediated device support and DEV-MSI support for the idxd driver Dave Jiang
                   ` (5 preceding siblings ...)
  2021-02-05 20:53 ` [PATCH v5 06/14] vfio/mdev: idxd: add mdev type as a new wq type Dave Jiang
@ 2021-02-05 20:53 ` Dave Jiang
  2021-02-11  0:00   ` Jason Gunthorpe
  2021-02-05 20:53 ` [PATCH v5 08/14] vfio/mdev: idxd: add emulation rw routines Dave Jiang
                   ` (6 subsequent siblings)
  13 siblings, 1 reply; 31+ messages in thread
From: Dave Jiang @ 2021-02-05 20:53 UTC (permalink / raw)
  To: alex.williamson, kwankhede, tglx, vkoul
  Cc: megha.dey, jacob.jun.pan, ashok.raj, jgg, yi.l.liu, baolu.lu,
	kevin.tian, sanjay.k.kumar, tony.luck, dan.j.williams,
	eric.auger, parav, netanelg, shahafs, pbonzini, dmaengine,
	linux-kernel, kvm

Add support code for the "1dwq-v1" mdev device type. 1dwq-v1 is defined as
a single DSA gen1 dedicated WQ. The WQ cannot be shared between guests, and
the guest cannot change any WQ configuration.

Signed-off-by: Dave Jiang <dave.jiang@intel.com>
---
 drivers/dma/idxd/sysfs.c      |    1 
 drivers/vfio/mdev/idxd/mdev.c |  216 +++++++++++++++++++++++++++++++++++++++--
 2 files changed, 207 insertions(+), 10 deletions(-)

diff --git a/drivers/dma/idxd/sysfs.c b/drivers/dma/idxd/sysfs.c
index 13d20cbd4cf6..d985a0ac23d9 100644
--- a/drivers/dma/idxd/sysfs.c
+++ b/drivers/dma/idxd/sysfs.c
@@ -84,6 +84,7 @@ inline bool is_idxd_wq_mdev(struct idxd_wq *wq)
 {
 	return wq->type == IDXD_WQT_MDEV ? true : false;
 }
+EXPORT_SYMBOL_GPL(is_idxd_wq_mdev);
 
 static int idxd_config_bus_match(struct device *dev,
 				 struct device_driver *drv)
diff --git a/drivers/vfio/mdev/idxd/mdev.c b/drivers/vfio/mdev/idxd/mdev.c
index 384ba5d6bc2b..7529396f3812 100644
--- a/drivers/vfio/mdev/idxd/mdev.c
+++ b/drivers/vfio/mdev/idxd/mdev.c
@@ -46,6 +46,9 @@ static u64 idxd_pci_config[] = {
 	0x0000000000000000ULL,
 };
 
+static char idxd_dsa_1dwq_name[IDXD_MDEV_NAME_LEN];
+static char idxd_iax_1dwq_name[IDXD_MDEV_NAME_LEN];
+
 static int idxd_vdcm_set_irqs(struct vdcm_idxd *vidxd, uint32_t flags, unsigned int index,
 			      unsigned int start, unsigned int count, void *data);
 
@@ -144,21 +147,71 @@ static void idxd_vdcm_release(struct mdev_device *mdev)
 	mutex_unlock(&vidxd->dev_lock);
 }
 
+static struct idxd_wq *find_any_dwq(struct idxd_device *idxd, struct vdcm_idxd_type *type)
+{
+	int i;
+	struct idxd_wq *wq;
+	unsigned long flags;
+
+	switch (type->type) {
+	case IDXD_MDEV_TYPE_DSA_1_DWQ:
+		if (idxd->type != IDXD_TYPE_DSA)
+			return NULL;
+		break;
+	case IDXD_MDEV_TYPE_IAX_1_DWQ:
+		if (idxd->type != IDXD_TYPE_IAX)
+			return NULL;
+		break;
+	default:
+		return NULL;
+	}
+
+	spin_lock_irqsave(&idxd->dev_lock, flags);
+	for (i = 0; i < idxd->max_wqs; i++) {
+		wq = &idxd->wqs[i];
+
+		if (wq->state != IDXD_WQ_ENABLED)
+			continue;
+
+		if (!wq_dedicated(wq))
+			continue;
+
+		if (idxd_wq_refcount(wq) != 0)
+			continue;
+
+		spin_unlock_irqrestore(&idxd->dev_lock, flags);
+		mutex_lock(&wq->wq_lock);
+		if (idxd_wq_refcount(wq)) {
+			mutex_unlock(&wq->wq_lock);
+			spin_lock_irqsave(&idxd->dev_lock, flags);
+			continue;
+		}
+
+		idxd_wq_get(wq);
+		mutex_unlock(&wq->wq_lock);
+		return wq;
+	}
+
+	spin_unlock_irqrestore(&idxd->dev_lock, flags);
+	return NULL;
+}
+
 static struct vdcm_idxd *vdcm_vidxd_create(struct idxd_device *idxd, struct mdev_device *mdev,
 					   struct vdcm_idxd_type *type)
 {
 	struct vdcm_idxd *vidxd;
 	struct idxd_wq *wq = NULL;
-	int i;
-
-	/* PLACEHOLDER, wq matching comes later */
+	int i, rc;
 
+	wq = find_any_dwq(idxd, type);
 	if (!wq)
 		return ERR_PTR(-ENODEV);
 
 	vidxd = kzalloc(sizeof(*vidxd), GFP_KERNEL);
-	if (!vidxd)
-		return ERR_PTR(-ENOMEM);
+	if (!vidxd) {
+		rc = -ENOMEM;
+		goto err;
+	}
 
 	mutex_init(&vidxd->dev_lock);
 	vidxd->idxd = idxd;
@@ -169,9 +221,6 @@ static struct vdcm_idxd *vdcm_vidxd_create(struct idxd_device *idxd, struct mdev
 	vidxd->num_wqs = VIDXD_MAX_WQS;
 
 	idxd_vdcm_init(vidxd);
-	mutex_lock(&wq->wq_lock);
-	idxd_wq_get(wq);
-	mutex_unlock(&wq->wq_lock);
 
 	for (i = 0; i < VIDXD_MAX_MSIX_ENTRIES; i++) {
 		vidxd->irq_entries[i].vidxd = vidxd;
@@ -179,9 +228,24 @@ static struct vdcm_idxd *vdcm_vidxd_create(struct idxd_device *idxd, struct mdev
 	}
 
 	return vidxd;
+
+ err:
+	mutex_lock(&wq->wq_lock);
+	idxd_wq_put(wq);
+	mutex_unlock(&wq->wq_lock);
+	return ERR_PTR(rc);
 }
 
-static struct vdcm_idxd_type idxd_mdev_types[IDXD_MDEV_TYPES];
+static struct vdcm_idxd_type idxd_mdev_types[IDXD_MDEV_TYPES] = {
+	{
+		.name = idxd_dsa_1dwq_name,
+		.type = IDXD_MDEV_TYPE_DSA_1_DWQ,
+	},
+	{
+		.name = idxd_iax_1dwq_name,
+		.type = IDXD_MDEV_TYPE_IAX_1_DWQ,
+	},
+};
 
 static struct vdcm_idxd_type *idxd_vdcm_find_vidxd_type(struct device *dev,
 							const char *name)
@@ -965,7 +1029,94 @@ static long idxd_vdcm_ioctl(struct mdev_device *mdev, unsigned int cmd,
 	return rc;
 }
 
-static const struct mdev_parent_ops idxd_vdcm_ops = {
+static ssize_t name_show(struct kobject *kobj, struct device *dev, char *buf)
+{
+	struct vdcm_idxd_type *type;
+
+	type = idxd_vdcm_find_vidxd_type(dev, kobject_name(kobj));
+
+	if (type)
+		return sprintf(buf, "%s\n", type->name);
+
+	return -EINVAL;
+}
+static MDEV_TYPE_ATTR_RO(name);
+
+static int find_available_mdev_instances(struct idxd_device *idxd, struct vdcm_idxd_type *type)
+{
+	int count = 0, i;
+	unsigned long flags;
+
+	switch (type->type) {
+	case IDXD_MDEV_TYPE_DSA_1_DWQ:
+		if (idxd->type != IDXD_TYPE_DSA)
+			return 0;
+		break;
+	case IDXD_MDEV_TYPE_IAX_1_DWQ:
+		if (idxd->type != IDXD_TYPE_IAX)
+			return 0;
+		break;
+	default:
+		return 0;
+	}
+
+	spin_lock_irqsave(&idxd->dev_lock, flags);
+	for (i = 0; i < idxd->max_wqs; i++) {
+		struct idxd_wq *wq;
+
+		wq = &idxd->wqs[i];
+		if (!is_idxd_wq_mdev(wq) || !wq_dedicated(wq) || idxd_wq_refcount(wq))
+			continue;
+
+		count++;
+	}
+	spin_unlock_irqrestore(&idxd->dev_lock, flags);
+
+	return count;
+}
+
+static ssize_t available_instances_show(struct kobject *kobj,
+					struct device *dev, char *buf)
+{
+	int count;
+	struct idxd_device *idxd = dev_get_drvdata(dev);
+	struct vdcm_idxd_type *type;
+
+	type = idxd_vdcm_find_vidxd_type(dev, kobject_name(kobj));
+	if (!type)
+		return -EINVAL;
+
+	count = find_available_mdev_instances(idxd, type);
+
+	return sprintf(buf, "%d\n", count);
+}
+static MDEV_TYPE_ATTR_RO(available_instances);
+
+static ssize_t device_api_show(struct kobject *kobj, struct device *dev,
+			       char *buf)
+{
+	return sprintf(buf, "%s\n", VFIO_DEVICE_API_PCI_STRING);
+}
+static MDEV_TYPE_ATTR_RO(device_api);
+
+static struct attribute *idxd_mdev_types_attrs[] = {
+	&mdev_type_attr_name.attr,
+	&mdev_type_attr_device_api.attr,
+	&mdev_type_attr_available_instances.attr,
+	NULL,
+};
+
+static struct attribute_group idxd_mdev_type_dsa_group0 = {
+	.name = idxd_dsa_1dwq_name,
+	.attrs = idxd_mdev_types_attrs,
+};
+
+static struct attribute_group idxd_mdev_type_iax_group0 = {
+	.name = idxd_iax_1dwq_name,
+	.attrs = idxd_mdev_types_attrs,
+};
+
+static struct mdev_parent_ops idxd_vdcm_ops = {
 	.create			= idxd_vdcm_create,
 	.remove			= idxd_vdcm_remove,
 	.open			= idxd_vdcm_open,
@@ -976,6 +1127,44 @@ static const struct mdev_parent_ops idxd_vdcm_ops = {
 	.ioctl			= idxd_vdcm_ioctl,
 };
 
+/* Set the mdev type version to the hardware version supported */
+static void init_mdev_1dwq_name(struct idxd_device *idxd)
+{
+	unsigned int version;
+
+	version = (idxd->hw.version & GENMASK(15, 8)) >> 8;
+	if (idxd->type == IDXD_TYPE_DSA && strlen(idxd_dsa_1dwq_name) == 0)
+		sprintf(idxd_dsa_1dwq_name, "dsa-1dwq-v%u", version);
+	else if (idxd->type == IDXD_TYPE_IAX && strlen(idxd_iax_1dwq_name) == 0)
+		sprintf(idxd_iax_1dwq_name, "iax-1dwq-v%u", version);
+}
+
+static int alloc_supported_types(struct idxd_device *idxd)
+{
+	struct attribute_group **idxd_mdev_type_groups;
+
+	idxd_mdev_type_groups = kcalloc(2, sizeof(struct attribute_group *), GFP_KERNEL);
+	if (!idxd_mdev_type_groups)
+		return -ENOMEM;
+
+	switch (idxd->type) {
+	case IDXD_TYPE_DSA:
+		idxd_mdev_type_groups[0] = &idxd_mdev_type_dsa_group0;
+		break;
+	case IDXD_TYPE_IAX:
+		idxd_mdev_type_groups[0] = &idxd_mdev_type_iax_group0;
+		break;
+	case IDXD_TYPE_UNKNOWN:
+	default:
+		kfree(idxd_mdev_type_groups);
+		return -ENODEV;
+	}
+
+	idxd_vdcm_ops.supported_type_groups = idxd_mdev_type_groups;
+
+	return 0;
+}
+
 int idxd_mdev_host_init(struct idxd_device *idxd)
 {
 	struct device *dev = &idxd->pdev->dev;
@@ -984,6 +1172,11 @@ int idxd_mdev_host_init(struct idxd_device *idxd)
 	if (!test_bit(IDXD_FLAG_IMS_SUPPORTED, &idxd->flags))
 		return -EOPNOTSUPP;
 
+	init_mdev_1dwq_name(idxd);
+	rc = alloc_supported_types(idxd);
+	if (rc < 0)
+		return rc;
+
 	if (iommu_dev_has_feature(dev, IOMMU_DEV_FEAT_AUX)) {
 		rc = iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_AUX);
 		if (rc < 0) {
@@ -1010,6 +1203,9 @@ void idxd_mdev_host_release(struct idxd_device *idxd)
 			dev_warn(dev, "Failed to disable aux-domain: %d\n",
 				 rc);
 	}
+
+	kfree(idxd_vdcm_ops.supported_type_groups);
+	idxd_vdcm_ops.supported_type_groups = NULL;
 }
 
 static int idxd_mdev_aux_probe(struct auxiliary_device *auxdev,



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v5 08/14] vfio/mdev: idxd: add emulation rw routines
  2021-02-05 20:52 [PATCH v5 00/14] Add VFIO mediated device support and DEV-MSI support for the idxd driver Dave Jiang
                   ` (6 preceding siblings ...)
  2021-02-05 20:53 ` [PATCH v5 07/14] vfio/mdev: idxd: add 1dwq-v1 mdev type Dave Jiang
@ 2021-02-05 20:53 ` Dave Jiang
  2021-02-05 20:53 ` [PATCH v5 09/14] vfio/mdev: idxd: prep for virtual device commands Dave Jiang
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 31+ messages in thread
From: Dave Jiang @ 2021-02-05 20:53 UTC (permalink / raw)
  To: alex.williamson, kwankhede, tglx, vkoul
  Cc: megha.dey, jacob.jun.pan, ashok.raj, jgg, yi.l.liu, baolu.lu,
	kevin.tian, sanjay.k.kumar, tony.luck, dan.j.williams,
	eric.auger, parav, netanelg, shahafs, pbonzini, dmaengine,
	linux-kernel, kvm

Add emulation routines for PCI config read/write, MMIO read/write, and
interrupt handling for the emulated device. The rw routines are called
when PCI config or BAR0 MMIO reads/writes are issued by the guest kernel
through KVM/QEMU.

Because the configuration is read-only, most of the MMIO emulation is a
simple memory copy, except for cases such as handling device commands and
interrupts.

As part of the emulation code, add the support code for the "1dwq" mdev
type. This mdev type follows the standard VFIO mdev flow. The "1dwq" type
exports a single dedicated WQ to the mdev, with a read-only configuration
set up by the host. The mdev type does not support PASID or SVA, matching
the stage 1 driver in functional support. For backward compatibility, the
mdev will maintain the DSA spec definition of this mdev type once the
commit goes upstream.

Signed-off-by: Dave Jiang <dave.jiang@intel.com>
---
 drivers/dma/idxd/registers.h  |   10 +
 drivers/vfio/mdev/idxd/vdev.c |  456 ++++++++++++++++++++++++++++++++++++++++-
 drivers/vfio/mdev/idxd/vdev.h |    8 +
 include/uapi/linux/idxd.h     |    2 
 4 files changed, 468 insertions(+), 8 deletions(-)

diff --git a/drivers/dma/idxd/registers.h b/drivers/dma/idxd/registers.h
index d9a732decdd5..50ea94259c99 100644
--- a/drivers/dma/idxd/registers.h
+++ b/drivers/dma/idxd/registers.h
@@ -195,7 +195,8 @@ union cmdsts_reg {
 	};
 	u32 bits;
 } __packed;
-#define IDXD_CMDSTS_ACTIVE		0x80000000
+#define IDXD_CMDS_ACTIVE_BIT		31
+#define IDXD_CMDSTS_ACTIVE		BIT(IDXD_CMDS_ACTIVE_BIT)
 #define IDXD_CMDSTS_ERR_MASK		0xff
 #define IDXD_CMDSTS_RES_SHIFT		8
 
@@ -278,6 +279,11 @@ union msix_perm {
 	u32 bits;
 } __packed;
 
+#define IDXD_MSIX_PERM_MASK	0xfffff00c
+#define IDXD_MSIX_PERM_IGNORE	0x3
+#define MSIX_ENTRY_MASK_INT	0x1
+#define MSIX_ENTRY_CTRL_BYTE	12
+
 union group_flags {
 	struct {
 		u32 tc_a:3;
@@ -349,6 +355,8 @@ union wqcfg {
 
 #define WQCFG_PASID_IDX		2
 #define WQCFG_PRIV_IDX		2
+#define WQCFG_MODE_DEDICATED	1
+#define WQCFG_MODE_SHARED	0
 
 /*
  * This macro calculates the offset into the WQCFG register
diff --git a/drivers/vfio/mdev/idxd/vdev.c b/drivers/vfio/mdev/idxd/vdev.c
index 766753a2ec53..958b09987e5c 100644
--- a/drivers/vfio/mdev/idxd/vdev.c
+++ b/drivers/vfio/mdev/idxd/vdev.c
@@ -25,35 +25,472 @@
 
 int vidxd_send_interrupt(struct ims_irq_entry *iie)
 {
-	/* PLACE HOLDER */
+	struct vdcm_idxd *vidxd = iie->vidxd;
+	struct device *dev = &vidxd->idxd->pdev->dev;
+	int rc;
+
+	dev_dbg(dev, "%s interrupt %d\n", __func__, iie->id);
+
+	if (!vidxd->vdev.msix_trigger[iie->id]) {
+		dev_warn(dev, "%s: intr eventfd not found %d\n", __func__, iie->id);
+		return -EINVAL;
+	}
+
+	rc = eventfd_signal(vidxd->vdev.msix_trigger[iie->id], 1);
+	if (rc != 1)
+		dev_err(dev, "eventfd signal failed: %d on wq(%d) vector(%d)\n",
+			rc, vidxd->wq->id, iie->id);
+	else
+		dev_dbg(dev, "vidxd interrupt triggered wq(%d) %d\n", vidxd->wq->id, iie->id);
+
+	return rc;
+}
+
+static void vidxd_report_error(struct vdcm_idxd *vidxd, unsigned int error)
+{
+	u8 *bar0 = vidxd->bar0;
+	union sw_err_reg *swerr = (union sw_err_reg *)(bar0 + IDXD_SWERR_OFFSET);
+	union genctrl_reg *genctrl;
+	bool send = false;
+
+	if (!swerr->valid) {
+		memset(swerr, 0, sizeof(*swerr));
+		swerr->valid = 1;
+		swerr->error = error;
+		send = true;
+	} else if (swerr->valid && !swerr->overflow) {
+		swerr->overflow = 1;
+	}
+
+	genctrl = (union genctrl_reg *)(bar0 + IDXD_GENCTRL_OFFSET);
+	if (send && genctrl->softerr_int_en) {
+		u32 *intcause = (u32 *)(bar0 + IDXD_INTCAUSE_OFFSET);
+
+		*intcause |= IDXD_INTC_ERR;
+		vidxd_send_interrupt(&vidxd->irq_entries[0]);
+	}
+}
+
+int vidxd_mmio_write(struct vdcm_idxd *vidxd, u64 pos, void *buf, unsigned int size)
+{
+	u32 offset = pos & (vidxd->bar_size[0] - 1);
+	u8 *bar0 = vidxd->bar0;
+	struct device *dev = mdev_dev(vidxd->vdev.mdev);
+
+	dev_dbg(dev, "vidxd mmio W %d %x %x: %llx\n", vidxd->wq->id, size,
+		offset, get_reg_val(buf, size));
+
+	if (((size & (size - 1)) != 0) || (offset & (size - 1)) != 0)
+		return -EINVAL;
+
+	/* If we don't limit this, we could potentially write out of bounds */
+	if (size > sizeof(u32))
+		return -EINVAL;
+
+	switch (offset) {
+	case IDXD_GENCFG_OFFSET ... IDXD_GENCFG_OFFSET + 3:
+		/* Write only when device is disabled. */
+		if (vidxd_state(vidxd) == IDXD_DEVICE_STATE_DISABLED)
+			memcpy(bar0 + offset, buf, size);
+		break;
+
+	case IDXD_GENCTRL_OFFSET:
+		memcpy(bar0 + offset, buf, size);
+		break;
+
+	case IDXD_INTCAUSE_OFFSET:
+		bar0[offset] &= ~(get_reg_val(buf, 1) & GENMASK(4, 0));
+		break;
+
+	case IDXD_CMD_OFFSET: {
+		u32 *cmdsts = (u32 *)(bar0 + IDXD_CMDSTS_OFFSET);
+		u32 val = get_reg_val(buf, size);
+
+		if (size != sizeof(u32))
+			return -EINVAL;
+
+		/* Check and set command in progress */
+		if (test_and_set_bit(IDXD_CMDS_ACTIVE_BIT, (unsigned long *)cmdsts) == 0)
+			vidxd_do_command(vidxd, val);
+		else
+			vidxd_report_error(vidxd, DSA_ERR_CMD_REG);
+		break;
+	}
+
+	case IDXD_SWERR_OFFSET:
+		/* W1C */
+		bar0[offset] &= ~(get_reg_val(buf, 1) & GENMASK(1, 0));
+		break;
+
+	case VIDXD_WQCFG_OFFSET ... VIDXD_WQCFG_OFFSET + VIDXD_WQ_CTRL_SZ - 1:
+	case VIDXD_GRPCFG_OFFSET ...  VIDXD_GRPCFG_OFFSET + VIDXD_GRP_CTRL_SZ - 1:
+		/* Nothing is written. Should be all RO */
+		break;
+
+	case VIDXD_MSIX_TABLE_OFFSET ...  VIDXD_MSIX_TABLE_OFFSET + VIDXD_MSIX_TBL_SZ - 1: {
+		int index = (offset - VIDXD_MSIX_TABLE_OFFSET) / 0x10;
+		u8 *msix_entry = &bar0[VIDXD_MSIX_TABLE_OFFSET + index * 0x10];
+		u64 *pba = (u64 *)(bar0 + VIDXD_MSIX_PBA_OFFSET);
+		u8 ctrl;
+
+		ctrl = msix_entry[MSIX_ENTRY_CTRL_BYTE];
+		memcpy(bar0 + offset, buf, size);
+		/* Handle clearing of UNMASK bit */
+		if (!(msix_entry[MSIX_ENTRY_CTRL_BYTE] & MSIX_ENTRY_MASK_INT) &&
+		    ctrl & MSIX_ENTRY_MASK_INT)
+			if (test_and_clear_bit(index, (unsigned long *)pba))
+				vidxd_send_interrupt(&vidxd->irq_entries[index]);
+		break;
+	}
+
+	case VIDXD_MSIX_PERM_OFFSET ...  VIDXD_MSIX_PERM_OFFSET + VIDXD_MSIX_PERM_TBL_SZ - 1:
+		memcpy(bar0 + offset, buf, size);
+		break;
+	} /* offset */
+
 	return 0;
 }
 
 int vidxd_mmio_read(struct vdcm_idxd *vidxd, u64 pos, void *buf, unsigned int size)
 {
-	/* PLACEHOLDER */
+	u32 offset = pos & (vidxd->bar_size[0] - 1);
+	struct device *dev = mdev_dev(vidxd->vdev.mdev);
+
+	memcpy(buf, vidxd->bar0 + offset, size);
+
+	dev_dbg(dev, "vidxd mmio R %d %x %x: %llx\n",
+		vidxd->wq->id, size, offset, get_reg_val(buf, size));
 	return 0;
 }
 
-int vidxd_mmio_write(struct vdcm_idxd *vidxd, u64 pos, void *buf, unsigned int size)
+int vidxd_cfg_read(struct vdcm_idxd *vidxd, unsigned int pos, void *buf, unsigned int count)
 {
-	/* PLACEHOLDER */
+	u32 offset = pos & 0xfff;
+	struct device *dev = mdev_dev(vidxd->vdev.mdev);
+
+	memcpy(buf, &vidxd->cfg[offset], count);
+
+	dev_dbg(dev, "vidxd pci R %d %x %x: %llx\n",
+		vidxd->wq->id, count, offset, get_reg_val(buf, count));
+
 	return 0;
 }
 
-int vidxd_cfg_read(struct vdcm_idxd *vidxd, unsigned int pos, void *buf, unsigned int count)
+/*
+ * Much of the emulation code has been borrowed from Intel i915 cfg space
+ * emulation code.
+ * drivers/gpu/drm/i915/gvt/cfg_space.c:
+ */
+
+/*
+ * Bitmap for writable bits (RW or RW1C bits, but cannot co-exist in one
+ * byte) byte by byte in standard pci configuration space. (not the full
+ * 256 bytes.)
+ */
+static const u8 pci_cfg_space_rw_bmp[PCI_INTERRUPT_LINE + 4] = {
+	[PCI_COMMAND]		= 0xff, 0x07,
+	[PCI_STATUS]		= 0x00, 0xf9, /* the only one RW1C byte */
+	[PCI_CACHE_LINE_SIZE]	= 0xff,
+	[PCI_BASE_ADDRESS_0 ... PCI_CARDBUS_CIS - 1] = 0xff,
+	[PCI_ROM_ADDRESS]	= 0x01, 0xf8, 0xff, 0xff,
+	[PCI_INTERRUPT_LINE]	= 0xff,
+};
+
+static void _pci_cfg_mem_write(struct vdcm_idxd *vidxd, unsigned int off, u8 *src,
+			       unsigned int bytes)
 {
-	/* PLACEHOLDER */
+	u8 *cfg_base = vidxd->cfg;
+	u8 mask, new, old;
+	int i = 0;
+
+	for (; i < bytes && (off + i < sizeof(pci_cfg_space_rw_bmp)); i++) {
+		mask = pci_cfg_space_rw_bmp[off + i];
+		old = cfg_base[off + i];
+		new = src[i] & mask;
+
+		/*
+		 * The PCI_STATUS high byte has RW1C bits: writing
+		 * a 1 clears the bit, writing a 0 has no effect.
+		 * Emulate the clear-on-write-1 behavior here.
+		 */
+		if (off + i == PCI_STATUS + 1)
+			new = (~new & old) & mask;
+
+		cfg_base[off + i] = (old & ~mask) | new;
+	}
+
+	/* For other configuration space directly copy as it is. */
+	if (i < bytes)
+		memcpy(cfg_base + off + i, src + i, bytes - i);
+}
+
+static inline void _write_pci_bar(struct vdcm_idxd *vidxd, u32 offset, u32 val, bool low)
+{
+	u32 *pval;
+
+	/* BAR offset should be 32-bit aligned */
+	offset = rounddown(offset, 4);
+	pval = (u32 *)(vidxd->cfg + offset);
+
+	if (low) {
+		/*
+		 * only update bit 31 - bit 4,
+		 * leave the bit 3 - bit 0 unchanged.
+		 */
+		*pval = (val & GENMASK(31, 4)) | (*pval & GENMASK(3, 0));
+	} else {
+		*pval = val;
+	}
+}
+
+static int _pci_cfg_bar_write(struct vdcm_idxd *vidxd, unsigned int offset, void *p_data,
+			      unsigned int bytes)
+{
+	u32 new = *(u32 *)(p_data);
+	bool lo = IS_ALIGNED(offset, 8);
+	u64 size;
+	unsigned int bar_id;
+
+	/*
+	 * Power-up software can determine how much address
+	 * space the device requires by writing a value of
+	 * all 1's to the register and then reading the value
+	 * back. The device will return 0's in all don't-care
+	 * address bits.
+	 */
+	if (new == 0xffffffff) {
+		switch (offset) {
+		case PCI_BASE_ADDRESS_0:
+		case PCI_BASE_ADDRESS_1:
+		case PCI_BASE_ADDRESS_2:
+		case PCI_BASE_ADDRESS_3:
+			bar_id = (offset - PCI_BASE_ADDRESS_0) / 8;
+			size = vidxd->bar_size[bar_id];
+			_write_pci_bar(vidxd, offset, size >> (lo ? 0 : 32), lo);
+			break;
+		default:
+			/* Unimplemented BARs */
+			_write_pci_bar(vidxd, offset, 0x0, false);
+		}
+	} else {
+		switch (offset) {
+		case PCI_BASE_ADDRESS_0:
+		case PCI_BASE_ADDRESS_1:
+		case PCI_BASE_ADDRESS_2:
+		case PCI_BASE_ADDRESS_3:
+			_write_pci_bar(vidxd, offset, new, lo);
+			break;
+		default:
+			break;
+		}
+	}
 	return 0;
 }
 
 int vidxd_cfg_write(struct vdcm_idxd *vidxd, unsigned int pos, void *buf, unsigned int size)
 {
-	/* PLACEHOLDER */
+	struct device *dev = &vidxd->idxd->pdev->dev;
+
+	if (size > 4)
+		return -EINVAL;
+
+	if (pos + size > VIDXD_MAX_CFG_SPACE_SZ)
+		return -EINVAL;
+
+	dev_dbg(dev, "vidxd pci W %d %x %x: %llx\n", vidxd->wq->id, size, pos,
+		get_reg_val(buf, size));
+
+	/* First check if it's PCI_COMMAND */
+	if (IS_ALIGNED(pos, 2) && pos == PCI_COMMAND) {
+		bool new_bme;
+		bool bme;
+
+		if (size > 2)
+			return -EINVAL;
+
+		new_bme = !!(get_reg_val(buf, 2) & PCI_COMMAND_MASTER);
+		bme = !!(vidxd->cfg[pos] & PCI_COMMAND_MASTER);
+		_pci_cfg_mem_write(vidxd, pos, buf, size);
+
+		/* Flag error if turning off BME while device is enabled */
+		if ((bme && !new_bme) && vidxd_state(vidxd) == IDXD_DEVICE_STATE_ENABLED)
+			vidxd_report_error(vidxd, DSA_ERR_PCI_CFG);
+		return 0;
+	}
+
+	switch (pos) {
+	case PCI_BASE_ADDRESS_0 ... PCI_BASE_ADDRESS_5:
+		if (!IS_ALIGNED(pos, 4))
+			return -EINVAL;
+		return _pci_cfg_bar_write(vidxd, pos, buf, size);
+
+	default:
+		_pci_cfg_mem_write(vidxd, pos, buf, size);
+	}
 	return 0;
 }
 
+static void vidxd_mmio_init_grpcap(struct vdcm_idxd *vidxd)
+{
+	u8 *bar0 = vidxd->bar0;
+	union group_cap_reg *grp_cap = (union group_cap_reg *)(bar0 + IDXD_GRPCAP_OFFSET);
+
+	/* single group for current implementation */
+	grp_cap->token_en = 0;
+	grp_cap->token_limit = 0;
+	grp_cap->total_tokens = 0;
+	grp_cap->num_groups = 1;
+}
+
+static void vidxd_mmio_init_grpcfg(struct vdcm_idxd *vidxd)
+{
+	u8 *bar0 = vidxd->bar0;
+	struct grpcfg *grpcfg = (struct grpcfg *)(bar0 + VIDXD_GRPCFG_OFFSET);
+	struct idxd_wq *wq = vidxd->wq;
+	struct idxd_group *group = wq->group;
+	int i;
+
+	/*
+	 * At this point we only export a single workqueue per mdev,
+	 * so present it as the first workqueue and mark the
+	 * available engines in this group.
+	 */
+
+	/* Set single workqueue and the first one */
+	grpcfg->wqs[0] = BIT(0);
+	grpcfg->engines = 0;
+	for (i = 0; i < group->num_engines; i++)
+		grpcfg->engines |= BIT(i);
+	grpcfg->flags.bits = group->grpcfg.flags.bits;
+}
+
+static void vidxd_mmio_init_wqcap(struct vdcm_idxd *vidxd)
+{
+	u8 *bar0 = vidxd->bar0;
+	struct idxd_wq *wq = vidxd->wq;
+	union wq_cap_reg *wq_cap = (union wq_cap_reg *)(bar0 + IDXD_WQCAP_OFFSET);
+
+	wq_cap->occupancy_int = 0;
+	wq_cap->occupancy = 0;
+	wq_cap->priority = 0;
+	wq_cap->total_wq_size = wq->size;
+	wq_cap->num_wqs = VIDXD_MAX_WQS;
+	wq_cap->wq_ats_support = 0;
+	wq_cap->dedicated_mode = 1;
+	wq_cap->shared_mode = 0;
+}
+
+static void vidxd_mmio_init_wqcfg(struct vdcm_idxd *vidxd)
+{
+	struct idxd_device *idxd = vidxd->idxd;
+	struct idxd_wq *wq = vidxd->wq;
+	u8 *bar0 = vidxd->bar0;
+	union wqcfg *wqcfg = (union wqcfg *)(bar0 + VIDXD_WQCFG_OFFSET);
+
+	wqcfg->wq_size = wq->size;
+	wqcfg->wq_thresh = wq->threshold;
+
+	wqcfg->mode = WQCFG_MODE_DEDICATED;
+
+	wqcfg->bof = 0;
+
+	wqcfg->priority = wq->priority;
+	wqcfg->max_xfer_shift = idxd->hw.gen_cap.max_xfer_shift;
+	wqcfg->max_batch_shift = idxd->hw.gen_cap.max_batch_shift;
+	/* make mode change read-only */
+	wqcfg->mode_support = 0;
+}
+
+static void vidxd_mmio_init_engcap(struct vdcm_idxd *vidxd)
+{
+	u8 *bar0 = vidxd->bar0;
+	union engine_cap_reg *engcap = (union engine_cap_reg *)(bar0 + IDXD_ENGCAP_OFFSET);
+	struct idxd_wq *wq = vidxd->wq;
+	struct idxd_group *group = wq->group;
+
+	engcap->num_engines = group->num_engines;
+}
+
+static void vidxd_mmio_init_gencap(struct vdcm_idxd *vidxd)
+{
+	struct idxd_device *idxd = vidxd->idxd;
+	u8 *bar0 = vidxd->bar0;
+	union gen_cap_reg *gencap = (union gen_cap_reg *)(bar0 + IDXD_GENCAP_OFFSET);
+
+	gencap->bits = idxd->hw.gen_cap.bits;
+	gencap->config_en = 0;
+	gencap->max_ims_mult = 0;
+	gencap->cmd_cap = 1;
+	gencap->block_on_fault = 0;
+}
+
+static void vidxd_mmio_init_cmdcap(struct vdcm_idxd *vidxd)
+{
+	struct idxd_device *idxd = vidxd->idxd;
+	u8 *bar0 = vidxd->bar0;
+	u32 *cmdcap = (u32 *)(bar0 + IDXD_CMDCAP_OFFSET);
+
+	if (idxd->hw.cmd_cap)
+		*cmdcap = idxd->hw.cmd_cap;
+	else
+		*cmdcap = 0x1ffe;
+
+	*cmdcap |= BIT(IDXD_CMD_REQUEST_INT_HANDLE) | BIT(IDXD_CMD_RELEASE_INT_HANDLE);
+}
+
+static void vidxd_mmio_init_opcap(struct vdcm_idxd *vidxd)
+{
+	u64 opcode;
+	u8 *bar0 = vidxd->bar0;
+	u64 *opcap = (u64 *)(bar0 + IDXD_OPCAP_OFFSET);
+
+	opcode = BIT_ULL(DSA_OPCODE_NOOP) | BIT_ULL(DSA_OPCODE_BATCH) |
+		 BIT_ULL(DSA_OPCODE_DRAIN) | BIT_ULL(DSA_OPCODE_MEMMOVE) |
+		 BIT_ULL(DSA_OPCODE_MEMFILL) | BIT_ULL(DSA_OPCODE_COMPARE) |
+		 BIT_ULL(DSA_OPCODE_COMPVAL) | BIT_ULL(DSA_OPCODE_CR_DELTA) |
+		 BIT_ULL(DSA_OPCODE_AP_DELTA) | BIT_ULL(DSA_OPCODE_DUALCAST) |
+		 BIT_ULL(DSA_OPCODE_CRCGEN) | BIT_ULL(DSA_OPCODE_COPY_CRC) |
+		 BIT_ULL(DSA_OPCODE_DIF_CHECK) | BIT_ULL(DSA_OPCODE_DIF_INS) |
+		 BIT_ULL(DSA_OPCODE_DIF_STRP) | BIT_ULL(DSA_OPCODE_DIF_UPDT) |
+		 BIT_ULL(DSA_OPCODE_CFLUSH);
+	*opcap = opcode;
+}
+
+static void vidxd_mmio_init_version(struct vdcm_idxd *vidxd)
+{
+	struct idxd_device *idxd = vidxd->idxd;
+	u32 *version;
+
+	version = (u32 *)vidxd->bar0;
+	*version = idxd->hw.version;
+}
+
 void vidxd_mmio_init(struct vdcm_idxd *vidxd)
+{
+	u8 *bar0 = vidxd->bar0;
+	union offsets_reg *offsets;
+
+	memset(vidxd->bar0, 0, VIDXD_BAR0_SIZE);
+
+	vidxd_mmio_init_version(vidxd);
+	vidxd_mmio_init_gencap(vidxd);
+	vidxd_mmio_init_wqcap(vidxd);
+	vidxd_mmio_init_grpcap(vidxd);
+	vidxd_mmio_init_engcap(vidxd);
+	vidxd_mmio_init_opcap(vidxd);
+
+	offsets = (union offsets_reg *)(bar0 + IDXD_TABLE_OFFSET);
+	offsets->grpcfg = VIDXD_GRPCFG_OFFSET / 0x100;
+	offsets->wqcfg = VIDXD_WQCFG_OFFSET / 0x100;
+	offsets->msix_perm = VIDXD_MSIX_PERM_OFFSET / 0x100;
+
+	vidxd_mmio_init_cmdcap(vidxd);
+	memset(bar0 + VIDXD_MSIX_PERM_OFFSET, 0, VIDXD_MSIX_PERM_TBL_SZ);
+	vidxd_mmio_init_grpcfg(vidxd);
+	vidxd_mmio_init_wqcfg(vidxd);
+}
+
+static void idxd_complete_command(struct vdcm_idxd *vidxd, enum idxd_cmdsts_err val)
 {
 	/* PLACEHOLDER */
 }
@@ -63,6 +500,11 @@ void vidxd_reset(struct vdcm_idxd *vidxd)
 	/* PLACEHOLDER */
 }
 
+void vidxd_do_command(struct vdcm_idxd *vidxd, u32 val)
+{
+	/* PLACEHOLDER */
+}
+
 int vidxd_setup_ims_entries(struct vdcm_idxd *vidxd)
 {
 	/* PLACEHOLDER */
diff --git a/drivers/vfio/mdev/idxd/vdev.h b/drivers/vfio/mdev/idxd/vdev.h
index cc2ba6ccff7b..fc0f405baa40 100644
--- a/drivers/vfio/mdev/idxd/vdev.h
+++ b/drivers/vfio/mdev/idxd/vdev.h
@@ -6,6 +6,13 @@
 
 #include "mdev.h"
 
+static inline u8 vidxd_state(struct vdcm_idxd *vidxd)
+{
+	union gensts_reg *gensts = (union gensts_reg *)(vidxd->bar0 + IDXD_GENSTATS_OFFSET);
+
+	return gensts->state;
+}
+
 int vidxd_mmio_read(struct vdcm_idxd *vidxd, u64 pos, void *buf, unsigned int size);
 int vidxd_mmio_write(struct vdcm_idxd *vidxd, u64 pos, void *buf, unsigned int size);
 int vidxd_cfg_read(struct vdcm_idxd *vidxd, unsigned int pos, void *buf, unsigned int count);
@@ -15,5 +22,6 @@ void vidxd_reset(struct vdcm_idxd *vidxd);
 int vidxd_send_interrupt(struct ims_irq_entry *iie);
 int vidxd_setup_ims_entries(struct vdcm_idxd *vidxd);
 void vidxd_free_ims_entries(struct vdcm_idxd *vidxd);
+void vidxd_do_command(struct vdcm_idxd *vidxd, u32 val);
 
 #endif
diff --git a/include/uapi/linux/idxd.h b/include/uapi/linux/idxd.h
index 236d437947bc..22d1b229a912 100644
--- a/include/uapi/linux/idxd.h
+++ b/include/uapi/linux/idxd.h
@@ -89,6 +89,8 @@ enum dsa_completion_status {
 	DSA_COMP_HW_ERR1,
 	DSA_COMP_HW_ERR_DRB,
 	DSA_COMP_TRANSLATION_FAIL,
+	DSA_ERR_PCI_CFG = 0x51,
+	DSA_ERR_CMD_REG,
 };
 
 enum iax_completion_status {



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v5 09/14] vfio/mdev: idxd: prep for virtual device commands
  2021-02-05 20:52 [PATCH v5 00/14] Add VFIO mediated device support and DEV-MSI support for the idxd driver Dave Jiang
                   ` (7 preceding siblings ...)
  2021-02-05 20:53 ` [PATCH v5 08/14] vfio/mdev: idxd: add emulation rw routines Dave Jiang
@ 2021-02-05 20:53 ` Dave Jiang
  2021-02-05 20:53 ` [PATCH v5 10/14] vfio/mdev: idxd: virtual device commands emulation Dave Jiang
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 31+ messages in thread
From: Dave Jiang @ 2021-02-05 20:53 UTC (permalink / raw)
  To: alex.williamson, kwankhede, tglx, vkoul
  Cc: megha.dey, jacob.jun.pan, ashok.raj, jgg, yi.l.liu, baolu.lu,
	kevin.tian, sanjay.k.kumar, tony.luck, dan.j.williams,
	eric.auger, parav, netanelg, shahafs, pbonzini, dmaengine,
	linux-kernel, kvm

Update some of the device commands to support usage by the virtual device
commands emulated by the vdcm. Expose the commands' raw status so the
virtual commands can report it to the guest accordingly.

Signed-off-by: Dave Jiang <dave.jiang@intel.com>
---
 drivers/dma/idxd/cdev.c       |    2 +
 drivers/dma/idxd/device.c     |   69 +++++++++++++++++++++++++++--------------
 drivers/dma/idxd/idxd.h       |    8 ++---
 drivers/dma/idxd/irq.c        |    2 +
 drivers/dma/idxd/sysfs.c      |    8 ++---
 drivers/vfio/mdev/idxd/mdev.c |    2 +
 6 files changed, 56 insertions(+), 35 deletions(-)

diff --git a/drivers/dma/idxd/cdev.c b/drivers/dma/idxd/cdev.c
index b1518106434f..f46328ba8493 100644
--- a/drivers/dma/idxd/cdev.c
+++ b/drivers/dma/idxd/cdev.c
@@ -160,7 +160,7 @@ static int idxd_cdev_release(struct inode *node, struct file *filep)
 			if (rc < 0)
 				dev_err(dev, "wq disable pasid failed.\n");
 		} else {
-			idxd_wq_drain(wq);
+			idxd_wq_drain(wq, NULL);
 		}
 	}
 
diff --git a/drivers/dma/idxd/device.c b/drivers/dma/idxd/device.c
index 89fa2bbe6ebf..245d576ddc43 100644
--- a/drivers/dma/idxd/device.c
+++ b/drivers/dma/idxd/device.c
@@ -216,22 +216,25 @@ void idxd_wq_free_resources(struct idxd_wq *wq)
 	sbitmap_queue_free(&wq->sbq);
 }
 
-int idxd_wq_enable(struct idxd_wq *wq)
+int idxd_wq_enable(struct idxd_wq *wq, u32 *status)
 {
 	struct idxd_device *idxd = wq->idxd;
 	struct device *dev = &idxd->pdev->dev;
-	u32 status;
+	u32 stat;
 
 	if (wq->state == IDXD_WQ_ENABLED) {
 		dev_dbg(dev, "WQ %d already enabled\n", wq->id);
 		return -ENXIO;
 	}
 
-	idxd_cmd_exec(idxd, IDXD_CMD_ENABLE_WQ, wq->id, &status);
+	idxd_cmd_exec(idxd, IDXD_CMD_ENABLE_WQ, wq->id, &stat);
 
-	if (status != IDXD_CMDSTS_SUCCESS &&
-	    status != IDXD_CMDSTS_ERR_WQ_ENABLED) {
-		dev_dbg(dev, "WQ enable failed: %#x\n", status);
+	if (status)
+		*status = stat;
+
+	if (stat != IDXD_CMDSTS_SUCCESS &&
+	    stat != IDXD_CMDSTS_ERR_WQ_ENABLED) {
+		dev_dbg(dev, "WQ enable failed: %#x\n", stat);
 		return -ENXIO;
 	}
 
@@ -240,11 +243,11 @@ int idxd_wq_enable(struct idxd_wq *wq)
 	return 0;
 }
 
-int idxd_wq_disable(struct idxd_wq *wq)
+int idxd_wq_disable(struct idxd_wq *wq, u32 *status)
 {
 	struct idxd_device *idxd = wq->idxd;
 	struct device *dev = &idxd->pdev->dev;
-	u32 status, operand;
+	u32 stat, operand;
 
 	dev_dbg(dev, "Disabling WQ %d\n", wq->id);
 
@@ -254,10 +257,13 @@ int idxd_wq_disable(struct idxd_wq *wq)
 	}
 
 	operand = BIT(wq->id % 16) | ((wq->id / 16) << 16);
-	idxd_cmd_exec(idxd, IDXD_CMD_DISABLE_WQ, operand, &status);
+	idxd_cmd_exec(idxd, IDXD_CMD_DISABLE_WQ, operand, &stat);
+
+	if (status)
+		*status = stat;
 
-	if (status != IDXD_CMDSTS_SUCCESS) {
-		dev_dbg(dev, "WQ disable failed: %#x\n", status);
+	if (stat != IDXD_CMDSTS_SUCCESS) {
+		dev_dbg(dev, "WQ disable failed: %#x\n", stat);
 		return -ENXIO;
 	}
 
@@ -267,20 +273,31 @@ int idxd_wq_disable(struct idxd_wq *wq)
 }
 EXPORT_SYMBOL_GPL(idxd_wq_disable);
 
-void idxd_wq_drain(struct idxd_wq *wq)
+int idxd_wq_drain(struct idxd_wq *wq, u32 *status)
 {
 	struct idxd_device *idxd = wq->idxd;
 	struct device *dev = &idxd->pdev->dev;
-	u32 operand;
+	u32 operand, stat;
 
 	if (wq->state != IDXD_WQ_ENABLED) {
 		dev_dbg(dev, "WQ %d in wrong state: %d\n", wq->id, wq->state);
-		return;
+		return 0;
 	}
 
 	dev_dbg(dev, "Draining WQ %d\n", wq->id);
 	operand = BIT(wq->id % 16) | ((wq->id / 16) << 16);
-	idxd_cmd_exec(idxd, IDXD_CMD_DRAIN_WQ, operand, NULL);
+	idxd_cmd_exec(idxd, IDXD_CMD_DRAIN_WQ, operand, &stat);
+
+	if (status)
+		*status = stat;
+
+	if (stat != IDXD_CMDSTS_SUCCESS) {
+		dev_dbg(dev, "WQ drain failed: %#x\n", stat);
+		return -ENXIO;
+	}
+
+	dev_dbg(dev, "WQ %d drained\n", wq->id);
+	return 0;
 }
 
 int idxd_wq_map_portal(struct idxd_wq *wq)
@@ -307,11 +324,11 @@ void idxd_wq_unmap_portal(struct idxd_wq *wq)
 	devm_iounmap(dev, wq->portal);
 }
 
-int idxd_wq_abort(struct idxd_wq *wq)
+int idxd_wq_abort(struct idxd_wq *wq, u32 *status)
 {
 	struct idxd_device *idxd = wq->idxd;
 	struct device *dev = &idxd->pdev->dev;
-	u32 operand, status;
+	u32 operand, stat;
 
 	dev_dbg(dev, "Abort WQ %d\n", wq->id);
 	if (wq->state != IDXD_WQ_ENABLED) {
@@ -321,9 +338,13 @@ int idxd_wq_abort(struct idxd_wq *wq)
 
 	operand = BIT(wq->id % 16) | ((wq->id / 16) << 16);
 	dev_dbg(dev, "cmd: %u operand: %#x\n", IDXD_CMD_ABORT_WQ, operand);
-	idxd_cmd_exec(idxd, IDXD_CMD_ABORT_WQ, operand, &status);
-	if (status != IDXD_CMDSTS_SUCCESS) {
-		dev_dbg(dev, "WQ abort failed: %#x\n", status);
+	idxd_cmd_exec(idxd, IDXD_CMD_ABORT_WQ, operand, &stat);
+
+	if (status)
+		*status = stat;
+
+	if (stat != IDXD_CMDSTS_SUCCESS) {
+		dev_dbg(dev, "WQ abort failed: %#x\n", stat);
 		return -ENXIO;
 	}
 
@@ -339,7 +360,7 @@ int idxd_wq_set_pasid(struct idxd_wq *wq, int pasid)
 	unsigned int offset;
 	unsigned long flags;
 
-	rc = idxd_wq_disable(wq);
+	rc = idxd_wq_disable(wq, NULL);
 	if (rc < 0)
 		return rc;
 
@@ -351,7 +372,7 @@ int idxd_wq_set_pasid(struct idxd_wq *wq, int pasid)
 	iowrite32(wqcfg.bits[WQCFG_PASID_IDX], idxd->reg_base + offset);
 	spin_unlock_irqrestore(&idxd->dev_lock, flags);
 
-	rc = idxd_wq_enable(wq);
+	rc = idxd_wq_enable(wq, NULL);
 	if (rc < 0)
 		return rc;
 
@@ -366,7 +387,7 @@ int idxd_wq_disable_pasid(struct idxd_wq *wq)
 	unsigned int offset;
 	unsigned long flags;
 
-	rc = idxd_wq_disable(wq);
+	rc = idxd_wq_disable(wq, NULL);
 	if (rc < 0)
 		return rc;
 
@@ -378,7 +399,7 @@ int idxd_wq_disable_pasid(struct idxd_wq *wq)
 	iowrite32(wqcfg.bits[WQCFG_PASID_IDX], idxd->reg_base + offset);
 	spin_unlock_irqrestore(&idxd->dev_lock, flags);
 
-	rc = idxd_wq_enable(wq);
+	rc = idxd_wq_enable(wq, NULL);
 	if (rc < 0)
 		return rc;
 
diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h
index 67428c8d476d..41eee987c9b7 100644
--- a/drivers/dma/idxd/idxd.h
+++ b/drivers/dma/idxd/idxd.h
@@ -376,9 +376,9 @@ int idxd_device_release_int_handle(struct idxd_device *idxd, int handle,
 /* work queue control */
 int idxd_wq_alloc_resources(struct idxd_wq *wq);
 void idxd_wq_free_resources(struct idxd_wq *wq);
-int idxd_wq_enable(struct idxd_wq *wq);
-int idxd_wq_disable(struct idxd_wq *wq);
-void idxd_wq_drain(struct idxd_wq *wq);
+int idxd_wq_enable(struct idxd_wq *wq, u32 *status);
+int idxd_wq_disable(struct idxd_wq *wq, u32 *status);
+int idxd_wq_drain(struct idxd_wq *wq, u32 *status);
 int idxd_wq_map_portal(struct idxd_wq *wq);
 void idxd_wq_unmap_portal(struct idxd_wq *wq);
 void idxd_wq_disable_cleanup(struct idxd_wq *wq);
@@ -386,7 +386,7 @@ int idxd_wq_set_pasid(struct idxd_wq *wq, int pasid);
 int idxd_wq_disable_pasid(struct idxd_wq *wq);
 void idxd_wq_quiesce(struct idxd_wq *wq);
 int idxd_wq_init_percpu_ref(struct idxd_wq *wq);
-int idxd_wq_abort(struct idxd_wq *wq);
+int idxd_wq_abort(struct idxd_wq *wq, u32 *status);
 void idxd_wq_setup_pasid(struct idxd_wq *wq, int pasid);
 void idxd_wq_setup_priv(struct idxd_wq *wq, int priv);
 
diff --git a/drivers/dma/idxd/irq.c b/drivers/dma/idxd/irq.c
index a60ca11a5784..090926856df3 100644
--- a/drivers/dma/idxd/irq.c
+++ b/drivers/dma/idxd/irq.c
@@ -48,7 +48,7 @@ static void idxd_device_reinit(struct work_struct *work)
 		struct idxd_wq *wq = &idxd->wqs[i];
 
 		if (wq->state == IDXD_WQ_ENABLED) {
-			rc = idxd_wq_enable(wq);
+			rc = idxd_wq_enable(wq, NULL);
 			if (rc < 0) {
 				dev_warn(dev, "Unable to re-enable wq %s\n",
 					 dev_name(&wq->conf_dev));
diff --git a/drivers/dma/idxd/sysfs.c b/drivers/dma/idxd/sysfs.c
index d985a0ac23d9..913ff019fe36 100644
--- a/drivers/dma/idxd/sysfs.c
+++ b/drivers/dma/idxd/sysfs.c
@@ -189,7 +189,7 @@ static int enable_wq(struct idxd_wq *wq)
 		return rc;
 	}
 
-	rc = idxd_wq_enable(wq);
+	rc = idxd_wq_enable(wq, NULL);
 	if (rc < 0) {
 		mutex_unlock(&wq->wq_lock);
 		dev_warn(dev, "WQ %d enabling failed: %d\n", wq->id, rc);
@@ -199,7 +199,7 @@ static int enable_wq(struct idxd_wq *wq)
 	rc = idxd_wq_map_portal(wq);
 	if (rc < 0) {
 		dev_warn(dev, "wq portal mapping failed: %d\n", rc);
-		rc = idxd_wq_disable(wq);
+		rc = idxd_wq_disable(wq, NULL);
 		if (rc < 0)
 			dev_warn(dev, "IDXD wq disable failed\n");
 		mutex_unlock(&wq->wq_lock);
@@ -321,8 +321,8 @@ static void disable_wq(struct idxd_wq *wq)
 
 	idxd_wq_unmap_portal(wq);
 
-	idxd_wq_drain(wq);
-	rc = idxd_wq_disable(wq);
+	idxd_wq_drain(wq, NULL);
+	rc = idxd_wq_disable(wq, NULL);
 
 	idxd_wq_free_resources(wq);
 	wq->client_count = 0;
diff --git a/drivers/vfio/mdev/idxd/mdev.c b/drivers/vfio/mdev/idxd/mdev.c
index 7529396f3812..67e6b33468cd 100644
--- a/drivers/vfio/mdev/idxd/mdev.c
+++ b/drivers/vfio/mdev/idxd/mdev.c
@@ -117,7 +117,7 @@ static void idxd_vdcm_init(struct vdcm_idxd *vidxd)
 	vidxd_mmio_init(vidxd);
 
 	if (wq_dedicated(wq) && wq->state == IDXD_WQ_ENABLED)
-		idxd_wq_disable(wq);
+		idxd_wq_disable(wq, NULL);
 }
 
 static void idxd_vdcm_release(struct mdev_device *mdev)



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v5 10/14] vfio/mdev: idxd: virtual device commands emulation
  2021-02-05 20:52 [PATCH v5 00/14] Add VFIO mediated device support and DEV-MSI support for the idxd driver Dave Jiang
                   ` (8 preceding siblings ...)
  2021-02-05 20:53 ` [PATCH v5 09/14] vfio/mdev: idxd: prep for virtual device commands Dave Jiang
@ 2021-02-05 20:53 ` Dave Jiang
  2021-02-05 20:54 ` [PATCH v5 11/14] vfio/mdev: idxd: ims setup for the vdcm Dave Jiang
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 31+ messages in thread
From: Dave Jiang @ 2021-02-05 20:53 UTC (permalink / raw)
  To: alex.williamson, kwankhede, tglx, vkoul
  Cc: megha.dey, jacob.jun.pan, ashok.raj, jgg, yi.l.liu, baolu.lu,
	kevin.tian, sanjay.k.kumar, tony.luck, dan.j.williams,
	eric.auger, parav, netanelg, shahafs, pbonzini, dmaengine,
	linux-kernel, kvm

Add the helper functions that support the emulation of the commands
submitted to the device command register.

Signed-off-by: Dave Jiang <dave.jiang@intel.com>
---
 drivers/dma/idxd/device.c     |    5 
 drivers/dma/idxd/registers.h  |   16 +
 drivers/vfio/mdev/idxd/mdev.c |    2 
 drivers/vfio/mdev/idxd/mdev.h |    3 
 drivers/vfio/mdev/idxd/vdev.c |  440 +++++++++++++++++++++++++++++++++++++++++
 5 files changed, 460 insertions(+), 6 deletions(-)

diff --git a/drivers/dma/idxd/device.c b/drivers/dma/idxd/device.c
index 245d576ddc43..c5faa23bd8ce 100644
--- a/drivers/dma/idxd/device.c
+++ b/drivers/dma/idxd/device.c
@@ -242,6 +242,7 @@ int idxd_wq_enable(struct idxd_wq *wq, u32 *status)
 	dev_dbg(dev, "WQ %d enabled\n", wq->id);
 	return 0;
 }
+EXPORT_SYMBOL_GPL(idxd_wq_enable);
 
 int idxd_wq_disable(struct idxd_wq *wq, u32 *status)
 {
@@ -299,6 +300,7 @@ int idxd_wq_drain(struct idxd_wq *wq, u32 *status)
 	dev_dbg(dev, "WQ %d drained\n", wq->id);
 	return 0;
 }
+EXPORT_SYMBOL_GPL(idxd_wq_drain);
 
 int idxd_wq_map_portal(struct idxd_wq *wq)
 {
@@ -351,6 +353,7 @@ int idxd_wq_abort(struct idxd_wq *wq, u32 *status)
 	dev_dbg(dev, "WQ %d aborted\n", wq->id);
 	return 0;
 }
+EXPORT_SYMBOL_GPL(idxd_wq_abort);
 
 int idxd_wq_set_pasid(struct idxd_wq *wq, int pasid)
 {
@@ -470,6 +473,7 @@ void idxd_wq_setup_pasid(struct idxd_wq *wq, int pasid)
 	wq->wqcfg->pasid = pasid;
 	iowrite32(wq->wqcfg->bits[WQCFG_PASID_IDX], idxd->reg_base + offset);
 }
+EXPORT_SYMBOL_GPL(idxd_wq_setup_pasid);
 
 void idxd_wq_setup_priv(struct idxd_wq *wq, int priv)
 {
@@ -483,6 +487,7 @@ void idxd_wq_setup_priv(struct idxd_wq *wq, int priv)
 	wq->wqcfg->priv = !!priv;
 	iowrite32(wq->wqcfg->bits[WQCFG_PRIV_IDX], idxd->reg_base + offset);
 }
+EXPORT_SYMBOL_GPL(idxd_wq_setup_priv);
 
 /* Device control bits */
 static inline bool idxd_is_enabled(struct idxd_device *idxd)
diff --git a/drivers/dma/idxd/registers.h b/drivers/dma/idxd/registers.h
index 50ea94259c99..0f985787417c 100644
--- a/drivers/dma/idxd/registers.h
+++ b/drivers/dma/idxd/registers.h
@@ -120,7 +120,8 @@ union gencfg_reg {
 union genctrl_reg {
 	struct {
 		u32 softerr_int_en:1;
-		u32 rsvd:31;
+		u32 halt_state_int_en:1;
+		u32 rsvd:30;
 	};
 	u32 bits;
 } __packed;
@@ -142,6 +143,8 @@ enum idxd_device_status_state {
 	IDXD_DEVICE_STATE_HALT,
 };
 
+#define IDXD_GENSTATS_MASK		0x03
+
 enum idxd_device_reset_type {
 	IDXD_DEVICE_RESET_SOFTWARE = 0,
 	IDXD_DEVICE_RESET_FLR,
@@ -154,6 +157,7 @@ enum idxd_device_reset_type {
 #define IDXD_INTC_CMD			0x02
 #define IDXD_INTC_OCCUPY			0x04
 #define IDXD_INTC_PERFMON_OVFL		0x08
+#define IDXD_INTC_HALT_STATE		0x10
 
 #define IDXD_CMD_OFFSET			0xa0
 union idxd_command_reg {
@@ -165,6 +169,7 @@ union idxd_command_reg {
 	};
 	u32 bits;
 } __packed;
+#define IDXD_CMD_INT_MASK		0x80000000
 
 enum idxd_cmd {
 	IDXD_CMD_ENABLE_DEVICE = 1,
@@ -228,10 +233,11 @@ enum idxd_cmdsts_err {
 	/* disable device errors */
 	IDXD_CMDSTS_ERR_DIS_DEV_EN = 0x31,
 	/* disable WQ, drain WQ, abort WQ, reset WQ */
-	IDXD_CMDSTS_ERR_DEV_NOT_EN,
+	IDXD_CMDSTS_ERR_WQ_NOT_EN,
 	/* request interrupt handle */
 	IDXD_CMDSTS_ERR_INVAL_INT_IDX = 0x41,
 	IDXD_CMDSTS_ERR_NO_HANDLE,
+	IDXD_CMDSTS_ERR_INVAL_INT_IDX_RELEASE,
 };
 
 #define IDXD_CMDCAP_OFFSET		0xb0
@@ -353,6 +359,12 @@ union wqcfg {
 	u32 bits[8];
 } __packed;
 
+enum idxd_wq_hw_state {
+	IDXD_WQ_DEV_DISABLED = 0,
+	IDXD_WQ_DEV_ENABLED,
+	IDXD_WQ_DEV_BUSY,
+};
+
 #define WQCFG_PASID_IDX		2
 #define WQCFG_PRIV_IDX		2
 #define WQCFG_MODE_DEDICATED	1
diff --git a/drivers/vfio/mdev/idxd/mdev.c b/drivers/vfio/mdev/idxd/mdev.c
index 67e6b33468cd..7cde707021db 100644
--- a/drivers/vfio/mdev/idxd/mdev.c
+++ b/drivers/vfio/mdev/idxd/mdev.c
@@ -52,7 +52,7 @@ static char idxd_iax_1dwq_name[IDXD_MDEV_NAME_LEN];
 static int idxd_vdcm_set_irqs(struct vdcm_idxd *vidxd, uint32_t flags, unsigned int index,
 			      unsigned int start, unsigned int count, void *data);
 
-static int idxd_mdev_get_pasid(struct mdev_device *mdev, u32 *pasid)
+int idxd_mdev_get_pasid(struct mdev_device *mdev, u32 *pasid)
 {
 	struct vfio_group *vfio_group;
 	struct iommu_domain *iommu_domain;
diff --git a/drivers/vfio/mdev/idxd/mdev.h b/drivers/vfio/mdev/idxd/mdev.h
index 7ca50f054714..8421b4962ac7 100644
--- a/drivers/vfio/mdev/idxd/mdev.h
+++ b/drivers/vfio/mdev/idxd/mdev.h
@@ -38,6 +38,7 @@ struct ims_irq_entry {
 	bool irq_set;
 	int id;
 	int irq;
+	int ims_idx;
 };
 
 struct idxd_vdev {
@@ -112,4 +113,6 @@ static inline u64 get_reg_val(void *buf, int size)
 	return val;
 }
 
+int idxd_mdev_get_pasid(struct mdev_device *mdev, u32 *pasid);
+
 #endif
diff --git a/drivers/vfio/mdev/idxd/vdev.c b/drivers/vfio/mdev/idxd/vdev.c
index 958b09987e5c..766fd98e9eea 100644
--- a/drivers/vfio/mdev/idxd/vdev.c
+++ b/drivers/vfio/mdev/idxd/vdev.c
@@ -492,17 +492,451 @@ void vidxd_mmio_init(struct vdcm_idxd *vidxd)
 
 static void idxd_complete_command(struct vdcm_idxd *vidxd, enum idxd_cmdsts_err val)
 {
-	/* PLACEHOLDER */
+	u8 *bar0 = vidxd->bar0;
+	u32 *cmd = (u32 *)(bar0 + IDXD_CMD_OFFSET);
+	u32 *cmdsts = (u32 *)(bar0 + IDXD_CMDSTS_OFFSET);
+	u32 *intcause = (u32 *)(bar0 + IDXD_INTCAUSE_OFFSET);
+	struct mdev_device *mdev = vidxd->vdev.mdev;
+	struct device *dev = mdev_dev(mdev);
+
+	*cmdsts = val;
+	dev_dbg(dev, "%s: cmd: %#x  status: %#x\n", __func__, *cmd, val);
+
+	if (*cmd & IDXD_CMD_INT_MASK) {
+		*intcause |= IDXD_INTC_CMD;
+		vidxd_send_interrupt(&vidxd->irq_entries[0]);
+	}
+}
+
+static void vidxd_enable(struct vdcm_idxd *vidxd)
+{
+	u8 *bar0 = vidxd->bar0;
+	union gensts_reg *gensts = (union gensts_reg *)(bar0 + IDXD_GENSTATS_OFFSET);
+	struct mdev_device *mdev = vidxd->vdev.mdev;
+	struct device *dev = mdev_dev(mdev);
+
+	dev_dbg(dev, "%s\n", __func__);
+	if (gensts->state == IDXD_DEVICE_STATE_ENABLED)
+		return idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_DEV_ENABLED);
+
+	/* Check PCI configuration */
+	if (!(vidxd->cfg[PCI_COMMAND] & PCI_COMMAND_MASTER))
+		return idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_BUSMASTER_EN);
+
+	gensts->state = IDXD_DEVICE_STATE_ENABLED;
+
+	return idxd_complete_command(vidxd, IDXD_CMDSTS_SUCCESS);
+}
+
+static void vidxd_disable(struct vdcm_idxd *vidxd)
+{
+	struct idxd_wq *wq;
+	union wqcfg *wqcfg;
+	u8 *bar0 = vidxd->bar0;
+	union gensts_reg *gensts = (union gensts_reg *)(bar0 + IDXD_GENSTATS_OFFSET);
+	struct mdev_device *mdev = vidxd->vdev.mdev;
+	struct device *dev = mdev_dev(mdev);
+	u32 status;
+
+	dev_dbg(dev, "%s\n", __func__);
+	if (gensts->state == IDXD_DEVICE_STATE_DISABLED) {
+		idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_DIS_DEV_EN);
+		return;
+	}
+
+	wqcfg = (union wqcfg *)(bar0 + VIDXD_WQCFG_OFFSET);
+	wq = vidxd->wq;
+
+	/* If it is a DWQ, need to disable the DWQ as well */
+	if (wq_dedicated(wq)) {
+		idxd_wq_disable(wq, &status);
+		if (status) {
+			dev_warn(dev, "vidxd disable (wq disable) failed: %#x\n", status);
+			idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_DIS_DEV_EN);
+			return;
+		}
+	} else {
+		idxd_wq_drain(wq, &status);
+		if (status)
+			dev_warn(dev, "vidxd disable (wq drain) failed: %#x\n", status);
+	}
+
+	wqcfg->wq_state = 0;
+	gensts->state = IDXD_DEVICE_STATE_DISABLED;
+	idxd_complete_command(vidxd, IDXD_CMDSTS_SUCCESS);
+}
+
+static void vidxd_drain_all(struct vdcm_idxd *vidxd)
+{
+	struct mdev_device *mdev = vidxd->vdev.mdev;
+	struct device *dev = mdev_dev(mdev);
+	struct idxd_wq *wq = vidxd->wq;
+
+	dev_dbg(dev, "%s\n", __func__);
+
+	idxd_wq_drain(wq, NULL);
+	idxd_complete_command(vidxd, IDXD_CMDSTS_SUCCESS);
+}
+
+static void vidxd_wq_drain(struct vdcm_idxd *vidxd, int val)
+{
+	struct mdev_device *mdev = vidxd->vdev.mdev;
+	struct device *dev = mdev_dev(mdev);
+	u8 *bar0 = vidxd->bar0;
+	union wqcfg *wqcfg = (union wqcfg *)(bar0 + VIDXD_WQCFG_OFFSET);
+	struct idxd_wq *wq = vidxd->wq;
+	u32 status;
+
+	dev_dbg(dev, "%s\n", __func__);
+	if (wqcfg->wq_state != IDXD_WQ_DEV_ENABLED) {
+		idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_WQ_NOT_EN);
+		return;
+	}
+
+	idxd_wq_drain(wq, &status);
+	if (status) {
+		dev_dbg(dev, "wq drain failed: %#x\n", status);
+		idxd_complete_command(vidxd, status);
+		return;
+	}
+
+	idxd_complete_command(vidxd, IDXD_CMDSTS_SUCCESS);
+}
+
+static void vidxd_abort_all(struct vdcm_idxd *vidxd)
+{
+	struct mdev_device *mdev = vidxd->vdev.mdev;
+	struct device *dev = mdev_dev(mdev);
+	struct idxd_wq *wq = vidxd->wq;
+
+	dev_dbg(dev, "%s\n", __func__);
+	idxd_wq_abort(wq, NULL);
+	idxd_complete_command(vidxd, IDXD_CMDSTS_SUCCESS);
+}
+
+static void vidxd_wq_abort(struct vdcm_idxd *vidxd, int val)
+{
+	struct mdev_device *mdev = vidxd->vdev.mdev;
+	struct device *dev = mdev_dev(mdev);
+	u8 *bar0 = vidxd->bar0;
+	union wqcfg *wqcfg = (union wqcfg *)(bar0 + VIDXD_WQCFG_OFFSET);
+	struct idxd_wq *wq = vidxd->wq;
+	u32 status;
+
+	dev_dbg(dev, "%s\n", __func__);
+	if (wqcfg->wq_state != IDXD_WQ_DEV_ENABLED) {
+		idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_WQ_NOT_EN);
+		return;
+	}
+
+	idxd_wq_abort(wq, &status);
+	if (status) {
+		dev_dbg(dev, "wq abort failed: %#x\n", status);
+		idxd_complete_command(vidxd, status);
+		return;
+	}
+
+	idxd_complete_command(vidxd, IDXD_CMDSTS_SUCCESS);
 }
 
 void vidxd_reset(struct vdcm_idxd *vidxd)
 {
-	/* PLACEHOLDER */
+	struct mdev_device *mdev = vidxd->vdev.mdev;
+	struct device *dev = mdev_dev(mdev);
+	u8 *bar0 = vidxd->bar0;
+	union gensts_reg *gensts = (union gensts_reg *)(bar0 + IDXD_GENSTATS_OFFSET);
+	struct idxd_wq *wq;
+
+	dev_dbg(dev, "%s\n", __func__);
+	gensts->state = IDXD_DEVICE_STATE_DRAIN;
+	wq = vidxd->wq;
+
+	if (wq->state == IDXD_WQ_ENABLED) {
+		idxd_wq_abort(wq, NULL);
+		idxd_wq_disable(wq, NULL);
+	}
+
+	vidxd_mmio_init(vidxd);
+	gensts->state = IDXD_DEVICE_STATE_DISABLED;
+	idxd_complete_command(vidxd, IDXD_CMDSTS_SUCCESS);
+}
+
+static void vidxd_wq_reset(struct vdcm_idxd *vidxd, int wq_id_mask)
+{
+	struct idxd_wq *wq;
+	u8 *bar0 = vidxd->bar0;
+	union wqcfg *wqcfg = (union wqcfg *)(bar0 + VIDXD_WQCFG_OFFSET);
+	struct mdev_device *mdev = vidxd->vdev.mdev;
+	struct device *dev = mdev_dev(mdev);
+	u32 status;
+
+	wq = vidxd->wq;
+	dev_dbg(dev, "vidxd reset wq %u:%u\n", 0, wq->id);
+
+	if (wqcfg->wq_state != IDXD_WQ_DEV_ENABLED) {
+		idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_WQ_NOT_EN);
+		return;
+	}
+
+	idxd_wq_abort(wq, &status);
+	if (status) {
+		dev_dbg(dev, "vidxd reset wq failed to abort: %#x\n", status);
+		idxd_complete_command(vidxd, status);
+		return;
+	}
+
+	idxd_wq_disable(wq, &status);
+	if (status) {
+		dev_dbg(dev, "vidxd reset wq failed to disable: %#x\n", status);
+		idxd_complete_command(vidxd, status);
+		return;
+	}
+
+	wqcfg->wq_state = IDXD_WQ_DEV_DISABLED;
+	idxd_complete_command(vidxd, IDXD_CMDSTS_SUCCESS);
+}
+
+static void vidxd_alloc_int_handle(struct vdcm_idxd *vidxd, int operand)
+{
+	bool ims = !!(operand & CMD_INT_HANDLE_IMS);
+	u32 cmdsts;
+	struct mdev_device *mdev = vidxd->vdev.mdev;
+	struct device *dev = mdev_dev(mdev);
+	int ims_idx, vidx;
+
+	vidx = operand & GENMASK(15, 0);
+
+	dev_dbg(dev, "allocating int handle for %d\n", vidx);
+
+	/* vidx cannot be 0 since vector 0 is emulated and does not require an IMS handle */
+	if (vidx <= 0 || vidx >= VIDXD_MAX_MSIX_ENTRIES) {
+		idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_INVAL_INT_IDX);
+		return;
+	}
+
+	if (ims) {
+		dev_warn(dev, "IMS allocation is not implemented yet\n");
+		idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_NO_HANDLE);
+		return;
+	}
+
+	ims_idx = vidxd->irq_entries[vidx].ims_idx;
+	cmdsts = ims_idx << IDXD_CMDSTS_RES_SHIFT;
+	dev_dbg(dev, "requested index %d handle %d\n", vidx, ims_idx);
+	idxd_complete_command(vidxd, cmdsts);
+}
+
+static void vidxd_release_int_handle(struct vdcm_idxd *vidxd, int operand)
+{
+	struct mdev_device *mdev = vidxd->vdev.mdev;
+	struct device *dev = mdev_dev(mdev);
+	bool ims = !!(operand & CMD_INT_HANDLE_IMS);
+	int handle, i;
+	bool found = false;
+
+	handle = operand & GENMASK(15, 0);
+	dev_dbg(dev, "releasing int handle %d\n", handle);
+
+	if (ims) {
+		dev_warn(dev, "IMS allocation is not implemented yet\n");
+		idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_INVAL_INT_IDX_RELEASE);
+		return;
+	}
+
+	/* IMS-backed entries start at 1; entry 0 is the emulated vector */
+	for (i = 1; i < VIDXD_MAX_MSIX_ENTRIES; i++) {
+		if (vidxd->irq_entries[i].ims_idx == handle) {
+			found = true;
+			break;
+		}
+	}
+
+	if (!found) {
+		dev_warn(dev, "Freeing unallocated int handle.\n");
+		idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_INVAL_INT_IDX_RELEASE);
+		return;
+	}
+
+	dev_dbg(dev, "int handle %d released.\n", handle);
+	idxd_complete_command(vidxd, IDXD_CMDSTS_SUCCESS);
+}
+
+static void vidxd_wq_enable(struct vdcm_idxd *vidxd, int wq_id)
+{
+	struct idxd_wq *wq;
+	u8 *bar0 = vidxd->bar0;
+	union wq_cap_reg *wqcap;
+	struct mdev_device *mdev = vidxd->vdev.mdev;
+	struct device *dev = mdev_dev(mdev);
+	struct idxd_device *idxd;
+	union wqcfg *vwqcfg, *wqcfg;
+	unsigned long flags;
+	u32 status, wq_pasid;
+	int priv, rc;
+
+	if (wq_id >= VIDXD_MAX_WQS) {
+		idxd_complete_command(vidxd, IDXD_CMDSTS_INVAL_WQIDX);
+		return;
+	}
+
+	idxd = vidxd->idxd;
+	wq = vidxd->wq;
+
+	dev_dbg(dev, "%s: wq %u:%u\n", __func__, wq_id, wq->id);
+
+	vwqcfg = (union wqcfg *)(bar0 + VIDXD_WQCFG_OFFSET + wq_id * 32);
+	wqcap = (union wq_cap_reg *)(bar0 + IDXD_WQCAP_OFFSET);
+	wqcfg = wq->wqcfg;
+
+	if (vidxd_state(vidxd) != IDXD_DEVICE_STATE_ENABLED) {
+		idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_DEV_NOTEN);
+		return;
+	}
+
+	if (vwqcfg->wq_state != IDXD_WQ_DEV_DISABLED) {
+		idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_WQ_ENABLED);
+		return;
+	}
+
+	if (wq_dedicated(wq) && wqcap->dedicated_mode == 0) {
+		idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_WQ_MODE);
+		return;
+	}
+
+	priv = 1;
+	rc = idxd_mdev_get_pasid(mdev, &wq_pasid);
+	if (rc < 0) {
+		dev_err(dev, "idxd pasid setup failed wq %d: %d\n", wq->id, rc);
+		idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_PASID_EN);
+		return;
+	}
+
+	/* Clear pasid_en, pasid, and priv values */
+	wqcfg->bits[WQCFG_PASID_IDX] &= ~GENMASK(29, 8);
+	wqcfg->priv = priv;
+	wqcfg->pasid_en = 1;
+	wqcfg->pasid = wq_pasid;
+	dev_dbg(dev, "program pasid %d in wq %d\n", wq_pasid, wq->id);
+	spin_lock_irqsave(&idxd->dev_lock, flags);
+	idxd_wq_setup_pasid(wq, wq_pasid);
+	idxd_wq_setup_priv(wq, priv);
+	spin_unlock_irqrestore(&idxd->dev_lock, flags);
+	idxd_wq_enable(wq, &status);
+	if (status) {
+		dev_err(dev, "vidxd enable wq %d failed\n", wq->id);
+		idxd_complete_command(vidxd, status);
+		return;
+	}
+
+	vwqcfg->wq_state = IDXD_WQ_DEV_ENABLED;
+	idxd_complete_command(vidxd, IDXD_CMDSTS_SUCCESS);
+}
+
+static void vidxd_wq_disable(struct vdcm_idxd *vidxd, int wq_id_mask)
+{
+	struct idxd_wq *wq;
+	union wqcfg *wqcfg;
+	u8 *bar0 = vidxd->bar0;
+	struct mdev_device *mdev = vidxd->vdev.mdev;
+	struct device *dev = mdev_dev(mdev);
+	u32 status;
+
+	wq = vidxd->wq;
+
+	dev_dbg(dev, "vidxd disable wq %u:%u\n", 0, wq->id);
+
+	wqcfg = (union wqcfg *)(bar0 + VIDXD_WQCFG_OFFSET);
+	if (wqcfg->wq_state != IDXD_WQ_DEV_ENABLED) {
+		idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_WQ_NOT_EN);
+		return;
+	}
+
+	/* If it is a DWQ, need to disable the DWQ as well */
+	if (wq_dedicated(wq)) {
+		idxd_wq_disable(wq, &status);
+		if (status) {
+			dev_warn(dev, "vidxd disable wq failed: %#x\n", status);
+			idxd_complete_command(vidxd, status);
+			return;
+		}
+	} else {
+		idxd_wq_drain(wq, &status);
+		if (status) {
+			dev_warn(dev, "vidxd disable drain wq failed: %#x\n", status);
+			idxd_complete_command(vidxd, status);
+			return;
+		}
+	}
+
+	wqcfg->wq_state = IDXD_WQ_DEV_DISABLED;
+	idxd_complete_command(vidxd, IDXD_CMDSTS_SUCCESS);
+}
+
+static bool command_supported(struct vdcm_idxd *vidxd, u32 cmd)
+{
+	struct idxd_device *idxd = vidxd->idxd;
+
+	if (cmd == IDXD_CMD_REQUEST_INT_HANDLE || cmd == IDXD_CMD_RELEASE_INT_HANDLE)
+		return true;
+
+	return !!(idxd->hw.opcap.bits[0] & BIT_ULL(cmd));
 }
 
 void vidxd_do_command(struct vdcm_idxd *vidxd, u32 val)
 {
-	/* PLACEHOLDER */
+	union idxd_command_reg *reg = (union idxd_command_reg *)(vidxd->bar0 + IDXD_CMD_OFFSET);
+	struct mdev_device *mdev = vidxd->vdev.mdev;
+	struct device *dev = mdev_dev(mdev);
+
+	reg->bits = val;
+
+	dev_dbg(dev, "%s: cmd code: %u reg: %x\n", __func__, reg->cmd, reg->bits);
+
+	if (!command_supported(vidxd, reg->cmd)) {
+		idxd_complete_command(vidxd, IDXD_CMDSTS_INVAL_CMD);
+		return;
+	}
+
+	switch (reg->cmd) {
+	case IDXD_CMD_ENABLE_DEVICE:
+		vidxd_enable(vidxd);
+		break;
+	case IDXD_CMD_DISABLE_DEVICE:
+		vidxd_disable(vidxd);
+		break;
+	case IDXD_CMD_DRAIN_ALL:
+		vidxd_drain_all(vidxd);
+		break;
+	case IDXD_CMD_ABORT_ALL:
+		vidxd_abort_all(vidxd);
+		break;
+	case IDXD_CMD_RESET_DEVICE:
+		vidxd_reset(vidxd);
+		break;
+	case IDXD_CMD_ENABLE_WQ:
+		vidxd_wq_enable(vidxd, reg->operand);
+		break;
+	case IDXD_CMD_DISABLE_WQ:
+		vidxd_wq_disable(vidxd, reg->operand);
+		break;
+	case IDXD_CMD_DRAIN_WQ:
+		vidxd_wq_drain(vidxd, reg->operand);
+		break;
+	case IDXD_CMD_ABORT_WQ:
+		vidxd_wq_abort(vidxd, reg->operand);
+		break;
+	case IDXD_CMD_RESET_WQ:
+		vidxd_wq_reset(vidxd, reg->operand);
+		break;
+	case IDXD_CMD_REQUEST_INT_HANDLE:
+		vidxd_alloc_int_handle(vidxd, reg->operand);
+		break;
+	case IDXD_CMD_RELEASE_INT_HANDLE:
+		vidxd_release_int_handle(vidxd, reg->operand);
+		break;
+	default:
+		idxd_complete_command(vidxd, IDXD_CMDSTS_INVAL_CMD);
+		break;
+	}
 }
 
 int vidxd_setup_ims_entries(struct vdcm_idxd *vidxd)




* [PATCH v5 11/14] vfio/mdev: idxd: ims setup for the vdcm
  2021-02-05 20:52 [PATCH v5 00/14] Add VFIO mediated device support and DEV-MSI support for the idxd driver Dave Jiang
                   ` (9 preceding siblings ...)
  2021-02-05 20:53 ` [PATCH v5 10/14] vfio/mdev: idxd: virtual device commands emulation Dave Jiang
@ 2021-02-05 20:54 ` Dave Jiang
  2021-02-05 20:54 ` [PATCH v5 12/14] vfio/mdev: idxd: add irq bypass for IMS vectors Dave Jiang
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 31+ messages in thread
From: Dave Jiang @ 2021-02-05 20:54 UTC (permalink / raw)
  To: alex.williamson, kwankhede, tglx, vkoul
  Cc: megha.dey, jacob.jun.pan, ashok.raj, jgg, yi.l.liu, baolu.lu,
	kevin.tian, sanjay.k.kumar, tony.luck, dan.j.williams,
	eric.auger, parav, netanelg, shahafs, pbonzini, dmaengine,
	linux-kernel, kvm

Add setup for IMS enabling for the mediated device.

On the actual hardware, MSIX vector 0 is the misc interrupt vector and
handles events such as administrative command completion, error
reporting, and performance monitor overflow. MSIX vectors 1...N are
used for descriptor completion interrupts. On the guest kernel, the
MSIX interrupts are backed by the mediated device through emulation or
IMS vectors. Vector 0 is handled through emulation by the host vdcm.
Vector 1 (and more may be supported later) is backed by IMS.

IMS vectors can be set up with interrupt handlers via request_irq(),
just like MSIX interrupts, once the relevant IRQ domain is set.

The msi_domain_alloc_irqs()/msi_domain_free_irqs() APIs can then be
used to allocate and free interrupts from that domain.

Signed-off-by: Dave Jiang <dave.jiang@intel.com>
---
 drivers/dma/idxd/idxd.h       |    1 +
 drivers/vfio/mdev/idxd/mdev.c |   12 +++++++++
 drivers/vfio/mdev/idxd/vdev.c |   53 ++++++++++++++++++++++++++++++++---------
 kernel/irq/msi.c              |    2 ++
 4 files changed, 57 insertions(+), 11 deletions(-)

diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h
index 41eee987c9b7..c5ef6ccc9ba6 100644
--- a/drivers/dma/idxd/idxd.h
+++ b/drivers/dma/idxd/idxd.h
@@ -224,6 +224,7 @@ struct idxd_device {
 	struct workqueue_struct *wq;
 	struct work_struct work;
 
+	struct irq_domain *ims_domain;
 	int *int_handles;
 
 	struct auxiliary_device *mdev_auxdev;
diff --git a/drivers/vfio/mdev/idxd/mdev.c b/drivers/vfio/mdev/idxd/mdev.c
index 7cde707021db..8a4af882a47f 100644
--- a/drivers/vfio/mdev/idxd/mdev.c
+++ b/drivers/vfio/mdev/idxd/mdev.c
@@ -1167,6 +1167,7 @@ static int alloc_supported_types(struct idxd_device *idxd)
 int idxd_mdev_host_init(struct idxd_device *idxd)
 {
 	struct device *dev = &idxd->pdev->dev;
+	struct ims_array_info ims_info;
 	int rc;
 
 	if (!test_bit(IDXD_FLAG_IMS_SUPPORTED, &idxd->flags))
@@ -1188,6 +1189,15 @@ int idxd_mdev_host_init(struct idxd_device *idxd)
 		return -EOPNOTSUPP;
 	}
 
+	ims_info.max_slots = idxd->ims_size;
+	ims_info.slots = idxd->reg_base + idxd->ims_offset;
+	idxd->ims_domain = pci_ims_array_create_msi_irq_domain(idxd->pdev, &ims_info);
+	if (!idxd->ims_domain) {
+		dev_warn(dev, "Failed to acquire IMS domain\n");
+		iommu_dev_disable_feature(dev, IOMMU_DEV_FEAT_AUX);
+		return -ENODEV;
+	}
+
 	return mdev_register_device(dev, &idxd_vdcm_ops);
 }
 
@@ -1196,6 +1206,8 @@ void idxd_mdev_host_release(struct idxd_device *idxd)
 	struct device *dev = &idxd->pdev->dev;
 	int rc;
 
+	irq_domain_remove(idxd->ims_domain);
+
 	mdev_unregister_device(dev);
 	if (iommu_dev_has_feature(dev, IOMMU_DEV_FEAT_AUX)) {
 		rc = iommu_dev_disable_feature(dev, IOMMU_DEV_FEAT_AUX);
diff --git a/drivers/vfio/mdev/idxd/vdev.c b/drivers/vfio/mdev/idxd/vdev.c
index 766fd98e9eea..8626438a9e54 100644
--- a/drivers/vfio/mdev/idxd/vdev.c
+++ b/drivers/vfio/mdev/idxd/vdev.c
@@ -16,6 +16,7 @@
 #include <linux/intel-svm.h>
 #include <linux/kvm_host.h>
 #include <linux/eventfd.h>
+#include <linux/irqchip/irq-ims-msi.h>
 #include <uapi/linux/idxd.h>
 #include "registers.h"
 #include "idxd.h"
@@ -871,6 +872,47 @@ static void vidxd_wq_disable(struct vdcm_idxd *vidxd, int wq_id_mask)
 	idxd_complete_command(vidxd, IDXD_CMDSTS_SUCCESS);
 }
 
+void vidxd_free_ims_entries(struct vdcm_idxd *vidxd)
+{
+	struct mdev_device *mdev = vidxd->vdev.mdev;
+	struct device *dev = mdev_dev(mdev);
+
+	msi_domain_free_irqs(dev_get_msi_domain(dev), dev);
+}
+
+int vidxd_setup_ims_entries(struct vdcm_idxd *vidxd)
+{
+	struct irq_domain *irq_domain;
+	struct idxd_device *idxd = vidxd->idxd;
+	struct mdev_device *mdev = vidxd->vdev.mdev;
+	struct device *dev = mdev_dev(mdev);
+	struct msi_desc *entry;
+	struct ims_irq_entry *irq_entry;
+	int rc, i;
+
+	irq_domain = idxd->ims_domain;
+	dev_set_msi_domain(dev, irq_domain);
+
+	/* Allocate VIDXD_MAX_MSIX_VECS - 1 vectors because vector 0 is emulated and not IMS backed. */
+	rc = msi_domain_alloc_irqs(irq_domain, dev, VIDXD_MAX_MSIX_VECS - 1);
+	if (rc < 0)
+		return rc;
+	/*
+	 * The first MSIX vector on the guest is emulated and not backed by IMS. To keep
+	 * the indexing simple, the irq_entries array also includes the emulated vector,
+	 * so the loop below starts at index 1 to set up all the IMS-backed vectors.
+	 */
+	i = 1;
+	for_each_msi_entry(entry, dev) {
+		irq_entry = &vidxd->irq_entries[i];
+		irq_entry->ims_idx = entry->device_msi.hwirq;
+		irq_entry->irq = entry->irq;
+		i++;
+	}
+
+	return 0;
+}
+
 static bool command_supported(struct vdcm_idxd *vidxd, u32 cmd)
 {
 	struct idxd_device *idxd = vidxd->idxd;
@@ -938,14 +980,3 @@ void vidxd_do_command(struct vdcm_idxd *vidxd, u32 val)
 		break;
 	}
 }
-
-int vidxd_setup_ims_entries(struct vdcm_idxd *vidxd)
-{
-	/* PLACEHOLDER */
-	return 0;
-}
-
-void vidxd_free_ims_entries(struct vdcm_idxd *vidxd)
-{
-	/* PLACEHOLDER */
-}
diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
index d70d92eac322..d95299b4ae79 100644
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -536,6 +536,7 @@ int msi_domain_alloc_irqs(struct irq_domain *domain, struct device *dev,
 
 	return ops->domain_alloc_irqs(domain, dev, nvec);
 }
+EXPORT_SYMBOL_GPL(msi_domain_alloc_irqs);
 
 void __msi_domain_free_irqs(struct irq_domain *domain, struct device *dev)
 {
@@ -572,6 +573,7 @@ void msi_domain_free_irqs(struct irq_domain *domain, struct device *dev)
 
 	return ops->domain_free_irqs(domain, dev);
 }
+EXPORT_SYMBOL_GPL(msi_domain_free_irqs);
 
 /**
  * msi_get_domain_info - Get the MSI interrupt domain info for @domain




* [PATCH v5 12/14] vfio/mdev: idxd: add irq bypass for IMS vectors
  2021-02-05 20:52 [PATCH v5 00/14] Add VFIO mediated device support and DEV-MSI support for the idxd driver Dave Jiang
                   ` (10 preceding siblings ...)
  2021-02-05 20:54 ` [PATCH v5 11/14] vfio/mdev: idxd: ims setup for the vdcm Dave Jiang
@ 2021-02-05 20:54 ` Dave Jiang
  2021-02-05 20:54 ` [PATCH v5 13/14] vfio/mdev: idxd: add new wq state for mdev Dave Jiang
  2021-02-05 20:54 ` [PATCH v5 14/14] vfio/mdev: idxd: add error notification from host driver to mediated device Dave Jiang
  13 siblings, 0 replies; 31+ messages in thread
From: Dave Jiang @ 2021-02-05 20:54 UTC (permalink / raw)
  To: alex.williamson, kwankhede, tglx, vkoul
  Cc: megha.dey, jacob.jun.pan, ashok.raj, jgg, yi.l.liu, baolu.lu,
	kevin.tian, sanjay.k.kumar, tony.luck, dan.j.williams,
	eric.auger, parav, netanelg, shahafs, pbonzini, dmaengine,
	linux-kernel, kvm

Add support to bypass the host for IMS interrupts configured for the guest.

Signed-off-by: Dave Jiang <dave.jiang@intel.com>
---
 drivers/vfio/mdev/Kconfig     |    1 +
 drivers/vfio/mdev/idxd/mdev.c |   17 +++++++++++++++--
 drivers/vfio/mdev/idxd/mdev.h |    1 +
 3 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/mdev/Kconfig b/drivers/vfio/mdev/Kconfig
index e9540e43d1f1..ab0a6f0930bc 100644
--- a/drivers/vfio/mdev/Kconfig
+++ b/drivers/vfio/mdev/Kconfig
@@ -22,6 +22,7 @@ config VFIO_MDEV_IDXD
 	depends on VFIO && VFIO_MDEV && X86_64
 	select AUXILIARY_BUS
 	select IMS_MSI_ARRAY
+	select IRQ_BYPASS_MANAGER
 	default n
 	help
 	  VFIO based mediated device driver for Intel Accelerator Devices driver.
diff --git a/drivers/vfio/mdev/idxd/mdev.c b/drivers/vfio/mdev/idxd/mdev.c
index 8a4af882a47f..d59920f78109 100644
--- a/drivers/vfio/mdev/idxd/mdev.c
+++ b/drivers/vfio/mdev/idxd/mdev.c
@@ -616,9 +616,13 @@ static int msix_trigger_unregister(struct vdcm_idxd *vidxd, int index)
 
 	dev_dbg(dev, "disable MSIX trigger %d\n", index);
 	if (index) {
+		struct irq_bypass_producer *producer;
 		u32 auxval;
 
+		producer = &vidxd->vdev.producer[index];
+		irq_bypass_unregister_producer(producer);
 		irq_entry = &vidxd->irq_entries[index];
+
 		if (irq_entry->irq_set) {
 			free_irq(irq_entry->irq, irq_entry);
 			irq_entry->irq_set = false;
@@ -654,9 +658,10 @@ static int msix_trigger_register(struct vdcm_idxd *vidxd, u32 fd, int index)
 	}
 
 	if (index) {
-		u32 pasid;
-		u32 auxval;
+		struct irq_bypass_producer *producer;
+		u32 pasid, auxval;
 
+		producer = &vidxd->vdev.producer[index];
 		irq_entry = &vidxd->irq_entries[index];
 		rc = idxd_mdev_get_pasid(mdev, &pasid);
 		if (rc < 0)
@@ -682,6 +687,14 @@ static int msix_trigger_register(struct vdcm_idxd *vidxd, u32 fd, int index)
 			irq_set_auxdata(irq_entry->irq, IMS_AUXDATA_CONTROL_WORD, auxval);
 			return rc;
 		}
+
+		producer->token = trigger;
+		producer->irq = irq_entry->irq;
+		rc = irq_bypass_register_producer(producer);
+		if (unlikely(rc))
+			dev_info(dev, "irq bypass producer (token %p) registration failed: %d\n",
+				 producer->token, rc);
+
 		irq_entry->irq_set = true;
 	}
 
diff --git a/drivers/vfio/mdev/idxd/mdev.h b/drivers/vfio/mdev/idxd/mdev.h
index 8421b4962ac7..1f867de416e7 100644
--- a/drivers/vfio/mdev/idxd/mdev.h
+++ b/drivers/vfio/mdev/idxd/mdev.h
@@ -45,6 +45,7 @@ struct idxd_vdev {
 	struct mdev_device *mdev;
 	struct vfio_group *vfio_group;
 	struct eventfd_ctx *msix_trigger[VIDXD_MAX_MSIX_ENTRIES];
+	struct irq_bypass_producer producer[VIDXD_MAX_MSIX_ENTRIES];
 };
 
 struct vdcm_idxd {




* [PATCH v5 13/14] vfio/mdev: idxd: add new wq state for mdev
  2021-02-05 20:52 [PATCH v5 00/14] Add VFIO mediated device support and DEV-MSI support for the idxd driver Dave Jiang
                   ` (11 preceding siblings ...)
  2021-02-05 20:54 ` [PATCH v5 12/14] vfio/mdev: idxd: add irq bypass for IMS vectors Dave Jiang
@ 2021-02-05 20:54 ` Dave Jiang
  2021-02-05 20:54 ` [PATCH v5 14/14] vfio/mdev: idxd: add error notification from host driver to mediated device Dave Jiang
  13 siblings, 0 replies; 31+ messages in thread
From: Dave Jiang @ 2021-02-05 20:54 UTC (permalink / raw)
  To: alex.williamson, kwankhede, tglx, vkoul
  Cc: megha.dey, jacob.jun.pan, ashok.raj, jgg, yi.l.liu, baolu.lu,
	kevin.tian, sanjay.k.kumar, tony.luck, dan.j.williams,
	eric.auger, parav, netanelg, shahafs, pbonzini, dmaengine,
	linux-kernel, kvm

When a dedicated wq is enabled as an mdev, the wq must be disabled on the
device in order to program the pasid into the WQCFG. Introduce a
software-only wq state, IDXD_WQ_LOCKED, to prevent the user from modifying
the configuration while the mdev wq is in this state. While locked, the wq
is not in the DISABLED state, so configuration changes are rejected; it is
also not in the ENABLED state, so actions that are only allowed while
enabled are rejected as well.

For mdev, the dwq is disabled and set to the LOCKED state upon mdev
creation. When ->open() is called on the mdev and a pasid is programmed
into the WQCFG, the dwq is enabled again and transitions to the ENABLED
state.
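The state transitions described above can be sketched as a tiny state
machine (a userspace model for illustration only; the helpers mdev_create()
and mdev_open() are hypothetical, while the three states mirror the kernel
enum idxd_wq_state):

```c
#include <assert.h>

/* Userspace model of the wq states, mirroring enum idxd_wq_state. */
enum wq_state { WQ_DISABLED, WQ_ENABLED, WQ_LOCKED };

/* User-visible configuration writes are only legal while DISABLED. */
static int wq_config_allowed(enum wq_state s)
{
	return s == WQ_DISABLED;
}

/* mdev creation: hardware-disable the wq but hold it LOCKED in software
 * so neither config changes nor ENABLED-only actions are permitted. */
static enum wq_state mdev_create(enum wq_state s)
{
	return s == WQ_ENABLED ? WQ_LOCKED : s;
}

/* mdev ->open(): pasid programmed into WQCFG, wq re-enabled. */
static enum wq_state mdev_open(enum wq_state s)
{
	return s == WQ_LOCKED ? WQ_ENABLED : s;
}
```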

Signed-off-by: Dave Jiang <dave.jiang@intel.com>
---
 drivers/dma/idxd/device.c     |    9 +++++++++
 drivers/dma/idxd/idxd.h       |    1 +
 drivers/dma/idxd/sysfs.c      |    2 ++
 drivers/vfio/mdev/idxd/mdev.c |    4 +++-
 4 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/drivers/dma/idxd/device.c b/drivers/dma/idxd/device.c
index c5faa23bd8ce..1cd64a6a60de 100644
--- a/drivers/dma/idxd/device.c
+++ b/drivers/dma/idxd/device.c
@@ -252,6 +252,14 @@ int idxd_wq_disable(struct idxd_wq *wq, u32 *status)
 
 	dev_dbg(dev, "Disabling WQ %d\n", wq->id);
 
+	/*
+	 * When the wq is in LOCKED state, it means it is disabled but
+	 * also at the same time is "enabled" as far as the user is
+	 * concerned. So a call to disable the hardware can be skipped.
+	 */
+	if (wq->state == IDXD_WQ_LOCKED)
+		goto out;
+
 	if (wq->state != IDXD_WQ_ENABLED) {
 		dev_dbg(dev, "WQ %d in wrong state: %d\n", wq->id, wq->state);
 		return 0;
@@ -268,6 +276,7 @@ int idxd_wq_disable(struct idxd_wq *wq, u32 *status)
 		return -ENXIO;
 	}
 
+ out:
 	wq->state = IDXD_WQ_DISABLED;
 	dev_dbg(dev, "WQ %d disabled\n", wq->id);
 	return 0;
diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h
index c5ef6ccc9ba6..4afe35385f85 100644
--- a/drivers/dma/idxd/idxd.h
+++ b/drivers/dma/idxd/idxd.h
@@ -62,6 +62,7 @@ struct idxd_group {
 enum idxd_wq_state {
 	IDXD_WQ_DISABLED = 0,
 	IDXD_WQ_ENABLED,
+	IDXD_WQ_LOCKED,
 };
 
 enum idxd_wq_flag {
diff --git a/drivers/dma/idxd/sysfs.c b/drivers/dma/idxd/sysfs.c
index 913ff019fe36..1bce55ac24b9 100644
--- a/drivers/dma/idxd/sysfs.c
+++ b/drivers/dma/idxd/sysfs.c
@@ -879,6 +879,8 @@ static ssize_t wq_state_show(struct device *dev,
 		return sprintf(buf, "disabled\n");
 	case IDXD_WQ_ENABLED:
 		return sprintf(buf, "enabled\n");
+	case IDXD_WQ_LOCKED:
+		return sprintf(buf, "locked\n");
 	}
 
 	return sprintf(buf, "unknown\n");
diff --git a/drivers/vfio/mdev/idxd/mdev.c b/drivers/vfio/mdev/idxd/mdev.c
index d59920f78109..60913950a4f5 100644
--- a/drivers/vfio/mdev/idxd/mdev.c
+++ b/drivers/vfio/mdev/idxd/mdev.c
@@ -116,8 +116,10 @@ static void idxd_vdcm_init(struct vdcm_idxd *vidxd)
 
 	vidxd_mmio_init(vidxd);
 
-	if (wq_dedicated(wq) && wq->state == IDXD_WQ_ENABLED)
+	if (wq_dedicated(wq) && wq->state == IDXD_WQ_ENABLED) {
 		idxd_wq_disable(wq, NULL);
+		wq->state = IDXD_WQ_LOCKED;
+	}
 }
 
 static void idxd_vdcm_release(struct mdev_device *mdev)




* [PATCH v5 14/14] vfio/mdev: idxd: add error notification from host driver to mediated device
  2021-02-05 20:52 [PATCH v5 00/14] Add VFIO mediated device support and DEV-MSI support for the idxd driver Dave Jiang
                   ` (12 preceding siblings ...)
  2021-02-05 20:54 ` [PATCH v5 13/14] vfio/mdev: idxd: add new wq state for mdev Dave Jiang
@ 2021-02-05 20:54 ` Dave Jiang
  13 siblings, 0 replies; 31+ messages in thread
From: Dave Jiang @ 2021-02-05 20:54 UTC (permalink / raw)
  To: alex.williamson, kwankhede, tglx, vkoul
  Cc: megha.dey, jacob.jun.pan, ashok.raj, jgg, yi.l.liu, baolu.lu,
	kevin.tian, sanjay.k.kumar, tony.luck, dan.j.williams,
	eric.auger, parav, netanelg, shahafs, pbonzini, dmaengine,
	linux-kernel, kvm

When a device error occurs, the mediated device needs to be notified so
that the error can be forwarded to the guest. Add support to notify the
specific mdev when an error is wq specific, and to broadcast the error to
all mdevs when it is a generic device error.
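The dispatch policy — targeted notification for wq-specific errors,
broadcast for device-wide ones — can be modeled in a few lines of
userspace C (illustration only; the structures and helper names are
hypothetical simplifications of the irq.c logic below):

```c
#include <assert.h>

#define NUM_WQS 4

/* Userspace model: an error targeting a wq notifies only mdevs on that
 * wq; a generic device error is broadcast to every mdev-typed wq. */
struct wq { int type_mdev; int notified; };

static void notify_error(struct wq *wq)
{
	if (wq->type_mdev)
		wq->notified++;
}

static void process_error(struct wq wqs[], int wq_idx)
{
	if (wq_idx >= 0) {		/* SWERR carried a wq index */
		notify_error(&wqs[wq_idx]);
	} else {			/* generic device error */
		for (int i = 0; i < NUM_WQS; i++)
			notify_error(&wqs[i]);
	}
}
```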

Signed-off-by: Dave Jiang <dave.jiang@intel.com>
---
 drivers/dma/idxd/idxd.h       |    7 +++++++
 drivers/dma/idxd/irq.c        |    6 ++++++
 drivers/vfio/mdev/idxd/mdev.c |    5 +++++
 drivers/vfio/mdev/idxd/vdev.c |   32 ++++++++++++++++++++++++++++++++
 drivers/vfio/mdev/idxd/vdev.h |    1 +
 5 files changed, 51 insertions(+)

diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h
index 4afe35385f85..6016df029ed4 100644
--- a/drivers/dma/idxd/idxd.h
+++ b/drivers/dma/idxd/idxd.h
@@ -295,10 +295,17 @@ enum idxd_interrupt_type {
 	IDXD_IRQ_IMS,
 };
 
+struct aux_mdev_ops {
+	void (*notify_error)(struct idxd_wq *wq);
+};
+
 struct idxd_mdev_aux_drv {
 	        struct auxiliary_driver auxiliary_drv;
+		const struct aux_mdev_ops ops;
 };
 
+#define to_mdev_aux_drv(_aux_drv) container_of(_aux_drv, struct idxd_mdev_aux_drv, auxiliary_drv)
+
 static inline int idxd_get_wq_portal_offset(enum idxd_portal_prot prot,
 					    enum idxd_interrupt_type irq_type)
 {
diff --git a/drivers/dma/idxd/irq.c b/drivers/dma/idxd/irq.c
index 090926856df3..9cdd3e789799 100644
--- a/drivers/dma/idxd/irq.c
+++ b/drivers/dma/idxd/irq.c
@@ -118,6 +118,8 @@ static int process_misc_interrupts(struct idxd_device *idxd, u32 cause)
 	u32 val = 0;
 	int i;
 	bool err = false;
+	struct auxiliary_driver *auxdrv = to_auxiliary_drv(idxd->mdev_auxdev->dev.driver);
+	struct idxd_mdev_aux_drv *mdevdrv = to_mdev_aux_drv(auxdrv);
 
 	if (cause & IDXD_INTC_ERR) {
 		spin_lock_bh(&idxd->dev_lock);
@@ -132,6 +134,8 @@ static int process_misc_interrupts(struct idxd_device *idxd, u32 cause)
 
 			if (wq->type == IDXD_WQT_USER)
 				wake_up_interruptible(&wq->idxd_cdev.err_queue);
+			else if (wq->type == IDXD_WQT_MDEV)
+				mdevdrv->ops.notify_error(wq);
 		} else {
 			int i;
 
@@ -140,6 +144,8 @@ static int process_misc_interrupts(struct idxd_device *idxd, u32 cause)
 
 				if (wq->type == IDXD_WQT_USER)
 					wake_up_interruptible(&wq->idxd_cdev.err_queue);
+				else if (wq->type == IDXD_WQT_MDEV)
+					mdevdrv->ops.notify_error(wq);
 			}
 		}
 
diff --git a/drivers/vfio/mdev/idxd/mdev.c b/drivers/vfio/mdev/idxd/mdev.c
index 60913950a4f5..edccaad66c8c 100644
--- a/drivers/vfio/mdev/idxd/mdev.c
+++ b/drivers/vfio/mdev/idxd/mdev.c
@@ -1266,12 +1266,17 @@ static const struct auxiliary_device_id idxd_mdev_auxbus_id_table[] = {
 };
 MODULE_DEVICE_TABLE(auxiliary, idxd_mdev_auxbus_id_table);
 
+static const struct aux_mdev_ops aux_mdev_ops = {
+	.notify_error = idxd_wq_vidxd_send_errors,
+};
+
 static struct idxd_mdev_aux_drv idxd_mdev_aux_drv = {
 	.auxiliary_drv = {
 		.id_table = idxd_mdev_auxbus_id_table,
 		.probe = idxd_mdev_aux_probe,
 		.remove = idxd_mdev_aux_remove,
 	},
+	.ops = aux_mdev_ops,
 };
 
 static int idxd_mdev_auxdev_drv_register(struct idxd_mdev_aux_drv *drv)
diff --git a/drivers/vfio/mdev/idxd/vdev.c b/drivers/vfio/mdev/idxd/vdev.c
index 8626438a9e54..3aa9d5b870e8 100644
--- a/drivers/vfio/mdev/idxd/vdev.c
+++ b/drivers/vfio/mdev/idxd/vdev.c
@@ -980,3 +980,35 @@ void vidxd_do_command(struct vdcm_idxd *vidxd, u32 val)
 		break;
 	}
 }
+
+static void vidxd_send_errors(struct vdcm_idxd *vidxd)
+{
+	struct idxd_device *idxd = vidxd->idxd;
+	u8 *bar0 = vidxd->bar0;
+	union sw_err_reg *swerr = (union sw_err_reg *)(bar0 + IDXD_SWERR_OFFSET);
+	union genctrl_reg *genctrl = (union genctrl_reg *)(bar0 + IDXD_GENCTRL_OFFSET);
+	u32 *intcause = (u32 *)(bar0 + IDXD_INTCAUSE_OFFSET);
+	int i;
+
+	if (swerr->valid) {
+		if (!swerr->overflow)
+			swerr->overflow = 1;
+		return;
+	}
+
+	lockdep_assert_held(&idxd->dev_lock);
+	for (i = 0; i < 4; i++)
+		swerr->bits[i] = idxd->sw_err.bits[i];
+
+	*intcause |= IDXD_INTC_ERR;
+	if (genctrl->softerr_int_en)
+		vidxd_send_interrupt(&vidxd->irq_entries[0]);
+}
+
+void idxd_wq_vidxd_send_errors(struct idxd_wq *wq)
+{
+	struct vdcm_idxd *vidxd;
+
+	list_for_each_entry(vidxd, &wq->vdcm_list, list)
+		vidxd_send_errors(vidxd);
+}
diff --git a/drivers/vfio/mdev/idxd/vdev.h b/drivers/vfio/mdev/idxd/vdev.h
index fc0f405baa40..00df08f9a963 100644
--- a/drivers/vfio/mdev/idxd/vdev.h
+++ b/drivers/vfio/mdev/idxd/vdev.h
@@ -23,5 +23,6 @@ int vidxd_send_interrupt(struct ims_irq_entry *iie);
 int vidxd_setup_ims_entries(struct vdcm_idxd *vidxd);
 void vidxd_free_ims_entries(struct vdcm_idxd *vidxd);
 void vidxd_do_command(struct vdcm_idxd *vidxd, u32 val);
+void idxd_wq_vidxd_send_errors(struct idxd_wq *wq);
 
 #endif




* Re: [PATCH v5 02/14] dmaengine: idxd: add IMS detection in base driver
  2021-02-05 20:53 ` [PATCH v5 02/14] dmaengine: idxd: add IMS detection in base driver Dave Jiang
@ 2021-02-10 23:30   ` Jason Gunthorpe
  2021-02-10 23:32     ` Dave Jiang
  0 siblings, 1 reply; 31+ messages in thread
From: Jason Gunthorpe @ 2021-02-10 23:30 UTC (permalink / raw)
  To: Dave Jiang
  Cc: alex.williamson, kwankhede, tglx, vkoul, megha.dey,
	jacob.jun.pan, ashok.raj, yi.l.liu, baolu.lu, kevin.tian,
	sanjay.k.kumar, tony.luck, dan.j.williams, eric.auger, parav,
	netanelg, shahafs, pbonzini, dmaengine, linux-kernel, kvm

On Fri, Feb 05, 2021 at 01:53:05PM -0700, Dave Jiang wrote:

> diff --git a/drivers/dma/idxd/sysfs.c b/drivers/dma/idxd/sysfs.c
> index 21c1e23cdf23..ab5c76e1226b 100644
> +++ b/drivers/dma/idxd/sysfs.c
> @@ -1444,6 +1444,14 @@ static ssize_t numa_node_show(struct device *dev,
>  }
>  static DEVICE_ATTR_RO(numa_node);
>  
> +static ssize_t ims_size_show(struct device *dev, struct device_attribute *attr, char *buf)
> +{
> +	struct idxd_device *idxd = container_of(dev, struct idxd_device, conf_dev);
> +
> +	return sprintf(buf, "%u\n", idxd->ims_size);
> +}

use sysfs_emit for all new sysfs functions please
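sysfs_emit() is essentially a PAGE_SIZE-bounded scnprintf() that also
rejects a buffer pointer which is not page aligned. A userspace model of
that behavior, for illustration only (sysfs_emit_model is a hypothetical
stand-in, not the kernel helper):

```c
#include <assert.h>
#include <stdarg.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define PAGE_SIZE 4096UL

/* Model of why sysfs_emit() is preferred over raw sprintf(): output is
 * bounded to the PAGE_SIZE sysfs buffer, and passing an offset (non
 * page-aligned) pointer — a sign the caller is appending without
 * tracking the bound — is rejected. */
static int sysfs_emit_model(char *buf, const char *fmt, ...)
{
	va_list args;
	int len;

	if ((uintptr_t)buf & (PAGE_SIZE - 1))
		return 0;

	va_start(args, fmt);
	len = vsnprintf(buf, PAGE_SIZE, fmt, args);
	va_end(args);
	return len < (int)PAGE_SIZE ? len : (int)PAGE_SIZE - 1;
}
```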

Jason


* Re: [PATCH v5 02/14] dmaengine: idxd: add IMS detection in base driver
  2021-02-10 23:30   ` Jason Gunthorpe
@ 2021-02-10 23:32     ` Dave Jiang
  0 siblings, 0 replies; 31+ messages in thread
From: Dave Jiang @ 2021-02-10 23:32 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: alex.williamson, kwankhede, tglx, vkoul, megha.dey,
	jacob.jun.pan, ashok.raj, yi.l.liu, baolu.lu, kevin.tian,
	sanjay.k.kumar, tony.luck, dan.j.williams, eric.auger, parav,
	netanelg, shahafs, pbonzini, dmaengine, linux-kernel, kvm


On 2/10/2021 4:30 PM, Jason Gunthorpe wrote:
> On Fri, Feb 05, 2021 at 01:53:05PM -0700, Dave Jiang wrote:
>
>> diff --git a/drivers/dma/idxd/sysfs.c b/drivers/dma/idxd/sysfs.c
>> index 21c1e23cdf23..ab5c76e1226b 100644
>> +++ b/drivers/dma/idxd/sysfs.c
>> @@ -1444,6 +1444,14 @@ static ssize_t numa_node_show(struct device *dev,
>>   }
>>   static DEVICE_ATTR_RO(numa_node);
>>   
>> +static ssize_t ims_size_show(struct device *dev, struct device_attribute *attr, char *buf)
>> +{
>> +	struct idxd_device *idxd = container_of(dev, struct idxd_device, conf_dev);
>> +
>> +	return sprintf(buf, "%u\n", idxd->ims_size);
>> +}
> use sysfs_emit for all new sysfs functions please

Will fix. Thanks!


>
> Jason


* Re: [PATCH v5 04/14] vfio/mdev: idxd: Add auxialary device plumbing for idxd mdev support
  2021-02-05 20:53 ` [PATCH v5 04/14] vfio/mdev: idxd: Add auxialary device plumbing for idxd mdev support Dave Jiang
@ 2021-02-10 23:46   ` Jason Gunthorpe
  2021-02-12 18:56     ` Dave Jiang
  0 siblings, 1 reply; 31+ messages in thread
From: Jason Gunthorpe @ 2021-02-10 23:46 UTC (permalink / raw)
  To: Dave Jiang
  Cc: alex.williamson, kwankhede, tglx, vkoul, megha.dey,
	jacob.jun.pan, ashok.raj, yi.l.liu, baolu.lu, kevin.tian,
	sanjay.k.kumar, tony.luck, dan.j.williams, eric.auger, parav,
	netanelg, shahafs, pbonzini, dmaengine, linux-kernel, kvm

On Fri, Feb 05, 2021 at 01:53:18PM -0700, Dave Jiang wrote:
> diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h
> index a2438b3166db..f02c96164515 100644
> +++ b/drivers/dma/idxd/idxd.h
> @@ -8,6 +8,7 @@
>  #include <linux/percpu-rwsem.h>
>  #include <linux/wait.h>
>  #include <linux/cdev.h>
> +#include <linux/auxiliary_bus.h>
>  #include "registers.h"
>  
>  #define IDXD_DRIVER_VERSION	"1.00"
> @@ -221,6 +222,8 @@ struct idxd_device {
>  	struct work_struct work;
>  
>  	int *int_handles;
> +
> +	struct auxiliary_device *mdev_auxdev;
>  };

If there is only one aux device there not much reason to make it a
dedicated allocation.

>  /* IDXD software descriptor */
> @@ -282,6 +285,10 @@ enum idxd_interrupt_type {
>  	IDXD_IRQ_IMS,
>  };
>  
> +struct idxd_mdev_aux_drv {
> +	        struct auxiliary_driver auxiliary_drv;
> +};

Wrong indent. What is this even for?

> +
>  static inline int idxd_get_wq_portal_offset(enum idxd_portal_prot prot,
>  					    enum idxd_interrupt_type irq_type)
>  {
> diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c
> index ee56b92108d8..fd57f39e4b7d 100644
> +++ b/drivers/dma/idxd/init.c
> @@ -382,6 +382,74 @@ static void idxd_disable_system_pasid(struct idxd_device *idxd)
>  	idxd->sva = NULL;
>  }
>  
> +static void idxd_remove_mdev_auxdev(struct idxd_device *idxd)
> +{
> +	if (!IS_ENABLED(CONFIG_VFIO_MDEV_IDXD))
> +		return;
> +
> +	auxiliary_device_delete(idxd->mdev_auxdev);
> +	auxiliary_device_uninit(idxd->mdev_auxdev);
> +}
> +
> +static void idxd_auxdev_release(struct device *dev)
> +{
> +	struct auxiliary_device *auxdev = to_auxiliary_dev(dev);
> +	struct idxd_device *idxd = dev_get_drvdata(dev);

Nope, where did you see drvdata being used like this? You need to use
container_of.

If you put the mdev_auxdev as a non-pointer member, then this is just:

     struct idxd_device *idxd = container_of(dev, struct idxd_device, mdev_auxdev)
     
     put_device(&idxd->conf_dev);

And fix the 'setup' to match this design
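The container_of() pattern suggested here is plain pointer arithmetic over
offsetof(), so it needs no drvdata at all. A self-contained userspace
sketch (the structures are simplified stand-ins for the idxd types):

```c
#include <assert.h>
#include <stddef.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified shapes: the auxiliary device embedded by value in the
 * parent idxd device, as the review suggests. */
struct device { int id; };

struct auxiliary_device { struct device dev; };

struct idxd_device {
	int type;
	struct auxiliary_device mdev_auxdev;	/* non-pointer member */
};

/* A release callback can recover the parent purely from pointer math. */
static struct idxd_device *to_idxd(struct device *dev)
{
	struct auxiliary_device *auxdev =
		container_of(dev, struct auxiliary_device, dev);

	return container_of(auxdev, struct idxd_device, mdev_auxdev);
}
```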

> +	kfree(auxdev->name);

This is weird, the name shouldn't be allocated, it is supposed to be a
fixed string to make it easy to find the driver name in the code base.

> +static int idxd_setup_mdev_auxdev(struct idxd_device *idxd)
> +{
> +	struct auxiliary_device *auxdev;
> +	struct device *dev = &idxd->pdev->dev;
> +	int rc;
> +
> +	if (!IS_ENABLED(CONFIG_VFIO_MDEV_IDXD))
> +		return 0;
> +
> +	auxdev = kzalloc(sizeof(*auxdev), GFP_KERNEL);
> +	if (!auxdev)
> +		return -ENOMEM;
> +
> +	auxdev->name = kasprintf(GFP_KERNEL, "mdev-%s", idxd_name[idxd->type]);
> +	if (!auxdev->name) {
> +		rc = -ENOMEM;
> +		goto err_name;
> +	}
> +
> +	dev_dbg(&idxd->pdev->dev, "aux dev mdev: %s\n", auxdev->name);
> +
> +	auxdev->dev.parent = dev;
> +	auxdev->dev.release = idxd_auxdev_release;
> +	auxdev->id = idxd->id;
> +
> +	rc = auxiliary_device_init(auxdev);
> +	if (rc < 0) {
> +		dev_err(dev, "Failed to init aux dev: %d\n", rc);
> +		goto err_auxdev;
> +	}

Put the init earlier so it can handle the error unwinds

> +	rc = auxiliary_device_add(auxdev);
> +	if (rc < 0) {
> +		dev_err(dev, "Failed to add aux dev: %d\n", rc);
> +		goto err_auxdev;
> +	}
> +
> +	idxd->mdev_auxdev = auxdev;
> +	dev_set_drvdata(&auxdev->dev, idxd);

No to using drvdata, and this is in the wrong order anyhow.

> +	return 0;
> +
> + err_auxdev:
> +	kfree(auxdev->name);
> + err_name:
> +	kfree(auxdev);
> +	return rc;
> +}
> +
>  static int idxd_probe(struct idxd_device *idxd)
>  {
>  	struct pci_dev *pdev = idxd->pdev;
> @@ -434,11 +502,19 @@ static int idxd_probe(struct idxd_device *idxd)
>  		goto err_idr_fail;
>  	}
>  
> +	rc = idxd_setup_mdev_auxdev(idxd);
> +	if (rc < 0)
> +		goto err_auxdev_fail;
> +
>  	idxd->major = idxd_cdev_get_major(idxd);
>  
>  	dev_dbg(dev, "IDXD device %d probed successfully\n", idxd->id);
>  	return 0;
>  
> + err_auxdev_fail:
> +	mutex_lock(&idxd_idr_lock);
> +	idr_remove(&idxd_idrs[idxd->type], idxd->id);
> +	mutex_unlock(&idxd_idr_lock);

Probably wrong to order things like this..

Also somehow this has a 

	idxd = devm_kzalloc(dev, sizeof(struct idxd_device), GFP_KERNEL);

but the idxd has a kref'd struct device in it:

struct idxd_device {
	enum idxd_type type;
	struct device conf_dev;

So that's not right either

You'll need to fix the lifetime model for idxd_device before you get
to adding auxdevices

> +static int idxd_mdev_host_init(struct idxd_device *idxd)
> +{
> +	/* FIXME: Fill in later */
> +	return 0;
> +}
> +
> +static int idxd_mdev_host_release(struct idxd_device *idxd)
> +{
> +	/* FIXME: Fill in later */
> +	return 0;
> +}

Don't leave empty stubs like this, just provide the whole driver in
the next patch

> +static int idxd_mdev_aux_probe(struct auxiliary_device *auxdev,
> +			       const struct auxiliary_device_id *id)
> +{
> +	struct idxd_device *idxd = dev_get_drvdata(&auxdev->dev);

Continuing no to using drvdata, must use container_of

> +	int rc;
> +
> +	rc = idxd_mdev_host_init(idxd);

And why add this indirection? Just write it here

> +static struct idxd_mdev_aux_drv idxd_mdev_aux_drv = {
> +	.auxiliary_drv = {
> +		.id_table = idxd_mdev_auxbus_id_table,
> +		.probe = idxd_mdev_aux_probe,
> +		.remove = idxd_mdev_aux_remove,
> +	},
> +};

Why idxd_mdev_aux_drv ? Does a later patch add something here?

> +static int idxd_mdev_auxdev_drv_register(struct idxd_mdev_aux_drv *drv)
> +{
> +	return auxiliary_driver_register(&drv->auxiliary_drv);
> +}
> +
> +static void idxd_mdev_auxdev_drv_unregister(struct idxd_mdev_aux_drv *drv)
> +{
> +	auxiliary_driver_unregister(&drv->auxiliary_drv);
> +}
> +
> +module_driver(idxd_mdev_aux_drv, idxd_mdev_auxdev_drv_register, idxd_mdev_auxdev_drv_unregister);

There is some auxiliary driver macro that does this boilerplate

Jason


* Re: [PATCH v5 05/14] vfio/mdev: idxd: add basic mdev registration and helper functions
  2021-02-05 20:53 ` [PATCH v5 05/14] vfio/mdev: idxd: add basic mdev registration and helper functions Dave Jiang
@ 2021-02-10 23:59   ` Jason Gunthorpe
  2021-02-16 19:04     ` Dave Jiang
  2021-03-02  0:23     ` Dave Jiang
  0 siblings, 2 replies; 31+ messages in thread
From: Jason Gunthorpe @ 2021-02-10 23:59 UTC (permalink / raw)
  To: Dave Jiang
  Cc: alex.williamson, kwankhede, tglx, vkoul, megha.dey,
	jacob.jun.pan, ashok.raj, yi.l.liu, baolu.lu, kevin.tian,
	sanjay.k.kumar, tony.luck, dan.j.williams, eric.auger, parav,
	netanelg, shahafs, pbonzini, dmaengine, linux-kernel, kvm

On Fri, Feb 05, 2021 at 01:53:24PM -0700, Dave Jiang wrote:

> +static int check_vma(struct idxd_wq *wq, struct vm_area_struct *vma)
>  {
> -	/* FIXME: Fill in later */
> +	if (vma->vm_end < vma->vm_start)
> +		return -EINVAL;

These checks are redundant

> -static int idxd_mdev_host_release(struct idxd_device *idxd)
> +static int idxd_vdcm_mmap(struct mdev_device *mdev, struct vm_area_struct *vma)
> +{
> +	unsigned int wq_idx, rc;
> +	unsigned long req_size, pgoff = 0, offset;
> +	pgprot_t pg_prot;
> +	struct vdcm_idxd *vidxd = mdev_get_drvdata(mdev);
> +	struct idxd_wq *wq = vidxd->wq;
> +	struct idxd_device *idxd = vidxd->idxd;
> +	enum idxd_portal_prot virt_portal, phys_portal;
> +	phys_addr_t base = pci_resource_start(idxd->pdev, IDXD_WQ_BAR);
> +	struct device *dev = mdev_dev(mdev);
> +
> +	rc = check_vma(wq, vma);
> +	if (rc)
> +		return rc;
> +
> +	pg_prot = vma->vm_page_prot;
> +	req_size = vma->vm_end - vma->vm_start;
> +	vma->vm_flags |= VM_DONTCOPY;
> +
> +	offset = (vma->vm_pgoff << PAGE_SHIFT) &
> +		 ((1ULL << VFIO_PCI_OFFSET_SHIFT) - 1);
> +
> +	wq_idx = offset >> (PAGE_SHIFT + 2);
> +	if (wq_idx >= 1) {
> +		dev_err(dev, "mapping invalid wq %d off %lx\n",
> +			wq_idx, offset);
> +		return -EINVAL;
> +	}
> +
> +	/*
> +	 * Check and see if the guest wants to map to the limited or unlimited portal.
> +	 * The driver will allow mapping to unlimited portal only if the wq is a
> +	 * dedicated wq. Otherwise, it goes to limited.
> +	 */
> +	virt_portal = ((offset >> PAGE_SHIFT) & 0x3) == 1;
> +	phys_portal = IDXD_PORTAL_LIMITED;
> +	if (virt_portal == IDXD_PORTAL_UNLIMITED && wq_dedicated(wq))
> +		phys_portal = IDXD_PORTAL_UNLIMITED;
> +
> +	/* We always map IMS portals to the guest */
> +	pgoff = (base + idxd_get_wq_portal_full_offset(wq->id, phys_portal,
> +						       IDXD_IRQ_IMS)) >> PAGE_SHIFT;
> +	dev_dbg(dev, "mmap %lx %lx %lx %lx\n", vma->vm_start, pgoff, req_size,
> +		pgprot_val(pg_prot));
> +	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
> +	vma->vm_private_data = mdev;

What ensures the mdev pointer is valid strictly longer than the VMA?
This needs refcounting.

> +	vma->vm_pgoff = pgoff;
> +
> +	return remap_pfn_range(vma, vma->vm_start, pgoff, req_size, pg_prot);

Nothing validated req_size - did you copy this from the Intel RDMA
driver? It had a huge security bug just like this.
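The missing check is that the requested mapping size (and page offset)
must fit within the backing region; otherwise a guest could map past the
portal into neighboring device MMIO. A userspace sketch of the bound
check, for illustration only (the one-page region size is an assumption of
this model, not taken from the driver):

```c
#include <assert.h>

#define PAGE_SIZE   4096UL
#define REGION_SIZE 4096UL	/* one wq portal page in this sketch */

/* Validate that [pgoff_bytes, pgoff_bytes + req_size) lies inside the
 * backing region. The subtraction form avoids unsigned overflow. */
static int check_mmap_size(unsigned long req_size, unsigned long pgoff_bytes)
{
	if (req_size > REGION_SIZE)
		return -1;
	if (pgoff_bytes > REGION_SIZE - req_size)
		return -1;
	return 0;
}
```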
> +
> +static int msix_trigger_unregister(struct vdcm_idxd *vidxd, int index)
> +{
> +	struct mdev_device *mdev = vidxd->vdev.mdev;
> +	struct device *dev = mdev_dev(mdev);
> +	struct ims_irq_entry *irq_entry;
> +	int rc;
> +
> +	if (!vidxd->vdev.msix_trigger[index])
> +		return 0;
> +
> +	dev_dbg(dev, "disable MSIX trigger %d\n", index);
> +	if (index) {
> +		u32 auxval;
> +
> +		irq_entry = &vidxd->irq_entries[index];
> +		if (irq_entry->irq_set) {
> +			free_irq(irq_entry->irq, irq_entry);
> +			irq_entry->irq_set = false;
> +		}
> +
> +		auxval = ims_ctrl_pasid_aux(0, false);
> +		rc = irq_set_auxdata(irq_entry->irq, IMS_AUXDATA_CONTROL_WORD, auxval);
> +		if (rc)
> +			return rc;
> +	}
> +	eventfd_ctx_put(vidxd->vdev.msix_trigger[index]);
> +	vidxd->vdev.msix_trigger[index] = NULL;
> +
> +	return 0;
> +}
> +
> +static int msix_trigger_register(struct vdcm_idxd *vidxd, u32 fd, int index)
> +{
> +	struct mdev_device *mdev = vidxd->vdev.mdev;
> +	struct device *dev = mdev_dev(mdev);
> +	struct ims_irq_entry *irq_entry;
> +	struct eventfd_ctx *trigger;
> +	int rc;
> +
> +	if (vidxd->vdev.msix_trigger[index])
> +		return 0;
> +
> +	dev_dbg(dev, "enable MSIX trigger %d\n", index);
> +	trigger = eventfd_ctx_fdget(fd);
> +	if (IS_ERR(trigger)) {
> +		dev_warn(dev, "eventfd_ctx_fdget failed %d\n", index);
> +		return PTR_ERR(trigger);
> +	}
> +
> +	if (index) {
> +		u32 pasid;
> +		u32 auxval;
> +
> +		irq_entry = &vidxd->irq_entries[index];
> +		rc = idxd_mdev_get_pasid(mdev, &pasid);
> +		if (rc < 0)
> +			return rc;
> +
> +		/*
> +		 * Program and enable the pasid field in the IMS entry. The programmed pasid and
> +		 * enabled field is checked against the  pasid and enable field for the work queue
> +		 * configuration and the pasid for the descriptor. A mismatch will result in blocked
> +		 * IMS interrupt.
> +		 */
> +		auxval = ims_ctrl_pasid_aux(pasid, true);
> +		rc = irq_set_auxdata(irq_entry->irq, IMS_AUXDATA_CONTROL_WORD, auxval);
> +		if (rc < 0)
> +			return rc;
> +
> +		rc = request_irq(irq_entry->irq, idxd_guest_wq_completion, 0, "idxd-ims",
> +				 irq_entry);
> +		if (rc) {
> +			dev_warn(dev, "failed to request ims irq\n");
> +			eventfd_ctx_put(trigger);
> +			auxval = ims_ctrl_pasid_aux(0, false);
> +			irq_set_auxdata(irq_entry->irq, IMS_AUXDATA_CONTROL_WORD, auxval);
> +			return rc;
> +		}
> +		irq_entry->irq_set = true;
> +	}
> +
> +	vidxd->vdev.msix_trigger[index] = trigger;
> +	return 0;
> +}
> +
> +static int vdcm_idxd_set_msix_trigger(struct vdcm_idxd *vidxd,
> +				      unsigned int index, unsigned int start,
> +				      unsigned int count, uint32_t flags,
> +				      void *data)
> +{
> +	int i, rc = 0;
> +
> +	if (count > VIDXD_MAX_MSIX_ENTRIES - 1)
> +		count = VIDXD_MAX_MSIX_ENTRIES - 1;
> +
> +	if (count == 0 && (flags & VFIO_IRQ_SET_DATA_NONE)) {
> +		/* Disable all MSIX entries */
> +		for (i = 0; i < VIDXD_MAX_MSIX_ENTRIES; i++) {
> +			rc = msix_trigger_unregister(vidxd, i);
> +			if (rc < 0)
> +				return rc;
> +		}
> +		return 0;
> +	}
> +
> +	for (i = 0; i < count; i++) {
> +		if (flags & VFIO_IRQ_SET_DATA_EVENTFD) {
> +			u32 fd = *(u32 *)(data + i * sizeof(u32));
> +
> +			rc = msix_trigger_register(vidxd, fd, i);
> +			if (rc < 0)
> +				return rc;
> +		} else if (flags & VFIO_IRQ_SET_DATA_NONE) {
> +			rc = msix_trigger_unregister(vidxd, i);
> +			if (rc < 0)
> +				return rc;
> +		}
> +	}
> +	return rc;
> +}
> +
> +static int idxd_vdcm_set_irqs(struct vdcm_idxd *vidxd, uint32_t flags,
> +			      unsigned int index, unsigned int start,
> +			      unsigned int count, void *data)
> +{
> +	int (*func)(struct vdcm_idxd *vidxd, unsigned int index,
> +		    unsigned int start, unsigned int count, uint32_t flags,
> +		    void *data) = NULL;
> +	struct mdev_device *mdev = vidxd->vdev.mdev;
> +	struct device *dev = mdev_dev(mdev);
> +
> +	switch (index) {
> +	case VFIO_PCI_INTX_IRQ_INDEX:
> +		dev_warn(dev, "intx interrupts not supported.\n");
> +		break;
> +	case VFIO_PCI_MSI_IRQ_INDEX:
> +		dev_dbg(dev, "msi interrupt.\n");
> +		switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) {
> +		case VFIO_IRQ_SET_ACTION_MASK:
> +		case VFIO_IRQ_SET_ACTION_UNMASK:
> +			break;
> +		case VFIO_IRQ_SET_ACTION_TRIGGER:
> +			func = vdcm_idxd_set_msix_trigger;

This would be a good place to insert a common VFIO helper library to
take care of the MSI-X emulation for IMS.

> +int idxd_mdev_host_init(struct idxd_device *idxd)
> +{
> +	struct device *dev = &idxd->pdev->dev;
> +	int rc;
> +
> +	if (!test_bit(IDXD_FLAG_IMS_SUPPORTED, &idxd->flags))
> +		return -EOPNOTSUPP;
> +
> +	if (iommu_dev_has_feature(dev, IOMMU_DEV_FEAT_AUX)) {
> +		rc = iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_AUX);

Huh. This is the first user of IOMMU_DEV_FEAT_AUX, why has so much
dead-code infrastructure already been merged around this?


> @@ -34,6 +1024,7 @@ static int idxd_mdev_aux_probe(struct auxiliary_device *auxdev,
>  		return rc;
>  	}
>  
> +	set_bit(IDXD_FLAG_MDEV_ENABLED, &idxd->flags);

Something is being done wrong if this flag is needed

> +int vidxd_send_interrupt(struct ims_irq_entry *iie)
> +{
> +	/* PLACE HOLDER */
> +	return 0;
> +}

Here too, don't structure the patches like this

> diff --git a/drivers/vfio/mdev/idxd/vdev.h b/drivers/vfio/mdev/idxd/vdev.h
> new file mode 100644
> index 000000000000..cc2ba6ccff7b
> +++ b/drivers/vfio/mdev/idxd/vdev.h
> @@ -0,0 +1,19 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/* Copyright(c) 2019,2020 Intel Corporation. All rights rsvd. */
> +
> +#ifndef _IDXD_VDEV_H_
> +#define _IDXD_VDEV_H_
> +
> +#include "mdev.h"
> +
> +int vidxd_mmio_read(struct vdcm_idxd *vidxd, u64 pos, void *buf, unsigned int size);
> +int vidxd_mmio_write(struct vdcm_idxd *vidxd, u64 pos, void *buf, unsigned int size);
> +int vidxd_cfg_read(struct vdcm_idxd *vidxd, unsigned int pos, void *buf, unsigned int count);
> +int vidxd_cfg_write(struct vdcm_idxd *vidxd, unsigned int pos, void *buf, unsigned int size);
> +void vidxd_mmio_init(struct vdcm_idxd *vidxd);
> +void vidxd_reset(struct vdcm_idxd *vidxd);
> +int vidxd_send_interrupt(struct ims_irq_entry *iie);
> +int vidxd_setup_ims_entries(struct vdcm_idxd *vidxd);
> +void vidxd_free_ims_entries(struct vdcm_idxd *vidxd);

Why are these functions special??

Jason


* Re: [PATCH v5 07/14] vfio/mdev: idxd: add 1dwq-v1 mdev type
  2021-02-05 20:53 ` [PATCH v5 07/14] vfio/mdev: idxd: add 1dwq-v1 mdev type Dave Jiang
@ 2021-02-11  0:00   ` Jason Gunthorpe
  0 siblings, 0 replies; 31+ messages in thread
From: Jason Gunthorpe @ 2021-02-11  0:00 UTC (permalink / raw)
  To: Dave Jiang
  Cc: alex.williamson, kwankhede, tglx, vkoul, megha.dey,
	jacob.jun.pan, ashok.raj, yi.l.liu, baolu.lu, kevin.tian,
	sanjay.k.kumar, tony.luck, dan.j.williams, eric.auger, parav,
	netanelg, shahafs, pbonzini, dmaengine, linux-kernel, kvm

On Fri, Feb 05, 2021 at 01:53:37PM -0700, Dave Jiang wrote:

> -static const struct mdev_parent_ops idxd_vdcm_ops = {
> +static ssize_t name_show(struct kobject *kobj, struct device *dev, char *buf)
> +{
> +	struct vdcm_idxd_type *type;
> +
> +	type = idxd_vdcm_find_vidxd_type(dev, kobject_name(kobj));
> +
> +	if (type)
> +		return sprintf(buf, "%s\n", type->name);
> +
> +	return -EINVAL;

Success oriented flow

Jason


* Re: [PATCH v5 04/14] vfio/mdev: idxd: Add auxialary device plumbing for idxd mdev support
  2021-02-10 23:46   ` Jason Gunthorpe
@ 2021-02-12 18:56     ` Dave Jiang
  2021-02-12 19:14       ` Jason Gunthorpe
  0 siblings, 1 reply; 31+ messages in thread
From: Dave Jiang @ 2021-02-12 18:56 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: alex.williamson, kwankhede, tglx, vkoul, megha.dey,
	jacob.jun.pan, ashok.raj, yi.l.liu, baolu.lu, kevin.tian,
	sanjay.k.kumar, tony.luck, dan.j.williams, eric.auger, parav,
	netanelg, shahafs, pbonzini, dmaengine, linux-kernel, kvm


On 2/10/2021 4:46 PM, Jason Gunthorpe wrote:
> On Fri, Feb 05, 2021 at 01:53:18PM -0700, Dave Jiang wrote:
>> diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h
>> index a2438b3166db..f02c96164515 100644
>> +++ b/drivers/dma/idxd/idxd.h
>> @@ -8,6 +8,7 @@
>>   #include <linux/percpu-rwsem.h>
>>   #include <linux/wait.h>
>>   #include <linux/cdev.h>
>> +#include <linux/auxiliary_bus.h>
>>   #include "registers.h"
>>   
>>   #define IDXD_DRIVER_VERSION	"1.00"
>> @@ -221,6 +222,8 @@ struct idxd_device {
>>   	struct work_struct work;
>>   
>>   	int *int_handles;
>> +
>> +	struct auxiliary_device *mdev_auxdev;
>>   };
> If there is only one aux device there not much reason to make it a
> dedicated allocation.

Hi Jason. Thank you for the review. Very much appreciated!

Yep. I had it embedded and then changed it when I was working on the 
UACCE bits to make it uniform. Should've just kept it the way it was.

>>   /* IDXD software descriptor */
>> @@ -282,6 +285,10 @@ enum idxd_interrupt_type {
>>   	IDXD_IRQ_IMS,
>>   };
>>   
>> +struct idxd_mdev_aux_drv {
>> +	        struct auxiliary_driver auxiliary_drv;
>> +};
> Wrong indent. What is this even for?

Will remove.

>> +
>>   static inline int idxd_get_wq_portal_offset(enum idxd_portal_prot prot,
>>   					    enum idxd_interrupt_type irq_type)
>>   {
>> diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c
>> index ee56b92108d8..fd57f39e4b7d 100644
>> +++ b/drivers/dma/idxd/init.c
>> @@ -382,6 +382,74 @@ static void idxd_disable_system_pasid(struct idxd_device *idxd)
>>   	idxd->sva = NULL;
>>   }
>>   
>> +static void idxd_remove_mdev_auxdev(struct idxd_device *idxd)
>> +{
>> +	if (!IS_ENABLED(CONFIG_VFIO_MDEV_IDXD))
>> +		return;
>> +
>> +	auxiliary_device_delete(idxd->mdev_auxdev);
>> +	auxiliary_device_uninit(idxd->mdev_auxdev);
>> +}
>> +
>> +static void idxd_auxdev_release(struct device *dev)
>> +{
>> +	struct auxiliary_device *auxdev = to_auxiliary_dev(dev);
>> +	struct idxd_device *idxd = dev_get_drvdata(dev);
> Nope, where did you see drvdata being used like this? You need to use
> container_of.
>
> If put the mdev_auxdev as a non pointer member then this is just:
>
>       struct idxd_device *idxd = container_of(dev, struct idxd_device, mdev_auxdev)
>       
>       put_device(&idxd->conf_dev);
>
> And fix the 'setup' to match this design

Yes. Once it's embedded, everything falls in place. The drvdata hack was 
to deal with the auxdev being a pointer.


>> +	kfree(auxdev->name);
> This is weird, the name shouldn't be allocated, it is supposed to be a
> fixed string to make it easy to find the driver name in the code base.

Will fix.


>> +static int idxd_setup_mdev_auxdev(struct idxd_device *idxd)
>> +{
>> +	struct auxiliary_device *auxdev;
>> +	struct device *dev = &idxd->pdev->dev;
>> +	int rc;
>> +
>> +	if (!IS_ENABLED(CONFIG_VFIO_MDEV_IDXD))
>> +		return 0;
>> +
>> +	auxdev = kzalloc(sizeof(*auxdev), GFP_KERNEL);
>> +	if (!auxdev)
>> +		return -ENOMEM;
>> +
>> +	auxdev->name = kasprintf(GFP_KERNEL, "mdev-%s", idxd_name[idxd->type]);
>> +	if (!auxdev->name) {
>> +		rc = -ENOMEM;
>> +		goto err_name;
>> +	}
>> +
>> +	dev_dbg(&idxd->pdev->dev, "aux dev mdev: %s\n", auxdev->name);
>> +
>> +	auxdev->dev.parent = dev;
>> +	auxdev->dev.release = idxd_auxdev_release;
>> +	auxdev->id = idxd->id;
>> +
>> +	rc = auxiliary_device_init(auxdev);
>> +	if (rc < 0) {
>> +		dev_err(dev, "Failed to init aux dev: %d\n", rc);
>> +		goto err_auxdev;
>> +	}
> Put the init earlier so it can handle the error unwinds

I think with auxdev embedded, there's really not much going on so I 
think this resolves itself.


>> +	rc = auxiliary_device_add(auxdev);
>> +	if (rc < 0) {
>> +		dev_err(dev, "Failed to add aux dev: %d\n", rc);
>> +		goto err_auxdev;
>> +	}
>> +
>> +	idxd->mdev_auxdev = auxdev;
>> +	dev_set_drvdata(&auxdev->dev, idxd);
> No to using drvdata, and this is in the wrong order anyhow.
>
>> +	return 0;
>> +
>> + err_auxdev:
>> +	kfree(auxdev->name);
>> + err_name:
>> +	kfree(auxdev);
>> +	return rc;
>> +}
>> +
>>   static int idxd_probe(struct idxd_device *idxd)
>>   {
>>   	struct pci_dev *pdev = idxd->pdev;
>> @@ -434,11 +502,19 @@ static int idxd_probe(struct idxd_device *idxd)
>>   		goto err_idr_fail;
>>   	}
>>   
>> +	rc = idxd_setup_mdev_auxdev(idxd);
>> +	if (rc < 0)
>> +		goto err_auxdev_fail;
>> +
>>   	idxd->major = idxd_cdev_get_major(idxd);
>>   
>>   	dev_dbg(dev, "IDXD device %d probed successfully\n", idxd->id);
>>   	return 0;
>>   
>> + err_auxdev_fail:
>> +	mutex_lock(&idxd_idr_lock);
>> +	idr_remove(&idxd_idrs[idxd->type], idxd->id);
>> +	mutex_unlock(&idxd_idr_lock);
> Probably wrong to order things like this..

How should it be ordered?

>
> Also somehow this has a
>
> 	idxd = devm_kzalloc(dev, sizeof(struct idxd_device), GFP_KERNEL);
>
> but the idxd has a kref'd struct device in it:

So the conf_dev is a struct device that lets the driver do configuration 
of the device and other components through sysfs. It's a child device of 
the pdev and should have no relation to the auxdev. The conf_devs for 
each component should not be released until the physical device is 
released. For the mdev case, the auxdev shouldn't be released until the 
removal of the pdev either, since it is also a child of the pdev.

pdev --- device conf_dev --- wq conf_dev
     |                   |--- engine conf_dev
     |                   |--- group conf_dev
     |--- aux_dev

>
> struct idxd_device {
> 	enum idxd_type type;
> 	struct device conf_dev;
>
> So that's not right either
>
> You'll need to fix the lifetime model for idxd_device before you get
> to adding auxdevices

Can you kindly expand on how it's supposed to look, please?


>
>> +static int idxd_mdev_host_init(struct idxd_device *idxd)
>> +{
>> +	/* FIXME: Fill in later */
>> +	return 0;
>> +}
>> +
>> +static int idxd_mdev_host_release(struct idxd_device *idxd)
>> +{
>> +	/* FIXME: Fill in later */
>> +	return 0;
>> +}
> Don't leave empty stubs like this, just provide the whole driver in
> the next patch

Ok will do that.


>
>> +static int idxd_mdev_aux_probe(struct auxiliary_device *auxdev,
>> +			       const struct auxiliary_device_id *id)
>> +{
>> +	struct idxd_device *idxd = dev_get_drvdata(&auxdev->dev);
> Continuing no to using drvdata, must use container_of
>
>> +	int rc;
>> +
>> +	rc = idxd_mdev_host_init(idxd);
> And why add this indirection? Just write what it here

ok


>
>> +static struct idxd_mdev_aux_drv idxd_mdev_aux_drv = {
>> +	.auxiliary_drv = {
>> +		.id_table = idxd_mdev_auxbus_id_table,
>> +		.probe = idxd_mdev_aux_probe,
>> +		.remove = idxd_mdev_aux_remove,
>> +	},
>> +};
> Why idxd_mdev_aux_drv ? Does a later patch add something here?


Yes. There is a callback function that's added later. But I can code it 
so that it gets changed later on.


>
>> +static int idxd_mdev_auxdev_drv_register(struct idxd_mdev_aux_drv *drv)
>> +{
>> +	return auxiliary_driver_register(&drv->auxiliary_drv);
>> +}
>> +
>> +static void idxd_mdev_auxdev_drv_unregister(struct idxd_mdev_aux_drv *drv)
>> +{
>> +	auxiliary_driver_unregister(&drv->auxiliary_drv);
>> +}
>> +
>> +module_driver(idxd_mdev_aux_drv, idxd_mdev_auxdev_drv_register, idxd_mdev_auxdev_drv_unregister);
> There is some auxillary driver macro that does this boilerplate

Ok thanks.


>
> Jason

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 04/14] vfio/mdev: idxd: Add auxiliary device plumbing for idxd mdev support
  2021-02-12 18:56     ` Dave Jiang
@ 2021-02-12 19:14       ` Jason Gunthorpe
  0 siblings, 0 replies; 31+ messages in thread
From: Jason Gunthorpe @ 2021-02-12 19:14 UTC (permalink / raw)
  To: Dave Jiang
  Cc: alex.williamson, kwankhede, tglx, vkoul, megha.dey,
	jacob.jun.pan, ashok.raj, yi.l.liu, baolu.lu, kevin.tian,
	sanjay.k.kumar, tony.luck, dan.j.williams, eric.auger, parav,
	netanelg, shahafs, pbonzini, dmaengine, linux-kernel, kvm

On Fri, Feb 12, 2021 at 11:56:24AM -0700, Dave Jiang wrote:

> > > @@ -434,11 +502,19 @@ static int idxd_probe(struct idxd_device *idxd)
> > >   		goto err_idr_fail;
> > >   	}
> > > +	rc = idxd_setup_mdev_auxdev(idxd);
> > > +	if (rc < 0)
> > > +		goto err_auxdev_fail;
> > > +
> > >   	idxd->major = idxd_cdev_get_major(idxd);
> > >   	dev_dbg(dev, "IDXD device %d probed successfully\n", idxd->id);
> > >   	return 0;
> > > + err_auxdev_fail:
> > > +	mutex_lock(&idxd_idr_lock);
> > > +	idr_remove(&idxd_idrs[idxd->type], idxd->id);
> > > +	mutex_unlock(&idxd_idr_lock);
> > Probably wrong to order things like this..
> 
> How should it be ordered?

The IDR is global data, so some other thread could have read the IDR
and gotten this pointer while it is being torn down in some racy way.
It is best to make the store to global state the very last thing, so
you never have to un-store from global memory and don't have to think
about concurrency.

> > Also somehow this has a
> > 
> > 	idxd = devm_kzalloc(dev, sizeof(struct idxd_device), GFP_KERNEL);
> > 
> > but the idxd has a kref'd struct device in it:
> 
> So the conf_dev is a struct device that lets the driver do configuration of
> the device and other components through sysfs. It's a child device to the
> pdev. It should have no relation to the auxdev. The confdevs for each
> component should not be released until the physical device is released. For
> the mdev case, the auxdev shouldn't be released until the removal of the
> pdev as well since it is a child of the pdev also.
> 
> pdev --- device conf_dev --- wq conf_dev
>     |                   |--- engine conf_dev
>     |                   |--- group conf_dev
>     |--- aux_dev
> 
> > 
> > struct idxd_device {
> > 	enum idxd_type type;
> > 	struct device conf_dev;
> > 
> > So that's not right either
> > 
> > You'll need to fix the lifetime model for idxd_device before you get
> > to adding auxdevices
> 
> Can you kindly expand on how it's supposed to look, please?

Well, you can't call kfree on memory that contains a struct device,
you have to use put_device() - so the devm_kzalloc is unconditionally
wrong. Maybe you could replace it with a devm put-device action, but
it would probably be a lot saner to just put the required put_device()s
where they need to be in the first place.

I didn't try to work out what this was all for, but once it is sorted
out you can just embed the aux device here and chain its release to
put_device on the conf_dev and all the lifetime will work out
naturally. 

Jason

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 05/14] vfio/mdev: idxd: add basic mdev registration and helper functions
  2021-02-10 23:59   ` Jason Gunthorpe
@ 2021-02-16 19:04     ` Dave Jiang
  2021-02-16 20:39       ` Dan Williams
  2021-02-16 21:33       ` Jason Gunthorpe
  2021-03-02  0:23     ` Dave Jiang
  1 sibling, 2 replies; 31+ messages in thread
From: Dave Jiang @ 2021-02-16 19:04 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: alex.williamson, kwankhede, tglx, vkoul, megha.dey,
	jacob.jun.pan, ashok.raj, yi.l.liu, baolu.lu, kevin.tian,
	sanjay.k.kumar, tony.luck, dan.j.williams, eric.auger, parav,
	netanelg, shahafs, pbonzini, dmaengine, linux-kernel, kvm


On 2/10/2021 4:59 PM, Jason Gunthorpe wrote:
> On Fri, Feb 05, 2021 at 01:53:24PM -0700, Dave Jiang wrote:
>
>> +static int check_vma(struct idxd_wq *wq, struct vm_area_struct *vma)
>>   {
>> -	/* FIXME: Fill in later */
>> +	if (vma->vm_end < vma->vm_start)
>> +		return -EINVAL;
> These checks are redundant

Thanks. Will remove.

>
>> -static int idxd_mdev_host_release(struct idxd_device *idxd)
>> +static int idxd_vdcm_mmap(struct mdev_device *mdev, struct vm_area_struct *vma)
>> +{
>> +	unsigned int wq_idx, rc;
>> +	unsigned long req_size, pgoff = 0, offset;
>> +	pgprot_t pg_prot;
>> +	struct vdcm_idxd *vidxd = mdev_get_drvdata(mdev);
>> +	struct idxd_wq *wq = vidxd->wq;
>> +	struct idxd_device *idxd = vidxd->idxd;
>> +	enum idxd_portal_prot virt_portal, phys_portal;
>> +	phys_addr_t base = pci_resource_start(idxd->pdev, IDXD_WQ_BAR);
>> +	struct device *dev = mdev_dev(mdev);
>> +
>> +	rc = check_vma(wq, vma);
>> +	if (rc)
>> +		return rc;
>> +
>> +	pg_prot = vma->vm_page_prot;
>> +	req_size = vma->vm_end - vma->vm_start;
>> +	vma->vm_flags |= VM_DONTCOPY;
>> +
>> +	offset = (vma->vm_pgoff << PAGE_SHIFT) &
>> +		 ((1ULL << VFIO_PCI_OFFSET_SHIFT) - 1);
>> +
>> +	wq_idx = offset >> (PAGE_SHIFT + 2);
>> +	if (wq_idx >= 1) {
>> +		dev_err(dev, "mapping invalid wq %d off %lx\n",
>> +			wq_idx, offset);
>> +		return -EINVAL;
>> +	}
>> +
>> +	/*
>> +	 * Check and see if the guest wants to map to the limited or unlimited portal.
>> +	 * The driver will allow mapping to unlimited portal only if the wq is a
>> +	 * dedicated wq. Otherwise, it goes to limited.
>> +	 */
>> +	virt_portal = ((offset >> PAGE_SHIFT) & 0x3) == 1;
>> +	phys_portal = IDXD_PORTAL_LIMITED;
>> +	if (virt_portal == IDXD_PORTAL_UNLIMITED && wq_dedicated(wq))
>> +		phys_portal = IDXD_PORTAL_UNLIMITED;
>> +
>> +	/* We always map IMS portals to the guest */
>> +	pgoff = (base + idxd_get_wq_portal_full_offset(wq->id, phys_portal,
>> +						       IDXD_IRQ_IMS)) >> PAGE_SHIFT;
>> +	dev_dbg(dev, "mmap %lx %lx %lx %lx\n", vma->vm_start, pgoff, req_size,
>> +		pgprot_val(pg_prot));
>> +	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
>> +	vma->vm_private_data = mdev;
> What ensures the mdev pointer is valid strictly longer than the VMA?
> This needs refcounting.

Going to take a kref at open and then put_device at close. Does that 
sound reasonable or should I be calling get_device() in mmap() and then 
register a notifier for when vma is released?


>
>> +	vma->vm_pgoff = pgoff;
>> +
>> +	return remap_pfn_range(vma, vma->vm_start, pgoff, req_size, pg_prot);
> Nothing validated req_size - did you copy this from the Intel RDMA
> driver? It had a huge security bug just like this.
Thanks. Will add. Some of the code came from the Intel i915 mdev driver.
>> +
>> +static int msix_trigger_unregister(struct vdcm_idxd *vidxd, int index)
>> +{
>> +	struct mdev_device *mdev = vidxd->vdev.mdev;
>> +	struct device *dev = mdev_dev(mdev);
>> +	struct ims_irq_entry *irq_entry;
>> +	int rc;
>> +
>> +	if (!vidxd->vdev.msix_trigger[index])
>> +		return 0;
>> +
>> +	dev_dbg(dev, "disable MSIX trigger %d\n", index);
>> +	if (index) {
>> +		u32 auxval;
>> +
>> +		irq_entry = &vidxd->irq_entries[index];
>> +		if (irq_entry->irq_set) {
>> +			free_irq(irq_entry->irq, irq_entry);
>> +			irq_entry->irq_set = false;
>> +		}
>> +
>> +		auxval = ims_ctrl_pasid_aux(0, false);
>> +		rc = irq_set_auxdata(irq_entry->irq, IMS_AUXDATA_CONTROL_WORD, auxval);
>> +		if (rc)
>> +			return rc;
>> +	}
>> +	eventfd_ctx_put(vidxd->vdev.msix_trigger[index]);
>> +	vidxd->vdev.msix_trigger[index] = NULL;
>> +
>> +	return 0;
>> +}
>> +
>> +static int msix_trigger_register(struct vdcm_idxd *vidxd, u32 fd, int index)
>> +{
>> +	struct mdev_device *mdev = vidxd->vdev.mdev;
>> +	struct device *dev = mdev_dev(mdev);
>> +	struct ims_irq_entry *irq_entry;
>> +	struct eventfd_ctx *trigger;
>> +	int rc;
>> +
>> +	if (vidxd->vdev.msix_trigger[index])
>> +		return 0;
>> +
>> +	dev_dbg(dev, "enable MSIX trigger %d\n", index);
>> +	trigger = eventfd_ctx_fdget(fd);
>> +	if (IS_ERR(trigger)) {
>> +		dev_warn(dev, "eventfd_ctx_fdget failed %d\n", index);
>> +		return PTR_ERR(trigger);
>> +	}
>> +
>> +	if (index) {
>> +		u32 pasid;
>> +		u32 auxval;
>> +
>> +		irq_entry = &vidxd->irq_entries[index];
>> +		rc = idxd_mdev_get_pasid(mdev, &pasid);
>> +		if (rc < 0)
>> +			return rc;
>> +
>> +		/*
>> +		 * Program and enable the pasid field in the IMS entry. The programmed pasid and
>> +		 * enabled field is checked against the  pasid and enable field for the work queue
>> +		 * configuration and the pasid for the descriptor. A mismatch will result in blocked
>> +		 * IMS interrupt.
>> +		 */
>> +		auxval = ims_ctrl_pasid_aux(pasid, true);
>> +		rc = irq_set_auxdata(irq_entry->irq, IMS_AUXDATA_CONTROL_WORD, auxval);
>> +		if (rc < 0)
>> +			return rc;
>> +
>> +		rc = request_irq(irq_entry->irq, idxd_guest_wq_completion, 0, "idxd-ims",
>> +				 irq_entry);
>> +		if (rc) {
>> +			dev_warn(dev, "failed to request ims irq\n");
>> +			eventfd_ctx_put(trigger);
>> +			auxval = ims_ctrl_pasid_aux(0, false);
>> +			irq_set_auxdata(irq_entry->irq, IMS_AUXDATA_CONTROL_WORD, auxval);
>> +			return rc;
>> +		}
>> +		irq_entry->irq_set = true;
>> +	}
>> +
>> +	vidxd->vdev.msix_trigger[index] = trigger;
>> +	return 0;
>> +}
>> +
>> +static int vdcm_idxd_set_msix_trigger(struct vdcm_idxd *vidxd,
>> +				      unsigned int index, unsigned int start,
>> +				      unsigned int count, uint32_t flags,
>> +				      void *data)
>> +{
>> +	int i, rc = 0;
>> +
>> +	if (count > VIDXD_MAX_MSIX_ENTRIES - 1)
>> +		count = VIDXD_MAX_MSIX_ENTRIES - 1;
>> +
>> +	if (count == 0 && (flags & VFIO_IRQ_SET_DATA_NONE)) {
>> +		/* Disable all MSIX entries */
>> +		for (i = 0; i < VIDXD_MAX_MSIX_ENTRIES; i++) {
>> +			rc = msix_trigger_unregister(vidxd, i);
>> +			if (rc < 0)
>> +				return rc;
>> +		}
>> +		return 0;
>> +	}
>> +
>> +	for (i = 0; i < count; i++) {
>> +		if (flags & VFIO_IRQ_SET_DATA_EVENTFD) {
>> +			u32 fd = *(u32 *)(data + i * sizeof(u32));
>> +
>> +			rc = msix_trigger_register(vidxd, fd, i);
>> +			if (rc < 0)
>> +				return rc;
>> +		} else if (flags & VFIO_IRQ_SET_DATA_NONE) {
>> +			rc = msix_trigger_unregister(vidxd, i);
>> +			if (rc < 0)
>> +				return rc;
>> +		}
>> +	}
>> +	return rc;
>> +}
>> +
>> +static int idxd_vdcm_set_irqs(struct vdcm_idxd *vidxd, uint32_t flags,
>> +			      unsigned int index, unsigned int start,
>> +			      unsigned int count, void *data)
>> +{
>> +	int (*func)(struct vdcm_idxd *vidxd, unsigned int index,
>> +		    unsigned int start, unsigned int count, uint32_t flags,
>> +		    void *data) = NULL;
>> +	struct mdev_device *mdev = vidxd->vdev.mdev;
>> +	struct device *dev = mdev_dev(mdev);
>> +
>> +	switch (index) {
>> +	case VFIO_PCI_INTX_IRQ_INDEX:
>> +		dev_warn(dev, "intx interrupts not supported.\n");
>> +		break;
>> +	case VFIO_PCI_MSI_IRQ_INDEX:
>> +		dev_dbg(dev, "msi interrupt.\n");
>> +		switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) {
>> +		case VFIO_IRQ_SET_ACTION_MASK:
>> +		case VFIO_IRQ_SET_ACTION_UNMASK:
>> +			break;
>> +		case VFIO_IRQ_SET_ACTION_TRIGGER:
>> +			func = vdcm_idxd_set_msix_trigger;
> This would be a good place to insert a common VFIO helper library to
> take care of the MSI-X emulation for IMS.

I took a look at the idxd version vs the VFIO version and they are 
somewhat different. Although the MSI and MSIX case can be squashed in 
the idxd driver code. I do think that the parent code block can be split 
out in VFIO code and made into a common helper function to deal with 
VFIO_DEVICE_SET_IRQS and I've done so.


>> +int idxd_mdev_host_init(struct idxd_device *idxd)
>> +{
>> +	struct device *dev = &idxd->pdev->dev;
>> +	int rc;
>> +
>> +	if (!test_bit(IDXD_FLAG_IMS_SUPPORTED, &idxd->flags))
>> +		return -EOPNOTSUPP;
>> +
>> +	if (iommu_dev_has_feature(dev, IOMMU_DEV_FEAT_AUX)) {
>> +		rc = iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_AUX);
> Huh. This is the first user of IOMMU_DEV_FEAT_AUX, why has so much
> dead-code infrastructure been already merged around this?
>
>
>> @@ -34,6 +1024,7 @@ static int idxd_mdev_aux_probe(struct auxiliary_device *auxdev,
>>   		return rc;
>>   	}
>>   
>> +	set_bit(IDXD_FLAG_MDEV_ENABLED, &idxd->flags);
> Something is being done wrong if this flag is needed

Will remove.


>
>> +int vidxd_send_interrupt(struct ims_irq_entry *iie)
>> +{
>> +	/* PLACE HOLDER */
>> +	return 0;
>> +}
> Here too, don't structure the patches like this

This is the unfortunate result of attempting to split the inherited 
driver code into manageable patches. Do you suggest I organize it so 
that the function definitions come first, so we don't deal with empty 
functions?

>
>> diff --git a/drivers/vfio/mdev/idxd/vdev.h b/drivers/vfio/mdev/idxd/vdev.h
>> new file mode 100644
>> index 000000000000..cc2ba6ccff7b
>> +++ b/drivers/vfio/mdev/idxd/vdev.h
>> @@ -0,0 +1,19 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +/* Copyright(c) 2019,2020 Intel Corporation. All rights rsvd. */
>> +
>> +#ifndef _IDXD_VDEV_H_
>> +#define _IDXD_VDEV_H_
>> +
>> +#include "mdev.h"
>> +
>> +int vidxd_mmio_read(struct vdcm_idxd *vidxd, u64 pos, void *buf, unsigned int size);
>> +int vidxd_mmio_write(struct vdcm_idxd *vidxd, u64 pos, void *buf, unsigned int size);
>> +int vidxd_cfg_read(struct vdcm_idxd *vidxd, unsigned int pos, void *buf, unsigned int count);
>> +int vidxd_cfg_write(struct vdcm_idxd *vidxd, unsigned int pos, void *buf, unsigned int size);
>> +void vidxd_mmio_init(struct vdcm_idxd *vidxd);
>> +void vidxd_reset(struct vdcm_idxd *vidxd);
>> +int vidxd_send_interrupt(struct ims_irq_entry *iie);
>> +int vidxd_setup_ims_entries(struct vdcm_idxd *vidxd);
>> +void vidxd_free_ims_entries(struct vdcm_idxd *vidxd);
> Why are these functions special??

I'm not sure I follow the intent of this question. The vidxd_* functions 
are split out to vdev.c because they are the emulation helper functions 
for the mdev. It seems reasonable to split them out from the mdev code 
to make it more manageable.


>
> Jason

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 05/14] vfio/mdev: idxd: add basic mdev registration and helper functions
  2021-02-16 19:04     ` Dave Jiang
@ 2021-02-16 20:39       ` Dan Williams
  2021-02-16 21:31         ` Jason Gunthorpe
  2021-02-16 21:33       ` Jason Gunthorpe
  1 sibling, 1 reply; 31+ messages in thread
From: Dan Williams @ 2021-02-16 20:39 UTC (permalink / raw)
  To: Dave Jiang
  Cc: Jason Gunthorpe, Alex Williamson, kwankhede, Thomas Gleixner,
	Vinod Koul, Dey, Megha, Jacob jun Pan, Raj, Ashok, Yi L Liu,
	Baolu Lu, Tian, Kevin, Sanjay K Kumar, Luck, Tony, eric.auger,
	Parav Pandit, netanelg, shahafs, Paolo Bonzini, dmaengine,
	Linux Kernel Mailing List, KVM list

On Tue, Feb 16, 2021 at 11:05 AM Dave Jiang <dave.jiang@intel.com> wrote:
>
>
> On 2/10/2021 4:59 PM, Jason Gunthorpe wrote:
> > On Fri, Feb 05, 2021 at 01:53:24PM -0700, Dave Jiang wrote:
> >
> >> +static int check_vma(struct idxd_wq *wq, struct vm_area_struct *vma)
> >>   {
> >> -    /* FIXME: Fill in later */
> >> +    if (vma->vm_end < vma->vm_start)
> >> +            return -EINVAL;
> > These checks are redundant
>
> Thanks. Will remove.
>
> >
> >> -static int idxd_mdev_host_release(struct idxd_device *idxd)
> >> +static int idxd_vdcm_mmap(struct mdev_device *mdev, struct vm_area_struct *vma)
> >> +{
> >> +    unsigned int wq_idx, rc;
> >> +    unsigned long req_size, pgoff = 0, offset;
> >> +    pgprot_t pg_prot;
> >> +    struct vdcm_idxd *vidxd = mdev_get_drvdata(mdev);
> >> +    struct idxd_wq *wq = vidxd->wq;
> >> +    struct idxd_device *idxd = vidxd->idxd;
> >> +    enum idxd_portal_prot virt_portal, phys_portal;
> >> +    phys_addr_t base = pci_resource_start(idxd->pdev, IDXD_WQ_BAR);
> >> +    struct device *dev = mdev_dev(mdev);
> >> +
> >> +    rc = check_vma(wq, vma);
> >> +    if (rc)
> >> +            return rc;
> >> +
> >> +    pg_prot = vma->vm_page_prot;
> >> +    req_size = vma->vm_end - vma->vm_start;
> >> +    vma->vm_flags |= VM_DONTCOPY;
> >> +
> >> +    offset = (vma->vm_pgoff << PAGE_SHIFT) &
> >> +             ((1ULL << VFIO_PCI_OFFSET_SHIFT) - 1);
> >> +
> >> +    wq_idx = offset >> (PAGE_SHIFT + 2);
> >> +    if (wq_idx >= 1) {
> >> +            dev_err(dev, "mapping invalid wq %d off %lx\n",
> >> +                    wq_idx, offset);
> >> +            return -EINVAL;
> >> +    }
> >> +
> >> +    /*
> >> +     * Check and see if the guest wants to map to the limited or unlimited portal.
> >> +     * The driver will allow mapping to unlimited portal only if the wq is a
> >> +     * dedicated wq. Otherwise, it goes to limited.
> >> +     */
> >> +    virt_portal = ((offset >> PAGE_SHIFT) & 0x3) == 1;
> >> +    phys_portal = IDXD_PORTAL_LIMITED;
> >> +    if (virt_portal == IDXD_PORTAL_UNLIMITED && wq_dedicated(wq))
> >> +            phys_portal = IDXD_PORTAL_UNLIMITED;
> >> +
> >> +    /* We always map IMS portals to the guest */
> >> +    pgoff = (base + idxd_get_wq_portal_full_offset(wq->id, phys_portal,
> >> +                                                   IDXD_IRQ_IMS)) >> PAGE_SHIFT;
> >> +    dev_dbg(dev, "mmap %lx %lx %lx %lx\n", vma->vm_start, pgoff, req_size,
> >> +            pgprot_val(pg_prot));
> >> +    vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
> >> +    vma->vm_private_data = mdev;
> > What ensures the mdev pointer is valid strictly longer than the VMA?
> > This needs refcounting.
>
> Going to take a kref at open and then put_device at close. Does that
> sound reasonable or should I be calling get_device() in mmap() and then
> register a notifier for when vma is released?

Where does this enabling ever look at vm_private_data again? It seems
to me it should be reasonable for the mdev to die out from underneath
a vma, just need some tracking to block future uses of the
vma->vm_private_data from being attempted.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 05/14] vfio/mdev: idxd: add basic mdev registration and helper functions
  2021-02-16 20:39       ` Dan Williams
@ 2021-02-16 21:31         ` Jason Gunthorpe
  0 siblings, 0 replies; 31+ messages in thread
From: Jason Gunthorpe @ 2021-02-16 21:31 UTC (permalink / raw)
  To: Dan Williams
  Cc: Dave Jiang, Alex Williamson, kwankhede, Thomas Gleixner,
	Vinod Koul, Dey, Megha, Jacob jun Pan, Raj, Ashok, Yi L Liu,
	Baolu Lu, Tian, Kevin, Sanjay K Kumar, Luck, Tony, eric.auger,
	Parav Pandit, netanelg, shahafs, Paolo Bonzini, dmaengine,
	Linux Kernel Mailing List, KVM list

On Tue, Feb 16, 2021 at 12:39:56PM -0800, Dan Williams wrote:
> > >> +    /*
> > >> +     * Check and see if the guest wants to map to the limited or unlimited portal.
> > >> +     * The driver will allow mapping to unlimited portal only if the wq is a
> > >> +     * dedicated wq. Otherwise, it goes to limited.
> > >> +     */
> > >> +    virt_portal = ((offset >> PAGE_SHIFT) & 0x3) == 1;
> > >> +    phys_portal = IDXD_PORTAL_LIMITED;
> > >> +    if (virt_portal == IDXD_PORTAL_UNLIMITED && wq_dedicated(wq))
> > >> +            phys_portal = IDXD_PORTAL_UNLIMITED;
> > >> +
> > >> +    /* We always map IMS portals to the guest */
> > >> +    pgoff = (base + idxd_get_wq_portal_full_offset(wq->id, phys_portal,
> > >> +                                                   IDXD_IRQ_IMS)) >> PAGE_SHIFT;
> > >> +    dev_dbg(dev, "mmap %lx %lx %lx %lx\n", vma->vm_start, pgoff, req_size,
> > >> +            pgprot_val(pg_prot));
> > >> +    vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
> > >> +    vma->vm_private_data = mdev;
> > > What ensures the mdev pointer is valid strictly longer than the VMA?
> > > This needs refcounting.
> >
> > Going to take a kref at open and then put_device at close. Does that
> > sound reasonable or should I be calling get_device() in mmap() and then
> > register a notifier for when vma is released?
> 
> Where does this enabling ever look at vm_private_data again?

So long as a PCI BAR page is mapped into a VMA the pci driver cannot
be removed. Things must either wait until the fd (or at least all
VMAs) are closed, or zap the VMAs before allowing the device driver to
be removed.

There should be some logic in this whole thing where the pci_driver
destroys the mdevs which destroy the vfio's which wait for all the fds
to be closed.

There is enough going on in vfio_device_fops_release() that this might
happen already, Dave needs to investigate and confirm the whole thing
works as expected.

Presumably there is no security issue with sharing these portal pages
because I don't see a vma ops involved here to track when pages are
freed up (ie the vm_private_data is dead code cargo-cult'd from
someplace else)

But this is all sufficiently tricky, and Intel has already had
security bugs in their drivers here, that someone needs to audit it
closely before it gets posted again.

Jason

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 05/14] vfio/mdev: idxd: add basic mdev registration and helper functions
  2021-02-16 19:04     ` Dave Jiang
  2021-02-16 20:39       ` Dan Williams
@ 2021-02-16 21:33       ` Jason Gunthorpe
  2021-02-19  1:42         ` Tian, Kevin
  1 sibling, 1 reply; 31+ messages in thread
From: Jason Gunthorpe @ 2021-02-16 21:33 UTC (permalink / raw)
  To: Dave Jiang
  Cc: alex.williamson, kwankhede, tglx, vkoul, megha.dey,
	jacob.jun.pan, ashok.raj, yi.l.liu, baolu.lu, kevin.tian,
	sanjay.k.kumar, tony.luck, dan.j.williams, eric.auger, parav,
	netanelg, shahafs, pbonzini, dmaengine, linux-kernel, kvm

On Tue, Feb 16, 2021 at 12:04:55PM -0700, Dave Jiang wrote:

> > > +	return remap_pfn_range(vma, vma->vm_start, pgoff, req_size, pg_prot);
> > Nothing validated req_size - did you copy this from the Intel RDMA
> > driver? It had a huge security bug just like this.

> Thanks. Will add. Some of the code came from the Intel i915 mdev
> driver.

Please make sure it is fixed as well, the security bug is huge.

> > > +			      unsigned int index, unsigned int start,
> > > +			      unsigned int count, void *data)
> > > +{
> > > +	int (*func)(struct vdcm_idxd *vidxd, unsigned int index,
> > > +		    unsigned int start, unsigned int count, uint32_t flags,
> > > +		    void *data) = NULL;
> > > +	struct mdev_device *mdev = vidxd->vdev.mdev;
> > > +	struct device *dev = mdev_dev(mdev);
> > > +
> > > +	switch (index) {
> > > +	case VFIO_PCI_INTX_IRQ_INDEX:
> > > +		dev_warn(dev, "intx interrupts not supported.\n");
> > > +		break;
> > > +	case VFIO_PCI_MSI_IRQ_INDEX:
> > > +		dev_dbg(dev, "msi interrupt.\n");
> > > +		switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) {
> > > +		case VFIO_IRQ_SET_ACTION_MASK:
> > > +		case VFIO_IRQ_SET_ACTION_UNMASK:
> > > +			break;
> > > +		case VFIO_IRQ_SET_ACTION_TRIGGER:
> > > +			func = vdcm_idxd_set_msix_trigger;
> > This would be a good place to insert a common VFIO helper library to
> > take care of the MSI-X emulation for IMS.
> 
> I took a look at the idxd version vs the VFIO version and they are somewhat
> different. Although the MSI and MSIX case can be squashed in the idxd driver
> code. I do think that the parent code block can be split out in VFIO code
> and made into a common helper function to deal with VFIO_DEVICE_SET_IRQS and
> I've done so.

Really it looks like the MSI emulation for a simple IMS device is just
mapping the MSI table to a certain irq_chip; this feels like it should
be substantially common code.

> > > diff --git a/drivers/vfio/mdev/idxd/vdev.h b/drivers/vfio/mdev/idxd/vdev.h
> > > new file mode 100644
> > > index 000000000000..cc2ba6ccff7b
> > > +++ b/drivers/vfio/mdev/idxd/vdev.h
> > > @@ -0,0 +1,19 @@
> > > +/* SPDX-License-Identifier: GPL-2.0 */
> > > +/* Copyright(c) 2019,2020 Intel Corporation. All rights rsvd. */
> > > +
> > > +#ifndef _IDXD_VDEV_H_
> > > +#define _IDXD_VDEV_H_
> > > +
> > > +#include "mdev.h"
> > > +
> > > +int vidxd_mmio_read(struct vdcm_idxd *vidxd, u64 pos, void *buf, unsigned int size);
> > > +int vidxd_mmio_write(struct vdcm_idxd *vidxd, u64 pos, void *buf, unsigned int size);
> > > +int vidxd_cfg_read(struct vdcm_idxd *vidxd, unsigned int pos, void *buf, unsigned int count);
> > > +int vidxd_cfg_write(struct vdcm_idxd *vidxd, unsigned int pos, void *buf, unsigned int size);
> > > +void vidxd_mmio_init(struct vdcm_idxd *vidxd);
> > > +void vidxd_reset(struct vdcm_idxd *vidxd);
> > > +int vidxd_send_interrupt(struct ims_irq_entry *iie);
> > > +int vidxd_setup_ims_entries(struct vdcm_idxd *vidxd);
> > > +void vidxd_free_ims_entries(struct vdcm_idxd *vidxd);
> > Why are these functions special??
> 
> I'm not sure I follow the intent of this question. The vidxd_* functions are
> split out to vdev.c because they are the emulation helper functions for the
> mdev. It seems reasonable to split them out from the mdev code to make it
> more manageable.

Why do they get their own mostly empty header file?

Jason

^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: [PATCH v5 05/14] vfio/mdev: idxd: add basic mdev registration and helper functions
  2021-02-16 21:33       ` Jason Gunthorpe
@ 2021-02-19  1:42         ` Tian, Kevin
  0 siblings, 0 replies; 31+ messages in thread
From: Tian, Kevin @ 2021-02-19  1:42 UTC (permalink / raw)
  To: Jason Gunthorpe, Jiang, Dave
  Cc: alex.williamson, kwankhede, tglx, vkoul, Dey, Megha, Pan,
	Jacob jun, Raj, Ashok, Liu, Yi L, Lu, Baolu, Kumar, Sanjay K,
	Luck, Tony, Williams, Dan J, eric.auger, parav, netanelg,
	shahafs, pbonzini, dmaengine, linux-kernel, kvm, Wang, Zhenyu Z

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Wednesday, February 17, 2021 5:33 AM
> 
> On Tue, Feb 16, 2021 at 12:04:55PM -0700, Dave Jiang wrote:
> 
> > > > +	return remap_pfn_range(vma, vma->vm_start, pgoff, req_size,
> pg_prot);
> > > Nothing validated req_size - did you copy this from the Intel RDMA
> > > driver? It had a huge security bug just like this.
> 
> > Thanks. Will add. Some of the code came from the Intel i915 mdev
> > driver.
> 
> Please make sure it is fixed as well, the security bug is huge.
> 

It's already been fixed 2yrs ago:

commit 51b00d8509dc69c98740da2ad07308b630d3eb7d
Author: Zhenyu Wang <zhenyuw@linux.intel.com>
Date:   Fri Jan 11 13:58:53 2019 +0800

    drm/i915/gvt: Fix mmap range check

    This is to fix missed mmap range check on vGPU bar2 region
    and only allow to map vGPU allocated GMADDR range, which means
    user space should support sparse mmap to get proper offset for
    mmap vGPU aperture. And this takes care of actual pgoff in mmap
    request as original code always does from beginning of vGPU
    aperture.

    Fixes: 659643f7d814 ("drm/i915/gvt/kvmgt: add vfio/mdev support to KVMGT")
    Cc: "Monroy, Rodrigo Axel" <rodrigo.axel.monroy@intel.com>
    Cc: "Orrala Contreras, Alfredo" <alfredo.orrala.contreras@intel.com>
    Cc: stable@vger.kernel.org # v4.10+
    Reviewed-by: Hang Yuan <hang.yuan@intel.com>
    Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>

Thanks
Kevin
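
The bug class that commit fixed is worth spelling out: an mmap handler that
forwards a user-controlled pgoff/size straight to remap_pfn_range() without
checking them against the region the device actually owns. A minimal
user-space sketch of the validation (the struct and function names here are
illustrative, not the actual gvt or idxd code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical descriptor of the device memory a vdev may expose */
struct vdev_region {
	uint64_t base_pfn;	/* first valid pfn of the region */
	uint64_t nr_pages;	/* number of pages in the region */
};

/*
 * Validate an mmap request before handing it to remap_pfn_range().
 * pgoff and req_pages come from the (untrusted) vma; both the start
 * and the end of the requested window must fall inside the region,
 * and the end computation must not overflow.
 */
static bool mmap_range_ok(const struct vdev_region *r,
			  uint64_t pgoff, uint64_t req_pages)
{
	if (req_pages == 0)
		return false;
	if (pgoff < r->base_pfn)
		return false;

	uint64_t off = pgoff - r->base_pfn;

	if (off >= r->nr_pages)
		return false;
	if (req_pages > r->nr_pages - off)	/* overflow-safe end check */
		return false;
	return true;
}
```

The key point is the last comparison: checking `off + req_pages <= nr_pages`
directly can wrap, so the subtraction form is the safe spelling.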



* Re: [PATCH v5 05/14] vfio/mdev: idxd: add basic mdev registration and helper functions
  2021-02-10 23:59   ` Jason Gunthorpe
  2021-02-16 19:04     ` Dave Jiang
@ 2021-03-02  0:23     ` Dave Jiang
  2021-03-02  0:29       ` Jason Gunthorpe
  1 sibling, 1 reply; 31+ messages in thread
From: Dave Jiang @ 2021-03-02  0:23 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: alex.williamson, kwankhede, tglx, vkoul, megha.dey,
	jacob.jun.pan, ashok.raj, yi.l.liu, baolu.lu, kevin.tian,
	sanjay.k.kumar, tony.luck, dan.j.williams, eric.auger, parav,
	netanelg, shahafs, pbonzini, dmaengine, linux-kernel, kvm


On 2/10/2021 4:59 PM, Jason Gunthorpe wrote:
> On Fri, Feb 05, 2021 at 01:53:24PM -0700, Dave Jiang wrote:

<-- cut for brevity -->


> +static int vdcm_idxd_set_msix_trigger(struct vdcm_idxd *vidxd,
> +				      unsigned int index, unsigned int start,
> +				      unsigned int count, uint32_t flags,
> +				      void *data)
> +{
> +	int i, rc = 0;
> +
> +	if (count > VIDXD_MAX_MSIX_ENTRIES - 1)
> +		count = VIDXD_MAX_MSIX_ENTRIES - 1;
> +
> +	if (count == 0 && (flags & VFIO_IRQ_SET_DATA_NONE)) {
> +		/* Disable all MSIX entries */
> +		for (i = 0; i < VIDXD_MAX_MSIX_ENTRIES; i++) {
> +			rc = msix_trigger_unregister(vidxd, i);
> +			if (rc < 0)
> +				return rc;
> +		}
> +		return 0;
> +	}
> +
> +	for (i = 0; i < count; i++) {
> +		if (flags & VFIO_IRQ_SET_DATA_EVENTFD) {
> +			u32 fd = *(u32 *)(data + i * sizeof(u32));
> +
> +			rc = msix_trigger_register(vidxd, fd, i);
> +			if (rc < 0)
> +				return rc;
> +		} else if (flags & VFIO_IRQ_SET_DATA_NONE) {
> +			rc = msix_trigger_unregister(vidxd, i);
> +			if (rc < 0)
> +				return rc;
> +		}
> +	}
> +	return rc;
> +}
> +
> +static int idxd_vdcm_set_irqs(struct vdcm_idxd *vidxd, uint32_t flags,
> +			      unsigned int index, unsigned int start,
> +			      unsigned int count, void *data)
> +{
> +	int (*func)(struct vdcm_idxd *vidxd, unsigned int index,
> +		    unsigned int start, unsigned int count, uint32_t flags,
> +		    void *data) = NULL;
> +	struct mdev_device *mdev = vidxd->vdev.mdev;
> +	struct device *dev = mdev_dev(mdev);
> +
> +	switch (index) {
> +	case VFIO_PCI_INTX_IRQ_INDEX:
> +		dev_warn(dev, "intx interrupts not supported.\n");
> +		break;
> +	case VFIO_PCI_MSI_IRQ_INDEX:
> +		dev_dbg(dev, "msi interrupt.\n");
> +		switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) {
> +		case VFIO_IRQ_SET_ACTION_MASK:
> +		case VFIO_IRQ_SET_ACTION_UNMASK:
> +			break;
> +		case VFIO_IRQ_SET_ACTION_TRIGGER:
> +			func = vdcm_idxd_set_msix_trigger;
> This would be a good place to insert a common VFIO helper library to
> take care of the MSI-X emulation for IMS.

Hi Jason,

So after looking at the code in vfio_pci_intrs.c, I agree that the 
set_irqs code between VFIO_PCI and this driver can be made common. 
Given that Alex doesn't want a vfio_pci device embedded in the driver, 
I think we'll need some sort of generic VFIO device that both the 
vfio_pci and vfio_mdev sides can pass down to common support library 
functions. Do you have any thoughts on how to do this cleanly 
architecturally? Also, with the vfio_pci common split [1] still being 
worked on, do you think we can defer making the interrupt setup code 
common until that work settles? Thanks!

[1]: https://lore.kernel.org/kvm/20210201162828.5938-1-mgurtovoy@nvidia.com/
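
For reference, the duplicated logic in question is the
DATA_EVENTFD/DATA_NONE walk over a vector range. A user-space sketch of
what a shared helper might look like, with the device-specific work behind
callbacks (the irq_emul_* names are hypothetical and not an existing VFIO
API; the flag values match include/uapi/linux/vfio.h):

```c
#include <assert.h>
#include <stdint.h>

/* Flag values as defined in include/uapi/linux/vfio.h */
#define VFIO_IRQ_SET_DATA_NONE		(1 << 0)
#define VFIO_IRQ_SET_DATA_EVENTFD	(1 << 2)

/* Hypothetical ops a driver would supply to the shared helper */
struct irq_emul_ops {
	int (*trigger_register)(void *priv, uint32_t fd, unsigned int index);
	int (*trigger_unregister)(void *priv, unsigned int index);
};

/*
 * The loop currently duplicated by vfio_pci_intrs.c and
 * vdcm_idxd_set_msix_trigger(): validate the range, then walk
 * [start, start + count) and call back into the driver for the
 * device-specific register/unregister work.
 */
static int irq_emul_set_trigger(const struct irq_emul_ops *ops, void *priv,
				unsigned int start, unsigned int count,
				unsigned int nr_irqs, uint32_t flags,
				void *data)
{
	unsigned int i;
	int rc;

	if (start + count < start || start + count > nr_irqs)
		return -1;	/* would be -EINVAL in the kernel */

	if (count == 0 && (flags & VFIO_IRQ_SET_DATA_NONE)) {
		/* Disable every vector */
		for (i = 0; i < nr_irqs; i++) {
			rc = ops->trigger_unregister(priv, i);
			if (rc < 0)
				return rc;
		}
		return 0;
	}

	for (i = 0; i < count; i++) {
		if (flags & VFIO_IRQ_SET_DATA_EVENTFD) {
			uint32_t fd = ((uint32_t *)data)[i];

			rc = ops->trigger_register(priv, fd, start + i);
		} else if (flags & VFIO_IRQ_SET_DATA_NONE) {
			rc = ops->trigger_unregister(priv, start + i);
		} else {
			continue;
		}
		if (rc < 0)
			return rc;
	}
	return 0;
}
```

With something like this, the per-driver code shrinks to the two callbacks,
and the range validation lives in one place instead of each mdev driver.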




* Re: [PATCH v5 05/14] vfio/mdev: idxd: add basic mdev registration and helper functions
  2021-03-02  0:23     ` Dave Jiang
@ 2021-03-02  0:29       ` Jason Gunthorpe
  2021-03-02  0:48         ` Dave Jiang
  0 siblings, 1 reply; 31+ messages in thread
From: Jason Gunthorpe @ 2021-03-02  0:29 UTC (permalink / raw)
  To: Dave Jiang
  Cc: alex.williamson, kwankhede, tglx, vkoul, megha.dey,
	jacob.jun.pan, ashok.raj, yi.l.liu, baolu.lu, kevin.tian,
	sanjay.k.kumar, tony.luck, dan.j.williams, eric.auger, parav,
	netanelg, shahafs, pbonzini, dmaengine, linux-kernel, kvm

On Mon, Mar 01, 2021 at 05:23:47PM -0700, Dave Jiang wrote:
> 
> So after looking at the code in vfio_pci_intrs.c, I agree that the set_irqs
> code between VFIO_PCI and this driver can be made in common. Given that Alex
> doesn't want a vfio_pci device embedded in the driver, 

idxd isn't a vfio_pci so it would be improper to do something like
that here anyhow.

> I think we'll need some sort of generic VFIO device that can be used
> from the vfio_pci side and vfio_mdev side to pass down in order to
> have common support library functions. 

Why do you need more layers?

Just make some helper functions to manage this and build them into
> their own struct and function family. All this needs is some callbacks
> for the end driver to hook in the raw device programming and some
entry points to direct the emulation access to the module.

It should be fully self contained and completely unrelated to vfio_pci

Jason


* Re: [PATCH v5 05/14] vfio/mdev: idxd: add basic mdev registration and helper functions
  2021-03-02  0:29       ` Jason Gunthorpe
@ 2021-03-02  0:48         ` Dave Jiang
  2021-03-02  0:50           ` Jason Gunthorpe
  0 siblings, 1 reply; 31+ messages in thread
From: Dave Jiang @ 2021-03-02  0:48 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: alex.williamson, kwankhede, tglx, vkoul, megha.dey,
	jacob.jun.pan, ashok.raj, yi.l.liu, baolu.lu, kevin.tian,
	sanjay.k.kumar, tony.luck, dan.j.williams, eric.auger, parav,
	netanelg, shahafs, pbonzini, dmaengine, linux-kernel, kvm


On 3/1/2021 5:29 PM, Jason Gunthorpe wrote:
> On Mon, Mar 01, 2021 at 05:23:47PM -0700, Dave Jiang wrote:
>> So after looking at the code in vfio_pci_intrs.c, I agree that the set_irqs
>> code between VFIO_PCI and this driver can be made in common. Given that Alex
>> doesn't want a vfio_pci device embedded in the driver,
> idxd isn't a vfio_pci so it would be improper to do something like
> that here anyhow.
>
>> I think we'll need some sort of generic VFIO device that can be used
>> from the vfio_pci side and vfio_mdev side to pass down in order to
>> have common support library functions.
> Why do you need more layers?
>
> Just make some helper functions to manage this and build them into
> their own struct and function family. All this needs is some callbacks
> for the end driver to hook in the raw device programming and some
> entry points to direct the emulation access to the module.
>
> It should be fully self contained and completely unrelated to vfio_pci
>
Maybe I'm looking at this wrong. I see some code in vfio_pci_intrs.c 
that we can reuse with some changes here and there. But I think I see 
what you are getting at with just common functions for the mdev side. 
Let me create it just for IMS emulation, and then we can go from there 
and figure out whether that's the right path or whether we need to 
share code with vfio_pci.


* Re: [PATCH v5 05/14] vfio/mdev: idxd: add basic mdev registration and helper functions
  2021-03-02  0:48         ` Dave Jiang
@ 2021-03-02  0:50           ` Jason Gunthorpe
  0 siblings, 0 replies; 31+ messages in thread
From: Jason Gunthorpe @ 2021-03-02  0:50 UTC (permalink / raw)
  To: Dave Jiang
  Cc: alex.williamson, kwankhede, tglx, vkoul, megha.dey,
	jacob.jun.pan, ashok.raj, yi.l.liu, baolu.lu, kevin.tian,
	sanjay.k.kumar, tony.luck, dan.j.williams, eric.auger, parav,
	netanelg, shahafs, pbonzini, dmaengine, linux-kernel, kvm

On Mon, Mar 01, 2021 at 05:48:00PM -0700, Dave Jiang wrote:
> 
> On 3/1/2021 5:29 PM, Jason Gunthorpe wrote:
> > On Mon, Mar 01, 2021 at 05:23:47PM -0700, Dave Jiang wrote:
> > > So after looking at the code in vfio_pci_intrs.c, I agree that the set_irqs
> > > code between VFIO_PCI and this driver can be made in common. Given that Alex
> > > doesn't want a vfio_pci device embedded in the driver,
> > idxd isn't a vfio_pci so it would be improper to do something like
> > that here anyhow.
> > 
> > > I think we'll need some sort of generic VFIO device that can be used
> > > from the vfio_pci side and vfio_mdev side to pass down in order to
> > > have common support library functions.
> > Why do you need more layers?
> > 
> > Just make some helper functions to manage this and build them into
> > their own struct and function family. All this needs is some callbacks
> > for the end driver to hook in the raw device programming and some
> > entry points to direct the emulation access to the module.
> > 
> > It should be fully self contained and completely unrelated to vfio_pci
> > 
> Maybe I'm looking at this wrong. I see some code in vfio_pci_intrs.c that
> we can reuse with some changes here and there. But I think I see what you
> are getting at with just common functions for the mdev side. Let me create
> it just for IMS emulation, and then we can go from there and figure out
> whether that's the right path or whether we need to share code with vfio_pci.

If it really is very common it could all be consolidated in a
vfio_utils.c kind of thing that all the places can use.

There is nothing wrong with splitting pieces of vfio_pci out.

Jason


end of thread, other threads:[~2021-03-02  7:46 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-02-05 20:52 [PATCH v5 00/14] Add VFIO mediated device support and DEV-MSI support for the idxd driver Dave Jiang
2021-02-05 20:52 ` [PATCH v5 01/14] vfio/mdev: idxd: add theory of operation documentation for idxd mdev Dave Jiang
2021-02-05 20:53 ` [PATCH v5 02/14] dmaengine: idxd: add IMS detection in base driver Dave Jiang
2021-02-10 23:30   ` Jason Gunthorpe
2021-02-10 23:32     ` Dave Jiang
2021-02-05 20:53 ` [PATCH v5 03/14] dmaengine: idxd: add device support functions in prep for mdev Dave Jiang
2021-02-05 20:53 ` [PATCH v5 04/14] vfio/mdev: idxd: Add auxialary device plumbing for idxd mdev support Dave Jiang
2021-02-10 23:46   ` Jason Gunthorpe
2021-02-12 18:56     ` Dave Jiang
2021-02-12 19:14       ` Jason Gunthorpe
2021-02-05 20:53 ` [PATCH v5 05/14] vfio/mdev: idxd: add basic mdev registration and helper functions Dave Jiang
2021-02-10 23:59   ` Jason Gunthorpe
2021-02-16 19:04     ` Dave Jiang
2021-02-16 20:39       ` Dan Williams
2021-02-16 21:31         ` Jason Gunthorpe
2021-02-16 21:33       ` Jason Gunthorpe
2021-02-19  1:42         ` Tian, Kevin
2021-03-02  0:23     ` Dave Jiang
2021-03-02  0:29       ` Jason Gunthorpe
2021-03-02  0:48         ` Dave Jiang
2021-03-02  0:50           ` Jason Gunthorpe
2021-02-05 20:53 ` [PATCH v5 06/14] vfio/mdev: idxd: add mdev type as a new wq type Dave Jiang
2021-02-05 20:53 ` [PATCH v5 07/14] vfio/mdev: idxd: add 1dwq-v1 mdev type Dave Jiang
2021-02-11  0:00   ` Jason Gunthorpe
2021-02-05 20:53 ` [PATCH v5 08/14] vfio/mdev: idxd: add emulation rw routines Dave Jiang
2021-02-05 20:53 ` [PATCH v5 09/14] vfio/mdev: idxd: prep for virtual device commands Dave Jiang
2021-02-05 20:53 ` [PATCH v5 10/14] vfio/mdev: idxd: virtual device commands emulation Dave Jiang
2021-02-05 20:54 ` [PATCH v5 11/14] vfio/mdev: idxd: ims setup for the vdcm Dave Jiang
2021-02-05 20:54 ` [PATCH v5 12/14] vfio/mdev: idxd: add irq bypass for IMS vectors Dave Jiang
2021-02-05 20:54 ` [PATCH v5 13/14] vfio/mdev: idxd: add new wq state for mdev Dave Jiang
2021-02-05 20:54 ` [PATCH v5 14/14] vfio/mdev: idxd: add error notification from host driver to mediated device Dave Jiang
