LKML Archive on
 help / color / Atom feed
From: Dave Jiang <>
Subject: [PATCH v5 0/9] idxd driver for Intel Data Streaming Accelerator
Date: Tue, 21 Jan 2020 16:43:35 -0700
Message-ID: <> (raw)

- Fix header spelling and added Ack.
- Rebase to dmaengine-next
- Make wq resource allocation only for kernel wq type.
- Reworked patch 2 to drop un-needed changes to dma device registration.

- Merge unused __iowrite512() into iosubmit_cmds512().
- Fix various comments for iosubmit_cmds512() patch.
- Drop dmanegine request API and supporting code
- Update to use existing dmaengine API

- Change request_alloc to context_alloc to make it more generic
- Make context_alloc only built when selected via CONFIG_CONTEXT_ALLOC
- Change function names to context_alloc_from_pages() and
- Added commenting to better document the functions.

- Pushed CPU feature check burden to the driver and removed feature check and alignment check from iosubmit_cmds512()
- Removed generic support and make iosubmit_cmds512() x86 only
- added comments to explain __iowrite512() quirks.

The patch series breaks down into following parts:
Patch 1: x86 arch, add a new I/O accessor based on movdir64b
Patches 2,3: dmaengine subsystem changes for chan hotplug
Patches 4-9: idxd driver

This patch series implements the first part of the driver for the Intel
Data Streaming accelerator, the Intel Data Accelerator driver (idxd).
The Intel DSA replaces the Intel IOAT DMA engine from previous Xeon platforms
on a future processor platform. Many new features are implemented by Intel DSA.
1. Descriptors can be issued directly from kernel, user, and guest via new CPU
   instructions enqcmd, enqcmds, and movdir64b. The descriptor is written to
   an mmio address in one of the device's PCI BAR and is called a portal.
   New CPU instruction details can be found in the latest Intel Software
   Developer's Manual. [1]
2. Shared workqueues allow multiple users issue descriptors to the same
3. Shared Virtual Memory (SVM) support allows using virtual address instead of
   requiring pinned physical address that traditional DMA controllers require.
   This simplifies programming and makes it easier for user space to do DMA
   operations. Page faults can be recovered through PCI Address Translation
   Service (ATS) performed by the DMA device.
4. Supports scalable IOV (SIOV) to accelerate virtualization. [2]

The submission will happen in multiple stages depending on availability of
kernel support for Process Address Space ID (PASID), IOMMU, vIOMMU, and
Interrupt Message Storage (IMS).

Stage 1 (this series): idxd driver with only dedicated workqueue support.
	- No PASID support
	- No shared workqueue (requires PASID) support
	- With DMA engine plumbing
	- With char driver for user command portal export.
Stage 2. idxd driver with PASID support and shared workqueue support
STage 3. idxd driver with VFIO mediated device (mdev) and with IMS support.

The DSA device defines sub-components called workqueues, groups, and engines.
A group is an abstract container that can have 1 or more workqueues and 1 or
more engines. The number of groups, workqueues, and engines supported by the
device can be detected from the general capabilities register. The workqueues
are where descriptors queued up before being processed by the engines.

The DSA device also has a memory BAR that contains multiple portals.
Depending on the offset from the BAR, various portals can be used to submit
descriptors with one of the CPU commands mentioned above. The types of
portals are MSIX limited, MSIX unlimited, IMS limited, and IMS unlimited as
defined by the hardware spec. The MSIX unlimited portals are reserved for
kernel submissions. The limited portals can be exported to user space for
application usages. A limited portal is configured by the workqueue threshold
attribute and can be restricted to have a workqueue size that is smaller than
the actual workqueue size. This allows the kernel to submit command descriptors
to a workqueue and not be blocked by the user application.

There are two types of workqueues that the DSA device supports, dedicated and
shared. A dedicated workqueue receives descriptors via the movdir64b
instruction. This instruction is a posted write and therefore does not wait for
a completion. Because of this, the software must keep track of the number of
descriptors submitted to the workqueue. A full workqueue will drop the
descriptor without notice. A shared workqueue accepts the enqcmds instruction in
the kernel and enqcmd instruction from user applications. The command will set
the zero flag to indicate whether the submission of the descriptor is
successful. The enqcmd(s) instruction is non-posted and waits for the write
completion before return.

The stage 1 of the patch submission provides a base driver that only support
the dedicated workqueue type without PASID support. The supported source and
destination addresses must be physical. This is similar to traditional
DMA operations where the device receives a descriptor with physical source and
destination addresses for operation. Plumbing to the existing kernel dmaengine
subsystem is added in order to support such usages. DMA memmove operation can be
tested with the in kernel dmatest module.

A large part of the base driver is the sysfs component.  There is also
no requirement for DSA to be used during early kernel boot. Configuration
of the device during initramfs should be sufficient.

A bus type (dsa_bus) is defined for a hierachy of DSA devices and
sub-components to be connected to, /sys/bus/dsa/.
A struct device is created for each DSA device and for each of its
sub-component (workqueues, groups, and engines).  So looking under
/sys/bus/dsa/devices, one would observe entries such as dsa0, dsa1, wq0.0,
wq1.0, group0.0, engine0.0, and etc. Each of those has sysfs attributes
underneath that allows the configuration of those parts or reporting status or
capabilities of the parts that they represent.

├── dsa0 -> ../../../devices/pci0000:00/0000:00:0a.0/dsa0
├── engine0.0 -> ../../../devices/pci0000:00/0000:00:0a.0/dsa0/engine0.0
├── engine0.1 -> ../../../devices/pci0000:00/0000:00:0a.0/dsa0/engine0.1
├── engine0.2 -> ../../../devices/pci0000:00/0000:00:0a.0/dsa0/engine0.2
├── engine0.3 -> ../../../devices/pci0000:00/0000:00:0a.0/dsa0/engine0.3
├── group0.0 -> ../../../devices/pci0000:00/0000:00:0a.0/dsa0/group0.0
├── group0.1 -> ../../../devices/pci0000:00/0000:00:0a.0/dsa0/group0.1
├── group0.2 -> ../../../devices/pci0000:00/0000:00:0a.0/dsa0/group0.2
├── group0.3 -> ../../../devices/pci0000:00/0000:00:0a.0/dsa0/group0.3
├── wq0.0 -> ../../../devices/pci0000:00/0000:00:0a.0/dsa0/wq0.0
├── wq0.1 -> ../../../devices/pci0000:00/0000:00:0a.0/dsa0/wq0.1
├── wq0.2 -> ../../../devices/pci0000:00/0000:00:0a.0/dsa0/wq0.2
├── wq0.3 -> ../../../devices/pci0000:00/0000:00:0a.0/dsa0/wq0.3
├── wq0.4 -> ../../../devices/pci0000:00/0000:00:0a.0/dsa0/wq0.4
├── wq0.5 -> ../../../devices/pci0000:00/0000:00:0a.0/dsa0/wq0.5
├── wq0.6 -> ../../../devices/pci0000:00/0000:00:0a.0/dsa0/wq0.6
├── wq0.7 -> ../../../devices/pci0000:00/0000:00:0a.0/dsa0/wq0.7

Under /sys/bus/dsa/drivers/dsa/ there is a bind and an unbind attribute. Those
allow us to enable and disable the device and workqueue components through the
bus probe and remove functions in the driver. By writing the "device" names
(i.e. dsa0, wq0.0) into bind or unbind attributes we can enable or disable those
components respectively. This is the typical driver-core bind / unbind behavior.

The workqueue device attributes exports two attributes, type and name, to
indicate how the workqueue is being utilized. There are 2 primary types that
the driver recognizes: kernel, user. An additonal mdev type is available from
stage 3 enabling.  The "kernel" type marks the workqueue for in kernel usages.
The "user" type surfaces a char device for user application consumption.
The "name" attribute is a string type that marks the workqueue for more
specific usages. For example, for the dmaengine subsystem to claim the
workqueue the name should be "dmanegine". For "user" queue types, the name
can be any valid string useful for identification by the user application.

For the "user" workqueue that surfaces a char device, char device allows a
limited portal region to be exported to user applications by the mmap() call
once the application opens the char device.  Character device nodes in
/dev/dsa/wqM.N will be made visible for application to open the device.
A user application can use the enqcmd CPU instruction to submit
descriptors directly to a workqueue without kernel driver involvement.

Kernel branch for easy review: idxd-stage1



Dave Jiang (8):
      x86/asm: add iosubmit_cmds512() based on MOVDIR64B CPU instruction
      dmaengine: break out channel registration
      dmaengine: add support to dynamic register/unregister of channels
      dmaengine: idxd: Init and probe for Intel data accelerators
      dmaengine: idxd: add configuration component of driver
      dmaengine: idxd: add descriptor manipulation routines
      dmaengine: idxd: connect idxd to dmaengine subsystem
      dmaengine: idxd: add char driver to expose submission portal to userland

Jing Lin (1):
      dmaengine: idxd: add sysfs ABI for idxd driver

 Documentation/ABI/stable/sysfs-driver-dma-idxd |  171 +++
 MAINTAINERS                                    |    8 
 arch/x86/include/asm/io.h                      |   36 +
 drivers/dma/Kconfig                            |   13 
 drivers/dma/Makefile                           |    1 
 drivers/dma/dmaengine.c                        |  165 ++-
 drivers/dma/idxd/Makefile                      |    2 
 drivers/dma/idxd/cdev.c                        |  302 +++++
 drivers/dma/idxd/device.c                      |  693 +++++++++++
 drivers/dma/idxd/dma.c                         |  217 +++
 drivers/dma/idxd/idxd.h                        |  316 +++++
 drivers/dma/idxd/init.c                        |  533 ++++++++
 drivers/dma/idxd/irq.c                         |  261 ++++
 drivers/dma/idxd/registers.h                   |  336 +++++
 drivers/dma/idxd/submit.c                      |   95 +
 drivers/dma/idxd/sysfs.c                       | 1528 ++++++++++++++++++++++++
 include/linux/dmaengine.h                      |    4 
 include/uapi/linux/idxd.h                      |  228 ++++
 18 files changed, 4849 insertions(+), 60 deletions(-)
 create mode 100644 Documentation/ABI/stable/sysfs-driver-dma-idxd
 create mode 100644 drivers/dma/idxd/Makefile
 create mode 100644 drivers/dma/idxd/cdev.c
 create mode 100644 drivers/dma/idxd/device.c
 create mode 100644 drivers/dma/idxd/dma.c
 create mode 100644 drivers/dma/idxd/idxd.h
 create mode 100644 drivers/dma/idxd/init.c
 create mode 100644 drivers/dma/idxd/irq.c
 create mode 100644 drivers/dma/idxd/registers.h
 create mode 100644 drivers/dma/idxd/submit.c
 create mode 100644 drivers/dma/idxd/sysfs.c
 create mode 100644 include/uapi/linux/idxd.h


             reply index

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-01-21 23:43 Dave Jiang [this message]
2020-01-21 23:43 ` [PATCH v5 1/9] x86/asm: add iosubmit_cmds512() based on MOVDIR64B CPU instruction Dave Jiang
2020-01-21 23:43 ` [PATCH v5 2/9] dmaengine: break out channel registration Dave Jiang
2020-01-21 23:43 ` [PATCH v5 3/9] dmaengine: add support to dynamic register/unregister of channels Dave Jiang
2020-01-21 23:43 ` [PATCH v5 4/9] dmaengine: idxd: Init and probe for Intel data accelerators Dave Jiang
2020-01-21 23:44 ` [PATCH v5 5/9] dmaengine: idxd: add configuration component of driver Dave Jiang
2020-01-21 23:44 ` [PATCH v5 6/9] dmaengine: idxd: add sysfs ABI for idxd driver Dave Jiang
2020-01-21 23:44 ` [PATCH v5 7/9] dmaengine: idxd: add descriptor manipulation routines Dave Jiang
2020-01-21 23:44 ` [PATCH v5 8/9] dmaengine: idxd: connect idxd to dmaengine subsystem Dave Jiang
2020-01-21 23:44 ` [PATCH v5 9/9] dmaengine: idxd: add char driver to expose submission portal to userland Dave Jiang
2020-01-24  5:49 ` [PATCH v5 0/9] idxd driver for Intel Data Streaming Accelerator Vinod Koul

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

LKML Archive on

Archives are clonable:
	git clone --mirror lkml/git/0.git
	git clone --mirror lkml/git/1.git
	git clone --mirror lkml/git/2.git
	git clone --mirror lkml/git/3.git
	git clone --mirror lkml/git/4.git
	git clone --mirror lkml/git/5.git
	git clone --mirror lkml/git/6.git
	git clone --mirror lkml/git/7.git
	git clone --mirror lkml/git/8.git
	git clone --mirror lkml/git/9.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 lkml lkml/ \
	public-inbox-index lkml

Example config snippet for mirrors

Newsgroup available over NNTP:

AGPL code for this site: git clone