* [PATCH v2 0/8] QAIC accel driver
From: Jeffrey Hugo @ 2023-02-06 15:41 UTC (permalink / raw)
  To: ogabbay, airlied, daniel, dri-devel
  Cc: linux-doc, linux-arm-msm, quic_ajitpals, quic_pkanojiy,
	stanislaw.gruszka, quic_carlv, Jeffrey Hugo, jacek.lawrynowicz

From: Jeffrey Hugo <jhugo@qti.qualcomm.com>

This series introduces a driver under the accel subsystem (QAIC -
Qualcomm AIC) for the Qualcomm Cloud AI 100 product (AIC100).  AIC100 is
a PCIe adapter card that hosts a dedicated machine learning inference
accelerator.

Regarding the open userspace (see the documentation patch), the UMD and
compiler are a week or so away from being posted in the indicated repos.
We just need to polish some documentation.

The previous version (RFC) can be found at:
https://lore.kernel.org/all/1660588956-24027-1-git-send-email-quic_jhugo@quicinc.com/

v2:
-Addressed comments from RFC
-Reduced the code to the core minimum by dropping telemetry, etc
-Conversion to accel subsystem
-Dropped versioning
-Add mhi_qaic_cntl component
-Restructure the documentation
-Pull in a few fixes from the downstream tree

Jeffrey Hugo (7):
  accel/qaic: Add documentation for AIC100 accelerator driver
  accel/qaic: Add uapi and core driver file
  accel/qaic: Add MHI controller
  accel/qaic: Add control path
  accel/qaic: Add datapath
  accel/qaic: Add qaic driver to the build system
  MAINTAINERS: Add entry for QAIC driver

Pranjal Ramajor Asha Kanojiya (1):
  accel/qaic: Add mhi_qaic_cntl

 Documentation/accel/index.rst       |    1 +
 Documentation/accel/qaic/aic100.rst |  498 +++++++++
 Documentation/accel/qaic/index.rst  |   13 +
 Documentation/accel/qaic/qaic.rst   |  169 +++
 MAINTAINERS                         |    8 +
 drivers/accel/Kconfig               |    1 +
 drivers/accel/Makefile              |    1 +
 drivers/accel/qaic/Kconfig          |   23 +
 drivers/accel/qaic/Makefile         |   13 +
 drivers/accel/qaic/mhi_controller.c |  576 +++++++++++
 drivers/accel/qaic/mhi_controller.h |   19 +
 drivers/accel/qaic/mhi_qaic_ctrl.c  |  586 +++++++++++
 drivers/accel/qaic/mhi_qaic_ctrl.h  |   11 +
 drivers/accel/qaic/qaic.h           |  321 ++++++
 drivers/accel/qaic/qaic_control.c   | 1656 +++++++++++++++++++++++++++++
 drivers/accel/qaic/qaic_data.c      | 1952 +++++++++++++++++++++++++++++++++++
 drivers/accel/qaic/qaic_drv.c       |  771 ++++++++++++++
 include/uapi/drm/qaic_accel.h       |  283 +++++
 18 files changed, 6902 insertions(+)
 create mode 100644 Documentation/accel/qaic/aic100.rst
 create mode 100644 Documentation/accel/qaic/index.rst
 create mode 100644 Documentation/accel/qaic/qaic.rst
 create mode 100644 drivers/accel/qaic/Kconfig
 create mode 100644 drivers/accel/qaic/Makefile
 create mode 100644 drivers/accel/qaic/mhi_controller.c
 create mode 100644 drivers/accel/qaic/mhi_controller.h
 create mode 100644 drivers/accel/qaic/mhi_qaic_ctrl.c
 create mode 100644 drivers/accel/qaic/mhi_qaic_ctrl.h
 create mode 100644 drivers/accel/qaic/qaic.h
 create mode 100644 drivers/accel/qaic/qaic_control.c
 create mode 100644 drivers/accel/qaic/qaic_data.c
 create mode 100644 drivers/accel/qaic/qaic_drv.c
 create mode 100644 include/uapi/drm/qaic_accel.h

-- 
2.7.4



* [PATCH v2 1/8] accel/qaic: Add documentation for AIC100 accelerator driver
From: Jeffrey Hugo @ 2023-02-06 15:41 UTC (permalink / raw)
  To: ogabbay, airlied, daniel, dri-devel
  Cc: Jeffrey Hugo, linux-doc, linux-arm-msm, quic_ajitpals,
	quic_pkanojiy, stanislaw.gruszka, quic_carlv, jacek.lawrynowicz

The Qualcomm Cloud AI 100 (AIC100) device is an Artificial Intelligence
accelerator PCIe card.  It contains a number of components both in the
SoC and on the card which facilitate running workloads:

QSM: management processor
NSPs: workload compute units
DMA Bridge: dedicated data mover for the workloads
MHI: multiplexed communication channels
DDR: workload storage and memory

The Linux kernel driver for AIC100 is called "QAIC" and is located in the
accel subsystem.

Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com>
---
 Documentation/accel/index.rst       |   1 +
 Documentation/accel/qaic/aic100.rst | 498 ++++++++++++++++++++++++++++++++++++
 Documentation/accel/qaic/index.rst  |  13 +
 Documentation/accel/qaic/qaic.rst   | 169 ++++++++++++
 4 files changed, 681 insertions(+)
 create mode 100644 Documentation/accel/qaic/aic100.rst
 create mode 100644 Documentation/accel/qaic/index.rst
 create mode 100644 Documentation/accel/qaic/qaic.rst

diff --git a/Documentation/accel/index.rst b/Documentation/accel/index.rst
index 2b43c9a..e94a016 100644
--- a/Documentation/accel/index.rst
+++ b/Documentation/accel/index.rst
@@ -8,6 +8,7 @@ Compute Accelerators
    :maxdepth: 1
 
    introduction
+   qaic/index
 
 .. only::  subproject and html
 
diff --git a/Documentation/accel/qaic/aic100.rst b/Documentation/accel/qaic/aic100.rst
new file mode 100644
index 0000000..773aa54
--- /dev/null
+++ b/Documentation/accel/qaic/aic100.rst
@@ -0,0 +1,498 @@
+.. SPDX-License-Identifier: GPL-2.0-only
+
+===============================
+ Qualcomm Cloud AI 100 (AIC100)
+===============================
+
+Overview
+========
+
+The Qualcomm Cloud AI 100/AIC100 family of products (including SA9000P - part of
+Snapdragon Ride) are PCIe adapter cards which contain a dedicated SoC ASIC for
+the purpose of efficiently running Artificial Intelligence (AI) Deep Learning
+inference workloads.  They are AI accelerators.
+
+The PCIe interface of AIC100 is capable of PCIe Gen4 speeds over eight lanes
+(x8).  An individual SoC on a card can have up to 16 NSPs for running workloads.
+Each SoC has an A53 management CPU.  On card, there can be up to 32 GB of DDR.
+
+Multiple AIC100 cards can be hosted in a single system to scale overall
+performance.
+
+Hardware Description
+====================
+
+An AIC100 card consists of an AIC100 SoC, on-card DDR, and a set of misc
+peripherals (PMICs, etc).
+
+An AIC100 card can either be a PCIe HHHL form factor (a traditional PCIe card),
+or a Dual M.2 card.  Both use PCIe to connect to the host system.
+
+As a PCIe endpoint/adapter, AIC100 uses the standard VendorID(VID)/
+DeviceID(DID) combination to uniquely identify itself to the host.  AIC100
+uses the standard Qualcomm VID (0x17cb).  All AIC100 instances use the same
+AIC100 DID (0xa100).
+
+AIC100 does not implement FLR (function level reset).
+
+AIC100 implements MSI but does not implement MSI-X.  AIC100 requires 17 MSIs to
+operate (1 for MHI, 16 for the DMA Bridge).
+
+As a PCIe device, AIC100 utilizes BARs to provide host interfaces to the device
+hardware.  AIC100 provides three 64-bit BARs.
+
+* The first BAR is 4K in size, and exposes the MHI interface to the host.
+
+* The second BAR is 2M in size, and exposes the DMA Bridge interface to the
+  host.
+
+* The third BAR is variable in size based on an individual AIC100's
+  configuration, but defaults to 64K.  This BAR currently has no purpose.
+
+From the host perspective, AIC100 has several key hardware components-
+
+* QSM (QAIC Service Manager)
+* NSPs (Neural Signal Processor)
+* DMA Bridge
+* DDR
+* MHI (Modem Host Interface)
+
+QSM
+---
+
+QAIC Service Manager.  This is an ARM A53 CPU that runs the primary
+firmware of the card and performs on-card management tasks.  It also
+communicates with the host via MHI.  Each AIC100 has one of
+these.
+
+NSP
+---
+
+Neural Signal Processor.  Each AIC100 has up to 16 of these.  These are
+the processors that run the workloads on AIC100.  Each NSP is a Qualcomm Hexagon
+(Q6) DSP with HVX and HMX.  Each NSP can only run one workload at a time, but
+multiple NSPs may be assigned to a single workload.  Since each NSP can only run
+one workload, AIC100 is limited to 16 concurrent workloads.  Workload
+"scheduling" is under the purview of the host.  AIC100 does not automatically
+timeslice.
+
+DMA Bridge
+----------
+
+The DMA Bridge is a custom DMA engine that manages the flow of data
+in and out of workloads.  AIC100 has one of these.  The DMA Bridge has 16
+channels, each consisting of a set of request/response FIFOs.  Each active
+workload is assigned a single DMA Bridge channel.  The DMA Bridge exposes
+hardware registers to manage the FIFOs (head/tail pointers), but requires host
+memory to store the FIFOs.
+
+DDR
+---
+
+AIC100 has on-card DDR.  In total, an AIC100 can have up to 32 GB of DDR.
+This DDR is used to store workloads, data for the workloads, and is used by the
+QSM for managing the device.  NSPs are granted access to sections of the DDR by
+the QSM.  The host does not have direct access to the DDR, and must make
+requests to the QSM to transfer data to the DDR.
+
+MHI
+---
+
+AIC100 has one MHI interface over PCIe.  MHI itself is documented at
+Documentation/mhi/index.rst.  MHI is the mechanism the host uses to communicate
+with the QSM.  Except for workload data via the DMA Bridge, all interaction with
+the device occurs via MHI.
+
+High-level Use Flow
+===================
+
+AIC100 is a programmable accelerator typically used for running
+neural networks in inferencing mode to efficiently perform AI operations.
+AIC100 is not intended for training neural networks.  AIC100 can be utilized
+for generic compute workloads.
+
+Assuming a user wants to utilize AIC100, they would follow these steps:
+
+1. Compile the workload into an ELF targeting the NSP(s)
+2. Make requests to the QSM to load the workload and related artifacts into the
+   device DDR
+3. Make a request to the QSM to activate the workload onto a set of idle NSPs
+4. Make requests to the DMA Bridge to send input data to the workload to be
+   processed, and other requests to receive processed output data from the
+   workload.
+5. Once the workload is no longer required, make a request to the QSM to
+   deactivate the workload, thus putting the NSPs back into an idle state.
+6. Once the workload and related artifacts are no longer needed for future
+   sessions, make requests to the QSM to unload the data from DDR.  This frees
+   the DDR to be used by other users.
+
+
+Boot Flow
+=========
+
+AIC100 uses a flashless boot flow, derived from Qualcomm MSMs.
+
+When AIC100 is first powered on, it begins executing PBL (Primary Bootloader)
+from ROM.  PBL enumerates the PCIe link, and initializes the BHI (Boot Host
+Interface) component of MHI.
+
+Using BHI, the host points PBL to the location of the SBL (Secondary Bootloader)
+image.  The PBL pulls the image from the host, validates it, and begins
+execution of SBL.
+
+SBL initializes MHI, and uses MHI to notify the host that the device has entered
+the SBL stage.  SBL performs a number of operations:
+
+* SBL initializes the majority of hardware (anything PBL left uninitialized),
+  including DDR.
+* SBL offloads the bootlog to the host.
+* SBL synchronizes timestamps with the host for future logging.
+* SBL uses the Sahara protocol to obtain the runtime firmware images from the
+  host.
+
+Once SBL has obtained and validated the runtime firmware, it brings the NSPs out
+of reset, and jumps into the QSM.
+
+The QSM uses MHI to notify the host that the device has entered the QSM stage
+(AMSS in MHI terms).  At this point, the AIC100 device is fully functional, and
+ready to process workloads.
+
+Userspace components
+====================
+
+Compiler
+--------
+
+An open compiler for AIC100 based on upstream LLVM can be found at:
+https://github.com/quic/software-kit-for-qualcomm-cloud-ai-100-cc
+
+Usermode Driver (UMD)
+---------------------
+
+An open UMD that interfaces with the qaic kernel driver can be found at:
+https://github.com/quic/software-kit-for-qualcomm-cloud-ai-100
+
+Sahara loader
+-------------
+
+An open implementation of the Sahara protocol called kickstart can be found at:
+https://github.com/andersson/qdl
+
+MHI Channels
+============
+
+AIC100 defines a number of MHI channels for different purposes.  This is a list
+of the defined channels, and their uses.
+
+| QAIC_LOOPBACK
+| Channels 0/1
+| Valid for AMSS
+| Any data sent to the device on this channel is sent back to the host.
+
+| QAIC_SAHARA
+| Channels 2/3
+| Valid for SBL
+| Used by SBL to obtain the runtime firmware from the host.
+
+| QAIC_DIAG
+| Channels 4/5
+| Valid for AMSS
+| Used to communicate with QSM via the Diag protocol.
+
+| QAIC_SSR
+| Channels 6/7
+| Valid for AMSS
+| Used to notify the host of subsystem restart events, and to offload SSR crashdumps.
+
+| QAIC_QDSS
+| Channels 8/9
+| Valid for AMSS
+| Used for the Qualcomm Debug Subsystem.
+
+| QAIC_CONTROL
+| Channels 10/11
+| Valid for AMSS
+| Used for the Neural Network Control (NNC) protocol.  This is the primary channel between host and QSM for managing workloads.
+
+| QAIC_LOGGING
+| Channels 12/13
+| Valid for SBL
+| Used by the SBL to send the bootlog to the host.
+
+| QAIC_STATUS
+| Channels 14/15
+| Valid for AMSS
+| Used to notify the host of Reliability, Accessibility, Serviceability (RAS) events.
+
+| QAIC_TELEMETRY
+| Channels 16/17
+| Valid for AMSS
+| Used to get/set power/thermal/etc attributes.
+
+| QAIC_DEBUG
+| Channels 18/19
+| Valid for AMSS
+| Not used.
+
+| QAIC_TIMESYNC
+| Channels 20/21
+| Valid for SBL/AMSS
+| Used to synchronize timestamps in the device side logs with the host time source.
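+
+For illustration, these channel assignments could be tabulated on the host as
+follows.  This sketch only restates the list above; the structure and names are
+illustrative and are not the driver's actual MHI channel configuration::
+
+    /* Illustrative only; not the driver's MHI channel configuration. */
+    enum aic100_stage { STAGE_SBL, STAGE_AMSS, STAGE_SBL_AMSS };
+
+    struct aic100_mhi_chan {
+        const char *name;
+        unsigned int chan[2];           /* channel pair as listed above */
+        enum aic100_stage valid_stage;
+    };
+
+    static const struct aic100_mhi_chan aic100_mhi_chans[] = {
+        { "QAIC_LOOPBACK",  {  0,  1 }, STAGE_AMSS },
+        { "QAIC_SAHARA",    {  2,  3 }, STAGE_SBL },
+        { "QAIC_DIAG",      {  4,  5 }, STAGE_AMSS },
+        { "QAIC_SSR",       {  6,  7 }, STAGE_AMSS },
+        { "QAIC_QDSS",      {  8,  9 }, STAGE_AMSS },
+        { "QAIC_CONTROL",   { 10, 11 }, STAGE_AMSS },
+        { "QAIC_LOGGING",   { 12, 13 }, STAGE_SBL },
+        { "QAIC_STATUS",    { 14, 15 }, STAGE_AMSS },
+        { "QAIC_TELEMETRY", { 16, 17 }, STAGE_AMSS },
+        { "QAIC_DEBUG",     { 18, 19 }, STAGE_AMSS },
+        { "QAIC_TIMESYNC",  { 20, 21 }, STAGE_SBL_AMSS },
+    };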
+
+DMA Bridge
+==========
+
+Overview
+--------
+
+The DMA Bridge is one of the main interfaces to the host from the device
+(the other being MHI).  As part of activating a workload to run on NSPs, the QSM
+assigns that network a DMA Bridge channel.  A workload's DMA Bridge channel
+(DBC for short) is solely for the use of that workload and is not shared with
+other workloads.
+
+Each DBC is a pair of FIFOs that manage data in and out of the workload.  One
+FIFO is the request FIFO.  The other FIFO is the response FIFO.
+
+Each DBC contains 4 registers in hardware:
+
+* Request FIFO head pointer (offset 0x0).  Read only to the host.  Indicates the
+  latest item in the FIFO the device has consumed.
+* Request FIFO tail pointer (offset 0x4).  Read/write by the host.  Host
+  increments this register to add new items to the FIFO.
+* Response FIFO head pointer (offset 0x8).  Read/write by the host.  Indicates
+  the latest item in the FIFO the host has consumed.
+* Response FIFO tail pointer (offset 0xc).  Read only to the host.  Device
+  increments this register to add new items to the FIFO.
+
+The values in each register are indexes in the FIFO.  To get the location of the
+FIFO element pointed to by the register: FIFO base address + register * element
+size.
+
+DBC registers are exposed to the host via the second BAR.  Each DBC consumes
+0x1000 of space in the BAR.
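+
+Putting the two together, locating a DBC's registers and FIFO elements could be
+sketched as follows (the helpers and names are illustrative, not taken from the
+driver)::
+
+    #define DBC_STRIDE        0x1000    /* register space consumed per DBC */
+    #define REQ_HEAD_OFFSET   0x0
+    #define REQ_TAIL_OFFSET   0x4
+    #define RESP_HEAD_OFFSET  0x8
+    #define RESP_TAIL_OFFSET  0xc
+
+    /* Base of a given DBC's registers within the second BAR. */
+    static void __iomem *dbc_reg_base(void __iomem *bar2, u32 dbc_id)
+    {
+        return bar2 + dbc_id * DBC_STRIDE;
+    }
+
+    /* Host-memory address of the FIFO element a head/tail register points at. */
+    static void *dbc_fifo_element(void *fifo_base, u32 reg_val, size_t elem_size)
+    {
+        return (char *)fifo_base + (size_t)reg_val * elem_size;
+    }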
+
+The actual FIFOs are backed by host memory.  When sending a request to the QSM
+to activate a network, the host must donate memory to be used for the FIFOs.
+Due to internal mapping limitations of the device, a single contiguous chunk of
+memory must be provided per DBC, which hosts both FIFOs.  The request FIFO will
+consume the beginning of the memory chunk, and the response FIFO will consume
+the end of the memory chunk.
+
+Request FIFO
+------------
+
+A request FIFO element has the following structure:
+
+| {
+|	u16 req_id;
+|	u8  seq_id;
+|	u8  pcie_dma_cmd;
+|	u32 reserved;
+|	u64 pcie_dma_source_addr;
+|	u64 pcie_dma_dest_addr;
+|	u32 pcie_dma_len;
+|	u32 reserved;
+|	u64 doorbell_addr;
+|	u8  doorbell_attr;
+|	u8  reserved;
+|	u16 reserved;
+|	u32 doorbell_data;
+|	u32 sem_cmd0;
+|	u32 sem_cmd1;
+|	u32 sem_cmd2;
+|	u32 sem_cmd3;
+| }
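+
+Expressed as a packed C structure, the same 64-byte layout could look like this
+(an illustrative sketch; the authoritative definitions live in the driver
+sources)::
+
+    struct dbc_request {                /* 64 bytes */
+        u16 req_id;
+        u8  seq_id;
+        u8  pcie_dma_cmd;
+        u32 reserved0;
+        u64 pcie_dma_source_addr;
+        u64 pcie_dma_dest_addr;
+        u32 pcie_dma_len;
+        u32 reserved1;
+        u64 doorbell_addr;
+        u8  doorbell_attr;
+        u8  reserved2;
+        u16 reserved3;
+        u32 doorbell_data;
+        u32 sem_cmd0;
+        u32 sem_cmd1;
+        u32 sem_cmd2;
+        u32 sem_cmd3;
+    } __packed;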
+
+Request field descriptions:
+
+| req_id- request ID.  A request FIFO element and a response FIFO element with
+|         the same request ID refer to the same command.
+
+| seq_id- sequence ID within a request.  Ignored by the DMA Bridge.
+
+| pcie_dma_cmd- describes the DMA element of this request.
+| 	Bit(7) is the force MSI flag, which overrides the DMA Bridge MSI logic
+| 		and generates an MSI when this request is complete, provided QSM has
+| 		configured the DMA Bridge to look at this bit.
+| 	Bits(6:5) are reserved.
+| 	Bit(4) is the completion code flag, and indicates that the DMA Bridge
+| 		shall generate a response FIFO element when this request is
+| 		complete.
+| 	Bit(3) indicates if this request is a linked list transfer(0) or a bulk
+| 		transfer(1).
+| 	Bit(2) is reserved.
+| 	Bits(1:0) indicate the type of transfer.  No transfer(0), to device(1),
+| 		from device(2).  Value 3 is illegal.
+
+| pcie_dma_source_addr- source address for a bulk transfer, or the address of
+|         the linked list.
+
+| pcie_dma_dest_addr- destination address for a bulk transfer.
+
+| pcie_dma_len- length of the bulk transfer.  Note that the size of this field
+| 	limits transfers to 4G in size.
+
+| doorbell_addr- address of the doorbell to ring when this request is complete.
+
+| doorbell_attr- doorbell attributes.
+| 	Bit(7) indicates if a write to a doorbell is to occur.
+| 	Bits(6:2) are reserved.
+| 	Bits(1:0) contain the encoding of the doorbell length.  0 is 32-bit,
+| 		1 is 16-bit, 2 is 8-bit, 3 is reserved.  The doorbell address
+| 		must be naturally aligned to the specified length.
+
+| doorbell_data- data to write to the doorbell.  Only the bits corresponding to
+| 	the doorbell length are valid.
+
+| sem_cmdN- semaphore command.
+| 	Bit(31) indicates this semaphore command is enabled.
+| 	Bit(30) is the to-device DMA fence.  Block this request until all
+| 		to-device DMA transfers are complete.
+| 	Bit(29) is the from-device DMA fence.  Block this request until all
+| 		from-device DMA transfers are complete.
+| 	Bits(28:27) are reserved.
+| 	Bits(26:24) are the semaphore command.  0 is NOP.  1 is init with the
+| 		specified value.  2 is increment.  3 is decrement.  4 is wait
+| 		until the semaphore is equal to the specified value.  5 is wait
+| 		until the semaphore is greater or equal to the specified value.
+| 		6 is "P", wait until semaphore is greater than 0, then
+| 		decrement by 1.  7 is reserved.
+| 	Bit(23) is reserved.
+| 	Bit(22) is the semaphore sync.  0 is post sync, which means that the
+| 		semaphore operation is done after the DMA transfer.  1 is
+| 		presync, which gates the DMA transfer.  Only one presync is
+| 		allowed per request.
+| 	Bit(21) is reserved.
+| 	Bits(20:16) are the index of the semaphore to operate on.
+| 	Bits(15:12) are reserved.
+| 	Bits(11:0) are the semaphore value to use in operations.
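+
+As an illustration, the bit encodings above could be captured with masks like
+the following (hypothetical names; the driver's own definitions may differ)::
+
+    /* pcie_dma_cmd bits (illustrative) */
+    #define DMA_CMD_FORCE_MSI   BIT(7)
+    #define DMA_CMD_COMPLETE    BIT(4)  /* generate a response FIFO element */
+    #define DMA_CMD_BULK        BIT(3)  /* 0 = linked list, 1 = bulk transfer */
+    #define DMA_CMD_TO_DEV      0x1     /* Bits(1:0): transfer type */
+    #define DMA_CMD_FROM_DEV    0x2
+
+    /* sem_cmdN bits (illustrative) */
+    #define SEM_ENABLE          BIT(31)
+    #define SEM_FENCE_TO_DEV    BIT(30)
+    #define SEM_FENCE_FROM_DEV  BIT(29)
+    #define SEM_CMD(op)         (((op) & 0x7) << 24)   /* 0 = NOP ... 6 = "P" */
+    #define SEM_PRESYNC         BIT(22) /* 0 = postsync, 1 = presync */
+    #define SEM_INDEX(idx)      (((idx) & 0x1f) << 16)
+    #define SEM_VALUE(val)      ((val) & 0xfff)
+
+    /* Example: presync - wait until semaphore 2 equals 1 before the DMA runs. */
+    u32 sem = SEM_ENABLE | SEM_PRESYNC | SEM_CMD(4) | SEM_INDEX(2) | SEM_VALUE(1);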
+
+Overall, a request is processed in 4 steps:
+
+1. If specified, the presync semaphore condition must be true
+2. If enabled, the DMA transfer occurs
+3. If specified, the postsync semaphore conditions must be true
+4. If enabled, the doorbell is written
+
+By using the semaphores in conjunction with the workload running on the NSPs,
+the data pipeline can be synchronized such that the host can queue multiple
+requests of data for the workload to process, but the DMA Bridge will only copy
+the data into the memory of the workload when the workload is ready to process
+the next input.
+
+Response FIFO
+-------------
+
+Once a request is fully processed, a response FIFO element is generated if
+specified in pcie_dma_cmd.  The structure of a response FIFO element:
+
+| {
+| 	u16 req_id;
+| 	u16 completion_code;
+| }
+
+req_id- matches the req_id of the request that generated this element.
+
+completion_code- status of this request.  0 is success.  non-zero is an error.
+
+The DMA Bridge will generate an MSI to the host as a reaction to activity in the
+response FIFO of a DBC.  The DMA Bridge hardware has an IRQ storm mitigation
+algorithm, where it will only generate an MSI when the response FIFO transitions
+from empty to non-empty (unless force MSI is enabled and triggered).  In
+response to this MSI, the host is expected to drain the response FIFO, and must
+take care to handle any race conditions between draining the FIFO, and the
+device inserting elements into the FIFO.
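+
+A conceptual sketch of such a drain loop follows.  The response element mirrors
+the structure above; the surrounding context structure, accessors, and loop are
+assumptions for illustration, not the driver's code::
+
+    struct dbc_response {
+        u16 req_id;
+        u16 completion_code;
+    };
+
+    struct dbc_ctx {                        /* illustrative stand-in */
+        void __iomem *resp_head_reg;        /* read/write by host */
+        void __iomem *resp_tail_reg;        /* read only to host */
+        struct dbc_response *resp_fifo;     /* host memory backing the FIFO */
+        u32 nelem;                          /* number of elements in the FIFO */
+    };
+
+    static void drain_response_fifo(struct dbc_ctx *dbc)
+    {
+        u32 head = readl(dbc->resp_head_reg);
+        u32 tail = readl(dbc->resp_tail_reg);
+
+        while (head != tail) {
+            while (head != tail) {
+                struct dbc_response *resp = &dbc->resp_fifo[head];
+
+                /* hand resp->req_id / resp->completion_code to the submitter */
+                head = (head + 1) % dbc->nelem;
+            }
+            writel(head, dbc->resp_head_reg);   /* publish what was consumed */
+            tail = readl(dbc->resp_tail_reg);   /* re-check for late arrivals */
+        }
+    }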
+
+Neural Network Control (NNC) Protocol
+=====================================
+
+The NNC protocol is how the host makes requests to the QSM to manage workloads.
+It uses the QAIC_CONTROL MHI channel.
+
+Each NNC request is packaged into a message.  Each message is a series of
+transactions.  A passthrough type transaction can contain elements known as
+commands.
+
+QSM requires NNC messages be little endian encoded and the fields be naturally
+aligned.  Since there are 64-bit elements in some NNC messages, 64-bit alignment
+must be maintained.
+
+A message contains a header and then a series of transactions.  A message may be
+at most 4K in size from QSM to the host.  From the host to the QSM, a message
+can be at most 64K (maximum size of a single MHI packet), but there is a
+continuation feature where message N+1 can be marked as a continuation of
+message N.  This is used for exceedingly large DMA xfer transactions.
+
+Transaction descriptions:
+
+passthrough- Allows userspace to send an opaque payload directly to the QSM.
+This is used for NNC commands.  Userspace is responsible for managing
+the QSM message requirements in the payload.
+
+dma_xfer- DMA transfer.  Describes an object that the QSM should DMA into the
+device via address and size tuples.
+
+activate- Activate a workload onto NSPs.  The host must provide memory to be
+used by the DBC.
+
+deactivate- Deactivate an active workload and return the NSPs to idle.
+
+status- Query the QSM about its NNC implementation.  Returns the NNC version,
+and if CRC is used.
+
+terminate- Release a user's resources.
+
+dma_xfer_cont- Continuation of a previous DMA transfer.  If a DMA transfer
+cannot be specified in a single message (highly fragmented), this
+transaction can be used to specify more ranges.
+
+validate_partition- Query the QSM to determine if a partition identifier is
+valid.
+
+Each message is tagged with a user id, and a partition id.  The user id allows
+QSM to track resources, and release them when the user goes away (e.g. the
+process crashes).  A partition id identifies the QSM-managed resource partition
+that this message applies to.
+
+Messages may have CRCs.  Messages should have CRCs applied until the QSM
+reports via the status transaction that CRCs are not needed.  The QSM on the
+SA9000P requires CRCs for black channel safing.
+
+Subsystem Restart (SSR)
+=======================
+
+SSR is the concept of limiting the impact of an error.  An AIC100 device may
+have multiple users, each with their own workload running.  If the workload of
+one user crashes, the fallout of that should be limited to that workload and not
+impact other workloads.  SSR accomplishes this.
+
+If a particular workload crashes, QSM notifies the host via the QAIC_SSR MHI
+channel.  This notification identifies the workload by its assigned DBC.  A
+multi-stage recovery process is then used to clean up both sides, and get the
+DBC/NSPs into a working state.
+
+When SSR occurs, any state in the workload is lost.  Any inputs that were in
+process, or queued but not yet serviced, are lost.  The loaded artifacts will
+remain in on-card DDR, but the host will need to re-activate the workload if
+it wishes to recover the workload.
+
+Reliability, Accessibility, Serviceability (RAS)
+================================================
+
+AIC100 is expected to be deployed in server systems where RAS ideology is
+applied.  Simply put, RAS is the concept of detecting, classifying, and
+reporting errors.  While PCIe has AER (Advanced Error Reporting) which factors
+into RAS, AER does not allow for a device to report details about internal
+errors.  Therefore, AIC100 implements a custom RAS mechanism.  When a RAS event
+occurs, QSM will report the event with appropriate details via the QAIC_STATUS
+MHI channel.  A sysadmin may determine that a particular device needs
+additional service based on RAS reports.
+
+Telemetry
+=========
+
+QSM has the ability to report various physical attributes of the device, and in
+some cases, to allow the host to control them.  Examples include thermal limits,
+thermal readings, and power readings.  These items are communicated via the
+QAIC_TELEMETRY MHI channel.
diff --git a/Documentation/accel/qaic/index.rst b/Documentation/accel/qaic/index.rst
new file mode 100644
index 0000000..ad19b88
--- /dev/null
+++ b/Documentation/accel/qaic/index.rst
@@ -0,0 +1,13 @@
+.. SPDX-License-Identifier: GPL-2.0-only
+
+=====================================
+ accel/qaic Qualcomm Cloud AI driver
+=====================================
+
+The accel/qaic driver supports the Qualcomm Cloud AI machine learning
+accelerator cards.
+
+.. toctree::
+
+   qaic
+   aic100
diff --git a/Documentation/accel/qaic/qaic.rst b/Documentation/accel/qaic/qaic.rst
new file mode 100644
index 0000000..b0e7a5f
--- /dev/null
+++ b/Documentation/accel/qaic/qaic.rst
@@ -0,0 +1,169 @@
+.. SPDX-License-Identifier: GPL-2.0-only
+
+=============
+ QAIC driver
+=============
+
+The QAIC driver is the Kernel Mode Driver (KMD) for the AIC100 family of AI
+accelerator products.
+
+Interrupts
+==========
+
+While the AIC100 DMA Bridge hardware implements an IRQ storm mitigation
+mechanism, it is still possible for an IRQ storm to occur.  A storm can happen
+if the workload is particularly quick, and the host is responsive.  If the host
+can drain the response FIFO as quickly as the device can insert elements into
+it, then the device will frequently transition the response FIFO from empty to
+non-empty and generate MSIs at a rate equivalent to the speed of the
+workload's ability to process inputs.  The lprnet (license plate reader network)
+workload is known to trigger this condition, and can generate in excess of 100k
+MSIs per second.  It has been observed that most systems cannot tolerate this
+for long, and will crash from some form of watchdog triggered by the overhead
+of the interrupt controller continually interrupting the host CPU.
+
+To mitigate this issue, the QAIC driver implements specific IRQ handling.  When
+QAIC receives an IRQ, it disables that line.  This prevents the interrupt
+controller from interrupting the CPU.  Then QAIC drains the FIFO.  Once the FIFO
+is drained, QAIC implements a "last chance" polling algorithm where QAIC will
+sleep for a time to see if the workload will generate more activity.  The IRQ
+line remains disabled during this time.  If no activity is detected, QAIC exits
+polling mode and reenables the IRQ line.
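+
+A conceptual sketch of that flow (illustrative only; the helpers and context
+structure are assumptions, and the driver's actual implementation differs)::
+
+    static irqreturn_t dbc_threaded_irq(int irq, void *data)
+    {
+        struct dbc_ctx *dbc = data;         /* hypothetical per-DBC context */
+
+        disable_irq_nosync(irq);            /* silence the line while draining */
+
+        do {
+            drain_response_fifo(dbc);
+            usleep_range(100, 200);         /* "last chance" polling window */
+        } while (response_fifo_not_empty(dbc));
+
+        enable_irq(irq);                    /* no more activity; re-arm the IRQ */
+        return IRQ_HANDLED;
+    }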
+
+This mitigation in QAIC is very effective.  The same lprnet use case that
+generates 100k IRQs per second (per /proc/interrupts) is reduced to roughly 64
+IRQs over 5 minutes while keeping the host system stable, and having the same
+workload throughput performance (within run-to-run noise variation).
+
+
+Neural Network Control (NNC) Protocol
+=====================================
+
+The implementation of NNC is split between the KMD (QAIC) and UMD.  In general
+QAIC understands how to encode/decode NNC wire protocol, and elements of the
+protocol which require kernelspace knowledge to process (for example, mapping
+host memory to device IOVAs).  QAIC understands the structure of a message, and
+all of the transactions.  QAIC does not understand commands (the payload of a
+passthrough transaction).
+
+QAIC handles and enforces the required little endianness and 64-bit alignment,
+to the degree that it can.  Since QAIC does not know the contents of a
+passthrough transaction, it relies on the UMD to satisfy the requirements.
+
+The terminate transaction is of particular use to QAIC.  QAIC is not aware of
+the resources that are loaded onto a device since the majority of that activity
+occurs within NNC commands.  As a result, QAIC does not have the means to
+roll back userspace activity.  To ensure that a userspace client's resources
+are fully released in the case of a process crash, or a bug, QAIC uses the
+terminate command to let QSM know when a user has gone away, and the resources
+can be released.
+
+QSM can report a version number of the NNC protocol it supports.  This is in the
+form of a Major number and a Minor number.
+
+Major number updates indicate changes to the NNC protocol which impact the
+message format, or transactions (impacts QAIC).
+
+Minor number updates indicate changes to the NNC protocol which impact the
+commands (does not impact QAIC).
+
+uAPI
+====
+
+QAIC defines a number of driver specific IOCTLs as part of the userspace API.
+This section describes those APIs.
+
+DRM_IOCTL_QAIC_MANAGE:
+This IOCTL allows userspace to send a NNC request to the QSM.  The call will
+block until a response is received, or the request has timed out.
+
+DRM_IOCTL_QAIC_CREATE_BO:
+This IOCTL allows userspace to allocate a buffer object (BO) which can send or
+receive data from a workload.  The call will return a GEM handle that
+represents the allocated buffer.  The BO is not usable until it has been sliced
+(see DRM_IOCTL_QAIC_ATTACH_SLICE_BO).
+
+DRM_IOCTL_QAIC_MMAP_BO:
+This IOCTL allows userspace to prepare an allocated BO to be mmap'd into the
+userspace process.
+
+DRM_IOCTL_QAIC_ATTACH_SLICE_BO:
+This IOCTL allows userspace to slice a BO in preparation for sending the BO to
+the device.  Slicing is the operation of describing what portions of a BO get
+sent where to a workload.  This requires a set of DMA transfers for the DMA
+Bridge, and as such, locks the BO to a specific DBC.
+
+DRM_IOCTL_QAIC_EXECUTE_BO:
+This IOCTL allows userspace to submit a set of sliced BOs to the device.  The
+call is non-blocking.  Success only indicates that the BOs have been queued
+to the device, but does not guarantee they have been executed.
+
+DRM_IOCTL_QAIC_PARTIAL_EXECUTE_BO:
+This IOCTL operates like DRM_IOCTL_QAIC_EXECUTE_BO, but it allows userspace to
+shrink the BOs sent to the device for this specific call.  If a BO typically has
+N inputs, but only a subset of those is available, this IOCTL allows userspace
+to indicate that only the first M bytes of the BO should be sent to the device
+to minimize data transfer overhead.  This IOCTL dynamically recomputes the
+slicing, and therefore has some processing overhead before the BOs can be queued
+to the device.
+
+DRM_IOCTL_QAIC_WAIT_BO:
+This IOCTL allows userspace to determine when a particular BO has been processed
+by the device.  The call will block until either the BO has been processed and
+can be re-queued to the device, or a timeout occurs.
+
+DRM_IOCTL_QAIC_PERF_STATS_BO:
+This IOCTL allows userspace to collect performance statistics on the most
+recent execution of a BO.  This allows userspace to construct an end to end
+timeline of the BO processing for a performance analysis.
+
+DRM_IOCTL_QAIC_PART_DEV:
+This IOCTL allows userspace to request a duplicate "shadow device".  This extra
+accelN device is associated with a specific partition of resources on the AIC100
+device and can be used for limiting a process to some subset of resources.
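+
+A typical ordering of these calls for a single workload might look like the
+following sketch.  The request structures are defined in
+include/uapi/drm/qaic_accel.h and are not reproduced here; this is an ordering
+outline, not working code::
+
+    int fd = open("/dev/accel/accel0", O_RDWR);     /* accel device node */
+
+    /*
+     * DRM_IOCTL_QAIC_MANAGE          - load and activate the workload (NNC)
+     * DRM_IOCTL_QAIC_CREATE_BO       - allocate input/output buffers
+     * DRM_IOCTL_QAIC_MMAP_BO         - prepare a BO for mmap(), then mmap() it
+     * DRM_IOCTL_QAIC_ATTACH_SLICE_BO - slice a BO, binding it to a DBC
+     * DRM_IOCTL_QAIC_EXECUTE_BO      - queue sliced BOs (non-blocking)
+     * DRM_IOCTL_QAIC_WAIT_BO         - block until a queued BO is processed
+     *
+     * EXECUTE_BO and WAIT_BO are then repeated with the same sliced BOs for
+     * each inference, and DRM_IOCTL_QAIC_MANAGE is used again to deactivate
+     * and unload the workload when it is no longer needed.
+     */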
+
+Userspace Client Isolation
+==========================
+
+AIC100 supports multiple clients.  Multiple DBCs can be consumed by a single
+client, and multiple clients can each consume one or more DBCs.  Workloads
+may contain sensitive information; therefore, only the client that owns the
+workload should be allowed to interface with the DBC.
+
+Clients are identified by the instance associated with their open().  A client
+may only use memory they allocate, and DBCs that are assigned to their
+workloads.  Attempts to access resources assigned to other clients will be
+rejected.
+
+Module parameters
+=================
+
+QAIC supports the following module parameters:
+
+**datapath_polling (bool)**
+
+Configures QAIC to use a polling thread for datapath events instead of relying
+on the device interrupts.  Useful for platforms with broken multi-MSI support.  Must be
+set at QAIC driver initialization.  Default is 0 (off).
+
+**mhi_timeout (int)**
+
+Sets the timeout value for MHI operations in milliseconds (ms).  Must be set
+at the time the driver detects a device.  Default is 2000 (2 seconds).
+
+**control_resp_timeout (int)**
+
+Sets the timeout value for QSM responses to NNC messages in seconds (s).  Must
+be set at the time the driver is sending a request to QSM.  Default is 60 (one
+minute).
+
+**wait_exec_default_timeout (int)**
+
+Sets the default timeout for the wait_exec ioctl in milliseconds (ms).  Must be
+set prior to the wait_exec ioctl call.  A value specified in the ioctl call
+overrides this for that call.  Default is 5000 (5 seconds).
+
+**datapath_poll_interval_us (int)**
+
+Sets the polling interval in microseconds (us) when datapath polling is active.
+Takes effect at the next polling interval.  Default is 100 (100 us).
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v2 1/8] accel/qaic: Add documentation for AIC100 accelerator driver
@ 2023-02-06 15:41   ` Jeffrey Hugo
  0 siblings, 0 replies; 68+ messages in thread
From: Jeffrey Hugo @ 2023-02-06 15:41 UTC (permalink / raw)
  To: ogabbay, airlied, daniel, dri-devel
  Cc: jacek.lawrynowicz, stanislaw.gruszka, quic_pkanojiy, quic_carlv,
	quic_ajitpals, linux-doc, linux-arm-msm, Jeffrey Hugo

The Qualcomm Cloud AI 100 (AIC100) device is an Artificial Intelligence
accelerator PCIe card.  It contains a number of components both in the
SoC and on the card which facilitate running workloads:

QSM: management processor
NSPs: workload compute units
DMA Bridge: dedicated data mover for the workloads
MHI: multiplexed communication channels
DDR: workload storage and memory

The Linux kernel driver for AIC100 is called "QAIC" and is located in the
accel subsystem.

Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com>
---
 Documentation/accel/index.rst       |   1 +
 Documentation/accel/qaic/aic100.rst | 498 ++++++++++++++++++++++++++++++++++++
 Documentation/accel/qaic/index.rst  |  13 +
 Documentation/accel/qaic/qaic.rst   | 169 ++++++++++++
 4 files changed, 681 insertions(+)
 create mode 100644 Documentation/accel/qaic/aic100.rst
 create mode 100644 Documentation/accel/qaic/index.rst
 create mode 100644 Documentation/accel/qaic/qaic.rst

diff --git a/Documentation/accel/index.rst b/Documentation/accel/index.rst
index 2b43c9a..e94a016 100644
--- a/Documentation/accel/index.rst
+++ b/Documentation/accel/index.rst
@@ -8,6 +8,7 @@ Compute Accelerators
    :maxdepth: 1
 
    introduction
+   qaic/index
 
 .. only::  subproject and html
 
diff --git a/Documentation/accel/qaic/aic100.rst b/Documentation/accel/qaic/aic100.rst
new file mode 100644
index 0000000..773aa54
--- /dev/null
+++ b/Documentation/accel/qaic/aic100.rst
@@ -0,0 +1,498 @@
+.. SPDX-License-Identifier: GPL-2.0-only
+
+===============================
+ Qualcomm Cloud AI 100 (AIC100)
+===============================
+
+Overview
+========
+
+The Qualcomm Cloud AI 100/AIC100 family of products (including SA9000P - part of
+Snapdragon Ride) are PCIe adapter cards which contain a dedicated SoC ASIC for
+the purpose of efficiently running Artificial Intelligence (AI) Deep Learning
+inference workloads.  They are AI accelerators.
+
+The PCIe interface of AIC100 is capable of PCIe Gen4 speeds over eight lanes
+(x8).  An individual SoC on a card can have up to 16 NSPs for running workloads.
+Each SoC has an A53 management CPU.  On card, there can be up to 32 GB of DDR.
+
+Multiple AIC100 cards can be hosted in a single system to scale overall
+performance.
+
+Hardware Description
+====================
+
+An AIC100 card consists of an AIC100 SoC, on-card DDR, and a set of misc
+peripherals (PMICs, etc).
+
+An AIC100 card can either be a PCIe HHHL form factor (a traditional PCIe card),
+or a Dual M.2 card.  Both use PCIe to connect to the host system.
+
+As a PCIe endpoint/adapter, AIC100 uses the standard VendorID(VID)/
+DeviceID(DID) combination to uniquely identify itself to the host.  AIC100
+uses the standard Qualcomm VID (0x17cb).  All AIC100 instances use the same
+AIC100 DID (0xa100).
+
+AIC100 does not implement FLR (function level reset).
+
+AIC100 implements MSI but does not implement MSI-X.  AIC100 requires 17 MSIs to
+operate (1 for MHI, 16 for the DMA Bridge).
+
+As a PCIe device, AIC100 utilizes BARs to provide host interfaces to the device
+hardware.  AIC100 provides 3, 64-bit BARs.
+
+* The first BAR is 4K in size, and exposes the MHI interface to the host.
+
+* The second BAR is 2M in size, and exposes the DMA Bridge interface to the
+  host.
+
+* The third BAR is variable in size based on an individual AIC100's
+  configuration, but defaults to 64K.  This BAR currently has no purpose.
+
+From the host perspective, AIC100 has several key hardware components-
+
+* QSM (QAIC Service Manager)
+* NSPs (Neural Signal Processor)
+* DMA Bridge
+* DDR
+* MHI (Modem Host Interface)
+
+QSM
+---
+
+QAIC Service Manager.  This is an ARM A53 CPU that runs the primary
+firmware of the card and performs on-card management tasks.  It also
+communicates with the host via MHI.  Each AIC100 has one of
+these.
+
+NSP
+---
+
+Neural Signal Processor.  Each AIC100 has up to 16 of these.  These are
+the processors that run the workloads on AIC100.  Each NSP is a Qualcomm Hexagon
+(Q6) DSP with HVX and HMX.  Each NSP can only run one workload at a time, but
+multiple NSPs may be assigned to a single workload.  Since each NSP can only run
+one workload, AIC100 is limited to 16 concurrent workloads.  Workload
+"scheduling" is under the purview of the host.  AIC100 does not automatically
+timeslice.
+
+DMA Bridge
+----------
+
+The DMA Bridge is custom DMA engine that manages the flow of data
+in and out of workloads.  AIC100 has one of these.  The DMA Bridge has 16
+channels, each consisting of a set of request/response FIFOs.  Each active
+workload is assigned a single DMA Bridge channel.  The DMA Bridge exposes
+hardware registers to manage the FIFOs (head/tail pointers), but requires host
+memory to store the FIFOs.
+
+DDR
+---
+
+AIC100 has on-card DDR.  In total, an AIC100 can have up to 32 GB of DDR.
+This DDR is used to store workloads, data for the workloads, and is used by the
+QSM for managing the device.  NSPs are granted access to sections of the DDR by
+the QSM.  The host does not have direct access to the DDR, and must make
+requests to the QSM to transfer data to the DDR.
+
+MHI
+---
+
+AIC100 has one MHI interface over PCIe.  MHI itself is documented at
+Documentation/mhi/index.rst  MHI is the mechanism the host uses to communicate
+with the QSM.  Except for workload data via the DMA Bridge, all interaction with
+he device occurs via MHI.
+
+High-level Use Flow
+===================
+
+AIC100 is a programmable accelerator typically used for running
+neural networks in inferencing mode to efficiently perform AI operations.
+AIC100 is not intended for training neural networks.  AIC100 can be utilitized
+for generic compute workloads.
+
+Assuming a user wants to utilize AIC100, they would follow these steps:
+
+1. Compile the workload into an ELF targeting the NSP(s)
+2. Make requests to the QSM to load the workload and related artifacts into the
+   device DDR
+3. Make a request to the QSM to activate the workload onto a set of idle NSPs
+4. Make requests to the DMA Bridge to send input data to the workload to be
+   processed, and other requests to receive processed output data from the
+   workload.
+5. Once the workload is no longer required, make a request to the QSM to
+   deactivate the workload, thus putting the NSPs back into an idle state.
+6. Once the workload and related artifacts are no longer needed for future
+   sessions, make requests to the QSM to unload the data from DDR.  This frees
+   the DDR to be used by other users.
+
+
+Boot Flow
+=========
+
+AIC100 uses a flashless boot flow, derived from Qualcomm MSMs.
+
+When AIC100 is first powered on, it begins executing PBL (Primary Bootloader)
+from ROM.  PBL enumerates the PCIe link, and initializes the BHI (Boot Host
+Interface) component of MHI.
+
+Using BHI, the host points PBL to the location of the SBL (Secondary Bootloader)
+image.  The PBL pulls the image from the host, validates it, and begins
+execution of SBL.
+
+SBL initializes MHI, and uses MHI to notify the host that the device has entered
+the SBL stage.  SBL performs a number of operations:
+
+* SBL initializes the majority of hardware (anything PBL left uninitialized),
+  including DDR.
+* SBL offloads the bootlog to the host.
+* SBL synchonizes timestamps with the host for future logging.
+* SBL uses the Sahara protocol to obtain the runtime firmware images from the
+  host.
+
+Once SBL has obtained and validated the runtime firmware, it brings the NSPs out
+of reset, and jumps into the QSM.
+
+The QSM uses MHI to notify the host that the device has entered the QSM stage
+(AMSS in MHI terms).  At this point, the AIC100 device is fully functional, and
+ready to process workloads.
+
+Userspace components
+====================
+
+Compiler
+--------
+
+An open compiler for AIC100 based on upstream LLVM can be found at:
+https://github.com/quic/software-kit-for-qualcomm-cloud-ai-100-cc
+
+Usermode Driver (UMD)
+---------------------
+
+An open UMD that interfaces with the qaic kernel driver can be found at:
+https://github.com/quic/software-kit-for-qualcomm-cloud-ai-100
+
+Sahara loader
+-------------
+
+An open implementation of the Sahara protocol called kickstart can be found at:
+https://github.com/andersson/qdl
+
+MHI Channels
+============
+
+AIC100 defines a number of MHI channels for different purposes.  This is a list
+of the defined channels, and their uses.
+
+| QAIC_LOOPBACK
+| Channels 0/1
+| Valid for AMSS
+| Any data sent to the device on this channel is sent back to the host.
+
+| QAIC_SAHARA
+| Channels 2/3
+| Valid for SBL
+| Used by SBL to obtain the runtime firmware from the host.
+
+| QAIC_DIAG
+| Channels 4/5
+| Valid for AMSS
+| Used to communicate with QSM via the Diag protocol.
+
+| QAIC_SSR
+| Channels 6/7
+| Valid for AMSS
+| Used to notify the host of subsystem restart events, and to offload SSR crashdumps.
+
+| QAIC_QDSS
+| Channels 8/9
+| Valid for AMSS
+| Used for the Qualcomm Debug Subsystem.
+
+| QAIC_CONTROL
+| Channels 10/11
+| Valid for AMSS
+| Used for the Neural Network Control (NNC) protocol.  This is the primary channel between host and QSM for managing workloads.
+
+| QAIC_LOGGING
+| Channels 12/13
+| Valid for SBL
+| Used by the SBL to send the bootlog to the host.
+
+| QAIC_STATUS
+| Channels 14/15
+| Valid for AMSS
+| Used to notify the host of Reliability, Accessability, Serviceability (RAS) events.
+
+| QAIC_TELEMETRY
+| Channels 16/17
+| Valid for AMSS
+| Used to get/set power/thermal/etc attributes.
+
+| QAIC_DEBUG
+| Channels 18/19
+| Valid for AMSS
+| Not used.
+
+| QAIC_TIMESYNC
+| Channels 20/21
+| Valid for SBL/AMSS
+| Used to synchronize timestamps in the device side logs with the host time source.
+
+DMA Bridge
+==========
+
+Overview
+--------
+
+The DMA Bridge is one of the main interfaces to the host from the device
+(the other being MHI).  As part of activating a workload to run on NSPs, the QSM
+assigns that network a DMA Bridge channel.  A workload's DMA Bridge channel
+(DBC for short) is solely for the use of that workload and is not shared with
+other workloads.
+
+Each DBC is a pair of FIFOs that manage data in and out of the workload.  One
+FIFO is the request FIFO.  The other FIFO is the response FIFO.
+
+Each DBC contains 4 registers in hardware:
+
+* Request FIFO head pointer (offset 0x0).  Read only to the host.  Indicates the
+  latest item in the FIFO the device has consumed.
+* Request FIFO tail pointer (offset 0x4).  Read/write by the host.  Host
+  increments this register to add new items to the FIFO.
+* Response FIFO head pointer (offset 0x8).  Read/write by the host.  Indicates
+  the latest item in the FIFO the host has consumed.
+* Response FIFO tail pointer (offset 0xc).  Read only to the host.  Device
+  increments this register to add new items to the FIFO.
+
+The values in each register are indexes in the FIFO.  To get the location of the
+FIFO element pointed to by the register: FIFO base address + register * element
+size.
+
+DBC registers are exposed to the host via the second BAR.  Each DBC consumes
+0x1000 of space in the BAR.
+
+The actual FIFOs are backed by host memory.  When sending a request to the QSM
+to activate a network, the host must donate memory to be used for the FIFOs.
+Due to internal mapping limitations of the device, a single contigious chunk of
+memory must be provided per DBC, which hosts both FIFOs.  The request FIFO will
+consume the beginning of the memory chunk, and the response FIFO will consume
+the end of the memory chunk.
+
+Request FIFO
+------------
+
+A request FIFO element has the following structure:
+
+| {
+|	u16 req_id;
+|	u8  seq_id;
+|	u8  pcie_dma_cmd;
+|	u32 reserved;
+|	u64 pcie_dma_source_addr;
+|	u64 pcie_dma_dest_addr;
+|	u32 pcie_dma_len;
+|	u32 reserved;
+|	u64 doorbell_addr;
+|	u8  doorbell_attr;
+|	u8  reserved;
+|	u16 reserved;
+|	u32 doorbell_data;
+|	u32 sem_cmd0;
+|	u32 sem_cmd1;
+|	u32 sem_cmd2;
+|	u32 sem_cmd3;
+| }
+
+Request field descriptions:
+
+| req_id- request ID.  A request FIFO element and a response FIFO element with
+|         the same request ID refer to the same command.
+
+| seq_id- sequence ID within a request.  Ignored by the DMA Bridge.
+
+| pcie_dma_cmd- describes the DMA element of this request.
+| 	Bit(7) is the force msi flag, which overrides the DMA Bridge MSI logic
+| 		and generates a MSI when this request is complete, and QSM
+| 		configures the DMA Bridge to look at this bit.
+| 	Bits(6:5) are reserved.
+| 	Bit(4) is the completion code flag, and indicates that the DMA Bridge
+| 		shall generate a response FIFO element when this request is
+| 		complete.
+| 	Bit(3) indicates if this request is a linked list transfer(0) or a bulk
+| 		transfer(1).
+| 	Bit(2) is reserved.
+| 	Bits(1:0) indicate the type of transfer.  No transfer(0), to device(1),
+| 		from device(2).  Value 3 is illegal.
+
+| pcie_dma_source_addr- source address for a bulk transfer, or the address of
+|         the linked list.
+
+| pcie_dma_dest_addr- destination address for a bulk transfer.
+
+| pcie_dma_len- length of the bulk transfer.  Note that the size of this field
+| 	limits transfers to 4G in size.
+
+| doorbell_addr- address of the doorbell to ring when this request is complete.
+
+| doorbell_attr- doorbell attributes.
+| 	Bit(7) indicates if a write to a doorbell is to occur.
+| 	Bits(6:2) are reserved.
+| 	Bits(1:0) contain the encoding of the doorbell length.  0 is 32-bit,
+| 		1 is 16-bit, 2 is 8-bit, 3 is reserved.  The doorbell address
+| 		must be naturally aligned to the specified length.
+
+| doorbell_data- data to write to the doorbell.  Only the bits corresponding to
+| 	the doorbell length are valid.
+
+| sem_cmdN- semaphore command.
+| 	Bit(31) indicates this semaphore command is enabled.
+| 	Bit(30) is the to-device DMA fence.  Block this request until all
+| 		to-device DMA transfers are complete.
+| 	Bit(29) is the from-device DMA fence.  Block this request until all
+| 		from-device DMA transfers are complete.
+| 	Bits(28:27) are reserved.
+| 	Bits(26:24) are the semaphore command.  0 is NOP.  1 is init with the
+| 		specified value.  2 is increment.  3 is decrement.  4 is wait
+| 		until the semaphore is equal to the specified value.  5 is wait
+| 		until the semaphore is greater or equal to the specified value.
+| 		6 is "P", wait until semaphore is greater than 0, then
+| 		decrement by 1.  7 is reserved.
+| 	Bit(23) is reserved.
+| 	Bit(22) is the semaphore sync.  0 is post sync, which means that the
+| 		semaphore operation is done after the DMA transfer.  1 is
+| 		presync, which gates the DMA transfer.  Only one presync is
+| 		allowed per request.
+| 	Bit(21) is reserved.
+| 	Bits(20:16) is the index of the semaphore to operate on.
+| 	Bits(15:12) are reserved.
+| 	Bits(11:0) are the semaphore value to use in operations.
+
+Overall, a request is processed in 4 steps:
+
+1. If specified, the presync semaphore condition must be true
+2. If enabled, the DMA transfer occurs
+3. If specified, the postsync semaphore conditions must be true
+4. If enabled, the doorbell is written
+
+By using the semaphores in conjunction with the workload running on the NSPs,
+the data pipeline can be synchronized such that the host can queue multiple
+requests of data for the workload to process, but the DMA Bridge will only copy
+the data into the memory of the workload when the workload is ready to process
+the next input.
+
+Response FIFO
+-------------
+
+Once a request is fully processed, a response FIFO element is generated if
+specified in pcie_dma_cmd.  The structure of a response FIFO element:
+
+| {
+| 	u16 req_id;
+| 	u16 completion_code;
+| }
+
+req_id- matches the req_id of the request that generated this element.
+
+completion_code- status of this request.  0 is success.  non-zero is an error.
+
+The DMA Bridge will generate a MSI to the host as a reaction to activity in the
+response FIFO of a DBC.  The DMA Bridge hardware has an IRQ storm mitigation
+algorithm, where it will only generate a MSI when the response FIFO transitions
+from empty to non-empty (unless force MSI is enabled and triggered).  In
+response to this MSI, the host is expected to drain the response FIFO, and must
+take care to handle any race conditions between draining the FIFO, and the
+device inserting elements into the FIFO.
+
+Neural Network Control (NNC) Protocol
+=====================================
+
+The NNC protocol is how the host makes requests to the QSM to manage workloads.
+It uses the QAIC_CONTROL MHI channel.
+
+Each NNC request is packaged into a message.  Each message is a series of
+transactions.  A passthrough type transaction can contain elements known as
+commands.
+
+QSM requires NNC messages be little endian encoded and the fields be naturally
+aligned.  Since there are 64-bit elements in some NNC messages, 64-bit alignment
+must be maintained.
+
+A message contains a header and then a series of transactions.  A message may be
+at most 4K in size from QSM to the host.  From the host to the QSM, a message
+can be at most 64K (maximum size of a single MHI packet), but there is a
+continuation feature where message N+1 can be marked as a continuation of
+message N.  This is used for exceedingly large DMA xfer transactions.
+
+Transaction descriptions:
+
+passthrough- Allows userspace to send an opaque payload directly to the QSM.
+This is used for NNC commands.  Userspace is responsible for managing
+the QSM message requirements in the payload
+
+dma_xfer- DMA transfer.  Describes an object that the QSM should DMA into the
+device via address and size tuples.
+
+activate- Activate a workload onto NSPs.  The host must provide memory to be
+used by the DBC.
+
+deactivate- Deactivate an active workload and return the NSPs to idle.
+
+status- Query the QSM about it's NNC implementation.  Returns the NNC version,
+and if CRC is used.
+
+terminate- Release a user's resources.
+
+dma_xfer_cont- Continuation of a previous DMA transfer.  If a DMA transfer
+cannot be specified in a single message (highly fragmented), this
+transaction can be used to specify more ranges.
+
+validate_partition- Query to QSM to determine if a partition identifier is
+valid.
+
+Each message is tagged with a user id, and a partition id.  The user id allows
+QSM to track resources, and release them when the user goes away (eg the process
+crashes).  A partition id identifies the resource partition that QSM manages,
+which this message applies to.
+
+Messages may have CRCs.  Messages should have CRCs applied until the QSM
+reports via the status transaction that CRCs are not needed.  The QSM on the
+SA9000P requires CRCs for black channel safing.
+
+Subsystem Restart (SSR)
+=======================
+
+SSR is the concept of limiting the impact of an error.  An AIC100 device may
+have multiple users, each with their own workload running.  If the workload of
+one user crashes, the fallout of that should be limited to that workload and not
+impact other workloads.  SSR accomplishes this.
+
+If a particular workload crashes, QSM notifies the host via the QAIC_SSR MHI
+channel.  This notification identifies the workload by it's assigned DBC.  A
+multi-stage recovery process is then used to cleanup both sides, and get the
+DBC/NSPs into a working state.
+
+When SSR occurs, any state in the workload is lost.  Any inputs that were in
+process, or queued by not yet serviced, are lost.  The loaded artifacts will
+remain in on-card DDR, but the host will need to re-activate the workload if
+it desires to recover the workload.
+
+Reliability, Accessability, Serviceability (RAS)
+================================================
+
+AIC100 is expected to be deployed in server systems where RAS ideology is
+applied.  Simply put, RAS is the concept of detecting, classifying, and
+reporting errors.  While PCIe has AER (Advanced Error Reporting) which factors
+into RAS, AER does not allow for a device to report details about internal
+errors.  Therefore, AIC100 implements a custom RAS mechanism.  When a RAS event
+occurs, QSM will report the event with appropriate details via the QAIC_STATUS
+MHI channel.  A sysadmin may determine that a particular device needs
+additional service based on RAS reports.
+
+Telemetry
+=========
+
+QSM has the ability to report various physical attributes of the device, and in
+some cases, to allow the host to control them.  Examples include thermal limits,
+thermal readings, and power readings.  These items are communicated via the
+QAIC_TELEMETRY MHI channel.
diff --git a/Documentation/accel/qaic/index.rst b/Documentation/accel/qaic/index.rst
new file mode 100644
index 0000000..ad19b88
--- /dev/null
+++ b/Documentation/accel/qaic/index.rst
@@ -0,0 +1,13 @@
+.. SPDX-License-Identifier: GPL-2.0-only
+
+=====================================
+ accel/qaic Qualcomm Cloud AI driver
+=====================================
+
+The accel/qaic driver supports the Qualcomm Cloud AI machine learning
+accelerator cards.
+
+.. toctree::
+
+   qaic
+   aic100
diff --git a/Documentation/accel/qaic/qaic.rst b/Documentation/accel/qaic/qaic.rst
new file mode 100644
index 0000000..b0e7a5f
--- /dev/null
+++ b/Documentation/accel/qaic/qaic.rst
@@ -0,0 +1,169 @@
+.. SPDX-License-Identifier: GPL-2.0-only
+
+=============
+ QAIC driver
+=============
+
+The QAIC driver is the Kernel Mode Driver (KMD) for the AIC100 family of AI
+accelerator products.
+
+Interrupts
+==========
+
+While the AIC100 DMA Bridge hardware implements an IRQ storm mitigation
+mechanism, it is still possible for an IRQ storm to occur.  A storm can happen
+if the workload is particularly quick and the host is responsive.  If the host
+can drain the response FIFO as quickly as the device can insert elements into
+it, then the device will frequently transition the response FIFO from empty to
+non-empty and generate MSIs at a rate equivalent to the speed of the
+workload's ability to process inputs.  The lprnet (license plate reader network)
+workload is known to trigger this condition, and can generate in excess of 100k
+MSIs per second.  It has been observed that most systems cannot tolerate this
+for long, and will crash via some form of watchdog due to the overhead of
+the interrupt controller interrupting the host CPU.
+
+To mitigate this issue, the QAIC driver implements specific IRQ handling.  When
+QAIC receives an IRQ, it disables that line.  This prevents the interrupt
+controller from interrupting the CPU.  Then QAIC drains the FIFO.  Once the FIFO
+is drained, QAIC implements a "last chance" polling algorithm where QAIC will
+sleep for a time to see if the workload will generate more activity.  The IRQ
+line remains disabled during this time.  If no activity is detected, QAIC exits
+polling mode and reenables the IRQ line.
+
+This mitigation in QAIC is very effective.  The same lprnet use case that
+generates 100k IRQs per second (per /proc/interrupts) is reduced to roughly 64
+IRQs over 5 minutes while keeping the host system stable and maintaining the
+same workload throughput performance (within run-to-run noise variation).
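+
+A rough sketch of that flow (illustrative only; the helpers for draining and
+re-checking the FIFO are hypothetical stand-ins for the driver's internals)::
+
+   disable_irq_nosync(dbc->irq);            /* stop the storm at the source */
+   do {
+           drain_response_fifo(dbc);        /* hypothetical helper */
+           usleep_range(poll_us, poll_us * 2);  /* "last chance" sleep */
+   } while (fifo_not_empty(dbc));           /* hypothetical helper */
+   enable_irq(dbc->irq);                    /* no more activity, re-arm the line */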
+
+
+Neural Network Control (NNC) Protocol
+=====================================
+
+The implementation of NNC is split between the KMD (QAIC) and UMD.  In general,
+QAIC understands how to encode/decode the NNC wire protocol, and handles the
+elements of the protocol which require kernelspace knowledge to process (for
+example, mapping host memory to device IOVAs).  QAIC understands the structure
+of a message, and
+all of the transactions.  QAIC does not understand commands (the payload of a
+passthrough transaction).
+
+QAIC handles and enforces the required little endianness and 64-bit alignment,
+to the degree that it can.  Since QAIC does not know the contents of a
+passthrough transaction, it relies on the UMD to satisfy the requirements.
+
+The terminate transaction is of particular use to QAIC.  QAIC is not aware of
+the resources that are loaded onto a device since the majority of that activity
+occurs within NNC commands.  As a result, QAIC does not have the means to
+roll back userspace activity.  To ensure that a userspace client's resources
+are fully released in the case of a process crash, or a bug, QAIC uses the
+terminate transaction to let QSM know when a user has gone away, and the resources
+can be released.
+
+QSM can report a version number of the NNC protocol it supports.  This is in the
+form of a Major number and a Minor number.
+
+Major number updates indicate changes to the NNC protocol which impact the
+message format, or transactions (impacts QAIC).
+
+Minor number updates indicate changes to the NNC protocol which impact the
+commands (does not impact QAIC).
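+
+For example, a device reporting 5.0 is compatible with a driver that implements
+5.0, while a device reporting a different major version, or a newer minor
+version than the driver knows about, is rejected when the control channel comes
+up.  A minimal sketch of that check, using hypothetical names for the driver's
+supported version::
+
+   if (dev_major != SUPPORTED_NNC_MAJOR || dev_minor > SUPPORTED_NNC_MINOR)
+           return -EINVAL;  /* device speaks an incompatible NNC version */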
+
+uAPI
+====
+
+QAIC defines a number of driver specific IOCTLs as part of the userspace API.
+This section describes those APIs.
+
+DRM_IOCTL_QAIC_MANAGE:
+This IOCTL allows userspace to send an NNC request to the QSM.  The call will
+block until a response is received, or the request has timed out.
+
+DRM_IOCTL_QAIC_CREATE_BO:
+This IOCTL allows userspace to allocate a buffer object (BO) which can be used
+to send data to, or receive data from, a workload.  The call will return a GEM
+handle that
+represents the allocated buffer.  The BO is not usable until it has been sliced
+(see DRM_IOCTL_QAIC_ATTACH_SLICE_BO).
+
+DRM_IOCTL_QAIC_MMAP_BO:
+This IOCTL allows userspace to prepare an allocated BO to be mmap'd into the
+userspace process.
+
+DRM_IOCTL_QAIC_ATTACH_SLICE_BO:
+This IOCTL allows userspace to slice a BO in preparation for sending the BO to
+the device.  Slicing is the operation of describing which portions of a BO are
+sent to which parts of a workload.  This requires a set of DMA transfers for
+the DMA Bridge, and as such, locks the BO to a specific DBC.
+
+DRM_IOCTL_QAIC_EXECUTE_BO:
+This IOCTL allows userspace to submit a set of sliced BOs to the device.  The
+call is non-blocking.  Success only indicates that the BOs have been queued
+to the device, but does not guarantee they have been executed.
+
+DRM_IOCTL_QAIC_PARTIAL_EXECUTE_BO:
+This IOCTL operates like DRM_IOCTL_QAIC_EXECUTE_BO, but it allows userspace to
+shrink the BOs sent to the device for this specific call.  If a BO typically has
+N inputs, but only a subset of those is available, this IOCTL allows userspace
+to indicate that only the first M bytes of the BO should be sent to the device
+to minimize data transfer overhead.  This IOCTL dynamically recomputes the
+slicing, and therefore has some processing overhead before the BOs can be queued
+to the device.
+
+DRM_IOCTL_QAIC_WAIT_BO:
+This IOCTL allows userspace to determine when a particular BO has been processed
+by the device.  The call will block until either the BO has been processed and
+can be re-queued to the device, or a timeout occurs.
+
+DRM_IOCTL_QAIC_PERF_STATS_BO:
+This IOCTL allows userspace to collect performance statistics on the most
+recent execution of a BO.  This allows userspace to construct an end to end
+timeline of the BO processing for a performance analysis.
+
+DRM_IOCTL_QAIC_PART_DEV:
+This IOCTL allows userspace to request a duplicate "shadow device".  This extra
+accelN device is associated with a specific partition of resources on the AIC100
+device and can be used for limiting a process to some subset of resources.
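+
+As an illustration of how these IOCTLs fit together, a heavily simplified
+userspace flow might look like the sketch below.  The device node path, buffer
+size, and DBC id are assumptions for the example, and the slicing and execute
+arguments are elided; a real client would normally go through the UMD rather
+than issue these calls directly::
+
+   int fd = open("/dev/accel/accel0", O_RDWR);
+
+   struct qaic_create_bo create = { .size = bo_size };
+   ioctl(fd, DRM_IOCTL_QAIC_CREATE_BO, &create);
+
+   struct qaic_mmap_bo map = { .handle = create.handle };
+   ioctl(fd, DRM_IOCTL_QAIC_MMAP_BO, &map);
+   void *buf = mmap(NULL, bo_size, PROT_READ | PROT_WRITE, MAP_SHARED,
+                    fd, map.offset);
+
+   /* attach slicing information, then queue the BO for execution */
+   ioctl(fd, DRM_IOCTL_QAIC_ATTACH_SLICE_BO, &slice);   /* struct qaic_attach_slice */
+   ioctl(fd, DRM_IOCTL_QAIC_EXECUTE_BO, &exec);         /* struct qaic_execute */
+
+   struct qaic_wait wait = { .handle = create.handle, .timeout = 5000,
+                             .dbc_id = dbc_id };
+   ioctl(fd, DRM_IOCTL_QAIC_WAIT_BO, &wait);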
+
+Userspace Client Isolation
+==========================
+
+AIC100 supports multiple clients.  Multiple DBCs can be consumed by a single
+client, and multiple clients can each consume one or more DBCs.  Workloads
+may contain sensitive information, therefore only the client that owns the
+workload should be allowed to interface with the DBC.
+
+Clients are identified by the instance associated with their open().  A client
+may only use memory they allocate, and DBCs that are assigned to their
+workloads.  Attempts to access resources assigned to other clients will be
+rejected.
+
+Module parameters
+=================
+
+QAIC supports the following module parameters:
+
+**datapath_polling (bool)**
+
+Configures QAIC to use a polling thread for datapath events instead of relying
+on the device interrupts.  Useful for platforms with broken multi-MSI.  Must be
+set at QAIC driver initialization.  Default is 0 (off).
+
+**mhi_timeout (int)**
+
+Sets the timeout value for MHI operations in milliseconds (ms).  Must be set
+at the time the driver detects a device.  Default is 2000 (2 seconds).
+
+**control_resp_timeout (int)**
+
+Sets the timeout value for QSM responses to NNC messages in seconds (s).  Must
+be set at the time the driver is sending a request to QSM.  Default is 60 (one
+minute).
+
+**wait_exec_default_timeout (int)**
+
+Sets the default timeout for the wait_exec ioctl in milliseconds (ms).  Must be
+set prior to the wait_exec ioctl call.  A value specified in the ioctl call
+overrides this for that call.  Default is 5000 (5 seconds).
+
+**datapath_poll_interval_us (int)**
+
+Sets the polling interval in microseconds (us) when datapath polling is active.
+Takes effect at the next polling interval.  Default is 100 (100 us).
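+
+For example, on a platform with broken multi-MSI, the driver could be loaded
+with datapath polling enabled (illustrative invocation only)::
+
+   modprobe qaic datapath_polling=1 datapath_poll_interval_us=200 mhi_timeout=4000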
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v2 2/8] accel/qaic: Add uapi and core driver file
  2023-02-06 15:41 ` Jeffrey Hugo
@ 2023-02-06 15:41   ` Jeffrey Hugo
  -1 siblings, 0 replies; 68+ messages in thread
From: Jeffrey Hugo @ 2023-02-06 15:41 UTC (permalink / raw)
  To: ogabbay, airlied, daniel, dri-devel
  Cc: Jeffrey Hugo, linux-doc, linux-arm-msm, quic_ajitpals,
	quic_pkanojiy, stanislaw.gruszka, quic_carlv, jacek.lawrynowicz

Add the QAIC driver uapi file and core driver file that binds to the PCIe
device.  The core driver file also creates the accel device and manages
all the interconnections between the different parts of the driver.

The driver can be built as a module.  If so, it will be called "qaic.ko".

Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com>
---
 drivers/accel/qaic/qaic.h     | 321 ++++++++++++++++++
 drivers/accel/qaic/qaic_drv.c | 771 ++++++++++++++++++++++++++++++++++++++++++
 include/uapi/drm/qaic_accel.h | 283 ++++++++++++++++
 3 files changed, 1375 insertions(+)
 create mode 100644 drivers/accel/qaic/qaic.h
 create mode 100644 drivers/accel/qaic/qaic_drv.c
 create mode 100644 include/uapi/drm/qaic_accel.h

diff --git a/drivers/accel/qaic/qaic.h b/drivers/accel/qaic/qaic.h
new file mode 100644
index 0000000..3f7ea76
--- /dev/null
+++ b/drivers/accel/qaic/qaic.h
@@ -0,0 +1,321 @@
+/* SPDX-License-Identifier: GPL-2.0-only
+ *
+ * Copyright (c) 2019-2021, The Linux Foundation. All rights reserved.
+ * Copyright (c) 2021-2023 Qualcomm Innovation Center, Inc. All rights reserved.
+ */
+
+#ifndef QAICINTERNAL_H_
+#define QAICINTERNAL_H_
+
+#include <linux/interrupt.h>
+#include <linux/kref.h>
+#include <linux/mhi.h>
+#include <linux/mutex.h>
+#include <linux/pci.h>
+#include <linux/spinlock.h>
+#include <linux/srcu.h>
+#include <linux/wait.h>
+#include <linux/workqueue.h>
+#include <drm/drm_device.h>
+#include <drm/drm_gem.h>
+
+#define QAIC_DBC_BASE		0x20000
+#define QAIC_DBC_SIZE		0x1000
+
+#define QAIC_NO_PARTITION	-1
+
+#define QAIC_DBC_OFF(i)		((i) * QAIC_DBC_SIZE + QAIC_DBC_BASE)
+
+#define to_qaic_bo(obj) container_of(obj, struct qaic_bo, base)
+
+extern bool poll_datapath;
+
+struct qaic_user {
+	/* Uniquely identifies this user for the device */
+	int			handle;
+	struct kref		ref_count;
+	/* Char device opened by this user */
+	struct qaic_drm_device	*qddev;
+	/* Node in list of users that opened this drm device */
+	struct list_head	node;
+	/* SRCU used to synchronize this user during cleanup */
+	struct srcu_struct	qddev_lock;
+	atomic_t		chunk_id;
+};
+
+struct dma_bridge_chan {
+	/* Pointer to device struct maintained by driver */
+	struct qaic_device	*qdev;
+	/* ID of this DMA bridge channel(DBC) */
+	unsigned int		id;
+	/* Synchronizes access to xfer_list */
+	spinlock_t		xfer_lock;
+	/* Base address of request queue */
+	void			*req_q_base;
+	/* Base address of response queue */
+	void			*rsp_q_base;
+	/*
+	 * Base bus address of request queue. Response queue bus address can be
+	 * calculated by adding request queue size to this variable
+	 */
+	dma_addr_t		dma_addr;
+	/* Total size of request and response queue in bytes */
+	u32			total_size;
+	/* Capacity of request/response queue */
+	u32			nelem;
+	/* The user that opened this DBC */
+	struct qaic_user	*usr;
+	/*
+	 * Request ID of next memory handle that goes in request queue. One
+	 * memory handle can enqueue more than one request element; all the
+	 * requests that belong to the same memory handle have the same request ID
+	 */
+	u16			next_req_id;
+	/* TRUE: DBC is in use; FALSE: DBC not in use */
+	bool			in_use;
+	/*
+	 * Base address of device registers. Used to read/write request and
+	 * response queue's head and tail pointer of this DBC.
+	 */
+	void __iomem		*dbc_base;
+	/* Head of list where each node is a memory handle queued in request queue */
+	struct list_head	xfer_list;
+	/* Synchronizes DBC readers during cleanup */
+	struct srcu_struct	ch_lock;
+	/*
+	 * When this DBC is released, any thread waiting on this wait queue is
+	 * woken up
+	 */
+	wait_queue_head_t	dbc_release;
+	/* Head of list where each node is a bo associated with this DBC */
+	struct list_head	bo_lists;
+	/* The irq line for this DBC.  Used for polling */
+	unsigned int		irq;
+	/* Polling work item to simulate interrupts */
+	struct work_struct	poll_work;
+};
+
+struct qaic_device {
+	/* Pointer to base PCI device struct of our physical device */
+	struct pci_dev		*pdev;
+	/* Mask of all bars of this device */
+	int			bars;
+	/* Req. ID of request that will be queued next in MHI control device */
+	u32			next_seq_num;
+	/* Base address of bar 0 */
+	void __iomem		*bar_0;
+	/* Base address of bar 2 */
+	void __iomem		*bar_2;
+	/* Controller structure for MHI devices */
+	struct mhi_controller	*mhi_cntl;
+	/* MHI control channel device */
+	struct mhi_device	*cntl_ch;
+	/* List of requests queued in MHI control device */
+	struct list_head	cntl_xfer_list;
+	/* Synchronizes MHI control device transactions and its xfer list */
+	struct mutex		cntl_mutex;
+	/* Base actual physical representation of drm device */
+	struct qaic_drm_device	*base_dev;
+	/* Array of DBC struct of this device */
+	struct dma_bridge_chan	*dbc;
+	/* Work queue for tasks related to MHI control device */
+	struct workqueue_struct	*cntl_wq;
+	/* Synchronizes all the users of device during cleanup */
+	struct srcu_struct	dev_lock;
+	/* TRUE: Device under reset; FALSE: Device not under reset */
+	bool			in_reset;
+	/*
+	 * TRUE: A tx MHI transaction has failed and a rx buffer is still queued
+	 * in control device. Such a buffer is considered lost rx buffer
+	 * FALSE: No rx buffer is lost in control device
+	 */
+	bool			cntl_lost_buf;
+	/* Maximum number of DBC supported by this device */
+	u32			num_dbc;
+	/* Head in list of drm device created on top of this device */
+	struct list_head	qaic_drm_devices;
+	/* Synchronizes access of qaic_drm_devices list */
+	struct mutex		qaic_drm_devices_mutex;
+	/* Generate the CRC of a control message */
+	u32 (*gen_crc)(void *msg);
+	/* Validate the CRC of a control message */
+	bool (*valid_crc)(void *msg);
+};
+
+struct qaic_drm_device {
+	/* Pointer to the root device struct driven by this driver */
+	struct qaic_device	*qdev;
+	/* Node in list of drm devices maintained by root device */
+	struct list_head	node;
+	/*
+	 * The physical device can be partitioned into a number of logical
+	 * devices, and each logical device is given a partition id. This member
+	 * stores that id. QAIC_NO_PARTITION is a sentinel used to mark that this
+	 * drm device is the actual physical device
+	 */
+	s32			partition_id;
+	/*
+	 * It points to the user that created this drm device. It is NULL
+	 * when this drm device represents the physical device i.e.
+	 * partition_id is QAIC_NO_PARTITION
+	 */
+	struct qaic_user	*owner;
+	/* Pointer to the drm device struct of this drm device */
+	struct drm_device	*ddev;
+	/* Head in list of users who have opened this drm device */
+	struct list_head	users;
+	/* Synchronizes access to users list */
+	struct mutex		users_mutex;
+};
+
+struct qaic_bo {
+	struct drm_gem_object	base;
+	/* Scatter/gather table for allocate/imported BO */
+	struct sg_table		*sgt;
+	/* BO size requested by user. GEM object might be bigger in size. */
+	u64			size;
+	/* Head in list of slices of this BO */
+	struct list_head	slices;
+	/* Total nents, for all slices of this BO */
+	int			total_slice_nents;
+	/*
+	 * Direction of transfer. It can assume only two values: DMA_TO_DEVICE
+	 * and DMA_FROM_DEVICE.
+	 */
+	int			dir;
+	/* Pointer to the DBC which operates on this BO */
+	struct dma_bridge_chan	*dbc;
+	/* Number of slices that belong to this buffer */
+	u32			nr_slice;
+	/* Number of slices that have been transferred by the DMA engine */
+	u32			nr_slice_xfer_done;
+	/* TRUE = BO is queued for execution, FALSE = BO is not queued */
+	bool			queued;
+	/*
+	 * If TRUE then user has attached slicing information to this BO by
+	 * calling DRM_IOCTL_QAIC_ATTACH_SLICE_BO ioctl.
+	 */
+	bool			sliced;
+	/* Request ID of this BO if it is queued for execution */
+	u16			req_id;
+	/* Handle assigned to this BO */
+	u32			handle;
+	/* Wait on this for completion of DMA transfer of this BO */
+	struct completion	xfer_done;
+	/*
+	 * Node in linked list where head is dbc->xfer_list.
+	 * This linked list contains BOs that are queued for DMA transfer.
+	 */
+	struct list_head	xfer_list;
+	/*
+	 * Node in linked list where head is dbc->bo_lists.
+	 * This linked list contains BOs that are associated with the DBC they
+	 * are linked to.
+	 */
+	struct list_head	bo_list;
+	struct {
+		/*
+		 * Latest timestamp(ns) at which kernel received a request to
+		 * execute this BO
+		 */
+		u64		req_received_ts;
+		/*
+		 * Latest timestamp(ns) at which kernel enqueued requests of
+		 * this BO for execution in DMA queue
+		 */
+		u64		req_submit_ts;
+		/*
+		 * Latest timestamp(ns) at which kernel received a completion
+		 * interrupt for requests of this BO
+		 */
+		u64		req_processed_ts;
+		/*
+		 * Number of elements already enqueued in DMA queue before
+		 * enqueuing requests of this BO
+		 */
+		u32		queue_level_before;
+	} perf_stats;
+
+};
+
+struct bo_slice {
+	/* Mapped pages */
+	struct sg_table		*sgt;
+	/* Number of requests required to queue in DMA queue */
+	int			nents;
+	/* See enum dma_data_direction */
+	int			dir;
+	/* Actual requests that will be copied in DMA queue */
+	struct dbc_req		*reqs;
+	struct kref		ref_count;
+	/* TRUE: No DMA transfer required */
+	bool			no_xfer;
+	/* Pointer to the parent BO handle */
+	struct qaic_bo		*bo;
+	/* Node in list of slices maintained by parent BO */
+	struct list_head	slice;
+	/* Size of this slice in bytes */
+	u64			size;
+	/* Offset of this slice in buffer */
+	u64			offset;
+};
+
+int get_dbc_req_elem_size(void);
+int get_dbc_rsp_elem_size(void);
+int get_cntl_version(struct qaic_device *qdev, struct qaic_user *usr,
+		     u16 *major, u16 *minor);
+int qaic_manage_ioctl(struct drm_device *dev, void *data,
+		      struct drm_file *file_priv);
+int qaic_execute_ioctl(struct qaic_device *qdev, struct qaic_user *usr,
+		       unsigned long arg, bool is_partial);
+int qaic_wait_exec_ioctl(struct qaic_device *qdev, struct qaic_user *usr,
+			 unsigned long arg);
+int qaic_query_ioctl(struct qaic_device *qdev, struct qaic_user *usr,
+		     unsigned long arg);
+int qaic_data_mmap(struct qaic_device *qdev, struct qaic_user *usr,
+		   struct vm_area_struct *vma);
+int qaic_data_get_reservation(struct qaic_device *qdev, struct qaic_user *usr,
+			      void *data, u32 *partition_id,
+			      u16 *remove);
+void qaic_mhi_ul_xfer_cb(struct mhi_device *mhi_dev,
+			 struct mhi_result *mhi_result);
+
+void qaic_mhi_dl_xfer_cb(struct mhi_device *mhi_dev,
+			 struct mhi_result *mhi_result);
+
+int qaic_control_open(struct qaic_device *qdev);
+void qaic_control_close(struct qaic_device *qdev);
+void qaic_release_usr(struct qaic_device *qdev, struct qaic_user *usr);
+
+irqreturn_t dbc_irq_threaded_fn(int irq, void *data);
+irqreturn_t dbc_irq_handler(int irq, void *data);
+int disable_dbc(struct qaic_device *qdev, u32 dbc_id, struct qaic_user *usr);
+void enable_dbc(struct qaic_device *qdev, u32 dbc_id, struct qaic_user *usr);
+void wakeup_dbc(struct qaic_device *qdev, u32 dbc_id);
+void release_dbc(struct qaic_device *qdev, u32 dbc_id);
+
+void wake_all_cntl(struct qaic_device *qdev);
+void qaic_dev_reset_clean_local_state(struct qaic_device *qdev, bool exit_reset);
+
+struct drm_gem_object *qaic_gem_prime_import(struct drm_device *dev,
+					     struct dma_buf *dma_buf);
+
+int qaic_create_bo_ioctl(struct drm_device *dev, void *data,
+			 struct drm_file *file_priv);
+int qaic_mmap_bo_ioctl(struct drm_device *dev, void *data,
+		       struct drm_file *file_priv);
+int qaic_attach_slice_bo_ioctl(struct drm_device *dev, void *data,
+			       struct drm_file *file_priv);
+int qaic_execute_bo_ioctl(struct drm_device *dev, void *data,
+			  struct drm_file *file_priv);
+int qaic_partial_execute_bo_ioctl(struct drm_device *dev, void *data,
+				  struct drm_file *file_priv);
+int qaic_wait_bo_ioctl(struct drm_device *dev, void *data,
+		       struct drm_file *file_priv);
+int qaic_test_print_bo_ioctl(struct drm_device *dev, void *data,
+			     struct drm_file *file_priv);
+int qaic_perf_stats_bo_ioctl(struct drm_device *dev, void *data,
+			     struct drm_file *file_priv);
+void irq_polling_work(struct work_struct *work);
+
+#endif /* QAICINTERNAL_H_ */
diff --git a/drivers/accel/qaic/qaic_drv.c b/drivers/accel/qaic/qaic_drv.c
new file mode 100644
index 0000000..602a784
--- /dev/null
+++ b/drivers/accel/qaic/qaic_drv.c
@@ -0,0 +1,771 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+/* Copyright (c) 2019-2021, The Linux Foundation. All rights reserved. */
+/* Copyright (c) 2021-2023 Qualcomm Innovation Center, Inc. All rights reserved. */
+
+#include <linux/delay.h>
+#include <linux/dma-mapping.h>
+#include <linux/idr.h>
+#include <linux/interrupt.h>
+#include <linux/list.h>
+#include <linux/kref.h>
+#include <linux/mhi.h>
+#include <linux/module.h>
+#include <linux/msi.h>
+#include <linux/mutex.h>
+#include <linux/pci.h>
+#include <linux/sched.h>
+#include <linux/spinlock.h>
+#include <linux/workqueue.h>
+#include <linux/wait.h>
+#include <drm/drm_accel.h>
+#include <drm/drm_drv.h>
+#include <drm/drm_file.h>
+#include <drm/drm_gem.h>
+#include <drm/drm_ioctl.h>
+#include <uapi/drm/qaic_accel.h>
+
+#include "mhi_controller.h"
+#include "mhi_qaic_ctrl.h"
+#include "qaic.h"
+
+MODULE_IMPORT_NS(DMA_BUF);
+
+#define PCI_DEV_AIC100			0xa100
+#define QAIC_NAME			"qaic"
+#define QAIC_DESC			"Qualcomm Cloud AI Accelerators"
+
+static unsigned int datapath_polling;
+module_param(datapath_polling, uint, 0400);
+bool poll_datapath;
+
+static u16 cntl_major = 5;
+static u16 cntl_minor;/* 0 */
+static bool link_up;
+static DEFINE_IDA(qaic_usrs);
+
+static int qaic_create_drm_device(struct qaic_device *qdev, s32 partition_id,
+				  struct qaic_user *owner);
+static void qaic_destroy_drm_device(struct qaic_device *qdev, s32 partition_id,
+				    struct qaic_user *owner);
+
+static void free_usr(struct kref *kref)
+{
+	struct qaic_user *usr = container_of(kref, struct qaic_user, ref_count);
+
+	cleanup_srcu_struct(&usr->qddev_lock);
+	ida_free(&qaic_usrs, usr->handle);
+	kfree(usr);
+}
+
+static int qaic_open(struct drm_device *dev, struct drm_file *file)
+{
+	struct qaic_drm_device *qddev = dev->dev_private;
+	struct qaic_device *qdev = qddev->qdev;
+	struct qaic_user *usr;
+	int rcu_id;
+	int ret;
+
+	rcu_id = srcu_read_lock(&qdev->dev_lock);
+	if (qdev->in_reset) {
+		srcu_read_unlock(&qdev->dev_lock, rcu_id);
+		return -ENODEV;
+	}
+
+	usr = kmalloc(sizeof(*usr), GFP_KERNEL);
+	if (!usr) {
+		srcu_read_unlock(&qdev->dev_lock, rcu_id);
+		return -ENOMEM;
+	}
+
+	usr->handle = ida_alloc(&qaic_usrs, GFP_KERNEL);
+	if (usr->handle < 0) {
+		srcu_read_unlock(&qdev->dev_lock, rcu_id);
+		kfree(usr);
+		return usr->handle;
+	}
+	usr->qddev = qddev;
+	atomic_set(&usr->chunk_id, 0);
+	init_srcu_struct(&usr->qddev_lock);
+	kref_init(&usr->ref_count);
+
+	ret = mutex_lock_interruptible(&qddev->users_mutex);
+	if (ret) {
+		cleanup_srcu_struct(&usr->qddev_lock);
+		kfree(usr);
+		srcu_read_unlock(&qdev->dev_lock, rcu_id);
+		return ret;
+	}
+
+	list_add(&usr->node, &qddev->users);
+	mutex_unlock(&qddev->users_mutex);
+
+	file->driver_priv = usr;
+
+	srcu_read_unlock(&qdev->dev_lock, rcu_id);
+	return 0;
+}
+
+static void qaic_postclose(struct drm_device *dev, struct drm_file *file)
+{
+	struct qaic_user *usr = file->driver_priv;
+	struct qaic_drm_device *qddev;
+	struct qaic_device *qdev;
+	int qdev_rcu_id;
+	int usr_rcu_id;
+	int i;
+
+	qddev = usr->qddev;
+	usr_rcu_id = srcu_read_lock(&usr->qddev_lock);
+	if (qddev) {
+		qdev = qddev->qdev;
+		qdev_rcu_id = srcu_read_lock(&qdev->dev_lock);
+		if (!qdev->in_reset) {
+			qaic_release_usr(qdev, usr);
+			for (i = 0; i < qdev->num_dbc; ++i)
+				if (qdev->dbc[i].usr &&
+				    qdev->dbc[i].usr->handle == usr->handle)
+					release_dbc(qdev, i);
+
+			/* Remove child devices */
+			if (qddev->partition_id == QAIC_NO_PARTITION)
+				qaic_destroy_drm_device(qdev, QAIC_NO_PARTITION, usr);
+		}
+		srcu_read_unlock(&qdev->dev_lock, qdev_rcu_id);
+
+		mutex_lock(&qddev->users_mutex);
+		if (!list_empty(&usr->node))
+			list_del_init(&usr->node);
+		mutex_unlock(&qddev->users_mutex);
+	}
+
+	srcu_read_unlock(&usr->qddev_lock, usr_rcu_id);
+	kref_put(&usr->ref_count, free_usr);
+
+	file->driver_priv = NULL;
+}
+
+static int qaic_part_dev_ioctl(struct drm_device *dev, void *data,
+			       struct drm_file *file_priv)
+{
+	struct qaic_device *qdev;
+	struct qaic_user *usr;
+	u32 partition_id;
+	int qdev_rcu_id;
+	int usr_rcu_id;
+	int ret = 0;
+	u16 remove;
+
+	usr = file_priv->driver_priv;
+	if (!usr)
+		return -EINVAL;
+
+	usr_rcu_id = srcu_read_lock(&usr->qddev_lock);
+	if (!usr->qddev) {
+		srcu_read_unlock(&usr->qddev_lock, usr_rcu_id);
+		return -ENODEV;
+	}
+
+	qdev = usr->qddev->qdev;
+	if (!qdev) {
+		srcu_read_unlock(&usr->qddev_lock, usr_rcu_id);
+		return -ENODEV;
+	}
+
+	qdev_rcu_id = srcu_read_lock(&qdev->dev_lock);
+	if (qdev->in_reset) {
+		ret = -ENODEV;
+		goto out;
+	}
+
+	/* This IOCTL is only supported for base devices. */
+	if (usr->qddev->partition_id != QAIC_NO_PARTITION) {
+		ret = -ENOTTY;
+		goto out;
+	}
+
+	ret = qaic_data_get_reservation(qdev, usr, data, &partition_id,
+					&remove);
+	if (ret)
+		goto out;
+
+	if (remove == 1)
+		qaic_destroy_drm_device(qdev, partition_id, usr);
+	else
+		ret = qaic_create_drm_device(qdev, partition_id, usr);
+
+out:
+	srcu_read_unlock(&qdev->dev_lock, qdev_rcu_id);
+	srcu_read_unlock(&usr->qddev_lock, usr_rcu_id);
+
+	return ret;
+}
+
+DEFINE_DRM_ACCEL_FOPS(qaic_accel_fops);
+
+static const struct drm_ioctl_desc qaic_drm_ioctls[] = {
+	DRM_IOCTL_DEF_DRV(QAIC_MANAGE, qaic_manage_ioctl, DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(QAIC_CREATE_BO, qaic_create_bo_ioctl, DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(QAIC_MMAP_BO, qaic_mmap_bo_ioctl, DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(QAIC_ATTACH_SLICE_BO, qaic_attach_slice_bo_ioctl, DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(QAIC_EXECUTE_BO, qaic_execute_bo_ioctl, DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(QAIC_PARTIAL_EXECUTE_BO, qaic_partial_execute_bo_ioctl, DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(QAIC_WAIT_BO, qaic_wait_bo_ioctl, DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(QAIC_PERF_STATS_BO, qaic_perf_stats_bo_ioctl, DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(QAIC_PART_DEV, qaic_part_dev_ioctl, DRM_RENDER_ALLOW),
+};
+
+static const struct drm_driver qaic_accel_driver = {
+	.driver_features	= DRIVER_GEM | DRIVER_COMPUTE_ACCEL,
+
+	.name			= QAIC_NAME,
+	.desc			= QAIC_DESC,
+	.date			= "20190618",
+
+	.fops			= &qaic_accel_fops,
+	.open			= qaic_open,
+	.postclose		= qaic_postclose,
+
+	.ioctls			= qaic_drm_ioctls,
+	.num_ioctls		= ARRAY_SIZE(qaic_drm_ioctls),
+	.prime_fd_to_handle	= drm_gem_prime_fd_to_handle,
+	.gem_prime_import	= qaic_gem_prime_import,
+};
+
+static int qaic_create_drm_device(struct qaic_device *qdev, s32 partition_id,
+				  struct qaic_user *owner)
+{
+	struct qaic_drm_device *qddev;
+	struct drm_device *ddev;
+	struct device *pdev;
+	int ret;
+
+	/*
+	 * Partition id QAIC_NO_PARTITION indicates that the device was created
+	 * on mhi_probe and id > QAIC_NO_PARTITION indicates a partition
+	 * created using IOCTL. So, pdev for primary device is the pci dev and
+	 * the parent for partition dev is the primary device.
+	 */
+	if (partition_id == QAIC_NO_PARTITION)
+		pdev = &qdev->pdev->dev;
+	else
+		pdev = qdev->base_dev->ddev->dev;
+
+	qddev = kzalloc(sizeof(*qddev), GFP_KERNEL);
+	if (!qddev) {
+		ret = -ENOMEM;
+		goto qddev_fail;
+	}
+
+	ddev = drm_dev_alloc(&qaic_accel_driver, pdev);
+	if (IS_ERR(ddev)) {
+		ret = PTR_ERR(ddev);
+		goto ddev_fail;
+	}
+
+	ddev->dev_private = qddev;
+	qddev->ddev = ddev;
+
+	if (partition_id == QAIC_NO_PARTITION)
+		qdev->base_dev = qddev;
+	qddev->qdev = qdev;
+	qddev->partition_id = partition_id;
+	qddev->owner = owner;
+	INIT_LIST_HEAD(&qddev->users);
+	mutex_init(&qddev->users_mutex);
+
+	mutex_lock(&qdev->qaic_drm_devices_mutex);
+	list_add(&qddev->node, &qdev->qaic_drm_devices);
+	mutex_unlock(&qdev->qaic_drm_devices_mutex);
+
+	ret = drm_dev_register(ddev, 0);
+	if (ret) {
+		pci_dbg(qdev->pdev, "%s: drm_dev_register failed %d\n", __func__, ret);
+		goto drm_reg_fail;
+	}
+
+	return 0;
+
+drm_reg_fail:
+	mutex_destroy(&qddev->users_mutex);
+	mutex_lock(&qdev->qaic_drm_devices_mutex);
+	list_del(&qddev->node);
+	mutex_unlock(&qdev->qaic_drm_devices_mutex);
+	if (partition_id == QAIC_NO_PARTITION)
+		qdev->base_dev = NULL;
+	drm_dev_put(ddev);
+ddev_fail:
+	kfree(qddev);
+qddev_fail:
+	return ret;
+}
+
+static void qaic_destroy_drm_device(struct qaic_device *qdev, s32 partition_id,
+				    struct qaic_user *owner)
+{
+	struct qaic_drm_device *qddev;
+	struct qaic_drm_device *q;
+	struct qaic_user *usr;
+
+	list_for_each_entry_safe(qddev, q, &qdev->qaic_drm_devices, node) {
+		/*
+		 * Skip devices in case we just want to remove devices
+		 * specific to a owner or partition id.
+		 *
+		 * owner	partition_id	notes
+		 * ----------------------------------
+		 * NULL		NO_PARTITION	delete base + all derived (qdev
+		 *				reset)
+		 * !NULL	NO_PARTITION	delete derived devs created by
+		 *				owner.
+		 * !NULL	>NO_PARTITION	delete derived dev identified by
+		 *				the partition id and created by
+		 *				owner
+		 * NULL		>NO_PARTITION	invalid (no-op)
+		 *
+		 * if partition_id is any value < QAIC_NO_PARTITION this will be
+		 * a no-op.
+		 */
+		if (owner && owner != qddev->owner)
+			continue;
+
+		if (partition_id != QAIC_NO_PARTITION &&
+		    partition_id != qddev->partition_id && !owner)
+			continue;
+
+		/*
+		 * Existing users get unresolvable errors till they close FDs.
+		 * Need to sync carefully with users calling close().  The
+		 * list of users can be modified elsewhere when the lock isn't
+		 * held here, but synchronizing the srcu with the mutex held
+		 * could deadlock.  Grab the mutex so that the list will be
+		 * unmodified.  The user we get will exist as long as the
+		 * lock is held.  Signal that the qddev is going away, and
+		 * grab a reference to the user so they don't go away for
+		 * synchronize_srcu().  Then release the mutex to avoid
+		 * deadlock and make sure the user has observed the signal.
+		 * With the lock released, we cannot maintain any state of the
+		 * user list.
+		 */
+		mutex_lock(&qddev->users_mutex);
+		while (!list_empty(&qddev->users)) {
+			usr = list_first_entry(&qddev->users, struct qaic_user,
+					       node);
+			list_del_init(&usr->node);
+			kref_get(&usr->ref_count);
+			usr->qddev = NULL;
+			mutex_unlock(&qddev->users_mutex);
+			synchronize_srcu(&usr->qddev_lock);
+			kref_put(&usr->ref_count, free_usr);
+			mutex_lock(&qddev->users_mutex);
+		}
+		mutex_unlock(&qddev->users_mutex);
+
+		if (qddev->ddev) {
+			drm_dev_unregister(qddev->ddev);
+			drm_dev_put(qddev->ddev);
+		}
+
+		list_del(&qddev->node);
+		kfree(qddev);
+	}
+}
+
+static int qaic_mhi_probe(struct mhi_device *mhi_dev,
+			  const struct mhi_device_id *id)
+{
+	struct qaic_device *qdev;
+	u16 major, minor;
+	int ret;
+
+	/*
+	 * Invoking this function indicates that the control channel to the
+	 * device is available.  We use that as a signal to indicate that
+	 * the device side firmware has booted.  The device side firmware
+	 * manages the device resources, so we need to communicate with it
+	 * via the control channel in order to utilize the device.  Therefore
+	 * we wait until this signal to create the drm dev that userspace will
+	 * use to control the device, because without the device side firmware,
+	 * userspace can't do anything useful.
+	 */
+
+	qdev = pci_get_drvdata(to_pci_dev(mhi_dev->mhi_cntrl->cntrl_dev));
+
+	qdev->in_reset = false;
+
+	dev_set_drvdata(&mhi_dev->dev, qdev);
+	qdev->cntl_ch = mhi_dev;
+
+	ret = qaic_control_open(qdev);
+	if (ret) {
+		pci_dbg(qdev->pdev, "%s: control_open failed %d\n", __func__, ret);
+		goto err;
+	}
+
+	ret = get_cntl_version(qdev, NULL, &major, &minor);
+	if (ret || major != cntl_major || minor > cntl_minor) {
+		pci_err(qdev->pdev, "%s: Control protocol version (%d.%d) not supported.  Supported version is (%d.%d). Ret: %d\n",
+			__func__, major, minor, cntl_major, cntl_minor, ret);
+		ret = -EINVAL;
+		goto close_control;
+	}
+
+	ret = qaic_create_drm_device(qdev, QAIC_NO_PARTITION, NULL);
+
+	return ret;
+
+close_control:
+	qaic_control_close(qdev);
+err:
+	return ret;
+}
+
+static void qaic_mhi_remove(struct mhi_device *mhi_dev)
+{
+}
+
+static void qaic_notify_reset(struct qaic_device *qdev)
+{
+	int i;
+
+	qdev->in_reset = true;
+	/* wake up any waiters to avoid waiting for timeouts at sync */
+	wake_all_cntl(qdev);
+	for (i = 0; i < qdev->num_dbc; ++i)
+		wakeup_dbc(qdev, i);
+	synchronize_srcu(&qdev->dev_lock);
+}
+
+void qaic_dev_reset_clean_local_state(struct qaic_device *qdev, bool exit_reset)
+{
+	int i;
+
+	qaic_notify_reset(qdev);
+
+	/* remove drmdevs to prevent new users from coming in */
+	if (qdev->base_dev)
+		qaic_destroy_drm_device(qdev, QAIC_NO_PARTITION, NULL);
+
+	/* start tearing things down */
+	for (i = 0; i < qdev->num_dbc; ++i)
+		release_dbc(qdev, i);
+
+	if (exit_reset)
+		qdev->in_reset = false;
+}
+
+static int qaic_pci_probe(struct pci_dev *pdev,
+			  const struct pci_device_id *id)
+{
+	int ret;
+	int i;
+	int mhi_irq;
+	struct qaic_device *qdev;
+
+	qdev = devm_kzalloc(&pdev->dev, sizeof(*qdev), GFP_KERNEL);
+	if (!qdev)
+		return -ENOMEM;
+
+	if (id->device == PCI_DEV_AIC100) {
+		qdev->num_dbc = 16;
+		qdev->dbc = devm_kcalloc(&pdev->dev, qdev->num_dbc, sizeof(*qdev->dbc),
+					 GFP_KERNEL);
+		if (!qdev->dbc)
+			return -ENOMEM;
+	}
+
+	qdev->cntl_wq = alloc_workqueue("qaic_cntl", WQ_UNBOUND, 0);
+	if (!qdev->cntl_wq) {
+		ret = -ENOMEM;
+		goto wq_fail;
+	}
+	pci_set_drvdata(pdev, qdev);
+	qdev->pdev = pdev;
+	mutex_init(&qdev->cntl_mutex);
+	INIT_LIST_HEAD(&qdev->cntl_xfer_list);
+	init_srcu_struct(&qdev->dev_lock);
+	INIT_LIST_HEAD(&qdev->qaic_drm_devices);
+	mutex_init(&qdev->qaic_drm_devices_mutex);
+	for (i = 0; i < qdev->num_dbc; ++i) {
+		spin_lock_init(&qdev->dbc[i].xfer_lock);
+		qdev->dbc[i].qdev = qdev;
+		qdev->dbc[i].id = i;
+		INIT_LIST_HEAD(&qdev->dbc[i].xfer_list);
+		init_srcu_struct(&qdev->dbc[i].ch_lock);
+		init_waitqueue_head(&qdev->dbc[i].dbc_release);
+		INIT_LIST_HEAD(&qdev->dbc[i].bo_lists);
+	}
+
+	qdev->bars = pci_select_bars(pdev, IORESOURCE_MEM);
+
+	/* make sure the device has the expected BARs */
+	if (qdev->bars != (BIT(0) | BIT(2) | BIT(4))) {
+		pci_dbg(pdev, "%s: expected BARs 0, 2, and 4 not found in device.  Found 0x%x\n",
+			__func__, qdev->bars);
+		ret = -EINVAL;
+		goto bar_fail;
+	}
+
+	ret = pci_enable_device(pdev);
+	if (ret)
+		goto enable_fail;
+
+	ret = pci_request_selected_regions(pdev, qdev->bars, "aic100");
+	if (ret)
+		goto request_regions_fail;
+
+	pci_set_master(pdev);
+
+	ret = dma_set_mask(&pdev->dev, DMA_BIT_MASK(64));
+	if (ret)
+		goto dma_mask_fail;
+	ret = dma_set_coherent_mask(&pdev->dev, DMA_BIT_MASK(64));
+	if (ret)
+		goto dma_mask_fail;
+	ret = dma_set_max_seg_size(&pdev->dev, UINT_MAX);
+	if (ret)
+		goto dma_mask_fail;
+
+	qdev->bar_0 = pci_ioremap_bar(pdev, 0);
+	if (!qdev->bar_0) {
+		ret = -ENOMEM;
+		goto ioremap_0_fail;
+	}
+
+	qdev->bar_2 = pci_ioremap_bar(pdev, 2);
+	if (!qdev->bar_2) {
+		ret = -ENOMEM;
+		goto ioremap_2_fail;
+	}
+
+	for (i = 0; i < qdev->num_dbc; ++i)
+		qdev->dbc[i].dbc_base = qdev->bar_2 + QAIC_DBC_OFF(i);
+
+	ret = pci_alloc_irq_vectors(pdev, 1, 32, PCI_IRQ_MSI);
+	if (ret < 0)
+		goto alloc_irq_fail;
+
+	if (ret < 32) {
+		pci_err(pdev, "%s: Requested 32 MSIs.  Obtained %d MSIs which is less than the 32 required.\n",
+			__func__, ret);
+		ret = -ENODEV;
+		goto invalid_msi_config;
+	}
+
+	mhi_irq = pci_irq_vector(pdev, 0);
+	if (mhi_irq < 0) {
+		ret = mhi_irq;
+		goto get_mhi_irq_fail;
+	}
+
+	for (i = 0; i < qdev->num_dbc; ++i) {
+		ret = devm_request_threaded_irq(&pdev->dev,
+						pci_irq_vector(pdev, i + 1),
+						dbc_irq_handler,
+						dbc_irq_threaded_fn,
+						IRQF_SHARED,
+						"qaic_dbc",
+						&qdev->dbc[i]);
+		if (ret)
+			goto get_dbc_irq_failed;
+
+		if (poll_datapath) {
+			qdev->dbc[i].irq = pci_irq_vector(pdev, i + 1);
+			disable_irq_nosync(qdev->dbc[i].irq);
+			INIT_WORK(&qdev->dbc[i].poll_work, irq_polling_work);
+		}
+	}
+
+	qdev->mhi_cntl = qaic_mhi_register_controller(pdev, qdev->bar_0, mhi_irq);
+	if (IS_ERR(qdev->mhi_cntl)) {
+		ret = PTR_ERR(qdev->mhi_cntl);
+		goto mhi_register_fail;
+	}
+
+	return 0;
+
+mhi_register_fail:
+get_dbc_irq_failed:
+	for (i = 0; i < qdev->num_dbc; ++i)
+		devm_free_irq(&pdev->dev, pci_irq_vector(pdev, i + 1),
+			      &qdev->dbc[i]);
+get_mhi_irq_fail:
+invalid_msi_config:
+	pci_free_irq_vectors(pdev);
+alloc_irq_fail:
+	iounmap(qdev->bar_2);
+ioremap_2_fail:
+	iounmap(qdev->bar_0);
+ioremap_0_fail:
+dma_mask_fail:
+	pci_clear_master(pdev);
+	pci_release_selected_regions(pdev, qdev->bars);
+request_regions_fail:
+	pci_disable_device(pdev);
+enable_fail:
+	pci_set_drvdata(pdev, NULL);
+bar_fail:
+	for (i = 0; i < qdev->num_dbc; ++i)
+		cleanup_srcu_struct(&qdev->dbc[i].ch_lock);
+	cleanup_srcu_struct(&qdev->dev_lock);
+	destroy_workqueue(qdev->cntl_wq);
+wq_fail:
+	return ret;
+}
+
+static void qaic_pci_remove(struct pci_dev *pdev)
+{
+	struct qaic_device *qdev = pci_get_drvdata(pdev);
+	int i;
+
+	if (!qdev)
+		return;
+
+	qaic_dev_reset_clean_local_state(qdev, false);
+	qaic_mhi_free_controller(qdev->mhi_cntl, link_up);
+	for (i = 0; i < qdev->num_dbc; ++i) {
+		devm_free_irq(&pdev->dev, pci_irq_vector(pdev, i + 1),
+			      &qdev->dbc[i]);
+		cleanup_srcu_struct(&qdev->dbc[i].ch_lock);
+	}
+	destroy_workqueue(qdev->cntl_wq);
+	pci_free_irq_vectors(pdev);
+	iounmap(qdev->bar_0);
+	pci_clear_master(pdev);
+	pci_release_selected_regions(pdev, qdev->bars);
+	pci_disable_device(pdev);
+	pci_set_drvdata(pdev, NULL);
+}
+
+static void qaic_pci_shutdown(struct pci_dev *pdev)
+{
+	/* see qaic_exit for what link_up is doing */
+	link_up = true;
+	qaic_pci_remove(pdev);
+}
+
+static pci_ers_result_t qaic_pci_error_detected(struct pci_dev *pdev,
+						pci_channel_state_t error)
+{
+	return PCI_ERS_RESULT_NEED_RESET;
+}
+
+static void qaic_pci_reset_prepare(struct pci_dev *pdev)
+{
+	struct qaic_device *qdev = pci_get_drvdata(pdev);
+
+	qaic_notify_reset(qdev);
+	qaic_mhi_start_reset(qdev->mhi_cntl);
+	qaic_dev_reset_clean_local_state(qdev, false);
+}
+
+static void qaic_pci_reset_done(struct pci_dev *pdev)
+{
+	struct qaic_device *qdev = pci_get_drvdata(pdev);
+
+	qdev->in_reset = false;
+	qaic_mhi_reset_done(qdev->mhi_cntl);
+}
+
+static const struct mhi_device_id qaic_mhi_match_table[] = {
+	{ .chan = "QAIC_CONTROL", },
+	{},
+};
+
+static struct mhi_driver qaic_mhi_driver = {
+	.id_table = qaic_mhi_match_table,
+	.remove = qaic_mhi_remove,
+	.probe = qaic_mhi_probe,
+	.ul_xfer_cb = qaic_mhi_ul_xfer_cb,
+	.dl_xfer_cb = qaic_mhi_dl_xfer_cb,
+	.driver = {
+		.name = "qaic_mhi",
+	},
+};
+
+static const struct pci_device_id qaic_ids[] = {
+	{ PCI_DEVICE(PCI_VENDOR_ID_QCOM, PCI_DEV_AIC100), },
+	{ }
+};
+MODULE_DEVICE_TABLE(pci, qaic_ids);
+
+static const struct pci_error_handlers qaic_pci_err_handler = {
+	.error_detected = qaic_pci_error_detected,
+	.reset_prepare = qaic_pci_reset_prepare,
+	.reset_done = qaic_pci_reset_done,
+};
+
+static struct pci_driver qaic_pci_driver = {
+	.name = QAIC_NAME,
+	.id_table = qaic_ids,
+	.probe = qaic_pci_probe,
+	.remove = qaic_pci_remove,
+	.shutdown = qaic_pci_shutdown,
+	.err_handler = &qaic_pci_err_handler,
+};
+
+static int __init qaic_init(void)
+{
+	int ret;
+
+	if (datapath_polling)
+		poll_datapath = true;
+
+	ret = mhi_driver_register(&qaic_mhi_driver);
+	if (ret) {
+		pr_debug("qaic: mhi_driver_register failed %d\n", ret);
+		goto free_class;
+	}
+
+	ret = pci_register_driver(&qaic_pci_driver);
+	if (ret) {
+		pr_debug("qaic: pci_register_driver failed %d\n", ret);
+		goto free_mhi;
+	}
+
+	ret = mhi_qaic_ctrl_init();
+	if (ret) {
+		pr_debug("qaic: mhi_qaic_ctrl_init failed %d\n", ret);
+		goto free_pci;
+	}
+
+	return 0;
+
+free_pci:
+	pci_unregister_driver(&qaic_pci_driver);
+free_mhi:
+	mhi_driver_unregister(&qaic_mhi_driver);
+free_class:
+
+	return ret;
+}
+
+static void __exit qaic_exit(void)
+{
+	/*
+	 * We assume that qaic_pci_remove() is called due to a hotplug event
+	 * which would mean that the link is down, and thus
+	 * qaic_mhi_free_controller() should not try to access the device during
+	 * cleanup.
+	 * We call pci_unregister_driver() below, which also triggers
+	 * qaic_pci_remove(), but since this is module exit, we expect the link
+	 * to the device to be up, in which case qaic_mhi_free_controller()
+	 * should try to access the device during cleanup to put the device in
+	 * a sane state.
+	 * For that reason, we set link_up here to let qaic_mhi_free_controller
+	 * know the expected link state.  Since the module is going to be
+	 * removed at the end of this, we don't need to worry about
+	 * reinitializing the link_up state after the cleanup is done.
+	 */
+	link_up = true;
+	mhi_qaic_ctrl_deinit();
+	pci_unregister_driver(&qaic_pci_driver);
+	mhi_driver_unregister(&qaic_mhi_driver);
+}
+
+module_init(qaic_init);
+module_exit(qaic_exit);
+
+MODULE_AUTHOR(QAIC_DESC " Kernel Driver Team");
+MODULE_DESCRIPTION(QAIC_DESC " Accel Driver");
+MODULE_LICENSE("GPL");
diff --git a/include/uapi/drm/qaic_accel.h b/include/uapi/drm/qaic_accel.h
new file mode 100644
index 0000000..d5fa6f5
--- /dev/null
+++ b/include/uapi/drm/qaic_accel.h
@@ -0,0 +1,283 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note
+ *
+ * Copyright (c) 2019-2020, The Linux Foundation. All rights reserved.
+ * Copyright (c) 2021-2023 Qualcomm Innovation Center, Inc. All rights reserved.
+ */
+
+#ifndef QAIC_ACCEL_H_
+#define QAIC_ACCEL_H_
+
+#include <linux/ioctl.h>
+#include <linux/types.h>
+#include "drm.h"
+
+#if defined(__cplusplus)
+extern "C" {
+#endif
+
+#define QAIC_MANAGE_MAX_MSG_LENGTH SZ_4K	/**<
+						  * The length(4K) includes len and
+						  * count fields of qaic_manage_msg
+						  */
+
+enum qaic_sem_flags {
+	SEM_INSYNCFENCE =	0x1,
+	SEM_OUTSYNCFENCE =	0x2,
+};
+
+enum qaic_sem_cmd {
+	SEM_NOP =		0,
+	SEM_INIT =		1,
+	SEM_INC =		2,
+	SEM_DEC =		3,
+	SEM_WAIT_EQUAL =	4,
+	SEM_WAIT_GT_EQ =	5, /**< Greater than or equal */
+	SEM_WAIT_GT_0 =		6, /**< Greater than 0 */
+};
+
+enum qaic_manage_transaction_type {
+	TRANS_UNDEFINED =			0,
+	TRANS_PASSTHROUGH_FROM_USR =		1,
+	TRANS_PASSTHROUGH_TO_USR =		2,
+	TRANS_PASSTHROUGH_FROM_DEV =		3,
+	TRANS_PASSTHROUGH_TO_DEV =		4,
+	TRANS_DMA_XFER_FROM_USR =		5,
+	TRANS_DMA_XFER_TO_DEV =			6,
+	TRANS_ACTIVATE_FROM_USR =		7,
+	TRANS_ACTIVATE_FROM_DEV =		8,
+	TRANS_ACTIVATE_TO_DEV =			9,
+	TRANS_DEACTIVATE_FROM_USR =		10,
+	TRANS_DEACTIVATE_FROM_DEV =		11,
+	TRANS_STATUS_FROM_USR =			12,
+	TRANS_STATUS_TO_USR =			13,
+	TRANS_STATUS_FROM_DEV =			14,
+	TRANS_STATUS_TO_DEV =			15,
+	TRANS_TERMINATE_FROM_DEV =		16,
+	TRANS_TERMINATE_TO_DEV =		17,
+	TRANS_DMA_XFER_CONT =			18,
+	TRANS_VALIDATE_PARTITION_FROM_DEV =	19,
+	TRANS_VALIDATE_PARTITION_TO_DEV =	20,
+	TRANS_MAX =				21
+};
+
+struct qaic_manage_trans_hdr {
+	__u32 type;	/**< in, value from enum qaic_manage_transaction_type */
+	__u32 len;	/**< in, length of this transaction, including the header */
+};
+
+struct qaic_manage_trans_passthrough {
+	struct qaic_manage_trans_hdr hdr;
+	__u8 data[];	/**< in, userspace must encode in little endian */
+};
+
+struct qaic_manage_trans_dma_xfer {
+	struct qaic_manage_trans_hdr hdr;
+	__u32 tag;	/**< in, device specific */
+	__u32 count;	/**< in */
+	__u64 addr;	/**< in, address of the data to be transferred via DMA */
+	__u64 size;	/**< in, length of the data to be transferred via DMA */
+};
+
+struct qaic_manage_trans_activate_to_dev {
+	struct qaic_manage_trans_hdr hdr;
+	__u32 queue_size;	/**<
+				  * in, number of elements in DBC request
+				  * and response queue
+				  */
+	__u32 eventfd;		/**< in */
+	__u32 options;		/**< in, device specific */
+	__u32 pad;		/**< pad must be 0 */
+};
+
+struct qaic_manage_trans_activate_from_dev {
+	struct qaic_manage_trans_hdr hdr;
+	__u32 status;	/**< out, status of activate transaction */
+	__u32 dbc_id;	/**< out, Identifier of assigned DMA Bridge channel */
+	__u64 options;	/**< out */
+};
+
+struct qaic_manage_trans_deactivate {
+	struct qaic_manage_trans_hdr hdr;
+	__u32 dbc_id;	/**< in, Identifier of assigned DMA Bridge channel */
+	__u32 pad;	/**< pad must be 0 */
+};
+
+struct qaic_manage_trans_status_to_dev {
+	struct qaic_manage_trans_hdr hdr;
+};
+
+struct qaic_manage_trans_status_from_dev {
+	struct qaic_manage_trans_hdr hdr;
+	__u16 major;	/**< out, major version of NNC protocol used by device */
+	__u16 minor;	/**< out, minor version of NNC protocol used by device */
+	__u32 status;	/**< out, status of query transaction  */
+	__u64 status_flags;	/**<
+				  * out
+				  * 0    : If set then device has CRC check enabled
+				  * 1:63 : Unused
+				  */
+};
+
+struct qaic_manage_msg {
+	__u32 len;	/**< in, Length of valid data - ie sum of all transactions */
+	__u32 count;	/**< in, Number of transactions in message */
+	__u64 data;	/**< in, Pointer to array of transactions */
+};
+
+struct qaic_create_bo {
+	__u64 size;	/**< in, Size of BO in bytes */
+	__u32 handle;	/**< out, Returned GEM handle for the BO */
+	__u32 pad;	/**< pad must be 0 */
+};
+
+struct qaic_mmap_bo {
+	__u32 handle;	/**< in, Handle for the BO being mapped. */
+	__u32 pad;	/**< pad must be 0 */
+	__u64 offset;	/**<
+			  * out, offset into the drm node to use for
+			  * subsequent mmap call
+			  */
+};
+
+/**
+ * @brief semaphore command
+ */
+struct qaic_sem {
+	__u16 val;	/**< in, Only lower 12 bits are valid */
+	__u8  index;	/**< in, Only lower 5 bits are valid */
+	__u8  presync;	/**< in, 1 if presync operation, 0 if postsync */
+	__u8  cmd;	/**< in, See enum sem_cmd */
+	__u8  flags;	/**< in, See sem_flags for valid bits.  All others must be 0 */
+	__u16 pad;	/**< pad must be 0 */
+};
+
+struct qaic_attach_slice_entry {
+	__u64 size;		/**< in, Size memory to allocate for this BO slice */
+	struct qaic_sem	sem0;	/**< in, Must be zero if not valid */
+	struct qaic_sem	sem1;	/**< in, Must be zero if not valid */
+	struct qaic_sem	sem2;	/**< in, Must be zero if not valid */
+	struct qaic_sem	sem3;	/**< in, Must be zero if not valid */
+	__u64 dev_addr;		/**< in, Address in device to/from which data is copied */
+	__u64 db_addr;		/**< in, Doorbell address */
+	__u32 db_data;		/**< in, Data to write to doorbell */
+	__u32 db_len;		/**<
+				  * in, Doorbell length - 32, 16, or 8 bits.
+				  * 0 means doorbell is inactive
+				  */
+	__u64 offset;		/**< in, Offset from start of buffer */
+};
+
+struct qaic_attach_slice_hdr {
+	__u32 count;	/**< in, Number of slices for this BO */
+	__u32 dbc_id;	/**< in, Associate this BO with this DMA Bridge channel */
+	__u32 handle;	/**< in, Handle of BO to which slicing information is to be attached */
+	__u32 dir;	/**< in, Direction of data: 1 = DMA_TO_DEVICE, 2 = DMA_FROM_DEVICE */
+	__u64 size;	/**<
+			  * in, Total length of BO
+			  * If BO is imported (DMABUF/PRIME) then this size
+			  * should not exceed the size of DMABUF provided.
+			  * If BO is allocated using DRM_IOCTL_QAIC_CREATE_BO
+			  * then this size should be exactly same as the size
+			  * provided during DRM_IOCTL_QAIC_CREATE_BO.
+			  */
+};
+
+struct qaic_attach_slice {
+	struct qaic_attach_slice_hdr hdr;
+	__u64 data;	/**<
+			  * in, Pointer to a buffer which is container of
+			  * struct qaic_attach_slice_entry[]
+			  */
+};
+
+struct qaic_execute_entry {
+	__u32 handle;	/**< in, buffer handle */
+	__u32 dir;	/**< in, 1 = to device, 2 = from device */
+};
+
+struct qaic_partial_execute_entry {
+	__u32 handle;	/**< in, buffer handle */
+	__u32 dir;	/**< in, 1 = to device, 2 = from device */
+	__u64 resize;	/**< in, 0 = no resize */
+};
+
+struct qaic_execute_hdr {
+	__u32 count;	/**< in, number of executes following this header */
+	__u32 dbc_id;	/**< in, Identifier of assigned DMA Bridge channel */
+};
+
+struct qaic_execute {
+	struct qaic_execute_hdr hdr;
+	__u64 data;	/**< in, qaic_execute_entry or qaic_partial_execute_entry container */
+};
+
+struct qaic_wait {
+	__u32 handle;	/**< in, handle to wait on until execute is complete */
+	__u32 timeout;	/**< in, timeout for wait(in ms) */
+	__u32 dbc_id;	/**< in, Identifier of assigned DMA Bridge channel */
+	__u32 pad;	/**< pad must be 0 */
+};
+
+struct qaic_perf_stats_hdr {
+	__u16 count;	/**< in, Total number of BOs requested */
+	__u16 pad;	/**< pad must be 0 */
+	__u32 dbc_id;	/**< in, Identifier of assigned DMA Bridge channel */
+};
+
+struct qaic_perf_stats {
+	struct qaic_perf_stats_hdr hdr;
+	__u64 data;	/**< in, qaic_perf_stats_entry container */
+};
+
+struct qaic_perf_stats_entry {
+	__u32 handle;			/**< in, Handle of the memory request */
+	__u32 queue_level_before;	/**<
+					  * out, Number of elements in queue
+					  * before submission given memory request
+					  */
+	__u32 num_queue_element;	/**<
+					  * out, Number of elements to add in the
+					  * queue for given memory request
+					  */
+	__u32 submit_latency_us;	/**<
+					  * out, Time taken by kernel to submit
+					  * the request to device
+					  */
+	__u32 device_latency_us;	/**<
+					  * out, Time taken by device to execute the
+					  * request. 0 if request is not completed
+					  */
+	__u32 pad;			/**< pad must be 0 */
+};
+
+struct qaic_part_dev {
+	__u32 partition_id;	/**< in, reservation id */
+	__u16 remove;		/**< in, 1 - Remove device 0 - Create device */
+	__u16 pad;		/**< pad must be 0 */
+};
+
+#define DRM_QAIC_MANAGE				0x00
+#define DRM_QAIC_CREATE_BO			0x01
+#define DRM_QAIC_MMAP_BO			0x02
+#define DRM_QAIC_ATTACH_SLICE_BO		0x03
+#define DRM_QAIC_EXECUTE_BO			0x04
+#define DRM_QAIC_PARTIAL_EXECUTE_BO		0x05
+#define DRM_QAIC_WAIT_BO			0x06
+#define DRM_QAIC_PERF_STATS_BO			0x07
+#define DRM_QAIC_PART_DEV			0x08
+
+#define DRM_IOCTL_QAIC_MANAGE			DRM_IOWR(DRM_COMMAND_BASE + DRM_QAIC_MANAGE, struct qaic_manage_msg)
+#define DRM_IOCTL_QAIC_CREATE_BO		DRM_IOWR(DRM_COMMAND_BASE + DRM_QAIC_CREATE_BO,	struct qaic_create_bo)
+#define DRM_IOCTL_QAIC_MMAP_BO			DRM_IOWR(DRM_COMMAND_BASE + DRM_QAIC_MMAP_BO, struct qaic_mmap_bo)
+#define DRM_IOCTL_QAIC_ATTACH_SLICE_BO		DRM_IOW(DRM_COMMAND_BASE + DRM_QAIC_ATTACH_SLICE_BO, struct qaic_attach_slice)
+#define DRM_IOCTL_QAIC_EXECUTE_BO		DRM_IOW(DRM_COMMAND_BASE + DRM_QAIC_EXECUTE_BO,	struct qaic_execute)
+#define DRM_IOCTL_QAIC_PARTIAL_EXECUTE_BO	DRM_IOW(DRM_COMMAND_BASE + DRM_QAIC_PARTIAL_EXECUTE_BO,	struct qaic_execute)
+#define DRM_IOCTL_QAIC_WAIT_BO			DRM_IOW(DRM_COMMAND_BASE + DRM_QAIC_WAIT_BO, struct qaic_wait)
+#define DRM_IOCTL_QAIC_PERF_STATS_BO		DRM_IOWR(DRM_COMMAND_BASE + DRM_QAIC_PERF_STATS_BO, struct qaic_perf_stats)
+#define DRM_IOCTL_QAIC_PART_DEV			DRM_IOWR(DRM_COMMAND_BASE + DRM_QAIC_PART_DEV, struct qaic_part_dev)
+
+#if defined(__cplusplus)
+}
+#endif
+
+#endif /* QAIC_ACCEL_H_ */
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v2 2/8] accel/qaic: Add uapi and core driver file
@ 2023-02-06 15:41   ` Jeffrey Hugo
  0 siblings, 0 replies; 68+ messages in thread
From: Jeffrey Hugo @ 2023-02-06 15:41 UTC (permalink / raw)
  To: ogabbay, airlied, daniel, dri-devel
  Cc: jacek.lawrynowicz, stanislaw.gruszka, quic_pkanojiy, quic_carlv,
	quic_ajitpals, linux-doc, linux-arm-msm, Jeffrey Hugo

Add the QAIC driver uapi file and core driver file that binds to the PCIe
device.  The core driver file also creates the accel device and manages
all the interconnections between the different parts of the driver.

The driver can be built as a module.  If so, it will be called "qaic.ko".

Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com>
---
 drivers/accel/qaic/qaic.h     | 321 ++++++++++++++++++
 drivers/accel/qaic/qaic_drv.c | 771 ++++++++++++++++++++++++++++++++++++++++++
 include/uapi/drm/qaic_accel.h | 283 ++++++++++++++++
 3 files changed, 1375 insertions(+)
 create mode 100644 drivers/accel/qaic/qaic.h
 create mode 100644 drivers/accel/qaic/qaic_drv.c
 create mode 100644 include/uapi/drm/qaic_accel.h

diff --git a/drivers/accel/qaic/qaic.h b/drivers/accel/qaic/qaic.h
new file mode 100644
index 0000000..3f7ea76
--- /dev/null
+++ b/drivers/accel/qaic/qaic.h
@@ -0,0 +1,321 @@
+/* SPDX-License-Identifier: GPL-2.0-only
+ *
+ * Copyright (c) 2019-2021, The Linux Foundation. All rights reserved.
+ * Copyright (c) 2021-2023 Qualcomm Innovation Center, Inc. All rights reserved.
+ */
+
+#ifndef QAICINTERNAL_H_
+#define QAICINTERNAL_H_
+
+#include <linux/interrupt.h>
+#include <linux/kref.h>
+#include <linux/mhi.h>
+#include <linux/mutex.h>
+#include <linux/pci.h>
+#include <linux/spinlock.h>
+#include <linux/srcu.h>
+#include <linux/wait.h>
+#include <linux/workqueue.h>
+#include <drm/drm_device.h>
+#include <drm/drm_gem.h>
+
+#define QAIC_DBC_BASE		0x20000
+#define QAIC_DBC_SIZE		0x1000
+
+#define QAIC_NO_PARTITION	-1
+
+#define QAIC_DBC_OFF(i)		((i) * QAIC_DBC_SIZE + QAIC_DBC_BASE)
+
+#define to_qaic_bo(obj) container_of(obj, struct qaic_bo, base)
+
+extern bool poll_datapath;
+
+struct qaic_user {
+	/* Uniquely identifies this user for the device */
+	int			handle;
+	struct kref		ref_count;
+	/* Char device opened by this user */
+	struct qaic_drm_device	*qddev;
+	/* Node in list of users that opened this drm device */
+	struct list_head	node;
+	/* SRCU used to synchronize this user during cleanup */
+	struct srcu_struct	qddev_lock;
+	atomic_t		chunk_id;
+};
+
+struct dma_bridge_chan {
+	/* Pointer to device struct maintained by driver */
+	struct qaic_device	*qdev;
+	/* ID of this DMA bridge channel(DBC) */
+	unsigned int		id;
+	/* Synchronizes access to xfer_list */
+	spinlock_t		xfer_lock;
+	/* Base address of request queue */
+	void			*req_q_base;
+	/* Base address of response queue */
+	void			*rsp_q_base;
+	/*
+	 * Base bus address of request queue. Response queue bus address can be
+	 * calculated by adding request queue size to this variable
+	 */
+	dma_addr_t		dma_addr;
+	/* Total size of request and response queue in bytes */
+	u32			total_size;
+	/* Capacity of request/response queue */
+	u32			nelem;
+	/* The user that opened this DBC */
+	struct qaic_user	*usr;
+	/*
+	 * Request ID of next memory handle that goes in request queue. One
+	 * memory handle can enqueue more than one request element; all the
+	 * requests that belong to the same memory handle have the same request ID
+	 */
+	u16			next_req_id;
+	/* TRUE: DBC is in use; FALSE: DBC not in use */
+	bool			in_use;
+	/*
+	 * Base address of device registers. Used to read/write request and
+	 * response queue's head and tail pointer of this DBC.
+	 */
+	void __iomem		*dbc_base;
+	/* Head of list where each node is a memory handle queued in request queue */
+	struct list_head	xfer_list;
+	/* Synchronizes DBC readers during cleanup */
+	struct srcu_struct	ch_lock;
+	/*
+	 * When this DBC is released, any thread waiting on this wait queue is
+	 * woken up
+	 */
+	wait_queue_head_t	dbc_release;
+	/* Head of list where each node is a bo associated with this DBC */
+	struct list_head	bo_lists;
+	/* The irq line for this DBC.  Used for polling */
+	unsigned int		irq;
+	/* Polling work item to simulate interrupts */
+	struct work_struct	poll_work;
+};
+
+struct qaic_device {
+	/* Pointer to base PCI device struct of our physical device */
+	struct pci_dev		*pdev;
+	/* Mask of all bars of this device */
+	int			bars;
+	/* Req. ID of request that will be queued next in MHI control device */
+	u32			next_seq_num;
+	/* Base address of bar 0 */
+	void __iomem		*bar_0;
+	/* Base address of bar 2 */
+	void __iomem		*bar_2;
+	/* Controller structure for MHI devices */
+	struct mhi_controller	*mhi_cntl;
+	/* MHI control channel device */
+	struct mhi_device	*cntl_ch;
+	/* List of requests queued in MHI control device */
+	struct list_head	cntl_xfer_list;
+	/* Synchronizes MHI control device transactions and its xfer list */
+	struct mutex		cntl_mutex;
+	/* The drm device representing the actual physical device (QAIC_NO_PARTITION) */
+	struct qaic_drm_device	*base_dev;
+	/* Array of DBC struct of this device */
+	struct dma_bridge_chan	*dbc;
+	/* Work queue for tasks related to MHI control device */
+	struct workqueue_struct	*cntl_wq;
+	/* Synchronizes all the users of device during cleanup */
+	struct srcu_struct	dev_lock;
+	/* TRUE: Device under reset; FALSE: Device not under reset */
+	bool			in_reset;
+	/*
+	 * TRUE: A tx MHI transaction has failed and an rx buffer is still queued
+	 * in the control device. Such a buffer is considered a lost rx buffer.
+	 * FALSE: No rx buffer is lost in the control device
+	 */
+	bool			cntl_lost_buf;
+	/* Maximum number of DBCs supported by this device */
+	u32			num_dbc;
+	/* Head of the list of drm devices created on top of this device */
+	struct list_head	qaic_drm_devices;
+	/* Synchronizes access of qaic_drm_devices list */
+	struct mutex		qaic_drm_devices_mutex;
+	/* Generate the CRC of a control message */
+	u32 (*gen_crc)(void *msg);
+	/* Validate the CRC of a control message */
+	bool (*valid_crc)(void *msg);
+};
+
+struct qaic_drm_device {
+	/* Pointer to the root device struct driven by this driver */
+	struct qaic_device	*qdev;
+	/* Node in list of drm devices maintained by root device */
+	struct list_head	node;
+	/*
+	 * The physical device can be partitioned into a number of logical
+	 * devices, and each logical device is given a partition id. This member
+	 * stores that id. QAIC_NO_PARTITION is a sentinel used to mark that this
+	 * drm device represents the actual physical device
+	 */
+	s32			partition_id;
+	/*
+	 * It points to the user that created this drm device. It is NULL
+	 * when this drm device represents the physical device i.e.
+	 * partition_id is QAIC_NO_PARTITION
+	 */
+	struct qaic_user	*owner;
+	/* Pointer to the drm device struct of this drm device */
+	struct drm_device	*ddev;
+	/* Head in list of users who have opened this drm device */
+	struct list_head	users;
+	/* Synchronizes access to users list */
+	struct mutex		users_mutex;
+};
+
+struct qaic_bo {
+	struct drm_gem_object	base;
+	/* Scatter/gather table for allocated/imported BO */
+	struct sg_table		*sgt;
+	/* BO size requested by user. GEM object might be bigger in size. */
+	u64			size;
+	/* Head in list of slices of this BO */
+	struct list_head	slices;
+	/* Total nents, for all slices of this BO */
+	int			total_slice_nents;
+	/*
+	 * Direction of transfer. It can assume only two values: DMA_TO_DEVICE
+	 * and DMA_FROM_DEVICE.
+	 */
+	int			dir;
+	/* Pointer to the DBC that operates on this BO */
+	struct dma_bridge_chan	*dbc;
+	/* Number of slices that belong to this buffer */
+	u32			nr_slice;
+	/* Number of slices that have been transferred by the DMA engine */
+	u32			nr_slice_xfer_done;
+	/* TRUE = BO is queued for execution, FALSE = BO is not queued */
+	bool			queued;
+	/*
+	 * If TRUE then user has attached slicing information to this BO by
+	 * calling DRM_IOCTL_QAIC_ATTACH_SLICE_BO ioctl.
+	 */
+	bool			sliced;
+	/* Request ID of this BO if it is queued for execution */
+	u16			req_id;
+	/* Handle assigned to this BO */
+	u32			handle;
+	/* Wait on this for completion of DMA transfer of this BO */
+	struct completion	xfer_done;
+	/*
+	 * Node in linked list where head is dbc->xfer_list.
+	 * This linked list contains BOs that are queued for DMA transfer.
+	 */
+	struct list_head	xfer_list;
+	/*
+	 * Node in linked list where head is dbc->bo_lists.
+	 * This linked list contains BOs that are associated with the DBC they
+	 * are linked to.
+	 */
+	struct list_head	bo_list;
+	struct {
+		/*
+		 * Latest timestamp(ns) at which kernel received a request to
+		 * execute this BO
+		 */
+		u64		req_received_ts;
+		/*
+		 * Latest timestamp(ns) at which kernel enqueued requests of
+		 * this BO for execution in DMA queue
+		 */
+		u64		req_submit_ts;
+		/*
+		 * Latest timestamp(ns) at which kernel received a completion
+		 * interrupt for requests of this BO
+		 */
+		u64		req_processed_ts;
+		/*
+		 * Number of elements already enqueued in DMA queue before
+		 * enqueuing requests of this BO
+		 */
+		u32		queue_level_before;
+	} perf_stats;
+
+};
+
+struct bo_slice {
+	/* Mapped pages */
+	struct sg_table		*sgt;
+	/* Number of requests required to queue in DMA queue */
+	int			nents;
+	/* See enum dma_data_direction */
+	int			dir;
+	/* Actual requests that will be copied in DMA queue */
+	struct dbc_req		*reqs;
+	struct kref		ref_count;
+	/* TRUE: No DMA transfer required */
+	bool			no_xfer;
+	/* Pointer to the parent BO handle */
+	struct qaic_bo		*bo;
+	/* Node in list of slices maintained by parent BO */
+	struct list_head	slice;
+	/* Size of this slice in bytes */
+	u64			size;
+	/* Offset of this slice in buffer */
+	u64			offset;
+};
+
+int get_dbc_req_elem_size(void);
+int get_dbc_rsp_elem_size(void);
+int get_cntl_version(struct qaic_device *qdev, struct qaic_user *usr,
+		     u16 *major, u16 *minor);
+int qaic_manage_ioctl(struct drm_device *dev, void *data,
+		      struct drm_file *file_priv);
+int qaic_execute_ioctl(struct qaic_device *qdev, struct qaic_user *usr,
+		       unsigned long arg, bool is_partial);
+int qaic_wait_exec_ioctl(struct qaic_device *qdev, struct qaic_user *usr,
+			 unsigned long arg);
+int qaic_query_ioctl(struct qaic_device *qdev, struct qaic_user *usr,
+		     unsigned long arg);
+int qaic_data_mmap(struct qaic_device *qdev, struct qaic_user *usr,
+		   struct vm_area_struct *vma);
+int qaic_data_get_reservation(struct qaic_device *qdev, struct qaic_user *usr,
+			      void *data, u32 *partition_id,
+			      u16 *remove);
+void qaic_mhi_ul_xfer_cb(struct mhi_device *mhi_dev,
+			 struct mhi_result *mhi_result);
+
+void qaic_mhi_dl_xfer_cb(struct mhi_device *mhi_dev,
+			 struct mhi_result *mhi_result);
+
+int qaic_control_open(struct qaic_device *qdev);
+void qaic_control_close(struct qaic_device *qdev);
+void qaic_release_usr(struct qaic_device *qdev, struct qaic_user *usr);
+
+irqreturn_t dbc_irq_threaded_fn(int irq, void *data);
+irqreturn_t dbc_irq_handler(int irq, void *data);
+int disable_dbc(struct qaic_device *qdev, u32 dbc_id, struct qaic_user *usr);
+void enable_dbc(struct qaic_device *qdev, u32 dbc_id, struct qaic_user *usr);
+void wakeup_dbc(struct qaic_device *qdev, u32 dbc_id);
+void release_dbc(struct qaic_device *qdev, u32 dbc_id);
+
+void wake_all_cntl(struct qaic_device *qdev);
+void qaic_dev_reset_clean_local_state(struct qaic_device *qdev, bool exit_reset);
+
+struct drm_gem_object *qaic_gem_prime_import(struct drm_device *dev,
+					     struct dma_buf *dma_buf);
+
+int qaic_create_bo_ioctl(struct drm_device *dev, void *data,
+			 struct drm_file *file_priv);
+int qaic_mmap_bo_ioctl(struct drm_device *dev, void *data,
+		       struct drm_file *file_priv);
+int qaic_attach_slice_bo_ioctl(struct drm_device *dev, void *data,
+			       struct drm_file *file_priv);
+int qaic_execute_bo_ioctl(struct drm_device *dev, void *data,
+			  struct drm_file *file_priv);
+int qaic_partial_execute_bo_ioctl(struct drm_device *dev, void *data,
+				  struct drm_file *file_priv);
+int qaic_wait_bo_ioctl(struct drm_device *dev, void *data,
+		       struct drm_file *file_priv);
+int qaic_test_print_bo_ioctl(struct drm_device *dev, void *data,
+			     struct drm_file *file_priv);
+int qaic_perf_stats_bo_ioctl(struct drm_device *dev, void *data,
+			     struct drm_file *file_priv);
+void irq_polling_work(struct work_struct *work);
+
+#endif /* QAICINTERNAL_H_ */
diff --git a/drivers/accel/qaic/qaic_drv.c b/drivers/accel/qaic/qaic_drv.c
new file mode 100644
index 0000000..602a784
--- /dev/null
+++ b/drivers/accel/qaic/qaic_drv.c
@@ -0,0 +1,771 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+/* Copyright (c) 2019-2021, The Linux Foundation. All rights reserved. */
+/* Copyright (c) 2021-2023 Qualcomm Innovation Center, Inc. All rights reserved. */
+
+#include <linux/delay.h>
+#include <linux/dma-mapping.h>
+#include <linux/idr.h>
+#include <linux/interrupt.h>
+#include <linux/list.h>
+#include <linux/kref.h>
+#include <linux/mhi.h>
+#include <linux/module.h>
+#include <linux/msi.h>
+#include <linux/mutex.h>
+#include <linux/pci.h>
+#include <linux/sched.h>
+#include <linux/spinlock.h>
+#include <linux/workqueue.h>
+#include <linux/wait.h>
+#include <drm/drm_accel.h>
+#include <drm/drm_drv.h>
+#include <drm/drm_file.h>
+#include <drm/drm_gem.h>
+#include <drm/drm_ioctl.h>
+#include <uapi/drm/qaic_accel.h>
+
+#include "mhi_controller.h"
+#include "mhi_qaic_ctrl.h"
+#include "qaic.h"
+
+MODULE_IMPORT_NS(DMA_BUF);
+
+#define PCI_DEV_AIC100			0xa100
+#define QAIC_NAME			"qaic"
+#define QAIC_DESC			"Qualcomm Cloud AI Accelerators"
+
+static unsigned int datapath_polling;
+module_param(datapath_polling, uint, 0400);
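+MODULE_PARM_DESC(datapath_polling, "Poll the datapath (DBCs) instead of using MSI interrupts");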
+bool poll_datapath;
+
+static u16 cntl_major = 5;
+static u16 cntl_minor; /* 0 */
+static bool link_up;
+static DEFINE_IDA(qaic_usrs);
+
+static int qaic_create_drm_device(struct qaic_device *qdev, s32 partition_id,
+				  struct qaic_user *owner);
+static void qaic_destroy_drm_device(struct qaic_device *qdev, s32 partition_id,
+				    struct qaic_user *owner);
+
+static void free_usr(struct kref *kref)
+{
+	struct qaic_user *usr = container_of(kref, struct qaic_user, ref_count);
+
+	cleanup_srcu_struct(&usr->qddev_lock);
+	ida_free(&qaic_usrs, usr->handle);
+	kfree(usr);
+}
+
+static int qaic_open(struct drm_device *dev, struct drm_file *file)
+{
+	struct qaic_drm_device *qddev = dev->dev_private;
+	struct qaic_device *qdev = qddev->qdev;
+	struct qaic_user *usr;
+	int rcu_id;
+	int ret;
+
+	rcu_id = srcu_read_lock(&qdev->dev_lock);
+	if (qdev->in_reset) {
+		srcu_read_unlock(&qdev->dev_lock, rcu_id);
+		return -ENODEV;
+	}
+
+	usr = kmalloc(sizeof(*usr), GFP_KERNEL);
+	if (!usr) {
+		srcu_read_unlock(&qdev->dev_lock, rcu_id);
+		return -ENOMEM;
+	}
+
+	usr->handle = ida_alloc(&qaic_usrs, GFP_KERNEL);
+	if (usr->handle < 0) {
+		ret = usr->handle;
+		srcu_read_unlock(&qdev->dev_lock, rcu_id);
+		kfree(usr);
+		return ret;
+	}
+	usr->qddev = qddev;
+	atomic_set(&usr->chunk_id, 0);
+	init_srcu_struct(&usr->qddev_lock);
+	kref_init(&usr->ref_count);
+
+	ret = mutex_lock_interruptible(&qddev->users_mutex);
+	if (ret) {
+		cleanup_srcu_struct(&usr->qddev_lock);
+		kfree(usr);
+		srcu_read_unlock(&qdev->dev_lock, rcu_id);
+		return ret;
+	}
+
+	list_add(&usr->node, &qddev->users);
+	mutex_unlock(&qddev->users_mutex);
+
+	file->driver_priv = usr;
+
+	srcu_read_unlock(&qdev->dev_lock, rcu_id);
+	return 0;
+}
+
+static void qaic_postclose(struct drm_device *dev, struct drm_file *file)
+{
+	struct qaic_user *usr = file->driver_priv;
+	struct qaic_drm_device *qddev;
+	struct qaic_device *qdev;
+	int qdev_rcu_id;
+	int usr_rcu_id;
+	int i;
+
+	qddev = usr->qddev;
+	usr_rcu_id = srcu_read_lock(&usr->qddev_lock);
+	if (qddev) {
+		qdev = qddev->qdev;
+		qdev_rcu_id = srcu_read_lock(&qdev->dev_lock);
+		if (!qdev->in_reset) {
+			qaic_release_usr(qdev, usr);
+			for (i = 0; i < qdev->num_dbc; ++i)
+				if (qdev->dbc[i].usr &&
+				    qdev->dbc[i].usr->handle == usr->handle)
+					release_dbc(qdev, i);
+
+			/* Remove child devices */
+			if (qddev->partition_id == QAIC_NO_PARTITION)
+				qaic_destroy_drm_device(qdev, QAIC_NO_PARTITION, usr);
+		}
+		srcu_read_unlock(&qdev->dev_lock, qdev_rcu_id);
+
+		mutex_lock(&qddev->users_mutex);
+		if (!list_empty(&usr->node))
+			list_del_init(&usr->node);
+		mutex_unlock(&qddev->users_mutex);
+	}
+
+	srcu_read_unlock(&usr->qddev_lock, usr_rcu_id);
+	kref_put(&usr->ref_count, free_usr);
+
+	file->driver_priv = NULL;
+}
+
+static int qaic_part_dev_ioctl(struct drm_device *dev, void *data,
+			       struct drm_file *file_priv)
+{
+	struct qaic_device *qdev;
+	struct qaic_user *usr;
+	u32 partition_id;
+	int qdev_rcu_id;
+	int usr_rcu_id;
+	int ret = 0;
+	u16 remove;
+
+	usr = file_priv->driver_priv;
+	if (!usr)
+		return -EINVAL;
+
+	usr_rcu_id = srcu_read_lock(&usr->qddev_lock);
+	if (!usr->qddev) {
+		srcu_read_unlock(&usr->qddev_lock, usr_rcu_id);
+		return -ENODEV;
+	}
+
+	qdev = usr->qddev->qdev;
+	if (!qdev) {
+		srcu_read_unlock(&usr->qddev_lock, usr_rcu_id);
+		return -ENODEV;
+	}
+
+	qdev_rcu_id = srcu_read_lock(&qdev->dev_lock);
+	if (qdev->in_reset) {
+		ret = -ENODEV;
+		goto out;
+	}
+
+	/* This IOCTL is only supported for base devices. */
+	if (usr->qddev->partition_id != QAIC_NO_PARTITION) {
+		ret = -ENOTTY;
+		goto out;
+	}
+
+	ret = qaic_data_get_reservation(qdev, usr, data, &partition_id,
+					&remove);
+	if (ret)
+		goto out;
+
+	if (remove == 1)
+		qaic_destroy_drm_device(qdev, partition_id, usr);
+	else
+		ret = qaic_create_drm_device(qdev, partition_id, usr);
+
+out:
+	srcu_read_unlock(&qdev->dev_lock, qdev_rcu_id);
+	srcu_read_unlock(&usr->qddev_lock, usr_rcu_id);
+
+	return ret;
+}
+
+DEFINE_DRM_ACCEL_FOPS(qaic_accel_fops);
+
+static const struct drm_ioctl_desc qaic_drm_ioctls[] = {
+	DRM_IOCTL_DEF_DRV(QAIC_MANAGE, qaic_manage_ioctl, DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(QAIC_CREATE_BO, qaic_create_bo_ioctl, DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(QAIC_MMAP_BO, qaic_mmap_bo_ioctl, DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(QAIC_ATTACH_SLICE_BO, qaic_attach_slice_bo_ioctl, DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(QAIC_EXECUTE_BO, qaic_execute_bo_ioctl, DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(QAIC_PARTIAL_EXECUTE_BO, qaic_partial_execute_bo_ioctl, DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(QAIC_WAIT_BO, qaic_wait_bo_ioctl, DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(QAIC_PERF_STATS_BO, qaic_perf_stats_bo_ioctl, DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(QAIC_PART_DEV, qaic_part_dev_ioctl, DRM_RENDER_ALLOW),
+};
+
+static const struct drm_driver qaic_accel_driver = {
+	.driver_features	= DRIVER_GEM | DRIVER_COMPUTE_ACCEL,
+
+	.name			= QAIC_NAME,
+	.desc			= QAIC_DESC,
+	.date			= "20190618",
+
+	.fops			= &qaic_accel_fops,
+	.open			= qaic_open,
+	.postclose		= qaic_postclose,
+
+	.ioctls			= qaic_drm_ioctls,
+	.num_ioctls		= ARRAY_SIZE(qaic_drm_ioctls),
+	.prime_fd_to_handle	= drm_gem_prime_fd_to_handle,
+	.gem_prime_import	= qaic_gem_prime_import,
+};
+
+static int qaic_create_drm_device(struct qaic_device *qdev, s32 partition_id,
+				  struct qaic_user *owner)
+{
+	struct qaic_drm_device *qddev;
+	struct drm_device *ddev;
+	struct device *pdev;
+	int ret;
+
+	/*
+	 * Partition id QAIC_NO_PARTITION indicates that the device was created
+	 * at mhi_probe time, and an id > QAIC_NO_PARTITION indicates a partition
+	 * created via IOCTL. So, pdev for the primary device is the PCI device,
+	 * and the parent for a partition device is the primary device.
+	 */
+	if (partition_id == QAIC_NO_PARTITION)
+		pdev = &qdev->pdev->dev;
+	else
+		pdev = qdev->base_dev->ddev->dev;
+
+	qddev = kzalloc(sizeof(*qddev), GFP_KERNEL);
+	if (!qddev) {
+		ret = -ENOMEM;
+		goto qddev_fail;
+	}
+
+	ddev = drm_dev_alloc(&qaic_accel_driver, pdev);
+	if (IS_ERR(ddev)) {
+		ret = PTR_ERR(ddev);
+		goto ddev_fail;
+	}
+
+	ddev->dev_private = qddev;
+	qddev->ddev = ddev;
+
+	if (partition_id == QAIC_NO_PARTITION)
+		qdev->base_dev = qddev;
+	qddev->qdev = qdev;
+	qddev->partition_id = partition_id;
+	qddev->owner = owner;
+	INIT_LIST_HEAD(&qddev->users);
+	mutex_init(&qddev->users_mutex);
+
+	mutex_lock(&qdev->qaic_drm_devices_mutex);
+	list_add(&qddev->node, &qdev->qaic_drm_devices);
+	mutex_unlock(&qdev->qaic_drm_devices_mutex);
+
+	ret = drm_dev_register(ddev, 0);
+	if (ret) {
+		pci_dbg(qdev->pdev, "%s: drm_dev_register failed %d\n", __func__, ret);
+		goto drm_reg_fail;
+	}
+
+	return 0;
+
+drm_reg_fail:
+	mutex_destroy(&qddev->users_mutex);
+	mutex_lock(&qdev->qaic_drm_devices_mutex);
+	list_del(&qddev->node);
+	mutex_unlock(&qdev->qaic_drm_devices_mutex);
+	if (partition_id == QAIC_NO_PARTITION)
+		qdev->base_dev = NULL;
+	drm_dev_put(ddev);
+ddev_fail:
+	kfree(qddev);
+qddev_fail:
+	return ret;
+}
+
+static void qaic_destroy_drm_device(struct qaic_device *qdev, s32 partition_id,
+				    struct qaic_user *owner)
+{
+	struct qaic_drm_device *qddev;
+	struct qaic_drm_device *q;
+	struct qaic_user *usr;
+
+	list_for_each_entry_safe(qddev, q, &qdev->qaic_drm_devices, node) {
+		/*
+		 * Skip devices in case we just want to remove devices
+		 * specific to an owner or partition id.
+		 *
+		 * owner	partition_id	notes
+		 * ----------------------------------
+		 * NULL		NO_PARTITION	delete base + all derived (qdev
+		 *				reset)
+		 * !NULL	NO_PARTITION	delete derived devs created by
+		 *				owner.
+		 * !NULL	>NO_PARTITION	delete derived dev identified by
+		 *				the partition id and created by
+		 *				owner
+		 * NULL		>NO_PARTITION	invalid (no-op)
+		 *
+		 * if partition_id is any value < QAIC_NO_PARTITION this will be
+		 * a no-op.
+		 */
+		if (owner && owner != qddev->owner)
+			continue;
+
+		if (partition_id != QAIC_NO_PARTITION &&
+		    partition_id != qddev->partition_id && !owner)
+			continue;
+
+		/*
+		 * Existing users get unresolvable errors till they close FDs.
+		 * Need to sync carefully with users calling close().  The
+		 * list of users can be modified elsewhere when the lock isn't
+		 * held here, but sync'ing the srcu with the mutex held
+		 * could deadlock.  Grab the mutex so that the list will be
+		 * unmodified.  The user we get will exist as long as the
+		 * lock is held.  Signal that the qddev is going away, and
+		 * grab a reference to the user so they don't go away for
+		 * synchronize_srcu().  Then release the mutex to avoid
+		 * deadlock and make sure the user has observed the signal.
+		 * With the lock released, we cannot maintain any state of the
+		 * user list.
+		 */
+		mutex_lock(&qddev->users_mutex);
+		while (!list_empty(&qddev->users)) {
+			usr = list_first_entry(&qddev->users, struct qaic_user,
+					       node);
+			list_del_init(&usr->node);
+			kref_get(&usr->ref_count);
+			usr->qddev = NULL;
+			mutex_unlock(&qddev->users_mutex);
+			synchronize_srcu(&usr->qddev_lock);
+			kref_put(&usr->ref_count, free_usr);
+			mutex_lock(&qddev->users_mutex);
+		}
+		mutex_unlock(&qddev->users_mutex);
+
+		if (qddev->ddev) {
+			drm_dev_unregister(qddev->ddev);
+			drm_dev_put(qddev->ddev);
+		}
+
+		list_del(&qddev->node);
+		kfree(qddev);
+	}
+}
+
+static int qaic_mhi_probe(struct mhi_device *mhi_dev,
+			  const struct mhi_device_id *id)
+{
+	struct qaic_device *qdev;
+	u16 major, minor;
+	int ret;
+
+	/*
+	 * Invoking this function indicates that the control channel to the
+	 * device is available.  We use that as a signal to indicate that
+	 * the device side firmware has booted.  The device side firmware
+	 * manages the device resources, so we need to communicate with it
+	 * via the control channel in order to utilize the device.  Therefore
+	 * we wait until this signal to create the drm dev that userspace will
+	 * use to control the device, because without the device side firmware,
+	 * userspace can't do anything useful.
+	 */
+
+	qdev = pci_get_drvdata(to_pci_dev(mhi_dev->mhi_cntrl->cntrl_dev));
+
+	qdev->in_reset = false;
+
+	dev_set_drvdata(&mhi_dev->dev, qdev);
+	qdev->cntl_ch = mhi_dev;
+
+	ret = qaic_control_open(qdev);
+	if (ret) {
+		pci_dbg(qdev->pdev, "%s: control_open failed %d\n", __func__, ret);
+		goto err;
+	}
+
+	ret = get_cntl_version(qdev, NULL, &major, &minor);
+	if (ret || major != cntl_major || minor > cntl_minor) {
+		pci_err(qdev->pdev, "%s: Control protocol version (%d.%d) not supported.  Supported version is (%d.%d). Ret: %d\n",
+			__func__, major, minor, cntl_major, cntl_minor, ret);
+		ret = -EINVAL;
+		goto close_control;
+	}
+
+	ret = qaic_create_drm_device(qdev, QAIC_NO_PARTITION, NULL);
+
+	return ret;
+
+close_control:
+	qaic_control_close(qdev);
+err:
+	return ret;
+}
+
+static void qaic_mhi_remove(struct mhi_device *mhi_dev)
+{
+}
+
+static void qaic_notify_reset(struct qaic_device *qdev)
+{
+	int i;
+
+	qdev->in_reset = true;
+	/* wake up any waiters to avoid waiting for timeouts at sync */
+	wake_all_cntl(qdev);
+	for (i = 0; i < qdev->num_dbc; ++i)
+		wakeup_dbc(qdev, i);
+	synchronize_srcu(&qdev->dev_lock);
+}
+
+void qaic_dev_reset_clean_local_state(struct qaic_device *qdev, bool exit_reset)
+{
+	int i;
+
+	qaic_notify_reset(qdev);
+
+	/* remove drmdevs to prevent new users from coming in */
+	if (qdev->base_dev)
+		qaic_destroy_drm_device(qdev, QAIC_NO_PARTITION, NULL);
+
+	/* start tearing things down */
+	for (i = 0; i < qdev->num_dbc; ++i)
+		release_dbc(qdev, i);
+
+	if (exit_reset)
+		qdev->in_reset = false;
+}
+
+static int qaic_pci_probe(struct pci_dev *pdev,
+			  const struct pci_device_id *id)
+{
+	int ret;
+	int i;
+	int mhi_irq;
+	struct qaic_device *qdev;
+
+	qdev = devm_kzalloc(&pdev->dev, sizeof(*qdev), GFP_KERNEL);
+	if (!qdev)
+		return -ENOMEM;
+
+	if (id->device == PCI_DEV_AIC100) {
+		qdev->num_dbc = 16;
+		qdev->dbc = devm_kcalloc(&pdev->dev, qdev->num_dbc, sizeof(*qdev->dbc),
+					 GFP_KERNEL);
+		if (!qdev->dbc)
+			return -ENOMEM;
+	}
+
+	qdev->cntl_wq = alloc_workqueue("qaic_cntl", WQ_UNBOUND, 0);
+	if (!qdev->cntl_wq) {
+		ret = -ENOMEM;
+		goto wq_fail;
+	}
+	pci_set_drvdata(pdev, qdev);
+	qdev->pdev = pdev;
+	mutex_init(&qdev->cntl_mutex);
+	INIT_LIST_HEAD(&qdev->cntl_xfer_list);
+	init_srcu_struct(&qdev->dev_lock);
+	INIT_LIST_HEAD(&qdev->qaic_drm_devices);
+	mutex_init(&qdev->qaic_drm_devices_mutex);
+	for (i = 0; i < qdev->num_dbc; ++i) {
+		spin_lock_init(&qdev->dbc[i].xfer_lock);
+		qdev->dbc[i].qdev = qdev;
+		qdev->dbc[i].id = i;
+		INIT_LIST_HEAD(&qdev->dbc[i].xfer_list);
+		init_srcu_struct(&qdev->dbc[i].ch_lock);
+		init_waitqueue_head(&qdev->dbc[i].dbc_release);
+		INIT_LIST_HEAD(&qdev->dbc[i].bo_lists);
+	}
+
+	qdev->bars = pci_select_bars(pdev, IORESOURCE_MEM);
+
+	/* make sure the device has the expected BARs */
+	if (qdev->bars != (BIT(0) | BIT(2) | BIT(4))) {
+		pci_dbg(pdev, "%s: expected BARs 0, 2, and 4 not found in device.  Found 0x%x\n",
+			__func__, qdev->bars);
+		ret = -EINVAL;
+		goto bar_fail;
+	}
+
+	ret = pci_enable_device(pdev);
+	if (ret)
+		goto enable_fail;
+
+	ret = pci_request_selected_regions(pdev, qdev->bars, "aic100");
+	if (ret)
+		goto request_regions_fail;
+
+	pci_set_master(pdev);
+
+	ret = dma_set_mask(&pdev->dev, DMA_BIT_MASK(64));
+	if (ret)
+		goto dma_mask_fail;
+	ret = dma_set_coherent_mask(&pdev->dev, DMA_BIT_MASK(64));
+	if (ret)
+		goto dma_mask_fail;
+	ret = dma_set_max_seg_size(&pdev->dev, UINT_MAX);
+	if (ret)
+		goto dma_mask_fail;
+
+	qdev->bar_0 = pci_ioremap_bar(pdev, 0);
+	if (!qdev->bar_0) {
+		ret = -ENOMEM;
+		goto ioremap_0_fail;
+	}
+
+	qdev->bar_2 = pci_ioremap_bar(pdev, 2);
+	if (!qdev->bar_2) {
+		ret = -ENOMEM;
+		goto ioremap_2_fail;
+	}
+
+	for (i = 0; i < qdev->num_dbc; ++i)
+		qdev->dbc[i].dbc_base = qdev->bar_2 + QAIC_DBC_OFF(i);
+
+	ret = pci_alloc_irq_vectors(pdev, 1, 32, PCI_IRQ_MSI);
+	if (ret < 0)
+		goto alloc_irq_fail;
+
+	if (ret < 32) {
+		pci_err(pdev, "%s: Requested 32 MSIs.  Obtained %d MSIs which is less than the 32 required.\n",
+			__func__, ret);
+		ret = -ENODEV;
+		goto invalid_msi_config;
+	}
+
+	mhi_irq = pci_irq_vector(pdev, 0);
+	if (mhi_irq < 0) {
+		ret = mhi_irq;
+		goto get_mhi_irq_fail;
+	}
+
+	for (i = 0; i < qdev->num_dbc; ++i) {
+		ret = devm_request_threaded_irq(&pdev->dev,
+						pci_irq_vector(pdev, i + 1),
+						dbc_irq_handler,
+						dbc_irq_threaded_fn,
+						IRQF_SHARED,
+						"qaic_dbc",
+						&qdev->dbc[i]);
+		if (ret)
+			goto get_dbc_irq_failed;
+
+		if (poll_datapath) {
+			qdev->dbc[i].irq = pci_irq_vector(pdev, i + 1);
+			disable_irq_nosync(qdev->dbc[i].irq);
+			INIT_WORK(&qdev->dbc[i].poll_work, irq_polling_work);
+		}
+	}
+
+	qdev->mhi_cntl = qaic_mhi_register_controller(pdev, qdev->bar_0, mhi_irq);
+	if (IS_ERR(qdev->mhi_cntl)) {
+		ret = PTR_ERR(qdev->mhi_cntl);
+		goto mhi_register_fail;
+	}
+
+	return 0;
+
+mhi_register_fail:
+get_dbc_irq_failed:
+	for (i = 0; i < qdev->num_dbc; ++i)
+		devm_free_irq(&pdev->dev, pci_irq_vector(pdev, i + 1),
+			      &qdev->dbc[i]);
+get_mhi_irq_fail:
+invalid_msi_config:
+	pci_free_irq_vectors(pdev);
+alloc_irq_fail:
+	iounmap(qdev->bar_2);
+ioremap_2_fail:
+	iounmap(qdev->bar_0);
+ioremap_0_fail:
+dma_mask_fail:
+	pci_clear_master(pdev);
+	pci_release_selected_regions(pdev, qdev->bars);
+request_regions_fail:
+	pci_disable_device(pdev);
+enable_fail:
+	pci_set_drvdata(pdev, NULL);
+bar_fail:
+	for (i = 0; i < qdev->num_dbc; ++i)
+		cleanup_srcu_struct(&qdev->dbc[i].ch_lock);
+	cleanup_srcu_struct(&qdev->dev_lock);
+	destroy_workqueue(qdev->cntl_wq);
+wq_fail:
+	return ret;
+}
+
+static void qaic_pci_remove(struct pci_dev *pdev)
+{
+	struct qaic_device *qdev = pci_get_drvdata(pdev);
+	int i;
+
+	if (!qdev)
+		return;
+
+	qaic_dev_reset_clean_local_state(qdev, false);
+	qaic_mhi_free_controller(qdev->mhi_cntl, link_up);
+	for (i = 0; i < qdev->num_dbc; ++i) {
+		devm_free_irq(&pdev->dev, pci_irq_vector(pdev, i + 1),
+			      &qdev->dbc[i]);
+		cleanup_srcu_struct(&qdev->dbc[i].ch_lock);
+	}
+	destroy_workqueue(qdev->cntl_wq);
+	pci_free_irq_vectors(pdev);
+	iounmap(qdev->bar_2);
+	iounmap(qdev->bar_0);
+	pci_clear_master(pdev);
+	pci_release_selected_regions(pdev, qdev->bars);
+	pci_disable_device(pdev);
+	pci_set_drvdata(pdev, NULL);
+}
+
+static void qaic_pci_shutdown(struct pci_dev *pdev)
+{
+	/* see qaic_exit for what link_up is doing */
+	link_up = true;
+	qaic_pci_remove(pdev);
+}
+
+static pci_ers_result_t qaic_pci_error_detected(struct pci_dev *pdev,
+						pci_channel_state_t error)
+{
+	return PCI_ERS_RESULT_NEED_RESET;
+}
+
+static void qaic_pci_reset_prepare(struct pci_dev *pdev)
+{
+	struct qaic_device *qdev = pci_get_drvdata(pdev);
+
+	qaic_notify_reset(qdev);
+	qaic_mhi_start_reset(qdev->mhi_cntl);
+	qaic_dev_reset_clean_local_state(qdev, false);
+}
+
+static void qaic_pci_reset_done(struct pci_dev *pdev)
+{
+	struct qaic_device *qdev = pci_get_drvdata(pdev);
+
+	qdev->in_reset = false;
+	qaic_mhi_reset_done(qdev->mhi_cntl);
+}
+
+static const struct mhi_device_id qaic_mhi_match_table[] = {
+	{ .chan = "QAIC_CONTROL", },
+	{},
+};
+
+static struct mhi_driver qaic_mhi_driver = {
+	.id_table = qaic_mhi_match_table,
+	.remove = qaic_mhi_remove,
+	.probe = qaic_mhi_probe,
+	.ul_xfer_cb = qaic_mhi_ul_xfer_cb,
+	.dl_xfer_cb = qaic_mhi_dl_xfer_cb,
+	.driver = {
+		.name = "qaic_mhi",
+	},
+};
+
+static const struct pci_device_id qaic_ids[] = {
+	{ PCI_DEVICE(PCI_VENDOR_ID_QCOM, PCI_DEV_AIC100), },
+	{ }
+};
+MODULE_DEVICE_TABLE(pci, qaic_ids);
+
+static const struct pci_error_handlers qaic_pci_err_handler = {
+	.error_detected = qaic_pci_error_detected,
+	.reset_prepare = qaic_pci_reset_prepare,
+	.reset_done = qaic_pci_reset_done,
+};
+
+static struct pci_driver qaic_pci_driver = {
+	.name = QAIC_NAME,
+	.id_table = qaic_ids,
+	.probe = qaic_pci_probe,
+	.remove = qaic_pci_remove,
+	.shutdown = qaic_pci_shutdown,
+	.err_handler = &qaic_pci_err_handler,
+};
+
+static int __init qaic_init(void)
+{
+	int ret;
+
+	if (datapath_polling)
+		poll_datapath = true;
+
+	ret = mhi_driver_register(&qaic_mhi_driver);
+	if (ret) {
+		pr_debug("qaic: mhi_driver_register failed %d\n", ret);
+		goto free_class;
+	}
+
+	ret = pci_register_driver(&qaic_pci_driver);
+	if (ret) {
+		pr_debug("qaic: pci_register_driver failed %d\n", ret);
+		goto free_mhi;
+	}
+
+	ret = mhi_qaic_ctrl_init();
+	if (ret) {
+		pr_debug("qaic: mhi_qaic_ctrl_init failed %d\n", ret);
+		goto free_pci;
+	}
+
+	return 0;
+
+free_pci:
+	pci_unregister_driver(&qaic_pci_driver);
+free_mhi:
+	mhi_driver_unregister(&qaic_mhi_driver);
+free_class:
+
+	return ret;
+}
+
+static void __exit qaic_exit(void)
+{
+	/*
+	 * We assume that qaic_pci_remove() is called due to a hotplug event
+	 * which would mean that the link is down, and thus
+	 * qaic_mhi_free_controller() should not try to access the device during
+	 * cleanup.
+	 * We call pci_unregister_driver() below, which also triggers
+	 * qaic_pci_remove(), but since this is module exit, we expect the link
+	 * to the device to be up, in which case qaic_mhi_free_controller()
+	 * should try to access the device during cleanup to put the device in
+	 * a sane state.
+	 * For that reason, we set link_up here to let qaic_mhi_free_controller
+	 * know the expected link state.  Since the module is going to be
+	 * removed at the end of this, we don't need to worry about
+	 * reinitializing the link_up state after the cleanup is done.
+	 */
+	link_up = true;
+	mhi_qaic_ctrl_deinit();
+	pci_unregister_driver(&qaic_pci_driver);
+	mhi_driver_unregister(&qaic_mhi_driver);
+}
+
+module_init(qaic_init);
+module_exit(qaic_exit);
+
+MODULE_AUTHOR(QAIC_DESC " Kernel Driver Team");
+MODULE_DESCRIPTION(QAIC_DESC " Accel Driver");
+MODULE_LICENSE("GPL");
diff --git a/include/uapi/drm/qaic_accel.h b/include/uapi/drm/qaic_accel.h
new file mode 100644
index 0000000..d5fa6f5
--- /dev/null
+++ b/include/uapi/drm/qaic_accel.h
@@ -0,0 +1,283 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note
+ *
+ * Copyright (c) 2019-2020, The Linux Foundation. All rights reserved.
+ * Copyright (c) 2021-2023 Qualcomm Innovation Center, Inc. All rights reserved.
+ */
+
+#ifndef QAIC_ACCEL_H_
+#define QAIC_ACCEL_H_
+
+#include <linux/ioctl.h>
+#include <linux/types.h>
+#include "drm.h"
+
+#if defined(__cplusplus)
+extern "C" {
+#endif
+
+#define QAIC_MANAGE_MAX_MSG_LENGTH SZ_4K	/**<
+						  * The length(4K) includes len and
+						  * count fields of qaic_manage_msg
+						  */
+
+enum qaic_sem_flags {
+	SEM_INSYNCFENCE =	0x1,
+	SEM_OUTSYNCFENCE =	0x2,
+};
+
+enum qaic_sem_cmd {
+	SEM_NOP =		0,
+	SEM_INIT =		1,
+	SEM_INC =		2,
+	SEM_DEC =		3,
+	SEM_WAIT_EQUAL =	4,
+	SEM_WAIT_GT_EQ =	5, /**< Greater than or equal */
+	SEM_WAIT_GT_0 =		6, /**< Greater than 0 */
+};
+
+enum qaic_manage_transaction_type {
+	TRANS_UNDEFINED =			0,
+	TRANS_PASSTHROUGH_FROM_USR =		1,
+	TRANS_PASSTHROUGH_TO_USR =		2,
+	TRANS_PASSTHROUGH_FROM_DEV =		3,
+	TRANS_PASSTHROUGH_TO_DEV =		4,
+	TRANS_DMA_XFER_FROM_USR =		5,
+	TRANS_DMA_XFER_TO_DEV =			6,
+	TRANS_ACTIVATE_FROM_USR =		7,
+	TRANS_ACTIVATE_FROM_DEV =		8,
+	TRANS_ACTIVATE_TO_DEV =			9,
+	TRANS_DEACTIVATE_FROM_USR =		10,
+	TRANS_DEACTIVATE_FROM_DEV =		11,
+	TRANS_STATUS_FROM_USR =			12,
+	TRANS_STATUS_TO_USR =			13,
+	TRANS_STATUS_FROM_DEV =			14,
+	TRANS_STATUS_TO_DEV =			15,
+	TRANS_TERMINATE_FROM_DEV =		16,
+	TRANS_TERMINATE_TO_DEV =		17,
+	TRANS_DMA_XFER_CONT =			18,
+	TRANS_VALIDATE_PARTITION_FROM_DEV =	19,
+	TRANS_VALIDATE_PARTITION_TO_DEV =	20,
+	TRANS_MAX =				21
+};
+
+struct qaic_manage_trans_hdr {
+	__u32 type;	/**< in, value from enum qaic_manage_transaction_type */
+	__u32 len;	/**< in, length of this transaction, including the header */
+};
+
+struct qaic_manage_trans_passthrough {
+	struct qaic_manage_trans_hdr hdr;
+	__u8 data[];	/**< in, userspace must encode in little endian */
+};
+
+struct qaic_manage_trans_dma_xfer {
+	struct qaic_manage_trans_hdr hdr;
+	__u32 tag;	/**< in, device specific */
+	__u32 count;	/**< in */
+	__u64 addr;	/**< in, address of the data to be transferred via DMA */
+	__u64 size;	/**< in, length of the data to be transferred via DMA */
+};
+
+struct qaic_manage_trans_activate_to_dev {
+	struct qaic_manage_trans_hdr hdr;
+	__u32 queue_size;	/**<
+				  * in, number of elements in DBC request
+				  * and response queues
+				  */
+	__u32 eventfd;		/**< in */
+	__u32 options;		/**< in, device specific */
+	__u32 pad;		/**< pad must be 0 */
+};
+
+struct qaic_manage_trans_activate_from_dev {
+	struct qaic_manage_trans_hdr hdr;
+	__u32 status;	/**< out, status of activate transaction */
+	__u32 dbc_id;	/**< out, Identifier of assigned DMA Bridge channel */
+	__u64 options;	/**< out */
+};
+
+struct qaic_manage_trans_deactivate {
+	struct qaic_manage_trans_hdr hdr;
+	__u32 dbc_id;	/**< in, Identifier of assigned DMA Bridge channel */
+	__u32 pad;	/**< pad must be 0 */
+};
+
+struct qaic_manage_trans_status_to_dev {
+	struct qaic_manage_trans_hdr hdr;
+};
+
+struct qaic_manage_trans_status_from_dev {
+	struct qaic_manage_trans_hdr hdr;
+	__u16 major;	/**< out, major version of NNC protocol used by device */
+	__u16 minor;	/**< out, minor version of NNC protocol used by device */
+	__u32 status;	/**< out, status of query transaction */
+	__u64 status_flags;	/**<
+				  * out
+				  * 0    : If set then device has CRC check enabled
+				  * 1:63 : Unused
+				  */
+};
+
+struct qaic_manage_msg {
+	__u32 len;	/**< in, Length of valid data - i.e. the sum of all transactions */
+	__u32 count;	/**< in, Number of transactions in message */
+	__u64 data;	/**< in, Pointer to array of transactions */
+};
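+
+/*
+ * Illustrative sketch only, not part of the uAPI contract. A manage message
+ * carrying a single user-originated passthrough transaction could be built
+ * roughly as follows (error handling omitted; "fd", "payload" and
+ * "payload_len" are placeholders):
+ *
+ *   struct qaic_manage_trans_passthrough *trans;
+ *
+ *   trans = calloc(1, sizeof(*trans) + payload_len);
+ *   trans->hdr.type = TRANS_PASSTHROUGH_FROM_USR;
+ *   trans->hdr.len = sizeof(*trans) + payload_len;
+ *   memcpy(trans->data, payload, payload_len);
+ *
+ *   struct qaic_manage_msg msg = {
+ *           .len = trans->hdr.len,
+ *           .count = 1,
+ *           .data = (__u64)(uintptr_t)trans,
+ *   };
+ *   ioctl(fd, DRM_IOCTL_QAIC_MANAGE, &msg);
+ */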
+
+struct qaic_create_bo {
+	__u64 size;	/**< in, Size of BO in bytes */
+	__u32 handle;	/**< out, Returned GEM handle for the BO */
+	__u32 pad;	/**< pad must be 0 */
+};
+
+struct qaic_mmap_bo {
+	__u32 handle;	/**< in, Handle for the BO being mapped. */
+	__u32 pad;	/**< pad must be 0 */
+	__u64 offset;	/**<
+			  * out, offset into the drm node to use for
+			  * subsequent mmap call
+			  */
+};
+
+/**
+ * @brief semaphore command
+ */
+struct qaic_sem {
+	__u16 val;	/**< in, Only lower 12 bits are valid */
+	__u8  index;	/**< in, Only lower 5 bits are valid */
+	__u8  presync;	/**< in, 1 if presync operation, 0 if postsync */
+	__u8  cmd;	/**< in, See enum qaic_sem_cmd */
+	__u8  flags;	/**< in, See enum qaic_sem_flags for valid bits.  All others must be 0 */
+	__u16 pad;	/**< pad must be 0 */
+};
+
+struct qaic_attach_slice_entry {
+	__u64 size;		/**< in, Size of memory to allocate for this BO slice */
+	struct qaic_sem	sem0;	/**< in, Must be zero if not valid */
+	struct qaic_sem	sem1;	/**< in, Must be zero if not valid */
+	struct qaic_sem	sem2;	/**< in, Must be zero if not valid */
+	struct qaic_sem	sem3;	/**< in, Must be zero if not valid */
+	__u64 dev_addr;		/**< in, Address in device to/from which data is copied */
+	__u64 db_addr;		/**< in, Doorbell address */
+	__u32 db_data;		/**< in, Data to write to doorbell */
+	__u32 db_len;		/**<
+				  * in, Doorbell length - 32, 16, or 8 bits.
+				  * 0 means doorbell is inactive
+				  */
+	__u64 offset;		/**< in, Offset from start of buffer */
+};
+
+struct qaic_attach_slice_hdr {
+	__u32 count;	/**< in, Number of slices for this BO */
+	__u32 dbc_id;	/**< in, Associate this BO with this DMA Bridge channel */
+	__u32 handle;	/**< in, Handle of BO to which slicing information is to be attached */
+	__u32 dir;	/**< in, Direction of data: 1 = DMA_TO_DEVICE, 2 = DMA_FROM_DEVICE */
+	__u64 size;	/**<
+			  * in, Total length of BO
+			  * If BO is imported (DMABUF/PRIME) then this size
+			  * should not exceed the size of DMABUF provided.
+			  * If BO is allocated using DRM_IOCTL_QAIC_CREATE_BO
+			  * then this size should be exactly same as the size
+			  * provided during DRM_IOCTL_QAIC_CREATE_BO.
+			  */
+};
+
+struct qaic_attach_slice {
+	struct qaic_attach_slice_hdr hdr;
+	__u64 data;	/**<
+			  * in, Pointer to a buffer which is container of
+			  * struct qaic_attach_slice_entry[]
+			  */
+};
+
+struct qaic_execute_entry {
+	__u32 handle;	/**< in, buffer handle */
+	__u32 dir;	/**< in, 1 = to device, 2 = from device */
+};
+
+struct qaic_partial_execute_entry {
+	__u32 handle;	/**< in, buffer handle */
+	__u32 dir;	/**< in, 1 = to device, 2 = from device */
+	__u64 resize;	/**< in, 0 = no resize */
+};
+
+struct qaic_execute_hdr {
+	__u32 count;	/**< in, number of executes following this header */
+	__u32 dbc_id;	/**< in, Identifier of assigned DMA Bridge channel */
+};
+
+struct qaic_execute {
+	struct qaic_execute_hdr hdr;
+	__u64 data;	/**< in, qaic_execute_entry or qaic_partial_execute_entry container */
+};
+
+struct qaic_wait {
+	__u32 handle;	/**< in, handle to wait on until execute is complete */
+	__u32 timeout;	/**< in, timeout for wait(in ms) */
+	__u32 dbc_id;	/**< in, Identifier of assigned DMA Bridge channel */
+	__u32 pad;	/**< pad must be 0 */
+};
+
+struct qaic_perf_stats_hdr {
+	__u16 count;	/**< in, Total number of BOs requested */
+	__u16 pad;	/**< pad must be 0 */
+	__u32 dbc_id;	/**< in, Identifier of assigned DMA Bridge channel */
+};
+
+struct qaic_perf_stats {
+	struct qaic_perf_stats_hdr hdr;
+	__u64 data;	/**< in, qaic_perf_stats_entry container */
+};
+
+struct qaic_perf_stats_entry {
+	__u32 handle;			/**< in, Handle of the memory request */
+	__u32 queue_level_before;	/**<
+					  * out, Number of elements in queue
+					  * before submission given memory request
+					  */
+	__u32 num_queue_element;	/**<
+					  * out, Number of elements to add in the
+					  * queue for given memory request
+					  */
+	__u32 submit_latency_us;	/**<
+					  * out, Time taken by kernel to submit
+					  * the request to device
+					  */
+	__u32 device_latency_us;	/**<
+					  * out, Time taken by device to execute the
+					  * request. 0 if request is not completed
+					  */
+	__u32 pad;			/**< pad must be 0 */
+};
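+
+/*
+ * Note: the driver is assumed to derive submit_latency_us and
+ * device_latency_us from the per-BO request received/submitted/processed
+ * timestamps it records, i.e. roughly submit = submitted - received and
+ * device = processed - submitted, reported in microseconds.
+ */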
+
+struct qaic_part_dev {
+	__u32 partition_id;	/**< in, reservation id */
+	__u16 remove;		/**< in, 1 - remove device, 0 - create device */
+	__u16 pad;		/**< pad must be 0 */
+};
+
+#define DRM_QAIC_MANAGE				0x00
+#define DRM_QAIC_CREATE_BO			0x01
+#define DRM_QAIC_MMAP_BO			0x02
+#define DRM_QAIC_ATTACH_SLICE_BO		0x03
+#define DRM_QAIC_EXECUTE_BO			0x04
+#define DRM_QAIC_PARTIAL_EXECUTE_BO		0x05
+#define DRM_QAIC_WAIT_BO			0x06
+#define DRM_QAIC_PERF_STATS_BO			0x07
+#define DRM_QAIC_PART_DEV			0x08
+
+#define DRM_IOCTL_QAIC_MANAGE			DRM_IOWR(DRM_COMMAND_BASE + DRM_QAIC_MANAGE, struct qaic_manage_msg)
+#define DRM_IOCTL_QAIC_CREATE_BO		DRM_IOWR(DRM_COMMAND_BASE + DRM_QAIC_CREATE_BO,	struct qaic_create_bo)
+#define DRM_IOCTL_QAIC_MMAP_BO			DRM_IOWR(DRM_COMMAND_BASE + DRM_QAIC_MMAP_BO, struct qaic_mmap_bo)
+#define DRM_IOCTL_QAIC_ATTACH_SLICE_BO		DRM_IOW(DRM_COMMAND_BASE + DRM_QAIC_ATTACH_SLICE_BO, struct qaic_attach_slice)
+#define DRM_IOCTL_QAIC_EXECUTE_BO		DRM_IOW(DRM_COMMAND_BASE + DRM_QAIC_EXECUTE_BO,	struct qaic_execute)
+#define DRM_IOCTL_QAIC_PARTIAL_EXECUTE_BO	DRM_IOW(DRM_COMMAND_BASE + DRM_QAIC_PARTIAL_EXECUTE_BO,	struct qaic_execute)
+#define DRM_IOCTL_QAIC_WAIT_BO			DRM_IOW(DRM_COMMAND_BASE + DRM_QAIC_WAIT_BO, struct qaic_wait)
+#define DRM_IOCTL_QAIC_PERF_STATS_BO		DRM_IOWR(DRM_COMMAND_BASE + DRM_QAIC_PERF_STATS_BO, struct qaic_perf_stats)
+#define DRM_IOCTL_QAIC_PART_DEV			DRM_IOWR(DRM_COMMAND_BASE + DRM_QAIC_PART_DEV, struct qaic_part_dev)
+
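+/*
+ * Illustrative flow only, not part of the uAPI contract (error handling
+ * omitted; "fd", "size" and "buf" are placeholders for an open accel node
+ * file descriptor, the buffer size, and the mapped buffer):
+ *
+ *   struct qaic_create_bo create = { .size = size };
+ *   ioctl(fd, DRM_IOCTL_QAIC_CREATE_BO, &create);
+ *
+ *   struct qaic_mmap_bo map = { .handle = create.handle };
+ *   ioctl(fd, DRM_IOCTL_QAIC_MMAP_BO, &map);
+ *   buf = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, map.offset);
+ *
+ * followed by DRM_IOCTL_QAIC_ATTACH_SLICE_BO, DRM_IOCTL_QAIC_EXECUTE_BO and
+ * DRM_IOCTL_QAIC_WAIT_BO once slicing information for a DMA Bridge channel
+ * is available.
+ */
+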
+#if defined(__cplusplus)
+}
+#endif
+
+#endif /* QAIC_ACCEL_H_ */
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v2 3/8] accel/qaic: Add MHI controller
  2023-02-06 15:41 ` Jeffrey Hugo
@ 2023-02-06 15:41   ` Jeffrey Hugo
  -1 siblings, 0 replies; 68+ messages in thread
From: Jeffrey Hugo @ 2023-02-06 15:41 UTC (permalink / raw)
  To: ogabbay, airlied, daniel, dri-devel
  Cc: Jeffrey Hugo, linux-doc, linux-arm-msm, quic_ajitpals,
	quic_pkanojiy, stanislaw.gruszka, quic_carlv, jacek.lawrynowicz

An AIC100 device contains an MHI interface with a number of different
channels for controlling different aspects of the device.  The MHI
controller works with the MHI bus to enable and drive that interface.

AIC100 uses the BHI protocol in PBL to load SBL.  The MHI controller
expects SBL to be located at /lib/firmware/qcom/aic100/sbl.bin and
expects the MHI bus to manage the process of loading and sending SBL to
the device.

Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com>
---
 drivers/accel/qaic/mhi_controller.c | 576 ++++++++++++++++++++++++++++++++++++
 drivers/accel/qaic/mhi_controller.h |  19 ++
 2 files changed, 595 insertions(+)
 create mode 100644 drivers/accel/qaic/mhi_controller.c
 create mode 100644 drivers/accel/qaic/mhi_controller.h

diff --git a/drivers/accel/qaic/mhi_controller.c b/drivers/accel/qaic/mhi_controller.c
new file mode 100644
index 0000000..984428c
--- /dev/null
+++ b/drivers/accel/qaic/mhi_controller.c
@@ -0,0 +1,576 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+/* Copyright (c) 2019-2021, The Linux Foundation. All rights reserved. */
+/* Copyright (c) 2021-2023 Qualcomm Innovation Center, Inc. All rights reserved. */
+
+#include <linux/delay.h>
+#include <linux/err.h>
+#include <linux/memblock.h>
+#include <linux/mhi.h>
+#include <linux/moduleparam.h>
+#include <linux/pci.h>
+#include <linux/sizes.h>
+
+#include "mhi_controller.h"
+#include "qaic.h"
+
+#define MAX_RESET_TIME_SEC 25
+
+static unsigned int mhi_timeout = 2000; /* 2 sec default */
+module_param(mhi_timeout, uint, 0600);
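+MODULE_PARM_DESC(mhi_timeout, "MHI controller timeout in milliseconds (default 2000)");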
+
+static struct mhi_channel_config aic100_channels[] = {
+	{
+		.name = "QAIC_LOOPBACK",
+		.num = 0,
+		.num_elements = 32,
+		.local_elements = 0,
+		.event_ring = 0,
+		.dir = DMA_TO_DEVICE,
+		.ee_mask = MHI_CH_EE_AMSS,
+		.pollcfg = 0,
+		.doorbell = MHI_DB_BRST_DISABLE,
+		.lpm_notify = false,
+		.offload_channel = false,
+		.doorbell_mode_switch = false,
+		.auto_queue = false,
+		.wake_capable = false,
+	},
+	{
+		.name = "QAIC_LOOPBACK",
+		.num = 1,
+		.num_elements = 32,
+		.local_elements = 0,
+		.event_ring = 0,
+		.dir = DMA_FROM_DEVICE,
+		.ee_mask = MHI_CH_EE_AMSS,
+		.pollcfg = 0,
+		.doorbell = MHI_DB_BRST_DISABLE,
+		.lpm_notify = false,
+		.offload_channel = false,
+		.doorbell_mode_switch = false,
+		.auto_queue = false,
+		.wake_capable = false,
+	},
+	{
+		.name = "QAIC_SAHARA",
+		.num = 2,
+		.num_elements = 32,
+		.local_elements = 0,
+		.event_ring = 0,
+		.dir = DMA_TO_DEVICE,
+		.ee_mask = MHI_CH_EE_SBL,
+		.pollcfg = 0,
+		.doorbell = MHI_DB_BRST_DISABLE,
+		.lpm_notify = false,
+		.offload_channel = false,
+		.doorbell_mode_switch = false,
+		.auto_queue = false,
+		.wake_capable = false,
+	},
+	{
+		.name = "QAIC_SAHARA",
+		.num = 3,
+		.num_elements = 32,
+		.local_elements = 0,
+		.event_ring = 0,
+		.dir = DMA_FROM_DEVICE,
+		.ee_mask = MHI_CH_EE_SBL,
+		.pollcfg = 0,
+		.doorbell = MHI_DB_BRST_DISABLE,
+		.lpm_notify = false,
+		.offload_channel = false,
+		.doorbell_mode_switch = false,
+		.auto_queue = false,
+		.wake_capable = false,
+	},
+	{
+		.name = "QAIC_DIAG",
+		.num = 4,
+		.num_elements = 32,
+		.local_elements = 0,
+		.event_ring = 0,
+		.dir = DMA_TO_DEVICE,
+		.ee_mask = MHI_CH_EE_AMSS,
+		.pollcfg = 0,
+		.doorbell = MHI_DB_BRST_DISABLE,
+		.lpm_notify = false,
+		.offload_channel = false,
+		.doorbell_mode_switch = false,
+		.auto_queue = false,
+		.wake_capable = false,
+	},
+	{
+		.name = "QAIC_DIAG",
+		.num = 5,
+		.num_elements = 32,
+		.local_elements = 0,
+		.event_ring = 0,
+		.dir = DMA_FROM_DEVICE,
+		.ee_mask = MHI_CH_EE_AMSS,
+		.pollcfg = 0,
+		.doorbell = MHI_DB_BRST_DISABLE,
+		.lpm_notify = false,
+		.offload_channel = false,
+		.doorbell_mode_switch = false,
+		.auto_queue = false,
+		.wake_capable = false,
+	},
+	{
+		.name = "QAIC_SSR",
+		.num = 6,
+		.num_elements = 32,
+		.local_elements = 0,
+		.event_ring = 0,
+		.dir = DMA_TO_DEVICE,
+		.ee_mask = MHI_CH_EE_AMSS,
+		.pollcfg = 0,
+		.doorbell = MHI_DB_BRST_DISABLE,
+		.lpm_notify = false,
+		.offload_channel = false,
+		.doorbell_mode_switch = false,
+		.auto_queue = false,
+		.wake_capable = false,
+	},
+	{
+		.name = "QAIC_SSR",
+		.num = 7,
+		.num_elements = 32,
+		.local_elements = 0,
+		.event_ring = 0,
+		.dir = DMA_FROM_DEVICE,
+		.ee_mask = MHI_CH_EE_AMSS,
+		.pollcfg = 0,
+		.doorbell = MHI_DB_BRST_DISABLE,
+		.lpm_notify = false,
+		.offload_channel = false,
+		.doorbell_mode_switch = false,
+		.auto_queue = false,
+		.wake_capable = false,
+	},
+	{
+		.name = "QAIC_QDSS",
+		.num = 8,
+		.num_elements = 32,
+		.local_elements = 0,
+		.event_ring = 0,
+		.dir = DMA_TO_DEVICE,
+		.ee_mask = MHI_CH_EE_AMSS,
+		.pollcfg = 0,
+		.doorbell = MHI_DB_BRST_DISABLE,
+		.lpm_notify = false,
+		.offload_channel = false,
+		.doorbell_mode_switch = false,
+		.auto_queue = false,
+		.wake_capable = false,
+	},
+	{
+		.name = "QAIC_QDSS",
+		.num = 9,
+		.num_elements = 32,
+		.local_elements = 0,
+		.event_ring = 0,
+		.dir = DMA_FROM_DEVICE,
+		.ee_mask = MHI_CH_EE_AMSS,
+		.pollcfg = 0,
+		.doorbell = MHI_DB_BRST_DISABLE,
+		.lpm_notify = false,
+		.offload_channel = false,
+		.doorbell_mode_switch = false,
+		.auto_queue = false,
+		.wake_capable = false,
+	},
+	{
+		.name = "QAIC_CONTROL",
+		.num = 10,
+		.num_elements = 128,
+		.local_elements = 0,
+		.event_ring = 0,
+		.dir = DMA_TO_DEVICE,
+		.ee_mask = MHI_CH_EE_AMSS,
+		.pollcfg = 0,
+		.doorbell = MHI_DB_BRST_DISABLE,
+		.lpm_notify = false,
+		.offload_channel = false,
+		.doorbell_mode_switch = false,
+		.auto_queue = false,
+		.wake_capable = false,
+	},
+	{
+		.name = "QAIC_CONTROL",
+		.num = 11,
+		.num_elements = 128,
+		.local_elements = 0,
+		.event_ring = 0,
+		.dir = DMA_FROM_DEVICE,
+		.ee_mask = MHI_CH_EE_AMSS,
+		.pollcfg = 0,
+		.doorbell = MHI_DB_BRST_DISABLE,
+		.lpm_notify = false,
+		.offload_channel = false,
+		.doorbell_mode_switch = false,
+		.auto_queue = false,
+		.wake_capable = false,
+	},
+	{
+		.name = "QAIC_LOGGING",
+		.num = 12,
+		.num_elements = 32,
+		.local_elements = 0,
+		.event_ring = 0,
+		.dir = DMA_TO_DEVICE,
+		.ee_mask = MHI_CH_EE_SBL,
+		.pollcfg = 0,
+		.doorbell = MHI_DB_BRST_DISABLE,
+		.lpm_notify = false,
+		.offload_channel = false,
+		.doorbell_mode_switch = false,
+		.auto_queue = false,
+		.wake_capable = false,
+	},
+	{
+		.name = "QAIC_LOGGING",
+		.num = 13,
+		.num_elements = 32,
+		.local_elements = 0,
+		.event_ring = 0,
+		.dir = DMA_FROM_DEVICE,
+		.ee_mask = MHI_CH_EE_SBL,
+		.pollcfg = 0,
+		.doorbell = MHI_DB_BRST_DISABLE,
+		.lpm_notify = false,
+		.offload_channel = false,
+		.doorbell_mode_switch = false,
+		.auto_queue = false,
+		.wake_capable = false,
+	},
+	{
+		.name = "QAIC_STATUS",
+		.num = 14,
+		.num_elements = 32,
+		.local_elements = 0,
+		.event_ring = 0,
+		.dir = DMA_TO_DEVICE,
+		.ee_mask = MHI_CH_EE_AMSS,
+		.pollcfg = 0,
+		.doorbell = MHI_DB_BRST_DISABLE,
+		.lpm_notify = false,
+		.offload_channel = false,
+		.doorbell_mode_switch = false,
+		.auto_queue = false,
+		.wake_capable = false,
+	},
+	{
+		.name = "QAIC_STATUS",
+		.num = 15,
+		.num_elements = 32,
+		.local_elements = 0,
+		.event_ring = 0,
+		.dir = DMA_FROM_DEVICE,
+		.ee_mask = MHI_CH_EE_AMSS,
+		.pollcfg = 0,
+		.doorbell = MHI_DB_BRST_DISABLE,
+		.lpm_notify = false,
+		.offload_channel = false,
+		.doorbell_mode_switch = false,
+		.auto_queue = false,
+		.wake_capable = false,
+	},
+	{
+		.name = "QAIC_TELEMETRY",
+		.num = 16,
+		.num_elements = 32,
+		.local_elements = 0,
+		.event_ring = 0,
+		.dir = DMA_TO_DEVICE,
+		.ee_mask = MHI_CH_EE_AMSS,
+		.pollcfg = 0,
+		.doorbell = MHI_DB_BRST_DISABLE,
+		.lpm_notify = false,
+		.offload_channel = false,
+		.doorbell_mode_switch = false,
+		.auto_queue = false,
+		.wake_capable = false,
+	},
+	{
+		.name = "QAIC_TELEMETRY",
+		.num = 17,
+		.num_elements = 32,
+		.local_elements = 0,
+		.event_ring = 0,
+		.dir = DMA_FROM_DEVICE,
+		.ee_mask = MHI_CH_EE_AMSS,
+		.pollcfg = 0,
+		.doorbell = MHI_DB_BRST_DISABLE,
+		.lpm_notify = false,
+		.offload_channel = false,
+		.doorbell_mode_switch = false,
+		.auto_queue = false,
+		.wake_capable = false,
+	},
+	{
+		.name = "QAIC_DEBUG",
+		.num = 18,
+		.num_elements = 32,
+		.local_elements = 0,
+		.event_ring = 0,
+		.dir = DMA_TO_DEVICE,
+		.ee_mask = MHI_CH_EE_AMSS,
+		.pollcfg = 0,
+		.doorbell = MHI_DB_BRST_DISABLE,
+		.lpm_notify = false,
+		.offload_channel = false,
+		.doorbell_mode_switch = false,
+		.auto_queue = false,
+		.wake_capable = false,
+	},
+	{
+		.name = "QAIC_DEBUG",
+		.num = 19,
+		.num_elements = 32,
+		.local_elements = 0,
+		.event_ring = 0,
+		.dir = DMA_FROM_DEVICE,
+		.ee_mask = MHI_CH_EE_AMSS,
+		.pollcfg = 0,
+		.doorbell = MHI_DB_BRST_DISABLE,
+		.lpm_notify = false,
+		.offload_channel = false,
+		.doorbell_mode_switch = false,
+		.auto_queue = false,
+		.wake_capable = false,
+	},
+	{
+		.name = "QAIC_TIMESYNC",
+		.num = 20,
+		.num_elements = 32,
+		.local_elements = 0,
+		.event_ring = 0,
+		.dir = DMA_TO_DEVICE,
+		.ee_mask = MHI_CH_EE_SBL | MHI_CH_EE_AMSS,
+		.pollcfg = 0,
+		.doorbell = MHI_DB_BRST_DISABLE,
+		.lpm_notify = false,
+		.offload_channel = false,
+		.doorbell_mode_switch = false,
+		.auto_queue = false,
+		.wake_capable = false,
+	},
+	{
+		.num = 21,
+		.name = "QAIC_TIMESYNC",
+		.num_elements = 32,
+		.local_elements = 0,
+		.event_ring = 0,
+		.dir = DMA_FROM_DEVICE,
+		.ee_mask = MHI_CH_EE_SBL | MHI_CH_EE_AMSS,
+		.pollcfg = 0,
+		.doorbell = MHI_DB_BRST_DISABLE,
+		.lpm_notify = false,
+		.offload_channel = false,
+		.doorbell_mode_switch = false,
+		.auto_queue = false,
+		.wake_capable = false,
+	},
+};
+
+static struct mhi_event_config aic100_events[] = {
+	{
+		.num_elements = 32,
+		.irq_moderation_ms = 0,
+		.irq = 0,
+		.channel = U32_MAX,
+		.priority = 1,
+		.mode = MHI_DB_BRST_DISABLE,
+		.data_type = MHI_ER_CTRL,
+		.hardware_event = false,
+		.client_managed = false,
+		.offload_channel = false,
+	},
+};
+
+static struct mhi_controller_config aic100_config = {
+	.max_channels = 128,
+	.timeout_ms = 0, /* controlled by mhi_timeout */
+	.buf_len = 0,
+	.num_channels = ARRAY_SIZE(aic100_channels),
+	.ch_cfg = aic100_channels,
+	.num_events = ARRAY_SIZE(aic100_events),
+	.event_cfg = aic100_events,
+	.use_bounce_buf = false,
+	.m2_no_db = false,
+};
+
+static int mhi_read_reg(struct mhi_controller *mhi_cntl, void __iomem *addr, u32 *out)
+{
+	u32 tmp = readl_relaxed(addr);
+
+	if (tmp == U32_MAX)
+		return -EIO;
+
+	*out = tmp;
+
+	return 0;
+}
+
+static void mhi_write_reg(struct mhi_controller *mhi_cntl, void __iomem *addr,
+			  u32 val)
+{
+	writel_relaxed(val, addr);
+}
+
+static int mhi_runtime_get(struct mhi_controller *mhi_cntl)
+{
+	return 0;
+}
+
+static void mhi_runtime_put(struct mhi_controller *mhi_cntl)
+{
+}
+
+static void mhi_status_cb(struct mhi_controller *mhi_cntl, enum mhi_callback reason)
+{
+	struct qaic_device *qdev = pci_get_drvdata(to_pci_dev(mhi_cntl->cntrl_dev));
+
+	/* this event occurs in atomic context */
+	if (reason == MHI_CB_FATAL_ERROR)
+		pci_err(qdev->pdev, "Fatal error received from device.  Attempting to recover\n");
+	/* this event occurs in non-atomic context */
+	if (reason == MHI_CB_SYS_ERROR && !qdev->in_reset)
+		qaic_dev_reset_clean_local_state(qdev, true);
+}
+
+static int mhi_reset_and_async_power_up(struct mhi_controller *mhi_cntl)
+{
+	unsigned int time_sec = 1;
+	int current_ee;
+	int ret;
+
+	/* Reset the device to bring it into the PBL EE */
+	mhi_soc_reset(mhi_cntl);
+
+	/*
+	 * Check the execution environment (EE) once per second until the
+	 * device enters PBL or the reset timeout expires.
+	 */
+	do {
+		msleep(1000);
+		current_ee = mhi_get_exec_env(mhi_cntl);
+	} while (current_ee != MHI_EE_PBL && time_sec++ <= MAX_RESET_TIME_SEC);
+
+	/* If the device is in PBL EE retry power up */
+	if (current_ee == MHI_EE_PBL)
+		ret = mhi_async_power_up(mhi_cntl);
+	else
+		ret = -EIO;
+
+	return ret;
+}
+
+struct mhi_controller *qaic_mhi_register_controller(struct pci_dev *pci_dev,
+						    void __iomem *mhi_bar,
+						    int mhi_irq)
+{
+	struct mhi_controller *mhi_cntl;
+	int ret;
+
+	mhi_cntl = kzalloc(sizeof(*mhi_cntl), GFP_KERNEL);
+	if (!mhi_cntl)
+		return ERR_PTR(-ENOMEM);
+
+	mhi_cntl->cntrl_dev = &pci_dev->dev;
+
+	/*
+	 * Covers the entire possible physical ram region.  Remote side is
+	 * going to calculate a size of this range, so subtract 1 to prevent
+	 * rollover.
+	 */
+	mhi_cntl->iova_start = 0;
+	mhi_cntl->iova_stop = PHYS_ADDR_MAX - 1;
+
+	mhi_cntl->status_cb = mhi_status_cb;
+	mhi_cntl->runtime_get = mhi_runtime_get;
+	mhi_cntl->runtime_put = mhi_runtime_put;
+	mhi_cntl->read_reg = mhi_read_reg;
+	mhi_cntl->write_reg = mhi_write_reg;
+	mhi_cntl->regs = mhi_bar;
+	mhi_cntl->reg_len = SZ_4K;
+	mhi_cntl->nr_irqs = 1;
+	mhi_cntl->irq = kmalloc(sizeof(*mhi_cntl->irq), GFP_KERNEL);
+
+	if (!mhi_cntl->irq) {
+		ret = -ENOMEM;
+		goto fail_irq;
+	}
+
+	mhi_cntl->irq[0] = mhi_irq;
+
+	mhi_cntl->fw_image = "qcom/aic100/sbl.bin";
+
+	/* use latest configured timeout */
+	aic100_config.timeout_ms = mhi_timeout;
+	ret = mhi_register_controller(mhi_cntl, &aic100_config);
+	if (ret) {
+		pci_err(pci_dev, "mhi_register_controller failed %d\n", ret);
+		goto register_fail;
+	}
+
+	ret = mhi_prepare_for_power_up(mhi_cntl);
+	if (ret) {
+		pci_err(pci_dev, "mhi_prepare_for_power_up failed %d\n", ret);
+		goto prepare_power_up_fail;
+	}
+
+	ret = mhi_async_power_up(mhi_cntl);
+	/*
+	 * If -EIO is returned, it is possible that the device is in the SBL EE,
+	 * which is undesired. SoC reset the device and try to power up again.
+	 */
+	if (ret == -EIO && MHI_EE_SBL == mhi_get_exec_env(mhi_cntl)) {
+		pci_err(pci_dev, "Device is not expected to be in the SBL EE. SoC resetting the device to put it in the PBL EE and retrying MHI async power up. Error %d\n",
+			ret);
+		ret = mhi_reset_and_async_power_up(mhi_cntl);
+	}
+
+	if (ret) {
+		pci_err(pci_dev, "mhi_async_power_up failed %d\n", ret);
+		goto power_up_fail;
+	}
+
+	return mhi_cntl;
+
+power_up_fail:
+	mhi_unprepare_after_power_down(mhi_cntl);
+prepare_power_up_fail:
+	mhi_unregister_controller(mhi_cntl);
+register_fail:
+	kfree(mhi_cntl->irq);
+fail_irq:
+	kfree(mhi_cntl);
+	return ERR_PTR(ret);
+}
+
+void qaic_mhi_free_controller(struct mhi_controller *mhi_cntl, bool link_up)
+{
+	mhi_power_down(mhi_cntl, link_up);
+	mhi_unprepare_after_power_down(mhi_cntl);
+	mhi_unregister_controller(mhi_cntl);
+	kfree(mhi_cntl->irq);
+	kfree(mhi_cntl);
+}
+
+void qaic_mhi_start_reset(struct mhi_controller *mhi_cntl)
+{
+	mhi_power_down(mhi_cntl, true);
+}
+
+void qaic_mhi_reset_done(struct mhi_controller *mhi_cntl)
+{
+	struct pci_dev *pci_dev = container_of(mhi_cntl->cntrl_dev,
+					       struct pci_dev, dev);
+	int ret;
+
+	ret = mhi_async_power_up(mhi_cntl);
+	if (ret)
+		pci_err(pci_dev, "mhi_async_power_up failed after reset %d\n", ret);
+}
diff --git a/drivers/accel/qaic/mhi_controller.h b/drivers/accel/qaic/mhi_controller.h
new file mode 100644
index 0000000..12197bb
--- /dev/null
+++ b/drivers/accel/qaic/mhi_controller.h
@@ -0,0 +1,19 @@
+/* SPDX-License-Identifier: GPL-2.0-only
+ *
+ * Copyright (c) 2019-2020, The Linux Foundation. All rights reserved.
+ * Copyright (c) 2023 Qualcomm Innovation Center, Inc. All rights reserved.
+ */
+
+#ifndef MHICONTROLLERQAIC_H_
+#define MHICONTROLLERQAIC_H_
+
+struct mhi_controller *qaic_mhi_register_controller(struct pci_dev *pci_dev,
+						    void __iomem *mhi_bar,
+						    int mhi_irq);
+
+void qaic_mhi_free_controller(struct mhi_controller *mhi_cntl, bool link_up);
+
+void qaic_mhi_start_reset(struct mhi_controller *mhi_cntl);
+void qaic_mhi_reset_done(struct mhi_controller *mhi_cntl);
+
+#endif /* MHICONTROLLERQAIC_H_ */
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 68+ messages in thread

+		.wake_capable = false,
+	},
+	{
+		.name = "QAIC_CONTROL",
+		.num = 10,
+		.num_elements = 128,
+		.local_elements = 0,
+		.event_ring = 0,
+		.dir = DMA_TO_DEVICE,
+		.ee_mask = MHI_CH_EE_AMSS,
+		.pollcfg = 0,
+		.doorbell = MHI_DB_BRST_DISABLE,
+		.lpm_notify = false,
+		.offload_channel = false,
+		.doorbell_mode_switch = false,
+		.auto_queue = false,
+		.wake_capable = false,
+	},
+	{
+		.name = "QAIC_CONTROL",
+		.num = 11,
+		.num_elements = 128,
+		.local_elements = 0,
+		.event_ring = 0,
+		.dir = DMA_FROM_DEVICE,
+		.ee_mask = MHI_CH_EE_AMSS,
+		.pollcfg = 0,
+		.doorbell = MHI_DB_BRST_DISABLE,
+		.lpm_notify = false,
+		.offload_channel = false,
+		.doorbell_mode_switch = false,
+		.auto_queue = false,
+		.wake_capable = false,
+	},
+	{
+		.name = "QAIC_LOGGING",
+		.num = 12,
+		.num_elements = 32,
+		.local_elements = 0,
+		.event_ring = 0,
+		.dir = DMA_TO_DEVICE,
+		.ee_mask = MHI_CH_EE_SBL,
+		.pollcfg = 0,
+		.doorbell = MHI_DB_BRST_DISABLE,
+		.lpm_notify = false,
+		.offload_channel = false,
+		.doorbell_mode_switch = false,
+		.auto_queue = false,
+		.wake_capable = false,
+	},
+	{
+		.name = "QAIC_LOGGING",
+		.num = 13,
+		.num_elements = 32,
+		.local_elements = 0,
+		.event_ring = 0,
+		.dir = DMA_FROM_DEVICE,
+		.ee_mask = MHI_CH_EE_SBL,
+		.pollcfg = 0,
+		.doorbell = MHI_DB_BRST_DISABLE,
+		.lpm_notify = false,
+		.offload_channel = false,
+		.doorbell_mode_switch = false,
+		.auto_queue = false,
+		.wake_capable = false,
+	},
+	{
+		.name = "QAIC_STATUS",
+		.num = 14,
+		.num_elements = 32,
+		.local_elements = 0,
+		.event_ring = 0,
+		.dir = DMA_TO_DEVICE,
+		.ee_mask = MHI_CH_EE_AMSS,
+		.pollcfg = 0,
+		.doorbell = MHI_DB_BRST_DISABLE,
+		.lpm_notify = false,
+		.offload_channel = false,
+		.doorbell_mode_switch = false,
+		.auto_queue = false,
+		.wake_capable = false,
+	},
+	{
+		.name = "QAIC_STATUS",
+		.num = 15,
+		.num_elements = 32,
+		.local_elements = 0,
+		.event_ring = 0,
+		.dir = DMA_FROM_DEVICE,
+		.ee_mask = MHI_CH_EE_AMSS,
+		.pollcfg = 0,
+		.doorbell = MHI_DB_BRST_DISABLE,
+		.lpm_notify = false,
+		.offload_channel = false,
+		.doorbell_mode_switch = false,
+		.auto_queue = false,
+		.wake_capable = false,
+	},
+	{
+		.name = "QAIC_TELEMETRY",
+		.num = 16,
+		.num_elements = 32,
+		.local_elements = 0,
+		.event_ring = 0,
+		.dir = DMA_TO_DEVICE,
+		.ee_mask = MHI_CH_EE_AMSS,
+		.pollcfg = 0,
+		.doorbell = MHI_DB_BRST_DISABLE,
+		.lpm_notify = false,
+		.offload_channel = false,
+		.doorbell_mode_switch = false,
+		.auto_queue = false,
+		.wake_capable = false,
+	},
+	{
+		.name = "QAIC_TELEMETRY",
+		.num = 17,
+		.num_elements = 32,
+		.local_elements = 0,
+		.event_ring = 0,
+		.dir = DMA_FROM_DEVICE,
+		.ee_mask = MHI_CH_EE_AMSS,
+		.pollcfg = 0,
+		.doorbell = MHI_DB_BRST_DISABLE,
+		.lpm_notify = false,
+		.offload_channel = false,
+		.doorbell_mode_switch = false,
+		.auto_queue = false,
+		.wake_capable = false,
+	},
+	{
+		.name = "QAIC_DEBUG",
+		.num = 18,
+		.num_elements = 32,
+		.local_elements = 0,
+		.event_ring = 0,
+		.dir = DMA_TO_DEVICE,
+		.ee_mask = MHI_CH_EE_AMSS,
+		.pollcfg = 0,
+		.doorbell = MHI_DB_BRST_DISABLE,
+		.lpm_notify = false,
+		.offload_channel = false,
+		.doorbell_mode_switch = false,
+		.auto_queue = false,
+		.wake_capable = false,
+	},
+	{
+		.name = "QAIC_DEBUG",
+		.num = 19,
+		.num_elements = 32,
+		.local_elements = 0,
+		.event_ring = 0,
+		.dir = DMA_FROM_DEVICE,
+		.ee_mask = MHI_CH_EE_AMSS,
+		.pollcfg = 0,
+		.doorbell = MHI_DB_BRST_DISABLE,
+		.lpm_notify = false,
+		.offload_channel = false,
+		.doorbell_mode_switch = false,
+		.auto_queue = false,
+		.wake_capable = false,
+	},
+	{
+		.name = "QAIC_TIMESYNC",
+		.num = 20,
+		.num_elements = 32,
+		.local_elements = 0,
+		.event_ring = 0,
+		.dir = DMA_TO_DEVICE,
+		.ee_mask = MHI_CH_EE_SBL | MHI_CH_EE_AMSS,
+		.pollcfg = 0,
+		.doorbell = MHI_DB_BRST_DISABLE,
+		.lpm_notify = false,
+		.offload_channel = false,
+		.doorbell_mode_switch = false,
+		.auto_queue = false,
+		.wake_capable = false,
+	},
+	{
+		.num = 21,
+		.name = "QAIC_TIMESYNC",
+		.num_elements = 32,
+		.local_elements = 0,
+		.event_ring = 0,
+		.dir = DMA_FROM_DEVICE,
+		.ee_mask = MHI_CH_EE_SBL | MHI_CH_EE_AMSS,
+		.pollcfg = 0,
+		.doorbell = MHI_DB_BRST_DISABLE,
+		.lpm_notify = false,
+		.offload_channel = false,
+		.doorbell_mode_switch = false,
+		.auto_queue = false,
+		.wake_capable = false,
+	},
+};
+
+static struct mhi_event_config aic100_events[] = {
+	{
+		.num_elements = 32,
+		.irq_moderation_ms = 0,
+		.irq = 0,
+		.channel = U32_MAX,
+		.priority = 1,
+		.mode = MHI_DB_BRST_DISABLE,
+		.data_type = MHI_ER_CTRL,
+		.hardware_event = false,
+		.client_managed = false,
+		.offload_channel = false,
+	},
+};
+
+static struct mhi_controller_config aic100_config = {
+	.max_channels = 128,
+	.timeout_ms = 0, /* controlled by mhi_timeout */
+	.buf_len = 0,
+	.num_channels = ARRAY_SIZE(aic100_channels),
+	.ch_cfg = aic100_channels,
+	.num_events = ARRAY_SIZE(aic100_events),
+	.event_cfg = aic100_events,
+	.use_bounce_buf = false,
+	.m2_no_db = false,
+};
+
+static int mhi_read_reg(struct mhi_controller *mhi_cntl, void __iomem *addr, u32 *out)
+{
+	u32 tmp = readl_relaxed(addr);
+
+	if (tmp == U32_MAX)
+		return -EIO;
+
+	*out = tmp;
+
+	return 0;
+}
+
+static void mhi_write_reg(struct mhi_controller *mhi_cntl, void __iomem *addr,
+			  u32 val)
+{
+	writel_relaxed(val, addr);
+}
+
+static int mhi_runtime_get(struct mhi_controller *mhi_cntl)
+{
+	return 0;
+}
+
+static void mhi_runtime_put(struct mhi_controller *mhi_cntl)
+{
+}
+
+static void mhi_status_cb(struct mhi_controller *mhi_cntl, enum mhi_callback reason)
+{
+	struct qaic_device *qdev = pci_get_drvdata(to_pci_dev(mhi_cntl->cntrl_dev));
+
+	/* this event occurs in atomic context */
+	if (reason == MHI_CB_FATAL_ERROR)
+		pci_err(qdev->pdev, "Fatal error received from device.  Attempting to recover\n");
+	/* this event occurs in non-atomic context */
+	if (reason == MHI_CB_SYS_ERROR && !qdev->in_reset)
+		qaic_dev_reset_clean_local_state(qdev, true);
+}
+
+static int mhi_reset_and_async_power_up(struct mhi_controller *mhi_cntl)
+{
+	int time_sec = 1;
+	int current_ee;
+	int ret;
+
+	/* Reset the device to bring it into PBL EE */
+	mhi_soc_reset(mhi_cntl);
+
+	/*
+	 * Check the execution environment (EE) at 1 second intervals until
+	 * the device enters PBL EE or the reset timeout expires.
+	 */
+	do {
+		msleep(1000);
+		current_ee = mhi_get_exec_env(mhi_cntl);
+	} while (current_ee != MHI_EE_PBL && time_sec++ <= MAX_RESET_TIME_SEC);
+
+	/* If the device is in PBL EE, retry power up */
+	if (current_ee == MHI_EE_PBL)
+		ret = mhi_async_power_up(mhi_cntl);
+	else
+		ret = -EIO;
+
+	return ret;
+}
+
+struct mhi_controller *qaic_mhi_register_controller(struct pci_dev *pci_dev,
+						    void __iomem *mhi_bar,
+						    int mhi_irq)
+{
+	struct mhi_controller *mhi_cntl;
+	int ret;
+
+	mhi_cntl = kzalloc(sizeof(*mhi_cntl), GFP_KERNEL);
+	if (!mhi_cntl)
+		return ERR_PTR(-ENOMEM);
+
+	mhi_cntl->cntrl_dev = &pci_dev->dev;
+
+	/*
+	 * Covers the entire possible physical RAM region.  The remote side is
+	 * going to calculate the size of this range, so subtract 1 to prevent
+	 * rollover.
+	 */
+	mhi_cntl->iova_start = 0;
+	mhi_cntl->iova_stop = PHYS_ADDR_MAX - 1;
+
+	mhi_cntl->status_cb = mhi_status_cb;
+	mhi_cntl->runtime_get = mhi_runtime_get;
+	mhi_cntl->runtime_put = mhi_runtime_put;
+	mhi_cntl->read_reg = mhi_read_reg;
+	mhi_cntl->write_reg = mhi_write_reg;
+	mhi_cntl->regs = mhi_bar;
+	mhi_cntl->reg_len = SZ_4K;
+	mhi_cntl->nr_irqs = 1;
+	mhi_cntl->irq = kmalloc(sizeof(*mhi_cntl->irq), GFP_KERNEL);
+
+	if (!mhi_cntl->irq) {
+		ret = -ENOMEM;
+		goto fail_irq;
+	}
+
+	mhi_cntl->irq[0] = mhi_irq;
+
+	mhi_cntl->fw_image = "qcom/aic100/sbl.bin";
+
+	/* use latest configured timeout */
+	aic100_config.timeout_ms = mhi_timeout;
+	ret = mhi_register_controller(mhi_cntl, &aic100_config);
+	if (ret) {
+		pci_err(pci_dev, "mhi_register_controller failed %d\n", ret);
+		goto register_fail;
+	}
+
+	ret = mhi_prepare_for_power_up(mhi_cntl);
+	if (ret) {
+		pci_err(pci_dev, "mhi_prepare_for_power_up failed %d\n", ret);
+		goto prepare_power_up_fail;
+	}
+
+	ret = mhi_async_power_up(mhi_cntl);
+	/*
+	 * If -EIO is returned, the device may be in SBL EE, which is
+	 * undesired.  SOC reset the device and retry the power up.
+	 */
+	if (ret == -EIO && MHI_EE_SBL == mhi_get_exec_env(mhi_cntl)) {
+		pci_err(pci_dev, "Device is not expected to be SBL EE. SOC resetting the device to put it in PBL EE and again trying mhi async power up. Error %d\n",
+			ret);
+		ret = mhi_reset_and_async_power_up(mhi_cntl);
+	}
+
+	if (ret) {
+		pci_err(pci_dev, "mhi_async_power_up failed %d\n", ret);
+		goto power_up_fail;
+	}
+
+	return mhi_cntl;
+
+power_up_fail:
+	mhi_unprepare_after_power_down(mhi_cntl);
+prepare_power_up_fail:
+	mhi_unregister_controller(mhi_cntl);
+register_fail:
+	kfree(mhi_cntl->irq);
+fail_irq:
+	kfree(mhi_cntl);
+	return ERR_PTR(ret);
+}
+
+void qaic_mhi_free_controller(struct mhi_controller *mhi_cntl, bool link_up)
+{
+	mhi_power_down(mhi_cntl, link_up);
+	mhi_unprepare_after_power_down(mhi_cntl);
+	mhi_unregister_controller(mhi_cntl);
+	kfree(mhi_cntl->irq);
+	kfree(mhi_cntl);
+}
+
+void qaic_mhi_start_reset(struct mhi_controller *mhi_cntl)
+{
+	mhi_power_down(mhi_cntl, true);
+}
+
+void qaic_mhi_reset_done(struct mhi_controller *mhi_cntl)
+{
+	struct pci_dev *pci_dev = container_of(mhi_cntl->cntrl_dev,
+					       struct pci_dev, dev);
+	int ret;
+
+	ret = mhi_async_power_up(mhi_cntl);
+	if (ret)
+		pci_err(pci_dev, "mhi_async_power_up failed after reset %d\n", ret);
+}
diff --git a/drivers/accel/qaic/mhi_controller.h b/drivers/accel/qaic/mhi_controller.h
new file mode 100644
index 0000000..12197bb
--- /dev/null
+++ b/drivers/accel/qaic/mhi_controller.h
@@ -0,0 +1,19 @@
+/* SPDX-License-Identifier: GPL-2.0-only
+ *
+ * Copyright (c) 2019-2020, The Linux Foundation. All rights reserved.
+ * Copyright (c) 2023 Qualcomm Innovation Center, Inc. All rights reserved.
+ */
+
+#ifndef MHICONTROLLERQAIC_H_
+#define MHICONTROLLERQAIC_H_
+
+struct mhi_controller *qaic_mhi_register_controller(struct pci_dev *pci_dev,
+						    void __iomem *mhi_bar,
+						    int mhi_irq);
+
+void qaic_mhi_free_controller(struct mhi_controller *mhi_cntl, bool link_up);
+
+void qaic_mhi_start_reset(struct mhi_controller *mhi_cntl);
+void qaic_mhi_reset_done(struct mhi_controller *mhi_cntl);
+
+#endif /* MHICONTROLLERQAIC_H_ */
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v2 4/8] accel/qaic: Add control path
  2023-02-06 15:41 ` Jeffrey Hugo
@ 2023-02-06 15:41   ` Jeffrey Hugo
  -1 siblings, 0 replies; 68+ messages in thread
From: Jeffrey Hugo @ 2023-02-06 15:41 UTC (permalink / raw)
  To: ogabbay, airlied, daniel, dri-devel
  Cc: Jeffrey Hugo, linux-doc, linux-arm-msm, quic_ajitpals,
	quic_pkanojiy, stanislaw.gruszka, quic_carlv, jacek.lawrynowicz

Add the control path component that talks to the management processor (QSM)
to load workloads onto the AIC100 device.  This implements the KMD portion
of the NNC protocol over the QAIC_CONTROL MHI channel and the
DRM_IOCTL_QAIC_MANAGE IOCTL to userspace.  With this functionality, QAIC
clients are able to load, run, and clean up their workloads on the device
but not interact with the workloads (run inferences).
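
For context, a hedged userspace sketch of a minimal, single-transaction
manage call (a status query mirroring what get_cntl_version() does in the
kernel).  This is not part of the patch; the struct and macro names are
taken from the uapi header in this series, and the device node open()
handling and exact field layout should be checked against
include/uapi/drm/qaic_accel.h.

#include <stdint.h>
#include <sys/ioctl.h>
#include <drm/qaic_accel.h>

/* fd is an open QAIC accel device node */
static int query_status(int fd)
{
	/* The response is larger than the query, so size the buffer for it */
	struct qaic_manage_trans_status_from_dev buf = { 0 };
	struct qaic_manage_trans_status_to_dev *query = (void *)&buf;
	struct qaic_manage_msg msg = { 0 };

	query->hdr.type = TRANS_STATUS_FROM_USR;
	query->hdr.len = sizeof(*query);
	msg.count = 1;
	msg.len = sizeof(*query);
	msg.data = (uintptr_t)&buf;

	/* On success, msg.data is overwritten with the FROM_DEV response */
	return ioctl(fd, DRM_IOCTL_QAIC_MANAGE, &msg);
}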

Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com>
---
 drivers/accel/qaic/qaic_control.c | 1656 +++++++++++++++++++++++++++++++++++++
 1 file changed, 1656 insertions(+)
 create mode 100644 drivers/accel/qaic/qaic_control.c

diff --git a/drivers/accel/qaic/qaic_control.c b/drivers/accel/qaic/qaic_control.c
new file mode 100644
index 0000000..9f5d5e2
--- /dev/null
+++ b/drivers/accel/qaic/qaic_control.c
@@ -0,0 +1,1656 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+/* Copyright (c) 2019-2021, The Linux Foundation. All rights reserved. */
+/* Copyright (c) 2021-2023 Qualcomm Innovation Center, Inc. All rights reserved. */
+
+#include <asm/byteorder.h>
+#include <linux/completion.h>
+#include <linux/crc32.h>
+#include <linux/delay.h>
+#include <linux/dma-mapping.h>
+#include <linux/kref.h>
+#include <linux/list.h>
+#include <linux/mhi.h>
+#include <linux/mm.h>
+#include <linux/moduleparam.h>
+#include <linux/mutex.h>
+#include <linux/pci.h>
+#include <linux/scatterlist.h>
+#include <linux/types.h>
+#include <linux/uaccess.h>
+#include <linux/workqueue.h>
+#include <linux/wait.h>
+#include <drm/drm_device.h>
+#include <drm/drm_file.h>
+#include <uapi/drm/qaic_accel.h>
+
+#include "qaic.h"
+
+#define MANAGE_MAGIC_NUMBER	 ((__force __le32)0x43494151) /* "QAIC" in little endian */
+#define QAIC_DBC_Q_GAP		   0x100
+#define QAIC_DBC_Q_BUF_ALIGN	   SZ_4K
+#define QAIC_MANAGE_EXT_MSG_LENGTH SZ_64K /* Max DMA message length */
+#define QAIC_WRAPPER_MAX_SIZE      SZ_4K
+#define QAIC_MHI_RETRY_WAIT_MS	   100
+#define QAIC_MHI_RETRY_MAX	   20
+
+static unsigned int control_resp_timeout = 60; /* 60 sec default */
+module_param(control_resp_timeout, uint, 0600);
+
+struct manage_msg {
+	u32 len;
+	u32 count;
+	u8 data[];
+};
+
+/*
+ * wire encoding structures for the manage protocol.
+ * All fields are little endian on the wire
+ */
+struct _msg_hdr {
+	__le32 crc32; /* crc of everything following this field in the message */
+	__le32 magic_number;
+	__le32 sequence_number;
+	__le32 len; /* length of this message */
+	__le32 count; /* number of transactions in this message */
+	__le32 handle; /* unique id to track the resources consumed */
+	__le32 partition_id; /* partition id for the request (signed) */
+	__le32 padding; /* must be 0 */
+} __packed;
+
+struct _msg {
+	struct _msg_hdr hdr;
+	u8 data[];
+} __packed;
+
+struct _trans_hdr {
+	__le32 type;
+	__le32 len;
+} __packed;
+
+/* Each message sent from driver to device is organized as a list of wrapper_msg */
+struct wrapper_msg {
+	struct list_head list;
+	struct kref ref_count;
+	u32 len; /* length of data to transfer */
+	struct wrapper_list *head;
+	union {
+		struct _msg msg;
+		struct _trans_hdr trans;
+	};
+};
+
+struct wrapper_list {
+	struct list_head list;
+	spinlock_t lock;
+};
+
+struct _trans_passthrough {
+	struct _trans_hdr hdr;
+	u8 data[];
+} __packed;
+
+struct _addr_size_pair {
+	__le64 addr;
+	__le64 size;
+} __packed;
+
+struct _trans_dma_xfer {
+	struct _trans_hdr hdr;
+	__le32 tag;
+	__le32 count;
+	__le32 dma_chunk_id;
+	__le32 padding;
+	struct _addr_size_pair data[];
+} __packed;
+
+/* Initiated by device to continue the DMA xfer of a large piece of data */
+struct _trans_dma_xfer_cont {
+	struct _trans_hdr hdr;
+	__le32 dma_chunk_id;
+	__le32 padding;
+	__le64 xferred_size;
+} __packed;
+
+struct _trans_activate_to_dev {
+	struct _trans_hdr hdr;
+	__le64 req_q_addr;
+	__le64 rsp_q_addr;
+	__le32 req_q_size;
+	__le32 rsp_q_size;
+	__le32 buf_len;
+	__le32 options; /* unused, but BIT(16) has meaning to the device */
+} __packed;
+
+struct _trans_activate_from_dev {
+	struct _trans_hdr hdr;
+	__le32 status;
+	__le32 dbc_id;
+	__le64 options; /* unused */
+} __packed;
+
+struct _trans_deactivate_from_dev {
+	struct _trans_hdr hdr;
+	__le32 status;
+	__le32 dbc_id;
+} __packed;
+
+struct _trans_terminate_to_dev {
+	struct _trans_hdr hdr;
+	__le32 handle;
+	__le32 padding;
+} __packed;
+
+struct _trans_terminate_from_dev {
+	struct _trans_hdr hdr;
+	__le32 status;
+	__le32 padding;
+} __packed;
+
+struct _trans_status_to_dev {
+	struct _trans_hdr hdr;
+} __packed;
+
+struct _trans_status_from_dev {
+	struct _trans_hdr hdr;
+	__le16 major;
+	__le16 minor;
+	__le32 status;
+	__le64 status_flags;
+} __packed;
+
+struct _trans_validate_part_to_dev {
+	struct _trans_hdr hdr;
+	__le32 part_id;
+	__le32 padding;
+} __packed;
+
+struct _trans_validate_part_from_dev {
+	struct _trans_hdr hdr;
+	__le32 status;
+	__le32 padding;
+} __packed;
+
+struct xfer_queue_elem {
+	/*
+	 * Node in the list of ongoing transfer requests on the control
+	 * channel.  Maintained by the root device struct.
+	 */
+	struct list_head list;
+	/* Sequence number of this transfer request */
+	u32 seq_num;
+	/* Used to wait for completion of the transfer request */
+	struct completion xfer_done;
+	/* Received data from device */
+	void *buf;
+};
+
+struct dma_xfer {
+	/* Node in list of DMA transfers which is used for cleanup */
+	struct list_head list;
+	/* SG table of memory used for DMA */
+	struct sg_table *sgt;
+	/* Array of pages used for DMA */
+	struct page **page_list;
+	/* Number of pages used for DMA */
+	unsigned long nr_pages;
+};
+
+struct ioctl_resources {
+	/* List of all DMA transfers which is used later for cleanup */
+	struct list_head dma_xfers;
+	/* Base address of request queue which belongs to a DBC */
+	void *buf;
+	/*
+	 * Base bus address of request queue which belongs to a DBC. Response
+	 * queue base bus address can be calculated by adding size of request
+	 * queue to base bus address of request queue.
+	 */
+	dma_addr_t dma_addr;
+	/* Total size of request queue and response queue in bytes */
+	u32 total_size;
+	/* Total number of elements that can be queued in each of the request and response queues */
+	u32 nelem;
+	/* Base address of response queue which belongs to a DBC */
+	void *rsp_q_base;
+	/* Status of the NNC message received */
+	u32 status;
+	/* DBC id of the DBC received from device */
+	u32 dbc_id;
+	/*
+	 * DMA transfer request messages can be large and it may not be
+	 * possible to send them in one shot.  In such cases the message is
+	 * broken into chunks; this field stores the ID of such a chunk.
+	 */
+	u32 dma_chunk_id;
+	/* Total number of bytes transferred for a DMA xfer request */
+	u64 xferred_dma_size;
+	/* Header of transaction message received from user. Used during DMA xfer request */
+	void *trans_hdr;
+};
+
+struct resp_work {
+	struct work_struct work;
+	struct qaic_device *qdev;
+	void *buf;
+};
+
+/*
+ * Since we're working with little endian messages, it's useful to be able to
+ * increment without filling a whole line with conversions back and forth just
+ * to add one (1) to a message count.
+ */
+static __le32 incr_le32(__le32 val)
+{
+	return cpu_to_le32(le32_to_cpu(val) + 1);
+}
+
+static u32 gen_crc(void *msg)
+{
+	struct wrapper_list *wrappers = msg;
+	struct wrapper_msg *w;
+	u32 crc = ~0;
+
+	list_for_each_entry(w, &wrappers->list, list)
+		crc = crc32(crc, &w->msg, w->len);
+
+	return crc ^ ~0;
+}
+
+static u32 gen_crc_stub(void *msg)
+{
+	return 0;
+}
+
+static bool valid_crc(void *msg)
+{
+	struct _msg_hdr *hdr = msg;
+	bool ret;
+	u32 crc;
+
+	/*
+	 * The output of this algorithm is always converted to the native
+	 * endianness.
+	 */
+	crc = le32_to_cpu(hdr->crc32);
+	hdr->crc32 = 0;
+	ret = (crc32(~0, msg, le32_to_cpu(hdr->len)) ^ ~0) == crc;
+	hdr->crc32 = cpu_to_le32(crc);
+	return ret;
+}
+
+static bool valid_crc_stub(void *msg)
+{
+	return true;
+}
+
+static void free_wrapper(struct kref *ref)
+{
+	struct wrapper_msg *wrapper = container_of(ref, struct wrapper_msg,
+						   ref_count);
+
+	list_del(&wrapper->list);
+	kfree(wrapper);
+}
+
+static void save_dbc_buf(struct qaic_device *qdev,
+			 struct ioctl_resources *resources,
+			 struct qaic_user *usr)
+{
+	u32 dbc_id = resources->dbc_id;
+
+	if (resources->buf) {
+		wait_event_interruptible(qdev->dbc[dbc_id].dbc_release,
+					 !qdev->dbc[dbc_id].in_use);
+		qdev->dbc[dbc_id].req_q_base = resources->buf;
+		qdev->dbc[dbc_id].rsp_q_base = resources->rsp_q_base;
+		qdev->dbc[dbc_id].dma_addr = resources->dma_addr;
+		qdev->dbc[dbc_id].total_size = resources->total_size;
+		qdev->dbc[dbc_id].nelem = resources->nelem;
+		enable_dbc(qdev, dbc_id, usr);
+		qdev->dbc[dbc_id].in_use = true;
+		resources->buf = NULL;
+	}
+}
+
+static void free_dbc_buf(struct qaic_device *qdev,
+			 struct ioctl_resources *resources)
+{
+	if (resources->buf)
+		dma_free_coherent(&qdev->pdev->dev, resources->total_size,
+				  resources->buf, resources->dma_addr);
+	resources->buf = NULL;
+}
+
+static void free_dma_xfers(struct qaic_device *qdev,
+			   struct ioctl_resources *resources)
+{
+	struct dma_xfer *xfer;
+	struct dma_xfer *x;
+	int i;
+
+	list_for_each_entry_safe(xfer, x, &resources->dma_xfers, list) {
+		dma_unmap_sgtable(&qdev->pdev->dev, xfer->sgt, DMA_TO_DEVICE, 0);
+		sg_free_table(xfer->sgt);
+		kfree(xfer->sgt);
+		for (i = 0; i < xfer->nr_pages; ++i)
+			put_page(xfer->page_list[i]);
+		kfree(xfer->page_list);
+		list_del(&xfer->list);
+		kfree(xfer);
+	}
+}
+
+static struct wrapper_msg *add_wrapper(struct wrapper_list *wrappers, u32 size)
+{
+	struct wrapper_msg *w = kzalloc(size, GFP_KERNEL);
+
+	if (!w)
+		return NULL;
+	list_add_tail(&w->list, &wrappers->list);
+	kref_init(&w->ref_count);
+	w->head = wrappers;
+	return w;
+}
+
+static int encode_passthrough(struct qaic_device *qdev, void *trans,
+			      struct wrapper_list *wrappers, u32 *user_len)
+{
+	struct qaic_manage_trans_passthrough *in_trans = trans;
+	struct _trans_passthrough *out_trans;
+	struct wrapper_msg *trans_wrapper;
+	struct wrapper_msg *wrapper;
+	struct _msg *msg;
+	u32 msg_hdr_len;
+
+	wrapper = list_first_entry(&wrappers->list, struct wrapper_msg, list);
+	msg = &wrapper->msg;
+	msg_hdr_len = le32_to_cpu(msg->hdr.len);
+
+	if (in_trans->hdr.len % 8 != 0)
+		return -EINVAL;
+
+	if (msg_hdr_len + in_trans->hdr.len > QAIC_MANAGE_EXT_MSG_LENGTH)
+		return -ENOSPC;
+
+	trans_wrapper = add_wrapper(wrappers,
+				    offsetof(struct wrapper_msg, trans) +
+				    in_trans->hdr.len);
+	if (!trans_wrapper)
+		return -ENOMEM;
+	trans_wrapper->len = in_trans->hdr.len;
+	out_trans = (struct _trans_passthrough *)&trans_wrapper->trans;
+
+	memcpy(out_trans->data, in_trans->data, in_trans->hdr.len - sizeof(in_trans->hdr));
+	msg->hdr.len = cpu_to_le32(msg_hdr_len + in_trans->hdr.len);
+	msg->hdr.count = incr_le32(msg->hdr.count);
+	*user_len += in_trans->hdr.len;
+	out_trans->hdr.type = cpu_to_le32(TRANS_PASSTHROUGH_TO_DEV);
+	out_trans->hdr.len = cpu_to_le32(in_trans->hdr.len);
+
+	return 0;
+}
+
+static int encode_dma(struct qaic_device *qdev, void *trans,
+		      struct wrapper_list *wrappers, u32 *user_len,
+		      struct ioctl_resources *resources,
+		      struct qaic_user *usr)
+{
+	struct qaic_manage_trans_dma_xfer *in_trans = trans;
+	struct _trans_dma_xfer *out_trans;
+	struct wrapper_msg *trans_wrapper;
+	struct wrapper_msg *wrapper;
+	struct _addr_size_pair *asp;
+	unsigned long need_pages;
+	struct scatterlist *last;
+	struct page **page_list;
+	unsigned long nr_pages;
+	struct scatterlist *sg;
+	struct wrapper_msg *w;
+	struct dma_xfer *xfer;
+	struct sg_table *sgt;
+	unsigned int dma_len;
+	u64 dma_chunk_len;
+	struct _msg *msg;
+	u32 msg_hdr_len;
+	void *boundary;
+	int nents_dma;
+	int nents;
+	u32 size;
+	int ret;
+	int i;
+
+	wrapper = list_first_entry(&wrappers->list, struct wrapper_msg, list);
+	msg = &wrapper->msg;
+	msg_hdr_len = le32_to_cpu(msg->hdr.len);
+
+	if (msg_hdr_len > (UINT_MAX - QAIC_MANAGE_EXT_MSG_LENGTH)) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	/* There should be enough space to hold at least one ASP entry. */
+	if (msg_hdr_len + sizeof(*out_trans) + sizeof(*asp) >
+	    QAIC_MANAGE_EXT_MSG_LENGTH) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	if (in_trans->addr + in_trans->size < in_trans->addr ||
+	    !in_trans->size) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	xfer = kmalloc(sizeof(*xfer), GFP_KERNEL);
+	if (!xfer) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	need_pages = DIV_ROUND_UP(in_trans->size + offset_in_page(in_trans->addr +
+				  resources->xferred_dma_size) -
+				  resources->xferred_dma_size, PAGE_SIZE);
+
+	nr_pages = need_pages;
+
+	while (1) {
+		page_list = kmalloc_array(nr_pages, sizeof(*page_list),
+					  GFP_KERNEL | __GFP_NOWARN);
+		if (!page_list) {
+			nr_pages = nr_pages / 2;
+			if (!nr_pages) {
+				ret = -ENOMEM;
+				goto free_resource;
+			}
+		} else {
+			break;
+		}
+	}
+
+	ret = get_user_pages_fast(in_trans->addr + resources->xferred_dma_size,
+				  nr_pages, 0, page_list);
+	if (ret < 0 || ret != nr_pages) {
+		ret = -EFAULT;
+		goto free_page_list;
+	}
+
+	sgt = kmalloc(sizeof(*sgt), GFP_KERNEL);
+	if (!sgt) {
+		ret = -ENOMEM;
+		goto put_pages;
+	}
+
+	ret = sg_alloc_table_from_pages(sgt, page_list, nr_pages,
+					offset_in_page(in_trans->addr +
+						       resources->xferred_dma_size),
+					in_trans->size - resources->xferred_dma_size, GFP_KERNEL);
+	if (ret) {
+		ret = -ENOMEM;
+		goto free_sgt;
+	}
+
+	ret = dma_map_sgtable(&qdev->pdev->dev, sgt, DMA_TO_DEVICE, 0);
+	if (ret)
+		goto free_table;
+
+	nents = sgt->nents;
+	/*
+	 * It turns out several of the IOMMU drivers don't combine adjacent
+	 * regions, which is really what we expect based on the description of
+	 * dma_map_sgtable(), so let's see if that can be done.  It makes our
+	 * message more efficient.
+	 */
+	last = sgt->sgl;
+	nents_dma = nents;
+	size = QAIC_MANAGE_EXT_MSG_LENGTH - msg_hdr_len - sizeof(*out_trans);
+	for_each_sgtable_sg(sgt, sg, i) {
+		if (sg_dma_address(last) + sg_dma_len(last) !=
+		    sg_dma_address(sg)) {
+			size -= sizeof(*asp);
+			/* Save 1K for possible follow-up transactions. */
+			if (size < SZ_1K) {
+				nents_dma = i;
+				break;
+			}
+		}
+		last = sg;
+	}
+
+	trans_wrapper = add_wrapper(wrappers, QAIC_WRAPPER_MAX_SIZE);
+	if (!trans_wrapper) {
+		ret = -ENOMEM;
+		goto dma_unmap;
+	}
+	out_trans = (struct _trans_dma_xfer *)&trans_wrapper->trans;
+
+	asp = out_trans->data;
+	boundary = (void *)trans_wrapper + QAIC_WRAPPER_MAX_SIZE;
+	size = 0;
+
+	last = sgt->sgl;
+	dma_len = 0;
+	w = trans_wrapper;
+	dma_chunk_len = 0;
+	/* Adjacent DMA entries could be stitched together. */
+	for_each_sg(sgt->sgl, sg, nents_dma, i) {
+		/* hit a discontinuity, finalize segment and start new one */
+		if (sg_dma_address(last) + sg_dma_len(last) !=
+		    sg_dma_address(sg)) {
+			asp->size = cpu_to_le64(dma_len);
+			dma_chunk_len += dma_len;
+			if (dma_len) {
+				asp++;
+				if ((void *)asp + sizeof(*asp) > boundary) {
+					w->len = (void *)asp - (void *)&w->msg;
+					size += w->len;
+					w = add_wrapper(wrappers,
+							QAIC_WRAPPER_MAX_SIZE);
+					if (!w) {
+						ret = -ENOMEM;
+						goto dma_unmap;
+					}
+					boundary = (void *)w +
+						   QAIC_WRAPPER_MAX_SIZE;
+					asp = (struct _addr_size_pair *)&w->msg;
+				}
+			}
+			dma_len = 0;
+			asp->addr = cpu_to_le64(sg_dma_address(sg));
+		}
+		dma_len += sg_dma_len(sg);
+		last = sg;
+	}
+	/* finalize the last segment */
+	asp->size = cpu_to_le64(dma_len);
+	w->len = (void *)asp + sizeof(*asp) - (void *)&w->msg;
+	size += w->len;
+
+	msg->hdr.len = cpu_to_le32(msg_hdr_len + size);
+	msg->hdr.count = incr_le32(msg->hdr.count);
+
+	out_trans->hdr.type = cpu_to_le32(TRANS_DMA_XFER_TO_DEV);
+	out_trans->hdr.len = cpu_to_le32(size);
+	out_trans->tag = cpu_to_le32(in_trans->tag);
+	out_trans->count = cpu_to_le32((size - sizeof(*out_trans)) / sizeof(*asp));
+	dma_chunk_len += dma_len;
+
+	*user_len += in_trans->hdr.len;
+
+	if (resources->dma_chunk_id) {
+		out_trans->dma_chunk_id = cpu_to_le32(resources->dma_chunk_id);
+	} else if (need_pages > nr_pages || nents_dma < nents) {
+		while (resources->dma_chunk_id == 0)
+			resources->dma_chunk_id =
+				atomic_inc_return(&usr->chunk_id);
+
+		out_trans->dma_chunk_id = cpu_to_le32(resources->dma_chunk_id);
+	}
+	resources->xferred_dma_size += dma_chunk_len;
+	resources->trans_hdr = trans;
+
+	xfer->sgt = sgt;
+	xfer->page_list = page_list;
+	xfer->nr_pages = nr_pages;
+	list_add(&xfer->list, &resources->dma_xfers);
+	return 0;
+
+dma_unmap:
+	dma_unmap_sgtable(&qdev->pdev->dev, sgt, DMA_TO_DEVICE, 0);
+free_table:
+	sg_free_table(sgt);
+free_sgt:
+	kfree(sgt);
+put_pages:
+	for (i = 0; i < nr_pages; ++i)
+		put_page(page_list[i]);
+free_page_list:
+	kfree(page_list);
+free_resource:
+	kfree(xfer);
+out:
+	return ret;
+}
+
+static int encode_activate(struct qaic_device *qdev, void *trans,
+			   struct wrapper_list *wrappers,
+			   u32 *user_len,
+			   struct ioctl_resources *resources)
+{
+	struct qaic_manage_trans_activate_to_dev *in_trans = trans;
+	struct _trans_activate_to_dev *out_trans;
+	struct wrapper_msg *trans_wrapper;
+	struct wrapper_msg *wrapper;
+	dma_addr_t dma_addr;
+	struct _msg *msg;
+	u32 msg_hdr_len;
+	void *buf;
+	u32 nelem;
+	u32 size;
+	int ret;
+
+	wrapper = list_first_entry(&wrappers->list, struct wrapper_msg, list);
+	msg = &wrapper->msg;
+	msg_hdr_len = le32_to_cpu(msg->hdr.len);
+
+	if (msg_hdr_len + sizeof(*out_trans) > QAIC_MANAGE_MAX_MSG_LENGTH)
+		return -ENOSPC;
+
+	if (!in_trans->queue_size)
+		return -EINVAL;
+
+	if (in_trans->pad)
+		return -EINVAL;
+
+	nelem = in_trans->queue_size;
+	size = (get_dbc_req_elem_size() + get_dbc_rsp_elem_size()) * nelem;
+	if (size / nelem != get_dbc_req_elem_size() + get_dbc_rsp_elem_size())
+		return -EINVAL;
+
+	if (size + QAIC_DBC_Q_GAP + QAIC_DBC_Q_BUF_ALIGN < size)
+		return -EINVAL;
+
+	size = ALIGN((size + QAIC_DBC_Q_GAP), QAIC_DBC_Q_BUF_ALIGN);
+
+	buf = dma_alloc_coherent(&qdev->pdev->dev, size, &dma_addr, GFP_KERNEL);
+	if (!buf)
+		return -ENOMEM;
+
+	trans_wrapper = add_wrapper(wrappers,
+				    offsetof(struct wrapper_msg, trans) +
+				    sizeof(*out_trans));
+	if (!trans_wrapper) {
+		ret = -ENOMEM;
+		goto free_dma;
+	}
+	trans_wrapper->len = sizeof(*out_trans);
+	out_trans = (struct _trans_activate_to_dev *)&trans_wrapper->trans;
+
+	out_trans->hdr.type = cpu_to_le32(TRANS_ACTIVATE_TO_DEV);
+	out_trans->hdr.len = cpu_to_le32(sizeof(*out_trans));
+	out_trans->buf_len = cpu_to_le32(size);
+	out_trans->req_q_addr = cpu_to_le64(dma_addr);
+	out_trans->req_q_size = cpu_to_le32(nelem);
+	out_trans->rsp_q_addr = cpu_to_le64(dma_addr + size - nelem *
+							get_dbc_rsp_elem_size());
+	out_trans->rsp_q_size = cpu_to_le32(nelem);
+	out_trans->options = cpu_to_le32(in_trans->options);
+
+	*user_len += in_trans->hdr.len;
+	msg->hdr.len = cpu_to_le32(msg_hdr_len + sizeof(*out_trans));
+	msg->hdr.count = incr_le32(msg->hdr.count);
+
+	resources->buf = buf;
+	resources->dma_addr = dma_addr;
+	resources->total_size = size;
+	resources->nelem = nelem;
+	resources->rsp_q_base = buf + size - nelem * get_dbc_rsp_elem_size();
+	return 0;
+
+free_dma:
+	dma_free_coherent(&qdev->pdev->dev, size, buf, dma_addr);
+	return ret;
+}
+
+static int encode_deactivate(struct qaic_device *qdev, void *trans,
+			     u32 *user_len, struct qaic_user *usr)
+{
+	struct qaic_manage_trans_deactivate *in_trans = trans;
+
+	if (in_trans->dbc_id >= qdev->num_dbc || in_trans->pad)
+		return -EINVAL;
+
+	*user_len += in_trans->hdr.len;
+
+	return disable_dbc(qdev, in_trans->dbc_id, usr);
+}
+
+static int encode_status(struct qaic_device *qdev, void *trans,
+			 struct wrapper_list *wrappers,
+			 u32 *user_len)
+{
+	struct qaic_manage_trans_status_to_dev *in_trans = trans;
+	struct _trans_status_to_dev *out_trans;
+	struct wrapper_msg *trans_wrapper;
+	struct wrapper_msg *wrapper;
+	struct _msg *msg;
+	u32 msg_hdr_len;
+
+	wrapper = list_first_entry(&wrappers->list, struct wrapper_msg, list);
+	msg = &wrapper->msg;
+	msg_hdr_len = le32_to_cpu(msg->hdr.len);
+
+	if (msg_hdr_len + in_trans->hdr.len > QAIC_MANAGE_MAX_MSG_LENGTH)
+		return -ENOSPC;
+
+	trans_wrapper = add_wrapper(wrappers, sizeof(*trans_wrapper));
+	if (!trans_wrapper)
+		return -ENOMEM;
+
+	trans_wrapper->len = sizeof(*out_trans);
+	out_trans = (struct _trans_status_to_dev *)&trans_wrapper->trans;
+
+	out_trans->hdr.type = cpu_to_le32(TRANS_STATUS_TO_DEV);
+	out_trans->hdr.len = cpu_to_le32(in_trans->hdr.len);
+	msg->hdr.len = cpu_to_le32(msg_hdr_len + in_trans->hdr.len);
+	msg->hdr.count = incr_le32(msg->hdr.count);
+	*user_len += in_trans->hdr.len;
+
+	return 0;
+}
+
+static int encode_message(struct qaic_device *qdev,
+			  struct manage_msg *user_msg,
+			  struct wrapper_list *wrappers,
+			  struct ioctl_resources *resources,
+			  struct qaic_user *usr)
+{
+	struct qaic_manage_trans_hdr *trans_hdr;
+	struct wrapper_msg *wrapper;
+	struct _msg *msg;
+	u32 user_len = 0;
+	int ret;
+	int i;
+
+	if (!user_msg->count) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	wrapper = list_first_entry(&wrappers->list, struct wrapper_msg, list);
+	msg = &wrapper->msg;
+
+	msg->hdr.len = cpu_to_le32(sizeof(msg->hdr));
+
+	if (resources->dma_chunk_id) {
+		ret = encode_dma(qdev, resources->trans_hdr, wrappers,
+				 &user_len, resources, usr);
+		msg->hdr.count = cpu_to_le32(1);
+		goto out;
+	}
+
+	for (i = 0; i < user_msg->count; ++i) {
+		if (user_len >= user_msg->len) {
+			ret = -EINVAL;
+			break;
+		}
+		trans_hdr = (struct qaic_manage_trans_hdr *)
+						(user_msg->data + user_len);
+		if (user_len + trans_hdr->len > user_msg->len) {
+			ret = -EINVAL;
+			break;
+		}
+
+		switch (trans_hdr->type) {
+		case TRANS_PASSTHROUGH_FROM_USR:
+			ret = encode_passthrough(qdev, trans_hdr, wrappers,
+						 &user_len);
+			break;
+		case TRANS_DMA_XFER_FROM_USR:
+			ret = encode_dma(qdev, trans_hdr, wrappers, &user_len,
+					 resources, usr);
+			break;
+		case TRANS_ACTIVATE_FROM_USR:
+			ret = encode_activate(qdev, trans_hdr, wrappers,
+					      &user_len, resources);
+			break;
+		case TRANS_DEACTIVATE_FROM_USR:
+			ret = encode_deactivate(qdev, trans_hdr, &user_len, usr);
+			break;
+		case TRANS_STATUS_FROM_USR:
+			ret = encode_status(qdev, trans_hdr, wrappers,
+					    &user_len);
+			break;
+		default:
+			ret = -EINVAL;
+			break;
+		}
+
+		if (ret)
+			break;
+	}
+
+	if (user_len != user_msg->len)
+		ret = -EINVAL;
+out:
+	if (ret) {
+		free_dma_xfers(qdev, resources);
+		free_dbc_buf(qdev, resources);
+		return ret;
+	}
+
+	return 0;
+}
+
+static int decode_passthrough(struct qaic_device *qdev, void *trans,
+			      struct manage_msg *user_msg, u32 *msg_len)
+{
+	struct _trans_passthrough *in_trans = trans;
+	struct qaic_manage_trans_passthrough *out_trans;
+	u32 len;
+
+	out_trans = (void *)user_msg->data + user_msg->len;
+
+	len = le32_to_cpu(in_trans->hdr.len);
+	if (len % 8 != 0)
+		return -EINVAL;
+
+	if (user_msg->len + len > QAIC_MANAGE_MAX_MSG_LENGTH)
+		return -ENOSPC;
+
+	memcpy(out_trans->data, in_trans->data, len - sizeof(in_trans->hdr));
+	user_msg->len += len;
+	*msg_len += len;
+	out_trans->hdr.type = le32_to_cpu(in_trans->hdr.type);
+	out_trans->hdr.len = len;
+
+	return 0;
+}
+
+static int decode_activate(struct qaic_device *qdev, void *trans,
+			   struct manage_msg *user_msg, u32 *msg_len,
+			   struct ioctl_resources *resources,
+			   struct qaic_user *usr)
+{
+	struct _trans_activate_from_dev *in_trans = trans;
+	struct qaic_manage_trans_activate_from_dev *out_trans;
+	u32 len;
+
+	out_trans = (void *)user_msg->data + user_msg->len;
+
+	len = le32_to_cpu(in_trans->hdr.len);
+	if (user_msg->len + len > QAIC_MANAGE_MAX_MSG_LENGTH)
+		return -ENOSPC;
+
+	user_msg->len += len;
+	*msg_len += len;
+	out_trans->hdr.type = le32_to_cpu(in_trans->hdr.type);
+	out_trans->hdr.len = len;
+	out_trans->status = le32_to_cpu(in_trans->status);
+	out_trans->dbc_id = le32_to_cpu(in_trans->dbc_id);
+	out_trans->options = le64_to_cpu(in_trans->options);
+
+	if (!resources->buf)
+		/* how did we get an activate response without a request? */
+		return -EINVAL;
+
+	if (out_trans->dbc_id >= qdev->num_dbc)
+		/*
+		 * The device assigned an invalid resource, which should never
+		 * happen.  Return an error so the user can try to recover.
+		 */
+		return -ENODEV;
+
+	if (out_trans->status)
+		/*
+		 * Allocating resources failed on the device side. This is not
+		 * expected behavior; the user is expected to handle it.
+		 */
+		return -ECANCELED;
+
+	resources->status = out_trans->status;
+	resources->dbc_id = out_trans->dbc_id;
+	save_dbc_buf(qdev, resources, usr);
+
+	return 0;
+}
+
+static int decode_deactivate(struct qaic_device *qdev, void *trans,
+			     u32 *msg_len, struct qaic_user *usr)
+{
+	struct _trans_deactivate_from_dev *in_trans = trans;
+	u32 dbc_id = le32_to_cpu(in_trans->dbc_id);
+	u32 status = le32_to_cpu(in_trans->status);
+
+	if (dbc_id >= qdev->num_dbc)
+		/*
+		 * The device assigned an invalid resource, which should never
+		 * happen.  Inject an error so the user can try to recover.
+		 */
+		return -ENODEV;
+
+	if (status) {
+		/*
+		 * Releasing resources failed on the device side, which puts
+		 * us in a bind since they may still be in use, so enable the
+		 * dbc. User is expected to retry deactivation.
+		 */
+		enable_dbc(qdev, dbc_id, usr);
+		return -ECANCELED;
+	}
+
+	release_dbc(qdev, dbc_id);
+	*msg_len += sizeof(*in_trans);
+
+	return 0;
+}
+
+static int decode_status(struct qaic_device *qdev, void *trans,
+			 struct manage_msg *user_msg, u32 *user_len,
+			 struct _msg *msg)
+{
+	struct _trans_status_from_dev *in_trans = trans;
+	struct qaic_manage_trans_status_from_dev *out_trans;
+	u32 len;
+
+	out_trans = (void *)user_msg->data + user_msg->len;
+
+	len = le32_to_cpu(in_trans->hdr.len);
+	if (user_msg->len + len > QAIC_MANAGE_MAX_MSG_LENGTH)
+		return -ENOSPC;
+
+	out_trans->hdr.type = TRANS_STATUS_FROM_DEV;
+	out_trans->hdr.len = len;
+	out_trans->major = le16_to_cpu(in_trans->major);
+	out_trans->minor = le16_to_cpu(in_trans->minor);
+	out_trans->status_flags = le64_to_cpu(in_trans->status_flags);
+	out_trans->status = le32_to_cpu(in_trans->status);
+	*user_len += le32_to_cpu(in_trans->hdr.len);
+	user_msg->len += len;
+
+	if (out_trans->status)
+		return -ECANCELED;
+	if (out_trans->status_flags & BIT(0) && !valid_crc(msg))
+		return -EPIPE;
+
+	return 0;
+}
+
+static int decode_message(struct qaic_device *qdev,
+			  struct manage_msg *user_msg, struct _msg *msg,
+			  struct ioctl_resources *resources,
+			  struct qaic_user *usr)
+{
+	struct _trans_hdr *trans_hdr;
+	u32 msg_len = 0;
+	u32 msg_hdr_len = le32_to_cpu(msg->hdr.len);
+	int ret;
+	int i;
+
+	if (msg_hdr_len > QAIC_MANAGE_MAX_MSG_LENGTH)
+		return -EINVAL;
+
+	user_msg->len = 0;
+	user_msg->count = le32_to_cpu(msg->hdr.count);
+
+	for (i = 0; i < user_msg->count; ++i) {
+		trans_hdr = (struct _trans_hdr *)(msg->data + msg_len);
+		if (msg_len + le32_to_cpu(trans_hdr->len) > msg_hdr_len)
+			return -EINVAL;
+
+		switch (le32_to_cpu(trans_hdr->type)) {
+		case TRANS_PASSTHROUGH_FROM_DEV:
+			ret = decode_passthrough(qdev, trans_hdr, user_msg,
+						 &msg_len);
+			break;
+		case TRANS_ACTIVATE_FROM_DEV:
+			ret = decode_activate(qdev, trans_hdr, user_msg,
+					      &msg_len, resources, usr);
+			break;
+		case TRANS_DEACTIVATE_FROM_DEV:
+			ret = decode_deactivate(qdev, trans_hdr, &msg_len, usr);
+			break;
+		case TRANS_STATUS_FROM_DEV:
+			ret = decode_status(qdev, trans_hdr, user_msg,
+					    &msg_len, msg);
+			break;
+		default:
+			return -EINVAL;
+		}
+
+		if (ret)
+			return ret;
+	}
+
+	if (msg_len != (msg_hdr_len - sizeof(msg->hdr)))
+		return -EINVAL;
+
+	return 0;
+}
+
+static void *msg_xfer(struct qaic_device *qdev, struct wrapper_list *wrappers,
+		      u32 seq_num, bool ignore_signal)
+{
+	struct xfer_queue_elem elem;
+	struct wrapper_msg *w;
+	struct _msg *out_buf;
+	int retry_count;
+	long ret;
+
+	if (qdev->in_reset) {
+		mutex_unlock(&qdev->cntl_mutex);
+		return ERR_PTR(-ENODEV);
+	}
+
+	elem.seq_num = seq_num;
+	elem.buf = NULL;
+	init_completion(&elem.xfer_done);
+	if (likely(!qdev->cntl_lost_buf)) {
+		/*
+		 * The max size of request to device is QAIC_MANAGE_EXT_MSG_LENGTH.
+		 * The max size of response from device is QAIC_MANAGE_MAX_MSG_LENGTH.
+		 */
+		out_buf = kmalloc(QAIC_MANAGE_MAX_MSG_LENGTH, GFP_KERNEL);
+		if (!out_buf) {
+			mutex_unlock(&qdev->cntl_mutex);
+			return ERR_PTR(-ENOMEM);
+		}
+
+		ret = mhi_queue_buf(qdev->cntl_ch, DMA_FROM_DEVICE,
+				    out_buf, QAIC_MANAGE_MAX_MSG_LENGTH,
+				    MHI_EOT);
+		if (ret) {
+			mutex_unlock(&qdev->cntl_mutex);
+			return ERR_PTR(ret);
+		}
+	} else {
+		/*
+		 * We lost a buffer because we queued a recv buf, but then
+		 * queuing the corresponding tx buf failed.  To try to avoid
+		 * a memory leak, let's reclaim it and use it for this
+		 * transaction.
+		 */
+		qdev->cntl_lost_buf = false;
+	}
+
+	list_for_each_entry(w, &wrappers->list, list) {
+		kref_get(&w->ref_count);
+		retry_count = 0;
+retry:
+		ret = mhi_queue_buf(qdev->cntl_ch, DMA_TO_DEVICE, &w->msg,
+				    w->len,
+				    list_is_last(&w->list, &wrappers->list) ?
+						MHI_EOT : MHI_CHAIN);
+		if (ret) {
+			if (ret == -EAGAIN &&
+			    retry_count++ < QAIC_MHI_RETRY_MAX) {
+				msleep_interruptible(QAIC_MHI_RETRY_WAIT_MS);
+				if (!signal_pending(current))
+					goto retry;
+			}
+
+			qdev->cntl_lost_buf = true;
+			kref_put(&w->ref_count, free_wrapper);
+			mutex_unlock(&qdev->cntl_mutex);
+			return ERR_PTR(ret);
+		}
+	}
+
+	list_add_tail(&elem.list, &qdev->cntl_xfer_list);
+	mutex_unlock(&qdev->cntl_mutex);
+
+	if (ignore_signal)
+		ret = wait_for_completion_timeout(&elem.xfer_done,
+						  control_resp_timeout * HZ);
+	else
+		ret = wait_for_completion_interruptible_timeout(&elem.xfer_done,
+								control_resp_timeout * HZ);
+	/*
+	 * Not using _interruptible here because we have to clean up or we'll
+	 * likely cause memory corruption.
+	 */
+	mutex_lock(&qdev->cntl_mutex);
+	if (!list_empty(&elem.list))
+		list_del(&elem.list);
+	if (!ret && !elem.buf)
+		ret = -ETIMEDOUT;
+	else if (ret > 0 && !elem.buf)
+		ret = -EIO;
+	mutex_unlock(&qdev->cntl_mutex);
+
+	if (ret < 0) {
+		kfree(elem.buf);
+		return ERR_PTR(ret);
+	} else if (!qdev->valid_crc(elem.buf)) {
+		kfree(elem.buf);
+		return ERR_PTR(-EPIPE);
+	}
+
+	return elem.buf;
+}
+
+/* Add a transaction to abort the outstanding DMA continuation */
+static int abort_dma_cont(struct qaic_device *qdev,
+			  struct wrapper_list *wrappers, u32 dma_chunk_id)
+{
+	struct _trans_dma_xfer *out_trans;
+	u32 size = sizeof(*out_trans);
+	struct wrapper_msg *wrapper;
+	struct wrapper_msg *w;
+	struct _msg *msg;
+
+	wrapper = list_first_entry(&wrappers->list, struct wrapper_msg, list);
+	msg = &wrapper->msg;
+
+	/* Remove all but the first wrapper which has the msg header */
+	list_for_each_entry_safe(wrapper, w, &wrappers->list, list)
+		if (!list_is_first(&wrapper->list, &wrappers->list))
+			kref_put(&wrapper->ref_count, free_wrapper);
+
+	wrapper = add_wrapper(wrappers,
+			      offsetof(struct wrapper_msg, trans) + sizeof(*out_trans));
+
+	if (!wrapper)
+		return -ENOMEM;
+
+	out_trans = (struct _trans_dma_xfer *)&wrapper->trans;
+	out_trans->hdr.type = cpu_to_le32(TRANS_DMA_XFER_TO_DEV);
+	out_trans->hdr.len = cpu_to_le32(size);
+	out_trans->tag = cpu_to_le32(0);
+	out_trans->count = cpu_to_le32(0);
+	out_trans->dma_chunk_id = cpu_to_le32(dma_chunk_id);
+
+	msg->hdr.len = cpu_to_le32(size + sizeof(*msg));
+	msg->hdr.count = cpu_to_le32(1);
+	wrapper->len = size;
+
+	return 0;
+}
+
+static struct wrapper_list *alloc_wrapper_list(void)
+{
+	struct wrapper_list *wrappers;
+
+	wrappers = kmalloc(sizeof(*wrappers), GFP_KERNEL);
+	if (!wrappers)
+		return NULL;
+	INIT_LIST_HEAD(&wrappers->list);
+	spin_lock_init(&wrappers->lock);
+
+	return wrappers;
+}
+
+static int __qaic_manage(struct qaic_device *qdev, struct qaic_user *usr,
+			 struct manage_msg *user_msg,
+			 struct ioctl_resources *resources,
+			 struct _msg **rsp)
+{
+	struct wrapper_list *wrappers;
+	struct wrapper_msg *wrapper;
+	struct wrapper_msg *w;
+	bool all_done = false;
+	struct _msg *msg;
+	int ret;
+
+	wrappers = alloc_wrapper_list();
+	if (!wrappers)
+		return -ENOMEM;
+
+	wrapper = add_wrapper(wrappers, sizeof(*wrapper));
+	if (!wrapper) {
+		kfree(wrappers);
+		return -ENOMEM;
+	}
+
+	msg = &wrapper->msg;
+	wrapper->len = sizeof(*msg);
+
+	ret = encode_message(qdev, user_msg, wrappers, resources, usr);
+	if (ret && resources->dma_chunk_id)
+		ret = abort_dma_cont(qdev, wrappers, resources->dma_chunk_id);
+	if (ret)
+		goto encode_failed;
+
+	ret = mutex_lock_interruptible(&qdev->cntl_mutex);
+	if (ret)
+		goto lock_failed;
+
+	msg->hdr.magic_number = MANAGE_MAGIC_NUMBER;
+	msg->hdr.sequence_number = cpu_to_le32(qdev->next_seq_num++);
+
+	if (usr) {
+		msg->hdr.handle = cpu_to_le32(usr->handle);
+		msg->hdr.partition_id = cpu_to_le32(usr->qddev->partition_id);
+	} else {
+		msg->hdr.handle = 0;
+		msg->hdr.partition_id = cpu_to_le32(QAIC_NO_PARTITION);
+	}
+
+	msg->hdr.padding = cpu_to_le32(0);
+	msg->hdr.crc32 = cpu_to_le32(qdev->gen_crc(wrappers));
+
+	/* msg_xfer releases the mutex */
+	*rsp = msg_xfer(qdev, wrappers, qdev->next_seq_num - 1, false);
+	if (IS_ERR(*rsp))
+		ret = PTR_ERR(*rsp);
+
+lock_failed:
+	free_dma_xfers(qdev, resources);
+encode_failed:
+	spin_lock(&wrappers->lock);
+	list_for_each_entry_safe(wrapper, w, &wrappers->list, list)
+		kref_put(&wrapper->ref_count, free_wrapper);
+	all_done = list_empty(&wrappers->list);
+	spin_unlock(&wrappers->lock);
+	if (all_done)
+		kfree(wrappers);
+
+	return ret;
+}
+
+static int qaic_manage(struct qaic_device *qdev, struct qaic_user *usr,
+		       struct manage_msg *user_msg)
+{
+	struct _trans_dma_xfer_cont *dma_cont = NULL;
+	struct ioctl_resources resources;
+	struct _msg *rsp = NULL;
+	int ret;
+
+	memset(&resources, 0, sizeof(struct ioctl_resources));
+
+	INIT_LIST_HEAD(&resources.dma_xfers);
+
+	if (user_msg->len > QAIC_MANAGE_MAX_MSG_LENGTH ||
+	    user_msg->count > QAIC_MANAGE_MAX_MSG_LENGTH / sizeof(struct qaic_manage_trans_hdr))
+		return -EINVAL;
+
+dma_xfer_continue:
+	ret = __qaic_manage(qdev, usr, user_msg, &resources, &rsp);
+	if (ret)
+		return ret;
+	/* dma_cont should be the only transaction if present */
+	if (le32_to_cpu(rsp->hdr.count) == 1) {
+		dma_cont = (struct _trans_dma_xfer_cont *)rsp->data;
+		if (le32_to_cpu(dma_cont->hdr.type) != TRANS_DMA_XFER_CONT)
+			dma_cont = NULL;
+	}
+	if (dma_cont) {
+		if (le32_to_cpu(dma_cont->dma_chunk_id) == resources.dma_chunk_id &&
+		    le64_to_cpu(dma_cont->xferred_size) == resources.xferred_dma_size) {
+			kfree(rsp);
+			goto dma_xfer_continue;
+		}
+
+		ret = -EINVAL;
+		goto dma_cont_failed;
+	}
+
+	ret = decode_message(qdev, user_msg, rsp, &resources, usr);
+
+dma_cont_failed:
+	free_dbc_buf(qdev, &resources);
+	kfree(rsp);
+	return ret;
+}
+
+int qaic_manage_ioctl(struct drm_device *dev, void *data,
+		      struct drm_file *file_priv)
+{
+	struct qaic_manage_msg *user_msg;
+	struct qaic_device *qdev;
+	struct manage_msg *msg;
+	struct qaic_user *usr;
+	u8 __user *user_data;
+	int qdev_rcu_id;
+	int usr_rcu_id;
+	int ret;
+
+	usr = file_priv->driver_priv;
+
+	usr_rcu_id = srcu_read_lock(&usr->qddev_lock);
+	if (!usr->qddev) {
+		srcu_read_unlock(&usr->qddev_lock, usr_rcu_id);
+		return -ENODEV;
+	}
+
+	qdev = usr->qddev->qdev;
+
+	qdev_rcu_id = srcu_read_lock(&qdev->dev_lock);
+	if (qdev->in_reset) {
+		srcu_read_unlock(&qdev->dev_lock, qdev_rcu_id);
+		srcu_read_unlock(&usr->qddev_lock, usr_rcu_id);
+		return -ENODEV;
+	}
+
+	user_msg = data;
+
+	if (user_msg->len > QAIC_MANAGE_MAX_MSG_LENGTH) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	msg = kzalloc(QAIC_MANAGE_MAX_MSG_LENGTH + sizeof(*msg), GFP_KERNEL);
+	if (!msg) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	msg->len = user_msg->len;
+	msg->count = user_msg->count;
+
+	user_data = u64_to_user_ptr(user_msg->data);
+
+	if (copy_from_user(msg->data, user_data, user_msg->len)) {
+		ret = -EFAULT;
+		goto free_msg;
+	}
+
+	ret = qaic_manage(qdev, usr, msg);
+
+	/*
+	 * If qaic_manage() is successful then we copy the message back to
+	 * userspace memory.  -ECANCELED is also copied back since it means
+	 * the device NACKed the message with a status error code that
+	 * userspace would like to know.
+	 */
+	if (ret == -ECANCELED || !ret) {
+		if (copy_to_user(user_data, msg->data, msg->len)) {
+			ret = -EFAULT;
+		} else {
+			user_msg->len = msg->len;
+			user_msg->count = msg->count;
+		}
+	}
+
+free_msg:
+	kfree(msg);
+out:
+	srcu_read_unlock(&qdev->dev_lock, qdev_rcu_id);
+	srcu_read_unlock(&usr->qddev_lock, usr_rcu_id);
+	return ret;
+}
+
+int get_cntl_version(struct qaic_device *qdev, struct qaic_user *usr,
+		     u16 *major, u16 *minor)
+{
+	int ret;
+	struct manage_msg *user_msg;
+	struct qaic_manage_trans_status_to_dev *status_query;
+	struct qaic_manage_trans_status_from_dev *status_result;
+
+	user_msg = kmalloc(sizeof(*user_msg) + sizeof(*status_result), GFP_KERNEL);
+	if (!user_msg) {
+		ret = -ENOMEM;
+		goto out;
+	}
+	user_msg->len = sizeof(*status_query);
+	user_msg->count = 1;
+
+	status_query = (struct qaic_manage_trans_status_to_dev *)user_msg->data;
+	status_query->hdr.type = TRANS_STATUS_FROM_USR;
+	status_query->hdr.len = sizeof(status_query->hdr);
+
+	ret = qaic_manage(qdev, usr, user_msg);
+	if (ret)
+		goto kfree_user_msg;
+	status_result =
+		(struct qaic_manage_trans_status_from_dev *)user_msg->data;
+	*major = status_result->major;
+	*minor = status_result->minor;
+
+	if (status_result->status_flags & BIT(0)) { /* device is using CRC */
+		/* By default qdev->gen_crc is programmed to generate CRC */
+		qdev->valid_crc = valid_crc;
+	} else {
+		/* By default qdev->valid_crc is programmed to bypass CRC */
+		qdev->gen_crc = gen_crc_stub;
+	}
+
+kfree_user_msg:
+	kfree(user_msg);
+out:
+	return ret;
+}
+
+static void resp_worker(struct work_struct *work)
+{
+	struct resp_work *resp = container_of(work, struct resp_work, work);
+	struct qaic_device *qdev = resp->qdev;
+	struct _msg *msg = resp->buf;
+	struct xfer_queue_elem *elem;
+	struct xfer_queue_elem *i;
+	bool found = false;
+
+	if (msg->hdr.magic_number != MANAGE_MAGIC_NUMBER) {
+		kfree(msg);
+		kfree(resp);
+		return;
+	}
+
+	mutex_lock(&qdev->cntl_mutex);
+	list_for_each_entry_safe(elem, i, &qdev->cntl_xfer_list, list) {
+		if (elem->seq_num == le32_to_cpu(msg->hdr.sequence_number)) {
+			found = true;
+			list_del_init(&elem->list);
+			elem->buf = msg;
+			complete_all(&elem->xfer_done);
+			break;
+		}
+	}
+	mutex_unlock(&qdev->cntl_mutex);
+
+	if (!found)
+		/* request must have timed out, drop packet */
+		kfree(msg);
+
+	kfree(resp);
+}
+
+static void free_wrapper_from_list(struct wrapper_list *wrappers,
+				   struct wrapper_msg *wrapper)
+{
+	bool all_done = false;
+
+	spin_lock(&wrappers->lock);
+	kref_put(&wrapper->ref_count, free_wrapper);
+	all_done = list_empty(&wrappers->list);
+	spin_unlock(&wrappers->lock);
+
+	if (all_done)
+		kfree(wrappers);
+}
+
+void qaic_mhi_ul_xfer_cb(struct mhi_device *mhi_dev,
+			 struct mhi_result *mhi_result)
+{
+	struct _msg *msg = mhi_result->buf_addr;
+	struct wrapper_msg *wrapper = container_of(msg, struct wrapper_msg,
+						   msg);
+
+	free_wrapper_from_list(wrapper->head, wrapper);
+}
+
+void qaic_mhi_dl_xfer_cb(struct mhi_device *mhi_dev,
+			 struct mhi_result *mhi_result)
+{
+	struct qaic_device *qdev = dev_get_drvdata(&mhi_dev->dev);
+	struct _msg *msg = mhi_result->buf_addr;
+	struct resp_work *resp;
+
+	if (mhi_result->transaction_status) {
+		kfree(msg);
+		return;
+	}
+
+	resp = kmalloc(sizeof(*resp), GFP_ATOMIC);
+	if (!resp) {
+		pci_err(qdev->pdev, "dl_xfer_cb alloc fail, dropping message\n");
+		kfree(msg);
+		return;
+	}
+
+	INIT_WORK(&resp->work, resp_worker);
+	resp->qdev = qdev;
+	resp->buf = msg;
+	queue_work(qdev->cntl_wq, &resp->work);
+}
+
+int qaic_control_open(struct qaic_device *qdev)
+{
+	if (!qdev->cntl_ch)
+		return -ENODEV;
+
+	qdev->cntl_lost_buf = false;
+	/*
+	 * By default qaic assumes that the device has CRC enabled.
+	 * Qaic learns whether the device actually has CRC enabled or disabled
+	 * from the device status transaction, which is the first transaction
+	 * performed on the control channel.
+	 *
+	 * Therefore CRC validation of the first device status response is
+	 * skipped (by calling valid_crc_stub) and is instead done later,
+	 * during decoding, if the device has CRC enabled.
+	 * Once qaic knows whether the device has CRC enabled or not, it acts
+	 * accordingly.
+	 */
+	qdev->gen_crc = gen_crc;
+	qdev->valid_crc = valid_crc_stub;
+
+	return mhi_prepare_for_transfer(qdev->cntl_ch);
+}
+
+void qaic_control_close(struct qaic_device *qdev)
+{
+	mhi_unprepare_from_transfer(qdev->cntl_ch);
+}
+
+void qaic_release_usr(struct qaic_device *qdev, struct qaic_user *usr)
+{
+	struct _trans_terminate_to_dev *trans;
+	struct wrapper_list *wrappers;
+	struct wrapper_msg *wrapper;
+	struct _msg *msg;
+	struct _msg *rsp;
+
+	wrappers = alloc_wrapper_list();
+	if (!wrappers)
+		return;
+
+	wrapper = add_wrapper(wrappers, sizeof(*wrapper) + sizeof(*msg) +
+			      sizeof(*trans));
+	if (!wrapper)
+		return;
+
+	msg = &wrapper->msg;
+
+	trans = (struct _trans_terminate_to_dev *)msg->data;
+
+	trans->hdr.type = cpu_to_le32(TRANS_TERMINATE_TO_DEV);
+	trans->hdr.len = cpu_to_le32(sizeof(*trans));
+	trans->handle = cpu_to_le32(usr->handle);
+
+	mutex_lock(&qdev->cntl_mutex);
+	wrapper->len = sizeof(msg->hdr) + sizeof(*trans);
+	msg->hdr.magic_number = MANAGE_MAGIC_NUMBER;
+	msg->hdr.sequence_number = cpu_to_le32(qdev->next_seq_num++);
+	msg->hdr.len = cpu_to_le32(wrapper->len);
+	msg->hdr.count = cpu_to_le32(1);
+	msg->hdr.handle = cpu_to_le32(usr->handle);
+	msg->hdr.padding = cpu_to_le32(0);
+	msg->hdr.crc32 = cpu_to_le32(qdev->gen_crc(wrappers));
+
+	/*
+	 * msg_xfer releases the mutex.
+	 * We don't care about the return value of msg_xfer since we will not
+	 * do anything different based on what happens.
+	 * We ignore pending signals since one will be set if the user is
+	 * killed, and we need to give the device a chance to clean up,
+	 * otherwise DMA may still be in progress when we return.
+	 */
+	rsp = msg_xfer(qdev, wrappers, qdev->next_seq_num - 1, true);
+	if (!IS_ERR(rsp))
+		kfree(rsp);
+	free_wrapper_from_list(wrappers, wrapper);
+}
+
+void wake_all_cntl(struct qaic_device *qdev)
+{
+	struct xfer_queue_elem *elem;
+	struct xfer_queue_elem *i;
+
+	mutex_lock(&qdev->cntl_mutex);
+	list_for_each_entry_safe(elem, i, &qdev->cntl_xfer_list, list) {
+		list_del_init(&elem->list);
+		complete_all(&elem->xfer_done);
+	}
+	mutex_unlock(&qdev->cntl_mutex);
+}
+
+int qaic_data_get_reservation(struct qaic_device *qdev, struct qaic_user *usr,
+			      void *data, u32 *partition_id, u16 *remove)
+{
+	struct _trans_validate_part_from_dev *trans_rsp;
+	struct _trans_validate_part_to_dev *trans_req;
+	struct qaic_part_dev *user_msg;
+	struct wrapper_list *wrappers;
+	struct wrapper_msg *wrapper;
+	struct _msg *msg_req;
+	struct _msg *msg_rsp;
+	size_t msg_rsp_len;
+	int ret = 0;
+
+	user_msg = (struct qaic_part_dev *)data;
+	/* -1 for partition_id is a special value, so check for it */
+	if (user_msg->partition_id == QAIC_NO_PARTITION || user_msg->remove > 1) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	*partition_id = user_msg->partition_id;
+	*remove = user_msg->remove;
+
+	/*
+	 * In case of a remove we do not need to do a fw partition check; the
+	 * right user is validated by the device remove code when the device
+	 * is removed. So, when remove is set to 1, we just copy the
+	 * parameters and return from the call.
+	 */
+	if (*remove)
+		return 0;
+
+	wrappers = alloc_wrapper_list();
+	if (!wrappers)
+		return -ENOMEM;
+
+	wrapper = add_wrapper(wrappers, sizeof(*wrapper) + sizeof(*msg_req) +
+			      sizeof(*trans_req));
+	if (!wrapper) {
+		kfree(wrappers);
+		return -ENOMEM;
+	}
+
+	msg_req = &wrapper->msg;
+
+	trans_req = (struct _trans_validate_part_to_dev *)msg_req->data;
+	trans_req->hdr.type = cpu_to_le32(TRANS_VALIDATE_PARTITION_TO_DEV);
+	trans_req->hdr.len = cpu_to_le32(sizeof(*trans_req));
+	trans_req->part_id = cpu_to_le32(*partition_id);
+
+	mutex_lock(&qdev->cntl_mutex);
+	wrapper->len = sizeof(msg_req->hdr) + sizeof(*trans_req);
+	msg_req->hdr.len = cpu_to_le32(wrapper->len);
+	msg_req->hdr.sequence_number = cpu_to_le32(qdev->next_seq_num++);
+	msg_req->hdr.magic_number = MANAGE_MAGIC_NUMBER;
+	msg_req->hdr.handle = cpu_to_le32(usr->handle);
+	msg_req->hdr.count = cpu_to_le32(1);
+	msg_req->hdr.padding = cpu_to_le32(0);
+	msg_req->hdr.crc32 = cpu_to_le32(qdev->gen_crc(wrappers));
+
+	/*
+	 * msg_xfer releases the mutex
+	 * The msg count will always be 1 in the response
+	 */
+	msg_rsp = msg_xfer(qdev, wrappers, qdev->next_seq_num - 1, false);
+	if (IS_ERR(msg_rsp)) {
+		ret = PTR_ERR(msg_rsp);
+		goto kfree_wrapper;
+	}
+
+	msg_rsp_len = sizeof(msg_rsp->hdr) + sizeof(*trans_rsp);
+	if (le32_to_cpu(msg_rsp->hdr.count) != 1 ||
+	    le32_to_cpu(msg_rsp->hdr.len) < msg_rsp_len) {
+		ret = -EINVAL;
+		goto kfree_msg_rsp;
+	}
+
+	trans_rsp = (struct _trans_validate_part_from_dev *)msg_rsp->data;
+	if (le32_to_cpu(trans_rsp->status))
+		ret = -EPERM;
+
+kfree_msg_rsp:
+	kfree(msg_rsp);
+kfree_wrapper:
+	free_wrapper_from_list(wrappers, wrapper);
+out:
+	return ret;
+}
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v2 4/8] accel/qaic: Add control path
@ 2023-02-06 15:41   ` Jeffrey Hugo
  0 siblings, 0 replies; 68+ messages in thread
From: Jeffrey Hugo @ 2023-02-06 15:41 UTC (permalink / raw)
  To: ogabbay, airlied, daniel, dri-devel
  Cc: jacek.lawrynowicz, stanislaw.gruszka, quic_pkanojiy, quic_carlv,
	quic_ajitpals, linux-doc, linux-arm-msm, Jeffrey Hugo

Add the control path component that talks to the management processor (QSM)
to load workloads onto the AIC100 device.  This implements the KMD portion
of the NNC protocol over the QAIC_CONTROL MHI channel and the
DRM_IOCTL_QAIC_MANAGE IOCTL to userspace.  With this functionality, QAIC
clients are able to load, run, and clean up their workloads on the device
but not interact with the workloads (run inferences).
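
For orientation, here is a minimal userspace sketch of a status query through
DRM_IOCTL_QAIC_MANAGE, mirroring what get_cntl_version() does inside the
driver.  The device node path and the exact field layout of the uapi structs
are assumptions in this sketch; the authoritative definitions live in
include/uapi/drm/qaic_accel.h.

  #include <fcntl.h>
  #include <stdint.h>
  #include <unistd.h>
  #include <sys/ioctl.h>
  #include <drm/qaic_accel.h>

  /* Hypothetical example; error handling trimmed for brevity. */
  static int query_device_status(const char *node)
  {
  	union {
  		struct qaic_manage_trans_status_to_dev to_dev;
  		struct qaic_manage_trans_status_from_dev from_dev;
  	} trans = { 0 };
  	struct qaic_manage_msg msg = { 0 };
  	int fd, ret;

  	trans.to_dev.hdr.type = TRANS_STATUS_FROM_USR;
  	trans.to_dev.hdr.len = sizeof(trans.to_dev.hdr);

  	msg.count = 1;                     /* one transaction in this message */
  	msg.len = sizeof(trans.to_dev);    /* total encoded transaction length */
  	msg.data = (uintptr_t)&trans;      /* assumed to be a u64 user pointer */

  	fd = open(node, O_RDWR);           /* e.g. "/dev/accel/accel0" (assumed) */
  	if (fd < 0)
  		return -1;

  	/*
  	 * On success, or on a device NACK (-ECANCELED from the kernel), the
  	 * response is copied back into the same buffer (trans.from_dev).
  	 */
  	ret = ioctl(fd, DRM_IOCTL_QAIC_MANAGE, &msg);
  	close(fd);
  	return ret;
  }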

Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com>
---
 drivers/accel/qaic/qaic_control.c | 1656 +++++++++++++++++++++++++++++++++++++
 1 file changed, 1656 insertions(+)
 create mode 100644 drivers/accel/qaic/qaic_control.c

diff --git a/drivers/accel/qaic/qaic_control.c b/drivers/accel/qaic/qaic_control.c
new file mode 100644
index 0000000..9f5d5e2
--- /dev/null
+++ b/drivers/accel/qaic/qaic_control.c
@@ -0,0 +1,1656 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+/* Copyright (c) 2019-2021, The Linux Foundation. All rights reserved. */
+/* Copyright (c) 2021-2023 Qualcomm Innovation Center, Inc. All rights reserved. */
+
+#include <asm/byteorder.h>
+#include <linux/completion.h>
+#include <linux/crc32.h>
+#include <linux/delay.h>
+#include <linux/dma-mapping.h>
+#include <linux/kref.h>
+#include <linux/list.h>
+#include <linux/mhi.h>
+#include <linux/mm.h>
+#include <linux/moduleparam.h>
+#include <linux/mutex.h>
+#include <linux/pci.h>
+#include <linux/scatterlist.h>
+#include <linux/types.h>
+#include <linux/uaccess.h>
+#include <linux/workqueue.h>
+#include <linux/wait.h>
+#include <drm/drm_device.h>
+#include <drm/drm_file.h>
+#include <uapi/drm/qaic_accel.h>
+
+#include "qaic.h"
+
+#define MANAGE_MAGIC_NUMBER	 ((__force __le32)0x43494151) /* "QAIC" in little endian */
+#define QAIC_DBC_Q_GAP		   0x100
+#define QAIC_DBC_Q_BUF_ALIGN	   SZ_4K
+#define QAIC_MANAGE_EXT_MSG_LENGTH SZ_64K /* Max DMA message length */
+#define QAIC_WRAPPER_MAX_SIZE      SZ_4K
+#define QAIC_MHI_RETRY_WAIT_MS	   100
+#define QAIC_MHI_RETRY_MAX	   20
+
+static unsigned int control_resp_timeout = 60; /* 60 sec default */
+module_param(control_resp_timeout, uint, 0600);
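+/*
+ * Note: with these permissions the timeout is tunable at runtime by root,
+ * e.g. via /sys/module/qaic/parameters/control_resp_timeout (assuming the
+ * module is built as qaic.ko).
+ */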
+
+struct manage_msg {
+	u32 len;
+	u32 count;
+	u8 data[];
+};
+
+/*
+ * wire encoding structures for the manage protocol.
+ * All fields are little endian on the wire
+ */
+struct _msg_hdr {
+	__le32 crc32; /* crc of everything following this field in the message */
+	__le32 magic_number;
+	__le32 sequence_number;
+	__le32 len; /* length of this message */
+	__le32 count; /* number of transactions in this message */
+	__le32 handle; /* unique id to track the resources consumed */
+	__le32 partition_id; /* partition id for the request (signed) */
+	__le32 padding; /* must be 0 */
+} __packed;
+
+struct _msg {
+	struct _msg_hdr hdr;
+	u8 data[];
+} __packed;
+
+struct _trans_hdr {
+	__le32 type;
+	__le32 len;
+} __packed;
+
+/* Each message sent from the driver to the device is organized as a list of wrapper_msg structs */
+struct wrapper_msg {
+	struct list_head list;
+	struct kref ref_count;
+	u32 len; /* length of data to transfer */
+	struct wrapper_list *head;
+	union {
+		struct _msg msg;
+		struct _trans_hdr trans;
+	};
+};
+
+struct wrapper_list {
+	struct list_head list;
+	spinlock_t lock;
+};
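+
+/*
+ * Lifetime note: every wrapper carries a kref.  msg_xfer() takes an extra
+ * reference per wrapper before queueing it to MHI, and that reference is
+ * dropped from qaic_mhi_ul_xfer_cb() once the uplink transfer completes.
+ * Whoever drops the last reference of the last wrapper (observing the list
+ * become empty under the lock) also frees the wrapper_list container.
+ */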
+
+struct _trans_passthrough {
+	struct _trans_hdr hdr;
+	u8 data[];
+} __packed;
+
+struct _addr_size_pair {
+	__le64 addr;
+	__le64 size;
+} __packed;
+
+struct _trans_dma_xfer {
+	struct _trans_hdr hdr;
+	__le32 tag;
+	__le32 count;
+	__le32 dma_chunk_id;
+	__le32 padding;
+	struct _addr_size_pair data[];
+} __packed;
+
+/* Initiated by device to continue the DMA xfer of a large piece of data */
+struct _trans_dma_xfer_cont {
+	struct _trans_hdr hdr;
+	__le32 dma_chunk_id;
+	__le32 padding;
+	__le64 xferred_size;
+} __packed;
+
+struct _trans_activate_to_dev {
+	struct _trans_hdr hdr;
+	__le64 req_q_addr;
+	__le64 rsp_q_addr;
+	__le32 req_q_size;
+	__le32 rsp_q_size;
+	__le32 buf_len;
+	__le32 options; /* unused, but BIT(16) has meaning to the device */
+} __packed;
+
+struct _trans_activate_from_dev {
+	struct _trans_hdr hdr;
+	__le32 status;
+	__le32 dbc_id;
+	__le64 options; /* unused */
+} __packed;
+
+struct _trans_deactivate_from_dev {
+	struct _trans_hdr hdr;
+	__le32 status;
+	__le32 dbc_id;
+} __packed;
+
+struct _trans_terminate_to_dev {
+	struct _trans_hdr hdr;
+	__le32 handle;
+	__le32 padding;
+} __packed;
+
+struct _trans_terminate_from_dev {
+	struct _trans_hdr hdr;
+	__le32 status;
+	__le32 padding;
+} __packed;
+
+struct _trans_status_to_dev {
+	struct _trans_hdr hdr;
+} __packed;
+
+struct _trans_status_from_dev {
+	struct _trans_hdr hdr;
+	__le16 major;
+	__le16 minor;
+	__le32 status;
+	__le64 status_flags;
+} __packed;
+
+struct _trans_validate_part_to_dev {
+	struct _trans_hdr hdr;
+	__le32 part_id;
+	__le32 padding;
+} __packed;
+
+struct _trans_validate_part_from_dev {
+	struct _trans_hdr hdr;
+	__le32 status;
+	__le32 padding;
+} __packed;
+
+struct xfer_queue_elem {
+	/*
+	 * Node in the list of ongoing transfer requests on the control
+	 * channel. Maintained by the root device struct.
+	 */
+	struct list_head list;
+	/* Sequence number of this transfer request */
+	u32 seq_num;
+	/* This is used to wait on until completion of transfer request */
+	struct completion xfer_done;
+	/* Received data from device */
+	void *buf;
+};
+
+struct dma_xfer {
+	/* Node in list of DMA transfers which is used for cleanup */
+	struct list_head list;
+	/* SG table of memory used for DMA */
+	struct sg_table *sgt;
+	/* Array of pages used for DMA */
+	struct page **page_list;
+	/* Number of pages used for DMA */
+	unsigned long nr_pages;
+};
+
+struct ioctl_resources {
+	/* List of all DMA transfers which is used later for cleanup */
+	struct list_head dma_xfers;
+	/* Base address of request queue which belongs to a DBC */
+	void *buf;
+	/*
+	 * Base bus address of request queue which belongs to a DBC. Response
+	 * queue base bus address can be calculated by adding size of request
+	 * queue to base bus address of request queue.
+	 */
+	dma_addr_t dma_addr;
+	/* Total size of request queue and response queue in bytes */
+	u32 total_size;
+	/* Total number of elements that can be queued in each of request and response queue */
+	u32 nelem;
+	/* Base address of response queue which belongs to a DBC */
+	void *rsp_q_base;
+	/* Status of the NNC message received */
+	u32 status;
+	/* DBC id of the DBC received from device */
+	u32 dbc_id;
+	/*
+	 * DMA transfer request messages can be big in size and it may not be
+	 * possible to send them in one shot. In such cases the messages are
+	 * broken into chunks; this field stores the ID of such a chunk.
+	 */
+	u32 dma_chunk_id;
+	/* Total number of bytes transferred for a DMA xfer request */
+	u64 xferred_dma_size;
+	/* Header of transaction message received from user. Used during DMA xfer request */
+	void *trans_hdr;
+};
+
+struct resp_work {
+	struct work_struct work;
+	struct qaic_device *qdev;
+	void *buf;
+};
+
+/*
+ * Since we're working with little endian messages, it's useful to be able to
+ * increment without filling a whole line with conversions back and forth just
+ * to add one (1) to a message count.
+ */
+static __le32 incr_le32(__le32 val)
+{
+	return cpu_to_le32(le32_to_cpu(val) + 1);
+}
+
+static u32 gen_crc(void *msg)
+{
+	struct wrapper_list *wrappers = msg;
+	struct wrapper_msg *w;
+	u32 crc = ~0;
+
+	list_for_each_entry(w, &wrappers->list, list)
+		crc = crc32(crc, &w->msg, w->len);
+
+	return crc ^ ~0;
+}
+
+static u32 gen_crc_stub(void *msg)
+{
+	return 0;
+}
+
+static bool valid_crc(void *msg)
+{
+	struct _msg_hdr *hdr = msg;
+	bool ret;
+	u32 crc;
+
+	/*
+	 * The output of this algorithm is always converted to the native
+	 * endianness.
+	 */
+	crc = le32_to_cpu(hdr->crc32);
+	hdr->crc32 = 0;
+	ret = (crc32(~0, msg, le32_to_cpu(hdr->len)) ^ ~0) == crc;
+	hdr->crc32 = cpu_to_le32(crc);
+	return ret;
+}
+
+static bool valid_crc_stub(void *msg)
+{
+	return true;
+}
+
+static void free_wrapper(struct kref *ref)
+{
+	struct wrapper_msg *wrapper = container_of(ref, struct wrapper_msg,
+						   ref_count);
+
+	list_del(&wrapper->list);
+	kfree(wrapper);
+}
+
+static void save_dbc_buf(struct qaic_device *qdev,
+			 struct ioctl_resources *resources,
+			 struct qaic_user *usr)
+{
+	u32 dbc_id = resources->dbc_id;
+
+	if (resources->buf) {
+		wait_event_interruptible(qdev->dbc[dbc_id].dbc_release,
+					 !qdev->dbc[dbc_id].in_use);
+		qdev->dbc[dbc_id].req_q_base = resources->buf;
+		qdev->dbc[dbc_id].rsp_q_base = resources->rsp_q_base;
+		qdev->dbc[dbc_id].dma_addr = resources->dma_addr;
+		qdev->dbc[dbc_id].total_size = resources->total_size;
+		qdev->dbc[dbc_id].nelem = resources->nelem;
+		enable_dbc(qdev, dbc_id, usr);
+		qdev->dbc[dbc_id].in_use = true;
+		resources->buf = NULL;
+	}
+}
+
+static void free_dbc_buf(struct qaic_device *qdev,
+			 struct ioctl_resources *resources)
+{
+	if (resources->buf)
+		dma_free_coherent(&qdev->pdev->dev, resources->total_size,
+				  resources->buf, resources->dma_addr);
+	resources->buf = NULL;
+}
+
+static void free_dma_xfers(struct qaic_device *qdev,
+			   struct ioctl_resources *resources)
+{
+	struct dma_xfer *xfer;
+	struct dma_xfer *x;
+	int i;
+
+	list_for_each_entry_safe(xfer, x, &resources->dma_xfers, list) {
+		dma_unmap_sgtable(&qdev->pdev->dev, xfer->sgt, DMA_TO_DEVICE, 0);
+		sg_free_table(xfer->sgt);
+		kfree(xfer->sgt);
+		for (i = 0; i < xfer->nr_pages; ++i)
+			put_page(xfer->page_list[i]);
+		kfree(xfer->page_list);
+		list_del(&xfer->list);
+		kfree(xfer);
+	}
+}
+
+static struct wrapper_msg *add_wrapper(struct wrapper_list *wrappers, u32 size)
+{
+	struct wrapper_msg *w = kzalloc(size, GFP_KERNEL);
+
+	if (!w)
+		return NULL;
+	list_add_tail(&w->list, &wrappers->list);
+	kref_init(&w->ref_count);
+	w->head = wrappers;
+	return w;
+}
+
+static int encode_passthrough(struct qaic_device *qdev, void *trans,
+			      struct wrapper_list *wrappers, u32 *user_len)
+{
+	struct qaic_manage_trans_passthrough *in_trans = trans;
+	struct _trans_passthrough *out_trans;
+	struct wrapper_msg *trans_wrapper;
+	struct wrapper_msg *wrapper;
+	struct _msg *msg;
+	u32 msg_hdr_len;
+
+	wrapper = list_first_entry(&wrappers->list, struct wrapper_msg, list);
+	msg = &wrapper->msg;
+	msg_hdr_len = le32_to_cpu(msg->hdr.len);
+
+	if (in_trans->hdr.len % 8 != 0)
+		return -EINVAL;
+
+	if (msg_hdr_len + in_trans->hdr.len > QAIC_MANAGE_EXT_MSG_LENGTH)
+		return -ENOSPC;
+
+	trans_wrapper = add_wrapper(wrappers,
+				    offsetof(struct wrapper_msg, trans) +
+				    in_trans->hdr.len);
+	if (!trans_wrapper)
+		return -ENOMEM;
+	trans_wrapper->len = in_trans->hdr.len;
+	out_trans = (struct _trans_passthrough *)&trans_wrapper->trans;
+
+	memcpy(out_trans->data, in_trans->data, in_trans->hdr.len - sizeof(in_trans->hdr));
+	msg->hdr.len = cpu_to_le32(msg_hdr_len + in_trans->hdr.len);
+	msg->hdr.count = incr_le32(msg->hdr.count);
+	*user_len += in_trans->hdr.len;
+	out_trans->hdr.type = cpu_to_le32(TRANS_PASSTHROUGH_TO_DEV);
+	out_trans->hdr.len = cpu_to_le32(in_trans->hdr.len);
+
+	return 0;
+}
+
+static int encode_dma(struct qaic_device *qdev, void *trans,
+		      struct wrapper_list *wrappers, u32 *user_len,
+		      struct ioctl_resources *resources,
+		      struct qaic_user *usr)
+{
+	struct qaic_manage_trans_dma_xfer *in_trans = trans;
+	struct _trans_dma_xfer *out_trans;
+	struct wrapper_msg *trans_wrapper;
+	struct wrapper_msg *wrapper;
+	struct _addr_size_pair *asp;
+	unsigned long need_pages;
+	struct scatterlist *last;
+	struct page **page_list;
+	unsigned long nr_pages;
+	struct scatterlist *sg;
+	struct wrapper_msg *w;
+	struct dma_xfer *xfer;
+	struct sg_table *sgt;
+	unsigned int dma_len;
+	u64 dma_chunk_len;
+	struct _msg *msg;
+	u32 msg_hdr_len;
+	void *boundary;
+	int nents_dma;
+	int nents;
+	u32 size;
+	int ret;
+	int i;
+
+	wrapper = list_first_entry(&wrappers->list, struct wrapper_msg, list);
+	msg = &wrapper->msg;
+	msg_hdr_len = le32_to_cpu(msg->hdr.len);
+
+	if (msg_hdr_len > (UINT_MAX - QAIC_MANAGE_EXT_MSG_LENGTH)) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	/* There should be enough space to hold at least one ASP entry. */
+	if (msg_hdr_len + sizeof(*out_trans) + sizeof(*asp) >
+	    QAIC_MANAGE_EXT_MSG_LENGTH) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	if (in_trans->addr + in_trans->size < in_trans->addr ||
+	    !in_trans->size) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	xfer = kmalloc(sizeof(*xfer), GFP_KERNEL);
+	if (!xfer) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	need_pages = DIV_ROUND_UP(in_trans->size + offset_in_page(in_trans->addr +
+				  resources->xferred_dma_size) -
+				  resources->xferred_dma_size, PAGE_SIZE);
+
+	nr_pages = need_pages;
+
+	while (1) {
+		page_list = kmalloc_array(nr_pages, sizeof(*page_list),
+					  GFP_KERNEL | __GFP_NOWARN);
+		if (!page_list) {
+			nr_pages = nr_pages / 2;
+			if (!nr_pages) {
+				ret = -ENOMEM;
+				goto free_resource;
+			}
+		} else {
+			break;
+		}
+	}
+
+	ret = get_user_pages_fast(in_trans->addr + resources->xferred_dma_size,
+				  nr_pages, 0, page_list);
+	if (ret < 0 || ret != nr_pages) {
+		ret = -EFAULT;
+		goto free_page_list;
+	}
+
+	sgt = kmalloc(sizeof(*sgt), GFP_KERNEL);
+	if (!sgt) {
+		ret = -ENOMEM;
+		goto put_pages;
+	}
+
+	ret = sg_alloc_table_from_pages(sgt, page_list, nr_pages,
+					offset_in_page(in_trans->addr +
+						       resources->xferred_dma_size),
+					in_trans->size - resources->xferred_dma_size, GFP_KERNEL);
+	if (ret) {
+		ret = -ENOMEM;
+		goto free_sgt;
+	}
+
+	ret = dma_map_sgtable(&qdev->pdev->dev, sgt, DMA_TO_DEVICE, 0);
+	if (ret)
+		goto free_table;
+
+	nents = sgt->nents;
+	/*
+	 * It turns out several of the iommu drivers don't combine adjacent
+	 * regions, which is really what we expect based on the description of
+	 * dma_map_sgtable(), so let's see if that can be done.  It makes our
+	 * message more efficient.
+	 */
+	last = sgt->sgl;
+	nents_dma = nents;
+	size = QAIC_MANAGE_EXT_MSG_LENGTH - msg_hdr_len - sizeof(*out_trans);
+	for_each_sgtable_sg(sgt, sg, i) {
+		if (sg_dma_address(last) + sg_dma_len(last) !=
+		    sg_dma_address(sg)) {
+			size -= sizeof(*asp);
+			/* Save 1K for possible follow-up transactions. */
+			if (size < SZ_1K) {
+				nents_dma = i;
+				break;
+			}
+		}
+		last = sg;
+	}
+
+	trans_wrapper = add_wrapper(wrappers, QAIC_WRAPPER_MAX_SIZE);
+	if (!trans_wrapper) {
+		ret = -ENOMEM;
+		goto dma_unmap;
+	}
+	out_trans = (struct _trans_dma_xfer *)&trans_wrapper->trans;
+
+	asp = out_trans->data;
+	boundary = (void *)trans_wrapper + QAIC_WRAPPER_MAX_SIZE;
+	size = 0;
+
+	last = sgt->sgl;
+	dma_len = 0;
+	w = trans_wrapper;
+	dma_chunk_len = 0;
+	/* Adjacent DMA entries could be stitched together. */
+	for_each_sg(sgt->sgl, sg, nents_dma, i) {
+		/* hit a discontinuity, finalize segment and start new one */
+		if (sg_dma_address(last) + sg_dma_len(last) !=
+		    sg_dma_address(sg)) {
+			asp->size = cpu_to_le64(dma_len);
+			dma_chunk_len += dma_len;
+			if (dma_len) {
+				asp++;
+				if ((void *)asp + sizeof(*asp) > boundary) {
+					w->len = (void *)asp - (void *)&w->msg;
+					size += w->len;
+					w = add_wrapper(wrappers,
+							QAIC_WRAPPER_MAX_SIZE);
+					if (!w) {
+						ret = -ENOMEM;
+						goto dma_unmap;
+					}
+					boundary = (void *)w +
+						   QAIC_WRAPPER_MAX_SIZE;
+					asp = (struct _addr_size_pair *)&w->msg;
+				}
+			}
+			dma_len = 0;
+			asp->addr = cpu_to_le64(sg_dma_address(sg));
+		}
+		dma_len += sg_dma_len(sg);
+		last = sg;
+	}
+	/* finalize the last segment */
+	asp->size = cpu_to_le64(dma_len);
+	w->len = (void *)asp + sizeof(*asp) - (void *)&w->msg;
+	size += w->len;
+
+	msg->hdr.len = cpu_to_le32(msg_hdr_len + size);
+	msg->hdr.count = incr_le32(msg->hdr.count);
+
+	out_trans->hdr.type = cpu_to_le32(TRANS_DMA_XFER_TO_DEV);
+	out_trans->hdr.len = cpu_to_le32(size);
+	out_trans->tag = cpu_to_le32(in_trans->tag);
+	out_trans->count = cpu_to_le32((size - sizeof(*out_trans)) / sizeof(*asp));
+	dma_chunk_len += dma_len;
+
+	*user_len += in_trans->hdr.len;
+
+	if (resources->dma_chunk_id) {
+		out_trans->dma_chunk_id = cpu_to_le32(resources->dma_chunk_id);
+	} else if (need_pages > nr_pages || nents_dma < nents) {
+		while (resources->dma_chunk_id == 0)
+			resources->dma_chunk_id =
+				atomic_inc_return(&usr->chunk_id);
+
+		out_trans->dma_chunk_id = cpu_to_le32(resources->dma_chunk_id);
+	}
+	resources->xferred_dma_size += dma_chunk_len;
+	resources->trans_hdr = trans;
+
+	xfer->sgt = sgt;
+	xfer->page_list = page_list;
+	xfer->nr_pages = nr_pages;
+	list_add(&xfer->list, &resources->dma_xfers);
+	return 0;
+
+dma_unmap:
+	dma_unmap_sgtable(&qdev->pdev->dev, sgt, DMA_TO_DEVICE, 0);
+free_table:
+	sg_free_table(sgt);
+free_sgt:
+	kfree(sgt);
+put_pages:
+	for (i = 0; i < nr_pages; ++i)
+		put_page(page_list[i]);
+free_page_list:
+	kfree(page_list);
+free_resource:
+	kfree(xfer);
+out:
+	return ret;
+}
+
+static int encode_activate(struct qaic_device *qdev, void *trans,
+			   struct wrapper_list *wrappers,
+			   u32 *user_len,
+			   struct ioctl_resources *resources)
+{
+	struct qaic_manage_trans_activate_to_dev *in_trans = trans;
+	struct _trans_activate_to_dev *out_trans;
+	struct wrapper_msg *trans_wrapper;
+	struct wrapper_msg *wrapper;
+	dma_addr_t dma_addr;
+	struct _msg *msg;
+	u32 msg_hdr_len;
+	void *buf;
+	u32 nelem;
+	u32 size;
+	int ret;
+
+	wrapper = list_first_entry(&wrappers->list, struct wrapper_msg, list);
+	msg = &wrapper->msg;
+	msg_hdr_len = le32_to_cpu(msg->hdr.len);
+
+	if (msg_hdr_len + sizeof(*out_trans) > QAIC_MANAGE_MAX_MSG_LENGTH)
+		return -ENOSPC;
+
+	if (!in_trans->queue_size)
+		return -EINVAL;
+
+	if (in_trans->pad)
+		return -EINVAL;
+
+	nelem = in_trans->queue_size;
+	size = (get_dbc_req_elem_size() + get_dbc_rsp_elem_size()) * nelem;
+	if (size / nelem != get_dbc_req_elem_size() + get_dbc_rsp_elem_size())
+		return -EINVAL;
+
+	if (size + QAIC_DBC_Q_GAP + QAIC_DBC_Q_BUF_ALIGN < size)
+		return -EINVAL;
+
+	size = ALIGN((size + QAIC_DBC_Q_GAP), QAIC_DBC_Q_BUF_ALIGN);
+
+	buf = dma_alloc_coherent(&qdev->pdev->dev, size, &dma_addr, GFP_KERNEL);
+	if (!buf)
+		return -ENOMEM;
+
+	trans_wrapper = add_wrapper(wrappers,
+				    offsetof(struct wrapper_msg, trans) +
+				    sizeof(*out_trans));
+	if (!trans_wrapper) {
+		ret = -ENOMEM;
+		goto free_dma;
+	}
+	trans_wrapper->len = sizeof(*out_trans);
+	out_trans = (struct _trans_activate_to_dev *)&trans_wrapper->trans;
+
+	out_trans->hdr.type = cpu_to_le32(TRANS_ACTIVATE_TO_DEV);
+	out_trans->hdr.len = cpu_to_le32(sizeof(*out_trans));
+	out_trans->buf_len = cpu_to_le32(size);
+	out_trans->req_q_addr = cpu_to_le64(dma_addr);
+	out_trans->req_q_size = cpu_to_le32(nelem);
+	out_trans->rsp_q_addr = cpu_to_le64(dma_addr + size - nelem *
+							get_dbc_rsp_elem_size());
+	out_trans->rsp_q_size = cpu_to_le32(nelem);
+	out_trans->options = cpu_to_le32(in_trans->options);
+
+	*user_len += in_trans->hdr.len;
+	msg->hdr.len = cpu_to_le32(msg_hdr_len + sizeof(*out_trans));
+	msg->hdr.count = incr_le32(msg->hdr.count);
+
+	resources->buf = buf;
+	resources->dma_addr = dma_addr;
+	resources->total_size = size;
+	resources->nelem = nelem;
+	resources->rsp_q_base = buf + size - nelem * get_dbc_rsp_elem_size();
+	return 0;
+
+free_dma:
+	dma_free_coherent(&qdev->pdev->dev, size, buf, dma_addr);
+	return ret;
+}
+
+static int encode_deactivate(struct qaic_device *qdev, void *trans,
+			     u32 *user_len, struct qaic_user *usr)
+{
+	struct qaic_manage_trans_deactivate *in_trans = trans;
+
+	if (in_trans->dbc_id >= qdev->num_dbc || in_trans->pad)
+		return -EINVAL;
+
+	*user_len += in_trans->hdr.len;
+
+	return disable_dbc(qdev, in_trans->dbc_id, usr);
+}
+
+static int encode_status(struct qaic_device *qdev, void *trans,
+			 struct wrapper_list *wrappers,
+			 u32 *user_len)
+{
+	struct qaic_manage_trans_status_to_dev *in_trans = trans;
+	struct _trans_status_to_dev *out_trans;
+	struct wrapper_msg *trans_wrapper;
+	struct wrapper_msg *wrapper;
+	struct _msg *msg;
+	u32 msg_hdr_len;
+
+	wrapper = list_first_entry(&wrappers->list, struct wrapper_msg, list);
+	msg = &wrapper->msg;
+	msg_hdr_len = le32_to_cpu(msg->hdr.len);
+
+	if (msg_hdr_len + in_trans->hdr.len > QAIC_MANAGE_MAX_MSG_LENGTH)
+		return -ENOSPC;
+
+	trans_wrapper = add_wrapper(wrappers, sizeof(*trans_wrapper));
+	if (!trans_wrapper)
+		return -ENOMEM;
+
+	trans_wrapper->len = sizeof(*out_trans);
+	out_trans = (struct _trans_status_to_dev *)&trans_wrapper->trans;
+
+	out_trans->hdr.type = cpu_to_le32(TRANS_STATUS_TO_DEV);
+	out_trans->hdr.len = cpu_to_le32(in_trans->hdr.len);
+	msg->hdr.len = cpu_to_le32(msg_hdr_len + in_trans->hdr.len);
+	msg->hdr.count = incr_le32(msg->hdr.count);
+	*user_len += in_trans->hdr.len;
+
+	return 0;
+}
+
+static int encode_message(struct qaic_device *qdev,
+			  struct manage_msg *user_msg,
+			  struct wrapper_list *wrappers,
+			  struct ioctl_resources *resources,
+			  struct qaic_user *usr)
+{
+	struct qaic_manage_trans_hdr *trans_hdr;
+	struct wrapper_msg *wrapper;
+	struct _msg *msg;
+	u32 user_len = 0;
+	int ret;
+	int i;
+
+	if (!user_msg->count) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	wrapper = list_first_entry(&wrappers->list, struct wrapper_msg, list);
+	msg = &wrapper->msg;
+
+	msg->hdr.len = cpu_to_le32(sizeof(msg->hdr));
+
+	if (resources->dma_chunk_id) {
+		ret = encode_dma(qdev, resources->trans_hdr, wrappers,
+				 &user_len, resources, usr);
+		msg->hdr.count = cpu_to_le32(1);
+		goto out;
+	}
+
+	for (i = 0; i < user_msg->count; ++i) {
+		if (user_len >= user_msg->len) {
+			ret = -EINVAL;
+			break;
+		}
+		trans_hdr = (struct qaic_manage_trans_hdr *)
+						(user_msg->data + user_len);
+		if (user_len + trans_hdr->len > user_msg->len) {
+			ret = -EINVAL;
+			break;
+		}
+
+		switch (trans_hdr->type) {
+		case TRANS_PASSTHROUGH_FROM_USR:
+			ret = encode_passthrough(qdev, trans_hdr, wrappers,
+						 &user_len);
+			break;
+		case TRANS_DMA_XFER_FROM_USR:
+			ret = encode_dma(qdev, trans_hdr, wrappers, &user_len,
+					 resources, usr);
+			break;
+		case TRANS_ACTIVATE_FROM_USR:
+			ret = encode_activate(qdev, trans_hdr, wrappers,
+					      &user_len, resources);
+			break;
+		case TRANS_DEACTIVATE_FROM_USR:
+			ret = encode_deactivate(qdev, trans_hdr, &user_len, usr);
+			break;
+		case TRANS_STATUS_FROM_USR:
+			ret = encode_status(qdev, trans_hdr, wrappers,
+					    &user_len);
+			break;
+		default:
+			ret = -EINVAL;
+			break;
+		}
+
+		if (ret)
+			break;
+	}
+
+	if (user_len != user_msg->len)
+		ret = -EINVAL;
+out:
+	if (ret) {
+		free_dma_xfers(qdev, resources);
+		free_dbc_buf(qdev, resources);
+		return ret;
+	}
+
+	return 0;
+}
+
+static int decode_passthrough(struct qaic_device *qdev, void *trans,
+			      struct manage_msg *user_msg, u32 *msg_len)
+{
+	struct _trans_passthrough *in_trans = trans;
+	struct qaic_manage_trans_passthrough *out_trans;
+	u32 len;
+
+	out_trans = (void *)user_msg->data + user_msg->len;
+
+	len = le32_to_cpu(in_trans->hdr.len);
+	if (len % 8 != 0)
+		return -EINVAL;
+
+	if (user_msg->len + len > QAIC_MANAGE_MAX_MSG_LENGTH)
+		return -ENOSPC;
+
+	memcpy(out_trans->data, in_trans->data, len - sizeof(in_trans->hdr));
+	user_msg->len += len;
+	*msg_len += len;
+	out_trans->hdr.type = le32_to_cpu(in_trans->hdr.type);
+	out_trans->hdr.len = len;
+
+	return 0;
+}
+
+static int decode_activate(struct qaic_device *qdev, void *trans,
+			   struct manage_msg *user_msg, u32 *msg_len,
+			   struct ioctl_resources *resources,
+			   struct qaic_user *usr)
+{
+	struct _trans_activate_from_dev *in_trans = trans;
+	struct qaic_manage_trans_activate_from_dev *out_trans;
+	u32 len;
+
+	out_trans = (void *)user_msg->data + user_msg->len;
+
+	len = le32_to_cpu(in_trans->hdr.len);
+	if (user_msg->len + len > QAIC_MANAGE_MAX_MSG_LENGTH)
+		return -ENOSPC;
+
+	user_msg->len += len;
+	*msg_len += len;
+	out_trans->hdr.type = le32_to_cpu(in_trans->hdr.type);
+	out_trans->hdr.len = len;
+	out_trans->status = le32_to_cpu(in_trans->status);
+	out_trans->dbc_id = le32_to_cpu(in_trans->dbc_id);
+	out_trans->options = le64_to_cpu(in_trans->options);
+
+	if (!resources->buf)
+		/* how did we get an activate response without a request? */
+		return -EINVAL;
+
+	if (out_trans->dbc_id >= qdev->num_dbc)
+		/*
+		 * The device assigned an invalid resource, which should never
+		 * happen.  Return an error so the user can try to recover.
+		 */
+		return -ENODEV;
+
+	if (out_trans->status)
+		/*
+		 * Allocating resources failed on the device side. This is not
+		 * expected behavior; the user is expected to handle this
+		 * situation.
+		 */
+		return -ECANCELED;
+
+	resources->status = out_trans->status;
+	resources->dbc_id = out_trans->dbc_id;
+	save_dbc_buf(qdev, resources, usr);
+
+	return 0;
+}
+
+static int decode_deactivate(struct qaic_device *qdev, void *trans,
+			     u32 *msg_len, struct qaic_user *usr)
+{
+	struct _trans_deactivate_from_dev *in_trans = trans;
+	u32 dbc_id = le32_to_cpu(in_trans->dbc_id);
+	u32 status = le32_to_cpu(in_trans->status);
+
+	if (dbc_id >= qdev->num_dbc)
+		/*
+		 * The device assigned an invalid resource, which should never
+		 * happen.  Inject an error so the user can try to recover.
+		 */
+		return -ENODEV;
+
+	if (status) {
+		/*
+		 * Releasing resources failed on the device side, which puts
+		 * us in a bind since they may still be in use, so enable the
+		 * dbc. User is expected to retry deactivation.
+		 */
+		enable_dbc(qdev, dbc_id, usr);
+		return -ECANCELED;
+	}
+
+	release_dbc(qdev, dbc_id);
+	*msg_len += sizeof(*in_trans);
+
+	return 0;
+}
+
+static int decode_status(struct qaic_device *qdev, void *trans,
+			 struct manage_msg *user_msg, u32 *user_len,
+			 struct _msg *msg)
+{
+	struct _trans_status_from_dev *in_trans = trans;
+	struct qaic_manage_trans_status_from_dev *out_trans;
+	u32 len;
+
+	out_trans = (void *)user_msg->data + user_msg->len;
+
+	len = le32_to_cpu(in_trans->hdr.len);
+	if (user_msg->len + len > QAIC_MANAGE_MAX_MSG_LENGTH)
+		return -ENOSPC;
+
+	out_trans->hdr.type = TRANS_STATUS_FROM_DEV;
+	out_trans->hdr.len = len;
+	out_trans->major = le16_to_cpu(in_trans->major);
+	out_trans->minor = le16_to_cpu(in_trans->minor);
+	out_trans->status_flags = le64_to_cpu(in_trans->status_flags);
+	out_trans->status = le32_to_cpu(in_trans->status);
+	*user_len += le32_to_cpu(in_trans->hdr.len);
+	user_msg->len += len;
+
+	if (out_trans->status)
+		return -ECANCELED;
+	if (out_trans->status_flags & BIT(0) && !valid_crc(msg))
+		return -EPIPE;
+
+	return 0;
+}
+
+static int decode_message(struct qaic_device *qdev,
+			  struct manage_msg *user_msg, struct _msg *msg,
+			  struct ioctl_resources *resources,
+			  struct qaic_user *usr)
+{
+	struct _trans_hdr *trans_hdr;
+	u32 msg_len = 0;
+	u32 msg_hdr_len = le32_to_cpu(msg->hdr.len);
+	int ret;
+	int i;
+
+	if (msg_hdr_len > QAIC_MANAGE_MAX_MSG_LENGTH)
+		return -EINVAL;
+
+	user_msg->len = 0;
+	user_msg->count = le32_to_cpu(msg->hdr.count);
+
+	for (i = 0; i < user_msg->count; ++i) {
+		trans_hdr = (struct _trans_hdr *)(msg->data + msg_len);
+		if (msg_len + le32_to_cpu(trans_hdr->len) > msg_hdr_len)
+			return -EINVAL;
+
+		switch (le32_to_cpu(trans_hdr->type)) {
+		case TRANS_PASSTHROUGH_FROM_DEV:
+			ret = decode_passthrough(qdev, trans_hdr, user_msg,
+						 &msg_len);
+			break;
+		case TRANS_ACTIVATE_FROM_DEV:
+			ret = decode_activate(qdev, trans_hdr, user_msg,
+					      &msg_len, resources, usr);
+			break;
+		case TRANS_DEACTIVATE_FROM_DEV:
+			ret = decode_deactivate(qdev, trans_hdr, &msg_len, usr);
+			break;
+		case TRANS_STATUS_FROM_DEV:
+			ret = decode_status(qdev, trans_hdr, user_msg,
+					    &msg_len, msg);
+			break;
+		default:
+			return -EINVAL;
+		}
+
+		if (ret)
+			return ret;
+	}
+
+	if (msg_len != (msg_hdr_len - sizeof(msg->hdr)))
+		return -EINVAL;
+
+	return 0;
+}
+
+static void *msg_xfer(struct qaic_device *qdev, struct wrapper_list *wrappers,
+		      u32 seq_num, bool ignore_signal)
+{
+	struct xfer_queue_elem elem;
+	struct wrapper_msg *w;
+	struct _msg *out_buf;
+	int retry_count;
+	long ret;
+
+	if (qdev->in_reset) {
+		mutex_unlock(&qdev->cntl_mutex);
+		return ERR_PTR(-ENODEV);
+	}
+
+	elem.seq_num = seq_num;
+	elem.buf = NULL;
+	init_completion(&elem.xfer_done);
+	if (likely(!qdev->cntl_lost_buf)) {
+		/*
+		 * The max size of request to device is QAIC_MANAGE_EXT_MSG_LENGTH.
+		 * The max size of response from device is QAIC_MANAGE_MAX_MSG_LENGTH.
+		 */
+		out_buf = kmalloc(QAIC_MANAGE_MAX_MSG_LENGTH, GFP_KERNEL);
+		if (!out_buf) {
+			mutex_unlock(&qdev->cntl_mutex);
+			return ERR_PTR(-ENOMEM);
+		}
+
+		ret = mhi_queue_buf(qdev->cntl_ch, DMA_FROM_DEVICE,
+				    out_buf, QAIC_MANAGE_MAX_MSG_LENGTH,
+				    MHI_EOT);
+		if (ret) {
+			mutex_unlock(&qdev->cntl_mutex);
+			return ERR_PTR(ret);
+		}
+	} else {
+		/*
+		 * We lost a buffer because we queued a recv buf, but then
+		 * queuing the corresponding tx buf failed.  To try to avoid
+		 * a memory leak, let's reclaim it and use it for this
+		 * transaction.
+		 */
+		qdev->cntl_lost_buf = false;
+	}
+
+	list_for_each_entry(w, &wrappers->list, list) {
+		kref_get(&w->ref_count);
+		retry_count = 0;
+retry:
+		ret = mhi_queue_buf(qdev->cntl_ch, DMA_TO_DEVICE, &w->msg,
+				    w->len,
+				    list_is_last(&w->list, &wrappers->list) ?
+						MHI_EOT : MHI_CHAIN);
+		if (ret) {
+			if (ret == -EAGAIN &&
+			    retry_count++ < QAIC_MHI_RETRY_MAX) {
+				msleep_interruptible(QAIC_MHI_RETRY_WAIT_MS);
+				if (!signal_pending(current))
+					goto retry;
+			}
+
+			qdev->cntl_lost_buf = true;
+			kref_put(&w->ref_count, free_wrapper);
+			mutex_unlock(&qdev->cntl_mutex);
+			return ERR_PTR(ret);
+		}
+	}
+
+	list_add_tail(&elem.list, &qdev->cntl_xfer_list);
+	mutex_unlock(&qdev->cntl_mutex);
+
+	if (ignore_signal)
+		ret = wait_for_completion_timeout(&elem.xfer_done,
+						  control_resp_timeout * HZ);
+	else
+		ret = wait_for_completion_interruptible_timeout(&elem.xfer_done,
+								control_resp_timeout * HZ);
+	/*
+	 * not using _interruptible here because we have to clean up or we'll
+	 * likely cause memory corruption
+	 */
+	mutex_lock(&qdev->cntl_mutex);
+	if (!list_empty(&elem.list))
+		list_del(&elem.list);
+	if (!ret && !elem.buf)
+		ret = -ETIMEDOUT;
+	else if (ret > 0 && !elem.buf)
+		ret = -EIO;
+	mutex_unlock(&qdev->cntl_mutex);
+
+	if (ret < 0) {
+		kfree(elem.buf);
+		return ERR_PTR(ret);
+	} else if (!qdev->valid_crc(elem.buf)) {
+		kfree(elem.buf);
+		return ERR_PTR(-EPIPE);
+	}
+
+	return elem.buf;
+}
+
+/* Add a transaction to abort the outstanding DMA continuation */
+static int abort_dma_cont(struct qaic_device *qdev,
+			  struct wrapper_list *wrappers, u32 dma_chunk_id)
+{
+	struct _trans_dma_xfer *out_trans;
+	u32 size = sizeof(*out_trans);
+	struct wrapper_msg *wrapper;
+	struct wrapper_msg *w;
+	struct _msg *msg;
+
+	wrapper = list_first_entry(&wrappers->list, struct wrapper_msg, list);
+	msg = &wrapper->msg;
+
+	/* Remove all but the first wrapper which has the msg header */
+	list_for_each_entry_safe(wrapper, w, &wrappers->list, list)
+		if (!list_is_first(&wrapper->list, &wrappers->list))
+			kref_put(&wrapper->ref_count, free_wrapper);
+
+	wrapper = add_wrapper(wrappers,
+			      offsetof(struct wrapper_msg, trans) + sizeof(*out_trans));
+
+	if (!wrapper)
+		return -ENOMEM;
+
+	out_trans = (struct _trans_dma_xfer *)&wrapper->trans;
+	out_trans->hdr.type = cpu_to_le32(TRANS_DMA_XFER_TO_DEV);
+	out_trans->hdr.len = cpu_to_le32(size);
+	out_trans->tag = cpu_to_le32(0);
+	out_trans->count = cpu_to_le32(0);
+	out_trans->dma_chunk_id = cpu_to_le32(dma_chunk_id);
+
+	msg->hdr.len = cpu_to_le32(size + sizeof(*msg));
+	msg->hdr.count = cpu_to_le32(1);
+	wrapper->len = size;
+
+	return 0;
+}
+
+static struct wrapper_list *alloc_wrapper_list(void)
+{
+	struct wrapper_list *wrappers;
+
+	wrappers = kmalloc(sizeof(*wrappers), GFP_KERNEL);
+	if (!wrappers)
+		return NULL;
+	INIT_LIST_HEAD(&wrappers->list);
+	spin_lock_init(&wrappers->lock);
+
+	return wrappers;
+}
+
+static int __qaic_manage(struct qaic_device *qdev, struct qaic_user *usr,
+			 struct manage_msg *user_msg,
+			 struct ioctl_resources *resources,
+			 struct _msg **rsp)
+{
+	struct wrapper_list *wrappers;
+	struct wrapper_msg *wrapper;
+	struct wrapper_msg *w;
+	bool all_done = false;
+	struct _msg *msg;
+	int ret;
+
+	wrappers = alloc_wrapper_list();
+	if (!wrappers)
+		return -ENOMEM;
+
+	wrapper = add_wrapper(wrappers, sizeof(*wrapper));
+	if (!wrapper) {
+		kfree(wrappers);
+		return -ENOMEM;
+	}
+
+	msg = &wrapper->msg;
+	wrapper->len = sizeof(*msg);
+
+	ret = encode_message(qdev, user_msg, wrappers, resources, usr);
+	if (ret && resources->dma_chunk_id)
+		ret = abort_dma_cont(qdev, wrappers, resources->dma_chunk_id);
+	if (ret)
+		goto encode_failed;
+
+	ret = mutex_lock_interruptible(&qdev->cntl_mutex);
+	if (ret)
+		goto lock_failed;
+
+	msg->hdr.magic_number = MANAGE_MAGIC_NUMBER;
+	msg->hdr.sequence_number = cpu_to_le32(qdev->next_seq_num++);
+
+	if (usr) {
+		msg->hdr.handle = cpu_to_le32(usr->handle);
+		msg->hdr.partition_id = cpu_to_le32(usr->qddev->partition_id);
+	} else {
+		msg->hdr.handle = 0;
+		msg->hdr.partition_id = cpu_to_le32(QAIC_NO_PARTITION);
+	}
+
+	msg->hdr.padding = cpu_to_le32(0);
+	msg->hdr.crc32 = cpu_to_le32(qdev->gen_crc(wrappers));
+
+	/* msg_xfer releases the mutex */
+	*rsp = msg_xfer(qdev, wrappers, qdev->next_seq_num - 1, false);
+	if (IS_ERR(*rsp))
+		ret = PTR_ERR(*rsp);
+
+lock_failed:
+	free_dma_xfers(qdev, resources);
+encode_failed:
+	spin_lock(&wrappers->lock);
+	list_for_each_entry_safe(wrapper, w, &wrappers->list, list)
+		kref_put(&wrapper->ref_count, free_wrapper);
+	all_done = list_empty(&wrappers->list);
+	spin_unlock(&wrappers->lock);
+	if (all_done)
+		kfree(wrappers);
+
+	return ret;
+}
+
+static int qaic_manage(struct qaic_device *qdev, struct qaic_user *usr,
+		       struct manage_msg *user_msg)
+{
+	struct _trans_dma_xfer_cont *dma_cont = NULL;
+	struct ioctl_resources resources;
+	struct _msg *rsp = NULL;
+	int ret;
+
+	memset(&resources, 0, sizeof(struct ioctl_resources));
+
+	INIT_LIST_HEAD(&resources.dma_xfers);
+
+	if (user_msg->len > QAIC_MANAGE_MAX_MSG_LENGTH ||
+	    user_msg->count > QAIC_MANAGE_MAX_MSG_LENGTH / sizeof(struct qaic_manage_trans_hdr))
+		return -EINVAL;
+
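+	/*
+	 * Large DMA transfers are broken into chunks.  If the device answers
+	 * with a single TRANS_DMA_XFER_CONT transaction acknowledging what
+	 * has been transferred so far, loop back and send the next chunk of
+	 * the same user request.
+	 */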
+dma_xfer_continue:
+	ret = __qaic_manage(qdev, usr, user_msg, &resources, &rsp);
+	if (ret)
+		return ret;
+	/* dma_cont should be the only transaction if present */
+	if (le32_to_cpu(rsp->hdr.count) == 1) {
+		dma_cont = (struct _trans_dma_xfer_cont *)rsp->data;
+		if (le32_to_cpu(dma_cont->hdr.type) != TRANS_DMA_XFER_CONT)
+			dma_cont = NULL;
+	}
+	if (dma_cont) {
+		if (le32_to_cpu(dma_cont->dma_chunk_id) == resources.dma_chunk_id &&
+		    le64_to_cpu(dma_cont->xferred_size) == resources.xferred_dma_size) {
+			kfree(rsp);
+			goto dma_xfer_continue;
+		}
+
+		ret = -EINVAL;
+		goto dma_cont_failed;
+	}
+
+	ret = decode_message(qdev, user_msg, rsp, &resources, usr);
+
+dma_cont_failed:
+	free_dbc_buf(qdev, &resources);
+	kfree(rsp);
+	return ret;
+}
+
+int qaic_manage_ioctl(struct drm_device *dev, void *data,
+		      struct drm_file *file_priv)
+{
+	struct qaic_manage_msg *user_msg;
+	struct qaic_device *qdev;
+	struct manage_msg *msg;
+	struct qaic_user *usr;
+	u8 __user *user_data;
+	int qdev_rcu_id;
+	int usr_rcu_id;
+	int ret;
+
+	usr = file_priv->driver_priv;
+
+	usr_rcu_id = srcu_read_lock(&usr->qddev_lock);
+	if (!usr->qddev) {
+		srcu_read_unlock(&usr->qddev_lock, usr_rcu_id);
+		return -ENODEV;
+	}
+
+	qdev = usr->qddev->qdev;
+
+	qdev_rcu_id = srcu_read_lock(&qdev->dev_lock);
+	if (qdev->in_reset) {
+		srcu_read_unlock(&qdev->dev_lock, qdev_rcu_id);
+		srcu_read_unlock(&usr->qddev_lock, usr_rcu_id);
+		return -ENODEV;
+	}
+
+	user_msg = data;
+
+	if (user_msg->len > QAIC_MANAGE_MAX_MSG_LENGTH) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	msg = kzalloc(QAIC_MANAGE_MAX_MSG_LENGTH + sizeof(*msg), GFP_KERNEL);
+	if (!msg) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	msg->len = user_msg->len;
+	msg->count = user_msg->count;
+
+	user_data = u64_to_user_ptr(user_msg->data);
+
+	if (copy_from_user(msg->data, user_data, user_msg->len)) {
+		ret = -EFAULT;
+		goto free_msg;
+	}
+
+	ret = qaic_manage(qdev, usr, msg);
+
+	/*
+	 * If qaic_manage() is successful then we copy the message back to
+	 * userspace memory, but we make an exception for -ECANCELED.
+	 * -ECANCELED means that the device has NACKed the message with a
+	 * status error code which userspace would like to know about.
+	 */
+	if (ret == -ECANCELED || !ret) {
+		if (copy_to_user(user_data, msg->data, msg->len)) {
+			ret = -EFAULT;
+		} else {
+			user_msg->len = msg->len;
+			user_msg->count = msg->count;
+		}
+	}
+
+free_msg:
+	kfree(msg);
+out:
+	srcu_read_unlock(&qdev->dev_lock, qdev_rcu_id);
+	srcu_read_unlock(&usr->qddev_lock, usr_rcu_id);
+	return ret;
+}
+
+int get_cntl_version(struct qaic_device *qdev, struct qaic_user *usr,
+		     u16 *major, u16 *minor)
+{
+	int ret;
+	struct manage_msg *user_msg;
+	struct qaic_manage_trans_status_to_dev *status_query;
+	struct qaic_manage_trans_status_from_dev *status_result;
+
+	user_msg = kmalloc(sizeof(*user_msg) + sizeof(*status_result), GFP_KERNEL);
+	if (!user_msg) {
+		ret = -ENOMEM;
+		goto out;
+	}
+	user_msg->len = sizeof(*status_query);
+	user_msg->count = 1;
+
+	status_query = (struct qaic_manage_trans_status_to_dev *)user_msg->data;
+	status_query->hdr.type = TRANS_STATUS_FROM_USR;
+	status_query->hdr.len = sizeof(status_query->hdr);
+
+	ret = qaic_manage(qdev, usr, user_msg);
+	if (ret)
+		goto kfree_user_msg;
+	status_result =
+		(struct qaic_manage_trans_status_from_dev *)user_msg->data;
+	*major = status_result->major;
+	*minor = status_result->minor;
+
+	if (status_result->status_flags & BIT(0)) { /* device is using CRC */
+		/* By default qdev->gen_crc is programmed to generate CRC */
+		qdev->valid_crc = valid_crc;
+	} else {
+		/* By default qdev->valid_crc is programmed to bypass CRC */
+		qdev->gen_crc = gen_crc_stub;
+	}
+
+kfree_user_msg:
+	kfree(user_msg);
+out:
+	return ret;
+}
+
+static void resp_worker(struct work_struct *work)
+{
+	struct resp_work *resp = container_of(work, struct resp_work, work);
+	struct qaic_device *qdev = resp->qdev;
+	struct _msg *msg = resp->buf;
+	struct xfer_queue_elem *elem;
+	struct xfer_queue_elem *i;
+	bool found = false;
+
+	if (msg->hdr.magic_number != MANAGE_MAGIC_NUMBER) {
+		kfree(msg);
+		kfree(resp);
+		return;
+	}
+
+	mutex_lock(&qdev->cntl_mutex);
+	list_for_each_entry_safe(elem, i, &qdev->cntl_xfer_list, list) {
+		if (elem->seq_num == le32_to_cpu(msg->hdr.sequence_number)) {
+			found = true;
+			list_del_init(&elem->list);
+			elem->buf = msg;
+			complete_all(&elem->xfer_done);
+			break;
+		}
+	}
+	mutex_unlock(&qdev->cntl_mutex);
+
+	if (!found)
+		/* request must have timed out, drop packet */
+		kfree(msg);
+
+	kfree(resp);
+}
+
+static void free_wrapper_from_list(struct wrapper_list *wrappers,
+				   struct wrapper_msg *wrapper)
+{
+	bool all_done = false;
+
+	spin_lock(&wrappers->lock);
+	kref_put(&wrapper->ref_count, free_wrapper);
+	all_done = list_empty(&wrappers->list);
+	spin_unlock(&wrappers->lock);
+
+	if (all_done)
+		kfree(wrappers);
+}
+
+void qaic_mhi_ul_xfer_cb(struct mhi_device *mhi_dev,
+			 struct mhi_result *mhi_result)
+{
+	struct _msg *msg = mhi_result->buf_addr;
+	struct wrapper_msg *wrapper = container_of(msg, struct wrapper_msg,
+						   msg);
+
+	free_wrapper_from_list(wrapper->head, wrapper);
+}
+
+void qaic_mhi_dl_xfer_cb(struct mhi_device *mhi_dev,
+			 struct mhi_result *mhi_result)
+{
+	struct qaic_device *qdev = dev_get_drvdata(&mhi_dev->dev);
+	struct _msg *msg = mhi_result->buf_addr;
+	struct resp_work *resp;
+
+	if (mhi_result->transaction_status) {
+		kfree(msg);
+		return;
+	}
+
+	resp = kmalloc(sizeof(*resp), GFP_ATOMIC);
+	if (!resp) {
+		pci_err(qdev->pdev, "dl_xfer_cb alloc fail, dropping message\n");
+		kfree(msg);
+		return;
+	}
+
+	INIT_WORK(&resp->work, resp_worker);
+	resp->qdev = qdev;
+	resp->buf = msg;
+	queue_work(qdev->cntl_wq, &resp->work);
+}
+
+int qaic_control_open(struct qaic_device *qdev)
+{
+	if (!qdev->cntl_ch)
+		return -ENODEV;
+
+	qdev->cntl_lost_buf = false;
+	/*
+	 * By default qaic assumes that the device has CRC enabled.
+	 * Qaic learns whether the device actually has CRC enabled or disabled
+	 * during the device status transaction, which is the first transaction
+	 * performed on the control channel.
+	 *
+	 * Therefore CRC validation of the first device status transaction
+	 * response is skipped (by using valid_crc_stub) and is instead done
+	 * later, during decoding, if the device has CRC enabled.
+	 * Once qaic knows whether the device has CRC enabled, it acts
+	 * accordingly.
+	 */
+	qdev->gen_crc = gen_crc;
+	qdev->valid_crc = valid_crc_stub;
+
+	return mhi_prepare_for_transfer(qdev->cntl_ch);
+}
+
+void qaic_control_close(struct qaic_device *qdev)
+{
+	mhi_unprepare_from_transfer(qdev->cntl_ch);
+}
+
+void qaic_release_usr(struct qaic_device *qdev, struct qaic_user *usr)
+{
+	struct _trans_terminate_to_dev *trans;
+	struct wrapper_list *wrappers;
+	struct wrapper_msg *wrapper;
+	struct _msg *msg;
+	struct _msg *rsp;
+
+	wrappers = alloc_wrapper_list();
+	if (!wrappers)
+		return;
+
+	wrapper = add_wrapper(wrappers, sizeof(*wrapper) + sizeof(*msg) +
+			      sizeof(*trans));
+	if (!wrapper) {
+		kfree(wrappers);
+		return;
+	}
+
+	msg = &wrapper->msg;
+
+	trans = (struct _trans_terminate_to_dev *)msg->data;
+
+	trans->hdr.type = cpu_to_le32(TRANS_TERMINATE_TO_DEV);
+	trans->hdr.len = cpu_to_le32(sizeof(*trans));
+	trans->handle = cpu_to_le32(usr->handle);
+
+	mutex_lock(&qdev->cntl_mutex);
+	wrapper->len = sizeof(msg->hdr) + sizeof(*trans);
+	msg->hdr.magic_number = MANAGE_MAGIC_NUMBER;
+	msg->hdr.sequence_number = cpu_to_le32(qdev->next_seq_num++);
+	msg->hdr.len = cpu_to_le32(wrapper->len);
+	msg->hdr.count = cpu_to_le32(1);
+	msg->hdr.handle = cpu_to_le32(usr->handle);
+	msg->hdr.padding = cpu_to_le32(0);
+	msg->hdr.crc32 = cpu_to_le32(qdev->gen_crc(wrappers));
+
+	/*
+	 * msg_xfer releases the mutex.
+	 * We don't care about the return of msg_xfer since we will not do
+	 * anything different based on what happens.
+	 * We ignore pending signals since one will be set if the user is
+	 * killed, and we need to give the device a chance to clean up,
+	 * otherwise DMA may still be in progress when we return.
+	 */
+	rsp = msg_xfer(qdev, wrappers, qdev->next_seq_num - 1, true);
+	if (!IS_ERR(rsp))
+		kfree(rsp);
+	free_wrapper_from_list(wrappers, wrapper);
+}
+
+void wake_all_cntl(struct qaic_device *qdev)
+{
+	struct xfer_queue_elem *elem;
+	struct xfer_queue_elem *i;
+
+	mutex_lock(&qdev->cntl_mutex);
+	list_for_each_entry_safe(elem, i, &qdev->cntl_xfer_list, list) {
+		list_del_init(&elem->list);
+		complete_all(&elem->xfer_done);
+	}
+	mutex_unlock(&qdev->cntl_mutex);
+}
+
+int qaic_data_get_reservation(struct qaic_device *qdev, struct qaic_user *usr,
+			      void *data, u32 *partition_id, u16 *remove)
+{
+	struct _trans_validate_part_from_dev *trans_rsp;
+	struct _trans_validate_part_to_dev *trans_req;
+	struct qaic_part_dev *user_msg;
+	struct wrapper_list *wrappers;
+	struct wrapper_msg *wrapper;
+	struct _msg *msg_req;
+	struct _msg *msg_rsp;
+	size_t msg_rsp_len;
+	int ret = 0;
+
+	user_msg = (struct qaic_part_dev *)data;
+	/* -1 for partition_id is a special value, so check for it */
+	if (user_msg->partition_id == QAIC_NO_PARTITION || user_msg->remove > 1) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	*partition_id = user_msg->partition_id;
+	*remove = user_msg->remove;
+
+	/*
+	 * In case of a remove we do not need to do a fw partition check; the
+	 * right user is validated by the device remove code when the device
+	 * is removed. So, when remove is set to 1, we just copy the
+	 * parameters and return from the call.
+	 */
+	if (*remove)
+		return 0;
+
+	wrappers = alloc_wrapper_list();
+	if (!wrappers)
+		return -ENOMEM;
+
+	wrapper = add_wrapper(wrappers, sizeof(*wrapper) + sizeof(*msg_req) +
+			      sizeof(*trans_req));
+	if (!wrapper) {
+		kfree(wrappers);
+		return -ENOMEM;
+	}
+
+	msg_req = &wrapper->msg;
+
+	trans_req = (struct _trans_validate_part_to_dev *)msg_req->data;
+	trans_req->hdr.type = cpu_to_le32(TRANS_VALIDATE_PARTITION_TO_DEV);
+	trans_req->hdr.len = cpu_to_le32(sizeof(*trans_req));
+	trans_req->part_id = cpu_to_le32(*partition_id);
+
+	mutex_lock(&qdev->cntl_mutex);
+	wrapper->len = sizeof(msg_req->hdr) + sizeof(*trans_req);
+	msg_req->hdr.len = cpu_to_le32(wrapper->len);
+	msg_req->hdr.sequence_number = cpu_to_le32(qdev->next_seq_num++);
+	msg_req->hdr.magic_number = MANAGE_MAGIC_NUMBER;
+	msg_req->hdr.handle = cpu_to_le32(usr->handle);
+	msg_req->hdr.count = cpu_to_le32(1);
+	msg_req->hdr.padding = cpu_to_le32(0);
+	msg_req->hdr.crc32 = cpu_to_le32(qdev->gen_crc(wrappers));
+
+	/*
+	 * msg_xfer releases the mutex
+	 * The msg count will always be 1 in the response
+	 */
+	msg_rsp = msg_xfer(qdev, wrappers, qdev->next_seq_num - 1, false);
+	if (IS_ERR(msg_rsp)) {
+		ret = PTR_ERR(msg_rsp);
+		goto kfree_wrapper;
+	}
+
+	msg_rsp_len = sizeof(msg_rsp->hdr) + sizeof(*trans_rsp);
+	if (le32_to_cpu(msg_rsp->hdr.count) != 1 ||
+	    le32_to_cpu(msg_rsp->hdr.len) < msg_rsp_len) {
+		ret = -EINVAL;
+		goto kfree_msg_rsp;
+	}
+
+	trans_rsp = (struct _trans_validate_part_from_dev *)msg_rsp->data;
+	if (le32_to_cpu(trans_rsp->status))
+		ret = -EPERM;
+
+kfree_msg_rsp:
+	kfree(msg_rsp);
+kfree_wrapper:
+	free_wrapper_from_list(wrappers, wrapper);
+out:
+	return ret;
+}
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v2 5/8] accel/qaic: Add datapath
  2023-02-06 15:41 ` Jeffrey Hugo
@ 2023-02-06 15:41   ` Jeffrey Hugo
  -1 siblings, 0 replies; 68+ messages in thread
From: Jeffrey Hugo @ 2023-02-06 15:41 UTC (permalink / raw)
  To: ogabbay, airlied, daniel, dri-devel
  Cc: Jeffrey Hugo, linux-doc, linux-arm-msm, quic_ajitpals,
	quic_pkanojiy, stanislaw.gruszka, quic_carlv, jacek.lawrynowicz

Add the datapath component that manages BOs and submits them to running
workloads on the qaic device via the dma_bridge hardware.  This allows
QAIC clients to interact with their workloads (run inferences) via the
following ioctls along with mmap() (a usage sketch follows the list):

DRM_IOCTL_QAIC_CREATE_BO
DRM_IOCTL_QAIC_MMAP_BO
DRM_IOCTL_QAIC_ATTACH_SLICE_BO
DRM_IOCTL_QAIC_EXECUTE_BO
DRM_IOCTL_QAIC_PARTIAL_EXECUTE_BO
DRM_IOCTL_QAIC_WAIT_BO
DRM_IOCTL_QAIC_PERF_STATS_BO
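
For orientation, the typical call order is CREATE_BO, MMAP_BO plus mmap(),
ATTACH_SLICE_BO, EXECUTE_BO (or PARTIAL_EXECUTE_BO), then WAIT_BO.  Below is
a hedged sketch of how a client might describe one slice of a BO before
DRM_IOCTL_QAIC_ATTACH_SLICE_BO.  The field names follow how encode_reqs() in
the patch below consumes a struct qaic_attach_slice_entry; the values are
illustrative only, and members not shown here (plus the surrounding attach,
execute, and wait calls) are omitted.  The authoritative layout is in
include/uapi/drm/qaic_accel.h.

  #include <drm/qaic_accel.h>

  /*
   * Hypothetical helper: configure one slice entry so the DMA engine rings
   * a 32-bit doorbell once the transfer for this slice completes.  The
   * semaphore programming (sem0..sem3) is left zeroed in this sketch.
   */
  static void fill_slice_entry(struct qaic_attach_slice_entry *entry,
  			     __u64 dev_addr, __u64 db_addr, __u32 db_data)
  {
  	entry->dev_addr = dev_addr;   /* device-side address for this slice */
  	entry->db_addr = db_addr;     /* doorbell written after the transfer */
  	entry->db_data = db_data;     /* 32-bit value written to the doorbell */
  	entry->db_len = 32;           /* one of 32/16/8, or 0 for no doorbell */
  }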

Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com>
---
 drivers/accel/qaic/qaic_data.c | 1952 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 1952 insertions(+)
 create mode 100644 drivers/accel/qaic/qaic_data.c

diff --git a/drivers/accel/qaic/qaic_data.c b/drivers/accel/qaic/qaic_data.c
new file mode 100644
index 0000000..d37cdbe
--- /dev/null
+++ b/drivers/accel/qaic/qaic_data.c
@@ -0,0 +1,1952 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+/* Copyright (c) 2019-2021, The Linux Foundation. All rights reserved. */
+/* Copyright (c) 2021-2023 Qualcomm Innovation Center, Inc. All rights reserved. */
+
+#include <linux/completion.h>
+#include <linux/delay.h>
+#include <linux/dma-buf.h>
+#include <linux/dma-mapping.h>
+#include <linux/interrupt.h>
+#include <linux/kref.h>
+#include <linux/list.h>
+#include <linux/mm.h>
+#include <linux/moduleparam.h>
+#include <linux/scatterlist.h>
+#include <linux/spinlock.h>
+#include <linux/srcu.h>
+#include <linux/types.h>
+#include <linux/uaccess.h>
+#include <linux/wait.h>
+#include <drm/drm_file.h>
+#include <drm/drm_gem.h>
+#include <drm/drm_print.h>
+#include <uapi/drm/qaic_accel.h>
+
+#include "qaic.h"
+
+#define SEM_VAL_MASK	GENMASK_ULL(11, 0)
+#define SEM_INDEX_MASK	GENMASK_ULL(4, 0)
+#define BULK_XFER	BIT(3)
+#define GEN_COMPLETION	BIT(4)
+#define INBOUND_XFER	1
+#define OUTBOUND_XFER	2
+#define REQHP_OFF	0x0 /* we read this */
+#define REQTP_OFF	0x4 /* we write this */
+#define RSPHP_OFF	0x8 /* we write this */
+#define RSPTP_OFF	0xc /* we read this */
+
+#define ENCODE_SEM(val, index, sync, cmd, flags)			\
+			((val) |					\
+			(index) << 16 |					\
+			(sync) << 22 |					\
+			(cmd) << 24 |					\
+			((cmd) ? BIT(31) : 0) |				\
+			(((flags) & SEM_INSYNCFENCE) ? BIT(30) : 0) |	\
+			(((flags) & SEM_OUTSYNCFENCE) ? BIT(29) : 0))
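+/*
+ * Worked example with illustrative values: ENCODE_SEM(0x10, 2, 1, 4, 0)
+ * yields 0x84420010 -- value 0x10 in bits 11:0, index 2 at bits 20:16,
+ * sync at bit 22, command 4 at bits 26:24, and bit 31 set because a
+ * command is present.
+ */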
+#define NUM_EVENTS	128
+#define NUM_DELAYS	10
+
+static unsigned int wait_exec_default_timeout = 5000; /* 5 sec default */
+module_param(wait_exec_default_timeout, uint, 0600);
+
+static unsigned int datapath_poll_interval_us = 100; /* 100 usec default */
+module_param(datapath_poll_interval_us, uint, 0600);
+
+struct dbc_req { /* everything must be little endian encoded */
+	/*
+	/*
+	 * A request ID is assigned to each memory handle going into the DMA
+	 * queue. Since a single memory handle can enqueue multiple elements
+	 * in the DMA queue, all of them will have the same request ID.
+	 */
+	__le16	req_id;
+	/* Future use */
+	__u8	seq_id;
+	/*
+	 * Special encoded variable
+	 * 7	0 - Do not force to generate MSI after DMA is completed
+	 *	1 - Force to generate MSI after DMA is completed
+	 * 6:5	Reserved
+	 * 4	1 - Generate completion element in the response queue
+	 *	0 - No Completion Code
+	 * 3	0 - DMA request is a Link list transfer
+	 *	1 - DMA request is a Bulk transfer
+	 * 2	Reserved
+	 * 1:0	00 - No DMA transfer involved
+	 *	01 - DMA transfer is part of inbound transfer
+	 *	10 - DMA transfer has outbound transfer
+	 *	11 - NA
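+	 *
+	 * e.g. encode_reqs() below composes 0x19 (BULK_XFER | GEN_COMPLETION |
+	 * INBOUND_XFER) for an inbound bulk slice that raises a completion.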
+	 */
+	__u8	cmd;
+	__le32	resv;
+	/* Source address for the transfer */
+	__le64	src_addr;
+	/* Destination address for the transfer */
+	__le64	dest_addr;
+	/* Length of transfer request */
+	__le32	len;
+	__le32	resv2;
+	/* Doorbell address */
+	__le64	db_addr;
+	/*
+	 * Special encoded variable
+	 * 7	1 - Doorbell(db) write
+	 *	0 - No doorbell write
+	 * 6:2	Reserved
+	 * 1:0	00 - 32 bit access, db address must be aligned to 32bit-boundary
+	 *	01 - 16 bit access, db address must be aligned to 16bit-boundary
+	 *	10 - 8 bit access, db address must be aligned to 8bit-boundary
+	 *	11 - Reserved
+	 */
+	__u8	db_len;
+	__u8	resv3;
+	__le16	resv4;
+	/* 32 bit data written to doorbell address */
+	__le32	db_data;
+	/*
+	 * Special encoded variable
+	 * All the fields of sem_cmdX are passed from user and all are ORed
+	 * together to form sem_cmd.
+	 * 11:0		Semaphore value
+	 * 15:12	Reserved
+	 * 20:16	Semaphore index
+	 * 21		Reserved
+	 * 22		Semaphore Sync
+	 * 23		Reserved
+	 * 26:24	Semaphore command
+	 * 28:27	Reserved
+	 * 29		Semaphore DMA out bound sync fence
+	 * 30		Semaphore DMA in bound sync fence
+	 * 31		Enable semaphore command
+	 */
+	__le32	sem_cmd0;
+	__le32	sem_cmd1;
+	__le32	sem_cmd2;
+	__le32	sem_cmd3;
+} __packed;
+
+struct dbc_rsp { /* everything must be little endian encoded */
+	/* Request ID of the memory handle whose DMA transaction is completed */
+	__le16	req_id;
+	/* Status of the DMA transaction. 0 = success, otherwise failure */
+	__le16	status;
+} __packed;
+
+inline int get_dbc_req_elem_size(void)
+{
+	return sizeof(struct dbc_req);
+}
+
+inline int get_dbc_rsp_elem_size(void)
+{
+	return sizeof(struct dbc_rsp);
+}
+
+static int reserve_pages(unsigned long start_pfn, unsigned long nr_pages,
+			 bool reserve)
+{
+	unsigned long pfn;
+	unsigned long end_pfn = start_pfn + nr_pages;
+	struct page *page;
+
+	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
+		if (!pfn_valid(pfn))
+			return -EINVAL;
+		page =  pfn_to_page(pfn);
+		if (reserve)
+			SetPageReserved(page);
+		else
+			ClearPageReserved(page);
+	}
+	return 0;
+}
+
+static void free_slice(struct kref *kref)
+{
+	struct bo_slice *slice = container_of(kref, struct bo_slice, ref_count);
+
+	list_del(&slice->slice);
+	drm_gem_object_put(&slice->bo->base);
+	sg_free_table(slice->sgt);
+	kfree(slice->sgt);
+	kfree(slice->reqs);
+	kfree(slice);
+}
+
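+/*
+ * copy_sgt() - create a new scatter-gather table that covers only the
+ * [offset, offset + size) byte range of sgt_in (size 0 is treated as
+ * PAGE_SIZE).  The first and last entries are adjusted so their DMA
+ * address/length describe exactly that window; the caller owns, and must
+ * eventually free, the returned table.
+ */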
+static int copy_sgt(struct qaic_device *qdev, struct sg_table **sgt_out,
+		    struct sg_table *sgt_in, u64 size, u64 offset)
+{
+	int total_len, len, nents, offf = 0, offl = 0;
+	struct scatterlist *sg, *sgn, *sgf, *sgl;
+	struct sg_table *sgt;
+	int ret, j;
+
+	/* find the number of relevant nents needed for this memory range */
+	total_len = 0;
+	sgf = NULL;
+	sgl = NULL;
+	nents = 0;
+
+	size = size ? size : PAGE_SIZE;
+	for (sg = sgt_in->sgl; sg; sg = sg_next(sg)) {
+		len = sg_dma_len(sg);
+
+		if (!len)
+			continue;
+		if (offset >= total_len && offset < total_len + len) {
+			sgf = sg;
+			offf = offset - total_len;
+		}
+		if (sgf)
+			nents++;
+		if (offset + size >= total_len &&
+		    offset + size <= total_len + len) {
+			sgl = sg;
+			offl = offset + size - total_len;
+			break;
+		}
+		total_len += len;
+	}
+
+	if (!sgf || !sgl) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	sgt = kzalloc(sizeof(*sgt), GFP_KERNEL);
+	if (!sgt) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	ret = sg_alloc_table(sgt, nents, GFP_KERNEL);
+	if (ret)
+		goto free_sgt;
+
+	/* copy relevant sg node and fix page and length */
+	sgn = sgf;
+	for_each_sgtable_sg(sgt, sg, j) {
+		memcpy(sg, sgn, sizeof(*sg));
+		if (sgn == sgf) {
+			sg_dma_address(sg) += offf;
+			sg_dma_len(sg) -= offf;
+			sg_set_page(sg, sg_page(sgn),
+				    sg_dma_len(sg), offf);
+		} else {
+			offf = 0;
+		}
+		if (sgn == sgl) {
+			sg_dma_len(sg) = offl - offf;
+			sg_set_page(sg, sg_page(sgn),
+				    offl - offf, offf);
+			sg_mark_end(sg);
+			break;
+		}
+		sgn = sg_next(sgn);
+	}
+
+	*sgt_out = sgt;
+	return ret;
+
+free_sgt:
+	kfree(sgt);
+out:
+	*sgt_out = NULL;
+	return ret;
+}
+
+static int encode_reqs(struct qaic_device *qdev, struct bo_slice *slice,
+		       struct qaic_attach_slice_entry *req)
+{
+	__u8 cmd = BULK_XFER;
+	__le64 db_addr = cpu_to_le64(req->db_addr);
+	__u8 db_len;
+	__le32 db_data = cpu_to_le32(req->db_data);
+	struct scatterlist *sg;
+	u64 dev_addr;
+	int presync_sem;
+	int i;
+
+	if (!slice->no_xfer)
+		cmd |= (slice->dir == DMA_TO_DEVICE ? INBOUND_XFER :
+								OUTBOUND_XFER);
+
+	if (req->db_len && !IS_ALIGNED(req->db_addr, req->db_len / 8))
+		return -EINVAL;
+
+	presync_sem = req->sem0.presync + req->sem1.presync +
+		      req->sem2.presync + req->sem3.presync;
+	if (presync_sem > 1)
+		return -EINVAL;
+
+	presync_sem = req->sem0.presync << 0 | req->sem1.presync << 1 |
+		      req->sem2.presync << 2 | req->sem3.presync << 3;
+
+	switch (req->db_len) {
+	case 32:
+		db_len = BIT(7);
+		break;
+	case 16:
+		db_len = BIT(7) | 1;
+		break;
+	case 8:
+		db_len = BIT(7) | 2;
+		break;
+	case 0:
+		db_len = 0; /* doorbell is not active for this command */
+		break;
+	default:
+		return -EINVAL; /* should never hit this */
+	}
+
+	/*
+	 * When we end up splitting up a single request (i.e. a buf slice) into
+	 * multiple DMA requests, we have to manage the sync data carefully.
+	 * There can only be one presync sem.  That needs to be on every xfer
+	 * so that the DMA engine doesn't transfer data before the receiver is
+	 * ready.  We only do the doorbell and postsync sems after the xfer.
+	 * To guarantee previous xfers for the request are complete, we use a
+	 * fence.
+	 */
+	dev_addr = req->dev_addr;
+	for_each_sgtable_sg(slice->sgt, sg, i) {
+		slice->reqs[i].cmd = cmd;
+		slice->reqs[i].src_addr =
+			cpu_to_le64(slice->dir == DMA_TO_DEVICE ?
+					sg_dma_address(sg) : dev_addr);
+		slice->reqs[i].dest_addr =
+			cpu_to_le64(slice->dir == DMA_TO_DEVICE ?
+					dev_addr : sg_dma_address(sg));
+		/*
+		 * sg_dma_len(sg) returns the size of a DMA segment.  qaic sets
+		 * the maximum DMA segment size to UINT_MAX, so sg_dma_len(sg)
+		 * can never exceed the u32 range and the cast below does not
+		 * lose any information.
+		 */
+		slice->reqs[i].len = cpu_to_le32((u32)sg_dma_len(sg));
+		switch (presync_sem) {
+		case BIT(0):
+			slice->reqs[i].sem_cmd0 = cpu_to_le32(ENCODE_SEM(req->sem0.val,
+									 req->sem0.index,
+									 req->sem0.presync,
+									 req->sem0.cmd,
+									 req->sem0.flags));
+			break;
+		case BIT(1):
+			slice->reqs[i].sem_cmd1 = cpu_to_le32(ENCODE_SEM(req->sem1.val,
+									 req->sem1.index,
+									 req->sem1.presync,
+									 req->sem1.cmd,
+									 req->sem1.flags));
+			break;
+		case BIT(2):
+			slice->reqs[i].sem_cmd2 = cpu_to_le32(ENCODE_SEM(req->sem2.val,
+									 req->sem2.index,
+									 req->sem2.presync,
+									 req->sem2.cmd,
+									 req->sem2.flags));
+			break;
+		case BIT(3):
+			slice->reqs[i].sem_cmd3 = cpu_to_le32(ENCODE_SEM(req->sem3.val,
+									 req->sem3.index,
+									 req->sem3.presync,
+									 req->sem3.cmd,
+									 req->sem3.flags));
+			break;
+		}
+		dev_addr += sg_dma_len(sg);
+	}
+	/* add post transfer completion, doorbell, and semaphores to the last segment */
+	i--;
+	slice->reqs[i].cmd |= GEN_COMPLETION;
+	slice->reqs[i].db_addr = db_addr;
+	slice->reqs[i].db_len = db_len;
+	slice->reqs[i].db_data = db_data;
+	/*
+	 * Add a fence if we have more than one request going to the hardware
+	 * representing the entirety of the user request, and the user request
+	 * has no presync condition.
+	 * Fences are expensive, so we try to avoid them.  We rely on the
+	 * hardware behavior to avoid needing one when there is a presync
+	 * condition.  When a presync exists, all requests for that same
+	 * presync will be queued into a fifo.  Thus, since we queue the
+	 * post xfer activity only on the last request we queue, the hardware
+	 * will ensure that the last queued request is processed last, thus
+	 * making sure the post xfer activity happens at the right time without
+	 * a fence.
+	 */
+	if (i && !presync_sem)
+		req->sem0.flags |= (slice->dir == DMA_TO_DEVICE ?
+				    SEM_INSYNCFENCE : SEM_OUTSYNCFENCE);
+	slice->reqs[i].sem_cmd0 = cpu_to_le32(ENCODE_SEM(req->sem0.val,
+							 req->sem0.index,
+							 req->sem0.presync,
+							 req->sem0.cmd,
+							 req->sem0.flags));
+	slice->reqs[i].sem_cmd1 = cpu_to_le32(ENCODE_SEM(req->sem1.val,
+							 req->sem1.index,
+							 req->sem1.presync,
+							 req->sem1.cmd,
+							 req->sem1.flags));
+	slice->reqs[i].sem_cmd2 = cpu_to_le32(ENCODE_SEM(req->sem2.val,
+							 req->sem2.index,
+							 req->sem2.presync,
+							 req->sem2.cmd,
+							 req->sem2.flags));
+	slice->reqs[i].sem_cmd3 = cpu_to_le32(ENCODE_SEM(req->sem3.val,
+							 req->sem3.index,
+							 req->sem3.presync,
+							 req->sem3.cmd,
+							 req->sem3.flags));
+
+	return 0;
+}
+
+static int qaic_map_one_slice(struct qaic_device *qdev, struct qaic_bo *bo,
+			      struct qaic_attach_slice_entry *slice_ent)
+{
+	struct sg_table *sgt = NULL;
+	struct bo_slice *slice;
+	int ret;
+
+	ret = copy_sgt(qdev, &sgt, bo->sgt, slice_ent->size, slice_ent->offset);
+	if (ret)
+		goto out;
+
+	slice = kmalloc(sizeof(*slice), GFP_KERNEL);
+	if (!slice) {
+		ret = -ENOMEM;
+		goto free_sgt;
+	}
+
+	slice->reqs = kcalloc(sgt->nents, sizeof(*slice->reqs), GFP_KERNEL);
+	if (!slice->reqs) {
+		ret = -ENOMEM;
+		goto free_slice;
+	}
+
+	slice->no_xfer = !slice_ent->size;
+	slice->sgt = sgt;
+	slice->nents = sgt->nents;
+	slice->dir = bo->dir;
+	slice->bo = bo;
+	slice->size = slice_ent->size;
+	slice->offset = slice_ent->offset;
+
+	ret = encode_reqs(qdev, slice, slice_ent);
+	if (ret)
+		goto free_req;
+
+	bo->total_slice_nents += sgt->nents;
+	kref_init(&slice->ref_count);
+	drm_gem_object_get(&bo->base);
+	list_add_tail(&slice->slice, &bo->slices);
+
+	return 0;
+
+free_req:
+	kfree(slice->reqs);
+free_slice:
+	kfree(slice);
+free_sgt:
+	sg_free_table(sgt);
+	kfree(sgt);
+out:
+	return ret;
+}
+
+static int create_sgt(struct qaic_device *qdev, struct sg_table **sgt_out,
+		      u64 size)
+{
+	struct scatterlist *sg;
+	struct sg_table *sgt;
+	struct page **pages;
+	int *pages_order;
+	int buf_extra;
+	int max_order;
+	int nr_pages;
+	int ret = 0;
+	int i, j, k;
+	int order;
+
+	if (size) {
+		nr_pages = DIV_ROUND_UP(size, PAGE_SIZE);
+		/*
+		 * calculate how much extra we are going to allocate, to remove
+		 * later
+		 */
+		buf_extra = (PAGE_SIZE - size % PAGE_SIZE) % PAGE_SIZE;
+		max_order = min(MAX_ORDER - 1, get_order(size));
+	} else {
+		/* allocate a single page for bookkeeping */
+		nr_pages = 1;
+		buf_extra = 0;
+		max_order = 0;
+	}
+
+	pages = kvmalloc_array(nr_pages, sizeof(*pages) + sizeof(*pages_order), GFP_KERNEL);
+	if (!pages) {
+		ret = -ENOMEM;
+		goto out;
+	}
+	pages_order = (void *)pages + sizeof(*pages) * nr_pages;
+
+	/*
+	 * Allocate the requested memory using alloc_pages().  The request may
+	 * be satisfied in multiple chunks by calling alloc_pages() multiple
+	 * times, so use an SG table to track the allocated pages.
+	 */
+	i = 0;
+	while (nr_pages > 0) {
+		order = min(get_order(nr_pages * PAGE_SIZE), max_order);
+		while (1) {
+			pages[i] = alloc_pages(GFP_KERNEL | GFP_HIGHUSER |
+					       __GFP_NOWARN | __GFP_ZERO |
+					       (order ? __GFP_NORETRY : __GFP_RETRY_MAYFAIL),
+					       order);
+			if (pages[i])
+				break;
+			if (!order--) {
+				ret = -ENOMEM;
+				goto free_partial_alloc;
+			}
+		}
+
+		max_order = order;
+		pages_order[i] = order;
+
+		nr_pages -= 1 << order;
+		if (nr_pages <= 0)
+			/* account for over allocation */
+			buf_extra += abs(nr_pages) * PAGE_SIZE;
+		i++;
+	}
+
+	sgt = kmalloc(sizeof(*sgt), GFP_KERNEL);
+	if (!sgt) {
+		ret = -ENOMEM;
+		goto free_partial_alloc;
+	}
+
+	if (sg_alloc_table(sgt, i, GFP_KERNEL)) {
+		ret = -ENOMEM;
+		goto free_sgt;
+	}
+
+	/* Populate the SG table with the allocated memory pages */
+	sg = sgt->sgl;
+	for (k = 0; k < i; k++, sg = sg_next(sg)) {
+		/* Last entry requires special handling */
+		if (k < i - 1) {
+			sg_set_page(sg, pages[k], PAGE_SIZE << pages_order[k], 0);
+		} else {
+			sg_set_page(sg, pages[k],
+				    (PAGE_SIZE << pages_order[k]) - buf_extra, 0);
+			sg_mark_end(sg);
+		}
+
+		ret = reserve_pages(page_to_pfn(pages[k]), DIV_ROUND_UP(sg->length, PAGE_SIZE),
+				    true);
+		if (ret)
+			goto clear_pages;
+	}
+
+	kvfree(pages);
+	*sgt_out = sgt;
+	return ret;
+
+clear_pages:
+	for (j = 0; j < k; j++)
+		ret = reserve_pages(page_to_pfn(pages[j]), 1 << pages_order[j],
+				    false);
+	sg_free_table(sgt);
+free_sgt:
+	kfree(sgt);
+free_partial_alloc:
+	for (j = 0; j < i; j++)
+		__free_pages(pages[j], pages_order[j]);
+	kvfree(pages);
+out:
+	*sgt_out = NULL;
+	return ret;
+}
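
/*
 * Illustration (not part of the patch): the buf_extra bookkeeping used by
 * create_sgt() above, assuming a 4K page size and a single order-3
 * alloc_pages() success.  For size = 5 pages + 100 bytes, nr_pages is 6 and
 * buf_extra starts at 3996; the 2 surplus pages from the order-3 block add
 * another 8192, so the last scatterlist entry ends up exactly 20580 bytes.
 */
#include <stdio.h>

#define EX_PAGE_SIZE 4096UL

int main(void)
{
	unsigned long size = 5 * EX_PAGE_SIZE + 100;
	unsigned long nr_pages = (size + EX_PAGE_SIZE - 1) / EX_PAGE_SIZE;
	unsigned long buf_extra = (EX_PAGE_SIZE - size % EX_PAGE_SIZE) % EX_PAGE_SIZE;
	int order = 3;			/* assume one order-3 allocation */
	long remaining = (long)nr_pages - (1 << order);

	if (remaining <= 0)
		buf_extra += -remaining * EX_PAGE_SIZE;

	printf("last sg length = %lu (requested %lu)\n",
	       (EX_PAGE_SIZE << order) - buf_extra, size);
	return 0;
}
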
+
+static bool invalid_sem(struct qaic_sem *sem)
+{
+	if (sem->val & ~SEM_VAL_MASK || sem->index & ~SEM_INDEX_MASK ||
+	    !(sem->presync == 0 || sem->presync == 1) || sem->pad ||
+	    sem->flags & ~(SEM_INSYNCFENCE | SEM_OUTSYNCFENCE) ||
+	    sem->cmd > SEM_WAIT_GT_0)
+		return true;
+	return false;
+}
+
+static int qaic_validate_req(struct qaic_device *qdev,
+			     struct qaic_attach_slice_entry *slice_ent,
+			     u32 count, u64 total_size)
+{
+	int i;
+
+	for (i = 0; i < count; i++) {
+		if (!(slice_ent[i].db_len == 32 || slice_ent[i].db_len == 16 ||
+		      slice_ent[i].db_len == 8 || slice_ent[i].db_len == 0) ||
+		      invalid_sem(&slice_ent[i].sem0) ||
+		      invalid_sem(&slice_ent[i].sem1) ||
+		      invalid_sem(&slice_ent[i].sem2) ||
+		      invalid_sem(&slice_ent[i].sem3))
+			return -EINVAL;
+
+		if (slice_ent[i].offset + slice_ent[i].size > total_size)
+			return -EINVAL;
+	}
+
+	return 0;
+}
+
+static void qaic_free_sgt(struct sg_table *sgt)
+{
+	struct scatterlist *sg;
+
+	for (sg = sgt->sgl; sg; sg = sg_next(sg))
+		if (sg_page(sg)) {
+			reserve_pages(page_to_pfn(sg_page(sg)),
+				      DIV_ROUND_UP(sg->length, PAGE_SIZE), false);
+			__free_pages(sg_page(sg), get_order(sg->length));
+		}
+	sg_free_table(sgt);
+	kfree(sgt);
+}
+
+static void qaic_gem_print_info(struct drm_printer *p, unsigned int indent,
+				const struct drm_gem_object *obj)
+{
+	struct qaic_bo *bo = to_qaic_bo(obj);
+
+	drm_printf_indent(p, indent, "user requested size=%llu\n", bo->size);
+}
+
+static const struct vm_operations_struct drm_vm_ops = {
+	.open = drm_gem_vm_open,
+	.close = drm_gem_vm_close,
+};
+
+static int qaic_gem_object_mmap(struct drm_gem_object *obj, struct vm_area_struct *vma)
+{
+	struct qaic_bo *bo = to_qaic_bo(obj);
+	unsigned long offset = 0;
+	struct scatterlist *sg;
+	int ret = 0;
+
+	if (obj->import_attach)
+		return -EINVAL;
+
+	for (sg = bo->sgt->sgl; sg; sg = sg_next(sg)) {
+		if (sg_page(sg)) {
+			ret = remap_pfn_range(vma, vma->vm_start + offset,
+					      page_to_pfn(sg_page(sg)),
+					      sg->length, vma->vm_page_prot);
+			if (ret)
+				goto out;
+			offset += sg->length;
+		}
+	}
+
+out:
+	return ret;
+}
+
+static void qaic_free_object(struct drm_gem_object *obj)
+{
+	struct qaic_bo *bo = to_qaic_bo(obj);
+
+	if (obj->import_attach) {
+		/* DMABUF/PRIME Path */
+		dma_buf_detach(obj->import_attach->dmabuf, obj->import_attach);
+		dma_buf_put(obj->import_attach->dmabuf);
+	} else {
+		/* Private buffer allocation path */
+		qaic_free_sgt(bo->sgt);
+	}
+
+	drm_gem_object_release(obj);
+	kfree(bo);
+}
+
+static const struct drm_gem_object_funcs qaic_gem_funcs = {
+	.free = qaic_free_object,
+	.print_info = qaic_gem_print_info,
+	.mmap = qaic_gem_object_mmap,
+	.vm_ops = &drm_vm_ops,
+};
+
+static struct qaic_bo *qaic_alloc_init_bo(void)
+{
+	struct qaic_bo *bo;
+
+	bo = kzalloc(sizeof(*bo), GFP_KERNEL);
+	if (!bo)
+		return ERR_PTR(-ENOMEM);
+
+	INIT_LIST_HEAD(&bo->slices);
+	init_completion(&bo->xfer_done);
+	complete_all(&bo->xfer_done);
+
+	return bo;
+}
+
+int qaic_create_bo_ioctl(struct drm_device *dev, void *data,
+			 struct drm_file *file_priv)
+{
+	struct qaic_create_bo *args = data;
+	int usr_rcu_id, qdev_rcu_id;
+	struct drm_gem_object *obj;
+	struct qaic_device *qdev;
+	struct qaic_user *usr;
+	struct qaic_bo *bo;
+	size_t size;
+	int ret;
+
+	if (args->pad)
+		return -EINVAL;
+
+	usr = file_priv->driver_priv;
+	usr_rcu_id = srcu_read_lock(&usr->qddev_lock);
+	if (!usr->qddev) {
+		ret = -ENODEV;
+		goto unlock_usr_srcu;
+	}
+
+	qdev = usr->qddev->qdev;
+	qdev_rcu_id = srcu_read_lock(&qdev->dev_lock);
+	if (qdev->in_reset) {
+		ret = -ENODEV;
+		goto unlock_dev_srcu;
+	}
+
+	size = PAGE_ALIGN(args->size);
+	if (size == 0) {
+		ret = -EINVAL;
+		goto unlock_dev_srcu;
+	}
+
+	bo = qaic_alloc_init_bo();
+	if (IS_ERR(bo)) {
+		ret = PTR_ERR(bo);
+		goto unlock_dev_srcu;
+	}
+	obj = &bo->base;
+
+	drm_gem_private_object_init(dev, obj, size);
+
+	obj->funcs = &qaic_gem_funcs;
+	ret = create_sgt(qdev, &bo->sgt, size);
+	if (ret)
+		goto free_bo;
+
+	bo->size = args->size;
+
+	ret = drm_gem_handle_create(file_priv, obj, &args->handle);
+	if (ret)
+		goto free_sgt;
+
+	bo->handle = args->handle;
+	drm_gem_object_put(obj);
+	srcu_read_unlock(&qdev->dev_lock, qdev_rcu_id);
+	srcu_read_unlock(&usr->qddev_lock, usr_rcu_id);
+
+	return 0;
+
+free_sgt:
+	qaic_free_sgt(bo->sgt);
+free_bo:
+	kfree(bo);
+unlock_dev_srcu:
+	srcu_read_unlock(&qdev->dev_lock, qdev_rcu_id);
+unlock_usr_srcu:
+	srcu_read_unlock(&usr->qddev_lock, usr_rcu_id);
+	return ret;
+}
+
+int qaic_mmap_bo_ioctl(struct drm_device *dev, void *data,
+		       struct drm_file *file_priv)
+{
+	struct qaic_mmap_bo *args = data;
+	int usr_rcu_id, qdev_rcu_id;
+	struct drm_gem_object *obj;
+	struct qaic_device *qdev;
+	struct qaic_user *usr;
+	int ret;
+
+	usr = file_priv->driver_priv;
+	usr_rcu_id = srcu_read_lock(&usr->qddev_lock);
+	if (!usr->qddev) {
+		ret = -ENODEV;
+		goto unlock_usr_srcu;
+	}
+
+	qdev = usr->qddev->qdev;
+	qdev_rcu_id = srcu_read_lock(&qdev->dev_lock);
+	if (qdev->in_reset) {
+		ret = -ENODEV;
+		goto unlock_dev_srcu;
+	}
+
+	obj = drm_gem_object_lookup(file_priv, args->handle);
+	if (!obj) {
+		ret = -ENOENT;
+		goto unlock_dev_srcu;
+	}
+
+	ret = drm_gem_create_mmap_offset(obj);
+	if (ret == 0)
+		args->offset = drm_vma_node_offset_addr(&obj->vma_node);
+
+	drm_gem_object_put(obj);
+
+unlock_dev_srcu:
+	srcu_read_unlock(&qdev->dev_lock, qdev_rcu_id);
+unlock_usr_srcu:
+	srcu_read_unlock(&usr->qddev_lock, usr_rcu_id);
+	return ret;
+}
+
+struct drm_gem_object *qaic_gem_prime_import(struct drm_device *dev,
+					     struct dma_buf *dma_buf)
+{
+	struct dma_buf_attachment *attach;
+	struct drm_gem_object *obj;
+	struct qaic_bo *bo;
+	size_t size;
+	int ret;
+
+	bo = qaic_alloc_init_bo();
+	if (IS_ERR(bo)) {
+		ret = PTR_ERR(bo);
+		goto out;
+	}
+
+	obj = &bo->base;
+	get_dma_buf(dma_buf);
+
+	attach = dma_buf_attach(dma_buf, dev->dev);
+	if (IS_ERR(attach)) {
+		ret = PTR_ERR(attach);
+		goto attach_fail;
+	}
+
+	size = PAGE_ALIGN(attach->dmabuf->size);
+	if (size == 0) {
+		ret = -EINVAL;
+		goto size_align_fail;
+	}
+
+	drm_gem_private_object_init(dev, obj, size);
+	/*
+	 * Skip dma_buf_map_attachment() here since the DMA direction is not
+	 * known yet.  The mapping is done later, in the subsequent IOCTL that
+	 * attaches slicing, once the direction is known.
+	 */
+
+	obj->funcs = &qaic_gem_funcs;
+	obj->import_attach = attach;
+	obj->resv = dma_buf->resv;
+
+	return obj;
+
+size_align_fail:
+	dma_buf_detach(dma_buf, attach);
+attach_fail:
+	dma_buf_put(dma_buf);
+	kfree(bo);
+out:
+	return ERR_PTR(ret);
+}
+
+static int qaic_prepare_import_bo(struct qaic_bo *bo,
+				  struct qaic_attach_slice_hdr *hdr)
+{
+	struct drm_gem_object *obj = &bo->base;
+	struct sg_table *sgt;
+	int ret;
+
+	if (obj->import_attach->dmabuf->size < hdr->size)
+		return -EINVAL;
+
+	sgt = dma_buf_map_attachment(obj->import_attach, hdr->dir);
+	if (IS_ERR(sgt)) {
+		ret = PTR_ERR(sgt);
+		return ret;
+	}
+
+	bo->sgt = sgt;
+	bo->size = hdr->size;
+
+	return 0;
+}
+
+static int qaic_prepare_export_bo(struct qaic_device *qdev, struct qaic_bo *bo,
+				  struct qaic_attach_slice_hdr *hdr)
+{
+	int ret;
+
+	if (bo->size != hdr->size)
+		return -EINVAL;
+
+	ret = dma_map_sgtable(&qdev->pdev->dev, bo->sgt, hdr->dir, 0);
+	if (ret)
+		return -EFAULT;
+
+	return 0;
+}
+
+static int qaic_prepare_bo(struct qaic_device *qdev, struct qaic_bo *bo,
+			   struct qaic_attach_slice_hdr *hdr)
+{
+	int ret;
+
+	if (bo->base.import_attach)
+		ret = qaic_prepare_import_bo(bo, hdr);
+	else
+		ret = qaic_prepare_export_bo(qdev, bo, hdr);
+
+	if (ret == 0)
+		bo->dir = hdr->dir;
+
+	return ret;
+}
+
+static void qaic_unprepare_import_bo(struct qaic_bo *bo)
+{
+	dma_buf_unmap_attachment(bo->base.import_attach, bo->sgt, bo->dir);
+	bo->sgt = NULL;
+	bo->size = 0;
+}
+
+static void qaic_unprepare_export_bo(struct qaic_device *qdev, struct qaic_bo *bo)
+{
+	dma_unmap_sgtable(&qdev->pdev->dev, bo->sgt, bo->dir, 0);
+}
+
+static void qaic_unprepare_bo(struct qaic_device *qdev, struct qaic_bo *bo)
+{
+	if (bo->base.import_attach)
+		qaic_unprepare_import_bo(bo);
+	else
+		qaic_unprepare_export_bo(qdev, bo);
+
+	bo->dir = 0;
+}
+
+static void qaic_free_slices_bo(struct qaic_bo *bo)
+{
+	struct bo_slice *slice, *temp;
+
+	list_for_each_entry_safe(slice, temp, &bo->slices, slice) {
+		kref_put(&slice->ref_count, free_slice);
+	}
+}
+
+static int qaic_attach_slicing_bo(struct qaic_device *qdev,
+				  struct qaic_bo *bo,
+				  struct qaic_attach_slice_hdr *hdr,
+				  struct qaic_attach_slice_entry *slice_ent)
+{
+	int ret, i;
+
+	for (i = 0; i < hdr->count; i++) {
+		ret = qaic_map_one_slice(qdev, bo, &slice_ent[i]);
+		if (ret) {
+			qaic_free_slices_bo(bo);
+			return ret;
+		}
+	}
+
+	if (bo->total_slice_nents > qdev->dbc[hdr->dbc_id].nelem) {
+		qaic_free_slices_bo(bo);
+		return -ENOSPC;
+	}
+
+	bo->sliced = true;
+	bo->nr_slice = hdr->count;
+	list_add_tail(&bo->bo_list, &qdev->dbc[hdr->dbc_id].bo_lists);
+
+	return 0;
+}
+
+int qaic_attach_slice_bo_ioctl(struct drm_device *dev, void *data,
+			       struct drm_file *file_priv)
+{
+	struct qaic_attach_slice_entry *slice_ent;
+	struct qaic_attach_slice *args = data;
+	struct dma_bridge_chan	*dbc;
+	int usr_rcu_id, qdev_rcu_id;
+	struct drm_gem_object *obj;
+	struct qaic_device *qdev;
+	unsigned long arg_size;
+	struct qaic_user *usr;
+	u8 __user *user_data;
+	struct qaic_bo *bo;
+	int ret;
+
+	usr = file_priv->driver_priv;
+	usr_rcu_id = srcu_read_lock(&usr->qddev_lock);
+	if (!usr->qddev) {
+		ret = -ENODEV;
+		goto unlock_usr_srcu;
+	}
+
+	qdev = usr->qddev->qdev;
+	qdev_rcu_id = srcu_read_lock(&qdev->dev_lock);
+	if (qdev->in_reset) {
+		ret = -ENODEV;
+		goto unlock_dev_srcu;
+	}
+
+	if (args->hdr.count == 0) {
+		ret = -EINVAL;
+		goto unlock_dev_srcu;
+	}
+
+	arg_size = args->hdr.count * sizeof(*slice_ent);
+	if (arg_size / args->hdr.count != sizeof(*slice_ent)) {
+		ret = -EINVAL;
+		goto unlock_dev_srcu;
+	}
+
+	if (args->hdr.dbc_id >= qdev->num_dbc) {
+		ret = -EINVAL;
+		goto unlock_dev_srcu;
+	}
+
+	if (args->hdr.size == 0) {
+		ret = -EINVAL;
+		goto unlock_dev_srcu;
+	}
+
+	if (!(args->hdr.dir == DMA_TO_DEVICE  ||
+	      args->hdr.dir == DMA_FROM_DEVICE)) {
+		ret = -EINVAL;
+		goto unlock_dev_srcu;
+	}
+
+	dbc = &qdev->dbc[args->hdr.dbc_id];
+	if (dbc->usr != usr) {
+		ret = -EINVAL;
+		goto unlock_dev_srcu;
+	}
+
+	if (args->data == 0) {
+		ret = -EINVAL;
+		goto unlock_dev_srcu;
+	}
+
+	user_data = u64_to_user_ptr(args->data);
+
+	slice_ent = kzalloc(arg_size, GFP_KERNEL);
+	if (!slice_ent) {
+		ret = -ENOMEM;
+		goto unlock_dev_srcu;
+	}
+
+	ret = copy_from_user(slice_ent, user_data, arg_size);
+	if (ret) {
+		ret = -EFAULT;
+		goto free_slice_ent;
+	}
+
+	ret = qaic_validate_req(qdev, slice_ent, args->hdr.count, args->hdr.size);
+	if (ret)
+		goto free_slice_ent;
+
+	obj = drm_gem_object_lookup(file_priv, args->hdr.handle);
+	if (!obj) {
+		ret = -ENOENT;
+		goto free_slice_ent;
+	}
+
+	bo = to_qaic_bo(obj);
+
+	ret = qaic_prepare_bo(qdev, bo, &args->hdr);
+	if (ret)
+		goto put_bo;
+
+	ret = qaic_attach_slicing_bo(qdev, bo, &args->hdr, slice_ent);
+	if (ret)
+		goto unprepare_bo;
+
+	if (args->hdr.dir == DMA_TO_DEVICE)
+		dma_sync_sgtable_for_cpu(&qdev->pdev->dev, bo->sgt, args->hdr.dir);
+
+	bo->dbc = dbc;
+	drm_gem_object_put(obj);
+	srcu_read_unlock(&qdev->dev_lock, qdev_rcu_id);
+	srcu_read_unlock(&usr->qddev_lock, usr_rcu_id);
+
+	return 0;
+
+unprepare_bo:
+	qaic_unprepare_bo(qdev, bo);
+put_bo:
+	drm_gem_object_put(obj);
+free_slice_ent:
+	kfree(slice_ent);
+unlock_dev_srcu:
+	srcu_read_unlock(&qdev->dev_lock, qdev_rcu_id);
+unlock_usr_srcu:
+	srcu_read_unlock(&usr->qddev_lock, usr_rcu_id);
+	return ret;
+}
+
+static inline int copy_exec_reqs(struct qaic_device *qdev,
+				 struct bo_slice *slice, u32 dbc_id, u32 head,
+				 u32 *ptail)
+{
+	struct dma_bridge_chan *dbc = &qdev->dbc[dbc_id];
+	struct dbc_req *reqs = slice->reqs;
+	u32 tail = *ptail;
+	u32 avail;
+
+	avail = head - tail;
+	if (head <= tail)
+		avail += dbc->nelem;
+
+	--avail;
+
+	if (avail < slice->nents)
+		return -EAGAIN;
+
+	if (tail + slice->nents > dbc->nelem) {
+		avail = dbc->nelem - tail;
+		avail = min_t(u32, avail, slice->nents);
+		memcpy(dbc->req_q_base + tail * get_dbc_req_elem_size(),
+		       reqs, sizeof(*reqs) * avail);
+		reqs += avail;
+		avail = slice->nents - avail;
+		if (avail)
+			memcpy(dbc->req_q_base, reqs, sizeof(*reqs) * avail);
+	} else {
+		memcpy(dbc->req_q_base + tail * get_dbc_req_elem_size(),
+		       reqs, sizeof(*reqs) * slice->nents);
+	}
+
+	*ptail = (tail + slice->nents) % dbc->nelem;
+
+	return 0;
+}
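
/*
 * Illustration (not part of the patch): the free-space calculation used by
 * copy_exec_reqs() above.  One slot is always left unused so that
 * head == tail unambiguously means "empty".  For example, with nelem = 8,
 * head = 2 and tail = 6 there are 3 usable slots (indices 6, 7 and 0), so a
 * slice with nents = 4 would get -EAGAIN.
 */
#include <stdint.h>

static uint32_t ring_space(uint32_t head, uint32_t tail, uint32_t nelem)
{
	uint32_t avail = head - tail;	/* unsigned wrap-around is intended */

	if (head <= tail)
		avail += nelem;

	return avail - 1;		/* keep one slot free */
}
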
+
+/*
+ * Based on the value of resize we may only need to transmit the first
+ * first_n entries plus one final entry, with last_bytes to send from that
+ * final entry.  Note that first_n could be 0.
+ */
+static inline int copy_partial_exec_reqs(struct qaic_device *qdev,
+					 struct bo_slice *slice,
+					 u64 resize, u32 dbc_id,
+					 u32 head, u32 *ptail)
+{
+	struct dma_bridge_chan *dbc = &qdev->dbc[dbc_id];
+	struct dbc_req *reqs = slice->reqs;
+	struct dbc_req *last_req;
+	u32 tail = *ptail;
+	u64 total_bytes;
+	u64 last_bytes;
+	u32 first_n;
+	u32 avail;
+	int ret;
+	int i;
+
+	avail = head - tail;
+	if (head <= tail)
+		avail += dbc->nelem;
+
+	--avail;
+
+	total_bytes = 0;
+	for (i = 0; i < slice->nents; i++) {
+		total_bytes += le32_to_cpu(reqs[i].len);
+		if (total_bytes >= resize)
+			break;
+	}
+
+	if (total_bytes < resize) {
+		/* User space should have used the full buffer path. */
+		ret = -EINVAL;
+		return ret;
+	}
+
+	first_n = i;
+	last_bytes = i ? resize + le32_to_cpu(reqs[i].len) - total_bytes : resize;
+
+	if (avail < (first_n + 1))
+		return -EAGAIN;
+
+	if (first_n) {
+		if (tail + first_n > dbc->nelem) {
+			avail = dbc->nelem - tail;
+			avail = min_t(u32, avail, first_n);
+			memcpy(dbc->req_q_base + tail * get_dbc_req_elem_size(),
+			       reqs, sizeof(*reqs) * avail);
+			last_req = reqs + avail;
+			avail = first_n - avail;
+			if (avail)
+				memcpy(dbc->req_q_base, last_req,
+				       sizeof(*reqs) * avail);
+		} else {
+			memcpy(dbc->req_q_base + tail * get_dbc_req_elem_size(),
+			       reqs, sizeof(*reqs) * first_n);
+		}
+	}
+
+	/*
+	 * Copy over the last entry of the slice, then fix up its len to the
+	 * leftover size and take the src/dst addresses from the entry that
+	 * resize landed in.
+	 */
+	last_req =  dbc->req_q_base +
+		    (tail + first_n) % dbc->nelem * get_dbc_req_elem_size();
+	memcpy(last_req, reqs + slice->nents - 1, sizeof(*reqs));
+
+	/*
+	 * last_bytes holds size of a DMA segment, maximum DMA segment size is
+	 * set to UINT_MAX by qaic and hence last_bytes can never exceed u32
+	 * range. So, by down sizing we are not corrupting the value.
+	 */
+	last_req->len = cpu_to_le32((u32)last_bytes);
+	last_req->src_addr = reqs[first_n].src_addr;
+	last_req->dest_addr = reqs[first_n].dest_addr;
+
+	*ptail = (tail + first_n + 1) % dbc->nelem;
+
+	return 0;
+}
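
/*
 * Illustration (not part of the patch): how copy_partial_exec_reqs() above
 * derives first_n and last_bytes.  With per-request lengths
 * {4096, 4096, 4096} and resize = 6000, the loop stops at i = 1, so
 * first_n = 1 and last_bytes = 6000 + 4096 - 8192 = 1904: one request is
 * copied unmodified and the final request sends the remaining 1904 bytes.
 */
#include <stdio.h>

int main(void)
{
	unsigned int lens[] = { 4096, 4096, 4096 };
	unsigned long long resize = 6000, total = 0, last_bytes;
	unsigned int i, first_n;

	for (i = 0; i < 3; i++) {
		total += lens[i];
		if (total >= resize)
			break;
	}

	first_n = i;
	last_bytes = i ? resize + lens[i] - total : resize;
	printf("first_n=%u last_bytes=%llu\n", first_n, last_bytes);
	return 0;
}
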
+
+static int __qaic_execute_bo_ioctl(struct drm_device *dev, void *data,
+				   struct drm_file *file_priv, bool is_partial)
+{
+	struct qaic_partial_execute_entry *pexec;
+	struct qaic_execute *args = data;
+	struct qaic_execute_entry *exec;
+	struct dma_bridge_chan *dbc;
+	int usr_rcu_id, qdev_rcu_id;
+	struct drm_gem_object *obj;
+	struct qaic_device *qdev;
+	struct bo_slice *slice;
+	struct qaic_user *usr;
+	u8 __user *user_data;
+	unsigned long flags;
+	u64 received_ts = 0;
+	u32 queue_level = 0;
+	struct qaic_bo *bo;
+	u64 submit_ts = 0;
+	unsigned long n;
+	bool queued;
+	int ret = 0;
+	int dbc_id;
+	int rcu_id;
+	u32 head;
+	u32 tail;
+	u64 size;
+	int i, j;
+
+	received_ts = ktime_get_ns();
+
+	usr = file_priv->driver_priv;
+	usr_rcu_id = srcu_read_lock(&usr->qddev_lock);
+	if (!usr->qddev) {
+		ret = -ENODEV;
+		goto unlock_usr_srcu;
+	}
+
+	qdev = usr->qddev->qdev;
+	qdev_rcu_id = srcu_read_lock(&qdev->dev_lock);
+	if (qdev->in_reset) {
+		ret = -ENODEV;
+		goto unlock_dev_srcu;
+	}
+
+	if (args->hdr.dbc_id >= qdev->num_dbc) {
+		ret = -EINVAL;
+		goto unlock_dev_srcu;
+	}
+
+	dbc_id = args->hdr.dbc_id;
+	dbc = &qdev->dbc[dbc_id];
+
+	size = is_partial ? sizeof(*pexec) : sizeof(*exec);
+
+	n = (unsigned long)size * args->hdr.count;
+	if (args->hdr.count == 0 || n / args->hdr.count != size) {
+		ret = -EINVAL;
+		goto unlock_dev_srcu;
+	}
+
+	user_data = u64_to_user_ptr(args->data);
+
+	exec = kcalloc(args->hdr.count, size, GFP_KERNEL);
+	pexec = (struct qaic_partial_execute_entry *)exec;
+	if (!exec) {
+		ret = -ENOMEM;
+		goto unlock_dev_srcu;
+	}
+
+	if (copy_from_user(exec, user_data, n)) {
+		ret = -EFAULT;
+		goto free_exec;
+	}
+
+	rcu_id = srcu_read_lock(&dbc->ch_lock);
+	if (!dbc->usr || dbc->usr->handle != usr->handle) {
+		ret = -EPERM;
+		goto release_ch_rcu;
+	}
+
+	head = readl(dbc->dbc_base + REQHP_OFF);
+	tail = readl(dbc->dbc_base + REQTP_OFF);
+
+	if (head == U32_MAX || tail == U32_MAX) {
+		/* PCI link error */
+		ret = -ENODEV;
+		goto release_ch_rcu;
+	}
+
+	queue_level = head <= tail ? tail - head : dbc->nelem - (head - tail);
+
+	for (i = 0; i < args->hdr.count; i++) {
+		/*
+		 * The ref count taken by this lookup is dropped when the
+		 * transfer of this buffer completes, in dbc_irq_threaded_fn().
+		 */
+		obj = drm_gem_object_lookup(file_priv,
+					    is_partial ? pexec[i].handle : exec[i].handle);
+		if (!obj) {
+			ret = -ENOENT;
+			goto sync_to_cpu;
+		}
+
+		bo = to_qaic_bo(obj);
+
+		if (!bo->sliced) {
+			ret = -EINVAL;
+			goto sync_to_cpu;
+		}
+
+		if (is_partial && pexec[i].resize > bo->size) {
+			ret = -EINVAL;
+			goto sync_to_cpu;
+		}
+
+		spin_lock_irqsave(&dbc->xfer_lock, flags);
+		queued = bo->queued;
+		bo->queued = true;
+		if (queued) {
+			spin_unlock_irqrestore(&dbc->xfer_lock, flags);
+			ret = -EINVAL;
+			goto sync_to_cpu;
+		}
+
+		bo->req_id = dbc->next_req_id++;
+
+		list_for_each_entry(slice, &bo->slices, slice) {
+			/*
+			 * If this slice does not fall within the given
+			 * resize, skip it and continue with the next slice.
+			 */
+			if (is_partial && pexec[i].resize &&
+			    pexec[i].resize <= slice->offset)
+				continue;
+
+			for (j = 0; j < slice->nents; j++)
+				slice->reqs[j].req_id = cpu_to_le16(bo->req_id);
+
+			/*
+			 * For a partial execute ioctl, check whether resize
+			 * cuts this slice short; if so do a partial copy,
+			 * otherwise do a complete copy.
+			 */
+			if (is_partial && pexec[i].resize &&
+			    pexec[i].resize < slice->offset + slice->size)
+				ret = copy_partial_exec_reqs(qdev, slice,
+							     pexec[i].resize - slice->offset,
+							     dbc_id, head, &tail);
+			else
+				ret = copy_exec_reqs(qdev, slice, dbc_id, head, &tail);
+			if (ret) {
+				bo->queued = false;
+				spin_unlock_irqrestore(&dbc->xfer_lock, flags);
+				goto sync_to_cpu;
+			}
+		}
+		reinit_completion(&bo->xfer_done);
+		list_add_tail(&bo->xfer_list, &dbc->xfer_list);
+		spin_unlock_irqrestore(&dbc->xfer_lock, flags);
+		dma_sync_sgtable_for_device(&qdev->pdev->dev, bo->sgt, bo->dir);
+	}
+
+	submit_ts = ktime_get_ns();
+	writel(tail, dbc->dbc_base + REQTP_OFF);
+
+	/* Collect kernel Profiling data */
+	for (i = 0; i < args->hdr.count; i++) {
+		/*
+		 * Since we already committed the BO to hardware, the only way
+		 * this should fail is a pending signal.  We can't cancel the
+		 * submit to hardware, so we have to just skip the profiling
+		 * data.  In case the signal is not fatal to the process, we
+		 * return success so that the user doesn't try to resubmit.
+		 */
+		obj = drm_gem_object_lookup(file_priv,
+					    is_partial ? pexec[i].handle : exec[i].handle);
+		if (!obj)
+			break;
+		bo = to_qaic_bo(obj);
+		bo->perf_stats.req_received_ts = received_ts;
+		bo->perf_stats.req_submit_ts = submit_ts;
+		bo->perf_stats.queue_level_before = queue_level;
+		queue_level += bo->total_slice_nents;
+		drm_gem_object_put(obj);
+	}
+
+	if (poll_datapath)
+		schedule_work(&dbc->poll_work);
+
+	goto release_ch_rcu;
+
+sync_to_cpu:
+	if (likely(obj))
+		drm_gem_object_put(obj);
+	for (j = 0; j < i; j++) {
+		spin_lock_irqsave(&dbc->xfer_lock, flags);
+		bo = list_last_entry(&dbc->xfer_list, struct qaic_bo,
+				     xfer_list);
+		obj = &bo->base;
+		bo->queued = false;
+		list_del(&bo->xfer_list);
+		spin_unlock_irqrestore(&dbc->xfer_lock, flags);
+		dma_sync_sgtable_for_cpu(&qdev->pdev->dev, bo->sgt, bo->dir);
+		/* Release ref to BO */
+		drm_gem_object_put(obj);
+	}
+release_ch_rcu:
+	srcu_read_unlock(&dbc->ch_lock, rcu_id);
+free_exec:
+	kfree(exec);
+unlock_dev_srcu:
+	srcu_read_unlock(&qdev->dev_lock, qdev_rcu_id);
+unlock_usr_srcu:
+	srcu_read_unlock(&usr->qddev_lock, usr_rcu_id);
+	return ret;
+}
+
+int qaic_execute_bo_ioctl(struct drm_device *dev, void *data,
+			  struct drm_file *file_priv)
+{
+	return __qaic_execute_bo_ioctl(dev, data, file_priv, false);
+}
+
+int qaic_partial_execute_bo_ioctl(struct drm_device *dev, void *data,
+				  struct drm_file *file_priv)
+{
+	return __qaic_execute_bo_ioctl(dev, data, file_priv, true);
+}
+
+/*
+ * Our interrupt handling is a bit more complicated than a simple ideal, but
+ * sadly necessary.
+ *
+ * Each dbc has a completion queue.  Entries in the queue correspond to DMA
+ * requests which the device has processed.  The hardware already has a built
+ * in irq mitigation.  When the device puts an entry into the queue, it will
+ * only trigger an interrupt if the queue was empty.  Therefore, when adding
+ * the Nth event to a non-empty queue, the hardware doesn't trigger an
+ * interrupt.  This means the host doesn't get additional interrupts signaling
+ * the same thing - the queue has something to process.
+ * This behavior can be overridden in the DMA request.
+ * This means that when the host receives an interrupt, it is required to
+ * drain the queue.
+ *
+ * This behavior is what NAPI attempts to accomplish, although we can't use
+ * NAPI as we don't have a netdev.  We use threaded irqs instead.
+ *
+ * However, there is a situation where the host drains the queue fast enough
+ * that every event causes an interrupt.  Typically this is not a problem as
+ * the rate of events would be low.  However, that is not the case with
+ * lprnet for example.  On an Intel Xeon D-2191 where we run 8 instances of
+ * lprnet, the host receives roughly 80k interrupts per second from the device
+ * (per /proc/interrupts).  While NAPI documentation indicates the host should
+ * just chug along, sadly that behavior causes instability in some hosts.
+ *
+ * Therefore, we implement an interrupt disable scheme similar to NAPI.  The
+ * key difference is that we will delay after draining the queue for a small
+ * time to allow additional events to come in via polling.  Using the above
+ * lprnet workload, this reduces the number of interrupts processed from
+ * ~80k/sec to about 64 in 5 minutes and appears to solve the system
+ * instability.
+ */
+irqreturn_t dbc_irq_handler(int irq, void *data)
+{
+	struct dma_bridge_chan *dbc = data;
+	int rcu_id;
+	u32 head;
+	u32 tail;
+
+	rcu_id = srcu_read_lock(&dbc->ch_lock);
+
+	if (!dbc->usr) {
+		srcu_read_unlock(&dbc->ch_lock, rcu_id);
+		return IRQ_HANDLED;
+	}
+
+	head = readl(dbc->dbc_base + RSPHP_OFF);
+	if (head == U32_MAX) { /* PCI link error */
+		srcu_read_unlock(&dbc->ch_lock, rcu_id);
+		return IRQ_NONE;
+	}
+
+	tail = readl(dbc->dbc_base + RSPTP_OFF);
+	if (tail == U32_MAX) { /* PCI link error */
+		srcu_read_unlock(&dbc->ch_lock, rcu_id);
+		return IRQ_NONE;
+	}
+
+	if (head == tail) { /* queue empty */
+		srcu_read_unlock(&dbc->ch_lock, rcu_id);
+		return IRQ_NONE;
+	}
+
+	disable_irq_nosync(irq);
+	srcu_read_unlock(&dbc->ch_lock, rcu_id);
+	return IRQ_WAKE_THREAD;
+}
+
+void irq_polling_work(struct work_struct *work)
+{
+	struct dma_bridge_chan *dbc = container_of(work,
+						   struct dma_bridge_chan,
+						   poll_work);
+	unsigned long flags;
+	int rcu_id;
+	u32 head;
+	u32 tail;
+
+	rcu_id = srcu_read_lock(&dbc->ch_lock);
+
+	while (1) {
+		if (dbc->qdev->in_reset) {
+			srcu_read_unlock(&dbc->ch_lock, rcu_id);
+			return;
+		}
+		if (!dbc->usr) {
+			srcu_read_unlock(&dbc->ch_lock, rcu_id);
+			return;
+		}
+		spin_lock_irqsave(&dbc->xfer_lock, flags);
+		if (list_empty(&dbc->xfer_list)) {
+			spin_unlock_irqrestore(&dbc->xfer_lock, flags);
+			srcu_read_unlock(&dbc->ch_lock, rcu_id);
+			return;
+		}
+		spin_unlock_irqrestore(&dbc->xfer_lock, flags);
+
+		head = readl(dbc->dbc_base + RSPHP_OFF);
+		if (head == U32_MAX) { /* PCI link error */
+			srcu_read_unlock(&dbc->ch_lock, rcu_id);
+			return;
+		}
+
+		tail = readl(dbc->dbc_base + RSPTP_OFF);
+		if (tail == U32_MAX) { /* PCI link error */
+			srcu_read_unlock(&dbc->ch_lock, rcu_id);
+			return;
+		}
+
+		if (head != tail) {
+			irq_wake_thread(dbc->irq, dbc);
+			srcu_read_unlock(&dbc->ch_lock, rcu_id);
+			return;
+		}
+
+		cond_resched();
+		usleep_range(datapath_poll_interval_us,
+			     2 * datapath_poll_interval_us);
+	}
+}
+
+irqreturn_t dbc_irq_threaded_fn(int irq, void *data)
+{
+	struct dma_bridge_chan *dbc = data;
+	int event_count = NUM_EVENTS;
+	int delay_count = NUM_DELAYS;
+	struct qaic_device *qdev;
+	struct qaic_bo *bo, *i;
+	struct dbc_rsp *rsp;
+	unsigned long flags;
+	int rcu_id;
+	u16 status;
+	u16 req_id;
+	u32 head;
+	u32 tail;
+
+	rcu_id = srcu_read_lock(&dbc->ch_lock);
+
+	head = readl(dbc->dbc_base + RSPHP_OFF);
+	if (head == U32_MAX) /* PCI link error */
+		goto error_out;
+
+	qdev = dbc->qdev;
+read_fifo:
+
+	if (!event_count) {
+		event_count = NUM_EVENTS;
+		cond_resched();
+	}
+
+	/*
+	 * if this channel isn't assigned or gets unassigned during processing
+	 * we have nothing further to do
+	 */
+	if (!dbc->usr)
+		goto error_out;
+
+	tail = readl(dbc->dbc_base + RSPTP_OFF);
+	if (tail == U32_MAX) /* PCI link error */
+		goto error_out;
+
+	if (head == tail) { /* queue empty */
+		if (delay_count) {
+			--delay_count;
+			usleep_range(100, 200);
+			goto read_fifo; /* check for a new event */
+		}
+		goto normal_out;
+	}
+
+	delay_count = NUM_DELAYS;
+	while (head != tail) {
+		if (!event_count)
+			break;
+		--event_count;
+		rsp = dbc->rsp_q_base + head * sizeof(*rsp);
+		req_id = le16_to_cpu(rsp->req_id);
+		status = le16_to_cpu(rsp->status);
+		if (status)
+			pci_dbg(qdev->pdev, "req_id %d failed with status %d\n",
+				req_id, status);
+		spin_lock_irqsave(&dbc->xfer_lock, flags);
+		/*
+		 * A BO can be divided into multiple slices and receives as
+		 * many interrupts as it has slices, so we cannot mark the BO
+		 * complete until interrupts for all of its slices have
+		 * arrived.
+		 */
+		list_for_each_entry_safe(bo, i, &dbc->xfer_list, xfer_list) {
+			if (bo->req_id == req_id)
+				bo->nr_slice_xfer_done++;
+			else
+				continue;
+
+			if (bo->nr_slice_xfer_done < bo->nr_slice)
+				break;
+
+			/*
+			 * At this point we have received all the interrupts for
+			 * BO, which means BO execution is complete.
+			 */
+			dma_sync_sgtable_for_cpu(&qdev->pdev->dev, bo->sgt, bo->dir);
+			bo->nr_slice_xfer_done = 0;
+			bo->queued = false;
+			list_del(&bo->xfer_list);
+			bo->perf_stats.req_processed_ts = ktime_get_ns();
+			complete_all(&bo->xfer_done);
+			drm_gem_object_put(&bo->base);
+			break;
+		}
+		spin_unlock_irqrestore(&dbc->xfer_lock, flags);
+		head = (head + 1) % dbc->nelem;
+	}
+
+	/*
+	 * Update the head pointer of response queue and let the device know
+	 * that we have consumed elements from the queue.
+	 */
+	writel(head, dbc->dbc_base + RSPHP_OFF);
+
+	/* elements might have been put in the queue while we were processing */
+	goto read_fifo;
+
+normal_out:
+	if (likely(!poll_datapath))
+		enable_irq(irq);
+	else
+		schedule_work(&dbc->poll_work);
+	/* checking the fifo and enabling irqs races with a new event arriving, so check for a missed event */
+	tail = readl(dbc->dbc_base + RSPTP_OFF);
+	if (tail != U32_MAX && head != tail) {
+		if (likely(!poll_datapath))
+			disable_irq_nosync(irq);
+		goto read_fifo;
+	}
+	srcu_read_unlock(&dbc->ch_lock, rcu_id);
+	return IRQ_HANDLED;
+
+error_out:
+	srcu_read_unlock(&dbc->ch_lock, rcu_id);
+	if (likely(!poll_datapath))
+		enable_irq(irq);
+	else
+		schedule_work(&dbc->poll_work);
+
+	return IRQ_HANDLED;
+}
+
+int qaic_wait_bo_ioctl(struct drm_device *dev, void *data,
+		       struct drm_file *file_priv)
+{
+	struct qaic_wait *args = data;
+	int usr_rcu_id, qdev_rcu_id;
+	struct dma_bridge_chan *dbc;
+	struct drm_gem_object *obj;
+	struct qaic_device *qdev;
+	unsigned long timeout;
+	struct qaic_user *usr;
+	struct qaic_bo *bo;
+	int rcu_id;
+	int ret;
+
+	usr = file_priv->driver_priv;
+	usr_rcu_id = srcu_read_lock(&usr->qddev_lock);
+	if (!usr->qddev) {
+		ret = -ENODEV;
+		goto unlock_usr_srcu;
+	}
+
+	qdev = usr->qddev->qdev;
+	qdev_rcu_id = srcu_read_lock(&qdev->dev_lock);
+	if (qdev->in_reset) {
+		ret = -ENODEV;
+		goto unlock_dev_srcu;
+	}
+
+	if (args->pad != 0) {
+		ret = -EINVAL;
+		goto unlock_dev_srcu;
+	}
+
+	if (args->dbc_id >= qdev->num_dbc) {
+		ret = -EINVAL;
+		goto unlock_dev_srcu;
+	}
+
+	dbc = &qdev->dbc[args->dbc_id];
+
+	rcu_id = srcu_read_lock(&dbc->ch_lock);
+	if (dbc->usr != usr) {
+		ret = -EPERM;
+		goto unlock_ch_srcu;
+	}
+
+	obj = drm_gem_object_lookup(file_priv, args->handle);
+	if (!obj) {
+		ret = -ENOENT;
+		goto unlock_ch_srcu;
+	}
+
+	bo = to_qaic_bo(obj);
+	timeout = args->timeout ? args->timeout : wait_exec_default_timeout;
+	timeout = msecs_to_jiffies(timeout);
+	ret = wait_for_completion_interruptible_timeout(&bo->xfer_done, timeout);
+	if (!ret) {
+		ret = -ETIMEDOUT;
+		goto put_obj;
+	}
+	if (ret > 0)
+		ret = 0;
+
+	if (!dbc->usr)
+		ret = -EPERM;
+
+put_obj:
+	drm_gem_object_put(obj);
+unlock_ch_srcu:
+	srcu_read_unlock(&dbc->ch_lock, rcu_id);
+unlock_dev_srcu:
+	srcu_read_unlock(&qdev->dev_lock, qdev_rcu_id);
+unlock_usr_srcu:
+	srcu_read_unlock(&usr->qddev_lock, usr_rcu_id);
+	return ret;
+}
+
+int qaic_perf_stats_bo_ioctl(struct drm_device *dev, void *data,
+			     struct drm_file *file_priv)
+{
+	struct qaic_perf_stats_entry *ent = NULL;
+	struct qaic_perf_stats *args = data;
+	int usr_rcu_id, qdev_rcu_id;
+	struct drm_gem_object *obj;
+	struct qaic_device *qdev;
+	struct qaic_user *usr;
+	struct qaic_bo *bo;
+	int ret, i;
+
+	usr = file_priv->driver_priv;
+	usr_rcu_id = srcu_read_lock(&usr->qddev_lock);
+	if (!usr->qddev) {
+		ret = -ENODEV;
+		goto unlock_usr_srcu;
+	}
+
+	qdev = usr->qddev->qdev;
+	qdev_rcu_id = srcu_read_lock(&qdev->dev_lock);
+	if (qdev->in_reset) {
+		ret = -ENODEV;
+		goto unlock_dev_srcu;
+	}
+
+	if (args->hdr.dbc_id >= qdev->num_dbc) {
+		ret = -EINVAL;
+		goto unlock_dev_srcu;
+	}
+
+	ent = kcalloc(args->hdr.count, sizeof(*ent), GFP_KERNEL);
+	if (!ent) {
+		ret = -ENOMEM;
+		goto unlock_dev_srcu;
+	}
+
+	ret = copy_from_user(ent, u64_to_user_ptr(args->data),
+			     args->hdr.count * sizeof(*ent));
+	if (ret) {
+		ret = -EFAULT;
+		goto free_ent;
+	}
+
+	for (i = 0; i < args->hdr.count; i++) {
+		obj = drm_gem_object_lookup(file_priv, ent[i].handle);
+		if (!obj) {
+			ret = -ENOENT;
+			goto free_ent;
+		}
+		bo = to_qaic_bo(obj);
+		/*
+		 * If the perf stats ioctl is called before the wait ioctl has
+		 * completed, then the latency information is invalid.
+		 */
+		if (bo->perf_stats.req_processed_ts < bo->perf_stats.req_submit_ts) {
+			ent[i].device_latency_us = 0;
+		} else {
+			ent[i].device_latency_us = (bo->perf_stats.req_processed_ts -
+						    bo->perf_stats.req_submit_ts) / 1000;
+		}
+		ent[i].submit_latency_us = (bo->perf_stats.req_submit_ts -
+					   bo->perf_stats.req_received_ts) / 1000;
+		ent[i].queue_level_before = bo->perf_stats.queue_level_before;
+		ent[i].num_queue_element = bo->total_slice_nents;
+		drm_gem_object_put(obj);
+	}
+
+	if (copy_to_user(u64_to_user_ptr(args->data), ent,
+			 args->hdr.count * sizeof(*ent)))
+		ret = -EFAULT;
+
+free_ent:
+	kfree(ent);
+unlock_dev_srcu:
+	srcu_read_unlock(&qdev->dev_lock, qdev_rcu_id);
+unlock_usr_srcu:
+	srcu_read_unlock(&usr->qddev_lock, usr_rcu_id);
+	return ret;
+}
+
+static void empty_xfer_list(struct qaic_device *qdev, struct dma_bridge_chan *dbc)
+{
+	unsigned long flags;
+	struct qaic_bo *bo;
+
+	spin_lock_irqsave(&dbc->xfer_lock, flags);
+	while (!list_empty(&dbc->xfer_list)) {
+		bo = list_first_entry(&dbc->xfer_list, typeof(*bo), xfer_list);
+		bo->queued = false;
+		list_del(&bo->xfer_list);
+		spin_unlock_irqrestore(&dbc->xfer_lock, flags);
+		dma_sync_sgtable_for_cpu(&qdev->pdev->dev, bo->sgt, bo->dir);
+		complete_all(&bo->xfer_done);
+		drm_gem_object_put(&bo->base);
+		spin_lock_irqsave(&dbc->xfer_lock, flags);
+	}
+	spin_unlock_irqrestore(&dbc->xfer_lock, flags);
+}
+
+int disable_dbc(struct qaic_device *qdev, u32 dbc_id, struct qaic_user *usr)
+{
+	if (!qdev->dbc[dbc_id].usr ||
+	    qdev->dbc[dbc_id].usr->handle != usr->handle)
+		return -EPERM;
+
+	qdev->dbc[dbc_id].usr = NULL;
+	synchronize_srcu(&qdev->dbc[dbc_id].ch_lock);
+	return 0;
+}
+
+/**
+ * enable_dbc - Enable the DBC
+ * @qdev: Qranium device handle
+ * @dbc_id: ID of the DBC
+ * @usr: User context
+ *
+ * DBCs are disabled by removing the user context from the DBC.  Adding the
+ * user context back to the DBC enables it.  This function trusts the DBC ID
+ * passed in and expects the DBC to be disabled.
+ */
+void enable_dbc(struct qaic_device *qdev, u32 dbc_id, struct qaic_user *usr)
+{
+	qdev->dbc[dbc_id].usr = usr;
+}
+
+void wakeup_dbc(struct qaic_device *qdev, u32 dbc_id)
+{
+	struct dma_bridge_chan *dbc = &qdev->dbc[dbc_id];
+
+	dbc->usr = NULL;
+	empty_xfer_list(qdev, dbc);
+	synchronize_srcu(&dbc->ch_lock);
+}
+
+void release_dbc(struct qaic_device *qdev, u32 dbc_id)
+{
+	struct bo_slice *slice, *slice_temp;
+	struct qaic_bo *bo, *bo_temp;
+	struct dma_bridge_chan *dbc;
+
+	dbc = &qdev->dbc[dbc_id];
+	if (!dbc->in_use)
+		return;
+
+	wakeup_dbc(qdev, dbc_id);
+
+	dma_free_coherent(&qdev->pdev->dev, dbc->total_size, dbc->req_q_base,
+			  dbc->dma_addr);
+	dbc->total_size = 0;
+	dbc->req_q_base = NULL;
+	dbc->dma_addr = 0;
+	dbc->nelem = 0;
+	dbc->usr = NULL;
+
+	list_for_each_entry_safe(bo, bo_temp, &dbc->bo_lists, bo_list) {
+		list_for_each_entry_safe(slice, slice_temp, &bo->slices, slice)
+			kref_put(&slice->ref_count, free_slice);
+		bo->sliced = false;
+		INIT_LIST_HEAD(&bo->slices);
+		bo->total_slice_nents = 0;
+		bo->dir = 0;
+		bo->dbc = NULL;
+		bo->nr_slice = 0;
+		bo->nr_slice_xfer_done = 0;
+		bo->queued = false;
+		bo->req_id = 0;
+		init_completion(&bo->xfer_done);
+		complete_all(&bo->xfer_done);
+		list_del(&bo->bo_list);
+		bo->perf_stats.req_received_ts = 0;
+		bo->perf_stats.req_submit_ts = 0;
+		bo->perf_stats.req_processed_ts = 0;
+		bo->perf_stats.queue_level_before = 0;
+	}
+
+	dbc->in_use = false;
+	wake_up(&dbc->dbc_release);
+}
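
For reference, here is a rough sketch of how a userspace client might drive
the ioctls added by this patch.  It is illustrative only: it assumes the
DRM_IOCTL_QAIC_* macros and struct layouts come from the
include/uapi/drm/qaic_accel.h added in this series, run_one_buffer() is an
invented name, the slice's device address, doorbell and semaphore fields are
left at placeholder zeros (in practice they come from the compiled workload),
and error handling/cleanup is omitted.

#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>

#include <drm/qaic_accel.h>

static int run_one_buffer(int fd, uint32_t dbc_id, uint64_t size)
{
	struct qaic_attach_slice_entry ent = { .size = size, .offset = 0 };
	struct qaic_execute_entry exec_ent = { 0 };
	struct qaic_create_bo create = { .size = size };
	struct qaic_attach_slice attach = { 0 };
	struct qaic_execute exec = { 0 };
	struct qaic_mmap_bo map = { 0 };
	struct qaic_wait wait = { 0 };
	void *buf;

	if (ioctl(fd, DRM_IOCTL_QAIC_CREATE_BO, &create))
		return -1;

	map.handle = create.handle;
	if (ioctl(fd, DRM_IOCTL_QAIC_MMAP_BO, &map))
		return -1;

	buf = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd,
		   map.offset);
	if (buf == MAP_FAILED)
		return -1;
	memset(buf, 0, size);		/* input data would be written here */

	/* one slice covering the whole BO, host to device */
	attach.hdr.count = 1;
	attach.hdr.dbc_id = dbc_id;
	attach.hdr.handle = create.handle;
	attach.hdr.size = size;
	attach.hdr.dir = 1;		/* DMA_TO_DEVICE */
	attach.data = (uintptr_t)&ent;
	if (ioctl(fd, DRM_IOCTL_QAIC_ATTACH_SLICE_BO, &attach))
		return -1;

	exec_ent.handle = create.handle;
	exec.hdr.count = 1;
	exec.hdr.dbc_id = dbc_id;
	exec.data = (uintptr_t)&exec_ent;
	if (ioctl(fd, DRM_IOCTL_QAIC_EXECUTE_BO, &exec))
		return -1;

	wait.handle = create.handle;
	wait.dbc_id = dbc_id;
	wait.timeout = 0;		/* 0 selects the driver default */
	return ioctl(fd, DRM_IOCTL_QAIC_WAIT_BO, &wait);
}
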
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v2 5/8] accel/qaic: Add datapath
@ 2023-02-06 15:41   ` Jeffrey Hugo
  0 siblings, 0 replies; 68+ messages in thread
From: Jeffrey Hugo @ 2023-02-06 15:41 UTC (permalink / raw)
  To: ogabbay, airlied, daniel, dri-devel
  Cc: jacek.lawrynowicz, stanislaw.gruszka, quic_pkanojiy, quic_carlv,
	quic_ajitpals, linux-doc, linux-arm-msm, Jeffrey Hugo

Add the datapath component that manages BOs and submits them to running
workloads on the qaic device via the dma_bridge hardware.  This allows
QAIC clients to interact with their workloads (run inferences) via the
following ioctls along with mmap():

DRM_IOCTL_QAIC_CREATE_BO
DRM_IOCTL_QAIC_MMAP_BO
DRM_IOCTL_QAIC_ATTACH_SLICE_BO
DRM_IOCTL_QAIC_EXECUTE_BO
DRM_IOCTL_QAIC_PARTIAL_EXECUTE_BO
DRM_IOCTL_QAIC_WAIT_BO
DRM_IOCTL_QAIC_PERF_STATS_BO

Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com>
---
 drivers/accel/qaic/qaic_data.c | 1952 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 1952 insertions(+)
 create mode 100644 drivers/accel/qaic/qaic_data.c

diff --git a/drivers/accel/qaic/qaic_data.c b/drivers/accel/qaic/qaic_data.c
new file mode 100644
index 0000000..d37cdbe
--- /dev/null
+++ b/drivers/accel/qaic/qaic_data.c
@@ -0,0 +1,1952 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+/* Copyright (c) 2019-2021, The Linux Foundation. All rights reserved. */
+/* Copyright (c) 2021-2023 Qualcomm Innovation Center, Inc. All rights reserved. */
+
+#include <linux/completion.h>
+#include <linux/delay.h>
+#include <linux/dma-buf.h>
+#include <linux/dma-mapping.h>
+#include <linux/interrupt.h>
+#include <linux/kref.h>
+#include <linux/list.h>
+#include <linux/mm.h>
+#include <linux/moduleparam.h>
+#include <linux/scatterlist.h>
+#include <linux/spinlock.h>
+#include <linux/srcu.h>
+#include <linux/types.h>
+#include <linux/uaccess.h>
+#include <linux/wait.h>
+#include <drm/drm_file.h>
+#include <drm/drm_gem.h>
+#include <drm/drm_print.h>
+#include <uapi/drm/qaic_accel.h>
+
+#include "qaic.h"
+
+#define SEM_VAL_MASK	GENMASK_ULL(11, 0)
+#define SEM_INDEX_MASK	GENMASK_ULL(4, 0)
+#define BULK_XFER	BIT(3)
+#define GEN_COMPLETION	BIT(4)
+#define INBOUND_XFER	1
+#define OUTBOUND_XFER	2
+#define REQHP_OFF	0x0 /* we read this */
+#define REQTP_OFF	0x4 /* we write this */
+#define RSPHP_OFF	0x8 /* we write this */
+#define RSPTP_OFF	0xc /* we read this */
+
+#define ENCODE_SEM(val, index, sync, cmd, flags)			\
+			((val) |					\
+			(index) << 16 |					\
+			(sync) << 22 |					\
+			(cmd) << 24 |					\
+			((cmd) ? BIT(31) : 0) |				\
+			(((flags) & SEM_INSYNCFENCE) ? BIT(30) : 0) |	\
+			(((flags) & SEM_OUTSYNCFENCE) ? BIT(29) : 0))
+#define NUM_EVENTS	128
+#define NUM_DELAYS	10
+
+static unsigned int wait_exec_default_timeout = 5000; /* 5 sec default */
+module_param(wait_exec_default_timeout, uint, 0600);
+
+static unsigned int datapath_poll_interval_us = 100; /* 100 usec default */
+module_param(datapath_poll_interval_us, uint, 0600);
+
+struct dbc_req { /* everything must be little endian encoded */
+	/*
+	 * A request ID is assigned to each memory handle going in DMA queue.
+	 * As a single memory handle can enqueue multiple elements in DMA queue
+	 * all of them will have the same request ID.
+	 */
+	__le16	req_id;
+	/* Future use */
+	__u8	seq_id;
+	/*
+	 * Special encoded variable
+	 * 7	0 - Do not force to generate MSI after DMA is completed
+	 *	1 - Force to generate MSI after DMA is completed
+	 * 6:5	Reserved
+	 * 4	1 - Generate completion element in the response queue
+	 *	0 - No Completion Code
+	 * 3	0 - DMA request is a Link list transfer
+	 *	1 - DMA request is a Bulk transfer
+	 * 2	Reserved
+	 * 1:0	00 - No DMA transfer involved
+	 *	01 - DMA transfer is part of inbound transfer
+	 *	10 - DMA transfer has outbound transfer
+	 *	11 - NA
+	 */
+	__u8	cmd;
+	__le32	resv;
+	/* Source address for the transfer */
+	__le64	src_addr;
+	/* Destination address for the transfer */
+	__le64	dest_addr;
+	/* Length of transfer request */
+	__le32	len;
+	__le32	resv2;
+	/* Doorbell address */
+	__le64	db_addr;
+	/*
+	 * Special encoded variable
+	 * 7	1 - Doorbell(db) write
+	 *	0 - No doorbell write
+	 * 6:2	Reserved
+	 * 1:0	00 - 32 bit access, db address must be aligned to 32bit-boundary
+	 *	01 - 16 bit access, db address must be aligned to 16bit-boundary
+	 *	10 - 8 bit access, db address must be aligned to 8bit-boundary
+	 *	11 - Reserved
+	 */
+	__u8	db_len;
+	__u8	resv3;
+	__le16	resv4;
+	/* 32 bit data written to doorbell address */
+	__le32	db_data;
+	/*
+	 * Special encoded variable
+	 * All the fields of sem_cmdX are passed from user and all are ORed
+	 * together to form sem_cmd.
+	 * 0:11		Semaphore value
+	 * 15:12	Reserved
+	 * 20:16	Semaphore index
+	 * 21		Reserved
+	 * 22		Semaphore Sync
+	 * 23		Reserved
+	 * 26:24	Semaphore command
+	 * 28:27	Reserved
+	 * 29		Semaphore DMA out bound sync fence
+	 * 30		Semaphore DMA in bound sync fence
+	 * 31		Enable semaphore command
+	 */
+	__le32	sem_cmd0;
+	__le32	sem_cmd1;
+	__le32	sem_cmd2;
+	__le32	sem_cmd3;
+} __packed;
+
+struct dbc_rsp { /* everything must be little endian encoded */
+	/* Request ID of the memory handle whose DMA transaction is completed */
+	__le16	req_id;
+	/* Status of the DMA transaction. 0 : Success otherwise failure */
+	__le16	status;
+} __packed;
+
+inline int get_dbc_req_elem_size(void)
+{
+	return sizeof(struct dbc_req);
+}
+
+inline int get_dbc_rsp_elem_size(void)
+{
+	return sizeof(struct dbc_rsp);
+}
+
+static int reserve_pages(unsigned long start_pfn, unsigned long nr_pages,
+			 bool reserve)
+{
+	unsigned long pfn;
+	unsigned long end_pfn = start_pfn + nr_pages;
+	struct page *page;
+
+	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
+		if (!pfn_valid(pfn))
+			return -EINVAL;
+		page =  pfn_to_page(pfn);
+		if (reserve)
+			SetPageReserved(page);
+		else
+			ClearPageReserved(page);
+	}
+	return 0;
+}
+
+static void free_slice(struct kref *kref)
+{
+	struct bo_slice *slice = container_of(kref, struct bo_slice, ref_count);
+
+	list_del(&slice->slice);
+	drm_gem_object_put(&slice->bo->base);
+	sg_free_table(slice->sgt);
+	kfree(slice->sgt);
+	kfree(slice->reqs);
+	kfree(slice);
+}
+
+static int copy_sgt(struct qaic_device *qdev, struct sg_table **sgt_out,
+		    struct sg_table *sgt_in, u64 size, u64 offset)
+{
+	int total_len, len, nents, offf = 0, offl = 0;
+	struct scatterlist *sg, *sgn, *sgf, *sgl;
+	struct sg_table *sgt;
+	int ret, j;
+
+	/* find out number of relevant nents needed for this mem */
+	total_len = 0;
+	sgf = NULL;
+	sgl = NULL;
+	nents = 0;
+
+	size = size ? size : PAGE_SIZE;
+	for (sg = sgt_in->sgl; sg; sg = sg_next(sg)) {
+		len = sg_dma_len(sg);
+
+		if (!len)
+			continue;
+		if (offset >= total_len && offset < total_len + len) {
+			sgf = sg;
+			offf = offset - total_len;
+		}
+		if (sgf)
+			nents++;
+		if (offset + size >= total_len &&
+		    offset + size <= total_len + len) {
+			sgl = sg;
+			offl = offset + size - total_len;
+			break;
+		}
+		total_len += len;
+	}
+
+	if (!sgf || !sgl) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	sgt = kzalloc(sizeof(*sgt), GFP_KERNEL);
+	if (!sgt) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	ret = sg_alloc_table(sgt, nents, GFP_KERNEL);
+	if (ret)
+		goto free_sgt;
+
+	/* copy relevant sg node and fix page and length */
+	sgn = sgf;
+	for_each_sgtable_sg(sgt, sg, j) {
+		memcpy(sg, sgn, sizeof(*sg));
+		if (sgn == sgf) {
+			sg_dma_address(sg) += offf;
+			sg_dma_len(sg) -= offf;
+			sg_set_page(sg, sg_page(sgn),
+				    sg_dma_len(sg), offf);
+		} else {
+			offf = 0;
+		}
+		if (sgn == sgl) {
+			sg_dma_len(sg) = offl - offf;
+			sg_set_page(sg, sg_page(sgn),
+				    offl - offf, offf);
+			sg_mark_end(sg);
+			break;
+		}
+		sgn = sg_next(sgn);
+	}
+
+	*sgt_out = sgt;
+	return ret;
+
+free_sgt:
+	kfree(sgt);
+out:
+	*sgt_out = NULL;
+	return ret;
+}
+
+static int encode_reqs(struct qaic_device *qdev, struct bo_slice *slice,
+		       struct qaic_attach_slice_entry *req)
+{
+	__u8 cmd = BULK_XFER;
+	__le64 db_addr = cpu_to_le64(req->db_addr);
+	__u8 db_len;
+	__le32 db_data = cpu_to_le32(req->db_data);
+	struct scatterlist *sg;
+	u64 dev_addr;
+	int presync_sem;
+	int i;
+
+	if (!slice->no_xfer)
+		cmd |= (slice->dir == DMA_TO_DEVICE ? INBOUND_XFER :
+								OUTBOUND_XFER);
+
+	if (req->db_len && !IS_ALIGNED(req->db_addr, req->db_len / 8))
+		return -EINVAL;
+
+	presync_sem = req->sem0.presync + req->sem1.presync +
+		      req->sem2.presync + req->sem3.presync;
+	if (presync_sem > 1)
+		return -EINVAL;
+
+	presync_sem = req->sem0.presync << 0 | req->sem1.presync << 1 |
+		      req->sem2.presync << 2 | req->sem3.presync << 3;
+
+	switch (req->db_len) {
+	case 32:
+		db_len = BIT(7);
+		break;
+	case 16:
+		db_len = BIT(7) | 1;
+		break;
+	case 8:
+		db_len = BIT(7) | 2;
+		break;
+	case 0:
+		db_len = 0; /* doorbell is not active for this command */
+		break;
+	default:
+		return -EINVAL; /* should never hit this */
+	}
+
+	/*
+	 * When we end up splitting up a single request (ie a buf slice) into
+	 * multiple DMA requests, we have to manage the sync data carefully.
+	 * There can only be one presync sem.  That needs to be on every xfer
+	 * so that the DMA engine doesn't transfer data before the receiver is
+	 * ready.  We only do the doorbell and postsync sems after the xfer.
+	 * To guarantee previous xfers for the request are complete, we use a
+	 * fence.
+	 */
+	dev_addr = req->dev_addr;
+	for_each_sgtable_sg(slice->sgt, sg, i) {
+		slice->reqs[i].cmd = cmd;
+		slice->reqs[i].src_addr =
+			cpu_to_le64(slice->dir == DMA_TO_DEVICE ?
+					sg_dma_address(sg) : dev_addr);
+		slice->reqs[i].dest_addr =
+			cpu_to_le64(slice->dir == DMA_TO_DEVICE ?
+					dev_addr : sg_dma_address(sg));
+		/*
+		 * sg_dma_len(sg) returns size of a DMA segment, maximum DMA
+		 * segment size is set to UINT_MAX by qaic and hence return
+		 * values of sg_dma_len(sg) can never exceed u32 range. So,
+		 * by down sizing we are not corrupting the value.
+		 */
+		slice->reqs[i].len = cpu_to_le32((u32)sg_dma_len(sg));
+		switch (presync_sem) {
+		case BIT(0):
+			slice->reqs[i].sem_cmd0 = cpu_to_le32(ENCODE_SEM(req->sem0.val,
+									 req->sem0.index,
+									 req->sem0.presync,
+									 req->sem0.cmd,
+									 req->sem0.flags));
+			break;
+		case BIT(1):
+			slice->reqs[i].sem_cmd1 = cpu_to_le32(ENCODE_SEM(req->sem1.val,
+									 req->sem1.index,
+									 req->sem1.presync,
+									 req->sem1.cmd,
+									 req->sem1.flags));
+			break;
+		case BIT(2):
+			slice->reqs[i].sem_cmd2 = cpu_to_le32(ENCODE_SEM(req->sem2.val,
+									 req->sem2.index,
+									 req->sem2.presync,
+									 req->sem2.cmd,
+									 req->sem2.flags));
+			break;
+		case BIT(3):
+			slice->reqs[i].sem_cmd3 = cpu_to_le32(ENCODE_SEM(req->sem3.val,
+									 req->sem3.index,
+									 req->sem3.presync,
+									 req->sem3.cmd,
+									 req->sem3.flags));
+			break;
+		}
+		dev_addr += sg_dma_len(sg);
+	}
+	/* add post transfer stuff to last segment */
+	i--;
+	slice->reqs[i].cmd |= GEN_COMPLETION;
+	slice->reqs[i].db_addr = db_addr;
+	slice->reqs[i].db_len = db_len;
+	slice->reqs[i].db_data = db_data;
+	/*
+	 * Add a fence if we have more than one request going to the hardware
+	 * representing the entirety of the user request, and the user request
+	 * has no presync condition.
+	 * Fences are expensive, so we try to avoid them.  We rely on the
+	 * hardware behavior to avoid needing one when there is a presync
+	 * condition.  When a presync exists, all requests for that same
+	 * presync will be queued into a fifo.  Thus, since we queue the
+	 * post xfer activity only on the last request we queue, the hardware
+	 * will ensure that the last queued request is processed last, thus
+	 * making sure the post xfer activity happens at the right time without
+	 * a fence.
+	 */
+	if (i && !presync_sem)
+		req->sem0.flags |= (slice->dir == DMA_TO_DEVICE ?
+				    SEM_INSYNCFENCE : SEM_OUTSYNCFENCE);
+	slice->reqs[i].sem_cmd0 = cpu_to_le32(ENCODE_SEM(req->sem0.val,
+							 req->sem0.index,
+							 req->sem0.presync,
+							 req->sem0.cmd,
+							 req->sem0.flags));
+	slice->reqs[i].sem_cmd1 = cpu_to_le32(ENCODE_SEM(req->sem1.val,
+							 req->sem1.index,
+							 req->sem1.presync,
+							 req->sem1.cmd,
+							 req->sem1.flags));
+	slice->reqs[i].sem_cmd2 = cpu_to_le32(ENCODE_SEM(req->sem2.val,
+							 req->sem2.index,
+							 req->sem2.presync,
+							 req->sem2.cmd,
+							 req->sem2.flags));
+	slice->reqs[i].sem_cmd3 = cpu_to_le32(ENCODE_SEM(req->sem3.val,
+							 req->sem3.index,
+							 req->sem3.presync,
+							 req->sem3.cmd,
+							 req->sem3.flags));
+
+	return 0;
+}
+
+static int qaic_map_one_slice(struct qaic_device *qdev, struct qaic_bo *bo,
+			      struct qaic_attach_slice_entry *slice_ent)
+{
+	struct sg_table *sgt = NULL;
+	struct bo_slice *slice;
+	int ret;
+
+	ret = copy_sgt(qdev, &sgt, bo->sgt, slice_ent->size, slice_ent->offset);
+	if (ret)
+		goto out;
+
+	slice = kmalloc(sizeof(*slice), GFP_KERNEL);
+	if (!slice) {
+		ret = -ENOMEM;
+		goto free_sgt;
+	}
+
+	slice->reqs = kcalloc(sgt->nents, sizeof(*slice->reqs), GFP_KERNEL);
+	if (!slice->reqs) {
+		ret = -ENOMEM;
+		goto free_slice;
+	}
+
+	slice->no_xfer = !slice_ent->size;
+	slice->sgt = sgt;
+	slice->nents = sgt->nents;
+	slice->dir = bo->dir;
+	slice->bo = bo;
+	slice->size = slice_ent->size;
+	slice->offset = slice_ent->offset;
+
+	ret = encode_reqs(qdev, slice, slice_ent);
+	if (ret)
+		goto free_req;
+
+	bo->total_slice_nents += sgt->nents;
+	kref_init(&slice->ref_count);
+	drm_gem_object_get(&bo->base);
+	list_add_tail(&slice->slice, &bo->slices);
+
+	return 0;
+
+free_req:
+	kfree(slice->reqs);
+free_slice:
+	kfree(slice);
+free_sgt:
+	sg_free_table(sgt);
+	kfree(sgt);
+out:
+	return ret;
+}
+
+static int create_sgt(struct qaic_device *qdev, struct sg_table **sgt_out,
+		      u64 size)
+{
+	struct scatterlist *sg;
+	struct sg_table *sgt;
+	struct page **pages;
+	int *pages_order;
+	int buf_extra;
+	int max_order;
+	int nr_pages;
+	int ret = 0;
+	int i, j, k;
+	int order;
+
+	if (size) {
+		nr_pages = DIV_ROUND_UP(size, PAGE_SIZE);
+		/*
+		 * calculate how much extra we are going to allocate, to remove
+		 * later
+		 */
+		buf_extra = (PAGE_SIZE - size % PAGE_SIZE) % PAGE_SIZE;
+		max_order = min(MAX_ORDER - 1, get_order(size));
+	} else {
+		/* allocate a single page for bookkeeping */
+		nr_pages = 1;
+		buf_extra = 0;
+		max_order = 0;
+	}
+
+	pages = kvmalloc_array(nr_pages, sizeof(*pages) + sizeof(*pages_order), GFP_KERNEL);
+	if (!pages) {
+		ret = -ENOMEM;
+		goto out;
+	}
+	pages_order = (void *)pages + sizeof(*pages) * nr_pages;
+
+	/*
+	 * Allocate requested memory using alloc_pages. It is possible to allocate
+	 * the requested memory in multiple chunks by calling alloc_pages
+	 * multiple times. Use SG table to handle multiple allocated pages.
+	 */
+	i = 0;
+	while (nr_pages > 0) {
+		order = min(get_order(nr_pages * PAGE_SIZE), max_order);
+		while (1) {
+			pages[i] = alloc_pages(GFP_KERNEL | GFP_HIGHUSER |
+					       __GFP_NOWARN | __GFP_ZERO |
+					       (order ? __GFP_NORETRY : __GFP_RETRY_MAYFAIL),
+					       order);
+			if (pages[i])
+				break;
+			if (!order--) {
+				ret = -ENOMEM;
+				goto free_partial_alloc;
+			}
+		}
+
+		max_order = order;
+		pages_order[i] = order;
+
+		nr_pages -= 1 << order;
+		if (nr_pages <= 0)
+			/* account for over-allocation */
+			buf_extra += abs(nr_pages) * PAGE_SIZE;
+		i++;
+	}
+
+	sgt = kmalloc(sizeof(*sgt), GFP_KERNEL);
+	if (!sgt) {
+		ret = -ENOMEM;
+		goto free_partial_alloc;
+	}
+
+	if (sg_alloc_table(sgt, i, GFP_KERNEL)) {
+		ret = -ENOMEM;
+		goto free_sgt;
+	}
+
+	/* Populate the SG table with the allocated memory pages */
+	sg = sgt->sgl;
+	for (k = 0; k < i; k++, sg = sg_next(sg)) {
+		/* Last entry requires special handling */
+		if (k < i - 1) {
+			sg_set_page(sg, pages[k], PAGE_SIZE << pages_order[k], 0);
+		} else {
+			sg_set_page(sg, pages[k],
+				    (PAGE_SIZE << pages_order[k]) - buf_extra, 0);
+			sg_mark_end(sg);
+		}
+
+		ret = reserve_pages(page_to_pfn(pages[k]), DIV_ROUND_UP(sg->length, PAGE_SIZE),
+				    true);
+		if (ret)
+			goto clear_pages;
+	}
+
+	kvfree(pages);
+	*sgt_out = sgt;
+	return ret;
+
+clear_pages:
+	for (j = 0; j < k; j++)
+		ret = reserve_pages(page_to_pfn(pages[j]), 1 << pages_order[j],
+				    false);
+	sg_free_table(sgt);
+free_sgt:
+	kfree(sgt);
+free_partial_alloc:
+	for (j = 0; j < i; j++)
+		__free_pages(pages[j], pages_order[j]);
+	kvfree(pages);
+out:
+	*sgt_out = NULL;
+	return ret;
+}
+
+static bool invalid_sem(struct qaic_sem *sem)
+{
+	if (sem->val & ~SEM_VAL_MASK || sem->index & ~SEM_INDEX_MASK ||
+	    !(sem->presync == 0 || sem->presync == 1) || sem->pad ||
+	    sem->flags & ~(SEM_INSYNCFENCE | SEM_OUTSYNCFENCE) ||
+	    sem->cmd > SEM_WAIT_GT_0)
+		return true;
+	return false;
+}
+
+static int qaic_validate_req(struct qaic_device *qdev,
+			     struct qaic_attach_slice_entry *slice_ent,
+			     u32 count, u64 total_size)
+{
+	int i;
+
+	for (i = 0; i < count; i++) {
+		if (!(slice_ent[i].db_len == 32 || slice_ent[i].db_len == 16 ||
+		      slice_ent[i].db_len == 8 || slice_ent[i].db_len == 0) ||
+		      invalid_sem(&slice_ent[i].sem0) ||
+		      invalid_sem(&slice_ent[i].sem1) ||
+		      invalid_sem(&slice_ent[i].sem2) ||
+		      invalid_sem(&slice_ent[i].sem3))
+			return -EINVAL;
+
+		if (slice_ent[i].offset + slice_ent[i].size > total_size)
+			return -EINVAL;
+	}
+
+	return 0;
+}
+
+static void qaic_free_sgt(struct sg_table *sgt)
+{
+	struct scatterlist *sg;
+
+	for (sg = sgt->sgl; sg; sg = sg_next(sg))
+		if (sg_page(sg)) {
+			reserve_pages(page_to_pfn(sg_page(sg)),
+				      DIV_ROUND_UP(sg->length, PAGE_SIZE), false);
+			__free_pages(sg_page(sg), get_order(sg->length));
+		}
+	sg_free_table(sgt);
+	kfree(sgt);
+}
+
+static void qaic_gem_print_info(struct drm_printer *p, unsigned int indent,
+				const struct drm_gem_object *obj)
+{
+	struct qaic_bo *bo = to_qaic_bo(obj);
+
+	drm_printf_indent(p, indent, "user requested size=%llu\n", bo->size);
+}
+
+static const struct vm_operations_struct drm_vm_ops = {
+	.open = drm_gem_vm_open,
+	.close = drm_gem_vm_close,
+};
+
+static int qaic_gem_object_mmap(struct drm_gem_object *obj, struct vm_area_struct *vma)
+{
+	struct qaic_bo *bo = to_qaic_bo(obj);
+	unsigned long offset = 0;
+	struct scatterlist *sg;
+	int ret;
+
+	if (obj->import_attach)
+		return -EINVAL;
+
+	for (sg = bo->sgt->sgl; sg; sg = sg_next(sg)) {
+		if (sg_page(sg)) {
+			ret = remap_pfn_range(vma, vma->vm_start + offset,
+					      page_to_pfn(sg_page(sg)),
+					      sg->length, vma->vm_page_prot);
+			if (ret)
+				goto out;
+			offset += sg->length;
+		}
+	}
+
+out:
+	return ret;
+}
+
+static void qaic_free_object(struct drm_gem_object *obj)
+{
+	struct qaic_bo *bo = to_qaic_bo(obj);
+
+	if (obj->import_attach) {
+		/* DMABUF/PRIME Path */
+		dma_buf_detach(obj->import_attach->dmabuf, obj->import_attach);
+		dma_buf_put(obj->import_attach->dmabuf);
+	} else {
+		/* Private buffer allocation path */
+		qaic_free_sgt(bo->sgt);
+	}
+
+	drm_gem_object_release(obj);
+	kfree(bo);
+}
+
+static const struct drm_gem_object_funcs qaic_gem_funcs = {
+	.free = qaic_free_object,
+	.print_info = qaic_gem_print_info,
+	.mmap = qaic_gem_object_mmap,
+	.vm_ops = &drm_vm_ops,
+};
+
+static struct qaic_bo *qaic_alloc_init_bo(void)
+{
+	struct qaic_bo *bo;
+
+	bo = kzalloc(sizeof(*bo), GFP_KERNEL);
+	if (!bo)
+		return ERR_PTR(-ENOMEM);
+
+	INIT_LIST_HEAD(&bo->slices);
+	init_completion(&bo->xfer_done);
+	complete_all(&bo->xfer_done);
+
+	return bo;
+}
+
+int qaic_create_bo_ioctl(struct drm_device *dev, void *data,
+			 struct drm_file *file_priv)
+{
+	struct qaic_create_bo *args = data;
+	int usr_rcu_id, qdev_rcu_id;
+	struct drm_gem_object *obj;
+	struct qaic_device *qdev;
+	struct qaic_user *usr;
+	struct qaic_bo *bo;
+	size_t size;
+	int ret;
+
+	if (args->pad)
+		return -EINVAL;
+
+	usr = file_priv->driver_priv;
+	usr_rcu_id = srcu_read_lock(&usr->qddev_lock);
+	if (!usr->qddev) {
+		ret = -ENODEV;
+		goto unlock_usr_srcu;
+	}
+
+	qdev = usr->qddev->qdev;
+	qdev_rcu_id = srcu_read_lock(&qdev->dev_lock);
+	if (qdev->in_reset) {
+		ret = -ENODEV;
+		goto unlock_dev_srcu;
+	}
+
+	size = PAGE_ALIGN(args->size);
+	if (size == 0) {
+		ret = -EINVAL;
+		goto unlock_dev_srcu;
+	}
+
+	bo = qaic_alloc_init_bo();
+	if (IS_ERR(bo)) {
+		ret = PTR_ERR(bo);
+		goto unlock_dev_srcu;
+	}
+	obj = &bo->base;
+
+	drm_gem_private_object_init(dev, obj, size);
+
+	obj->funcs = &qaic_gem_funcs;
+	ret = create_sgt(qdev, &bo->sgt, size);
+	if (ret)
+		goto free_bo;
+
+	bo->size = args->size;
+
+	ret = drm_gem_handle_create(file_priv, obj, &args->handle);
+	if (ret)
+		goto free_sgt;
+
+	bo->handle = args->handle;
+	drm_gem_object_put(obj);
+	srcu_read_unlock(&qdev->dev_lock, qdev_rcu_id);
+	srcu_read_unlock(&usr->qddev_lock, usr_rcu_id);
+
+	return 0;
+
+free_sgt:
+	qaic_free_sgt(bo->sgt);
+free_bo:
+	kfree(bo);
+unlock_dev_srcu:
+	srcu_read_unlock(&qdev->dev_lock, qdev_rcu_id);
+unlock_usr_srcu:
+	srcu_read_unlock(&usr->qddev_lock, usr_rcu_id);
+	return ret;
+}
+
+int qaic_mmap_bo_ioctl(struct drm_device *dev, void *data,
+		       struct drm_file *file_priv)
+{
+	struct qaic_mmap_bo *args = data;
+	int usr_rcu_id, qdev_rcu_id;
+	struct drm_gem_object *obj;
+	struct qaic_device *qdev;
+	struct qaic_user *usr;
+	int ret;
+
+	usr = file_priv->driver_priv;
+	usr_rcu_id = srcu_read_lock(&usr->qddev_lock);
+	if (!usr->qddev) {
+		ret = -ENODEV;
+		goto unlock_usr_srcu;
+	}
+
+	qdev = usr->qddev->qdev;
+	qdev_rcu_id = srcu_read_lock(&qdev->dev_lock);
+	if (qdev->in_reset) {
+		ret = -ENODEV;
+		goto unlock_dev_srcu;
+	}
+
+	obj = drm_gem_object_lookup(file_priv, args->handle);
+	if (!obj) {
+		ret = -ENOENT;
+		goto unlock_dev_srcu;
+	}
+
+	ret = drm_gem_create_mmap_offset(obj);
+	if (ret == 0)
+		args->offset = drm_vma_node_offset_addr(&obj->vma_node);
+
+	drm_gem_object_put(obj);
+
+unlock_dev_srcu:
+	srcu_read_unlock(&qdev->dev_lock, qdev_rcu_id);
+unlock_usr_srcu:
+	srcu_read_unlock(&usr->qddev_lock, usr_rcu_id);
+	return ret;
+}
+
+struct drm_gem_object *qaic_gem_prime_import(struct drm_device *dev,
+					     struct dma_buf *dma_buf)
+{
+	struct dma_buf_attachment *attach;
+	struct drm_gem_object *obj;
+	struct qaic_bo *bo;
+	size_t size;
+	int ret;
+
+	bo = qaic_alloc_init_bo();
+	if (IS_ERR(bo)) {
+		ret = PTR_ERR(bo);
+		goto out;
+	}
+
+	obj = &bo->base;
+	get_dma_buf(dma_buf);
+
+	attach = dma_buf_attach(dma_buf, dev->dev);
+	if (IS_ERR(attach)) {
+		ret = PTR_ERR(attach);
+		goto attach_fail;
+	}
+
+	size = PAGE_ALIGN(attach->dmabuf->size);
+	if (size == 0) {
+		ret = -EINVAL;
+		goto size_align_fail;
+	}
+
+	drm_gem_private_object_init(dev, obj, size);
+	/*
+	 * skipping dma_buf_map_attachment() as we do not know the direction
+	 * just yet.  Once the direction is known in the subsequent IOCTL to
+	 * attach slicing, we can do it then.
+	 */
+
+	obj->funcs = &qaic_gem_funcs;
+	obj->import_attach = attach;
+	obj->resv = dma_buf->resv;
+
+	return obj;
+
+size_align_fail:
+	dma_buf_detach(dma_buf, attach);
+attach_fail:
+	dma_buf_put(dma_buf);
+	kfree(bo);
+out:
+	return ERR_PTR(ret);
+}
+
+static int qaic_prepare_import_bo(struct qaic_bo *bo,
+				  struct qaic_attach_slice_hdr *hdr)
+{
+	struct drm_gem_object *obj = &bo->base;
+	struct sg_table *sgt;
+	int ret;
+
+	if (obj->import_attach->dmabuf->size < hdr->size)
+		return -EINVAL;
+
+	sgt = dma_buf_map_attachment(obj->import_attach, hdr->dir);
+	if (IS_ERR(sgt)) {
+		ret = PTR_ERR(sgt);
+		return ret;
+	}
+
+	bo->sgt = sgt;
+	bo->size = hdr->size;
+
+	return 0;
+}
+
+static int qaic_prepare_export_bo(struct qaic_device *qdev, struct qaic_bo *bo,
+				  struct qaic_attach_slice_hdr *hdr)
+{
+	int ret;
+
+	if (bo->size != hdr->size)
+		return -EINVAL;
+
+	ret = dma_map_sgtable(&qdev->pdev->dev, bo->sgt, hdr->dir, 0);
+	if (ret)
+		return -EFAULT;
+
+	return 0;
+}
+
+static int qaic_prepare_bo(struct qaic_device *qdev, struct qaic_bo *bo,
+			   struct qaic_attach_slice_hdr *hdr)
+{
+	int ret;
+
+	if (bo->base.import_attach)
+		ret = qaic_prepare_import_bo(bo, hdr);
+	else
+		ret = qaic_prepare_export_bo(qdev, bo, hdr);
+
+	if (ret == 0)
+		bo->dir = hdr->dir;
+
+	return ret;
+}
+
+static void qaic_unprepare_import_bo(struct qaic_bo *bo)
+{
+	dma_buf_unmap_attachment(bo->base.import_attach, bo->sgt, bo->dir);
+	bo->sgt = NULL;
+	bo->size = 0;
+}
+
+static void qaic_unprepare_export_bo(struct qaic_device *qdev, struct qaic_bo *bo)
+{
+	dma_unmap_sgtable(&qdev->pdev->dev, bo->sgt, bo->dir, 0);
+}
+
+static void qaic_unprepare_bo(struct qaic_device *qdev, struct qaic_bo *bo)
+{
+	if (bo->base.import_attach)
+		qaic_unprepare_import_bo(bo);
+	else
+		qaic_unprepare_export_bo(qdev, bo);
+
+	bo->dir = 0;
+}
+
+static void qaic_free_slices_bo(struct qaic_bo *bo)
+{
+	struct bo_slice *slice, *temp;
+
+	list_for_each_entry_safe(slice, temp, &bo->slices, slice) {
+		kref_put(&slice->ref_count, free_slice);
+	}
+}
+
+static int qaic_attach_slicing_bo(struct qaic_device *qdev,
+				  struct qaic_bo *bo,
+				  struct qaic_attach_slice_hdr *hdr,
+				  struct qaic_attach_slice_entry *slice_ent)
+{
+	int ret, i;
+
+	for (i = 0; i < hdr->count; i++) {
+		ret = qaic_map_one_slice(qdev, bo, &slice_ent[i]);
+		if (ret) {
+			qaic_free_slices_bo(bo);
+			return ret;
+		}
+	}
+
+	if (bo->total_slice_nents > qdev->dbc[hdr->dbc_id].nelem) {
+		qaic_free_slices_bo(bo);
+		return -ENOSPC;
+	}
+
+	bo->sliced = true;
+	bo->nr_slice = hdr->count;
+	list_add_tail(&bo->bo_list, &qdev->dbc[hdr->dbc_id].bo_lists);
+
+	return 0;
+}
+
+int qaic_attach_slice_bo_ioctl(struct drm_device *dev, void *data,
+			       struct drm_file *file_priv)
+{
+	struct qaic_attach_slice_entry *slice_ent;
+	struct qaic_attach_slice *args = data;
+	struct dma_bridge_chan	*dbc;
+	int usr_rcu_id, qdev_rcu_id;
+	struct drm_gem_object *obj;
+	struct qaic_device *qdev;
+	unsigned long arg_size;
+	struct qaic_user *usr;
+	u8 __user *user_data;
+	struct qaic_bo *bo;
+	int ret;
+
+	usr = file_priv->driver_priv;
+	usr_rcu_id = srcu_read_lock(&usr->qddev_lock);
+	if (!usr->qddev) {
+		ret = -ENODEV;
+		goto unlock_usr_srcu;
+	}
+
+	qdev = usr->qddev->qdev;
+	qdev_rcu_id = srcu_read_lock(&qdev->dev_lock);
+	if (qdev->in_reset) {
+		ret = -ENODEV;
+		goto unlock_dev_srcu;
+	}
+
+	if (args->hdr.count == 0) {
+		ret = -EINVAL;
+		goto unlock_dev_srcu;
+	}
+
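+	/* Reject counts whose slice entry array size would overflow arg_size */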
+	arg_size = args->hdr.count * sizeof(*slice_ent);
+	if (arg_size / args->hdr.count != sizeof(*slice_ent)) {
+		ret = -EINVAL;
+		goto unlock_dev_srcu;
+	}
+
+	if (args->hdr.dbc_id >= qdev->num_dbc) {
+		ret = -EINVAL;
+		goto unlock_dev_srcu;
+	}
+
+	if (args->hdr.size == 0) {
+		ret = -EINVAL;
+		goto unlock_dev_srcu;
+	}
+
+	if (!(args->hdr.dir == DMA_TO_DEVICE  ||
+	      args->hdr.dir == DMA_FROM_DEVICE)) {
+		ret = -EINVAL;
+		goto unlock_dev_srcu;
+	}
+
+	dbc = &qdev->dbc[args->hdr.dbc_id];
+	if (dbc->usr != usr) {
+		ret = -EINVAL;
+		goto unlock_dev_srcu;
+	}
+
+	if (args->data == 0) {
+		ret = -EINVAL;
+		goto unlock_dev_srcu;
+	}
+
+	user_data = u64_to_user_ptr(args->data);
+
+	slice_ent = kzalloc(arg_size, GFP_KERNEL);
+	if (!slice_ent) {
+		ret = -ENOMEM;
+		goto unlock_dev_srcu;
+	}
+
+	ret = copy_from_user(slice_ent, user_data, arg_size);
+	if (ret) {
+		ret = -EFAULT;
+		goto free_slice_ent;
+	}
+
+	ret = qaic_validate_req(qdev, slice_ent, args->hdr.count, args->hdr.size);
+	if (ret)
+		goto free_slice_ent;
+
+	obj = drm_gem_object_lookup(file_priv, args->hdr.handle);
+	if (!obj) {
+		ret = -ENOENT;
+		goto free_slice_ent;
+	}
+
+	bo = to_qaic_bo(obj);
+
+	ret = qaic_prepare_bo(qdev, bo, &args->hdr);
+	if (ret)
+		goto put_bo;
+
+	ret = qaic_attach_slicing_bo(qdev, bo, &args->hdr, slice_ent);
+	if (ret)
+		goto unprepare_bo;
+
+	if (args->hdr.dir == DMA_TO_DEVICE)
+		dma_sync_sgtable_for_cpu(&qdev->pdev->dev, bo->sgt, args->hdr.dir);
+
+	bo->dbc = dbc;
+	drm_gem_object_put(obj);
+	srcu_read_unlock(&qdev->dev_lock, qdev_rcu_id);
+	srcu_read_unlock(&usr->qddev_lock, usr_rcu_id);
+
+	return 0;
+
+unprepare_bo:
+	qaic_unprepare_bo(qdev, bo);
+put_bo:
+	drm_gem_object_put(obj);
+free_slice_ent:
+	kfree(slice_ent);
+unlock_dev_srcu:
+	srcu_read_unlock(&qdev->dev_lock, qdev_rcu_id);
+unlock_usr_srcu:
+	srcu_read_unlock(&usr->qddev_lock, usr_rcu_id);
+	return ret;
+}
+
+static inline int copy_exec_reqs(struct qaic_device *qdev,
+				 struct bo_slice *slice, u32 dbc_id, u32 head,
+				 u32 *ptail)
+{
+	struct dma_bridge_chan *dbc = &qdev->dbc[dbc_id];
+	struct dbc_req *reqs = slice->reqs;
+	u32 tail = *ptail;
+	u32 avail;
+
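+	/*
+	 * Compute the free space in the circular request queue. One element is
+	 * always left unused so that a full queue can be told apart from an
+	 * empty one (head == tail means empty).
+	 */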
+	avail = head - tail;
+	if (head <= tail)
+		avail += dbc->nelem;
+
+	--avail;
+
+	if (avail < slice->nents)
+		return -EAGAIN;
+
+	if (tail + slice->nents > dbc->nelem) {
+		avail = dbc->nelem - tail;
+		avail = min_t(u32, avail, slice->nents);
+		memcpy(dbc->req_q_base + tail * get_dbc_req_elem_size(),
+		       reqs, sizeof(*reqs) * avail);
+		reqs += avail;
+		avail = slice->nents - avail;
+		if (avail)
+			memcpy(dbc->req_q_base, reqs, sizeof(*reqs) * avail);
+	} else {
+		memcpy(dbc->req_q_base + tail * get_dbc_req_elem_size(),
+		       reqs, sizeof(*reqs) * slice->nents);
+	}
+
+	*ptail = (tail + slice->nents) % dbc->nelem;
+
+	return 0;
+}
+
+/*
+ * Based on the value of resize we may only need to transmit first_n
+ * entries and the last entry, with last_bytes to send from the last entry.
+ * Note that first_n could be 0.
+ */
+static inline int copy_partial_exec_reqs(struct qaic_device *qdev,
+					 struct bo_slice *slice,
+					 u64 resize, u32 dbc_id,
+					 u32 head, u32 *ptail)
+{
+	struct dma_bridge_chan *dbc = &qdev->dbc[dbc_id];
+	struct dbc_req *reqs = slice->reqs;
+	struct dbc_req *last_req;
+	u32 tail = *ptail;
+	u64 total_bytes;
+	u64 last_bytes;
+	u32 first_n;
+	u32 avail;
+	int ret;
+	int i;
+
+	avail = head - tail;
+	if (head <= tail)
+		avail += dbc->nelem;
+
+	--avail;
+
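+	/* Walk the slice's segments until resize bytes have been covered */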
+	total_bytes = 0;
+	for (i = 0; i < slice->nents; i++) {
+		total_bytes += le32_to_cpu(reqs[i].len);
+		if (total_bytes >= resize)
+			break;
+	}
+
+	if (total_bytes < resize) {
+		/* User space should have used the full buffer path. */
+		ret = -EINVAL;
+		return ret;
+	}
+
+	first_n = i;
+	last_bytes = i ? resize + le32_to_cpu(reqs[i].len) - total_bytes : resize;
+
+	if (avail < (first_n + 1))
+		return -EAGAIN;
+
+	if (first_n) {
+		if (tail + first_n > dbc->nelem) {
+			avail = dbc->nelem - tail;
+			avail = min_t(u32, avail, first_n);
+			memcpy(dbc->req_q_base + tail * get_dbc_req_elem_size(),
+			       reqs, sizeof(*reqs) * avail);
+			last_req = reqs + avail;
+			avail = first_n - avail;
+			if (avail)
+				memcpy(dbc->req_q_base, last_req,
+				       sizeof(*reqs) * avail);
+		} else {
+			memcpy(dbc->req_q_base + tail * get_dbc_req_elem_size(),
+			       reqs, sizeof(*reqs) * first_n);
+		}
+	}
+
+	/*
+	 * Copy over the last entry. Here we need to adjust len to the
+	 * leftover size, and set src and dst to the entry it is copied to.
+	 */
+	last_req =  dbc->req_q_base +
+		    (tail + first_n) % dbc->nelem * get_dbc_req_elem_size();
+	memcpy(last_req, reqs + slice->nents - 1, sizeof(*reqs));
+
+	/*
+	 * last_bytes holds size of a DMA segment, maximum DMA segment size is
+	 * set to UINT_MAX by qaic and hence last_bytes can never exceed u32
+	 * range. So, by down sizing we are not corrupting the value.
+	 */
+	last_req->len = cpu_to_le32((u32)last_bytes);
+	last_req->src_addr = reqs[first_n].src_addr;
+	last_req->dest_addr = reqs[first_n].dest_addr;
+
+	*ptail = (tail + first_n + 1) % dbc->nelem;
+
+	return 0;
+}
+
+static int __qaic_execute_bo_ioctl(struct drm_device *dev, void *data,
+				   struct drm_file *file_priv, bool is_partial)
+{
+	struct qaic_partial_execute_entry *pexec;
+	struct qaic_execute *args = data;
+	struct qaic_execute_entry *exec;
+	struct dma_bridge_chan *dbc;
+	int usr_rcu_id, qdev_rcu_id;
+	struct drm_gem_object *obj;
+	struct qaic_device *qdev;
+	struct bo_slice *slice;
+	struct qaic_user *usr;
+	u8 __user *user_data;
+	unsigned long flags;
+	u64 received_ts = 0;
+	u32 queue_level = 0;
+	struct qaic_bo *bo;
+	u64 submit_ts = 0;
+	unsigned long n;
+	bool queued;
+	int ret = 0;
+	int dbc_id;
+	int rcu_id;
+	u32 head;
+	u32 tail;
+	u64 size;
+	int i, j;
+
+	received_ts = ktime_get_ns();
+
+	usr = file_priv->driver_priv;
+	usr_rcu_id = srcu_read_lock(&usr->qddev_lock);
+	if (!usr->qddev) {
+		ret = -ENODEV;
+		goto unlock_usr_srcu;
+	}
+
+	qdev = usr->qddev->qdev;
+	qdev_rcu_id = srcu_read_lock(&qdev->dev_lock);
+	if (qdev->in_reset) {
+		ret = -ENODEV;
+		goto unlock_dev_srcu;
+	}
+
+	if (args->hdr.dbc_id >= qdev->num_dbc) {
+		ret = -EINVAL;
+		goto unlock_dev_srcu;
+	}
+
+	dbc_id = args->hdr.dbc_id;
+	dbc = &qdev->dbc[dbc_id];
+
+	size = is_partial ? sizeof(*pexec) : sizeof(*exec);
+
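+	/* Reject a zero count and guard against multiplication overflow */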
+	n = (unsigned long)size * args->hdr.count;
+	if (args->hdr.count == 0 || n / args->hdr.count != size) {
+		ret = -EINVAL;
+		goto unlock_dev_srcu;
+	}
+
+	user_data = u64_to_user_ptr(args->data);
+
+	exec = kcalloc(args->hdr.count, size, GFP_KERNEL);
+	pexec = (struct qaic_partial_execute_entry *)exec;
+	if (!exec) {
+		ret = -ENOMEM;
+		goto unlock_dev_srcu;
+	}
+
+	if (copy_from_user(exec, user_data, n)) {
+		ret = -EFAULT;
+		goto free_exec;
+	}
+
+	rcu_id = srcu_read_lock(&dbc->ch_lock);
+	if (!dbc->usr || dbc->usr->handle != usr->handle) {
+		ret = -EPERM;
+		goto release_ch_rcu;
+	}
+
+	head = readl(dbc->dbc_base + REQHP_OFF);
+	tail = readl(dbc->dbc_base + REQTP_OFF);
+
+	if (head == U32_MAX || tail == U32_MAX) {
+		/* PCI link error */
+		ret = -ENODEV;
+		goto release_ch_rcu;
+	}
+
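+	/* Snapshot of how many request elements are already queued, for perf stats */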
+	queue_level = head <= tail ? tail - head : dbc->nelem - (head - tail);
+
+	for (i = 0; i < args->hdr.count; i++) {
+		/*
+		 * ref count will be decremented when the transfer of this
+		 * buffer is complete. It is inside dbc_irq_threaded_fn().
+		 */
+		obj = drm_gem_object_lookup(file_priv,
+					    is_partial ? pexec[i].handle : exec[i].handle);
+		if (!obj) {
+			ret = -ENOENT;
+			goto sync_to_cpu;
+		}
+
+		bo = to_qaic_bo(obj);
+
+		if (!bo->sliced) {
+			ret = -EINVAL;
+			goto sync_to_cpu;
+		}
+
+		if (is_partial && pexec[i].resize > bo->size) {
+			ret = -EINVAL;
+			goto sync_to_cpu;
+		}
+
+		spin_lock_irqsave(&dbc->xfer_lock, flags);
+		queued = bo->queued;
+		bo->queued = true;
+		if (queued) {
+			spin_unlock_irqrestore(&dbc->xfer_lock, flags);
+			ret = -EINVAL;
+			goto sync_to_cpu;
+		}
+
+		bo->req_id = dbc->next_req_id++;
+
+		list_for_each_entry(slice, &bo->slices, slice) {
+			/*
+			 * If this slice does not fall within the given
+			 * resize, skip it and continue to the next slice.
+			 */
+			if (is_partial && pexec[i].resize &&
+			    pexec[i].resize <= slice->offset)
+				continue;
+
+			for (j = 0; j < slice->nents; j++)
+				slice->reqs[j].req_id = cpu_to_le16(bo->req_id);
+
+			/*
+			 * For a partial execute ioctl, check whether resize
+			 * has cut this slice short. If so, do a partial copy,
+			 * otherwise do a complete copy.
+			 */
+			if (is_partial && pexec[i].resize &&
+			    pexec[i].resize < slice->offset + slice->size)
+				ret = copy_partial_exec_reqs(qdev, slice,
+							     pexec[i].resize - slice->offset,
+							     dbc_id, head, &tail);
+			else
+				ret = copy_exec_reqs(qdev, slice, dbc_id, head, &tail);
+			if (ret) {
+				bo->queued = false;
+				spin_unlock_irqrestore(&dbc->xfer_lock, flags);
+				goto sync_to_cpu;
+			}
+		}
+		reinit_completion(&bo->xfer_done);
+		list_add_tail(&bo->xfer_list, &dbc->xfer_list);
+		spin_unlock_irqrestore(&dbc->xfer_lock, flags);
+		dma_sync_sgtable_for_device(&qdev->pdev->dev, bo->sgt, bo->dir);
+	}
+
+	submit_ts = ktime_get_ns();
+	writel(tail, dbc->dbc_base + REQTP_OFF);
+
+	/* Collect kernel Profiling data */
+	for (i = 0; i < args->hdr.count; i++) {
+		/*
+		 * Since we already committed the BO to hardware, the only way
+		 * this should fail is a pending signal.  We can't cancel the
+		 * submit to hardware, so we have to just skip the profiling
+		 * data.  In case the signal is not fatal to the process, we
+		 * return success so that the user doesn't try to resubmit.
+		 */
+		obj = drm_gem_object_lookup(file_priv,
+					    is_partial ? pexec[i].handle : exec[i].handle);
+		if (!obj)
+			break;
+		bo = to_qaic_bo(obj);
+		bo->perf_stats.req_received_ts = received_ts;
+		bo->perf_stats.req_submit_ts = submit_ts;
+		bo->perf_stats.queue_level_before = queue_level;
+		queue_level += bo->total_slice_nents;
+		drm_gem_object_put(obj);
+	}
+
+	if (poll_datapath)
+		schedule_work(&dbc->poll_work);
+
+	goto release_ch_rcu;
+
+sync_to_cpu:
+	if (likely(obj))
+		drm_gem_object_put(obj);
+	for (j = 0; j < i; j++) {
+		spin_lock_irqsave(&dbc->xfer_lock, flags);
+		bo = list_last_entry(&dbc->xfer_list, struct qaic_bo,
+				     xfer_list);
+		obj = &bo->base;
+		bo->queued = false;
+		list_del(&bo->xfer_list);
+		spin_unlock_irqrestore(&dbc->xfer_lock, flags);
+		dma_sync_sgtable_for_cpu(&qdev->pdev->dev, bo->sgt, bo->dir);
+		/* Release ref to BO */
+		drm_gem_object_put(obj);
+	}
+release_ch_rcu:
+	srcu_read_unlock(&dbc->ch_lock, rcu_id);
+free_exec:
+	kfree(exec);
+unlock_dev_srcu:
+	srcu_read_unlock(&qdev->dev_lock, qdev_rcu_id);
+unlock_usr_srcu:
+	srcu_read_unlock(&usr->qddev_lock, usr_rcu_id);
+	return ret;
+}
+
+int qaic_execute_bo_ioctl(struct drm_device *dev, void *data,
+			  struct drm_file *file_priv)
+{
+	return __qaic_execute_bo_ioctl(dev, data, file_priv, false);
+}
+
+int qaic_partial_execute_bo_ioctl(struct drm_device *dev, void *data,
+				  struct drm_file *file_priv)
+{
+	return __qaic_execute_bo_ioctl(dev, data, file_priv, true);
+}
+
+/*
+ * Our interrupt handling is a bit more complicated than the simple ideal, but
+ * sadly it is necessary.
+ *
+ * Each dbc has a completion queue.  Entries in the queue correspond to DMA
+ * requests which the device has processed.  The hardware already has a built
+ * in irq mitigation.  When the device puts an entry into the queue, it will
+ * only trigger an interrupt if the queue was empty.  Therefore, when adding
+ * the Nth event to a non-empty queue, the hardware doesn't trigger an
+ * interrupt.  This means the host doesn't get additional interrupts signaling
+ * the same thing - the queue has something to process.
+ * This behavior can be overridden in the DMA request.
+ * This means that when the host receives an interrupt, it is required to
+ * drain the queue.
+ *
+ * This behavior is what NAPI attempts to accomplish, although we can't use
+ * NAPI as we don't have a netdev.  We use threaded irqs instead.
+ *
+ * However, there is a situation where the host drains the queue fast enough
+ * that every event causes an interrupt.  Typically this is not a problem as
+ * the rate of events would be low.  However, that is not the case with
+ * lprnet for example.  On an Intel Xeon D-2191 where we run 8 instances of
+ * lprnet, the host receives roughly 80k interrupts per second from the device
+ * (per /proc/interrupts).  While NAPI documentation indicates the host should
+ * just chug along, sadly that behavior causes instability in some hosts.
+ *
+ * Therefore, we implement an interrupt disable scheme similar to NAPI.  The
+ * key difference is that we will delay after draining the queue for a small
+ * time to allow additional events to come in via polling.  Using the above
+ * lprnet workload, this reduces the number of interrupts processed from
+ * ~80k/sec to about 64 in 5 minutes and appears to solve the system
+ * instability.
+ */
+irqreturn_t dbc_irq_handler(int irq, void *data)
+{
+	struct dma_bridge_chan *dbc = data;
+	int rcu_id;
+	u32 head;
+	u32 tail;
+
+	rcu_id = srcu_read_lock(&dbc->ch_lock);
+
+	if (!dbc->usr) {
+		srcu_read_unlock(&dbc->ch_lock, rcu_id);
+		return IRQ_HANDLED;
+	}
+
+	head = readl(dbc->dbc_base + RSPHP_OFF);
+	if (head == U32_MAX) { /* PCI link error */
+		srcu_read_unlock(&dbc->ch_lock, rcu_id);
+		return IRQ_NONE;
+	}
+
+	tail = readl(dbc->dbc_base + RSPTP_OFF);
+	if (tail == U32_MAX) { /* PCI link error */
+		srcu_read_unlock(&dbc->ch_lock, rcu_id);
+		return IRQ_NONE;
+	}
+
+	if (head == tail) { /* queue empty */
+		srcu_read_unlock(&dbc->ch_lock, rcu_id);
+		return IRQ_NONE;
+	}
+
+	disable_irq_nosync(irq);
+	srcu_read_unlock(&dbc->ch_lock, rcu_id);
+	return IRQ_WAKE_THREAD;
+}
+
+void irq_polling_work(struct work_struct *work)
+{
+	struct dma_bridge_chan *dbc = container_of(work,
+						   struct dma_bridge_chan,
+						   poll_work);
+	unsigned long flags;
+	int rcu_id;
+	u32 head;
+	u32 tail;
+
+	rcu_id = srcu_read_lock(&dbc->ch_lock);
+
+	while (1) {
+		if (dbc->qdev->in_reset) {
+			srcu_read_unlock(&dbc->ch_lock, rcu_id);
+			return;
+		}
+		if (!dbc->usr) {
+			srcu_read_unlock(&dbc->ch_lock, rcu_id);
+			return;
+		}
+		spin_lock_irqsave(&dbc->xfer_lock, flags);
+		if (list_empty(&dbc->xfer_list)) {
+			spin_unlock_irqrestore(&dbc->xfer_lock, flags);
+			srcu_read_unlock(&dbc->ch_lock, rcu_id);
+			return;
+		}
+		spin_unlock_irqrestore(&dbc->xfer_lock, flags);
+
+		head = readl(dbc->dbc_base + RSPHP_OFF);
+		if (head == U32_MAX) { /* PCI link error */
+			srcu_read_unlock(&dbc->ch_lock, rcu_id);
+			return;
+		}
+
+		tail = readl(dbc->dbc_base + RSPTP_OFF);
+		if (tail == U32_MAX) { /* PCI link error */
+			srcu_read_unlock(&dbc->ch_lock, rcu_id);
+			return;
+		}
+
+		if (head != tail) {
+			irq_wake_thread(dbc->irq, dbc);
+			srcu_read_unlock(&dbc->ch_lock, rcu_id);
+			return;
+		}
+
+		cond_resched();
+		usleep_range(datapath_poll_interval_us,
+			     2 * datapath_poll_interval_us);
+	}
+}
+
+irqreturn_t dbc_irq_threaded_fn(int irq, void *data)
+{
+	struct dma_bridge_chan *dbc = data;
+	int event_count = NUM_EVENTS;
+	int delay_count = NUM_DELAYS;
+	struct qaic_device *qdev;
+	struct qaic_bo *bo, *i;
+	struct dbc_rsp *rsp;
+	unsigned long flags;
+	int rcu_id;
+	u16 status;
+	u16 req_id;
+	u32 head;
+	u32 tail;
+
+	rcu_id = srcu_read_lock(&dbc->ch_lock);
+
+	head = readl(dbc->dbc_base + RSPHP_OFF);
+	if (head == U32_MAX) /* PCI link error */
+		goto error_out;
+
+	qdev = dbc->qdev;
+read_fifo:
+
+	if (!event_count) {
+		event_count = NUM_EVENTS;
+		cond_resched();
+	}
+
+	/*
+	 * if this channel isn't assigned or gets unassigned during processing
+	 * we have nothing further to do
+	 */
+	if (!dbc->usr)
+		goto error_out;
+
+	tail = readl(dbc->dbc_base + RSPTP_OFF);
+	if (tail == U32_MAX) /* PCI link error */
+		goto error_out;
+
+	if (head == tail) { /* queue empty */
+		if (delay_count) {
+			--delay_count;
+			usleep_range(100, 200);
+			goto read_fifo; /* check for a new event */
+		}
+		goto normal_out;
+	}
+
+	delay_count = NUM_DELAYS;
+	while (head != tail) {
+		if (!event_count)
+			break;
+		--event_count;
+		rsp = dbc->rsp_q_base + head * sizeof(*rsp);
+		req_id = le16_to_cpu(rsp->req_id);
+		status = le16_to_cpu(rsp->status);
+		if (status)
+			pci_dbg(qdev->pdev, "req_id %d failed with status %d\n",
+				req_id, status);
+		spin_lock_irqsave(&dbc->xfer_lock, flags);
+		/*
+		 * A BO can receive multiple interrupts, since it can be divided
+		 * into multiple slices and the device signals one completion per
+		 * slice. The BO cannot be marked complete until interrupts for
+		 * all of its slices have been received.
+		 */
+		list_for_each_entry_safe(bo, i, &dbc->xfer_list, xfer_list) {
+			if (bo->req_id == req_id)
+				bo->nr_slice_xfer_done++;
+			else
+				continue;
+
+			if (bo->nr_slice_xfer_done < bo->nr_slice)
+				break;
+
+			/*
+			 * At this point we have received all the interrupts for
+			 * BO, which means BO execution is complete.
+			 */
+			dma_sync_sgtable_for_cpu(&qdev->pdev->dev, bo->sgt, bo->dir);
+			bo->nr_slice_xfer_done = 0;
+			bo->queued = false;
+			list_del(&bo->xfer_list);
+			bo->perf_stats.req_processed_ts = ktime_get_ns();
+			complete_all(&bo->xfer_done);
+			drm_gem_object_put(&bo->base);
+			break;
+		}
+		spin_unlock_irqrestore(&dbc->xfer_lock, flags);
+		head = (head + 1) % dbc->nelem;
+	}
+
+	/*
+	 * Update the head pointer of response queue and let the device know
+	 * that we have consumed elements from the queue.
+	 */
+	writel(head, dbc->dbc_base + RSPHP_OFF);
+
+	/* elements might have been put in the queue while we were processing */
+	goto read_fifo;
+
+normal_out:
+	if (likely(!poll_datapath))
+		enable_irq(irq);
+	else
+		schedule_work(&dbc->poll_work);
+	/* checking the fifo and enabling irqs is a race, missed event check */
+	tail = readl(dbc->dbc_base + RSPTP_OFF);
+	if (tail != U32_MAX && head != tail) {
+		if (likely(!poll_datapath))
+			disable_irq_nosync(irq);
+		goto read_fifo;
+	}
+	srcu_read_unlock(&dbc->ch_lock, rcu_id);
+	return IRQ_HANDLED;
+
+error_out:
+	srcu_read_unlock(&dbc->ch_lock, rcu_id);
+	if (likely(!poll_datapath))
+		enable_irq(irq);
+	else
+		schedule_work(&dbc->poll_work);
+
+	return IRQ_HANDLED;
+}
+
+int qaic_wait_bo_ioctl(struct drm_device *dev, void *data,
+		       struct drm_file *file_priv)
+{
+	struct qaic_wait *args = data;
+	int usr_rcu_id, qdev_rcu_id;
+	struct dma_bridge_chan *dbc;
+	struct drm_gem_object *obj;
+	struct qaic_device *qdev;
+	unsigned long timeout;
+	struct qaic_user *usr;
+	struct qaic_bo *bo;
+	int rcu_id;
+	int ret;
+
+	usr = file_priv->driver_priv;
+	usr_rcu_id = srcu_read_lock(&usr->qddev_lock);
+	if (!usr->qddev) {
+		ret = -ENODEV;
+		goto unlock_usr_srcu;
+	}
+
+	qdev = usr->qddev->qdev;
+	qdev_rcu_id = srcu_read_lock(&qdev->dev_lock);
+	if (qdev->in_reset) {
+		ret = -ENODEV;
+		goto unlock_dev_srcu;
+	}
+
+	if (args->pad != 0) {
+		ret = -EINVAL;
+		goto unlock_dev_srcu;
+	}
+
+	if (args->dbc_id >= qdev->num_dbc) {
+		ret = -EINVAL;
+		goto unlock_dev_srcu;
+	}
+
+	dbc = &qdev->dbc[args->dbc_id];
+
+	rcu_id = srcu_read_lock(&dbc->ch_lock);
+	if (dbc->usr != usr) {
+		ret = -EPERM;
+		goto unlock_ch_srcu;
+	}
+
+	obj = drm_gem_object_lookup(file_priv, args->handle);
+	if (!obj) {
+		ret = -ENOENT;
+		goto unlock_ch_srcu;
+	}
+
+	bo = to_qaic_bo(obj);
+	timeout = args->timeout ? args->timeout : wait_exec_default_timeout;
+	timeout = msecs_to_jiffies(timeout);
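+	/* Returns 0 on timeout, >0 on completion, and <0 if interrupted by a signal */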
+	ret = wait_for_completion_interruptible_timeout(&bo->xfer_done, timeout);
+	if (!ret) {
+		ret = -ETIMEDOUT;
+		goto put_obj;
+	}
+	if (ret > 0)
+		ret = 0;
+
+	if (!dbc->usr)
+		ret = -EPERM;
+
+put_obj:
+	drm_gem_object_put(obj);
+unlock_ch_srcu:
+	srcu_read_unlock(&dbc->ch_lock, rcu_id);
+unlock_dev_srcu:
+	srcu_read_unlock(&qdev->dev_lock, qdev_rcu_id);
+unlock_usr_srcu:
+	srcu_read_unlock(&usr->qddev_lock, usr_rcu_id);
+	return ret;
+}
+
+int qaic_perf_stats_bo_ioctl(struct drm_device *dev, void *data,
+			     struct drm_file *file_priv)
+{
+	struct qaic_perf_stats_entry *ent = NULL;
+	struct qaic_perf_stats *args = data;
+	int usr_rcu_id, qdev_rcu_id;
+	struct drm_gem_object *obj;
+	struct qaic_device *qdev;
+	struct qaic_user *usr;
+	struct qaic_bo *bo;
+	int ret, i;
+
+	usr = file_priv->driver_priv;
+	usr_rcu_id = srcu_read_lock(&usr->qddev_lock);
+	if (!usr->qddev) {
+		ret = -ENODEV;
+		goto unlock_usr_srcu;
+	}
+
+	qdev = usr->qddev->qdev;
+	qdev_rcu_id = srcu_read_lock(&qdev->dev_lock);
+	if (qdev->in_reset) {
+		ret = -ENODEV;
+		goto unlock_dev_srcu;
+	}
+
+	if (args->hdr.dbc_id >= qdev->num_dbc) {
+		ret = -EINVAL;
+		goto unlock_dev_srcu;
+	}
+
+	ent = kcalloc(args->hdr.count, sizeof(*ent), GFP_KERNEL);
+	if (!ent) {
+		ret = -ENOMEM;
+		goto unlock_dev_srcu;
+	}
+
+	ret = copy_from_user(ent, u64_to_user_ptr(args->data),
+			     args->hdr.count * sizeof(*ent));
+	if (ret) {
+		ret = -EFAULT;
+		goto free_ent;
+	}
+
+	for (i = 0; i < args->hdr.count; i++) {
+		obj = drm_gem_object_lookup(file_priv, ent[i].handle);
+		if (!obj) {
+			ret = -ENOENT;
+			goto free_ent;
+		}
+		bo = to_qaic_bo(obj);
+		/*
+		 * If the perf stats ioctl is called before the wait ioctl has
+		 * completed, the latency information is invalid.
+		 */
+		if (bo->perf_stats.req_processed_ts < bo->perf_stats.req_submit_ts) {
+			ent[i].device_latency_us = 0;
+		} else {
+			ent[i].device_latency_us = (bo->perf_stats.req_processed_ts -
+						    bo->perf_stats.req_submit_ts) / 1000;
+		}
+		ent[i].submit_latency_us = (bo->perf_stats.req_submit_ts -
+					   bo->perf_stats.req_received_ts) / 1000;
+		ent[i].queue_level_before = bo->perf_stats.queue_level_before;
+		ent[i].num_queue_element = bo->total_slice_nents;
+		drm_gem_object_put(obj);
+	}
+
+	if (copy_to_user(u64_to_user_ptr(args->data), ent,
+			 args->hdr.count * sizeof(*ent)))
+		ret = -EFAULT;
+
+free_ent:
+	kfree(ent);
+unlock_dev_srcu:
+	srcu_read_unlock(&qdev->dev_lock, qdev_rcu_id);
+unlock_usr_srcu:
+	srcu_read_unlock(&usr->qddev_lock, usr_rcu_id);
+	return ret;
+}
+
+static void empty_xfer_list(struct qaic_device *qdev, struct dma_bridge_chan *dbc)
+{
+	unsigned long flags;
+	struct qaic_bo *bo;
+
+	spin_lock_irqsave(&dbc->xfer_lock, flags);
+	while (!list_empty(&dbc->xfer_list)) {
+		bo = list_first_entry(&dbc->xfer_list, typeof(*bo), xfer_list);
+		bo->queued = false;
+		list_del(&bo->xfer_list);
+		spin_unlock_irqrestore(&dbc->xfer_lock, flags);
+		dma_sync_sgtable_for_cpu(&qdev->pdev->dev, bo->sgt, bo->dir);
+		complete_all(&bo->xfer_done);
+		drm_gem_object_put(&bo->base);
+		spin_lock_irqsave(&dbc->xfer_lock, flags);
+	}
+	spin_unlock_irqrestore(&dbc->xfer_lock, flags);
+}
+
+int disable_dbc(struct qaic_device *qdev, u32 dbc_id, struct qaic_user *usr)
+{
+	if (!qdev->dbc[dbc_id].usr ||
+	    qdev->dbc[dbc_id].usr->handle != usr->handle)
+		return -EPERM;
+
+	qdev->dbc[dbc_id].usr = NULL;
+	synchronize_srcu(&qdev->dbc[dbc_id].ch_lock);
+	return 0;
+}
+
+/**
+ * enable_dbc - Enable the DBC. DBCs are disabled by removing the user
+ * context from them. Adding the user context back to a DBC enables it. This
+ * function trusts the DBC ID passed and expects the DBC to be disabled.
+ * @qdev: Qranium device handle
+ * @dbc_id: ID of the DBC
+ * @usr: User context
+ */
+void enable_dbc(struct qaic_device *qdev, u32 dbc_id, struct qaic_user *usr)
+{
+	qdev->dbc[dbc_id].usr = usr;
+}
+
+void wakeup_dbc(struct qaic_device *qdev, u32 dbc_id)
+{
+	struct dma_bridge_chan *dbc = &qdev->dbc[dbc_id];
+
+	dbc->usr = NULL;
+	empty_xfer_list(qdev, dbc);
+	synchronize_srcu(&dbc->ch_lock);
+}
+
+void release_dbc(struct qaic_device *qdev, u32 dbc_id)
+{
+	struct bo_slice *slice, *slice_temp;
+	struct qaic_bo *bo, *bo_temp;
+	struct dma_bridge_chan *dbc;
+
+	dbc = &qdev->dbc[dbc_id];
+	if (!dbc->in_use)
+		return;
+
+	wakeup_dbc(qdev, dbc_id);
+
+	dma_free_coherent(&qdev->pdev->dev, dbc->total_size, dbc->req_q_base,
+			  dbc->dma_addr);
+	dbc->total_size = 0;
+	dbc->req_q_base = NULL;
+	dbc->dma_addr = 0;
+	dbc->nelem = 0;
+	dbc->usr = NULL;
+
+	list_for_each_entry_safe(bo, bo_temp, &dbc->bo_lists, bo_list) {
+		list_for_each_entry_safe(slice, slice_temp, &bo->slices, slice)
+			kref_put(&slice->ref_count, free_slice);
+		bo->sliced = false;
+		INIT_LIST_HEAD(&bo->slices);
+		bo->total_slice_nents = 0;
+		bo->dir = 0;
+		bo->dbc = NULL;
+		bo->nr_slice = 0;
+		bo->nr_slice_xfer_done = 0;
+		bo->queued = false;
+		bo->req_id = 0;
+		init_completion(&bo->xfer_done);
+		complete_all(&bo->xfer_done);
+		list_del(&bo->bo_list);
+		bo->perf_stats.req_received_ts = 0;
+		bo->perf_stats.req_submit_ts = 0;
+		bo->perf_stats.req_processed_ts = 0;
+		bo->perf_stats.queue_level_before = 0;
+	}
+
+	dbc->in_use = false;
+	wake_up(&dbc->dbc_release);
+}
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v2 6/8] accel/qaic: Add mhi_qaic_cntl
  2023-02-06 15:41 ` Jeffrey Hugo
@ 2023-02-06 15:41   ` Jeffrey Hugo
  -1 siblings, 0 replies; 68+ messages in thread
From: Jeffrey Hugo @ 2023-02-06 15:41 UTC (permalink / raw)
  To: ogabbay, airlied, daniel, dri-devel
  Cc: Jeffrey Hugo, linux-doc, linux-arm-msm, quic_ajitpals,
	quic_pkanojiy, stanislaw.gruszka, quic_carlv, jacek.lawrynowicz

From: Pranjal Ramajor Asha Kanojiya <quic_pkanojiy@quicinc.com>

Some of the MHI channels for an AIC100 device need to be routed to
userspace so that userspace can communicate directly with QSM.  The MHI
bus does not support this, and while the WWAN subsystem does (for the same
reasons), AIC100 is not a WWAN device.  Also, MHI is not something that
other accelerators are expected to share, thus an accel subsystem function
that meets this usecase is unlikely.

Create a QAIC specific MHI userspace shim that exposes these channels.

Start with QAIC_SAHARA which is required to boot AIC100 and is consumed by
the kickstart application as documented in aic100.rst

Each AIC100 instance (currently, up to 16) in a system will create a
chardev for QAIC_SAHARA.  This chardev will be found as
/dev/<mhi instance>_QAIC_SAHARA
For example - /dev/mhi0_QAIC_SAHARA
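
Not part of this patch, but for illustration, a minimal userspace sketch of
driving the chardev with the open/write/poll/read semantics implemented below.
The device path and the 5 second timeout are assumptions, and the actual
Sahara protocol messages exchanged with the device are out of scope here:

  #include <fcntl.h>
  #include <poll.h>
  #include <unistd.h>

  /* Send one request and wait for one response on the SAHARA chardev. */
  static int sahara_xfer(const char *path, const void *req, size_t req_len,
                         void *resp, size_t resp_len)
  {
          struct pollfd pfd;
          ssize_t n;
          int fd;

          fd = open(path, O_RDWR);        /* e.g. /dev/mhi0_QAIC_SAHARA */
          if (fd < 0)
                  return -1;
          if (write(fd, req, req_len) != (ssize_t)req_len)
                  goto err;
          pfd.fd = fd;
          pfd.events = POLLIN;
          if (poll(&pfd, 1, 5000) <= 0)   /* assumed 5s timeout */
                  goto err;
          n = read(fd, resp, resp_len);
          if (n < 0)
                  goto err;
          close(fd);
          return (int)n;
  err:
          close(fd);
          return -1;
  }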

Signed-off-by: Pranjal Ramajor Asha Kanojiya <quic_pkanojiy@quicinc.com>
Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com>
---
 drivers/accel/qaic/mhi_qaic_ctrl.c | 586 +++++++++++++++++++++++++++++++++++++
 drivers/accel/qaic/mhi_qaic_ctrl.h |  11 +
 2 files changed, 597 insertions(+)
 create mode 100644 drivers/accel/qaic/mhi_qaic_ctrl.c
 create mode 100644 drivers/accel/qaic/mhi_qaic_ctrl.h

diff --git a/drivers/accel/qaic/mhi_qaic_ctrl.c b/drivers/accel/qaic/mhi_qaic_ctrl.c
new file mode 100644
index 0000000..1f1ccf1
--- /dev/null
+++ b/drivers/accel/qaic/mhi_qaic_ctrl.c
@@ -0,0 +1,586 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright (c) 2022-2023 Qualcomm Innovation Center, Inc. All rights reserved. */
+
+#include <linux/kernel.h>
+#include <linux/mhi.h>
+#include <linux/mod_devicetable.h>
+#include <linux/module.h>
+#include <linux/poll.h>
+#include <linux/version.h>
+#include <linux/xarray.h>
+#include <uapi/linux/eventpoll.h>
+
+#include "qaic.h"
+
+#define MHI_QAIC_CTRL_DRIVER_NAME	"mhi_qaic_ctrl"
+#define MHI_QAIC_CTRL_MAX_MINORS	128
+#define MHI_MAX_MTU			0xffff
+static DEFINE_XARRAY_ALLOC(mqc_xa);
+static struct class *mqc_dev_class;
+static int mqc_dev_major;
+
+/**
+ * struct mqc_buf - Buffer structure used to receive data from device
+ * @data: Address of data to read from
+ * @odata: Original address returned from *alloc() API. Used to free this buf.
+ * @len: Length of data in bytes
+ * @node: This buffer will be part of list managed in struct mqc_dev
+ */
+struct mqc_buf {
+	void *data;
+	void *odata;
+	size_t len;
+	struct list_head node;
+};
+
+/**
+ * struct mqc_dev - MHI QAIC Control Device
+ * @minor: MQC device node minor number
+ * @mhi_dev: Associated mhi device object
+ * @mtu: Max TRE buffer length
+ * @enabled: Flag to track the state of the MQC device
+ * @lock: Mutex lock to serialize access to open_count
+ * @read_lock: Mutex lock to serialize readers
+ * @write_lock: Mutex lock to serialize writers
+ * @ul_wq: Wait queue for writers
+ * @dl_wq: Wait queue for readers
+ * @dl_queue_lock: Spin lock to serialize access to download queue
+ * @dl_queue: Queue of downloaded buffers
+ * @open_count: Track open counts
+ * @ref_count: Reference count for this structure
+ */
+struct mqc_dev {
+	uint32_t minor;
+	struct mhi_device *mhi_dev;
+	size_t mtu;
+	bool enabled;
+	struct mutex lock;
+	struct mutex read_lock;
+	struct mutex write_lock;
+	wait_queue_head_t ul_wq;
+	wait_queue_head_t dl_wq;
+	spinlock_t dl_queue_lock;
+	struct list_head dl_queue;
+	unsigned int open_count;
+	struct kref ref_count;
+};
+
+static void mqc_dev_release(struct kref *ref)
+{
+	struct mqc_dev *mqcdev = container_of(ref, struct mqc_dev, ref_count);
+
+	mutex_destroy(&mqcdev->read_lock);
+	mutex_destroy(&mqcdev->write_lock);
+	mutex_destroy(&mqcdev->lock);
+	kfree(mqcdev);
+}
+
+static int mhi_qaic_ctrl_fill_dl_queue(struct mqc_dev *mqcdev)
+{
+	struct mhi_device *mhi_dev = mqcdev->mhi_dev;
+	struct mqc_buf *ctrlbuf;
+	int rx_budget;
+	int ret = 0;
+	void *data;
+
+	rx_budget = mhi_get_free_desc_count(mhi_dev, DMA_FROM_DEVICE);
+	if (rx_budget < 0)
+		return -EIO;
+
+	while (rx_budget--) {
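+		/*
+		 * A single allocation holds the mtu-sized payload followed by
+		 * the mqc_buf bookkeeping struct at the tail.
+		 */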
+		data = kzalloc(mqcdev->mtu + sizeof(*ctrlbuf), GFP_KERNEL);
+		if (!data)
+			return -ENOMEM;
+
+		ctrlbuf = data + mqcdev->mtu;
+		ctrlbuf->odata = data;
+
+		ret = mhi_queue_buf(mhi_dev, DMA_FROM_DEVICE, data,
+				    mqcdev->mtu, MHI_EOT);
+		if (ret) {
+			kfree(data);
+			dev_err(&mhi_dev->dev, "Failed to queue buffer\n");
+			return ret;
+		}
+	}
+
+	return ret;
+}
+
+static int mhi_qaic_ctrl_dev_start_chan(struct mqc_dev *mqcdev)
+{
+	struct device *dev = &mqcdev->mhi_dev->dev;
+	int ret = 0;
+
+	ret = mutex_lock_interruptible(&mqcdev->lock);
+	if (ret)
+		return ret;
+	if (!mqcdev->enabled) {
+		ret = -ENODEV;
+		goto release_dev_lock;
+	}
+	if (!mqcdev->open_count) {
+		ret = mhi_prepare_for_transfer(mqcdev->mhi_dev);
+		if (ret) {
+			dev_err(dev, "Error starting transfer channels\n");
+			goto release_dev_lock;
+		}
+
+		ret = mhi_qaic_ctrl_fill_dl_queue(mqcdev);
+		if (ret) {
+			dev_err(dev, "Error filling download queue.\n");
+			goto mhi_unprepare;
+		}
+	}
+	mqcdev->open_count++;
+	mutex_unlock(&mqcdev->lock);
+
+	return 0;
+
+mhi_unprepare:
+	mhi_unprepare_from_transfer(mqcdev->mhi_dev);
+release_dev_lock:
+	mutex_unlock(&mqcdev->lock);
+	return ret;
+}
+
+static struct mqc_dev *mqc_dev_get_by_minor(unsigned int minor)
+{
+	struct mqc_dev *mqcdev;
+
+	xa_lock(&mqc_xa);
+	mqcdev = xa_load(&mqc_xa, minor);
+	if (mqcdev)
+		kref_get(&mqcdev->ref_count);
+	xa_unlock(&mqc_xa);
+
+	return mqcdev;
+}
+
+static int mhi_qaic_ctrl_open(struct inode *inode, struct file *filp)
+{
+	struct mqc_dev *mqcdev;
+	int ret;
+
+	mqcdev = mqc_dev_get_by_minor(iminor(inode));
+	if (!mqcdev) {
+		pr_debug("mqc: minor %d not found\n", iminor(inode));
+		return -EINVAL;
+	}
+
+	ret = mhi_qaic_ctrl_dev_start_chan(mqcdev);
+	if (ret) {
+		kref_put(&mqcdev->ref_count, mqc_dev_release);
+		return ret;
+	}
+
+	filp->private_data = mqcdev;
+
+	return 0;
+}
+
+static void mhi_qaic_ctrl_buf_free(struct mqc_buf *ctrlbuf)
+{
+	list_del(&ctrlbuf->node);
+	kfree(ctrlbuf->odata);
+}
+
+static void __mhi_qaic_ctrl_release(struct mqc_dev *mqcdev)
+{
+	struct mqc_buf *ctrlbuf, *tmp;
+
+	mhi_unprepare_from_transfer(mqcdev->mhi_dev);
+	wake_up_interruptible(&mqcdev->ul_wq);
+	wake_up_interruptible(&mqcdev->dl_wq);
+	/*
+	 * Free the dl_queue. As we have already unprepared the mhi transfers, we
+	 * do not expect any callbacks to update dl_queue, hence there is no
+	 * need to grab the dl_queue lock.
+	 */
+	mutex_lock(&mqcdev->read_lock);
+	list_for_each_entry_safe(ctrlbuf, tmp, &mqcdev->dl_queue, node)
+		mhi_qaic_ctrl_buf_free(ctrlbuf);
+	mutex_unlock(&mqcdev->read_lock);
+}
+
+static int mhi_qaic_ctrl_release(struct inode *inode, struct file *file)
+{
+	struct mqc_dev *mqcdev = file->private_data;
+
+	mutex_lock(&mqcdev->lock);
+	mqcdev->open_count--;
+	if (!mqcdev->open_count && mqcdev->enabled)
+		__mhi_qaic_ctrl_release(mqcdev);
+	mutex_unlock(&mqcdev->lock);
+
+	kref_put(&mqcdev->ref_count, mqc_dev_release);
+
+	return 0;
+}
+
+static __poll_t mhi_qaic_ctrl_poll(struct file *file, poll_table *wait)
+{
+	struct mqc_dev *mqcdev = file->private_data;
+	struct mhi_device *mhi_dev;
+	__poll_t mask = 0;
+
+	mhi_dev = mqcdev->mhi_dev;
+
+	poll_wait(file, &mqcdev->ul_wq, wait);
+	poll_wait(file, &mqcdev->dl_wq, wait);
+
+	mutex_lock(&mqcdev->lock);
+	if (!mqcdev->enabled || !mqcdev->open_count) {
+		mutex_unlock(&mqcdev->lock);
+		return EPOLLERR;
+	}
+
+	spin_lock_bh(&mqcdev->dl_queue_lock);
+	if (!list_empty(&mqcdev->dl_queue))
+		mask |= EPOLLIN | EPOLLRDNORM;
+	spin_unlock_bh(&mqcdev->dl_queue_lock);
+
+	if (mutex_lock_interruptible(&mqcdev->write_lock)) {
+		mutex_unlock(&mqcdev->lock);
+		return EPOLLERR;
+	}
+	if (mhi_get_free_desc_count(mhi_dev, DMA_TO_DEVICE) > 0)
+		mask |= EPOLLOUT | EPOLLWRNORM;
+	mutex_unlock(&mqcdev->write_lock);
+	mutex_unlock(&mqcdev->lock);
+
+	dev_dbg(&mhi_dev->dev, "Client attempted to poll, returning mask 0x%x\n", mask);
+
+	return mask;
+}
+
+static int mhi_qaic_ctrl_tx(struct mqc_dev *mqcdev)
+{
+	int ret;
+
+	ret = wait_event_interruptible(mqcdev->ul_wq,
+				       (!mqcdev->enabled || !mqcdev->open_count ||
+				       mhi_get_free_desc_count(mqcdev->mhi_dev,
+							       DMA_TO_DEVICE) > 0));
+
+	if (!mqcdev->open_count)
+		return -EPIPE;
+	if (!mqcdev->enabled)
+		return -ENODEV;
+
+	return ret;
+}
+
+static ssize_t mhi_qaic_ctrl_write(struct file *file, const char __user *buf,
+				   size_t count, loff_t *offp)
+{
+	struct mqc_dev *mqcdev = file->private_data;
+	struct mhi_device *mhi_dev;
+	size_t bytes_xfered = 0;
+	struct device *dev;
+	int ret, nr_desc;
+
+	mhi_dev = mqcdev->mhi_dev;
+	dev = &mhi_dev->dev;
+
+	if (!mhi_dev->ul_chan)
+		return -EOPNOTSUPP;
+
+	if (!buf || !count)
+		return -EINVAL;
+
+	dev_dbg(dev, "Request to transfer %zu bytes\n", count);
+
+	ret = mhi_qaic_ctrl_tx(mqcdev);
+	if (ret) {
+		dev_err(dev, "Failed to write %d\n", ret);
+		return ret;
+	}
+
+	if (mutex_lock_interruptible(&mqcdev->write_lock))
+		return -EINTR;
+
+	nr_desc = mhi_get_free_desc_count(mhi_dev, DMA_TO_DEVICE);
+	if (nr_desc * mqcdev->mtu < count) {
+		ret = -EMSGSIZE;
+		dev_dbg(dev, "Buffer too big to transfer\n");
+		goto unlock_mutex;
+	}
+
+	while (count != bytes_xfered) {
+		enum mhi_flags flags;
+		size_t to_copy;
+		void *kbuf;
+
+		to_copy = min_t(size_t, count - bytes_xfered, mqcdev->mtu);
+		kbuf = kmalloc(to_copy, GFP_KERNEL);
+		if (!kbuf) {
+			ret = -ENOMEM;
+			goto unlock_mutex;
+		}
+
+		ret = copy_from_user(kbuf, buf + bytes_xfered, to_copy);
+		if (ret) {
+			kfree(kbuf);
+			ret = -EFAULT;
+			goto unlock_mutex;
+		}
+
+		if (bytes_xfered + to_copy == count)
+			flags = MHI_EOT;
+		else
+			flags = MHI_CHAIN;
+
+		ret = mhi_queue_buf(mhi_dev, DMA_TO_DEVICE, kbuf, to_copy, flags);
+		if (ret) {
+			kfree(kbuf);
+			dev_err(dev, "Failed to queue buf of size %zu\n", to_copy);
+			goto unlock_mutex;
+		}
+
+		bytes_xfered += to_copy;
+	}
+
+	mutex_unlock(&mqcdev->write_lock);
+	dev_dbg(dev, "bytes xferred: %zu\n", bytes_xfered);
+
+	return bytes_xfered;
+
+unlock_mutex:
+	mutex_unlock(&mqcdev->write_lock);
+	return ret;
+}
+
+static int mhi_qaic_ctrl_rx(struct mqc_dev *mqcdev)
+{
+	int ret;
+
+	ret = wait_event_interruptible(mqcdev->dl_wq, (!mqcdev->enabled || !mqcdev->open_count ||
+				       !list_empty(&mqcdev->dl_queue)));
+
+	if (!mqcdev->open_count)
+		return -EPERM;
+	if (!mqcdev->enabled)
+		return -ENODEV;
+
+	return ret;
+}
+
+static ssize_t mhi_qaic_ctrl_read(struct file *file, char __user *buf,
+				  size_t count, loff_t *ppos)
+{
+	struct mqc_dev *mqcdev = file->private_data;
+	struct mqc_buf *ctrlbuf;
+	size_t to_copy;
+	int ret;
+
+	if (!mqcdev->mhi_dev->dl_chan)
+		return -EOPNOTSUPP;
+
+	ret = mhi_qaic_ctrl_rx(mqcdev);
+	if (ret) {
+		dev_err(&mqcdev->mhi_dev->dev, "Failed to read %d\n", ret);
+		return ret;
+	}
+
+	if (mutex_lock_interruptible(&mqcdev->read_lock))
+		return -EINTR;
+
+	ctrlbuf = list_first_entry_or_null(&mqcdev->dl_queue, struct mqc_buf, node);
+	if (!ctrlbuf) {
+		mutex_unlock(&mqcdev->read_lock);
+		dev_dbg(&mqcdev->mhi_dev->dev, "Device has been released\n");
+		ret = -ENODEV;
+		goto error_out;
+	}
+
+	to_copy = min_t(size_t, count, ctrlbuf->len);
+	if (copy_to_user(buf, ctrlbuf->data, to_copy)) {
+		mutex_unlock(&mqcdev->read_lock);
+		dev_dbg(&mqcdev->mhi_dev->dev, "Failed to copy data to user buffer\n");
+		ret = -EFAULT;
+		goto error_out;
+	}
+
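+	/*
+	 * Advance past the bytes consumed; any unread remainder stays queued
+	 * so the next read() continues from here.
+	 */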
+	ctrlbuf->len -= to_copy;
+	ctrlbuf->data += to_copy;
+
+	if (!ctrlbuf->len) {
+		spin_lock_bh(&mqcdev->dl_queue_lock);
+		mhi_qaic_ctrl_buf_free(ctrlbuf);
+		spin_unlock_bh(&mqcdev->dl_queue_lock);
+		mhi_qaic_ctrl_fill_dl_queue(mqcdev);
+		dev_dbg(&mqcdev->mhi_dev->dev, "Read buf freed\n");
+	}
+
+	mutex_unlock(&mqcdev->read_lock);
+	return to_copy;
+
+error_out:
+	mutex_unlock(&mqcdev->read_lock);
+	return ret;
+}
+
+static const struct file_operations mhidev_fops = {
+	.owner = THIS_MODULE,
+	.open = mhi_qaic_ctrl_open,
+	.release = mhi_qaic_ctrl_release,
+	.read = mhi_qaic_ctrl_read,
+	.write = mhi_qaic_ctrl_write,
+	.poll = mhi_qaic_ctrl_poll,
+};
+
+static void mhi_qaic_ctrl_ul_xfer_cb(struct mhi_device *mhi_dev,
+				     struct mhi_result *mhi_result)
+{
+	struct mqc_dev *mqcdev = dev_get_drvdata(&mhi_dev->dev);
+
+	dev_dbg(&mhi_dev->dev, "%s: status: %d xfer_len: %zu\n", __func__,
+		mhi_result->transaction_status, mhi_result->bytes_xferd);
+
+	kfree(mhi_result->buf_addr);
+
+	if (!mhi_result->transaction_status)
+		wake_up_interruptible(&mqcdev->ul_wq);
+}
+
+static void mhi_qaic_ctrl_dl_xfer_cb(struct mhi_device *mhi_dev,
+				     struct mhi_result *mhi_result)
+{
+	struct mqc_dev *mqcdev = dev_get_drvdata(&mhi_dev->dev);
+	struct mqc_buf *ctrlbuf;
+
+	dev_dbg(&mhi_dev->dev, "%s: status: %d receive_len: %zu\n", __func__,
+		mhi_result->transaction_status, mhi_result->bytes_xferd);
+
+	if (mhi_result->transaction_status &&
+	    mhi_result->transaction_status != -EOVERFLOW) {
+		kfree(mhi_result->buf_addr);
+		return;
+	}
+
+	ctrlbuf = mhi_result->buf_addr + mqcdev->mtu;
+	ctrlbuf->data = mhi_result->buf_addr;
+	ctrlbuf->len = mhi_result->bytes_xferd;
+	spin_lock_bh(&mqcdev->dl_queue_lock);
+	list_add_tail(&ctrlbuf->node, &mqcdev->dl_queue);
+	spin_unlock_bh(&mqcdev->dl_queue_lock);
+
+	wake_up_interruptible(&mqcdev->dl_wq);
+}
+
+static int mhi_qaic_ctrl_probe(struct mhi_device *mhi_dev,
+			       const struct mhi_device_id *id)
+{
+	struct mqc_dev *mqcdev;
+	struct device *dev;
+	int ret;
+
+	mqcdev = kzalloc(sizeof(*mqcdev), GFP_KERNEL);
+	if (!mqcdev)
+		return -ENOMEM;
+
+	kref_init(&mqcdev->ref_count);
+	mutex_init(&mqcdev->lock);
+	mqcdev->mhi_dev = mhi_dev;
+
+	ret = xa_alloc(&mqc_xa, &mqcdev->minor, mqcdev, XA_LIMIT(0, MHI_QAIC_CTRL_MAX_MINORS),
+		       GFP_KERNEL);
+	if (ret) {
+		kfree(mqcdev);
+		return ret;
+	}
+
+	init_waitqueue_head(&mqcdev->ul_wq);
+	init_waitqueue_head(&mqcdev->dl_wq);
+	mutex_init(&mqcdev->read_lock);
+	mutex_init(&mqcdev->write_lock);
+	spin_lock_init(&mqcdev->dl_queue_lock);
+	INIT_LIST_HEAD(&mqcdev->dl_queue);
+	mqcdev->mtu = min_t(size_t, id->driver_data, MHI_MAX_MTU);
+	mqcdev->enabled = true;
+	mqcdev->open_count = 0;
+	dev_set_drvdata(&mhi_dev->dev, mqcdev);
+
+	dev = device_create(mqc_dev_class, &mhi_dev->dev,
+			    MKDEV(mqc_dev_major, mqcdev->minor),
+			    mqcdev, "%s", dev_name(&mhi_dev->dev));
+	if (IS_ERR(dev)) {
+		xa_erase(&mqc_xa, mqcdev->minor);
+		dev_set_drvdata(&mhi_dev->dev, NULL);
+		kfree(mqcdev);
+		return PTR_ERR(dev);
+	}
+
+	return 0;
+};
+
+static void mhi_qaic_ctrl_remove(struct mhi_device *mhi_dev)
+{
+	struct mqc_dev *mqcdev = dev_get_drvdata(&mhi_dev->dev);
+
+	device_destroy(mqc_dev_class, MKDEV(mqc_dev_major, mqcdev->minor));
+
+	mutex_lock(&mqcdev->lock);
+	mqcdev->enabled = false;
+	if (mqcdev->open_count)
+		__mhi_qaic_ctrl_release(mqcdev);
+	mutex_unlock(&mqcdev->lock);
+
+	xa_erase(&mqc_xa, mqcdev->minor);
+	kref_put(&mqcdev->ref_count, mqc_dev_release);
+}
+
+/* .driver_data stores max mtu */
+static const struct mhi_device_id mhi_qaic_ctrl_match_table[] = {
+	{ .chan = "QAIC_SAHARA", .driver_data = SZ_32K},
+	{},
+};
+MODULE_DEVICE_TABLE(mhi, mhi_qaic_ctrl_match_table);
+
+static struct mhi_driver mhi_qaic_ctrl_driver = {
+	.id_table = mhi_qaic_ctrl_match_table,
+	.remove = mhi_qaic_ctrl_remove,
+	.probe = mhi_qaic_ctrl_probe,
+	.ul_xfer_cb = mhi_qaic_ctrl_ul_xfer_cb,
+	.dl_xfer_cb = mhi_qaic_ctrl_dl_xfer_cb,
+	.driver = {
+		.name = MHI_QAIC_CTRL_DRIVER_NAME,
+	},
+};
+
+int mhi_qaic_ctrl_init(void)
+{
+	int ret;
+
+	ret = register_chrdev(0, MHI_QAIC_CTRL_DRIVER_NAME, &mhidev_fops);
+	if (ret < 0)
+		return ret;
+
+	mqc_dev_major = ret;
+	mqc_dev_class = class_create(THIS_MODULE, MHI_QAIC_CTRL_DRIVER_NAME);
+	if (IS_ERR(mqc_dev_class)) {
+		ret = PTR_ERR(mqc_dev_class);
+		goto unregister_chrdev;
+	}
+
+	ret = mhi_driver_register(&mhi_qaic_ctrl_driver);
+	if (ret)
+		goto destroy_class;
+
+	return 0;
+
+destroy_class:
+	class_destroy(mqc_dev_class);
+unregister_chrdev:
+	unregister_chrdev(mqc_dev_major, MHI_QAIC_CTRL_DRIVER_NAME);
+	return ret;
+}
+
+void mhi_qaic_ctrl_deinit(void)
+{
+	mhi_driver_unregister(&mhi_qaic_ctrl_driver);
+	class_destroy(mqc_dev_class);
+	unregister_chrdev(mqc_dev_major, MHI_QAIC_CTRL_DRIVER_NAME);
+	xa_destroy(&mqc_xa);
+}
diff --git a/drivers/accel/qaic/mhi_qaic_ctrl.h b/drivers/accel/qaic/mhi_qaic_ctrl.h
new file mode 100644
index 0000000..0545540
--- /dev/null
+++ b/drivers/accel/qaic/mhi_qaic_ctrl.h
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: GPL-2.0-only
+ *
+ * Copyright (c) 2022 Qualcomm Innovation Center, Inc. All rights reserved.
+ */
+
+#ifndef __MHI_QAIC_CTRL_H__
+#define __MHI_QAIC_CTRL_H__
+
+int mhi_qaic_ctrl_init(void);
+void mhi_qaic_ctrl_deinit(void);
+#endif /* __MHI_QAIC_CTRL_H__ */
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 68+ messages in thread

+		}
+	}
+	mqcdev->open_count++;
+	mutex_unlock(&mqcdev->lock);
+
+	return 0;
+
+mhi_unprepare:
+	mhi_unprepare_from_transfer(mqcdev->mhi_dev);
+release_dev_lock:
+	mutex_unlock(&mqcdev->lock);
+	return ret;
+}
+
+static struct mqc_dev *mqc_dev_get_by_minor(unsigned int minor)
+{
+	struct mqc_dev *mqcdev;
+
+	xa_lock(&mqc_xa);
+	mqcdev = xa_load(&mqc_xa, minor);
+	if (mqcdev)
+		kref_get(&mqcdev->ref_count);
+	xa_unlock(&mqc_xa);
+
+	return mqcdev;
+}
+
+static int mhi_qaic_ctrl_open(struct inode *inode, struct file *filp)
+{
+	struct mqc_dev *mqcdev;
+	int ret;
+
+	mqcdev = mqc_dev_get_by_minor(iminor(inode));
+	if (!mqcdev) {
+		pr_debug("mqc: minor %d not found\n", iminor(inode));
+		return -EINVAL;
+	}
+
+	ret = mhi_qaic_ctrl_dev_start_chan(mqcdev);
+	if (ret) {
+		kref_put(&mqcdev->ref_count, mqc_dev_release);
+		return ret;
+	}
+
+	filp->private_data = mqcdev;
+
+	return 0;
+}
+
+static void mhi_qaic_ctrl_buf_free(struct mqc_buf *ctrlbuf)
+{
+	list_del(&ctrlbuf->node);
+	kfree(ctrlbuf->odata);
+}
+
+static void __mhi_qaic_ctrl_release(struct mqc_dev *mqcdev)
+{
+	struct mqc_buf *ctrlbuf, *tmp;
+
+	mhi_unprepare_from_transfer(mqcdev->mhi_dev);
+	wake_up_interruptible(&mqcdev->ul_wq);
+	wake_up_interruptible(&mqcdev->dl_wq);
+	/*
+	 * Free the dl_queue. As we have already unprepared mhi transfers, we
+	 * do not expect any callback functions that update dl_queue hence no need
+	 * to grab dl_queue lock.
+	 */
+	mutex_lock(&mqcdev->read_lock);
+	list_for_each_entry_safe(ctrlbuf, tmp, &mqcdev->dl_queue, node)
+		mhi_qaic_ctrl_buf_free(ctrlbuf);
+	mutex_unlock(&mqcdev->read_lock);
+}
+
+static int mhi_qaic_ctrl_release(struct inode *inode, struct file *file)
+{
+	struct mqc_dev *mqcdev = file->private_data;
+
+	mutex_lock(&mqcdev->lock);
+	mqcdev->open_count--;
+	if (!mqcdev->open_count && mqcdev->enabled)
+		__mhi_qaic_ctrl_release(mqcdev);
+	mutex_unlock(&mqcdev->lock);
+
+	kref_put(&mqcdev->ref_count, mqc_dev_release);
+
+	return 0;
+}
+
+static __poll_t mhi_qaic_ctrl_poll(struct file *file, poll_table *wait)
+{
+	struct mqc_dev *mqcdev = file->private_data;
+	struct mhi_device *mhi_dev;
+	__poll_t mask = 0;
+
+	mhi_dev = mqcdev->mhi_dev;
+
+	poll_wait(file, &mqcdev->ul_wq, wait);
+	poll_wait(file, &mqcdev->dl_wq, wait);
+
+	mutex_lock(&mqcdev->lock);
+	if (!mqcdev->enabled || !mqcdev->open_count) {
+		mutex_unlock(&mqcdev->lock);
+		return EPOLLERR;
+	}
+
+	spin_lock_bh(&mqcdev->dl_queue_lock);
+	if (!list_empty(&mqcdev->dl_queue))
+		mask |= EPOLLIN | EPOLLRDNORM;
+	spin_unlock_bh(&mqcdev->dl_queue_lock);
+
+	if (mutex_lock_interruptible(&mqcdev->write_lock)) {
+		mutex_unlock(&mqcdev->lock);
+		return EPOLLERR;
+	}
+	if (mhi_get_free_desc_count(mhi_dev, DMA_TO_DEVICE) > 0)
+		mask |= EPOLLOUT | EPOLLWRNORM;
+	mutex_unlock(&mqcdev->write_lock);
+	mutex_unlock(&mqcdev->lock);
+
+	dev_dbg(&mhi_dev->dev, "Client attempted to poll, returning mask 0x%x\n", mask);
+
+	return mask;
+}
+
+static int mhi_qaic_ctrl_tx(struct mqc_dev *mqcdev)
+{
+	int ret;
+
+	ret = wait_event_interruptible(mqcdev->ul_wq,
+				       (!mqcdev->enabled || !mqcdev->open_count ||
+				       mhi_get_free_desc_count(mqcdev->mhi_dev,
+							       DMA_TO_DEVICE) > 0));
+
+	if (!mqcdev->open_count)
+		return -EPIPE;
+	if (!mqcdev->enabled)
+		return -ENODEV;
+
+	return ret;
+}
+
+static ssize_t mhi_qaic_ctrl_write(struct file *file, const char __user *buf,
+				   size_t count, loff_t *offp)
+{
+	struct mqc_dev *mqcdev = file->private_data;
+	struct mhi_device *mhi_dev;
+	size_t bytes_xfered = 0;
+	struct device *dev;
+	int ret, nr_desc;
+
+	mhi_dev = mqcdev->mhi_dev;
+	dev = &mhi_dev->dev;
+
+	if (!mhi_dev->ul_chan)
+		return -EOPNOTSUPP;
+
+	if (!buf || !count)
+		return -EINVAL;
+
+	dev_dbg(dev, "Request to transfer %zu bytes\n", count);
+
+	ret = mhi_qaic_ctrl_tx(mqcdev);
+	if (ret) {
+		dev_err(dev, "Failed to write %d\n", ret);
+		return ret;
+	}
+
+	if (mutex_lock_interruptible(&mqcdev->write_lock))
+		return -EINTR;
+
+	nr_desc = mhi_get_free_desc_count(mhi_dev, DMA_TO_DEVICE);
+	if (nr_desc * mqcdev->mtu < count) {
+		ret = -EMSGSIZE;
+		dev_dbg(dev, "Buffer too big to transfer\n");
+		goto unlock_mutex;
+	}
+
+	while (count != bytes_xfered) {
+		enum mhi_flags flags;
+		size_t to_copy;
+		void *kbuf;
+
+		to_copy = min_t(size_t, count - bytes_xfered, mqcdev->mtu);
+		kbuf = kmalloc(to_copy, GFP_KERNEL);
+		if (!kbuf) {
+			ret = -ENOMEM;
+			goto unlock_mutex;
+		}
+
+		ret = copy_from_user(kbuf, buf + bytes_xfered, to_copy);
+		if (ret) {
+			kfree(kbuf);
+			ret = -EFAULT;
+			goto unlock_mutex;
+		}
+
+		if (bytes_xfered + to_copy == count)
+			flags = MHI_EOT;
+		else
+			flags = MHI_CHAIN;
+
+		ret = mhi_queue_buf(mhi_dev, DMA_TO_DEVICE, kbuf, to_copy, flags);
+		if (ret) {
+			kfree(kbuf);
+			dev_err(dev, "Failed to queue buf of size %zu\n", to_copy);
+			goto unlock_mutex;
+		}
+
+		bytes_xfered += to_copy;
+	}
+
+	mutex_unlock(&mqcdev->write_lock);
+	dev_dbg(dev, "bytes xferred: %zu\n", bytes_xfered);
+
+	return bytes_xfered;
+
+unlock_mutex:
+	mutex_unlock(&mqcdev->write_lock);
+	return ret;
+}
+
+static int mhi_qaic_ctrl_rx(struct mqc_dev *mqcdev)
+{
+	int ret;
+
+	ret = wait_event_interruptible(mqcdev->dl_wq, (!mqcdev->enabled || !mqcdev->open_count ||
+				       !list_empty(&mqcdev->dl_queue)));
+
+	if (!mqcdev->open_count)
+		return -EPERM;
+	if (!mqcdev->enabled)
+		return -ENODEV;
+
+	return ret;
+}
+
+static ssize_t mhi_qaic_ctrl_read(struct file *file, char __user *buf,
+				  size_t count, loff_t *ppos)
+{
+	struct mqc_dev *mqcdev = file->private_data;
+	struct mqc_buf *ctrlbuf;
+	size_t to_copy;
+	int ret;
+
+	if (!mqcdev->mhi_dev->dl_chan)
+		return -EOPNOTSUPP;
+
+	ret = mhi_qaic_ctrl_rx(mqcdev);
+	if (ret) {
+		dev_err(&mqcdev->mhi_dev->dev, "Failed to read %d\n", ret);
+		return ret;
+	}
+
+	if (mutex_lock_interruptible(&mqcdev->read_lock))
+		return -EINTR;
+
+	ctrlbuf = list_first_entry_or_null(&mqcdev->dl_queue, struct mqc_buf, node);
+	if (!ctrlbuf) {
+		mutex_unlock(&mqcdev->read_lock);
+		dev_dbg(&mqcdev->mhi_dev->dev, "Device has been released\n");
+		ret = -ENODEV;
+		goto error_out;
+	}
+
+	to_copy = min_t(size_t, count, ctrlbuf->len);
+	if (copy_to_user(buf, ctrlbuf->data, to_copy)) {
+		mutex_unlock(&mqcdev->read_lock);
+		dev_dbg(&mqcdev->mhi_dev->dev, "Failed to copy data to user buffer\n");
+		ret = -EFAULT;
+		goto error_out;
+	}
+
+	ctrlbuf->len -= to_copy;
+	ctrlbuf->data += to_copy;
+
+	if (!ctrlbuf->len) {
+		spin_lock_bh(&mqcdev->dl_queue_lock);
+		mhi_qaic_ctrl_buf_free(ctrlbuf);
+		spin_unlock_bh(&mqcdev->dl_queue_lock);
+		mhi_qaic_ctrl_fill_dl_queue(mqcdev);
+		dev_dbg(&mqcdev->mhi_dev->dev, "Read buf freed\n");
+	}
+
+	mutex_unlock(&mqcdev->read_lock);
+	return to_copy;
+
+error_out:
+	mutex_unlock(&mqcdev->read_lock);
+	return ret;
+}
+
+static const struct file_operations mhidev_fops = {
+	.owner = THIS_MODULE,
+	.open = mhi_qaic_ctrl_open,
+	.release = mhi_qaic_ctrl_release,
+	.read = mhi_qaic_ctrl_read,
+	.write = mhi_qaic_ctrl_write,
+	.poll = mhi_qaic_ctrl_poll,
+};
+
+static void mhi_qaic_ctrl_ul_xfer_cb(struct mhi_device *mhi_dev,
+				     struct mhi_result *mhi_result)
+{
+	struct mqc_dev *mqcdev = dev_get_drvdata(&mhi_dev->dev);
+
+	dev_dbg(&mhi_dev->dev, "%s: status: %d xfer_len: %zu\n", __func__,
+		mhi_result->transaction_status, mhi_result->bytes_xferd);
+
+	kfree(mhi_result->buf_addr);
+
+	if (!mhi_result->transaction_status)
+		wake_up_interruptible(&mqcdev->ul_wq);
+}
+
+static void mhi_qaic_ctrl_dl_xfer_cb(struct mhi_device *mhi_dev,
+				     struct mhi_result *mhi_result)
+{
+	struct mqc_dev *mqcdev = dev_get_drvdata(&mhi_dev->dev);
+	struct mqc_buf *ctrlbuf;
+
+	dev_dbg(&mhi_dev->dev, "%s: status: %d receive_len: %zu\n", __func__,
+		mhi_result->transaction_status, mhi_result->bytes_xferd);
+
+	if (mhi_result->transaction_status &&
+	    mhi_result->transaction_status != -EOVERFLOW) {
+		kfree(mhi_result->buf_addr);
+		return;
+	}
+
+	ctrlbuf = mhi_result->buf_addr + mqcdev->mtu;
+	ctrlbuf->data = mhi_result->buf_addr;
+	ctrlbuf->len = mhi_result->bytes_xferd;
+	spin_lock_bh(&mqcdev->dl_queue_lock);
+	list_add_tail(&ctrlbuf->node, &mqcdev->dl_queue);
+	spin_unlock_bh(&mqcdev->dl_queue_lock);
+
+	wake_up_interruptible(&mqcdev->dl_wq);
+}
+
+static int mhi_qaic_ctrl_probe(struct mhi_device *mhi_dev,
+			       const struct mhi_device_id *id)
+{
+	struct mqc_dev *mqcdev;
+	struct device *dev;
+	int ret;
+
+	mqcdev = kzalloc(sizeof(*mqcdev), GFP_KERNEL);
+	if (!mqcdev)
+		return -ENOMEM;
+
+	kref_init(&mqcdev->ref_count);
+	mutex_init(&mqcdev->lock);
+	mqcdev->mhi_dev = mhi_dev;
+
+	ret = xa_alloc(&mqc_xa, &mqcdev->minor, mqcdev, XA_LIMIT(0, MHI_QAIC_CTRL_MAX_MINORS),
+		       GFP_KERNEL);
+	if (ret) {
+		kfree(mqcdev);
+		return ret;
+	}
+
+	init_waitqueue_head(&mqcdev->ul_wq);
+	init_waitqueue_head(&mqcdev->dl_wq);
+	mutex_init(&mqcdev->read_lock);
+	mutex_init(&mqcdev->write_lock);
+	spin_lock_init(&mqcdev->dl_queue_lock);
+	INIT_LIST_HEAD(&mqcdev->dl_queue);
+	mqcdev->mtu = min_t(size_t, id->driver_data, MHI_MAX_MTU);
+	mqcdev->enabled = true;
+	mqcdev->open_count = 0;
+	dev_set_drvdata(&mhi_dev->dev, mqcdev);
+
+	dev = device_create(mqc_dev_class, &mhi_dev->dev,
+			    MKDEV(mqc_dev_major, mqcdev->minor),
+			    mqcdev, "%s", dev_name(&mhi_dev->dev));
+	if (IS_ERR(dev)) {
+		xa_erase(&mqc_xa, mqcdev->minor);
+		dev_set_drvdata(&mhi_dev->dev, NULL);
+		kfree(mqcdev);
+		return PTR_ERR(dev);
+	}
+
+	return 0;
+};
+
+static void mhi_qaic_ctrl_remove(struct mhi_device *mhi_dev)
+{
+	struct mqc_dev *mqcdev = dev_get_drvdata(&mhi_dev->dev);
+
+	device_destroy(mqc_dev_class, MKDEV(mqc_dev_major, mqcdev->minor));
+
+	mutex_lock(&mqcdev->lock);
+	mqcdev->enabled = false;
+	if (mqcdev->open_count)
+		__mhi_qaic_ctrl_release(mqcdev);
+	mutex_unlock(&mqcdev->lock);
+
+	xa_erase(&mqc_xa, mqcdev->minor);
+	kref_put(&mqcdev->ref_count, mqc_dev_release);
+}
+
+/* .driver_data stores max mtu */
+static const struct mhi_device_id mhi_qaic_ctrl_match_table[] = {
+	{ .chan = "QAIC_SAHARA", .driver_data = SZ_32K},
+	{},
+};
+MODULE_DEVICE_TABLE(mhi, mhi_qaic_ctrl_match_table);
+
+static struct mhi_driver mhi_qaic_ctrl_driver = {
+	.id_table = mhi_qaic_ctrl_match_table,
+	.remove = mhi_qaic_ctrl_remove,
+	.probe = mhi_qaic_ctrl_probe,
+	.ul_xfer_cb = mhi_qaic_ctrl_ul_xfer_cb,
+	.dl_xfer_cb = mhi_qaic_ctrl_dl_xfer_cb,
+	.driver = {
+		.name = MHI_QAIC_CTRL_DRIVER_NAME,
+	},
+};
+
+int mhi_qaic_ctrl_init(void)
+{
+	int ret;
+
+	ret = register_chrdev(0, MHI_QAIC_CTRL_DRIVER_NAME, &mhidev_fops);
+	if (ret < 0)
+		return ret;
+
+	mqc_dev_major = ret;
+	mqc_dev_class = class_create(THIS_MODULE, MHI_QAIC_CTRL_DRIVER_NAME);
+	if (IS_ERR(mqc_dev_class)) {
+		ret = PTR_ERR(mqc_dev_class);
+		goto unregister_chrdev;
+	}
+
+	ret = mhi_driver_register(&mhi_qaic_ctrl_driver);
+	if (ret)
+		goto destroy_class;
+
+	return 0;
+
+destroy_class:
+	class_destroy(mqc_dev_class);
+unregister_chrdev:
+	unregister_chrdev(mqc_dev_major, MHI_QAIC_CTRL_DRIVER_NAME);
+	return ret;
+}
+
+void mhi_qaic_ctrl_deinit(void)
+{
+	mhi_driver_unregister(&mhi_qaic_ctrl_driver);
+	class_destroy(mqc_dev_class);
+	unregister_chrdev(mqc_dev_major, MHI_QAIC_CTRL_DRIVER_NAME);
+	xa_destroy(&mqc_xa);
+}
diff --git a/drivers/accel/qaic/mhi_qaic_ctrl.h b/drivers/accel/qaic/mhi_qaic_ctrl.h
new file mode 100644
index 0000000..0545540
--- /dev/null
+++ b/drivers/accel/qaic/mhi_qaic_ctrl.h
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: GPL-2.0-only
+ *
+ * Copyright (c) 2022 Qualcomm Innovation Center, Inc. All rights reserved.
+ */
+
+#ifndef __MHI_QAIC_CTRL_H__
+#define __MHI_QAIC_CTRL_H__
+
+int mhi_qaic_ctrl_init(void);
+void mhi_qaic_ctrl_deinit(void);
+#endif /* __MHI_QAIC_CTRL_H__ */
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v2 7/8] accel/qaic: Add qaic driver to the build system
  2023-02-06 15:41 ` Jeffrey Hugo
@ 2023-02-06 15:41   ` Jeffrey Hugo
  -1 siblings, 0 replies; 68+ messages in thread
From: Jeffrey Hugo @ 2023-02-06 15:41 UTC (permalink / raw)
  To: ogabbay, airlied, daniel, dri-devel
  Cc: Jeffrey Hugo, linux-doc, linux-arm-msm, quic_ajitpals,
	quic_pkanojiy, stanislaw.gruszka, quic_carlv, jacek.lawrynowicz

Now that we have all the components of a minimal QAIC driver which can boot
and run an AIC100 device, add the infrastructure that allows the QAIC driver
to be built.

Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com>
---
 drivers/accel/Kconfig       |  1 +
 drivers/accel/Makefile      |  1 +
 drivers/accel/qaic/Kconfig  | 23 +++++++++++++++++++++++
 drivers/accel/qaic/Makefile | 13 +++++++++++++
 4 files changed, 38 insertions(+)
 create mode 100644 drivers/accel/qaic/Kconfig
 create mode 100644 drivers/accel/qaic/Makefile

diff --git a/drivers/accel/Kconfig b/drivers/accel/Kconfig
index 8348639..56d0d01 100644
--- a/drivers/accel/Kconfig
+++ b/drivers/accel/Kconfig
@@ -25,3 +25,4 @@ menuconfig DRM_ACCEL
 
 source "drivers/accel/habanalabs/Kconfig"
 source "drivers/accel/ivpu/Kconfig"
+source "drivers/accel/qaic/Kconfig"
diff --git a/drivers/accel/Makefile b/drivers/accel/Makefile
index 07aa77a..3cb6f14 100644
--- a/drivers/accel/Makefile
+++ b/drivers/accel/Makefile
@@ -2,3 +2,4 @@
 
 obj-y	+= habanalabs/
 obj-y	+= ivpu/
+obj-y	+= qaic/
diff --git a/drivers/accel/qaic/Kconfig b/drivers/accel/qaic/Kconfig
new file mode 100644
index 0000000..a9f8662
--- /dev/null
+++ b/drivers/accel/qaic/Kconfig
@@ -0,0 +1,23 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# Qualcomm Cloud AI accelerators driver
+#
+
+config DRM_ACCEL_QAIC
+	tristate "Qualcomm Cloud AI accelerators"
+	depends on DRM_ACCEL
+	depends on PCI && HAS_IOMEM
+	depends on MHI_BUS
+	depends on MMU
+	select CRC32
+	help
+	  Enables driver for Qualcomm's Cloud AI accelerator PCIe cards that are
+	  designed to accelerate Deep Learning inference workloads.
+
+	  The driver manages the PCIe devices and provides an IOCTL interface
+	  for users to submit workloads to the devices.
+
+	  If unsure, say N.
+
+	  To compile this driver as a module, choose M here: the
+	  module will be called qaic.
diff --git a/drivers/accel/qaic/Makefile b/drivers/accel/qaic/Makefile
new file mode 100644
index 0000000..d5f4952
--- /dev/null
+++ b/drivers/accel/qaic/Makefile
@@ -0,0 +1,13 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# Makefile for Qualcomm Cloud AI accelerators driver
+#
+
+obj-$(CONFIG_DRM_ACCEL_QAIC)	:= qaic.o
+
+qaic-y := \
+	mhi_controller.o \
+	mhi_qaic_ctrl.o \
+	qaic_control.o \
+	qaic_data.o \
+	qaic_drv.o
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v2 7/8] accel/qaic: Add qaic driver to the build system
@ 2023-02-06 15:41   ` Jeffrey Hugo
  0 siblings, 0 replies; 68+ messages in thread
From: Jeffrey Hugo @ 2023-02-06 15:41 UTC (permalink / raw)
  To: ogabbay, airlied, daniel, dri-devel
  Cc: jacek.lawrynowicz, stanislaw.gruszka, quic_pkanojiy, quic_carlv,
	quic_ajitpals, linux-doc, linux-arm-msm, Jeffrey Hugo

Now that we have all the components of a minimal QAIC driver which can boot
and run an AIC100 device, add the infrastructure that allows the QAIC driver
to be built.

Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com>
---
 drivers/accel/Kconfig       |  1 +
 drivers/accel/Makefile      |  1 +
 drivers/accel/qaic/Kconfig  | 23 +++++++++++++++++++++++
 drivers/accel/qaic/Makefile | 13 +++++++++++++
 4 files changed, 38 insertions(+)
 create mode 100644 drivers/accel/qaic/Kconfig
 create mode 100644 drivers/accel/qaic/Makefile

diff --git a/drivers/accel/Kconfig b/drivers/accel/Kconfig
index 8348639..56d0d01 100644
--- a/drivers/accel/Kconfig
+++ b/drivers/accel/Kconfig
@@ -25,3 +25,4 @@ menuconfig DRM_ACCEL
 
 source "drivers/accel/habanalabs/Kconfig"
 source "drivers/accel/ivpu/Kconfig"
+source "drivers/accel/qaic/Kconfig"
diff --git a/drivers/accel/Makefile b/drivers/accel/Makefile
index 07aa77a..3cb6f14 100644
--- a/drivers/accel/Makefile
+++ b/drivers/accel/Makefile
@@ -2,3 +2,4 @@
 
 obj-y	+= habanalabs/
 obj-y	+= ivpu/
+obj-y	+= qaic/
diff --git a/drivers/accel/qaic/Kconfig b/drivers/accel/qaic/Kconfig
new file mode 100644
index 0000000..a9f8662
--- /dev/null
+++ b/drivers/accel/qaic/Kconfig
@@ -0,0 +1,23 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# Qualcomm Cloud AI accelerators driver
+#
+
+config DRM_ACCEL_QAIC
+	tristate "Qualcomm Cloud AI accelerators"
+	depends on DRM_ACCEL
+	depends on PCI && HAS_IOMEM
+	depends on MHI_BUS
+	depends on MMU
+	select CRC32
+	help
+	  Enables driver for Qualcomm's Cloud AI accelerator PCIe cards that are
+	  designed to accelerate Deep Learning inference workloads.
+
+	  The driver manages the PCIe devices and provides an IOCTL interface
+	  for users to submit workloads to the devices.
+
+	  If unsure, say N.
+
+	  To compile this driver as a module, choose M here: the
+	  module will be called qaic.
diff --git a/drivers/accel/qaic/Makefile b/drivers/accel/qaic/Makefile
new file mode 100644
index 0000000..d5f4952
--- /dev/null
+++ b/drivers/accel/qaic/Makefile
@@ -0,0 +1,13 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# Makefile for Qualcomm Cloud AI accelerators driver
+#
+
+obj-$(CONFIG_DRM_ACCEL_QAIC)	:= qaic.o
+
+qaic-y := \
+	mhi_controller.o \
+	mhi_qaic_ctrl.o \
+	qaic_control.o \
+	qaic_data.o \
+	qaic_drv.o
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v2 8/8] MAINTAINERS: Add entry for QAIC driver
  2023-02-06 15:41 ` Jeffrey Hugo
@ 2023-02-06 15:41   ` Jeffrey Hugo
  -1 siblings, 0 replies; 68+ messages in thread
From: Jeffrey Hugo @ 2023-02-06 15:41 UTC (permalink / raw)
  To: ogabbay, airlied, daniel, dri-devel
  Cc: Jeffrey Hugo, linux-doc, linux-arm-msm, quic_ajitpals,
	quic_pkanojiy, stanislaw.gruszka, quic_carlv, jacek.lawrynowicz

Add MAINTAINERS entry for the Qualcomm Cloud AI 100 driver.

Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com>
---
 MAINTAINERS | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 263d37a..0a264f1 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -17170,6 +17170,14 @@ F:	Documentation/devicetree/bindings/clock/qcom,*
 F:	drivers/clk/qcom/
 F:	include/dt-bindings/clock/qcom,*
 
+QUALCOMM CLOUD AI (QAIC) DRIVER
+M:	Jeffrey Hugo <quic_jhugo@quicinc.com>
+L:	linux-arm-msm@vger.kernel.org
+S:	Supported
+F:	Documentation/accel/qaic/
+F:	drivers/accel/qaic/
+F:	include/uapi/drm/qaic_accel.h
+
 QUALCOMM CORE POWER REDUCTION (CPR) AVS DRIVER
 M:	Niklas Cassel <nks@flawful.org>
 L:	linux-pm@vger.kernel.org
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v2 8/8] MAINTAINERS: Add entry for QAIC driver
@ 2023-02-06 15:41   ` Jeffrey Hugo
  0 siblings, 0 replies; 68+ messages in thread
From: Jeffrey Hugo @ 2023-02-06 15:41 UTC (permalink / raw)
  To: ogabbay, airlied, daniel, dri-devel
  Cc: jacek.lawrynowicz, stanislaw.gruszka, quic_pkanojiy, quic_carlv,
	quic_ajitpals, linux-doc, linux-arm-msm, Jeffrey Hugo

Add MAINTAINERS entry for the Qualcomm Cloud AI 100 driver.

Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com>
---
 MAINTAINERS | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 263d37a..0a264f1 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -17170,6 +17170,14 @@ F:	Documentation/devicetree/bindings/clock/qcom,*
 F:	drivers/clk/qcom/
 F:	include/dt-bindings/clock/qcom,*
 
+QUALCOMM CLOUD AI (QAIC) DRIVER
+M:	Jeffrey Hugo <quic_jhugo@quicinc.com>
+L:	linux-arm-msm@vger.kernel.org
+S:	Supported
+F:	Documentation/accel/qaic/
+F:	drivers/accel/qaic/
+F:	include/uapi/drm/qaic_accel.h
+
 QUALCOMM CORE POWER REDUCTION (CPR) AVS DRIVER
 M:	Niklas Cassel <nks@flawful.org>
 L:	linux-pm@vger.kernel.org
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 0/8] QAIC accel driver
  2023-02-06 15:41 ` Jeffrey Hugo
@ 2023-02-08 22:01   ` Jeffrey Hugo
  -1 siblings, 0 replies; 68+ messages in thread
From: Jeffrey Hugo @ 2023-02-08 22:01 UTC (permalink / raw)
  To: ogabbay, airlied, daniel, dri-devel
  Cc: linux-doc, linux-arm-msm, quic_ajitpals, quic_pkanojiy,
	stanislaw.gruszka, quic_carlv, jacek.lawrynowicz

On 2/6/2023 8:41 AM, Jeffrey Hugo wrote:
> Regarding the open userspace (see the documentation patch), the UMD and
> compiler are a week or so away from being posted in the indicated repos.
> Just need to polish some documentation.

An update to this: the compiler is now live on GitHub at the link
specified in the documentation patch.

-Jeff

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 0/8] QAIC accel driver
@ 2023-02-08 22:01   ` Jeffrey Hugo
  0 siblings, 0 replies; 68+ messages in thread
From: Jeffrey Hugo @ 2023-02-08 22:01 UTC (permalink / raw)
  To: ogabbay, airlied, daniel, dri-devel
  Cc: linux-doc, linux-arm-msm, quic_ajitpals, quic_pkanojiy,
	stanislaw.gruszka, quic_carlv, jacek.lawrynowicz

On 2/6/2023 8:41 AM, Jeffrey Hugo wrote:
> Regarding the open userspace (see the documentation patch), the UMD and
> compiler are a week or so away from being posted in the indicated repos.
> Just need to polish some documentation.

An update to this: the compiler is now live on GitHub at the link
specified in the documentation patch.

-Jeff

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 1/8] accel/qaic: Add documentation for AIC100 accelerator driver
  2023-02-06 15:41   ` Jeffrey Hugo
  (?)
@ 2023-02-14 11:08   ` Jacek Lawrynowicz
  2023-02-15 15:41       ` Jeffrey Hugo
  -1 siblings, 1 reply; 68+ messages in thread
From: Jacek Lawrynowicz @ 2023-02-14 11:08 UTC (permalink / raw)
  To: Jeffrey Hugo, ogabbay, airlied, daniel, dri-devel
  Cc: linux-doc, linux-arm-msm, quic_ajitpals, quic_pkanojiy,
	stanislaw.gruszka, quic_carlv

Hi,

On 06.02.2023 16:41, Jeffrey Hugo wrote:
> The Qualcomm Cloud AI 100 (AIC100) device is an Artificial Intelligence
> accelerator PCIe card.  It contains a number of components both in the
> SoC and on the card which facilitate running workloads:
> 
> QSM: management processor
> NSPs: workload compute units
> DMA Bridge: dedicated data mover for the workloads
> MHI: multiplexed communication channels
> DDR: workload storage and memory
> 
> The Linux kernel driver for AIC100 is called "QAIC" and is located in the
> accel subsystem.
> 
> Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
> Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com>
> ---
>  Documentation/accel/index.rst       |   1 +
>  Documentation/accel/qaic/aic100.rst | 498 ++++++++++++++++++++++++++++++++++++
>  Documentation/accel/qaic/index.rst  |  13 +
>  Documentation/accel/qaic/qaic.rst   | 169 ++++++++++++
>  4 files changed, 681 insertions(+)
>  create mode 100644 Documentation/accel/qaic/aic100.rst
>  create mode 100644 Documentation/accel/qaic/index.rst
>  create mode 100644 Documentation/accel/qaic/qaic.rst
> 
> diff --git a/Documentation/accel/index.rst b/Documentation/accel/index.rst
> index 2b43c9a..e94a016 100644
> --- a/Documentation/accel/index.rst
> +++ b/Documentation/accel/index.rst
> @@ -8,6 +8,7 @@ Compute Accelerators
>     :maxdepth: 1
>  
>     introduction
> +   qaic/index
>  
>  .. only::  subproject and html
>  
> diff --git a/Documentation/accel/qaic/aic100.rst b/Documentation/accel/qaic/aic100.rst
> new file mode 100644
> index 0000000..773aa54
> --- /dev/null
> +++ b/Documentation/accel/qaic/aic100.rst
> @@ -0,0 +1,498 @@
> +.. SPDX-License-Identifier: GPL-2.0-only
> +
> +===============================
> + Qualcomm Cloud AI 100 (AIC100)
> +===============================
> +
> +Overview
> +========
> +
> +The Qualcomm Cloud AI 100/AIC100 family of products (including SA9000P - part of
> +Snapdragon Ride) are PCIe adapter cards which contain a dedicated SoC ASIC for
> +the purpose of efficiently running Artificial Intelligence (AI) Deep Learning
> +inference workloads.  They are AI accelerators.

There are multiple double spaces in this document like this one above.

> +The PCIe interface of AIC100 is capable of PCIe Gen4 speeds over eight lanes
> +(x8).  An individual SoC on a card can have up to 16 NSPs for running workloads.
> +Each SoC has an A53 management CPU.  On card, there can be up to 32 GB of DDR.
> +
> +Multiple AIC100 cards can be hosted in a single system to scale overall
> +performance.
> +
> +Hardware Description
> +====================
> +
> +An AIC100 card consists of an AIC100 SoC, on-card DDR, and a set of misc
> +peripherals (PMICs, etc).
> +
> +An AIC100 card can either be a PCIe HHHL form factor (a traditional PCIe card),
> +or a Dual M.2 card.  Both use PCIe to connect to the host system.

Dual M.2 card? Is it a single PCB with two M.2 connectors? This requires a
custom motherboard with x4 lanes from two connectors combined into a single
PCIe device, right?

> +As a PCIe endpoint/adapter, AIC100 uses the standard VendorID(VID)/
> +DeviceID(DID) combination to uniquely identify itself to the host.  AIC100
> +uses the standard Qualcomm VID (0x17cb).  All AIC100 instances use the same
> +AIC100 DID (0xa100).

Maybe "SKUs" would fit better here then "instances".

> +AIC100 does not implement FLR (function level reset).
> +
> +AIC100 implements MSI but does not implement MSI-X.  AIC100 requires 17 MSIs to
> +operate (1 for MHI, 16 for the DMA Bridge).
> +
> +As a PCIe device, AIC100 utilizes BARs to provide host interfaces to the device
> +hardware.  AIC100 provides 3, 64-bit BARs.
> +
> +* The first BAR is 4K in size, and exposes the MHI interface to the host.
> +
> +* The second BAR is 2M in size, and exposes the DMA Bridge interface to the
> +  host.
> +
> +* The third BAR is variable in size based on an individual AIC100's
> +  configuration, but defaults to 64K.  This BAR currently has no purpose.
> +
> +From the host perspective, AIC100 has several key hardware components-

Typo in "components-".

> +* QSM (QAIC Service Manager)
> +* NSPs (Neural Signal Processor)
> +* DMA Bridge
> +* DDR
> +* MHI (Modem Host Interface)
> +
> +QSM
> +---
> +
> +QAIC Service Manager.  This is an ARM A53 CPU that runs the primary
> +firmware of the card and performs on-card management tasks.  It also
> +communicates with the host via MHI.  Each AIC100 has one of
> +these.

I would put the description of MHI at the top because it is referenced by the QSM description.

> +NSP
> +---
> +
> +Neural Signal Processor.  Each AIC100 has up to 16 of these.  These are
> +the processors that run the workloads on AIC100.  Each NSP is a Qualcomm Hexagon
> +(Q6) DSP with HVX and HMX.  Each NSP can only run one workload at a time, but
> +multiple NSPs may be assigned to a single workload.  Since each NSP can only run
> +one workload, AIC100 is limited to 16 concurrent workloads.  Workload
> +"scheduling" is under the purview of the host.  AIC100 does not automatically
> +timeslice.
> +
> +DMA Bridge
> +----------
> +
> +The DMA Bridge is custom DMA engine that manages the flow of data
> +in and out of workloads.  AIC100 has one of these.  The DMA Bridge has 16
> +channels, each consisting of a set of request/response FIFOs.  Each active
> +workload is assigned a single DMA Bridge channel.  The DMA Bridge exposes
> +hardware registers to manage the FIFOs (head/tail pointers), but requires host
> +memory to store the FIFOs.
> +
> +DDR
> +---
> +
> +AIC100 has on-card DDR.  In total, an AIC100 can have up to 32 GB of DDR.
> +This DDR is used to store workloads, data for the workloads, and is used by the
> +QSM for managing the device.  NSPs are granted access to sections of the DDR by
> +the QSM.  The host does not have direct access to the DDR, and must make
> +requests to the QSM to transfer data to the DDR.
> +
> +MHI
> +---
> +
> +AIC100 has one MHI interface over PCIe.  MHI itself is documented at

Please expand the MHI acronym.

> +Documentation/mhi/index.rst  MHI is the mechanism the host uses to communicate
> +with the QSM.  Except for workload data via the DMA Bridge, all interaction with
> +he device occurs via MHI.

Typo in "he device".

> +High-level Use Flow
> +===================
> +
> +AIC100 is a programmable accelerator typically used for running
> +neural networks in inferencing mode to efficiently perform AI operations.
> +AIC100 is not intended for training neural networks.  AIC100 can be utilitized

utilitized -> utilized

> +for generic compute workloads.
> +
> +Assuming a user wants to utilize AIC100, they would follow these steps:
> +
> +1. Compile the workload into an ELF targeting the NSP(s)
> +2. Make requests to the QSM to load the workload and related artifacts into the
> +   device DDR
> +3. Make a request to the QSM to activate the workload onto a set of idle NSPs
> +4. Make requests to the DMA Bridge to send input data to the workload to be
> +   processed, and other requests to receive processed output data from the
> +   workload.
> +5. Once the workload is no longer required, make a request to the QSM to
> +   deactivate the workload, thus putting the NSPs back into an idle state.
> +6. Once the workload and related artifacts are no longer needed for future
> +   sessions, make requests to the QSM to unload the data from DDR.  This frees
> +   the DDR to be used by other users.
> +

Please specify if this is a single or multi-user device.

> +Boot Flow
> +=========
> +
> +AIC100 uses a flashless boot flow, derived from Qualcomm MSMs.

What's MSM?

> +When AIC100 is first powered on, it begins executing PBL (Primary Bootloader)
> +from ROM.  PBL enumerates the PCIe link, and initializes the BHI (Boot Host
> +Interface) component of MHI.
> +
> +Using BHI, the host points PBL to the location of the SBL (Secondary Bootloader)
> +image.  The PBL pulls the image from the host, validates it, and begins
> +execution of SBL.
> +
> +SBL initializes MHI, and uses MHI to notify the host that the device has entered
> +the SBL stage.  SBL performs a number of operations:
> +
> +* SBL initializes the majority of hardware (anything PBL left uninitialized),
> +  including DDR.
> +* SBL offloads the bootlog to the host.
> +* SBL synchonizes timestamps with the host for future logging.

synchonizes -> synchronizes

> +* SBL uses the Sahara protocol to obtain the runtime firmware images from the
> +  host.
> +
> +Once SBL has obtained and validated the runtime firmware, it brings the NSPs out
> +of reset, and jumps into the QSM.
> +
> +The QSM uses MHI to notify the host that the device has entered the QSM stage
> +(AMSS in MHI terms).  At this point, the AIC100 device is fully functional, and
> +ready to process workloads.
> +
> +Userspace components
> +====================
> +
> +Compiler
> +--------
> +
> +An open compiler for AIC100 based on upstream LLVM can be found at:
> +https://github.com/quic/software-kit-for-qualcomm-cloud-ai-100-cc
> +
> +Usermode Driver (UMD)
> +---------------------
> +
> +An open UMD that interfaces with the qaic kernel driver can be found at:
> +https://github.com/quic/software-kit-for-qualcomm-cloud-ai-100

This repo is empty.

> +
> +Sahara loader
> +-------------
> +
> +An open implementation of the Sahara protocol called kickstart can be found at:
> +https://github.com/andersson/qdl
> +
> +MHI Channels
> +============
> +
> +AIC100 defines a number of MHI channels for different purposes.  This is a list
> +of the defined channels, and their uses.
> +
> +| QAIC_LOOPBACK
> +| Channels 0/1

I would use a comma or & here.

> +| Valid for AMSS
> +| Any data sent to the device on this channel is sent back to the host.
> +
> +| QAIC_SAHARA
> +| Channels 2/3
> +| Valid for SBL
> +| Used by SBL to obtain the runtime firmware from the host.
> +
> +| QAIC_DIAG
> +| Channels 4/5
> +| Valid for AMSS
> +| Used to communicate with QSM via the Diag protocol.
> +
> +| QAIC_SSR
> +| Channels 6/7
> +| Valid for AMSS
> +| Used to notify the host of subsystem restart events, and to offload SSR crashdumps.
> +
> +| QAIC_QDSS
> +| Channels 8/9
> +| Valid for AMSS
> +| Used for the Qualcomm Debug Subsystem.
> +
> +| QAIC_CONTROL
> +| Channels 10/11
> +| Valid for AMSS
> +| Used for the Neural Network Control (NNC) protocol.  This is the primary channel between host and QSM for managing workloads.
> +
> +| QAIC_LOGGING
> +| Channels 12/13
> +| Valid for SBL
> +| Used by the SBL to send the bootlog to the host.
> +
> +| QAIC_STATUS
> +| Channels 14/15
> +| Valid for AMSS
> +| Used to notify the host of Reliability, Accessability, Serviceability (RAS) events.

Accessability -> Accessibility

> +| QAIC_TELEMETRY
> +| Channels 16/17
> +| Valid for AMSS
> +| Used to get/set power/thermal/etc attributes.
> +
> +| QAIC_DEBUG
> +| Channels 18/19
> +| Valid for AMSS
> +| Not used.
> +
> +| QAIC_TIMESYNC
> +| Channels 20/21
> +| Valid for SBL/AMSS
> +| Used to synchronize timestamps in the device side logs with the host time source.
> +
> +DMA Bridge
> +==========
> +
> +Overview
> +--------
> +
> +The DMA Bridge is one of the main interfaces to the host from the device
> +(the other being MHI).  As part of activating a workload to run on NSPs, the QSM
> +assigns that network a DMA Bridge channel.  A workload's DMA Bridge channel
> +(DBC for short) is solely for the use of that workload and is not shared with
> +other workloads.
> +
> +Each DBC is a pair of FIFOs that manage data in and out of the workload.  One
> +FIFO is the request FIFO.  The other FIFO is the response FIFO.
> +
> +Each DBC contains 4 registers in hardware:
> +
> +* Request FIFO head pointer (offset 0x0).  Read only to the host.  Indicates the

Read only _by_ the host.

> +  latest item in the FIFO the device has consumed.
> +* Request FIFO tail pointer (offset 0x4).  Read/write by the host.  Host
> +  increments this register to add new items to the FIFO.
> +* Response FIFO head pointer (offset 0x8).  Read/write by the host.  Indicates
> +  the latest item in the FIFO the host has consumed.
> +* Response FIFO tail pointer (offset 0xc).  Read only to the host.  Device

Read only _by_ the host.

> +  increments this register to add new items to the FIFO.
> +
> +The values in each register are indexes in the FIFO.  To get the location of the
> +FIFO element pointed to by the register: FIFO base address + register * element
> +size.
> +
> +DBC registers are exposed to the host via the second BAR.  Each DBC consumes
> +0x1000 of space in the BAR.

I wouldn't use hex for the sizes. 4KB seems a lot more readable.

> +The actual FIFOs are backed by host memory.  When sending a request to the QSM
> +to activate a network, the host must donate memory to be used for the FIFOs.
> +Due to internal mapping limitations of the device, a single contigious chunk of

contigious -> contiguous

> +memory must be provided per DBC, which hosts both FIFOs.  The request FIFO will
> +consume the beginning of the memory chunk, and the response FIFO will consume
> +the end of the memory chunk.
> +
> +Request FIFO
> +------------
> +
> +A request FIFO element has the following structure:
> +
> +| {
> +|	u16 req_id;
> +|	u8  seq_id;
> +|	u8  pcie_dma_cmd;
> +|	u32 reserved;
> +|	u64 pcie_dma_source_addr;
> +|	u64 pcie_dma_dest_addr;
> +|	u32 pcie_dma_len;
> +|	u32 reserved;
> +|	u64 doorbell_addr;
> +|	u8  doorbell_attr;
> +|	u8  reserved;
> +|	u16 reserved;
> +|	u32 doorbell_data;
> +|	u32 sem_cmd0;
> +|	u32 sem_cmd1;
> +|	u32 sem_cmd2;
> +|	u32 sem_cmd3;
> +| }
> +
> +Request field descriptions:
> +
> +| req_id- request ID.  A request FIFO element and a response FIFO element with
> +|         the same request ID refer to the same command.
> +
> +| seq_id- sequence ID within a request.  Ignored by the DMA Bridge.
> +
> +| pcie_dma_cmd- describes the DMA element of this request.
> +| 	Bit(7) is the force msi flag, which overrides the DMA Bridge MSI logic
> +| 		and generates a MSI when this request is complete, and QSM
> +| 		configures the DMA Bridge to look at this bit.
> +| 	Bits(6:5) are reserved.
> +| 	Bit(4) is the completion code flag, and indicates that the DMA Bridge
> +| 		shall generate a response FIFO element when this request is
> +| 		complete.
> +| 	Bit(3) indicates if this request is a linked list transfer(0) or a bulk
> +| 		transfer(1).
> +| 	Bit(2) is reserved.
> +| 	Bits(1:0) indicate the type of transfer.  No transfer(0), to device(1),
> +| 		from device(2).  Value 3 is illegal.
> +
> +| pcie_dma_source_addr- source address for a bulk transfer, or the address of
> +|         the linked list.
> +
> +| pcie_dma_dest_addr- destination address for a bulk transfer.
> +
> +| pcie_dma_len- length of the bulk transfer.  Note that the size of this field
> +| 	limits transfers to 4G in size.
> +
> +| doorbell_addr- address of the doorbell to ring when this request is complete.
> +
> +| doorbell_attr- doorbell attributes.
> +| 	Bit(7) indicates if a write to a doorbell is to occur.
> +| 	Bits(6:2) are reserved.
> +| 	Bits(1:0) contain the encoding of the doorbell length.  0 is 32-bit,
> +| 		1 is 16-bit, 2 is 8-bit, 3 is reserved.  The doorbell address
> +| 		must be naturally aligned to the specified length.
> +
> +| doorbell_data- data to write to the doorbell.  Only the bits corresponding to
> +| 	the doorbell length are valid.
> +
> +| sem_cmdN- semaphore command.
> +| 	Bit(31) indicates this semaphore command is enabled.
> +| 	Bit(30) is the to-device DMA fence.  Block this request until all
> +| 		to-device DMA transfers are complete.
> +| 	Bit(29) is the from-device DMA fence.  Block this request until all
> +| 		from-device DMA transfers are complete.
> +| 	Bits(28:27) are reserved.
> +| 	Bits(26:24) are the semaphore command.  0 is NOP.  1 is init with the
> +| 		specified value.  2 is increment.  3 is decrement.  4 is wait
> +| 		until the semaphore is equal to the specified value.  5 is wait
> +| 		until the semaphore is greater or equal to the specified value.
> +| 		6 is "P", wait until semaphore is greater than 0, then
> +| 		decrement by 1.  7 is reserved.
> +| 	Bit(23) is reserved.
> +| 	Bit(22) is the semaphore sync.  0 is post sync, which means that the
> +| 		semaphore operation is done after the DMA transfer.  1 is
> +| 		presync, which gates the DMA transfer.  Only one presync is
> +| 		allowed per request.
> +| 	Bit(21) is reserved.
> +| 	Bits(20:16) is the index of the semaphore to operate on.
> +| 	Bits(15:12) are reserved.
> +| 	Bits(11:0) are the semaphore value to use in operations.

It seems to me like structure documentation would be a better fit for these field descriptions.

> +Overall, a request is processed in 4 steps:
> +
> +1. If specified, the presync semaphore condition must be true
> +2. If enabled, the DMA transfer occurs
> +3. If specified, the postsync semaphore conditions must be true
> +4. If enabled, the doorbell is written
> +
> +By using the semaphores in conjunction with the workload running on the NSPs,
> +the data pipeline can be synchronized such that the host can queue multiple
> +requests of data for the workload to process, but the DMA Bridge will only copy
> +the data into the memory of the workload when the workload is ready to process
> +the next input.
> +
> +Response FIFO
> +-------------
> +
> +Once a request is fully processed, a response FIFO element is generated if
> +specified in pcie_dma_cmd.  The structure of a response FIFO element:
> +
> +| {
> +| 	u16 req_id;
> +| 	u16 completion_code;
> +| }
> +
> +req_id- matches the req_id of the request that generated this element.
> +
> +completion_code- status of this request.  0 is success.  non-zero is an error.
> +
> +The DMA Bridge will generate a MSI to the host as a reaction to activity in the
> +response FIFO of a DBC.  The DMA Bridge hardware has an IRQ storm mitigation
> +algorithm, where it will only generate a MSI when the response FIFO transitions
> +from empty to non-empty (unless force MSI is enabled and triggered).  In
> +response to this MSI, the host is expected to drain the response FIFO, and must
> +take care to handle any race conditions between draining the FIFO, and the
> +device inserting elements into the FIFO.
> +
> +Neural Network Control (NNC) Protocol
> +=====================================
> +
> +The NNC protocol is how the host makes requests to the QSM to manage workloads.
> +It uses the QAIC_CONTROL MHI channel.
> +
> +Each NNC request is packaged into a message.  Each message is a series of
> +transactions.  A passthrough type transaction can contain elements known as
> +commands.
> +
> +QSM requires NNC messages be little endian encoded and the fields be naturally
> +aligned.  Since there are 64-bit elements in some NNC messages, 64-bit alignment
> +must be maintained.
> +
> +A message contains a header and then a series of transactions.  A message may be
> +at most 4K in size from QSM to the host.  From the host to the QSM, a message
> +can be at most 64K (maximum size of a single MHI packet), but there is a
> +continuation feature where message N+1 can be marked as a continuation of
> +message N.  This is used for exceedingly large DMA xfer transactions.
> +
> +Transaction descriptions:
> +
> +passthrough- Allows userspace to send an opaque payload directly to the QSM.
> +This is used for NNC commands.  Userspace is responsible for managing
> +the QSM message requirements in the payload
> +
> +dma_xfer- DMA transfer.  Describes an object that the QSM should DMA into the
> +device via address and size tuples.
> +
> +activate- Activate a workload onto NSPs.  The host must provide memory to be
> +used by the DBC.
> +
> +deactivate- Deactivate an active workload and return the NSPs to idle.
> +
> +status- Query the QSM about it's NNC implementation.  Returns the NNC version,
> +and if CRC is used.
> +
> +terminate- Release a user's resources.
> +
> +dma_xfer_cont- Continuation of a previous DMA transfer.  If a DMA transfer
> +cannot be specified in a single message (highly fragmented), this
> +transaction can be used to specify more ranges.
> +
> +validate_partition- Query to QSM to determine if a partition identifier is
> +valid.
> +
> +Each message is tagged with a user id, and a partition id.  The user id allows
> +QSM to track resources, and release them when the user goes away (eg the process
> +crashes).  A partition id identifies the resource partition that QSM manages,
> +which this message applies to.
> +
> +Messages may have CRCs.  Messages should have CRCs applied until the QSM
> +reports via the status transaction that CRCs are not needed.  The QSM on the
> +SA9000P requires CRCs for black channel safing.
> +
> +Subsystem Restart (SSR)
> +=======================
> +
> +SSR is the concept of limiting the impact of an error.  An AIC100 device may
> +have multiple users, each with their own workload running.  If the workload of
> +one user crashes, the fallout of that should be limited to that workload and not
> +impact other workloads.  SSR accomplishes this.
> +
> +If a particular workload crashes, QSM notifies the host via the QAIC_SSR MHI
> +channel.  This notification identifies the workload by it's assigned DBC.  A
> +multi-stage recovery process is then used to cleanup both sides, and get the
> +DBC/NSPs into a working state.
> +
> +When SSR occurs, any state in the workload is lost.  Any inputs that were in
> +process, or queued by not yet serviced, are lost.  The loaded artifacts will
> +remain in on-card DDR, but the host will need to re-activate the workload if
> +it desires to recover the workload.
> +
> +Reliability, Accessability, Serviceability (RAS)

Accessability -> Accessibility

> +================================================
> +
> +AIC100 is expected to be deployed in server systems where RAS ideology is
> +applied.  Simply put, RAS is the concept of detecting, classifying, and
> +reporting errors.  While PCIe has AER (Advanced Error Reporting) which factors
> +into RAS, AER does not allow for a device to report details about internal
> +errors.  Therefore, AIC100 implements a custom RAS mechanism.  When a RAS event
> +occurs, QSM will report the event with appropriate details via the QAIC_STATUS
> +MHI channel.  A sysadmin may determine that a particular device needs
> +additional service based on RAS reports.
> +
> +Telemetry
> +=========
> +
> +QSM has the ability to report various physical attributes of the device, and in
> +some cases, to allow the host to control them.  Examples include thermal limits,
> +thermal readings, and power readings.  These items are communicated via the
> +QAIC_TELEMETRY MHI channel
> diff --git a/Documentation/accel/qaic/index.rst b/Documentation/accel/qaic/index.rst
> new file mode 100644
> index 0000000..ad19b88
> --- /dev/null
> +++ b/Documentation/accel/qaic/index.rst
> @@ -0,0 +1,13 @@
> +.. SPDX-License-Identifier: GPL-2.0-only
> +
> +=====================================
> + accel/qaic Qualcomm Cloud AI driver
> +=====================================
> +
> +The accel/qaic driver supports the Qualcomm Cloud AI machine learning
> +accelerator cards.
> +
> +.. toctree::
> +
> +   qaic
> +   aic100
> diff --git a/Documentation/accel/qaic/qaic.rst b/Documentation/accel/qaic/qaic.rst
> new file mode 100644
> index 0000000..b0e7a5f
> --- /dev/null
> +++ b/Documentation/accel/qaic/qaic.rst
> @@ -0,0 +1,169 @@
> +.. SPDX-License-Identifier: GPL-2.0-only
> +
> +=============
> + QAIC driver
> +=============
> +
> +The QAIC driver is the Kernel Mode Driver (KMD) for the AIC100 family of AI
> +accelerator products.
> +
> +Interrupts
> +==========
> +
> +While the AIC100 DMA Bridge hardware implements an IRQ storm mitigation
> +mechanism, it is still possible for an IRQ storm to occur.  A storm can happen
> +if the workload is particularly quick, and the host is responsive.  If the host
> +can drain the response FIFO as quickly as the device can insert elements into
> +it, then the device will frequently transition the response FIFO from empty to
> +non-empty and generate MSIs at a rate equilivelent to the speed of the

equilivelent -> equivalent

> +workload's ability to process inputs.  The lprnet (license plate reader network)
> +workload is known to trigger this condition, and can generate in excess of 100k
> +MSIs per second.  It has been observed that most systems cannot tolerate this
> +for long, and will crash due to some form of watchdog due to the overhead of
> +the interrupt controller interrupting the host CPU.
> +
> +To mitigate this issue, the QAIC driver implements specific IRQ handling.  When
> +QAIC receives an IRQ, it disables that line.  This prevents the interrupt
> +controller from interrupting the CPU.  Then AIC drains the FIFO.  Once the FIFO
> +is drained, QAIC implements a "last chance" polling algorithm where QAIC will
> +sleep for a time to see if the workload will generate more activity.  The IRQ
> +line remains disabled during this time.  If no activity is detected, QAIC exits
> +polling mode and reenables the IRQ line.
> +
> +This mitigation in QAIC is very effective.  The same lprnet usecase that
> +generates 100k IRQs per second (per /proc/interrupts) is reduced to roughly 64
> +IRQs over 5 minutes while keeping the host system stable, and having the same
> +workload throughput performance (within run to run noise variation).
> +
> +
> +Neural Network Control (NNC) Protocol
> +=====================================
> +
> +The implementation of NNC is split between the KMD (QAIC) and UMD.  In general
> +QAIC understands how to encode/decode NNC wire protocol, and elements of the
> +protocol which require kernelspace knowledge to process (for example, mapping

kernelspace is missing a space :P

> +host memory to device IOVAs).  QAIC understands the structure of a message, and
> +all of the transactions.  QAIC does not understand commands (the payload of a
> +passthrough transaction).
> +
> +QAIC handles and enforces the required little endianness and 64-bit alignment,
> +to the degree that it can.  Since QAIC does not know the contents of a
> +passthrough transaction, it relies on the UMD to saitsfy the requirements.

saitsfy -> satisfy

> +The terminate transaction is of particular use to QAIC.  QAIC is not aware of
> +the resources that are loaded onto a device since the majority of that activity
> +occurs within NNC commands.  As a result, QAIC does not have the means to
> +roll back userspace activity.  To ensure that a userspace client's resources
> +are fully released in the case of a process crash, or a bug, QAIC uses the
> +terminate command to let QSM know when a user has gone away, and the resources
> +can be released.
> +
> +QSM can report a version number of the NNC protocol it supports.  This is in the
> +form of a Major number and a Minor number.
> +
> +Major number updates indicate changes to the NNC protocol which impact the
> +message format, or transactions (impacts QAIC).
> +
> +Minor number updates indicate changes to the NNC protocol which impact the
> +commands (does not impact QAIC).
> +
> +uAPI
> +====
> +
> +QAIC defines a number of driver specific IOCTLs as part of the userspace API.
> +This section describes those APIs.
> +
> +DRM_IOCTL_QAIC_MANAGE:
> +This IOCTL allows userspace to send a NNC request to the QSM.  The call will
> +block until a response is received, or the request has timed out.
> +
> +DRM_IOCTL_QAIC_CREATE_BO:
> +This IOCTL allows userspace to allocate a buffer object (BO) which can send or
> +receive data from a workload.  The call will return a GEM handle that
> +represents the allocated buffer.  The BO is not usable until it has been sliced
> +(see DRM_IOCTL_QAIC_ATTACH_SLICE_BO).
> +
> +DRM_IOCTL_QAIC_MMAP_BO:
> +This IOCTL allows userspace to prepare an allocated BO to be mmap'd into the
> +userspace process.
> +
> +DRM_IOCTL_QAIC_ATTACH_SLICE_BO:
> +This IOCTL allows userspace to slice a BO in preparation for sending the BO to
> +the device.  Slicing is the operation of describing what portions of a BO get
> +sent where to a workload.  This requires a set of DMA transfers for the DMA
> +Bridge, and as such, locks the BO to a specific DBC.
> +
> +DRM_IOCTL_QAIC_EXECUTE_BO:
> +This IOCTL allows userspace to submit a set of sliced BOs to the device.  The
> +call is non-blocking.  Success only indicates that the BOs have been queued
> +to the device, but does not guarantee they have been executed.
> +
> +DRM_IOCTL_QAIC_PARTIAL_EXECUTE_BO:
> +This IOCTL operates like DRM_IOCTL_QAIC_EXECUTE_BO, but it allows userspace to
> +shrink the BOs sent to the device for this specific call.  If a BO typically has
> +N inputs, but only a subset of those is available, this IOCTL allows userspace
> +to indicate that only the first M bytes of the BO should be sent to the device
> +to minimize data transfer overhead.  This IOCTL dynamically recomputes the
> +slicing, and therefore has some processing overhead before the BOs can be queued
> +to the device.
> +
> +DRM_IOCTL_QAIC_WAIT_BO:
> +This IOCTL allows userspace to determine when a particular BO has been processed
> +by the device.  The call will block until either the BO has been processed and
> +can be re-queued to the device, or a timeout occurs.
> +
> +DRM_IOCTL_QAIC_PERF_STATS_BO:
> +This IOCTL allows userspace to collect performance statistics on the most
> +recent execution of a BO.  This allows userspace to construct an end to end
> +timeline of the BO processing for a performance analysis.
> +
> +DRM_IOCTL_QAIC_PART_DEV:
> +This IOCTL allows userspace to request a duplicate "shadow device".  This extra
> +accelN device is associated with a specific partition of resources on the AIC100
> +device and can be used for limiting a process to some subset of resources.
> +
> +Userspace Client Isolation
> +==========================
> +
> +AIC100 supports multiple clients.  Multiple DBCs can be consumed by a single
> +client, and multiple clients can each consume one or more DBCs.  Workloads
> +may contain sensistive information therefore only the client that owns the

sensistive -> sensitive

> +workload should be allowed to interface with the DBC.
> +
> +Clients are identified by the instance associated with their open().  A client
> +may only use memory they allocate, and DBCs that are assigned to their
> +workloads.  Attempts to access resources assigned to other clients will be
> +rejected.
> +
> +Module parameters
> +=================
> +
> +QAIC supports the following module parameters:
> +
> +**datapath_polling (bool)**
> +
> +Configures QAIC to use a polling thread for datapath events instead of relying
> +on the device interrupts.  Useful for platforms with broken multiMSI.  Must be
> +set at QAIC driver initialization.  Default is 0 (off).
> +
> +**mhi_timeout (int)**
> +
> +Sets the timeout value for MHI operations in milliseconds (ms).  Must be set
> +at the time the driver detects a device.  Default is 2000 (2 seconds).
> +
> +**control_resp_timeout (int)**
> +
> +Sets the timeout value for QSM responses to NNC messages in seconds (s).  Must
> +be set at the time the driver is sending a request to QSM.  Default is 60 (one
> +minute).
> +
> +**wait_exec_default_timeout (int)**
> +
> +Sets the default timeout for the wait_exec ioctl in milliseconds (ms).  Must be
> +set prior to the waic_exec ioctl call.  A value specified in the ioctl call
> +overrides this for that call.  Default is 5000 (5 seconds).
> +
> +**datapath_poll_interval_us (int)**
> +
> +Sets the polling interval in microseconds (us) when datapath polling is active.
> +Takes effect at the next polling interval.  Default is 100 (100 us).
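
For the runtime-adjustable ones like datapath_poll_interval_us, I assume this
ends up writable via sysfs - something like the sketch below (only valid if
the parameter is declared with a writable permission; the path is just the
standard module-parameter location, nothing qaic-specific):

#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Write a new polling interval, e.g. set_poll_interval_us("50"). */
int set_poll_interval_us(const char *val)
{
	int fd = open("/sys/module/qaic/parameters/datapath_poll_interval_us",
		      O_WRONLY);
	ssize_t ret;

	if (fd < 0)
		return -1;
	ret = write(fd, val, strlen(val));
	close(fd);
	return ret < 0 ? -1 : 0;
}
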

Cool that you are starting with the documentation :)
I suggest running at least "checkpatch.pl --codespell" on the series as there are many spelling issues.

Regards,
Jacek




^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 1/8] accel/qaic: Add documentation for AIC100 accelerator driver
  2023-02-14 11:08   ` Jacek Lawrynowicz
@ 2023-02-15 15:41       ` Jeffrey Hugo
  0 siblings, 0 replies; 68+ messages in thread
From: Jeffrey Hugo @ 2023-02-15 15:41 UTC (permalink / raw)
  To: Jacek Lawrynowicz, ogabbay, airlied, daniel, dri-devel
  Cc: linux-doc, linux-arm-msm, quic_ajitpals, quic_pkanojiy,
	stanislaw.gruszka, quic_carlv

On 2/14/2023 4:08 AM, Jacek Lawrynowicz wrote:
> Hi,

Thank you for the review.

> On 06.02.2023 16:41, Jeffrey Hugo wrote:
>> The Qualcomm Cloud AI 100 (AIC100) device is an Artificial Intelligence
>> accelerator PCIe card.  It contains a number of components both in the
>> SoC and on the card which facilitate running workloads:
>>
>> QSM: management processor
>> NSPs: workload compute units
>> DMA Bridge: dedicated data mover for the workloads
>> MHI: multiplexed communication channels
>> DDR: workload storage and memory
>>
>> The Linux kernel driver for AIC100 is called "QAIC" and is located in the
>> accel subsystem.
>>
>> Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
>> Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com>
>> ---
>>   Documentation/accel/index.rst       |   1 +
>>   Documentation/accel/qaic/aic100.rst | 498 ++++++++++++++++++++++++++++++++++++
>>   Documentation/accel/qaic/index.rst  |  13 +
>>   Documentation/accel/qaic/qaic.rst   | 169 ++++++++++++
>>   4 files changed, 681 insertions(+)
>>   create mode 100644 Documentation/accel/qaic/aic100.rst
>>   create mode 100644 Documentation/accel/qaic/index.rst
>>   create mode 100644 Documentation/accel/qaic/qaic.rst
>>
>> diff --git a/Documentation/accel/index.rst b/Documentation/accel/index.rst
>> index 2b43c9a..e94a016 100644
>> --- a/Documentation/accel/index.rst
>> +++ b/Documentation/accel/index.rst
>> @@ -8,6 +8,7 @@ Compute Accelerators
>>      :maxdepth: 1
>>   
>>      introduction
>> +   qaic/index
>>   
>>   .. only::  subproject and html
>>   
>> diff --git a/Documentation/accel/qaic/aic100.rst b/Documentation/accel/qaic/aic100.rst
>> new file mode 100644
>> index 0000000..773aa54
>> --- /dev/null
>> +++ b/Documentation/accel/qaic/aic100.rst
>> @@ -0,0 +1,498 @@
>> +.. SPDX-License-Identifier: GPL-2.0-only
>> +
>> +===============================
>> + Qualcomm Cloud AI 100 (AIC100)
>> +===============================
>> +
>> +Overview
>> +========
>> +
>> +The Qualcomm Cloud AI 100/AIC100 family of products (including SA9000P - part of
>> +Snapdragon Ride) are PCIe adapter cards which contain a dedicated SoC ASIC for
>> +the purpose of efficiently running Artificial Intelligence (AI) Deep Learning
>> +inference workloads.  They are AI accelerators.
> 
> There are multiple double spaces in this document like this one above.

I presume you are referring to the double space after a period.
Universally, that was the recommended style (APA guidebook, etc) until a
little while ago.  Old habits are hard to break.  Will scrub.

> 
>> +The PCIe interface of AIC100 is capable of PCIe Gen4 speeds over eight lanes
>> +(x8).  An individual SoC on a card can have up to 16 NSPs for running workloads.
>> +Each SoC has an A53 management CPU.  On card, there can be up to 32 GB of DDR.
>> +
>> +Multiple AIC100 cards can be hosted in a single system to scale overall
>> +performance.
>> +
>> +Hardware Description
>> +====================
>> +
>> +An AIC100 card consists of an AIC100 SoC, on-card DDR, and a set of misc
>> +peripherals (PMICs, etc).
>> +
>> +An AIC100 card can either be a PCIe HHHL form factor (a traditional PCIe card),
>> +or a Dual M.2 card.  Both use PCIe to connect to the host system.
> 
> Dual M.2 card? Is it a single PCB with two M.2 connectors? This requires custom
> motherboard with x4 lanes from two connectors combined as a single PCIe device, right?

Yes.  There is a specification for this, although it hasn't gotten 
widespread adoption.  In addition to more lanes, you also get to draw 
more power.  Single M.2 is around 11W.  Dual M.2 is capped at 25W.

It tends to be a handy form factor for "edge" applications where the 
physical size and power draw of a "normal" PCIe slot (what you'd find on 
a regular ATX motherboard) are not desirable.

> 
>> +As a PCIe endpoint/adapter, AIC100 uses the standard VendorID(VID)/
>> +DeviceID(DID) combination to uniquely identify itself to the host.  AIC100
>> +uses the standard Qualcomm VID (0x17cb).  All AIC100 instances use the same
>> +AIC100 DID (0xa100).
> 
> Maybe "SKUs" would fit better here then "instances".

Sure.

> 
>> +AIC100 does not implement FLR (function level reset).
>> +
>> +AIC100 implements MSI but does not implement MSI-X.  AIC100 requires 17 MSIs to
>> +operate (1 for MHI, 16 for the DMA Bridge).
>> +
>> +As a PCIe device, AIC100 utilizes BARs to provide host interfaces to the device
>> +hardware.  AIC100 provides 3, 64-bit BARs.
>> +
>> +* The first BAR is 4K in size, and exposes the MHI interface to the host.
>> +
>> +* The second BAR is 2M in size, and exposes the DMA Bridge interface to the
>> +  host.
>> +
>> +* The third BAR is variable in size based on an individual AIC100's
>> +  configuration, but defaults to 64K.  This BAR currently has no purpose.
>> +
>> +From the host perspective, AIC100 has several key hardware components-
> 
> Typo in "components-".

?
You want "components -"?

> 
>> +* QSM (QAIC Service Manager)
>> +* NSPs (Neural Signal Processor)
>> +* DMA Bridge
>> +* DDR
>> +* MHI (Modem Host Interface)
>> +
>> +QSM
>> +---
>> +
>> +QAIC Service Manager.  This is an ARM A53 CPU that runs the primary
>> +firmware of the card and performs on-card management tasks.  It also
>> +communicates with the host via MHI.  Each AIC100 has one of
>> +these.
> 
> I would put description of MHI at the top because it is referenced by the QSM description.

Sure.

> 
>> +NSP
>> +---
>> +
>> +Neural Signal Processor.  Each AIC100 has up to 16 of these.  These are
>> +the processors that run the workloads on AIC100.  Each NSP is a Qualcomm Hexagon
>> +(Q6) DSP with HVX and HMX.  Each NSP can only run one workload at a time, but
>> +multiple NSPs may be assigned to a single workload.  Since each NSP can only run
>> +one workload, AIC100 is limited to 16 concurrent workloads.  Workload
>> +"scheduling" is under the purview of the host.  AIC100 does not automatically
>> +timeslice.
>> +
>> +DMA Bridge
>> +----------
>> +
>> +The DMA Bridge is custom DMA engine that manages the flow of data
>> +in and out of workloads.  AIC100 has one of these.  The DMA Bridge has 16
>> +channels, each consisting of a set of request/response FIFOs.  Each active
>> +workload is assigned a single DMA Bridge channel.  The DMA Bridge exposes
>> +hardware registers to manage the FIFOs (head/tail pointers), but requires host
>> +memory to store the FIFOs.
>> +
>> +DDR
>> +---
>> +
>> +AIC100 has on-card DDR.  In total, an AIC100 can have up to 32 GB of DDR.
>> +This DDR is used to store workloads, data for the workloads, and is used by the
>> +QSM for managing the device.  NSPs are granted access to sections of the DDR by
>> +the QSM.  The host does not have direct access to the DDR, and must make
>> +requests to the QSM to transfer data to the DDR.
>> +
>> +MHI
>> +---
>> +
>> +AIC100 has one MHI interface over PCIe.  MHI itself is documented at
> 
> Please exand MHI acronym.

It's expanded about 40 lines up - "* MHI (Modem Host Interface)".  I
generally go by the scheme of expanding an acronym the first time it is
used in a document, and then just using the acronym thereafter.

Do you feel the expansion needs to be duplicated?  It might help when 
this section is moved to the top.

> 
>> +Documentation/mhi/index.rst  MHI is the mechanism the host uses to communicate
>> +with the QSM.  Except for workload data via the DMA Bridge, all interaction with
>> +he device occurs via MHI.
> 
> Typo in "he device".

Doh.  Will fix.

> 
>> +High-level Use Flow
>> +===================
>> +
>> +AIC100 is a programmable accelerator typically used for running
>> +neural networks in inferencing mode to efficiently perform AI operations.
>> +AIC100 is not intended for training neural networks.  AIC100 can be utilitized
> 
> utilitized -> utilized

Sure

> 
>> +for generic compute workloads.
>> +
>> +Assuming a user wants to utilize AIC100, they would follow these steps:
>> +
>> +1. Compile the workload into an ELF targeting the NSP(s)
>> +2. Make requests to the QSM to load the workload and related artifacts into the
>> +   device DDR
>> +3. Make a request to the QSM to activate the workload onto a set of idle NSPs
>> +4. Make requests to the DMA Bridge to send input data to the workload to be
>> +   processed, and other requests to receive processed output data from the
>> +   workload.
>> +5. Once the workload is no longer required, make a request to the QSM to
>> +   deactivate the workload, thus putting the NSPs back into an idle state.
>> +6. Once the workload and related artifacts are no longer needed for future
>> +   sessions, make requests to the QSM to unload the data from DDR.  This frees
>> +   the DDR to be used by other users.
>> +
> 
> Please specify if this is single or multi user device.

It is multi-user.  I will find a way to clarify that.

> 
>> +Boot Flow
>> +=========
>> +
>> +AIC100 uses a flashless boot flow, derived from Qualcomm MSMs.
> 
> What's MSM?

"Mobile Station Modem".  It is is Qualcomm term from the 80s, and used 
to describe the "family" Qualcomm phone SoCs.  It used to be that the 
Model number would be "MSM8660" or "MSM8960", etc.  That has changed a 
bit in the past few years, but the products are still referred to "MSMs".

It is a common term in the Qualcomm world, and "MSM" is more well known 
these days than what it stands for.  I don't think expanding it is going 
to add value.

> 
>> +When AIC100 is first powered on, it begins executing PBL (Primary Bootloader)
>> +from ROM.  PBL enumerates the PCIe link, and initializes the BHI (Boot Host
>> +Interface) component of MHI.
>> +
>> +Using BHI, the host points PBL to the location of the SBL (Secondary Bootloader)
>> +image.  The PBL pulls the image from the host, validates it, and begins
>> +execution of SBL.
>> +
>> +SBL initializes MHI, and uses MHI to notify the host that the device has entered
>> +the SBL stage.  SBL performs a number of operations:
>> +
>> +* SBL initializes the majority of hardware (anything PBL left uninitialized),
>> +  including DDR.
>> +* SBL offloads the bootlog to the host.
>> +* SBL synchonizes timestamps with the host for future logging.
> 
> synchonizes -> synchronizes

Yep

> 
>> +* SBL uses the Sahara protocol to obtain the runtime firmware images from the
>> +  host.
>> +
>> +Once SBL has obtained and validated the runtime firmware, it brings the NSPs out
>> +of reset, and jumps into the QSM.
>> +
>> +The QSM uses MHI to notify the host that the device has entered the QSM stage
>> +(AMSS in MHI terms).  At this point, the AIC100 device is fully functional, and
>> +ready to process workloads.
>> +
>> +Userspace components
>> +====================
>> +
>> +Compiler
>> +--------
>> +
>> +An open compiler for AIC100 based on upstream LLVM can be found at:
>> +https://github.com/quic/software-kit-for-qualcomm-cloud-ai-100-cc
>> +
>> +Usermode Driver (UMD)
>> +---------------------
>> +
>> +An open UMD that interfaces with the qaic kernel driver can be found at:
>> +https://github.com/quic/software-kit-for-qualcomm-cloud-ai-100
> 
> This repo is empty.

Correct.  That was mentioned in the cover letter.  Targeting to post 
content this week.  Just working through the last steps of our internal 
process.

> 
>> +
>> +Sahara loader
>> +-------------
>> +
>> +An open implementation of the Sahara protocol called kickstart can be found at:
>> +https://github.com/andersson/qdl
>> +
>> +MHI Channels
>> +============
>> +
>> +AIC100 defines a number of MHI channels for different purposes.  This is a list
>> +of the defined channels, and their uses.
>> +
>> +| QAIC_LOOPBACK
>> +| Channels 0/1
> 
> A would use comma or & here.

Interesting.  I can see that.  I think I like "&".  Will do that.

> 
>> +| Valid for AMSS
>> +| Any data sent to the device on this channel is sent back to the host.
>> +
>> +| QAIC_SAHARA
>> +| Channels 2/3
>> +| Valid for SBL
>> +| Used by SBL to obtain the runtime firmware from the host.
>> +
>> +| QAIC_DIAG
>> +| Channels 4/5
>> +| Valid for AMSS
>> +| Used to communicate with QSM via the Diag protocol.
>> +
>> +| QAIC_SSR
>> +| Channels 6/7
>> +| Valid for AMSS
>> +| Used to notify the host of subsystem restart events, and to offload SSR crashdumps.
>> +
>> +| QAIC_QDSS
>> +| Channels 8/9
>> +| Valid for AMSS
>> +| Used for the Qualcomm Debug Subsystem.
>> +
>> +| QAIC_CONTROL
>> +| Channels 10/11
>> +| Valid for AMSS
>> +| Used for the Neural Network Control (NNC) protocol.  This is the primary channel between host and QSM for managing workloads.
>> +
>> +| QAIC_LOGGING
>> +| Channels 12/13
>> +| Valid for SBL
>> +| Used by the SBL to send the bootlog to the host.
>> +
>> +| QAIC_STATUS
>> +| Channels 14/15
>> +| Valid for AMSS
>> +| Used to notify the host of Reliability, Accessability, Serviceability (RAS) events.
> 
> Accessability -> Accessibility

Got it.

> 
>> +| QAIC_TELEMETRY
>> +| Channels 16/17
>> +| Valid for AMSS
>> +| Used to get/set power/thermal/etc attributes.
>> +
>> +| QAIC_DEBUG
>> +| Channels 18/19
>> +| Valid for AMSS
>> +| Not used.
>> +
>> +| QAIC_TIMESYNC
>> +| Channels 20/21
>> +| Valid for SBL/AMSS
>> +| Used to synchronize timestamps in the device side logs with the host time source.
>> +
>> +DMA Bridge
>> +==========
>> +
>> +Overview
>> +--------
>> +
>> +The DMA Bridge is one of the main interfaces to the host from the device
>> +(the other being MHI).  As part of activating a workload to run on NSPs, the QSM
>> +assigns that network a DMA Bridge channel.  A workload's DMA Bridge channel
>> +(DBC for short) is solely for the use of that workload and is not shared with
>> +other workloads.
>> +
>> +Each DBC is a pair of FIFOs that manage data in and out of the workload.  One
>> +FIFO is the request FIFO.  The other FIFO is the response FIFO.
>> +
>> +Each DBC contains 4 registers in hardware:
>> +
>> +* Request FIFO head pointer (offset 0x0).  Read only to the host.  Indicates the
> 
> Read only _by_ the host.

Sure.

> 
>> +  latest item in the FIFO the device has consumed.
>> +* Request FIFO tail pointer (offset 0x4).  Read/write by the host.  Host
>> +  increments this register to add new items to the FIFO.
>> +* Response FIFO head pointer (offset 0x8).  Read/write by the host.  Indicates
>> +  the latest item in the FIFO the host has consumed.
>> +* Response FIFO tail pointer (offset 0xc).  Read only to the host.  Device
> 
> Read only _by_ the host.

Sure.

> 
>> +  increments this register to add new items to the FIFO.
>> +
>> +The values in each register are indexes in the FIFO.  To get the location of the
>> +FIFO element pointed to by the register: FIFO base address + register * element
>> +size.
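
As a concrete reading of that formula (the helper name here is made up for
the example):

#include <stdint.h>

/* element address = FIFO base + index * element size, per the text above */
static inline uint64_t fifo_elem_addr(uint64_t fifo_base, uint32_t index,
				      uint32_t elem_size)
{
	return fifo_base + (uint64_t)index * elem_size;
}
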
>> +
>> +DBC registers are exposed to the host via the second BAR.  Each DBC consumes
>> +0x1000 of space in the BAR.
> 
> I wouldn't use hex for the sizes. 4KB seems a lot more readable.

Good point.  Will do.

> 
>> +The actual FIFOs are backed by host memory.  When sending a request to the QSM
>> +to activate a network, the host must donate memory to be used for the FIFOs.
>> +Due to internal mapping limitations of the device, a single contigious chunk of
> 
> contigious -> contiguous

Got it.

> 
>> +memory must be provided per DBC, which hosts both FIFOs.  The request FIFO will
>> +consume the beginning of the memory chunk, and the response FIFO will consume
>> +the end of the memory chunk.
>> +
>> +Request FIFO
>> +------------
>> +
>> +A request FIFO element has the following structure:
>> +
>> +| {
>> +|	u16 req_id;
>> +|	u8  seq_id;
>> +|	u8  pcie_dma_cmd;
>> +|	u32 reserved;
>> +|	u64 pcie_dma_source_addr;
>> +|	u64 pcie_dma_dest_addr;
>> +|	u32 pcie_dma_len;
>> +|	u32 reserved;
>> +|	u64 doorbell_addr;
>> +|	u8  doorbell_attr;
>> +|	u8  reserved;
>> +|	u16 reserved;
>> +|	u32 doorbell_data;
>> +|	u32 sem_cmd0;
>> +|	u32 sem_cmd1;
>> +|	u32 sem_cmd2;
>> +|	u32 sem_cmd3;
>> +| }
>> +
>> +Request field descriptions:
>> +
>> +| req_id- request ID.  A request FIFO element and a response FIFO element with
>> +|         the same request ID refer to the same command.
>> +
>> +| seq_id- sequence ID within a request.  Ignored by the DMA Bridge.
>> +
>> +| pcie_dma_cmd- describes the DMA element of this request.
>> +| 	Bit(7) is the force msi flag, which overrides the DMA Bridge MSI logic
>> +| 		and generates a MSI when this request is complete, and QSM
>> +| 		configures the DMA Bridge to look at this bit.
>> +| 	Bits(6:5) are reserved.
>> +| 	Bit(4) is the completion code flag, and indicates that the DMA Bridge
>> +| 		shall generate a response FIFO element when this request is
>> +| 		complete.
>> +| 	Bit(3) indicates if this request is a linked list transfer(0) or a bulk
>> +| 		transfer(1).
>> +| 	Bit(2) is reserved.
>> +| 	Bits(1:0) indicate the type of transfer.  No transfer(0), to device(1),
>> +| 		from device(2).  Value 3 is illegal.
>> +
>> +| pcie_dma_source_addr- source address for a bulk transfer, or the address of
>> +|         the linked list.
>> +
>> +| pcie_dma_dest_addr- destination address for a bulk transfer.
>> +
>> +| pcie_dma_len- length of the bulk transfer.  Note that the size of this field
>> +| 	limits transfers to 4G in size.
>> +
>> +| doorbell_addr- address of the doorbell to ring when this request is complete.
>> +
>> +| doorbell_attr- doorbell attributes.
>> +| 	Bit(7) indicates if a write to a doorbell is to occur.
>> +| 	Bits(6:2) are reserved.
>> +| 	Bits(1:0) contain the encoding of the doorbell length.  0 is 32-bit,
>> +| 		1 is 16-bit, 2 is 8-bit, 3 is reserved.  The doorbell address
>> +| 		must be naturally aligned to the specified length.
>> +
>> +| doorbell_data- data to write to the doorbell.  Only the bits corresponding to
>> +| 	the doorbell length are valid.
>> +
>> +| sem_cmdN- semaphore command.
>> +| 	Bit(31) indicates this semaphore command is enabled.
>> +| 	Bit(30) is the to-device DMA fence.  Block this request until all
>> +| 		to-device DMA transfers are complete.
>> +| 	Bit(29) is the from-device DMA fence.  Block this request until all
>> +| 		from-device DMA transfers are complete.
>> +| 	Bits(28:27) are reserved.
>> +| 	Bits(26:24) are the semaphore command.  0 is NOP.  1 is init with the
>> +| 		specified value.  2 is increment.  3 is decrement.  4 is wait
>> +| 		until the semaphore is equal to the specified value.  5 is wait
>> +| 		until the semaphore is greater or equal to the specified value.
>> +| 		6 is "P", wait until semaphore is greater than 0, then
>> +| 		decrement by 1.  7 is reserved.
>> +| 	Bit(23) is reserved.
>> +| 	Bit(22) is the semaphore sync.  0 is post sync, which means that the
>> +| 		semaphore operation is done after the DMA transfer.  1 is
>> +| 		presync, which gates the DMA transfer.  Only one presync is
>> +| 		allowed per request.
>> +| 	Bit(21) is reserved.
>> +| 	Bits(20:16) is the index of the semaphore to operate on.
>> +| 	Bits(15:12) are reserved.
>> +| 	Bits(11:0) are the semaphore value to use in operations.
> 
> It seems to me like structure documentation

Yes.  It can be modeled that way.  However the code comes later so it 
can't be referenced here yet.  I've got a todo to come back and clean 
that up once this series is merged.
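
In the meantime, purely to illustrate the encoding described above (the mask
and helper names below are invented for the example and are not from the
driver):

#include <stdint.h>

#define DMA_CMD_FORCE_MSI	(1u << 7)	/* bit 7 */
#define DMA_CMD_GEN_COMPLETION	(1u << 4)	/* bit 4 */
#define DMA_CMD_BULK		(1u << 3)	/* 0 = linked list */
#define DMA_CMD_TO_DEV		1u		/* bits 1:0 */
#define DMA_CMD_FROM_DEV	2u

/* Build a sem_cmdN word: enabled, with the given command (0 NOP .. 6 "P"),
 * presync flag (bit 22), semaphore index (bits 20:16) and value (bits 11:0).
 */
static inline uint32_t sem_cmd(uint32_t cmd, uint32_t presync,
			       uint32_t index, uint32_t value)
{
	return (1u << 31) |
	       ((cmd & 0x7) << 24) |
	       ((presync & 0x1) << 22) |
	       ((index & 0x1f) << 16) |
	       (value & 0xfff);
}
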

> 
>> +Overall, a request is processed in 4 steps:
>> +
>> +1. If specified, the presync semaphore condition must be true
>> +2. If enabled, the DMA transfer occurs
>> +3. If specified, the postsync semaphore conditions must be true
>> +4. If enabled, the doorbell is written
>> +
>> +By using the semaphores in conjunction with the workload running on the NSPs,
>> +the data pipeline can be synchronized such that the host can queue multiple
>> +requests of data for the workload to process, but the DMA Bridge will only copy
>> +the data into the memory of the workload when the workload is ready to process
>> +the next input.
>> +
>> +Response FIFO
>> +-------------
>> +
>> +Once a request is fully processed, a response FIFO element is generated if
>> +specified in pcie_dma_cmd.  The structure of a response FIFO element:
>> +
>> +| {
>> +| 	u16 req_id;
>> +| 	u16 completion_code;
>> +| }
>> +
>> +req_id- matches the req_id of the request that generated this element.
>> +
>> +completion_code- status of this request.  0 is success.  non-zero is an error.
>> +
>> +The DMA Bridge will generate a MSI to the host as a reaction to activity in the
>> +response FIFO of a DBC.  The DMA Bridge hardware has an IRQ storm mitigation
>> +algorithm, where it will only generate a MSI when the response FIFO transitions
>> +from empty to non-empty (unless force MSI is enabled and triggered).  In
>> +response to this MSI, the host is expected to drain the response FIFO, and must
>> +take care to handle any race conditions between draining the FIFO, and the
>> +device inserting elements into the FIFO.
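
A host-side drain loop might look roughly like the sketch below - assuming,
per the register descriptions above, that head is the last element the host
consumed and tail the last element the device wrote, so the FIFO is empty
when they are equal (read_tail/write_head/handle_resp are placeholders, not
driver functions):

#include <stdint.h>

struct resp_elem {
	uint16_t req_id;
	uint16_t completion_code;	/* 0 == success */
};

extern uint32_t read_tail(void);
extern void write_head(uint32_t head);
extern void handle_resp(volatile struct resp_elem *elem);

static void drain_response_fifo(volatile struct resp_elem *fifo,
				uint32_t nelem, uint32_t *head)
{
	uint32_t tail = read_tail();

	while (*head != tail) {
		*head = (*head + 1) % nelem;
		handle_resp(&fifo[*head]);
		write_head(*head);
		tail = read_tail();	/* pick up anything added meanwhile */
	}
}
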
>> +
>> +Neural Network Control (NNC) Protocol
>> +=====================================
>> +
>> +The NNC protocol is how the host makes requests to the QSM to manage workloads.
>> +It uses the QAIC_CONTROL MHI channel.
>> +
>> +Each NNC request is packaged into a message.  Each message is a series of
>> +transactions.  A passthrough type transaction can contain elements known as
>> +commands.
>> +
>> +QSM requires NNC messages be little endian encoded and the fields be naturally
>> +aligned.  Since there are 64-bit elements in some NNC messages, 64-bit alignment
>> +must be maintained.
>> +
>> +A message contains a header and then a series of transactions.  A message may be
>> +at most 4K in size from QSM to the host.  From the host to the QSM, a message
>> +can be at most 64K (maximum size of a single MHI packet), but there is a
>> +continuation feature where message N+1 can be marked as a continuation of
>> +message N.  This is used for exceedingly large DMA xfer transactions.
>> +
>> +Transaction descriptions:
>> +
>> +passthrough- Allows userspace to send an opaque payload directly to the QSM.
>> +This is used for NNC commands.  Userspace is responsible for managing
>> +the QSM message requirements in the payload
>> +
>> +dma_xfer- DMA transfer.  Describes an object that the QSM should DMA into the
>> +device via address and size tuples.
>> +
>> +activate- Activate a workload onto NSPs.  The host must provide memory to be
>> +used by the DBC.
>> +
>> +deactivate- Deactivate an active workload and return the NSPs to idle.
>> +
>> +status- Query the QSM about it's NNC implementation.  Returns the NNC version,
>> +and if CRC is used.
>> +
>> +terminate- Release a user's resources.
>> +
>> +dma_xfer_cont- Continuation of a previous DMA transfer.  If a DMA transfer
>> +cannot be specified in a single message (highly fragmented), this
>> +transaction can be used to specify more ranges.
>> +
>> +validate_partition- Query to QSM to determine if a partition identifier is
>> +valid.
>> +
>> +Each message is tagged with a user id, and a partition id.  The user id allows
>> +QSM to track resources, and release them when the user goes away (eg the process
>> +crashes).  A partition id identifies the resource partition that QSM manages,
>> +which this message applies to.
>> +
>> +Messages may have CRCs.  Messages should have CRCs applied until the QSM
>> +reports via the status transaction that CRCs are not needed.  The QSM on the
>> +SA9000P requires CRCs for black channel safing.
>> +
>> +Subsystem Restart (SSR)
>> +=======================
>> +
>> +SSR is the concept of limiting the impact of an error.  An AIC100 device may
>> +have multiple users, each with their own workload running.  If the workload of
>> +one user crashes, the fallout of that should be limited to that workload and not
>> +impact other workloads.  SSR accomplishes this.
>> +
>> +If a particular workload crashes, QSM notifies the host via the QAIC_SSR MHI
>> +channel.  This notification identifies the workload by it's assigned DBC.  A
>> +multi-stage recovery process is then used to cleanup both sides, and get the
>> +DBC/NSPs into a working state.
>> +
>> +When SSR occurs, any state in the workload is lost.  Any inputs that were in
>> +process, or queued by not yet serviced, are lost.  The loaded artifacts will
>> +remain in on-card DDR, but the host will need to re-activate the workload if
>> +it desires to recover the workload.
>> +
>> +Reliability, Accessability, Serviceability (RAS)
> 
> Accessability -> Accessibility

Got it.

> 
>> +================================================
>> +
>> +AIC100 is expected to be deployed in server systems where RAS ideology is
>> +applied.  Simply put, RAS is the concept of detecting, classifying, and
>> +reporting errors.  While PCIe has AER (Advanced Error Reporting) which factors
>> +into RAS, AER does not allow for a device to report details about internal
>> +errors.  Therefore, AIC100 implements a custom RAS mechanism.  When a RAS event
>> +occurs, QSM will report the event with appropriate details via the QAIC_STATUS
>> +MHI channel.  A sysadmin may determine that a particular device needs
>> +additional service based on RAS reports.
>> +
>> +Telemetry
>> +=========
>> +
>> +QSM has the ability to report various physical attributes of the device, and in
>> +some cases, to allow the host to control them.  Examples include thermal limits,
>> +thermal readings, and power readings.  These items are communicated via the
>> +QAIC_TELEMETRY MHI channel
>> diff --git a/Documentation/accel/qaic/index.rst b/Documentation/accel/qaic/index.rst
>> new file mode 100644
>> index 0000000..ad19b88
>> --- /dev/null
>> +++ b/Documentation/accel/qaic/index.rst
>> @@ -0,0 +1,13 @@
>> +.. SPDX-License-Identifier: GPL-2.0-only
>> +
>> +=====================================
>> + accel/qaic Qualcomm Cloud AI driver
>> +=====================================
>> +
>> +The accel/qaic driver supports the Qualcomm Cloud AI machine learning
>> +accelerator cards.
>> +
>> +.. toctree::
>> +
>> +   qaic
>> +   aic100
>> diff --git a/Documentation/accel/qaic/qaic.rst b/Documentation/accel/qaic/qaic.rst
>> new file mode 100644
>> index 0000000..b0e7a5f
>> --- /dev/null
>> +++ b/Documentation/accel/qaic/qaic.rst
>> @@ -0,0 +1,169 @@
>> +.. SPDX-License-Identifier: GPL-2.0-only
>> +
>> +=============
>> + QAIC driver
>> +=============
>> +
>> +The QAIC driver is the Kernel Mode Driver (KMD) for the AIC100 family of AI
>> +accelerator products.
>> +
>> +Interrupts
>> +==========
>> +
>> +While the AIC100 DMA Bridge hardware implements an IRQ storm mitigation
>> +mechanism, it is still possible for an IRQ storm to occur.  A storm can happen
>> +if the workload is particularly quick, and the host is responsive.  If the host
>> +can drain the response FIFO as quickly as the device can insert elements into
>> +it, then the device will frequently transition the response FIFO from empty to
>> +non-empty and generate MSIs at a rate equilivelent to the speed of the
> 
> equilivelent -> equivalent

Sure.

> 
>> +workload's ability to process inputs.  The lprnet (license plate reader network)
>> +workload is known to trigger this condition, and can generate in excess of 100k
>> +MSIs per second.  It has been observed that most systems cannot tolerate this
>> +for long, and will crash due to some form of watchdog due to the overhead of
>> +the interrupt controller interrupting the host CPU.
>> +
>> +To mitigate this issue, the QAIC driver implements specific IRQ handling.  When
>> +QAIC receives an IRQ, it disables that line.  This prevents the interrupt
>> +controller from interrupting the CPU.  Then AIC drains the FIFO.  Once the FIFO
>> +is drained, QAIC implements a "last chance" polling algorithm where QAIC will
>> +sleep for a time to see if the workload will generate more activity.  The IRQ
>> +line remains disabled during this time.  If no activity is detected, QAIC exits
>> +polling mode and reenables the IRQ line.
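
Roughly, as a sketch (the names and numbers below are placeholders, not the
actual qaic implementation):

#include <linux/interrupt.h>
#include <linux/delay.h>

#define LAST_CHANCE_POLLS	10
#define POLL_DELAY_US		100

struct dbc;					/* opaque for the sketch */
bool drain_response_fifo(struct dbc *dbc);	/* true if it found work */

static irqreturn_t dbc_irq(int irq, void *data)
{
	disable_irq_nosync(irq);		/* stop the storm at the source */
	return IRQ_WAKE_THREAD;
}

static irqreturn_t dbc_irq_thread(int irq, void *data)
{
	struct dbc *dbc = data;
	int polls = LAST_CHANCE_POLLS;

	while (polls--) {
		if (drain_response_fifo(dbc))
			polls = LAST_CHANCE_POLLS;	/* activity, keep polling */
		usleep_range(POLL_DELAY_US, POLL_DELAY_US + 50);
	}

	enable_irq(irq);			/* quiet again, back to interrupts */
	return IRQ_HANDLED;
}
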
>> +
>> +This mitigation in QAIC is very effective.  The same lprnet usecase that
>> +generates 100k IRQs per second (per /proc/interrupts) is reduced to roughly 64
>> +IRQs over 5 minutes while keeping the host system stable, and having the same
>> +workload throughput performance (within run to run noise variation).
>> +
>> +
>> +Neural Network Control (NNC) Protocol
>> +=====================================
>> +
>> +The implementation of NNC is split between the KMD (QAIC) and UMD.  In general
>> +QAIC understands how to encode/decode NNC wire protocol, and elements of the
>> +protocol which require kernelspace knowledge to process (for example, mapping
> 
> kernelspace is missing a space :P
> 
>> +host memory to device IOVAs).  QAIC understands the structure of a message, and
>> +all of the transactions.  QAIC does not understand commands (the payload of a
>> +passthrough transaction).
>> +
>> +QAIC handles and enforces the required little endianness and 64-bit alignment,
>> +to the degree that it can.  Since QAIC does not know the contents of a
>> +passthrough transaction, it relies on the UMD to saitsfy the requirements.
> 
> saitsfy -> satisfy

Will do.

> 
>> +The terminate transaction is of particular use to QAIC.  QAIC is not aware of
>> +the resources that are loaded onto a device since the majority of that activity
>> +occurs within NNC commands.  As a result, QAIC does not have the means to
>> +roll back userspace activity.  To ensure that a userspace client's resources
>> +are fully released in the case of a process crash, or a bug, QAIC uses the
>> +terminate command to let QSM know when a user has gone away, and the resources
>> +can be released.
>> +
>> +QSM can report a version number of the NNC protocol it supports.  This is in the
>> +form of a Major number and a Minor number.
>> +
>> +Major number updates indicate changes to the NNC protocol which impact the
>> +message format, or transactions (impacts QAIC).
>> +
>> +Minor number updates indicate changes to the NNC protocol which impact the
>> +commands (does not impact QAIC).
>> +
>> +uAPI
>> +====
>> +
>> +QAIC defines a number of driver specific IOCTLs as part of the userspace API.
>> +This section describes those APIs.
>> +
>> +DRM_IOCTL_QAIC_MANAGE:
>> +This IOCTL allows userspace to send a NNC request to the QSM.  The call will
>> +block until a response is received, or the request has timed out.
>> +
>> +DRM_IOCTL_QAIC_CREATE_BO:
>> +This IOCTL allows userspace to allocate a buffer object (BO) which can send or
>> +receive data from a workload.  The call will return a GEM handle that
>> +represents the allocated buffer.  The BO is not usable until it has been sliced
>> +(see DRM_IOCTL_QAIC_ATTACH_SLICE_BO).
>> +
>> +DRM_IOCTL_QAIC_MMAP_BO:
>> +This IOCTL allows userspace to prepare an allocated BO to be mmap'd into the
>> +userspace process.
>> +
>> +DRM_IOCTL_QAIC_ATTACH_SLICE_BO:
>> +This IOCTL allows userspace to slice a BO in preparation for sending the BO to
>> +the device.  Slicing is the operation of describing what portions of a BO get
>> +sent where to a workload.  This requires a set of DMA transfers for the DMA
>> +Bridge, and as such, locks the BO to a specific DBC.
>> +
>> +DRM_IOCTL_QAIC_EXECUTE_BO:
>> +This IOCTL allows userspace to submit a set of sliced BOs to the device.  The
>> +call is non-blocking.  Success only indicates that the BOs have been queued
>> +to the device, but does not guarantee they have been executed.
>> +
>> +DRM_IOCTL_QAIC_PARTIAL_EXECUTE_BO:
>> +This IOCTL operates like DRM_IOCTL_QAIC_EXECUTE_BO, but it allows userspace to
>> +shrink the BOs sent to the device for this specific call.  If a BO typically has
>> +N inputs, but only a subset of those is available, this IOCTL allows userspace
>> +to indicate that only the first M bytes of the BO should be sent to the device
>> +to minimize data transfer overhead.  This IOCTL dynamically recomputes the
>> +slicing, and therefore has some processing overhead before the BOs can be queued
>> +to the device.
>> +
>> +DRM_IOCTL_QAIC_WAIT_BO:
>> +This IOCTL allows userspace to determine when a particular BO has been processed
>> +by the device.  The call will block until either the BO has been processed and
>> +can be re-queued to the device, or a timeout occurs.
>> +
>> +DRM_IOCTL_QAIC_PERF_STATS_BO:
>> +This IOCTL allows userspace to collect performance statistics on the most
>> +recent execution of a BO.  This allows userspace to construct an end to end
>> +timeline of the BO processing for a performance analysis.
>> +
>> +DRM_IOCTL_QAIC_PART_DEV:
>> +This IOCTL allows userspace to request a duplicate "shadow device".  This extra
>> +accelN device is associated with a specific partition of resources on the AIC100
>> +device and can be used for limiting a process to some subset of resources.
>> +
>> +Userspace Client Isolation
>> +==========================
>> +
>> +AIC100 supports multiple clients.  Multiple DBCs can be consumed by a single
>> +client, and multiple clients can each consume one or more DBCs.  Workloads
>> +may contain sensistive information therefore only the client that owns the
> 
> sensistive -> sensitive

Will do.

> 
>> +workload should be allowed to interface with the DBC.
>> +
>> +Clients are identified by the instance associated with their open().  A client
>> +may only use memory they allocate, and DBCs that are assigned to their
>> +workloads.  Attempts to access resources assigned to other clients will be
>> +rejected.
>> +
>> +Module parameters
>> +=================
>> +
>> +QAIC supports the following module parameters:
>> +
>> +**datapath_polling (bool)**
>> +
>> +Configures QAIC to use a polling thread for datapath events instead of relying
>> +on the device interrupts.  Useful for platforms with broken multiMSI.  Must be
>> +set at QAIC driver initialization.  Default is 0 (off).
>> +
>> +**mhi_timeout (int)**
>> +
>> +Sets the timeout value for MHI operations in milliseconds (ms).  Must be set
>> +at the time the driver detects a device.  Default is 2000 (2 seconds).
>> +
>> +**control_resp_timeout (int)**
>> +
>> +Sets the timeout value for QSM responses to NNC messages in seconds (s).  Must
>> +be set at the time the driver is sending a request to QSM.  Default is 60 (one
>> +minute).
>> +
>> +**wait_exec_default_timeout (int)**
>> +
>> +Sets the default timeout for the wait_exec ioctl in milliseconds (ms).  Must be
>> +set prior to the waic_exec ioctl call.  A value specified in the ioctl call
>> +overrides this for that call.  Default is 5000 (5 seconds).
>> +
>> +**datapath_poll_interval_us (int)**
>> +
>> +Sets the polling interval in microseconds (us) when datapath polling is active.
>> +Takes effect at the next polling interval.  Default is 100 (100 us).
> 
> Cool that you are staring with the documentation :)
> I suggest running at least "checkpatch.pl --codespell" on the series as there are many spelling issue.

That's what happened to checkpatch.  Thanks for pointing out the flag.  I
remember it doing spellcheck by default.  Apparently it needs a flag
now.  I'll do that going forward.

> 
> Regards,
> Jacek
> 
> 
> 


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 1/8] accel/qaic: Add documentation for AIC100 accelerator driver
@ 2023-02-15 15:41       ` Jeffrey Hugo
  0 siblings, 0 replies; 68+ messages in thread
From: Jeffrey Hugo @ 2023-02-15 15:41 UTC (permalink / raw)
  To: Jacek Lawrynowicz, ogabbay, airlied, daniel, dri-devel
  Cc: linux-doc, linux-arm-msm, quic_ajitpals, quic_pkanojiy,
	stanislaw.gruszka, quic_carlv

On 2/14/2023 4:08 AM, Jacek Lawrynowicz wrote:
> Hi,

Thank you for the review.

> On 06.02.2023 16:41, Jeffrey Hugo wrote:
>> The Qualcomm Cloud AI 100 (AIC100) device is an Artificial Intelligence
>> accelerator PCIe card.  It contains a number of components both in the
>> SoC and on the card which facilitate running workloads:
>>
>> QSM: management processor
>> NSPs: workload compute units
>> DMA Bridge: dedicated data mover for the workloads
>> MHI: multiplexed communication channels
>> DDR: workload storage and memory
>>
>> The Linux kernel driver for AIC100 is called "QAIC" and is located in the
>> accel subsystem.
>>
>> Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
>> Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com>
>> ---
>>   Documentation/accel/index.rst       |   1 +
>>   Documentation/accel/qaic/aic100.rst | 498 ++++++++++++++++++++++++++++++++++++
>>   Documentation/accel/qaic/index.rst  |  13 +
>>   Documentation/accel/qaic/qaic.rst   | 169 ++++++++++++
>>   4 files changed, 681 insertions(+)
>>   create mode 100644 Documentation/accel/qaic/aic100.rst
>>   create mode 100644 Documentation/accel/qaic/index.rst
>>   create mode 100644 Documentation/accel/qaic/qaic.rst
>>
>> diff --git a/Documentation/accel/index.rst b/Documentation/accel/index.rst
>> index 2b43c9a..e94a016 100644
>> --- a/Documentation/accel/index.rst
>> +++ b/Documentation/accel/index.rst
>> @@ -8,6 +8,7 @@ Compute Accelerators
>>      :maxdepth: 1
>>   
>>      introduction
>> +   qaic/index
>>   
>>   .. only::  subproject and html
>>   
>> diff --git a/Documentation/accel/qaic/aic100.rst b/Documentation/accel/qaic/aic100.rst
>> new file mode 100644
>> index 0000000..773aa54
>> --- /dev/null
>> +++ b/Documentation/accel/qaic/aic100.rst
>> @@ -0,0 +1,498 @@
>> +.. SPDX-License-Identifier: GPL-2.0-only
>> +
>> +===============================
>> + Qualcomm Cloud AI 100 (AIC100)
>> +===============================
>> +
>> +Overview
>> +========
>> +
>> +The Qualcomm Cloud AI 100/AIC100 family of products (including SA9000P - part of
>> +Snapdragon Ride) are PCIe adapter cards which contain a dedicated SoC ASIC for
>> +the purpose of efficiently running Artificial Intelligence (AI) Deep Learning
>> +inference workloads.  They are AI accelerators.
> 
> There are multiple double spaces in this document like this one above.

I presume you are referring to the double space after a period.
Universally, that was the recommended style (APA guidebook, etc) until a
little while ago.  Old habits are hard to break.  Will scrub.

> 
>> +The PCIe interface of AIC100 is capable of PCIe Gen4 speeds over eight lanes
>> +(x8).  An individual SoC on a card can have up to 16 NSPs for running workloads.
>> +Each SoC has an A53 management CPU.  On card, there can be up to 32 GB of DDR.
>> +
>> +Multiple AIC100 cards can be hosted in a single system to scale overall
>> +performance.
>> +
>> +Hardware Description
>> +====================
>> +
>> +An AIC100 card consists of an AIC100 SoC, on-card DDR, and a set of misc
>> +peripherals (PMICs, etc).
>> +
>> +An AIC100 card can either be a PCIe HHHL form factor (a traditional PCIe card),
>> +or a Dual M.2 card.  Both use PCIe to connect to the host system.
> 
> Dual M.2 card? Is it a single PCB with two M.2 connectors? This requires custom
> motherboard with x4 lanes from two connectors combined as a single PCIe device, right?

Yes.  There is a specification for this, although it hasn't gotten 
widespread adoption.  In addition to more lanes, you also get to draw 
more power.  Single M.2 is around 11W.  Dual M.2 is capped at 25W.

It tends to be a handy form factor for "edge" applications where the 
physical size and power draw of a "normal" PCIe slot (what you'd find on 
a regular ATX motherboard) are not desirable.

> 
>> +As a PCIe endpoint/adapter, AIC100 uses the standard VendorID(VID)/
>> +DeviceID(DID) combination to uniquely identify itself to the host.  AIC100
>> +uses the standard Qualcomm VID (0x17cb).  All AIC100 instances use the same
>> +AIC100 DID (0xa100).
> 
> Maybe "SKUs" would fit better here then "instances".

Sure.

> 
>> +AIC100 does not implement FLR (function level reset).
>> +
>> +AIC100 implements MSI but does not implement MSI-X.  AIC100 requires 17 MSIs to
>> +operate (1 for MHI, 16 for the DMA Bridge).
>> +
>> +As a PCIe device, AIC100 utilizes BARs to provide host interfaces to the device
>> +hardware.  AIC100 provides 3, 64-bit BARs.
>> +
>> +* The first BAR is 4K in size, and exposes the MHI interface to the host.
>> +
>> +* The second BAR is 2M in size, and exposes the DMA Bridge interface to the
>> +  host.
>> +
>> +* The third BAR is variable in size based on an individual AIC100's
>> +  configuration, but defaults to 64K.  This BAR currently has no purpose.
>> +
>> +From the host perspective, AIC100 has several key hardware components-
> 
> Typo in "components-".

?
You want "components -"?

> 
>> +* QSM (QAIC Service Manager)
>> +* NSPs (Neural Signal Processor)
>> +* DMA Bridge
>> +* DDR
>> +* MHI (Modem Host Interface)
>> +
>> +QSM
>> +---
>> +
>> +QAIC Service Manager.  This is an ARM A53 CPU that runs the primary
>> +firmware of the card and performs on-card management tasks.  It also
>> +communicates with the host via MHI.  Each AIC100 has one of
>> +these.
> 
> I would put description of MHI at the top because it is referenced by the QSM description.

Sure.

> 
>> +NSP
>> +---
>> +
>> +Neural Signal Processor.  Each AIC100 has up to 16 of these.  These are
>> +the processors that run the workloads on AIC100.  Each NSP is a Qualcomm Hexagon
>> +(Q6) DSP with HVX and HMX.  Each NSP can only run one workload at a time, but
>> +multiple NSPs may be assigned to a single workload.  Since each NSP can only run
>> +one workload, AIC100 is limited to 16 concurrent workloads.  Workload
>> +"scheduling" is under the purview of the host.  AIC100 does not automatically
>> +timeslice.
>> +
>> +DMA Bridge
>> +----------
>> +
>> +The DMA Bridge is custom DMA engine that manages the flow of data
>> +in and out of workloads.  AIC100 has one of these.  The DMA Bridge has 16
>> +channels, each consisting of a set of request/response FIFOs.  Each active
>> +workload is assigned a single DMA Bridge channel.  The DMA Bridge exposes
>> +hardware registers to manage the FIFOs (head/tail pointers), but requires host
>> +memory to store the FIFOs.
>> +
>> +DDR
>> +---
>> +
>> +AIC100 has on-card DDR.  In total, an AIC100 can have up to 32 GB of DDR.
>> +This DDR is used to store workloads, data for the workloads, and is used by the
>> +QSM for managing the device.  NSPs are granted access to sections of the DDR by
>> +the QSM.  The host does not have direct access to the DDR, and must make
>> +requests to the QSM to transfer data to the DDR.
>> +
>> +MHI
>> +---
>> +
>> +AIC100 has one MHI interface over PCIe.  MHI itself is documented at
> 
> Please exand MHI acronym.

It's expanded about 40 lines up - "* MHI (Modem Host Interface)".  I
generally go by the scheme of expanding an acronym the first time it is
used in a document, and then just using the acronym thereafter.

Do you feel the expansion needs to be duplicated?  It might help when 
this section is moved to the top.

> 
>> +Documentation/mhi/index.rst  MHI is the mechanism the host uses to communicate
>> +with the QSM.  Except for workload data via the DMA Bridge, all interaction with
>> +he device occurs via MHI.
> 
> Typo in "he device".

Doh.  Will fix.

> 
>> +High-level Use Flow
>> +===================
>> +
>> +AIC100 is a programmable accelerator typically used for running
>> +neural networks in inferencing mode to efficiently perform AI operations.
>> +AIC100 is not intended for training neural networks.  AIC100 can be utilitized
> 
> utilitized -> utilized

Sure

> 
>> +for generic compute workloads.
>> +
>> +Assuming a user wants to utilize AIC100, they would follow these steps:
>> +
>> +1. Compile the workload into an ELF targeting the NSP(s)
>> +2. Make requests to the QSM to load the workload and related artifacts into the
>> +   device DDR
>> +3. Make a request to the QSM to activate the workload onto a set of idle NSPs
>> +4. Make requests to the DMA Bridge to send input data to the workload to be
>> +   processed, and other requests to receive processed output data from the
>> +   workload.
>> +5. Once the workload is no longer required, make a request to the QSM to
>> +   deactivate the workload, thus putting the NSPs back into an idle state.
>> +6. Once the workload and related artifacts are no longer needed for future
>> +   sessions, make requests to the QSM to unload the data from DDR.  This frees
>> +   the DDR to be used by other users.
>> +
> 
> Please specify if this is single or multi user device.

It is multi-user.  I will find a way to clarify that.

> 
>> +Boot Flow
>> +=========
>> +
>> +AIC100 uses a flashless boot flow, derived from Qualcomm MSMs.
> 
> What's MSM?

"Mobile Station Modem".  It is is Qualcomm term from the 80s, and used 
to describe the "family" Qualcomm phone SoCs.  It used to be that the 
Model number would be "MSM8660" or "MSM8960", etc.  That has changed a 
bit in the past few years, but the products are still referred to "MSMs".

It is a common term in the Qualcomm world, and "MSM" is more well known 
these days than what it stands for.  I don't think expanding it is going 
to add value.

> 
>> +When AIC100 is first powered on, it begins executing PBL (Primary Bootloader)
>> +from ROM.  PBL enumerates the PCIe link, and initializes the BHI (Boot Host
>> +Interface) component of MHI.
>> +
>> +Using BHI, the host points PBL to the location of the SBL (Secondary Bootloader)
>> +image.  The PBL pulls the image from the host, validates it, and begins
>> +execution of SBL.
>> +
>> +SBL initializes MHI, and uses MHI to notify the host that the device has entered
>> +the SBL stage.  SBL performs a number of operations:
>> +
>> +* SBL initializes the majority of hardware (anything PBL left uninitialized),
>> +  including DDR.
>> +* SBL offloads the bootlog to the host.
>> +* SBL synchonizes timestamps with the host for future logging.
> 
> synchonizes -> synchronizes

Yep

> 
>> +* SBL uses the Sahara protocol to obtain the runtime firmware images from the
>> +  host.
>> +
>> +Once SBL has obtained and validated the runtime firmware, it brings the NSPs out
>> +of reset, and jumps into the QSM.
>> +
>> +The QSM uses MHI to notify the host that the device has entered the QSM stage
>> +(AMSS in MHI terms).  At this point, the AIC100 device is fully functional, and
>> +ready to process workloads.
>> +
>> +Userspace components
>> +====================
>> +
>> +Compiler
>> +--------
>> +
>> +An open compiler for AIC100 based on upstream LLVM can be found at:
>> +https://github.com/quic/software-kit-for-qualcomm-cloud-ai-100-cc
>> +
>> +Usermode Driver (UMD)
>> +---------------------
>> +
>> +An open UMD that interfaces with the qaic kernel driver can be found at:
>> +https://github.com/quic/software-kit-for-qualcomm-cloud-ai-100
> 
> This repo is empty.

Correct.  That was mentioned in the cover letter.  Targeting to post 
content this week.  Just working through the last steps of our internal 
process.

> 
>> +
>> +Sahara loader
>> +-------------
>> +
>> +An open implementation of the Sahara protocol called kickstart can be found at:
>> +https://github.com/andersson/qdl
>> +
>> +MHI Channels
>> +============
>> +
>> +AIC100 defines a number of MHI channels for different purposes.  This is a list
>> +of the defined channels, and their uses.
>> +
>> +| QAIC_LOOPBACK
>> +| Channels 0/1
> 
> A would use comma or & here.

Interesting.  I can see that.  I think I like "&".  Will do that.

> 
>> +| Valid for AMSS
>> +| Any data sent to the device on this channel is sent back to the host.
>> +
>> +| QAIC_SAHARA
>> +| Channels 2/3
>> +| Valid for SBL
>> +| Used by SBL to obtain the runtime firmware from the host.
>> +
>> +| QAIC_DIAG
>> +| Channels 4/5
>> +| Valid for AMSS
>> +| Used to communicate with QSM via the Diag protocol.
>> +
>> +| QAIC_SSR
>> +| Channels 6/7
>> +| Valid for AMSS
>> +| Used to notify the host of subsystem restart events, and to offload SSR crashdumps.
>> +
>> +| QAIC_QDSS
>> +| Channels 8/9
>> +| Valid for AMSS
>> +| Used for the Qualcomm Debug Subsystem.
>> +
>> +| QAIC_CONTROL
>> +| Channels 10/11
>> +| Valid for AMSS
>> +| Used for the Neural Network Control (NNC) protocol.  This is the primary channel between host and QSM for managing workloads.
>> +
>> +| QAIC_LOGGING
>> +| Channels 12/13
>> +| Valid for SBL
>> +| Used by the SBL to send the bootlog to the host.
>> +
>> +| QAIC_STATUS
>> +| Channels 14/15
>> +| Valid for AMSS
>> +| Used to notify the host of Reliability, Accessability, Serviceability (RAS) events.
> 
> Accessability -> Accessibility

Got it.

> 
>> +| QAIC_TELEMETRY
>> +| Channels 16/17
>> +| Valid for AMSS
>> +| Used to get/set power/thermal/etc attributes.
>> +
>> +| QAIC_DEBUG
>> +| Channels 18/19
>> +| Valid for AMSS
>> +| Not used.
>> +
>> +| QAIC_TIMESYNC
>> +| Channels 20/21
>> +| Valid for SBL/AMSS
>> +| Used to synchronize timestamps in the device side logs with the host time source.
>> +
>> +DMA Bridge
>> +==========
>> +
>> +Overview
>> +--------
>> +
>> +The DMA Bridge is one of the main interfaces to the host from the device
>> +(the other being MHI).  As part of activating a workload to run on NSPs, the QSM
>> +assigns that network a DMA Bridge channel.  A workload's DMA Bridge channel
>> +(DBC for short) is solely for the use of that workload and is not shared with
>> +other workloads.
>> +
>> +Each DBC is a pair of FIFOs that manage data in and out of the workload.  One
>> +FIFO is the request FIFO.  The other FIFO is the response FIFO.
>> +
>> +Each DBC contains 4 registers in hardware:
>> +
>> +* Request FIFO head pointer (offset 0x0).  Read only to the host.  Indicates the
> 
> Read only _by_ the host.

Sure.

> 
>> +  latest item in the FIFO the device has consumed.
>> +* Request FIFO tail pointer (offset 0x4).  Read/write by the host.  Host
>> +  increments this register to add new items to the FIFO.
>> +* Response FIFO head pointer (offset 0x8).  Read/write by the host.  Indicates
>> +  the latest item in the FIFO the host has consumed.
>> +* Response FIFO tail pointer (offset 0xc).  Read only to the host.  Device
> 
> Read only _by_ the host.

Sure.

> 
>> +  increments this register to add new items to the FIFO.
>> +
>> +The values in each register are indexes in the FIFO.  To get the location of the
>> +FIFO element pointed to by the register: FIFO base address + register * element
>> +size.
>> +
>> +DBC registers are exposed to the host via the second BAR.  Each DBC consumes
>> +0x1000 of space in the BAR.
> 
> I wouldn't use hex for the sizes. 4KB seems a lot more readable.

Good point.  Will do.

> 
>> +The actual FIFOs are backed by host memory.  When sending a request to the QSM
>> +to activate a network, the host must donate memory to be used for the FIFOs.
>> +Due to internal mapping limitations of the device, a single contigious chunk of
> 
> contigious -> contiguous

Got it.

> 
>> +memory must be provided per DBC, which hosts both FIFOs.  The request FIFO will
>> +consume the beginning of the memory chunk, and the response FIFO will consume
>> +the end of the memory chunk.
>> +
>> +Request FIFO
>> +------------
>> +
>> +A request FIFO element has the following structure:
>> +
>> +| {
>> +|	u16 req_id;
>> +|	u8  seq_id;
>> +|	u8  pcie_dma_cmd;
>> +|	u32 reserved;
>> +|	u64 pcie_dma_source_addr;
>> +|	u64 pcie_dma_dest_addr;
>> +|	u32 pcie_dma_len;
>> +|	u32 reserved;
>> +|	u64 doorbell_addr;
>> +|	u8  doorbell_attr;
>> +|	u8  reserved;
>> +|	u16 reserved;
>> +|	u32 doorbell_data;
>> +|	u32 sem_cmd0;
>> +|	u32 sem_cmd1;
>> +|	u32 sem_cmd2;
>> +|	u32 sem_cmd3;
>> +| }
>> +
>> +Request field descriptions:
>> +
>> +| req_id- request ID.  A request FIFO element and a response FIFO element with
>> +|         the same request ID refer to the same command.
>> +
>> +| seq_id- sequence ID within a request.  Ignored by the DMA Bridge.
>> +
>> +| pcie_dma_cmd- describes the DMA element of this request.
>> +| 	Bit(7) is the force msi flag, which overrides the DMA Bridge MSI logic
>> +| 		and generates a MSI when this request is complete, and QSM
>> +| 		configures the DMA Bridge to look at this bit.
>> +| 	Bits(6:5) are reserved.
>> +| 	Bit(4) is the completion code flag, and indicates that the DMA Bridge
>> +| 		shall generate a response FIFO element when this request is
>> +| 		complete.
>> +| 	Bit(3) indicates if this request is a linked list transfer(0) or a bulk
>> +| 		transfer(1).
>> +| 	Bit(2) is reserved.
>> +| 	Bits(1:0) indicate the type of transfer.  No transfer(0), to device(1),
>> +| 		from device(2).  Value 3 is illegal.
>> +
>> +| pcie_dma_source_addr- source address for a bulk transfer, or the address of
>> +|         the linked list.
>> +
>> +| pcie_dma_dest_addr- destination address for a bulk transfer.
>> +
>> +| pcie_dma_len- length of the bulk transfer.  Note that the size of this field
>> +| 	limits transfers to 4G in size.
>> +
>> +| doorbell_addr- address of the doorbell to ring when this request is complete.
>> +
>> +| doorbell_attr- doorbell attributes.
>> +| 	Bit(7) indicates if a write to a doorbell is to occur.
>> +| 	Bits(6:2) are reserved.
>> +| 	Bits(1:0) contain the encoding of the doorbell length.  0 is 32-bit,
>> +| 		1 is 16-bit, 2 is 8-bit, 3 is reserved.  The doorbell address
>> +| 		must be naturally aligned to the specified length.
>> +
>> +| doorbell_data- data to write to the doorbell.  Only the bits corresponding to
>> +| 	the doorbell length are valid.
>> +
>> +| sem_cmdN- semaphore command.
>> +| 	Bit(31) indicates this semaphore command is enabled.
>> +| 	Bit(30) is the to-device DMA fence.  Block this request until all
>> +| 		to-device DMA transfers are complete.
>> +| 	Bit(29) is the from-device DMA fence.  Block this request until all
>> +| 		from-device DMA transfers are complete.
>> +| 	Bits(28:27) are reserved.
>> +| 	Bits(26:24) are the semaphore command.  0 is NOP.  1 is init with the
>> +| 		specified value.  2 is increment.  3 is decrement.  4 is wait
>> +| 		until the semaphore is equal to the specified value.  5 is wait
>> +| 		until the semaphore is greater or equal to the specified value.
>> +| 		6 is "P", wait until semaphore is greater than 0, then
>> +| 		decrement by 1.  7 is reserved.
>> +| 	Bit(23) is reserved.
>> +| 	Bit(22) is the semaphore sync.  0 is post sync, which means that the
>> +| 		semaphore operation is done after the DMA transfer.  1 is
>> +| 		presync, which gates the DMA transfer.  Only one presync is
>> +| 		allowed per request.
>> +| 	Bit(21) is reserved.
>> +| 	Bits(20:16) is the index of the semaphore to operate on.
>> +| 	Bits(15:12) are reserved.
>> +| 	Bits(11:0) are the semaphore value to use in operations.
> 
> It seems to me like structure documentation

Yes.  It can be modeled that way.  However the code comes later so it 
can't be referenced here yet.  I've got a todo to come back and clean 
that up once this series is merged.

> 
>> +Overall, a request is processed in 4 steps:
>> +
>> +1. If specified, the presync semaphore condition must be true
>> +2. If enabled, the DMA transfer occurs
>> +3. If specified, the postsync semaphore conditions must be true
>> +4. If enabled, the doorbell is written
>> +
>> +By using the semaphores in conjunction with the workload running on the NSPs,
>> +the data pipeline can be synchronized such that the host can queue multiple
>> +requests of data for the workload to process, but the DMA Bridge will only copy
>> +the data into the memory of the workload when the workload is ready to process
>> +the next input.
>> +
>> +Response FIFO
>> +-------------
>> +
>> +Once a request is fully processed, a response FIFO element is generated if
>> +specified in pcie_dma_cmd.  The structure of a response FIFO element:
>> +
>> +| {
>> +| 	u16 req_id;
>> +| 	u16 completion_code;
>> +| }
>> +
>> +req_id- matches the req_id of the request that generated this element.
>> +
>> +completion_code- status of this request.  0 is success.  non-zero is an error.
>> +
>> +The DMA Bridge will generate a MSI to the host as a reaction to activity in the
>> +response FIFO of a DBC.  The DMA Bridge hardware has an IRQ storm mitigation
>> +algorithm, where it will only generate a MSI when the response FIFO transitions
>> +from empty to non-empty (unless force MSI is enabled and triggered).  In
>> +response to this MSI, the host is expected to drain the response FIFO, and must
>> +take care to handle any race conditions between draining the FIFO, and the
>> +device inserting elements into the FIFO.
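
To make the drain requirement concrete, here is a simplified C sketch of a
host-side drain loop.  The rsp_fifo type and the way head/tail are exposed are
invented for the example and do not match the actual driver code:

  #include <stdint.h>

  struct dbc_rsp {
          uint16_t req_id;            /* matches the req_id of the request */
          uint16_t completion_code;   /* 0 on success, non-zero on error */
  };

  /* invented stand-in for the per-DBC response queue state */
  struct rsp_fifo {
          volatile uint32_t head;     /* consumer index, owned by the host */
          volatile uint32_t tail;     /* producer index, advanced by the device */
          uint32_t nelem;
          struct dbc_rsp *elems;
  };

  void drain_response_fifo(struct rsp_fifo *q,
                           void (*complete)(uint16_t req_id, uint16_t code))
  {
          uint32_t head = q->head;

          /*
           * Re-read the tail on every pass: the device may keep inserting
           * elements while we drain, and it will not raise another MSI
           * because the FIFO never transitioned back through empty.
           */
          while (head != q->tail) {
                  struct dbc_rsp *rsp = &q->elems[head];

                  complete(rsp->req_id, rsp->completion_code);
                  head = (head + 1) % q->nelem;
                  q->head = head;     /* hand the slot back to the device */
          }
  }
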
>> +
>> +Neural Network Control (NNC) Protocol
>> +=====================================
>> +
>> +The NNC protocol is how the host makes requests to the QSM to manage workloads.
>> +It uses the QAIC_CONTROL MHI channel.
>> +
>> +Each NNC request is packaged into a message.  Each message is a series of
>> +transactions.  A passthrough type transaction can contain elements known as
>> +commands.
>> +
>> +QSM requires NNC messages be little endian encoded and the fields be naturally
>> +aligned.  Since there are 64-bit elements in some NNC messages, 64-bit alignment
>> +must be maintained.
>> +
>> +A message contains a header and then a series of transactions.  A message may be
>> +at most 4K in size from QSM to the host.  From the host to the QSM, a message
>> +can be at most 64K (maximum size of a single MHI packet), but there is a
>> +continuation feature where message N+1 can be marked as a continuation of
>> +message N.  This is used for exceedingly large DMA xfer transactions.
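
As a sketch of how a receiver can walk one message, assuming each transaction
starts with the type/len header that the uapi header later in this series
calls struct qaic_manage_trans_hdr (the outer message header and the 64-bit
padding rules are glossed over here):

  #include <stdint.h>
  #include <stddef.h>

  struct trans_hdr {
          uint32_t type;   /* one of the transaction types described below */
          uint32_t len;    /* length of this transaction, including the header */
  };

  /* walk 'count' transactions packed back to back in 'buf' of 'size' bytes */
  int for_each_transaction(const void *buf, size_t size, uint32_t count,
                           void (*cb)(const struct trans_hdr *hdr))
  {
          size_t off = 0;

          while (count--) {
                  const struct trans_hdr *hdr =
                          (const struct trans_hdr *)((const char *)buf + off);

                  if (off + sizeof(*hdr) > size || hdr->len < sizeof(*hdr) ||
                      off + hdr->len > size)
                          return -1;   /* malformed message */
                  cb(hdr);
                  off += hdr->len;     /* advance to the next transaction */
          }
          return 0;
  }
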
>> +
>> +Transaction descriptions:
>> +
>> +passthrough- Allows userspace to send an opaque payload directly to the QSM.
>> +This is used for NNC commands.  Userspace is responsible for managing
>> +the QSM message requirements in the payload.
>> +
>> +dma_xfer- DMA transfer.  Describes an object that the QSM should DMA into the
>> +device via address and size tuples.
>> +
>> +activate- Activate a workload onto NSPs.  The host must provide memory to be
>> +used by the DBC.
>> +
>> +deactivate- Deactivate an active workload and return the NSPs to idle.
>> +
>> +status- Query the QSM about its NNC implementation.  Returns the NNC version,
>> +and if CRC is used.
>> +
>> +terminate- Release a user's resources.
>> +
>> +dma_xfer_cont- Continuation of a previous DMA transfer.  If a DMA transfer
>> +cannot be specified in a single message (highly fragmented), this
>> +transaction can be used to specify more ranges.
>> +
>> +validate_partition- Query the QSM to determine if a partition identifier is
>> +valid.
>> +
>> +Each message is tagged with a user id, and a partition id.  The user id allows
>> +QSM to track resources, and release them when the user goes away (eg the process
>> +crashes).  A partition id identifies the resource partition that QSM manages,
>> +which this message applies to.
>> +
>> +Messages may have CRCs.  Messages should have CRCs applied until the QSM
>> +reports via the status transaction that CRCs are not needed.  The QSM on the
>> +SA9000P requires CRCs for black channel safing.
>> +
>> +Subsystem Restart (SSR)
>> +=======================
>> +
>> +SSR is the concept of limiting the impact of an error.  An AIC100 device may
>> +have multiple users, each with their own workload running.  If the workload of
>> +one user crashes, the fallout of that should be limited to that workload and not
>> +impact other workloads.  SSR accomplishes this.
>> +
>> +If a particular workload crashes, QSM notifies the host via the QAIC_SSR MHI
>> +channel.  This notification identifies the workload by its assigned DBC.  A
>> +multi-stage recovery process is then used to cleanup both sides, and get the
>> +DBC/NSPs into a working state.
>> +
>> +When SSR occurs, any state in the workload is lost.  Any inputs that were in
>> +process, or queued but not yet serviced, are lost.  The loaded artifacts will
>> +remain in on-card DDR, but the host will need to re-activate the workload if
>> +it desires to recover the workload.
>> +
>> +Reliability, Accessability, Serviceability (RAS)
> 
> Accessability -> Accessibility

Got it.

> 
>> +================================================
>> +
>> +AIC100 is expected to be deployed in server systems where RAS ideology is
>> +applied.  Simply put, RAS is the concept of detecting, classifying, and
>> +reporting errors.  While PCIe has AER (Advanced Error Reporting) which factors
>> +into RAS, AER does not allow for a device to report details about internal
>> +errors.  Therefore, AIC100 implements a custom RAS mechanism.  When a RAS event
>> +occurs, QSM will report the event with appropriate details via the QAIC_STATUS
>> +MHI channel.  A sysadmin may determine that a particular device needs
>> +additional service based on RAS reports.
>> +
>> +Telemetry
>> +=========
>> +
>> +QSM has the ability to report various physical attributes of the device, and in
>> +some cases, to allow the host to control them.  Examples include thermal limits,
>> +thermal readings, and power readings.  These items are communicated via the
>> +QAIC_TELEMETRY MHI channel.
>> diff --git a/Documentation/accel/qaic/index.rst b/Documentation/accel/qaic/index.rst
>> new file mode 100644
>> index 0000000..ad19b88
>> --- /dev/null
>> +++ b/Documentation/accel/qaic/index.rst
>> @@ -0,0 +1,13 @@
>> +.. SPDX-License-Identifier: GPL-2.0-only
>> +
>> +=====================================
>> + accel/qaic Qualcomm Cloud AI driver
>> +=====================================
>> +
>> +The accel/qaic driver supports the Qualcomm Cloud AI machine learning
>> +accelerator cards.
>> +
>> +.. toctree::
>> +
>> +   qaic
>> +   aic100
>> diff --git a/Documentation/accel/qaic/qaic.rst b/Documentation/accel/qaic/qaic.rst
>> new file mode 100644
>> index 0000000..b0e7a5f
>> --- /dev/null
>> +++ b/Documentation/accel/qaic/qaic.rst
>> @@ -0,0 +1,169 @@
>> +.. SPDX-License-Identifier: GPL-2.0-only
>> +
>> +=============
>> + QAIC driver
>> +=============
>> +
>> +The QAIC driver is the Kernel Mode Driver (KMD) for the AIC100 family of AI
>> +accelerator products.
>> +
>> +Interrupts
>> +==========
>> +
>> +While the AIC100 DMA Bridge hardware implements an IRQ storm mitigation
>> +mechanism, it is still possible for an IRQ storm to occur.  A storm can happen
>> +if the workload is particularly quick, and the host is responsive.  If the host
>> +can drain the response FIFO as quickly as the device can insert elements into
>> +it, then the device will frequently transition the response FIFO from empty to
>> +non-empty and generate MSIs at a rate equilivelent to the speed of the
> 
> equilivelent -> equivalent

Sure.

> 
>> +workload's ability to process inputs.  The lprnet (license plate reader network)
>> +workload is known to trigger this condition, and can generate in excess of 100k
>> +MSIs per second.  It has been observed that most systems cannot tolerate this
>> +for long, and will crash due to some form of watchdog triggered by the overhead
>> +of the interrupt controller interrupting the host CPU.
>> +
>> +To mitigate this issue, the QAIC driver implements specific IRQ handling.  When
>> +QAIC receives an IRQ, it disables that line.  This prevents the interrupt
>> +controller from interrupting the CPU.  Then QAIC drains the FIFO.  Once the FIFO
>> +is drained, QAIC implements a "last chance" polling algorithm where QAIC will
>> +sleep for a time to see if the workload will generate more activity.  The IRQ
>> +line remains disabled during this time.  If no activity is detected, QAIC exits
>> +polling mode and reenables the IRQ line.
>> +
>> +This mitigation in QAIC is very effective.  The same lprnet usecase that
>> +generates 100k IRQs per second (per /proc/interrupts) is reduced to roughly 64
>> +IRQs over 5 minutes while keeping the host system stable, and having the same
>> +workload throughput performance (within run to run noise variation).
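
Sketched in pseudo-kernel C (struct dbc and the helpers below are stand-ins,
this is not the handler from the series), the mitigation is essentially:

  #include <linux/interrupt.h>
  #include <linux/delay.h>

  /* hard IRQ: mask this line so the interrupt controller stops firing */
  static irqreturn_t dbc_irq(int irq, void *data)
  {
          disable_irq_nosync(irq);
          return IRQ_WAKE_THREAD;
  }

  /* threaded handler: drain, then poll for a while before re-enabling */
  static irqreturn_t dbc_irq_thread(int irq, void *data)
  {
          struct dbc *dbc = data;   /* placeholder for the per-DBC state */
          bool active = true;

          while (active) {
                  drain_response_fifo(dbc);
                  /*
                   * "last chance" poll: give a fast workload a window to
                   * produce more responses without another interrupt
                   */
                  usleep_range(poll_interval_us, 2 * poll_interval_us);
                  active = response_fifo_not_empty(dbc);
          }
          enable_irq(irq);
          return IRQ_HANDLED;
  }

The lprnet case above collapses from ~100k interrupts per second to a handful
because the storm is absorbed inside the polling loop instead of at the
interrupt controller.
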
>> +
>> +
>> +Neural Network Control (NNC) Protocol
>> +=====================================
>> +
>> +The implementation of NNC is split between the KMD (QAIC) and UMD.  In general
>> +QAIC understands how to encode/decode NNC wire protocol, and elements of the
>> +protocol which require kernelspace knowledge to process (for example, mapping
> 
> kernelspace is missing a space :P
> 
>> +host memory to device IOVAs).  QAIC understands the structure of a message, and
>> +all of the transactions.  QAIC does not understand commands (the payload of a
>> +passthrough transaction).
>> +
>> +QAIC handles and enforces the required little endianness and 64-bit alignment,
>> +to the degree that it can.  Since QAIC does not know the contents of a
>> +passthrough transaction, it relies on the UMD to saitsfy the requirements.
> 
> saitsfy -> satisfy

Will do.

> 
>> +The terminate transaction is of particular use to QAIC.  QAIC is not aware of
>> +the resources that are loaded onto a device since the majority of that activity
>> +occurs within NNC commands.  As a result, QAIC does not have the means to
>> +roll back userspace activity.  To ensure that a userspace client's resources
>> +are fully released in the case of a process crash, or a bug, QAIC uses the
>> +terminate command to let QSM know when a user has gone away, and the resources
>> +can be released.
>> +
>> +QSM can report a version number of the NNC protocol it supports.  This is in the
>> +form of a Major number and a Minor number.
>> +
>> +Major number updates indicate changes to the NNC protocol which impact the
>> +message format, or transactions (impacts QAIC).
>> +
>> +Minor number updates indicate changes to the NNC protocol which impact the
>> +commands (does not impact QAIC).
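
Put differently, a driver that implements major M and minor m can talk to a
device reporting the same major and any minor up to m.  A trivial sketch of
that check (the constants mirror the ones used by the driver later in this
series, but the function itself is illustrative only):

  /* protocol version this driver speaks (illustrative values) */
  #define NNC_MAJOR 5
  #define NNC_MINOR 0

  int nnc_version_compatible(unsigned int dev_major, unsigned int dev_minor)
  {
          /*
           * The major must match exactly; a device reporting a newer minor
           * than the driver was built against is rejected.
           */
          return dev_major == NNC_MAJOR && dev_minor <= NNC_MINOR;
  }
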
>> +
>> +uAPI
>> +====
>> +
>> +QAIC defines a number of driver specific IOCTLs as part of the userspace API.
>> +This section describes those APIs.
>> +
>> +DRM_IOCTL_QAIC_MANAGE:
>> +This IOCTL allows userspace to send a NNC request to the QSM.  The call will
>> +block until a response is received, or the request has timed out.
>> +
>> +DRM_IOCTL_QAIC_CREATE_BO:
>> +This IOCTL allows userspace to allocate a buffer object (BO) which can send or
>> +receive data from a workload.  The call will return a GEM handle that
>> +represents the allocated buffer.  The BO is not usable until it has been sliced
>> +(see DRM_IOCTL_QAIC_ATTACH_SLICE_BO).
>> +
>> +DRM_IOCTL_QAIC_MMAP_BO:
>> +This IOCTL allows userspace to prepare an allocated BO to be mmap'd into the
>> +userspace process.
>> +
>> +DRM_IOCTL_QAIC_ATTACH_SLICE_BO:
>> +This IOCTL allows userspace to slice a BO in preparation for sending the BO to
>> +the device.  Slicing is the operation of describing what portions of a BO get
>> +sent where to a workload.  This requires a set of DMA transfers for the DMA
>> +Bridge, and as such, locks the BO to a specific DBC.
>> +
>> +DRM_IOCTL_QAIC_EXECUTE_BO:
>> +This IOCTL allows userspace to submit a set of sliced BOs to the device.  The
>> +call is non-blocking.  Success only indicates that the BOs have been queued
>> +to the device, but does not guarantee they have been executed.
>> +
>> +DRM_IOCTL_QAIC_PARTIAL_EXECUTE_BO:
>> +This IOCTL operates like DRM_IOCTL_QAIC_EXECUTE_BO, but it allows userspace to
>> +shrink the BOs sent to the device for this specific call.  If a BO typically has
>> +N inputs, but only a subset of those is available, this IOCTL allows userspace
>> +to indicate that only the first M bytes of the BO should be sent to the device
>> +to minimize data transfer overhead.  This IOCTL dynamically recomputes the
>> +slicing, and therefore has some processing overhead before the BOs can be queued
>> +to the device.
>> +
>> +DRM_IOCTL_QAIC_WAIT_BO:
>> +This IOCTL allows userspace to determine when a particular BO has been processed
>> +by the device.  The call will block until either the BO has been processed and
>> +can be re-queued to the device, or a timeout occurs.
>> +
>> +DRM_IOCTL_QAIC_PERF_STATS_BO:
>> +This IOCTL allows userspace to collect performance statistics on the most
>> +recent execution of a BO.  This allows userspace to construct an end to end
>> +timeline of the BO processing for a performance analysis.
>> +
>> +DRM_IOCTL_QAIC_PART_DEV:
>> +This IOCTL allows userspace to request a duplicate "shadow device".  This extra
>> +accelN device is associated with a specific partition of resources on the AIC100
>> +device and can be used for limiting a process to some subset of resources.
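
Tying the above together, a minimal userspace sequence looks roughly like the
sketch below.  Error handling, the slicing description, and the output BO are
omitted; the device node path is an assumption, and the structs and
DRM_IOCTL_QAIC_* numbers are the ones from qaic_accel.h in this series
(assuming the uapi header is installed where <drm/qaic_accel.h> resolves):

  #include <stdint.h>
  #include <string.h>
  #include <fcntl.h>
  #include <sys/ioctl.h>
  #include <sys/mman.h>
  #include <drm/qaic_accel.h>

  void queue_one_input(uint64_t size, uint32_t dbc_id)
  {
          int fd = open("/dev/accel/accel0", O_RDWR);

          struct qaic_create_bo create = { .size = size };
          ioctl(fd, DRM_IOCTL_QAIC_CREATE_BO, &create);

          struct qaic_mmap_bo map = { .handle = create.handle };
          ioctl(fd, DRM_IOCTL_QAIC_MMAP_BO, &map);
          void *buf = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED,
                           fd, map.offset);

          /* stand-in for filling in real input data; then the slices would be
           * described and the BO bound to dbc_id with
           * DRM_IOCTL_QAIC_ATTACH_SLICE_BO (omitted here) */
          memset(buf, 0, size);

          struct qaic_execute_entry entry = { .handle = create.handle, .dir = 1 };
          struct qaic_execute exec = {
                  .hdr = { .count = 1, .dbc_id = dbc_id },
                  .data = (uint64_t)(uintptr_t)&entry,
          };
          ioctl(fd, DRM_IOCTL_QAIC_EXECUTE_BO, &exec);

          struct qaic_wait wait = {
                  .handle = create.handle, .timeout = 5000, .dbc_id = dbc_id,
          };
          ioctl(fd, DRM_IOCTL_QAIC_WAIT_BO, &wait);
  }
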
>> +
>> +Userspace Client Isolation
>> +==========================
>> +
>> +AIC100 supports multiple clients.  Multiple DBCs can be consumed by a single
>> +client, and multiple clients can each consume one or more DBCs.  Workloads
>> +may contain sensistive information therefore only the client that owns the
> 
> sensistive -> sensitive

Will do.

> 
>> +workload should be allowed to interface with the DBC.
>> +
>> +Clients are identified by the instance associated with their open().  A client
>> +may only use memory they allocate, and DBCs that are assigned to their
>> +workloads.  Attempts to access resources assigned to other clients will be
>> +rejected.
>> +
>> +Module parameters
>> +=================
>> +
>> +QAIC supports the following module parameters:
>> +
>> +**datapath_polling (bool)**
>> +
>> +Configures QAIC to use a polling thread for datapath events instead of relying
>> +on the device interrupts.  Useful for platforms with broken multiMSI.  Must be
>> +set at QAIC driver initialization.  Default is 0 (off).
>> +
>> +**mhi_timeout (int)**
>> +
>> +Sets the timeout value for MHI operations in milliseconds (ms).  Must be set
>> +at the time the driver detects a device.  Default is 2000 (2 seconds).
>> +
>> +**control_resp_timeout (int)**
>> +
>> +Sets the timeout value for QSM responses to NNC messages in seconds (s).  Must
>> +be set at the time the driver is sending a request to QSM.  Default is 60 (one
>> +minute).
>> +
>> +**wait_exec_default_timeout (int)**
>> +
>> +Sets the default timeout for the wait_exec ioctl in milliseconds (ms).  Must be
>> +set prior to the wait_exec ioctl call.  A value specified in the ioctl call
>> +overrides this for that call.  Default is 5000 (5 seconds).
>> +
>> +**datapath_poll_interval_us (int)**
>> +
>> +Sets the polling interval in microseconds (us) when datapath polling is active.
>> +Takes effect at the next polling interval.  Default is 100 (100 us).
> 
> Cool that you are starting with the documentation :)
> I suggest running at least "checkpatch.pl --codespell" on the series as there are many spelling issues.

That's what happened to checkpatch.  Thanks for pointing out the flag.  I
remember it doing spellcheck by default.  Apparently it needs a flag 
now.  I'll do that going forward.

> 
> Regards,
> Jacek
> 
> 
> 


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 2/8] accel/qaic: Add uapi and core driver file
  2023-02-06 15:41   ` Jeffrey Hugo
  (?)
@ 2023-02-16 14:13   ` Jacek Lawrynowicz
  2023-02-17 18:15       ` Jeffrey Hugo
  -1 siblings, 1 reply; 68+ messages in thread
From: Jacek Lawrynowicz @ 2023-02-16 14:13 UTC (permalink / raw)
  To: Jeffrey Hugo, ogabbay, airlied, daniel, dri-devel
  Cc: linux-doc, linux-arm-msm, quic_ajitpals, quic_pkanojiy,
	stanislaw.gruszka, quic_carlv

Hi,

On 06.02.2023 16:41, Jeffrey Hugo wrote:
> Add the QAIC driver uapi file and core driver file that binds to the PCIe
> device.  The core driver file also creates the accel device and manages
> all the interconnections between the different parts of the driver.
> 
> The driver can be built as a module.  If so, it will be called "qaic.ko".
> 
> Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
> Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com>
> ---
>  drivers/accel/qaic/qaic.h     | 321 ++++++++++++++++++
>  drivers/accel/qaic/qaic_drv.c | 771 ++++++++++++++++++++++++++++++++++++++++++
>  include/uapi/drm/qaic_accel.h | 283 ++++++++++++++++
>  3 files changed, 1375 insertions(+)
>  create mode 100644 drivers/accel/qaic/qaic.h
>  create mode 100644 drivers/accel/qaic/qaic_drv.c
>  create mode 100644 include/uapi/drm/qaic_accel.h
> 
> diff --git a/drivers/accel/qaic/qaic.h b/drivers/accel/qaic/qaic.h
> new file mode 100644
> index 0000000..3f7ea76
> --- /dev/null
> +++ b/drivers/accel/qaic/qaic.h
> @@ -0,0 +1,321 @@
> +/* SPDX-License-Identifier: GPL-2.0-only
> + *
> + * Copyright (c) 2019-2021, The Linux Foundation. All rights reserved.
> + * Copyright (c) 2021-2023 Qualcomm Innovation Center, Inc. All rights reserved.
> + */
> +
> +#ifndef QAICINTERNAL_H_

Please use a guard macro that matches the file name: _QAIC_H_

> +#define QAICINTERNAL_H_
> +
> +#include <linux/interrupt.h>
> +#include <linux/kref.h>
> +#include <linux/mhi.h>
> +#include <linux/mutex.h>
> +#include <linux/pci.h>
> +#include <linux/spinlock.h>
> +#include <linux/srcu.h>
> +#include <linux/wait.h>
> +#include <linux/workqueue.h>
> +#include <drm/drm_device.h>
> +#include <drm/drm_gem.h>
> +
> +#define QAIC_DBC_BASE		0x20000
> +#define QAIC_DBC_SIZE		0x1000
> +
> +#define QAIC_NO_PARTITION	-1
> +
> +#define QAIC_DBC_OFF(i)		((i) * QAIC_DBC_SIZE + QAIC_DBC_BASE)
> +
> +#define to_qaic_bo(obj) container_of(obj, struct qaic_bo, base)
> +
> +extern bool poll_datapath;
> +
> +struct qaic_user {
> +	/* Uniquely identifies this user for the device */
> +	int			handle;
> +	struct kref		ref_count;
> +	/* Char device opened by this user */
> +	struct qaic_drm_device	*qddev;
> +	/* Node in list of users that opened this drm device */
> +	struct list_head	node;
> +	/* SRCU used to synchronize this user during cleanup */
> +	struct srcu_struct	qddev_lock;
> +	atomic_t		chunk_id;
> +};
> +
> +struct dma_bridge_chan {
> +	/* Pointer to device struct maintained by driver */
> +	struct qaic_device	*qdev;
> +	/* ID of this DMA bridge channel(DBC) */
> +	unsigned int		id;
> +	/* Synchronizes access to xfer_list */
> +	spinlock_t		xfer_lock;
> +	/* Base address of request queue */
> +	void			*req_q_base;
> +	/* Base address of response queue */
> +	void			*rsp_q_base;
> +	/*
> +	 * Base bus address of request queue. Response queue bus address can be
> +	 * calculated by adding request queue size to this variable
> +	 */
> +	dma_addr_t		dma_addr;
> +	/* Total size of request and response queue in byte */
> +	u32			total_size;
> +	/* Capacity of request/response queue */
> +	u32			nelem;
> +	/* The user that opened this DBC */
> +	struct qaic_user	*usr;
> +	/*
> +	 * Request ID of next memory handle that goes in request queue. One
> +	 * memory handle can enqueue more than one request element; all the
> +	 * requests that belong to the same memory handle have the same request ID
> +	 */
> +	u16			next_req_id;
> +	/* TRUE: DBC is in use; FALSE: DBC not in use */

Use standard "true"/"false" instead of custom "TRUE"/"FALSE" macros.
This applies here and in multiple other places in the driver.

> +	bool			in_use;
> +	/*
> +	 * Base address of device registers. Used to read/write request and
> +	 * response queue's head and tail pointer of this DBC.
> +	 */
> +	void __iomem		*dbc_base;
> +	/* Head of list where each node is a memory handle queued in request queue */
> +	struct list_head	xfer_list;
> +	/* Synchronizes DBC readers during cleanup */
> +	struct srcu_struct	ch_lock;
> +	/*
> +	 * When this DBC is released, any thread waiting on this wait queue is
> +	 * woken up
> +	 */
> +	wait_queue_head_t	dbc_release;
> +	/* Head of list where each node is a bo associated with this DBC */
> +	struct list_head	bo_lists;
> +	/* The irq line for this DBC.  Used for polling */
> +	unsigned int		irq;
> +	/* Polling work item to simulate interrupts */
> +	struct work_struct	poll_work;
> +};
> +
> +struct qaic_device {
> +	/* Pointer to base PCI device struct of our physical device */
> +	struct pci_dev		*pdev;
> +	/* Mask of all bars of this device */
> +	int			bars;
> +	/* Req. ID of request that will be queued next in MHI control device */
> +	u32			next_seq_num;
> +	/* Base address of bar 0 */
> +	void __iomem		*bar_0;
> +	/* Base address of bar 2 */
> +	void __iomem		*bar_2;
> +	/* Controller structure for MHI devices */
> +	struct mhi_controller	*mhi_cntl;
> +	/* MHI control channel device */
> +	struct mhi_device	*cntl_ch;
> +	/* List of requests queued in MHI control device */
> +	struct list_head	cntl_xfer_list;
> +	/* Synchronizes MHI control device transactions and its xfer list */
> +	struct mutex		cntl_mutex;
> +	/* Base actual physical representation of drm device */
> +	struct qaic_drm_device	*base_dev;
> +	/* Array of DBC struct of this device */
> +	struct dma_bridge_chan	*dbc;
> +	/* Work queue for tasks related to MHI control device */
> +	struct workqueue_struct	*cntl_wq;
> +	/* Synchronizes all the users of device during cleanup */
> +	struct srcu_struct	dev_lock;
> +	/* TRUE: Device under reset; FALSE: Device not under reset */
> +	bool			in_reset;
> +	/*
> +	 * TRUE: A tx MHI transaction has failed and a rx buffer is still queued
> +	 * in control device. Such a buffer is considered lost rx buffer
> +	 * FALSE: No rx buffer is lost in control device
> +	 */
> +	bool			cntl_lost_buf;
> +	/* Maximum number of DBC supported by this device */
> +	u32			num_dbc;
> +	/* Head in list of drm device created on top of this device */
> +	struct list_head	qaic_drm_devices;
> +	/* Synchronizes access of qaic_drm_devices list */
> +	struct mutex		qaic_drm_devices_mutex;
> +	/* Generate the CRC of a control message */
> +	u32 (*gen_crc)(void *msg);
> +	/* Validate the CRC of a control message */
> +	bool (*valid_crc)(void *msg);
> +};
> +
> +struct qaic_drm_device {
> +	/* Pointer to the root device struct driven by this driver */
> +	struct qaic_device	*qdev;
> +	/* Node in list of drm devices maintained by root device */
> +	struct list_head	node;
> +	/*
> +	 * The physical device can be partitioned into a number of logical devices.
> +	 * And each logical device is given a partition id. This member stores
> +	 * that id. QAIC_NO_PARTITION is a sentinel used to mark that this drm
> +	 * device is the actual physical device
> +	 */
> +	s32			partition_id;
> +	/*
> +	 * It points to the user that created this drm device. It is NULL
> +	 * when this drm device represents the physical device i.e.
> +	 * partition_id is QAIC_NO_PARTITION
> +	 */
> +	struct qaic_user	*owner;
> +	/* Pointer to the drm device struct of this drm device */
> +	struct drm_device	*ddev;
> +	/* Head in list of users who have opened this drm device */
> +	struct list_head	users;
> +	/* Synchronizes access to users list */
> +	struct mutex		users_mutex;
> +};
> +
> +struct qaic_bo {
> +	struct drm_gem_object	base;

Any reason why drm_gem_shmem_object cannot be used as a base?

> +	/* Scatter/gather table for allocated/imported BO */
> +	struct sg_table		*sgt;
> +	/* BO size requested by user. GEM object might be bigger in size. */
> +	u64			size;
> +	/* Head in list of slices of this BO */
> +	struct list_head	slices;
> +	/* Total nents, for all slices of this BO */
> +	int			total_slice_nents;
> +	/*
> +	 * Direction of transfer. It can assume only two value DMA_TO_DEVICE and
> +	 * DMA_FROM_DEVICE.
> +	 */
> +	int			dir;
> +	/* The pointer of the DBC which operates on this BO */
> +	struct dma_bridge_chan	*dbc;
> +	/* Number of slices that belong to this buffer */
> +	u32			nr_slice;
> +	/* Number of slices that have been transferred by DMA engine */
> +	u32			nr_slice_xfer_done;
> +	/* TRUE = BO is queued for execution, FALSE = BO is not queued */
> +	bool			queued;
> +	/*
> +	 * If TRUE then user has attached slicing information to this BO by
> +	 * calling DRM_IOCTL_QAIC_ATTACH_SLICE_BO ioctl.
> +	 */
> +	bool			sliced;
> +	/* Request ID of this BO if it is queued for execution */
> +	u16			req_id;
> +	/* Handle assigned to this BO */
> +	u32			handle;
> +	/* Wait on this for completion of DMA transfer of this BO */
> +	struct completion	xfer_done;
> +	/*
> +	 * Node in linked list where head is dbc->xfer_list.
> +	 * This link list contain BO's that are queued for DMA transfer.
> +	 */
> +	struct list_head	xfer_list;
> +	/*
> +	 * Node in linked list where head is dbc->bo_lists.
> +	 * This link list contain BO's that are associated with the DBC it is
> +	 * linked to.
> +	 */
> +	struct list_head	bo_list;
> +	struct {
> +		/*
> +		 * Latest timestamp(ns) at which kernel received a request to
> +		 * execute this BO
> +		 */
> +		u64		req_received_ts;
> +		/*
> +		 * Latest timestamp(ns) at which kernel enqueued requests of
> +		 * this BO for execution in DMA queue
> +		 */
> +		u64		req_submit_ts;
> +		/*
> +		 * Latest timestamp(ns) at which kernel received a completion
> +		 * interrupt for requests of this BO
> +		 */
> +		u64		req_processed_ts;
> +		/*
> +		 * Number of elements already enqueued in DMA queue before
> +		 * enqueuing requests of this BO
> +		 */
> +		u32		queue_level_before;
> +	} perf_stats;
> +
> +};
> +
> +struct bo_slice {
> +	/* Mapped pages */
> +	struct sg_table		*sgt;
> +	/* Number of requests required to queue in DMA queue */
> +	int			nents;
> +	/* See enum dma_data_direction */
> +	int			dir;
> +	/* Actual requests that will be copied in DMA queue */
> +	struct dbc_req		*reqs;
> +	struct kref		ref_count;
> +	/* TRUE: No DMA transfer required */
> +	bool			no_xfer;
> +	/* Pointer to the parent BO handle */
> +	struct qaic_bo		*bo;
> +	/* Node in list of slices maintained by parent BO */
> +	struct list_head	slice;
> +	/* Size of this slice in bytes */
> +	u64			size;
> +	/* Offset of this slice in buffer */
> +	u64			offset;
> +};
> +
> +int get_dbc_req_elem_size(void);
> +int get_dbc_rsp_elem_size(void);
> +int get_cntl_version(struct qaic_device *qdev, struct qaic_user *usr,
> +		     u16 *major, u16 *minor);
> +int qaic_manage_ioctl(struct drm_device *dev, void *data,
> +		      struct drm_file *file_priv);
> +int qaic_execute_ioctl(struct qaic_device *qdev, struct qaic_user *usr,
> +		       unsigned long arg, bool is_partial);
> +int qaic_wait_exec_ioctl(struct qaic_device *qdev, struct qaic_user *usr,
> +			 unsigned long arg);
> +int qaic_query_ioctl(struct qaic_device *qdev, struct qaic_user *usr,
> +		     unsigned long arg);
> +int qaic_data_mmap(struct qaic_device *qdev, struct qaic_user *usr,
> +		   struct vm_area_struct *vma);
> +int qaic_data_get_reservation(struct qaic_device *qdev, struct qaic_user *usr,
> +			      void *data, u32 *partition_id,
> +			      u16 *remove);
> +void qaic_mhi_ul_xfer_cb(struct mhi_device *mhi_dev,
> +			 struct mhi_result *mhi_result);
> +
> +void qaic_mhi_dl_xfer_cb(struct mhi_device *mhi_dev,
> +			 struct mhi_result *mhi_result);
> +
> +int qaic_control_open(struct qaic_device *qdev);
> +void qaic_control_close(struct qaic_device *qdev);
> +void qaic_release_usr(struct qaic_device *qdev, struct qaic_user *usr);
> +
> +irqreturn_t dbc_irq_threaded_fn(int irq, void *data);
> +irqreturn_t dbc_irq_handler(int irq, void *data);
> +int disable_dbc(struct qaic_device *qdev, u32 dbc_id, struct qaic_user *usr);
> +void enable_dbc(struct qaic_device *qdev, u32 dbc_id, struct qaic_user *usr);
> +void wakeup_dbc(struct qaic_device *qdev, u32 dbc_id);
> +void release_dbc(struct qaic_device *qdev, u32 dbc_id);
> +
> +void wake_all_cntl(struct qaic_device *qdev);
> +void qaic_dev_reset_clean_local_state(struct qaic_device *qdev, bool exit_reset);
> +
> +struct drm_gem_object *qaic_gem_prime_import(struct drm_device *dev,
> +					     struct dma_buf *dma_buf);
> +
> +int qaic_create_bo_ioctl(struct drm_device *dev, void *data,
> +			 struct drm_file *file_priv);
> +int qaic_mmap_bo_ioctl(struct drm_device *dev, void *data,
> +		       struct drm_file *file_priv);
> +int qaic_attach_slice_bo_ioctl(struct drm_device *dev, void *data,
> +			       struct drm_file *file_priv);
> +int qaic_execute_bo_ioctl(struct drm_device *dev, void *data,
> +			  struct drm_file *file_priv);
> +int qaic_partial_execute_bo_ioctl(struct drm_device *dev, void *data,
> +				  struct drm_file *file_priv);
> +int qaic_wait_bo_ioctl(struct drm_device *dev, void *data,
> +		       struct drm_file *file_priv);
> +int qaic_test_print_bo_ioctl(struct drm_device *dev, void *data,
> +			     struct drm_file *file_priv);
> +int qaic_perf_stats_bo_ioctl(struct drm_device *dev, void *data,
> +			     struct drm_file *file_priv);

You don't need to break these lines. You can use up to 100 columns in the whole driver.
It will be more readable and checkpatch won't complain.

> +void irq_polling_work(struct work_struct *work);
> +
> +#endif /* QAICINTERNAL_H_ */
> diff --git a/drivers/accel/qaic/qaic_drv.c b/drivers/accel/qaic/qaic_drv.c
> new file mode 100644
> index 0000000..602a784
> --- /dev/null
> +++ b/drivers/accel/qaic/qaic_drv.c
> @@ -0,0 +1,771 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +
> +/* Copyright (c) 2019-2021, The Linux Foundation. All rights reserved. */
> +/* Copyright (c) 2021-2023 Qualcomm Innovation Center, Inc. All rights reserved. */
> +
> +#include <linux/delay.h>
> +#include <linux/dma-mapping.h>
> +#include <linux/idr.h>
> +#include <linux/interrupt.h>
> +#include <linux/list.h>
> +#include <linux/kref.h>
> +#include <linux/mhi.h>
> +#include <linux/module.h>
> +#include <linux/msi.h>
> +#include <linux/mutex.h>
> +#include <linux/pci.h>
> +#include <linux/sched.h>

Is <linux/sched.h> used here?
Feels like there are a couple of other unused includes in this file.

> +#include <linux/spinlock.h>
> +#include <linux/workqueue.h>
> +#include <linux/wait.h>
> +#include <drm/drm_accel.h>
> +#include <drm/drm_drv.h>
> +#include <drm/drm_file.h>
> +#include <drm/drm_gem.h>
> +#include <drm/drm_ioctl.h>
> +#include <uapi/drm/qaic_accel.h>
> +
> +#include "mhi_controller.h"
> +#include "mhi_qaic_ctrl.h"
> +#include "qaic.h"
> +
> +MODULE_IMPORT_NS(DMA_BUF);
> +
> +#define PCI_DEV_AIC100			0xa100
> +#define QAIC_NAME			"qaic"
> +#define QAIC_DESC			"Qualcomm Cloud AI Accelerators"
> +
> +static unsigned int datapath_polling;
> +module_param(datapath_polling, uint, 0400);
> +bool poll_datapath;
> +
> +static u16 cntl_major = 5;
> +static u16 cntl_minor;/* 0 */

Missing space before the comment.
Also, you could convert both vars to macros as they are constants.

> +static bool link_up;
> +static DEFINE_IDA(qaic_usrs);
> +
> +static int qaic_create_drm_device(struct qaic_device *qdev, s32 partition_id,
> +				  struct qaic_user *owner);
> +static void qaic_destroy_drm_device(struct qaic_device *qdev, s32 partition_id,
> +				    struct qaic_user *owner);
> +
> +static void free_usr(struct kref *kref)
> +{
> +	struct qaic_user *usr = container_of(kref, struct qaic_user, ref_count);
> +
> +	cleanup_srcu_struct(&usr->qddev_lock);
> +	ida_free(&qaic_usrs, usr->handle);
> +	kfree(usr);
> +}
> +
> +static int qaic_open(struct drm_device *dev, struct drm_file *file)
> +{
> +	struct qaic_drm_device *qddev = dev->dev_private;
> +	struct qaic_device *qdev = qddev->qdev;
> +	struct qaic_user *usr;
> +	int rcu_id;
> +	int ret;
> +
> +	rcu_id = srcu_read_lock(&qdev->dev_lock);
> +	if (qdev->in_reset) {
> +		srcu_read_unlock(&qdev->dev_lock, rcu_id);
> +		return -ENODEV;
> +	}
> +
> +	usr = kmalloc(sizeof(*usr), GFP_KERNEL);
> +	if (!usr) {
> +		srcu_read_unlock(&qdev->dev_lock, rcu_id);
> +		return -ENOMEM;
> +	}
> +
> +	usr->handle = ida_alloc(&qaic_usrs, GFP_KERNEL);
> +	if (usr->handle < 0) {
> +		ret = usr->handle;
> +		srcu_read_unlock(&qdev->dev_lock, rcu_id);
> +		kfree(usr);
> +		return ret;
> +	}
> +	usr->qddev = qddev;
> +	atomic_set(&usr->chunk_id, 0);
> +	init_srcu_struct(&usr->qddev_lock);
> +	kref_init(&usr->ref_count);
> +
> +	ret = mutex_lock_interruptible(&qddev->users_mutex);
> +	if (ret) {
> +		cleanup_srcu_struct(&usr->qddev_lock);
> +		kfree(usr);
> +		srcu_read_unlock(&qdev->dev_lock, rcu_id);
> +		return ret;
> +	}
> +
> +	list_add(&usr->node, &qddev->users);
> +	mutex_unlock(&qddev->users_mutex);
> +
> +	file->driver_priv = usr;
> +
> +	srcu_read_unlock(&qdev->dev_lock, rcu_id);
> +	return 0;
> +}
> +
> +static void qaic_postclose(struct drm_device *dev, struct drm_file *file)
> +{
> +	struct qaic_user *usr = file->driver_priv;
> +	struct qaic_drm_device *qddev;
> +	struct qaic_device *qdev;
> +	int qdev_rcu_id;
> +	int usr_rcu_id;
> +	int i;
> +
> +	qddev = usr->qddev;
> +	usr_rcu_id = srcu_read_lock(&usr->qddev_lock);
> +	if (qddev) {
> +		qdev = qddev->qdev;
> +		qdev_rcu_id = srcu_read_lock(&qdev->dev_lock);
> +		if (!qdev->in_reset) {
> +			qaic_release_usr(qdev, usr);
> +			for (i = 0; i < qdev->num_dbc; ++i)
> +				if (qdev->dbc[i].usr &&
> +				    qdev->dbc[i].usr->handle == usr->handle)
> +					release_dbc(qdev, i);
> +
> +			/* Remove child devices */
> +			if (qddev->partition_id == QAIC_NO_PARTITION)
> +				qaic_destroy_drm_device(qdev, QAIC_NO_PARTITION, usr);
> +		}
> +		srcu_read_unlock(&qdev->dev_lock, qdev_rcu_id);
> +
> +		mutex_lock(&qddev->users_mutex);
> +		if (!list_empty(&usr->node))
> +			list_del_init(&usr->node);
> +		mutex_unlock(&qddev->users_mutex);
> +	}
> +
> +	srcu_read_unlock(&usr->qddev_lock, usr_rcu_id);
> +	kref_put(&usr->ref_count, free_usr);
> +
> +	file->driver_priv = NULL;
> +}
> +
> +static int qaic_part_dev_ioctl(struct drm_device *dev, void *data,
> +			       struct drm_file *file_priv)
> +{
> +	struct qaic_device *qdev;
> +	struct qaic_user *usr;
> +	u32 partition_id;
> +	int qdev_rcu_id;
> +	int usr_rcu_id;
> +	int ret = 0;
> +	u16 remove;
> +
> +	usr = file_priv->driver_priv;
> +	if (!usr)
> +		return -EINVAL;
> +
> +	usr_rcu_id = srcu_read_lock(&usr->qddev_lock);
> +	if (!usr->qddev) {
> +		srcu_read_unlock(&usr->qddev_lock, usr_rcu_id);
> +		return -ENODEV;
> +	}
> +
> +	qdev = usr->qddev->qdev;
> +	if (!qdev) {
> +		srcu_read_unlock(&usr->qddev_lock, usr_rcu_id);
> +		return -ENODEV;
> +	}
> +
> +	qdev_rcu_id = srcu_read_lock(&qdev->dev_lock);
> +	if (qdev->in_reset) {
> +		ret = -ENODEV;
> +		goto out;
> +	}
> +
> +	/* This IOCTL is only supported for base devices. */
> +	if (usr->qddev->partition_id != QAIC_NO_PARTITION) {
> +		ret = -ENOTTY;
> +		goto out;
> +	}
> +
> +	ret = qaic_data_get_reservation(qdev, usr, data, &partition_id,
> +					&remove);
> +	if (ret)
> +		goto out;
> +
> +	if (remove == 1)
> +		qaic_destroy_drm_device(qdev, partition_id, usr);
> +	else
> +		ret = qaic_create_drm_device(qdev, partition_id, usr);
> +
> +out:
> +	srcu_read_unlock(&qdev->dev_lock, qdev_rcu_id);
> +	srcu_read_unlock(&usr->qddev_lock, usr_rcu_id);
> +
> +	return ret;
> +}
> +
> +DEFINE_DRM_ACCEL_FOPS(qaic_accel_fops);
> +
> +static const struct drm_ioctl_desc qaic_drm_ioctls[] = {
> +	DRM_IOCTL_DEF_DRV(QAIC_MANAGE, qaic_manage_ioctl, DRM_RENDER_ALLOW),
> +	DRM_IOCTL_DEF_DRV(QAIC_CREATE_BO, qaic_create_bo_ioctl, DRM_RENDER_ALLOW),
> +	DRM_IOCTL_DEF_DRV(QAIC_MMAP_BO, qaic_mmap_bo_ioctl, DRM_RENDER_ALLOW),
> +	DRM_IOCTL_DEF_DRV(QAIC_ATTACH_SLICE_BO, qaic_attach_slice_bo_ioctl, DRM_RENDER_ALLOW),
> +	DRM_IOCTL_DEF_DRV(QAIC_EXECUTE_BO, qaic_execute_bo_ioctl, DRM_RENDER_ALLOW),
> +	DRM_IOCTL_DEF_DRV(QAIC_PARTIAL_EXECUTE_BO, qaic_partial_execute_bo_ioctl, DRM_RENDER_ALLOW),
> +	DRM_IOCTL_DEF_DRV(QAIC_WAIT_BO, qaic_wait_bo_ioctl, DRM_RENDER_ALLOW),
> +	DRM_IOCTL_DEF_DRV(QAIC_PERF_STATS_BO, qaic_perf_stats_bo_ioctl, DRM_RENDER_ALLOW),
> +	DRM_IOCTL_DEF_DRV(QAIC_PART_DEV, qaic_part_dev_ioctl, DRM_RENDER_ALLOW),
> +};
> +
> +static const struct drm_driver qaic_accel_driver = {
> +	.driver_features	= DRIVER_GEM | DRIVER_COMPUTE_ACCEL,
> +
> +	.name			= QAIC_NAME,
> +	.desc			= QAIC_DESC,
> +	.date			= "20190618",
> +
> +	.fops			= &qaic_accel_fops,
> +	.open			= qaic_open,
> +	.postclose		= qaic_postclose,
> +
> +	.ioctls			= qaic_drm_ioctls,
> +	.num_ioctls		= ARRAY_SIZE(qaic_drm_ioctls),
> +	.prime_fd_to_handle	= drm_gem_prime_fd_to_handle,
> +	.gem_prime_import	= qaic_gem_prime_import,
> +};
> +
> +static int qaic_create_drm_device(struct qaic_device *qdev, s32 partition_id,
> +				  struct qaic_user *owner)
> +{
> +	struct qaic_drm_device *qddev;
> +	struct drm_device *ddev;
> +	struct device *pdev;
> +	int ret;
> +
> +	/*
> +	 * Partition id QAIC_NO_PARTITION indicates that the device was created
> +	 * on mhi_probe and id > QAIC_NO_PARTITION indicates a partition
> +	 * created using IOCTL. So, pdev for primary device is the pci dev and
> +	 * the parent for partition dev is the primary device.
> +	 */
> +	if (partition_id == QAIC_NO_PARTITION)
> +		pdev = &qdev->pdev->dev;
> +	else
> +		pdev = qdev->base_dev->ddev->dev;
> +
> +	qddev = kzalloc(sizeof(*qddev), GFP_KERNEL);
> +	if (!qddev) {
> +		ret = -ENOMEM;
> +		goto qddev_fail;
> +	}
> +
> +	ddev = drm_dev_alloc(&qaic_accel_driver, pdev);
> +	if (IS_ERR(ddev)) {
> +		ret = PTR_ERR(ddev);
> +		goto ddev_fail;
> +	}
> +
> +	ddev->dev_private = qddev;
> +	qddev->ddev = ddev;
> +
> +	if (partition_id == QAIC_NO_PARTITION)
> +		qdev->base_dev = qddev;
> +	qddev->qdev = qdev;
> +	qddev->partition_id = partition_id;
> +	qddev->owner = owner;
> +	INIT_LIST_HEAD(&qddev->users);
> +	mutex_init(&qddev->users_mutex);
> +
> +	mutex_lock(&qdev->qaic_drm_devices_mutex);
> +	list_add(&qddev->node, &qdev->qaic_drm_devices);
> +	mutex_unlock(&qdev->qaic_drm_devices_mutex);
> +
> +	ret = drm_dev_register(ddev, 0);
> +	if (ret) {
> +		pci_dbg(qdev->pdev, "%s: drm_dev_register failed %d\n", __func__, ret);
> +		goto drm_reg_fail;
> +	}
> +
> +	return 0;
> +
> +drm_reg_fail:
> +	mutex_destroy(&qddev->users_mutex);
> +	mutex_lock(&qdev->qaic_drm_devices_mutex);
> +	list_del(&qddev->node);
> +	mutex_unlock(&qdev->qaic_drm_devices_mutex);
> +	if (partition_id == QAIC_NO_PARTITION)
> +		qdev->base_dev = NULL;
> +	drm_dev_put(ddev);
> +ddev_fail:
> +	kfree(qddev);
> +qddev_fail:
> +	return ret;
> +}
> +
> +static void qaic_destroy_drm_device(struct qaic_device *qdev, s32 partition_id,
> +				    struct qaic_user *owner)
> +{
> +	struct qaic_drm_device *qddev;
> +	struct qaic_drm_device *q;
> +	struct qaic_user *usr;
> +
> +	list_for_each_entry_safe(qddev, q, &qdev->qaic_drm_devices, node) {
> +		/*
> +		 * Skip devices in case we just want to remove devices
> +		 * specific to an owner or partition id.
> +		 *
> +		 * owner	partition_id	notes
> +		 * ----------------------------------
> +		 * NULL		NO_PARTITION	delete base + all derived (qdev
> +		 *				reset)
> +		 * !NULL	NO_PARTITION	delete derived devs created by
> +		 *				owner.
> +		 * !NULL	>NO_PARTITION	delete derived dev identified by
> +		 *				the partition id and created by
> +		 *				owner
> +		 * NULL		>NO_PARTITION	invalid (no-op)
> +		 *
> +		 * if partition_id is any value < QAIC_NO_PARTITION this will be
> +		 * a no-op.
> +		 */
> +		if (owner && owner != qddev->owner)
> +			continue;
> +
> +		if (partition_id != QAIC_NO_PARTITION &&
> +		    partition_id != qddev->partition_id && !owner)
> +			continue;
> +
> +		/*
> +		 * Existing users get unresolvable errors till they close FDs.
> +		 * Need to sync carefully with users calling close().  The
> +		 * list of users can be modified elsewhere when the lock isn't
> +		 * held here, but the sync'ing the srcu with the mutex held
> +		 * could deadlock.  Grab the mutex so that the list will be
> +		 * unmodified.  The user we get will exist as long as the
> +		 * lock is held.  Signal that the qcdev is going away, and
> +		 * grab a reference to the user so they don't go away for
> +		 * synchronize_srcu().  Then release the mutex to avoid
> +		 * deadlock and make sure the user has observed the signal.
> +		 * With the lock released, we cannot maintain any state of the
> +		 * user list.
> +		 */
> +		mutex_lock(&qddev->users_mutex);
> +		while (!list_empty(&qddev->users)) {
> +			usr = list_first_entry(&qddev->users, struct qaic_user,
> +					       node);
> +			list_del_init(&usr->node);
> +			kref_get(&usr->ref_count);
> +			usr->qddev = NULL;
> +			mutex_unlock(&qddev->users_mutex);
> +			synchronize_srcu(&usr->qddev_lock);
> +			kref_put(&usr->ref_count, free_usr);
> +			mutex_lock(&qddev->users_mutex);
> +		}
> +		mutex_unlock(&qddev->users_mutex);
> +
> +		if (qddev->ddev) {
> +			drm_dev_unregister(qddev->ddev);
> +			drm_dev_put(qddev->ddev);
> +		}
> +
> +		list_del(&qddev->node);
> +		kfree(qddev);
> +	}
> +}
> +
> +static int qaic_mhi_probe(struct mhi_device *mhi_dev,
> +			  const struct mhi_device_id *id)
> +{
> +	struct qaic_device *qdev;
> +	u16 major, minor;
> +	int ret;
> +
> +	/*
> +	 * Invoking this function indicates that the control channel to the
> +	 * device is available.  We use that as a signal to indicate that
> +	 * the device side firmware has booted.  The device side firmware
> +	 * manages the device resources, so we need to communicate with it
> +	 * via the control channel in order to utilize the device.  Therefore
> +	 * we wait until this signal to create the drm dev that userspace will
> +	 * use to control the device, because without the device side firmware,
> +	 * userspace can't do anything useful.
> +	 */
> +
> +	qdev = pci_get_drvdata(to_pci_dev(mhi_dev->mhi_cntrl->cntrl_dev));
> +
> +	qdev->in_reset = false;
> +
> +	dev_set_drvdata(&mhi_dev->dev, qdev);
> +	qdev->cntl_ch = mhi_dev;
> +
> +	ret = qaic_control_open(qdev);
> +	if (ret) {
> +		pci_dbg(qdev->pdev, "%s: control_open failed %d\n", __func__, ret);
> +		goto err;
> +	}
> +
> +	ret = get_cntl_version(qdev, NULL, &major, &minor);
> +	if (ret || major != cntl_major || minor > cntl_minor) {
> +		pci_err(qdev->pdev, "%s: Control protocol version (%d.%d) not supported.  Supported version is (%d.%d). Ret: %d\n",
> +			__func__, major, minor, cntl_major, cntl_minor, ret);
> +		ret = -EINVAL;
> +		goto close_control;
> +	}
> +
> +	ret = qaic_create_drm_device(qdev, QAIC_NO_PARTITION, NULL);
> +
> +	return ret;
> +
> +close_control:
> +	qaic_control_close(qdev);
> +err:
> +	return ret;
> +}
> +
> +static void qaic_mhi_remove(struct mhi_device *mhi_dev)
> +{

Add a comment here

> +}
> +
> +static void qaic_notify_reset(struct qaic_device *qdev)
> +{
> +	int i;
> +
> +	qdev->in_reset = true;
> +	/* wake up any waiters to avoid waiting for timeouts at sync */
> +	wake_all_cntl(qdev);
> +	for (i = 0; i < qdev->num_dbc; ++i)
> +		wakeup_dbc(qdev, i);
> +	synchronize_srcu(&qdev->dev_lock);
> +}
> +
> +void qaic_dev_reset_clean_local_state(struct qaic_device *qdev, bool exit_reset)
> +{
> +	int i;
> +
> +	qaic_notify_reset(qdev);
> +
> +	/* remove drmdevs to prevent new users from coming in */
> +	if (qdev->base_dev)
> +		qaic_destroy_drm_device(qdev, QAIC_NO_PARTITION, NULL);
> +
> +	/* start tearing things down */
> +	for (i = 0; i < qdev->num_dbc; ++i)
> +		release_dbc(qdev, i);
> +
> +	if (exit_reset)
> +		qdev->in_reset = false;
> +}
> +
> +static int qaic_pci_probe(struct pci_dev *pdev,
> +			  const struct pci_device_id *id)

Please try to simplify this function. Maybe move irq init to separate function.
It will be more readable and there will be less error handling spaghetti at the bottom.

> +{
> +	int ret;
> +	int i;
> +	int mhi_irq;
> +	struct qaic_device *qdev;
> +
> +	qdev = devm_kzalloc(&pdev->dev, sizeof(*qdev), GFP_KERNEL);
> +	if (!qdev)
> +		return -ENOMEM;
> +
> +	if (id->device == PCI_DEV_AIC100) {
> +		qdev->num_dbc = 16;
> +		qdev->dbc = devm_kcalloc(&pdev->dev, qdev->num_dbc, sizeof(*qdev->dbc),
> +					 GFP_KERNEL);
> +		if (!qdev->dbc)
> +			return -ENOMEM;
> +	}
> +
> +	qdev->cntl_wq = alloc_workqueue("qaic_cntl", WQ_UNBOUND, 0);
> +	if (!qdev->cntl_wq) {
> +		ret = -ENOMEM;
> +		goto wq_fail;
> +	}
> +	pci_set_drvdata(pdev, qdev);
> +	qdev->pdev = pdev;
> +	mutex_init(&qdev->cntl_mutex);
> +	INIT_LIST_HEAD(&qdev->cntl_xfer_list);
> +	init_srcu_struct(&qdev->dev_lock);
> +	INIT_LIST_HEAD(&qdev->qaic_drm_devices);
> +	mutex_init(&qdev->qaic_drm_devices_mutex);
> +	for (i = 0; i < qdev->num_dbc; ++i) {
> +		spin_lock_init(&qdev->dbc[i].xfer_lock);
> +		qdev->dbc[i].qdev = qdev;
> +		qdev->dbc[i].id = i;
> +		INIT_LIST_HEAD(&qdev->dbc[i].xfer_list);
> +		init_srcu_struct(&qdev->dbc[i].ch_lock);
> +		init_waitqueue_head(&qdev->dbc[i].dbc_release);
> +		INIT_LIST_HEAD(&qdev->dbc[i].bo_lists);
> +	}
> +
> +	qdev->bars = pci_select_bars(pdev, IORESOURCE_MEM);
> +
> +	/* make sure the device has the expected BARs */
> +	if (qdev->bars != (BIT(0) | BIT(2) | BIT(4))) {
> +		pci_dbg(pdev, "%s: expected BARs 0, 2, and 4 not found in device.  Found 0x%x\n",
> +			__func__, qdev->bars);
> +		ret = -EINVAL;
> +		goto bar_fail;
> +	}
> +
> +	ret = pci_enable_device(pdev);
> +	if (ret)
> +		goto enable_fail;
> +
> +	ret = pci_request_selected_regions(pdev, qdev->bars, "aic100");
> +	if (ret)
> +		goto request_regions_fail;
> +
> +	pci_set_master(pdev);
> +
> +	ret = dma_set_mask(&pdev->dev, DMA_BIT_MASK(64));
> +	if (ret)
> +		goto dma_mask_fail;
> +	ret = dma_set_coherent_mask(&pdev->dev, DMA_BIT_MASK(64));
> +	if (ret)
> +		goto dma_mask_fail;
> +	ret = dma_set_max_seg_size(&pdev->dev, UINT_MAX);
> +	if (ret)
> +		goto dma_mask_fail;
> +
> +	qdev->bar_0 = pci_ioremap_bar(pdev, 0);
> +	if (!qdev->bar_0) {
> +		ret = -ENOMEM;
> +		goto ioremap_0_fail;
> +	}
> +
> +	qdev->bar_2 = pci_ioremap_bar(pdev, 2);
> +	if (!qdev->bar_2) {
> +		ret = -ENOMEM;
> +		goto ioremap_2_fail;
> +	}
> +
> +	for (i = 0; i < qdev->num_dbc; ++i)
> +		qdev->dbc[i].dbc_base = qdev->bar_2 + QAIC_DBC_OFF(i);
> +
> +	ret = pci_alloc_irq_vectors(pdev, 1, 32, PCI_IRQ_MSI);
> +	if (ret < 0)
> +		goto alloc_irq_fail;
> +
> +	if (ret < 32) {
> +		pci_err(pdev, "%s: Requested 32 MSIs.  Obtained %d MSIs which is less than the 32 required.\n",
> +			__func__, ret);
> +		ret = -ENODEV;
> +		goto invalid_msi_config;
> +	}
> +
> +	mhi_irq = pci_irq_vector(pdev, 0);
> +	if (mhi_irq < 0) {
> +		ret = mhi_irq;
> +		goto get_mhi_irq_fail;
> +	}
> +
> +	for (i = 0; i < qdev->num_dbc; ++i) {
> +		ret = devm_request_threaded_irq(&pdev->dev,
> +						pci_irq_vector(pdev, i + 1),
> +						dbc_irq_handler,
> +						dbc_irq_threaded_fn,
> +						IRQF_SHARED,
> +						"qaic_dbc",
> +						&qdev->dbc[i]);
> +		if (ret)
> +			goto get_dbc_irq_failed;
> +
> +		if (poll_datapath) {
> +			qdev->dbc[i].irq = pci_irq_vector(pdev, i + 1);
> +			disable_irq_nosync(qdev->dbc[i].irq);
> +			INIT_WORK(&qdev->dbc[i].poll_work, irq_polling_work);
> +		}
> +	}
> +
> +	qdev->mhi_cntl = qaic_mhi_register_controller(pdev, qdev->bar_0, mhi_irq);
> +	if (IS_ERR(qdev->mhi_cntl)) {
> +		ret = PTR_ERR(qdev->mhi_cntl);
> +		goto mhi_register_fail;
> +	}
> +
> +	return 0;
> +
> +mhi_register_fail:
> +get_dbc_irq_failed:

I don't think that duplicated goto statements are allowed by the coding style.
These should rather be named something like err_free_irq.
See https://www.kernel.org/doc/html/v4.10/process/coding-style.html#centralized-exiting-of-functions

> +	for (i = 0; i < qdev->num_dbc; ++i)
> +		devm_free_irq(&pdev->dev, pci_irq_vector(pdev, i + 1),
> +			      &qdev->dbc[i]);
> +get_mhi_irq_fail:
> +invalid_msi_config:
> +	pci_free_irq_vectors(pdev);
> +alloc_irq_fail:
> +	iounmap(qdev->bar_2);
> +ioremap_2_fail:
> +	iounmap(qdev->bar_0);
> +ioremap_0_fail:
> +dma_mask_fail:
> +	pci_clear_master(pdev);
> +	pci_release_selected_regions(pdev, qdev->bars);
> +request_regions_fail:
> +	pci_disable_device(pdev);
> +enable_fail:
> +	pci_set_drvdata(pdev, NULL);
> +bar_fail:
> +	for (i = 0; i < qdev->num_dbc; ++i)
> +		cleanup_srcu_struct(&qdev->dbc[i].ch_lock);
> +	cleanup_srcu_struct(&qdev->dev_lock);
> +	destroy_workqueue(qdev->cntl_wq);
> +wq_fail:
> +	return ret;
> +}
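
For what it's worth, one possible shape for the DBC IRQ setup as its own
helper, purely as a sketch of the suggestion above and reusing the symbols
from this patch (not a tested change):

  static int qaic_init_dbc_irqs(struct qaic_device *qdev, struct pci_dev *pdev)
  {
          int i, ret;

          for (i = 0; i < qdev->num_dbc; ++i) {
                  ret = devm_request_threaded_irq(&pdev->dev, pci_irq_vector(pdev, i + 1),
                                                  dbc_irq_handler, dbc_irq_threaded_fn,
                                                  IRQF_SHARED, "qaic_dbc", &qdev->dbc[i]);
                  if (ret) {
                          /* undo only what this helper requested so far */
                          while (--i >= 0)
                                  devm_free_irq(&pdev->dev, pci_irq_vector(pdev, i + 1),
                                                &qdev->dbc[i]);
                          return ret;
                  }

                  if (poll_datapath) {
                          qdev->dbc[i].irq = pci_irq_vector(pdev, i + 1);
                          disable_irq_nosync(qdev->dbc[i].irq);
                          INIT_WORK(&qdev->dbc[i].poll_work, irq_polling_work);
                  }
          }

          return 0;
  }

qaic_pci_probe() would then have a single call to unwind on failure, and the
remaining labels could be named after what they undo (err_free_irq_vectors,
err_unmap_bars, ...) per the coding-style document.
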
> +
> +static void qaic_pci_remove(struct pci_dev *pdev)
> +{
> +	struct qaic_device *qdev = pci_get_drvdata(pdev);
> +	int i;
> +
> +	if (!qdev)
> +		return;
> +
> +	qaic_dev_reset_clean_local_state(qdev, false);
> +	qaic_mhi_free_controller(qdev->mhi_cntl, link_up);
> +	for (i = 0; i < qdev->num_dbc; ++i) {
> +		devm_free_irq(&pdev->dev, pci_irq_vector(pdev, i + 1),
> +			      &qdev->dbc[i]);
> +		cleanup_srcu_struct(&qdev->dbc[i].ch_lock);
> +	}
> +	destroy_workqueue(qdev->cntl_wq);
> +	pci_free_irq_vectors(pdev);
> +	iounmap(qdev->bar_0);
> +	pci_clear_master(pdev);
> +	pci_release_selected_regions(pdev, qdev->bars);
> +	pci_disable_device(pdev);
> +	pci_set_drvdata(pdev, NULL);
> +}
> +
> +static void qaic_pci_shutdown(struct pci_dev *pdev)
> +{
> +	/* see qaic_exit for what link_up is doing */
> +	link_up = true;
> +	qaic_pci_remove(pdev);
> +}
> +
> +static pci_ers_result_t qaic_pci_error_detected(struct pci_dev *pdev,
> +						pci_channel_state_t error)
> +{
> +	return PCI_ERS_RESULT_NEED_RESET;
> +}
> +
> +static void qaic_pci_reset_prepare(struct pci_dev *pdev)
> +{
> +	struct qaic_device *qdev = pci_get_drvdata(pdev);
> +
> +	qaic_notify_reset(qdev);
> +	qaic_mhi_start_reset(qdev->mhi_cntl);
> +	qaic_dev_reset_clean_local_state(qdev, false);
> +}
> +
> +static void qaic_pci_reset_done(struct pci_dev *pdev)
> +{
> +	struct qaic_device *qdev = pci_get_drvdata(pdev);
> +
> +	qdev->in_reset = false;
> +	qaic_mhi_reset_done(qdev->mhi_cntl);
> +}
> +
> +static const struct mhi_device_id qaic_mhi_match_table[] = {
> +	{ .chan = "QAIC_CONTROL", },
> +	{},
> +};
> +
> +static struct mhi_driver qaic_mhi_driver = {
> +	.id_table = qaic_mhi_match_table,
> +	.remove = qaic_mhi_remove,
> +	.probe = qaic_mhi_probe,
> +	.ul_xfer_cb = qaic_mhi_ul_xfer_cb,
> +	.dl_xfer_cb = qaic_mhi_dl_xfer_cb,
> +	.driver = {
> +		.name = "qaic_mhi",
> +	},
> +};
> +
> +static const struct pci_device_id qaic_ids[] = {
> +	{ PCI_DEVICE(PCI_VENDOR_ID_QCOM, PCI_DEV_AIC100), },
> +	{ }
> +};
> +MODULE_DEVICE_TABLE(pci, qaic_ids);
> +
> +static const struct pci_error_handlers qaic_pci_err_handler = {
> +	.error_detected = qaic_pci_error_detected,
> +	.reset_prepare = qaic_pci_reset_prepare,
> +	.reset_done = qaic_pci_reset_done,
> +};
> +
> +static struct pci_driver qaic_pci_driver = {
> +	.name = QAIC_NAME,
> +	.id_table = qaic_ids,
> +	.probe = qaic_pci_probe,
> +	.remove = qaic_pci_remove,
> +	.shutdown = qaic_pci_shutdown,
> +	.err_handler = &qaic_pci_err_handler,
> +};
> +
> +static int __init qaic_init(void)
> +{
> +	int ret;
> +
> +	if (datapath_polling)
> +		poll_datapath = true;
> +
> +	ret = mhi_driver_register(&qaic_mhi_driver);
> +	if (ret) {
> +		pr_debug("qaic: mhi_driver_register failed %d\n", ret);
> +		goto free_class;
> +	}
> +
> +	ret = pci_register_driver(&qaic_pci_driver);
> +	if (ret) {
> +		pr_debug("qaic: pci_register_driver failed %d\n", ret);
> +		goto free_mhi;
> +	}
> +
> +	ret = mhi_qaic_ctrl_init();
> +	if (ret) {
> +		pr_debug("qaic: mhi_qaic_ctrl_init failed %d\n", ret);
> +		goto free_pci;
> +	}
> +
> +	return 0;
> +
> +free_pci:
> +	pci_unregister_driver(&qaic_pci_driver);
> +free_mhi:
> +	mhi_driver_unregister(&qaic_mhi_driver);
> +free_class:

This label doesn't free anything. It should be renamed.

> +	return ret;
> +}
> +
> +static void __exit qaic_exit(void)
> +{
> +	/*
> +	 * We assume that qaic_pci_remove() is called due to a hotplug event
> +	 * which would mean that the link is down, and thus
> +	 * qaic_mhi_free_controller() should not try to access the device during
> +	 * cleanup.
> +	 * We call pci_unregister_driver() below, which also triggers
> +	 * qaic_pci_remove(), but since this is module exit, we expect the link
> +	 * to the device to be up, in which case qaic_mhi_free_controller()
> +	 * should try to access the device during cleanup to put the device in
> +	 * a sane state.
> +	 * For that reason, we set link_up here to let qaic_mhi_free_controller
> +	 * know the expected link state.  Since the module is going to be
> +	 * removed at the end of this, we don't need to worry about
> +	 * reinitializing the link_up state after the cleanup is done.
> +	 */
> +	link_up = true;

Maybe you could just use pdev->current_state instead of link_up?

> +	mhi_qaic_ctrl_deinit();
> +	pci_unregister_driver(&qaic_pci_driver);
> +	mhi_driver_unregister(&qaic_mhi_driver);
> +}
> +
> +module_init(qaic_init);
> +module_exit(qaic_exit);
> +
> +MODULE_AUTHOR(QAIC_DESC " Kernel Driver Team");
> +MODULE_DESCRIPTION(QAIC_DESC " Accel Driver");
> +MODULE_LICENSE("GPL");
> diff --git a/include/uapi/drm/qaic_accel.h b/include/uapi/drm/qaic_accel.h
> new file mode 100644
> index 0000000..d5fa6f5
> --- /dev/null
> +++ b/include/uapi/drm/qaic_accel.h
> @@ -0,0 +1,283 @@
> +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note
> + *
> + * Copyright (c) 2019-2020, The Linux Foundation. All rights reserved.
> + * Copyright (c) 2021-2023 Qualcomm Innovation Center, Inc. All rights reserved.
> + */
> +
> +#ifndef QAIC_ACCEL_H_
> +#define QAIC_ACCEL_H_
> +
> +#include <linux/ioctl.h>
> +#include <linux/types.h>

These two headers should not be needed here.

> +#include "drm.h"
> +
> +#if defined(__CPLUSPLUS)

Use lowercase here: __cplusplus

> +extern "C" {
> +#endif
> +
> +#define QAIC_MANAGE_MAX_MSG_LENGTH SZ_4K	/**<
> +						  * The length(4K) includes len and
> +						  * count fields of qaic_manage_msg
> +						  */

I guess these are doxygen style commands but you should be using kernel-doc here.
See https://docs.kernel.org/doc-guide/kernel-doc.html.
This can be used to verify the header:
$ scripts/kernel-doc -v -none include/uapi/drm/qaic_accel.h

> +
> +enum qaic_sem_flags {

Is there any specific reason for enums if all values are explicitly given?

> +	SEM_INSYNCFENCE =	0x1,

All these enums/defines end up in a global user space namespace.
I would advise to prefix everything with QAIC_ or DRM_QAIC_ (e.g. QAIC_SEM_INSYNCFENCE)
to avoid conflicts with other drivers or user space libs.

> +	SEM_OUTSYNCFENCE =	0x2,
> +};
> +
> +enum qaic_sem_cmd {
> +	SEM_NOP =		0,
> +	SEM_INIT =		1,
> +	SEM_INC =		2,
> +	SEM_DEC =		3,
> +	SEM_WAIT_EQUAL =	4,
> +	SEM_WAIT_GT_EQ =	5, /**< Greater than or equal */
> +	SEM_WAIT_GT_0 =		6, /**< Greater than 0 */
> +};
> +
> +enum qaic_manage_transaction_type {
> +	TRANS_UNDEFINED =			0,
> +	TRANS_PASSTHROUGH_FROM_USR =		1,
> +	TRANS_PASSTHROUGH_TO_USR =		2,
> +	TRANS_PASSTHROUGH_FROM_DEV =		3,
> +	TRANS_PASSTHROUGH_TO_DEV =		4,
> +	TRANS_DMA_XFER_FROM_USR =		5,
> +	TRANS_DMA_XFER_TO_DEV =			6,
> +	TRANS_ACTIVATE_FROM_USR =		7,
> +	TRANS_ACTIVATE_FROM_DEV =		8,
> +	TRANS_ACTIVATE_TO_DEV =			9,
> +	TRANS_DEACTIVATE_FROM_USR =		10,
> +	TRANS_DEACTIVATE_FROM_DEV =		11,
> +	TRANS_STATUS_FROM_USR =			12,
> +	TRANS_STATUS_TO_USR =			13,
> +	TRANS_STATUS_FROM_DEV =			14,
> +	TRANS_STATUS_TO_DEV =			15,
> +	TRANS_TERMINATE_FROM_DEV =		16,
> +	TRANS_TERMINATE_TO_DEV =		17,
> +	TRANS_DMA_XFER_CONT =			18,
> +	TRANS_VALIDATE_PARTITION_FROM_DEV =	19,
> +	TRANS_VALIDATE_PARTITION_TO_DEV =	20,
> +	TRANS_MAX =				21
> +};
> +
> +struct qaic_manage_trans_hdr {
> +	__u32 type;	/**< in, value from enum manage_transaction_type */
> +	__u32 len;	/**< in, length of this transaction, including the header */
> +};
> +
> +struct qaic_manage_trans_passthrough {
> +	struct qaic_manage_trans_hdr hdr;
> +	__u8 data[];	/**< in, userspace must encode in little endian */
> +};
> +
> +struct qaic_manage_trans_dma_xfer {
> +	struct qaic_manage_trans_hdr hdr;
> +	__u32 tag;	/**< in, device specific */
> +	__u32 count;	/**< in */
> +	__u64 addr;	/**< in, address of the data to transferred via DMA */
> +	__u64 size;	/**< in, length of the data to transferred via DMA */
> +};
> +
> +struct qaic_manage_trans_activate_to_dev {
> +	struct qaic_manage_trans_hdr hdr;
> +	__u32 queue_size;	/**<
> +				  * in, number of elements in DBC request
> +				  * and response queue
> +				  */
> +	__u32 eventfd;		/**< in */
> +	__u32 options;		/**< in, device specific */
> +	__u32 pad;		/**< pad must be 0 */
> +};
> +
> +struct qaic_manage_trans_activate_from_dev {
> +	struct qaic_manage_trans_hdr hdr;
> +	__u32 status;	/**< out, status of activate transaction */
> +	__u32 dbc_id;	/**< out, Identifier of assigned DMA Bridge channel */
> +	__u64 options;	/**< out */
> +};
> +
> +struct qaic_manage_trans_deactivate {
> +	struct qaic_manage_trans_hdr hdr;
> +	__u32 dbc_id;	/**< in, Identifier of assigned DMA Bridge channel */
> +	__u32 pad;	/**< pad must be 0 */
> +};
> +
> +struct qaic_manage_trans_status_to_dev {
> +	struct qaic_manage_trans_hdr hdr;
> +};
> +
> +struct qaic_manage_trans_status_from_dev {
> +	struct qaic_manage_trans_hdr hdr;
> +	__u16 major;	/**< out, major vesrion of NNC protocol used by device */
> +	__u16 minor;	/**< out, minor vesrion of NNC protocol used by device */

vesrion -> version

> +	__u32 status;	/**< out, status of query transaction  */
> +	__u64 status_flags;	/**<
> +				  * out
> +				  * 0    : If set then device has CRC check enabled
> +				  * 1:63 : Unused
> +				  */
> +};
> +
> +struct qaic_manage_msg {
> +	__u32 len;	/**< in, Length of valid data - ie sum of all transactions */
> +	__u32 count;	/**< in, Number of transactions in message */
> +	__u64 data;	/**< in, Pointer to array of transactions */
> +};
> +
> +struct qaic_create_bo {
> +	__u64 size;	/**< in, Size of BO in byte */
> +	__u32 handle;	/**< out, Returned GEM handle for the BO */
> +	__u32 pad;	/**< pad must be 0 */
> +};
> +
> +struct qaic_mmap_bo {
> +	__u32 handle;	/**< in, Handle for the BO being mapped. */

The comment is missleading. BO is not mapped by this ioctl().

> +	__u32 pad;	/**< pad must be 0 */
> +	__u64 offset;	/**<
> +			  * out, offset into the drm node to use for
> +			  * subsequent mmap call
> +			  */
> +};
> +
> +/**
> + * @brief semaphore command
> + */
> +struct qaic_sem {
> +	__u16 val;	/**< in, Only lower 12 bits are valid */
> +	__u8  index;	/**< in, Only lower 5 bits are valid */
> +	__u8  presync;	/**< in, 1 if presync operation, 0 if postsync */
> +	__u8  cmd;	/**< in, See enum sem_cmd */
> +	__u8  flags;	/**< in, See sem_flags for valid bits.  All others must be 0 */
> +	__u16 pad;	/**< pad must be 0 */
> +};
> +
> +struct qaic_attach_slice_entry {
> +	__u64 size;		/**< in, Size memory to allocate for this BO slice */
> +	struct qaic_sem	sem0;	/**< in, Must be zero if not valid */
> +	struct qaic_sem	sem1;	/**< in, Must be zero if not valid */
> +	struct qaic_sem	sem2;	/**< in, Must be zero if not valid */
> +	struct qaic_sem	sem3;	/**< in, Must be zero if not valid */
> +	__u64 dev_addr;		/**< in, Address in device to/from which data is copied */
> +	__u64 db_addr;		/**< in, Doorbell address */
> +	__u32 db_data;		/**< in, Data to write to doorbell */
> +	__u32 db_len;		/**<
> +				  * in, Doorbell length - 32, 16, or 8 bits.
> +				  * 0 means doorbell is inactive
> +				  */
> +	__u64 offset;		/**< in, Offset from start of buffer */
> +};
> +
> +struct qaic_attach_slice_hdr {
> +	__u32 count;	/**< in, Number of slices for this BO */
> +	__u32 dbc_id;	/**< in, Associate this BO with this DMA Bridge channel */
> +	__u32 handle;	/**< in, Handle of BO to which slicing information is to be attached */
> +	__u32 dir;	/**< in, Direction of data: 1 = DMA_TO_DEVICE, 2 = DMA_FROM_DEVICE */
> +	__u64 size;	/**<
> +			  * in, Total length of BO
> +			  * If BO is imported (DMABUF/PRIME) then this size
> +			  * should not exceed the size of DMABUF provided.
> +			  * If BO is allocated using DRM_IOCTL_QAIC_CREATE_BO
> +			  * then this size should be exactly same as the size
> +			  * provided during DRM_IOCTL_QAIC_CREATE_BO.
> +			  */
> +};
> +
> +struct qaic_attach_slice {
> +	struct qaic_attach_slice_hdr hdr;
> +	__u64 data;	/**<
> +			  * in, Pointer to a buffer which is container of
> +			  * struct qaic_attach_slice_entry[]
> +			  */
> +};
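
For reference, attaching a trivial single-slice layout (the whole BO as one
slice, no doorbell, no semaphores) would presumably look something like the
below, continuing the create/mmap sketch above. dev_addr and the dbc_id are
placeholders for values the UMD would get from the workload/activate step:

	struct qaic_attach_slice_entry entry = {
		.size = create.size,		/* whole BO in one slice */
		.dev_addr = dev_addr,		/* placeholder device address */
		.db_len = 0,			/* doorbell inactive */
		.offset = 0,			/* slice starts at BO offset 0 */
	};
	struct qaic_attach_slice attach = {
		.hdr = {
			.count = 1,
			.dbc_id = 0,		/* from the activate response */
			.handle = create.handle,
			.dir = 1,		/* DMA_TO_DEVICE */
			.size = create.size,	/* must match CREATE_BO size */
		},
		.data = (uintptr_t)&entry,
	};

	ioctl(fd, DRM_IOCTL_QAIC_ATTACH_SLICE_BO, &attach);
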
> +
> +struct qaic_execute_entry {
> +	__u32 handle;	/**< in, buffer handle */
> +	__u32 dir;	/**< in, 1 = to device, 2 = from device */
> +};
> +
> +struct qaic_partial_execute_entry {
> +	__u32 handle;	/**< in, buffer handle */
> +	__u32 dir;	/**< in, 1 = to device, 2 = from device */
> +	__u64 resize;	/**< in, 0 = no resize */
> +};
> +
> +struct qaic_execute_hdr {
> +	__u32 count;	/**< in, number of executes following this header */
> +	__u32 dbc_id;	/**< in, Identifier of assigned DMA Bridge channel */
> +};
> +
> +struct qaic_execute {
> +	struct qaic_execute_hdr hdr;
> +	__u64 data;	/**< in, qaic_execute_entry or qaic_partial_execute_entry container */
> +};
> +
> +struct qaic_wait {
> +	__u32 handle;	/**< in, handle to wait on until execute is complete */
> +	__u32 timeout;	/**< in, timeout for wait(in ms) */
> +	__u32 dbc_id;	/**< in, Identifier of assigned DMA Bridge channel */
> +	__u32 pad;	/**< pad must be 0 */
> +};
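
And the execute/wait pairing, as I understand it (non-blocking submit, then
block on WAIT_BO until the BO has been processed) - again just a sketch
continuing the example above, not the real UMD code:

	struct qaic_execute_entry exec_entry = {
		.handle = create.handle,
		.dir = 1,			/* to device */
	};
	struct qaic_execute exec = {
		.hdr = { .count = 1, .dbc_id = 0 },
		.data = (uintptr_t)&exec_entry,
	};
	struct qaic_wait wait = {
		.handle = create.handle,
		.timeout = 5000,		/* ms */
		.dbc_id = 0,
	};

	ioctl(fd, DRM_IOCTL_QAIC_EXECUTE_BO, &exec);	/* queues, does not wait */
	ioctl(fd, DRM_IOCTL_QAIC_WAIT_BO, &wait);	/* blocks until done/timeout */
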
> +
> +struct qaic_perf_stats_hdr {
> +	__u16 count;	/**< in, Total number of BOs requested */
> +	__u16 pad;	/**< pad must be 0 */
> +	__u32 dbc_id;	/**< in, Identifier of assigned DMA Bridge channel */
> +};
> +
> +struct qaic_perf_stats {
> +	struct qaic_perf_stats_hdr hdr;
> +	__u64 data;	/**< in, qaic_perf_stats_entry container */
> +};
> +
> +struct qaic_perf_stats_entry {
> +	__u32 handle;			/**< in, Handle of the memory request */
> +	__u32 queue_level_before;	/**<
> +					  * out, Number of elements in queue
> +					  * before submission given memory request
> +					  */
> +	__u32 num_queue_element;	/**<
> +					  * out, Number of elements to add in the
> +					  * queue for given memory request
> +					  */
> +	__u32 submit_latency_us;	/**<
> +					  * out, Time taken by kernel to submit
> +					  * the request to device
> +					  */
> +	__u32 device_latency_us;	/**<
> +					  * out, Time taken by device to execute the
> +					  * request. 0 if request is not completed
> +					  */
> +	__u32 pad;			/**< pad must be 0 */
> +};
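
Same idea for the stats query, if I read it right - the caller passes an
array of entries keyed by BO handle and the kernel fills in the out fields
(sketch only, values as in the earlier examples):

	struct qaic_perf_stats_entry stat = { .handle = create.handle };
	struct qaic_perf_stats query = {
		.hdr = { .count = 1, .dbc_id = 0 },
		.data = (uintptr_t)&stat,
	};

	ioctl(fd, DRM_IOCTL_QAIC_PERF_STATS_BO, &query);
	/* stat.submit_latency_us / stat.device_latency_us now hold the timings */
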
> +
> +struct qaic_part_dev {
> +	__u32 partition_id;	/**< in, reservation id */
> +	__u16 remove;		/**< in, 1 - Remove device 0 - Create device */
> +	__u16 pad;		/**< pad must be 0 */
> +};
> +
> +#define DRM_QAIC_MANAGE				0x00
> +#define DRM_QAIC_CREATE_BO			0x01
> +#define DRM_QAIC_MMAP_BO			0x02

I know that the MMAP_BO ioctl is common in drm drivers, but in my opinion it is a very poor name.
I suggest naming it BO_INFO so that in the future you could extend it with other BO params besides
the mmap offset.

> +#define DRM_QAIC_ATTACH_SLICE_BO		0x03
> +#define DRM_QAIC_EXECUTE_BO			0x04
> +#define DRM_QAIC_PARTIAL_EXECUTE_BO		0x05
> +#define DRM_QAIC_WAIT_BO			0x06
> +#define DRM_QAIC_PERF_STATS_BO			0x07
> +#define DRM_QAIC_PART_DEV			0x08
> +
> +#define DRM_IOCTL_QAIC_MANAGE			DRM_IOWR(DRM_COMMAND_BASE + DRM_QAIC_MANAGE, struct qaic_manage_msg)
> +#define DRM_IOCTL_QAIC_CREATE_BO		DRM_IOWR(DRM_COMMAND_BASE + DRM_QAIC_CREATE_BO,	struct qaic_create_bo)
> +#define DRM_IOCTL_QAIC_MMAP_BO			DRM_IOWR(DRM_COMMAND_BASE + DRM_QAIC_MMAP_BO, struct qaic_mmap_bo)
> +#define DRM_IOCTL_QAIC_ATTACH_SLICE_BO		DRM_IOW(DRM_COMMAND_BASE + DRM_QAIC_ATTACH_SLICE_BO, struct qaic_attach_slice)
> +#define DRM_IOCTL_QAIC_EXECUTE_BO		DRM_IOW(DRM_COMMAND_BASE + DRM_QAIC_EXECUTE_BO,	struct qaic_execute)
> +#define DRM_IOCTL_QAIC_PARTIAL_EXECUTE_BO	DRM_IOW(DRM_COMMAND_BASE + DRM_QAIC_PARTIAL_EXECUTE_BO,	struct qaic_execute)
> +#define DRM_IOCTL_QAIC_WAIT_BO			DRM_IOW(DRM_COMMAND_BASE + DRM_QAIC_WAIT_BO, struct qaic_wait)
> +#define DRM_IOCTL_QAIC_PERF_STATS_BO		DRM_IOWR(DRM_COMMAND_BASE + DRM_QAIC_PERF_STATS_BO, struct qaic_perf_stats)
> +#define DRM_IOCTL_QAIC_PART_DEV			DRM_IOWR(DRM_COMMAND_BASE + DRM_QAIC_PART_DEV, struct qaic_part_dev)
> +
> +#if defined(__CPLUSPLUS)

Use lowercase here: __cplusplus

> +}
> +#endif
> +
> +#endif /* QAIC_ACCEL_H_ */

Regards,
Jacek


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 1/8] accel/qaic: Add documentation for AIC100 accelerator driver
  2023-02-15 15:41       ` Jeffrey Hugo
  (?)
@ 2023-02-16 14:18       ` Jacek Lawrynowicz
  2023-02-16 15:20           ` Jeffrey Hugo
  -1 siblings, 1 reply; 68+ messages in thread
From: Jacek Lawrynowicz @ 2023-02-16 14:18 UTC (permalink / raw)
  To: Jeffrey Hugo, ogabbay, airlied, daniel, dri-devel
  Cc: linux-doc, linux-arm-msm, quic_ajitpals, quic_pkanojiy,
	stanislaw.gruszka, quic_carlv

Hi,

On 15.02.2023 16:41, Jeffrey Hugo wrote:
> On 2/14/2023 4:08 AM, Jacek Lawrynowicz wrote:
>> Hi,
> 
> Thank you for the review.
> 
>> On 06.02.2023 16:41, Jeffrey Hugo wrote:
>>> The Qualcomm Cloud AI 100 (AIC100) device is an Artificial Intelligence
>>> accelerator PCIe card.  It contains a number of components both in the
>>> SoC and on the card which facilitate running workloads:
>>>
>>> QSM: management processor
>>> NSPs: workload compute units
>>> DMA Bridge: dedicated data mover for the workloads
>>> MHI: multiplexed communication channels
>>> DDR: workload storage and memory
>>>
>>> The Linux kernel driver for AIC100 is called "QAIC" and is located in the
>>> accel subsystem.
>>>
>>> Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
>>> Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com>
>>> ---
>>>   Documentation/accel/index.rst       |   1 +
>>>   Documentation/accel/qaic/aic100.rst | 498 ++++++++++++++++++++++++++++++++++++
>>>   Documentation/accel/qaic/index.rst  |  13 +
>>>   Documentation/accel/qaic/qaic.rst   | 169 ++++++++++++
>>>   4 files changed, 681 insertions(+)
>>>   create mode 100644 Documentation/accel/qaic/aic100.rst
>>>   create mode 100644 Documentation/accel/qaic/index.rst
>>>   create mode 100644 Documentation/accel/qaic/qaic.rst
>>>
>>> diff --git a/Documentation/accel/index.rst b/Documentation/accel/index.rst
>>> index 2b43c9a..e94a016 100644
>>> --- a/Documentation/accel/index.rst
>>> +++ b/Documentation/accel/index.rst
>>> @@ -8,6 +8,7 @@ Compute Accelerators
>>>      :maxdepth: 1
>>>        introduction
>>> +   qaic/index
>>>     .. only::  subproject and html
>>>   diff --git a/Documentation/accel/qaic/aic100.rst b/Documentation/accel/qaic/aic100.rst
>>> new file mode 100644
>>> index 0000000..773aa54
>>> --- /dev/null
>>> +++ b/Documentation/accel/qaic/aic100.rst
>>> @@ -0,0 +1,498 @@
>>> +.. SPDX-License-Identifier: GPL-2.0-only
>>> +
>>> +===============================
>>> + Qualcomm Cloud AI 100 (AIC100)
>>> +===============================
>>> +
>>> +Overview
>>> +========
>>> +
>>> +The Qualcomm Cloud AI 100/AIC100 family of products (including SA9000P - part of
>>> +Snapdragon Ride) are PCIe adapter cards which contain a dedicated SoC ASIC for
>>> +the purpose of efficiently running Artificial Intelligence (AI) Deep Learning
>>> +inference workloads.  They are AI accelerators.
>>
>> There are multiple double spaces in this document like this one above.
> 
> I presume you are referring to the double space after a period. Universally, that was the recommended style (APA guidebook, etc) until a little while ago.  Old habits are hard to break.  Will scrub.
> 
>>
>>> +The PCIe interface of AIC100 is capable of PCIe Gen4 speeds over eight lanes
>>> +(x8).  An individual SoC on a card can have up to 16 NSPs for running workloads.
>>> +Each SoC has an A53 management CPU.  On card, there can be up to 32 GB of DDR.
>>> +
>>> +Multiple AIC100 cards can be hosted in a single system to scale overall
>>> +performance.
>>> +
>>> +Hardware Description
>>> +====================
>>> +
>>> +An AIC100 card consists of an AIC100 SoC, on-card DDR, and a set of misc
>>> +peripherals (PMICs, etc).
>>> +
>>> +An AIC100 card can either be a PCIe HHHL form factor (a traditional PCIe card),
>>> +or a Dual M.2 card.  Both use PCIe to connect to the host system.
>>
>> Dual M.2 card? Is it a single PCB with two M.2 connectors? This requires custom
>> motherboard with x4 lanes from two connectors combined as a single PCIe device, right?
> 
> Yes.  There is a specification for this, although it hasn't gotten widespread adoption.  In addition to more lanes, you also get to draw more power.  Single M.2 is around 11W.  Dual M.2 is capped at 25W.
> 
> It tends to be a handy form factor for "edge" applications where the physical size and power draw of a "normal" PCIe slot (what you'd find on a regular ATX motherboard) are not desirable.
> 
>>
>>> +As a PCIe endpoint/adapter, AIC100 uses the standard VendorID(VID)/
>>> +DeviceID(DID) combination to uniquely identify itself to the host.  AIC100
>>> +uses the standard Qualcomm VID (0x17cb).  All AIC100 instances use the same
>>> +AIC100 DID (0xa100).
>>
>> Maybe "SKUs" would fit better here then "instances".
> 
> Sure.
> 
>>
>>> +AIC100 does not implement FLR (function level reset).
>>> +
>>> +AIC100 implements MSI but does not implement MSI-X.  AIC100 requires 17 MSIs to
>>> +operate (1 for MHI, 16 for the DMA Bridge).
>>> +
>>> +As a PCIe device, AIC100 utilizes BARs to provide host interfaces to the device
>>> +hardware.  AIC100 provides 3, 64-bit BARs.
>>> +
>>> +* The first BAR is 4K in size, and exposes the MHI interface to the host.
>>> +
>>> +* The second BAR is 2M in size, and exposes the DMA Bridge interface to the
>>> +  host.
>>> +
>>> +* The third BAR is variable in size based on an individual AIC100's
>>> +  configuration, but defaults to 64K.  This BAR currently has no purpose.
>>> +
>>> +From the host perspective, AIC100 has several key hardware components-
>>
>> Typo in "components-".
> 
> ?
> You want "components -"?

I meant "components:" but I guess the "-" is part of the format.

>>
>>> +* QSM (QAIC Service Manager)
>>> +* NSPs (Neural Signal Processor)
>>> +* DMA Bridge
>>> +* DDR
>>> +* MHI (Modem Host Interface)
>>> +
>>> +QSM
>>> +---
>>> +
>>> +QAIC Service Manager.  This is an ARM A53 CPU that runs the primary
>>> +firmware of the card and performs on-card management tasks.  It also
>>> +communicates with the host via MHI.  Each AIC100 has one of
>>> +these.
>>
>> I would put description of MHI at the top because it is referenced by the QSM description.
> 
> Sure.
> 
>>
>>> +NSP
>>> +---
>>> +
>>> +Neural Signal Processor.  Each AIC100 has up to 16 of these.  These are
>>> +the processors that run the workloads on AIC100.  Each NSP is a Qualcomm Hexagon
>>> +(Q6) DSP with HVX and HMX.  Each NSP can only run one workload at a time, but
>>> +multiple NSPs may be assigned to a single workload.  Since each NSP can only run
>>> +one workload, AIC100 is limited to 16 concurrent workloads.  Workload
>>> +"scheduling" is under the purview of the host.  AIC100 does not automatically
>>> +timeslice.
>>> +
>>> +DMA Bridge
>>> +----------
>>> +
>>> +The DMA Bridge is custom DMA engine that manages the flow of data
>>> +in and out of workloads.  AIC100 has one of these.  The DMA Bridge has 16
>>> +channels, each consisting of a set of request/response FIFOs.  Each active
>>> +workload is assigned a single DMA Bridge channel.  The DMA Bridge exposes
>>> +hardware registers to manage the FIFOs (head/tail pointers), but requires host
>>> +memory to store the FIFOs.
>>> +
>>> +DDR
>>> +---
>>> +
>>> +AIC100 has on-card DDR.  In total, an AIC100 can have up to 32 GB of DDR.
>>> +This DDR is used to store workloads, data for the workloads, and is used by the
>>> +QSM for managing the device.  NSPs are granted access to sections of the DDR by
>>> +the QSM.  The host does not have direct access to the DDR, and must make
>>> +requests to the QSM to transfer data to the DDR.
>>> +
>>> +MHI
>>> +---
>>> +
>>> +AIC100 has one MHI interface over PCIe.  MHI itself is documented at
>>
>> Please exand MHI acronym.
> 
> It's expanded about 40 lines up - "* MHI (Modem Host Interface)".  I generally go by the scheme of expanding an acronym the first time it is used in a document, and then just using the acronym thereafter.
> 
> Do you feel the expansion needs to be duplicated?  It might help when this section is moved to the top.
> 

No, it's fine.

>>
>>> +Documentation/mhi/index.rst  MHI is the mechanism the host uses to communicate
>>> +with the QSM.  Except for workload data via the DMA Bridge, all interaction with
>>> +he device occurs via MHI.
>>
>> Typo in "he device".
> 
> Doh.  Will fix.
> 
>>
>>> +High-level Use Flow
>>> +===================
>>> +
>>> +AIC100 is a programmable accelerator typically used for running
>>> +neural networks in inferencing mode to efficiently perform AI operations.
>>> +AIC100 is not intended for training neural networks.  AIC100 can be utilitized
>>
>> utilitized -> utilized
> 
> Sure
> 
>>
>>> +for generic compute workloads.
>>> +
>>> +Assuming a user wants to utilize AIC100, they would follow these steps:
>>> +
>>> +1. Compile the workload into an ELF targeting the NSP(s)
>>> +2. Make requests to the QSM to load the workload and related artifacts into the
>>> +   device DDR
>>> +3. Make a request to the QSM to activate the workload onto a set of idle NSPs
>>> +4. Make requests to the DMA Bridge to send input data to the workload to be
>>> +   processed, and other requests to receive processed output data from the
>>> +   workload.
>>> +5. Once the workload is no longer required, make a request to the QSM to
>>> +   deactivate the workload, thus putting the NSPs back into an idle state.
>>> +6. Once the workload and related artifacts are no longer needed for future
>>> +   sessions, make requests to the QSM to unload the data from DDR.  This frees
>>> +   the DDR to be used by other users.
>>> +
>>
>> Please specify if this is single or multi user device.
> 
> It is multi-user.  I will find a way to clarify that.
> 
>>
>>> +Boot Flow
>>> +=========
>>> +
>>> +AIC100 uses a flashless boot flow, derived from Qualcomm MSMs.
>>
>> What's MSM?
> 
> "Mobile Station Modem".  It is a Qualcomm term from the 80s, used to describe the "family" of Qualcomm phone SoCs.  It used to be that the model number would be "MSM8660" or "MSM8960", etc.  That has changed a bit in the past few years, but the products are still referred to as "MSMs".
> 
> It is a common term in the Qualcomm world, and "MSM" is more well known these days than what it stands for.  I don't think expanding it is going to add value.
> 

OK

>>
>>> +When AIC100 is first powered on, it begins executing PBL (Primary Bootloader)
>>> +from ROM.  PBL enumerates the PCIe link, and initializes the BHI (Boot Host
>>> +Interface) component of MHI.
>>> +
>>> +Using BHI, the host points PBL to the location of the SBL (Secondary Bootloader)
>>> +image.  The PBL pulls the image from the host, validates it, and begins
>>> +execution of SBL.
>>> +
>>> +SBL initializes MHI, and uses MHI to notify the host that the device has entered
>>> +the SBL stage.  SBL performs a number of operations:
>>> +
>>> +* SBL initializes the majority of hardware (anything PBL left uninitialized),
>>> +  including DDR.
>>> +* SBL offloads the bootlog to the host.
>>> +* SBL synchonizes timestamps with the host for future logging.
>>
>> synchonizes -> synchronizes
> 
> Yep
> 
>>
>>> +* SBL uses the Sahara protocol to obtain the runtime firmware images from the
>>> +  host.
>>> +
>>> +Once SBL has obtained and validated the runtime firmware, it brings the NSPs out
>>> +of reset, and jumps into the QSM.
>>> +
>>> +The QSM uses MHI to notify the host that the device has entered the QSM stage
>>> +(AMSS in MHI terms).  At this point, the AIC100 device is fully functional, and
>>> +ready to process workloads.
>>> +
>>> +Userspace components
>>> +====================
>>> +
>>> +Compiler
>>> +--------
>>> +
>>> +An open compiler for AIC100 based on upstream LLVM can be found at:
>>> +https://github.com/quic/software-kit-for-qualcomm-cloud-ai-100-cc
>>> +
>>> +Usermode Driver (UMD)
>>> +---------------------
>>> +
>>> +An open UMD that interfaces with the qaic kernel driver can be found at:
>>> +https://github.com/quic/software-kit-for-qualcomm-cloud-ai-100
>>
>> This repo is empty.
> 
> Correct.  That was mentioned in the cover letter.  Targeting to post content this week.  Just working through the last steps of our internal process.
> 
>>
>>> +
>>> +Sahara loader
>>> +-------------
>>> +
>>> +An open implementation of the Sahara protocol called kickstart can be found at:
>>> +https://github.com/andersson/qdl
>>> +
>>> +MHI Channels
>>> +============
>>> +
>>> +AIC100 defines a number of MHI channels for different purposes.  This is a list
>>> +of the defined channels, and their uses.
>>> +
>>> +| QAIC_LOOPBACK
>>> +| Channels 0/1
>>
>> I would use a comma or & here.
> 
> Interesting.  I can see that.  I think I like "&".  Will do that.
> 
>>
>>> +| Valid for AMSS
>>> +| Any data sent to the device on this channel is sent back to the host.
>>> +
>>> +| QAIC_SAHARA
>>> +| Channels 2/3
>>> +| Valid for SBL
>>> +| Used by SBL to obtain the runtime firmware from the host.
>>> +
>>> +| QAIC_DIAG
>>> +| Channels 4/5
>>> +| Valid for AMSS
>>> +| Used to communicate with QSM via the Diag protocol.
>>> +
>>> +| QAIC_SSR
>>> +| Channels 6/7
>>> +| Valid for AMSS
>>> +| Used to notify the host of subsystem restart events, and to offload SSR crashdumps.
>>> +
>>> +| QAIC_QDSS
>>> +| Channels 8/9
>>> +| Valid for AMSS
>>> +| Used for the Qualcomm Debug Subsystem.
>>> +
>>> +| QAIC_CONTROL
>>> +| Channels 10/11
>>> +| Valid for AMSS
>>> +| Used for the Neural Network Control (NNC) protocol.  This is the primary channel between host and QSM for managing workloads.
>>> +
>>> +| QAIC_LOGGING
>>> +| Channels 12/13
>>> +| Valid for SBL
>>> +| Used by the SBL to send the bootlog to the host.
>>> +
>>> +| QAIC_STATUS
>>> +| Channels 14/15
>>> +| Valid for AMSS
>>> +| Used to notify the host of Reliability, Accessability, Serviceability (RAS) events.
>>
>> Accessability -> Accessibility
> 
> Got it.
> 
>>
>>> +| QAIC_TELEMETRY
>>> +| Channels 16/17
>>> +| Valid for AMSS
>>> +| Used to get/set power/thermal/etc attributes.
>>> +
>>> +| QAIC_DEBUG
>>> +| Channels 18/19
>>> +| Valid for AMSS
>>> +| Not used.
>>> +
>>> +| QAIC_TIMESYNC
>>> +| Channels 20/21
>>> +| Valid for SBL/AMSS
>>> +| Used to synchronize timestamps in the device side logs with the host time source.
>>> +
>>> +DMA Bridge
>>> +==========
>>> +
>>> +Overview
>>> +--------
>>> +
>>> +The DMA Bridge is one of the main interfaces to the host from the device
>>> +(the other being MHI).  As part of activating a workload to run on NSPs, the QSM
>>> +assigns that network a DMA Bridge channel.  A workload's DMA Bridge channel
>>> +(DBC for short) is solely for the use of that workload and is not shared with
>>> +other workloads.
>>> +
>>> +Each DBC is a pair of FIFOs that manage data in and out of the workload.  One
>>> +FIFO is the request FIFO.  The other FIFO is the response FIFO.
>>> +
>>> +Each DBC contains 4 registers in hardware:
>>> +
>>> +* Request FIFO head pointer (offset 0x0).  Read only to the host.  Indicates the
>>
>> Read only _by_ the host.
> 
> Sure.
> 
>>
>>> +  latest item in the FIFO the device has consumed.
>>> +* Request FIFO tail pointer (offset 0x4).  Read/write by the host.  Host
>>> +  increments this register to add new items to the FIFO.
>>> +* Response FIFO head pointer (offset 0x8).  Read/write by the host.  Indicates
>>> +  the latest item in the FIFO the host has consumed.
>>> +* Response FIFO tail pointer (offset 0xc).  Read only to the host.  Device
>>
>> Read only _by_ the host.
> 
> Sure.
> 
>>
>>> +  increments this register to add new items to the FIFO.
>>> +
>>> +The values in each register are indexes in the FIFO.  To get the location of the
>>> +FIFO element pointed to by the register: FIFO base address + register * element
>>> +size.
>>> +
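
Just to confirm my reading of the pointer arithmetic above (names made up,
not the driver's code):

	/* element address for the index held in a head/tail register */
	void *elem = (char *)fifo_base + (size_t)reg_value * elem_size;
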
>>> +DBC registers are exposed to the host via the second BAR.  Each DBC consumes
>>> +0x1000 of space in the BAR.
>>
>> I wouldn't use hex for the sizes. 4KB seems a lot more readable.
> 
> Good point.  Will do.
> 
>>
>>> +The actual FIFOs are backed by host memory.  When sending a request to the QSM
>>> +to activate a network, the host must donate memory to be used for the FIFOs.
>>> +Due to internal mapping limitations of the device, a single contigious chunk of
>>
>> contigious -> contiguous
> 
> Got it.
> 
>>
>>> +memory must be provided per DBC, which hosts both FIFOs.  The request FIFO will
>>> +consume the beginning of the memory chunk, and the response FIFO will consume
>>> +the end of the memory chunk.
>>> +
>>> +Request FIFO
>>> +------------
>>> +
>>> +A request FIFO element has the following structure:
>>> +
>>> +| {
>>> +|    u16 req_id;
>>> +|    u8  seq_id;
>>> +|    u8  pcie_dma_cmd;
>>> +|    u32 reserved;
>>> +|    u64 pcie_dma_source_addr;
>>> +|    u64 pcie_dma_dest_addr;
>>> +|    u32 pcie_dma_len;
>>> +|    u32 reserved;
>>> +|    u64 doorbell_addr;
>>> +|    u8  doorbell_attr;
>>> +|    u8  reserved;
>>> +|    u16 reserved;
>>> +|    u32 doorbell_data;
>>> +|    u32 sem_cmd0;
>>> +|    u32 sem_cmd1;
>>> +|    u32 sem_cmd2;
>>> +|    u32 sem_cmd3;
>>> +| }
>>> +
>>> +Request field descriptions:
>>> +
>>> +| req_id- request ID.  A request FIFO element and a response FIFO element with
>>> +|         the same request ID refer to the same command.
>>> +
>>> +| seq_id- sequence ID within a request.  Ignored by the DMA Bridge.
>>> +
>>> +| pcie_dma_cmd- describes the DMA element of this request.
>>> +|     Bit(7) is the force msi flag, which overrides the DMA Bridge MSI logic
>>> +|         and generates a MSI when this request is complete, and QSM
>>> +|         configures the DMA Bridge to look at this bit.
>>> +|     Bits(6:5) are reserved.
>>> +|     Bit(4) is the completion code flag, and indicates that the DMA Bridge
>>> +|         shall generate a response FIFO element when this request is
>>> +|         complete.
>>> +|     Bit(3) indicates if this request is a linked list transfer(0) or a bulk
>>> +|         transfer(1).
>>> +|     Bit(2) is reserved.
>>> +|     Bits(1:0) indicate the type of transfer.  No transfer(0), to device(1),
>>> +|         from device(2).  Value 3 is illegal.
>>> +
>>> +| pcie_dma_source_addr- source address for a bulk transfer, or the address of
>>> +|         the linked list.
>>> +
>>> +| pcie_dma_dest_addr- destination address for a bulk transfer.
>>> +
>>> +| pcie_dma_len- length of the bulk transfer.  Note that the size of this field
>>> +|     limits transfers to 4G in size.
>>> +
>>> +| doorbell_addr- address of the doorbell to ring when this request is complete.
>>> +
>>> +| doorbell_attr- doorbell attributes.
>>> +|     Bit(7) indicates if a write to a doorbell is to occur.
>>> +|     Bits(6:2) are reserved.
>>> +|     Bits(1:0) contain the encoding of the doorbell length.  0 is 32-bit,
>>> +|         1 is 16-bit, 2 is 8-bit, 3 is reserved.  The doorbell address
>>> +|         must be naturally aligned to the specified length.
>>> +
>>> +| doorbell_data- data to write to the doorbell.  Only the bits corresponding to
>>> +|     the doorbell length are valid.
>>> +
>>> +| sem_cmdN- semaphore command.
>>> +|     Bit(31) indicates this semaphore command is enabled.
>>> +|     Bit(30) is the to-device DMA fence.  Block this request until all
>>> +|         to-device DMA transfers are complete.
>>> +|     Bit(29) is the from-device DMA fence.  Block this request until all
>>> +|         from-device DMA transfers are complete.
>>> +|     Bits(28:27) are reserved.
>>> +|     Bits(26:24) are the semaphore command.  0 is NOP.  1 is init with the
>>> +|         specified value.  2 is increment.  3 is decrement.  4 is wait
>>> +|         until the semaphore is equal to the specified value.  5 is wait
>>> +|         until the semaphore is greater or equal to the specified value.
>>> +|         6 is "P", wait until semaphore is greater than 0, then
>>> +|         decrement by 1.  7 is reserved.
>>> +|     Bit(23) is reserved.
>>> +|     Bit(22) is the semaphore sync.  0 is post sync, which means that the
>>> +|         semaphore operation is done after the DMA transfer.  1 is
>>> +|         presync, which gates the DMA transfer.  Only one presync is
>>> +|         allowed per request.
>>> +|     Bit(21) is reserved.
>>> +|     Bits(20:16) is the index of the semaphore to operate on.
>>> +|     Bits(15:12) are reserved.
>>> +|     Bits(11:0) are the semaphore value to use in operations.
>>
>> It seems to me like structure documentation
> 
> Yes.  It can be modeled that way.  However, the code comes later so it can't be referenced here yet.  I've got a todo to come back and clean that up once this series is merged.
> 
>>
>>> +Overall, a request is processed in 4 steps:
>>> +
>>> +1. If specified, the presync semaphore condition must be true
>>> +2. If enabled, the DMA transfer occurs
>>> +3. If specified, the postsync semaphore conditions must be true
>>> +4. If enabled, the doorbell is written
>>> +
>>> +By using the semaphores in conjunction with the workload running on the NSPs,
>>> +the data pipeline can be synchronized such that the host can queue multiple
>>> +requests of data for the workload to process, but the DMA Bridge will only copy
>>> +the data into the memory of the workload when the workload is ready to process
>>> +the next input.
>>> +
>>> +Response FIFO
>>> +-------------
>>> +
>>> +Once a request is fully processed, a response FIFO element is generated if
>>> +specified in pcie_dma_cmd.  The structure of a response FIFO element:
>>> +
>>> +| {
>>> +|     u16 req_id;
>>> +|     u16 completion_code;
>>> +| }
>>> +
>>> +req_id- matches the req_id of the request that generated this element.
>>> +
>>> +completion_code- status of this request.  0 is success.  non-zero is an error.
>>> +
>>> +The DMA Bridge will generate a MSI to the host as a reaction to activity in the
>>> +response FIFO of a DBC.  The DMA Bridge hardware has an IRQ storm mitigation
>>> +algorithm, where it will only generate a MSI when the response FIFO transitions
>>> +from empty to non-empty (unless force MSI is enabled and triggered).  In
>>> +response to this MSI, the host is expected to drain the response FIFO, and must
>>> +take care to handle any race conditions between draining the FIFO, and the
>>> +device inserting elements into the FIFO.
>>> +
>>> +Neural Network Control (NNC) Protocol
>>> +=====================================
>>> +
>>> +The NNC protocol is how the host makes requests to the QSM to manage workloads.
>>> +It uses the QAIC_CONTROL MHI channel.
>>> +
>>> +Each NNC request is packaged into a message.  Each message is a series of
>>> +transactions.  A passthrough type transaction can contain elements known as
>>> +commands.
>>> +
>>> +QSM requires NNC messages be little endian encoded and the fields be naturally
>>> +aligned.  Since there are 64-bit elements in some NNC messages, 64-bit alignment
>>> +must be maintained.
>>> +
>>> +A message contains a header and then a series of transactions.  A message may be
>>> +at most 4K in size from QSM to the host.  From the host to the QSM, a message
>>> +can be at most 64K (maximum size of a single MHI packet), but there is a
>>> +continuation feature where message N+1 can be marked as a continuation of
>>> +message N.  This is used for exceedingly large DMA xfer transactions.
>>> +
>>> +Transaction descriptions:
>>> +
>>> +passthrough- Allows userspace to send an opaque payload directly to the QSM.
>>> +This is used for NNC commands.  Userspace is responsible for managing
>>> +the QSM message requirements in the payload
>>> +
>>> +dma_xfer- DMA transfer.  Describes an object that the QSM should DMA into the
>>> +device via address and size tuples.
>>> +
>>> +activate- Activate a workload onto NSPs.  The host must provide memory to be
>>> +used by the DBC.
>>> +
>>> +deactivate- Deactivate an active workload and return the NSPs to idle.
>>> +
>>> +status- Query the QSM about it's NNC implementation.  Returns the NNC version,
>>> +and if CRC is used.
>>> +
>>> +terminate- Release a user's resources.
>>> +
>>> +dma_xfer_cont- Continuation of a previous DMA transfer.  If a DMA transfer
>>> +cannot be specified in a single message (highly fragmented), this
>>> +transaction can be used to specify more ranges.
>>> +
>>> +validate_partition- Query to QSM to determine if a partition identifier is
>>> +valid.
>>> +
>>> +Each message is tagged with a user id, and a partition id.  The user id allows
>>> +QSM to track resources, and release them when the user goes away (eg the process
>>> +crashes).  A partition id identifies the resource partition that QSM manages,
>>> +which this message applies to.
>>> +
>>> +Messages may have CRCs.  Messages should have CRCs applied until the QSM
>>> +reports via the status transaction that CRCs are not needed.  The QSM on the
>>> +SA9000P requires CRCs for black channel safing.
>>> +
>>> +Subsystem Restart (SSR)
>>> +=======================
>>> +
>>> +SSR is the concept of limiting the impact of an error.  An AIC100 device may
>>> +have multiple users, each with their own workload running.  If the workload of
>>> +one user crashes, the fallout of that should be limited to that workload and not
>>> +impact other workloads.  SSR accomplishes this.
>>> +
>>> +If a particular workload crashes, QSM notifies the host via the QAIC_SSR MHI
>>> +channel.  This notification identifies the workload by it's assigned DBC.  A
>>> +multi-stage recovery process is then used to cleanup both sides, and get the
>>> +DBC/NSPs into a working state.
>>> +
>>> +When SSR occurs, any state in the workload is lost.  Any inputs that were in
>>> +process, or queued by not yet serviced, are lost.  The loaded artifacts will
>>> +remain in on-card DDR, but the host will need to re-activate the workload if
>>> +it desires to recover the workload.
>>> +
>>> +Reliability, Accessability, Serviceability (RAS)
>>
>> Accessability -> Accessibility
> 
> Got it.
> 
>>
>>> +================================================
>>> +
>>> +AIC100 is expected to be deployed in server systems where RAS ideology is
>>> +applied.  Simply put, RAS is the concept of detecting, classifying, and
>>> +reporting errors.  While PCIe has AER (Advanced Error Reporting) which factors
>>> +into RAS, AER does not allow for a device to report details about internal
>>> +errors.  Therefore, AIC100 implements a custom RAS mechanism.  When a RAS event
>>> +occurs, QSM will report the event with appropriate details via the QAIC_STATUS
>>> +MHI channel.  A sysadmin may determine that a particular device needs
>>> +additional service based on RAS reports.
>>> +
>>> +Telemetry
>>> +=========
>>> +
>>> +QSM has the ability to report various physical attributes of the device, and in
>>> +some cases, to allow the host to control them.  Examples include thermal limits,
>>> +thermal readings, and power readings.  These items are communicated via the
>>> +QAIC_TELEMETRY MHI channel
>>> diff --git a/Documentation/accel/qaic/index.rst b/Documentation/accel/qaic/index.rst
>>> new file mode 100644
>>> index 0000000..ad19b88
>>> --- /dev/null
>>> +++ b/Documentation/accel/qaic/index.rst
>>> @@ -0,0 +1,13 @@
>>> +.. SPDX-License-Identifier: GPL-2.0-only
>>> +
>>> +=====================================
>>> + accel/qaic Qualcomm Cloud AI driver
>>> +=====================================
>>> +
>>> +The accel/qaic driver supports the Qualcomm Cloud AI machine learning
>>> +accelerator cards.
>>> +
>>> +.. toctree::
>>> +
>>> +   qaic
>>> +   aic100
>>> diff --git a/Documentation/accel/qaic/qaic.rst b/Documentation/accel/qaic/qaic.rst
>>> new file mode 100644
>>> index 0000000..b0e7a5f
>>> --- /dev/null
>>> +++ b/Documentation/accel/qaic/qaic.rst
>>> @@ -0,0 +1,169 @@
>>> +.. SPDX-License-Identifier: GPL-2.0-only
>>> +
>>> +=============
>>> + QAIC driver
>>> +=============
>>> +
>>> +The QAIC driver is the Kernel Mode Driver (KMD) for the AIC100 family of AI
>>> +accelerator products.
>>> +
>>> +Interrupts
>>> +==========
>>> +
>>> +While the AIC100 DMA Bridge hardware implements an IRQ storm mitigation
>>> +mechanism, it is still possible for an IRQ storm to occur.  A storm can happen
>>> +if the workload is particularly quick, and the host is responsive.  If the host
>>> +can drain the response FIFO as quickly as the device can insert elements into
>>> +it, then the device will frequently transition the response FIFO from empty to
>>> +non-empty and generate MSIs at a rate equilivelent to the speed of the
>>
>> equilivelent -> equivalent
> 
> Sure.
> 
>>
>>> +workload's ability to process inputs.  The lprnet (license plate reader network)
>>> +workload is known to trigger this condition, and can generate in excess of 100k
>>> +MSIs per second.  It has been observed that most systems cannot tolerate this
>>> +for long, and will crash due to some form of watchdog due to the overhead of
>>> +the interrupt controller interrupting the host CPU.
>>> +
>>> +To mitigate this issue, the QAIC driver implements specific IRQ handling.  When
>>> +QAIC receives an IRQ, it disables that line.  This prevents the interrupt
>>> +controller from interrupting the CPU.  Then AIC drains the FIFO.  Once the FIFO
>>> +is drained, QAIC implements a "last chance" polling algorithm where QAIC will
>>> +sleep for a time to see if the workload will generate more activity.  The IRQ
>>> +line remains disabled during this time.  If no activity is detected, QAIC exits
>>> +polling mode and reenables the IRQ line.
>>> +
>>> +This mitigation in QAIC is very effective.  The same lprnet usecase that
>>> +generates 100k IRQs per second (per /proc/interrupts) is reduced to roughly 64
>>> +IRQs over 5 minutes while keeping the host system stable, and having the same
>>> +workload throughput performance (within run to run noise variation).
>>> +
>>> +
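
Nice. So the handler shape is roughly the below, if I read the description
right (drain_response_fifo()/more_activity() are just stand-ins, not actual
QAIC functions; assumes the usual request_threaded_irq() pattern):

	static irqreturn_t dbc_irq(int irq, void *data)
	{
		disable_irq_nosync(irq);	/* stop the storm at the source */
		return IRQ_WAKE_THREAD;
	}

	static irqreturn_t dbc_irq_thread(int irq, void *data)
	{
		do {
			drain_response_fifo(data);
			/* "last chance" polling - wait a bit for more
			 * completions before paying for another interrupt */
		} while (more_activity(data));

		enable_irq(irq);
		return IRQ_HANDLED;
	}
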
>>> +Neural Network Control (NNC) Protocol
>>> +=====================================
>>> +
>>> +The implementation of NNC is split between the KMD (QAIC) and UMD.  In general
>>> +QAIC understands how to encode/decode NNC wire protocol, and elements of the
>>> +protocol which require kernelspace knowledge to process (for example, mapping
>>
>> kernelspace is missing a space :P
>>
>>> +host memory to device IOVAs).  QAIC understands the structure of a message, and
>>> +all of the transactions.  QAIC does not understand commands (the payload of a
>>> +passthrough transaction).
>>> +
>>> +QAIC handles and enforces the required little endianness and 64-bit alignment,
>>> +to the degree that it can.  Since QAIC does not know the contents of a
>>> +passthrough transaction, it relies on the UMD to saitsfy the requirements.
>>
>> saitsfy -> satisfy
> 
> Will do.
> 
>>
>>> +The terminate transaction is of particular use to QAIC.  QAIC is not aware of
>>> +the resources that are loaded onto a device since the majority of that activity
>>> +occurs within NNC commands.  As a result, QAIC does not have the means to
>>> +roll back userspace activity.  To ensure that a userspace client's resources
>>> +are fully released in the case of a process crash, or a bug, QAIC uses the
>>> +terminate command to let QSM know when a user has gone away, and the resources
>>> +can be released.
>>> +
>>> +QSM can report a version number of the NNC protocol it supports.  This is in the
>>> +form of a Major number and a Minor number.
>>> +
>>> +Major number updates indicate changes to the NNC protocol which impact the
>>> +message format, or transactions (impacts QAIC).
>>> +
>>> +Minor number updates indicate changes to the NNC protocol which impact the
>>> +commands (does not impact QAIC).
>>> +
>>> +uAPI
>>> +====
>>> +
>>> +QAIC defines a number of driver specific IOCTLs as part of the userspace API.
>>> +This section describes those APIs.
>>> +
>>> +DRM_IOCTL_QAIC_MANAGE:
>>> +This IOCTL allows userspace to send a NNC request to the QSM.  The call will
>>> +block until a response is received, or the request has timed out.
>>> +
>>> +DRM_IOCTL_QAIC_CREATE_BO:
>>> +This IOCTL allows userspace to allocate a buffer object (BO) which can send or
>>> +receive data from a workload.  The call will return a GEM handle that
>>> +represents the allocated buffer.  The BO is not usable until it has been sliced
>>> +(see DRM_IOCTL_QAIC_ATTACH_SLICE_BO).
>>> +
>>> +DRM_IOCTL_QAIC_MMAP_BO:
>>> +This IOCTL allows userspace to prepare an allocated BO to be mmap'd into the
>>> +userspace process.
>>> +
>>> +DRM_IOCTL_QAIC_ATTACH_SLICE_BO:
>>> +This IOCTL allows userspace to slice a BO in preparation for sending the BO to
>>> +the device.  Slicing is the operation of describing what portions of a BO get
>>> +sent where to a workload.  This requires a set of DMA transfers for the DMA
>>> +Bridge, and as such, locks the BO to a specific DBC.
>>> +
>>> +DRM_IOCTL_QAIC_EXECUTE_BO:
>>> +This IOCTL allows userspace to submit a set of sliced BOs to the device.  The
>>> +call is non-blocking.  Success only indicates that the BOs have been queued
>>> +to the device, but does not guarantee they have been executed.
>>> +
>>> +DRM_IOCTL_QAIC_PARTIAL_EXECUTE_BO:
>>> +This IOCTL operates like DRM_IOCTL_QAIC_EXECUTE_BO, but it allows userspace to
>>> +shrink the BOs sent to the device for this specific call.  If a BO typically has
>>> +N inputs, but only a subset of those is available, this IOCTL allows userspace
>>> +to indicate that only the first M bytes of the BO should be sent to the device
>>> +to minimize data transfer overhead.  This IOCTL dynamically recomputes the
>>> +slicing, and therefore has some processing overhead before the BOs can be queued
>>> +to the device.
>>> +
>>> +DRM_IOCTL_QAIC_WAIT_BO:
>>> +This IOCTL allows userspace to determine when a particular BO has been processed
>>> +by the device.  The call will block until either the BO has been processed and
>>> +can be re-queued to the device, or a timeout occurs.
>>> +
>>> +DRM_IOCTL_QAIC_PERF_STATS_BO:
>>> +This IOCTL allows userspace to collect performance statistics on the most
>>> +recent execution of a BO.  This allows userspace to construct an end to end
>>> +timeline of the BO processing for a performance analysis.
>>> +
>>> +DRM_IOCTL_QAIC_PART_DEV:
>>> +This IOCTL allows userspace to request a duplicate "shadow device".  This extra
>>> +accelN device is associated with a specific partition of resources on the AIC100
>>> +device and can be used for limiting a process to some subset of resources.
>>> +
>>> +Userspace Client Isolation
>>> +==========================
>>> +
>>> +AIC100 supports multiple clients.  Multiple DBCs can be consumed by a single
>>> +client, and multiple clients can each consume one or more DBCs.  Workloads
>>> +may contain sensistive information therefore only the client that owns the
>>
>> sensistive -> sensitive
> 
> Will do.
> 
>>
>>> +workload should be allowed to interface with the DBC.
>>> +
>>> +Clients are identified by the instance associated with their open().  A client
>>> +may only use memory they allocate, and DBCs that are assigned to their
>>> +workloads.  Attempts to access resources assigned to other clients will be
>>> +rejected.
>>> +
>>> +Module parameters
>>> +=================
>>> +
>>> +QAIC supports the following module parameters:
>>> +
>>> +**datapath_polling (bool)**
>>> +
>>> +Configures QAIC to use a polling thread for datapath events instead of relying
>>> +on the device interrupts.  Useful for platforms with broken multiMSI.  Must be
>>> +set at QAIC driver initialization.  Default is 0 (off).
>>> +
>>> +**mhi_timeout (int)**
>>> +
>>> +Sets the timeout value for MHI operations in milliseconds (ms).  Must be set
>>> +at the time the driver detects a device.  Default is 2000 (2 seconds).
>>> +
>>> +**control_resp_timeout (int)**
>>> +
>>> +Sets the timeout value for QSM responses to NNC messages in seconds (s).  Must
>>> +be set at the time the driver is sending a request to QSM.  Default is 60 (one
>>> +minute).
>>> +
>>> +**wait_exec_default_timeout (int)**
>>> +
>>> +Sets the default timeout for the wait_exec ioctl in milliseconds (ms).  Must be
>>> +set prior to the waic_exec ioctl call.  A value specified in the ioctl call
>>> +overrides this for that call.  Default is 5000 (5 seconds).
>>> +
>>> +**datapath_poll_interval_us (int)**
>>> +
>>> +Sets the polling interval in microseconds (us) when datapath polling is active.
>>> +Takes effect at the next polling interval.  Default is 100 (100 us).
>>
>> Cool that you are staring with the documentation :)
>> I suggest running at least "checkpatch.pl --codespell" on the series as there are many spelling issues.
> 
> That's what happened to checkpatch.  Thanks for pointing out the flag.  I remember it doing spellcheck by default.  Apparently it needs a flag now.  I'll do that going forward.
> 
>>
>> Regards,
>> Jacek
>>
>>
>>
> 

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 1/8] accel/qaic: Add documentation for AIC100 accelerator driver
  2023-02-16 14:18       ` Jacek Lawrynowicz
@ 2023-02-16 15:20           ` Jeffrey Hugo
  0 siblings, 0 replies; 68+ messages in thread
From: Jeffrey Hugo @ 2023-02-16 15:20 UTC (permalink / raw)
  To: Jacek Lawrynowicz, ogabbay, airlied, daniel, dri-devel
  Cc: linux-doc, linux-arm-msm, quic_ajitpals, quic_pkanojiy,
	stanislaw.gruszka, quic_carlv

On 2/16/2023 7:18 AM, Jacek Lawrynowicz wrote:
> Hi,
> 
> On 15.02.2023 16:41, Jeffrey Hugo wrote:
>> On 2/14/2023 4:08 AM, Jacek Lawrynowicz wrote:
>>> Hi,
>>
>> Thank you for the review.
>>
>>> On 06.02.2023 16:41, Jeffrey Hugo wrote:
>>>> The Qualcomm Cloud AI 100 (AIC100) device is an Artificial Intelligence
>>>> accelerator PCIe card.  It contains a number of components both in the
>>>> SoC and on the card which facilitate running workloads:
>>>>
>>>> QSM: management processor
>>>> NSPs: workload compute units
>>>> DMA Bridge: dedicated data mover for the workloads
>>>> MHI: multiplexed communication channels
>>>> DDR: workload storage and memory
>>>>
>>>> The Linux kernel driver for AIC100 is called "QAIC" and is located in the
>>>> accel subsystem.
>>>>
>>>> Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
>>>> Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com>
>>>> ---
>>>>    Documentation/accel/index.rst       |   1 +
>>>>    Documentation/accel/qaic/aic100.rst | 498 ++++++++++++++++++++++++++++++++++++
>>>>    Documentation/accel/qaic/index.rst  |  13 +
>>>>    Documentation/accel/qaic/qaic.rst   | 169 ++++++++++++
>>>>    4 files changed, 681 insertions(+)
>>>>    create mode 100644 Documentation/accel/qaic/aic100.rst
>>>>    create mode 100644 Documentation/accel/qaic/index.rst
>>>>    create mode 100644 Documentation/accel/qaic/qaic.rst
>>>>
>>>> diff --git a/Documentation/accel/index.rst b/Documentation/accel/index.rst
>>>> index 2b43c9a..e94a016 100644
>>>> --- a/Documentation/accel/index.rst
>>>> +++ b/Documentation/accel/index.rst
>>>> @@ -8,6 +8,7 @@ Compute Accelerators
>>>>       :maxdepth: 1
>>>>         introduction
>>>> +   qaic/index
>>>>      .. only::  subproject and html
>>>>    diff --git a/Documentation/accel/qaic/aic100.rst b/Documentation/accel/qaic/aic100.rst
>>>> new file mode 100644
>>>> index 0000000..773aa54
>>>> --- /dev/null
>>>> +++ b/Documentation/accel/qaic/aic100.rst
>>>> @@ -0,0 +1,498 @@
>>>> +.. SPDX-License-Identifier: GPL-2.0-only
>>>> +
>>>> +===============================
>>>> + Qualcomm Cloud AI 100 (AIC100)
>>>> +===============================
>>>> +
>>>> +Overview
>>>> +========
>>>> +
>>>> +The Qualcomm Cloud AI 100/AIC100 family of products (including SA9000P - part of
>>>> +Snapdragon Ride) are PCIe adapter cards which contain a dedicated SoC ASIC for
>>>> +the purpose of efficiently running Artificial Intelligence (AI) Deep Learning
>>>> +inference workloads.  They are AI accelerators.
>>>
>>> There are multiple double spaces in this document like this one above.
>>
>> I presume you are referring to the double space after a period. Universally, that was the recommended style (APA guidebook, etc) until a little while ago.  Old habits are hard to break.  Will scrub.
>>
>>>
>>>> +The PCIe interface of AIC100 is capable of PCIe Gen4 speeds over eight lanes
>>>> +(x8).  An individual SoC on a card can have up to 16 NSPs for running workloads.
>>>> +Each SoC has an A53 management CPU.  On card, there can be up to 32 GB of DDR.
>>>> +
>>>> +Multiple AIC100 cards can be hosted in a single system to scale overall
>>>> +performance.
>>>> +
>>>> +Hardware Description
>>>> +====================
>>>> +
>>>> +An AIC100 card consists of an AIC100 SoC, on-card DDR, and a set of misc
>>>> +peripherals (PMICs, etc).
>>>> +
>>>> +An AIC100 card can either be a PCIe HHHL form factor (a traditional PCIe card),
>>>> +or a Dual M.2 card.  Both use PCIe to connect to the host system.
>>>
>>> Dual M.2 card? Is it a single PCB with two M.2 connectors? This requires custom
>>> motherboard with x4 lanes from two connectors combined as a single PCIe device, right?
>>
>> Yes.  There is a specification for this, although it hasn't gotten widespread adoption.  In addition to more lanes, you also get to draw more power.  Single M.2 is around 11W.  Dual M.2 is capped at 25W.
>>
>> It tends to be a handy form factor for "edge" applications where the physical size and power draw of a "normal" PCIe slot (what you'd find on a regular ATX motherboard) are not desirable.
>>
>>>
>>>> +As a PCIe endpoint/adapter, AIC100 uses the standard VendorID(VID)/
>>>> +DeviceID(DID) combination to uniquely identify itself to the host.  AIC100
>>>> +uses the standard Qualcomm VID (0x17cb).  All AIC100 instances use the same
>>>> +AIC100 DID (0xa100).
>>>
>>> Maybe "SKUs" would fit better here then "instances".
>>
>> Sure.
>>
>>>
>>>> +AIC100 does not implement FLR (function level reset).
>>>> +
>>>> +AIC100 implements MSI but does not implement MSI-X.  AIC100 requires 17 MSIs to
>>>> +operate (1 for MHI, 16 for the DMA Bridge).
>>>> +
>>>> +As a PCIe device, AIC100 utilizes BARs to provide host interfaces to the device
>>>> +hardware.  AIC100 provides 3, 64-bit BARs.
>>>> +
>>>> +* The first BAR is 4K in size, and exposes the MHI interface to the host.
>>>> +
>>>> +* The second BAR is 2M in size, and exposes the DMA Bridge interface to the
>>>> +  host.
>>>> +
>>>> +* The third BAR is variable in size based on an individual AIC100's
>>>> +  configuration, but defaults to 64K.  This BAR currently has no purpose.
>>>> +
>>>> +From the host perspective, AIC100 has several key hardware components-
>>>
>>> Typo in "components-".
>>
>> ?
>> You want "components -"?
> 
> I meant "components:" but I guess the "-" is part of the format.

Ah, I see.  "components:" is the same thing to me.  Will change since it 
seems like it makes more sense to you.

> 
>>>
>>>> +* QSM (QAIC Service Manager)
>>>> +* NSPs (Neural Signal Processor)
>>>> +* DMA Bridge
>>>> +* DDR
>>>> +* MHI (Modem Host Interface)
>>>> +
>>>> +QSM
>>>> +---
>>>> +
>>>> +QAIC Service Manager.  This is an ARM A53 CPU that runs the primary
>>>> +firmware of the card and performs on-card management tasks.  It also
>>>> +communicates with the host via MHI.  Each AIC100 has one of
>>>> +these.
>>>
>>> I would put description of MHI at the top because it is referenced by the QSM description.
>>
>> Sure.
>>
>>>
>>>> +NSP
>>>> +---
>>>> +
>>>> +Neural Signal Processor.  Each AIC100 has up to 16 of these.  These are
>>>> +the processors that run the workloads on AIC100.  Each NSP is a Qualcomm Hexagon
>>>> +(Q6) DSP with HVX and HMX.  Each NSP can only run one workload at a time, but
>>>> +multiple NSPs may be assigned to a single workload.  Since each NSP can only run
>>>> +one workload, AIC100 is limited to 16 concurrent workloads.  Workload
>>>> +"scheduling" is under the purview of the host.  AIC100 does not automatically
>>>> +timeslice.
>>>> +
>>>> +DMA Bridge
>>>> +----------
>>>> +
>>>> +The DMA Bridge is custom DMA engine that manages the flow of data
>>>> +in and out of workloads.  AIC100 has one of these.  The DMA Bridge has 16
>>>> +channels, each consisting of a set of request/response FIFOs.  Each active
>>>> +workload is assigned a single DMA Bridge channel.  The DMA Bridge exposes
>>>> +hardware registers to manage the FIFOs (head/tail pointers), but requires host
>>>> +memory to store the FIFOs.
>>>> +
>>>> +DDR
>>>> +---
>>>> +
>>>> +AIC100 has on-card DDR.  In total, an AIC100 can have up to 32 GB of DDR.
>>>> +This DDR is used to store workloads, data for the workloads, and is used by the
>>>> +QSM for managing the device.  NSPs are granted access to sections of the DDR by
>>>> +the QSM.  The host does not have direct access to the DDR, and must make
>>>> +requests to the QSM to transfer data to the DDR.
>>>> +
>>>> +MHI
>>>> +---
>>>> +
>>>> +AIC100 has one MHI interface over PCIe.  MHI itself is documented at
>>>
>>> Please exand MHI acronym.
>>
>> It's expanded about 40 lines up - "* MHI (Modem Host Interface)".  I generally go by the scheme of expanding an acronym the first time it is used in a document, and then just using the acronym thereafter.
>>
>> Do you feel the expansion needs to be duplicated?  It might help when this section is moved to the top.
>>
> 
> No, it's fine.
> 
>>>
>>>> +Documentation/mhi/index.rst  MHI is the mechanism the host uses to communicate
>>>> +with the QSM.  Except for workload data via the DMA Bridge, all interaction with
>>>> +he device occurs via MHI.
>>>
>>> Typo in "he device".
>>
>> Doh.  Will fix.
>>
>>>
>>>> +High-level Use Flow
>>>> +===================
>>>> +
>>>> +AIC100 is a programmable accelerator typically used for running
>>>> +neural networks in inferencing mode to efficiently perform AI operations.
>>>> +AIC100 is not intended for training neural networks.  AIC100 can be utilitized
>>>
>>> utilitized -> utilized
>>
>> Sure
>>
>>>
>>>> +for generic compute workloads.
>>>> +
>>>> +Assuming a user wants to utilize AIC100, they would follow these steps:
>>>> +
>>>> +1. Compile the workload into an ELF targeting the NSP(s)
>>>> +2. Make requests to the QSM to load the workload and related artifacts into the
>>>> +   device DDR
>>>> +3. Make a request to the QSM to activate the workload onto a set of idle NSPs
>>>> +4. Make requests to the DMA Bridge to send input data to the workload to be
>>>> +   processed, and other requests to receive processed output data from the
>>>> +   workload.
>>>> +5. Once the workload is no longer required, make a request to the QSM to
>>>> +   deactivate the workload, thus putting the NSPs back into an idle state.
>>>> +6. Once the workload and related artifacts are no longer needed for future
>>>> +   sessions, make requests to the QSM to unload the data from DDR.  This frees
>>>> +   the DDR to be used by other users.
>>>> +
>>>
>>> Please specify if this is single or multi user device.
>>
>> It is multi-user.  I will find a way to clarify that.
>>
>>>
>>>> +Boot Flow
>>>> +=========
>>>> +
>>>> +AIC100 uses a flashless boot flow, derived from Qualcomm MSMs.
>>>
>>> What's MSM?
>>
>> "Mobile Station Modem".  It is a Qualcomm term from the 80s, used to describe the "family" of Qualcomm phone SoCs.  It used to be that the model number would be "MSM8660" or "MSM8960", etc.  That has changed a bit in the past few years, but the products are still referred to as "MSMs".
>>
>> It is a common term in the Qualcomm world, and "MSM" is more well known these days than what it stands for.  I don't think expanding it is going to add value.
>>
> 
> OK
> 
>>>
>>>> +When AIC100 is first powered on, it begins executing PBL (Primary Bootloader)
>>>> +from ROM.  PBL enumerates the PCIe link, and initializes the BHI (Boot Host
>>>> +Interface) component of MHI.
>>>> +
>>>> +Using BHI, the host points PBL to the location of the SBL (Secondary Bootloader)
>>>> +image.  The PBL pulls the image from the host, validates it, and begins
>>>> +execution of SBL.
>>>> +
>>>> +SBL initializes MHI, and uses MHI to notify the host that the device has entered
>>>> +the SBL stage.  SBL performs a number of operations:
>>>> +
>>>> +* SBL initializes the majority of hardware (anything PBL left uninitialized),
>>>> +  including DDR.
>>>> +* SBL offloads the bootlog to the host.
>>>> +* SBL synchonizes timestamps with the host for future logging.
>>>
>>> synchonizes -> synchronizes
>>
>> Yep
>>
>>>
>>>> +* SBL uses the Sahara protocol to obtain the runtime firmware images from the
>>>> +  host.
>>>> +
>>>> +Once SBL has obtained and validated the runtime firmware, it brings the NSPs out
>>>> +of reset, and jumps into the QSM.
>>>> +
>>>> +The QSM uses MHI to notify the host that the device has entered the QSM stage
>>>> +(AMSS in MHI terms).  At this point, the AIC100 device is fully functional, and
>>>> +ready to process workloads.
>>>> +
>>>> +Userspace components
>>>> +====================
>>>> +
>>>> +Compiler
>>>> +--------
>>>> +
>>>> +An open compiler for AIC100 based on upstream LLVM can be found at:
>>>> +https://github.com/quic/software-kit-for-qualcomm-cloud-ai-100-cc
>>>> +
>>>> +Usermode Driver (UMD)
>>>> +---------------------
>>>> +
>>>> +An open UMD that interfaces with the qaic kernel driver can be found at:
>>>> +https://github.com/quic/software-kit-for-qualcomm-cloud-ai-100
>>>
>>> This repo is empty.
>>
>> Correct.  That was mentioned in the cover letter.  Targeting to post content this week.  Just working through the last steps of our internal process.
>>
>>>
>>>> +
>>>> +Sahara loader
>>>> +-------------
>>>> +
>>>> +An open implementation of the Sahara protocol called kickstart can be found at:
>>>> +https://github.com/andersson/qdl
>>>> +
>>>> +MHI Channels
>>>> +============
>>>> +
>>>> +AIC100 defines a number of MHI channels for different purposes.  This is a list
>>>> +of the defined channels, and their uses.
>>>> +
>>>> +| QAIC_LOOPBACK
>>>> +| Channels 0/1
>>>
>>> I would use a comma or & here.
>>
>> Interesting.  I can see that.  I think I like "&".  Will do that.
>>
>>>
>>>> +| Valid for AMSS
>>>> +| Any data sent to the device on this channel is sent back to the host.
>>>> +
>>>> +| QAIC_SAHARA
>>>> +| Channels 2/3
>>>> +| Valid for SBL
>>>> +| Used by SBL to obtain the runtime firmware from the host.
>>>> +
>>>> +| QAIC_DIAG
>>>> +| Channels 4/5
>>>> +| Valid for AMSS
>>>> +| Used to communicate with QSM via the Diag protocol.
>>>> +
>>>> +| QAIC_SSR
>>>> +| Channels 6/7
>>>> +| Valid for AMSS
>>>> +| Used to notify the host of subsystem restart events, and to offload SSR crashdumps.
>>>> +
>>>> +| QAIC_QDSS
>>>> +| Channels 8/9
>>>> +| Valid for AMSS
>>>> +| Used for the Qualcomm Debug Subsystem.
>>>> +
>>>> +| QAIC_CONTROL
>>>> +| Channels 10/11
>>>> +| Valid for AMSS
>>>> +| Used for the Neural Network Control (NNC) protocol.  This is the primary channel between host and QSM for managing workloads.
>>>> +
>>>> +| QAIC_LOGGING
>>>> +| Channels 12/13
>>>> +| Valid for SBL
>>>> +| Used by the SBL to send the bootlog to the host.
>>>> +
>>>> +| QAIC_STATUS
>>>> +| Channels 14/15
>>>> +| Valid for AMSS
>>>> +| Used to notify the host of Reliability, Accessability, Serviceability (RAS) events.
>>>
>>> Accessability -> Accessibility
>>
>> Got it.
>>
>>>
>>>> +| QAIC_TELEMETRY
>>>> +| Channels 16/17
>>>> +| Valid for AMSS
>>>> +| Used to get/set power/thermal/etc attributes.
>>>> +
>>>> +| QAIC_DEBUG
>>>> +| Channels 18/19
>>>> +| Valid for AMSS
>>>> +| Not used.
>>>> +
>>>> +| QAIC_TIMESYNC
>>>> +| Channels 20/21
>>>> +| Valid for SBL/AMSS
>>>> +| Used to synchronize timestamps in the device side logs with the host time source.
>>>> +
>>>> +DMA Bridge
>>>> +==========
>>>> +
>>>> +Overview
>>>> +--------
>>>> +
>>>> +The DMA Bridge is one of the main interfaces to the host from the device
>>>> +(the other being MHI).  As part of activating a workload to run on NSPs, the QSM
>>>> +assigns that network a DMA Bridge channel.  A workload's DMA Bridge channel
>>>> +(DBC for short) is solely for the use of that workload and is not shared with
>>>> +other workloads.
>>>> +
>>>> +Each DBC is a pair of FIFOs that manage data in and out of the workload.  One
>>>> +FIFO is the request FIFO.  The other FIFO is the response FIFO.
>>>> +
>>>> +Each DBC contains 4 registers in hardware:
>>>> +
>>>> +* Request FIFO head pointer (offset 0x0).  Read only to the host.  Indicates the
>>>
>>> Read only _by_ the host.
>>
>> Sure.
>>
>>>
>>>> +  latest item in the FIFO the device has consumed.
>>>> +* Request FIFO tail pointer (offset 0x4).  Read/write by the host.  Host
>>>> +  increments this register to add new items to the FIFO.
>>>> +* Response FIFO head pointer (offset 0x8).  Read/write by the host.  Indicates
>>>> +  the latest item in the FIFO the host has consumed.
>>>> +* Response FIFO tail pointer (offset 0xc).  Read only to the host.  Device
>>>
>>> Read only _by_ the host.
>>
>> Sure.
>>
>>>
>>>> +  increments this register to add new items to the FIFO.
>>>> +
>>>> +The values in each register are indexes in the FIFO.  To get the location of the
>>>> +FIFO element pointed to by the register: FIFO base address + register * element
>>>> +size.
>>>> +
>>>> +DBC registers are exposed to the host via the second BAR.  Each DBC consumes
>>>> +0x1000 of space in the BAR.
>>>
>>> I wouldn't use hex for the sizes. 4KB seems a lot more readable.
>>
>> Good point.  Will do.
>>
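
For reference, a minimal sketch of the element-address calculation described
above (FIFO base address + register value * element size).  The names are
placeholders for illustration only, since the driver code comes later in the
series:

    /* Illustrative only: dbc_fifo_base, reg_index and elem_size are placeholders. */
    static void *dbc_fifo_elem(void *dbc_fifo_base, u32 reg_index, size_t elem_size)
    {
    	return (char *)dbc_fifo_base + (size_t)reg_index * elem_size;
    }
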
>>>
>>>> +The actual FIFOs are backed by host memory.  When sending a request to the QSM
>>>> +to activate a network, the host must donate memory to be used for the FIFOs.
>>>> +Due to internal mapping limitations of the device, a single contigious chunk of
>>>
>>> contigious -> contiguous
>>
>> Got it.
>>
>>>
>>>> +memory must be provided per DBC, which hosts both FIFOs.  The request FIFO will
>>>> +consume the beginning of the memory chunk, and the response FIFO will consume
>>>> +the end of the memory chunk.
>>>> +
>>>> +Request FIFO
>>>> +------------
>>>> +
>>>> +A request FIFO element has the following structure:
>>>> +
>>>> +| {
>>>> +|    u16 req_id;
>>>> +|    u8  seq_id;
>>>> +|    u8  pcie_dma_cmd;
>>>> +|    u32 reserved;
>>>> +|    u64 pcie_dma_source_addr;
>>>> +|    u64 pcie_dma_dest_addr;
>>>> +|    u32 pcie_dma_len;
>>>> +|    u32 reserved;
>>>> +|    u64 doorbell_addr;
>>>> +|    u8  doorbell_attr;
>>>> +|    u8  reserved;
>>>> +|    u16 reserved;
>>>> +|    u32 doorbell_data;
>>>> +|    u32 sem_cmd0;
>>>> +|    u32 sem_cmd1;
>>>> +|    u32 sem_cmd2;
>>>> +|    u32 sem_cmd3;
>>>> +| }
>>>> +
>>>> +Request field descriptions:
>>>> +
>>>> +| req_id- request ID.  A request FIFO element and a response FIFO element with
>>>> +|         the same request ID refer to the same command.
>>>> +
>>>> +| seq_id- sequence ID within a request.  Ignored by the DMA Bridge.
>>>> +
>>>> +| pcie_dma_cmd- describes the DMA element of this request.
>>>> +|     Bit(7) is the force msi flag, which overrides the DMA Bridge MSI logic
>>>> +|         and generates a MSI when this request is complete, and QSM
>>>> +|         configures the DMA Bridge to look at this bit.
>>>> +|     Bits(6:5) are reserved.
>>>> +|     Bit(4) is the completion code flag, and indicates that the DMA Bridge
>>>> +|         shall generate a response FIFO element when this request is
>>>> +|         complete.
>>>> +|     Bit(3) indicates if this request is a linked list transfer(0) or a bulk
>>>> +|         transfer(1).
>>>> +|     Bit(2) is reserved.
>>>> +|     Bits(1:0) indicate the type of transfer.  No transfer(0), to device(1),
>>>> +|         from device(2).  Value 3 is illegal.
>>>> +
>>>> +| pcie_dma_source_addr- source address for a bulk transfer, or the address of
>>>> +|         the linked list.
>>>> +
>>>> +| pcie_dma_dest_addr- destination address for a bulk transfer.
>>>> +
>>>> +| pcie_dma_len- length of the bulk transfer.  Note that the size of this field
>>>> +|     limits transfers to 4G in size.
>>>> +
>>>> +| doorbell_addr- address of the doorbell to ring when this request is complete.
>>>> +
>>>> +| doorbell_attr- doorbell attributes.
>>>> +|     Bit(7) indicates if a write to a doorbell is to occur.
>>>> +|     Bits(6:2) are reserved.
>>>> +|     Bits(1:0) contain the encoding of the doorbell length.  0 is 32-bit,
>>>> +|         1 is 16-bit, 2 is 8-bit, 3 is reserved.  The doorbell address
>>>> +|         must be naturally aligned to the specified length.
>>>> +
>>>> +| doorbell_data- data to write to the doorbell.  Only the bits corresponding to
>>>> +|     the doorbell length are valid.
>>>> +
>>>> +| sem_cmdN- semaphore command.
>>>> +|     Bit(31) indicates this semaphore command is enabled.
>>>> +|     Bit(30) is the to-device DMA fence.  Block this request until all
>>>> +|         to-device DMA transfers are complete.
>>>> +|     Bit(29) is the from-device DMA fence.  Block this request until all
>>>> +|         from-device DMA transfers are complete.
>>>> +|     Bits(28:27) are reserved.
>>>> +|     Bits(26:24) are the semaphore command.  0 is NOP.  1 is init with the
>>>> +|         specified value.  2 is increment.  3 is decrement.  4 is wait
>>>> +|         until the semaphore is equal to the specified value.  5 is wait
>>>> +|         until the semaphore is greater or equal to the specified value.
>>>> +|         6 is "P", wait until semaphore is greater than 0, then
>>>> +|         decrement by 1.  7 is reserved.
>>>> +|     Bit(23) is reserved.
>>>> +|     Bit(22) is the semaphore sync.  0 is post sync, which means that the
>>>> +|         semaphore operation is done after the DMA transfer.  1 is
>>>> +|         presync, which gates the DMA transfer.  Only one presync is
>>>> +|         allowed per request.
>>>> +|     Bit(21) is reserved.
>>>> +|     Bits(20:16) is the index of the semaphore to operate on.
>>>> +|     Bits(15:12) are reserved.
>>>> +|     Bits(11:0) are the semaphore value to use in operations.
>>>
>>> It seems to me like structure documentation
>>
>> Yes.  It can be modeled that way.  However the code comes later so it can't be referenced here yet.  I've got a todo to come back and clean that up once this series is merged.
>>
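
As an illustration of the bit layout above, a pcie_dma_cmd value for a bulk,
to-device transfer that requests a response FIFO element on completion could be
built like this.  The macro names are placeholders for this sketch, not
definitions from the driver:

    #define DMA_CMD_XFER_TO_DEV  0x1     /* Bits(1:0): to device */
    #define DMA_CMD_BULK         BIT(3)  /* Bit(3): bulk transfer */
    #define DMA_CMD_COMPLETE     BIT(4)  /* Bit(4): generate a response FIFO element */

    u8 cmd = DMA_CMD_XFER_TO_DEV | DMA_CMD_BULK | DMA_CMD_COMPLETE; /* 0x19 */
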
>>>
>>>> +Overall, a request is processed in 4 steps:
>>>> +
>>>> +1. If specified, the presync semaphore condition must be true
>>>> +2. If enabled, the DMA transfer occurs
>>>> +3. If specified, the postsync semaphore conditions must be true
>>>> +4. If enabled, the doorbell is written
>>>> +
>>>> +By using the semaphores in conjunction with the workload running on the NSPs,
>>>> +the data pipeline can be synchronized such that the host can queue multiple
>>>> +requests of data for the workload to process, but the DMA Bridge will only copy
>>>> +the data into the memory of the workload when the workload is ready to process
>>>> +the next input.
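
To make the semaphore usage concrete, an illustrative sem_cmd encoding that
gates a DMA transfer until semaphore index 2 equals 1 (presync, wait-equal)
follows.  Again, the macro names are placeholders for this sketch:

    #define SEM_ENABLE       BIT(31)          /* Bit(31): command enabled */
    #define SEM_CMD_WAIT_EQ  (4U << 24)       /* Bits(26:24): wait until equal */
    #define SEM_PRESYNC      BIT(22)          /* Bit(22): presync gates the DMA */
    #define SEM_INDEX(n)     ((u32)(n) << 16) /* Bits(20:16): semaphore index */
    #define SEM_VALUE(v)     ((v) & 0xfff)    /* Bits(11:0): semaphore value */

    u32 sem_cmd0 = SEM_ENABLE | SEM_CMD_WAIT_EQ | SEM_PRESYNC |
    		   SEM_INDEX(2) | SEM_VALUE(1);
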
>>>> +
>>>> +Response FIFO
>>>> +-------------
>>>> +
>>>> +Once a request is fully processed, a response FIFO element is generated if
>>>> +specified in pcie_dma_cmd.  The structure of a response FIFO element:
>>>> +
>>>> +| {
>>>> +|     u16 req_id;
>>>> +|     u16 completion_code;
>>>> +| }
>>>> +
>>>> +req_id- matches the req_id of the request that generated this element.
>>>> +
>>>> +completion_code- status of this request.  0 is success.  non-zero is an error.
>>>> +
>>>> +The DMA Bridge will generate a MSI to the host as a reaction to activity in the
>>>> +response FIFO of a DBC.  The DMA Bridge hardware has an IRQ storm mitigation
>>>> +algorithm, where it will only generate a MSI when the response FIFO transitions
>>>> +from empty to non-empty (unless force MSI is enabled and triggered).  In
>>>> +response to this MSI, the host is expected to drain the response FIFO, and must
>>>> +take care to handle any race conditions between draining the FIFO, and the
>>>> +device inserting elements into the FIFO.
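
A minimal sketch of the drain loop implied here, assuming hypothetical helper
and register names (the real implementation comes later in the series):

    struct resp_elem {
    	u16 req_id;
    	u16 completion_code;
    };

    static void drain_response_fifo(struct resp_elem *resp_base, u32 nelem,
    				    void __iomem *head_reg, void __iomem *tail_reg)
    {
    	u32 head = readl(head_reg);
    	u32 tail = readl(tail_reg);

    	while (head != tail) {
    		/* consume resp_base[head].req_id / completion_code here */
    		head = (head + 1) % nelem;
    		if (head == tail)
    			tail = readl(tail_reg); /* catch elements added while draining */
    	}
    	writel(head, head_reg); /* tell the device what was consumed */
    }
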
>>>> +
>>>> +Neural Network Control (NNC) Protocol
>>>> +=====================================
>>>> +
>>>> +The NNC protocol is how the host makes requests to the QSM to manage workloads.
>>>> +It uses the QAIC_CONTROL MHI channel.
>>>> +
>>>> +Each NNC request is packaged into a message.  Each message is a series of
>>>> +transactions.  A passthrough type transaction can contain elements known as
>>>> +commands.
>>>> +
>>>> +QSM requires NNC messages be little endian encoded and the fields be naturally
>>>> +aligned.  Since there are 64-bit elements in some NNC messages, 64-bit alignment
>>>> +must be maintained.
>>>> +
>>>> +A message contains a header and then a series of transactions.  A message may be
>>>> +at most 4K in size from QSM to the host.  From the host to the QSM, a message
>>>> +can be at most 64K (maximum size of a single MHI packet), but there is a
>>>> +continuation feature where message N+1 can be marked as a continuation of
>>>> +message N.  This is used for exceedingly large DMA xfer transactions.
>>>> +
>>>> +Transaction descriptions:
>>>> +
>>>> +passthrough- Allows userspace to send an opaque payload directly to the QSM.
>>>> +This is used for NNC commands.  Userspace is responsible for managing
>>>> +the QSM message requirements in the payload.
>>>> +
>>>> +dma_xfer- DMA transfer.  Describes an object that the QSM should DMA into the
>>>> +device via address and size tuples.
>>>> +
>>>> +activate- Activate a workload onto NSPs.  The host must provide memory to be
>>>> +used by the DBC.
>>>> +
>>>> +deactivate- Deactivate an active workload and return the NSPs to idle.
>>>> +
>>>> +status- Query the QSM about its NNC implementation.  Returns the NNC version,
>>>> +and if CRC is used.
>>>> +
>>>> +terminate- Release a user's resources.
>>>> +
>>>> +dma_xfer_cont- Continuation of a previous DMA transfer.  If a DMA transfer
>>>> +cannot be specified in a single message (highly fragmented), this
>>>> +transaction can be used to specify more ranges.
>>>> +
>>>> +validate_partition- Query to QSM to determine if a partition identifier is
>>>> +valid.
>>>> +
>>>> +Each message is tagged with a user id, and a partition id.  The user id allows
>>>> +QSM to track resources, and release them when the user goes away (eg the process
>>>> +crashes).  A partition id identifies the resource partition that QSM manages,
>>>> +which this message applies to.
>>>> +
>>>> +Messages may have CRCs.  Messages should have CRCs applied until the QSM
>>>> +reports via the status transaction that CRCs are not needed.  The QSM on the
>>>> +SA9000P requires CRCs for black channel safing.
>>>> +
>>>> +Subsystem Restart (SSR)
>>>> +=======================
>>>> +
>>>> +SSR is the concept of limiting the impact of an error.  An AIC100 device may
>>>> +have multiple users, each with their own workload running.  If the workload of
>>>> +one user crashes, the fallout of that should be limited to that workload and not
>>>> +impact other workloads.  SSR accomplishes this.
>>>> +
>>>> +If a particular workload crashes, QSM notifies the host via the QAIC_SSR MHI
>>>> +channel.  This notification identifies the workload by its assigned DBC.  A
>>>> +multi-stage recovery process is then used to cleanup both sides, and get the
>>>> +DBC/NSPs into a working state.
>>>> +
>>>> +When SSR occurs, any state in the workload is lost.  Any inputs that were in
>>>> +process, or queued but not yet serviced, are lost.  The loaded artifacts will
>>>> +remain in on-card DDR, but the host will need to re-activate the workload if
>>>> +it desires to recover the workload.
>>>> +
>>>> +Reliability, Accessability, Serviceability (RAS)
>>>
>>> Accessability -> Accessibility
>>
>> Got it.
>>
>>>
>>>> +================================================
>>>> +
>>>> +AIC100 is expected to be deployed in server systems where RAS ideology is
>>>> +applied.  Simply put, RAS is the concept of detecting, classifying, and
>>>> +reporting errors.  While PCIe has AER (Advanced Error Reporting) which factors
>>>> +into RAS, AER does not allow for a device to report details about internal
>>>> +errors.  Therefore, AIC100 implements a custom RAS mechanism.  When a RAS event
>>>> +occurs, QSM will report the event with appropriate details via the QAIC_STATUS
>>>> +MHI channel.  A sysadmin may determine that a particular device needs
>>>> +additional service based on RAS reports.
>>>> +
>>>> +Telemetry
>>>> +=========
>>>> +
>>>> +QSM has the ability to report various physical attributes of the device, and in
>>>> +some cases, to allow the host to control them.  Examples include thermal limits,
>>>> +thermal readings, and power readings.  These items are communicated via the
>>>> +QAIC_TELEMETRY MHI channel.
>>>> diff --git a/Documentation/accel/qaic/index.rst b/Documentation/accel/qaic/index.rst
>>>> new file mode 100644
>>>> index 0000000..ad19b88
>>>> --- /dev/null
>>>> +++ b/Documentation/accel/qaic/index.rst
>>>> @@ -0,0 +1,13 @@
>>>> +.. SPDX-License-Identifier: GPL-2.0-only
>>>> +
>>>> +=====================================
>>>> + accel/qaic Qualcomm Cloud AI driver
>>>> +=====================================
>>>> +
>>>> +The accel/qaic driver supports the Qualcomm Cloud AI machine learning
>>>> +accelerator cards.
>>>> +
>>>> +.. toctree::
>>>> +
>>>> +   qaic
>>>> +   aic100
>>>> diff --git a/Documentation/accel/qaic/qaic.rst b/Documentation/accel/qaic/qaic.rst
>>>> new file mode 100644
>>>> index 0000000..b0e7a5f
>>>> --- /dev/null
>>>> +++ b/Documentation/accel/qaic/qaic.rst
>>>> @@ -0,0 +1,169 @@
>>>> +.. SPDX-License-Identifier: GPL-2.0-only
>>>> +
>>>> +=============
>>>> + QAIC driver
>>>> +=============
>>>> +
>>>> +The QAIC driver is the Kernel Mode Driver (KMD) for the AIC100 family of AI
>>>> +accelerator products.
>>>> +
>>>> +Interrupts
>>>> +==========
>>>> +
>>>> +While the AIC100 DMA Bridge hardware implements an IRQ storm mitigation
>>>> +mechanism, it is still possible for an IRQ storm to occur.  A storm can happen
>>>> +if the workload is particularly quick, and the host is responsive.  If the host
>>>> +can drain the response FIFO as quickly as the device can insert elements into
>>>> +it, then the device will frequently transition the response FIFO from empty to
>>>> +non-empty and generate MSIs at a rate equilivelent to the speed of the
>>>
>>> equilivelent -> equivalent
>>
>> Sure.
>>
>>>
>>>> +workload's ability to process inputs.  The lprnet (license plate reader network)
>>>> +workload is known to trigger this condition, and can generate in excess of 100k
>>>> +MSIs per second.  It has been observed that most systems cannot tolerate this
>>>> +for long, and will crash due to some form of watchdog due to the overhead of
>>>> +the interrupt controller interrupting the host CPU.
>>>> +
>>>> +To mitigate this issue, the QAIC driver implements specific IRQ handling.  When
>>>> +QAIC receives an IRQ, it disables that line.  This prevents the interrupt
>>>> +controller from interrupting the CPU.  Then AIC drains the FIFO.  Once the FIFO
>>>> +is drained, QAIC implements a "last chance" polling algorithm where QAIC will
>>>> +sleep for a time to see if the workload will generate more activity.  The IRQ
>>>> +line remains disabled during this time.  If no activity is detected, QAIC exits
>>>> +polling mode and reenables the IRQ line.
>>>> +
>>>> +This mitigation in QAIC is very effective.  The same lprnet usecase that
>>>> +generates 100k IRQs per second (per /proc/interrupts) is reduced to roughly 64
>>>> +IRQs over 5 minutes while keeping the host system stable, and having the same
>>>> +workload throughput performance (within run to run noise variation).
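
A rough sketch of the handler pattern described above, with placeholder names
for the DBC context and helpers, and an arbitrary poll interval; it only
illustrates the disable/drain/poll/re-enable sequence, not the actual driver
code:

    static irqreturn_t dbc_irq_threaded_fn(int irq, void *data)
    {
    	struct dbc *dbc = data;

    	disable_irq_nosync(irq);
    	do {
    		drain_response_fifo(dbc);
    		/* "last chance" poll before giving the line back */
    		usleep_range(100, 200);
    	} while (response_fifo_has_data(dbc));
    	enable_irq(irq);

    	return IRQ_HANDLED;
    }
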
>>>> +
>>>> +
>>>> +Neural Network Control (NNC) Protocol
>>>> +=====================================
>>>> +
>>>> +The implementation of NNC is split between the KMD (QAIC) and UMD.  In general
>>>> +QAIC understands how to encode/decode NNC wire protocol, and elements of the
>>>> +protocol which require kernelspace knowledge to process (for example, mapping
>>>
>>> kernelspace is missing a space :P
>>>
>>>> +host memory to device IOVAs).  QAIC understands the structure of a message, and
>>>> +all of the transactions.  QAIC does not understand commands (the payload of a
>>>> +passthrough transaction).
>>>> +
>>>> +QAIC handles and enforces the required little endianness and 64-bit alignment,
>>>> +to the degree that it can.  Since QAIC does not know the contents of a
>>>> +passthrough transaction, it relies on the UMD to saitsfy the requirements.
>>>
>>> saitsfy -> satisfy
>>
>> Will do.
>>
>>>
>>>> +The terminate transaction is of particular use to QAIC.  QAIC is not aware of
>>>> +the resources that are loaded onto a device since the majority of that activity
>>>> +occurs within NNC commands.  As a result, QAIC does not have the means to
>>>> +roll back userspace activity.  To ensure that a userspace client's resources
>>>> +are fully released in the case of a process crash, or a bug, QAIC uses the
>>>> +terminate command to let QSM know when a user has gone away, and the resources
>>>> +can be released.
>>>> +
>>>> +QSM can report a version number of the NNC protocol it supports.  This is in the
>>>> +form of a Major number and a Minor number.
>>>> +
>>>> +Major number updates indicate changes to the NNC protocol which impact the
>>>> +message format, or transactions (impacts QAIC).
>>>> +
>>>> +Minor number updates indicate changes to the NNC protocol which impact the
>>>> +commands (does not impact QAIC).
>>>> +
>>>> +uAPI
>>>> +====
>>>> +
>>>> +QAIC defines a number of driver specific IOCTLs as part of the userspace API.
>>>> +This section describes those APIs.
>>>> +
>>>> +DRM_IOCTL_QAIC_MANAGE:
>>>> +This IOCTL allows userspace to send a NNC request to the QSM.  The call will
>>>> +block until a response is received, or the request has timed out.
>>>> +
>>>> +DRM_IOCTL_QAIC_CREATE_BO:
>>>> +This IOCTL allows userspace to allocate a buffer object (BO) which can send or
>>>> +receive data from a workload.  The call will return a GEM handle that
>>>> +represents the allocated buffer.  The BO is not usable until it has been sliced
>>>> +(see DRM_IOCTL_QAIC_ATTACH_SLICE_BO).
>>>> +
>>>> +DRM_IOCTL_QAIC_MMAP_BO:
>>>> +This IOCTL allows userspace to prepare an allocated BO to be mmap'd into the
>>>> +userspace process.
>>>> +
>>>> +DRM_IOCTL_QAIC_ATTACH_SLICE_BO:
>>>> +This IOCTL allows userspace to slice a BO in preparation for sending the BO to
>>>> +the device.  Slicing is the operation of describing what portions of a BO get
>>>> +sent where to a workload.  This requires a set of DMA transfers for the DMA
>>>> +Bridge, and as such, locks the BO to a specific DBC.
>>>> +
>>>> +DRM_IOCTL_QAIC_EXECUTE_BO:
>>>> +This IOCTL allows userspace to submit a set of sliced BOs to the device.  The
>>>> +call is non-blocking.  Success only indicates that the BOs have been queued
>>>> +to the device, but does not guarantee they have been executed.
>>>> +
>>>> +DRM_IOCTL_QAIC_PARTIAL_EXECUTE_BO:
>>>> +This IOCTL operates like DRM_IOCTL_QAIC_EXECUTE_BO, but it allows userspace to
>>>> +shrink the BOs sent to the device for this specific call.  If a BO typically has
>>>> +N inputs, but only a subset of those is available, this IOCTL allows userspace
>>>> +to indicate that only the first M bytes of the BO should be sent to the device
>>>> +to minimize data transfer overhead.  This IOCTL dynamically recomputes the
>>>> +slicing, and therefore has some processing overhead before the BOs can be queued
>>>> +to the device.
>>>> +
>>>> +DRM_IOCTL_QAIC_WAIT_BO:
>>>> +This IOCTL allows userspace to determine when a particular BO has been processed
>>>> +by the device.  The call will block until either the BO has been processed and
>>>> +can be re-queued to the device, or a timeout occurs.
>>>> +
>>>> +DRM_IOCTL_QAIC_PERF_STATS_BO:
>>>> +This IOCTL allows userspace to collect performance statistics on the most
>>>> +recent execution of a BO.  This allows userspace to construct an end to end
>>>> +timeline of the BO processing for a performance analysis.
>>>> +
>>>> +DRM_IOCTL_QAIC_PART_DEV:
>>>> +This IOCTL allows userspace to request a duplicate "shadow device".  This extra
>>>> +accelN device is associated with a specific partition of resources on the AIC100
>>>> +device and can be used for limiting a process to some subset of resources.
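
To show how these calls fit together, a rough userspace sequence for a single
inference pass might look like the sketch below.  The device node path and the
argument variables are assumptions; their struct layouts are defined by
include/uapi/drm/qaic_accel.h later in the series, and error handling is
omitted.  Only the ordering is the point here:

    int fd = open("/dev/accel/accel0", O_RDWR);

    ioctl(fd, DRM_IOCTL_QAIC_MANAGE, &manage);         /* activate the workload via NNC */
    ioctl(fd, DRM_IOCTL_QAIC_CREATE_BO, &create);      /* allocate an input/output BO */
    ioctl(fd, DRM_IOCTL_QAIC_MMAP_BO, &mmap_req);      /* prepare the BO for mmap() */
    ioctl(fd, DRM_IOCTL_QAIC_ATTACH_SLICE_BO, &slice); /* slice the BO, binding it to a DBC */
    ioctl(fd, DRM_IOCTL_QAIC_EXECUTE_BO, &exec);       /* queue the BO to the device */
    ioctl(fd, DRM_IOCTL_QAIC_WAIT_BO, &wait);          /* block until the BO is processed */
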
>>>> +
>>>> +Userspace Client Isolation
>>>> +==========================
>>>> +
>>>> +AIC100 supports multiple clients.  Multiple DBCs can be consumed by a single
>>>> +client, and multiple clients can each consume one or more DBCs.  Workloads
>>>> +may contain sensistive information therefore only the client that owns the
>>>
>>> sensistive -> sensitive
>>
>> Will do.
>>
>>>
>>>> +workload should be allowed to interface with the DBC.
>>>> +
>>>> +Clients are identified by the instance associated with their open().  A client
>>>> +may only use memory they allocate, and DBCs that are assigned to their
>>>> +workloads.  Attempts to access resources assigned to other clients will be
>>>> +rejected.
>>>> +
>>>> +Module parameters
>>>> +=================
>>>> +
>>>> +QAIC supports the following module parameters:
>>>> +
>>>> +**datapath_polling (bool)**
>>>> +
>>>> +Configures QAIC to use a polling thread for datapath events instead of relying
>>>> +on the device interrupts.  Useful for platforms with broken multiMSI.  Must be
>>>> +set at QAIC driver initialization.  Default is 0 (off).
>>>> +
>>>> +**mhi_timeout (int)**
>>>> +
>>>> +Sets the timeout value for MHI operations in milliseconds (ms).  Must be set
>>>> +at the time the driver detects a device.  Default is 2000 (2 seconds).
>>>> +
>>>> +**control_resp_timeout (int)**
>>>> +
>>>> +Sets the timeout value for QSM responses to NNC messages in seconds (s).  Must
>>>> +be set at the time the driver is sending a request to QSM.  Default is 60 (one
>>>> +minute).
>>>> +
>>>> +**wait_exec_default_timeout (int)**
>>>> +
>>>> +Sets the default timeout for the wait_exec ioctl in milliseconds (ms).  Must be
>>>> +set prior to the wait_exec ioctl call.  A value specified in the ioctl call
>>>> +overrides this for that call.  Default is 5000 (5 seconds).
>>>> +
>>>> +**datapath_poll_interval_us (int)**
>>>> +
>>>> +Sets the polling interval in microseconds (us) when datapath polling is active.
>>>> +Takes effect at the next polling interval.  Default is 100 (100 us).
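
As a usage note, these are standard module parameters, so for example
"modprobe qaic datapath_polling=1 datapath_poll_interval_us=50" (or
qaic.datapath_polling=1 on the kernel command line for a built-in driver)
would enable polling with a faster, illustrative interval.
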
>>>
>>> Cool that you are starting with the documentation :)
>>> I suggest running at least "checkpatch.pl --codespell" on the series as there are many spelling issues.
>>
>> That's what happened to checkpatch.  Thanks for pointing out the flag.  I remember it doing spellcheck by default.  Apparently it needs a flag now.  I'll do that going forward.
>>
>>>
>>> Regards,
>>> Jacek
>>>
>>>
>>>
>>


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 1/8] accel/qaic: Add documentation for AIC100 accelerator driver
@ 2023-02-16 15:20           ` Jeffrey Hugo
  0 siblings, 0 replies; 68+ messages in thread
From: Jeffrey Hugo @ 2023-02-16 15:20 UTC (permalink / raw)
  To: Jacek Lawrynowicz, ogabbay, airlied, daniel, dri-devel
  Cc: linux-doc, linux-arm-msm, quic_ajitpals, quic_pkanojiy,
	stanislaw.gruszka, quic_carlv

On 2/16/2023 7:18 AM, Jacek Lawrynowicz wrote:
> Hi,
> 
> On 15.02.2023 16:41, Jeffrey Hugo wrote:
>> On 2/14/2023 4:08 AM, Jacek Lawrynowicz wrote:
>>> Hi,
>>
>> Thank you for the review.
>>
>>> On 06.02.2023 16:41, Jeffrey Hugo wrote:
>>>> The Qualcomm Cloud AI 100 (AIC100) device is an Artificial Intelligence
>>>> accelerator PCIe card.  It contains a number of components both in the
>>>> SoC and on the card which facilitate running workloads:
>>>>
>>>> QSM: management processor
>>>> NSPs: workload compute units
>>>> DMA Bridge: dedicated data mover for the workloads
>>>> MHI: multiplexed communication channels
>>>> DDR: workload storage and memory
>>>>
>>>> The Linux kernel driver for AIC100 is called "QAIC" and is located in the
>>>> accel subsystem.
>>>>
>>>> Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
>>>> Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com>
>>>> ---
>>>>    Documentation/accel/index.rst       |   1 +
>>>>    Documentation/accel/qaic/aic100.rst | 498 ++++++++++++++++++++++++++++++++++++
>>>>    Documentation/accel/qaic/index.rst  |  13 +
>>>>    Documentation/accel/qaic/qaic.rst   | 169 ++++++++++++
>>>>    4 files changed, 681 insertions(+)
>>>>    create mode 100644 Documentation/accel/qaic/aic100.rst
>>>>    create mode 100644 Documentation/accel/qaic/index.rst
>>>>    create mode 100644 Documentation/accel/qaic/qaic.rst
>>>>
>>>> diff --git a/Documentation/accel/index.rst b/Documentation/accel/index.rst
>>>> index 2b43c9a..e94a016 100644
>>>> --- a/Documentation/accel/index.rst
>>>> +++ b/Documentation/accel/index.rst
>>>> @@ -8,6 +8,7 @@ Compute Accelerators
>>>>       :maxdepth: 1
>>>>         introduction
>>>> +   qaic/index
>>>>      .. only::  subproject and html
>>>>    diff --git a/Documentation/accel/qaic/aic100.rst b/Documentation/accel/qaic/aic100.rst
>>>> new file mode 100644
>>>> index 0000000..773aa54
>>>> --- /dev/null
>>>> +++ b/Documentation/accel/qaic/aic100.rst
>>>> @@ -0,0 +1,498 @@
>>>> +.. SPDX-License-Identifier: GPL-2.0-only
>>>> +
>>>> +===============================
>>>> + Qualcomm Cloud AI 100 (AIC100)
>>>> +===============================
>>>> +
>>>> +Overview
>>>> +========
>>>> +
>>>> +The Qualcomm Cloud AI 100/AIC100 family of products (including SA9000P - part of
>>>> +Snapdragon Ride) are PCIe adapter cards which contain a dedicated SoC ASIC for
>>>> +the purpose of efficiently running Artificial Intelligence (AI) Deep Learning
>>>> +inference workloads.  They are AI accelerators.
>>>
>>> There are multiple double spaces in this document like this one above.
>>
>> I presume you are referring to the double space after a period. Universally, that was the recommended style (APA guidebook, etc) until a little while ago.  Old habits are hard to break.  Will scrub.
>>
>>>
>>>> +The PCIe interface of AIC100 is capable of PCIe Gen4 speeds over eight lanes
>>>> +(x8).  An individual SoC on a card can have up to 16 NSPs for running workloads.
>>>> +Each SoC has an A53 management CPU.  On card, there can be up to 32 GB of DDR.
>>>> +
>>>> +Multiple AIC100 cards can be hosted in a single system to scale overall
>>>> +performance.
>>>> +
>>>> +Hardware Description
>>>> +====================
>>>> +
>>>> +An AIC100 card consists of an AIC100 SoC, on-card DDR, and a set of misc
>>>> +peripherals (PMICs, etc).
>>>> +
>>>> +An AIC100 card can either be a PCIe HHHL form factor (a traditional PCIe card),
>>>> +or a Dual M.2 card.  Both use PCIe to connect to the host system.
>>>
>>> Dual M.2 card? Is it a single PCB with two M.2 connectors? This requires a custom
>>> motherboard with x4 lanes from two connectors combined as a single PCIe device, right?
>>
>> Yes.  There is a specification for this, although it hasn't gotten widespread adoption.  In addition to more lanes, you also get to draw more power.  Single M.2 is around 11W.  Dual M.2 is capped at 25W.
>>
>> It tends to be a handy form factor for "edge" applications where the physical size and power draw of a "normal" PCIe slot (what you'd find on a regular ATX motherboard) is not desirable.
>>
>>>
>>>> +As a PCIe endpoint/adapter, AIC100 uses the standard VendorID(VID)/
>>>> +DeviceID(DID) combination to uniquely identify itself to the host.  AIC100
>>>> +uses the standard Qualcomm VID (0x17cb).  All AIC100 instances use the same
>>>> +AIC100 DID (0xa100).
>>>
>>> Maybe "SKUs" would fit better here then "instances".
>>
>> Sure.
>>
>>>
>>>> +AIC100 does not implement FLR (function level reset).
>>>> +
>>>> +AIC100 implements MSI but does not implement MSI-X.  AIC100 requires 17 MSIs to
>>>> +operate (1 for MHI, 16 for the DMA Bridge).
>>>> +
>>>> +As a PCIe device, AIC100 utilizes BARs to provide host interfaces to the device
>>>> +hardware.  AIC100 provides 3, 64-bit BARs.
>>>> +
>>>> +* The first BAR is 4K in size, and exposes the MHI interface to the host.
>>>> +
>>>> +* The second BAR is 2M in size, and exposes the DMA Bridge interface to the
>>>> +  host.
>>>> +
>>>> +* The third BAR is variable in size based on an individual AIC100's
>>>> +  configuration, but defaults to 64K.  This BAR currently has no purpose.
>>>> +
>>>> +From the host perspective, AIC100 has several key hardware components-
>>>
>>> Typo in "components-".
>>
>> ?
>> You want "components -"?
> 
> I meant "components:" but I guess the "-" is part of the format.

Ah, I see.  "components:" is the same thing to me.  Will change since it 
seems like it makes more sense to you.

> 
>>>
>>>> +* QSM (QAIC Service Manager)
>>>> +* NSPs (Neural Signal Processor)
>>>> +* DMA Bridge
>>>> +* DDR
>>>> +* MHI (Modem Host Interface)
>>>> +
>>>> +QSM
>>>> +---
>>>> +
>>>> +QAIC Service Manager.  This is an ARM A53 CPU that runs the primary
>>>> +firmware of the card and performs on-card management tasks.  It also
>>>> +communicates with the host via MHI.  Each AIC100 has one of
>>>> +these.
>>>
>>> I would put the description of MHI at the top because it is referenced by the QSM description.
>>
>> Sure.
>>
>>>
>>>> +NSP
>>>> +---
>>>> +
>>>> +Neural Signal Processor.  Each AIC100 has up to 16 of these.  These are
>>>> +the processors that run the workloads on AIC100.  Each NSP is a Qualcomm Hexagon
>>>> +(Q6) DSP with HVX and HMX.  Each NSP can only run one workload at a time, but
>>>> +multiple NSPs may be assigned to a single workload.  Since each NSP can only run
>>>> +one workload, AIC100 is limited to 16 concurrent workloads.  Workload
>>>> +"scheduling" is under the purview of the host.  AIC100 does not automatically
>>>> +timeslice.
>>>> +
>>>> +DMA Bridge
>>>> +----------
>>>> +
>>>> +The DMA Bridge is a custom DMA engine that manages the flow of data
>>>> +in and out of workloads.  AIC100 has one of these.  The DMA Bridge has 16
>>>> +channels, each consisting of a set of request/response FIFOs.  Each active
>>>> +workload is assigned a single DMA Bridge channel.  The DMA Bridge exposes
>>>> +hardware registers to manage the FIFOs (head/tail pointers), but requires host
>>>> +memory to store the FIFOs.
>>>> +
>>>> +DDR
>>>> +---
>>>> +
>>>> +AIC100 has on-card DDR.  In total, an AIC100 can have up to 32 GB of DDR.
>>>> +This DDR is used to store workloads, data for the workloads, and is used by the
>>>> +QSM for managing the device.  NSPs are granted access to sections of the DDR by
>>>> +the QSM.  The host does not have direct access to the DDR, and must make
>>>> +requests to the QSM to transfer data to the DDR.
>>>> +
>>>> +MHI
>>>> +---
>>>> +
>>>> +AIC100 has one MHI interface over PCIe.  MHI itself is documented at
>>>
>>> Please expand the MHI acronym.
>>
>> It's expanded about 40 lines up - "* MHI (Modem Host Interface)".  I generally go by the scheme of expanding an acronym the first time it is used in a document, and then just using the acronym thereafter.
>>
>> Do you feel the expansion needs to be duplicated?  It might help when this section is moved to the top.
>>
> 
> No, it's fine.
> 
>>>
>>>> +Documentation/mhi/index.rst.  MHI is the mechanism the host uses to communicate
>>>> +with the QSM.  Except for workload data via the DMA Bridge, all interaction with
>>>> +he device occurs via MHI.
>>>
>>> Typo in "he device".
>>
>> Doh.  Will fix.
>>
>>>
>>>> +High-level Use Flow
>>>> +===================
>>>> +
>>>> +AIC100 is a programmable accelerator typically used for running
>>>> +neural networks in inferencing mode to efficiently perform AI operations.
>>>> +AIC100 is not intended for training neural networks.  AIC100 can be utilitized
>>>
>>> utilitized -> utilized
>>
>> Sure
>>
>>>
>>>> +for generic compute workloads.
>>>> +
>>>> +Assuming a user wants to utilize AIC100, they would follow these steps:
>>>> +
>>>> +1. Compile the workload into an ELF targeting the NSP(s)
>>>> +2. Make requests to the QSM to load the workload and related artifacts into the
>>>> +   device DDR
>>>> +3. Make a request to the QSM to activate the workload onto a set of idle NSPs
>>>> +4. Make requests to the DMA Bridge to send input data to the workload to be
>>>> +   processed, and other requests to receive processed output data from the
>>>> +   workload.
>>>> +5. Once the workload is no longer required, make a request to the QSM to
>>>> +   deactivate the workload, thus putting the NSPs back into an idle state.
>>>> +6. Once the workload and related artifacts are no longer needed for future
>>>> +   sessions, make requests to the QSM to unload the data from DDR.  This frees
>>>> +   the DDR to be used by other users.
>>>> +
>>>
>>> Please specify if this is single or multi user device.
>>
>> It is multi-user.  I will find a way to clarify that.
>>
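
For a concrete reading of the flow above: steps 2, 3, 5, and 6 are requests to
the QSM (carried over MHI using the NNC protocol described later), while step 4
is serviced by the DMA Bridge channel that the QSM assigns to the workload at
activation.
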
>>>
>>>> +Boot Flow
>>>> +=========
>>>> +
>>>> +AIC100 uses a flashless boot flow, derived from Qualcomm MSMs.
>>>
>>> What's MSM?
>>
>> "Mobile Station Modem".  It is is Qualcomm term from the 80s, and used to describe the "family" Qualcomm phone SoCs.  It used to be that the Model number would be "MSM8660" or "MSM8960", etc.  That has changed a bit in the past few years, but the products are still referred to "MSMs".
>>
>> It is a common term in the Qualcomm world, and "MSM" is more well known these days than what it stands for.  I don't think expanding it is going to add value.
>>
> 
> OK
> 
>>>
>>>> +When AIC100 is first powered on, it begins executing PBL (Primary Bootloader)
>>>> +from ROM.  PBL enumerates the PCIe link, and initializes the BHI (Boot Host
>>>> +Interface) component of MHI.
>>>> +
>>>> +Using BHI, the host points PBL to the location of the SBL (Secondary Bootloader)
>>>> +image.  The PBL pulls the image from the host, validates it, and begins
>>>> +execution of SBL.
>>>> +
>>>> +SBL initializes MHI, and uses MHI to notify the host that the device has entered
>>>> +the SBL stage.  SBL performs a number of operations:
>>>> +
>>>> +* SBL initializes the majority of hardware (anything PBL left uninitialized),
>>>> +  including DDR.
>>>> +* SBL offloads the bootlog to the host.
>>>> +* SBL synchonizes timestamps with the host for future logging.
>>>
>>> synchonizes -> synchronizes
>>
>> Yep
>>
>>>
>>>> +* SBL uses the Sahara protocol to obtain the runtime firmware images from the
>>>> +  host.
>>>> +
>>>> +Once SBL has obtained and validated the runtime firmware, it brings the NSPs out
>>>> +of reset, and jumps into the QSM.
>>>> +
>>>> +The QSM uses MHI to notify the host that the device has entered the QSM stage
>>>> +(AMSS in MHI terms).  At this point, the AIC100 device is fully functional, and
>>>> +ready to process workloads.
>>>> +
>>>> +Userspace components
>>>> +====================
>>>> +
>>>> +Compiler
>>>> +--------
>>>> +
>>>> +An open compiler for AIC100 based on upstream LLVM can be found at:
>>>> +https://github.com/quic/software-kit-for-qualcomm-cloud-ai-100-cc
>>>> +
>>>> +Usermode Driver (UMD)
>>>> +---------------------
>>>> +
>>>> +An open UMD that interfaces with the qaic kernel driver can be found at:
>>>> +https://github.com/quic/software-kit-for-qualcomm-cloud-ai-100
>>>
>>> This repo is empty.
>>
>> Correct.  That was mentioned in the cover letter.  Targeting to post content this week.  Just working through the last steps of our internal process.
>>
>>>
>>>> +
>>>> +Sahara loader
>>>> +-------------
>>>> +
>>>> +An open implementation of the Sahara protocol called kickstart can be found at:
>>>> +https://github.com/andersson/qdl
>>>> +
>>>> +MHI Channels
>>>> +============
>>>> +
>>>> +AIC100 defines a number of MHI channels for different purposes.  This is a list
>>>> +of the defined channels, and their uses.
>>>> +
>>>> +| QAIC_LOOPBACK
>>>> +| Channels 0/1
>>>
>>> I would use a comma or & here.
>>
>> Interesting.  I can see that.  I think I like "&".  Will do that.
>>
>>>
>>>> +| Valid for AMSS
>>>> +| Any data sent to the device on this channel is sent back to the host.
>>>> +
>>>> +| QAIC_SAHARA
>>>> +| Channels 2/3
>>>> +| Valid for SBL
>>>> +| Used by SBL to obtain the runtime firmware from the host.
>>>> +
>>>> +| QAIC_DIAG
>>>> +| Channels 4/5
>>>> +| Valid for AMSS
>>>> +| Used to communicate with QSM via the Diag protocol.
>>>> +
>>>> +| QAIC_SSR
>>>> +| Channels 6/7
>>>> +| Valid for AMSS
>>>> +| Used to notify the host of subsystem restart events, and to offload SSR crashdumps.
>>>> +
>>>> +| QAIC_QDSS
>>>> +| Channels 8/9
>>>> +| Valid for AMSS
>>>> +| Used for the Qualcomm Debug Subsystem.
>>>> +
>>>> +| QAIC_CONTROL
>>>> +| Channels 10/11
>>>> +| Valid for AMSS
>>>> +| Used for the Neural Network Control (NNC) protocol.  This is the primary channel between host and QSM for managing workloads.
>>>> +
>>>> +| QAIC_LOGGING
>>>> +| Channels 12/13
>>>> +| Valid for SBL
>>>> +| Used by the SBL to send the bootlog to the host.
>>>> +
>>>> +| QAIC_STATUS
>>>> +| Channels 14/15
>>>> +| Valid for AMSS
>>>> +| Used to notify the host of Reliability, Accessability, Serviceability (RAS) events.
>>>
>>> Accessability -> Accessibility
>>
>> Got it.
>>
>>>
>>>> +| QAIC_TELEMETRY
>>>> +| Channels 16/17
>>>> +| Valid for AMSS
>>>> +| Used to get/set power/thermal/etc attributes.
>>>> +
>>>> +| QAIC_DEBUG
>>>> +| Channels 18/19
>>>> +| Valid for AMSS
>>>> +| Not used.
>>>> +
>>>> +| QAIC_TIMESYNC
>>>> +| Channels 20/21
>>>> +| Valid for SBL/AMSS
>>>> +| Used to synchronize timestamps in the device side logs with the host time source.
>>>> +
>>>> +DMA Bridge
>>>> +==========
>>>> +
>>>> +Overview
>>>> +--------
>>>> +
>>>> +The DMA Bridge is one of the main interfaces to the host from the device
>>>> +(the other being MHI).  As part of activating a workload to run on NSPs, the QSM
>>>> +assigns that network a DMA Bridge channel.  A workload's DMA Bridge channel
>>>> +(DBC for short) is solely for the use of that workload and is not shared with
>>>> +other workloads.
>>>> +
>>>> +Each DBC is a pair of FIFOs that manage data in and out of the workload.  One
>>>> +FIFO is the request FIFO.  The other FIFO is the response FIFO.
>>>> +
>>>> +Each DBC contains 4 registers in hardware:
>>>> +
>>>> +* Request FIFO head pointer (offset 0x0).  Read only to the host.  Indicates the
>>>
>>> Read only _by_ the host.
>>
>> Sure.
>>
>>>
>>>> +  latest item in the FIFO the device has consumed.
>>>> +* Request FIFO tail pointer (offset 0x4).  Read/write by the host.  Host
>>>> +  increments this register to add new items to the FIFO.
>>>> +* Response FIFO head pointer (offset 0x8).  Read/write by the host.  Indicates
>>>> +  the latest item in the FIFO the host has consumed.
>>>> +* Response FIFO tail pointer (offset 0xc).  Read only to the host.  Device
>>>
>>> Read only _by_ the host.
>>
>> Sure.
>>
>>>
>>>> +  increments this register to add new items to the FIFO.
>>>> +
>>>> +The values in each register are indexes in the FIFO.  To get the location of the
>>>> +FIFO element pointed to by the register: FIFO base address + register * element
>>>> +size.
>>>> +
>>>> +DBC registers are exposed to the host via the second BAR.  Each DBC consumes
>>>> +0x1000 of space in the BAR.
>>>
>>> I wouldn't use hex for the sizes. 4KB seems a lot more readable.
>>
>> Good point.  Will do.
>>
>>>
>>>> +The actual FIFOs are backed by host memory.  When sending a request to the QSM
>>>> +to activate a network, the host must donate memory to be used for the FIFOs.
>>>> +Due to internal mapping limitations of the device, a single contigious chunk of
>>>
>>> contigious -> contiguous
>>
>> Got it.
>>
>>>
>>>> +memory must be provided per DBC, which hosts both FIFOs.  The request FIFO will
>>>> +consume the beginning of the memory chunk, and the response FIFO will consume
>>>> +the end of the memory chunk.
>>>> +
>>>> +Request FIFO
>>>> +------------
>>>> +
>>>> +A request FIFO element has the following structure:
>>>> +
>>>> +| {
>>>> +|    u16 req_id;
>>>> +|    u8  seq_id;
>>>> +|    u8  pcie_dma_cmd;
>>>> +|    u32 reserved;
>>>> +|    u64 pcie_dma_source_addr;
>>>> +|    u64 pcie_dma_dest_addr;
>>>> +|    u32 pcie_dma_len;
>>>> +|    u32 reserved;
>>>> +|    u64 doorbell_addr;
>>>> +|    u8  doorbell_attr;
>>>> +|    u8  reserved;
>>>> +|    u16 reserved;
>>>> +|    u32 doorbell_data;
>>>> +|    u32 sem_cmd0;
>>>> +|    u32 sem_cmd1;
>>>> +|    u32 sem_cmd2;
>>>> +|    u32 sem_cmd3;
>>>> +| }
>>>> +
>>>> +Request field descriptions:
>>>> +
>>>> +| req_id- request ID.  A request FIFO element and a response FIFO element with
>>>> +|         the same request ID refer to the same command.
>>>> +
>>>> +| seq_id- sequence ID within a request.  Ignored by the DMA Bridge.
>>>> +
>>>> +| pcie_dma_cmd- describes the DMA element of this request.
>>>> +|     Bit(7) is the force msi flag, which overrides the DMA Bridge MSI logic
>>>> +|         and generates a MSI when this request is complete, and QSM
>>>> +|         configures the DMA Bridge to look at this bit.
>>>> +|     Bits(6:5) are reserved.
>>>> +|     Bit(4) is the completion code flag, and indicates that the DMA Bridge
>>>> +|         shall generate a response FIFO element when this request is
>>>> +|         complete.
>>>> +|     Bit(3) indicates if this request is a linked list transfer(0) or a bulk
>>>> +|         transfer(1).
>>>> +|     Bit(2) is reserved.
>>>> +|     Bits(1:0) indicate the type of transfer.  No transfer(0), to device(1),
>>>> +|         from device(2).  Value 3 is illegal.
>>>> +
>>>> +| pcie_dma_source_addr- source address for a bulk transfer, or the address of
>>>> +|         the linked list.
>>>> +
>>>> +| pcie_dma_dest_addr- destination address for a bulk transfer.
>>>> +
>>>> +| pcie_dma_len- length of the bulk transfer.  Note that the size of this field
>>>> +|     limits transfers to 4G in size.
>>>> +
>>>> +| doorbell_addr- address of the doorbell to ring when this request is complete.
>>>> +
>>>> +| doorbell_attr- doorbell attributes.
>>>> +|     Bit(7) indicates if a write to a doorbell is to occur.
>>>> +|     Bits(6:2) are reserved.
>>>> +|     Bits(1:0) contain the encoding of the doorbell length.  0 is 32-bit,
>>>> +|         1 is 16-bit, 2 is 8-bit, 3 is reserved.  The doorbell address
>>>> +|         must be naturally aligned to the specified length.
>>>> +
>>>> +| doorbell_data- data to write to the doorbell.  Only the bits corresponding to
>>>> +|     the doorbell length are valid.
>>>> +
>>>> +| sem_cmdN- semaphore command.
>>>> +|     Bit(31) indicates this semaphore command is enabled.
>>>> +|     Bit(30) is the to-device DMA fence.  Block this request until all
>>>> +|         to-device DMA transfers are complete.
>>>> +|     Bit(29) is the from-device DMA fence.  Block this request until all
>>>> +|         from-device DMA transfers are complete.
>>>> +|     Bits(28:27) are reserved.
>>>> +|     Bits(26:24) are the semaphore command.  0 is NOP.  1 is init with the
>>>> +|         specified value.  2 is increment.  3 is decrement.  4 is wait
>>>> +|         until the semaphore is equal to the specified value.  5 is wait
>>>> +|         until the semaphore is greater or equal to the specified value.
>>>> +|         6 is "P", wait until semaphore is greater than 0, then
>>>> +|         decrement by 1.  7 is reserved.
>>>> +|     Bit(23) is reserved.
>>>> +|     Bit(22) is the semaphore sync.  0 is post sync, which means that the
>>>> +|         semaphore operation is done after the DMA transfer.  1 is
>>>> +|         presync, which gates the DMA transfer.  Only one presync is
>>>> +|         allowed per request.
>>>> +|     Bit(21) is reserved.
>>>> +|     Bits(20:16) is the index of the semaphore to operate on.
>>>> +|     Bits(15:12) are reserved.
>>>> +|     Bits(11:0) are the semaphore value to use in operations.
>>>
>>> It seems to me like structure documentation
>>
>> Yes.  It can be modeled that way.  However the code comes later so it can't be referenced here yet.  I've got a todo to come back and clean that up once this series is merged.
>>
>>>
>>>> +Overall, a request is processed in 4 steps:
>>>> +
>>>> +1. If specified, the presync semaphore condition must be true
>>>> +2. If enabled, the DMA transfer occurs
>>>> +3. If specified, the postsync semaphore conditions must be true
>>>> +4. If enabled, the doorbell is written
>>>> +
>>>> +By using the semaphores in conjunction with the workload running on the NSPs,
>>>> +the data pipeline can be synchronized such that the host can queue multiple
>>>> +requests of data for the workload to process, but the DMA Bridge will only copy
>>>> +the data into the memory of the workload when the workload is ready to process
>>>> +the next input.
>>>> +
>>>> +Response FIFO
>>>> +-------------
>>>> +
>>>> +Once a request is fully processed, a response FIFO element is generated if
>>>> +specified in pcie_dma_cmd.  The structure of a response FIFO element:
>>>> +
>>>> +| {
>>>> +|     u16 req_id;
>>>> +|     u16 completion_code;
>>>> +| }
>>>> +
>>>> +req_id- matches the req_id of the request that generated this element.
>>>> +
>>>> +completion_code- status of this request.  0 is success.  non-zero is an error.
>>>> +
>>>> +The DMA Bridge will generate a MSI to the host as a reaction to activity in the
>>>> +response FIFO of a DBC.  The DMA Bridge hardware has an IRQ storm mitigation
>>>> +algorithm, where it will only generate a MSI when the response FIFO transitions
>>>> +from empty to non-empty (unless force MSI is enabled and triggered).  In
>>>> +response to this MSI, the host is expected to drain the response FIFO, and must
>>>> +take care to handle any race conditions between draining the FIFO, and the
>>>> +device inserting elements into the FIFO.
>>>> +
>>>> +Neural Network Control (NNC) Protocol
>>>> +=====================================
>>>> +
>>>> +The NNC protocol is how the host makes requests to the QSM to manage workloads.
>>>> +It uses the QAIC_CONTROL MHI channel.
>>>> +
>>>> +Each NNC request is packaged into a message.  Each message is a series of
>>>> +transactions.  A passthrough type transaction can contain elements known as
>>>> +commands.
>>>> +
>>>> +QSM requires NNC messages be little endian encoded and the fields be naturally
>>>> +aligned.  Since there are 64-bit elements in some NNC messages, 64-bit alignment
>>>> +must be maintained.
>>>> +
>>>> +A message contains a header and then a series of transactions.  A message may be
>>>> +at most 4K in size from QSM to the host.  From the host to the QSM, a message
>>>> +can be at most 64K (maximum size of a single MHI packet), but there is a
>>>> +continuation feature where message N+1 can be marked as a continuation of
>>>> +message N.  This is used for exceedingly large DMA xfer transactions.
>>>> +
>>>> +Transaction descriptions:
>>>> +
>>>> +passthrough- Allows userspace to send an opaque payload directly to the QSM.
>>>> +This is used for NNC commands.  Userspace is responsible for managing
>>>> +the QSM message requirements in the payload.
>>>> +
>>>> +dma_xfer- DMA transfer.  Describes an object that the QSM should DMA into the
>>>> +device via address and size tuples.
>>>> +
>>>> +activate- Activate a workload onto NSPs.  The host must provide memory to be
>>>> +used by the DBC.
>>>> +
>>>> +deactivate- Deactivate an active workload and return the NSPs to idle.
>>>> +
>>>> +status- Query the QSM about its NNC implementation.  Returns the NNC version,
>>>> +and if CRC is used.
>>>> +
>>>> +terminate- Release a user's resources.
>>>> +
>>>> +dma_xfer_cont- Continuation of a previous DMA transfer.  If a DMA transfer
>>>> +cannot be specified in a single message (highly fragmented), this
>>>> +transaction can be used to specify more ranges.
>>>> +
>>>> +validate_partition- Query to QSM to determine if a partition identifier is
>>>> +valid.
>>>> +
>>>> +Each message is tagged with a user id, and a partition id.  The user id allows
>>>> +QSM to track resources, and release them when the user goes away (eg the process
>>>> +crashes).  A partition id identifies the resource partition that QSM manages,
>>>> +which this message applies to.
>>>> +
>>>> +Messages may have CRCs.  Messages should have CRCs applied until the QSM
>>>> +reports via the status transaction that CRCs are not needed.  The QSM on the
>>>> +SA9000P requires CRCs for black channel safing.
>>>> +
>>>> +Subsystem Restart (SSR)
>>>> +=======================
>>>> +
>>>> +SSR is the concept of limiting the impact of an error.  An AIC100 device may
>>>> +have multiple users, each with their own workload running.  If the workload of
>>>> +one user crashes, the fallout of that should be limited to that workload and not
>>>> +impact other workloads.  SSR accomplishes this.
>>>> +
>>>> +If a particular workload crashes, QSM notifies the host via the QAIC_SSR MHI
>>>> +channel.  This notification identifies the workload by its assigned DBC.  A
>>>> +multi-stage recovery process is then used to cleanup both sides, and get the
>>>> +DBC/NSPs into a working state.
>>>> +
>>>> +When SSR occurs, any state in the workload is lost.  Any inputs that were in
>>>> +process, or queued but not yet serviced, are lost.  The loaded artifacts will
>>>> +remain in on-card DDR, but the host will need to re-activate the workload if
>>>> +it desires to recover the workload.
>>>> +
>>>> +Reliability, Accessability, Serviceability (RAS)
>>>
>>> Accessability -> Accessibility
>>
>> Got it.
>>
>>>
>>>> +================================================
>>>> +
>>>> +AIC100 is expected to be deployed in server systems where RAS ideology is
>>>> +applied.  Simply put, RAS is the concept of detecting, classifying, and
>>>> +reporting errors.  While PCIe has AER (Advanced Error Reporting) which factors
>>>> +into RAS, AER does not allow for a device to report details about internal
>>>> +errors.  Therefore, AIC100 implements a custom RAS mechanism.  When a RAS event
>>>> +occurs, QSM will report the event with appropriate details via the QAIC_STATUS
>>>> +MHI channel.  A sysadmin may determine that a particular device needs
>>>> +additional service based on RAS reports.
>>>> +
>>>> +Telemetry
>>>> +=========
>>>> +
>>>> +QSM has the ability to report various physical attributes of the device, and in
>>>> +some cases, to allow the host to control them.  Examples include thermal limits,
>>>> +thermal readings, and power readings.  These items are communicated via the
>>>> +QAIC_TELEMETRY MHI channel.
>>>> diff --git a/Documentation/accel/qaic/index.rst b/Documentation/accel/qaic/index.rst
>>>> new file mode 100644
>>>> index 0000000..ad19b88
>>>> --- /dev/null
>>>> +++ b/Documentation/accel/qaic/index.rst
>>>> @@ -0,0 +1,13 @@
>>>> +.. SPDX-License-Identifier: GPL-2.0-only
>>>> +
>>>> +=====================================
>>>> + accel/qaic Qualcomm Cloud AI driver
>>>> +=====================================
>>>> +
>>>> +The accel/qaic driver supports the Qualcomm Cloud AI machine learning
>>>> +accelerator cards.
>>>> +
>>>> +.. toctree::
>>>> +
>>>> +   qaic
>>>> +   aic100
>>>> diff --git a/Documentation/accel/qaic/qaic.rst b/Documentation/accel/qaic/qaic.rst
>>>> new file mode 100644
>>>> index 0000000..b0e7a5f
>>>> --- /dev/null
>>>> +++ b/Documentation/accel/qaic/qaic.rst
>>>> @@ -0,0 +1,169 @@
>>>> +.. SPDX-License-Identifier: GPL-2.0-only
>>>> +
>>>> +=============
>>>> + QAIC driver
>>>> +=============
>>>> +
>>>> +The QAIC driver is the Kernel Mode Driver (KMD) for the AIC100 family of AI
>>>> +accelerator products.
>>>> +
>>>> +Interrupts
>>>> +==========
>>>> +
>>>> +While the AIC100 DMA Bridge hardware implements an IRQ storm mitigation
>>>> +mechanism, it is still possible for an IRQ storm to occur.  A storm can happen
>>>> +if the workload is particularly quick, and the host is responsive.  If the host
>>>> +can drain the response FIFO as quickly as the device can insert elements into
>>>> +it, then the device will frequently transition the response FIFO from empty to
>>>> +non-empty and generate MSIs at a rate equilivelent to the speed of the
>>>
>>> equilivelent -> equivalent
>>
>> Sure.
>>
>>>
>>>> +workload's ability to process inputs.  The lprnet (license plate reader network)
>>>> +workload is known to trigger this condition, and can generate in excess of 100k
>>>> +MSIs per second.  It has been observed that most systems cannot tolerate this
>>>> +for long, and will crash due to some form of watchdog due to the overhead of
>>>> +the interrupt controller interrupting the host CPU.
>>>> +
>>>> +To mitigate this issue, the QAIC driver implements specific IRQ handling.  When
>>>> +QAIC receives an IRQ, it disables that line.  This prevents the interrupt
>>>> +controller from interrupting the CPU.  Then AIC drains the FIFO.  Once the FIFO
>>>> +is drained, QAIC implements a "last chance" polling algorithm where QAIC will
>>>> +sleep for a time to see if the workload will generate more activity.  The IRQ
>>>> +line remains disabled during this time.  If no activity is detected, QAIC exits
>>>> +polling mode and reenables the IRQ line.
>>>> +
>>>> +This mitigation in QAIC is very effective.  The same lprnet usecase that
>>>> +generates 100k IRQs per second (per /proc/interrupts) is reduced to roughly 64
>>>> +IRQs over 5 minutes while keeping the host system stable, and having the same
>>>> +workload throughput performance (within run to run noise variation).
>>>> +
>>>> +
>>>> +Neural Network Control (NNC) Protocol
>>>> +=====================================
>>>> +
>>>> +The implementation of NNC is split between the KMD (QAIC) and UMD.  In general
>>>> +QAIC understands how to encode/decode NNC wire protocol, and elements of the
>>>> +protocol which require kernelspace knowledge to process (for example, mapping
>>>
>>> kernelspace is missing a space :P
>>>
>>>> +host memory to device IOVAs).  QAIC understands the structure of a message, and
>>>> +all of the transactions.  QAIC does not understand commands (the payload of a
>>>> +passthrough transaction).
>>>> +
>>>> +QAIC handles and enforces the required little endianness and 64-bit alignment,
>>>> +to the degree that it can.  Since QAIC does not know the contents of a
>>>> +passthrough transaction, it relies on the UMD to saitsfy the requirements.
>>>
>>> saitsfy -> satisfy
>>
>> Will do.
>>
>>>
>>>> +The terminate transaction is of particular use to QAIC.  QAIC is not aware of
>>>> +the resources that are loaded onto a device since the majority of that activity
>>>> +occurs within NNC commands.  As a result, QAIC does not have the means to
>>>> +roll back userspace activity.  To ensure that a userspace client's resources
>>>> +are fully released in the case of a process crash, or a bug, QAIC uses the
>>>> +terminate command to let QSM know when a user has gone away, and the resources
>>>> +can be released.
>>>> +
>>>> +QSM can report a version number of the NNC protocol it supports.  This is in the
>>>> +form of a Major number and a Minor number.
>>>> +
>>>> +Major number updates indicate changes to the NNC protocol which impact the
>>>> +message format, or transactions (impacts QAIC).
>>>> +
>>>> +Minor number updates indicate changes to the NNC protocol which impact the
>>>> +commands (does not impact QAIC).
>>>> +
>>>> +uAPI
>>>> +====
>>>> +
>>>> +QAIC defines a number of driver specific IOCTLs as part of the userspace API.
>>>> +This section describes those APIs.
>>>> +
>>>> +DRM_IOCTL_QAIC_MANAGE:
>>>> +This IOCTL allows userspace to send a NNC request to the QSM.  The call will
>>>> +block until a response is received, or the request has timed out.
>>>> +
>>>> +DRM_IOCTL_QAIC_CREATE_BO:
>>>> +This IOCTL allows userspace to allocate a buffer object (BO) which can send or
>>>> +receive data from a workload.  The call will return a GEM handle that
>>>> +represents the allocated buffer.  The BO is not usable until it has been sliced
>>>> +(see DRM_IOCTL_QAIC_ATTACH_SLICE_BO).
>>>> +
>>>> +DRM_IOCTL_QAIC_MMAP_BO:
>>>> +This IOCTL allows userspace to prepare an allocated BO to be mmap'd into the
>>>> +userspace process.
>>>> +
>>>> +DRM_IOCTL_QAIC_ATTACH_SLICE_BO:
>>>> +This IOCTL allows userspace to slice a BO in preparation for sending the BO to
>>>> +the device.  Slicing is the operation of describing what portions of a BO get
>>>> +sent where to a workload.  This requires a set of DMA transfers for the DMA
>>>> +Bridge, and as such, locks the BO to a specific DBC.
>>>> +
>>>> +DRM_IOCTL_QAIC_EXECUTE_BO:
>>>> +This IOCTL allows userspace to submit a set of sliced BOs to the device.  The
>>>> +call is non-blocking.  Success only indicates that the BOs have been queued
>>>> +to the device, but does not guarantee they have been executed.
>>>> +
>>>> +DRM_IOCTL_QAIC_PARTIAL_EXECUTE_BO:
>>>> +This IOCTL operates like DRM_IOCTL_QAIC_EXECUTE_BO, but it allows userspace to
>>>> +shrink the BOs sent to the device for this specific call.  If a BO typically has
>>>> +N inputs, but only a subset of those is available, this IOCTL allows userspace
>>>> +to indicate that only the first M bytes of the BO should be sent to the device
>>>> +to minimize data transfer overhead.  This IOCTL dynamically recomputes the
>>>> +slicing, and therefore has some processing overhead before the BOs can be queued
>>>> +to the device.
>>>> +
>>>> +DRM_IOCTL_QAIC_WAIT_BO:
>>>> +This IOCTL allows userspace to determine when a particular BO has been processed
>>>> +by the device.  The call will block until either the BO has been processed and
>>>> +can be re-queued to the device, or a timeout occurs.
>>>> +
>>>> +DRM_IOCTL_QAIC_PERF_STATS_BO:
>>>> +This IOCTL allows userspace to collect performance statistics on the most
>>>> +recent execution of a BO.  This allows userspace to construct an end to end
>>>> +timeline of the BO processing for a performance analysis.
>>>> +
>>>> +DRM_IOCTL_QAIC_PART_DEV:
>>>> +This IOCTL allows userspace to request a duplicate "shadow device".  This extra
>>>> +accelN device is associated with a specific partition of resources on the AIC100
>>>> +device and can be used for limiting a process to some subset of resources.
>>>> +
>>>> +Userspace Client Isolation
>>>> +==========================
>>>> +
>>>> +AIC100 supports multiple clients.  Multiple DBCs can be consumed by a single
>>>> +client, and multiple clients can each consume one or more DBCs.  Workloads
>>>> +may contain sensistive information therefore only the client that owns the
>>>
>>> sensistive -> sensitive
>>
>> Will do.
>>
>>>
>>>> +workload should be allowed to interface with the DBC.
>>>> +
>>>> +Clients are identified by the instance associated with their open().  A client
>>>> +may only use memory they allocate, and DBCs that are assigned to their
>>>> +workloads.  Attempts to access resources assigned to other clients will be
>>>> +rejected.
>>>> +
>>>> +Module parameters
>>>> +=================
>>>> +
>>>> +QAIC supports the following module parameters:
>>>> +
>>>> +**datapath_polling (bool)**
>>>> +
>>>> +Configures QAIC to use a polling thread for datapath events instead of relying
>>>> +on the device interrupts.  Useful for platforms with broken multiMSI.  Must be
>>>> +set at QAIC driver initialization.  Default is 0 (off).
>>>> +
>>>> +**mhi_timeout (int)**
>>>> +
>>>> +Sets the timeout value for MHI operations in milliseconds (ms).  Must be set
>>>> +at the time the driver detects a device.  Default is 2000 (2 seconds).
>>>> +
>>>> +**control_resp_timeout (int)**
>>>> +
>>>> +Sets the timeout value for QSM responses to NNC messages in seconds (s).  Must
>>>> +be set at the time the driver is sending a request to QSM.  Default is 60 (one
>>>> +minute).
>>>> +
>>>> +**wait_exec_default_timeout (int)**
>>>> +
>>>> +Sets the default timeout for the wait_exec ioctl in milliseconds (ms).  Must be
>>>> +set prior to the waic_exec ioctl call.  A value specified in the ioctl call
>>>> +overrides this for that call.  Default is 5000 (5 seconds).
>>>> +
>>>> +**datapath_poll_interval_us (int)**
>>>> +
>>>> +Sets the polling interval in microseconds (us) when datapath polling is active.
>>>> +Takes effect at the next polling interval.  Default is 100 (100 us).
>>>
>>> Cool that you are staring with the documentation :)
>>> I suggest running at least "checkpatch.pl --codespell" on the series as there are many spelling issues.
>>
>> That's what happened to checkpatch.  Thanks for pointing out the flag.  I remember it doing spellcheck by default.  Apparently it needs a flag now.  I'll do that going forward.
>>
>>>
>>> Regards,
>>> Jacek
>>>
>>>
>>>
>>


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 0/8] QAIC accel driver
  2023-02-08 22:01   ` Jeffrey Hugo
@ 2023-02-17 16:14     ` Jeffrey Hugo
  -1 siblings, 0 replies; 68+ messages in thread
From: Jeffrey Hugo @ 2023-02-17 16:14 UTC (permalink / raw)
  To: ogabbay, airlied, daniel, dri-devel
  Cc: linux-doc, linux-arm-msm, quic_ajitpals, quic_pkanojiy,
	stanislaw.gruszka, quic_carlv, jacek.lawrynowicz

On 2/8/2023 3:01 PM, Jeffrey Hugo wrote:
> On 2/6/2023 8:41 AM, Jeffrey Hugo wrote:
>> Regarding the open userspace (see the documentation patch), the UMD and
>> compiler are a week or so away from being posted in the indicated repos.
>> Just need to polish some documentation.
> 
> An update to this, the compiler is now live on github at the link 
> specified in the documentation patch.

The UMD is now posted.

-Jeff

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 0/8] QAIC accel driver
@ 2023-02-17 16:14     ` Jeffrey Hugo
  0 siblings, 0 replies; 68+ messages in thread
From: Jeffrey Hugo @ 2023-02-17 16:14 UTC (permalink / raw)
  To: ogabbay, airlied, daniel, dri-devel
  Cc: linux-doc, linux-arm-msm, quic_ajitpals, quic_pkanojiy,
	stanislaw.gruszka, quic_carlv, jacek.lawrynowicz

On 2/8/2023 3:01 PM, Jeffrey Hugo wrote:
> On 2/6/2023 8:41 AM, Jeffrey Hugo wrote:
>> Regarding the open userspace (see the documentation patch), the UMD and
>> compiler are a week or so away from being posted in the indicated repos.
>> Just need to polish some documentation.
> 
> An update to this, the compiler is now live on github at the link 
> specified in the documentation patch.

The UMD is now posted.

-Jeff

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 2/8] accel/qaic: Add uapi and core driver file
  2023-02-16 14:13   ` Jacek Lawrynowicz
@ 2023-02-17 18:15       ` Jeffrey Hugo
  0 siblings, 0 replies; 68+ messages in thread
From: Jeffrey Hugo @ 2023-02-17 18:15 UTC (permalink / raw)
  To: Jacek Lawrynowicz, ogabbay, airlied, daniel, dri-devel
  Cc: linux-doc, linux-arm-msm, quic_ajitpals, quic_pkanojiy,
	stanislaw.gruszka, quic_carlv

On 2/16/2023 7:13 AM, Jacek Lawrynowicz wrote:
> Hi,
> 
> On 06.02.2023 16:41, Jeffrey Hugo wrote:
>> Add the QAIC driver uapi file and core driver file that binds to the PCIe
>> device.  The core driver file also creates the accel device and manages
>> all the interconnections between the different parts of the driver.
>>
>> The driver can be built as a module.  If so, it will be called "qaic.ko".
>>
>> Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
>> Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com>
>> ---
>>   drivers/accel/qaic/qaic.h     | 321 ++++++++++++++++++
>>   drivers/accel/qaic/qaic_drv.c | 771 ++++++++++++++++++++++++++++++++++++++++++
>>   include/uapi/drm/qaic_accel.h | 283 ++++++++++++++++
>>   3 files changed, 1375 insertions(+)
>>   create mode 100644 drivers/accel/qaic/qaic.h
>>   create mode 100644 drivers/accel/qaic/qaic_drv.c
>>   create mode 100644 include/uapi/drm/qaic_accel.h
>>
>> diff --git a/drivers/accel/qaic/qaic.h b/drivers/accel/qaic/qaic.h
>> new file mode 100644
>> index 0000000..3f7ea76
>> --- /dev/null
>> +++ b/drivers/accel/qaic/qaic.h
>> @@ -0,0 +1,321 @@
>> +/* SPDX-License-Identifier: GPL-2.0-only
>> + *
>> + * Copyright (c) 2019-2021, The Linux Foundation. All rights reserved.
>> + * Copyright (c) 2021-2023 Qualcomm Innovation Center, Inc. All rights reserved.
>> + */
>> +
>> +#ifndef QAICINTERNAL_H_
> 
> Please use guard macro that matches the file name: _QAIC_H_

Before moving to DRM/ACCEL, this conflicted with the uapi file. 
However, that is no longer the case, so yes, this should be changed. 
Will do.
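
For reference, the corrected guard would simply be (matching the file
name, as you suggest):

#ifndef _QAIC_H_
#define _QAIC_H_

/* existing header contents unchanged */

#endif /* _QAIC_H_ */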

> 
>> +#define QAICINTERNAL_H_
>> +
>> +#include <linux/interrupt.h>
>> +#include <linux/kref.h>
>> +#include <linux/mhi.h>
>> +#include <linux/mutex.h>
>> +#include <linux/pci.h>
>> +#include <linux/spinlock.h>
>> +#include <linux/srcu.h>
>> +#include <linux/wait.h>
>> +#include <linux/workqueue.h>
>> +#include <drm/drm_device.h>
>> +#include <drm/drm_gem.h>
>> +
>> +#define QAIC_DBC_BASE		0x20000
>> +#define QAIC_DBC_SIZE		0x1000
>> +
>> +#define QAIC_NO_PARTITION	-1
>> +
>> +#define QAIC_DBC_OFF(i)		((i) * QAIC_DBC_SIZE + QAIC_DBC_BASE)
>> +
>> +#define to_qaic_bo(obj) container_of(obj, struct qaic_bo, base)
>> +
>> +extern bool poll_datapath;
>> +
>> +struct qaic_user {
>> +	/* Uniquely identifies this user for the device */
>> +	int			handle;
>> +	struct kref		ref_count;
>> +	/* Char device opened by this user */
>> +	struct qaic_drm_device	*qddev;
>> +	/* Node in list of users that opened this drm device */
>> +	struct list_head	node;
>> +	/* SRCU used to synchronize this user during cleanup */
>> +	struct srcu_struct	qddev_lock;
>> +	atomic_t		chunk_id;
>> +};
>> +
>> +struct dma_bridge_chan {
>> +	/* Pointer to device strcut maintained by driver */
>> +	struct qaic_device	*qdev;
>> +	/* ID of this DMA bridge channel(DBC) */
>> +	unsigned int		id;
>> +	/* Synchronizes access to xfer_list */
>> +	spinlock_t		xfer_lock;
>> +	/* Base address of request queue */
>> +	void			*req_q_base;
>> +	/* Base address of response queue */
>> +	void			*rsp_q_base;
>> +	/*
>> +	 * Base bus address of request queue. Response queue bus address can be
>> +	 * calculated by adding request queue size to this variable
>> +	 */
>> +	dma_addr_t		dma_addr;
>> +	/* Total size of request and response queue in byte */
>> +	u32			total_size;
>> +	/* Capacity of request/response queue */
>> +	u32			nelem;
>> +	/* The user that opened this DBC */
>> +	struct qaic_user	*usr;
>> +	/*
>> +	 * Request ID of next memory handle that goes in request queue. One
>> +	 * memory handle can enqueue more than one request elements, all
>> +	 * this requests that belong to same memory handle have same request ID
>> +	 */
>> +	u16			next_req_id;
>> +	/* TRUE: DBC is in use; FALSE: DBC not in use */
> 
> Use standard "true"/"false" instead of custom "TRUE"/"FALSE" macros.
> This applies here and in multiple other places in the driver.

I think what you are getting at is that the documentation could be 
confusing.  I don't see any custom macro use in the code.  Will try to 
clarify that here.
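
Concretely, I'd reword the comments to use lowercase values so they read
as plain bool values rather than macros, e.g.:

	/* true: DBC is in use; false: DBC is not in use */
	bool			in_use;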

>> +	bool			in_use;
>> +	/*
>> +	 * Base address of device registers. Used to read/write request and
>> +	 * response queue's head and tail pointer of this DBC.
>> +	 */
>> +	void __iomem		*dbc_base;
>> +	/* Head of list where each node is a memory handle queued in request queue */
>> +	struct list_head	xfer_list;
>> +	/* Synchronizes DBC readers during cleanup */
>> +	struct srcu_struct	ch_lock;
>> +	/*
>> +	 * When this DBC is released, any thread waiting on this wait queue is
>> +	 * woken up
>> +	 */
>> +	wait_queue_head_t	dbc_release;
>> +	/* Head of list where each node is a bo associated with this DBC */
>> +	struct list_head	bo_lists;
>> +	/* The irq line for this DBC.  Used for polling */
>> +	unsigned int		irq;
>> +	/* Polling work item to simulate interrupts */
>> +	struct work_struct	poll_work;
>> +};
>> +
>> +struct qaic_device {
>> +	/* Pointer to base PCI device struct of our physical device */
>> +	struct pci_dev		*pdev;
>> +	/* Mask of all bars of this device */
>> +	int			bars;
>> +	/* Req. ID of request that will be queued next in MHI control device */
>> +	u32			next_seq_num;
>> +	/* Base address of bar 0 */
>> +	void __iomem		*bar_0;
>> +	/* Base address of bar 2 */
>> +	void __iomem		*bar_2;
>> +	/* Controller structure for MHI devices */
>> +	struct mhi_controller	*mhi_cntl;
>> +	/* MHI control channel device */
>> +	struct mhi_device	*cntl_ch;
>> +	/* List of requests queued in MHI control device */
>> +	struct list_head	cntl_xfer_list;
>> +	/* Synchronizes MHI control device transactions and its xfer list */
>> +	struct mutex		cntl_mutex;
>> +	/* Base actual physical representation of drm device */
>> +	struct qaic_drm_device	*base_dev;
>> +	/* Array of DBC struct of this device */
>> +	struct dma_bridge_chan	*dbc;
>> +	/* Work queue for tasks related to MHI control device */
>> +	struct workqueue_struct	*cntl_wq;
>> +	/* Synchronizes all the users of device during cleanup */
>> +	struct srcu_struct	dev_lock;
>> +	/* TRUE: Device under reset; FALSE: Device not under reset */
>> +	bool			in_reset;
>> +	/*
>> +	 * TRUE: A tx MHI transaction has failed and a rx buffer is still queued
>> +	 * in control device. Such a buffer is considered lost rx buffer
>> +	 * FALSE: No rx buffer is lost in control device
>> +	 */
>> +	bool			cntl_lost_buf;
>> +	/* Maximum number of DBC supported by this device */
>> +	u32			num_dbc;
>> +	/* Head in list of drm device created on top of this device */
>> +	struct list_head	qaic_drm_devices;
>> +	/* Synchronizes access of qaic_drm_devices list */
>> +	struct mutex		qaic_drm_devices_mutex;
>> +	/* Generate the CRC of a control message */
>> +	u32 (*gen_crc)(void *msg);
>> +	/* Validate the CRC of a control message */
>> +	bool (*valid_crc)(void *msg);
>> +};
>> +
>> +struct qaic_drm_device {
>> +	/* Pointer to the root device struct driven by this driver */
>> +	struct qaic_device	*qdev;
>> +	/* Node in list of drm devices maintained by root device */
>> +	struct list_head	node;
>> +	/*
>> +	 * The physical device can be partition in number of logical devices.
>> +	 * And each logical device is given a partition id. This member stores
>> +	 * that id. QAIC_NO_PARTITION is a sentinel used to mark that this drm
>> +	 * device is the actual physical device
>> +	 */
>> +	s32			partition_id;
>> +	/*
>> +	 * It points to the user that created this drm device. It is NULL
>> +	 * when this drm device represents the physical device i.e.
>> +	 * partition_id is QAIC_NO_PARTITION
>> +	 */
>> +	struct qaic_user	*owner;
>> +	/* Pointer to the drm device struct of this drm device */
>> +	struct drm_device	*ddev;
>> +	/* Head in list of users who have opened this drm device */
>> +	struct list_head	users;
>> +	/* Synchronizes access to users list */
>> +	struct mutex		users_mutex;
>> +};
>> +
>> +struct qaic_bo {
>> +	struct drm_gem_object	base;
> 
> Any reason why drm_gem_shmem_object cannot be used as a base?

I believe that is incompatible with our memory allocator in patch 5.

>> +	/* Scatter/gather table for allocate/imported BO */
>> +	struct sg_table		*sgt;
>> +	/* BO size requested by user. GEM object might be bigger in size. */
>> +	u64			size;
>> +	/* Head in list of slices of this BO */
>> +	struct list_head	slices;
>> +	/* Total nents, for all slices of this BO */
>> +	int			total_slice_nents;
>> +	/*
>> +	 * Direction of transfer. It can assume only two value DMA_TO_DEVICE and
>> +	 * DMA_FROM_DEVICE.
>> +	 */
>> +	int			dir;
>> +	/* The pointer of the DBC which operates on this BO */
>> +	struct dma_bridge_chan	*dbc;
>> +	/* Number of slice that belongs to this buffer */
>> +	u32			nr_slice;
>> +	/* Number of slice that have been transferred by DMA engine */
>> +	u32			nr_slice_xfer_done;
>> +	/* TRUE = BO is queued for execution, FALSE = BO is not queued */
>> +	bool			queued;
>> +	/*
>> +	 * If TRUE then user has attached slicing information to this BO by
>> +	 * calling DRM_IOCTL_QAIC_ATTACH_SLICE_BO ioctl.
>> +	 */
>> +	bool			sliced;
>> +	/* Request ID of this BO if it is queued for execution */
>> +	u16			req_id;
>> +	/* Handle assigned to this BO */
>> +	u32			handle;
>> +	/* Wait on this for completion of DMA transfer of this BO */
>> +	struct completion	xfer_done;
>> +	/*
>> +	 * Node in linked list where head is dbc->xfer_list.
>> +	 * This link list contain BO's that are queued for DMA transfer.
>> +	 */
>> +	struct list_head	xfer_list;
>> +	/*
>> +	 * Node in linked list where head is dbc->bo_lists.
>> +	 * This link list contain BO's that are associated with the DBC it is
>> +	 * linked to.
>> +	 */
>> +	struct list_head	bo_list;
>> +	struct {
>> +		/*
>> +		 * Latest timestamp(ns) at which kernel received a request to
>> +		 * execute this BO
>> +		 */
>> +		u64		req_received_ts;
>> +		/*
>> +		 * Latest timestamp(ns) at which kernel enqueued requests of
>> +		 * this BO for execution in DMA queue
>> +		 */
>> +		u64		req_submit_ts;
>> +		/*
>> +		 * Latest timestamp(ns) at which kernel received a completion
>> +		 * interrupt for requests of this BO
>> +		 */
>> +		u64		req_processed_ts;
>> +		/*
>> +		 * Number of elements already enqueued in DMA queue before
>> +		 * enqueuing requests of this BO
>> +		 */
>> +		u32		queue_level_before;
>> +	} perf_stats;
>> +
>> +};
>> +
>> +struct bo_slice {
>> +	/* Mapped pages */
>> +	struct sg_table		*sgt;
>> +	/* Number of requests required to queue in DMA queue */
>> +	int			nents;
>> +	/* See enum dma_data_direction */
>> +	int			dir;
>> +	/* Actual requests that will be copied in DMA queue */
>> +	struct dbc_req		*reqs;
>> +	struct kref		ref_count;
>> +	/* TRUE: No DMA transfer required */
>> +	bool			no_xfer;
>> +	/* Pointer to the parent BO handle */
>> +	struct qaic_bo		*bo;
>> +	/* Node in list of slices maintained by parent BO */
>> +	struct list_head	slice;
>> +	/* Size of this slice in bytes */
>> +	u64			size;
>> +	/* Offset of this slice in buffer */
>> +	u64			offset;
>> +};
>> +
>> +int get_dbc_req_elem_size(void);
>> +int get_dbc_rsp_elem_size(void);
>> +int get_cntl_version(struct qaic_device *qdev, struct qaic_user *usr,
>> +		     u16 *major, u16 *minor);
>> +int qaic_manage_ioctl(struct drm_device *dev, void *data,
>> +		      struct drm_file *file_priv);
>> +int qaic_execute_ioctl(struct qaic_device *qdev, struct qaic_user *usr,
>> +		       unsigned long arg, bool is_partial);
>> +int qaic_wait_exec_ioctl(struct qaic_device *qdev, struct qaic_user *usr,
>> +			 unsigned long arg);
>> +int qaic_query_ioctl(struct qaic_device *qdev, struct qaic_user *usr,
>> +		     unsigned long arg);
>> +int qaic_data_mmap(struct qaic_device *qdev, struct qaic_user *usr,
>> +		   struct vm_area_struct *vma);
>> +int qaic_data_get_reservation(struct qaic_device *qdev, struct qaic_user *usr,
>> +			      void *data, u32 *partition_id,
>> +			      u16 *remove);
>> +void qaic_mhi_ul_xfer_cb(struct mhi_device *mhi_dev,
>> +			 struct mhi_result *mhi_result);
>> +
>> +void qaic_mhi_dl_xfer_cb(struct mhi_device *mhi_dev,
>> +			 struct mhi_result *mhi_result);
>> +
>> +int qaic_control_open(struct qaic_device *qdev);
>> +void qaic_control_close(struct qaic_device *qdev);
>> +void qaic_release_usr(struct qaic_device *qdev, struct qaic_user *usr);
>> +
>> +irqreturn_t dbc_irq_threaded_fn(int irq, void *data);
>> +irqreturn_t dbc_irq_handler(int irq, void *data);
>> +int disable_dbc(struct qaic_device *qdev, u32 dbc_id, struct qaic_user *usr);
>> +void enable_dbc(struct qaic_device *qdev, u32 dbc_id, struct qaic_user *usr);
>> +void wakeup_dbc(struct qaic_device *qdev, u32 dbc_id);
>> +void release_dbc(struct qaic_device *qdev, u32 dbc_id);
>> +
>> +void wake_all_cntl(struct qaic_device *qdev);
>> +void qaic_dev_reset_clean_local_state(struct qaic_device *qdev, bool exit_reset);
>> +
>> +struct drm_gem_object *qaic_gem_prime_import(struct drm_device *dev,
>> +					     struct dma_buf *dma_buf);
>> +
>> +int qaic_create_bo_ioctl(struct drm_device *dev, void *data,
>> +			 struct drm_file *file_priv);
>> +int qaic_mmap_bo_ioctl(struct drm_device *dev, void *data,
>> +		       struct drm_file *file_priv);
>> +int qaic_attach_slice_bo_ioctl(struct drm_device *dev, void *data,
>> +			       struct drm_file *file_priv);
>> +int qaic_execute_bo_ioctl(struct drm_device *dev, void *data,
>> +			  struct drm_file *file_priv);
>> +int qaic_partial_execute_bo_ioctl(struct drm_device *dev, void *data,
>> +				  struct drm_file *file_priv);
>> +int qaic_wait_bo_ioctl(struct drm_device *dev, void *data,
>> +		       struct drm_file *file_priv);
>> +int qaic_test_print_bo_ioctl(struct drm_device *dev, void *data,
>> +			     struct drm_file *file_priv);
>> +int qaic_perf_stats_bo_ioctl(struct drm_device *dev, void *data,
>> +			     struct drm_file *file_priv);
> 
> You don't need to break these lines. You can use up to 100 columns in the whole driver.
> It will be more readable and checkpatch won't complain.

Some of this predates the 100 column change.  Checkpatch already 
complains about the uapi file.

Will try to fix here.

>> +void irq_polling_work(struct work_struct *work);
>> +
>> +#endif /* QAICINTERNAL_H_ */
>> diff --git a/drivers/accel/qaic/qaic_drv.c b/drivers/accel/qaic/qaic_drv.c
>> new file mode 100644
>> index 0000000..602a784
>> --- /dev/null
>> +++ b/drivers/accel/qaic/qaic_drv.c
>> @@ -0,0 +1,771 @@
>> +// SPDX-License-Identifier: GPL-2.0-only
>> +
>> +/* Copyright (c) 2019-2021, The Linux Foundation. All rights reserved. */
>> +/* Copyright (c) 2021-2023 Qualcomm Innovation Center, Inc. All rights reserved. */
>> +
>> +#include <linux/delay.h>
>> +#include <linux/dma-mapping.h>
>> +#include <linux/idr.h>
>> +#include <linux/interrupt.h>
>> +#include <linux/list.h>
>> +#include <linux/kref.h>
>> +#include <linux/mhi.h>
>> +#include <linux/module.h>
>> +#include <linux/msi.h>
>> +#include <linux/mutex.h>
>> +#include <linux/pci.h>
>> +#include <linux/sched.h>
> 
> Is <linux/sched.h> used here?
> Feels like there are couple other unused includes in this file.

Actually I think that can be removed now.  Will check.
The other ones look sane to me.  Are there specific ones you question?

> 
>> +#include <linux/spinlock.h>
>> +#include <linux/workqueue.h>
>> +#include <linux/wait.h>
>> +#include <drm/drm_accel.h>
>> +#include <drm/drm_drv.h>
>> +#include <drm/drm_file.h>
>> +#include <drm/drm_gem.h>
>> +#include <drm/drm_ioctl.h>
>> +#include <uapi/drm/qaic_accel.h>
>> +
>> +#include "mhi_controller.h"
>> +#include "mhi_qaic_ctrl.h"
>> +#include "qaic.h"
>> +
>> +MODULE_IMPORT_NS(DMA_BUF);
>> +
>> +#define PCI_DEV_AIC100			0xa100
>> +#define QAIC_NAME			"qaic"
>> +#define QAIC_DESC			"Qualcomm Cloud AI Accelerators"
>> +
>> +static unsigned int datapath_polling;
>> +module_param(datapath_polling, uint, 0400);
>> +bool poll_datapath;
>> +
>> +static u16 cntl_major = 5;
>> +static u16 cntl_minor;/* 0 */
> 
> Missing space before the comment.
> And also you could convert both vars to macros as they are constants.

Sure
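
Roughly this, assuming the names are agreeable:

#define QAIC_CNTL_MAJOR 5
#define QAIC_CNTL_MINOR 0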

>> +static bool link_up;
>> +static DEFINE_IDA(qaic_usrs);
>> +
>> +static int qaic_create_drm_device(struct qaic_device *qdev, s32 partition_id,
>> +				  struct qaic_user *owner);
>> +static void qaic_destroy_drm_device(struct qaic_device *qdev, s32 partition_id,
>> +				    struct qaic_user *owner);
>> +
>> +static void free_usr(struct kref *kref)
>> +{
>> +	struct qaic_user *usr = container_of(kref, struct qaic_user, ref_count);
>> +
>> +	cleanup_srcu_struct(&usr->qddev_lock);
>> +	ida_free(&qaic_usrs, usr->handle);
>> +	kfree(usr);
>> +}
>> +
>> +static int qaic_open(struct drm_device *dev, struct drm_file *file)
>> +{
>> +	struct qaic_drm_device *qddev = dev->dev_private;
>> +	struct qaic_device *qdev = qddev->qdev;
>> +	struct qaic_user *usr;
>> +	int rcu_id;
>> +	int ret;
>> +
>> +	rcu_id = srcu_read_lock(&qdev->dev_lock);
>> +	if (qdev->in_reset) {
>> +		srcu_read_unlock(&qdev->dev_lock, rcu_id);
>> +		return -ENODEV;
>> +	}
>> +
>> +	usr = kmalloc(sizeof(*usr), GFP_KERNEL);
>> +	if (!usr) {
>> +		srcu_read_unlock(&qdev->dev_lock, rcu_id);
>> +		return -ENOMEM;
>> +	}
>> +
>> +	usr->handle = ida_alloc(&qaic_usrs, GFP_KERNEL);
>> +	if (usr->handle < 0) {
>> +		srcu_read_unlock(&qdev->dev_lock, rcu_id);
>> +		kfree(usr);
>> +		return usr->handle;
>> +	}
>> +	usr->qddev = qddev;
>> +	atomic_set(&usr->chunk_id, 0);
>> +	init_srcu_struct(&usr->qddev_lock);
>> +	kref_init(&usr->ref_count);
>> +
>> +	ret = mutex_lock_interruptible(&qddev->users_mutex);
>> +	if (ret) {
>> +		cleanup_srcu_struct(&usr->qddev_lock);
>> +		kfree(usr);
>> +		srcu_read_unlock(&qdev->dev_lock, rcu_id);
>> +		return ret;
>> +	}
>> +
>> +	list_add(&usr->node, &qddev->users);
>> +	mutex_unlock(&qddev->users_mutex);
>> +
>> +	file->driver_priv = usr;
>> +
>> +	srcu_read_unlock(&qdev->dev_lock, rcu_id);
>> +	return 0;
>> +}
>> +
>> +static void qaic_postclose(struct drm_device *dev, struct drm_file *file)
>> +{
>> +	struct qaic_user *usr = file->driver_priv;
>> +	struct qaic_drm_device *qddev;
>> +	struct qaic_device *qdev;
>> +	int qdev_rcu_id;
>> +	int usr_rcu_id;
>> +	int i;
>> +
>> +	qddev = usr->qddev;
>> +	usr_rcu_id = srcu_read_lock(&usr->qddev_lock);
>> +	if (qddev) {
>> +		qdev = qddev->qdev;
>> +		qdev_rcu_id = srcu_read_lock(&qdev->dev_lock);
>> +		if (!qdev->in_reset) {
>> +			qaic_release_usr(qdev, usr);
>> +			for (i = 0; i < qdev->num_dbc; ++i)
>> +				if (qdev->dbc[i].usr &&
>> +				    qdev->dbc[i].usr->handle == usr->handle)
>> +					release_dbc(qdev, i);
>> +
>> +			/* Remove child devices */
>> +			if (qddev->partition_id == QAIC_NO_PARTITION)
>> +				qaic_destroy_drm_device(qdev, QAIC_NO_PARTITION, usr);
>> +		}
>> +		srcu_read_unlock(&qdev->dev_lock, qdev_rcu_id);
>> +
>> +		mutex_lock(&qddev->users_mutex);
>> +		if (!list_empty(&usr->node))
>> +			list_del_init(&usr->node);
>> +		mutex_unlock(&qddev->users_mutex);
>> +	}
>> +
>> +	srcu_read_unlock(&usr->qddev_lock, usr_rcu_id);
>> +	kref_put(&usr->ref_count, free_usr);
>> +
>> +	file->driver_priv = NULL;
>> +}
>> +
>> +static int qaic_part_dev_ioctl(struct drm_device *dev, void *data,
>> +			       struct drm_file *file_priv)
>> +{
>> +	struct qaic_device *qdev;
>> +	struct qaic_user *usr;
>> +	u32 partition_id;
>> +	int qdev_rcu_id;
>> +	int usr_rcu_id;
>> +	int ret = 0;
>> +	u16 remove;
>> +
>> +	usr = file_priv->driver_priv;
>> +	if (!usr)
>> +		return -EINVAL;
>> +
>> +	usr_rcu_id = srcu_read_lock(&usr->qddev_lock);
>> +	if (!usr->qddev) {
>> +		srcu_read_unlock(&usr->qddev_lock, usr_rcu_id);
>> +		return -ENODEV;
>> +	}
>> +
>> +	qdev = usr->qddev->qdev;
>> +	if (!qdev) {
>> +		srcu_read_unlock(&usr->qddev_lock, usr_rcu_id);
>> +		return -ENODEV;
>> +	}
>> +
>> +	qdev_rcu_id = srcu_read_lock(&qdev->dev_lock);
>> +	if (qdev->in_reset) {
>> +		ret = -ENODEV;
>> +		goto out;
>> +	}
>> +
>> +	/* This IOCTL is only supported for base devices. */
>> +	if (usr->qddev->partition_id != QAIC_NO_PARTITION) {
>> +		ret = -ENOTTY;
>> +		goto out;
>> +	}
>> +
>> +	ret = qaic_data_get_reservation(qdev, usr, data, &partition_id,
>> +					&remove);
>> +	if (ret)
>> +		goto out;
>> +
>> +	if (remove == 1)
>> +		qaic_destroy_drm_device(qdev, partition_id, usr);
>> +	else
>> +		ret = qaic_create_drm_device(qdev, partition_id, usr);
>> +
>> +out:
>> +	srcu_read_unlock(&qdev->dev_lock, qdev_rcu_id);
>> +	srcu_read_unlock(&usr->qddev_lock, usr_rcu_id);
>> +
>> +	return ret;
>> +}
>> +
>> +DEFINE_DRM_ACCEL_FOPS(qaic_accel_fops);
>> +
>> +static const struct drm_ioctl_desc qaic_drm_ioctls[] = {
>> +	DRM_IOCTL_DEF_DRV(QAIC_MANAGE, qaic_manage_ioctl, DRM_RENDER_ALLOW),
>> +	DRM_IOCTL_DEF_DRV(QAIC_CREATE_BO, qaic_create_bo_ioctl, DRM_RENDER_ALLOW),
>> +	DRM_IOCTL_DEF_DRV(QAIC_MMAP_BO, qaic_mmap_bo_ioctl, DRM_RENDER_ALLOW),
>> +	DRM_IOCTL_DEF_DRV(QAIC_ATTACH_SLICE_BO, qaic_attach_slice_bo_ioctl, DRM_RENDER_ALLOW),
>> +	DRM_IOCTL_DEF_DRV(QAIC_EXECUTE_BO, qaic_execute_bo_ioctl, DRM_RENDER_ALLOW),
>> +	DRM_IOCTL_DEF_DRV(QAIC_PARTIAL_EXECUTE_BO, qaic_partial_execute_bo_ioctl, DRM_RENDER_ALLOW),
>> +	DRM_IOCTL_DEF_DRV(QAIC_WAIT_BO, qaic_wait_bo_ioctl, DRM_RENDER_ALLOW),
>> +	DRM_IOCTL_DEF_DRV(QAIC_PERF_STATS_BO, qaic_perf_stats_bo_ioctl, DRM_RENDER_ALLOW),
>> +	DRM_IOCTL_DEF_DRV(QAIC_PART_DEV, qaic_part_dev_ioctl, DRM_RENDER_ALLOW),
>> +};
>> +
>> +static const struct drm_driver qaic_accel_driver = {
>> +	.driver_features	= DRIVER_GEM | DRIVER_COMPUTE_ACCEL,
>> +
>> +	.name			= QAIC_NAME,
>> +	.desc			= QAIC_DESC,
>> +	.date			= "20190618",
>> +
>> +	.fops			= &qaic_accel_fops,
>> +	.open			= qaic_open,
>> +	.postclose		= qaic_postclose,
>> +
>> +	.ioctls			= qaic_drm_ioctls,
>> +	.num_ioctls		= ARRAY_SIZE(qaic_drm_ioctls),
>> +	.prime_fd_to_handle	= drm_gem_prime_fd_to_handle,
>> +	.gem_prime_import	= qaic_gem_prime_import,
>> +};
>> +
>> +static int qaic_create_drm_device(struct qaic_device *qdev, s32 partition_id,
>> +				  struct qaic_user *owner)
>> +{
>> +	struct qaic_drm_device *qddev;
>> +	struct drm_device *ddev;
>> +	struct device *pdev;
>> +	int ret;
>> +
>> +	/*
>> +	 * Partition id QAIC_NO_PARTITION indicates that the device was created
>> +	 * on mhi_probe and id > QAIC_NO_PARTITION indicates a partition
>> +	 * created using IOCTL. So, pdev for primary device is the pci dev and
>> +	 * the parent for partition dev is the primary device.
>> +	 */
>> +	if (partition_id == QAIC_NO_PARTITION)
>> +		pdev = &qdev->pdev->dev;
>> +	else
>> +		pdev = qdev->base_dev->ddev->dev;
>> +
>> +	qddev = kzalloc(sizeof(*qddev), GFP_KERNEL);
>> +	if (!qddev) {
>> +		ret = -ENOMEM;
>> +		goto qddev_fail;
>> +	}
>> +
>> +	ddev = drm_dev_alloc(&qaic_accel_driver, pdev);
>> +	if (IS_ERR(ddev)) {
>> +		ret = PTR_ERR(ddev);
>> +		goto ddev_fail;
>> +	}
>> +
>> +	ddev->dev_private = qddev;
>> +	qddev->ddev = ddev;
>> +
>> +	if (partition_id == QAIC_NO_PARTITION)
>> +		qdev->base_dev = qddev;
>> +	qddev->qdev = qdev;
>> +	qddev->partition_id = partition_id;
>> +	qddev->owner = owner;
>> +	INIT_LIST_HEAD(&qddev->users);
>> +	mutex_init(&qddev->users_mutex);
>> +
>> +	mutex_lock(&qdev->qaic_drm_devices_mutex);
>> +	list_add(&qddev->node, &qdev->qaic_drm_devices);
>> +	mutex_unlock(&qdev->qaic_drm_devices_mutex);
>> +
>> +	ret = drm_dev_register(ddev, 0);
>> +	if (ret) {
>> +		pci_dbg(qdev->pdev, "%s: drm_dev_register failed %d\n", __func__, ret);
>> +		goto drm_reg_fail;
>> +	}
>> +
>> +	return 0;
>> +
>> +drm_reg_fail:
>> +	mutex_destroy(&qddev->users_mutex);
>> +	mutex_lock(&qdev->qaic_drm_devices_mutex);
>> +	list_del(&qddev->node);
>> +	mutex_unlock(&qdev->qaic_drm_devices_mutex);
>> +	if (partition_id == QAIC_NO_PARTITION)
>> +		qdev->base_dev = NULL;
>> +	drm_dev_put(ddev);
>> +ddev_fail:
>> +	kfree(qddev);
>> +qddev_fail:
>> +	return ret;
>> +}
>> +
>> +static void qaic_destroy_drm_device(struct qaic_device *qdev, s32 partition_id,
>> +				    struct qaic_user *owner)
>> +{
>> +	struct qaic_drm_device *qddev;
>> +	struct qaic_drm_device *q;
>> +	struct qaic_user *usr;
>> +
>> +	list_for_each_entry_safe(qddev, q, &qdev->qaic_drm_devices, node) {
>> +		/*
>> +		 * Skip devices in case we just want to remove devices
>> +		 * specific to a owner or partition id.
>> +		 *
>> +		 * owner	partition_id	notes
>> +		 * ----------------------------------
>> +		 * NULL		NO_PARTITION	delete base + all derived (qdev
>> +		 *				reset)
>> +		 * !NULL	NO_PARTITION	delete derived devs created by
>> +		 *				owner.
>> +		 * !NULL	>NO_PARTITION	delete derived dev identified by
>> +		 *				the partition id and created by
>> +		 *				owner
>> +		 * NULL		>NO_PARTITION	invalid (no-op)
>> +		 *
>> +		 * if partition_id is any value < QAIC_NO_PARTITION this will be
>> +		 * a no-op.
>> +		 */
>> +		if (owner && owner != qddev->owner)
>> +			continue;
>> +
>> +		if (partition_id != QAIC_NO_PARTITION &&
>> +		    partition_id != qddev->partition_id && !owner)
>> +			continue;
>> +
>> +		/*
>> +		 * Existing users get unresolvable errors till they close FDs.
>> +		 * Need to sync carefully with users calling close().  The
>> +		 * list of users can be modified elsewhere when the lock isn't
>> +		 * held here, but the sync'ing the srcu with the mutex held
>> +		 * could deadlock.  Grab the mutex so that the list will be
>> +		 * unmodified.  The user we get will exist as long as the
>> +		 * lock is held.  Signal that the qcdev is going away, and
>> +		 * grab a reference to the user so they don't go away for
>> +		 * synchronize_srcu().  Then release the mutex to avoid
>> +		 * deadlock and make sure the user has observed the signal.
>> +		 * With the lock released, we cannot maintain any state of the
>> +		 * user list.
>> +		 */
>> +		mutex_lock(&qddev->users_mutex);
>> +		while (!list_empty(&qddev->users)) {
>> +			usr = list_first_entry(&qddev->users, struct qaic_user,
>> +					       node);
>> +			list_del_init(&usr->node);
>> +			kref_get(&usr->ref_count);
>> +			usr->qddev = NULL;
>> +			mutex_unlock(&qddev->users_mutex);
>> +			synchronize_srcu(&usr->qddev_lock);
>> +			kref_put(&usr->ref_count, free_usr);
>> +			mutex_lock(&qddev->users_mutex);
>> +		}
>> +		mutex_unlock(&qddev->users_mutex);
>> +
>> +		if (qddev->ddev) {
>> +			drm_dev_unregister(qddev->ddev);
>> +			drm_dev_put(qddev->ddev);
>> +		}
>> +
>> +		list_del(&qddev->node);
>> +		kfree(qddev);
>> +	}
>> +}
>> +
>> +static int qaic_mhi_probe(struct mhi_device *mhi_dev,
>> +			  const struct mhi_device_id *id)
>> +{
>> +	struct qaic_device *qdev;
>> +	u16 major, minor;
>> +	int ret;
>> +
>> +	/*
>> +	 * Invoking this function indicates that the control channel to the
>> +	 * device is available.  We use that as a signal to indicate that
>> +	 * the device side firmware has booted.  The device side firmware
>> +	 * manages the device resources, so we need to communicate with it
>> +	 * via the control channel in order to utilize the device.  Therefore
>> +	 * we wait until this signal to create the drm dev that userspace will
>> +	 * use to control the device, because without the device side firmware,
>> +	 * userspace can't do anything useful.
>> +	 */
>> +
>> +	qdev = pci_get_drvdata(to_pci_dev(mhi_dev->mhi_cntrl->cntrl_dev));
>> +
>> +	qdev->in_reset = false;
>> +
>> +	dev_set_drvdata(&mhi_dev->dev, qdev);
>> +	qdev->cntl_ch = mhi_dev;
>> +
>> +	ret = qaic_control_open(qdev);
>> +	if (ret) {
>> +		pci_dbg(qdev->pdev, "%s: control_open failed %d\n", __func__, ret);
>> +		goto err;
>> +	}
>> +
>> +	ret = get_cntl_version(qdev, NULL, &major, &minor);
>> +	if (ret || major != cntl_major || minor > cntl_minor) {
>> +		pci_err(qdev->pdev, "%s: Control protocol version (%d.%d) not supported.  Supported version is (%d.%d). Ret: %d\n",
>> +			__func__, major, minor, cntl_major, cntl_minor, ret);
>> +		ret = -EINVAL;
>> +		goto close_control;
>> +	}
>> +
>> +	ret = qaic_create_drm_device(qdev, QAIC_NO_PARTITION, NULL);
>> +
>> +	return ret;
>> +
>> +close_control:
>> +	qaic_control_close(qdev);
>> +err:
>> +	return ret;
>> +}
>> +
>> +static void qaic_mhi_remove(struct mhi_device *mhi_dev)
>> +{
> 
> Add a comment here

Presumably why this is an empty function?
Will do.

>> +}
>> +
>> +static void qaic_notify_reset(struct qaic_device *qdev)
>> +{
>> +	int i;
>> +
>> +	qdev->in_reset = true;
>> +	/* wake up any waiters to avoid waiting for timeouts at sync */
>> +	wake_all_cntl(qdev);
>> +	for (i = 0; i < qdev->num_dbc; ++i)
>> +		wakeup_dbc(qdev, i);
>> +	synchronize_srcu(&qdev->dev_lock);
>> +}
>> +
>> +void qaic_dev_reset_clean_local_state(struct qaic_device *qdev, bool exit_reset)
>> +{
>> +	int i;
>> +
>> +	qaic_notify_reset(qdev);
>> +
>> +	/* remove drmdevs to prevent new users from coming in */
>> +	if (qdev->base_dev)
>> +		qaic_destroy_drm_device(qdev, QAIC_NO_PARTITION, NULL);
>> +
>> +	/* start tearing things down */
>> +	for (i = 0; i < qdev->num_dbc; ++i)
>> +		release_dbc(qdev, i);
>> +
>> +	if (exit_reset)
>> +		qdev->in_reset = false;
>> +}
>> +
>> +static int qaic_pci_probe(struct pci_dev *pdev,
>> +			  const struct pci_device_id *id)
> 
> Please try to simplify this function. Maybe move irq init to separate function.
> It will be more readable and there will less of a error handling spaghetti at the bottom.

I guess I tend towards not breaking something up if there is no reuse 
elsewhere, but I think I see your point.  Will have a look.
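
A rough sketch of what pulling the MSI/IRQ setup out of probe might look
like (untested, and unwinding of the allocated vectors is left to the
caller's error path):

static int init_msi(struct qaic_device *qdev, struct pci_dev *pdev)
{
	int mhi_irq;
	int ret;
	int i;

	ret = pci_alloc_irq_vectors(pdev, 1, 32, PCI_IRQ_MSI);
	if (ret < 0)
		return ret;

	if (ret < 32) {
		pci_err(pdev, "%s: Requested 32 MSIs.  Obtained %d MSIs which is less than the 32 required.\n",
			__func__, ret);
		return -ENODEV;
	}

	mhi_irq = pci_irq_vector(pdev, 0);
	if (mhi_irq < 0)
		return mhi_irq;

	for (i = 0; i < qdev->num_dbc; ++i) {
		ret = devm_request_threaded_irq(&pdev->dev, pci_irq_vector(pdev, i + 1),
						dbc_irq_handler, dbc_irq_threaded_fn,
						IRQF_SHARED, "qaic_dbc", &qdev->dbc[i]);
		if (ret)
			return ret;

		if (poll_datapath) {
			qdev->dbc[i].irq = pci_irq_vector(pdev, i + 1);
			disable_irq_nosync(qdev->dbc[i].irq);
			INIT_WORK(&qdev->dbc[i].poll_work, irq_polling_work);
		}
	}

	/* hand the MHI irq back to probe on success */
	return mhi_irq;
}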

> 
>> +{
>> +	int ret;
>> +	int i;
>> +	int mhi_irq;
>> +	struct qaic_device *qdev;
>> +
>> +	qdev = devm_kzalloc(&pdev->dev, sizeof(*qdev), GFP_KERNEL);
>> +	if (!qdev)
>> +		return -ENOMEM;
>> +
>> +	if (id->device == PCI_DEV_AIC100) {
>> +		qdev->num_dbc = 16;
>> +		qdev->dbc = devm_kcalloc(&pdev->dev, qdev->num_dbc, sizeof(*qdev->dbc),
>> +					 GFP_KERNEL);
>> +		if (!qdev->dbc)
>> +			return -ENOMEM;
>> +	}
>> +
>> +	qdev->cntl_wq = alloc_workqueue("qaic_cntl", WQ_UNBOUND, 0);
>> +	if (!qdev->cntl_wq) {
>> +		ret = -ENOMEM;
>> +		goto wq_fail;
>> +	}
>> +	pci_set_drvdata(pdev, qdev);
>> +	qdev->pdev = pdev;
>> +	mutex_init(&qdev->cntl_mutex);
>> +	INIT_LIST_HEAD(&qdev->cntl_xfer_list);
>> +	init_srcu_struct(&qdev->dev_lock);
>> +	INIT_LIST_HEAD(&qdev->qaic_drm_devices);
>> +	mutex_init(&qdev->qaic_drm_devices_mutex);
>> +	for (i = 0; i < qdev->num_dbc; ++i) {
>> +		spin_lock_init(&qdev->dbc[i].xfer_lock);
>> +		qdev->dbc[i].qdev = qdev;
>> +		qdev->dbc[i].id = i;
>> +		INIT_LIST_HEAD(&qdev->dbc[i].xfer_list);
>> +		init_srcu_struct(&qdev->dbc[i].ch_lock);
>> +		init_waitqueue_head(&qdev->dbc[i].dbc_release);
>> +		INIT_LIST_HEAD(&qdev->dbc[i].bo_lists);
>> +	}
>> +
>> +	qdev->bars = pci_select_bars(pdev, IORESOURCE_MEM);
>> +
>> +	/* make sure the device has the expected BARs */
>> +	if (qdev->bars != (BIT(0) | BIT(2) | BIT(4))) {
>> +		pci_dbg(pdev, "%s: expected BARs 0, 2, and 4 not found in device.  Found 0x%x\n",
>> +			__func__, qdev->bars);
>> +		ret = -EINVAL;
>> +		goto bar_fail;
>> +	}
>> +
>> +	ret = pci_enable_device(pdev);
>> +	if (ret)
>> +		goto enable_fail;
>> +
>> +	ret = pci_request_selected_regions(pdev, qdev->bars, "aic100");
>> +	if (ret)
>> +		goto request_regions_fail;
>> +
>> +	pci_set_master(pdev);
>> +
>> +	ret = dma_set_mask(&pdev->dev, DMA_BIT_MASK(64));
>> +	if (ret)
>> +		goto dma_mask_fail;
>> +	ret = dma_set_coherent_mask(&pdev->dev, DMA_BIT_MASK(64));
>> +	if (ret)
>> +		goto dma_mask_fail;
>> +	ret = dma_set_max_seg_size(&pdev->dev, UINT_MAX);
>> +	if (ret)
>> +		goto dma_mask_fail;
>> +
>> +	qdev->bar_0 = pci_ioremap_bar(pdev, 0);
>> +	if (!qdev->bar_0) {
>> +		ret = -ENOMEM;
>> +		goto ioremap_0_fail;
>> +	}
>> +
>> +	qdev->bar_2 = pci_ioremap_bar(pdev, 2);
>> +	if (!qdev->bar_2) {
>> +		ret = -ENOMEM;
>> +		goto ioremap_2_fail;
>> +	}
>> +
>> +	for (i = 0; i < qdev->num_dbc; ++i)
>> +		qdev->dbc[i].dbc_base = qdev->bar_2 + QAIC_DBC_OFF(i);
>> +
>> +	ret = pci_alloc_irq_vectors(pdev, 1, 32, PCI_IRQ_MSI);
>> +	if (ret < 0)
>> +		goto alloc_irq_fail;
>> +
>> +	if (ret < 32) {
>> +		pci_err(pdev, "%s: Requested 32 MSIs.  Obtained %d MSIs which is less than the 32 required.\n",
>> +			__func__, ret);
>> +		ret = -ENODEV;
>> +		goto invalid_msi_config;
>> +	}
>> +
>> +	mhi_irq = pci_irq_vector(pdev, 0);
>> +	if (mhi_irq < 0) {
>> +		ret = mhi_irq;
>> +		goto get_mhi_irq_fail;
>> +	}
>> +
>> +	for (i = 0; i < qdev->num_dbc; ++i) {
>> +		ret = devm_request_threaded_irq(&pdev->dev,
>> +						pci_irq_vector(pdev, i + 1),
>> +						dbc_irq_handler,
>> +						dbc_irq_threaded_fn,
>> +						IRQF_SHARED,
>> +						"qaic_dbc",
>> +						&qdev->dbc[i]);
>> +		if (ret)
>> +			goto get_dbc_irq_failed;
>> +
>> +		if (poll_datapath) {
>> +			qdev->dbc[i].irq = pci_irq_vector(pdev, i + 1);
>> +			disable_irq_nosync(qdev->dbc[i].irq);
>> +			INIT_WORK(&qdev->dbc[i].poll_work, irq_polling_work);
>> +		}
>> +	}
>> +
>> +	qdev->mhi_cntl = qaic_mhi_register_controller(pdev, qdev->bar_0, mhi_irq);
>> +	if (IS_ERR(qdev->mhi_cntl)) {
>> +		ret = PTR_ERR(qdev->mhi_cntl);
>> +		goto mhi_register_fail;
>> +	}
>> +
>> +	return 0;
>> +
>> +mhi_register_fail:
>> +get_dbc_irq_failed:
> 
> I don't think that duplicated goto statements are allowed by the coding style.
> These should be rather named something like err_free_irq.
> See https://www.kernel.org/doc/html/v4.10/process/coding-style.html#centralized-exiting-of-functions

I took another look at that link, and I do not believe it prohibits 
duplicated goto labels, but they probably could be renamed and 
simplified as you indicate.  Will have a look at that.
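
Concretely, I'd collapse the duplicated labels into ones named after the
unwind action, e.g. (sketch):

err_free_dbc_irqs:
	for (i = 0; i < qdev->num_dbc; ++i)
		devm_free_irq(&pdev->dev, pci_irq_vector(pdev, i + 1), &qdev->dbc[i]);
err_free_irq_vectors:
	pci_free_irq_vectors(pdev);

with the current mhi_register_fail/get_dbc_irq_failed sites jumping to
the first label and get_mhi_irq_fail/invalid_msi_config to the second.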

> 
>> +	for (i = 0; i < qdev->num_dbc; ++i)
>> +		devm_free_irq(&pdev->dev, pci_irq_vector(pdev, i + 1),
>> +			      &qdev->dbc[i]);
>> +get_mhi_irq_fail:
>> +invalid_msi_config:
>> +	pci_free_irq_vectors(pdev);
>> +alloc_irq_fail:
>> +	iounmap(qdev->bar_2);
>> +ioremap_2_fail:
>> +	iounmap(qdev->bar_0);
>> +ioremap_0_fail:
>> +dma_mask_fail:
>> +	pci_clear_master(pdev);
>> +	pci_release_selected_regions(pdev, qdev->bars);
>> +request_regions_fail:
>> +	pci_disable_device(pdev);
>> +enable_fail:
>> +	pci_set_drvdata(pdev, NULL);
>> +bar_fail:
>> +	for (i = 0; i < qdev->num_dbc; ++i)
>> +		cleanup_srcu_struct(&qdev->dbc[i].ch_lock);
>> +	cleanup_srcu_struct(&qdev->dev_lock);
>> +	destroy_workqueue(qdev->cntl_wq);
>> +wq_fail:
>> +	return ret;
>> +}
>> +
>> +static void qaic_pci_remove(struct pci_dev *pdev)
>> +{
>> +	struct qaic_device *qdev = pci_get_drvdata(pdev);
>> +	int i;
>> +
>> +	if (!qdev)
>> +		return;
>> +
>> +	qaic_dev_reset_clean_local_state(qdev, false);
>> +	qaic_mhi_free_controller(qdev->mhi_cntl, link_up);
>> +	for (i = 0; i < qdev->num_dbc; ++i) {
>> +		devm_free_irq(&pdev->dev, pci_irq_vector(pdev, i + 1),
>> +			      &qdev->dbc[i]);
>> +		cleanup_srcu_struct(&qdev->dbc[i].ch_lock);
>> +	}
>> +	destroy_workqueue(qdev->cntl_wq);
>> +	pci_free_irq_vectors(pdev);
>> +	iounmap(qdev->bar_0);
>> +	pci_clear_master(pdev);
>> +	pci_release_selected_regions(pdev, qdev->bars);
>> +	pci_disable_device(pdev);
>> +	pci_set_drvdata(pdev, NULL);
>> +}
>> +
>> +static void qaic_pci_shutdown(struct pci_dev *pdev)
>> +{
>> +	/* see qaic_exit for what link_up is doing */
>> +	link_up = true;
>> +	qaic_pci_remove(pdev);
>> +}
>> +
>> +static pci_ers_result_t qaic_pci_error_detected(struct pci_dev *pdev,
>> +						pci_channel_state_t error)
>> +{
>> +	return PCI_ERS_RESULT_NEED_RESET;
>> +}
>> +
>> +static void qaic_pci_reset_prepare(struct pci_dev *pdev)
>> +{
>> +	struct qaic_device *qdev = pci_get_drvdata(pdev);
>> +
>> +	qaic_notify_reset(qdev);
>> +	qaic_mhi_start_reset(qdev->mhi_cntl);
>> +	qaic_dev_reset_clean_local_state(qdev, false);
>> +}
>> +
>> +static void qaic_pci_reset_done(struct pci_dev *pdev)
>> +{
>> +	struct qaic_device *qdev = pci_get_drvdata(pdev);
>> +
>> +	qdev->in_reset = false;
>> +	qaic_mhi_reset_done(qdev->mhi_cntl);
>> +}
>> +
>> +static const struct mhi_device_id qaic_mhi_match_table[] = {
>> +	{ .chan = "QAIC_CONTROL", },
>> +	{},
>> +};
>> +
>> +static struct mhi_driver qaic_mhi_driver = {
>> +	.id_table = qaic_mhi_match_table,
>> +	.remove = qaic_mhi_remove,
>> +	.probe = qaic_mhi_probe,
>> +	.ul_xfer_cb = qaic_mhi_ul_xfer_cb,
>> +	.dl_xfer_cb = qaic_mhi_dl_xfer_cb,
>> +	.driver = {
>> +		.name = "qaic_mhi",
>> +	},
>> +};
>> +
>> +static const struct pci_device_id qaic_ids[] = {
>> +	{ PCI_DEVICE(PCI_VENDOR_ID_QCOM, PCI_DEV_AIC100), },
>> +	{ }
>> +};
>> +MODULE_DEVICE_TABLE(pci, qaic_ids);
>> +
>> +static const struct pci_error_handlers qaic_pci_err_handler = {
>> +	.error_detected = qaic_pci_error_detected,
>> +	.reset_prepare = qaic_pci_reset_prepare,
>> +	.reset_done = qaic_pci_reset_done,
>> +};
>> +
>> +static struct pci_driver qaic_pci_driver = {
>> +	.name = QAIC_NAME,
>> +	.id_table = qaic_ids,
>> +	.probe = qaic_pci_probe,
>> +	.remove = qaic_pci_remove,
>> +	.shutdown = qaic_pci_shutdown,
>> +	.err_handler = &qaic_pci_err_handler,
>> +};
>> +
>> +static int __init qaic_init(void)
>> +{
>> +	int ret;
>> +
>> +	if (datapath_polling)
>> +		poll_datapath = true;
>> +
>> +	ret = mhi_driver_register(&qaic_mhi_driver);
>> +	if (ret) {
>> +		pr_debug("qaic: mhi_driver_register failed %d\n", ret);
>> +		goto free_class;
>> +	}
>> +
>> +	ret = pci_register_driver(&qaic_pci_driver);
>> +	if (ret) {
>> +		pr_debug("qaic: pci_register_driver failed %d\n", ret);
>> +		goto free_mhi;
>> +	}
>> +
>> +	ret = mhi_qaic_ctrl_init();
>> +	if (ret) {
>> +		pr_debug("qaic: mhi_qaic_ctrl_init failed %d\n", ret);
>> +		goto free_pci;
>> +	}
>> +
>> +	return 0;
>> +
>> +free_pci:
>> +	pci_unregister_driver(&qaic_pci_driver);
>> +free_mhi:
>> +	mhi_driver_unregister(&qaic_mhi_driver);
>> +free_class:
> 
> This label doesn't free anything. It should be renamed.

Doh.  Holdover from a previous version.  It does need to be renamed.

> 
>> +	return ret;
>> +}
>> +
>> +static void __exit qaic_exit(void)
>> +{
>> +	/*
>> +	 * We assume that qaic_pci_remove() is called due to a hotplug event
>> +	 * which would mean that the link is down, and thus
>> +	 * qaic_mhi_free_controller() should not try to access the device during
>> +	 * cleanup.
>> +	 * We call pci_unregister_driver() below, which also triggers
>> +	 * qaic_pci_remove(), but since this is module exit, we expect the link
>> +	 * to the device to be up, in which case qaic_mhi_free_controller()
>> +	 * should try to access the device during cleanup to put the device in
>> +	 * a sane state.
>> +	 * For that reason, we set link_up here to let qaic_mhi_free_controller
>> +	 * know the expected link state.  Since the module is going to be
>> +	 * removed at the end of this, we don't need to worry about
>> +	 * reinitializing the link_up state after the cleanup is done.
>> +	 */
>> +	link_up = true;
> 
> Maybe you could just use pdev->current_state instead of link_up?

Maybe.  Will have a look at that.

> 
>> +	mhi_qaic_ctrl_deinit();
>> +	pci_unregister_driver(&qaic_pci_driver);
>> +	mhi_driver_unregister(&qaic_mhi_driver);
>> +}
>> +
>> +module_init(qaic_init);
>> +module_exit(qaic_exit);
>> +
>> +MODULE_AUTHOR(QAIC_DESC " Kernel Driver Team");
>> +MODULE_DESCRIPTION(QAIC_DESC " Accel Driver");
>> +MODULE_LICENSE("GPL");
>> diff --git a/include/uapi/drm/qaic_accel.h b/include/uapi/drm/qaic_accel.h
>> new file mode 100644
>> index 0000000..d5fa6f5
>> --- /dev/null
>> +++ b/include/uapi/drm/qaic_accel.h
>> @@ -0,0 +1,283 @@
>> +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note
>> + *
>> + * Copyright (c) 2019-2020, The Linux Foundation. All rights reserved.
>> + * Copyright (c) 2021-2023 Qualcomm Innovation Center, Inc. All rights reserved.
>> + */
>> +
>> +#ifndef QAIC_ACCEL_H_
>> +#define QAIC_ACCEL_H_
>> +
>> +#include <linux/ioctl.h>
>> +#include <linux/types.h>
> 
> These two headers should not be needed here.

Fair enough.  Listed out of paranoia since they define things used in 
this header.

> 
>> +#include "drm.h"
>> +
>> +#if defined(__CPLUSPLUS)
> 
> Use lowercase here: __cplusplus

Doh.  Not sure why uppercase was used.  The referenced examples sure 
didn't have that.

> 
>> +extern "C" {
>> +#endif
>> +
>> +#define QAIC_MANAGE_MAX_MSG_LENGTH SZ_4K	/**<
>> +						  * The length(4K) includes len and
>> +						  * count fields of qaic_manage_msg
>> +						  */
> 
> I guess these are doxygen style commands but you should be using kernel-doc here.
> See https://docs.kernel.org/doc-guide/kernel-doc.html.
> This can be used to verify the header:
> $ scripts/kernel-doc -v -none include/uapi/drm/qaic_accel.h

include/uapi/drm/drm.h was used as the example for these, and thus it 
was assumed that was the DRM style.

I'm not convinced kernel-doc is applicable to all of these.  Will review.
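
If it does end up being kernel-doc, I assume the per-struct blocks would
look something along these lines (sketch only):

/**
 * struct qaic_manage_trans_hdr - Header for all manage transactions
 * @type: in, value from enum qaic_manage_transaction_type
 * @len: in, length of this transaction, including the header
 */
struct qaic_manage_trans_hdr {
	__u32 type;
	__u32 len;
};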

> 
>> +
>> +enum qaic_sem_flags {
> 
> Is there any specific reason for enums if all values are explicitly given?

It is a style pulled from elsewhere.  I can remove the enums.

> 
>> +	SEM_INSYNCFENCE =	0x1,
> 
> All these enums/defines end up in a global user space namespace.
> I would advise to prefix everything with QAIC_ or DRM_QAIC_ (e.g. QAIC_SEM_INSYNCFENCE)
> to avoid conflicts with other drivers or user space libs.

Good point.  Will do.
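
Combined with dropping the enum above, these would become plain defines
along the lines of:

#define QAIC_SEM_INSYNCFENCE	0x1
#define QAIC_SEM_OUTSYNCFENCE	0x2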

> 
>> +	SEM_OUTSYNCFENCE =	0x2,
>> +};
>> +
>> +enum qaic_sem_cmd {
>> +	SEM_NOP =		0,
>> +	SEM_INIT =		1,
>> +	SEM_INC =		2,
>> +	SEM_DEC =		3,
>> +	SEM_WAIT_EQUAL =	4,
>> +	SEM_WAIT_GT_EQ =	5, /**< Greater than or equal */
>> +	SEM_WAIT_GT_0 =		6, /**< Greater than 0 */
>> +};
>> +
>> +enum qaic_manage_transaction_type {
>> +	TRANS_UNDEFINED =			0,
>> +	TRANS_PASSTHROUGH_FROM_USR =		1,
>> +	TRANS_PASSTHROUGH_TO_USR =		2,
>> +	TRANS_PASSTHROUGH_FROM_DEV =		3,
>> +	TRANS_PASSTHROUGH_TO_DEV =		4,
>> +	TRANS_DMA_XFER_FROM_USR =		5,
>> +	TRANS_DMA_XFER_TO_DEV =			6,
>> +	TRANS_ACTIVATE_FROM_USR =		7,
>> +	TRANS_ACTIVATE_FROM_DEV =		8,
>> +	TRANS_ACTIVATE_TO_DEV =			9,
>> +	TRANS_DEACTIVATE_FROM_USR =		10,
>> +	TRANS_DEACTIVATE_FROM_DEV =		11,
>> +	TRANS_STATUS_FROM_USR =			12,
>> +	TRANS_STATUS_TO_USR =			13,
>> +	TRANS_STATUS_FROM_DEV =			14,
>> +	TRANS_STATUS_TO_DEV =			15,
>> +	TRANS_TERMINATE_FROM_DEV =		16,
>> +	TRANS_TERMINATE_TO_DEV =		17,
>> +	TRANS_DMA_XFER_CONT =			18,
>> +	TRANS_VALIDATE_PARTITION_FROM_DEV =	19,
>> +	TRANS_VALIDATE_PARTITION_TO_DEV =	20,
>> +	TRANS_MAX =				21
>> +};
>> +
>> +struct qaic_manage_trans_hdr {
>> +	__u32 type;	/**< in, value from enum manage_transaction_type */
>> +	__u32 len;	/**< in, length of this transaction, including the header */
>> +};
>> +
>> +struct qaic_manage_trans_passthrough {
>> +	struct qaic_manage_trans_hdr hdr;
>> +	__u8 data[];	/**< in, userspace must encode in little endian */
>> +};
>> +
>> +struct qaic_manage_trans_dma_xfer {
>> +	struct qaic_manage_trans_hdr hdr;
>> +	__u32 tag;	/**< in, device specific */
>> +	__u32 count;	/**< in */
>> +	__u64 addr;	/**< in, address of the data to transferred via DMA */
>> +	__u64 size;	/**< in, length of the data to transferred via DMA */
>> +};
>> +
>> +struct qaic_manage_trans_activate_to_dev {
>> +	struct qaic_manage_trans_hdr hdr;
>> +	__u32 queue_size;	/**<
>> +				  * in, number of elements in DBC request
>> +				  * and respose queue
>> +				  */
>> +	__u32 eventfd;		/**< in */
>> +	__u32 options;		/**< in, device specific */
>> +	__u32 pad;		/**< pad must be 0 */
>> +};
>> +
>> +struct qaic_manage_trans_activate_from_dev {
>> +	struct qaic_manage_trans_hdr hdr;
>> +	__u32 status;	/**< out, status of activate transaction */
>> +	__u32 dbc_id;	/**< out, Identifier of assigned DMA Bridge channel */
>> +	__u64 options;	/**< out */
>> +};
>> +
>> +struct qaic_manage_trans_deactivate {
>> +	struct qaic_manage_trans_hdr hdr;
>> +	__u32 dbc_id;	/**< in, Identifier of assigned DMA Bridge channel */
>> +	__u32 pad;	/**< pad must be 0 */
>> +};
>> +
>> +struct qaic_manage_trans_status_to_dev {
>> +	struct qaic_manage_trans_hdr hdr;
>> +};
>> +
>> +struct qaic_manage_trans_status_from_dev {
>> +	struct qaic_manage_trans_hdr hdr;
>> +	__u16 major;	/**< out, major vesrion of NNC protocol used by device */
>> +	__u16 minor;	/**< out, minor vesrion of NNC protocol used by device */
> 
> vesrion -> version

Will fix.

> 
>> +	__u32 status;	/**< out, status of query transaction  */
>> +	__u64 status_flags;	/**<
>> +				  * out
>> +				  * 0    : If set then device has CRC check enabled
>> +				  * 1:63 : Unused
>> +				  */
>> +};
>> +
>> +struct qaic_manage_msg {
>> +	__u32 len;	/**< in, Length of valid data - ie sum of all transactions */
>> +	__u32 count;	/**< in, Number of transactions in message */
>> +	__u64 data;	/**< in, Pointer to array of transactions */
>> +};
>> +
>> +struct qaic_create_bo {
>> +	__u64 size;	/**< in, Size of BO in byte */
>> +	__u32 handle;	/**< out, Returned GEM handle for the BO */
>> +	__u32 pad;	/**< pad must be 0 */
>> +};
>> +
>> +struct qaic_mmap_bo {
>> +	__u32 handle;	/**< in, Handle for the BO being mapped. */
> 
> The comment is missleading. BO is not mapped by this ioctl().

It's mapping the handle to the offset, which is a type of mapping.  I 
think I get your point though.  Will try to wordsmith this better.
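
For context, the intended userspace flow is roughly the following
(illustrative only; fd, bo_handle, and size are placeholders and error
handling is omitted):

	struct qaic_mmap_bo args = { .handle = bo_handle };

	/* ask the driver for the mmap offset associated with this BO */
	ioctl(fd, DRM_IOCTL_QAIC_MMAP_BO, &args);

	/* the actual mapping happens through the normal mmap() path */
	buf = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, args.offset);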

> 
>> +	__u32 pad;	/**< pad must be 0 */
>> +	__u64 offset;	/**<
>> +			  * out, offset into the drm node to use for
>> +			  * subsequent mmap call
>> +			  */
>> +};
>> +
>> +/**
>> + * @brief semaphore command
>> + */
>> +struct qaic_sem {
>> +	__u16 val;	/**< in, Only lower 12 bits are valid */
>> +	__u8  index;	/**< in, Only lower 5 bits are valid */
>> +	__u8  presync;	/**< in, 1 if presync operation, 0 if postsync */
>> +	__u8  cmd;	/**< in, See enum sem_cmd */
>> +	__u8  flags;	/**< in, See sem_flags for valid bits.  All others must be 0 */
>> +	__u16 pad;	/**< pad must be 0 */
>> +};
>> +
>> +struct qaic_attach_slice_entry {
>> +	__u64 size;		/**< in, Size memory to allocate for this BO slice */
>> +	struct qaic_sem	sem0;	/**< in, Must be zero if not valid */
>> +	struct qaic_sem	sem1;	/**< in, Must be zero if not valid */
>> +	struct qaic_sem	sem2;	/**< in, Must be zero if not valid */
>> +	struct qaic_sem	sem3;	/**< in, Must be zero if not valid */
>> +	__u64 dev_addr;		/**< in, Address in device to/from which data is copied */
>> +	__u64 db_addr;		/**< in, Doorbell address */
>> +	__u32 db_data;		/**< in, Data to write to doorbell */
>> +	__u32 db_len;		/**<
>> +				  * in, Doorbell length - 32, 16, or 8 bits.
>> +				  * 0 means doorbell is inactive
>> +				  */
>> +	__u64 offset;		/**< in, Offset from start of buffer */
>> +};
>> +
>> +struct qaic_attach_slice_hdr {
>> +	__u32 count;	/**< in, Number of slices for this BO */
>> +	__u32 dbc_id;	/**< in, Associate this BO with this DMA Bridge channel */
>> +	__u32 handle;	/**< in, Handle of BO to which slicing information is to be attached */
>> +	__u32 dir;	/**< in, Direction of data: 1 = DMA_TO_DEVICE, 2 = DMA_FROM_DEVICE */
>> +	__u64 size;	/**<
>> +			  * in, Total length of BO
>> +			  * If BO is imported (DMABUF/PRIME) then this size
>> +			  * should not exceed the size of DMABUF provided.
>> +			  * If BO is allocated using DRM_IOCTL_QAIC_CREATE_BO
>> +			  * then this size should be exactly same as the size
>> +			  * provided during DRM_IOCTL_QAIC_CREATE_BO.
>> +			  */
>> +};
>> +
>> +struct qaic_attach_slice {
>> +	struct qaic_attach_slice_hdr hdr;
>> +	__u64 data;	/**<
>> +			  * in, Pointer to a buffer which is container of
>> +			  * struct qaic_attach_slice_entry[]
>> +			  */
>> +};
>> +
>> +struct qaic_execute_entry {
>> +	__u32 handle;	/**< in, buffer handle */
>> +	__u32 dir;	/**< in, 1 = to device, 2 = from device */
>> +};
>> +
>> +struct qaic_partial_execute_entry {
>> +	__u32 handle;	/**< in, buffer handle */
>> +	__u32 dir;	/**< in, 1 = to device, 2 = from device */
>> +	__u64 resize;	/**< in, 0 = no resize */
>> +};
>> +
>> +struct qaic_execute_hdr {
>> +	__u32 count;	/**< in, number of executes following this header */
>> +	__u32 dbc_id;	/**< in, Identifier of assigned DMA Bridge channel */
>> +};
>> +
>> +struct qaic_execute {
>> +	struct qaic_execute_hdr hdr;
>> +	__u64 data;	/**< in, qaic_execute_entry or qaic_partial_execute_entry container */
>> +};
>> +
>> +struct qaic_wait {
>> +	__u32 handle;	/**< in, handle to wait on until execute is complete */
>> +	__u32 timeout;	/**< in, timeout for wait(in ms) */
>> +	__u32 dbc_id;	/**< in, Identifier of assigned DMA Bridge channel */
>> +	__u32 pad;	/**< pad must be 0 */
>> +};
>> +
>> +struct qaic_perf_stats_hdr {
>> +	__u16 count;	/**< in, Total number BOs requested */
>> +	__u16 pad;	/**< pad must be 0 */
>> +	__u32 dbc_id;	/**< in, Identifier of assigned DMA Bridge channel */
>> +};
>> +
>> +struct qaic_perf_stats {
>> +	struct qaic_perf_stats_hdr hdr;
>> +	__u64 data;	/**< in, qaic_perf_stats_entry container */
>> +};
>> +
>> +struct qaic_perf_stats_entry {
>> +	__u32 handle;			/**< in, Handle of the memory request */
>> +	__u32 queue_level_before;	/**<
>> +					  * out, Number of elements in queue
>> +					  * before submission given memory request
>> +					  */
>> +	__u32 num_queue_element;	/**<
>> +					  * out, Number of elements to add in the
>> +					  * queue for given memory request
>> +					  */
>> +	__u32 submit_latency_us;	/**<
>> +					  * out, Time taken by kernel to submit
>> +					  * the request to device
>> +					  */
>> +	__u32 device_latency_us;	/**<
>> +					  * out, Time taken by device to execute the
>> +					  * request. 0 if request is not completed
>> +					  */
>> +	__u32 pad;			/**< pad must be 0 */
>> +};
>> +
>> +struct qaic_part_dev {
>> +	__u32 partition_id;	/**< in, reservation id */
>> +	__u16 remove;		/**< in, 1 - Remove device 0 - Create device */
>> +	__u16 pad;		/**< pad must be 0 */
>> +};
>> +
>> +#define DRM_QAIC_MANAGE				0x00
>> +#define DRM_QAIC_CREATE_BO			0x01
>> +#define DRM_QAIC_MMAP_BO			0x02
> 
> I know that MMAP_BO ioctl is common in drm drivers but in my opinion it is a very poor name.
> I suggest naming it BO_INFO so in future you could extend it with other bo params besides
> mmap offset.

I see your point, but I don't see how to implement it.

Let us assume this IOCTL is renamed DRM_QAIC_BO_INFO, and struct 
qaic_mmap_bo is named qaic_bo_info.

qaic_bo_info is part of the uAPI.  If, "tomorrow", we need to add a field 
to it, we cannot.  That changes the structure size and breaks the uAPI.

I can't add reserved fields in anticipation of future use since GregKH 
had said not to do that - 
https://lore.kernel.org/all/20200514163734.GB3154055@kroah.com/

So, I'm thinking the only approach would be a linked list similar to 
EXECUTE_BO where qaic_bo_info holds a userspace pointer that is the head 
of the list, and new list nodes can be defined in future.  That seems 
like an overengineered solution to some hypothetical future which may 
not come to pass.  If this is the solution, then I think I'd prefer to 
leave this ioctl as is, and deprecate it if the extension use case ever 
comes to pass.

Thoughts?  Am I missing something?
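
Purely for illustration of the linked-list idea above (not something I
am proposing in this series), a chained qaic_bo_info might look roughly
like this, where the "ext" pointer is the extension point:

struct qaic_bo_info_ext {
	__u64 next;	/* in, user pointer to next extension node, 0 ends the list */
	__u32 type;	/* in, identifies what this node describes */
	__u32 pad;	/* pad must be 0 */
};

struct qaic_bo_info {
	__u32 handle;	/* in, handle of the BO to query */
	__u32 pad;	/* pad must be 0 */
	__u64 offset;	/* out, offset into the drm node for a later mmap call */
	__u64 ext;	/* in, user pointer to first qaic_bo_info_ext node, 0 if unused */
};

The struct names and fields above are hypothetical; today only the mmap
offset is reported, which is why the extra indirection feels like
overengineering.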

> 
>> +#define DRM_QAIC_ATTACH_SLICE_BO		0x03
>> +#define DRM_QAIC_EXECUTE_BO			0x04
>> +#define DRM_QAIC_PARTIAL_EXECUTE_BO		0x05
>> +#define DRM_QAIC_WAIT_BO			0x06
>> +#define DRM_QAIC_PERF_STATS_BO			0x07
>> +#define DRM_QAIC_PART_DEV			0x08
>> +
>> +#define DRM_IOCTL_QAIC_MANAGE			DRM_IOWR(DRM_COMMAND_BASE + DRM_QAIC_MANAGE, struct qaic_manage_msg)
>> +#define DRM_IOCTL_QAIC_CREATE_BO		DRM_IOWR(DRM_COMMAND_BASE + DRM_QAIC_CREATE_BO,	struct qaic_create_bo)
>> +#define DRM_IOCTL_QAIC_MMAP_BO			DRM_IOWR(DRM_COMMAND_BASE + DRM_QAIC_MMAP_BO, struct qaic_mmap_bo)
>> +#define DRM_IOCTL_QAIC_ATTACH_SLICE_BO		DRM_IOW(DRM_COMMAND_BASE + DRM_QAIC_ATTACH_SLICE_BO, struct qaic_attach_slice)
>> +#define DRM_IOCTL_QAIC_EXECUTE_BO		DRM_IOW(DRM_COMMAND_BASE + DRM_QAIC_EXECUTE_BO,	struct qaic_execute)
>> +#define DRM_IOCTL_QAIC_PARTIAL_EXECUTE_BO	DRM_IOW(DRM_COMMAND_BASE + DRM_QAIC_PARTIAL_EXECUTE_BO,	struct qaic_execute)
>> +#define DRM_IOCTL_QAIC_WAIT_BO			DRM_IOW(DRM_COMMAND_BASE + DRM_QAIC_WAIT_BO, struct qaic_wait)
>> +#define DRM_IOCTL_QAIC_PERF_STATS_BO		DRM_IOWR(DRM_COMMAND_BASE + DRM_QAIC_PERF_STATS_BO, struct qaic_perf_stats)
>> +#define DRM_IOCTL_QAIC_PART_DEV			DRM_IOWR(DRM_COMMAND_BASE + DRM_QAIC_PART_DEV, struct qaic_part_dev)
>> +
>> +#if defined(__CPLUSPLUS)
> 
> Use lowercase here: __cplusplus

Will do.

> 
>> +}
>> +#endif
>> +
>> +#endif /* QAIC_ACCEL_H_ */
> 
> Regards,
> Jacek
> 


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 2/8] accel/qaic: Add uapi and core driver file
  2023-02-17 18:15       ` Jeffrey Hugo
@ 2023-02-22 15:52         ` Dafna Hirschfeld
  -1 siblings, 0 replies; 68+ messages in thread
From: Dafna Hirschfeld @ 2023-02-22 15:52 UTC (permalink / raw)
  To: Jeffrey Hugo
  Cc: Jacek Lawrynowicz, ogabbay, airlied, daniel, dri-devel,
	linux-doc, linux-arm-msm, quic_ajitpals, quic_pkanojiy,
	stanislaw.gruszka, quic_carlv

On 17.02.2023 11:15, Jeffrey Hugo wrote:
>On 2/16/2023 7:13 AM, Jacek Lawrynowicz wrote:
>>Hi,
>>
>>On 06.02.2023 16:41, Jeffrey Hugo wrote:
>>>Add the QAIC driver uapi file and core driver file that binds to the PCIe
>>>device.  The core driver file also creates the accel device and manages
>>>all the interconnections between the different parts of the driver.
>>>
>>>The driver can be built as a module.  If so, it will be called "qaic.ko".
>>>
>>>Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
>>>Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com>
>>>---
>>>  drivers/accel/qaic/qaic.h     | 321 ++++++++++++++++++
>>>  drivers/accel/qaic/qaic_drv.c | 771 ++++++++++++++++++++++++++++++++++++++++++
>>>  include/uapi/drm/qaic_accel.h | 283 ++++++++++++++++
>>>  3 files changed, 1375 insertions(+)
>>>  create mode 100644 drivers/accel/qaic/qaic.h
>>>  create mode 100644 drivers/accel/qaic/qaic_drv.c
>>>  create mode 100644 include/uapi/drm/qaic_accel.h
>>>
>>>diff --git a/drivers/accel/qaic/qaic.h b/drivers/accel/qaic/qaic.h
>>>new file mode 100644
>>>index 0000000..3f7ea76
>>>--- /dev/null
>>>+++ b/drivers/accel/qaic/qaic.h
>>>@@ -0,0 +1,321 @@
>>>+/* SPDX-License-Identifier: GPL-2.0-only
>>>+ *
>>>+ * Copyright (c) 2019-2021, The Linux Foundation. All rights reserved.
>>>+ * Copyright (c) 2021-2023 Qualcomm Innovation Center, Inc. All rights reserved.
>>>+ */
>>>+
>>>+#ifndef QAICINTERNAL_H_
>>
>>Please use guard macro that matches the file name: _QAIC_H_
>
>Before moving to DRM/ACCEL, this conflicted with the uapi file. 
>However, that is no longer the case, so yes, this should be changed. 
>Will do.
>
>>
>>>+#define QAICINTERNAL_H_
>>>+
>>>+#include <linux/interrupt.h>
>>>+#include <linux/kref.h>
>>>+#include <linux/mhi.h>
>>>+#include <linux/mutex.h>
>>>+#include <linux/pci.h>
>>>+#include <linux/spinlock.h>
>>>+#include <linux/srcu.h>
>>>+#include <linux/wait.h>
>>>+#include <linux/workqueue.h>
>>>+#include <drm/drm_device.h>
>>>+#include <drm/drm_gem.h>
>>>+
>>>+#define QAIC_DBC_BASE		0x20000
>>>+#define QAIC_DBC_SIZE		0x1000
>>>+
>>>+#define QAIC_NO_PARTITION	-1
>>>+
>>>+#define QAIC_DBC_OFF(i)		((i) * QAIC_DBC_SIZE + QAIC_DBC_BASE)
>>>+
>>>+#define to_qaic_bo(obj) container_of(obj, struct qaic_bo, base)
>>>+
>>>+extern bool poll_datapath;
>>>+
>>>+struct qaic_user {
>>>+	/* Uniquely identifies this user for the device */
>>>+	int			handle;
>>>+	struct kref		ref_count;
>>>+	/* Char device opened by this user */
>>>+	struct qaic_drm_device	*qddev;
>>>+	/* Node in list of users that opened this drm device */
>>>+	struct list_head	node;
>>>+	/* SRCU used to synchronize this user during cleanup */
>>>+	struct srcu_struct	qddev_lock;
>>>+	atomic_t		chunk_id;
>>>+};
>>>+
>>>+struct dma_bridge_chan {
>>>+	/* Pointer to device strcut maintained by driver */
>>>+	struct qaic_device	*qdev;
>>>+	/* ID of this DMA bridge channel(DBC) */
>>>+	unsigned int		id;
>>>+	/* Synchronizes access to xfer_list */
>>>+	spinlock_t		xfer_lock;
>>>+	/* Base address of request queue */
>>>+	void			*req_q_base;
>>>+	/* Base address of response queue */
>>>+	void			*rsp_q_base;
>>>+	/*
>>>+	 * Base bus address of request queue. Response queue bus address can be
>>>+	 * calculated by adding request queue size to this variable
>>>+	 */
>>>+	dma_addr_t		dma_addr;
>>>+	/* Total size of request and response queue in byte */
>>>+	u32			total_size;
>>>+	/* Capacity of request/response queue */
>>>+	u32			nelem;
>>>+	/* The user that opened this DBC */
>>>+	struct qaic_user	*usr;
>>>+	/*
>>>+	 * Request ID of next memory handle that goes in request queue. One
>>>+	 * memory handle can enqueue more than one request elements, all
>>>+	 * this requests that belong to same memory handle have same request ID
>>>+	 */
>>>+	u16			next_req_id;
>>>+	/* TRUE: DBC is in use; FALSE: DBC not in use */
>>
>>Use standard "true"/"false" instead of custom "TRUE"/"FALSE" macros.
>>This applies here and in multiple other places in the driver.
>
>I think you are getting at that the documentation could be confusing.
>I don't see any custom macro use in the code.  Will try to clarify
>that here.
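>
>Something like this for the reworded comment (just a sketch of the
>wording, not final):
>
>	/* true: DBC is in use; false: DBC is not in use */
>	bool			in_use;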
>
>>>+	bool			in_use;
>>>+	/*
>>>+	 * Base address of device registers. Used to read/write request and
>>>+	 * response queue's head and tail pointer of this DBC.
>>>+	 */
>>>+	void __iomem		*dbc_base;
>>>+	/* Head of list where each node is a memory handle queued in request queue */
>>>+	struct list_head	xfer_list;
>>>+	/* Synchronizes DBC readers during cleanup */
>>>+	struct srcu_struct	ch_lock;
>>>+	/*
>>>+	 * When this DBC is released, any thread waiting on this wait queue is
>>>+	 * woken up
>>>+	 */
>>>+	wait_queue_head_t	dbc_release;
>>>+	/* Head of list where each node is a bo associated with this DBC */
>>>+	struct list_head	bo_lists;
>>>+	/* The irq line for this DBC.  Used for polling */
>>>+	unsigned int		irq;
>>>+	/* Polling work item to simulate interrupts */
>>>+	struct work_struct	poll_work;
>>>+};
>>>+
>>>+struct qaic_device {
>>>+	/* Pointer to base PCI device struct of our physical device */
>>>+	struct pci_dev		*pdev;
>>>+	/* Mask of all bars of this device */
>>>+	int			bars;
>>>+	/* Req. ID of request that will be queued next in MHI control device */
>>>+	u32			next_seq_num;
>>>+	/* Base address of bar 0 */
>>>+	void __iomem		*bar_0;
>>>+	/* Base address of bar 2 */
>>>+	void __iomem		*bar_2;
>>>+	/* Controller structure for MHI devices */
>>>+	struct mhi_controller	*mhi_cntl;
>>>+	/* MHI control channel device */
>>>+	struct mhi_device	*cntl_ch;
>>>+	/* List of requests queued in MHI control device */
>>>+	struct list_head	cntl_xfer_list;
>>>+	/* Synchronizes MHI control device transactions and its xfer list */
>>>+	struct mutex		cntl_mutex;
>>>+	/* Base actual physical representation of drm device */
>>>+	struct qaic_drm_device	*base_dev;
>>>+	/* Array of DBC struct of this device */
>>>+	struct dma_bridge_chan	*dbc;
>>>+	/* Work queue for tasks related to MHI control device */
>>>+	struct workqueue_struct	*cntl_wq;
>>>+	/* Synchronizes all the users of device during cleanup */
>>>+	struct srcu_struct	dev_lock;
>>>+	/* TRUE: Device under reset; FALSE: Device not under reset */
>>>+	bool			in_reset;
>>>+	/*
>>>+	 * TRUE: A tx MHI transaction has failed and a rx buffer is still queued
>>>+	 * in control device. Such a buffer is considered lost rx buffer
>>>+	 * FALSE: No rx buffer is lost in control device
>>>+	 */
>>>+	bool			cntl_lost_buf;
>>>+	/* Maximum number of DBC supported by this device */
>>>+	u32			num_dbc;
>>>+	/* Head in list of drm device created on top of this device */
>>>+	struct list_head	qaic_drm_devices;
>>>+	/* Synchronizes access of qaic_drm_devices list */
>>>+	struct mutex		qaic_drm_devices_mutex;
>>>+	/* Generate the CRC of a control message */
>>>+	u32 (*gen_crc)(void *msg);
>>>+	/* Validate the CRC of a control message */
>>>+	bool (*valid_crc)(void *msg);
>>>+};
>>>+
>>>+struct qaic_drm_device {
>>>+	/* Pointer to the root device struct driven by this driver */
>>>+	struct qaic_device	*qdev;
>>>+	/* Node in list of drm devices maintained by root device */
>>>+	struct list_head	node;
>>>+	/*
>>>+	 * The physical device can be partition in number of logical devices.
>>>+	 * And each logical device is given a partition id. This member stores
>>>+	 * that id. QAIC_NO_PARTITION is a sentinel used to mark that this drm
>>>+	 * device is the actual physical device
>>>+	 */
>>>+	s32			partition_id;
>>>+	/*
>>>+	 * It points to the user that created this drm device. It is NULL
>>>+	 * when this drm device represents the physical device i.e.
>>>+	 * partition_id is QAIC_NO_PARTITION
>>>+	 */
>>>+	struct qaic_user	*owner;
>>>+	/* Pointer to the drm device struct of this drm device */
>>>+	struct drm_device	*ddev;
>>>+	/* Head in list of users who have opened this drm device */
>>>+	struct list_head	users;
>>>+	/* Synchronizes access to users list */
>>>+	struct mutex		users_mutex;
>>>+};
>>>+
>>>+struct qaic_bo {
>>>+	struct drm_gem_object	base;
>>
>>Any reason why drm_gem_shmem_object cannot be used as a base?
>
>I believe that is incompatible with our memory allocator in patch 5.
>
>>>+	/* Scatter/gather table for allocate/imported BO */
>>>+	struct sg_table		*sgt;
>>>+	/* BO size requested by user. GEM object might be bigger in size. */
>>>+	u64			size;
>>>+	/* Head in list of slices of this BO */
>>>+	struct list_head	slices;
>>>+	/* Total nents, for all slices of this BO */
>>>+	int			total_slice_nents;
>>>+	/*
>>>+	 * Direction of transfer. It can assume only two value DMA_TO_DEVICE and
>>>+	 * DMA_FROM_DEVICE.
>>>+	 */
>>>+	int			dir;
>>>+	/* The pointer of the DBC which operates on this BO */
>>>+	struct dma_bridge_chan	*dbc;
>>>+	/* Number of slice that belongs to this buffer */
>>>+	u32			nr_slice;
>>>+	/* Number of slice that have been transferred by DMA engine */
>>>+	u32			nr_slice_xfer_done;
>>>+	/* TRUE = BO is queued for execution, FALSE = BO is not queued */
>>>+	bool			queued;
>>>+	/*
>>>+	 * If TRUE then user has attached slicing information to this BO by
>>>+	 * calling DRM_IOCTL_QAIC_ATTACH_SLICE_BO ioctl.
>>>+	 */
>>>+	bool			sliced;
>>>+	/* Request ID of this BO if it is queued for execution */
>>>+	u16			req_id;
>>>+	/* Handle assigned to this BO */
>>>+	u32			handle;
>>>+	/* Wait on this for completion of DMA transfer of this BO */
>>>+	struct completion	xfer_done;
>>>+	/*
>>>+	 * Node in linked list where head is dbc->xfer_list.
>>>+	 * This link list contain BO's that are queued for DMA transfer.
>>>+	 */
>>>+	struct list_head	xfer_list;
>>>+	/*
>>>+	 * Node in linked list where head is dbc->bo_lists.
>>>+	 * This link list contain BO's that are associated with the DBC it is
>>>+	 * linked to.
>>>+	 */
>>>+	struct list_head	bo_list;
>>>+	struct {
>>>+		/*
>>>+		 * Latest timestamp(ns) at which kernel received a request to
>>>+		 * execute this BO
>>>+		 */
>>>+		u64		req_received_ts;
>>>+		/*
>>>+		 * Latest timestamp(ns) at which kernel enqueued requests of
>>>+		 * this BO for execution in DMA queue
>>>+		 */
>>>+		u64		req_submit_ts;
>>>+		/*
>>>+		 * Latest timestamp(ns) at which kernel received a completion
>>>+		 * interrupt for requests of this BO
>>>+		 */
>>>+		u64		req_processed_ts;
>>>+		/*
>>>+		 * Number of elements already enqueued in DMA queue before
>>>+		 * enqueuing requests of this BO
>>>+		 */
>>>+		u32		queue_level_before;
>>>+	} perf_stats;
>>>+
>>>+};
>>>+
>>>+struct bo_slice {
>>>+	/* Mapped pages */
>>>+	struct sg_table		*sgt;
>>>+	/* Number of requests required to queue in DMA queue */
>>>+	int			nents;
>>>+	/* See enum dma_data_direction */
>>>+	int			dir;
>>>+	/* Actual requests that will be copied in DMA queue */
>>>+	struct dbc_req		*reqs;
>>>+	struct kref		ref_count;
>>>+	/* TRUE: No DMA transfer required */
>>>+	bool			no_xfer;
>>>+	/* Pointer to the parent BO handle */
>>>+	struct qaic_bo		*bo;
>>>+	/* Node in list of slices maintained by parent BO */
>>>+	struct list_head	slice;
>>>+	/* Size of this slice in bytes */
>>>+	u64			size;
>>>+	/* Offset of this slice in buffer */
>>>+	u64			offset;
>>>+};
>>>+
>>>+int get_dbc_req_elem_size(void);
>>>+int get_dbc_rsp_elem_size(void);
>>>+int get_cntl_version(struct qaic_device *qdev, struct qaic_user *usr,
>>>+		     u16 *major, u16 *minor);
>>>+int qaic_manage_ioctl(struct drm_device *dev, void *data,
>>>+		      struct drm_file *file_priv);
>>>+int qaic_execute_ioctl(struct qaic_device *qdev, struct qaic_user *usr,
>>>+		       unsigned long arg, bool is_partial);
>>>+int qaic_wait_exec_ioctl(struct qaic_device *qdev, struct qaic_user *usr,
>>>+			 unsigned long arg);
>>>+int qaic_query_ioctl(struct qaic_device *qdev, struct qaic_user *usr,
>>>+		     unsigned long arg);
>>>+int qaic_data_mmap(struct qaic_device *qdev, struct qaic_user *usr,
>>>+		   struct vm_area_struct *vma);
>>>+int qaic_data_get_reservation(struct qaic_device *qdev, struct qaic_user *usr,
>>>+			      void *data, u32 *partition_id,
>>>+			      u16 *remove);
>>>+void qaic_mhi_ul_xfer_cb(struct mhi_device *mhi_dev,
>>>+			 struct mhi_result *mhi_result);
>>>+
>>>+void qaic_mhi_dl_xfer_cb(struct mhi_device *mhi_dev,
>>>+			 struct mhi_result *mhi_result);
>>>+
>>>+int qaic_control_open(struct qaic_device *qdev);
>>>+void qaic_control_close(struct qaic_device *qdev);
>>>+void qaic_release_usr(struct qaic_device *qdev, struct qaic_user *usr);
>>>+
>>>+irqreturn_t dbc_irq_threaded_fn(int irq, void *data);
>>>+irqreturn_t dbc_irq_handler(int irq, void *data);
>>>+int disable_dbc(struct qaic_device *qdev, u32 dbc_id, struct qaic_user *usr);
>>>+void enable_dbc(struct qaic_device *qdev, u32 dbc_id, struct qaic_user *usr);
>>>+void wakeup_dbc(struct qaic_device *qdev, u32 dbc_id);
>>>+void release_dbc(struct qaic_device *qdev, u32 dbc_id);
>>>+
>>>+void wake_all_cntl(struct qaic_device *qdev);
>>>+void qaic_dev_reset_clean_local_state(struct qaic_device *qdev, bool exit_reset);
>>>+
>>>+struct drm_gem_object *qaic_gem_prime_import(struct drm_device *dev,
>>>+					     struct dma_buf *dma_buf);
>>>+
>>>+int qaic_create_bo_ioctl(struct drm_device *dev, void *data,
>>>+			 struct drm_file *file_priv);
>>>+int qaic_mmap_bo_ioctl(struct drm_device *dev, void *data,
>>>+		       struct drm_file *file_priv);
>>>+int qaic_attach_slice_bo_ioctl(struct drm_device *dev, void *data,
>>>+			       struct drm_file *file_priv);
>>>+int qaic_execute_bo_ioctl(struct drm_device *dev, void *data,
>>>+			  struct drm_file *file_priv);
>>>+int qaic_partial_execute_bo_ioctl(struct drm_device *dev, void *data,
>>>+				  struct drm_file *file_priv);
>>>+int qaic_wait_bo_ioctl(struct drm_device *dev, void *data,
>>>+		       struct drm_file *file_priv);
>>>+int qaic_test_print_bo_ioctl(struct drm_device *dev, void *data,
>>>+			     struct drm_file *file_priv);
>>>+int qaic_perf_stats_bo_ioctl(struct drm_device *dev, void *data,
>>>+			     struct drm_file *file_priv);
>>
>>You don't need to break these lines. You can use up to 100 columns in the whole driver.
>>It will be more readable and checkpatch won't complain.
>
>Some of this predates the 100 column change.  Checkpatch already 
>complains about the uapi file.
>
>Will try to fix here.
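>
>e.g. these could collapse to single lines and still stay within 100
>columns (sketch):
>
>int qaic_execute_bo_ioctl(struct drm_device *dev, void *data, struct drm_file *file_priv);
>int qaic_wait_bo_ioctl(struct drm_device *dev, void *data, struct drm_file *file_priv);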
>
>>>+void irq_polling_work(struct work_struct *work);
>>>+
>>>+#endif /* QAICINTERNAL_H_ */
>>>diff --git a/drivers/accel/qaic/qaic_drv.c b/drivers/accel/qaic/qaic_drv.c
>>>new file mode 100644
>>>index 0000000..602a784
>>>--- /dev/null
>>>+++ b/drivers/accel/qaic/qaic_drv.c
>>>@@ -0,0 +1,771 @@
>>>+// SPDX-License-Identifier: GPL-2.0-only
>>>+
>>>+/* Copyright (c) 2019-2021, The Linux Foundation. All rights reserved. */
>>>+/* Copyright (c) 2021-2023 Qualcomm Innovation Center, Inc. All rights reserved. */
>>>+
>>>+#include <linux/delay.h>
>>>+#include <linux/dma-mapping.h>
>>>+#include <linux/idr.h>
>>>+#include <linux/interrupt.h>
>>>+#include <linux/list.h>
>>>+#include <linux/kref.h>
>>>+#include <linux/mhi.h>
>>>+#include <linux/module.h>
>>>+#include <linux/msi.h>
>>>+#include <linux/mutex.h>
>>>+#include <linux/pci.h>
>>>+#include <linux/sched.h>
>>
>>Is <linux/sched.h> used here?
>>Feels like there are a couple of other unused includes in this file.
>
>Actually I think that can be removed now.  Will check.
>The other ones look sane to me.  Are there specific ones you question?
>
>>
>>>+#include <linux/spinlock.h>
>>>+#include <linux/workqueue.h>
>>>+#include <linux/wait.h>
>>>+#include <drm/drm_accel.h>
>>>+#include <drm/drm_drv.h>
>>>+#include <drm/drm_file.h>
>>>+#include <drm/drm_gem.h>
>>>+#include <drm/drm_ioctl.h>
>>>+#include <uapi/drm/qaic_accel.h>
>>>+
>>>+#include "mhi_controller.h"
>>>+#include "mhi_qaic_ctrl.h"
>>>+#include "qaic.h"
>>>+
>>>+MODULE_IMPORT_NS(DMA_BUF);
>>>+
>>>+#define PCI_DEV_AIC100			0xa100
>>>+#define QAIC_NAME			"qaic"
>>>+#define QAIC_DESC			"Qualcomm Cloud AI Accelerators"
>>>+
>>>+static unsigned int datapath_polling;
>>>+module_param(datapath_polling, uint, 0400);
>>>+bool poll_datapath;
>>>+
>>>+static u16 cntl_major = 5;
>>>+static u16 cntl_minor;/* 0 */
>>
>>Missing space before the comment.
>>And also you could convert both vars to macros as they are constants.
>
>Sure
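>
>e.g. (hypothetical macro names, sketch only):
>
>#define QAIC_CNTL_MAJOR	5
>#define QAIC_CNTL_MINOR	0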
>
>>>+static bool link_up;
>>>+static DEFINE_IDA(qaic_usrs);
>>>+
>>>+static int qaic_create_drm_device(struct qaic_device *qdev, s32 partition_id,
>>>+				  struct qaic_user *owner);
>>>+static void qaic_destroy_drm_device(struct qaic_device *qdev, s32 partition_id,
>>>+				    struct qaic_user *owner);
>>>+
>>>+static void free_usr(struct kref *kref)
>>>+{
>>>+	struct qaic_user *usr = container_of(kref, struct qaic_user, ref_count);
>>>+
>>>+	cleanup_srcu_struct(&usr->qddev_lock);
>>>+	ida_free(&qaic_usrs, usr->handle);
>>>+	kfree(usr);
>>>+}
>>>+
>>>+static int qaic_open(struct drm_device *dev, struct drm_file *file)
>>>+{
>>>+	struct qaic_drm_device *qddev = dev->dev_private;
>>>+	struct qaic_device *qdev = qddev->qdev;
>>>+	struct qaic_user *usr;
>>>+	int rcu_id;
>>>+	int ret;
>>>+
>>>+	rcu_id = srcu_read_lock(&qdev->dev_lock);
>>>+	if (qdev->in_reset) {
>>>+		srcu_read_unlock(&qdev->dev_lock, rcu_id);
>>>+		return -ENODEV;
>>>+	}
>>>+
>>>+	usr = kmalloc(sizeof(*usr), GFP_KERNEL);
>>>+	if (!usr) {
>>>+		srcu_read_unlock(&qdev->dev_lock, rcu_id);
>>>+		return -ENOMEM;
>>>+	}
>>>+
>>>+	usr->handle = ida_alloc(&qaic_usrs, GFP_KERNEL);
>>>+	if (usr->handle < 0) {
>>>+		srcu_read_unlock(&qdev->dev_lock, rcu_id);
>>>+		kfree(usr);
>>>+		return usr->handle;

hi, use after free here
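For illustration, a minimal fix could save the error code before freeing
(untested sketch):

	usr->handle = ida_alloc(&qaic_usrs, GFP_KERNEL);
	if (usr->handle < 0) {
		ret = usr->handle; /* grab the error before 'usr' is freed */
		srcu_read_unlock(&qdev->dev_lock, rcu_id);
		kfree(usr);
		return ret;
	}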

>>>+	}
>>>+	usr->qddev = qddev;
>>>+	atomic_set(&usr->chunk_id, 0);
>>>+	init_srcu_struct(&usr->qddev_lock);
>>>+	kref_init(&usr->ref_count);
>>>+
>>>+	ret = mutex_lock_interruptible(&qddev->users_mutex);
>>>+	if (ret) {
>>>+		cleanup_srcu_struct(&usr->qddev_lock);
>>>+		kfree(usr);
>>>+		srcu_read_unlock(&qdev->dev_lock, rcu_id);
>>>+		return ret;
>>>+	}
>>>+
>>>+	list_add(&usr->node, &qddev->users);
>>>+	mutex_unlock(&qddev->users_mutex);
>>>+
>>>+	file->driver_priv = usr;
>>>+
>>>+	srcu_read_unlock(&qdev->dev_lock, rcu_id);
>>>+	return 0;
>>>+}
>>>+
>>>+static void qaic_postclose(struct drm_device *dev, struct drm_file *file)
>>>+{
>>>+	struct qaic_user *usr = file->driver_priv;
>>>+	struct qaic_drm_device *qddev;
>>>+	struct qaic_device *qdev;
>>>+	int qdev_rcu_id;
>>>+	int usr_rcu_id;
>>>+	int i;
>>>+
>>>+	qddev = usr->qddev;
>>>+	usr_rcu_id = srcu_read_lock(&usr->qddev_lock);
>>>+	if (qddev) {
>>>+		qdev = qddev->qdev;
>>>+		qdev_rcu_id = srcu_read_lock(&qdev->dev_lock);
>>>+		if (!qdev->in_reset) {
>>>+			qaic_release_usr(qdev, usr);
>>>+			for (i = 0; i < qdev->num_dbc; ++i)
>>>+				if (qdev->dbc[i].usr &&
>>>+				    qdev->dbc[i].usr->handle == usr->handle)
>>>+					release_dbc(qdev, i);
>>>+
>>>+			/* Remove child devices */
>>>+			if (qddev->partition_id == QAIC_NO_PARTITION)
>>>+				qaic_destroy_drm_device(qdev, QAIC_NO_PARTITION, usr);
>>>+		}
>>>+		srcu_read_unlock(&qdev->dev_lock, qdev_rcu_id);
>>>+
>>>+		mutex_lock(&qddev->users_mutex);
>>>+		if (!list_empty(&usr->node))
>>>+			list_del_init(&usr->node);
>>>+		mutex_unlock(&qddev->users_mutex);
>>>+	}
>>>+
>>>+	srcu_read_unlock(&usr->qddev_lock, usr_rcu_id);
>>>+	kref_put(&usr->ref_count, free_usr);
>>>+
>>>+	file->driver_priv = NULL;
>>>+}
>>>+
>>>+static int qaic_part_dev_ioctl(struct drm_device *dev, void *data,
>>>+			       struct drm_file *file_priv)
>>>+{
>>>+	struct qaic_device *qdev;
>>>+	struct qaic_user *usr;
>>>+	u32 partition_id;
>>>+	int qdev_rcu_id;
>>>+	int usr_rcu_id;
>>>+	int ret = 0;
>>>+	u16 remove;
>>>+
>>>+	usr = file_priv->driver_priv;
>>>+	if (!usr)
>>>+		return -EINVAL;
>>>+
>>>+	usr_rcu_id = srcu_read_lock(&usr->qddev_lock);
>>>+	if (!usr->qddev) {
>>>+		srcu_read_unlock(&usr->qddev_lock, usr_rcu_id);
>>>+		return -ENODEV;
>>>+	}
>>>+
>>>+	qdev = usr->qddev->qdev;
>>>+	if (!qdev) {
>>>+		srcu_read_unlock(&usr->qddev_lock, usr_rcu_id);
>>>+		return -ENODEV;
>>>+	}
>>>+
>>>+	qdev_rcu_id = srcu_read_lock(&qdev->dev_lock);
>>>+	if (qdev->in_reset) {
>>>+		ret = -ENODEV;
>>>+		goto out;
>>>+	}
>>>+
>>>+	/* This IOCTL is only supported for base devices. */
>>>+	if (usr->qddev->partition_id != QAIC_NO_PARTITION) {
>>>+		ret = -ENOTTY;
>>>+		goto out;
>>>+	}
>>>+
>>>+	ret = qaic_data_get_reservation(qdev, usr, data, &partition_id,
>>>+					&remove);
>>>+	if (ret)
>>>+		goto out;
>>>+
>>>+	if (remove == 1)
>>>+		qaic_destroy_drm_device(qdev, partition_id, usr);
>>>+	else
>>>+		ret = qaic_create_drm_device(qdev, partition_id, usr);
>>>+
>>>+out:
>>>+	srcu_read_unlock(&qdev->dev_lock, qdev_rcu_id);
>>>+	srcu_read_unlock(&usr->qddev_lock, usr_rcu_id);
>>>+
>>>+	return ret;
>>>+}
>>>+
>>>+DEFINE_DRM_ACCEL_FOPS(qaic_accel_fops);
>>>+
>>>+static const struct drm_ioctl_desc qaic_drm_ioctls[] = {
>>>+	DRM_IOCTL_DEF_DRV(QAIC_MANAGE, qaic_manage_ioctl, DRM_RENDER_ALLOW),
>>>+	DRM_IOCTL_DEF_DRV(QAIC_CREATE_BO, qaic_create_bo_ioctl, DRM_RENDER_ALLOW),
>>>+	DRM_IOCTL_DEF_DRV(QAIC_MMAP_BO, qaic_mmap_bo_ioctl, DRM_RENDER_ALLOW),
>>>+	DRM_IOCTL_DEF_DRV(QAIC_ATTACH_SLICE_BO, qaic_attach_slice_bo_ioctl, DRM_RENDER_ALLOW),
>>>+	DRM_IOCTL_DEF_DRV(QAIC_EXECUTE_BO, qaic_execute_bo_ioctl, DRM_RENDER_ALLOW),
>>>+	DRM_IOCTL_DEF_DRV(QAIC_PARTIAL_EXECUTE_BO, qaic_partial_execute_bo_ioctl, DRM_RENDER_ALLOW),
>>>+	DRM_IOCTL_DEF_DRV(QAIC_WAIT_BO, qaic_wait_bo_ioctl, DRM_RENDER_ALLOW),
>>>+	DRM_IOCTL_DEF_DRV(QAIC_PERF_STATS_BO, qaic_perf_stats_bo_ioctl, DRM_RENDER_ALLOW),
>>>+	DRM_IOCTL_DEF_DRV(QAIC_PART_DEV, qaic_part_dev_ioctl, DRM_RENDER_ALLOW),
>>>+};
>>>+
>>>+static const struct drm_driver qaic_accel_driver = {
>>>+	.driver_features	= DRIVER_GEM | DRIVER_COMPUTE_ACCEL,
>>>+
>>>+	.name			= QAIC_NAME,
>>>+	.desc			= QAIC_DESC,
>>>+	.date			= "20190618",
>>>+
>>>+	.fops			= &qaic_accel_fops,
>>>+	.open			= qaic_open,
>>>+	.postclose		= qaic_postclose,
>>>+
>>>+	.ioctls			= qaic_drm_ioctls,
>>>+	.num_ioctls		= ARRAY_SIZE(qaic_drm_ioctls),
>>>+	.prime_fd_to_handle	= drm_gem_prime_fd_to_handle,
>>>+	.gem_prime_import	= qaic_gem_prime_import,
>>>+};
>>>+
>>>+static int qaic_create_drm_device(struct qaic_device *qdev, s32 partition_id,
>>>+				  struct qaic_user *owner)
>>>+{
>>>+	struct qaic_drm_device *qddev;
>>>+	struct drm_device *ddev;
>>>+	struct device *pdev;
>>>+	int ret;
>>>+
>>>+	/*
>>>+	 * Partition id QAIC_NO_PARTITION indicates that the device was created
>>>+	 * on mhi_probe and id > QAIC_NO_PARTITION indicates a partition
>>>+	 * created using IOCTL. So, pdev for primary device is the pci dev and
>>>+	 * the parent for partition dev is the primary device.
>>>+	 */
>>>+	if (partition_id == QAIC_NO_PARTITION)
>>>+		pdev = &qdev->pdev->dev;
>>>+	else
>>>+		pdev = qdev->base_dev->ddev->dev;
>>>+
>>>+	qddev = kzalloc(sizeof(*qddev), GFP_KERNEL);
>>>+	if (!qddev) {
>>>+		ret = -ENOMEM;
>>>+		goto qddev_fail;

the qddev_fail label is just before the return, so better to return here directly
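i.e. something like (sketch):

	qddev = kzalloc(sizeof(*qddev), GFP_KERNEL);
	if (!qddev)
		return -ENOMEM;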

>>>+	}
>>>+
>>>+	ddev = drm_dev_alloc(&qaic_accel_driver, pdev);
>>>+	if (IS_ERR(ddev)) {
>>>+		ret = PTR_ERR(ddev);
>>>+		goto ddev_fail;
>>>+	}
>>>+
>>>+	ddev->dev_private = qddev;
>>>+	qddev->ddev = ddev;
>>>+
>>>+	if (partition_id == QAIC_NO_PARTITION)
>>>+		qdev->base_dev = qddev;
>>>+	qddev->qdev = qdev;
>>>+	qddev->partition_id = partition_id;
>>>+	qddev->owner = owner;
>>>+	INIT_LIST_HEAD(&qddev->users);
>>>+	mutex_init(&qddev->users_mutex);
>>>+
>>>+	mutex_lock(&qdev->qaic_drm_devices_mutex);
>>>+	list_add(&qddev->node, &qdev->qaic_drm_devices);
>>>+	mutex_unlock(&qdev->qaic_drm_devices_mutex);
>>>+
>>>+	ret = drm_dev_register(ddev, 0);
>>>+	if (ret) {
>>>+		pci_dbg(qdev->pdev, "%s: drm_dev_register failed %d\n", __func__, ret);
>>>+		goto drm_reg_fail;
>>>+	}
>>>+
>>>+	return 0;
>>>+
>>>+drm_reg_fail:
>>>+	mutex_destroy(&qddev->users_mutex);
>>>+	mutex_lock(&qdev->qaic_drm_devices_mutex);
>>>+	list_del(&qddev->node);
>>>+	mutex_unlock(&qdev->qaic_drm_devices_mutex);
>>>+	if (partition_id == QAIC_NO_PARTITION)
>>>+		qdev->base_dev = NULL;
>>>+	drm_dev_put(ddev);
>>>+ddev_fail:
>>>+	kfree(qddev);
>>>+qddev_fail:

no point for this label

>>>+	return ret;
>>>+}
>>>+
>>>+static void qaic_destroy_drm_device(struct qaic_device *qdev, s32 partition_id,
>>>+				    struct qaic_user *owner)
>>>+{
>>>+	struct qaic_drm_device *qddev;
>>>+	struct qaic_drm_device *q;
>>>+	struct qaic_user *usr;
>>>+
>>>+	list_for_each_entry_safe(qddev, q, &qdev->qaic_drm_devices, node) {
>>>+		/*
>>>+		 * Skip devices in case we just want to remove devices
>>>+		 * specific to a owner or partition id.
>>>+		 *
>>>+		 * owner	partition_id	notes
>>>+		 * ----------------------------------
>>>+		 * NULL		NO_PARTITION	delete base + all derived (qdev
>>>+		 *				reset)
>>>+		 * !NULL	NO_PARTITION	delete derived devs created by
>>>+		 *				owner.
>>>+		 * !NULL	>NO_PARTITION	delete derived dev identified by
>>>+		 *				the partition id and created by
>>>+		 *				owner
>>>+		 * NULL		>NO_PARTITION	invalid (no-op)
>>>+		 *
>>>+		 * if partition_id is any value < QAIC_NO_PARTITION this will be
>>>+		 * a no-op.
>>>+		 */
>>>+		if (owner && owner != qddev->owner)
>>>+			continue;
>>>+
>>>+		if (partition_id != QAIC_NO_PARTITION &&
>>>+		    partition_id != qddev->partition_id && !owner)
>>>+			continue;
>>>+
>>>+		/*
>>>+		 * Existing users get unresolvable errors till they close FDs.
>>>+		 * Need to sync carefully with users calling close().  The
>>>+		 * list of users can be modified elsewhere when the lock isn't
>>>+		 * held here, but the sync'ing the srcu with the mutex held
>>>+		 * could deadlock.  Grab the mutex so that the list will be
>>>+		 * unmodified.  The user we get will exist as long as the
>>>+		 * lock is held.  Signal that the qcdev is going away, and
>>>+		 * grab a reference to the user so they don't go away for
>>>+		 * synchronize_srcu().  Then release the mutex to avoid
>>>+		 * deadlock and make sure the user has observed the signal.
>>>+		 * With the lock released, we cannot maintain any state of the
>>>+		 * user list.
>>>+		 */
>>>+		mutex_lock(&qddev->users_mutex);
>>>+		while (!list_empty(&qddev->users)) {
>>>+			usr = list_first_entry(&qddev->users, struct qaic_user,
>>>+					       node);
>>>+			list_del_init(&usr->node);
>>>+			kref_get(&usr->ref_count);
>>>+			usr->qddev = NULL;
>>>+			mutex_unlock(&qddev->users_mutex);
>>>+			synchronize_srcu(&usr->qddev_lock);
>>>+			kref_put(&usr->ref_count, free_usr);
>>>+			mutex_lock(&qddev->users_mutex);
>>>+		}
>>>+		mutex_unlock(&qddev->users_mutex);
>>>+
>>>+		if (qddev->ddev) {
>>>+			drm_dev_unregister(qddev->ddev);
>>>+			drm_dev_put(qddev->ddev);
>>>+		}
>>>+
>>>+		list_del(&qddev->node);
>>>+		kfree(qddev);
>>>+	}
>>>+}
>>>+
>>>+static int qaic_mhi_probe(struct mhi_device *mhi_dev,
>>>+			  const struct mhi_device_id *id)
>>>+{
>>>+	struct qaic_device *qdev;
>>>+	u16 major, minor;
>>>+	int ret;
>>>+
>>>+	/*
>>>+	 * Invoking this function indicates that the control channel to the
>>>+	 * device is available.  We use that as a signal to indicate that
>>>+	 * the device side firmware has booted.  The device side firmware
>>>+	 * manages the device resources, so we need to communicate with it
>>>+	 * via the control channel in order to utilize the device.  Therefore
>>>+	 * we wait until this signal to create the drm dev that userspace will
>>>+	 * use to control the device, because without the device side firmware,
>>>+	 * userspace can't do anything useful.
>>>+	 */
>>>+
>>>+	qdev = pci_get_drvdata(to_pci_dev(mhi_dev->mhi_cntrl->cntrl_dev));
>>>+
>>>+	qdev->in_reset = false;
>>>+
>>>+	dev_set_drvdata(&mhi_dev->dev, qdev);
>>>+	qdev->cntl_ch = mhi_dev;
>>>+
>>>+	ret = qaic_control_open(qdev);
>>>+	if (ret) {
>>>+		pci_dbg(qdev->pdev, "%s: control_open failed %d\n", __func__, ret);
>>>+		goto err;
>>>+	}
>>>+
>>>+	ret = get_cntl_version(qdev, NULL, &major, &minor);
>>>+	if (ret || major != cntl_major || minor > cntl_minor) {
>>>+		pci_err(qdev->pdev, "%s: Control protocol version (%d.%d) not supported.  Supported version is (%d.%d). Ret: %d\n",
>>>+			__func__, major, minor, cntl_major, cntl_minor, ret);
>>>+		ret = -EINVAL;
>>>+		goto close_control;
>>>+	}
>>>+
>>>+	ret = qaic_create_drm_device(qdev, QAIC_NO_PARTITION, NULL);
>>>+
>>>+	return ret;
>>>+
>>>+close_control:
>>>+	qaic_control_close(qdev);
>>>+err:

same here, not sure there is a point to this label

Thanks,
Dafna

>>>+	return ret;
>>>+}
>>>+
>>>+static void qaic_mhi_remove(struct mhi_device *mhi_dev)
>>>+{
>>
>>Add a comment here
>
>Presumably why this is an empty function?
>Will do.
>
>>>+}
>>>+
>>>+static void qaic_notify_reset(struct qaic_device *qdev)
>>>+{
>>>+	int i;
>>>+
>>>+	qdev->in_reset = true;
>>>+	/* wake up any waiters to avoid waiting for timeouts at sync */
>>>+	wake_all_cntl(qdev);
>>>+	for (i = 0; i < qdev->num_dbc; ++i)
>>>+		wakeup_dbc(qdev, i);
>>>+	synchronize_srcu(&qdev->dev_lock);
>>>+}
>>>+
>>>+void qaic_dev_reset_clean_local_state(struct qaic_device *qdev, bool exit_reset)
>>>+{
>>>+	int i;
>>>+
>>>+	qaic_notify_reset(qdev);
>>>+
>>>+	/* remove drmdevs to prevent new users from coming in */
>>>+	if (qdev->base_dev)
>>>+		qaic_destroy_drm_device(qdev, QAIC_NO_PARTITION, NULL);
>>>+
>>>+	/* start tearing things down */
>>>+	for (i = 0; i < qdev->num_dbc; ++i)
>>>+		release_dbc(qdev, i);
>>>+
>>>+	if (exit_reset)
>>>+		qdev->in_reset = false;
>>>+}
>>>+
>>>+static int qaic_pci_probe(struct pci_dev *pdev,
>>>+			  const struct pci_device_id *id)
>>
>>Please try to simplify this function. Maybe move the irq init to a separate function.
>>It will be more readable and there will be less error handling spaghetti at the bottom.
>
>I guess I tend towards not breaking something up if there is no reuse
>elsewhere, but I think I see your point.  Will have a look.
>
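>For illustration, the per-DBC irq setup could move into a helper roughly
>like this (hypothetical name, untested sketch):
>
>static int qaic_init_dbc_irqs(struct qaic_device *qdev, struct pci_dev *pdev)
>{
>	int i, ret;
>
>	for (i = 0; i < qdev->num_dbc; ++i) {
>		ret = devm_request_threaded_irq(&pdev->dev, pci_irq_vector(pdev, i + 1),
>						dbc_irq_handler, dbc_irq_threaded_fn,
>						IRQF_SHARED, "qaic_dbc", &qdev->dbc[i]);
>		if (ret)
>			return ret;
>
>		if (poll_datapath) {
>			qdev->dbc[i].irq = pci_irq_vector(pdev, i + 1);
>			disable_irq_nosync(qdev->dbc[i].irq);
>			INIT_WORK(&qdev->dbc[i].poll_work, irq_polling_work);
>		}
>	}
>
>	return 0;
>}
>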
>>
>>>+{
>>>+	int ret;
>>>+	int i;
>>>+	int mhi_irq;
>>>+	struct qaic_device *qdev;
>>>+
>>>+	qdev = devm_kzalloc(&pdev->dev, sizeof(*qdev), GFP_KERNEL);
>>>+	if (!qdev)
>>>+		return -ENOMEM;
>>>+
>>>+	if (id->device == PCI_DEV_AIC100) {
>>>+		qdev->num_dbc = 16;
>>>+		qdev->dbc = devm_kcalloc(&pdev->dev, qdev->num_dbc, sizeof(*qdev->dbc),
>>>+					 GFP_KERNEL);
>>>+		if (!qdev->dbc)
>>>+			return -ENOMEM;
>>>+	}
>>>+
>>>+	qdev->cntl_wq = alloc_workqueue("qaic_cntl", WQ_UNBOUND, 0);
>>>+	if (!qdev->cntl_wq) {
>>>+		ret = -ENOMEM;
>>>+		goto wq_fail;
>>>+	}
>>>+	pci_set_drvdata(pdev, qdev);
>>>+	qdev->pdev = pdev;
>>>+	mutex_init(&qdev->cntl_mutex);
>>>+	INIT_LIST_HEAD(&qdev->cntl_xfer_list);
>>>+	init_srcu_struct(&qdev->dev_lock);
>>>+	INIT_LIST_HEAD(&qdev->qaic_drm_devices);
>>>+	mutex_init(&qdev->qaic_drm_devices_mutex);
>>>+	for (i = 0; i < qdev->num_dbc; ++i) {
>>>+		spin_lock_init(&qdev->dbc[i].xfer_lock);
>>>+		qdev->dbc[i].qdev = qdev;
>>>+		qdev->dbc[i].id = i;
>>>+		INIT_LIST_HEAD(&qdev->dbc[i].xfer_list);
>>>+		init_srcu_struct(&qdev->dbc[i].ch_lock);
>>>+		init_waitqueue_head(&qdev->dbc[i].dbc_release);
>>>+		INIT_LIST_HEAD(&qdev->dbc[i].bo_lists);
>>>+	}
>>>+
>>>+	qdev->bars = pci_select_bars(pdev, IORESOURCE_MEM);
>>>+
>>>+	/* make sure the device has the expected BARs */
>>>+	if (qdev->bars != (BIT(0) | BIT(2) | BIT(4))) {
>>>+		pci_dbg(pdev, "%s: expected BARs 0, 2, and 4 not found in device.  Found 0x%x\n",
>>>+			__func__, qdev->bars);
>>>+		ret = -EINVAL;
>>>+		goto bar_fail;
>>>+	}
>>>+
>>>+	ret = pci_enable_device(pdev);
>>>+	if (ret)
>>>+		goto enable_fail;
>>>+
>>>+	ret = pci_request_selected_regions(pdev, qdev->bars, "aic100");
>>>+	if (ret)
>>>+		goto request_regions_fail;
>>>+
>>>+	pci_set_master(pdev);
>>>+
>>>+	ret = dma_set_mask(&pdev->dev, DMA_BIT_MASK(64));
>>>+	if (ret)
>>>+		goto dma_mask_fail;
>>>+	ret = dma_set_coherent_mask(&pdev->dev, DMA_BIT_MASK(64));
>>>+	if (ret)
>>>+		goto dma_mask_fail;
>>>+	ret = dma_set_max_seg_size(&pdev->dev, UINT_MAX);
>>>+	if (ret)
>>>+		goto dma_mask_fail;
>>>+
>>>+	qdev->bar_0 = pci_ioremap_bar(pdev, 0);
>>>+	if (!qdev->bar_0) {
>>>+		ret = -ENOMEM;
>>>+		goto ioremap_0_fail;
>>>+	}
>>>+
>>>+	qdev->bar_2 = pci_ioremap_bar(pdev, 2);
>>>+	if (!qdev->bar_2) {
>>>+		ret = -ENOMEM;
>>>+		goto ioremap_2_fail;
>>>+	}
>>>+
>>>+	for (i = 0; i < qdev->num_dbc; ++i)
>>>+		qdev->dbc[i].dbc_base = qdev->bar_2 + QAIC_DBC_OFF(i);
>>>+
>>>+	ret = pci_alloc_irq_vectors(pdev, 1, 32, PCI_IRQ_MSI);
>>>+	if (ret < 0)
>>>+		goto alloc_irq_fail;
>>>+
>>>+	if (ret < 32) {
>>>+		pci_err(pdev, "%s: Requested 32 MSIs.  Obtained %d MSIs which is less than the 32 required.\n",
>>>+			__func__, ret);
>>>+		ret = -ENODEV;
>>>+		goto invalid_msi_config;
>>>+	}
>>>+
>>>+	mhi_irq = pci_irq_vector(pdev, 0);
>>>+	if (mhi_irq < 0) {
>>>+		ret = mhi_irq;
>>>+		goto get_mhi_irq_fail;
>>>+	}
>>>+
>>>+	for (i = 0; i < qdev->num_dbc; ++i) {
>>>+		ret = devm_request_threaded_irq(&pdev->dev,
>>>+						pci_irq_vector(pdev, i + 1),
>>>+						dbc_irq_handler,
>>>+						dbc_irq_threaded_fn,
>>>+						IRQF_SHARED,
>>>+						"qaic_dbc",
>>>+						&qdev->dbc[i]);
>>>+		if (ret)
>>>+			goto get_dbc_irq_failed;
>>>+
>>>+		if (poll_datapath) {
>>>+			qdev->dbc[i].irq = pci_irq_vector(pdev, i + 1);
>>>+			disable_irq_nosync(qdev->dbc[i].irq);
>>>+			INIT_WORK(&qdev->dbc[i].poll_work, irq_polling_work);
>>>+		}
>>>+	}
>>>+
>>>+	qdev->mhi_cntl = qaic_mhi_register_controller(pdev, qdev->bar_0, mhi_irq);
>>>+	if (IS_ERR(qdev->mhi_cntl)) {
>>>+		ret = PTR_ERR(qdev->mhi_cntl);
>>>+		goto mhi_register_fail;
>>>+	}
>>>+
>>>+	return 0;
>>>+
>>>+mhi_register_fail:
>>>+get_dbc_irq_failed:
>>
>>I don't think that duplicated goto statements are allowed by the coding style.
>>These should rather be named something like err_free_irq.
>>See https://www.kernel.org/doc/html/v4.10/process/coding-style.html#centralized-exiting-of-functions
>
>I took another look at that link, and I do not believe it prohibits 
>duplicated goto labels, but they probably could be renamed and 
>simplified as you indicate.  Will have a look at that.
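>
>e.g. collapsing the duplicated targets into err_* style labels, roughly
>(sketch only):
>
>err_free_irq:
>	for (i = 0; i < qdev->num_dbc; ++i)
>		devm_free_irq(&pdev->dev, pci_irq_vector(pdev, i + 1),
>			      &qdev->dbc[i]);
>err_free_irq_vectors:
>	pci_free_irq_vectors(pdev);
>err_unmap_bar_2:
>	iounmap(qdev->bar_2);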
>
>>
>>>+	for (i = 0; i < qdev->num_dbc; ++i)
>>>+		devm_free_irq(&pdev->dev, pci_irq_vector(pdev, i + 1),
>>>+			      &qdev->dbc[i]);
>>>+get_mhi_irq_fail:
>>>+invalid_msi_config:
>>>+	pci_free_irq_vectors(pdev);
>>>+alloc_irq_fail:
>>>+	iounmap(qdev->bar_2);
>>>+ioremap_2_fail:
>>>+	iounmap(qdev->bar_0);
>>>+ioremap_0_fail:
>>>+dma_mask_fail:
>>>+	pci_clear_master(pdev);
>>>+	pci_release_selected_regions(pdev, qdev->bars);
>>>+request_regions_fail:
>>>+	pci_disable_device(pdev);
>>>+enable_fail:
>>>+	pci_set_drvdata(pdev, NULL);
>>>+bar_fail:
>>>+	for (i = 0; i < qdev->num_dbc; ++i)
>>>+		cleanup_srcu_struct(&qdev->dbc[i].ch_lock);
>>>+	cleanup_srcu_struct(&qdev->dev_lock);
>>>+	destroy_workqueue(qdev->cntl_wq);
>>>+wq_fail:
>>>+	return ret;
>>>+}
>>>+
>>>+static void qaic_pci_remove(struct pci_dev *pdev)
>>>+{
>>>+	struct qaic_device *qdev = pci_get_drvdata(pdev);
>>>+	int i;
>>>+
>>>+	if (!qdev)
>>>+		return;
>>>+
>>>+	qaic_dev_reset_clean_local_state(qdev, false);
>>>+	qaic_mhi_free_controller(qdev->mhi_cntl, link_up);
>>>+	for (i = 0; i < qdev->num_dbc; ++i) {
>>>+		devm_free_irq(&pdev->dev, pci_irq_vector(pdev, i + 1),
>>>+			      &qdev->dbc[i]);
>>>+		cleanup_srcu_struct(&qdev->dbc[i].ch_lock);
>>>+	}
>>>+	destroy_workqueue(qdev->cntl_wq);
>>>+	pci_free_irq_vectors(pdev);
>>>+	iounmap(qdev->bar_0);
>>>+	pci_clear_master(pdev);
>>>+	pci_release_selected_regions(pdev, qdev->bars);
>>>+	pci_disable_device(pdev);
>>>+	pci_set_drvdata(pdev, NULL);
>>>+}
>>>+
>>>+static void qaic_pci_shutdown(struct pci_dev *pdev)
>>>+{
>>>+	/* see qaic_exit for what link_up is doing */
>>>+	link_up = true;
>>>+	qaic_pci_remove(pdev);
>>>+}
>>>+
>>>+static pci_ers_result_t qaic_pci_error_detected(struct pci_dev *pdev,
>>>+						pci_channel_state_t error)
>>>+{
>>>+	return PCI_ERS_RESULT_NEED_RESET;
>>>+}
>>>+
>>>+static void qaic_pci_reset_prepare(struct pci_dev *pdev)
>>>+{
>>>+	struct qaic_device *qdev = pci_get_drvdata(pdev);
>>>+
>>>+	qaic_notify_reset(qdev);
>>>+	qaic_mhi_start_reset(qdev->mhi_cntl);
>>>+	qaic_dev_reset_clean_local_state(qdev, false);
>>>+}
>>>+
>>>+static void qaic_pci_reset_done(struct pci_dev *pdev)
>>>+{
>>>+	struct qaic_device *qdev = pci_get_drvdata(pdev);
>>>+
>>>+	qdev->in_reset = false;
>>>+	qaic_mhi_reset_done(qdev->mhi_cntl);
>>>+}
>>>+
>>>+static const struct mhi_device_id qaic_mhi_match_table[] = {
>>>+	{ .chan = "QAIC_CONTROL", },
>>>+	{},
>>>+};
>>>+
>>>+static struct mhi_driver qaic_mhi_driver = {
>>>+	.id_table = qaic_mhi_match_table,
>>>+	.remove = qaic_mhi_remove,
>>>+	.probe = qaic_mhi_probe,
>>>+	.ul_xfer_cb = qaic_mhi_ul_xfer_cb,
>>>+	.dl_xfer_cb = qaic_mhi_dl_xfer_cb,
>>>+	.driver = {
>>>+		.name = "qaic_mhi",
>>>+	},
>>>+};
>>>+
>>>+static const struct pci_device_id qaic_ids[] = {
>>>+	{ PCI_DEVICE(PCI_VENDOR_ID_QCOM, PCI_DEV_AIC100), },
>>>+	{ }
>>>+};
>>>+MODULE_DEVICE_TABLE(pci, qaic_ids);
>>>+
>>>+static const struct pci_error_handlers qaic_pci_err_handler = {
>>>+	.error_detected = qaic_pci_error_detected,
>>>+	.reset_prepare = qaic_pci_reset_prepare,
>>>+	.reset_done = qaic_pci_reset_done,
>>>+};
>>>+
>>>+static struct pci_driver qaic_pci_driver = {
>>>+	.name = QAIC_NAME,
>>>+	.id_table = qaic_ids,
>>>+	.probe = qaic_pci_probe,
>>>+	.remove = qaic_pci_remove,
>>>+	.shutdown = qaic_pci_shutdown,
>>>+	.err_handler = &qaic_pci_err_handler,
>>>+};
>>>+
>>>+static int __init qaic_init(void)
>>>+{
>>>+	int ret;
>>>+
>>>+	if (datapath_polling)
>>>+		poll_datapath = true;
>>>+
>>>+	ret = mhi_driver_register(&qaic_mhi_driver);
>>>+	if (ret) {
>>>+		pr_debug("qaic: mhi_driver_register failed %d\n", ret);
>>>+		goto free_class;
>>>+	}
>>>+
>>>+	ret = pci_register_driver(&qaic_pci_driver);
>>>+	if (ret) {
>>>+		pr_debug("qaic: pci_register_driver failed %d\n", ret);
>>>+		goto free_mhi;
>>>+	}
>>>+
>>>+	ret = mhi_qaic_ctrl_init();
>>>+	if (ret) {
>>>+		pr_debug("qaic: mhi_qaic_ctrl_init failed %d\n", ret);
>>>+		goto free_pci;
>>>+	}
>>>+
>>>+	return 0;
>>>+
>>>+free_pci:
>>>+	pci_unregister_driver(&qaic_pci_driver);
>>>+free_mhi:
>>>+	mhi_driver_unregister(&qaic_mhi_driver);
>>>+free_class:
>>
>>This label doesn't free anything. It should be renamed.
>
>Doh.  Holdover from a previous version.  It does need to be renamed.
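>
>Since there is nothing to unwind at that point, it can likely just return
>directly, e.g. (sketch):
>
>	ret = mhi_driver_register(&qaic_mhi_driver);
>	if (ret) {
>		pr_debug("qaic: mhi_driver_register failed %d\n", ret);
>		return ret;
>	}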
>
>>
>>>+	return ret;
>>>+}
>>>+
>>>+static void __exit qaic_exit(void)
>>>+{
>>>+	/*
>>>+	 * We assume that qaic_pci_remove() is called due to a hotplug event
>>>+	 * which would mean that the link is down, and thus
>>>+	 * qaic_mhi_free_controller() should not try to access the device during
>>>+	 * cleanup.
>>>+	 * We call pci_unregister_driver() below, which also triggers
>>>+	 * qaic_pci_remove(), but since this is module exit, we expect the link
>>>+	 * to the device to be up, in which case qaic_mhi_free_controller()
>>>+	 * should try to access the device during cleanup to put the device in
>>>+	 * a sane state.
>>>+	 * For that reason, we set link_up here to let qaic_mhi_free_controller
>>>+	 * know the expected link state.  Since the module is going to be
>>>+	 * removed at the end of this, we don't need to worry about
>>>+	 * reinitializing the link_up state after the cleanup is done.
>>>+	 */
>>>+	link_up = true;
>>
>>Maybe you could just use pdev->current_state instead of link_up?
>
>Maybe.  Will have a look at that.
>
>>
>>>+	mhi_qaic_ctrl_deinit();
>>>+	pci_unregister_driver(&qaic_pci_driver);
>>>+	mhi_driver_unregister(&qaic_mhi_driver);
>>>+}
>>>+
>>>+module_init(qaic_init);
>>>+module_exit(qaic_exit);
>>>+
>>>+MODULE_AUTHOR(QAIC_DESC " Kernel Driver Team");
>>>+MODULE_DESCRIPTION(QAIC_DESC " Accel Driver");
>>>+MODULE_LICENSE("GPL");
>>>diff --git a/include/uapi/drm/qaic_accel.h b/include/uapi/drm/qaic_accel.h
>>>new file mode 100644
>>>index 0000000..d5fa6f5
>>>--- /dev/null
>>>+++ b/include/uapi/drm/qaic_accel.h
>>>@@ -0,0 +1,283 @@
>>>+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note
>>>+ *
>>>+ * Copyright (c) 2019-2020, The Linux Foundation. All rights reserved.
>>>+ * Copyright (c) 2021-2023 Qualcomm Innovation Center, Inc. All rights reserved.
>>>+ */
>>>+
>>>+#ifndef QAIC_ACCEL_H_
>>>+#define QAIC_ACCEL_H_
>>>+
>>>+#include <linux/ioctl.h>
>>>+#include <linux/types.h>
>>
>>These two headers should not be needed here.
>
>Fair enough.  Listed out of paranoia since they define things used in 
>this header.
>
>>
>>>+#include "drm.h"
>>>+
>>>+#if defined(__CPLUSPLUS)
>>
>>Use lowercase here: __cplusplus
>
>Doh.  Not sure why uppercase was used.  The referenced examples sure 
>didn't have that.
>
>>
>>>+extern "C" {
>>>+#endif
>>>+
>>>+#define QAIC_MANAGE_MAX_MSG_LENGTH SZ_4K	/**<
>>>+						  * The length(4K) includes len and
>>>+						  * count fields of qaic_manage_msg
>>>+						  */
>>
>>I guess these are doxygen style commands but you should be using kernel-doc here.
>>See https://docs.kernel.org/doc-guide/kernel-doc.html.
>>This can be used to verify the header:
>>$ scripts/kernel-doc -v -none include/uapi/drm/qaic_accel.h
>
>include/uapi/drm/drm.h was used as the example for these, and thus it 
>was assumed that was the DRM style.
>
>I'm not convinced kernel-doc is applicable to all of these.  Will review.
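>
>For reference, a kernel-doc style version of that comment might look
>roughly like this (sketch, not validated against scripts/kernel-doc):
>
>/**
> * QAIC_MANAGE_MAX_MSG_LENGTH - maximum length of a manage message
> *
> * The 4K length includes the len and count fields of struct qaic_manage_msg.
> */
>#define QAIC_MANAGE_MAX_MSG_LENGTH SZ_4K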
>
>>
>>>+
>>>+enum qaic_sem_flags {
>>
>>Is there any specific reason for enums if all values are explicitly given?
>
>It is a style pulled from elsewhere.  I can remove the enums.
>
>>
>>>+	SEM_INSYNCFENCE =	0x1,
>>
>>All these enums/defines end up in a global user space namespace.
>>I would advise prefixing everything with QAIC_ or DRM_QAIC_ (e.g. QAIC_SEM_INSYNCFENCE)
>>to avoid conflicts with other drivers or user space libs.
>
>Good point.  Will do.
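>
>i.e. something like (sketch):
>
>enum qaic_sem_flags {
>	QAIC_SEM_INSYNCFENCE	= 0x1,
>	QAIC_SEM_OUTSYNCFENCE	= 0x2,
>};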
>
>>
>>>+	SEM_OUTSYNCFENCE =	0x2,
>>>+};
>>>+
>>>+enum qaic_sem_cmd {
>>>+	SEM_NOP =		0,
>>>+	SEM_INIT =		1,
>>>+	SEM_INC =		2,
>>>+	SEM_DEC =		3,
>>>+	SEM_WAIT_EQUAL =	4,
>>>+	SEM_WAIT_GT_EQ =	5, /**< Greater than or equal */
>>>+	SEM_WAIT_GT_0 =		6, /**< Greater than 0 */
>>>+};
>>>+
>>>+enum qaic_manage_transaction_type {
>>>+	TRANS_UNDEFINED =			0,
>>>+	TRANS_PASSTHROUGH_FROM_USR =		1,
>>>+	TRANS_PASSTHROUGH_TO_USR =		2,
>>>+	TRANS_PASSTHROUGH_FROM_DEV =		3,
>>>+	TRANS_PASSTHROUGH_TO_DEV =		4,
>>>+	TRANS_DMA_XFER_FROM_USR =		5,
>>>+	TRANS_DMA_XFER_TO_DEV =			6,
>>>+	TRANS_ACTIVATE_FROM_USR =		7,
>>>+	TRANS_ACTIVATE_FROM_DEV =		8,
>>>+	TRANS_ACTIVATE_TO_DEV =			9,
>>>+	TRANS_DEACTIVATE_FROM_USR =		10,
>>>+	TRANS_DEACTIVATE_FROM_DEV =		11,
>>>+	TRANS_STATUS_FROM_USR =			12,
>>>+	TRANS_STATUS_TO_USR =			13,
>>>+	TRANS_STATUS_FROM_DEV =			14,
>>>+	TRANS_STATUS_TO_DEV =			15,
>>>+	TRANS_TERMINATE_FROM_DEV =		16,
>>>+	TRANS_TERMINATE_TO_DEV =		17,
>>>+	TRANS_DMA_XFER_CONT =			18,
>>>+	TRANS_VALIDATE_PARTITION_FROM_DEV =	19,
>>>+	TRANS_VALIDATE_PARTITION_TO_DEV =	20,
>>>+	TRANS_MAX =				21
>>>+};
>>>+
>>>+struct qaic_manage_trans_hdr {
>>>+	__u32 type;	/**< in, value from enum manage_transaction_type */
>>>+	__u32 len;	/**< in, length of this transaction, including the header */
>>>+};
>>>+
>>>+struct qaic_manage_trans_passthrough {
>>>+	struct qaic_manage_trans_hdr hdr;
>>>+	__u8 data[];	/**< in, userspace must encode in little endian */
>>>+};
>>>+
>>>+struct qaic_manage_trans_dma_xfer {
>>>+	struct qaic_manage_trans_hdr hdr;
>>>+	__u32 tag;	/**< in, device specific */
>>>+	__u32 count;	/**< in */
>>>+	__u64 addr;	/**< in, address of the data to transferred via DMA */
>>>+	__u64 size;	/**< in, length of the data to transferred via DMA */
>>>+};
>>>+
>>>+struct qaic_manage_trans_activate_to_dev {
>>>+	struct qaic_manage_trans_hdr hdr;
>>>+	__u32 queue_size;	/**<
>>>+				  * in, number of elements in DBC request
>>>+				  * and respose queue
>>>+				  */
>>>+	__u32 eventfd;		/**< in */
>>>+	__u32 options;		/**< in, device specific */
>>>+	__u32 pad;		/**< pad must be 0 */
>>>+};
>>>+
>>>+struct qaic_manage_trans_activate_from_dev {
>>>+	struct qaic_manage_trans_hdr hdr;
>>>+	__u32 status;	/**< out, status of activate transaction */
>>>+	__u32 dbc_id;	/**< out, Identifier of assigned DMA Bridge channel */
>>>+	__u64 options;	/**< out */
>>>+};
>>>+
>>>+struct qaic_manage_trans_deactivate {
>>>+	struct qaic_manage_trans_hdr hdr;
>>>+	__u32 dbc_id;	/**< in, Identifier of assigned DMA Bridge channel */
>>>+	__u32 pad;	/**< pad must be 0 */
>>>+};
>>>+
>>>+struct qaic_manage_trans_status_to_dev {
>>>+	struct qaic_manage_trans_hdr hdr;
>>>+};
>>>+
>>>+struct qaic_manage_trans_status_from_dev {
>>>+	struct qaic_manage_trans_hdr hdr;
>>>+	__u16 major;	/**< out, major vesrion of NNC protocol used by device */
>>>+	__u16 minor;	/**< out, minor vesrion of NNC protocol used by device */
>>
>>vesrion -> version
>
>Will fix.
>
>>
>>>+	__u32 status;	/**< out, status of query transaction  */
>>>+	__u64 status_flags;	/**<
>>>+				  * out
>>>+				  * 0    : If set then device has CRC check enabled
>>>+				  * 1:63 : Unused
>>>+				  */
>>>+};
>>>+
>>>+struct qaic_manage_msg {
>>>+	__u32 len;	/**< in, Length of valid data - ie sum of all transactions */
>>>+	__u32 count;	/**< in, Number of transactions in message */
>>>+	__u64 data;	/**< in, Pointer to array of transactions */
>>>+};
>>>+
>>>+struct qaic_create_bo {
>>>+	__u64 size;	/**< in, Size of BO in byte */
>>>+	__u32 handle;	/**< out, Returned GEM handle for the BO */
>>>+	__u32 pad;	/**< pad must be 0 */
>>>+};
>>>+
>>>+struct qaic_mmap_bo {
>>>+	__u32 handle;	/**< in, Handle for the BO being mapped. */
>>
>>The comment is misleading. BO is not mapped by this ioctl().
>
>It's mapping the handle to the offset, which is a type of mapping.  I 
>think I get your point though.  Will try to wordsmith this better.
>
>>
>>>+	__u32 pad;	/**< pad must be 0 */
>>>+	__u64 offset;	/**<
>>>+			  * out, offset into the drm node to use for
>>>+			  * subsequent mmap call
>>>+			  */
>>>+};
>>>+
>>>+/**
>>>+ * @brief semaphore command
>>>+ */
>>>+struct qaic_sem {
>>>+	__u16 val;	/**< in, Only lower 12 bits are valid */
>>>+	__u8  index;	/**< in, Only lower 5 bits are valid */
>>>+	__u8  presync;	/**< in, 1 if presync operation, 0 if postsync */
>>>+	__u8  cmd;	/**< in, See enum sem_cmd */
>>>+	__u8  flags;	/**< in, See sem_flags for valid bits.  All others must be 0 */
>>>+	__u16 pad;	/**< pad must be 0 */
>>>+};
>>>+
>>>+struct qaic_attach_slice_entry {
>>>+	__u64 size;		/**< in, Size memory to allocate for this BO slice */
>>>+	struct qaic_sem	sem0;	/**< in, Must be zero if not valid */
>>>+	struct qaic_sem	sem1;	/**< in, Must be zero if not valid */
>>>+	struct qaic_sem	sem2;	/**< in, Must be zero if not valid */
>>>+	struct qaic_sem	sem3;	/**< in, Must be zero if not valid */
>>>+	__u64 dev_addr;		/**< in, Address in device to/from which data is copied */
>>>+	__u64 db_addr;		/**< in, Doorbell address */
>>>+	__u32 db_data;		/**< in, Data to write to doorbell */
>>>+	__u32 db_len;		/**<
>>>+				  * in, Doorbell length - 32, 16, or 8 bits.
>>>+				  * 0 means doorbell is inactive
>>>+				  */
>>>+	__u64 offset;		/**< in, Offset from start of buffer */
>>>+};
>>>+
>>>+struct qaic_attach_slice_hdr {
>>>+	__u32 count;	/**< in, Number of slices for this BO */
>>>+	__u32 dbc_id;	/**< in, Associate this BO with this DMA Bridge channel */
>>>+	__u32 handle;	/**< in, Handle of BO to which slicing information is to be attached */
>>>+	__u32 dir;	/**< in, Direction of data: 1 = DMA_TO_DEVICE, 2 = DMA_FROM_DEVICE */
>>>+	__u64 size;	/**<
>>>+			  * in, Total length of BO
>>>+			  * If BO is imported (DMABUF/PRIME) then this size
>>>+			  * should not exceed the size of DMABUF provided.
>>>+			  * If BO is allocated using DRM_IOCTL_QAIC_CREATE_BO
>>>+			  * then this size should be exactly same as the size
>>>+			  * provided during DRM_IOCTL_QAIC_CREATE_BO.
>>>+			  */
>>>+};
>>>+
>>>+struct qaic_attach_slice {
>>>+	struct qaic_attach_slice_hdr hdr;
>>>+	__u64 data;	/**<
>>>+			  * in, Pointer to a buffer which is container of
>>>+			  * struct qaic_attach_slice_entry[]
>>>+			  */
>>>+};
>>>+
>>>+struct qaic_execute_entry {
>>>+	__u32 handle;	/**< in, buffer handle */
>>>+	__u32 dir;	/**< in, 1 = to device, 2 = from device */
>>>+};
>>>+
>>>+struct qaic_partial_execute_entry {
>>>+	__u32 handle;	/**< in, buffer handle */
>>>+	__u32 dir;	/**< in, 1 = to device, 2 = from device */
>>>+	__u64 resize;	/**< in, 0 = no resize */
>>>+};
>>>+
>>>+struct qaic_execute_hdr {
>>>+	__u32 count;	/**< in, number of executes following this header */
>>>+	__u32 dbc_id;	/**< in, Identifier of assigned DMA Bridge channel */
>>>+};
>>>+
>>>+struct qaic_execute {
>>>+	struct qaic_execute_hdr hdr;
>>>+	__u64 data;	/**< in, qaic_execute_entry or qaic_partial_execute_entry container */
>>>+};
>>>+
>>>+struct qaic_wait {
>>>+	__u32 handle;	/**< in, handle to wait on until execute is complete */
>>>+	__u32 timeout;	/**< in, timeout for wait(in ms) */
>>>+	__u32 dbc_id;	/**< in, Identifier of assigned DMA Bridge channel */
>>>+	__u32 pad;	/**< pad must be 0 */
>>>+};
>>>+
>>>+struct qaic_perf_stats_hdr {
>>>+	__u16 count;	/**< in, Total number BOs requested */
>>>+	__u16 pad;	/**< pad must be 0 */
>>>+	__u32 dbc_id;	/**< in, Identifier of assigned DMA Bridge channel */
>>>+};
>>>+
>>>+struct qaic_perf_stats {
>>>+	struct qaic_perf_stats_hdr hdr;
>>>+	__u64 data;	/**< in, qaic_perf_stats_entry container */
>>>+};
>>>+
>>>+struct qaic_perf_stats_entry {
>>>+	__u32 handle;			/**< in, Handle of the memory request */
>>>+	__u32 queue_level_before;	/**<
>>>+					  * out, Number of elements in queue
>>>+					  * before submission given memory request
>>>+					  */
>>>+	__u32 num_queue_element;	/**<
>>>+					  * out, Number of elements to add in the
>>>+					  * queue for given memory request
>>>+					  */
>>>+	__u32 submit_latency_us;	/**<
>>>+					  * out, Time taken by kernel to submit
>>>+					  * the request to device
>>>+					  */
>>>+	__u32 device_latency_us;	/**<
>>>+					  * out, Time taken by device to execute the
>>>+					  * request. 0 if request is not completed
>>>+					  */
>>>+	__u32 pad;			/**< pad must be 0 */
>>>+};
>>>+
>>>+struct qaic_part_dev {
>>>+	__u32 partition_id;	/**< in, reservation id */
>>>+	__u16 remove;		/**< in, 1 - Remove device 0 - Create device */
>>>+	__u16 pad;		/**< pad must be 0 */
>>>+};
>>>+
>>>+#define DRM_QAIC_MANAGE				0x00
>>>+#define DRM_QAIC_CREATE_BO			0x01
>>>+#define DRM_QAIC_MMAP_BO			0x02
>>
>>I know that the MMAP_BO ioctl is common in drm drivers, but in my opinion it is a very poor name.
>>I suggest naming it BO_INFO so that in the future you could extend it with other bo params besides
>>mmap offset.
>
>I see your point, but I don't see how to implement it.
>
>Let us assume this IOCTL is renamed DRM_QAIC_BO_INFO, and struct 
>qaic_mmap_bo is named qaic_bo_info.
>
>qaic_bo_info is part of the uAPI.  If "tomorrow" we need to add a 
>field to it, we cannot.  That changes the structure size and breaks 
>the uAPI.
>
>I can't add reserved fields in anticipation of future use since GregKH 
>had said not to do that - 
>https://lore.kernel.org/all/20200514163734.GB3154055@kroah.com/
>
>So, I'm thinking the only approach would be a linked list similar to 
>EXECUTE_BO where qaic_bo_info holds a userspace pointer that is the 
>head of the list, and new list nodes can be defined in future.  That 
>seems like an overengineered solution to some hypothetical future 
>which may not come to pass.  If this is the solution, then I think I'd 
>prefer to leave this ioctl as is, and deprecate it if the extension 
>usecase ever comes to pass.
>
>Thoughts?  Am I missing something?
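>
>For discussion, the list based variant would look roughly like this
>(purely hypothetical layout, not proposed for this series):
>
>struct qaic_bo_info_entry {
>	__u32 type;	/* what this entry describes, e.g. mmap offset */
>	__u32 len;	/* length of this entry, including this header */
>	__u64 next;	/* user pointer to the next entry, 0 terminates */
>	__u64 data;	/* entry specific payload */
>};
>
>struct qaic_bo_info {
>	__u32 handle;	/* in, BO to query */
>	__u32 pad;	/* must be 0 */
>	__u64 entries;	/* in, user pointer to the first qaic_bo_info_entry */
>};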
>
>>
>>>+#define DRM_QAIC_ATTACH_SLICE_BO		0x03
>>>+#define DRM_QAIC_EXECUTE_BO			0x04
>>>+#define DRM_QAIC_PARTIAL_EXECUTE_BO		0x05
>>>+#define DRM_QAIC_WAIT_BO			0x06
>>>+#define DRM_QAIC_PERF_STATS_BO			0x07
>>>+#define DRM_QAIC_PART_DEV			0x08
>>>+
>>>+#define DRM_IOCTL_QAIC_MANAGE			DRM_IOWR(DRM_COMMAND_BASE + DRM_QAIC_MANAGE, struct qaic_manage_msg)
>>>+#define DRM_IOCTL_QAIC_CREATE_BO		DRM_IOWR(DRM_COMMAND_BASE + DRM_QAIC_CREATE_BO,	struct qaic_create_bo)
>>>+#define DRM_IOCTL_QAIC_MMAP_BO			DRM_IOWR(DRM_COMMAND_BASE + DRM_QAIC_MMAP_BO, struct qaic_mmap_bo)
>>>+#define DRM_IOCTL_QAIC_ATTACH_SLICE_BO		DRM_IOW(DRM_COMMAND_BASE + DRM_QAIC_ATTACH_SLICE_BO, struct qaic_attach_slice)
>>>+#define DRM_IOCTL_QAIC_EXECUTE_BO		DRM_IOW(DRM_COMMAND_BASE + DRM_QAIC_EXECUTE_BO,	struct qaic_execute)
>>>+#define DRM_IOCTL_QAIC_PARTIAL_EXECUTE_BO	DRM_IOW(DRM_COMMAND_BASE + DRM_QAIC_PARTIAL_EXECUTE_BO,	struct qaic_execute)
>>>+#define DRM_IOCTL_QAIC_WAIT_BO			DRM_IOW(DRM_COMMAND_BASE + DRM_QAIC_WAIT_BO, struct qaic_wait)
>>>+#define DRM_IOCTL_QAIC_PERF_STATS_BO		DRM_IOWR(DRM_COMMAND_BASE + DRM_QAIC_PERF_STATS_BO, struct qaic_perf_stats)
>>>+#define DRM_IOCTL_QAIC_PART_DEV			DRM_IOWR(DRM_COMMAND_BASE + DRM_QAIC_PART_DEV, struct qaic_part_dev)
>>>+
>>>+#if defined(__CPLUSPLUS)
>>
>>Use lowercase here: __cplusplus
>
>Will do.
>
>>
>>>+}
>>>+#endif
>>>+
>>>+#endif /* QAIC_ACCEL_H_ */
>>
>>Regards,
>>Jacek
>>
>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 2/8] accel/qaic: Add uapi and core driver file
@ 2023-02-22 15:52         ` Dafna Hirschfeld
  0 siblings, 0 replies; 68+ messages in thread
From: Dafna Hirschfeld @ 2023-02-22 15:52 UTC (permalink / raw)
  To: Jeffrey Hugo
  Cc: linux-doc, linux-arm-msm, ogabbay, dri-devel, quic_ajitpals,
	quic_pkanojiy, stanislaw.gruszka, quic_carlv, Jacek Lawrynowicz

On 17.02.2023 11:15, Jeffrey Hugo wrote:
>On 2/16/2023 7:13 AM, Jacek Lawrynowicz wrote:
>>Hi,
>>
>>On 06.02.2023 16:41, Jeffrey Hugo wrote:
>>>Add the QAIC driver uapi file and core driver file that binds to the PCIe
>>>device.  The core driver file also creates the accel device and manages
>>>all the interconnections between the different parts of the driver.
>>>
>>>The driver can be built as a module.  If so, it will be called "qaic.ko".
>>>
>>>Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
>>>Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com>
>>>---
>>>  drivers/accel/qaic/qaic.h     | 321 ++++++++++++++++++
>>>  drivers/accel/qaic/qaic_drv.c | 771 ++++++++++++++++++++++++++++++++++++++++++
>>>  include/uapi/drm/qaic_accel.h | 283 ++++++++++++++++
>>>  3 files changed, 1375 insertions(+)
>>>  create mode 100644 drivers/accel/qaic/qaic.h
>>>  create mode 100644 drivers/accel/qaic/qaic_drv.c
>>>  create mode 100644 include/uapi/drm/qaic_accel.h
>>>
>>>diff --git a/drivers/accel/qaic/qaic.h b/drivers/accel/qaic/qaic.h
>>>new file mode 100644
>>>index 0000000..3f7ea76
>>>--- /dev/null
>>>+++ b/drivers/accel/qaic/qaic.h
>>>@@ -0,0 +1,321 @@
>>>+/* SPDX-License-Identifier: GPL-2.0-only
>>>+ *
>>>+ * Copyright (c) 2019-2021, The Linux Foundation. All rights reserved.
>>>+ * Copyright (c) 2021-2023 Qualcomm Innovation Center, Inc. All rights reserved.
>>>+ */
>>>+
>>>+#ifndef QAICINTERNAL_H_
>>
>>Please use guard macro that matches the file name: _QAIC_H_
>
>Before moving to DRM/ACCEL, this conflicted with the uapi file. 
>However, that is no longer the case, so yes, this should be changed. 
>Will do.
>
>>
>>>+#define QAICINTERNAL_H_
>>>+
>>>+#include <linux/interrupt.h>
>>>+#include <linux/kref.h>
>>>+#include <linux/mhi.h>
>>>+#include <linux/mutex.h>
>>>+#include <linux/pci.h>
>>>+#include <linux/spinlock.h>
>>>+#include <linux/srcu.h>
>>>+#include <linux/wait.h>
>>>+#include <linux/workqueue.h>
>>>+#include <drm/drm_device.h>
>>>+#include <drm/drm_gem.h>
>>>+
>>>+#define QAIC_DBC_BASE		0x20000
>>>+#define QAIC_DBC_SIZE		0x1000
>>>+
>>>+#define QAIC_NO_PARTITION	-1
>>>+
>>>+#define QAIC_DBC_OFF(i)		((i) * QAIC_DBC_SIZE + QAIC_DBC_BASE)
>>>+
>>>+#define to_qaic_bo(obj) container_of(obj, struct qaic_bo, base)
>>>+
>>>+extern bool poll_datapath;
>>>+
>>>+struct qaic_user {
>>>+	/* Uniquely identifies this user for the device */
>>>+	int			handle;
>>>+	struct kref		ref_count;
>>>+	/* Char device opened by this user */
>>>+	struct qaic_drm_device	*qddev;
>>>+	/* Node in list of users that opened this drm device */
>>>+	struct list_head	node;
>>>+	/* SRCU used to synchronize this user during cleanup */
>>>+	struct srcu_struct	qddev_lock;
>>>+	atomic_t		chunk_id;
>>>+};
>>>+
>>>+struct dma_bridge_chan {
>>>+	/* Pointer to device strcut maintained by driver */
>>>+	struct qaic_device	*qdev;
>>>+	/* ID of this DMA bridge channel(DBC) */
>>>+	unsigned int		id;
>>>+	/* Synchronizes access to xfer_list */
>>>+	spinlock_t		xfer_lock;
>>>+	/* Base address of request queue */
>>>+	void			*req_q_base;
>>>+	/* Base address of response queue */
>>>+	void			*rsp_q_base;
>>>+	/*
>>>+	 * Base bus address of request queue. Response queue bus address can be
>>>+	 * calculated by adding request queue size to this variable
>>>+	 */
>>>+	dma_addr_t		dma_addr;
>>>+	/* Total size of request and response queue in byte */
>>>+	u32			total_size;
>>>+	/* Capacity of request/response queue */
>>>+	u32			nelem;
>>>+	/* The user that opened this DBC */
>>>+	struct qaic_user	*usr;
>>>+	/*
>>>+	 * Request ID of next memory handle that goes in request queue. One
>>>+	 * memory handle can enqueue more than one request elements, all
>>>+	 * this requests that belong to same memory handle have same request ID
>>>+	 */
>>>+	u16			next_req_id;
>>>+	/* TRUE: DBC is in use; FALSE: DBC not in use */
>>
>>Use standard "true"/"false" instead of custom "TRUE"/"FALSE" macros.
>>This applies here and in multiple other places in the driver.
>
>I think you are getting at that the documentation could be confusing.  
>I don't appear to see custom macro use in the code.  Will try to 
>clarify that here.
>
>>>+	bool			in_use;
>>>+	/*
>>>+	 * Base address of device registers. Used to read/write request and
>>>+	 * response queue's head and tail pointer of this DBC.
>>>+	 */
>>>+	void __iomem		*dbc_base;
>>>+	/* Head of list where each node is a memory handle queued in request queue */
>>>+	struct list_head	xfer_list;
>>>+	/* Synchronizes DBC readers during cleanup */
>>>+	struct srcu_struct	ch_lock;
>>>+	/*
>>>+	 * When this DBC is released, any thread waiting on this wait queue is
>>>+	 * woken up
>>>+	 */
>>>+	wait_queue_head_t	dbc_release;
>>>+	/* Head of list where each node is a bo associated with this DBC */
>>>+	struct list_head	bo_lists;
>>>+	/* The irq line for this DBC.  Used for polling */
>>>+	unsigned int		irq;
>>>+	/* Polling work item to simulate interrupts */
>>>+	struct work_struct	poll_work;
>>>+};
>>>+
>>>+struct qaic_device {
>>>+	/* Pointer to base PCI device struct of our physical device */
>>>+	struct pci_dev		*pdev;
>>>+	/* Mask of all bars of this device */
>>>+	int			bars;
>>>+	/* Req. ID of request that will be queued next in MHI control device */
>>>+	u32			next_seq_num;
>>>+	/* Base address of bar 0 */
>>>+	void __iomem		*bar_0;
>>>+	/* Base address of bar 2 */
>>>+	void __iomem		*bar_2;
>>>+	/* Controller structure for MHI devices */
>>>+	struct mhi_controller	*mhi_cntl;
>>>+	/* MHI control channel device */
>>>+	struct mhi_device	*cntl_ch;
>>>+	/* List of requests queued in MHI control device */
>>>+	struct list_head	cntl_xfer_list;
>>>+	/* Synchronizes MHI control device transactions and its xfer list */
>>>+	struct mutex		cntl_mutex;
>>>+	/* Base actual physical representation of drm device */
>>>+	struct qaic_drm_device	*base_dev;
>>>+	/* Array of DBC struct of this device */
>>>+	struct dma_bridge_chan	*dbc;
>>>+	/* Work queue for tasks related to MHI control device */
>>>+	struct workqueue_struct	*cntl_wq;
>>>+	/* Synchronizes all the users of device during cleanup */
>>>+	struct srcu_struct	dev_lock;
>>>+	/* TRUE: Device under reset; FALSE: Device not under reset */
>>>+	bool			in_reset;
>>>+	/*
>>>+	 * TRUE: A tx MHI transaction has failed and a rx buffer is still queued
>>>+	 * in control device. Such a buffer is considered lost rx buffer
>>>+	 * FALSE: No rx buffer is lost in control device
>>>+	 */
>>>+	bool			cntl_lost_buf;
>>>+	/* Maximum number of DBC supported by this device */
>>>+	u32			num_dbc;
>>>+	/* Head in list of drm device created on top of this device */
>>>+	struct list_head	qaic_drm_devices;
>>>+	/* Synchronizes access of qaic_drm_devices list */
>>>+	struct mutex		qaic_drm_devices_mutex;
>>>+	/* Generate the CRC of a control message */
>>>+	u32 (*gen_crc)(void *msg);
>>>+	/* Validate the CRC of a control message */
>>>+	bool (*valid_crc)(void *msg);
>>>+};
>>>+
>>>+struct qaic_drm_device {
>>>+	/* Pointer to the root device struct driven by this driver */
>>>+	struct qaic_device	*qdev;
>>>+	/* Node in list of drm devices maintained by root device */
>>>+	struct list_head	node;
>>>+	/*
>>>+	 * The physical device can be partition in number of logical devices.
>>>+	 * And each logical device is given a partition id. This member stores
>>>+	 * that id. QAIC_NO_PARTITION is a sentinel used to mark that this drm
>>>+	 * device is the actual physical device
>>>+	 */
>>>+	s32			partition_id;
>>>+	/*
>>>+	 * It points to the user that created this drm device. It is NULL
>>>+	 * when this drm device represents the physical device i.e.
>>>+	 * partition_id is QAIC_NO_PARTITION
>>>+	 */
>>>+	struct qaic_user	*owner;
>>>+	/* Pointer to the drm device struct of this drm device */
>>>+	struct drm_device	*ddev;
>>>+	/* Head in list of users who have opened this drm device */
>>>+	struct list_head	users;
>>>+	/* Synchronizes access to users list */
>>>+	struct mutex		users_mutex;
>>>+};
>>>+
>>>+struct qaic_bo {
>>>+	struct drm_gem_object	base;
>>
>>Any reason why drm_gem_shmem_object cannot be used as a base?
>
>I believe that is incompatible with our memory allocator in patch 5.
>
>>>+	/* Scatter/gather table for allocate/imported BO */
>>>+	struct sg_table		*sgt;
>>>+	/* BO size requested by user. GEM object might be bigger in size. */
>>>+	u64			size;
>>>+	/* Head in list of slices of this BO */
>>>+	struct list_head	slices;
>>>+	/* Total nents, for all slices of this BO */
>>>+	int			total_slice_nents;
>>>+	/*
>>>+	 * Direction of transfer. It can assume only two value DMA_TO_DEVICE and
>>>+	 * DMA_FROM_DEVICE.
>>>+	 */
>>>+	int			dir;
>>>+	/* The pointer of the DBC which operates on this BO */
>>>+	struct dma_bridge_chan	*dbc;
>>>+	/* Number of slice that belongs to this buffer */
>>>+	u32			nr_slice;
>>>+	/* Number of slice that have been transferred by DMA engine */
>>>+	u32			nr_slice_xfer_done;
>>>+	/* TRUE = BO is queued for execution, FALSE = BO is not queued */
>>>+	bool			queued;
>>>+	/*
>>>+	 * If TRUE then user has attached slicing information to this BO by
>>>+	 * calling DRM_IOCTL_QAIC_ATTACH_SLICE_BO ioctl.
>>>+	 */
>>>+	bool			sliced;
>>>+	/* Request ID of this BO if it is queued for execution */
>>>+	u16			req_id;
>>>+	/* Handle assigned to this BO */
>>>+	u32			handle;
>>>+	/* Wait on this for completion of DMA transfer of this BO */
>>>+	struct completion	xfer_done;
>>>+	/*
>>>+	 * Node in linked list where head is dbc->xfer_list.
>>>+	 * This link list contain BO's that are queued for DMA transfer.
>>>+	 */
>>>+	struct list_head	xfer_list;
>>>+	/*
>>>+	 * Node in linked list where head is dbc->bo_lists.
>>>+	 * This link list contain BO's that are associated with the DBC it is
>>>+	 * linked to.
>>>+	 */
>>>+	struct list_head	bo_list;
>>>+	struct {
>>>+		/*
>>>+		 * Latest timestamp(ns) at which kernel received a request to
>>>+		 * execute this BO
>>>+		 */
>>>+		u64		req_received_ts;
>>>+		/*
>>>+		 * Latest timestamp(ns) at which kernel enqueued requests of
>>>+		 * this BO for execution in DMA queue
>>>+		 */
>>>+		u64		req_submit_ts;
>>>+		/*
>>>+		 * Latest timestamp(ns) at which kernel received a completion
>>>+		 * interrupt for requests of this BO
>>>+		 */
>>>+		u64		req_processed_ts;
>>>+		/*
>>>+		 * Number of elements already enqueued in DMA queue before
>>>+		 * enqueuing requests of this BO
>>>+		 */
>>>+		u32		queue_level_before;
>>>+	} perf_stats;
>>>+
>>>+};
>>>+
>>>+struct bo_slice {
>>>+	/* Mapped pages */
>>>+	struct sg_table		*sgt;
>>>+	/* Number of requests required to queue in DMA queue */
>>>+	int			nents;
>>>+	/* See enum dma_data_direction */
>>>+	int			dir;
>>>+	/* Actual requests that will be copied in DMA queue */
>>>+	struct dbc_req		*reqs;
>>>+	struct kref		ref_count;
>>>+	/* TRUE: No DMA transfer required */
>>>+	bool			no_xfer;
>>>+	/* Pointer to the parent BO handle */
>>>+	struct qaic_bo		*bo;
>>>+	/* Node in list of slices maintained by parent BO */
>>>+	struct list_head	slice;
>>>+	/* Size of this slice in bytes */
>>>+	u64			size;
>>>+	/* Offset of this slice in buffer */
>>>+	u64			offset;
>>>+};
>>>+
>>>+int get_dbc_req_elem_size(void);
>>>+int get_dbc_rsp_elem_size(void);
>>>+int get_cntl_version(struct qaic_device *qdev, struct qaic_user *usr,
>>>+		     u16 *major, u16 *minor);
>>>+int qaic_manage_ioctl(struct drm_device *dev, void *data,
>>>+		      struct drm_file *file_priv);
>>>+int qaic_execute_ioctl(struct qaic_device *qdev, struct qaic_user *usr,
>>>+		       unsigned long arg, bool is_partial);
>>>+int qaic_wait_exec_ioctl(struct qaic_device *qdev, struct qaic_user *usr,
>>>+			 unsigned long arg);
>>>+int qaic_query_ioctl(struct qaic_device *qdev, struct qaic_user *usr,
>>>+		     unsigned long arg);
>>>+int qaic_data_mmap(struct qaic_device *qdev, struct qaic_user *usr,
>>>+		   struct vm_area_struct *vma);
>>>+int qaic_data_get_reservation(struct qaic_device *qdev, struct qaic_user *usr,
>>>+			      void *data, u32 *partition_id,
>>>+			      u16 *remove);
>>>+void qaic_mhi_ul_xfer_cb(struct mhi_device *mhi_dev,
>>>+			 struct mhi_result *mhi_result);
>>>+
>>>+void qaic_mhi_dl_xfer_cb(struct mhi_device *mhi_dev,
>>>+			 struct mhi_result *mhi_result);
>>>+
>>>+int qaic_control_open(struct qaic_device *qdev);
>>>+void qaic_control_close(struct qaic_device *qdev);
>>>+void qaic_release_usr(struct qaic_device *qdev, struct qaic_user *usr);
>>>+
>>>+irqreturn_t dbc_irq_threaded_fn(int irq, void *data);
>>>+irqreturn_t dbc_irq_handler(int irq, void *data);
>>>+int disable_dbc(struct qaic_device *qdev, u32 dbc_id, struct qaic_user *usr);
>>>+void enable_dbc(struct qaic_device *qdev, u32 dbc_id, struct qaic_user *usr);
>>>+void wakeup_dbc(struct qaic_device *qdev, u32 dbc_id);
>>>+void release_dbc(struct qaic_device *qdev, u32 dbc_id);
>>>+
>>>+void wake_all_cntl(struct qaic_device *qdev);
>>>+void qaic_dev_reset_clean_local_state(struct qaic_device *qdev, bool exit_reset);
>>>+
>>>+struct drm_gem_object *qaic_gem_prime_import(struct drm_device *dev,
>>>+					     struct dma_buf *dma_buf);
>>>+
>>>+int qaic_create_bo_ioctl(struct drm_device *dev, void *data,
>>>+			 struct drm_file *file_priv);
>>>+int qaic_mmap_bo_ioctl(struct drm_device *dev, void *data,
>>>+		       struct drm_file *file_priv);
>>>+int qaic_attach_slice_bo_ioctl(struct drm_device *dev, void *data,
>>>+			       struct drm_file *file_priv);
>>>+int qaic_execute_bo_ioctl(struct drm_device *dev, void *data,
>>>+			  struct drm_file *file_priv);
>>>+int qaic_partial_execute_bo_ioctl(struct drm_device *dev, void *data,
>>>+				  struct drm_file *file_priv);
>>>+int qaic_wait_bo_ioctl(struct drm_device *dev, void *data,
>>>+		       struct drm_file *file_priv);
>>>+int qaic_test_print_bo_ioctl(struct drm_device *dev, void *data,
>>>+			     struct drm_file *file_priv);
>>>+int qaic_perf_stats_bo_ioctl(struct drm_device *dev, void *data,
>>>+			     struct drm_file *file_priv);
>>
>>You don't need to break these lines. You can use up to 100 columns in the whole driver.
>>It will be more readable and checkpatch won't complain.
>
>Some of this predates the 100 column change.  Checkpatch already 
>complains about the uapi file.
>
>Will try to fix here.
>
>>>+void irq_polling_work(struct work_struct *work);
>>>+
>>>+#endif /* QAICINTERNAL_H_ */
>>>diff --git a/drivers/accel/qaic/qaic_drv.c b/drivers/accel/qaic/qaic_drv.c
>>>new file mode 100644
>>>index 0000000..602a784
>>>--- /dev/null
>>>+++ b/drivers/accel/qaic/qaic_drv.c
>>>@@ -0,0 +1,771 @@
>>>+// SPDX-License-Identifier: GPL-2.0-only
>>>+
>>>+/* Copyright (c) 2019-2021, The Linux Foundation. All rights reserved. */
>>>+/* Copyright (c) 2021-2023 Qualcomm Innovation Center, Inc. All rights reserved. */
>>>+
>>>+#include <linux/delay.h>
>>>+#include <linux/dma-mapping.h>
>>>+#include <linux/idr.h>
>>>+#include <linux/interrupt.h>
>>>+#include <linux/list.h>
>>>+#include <linux/kref.h>
>>>+#include <linux/mhi.h>
>>>+#include <linux/module.h>
>>>+#include <linux/msi.h>
>>>+#include <linux/mutex.h>
>>>+#include <linux/pci.h>
>>>+#include <linux/sched.h>
>>
>>Is <linux/sched.h> used here?
>>Feels like there are couple other unused includes in this file.
>
>Actually I think that can be removed now.  Will check.
>The other ones look sane to me.  Are there specific ones you question?
>
>>
>>>+#include <linux/spinlock.h>
>>>+#include <linux/workqueue.h>
>>>+#include <linux/wait.h>
>>>+#include <drm/drm_accel.h>
>>>+#include <drm/drm_drv.h>
>>>+#include <drm/drm_file.h>
>>>+#include <drm/drm_gem.h>
>>>+#include <drm/drm_ioctl.h>
>>>+#include <uapi/drm/qaic_accel.h>
>>>+
>>>+#include "mhi_controller.h"
>>>+#include "mhi_qaic_ctrl.h"
>>>+#include "qaic.h"
>>>+
>>>+MODULE_IMPORT_NS(DMA_BUF);
>>>+
>>>+#define PCI_DEV_AIC100			0xa100
>>>+#define QAIC_NAME			"qaic"
>>>+#define QAIC_DESC			"Qualcomm Cloud AI Accelerators"
>>>+
>>>+static unsigned int datapath_polling;
>>>+module_param(datapath_polling, uint, 0400);
>>>+bool poll_datapath;
>>>+
>>>+static u16 cntl_major = 5;
>>>+static u16 cntl_minor;/* 0 */
>>
>>Missing space before the comment.
>>And also you could convert both vars to macros as they are constants.
>
>Sure
>
>>>+static bool link_up;
>>>+static DEFINE_IDA(qaic_usrs);
>>>+
>>>+static int qaic_create_drm_device(struct qaic_device *qdev, s32 partition_id,
>>>+				  struct qaic_user *owner);
>>>+static void qaic_destroy_drm_device(struct qaic_device *qdev, s32 partition_id,
>>>+				    struct qaic_user *owner);
>>>+
>>>+static void free_usr(struct kref *kref)
>>>+{
>>>+	struct qaic_user *usr = container_of(kref, struct qaic_user, ref_count);
>>>+
>>>+	cleanup_srcu_struct(&usr->qddev_lock);
>>>+	ida_free(&qaic_usrs, usr->handle);
>>>+	kfree(usr);
>>>+}
>>>+
>>>+static int qaic_open(struct drm_device *dev, struct drm_file *file)
>>>+{
>>>+	struct qaic_drm_device *qddev = dev->dev_private;
>>>+	struct qaic_device *qdev = qddev->qdev;
>>>+	struct qaic_user *usr;
>>>+	int rcu_id;
>>>+	int ret;
>>>+
>>>+	rcu_id = srcu_read_lock(&qdev->dev_lock);
>>>+	if (qdev->in_reset) {
>>>+		srcu_read_unlock(&qdev->dev_lock, rcu_id);
>>>+		return -ENODEV;
>>>+	}
>>>+
>>>+	usr = kmalloc(sizeof(*usr), GFP_KERNEL);
>>>+	if (!usr) {
>>>+		srcu_read_unlock(&qdev->dev_lock, rcu_id);
>>>+		return -ENOMEM;
>>>+	}
>>>+
>>>+	usr->handle = ida_alloc(&qaic_usrs, GFP_KERNEL);
>>>+	if (usr->handle < 0) {
>>>+		srcu_read_unlock(&qdev->dev_lock, rcu_id);
>>>+		kfree(usr);
>>>+		return usr->handle;

hi, use after free here - usr->handle is read for the return value after kfree(usr)
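One possible fix, as a sketch (reusing the existing ret local):

	usr->handle = ida_alloc(&qaic_usrs, GFP_KERNEL);
	if (usr->handle < 0) {
		ret = usr->handle;	/* read the error before kfree(usr) */
		srcu_read_unlock(&qdev->dev_lock, rcu_id);
		kfree(usr);
		return ret;
	}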

>>>+	}
>>>+	usr->qddev = qddev;
>>>+	atomic_set(&usr->chunk_id, 0);
>>>+	init_srcu_struct(&usr->qddev_lock);
>>>+	kref_init(&usr->ref_count);
>>>+
>>>+	ret = mutex_lock_interruptible(&qddev->users_mutex);
>>>+	if (ret) {
>>>+		cleanup_srcu_struct(&usr->qddev_lock);
>>>+		kfree(usr);
>>>+		srcu_read_unlock(&qdev->dev_lock, rcu_id);
>>>+		return ret;
>>>+	}
>>>+
>>>+	list_add(&usr->node, &qddev->users);
>>>+	mutex_unlock(&qddev->users_mutex);
>>>+
>>>+	file->driver_priv = usr;
>>>+
>>>+	srcu_read_unlock(&qdev->dev_lock, rcu_id);
>>>+	return 0;
>>>+}
>>>+
>>>+static void qaic_postclose(struct drm_device *dev, struct drm_file *file)
>>>+{
>>>+	struct qaic_user *usr = file->driver_priv;
>>>+	struct qaic_drm_device *qddev;
>>>+	struct qaic_device *qdev;
>>>+	int qdev_rcu_id;
>>>+	int usr_rcu_id;
>>>+	int i;
>>>+
>>>+	qddev = usr->qddev;
>>>+	usr_rcu_id = srcu_read_lock(&usr->qddev_lock);
>>>+	if (qddev) {
>>>+		qdev = qddev->qdev;
>>>+		qdev_rcu_id = srcu_read_lock(&qdev->dev_lock);
>>>+		if (!qdev->in_reset) {
>>>+			qaic_release_usr(qdev, usr);
>>>+			for (i = 0; i < qdev->num_dbc; ++i)
>>>+				if (qdev->dbc[i].usr &&
>>>+				    qdev->dbc[i].usr->handle == usr->handle)
>>>+					release_dbc(qdev, i);
>>>+
>>>+			/* Remove child devices */
>>>+			if (qddev->partition_id == QAIC_NO_PARTITION)
>>>+				qaic_destroy_drm_device(qdev, QAIC_NO_PARTITION, usr);
>>>+		}
>>>+		srcu_read_unlock(&qdev->dev_lock, qdev_rcu_id);
>>>+
>>>+		mutex_lock(&qddev->users_mutex);
>>>+		if (!list_empty(&usr->node))
>>>+			list_del_init(&usr->node);
>>>+		mutex_unlock(&qddev->users_mutex);
>>>+	}
>>>+
>>>+	srcu_read_unlock(&usr->qddev_lock, usr_rcu_id);
>>>+	kref_put(&usr->ref_count, free_usr);
>>>+
>>>+	file->driver_priv = NULL;
>>>+}
>>>+
>>>+static int qaic_part_dev_ioctl(struct drm_device *dev, void *data,
>>>+			       struct drm_file *file_priv)
>>>+{
>>>+	struct qaic_device *qdev;
>>>+	struct qaic_user *usr;
>>>+	u32 partition_id;
>>>+	int qdev_rcu_id;
>>>+	int usr_rcu_id;
>>>+	int ret = 0;
>>>+	u16 remove;
>>>+
>>>+	usr = file_priv->driver_priv;
>>>+	if (!usr)
>>>+		return -EINVAL;
>>>+
>>>+	usr_rcu_id = srcu_read_lock(&usr->qddev_lock);
>>>+	if (!usr->qddev) {
>>>+		srcu_read_unlock(&usr->qddev_lock, usr_rcu_id);
>>>+		return -ENODEV;
>>>+	}
>>>+
>>>+	qdev = usr->qddev->qdev;
>>>+	if (!qdev) {
>>>+		srcu_read_unlock(&usr->qddev_lock, usr_rcu_id);
>>>+		return -ENODEV;
>>>+	}
>>>+
>>>+	qdev_rcu_id = srcu_read_lock(&qdev->dev_lock);
>>>+	if (qdev->in_reset) {
>>>+		ret = -ENODEV;
>>>+		goto out;
>>>+	}
>>>+
>>>+	/* This IOCTL is only supported for base devices. */
>>>+	if (usr->qddev->partition_id != QAIC_NO_PARTITION) {
>>>+		ret = -ENOTTY;
>>>+		goto out;
>>>+	}
>>>+
>>>+	ret = qaic_data_get_reservation(qdev, usr, data, &partition_id,
>>>+					&remove);
>>>+	if (ret)
>>>+		goto out;
>>>+
>>>+	if (remove == 1)
>>>+		qaic_destroy_drm_device(qdev, partition_id, usr);
>>>+	else
>>>+		ret = qaic_create_drm_device(qdev, partition_id, usr);
>>>+
>>>+out:
>>>+	srcu_read_unlock(&qdev->dev_lock, qdev_rcu_id);
>>>+	srcu_read_unlock(&usr->qddev_lock, usr_rcu_id);
>>>+
>>>+	return ret;
>>>+}
>>>+
>>>+DEFINE_DRM_ACCEL_FOPS(qaic_accel_fops);
>>>+
>>>+static const struct drm_ioctl_desc qaic_drm_ioctls[] = {
>>>+	DRM_IOCTL_DEF_DRV(QAIC_MANAGE, qaic_manage_ioctl, DRM_RENDER_ALLOW),
>>>+	DRM_IOCTL_DEF_DRV(QAIC_CREATE_BO, qaic_create_bo_ioctl, DRM_RENDER_ALLOW),
>>>+	DRM_IOCTL_DEF_DRV(QAIC_MMAP_BO, qaic_mmap_bo_ioctl, DRM_RENDER_ALLOW),
>>>+	DRM_IOCTL_DEF_DRV(QAIC_ATTACH_SLICE_BO, qaic_attach_slice_bo_ioctl, DRM_RENDER_ALLOW),
>>>+	DRM_IOCTL_DEF_DRV(QAIC_EXECUTE_BO, qaic_execute_bo_ioctl, DRM_RENDER_ALLOW),
>>>+	DRM_IOCTL_DEF_DRV(QAIC_PARTIAL_EXECUTE_BO, qaic_partial_execute_bo_ioctl, DRM_RENDER_ALLOW),
>>>+	DRM_IOCTL_DEF_DRV(QAIC_WAIT_BO, qaic_wait_bo_ioctl, DRM_RENDER_ALLOW),
>>>+	DRM_IOCTL_DEF_DRV(QAIC_PERF_STATS_BO, qaic_perf_stats_bo_ioctl, DRM_RENDER_ALLOW),
>>>+	DRM_IOCTL_DEF_DRV(QAIC_PART_DEV, qaic_part_dev_ioctl, DRM_RENDER_ALLOW),
>>>+};
>>>+
>>>+static const struct drm_driver qaic_accel_driver = {
>>>+	.driver_features	= DRIVER_GEM | DRIVER_COMPUTE_ACCEL,
>>>+
>>>+	.name			= QAIC_NAME,
>>>+	.desc			= QAIC_DESC,
>>>+	.date			= "20190618",
>>>+
>>>+	.fops			= &qaic_accel_fops,
>>>+	.open			= qaic_open,
>>>+	.postclose		= qaic_postclose,
>>>+
>>>+	.ioctls			= qaic_drm_ioctls,
>>>+	.num_ioctls		= ARRAY_SIZE(qaic_drm_ioctls),
>>>+	.prime_fd_to_handle	= drm_gem_prime_fd_to_handle,
>>>+	.gem_prime_import	= qaic_gem_prime_import,
>>>+};
>>>+
>>>+static int qaic_create_drm_device(struct qaic_device *qdev, s32 partition_id,
>>>+				  struct qaic_user *owner)
>>>+{
>>>+	struct qaic_drm_device *qddev;
>>>+	struct drm_device *ddev;
>>>+	struct device *pdev;
>>>+	int ret;
>>>+
>>>+	/*
>>>+	 * Partition id QAIC_NO_PARTITION indicates that the device was created
>>>+	 * on mhi_probe and id > QAIC_NO_PARTITION indicates a partition
>>>+	 * created using IOCTL. So, pdev for primary device is the pci dev and
>>>+	 * the parent for partition dev is the primary device.
>>>+	 */
>>>+	if (partition_id == QAIC_NO_PARTITION)
>>>+		pdev = &qdev->pdev->dev;
>>>+	else
>>>+		pdev = qdev->base_dev->ddev->dev;
>>>+
>>>+	qddev = kzalloc(sizeof(*qddev), GFP_KERNEL);
>>>+	if (!qddev) {
>>>+		ret = -ENOMEM;
>>>+		goto qddev_fail;

the qddev_fail label is just before the return, so better to return directly here

>>>+	}
>>>+
>>>+	ddev = drm_dev_alloc(&qaic_accel_driver, pdev);
>>>+	if (IS_ERR(ddev)) {
>>>+		ret = PTR_ERR(ddev);
>>>+		goto ddev_fail;
>>>+	}
>>>+
>>>+	ddev->dev_private = qddev;
>>>+	qddev->ddev = ddev;
>>>+
>>>+	if (partition_id == QAIC_NO_PARTITION)
>>>+		qdev->base_dev = qddev;
>>>+	qddev->qdev = qdev;
>>>+	qddev->partition_id = partition_id;
>>>+	qddev->owner = owner;
>>>+	INIT_LIST_HEAD(&qddev->users);
>>>+	mutex_init(&qddev->users_mutex);
>>>+
>>>+	mutex_lock(&qdev->qaic_drm_devices_mutex);
>>>+	list_add(&qddev->node, &qdev->qaic_drm_devices);
>>>+	mutex_unlock(&qdev->qaic_drm_devices_mutex);
>>>+
>>>+	ret = drm_dev_register(ddev, 0);
>>>+	if (ret) {
>>>+		pci_dbg(qdev->pdev, "%s: drm_dev_register failed %d\n", __func__, ret);
>>>+		goto drm_reg_fail;
>>>+	}
>>>+
>>>+	return 0;
>>>+
>>>+drm_reg_fail:
>>>+	mutex_destroy(&qddev->users_mutex);
>>>+	mutex_lock(&qdev->qaic_drm_devices_mutex);
>>>+	list_del(&qddev->node);
>>>+	mutex_unlock(&qdev->qaic_drm_devices_mutex);
>>>+	if (partition_id == QAIC_NO_PARTITION)
>>>+		qdev->base_dev = NULL;
>>>+	drm_dev_put(ddev);
>>>+ddev_fail:
>>>+	kfree(qddev);
>>>+qddev_fail:

no point for this label

>>>+	return ret;
>>>+}
>>>+
>>>+static void qaic_destroy_drm_device(struct qaic_device *qdev, s32 partition_id,
>>>+				    struct qaic_user *owner)
>>>+{
>>>+	struct qaic_drm_device *qddev;
>>>+	struct qaic_drm_device *q;
>>>+	struct qaic_user *usr;
>>>+
>>>+	list_for_each_entry_safe(qddev, q, &qdev->qaic_drm_devices, node) {
>>>+		/*
>>>+		 * Skip devices in case we just want to remove devices
>>>+		 * specific to a owner or partition id.
>>>+		 *
>>>+		 * owner	partition_id	notes
>>>+		 * ----------------------------------
>>>+		 * NULL		NO_PARTITION	delete base + all derived (qdev
>>>+		 *				reset)
>>>+		 * !NULL	NO_PARTITION	delete derived devs created by
>>>+		 *				owner.
>>>+		 * !NULL	>NO_PARTITION	delete derived dev identified by
>>>+		 *				the partition id and created by
>>>+		 *				owner
>>>+		 * NULL		>NO_PARTITION	invalid (no-op)
>>>+		 *
>>>+		 * if partition_id is any value < QAIC_NO_PARTITION this will be
>>>+		 * a no-op.
>>>+		 */
>>>+		if (owner && owner != qddev->owner)
>>>+			continue;
>>>+
>>>+		if (partition_id != QAIC_NO_PARTITION &&
>>>+		    partition_id != qddev->partition_id && !owner)
>>>+			continue;
>>>+
>>>+		/*
>>>+		 * Existing users get unresolvable errors till they close FDs.
>>>+		 * Need to sync carefully with users calling close().  The
>>>+		 * list of users can be modified elsewhere when the lock isn't
>>>+		 * held here, but the sync'ing the srcu with the mutex held
>>>+		 * could deadlock.  Grab the mutex so that the list will be
>>>+		 * unmodified.  The user we get will exist as long as the
>>>+		 * lock is held.  Signal that the qcdev is going away, and
>>>+		 * grab a reference to the user so they don't go away for
>>>+		 * synchronize_srcu().  Then release the mutex to avoid
>>>+		 * deadlock and make sure the user has observed the signal.
>>>+		 * With the lock released, we cannot maintain any state of the
>>>+		 * user list.
>>>+		 */
>>>+		mutex_lock(&qddev->users_mutex);
>>>+		while (!list_empty(&qddev->users)) {
>>>+			usr = list_first_entry(&qddev->users, struct qaic_user,
>>>+					       node);
>>>+			list_del_init(&usr->node);
>>>+			kref_get(&usr->ref_count);
>>>+			usr->qddev = NULL;
>>>+			mutex_unlock(&qddev->users_mutex);
>>>+			synchronize_srcu(&usr->qddev_lock);
>>>+			kref_put(&usr->ref_count, free_usr);
>>>+			mutex_lock(&qddev->users_mutex);
>>>+		}
>>>+		mutex_unlock(&qddev->users_mutex);
>>>+
>>>+		if (qddev->ddev) {
>>>+			drm_dev_unregister(qddev->ddev);
>>>+			drm_dev_put(qddev->ddev);
>>>+		}
>>>+
>>>+		list_del(&qddev->node);
>>>+		kfree(qddev);
>>>+	}
>>>+}
>>>+
>>>+static int qaic_mhi_probe(struct mhi_device *mhi_dev,
>>>+			  const struct mhi_device_id *id)
>>>+{
>>>+	struct qaic_device *qdev;
>>>+	u16 major, minor;
>>>+	int ret;
>>>+
>>>+	/*
>>>+	 * Invoking this function indicates that the control channel to the
>>>+	 * device is available.  We use that as a signal to indicate that
>>>+	 * the device side firmware has booted.  The device side firmware
>>>+	 * manages the device resources, so we need to communicate with it
>>>+	 * via the control channel in order to utilize the device.  Therefore
>>>+	 * we wait until this signal to create the drm dev that userspace will
>>>+	 * use to control the device, because without the device side firmware,
>>>+	 * userspace can't do anything useful.
>>>+	 */
>>>+
>>>+	qdev = pci_get_drvdata(to_pci_dev(mhi_dev->mhi_cntrl->cntrl_dev));
>>>+
>>>+	qdev->in_reset = false;
>>>+
>>>+	dev_set_drvdata(&mhi_dev->dev, qdev);
>>>+	qdev->cntl_ch = mhi_dev;
>>>+
>>>+	ret = qaic_control_open(qdev);
>>>+	if (ret) {
>>>+		pci_dbg(qdev->pdev, "%s: control_open failed %d\n", __func__, ret);
>>>+		goto err;
>>>+	}
>>>+
>>>+	ret = get_cntl_version(qdev, NULL, &major, &minor);
>>>+	if (ret || major != cntl_major || minor > cntl_minor) {
>>>+		pci_err(qdev->pdev, "%s: Control protocol version (%d.%d) not supported.  Supported version is (%d.%d). Ret: %d\n",
>>>+			__func__, major, minor, cntl_major, cntl_minor, ret);
>>>+		ret = -EINVAL;
>>>+		goto close_control;
>>>+	}
>>>+
>>>+	ret = qaic_create_drm_device(qdev, QAIC_NO_PARTITION, NULL);
>>>+
>>>+	return ret;
>>>+
>>>+close_control:
>>>+	qaic_control_close(qdev);
>>>+err:

same here, not sure there is a point to this label

Thanks,
Dafna

>>>+	return ret;
>>>+}
>>>+
>>>+static void qaic_mhi_remove(struct mhi_device *mhi_dev)
>>>+{
>>
>>Add a comment here
>
>Presumably why this is an empty function?
>Will do.
>
>>>+}
>>>+
>>>+static void qaic_notify_reset(struct qaic_device *qdev)
>>>+{
>>>+	int i;
>>>+
>>>+	qdev->in_reset = true;
>>>+	/* wake up any waiters to avoid waiting for timeouts at sync */
>>>+	wake_all_cntl(qdev);
>>>+	for (i = 0; i < qdev->num_dbc; ++i)
>>>+		wakeup_dbc(qdev, i);
>>>+	synchronize_srcu(&qdev->dev_lock);
>>>+}
>>>+
>>>+void qaic_dev_reset_clean_local_state(struct qaic_device *qdev, bool exit_reset)
>>>+{
>>>+	int i;
>>>+
>>>+	qaic_notify_reset(qdev);
>>>+
>>>+	/* remove drmdevs to prevent new users from coming in */
>>>+	if (qdev->base_dev)
>>>+		qaic_destroy_drm_device(qdev, QAIC_NO_PARTITION, NULL);
>>>+
>>>+	/* start tearing things down */
>>>+	for (i = 0; i < qdev->num_dbc; ++i)
>>>+		release_dbc(qdev, i);
>>>+
>>>+	if (exit_reset)
>>>+		qdev->in_reset = false;
>>>+}
>>>+
>>>+static int qaic_pci_probe(struct pci_dev *pdev,
>>>+			  const struct pci_device_id *id)
>>
>>Please try to simplify this function. Maybe move irq init to separate function.
>>It will be more readable and there will less of a error handling spaghetti at the bottom.
>
>I guess I tend towards not breaking something up if there is not reuse 
>elsewhere, but I think I see your point.  Will have a look.
>
>>
>>>+{
>>>+	int ret;
>>>+	int i;
>>>+	int mhi_irq;
>>>+	struct qaic_device *qdev;
>>>+
>>>+	qdev = devm_kzalloc(&pdev->dev, sizeof(*qdev), GFP_KERNEL);
>>>+	if (!qdev)
>>>+		return -ENOMEM;
>>>+
>>>+	if (id->device == PCI_DEV_AIC100) {
>>>+		qdev->num_dbc = 16;
>>>+		qdev->dbc = devm_kcalloc(&pdev->dev, qdev->num_dbc, sizeof(*qdev->dbc),
>>>+					 GFP_KERNEL);
>>>+		if (!qdev->dbc)
>>>+			return -ENOMEM;
>>>+	}
>>>+
>>>+	qdev->cntl_wq = alloc_workqueue("qaic_cntl", WQ_UNBOUND, 0);
>>>+	if (!qdev->cntl_wq) {
>>>+		ret = -ENOMEM;
>>>+		goto wq_fail;
>>>+	}
>>>+	pci_set_drvdata(pdev, qdev);
>>>+	qdev->pdev = pdev;
>>>+	mutex_init(&qdev->cntl_mutex);
>>>+	INIT_LIST_HEAD(&qdev->cntl_xfer_list);
>>>+	init_srcu_struct(&qdev->dev_lock);
>>>+	INIT_LIST_HEAD(&qdev->qaic_drm_devices);
>>>+	mutex_init(&qdev->qaic_drm_devices_mutex);
>>>+	for (i = 0; i < qdev->num_dbc; ++i) {
>>>+		spin_lock_init(&qdev->dbc[i].xfer_lock);
>>>+		qdev->dbc[i].qdev = qdev;
>>>+		qdev->dbc[i].id = i;
>>>+		INIT_LIST_HEAD(&qdev->dbc[i].xfer_list);
>>>+		init_srcu_struct(&qdev->dbc[i].ch_lock);
>>>+		init_waitqueue_head(&qdev->dbc[i].dbc_release);
>>>+		INIT_LIST_HEAD(&qdev->dbc[i].bo_lists);
>>>+	}
>>>+
>>>+	qdev->bars = pci_select_bars(pdev, IORESOURCE_MEM);
>>>+
>>>+	/* make sure the device has the expected BARs */
>>>+	if (qdev->bars != (BIT(0) | BIT(2) | BIT(4))) {
>>>+		pci_dbg(pdev, "%s: expected BARs 0, 2, and 4 not found in device.  Found 0x%x\n",
>>>+			__func__, qdev->bars);
>>>+		ret = -EINVAL;
>>>+		goto bar_fail;
>>>+	}
>>>+
>>>+	ret = pci_enable_device(pdev);
>>>+	if (ret)
>>>+		goto enable_fail;
>>>+
>>>+	ret = pci_request_selected_regions(pdev, qdev->bars, "aic100");
>>>+	if (ret)
>>>+		goto request_regions_fail;
>>>+
>>>+	pci_set_master(pdev);
>>>+
>>>+	ret = dma_set_mask(&pdev->dev, DMA_BIT_MASK(64));
>>>+	if (ret)
>>>+		goto dma_mask_fail;
>>>+	ret = dma_set_coherent_mask(&pdev->dev, DMA_BIT_MASK(64));
>>>+	if (ret)
>>>+		goto dma_mask_fail;
>>>+	ret = dma_set_max_seg_size(&pdev->dev, UINT_MAX);
>>>+	if (ret)
>>>+		goto dma_mask_fail;
>>>+
>>>+	qdev->bar_0 = pci_ioremap_bar(pdev, 0);
>>>+	if (!qdev->bar_0) {
>>>+		ret = -ENOMEM;
>>>+		goto ioremap_0_fail;
>>>+	}
>>>+
>>>+	qdev->bar_2 = pci_ioremap_bar(pdev, 2);
>>>+	if (!qdev->bar_2) {
>>>+		ret = -ENOMEM;
>>>+		goto ioremap_2_fail;
>>>+	}
>>>+
>>>+	for (i = 0; i < qdev->num_dbc; ++i)
>>>+		qdev->dbc[i].dbc_base = qdev->bar_2 + QAIC_DBC_OFF(i);
>>>+
>>>+	ret = pci_alloc_irq_vectors(pdev, 1, 32, PCI_IRQ_MSI);
>>>+	if (ret < 0)
>>>+		goto alloc_irq_fail;
>>>+
>>>+	if (ret < 32) {
>>>+		pci_err(pdev, "%s: Requested 32 MSIs.  Obtained %d MSIs which is less than the 32 required.\n",
>>>+			__func__, ret);
>>>+		ret = -ENODEV;
>>>+		goto invalid_msi_config;
>>>+	}
>>>+
>>>+	mhi_irq = pci_irq_vector(pdev, 0);
>>>+	if (mhi_irq < 0) {
>>>+		ret = mhi_irq;
>>>+		goto get_mhi_irq_fail;
>>>+	}
>>>+
>>>+	for (i = 0; i < qdev->num_dbc; ++i) {
>>>+		ret = devm_request_threaded_irq(&pdev->dev,
>>>+						pci_irq_vector(pdev, i + 1),
>>>+						dbc_irq_handler,
>>>+						dbc_irq_threaded_fn,
>>>+						IRQF_SHARED,
>>>+						"qaic_dbc",
>>>+						&qdev->dbc[i]);
>>>+		if (ret)
>>>+			goto get_dbc_irq_failed;
>>>+
>>>+		if (poll_datapath) {
>>>+			qdev->dbc[i].irq = pci_irq_vector(pdev, i + 1);
>>>+			disable_irq_nosync(qdev->dbc[i].irq);
>>>+			INIT_WORK(&qdev->dbc[i].poll_work, irq_polling_work);
>>>+		}
>>>+	}
>>>+
>>>+	qdev->mhi_cntl = qaic_mhi_register_controller(pdev, qdev->bar_0, mhi_irq);
>>>+	if (IS_ERR(qdev->mhi_cntl)) {
>>>+		ret = PTR_ERR(qdev->mhi_cntl);
>>>+		goto mhi_register_fail;
>>>+	}
>>>+
>>>+	return 0;
>>>+
>>>+mhi_register_fail:
>>>+get_dbc_irq_failed:
>>
>>I don't think that duplicated goto statements are allowed by the coding style.
>>These should be rather named something like err_free_irq.
>>See https://www.kernel.org/doc/html/v4.10/process/coding-style.html#centralized-exiting-of-functions
>
>I took another look at that link, and I do not believe it prohibits
>duplicated goto labels, but they probably could be renamed and 
>simplified as you indicate.  Will have a look at that.
>
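
Also, as a purely illustrative sketch (the label names here are invented,
not a concrete proposal), the duplicated labels could collapse into
action-named ones, e.g.:

err_free_dbc_irqs:
	for (i = 0; i < qdev->num_dbc; ++i)
		devm_free_irq(&pdev->dev, pci_irq_vector(pdev, i + 1),
			      &qdev->dbc[i]);
err_free_irq_vectors:
	pci_free_irq_vectors(pdev);

with mhi_register_fail/get_dbc_irq_failed both becoming jumps to
err_free_dbc_irqs, and get_mhi_irq_fail/invalid_msi_config becoming jumps
to err_free_irq_vectors.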
>>
>>>+	for (i = 0; i < qdev->num_dbc; ++i)
>>>+		devm_free_irq(&pdev->dev, pci_irq_vector(pdev, i + 1),
>>>+			      &qdev->dbc[i]);
>>>+get_mhi_irq_fail:
>>>+invalid_msi_config:
>>>+	pci_free_irq_vectors(pdev);
>>>+alloc_irq_fail:
>>>+	iounmap(qdev->bar_2);
>>>+ioremap_2_fail:
>>>+	iounmap(qdev->bar_0);
>>>+ioremap_0_fail:
>>>+dma_mask_fail:
>>>+	pci_clear_master(pdev);
>>>+	pci_release_selected_regions(pdev, qdev->bars);
>>>+request_regions_fail:
>>>+	pci_disable_device(pdev);
>>>+enable_fail:
>>>+	pci_set_drvdata(pdev, NULL);
>>>+bar_fail:
>>>+	for (i = 0; i < qdev->num_dbc; ++i)
>>>+		cleanup_srcu_struct(&qdev->dbc[i].ch_lock);
>>>+	cleanup_srcu_struct(&qdev->dev_lock);
>>>+	destroy_workqueue(qdev->cntl_wq);
>>>+wq_fail:
>>>+	return ret;
>>>+}
>>>+
>>>+static void qaic_pci_remove(struct pci_dev *pdev)
>>>+{
>>>+	struct qaic_device *qdev = pci_get_drvdata(pdev);
>>>+	int i;
>>>+
>>>+	if (!qdev)
>>>+		return;
>>>+
>>>+	qaic_dev_reset_clean_local_state(qdev, false);
>>>+	qaic_mhi_free_controller(qdev->mhi_cntl, link_up);
>>>+	for (i = 0; i < qdev->num_dbc; ++i) {
>>>+		devm_free_irq(&pdev->dev, pci_irq_vector(pdev, i + 1),
>>>+			      &qdev->dbc[i]);
>>>+		cleanup_srcu_struct(&qdev->dbc[i].ch_lock);
>>>+	}
>>>+	destroy_workqueue(qdev->cntl_wq);
>>>+	pci_free_irq_vectors(pdev);
>>>+	iounmap(qdev->bar_0);
>>>+	pci_clear_master(pdev);
>>>+	pci_release_selected_regions(pdev, qdev->bars);
>>>+	pci_disable_device(pdev);
>>>+	pci_set_drvdata(pdev, NULL);
>>>+}
>>>+
>>>+static void qaic_pci_shutdown(struct pci_dev *pdev)
>>>+{
>>>+	/* see qaic_exit for what link_up is doing */
>>>+	link_up = true;
>>>+	qaic_pci_remove(pdev);
>>>+}
>>>+
>>>+static pci_ers_result_t qaic_pci_error_detected(struct pci_dev *pdev,
>>>+						pci_channel_state_t error)
>>>+{
>>>+	return PCI_ERS_RESULT_NEED_RESET;
>>>+}
>>>+
>>>+static void qaic_pci_reset_prepare(struct pci_dev *pdev)
>>>+{
>>>+	struct qaic_device *qdev = pci_get_drvdata(pdev);
>>>+
>>>+	qaic_notify_reset(qdev);
>>>+	qaic_mhi_start_reset(qdev->mhi_cntl);
>>>+	qaic_dev_reset_clean_local_state(qdev, false);
>>>+}
>>>+
>>>+static void qaic_pci_reset_done(struct pci_dev *pdev)
>>>+{
>>>+	struct qaic_device *qdev = pci_get_drvdata(pdev);
>>>+
>>>+	qdev->in_reset = false;
>>>+	qaic_mhi_reset_done(qdev->mhi_cntl);
>>>+}
>>>+
>>>+static const struct mhi_device_id qaic_mhi_match_table[] = {
>>>+	{ .chan = "QAIC_CONTROL", },
>>>+	{},
>>>+};
>>>+
>>>+static struct mhi_driver qaic_mhi_driver = {
>>>+	.id_table = qaic_mhi_match_table,
>>>+	.remove = qaic_mhi_remove,
>>>+	.probe = qaic_mhi_probe,
>>>+	.ul_xfer_cb = qaic_mhi_ul_xfer_cb,
>>>+	.dl_xfer_cb = qaic_mhi_dl_xfer_cb,
>>>+	.driver = {
>>>+		.name = "qaic_mhi",
>>>+	},
>>>+};
>>>+
>>>+static const struct pci_device_id qaic_ids[] = {
>>>+	{ PCI_DEVICE(PCI_VENDOR_ID_QCOM, PCI_DEV_AIC100), },
>>>+	{ }
>>>+};
>>>+MODULE_DEVICE_TABLE(pci, qaic_ids);
>>>+
>>>+static const struct pci_error_handlers qaic_pci_err_handler = {
>>>+	.error_detected = qaic_pci_error_detected,
>>>+	.reset_prepare = qaic_pci_reset_prepare,
>>>+	.reset_done = qaic_pci_reset_done,
>>>+};
>>>+
>>>+static struct pci_driver qaic_pci_driver = {
>>>+	.name = QAIC_NAME,
>>>+	.id_table = qaic_ids,
>>>+	.probe = qaic_pci_probe,
>>>+	.remove = qaic_pci_remove,
>>>+	.shutdown = qaic_pci_shutdown,
>>>+	.err_handler = &qaic_pci_err_handler,
>>>+};
>>>+
>>>+static int __init qaic_init(void)
>>>+{
>>>+	int ret;
>>>+
>>>+	if (datapath_polling)
>>>+		poll_datapath = true;
>>>+
>>>+	ret = mhi_driver_register(&qaic_mhi_driver);
>>>+	if (ret) {
>>>+		pr_debug("qaic: mhi_driver_register failed %d\n", ret);
>>>+		goto free_class;
>>>+	}
>>>+
>>>+	ret = pci_register_driver(&qaic_pci_driver);
>>>+	if (ret) {
>>>+		pr_debug("qaic: pci_register_driver failed %d\n", ret);
>>>+		goto free_mhi;
>>>+	}
>>>+
>>>+	ret = mhi_qaic_ctrl_init();
>>>+	if (ret) {
>>>+		pr_debug("qaic: mhi_qaic_ctrl_init failed %d\n", ret);
>>>+		goto free_pci;
>>>+	}
>>>+
>>>+	return 0;
>>>+
>>>+free_pci:
>>>+	pci_unregister_driver(&qaic_pci_driver);
>>>+free_mhi:
>>>+	mhi_driver_unregister(&qaic_mhi_driver);
>>>+free_class:
>>
>>This label doesn't free anything. It should be renamed.
>
>Doh.  Holdover from a previous version.  It does need to be renamed.
>
>>
>>>+	return ret;
>>>+}
>>>+
>>>+static void __exit qaic_exit(void)
>>>+{
>>>+	/*
>>>+	 * We assume that qaic_pci_remove() is called due to a hotplug event
>>>+	 * which would mean that the link is down, and thus
>>>+	 * qaic_mhi_free_controller() should not try to access the device during
>>>+	 * cleanup.
>>>+	 * We call pci_unregister_driver() below, which also triggers
>>>+	 * qaic_pci_remove(), but since this is module exit, we expect the link
>>>+	 * to the device to be up, in which case qaic_mhi_free_controller()
>>>+	 * should try to access the device during cleanup to put the device in
>>>+	 * a sane state.
>>>+	 * For that reason, we set link_up here to let qaic_mhi_free_controller
>>>+	 * know the expected link state.  Since the module is going to be
>>>+	 * removed at the end of this, we don't need to worry about
>>>+	 * reinitializing the link_up state after the cleanup is done.
>>>+	 */
>>>+	link_up = true;
>>
>>Maybe you could just use pdev->current_state instead of link_up?
>
>Maybe.  Will have a look at that.
>
>>
>>>+	mhi_qaic_ctrl_deinit();
>>>+	pci_unregister_driver(&qaic_pci_driver);
>>>+	mhi_driver_unregister(&qaic_mhi_driver);
>>>+}
>>>+
>>>+module_init(qaic_init);
>>>+module_exit(qaic_exit);
>>>+
>>>+MODULE_AUTHOR(QAIC_DESC " Kernel Driver Team");
>>>+MODULE_DESCRIPTION(QAIC_DESC " Accel Driver");
>>>+MODULE_LICENSE("GPL");
>>>diff --git a/include/uapi/drm/qaic_accel.h b/include/uapi/drm/qaic_accel.h
>>>new file mode 100644
>>>index 0000000..d5fa6f5
>>>--- /dev/null
>>>+++ b/include/uapi/drm/qaic_accel.h
>>>@@ -0,0 +1,283 @@
>>>+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note
>>>+ *
>>>+ * Copyright (c) 2019-2020, The Linux Foundation. All rights reserved.
>>>+ * Copyright (c) 2021-2023 Qualcomm Innovation Center, Inc. All rights reserved.
>>>+ */
>>>+
>>>+#ifndef QAIC_ACCEL_H_
>>>+#define QAIC_ACCEL_H_
>>>+
>>>+#include <linux/ioctl.h>
>>>+#include <linux/types.h>
>>
>>These two headers should not be needed here.
>
>Fair enough.  Listed out of paranoia since they define things used in 
>this header.
>
>>
>>>+#include "drm.h"
>>>+
>>>+#if defined(__CPLUSPLUS)
>>
>>Use lowercase here: __cplusplus
>
>Doh.  Not sure why uppercase was used.  The referenced examples sure 
>didn't have that.
>
>>
>>>+extern "C" {
>>>+#endif
>>>+
>>>+#define QAIC_MANAGE_MAX_MSG_LENGTH SZ_4K	/**<
>>>+						  * The length(4K) includes len and
>>>+						  * count fields of qaic_manage_msg
>>>+						  */
>>
>>I guess these are doxygen style commands but you should be using kernel-doc here.
>>See https://docs.kernel.org/doc-guide/kernel-doc.html.
>>This can be used to verify the header:
>>$ scripts/kernel-doc -v -none include/uapi/drm/qaic_accel.h
>
>include/uapi/drm/drm.h was used as the example for these, and thus it 
>was assumed that was the DRM style.
>
>I'm not convinced kernel-doc is applicable to all of these.  Will review.
>
>>
>>>+
>>>+enum qaic_sem_flags {
>>
>>Is there any specific reason for enums if all values are explicitly given?
>
>It is a style pulled from elsewhere.  I can remove the enums.
>
>>
>>>+	SEM_INSYNCFENCE =	0x1,
>>
>>All these enums/defines end up in a global user space namespace.
>>I would advise to prefix everything with QAIC_ or DRM_QAIC_ (e.g. QAIC_SEM_INSYNCFENCE)
>>to avoid conflicts with other drivers or user space libs.
>
>Good point.  Will do.
>
>>
>>>+	SEM_OUTSYNCFENCE =	0x2,
>>>+};
>>>+
>>>+enum qaic_sem_cmd {
>>>+	SEM_NOP =		0,
>>>+	SEM_INIT =		1,
>>>+	SEM_INC =		2,
>>>+	SEM_DEC =		3,
>>>+	SEM_WAIT_EQUAL =	4,
>>>+	SEM_WAIT_GT_EQ =	5, /**< Greater than or equal */
>>>+	SEM_WAIT_GT_0 =		6, /**< Greater than 0 */
>>>+};
>>>+
>>>+enum qaic_manage_transaction_type {
>>>+	TRANS_UNDEFINED =			0,
>>>+	TRANS_PASSTHROUGH_FROM_USR =		1,
>>>+	TRANS_PASSTHROUGH_TO_USR =		2,
>>>+	TRANS_PASSTHROUGH_FROM_DEV =		3,
>>>+	TRANS_PASSTHROUGH_TO_DEV =		4,
>>>+	TRANS_DMA_XFER_FROM_USR =		5,
>>>+	TRANS_DMA_XFER_TO_DEV =			6,
>>>+	TRANS_ACTIVATE_FROM_USR =		7,
>>>+	TRANS_ACTIVATE_FROM_DEV =		8,
>>>+	TRANS_ACTIVATE_TO_DEV =			9,
>>>+	TRANS_DEACTIVATE_FROM_USR =		10,
>>>+	TRANS_DEACTIVATE_FROM_DEV =		11,
>>>+	TRANS_STATUS_FROM_USR =			12,
>>>+	TRANS_STATUS_TO_USR =			13,
>>>+	TRANS_STATUS_FROM_DEV =			14,
>>>+	TRANS_STATUS_TO_DEV =			15,
>>>+	TRANS_TERMINATE_FROM_DEV =		16,
>>>+	TRANS_TERMINATE_TO_DEV =		17,
>>>+	TRANS_DMA_XFER_CONT =			18,
>>>+	TRANS_VALIDATE_PARTITION_FROM_DEV =	19,
>>>+	TRANS_VALIDATE_PARTITION_TO_DEV =	20,
>>>+	TRANS_MAX =				21
>>>+};
>>>+
>>>+struct qaic_manage_trans_hdr {
>>>+	__u32 type;	/**< in, value from enum manage_transaction_type */
>>>+	__u32 len;	/**< in, length of this transaction, including the header */
>>>+};
>>>+
>>>+struct qaic_manage_trans_passthrough {
>>>+	struct qaic_manage_trans_hdr hdr;
>>>+	__u8 data[];	/**< in, userspace must encode in little endian */
>>>+};
>>>+
>>>+struct qaic_manage_trans_dma_xfer {
>>>+	struct qaic_manage_trans_hdr hdr;
>>>+	__u32 tag;	/**< in, device specific */
>>>+	__u32 count;	/**< in */
>>>+	__u64 addr;	/**< in, address of the data to transferred via DMA */
>>>+	__u64 size;	/**< in, length of the data to transferred via DMA */
>>>+};
>>>+
>>>+struct qaic_manage_trans_activate_to_dev {
>>>+	struct qaic_manage_trans_hdr hdr;
>>>+	__u32 queue_size;	/**<
>>>+				  * in, number of elements in DBC request
>>>+				  * and respose queue
>>>+				  */
>>>+	__u32 eventfd;		/**< in */
>>>+	__u32 options;		/**< in, device specific */
>>>+	__u32 pad;		/**< pad must be 0 */
>>>+};
>>>+
>>>+struct qaic_manage_trans_activate_from_dev {
>>>+	struct qaic_manage_trans_hdr hdr;
>>>+	__u32 status;	/**< out, status of activate transaction */
>>>+	__u32 dbc_id;	/**< out, Identifier of assigned DMA Bridge channel */
>>>+	__u64 options;	/**< out */
>>>+};
>>>+
>>>+struct qaic_manage_trans_deactivate {
>>>+	struct qaic_manage_trans_hdr hdr;
>>>+	__u32 dbc_id;	/**< in, Identifier of assigned DMA Bridge channel */
>>>+	__u32 pad;	/**< pad must be 0 */
>>>+};
>>>+
>>>+struct qaic_manage_trans_status_to_dev {
>>>+	struct qaic_manage_trans_hdr hdr;
>>>+};
>>>+
>>>+struct qaic_manage_trans_status_from_dev {
>>>+	struct qaic_manage_trans_hdr hdr;
>>>+	__u16 major;	/**< out, major vesrion of NNC protocol used by device */
>>>+	__u16 minor;	/**< out, minor vesrion of NNC protocol used by device */
>>
>>vesrion -> version
>
>Will fix.
>
>>
>>>+	__u32 status;	/**< out, status of query transaction  */
>>>+	__u64 status_flags;	/**<
>>>+				  * out
>>>+				  * 0    : If set then device has CRC check enabled
>>>+				  * 1:63 : Unused
>>>+				  */
>>>+};
>>>+
>>>+struct qaic_manage_msg {
>>>+	__u32 len;	/**< in, Length of valid data - ie sum of all transactions */
>>>+	__u32 count;	/**< in, Number of transactions in message */
>>>+	__u64 data;	/**< in, Pointer to array of transactions */
>>>+};
>>>+
>>>+struct qaic_create_bo {
>>>+	__u64 size;	/**< in, Size of BO in byte */
>>>+	__u32 handle;	/**< out, Returned GEM handle for the BO */
>>>+	__u32 pad;	/**< pad must be 0 */
>>>+};
>>>+
>>>+struct qaic_mmap_bo {
>>>+	__u32 handle;	/**< in, Handle for the BO being mapped. */
>>
>>The comment is missleading. BO is not mapped by this ioctl().
>
>It's mapping the handle to the offset, which is a type of mapping.  I
>think I get your point though.  Will try to wordsmith this better.
>
>>
>>>+	__u32 pad;	/**< pad must be 0 */
>>>+	__u64 offset;	/**<
>>>+			  * out, offset into the drm node to use for
>>>+			  * subsequent mmap call
>>>+			  */
>>>+};
>>>+
>>>+/**
>>>+ * @brief semaphore command
>>>+ */
>>>+struct qaic_sem {
>>>+	__u16 val;	/**< in, Only lower 12 bits are valid */
>>>+	__u8  index;	/**< in, Only lower 5 bits are valid */
>>>+	__u8  presync;	/**< in, 1 if presync operation, 0 if postsync */
>>>+	__u8  cmd;	/**< in, See enum sem_cmd */
>>>+	__u8  flags;	/**< in, See sem_flags for valid bits.  All others must be 0 */
>>>+	__u16 pad;	/**< pad must be 0 */
>>>+};
>>>+
>>>+struct qaic_attach_slice_entry {
>>>+	__u64 size;		/**< in, Size memory to allocate for this BO slice */
>>>+	struct qaic_sem	sem0;	/**< in, Must be zero if not valid */
>>>+	struct qaic_sem	sem1;	/**< in, Must be zero if not valid */
>>>+	struct qaic_sem	sem2;	/**< in, Must be zero if not valid */
>>>+	struct qaic_sem	sem3;	/**< in, Must be zero if not valid */
>>>+	__u64 dev_addr;		/**< in, Address in device to/from which data is copied */
>>>+	__u64 db_addr;		/**< in, Doorbell address */
>>>+	__u32 db_data;		/**< in, Data to write to doorbell */
>>>+	__u32 db_len;		/**<
>>>+				  * in, Doorbell length - 32, 16, or 8 bits.
>>>+				  * 0 means doorbell is inactive
>>>+				  */
>>>+	__u64 offset;		/**< in, Offset from start of buffer */
>>>+};
>>>+
>>>+struct qaic_attach_slice_hdr {
>>>+	__u32 count;	/**< in, Number of slices for this BO */
>>>+	__u32 dbc_id;	/**< in, Associate this BO with this DMA Bridge channel */
>>>+	__u32 handle;	/**< in, Handle of BO to which slicing information is to be attached */
>>>+	__u32 dir;	/**< in, Direction of data: 1 = DMA_TO_DEVICE, 2 = DMA_FROM_DEVICE */
>>>+	__u64 size;	/**<
>>>+			  * in, Total length of BO
>>>+			  * If BO is imported (DMABUF/PRIME) then this size
>>>+			  * should not exceed the size of DMABUF provided.
>>>+			  * If BO is allocated using DRM_IOCTL_QAIC_CREATE_BO
>>>+			  * then this size should be exactly same as the size
>>>+			  * provided during DRM_IOCTL_QAIC_CREATE_BO.
>>>+			  */
>>>+};
>>>+
>>>+struct qaic_attach_slice {
>>>+	struct qaic_attach_slice_hdr hdr;
>>>+	__u64 data;	/**<
>>>+			  * in, Pointer to a buffer which is container of
>>>+			  * struct qaic_attach_slice_entry[]
>>>+			  */
>>>+};
>>>+
>>>+struct qaic_execute_entry {
>>>+	__u32 handle;	/**< in, buffer handle */
>>>+	__u32 dir;	/**< in, 1 = to device, 2 = from device */
>>>+};
>>>+
>>>+struct qaic_partial_execute_entry {
>>>+	__u32 handle;	/**< in, buffer handle */
>>>+	__u32 dir;	/**< in, 1 = to device, 2 = from device */
>>>+	__u64 resize;	/**< in, 0 = no resize */
>>>+};
>>>+
>>>+struct qaic_execute_hdr {
>>>+	__u32 count;	/**< in, number of executes following this header */
>>>+	__u32 dbc_id;	/**< in, Identifier of assigned DMA Bridge channel */
>>>+};
>>>+
>>>+struct qaic_execute {
>>>+	struct qaic_execute_hdr hdr;
>>>+	__u64 data;	/**< in, qaic_execute_entry or qaic_partial_execute_entry container */
>>>+};
>>>+
>>>+struct qaic_wait {
>>>+	__u32 handle;	/**< in, handle to wait on until execute is complete */
>>>+	__u32 timeout;	/**< in, timeout for wait(in ms) */
>>>+	__u32 dbc_id;	/**< in, Identifier of assigned DMA Bridge channel */
>>>+	__u32 pad;	/**< pad must be 0 */
>>>+};
>>>+
>>>+struct qaic_perf_stats_hdr {
>>>+	__u16 count;	/**< in, Total number BOs requested */
>>>+	__u16 pad;	/**< pad must be 0 */
>>>+	__u32 dbc_id;	/**< in, Identifier of assigned DMA Bridge channel */
>>>+};
>>>+
>>>+struct qaic_perf_stats {
>>>+	struct qaic_perf_stats_hdr hdr;
>>>+	__u64 data;	/**< in, qaic_perf_stats_entry container */
>>>+};
>>>+
>>>+struct qaic_perf_stats_entry {
>>>+	__u32 handle;			/**< in, Handle of the memory request */
>>>+	__u32 queue_level_before;	/**<
>>>+					  * out, Number of elements in queue
>>>+					  * before submission given memory request
>>>+					  */
>>>+	__u32 num_queue_element;	/**<
>>>+					  * out, Number of elements to add in the
>>>+					  * queue for given memory request
>>>+					  */
>>>+	__u32 submit_latency_us;	/**<
>>>+					  * out, Time taken by kernel to submit
>>>+					  * the request to device
>>>+					  */
>>>+	__u32 device_latency_us;	/**<
>>>+					  * out, Time taken by device to execute the
>>>+					  * request. 0 if request is not completed
>>>+					  */
>>>+	__u32 pad;			/**< pad must be 0 */
>>>+};
>>>+
>>>+struct qaic_part_dev {
>>>+	__u32 partition_id;	/**< in, reservation id */
>>>+	__u16 remove;		/**< in, 1 - Remove device 0 - Create device */
>>>+	__u16 pad;		/**< pad must be 0 */
>>>+};
>>>+
>>>+#define DRM_QAIC_MANAGE				0x00
>>>+#define DRM_QAIC_CREATE_BO			0x01
>>>+#define DRM_QAIC_MMAP_BO			0x02
>>
>>I know that MMAP_BO ioctl is common in drm drivers but in my opinion it is a very poor name.
>>I suggest naming it BO_INFO so in future you could extend it with other bo params besides
>>mmap offset.
>
>I see your point, but I don't see how to implement it.
>
>Let us assume this IOCTL is renamed DRM_QAIC_BO_INFO, and struct 
>qaic_mmap_bo is named qaic_bo_info.
>
>qaic_bo_info is part of the uAPI.  If "tomorrow" we need to add a
>field to it, we cannot.  That changes the structure size and breaks
>the uAPI.
>
>I can't add reserved fields in anticipation of future use since GregKH 
>had said not to do that - 
>https://lore.kernel.org/all/20200514163734.GB3154055@kroah.com/
>
>So, I'm thinking the only approach would be a linked list similar to
>EXECUTE_BO where qaic_bo_info holds a userspace pointer that is the
>head of the list, and new list nodes can be defined in the future.  That
>seems like an overengineered solution to some hypothetical future
>which may not come to pass.  If this is the solution, then I think I'd
>prefer to leave this ioctl as is, and deprecate it if the extension
>use case ever comes to pass.
>
>Thoughts?  Am I missing something?
>
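
For what it's worth, a purely illustrative sketch of that linked-list
idea (the names and layout here are invented, not a concrete uAPI
proposal):

struct qaic_bo_info_entry {
	__u32 type;	/* what this node carries, e.g. the mmap offset */
	__u32 pad;	/* must be 0 */
	__u64 data;	/* meaning depends on type */
	__u64 next;	/* user pointer to next entry, 0 terminates the list */
};

struct qaic_bo_info {
	__u32 handle;	/* in, BO to query */
	__u32 pad;	/* must be 0 */
	__u64 entries;	/* in, user pointer to the first qaic_bo_info_entry */
};

New entry types could then be added later without changing any struct
sizes, at the cost of the extra indirection mentioned above.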
>>
>>>+#define DRM_QAIC_ATTACH_SLICE_BO		0x03
>>>+#define DRM_QAIC_EXECUTE_BO			0x04
>>>+#define DRM_QAIC_PARTIAL_EXECUTE_BO		0x05
>>>+#define DRM_QAIC_WAIT_BO			0x06
>>>+#define DRM_QAIC_PERF_STATS_BO			0x07
>>>+#define DRM_QAIC_PART_DEV			0x08
>>>+
>>>+#define DRM_IOCTL_QAIC_MANAGE			DRM_IOWR(DRM_COMMAND_BASE + DRM_QAIC_MANAGE, struct qaic_manage_msg)
>>>+#define DRM_IOCTL_QAIC_CREATE_BO		DRM_IOWR(DRM_COMMAND_BASE + DRM_QAIC_CREATE_BO,	struct qaic_create_bo)
>>>+#define DRM_IOCTL_QAIC_MMAP_BO			DRM_IOWR(DRM_COMMAND_BASE + DRM_QAIC_MMAP_BO, struct qaic_mmap_bo)
>>>+#define DRM_IOCTL_QAIC_ATTACH_SLICE_BO		DRM_IOW(DRM_COMMAND_BASE + DRM_QAIC_ATTACH_SLICE_BO, struct qaic_attach_slice)
>>>+#define DRM_IOCTL_QAIC_EXECUTE_BO		DRM_IOW(DRM_COMMAND_BASE + DRM_QAIC_EXECUTE_BO,	struct qaic_execute)
>>>+#define DRM_IOCTL_QAIC_PARTIAL_EXECUTE_BO	DRM_IOW(DRM_COMMAND_BASE + DRM_QAIC_PARTIAL_EXECUTE_BO,	struct qaic_execute)
>>>+#define DRM_IOCTL_QAIC_WAIT_BO			DRM_IOW(DRM_COMMAND_BASE + DRM_QAIC_WAIT_BO, struct qaic_wait)
>>>+#define DRM_IOCTL_QAIC_PERF_STATS_BO		DRM_IOWR(DRM_COMMAND_BASE + DRM_QAIC_PERF_STATS_BO, struct qaic_perf_stats)
>>>+#define DRM_IOCTL_QAIC_PART_DEV			DRM_IOWR(DRM_COMMAND_BASE + DRM_QAIC_PART_DEV, struct qaic_part_dev)
>>>+
>>>+#if defined(__CPLUSPLUS)
>>
>>Use lowercase here: __cplusplus
>
>Will do.
>
>>
>>>+}
>>>+#endif
>>>+
>>>+#endif /* QAIC_ACCEL_H_ */
>>
>>Regards,
>>Jacek
>>
>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 5/8] accel/qaic: Add datapath
  2023-02-06 15:41   ` Jeffrey Hugo
@ 2023-02-24 15:25     ` Stanislaw Gruszka
  -1 siblings, 0 replies; 68+ messages in thread
From: Stanislaw Gruszka @ 2023-02-24 15:25 UTC (permalink / raw)
  To: Jeffrey Hugo
  Cc: ogabbay, airlied, daniel, dri-devel, jacek.lawrynowicz,
	quic_pkanojiy, quic_carlv, quic_ajitpals, linux-doc,
	linux-arm-msm

On Mon, Feb 06, 2023 at 08:41:42AM -0700, Jeffrey Hugo wrote:
> +#define SEM_VAL_MASK	GENMASK_ULL(11, 0)
> +#define SEM_INDEX_MASK	GENMASK_ULL(4, 0)
> +#define BULK_XFER	BIT(3)
> +#define GEN_COMPLETION	BIT(4)
> +#define INBOUND_XFER	1
> +#define OUTBOUND_XFER	2
> +#define REQHP_OFF	0x0 /* we read this */
> +#define REQTP_OFF	0x4 /* we write this */
> +#define RSPHP_OFF	0x8 /* we write this */
> +#define RSPTP_OFF	0xc /* we read this */
> +
> +#define ENCODE_SEM(val, index, sync, cmd, flags)			\
> +			((val) |					\
> +			(index) << 16 |					\
> +			(sync) << 22 |					\
> +			(cmd) << 24 |					\
> +			((cmd) ? BIT(31) : 0) |				\
> +			(((flags) & SEM_INSYNCFENCE) ? BIT(30) : 0) |	\
> +			(((flags) & SEM_OUTSYNCFENCE) ? BIT(29) : 0))

This could probably be better coded using FIELD_PREP(), with
integrated checks that the passed values do not exceed the field
width.
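
For illustration, a rough sketch of what that could look like (the
QAIC_SEM_* masks and encode_sem() below are my own placeholder names
inferred from the shifts in ENCODE_SEM, not the driver's definitions):

#include <linux/bitfield.h>
#include <linux/bits.h>

#define QAIC_SEM_VAL		GENMASK(11, 0)
#define QAIC_SEM_INDEX		GENMASK(20, 16)
#define QAIC_SEM_PRESYNC	BIT(22)
#define QAIC_SEM_CMD		GENMASK(26, 24)

static u32 encode_sem(u32 val, u32 index, u32 presync, u32 cmd, u32 flags)
{
	/* FIELD_PREP() shifts each value into place per its mask */
	return FIELD_PREP(QAIC_SEM_VAL, val) |
	       FIELD_PREP(QAIC_SEM_INDEX, index) |
	       FIELD_PREP(QAIC_SEM_PRESYNC, presync) |
	       FIELD_PREP(QAIC_SEM_CMD, cmd) |
	       (cmd ? BIT(31) : 0) |
	       ((flags & SEM_INSYNCFENCE) ? BIT(30) : 0) |
	       ((flags & SEM_OUTSYNCFENCE) ? BIT(29) : 0);
}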

> +struct dbc_req { /* everything must be little endian encoded */

This comment does not provide much value IMHO ...

> +inline int get_dbc_req_elem_size(void)
> +{
> +	return sizeof(struct dbc_req);
> +}
> +
> +inline int get_dbc_rsp_elem_size(void)
> +{
> +	return sizeof(struct dbc_rsp);
> +}

.. and neither do those inliners - sizeof() could be used directly instead.
Up to you to change.

> +static int reserve_pages(unsigned long start_pfn, unsigned long nr_pages,
> +			 bool reserve)
> +{
> +	unsigned long pfn;
> +	unsigned long end_pfn = start_pfn + nr_pages;
> +	struct page *page;
> +
> +	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
> +		if (!pfn_valid(pfn))
> +			return -EINVAL;
> +		page =  pfn_to_page(pfn);
> +		if (reserve)
> +			SetPageReserved(page);
> +		else
> +			ClearPageReserved(page);

Is it needed? It looks like it was taken from some legacy code.

> +static int copy_sgt(struct qaic_device *qdev, struct sg_table **sgt_out,
> +		    struct sg_table *sgt_in, u64 size, u64 offset)
> +{
> +	int total_len, len, nents, offf = 0, offl = 0;
> +	struct scatterlist *sg, *sgn, *sgf, *sgl;
> +	struct sg_table *sgt;
> +	int ret, j;
> +
> +	/* find out number of relevant nents needed for this mem */
> +	total_len = 0;
> +	sgf = NULL;
> +	sgl = NULL;
> +	nents = 0;
> +
> +	size = size ? size : PAGE_SIZE;
> +	for (sg = sgt_in->sgl; sg; sg = sg_next(sg)) {
> +		len = sg_dma_len(sg);
> +
> +		if (!len)
> +			continue;
> +		if (offset >= total_len && offset < total_len + len) {
> +			sgf = sg;
> +			offf = offset - total_len;
> +		}
> +		if (sgf)
> +			nents++;
> +		if (offset + size >= total_len &&
> +		    offset + size <= total_len + len) {
> +			sgl = sg;
> +			offl = offset + size - total_len;
> +			break;
> +		}
> +		total_len += len;
> +	}
> +
> +	if (!sgf || !sgl) {
> +		ret = -EINVAL;
> +		goto out;
> +	}
> +
> +	sgt = kzalloc(sizeof(*sgt), GFP_KERNEL);
> +	if (!sgt) {
> +		ret = -ENOMEM;
> +		goto out;
> +	}
> +
> +	ret = sg_alloc_table(sgt, nents, GFP_KERNEL);
> +	if (ret)
> +		goto free_sgt;
> +
> +	/* copy relevant sg node and fix page and length */
> +	sgn = sgf;
> +	for_each_sgtable_sg(sgt, sg, j) {
> +		memcpy(sg, sgn, sizeof(*sg));
> +		if (sgn == sgf) {
> +			sg_dma_address(sg) += offf;
> +			sg_dma_len(sg) -= offf;
> +			sg_set_page(sg, sg_page(sgn),
> +				    sg_dma_len(sg), offf);
> +		} else {
> +			offf = 0;
> +		}
> +		if (sgn == sgl) {
> +			sg_dma_len(sg) = offl - offf;
> +			sg_set_page(sg, sg_page(sgn),
> +				    offl - offf, offf);
> +			sg_mark_end(sg);
> +			break;
> +		}
> +		sgn = sg_next(sgn);
> +	}

Why not use sg_copy_table()? Overall, copy_sgt() seems to be something
that could be replaced by a generic helper, or at least simplified.

Regards
Stanislaw


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 5/8] accel/qaic: Add datapath
@ 2023-02-24 15:25     ` Stanislaw Gruszka
  0 siblings, 0 replies; 68+ messages in thread
From: Stanislaw Gruszka @ 2023-02-24 15:25 UTC (permalink / raw)
  To: Jeffrey Hugo
  Cc: linux-doc, linux-arm-msm, ogabbay, dri-devel, quic_ajitpals,
	quic_pkanojiy, quic_carlv, jacek.lawrynowicz

On Mon, Feb 06, 2023 at 08:41:42AM -0700, Jeffrey Hugo wrote:
> +#define SEM_VAL_MASK	GENMASK_ULL(11, 0)
> +#define SEM_INDEX_MASK	GENMASK_ULL(4, 0)
> +#define BULK_XFER	BIT(3)
> +#define GEN_COMPLETION	BIT(4)
> +#define INBOUND_XFER	1
> +#define OUTBOUND_XFER	2
> +#define REQHP_OFF	0x0 /* we read this */
> +#define REQTP_OFF	0x4 /* we write this */
> +#define RSPHP_OFF	0x8 /* we write this */
> +#define RSPTP_OFF	0xc /* we read this */
> +
> +#define ENCODE_SEM(val, index, sync, cmd, flags)			\
> +			((val) |					\
> +			(index) << 16 |					\
> +			(sync) << 22 |					\
> +			(cmd) << 24 |					\
> +			((cmd) ? BIT(31) : 0) |				\
> +			(((flags) & SEM_INSYNCFENCE) ? BIT(30) : 0) |	\
> +			(((flags) & SEM_OUTSYNCFENCE) ? BIT(29) : 0))

This could probably be better coded using FIELD_PREP(), with
integrated checks that the passed values do not exceed the field
width.

> +struct dbc_req { /* everything must be little endian encoded */

This comment does not provide much value IMHO ...

> +inline int get_dbc_req_elem_size(void)
> +{
> +	return sizeof(struct dbc_req);
> +}
> +
> +inline int get_dbc_rsp_elem_size(void)
> +{
> +	return sizeof(struct dbc_rsp);
> +}

.. and neither do those inliners - sizeof() could be used directly instead.
Up to you to change.

> +static int reserve_pages(unsigned long start_pfn, unsigned long nr_pages,
> +			 bool reserve)
> +{
> +	unsigned long pfn;
> +	unsigned long end_pfn = start_pfn + nr_pages;
> +	struct page *page;
> +
> +	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
> +		if (!pfn_valid(pfn))
> +			return -EINVAL;
> +		page =  pfn_to_page(pfn);
> +		if (reserve)
> +			SetPageReserved(page);
> +		else
> +			ClearPageReserved(page);

Is it needed? It looks like it was taken from some legacy code.

> +static int copy_sgt(struct qaic_device *qdev, struct sg_table **sgt_out,
> +		    struct sg_table *sgt_in, u64 size, u64 offset)
> +{
> +	int total_len, len, nents, offf = 0, offl = 0;
> +	struct scatterlist *sg, *sgn, *sgf, *sgl;
> +	struct sg_table *sgt;
> +	int ret, j;
> +
> +	/* find out number of relevant nents needed for this mem */
> +	total_len = 0;
> +	sgf = NULL;
> +	sgl = NULL;
> +	nents = 0;
> +
> +	size = size ? size : PAGE_SIZE;
> +	for (sg = sgt_in->sgl; sg; sg = sg_next(sg)) {
> +		len = sg_dma_len(sg);
> +
> +		if (!len)
> +			continue;
> +		if (offset >= total_len && offset < total_len + len) {
> +			sgf = sg;
> +			offf = offset - total_len;
> +		}
> +		if (sgf)
> +			nents++;
> +		if (offset + size >= total_len &&
> +		    offset + size <= total_len + len) {
> +			sgl = sg;
> +			offl = offset + size - total_len;
> +			break;
> +		}
> +		total_len += len;
> +	}
> +
> +	if (!sgf || !sgl) {
> +		ret = -EINVAL;
> +		goto out;
> +	}
> +
> +	sgt = kzalloc(sizeof(*sgt), GFP_KERNEL);
> +	if (!sgt) {
> +		ret = -ENOMEM;
> +		goto out;
> +	}
> +
> +	ret = sg_alloc_table(sgt, nents, GFP_KERNEL);
> +	if (ret)
> +		goto free_sgt;
> +
> +	/* copy relevant sg node and fix page and length */
> +	sgn = sgf;
> +	for_each_sgtable_sg(sgt, sg, j) {
> +		memcpy(sg, sgn, sizeof(*sg));
> +		if (sgn == sgf) {
> +			sg_dma_address(sg) += offf;
> +			sg_dma_len(sg) -= offf;
> +			sg_set_page(sg, sg_page(sgn),
> +				    sg_dma_len(sg), offf);
> +		} else {
> +			offf = 0;
> +		}
> +		if (sgn == sgl) {
> +			sg_dma_len(sg) = offl - offf;
> +			sg_set_page(sg, sg_page(sgn),
> +				    offl - offf, offf);
> +			sg_mark_end(sg);
> +			break;
> +		}
> +		sgn = sg_next(sgn);
> +	}

Why not use sg_copy_table()? Overall, copy_sgt() seems to be something
that could be replaced by a generic helper, or at least simplified.

Regards
Stanislaw


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 5/8] accel/qaic: Add datapath
  2023-02-24 15:25     ` Stanislaw Gruszka
@ 2023-02-24 19:36       ` Jeffrey Hugo
  -1 siblings, 0 replies; 68+ messages in thread
From: Jeffrey Hugo @ 2023-02-24 19:36 UTC (permalink / raw)
  To: Stanislaw Gruszka
  Cc: ogabbay, airlied, daniel, dri-devel, jacek.lawrynowicz,
	quic_pkanojiy, quic_carlv, quic_ajitpals, linux-doc,
	linux-arm-msm

On 2/24/2023 8:25 AM, Stanislaw Gruszka wrote:
> On Mon, Feb 06, 2023 at 08:41:42AM -0700, Jeffrey Hugo wrote:
>> +#define SEM_VAL_MASK	GENMASK_ULL(11, 0)
>> +#define SEM_INDEX_MASK	GENMASK_ULL(4, 0)
>> +#define BULK_XFER	BIT(3)
>> +#define GEN_COMPLETION	BIT(4)
>> +#define INBOUND_XFER	1
>> +#define OUTBOUND_XFER	2
>> +#define REQHP_OFF	0x0 /* we read this */
>> +#define REQTP_OFF	0x4 /* we write this */
>> +#define RSPHP_OFF	0x8 /* we write this */
>> +#define RSPTP_OFF	0xc /* we read this */
>> +
>> +#define ENCODE_SEM(val, index, sync, cmd, flags)			\
>> +			((val) |					\
>> +			(index) << 16 |					\
>> +			(sync) << 22 |					\
>> +			(cmd) << 24 |					\
>> +			((cmd) ? BIT(31) : 0) |				\
>> +			(((flags) & SEM_INSYNCFENCE) ? BIT(30) : 0) |	\
>> +			(((flags) & SEM_OUTSYNCFENCE) ? BIT(29) : 0))
> 
> This could probably be better coded using FIELD_PREP(), with
> integrated checks that the passed values do not exceed the field
> width.

Interesting idea.  Will have a look.

> 
>> +struct dbc_req { /* everything must be little endian encoded */
> 
> This comment does not provide much value IMHO ...

With the LE types, this does seem to be redundant now.  Will remove.

> 
>> +inline int get_dbc_req_elem_size(void)
>> +{
>> +	return sizeof(struct dbc_req);
>> +}
>> +
>> +inline int get_dbc_rsp_elem_size(void)
>> +{
>> +	return sizeof(struct dbc_rsp);
>> +}
> 
> .. and neither do those inliners - sizeof() could be used directly instead.
> Up to you to change.

Will review these.

> 
>> +static int reserve_pages(unsigned long start_pfn, unsigned long nr_pages,
>> +			 bool reserve)
>> +{
>> +	unsigned long pfn;
>> +	unsigned long end_pfn = start_pfn + nr_pages;
>> +	struct page *page;
>> +
>> +	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
>> +		if (!pfn_valid(pfn))
>> +			return -EINVAL;
>> +		page =  pfn_to_page(pfn);
>> +		if (reserve)
>> +			SetPageReserved(page);
>> +		else
>> +			ClearPageReserved(page);
> 
> Is it needed? It looks like it was taken from some legacy code.

Required for remap_pfn_range().

>> +static int copy_sgt(struct qaic_device *qdev, struct sg_table **sgt_out,
>> +		    struct sg_table *sgt_in, u64 size, u64 offset)
>> +{
>> +	int total_len, len, nents, offf = 0, offl = 0;
>> +	struct scatterlist *sg, *sgn, *sgf, *sgl;
>> +	struct sg_table *sgt;
>> +	int ret, j;
>> +
>> +	/* find out number of relevant nents needed for this mem */
>> +	total_len = 0;
>> +	sgf = NULL;
>> +	sgl = NULL;
>> +	nents = 0;
>> +
>> +	size = size ? size : PAGE_SIZE;
>> +	for (sg = sgt_in->sgl; sg; sg = sg_next(sg)) {
>> +		len = sg_dma_len(sg);
>> +
>> +		if (!len)
>> +			continue;
>> +		if (offset >= total_len && offset < total_len + len) {
>> +			sgf = sg;
>> +			offf = offset - total_len;
>> +		}
>> +		if (sgf)
>> +			nents++;
>> +		if (offset + size >= total_len &&
>> +		    offset + size <= total_len + len) {
>> +			sgl = sg;
>> +			offl = offset + size - total_len;
>> +			break;
>> +		}
>> +		total_len += len;
>> +	}
>> +
>> +	if (!sgf || !sgl) {
>> +		ret = -EINVAL;
>> +		goto out;
>> +	}
>> +
>> +	sgt = kzalloc(sizeof(*sgt), GFP_KERNEL);
>> +	if (!sgt) {
>> +		ret = -ENOMEM;
>> +		goto out;
>> +	}
>> +
>> +	ret = sg_alloc_table(sgt, nents, GFP_KERNEL);
>> +	if (ret)
>> +		goto free_sgt;
>> +
>> +	/* copy relevant sg node and fix page and length */
>> +	sgn = sgf;
>> +	for_each_sgtable_sg(sgt, sg, j) {
>> +		memcpy(sg, sgn, sizeof(*sg));
>> +		if (sgn == sgf) {
>> +			sg_dma_address(sg) += offf;
>> +			sg_dma_len(sg) -= offf;
>> +			sg_set_page(sg, sg_page(sgn),
>> +				    sg_dma_len(sg), offf);
>> +		} else {
>> +			offf = 0;
>> +		}
>> +		if (sgn == sgl) {
>> +			sg_dma_len(sg) = offl - offf;
>> +			sg_set_page(sg, sg_page(sgn),
>> +				    offl - offf, offf);
>> +			sg_mark_end(sg);
>> +			break;
>> +		}
>> +		sgn = sg_next(sgn);
>> +	}
> 
> Why not use sg_copy_table()? Overall, copy_sgt() seems like something
> that could be replaced by a generic helper, or at least simplified.

I don't see "sg_copy_table" defined in 6.2.  Are you suggesting renaming 
this function?  I guess I'm not quite understanding your comment here. 
Can you elaborate?

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 2/8] accel/qaic: Add uapi and core driver file
  2023-02-22 15:52         ` Dafna Hirschfeld
@ 2023-02-24 21:21           ` Jeffrey Hugo
  -1 siblings, 0 replies; 68+ messages in thread
From: Jeffrey Hugo @ 2023-02-24 21:21 UTC (permalink / raw)
  To: Dafna Hirschfeld
  Cc: Jacek Lawrynowicz, ogabbay, airlied, daniel, dri-devel,
	linux-doc, linux-arm-msm, quic_ajitpals, quic_pkanojiy,
	stanislaw.gruszka, quic_carlv

On 2/22/2023 8:52 AM, Dafna Hirschfeld wrote:
> On 17.02.2023 11:15, Jeffrey Hugo wrote:
>> On 2/16/2023 7:13 AM, Jacek Lawrynowicz wrote:
>>> Hi,
>>>
>>> On 06.02.2023 16:41, Jeffrey Hugo wrote:
>>>> Add the QAIC driver uapi file and core driver file that binds to the 
>>>> PCIe
>>>> device.  The core driver file also creates the accel device and manages
>>>> all the interconnections between the different parts of the driver.
>>>>
>>>> The driver can be built as a module.  If so, it will be called 
>>>> "qaic.ko".
>>>>
>>>> Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
>>>> Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com>
>>>> ---
>>>>  drivers/accel/qaic/qaic.h     | 321 ++++++++++++++++++
>>>>  drivers/accel/qaic/qaic_drv.c | 771 
>>>> ++++++++++++++++++++++++++++++++++++++++++
>>>>  include/uapi/drm/qaic_accel.h | 283 ++++++++++++++++
>>>>  3 files changed, 1375 insertions(+)
>>>>  create mode 100644 drivers/accel/qaic/qaic.h
>>>>  create mode 100644 drivers/accel/qaic/qaic_drv.c
>>>>  create mode 100644 include/uapi/drm/qaic_accel.h
>>>>
>>>> diff --git a/drivers/accel/qaic/qaic.h b/drivers/accel/qaic/qaic.h
>>>> new file mode 100644
>>>> index 0000000..3f7ea76
>>>> --- /dev/null
>>>> +++ b/drivers/accel/qaic/qaic.h
>>>> @@ -0,0 +1,321 @@
>>>> +/* SPDX-License-Identifier: GPL-2.0-only
>>>> + *
>>>> + * Copyright (c) 2019-2021, The Linux Foundation. All rights reserved.
>>>> + * Copyright (c) 2021-2023 Qualcomm Innovation Center, Inc. All 
>>>> rights reserved.
>>>> + */
>>>> +
>>>> +#ifndef QAICINTERNAL_H_
>>>
>>> Please use guard macro that matches the file name: _QAIC_H_
>>
>> Before moving to DRM/ACCEL, this conflicted with the uapi file. 
>> However, that is no longer the case, so yes, this should be changed. 
>> Will do.
>>
>>>
>>>> +#define QAICINTERNAL_H_
>>>> +
>>>> +#include <linux/interrupt.h>
>>>> +#include <linux/kref.h>
>>>> +#include <linux/mhi.h>
>>>> +#include <linux/mutex.h>
>>>> +#include <linux/pci.h>
>>>> +#include <linux/spinlock.h>
>>>> +#include <linux/srcu.h>
>>>> +#include <linux/wait.h>
>>>> +#include <linux/workqueue.h>
>>>> +#include <drm/drm_device.h>
>>>> +#include <drm/drm_gem.h>
>>>> +
>>>> +#define QAIC_DBC_BASE        0x20000
>>>> +#define QAIC_DBC_SIZE        0x1000
>>>> +
>>>> +#define QAIC_NO_PARTITION    -1
>>>> +
>>>> +#define QAIC_DBC_OFF(i)        ((i) * QAIC_DBC_SIZE + QAIC_DBC_BASE)
>>>> +
>>>> +#define to_qaic_bo(obj) container_of(obj, struct qaic_bo, base)
>>>> +
>>>> +extern bool poll_datapath;
>>>> +
>>>> +struct qaic_user {
>>>> +    /* Uniquely identifies this user for the device */
>>>> +    int            handle;
>>>> +    struct kref        ref_count;
>>>> +    /* Char device opened by this user */
>>>> +    struct qaic_drm_device    *qddev;
>>>> +    /* Node in list of users that opened this drm device */
>>>> +    struct list_head    node;
>>>> +    /* SRCU used to synchronize this user during cleanup */
>>>> +    struct srcu_struct    qddev_lock;
>>>> +    atomic_t        chunk_id;
>>>> +};
>>>> +
>>>> +struct dma_bridge_chan {
>>>> +    /* Pointer to device struct maintained by driver */
>>>> +    struct qaic_device    *qdev;
>>>> +    /* ID of this DMA bridge channel(DBC) */
>>>> +    unsigned int        id;
>>>> +    /* Synchronizes access to xfer_list */
>>>> +    spinlock_t        xfer_lock;
>>>> +    /* Base address of request queue */
>>>> +    void            *req_q_base;
>>>> +    /* Base address of response queue */
>>>> +    void            *rsp_q_base;
>>>> +    /*
>>>> +     * Base bus address of request queue. Response queue bus 
>>>> address can be
>>>> +     * calculated by adding request queue size to this variable
>>>> +     */
>>>> +    dma_addr_t        dma_addr;
>>>> +    /* Total size of request and response queue in byte */
>>>> +    u32            total_size;
>>>> +    /* Capacity of request/response queue */
>>>> +    u32            nelem;
>>>> +    /* The user that opened this DBC */
>>>> +    struct qaic_user    *usr;
>>>> +    /*
>>>> +     * Request ID of next memory handle that goes in request queue. 
>>>> One
>>>> +     * memory handle can enqueue more than one request elements, all
>>>> +     * this requests that belong to same memory handle have same 
>>>> request ID
>>>> +     */
>>>> +    u16            next_req_id;
>>>> +    /* TRUE: DBC is in use; FALSE: DBC not in use */
>>>
>>> Use standard "true"/"false" instead of custom "TRUE"/"FALSE" macros.
>>> This applies here and in multiple other places in the driver.
>>
>> I think you are getting at that the documentation could be confusing. 
>> I don't appear to see custom macro use in the code.  Will try to 
>> clarify that here.
>>
>>>> +    bool            in_use;
>>>> +    /*
>>>> +     * Base address of device registers. Used to read/write request 
>>>> and
>>>> +     * response queue's head and tail pointer of this DBC.
>>>> +     */
>>>> +    void __iomem        *dbc_base;
>>>> +    /* Head of list where each node is a memory handle queued in 
>>>> request queue */
>>>> +    struct list_head    xfer_list;
>>>> +    /* Synchronizes DBC readers during cleanup */
>>>> +    struct srcu_struct    ch_lock;
>>>> +    /*
>>>> +     * When this DBC is released, any thread waiting on this wait 
>>>> queue is
>>>> +     * woken up
>>>> +     */
>>>> +    wait_queue_head_t    dbc_release;
>>>> +    /* Head of list where each node is a bo associated with this 
>>>> DBC */
>>>> +    struct list_head    bo_lists;
>>>> +    /* The irq line for this DBC.  Used for polling */
>>>> +    unsigned int        irq;
>>>> +    /* Polling work item to simulate interrupts */
>>>> +    struct work_struct    poll_work;
>>>> +};
>>>> +
>>>> +struct qaic_device {
>>>> +    /* Pointer to base PCI device struct of our physical device */
>>>> +    struct pci_dev        *pdev;
>>>> +    /* Mask of all bars of this device */
>>>> +    int            bars;
>>>> +    /* Req. ID of request that will be queued next in MHI control 
>>>> device */
>>>> +    u32            next_seq_num;
>>>> +    /* Base address of bar 0 */
>>>> +    void __iomem        *bar_0;
>>>> +    /* Base address of bar 2 */
>>>> +    void __iomem        *bar_2;
>>>> +    /* Controller structure for MHI devices */
>>>> +    struct mhi_controller    *mhi_cntl;
>>>> +    /* MHI control channel device */
>>>> +    struct mhi_device    *cntl_ch;
>>>> +    /* List of requests queued in MHI control device */
>>>> +    struct list_head    cntl_xfer_list;
>>>> +    /* Synchronizes MHI control device transactions and its xfer 
>>>> list */
>>>> +    struct mutex        cntl_mutex;
>>>> +    /* Base actual physical representation of drm device */
>>>> +    struct qaic_drm_device    *base_dev;
>>>> +    /* Array of DBC struct of this device */
>>>> +    struct dma_bridge_chan    *dbc;
>>>> +    /* Work queue for tasks related to MHI control device */
>>>> +    struct workqueue_struct    *cntl_wq;
>>>> +    /* Synchronizes all the users of device during cleanup */
>>>> +    struct srcu_struct    dev_lock;
>>>> +    /* TRUE: Device under reset; FALSE: Device not under reset */
>>>> +    bool            in_reset;
>>>> +    /*
>>>> +     * TRUE: A tx MHI transaction has failed and a rx buffer is 
>>>> still queued
>>>> +     * in control device. Such a buffer is considered lost rx buffer
>>>> +     * FALSE: No rx buffer is lost in control device
>>>> +     */
>>>> +    bool            cntl_lost_buf;
>>>> +    /* Maximum number of DBC supported by this device */
>>>> +    u32            num_dbc;
>>>> +    /* Head in list of drm device created on top of this device */
>>>> +    struct list_head    qaic_drm_devices;
>>>> +    /* Synchronizes access of qaic_drm_devices list */
>>>> +    struct mutex        qaic_drm_devices_mutex;
>>>> +    /* Generate the CRC of a control message */
>>>> +    u32 (*gen_crc)(void *msg);
>>>> +    /* Validate the CRC of a control message */
>>>> +    bool (*valid_crc)(void *msg);
>>>> +};
>>>> +
>>>> +struct qaic_drm_device {
>>>> +    /* Pointer to the root device struct driven by this driver */
>>>> +    struct qaic_device    *qdev;
>>>> +    /* Node in list of drm devices maintained by root device */
>>>> +    struct list_head    node;
>>>> +    /*
>>>> +     * The physical device can be partition in number of logical 
>>>> devices.
>>>> +     * And each logical device is given a partition id. This member 
>>>> stores
>>>> +     * that id. QAIC_NO_PARTITION is a sentinel used to mark that 
>>>> this drm
>>>> +     * device is the actual physical device
>>>> +     */
>>>> +    s32            partition_id;
>>>> +    /*
>>>> +     * It points to the user that created this drm device. It is NULL
>>>> +     * when this drm device represents the physical device i.e.
>>>> +     * partition_id is QAIC_NO_PARTITION
>>>> +     */
>>>> +    struct qaic_user    *owner;
>>>> +    /* Pointer to the drm device struct of this drm device */
>>>> +    struct drm_device    *ddev;
>>>> +    /* Head in list of users who have opened this drm device */
>>>> +    struct list_head    users;
>>>> +    /* Synchronizes access to users list */
>>>> +    struct mutex        users_mutex;
>>>> +};
>>>> +
>>>> +struct qaic_bo {
>>>> +    struct drm_gem_object    base;
>>>
>>> Any reason why drm_gem_shmem_object cannot be used as a base?
>>
>> I believe that is incompatible with our memory allocator in patch 5.
>>
>>>> +    /* Scatter/gather table for allocate/imported BO */
>>>> +    struct sg_table        *sgt;
>>>> +    /* BO size requested by user. GEM object might be bigger in 
>>>> size. */
>>>> +    u64            size;
>>>> +    /* Head in list of slices of this BO */
>>>> +    struct list_head    slices;
>>>> +    /* Total nents, for all slices of this BO */
>>>> +    int            total_slice_nents;
>>>> +    /*
>>>> +     * Direction of transfer. It can assume only two value 
>>>> DMA_TO_DEVICE and
>>>> +     * DMA_FROM_DEVICE.
>>>> +     */
>>>> +    int            dir;
>>>> +    /* The pointer of the DBC which operates on this BO */
>>>> +    struct dma_bridge_chan    *dbc;
>>>> +    /* Number of slice that belongs to this buffer */
>>>> +    u32            nr_slice;
>>>> +    /* Number of slice that have been transferred by DMA engine */
>>>> +    u32            nr_slice_xfer_done;
>>>> +    /* TRUE = BO is queued for execution, FALSE = BO is not queued */
>>>> +    bool            queued;
>>>> +    /*
>>>> +     * If TRUE then user has attached slicing information to this 
>>>> BO by
>>>> +     * calling DRM_IOCTL_QAIC_ATTACH_SLICE_BO ioctl.
>>>> +     */
>>>> +    bool            sliced;
>>>> +    /* Request ID of this BO if it is queued for execution */
>>>> +    u16            req_id;
>>>> +    /* Handle assigned to this BO */
>>>> +    u32            handle;
>>>> +    /* Wait on this for completion of DMA transfer of this BO */
>>>> +    struct completion    xfer_done;
>>>> +    /*
>>>> +     * Node in linked list where head is dbc->xfer_list.
>>>> +     * This link list contain BO's that are queued for DMA transfer.
>>>> +     */
>>>> +    struct list_head    xfer_list;
>>>> +    /*
>>>> +     * Node in linked list where head is dbc->bo_lists.
>>>> +     * This link list contain BO's that are associated with the DBC 
>>>> it is
>>>> +     * linked to.
>>>> +     */
>>>> +    struct list_head    bo_list;
>>>> +    struct {
>>>> +        /*
>>>> +         * Latest timestamp(ns) at which kernel received a request to
>>>> +         * execute this BO
>>>> +         */
>>>> +        u64        req_received_ts;
>>>> +        /*
>>>> +         * Latest timestamp(ns) at which kernel enqueued requests of
>>>> +         * this BO for execution in DMA queue
>>>> +         */
>>>> +        u64        req_submit_ts;
>>>> +        /*
>>>> +         * Latest timestamp(ns) at which kernel received a completion
>>>> +         * interrupt for requests of this BO
>>>> +         */
>>>> +        u64        req_processed_ts;
>>>> +        /*
>>>> +         * Number of elements already enqueued in DMA queue before
>>>> +         * enqueuing requests of this BO
>>>> +         */
>>>> +        u32        queue_level_before;
>>>> +    } perf_stats;
>>>> +
>>>> +};
>>>> +
>>>> +struct bo_slice {
>>>> +    /* Mapped pages */
>>>> +    struct sg_table        *sgt;
>>>> +    /* Number of requests required to queue in DMA queue */
>>>> +    int            nents;
>>>> +    /* See enum dma_data_direction */
>>>> +    int            dir;
>>>> +    /* Actual requests that will be copied in DMA queue */
>>>> +    struct dbc_req        *reqs;
>>>> +    struct kref        ref_count;
>>>> +    /* TRUE: No DMA transfer required */
>>>> +    bool            no_xfer;
>>>> +    /* Pointer to the parent BO handle */
>>>> +    struct qaic_bo        *bo;
>>>> +    /* Node in list of slices maintained by parent BO */
>>>> +    struct list_head    slice;
>>>> +    /* Size of this slice in bytes */
>>>> +    u64            size;
>>>> +    /* Offset of this slice in buffer */
>>>> +    u64            offset;
>>>> +};
>>>> +
>>>> +int get_dbc_req_elem_size(void);
>>>> +int get_dbc_rsp_elem_size(void);
>>>> +int get_cntl_version(struct qaic_device *qdev, struct qaic_user *usr,
>>>> +             u16 *major, u16 *minor);
>>>> +int qaic_manage_ioctl(struct drm_device *dev, void *data,
>>>> +              struct drm_file *file_priv);
>>>> +int qaic_execute_ioctl(struct qaic_device *qdev, struct qaic_user 
>>>> *usr,
>>>> +               unsigned long arg, bool is_partial);
>>>> +int qaic_wait_exec_ioctl(struct qaic_device *qdev, struct qaic_user 
>>>> *usr,
>>>> +             unsigned long arg);
>>>> +int qaic_query_ioctl(struct qaic_device *qdev, struct qaic_user *usr,
>>>> +             unsigned long arg);
>>>> +int qaic_data_mmap(struct qaic_device *qdev, struct qaic_user *usr,
>>>> +           struct vm_area_struct *vma);
>>>> +int qaic_data_get_reservation(struct qaic_device *qdev, struct 
>>>> qaic_user *usr,
>>>> +                  void *data, u32 *partition_id,
>>>> +                  u16 *remove);
>>>> +void qaic_mhi_ul_xfer_cb(struct mhi_device *mhi_dev,
>>>> +             struct mhi_result *mhi_result);
>>>> +
>>>> +void qaic_mhi_dl_xfer_cb(struct mhi_device *mhi_dev,
>>>> +             struct mhi_result *mhi_result);
>>>> +
>>>> +int qaic_control_open(struct qaic_device *qdev);
>>>> +void qaic_control_close(struct qaic_device *qdev);
>>>> +void qaic_release_usr(struct qaic_device *qdev, struct qaic_user 
>>>> *usr);
>>>> +
>>>> +irqreturn_t dbc_irq_threaded_fn(int irq, void *data);
>>>> +irqreturn_t dbc_irq_handler(int irq, void *data);
>>>> +int disable_dbc(struct qaic_device *qdev, u32 dbc_id, struct 
>>>> qaic_user *usr);
>>>> +void enable_dbc(struct qaic_device *qdev, u32 dbc_id, struct 
>>>> qaic_user *usr);
>>>> +void wakeup_dbc(struct qaic_device *qdev, u32 dbc_id);
>>>> +void release_dbc(struct qaic_device *qdev, u32 dbc_id);
>>>> +
>>>> +void wake_all_cntl(struct qaic_device *qdev);
>>>> +void qaic_dev_reset_clean_local_state(struct qaic_device *qdev, 
>>>> bool exit_reset);
>>>> +
>>>> +struct drm_gem_object *qaic_gem_prime_import(struct drm_device *dev,
>>>> +                         struct dma_buf *dma_buf);
>>>> +
>>>> +int qaic_create_bo_ioctl(struct drm_device *dev, void *data,
>>>> +             struct drm_file *file_priv);
>>>> +int qaic_mmap_bo_ioctl(struct drm_device *dev, void *data,
>>>> +               struct drm_file *file_priv);
>>>> +int qaic_attach_slice_bo_ioctl(struct drm_device *dev, void *data,
>>>> +                   struct drm_file *file_priv);
>>>> +int qaic_execute_bo_ioctl(struct drm_device *dev, void *data,
>>>> +              struct drm_file *file_priv);
>>>> +int qaic_partial_execute_bo_ioctl(struct drm_device *dev, void *data,
>>>> +                  struct drm_file *file_priv);
>>>> +int qaic_wait_bo_ioctl(struct drm_device *dev, void *data,
>>>> +               struct drm_file *file_priv);
>>>> +int qaic_test_print_bo_ioctl(struct drm_device *dev, void *data,
>>>> +                 struct drm_file *file_priv);
>>>> +int qaic_perf_stats_bo_ioctl(struct drm_device *dev, void *data,
>>>> +                 struct drm_file *file_priv);
>>>
>>> You don't need to break these lines. You can use up to 100 columns in 
>>> the whole driver.
>>> It will be more readable and checkpatch won't complain.
>>
>> Some of this predates the 100 column change.  Checkpatch already 
>> complains about the uapi file.
>>
>> Will try to fix here.
>>
>>>> +void irq_polling_work(struct work_struct *work);
>>>> +
>>>> +#endif /* QAICINTERNAL_H_ */
>>>> diff --git a/drivers/accel/qaic/qaic_drv.c 
>>>> b/drivers/accel/qaic/qaic_drv.c
>>>> new file mode 100644
>>>> index 0000000..602a784
>>>> --- /dev/null
>>>> +++ b/drivers/accel/qaic/qaic_drv.c
>>>> @@ -0,0 +1,771 @@
>>>> +// SPDX-License-Identifier: GPL-2.0-only
>>>> +
>>>> +/* Copyright (c) 2019-2021, The Linux Foundation. All rights 
>>>> reserved. */
>>>> +/* Copyright (c) 2021-2023 Qualcomm Innovation Center, Inc. All 
>>>> rights reserved. */
>>>> +
>>>> +#include <linux/delay.h>
>>>> +#include <linux/dma-mapping.h>
>>>> +#include <linux/idr.h>
>>>> +#include <linux/interrupt.h>
>>>> +#include <linux/list.h>
>>>> +#include <linux/kref.h>
>>>> +#include <linux/mhi.h>
>>>> +#include <linux/module.h>
>>>> +#include <linux/msi.h>
>>>> +#include <linux/mutex.h>
>>>> +#include <linux/pci.h>
>>>> +#include <linux/sched.h>
>>>
>>> Is <linux/sched.h> used here?
>>> Feels like there are couple other unused includes in this file.
>>
>> Actually I think that can be removed now.  Will check.
>> The other ones look sane to me.  Are there specific ones you question?
>>
>>>
>>>> +#include <linux/spinlock.h>
>>>> +#include <linux/workqueue.h>
>>>> +#include <linux/wait.h>
>>>> +#include <drm/drm_accel.h>
>>>> +#include <drm/drm_drv.h>
>>>> +#include <drm/drm_file.h>
>>>> +#include <drm/drm_gem.h>
>>>> +#include <drm/drm_ioctl.h>
>>>> +#include <uapi/drm/qaic_accel.h>
>>>> +
>>>> +#include "mhi_controller.h"
>>>> +#include "mhi_qaic_ctrl.h"
>>>> +#include "qaic.h"
>>>> +
>>>> +MODULE_IMPORT_NS(DMA_BUF);
>>>> +
>>>> +#define PCI_DEV_AIC100            0xa100
>>>> +#define QAIC_NAME            "qaic"
>>>> +#define QAIC_DESC            "Qualcomm Cloud AI Accelerators"
>>>> +
>>>> +static unsigned int datapath_polling;
>>>> +module_param(datapath_polling, uint, 0400);
>>>> +bool poll_datapath;
>>>> +
>>>> +static u16 cntl_major = 5;
>>>> +static u16 cntl_minor;/* 0 */
>>>
>>> Missing space before the comment.
>>> And also you could convert both vars to macros as they are constants.
>>
>> Sure
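
Something along these lines, presumably (names illustrative):

#define QAIC_CNTL_MAJOR	5
#define QAIC_CNTL_MINOR	0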
>>
>>>> +static bool link_up;
>>>> +static DEFINE_IDA(qaic_usrs);
>>>> +
>>>> +static int qaic_create_drm_device(struct qaic_device *qdev, s32 
>>>> partition_id,
>>>> +                  struct qaic_user *owner);
>>>> +static void qaic_destroy_drm_device(struct qaic_device *qdev, s32 
>>>> partition_id,
>>>> +                    struct qaic_user *owner);
>>>> +
>>>> +static void free_usr(struct kref *kref)
>>>> +{
>>>> +    struct qaic_user *usr = container_of(kref, struct qaic_user, 
>>>> ref_count);
>>>> +
>>>> +    cleanup_srcu_struct(&usr->qddev_lock);
>>>> +    ida_free(&qaic_usrs, usr->handle);
>>>> +    kfree(usr);
>>>> +}
>>>> +
>>>> +static int qaic_open(struct drm_device *dev, struct drm_file *file)
>>>> +{
>>>> +    struct qaic_drm_device *qddev = dev->dev_private;
>>>> +    struct qaic_device *qdev = qddev->qdev;
>>>> +    struct qaic_user *usr;
>>>> +    int rcu_id;
>>>> +    int ret;
>>>> +
>>>> +    rcu_id = srcu_read_lock(&qdev->dev_lock);
>>>> +    if (qdev->in_reset) {
>>>> +        srcu_read_unlock(&qdev->dev_lock, rcu_id);
>>>> +        return -ENODEV;
>>>> +    }
>>>> +
>>>> +    usr = kmalloc(sizeof(*usr), GFP_KERNEL);
>>>> +    if (!usr) {
>>>> +        srcu_read_unlock(&qdev->dev_lock, rcu_id);
>>>> +        return -ENOMEM;
>>>> +    }
>>>> +
>>>> +    usr->handle = ida_alloc(&qaic_usrs, GFP_KERNEL);
>>>> +    if (usr->handle < 0) {
>>>> +        srcu_read_unlock(&qdev->dev_lock, rcu_id);
>>>> +        kfree(usr);
>>>> +        return usr->handle;
> 
> hi, use after free here

Nice catch.  Thanks.  Also noting the other things you mentioned.
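
Something like this should sort it (sketch, untested):

	ret = ida_alloc(&qaic_usrs, GFP_KERNEL);
	if (ret < 0) {
		srcu_read_unlock(&qdev->dev_lock, rcu_id);
		kfree(usr);
		return ret;
	}
	usr->handle = ret;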

> 
>>>> +    }
>>>> +    usr->qddev = qddev;
>>>> +    atomic_set(&usr->chunk_id, 0);
>>>> +    init_srcu_struct(&usr->qddev_lock);
>>>> +    kref_init(&usr->ref_count);
>>>> +
>>>> +    ret = mutex_lock_interruptible(&qddev->users_mutex);
>>>> +    if (ret) {
>>>> +        cleanup_srcu_struct(&usr->qddev_lock);
>>>> +        kfree(usr);
>>>> +        srcu_read_unlock(&qdev->dev_lock, rcu_id);
>>>> +        return ret;
>>>> +    }
>>>> +
>>>> +    list_add(&usr->node, &qddev->users);
>>>> +    mutex_unlock(&qddev->users_mutex);
>>>> +
>>>> +    file->driver_priv = usr;
>>>> +
>>>> +    srcu_read_unlock(&qdev->dev_lock, rcu_id);
>>>> +    return 0;
>>>> +}
>>>> +
>>>> +static void qaic_postclose(struct drm_device *dev, struct drm_file 
>>>> *file)
>>>> +{
>>>> +    struct qaic_user *usr = file->driver_priv;
>>>> +    struct qaic_drm_device *qddev;
>>>> +    struct qaic_device *qdev;
>>>> +    int qdev_rcu_id;
>>>> +    int usr_rcu_id;
>>>> +    int i;
>>>> +
>>>> +    qddev = usr->qddev;
>>>> +    usr_rcu_id = srcu_read_lock(&usr->qddev_lock);
>>>> +    if (qddev) {
>>>> +        qdev = qddev->qdev;
>>>> +        qdev_rcu_id = srcu_read_lock(&qdev->dev_lock);
>>>> +        if (!qdev->in_reset) {
>>>> +            qaic_release_usr(qdev, usr);
>>>> +            for (i = 0; i < qdev->num_dbc; ++i)
>>>> +                if (qdev->dbc[i].usr &&
>>>> +                    qdev->dbc[i].usr->handle == usr->handle)
>>>> +                    release_dbc(qdev, i);
>>>> +
>>>> +            /* Remove child devices */
>>>> +            if (qddev->partition_id == QAIC_NO_PARTITION)
>>>> +                qaic_destroy_drm_device(qdev, QAIC_NO_PARTITION, usr);
>>>> +        }
>>>> +        srcu_read_unlock(&qdev->dev_lock, qdev_rcu_id);
>>>> +
>>>> +        mutex_lock(&qddev->users_mutex);
>>>> +        if (!list_empty(&usr->node))
>>>> +            list_del_init(&usr->node);
>>>> +        mutex_unlock(&qddev->users_mutex);
>>>> +    }
>>>> +
>>>> +    srcu_read_unlock(&usr->qddev_lock, usr_rcu_id);
>>>> +    kref_put(&usr->ref_count, free_usr);
>>>> +
>>>> +    file->driver_priv = NULL;
>>>> +}
>>>> +
>>>> +static int qaic_part_dev_ioctl(struct drm_device *dev, void *data,
>>>> +                   struct drm_file *file_priv)
>>>> +{
>>>> +    struct qaic_device *qdev;
>>>> +    struct qaic_user *usr;
>>>> +    u32 partition_id;
>>>> +    int qdev_rcu_id;
>>>> +    int usr_rcu_id;
>>>> +    int ret = 0;
>>>> +    u16 remove;
>>>> +
>>>> +    usr = file_priv->driver_priv;
>>>> +    if (!usr)
>>>> +        return -EINVAL;
>>>> +
>>>> +    usr_rcu_id = srcu_read_lock(&usr->qddev_lock);
>>>> +    if (!usr->qddev) {
>>>> +        srcu_read_unlock(&usr->qddev_lock, usr_rcu_id);
>>>> +        return -ENODEV;
>>>> +    }
>>>> +
>>>> +    qdev = usr->qddev->qdev;
>>>> +    if (!qdev) {
>>>> +        srcu_read_unlock(&usr->qddev_lock, usr_rcu_id);
>>>> +        return -ENODEV;
>>>> +    }
>>>> +
>>>> +    qdev_rcu_id = srcu_read_lock(&qdev->dev_lock);
>>>> +    if (qdev->in_reset) {
>>>> +        ret = -ENODEV;
>>>> +        goto out;
>>>> +    }
>>>> +
>>>> +    /* This IOCTL is only supported for base devices. */
>>>> +    if (usr->qddev->partition_id != QAIC_NO_PARTITION) {
>>>> +        ret = -ENOTTY;
>>>> +        goto out;
>>>> +    }
>>>> +
>>>> +    ret = qaic_data_get_reservation(qdev, usr, data, &partition_id,
>>>> +                    &remove);
>>>> +    if (ret)
>>>> +        goto out;
>>>> +
>>>> +    if (remove == 1)
>>>> +        qaic_destroy_drm_device(qdev, partition_id, usr);
>>>> +    else
>>>> +        ret = qaic_create_drm_device(qdev, partition_id, usr);
>>>> +
>>>> +out:
>>>> +    srcu_read_unlock(&qdev->dev_lock, qdev_rcu_id);
>>>> +    srcu_read_unlock(&usr->qddev_lock, usr_rcu_id);
>>>> +
>>>> +    return ret;
>>>> +}
>>>> +
>>>> +DEFINE_DRM_ACCEL_FOPS(qaic_accel_fops);
>>>> +
>>>> +static const struct drm_ioctl_desc qaic_drm_ioctls[] = {
>>>> +    DRM_IOCTL_DEF_DRV(QAIC_MANAGE, qaic_manage_ioctl, 
>>>> DRM_RENDER_ALLOW),
>>>> +    DRM_IOCTL_DEF_DRV(QAIC_CREATE_BO, qaic_create_bo_ioctl, 
>>>> DRM_RENDER_ALLOW),
>>>> +    DRM_IOCTL_DEF_DRV(QAIC_MMAP_BO, qaic_mmap_bo_ioctl, 
>>>> DRM_RENDER_ALLOW),
>>>> +    DRM_IOCTL_DEF_DRV(QAIC_ATTACH_SLICE_BO, 
>>>> qaic_attach_slice_bo_ioctl, DRM_RENDER_ALLOW),
>>>> +    DRM_IOCTL_DEF_DRV(QAIC_EXECUTE_BO, qaic_execute_bo_ioctl, 
>>>> DRM_RENDER_ALLOW),
>>>> +    DRM_IOCTL_DEF_DRV(QAIC_PARTIAL_EXECUTE_BO, 
>>>> qaic_partial_execute_bo_ioctl, DRM_RENDER_ALLOW),
>>>> +    DRM_IOCTL_DEF_DRV(QAIC_WAIT_BO, qaic_wait_bo_ioctl, 
>>>> DRM_RENDER_ALLOW),
>>>> +    DRM_IOCTL_DEF_DRV(QAIC_PERF_STATS_BO, qaic_perf_stats_bo_ioctl, 
>>>> DRM_RENDER_ALLOW),
>>>> +    DRM_IOCTL_DEF_DRV(QAIC_PART_DEV, qaic_part_dev_ioctl, 
>>>> DRM_RENDER_ALLOW),
>>>> +};
>>>> +
>>>> +static const struct drm_driver qaic_accel_driver = {
>>>> +    .driver_features    = DRIVER_GEM | DRIVER_COMPUTE_ACCEL,
>>>> +
>>>> +    .name            = QAIC_NAME,
>>>> +    .desc            = QAIC_DESC,
>>>> +    .date            = "20190618",
>>>> +
>>>> +    .fops            = &qaic_accel_fops,
>>>> +    .open            = qaic_open,
>>>> +    .postclose        = qaic_postclose,
>>>> +
>>>> +    .ioctls            = qaic_drm_ioctls,
>>>> +    .num_ioctls        = ARRAY_SIZE(qaic_drm_ioctls),
>>>> +    .prime_fd_to_handle    = drm_gem_prime_fd_to_handle,
>>>> +    .gem_prime_import    = qaic_gem_prime_import,
>>>> +};
>>>> +
>>>> +static int qaic_create_drm_device(struct qaic_device *qdev, s32 
>>>> partition_id,
>>>> +                  struct qaic_user *owner)
>>>> +{
>>>> +    struct qaic_drm_device *qddev;
>>>> +    struct drm_device *ddev;
>>>> +    struct device *pdev;
>>>> +    int ret;
>>>> +
>>>> +    /*
>>>> +     * Partition id QAIC_NO_PARTITION indicates that the device was 
>>>> created
>>>> +     * on mhi_probe and id > QAIC_NO_PARTITION indicates a partition
>>>> +     * created using IOCTL. So, pdev for primary device is the pci 
>>>> dev and
>>>> +     * the parent for partition dev is the primary device.
>>>> +     */
>>>> +    if (partition_id == QAIC_NO_PARTITION)
>>>> +        pdev = &qdev->pdev->dev;
>>>> +    else
>>>> +        pdev = qdev->base_dev->ddev->dev;
>>>> +
>>>> +    qddev = kzalloc(sizeof(*qddev), GFP_KERNEL);
>>>> +    if (!qddev) {
>>>> +        ret = -ENOMEM;
>>>> +        goto qddev_fail;
> 
> qddev_fail label is just before returning so better return here directly
> 
>>>> +    }
>>>> +
>>>> +    ddev = drm_dev_alloc(&qaic_accel_driver, pdev);
>>>> +    if (IS_ERR(ddev)) {
>>>> +        ret = PTR_ERR(ddev);
>>>> +        goto ddev_fail;
>>>> +    }
>>>> +
>>>> +    ddev->dev_private = qddev;
>>>> +    qddev->ddev = ddev;
>>>> +
>>>> +    if (partition_id == QAIC_NO_PARTITION)
>>>> +        qdev->base_dev = qddev;
>>>> +    qddev->qdev = qdev;
>>>> +    qddev->partition_id = partition_id;
>>>> +    qddev->owner = owner;
>>>> +    INIT_LIST_HEAD(&qddev->users);
>>>> +    mutex_init(&qddev->users_mutex);
>>>> +
>>>> +    mutex_lock(&qdev->qaic_drm_devices_mutex);
>>>> +    list_add(&qddev->node, &qdev->qaic_drm_devices);
>>>> +    mutex_unlock(&qdev->qaic_drm_devices_mutex);
>>>> +
>>>> +    ret = drm_dev_register(ddev, 0);
>>>> +    if (ret) {
>>>> +        pci_dbg(qdev->pdev, "%s: drm_dev_register failed %d\n", 
>>>> __func__, ret);
>>>> +        goto drm_reg_fail;
>>>> +    }
>>>> +
>>>> +    return 0;
>>>> +
>>>> +drm_reg_fail:
>>>> +    mutex_destroy(&qddev->users_mutex);
>>>> +    mutex_lock(&qdev->qaic_drm_devices_mutex);
>>>> +    list_del(&qddev->node);
>>>> +    mutex_unlock(&qdev->qaic_drm_devices_mutex);
>>>> +    if (partition_id == QAIC_NO_PARTITION)
>>>> +        qdev->base_dev = NULL;
>>>> +    drm_dev_put(ddev);
>>>> +ddev_fail:
>>>> +    kfree(qddev);
>>>> +qddev_fail:
> 
> no point for this label
> 
>>>> +    return ret;
>>>> +}
>>>> +
>>>> +static void qaic_destroy_drm_device(struct qaic_device *qdev, s32 
>>>> partition_id,
>>>> +                    struct qaic_user *owner)
>>>> +{
>>>> +    struct qaic_drm_device *qddev;
>>>> +    struct qaic_drm_device *q;
>>>> +    struct qaic_user *usr;
>>>> +
>>>> +    list_for_each_entry_safe(qddev, q, &qdev->qaic_drm_devices, 
>>>> node) {
>>>> +        /*
>>>> +         * Skip devices in case we just want to remove devices
>>>> +         * specific to a owner or partition id.
>>>> +         *
>>>> +         * owner    partition_id    notes
>>>> +         * ----------------------------------
>>>> +         * NULL        NO_PARTITION    delete base + all derived (qdev
>>>> +         *                reset)
>>>> +         * !NULL    NO_PARTITION    delete derived devs created by
>>>> +         *                owner.
>>>> +         * !NULL    >NO_PARTITION    delete derived dev identified by
>>>> +         *                the partition id and created by
>>>> +         *                owner
>>>> +         * NULL        >NO_PARTITION    invalid (no-op)
>>>> +         *
>>>> +         * if partition_id is any value < QAIC_NO_PARTITION this 
>>>> will be
>>>> +         * a no-op.
>>>> +         */
>>>> +        if (owner && owner != qddev->owner)
>>>> +            continue;
>>>> +
>>>> +        if (partition_id != QAIC_NO_PARTITION &&
>>>> +            partition_id != qddev->partition_id && !owner)
>>>> +            continue;
>>>> +
>>>> +        /*
>>>> +         * Existing users get unresolvable errors till they close FDs.
>>>> +         * Need to sync carefully with users calling close().  The
>>>> +         * list of users can be modified elsewhere when the lock isn't
>>>> +         * held here, but the sync'ing the srcu with the mutex held
>>>> +         * could deadlock.  Grab the mutex so that the list will be
>>>> +         * unmodified.  The user we get will exist as long as the
>>>> +         * lock is held.  Signal that the qcdev is going away, and
>>>> +         * grab a reference to the user so they don't go away for
>>>> +         * synchronize_srcu().  Then release the mutex to avoid
>>>> +         * deadlock and make sure the user has observed the signal.
>>>> +         * With the lock released, we cannot maintain any state of the
>>>> +         * user list.
>>>> +         */
>>>> +        mutex_lock(&qddev->users_mutex);
>>>> +        while (!list_empty(&qddev->users)) {
>>>> +            usr = list_first_entry(&qddev->users, struct qaic_user,
>>>> +                           node);
>>>> +            list_del_init(&usr->node);
>>>> +            kref_get(&usr->ref_count);
>>>> +            usr->qddev = NULL;
>>>> +            mutex_unlock(&qddev->users_mutex);
>>>> +            synchronize_srcu(&usr->qddev_lock);
>>>> +            kref_put(&usr->ref_count, free_usr);
>>>> +            mutex_lock(&qddev->users_mutex);
>>>> +        }
>>>> +        mutex_unlock(&qddev->users_mutex);
>>>> +
>>>> +        if (qddev->ddev) {
>>>> +            drm_dev_unregister(qddev->ddev);
>>>> +            drm_dev_put(qddev->ddev);
>>>> +        }
>>>> +
>>>> +        list_del(&qddev->node);
>>>> +        kfree(qddev);
>>>> +    }
>>>> +}
>>>> +
>>>> +static int qaic_mhi_probe(struct mhi_device *mhi_dev,
>>>> +              const struct mhi_device_id *id)
>>>> +{
>>>> +    struct qaic_device *qdev;
>>>> +    u16 major, minor;
>>>> +    int ret;
>>>> +
>>>> +    /*
>>>> +     * Invoking this function indicates that the control channel to 
>>>> the
>>>> +     * device is available.  We use that as a signal to indicate that
>>>> +     * the device side firmware has booted.  The device side firmware
>>>> +     * manages the device resources, so we need to communicate with it
>>>> +     * via the control channel in order to utilize the device.  
>>>> Therefore
>>>> +     * we wait until this signal to create the drm dev that 
>>>> userspace will
>>>> +     * use to control the device, because without the device side 
>>>> firmware,
>>>> +     * userspace can't do anything useful.
>>>> +     */
>>>> +
>>>> +    qdev = pci_get_drvdata(to_pci_dev(mhi_dev->mhi_cntrl->cntrl_dev));
>>>> +
>>>> +    qdev->in_reset = false;
>>>> +
>>>> +    dev_set_drvdata(&mhi_dev->dev, qdev);
>>>> +    qdev->cntl_ch = mhi_dev;
>>>> +
>>>> +    ret = qaic_control_open(qdev);
>>>> +    if (ret) {
>>>> +        pci_dbg(qdev->pdev, "%s: control_open failed %d\n", 
>>>> __func__, ret);
>>>> +        goto err;
>>>> +    }
>>>> +
>>>> +    ret = get_cntl_version(qdev, NULL, &major, &minor);
>>>> +    if (ret || major != cntl_major || minor > cntl_minor) {
>>>> +        pci_err(qdev->pdev, "%s: Control protocol version (%d.%d) 
>>>> not supported.  Supported version is (%d.%d). Ret: %d\n",
>>>> +            __func__, major, minor, cntl_major, cntl_minor, ret);
>>>> +        ret = -EINVAL;
>>>> +        goto close_control;
>>>> +    }
>>>> +
>>>> +    ret = qaic_create_drm_device(qdev, QAIC_NO_PARTITION, NULL);
>>>> +
>>>> +    return ret;
>>>> +
>>>> +close_control:
>>>> +    qaic_control_close(qdev);
>>>> +err:
> 
> same here, not sure there is a point to this label
> 
> Thanks,
> Dafna
> 
>>>> +    return ret;
>>>> +}
>>>> +
>>>> +static void qaic_mhi_remove(struct mhi_device *mhi_dev)
>>>> +{
>>>
>>> Add a comment here
>>
>> Presumably why this is an empty function?
>> Will do.
>>
>>>> +}
>>>> +
>>>> +static void qaic_notify_reset(struct qaic_device *qdev)
>>>> +{
>>>> +    int i;
>>>> +
>>>> +    qdev->in_reset = true;
>>>> +    /* wake up any waiters to avoid waiting for timeouts at sync */
>>>> +    wake_all_cntl(qdev);
>>>> +    for (i = 0; i < qdev->num_dbc; ++i)
>>>> +        wakeup_dbc(qdev, i);
>>>> +    synchronize_srcu(&qdev->dev_lock);
>>>> +}
>>>> +
>>>> +void qaic_dev_reset_clean_local_state(struct qaic_device *qdev, 
>>>> bool exit_reset)
>>>> +{
>>>> +    int i;
>>>> +
>>>> +    qaic_notify_reset(qdev);
>>>> +
>>>> +    /* remove drmdevs to prevent new users from coming in */
>>>> +    if (qdev->base_dev)
>>>> +        qaic_destroy_drm_device(qdev, QAIC_NO_PARTITION, NULL);
>>>> +
>>>> +    /* start tearing things down */
>>>> +    for (i = 0; i < qdev->num_dbc; ++i)
>>>> +        release_dbc(qdev, i);
>>>> +
>>>> +    if (exit_reset)
>>>> +        qdev->in_reset = false;
>>>> +}
>>>> +
>>>> +static int qaic_pci_probe(struct pci_dev *pdev,
>>>> +              const struct pci_device_id *id)
>>>
>>> Please try to simplify this function. Maybe move irq init to separate 
>>> function.
>>> It will be more readable and there will less of a error handling 
>>> spaghetti at the bottom.
>>
>> I guess I tend towards not breaking something up if there is no reuse 
>> elsewhere, but I think I see your point.  Will have a look.
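
As a rough sketch of the shape it could take (hypothetical helper,
untested; it requests exactly 32 vectors instead of doing the <32 check
afterwards, and leaves out the polling setup):

static int qaic_init_irqs(struct qaic_device *qdev)
{
	struct pci_dev *pdev = qdev->pdev;
	int mhi_irq, ret, i;

	ret = pci_alloc_irq_vectors(pdev, 32, 32, PCI_IRQ_MSI);
	if (ret < 0)
		return ret;

	mhi_irq = pci_irq_vector(pdev, 0);
	if (mhi_irq < 0) {
		ret = mhi_irq;
		goto free_vectors;
	}

	for (i = 0; i < qdev->num_dbc; ++i) {
		ret = devm_request_threaded_irq(&pdev->dev,
						pci_irq_vector(pdev, i + 1),
						dbc_irq_handler,
						dbc_irq_threaded_fn,
						IRQF_SHARED, "qaic_dbc",
						&qdev->dbc[i]);
		if (ret)
			goto free_irqs;
	}

	return mhi_irq;

free_irqs:
	while (--i >= 0)
		devm_free_irq(&pdev->dev, pci_irq_vector(pdev, i + 1),
			      &qdev->dbc[i]);
free_vectors:
	pci_free_irq_vectors(pdev);
	return ret;
}

That would leave probe() with a single call and one error label for the
irq side.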
>>
>>>
>>>> +{
>>>> +    int ret;
>>>> +    int i;
>>>> +    int mhi_irq;
>>>> +    struct qaic_device *qdev;
>>>> +
>>>> +    qdev = devm_kzalloc(&pdev->dev, sizeof(*qdev), GFP_KERNEL);
>>>> +    if (!qdev)
>>>> +        return -ENOMEM;
>>>> +
>>>> +    if (id->device == PCI_DEV_AIC100) {
>>>> +        qdev->num_dbc = 16;
>>>> +        qdev->dbc = devm_kcalloc(&pdev->dev, qdev->num_dbc, 
>>>> sizeof(*qdev->dbc),
>>>> +                     GFP_KERNEL);
>>>> +        if (!qdev->dbc)
>>>> +            return -ENOMEM;
>>>> +    }
>>>> +
>>>> +    qdev->cntl_wq = alloc_workqueue("qaic_cntl", WQ_UNBOUND, 0);
>>>> +    if (!qdev->cntl_wq) {
>>>> +        ret = -ENOMEM;
>>>> +        goto wq_fail;
>>>> +    }
>>>> +    pci_set_drvdata(pdev, qdev);
>>>> +    qdev->pdev = pdev;
>>>> +    mutex_init(&qdev->cntl_mutex);
>>>> +    INIT_LIST_HEAD(&qdev->cntl_xfer_list);
>>>> +    init_srcu_struct(&qdev->dev_lock);
>>>> +    INIT_LIST_HEAD(&qdev->qaic_drm_devices);
>>>> +    mutex_init(&qdev->qaic_drm_devices_mutex);
>>>> +    for (i = 0; i < qdev->num_dbc; ++i) {
>>>> +        spin_lock_init(&qdev->dbc[i].xfer_lock);
>>>> +        qdev->dbc[i].qdev = qdev;
>>>> +        qdev->dbc[i].id = i;
>>>> +        INIT_LIST_HEAD(&qdev->dbc[i].xfer_list);
>>>> +        init_srcu_struct(&qdev->dbc[i].ch_lock);
>>>> +        init_waitqueue_head(&qdev->dbc[i].dbc_release);
>>>> +        INIT_LIST_HEAD(&qdev->dbc[i].bo_lists);
>>>> +    }
>>>> +
>>>> +    qdev->bars = pci_select_bars(pdev, IORESOURCE_MEM);
>>>> +
>>>> +    /* make sure the device has the expected BARs */
>>>> +    if (qdev->bars != (BIT(0) | BIT(2) | BIT(4))) {
>>>> +        pci_dbg(pdev, "%s: expected BARs 0, 2, and 4 not found in 
>>>> device.  Found 0x%x\n",
>>>> +            __func__, qdev->bars);
>>>> +        ret = -EINVAL;
>>>> +        goto bar_fail;
>>>> +    }
>>>> +
>>>> +    ret = pci_enable_device(pdev);
>>>> +    if (ret)
>>>> +        goto enable_fail;
>>>> +
>>>> +    ret = pci_request_selected_regions(pdev, qdev->bars, "aic100");
>>>> +    if (ret)
>>>> +        goto request_regions_fail;
>>>> +
>>>> +    pci_set_master(pdev);
>>>> +
>>>> +    ret = dma_set_mask(&pdev->dev, DMA_BIT_MASK(64));
>>>> +    if (ret)
>>>> +        goto dma_mask_fail;
>>>> +    ret = dma_set_coherent_mask(&pdev->dev, DMA_BIT_MASK(64));
>>>> +    if (ret)
>>>> +        goto dma_mask_fail;
>>>> +    ret = dma_set_max_seg_size(&pdev->dev, UINT_MAX);
>>>> +    if (ret)
>>>> +        goto dma_mask_fail;
>>>> +
>>>> +    qdev->bar_0 = pci_ioremap_bar(pdev, 0);
>>>> +    if (!qdev->bar_0) {
>>>> +        ret = -ENOMEM;
>>>> +        goto ioremap_0_fail;
>>>> +    }
>>>> +
>>>> +    qdev->bar_2 = pci_ioremap_bar(pdev, 2);
>>>> +    if (!qdev->bar_2) {
>>>> +        ret = -ENOMEM;
>>>> +        goto ioremap_2_fail;
>>>> +    }
>>>> +
>>>> +    for (i = 0; i < qdev->num_dbc; ++i)
>>>> +        qdev->dbc[i].dbc_base = qdev->bar_2 + QAIC_DBC_OFF(i);
>>>> +
>>>> +    ret = pci_alloc_irq_vectors(pdev, 1, 32, PCI_IRQ_MSI);
>>>> +    if (ret < 0)
>>>> +        goto alloc_irq_fail;
>>>> +
>>>> +    if (ret < 32) {
>>>> +        pci_err(pdev, "%s: Requested 32 MSIs.  Obtained %d MSIs 
>>>> which is less than the 32 required.\n",
>>>> +            __func__, ret);
>>>> +        ret = -ENODEV;
>>>> +        goto invalid_msi_config;
>>>> +    }
>>>> +
>>>> +    mhi_irq = pci_irq_vector(pdev, 0);
>>>> +    if (mhi_irq < 0) {
>>>> +        ret = mhi_irq;
>>>> +        goto get_mhi_irq_fail;
>>>> +    }
>>>> +
>>>> +    for (i = 0; i < qdev->num_dbc; ++i) {
>>>> +        ret = devm_request_threaded_irq(&pdev->dev,
>>>> +                        pci_irq_vector(pdev, i + 1),
>>>> +                        dbc_irq_handler,
>>>> +                        dbc_irq_threaded_fn,
>>>> +                        IRQF_SHARED,
>>>> +                        "qaic_dbc",
>>>> +                        &qdev->dbc[i]);
>>>> +        if (ret)
>>>> +            goto get_dbc_irq_failed;
>>>> +
>>>> +        if (poll_datapath) {
>>>> +            qdev->dbc[i].irq = pci_irq_vector(pdev, i + 1);
>>>> +            disable_irq_nosync(qdev->dbc[i].irq);
>>>> +            INIT_WORK(&qdev->dbc[i].poll_work, irq_polling_work);
>>>> +        }
>>>> +    }
>>>> +
>>>> +    qdev->mhi_cntl = qaic_mhi_register_controller(pdev, 
>>>> qdev->bar_0, mhi_irq);
>>>> +    if (IS_ERR(qdev->mhi_cntl)) {
>>>> +        ret = PTR_ERR(qdev->mhi_cntl);
>>>> +        goto mhi_register_fail;
>>>> +    }
>>>> +
>>>> +    return 0;
>>>> +
>>>> +mhi_register_fail:
>>>> +get_dbc_irq_failed:
>>>
>>> I don't think that duplicated goto statements are allowed by the 
>>> coding style.
>>> These should be rather named something like err_free_irq.
>>> See 
>>> https://www.kernel.org/doc/html/v4.10/process/coding-style.html#centralized-exiting-of-functions 
>>>
>>
>> I took another look at that link, and I do not believe it prohibits 
>> duplicated goto labels, but they probably could be renamed and 
>> simplified as you indicate.  Will have a look at that.
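
e.g. something along these lines (label names illustrative):

err_free_irq:
	for (i = 0; i < qdev->num_dbc; ++i)
		devm_free_irq(&pdev->dev, pci_irq_vector(pdev, i + 1),
			      &qdev->dbc[i]);
err_free_vectors:
	pci_free_irq_vectors(pdev);
err_unmap_bar_2:
	iounmap(qdev->bar_2);
	...

so each label names the first cleanup step it performs rather than the
failure that jumps to it.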
>>
>>>
>>>> +    for (i = 0; i < qdev->num_dbc; ++i)
>>>> +        devm_free_irq(&pdev->dev, pci_irq_vector(pdev, i + 1),
>>>> +                  &qdev->dbc[i]);
>>>> +get_mhi_irq_fail:
>>>> +invalid_msi_config:
>>>> +    pci_free_irq_vectors(pdev);
>>>> +alloc_irq_fail:
>>>> +    iounmap(qdev->bar_2);
>>>> +ioremap_2_fail:
>>>> +    iounmap(qdev->bar_0);
>>>> +ioremap_0_fail:
>>>> +dma_mask_fail:
>>>> +    pci_clear_master(pdev);
>>>> +    pci_release_selected_regions(pdev, qdev->bars);
>>>> +request_regions_fail:
>>>> +    pci_disable_device(pdev);
>>>> +enable_fail:
>>>> +    pci_set_drvdata(pdev, NULL);
>>>> +bar_fail:
>>>> +    for (i = 0; i < qdev->num_dbc; ++i)
>>>> +        cleanup_srcu_struct(&qdev->dbc[i].ch_lock);
>>>> +    cleanup_srcu_struct(&qdev->dev_lock);
>>>> +    destroy_workqueue(qdev->cntl_wq);
>>>> +wq_fail:
>>>> +    return ret;
>>>> +}
>>>> +
>>>> +static void qaic_pci_remove(struct pci_dev *pdev)
>>>> +{
>>>> +    struct qaic_device *qdev = pci_get_drvdata(pdev);
>>>> +    int i;
>>>> +
>>>> +    if (!qdev)
>>>> +        return;
>>>> +
>>>> +    qaic_dev_reset_clean_local_state(qdev, false);
>>>> +    qaic_mhi_free_controller(qdev->mhi_cntl, link_up);
>>>> +    for (i = 0; i < qdev->num_dbc; ++i) {
>>>> +        devm_free_irq(&pdev->dev, pci_irq_vector(pdev, i + 1),
>>>> +                  &qdev->dbc[i]);
>>>> +        cleanup_srcu_struct(&qdev->dbc[i].ch_lock);
>>>> +    }
>>>> +    destroy_workqueue(qdev->cntl_wq);
>>>> +    pci_free_irq_vectors(pdev);
>>>> +    iounmap(qdev->bar_0);
>>>> +    pci_clear_master(pdev);
>>>> +    pci_release_selected_regions(pdev, qdev->bars);
>>>> +    pci_disable_device(pdev);
>>>> +    pci_set_drvdata(pdev, NULL);
>>>> +}
>>>> +
>>>> +static void qaic_pci_shutdown(struct pci_dev *pdev)
>>>> +{
>>>> +    /* see qaic_exit for what link_up is doing */
>>>> +    link_up = true;
>>>> +    qaic_pci_remove(pdev);
>>>> +}
>>>> +
>>>> +static pci_ers_result_t qaic_pci_error_detected(struct pci_dev *pdev,
>>>> +                        pci_channel_state_t error)
>>>> +{
>>>> +    return PCI_ERS_RESULT_NEED_RESET;
>>>> +}
>>>> +
>>>> +static void qaic_pci_reset_prepare(struct pci_dev *pdev)
>>>> +{
>>>> +    struct qaic_device *qdev = pci_get_drvdata(pdev);
>>>> +
>>>> +    qaic_notify_reset(qdev);
>>>> +    qaic_mhi_start_reset(qdev->mhi_cntl);
>>>> +    qaic_dev_reset_clean_local_state(qdev, false);
>>>> +}
>>>> +
>>>> +static void qaic_pci_reset_done(struct pci_dev *pdev)
>>>> +{
>>>> +    struct qaic_device *qdev = pci_get_drvdata(pdev);
>>>> +
>>>> +    qdev->in_reset = false;
>>>> +    qaic_mhi_reset_done(qdev->mhi_cntl);
>>>> +}
>>>> +
>>>> +static const struct mhi_device_id qaic_mhi_match_table[] = {
>>>> +    { .chan = "QAIC_CONTROL", },
>>>> +    {},
>>>> +};
>>>> +
>>>> +static struct mhi_driver qaic_mhi_driver = {
>>>> +    .id_table = qaic_mhi_match_table,
>>>> +    .remove = qaic_mhi_remove,
>>>> +    .probe = qaic_mhi_probe,
>>>> +    .ul_xfer_cb = qaic_mhi_ul_xfer_cb,
>>>> +    .dl_xfer_cb = qaic_mhi_dl_xfer_cb,
>>>> +    .driver = {
>>>> +        .name = "qaic_mhi",
>>>> +    },
>>>> +};
>>>> +
>>>> +static const struct pci_device_id qaic_ids[] = {
>>>> +    { PCI_DEVICE(PCI_VENDOR_ID_QCOM, PCI_DEV_AIC100), },
>>>> +    { }
>>>> +};
>>>> +MODULE_DEVICE_TABLE(pci, qaic_ids);
>>>> +
>>>> +static const struct pci_error_handlers qaic_pci_err_handler = {
>>>> +    .error_detected = qaic_pci_error_detected,
>>>> +    .reset_prepare = qaic_pci_reset_prepare,
>>>> +    .reset_done = qaic_pci_reset_done,
>>>> +};
>>>> +
>>>> +static struct pci_driver qaic_pci_driver = {
>>>> +    .name = QAIC_NAME,
>>>> +    .id_table = qaic_ids,
>>>> +    .probe = qaic_pci_probe,
>>>> +    .remove = qaic_pci_remove,
>>>> +    .shutdown = qaic_pci_shutdown,
>>>> +    .err_handler = &qaic_pci_err_handler,
>>>> +};
>>>> +
>>>> +static int __init qaic_init(void)
>>>> +{
>>>> +    int ret;
>>>> +
>>>> +    if (datapath_polling)
>>>> +        poll_datapath = true;
>>>> +
>>>> +    ret = mhi_driver_register(&qaic_mhi_driver);
>>>> +    if (ret) {
>>>> +        pr_debug("qaic: mhi_driver_register failed %d\n", ret);
>>>> +        goto free_class;
>>>> +    }
>>>> +
>>>> +    ret = pci_register_driver(&qaic_pci_driver);
>>>> +    if (ret) {
>>>> +        pr_debug("qaic: pci_register_driver failed %d\n", ret);
>>>> +        goto free_mhi;
>>>> +    }
>>>> +
>>>> +    ret = mhi_qaic_ctrl_init();
>>>> +    if (ret) {
>>>> +        pr_debug("qaic: mhi_qaic_ctrl_init failed %d\n", ret);
>>>> +        goto free_pci;
>>>> +    }
>>>> +
>>>> +    return 0;
>>>> +
>>>> +free_pci:
>>>> +    pci_unregister_driver(&qaic_pci_driver);
>>>> +free_mhi:
>>>> +    mhi_driver_unregister(&qaic_mhi_driver);
>>>> +free_class:
>>>
>>> This label doesn't free anything. It should be renamed.
>>
>> Doh.  Holdover from a previous version.  It does need to be renamed.
>>
>>>
>>>> +    return ret;
>>>> +}
>>>> +
>>>> +static void __exit qaic_exit(void)
>>>> +{
>>>> +    /*
>>>> +     * We assume that qaic_pci_remove() is called due to a hotplug 
>>>> event
>>>> +     * which would mean that the link is down, and thus
>>>> +     * qaic_mhi_free_controller() should not try to access the 
>>>> device during
>>>> +     * cleanup.
>>>> +     * We call pci_unregister_driver() below, which also triggers
>>>> +     * qaic_pci_remove(), but since this is module exit, we expect 
>>>> the link
>>>> +     * to the device to be up, in which case 
>>>> qaic_mhi_free_controller()
>>>> +     * should try to access the device during cleanup to put the 
>>>> device in
>>>> +     * a sane state.
>>>> +     * For that reason, we set link_up here to let 
>>>> qaic_mhi_free_controller
>>>> +     * know the expected link state.  Since the module is going to be
>>>> +     * removed at the end of this, we don't need to worry about
>>>> +     * reinitializing the link_up state after the cleanup is done.
>>>> +     */
>>>> +    link_up = true;
>>>
>>> Maybe you could just use pdev->current_state instead of link_up?
>>
>> Maybe.  Will have a look at that.
>>
>>>
>>>> +    mhi_qaic_ctrl_deinit();
>>>> +    pci_unregister_driver(&qaic_pci_driver);
>>>> +    mhi_driver_unregister(&qaic_mhi_driver);
>>>> +}
>>>> +
>>>> +module_init(qaic_init);
>>>> +module_exit(qaic_exit);
>>>> +
>>>> +MODULE_AUTHOR(QAIC_DESC " Kernel Driver Team");
>>>> +MODULE_DESCRIPTION(QAIC_DESC " Accel Driver");
>>>> +MODULE_LICENSE("GPL");
>>>> diff --git a/include/uapi/drm/qaic_accel.h 
>>>> b/include/uapi/drm/qaic_accel.h
>>>> new file mode 100644
>>>> index 0000000..d5fa6f5
>>>> --- /dev/null
>>>> +++ b/include/uapi/drm/qaic_accel.h
>>>> @@ -0,0 +1,283 @@
>>>> +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note
>>>> + *
>>>> + * Copyright (c) 2019-2020, The Linux Foundation. All rights reserved.
>>>> + * Copyright (c) 2021-2023 Qualcomm Innovation Center, Inc. All 
>>>> rights reserved.
>>>> + */
>>>> +
>>>> +#ifndef QAIC_ACCEL_H_
>>>> +#define QAIC_ACCEL_H_
>>>> +
>>>> +#include <linux/ioctl.h>
>>>> +#include <linux/types.h>
>>>
>>> These two headers should not be needed here.
>>
>> Fair enough.  Listed out of paranoia since they define things used in 
>> this header.
>>
>>>
>>>> +#include "drm.h"
>>>> +
>>>> +#if defined(__CPLUSPLUS)
>>>
>>> Use lowercase here: __cplusplus
>>
>> Doh.  Not sure why uppercase was used.  The referenced examples sure 
>> didn't have that.
>>
>>>
>>>> +extern "C" {
>>>> +#endif
>>>> +
>>>> +#define QAIC_MANAGE_MAX_MSG_LENGTH SZ_4K    /**<
>>>> +                          * The length(4K) includes len and
>>>> +                          * count fields of qaic_manage_msg
>>>> +                          */
>>>
>>> I guess these are doxygen style commands but you should be using 
>>> kernel-doc here.
>>> See https://docs.kernel.org/doc-guide/kernel-doc.html.
>>> This can be used to verify the header:
>>> $ scripts/kernel-doc -v -none include/uapi/drm/qaic_accel.h
>>
>> include/uapi/drm/drm.h was used as the example for these, and thus it 
>> was assumed that was the DRM style.
>>
>> I'm not convinced kernel-doc is applicable to all of these.  Will review.
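
For the structs it would look something like this (taking qaic_mmap_bo
as an example; wording illustrative):

/**
 * struct qaic_mmap_bo - Defines the request to prepare a BO for mmap()
 * @handle: Handle of the GEM BO to prepare
 * @pad: Structure padding, must be 0
 * @offset: Returned offset to use in the subsequent mmap() call
 */
struct qaic_mmap_bo {
	__u32 handle;
	__u32 pad;
	__u64 offset;
};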
>>
>>>
>>>> +
>>>> +enum qaic_sem_flags {
>>>
>>> Is there any specific reason for enums if all values are explicitly 
>>> given?
>>
>> It is a style pulled from elsewhere.  I can remove the enums.
>>
>>>
>>>> +    SEM_INSYNCFENCE =    0x1,
>>>
>>> All these enums/defines end up in a global user space namespace.
>>> I would advise to prefix everything with QAIC_ or DRM_QAIC_ (e.g. 
>>> QAIC_SEM_INSYNCFENCE)
>>> to avoid conflicts with other drivers or user space libs.
>>
>> Good point.  Will do.
>>
>>>
>>>> +    SEM_OUTSYNCFENCE =    0x2,
>>>> +};
>>>> +
>>>> +enum qaic_sem_cmd {
>>>> +    SEM_NOP =        0,
>>>> +    SEM_INIT =        1,
>>>> +    SEM_INC =        2,
>>>> +    SEM_DEC =        3,
>>>> +    SEM_WAIT_EQUAL =    4,
>>>> +    SEM_WAIT_GT_EQ =    5, /**< Greater than or equal */
>>>> +    SEM_WAIT_GT_0 =        6, /**< Greater than 0 */
>>>> +};
>>>> +
>>>> +enum qaic_manage_transaction_type {
>>>> +    TRANS_UNDEFINED =            0,
>>>> +    TRANS_PASSTHROUGH_FROM_USR =        1,
>>>> +    TRANS_PASSTHROUGH_TO_USR =        2,
>>>> +    TRANS_PASSTHROUGH_FROM_DEV =        3,
>>>> +    TRANS_PASSTHROUGH_TO_DEV =        4,
>>>> +    TRANS_DMA_XFER_FROM_USR =        5,
>>>> +    TRANS_DMA_XFER_TO_DEV =            6,
>>>> +    TRANS_ACTIVATE_FROM_USR =        7,
>>>> +    TRANS_ACTIVATE_FROM_DEV =        8,
>>>> +    TRANS_ACTIVATE_TO_DEV =            9,
>>>> +    TRANS_DEACTIVATE_FROM_USR =        10,
>>>> +    TRANS_DEACTIVATE_FROM_DEV =        11,
>>>> +    TRANS_STATUS_FROM_USR =            12,
>>>> +    TRANS_STATUS_TO_USR =            13,
>>>> +    TRANS_STATUS_FROM_DEV =            14,
>>>> +    TRANS_STATUS_TO_DEV =            15,
>>>> +    TRANS_TERMINATE_FROM_DEV =        16,
>>>> +    TRANS_TERMINATE_TO_DEV =        17,
>>>> +    TRANS_DMA_XFER_CONT =            18,
>>>> +    TRANS_VALIDATE_PARTITION_FROM_DEV =    19,
>>>> +    TRANS_VALIDATE_PARTITION_TO_DEV =    20,
>>>> +    TRANS_MAX =                21
>>>> +};
>>>> +
>>>> +struct qaic_manage_trans_hdr {
>>>> +    __u32 type;    /**< in, value from enum manage_transaction_type */
>>>> +    __u32 len;    /**< in, length of this transaction, including 
>>>> the header */
>>>> +};
>>>> +
>>>> +struct qaic_manage_trans_passthrough {
>>>> +    struct qaic_manage_trans_hdr hdr;
>>>> +    __u8 data[];    /**< in, userspace must encode in little endian */
>>>> +};
>>>> +
>>>> +struct qaic_manage_trans_dma_xfer {
>>>> +    struct qaic_manage_trans_hdr hdr;
>>>> +    __u32 tag;    /**< in, device specific */
>>>> +    __u32 count;    /**< in */
>>>> +    __u64 addr;    /**< in, address of the data to transferred via 
>>>> DMA */
>>>> +    __u64 size;    /**< in, length of the data to transferred via 
>>>> DMA */
>>>> +};
>>>> +
>>>> +struct qaic_manage_trans_activate_to_dev {
>>>> +    struct qaic_manage_trans_hdr hdr;
>>>> +    __u32 queue_size;    /**<
>>>> +                  * in, number of elements in DBC request
>>>> +                  * and respose queue
>>>> +                  */
>>>> +    __u32 eventfd;        /**< in */
>>>> +    __u32 options;        /**< in, device specific */
>>>> +    __u32 pad;        /**< pad must be 0 */
>>>> +};
>>>> +
>>>> +struct qaic_manage_trans_activate_from_dev {
>>>> +    struct qaic_manage_trans_hdr hdr;
>>>> +    __u32 status;    /**< out, status of activate transaction */
>>>> +    __u32 dbc_id;    /**< out, Identifier of assigned DMA Bridge 
>>>> channel */
>>>> +    __u64 options;    /**< out */
>>>> +};
>>>> +
>>>> +struct qaic_manage_trans_deactivate {
>>>> +    struct qaic_manage_trans_hdr hdr;
>>>> +    __u32 dbc_id;    /**< in, Identifier of assigned DMA Bridge 
>>>> channel */
>>>> +    __u32 pad;    /**< pad must be 0 */
>>>> +};
>>>> +
>>>> +struct qaic_manage_trans_status_to_dev {
>>>> +    struct qaic_manage_trans_hdr hdr;
>>>> +};
>>>> +
>>>> +struct qaic_manage_trans_status_from_dev {
>>>> +    struct qaic_manage_trans_hdr hdr;
>>>> +    __u16 major;    /**< out, major vesrion of NNC protocol used by 
>>>> device */
>>>> +    __u16 minor;    /**< out, minor vesrion of NNC protocol used by 
>>>> device */
>>>
>>> vesrion -> version
>>
>> Will fix.
>>
>>>
>>>> +    __u32 status;    /**< out, status of query transaction  */
>>>> +    __u64 status_flags;    /**<
>>>> +                  * out
>>>> +                  * 0    : If set then device has CRC check enabled
>>>> +                  * 1:63 : Unused
>>>> +                  */
>>>> +};
>>>> +
>>>> +struct qaic_manage_msg {
>>>> +    __u32 len;    /**< in, Length of valid data - ie sum of all 
>>>> transactions */
>>>> +    __u32 count;    /**< in, Number of transactions in message */
>>>> +    __u64 data;    /**< in, Pointer to array of transactions */
>>>> +};
>>>> +
>>>> +struct qaic_create_bo {
>>>> +    __u64 size;    /**< in, Size of BO in byte */
>>>> +    __u32 handle;    /**< out, Returned GEM handle for the BO */
>>>> +    __u32 pad;    /**< pad must be 0 */
>>>> +};
>>>> +
>>>> +struct qaic_mmap_bo {
>>>> +    __u32 handle;    /**< in, Handle for the BO being mapped. */
>>>
>>> The comment is misleading. BO is not mapped by this ioctl().
>>
>> It's mapping the handle to the offset, which is a type of mapping.  I 
>> think I get your point though.  Will try to wordsmith this better.
>>
>>>
>>>> +    __u32 pad;    /**< pad must be 0 */
>>>> +    __u64 offset;    /**<
>>>> +              * out, offset into the drm node to use for
>>>> +              * subsequent mmap call
>>>> +              */
>>>> +};
>>>> +
>>>> +/**
>>>> + * @brief semaphore command
>>>> + */
>>>> +struct qaic_sem {
>>>> +    __u16 val;    /**< in, Only lower 12 bits are valid */
>>>> +    __u8  index;    /**< in, Only lower 5 bits are valid */
>>>> +    __u8  presync;    /**< in, 1 if presync operation, 0 if 
>>>> postsync */
>>>> +    __u8  cmd;    /**< in, See enum sem_cmd */
>>>> +    __u8  flags;    /**< in, See sem_flags for valid bits.  All 
>>>> others must be 0 */
>>>> +    __u16 pad;    /**< pad must be 0 */
>>>> +};
>>>> +
>>>> +struct qaic_attach_slice_entry {
>>>> +    __u64 size;        /**< in, Size memory to allocate for this BO 
>>>> slice */
>>>> +    struct qaic_sem    sem0;    /**< in, Must be zero if not valid */
>>>> +    struct qaic_sem    sem1;    /**< in, Must be zero if not valid */
>>>> +    struct qaic_sem    sem2;    /**< in, Must be zero if not valid */
>>>> +    struct qaic_sem    sem3;    /**< in, Must be zero if not valid */
>>>> +    __u64 dev_addr;        /**< in, Address in device to/from which 
>>>> data is copied */
>>>> +    __u64 db_addr;        /**< in, Doorbell address */
>>>> +    __u32 db_data;        /**< in, Data to write to doorbell */
>>>> +    __u32 db_len;        /**<
>>>> +                  * in, Doorbell length - 32, 16, or 8 bits.
>>>> +                  * 0 means doorbell is inactive
>>>> +                  */
>>>> +    __u64 offset;        /**< in, Offset from start of buffer */
>>>> +};
>>>> +
>>>> +struct qaic_attach_slice_hdr {
>>>> +    __u32 count;    /**< in, Number of slices for this BO */
>>>> +    __u32 dbc_id;    /**< in, Associate this BO with this DMA 
>>>> Bridge channel */
>>>> +    __u32 handle;    /**< in, Handle of BO to which slicing 
>>>> information is to be attached */
>>>> +    __u32 dir;    /**< in, Direction of data: 1 = DMA_TO_DEVICE, 2 
>>>> = DMA_FROM_DEVICE */
>>>> +    __u64 size;    /**<
>>>> +              * in, Total length of BO
>>>> +              * If BO is imported (DMABUF/PRIME) then this size
>>>> +              * should not exceed the size of DMABUF provided.
>>>> +              * If BO is allocated using DRM_IOCTL_QAIC_CREATE_BO
>>>> +              * then this size should be exactly same as the size
>>>> +              * provided during DRM_IOCTL_QAIC_CREATE_BO.
>>>> +              */
>>>> +};
>>>> +
>>>> +struct qaic_attach_slice {
>>>> +    struct qaic_attach_slice_hdr hdr;
>>>> +    __u64 data;    /**<
>>>> +              * in, Pointer to a buffer which is container of
>>>> +              * struct qaic_attach_slice_entry[]
>>>> +              */
>>>> +};
>>>> +
>>>> +struct qaic_execute_entry {
>>>> +    __u32 handle;    /**< in, buffer handle */
>>>> +    __u32 dir;    /**< in, 1 = to device, 2 = from device */
>>>> +};
>>>> +
>>>> +struct qaic_partial_execute_entry {
>>>> +    __u32 handle;    /**< in, buffer handle */
>>>> +    __u32 dir;    /**< in, 1 = to device, 2 = from device */
>>>> +    __u64 resize;    /**< in, 0 = no resize */
>>>> +};
>>>> +
>>>> +struct qaic_execute_hdr {
>>>> +    __u32 count;    /**< in, number of executes following this 
>>>> header */
>>>> +    __u32 dbc_id;    /**< in, Identifier of assigned DMA Bridge 
>>>> channel */
>>>> +};
>>>> +
>>>> +struct qaic_execute {
>>>> +    struct qaic_execute_hdr hdr;
>>>> +    __u64 data;    /**< in, qaic_execute_entry or 
>>>> qaic_partial_execute_entry container */
>>>> +};
>>>> +
>>>> +struct qaic_wait {
>>>> +    __u32 handle;    /**< in, handle to wait on until execute is 
>>>> complete */
>>>> +    __u32 timeout;    /**< in, timeout for wait(in ms) */
>>>> +    __u32 dbc_id;    /**< in, Identifier of assigned DMA Bridge 
>>>> channel */
>>>> +    __u32 pad;    /**< pad must be 0 */
>>>> +};
>>>> +
>>>> +struct qaic_perf_stats_hdr {
>>>> +    __u16 count;    /**< in, Total number BOs requested */
>>>> +    __u16 pad;    /**< pad must be 0 */
>>>> +    __u32 dbc_id;    /**< in, Identifier of assigned DMA Bridge 
>>>> channel */
>>>> +};
>>>> +
>>>> +struct qaic_perf_stats {
>>>> +    struct qaic_perf_stats_hdr hdr;
>>>> +    __u64 data;    /**< in, qaic_perf_stats_entry container */
>>>> +};
>>>> +
>>>> +struct qaic_perf_stats_entry {
>>>> +    __u32 handle;            /**< in, Handle of the memory request */
>>>> +    __u32 queue_level_before;    /**<
>>>> +                      * out, Number of elements in queue
>>>> +                      * before submission given memory request
>>>> +                      */
>>>> +    __u32 num_queue_element;    /**<
>>>> +                      * out, Number of elements to add in the
>>>> +                      * queue for given memory request
>>>> +                      */
>>>> +    __u32 submit_latency_us;    /**<
>>>> +                      * out, Time taken by kernel to submit
>>>> +                      * the request to device
>>>> +                      */
>>>> +    __u32 device_latency_us;    /**<
>>>> +                      * out, Time taken by device to execute the
>>>> +                      * request. 0 if request is not completed
>>>> +                      */
>>>> +    __u32 pad;            /**< pad must be 0 */
>>>> +};
>>>> +
>>>> +struct qaic_part_dev {
>>>> +    __u32 partition_id;    /**< in, reservation id */
>>>> +    __u16 remove;        /**< in, 1 - Remove device 0 - Create 
>>>> device */
>>>> +    __u16 pad;        /**< pad must be 0 */
>>>> +};
>>>> +
>>>> +#define DRM_QAIC_MANAGE                0x00
>>>> +#define DRM_QAIC_CREATE_BO            0x01
>>>> +#define DRM_QAIC_MMAP_BO            0x02
>>>
>>> I know that MMAP_BO ioctl is common in drm drivers but in my opinion 
>>> it is a very poor name.
>>> I suggest naming it BO_INFO so in future you could extend it with 
>>> other bo params besides
>>> mmap offset.
>>
>> I see your point, but I don't see how to implement it.
>>
>> Let us assume this IOCTL is renamed DRM_QAIC_BO_INFO, and struct 
>> qaic_mmap_bo is named qaic_bo_info.
>>
>> qaic_bo_info is part of the uAPI.  If "tomorrow" we need to add a 
>> field to it, we cannot.  That changes the structure size and breaks 
>> the uAPI.
>>
>> I can't add reserved fields in anticipation of future use since GregKH 
>> had said not to do that - 
>> https://lore.kernel.org/all/20200514163734.GB3154055@kroah.com/
>>
>> So, I'm thinking the only approach would be a linked list similar to 
>> EXECUTE_BO where qaic_bo_info holds a userspace pointer that is the 
>> head of the list, and new list nodes can be defined in future.  That 
>> seems like an overengineered solution to some hypothetical future 
>> which may not come to pass.  If this is the solution, then I think I'd 
>> prefer to leave this ioctl as is, and deprecate it if the extension 
>> usecase ever comes to pass.
>>
>> Thoughts?  Am I missing something?
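>>
>> To illustrate, the hypothetical extensible version would be roughly
>> this (not proposing it, names made up):
>>
>> struct qaic_bo_info_entry_hdr {
>> 	__u32 type;	/* what this entry carries, e.g. the mmap offset */
>> 	__u32 len;	/* length of this entry, including this header */
>> };
>>
>> struct qaic_bo_info {
>> 	__u32 handle;	/* in, BO to query */
>> 	__u32 pad;	/* must be 0 */
>> 	__u64 entries;	/* in, user pointer to a buffer of entries */
>> };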
>>
>>>
>>>> +#define DRM_QAIC_ATTACH_SLICE_BO        0x03
>>>> +#define DRM_QAIC_EXECUTE_BO            0x04
>>>> +#define DRM_QAIC_PARTIAL_EXECUTE_BO        0x05
>>>> +#define DRM_QAIC_WAIT_BO            0x06
>>>> +#define DRM_QAIC_PERF_STATS_BO            0x07
>>>> +#define DRM_QAIC_PART_DEV            0x08
>>>> +
>>>> +#define DRM_IOCTL_QAIC_MANAGE            DRM_IOWR(DRM_COMMAND_BASE 
>>>> + DRM_QAIC_MANAGE, struct qaic_manage_msg)
>>>> +#define DRM_IOCTL_QAIC_CREATE_BO        DRM_IOWR(DRM_COMMAND_BASE + 
>>>> DRM_QAIC_CREATE_BO,    struct qaic_create_bo)
>>>> +#define DRM_IOCTL_QAIC_MMAP_BO            DRM_IOWR(DRM_COMMAND_BASE 
>>>> + DRM_QAIC_MMAP_BO, struct qaic_mmap_bo)
>>>> +#define DRM_IOCTL_QAIC_ATTACH_SLICE_BO        
>>>> DRM_IOW(DRM_COMMAND_BASE + DRM_QAIC_ATTACH_SLICE_BO, struct 
>>>> qaic_attach_slice)
>>>> +#define DRM_IOCTL_QAIC_EXECUTE_BO        DRM_IOW(DRM_COMMAND_BASE + 
>>>> DRM_QAIC_EXECUTE_BO,    struct qaic_execute)
>>>> +#define DRM_IOCTL_QAIC_PARTIAL_EXECUTE_BO    
>>>> DRM_IOW(DRM_COMMAND_BASE + DRM_QAIC_PARTIAL_EXECUTE_BO,    struct 
>>>> qaic_execute)
>>>> +#define DRM_IOCTL_QAIC_WAIT_BO            DRM_IOW(DRM_COMMAND_BASE 
>>>> + DRM_QAIC_WAIT_BO, struct qaic_wait)
>>>> +#define DRM_IOCTL_QAIC_PERF_STATS_BO        
>>>> DRM_IOWR(DRM_COMMAND_BASE + DRM_QAIC_PERF_STATS_BO, struct 
>>>> qaic_perf_stats)
>>>> +#define DRM_IOCTL_QAIC_PART_DEV            
>>>> DRM_IOWR(DRM_COMMAND_BASE + DRM_QAIC_PART_DEV, struct qaic_part_dev)
>>>> +
>>>> +#if defined(__CPLUSPLUS)
>>>
>>> Use lowercase here: __cplusplus
>>
>> Will do.
>>
>>>
>>>> +}
>>>> +#endif
>>>> +
>>>> +#endif /* QAIC_ACCEL_H_ */
>>>
>>> Regards,
>>> Jacek
>>>
>>


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 2/8] accel/qaic: Add uapi and core driver file
@ 2023-02-24 21:21           ` Jeffrey Hugo
  0 siblings, 0 replies; 68+ messages in thread
From: Jeffrey Hugo @ 2023-02-24 21:21 UTC (permalink / raw)
  To: Dafna Hirschfeld
  Cc: linux-doc, linux-arm-msm, ogabbay, dri-devel, quic_ajitpals,
	quic_pkanojiy, stanislaw.gruszka, quic_carlv, Jacek Lawrynowicz

On 2/22/2023 8:52 AM, Dafna Hirschfeld wrote:
> On 17.02.2023 11:15, Jeffrey Hugo wrote:
>> On 2/16/2023 7:13 AM, Jacek Lawrynowicz wrote:
>>> Hi,
>>>
>>> On 06.02.2023 16:41, Jeffrey Hugo wrote:
>>>> Add the QAIC driver uapi file and core driver file that binds to the 
>>>> PCIe
>>>> device.  The core driver file also creates the accel device and manages
>>>> all the interconnections between the different parts of the driver.
>>>>
>>>> The driver can be built as a module.  If so, it will be called 
>>>> "qaic.ko".
>>>>
>>>> Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
>>>> Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com>
>>>> ---
>>>>  drivers/accel/qaic/qaic.h     | 321 ++++++++++++++++++
>>>>  drivers/accel/qaic/qaic_drv.c | 771 
>>>> ++++++++++++++++++++++++++++++++++++++++++
>>>>  include/uapi/drm/qaic_accel.h | 283 ++++++++++++++++
>>>>  3 files changed, 1375 insertions(+)
>>>>  create mode 100644 drivers/accel/qaic/qaic.h
>>>>  create mode 100644 drivers/accel/qaic/qaic_drv.c
>>>>  create mode 100644 include/uapi/drm/qaic_accel.h
>>>>
>>>> diff --git a/drivers/accel/qaic/qaic.h b/drivers/accel/qaic/qaic.h
>>>> new file mode 100644
>>>> index 0000000..3f7ea76
>>>> --- /dev/null
>>>> +++ b/drivers/accel/qaic/qaic.h
>>>> @@ -0,0 +1,321 @@
>>>> +/* SPDX-License-Identifier: GPL-2.0-only
>>>> + *
>>>> + * Copyright (c) 2019-2021, The Linux Foundation. All rights reserved.
>>>> + * Copyright (c) 2021-2023 Qualcomm Innovation Center, Inc. All 
>>>> rights reserved.
>>>> + */
>>>> +
>>>> +#ifndef QAICINTERNAL_H_
>>>
>>> Please use guard macro that matches the file name: _QAIC_H_
>>
>> Before moving to DRM/ACCEL, this conflicted with the uapi file. 
>> However, that is no longer the case, so yes, this should be changed. 
>> Will do.
>>
>>>
>>>> +#define QAICINTERNAL_H_
>>>> +
>>>> +#include <linux/interrupt.h>
>>>> +#include <linux/kref.h>
>>>> +#include <linux/mhi.h>
>>>> +#include <linux/mutex.h>
>>>> +#include <linux/pci.h>
>>>> +#include <linux/spinlock.h>
>>>> +#include <linux/srcu.h>
>>>> +#include <linux/wait.h>
>>>> +#include <linux/workqueue.h>
>>>> +#include <drm/drm_device.h>
>>>> +#include <drm/drm_gem.h>
>>>> +
>>>> +#define QAIC_DBC_BASE        0x20000
>>>> +#define QAIC_DBC_SIZE        0x1000
>>>> +
>>>> +#define QAIC_NO_PARTITION    -1
>>>> +
>>>> +#define QAIC_DBC_OFF(i)        ((i) * QAIC_DBC_SIZE + QAIC_DBC_BASE)
>>>> +
>>>> +#define to_qaic_bo(obj) container_of(obj, struct qaic_bo, base)
>>>> +
>>>> +extern bool poll_datapath;
>>>> +
>>>> +struct qaic_user {
>>>> +    /* Uniquely identifies this user for the device */
>>>> +    int            handle;
>>>> +    struct kref        ref_count;
>>>> +    /* Char device opened by this user */
>>>> +    struct qaic_drm_device    *qddev;
>>>> +    /* Node in list of users that opened this drm device */
>>>> +    struct list_head    node;
>>>> +    /* SRCU used to synchronize this user during cleanup */
>>>> +    struct srcu_struct    qddev_lock;
>>>> +    atomic_t        chunk_id;
>>>> +};
>>>> +
>>>> +struct dma_bridge_chan {
>>>> +    /* Pointer to device strcut maintained by driver */
>>>> +    struct qaic_device    *qdev;
>>>> +    /* ID of this DMA bridge channel(DBC) */
>>>> +    unsigned int        id;
>>>> +    /* Synchronizes access to xfer_list */
>>>> +    spinlock_t        xfer_lock;
>>>> +    /* Base address of request queue */
>>>> +    void            *req_q_base;
>>>> +    /* Base address of response queue */
>>>> +    void            *rsp_q_base;
>>>> +    /*
>>>> +     * Base bus address of request queue. Response queue bus 
>>>> address can be
>>>> +     * calculated by adding request queue size to this variable
>>>> +     */
>>>> +    dma_addr_t        dma_addr;
>>>> +    /* Total size of request and response queue in byte */
>>>> +    u32            total_size;
>>>> +    /* Capacity of request/response queue */
>>>> +    u32            nelem;
>>>> +    /* The user that opened this DBC */
>>>> +    struct qaic_user    *usr;
>>>> +    /*
>>>> +     * Request ID of next memory handle that goes in request queue. 
>>>> One
>>>> +     * memory handle can enqueue more than one request elements, all
>>>> +     * this requests that belong to same memory handle have same 
>>>> request ID
>>>> +     */
>>>> +    u16            next_req_id;
>>>> +    /* TRUE: DBC is in use; FALSE: DBC not in use */
>>>
>>> Use standard "true"/"false" instead of custom "TRUE"/"FALSE" macros.
>>> This applies here and in multiple other places in the driver.
>>
>> I think you are getting at that the documentation could be confusing. 
>> I don't appear to see custom macro use in the code.  Will try to 
>> clarify that here.
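>>
>> i.e. just reword the comment along the lines of:
>>
>> 	/* true: DBC is in use; false: DBC is not in use */
>>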
>>
>>>> +    bool            in_use;
>>>> +    /*
>>>> +     * Base address of device registers. Used to read/write request 
>>>> and
>>>> +     * response queue's head and tail pointer of this DBC.
>>>> +     */
>>>> +    void __iomem        *dbc_base;
>>>> +    /* Head of list where each node is a memory handle queued in 
>>>> request queue */
>>>> +    struct list_head    xfer_list;
>>>> +    /* Synchronizes DBC readers during cleanup */
>>>> +    struct srcu_struct    ch_lock;
>>>> +    /*
>>>> +     * When this DBC is released, any thread waiting on this wait 
>>>> queue is
>>>> +     * woken up
>>>> +     */
>>>> +    wait_queue_head_t    dbc_release;
>>>> +    /* Head of list where each node is a bo associated with this 
>>>> DBC */
>>>> +    struct list_head    bo_lists;
>>>> +    /* The irq line for this DBC.  Used for polling */
>>>> +    unsigned int        irq;
>>>> +    /* Polling work item to simulate interrupts */
>>>> +    struct work_struct    poll_work;
>>>> +};
>>>> +
>>>> +struct qaic_device {
>>>> +    /* Pointer to base PCI device struct of our physical device */
>>>> +    struct pci_dev        *pdev;
>>>> +    /* Mask of all bars of this device */
>>>> +    int            bars;
>>>> +    /* Req. ID of request that will be queued next in MHI control 
>>>> device */
>>>> +    u32            next_seq_num;
>>>> +    /* Base address of bar 0 */
>>>> +    void __iomem        *bar_0;
>>>> +    /* Base address of bar 2 */
>>>> +    void __iomem        *bar_2;
>>>> +    /* Controller structure for MHI devices */
>>>> +    struct mhi_controller    *mhi_cntl;
>>>> +    /* MHI control channel device */
>>>> +    struct mhi_device    *cntl_ch;
>>>> +    /* List of requests queued in MHI control device */
>>>> +    struct list_head    cntl_xfer_list;
>>>> +    /* Synchronizes MHI control device transactions and its xfer 
>>>> list */
>>>> +    struct mutex        cntl_mutex;
>>>> +    /* Base actual physical representation of drm device */
>>>> +    struct qaic_drm_device    *base_dev;
>>>> +    /* Array of DBC struct of this device */
>>>> +    struct dma_bridge_chan    *dbc;
>>>> +    /* Work queue for tasks related to MHI control device */
>>>> +    struct workqueue_struct    *cntl_wq;
>>>> +    /* Synchronizes all the users of device during cleanup */
>>>> +    struct srcu_struct    dev_lock;
>>>> +    /* TRUE: Device under reset; FALSE: Device not under reset */
>>>> +    bool            in_reset;
>>>> +    /*
>>>> +     * TRUE: A tx MHI transaction has failed and a rx buffer is 
>>>> still queued
>>>> +     * in control device. Such a buffer is considered lost rx buffer
>>>> +     * FALSE: No rx buffer is lost in control device
>>>> +     */
>>>> +    bool            cntl_lost_buf;
>>>> +    /* Maximum number of DBC supported by this device */
>>>> +    u32            num_dbc;
>>>> +    /* Head in list of drm device created on top of this device */
>>>> +    struct list_head    qaic_drm_devices;
>>>> +    /* Synchronizes access of qaic_drm_devices list */
>>>> +    struct mutex        qaic_drm_devices_mutex;
>>>> +    /* Generate the CRC of a control message */
>>>> +    u32 (*gen_crc)(void *msg);
>>>> +    /* Validate the CRC of a control message */
>>>> +    bool (*valid_crc)(void *msg);
>>>> +};
>>>> +
>>>> +struct qaic_drm_device {
>>>> +    /* Pointer to the root device struct driven by this driver */
>>>> +    struct qaic_device    *qdev;
>>>> +    /* Node in list of drm devices maintained by root device */
>>>> +    struct list_head    node;
>>>> +    /*
>>>> +     * The physical device can be partition in number of logical 
>>>> devices.
>>>> +     * And each logical device is given a partition id. This member 
>>>> stores
>>>> +     * that id. QAIC_NO_PARTITION is a sentinel used to mark that 
>>>> this drm
>>>> +     * device is the actual physical device
>>>> +     */
>>>> +    s32            partition_id;
>>>> +    /*
>>>> +     * It points to the user that created this drm device. It is NULL
>>>> +     * when this drm device represents the physical device i.e.
>>>> +     * partition_id is QAIC_NO_PARTITION
>>>> +     */
>>>> +    struct qaic_user    *owner;
>>>> +    /* Pointer to the drm device struct of this drm device */
>>>> +    struct drm_device    *ddev;
>>>> +    /* Head in list of users who have opened this drm device */
>>>> +    struct list_head    users;
>>>> +    /* Synchronizes access to users list */
>>>> +    struct mutex        users_mutex;
>>>> +};
>>>> +
>>>> +struct qaic_bo {
>>>> +    struct drm_gem_object    base;
>>>
>>> Any reason why drm_gem_shmem_object cannot be used as a base?
>>
>> I believe that is incompatible with our memory allocator in patch 5.
>>
>>>> +    /* Scatter/gather table for allocate/imported BO */
>>>> +    struct sg_table        *sgt;
>>>> +    /* BO size requested by user. GEM object might be bigger in 
>>>> size. */
>>>> +    u64            size;
>>>> +    /* Head in list of slices of this BO */
>>>> +    struct list_head    slices;
>>>> +    /* Total nents, for all slices of this BO */
>>>> +    int            total_slice_nents;
>>>> +    /*
>>>> +     * Direction of transfer. It can assume only two value 
>>>> DMA_TO_DEVICE and
>>>> +     * DMA_FROM_DEVICE.
>>>> +     */
>>>> +    int            dir;
>>>> +    /* The pointer of the DBC which operates on this BO */
>>>> +    struct dma_bridge_chan    *dbc;
>>>> +    /* Number of slice that belongs to this buffer */
>>>> +    u32            nr_slice;
>>>> +    /* Number of slice that have been transferred by DMA engine */
>>>> +    u32            nr_slice_xfer_done;
>>>> +    /* TRUE = BO is queued for execution, FALSE = BO is not queued */
>>>> +    bool            queued;
>>>> +    /*
>>>> +     * If TRUE then user has attached slicing information to this 
>>>> BO by
>>>> +     * calling DRM_IOCTL_QAIC_ATTACH_SLICE_BO ioctl.
>>>> +     */
>>>> +    bool            sliced;
>>>> +    /* Request ID of this BO if it is queued for execution */
>>>> +    u16            req_id;
>>>> +    /* Handle assigned to this BO */
>>>> +    u32            handle;
>>>> +    /* Wait on this for completion of DMA transfer of this BO */
>>>> +    struct completion    xfer_done;
>>>> +    /*
>>>> +     * Node in linked list where head is dbc->xfer_list.
>>>> +     * This link list contain BO's that are queued for DMA transfer.
>>>> +     */
>>>> +    struct list_head    xfer_list;
>>>> +    /*
>>>> +     * Node in linked list where head is dbc->bo_lists.
>>>> +     * This link list contain BO's that are associated with the DBC 
>>>> it is
>>>> +     * linked to.
>>>> +     */
>>>> +    struct list_head    bo_list;
>>>> +    struct {
>>>> +        /*
>>>> +         * Latest timestamp(ns) at which kernel received a request to
>>>> +         * execute this BO
>>>> +         */
>>>> +        u64        req_received_ts;
>>>> +        /*
>>>> +         * Latest timestamp(ns) at which kernel enqueued requests of
>>>> +         * this BO for execution in DMA queue
>>>> +         */
>>>> +        u64        req_submit_ts;
>>>> +        /*
>>>> +         * Latest timestamp(ns) at which kernel received a completion
>>>> +         * interrupt for requests of this BO
>>>> +         */
>>>> +        u64        req_processed_ts;
>>>> +        /*
>>>> +         * Number of elements already enqueued in DMA queue before
>>>> +         * enqueuing requests of this BO
>>>> +         */
>>>> +        u32        queue_level_before;
>>>> +    } perf_stats;
>>>> +
>>>> +};
>>>> +
>>>> +struct bo_slice {
>>>> +    /* Mapped pages */
>>>> +    struct sg_table        *sgt;
>>>> +    /* Number of requests required to queue in DMA queue */
>>>> +    int            nents;
>>>> +    /* See enum dma_data_direction */
>>>> +    int            dir;
>>>> +    /* Actual requests that will be copied in DMA queue */
>>>> +    struct dbc_req        *reqs;
>>>> +    struct kref        ref_count;
>>>> +    /* TRUE: No DMA transfer required */
>>>> +    bool            no_xfer;
>>>> +    /* Pointer to the parent BO handle */
>>>> +    struct qaic_bo        *bo;
>>>> +    /* Node in list of slices maintained by parent BO */
>>>> +    struct list_head    slice;
>>>> +    /* Size of this slice in bytes */
>>>> +    u64            size;
>>>> +    /* Offset of this slice in buffer */
>>>> +    u64            offset;
>>>> +};
>>>> +
>>>> +int get_dbc_req_elem_size(void);
>>>> +int get_dbc_rsp_elem_size(void);
>>>> +int get_cntl_version(struct qaic_device *qdev, struct qaic_user *usr,
>>>> +             u16 *major, u16 *minor);
>>>> +int qaic_manage_ioctl(struct drm_device *dev, void *data,
>>>> +              struct drm_file *file_priv);
>>>> +int qaic_execute_ioctl(struct qaic_device *qdev, struct qaic_user 
>>>> *usr,
>>>> +               unsigned long arg, bool is_partial);
>>>> +int qaic_wait_exec_ioctl(struct qaic_device *qdev, struct qaic_user 
>>>> *usr,
>>>> +             unsigned long arg);
>>>> +int qaic_query_ioctl(struct qaic_device *qdev, struct qaic_user *usr,
>>>> +             unsigned long arg);
>>>> +int qaic_data_mmap(struct qaic_device *qdev, struct qaic_user *usr,
>>>> +           struct vm_area_struct *vma);
>>>> +int qaic_data_get_reservation(struct qaic_device *qdev, struct 
>>>> qaic_user *usr,
>>>> +                  void *data, u32 *partition_id,
>>>> +                  u16 *remove);
>>>> +void qaic_mhi_ul_xfer_cb(struct mhi_device *mhi_dev,
>>>> +             struct mhi_result *mhi_result);
>>>> +
>>>> +void qaic_mhi_dl_xfer_cb(struct mhi_device *mhi_dev,
>>>> +             struct mhi_result *mhi_result);
>>>> +
>>>> +int qaic_control_open(struct qaic_device *qdev);
>>>> +void qaic_control_close(struct qaic_device *qdev);
>>>> +void qaic_release_usr(struct qaic_device *qdev, struct qaic_user 
>>>> *usr);
>>>> +
>>>> +irqreturn_t dbc_irq_threaded_fn(int irq, void *data);
>>>> +irqreturn_t dbc_irq_handler(int irq, void *data);
>>>> +int disable_dbc(struct qaic_device *qdev, u32 dbc_id, struct 
>>>> qaic_user *usr);
>>>> +void enable_dbc(struct qaic_device *qdev, u32 dbc_id, struct 
>>>> qaic_user *usr);
>>>> +void wakeup_dbc(struct qaic_device *qdev, u32 dbc_id);
>>>> +void release_dbc(struct qaic_device *qdev, u32 dbc_id);
>>>> +
>>>> +void wake_all_cntl(struct qaic_device *qdev);
>>>> +void qaic_dev_reset_clean_local_state(struct qaic_device *qdev, 
>>>> bool exit_reset);
>>>> +
>>>> +struct drm_gem_object *qaic_gem_prime_import(struct drm_device *dev,
>>>> +                         struct dma_buf *dma_buf);
>>>> +
>>>> +int qaic_create_bo_ioctl(struct drm_device *dev, void *data,
>>>> +             struct drm_file *file_priv);
>>>> +int qaic_mmap_bo_ioctl(struct drm_device *dev, void *data,
>>>> +               struct drm_file *file_priv);
>>>> +int qaic_attach_slice_bo_ioctl(struct drm_device *dev, void *data,
>>>> +                   struct drm_file *file_priv);
>>>> +int qaic_execute_bo_ioctl(struct drm_device *dev, void *data,
>>>> +              struct drm_file *file_priv);
>>>> +int qaic_partial_execute_bo_ioctl(struct drm_device *dev, void *data,
>>>> +                  struct drm_file *file_priv);
>>>> +int qaic_wait_bo_ioctl(struct drm_device *dev, void *data,
>>>> +               struct drm_file *file_priv);
>>>> +int qaic_test_print_bo_ioctl(struct drm_device *dev, void *data,
>>>> +                 struct drm_file *file_priv);
>>>> +int qaic_perf_stats_bo_ioctl(struct drm_device *dev, void *data,
>>>> +                 struct drm_file *file_priv);
>>>
>>> You don't need to break these lines. You can use up to 100 columns in 
>>> the whole driver.
>>> It will be more readable and checkpatch won't complain.
>>
>> Some of this predates the 100 column change.  Checkpatch already 
>> complains about the uapi file.
>>
>> Will try to fix here.
>>
>>>> +void irq_polling_work(struct work_struct *work);
>>>> +
>>>> +#endif /* QAICINTERNAL_H_ */
>>>> diff --git a/drivers/accel/qaic/qaic_drv.c 
>>>> b/drivers/accel/qaic/qaic_drv.c
>>>> new file mode 100644
>>>> index 0000000..602a784
>>>> --- /dev/null
>>>> +++ b/drivers/accel/qaic/qaic_drv.c
>>>> @@ -0,0 +1,771 @@
>>>> +// SPDX-License-Identifier: GPL-2.0-only
>>>> +
>>>> +/* Copyright (c) 2019-2021, The Linux Foundation. All rights 
>>>> reserved. */
>>>> +/* Copyright (c) 2021-2023 Qualcomm Innovation Center, Inc. All 
>>>> rights reserved. */
>>>> +
>>>> +#include <linux/delay.h>
>>>> +#include <linux/dma-mapping.h>
>>>> +#include <linux/idr.h>
>>>> +#include <linux/interrupt.h>
>>>> +#include <linux/list.h>
>>>> +#include <linux/kref.h>
>>>> +#include <linux/mhi.h>
>>>> +#include <linux/module.h>
>>>> +#include <linux/msi.h>
>>>> +#include <linux/mutex.h>
>>>> +#include <linux/pci.h>
>>>> +#include <linux/sched.h>
>>>
>>> Is <linux/sched.h> used here?
>>> Feels like there are couple other unused includes in this file.
>>
>> Actually I think that can be removed now.  Will check.
>> The other ones look sane to me.  Are there specific ones you question?
>>
>>>
>>>> +#include <linux/spinlock.h>
>>>> +#include <linux/workqueue.h>
>>>> +#include <linux/wait.h>
>>>> +#include <drm/drm_accel.h>
>>>> +#include <drm/drm_drv.h>
>>>> +#include <drm/drm_file.h>
>>>> +#include <drm/drm_gem.h>
>>>> +#include <drm/drm_ioctl.h>
>>>> +#include <uapi/drm/qaic_accel.h>
>>>> +
>>>> +#include "mhi_controller.h"
>>>> +#include "mhi_qaic_ctrl.h"
>>>> +#include "qaic.h"
>>>> +
>>>> +MODULE_IMPORT_NS(DMA_BUF);
>>>> +
>>>> +#define PCI_DEV_AIC100            0xa100
>>>> +#define QAIC_NAME            "qaic"
>>>> +#define QAIC_DESC            "Qualcomm Cloud AI Accelerators"
>>>> +
>>>> +static unsigned int datapath_polling;
>>>> +module_param(datapath_polling, uint, 0400);
>>>> +bool poll_datapath;
>>>> +
>>>> +static u16 cntl_major = 5;
>>>> +static u16 cntl_minor;/* 0 */
>>>
>>> Missing space before the comment.
>>> And also you could convert both vars to macros as they are constants.
>>
>> Sure
>>
>>>> +static bool link_up;
>>>> +static DEFINE_IDA(qaic_usrs);
>>>> +
>>>> +static int qaic_create_drm_device(struct qaic_device *qdev, s32 
>>>> partition_id,
>>>> +                  struct qaic_user *owner);
>>>> +static void qaic_destroy_drm_device(struct qaic_device *qdev, s32 
>>>> partition_id,
>>>> +                    struct qaic_user *owner);
>>>> +
>>>> +static void free_usr(struct kref *kref)
>>>> +{
>>>> +    struct qaic_user *usr = container_of(kref, struct qaic_user, 
>>>> ref_count);
>>>> +
>>>> +    cleanup_srcu_struct(&usr->qddev_lock);
>>>> +    ida_free(&qaic_usrs, usr->handle);
>>>> +    kfree(usr);
>>>> +}
>>>> +
>>>> +static int qaic_open(struct drm_device *dev, struct drm_file *file)
>>>> +{
>>>> +    struct qaic_drm_device *qddev = dev->dev_private;
>>>> +    struct qaic_device *qdev = qddev->qdev;
>>>> +    struct qaic_user *usr;
>>>> +    int rcu_id;
>>>> +    int ret;
>>>> +
>>>> +    rcu_id = srcu_read_lock(&qdev->dev_lock);
>>>> +    if (qdev->in_reset) {
>>>> +        srcu_read_unlock(&qdev->dev_lock, rcu_id);
>>>> +        return -ENODEV;
>>>> +    }
>>>> +
>>>> +    usr = kmalloc(sizeof(*usr), GFP_KERNEL);
>>>> +    if (!usr) {
>>>> +        srcu_read_unlock(&qdev->dev_lock, rcu_id);
>>>> +        return -ENOMEM;
>>>> +    }
>>>> +
>>>> +    usr->handle = ida_alloc(&qaic_usrs, GFP_KERNEL);
>>>> +    if (usr->handle < 0) {
>>>> +        srcu_read_unlock(&qdev->dev_lock, rcu_id);
>>>> +        kfree(usr);
>>>> +        return usr->handle;
> 
> hi, use after free here

Nice catch.  Thanks.  Also noting the other things you mentioned.
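
Something like this should avoid it (untested sketch):

	usr->handle = ida_alloc(&qaic_usrs, GFP_KERNEL);
	if (usr->handle < 0) {
		ret = usr->handle;
		srcu_read_unlock(&qdev->dev_lock, rcu_id);
		kfree(usr);
		return ret;
	}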

> 
>>>> +    }
>>>> +    usr->qddev = qddev;
>>>> +    atomic_set(&usr->chunk_id, 0);
>>>> +    init_srcu_struct(&usr->qddev_lock);
>>>> +    kref_init(&usr->ref_count);
>>>> +
>>>> +    ret = mutex_lock_interruptible(&qddev->users_mutex);
>>>> +    if (ret) {
>>>> +        cleanup_srcu_struct(&usr->qddev_lock);
>>>> +        kfree(usr);
>>>> +        srcu_read_unlock(&qdev->dev_lock, rcu_id);
>>>> +        return ret;
>>>> +    }
>>>> +
>>>> +    list_add(&usr->node, &qddev->users);
>>>> +    mutex_unlock(&qddev->users_mutex);
>>>> +
>>>> +    file->driver_priv = usr;
>>>> +
>>>> +    srcu_read_unlock(&qdev->dev_lock, rcu_id);
>>>> +    return 0;
>>>> +}
>>>> +
>>>> +static void qaic_postclose(struct drm_device *dev, struct drm_file 
>>>> *file)
>>>> +{
>>>> +    struct qaic_user *usr = file->driver_priv;
>>>> +    struct qaic_drm_device *qddev;
>>>> +    struct qaic_device *qdev;
>>>> +    int qdev_rcu_id;
>>>> +    int usr_rcu_id;
>>>> +    int i;
>>>> +
>>>> +    qddev = usr->qddev;
>>>> +    usr_rcu_id = srcu_read_lock(&usr->qddev_lock);
>>>> +    if (qddev) {
>>>> +        qdev = qddev->qdev;
>>>> +        qdev_rcu_id = srcu_read_lock(&qdev->dev_lock);
>>>> +        if (!qdev->in_reset) {
>>>> +            qaic_release_usr(qdev, usr);
>>>> +            for (i = 0; i < qdev->num_dbc; ++i)
>>>> +                if (qdev->dbc[i].usr &&
>>>> +                    qdev->dbc[i].usr->handle == usr->handle)
>>>> +                    release_dbc(qdev, i);
>>>> +
>>>> +            /* Remove child devices */
>>>> +            if (qddev->partition_id == QAIC_NO_PARTITION)
>>>> +                qaic_destroy_drm_device(qdev, QAIC_NO_PARTITION, usr);
>>>> +        }
>>>> +        srcu_read_unlock(&qdev->dev_lock, qdev_rcu_id);
>>>> +
>>>> +        mutex_lock(&qddev->users_mutex);
>>>> +        if (!list_empty(&usr->node))
>>>> +            list_del_init(&usr->node);
>>>> +        mutex_unlock(&qddev->users_mutex);
>>>> +    }
>>>> +
>>>> +    srcu_read_unlock(&usr->qddev_lock, usr_rcu_id);
>>>> +    kref_put(&usr->ref_count, free_usr);
>>>> +
>>>> +    file->driver_priv = NULL;
>>>> +}
>>>> +
>>>> +static int qaic_part_dev_ioctl(struct drm_device *dev, void *data,
>>>> +                   struct drm_file *file_priv)
>>>> +{
>>>> +    struct qaic_device *qdev;
>>>> +    struct qaic_user *usr;
>>>> +    u32 partition_id;
>>>> +    int qdev_rcu_id;
>>>> +    int usr_rcu_id;
>>>> +    int ret = 0;
>>>> +    u16 remove;
>>>> +
>>>> +    usr = file_priv->driver_priv;
>>>> +    if (!usr)
>>>> +        return -EINVAL;
>>>> +
>>>> +    usr_rcu_id = srcu_read_lock(&usr->qddev_lock);
>>>> +    if (!usr->qddev) {
>>>> +        srcu_read_unlock(&usr->qddev_lock, usr_rcu_id);
>>>> +        return -ENODEV;
>>>> +    }
>>>> +
>>>> +    qdev = usr->qddev->qdev;
>>>> +    if (!qdev) {
>>>> +        srcu_read_unlock(&usr->qddev_lock, usr_rcu_id);
>>>> +        return -ENODEV;
>>>> +    }
>>>> +
>>>> +    qdev_rcu_id = srcu_read_lock(&qdev->dev_lock);
>>>> +    if (qdev->in_reset) {
>>>> +        ret = -ENODEV;
>>>> +        goto out;
>>>> +    }
>>>> +
>>>> +    /* This IOCTL is only supported for base devices. */
>>>> +    if (usr->qddev->partition_id != QAIC_NO_PARTITION) {
>>>> +        ret = -ENOTTY;
>>>> +        goto out;
>>>> +    }
>>>> +
>>>> +    ret = qaic_data_get_reservation(qdev, usr, data, &partition_id,
>>>> +                    &remove);
>>>> +    if (ret)
>>>> +        goto out;
>>>> +
>>>> +    if (remove == 1)
>>>> +        qaic_destroy_drm_device(qdev, partition_id, usr);
>>>> +    else
>>>> +        ret = qaic_create_drm_device(qdev, partition_id, usr);
>>>> +
>>>> +out:
>>>> +    srcu_read_unlock(&qdev->dev_lock, qdev_rcu_id);
>>>> +    srcu_read_unlock(&usr->qddev_lock, usr_rcu_id);
>>>> +
>>>> +    return ret;
>>>> +}
>>>> +
>>>> +DEFINE_DRM_ACCEL_FOPS(qaic_accel_fops);
>>>> +
>>>> +static const struct drm_ioctl_desc qaic_drm_ioctls[] = {
>>>> +    DRM_IOCTL_DEF_DRV(QAIC_MANAGE, qaic_manage_ioctl, 
>>>> DRM_RENDER_ALLOW),
>>>> +    DRM_IOCTL_DEF_DRV(QAIC_CREATE_BO, qaic_create_bo_ioctl, 
>>>> DRM_RENDER_ALLOW),
>>>> +    DRM_IOCTL_DEF_DRV(QAIC_MMAP_BO, qaic_mmap_bo_ioctl, 
>>>> DRM_RENDER_ALLOW),
>>>> +    DRM_IOCTL_DEF_DRV(QAIC_ATTACH_SLICE_BO, 
>>>> qaic_attach_slice_bo_ioctl, DRM_RENDER_ALLOW),
>>>> +    DRM_IOCTL_DEF_DRV(QAIC_EXECUTE_BO, qaic_execute_bo_ioctl, 
>>>> DRM_RENDER_ALLOW),
>>>> +    DRM_IOCTL_DEF_DRV(QAIC_PARTIAL_EXECUTE_BO, 
>>>> qaic_partial_execute_bo_ioctl, DRM_RENDER_ALLOW),
>>>> +    DRM_IOCTL_DEF_DRV(QAIC_WAIT_BO, qaic_wait_bo_ioctl, 
>>>> DRM_RENDER_ALLOW),
>>>> +    DRM_IOCTL_DEF_DRV(QAIC_PERF_STATS_BO, qaic_perf_stats_bo_ioctl, 
>>>> DRM_RENDER_ALLOW),
>>>> +    DRM_IOCTL_DEF_DRV(QAIC_PART_DEV, qaic_part_dev_ioctl, 
>>>> DRM_RENDER_ALLOW),
>>>> +};
>>>> +
>>>> +static const struct drm_driver qaic_accel_driver = {
>>>> +    .driver_features    = DRIVER_GEM | DRIVER_COMPUTE_ACCEL,
>>>> +
>>>> +    .name            = QAIC_NAME,
>>>> +    .desc            = QAIC_DESC,
>>>> +    .date            = "20190618",
>>>> +
>>>> +    .fops            = &qaic_accel_fops,
>>>> +    .open            = qaic_open,
>>>> +    .postclose        = qaic_postclose,
>>>> +
>>>> +    .ioctls            = qaic_drm_ioctls,
>>>> +    .num_ioctls        = ARRAY_SIZE(qaic_drm_ioctls),
>>>> +    .prime_fd_to_handle    = drm_gem_prime_fd_to_handle,
>>>> +    .gem_prime_import    = qaic_gem_prime_import,
>>>> +};
>>>> +
>>>> +static int qaic_create_drm_device(struct qaic_device *qdev, s32 
>>>> partition_id,
>>>> +                  struct qaic_user *owner)
>>>> +{
>>>> +    struct qaic_drm_device *qddev;
>>>> +    struct drm_device *ddev;
>>>> +    struct device *pdev;
>>>> +    int ret;
>>>> +
>>>> +    /*
>>>> +     * Partition id QAIC_NO_PARTITION indicates that the device was 
>>>> created
>>>> +     * on mhi_probe and id > QAIC_NO_PARTITION indicates a partition
>>>> +     * created using IOCTL. So, pdev for primary device is the pci 
>>>> dev and
>>>> +     * the parent for partition dev is the primary device.
>>>> +     */
>>>> +    if (partition_id == QAIC_NO_PARTITION)
>>>> +        pdev = &qdev->pdev->dev;
>>>> +    else
>>>> +        pdev = qdev->base_dev->ddev->dev;
>>>> +
>>>> +    qddev = kzalloc(sizeof(*qddev), GFP_KERNEL);
>>>> +    if (!qddev) {
>>>> +        ret = -ENOMEM;
>>>> +        goto qddev_fail;
> 
> qddev_fail label is just before returning so better return here directly
> 
>>>> +    }
>>>> +
>>>> +    ddev = drm_dev_alloc(&qaic_accel_driver, pdev);
>>>> +    if (IS_ERR(ddev)) {
>>>> +        ret = PTR_ERR(ddev);
>>>> +        goto ddev_fail;
>>>> +    }
>>>> +
>>>> +    ddev->dev_private = qddev;
>>>> +    qddev->ddev = ddev;
>>>> +
>>>> +    if (partition_id == QAIC_NO_PARTITION)
>>>> +        qdev->base_dev = qddev;
>>>> +    qddev->qdev = qdev;
>>>> +    qddev->partition_id = partition_id;
>>>> +    qddev->owner = owner;
>>>> +    INIT_LIST_HEAD(&qddev->users);
>>>> +    mutex_init(&qddev->users_mutex);
>>>> +
>>>> +    mutex_lock(&qdev->qaic_drm_devices_mutex);
>>>> +    list_add(&qddev->node, &qdev->qaic_drm_devices);
>>>> +    mutex_unlock(&qdev->qaic_drm_devices_mutex);
>>>> +
>>>> +    ret = drm_dev_register(ddev, 0);
>>>> +    if (ret) {
>>>> +        pci_dbg(qdev->pdev, "%s: drm_dev_register failed %d\n", 
>>>> __func__, ret);
>>>> +        goto drm_reg_fail;
>>>> +    }
>>>> +
>>>> +    return 0;
>>>> +
>>>> +drm_reg_fail:
>>>> +    mutex_destroy(&qddev->users_mutex);
>>>> +    mutex_lock(&qdev->qaic_drm_devices_mutex);
>>>> +    list_del(&qddev->node);
>>>> +    mutex_unlock(&qdev->qaic_drm_devices_mutex);
>>>> +    if (partition_id == QAIC_NO_PARTITION)
>>>> +        qdev->base_dev = NULL;
>>>> +    drm_dev_put(ddev);
>>>> +ddev_fail:
>>>> +    kfree(qddev);
>>>> +qddev_fail:
> 
> no point for this label
> 
>>>> +    return ret;
>>>> +}
>>>> +
>>>> +static void qaic_destroy_drm_device(struct qaic_device *qdev, s32 
>>>> partition_id,
>>>> +                    struct qaic_user *owner)
>>>> +{
>>>> +    struct qaic_drm_device *qddev;
>>>> +    struct qaic_drm_device *q;
>>>> +    struct qaic_user *usr;
>>>> +
>>>> +    list_for_each_entry_safe(qddev, q, &qdev->qaic_drm_devices, 
>>>> node) {
>>>> +        /*
>>>> +         * Skip devices in case we just want to remove devices
>>>> +         * specific to a owner or partition id.
>>>> +         *
>>>> +         * owner    partition_id    notes
>>>> +         * ----------------------------------
>>>> +         * NULL        NO_PARTITION    delete base + all derived (qdev
>>>> +         *                reset)
>>>> +         * !NULL    NO_PARTITION    delete derived devs created by
>>>> +         *                owner.
>>>> +         * !NULL    >NO_PARTITION    delete derived dev identified by
>>>> +         *                the partition id and created by
>>>> +         *                owner
>>>> +         * NULL        >NO_PARTITION    invalid (no-op)
>>>> +         *
>>>> +         * if partition_id is any value < QAIC_NO_PARTITION this 
>>>> will be
>>>> +         * a no-op.
>>>> +         */
>>>> +        if (owner && owner != qddev->owner)
>>>> +            continue;
>>>> +
>>>> +        if (partition_id != QAIC_NO_PARTITION &&
>>>> +            partition_id != qddev->partition_id && !owner)
>>>> +            continue;
>>>> +
>>>> +        /*
>>>> +         * Existing users get unresolvable errors till they close FDs.
>>>> +         * Need to sync carefully with users calling close().  The
>>>> +         * list of users can be modified elsewhere when the lock isn't
>>>> +         * held here, but the sync'ing the srcu with the mutex held
>>>> +         * could deadlock.  Grab the mutex so that the list will be
>>>> +         * unmodified.  The user we get will exist as long as the
>>>> +         * lock is held.  Signal that the qcdev is going away, and
>>>> +         * grab a reference to the user so they don't go away for
>>>> +         * synchronize_srcu().  Then release the mutex to avoid
>>>> +         * deadlock and make sure the user has observed the signal.
>>>> +         * With the lock released, we cannot maintain any state of the
>>>> +         * user list.
>>>> +         */
>>>> +        mutex_lock(&qddev->users_mutex);
>>>> +        while (!list_empty(&qddev->users)) {
>>>> +            usr = list_first_entry(&qddev->users, struct qaic_user,
>>>> +                           node);
>>>> +            list_del_init(&usr->node);
>>>> +            kref_get(&usr->ref_count);
>>>> +            usr->qddev = NULL;
>>>> +            mutex_unlock(&qddev->users_mutex);
>>>> +            synchronize_srcu(&usr->qddev_lock);
>>>> +            kref_put(&usr->ref_count, free_usr);
>>>> +            mutex_lock(&qddev->users_mutex);
>>>> +        }
>>>> +        mutex_unlock(&qddev->users_mutex);
>>>> +
>>>> +        if (qddev->ddev) {
>>>> +            drm_dev_unregister(qddev->ddev);
>>>> +            drm_dev_put(qddev->ddev);
>>>> +        }
>>>> +
>>>> +        list_del(&qddev->node);
>>>> +        kfree(qddev);
>>>> +    }
>>>> +}
>>>> +
>>>> +static int qaic_mhi_probe(struct mhi_device *mhi_dev,
>>>> +              const struct mhi_device_id *id)
>>>> +{
>>>> +    struct qaic_device *qdev;
>>>> +    u16 major, minor;
>>>> +    int ret;
>>>> +
>>>> +    /*
>>>> +     * Invoking this function indicates that the control channel to 
>>>> the
>>>> +     * device is available.  We use that as a signal to indicate that
>>>> +     * the device side firmware has booted.  The device side firmware
>>>> +     * manages the device resources, so we need to communicate with it
>>>> +     * via the control channel in order to utilize the device.  
>>>> Therefore
>>>> +     * we wait until this signal to create the drm dev that 
>>>> userspace will
>>>> +     * use to control the device, because without the device side 
>>>> firmware,
>>>> +     * userspace can't do anything useful.
>>>> +     */
>>>> +
>>>> +    qdev = pci_get_drvdata(to_pci_dev(mhi_dev->mhi_cntrl->cntrl_dev));
>>>> +
>>>> +    qdev->in_reset = false;
>>>> +
>>>> +    dev_set_drvdata(&mhi_dev->dev, qdev);
>>>> +    qdev->cntl_ch = mhi_dev;
>>>> +
>>>> +    ret = qaic_control_open(qdev);
>>>> +    if (ret) {
>>>> +        pci_dbg(qdev->pdev, "%s: control_open failed %d\n", 
>>>> __func__, ret);
>>>> +        goto err;
>>>> +    }
>>>> +
>>>> +    ret = get_cntl_version(qdev, NULL, &major, &minor);
>>>> +    if (ret || major != cntl_major || minor > cntl_minor) {
>>>> +        pci_err(qdev->pdev, "%s: Control protocol version (%d.%d) 
>>>> not supported.  Supported version is (%d.%d). Ret: %d\n",
>>>> +            __func__, major, minor, cntl_major, cntl_minor, ret);
>>>> +        ret = -EINVAL;
>>>> +        goto close_control;
>>>> +    }
>>>> +
>>>> +    ret = qaic_create_drm_device(qdev, QAIC_NO_PARTITION, NULL);
>>>> +
>>>> +    return ret;
>>>> +
>>>> +close_control:
>>>> +    qaic_control_close(qdev);
>>>> +err:
> 
> same here, not sure there is a point to this label
> 
> Thanks,
> Dafna
> 
>>>> +    return ret;
>>>> +}
>>>> +
>>>> +static void qaic_mhi_remove(struct mhi_device *mhi_dev)
>>>> +{
>>>
>>> Add a comment here
>>
>> Presumably why this is an empty function?
>> Will do.
>>
>>>> +}
>>>> +
>>>> +static void qaic_notify_reset(struct qaic_device *qdev)
>>>> +{
>>>> +    int i;
>>>> +
>>>> +    qdev->in_reset = true;
>>>> +    /* wake up any waiters to avoid waiting for timeouts at sync */
>>>> +    wake_all_cntl(qdev);
>>>> +    for (i = 0; i < qdev->num_dbc; ++i)
>>>> +        wakeup_dbc(qdev, i);
>>>> +    synchronize_srcu(&qdev->dev_lock);
>>>> +}
>>>> +
>>>> +void qaic_dev_reset_clean_local_state(struct qaic_device *qdev, 
>>>> bool exit_reset)
>>>> +{
>>>> +    int i;
>>>> +
>>>> +    qaic_notify_reset(qdev);
>>>> +
>>>> +    /* remove drmdevs to prevent new users from coming in */
>>>> +    if (qdev->base_dev)
>>>> +        qaic_destroy_drm_device(qdev, QAIC_NO_PARTITION, NULL);
>>>> +
>>>> +    /* start tearing things down */
>>>> +    for (i = 0; i < qdev->num_dbc; ++i)
>>>> +        release_dbc(qdev, i);
>>>> +
>>>> +    if (exit_reset)
>>>> +        qdev->in_reset = false;
>>>> +}
>>>> +
>>>> +static int qaic_pci_probe(struct pci_dev *pdev,
>>>> +              const struct pci_device_id *id)
>>>
>>> Please try to simplify this function. Maybe move irq init to separate 
>>> function.
>>> It will be more readable and there will less of a error handling 
>>> spaghetti at the bottom.
>>
>> I guess I tend towards not breaking something up if there is not reuse 
>> elsewhere, but I think I see your point.  Will have a look.
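>>
>> Probably something shaped like the below, hoisting the MSI/IRQ setup
>> out of probe and folding the 32-vector check into the allocation
>> (rough sketch, untested):
>>
>> static int init_msi(struct qaic_device *qdev, struct pci_dev *pdev)
>> {
>> 	int mhi_irq;
>> 	int ret;
>> 	int i;
>>
>> 	/* the device expects all 32 MSIs to be available */
>> 	ret = pci_alloc_irq_vectors(pdev, 32, 32, PCI_IRQ_MSI);
>> 	if (ret < 0)
>> 		return ret;
>>
>> 	mhi_irq = pci_irq_vector(pdev, 0);
>> 	if (mhi_irq < 0)
>> 		return mhi_irq;
>>
>> 	for (i = 0; i < qdev->num_dbc; ++i) {
>> 		ret = devm_request_threaded_irq(&pdev->dev,
>> 						pci_irq_vector(pdev, i + 1),
>> 						dbc_irq_handler,
>> 						dbc_irq_threaded_fn,
>> 						IRQF_SHARED, "qaic_dbc",
>> 						&qdev->dbc[i]);
>> 		if (ret)
>> 			return ret;
>> 	}
>>
>> 	/* datapath polling setup omitted; hand the MHI irq back to probe */
>> 	return mhi_irq;
>> }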
>>
>>>
>>>> +{
>>>> +    int ret;
>>>> +    int i;
>>>> +    int mhi_irq;
>>>> +    struct qaic_device *qdev;
>>>> +
>>>> +    qdev = devm_kzalloc(&pdev->dev, sizeof(*qdev), GFP_KERNEL);
>>>> +    if (!qdev)
>>>> +        return -ENOMEM;
>>>> +
>>>> +    if (id->device == PCI_DEV_AIC100) {
>>>> +        qdev->num_dbc = 16;
>>>> +        qdev->dbc = devm_kcalloc(&pdev->dev, qdev->num_dbc, 
>>>> sizeof(*qdev->dbc),
>>>> +                     GFP_KERNEL);
>>>> +        if (!qdev->dbc)
>>>> +            return -ENOMEM;
>>>> +    }
>>>> +
>>>> +    qdev->cntl_wq = alloc_workqueue("qaic_cntl", WQ_UNBOUND, 0);
>>>> +    if (!qdev->cntl_wq) {
>>>> +        ret = -ENOMEM;
>>>> +        goto wq_fail;
>>>> +    }
>>>> +    pci_set_drvdata(pdev, qdev);
>>>> +    qdev->pdev = pdev;
>>>> +    mutex_init(&qdev->cntl_mutex);
>>>> +    INIT_LIST_HEAD(&qdev->cntl_xfer_list);
>>>> +    init_srcu_struct(&qdev->dev_lock);
>>>> +    INIT_LIST_HEAD(&qdev->qaic_drm_devices);
>>>> +    mutex_init(&qdev->qaic_drm_devices_mutex);
>>>> +    for (i = 0; i < qdev->num_dbc; ++i) {
>>>> +        spin_lock_init(&qdev->dbc[i].xfer_lock);
>>>> +        qdev->dbc[i].qdev = qdev;
>>>> +        qdev->dbc[i].id = i;
>>>> +        INIT_LIST_HEAD(&qdev->dbc[i].xfer_list);
>>>> +        init_srcu_struct(&qdev->dbc[i].ch_lock);
>>>> +        init_waitqueue_head(&qdev->dbc[i].dbc_release);
>>>> +        INIT_LIST_HEAD(&qdev->dbc[i].bo_lists);
>>>> +    }
>>>> +
>>>> +    qdev->bars = pci_select_bars(pdev, IORESOURCE_MEM);
>>>> +
>>>> +    /* make sure the device has the expected BARs */
>>>> +    if (qdev->bars != (BIT(0) | BIT(2) | BIT(4))) {
>>>> +        pci_dbg(pdev, "%s: expected BARs 0, 2, and 4 not found in 
>>>> device.  Found 0x%x\n",
>>>> +            __func__, qdev->bars);
>>>> +        ret = -EINVAL;
>>>> +        goto bar_fail;
>>>> +    }
>>>> +
>>>> +    ret = pci_enable_device(pdev);
>>>> +    if (ret)
>>>> +        goto enable_fail;
>>>> +
>>>> +    ret = pci_request_selected_regions(pdev, qdev->bars, "aic100");
>>>> +    if (ret)
>>>> +        goto request_regions_fail;
>>>> +
>>>> +    pci_set_master(pdev);
>>>> +
>>>> +    ret = dma_set_mask(&pdev->dev, DMA_BIT_MASK(64));
>>>> +    if (ret)
>>>> +        goto dma_mask_fail;
>>>> +    ret = dma_set_coherent_mask(&pdev->dev, DMA_BIT_MASK(64));
>>>> +    if (ret)
>>>> +        goto dma_mask_fail;
>>>> +    ret = dma_set_max_seg_size(&pdev->dev, UINT_MAX);
>>>> +    if (ret)
>>>> +        goto dma_mask_fail;
>>>> +
>>>> +    qdev->bar_0 = pci_ioremap_bar(pdev, 0);
>>>> +    if (!qdev->bar_0) {
>>>> +        ret = -ENOMEM;
>>>> +        goto ioremap_0_fail;
>>>> +    }
>>>> +
>>>> +    qdev->bar_2 = pci_ioremap_bar(pdev, 2);
>>>> +    if (!qdev->bar_2) {
>>>> +        ret = -ENOMEM;
>>>> +        goto ioremap_2_fail;
>>>> +    }
>>>> +
>>>> +    for (i = 0; i < qdev->num_dbc; ++i)
>>>> +        qdev->dbc[i].dbc_base = qdev->bar_2 + QAIC_DBC_OFF(i);
>>>> +
>>>> +    ret = pci_alloc_irq_vectors(pdev, 1, 32, PCI_IRQ_MSI);
>>>> +    if (ret < 0)
>>>> +        goto alloc_irq_fail;
>>>> +
>>>> +    if (ret < 32) {
>>>> +        pci_err(pdev, "%s: Requested 32 MSIs.  Obtained %d MSIs 
>>>> which is less than the 32 required.\n",
>>>> +            __func__, ret);
>>>> +        ret = -ENODEV;
>>>> +        goto invalid_msi_config;
>>>> +    }
>>>> +
>>>> +    mhi_irq = pci_irq_vector(pdev, 0);
>>>> +    if (mhi_irq < 0) {
>>>> +        ret = mhi_irq;
>>>> +        goto get_mhi_irq_fail;
>>>> +    }
>>>> +
>>>> +    for (i = 0; i < qdev->num_dbc; ++i) {
>>>> +        ret = devm_request_threaded_irq(&pdev->dev,
>>>> +                        pci_irq_vector(pdev, i + 1),
>>>> +                        dbc_irq_handler,
>>>> +                        dbc_irq_threaded_fn,
>>>> +                        IRQF_SHARED,
>>>> +                        "qaic_dbc",
>>>> +                        &qdev->dbc[i]);
>>>> +        if (ret)
>>>> +            goto get_dbc_irq_failed;
>>>> +
>>>> +        if (poll_datapath) {
>>>> +            qdev->dbc[i].irq = pci_irq_vector(pdev, i + 1);
>>>> +            disable_irq_nosync(qdev->dbc[i].irq);
>>>> +            INIT_WORK(&qdev->dbc[i].poll_work, irq_polling_work);
>>>> +        }
>>>> +    }
>>>> +
>>>> +    qdev->mhi_cntl = qaic_mhi_register_controller(pdev, 
>>>> qdev->bar_0, mhi_irq);
>>>> +    if (IS_ERR(qdev->mhi_cntl)) {
>>>> +        ret = PTR_ERR(qdev->mhi_cntl);
>>>> +        goto mhi_register_fail;
>>>> +    }
>>>> +
>>>> +    return 0;
>>>> +
>>>> +mhi_register_fail:
>>>> +get_dbc_irq_failed:
>>>
>>> I don't think that duplicated goto statements are allowed by the 
>>> coding style.
>>> These should be rather named something like err_free_irq.
>>> See 
>>> https://www.kernel.org/doc/html/v4.10/process/coding-style.html#centralized-exiting-of-functions 
>>>
>>
>> I took another look at that link, and I do not believe it prohibits 
>> duplicated goto labels, but they probably could be renamed and 
>> simplified as you indicate.  Will have a look at that.
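>>
>> e.g. renaming the merged labels after what they actually undo (sketch):
>>
>> err_free_dbc_irqs:
>> 	for (i = 0; i < qdev->num_dbc; ++i)
>> 		devm_free_irq(&pdev->dev, pci_irq_vector(pdev, i + 1), &qdev->dbc[i]);
>> err_free_irq_vectors:
>> 	pci_free_irq_vectors(pdev);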
>>
>>>
>>>> +    for (i = 0; i < qdev->num_dbc; ++i)
>>>> +        devm_free_irq(&pdev->dev, pci_irq_vector(pdev, i + 1),
>>>> +                  &qdev->dbc[i]);
>>>> +get_mhi_irq_fail:
>>>> +invalid_msi_config:
>>>> +    pci_free_irq_vectors(pdev);
>>>> +alloc_irq_fail:
>>>> +    iounmap(qdev->bar_2);
>>>> +ioremap_2_fail:
>>>> +    iounmap(qdev->bar_0);
>>>> +ioremap_0_fail:
>>>> +dma_mask_fail:
>>>> +    pci_clear_master(pdev);
>>>> +    pci_release_selected_regions(pdev, qdev->bars);
>>>> +request_regions_fail:
>>>> +    pci_disable_device(pdev);
>>>> +enable_fail:
>>>> +    pci_set_drvdata(pdev, NULL);
>>>> +bar_fail:
>>>> +    for (i = 0; i < qdev->num_dbc; ++i)
>>>> +        cleanup_srcu_struct(&qdev->dbc[i].ch_lock);
>>>> +    cleanup_srcu_struct(&qdev->dev_lock);
>>>> +    destroy_workqueue(qdev->cntl_wq);
>>>> +wq_fail:
>>>> +    return ret;
>>>> +}
>>>> +
>>>> +static void qaic_pci_remove(struct pci_dev *pdev)
>>>> +{
>>>> +    struct qaic_device *qdev = pci_get_drvdata(pdev);
>>>> +    int i;
>>>> +
>>>> +    if (!qdev)
>>>> +        return;
>>>> +
>>>> +    qaic_dev_reset_clean_local_state(qdev, false);
>>>> +    qaic_mhi_free_controller(qdev->mhi_cntl, link_up);
>>>> +    for (i = 0; i < qdev->num_dbc; ++i) {
>>>> +        devm_free_irq(&pdev->dev, pci_irq_vector(pdev, i + 1),
>>>> +                  &qdev->dbc[i]);
>>>> +        cleanup_srcu_struct(&qdev->dbc[i].ch_lock);
>>>> +    }
>>>> +    destroy_workqueue(qdev->cntl_wq);
>>>> +    pci_free_irq_vectors(pdev);
>>>> +    iounmap(qdev->bar_0);
>>>> +    pci_clear_master(pdev);
>>>> +    pci_release_selected_regions(pdev, qdev->bars);
>>>> +    pci_disable_device(pdev);
>>>> +    pci_set_drvdata(pdev, NULL);
>>>> +}
>>>> +
>>>> +static void qaic_pci_shutdown(struct pci_dev *pdev)
>>>> +{
>>>> +    /* see qaic_exit for what link_up is doing */
>>>> +    link_up = true;
>>>> +    qaic_pci_remove(pdev);
>>>> +}
>>>> +
>>>> +static pci_ers_result_t qaic_pci_error_detected(struct pci_dev *pdev,
>>>> +                        pci_channel_state_t error)
>>>> +{
>>>> +    return PCI_ERS_RESULT_NEED_RESET;
>>>> +}
>>>> +
>>>> +static void qaic_pci_reset_prepare(struct pci_dev *pdev)
>>>> +{
>>>> +    struct qaic_device *qdev = pci_get_drvdata(pdev);
>>>> +
>>>> +    qaic_notify_reset(qdev);
>>>> +    qaic_mhi_start_reset(qdev->mhi_cntl);
>>>> +    qaic_dev_reset_clean_local_state(qdev, false);
>>>> +}
>>>> +
>>>> +static void qaic_pci_reset_done(struct pci_dev *pdev)
>>>> +{
>>>> +    struct qaic_device *qdev = pci_get_drvdata(pdev);
>>>> +
>>>> +    qdev->in_reset = false;
>>>> +    qaic_mhi_reset_done(qdev->mhi_cntl);
>>>> +}
>>>> +
>>>> +static const struct mhi_device_id qaic_mhi_match_table[] = {
>>>> +    { .chan = "QAIC_CONTROL", },
>>>> +    {},
>>>> +};
>>>> +
>>>> +static struct mhi_driver qaic_mhi_driver = {
>>>> +    .id_table = qaic_mhi_match_table,
>>>> +    .remove = qaic_mhi_remove,
>>>> +    .probe = qaic_mhi_probe,
>>>> +    .ul_xfer_cb = qaic_mhi_ul_xfer_cb,
>>>> +    .dl_xfer_cb = qaic_mhi_dl_xfer_cb,
>>>> +    .driver = {
>>>> +        .name = "qaic_mhi",
>>>> +    },
>>>> +};
>>>> +
>>>> +static const struct pci_device_id qaic_ids[] = {
>>>> +    { PCI_DEVICE(PCI_VENDOR_ID_QCOM, PCI_DEV_AIC100), },
>>>> +    { }
>>>> +};
>>>> +MODULE_DEVICE_TABLE(pci, qaic_ids);
>>>> +
>>>> +static const struct pci_error_handlers qaic_pci_err_handler = {
>>>> +    .error_detected = qaic_pci_error_detected,
>>>> +    .reset_prepare = qaic_pci_reset_prepare,
>>>> +    .reset_done = qaic_pci_reset_done,
>>>> +};
>>>> +
>>>> +static struct pci_driver qaic_pci_driver = {
>>>> +    .name = QAIC_NAME,
>>>> +    .id_table = qaic_ids,
>>>> +    .probe = qaic_pci_probe,
>>>> +    .remove = qaic_pci_remove,
>>>> +    .shutdown = qaic_pci_shutdown,
>>>> +    .err_handler = &qaic_pci_err_handler,
>>>> +};
>>>> +
>>>> +static int __init qaic_init(void)
>>>> +{
>>>> +    int ret;
>>>> +
>>>> +    if (datapath_polling)
>>>> +        poll_datapath = true;
>>>> +
>>>> +    ret = mhi_driver_register(&qaic_mhi_driver);
>>>> +    if (ret) {
>>>> +        pr_debug("qaic: mhi_driver_register failed %d\n", ret);
>>>> +        goto free_class;
>>>> +    }
>>>> +
>>>> +    ret = pci_register_driver(&qaic_pci_driver);
>>>> +    if (ret) {
>>>> +        pr_debug("qaic: pci_register_driver failed %d\n", ret);
>>>> +        goto free_mhi;
>>>> +    }
>>>> +
>>>> +    ret = mhi_qaic_ctrl_init();
>>>> +    if (ret) {
>>>> +        pr_debug("qaic: mhi_qaic_ctrl_init failed %d\n", ret);
>>>> +        goto free_pci;
>>>> +    }
>>>> +
>>>> +    return 0;
>>>> +
>>>> +free_pci:
>>>> +    pci_unregister_driver(&qaic_pci_driver);
>>>> +free_mhi:
>>>> +    mhi_driver_unregister(&qaic_mhi_driver);
>>>> +free_class:
>>>
>>> This label doesn't free anything. It should be renamed.
>>
>> Doh.  Holdover from a previous version.  It does need to be renamed.
>>
>>>
>>>> +    return ret;
>>>> +}
>>>> +
>>>> +static void __exit qaic_exit(void)
>>>> +{
>>>> +    /*
>>>> +     * We assume that qaic_pci_remove() is called due to a hotplug event
>>>> +     * which would mean that the link is down, and thus
>>>> +     * qaic_mhi_free_controller() should not try to access the device during
>>>> +     * cleanup.
>>>> +     * We call pci_unregister_driver() below, which also triggers
>>>> +     * qaic_pci_remove(), but since this is module exit, we expect the link
>>>> +     * to the device to be up, in which case qaic_mhi_free_controller()
>>>> +     * should try to access the device during cleanup to put the device in
>>>> +     * a sane state.
>>>> +     * For that reason, we set link_up here to let qaic_mhi_free_controller
>>>> +     * know the expected link state.  Since the module is going to be
>>>> +     * removed at the end of this, we don't need to worry about
>>>> +     * reinitializing the link_up state after the cleanup is done.
>>>> +     */
>>>> +    link_up = true;
>>>
>>> Maybe you could just use pdev->current_state instead of link_up?
>>
>> Maybe.  Will have a look at that.
>>
>>>
>>>> +    mhi_qaic_ctrl_deinit();
>>>> +    pci_unregister_driver(&qaic_pci_driver);
>>>> +    mhi_driver_unregister(&qaic_mhi_driver);
>>>> +}
>>>> +
>>>> +module_init(qaic_init);
>>>> +module_exit(qaic_exit);
>>>> +
>>>> +MODULE_AUTHOR(QAIC_DESC " Kernel Driver Team");
>>>> +MODULE_DESCRIPTION(QAIC_DESC " Accel Driver");
>>>> +MODULE_LICENSE("GPL");
>>>> diff --git a/include/uapi/drm/qaic_accel.h b/include/uapi/drm/qaic_accel.h
>>>> new file mode 100644
>>>> index 0000000..d5fa6f5
>>>> --- /dev/null
>>>> +++ b/include/uapi/drm/qaic_accel.h
>>>> @@ -0,0 +1,283 @@
>>>> +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note
>>>> + *
>>>> + * Copyright (c) 2019-2020, The Linux Foundation. All rights reserved.
>>>> + * Copyright (c) 2021-2023 Qualcomm Innovation Center, Inc. All rights reserved.
>>>> + */
>>>> +
>>>> +#ifndef QAIC_ACCEL_H_
>>>> +#define QAIC_ACCEL_H_
>>>> +
>>>> +#include <linux/ioctl.h>
>>>> +#include <linux/types.h>
>>>
>>> These two headers should not be needed here.
>>
>> Fair enough.  Listed out of paranoia since they define things used in 
>> this header.
>>
>>>
>>>> +#include "drm.h"
>>>> +
>>>> +#if defined(__CPLUSPLUS)
>>>
>>> Use lowercase here: __cplusplus
>>
>> Doh.  Not sure why uppercase was used.  The referenced examples sure 
>> didn't have that.
>>
>>>
>>>> +extern "C" {
>>>> +#endif
>>>> +
>>>> +#define QAIC_MANAGE_MAX_MSG_LENGTH SZ_4K    /**<
>>>> +                          * The length(4K) includes len and
>>>> +                          * count fields of qaic_manage_msg
>>>> +                          */
>>>
>>> I guess these are doxygen style commands but you should be using 
>>> kernel-doc here.
>>> See https://docs.kernel.org/doc-guide/kernel-doc.html.
>>> This can be used to verify the header:
>>> $ scripts/kernel-doc -v -none include/uapi/drm/qaic_accel.h
>>
>> include/uapi/drm/drm.h was used as the example for these, and thus it 
>> was assumed that was the DRM style.
>>
>> I'm not convinced kernel-doc is applicable to all of these.  Will review.
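>>
>> For reference, a kernel-doc style block for one of the simpler structs here
>> would look roughly like this (sketch only, descriptions abbreviated):
>>
>> /**
>>  * struct qaic_manage_msg - Message sent to the device via the manage ioctl
>>  * @len: Length of valid data - ie sum of all transactions
>>  * @count: Number of transactions in the message
>>  * @data: Pointer to an array of transactions
>>  */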
>>
>>>
>>>> +
>>>> +enum qaic_sem_flags {
>>>
>>> Is there any specific reason for enums if all values are explicitly 
>>> given?
>>
>> It is a style pulled from elsewhere.  I can remove the enums.
>>
>>>
>>>> +    SEM_INSYNCFENCE =    0x1,
>>>
>>> All these enums/defines end up in a global user space namespace.
>>> I would advise to prefix everything with QAIC_ or DRM_QAIC_ (e.g. 
>>> QAIC_SEM_INSYNCFENCE)
>>> to avoid conflicts with other drivers or user space libs.
>>
>> Good point.  Will do.
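>>
>> With the enums dropped in favor of defines, that would end up roughly as:
>>
>> #define QAIC_SEM_INSYNCFENCE	0x1
>> #define QAIC_SEM_OUTSYNCFENCE	0x2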
>>
>>>
>>>> +    SEM_OUTSYNCFENCE =    0x2,
>>>> +};
>>>> +
>>>> +enum qaic_sem_cmd {
>>>> +    SEM_NOP =        0,
>>>> +    SEM_INIT =        1,
>>>> +    SEM_INC =        2,
>>>> +    SEM_DEC =        3,
>>>> +    SEM_WAIT_EQUAL =    4,
>>>> +    SEM_WAIT_GT_EQ =    5, /**< Greater than or equal */
>>>> +    SEM_WAIT_GT_0 =        6, /**< Greater than 0 */
>>>> +};
>>>> +
>>>> +enum qaic_manage_transaction_type {
>>>> +    TRANS_UNDEFINED =            0,
>>>> +    TRANS_PASSTHROUGH_FROM_USR =        1,
>>>> +    TRANS_PASSTHROUGH_TO_USR =        2,
>>>> +    TRANS_PASSTHROUGH_FROM_DEV =        3,
>>>> +    TRANS_PASSTHROUGH_TO_DEV =        4,
>>>> +    TRANS_DMA_XFER_FROM_USR =        5,
>>>> +    TRANS_DMA_XFER_TO_DEV =            6,
>>>> +    TRANS_ACTIVATE_FROM_USR =        7,
>>>> +    TRANS_ACTIVATE_FROM_DEV =        8,
>>>> +    TRANS_ACTIVATE_TO_DEV =            9,
>>>> +    TRANS_DEACTIVATE_FROM_USR =        10,
>>>> +    TRANS_DEACTIVATE_FROM_DEV =        11,
>>>> +    TRANS_STATUS_FROM_USR =            12,
>>>> +    TRANS_STATUS_TO_USR =            13,
>>>> +    TRANS_STATUS_FROM_DEV =            14,
>>>> +    TRANS_STATUS_TO_DEV =            15,
>>>> +    TRANS_TERMINATE_FROM_DEV =        16,
>>>> +    TRANS_TERMINATE_TO_DEV =        17,
>>>> +    TRANS_DMA_XFER_CONT =            18,
>>>> +    TRANS_VALIDATE_PARTITION_FROM_DEV =    19,
>>>> +    TRANS_VALIDATE_PARTITION_TO_DEV =    20,
>>>> +    TRANS_MAX =                21
>>>> +};
>>>> +
>>>> +struct qaic_manage_trans_hdr {
>>>> +    __u32 type;    /**< in, value from enum manage_transaction_type */
>>>> +    __u32 len;    /**< in, length of this transaction, including the header */
>>>> +};
>>>> +
>>>> +struct qaic_manage_trans_passthrough {
>>>> +    struct qaic_manage_trans_hdr hdr;
>>>> +    __u8 data[];    /**< in, userspace must encode in little endian */
>>>> +};
>>>> +
>>>> +struct qaic_manage_trans_dma_xfer {
>>>> +    struct qaic_manage_trans_hdr hdr;
>>>> +    __u32 tag;    /**< in, device specific */
>>>> +    __u32 count;    /**< in */
>>>> +    __u64 addr;    /**< in, address of the data to be transferred via DMA */
>>>> +    __u64 size;    /**< in, length of the data to be transferred via DMA */
>>>> +};
>>>> +
>>>> +struct qaic_manage_trans_activate_to_dev {
>>>> +    struct qaic_manage_trans_hdr hdr;
>>>> +    __u32 queue_size;    /**<
>>>> +                  * in, number of elements in DBC request
>>>> +                  * and respose queue
>>>> +                  */
>>>> +    __u32 eventfd;        /**< in */
>>>> +    __u32 options;        /**< in, device specific */
>>>> +    __u32 pad;        /**< pad must be 0 */
>>>> +};
>>>> +
>>>> +struct qaic_manage_trans_activate_from_dev {
>>>> +    struct qaic_manage_trans_hdr hdr;
>>>> +    __u32 status;    /**< out, status of activate transaction */
>>>> +    __u32 dbc_id;    /**< out, Identifier of assigned DMA Bridge channel */
>>>> +    __u64 options;    /**< out */
>>>> +};
>>>> +
>>>> +struct qaic_manage_trans_deactivate {
>>>> +    struct qaic_manage_trans_hdr hdr;
>>>> +    __u32 dbc_id;    /**< in, Identifier of assigned DMA Bridge channel */
>>>> +    __u32 pad;    /**< pad must be 0 */
>>>> +};
>>>> +
>>>> +struct qaic_manage_trans_status_to_dev {
>>>> +    struct qaic_manage_trans_hdr hdr;
>>>> +};
>>>> +
>>>> +struct qaic_manage_trans_status_from_dev {
>>>> +    struct qaic_manage_trans_hdr hdr;
>>>> +    __u16 major;    /**< out, major vesrion of NNC protocol used by device */
>>>> +    __u16 minor;    /**< out, minor vesrion of NNC protocol used by device */
>>>
>>> vesrion -> version
>>
>> Will fix.
>>
>>>
>>>> +    __u32 status;    /**< out, status of query transaction  */
>>>> +    __u64 status_flags;    /**<
>>>> +                  * out
>>>> +                  * 0    : If set then device has CRC check enabled
>>>> +                  * 1:63 : Unused
>>>> +                  */
>>>> +};
>>>> +
>>>> +struct qaic_manage_msg {
>>>> +    __u32 len;    /**< in, Length of valid data - ie sum of all transactions */
>>>> +    __u32 count;    /**< in, Number of transactions in message */
>>>> +    __u64 data;    /**< in, Pointer to array of transactions */
>>>> +};
>>>> +
>>>> +struct qaic_create_bo {
>>>> +    __u64 size;    /**< in, Size of BO in byte */
>>>> +    __u32 handle;    /**< out, Returned GEM handle for the BO */
>>>> +    __u32 pad;    /**< pad must be 0 */
>>>> +};
>>>> +
>>>> +struct qaic_mmap_bo {
>>>> +    __u32 handle;    /**< in, Handle for the BO being mapped. */
>>>
>>> The comment is misleading. BO is not mapped by this ioctl().
>>
>> It's mapping the handle to the offset, which is a type of mapping.  I 
>> think I get your point though.  Will try to wordsmith this better.
>>
>>>
>>>> +    __u32 pad;    /**< pad must be 0 */
>>>> +    __u64 offset;    /**<
>>>> +              * out, offset into the drm node to use for
>>>> +              * subsequent mmap call
>>>> +              */
>>>> +};
>>>> +
>>>> +/**
>>>> + * @brief semaphore command
>>>> + */
>>>> +struct qaic_sem {
>>>> +    __u16 val;    /**< in, Only lower 12 bits are valid */
>>>> +    __u8  index;    /**< in, Only lower 5 bits are valid */
>>>> +    __u8  presync;    /**< in, 1 if presync operation, 0 if postsync */
>>>> +    __u8  cmd;    /**< in, See enum sem_cmd */
>>>> +    __u8  flags;    /**< in, See sem_flags for valid bits.  All others must be 0 */
>>>> +    __u16 pad;    /**< pad must be 0 */
>>>> +};
>>>> +
>>>> +struct qaic_attach_slice_entry {
>>>> +    __u64 size;        /**< in, Size memory to allocate for this BO slice */
>>>> +    struct qaic_sem    sem0;    /**< in, Must be zero if not valid */
>>>> +    struct qaic_sem    sem1;    /**< in, Must be zero if not valid */
>>>> +    struct qaic_sem    sem2;    /**< in, Must be zero if not valid */
>>>> +    struct qaic_sem    sem3;    /**< in, Must be zero if not valid */
>>>> +    __u64 dev_addr;        /**< in, Address in device to/from which data is copied */
>>>> +    __u64 db_addr;        /**< in, Doorbell address */
>>>> +    __u32 db_data;        /**< in, Data to write to doorbell */
>>>> +    __u32 db_len;        /**<
>>>> +                  * in, Doorbell length - 32, 16, or 8 bits.
>>>> +                  * 0 means doorbell is inactive
>>>> +                  */
>>>> +    __u64 offset;        /**< in, Offset from start of buffer */
>>>> +};
>>>> +
>>>> +struct qaic_attach_slice_hdr {
>>>> +    __u32 count;    /**< in, Number of slices for this BO */
>>>> +    __u32 dbc_id;    /**< in, Associate this BO with this DMA Bridge channel */
>>>> +    __u32 handle;    /**< in, Handle of BO to which slicing information is to be attached */
>>>> +    __u32 dir;    /**< in, Direction of data: 1 = DMA_TO_DEVICE, 2 = DMA_FROM_DEVICE */
>>>> +    __u64 size;    /**<
>>>> +              * in, Total length of BO
>>>> +              * If BO is imported (DMABUF/PRIME) then this size
>>>> +              * should not exceed the size of DMABUF provided.
>>>> +              * If BO is allocated using DRM_IOCTL_QAIC_CREATE_BO
>>>> +              * then this size should be exactly same as the size
>>>> +              * provided during DRM_IOCTL_QAIC_CREATE_BO.
>>>> +              */
>>>> +};
>>>> +
>>>> +struct qaic_attach_slice {
>>>> +    struct qaic_attach_slice_hdr hdr;
>>>> +    __u64 data;    /**<
>>>> +              * in, Pointer to a buffer which is container of
>>>> +              * struct qaic_attach_slice_entry[]
>>>> +              */
>>>> +};
>>>> +
>>>> +struct qaic_execute_entry {
>>>> +    __u32 handle;    /**< in, buffer handle */
>>>> +    __u32 dir;    /**< in, 1 = to device, 2 = from device */
>>>> +};
>>>> +
>>>> +struct qaic_partial_execute_entry {
>>>> +    __u32 handle;    /**< in, buffer handle */
>>>> +    __u32 dir;    /**< in, 1 = to device, 2 = from device */
>>>> +    __u64 resize;    /**< in, 0 = no resize */
>>>> +};
>>>> +
>>>> +struct qaic_execute_hdr {
>>>> +    __u32 count;    /**< in, number of executes following this header */
>>>> +    __u32 dbc_id;    /**< in, Identifier of assigned DMA Bridge channel */
>>>> +};
>>>> +
>>>> +struct qaic_execute {
>>>> +    struct qaic_execute_hdr hdr;
>>>> +    __u64 data;    /**< in, qaic_execute_entry or qaic_partial_execute_entry container */
>>>> +};
>>>> +
>>>> +struct qaic_wait {
>>>> +    __u32 handle;    /**< in, handle to wait on until execute is complete */
>>>> +    __u32 timeout;    /**< in, timeout for wait(in ms) */
>>>> +    __u32 dbc_id;    /**< in, Identifier of assigned DMA Bridge channel */
>>>> +    __u32 pad;    /**< pad must be 0 */
>>>> +};
>>>> +
>>>> +struct qaic_perf_stats_hdr {
>>>> +    __u16 count;    /**< in, Total number BOs requested */
>>>> +    __u16 pad;    /**< pad must be 0 */
>>>> +    __u32 dbc_id;    /**< in, Identifier of assigned DMA Bridge channel */
>>>> +};
>>>> +
>>>> +struct qaic_perf_stats {
>>>> +    struct qaic_perf_stats_hdr hdr;
>>>> +    __u64 data;    /**< in, qaic_perf_stats_entry container */
>>>> +};
>>>> +
>>>> +struct qaic_perf_stats_entry {
>>>> +    __u32 handle;            /**< in, Handle of the memory request */
>>>> +    __u32 queue_level_before;    /**<
>>>> +                      * out, Number of elements in queue
>>>> +                      * before submission given memory request
>>>> +                      */
>>>> +    __u32 num_queue_element;    /**<
>>>> +                      * out, Number of elements to add in the
>>>> +                      * queue for given memory request
>>>> +                      */
>>>> +    __u32 submit_latency_us;    /**<
>>>> +                      * out, Time taken by kernel to submit
>>>> +                      * the request to device
>>>> +                      */
>>>> +    __u32 device_latency_us;    /**<
>>>> +                      * out, Time taken by device to execute the
>>>> +                      * request. 0 if request is not completed
>>>> +                      */
>>>> +    __u32 pad;            /**< pad must be 0 */
>>>> +};
>>>> +
>>>> +struct qaic_part_dev {
>>>> +    __u32 partition_id;    /**< in, reservation id */
>>>> +    __u16 remove;        /**< in, 1 - Remove device 0 - Create device */
>>>> +    __u16 pad;        /**< pad must be 0 */
>>>> +};
>>>> +
>>>> +#define DRM_QAIC_MANAGE                0x00
>>>> +#define DRM_QAIC_CREATE_BO            0x01
>>>> +#define DRM_QAIC_MMAP_BO            0x02
>>>
>>> I know that MMAP_BO ioctl is common in drm drivers but in my opinion 
>>> it is a very poor name.
>>> I suggest naming it BO_INFO so in future you could extend it with 
>>> other bo params besides
>>> mmap offset.
>>
>> I see your point, but I don't see how to implement it.
>>
>> Let us assume this IOCTL is renamed DRM_QAIC_BO_INFO, and struct 
>> qaic_mmap_bo is named qaic_bo_info.
>>
>> qaic_bo_info is part of the uAPI.  If, "tomorrow" we need to add a 
>> field to it, we cannot.  That changes the structure size and breaks 
>> the uAPI.
>>
>> I can't add reserved fields in anticipation of future use since GregKH 
>> had said not to do that - 
>> https://lore.kernel.org/all/20200514163734.GB3154055@kroah.com/
>>
>> So, I'm thinking the only approach would be a linked list similar to 
>> EXECUTE_BO where qaic_bo_info holds a userspace pointer that is the 
>> head of the list, and new list nodes can be defined in future.  That 
>> seems like an overengineered solution to some hypothetical future 
>> which may not come to pass.  If this is the solution, then I think I'd 
>> prefer to leave this ioctl as is, and deprecate it if the extension 
>> usecase ever comes to pass.
>>
>> Thoughts?  Am I missing something?
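>>
>> For concreteness, the list shape described above would be roughly the
>> following (hypothetical names, purely to illustrate the idea):
>>
>> struct qaic_bo_info_entry {
>> 	__u32 type;	/* what this entry carries, e.g. the mmap offset */
>> 	__u32 pad;	/* must be 0 */
>> 	__u64 next;	/* userspace pointer to the next entry, 0 ends the list */
>> 	__u64 data;	/* type specific payload */
>> };
>>
>> struct qaic_bo_info {
>> 	__u32 handle;	/* BO to query */
>> 	__u32 pad;	/* must be 0 */
>> 	__u64 entries;	/* userspace pointer to the first qaic_bo_info_entry */
>> };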
>>
>>>
>>>> +#define DRM_QAIC_ATTACH_SLICE_BO        0x03
>>>> +#define DRM_QAIC_EXECUTE_BO            0x04
>>>> +#define DRM_QAIC_PARTIAL_EXECUTE_BO        0x05
>>>> +#define DRM_QAIC_WAIT_BO            0x06
>>>> +#define DRM_QAIC_PERF_STATS_BO            0x07
>>>> +#define DRM_QAIC_PART_DEV            0x08
>>>> +
>>>> +#define DRM_IOCTL_QAIC_MANAGE            DRM_IOWR(DRM_COMMAND_BASE + DRM_QAIC_MANAGE, struct qaic_manage_msg)
>>>> +#define DRM_IOCTL_QAIC_CREATE_BO        DRM_IOWR(DRM_COMMAND_BASE + DRM_QAIC_CREATE_BO,    struct qaic_create_bo)
>>>> +#define DRM_IOCTL_QAIC_MMAP_BO            DRM_IOWR(DRM_COMMAND_BASE + DRM_QAIC_MMAP_BO, struct qaic_mmap_bo)
>>>> +#define DRM_IOCTL_QAIC_ATTACH_SLICE_BO        DRM_IOW(DRM_COMMAND_BASE + DRM_QAIC_ATTACH_SLICE_BO, struct qaic_attach_slice)
>>>> +#define DRM_IOCTL_QAIC_EXECUTE_BO        DRM_IOW(DRM_COMMAND_BASE + DRM_QAIC_EXECUTE_BO,    struct qaic_execute)
>>>> +#define DRM_IOCTL_QAIC_PARTIAL_EXECUTE_BO    DRM_IOW(DRM_COMMAND_BASE + DRM_QAIC_PARTIAL_EXECUTE_BO,    struct qaic_execute)
>>>> +#define DRM_IOCTL_QAIC_WAIT_BO            DRM_IOW(DRM_COMMAND_BASE + DRM_QAIC_WAIT_BO, struct qaic_wait)
>>>> +#define DRM_IOCTL_QAIC_PERF_STATS_BO        DRM_IOWR(DRM_COMMAND_BASE + DRM_QAIC_PERF_STATS_BO, struct qaic_perf_stats)
>>>> +#define DRM_IOCTL_QAIC_PART_DEV            DRM_IOWR(DRM_COMMAND_BASE + DRM_QAIC_PART_DEV, struct qaic_part_dev)
>>>> +
>>>> +#if defined(__CPLUSPLUS)
>>>
>>> Use lowercase here: __cplusplus
>>
>> Will do.
>>
>>>
>>>> +}
>>>> +#endif
>>>> +
>>>> +#endif /* QAIC_ACCEL_H_ */
>>>
>>> Regards,
>>> Jacek
>>>
>>


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 5/8] accel/qaic: Add datapath
  2023-02-24 19:36       ` Jeffrey Hugo
@ 2023-02-27 17:14         ` Stanislaw Gruszka
  -1 siblings, 0 replies; 68+ messages in thread
From: Stanislaw Gruszka @ 2023-02-27 17:14 UTC (permalink / raw)
  To: Jeffrey Hugo
  Cc: linux-doc, linux-arm-msm, ogabbay, dri-devel, quic_ajitpals,
	quic_pkanojiy, quic_carlv, jacek.lawrynowicz

On Fri, Feb 24, 2023 at 12:36:51PM -0700, Jeffrey Hugo wrote:
> > > +static int reserve_pages(unsigned long start_pfn, unsigned long nr_pages,
> > > +			 bool reserve)
> > > +{
> > > +	unsigned long pfn;
> > > +	unsigned long end_pfn = start_pfn + nr_pages;
> > > +	struct page *page;
> > > +
> > > +	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
> > > +		if (!pfn_valid(pfn))
> > > +			return -EINVAL;
> > > +		page =  pfn_to_page(pfn);
> > > +		if (reserve)
> > > +			SetPageReserved(page);
> > > +		else
> > > +			ClearPageReserved(page);
> > 
> > It is needed? Looks like taken from some legacy code.
> 
> Required for remap_pfn_range().

PG_reserved is not required any longer for remap_pfn_range(), here
is excerpt from comment from include/linux/page-flags.h :

 * Some PG_reserved pages will be excluded from the hibernation image.
 * PG_reserved does in general not hinder anybody from dumping or swapping
 * and is no longer required for remap_pfn_range(). ioremap might require it.
 * Consequently, PG_reserved for a page mapped into user space can indicate
 * the zero page, the vDSO, MMIO pages or device memory.

> > > +static int copy_sgt(struct qaic_device *qdev, struct sg_table **sgt_out,
> > > +		    struct sg_table *sgt_in, u64 size, u64 offset)
> > > +{
> > > +	int total_len, len, nents, offf = 0, offl = 0;
> > > +	struct scatterlist *sg, *sgn, *sgf, *sgl;
> > > +	struct sg_table *sgt;
> > > +	int ret, j;
> > > +
> > > +	/* find out number of relevant nents needed for this mem */
> > > +	total_len = 0;
> > > +	sgf = NULL;
> > > +	sgl = NULL;
> > > +	nents = 0;
> > > +
> > > +	size = size ? size : PAGE_SIZE;
> > > +	for (sg = sgt_in->sgl; sg; sg = sg_next(sg)) {
> > > +		len = sg_dma_len(sg);
> > > +
> > > +		if (!len)
> > > +			continue;
> > > +		if (offset >= total_len && offset < total_len + len) {
> > > +			sgf = sg;
> > > +			offf = offset - total_len;
> > > +		}
> > > +		if (sgf)
> > > +			nents++;
> > > +		if (offset + size >= total_len &&
> > > +		    offset + size <= total_len + len) {
> > > +			sgl = sg;
> > > +			offl = offset + size - total_len;
> > > +			break;
> > > +		}
> > > +		total_len += len;
> > > +	}
> > > +
> > > +	if (!sgf || !sgl) {
> > > +		ret = -EINVAL;
> > > +		goto out;
> > > +	}
> > > +
> > > +	sgt = kzalloc(sizeof(*sgt), GFP_KERNEL);
> > > +	if (!sgt) {
> > > +		ret = -ENOMEM;
> > > +		goto out;
> > > +	}
> > > +
> > > +	ret = sg_alloc_table(sgt, nents, GFP_KERNEL);
> > > +	if (ret)
> > > +		goto free_sgt;
> > > +
> > > +	/* copy relevant sg node and fix page and length */
> > > +	sgn = sgf;
> > > +	for_each_sgtable_sg(sgt, sg, j) {
> > > +		memcpy(sg, sgn, sizeof(*sg));
> > > +		if (sgn == sgf) {
> > > +			sg_dma_address(sg) += offf;

This looks a bit suspicious. Are you sure you can modify
sg->dma_address and still use it as valid value ?

> > > +			sg_dma_len(sg) -= offf;
> > > +			sg_set_page(sg, sg_page(sgn),
> > > +				    sg_dma_len(sg), offf);
> > > +		} else {
> > > +			offf = 0;
> > > +		}
> > > +		if (sgn == sgl) {
> > > +			sg_dma_len(sg) = offl - offf;
> > > +			sg_set_page(sg, sg_page(sgn),
> > > +				    offl - offf, offf);
> > > +			sg_mark_end(sg);
> > > +			break;
> > > +		}
> > > +		sgn = sg_next(sgn);
> > > +	}
> > 
> > Why not use sg_copy_table() ? Overall copy_sgt() seems to be something
> > that could be replaced by generic helper or at least simplify.
> 
> I don't see "sg_copy_table" defined in 6.2. 

Because there is no such function in any kernel source. It was only my
imagination, not sure right now how I came up with this function name :-/
Sorry about the confusion.

There are only sg_copy_{to,from}_buffer(), but not really useful in
this case.

> Are you suggesting renaming
> this function?  I guess I'm not quite understanding your comment here. Can
> you elaborate?

Renaming would be nice. I was thinking about simplifying it, not sure
now if that's easily achievable, though.

Regards
Stanislaw



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 3/8] accel/qaic: Add MHI controller
  2023-02-06 15:41   ` Jeffrey Hugo
@ 2023-02-28 11:52     ` Stanislaw Gruszka
  -1 siblings, 0 replies; 68+ messages in thread
From: Stanislaw Gruszka @ 2023-02-28 11:52 UTC (permalink / raw)
  To: Jeffrey Hugo
  Cc: ogabbay, airlied, daniel, dri-devel, jacek.lawrynowicz,
	quic_pkanojiy, quic_carlv, quic_ajitpals, linux-doc,
	linux-arm-msm

On Mon, Feb 06, 2023 at 08:41:40AM -0700, Jeffrey Hugo wrote:
> +	mhi_cntl = kzalloc(sizeof(*mhi_cntl), GFP_KERNEL);
[snip]
> +	mhi_cntl->irq = kmalloc(sizeof(*mhi_cntl->irq), GFP_KERNEL);

I recommend usage of devm_kzalloc(), devm_kmalloc() for those
to simplify error and exit paths.

Regards
Stanislaw

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 7/8] accel/qaic: Add qaic driver to the build system
  2023-02-06 15:41   ` Jeffrey Hugo
@ 2023-03-01 13:02     ` Stanislaw Gruszka
  -1 siblings, 0 replies; 68+ messages in thread
From: Stanislaw Gruszka @ 2023-03-01 13:02 UTC (permalink / raw)
  To: Jeffrey Hugo
  Cc: ogabbay, airlied, daniel, dri-devel, jacek.lawrynowicz,
	quic_pkanojiy, quic_carlv, quic_ajitpals, linux-doc,
	linux-arm-msm

On Mon, Feb 06, 2023 at 08:41:44AM -0700, Jeffrey Hugo wrote:
> Now that we have all the components of a minimum QAIC which can boot and
> run an AIC100 device, add the infrastructure that allows the QAIC driver
> to be built.
> 
> Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
> Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com>
> ---
>  drivers/accel/Kconfig       |  1 +
>  drivers/accel/Makefile      |  1 +
>  drivers/accel/qaic/Kconfig  | 23 +++++++++++++++++++++++
>  drivers/accel/qaic/Makefile | 13 +++++++++++++
>  4 files changed, 38 insertions(+)
>  create mode 100644 drivers/accel/qaic/Kconfig
>  create mode 100644 drivers/accel/qaic/Makefile
> 
> diff --git a/drivers/accel/Kconfig b/drivers/accel/Kconfig
> index 8348639..56d0d01 100644
> --- a/drivers/accel/Kconfig
> +++ b/drivers/accel/Kconfig
> @@ -25,3 +25,4 @@ menuconfig DRM_ACCEL
>  
>  source "drivers/accel/habanalabs/Kconfig"
>  source "drivers/accel/ivpu/Kconfig"
> +source "drivers/accel/qaic/Kconfig"
> diff --git a/drivers/accel/Makefile b/drivers/accel/Makefile
> index 07aa77a..3cb6f14 100644
> --- a/drivers/accel/Makefile
> +++ b/drivers/accel/Makefile
> @@ -2,3 +2,4 @@
>  
>  obj-y	+= habanalabs/
>  obj-y	+= ivpu/
> +obj-y	+= qaic/

All should be changed to obj-$(CONFIG_DRM_ACCEL_DRIVER)
to avoid inspecting sub-directories. I'll send a patch for this.
Then you can adjust accordingly for qaic.
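
I.e. something along these lines (with whatever symbol each driver's
Kconfig actually defines):

  obj-$(CONFIG_DRM_ACCEL_HABANALABS)	+= habanalabs/
  obj-$(CONFIG_DRM_ACCEL_IVPU)		+= ivpu/
  obj-$(CONFIG_DRM_ACCEL_QAIC)		+= qaic/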

Regards
Stanislaw


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 8/8] MAINTAINERS: Add entry for QAIC driver
  2023-02-06 15:41   ` Jeffrey Hugo
@ 2023-03-01 13:03     ` Stanislaw Gruszka
  -1 siblings, 0 replies; 68+ messages in thread
From: Stanislaw Gruszka @ 2023-03-01 13:03 UTC (permalink / raw)
  To: Jeffrey Hugo
  Cc: linux-doc, linux-arm-msm, ogabbay, dri-devel, quic_ajitpals,
	quic_pkanojiy, quic_carlv, jacek.lawrynowicz

On Mon, Feb 06, 2023 at 08:41:45AM -0700, Jeffrey Hugo wrote:
> Add MAINTAINERS entry for the Qualcomm Cloud AI 100 driver.
> 
> Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
> Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com>
> ---
>  MAINTAINERS | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 263d37a..0a264f1 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -17170,6 +17170,14 @@ F:	Documentation/devicetree/bindings/clock/qcom,*
>  F:	drivers/clk/qcom/
>  F:	include/dt-bindings/clock/qcom,*
>  
> +QUALCOMM CLOUD AI (QAIC) DRIVER
> +M:	Jeffrey Hugo <quic_jhugo@quicinc.com>
> +L:	linux-arm-msm@vger.kernel.org

Is this the correct mailing list? Should it not be dri-devel?

Regards
Stanislaw

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 4/8] accel/qaic: Add control path
  2023-02-06 15:41   ` Jeffrey Hugo
@ 2023-03-01 13:18     ` Stanislaw Gruszka
  -1 siblings, 0 replies; 68+ messages in thread
From: Stanislaw Gruszka @ 2023-03-01 13:18 UTC (permalink / raw)
  To: Jeffrey Hugo
  Cc: ogabbay, airlied, daniel, dri-devel, jacek.lawrynowicz,
	quic_pkanojiy, quic_carlv, quic_ajitpals, linux-doc,
	linux-arm-msm

On Mon, Feb 06, 2023 at 08:41:41AM -0700, Jeffrey Hugo wrote:
> +	 * It turns out several of the iommu drivers don't combine adjacent
> +	 * regions, which is really what we expect based on the description of
> +	 * dma_map_sgtable(), so let's see if that can be done.  It makes our message
> +	 * more efficient.
> +	 */

Interesting ...

> +	last = sgt->sgl;
> +	nents_dma = nents;
> +	size = QAIC_MANAGE_EXT_MSG_LENGTH - msg_hdr_len - sizeof(*out_trans);
> +	for_each_sgtable_sg(sgt, sg, i) {
> +		if (sg_dma_address(last) + sg_dma_len(last) !=
> +		    sg_dma_address(sg)) {
> +			size -= sizeof(*asp);
> +			/* Save 1K for possible follow-up transactions. */
> +			if (size < SZ_1K) {
> +				nents_dma = i;
> +				break;
> +			}
> +		}
> +		last = sg;
> +	}

I would say there is either a reason why the iommu does not combine, or there is a problem in the iommu driver.
If there is a reason for not combining, this code is wrong.
If the problem is in the iommu driver, why not fix up the iommu?

And why not create an sg list that doesn't have adjacent entries in the first place?

Regards
Stanislaw

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 6/8] accel/qaic: Add mhi_qaic_cntl
  2023-02-06 15:41   ` Jeffrey Hugo
@ 2023-03-01 13:19     ` Stanislaw Gruszka
  -1 siblings, 0 replies; 68+ messages in thread
From: Stanislaw Gruszka @ 2023-03-01 13:19 UTC (permalink / raw)
  To: Jeffrey Hugo
  Cc: ogabbay, airlied, daniel, dri-devel, jacek.lawrynowicz,
	quic_pkanojiy, quic_carlv, quic_ajitpals, linux-doc,
	linux-arm-msm

On Mon, Feb 06, 2023 at 08:41:43AM -0700, Jeffrey Hugo wrote:
> From: Pranjal Ramajor Asha Kanojiya <quic_pkanojiy@quicinc.com>
> 
> Some of the MHI channels for an AIC100 device need to be routed to
> userspace so that userspace can communicate directly with QSM.  The MHI
> bus does not support this, and while the WWAN subsystem does (for the same
> reasons), AIC100 is not a WWAN device.  Also, MHI is not something that
> other accelerators are expected to share, thus an accel subsystem function
> that meets this usecase is unlikely.
> 
> Create a QAIC specific MHI userspace shim that exposes these channels.
> 
> Start with QAIC_SAHARA which is required to boot AIC100 and is consumed by
> the kickstart application as documented in aic100.rst
> 
> Each AIC100 instance (currently, up to 16) in a system will create a
> chardev for QAIC_SAHARA.  This chardev will be found as
> /dev/<mhi instance>_QAIC_SAHARA
> For example - /dev/mhi0_QAIC_SAHARA
> 
> Signed-off-by: Pranjal Ramajor Asha Kanojiya <quic_pkanojiy@quicinc.com>
> Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
> Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com>

Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 5/8] accel/qaic: Add datapath
  2023-02-27 17:14         ` Stanislaw Gruszka
@ 2023-03-01 16:08           ` Jeffrey Hugo
  -1 siblings, 0 replies; 68+ messages in thread
From: Jeffrey Hugo @ 2023-03-01 16:08 UTC (permalink / raw)
  To: Stanislaw Gruszka
  Cc: ogabbay, airlied, daniel, dri-devel, jacek.lawrynowicz,
	quic_pkanojiy, quic_carlv, quic_ajitpals, linux-doc,
	linux-arm-msm

On 2/27/2023 10:14 AM, Stanislaw Gruszka wrote:
> On Fri, Feb 24, 2023 at 12:36:51PM -0700, Jeffrey Hugo wrote:
>>>> +static int reserve_pages(unsigned long start_pfn, unsigned long nr_pages,
>>>> +			 bool reserve)
>>>> +{
>>>> +	unsigned long pfn;
>>>> +	unsigned long end_pfn = start_pfn + nr_pages;
>>>> +	struct page *page;
>>>> +
>>>> +	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
>>>> +		if (!pfn_valid(pfn))
>>>> +			return -EINVAL;
>>>> +		page =  pfn_to_page(pfn);
>>>> +		if (reserve)
>>>> +			SetPageReserved(page);
>>>> +		else
>>>> +			ClearPageReserved(page);
>>>
>>> It is needed? Looks like taken from some legacy code.
>>
>> Required for remap_pfn_range().
> 
> PG_reserved is not required any longer for remap_pfn_range(), here
> is excerpt from comment from include/linux/page-flags.h :
> 
>   * Some PG_reserved pages will be excluded from the hibernation image.
>   * PG_reserved does in general not hinder anybody from dumping or swapping
>   * and is no longer required for remap_pfn_range(). ioremap might require it.
>   * Consequently, PG_reserved for a page mapped into user space can indicate
>   * the zero page, the vDSO, MMIO pages or device memory.

I clearly missed that and was relying on other documentation.  Thank you 
for pointing this out.  Will remove.
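
The mmap path then reduces to something like this minimal sketch (assuming
the pfn and size have already been validated elsewhere):

	static int qaic_remap_sketch(struct vm_area_struct *vma, unsigned long pfn)
	{
		/* No SetPageReserved()/ClearPageReserved() loop needed anymore */
		return remap_pfn_range(vma, vma->vm_start, pfn,
				       vma->vm_end - vma->vm_start,
				       vma->vm_page_prot);
	}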

> 
>>>> +static int copy_sgt(struct qaic_device *qdev, struct sg_table **sgt_out,
>>>> +		    struct sg_table *sgt_in, u64 size, u64 offset)
>>>> +{
>>>> +	int total_len, len, nents, offf = 0, offl = 0;
>>>> +	struct scatterlist *sg, *sgn, *sgf, *sgl;
>>>> +	struct sg_table *sgt;
>>>> +	int ret, j;
>>>> +
>>>> +	/* find out number of relevant nents needed for this mem */
>>>> +	total_len = 0;
>>>> +	sgf = NULL;
>>>> +	sgl = NULL;
>>>> +	nents = 0;
>>>> +
>>>> +	size = size ? size : PAGE_SIZE;
>>>> +	for (sg = sgt_in->sgl; sg; sg = sg_next(sg)) {
>>>> +		len = sg_dma_len(sg);
>>>> +
>>>> +		if (!len)
>>>> +			continue;
>>>> +		if (offset >= total_len && offset < total_len + len) {
>>>> +			sgf = sg;
>>>> +			offf = offset - total_len;
>>>> +		}
>>>> +		if (sgf)
>>>> +			nents++;
>>>> +		if (offset + size >= total_len &&
>>>> +		    offset + size <= total_len + len) {
>>>> +			sgl = sg;
>>>> +			offl = offset + size - total_len;
>>>> +			break;
>>>> +		}
>>>> +		total_len += len;
>>>> +	}
>>>> +
>>>> +	if (!sgf || !sgl) {
>>>> +		ret = -EINVAL;
>>>> +		goto out;
>>>> +	}
>>>> +
>>>> +	sgt = kzalloc(sizeof(*sgt), GFP_KERNEL);
>>>> +	if (!sgt) {
>>>> +		ret = -ENOMEM;
>>>> +		goto out;
>>>> +	}
>>>> +
>>>> +	ret = sg_alloc_table(sgt, nents, GFP_KERNEL);
>>>> +	if (ret)
>>>> +		goto free_sgt;
>>>> +
>>>> +	/* copy relevant sg node and fix page and length */
>>>> +	sgn = sgf;
>>>> +	for_each_sgtable_sg(sgt, sg, j) {
>>>> +		memcpy(sg, sgn, sizeof(*sg));
>>>> +		if (sgn == sgf) {
>>>> +			sg_dma_address(sg) += offf;
> 
> This looks a bit suspicious. Are you sure you can modify
> sg->dma_address and still use it as valid value ?

A single entry in the sg table is a contiguous mapping of memory.  If it 
wasn't contiguous, it would have to be broken up into multiple entries. 
  In the simple case, a driver is going to take the dma_address/len pair 
and hand that directly to the device.  Then the device is going to 
access every address in that range.

If the device can access every address from dma_address to dma_address + 
len, why can't it access a subset of that?
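
Put another way, the copy is just carving a sub-range out of one
already-mapped contiguous region, roughly:

	/* Sketch: device address for a sub-range of a single mapped entry.
	 * Valid as long as off < sg_dma_len(sg); the result still points
	 * inside the one contiguous region mapped for this entry.
	 */
	static dma_addr_t subrange_addr(struct scatterlist *sg, u64 off)
	{
		return sg_dma_address(sg) + off;
	}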

>>>> +			sg_dma_len(sg) -= offf;
>>>> +			sg_set_page(sg, sg_page(sgn),
>>>> +				    sg_dma_len(sg), offf);
>>>> +		} else {
>>>> +			offf = 0;
>>>> +		}
>>>> +		if (sgn == sgl) {
>>>> +			sg_dma_len(sg) = offl - offf;
>>>> +			sg_set_page(sg, sg_page(sgn),
>>>> +				    offl - offf, offf);
>>>> +			sg_mark_end(sg);
>>>> +			break;
>>>> +		}
>>>> +		sgn = sg_next(sgn);
>>>> +	}
>>>
>>> Why not use sg_copy_table() ? Overall copy_sgt() seems to be something
>>> that could be replaced by generic helper or at least simplify.
>>
>> I don't see "sg_copy_table" defined in 6.2.
> 
> Because there is no such function in any kernel source. It was only my
> imagination, not sure right now how I came up with this function name :-/
> Sorry about the confusion.
> 
> There are only sg_copy_{to,from}_buffer(), but not really useful in
> this case.
> 
>> Are you suggesting renaming
>> this function?  I guess I'm not quite understanding your comment here. Can
>> you elaborate?
> 
> Renaming would be nice. I was thinking about simplifying it, not sure
> now if that's easily achievable, though.

Ok.  I'll think on this.


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 3/8] accel/qaic: Add MHI controller
  2023-02-28 11:52     ` Stanislaw Gruszka
@ 2023-03-01 16:09       ` Jeffrey Hugo
  -1 siblings, 0 replies; 68+ messages in thread
From: Jeffrey Hugo @ 2023-03-01 16:09 UTC (permalink / raw)
  To: Stanislaw Gruszka
  Cc: ogabbay, airlied, daniel, dri-devel, jacek.lawrynowicz,
	quic_pkanojiy, quic_carlv, quic_ajitpals, linux-doc,
	linux-arm-msm

On 2/28/2023 4:52 AM, Stanislaw Gruszka wrote:
> On Mon, Feb 06, 2023 at 08:41:40AM -0700, Jeffrey Hugo wrote:
>> +	mhi_cntl = kzalloc(sizeof(*mhi_cntl), GFP_KERNEL);
> [snip]
>> +	mhi_cntl->irq = kmalloc(sizeof(*mhi_cntl->irq), GFP_KERNEL);
> 
> I recommend usage of devm_kzalloc(), devm_kmalloc() for those
> to simplify error and exit paths.

When this was written, I didn't want to pass the struct device to the 
mhi_controller just for the purpose of using devm_*.  Today, I'm 
thinking that is not the end of the world, and devm has advantages. 
Will change.
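
Roughly along these lines (sketch, assuming the allocations can simply be
tied to the PCI device's lifetime):

	mhi_cntl = devm_kzalloc(&pdev->dev, sizeof(*mhi_cntl), GFP_KERNEL);
	if (!mhi_cntl)
		return NULL;

	mhi_cntl->irq = devm_kmalloc(&pdev->dev, sizeof(*mhi_cntl->irq), GFP_KERNEL);
	if (!mhi_cntl->irq)
		return NULL;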

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 7/8] accel/qaic: Add qaic driver to the build system
  2023-03-01 13:02     ` Stanislaw Gruszka
@ 2023-03-01 16:10       ` Jeffrey Hugo
  -1 siblings, 0 replies; 68+ messages in thread
From: Jeffrey Hugo @ 2023-03-01 16:10 UTC (permalink / raw)
  To: Stanislaw Gruszka
  Cc: ogabbay, airlied, daniel, dri-devel, jacek.lawrynowicz,
	quic_pkanojiy, quic_carlv, quic_ajitpals, linux-doc,
	linux-arm-msm

On 3/1/2023 6:02 AM, Stanislaw Gruszka wrote:
> On Mon, Feb 06, 2023 at 08:41:44AM -0700, Jeffrey Hugo wrote:
>> Now that we have all the components of a minimum QAIC which can boot and
>> run an AIC100 device, add the infrastructure that allows the QAIC driver
>> to be built.
>>
>> Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
>> Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com>
>> ---
>>   drivers/accel/Kconfig       |  1 +
>>   drivers/accel/Makefile      |  1 +
>>   drivers/accel/qaic/Kconfig  | 23 +++++++++++++++++++++++
>>   drivers/accel/qaic/Makefile | 13 +++++++++++++
>>   4 files changed, 38 insertions(+)
>>   create mode 100644 drivers/accel/qaic/Kconfig
>>   create mode 100644 drivers/accel/qaic/Makefile
>>
>> diff --git a/drivers/accel/Kconfig b/drivers/accel/Kconfig
>> index 8348639..56d0d01 100644
>> --- a/drivers/accel/Kconfig
>> +++ b/drivers/accel/Kconfig
>> @@ -25,3 +25,4 @@ menuconfig DRM_ACCEL
>>   
>>   source "drivers/accel/habanalabs/Kconfig"
>>   source "drivers/accel/ivpu/Kconfig"
>> +source "drivers/accel/qaic/Kconfig"
>> diff --git a/drivers/accel/Makefile b/drivers/accel/Makefile
>> index 07aa77a..3cb6f14 100644
>> --- a/drivers/accel/Makefile
>> +++ b/drivers/accel/Makefile
>> @@ -2,3 +2,4 @@
>>   
>>   obj-y	+= habanalabs/
>>   obj-y	+= ivpu/
>> +obj-y	+= qaic/
> 
> All should be changed to obj-$(CONFIG_DRM_ACCEL_DRIVER)
> to avoid inspecting sub-directories. I'll send a patch for this.
> Then you can adjust accordingly for qaic.

Sounds good.

-Jeff

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 8/8] MAINTAINERS: Add entry for QAIC driver
  2023-03-01 13:03     ` Stanislaw Gruszka
@ 2023-03-01 16:15       ` Jeffrey Hugo
  -1 siblings, 0 replies; 68+ messages in thread
From: Jeffrey Hugo @ 2023-03-01 16:15 UTC (permalink / raw)
  To: Stanislaw Gruszka
  Cc: ogabbay, airlied, daniel, dri-devel, jacek.lawrynowicz,
	quic_pkanojiy, quic_carlv, quic_ajitpals, linux-doc,
	linux-arm-msm

On 3/1/2023 6:03 AM, Stanislaw Gruszka wrote:
> On Mon, Feb 06, 2023 at 08:41:45AM -0700, Jeffrey Hugo wrote:
>> Add MAINTAINERS entry for the Qualcomm Cloud AI 100 driver.
>>
>> Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
>> Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com>
>> ---
>>   MAINTAINERS | 8 ++++++++
>>   1 file changed, 8 insertions(+)
>>
>> diff --git a/MAINTAINERS b/MAINTAINERS
>> index 263d37a..0a264f1 100644
>> --- a/MAINTAINERS
>> +++ b/MAINTAINERS
>> @@ -17170,6 +17170,14 @@ F:	Documentation/devicetree/bindings/clock/qcom,*
>>   F:	drivers/clk/qcom/
>>   F:	include/dt-bindings/clock/qcom,*
>>   
>> +QUALCOMM CLOUD AI (QAIC) DRIVER
>> +M:	Jeffrey Hugo <quic_jhugo@quicinc.com>
>> +L:	linux-arm-msm@vger.kernel.org
> 
> Is this the correct mailing list? Should it not be dri-devel?

linux-arm-msm is the Qualcomm mailing list.  Listing dri-devel in 
addition makes sense though.  Will do.
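
So the entry would become something like:

	QUALCOMM CLOUD AI (QAIC) DRIVER
	M:	Jeffrey Hugo <quic_jhugo@quicinc.com>
	L:	linux-arm-msm@vger.kernel.org
	L:	dri-devel@lists.freedesktop.org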

-Jeff

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 4/8] accel/qaic: Add control path
  2023-03-01 13:18     ` Stanislaw Gruszka
@ 2023-03-01 17:02       ` Jeffrey Hugo
  -1 siblings, 0 replies; 68+ messages in thread
From: Jeffrey Hugo @ 2023-03-01 17:02 UTC (permalink / raw)
  To: Stanislaw Gruszka
  Cc: linux-doc, linux-arm-msm, ogabbay, dri-devel, quic_ajitpals,
	quic_pkanojiy, quic_carlv, jacek.lawrynowicz

On 3/1/2023 6:18 AM, Stanislaw Gruszka wrote:
> On Mon, Feb 06, 2023 at 08:41:41AM -0700, Jeffrey Hugo wrote:
>> +	 * It turns out several of the iommu drivers don't combine adjacent
>> +	 * regions, which is really what we expect based on the description of
>> +	 * dma_map_sgtable(), so let's see if that can be done.  It makes our message
>> +	 * more efficient.
>> +	 */
> 
> Interesting ...
> 
>> +	last = sgt->sgl;
>> +	nents_dma = nents;
>> +	size = QAIC_MANAGE_EXT_MSG_LENGTH - msg_hdr_len - sizeof(*out_trans);
>> +	for_each_sgtable_sg(sgt, sg, i) {
>> +		if (sg_dma_address(last) + sg_dma_len(last) !=
>> +		    sg_dma_address(sg)) {
>> +			size -= sizeof(*asp);
>> +			/* Save 1K for possible follow-up transactions. */
>> +			if (size < SZ_1K) {
>> +				nents_dma = i;
>> +				break;
>> +			}
>> +		}
>> +		last = sg;
>> +	}
> 
> I would say there is either a reason why the IOMMUs do not combine, or there is a problem in the IOMMU driver.
> If there is a reason for not combining, this code is wrong.
> If the problem is in the IOMMU driver, why not fix up the IOMMU?

So, previously, dma_map_sg was routed to the IOMMU drivers, and many of 
them did not optimize the mappings (it looked like a choice of simplicity 
over best effort).  Looking at the details today, it looks like the 
DMA API is now handled by common IOMMU code, which appears to do the 
optimization after the individual IOMMU drivers do their mapping.

I'm going to recheck whether this is still needed after that refactor, and 
if not, rip it out.
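
As a minimal sketch (not the driver code; the helper name is made up), the
check the quoted loop is doing boils down to counting how many contiguous
DMA runs remain after dma_map_sgtable() - a fully merged mapping would
report a single run:

/* Sketch only; assumes <linux/scatterlist.h>. */
static unsigned int count_contiguous_runs(struct sg_table *sgt)
{
	struct scatterlist *sg, *last = sgt->sgl;
	unsigned int i, runs = 0;

	for_each_sgtable_dma_sg(sgt, sg, i) {
		/* A new run starts whenever this entry does not directly follow the previous one */
		if (sg_dma_address(last) + sg_dma_len(last) != sg_dma_address(sg))
			runs++;
		last = sg;
	}

	return runs;
}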


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 5/8] accel/qaic: Add datapath
  2023-03-01 16:08           ` Jeffrey Hugo
  (?)
@ 2023-03-01 17:05           ` Stanislaw Gruszka
  2023-03-01 18:14               ` Jeffrey Hugo
  -1 siblings, 1 reply; 68+ messages in thread
From: Stanislaw Gruszka @ 2023-03-01 17:05 UTC (permalink / raw)
  To: Jeffrey Hugo
  Cc: linux-doc, linux-arm-msm, ogabbay, dri-devel, quic_ajitpals,
	quic_pkanojiy, quic_carlv, jacek.lawrynowicz

On Wed, Mar 01, 2023 at 09:08:03AM -0700, Jeffrey Hugo wrote:
> > This looks a bit suspicious. Are you sure you can modify
> > sg->dma_address and still use it as a valid value?
> 
> A single entry in the sg table is a contiguous mapping of memory.  If it
> wasn't contiguous, it would have to be broken up into multiple entries.  In
> the simple case, a driver is going to take the dma_address/len pair and hand
> that directly to the device.  Then the device is going to access every
> address in that range.
> 
> If the device can access every address from dma_address to dma_address +
> len, why can't it access a subset of that?

The required address alignment can be broken. Not sure if that is the only issue.

> > > Are you suggesting renaming
> > > this function?  I guess I'm not quite understanding your comment here. Can
> > > you elaborate?
> > 
> > Renaming would be nice. I was thinking of simplifying it, not sure
> > now if that's easily achievable, though.
> 
> Ok.  I'll think on this.

Maybe this function could be removed, and the sg lists created
such that the hardware can handle them without any modification?
Just an idea to consider, not a requirement.

Regards
Stanislaw

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 5/8] accel/qaic: Add datapath
  2023-03-01 17:05           ` Stanislaw Gruszka
@ 2023-03-01 18:14               ` Jeffrey Hugo
  0 siblings, 0 replies; 68+ messages in thread
From: Jeffrey Hugo @ 2023-03-01 18:14 UTC (permalink / raw)
  To: Stanislaw Gruszka
  Cc: linux-doc, linux-arm-msm, ogabbay, dri-devel, quic_ajitpals,
	quic_pkanojiy, quic_carlv, jacek.lawrynowicz

On 3/1/2023 10:05 AM, Stanislaw Gruszka wrote:
> On Wed, Mar 01, 2023 at 09:08:03AM -0700, Jeffrey Hugo wrote:
>>> This looks a bit suspicious. Are you sure you can modify
>>> sg->dma_address and still use it as a valid value?
>>
>> A single entry in the sg table is a contiguous mapping of memory.  If it
>> wasn't contiguous, it would have to be broken up into multiple entries.  In
>> the simple case, a driver is going to take the dma_address/len pair and hand
>> that directly to the device.  Then the device is going to access every
>> address in that range.
>>
>> If the device can access every address from dma_address to dma_address +
>> len, why can't it access a subset of that?
> 
> The required address alignment can be broken. Not sure if that is the only issue.

AIC100 doesn't have a required alignment.  AIC100 can access any 64-bit 
address at byte-level granularity.  The only restriction AIC100 has is 
that the size of a transfer is limited to a 32-bit value, so the max 
individual transfer size is 4GB.  Transferring more than 4GB requires 
multiple transactions.
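
Purely as an illustration of that restriction (made-up helper name, not the
qaic code), splitting an oversized transfer into requests whose sizes fit in
a 32-bit field could look like:

/* Sketch only; assumes <linux/minmax.h> and <linux/limits.h>.
 * queue_one_request() is a hypothetical stand-in for posting a single request. */
static void queue_large_transfer(u64 dev_addr, u64 host_addr, u64 len)
{
	while (len) {
		u32 chunk = (u32)min_t(u64, len, U32_MAX);

		queue_one_request(dev_addr, host_addr, chunk);
		dev_addr += chunk;
		host_addr += chunk;
		len -= chunk;
	}
}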

>>>> Are you suggesting renaming
>>>> this function?  I guess I'm not quite understanding your comment here. Can
>>>> you elaborate?
>>>
>>> Renaming would be nice. I was thinking of simplifying it, not sure
>>> now if that's easily achievable, though.
>>
>> Ok.  I'll think on this.
> 
> Maybe this function could be removed, and the sg lists created
> such that the hardware can handle them without any modification?
> Just an idea to consider, not a requirement.

Ok, so this is part of our "slicing" operation, and thus required.

Maybe how slicing works is not clear.

Let's say that we have a workload on AIC100 that can identify a license 
plate in a picture (aka lprnet).  Let's assume this workload only needs 
the RGB values of an RGBA file (a "jpeg" we are processing).

Userspace allocates a BO to hold the entire file.  A quarter of the file 
is R values, a quarter is G values, etc.  For simplicity, let's assume 
the R values are all sequentially listed, then the G values, then the B 
values, finally the A values.  When we allocate the BO, we map it once. 
  If we have an IOMMU, this optimizes the IOMMU mappings.  BOs can be 
quite large.  We have some test workloads based on real world workloads 
where each BO is 16-32M in size, and there are multiple BOs.  I don't 
want to map a 32M BO N duplicate times in the IOMMU.

So, now userspace slices the BO.  It tells us we need to transfer the 
RGB values (the first 75% of the BO), but not the A values.  So, we 
create a copy of the mapped SG and edit it to represent this transfer, 
which is a subset of the entire BO.  Using the slice information and the 
mapping information, we construct the DMA engine commands that can be 
used to transfer the relevant portions of the BO to the device.
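
To make the "copy and edit" step concrete, here is a rough sketch (purely
illustrative, not the qaic implementation; names and error handling are made
up) of building a DMA-level view of a slice [offset, offset + size) from an
sg_table that was mapped once up front:

/* Sketch only; assumes <linux/scatterlist.h>.  Only the DMA view of the
 * new table is filled in - enough to build the DMA engine requests. */
static int sketch_slice_sgt(struct sg_table *mapped, u64 offset, u64 size,
			    struct sg_table *out)
{
	struct scatterlist *sg, *dst;
	unsigned int i, nents = 0;
	u64 pos = 0;

	/* First pass: count mapped entries that overlap [offset, offset + size) */
	for_each_sgtable_dma_sg(mapped, sg, i) {
		if (pos < offset + size && pos + sg_dma_len(sg) > offset)
			nents++;
		pos += sg_dma_len(sg);
	}

	if (!nents)
		return -EINVAL;
	if (sg_alloc_table(out, nents, GFP_KERNEL))
		return -ENOMEM;

	/* Second pass: copy the overlapping entries, trimming the first and last */
	dst = out->sgl;
	pos = 0;
	for_each_sgtable_dma_sg(mapped, sg, i) {
		u64 start = max(pos, offset);
		u64 end = min(pos + sg_dma_len(sg), offset + size);

		if (end > start) {
			sg_dma_address(dst) = sg_dma_address(sg) + (start - pos);
			sg_dma_len(dst) = (u32)(end - start);
			dst = sg_next(dst);
		}
		pos += sg_dma_len(sg);
	}

	return 0;
}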

It sounds like you are suggesting we flip this around: don't map the 
entire BO once.  Instead, wait for the slice info from userspace, 
construct a sg list based on the parts of the BO for the slice, and map 
that.  Then the driver gets a mapped SG it can just use.  The issue I 
see with that is slices can overlap.  You can transfer the same part of 
a BO multiple times.  Maybe lprnet has multiple threads on AIC100 where 
thread A consumes R data, thread B consumes R and G data, and thread C 
consumes B data.  We need to transfer the R data twice to different 
device locations so that threads A and B can consume the R data 
independently.

If we map per slice, we are going to map the R part of the BO twice in 
the IOMMU.  Is that valid?  It feels possible that there exists some 
IOMMU implementation that won't allow multiple IOVAs to map to the same 
DDR PA because that is weird and the implementer thinks it's a software 
bug.  I don't want to run into that.  Assuming it is valid, that is 
multiple mappings in the IOMMU TLB which could have been a single 
mapping.  We are wasting IOMMU resources.

There are some ARM systems we support with limited IOVA space in the 
IOMMU, and we've had some issues with exhausting that space.  The 
current implementation is influenced by those experiences.

I appreciate the outsider perspective (here and elsewhere). 
Re-evaluating some of these things is resulting in improvements.

-Jeff

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 2/8] accel/qaic: Add uapi and core driver file
  2023-02-17 18:15       ` Jeffrey Hugo
@ 2023-03-01 19:46         ` Jeffrey Hugo
  -1 siblings, 0 replies; 68+ messages in thread
From: Jeffrey Hugo @ 2023-03-01 19:46 UTC (permalink / raw)
  To: Jacek Lawrynowicz, ogabbay, airlied, daniel, dri-devel
  Cc: linux-doc, linux-arm-msm, quic_ajitpals, quic_pkanojiy,
	stanislaw.gruszka, quic_carlv

On 2/17/2023 11:15 AM, Jeffrey Hugo wrote:
> On 2/16/2023 7:13 AM, Jacek Lawrynowicz wrote:
>> Hi,
>>
>> On 06.02.2023 16:41, Jeffrey Hugo wrote:
>>> Add the QAIC driver uapi file and core driver file that binds to the 
>>> PCIe
>>> device.  The core driver file also creates the accel device and manages
>>> all the interconnections between the different parts of the driver.
>>>
>>> The driver can be built as a module.  If so, it will be called 
>>> "qaic.ko".
>>>
>>> Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
>>> Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com>
>>> ---
>>>   drivers/accel/qaic/qaic.h     | 321 ++++++++++++++++++
>>>   drivers/accel/qaic/qaic_drv.c | 771 
>>> ++++++++++++++++++++++++++++++++++++++++++
>>>   include/uapi/drm/qaic_accel.h | 283 ++++++++++++++++
>>>   3 files changed, 1375 insertions(+)
>>>   create mode 100644 drivers/accel/qaic/qaic.h
>>>   create mode 100644 drivers/accel/qaic/qaic_drv.c
>>>   create mode 100644 include/uapi/drm/qaic_accel.h
>>>
>>> diff --git a/drivers/accel/qaic/qaic.h b/drivers/accel/qaic/qaic.h
>>> new file mode 100644
>>> index 0000000..3f7ea76
>>> --- /dev/null
>>> +++ b/drivers/accel/qaic/qaic.h
>>> @@ -0,0 +1,321 @@
>>> +/* SPDX-License-Identifier: GPL-2.0-only
>>> + *
>>> + * Copyright (c) 2019-2021, The Linux Foundation. All rights reserved.
>>> + * Copyright (c) 2021-2023 Qualcomm Innovation Center, Inc. All 
>>> rights reserved.
>>> + */
>>> +
>>> +#ifndef QAICINTERNAL_H_
>>
>> Please use a guard macro that matches the file name: _QAIC_H_
> 
> Before moving to DRM/ACCEL, this conflicted with the uapi file. However, 
> that is no longer the case, so yes, this should be changed. Will do.
> 
>>
>>> +#define QAICINTERNAL_H_
>>> +
>>> +#include <linux/interrupt.h>
>>> +#include <linux/kref.h>
>>> +#include <linux/mhi.h>
>>> +#include <linux/mutex.h>
>>> +#include <linux/pci.h>
>>> +#include <linux/spinlock.h>
>>> +#include <linux/srcu.h>
>>> +#include <linux/wait.h>
>>> +#include <linux/workqueue.h>
>>> +#include <drm/drm_device.h>
>>> +#include <drm/drm_gem.h>
>>> +
>>> +#define QAIC_DBC_BASE        0x20000
>>> +#define QAIC_DBC_SIZE        0x1000
>>> +
>>> +#define QAIC_NO_PARTITION    -1
>>> +
>>> +#define QAIC_DBC_OFF(i)        ((i) * QAIC_DBC_SIZE + QAIC_DBC_BASE)
>>> +
>>> +#define to_qaic_bo(obj) container_of(obj, struct qaic_bo, base)
>>> +
>>> +extern bool poll_datapath;
>>> +
>>> +struct qaic_user {
>>> +    /* Uniquely identifies this user for the device */
>>> +    int            handle;
>>> +    struct kref        ref_count;
>>> +    /* Char device opened by this user */
>>> +    struct qaic_drm_device    *qddev;
>>> +    /* Node in list of users that opened this drm device */
>>> +    struct list_head    node;
>>> +    /* SRCU used to synchronize this user during cleanup */
>>> +    struct srcu_struct    qddev_lock;
>>> +    atomic_t        chunk_id;
>>> +};
>>> +
>>> +struct dma_bridge_chan {
>>> +    /* Pointer to device struct maintained by driver */
>>> +    struct qaic_device    *qdev;
>>> +    /* ID of this DMA bridge channel(DBC) */
>>> +    unsigned int        id;
>>> +    /* Synchronizes access to xfer_list */
>>> +    spinlock_t        xfer_lock;
>>> +    /* Base address of request queue */
>>> +    void            *req_q_base;
>>> +    /* Base address of response queue */
>>> +    void            *rsp_q_base;
>>> +    /*
>>> +     * Base bus address of request queue. Response queue bus address 
>>> can be
>>> +     * calculated by adding request queue size to this variable
>>> +     */
>>> +    dma_addr_t        dma_addr;
>>> +    /* Total size of request and response queue in byte */
>>> +    u32            total_size;
>>> +    /* Capacity of request/response queue */
>>> +    u32            nelem;
>>> +    /* The user that opened this DBC */
>>> +    struct qaic_user    *usr;
>>> +    /*
>>> +     * Request ID of next memory handle that goes in request queue. One
>>> +     * memory handle can enqueue more than one request elements, all
>>> +     * this requests that belong to same memory handle have same 
>>> request ID
>>> +     */
>>> +    u16            next_req_id;
>>> +    /* TRUE: DBC is in use; FALSE: DBC not in use */
>>
>> Use standard "true"/"false" instead of custom "TRUE"/"FALSE" macros.
>> This applies here and in multiple other places in the driver.
> 
> I think you are getting at that the documentation could be confusing.  I 
> don't appear to see custom macro use in the code.  Will try to clarify 
> that here.
> 
>>> +    bool            in_use;
>>> +    /*
>>> +     * Base address of device registers. Used to read/write request and
>>> +     * response queue's head and tail pointer of this DBC.
>>> +     */
>>> +    void __iomem        *dbc_base;
>>> +    /* Head of list where each node is a memory handle queued in 
>>> request queue */
>>> +    struct list_head    xfer_list;
>>> +    /* Synchronizes DBC readers during cleanup */
>>> +    struct srcu_struct    ch_lock;
>>> +    /*
>>> +     * When this DBC is released, any thread waiting on this wait 
>>> queue is
>>> +     * woken up
>>> +     */
>>> +    wait_queue_head_t    dbc_release;
>>> +    /* Head of list where each node is a bo associated with this DBC */
>>> +    struct list_head    bo_lists;
>>> +    /* The irq line for this DBC.  Used for polling */
>>> +    unsigned int        irq;
>>> +    /* Polling work item to simulate interrupts */
>>> +    struct work_struct    poll_work;
>>> +};
>>> +
>>> +struct qaic_device {
>>> +    /* Pointer to base PCI device struct of our physical device */
>>> +    struct pci_dev        *pdev;
>>> +    /* Mask of all bars of this device */
>>> +    int            bars;
>>> +    /* Req. ID of request that will be queued next in MHI control 
>>> device */
>>> +    u32            next_seq_num;
>>> +    /* Base address of bar 0 */
>>> +    void __iomem        *bar_0;
>>> +    /* Base address of bar 2 */
>>> +    void __iomem        *bar_2;
>>> +    /* Controller structure for MHI devices */
>>> +    struct mhi_controller    *mhi_cntl;
>>> +    /* MHI control channel device */
>>> +    struct mhi_device    *cntl_ch;
>>> +    /* List of requests queued in MHI control device */
>>> +    struct list_head    cntl_xfer_list;
>>> +    /* Synchronizes MHI control device transactions and its xfer 
>>> list */
>>> +    struct mutex        cntl_mutex;
>>> +    /* Base actual physical representation of drm device */
>>> +    struct qaic_drm_device    *base_dev;
>>> +    /* Array of DBC struct of this device */
>>> +    struct dma_bridge_chan    *dbc;
>>> +    /* Work queue for tasks related to MHI control device */
>>> +    struct workqueue_struct    *cntl_wq;
>>> +    /* Synchronizes all the users of device during cleanup */
>>> +    struct srcu_struct    dev_lock;
>>> +    /* TRUE: Device under reset; FALSE: Device not under reset */
>>> +    bool            in_reset;
>>> +    /*
>>> +     * TRUE: A tx MHI transaction has failed and a rx buffer is 
>>> still queued
>>> +     * in control device. Such a buffer is considered lost rx buffer
>>> +     * FALSE: No rx buffer is lost in control device
>>> +     */
>>> +    bool            cntl_lost_buf;
>>> +    /* Maximum number of DBC supported by this device */
>>> +    u32            num_dbc;
>>> +    /* Head in list of drm device created on top of this device */
>>> +    struct list_head    qaic_drm_devices;
>>> +    /* Synchronizes access of qaic_drm_devices list */
>>> +    struct mutex        qaic_drm_devices_mutex;
>>> +    /* Generate the CRC of a control message */
>>> +    u32 (*gen_crc)(void *msg);
>>> +    /* Validate the CRC of a control message */
>>> +    bool (*valid_crc)(void *msg);
>>> +};
>>> +
>>> +struct qaic_drm_device {
>>> +    /* Pointer to the root device struct driven by this driver */
>>> +    struct qaic_device    *qdev;
>>> +    /* Node in list of drm devices maintained by root device */
>>> +    struct list_head    node;
>>> +    /*
>>> +     * The physical device can be partition in number of logical 
>>> devices.
>>> +     * And each logical device is given a partition id. This member 
>>> stores
>>> +     * that id. QAIC_NO_PARTITION is a sentinel used to mark that 
>>> this drm
>>> +     * device is the actual physical device
>>> +     */
>>> +    s32            partition_id;
>>> +    /*
>>> +     * It points to the user that created this drm device. It is NULL
>>> +     * when this drm device represents the physical device i.e.
>>> +     * partition_id is QAIC_NO_PARTITION
>>> +     */
>>> +    struct qaic_user    *owner;
>>> +    /* Pointer to the drm device struct of this drm device */
>>> +    struct drm_device    *ddev;
>>> +    /* Head in list of users who have opened this drm device */
>>> +    struct list_head    users;
>>> +    /* Synchronizes access to users list */
>>> +    struct mutex        users_mutex;
>>> +};
>>> +
>>> +struct qaic_bo {
>>> +    struct drm_gem_object    base;
>>
>> Any reason why drm_gem_shmem_object cannot be used as a base?
> 
> I believe that is incompatible with our memory allocator in patch 5.
> 
>>> +    /* Scatter/gather table for allocate/imported BO */
>>> +    struct sg_table        *sgt;
>>> +    /* BO size requested by user. GEM object might be bigger in 
>>> size. */
>>> +    u64            size;
>>> +    /* Head in list of slices of this BO */
>>> +    struct list_head    slices;
>>> +    /* Total nents, for all slices of this BO */
>>> +    int            total_slice_nents;
>>> +    /*
>>> +     * Direction of transfer. It can assume only two value 
>>> DMA_TO_DEVICE and
>>> +     * DMA_FROM_DEVICE.
>>> +     */
>>> +    int            dir;
>>> +    /* The pointer of the DBC which operates on this BO */
>>> +    struct dma_bridge_chan    *dbc;
>>> +    /* Number of slice that belongs to this buffer */
>>> +    u32            nr_slice;
>>> +    /* Number of slice that have been transferred by DMA engine */
>>> +    u32            nr_slice_xfer_done;
>>> +    /* TRUE = BO is queued for execution, FALSE = BO is not queued */
>>> +    bool            queued;
>>> +    /*
>>> +     * If TRUE then user has attached slicing information to this BO by
>>> +     * calling DRM_IOCTL_QAIC_ATTACH_SLICE_BO ioctl.
>>> +     */
>>> +    bool            sliced;
>>> +    /* Request ID of this BO if it is queued for execution */
>>> +    u16            req_id;
>>> +    /* Handle assigned to this BO */
>>> +    u32            handle;
>>> +    /* Wait on this for completion of DMA transfer of this BO */
>>> +    struct completion    xfer_done;
>>> +    /*
>>> +     * Node in linked list where head is dbc->xfer_list.
>>> +     * This link list contain BO's that are queued for DMA transfer.
>>> +     */
>>> +    struct list_head    xfer_list;
>>> +    /*
>>> +     * Node in linked list where head is dbc->bo_lists.
>>> +     * This link list contain BO's that are associated with the DBC 
>>> it is
>>> +     * linked to.
>>> +     */
>>> +    struct list_head    bo_list;
>>> +    struct {
>>> +        /*
>>> +         * Latest timestamp(ns) at which kernel received a request to
>>> +         * execute this BO
>>> +         */
>>> +        u64        req_received_ts;
>>> +        /*
>>> +         * Latest timestamp(ns) at which kernel enqueued requests of
>>> +         * this BO for execution in DMA queue
>>> +         */
>>> +        u64        req_submit_ts;
>>> +        /*
>>> +         * Latest timestamp(ns) at which kernel received a completion
>>> +         * interrupt for requests of this BO
>>> +         */
>>> +        u64        req_processed_ts;
>>> +        /*
>>> +         * Number of elements already enqueued in DMA queue before
>>> +         * enqueuing requests of this BO
>>> +         */
>>> +        u32        queue_level_before;
>>> +    } perf_stats;
>>> +
>>> +};
>>> +
>>> +struct bo_slice {
>>> +    /* Mapped pages */
>>> +    struct sg_table        *sgt;
>>> +    /* Number of requests required to queue in DMA queue */
>>> +    int            nents;
>>> +    /* See enum dma_data_direction */
>>> +    int            dir;
>>> +    /* Actual requests that will be copied in DMA queue */
>>> +    struct dbc_req        *reqs;
>>> +    struct kref        ref_count;
>>> +    /* TRUE: No DMA transfer required */
>>> +    bool            no_xfer;
>>> +    /* Pointer to the parent BO handle */
>>> +    struct qaic_bo        *bo;
>>> +    /* Node in list of slices maintained by parent BO */
>>> +    struct list_head    slice;
>>> +    /* Size of this slice in bytes */
>>> +    u64            size;
>>> +    /* Offset of this slice in buffer */
>>> +    u64            offset;
>>> +};
>>> +
>>> +int get_dbc_req_elem_size(void);
>>> +int get_dbc_rsp_elem_size(void);
>>> +int get_cntl_version(struct qaic_device *qdev, struct qaic_user *usr,
>>> +             u16 *major, u16 *minor);
>>> +int qaic_manage_ioctl(struct drm_device *dev, void *data,
>>> +              struct drm_file *file_priv);
>>> +int qaic_execute_ioctl(struct qaic_device *qdev, struct qaic_user *usr,
>>> +               unsigned long arg, bool is_partial);
>>> +int qaic_wait_exec_ioctl(struct qaic_device *qdev, struct qaic_user 
>>> *usr,
>>> +             unsigned long arg);
>>> +int qaic_query_ioctl(struct qaic_device *qdev, struct qaic_user *usr,
>>> +             unsigned long arg);
>>> +int qaic_data_mmap(struct qaic_device *qdev, struct qaic_user *usr,
>>> +           struct vm_area_struct *vma);
>>> +int qaic_data_get_reservation(struct qaic_device *qdev, struct 
>>> qaic_user *usr,
>>> +                  void *data, u32 *partition_id,
>>> +                  u16 *remove);
>>> +void qaic_mhi_ul_xfer_cb(struct mhi_device *mhi_dev,
>>> +             struct mhi_result *mhi_result);
>>> +
>>> +void qaic_mhi_dl_xfer_cb(struct mhi_device *mhi_dev,
>>> +             struct mhi_result *mhi_result);
>>> +
>>> +int qaic_control_open(struct qaic_device *qdev);
>>> +void qaic_control_close(struct qaic_device *qdev);
>>> +void qaic_release_usr(struct qaic_device *qdev, struct qaic_user *usr);
>>> +
>>> +irqreturn_t dbc_irq_threaded_fn(int irq, void *data);
>>> +irqreturn_t dbc_irq_handler(int irq, void *data);
>>> +int disable_dbc(struct qaic_device *qdev, u32 dbc_id, struct 
>>> qaic_user *usr);
>>> +void enable_dbc(struct qaic_device *qdev, u32 dbc_id, struct 
>>> qaic_user *usr);
>>> +void wakeup_dbc(struct qaic_device *qdev, u32 dbc_id);
>>> +void release_dbc(struct qaic_device *qdev, u32 dbc_id);
>>> +
>>> +void wake_all_cntl(struct qaic_device *qdev);
>>> +void qaic_dev_reset_clean_local_state(struct qaic_device *qdev, bool 
>>> exit_reset);
>>> +
>>> +struct drm_gem_object *qaic_gem_prime_import(struct drm_device *dev,
>>> +                         struct dma_buf *dma_buf);
>>> +
>>> +int qaic_create_bo_ioctl(struct drm_device *dev, void *data,
>>> +             struct drm_file *file_priv);
>>> +int qaic_mmap_bo_ioctl(struct drm_device *dev, void *data,
>>> +               struct drm_file *file_priv);
>>> +int qaic_attach_slice_bo_ioctl(struct drm_device *dev, void *data,
>>> +                   struct drm_file *file_priv);
>>> +int qaic_execute_bo_ioctl(struct drm_device *dev, void *data,
>>> +              struct drm_file *file_priv);
>>> +int qaic_partial_execute_bo_ioctl(struct drm_device *dev, void *data,
>>> +                  struct drm_file *file_priv);
>>> +int qaic_wait_bo_ioctl(struct drm_device *dev, void *data,
>>> +               struct drm_file *file_priv);
>>> +int qaic_test_print_bo_ioctl(struct drm_device *dev, void *data,
>>> +                 struct drm_file *file_priv);
>>> +int qaic_perf_stats_bo_ioctl(struct drm_device *dev, void *data,
>>> +                 struct drm_file *file_priv);
>>
>> You don't need to break these lines. You can use up to 100 columns in 
>> the whole driver.
>> It will be more readable and checkpatch won't complain.
> 
> Some of this predates the 100 column change.  Checkpatch already 
> complains about the uapi file.
> 
> Will try to fix here.
> 
>>> +void irq_polling_work(struct work_struct *work);
>>> +
>>> +#endif /* QAICINTERNAL_H_ */
>>> diff --git a/drivers/accel/qaic/qaic_drv.c 
>>> b/drivers/accel/qaic/qaic_drv.c
>>> new file mode 100644
>>> index 0000000..602a784
>>> --- /dev/null
>>> +++ b/drivers/accel/qaic/qaic_drv.c
>>> @@ -0,0 +1,771 @@
>>> +// SPDX-License-Identifier: GPL-2.0-only
>>> +
>>> +/* Copyright (c) 2019-2021, The Linux Foundation. All rights 
>>> reserved. */
>>> +/* Copyright (c) 2021-2023 Qualcomm Innovation Center, Inc. All 
>>> rights reserved. */
>>> +
>>> +#include <linux/delay.h>
>>> +#include <linux/dma-mapping.h>
>>> +#include <linux/idr.h>
>>> +#include <linux/interrupt.h>
>>> +#include <linux/list.h>
>>> +#include <linux/kref.h>
>>> +#include <linux/mhi.h>
>>> +#include <linux/module.h>
>>> +#include <linux/msi.h>
>>> +#include <linux/mutex.h>
>>> +#include <linux/pci.h>
>>> +#include <linux/sched.h>
>>
>> Is <linux/sched.h> used here?
>> Feels like there are a couple of other unused includes in this file.
> 
> Actually I think that can be removed now.  Will check.
> The other ones look sane to me.  Are there specific ones you question?
> 
>>
>>> +#include <linux/spinlock.h>
>>> +#include <linux/workqueue.h>
>>> +#include <linux/wait.h>
>>> +#include <drm/drm_accel.h>
>>> +#include <drm/drm_drv.h>
>>> +#include <drm/drm_file.h>
>>> +#include <drm/drm_gem.h>
>>> +#include <drm/drm_ioctl.h>
>>> +#include <uapi/drm/qaic_accel.h>
>>> +
>>> +#include "mhi_controller.h"
>>> +#include "mhi_qaic_ctrl.h"
>>> +#include "qaic.h"
>>> +
>>> +MODULE_IMPORT_NS(DMA_BUF);
>>> +
>>> +#define PCI_DEV_AIC100            0xa100
>>> +#define QAIC_NAME            "qaic"
>>> +#define QAIC_DESC            "Qualcomm Cloud AI Accelerators"
>>> +
>>> +static unsigned int datapath_polling;
>>> +module_param(datapath_polling, uint, 0400);
>>> +bool poll_datapath;
>>> +
>>> +static u16 cntl_major = 5;
>>> +static u16 cntl_minor;/* 0 */
>>
>> Missing space before the comment.
>> And also you could convert both vars to macros as they are constants.
> 
> Sure
> 
>>> +static bool link_up;
>>> +static DEFINE_IDA(qaic_usrs);
>>> +
>>> +static int qaic_create_drm_device(struct qaic_device *qdev, s32 
>>> partition_id,
>>> +                  struct qaic_user *owner);
>>> +static void qaic_destroy_drm_device(struct qaic_device *qdev, s32 
>>> partition_id,
>>> +                    struct qaic_user *owner);
>>> +
>>> +static void free_usr(struct kref *kref)
>>> +{
>>> +    struct qaic_user *usr = container_of(kref, struct qaic_user, 
>>> ref_count);
>>> +
>>> +    cleanup_srcu_struct(&usr->qddev_lock);
>>> +    ida_free(&qaic_usrs, usr->handle);
>>> +    kfree(usr);
>>> +}
>>> +
>>> +static int qaic_open(struct drm_device *dev, struct drm_file *file)
>>> +{
>>> +    struct qaic_drm_device *qddev = dev->dev_private;
>>> +    struct qaic_device *qdev = qddev->qdev;
>>> +    struct qaic_user *usr;
>>> +    int rcu_id;
>>> +    int ret;
>>> +
>>> +    rcu_id = srcu_read_lock(&qdev->dev_lock);
>>> +    if (qdev->in_reset) {
>>> +        srcu_read_unlock(&qdev->dev_lock, rcu_id);
>>> +        return -ENODEV;
>>> +    }
>>> +
>>> +    usr = kmalloc(sizeof(*usr), GFP_KERNEL);
>>> +    if (!usr) {
>>> +        srcu_read_unlock(&qdev->dev_lock, rcu_id);
>>> +        return -ENOMEM;
>>> +    }
>>> +
>>> +    usr->handle = ida_alloc(&qaic_usrs, GFP_KERNEL);
>>> +    if (usr->handle < 0) {
>>> +        srcu_read_unlock(&qdev->dev_lock, rcu_id);
>>> +        kfree(usr);
>>> +        return usr->handle;
>>> +    }
>>> +    usr->qddev = qddev;
>>> +    atomic_set(&usr->chunk_id, 0);
>>> +    init_srcu_struct(&usr->qddev_lock);
>>> +    kref_init(&usr->ref_count);
>>> +
>>> +    ret = mutex_lock_interruptible(&qddev->users_mutex);
>>> +    if (ret) {
>>> +        cleanup_srcu_struct(&usr->qddev_lock);
>>> +        kfree(usr);
>>> +        srcu_read_unlock(&qdev->dev_lock, rcu_id);
>>> +        return ret;
>>> +    }
>>> +
>>> +    list_add(&usr->node, &qddev->users);
>>> +    mutex_unlock(&qddev->users_mutex);
>>> +
>>> +    file->driver_priv = usr;
>>> +
>>> +    srcu_read_unlock(&qdev->dev_lock, rcu_id);
>>> +    return 0;
>>> +}
>>> +
>>> +static void qaic_postclose(struct drm_device *dev, struct drm_file 
>>> *file)
>>> +{
>>> +    struct qaic_user *usr = file->driver_priv;
>>> +    struct qaic_drm_device *qddev;
>>> +    struct qaic_device *qdev;
>>> +    int qdev_rcu_id;
>>> +    int usr_rcu_id;
>>> +    int i;
>>> +
>>> +    qddev = usr->qddev;
>>> +    usr_rcu_id = srcu_read_lock(&usr->qddev_lock);
>>> +    if (qddev) {
>>> +        qdev = qddev->qdev;
>>> +        qdev_rcu_id = srcu_read_lock(&qdev->dev_lock);
>>> +        if (!qdev->in_reset) {
>>> +            qaic_release_usr(qdev, usr);
>>> +            for (i = 0; i < qdev->num_dbc; ++i)
>>> +                if (qdev->dbc[i].usr &&
>>> +                    qdev->dbc[i].usr->handle == usr->handle)
>>> +                    release_dbc(qdev, i);
>>> +
>>> +            /* Remove child devices */
>>> +            if (qddev->partition_id == QAIC_NO_PARTITION)
>>> +                qaic_destroy_drm_device(qdev, QAIC_NO_PARTITION, usr);
>>> +        }
>>> +        srcu_read_unlock(&qdev->dev_lock, qdev_rcu_id);
>>> +
>>> +        mutex_lock(&qddev->users_mutex);
>>> +        if (!list_empty(&usr->node))
>>> +            list_del_init(&usr->node);
>>> +        mutex_unlock(&qddev->users_mutex);
>>> +    }
>>> +
>>> +    srcu_read_unlock(&usr->qddev_lock, usr_rcu_id);
>>> +    kref_put(&usr->ref_count, free_usr);
>>> +
>>> +    file->driver_priv = NULL;
>>> +}
>>> +
>>> +static int qaic_part_dev_ioctl(struct drm_device *dev, void *data,
>>> +                   struct drm_file *file_priv)
>>> +{
>>> +    struct qaic_device *qdev;
>>> +    struct qaic_user *usr;
>>> +    u32 partition_id;
>>> +    int qdev_rcu_id;
>>> +    int usr_rcu_id;
>>> +    int ret = 0;
>>> +    u16 remove;
>>> +
>>> +    usr = file_priv->driver_priv;
>>> +    if (!usr)
>>> +        return -EINVAL;
>>> +
>>> +    usr_rcu_id = srcu_read_lock(&usr->qddev_lock);
>>> +    if (!usr->qddev) {
>>> +        srcu_read_unlock(&usr->qddev_lock, usr_rcu_id);
>>> +        return -ENODEV;
>>> +    }
>>> +
>>> +    qdev = usr->qddev->qdev;
>>> +    if (!qdev) {
>>> +        srcu_read_unlock(&usr->qddev_lock, usr_rcu_id);
>>> +        return -ENODEV;
>>> +    }
>>> +
>>> +    qdev_rcu_id = srcu_read_lock(&qdev->dev_lock);
>>> +    if (qdev->in_reset) {
>>> +        ret = -ENODEV;
>>> +        goto out;
>>> +    }
>>> +
>>> +    /* This IOCTL is only supported for base devices. */
>>> +    if (usr->qddev->partition_id != QAIC_NO_PARTITION) {
>>> +        ret = -ENOTTY;
>>> +        goto out;
>>> +    }
>>> +
>>> +    ret = qaic_data_get_reservation(qdev, usr, data, &partition_id,
>>> +                    &remove);
>>> +    if (ret)
>>> +        goto out;
>>> +
>>> +    if (remove == 1)
>>> +        qaic_destroy_drm_device(qdev, partition_id, usr);
>>> +    else
>>> +        ret = qaic_create_drm_device(qdev, partition_id, usr);
>>> +
>>> +out:
>>> +    srcu_read_unlock(&qdev->dev_lock, qdev_rcu_id);
>>> +    srcu_read_unlock(&usr->qddev_lock, usr_rcu_id);
>>> +
>>> +    return ret;
>>> +}
>>> +
>>> +DEFINE_DRM_ACCEL_FOPS(qaic_accel_fops);
>>> +
>>> +static const struct drm_ioctl_desc qaic_drm_ioctls[] = {
>>> +    DRM_IOCTL_DEF_DRV(QAIC_MANAGE, qaic_manage_ioctl, 
>>> DRM_RENDER_ALLOW),
>>> +    DRM_IOCTL_DEF_DRV(QAIC_CREATE_BO, qaic_create_bo_ioctl, 
>>> DRM_RENDER_ALLOW),
>>> +    DRM_IOCTL_DEF_DRV(QAIC_MMAP_BO, qaic_mmap_bo_ioctl, 
>>> DRM_RENDER_ALLOW),
>>> +    DRM_IOCTL_DEF_DRV(QAIC_ATTACH_SLICE_BO, 
>>> qaic_attach_slice_bo_ioctl, DRM_RENDER_ALLOW),
>>> +    DRM_IOCTL_DEF_DRV(QAIC_EXECUTE_BO, qaic_execute_bo_ioctl, 
>>> DRM_RENDER_ALLOW),
>>> +    DRM_IOCTL_DEF_DRV(QAIC_PARTIAL_EXECUTE_BO, 
>>> qaic_partial_execute_bo_ioctl, DRM_RENDER_ALLOW),
>>> +    DRM_IOCTL_DEF_DRV(QAIC_WAIT_BO, qaic_wait_bo_ioctl, 
>>> DRM_RENDER_ALLOW),
>>> +    DRM_IOCTL_DEF_DRV(QAIC_PERF_STATS_BO, qaic_perf_stats_bo_ioctl, 
>>> DRM_RENDER_ALLOW),
>>> +    DRM_IOCTL_DEF_DRV(QAIC_PART_DEV, qaic_part_dev_ioctl, 
>>> DRM_RENDER_ALLOW),
>>> +};
>>> +
>>> +static const struct drm_driver qaic_accel_driver = {
>>> +    .driver_features    = DRIVER_GEM | DRIVER_COMPUTE_ACCEL,
>>> +
>>> +    .name            = QAIC_NAME,
>>> +    .desc            = QAIC_DESC,
>>> +    .date            = "20190618",
>>> +
>>> +    .fops            = &qaic_accel_fops,
>>> +    .open            = qaic_open,
>>> +    .postclose        = qaic_postclose,
>>> +
>>> +    .ioctls            = qaic_drm_ioctls,
>>> +    .num_ioctls        = ARRAY_SIZE(qaic_drm_ioctls),
>>> +    .prime_fd_to_handle    = drm_gem_prime_fd_to_handle,
>>> +    .gem_prime_import    = qaic_gem_prime_import,
>>> +};
>>> +
>>> +static int qaic_create_drm_device(struct qaic_device *qdev, s32 
>>> partition_id,
>>> +                  struct qaic_user *owner)
>>> +{
>>> +    struct qaic_drm_device *qddev;
>>> +    struct drm_device *ddev;
>>> +    struct device *pdev;
>>> +    int ret;
>>> +
>>> +    /*
>>> +     * Partition id QAIC_NO_PARTITION indicates that the device was 
>>> created
>>> +     * on mhi_probe and id > QAIC_NO_PARTITION indicates a partition
>>> +     * created using IOCTL. So, pdev for primary device is the pci 
>>> dev and
>>> +     * the parent for partition dev is the primary device.
>>> +     */
>>> +    if (partition_id == QAIC_NO_PARTITION)
>>> +        pdev = &qdev->pdev->dev;
>>> +    else
>>> +        pdev = qdev->base_dev->ddev->dev;
>>> +
>>> +    qddev = kzalloc(sizeof(*qddev), GFP_KERNEL);
>>> +    if (!qddev) {
>>> +        ret = -ENOMEM;
>>> +        goto qddev_fail;
>>> +    }
>>> +
>>> +    ddev = drm_dev_alloc(&qaic_accel_driver, pdev);
>>> +    if (IS_ERR(ddev)) {
>>> +        ret = PTR_ERR(ddev);
>>> +        goto ddev_fail;
>>> +    }
>>> +
>>> +    ddev->dev_private = qddev;
>>> +    qddev->ddev = ddev;
>>> +
>>> +    if (partition_id == QAIC_NO_PARTITION)
>>> +        qdev->base_dev = qddev;
>>> +    qddev->qdev = qdev;
>>> +    qddev->partition_id = partition_id;
>>> +    qddev->owner = owner;
>>> +    INIT_LIST_HEAD(&qddev->users);
>>> +    mutex_init(&qddev->users_mutex);
>>> +
>>> +    mutex_lock(&qdev->qaic_drm_devices_mutex);
>>> +    list_add(&qddev->node, &qdev->qaic_drm_devices);
>>> +    mutex_unlock(&qdev->qaic_drm_devices_mutex);
>>> +
>>> +    ret = drm_dev_register(ddev, 0);
>>> +    if (ret) {
>>> +        pci_dbg(qdev->pdev, "%s: drm_dev_register failed %d\n", 
>>> __func__, ret);
>>> +        goto drm_reg_fail;
>>> +    }
>>> +
>>> +    return 0;
>>> +
>>> +drm_reg_fail:
>>> +    mutex_destroy(&qddev->users_mutex);
>>> +    mutex_lock(&qdev->qaic_drm_devices_mutex);
>>> +    list_del(&qddev->node);
>>> +    mutex_unlock(&qdev->qaic_drm_devices_mutex);
>>> +    if (partition_id == QAIC_NO_PARTITION)
>>> +        qdev->base_dev = NULL;
>>> +    drm_dev_put(ddev);
>>> +ddev_fail:
>>> +    kfree(qddev);
>>> +qddev_fail:
>>> +    return ret;
>>> +}
>>> +
>>> +static void qaic_destroy_drm_device(struct qaic_device *qdev, s32 
>>> partition_id,
>>> +                    struct qaic_user *owner)
>>> +{
>>> +    struct qaic_drm_device *qddev;
>>> +    struct qaic_drm_device *q;
>>> +    struct qaic_user *usr;
>>> +
>>> +    list_for_each_entry_safe(qddev, q, &qdev->qaic_drm_devices, node) {
>>> +        /*
>>> +         * Skip devices in case we just want to remove devices
>>> +         * specific to a owner or partition id.
>>> +         *
>>> +         * owner    partition_id    notes
>>> +         * ----------------------------------
>>> +         * NULL        NO_PARTITION    delete base + all derived (qdev
>>> +         *                reset)
>>> +         * !NULL    NO_PARTITION    delete derived devs created by
>>> +         *                owner.
>>> +         * !NULL    >NO_PARTITION    delete derived dev identified by
>>> +         *                the partition id and created by
>>> +         *                owner
>>> +         * NULL        >NO_PARTITION    invalid (no-op)
>>> +         *
>>> +         * if partition_id is any value < QAIC_NO_PARTITION this 
>>> will be
>>> +         * a no-op.
>>> +         */
>>> +        if (owner && owner != qddev->owner)
>>> +            continue;
>>> +
>>> +        if (partition_id != QAIC_NO_PARTITION &&
>>> +            partition_id != qddev->partition_id && !owner)
>>> +            continue;
>>> +
>>> +        /*
>>> +         * Existing users get unresolvable errors till they close FDs.
>>> +         * Need to sync carefully with users calling close().  The
>>> +         * list of users can be modified elsewhere when the lock isn't
>>> +         * held here, but the sync'ing the srcu with the mutex held
>>> +         * could deadlock.  Grab the mutex so that the list will be
>>> +         * unmodified.  The user we get will exist as long as the
>>> +         * lock is held.  Signal that the qcdev is going away, and
>>> +         * grab a reference to the user so they don't go away for
>>> +         * synchronize_srcu().  Then release the mutex to avoid
>>> +         * deadlock and make sure the user has observed the signal.
>>> +         * With the lock released, we cannot maintain any state of the
>>> +         * user list.
>>> +         */
>>> +        mutex_lock(&qddev->users_mutex);
>>> +        while (!list_empty(&qddev->users)) {
>>> +            usr = list_first_entry(&qddev->users, struct qaic_user,
>>> +                           node);
>>> +            list_del_init(&usr->node);
>>> +            kref_get(&usr->ref_count);
>>> +            usr->qddev = NULL;
>>> +            mutex_unlock(&qddev->users_mutex);
>>> +            synchronize_srcu(&usr->qddev_lock);
>>> +            kref_put(&usr->ref_count, free_usr);
>>> +            mutex_lock(&qddev->users_mutex);
>>> +        }
>>> +        mutex_unlock(&qddev->users_mutex);
>>> +
>>> +        if (qddev->ddev) {
>>> +            drm_dev_unregister(qddev->ddev);
>>> +            drm_dev_put(qddev->ddev);
>>> +        }
>>> +
>>> +        list_del(&qddev->node);
>>> +        kfree(qddev);
>>> +    }
>>> +}
>>> +
>>> +static int qaic_mhi_probe(struct mhi_device *mhi_dev,
>>> +              const struct mhi_device_id *id)
>>> +{
>>> +    struct qaic_device *qdev;
>>> +    u16 major, minor;
>>> +    int ret;
>>> +
>>> +    /*
>>> +     * Invoking this function indicates that the control channel to the
>>> +     * device is available.  We use that as a signal to indicate that
>>> +     * the device side firmware has booted.  The device side firmware
>>> +     * manages the device resources, so we need to communicate with it
>>> +     * via the control channel in order to utilize the device.  
>>> Therefore
>>> +     * we wait until this signal to create the drm dev that 
>>> userspace will
>>> +     * use to control the device, because without the device side 
>>> firmware,
>>> +     * userspace can't do anything useful.
>>> +     */
>>> +
>>> +    qdev = pci_get_drvdata(to_pci_dev(mhi_dev->mhi_cntrl->cntrl_dev));
>>> +
>>> +    qdev->in_reset = false;
>>> +
>>> +    dev_set_drvdata(&mhi_dev->dev, qdev);
>>> +    qdev->cntl_ch = mhi_dev;
>>> +
>>> +    ret = qaic_control_open(qdev);
>>> +    if (ret) {
>>> +        pci_dbg(qdev->pdev, "%s: control_open failed %d\n", 
>>> __func__, ret);
>>> +        goto err;
>>> +    }
>>> +
>>> +    ret = get_cntl_version(qdev, NULL, &major, &minor);
>>> +    if (ret || major != cntl_major || minor > cntl_minor) {
>>> +        pci_err(qdev->pdev, "%s: Control protocol version (%d.%d) 
>>> not supported.  Supported version is (%d.%d). Ret: %d\n",
>>> +            __func__, major, minor, cntl_major, cntl_minor, ret);
>>> +        ret = -EINVAL;
>>> +        goto close_control;
>>> +    }
>>> +
>>> +    ret = qaic_create_drm_device(qdev, QAIC_NO_PARTITION, NULL);
>>> +
>>> +    return ret;
>>> +
>>> +close_control:
>>> +    qaic_control_close(qdev);
>>> +err:
>>> +    return ret;
>>> +}
>>> +
>>> +static void qaic_mhi_remove(struct mhi_device *mhi_dev)
>>> +{
>>
>> Add a comment here
> 
> Presumably why this is an empty function?
> Will do.
> 
>>> +}
>>> +
>>> +static void qaic_notify_reset(struct qaic_device *qdev)
>>> +{
>>> +    int i;
>>> +
>>> +    qdev->in_reset = true;
>>> +    /* wake up any waiters to avoid waiting for timeouts at sync */
>>> +    wake_all_cntl(qdev);
>>> +    for (i = 0; i < qdev->num_dbc; ++i)
>>> +        wakeup_dbc(qdev, i);
>>> +    synchronize_srcu(&qdev->dev_lock);
>>> +}
>>> +
>>> +void qaic_dev_reset_clean_local_state(struct qaic_device *qdev, bool 
>>> exit_reset)
>>> +{
>>> +    int i;
>>> +
>>> +    qaic_notify_reset(qdev);
>>> +
>>> +    /* remove drmdevs to prevent new users from coming in */
>>> +    if (qdev->base_dev)
>>> +        qaic_destroy_drm_device(qdev, QAIC_NO_PARTITION, NULL);
>>> +
>>> +    /* start tearing things down */
>>> +    for (i = 0; i < qdev->num_dbc; ++i)
>>> +        release_dbc(qdev, i);
>>> +
>>> +    if (exit_reset)
>>> +        qdev->in_reset = false;
>>> +}
>>> +
>>> +static int qaic_pci_probe(struct pci_dev *pdev,
>>> +              const struct pci_device_id *id)
>>
>> Please try to simplify this function. Maybe move the irq init to a separate 
>> function.
>> It will be more readable and there will be less error handling 
>> spaghetti at the bottom.
> 
> I guess I tend towards not breaking something up if there is not reuse 
> elsewhere, but I think I see your point.  Will have a look.
> 
>>
>>> +{
>>> +    int ret;
>>> +    int i;
>>> +    int mhi_irq;
>>> +    struct qaic_device *qdev;
>>> +
>>> +    qdev = devm_kzalloc(&pdev->dev, sizeof(*qdev), GFP_KERNEL);
>>> +    if (!qdev)
>>> +        return -ENOMEM;
>>> +
>>> +    if (id->device == PCI_DEV_AIC100) {
>>> +        qdev->num_dbc = 16;
>>> +        qdev->dbc = devm_kcalloc(&pdev->dev, qdev->num_dbc, 
>>> sizeof(*qdev->dbc),
>>> +                     GFP_KERNEL);
>>> +        if (!qdev->dbc)
>>> +            return -ENOMEM;
>>> +    }
>>> +
>>> +    qdev->cntl_wq = alloc_workqueue("qaic_cntl", WQ_UNBOUND, 0);
>>> +    if (!qdev->cntl_wq) {
>>> +        ret = -ENOMEM;
>>> +        goto wq_fail;
>>> +    }
>>> +    pci_set_drvdata(pdev, qdev);
>>> +    qdev->pdev = pdev;
>>> +    mutex_init(&qdev->cntl_mutex);
>>> +    INIT_LIST_HEAD(&qdev->cntl_xfer_list);
>>> +    init_srcu_struct(&qdev->dev_lock);
>>> +    INIT_LIST_HEAD(&qdev->qaic_drm_devices);
>>> +    mutex_init(&qdev->qaic_drm_devices_mutex);
>>> +    for (i = 0; i < qdev->num_dbc; ++i) {
>>> +        spin_lock_init(&qdev->dbc[i].xfer_lock);
>>> +        qdev->dbc[i].qdev = qdev;
>>> +        qdev->dbc[i].id = i;
>>> +        INIT_LIST_HEAD(&qdev->dbc[i].xfer_list);
>>> +        init_srcu_struct(&qdev->dbc[i].ch_lock);
>>> +        init_waitqueue_head(&qdev->dbc[i].dbc_release);
>>> +        INIT_LIST_HEAD(&qdev->dbc[i].bo_lists);
>>> +    }
>>> +
>>> +    qdev->bars = pci_select_bars(pdev, IORESOURCE_MEM);
>>> +
>>> +    /* make sure the device has the expected BARs */
>>> +    if (qdev->bars != (BIT(0) | BIT(2) | BIT(4))) {
>>> +        pci_dbg(pdev, "%s: expected BARs 0, 2, and 4 not found in 
>>> device.  Found 0x%x\n",
>>> +            __func__, qdev->bars);
>>> +        ret = -EINVAL;
>>> +        goto bar_fail;
>>> +    }
>>> +
>>> +    ret = pci_enable_device(pdev);
>>> +    if (ret)
>>> +        goto enable_fail;
>>> +
>>> +    ret = pci_request_selected_regions(pdev, qdev->bars, "aic100");
>>> +    if (ret)
>>> +        goto request_regions_fail;
>>> +
>>> +    pci_set_master(pdev);
>>> +
>>> +    ret = dma_set_mask(&pdev->dev, DMA_BIT_MASK(64));
>>> +    if (ret)
>>> +        goto dma_mask_fail;
>>> +    ret = dma_set_coherent_mask(&pdev->dev, DMA_BIT_MASK(64));
>>> +    if (ret)
>>> +        goto dma_mask_fail;
>>> +    ret = dma_set_max_seg_size(&pdev->dev, UINT_MAX);
>>> +    if (ret)
>>> +        goto dma_mask_fail;
>>> +
>>> +    qdev->bar_0 = pci_ioremap_bar(pdev, 0);
>>> +    if (!qdev->bar_0) {
>>> +        ret = -ENOMEM;
>>> +        goto ioremap_0_fail;
>>> +    }
>>> +
>>> +    qdev->bar_2 = pci_ioremap_bar(pdev, 2);
>>> +    if (!qdev->bar_2) {
>>> +        ret = -ENOMEM;
>>> +        goto ioremap_2_fail;
>>> +    }
>>> +
>>> +    for (i = 0; i < qdev->num_dbc; ++i)
>>> +        qdev->dbc[i].dbc_base = qdev->bar_2 + QAIC_DBC_OFF(i);
>>> +
>>> +    ret = pci_alloc_irq_vectors(pdev, 1, 32, PCI_IRQ_MSI);
>>> +    if (ret < 0)
>>> +        goto alloc_irq_fail;
>>> +
>>> +    if (ret < 32) {
>>> +        pci_err(pdev, "%s: Requested 32 MSIs.  Obtained %d MSIs 
>>> which is less than the 32 required.\n",
>>> +            __func__, ret);
>>> +        ret = -ENODEV;
>>> +        goto invalid_msi_config;
>>> +    }
>>> +
>>> +    mhi_irq = pci_irq_vector(pdev, 0);
>>> +    if (mhi_irq < 0) {
>>> +        ret = mhi_irq;
>>> +        goto get_mhi_irq_fail;
>>> +    }
>>> +
>>> +    for (i = 0; i < qdev->num_dbc; ++i) {
>>> +        ret = devm_request_threaded_irq(&pdev->dev,
>>> +                        pci_irq_vector(pdev, i + 1),
>>> +                        dbc_irq_handler,
>>> +                        dbc_irq_threaded_fn,
>>> +                        IRQF_SHARED,
>>> +                        "qaic_dbc",
>>> +                        &qdev->dbc[i]);
>>> +        if (ret)
>>> +            goto get_dbc_irq_failed;
>>> +
>>> +        if (poll_datapath) {
>>> +            qdev->dbc[i].irq = pci_irq_vector(pdev, i + 1);
>>> +            disable_irq_nosync(qdev->dbc[i].irq);
>>> +            INIT_WORK(&qdev->dbc[i].poll_work, irq_polling_work);
>>> +        }
>>> +    }
>>> +
>>> +    qdev->mhi_cntl = qaic_mhi_register_controller(pdev, qdev->bar_0, 
>>> mhi_irq);
>>> +    if (IS_ERR(qdev->mhi_cntl)) {
>>> +        ret = PTR_ERR(qdev->mhi_cntl);
>>> +        goto mhi_register_fail;
>>> +    }
>>> +
>>> +    return 0;
>>> +
>>> +mhi_register_fail:
>>> +get_dbc_irq_failed:
>>
>> I don't think that duplicated goto statements are allowed by the 
>> coding style.
>> These should rather be named something like err_free_irq.
>> See 
>> https://www.kernel.org/doc/html/v4.10/process/coding-style.html#centralized-exiting-of-functions 
>>
> 
> I took another look at that link, and I do not believe it prohibits 
> duplicated goto labels, but they probably could be renamed and 
> simplified as you indicate.  Will have a look at that.
> 
>>
>>> +    for (i = 0; i < qdev->num_dbc; ++i)
>>> +        devm_free_irq(&pdev->dev, pci_irq_vector(pdev, i + 1),
>>> +                  &qdev->dbc[i]);
>>> +get_mhi_irq_fail:
>>> +invalid_msi_config:
>>> +    pci_free_irq_vectors(pdev);
>>> +alloc_irq_fail:
>>> +    iounmap(qdev->bar_2);
>>> +ioremap_2_fail:
>>> +    iounmap(qdev->bar_0);
>>> +ioremap_0_fail:
>>> +dma_mask_fail:
>>> +    pci_clear_master(pdev);
>>> +    pci_release_selected_regions(pdev, qdev->bars);
>>> +request_regions_fail:
>>> +    pci_disable_device(pdev);
>>> +enable_fail:
>>> +    pci_set_drvdata(pdev, NULL);
>>> +bar_fail:
>>> +    for (i = 0; i < qdev->num_dbc; ++i)
>>> +        cleanup_srcu_struct(&qdev->dbc[i].ch_lock);
>>> +    cleanup_srcu_struct(&qdev->dev_lock);
>>> +    destroy_workqueue(qdev->cntl_wq);
>>> +wq_fail:
>>> +    return ret;
>>> +}
>>> +
>>> +static void qaic_pci_remove(struct pci_dev *pdev)
>>> +{
>>> +    struct qaic_device *qdev = pci_get_drvdata(pdev);
>>> +    int i;
>>> +
>>> +    if (!qdev)
>>> +        return;
>>> +
>>> +    qaic_dev_reset_clean_local_state(qdev, false);
>>> +    qaic_mhi_free_controller(qdev->mhi_cntl, link_up);
>>> +    for (i = 0; i < qdev->num_dbc; ++i) {
>>> +        devm_free_irq(&pdev->dev, pci_irq_vector(pdev, i + 1),
>>> +                  &qdev->dbc[i]);
>>> +        cleanup_srcu_struct(&qdev->dbc[i].ch_lock);
>>> +    }
>>> +    destroy_workqueue(qdev->cntl_wq);
>>> +    pci_free_irq_vectors(pdev);
>>> +    iounmap(qdev->bar_0);
>>> +    pci_clear_master(pdev);
>>> +    pci_release_selected_regions(pdev, qdev->bars);
>>> +    pci_disable_device(pdev);
>>> +    pci_set_drvdata(pdev, NULL);
>>> +}
>>> +
>>> +static void qaic_pci_shutdown(struct pci_dev *pdev)
>>> +{
>>> +    /* see qaic_exit for what link_up is doing */
>>> +    link_up = true;
>>> +    qaic_pci_remove(pdev);
>>> +}
>>> +
>>> +static pci_ers_result_t qaic_pci_error_detected(struct pci_dev *pdev,
>>> +                        pci_channel_state_t error)
>>> +{
>>> +    return PCI_ERS_RESULT_NEED_RESET;
>>> +}
>>> +
>>> +static void qaic_pci_reset_prepare(struct pci_dev *pdev)
>>> +{
>>> +    struct qaic_device *qdev = pci_get_drvdata(pdev);
>>> +
>>> +    qaic_notify_reset(qdev);
>>> +    qaic_mhi_start_reset(qdev->mhi_cntl);
>>> +    qaic_dev_reset_clean_local_state(qdev, false);
>>> +}
>>> +
>>> +static void qaic_pci_reset_done(struct pci_dev *pdev)
>>> +{
>>> +    struct qaic_device *qdev = pci_get_drvdata(pdev);
>>> +
>>> +    qdev->in_reset = false;
>>> +    qaic_mhi_reset_done(qdev->mhi_cntl);
>>> +}
>>> +
>>> +static const struct mhi_device_id qaic_mhi_match_table[] = {
>>> +    { .chan = "QAIC_CONTROL", },
>>> +    {},
>>> +};
>>> +
>>> +static struct mhi_driver qaic_mhi_driver = {
>>> +    .id_table = qaic_mhi_match_table,
>>> +    .remove = qaic_mhi_remove,
>>> +    .probe = qaic_mhi_probe,
>>> +    .ul_xfer_cb = qaic_mhi_ul_xfer_cb,
>>> +    .dl_xfer_cb = qaic_mhi_dl_xfer_cb,
>>> +    .driver = {
>>> +        .name = "qaic_mhi",
>>> +    },
>>> +};
>>> +
>>> +static const struct pci_device_id qaic_ids[] = {
>>> +    { PCI_DEVICE(PCI_VENDOR_ID_QCOM, PCI_DEV_AIC100), },
>>> +    { }
>>> +};
>>> +MODULE_DEVICE_TABLE(pci, qaic_ids);
>>> +
>>> +static const struct pci_error_handlers qaic_pci_err_handler = {
>>> +    .error_detected = qaic_pci_error_detected,
>>> +    .reset_prepare = qaic_pci_reset_prepare,
>>> +    .reset_done = qaic_pci_reset_done,
>>> +};
>>> +
>>> +static struct pci_driver qaic_pci_driver = {
>>> +    .name = QAIC_NAME,
>>> +    .id_table = qaic_ids,
>>> +    .probe = qaic_pci_probe,
>>> +    .remove = qaic_pci_remove,
>>> +    .shutdown = qaic_pci_shutdown,
>>> +    .err_handler = &qaic_pci_err_handler,
>>> +};
>>> +
>>> +static int __init qaic_init(void)
>>> +{
>>> +    int ret;
>>> +
>>> +    if (datapath_polling)
>>> +        poll_datapath = true;
>>> +
>>> +    ret = mhi_driver_register(&qaic_mhi_driver);
>>> +    if (ret) {
>>> +        pr_debug("qaic: mhi_driver_register failed %d\n", ret);
>>> +        goto free_class;
>>> +    }
>>> +
>>> +    ret = pci_register_driver(&qaic_pci_driver);
>>> +    if (ret) {
>>> +        pr_debug("qaic: pci_register_driver failed %d\n", ret);
>>> +        goto free_mhi;
>>> +    }
>>> +
>>> +    ret = mhi_qaic_ctrl_init();
>>> +    if (ret) {
>>> +        pr_debug("qaic: mhi_qaic_ctrl_init failed %d\n", ret);
>>> +        goto free_pci;
>>> +    }
>>> +
>>> +    return 0;
>>> +
>>> +free_pci:
>>> +    pci_unregister_driver(&qaic_pci_driver);
>>> +free_mhi:
>>> +    mhi_driver_unregister(&qaic_mhi_driver);
>>> +free_class:
>>
>> This label doesn't free anything. It should be renamed.
> 
> Doh.  Holdover from a previous version.  It does need to be renamed.
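
Probably the simplest fix is to drop the goto for that first failure entirely,
since the label only returns (sketch):

    ret = mhi_driver_register(&qaic_mhi_driver);
    if (ret) {
        pr_debug("qaic: mhi_driver_register failed %d\n", ret);
        return ret;
    }
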
> 
>>
>>> +    return ret;
>>> +}
>>> +
>>> +static void __exit qaic_exit(void)
>>> +{
>>> +    /*
>>> +     * We assume that qaic_pci_remove() is called due to a hotplug 
>>> event
>>> +     * which would mean that the link is down, and thus
>>> +     * qaic_mhi_free_controller() should not try to access the 
>>> device during
>>> +     * cleanup.
>>> +     * We call pci_unregister_driver() below, which also triggers
>>> +     * qaic_pci_remove(), but since this is module exit, we expect 
>>> the link
>>> +     * to the device to be up, in which case qaic_mhi_free_controller()
>>> +     * should try to access the device during cleanup to put the 
>>> device in
>>> +     * a sane state.
>>> +     * For that reason, we set link_up here to let 
>>> qaic_mhi_free_controller
>>> +     * know the expected link state.  Since the module is going to be
>>> +     * removed at the end of this, we don't need to worry about
>>> +     * reinitializing the link_up state after the cleanup is done.
>>> +     */
>>> +    link_up = true;
>>
>> Maybe you could just use pdev->current_state instead of link_up?
> 
> Maybe.  Will have a look at that.

Looks like no.  current_state does not appear to be modified by any of 
the hotplug code for remove events, and current_state is not updated by 
the pci framework until after the driver's remove() is invoked.

> 
>>
>>> +    mhi_qaic_ctrl_deinit();
>>> +    pci_unregister_driver(&qaic_pci_driver);
>>> +    mhi_driver_unregister(&qaic_mhi_driver);
>>> +}
>>> +
>>> +module_init(qaic_init);
>>> +module_exit(qaic_exit);
>>> +
>>> +MODULE_AUTHOR(QAIC_DESC " Kernel Driver Team");
>>> +MODULE_DESCRIPTION(QAIC_DESC " Accel Driver");
>>> +MODULE_LICENSE("GPL");
>>> diff --git a/include/uapi/drm/qaic_accel.h 
>>> b/include/uapi/drm/qaic_accel.h
>>> new file mode 100644
>>> index 0000000..d5fa6f5
>>> --- /dev/null
>>> +++ b/include/uapi/drm/qaic_accel.h
>>> @@ -0,0 +1,283 @@
>>> +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note
>>> + *
>>> + * Copyright (c) 2019-2020, The Linux Foundation. All rights reserved.
>>> + * Copyright (c) 2021-2023 Qualcomm Innovation Center, Inc. All 
>>> rights reserved.
>>> + */
>>> +
>>> +#ifndef QAIC_ACCEL_H_
>>> +#define QAIC_ACCEL_H_
>>> +
>>> +#include <linux/ioctl.h>
>>> +#include <linux/types.h>
>>
>> These two headers should not be needed here.
> 
> Fair enough.  Listed out of paranoia since they define things used in 
> this header.
> 
>>
>>> +#include "drm.h"
>>> +
>>> +#if defined(__CPLUSPLUS)
>>
>> Use lowercase here: __cplusplus
> 
> Doh.  Not sure why uppercase was used.  The referenced examples sure 
> didn't have that.
> 
>>
>>> +extern "C" {
>>> +#endif
>>> +
>>> +#define QAIC_MANAGE_MAX_MSG_LENGTH SZ_4K    /**<
>>> +                          * The length(4K) includes len and
>>> +                          * count fields of qaic_manage_msg
>>> +                          */
>>
>> I guess these are doxygen style commands but you should be using 
>> kernel-doc here.
>> See https://docs.kernel.org/doc-guide/kernel-doc.html.
>> This can be used to verify the header:
>> $ scripts/kernel-doc -v -none include/uapi/drm/qaic_accel.h
> 
> include/uapi/drm/drm.h was used as the example for these, and thus it 
> was assumed that was the DRM style.
> 
> I'm not convinced kernel-doc is applicable to all of these.  Will review.
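
For the structs it would end up looking something like this (qaic_manage_msg
as an example, using standard kernel-doc member syntax):

/**
 * struct qaic_manage_msg - Message sent to the device via DRM_IOCTL_QAIC_MANAGE
 * @len: in, length of valid data - ie sum of all transactions
 * @count: in, number of transactions in the message
 * @data: in, pointer to an array of transactions
 */
struct qaic_manage_msg {
    __u32 len;
    __u32 count;
    __u64 data;
};

The bare defines are the part I'm less sure kernel-doc covers cleanly.
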
> 
>>
>>> +
>>> +enum qaic_sem_flags {
>>
>> Is there any specific reason for enums if all values are explicitly 
>> given?
> 
> It is a style pulled from elsewhere.  I can remove the enums.
> 
>>
>>> +    SEM_INSYNCFENCE =    0x1,
>>
>> All these enums/defines end up in a global user space namespace.
>> I would advise to prefix everything with QAIC_ or DRM_QAIC_ (e.g. 
>> QAIC_SEM_INSYNCFENCE)
>> to avoid conflicts with other drivers or user space libs.
> 
> Good point.  Will do.
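
So, roughly:

#define QAIC_SEM_INSYNCFENCE    0x1
#define QAIC_SEM_OUTSYNCFENCE   0x2

and the same QAIC_ (or DRM_QAIC_) prefix on the command and transaction type
values, with the enums dropped per the earlier comment.
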
> 
>>
>>> +    SEM_OUTSYNCFENCE =    0x2,
>>> +};
>>> +
>>> +enum qaic_sem_cmd {
>>> +    SEM_NOP =        0,
>>> +    SEM_INIT =        1,
>>> +    SEM_INC =        2,
>>> +    SEM_DEC =        3,
>>> +    SEM_WAIT_EQUAL =    4,
>>> +    SEM_WAIT_GT_EQ =    5, /**< Greater than or equal */
>>> +    SEM_WAIT_GT_0 =        6, /**< Greater than 0 */
>>> +};
>>> +
>>> +enum qaic_manage_transaction_type {
>>> +    TRANS_UNDEFINED =            0,
>>> +    TRANS_PASSTHROUGH_FROM_USR =        1,
>>> +    TRANS_PASSTHROUGH_TO_USR =        2,
>>> +    TRANS_PASSTHROUGH_FROM_DEV =        3,
>>> +    TRANS_PASSTHROUGH_TO_DEV =        4,
>>> +    TRANS_DMA_XFER_FROM_USR =        5,
>>> +    TRANS_DMA_XFER_TO_DEV =            6,
>>> +    TRANS_ACTIVATE_FROM_USR =        7,
>>> +    TRANS_ACTIVATE_FROM_DEV =        8,
>>> +    TRANS_ACTIVATE_TO_DEV =            9,
>>> +    TRANS_DEACTIVATE_FROM_USR =        10,
>>> +    TRANS_DEACTIVATE_FROM_DEV =        11,
>>> +    TRANS_STATUS_FROM_USR =            12,
>>> +    TRANS_STATUS_TO_USR =            13,
>>> +    TRANS_STATUS_FROM_DEV =            14,
>>> +    TRANS_STATUS_TO_DEV =            15,
>>> +    TRANS_TERMINATE_FROM_DEV =        16,
>>> +    TRANS_TERMINATE_TO_DEV =        17,
>>> +    TRANS_DMA_XFER_CONT =            18,
>>> +    TRANS_VALIDATE_PARTITION_FROM_DEV =    19,
>>> +    TRANS_VALIDATE_PARTITION_TO_DEV =    20,
>>> +    TRANS_MAX =                21
>>> +};
>>> +
>>> +struct qaic_manage_trans_hdr {
>>> +    __u32 type;    /**< in, value from enum manage_transaction_type */
>>> +    __u32 len;    /**< in, length of this transaction, including the 
>>> header */
>>> +};
>>> +
>>> +struct qaic_manage_trans_passthrough {
>>> +    struct qaic_manage_trans_hdr hdr;
>>> +    __u8 data[];    /**< in, userspace must encode in little endian */
>>> +};
>>> +
>>> +struct qaic_manage_trans_dma_xfer {
>>> +    struct qaic_manage_trans_hdr hdr;
>>> +    __u32 tag;    /**< in, device specific */
>>> +    __u32 count;    /**< in */
>>> +    __u64 addr;    /**< in, address of the data to transferred via 
>>> DMA */
>>> +    __u64 size;    /**< in, length of the data to transferred via 
>>> DMA */
>>> +};
>>> +
>>> +struct qaic_manage_trans_activate_to_dev {
>>> +    struct qaic_manage_trans_hdr hdr;
>>> +    __u32 queue_size;    /**<
>>> +                  * in, number of elements in DBC request
>>> +                  * and respose queue
>>> +                  */
>>> +    __u32 eventfd;        /**< in */
>>> +    __u32 options;        /**< in, device specific */
>>> +    __u32 pad;        /**< pad must be 0 */
>>> +};
>>> +
>>> +struct qaic_manage_trans_activate_from_dev {
>>> +    struct qaic_manage_trans_hdr hdr;
>>> +    __u32 status;    /**< out, status of activate transaction */
>>> +    __u32 dbc_id;    /**< out, Identifier of assigned DMA Bridge 
>>> channel */
>>> +    __u64 options;    /**< out */
>>> +};
>>> +
>>> +struct qaic_manage_trans_deactivate {
>>> +    struct qaic_manage_trans_hdr hdr;
>>> +    __u32 dbc_id;    /**< in, Identifier of assigned DMA Bridge 
>>> channel */
>>> +    __u32 pad;    /**< pad must be 0 */
>>> +};
>>> +
>>> +struct qaic_manage_trans_status_to_dev {
>>> +    struct qaic_manage_trans_hdr hdr;
>>> +};
>>> +
>>> +struct qaic_manage_trans_status_from_dev {
>>> +    struct qaic_manage_trans_hdr hdr;
>>> +    __u16 major;    /**< out, major vesrion of NNC protocol used by 
>>> device */
>>> +    __u16 minor;    /**< out, minor vesrion of NNC protocol used by 
>>> device */
>>
>> vesrion -> version
> 
> Will fix.
> 
>>
>>> +    __u32 status;    /**< out, status of query transaction  */
>>> +    __u64 status_flags;    /**<
>>> +                  * out
>>> +                  * 0    : If set then device has CRC check enabled
>>> +                  * 1:63 : Unused
>>> +                  */
>>> +};
>>> +
>>> +struct qaic_manage_msg {
>>> +    __u32 len;    /**< in, Length of valid data - ie sum of all 
>>> transactions */
>>> +    __u32 count;    /**< in, Number of transactions in message */
>>> +    __u64 data;    /**< in, Pointer to array of transactions */
>>> +};
>>> +
>>> +struct qaic_create_bo {
>>> +    __u64 size;    /**< in, Size of BO in byte */
>>> +    __u32 handle;    /**< out, Returned GEM handle for the BO */
>>> +    __u32 pad;    /**< pad must be 0 */
>>> +};
>>> +
>>> +struct qaic_mmap_bo {
>>> +    __u32 handle;    /**< in, Handle for the BO being mapped. */
>>
>> The comment is missleading. BO is not mapped by this ioctl().
> 
> It's mapping the handle to the offset, which is a type of mapping.  I 
> think I get your point though.  Will try to wordsmith this better.
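
Maybe something along the lines of:

    __u32 handle;    /**< in, Handle of the BO whose mmap offset is being queried */

since the actual mapping happens in the later mmap() call.
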
> 
>>
>>> +    __u32 pad;    /**< pad must be 0 */
>>> +    __u64 offset;    /**<
>>> +              * out, offset into the drm node to use for
>>> +              * subsequent mmap call
>>> +              */
>>> +};
>>> +
>>> +/**
>>> + * @brief semaphore command
>>> + */
>>> +struct qaic_sem {
>>> +    __u16 val;    /**< in, Only lower 12 bits are valid */
>>> +    __u8  index;    /**< in, Only lower 5 bits are valid */
>>> +    __u8  presync;    /**< in, 1 if presync operation, 0 if postsync */
>>> +    __u8  cmd;    /**< in, See enum sem_cmd */
>>> +    __u8  flags;    /**< in, See sem_flags for valid bits.  All 
>>> others must be 0 */
>>> +    __u16 pad;    /**< pad must be 0 */
>>> +};
>>> +
>>> +struct qaic_attach_slice_entry {
>>> +    __u64 size;        /**< in, Size memory to allocate for this BO 
>>> slice */
>>> +    struct qaic_sem    sem0;    /**< in, Must be zero if not valid */
>>> +    struct qaic_sem    sem1;    /**< in, Must be zero if not valid */
>>> +    struct qaic_sem    sem2;    /**< in, Must be zero if not valid */
>>> +    struct qaic_sem    sem3;    /**< in, Must be zero if not valid */
>>> +    __u64 dev_addr;        /**< in, Address in device to/from which 
>>> data is copied */
>>> +    __u64 db_addr;        /**< in, Doorbell address */
>>> +    __u32 db_data;        /**< in, Data to write to doorbell */
>>> +    __u32 db_len;        /**<
>>> +                  * in, Doorbell length - 32, 16, or 8 bits.
>>> +                  * 0 means doorbell is inactive
>>> +                  */
>>> +    __u64 offset;        /**< in, Offset from start of buffer */
>>> +};
>>> +
>>> +struct qaic_attach_slice_hdr {
>>> +    __u32 count;    /**< in, Number of slices for this BO */
>>> +    __u32 dbc_id;    /**< in, Associate this BO with this DMA Bridge 
>>> channel */
>>> +    __u32 handle;    /**< in, Handle of BO to which slicing 
>>> information is to be attached */
>>> +    __u32 dir;    /**< in, Direction of data: 1 = DMA_TO_DEVICE, 2 = 
>>> DMA_FROM_DEVICE */
>>> +    __u64 size;    /**<
>>> +              * in, Total length of BO
>>> +              * If BO is imported (DMABUF/PRIME) then this size
>>> +              * should not exceed the size of DMABUF provided.
>>> +              * If BO is allocated using DRM_IOCTL_QAIC_CREATE_BO
>>> +              * then this size should be exactly same as the size
>>> +              * provided during DRM_IOCTL_QAIC_CREATE_BO.
>>> +              */
>>> +};
>>> +
>>> +struct qaic_attach_slice {
>>> +    struct qaic_attach_slice_hdr hdr;
>>> +    __u64 data;    /**<
>>> +              * in, Pointer to a buffer which is container of
>>> +              * struct qaic_attach_slice_entry[]
>>> +              */
>>> +};
>>> +
>>> +struct qaic_execute_entry {
>>> +    __u32 handle;    /**< in, buffer handle */
>>> +    __u32 dir;    /**< in, 1 = to device, 2 = from device */
>>> +};
>>> +
>>> +struct qaic_partial_execute_entry {
>>> +    __u32 handle;    /**< in, buffer handle */
>>> +    __u32 dir;    /**< in, 1 = to device, 2 = from device */
>>> +    __u64 resize;    /**< in, 0 = no resize */
>>> +};
>>> +
>>> +struct qaic_execute_hdr {
>>> +    __u32 count;    /**< in, number of executes following this 
>>> header */
>>> +    __u32 dbc_id;    /**< in, Identifier of assigned DMA Bridge 
>>> channel */
>>> +};
>>> +
>>> +struct qaic_execute {
>>> +    struct qaic_execute_hdr hdr;
>>> +    __u64 data;    /**< in, qaic_execute_entry or 
>>> qaic_partial_execute_entry container */
>>> +};
>>> +
>>> +struct qaic_wait {
>>> +    __u32 handle;    /**< in, handle to wait on until execute is 
>>> complete */
>>> +    __u32 timeout;    /**< in, timeout for wait(in ms) */
>>> +    __u32 dbc_id;    /**< in, Identifier of assigned DMA Bridge 
>>> channel */
>>> +    __u32 pad;    /**< pad must be 0 */
>>> +};
>>> +
>>> +struct qaic_perf_stats_hdr {
>>> +    __u16 count;    /**< in, Total number BOs requested */
>>> +    __u16 pad;    /**< pad must be 0 */
>>> +    __u32 dbc_id;    /**< in, Identifier of assigned DMA Bridge 
>>> channel */
>>> +};
>>> +
>>> +struct qaic_perf_stats {
>>> +    struct qaic_perf_stats_hdr hdr;
>>> +    __u64 data;    /**< in, qaic_perf_stats_entry container */
>>> +};
>>> +
>>> +struct qaic_perf_stats_entry {
>>> +    __u32 handle;            /**< in, Handle of the memory request */
>>> +    __u32 queue_level_before;    /**<
>>> +                      * out, Number of elements in queue
>>> +                      * before submission given memory request
>>> +                      */
>>> +    __u32 num_queue_element;    /**<
>>> +                      * out, Number of elements to add in the
>>> +                      * queue for given memory request
>>> +                      */
>>> +    __u32 submit_latency_us;    /**<
>>> +                      * out, Time taken by kernel to submit
>>> +                      * the request to device
>>> +                      */
>>> +    __u32 device_latency_us;    /**<
>>> +                      * out, Time taken by device to execute the
>>> +                      * request. 0 if request is not completed
>>> +                      */
>>> +    __u32 pad;            /**< pad must be 0 */
>>> +};
>>> +
>>> +struct qaic_part_dev {
>>> +    __u32 partition_id;    /**< in, reservation id */
>>> +    __u16 remove;        /**< in, 1 - Remove device 0 - Create 
>>> device */
>>> +    __u16 pad;        /**< pad must be 0 */
>>> +};
>>> +
>>> +#define DRM_QAIC_MANAGE                0x00
>>> +#define DRM_QAIC_CREATE_BO            0x01
>>> +#define DRM_QAIC_MMAP_BO            0x02
>>
>> I know that MMAP_BO ioctl is common in drm drivers but in my opinion 
>> it is a very poor name.
>> I suggest naming it BO_INFO so in future you could extend it with 
>> other bo params besides
>> mmap offset.
> 
> I see your point, but I don't see how to implement it.
> 
> Let us assume this IOCTL is renamed DRM_QAIC_BO_INFO, and struct 
> qaic_mmap_bo is named qaic_bo_info.
> 
> qaic_bo_info is part of the uAPI.  If "tomorrow" we need to add a field 
> to it, we cannot.  That changes the structure size and breaks the uAPI.
> 
> I can't add reserved fields in anticipation of future use since GregKH 
> had said not to do that - 
> https://lore.kernel.org/all/20200514163734.GB3154055@kroah.com/
> 
> So, I'm thinking the only approach would be a linked list similar to 
> EXECUTE_BO where qaic_bo_info holds a userspace pointer that is the head 
> of the list, and new list nodes can be defined in future.  That seems 
> like an overengineered solution to some hypothetical future which may 
> not come to pass.  If this is the solution, then I think I'd prefer to 
> leave this ioctl as is, and deprecate it if the extension usecase ever 
> comes to pass.
> 
> Thoughts?  Am I missing something?
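
For concreteness, the list-based version I have in mind would be roughly the
following (purely hypothetical - not something I'm proposing):

struct qaic_bo_info_entry {
    __u32 type;    /* which BO attribute this entry carries */
    __u32 pad;     /* must be 0 */
    __u64 value;   /* e.g. the mmap offset */
    __u64 next;    /* userspace pointer to the next entry, 0 terminates */
};

struct qaic_bo_info {
    __u32 handle;  /* in, BO to query */
    __u32 pad;     /* must be 0 */
    __u64 entries; /* in, userspace pointer to the first qaic_bo_info_entry */
};
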
> 
>>
>>> +#define DRM_QAIC_ATTACH_SLICE_BO        0x03
>>> +#define DRM_QAIC_EXECUTE_BO            0x04
>>> +#define DRM_QAIC_PARTIAL_EXECUTE_BO        0x05
>>> +#define DRM_QAIC_WAIT_BO            0x06
>>> +#define DRM_QAIC_PERF_STATS_BO            0x07
>>> +#define DRM_QAIC_PART_DEV            0x08
>>> +
>>> +#define DRM_IOCTL_QAIC_MANAGE            DRM_IOWR(DRM_COMMAND_BASE + 
>>> DRM_QAIC_MANAGE, struct qaic_manage_msg)
>>> +#define DRM_IOCTL_QAIC_CREATE_BO        DRM_IOWR(DRM_COMMAND_BASE + 
>>> DRM_QAIC_CREATE_BO,    struct qaic_create_bo)
>>> +#define DRM_IOCTL_QAIC_MMAP_BO            DRM_IOWR(DRM_COMMAND_BASE 
>>> + DRM_QAIC_MMAP_BO, struct qaic_mmap_bo)
>>> +#define DRM_IOCTL_QAIC_ATTACH_SLICE_BO        
>>> DRM_IOW(DRM_COMMAND_BASE + DRM_QAIC_ATTACH_SLICE_BO, struct 
>>> qaic_attach_slice)
>>> +#define DRM_IOCTL_QAIC_EXECUTE_BO        DRM_IOW(DRM_COMMAND_BASE + 
>>> DRM_QAIC_EXECUTE_BO,    struct qaic_execute)
>>> +#define DRM_IOCTL_QAIC_PARTIAL_EXECUTE_BO    
>>> DRM_IOW(DRM_COMMAND_BASE + DRM_QAIC_PARTIAL_EXECUTE_BO,    struct 
>>> qaic_execute)
>>> +#define DRM_IOCTL_QAIC_WAIT_BO            DRM_IOW(DRM_COMMAND_BASE + 
>>> DRM_QAIC_WAIT_BO, struct qaic_wait)
>>> +#define DRM_IOCTL_QAIC_PERF_STATS_BO        
>>> DRM_IOWR(DRM_COMMAND_BASE + DRM_QAIC_PERF_STATS_BO, struct 
>>> qaic_perf_stats)
>>> +#define DRM_IOCTL_QAIC_PART_DEV            DRM_IOWR(DRM_COMMAND_BASE 
>>> + DRM_QAIC_PART_DEV, struct qaic_part_dev)
>>> +
>>> +#if defined(__CPLUSPLUS)
>>
>> Use lowercase here: __cplusplus
> 
> Will do.
> 
>>
>>> +}
>>> +#endif
>>> +
>>> +#endif /* QAIC_ACCEL_H_ */
>>
>> Regards,
>> Jacek
>>
> 


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 2/8] accel/qaic: Add uapi and core driver file
@ 2023-03-01 19:46         ` Jeffrey Hugo
  0 siblings, 0 replies; 68+ messages in thread
From: Jeffrey Hugo @ 2023-03-01 19:46 UTC (permalink / raw)
  To: Jacek Lawrynowicz, ogabbay, airlied, daniel, dri-devel
  Cc: linux-doc, linux-arm-msm, quic_ajitpals, quic_pkanojiy,
	stanislaw.gruszka, quic_carlv

On 2/17/2023 11:15 AM, Jeffrey Hugo wrote:
> On 2/16/2023 7:13 AM, Jacek Lawrynowicz wrote:
>> Hi,
>>
>> On 06.02.2023 16:41, Jeffrey Hugo wrote:
>>> Add the QAIC driver uapi file and core driver file that binds to the 
>>> PCIe
>>> device.  The core driver file also creates the accel device and manages
>>> all the interconnections between the different parts of the driver.
>>>
>>> The driver can be built as a module.  If so, it will be called 
>>> "qaic.ko".
>>>
>>> Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
>>> Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com>
>>> ---
>>>   drivers/accel/qaic/qaic.h     | 321 ++++++++++++++++++
>>>   drivers/accel/qaic/qaic_drv.c | 771 
>>> ++++++++++++++++++++++++++++++++++++++++++
>>>   include/uapi/drm/qaic_accel.h | 283 ++++++++++++++++
>>>   3 files changed, 1375 insertions(+)
>>>   create mode 100644 drivers/accel/qaic/qaic.h
>>>   create mode 100644 drivers/accel/qaic/qaic_drv.c
>>>   create mode 100644 include/uapi/drm/qaic_accel.h
>>>
>>> diff --git a/drivers/accel/qaic/qaic.h b/drivers/accel/qaic/qaic.h
>>> new file mode 100644
>>> index 0000000..3f7ea76
>>> --- /dev/null
>>> +++ b/drivers/accel/qaic/qaic.h
>>> @@ -0,0 +1,321 @@
>>> +/* SPDX-License-Identifier: GPL-2.0-only
>>> + *
>>> + * Copyright (c) 2019-2021, The Linux Foundation. All rights reserved.
>>> + * Copyright (c) 2021-2023 Qualcomm Innovation Center, Inc. All 
>>> rights reserved.
>>> + */
>>> +
>>> +#ifndef QAICINTERNAL_H_
>>
>> Please use guard macro that matches the file name: _QAIC_H_
> 
> Before moving to DRM/ACCEL, this conflicted with the uapi file. However, 
> that is no longer the case, so yes, this should be changed. Will do.
> 
>>
>>> +#define QAICINTERNAL_H_
>>> +
>>> +#include <linux/interrupt.h>
>>> +#include <linux/kref.h>
>>> +#include <linux/mhi.h>
>>> +#include <linux/mutex.h>
>>> +#include <linux/pci.h>
>>> +#include <linux/spinlock.h>
>>> +#include <linux/srcu.h>
>>> +#include <linux/wait.h>
>>> +#include <linux/workqueue.h>
>>> +#include <drm/drm_device.h>
>>> +#include <drm/drm_gem.h>
>>> +
>>> +#define QAIC_DBC_BASE        0x20000
>>> +#define QAIC_DBC_SIZE        0x1000
>>> +
>>> +#define QAIC_NO_PARTITION    -1
>>> +
>>> +#define QAIC_DBC_OFF(i)        ((i) * QAIC_DBC_SIZE + QAIC_DBC_BASE)
>>> +
>>> +#define to_qaic_bo(obj) container_of(obj, struct qaic_bo, base)
>>> +
>>> +extern bool poll_datapath;
>>> +
>>> +struct qaic_user {
>>> +    /* Uniquely identifies this user for the device */
>>> +    int            handle;
>>> +    struct kref        ref_count;
>>> +    /* Char device opened by this user */
>>> +    struct qaic_drm_device    *qddev;
>>> +    /* Node in list of users that opened this drm device */
>>> +    struct list_head    node;
>>> +    /* SRCU used to synchronize this user during cleanup */
>>> +    struct srcu_struct    qddev_lock;
>>> +    atomic_t        chunk_id;
>>> +};
>>> +
>>> +struct dma_bridge_chan {
>>> +    /* Pointer to device strcut maintained by driver */
>>> +    struct qaic_device    *qdev;
>>> +    /* ID of this DMA bridge channel(DBC) */
>>> +    unsigned int        id;
>>> +    /* Synchronizes access to xfer_list */
>>> +    spinlock_t        xfer_lock;
>>> +    /* Base address of request queue */
>>> +    void            *req_q_base;
>>> +    /* Base address of response queue */
>>> +    void            *rsp_q_base;
>>> +    /*
>>> +     * Base bus address of request queue. Response queue bus address 
>>> can be
>>> +     * calculated by adding request queue size to this variable
>>> +     */
>>> +    dma_addr_t        dma_addr;
>>> +    /* Total size of request and response queue in byte */
>>> +    u32            total_size;
>>> +    /* Capacity of request/response queue */
>>> +    u32            nelem;
>>> +    /* The user that opened this DBC */
>>> +    struct qaic_user    *usr;
>>> +    /*
>>> +     * Request ID of next memory handle that goes in request queue. One
>>> +     * memory handle can enqueue more than one request elements, all
>>> +     * this requests that belong to same memory handle have same 
>>> request ID
>>> +     */
>>> +    u16            next_req_id;
>>> +    /* TRUE: DBC is in use; FALSE: DBC not in use */
>>
>> Use standard "true"/"false" instead of custom "TRUE"/"FALSE" macros.
>> This applies here and in multiple other places in the driver.
> 
> I think you are getting at that the documentation could be confusing.  I 
> don't see any custom macro use in the code.  Will try to clarify 
> that here.
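
Maybe just:

    /* Is this DBC in use by a user (true) or available (false)? */
    bool            in_use;
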
> 
>>> +    bool            in_use;
>>> +    /*
>>> +     * Base address of device registers. Used to read/write request and
>>> +     * response queue's head and tail pointer of this DBC.
>>> +     */
>>> +    void __iomem        *dbc_base;
>>> +    /* Head of list where each node is a memory handle queued in 
>>> request queue */
>>> +    struct list_head    xfer_list;
>>> +    /* Synchronizes DBC readers during cleanup */
>>> +    struct srcu_struct    ch_lock;
>>> +    /*
>>> +     * When this DBC is released, any thread waiting on this wait 
>>> queue is
>>> +     * woken up
>>> +     */
>>> +    wait_queue_head_t    dbc_release;
>>> +    /* Head of list where each node is a bo associated with this DBC */
>>> +    struct list_head    bo_lists;
>>> +    /* The irq line for this DBC.  Used for polling */
>>> +    unsigned int        irq;
>>> +    /* Polling work item to simulate interrupts */
>>> +    struct work_struct    poll_work;
>>> +};
>>> +
>>> +struct qaic_device {
>>> +    /* Pointer to base PCI device struct of our physical device */
>>> +    struct pci_dev        *pdev;
>>> +    /* Mask of all bars of this device */
>>> +    int            bars;
>>> +    /* Req. ID of request that will be queued next in MHI control 
>>> device */
>>> +    u32            next_seq_num;
>>> +    /* Base address of bar 0 */
>>> +    void __iomem        *bar_0;
>>> +    /* Base address of bar 2 */
>>> +    void __iomem        *bar_2;
>>> +    /* Controller structure for MHI devices */
>>> +    struct mhi_controller    *mhi_cntl;
>>> +    /* MHI control channel device */
>>> +    struct mhi_device    *cntl_ch;
>>> +    /* List of requests queued in MHI control device */
>>> +    struct list_head    cntl_xfer_list;
>>> +    /* Synchronizes MHI control device transactions and its xfer 
>>> list */
>>> +    struct mutex        cntl_mutex;
>>> +    /* Base actual physical representation of drm device */
>>> +    struct qaic_drm_device    *base_dev;
>>> +    /* Array of DBC struct of this device */
>>> +    struct dma_bridge_chan    *dbc;
>>> +    /* Work queue for tasks related to MHI control device */
>>> +    struct workqueue_struct    *cntl_wq;
>>> +    /* Synchronizes all the users of device during cleanup */
>>> +    struct srcu_struct    dev_lock;
>>> +    /* TRUE: Device under reset; FALSE: Device not under reset */
>>> +    bool            in_reset;
>>> +    /*
>>> +     * TRUE: A tx MHI transaction has failed and a rx buffer is 
>>> still queued
>>> +     * in control device. Such a buffer is considered lost rx buffer
>>> +     * FALSE: No rx buffer is lost in control device
>>> +     */
>>> +    bool            cntl_lost_buf;
>>> +    /* Maximum number of DBC supported by this device */
>>> +    u32            num_dbc;
>>> +    /* Head in list of drm device created on top of this device */
>>> +    struct list_head    qaic_drm_devices;
>>> +    /* Synchronizes access of qaic_drm_devices list */
>>> +    struct mutex        qaic_drm_devices_mutex;
>>> +    /* Generate the CRC of a control message */
>>> +    u32 (*gen_crc)(void *msg);
>>> +    /* Validate the CRC of a control message */
>>> +    bool (*valid_crc)(void *msg);
>>> +};
>>> +
>>> +struct qaic_drm_device {
>>> +    /* Pointer to the root device struct driven by this driver */
>>> +    struct qaic_device    *qdev;
>>> +    /* Node in list of drm devices maintained by root device */
>>> +    struct list_head    node;
>>> +    /*
>>> +     * The physical device can be partition in number of logical 
>>> devices.
>>> +     * And each logical device is given a partition id. This member 
>>> stores
>>> +     * that id. QAIC_NO_PARTITION is a sentinel used to mark that 
>>> this drm
>>> +     * device is the actual physical device
>>> +     */
>>> +    s32            partition_id;
>>> +    /*
>>> +     * It points to the user that created this drm device. It is NULL
>>> +     * when this drm device represents the physical device i.e.
>>> +     * partition_id is QAIC_NO_PARTITION
>>> +     */
>>> +    struct qaic_user    *owner;
>>> +    /* Pointer to the drm device struct of this drm device */
>>> +    struct drm_device    *ddev;
>>> +    /* Head in list of users who have opened this drm device */
>>> +    struct list_head    users;
>>> +    /* Synchronizes access to users list */
>>> +    struct mutex        users_mutex;
>>> +};
>>> +
>>> +struct qaic_bo {
>>> +    struct drm_gem_object    base;
>>
>> Any reason why drm_gem_shmem_object cannot be used as a base?
> 
> I believe that is incompatible with our memory allocator in patch 5.
> 
>>> +    /* Scatter/gather table for allocate/imported BO */
>>> +    struct sg_table        *sgt;
>>> +    /* BO size requested by user. GEM object might be bigger in 
>>> size. */
>>> +    u64            size;
>>> +    /* Head in list of slices of this BO */
>>> +    struct list_head    slices;
>>> +    /* Total nents, for all slices of this BO */
>>> +    int            total_slice_nents;
>>> +    /*
>>> +     * Direction of transfer. It can assume only two value 
>>> DMA_TO_DEVICE and
>>> +     * DMA_FROM_DEVICE.
>>> +     */
>>> +    int            dir;
>>> +    /* The pointer of the DBC which operates on this BO */
>>> +    struct dma_bridge_chan    *dbc;
>>> +    /* Number of slice that belongs to this buffer */
>>> +    u32            nr_slice;
>>> +    /* Number of slice that have been transferred by DMA engine */
>>> +    u32            nr_slice_xfer_done;
>>> +    /* TRUE = BO is queued for execution, FALSE = BO is not queued */
>>> +    bool            queued;
>>> +    /*
>>> +     * If TRUE then user has attached slicing information to this BO by
>>> +     * calling DRM_IOCTL_QAIC_ATTACH_SLICE_BO ioctl.
>>> +     */
>>> +    bool            sliced;
>>> +    /* Request ID of this BO if it is queued for execution */
>>> +    u16            req_id;
>>> +    /* Handle assigned to this BO */
>>> +    u32            handle;
>>> +    /* Wait on this for completion of DMA transfer of this BO */
>>> +    struct completion    xfer_done;
>>> +    /*
>>> +     * Node in linked list where head is dbc->xfer_list.
>>> +     * This link list contain BO's that are queued for DMA transfer.
>>> +     */
>>> +    struct list_head    xfer_list;
>>> +    /*
>>> +     * Node in linked list where head is dbc->bo_lists.
>>> +     * This link list contain BO's that are associated with the DBC 
>>> it is
>>> +     * linked to.
>>> +     */
>>> +    struct list_head    bo_list;
>>> +    struct {
>>> +        /*
>>> +         * Latest timestamp(ns) at which kernel received a request to
>>> +         * execute this BO
>>> +         */
>>> +        u64        req_received_ts;
>>> +        /*
>>> +         * Latest timestamp(ns) at which kernel enqueued requests of
>>> +         * this BO for execution in DMA queue
>>> +         */
>>> +        u64        req_submit_ts;
>>> +        /*
>>> +         * Latest timestamp(ns) at which kernel received a completion
>>> +         * interrupt for requests of this BO
>>> +         */
>>> +        u64        req_processed_ts;
>>> +        /*
>>> +         * Number of elements already enqueued in DMA queue before
>>> +         * enqueuing requests of this BO
>>> +         */
>>> +        u32        queue_level_before;
>>> +    } perf_stats;
>>> +
>>> +};
>>> +
>>> +struct bo_slice {
>>> +    /* Mapped pages */
>>> +    struct sg_table        *sgt;
>>> +    /* Number of requests required to queue in DMA queue */
>>> +    int            nents;
>>> +    /* See enum dma_data_direction */
>>> +    int            dir;
>>> +    /* Actual requests that will be copied in DMA queue */
>>> +    struct dbc_req        *reqs;
>>> +    struct kref        ref_count;
>>> +    /* TRUE: No DMA transfer required */
>>> +    bool            no_xfer;
>>> +    /* Pointer to the parent BO handle */
>>> +    struct qaic_bo        *bo;
>>> +    /* Node in list of slices maintained by parent BO */
>>> +    struct list_head    slice;
>>> +    /* Size of this slice in bytes */
>>> +    u64            size;
>>> +    /* Offset of this slice in buffer */
>>> +    u64            offset;
>>> +};
>>> +
>>> +int get_dbc_req_elem_size(void);
>>> +int get_dbc_rsp_elem_size(void);
>>> +int get_cntl_version(struct qaic_device *qdev, struct qaic_user *usr,
>>> +             u16 *major, u16 *minor);
>>> +int qaic_manage_ioctl(struct drm_device *dev, void *data,
>>> +              struct drm_file *file_priv);
>>> +int qaic_execute_ioctl(struct qaic_device *qdev, struct qaic_user *usr,
>>> +               unsigned long arg, bool is_partial);
>>> +int qaic_wait_exec_ioctl(struct qaic_device *qdev, struct qaic_user 
>>> *usr,
>>> +             unsigned long arg);
>>> +int qaic_query_ioctl(struct qaic_device *qdev, struct qaic_user *usr,
>>> +             unsigned long arg);
>>> +int qaic_data_mmap(struct qaic_device *qdev, struct qaic_user *usr,
>>> +           struct vm_area_struct *vma);
>>> +int qaic_data_get_reservation(struct qaic_device *qdev, struct 
>>> qaic_user *usr,
>>> +                  void *data, u32 *partition_id,
>>> +                  u16 *remove);
>>> +void qaic_mhi_ul_xfer_cb(struct mhi_device *mhi_dev,
>>> +             struct mhi_result *mhi_result);
>>> +
>>> +void qaic_mhi_dl_xfer_cb(struct mhi_device *mhi_dev,
>>> +             struct mhi_result *mhi_result);
>>> +
>>> +int qaic_control_open(struct qaic_device *qdev);
>>> +void qaic_control_close(struct qaic_device *qdev);
>>> +void qaic_release_usr(struct qaic_device *qdev, struct qaic_user *usr);
>>> +
>>> +irqreturn_t dbc_irq_threaded_fn(int irq, void *data);
>>> +irqreturn_t dbc_irq_handler(int irq, void *data);
>>> +int disable_dbc(struct qaic_device *qdev, u32 dbc_id, struct 
>>> qaic_user *usr);
>>> +void enable_dbc(struct qaic_device *qdev, u32 dbc_id, struct 
>>> qaic_user *usr);
>>> +void wakeup_dbc(struct qaic_device *qdev, u32 dbc_id);
>>> +void release_dbc(struct qaic_device *qdev, u32 dbc_id);
>>> +
>>> +void wake_all_cntl(struct qaic_device *qdev);
>>> +void qaic_dev_reset_clean_local_state(struct qaic_device *qdev, bool 
>>> exit_reset);
>>> +
>>> +struct drm_gem_object *qaic_gem_prime_import(struct drm_device *dev,
>>> +                         struct dma_buf *dma_buf);
>>> +
>>> +int qaic_create_bo_ioctl(struct drm_device *dev, void *data,
>>> +             struct drm_file *file_priv);
>>> +int qaic_mmap_bo_ioctl(struct drm_device *dev, void *data,
>>> +               struct drm_file *file_priv);
>>> +int qaic_attach_slice_bo_ioctl(struct drm_device *dev, void *data,
>>> +                   struct drm_file *file_priv);
>>> +int qaic_execute_bo_ioctl(struct drm_device *dev, void *data,
>>> +              struct drm_file *file_priv);
>>> +int qaic_partial_execute_bo_ioctl(struct drm_device *dev, void *data,
>>> +                  struct drm_file *file_priv);
>>> +int qaic_wait_bo_ioctl(struct drm_device *dev, void *data,
>>> +               struct drm_file *file_priv);
>>> +int qaic_test_print_bo_ioctl(struct drm_device *dev, void *data,
>>> +                 struct drm_file *file_priv);
>>> +int qaic_perf_stats_bo_ioctl(struct drm_device *dev, void *data,
>>> +                 struct drm_file *file_priv);
>>
>> You don't need to break these lines. You can use up to 100 columns in 
>> the whole driver.
>> It will be more readable and checkpatch won't complain.
> 
> Some of this predates the 100 column change.  Checkpatch already 
> complains about the uapi file.
> 
> Will try to fix here.
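
e.g. several of these collapse onto single lines within the 100-column limit:

int qaic_wait_exec_ioctl(struct qaic_device *qdev, struct qaic_user *usr, unsigned long arg);
int qaic_create_bo_ioctl(struct drm_device *dev, void *data, struct drm_file *file_priv);
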
> 
>>> +void irq_polling_work(struct work_struct *work);
>>> +
>>> +#endif /* QAICINTERNAL_H_ */
>>> diff --git a/drivers/accel/qaic/qaic_drv.c 
>>> b/drivers/accel/qaic/qaic_drv.c
>>> new file mode 100644
>>> index 0000000..602a784
>>> --- /dev/null
>>> +++ b/drivers/accel/qaic/qaic_drv.c
>>> @@ -0,0 +1,771 @@
>>> +// SPDX-License-Identifier: GPL-2.0-only
>>> +
>>> +/* Copyright (c) 2019-2021, The Linux Foundation. All rights 
>>> reserved. */
>>> +/* Copyright (c) 2021-2023 Qualcomm Innovation Center, Inc. All 
>>> rights reserved. */
>>> +
>>> +#include <linux/delay.h>
>>> +#include <linux/dma-mapping.h>
>>> +#include <linux/idr.h>
>>> +#include <linux/interrupt.h>
>>> +#include <linux/list.h>
>>> +#include <linux/kref.h>
>>> +#include <linux/mhi.h>
>>> +#include <linux/module.h>
>>> +#include <linux/msi.h>
>>> +#include <linux/mutex.h>
>>> +#include <linux/pci.h>
>>> +#include <linux/sched.h>
>>
>> Is <linux/sched.h> used here?
>> Feels like there are couple other unused includes in this file.
> 
> Actually I think that can be removed now.  Will check.
> The other ones look sane to me.  Are there specific ones you question?
> 
>>
>>> +#include <linux/spinlock.h>
>>> +#include <linux/workqueue.h>
>>> +#include <linux/wait.h>
>>> +#include <drm/drm_accel.h>
>>> +#include <drm/drm_drv.h>
>>> +#include <drm/drm_file.h>
>>> +#include <drm/drm_gem.h>
>>> +#include <drm/drm_ioctl.h>
>>> +#include <uapi/drm/qaic_accel.h>
>>> +
>>> +#include "mhi_controller.h"
>>> +#include "mhi_qaic_ctrl.h"
>>> +#include "qaic.h"
>>> +
>>> +MODULE_IMPORT_NS(DMA_BUF);
>>> +
>>> +#define PCI_DEV_AIC100            0xa100
>>> +#define QAIC_NAME            "qaic"
>>> +#define QAIC_DESC            "Qualcomm Cloud AI Accelerators"
>>> +
>>> +static unsigned int datapath_polling;
>>> +module_param(datapath_polling, uint, 0400);
>>> +bool poll_datapath;
>>> +
>>> +static u16 cntl_major = 5;
>>> +static u16 cntl_minor;/* 0 */
>>
>> Missing space before the comment.
>> And also you could convert both vars to macros as they are constants.
> 
> Sure
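
i.e. something like (names are just a suggestion):

#define QAIC_CNTL_MAJOR 5
#define QAIC_CNTL_MINOR 0
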
> 
>>> +static bool link_up;
>>> +static DEFINE_IDA(qaic_usrs);
>>> +
>>> +static int qaic_create_drm_device(struct qaic_device *qdev, s32 
>>> partition_id,
>>> +                  struct qaic_user *owner);
>>> +static void qaic_destroy_drm_device(struct qaic_device *qdev, s32 
>>> partition_id,
>>> +                    struct qaic_user *owner);
>>> +
>>> +static void free_usr(struct kref *kref)
>>> +{
>>> +    struct qaic_user *usr = container_of(kref, struct qaic_user, 
>>> ref_count);
>>> +
>>> +    cleanup_srcu_struct(&usr->qddev_lock);
>>> +    ida_free(&qaic_usrs, usr->handle);
>>> +    kfree(usr);
>>> +}
>>> +
>>> +static int qaic_open(struct drm_device *dev, struct drm_file *file)
>>> +{
>>> +    struct qaic_drm_device *qddev = dev->dev_private;
>>> +    struct qaic_device *qdev = qddev->qdev;
>>> +    struct qaic_user *usr;
>>> +    int rcu_id;
>>> +    int ret;
>>> +
>>> +    rcu_id = srcu_read_lock(&qdev->dev_lock);
>>> +    if (qdev->in_reset) {
>>> +        srcu_read_unlock(&qdev->dev_lock, rcu_id);
>>> +        return -ENODEV;
>>> +    }
>>> +
>>> +    usr = kmalloc(sizeof(*usr), GFP_KERNEL);
>>> +    if (!usr) {
>>> +        srcu_read_unlock(&qdev->dev_lock, rcu_id);
>>> +        return -ENOMEM;
>>> +    }
>>> +
>>> +    usr->handle = ida_alloc(&qaic_usrs, GFP_KERNEL);
>>> +    if (usr->handle < 0) {
>>> +        srcu_read_unlock(&qdev->dev_lock, rcu_id);
>>> +        kfree(usr);
>>> +        return usr->handle;
>>> +    }
>>> +    usr->qddev = qddev;
>>> +    atomic_set(&usr->chunk_id, 0);
>>> +    init_srcu_struct(&usr->qddev_lock);
>>> +    kref_init(&usr->ref_count);
>>> +
>>> +    ret = mutex_lock_interruptible(&qddev->users_mutex);
>>> +    if (ret) {
>>> +        cleanup_srcu_struct(&usr->qddev_lock);
>>> +        kfree(usr);
>>> +        srcu_read_unlock(&qdev->dev_lock, rcu_id);
>>> +        return ret;
>>> +    }
>>> +
>>> +    list_add(&usr->node, &qddev->users);
>>> +    mutex_unlock(&qddev->users_mutex);
>>> +
>>> +    file->driver_priv = usr;
>>> +
>>> +    srcu_read_unlock(&qdev->dev_lock, rcu_id);
>>> +    return 0;
>>> +}
>>> +
>>> +static void qaic_postclose(struct drm_device *dev, struct drm_file 
>>> *file)
>>> +{
>>> +    struct qaic_user *usr = file->driver_priv;
>>> +    struct qaic_drm_device *qddev;
>>> +    struct qaic_device *qdev;
>>> +    int qdev_rcu_id;
>>> +    int usr_rcu_id;
>>> +    int i;
>>> +
>>> +    qddev = usr->qddev;
>>> +    usr_rcu_id = srcu_read_lock(&usr->qddev_lock);
>>> +    if (qddev) {
>>> +        qdev = qddev->qdev;
>>> +        qdev_rcu_id = srcu_read_lock(&qdev->dev_lock);
>>> +        if (!qdev->in_reset) {
>>> +            qaic_release_usr(qdev, usr);
>>> +            for (i = 0; i < qdev->num_dbc; ++i)
>>> +                if (qdev->dbc[i].usr &&
>>> +                    qdev->dbc[i].usr->handle == usr->handle)
>>> +                    release_dbc(qdev, i);
>>> +
>>> +            /* Remove child devices */
>>> +            if (qddev->partition_id == QAIC_NO_PARTITION)
>>> +                qaic_destroy_drm_device(qdev, QAIC_NO_PARTITION, usr);
>>> +        }
>>> +        srcu_read_unlock(&qdev->dev_lock, qdev_rcu_id);
>>> +
>>> +        mutex_lock(&qddev->users_mutex);
>>> +        if (!list_empty(&usr->node))
>>> +            list_del_init(&usr->node);
>>> +        mutex_unlock(&qddev->users_mutex);
>>> +    }
>>> +
>>> +    srcu_read_unlock(&usr->qddev_lock, usr_rcu_id);
>>> +    kref_put(&usr->ref_count, free_usr);
>>> +
>>> +    file->driver_priv = NULL;
>>> +}
>>> +
>>> +static int qaic_part_dev_ioctl(struct drm_device *dev, void *data,
>>> +                   struct drm_file *file_priv)
>>> +{
>>> +    struct qaic_device *qdev;
>>> +    struct qaic_user *usr;
>>> +    u32 partition_id;
>>> +    int qdev_rcu_id;
>>> +    int usr_rcu_id;
>>> +    int ret = 0;
>>> +    u16 remove;
>>> +
>>> +    usr = file_priv->driver_priv;
>>> +    if (!usr)
>>> +        return -EINVAL;
>>> +
>>> +    usr_rcu_id = srcu_read_lock(&usr->qddev_lock);
>>> +    if (!usr->qddev) {
>>> +        srcu_read_unlock(&usr->qddev_lock, usr_rcu_id);
>>> +        return -ENODEV;
>>> +    }
>>> +
>>> +    qdev = usr->qddev->qdev;
>>> +    if (!qdev) {
>>> +        srcu_read_unlock(&usr->qddev_lock, usr_rcu_id);
>>> +        return -ENODEV;
>>> +    }
>>> +
>>> +    qdev_rcu_id = srcu_read_lock(&qdev->dev_lock);
>>> +    if (qdev->in_reset) {
>>> +        ret = -ENODEV;
>>> +        goto out;
>>> +    }
>>> +
>>> +    /* This IOCTL is only supported for base devices. */
>>> +    if (usr->qddev->partition_id != QAIC_NO_PARTITION) {
>>> +        ret = -ENOTTY;
>>> +        goto out;
>>> +    }
>>> +
>>> +    ret = qaic_data_get_reservation(qdev, usr, data, &partition_id,
>>> +                    &remove);
>>> +    if (ret)
>>> +        goto out;
>>> +
>>> +    if (remove == 1)
>>> +        qaic_destroy_drm_device(qdev, partition_id, usr);
>>> +    else
>>> +        ret = qaic_create_drm_device(qdev, partition_id, usr);
>>> +
>>> +out:
>>> +    srcu_read_unlock(&qdev->dev_lock, qdev_rcu_id);
>>> +    srcu_read_unlock(&usr->qddev_lock, usr_rcu_id);
>>> +
>>> +    return ret;
>>> +}
>>> +
>>> +DEFINE_DRM_ACCEL_FOPS(qaic_accel_fops);
>>> +
>>> +static const struct drm_ioctl_desc qaic_drm_ioctls[] = {
>>> +    DRM_IOCTL_DEF_DRV(QAIC_MANAGE, qaic_manage_ioctl, 
>>> DRM_RENDER_ALLOW),
>>> +    DRM_IOCTL_DEF_DRV(QAIC_CREATE_BO, qaic_create_bo_ioctl, 
>>> DRM_RENDER_ALLOW),
>>> +    DRM_IOCTL_DEF_DRV(QAIC_MMAP_BO, qaic_mmap_bo_ioctl, 
>>> DRM_RENDER_ALLOW),
>>> +    DRM_IOCTL_DEF_DRV(QAIC_ATTACH_SLICE_BO, 
>>> qaic_attach_slice_bo_ioctl, DRM_RENDER_ALLOW),
>>> +    DRM_IOCTL_DEF_DRV(QAIC_EXECUTE_BO, qaic_execute_bo_ioctl, 
>>> DRM_RENDER_ALLOW),
>>> +    DRM_IOCTL_DEF_DRV(QAIC_PARTIAL_EXECUTE_BO, 
>>> qaic_partial_execute_bo_ioctl, DRM_RENDER_ALLOW),
>>> +    DRM_IOCTL_DEF_DRV(QAIC_WAIT_BO, qaic_wait_bo_ioctl, 
>>> DRM_RENDER_ALLOW),
>>> +    DRM_IOCTL_DEF_DRV(QAIC_PERF_STATS_BO, qaic_perf_stats_bo_ioctl, 
>>> DRM_RENDER_ALLOW),
>>> +    DRM_IOCTL_DEF_DRV(QAIC_PART_DEV, qaic_part_dev_ioctl, 
>>> DRM_RENDER_ALLOW),
>>> +};
>>> +
>>> +static const struct drm_driver qaic_accel_driver = {
>>> +    .driver_features    = DRIVER_GEM | DRIVER_COMPUTE_ACCEL,
>>> +
>>> +    .name            = QAIC_NAME,
>>> +    .desc            = QAIC_DESC,
>>> +    .date            = "20190618",
>>> +
>>> +    .fops            = &qaic_accel_fops,
>>> +    .open            = qaic_open,
>>> +    .postclose        = qaic_postclose,
>>> +
>>> +    .ioctls            = qaic_drm_ioctls,
>>> +    .num_ioctls        = ARRAY_SIZE(qaic_drm_ioctls),
>>> +    .prime_fd_to_handle    = drm_gem_prime_fd_to_handle,
>>> +    .gem_prime_import    = qaic_gem_prime_import,
>>> +};
>>> +
>>> +static int qaic_create_drm_device(struct qaic_device *qdev, s32 
>>> partition_id,
>>> +                  struct qaic_user *owner)
>>> +{
>>> +    struct qaic_drm_device *qddev;
>>> +    struct drm_device *ddev;
>>> +    struct device *pdev;
>>> +    int ret;
>>> +
>>> +    /*
>>> +     * Partition id QAIC_NO_PARTITION indicates that the device was 
>>> created
>>> +     * on mhi_probe and id > QAIC_NO_PARTITION indicates a partition
>>> +     * created using IOCTL. So, pdev for primary device is the pci 
>>> dev and
>>> +     * the parent for partition dev is the primary device.
>>> +     */
>>> +    if (partition_id == QAIC_NO_PARTITION)
>>> +        pdev = &qdev->pdev->dev;
>>> +    else
>>> +        pdev = qdev->base_dev->ddev->dev;
>>> +
>>> +    qddev = kzalloc(sizeof(*qddev), GFP_KERNEL);
>>> +    if (!qddev) {
>>> +        ret = -ENOMEM;
>>> +        goto qddev_fail;
>>> +    }
>>> +
>>> +    ddev = drm_dev_alloc(&qaic_accel_driver, pdev);
>>> +    if (IS_ERR(ddev)) {
>>> +        ret = PTR_ERR(ddev);
>>> +        goto ddev_fail;
>>> +    }
>>> +
>>> +    ddev->dev_private = qddev;
>>> +    qddev->ddev = ddev;
>>> +
>>> +    if (partition_id == QAIC_NO_PARTITION)
>>> +        qdev->base_dev = qddev;
>>> +    qddev->qdev = qdev;
>>> +    qddev->partition_id = partition_id;
>>> +    qddev->owner = owner;
>>> +    INIT_LIST_HEAD(&qddev->users);
>>> +    mutex_init(&qddev->users_mutex);
>>> +
>>> +    mutex_lock(&qdev->qaic_drm_devices_mutex);
>>> +    list_add(&qddev->node, &qdev->qaic_drm_devices);
>>> +    mutex_unlock(&qdev->qaic_drm_devices_mutex);
>>> +
>>> +    ret = drm_dev_register(ddev, 0);
>>> +    if (ret) {
>>> +        pci_dbg(qdev->pdev, "%s: drm_dev_register failed %d\n", 
>>> __func__, ret);
>>> +        goto drm_reg_fail;
>>> +    }
>>> +
>>> +    return 0;
>>> +
>>> +drm_reg_fail:
>>> +    mutex_destroy(&qddev->users_mutex);
>>> +    mutex_lock(&qdev->qaic_drm_devices_mutex);
>>> +    list_del(&qddev->node);
>>> +    mutex_unlock(&qdev->qaic_drm_devices_mutex);
>>> +    if (partition_id == QAIC_NO_PARTITION)
>>> +        qdev->base_dev = NULL;
>>> +    drm_dev_put(ddev);
>>> +ddev_fail:
>>> +    kfree(qddev);
>>> +qddev_fail:
>>> +    return ret;
>>> +}
>>> +
>>> +static void qaic_destroy_drm_device(struct qaic_device *qdev, s32 
>>> partition_id,
>>> +                    struct qaic_user *owner)
>>> +{
>>> +    struct qaic_drm_device *qddev;
>>> +    struct qaic_drm_device *q;
>>> +    struct qaic_user *usr;
>>> +
>>> +    list_for_each_entry_safe(qddev, q, &qdev->qaic_drm_devices, node) {
>>> +        /*
>>> +         * Skip devices in case we just want to remove devices
>>> +         * specific to a owner or partition id.
>>> +         *
>>> +         * owner    partition_id    notes
>>> +         * ----------------------------------
>>> +         * NULL        NO_PARTITION    delete base + all derived (qdev
>>> +         *                reset)
>>> +         * !NULL    NO_PARTITION    delete derived devs created by
>>> +         *                owner.
>>> +         * !NULL    >NO_PARTITION    delete derived dev identified by
>>> +         *                the partition id and created by
>>> +         *                owner
>>> +         * NULL        >NO_PARTITION    invalid (no-op)
>>> +         *
>>> +         * if partition_id is any value < QAIC_NO_PARTITION this 
>>> will be
>>> +         * a no-op.
>>> +         */
>>> +        if (owner && owner != qddev->owner)
>>> +            continue;
>>> +
>>> +        if (partition_id != QAIC_NO_PARTITION &&
>>> +            partition_id != qddev->partition_id && !owner)
>>> +            continue;
>>> +
>>> +        /*
>>> +         * Existing users get unresolvable errors till they close FDs.
>>> +         * Need to sync carefully with users calling close().  The
>>> +         * list of users can be modified elsewhere when the lock isn't
>>> +         * held here, but the sync'ing the srcu with the mutex held
>>> +         * could deadlock.  Grab the mutex so that the list will be
>>> +         * unmodified.  The user we get will exist as long as the
>>> +         * lock is held.  Signal that the qcdev is going away, and
>>> +         * grab a reference to the user so they don't go away for
>>> +         * synchronize_srcu().  Then release the mutex to avoid
>>> +         * deadlock and make sure the user has observed the signal.
>>> +         * With the lock released, we cannot maintain any state of the
>>> +         * user list.
>>> +         */
>>> +        mutex_lock(&qddev->users_mutex);
>>> +        while (!list_empty(&qddev->users)) {
>>> +            usr = list_first_entry(&qddev->users, struct qaic_user,
>>> +                           node);
>>> +            list_del_init(&usr->node);
>>> +            kref_get(&usr->ref_count);
>>> +            usr->qddev = NULL;
>>> +            mutex_unlock(&qddev->users_mutex);
>>> +            synchronize_srcu(&usr->qddev_lock);
>>> +            kref_put(&usr->ref_count, free_usr);
>>> +            mutex_lock(&qddev->users_mutex);
>>> +        }
>>> +        mutex_unlock(&qddev->users_mutex);
>>> +
>>> +        if (qddev->ddev) {
>>> +            drm_dev_unregister(qddev->ddev);
>>> +            drm_dev_put(qddev->ddev);
>>> +        }
>>> +
>>> +        list_del(&qddev->node);
>>> +        kfree(qddev);
>>> +    }
>>> +}
>>> +
>>> +static int qaic_mhi_probe(struct mhi_device *mhi_dev,
>>> +              const struct mhi_device_id *id)
>>> +{
>>> +    struct qaic_device *qdev;
>>> +    u16 major, minor;
>>> +    int ret;
>>> +
>>> +    /*
>>> +     * Invoking this function indicates that the control channel to the
>>> +     * device is available.  We use that as a signal to indicate that
>>> +     * the device side firmware has booted.  The device side firmware
>>> +     * manages the device resources, so we need to communicate with it
>>> +     * via the control channel in order to utilize the device.  
>>> Therefore
>>> +     * we wait until this signal to create the drm dev that 
>>> userspace will
>>> +     * use to control the device, because without the device side 
>>> firmware,
>>> +     * userspace can't do anything useful.
>>> +     */
>>> +
>>> +    qdev = pci_get_drvdata(to_pci_dev(mhi_dev->mhi_cntrl->cntrl_dev));
>>> +
>>> +    qdev->in_reset = false;
>>> +
>>> +    dev_set_drvdata(&mhi_dev->dev, qdev);
>>> +    qdev->cntl_ch = mhi_dev;
>>> +
>>> +    ret = qaic_control_open(qdev);
>>> +    if (ret) {
>>> +        pci_dbg(qdev->pdev, "%s: control_open failed %d\n", 
>>> __func__, ret);
>>> +        goto err;
>>> +    }
>>> +
>>> +    ret = get_cntl_version(qdev, NULL, &major, &minor);
>>> +    if (ret || major != cntl_major || minor > cntl_minor) {
>>> +        pci_err(qdev->pdev, "%s: Control protocol version (%d.%d) 
>>> not supported.  Supported version is (%d.%d). Ret: %d\n",
>>> +            __func__, major, minor, cntl_major, cntl_minor, ret);
>>> +        ret = -EINVAL;
>>> +        goto close_control;
>>> +    }
>>> +
>>> +    ret = qaic_create_drm_device(qdev, QAIC_NO_PARTITION, NULL);
>>> +
>>> +    return ret;
>>> +
>>> +close_control:
>>> +    qaic_control_close(qdev);
>>> +err:
>>> +    return ret;
>>> +}
>>> +
>>> +static void qaic_mhi_remove(struct mhi_device *mhi_dev)
>>> +{
>>
>> Add a comment here
> 
> Presumably why this is an empty function?
> Will do.
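
So roughly:

static void qaic_mhi_remove(struct mhi_device *mhi_dev)
{
    /*
     * Intentionally empty - the teardown is handled by the PCI
     * remove/reset paths.  Exact wording TBD.
     */
}
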
> 
>>> +}
>>> +
>>> +static void qaic_notify_reset(struct qaic_device *qdev)
>>> +{
>>> +    int i;
>>> +
>>> +    qdev->in_reset = true;
>>> +    /* wake up any waiters to avoid waiting for timeouts at sync */
>>> +    wake_all_cntl(qdev);
>>> +    for (i = 0; i < qdev->num_dbc; ++i)
>>> +        wakeup_dbc(qdev, i);
>>> +    synchronize_srcu(&qdev->dev_lock);
>>> +}
>>> +
>>> +void qaic_dev_reset_clean_local_state(struct qaic_device *qdev, bool 
>>> exit_reset)
>>> +{
>>> +    int i;
>>> +
>>> +    qaic_notify_reset(qdev);
>>> +
>>> +    /* remove drmdevs to prevent new users from coming in */
>>> +    if (qdev->base_dev)
>>> +        qaic_destroy_drm_device(qdev, QAIC_NO_PARTITION, NULL);
>>> +
>>> +    /* start tearing things down */
>>> +    for (i = 0; i < qdev->num_dbc; ++i)
>>> +        release_dbc(qdev, i);
>>> +
>>> +    if (exit_reset)
>>> +        qdev->in_reset = false;
>>> +}
>>> +
>>> +static int qaic_pci_probe(struct pci_dev *pdev,
>>> +              const struct pci_device_id *id)
>>
>> Please try to simplify this function. Maybe move irq init to separate 
>> function.
>> It will be more readable and there will less of a error handling 
>> spaghetti at the bottom.
> 
> I guess I tend towards not breaking something up if there is no reuse 
> elsewhere, but I think I see your point.  Will have a look.
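
A first pass at pulling the irq setup out might look something like this
(name and exact split are placeholders, untested, with the error unwinding
still owned by probe):

static int qaic_pci_init_irqs(struct qaic_device *qdev, struct pci_dev *pdev)
{
    int mhi_irq;
    int ret;
    int i;

    /* requesting exactly 32 also removes the separate "< 32" check */
    ret = pci_alloc_irq_vectors(pdev, 32, 32, PCI_IRQ_MSI);
    if (ret < 0)
        return ret;

    mhi_irq = pci_irq_vector(pdev, 0);
    if (mhi_irq < 0)
        return mhi_irq;

    for (i = 0; i < qdev->num_dbc; ++i) {
        ret = devm_request_threaded_irq(&pdev->dev, pci_irq_vector(pdev, i + 1),
                                        dbc_irq_handler, dbc_irq_threaded_fn,
                                        IRQF_SHARED, "qaic_dbc", &qdev->dbc[i]);
        if (ret)
            return ret;

        if (poll_datapath) {
            qdev->dbc[i].irq = pci_irq_vector(pdev, i + 1);
            disable_irq_nosync(qdev->dbc[i].irq);
            INIT_WORK(&qdev->dbc[i].poll_work, irq_polling_work);
        }
    }

    /* probe still needs the MHI irq to register the MHI controller */
    return mhi_irq;
}
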
> 
>>
>>> +{
>>> +    int ret;
>>> +    int i;
>>> +    int mhi_irq;
>>> +    struct qaic_device *qdev;
>>> +
>>> +    qdev = devm_kzalloc(&pdev->dev, sizeof(*qdev), GFP_KERNEL);
>>> +    if (!qdev)
>>> +        return -ENOMEM;
>>> +
>>> +    if (id->device == PCI_DEV_AIC100) {
>>> +        qdev->num_dbc = 16;
>>> +        qdev->dbc = devm_kcalloc(&pdev->dev, qdev->num_dbc, 
>>> sizeof(*qdev->dbc),
>>> +                     GFP_KERNEL);
>>> +        if (!qdev->dbc)
>>> +            return -ENOMEM;
>>> +    }
>>> +
>>> +    qdev->cntl_wq = alloc_workqueue("qaic_cntl", WQ_UNBOUND, 0);
>>> +    if (!qdev->cntl_wq) {
>>> +        ret = -ENOMEM;
>>> +        goto wq_fail;
>>> +    }
>>> +    pci_set_drvdata(pdev, qdev);
>>> +    qdev->pdev = pdev;
>>> +    mutex_init(&qdev->cntl_mutex);
>>> +    INIT_LIST_HEAD(&qdev->cntl_xfer_list);
>>> +    init_srcu_struct(&qdev->dev_lock);
>>> +    INIT_LIST_HEAD(&qdev->qaic_drm_devices);
>>> +    mutex_init(&qdev->qaic_drm_devices_mutex);
>>> +    for (i = 0; i < qdev->num_dbc; ++i) {
>>> +        spin_lock_init(&qdev->dbc[i].xfer_lock);
>>> +        qdev->dbc[i].qdev = qdev;
>>> +        qdev->dbc[i].id = i;
>>> +        INIT_LIST_HEAD(&qdev->dbc[i].xfer_list);
>>> +        init_srcu_struct(&qdev->dbc[i].ch_lock);
>>> +        init_waitqueue_head(&qdev->dbc[i].dbc_release);
>>> +        INIT_LIST_HEAD(&qdev->dbc[i].bo_lists);
>>> +    }
>>> +
>>> +    qdev->bars = pci_select_bars(pdev, IORESOURCE_MEM);
>>> +
>>> +    /* make sure the device has the expected BARs */
>>> +    if (qdev->bars != (BIT(0) | BIT(2) | BIT(4))) {
>>> +        pci_dbg(pdev, "%s: expected BARs 0, 2, and 4 not found in 
>>> device.  Found 0x%x\n",
>>> +            __func__, qdev->bars);
>>> +        ret = -EINVAL;
>>> +        goto bar_fail;
>>> +    }
>>> +
>>> +    ret = pci_enable_device(pdev);
>>> +    if (ret)
>>> +        goto enable_fail;
>>> +
>>> +    ret = pci_request_selected_regions(pdev, qdev->bars, "aic100");
>>> +    if (ret)
>>> +        goto request_regions_fail;
>>> +
>>> +    pci_set_master(pdev);
>>> +
>>> +    ret = dma_set_mask(&pdev->dev, DMA_BIT_MASK(64));
>>> +    if (ret)
>>> +        goto dma_mask_fail;
>>> +    ret = dma_set_coherent_mask(&pdev->dev, DMA_BIT_MASK(64));
>>> +    if (ret)
>>> +        goto dma_mask_fail;
>>> +    ret = dma_set_max_seg_size(&pdev->dev, UINT_MAX);
>>> +    if (ret)
>>> +        goto dma_mask_fail;
>>> +
>>> +    qdev->bar_0 = pci_ioremap_bar(pdev, 0);
>>> +    if (!qdev->bar_0) {
>>> +        ret = -ENOMEM;
>>> +        goto ioremap_0_fail;
>>> +    }
>>> +
>>> +    qdev->bar_2 = pci_ioremap_bar(pdev, 2);
>>> +    if (!qdev->bar_2) {
>>> +        ret = -ENOMEM;
>>> +        goto ioremap_2_fail;
>>> +    }
>>> +
>>> +    for (i = 0; i < qdev->num_dbc; ++i)
>>> +        qdev->dbc[i].dbc_base = qdev->bar_2 + QAIC_DBC_OFF(i);
>>> +
>>> +    ret = pci_alloc_irq_vectors(pdev, 1, 32, PCI_IRQ_MSI);
>>> +    if (ret < 0)
>>> +        goto alloc_irq_fail;
>>> +
>>> +    if (ret < 32) {
>>> +        pci_err(pdev, "%s: Requested 32 MSIs.  Obtained %d MSIs 
>>> which is less than the 32 required.\n",
>>> +            __func__, ret);
>>> +        ret = -ENODEV;
>>> +        goto invalid_msi_config;
>>> +    }
>>> +
>>> +    mhi_irq = pci_irq_vector(pdev, 0);
>>> +    if (mhi_irq < 0) {
>>> +        ret = mhi_irq;
>>> +        goto get_mhi_irq_fail;
>>> +    }
>>> +
>>> +    for (i = 0; i < qdev->num_dbc; ++i) {
>>> +        ret = devm_request_threaded_irq(&pdev->dev,
>>> +                        pci_irq_vector(pdev, i + 1),
>>> +                        dbc_irq_handler,
>>> +                        dbc_irq_threaded_fn,
>>> +                        IRQF_SHARED,
>>> +                        "qaic_dbc",
>>> +                        &qdev->dbc[i]);
>>> +        if (ret)
>>> +            goto get_dbc_irq_failed;
>>> +
>>> +        if (poll_datapath) {
>>> +            qdev->dbc[i].irq = pci_irq_vector(pdev, i + 1);
>>> +            disable_irq_nosync(qdev->dbc[i].irq);
>>> +            INIT_WORK(&qdev->dbc[i].poll_work, irq_polling_work);
>>> +        }
>>> +    }
>>> +
>>> +    qdev->mhi_cntl = qaic_mhi_register_controller(pdev, qdev->bar_0, 
>>> mhi_irq);
>>> +    if (IS_ERR(qdev->mhi_cntl)) {
>>> +        ret = PTR_ERR(qdev->mhi_cntl);
>>> +        goto mhi_register_fail;
>>> +    }
>>> +
>>> +    return 0;
>>> +
>>> +mhi_register_fail:
>>> +get_dbc_irq_failed:
>>
>> I don't think that duplicated goto statements are allowed by the 
>> coding style.
>> These should be rather named something like err_free_irq.
>> See 
>> https://www.kernel.org/doc/html/v4.10/process/coding-style.html#centralized-exiting-of-functions 
>>
> 
> I took another look at that link, and I do not believe it prohibits 
> duplicated goto labels, but they probably could be renamed and 
> simplified as you indicate.  Will have a look at that.
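> 
> Roughly what I have in mind for the top of the unwind path, with the two 
> stacked labels collapsed into one (label names are illustrative, the 
> unwind order itself is unchanged from the code below):
> 
> err_free_dbc_irq:
>     for (i = 0; i < qdev->num_dbc; ++i)
>         devm_free_irq(&pdev->dev, pci_irq_vector(pdev, i + 1),
>                   &qdev->dbc[i]);
> err_free_irq_vectors:
>     pci_free_irq_vectors(pdev);
> err_unmap_bar_2:
>     iounmap(qdev->bar_2);
> 
> with both the mhi_register_fail and get_dbc_irq_failed goto sites pointing 
> at err_free_dbc_irq.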
> 
>>
>>> +    for (i = 0; i < qdev->num_dbc; ++i)
>>> +        devm_free_irq(&pdev->dev, pci_irq_vector(pdev, i + 1),
>>> +                  &qdev->dbc[i]);
>>> +get_mhi_irq_fail:
>>> +invalid_msi_config:
>>> +    pci_free_irq_vectors(pdev);
>>> +alloc_irq_fail:
>>> +    iounmap(qdev->bar_2);
>>> +ioremap_2_fail:
>>> +    iounmap(qdev->bar_0);
>>> +ioremap_0_fail:
>>> +dma_mask_fail:
>>> +    pci_clear_master(pdev);
>>> +    pci_release_selected_regions(pdev, qdev->bars);
>>> +request_regions_fail:
>>> +    pci_disable_device(pdev);
>>> +enable_fail:
>>> +    pci_set_drvdata(pdev, NULL);
>>> +bar_fail:
>>> +    for (i = 0; i < qdev->num_dbc; ++i)
>>> +        cleanup_srcu_struct(&qdev->dbc[i].ch_lock);
>>> +    cleanup_srcu_struct(&qdev->dev_lock);
>>> +    destroy_workqueue(qdev->cntl_wq);
>>> +wq_fail:
>>> +    return ret;
>>> +}
>>> +
>>> +static void qaic_pci_remove(struct pci_dev *pdev)
>>> +{
>>> +    struct qaic_device *qdev = pci_get_drvdata(pdev);
>>> +    int i;
>>> +
>>> +    if (!qdev)
>>> +        return;
>>> +
>>> +    qaic_dev_reset_clean_local_state(qdev, false);
>>> +    qaic_mhi_free_controller(qdev->mhi_cntl, link_up);
>>> +    for (i = 0; i < qdev->num_dbc; ++i) {
>>> +        devm_free_irq(&pdev->dev, pci_irq_vector(pdev, i + 1),
>>> +                  &qdev->dbc[i]);
>>> +        cleanup_srcu_struct(&qdev->dbc[i].ch_lock);
>>> +    }
>>> +    destroy_workqueue(qdev->cntl_wq);
>>> +    pci_free_irq_vectors(pdev);
>>> +    iounmap(qdev->bar_0);
>>> +    pci_clear_master(pdev);
>>> +    pci_release_selected_regions(pdev, qdev->bars);
>>> +    pci_disable_device(pdev);
>>> +    pci_set_drvdata(pdev, NULL);
>>> +}
>>> +
>>> +static void qaic_pci_shutdown(struct pci_dev *pdev)
>>> +{
>>> +    /* see qaic_exit for what link_up is doing */
>>> +    link_up = true;
>>> +    qaic_pci_remove(pdev);
>>> +}
>>> +
>>> +static pci_ers_result_t qaic_pci_error_detected(struct pci_dev *pdev,
>>> +                        pci_channel_state_t error)
>>> +{
>>> +    return PCI_ERS_RESULT_NEED_RESET;
>>> +}
>>> +
>>> +static void qaic_pci_reset_prepare(struct pci_dev *pdev)
>>> +{
>>> +    struct qaic_device *qdev = pci_get_drvdata(pdev);
>>> +
>>> +    qaic_notify_reset(qdev);
>>> +    qaic_mhi_start_reset(qdev->mhi_cntl);
>>> +    qaic_dev_reset_clean_local_state(qdev, false);
>>> +}
>>> +
>>> +static void qaic_pci_reset_done(struct pci_dev *pdev)
>>> +{
>>> +    struct qaic_device *qdev = pci_get_drvdata(pdev);
>>> +
>>> +    qdev->in_reset = false;
>>> +    qaic_mhi_reset_done(qdev->mhi_cntl);
>>> +}
>>> +
>>> +static const struct mhi_device_id qaic_mhi_match_table[] = {
>>> +    { .chan = "QAIC_CONTROL", },
>>> +    {},
>>> +};
>>> +
>>> +static struct mhi_driver qaic_mhi_driver = {
>>> +    .id_table = qaic_mhi_match_table,
>>> +    .remove = qaic_mhi_remove,
>>> +    .probe = qaic_mhi_probe,
>>> +    .ul_xfer_cb = qaic_mhi_ul_xfer_cb,
>>> +    .dl_xfer_cb = qaic_mhi_dl_xfer_cb,
>>> +    .driver = {
>>> +        .name = "qaic_mhi",
>>> +    },
>>> +};
>>> +
>>> +static const struct pci_device_id qaic_ids[] = {
>>> +    { PCI_DEVICE(PCI_VENDOR_ID_QCOM, PCI_DEV_AIC100), },
>>> +    { }
>>> +};
>>> +MODULE_DEVICE_TABLE(pci, qaic_ids);
>>> +
>>> +static const struct pci_error_handlers qaic_pci_err_handler = {
>>> +    .error_detected = qaic_pci_error_detected,
>>> +    .reset_prepare = qaic_pci_reset_prepare,
>>> +    .reset_done = qaic_pci_reset_done,
>>> +};
>>> +
>>> +static struct pci_driver qaic_pci_driver = {
>>> +    .name = QAIC_NAME,
>>> +    .id_table = qaic_ids,
>>> +    .probe = qaic_pci_probe,
>>> +    .remove = qaic_pci_remove,
>>> +    .shutdown = qaic_pci_shutdown,
>>> +    .err_handler = &qaic_pci_err_handler,
>>> +};
>>> +
>>> +static int __init qaic_init(void)
>>> +{
>>> +    int ret;
>>> +
>>> +    if (datapath_polling)
>>> +        poll_datapath = true;
>>> +
>>> +    ret = mhi_driver_register(&qaic_mhi_driver);
>>> +    if (ret) {
>>> +        pr_debug("qaic: mhi_driver_register failed %d\n", ret);
>>> +        goto free_class;
>>> +    }
>>> +
>>> +    ret = pci_register_driver(&qaic_pci_driver);
>>> +    if (ret) {
>>> +        pr_debug("qaic: pci_register_driver failed %d\n", ret);
>>> +        goto free_mhi;
>>> +    }
>>> +
>>> +    ret = mhi_qaic_ctrl_init();
>>> +    if (ret) {
>>> +        pr_debug("qaic: mhi_qaic_ctrl_init failed %d\n", ret);
>>> +        goto free_pci;
>>> +    }
>>> +
>>> +    return 0;
>>> +
>>> +free_pci:
>>> +    pci_unregister_driver(&qaic_pci_driver);
>>> +free_mhi:
>>> +    mhi_driver_unregister(&qaic_mhi_driver);
>>> +free_class:
>>
>> This label doesn't free anything. It should be renamed.
> 
> Doh.  Holdover from a previous version.  It does need to be renamed.
> 
>>
>>> +    return ret;
>>> +}
>>> +
>>> +static void __exit qaic_exit(void)
>>> +{
>>> +    /*
>>> +     * We assume that qaic_pci_remove() is called due to a hotplug 
>>> event
>>> +     * which would mean that the link is down, and thus
>>> +     * qaic_mhi_free_controller() should not try to access the 
>>> device during
>>> +     * cleanup.
>>> +     * We call pci_unregister_driver() below, which also triggers
>>> +     * qaic_pci_remove(), but since this is module exit, we expect 
>>> the link
>>> +     * to the device to be up, in which case qaic_mhi_free_controller()
>>> +     * should try to access the device during cleanup to put the 
>>> device in
>>> +     * a sane state.
>>> +     * For that reason, we set link_up here to let 
>>> qaic_mhi_free_controller
>>> +     * know the expected link state.  Since the module is going to be
>>> +     * removed at the end of this, we don't need to worry about
>>> +     * reinitializing the link_up state after the cleanup is done.
>>> +     */
>>> +    link_up = true;
>>
>> Maybe you could just use pdev->current_state instead of link_up?
> 
> Maybe.  Will have a look at that.

Looks like no.  current_state does not appear to be modified by any of 
the hotplug code for remove events, and current_state is not updated by 
the pci framework until after the driver's remove() is invoked.

> 
>>
>>> +    mhi_qaic_ctrl_deinit();
>>> +    pci_unregister_driver(&qaic_pci_driver);
>>> +    mhi_driver_unregister(&qaic_mhi_driver);
>>> +}
>>> +
>>> +module_init(qaic_init);
>>> +module_exit(qaic_exit);
>>> +
>>> +MODULE_AUTHOR(QAIC_DESC " Kernel Driver Team");
>>> +MODULE_DESCRIPTION(QAIC_DESC " Accel Driver");
>>> +MODULE_LICENSE("GPL");
>>> diff --git a/include/uapi/drm/qaic_accel.h 
>>> b/include/uapi/drm/qaic_accel.h
>>> new file mode 100644
>>> index 0000000..d5fa6f5
>>> --- /dev/null
>>> +++ b/include/uapi/drm/qaic_accel.h
>>> @@ -0,0 +1,283 @@
>>> +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note
>>> + *
>>> + * Copyright (c) 2019-2020, The Linux Foundation. All rights reserved.
>>> + * Copyright (c) 2021-2023 Qualcomm Innovation Center, Inc. All 
>>> rights reserved.
>>> + */
>>> +
>>> +#ifndef QAIC_ACCEL_H_
>>> +#define QAIC_ACCEL_H_
>>> +
>>> +#include <linux/ioctl.h>
>>> +#include <linux/types.h>
>>
>> These two headers should not be needed here.
> 
> Fair enough.  Listed out of paranoia since they define things used in 
> this header.
> 
>>
>>> +#include "drm.h"
>>> +
>>> +#if defined(__CPLUSPLUS)
>>
>> Use lowercase here: __cplusplus
> 
> Doh.  Not sure why uppercase was used.  The referenced examples sure 
> didn't have that.
> 
>>
>>> +extern "C" {
>>> +#endif
>>> +
>>> +#define QAIC_MANAGE_MAX_MSG_LENGTH SZ_4K    /**<
>>> +                          * The length(4K) includes len and
>>> +                          * count fields of qaic_manage_msg
>>> +                          */
>>
>> I guess these are doxygen style commands but you should be using 
>> kernel-doc here.
>> See https://docs.kernel.org/doc-guide/kernel-doc.html.
>> This can be used to verify the header:
>> $ scripts/kernel-doc -v -none include/uapi/drm/qaic_accel.h
> 
> include/uapi/drm/drm.h was used as the example for these, and thus it 
> was assumed that was the DRM style.
> 
> I'm not convinced kernel-doc is applicable to all of these.  Will review.
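> 
> For reference, my understanding is that the kernel-doc form for one of 
> these would look roughly like the below (a sketch only, not what the 
> header currently uses):
> 
> /**
>  * struct qaic_manage_trans_hdr - Header for a transaction in a manage message
>  * @type: In. Transaction type, see enum qaic_manage_transaction_type
>  * @len: In. Length of this transaction in bytes, including the header
>  */
> struct qaic_manage_trans_hdr {
>     __u32 type;
>     __u32 len;
> };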
> 
>>
>>> +
>>> +enum qaic_sem_flags {
>>
>> Is there any specific reason for enums if all values are explicitly 
>> given?
> 
> It is a style pulled from elsewhere.  I can remove the enums.
> 
>>
>>> +    SEM_INSYNCFENCE =    0x1,
>>
>> All these enums/defines end up in a global user space namespace.
>> I would advise to prefix everything with QAIC_ or DRM_QAIC_ (e.g. 
>> QAIC_SEM_INSYNCFENCE)
>> to avoid conflicts with other drivers or user space libs.
> 
> Good point.  Will do.
> 
>>
>>> +    SEM_OUTSYNCFENCE =    0x2,
>>> +};
>>> +
>>> +enum qaic_sem_cmd {
>>> +    SEM_NOP =        0,
>>> +    SEM_INIT =        1,
>>> +    SEM_INC =        2,
>>> +    SEM_DEC =        3,
>>> +    SEM_WAIT_EQUAL =    4,
>>> +    SEM_WAIT_GT_EQ =    5, /**< Greater than or equal */
>>> +    SEM_WAIT_GT_0 =        6, /**< Greater than 0 */
>>> +};
>>> +
>>> +enum qaic_manage_transaction_type {
>>> +    TRANS_UNDEFINED =            0,
>>> +    TRANS_PASSTHROUGH_FROM_USR =        1,
>>> +    TRANS_PASSTHROUGH_TO_USR =        2,
>>> +    TRANS_PASSTHROUGH_FROM_DEV =        3,
>>> +    TRANS_PASSTHROUGH_TO_DEV =        4,
>>> +    TRANS_DMA_XFER_FROM_USR =        5,
>>> +    TRANS_DMA_XFER_TO_DEV =            6,
>>> +    TRANS_ACTIVATE_FROM_USR =        7,
>>> +    TRANS_ACTIVATE_FROM_DEV =        8,
>>> +    TRANS_ACTIVATE_TO_DEV =            9,
>>> +    TRANS_DEACTIVATE_FROM_USR =        10,
>>> +    TRANS_DEACTIVATE_FROM_DEV =        11,
>>> +    TRANS_STATUS_FROM_USR =            12,
>>> +    TRANS_STATUS_TO_USR =            13,
>>> +    TRANS_STATUS_FROM_DEV =            14,
>>> +    TRANS_STATUS_TO_DEV =            15,
>>> +    TRANS_TERMINATE_FROM_DEV =        16,
>>> +    TRANS_TERMINATE_TO_DEV =        17,
>>> +    TRANS_DMA_XFER_CONT =            18,
>>> +    TRANS_VALIDATE_PARTITION_FROM_DEV =    19,
>>> +    TRANS_VALIDATE_PARTITION_TO_DEV =    20,
>>> +    TRANS_MAX =                21
>>> +};
>>> +
>>> +struct qaic_manage_trans_hdr {
>>> +    __u32 type;    /**< in, value from enum manage_transaction_type */
>>> +    __u32 len;    /**< in, length of this transaction, including the 
>>> header */
>>> +};
>>> +
>>> +struct qaic_manage_trans_passthrough {
>>> +    struct qaic_manage_trans_hdr hdr;
>>> +    __u8 data[];    /**< in, userspace must encode in little endian */
>>> +};
>>> +
>>> +struct qaic_manage_trans_dma_xfer {
>>> +    struct qaic_manage_trans_hdr hdr;
>>> +    __u32 tag;    /**< in, device specific */
>>> +    __u32 count;    /**< in */
>>> +    __u64 addr;    /**< in, address of the data to be transferred via 
>>> DMA */
>>> +    __u64 size;    /**< in, length of the data to be transferred via 
>>> DMA */
>>> +};
>>> +
>>> +struct qaic_manage_trans_activate_to_dev {
>>> +    struct qaic_manage_trans_hdr hdr;
>>> +    __u32 queue_size;    /**<
>>> +                  * in, number of elements in DBC request
>>> +                  * and response queue
>>> +                  */
>>> +    __u32 eventfd;        /**< in */
>>> +    __u32 options;        /**< in, device specific */
>>> +    __u32 pad;        /**< pad must be 0 */
>>> +};
>>> +
>>> +struct qaic_manage_trans_activate_from_dev {
>>> +    struct qaic_manage_trans_hdr hdr;
>>> +    __u32 status;    /**< out, status of activate transaction */
>>> +    __u32 dbc_id;    /**< out, Identifier of assigned DMA Bridge 
>>> channel */
>>> +    __u64 options;    /**< out */
>>> +};
>>> +
>>> +struct qaic_manage_trans_deactivate {
>>> +    struct qaic_manage_trans_hdr hdr;
>>> +    __u32 dbc_id;    /**< in, Identifier of assigned DMA Bridge 
>>> channel */
>>> +    __u32 pad;    /**< pad must be 0 */
>>> +};
>>> +
>>> +struct qaic_manage_trans_status_to_dev {
>>> +    struct qaic_manage_trans_hdr hdr;
>>> +};
>>> +
>>> +struct qaic_manage_trans_status_from_dev {
>>> +    struct qaic_manage_trans_hdr hdr;
>>> +    __u16 major;    /**< out, major vesrion of NNC protocol used by 
>>> device */
>>> +    __u16 minor;    /**< out, minor vesrion of NNC protocol used by 
>>> device */
>>
>> vesrion -> version
> 
> Will fix.
> 
>>
>>> +    __u32 status;    /**< out, status of query transaction  */
>>> +    __u64 status_flags;    /**<
>>> +                  * out
>>> +                  * 0    : If set then device has CRC check enabled
>>> +                  * 1:63 : Unused
>>> +                  */
>>> +};
>>> +
>>> +struct qaic_manage_msg {
>>> +    __u32 len;    /**< in, Length of valid data - ie sum of all 
>>> transactions */
>>> +    __u32 count;    /**< in, Number of transactions in message */
>>> +    __u64 data;    /**< in, Pointer to array of transactions */
>>> +};
>>> +
>>> +struct qaic_create_bo {
>>> +    __u64 size;    /**< in, Size of BO in byte */
>>> +    __u32 handle;    /**< out, Returned GEM handle for the BO */
>>> +    __u32 pad;    /**< pad must be 0 */
>>> +};
>>> +
>>> +struct qaic_mmap_bo {
>>> +    __u32 handle;    /**< in, Handle for the BO being mapped. */
>>
>> The comment is missleading. BO is not mapped by this ioctl().
> 
> It's mapping the handle to the offset, which is a type of mapping.  I 
> think I get your point though.  Will try to wordsmith this better.
> 
>>
>>> +    __u32 pad;    /**< pad must be 0 */
>>> +    __u64 offset;    /**<
>>> +              * out, offset into the drm node to use for
>>> +              * subsequent mmap call
>>> +              */
>>> +};
>>> +
>>> +/**
>>> + * @brief semaphore command
>>> + */
>>> +struct qaic_sem {
>>> +    __u16 val;    /**< in, Only lower 12 bits are valid */
>>> +    __u8  index;    /**< in, Only lower 5 bits are valid */
>>> +    __u8  presync;    /**< in, 1 if presync operation, 0 if postsync */
>>> +    __u8  cmd;    /**< in, See enum sem_cmd */
>>> +    __u8  flags;    /**< in, See sem_flags for valid bits.  All 
>>> others must be 0 */
>>> +    __u16 pad;    /**< pad must be 0 */
>>> +};
>>> +
>>> +struct qaic_attach_slice_entry {
>>> +    __u64 size;        /**< in, Size memory to allocate for this BO 
>>> slice */
>>> +    struct qaic_sem    sem0;    /**< in, Must be zero if not valid */
>>> +    struct qaic_sem    sem1;    /**< in, Must be zero if not valid */
>>> +    struct qaic_sem    sem2;    /**< in, Must be zero if not valid */
>>> +    struct qaic_sem    sem3;    /**< in, Must be zero if not valid */
>>> +    __u64 dev_addr;        /**< in, Address in device to/from which 
>>> data is copied */
>>> +    __u64 db_addr;        /**< in, Doorbell address */
>>> +    __u32 db_data;        /**< in, Data to write to doorbell */
>>> +    __u32 db_len;        /**<
>>> +                  * in, Doorbell length - 32, 16, or 8 bits.
>>> +                  * 0 means doorbell is inactive
>>> +                  */
>>> +    __u64 offset;        /**< in, Offset from start of buffer */
>>> +};
>>> +
>>> +struct qaic_attach_slice_hdr {
>>> +    __u32 count;    /**< in, Number of slices for this BO */
>>> +    __u32 dbc_id;    /**< in, Associate this BO with this DMA Bridge 
>>> channel */
>>> +    __u32 handle;    /**< in, Handle of BO to which slicing 
>>> information is to be attached */
>>> +    __u32 dir;    /**< in, Direction of data: 1 = DMA_TO_DEVICE, 2 = 
>>> DMA_FROM_DEVICE */
>>> +    __u64 size;    /**<
>>> +              * in, Total length of BO
>>> +              * If BO is imported (DMABUF/PRIME) then this size
>>> +              * should not exceed the size of DMABUF provided.
>>> +              * If BO is allocated using DRM_IOCTL_QAIC_CREATE_BO
>>> +              * then this size should be exactly same as the size
>>> +              * provided during DRM_IOCTL_QAIC_CREATE_BO.
>>> +              */
>>> +};
>>> +
>>> +struct qaic_attach_slice {
>>> +    struct qaic_attach_slice_hdr hdr;
>>> +    __u64 data;    /**<
>>> +              * in, Pointer to a buffer which is container of
>>> +              * struct qaic_attach_slice_entry[]
>>> +              */
>>> +};
>>> +
>>> +struct qaic_execute_entry {
>>> +    __u32 handle;    /**< in, buffer handle */
>>> +    __u32 dir;    /**< in, 1 = to device, 2 = from device */
>>> +};
>>> +
>>> +struct qaic_partial_execute_entry {
>>> +    __u32 handle;    /**< in, buffer handle */
>>> +    __u32 dir;    /**< in, 1 = to device, 2 = from device */
>>> +    __u64 resize;    /**< in, 0 = no resize */
>>> +};
>>> +
>>> +struct qaic_execute_hdr {
>>> +    __u32 count;    /**< in, number of executes following this 
>>> header */
>>> +    __u32 dbc_id;    /**< in, Identifier of assigned DMA Bridge 
>>> channel */
>>> +};
>>> +
>>> +struct qaic_execute {
>>> +    struct qaic_execute_hdr hdr;
>>> +    __u64 data;    /**< in, qaic_execute_entry or 
>>> qaic_partial_execute_entry container */
>>> +};
>>> +
>>> +struct qaic_wait {
>>> +    __u32 handle;    /**< in, handle to wait on until execute is 
>>> complete */
>>> +    __u32 timeout;    /**< in, timeout for wait(in ms) */
>>> +    __u32 dbc_id;    /**< in, Identifier of assigned DMA Bridge 
>>> channel */
>>> +    __u32 pad;    /**< pad must be 0 */
>>> +};
>>> +
>>> +struct qaic_perf_stats_hdr {
>>> +    __u16 count;    /**< in, Total number BOs requested */
>>> +    __u16 pad;    /**< pad must be 0 */
>>> +    __u32 dbc_id;    /**< in, Identifier of assigned DMA Bridge 
>>> channel */
>>> +};
>>> +
>>> +struct qaic_perf_stats {
>>> +    struct qaic_perf_stats_hdr hdr;
>>> +    __u64 data;    /**< in, qaic_perf_stats_entry container */
>>> +};
>>> +
>>> +struct qaic_perf_stats_entry {
>>> +    __u32 handle;            /**< in, Handle of the memory request */
>>> +    __u32 queue_level_before;    /**<
>>> +                      * out, Number of elements in queue
>>> +                      * before submission given memory request
>>> +                      */
>>> +    __u32 num_queue_element;    /**<
>>> +                      * out, Number of elements to add in the
>>> +                      * queue for given memory request
>>> +                      */
>>> +    __u32 submit_latency_us;    /**<
>>> +                      * out, Time taken by kernel to submit
>>> +                      * the request to device
>>> +                      */
>>> +    __u32 device_latency_us;    /**<
>>> +                      * out, Time taken by device to execute the
>>> +                      * request. 0 if request is not completed
>>> +                      */
>>> +    __u32 pad;            /**< pad must be 0 */
>>> +};
>>> +
>>> +struct qaic_part_dev {
>>> +    __u32 partition_id;    /**< in, reservation id */
>>> +    __u16 remove;        /**< in, 1 - Remove device 0 - Create 
>>> device */
>>> +    __u16 pad;        /**< pad must be 0 */
>>> +};
>>> +
>>> +#define DRM_QAIC_MANAGE                0x00
>>> +#define DRM_QAIC_CREATE_BO            0x01
>>> +#define DRM_QAIC_MMAP_BO            0x02
>>
>> I know that MMAP_BO ioctl is common in drm drivers but in my opinion 
>> it is a very poor name.
>> I suggest naming it BO_INFO so in future you could extend it with 
>> other bo params besides
>> mmap offset.
> 
> I see your point, but I don't see how to implement it.
> 
> Let us assume this IOCTL is renamed DRM_QAIC_BO_INFO, and struct 
> qaic_mmap_bo is named qaic_bo_info.
> 
> qaic_bo_info is part of the uAPI.  If "tomorrow" we need to add a field 
> to it, we cannot.  That changes the structure size and breaks the uAPI.
> 
> I can't add reserved fields in anticipation of future use since GregKH 
> had said not to do that - 
> https://lore.kernel.org/all/20200514163734.GB3154055@kroah.com/
> 
> So, I'm thinking the only approach would be a linked list similar to 
> EXECUTE_BO where qaic_bo_info holds a userspace pointer that is the head 
> of the list, and new list nodes can be defined in the future.  That seems 
> like an overengineered solution to some hypothetical future which may 
> not come to pass.  If this is the solution, then I think I'd prefer to 
> leave this ioctl as-is, and deprecate it if the extension use case ever 
> comes to pass.
> 
> Thoughts?  Am I missing something?
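> 
> For illustration, the list-based shape I'm describing would be something 
> like this (purely hypothetical, not a proposal):
> 
> struct qaic_bo_info_entry {
>     __u32 param;   /* in, which property is queried, e.g. mmap offset */
>     __u32 pad;     /* pad must be 0 */
>     __u64 value;   /* out, value of the requested property */
>     __u64 next;    /* in, userspace pointer to the next entry, 0 ends the list */
> };
> 
> struct qaic_bo_info {
>     __u32 handle;  /* in, GEM handle of the BO being queried */
>     __u32 pad;     /* pad must be 0 */
>     __u64 entries; /* in, userspace pointer to the first qaic_bo_info_entry */
> };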
> 
>>
>>> +#define DRM_QAIC_ATTACH_SLICE_BO        0x03
>>> +#define DRM_QAIC_EXECUTE_BO            0x04
>>> +#define DRM_QAIC_PARTIAL_EXECUTE_BO        0x05
>>> +#define DRM_QAIC_WAIT_BO            0x06
>>> +#define DRM_QAIC_PERF_STATS_BO            0x07
>>> +#define DRM_QAIC_PART_DEV            0x08
>>> +
>>> +#define DRM_IOCTL_QAIC_MANAGE            DRM_IOWR(DRM_COMMAND_BASE + 
>>> DRM_QAIC_MANAGE, struct qaic_manage_msg)
>>> +#define DRM_IOCTL_QAIC_CREATE_BO        DRM_IOWR(DRM_COMMAND_BASE + 
>>> DRM_QAIC_CREATE_BO,    struct qaic_create_bo)
>>> +#define DRM_IOCTL_QAIC_MMAP_BO            DRM_IOWR(DRM_COMMAND_BASE 
>>> + DRM_QAIC_MMAP_BO, struct qaic_mmap_bo)
>>> +#define DRM_IOCTL_QAIC_ATTACH_SLICE_BO        
>>> DRM_IOW(DRM_COMMAND_BASE + DRM_QAIC_ATTACH_SLICE_BO, struct 
>>> qaic_attach_slice)
>>> +#define DRM_IOCTL_QAIC_EXECUTE_BO        DRM_IOW(DRM_COMMAND_BASE + 
>>> DRM_QAIC_EXECUTE_BO,    struct qaic_execute)
>>> +#define DRM_IOCTL_QAIC_PARTIAL_EXECUTE_BO    
>>> DRM_IOW(DRM_COMMAND_BASE + DRM_QAIC_PARTIAL_EXECUTE_BO,    struct 
>>> qaic_execute)
>>> +#define DRM_IOCTL_QAIC_WAIT_BO            DRM_IOW(DRM_COMMAND_BASE + 
>>> DRM_QAIC_WAIT_BO, struct qaic_wait)
>>> +#define DRM_IOCTL_QAIC_PERF_STATS_BO        
>>> DRM_IOWR(DRM_COMMAND_BASE + DRM_QAIC_PERF_STATS_BO, struct 
>>> qaic_perf_stats)
>>> +#define DRM_IOCTL_QAIC_PART_DEV            DRM_IOWR(DRM_COMMAND_BASE 
>>> + DRM_QAIC_PART_DEV, struct qaic_part_dev)
>>> +
>>> +#if defined(__CPLUSPLUS)
>>
>> Use lowercase here: __cplusplus
> 
> Will do.
> 
>>
>>> +}
>>> +#endif
>>> +
>>> +#endif /* QAIC_ACCEL_H_ */
>>
>> Regards,
>> Jacek
>>
> 


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 5/8] accel/qaic: Add datapath
  2023-03-01 18:14               ` Jeffrey Hugo
  (?)
@ 2023-03-03 13:49               ` Stanislaw Gruszka
  -1 siblings, 0 replies; 68+ messages in thread
From: Stanislaw Gruszka @ 2023-03-03 13:49 UTC (permalink / raw)
  To: Jeffrey Hugo
  Cc: linux-doc, linux-arm-msm, ogabbay, dri-devel, quic_ajitpals,
	quic_pkanojiy, quic_carlv, jacek.lawrynowicz

On Wed, Mar 01, 2023 at 11:14:35AM -0700, Jeffrey Hugo wrote:
> On 3/1/2023 10:05 AM, Stanislaw Gruszka wrote:
> > On Wed, Mar 01, 2023 at 09:08:03AM -0700, Jeffrey Hugo wrote:
> > > > This looks a bit suspicious. Are you sure you can modify
> > > > sg->dma_address and still use it as a valid value?
> > > 
> > > A single entry in the sg table is a contiguous mapping of memory.  If it
> > > wasn't contiguous, it would have to be broken up into multiple entries.  In
> > > the simple case, a driver is going to take the dma_address/len pair and hand
> > > that directly to the device.  Then the device is going to access every
> > > address in that range.
> > > 
> > > If the device can access every address from dma_address to dma_address +
> > > len, why can't it access a subset of that?
> > 
> > Required address alignment can be broken. Not sure if that is the only issue.
> 
> AIC100 doesn't have required alignment.  AIC100 can access any 64-bit
> address, at a byte level granularity.  The only restriction AIC100 has is
> that the size of a transfer is restricted to a 32-bit value, so max
> individual transfer size of 4GB.  Transferring more than 4GB requires
> multiple transactions.
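> 
> Concretely, anything larger just becomes multiple requests, along the 
> lines of the loop below (hypothetical helper name, not the actual DBC 
> command building code):
> 
>     while (remaining) {
>         u32 chunk = min_t(u64, remaining, U32_MAX);
> 
>         queue_one_dbc_xfer(dbc, dev_addr, host_addr, chunk);
>         dev_addr += chunk;
>         host_addr += chunk;
>         remaining -= chunk;
>     }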
> 
> > > > > Are you suggesting renaming
> > > > > this function?  I guess I'm not quite understanding your comment here. Can
> > > > > you elaborate?
> > > > 
> > > > Renaming would be nice. I was thinking of simplifying it, not sure
> > > > now if that's easily achievable, though.
> > > 
> > > Ok.  I'll think on this.
> > 
> > Maybe this function could be removed ? And create sg lists
> > that hardware can handle without any modification.
> > Just idea to consider, not any requirement.
> 
> Ok, so this is part of our "slicing" operation, and thus required.
> 
> Maybe how slicing works is not clear.
> 
> Let's say that we have a workload on AIC100 that can identify a license plate
> in a picture (aka lprnet).  Let's assume this workload only needs the RGB
> values of an RGBA file (a "jpeg" we are processing).
> 
> Userspace allocates a BO to hold the entire file.  A quarter of the file is
> R values, a quarter is G values, etc.  For simplicity, let's assume the R
> values are all sequentially listed, then the G values, then the B values,
> finally the A values.  When we allocate the BO, we map it once.  If we have
> an IOMMU, this optimizes the IOMMU mappings.  BOs can be quite large.  We
> have some test workloads based on real world workloads where each BO is
> 16-32M in size, and there are multiple BOs.  I don't want to map a 32M BO N
> duplicate times in the IOMMU.
> 
> So, now userspace slices the BO.  It tells us we need to transfer the RGB
> values (the first 75% of the BO), but not the A values.  So, we create a
> copy of the mapped SG and edit it to represent this transfer, which is a
> subset of the entire BO.  Using the slice information and the mapping
> information, we construct the DMA engine commands that can be used to
> transfer the relevant portions of the BO to the device.
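> 
> In rough code, the clone-and-edit step amounts to the following 
> (an illustrative sketch, not the exact driver function; it only fills in 
> the DMA address/length fields of a pre-allocated destination table):
> 
> static void clone_sg_range(struct sg_table *src, struct sg_table *dst,
>                u64 offset, u64 size)
> {
>     struct scatterlist *in, *out = dst->sgl;
>     u64 pos = 0;
>     int i;
> 
>     for_each_sgtable_dma_sg(src, in, i) {
>         u64 len = sg_dma_len(in);
> 
>         /* does this entry overlap [offset, offset + size)? */
>         if (pos + len > offset && pos < offset + size) {
>             u64 head = pos < offset ? offset - pos : 0;
>             u64 end = min(pos + len, offset + size);
> 
>             sg_dma_address(out) = sg_dma_address(in) + head;
>             sg_dma_len(out) = end - (pos + head);
>             out = sg_next(out);
>         }
>         pos += len;
>     }
> }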
> 
> It sounds like you are suggesting: let's flip this around.  Don't map the
> entire BO once.  Instead, wait for the slice info from userspace, construct
> a sg list based on the parts of the BO for the slice, and map that.  Then
> the driver gets a mapped SG it can just use.  The issue I see with that is
> slices can overlap.  You can transfer the same part of a BO multiple times.
> Maybe lprnet has multiple threads on AIC100 where thread A consumes R data,
> thread B consumes R and G data, and thread C consumes B data.  We need to
> transfer the R data twice to different device locations so that threads A
> and B can consume the R data independently.
> 
> If we map per slice, we are going to map the R part of the BO twice in the
> IOMMU.  Is that valid?  It feels possible that there exists some IOMMU
> implementation that won't allow multiple IOVAs to map to the same DDR PA
> because that is weird and the implementer thinks it's a software bug.  I
> don't want to run into that.  Assuming it is valid, that is multiple
> mappings in the IOMMU TLB which could have been a single mapping.  We are
> wasting IOMMU resources.
> 
> There are some ARM systems we support with limited IOVA space in the IOMMU,
> and we've had some issues with exhausting that space.  The current
> implementation is influenced by those experiences.

Ok, then the current implementation seems reasonable.
Thanks for the explanation!

Regards
Stanislaw

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 3/8] accel/qaic: Add MHI controller
  2023-03-01 16:09       ` Jeffrey Hugo
  (?)
@ 2023-03-03 13:57       ` Stanislaw Gruszka
  -1 siblings, 0 replies; 68+ messages in thread
From: Stanislaw Gruszka @ 2023-03-03 13:57 UTC (permalink / raw)
  To: Jeffrey Hugo
  Cc: linux-doc, linux-arm-msm, ogabbay, dri-devel, quic_ajitpals,
	quic_pkanojiy, quic_carlv, jacek.lawrynowicz

On Wed, Mar 01, 2023 at 09:09:57AM -0700, Jeffrey Hugo wrote:
> On 2/28/2023 4:52 AM, Stanislaw Gruszka wrote:
> > On Mon, Feb 06, 2023 at 08:41:40AM -0700, Jeffrey Hugo wrote:
> > > +	mhi_cntl = kzalloc(sizeof(*mhi_cntl), GFP_KERNEL);
> > [snip]
> > > +	mhi_cntl->irq = kmalloc(sizeof(*mhi_cntl->irq), GFP_KERNEL);
> > 
> > I recommend usage of devm_kzalloc(), devm_kmalloc() for those
> > to simplify error and exit paths.
> 
> When this was written, I didn't want to pass the struct device to the
> mhi_controller just for the purpose of using devm_*.  Today, I'm thinking
> that is not the end of the world, and devm has advantages. Will change.

If the already available &pci_dev->dev cannot be used in devm_ due to the
different lifetimes of pci_dev->dev and mhi_cntl, I don't think the change
would be justifiable and kmalloc/kzalloc should stay.

Regards
Stanislaw


^ permalink raw reply	[flat|nested] 68+ messages in thread

end of thread, other threads:[~2023-03-03 13:57 UTC | newest]

Thread overview: 68+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-02-06 15:41 [PATCH v2 0/8] QAIC accel driver Jeffrey Hugo
2023-02-06 15:41 ` Jeffrey Hugo
2023-02-06 15:41 ` [PATCH v2 1/8] accel/qaic: Add documentation for AIC100 accelerator driver Jeffrey Hugo
2023-02-06 15:41   ` Jeffrey Hugo
2023-02-14 11:08   ` Jacek Lawrynowicz
2023-02-15 15:41     ` Jeffrey Hugo
2023-02-15 15:41       ` Jeffrey Hugo
2023-02-16 14:18       ` Jacek Lawrynowicz
2023-02-16 15:20         ` Jeffrey Hugo
2023-02-16 15:20           ` Jeffrey Hugo
2023-02-06 15:41 ` [PATCH v2 2/8] accel/qaic: Add uapi and core driver file Jeffrey Hugo
2023-02-06 15:41   ` Jeffrey Hugo
2023-02-16 14:13   ` Jacek Lawrynowicz
2023-02-17 18:15     ` Jeffrey Hugo
2023-02-17 18:15       ` Jeffrey Hugo
2023-02-22 15:52       ` Dafna Hirschfeld
2023-02-22 15:52         ` Dafna Hirschfeld
2023-02-24 21:21         ` Jeffrey Hugo
2023-02-24 21:21           ` Jeffrey Hugo
2023-03-01 19:46       ` Jeffrey Hugo
2023-03-01 19:46         ` Jeffrey Hugo
2023-02-06 15:41 ` [PATCH v2 3/8] accel/qaic: Add MHI controller Jeffrey Hugo
2023-02-06 15:41   ` Jeffrey Hugo
2023-02-28 11:52   ` Stanislaw Gruszka
2023-02-28 11:52     ` Stanislaw Gruszka
2023-03-01 16:09     ` Jeffrey Hugo
2023-03-01 16:09       ` Jeffrey Hugo
2023-03-03 13:57       ` Stanislaw Gruszka
2023-02-06 15:41 ` [PATCH v2 4/8] accel/qaic: Add control path Jeffrey Hugo
2023-02-06 15:41   ` Jeffrey Hugo
2023-03-01 13:18   ` Stanislaw Gruszka
2023-03-01 13:18     ` Stanislaw Gruszka
2023-03-01 17:02     ` Jeffrey Hugo
2023-03-01 17:02       ` Jeffrey Hugo
2023-02-06 15:41 ` [PATCH v2 5/8] accel/qaic: Add datapath Jeffrey Hugo
2023-02-06 15:41   ` Jeffrey Hugo
2023-02-24 15:25   ` Stanislaw Gruszka
2023-02-24 15:25     ` Stanislaw Gruszka
2023-02-24 19:36     ` Jeffrey Hugo
2023-02-24 19:36       ` Jeffrey Hugo
2023-02-27 17:14       ` Stanislaw Gruszka
2023-02-27 17:14         ` Stanislaw Gruszka
2023-03-01 16:08         ` Jeffrey Hugo
2023-03-01 16:08           ` Jeffrey Hugo
2023-03-01 17:05           ` Stanislaw Gruszka
2023-03-01 18:14             ` Jeffrey Hugo
2023-03-01 18:14               ` Jeffrey Hugo
2023-03-03 13:49               ` Stanislaw Gruszka
2023-02-06 15:41 ` [PATCH v2 6/8] accel/qaic: Add mhi_qaic_cntl Jeffrey Hugo
2023-02-06 15:41   ` Jeffrey Hugo
2023-03-01 13:19   ` Stanislaw Gruszka
2023-03-01 13:19     ` Stanislaw Gruszka
2023-02-06 15:41 ` [PATCH v2 7/8] accel/qaic: Add qaic driver to the build system Jeffrey Hugo
2023-02-06 15:41   ` Jeffrey Hugo
2023-03-01 13:02   ` Stanislaw Gruszka
2023-03-01 13:02     ` Stanislaw Gruszka
2023-03-01 16:10     ` Jeffrey Hugo
2023-03-01 16:10       ` Jeffrey Hugo
2023-02-06 15:41 ` [PATCH v2 8/8] MAINTAINERS: Add entry for QAIC driver Jeffrey Hugo
2023-02-06 15:41   ` Jeffrey Hugo
2023-03-01 13:03   ` Stanislaw Gruszka
2023-03-01 13:03     ` Stanislaw Gruszka
2023-03-01 16:15     ` Jeffrey Hugo
2023-03-01 16:15       ` Jeffrey Hugo
2023-02-08 22:01 ` [PATCH v2 0/8] QAIC accel driver Jeffrey Hugo
2023-02-08 22:01   ` Jeffrey Hugo
2023-02-17 16:14   ` Jeffrey Hugo
2023-02-17 16:14     ` Jeffrey Hugo

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.