linux-kernel.vger.kernel.org archive mirror
* [RFC PATCH 0/8] Qualcomm Cloud AI 100 driver
@ 2020-05-14 14:07 Jeffrey Hugo
  2020-05-14 14:07 ` [RFC PATCH 1/8] qaic: Add skeleton driver Jeffrey Hugo
                   ` (9 more replies)
  0 siblings, 10 replies; 39+ messages in thread
From: Jeffrey Hugo @ 2020-05-14 14:07 UTC (permalink / raw)
  To: arnd, gregkh
  Cc: manivannan.sadhasivam, bjorn.andersson, wufan, pratanan,
	linux-arm-msm, linux-kernel, Jeffrey Hugo

Introduction:
Qualcomm Cloud AI 100 is a PCIe adapter card which contains a dedicated
SoC ASIC for the purpose of efficiently running Deep Learning inference
workloads in a data center environment.

The official press release can be found at -
https://www.qualcomm.com/news/releases/2019/04/09/qualcomm-brings-power-efficient-artificial-intelligence-inference

The official product website is -
https://www.qualcomm.com/products/datacenter-artificial-intelligence

At the time of the official press release, numerous technology news sites
also covered the product.  A search of your favorite site is likely to find
their coverage of it.

It is our goal to have the kernel driver for the product fully upstream.
The purpose of this RFC is to start that process.  We are still doing
development (see below), and thus are not quite looking to gain acceptance
yet, but now that we have a working driver we believe we are at the stage
where meaningful conversation with the community can occur.

Design:

+--------------------------------+
|       AI application           |
|       (userspace)              |
+-------------+------------------+
              |
              | Char dev interface
              |
              |
+-------------+------------------+
|       QAIC driver              |
|       (kernel space)           |
|                                |
+----+------------------+--------+
     |                  |
     |                  |
     |                  |
     |                  |
     |Control path      | Data path
     |(MHI bus)         |
     |                  |
     |                  |
     |                  |
     |                  |
+--------------------------------+
| +--------+      +------------+ |
| | MHI HW |      |DMA Bridge  | |
| +--------+      |(DMA engine)| |
|                 +------------+ |
|                                |
|                                |
|                                |
|  Qualcomm Cloud AI 100 device  |
|                                |
|                                |
+--------------------------------+

A Qualcomm Cloud AI 100 device (QAIC device from here on) is a PCIe hardware
accelerator for AI inference workloads.  Over the PCIe bus fabric, a QAIC
device exposes two interfaces via PCI BARs - an MHI hardware region and a
DMA engine hardware region.

Before workloads can be run, a QAIC device needs to be initialized.  Similar
to other Qualcomm products which incorporate MHI, device firmware needs to be
loaded onto the device from the host.  This occurs in two stages.  First,
a secondary bootloader (SBL) needs to be loaded onto the device.  This occurs
via the BHI protocol, and is handled by the MHI bus.  Once the SBL is loaded
and running, it activates the Sahara protocol.  The Sahara protocol is used
with a userspace application to load and initialize the remaining firmware.
The Sahara protocol and associated userspace application are outside the
scope of this series as they have no direct interaction with the QAIC driver.
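The two-stage boot flow described above can be sketched as a simple state
progression.  The state names and helper functions here are illustrative
only, not the driver's actual implementation:

```c
#include <stdbool.h>

/* Illustrative boot states for a QAIC device (hypothetical names). */
enum qaic_boot_state {
	QAIC_BOOT_RESET,	/* device powered on, nothing loaded */
	QAIC_BOOT_SBL,		/* SBL loaded via BHI by the MHI bus */
	QAIC_BOOT_SAHARA,	/* userspace loading remaining firmware */
	QAIC_BOOT_READY,	/* fully initialized, can run workloads */
};

/* Advance one boot stage; returns the new state. */
enum qaic_boot_state qaic_boot_step(enum qaic_boot_state s)
{
	switch (s) {
	case QAIC_BOOT_RESET:
		return QAIC_BOOT_SBL;	 /* BHI image load, in-kernel */
	case QAIC_BOOT_SBL:
		return QAIC_BOOT_SAHARA; /* SBL activates Sahara */
	case QAIC_BOOT_SAHARA:
		return QAIC_BOOT_READY;	 /* userspace finished loading */
	default:
		return QAIC_BOOT_READY;
	}
}

/* Workloads may only be submitted once the device is fully booted. */
bool qaic_boot_ready(enum qaic_boot_state s)
{
	return s == QAIC_BOOT_READY;
}
```

Note that only the first transition is handled entirely in the kernel; the
Sahara stage depends on the userspace application mentioned above.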

Once a QAIC device is fully initialized, workloads can be sent to the device
and run.  This involves a per-device instance char dev that the QAIC driver
exposes to userspace.  Running a workload involves two phases - configuring the
device, and interacting with the workload.

To configure the device, commands are sent via an MHI channel.  This is referred
to as the control path.  A command is a single message.  A message contains
one or more transactions.  Transactions are operations that the device
is requested to perform.  Most commands are opaque to the kernel driver; however,
some are not.  For example, if the user application wishes to DMA data to the
device, it requires the assistance of the kernel driver to translate the data
addresses to an address space that the device can understand.  In this instance
the transaction for DMAing the data is visible to the kernel driver, and the
driver will do the required transformation when encoding the message.
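As a rough illustration of that encoding step, the sketch below walks a
message's transactions and rewrites only the DMA ones.  The structure
layout, type id, and translation function are all hypothetical; the real
wire format is defined by the device and the real translation is done via
the kernel DMA API:

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical transaction header; most types are opaque to the driver. */
struct trans_hdr {
	uint32_t type;	/* what the device is asked to do */
	uint32_t len;	/* length of this transaction in bytes */
};

#define TRANS_DMA_XFER 3	/* hypothetical id for a DMA transaction */

/* Hypothetical DMA transaction: the one case the driver must inspect. */
struct trans_dma {
	struct trans_hdr hdr;
	uint64_t addr;	/* address the device will DMA to/from */
	uint32_t size;
	uint32_t pad;
};

/* Stand-in for the DMA-API mapping the kernel driver would perform. */
uint64_t translate_for_device(uint64_t host_addr)
{
	return host_addr | 0x8000000000000000ULL;	/* fake IOVA space */
}

/*
 * Walk the transactions in a message buffer, rewriting the addresses of
 * those the driver understands (DMA) and passing all others through
 * untouched.  Returns the number of bytes consumed.
 */
size_t encode_msg(uint8_t *buf, uint32_t count)
{
	uint8_t *pos = buf;
	uint32_t i;

	for (i = 0; i < count; i++) {
		struct trans_hdr *hdr = (struct trans_hdr *)pos;

		if (hdr->type == TRANS_DMA_XFER) {
			struct trans_dma *dma = (struct trans_dma *)pos;

			dma->addr = translate_for_device(dma->addr);
		}
		pos += hdr->len;
	}
	return (size_t)(pos - buf);
}
```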

To interact with the workload, the workload is assigned a DMA Bridge Channel
(dbc).  This is dedicated hardware within the DMA engine.  Interacting with the
workload consists of sending it input data, and receiving output data.  The
user application requests appropriate buffers from the kernel driver, prepares
the buffers, and directs the kernel driver to queue them to the hardware.
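A minimal software model of that queueing step, assuming a simple
submission ring per dbc (the structure names and ring depth are
hypothetical; the real interface is the ioctl set in
include/uapi/misc/qaic.h):

```c
#include <stdint.h>
#include <stdbool.h>

#define DBC_QUEUE_DEPTH 16	/* hypothetical ring depth */

/* Hypothetical descriptor for one prepared input or output buffer. */
struct dbc_desc {
	uint64_t dev_addr;	/* device-visible address of the buffer */
	uint32_t len;
	bool in;		/* true: input to workload, false: output */
};

/* Software model of one DMA Bridge Channel's submission ring. */
struct dbc_ring {
	struct dbc_desc slots[DBC_QUEUE_DEPTH];
	uint32_t head;	/* next slot the driver writes */
	uint32_t tail;	/* next slot the hardware consumes */
};

/* Queue one prepared buffer; returns false if the ring is full. */
bool dbc_queue(struct dbc_ring *ring, const struct dbc_desc *desc)
{
	uint32_t next = (ring->head + 1) % DBC_QUEUE_DEPTH;

	if (next == ring->tail)
		return false;	/* full: one slot is kept empty */

	ring->slots[ring->head] = *desc;
	ring->head = next;
	/* a real driver would now ring the hardware doorbell */
	return true;
}
```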

The kernel driver is required to support multiple QAIC devices, and also N
users per device.
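The multi-user requirement reduces to per-device bookkeeping of open
handles.  The sketch below is a single-threaded, userspace simplification
(the actual driver, shown in patch 3, tracks users with struct list_head,
an srcu_struct, and a mutex):

```c
#include <stdlib.h>

/* Simplified model of one user with an open handle on a device. */
struct qaic_user {
	int handle;		/* e.g. the opening process's pid */
	struct qaic_user *next;
};

/* Simplified per-device state: just the list of attached users. */
struct qaic_dev_users {
	struct qaic_user *users;
	int nusers;
};

/* Called on open(): attach a new user to the device. */
struct qaic_user *qaic_user_add(struct qaic_dev_users *qdev, int handle)
{
	struct qaic_user *usr = malloc(sizeof(*usr));

	if (!usr)
		return NULL;
	usr->handle = handle;
	usr->next = qdev->users;	/* push onto the device's list */
	qdev->users = usr;
	qdev->nusers++;
	return usr;
}
```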

Status:
This series introduces the driver for QAIC devices, and builds up the minimum
functionality for running workloads.  Several features which have been omitted
or are still planned are indicated in the future work section.

Before exiting the RFC phase and attempting full acceptance, we wish to
complete two features which are currently under development, as we expect
them to result in userspace interface changes.

The first feature is a variable length control message between the kernel driver
and the device.  This allows us to support the total number of DMA transactions
we require for certain platforms, while minimizing memory usage.  The interface
impact of this would be to allow us to drop the size of the manage buffer
between userspace and the kernel driver from the current 16k, much of which is
wasted.

The second feature is an optimization and extension of the data path interface.
We plan to move the bulk of the data in the qaic_execute structure to the
qaic_mem_req structure, which optimizes our critical path processing.  We also
plan to extend the qaic_execute structure to allow for a batch submit of
multiple buffers as an optimization and convenience for userspace.

Future work:
For simplicity, we have omitted work related to the following features, and
intend to submit them in future series:

-debugfs
-trace points
-hwmon (device telemetry)

We are also investigating what it might mean to support dma_bufs.  We expect
that such support would come as an extension of the interface.

Jeffrey Hugo (8):
  qaic: Add skeleton driver
  qaic: Add and init a basic mhi controller
  qaic: Create char dev
  qaic: Implement control path
  qaic: Implement data path
  qaic: Implement PCI link status error handlers
  qaic: Implement MHI error status handler
  MAINTAINERS: Add entry for QAIC driver

 MAINTAINERS                        |    7 +
 drivers/misc/Kconfig               |    1 +
 drivers/misc/Makefile              |    1 +
 drivers/misc/qaic/Kconfig          |   20 +
 drivers/misc/qaic/Makefile         |   12 +
 drivers/misc/qaic/mhi_controller.c |  538 +++++++++++++++++++
 drivers/misc/qaic/mhi_controller.h |   20 +
 drivers/misc/qaic/qaic.h           |  111 ++++
 drivers/misc/qaic/qaic_control.c   | 1015 ++++++++++++++++++++++++++++++++++++
 drivers/misc/qaic/qaic_data.c      |  952 +++++++++++++++++++++++++++++++++
 drivers/misc/qaic/qaic_drv.c       |  699 +++++++++++++++++++++++++
 include/uapi/misc/qaic.h           |  246 +++++++++
 12 files changed, 3622 insertions(+)
 create mode 100644 drivers/misc/qaic/Kconfig
 create mode 100644 drivers/misc/qaic/Makefile
 create mode 100644 drivers/misc/qaic/mhi_controller.c
 create mode 100644 drivers/misc/qaic/mhi_controller.h
 create mode 100644 drivers/misc/qaic/qaic.h
 create mode 100644 drivers/misc/qaic/qaic_control.c
 create mode 100644 drivers/misc/qaic/qaic_data.c
 create mode 100644 drivers/misc/qaic/qaic_drv.c
 create mode 100644 include/uapi/misc/qaic.h

-- 
Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.


* [RFC PATCH 1/8] qaic: Add skeleton driver
  2020-05-14 14:07 [RFC PATCH 0/8] Qualcomm Cloud AI 100 driver Jeffrey Hugo
@ 2020-05-14 14:07 ` Jeffrey Hugo
  2020-05-15  0:43   ` Jeffrey Hugo
  2020-05-14 14:07 ` [RFC PATCH 2/8] qaic: Add and init a basic mhi controller Jeffrey Hugo
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 39+ messages in thread
From: Jeffrey Hugo @ 2020-05-14 14:07 UTC (permalink / raw)
  To: arnd, gregkh
  Cc: manivannan.sadhasivam, bjorn.andersson, wufan, pratanan,
	linux-arm-msm, linux-kernel, Jeffrey Hugo

Add the Qualcomm Cloud AI 100 skeleton driver, which does nothing but
register a PCI driver, probe the proper device, and support removal of
the module.

Signed-off-by: Jeffrey Hugo <jhugo@codeaurora.org>
---
 drivers/misc/Kconfig         |  1 +
 drivers/misc/Makefile        |  1 +
 drivers/misc/qaic/Kconfig    | 20 ++++++++++++++
 drivers/misc/qaic/Makefile   | 10 +++++++
 drivers/misc/qaic/qaic_drv.c | 62 ++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 94 insertions(+)
 create mode 100644 drivers/misc/qaic/Kconfig
 create mode 100644 drivers/misc/qaic/Makefile
 create mode 100644 drivers/misc/qaic/qaic_drv.c

diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
index 99e1514..ef91d062 100644
--- a/drivers/misc/Kconfig
+++ b/drivers/misc/Kconfig
@@ -481,4 +481,5 @@ source "drivers/misc/ocxl/Kconfig"
 source "drivers/misc/cardreader/Kconfig"
 source "drivers/misc/habanalabs/Kconfig"
 source "drivers/misc/uacce/Kconfig"
+source "drivers/misc/qaic/Kconfig"
 endmenu
diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile
index 9abf292..64f0770 100644
--- a/drivers/misc/Makefile
+++ b/drivers/misc/Makefile
@@ -58,3 +58,4 @@ obj-$(CONFIG_PVPANIC)   	+= pvpanic.o
 obj-$(CONFIG_HABANA_AI)		+= habanalabs/
 obj-$(CONFIG_UACCE)		+= uacce/
 obj-$(CONFIG_XILINX_SDFEC)	+= xilinx_sdfec.o
+obj-$(CONFIG_QAIC)		+= qaic/
diff --git a/drivers/misc/qaic/Kconfig b/drivers/misc/qaic/Kconfig
new file mode 100644
index 0000000..502d176
--- /dev/null
+++ b/drivers/misc/qaic/Kconfig
@@ -0,0 +1,20 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# Qualcomm Cloud AI 100 accelerators driver
+#
+
+config QAIC
+	tristate "Qualcomm Cloud AI 100 accelerators"
+	depends on PCI && HAS_IOMEM
+	depends on MHI_BUS
+	help
+	  Enables driver for Qualcomm's Cloud AI 100 accelerator PCIe cards
+	  that are designed to accelerate Deep Learning inference workloads.
+
+	  The driver manages the PCIe devices and provides an IOCTL interface
+	  for users to submit workloads to the devices.
+
+	  If unsure, say N.
+
+	  To compile this driver as a module, choose M here: the
+	  module will be called qaic.
diff --git a/drivers/misc/qaic/Makefile b/drivers/misc/qaic/Makefile
new file mode 100644
index 0000000..42149ac
--- /dev/null
+++ b/drivers/misc/qaic/Makefile
@@ -0,0 +1,10 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# Makefile for Qualcomm Cloud AI 100 accelerators driver
+#
+
+
+obj-$(CONFIG_QAIC)	:= qaic.o
+
+qaic-y := \
+	qaic_drv.o
diff --git a/drivers/misc/qaic/qaic_drv.c b/drivers/misc/qaic/qaic_drv.c
new file mode 100644
index 0000000..addd9ea
--- /dev/null
+++ b/drivers/misc/qaic/qaic_drv.c
@@ -0,0 +1,62 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+/* Copyright (c) 2019-2020, The Linux Foundation. All rights reserved. */
+
+#include <linux/module.h>
+#include <linux/pci.h>
+#include <linux/pci_ids.h>
+
+#define PCI_DEV_AIC100			0xa100
+
+#define QAIC_NAME			"Qualcomm Cloud AI 100"
+
+static int qaic_pci_probe(struct pci_dev *pdev,
+			  const struct pci_device_id *id)
+{
+	pci_dbg(pdev, "%s\n", __func__);
+	pci_dbg(pdev, "%s: successful init\n", __func__);
+	return 0;
+}
+
+static void qaic_pci_remove(struct pci_dev *pdev)
+{
+	pci_dbg(pdev, "%s\n", __func__);
+}
+
+static const struct pci_device_id ids[] = {
+	{ PCI_DEVICE(PCI_VENDOR_ID_QCOM, PCI_DEV_AIC100), },
+	{ 0, }
+};
+MODULE_DEVICE_TABLE(pci, ids);
+
+static struct pci_driver qaic_pci_driver = {
+	.name = QAIC_NAME,
+	.id_table = ids,
+	.probe = qaic_pci_probe,
+	.remove = qaic_pci_remove,
+};
+
+static int __init qaic_init(void)
+{
+	int ret;
+
+	pr_debug("qaic: init\n");
+
+	ret = pci_register_driver(&qaic_pci_driver);
+
+	return ret;
+}
+
+static void __exit qaic_exit(void)
+{
+	pr_debug("qaic: exit\n");
+	pci_unregister_driver(&qaic_pci_driver);
+}
+
+module_init(qaic_init);
+module_exit(qaic_exit);
+
+MODULE_AUTHOR("Qualcomm Cloud AI 100 Accelerator Kernel Driver Team");
+MODULE_DESCRIPTION("Qualcomm Cloud 100 AI Accelerators Driver");
+MODULE_LICENSE("GPL v2");
+MODULE_VERSION("0.0.0"); /* MAJOR.MINOR.PATCH */
-- 
Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.


* [RFC PATCH 2/8] qaic: Add and init a basic mhi controller
  2020-05-14 14:07 [RFC PATCH 0/8] Qualcomm Cloud AI 100 driver Jeffrey Hugo
  2020-05-14 14:07 ` [RFC PATCH 1/8] qaic: Add skeleton driver Jeffrey Hugo
@ 2020-05-14 14:07 ` Jeffrey Hugo
  2020-05-14 14:07 ` [RFC PATCH 3/8] qaic: Create char dev Jeffrey Hugo
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 39+ messages in thread
From: Jeffrey Hugo @ 2020-05-14 14:07 UTC (permalink / raw)
  To: arnd, gregkh
  Cc: manivannan.sadhasivam, bjorn.andersson, wufan, pratanan,
	linux-arm-msm, linux-kernel, Jeffrey Hugo

An MHI controller is the "driver" for a specific MHI device within the MHI
bus framework.  Add a basic controller for the AIC100 device, and
supporting code to init the controller.  This will enable the PCIe device,
init the MHI hardware, bring it to ready state, and use BHI to load the
SBL image.

Signed-off-by: Jeffrey Hugo <jhugo@codeaurora.org>
---
 drivers/misc/qaic/Makefile         |   4 +-
 drivers/misc/qaic/mhi_controller.c | 495 +++++++++++++++++++++++++++++++++++++
 drivers/misc/qaic/mhi_controller.h |  14 ++
 drivers/misc/qaic/qaic.h           |  18 ++
 drivers/misc/qaic/qaic_drv.c       | 110 +++++++++
 5 files changed, 639 insertions(+), 2 deletions(-)
 create mode 100644 drivers/misc/qaic/mhi_controller.c
 create mode 100644 drivers/misc/qaic/mhi_controller.h
 create mode 100644 drivers/misc/qaic/qaic.h

diff --git a/drivers/misc/qaic/Makefile b/drivers/misc/qaic/Makefile
index 42149ac..b5fd819 100644
--- a/drivers/misc/qaic/Makefile
+++ b/drivers/misc/qaic/Makefile
@@ -3,8 +3,8 @@
 # Makefile for Qualcomm Cloud AI 100 accelerators driver
 #
 
-
 obj-$(CONFIG_QAIC)	:= qaic.o
 
 qaic-y := \
-	qaic_drv.o
+	qaic_drv.o \
+	mhi_controller.o
diff --git a/drivers/misc/qaic/mhi_controller.c b/drivers/misc/qaic/mhi_controller.c
new file mode 100644
index 0000000..ba4808c
--- /dev/null
+++ b/drivers/misc/qaic/mhi_controller.c
@@ -0,0 +1,495 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+/* Copyright (c) 2019-2020, The Linux Foundation. All rights reserved. */
+
+#include <linux/err.h>
+#include <linux/mhi.h>
+#include <linux/moduleparam.h>
+#include <linux/pci.h>
+
+static unsigned int mhi_timeout = 20000; /* 20 sec default */
+module_param(mhi_timeout, uint, 0600);
+
+static struct mhi_channel_config aic100_channels[] = {
+	{
+		.name = "QAIC_LOOPBACK",
+		.num = 0,
+		.num_elements = 32,
+		.local_elements = 0,
+		.event_ring = 0,
+		.dir = DMA_TO_DEVICE,
+		.type = MHI_CH_TYPE_INBOUND,
+		.ee_mask = MHI_CH_EE_AMSS,
+		.pollcfg = 0,
+		.doorbell = MHI_DB_BRST_DISABLE,
+		.lpm_notify = false,
+		.offload_channel = false,
+		.doorbell_mode_switch = false,
+		.auto_queue = false,
+		.auto_start = false,
+		.wake_capable = false,
+	},
+	{
+		.name = "QAIC_LOOPBACK",
+		.num = 1,
+		.num_elements = 32,
+		.local_elements = 0,
+		.event_ring = 0,
+		.dir = DMA_FROM_DEVICE,
+		.type = MHI_CH_TYPE_OUTBOUND,
+		.ee_mask = MHI_CH_EE_AMSS,
+		.pollcfg = 0,
+		.doorbell = MHI_DB_BRST_DISABLE,
+		.lpm_notify = false,
+		.offload_channel = false,
+		.doorbell_mode_switch = false,
+		.auto_queue = false,
+		.auto_start = false,
+		.wake_capable = false,
+	},
+	{
+		.name = "QAIC_SAHARA",
+		.num = 2,
+		.num_elements = 32,
+		.local_elements = 0,
+		.event_ring = 0,
+		.dir = DMA_TO_DEVICE,
+		.type = MHI_CH_TYPE_INBOUND,
+		.ee_mask = MHI_CH_EE_SBL,
+		.pollcfg = 0,
+		.doorbell = MHI_DB_BRST_DISABLE,
+		.lpm_notify = false,
+		.offload_channel = false,
+		.doorbell_mode_switch = false,
+		.auto_queue = false,
+		.auto_start = false,
+		.wake_capable = false,
+	},
+	{
+		.name = "QAIC_SAHARA",
+		.num = 3,
+		.num_elements = 32,
+		.local_elements = 0,
+		.event_ring = 0,
+		.dir = DMA_FROM_DEVICE,
+		.type = MHI_CH_TYPE_OUTBOUND,
+		.ee_mask = MHI_CH_EE_SBL,
+		.pollcfg = 0,
+		.doorbell = MHI_DB_BRST_DISABLE,
+		.lpm_notify = false,
+		.offload_channel = false,
+		.doorbell_mode_switch = false,
+		.auto_queue = false,
+		.auto_start = false,
+		.wake_capable = false,
+	},
+	{
+		.name = "QAIC_DIAG",
+		.num = 4,
+		.num_elements = 32,
+		.local_elements = 0,
+		.event_ring = 0,
+		.dir = DMA_TO_DEVICE,
+		.type = MHI_CH_TYPE_INBOUND,
+		.ee_mask = MHI_CH_EE_AMSS,
+		.pollcfg = 0,
+		.doorbell = MHI_DB_BRST_DISABLE,
+		.lpm_notify = false,
+		.offload_channel = false,
+		.doorbell_mode_switch = false,
+		.auto_queue = false,
+		.auto_start = false,
+		.wake_capable = false,
+	},
+	{
+		.name = "QAIC_DIAG",
+		.num = 5,
+		.num_elements = 32,
+		.local_elements = 0,
+		.event_ring = 0,
+		.dir = DMA_FROM_DEVICE,
+		.type = MHI_CH_TYPE_OUTBOUND,
+		.ee_mask = MHI_CH_EE_AMSS,
+		.pollcfg = 0,
+		.doorbell = MHI_DB_BRST_DISABLE,
+		.lpm_notify = false,
+		.offload_channel = false,
+		.doorbell_mode_switch = false,
+		.auto_queue = false,
+		.auto_start = false,
+		.wake_capable = false,
+	},
+	{
+		.name = "QAIC_SSR",
+		.num = 6,
+		.num_elements = 32,
+		.local_elements = 0,
+		.event_ring = 0,
+		.dir = DMA_TO_DEVICE,
+		.type = MHI_CH_TYPE_INBOUND,
+		.ee_mask = MHI_CH_EE_AMSS,
+		.pollcfg = 0,
+		.doorbell = MHI_DB_BRST_DISABLE,
+		.lpm_notify = false,
+		.offload_channel = false,
+		.doorbell_mode_switch = false,
+		.auto_queue = false,
+		.auto_start = false,
+		.wake_capable = false,
+	},
+	{
+		.name = "QAIC_SSR",
+		.num = 7,
+		.num_elements = 32,
+		.local_elements = 0,
+		.event_ring = 0,
+		.dir = DMA_FROM_DEVICE,
+		.type = MHI_CH_TYPE_OUTBOUND,
+		.ee_mask = MHI_CH_EE_AMSS,
+		.pollcfg = 0,
+		.doorbell = MHI_DB_BRST_DISABLE,
+		.lpm_notify = false,
+		.offload_channel = false,
+		.doorbell_mode_switch = false,
+		.auto_queue = false,
+		.auto_start = false,
+		.wake_capable = false,
+	},
+	{
+		.name = "QAIC_QDSS",
+		.num = 8,
+		.num_elements = 32,
+		.local_elements = 0,
+		.event_ring = 0,
+		.dir = DMA_TO_DEVICE,
+		.type = MHI_CH_TYPE_INBOUND,
+		.ee_mask = MHI_CH_EE_AMSS,
+		.pollcfg = 0,
+		.doorbell = MHI_DB_BRST_DISABLE,
+		.lpm_notify = false,
+		.offload_channel = false,
+		.doorbell_mode_switch = false,
+		.auto_queue = false,
+		.auto_start = false,
+		.wake_capable = false,
+	},
+	{
+		.name = "QAIC_QDSS",
+		.num = 9,
+		.num_elements = 32,
+		.local_elements = 0,
+		.event_ring = 0,
+		.dir = DMA_FROM_DEVICE,
+		.type = MHI_CH_TYPE_OUTBOUND,
+		.ee_mask = MHI_CH_EE_AMSS,
+		.pollcfg = 0,
+		.doorbell = MHI_DB_BRST_DISABLE,
+		.lpm_notify = false,
+		.offload_channel = false,
+		.doorbell_mode_switch = false,
+		.auto_queue = false,
+		.auto_start = false,
+		.wake_capable = false,
+	},
+	{
+		.name = "QAIC_CONTROL",
+		.num = 10,
+		.num_elements = 32,
+		.local_elements = 0,
+		.event_ring = 0,
+		.dir = DMA_TO_DEVICE,
+		.type = MHI_CH_TYPE_INBOUND,
+		.ee_mask = MHI_CH_EE_AMSS,
+		.pollcfg = 0,
+		.doorbell = MHI_DB_BRST_DISABLE,
+		.lpm_notify = false,
+		.offload_channel = false,
+		.doorbell_mode_switch = false,
+		.auto_queue = false,
+		.auto_start = false,
+		.wake_capable = false,
+	},
+	{
+		.name = "QAIC_CONTROL",
+		.num = 11,
+		.num_elements = 32,
+		.local_elements = 0,
+		.event_ring = 0,
+		.dir = DMA_FROM_DEVICE,
+		.type = MHI_CH_TYPE_OUTBOUND,
+		.ee_mask = MHI_CH_EE_AMSS,
+		.pollcfg = 0,
+		.doorbell = MHI_DB_BRST_DISABLE,
+		.lpm_notify = false,
+		.offload_channel = false,
+		.doorbell_mode_switch = false,
+		.auto_queue = false,
+		.auto_start = false,
+		.wake_capable = false,
+	},
+	{
+		.name = "QAIC_LOGGING",
+		.num = 12,
+		.num_elements = 32,
+		.local_elements = 0,
+		.event_ring = 0,
+		.dir = DMA_TO_DEVICE,
+		.type = MHI_CH_TYPE_INBOUND,
+		.ee_mask = MHI_CH_EE_AMSS,
+		.pollcfg = 0,
+		.doorbell = MHI_DB_BRST_DISABLE,
+		.lpm_notify = false,
+		.offload_channel = false,
+		.doorbell_mode_switch = false,
+		.auto_queue = false,
+		.auto_start = false,
+		.wake_capable = false,
+	},
+	{
+		.name = "QAIC_LOGGING",
+		.num = 13,
+		.num_elements = 32,
+		.local_elements = 0,
+		.event_ring = 0,
+		.dir = DMA_FROM_DEVICE,
+		.type = MHI_CH_TYPE_OUTBOUND,
+		.ee_mask = MHI_CH_EE_AMSS,
+		.pollcfg = 0,
+		.doorbell = MHI_DB_BRST_DISABLE,
+		.lpm_notify = false,
+		.offload_channel = false,
+		.doorbell_mode_switch = false,
+		.auto_queue = false,
+		.auto_start = false,
+		.wake_capable = false,
+	},
+	{
+		.name = "QAIC_STATUS",
+		.num = 14,
+		.num_elements = 32,
+		.local_elements = 0,
+		.event_ring = 0,
+		.dir = DMA_TO_DEVICE,
+		.type = MHI_CH_TYPE_INBOUND,
+		.ee_mask = MHI_CH_EE_AMSS,
+		.pollcfg = 0,
+		.doorbell = MHI_DB_BRST_DISABLE,
+		.lpm_notify = false,
+		.offload_channel = false,
+		.doorbell_mode_switch = false,
+		.auto_queue = false,
+		.auto_start = false,
+		.wake_capable = false,
+	},
+	{
+		.name = "QAIC_STATUS",
+		.num = 15,
+		.num_elements = 32,
+		.local_elements = 0,
+		.event_ring = 0,
+		.dir = DMA_FROM_DEVICE,
+		.type = MHI_CH_TYPE_OUTBOUND,
+		.ee_mask = MHI_CH_EE_AMSS,
+		.pollcfg = 0,
+		.doorbell = MHI_DB_BRST_DISABLE,
+		.lpm_notify = false,
+		.offload_channel = false,
+		.doorbell_mode_switch = false,
+		.auto_queue = false,
+		.auto_start = false,
+		.wake_capable = false,
+	},
+	{
+		.name = "QAIC_TELEMETRY",
+		.num = 16,
+		.num_elements = 32,
+		.local_elements = 0,
+		.event_ring = 0,
+		.dir = DMA_TO_DEVICE,
+		.type = MHI_CH_TYPE_INBOUND,
+		.ee_mask = MHI_CH_EE_AMSS,
+		.pollcfg = 0,
+		.doorbell = MHI_DB_BRST_DISABLE,
+		.lpm_notify = false,
+		.offload_channel = false,
+		.doorbell_mode_switch = false,
+		.auto_queue = false,
+		.auto_start = false,
+		.wake_capable = false,
+	},
+	{
+		.name = "QAIC_TELEMETRY",
+		.num = 17,
+		.num_elements = 32,
+		.local_elements = 0,
+		.event_ring = 0,
+		.dir = DMA_FROM_DEVICE,
+		.type = MHI_CH_TYPE_OUTBOUND,
+		.ee_mask = MHI_CH_EE_AMSS,
+		.pollcfg = 0,
+		.doorbell = MHI_DB_BRST_DISABLE,
+		.lpm_notify = false,
+		.offload_channel = false,
+		.doorbell_mode_switch = false,
+		.auto_queue = false,
+		.auto_start = false,
+		.wake_capable = false,
+	},
+	{
+		.name = "QAIC_DEBUG",
+		.num = 18,
+		.num_elements = 32,
+		.local_elements = 0,
+		.event_ring = 0,
+		.dir = DMA_TO_DEVICE,
+		.type = MHI_CH_TYPE_INBOUND,
+		.ee_mask = MHI_CH_EE_AMSS,
+		.pollcfg = 0,
+		.doorbell = MHI_DB_BRST_DISABLE,
+		.lpm_notify = false,
+		.offload_channel = false,
+		.doorbell_mode_switch = false,
+		.auto_queue = false,
+		.auto_start = false,
+		.wake_capable = false,
+	},
+	{
+		.name = "QAIC_DEBUG",
+		.num = 19,
+		.num_elements = 32,
+		.local_elements = 0,
+		.event_ring = 0,
+		.dir = DMA_FROM_DEVICE,
+		.type = MHI_CH_TYPE_OUTBOUND,
+		.ee_mask = MHI_CH_EE_AMSS,
+		.pollcfg = 0,
+		.doorbell = MHI_DB_BRST_DISABLE,
+		.lpm_notify = false,
+		.offload_channel = false,
+		.doorbell_mode_switch = false,
+		.auto_queue = false,
+		.auto_start = false,
+		.wake_capable = false,
+	},
+};
+
+static struct mhi_event_config aic100_events[] = {
+	{
+		.num_elements = 32,
+		.irq_moderation_ms = 0,
+		.irq = 0,
+		.channel = 0,
+		.priority = 1,
+		.mode = MHI_DB_BRST_DISABLE,
+		.data_type = MHI_ER_CTRL,
+		.hardware_event = false,
+		.client_managed = false,
+		.offload_channel = false,
+	},
+};
+
+static struct mhi_controller_config aic100_config = {
+	.max_channels = 128,
+	.timeout_ms = 0, /* controlled by mhi_timeout */
+	.buf_len = 0,
+	.num_channels = ARRAY_SIZE(aic100_channels),
+	.ch_cfg = aic100_channels,
+	.num_events = ARRAY_SIZE(aic100_events),
+	.event_cfg = aic100_events,
+	.use_bounce_buf = false,
+	.m2_no_db = false,
+};
+
+static int mhi_link_status(struct mhi_controller *mhi_cntl)
+{
+	struct pci_dev *pci_dev = to_pci_dev(mhi_cntl->cntrl_dev);
+	u16 dev_id;
+	int ret;
+
+	/* try reading device id, if dev id don't match, link is down */
+	ret = pci_read_config_word(pci_dev, PCI_DEVICE_ID, &dev_id);
+
+	return (ret || dev_id != pci_dev->device) ? -EIO : 0;
+}
+
+static int mhi_runtime_get(struct mhi_controller *mhi_cntl)
+{
+	return 0;
+}
+
+static void mhi_runtime_put(struct mhi_controller *mhi_cntl)
+{
+}
+
+static void mhi_status_cb(struct mhi_controller *mhi_cntl,
+			  enum mhi_callback reason)
+{
+}
+
+struct mhi_controller *qaic_mhi_register_controller(struct pci_dev *pci_dev,
+						    void *mhi_bar,
+						    int mhi_irq)
+{
+	struct mhi_controller *mhi_cntl;
+	int ret;
+
+	pci_dbg(pci_dev, "%s\n", __func__);
+
+	mhi_cntl = kzalloc(sizeof(*mhi_cntl), GFP_KERNEL);
+	if (!mhi_cntl)
+		return ERR_PTR(-ENOMEM);
+
+	mhi_cntl->cntrl_dev = &pci_dev->dev;
+
+	/*
+	 * Covers the entire possible physical ram region.  Remote side is
+	 * going to calculate a size of this range, so subtract 1 to prevent
+	 * rollover.
+	 */
+	mhi_cntl->iova_start = 0;
+	mhi_cntl->iova_stop = U64_MAX - 1;
+
+	mhi_cntl->status_cb = mhi_status_cb;
+	mhi_cntl->runtime_get = mhi_runtime_get;
+	mhi_cntl->runtime_put = mhi_runtime_put;
+	mhi_cntl->link_status = mhi_link_status;
+	mhi_cntl->regs = mhi_bar;
+	mhi_cntl->nr_irqs = 1;
+	mhi_cntl->irq = kmalloc(sizeof(*mhi_cntl->irq), GFP_KERNEL);
+
+	if (!mhi_cntl->irq)
+		return ERR_PTR(-ENOMEM);
+
+	mhi_cntl->irq[0] = mhi_irq;
+
+	mhi_cntl->fw_image = "qcom/aic100/sbl.bin";
+
+	/* use latest configured timeout */
+	aic100_config.timeout_ms = mhi_timeout;
+	ret = mhi_register_controller(mhi_cntl, &aic100_config);
+	if (ret) {
+		pci_err(pci_dev, "register_mhi_controller failed %d\n", ret);
+		kfree(mhi_cntl->irq);
+		kfree(mhi_cntl);
+		return ERR_PTR(ret);
+	}
+
+	ret = mhi_async_power_up(mhi_cntl);
+	if (ret) {
+		pci_err(pci_dev, "mhi_async_power_up failed %d\n", ret);
+		mhi_unregister_controller(mhi_cntl);
+		kfree(mhi_cntl->irq);
+		kfree(mhi_cntl);
+		return ERR_PTR(ret);
+	}
+
+	return mhi_cntl;
+}
+
+void qaic_mhi_free_controller(struct mhi_controller *mhi_cntl, bool link_up)
+{
+	mhi_power_down(mhi_cntl, link_up);
+	mhi_unregister_controller(mhi_cntl);
+	kfree(mhi_cntl->irq);
+	kfree(mhi_cntl);
+}
diff --git a/drivers/misc/qaic/mhi_controller.h b/drivers/misc/qaic/mhi_controller.h
new file mode 100644
index 0000000..c81725e
--- /dev/null
+++ b/drivers/misc/qaic/mhi_controller.h
@@ -0,0 +1,14 @@
+/* SPDX-License-Identifier: GPL-2.0-only
+ *
+ * Copyright (c) 2019-2020, The Linux Foundation. All rights reserved.
+ */
+
+#ifndef MHICONTROLLERQAIC_H_
+#define MHICONTROLLERQAIC_H_
+
+struct mhi_controller *qaic_mhi_register_controller(struct pci_dev *pci_dev,
+						    void *mhi_bar,
+						    int mhi_irq);
+
+void qaic_mhi_free_controller(struct mhi_controller *mhi_cntl, bool link_up);
+#endif /* MHICONTROLLERQAIC_H_ */
diff --git a/drivers/misc/qaic/qaic.h b/drivers/misc/qaic/qaic.h
new file mode 100644
index 0000000..379aa82
--- /dev/null
+++ b/drivers/misc/qaic/qaic.h
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: GPL-2.0-only
+ *
+ * Copyright (c) 2019-2020, The Linux Foundation. All rights reserved.
+ */
+
+#ifndef QAICINTERNAL_H_
+#define QAICINTERNAL_H_
+
+#include <linux/mhi.h>
+#include <linux/pci.h>
+
+struct qaic_device {
+	struct pci_dev		*pdev;
+	int			bars;
+	void __iomem		*bar_0;
+	struct mhi_controller	*mhi_cntl;
+};
+#endif /* QAICINTERNAL_H_ */
diff --git a/drivers/misc/qaic/qaic_drv.c b/drivers/misc/qaic/qaic_drv.c
index addd9ea..b624daa 100644
--- a/drivers/misc/qaic/qaic_drv.c
+++ b/drivers/misc/qaic/qaic_drv.c
@@ -3,24 +3,131 @@
 /* Copyright (c) 2019-2020, The Linux Foundation. All rights reserved. */
 
 #include <linux/module.h>
+#include <linux/msi.h>
 #include <linux/pci.h>
 #include <linux/pci_ids.h>
 
+#include "mhi_controller.h"
+#include "qaic.h"
+
 #define PCI_DEV_AIC100			0xa100
 
 #define QAIC_NAME			"Qualcomm Cloud AI 100"
 
+static bool link_up;
+
 static int qaic_pci_probe(struct pci_dev *pdev,
 			  const struct pci_device_id *id)
 {
+	int ret;
+	int mhi_irq;
+	struct qaic_device *qdev;
+
 	pci_dbg(pdev, "%s\n", __func__);
+
+	qdev = kzalloc(sizeof(*qdev), GFP_KERNEL);
+	if (!qdev) {
+		ret = -ENOMEM;
+		goto qdev_fail;
+	}
+
+	pci_set_drvdata(pdev, qdev);
+	qdev->pdev = pdev;
+
+	qdev->bars = pci_select_bars(pdev, IORESOURCE_MEM);
+
+	/* make sure the device has the expected BARs */
+	if (qdev->bars != (BIT(0) | BIT(2) | BIT(4))) {
+		pci_err(pdev, "%s: expected BARs 0, 2, and 4 not found in device.  Found 0x%x\n", __func__, qdev->bars);
+		ret = -EINVAL;
+		goto bar_fail;
+	}
+
+	ret = pci_enable_device(pdev);
+	if (ret)
+		goto enable_fail;
+
+	ret = pci_request_selected_regions(pdev, qdev->bars, "aic100");
+	if (ret)
+		goto request_regions_fail;
+
+	pci_set_master(pdev);
+
+	ret = pci_set_dma_mask(pdev, DMA_BIT_MASK(64));
+	if (ret)
+		goto dma_mask_fail;
+	ret = pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(64));
+	if (ret)
+		goto dma_mask_fail;
+
+	qdev->bar_0 = pci_ioremap_bar(pdev, 0);
+	if (!qdev->bar_0) {
+		ret = -ENOMEM;
+		goto ioremap_0_fail;
+	}
+
+	ret = pci_alloc_irq_vectors(pdev, 1, 32, PCI_IRQ_MSI);
+	if (ret < 0)
+		goto alloc_irq_fail;
+
+	if (ret < 32) {
+		pci_err(pdev, "%s: Requested 32 MSIs.  Obtained %d MSIs which is less than the 32 required.\n", __func__, ret);
+		ret = -ENODEV;
+		goto invalid_msi_config;
+	}
+
+	mhi_irq = pci_irq_vector(pdev, 0);
+	if (mhi_irq < 0) {
+		ret = mhi_irq;
+		goto get_mhi_irq_fail;
+	}
+
+	qdev->mhi_cntl = qaic_mhi_register_controller(pdev, qdev->bar_0,
+						      mhi_irq);
+	if (IS_ERR(qdev->mhi_cntl)) {
+		ret = PTR_ERR(qdev->mhi_cntl);
+		goto mhi_register_fail;
+	}
+
 	pci_dbg(pdev, "%s: successful init\n", __func__);
 	return 0;
+
+mhi_register_fail:
+get_mhi_irq_fail:
+invalid_msi_config:
+	pci_free_irq_vectors(pdev);
+alloc_irq_fail:
+	iounmap(qdev->bar_0);
+ioremap_0_fail:
+dma_mask_fail:
+	pci_clear_master(pdev);
+	pci_release_selected_regions(pdev, qdev->bars);
+request_regions_fail:
+	pci_disable_device(pdev);
+enable_fail:
+	pci_set_drvdata(pdev, NULL);
+bar_fail:
+	kfree(qdev);
+qdev_fail:
+	return ret;
 }
 
 static void qaic_pci_remove(struct pci_dev *pdev)
 {
+	struct qaic_device *qdev = pci_get_drvdata(pdev);
+
 	pci_dbg(pdev, "%s\n", __func__);
+	if (!qdev)
+		return;
+
+	qaic_mhi_free_controller(qdev->mhi_cntl, link_up);
+	pci_free_irq_vectors(pdev);
+	iounmap(qdev->bar_0);
+	pci_clear_master(pdev);
+	pci_release_selected_regions(pdev, qdev->bars);
+	pci_disable_device(pdev);
+	pci_set_drvdata(pdev, NULL);
+	kfree(qdev);
 }
 
 static const struct pci_device_id ids[] = {
@@ -44,12 +151,15 @@ static int __init qaic_init(void)
 
 	ret = pci_register_driver(&qaic_pci_driver);
 
+	pr_debug("qaic: init success\n");
+
 	return ret;
 }
 
 static void __exit qaic_exit(void)
 {
 	pr_debug("qaic: exit\n");
+	link_up = true;
 	pci_unregister_driver(&qaic_pci_driver);
 }
 
-- 
Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.


* [RFC PATCH 3/8] qaic: Create char dev
  2020-05-14 14:07 [RFC PATCH 0/8] Qualcomm Cloud AI 100 driver Jeffrey Hugo
  2020-05-14 14:07 ` [RFC PATCH 1/8] qaic: Add skeleton driver Jeffrey Hugo
  2020-05-14 14:07 ` [RFC PATCH 2/8] qaic: Add and init a basic mhi controller Jeffrey Hugo
@ 2020-05-14 14:07 ` Jeffrey Hugo
  2020-05-14 14:12   ` Greg KH
  2020-05-14 14:07 ` [RFC PATCH 4/8] qaic: Implement control path Jeffrey Hugo
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 39+ messages in thread
From: Jeffrey Hugo @ 2020-05-14 14:07 UTC (permalink / raw)
  To: arnd, gregkh
  Cc: manivannan.sadhasivam, bjorn.andersson, wufan, pratanan,
	linux-arm-msm, linux-kernel, Jeffrey Hugo

Now that we can fully boot the device, we should start making it usable.
The primary interface to the device will be via a char dev.  Add the
necessary framework to detect when the device is fully booted, and create
the char dev at that point.  The device is only usable when it is fully
booted.  The char dev does nothing useful yet, but we can easily build on
this to provide functionality.

Signed-off-by: Jeffrey Hugo <jhugo@codeaurora.org>
---
 drivers/misc/qaic/qaic.h     |  19 +++
 drivers/misc/qaic/qaic_drv.c | 303 ++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 321 insertions(+), 1 deletion(-)

diff --git a/drivers/misc/qaic/qaic.h b/drivers/misc/qaic/qaic.h
index 379aa82..58ca167 100644
--- a/drivers/misc/qaic/qaic.h
+++ b/drivers/misc/qaic/qaic.h
@@ -6,13 +6,32 @@
 #ifndef QAICINTERNAL_H_
 #define QAICINTERNAL_H_
 
+#include <linux/cdev.h>
+#include <linux/kref.h>
 #include <linux/mhi.h>
+#include <linux/mutex.h>
 #include <linux/pci.h>
+#include <linux/srcu.h>
+
+struct qaic_user {
+	pid_t			handle;
+	struct qaic_device	*qdev;
+	struct list_head	node;
+	struct srcu_struct	qdev_lock;
+	struct kref		ref_count;
+};
 
 struct qaic_device {
 	struct pci_dev		*pdev;
 	int			bars;
 	void __iomem		*bar_0;
 	struct mhi_controller	*mhi_cntl;
+	struct mhi_device	*cntl_ch;
+	struct cdev		*cdev;
+	struct device		*dev;
+	bool			in_reset;
+	struct srcu_struct	dev_lock;
+	struct list_head	users;
+	struct mutex		users_mutex;
 };
 #endif /* QAICINTERNAL_H_ */
diff --git a/drivers/misc/qaic/qaic_drv.c b/drivers/misc/qaic/qaic_drv.c
index b624daa..6e4b936 100644
--- a/drivers/misc/qaic/qaic_drv.c
+++ b/drivers/misc/qaic/qaic_drv.c
@@ -2,8 +2,14 @@
 
 /* Copyright (c) 2019-2020, The Linux Foundation. All rights reserved. */
 
+#include <linux/cdev.h>
+#include <linux/idr.h>
+#include <linux/list.h>
+#include <linux/kref.h>
+#include <linux/mhi.h>
 #include <linux/module.h>
 #include <linux/msi.h>
+#include <linux/mutex.h>
 #include <linux/pci.h>
 #include <linux/pci_ids.h>
 
@@ -13,9 +19,242 @@
 #define PCI_DEV_AIC100			0xa100
 
 #define QAIC_NAME			"Qualcomm Cloud AI 100"
+#define QAIC_MAX_MINORS			256
 
+static int qaic_major;
+static struct class *qaic_class;
+static DEFINE_IDR(qaic_devs);
+static DEFINE_MUTEX(qaic_devs_lock);
 static bool link_up;
 
+static int qaic_device_open(struct inode *inode, struct file *filp);
+static int qaic_device_release(struct inode *inode, struct file *filp);
+
+static const struct file_operations qaic_ops = {
+	.owner = THIS_MODULE,
+	.open = qaic_device_open,
+	.release = qaic_device_release,
+};
+
+static void free_usr(struct kref *kref)
+{
+	struct qaic_user *usr = container_of(kref, struct qaic_user, ref_count);
+
+	list_del(&usr->node);
+	cleanup_srcu_struct(&usr->qdev_lock);
+	kfree(usr);
+}
+
+static int qaic_device_open(struct inode *inode, struct file *filp)
+{
+	struct qaic_device *qdev;
+	struct qaic_user *usr;
+	int rcu_id;
+	int ret;
+
+	ret = mutex_lock_interruptible(&qaic_devs_lock);
+	if (ret)
+		return ret;
+	qdev = idr_find(&qaic_devs, iminor(inode));
+	mutex_unlock(&qaic_devs_lock);
+
+	pci_dbg(qdev->pdev, "%s pid:%d\n", __func__, current->pid);
+
+	rcu_id = srcu_read_lock(&qdev->dev_lock);
+	if (qdev->in_reset) {
+		srcu_read_unlock(&qdev->dev_lock, rcu_id);
+		return -ENODEV;
+	}
+
+	usr = kmalloc(sizeof(*usr), GFP_KERNEL);
+	if (!usr)
+		return -ENOMEM;
+
+	usr->handle = current->pid;
+	usr->qdev = qdev;
+	init_srcu_struct(&usr->qdev_lock);
+	kref_init(&usr->ref_count);
+
+	ret = mutex_lock_interruptible(&qdev->users_mutex);
+	if (ret) {
+		cleanup_srcu_struct(&usr->qdev_lock);
+		kfree(usr);
+		srcu_read_unlock(&qdev->dev_lock, rcu_id);
+		return ret;
+	}
+
+	list_add(&usr->node, &qdev->users);
+	mutex_unlock(&qdev->users_mutex);
+
+	filp->private_data = usr;
+	nonseekable_open(inode, filp);
+
+	srcu_read_unlock(&qdev->dev_lock, rcu_id);
+	return 0;
+}
+
+static int qaic_device_release(struct inode *inode, struct file *filp)
+{
+	struct qaic_user *usr = filp->private_data;
+	struct qaic_device *qdev = usr->qdev;
+	int qdev_rcu_id;
+	int usr_rcu_id;
+
+	usr_rcu_id = srcu_read_lock(&usr->qdev_lock);
+	if (qdev) {
+		qdev_rcu_id = srcu_read_lock(&qdev->dev_lock);
+		if (!qdev->in_reset) {
+			pci_dbg(qdev->pdev, "%s pid:%d\n", __func__,
+								current->pid);
+		}
+		srcu_read_unlock(&qdev->dev_lock, qdev_rcu_id);
+
+		srcu_read_unlock(&usr->qdev_lock, usr_rcu_id);
+		mutex_lock(&qdev->users_mutex);
+		kref_put(&usr->ref_count, free_usr);
+		mutex_unlock(&qdev->users_mutex);
+	} else {
+		srcu_read_unlock(&usr->qdev_lock, usr_rcu_id);
+		/* safe to do without the mutex because reset already has ref */
+		kref_put(&usr->ref_count, free_usr);
+	}
+
+	filp->private_data = NULL;
+	return 0;
+}
+
+static int qaic_mhi_probe(struct mhi_device *mhi_dev,
+			  const struct mhi_device_id *id)
+{
+	struct qaic_device *qdev;
+	dev_t devno;
+	int ret;
+
+	/*
+	 * Invoking this function indicates that the control channel to the
+	 * device is available.  We use that as a signal that the device side
+	 * firmware has booted.  The device side firmware manages the device
+	 * resources, so we need to communicate with it via the control channel
+	 * in order to utilize the device.  Therefore we wait for this signal
+	 * before creating the char dev that userspace will use to control the
+	 * device, because without the device side firmware, userspace can't
+	 * do anything useful.
+	 */
+
+	qdev = (struct qaic_device *)pci_get_drvdata(
+				to_pci_dev(mhi_dev->mhi_cntrl->cntrl_dev));
+
+	pci_dbg(qdev->pdev, "%s\n", __func__);
+	qdev->in_reset = false;
+
+	dev_set_drvdata(&mhi_dev->dev, qdev);
+	qdev->cntl_ch = mhi_dev;
+
+	mutex_lock(&qaic_devs_lock);
+	ret = idr_alloc(&qaic_devs, qdev, 0, QAIC_MAX_MINORS, GFP_KERNEL);
+	mutex_unlock(&qaic_devs_lock);
+
+	if (ret < 0) {
+		pci_dbg(qdev->pdev, "%s: idr_alloc failed %d\n", __func__, ret);
+		goto err;
+	}
+
+	devno = MKDEV(qaic_major, ret);
+
+	qdev->cdev = cdev_alloc();
+	if (!qdev->cdev) {
+		pci_dbg(qdev->pdev, "%s: cdev_alloc failed\n", __func__);
+		ret = -ENOMEM;
+		goto free_idr;
+	}
+
+	qdev->cdev->owner = THIS_MODULE;
+	qdev->cdev->ops = &qaic_ops;
+	ret = cdev_add(qdev->cdev, devno, 1);
+	if (ret) {
+		pci_dbg(qdev->pdev, "%s: cdev_add failed %d\n", __func__, ret);
+		goto free_cdev;
+	}
+
+	qdev->dev = device_create(qaic_class, NULL, devno, NULL,
+				  "qaic_aic100_%04x:%02x:%02x.%d",
+				  pci_domain_nr(qdev->pdev->bus),
+				  qdev->pdev->bus->number,
+				  PCI_SLOT(qdev->pdev->devfn),
+				  PCI_FUNC(qdev->pdev->devfn));
+	if (IS_ERR(qdev->dev)) {
+		ret = PTR_ERR(qdev->dev);
+		pci_dbg(qdev->pdev, "%s: device_create failed %d\n", __func__, ret);
+		goto free_cdev;
+	}
+
+	dev_set_drvdata(qdev->dev, qdev);
+
+	return 0;
+
+free_cdev:
+	cdev_del(qdev->cdev);
+free_idr:
+	mutex_lock(&qaic_devs_lock);
+	idr_remove(&qaic_devs, MINOR(devno));
+	mutex_unlock(&qaic_devs_lock);
+err:
+	return ret;
+}
+
+static void qaic_mhi_remove(struct mhi_device *mhi_dev)
+{
+}
+
+static void qaic_mhi_ul_xfer_cb(struct mhi_device *mhi_dev,
+				struct mhi_result *mhi_result)
+{
+}
+
+static void qaic_mhi_dl_xfer_cb(struct mhi_device *mhi_dev,
+				struct mhi_result *mhi_result)
+{
+}
+
+void qaic_dev_reset_clean_local_state(struct qaic_device *qdev)
+{
+	struct qaic_user *usr;
+	struct qaic_user *u;
+	dev_t devno;
+
+	qdev->in_reset = true;
+	synchronize_srcu(&qdev->dev_lock);
+
+	/*
+	 * While the usr still has access to the qdev, use the mutex to add
+	 * a reference for later.  This makes sure the usr can't disappear on
+	 * us at the wrong time.  The mutex use in close() system call handling
+	 * makes sure the usr will either be valid or completely gone here.
+	 */
+	mutex_lock(&qdev->users_mutex);
+	list_for_each_entry_safe(usr, u, &qdev->users, node)
+		kref_get(&usr->ref_count);
+	mutex_unlock(&qdev->users_mutex);
+
+	/* remove chardev to prevent new users from coming in */
+	if (qdev->dev) {
+		devno = qdev->dev->devt;
+		qdev->dev = NULL;
+		device_destroy(qaic_class, devno);
+		cdev_del(qdev->cdev);
+		mutex_lock(&qaic_devs_lock);
+		idr_remove(&qaic_devs, MINOR(devno));
+		mutex_unlock(&qaic_devs_lock);
+	}
+
+	/* make existing users get unresolvable errors until they close FDs */
+	list_for_each_entry_safe(usr, u, &qdev->users, node) {
+		usr->qdev = NULL;
+		synchronize_srcu(&usr->qdev_lock);
+		kref_put(&usr->ref_count, free_usr);
+	}
+}
+
 static int qaic_pci_probe(struct pci_dev *pdev,
 			  const struct pci_device_id *id)
 {
@@ -33,6 +272,9 @@ static int qaic_pci_probe(struct pci_dev *pdev,
 
 	pci_set_drvdata(pdev, qdev);
 	qdev->pdev = pdev;
+	init_srcu_struct(&qdev->dev_lock);
+	INIT_LIST_HEAD(&qdev->users);
+	mutex_init(&qdev->users_mutex);
 
 	qdev->bars = pci_select_bars(pdev, IORESOURCE_MEM);
 
@@ -107,6 +349,7 @@ static int qaic_pci_probe(struct pci_dev *pdev,
 enable_fail:
 	pci_set_drvdata(pdev, NULL);
 bar_fail:
+	cleanup_srcu_struct(&qdev->dev_lock);
 	kfree(qdev);
 qdev_fail:
 	return ret;
@@ -120,6 +363,7 @@ static void qaic_pci_remove(struct pci_dev *pdev)
 	if (!qdev)
 		return;
 
+	qaic_dev_reset_clean_local_state(qdev);
 	qaic_mhi_free_controller(qdev->mhi_cntl, link_up);
 	pci_free_irq_vectors(pdev);
 	iounmap(qdev->bar_0);
@@ -127,9 +371,27 @@ static void qaic_pci_remove(struct pci_dev *pdev)
 	pci_release_selected_regions(pdev, qdev->bars);
 	pci_disable_device(pdev);
 	pci_set_drvdata(pdev, NULL);
+	cleanup_srcu_struct(&qdev->dev_lock);
 	kfree(qdev);
 }
 
+static const struct mhi_device_id qaic_mhi_match_table[] = {
+	{ .chan = "QAIC_CONTROL", },
+	{},
+};
+
+static struct mhi_driver qaic_mhi_driver = {
+	.id_table = qaic_mhi_match_table,
+	.remove = qaic_mhi_remove,
+	.probe = qaic_mhi_probe,
+	.ul_xfer_cb = qaic_mhi_ul_xfer_cb,
+	.dl_xfer_cb = qaic_mhi_dl_xfer_cb,
+	.driver = {
+		.name = "qaic_mhi",
+		.owner = THIS_MODULE,
+	},
+};
+
 static const struct pci_device_id ids[] = {
 	{ PCI_DEVICE(PCI_VENDOR_ID_QCOM, PCI_DEV_AIC100), },
 	{ 0, }
@@ -146,13 +408,48 @@ static struct pci_driver qaic_pci_driver = {
 static int __init qaic_init(void)
 {
 	int ret;
+	dev_t dev;
 
 	pr_debug("qaic: init\n");
 
+	ret = alloc_chrdev_region(&dev, 0, QAIC_MAX_MINORS, QAIC_NAME);
+	if (ret < 0) {
+		pr_debug("qaic: alloc_chrdev_region failed %d\n", ret);
+		goto out;
+	}
+
+	qaic_major = MAJOR(dev);
+
+	qaic_class = class_create(THIS_MODULE, QAIC_NAME);
+	if (IS_ERR(qaic_class)) {
+		ret = PTR_ERR(qaic_class);
+		pr_debug("qaic: class_create failed %d\n", ret);
+		goto free_major;
+	}
+
+	ret = mhi_driver_register(&qaic_mhi_driver);
+	if (ret) {
+		pr_debug("qaic: mhi_driver_register failed %d\n", ret);
+		goto free_class;
+	}
+
 	ret = pci_register_driver(&qaic_pci_driver);
 
-	pr_debug("qaic: init success\n");
+	if (ret) {
+		pr_debug("qaic: pci_register_driver failed %d\n", ret);
+		goto free_mhi;
+	}
 
+	pr_debug("qaic: init success\n");
+	goto out;
+
+free_mhi:
+	mhi_driver_unregister(&qaic_mhi_driver);
+free_class:
+	class_destroy(qaic_class);
+free_major:
+	unregister_chrdev_region(MKDEV(qaic_major, 0), QAIC_MAX_MINORS);
+out:
 	return ret;
 }
 
@@ -161,6 +458,10 @@ static void __exit qaic_exit(void)
 	pr_debug("qaic: exit\n");
 	link_up = true;
 	pci_unregister_driver(&qaic_pci_driver);
+	mhi_driver_unregister(&qaic_mhi_driver);
+	class_destroy(qaic_class);
+	unregister_chrdev_region(MKDEV(qaic_major, 0), QAIC_MAX_MINORS);
+	idr_destroy(&qaic_devs);
 }
 
 module_init(qaic_init);
-- 
Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [RFC PATCH 4/8] qaic: Implement control path
  2020-05-14 14:07 [RFC PATCH 0/8] Qualcomm Cloud AI 100 driver Jeffrey Hugo
                   ` (2 preceding siblings ...)
  2020-05-14 14:07 ` [RFC PATCH 3/8] qaic: Create char dev Jeffrey Hugo
@ 2020-05-14 14:07 ` Jeffrey Hugo
  2020-05-14 14:07 ` [RFC PATCH 5/8] qaic: Implement data path Jeffrey Hugo
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 39+ messages in thread
From: Jeffrey Hugo @ 2020-05-14 14:07 UTC (permalink / raw)
  To: arnd, gregkh
  Cc: manivannan.sadhasivam, bjorn.andersson, wufan, pratanan,
	linux-arm-msm, linux-kernel, Jeffrey Hugo

Once a qaic device is fully booted, the control path is available for use.
The control path allows the host to use an MHI channel to send requests to
the device.  The canonical use case for the control path is allowing
userspace to configure the device for a workload.  This is accomplished
via the manage ioctl.

Signed-off-by: Jeffrey Hugo <jhugo@codeaurora.org>
---
 drivers/misc/qaic/Makefile       |    3 +-
 drivers/misc/qaic/qaic.h         |   61 +++
 drivers/misc/qaic/qaic_control.c | 1015 ++++++++++++++++++++++++++++++++++++++
 drivers/misc/qaic/qaic_drv.c     |  159 +++++-
 include/uapi/misc/qaic.h         |  117 +++++
 5 files changed, 1343 insertions(+), 12 deletions(-)
 create mode 100644 drivers/misc/qaic/qaic_control.c
 create mode 100644 include/uapi/misc/qaic.h

diff --git a/drivers/misc/qaic/Makefile b/drivers/misc/qaic/Makefile
index b5fd819..7a5513b 100644
--- a/drivers/misc/qaic/Makefile
+++ b/drivers/misc/qaic/Makefile
@@ -7,4 +7,5 @@ obj-$(CONFIG_QAIC)	:= qaic.o
 
 qaic-y := \
 	qaic_drv.o \
-	mhi_controller.o
+	mhi_controller.o \
+	qaic_control.o
diff --git a/drivers/misc/qaic/qaic.h b/drivers/misc/qaic/qaic.h
index 58ca167..c35a4e1 100644
--- a/drivers/misc/qaic/qaic.h
+++ b/drivers/misc/qaic/qaic.h
@@ -7,11 +7,21 @@
 #define QAICINTERNAL_H_
 
 #include <linux/cdev.h>
+#include <linux/idr.h>
 #include <linux/kref.h>
 #include <linux/mhi.h>
 #include <linux/mutex.h>
 #include <linux/pci.h>
+#include <linux/spinlock.h>
 #include <linux/srcu.h>
+#include <linux/wait.h>
+#include <linux/workqueue.h>
+
+#define QAIC_NUM_DBC		16
+#define QAIC_DBC_BASE		0x20000
+#define QAIC_DBC_SIZE		0x1000
+
+#define QAIC_DBC_OFF(i)		((i) * QAIC_DBC_SIZE + QAIC_DBC_BASE)
 
 struct qaic_user {
 	pid_t			handle;
@@ -21,17 +31,68 @@ struct qaic_user {
 	struct kref		ref_count;
 };
 
+struct dma_bridge_chan {
+	struct qaic_device	*qdev;
+	unsigned int		id;
+	/* also the base of the entire memory allocation */
+	void			*req_q_base;
+	void			*rsp_q_base;
+	dma_addr_t		dma_addr;
+	u32			total_size;
+	u32			nelem;
+	struct mutex		mem_lock;
+	struct idr		mem_handles;
+	struct qaic_user	*usr;
+	u16			next_req_id;
+	void __iomem		*dbc_base;
+	spinlock_t		xfer_lock;
+	struct list_head	xfer_list;
+	struct srcu_struct	ch_lock;
+	struct dentry		*debugfs_root;
+	bool			in_use;
+	wait_queue_head_t	dbc_release;
+};
+
 struct qaic_device {
 	struct pci_dev		*pdev;
 	int			bars;
 	void __iomem		*bar_0;
+	void __iomem		*bar_2;
 	struct mhi_controller	*mhi_cntl;
 	struct mhi_device	*cntl_ch;
+	struct list_head	cntl_xfer_list;
+	u32			next_seq_num;
+	struct mutex		cntl_mutex;
+	bool			cntl_lost_buf;
 	struct cdev		*cdev;
 	struct device		*dev;
+	struct dma_bridge_chan	dbc[QAIC_NUM_DBC];
+	struct workqueue_struct	*cntl_wq;
 	bool			in_reset;
 	struct srcu_struct	dev_lock;
 	struct list_head	users;
 	struct mutex		users_mutex;
 };
+
+int get_dbc_req_elem_size(void);
+int get_dbc_rsp_elem_size(void);
+int get_cntl_version(struct qaic_device *qdev, struct qaic_user *usr,
+		     u16 *major, u16 *minor);
+int qaic_manage_ioctl(struct qaic_device *qdev, struct qaic_user *usr,
+		      unsigned long arg);
+
+void qaic_mhi_ul_xfer_cb(struct mhi_device *mhi_dev,
+			 struct mhi_result *mhi_result);
+
+void qaic_mhi_dl_xfer_cb(struct mhi_device *mhi_dev,
+			 struct mhi_result *mhi_result);
+
+int qaic_control_open(struct qaic_device *qdev);
+void qaic_control_close(struct qaic_device *qdev);
+void qaic_release_usr(struct qaic_device *qdev, struct qaic_user *usr);
+
+int disable_dbc(struct qaic_device *qdev, u32 dbc_id, struct qaic_user *usr);
+void release_dbc(struct qaic_device *qdev, u32 dbc_id);
+
+void wake_all_cntl(struct qaic_device *qdev);
 #endif /* QAICINTERNAL_H_ */
diff --git a/drivers/misc/qaic/qaic_control.c b/drivers/misc/qaic/qaic_control.c
new file mode 100644
index 0000000..00ac5fc
--- /dev/null
+++ b/drivers/misc/qaic/qaic_control.c
@@ -0,0 +1,1015 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+/* Copyright (c) 2019-2020, The Linux Foundation. All rights reserved. */
+
+#include <asm/byteorder.h>
+#include <linux/completion.h>
+#include <linux/dma-mapping.h>
+#include <linux/kref.h>
+#include <linux/list.h>
+#include <linux/mhi.h>
+#include <linux/mm.h>
+#include <linux/mutex.h>
+#include <linux/pci.h>
+#include <linux/scatterlist.h>
+#include <linux/types.h>
+#include <linux/uaccess.h>
+#include <linux/workqueue.h>
+#include <linux/wait.h>
+#include <uapi/misc/qaic.h>
+
+#include "qaic.h"
+
+#define MANAGE_MAGIC_NUMBER	0x43494151 /* "QAIC" in little endian */
+#define QAIC_DBC_Q_GAP		0x100
+#define QAIC_DBC_Q_BUF_ALIGN	0x1000
+#define RESP_TIMEOUT		(60 * HZ)
+
+/*
+ * Wire encoding structures for the manage protocol.
+ * All fields are little endian on the wire
+ */
+struct _msg_hdr {
+	u32 magic_number;
+	u32 sequence_number;
+	u32 len; /* length of this message */
+	u32 count; /* number of transactions in this message */
+	u32 handle; /* unique id to track the resources consumed */
+} __packed;
+
+struct _msg {
+	struct _msg_hdr hdr;
+	u8 data[QAIC_MANAGE_MAX_MSG_LENGTH];
+} __packed;
+
+struct wrapper_msg {
+	struct kref ref_count;
+	struct _msg msg;
+};
+
+struct _trans_hdr {
+	u32 type;
+	u32 len;
+} __packed;
+
+struct _trans_passthrough {
+	struct _trans_hdr hdr;
+	u8 data[0];
+} __packed;
+
+struct _addr_size_pair {
+	u64 addr;
+	u64 size;
+} __packed;
+
+struct _trans_dma_xfer {
+	struct _trans_hdr hdr;
+	u32 tag;
+	u32 count;
+	struct _addr_size_pair data[0];
+} __packed;
+
+struct _trans_activate_to_dev {
+	struct _trans_hdr hdr;
+	u32 buf_len;
+	u64 req_q_addr;
+	u32 req_q_size;
+	u64 rsp_q_addr;
+	u32 rsp_q_size;
+	u32 reserved;
+} __packed;
+
+struct _trans_activate_from_dev {
+	struct _trans_hdr hdr;
+	u32 status;
+	u32 dbc_id;
+} __packed;
+
+struct _trans_deactivate_from_dev {
+	struct _trans_hdr hdr;
+	u32 status;
+	u32 dbc_id;
+} __packed;
+
+struct _trans_terminate_to_dev {
+	struct _trans_hdr hdr;
+	u32 handle;
+} __packed;
+
+struct _trans_terminate_from_dev {
+	struct _trans_hdr hdr;
+	u32 status;
+} __packed;
+
+struct _trans_status_to_dev {
+	struct _trans_hdr hdr;
+} __packed;
+
+struct _trans_status_from_dev {
+	struct _trans_hdr hdr;
+	u16 major;
+	u16 minor;
+	u32 status;
+} __packed;
+
+struct xfer_queue_elem {
+	struct list_head list;
+	u32 seq_num;
+	struct completion xfer_done;
+	void *buf;
+};
+
+struct dma_xfer {
+	struct list_head list;
+	struct sg_table *sgt;
+	struct page **page_list;
+	unsigned long nr_pages;
+};
+
+struct ioctl_resources {
+	struct list_head dma_xfers;
+	void *buf;
+	dma_addr_t dma_addr;
+	u32 total_size;
+	u32 nelem;
+	void *rsp_q_base;
+	u32 status;
+	u32 dbc_id;
+};
+
+struct resp_work {
+	struct work_struct work;
+	struct qaic_device *qdev;
+	void *buf;
+};
+
+static void free_wrapper(struct kref *ref)
+{
+	struct wrapper_msg *wrapper = container_of(ref, struct wrapper_msg,
+						   ref_count);
+
+	kfree(wrapper);
+}
+
+static void save_dbc_buf(struct qaic_device *qdev,
+			 struct ioctl_resources *resources,
+			 struct qaic_user *usr)
+{
+	u32 dbc_id = resources->dbc_id;
+
+	if (resources->buf) {
+		wait_event_interruptible(qdev->dbc[dbc_id].dbc_release,
+					 !qdev->dbc[dbc_id].in_use);
+		qdev->dbc[dbc_id].req_q_base = resources->buf;
+		qdev->dbc[dbc_id].rsp_q_base = resources->rsp_q_base;
+		qdev->dbc[dbc_id].dma_addr = resources->dma_addr;
+		qdev->dbc[dbc_id].total_size = resources->total_size;
+		qdev->dbc[dbc_id].nelem = resources->nelem;
+		qdev->dbc[dbc_id].usr = usr;
+		qdev->dbc[dbc_id].in_use = true;
+		resources->buf = 0;
+	}
+}
+
+static void free_dbc_buf(struct qaic_device *qdev,
+			 struct ioctl_resources *resources)
+{
+	if (resources->buf)
+		dma_free_coherent(&qdev->pdev->dev, resources->total_size,
+				  resources->buf, resources->dma_addr);
+	resources->buf = 0;
+}
+
+static void free_dma_xfers(struct qaic_device *qdev,
+			   struct ioctl_resources *resources)
+{
+	struct dma_xfer *xfer;
+	struct dma_xfer *x;
+	int i;
+
+	list_for_each_entry_safe(xfer, x, &resources->dma_xfers, list) {
+		dma_unmap_sg(&qdev->pdev->dev, xfer->sgt->sgl, xfer->sgt->nents,
+			     DMA_TO_DEVICE);
+		sg_free_table(xfer->sgt);
+		kfree(xfer->sgt);
+		for (i = 0; i < xfer->nr_pages; ++i)
+			put_page(xfer->page_list[i]);
+		kfree(xfer->page_list);
+		list_del(&xfer->list);
+		kfree(xfer);
+	}
+}
+
+static int encode_passthrough(struct qaic_device *qdev, void *trans,
+			      struct _msg *msg, u32 *user_len)
+{
+	struct qaic_manage_trans_passthrough *in_trans = trans;
+	struct _trans_passthrough *out_trans = (void *)msg + msg->hdr.len;
+
+	if (msg->hdr.len + in_trans->hdr.len > sizeof(*msg))
+		return -ENOSPC;
+
+	memcpy(out_trans, in_trans, in_trans->hdr.len);
+	msg->hdr.len += in_trans->hdr.len;
+	*user_len += in_trans->hdr.len;
+	out_trans->hdr.type = cpu_to_le32(TRANS_PASSTHROUGH_TO_DEV);
+	out_trans->hdr.len = cpu_to_le32(out_trans->hdr.len);
+
+	return 0;
+}
+
+static int encode_dma(struct qaic_device *qdev, void *trans, struct _msg *msg,
+		      u32 *user_len, struct ioctl_resources *resources)
+{
+	struct qaic_manage_trans_dma_xfer *in_trans = trans;
+	struct _trans_dma_xfer *out_trans = (void *)msg + msg->hdr.len;
+	struct dma_xfer *xfer;
+	unsigned long nr_pages;
+	struct page **page_list;
+	struct scatterlist *last;
+	struct scatterlist *sg;
+	struct sg_table *sgt;
+	unsigned int dma_len;
+	int nents;
+	int dmas;
+	int ret;
+	int i;
+
+	if (in_trans->addr + in_trans->size < in_trans->addr ||
+	    !in_trans->size) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	xfer = kmalloc(sizeof(*xfer), GFP_KERNEL);
+	if (!xfer) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	nr_pages = PAGE_ALIGN(in_trans->size + offset_in_page(in_trans->addr))
+								>> PAGE_SHIFT;
+
+	page_list = kmalloc_array(nr_pages, sizeof(*page_list), GFP_KERNEL);
+	if (!page_list) {
+		ret = -ENOMEM;
+		goto free_resource;
+	}
+
+	ret = get_user_pages_fast(in_trans->addr, nr_pages, 0, page_list);
+	if (ret < 0 || ret != nr_pages) {
+		ret = -EFAULT;
+		goto free_page_list;
+	}
+
+	sgt = kmalloc(sizeof(*sgt), GFP_KERNEL);
+	if (!sgt) {
+		ret = -ENOMEM;
+		goto put_pages;
+	}
+
+	ret = sg_alloc_table_from_pages(sgt, page_list, nr_pages,
+					offset_in_page(in_trans->addr),
+					in_trans->size, GFP_KERNEL);
+	if (ret) {
+		ret = -ENOMEM;
+		goto free_sgt;
+	}
+
+	nents = dma_map_sg(&qdev->pdev->dev, sgt->sgl, sgt->nents,
+			   DMA_TO_DEVICE);
+	if (!nents) {
+		ret = -EFAULT;
+		goto free_table;
+	}
+
+	/*
+	 * It turns out several of the IOMMU drivers don't combine adjacent
+	 * regions, which is really what we expect based on the description of
+	 * dma_map_sg(), so let's see if that can be done here.  Combining
+	 * adjacent regions makes our message more efficient.
+	 */
+	dmas = 0;
+	last = sgt->sgl;
+	for_each_sg(sgt->sgl, sg, nents, i) {
+		if (sg_dma_address(last) + sg_dma_len(last) !=
+		    sg_dma_address(sg))
+			dmas++;
+		last = sg;
+	}
+
+	/*
+	 * now that we finally know how many memory segments we will be encoding
+	 * we can check to see if we have space in the message
+	 */
+	if (msg->hdr.len + sizeof(*out_trans) + dmas * sizeof(*out_trans->data)
+							> sizeof(*msg)) {
+		ret = -ENOSPC;
+		goto dma_unmap;
+	}
+
+	msg->hdr.len += sizeof(*out_trans) + dmas * sizeof(*out_trans->data);
+
+	out_trans->hdr.type = cpu_to_le32(TRANS_DMA_XFER_TO_DEV);
+	out_trans->hdr.len = cpu_to_le32(sizeof(*out_trans) +
+					 dmas * sizeof(*out_trans->data));
+	out_trans->tag = cpu_to_le32(in_trans->tag);
+	out_trans->count = cpu_to_le32(dmas);
+
+	i = 0;
+	last = sgt->sgl;
+	dma_len = 0;
+	for_each_sg(sgt->sgl, sg, nents, dmas) {
+		/* hit a discontinuity, finalize segment and start new one */
+		if (sg_dma_address(last) + sg_dma_len(last) !=
+		    sg_dma_address(sg)) {
+			out_trans->data[i].size = cpu_to_le64(dma_len);
+			if (dma_len)
+				i++;
+			dma_len = 0;
+			out_trans->data[i].addr =
+						cpu_to_le64(sg_dma_address(sg));
+		}
+		dma_len += sg_dma_len(sg);
+		last = sg;
+	}
+	/* finalize the last segment */
+	out_trans->data[i].size = cpu_to_le64(dma_len);
+
+	*user_len += in_trans->hdr.len;
+
+	xfer->sgt = sgt;
+	xfer->page_list = page_list;
+	xfer->nr_pages = nr_pages;
+	list_add(&xfer->list, &resources->dma_xfers);
+	return 0;
+
+dma_unmap:
+	dma_unmap_sg(&qdev->pdev->dev, sgt->sgl, sgt->nents, DMA_TO_DEVICE);
+free_table:
+	sg_free_table(sgt);
+free_sgt:
+	kfree(sgt);
+put_pages:
+	for (i = 0; i < nr_pages; ++i)
+		put_page(page_list[i]);
+free_page_list:
+	kfree(page_list);
+free_resource:
+	kfree(xfer);
+out:
+	return ret;
+}
+
+static int encode_activate(struct qaic_device *qdev, void *trans,
+			   struct _msg *msg, u32 *user_len,
+			   struct ioctl_resources *resources)
+{
+	struct qaic_manage_trans_activate_to_dev *in_trans = trans;
+	struct _trans_activate_to_dev *out_trans = (void *)msg + msg->hdr.len;
+	dma_addr_t dma_addr;
+	void *buf;
+	u32 nelem;
+	u32 size;
+
+	if (msg->hdr.len + sizeof(*out_trans) > sizeof(*msg))
+		return -ENOSPC;
+
+	if (!in_trans->queue_size)
+		return -EINVAL;
+
+	if (in_trans->resv)
+		return -EINVAL;
+
+	nelem = in_trans->queue_size;
+	size = (get_dbc_req_elem_size() + get_dbc_rsp_elem_size()) * nelem;
+	if (size / nelem != get_dbc_req_elem_size() + get_dbc_rsp_elem_size())
+		return -EINVAL;
+
+	if (size + QAIC_DBC_Q_GAP + QAIC_DBC_Q_BUF_ALIGN < size)
+		return -EINVAL;
+
+	size = ALIGN((size + QAIC_DBC_Q_GAP), QAIC_DBC_Q_BUF_ALIGN);
+
+	buf = dma_alloc_coherent(&qdev->pdev->dev, size, &dma_addr, GFP_KERNEL);
+	if (!buf)
+		return -ENOMEM;
+
+	out_trans->hdr.type = cpu_to_le32(TRANS_ACTIVATE_TO_DEV);
+	out_trans->hdr.len = cpu_to_le32(sizeof(*out_trans));
+	out_trans->buf_len = cpu_to_le32(size);
+	out_trans->req_q_addr = cpu_to_le64(dma_addr);
+	out_trans->req_q_size = cpu_to_le32(nelem);
+	out_trans->rsp_q_addr = cpu_to_le64(dma_addr + size - nelem *
+					    get_dbc_rsp_elem_size());
+	out_trans->rsp_q_size = cpu_to_le32(nelem);
+
+	*user_len += in_trans->hdr.len;
+	msg->hdr.len += sizeof(*out_trans);
+
+	resources->buf = buf;
+	resources->dma_addr = dma_addr;
+	resources->total_size = size;
+	resources->nelem = nelem;
+	resources->rsp_q_base = buf + size - nelem * get_dbc_rsp_elem_size();
+	return 0;
+}
+
+static int encode_deactivate(struct qaic_device *qdev, void *trans,
+			     u32 *user_len, struct qaic_user *usr)
+{
+	struct qaic_manage_trans_deactivate *in_trans = trans;
+
+	if (in_trans->dbc_id >= QAIC_NUM_DBC || in_trans->resv)
+		return -EINVAL;
+
+	*user_len += in_trans->hdr.len;
+
+	return disable_dbc(qdev, in_trans->dbc_id, usr);
+}
+
+static int encode_status(struct qaic_device *qdev, void *trans,
+			 struct _msg *msg, u32 *user_len)
+{
+	struct qaic_manage_trans_status_to_dev *in_trans = trans;
+	struct _trans_status_to_dev *out_trans = (void *)msg + msg->hdr.len;
+
+	if (msg->hdr.len + in_trans->hdr.len > sizeof(*msg))
+		return -ENOSPC;
+
+	out_trans->hdr.type = cpu_to_le32(TRANS_STATUS_TO_DEV);
+	out_trans->hdr.len = cpu_to_le32(in_trans->hdr.len);
+	msg->hdr.len += in_trans->hdr.len;
+	*user_len += in_trans->hdr.len;
+
+	return 0;
+}
+static int encode_message(struct qaic_device *qdev,
+			  struct qaic_manage_msg *user_msg, struct _msg *msg,
+			  struct ioctl_resources *resources,
+			  struct qaic_user *usr)
+{
+	struct qaic_manage_trans_hdr *trans_hdr;
+	u32 user_len = 0;
+	int ret;
+	int i;
+
+	msg->hdr.len = sizeof(msg->hdr);
+	for (i = 0; i < user_msg->count; ++i) {
+		if (user_len >= user_msg->len) {
+			ret = -EINVAL;
+			break;
+		}
+		trans_hdr = (struct qaic_manage_trans_hdr *)
+						(user_msg->data + user_len);
+		if (user_len + trans_hdr->len > user_msg->len) {
+			ret = -EINVAL;
+			break;
+		}
+
+		switch (trans_hdr->type) {
+		case TRANS_PASSTHROUGH_FROM_USR:
+			ret = encode_passthrough(qdev, trans_hdr, msg,
+						 &user_len);
+			break;
+		case TRANS_DMA_XFER_FROM_USR:
+			ret = encode_dma(qdev, trans_hdr, msg, &user_len,
+					 resources);
+			break;
+		case TRANS_ACTIVATE_FROM_USR:
+			ret = encode_activate(qdev, trans_hdr, msg, &user_len,
+					      resources);
+			break;
+		case TRANS_DEACTIVATE_FROM_USR:
+			ret = encode_deactivate(qdev, trans_hdr, &user_len,
+						usr);
+			break;
+		case TRANS_STATUS_FROM_USR:
+			ret = encode_status(qdev, trans_hdr, msg, &user_len);
+			break;
+		default:
+			ret = -EINVAL;
+			break;
+		}
+
+		if (ret)
+			break;
+	}
+
+	if (user_len != user_msg->len)
+		ret = -EINVAL;
+
+	if (ret) {
+		free_dma_xfers(qdev, resources);
+		free_dbc_buf(qdev, resources);
+		return ret;
+	}
+
+	msg->hdr.count = user_msg->count;
+	return 0;
+}
+
+static int decode_passthrough(struct qaic_device *qdev, void *trans,
+			      struct qaic_manage_msg *user_msg, u32 *msg_len)
+{
+	struct _trans_passthrough *in_trans = trans;
+	struct qaic_manage_trans_passthrough *out_trans;
+	u32 len;
+
+	out_trans = (void *)user_msg->data + user_msg->len;
+
+	len = le32_to_cpu(in_trans->hdr.len);
+	if (user_msg->len + len > QAIC_MANAGE_MAX_MSG_LENGTH)
+		return -ENOSPC;
+
+	memcpy(out_trans, in_trans, len);
+	user_msg->len += len;
+	*msg_len += len;
+	out_trans->hdr.type = le32_to_cpu(out_trans->hdr.type);
+	return 0;
+}
+
+static int decode_activate(struct qaic_device *qdev, void *trans,
+			   struct qaic_manage_msg *user_msg, u32 *msg_len,
+			   struct ioctl_resources *resources,
+			   struct qaic_user *usr)
+{
+	struct _trans_activate_from_dev *in_trans = trans;
+	struct qaic_manage_trans_activate_from_dev *out_trans;
+	u32 len;
+
+	out_trans = (void *)user_msg->data + user_msg->len;
+
+	len = le32_to_cpu(in_trans->hdr.len);
+	if (user_msg->len + len > QAIC_MANAGE_MAX_MSG_LENGTH)
+		return -ENOSPC;
+
+	user_msg->len += len;
+	*msg_len += len;
+	out_trans->hdr.type = le32_to_cpu(in_trans->hdr.type);
+	out_trans->hdr.len = len;
+	out_trans->status = le32_to_cpu(in_trans->status);
+	out_trans->dbc_id = le32_to_cpu(in_trans->dbc_id);
+
+	if (!resources->buf)
+		/* how did we get an activate response without a request? */
+		return -EINVAL;
+
+	if (out_trans->dbc_id >= QAIC_NUM_DBC)
+		/*
+		 * The device assigned an invalid resource, which should never
+		 * happen.  Inject an error so the user can try to recover.
+		 */
+		out_trans->status = -ENODEV;
+
+	resources->status = out_trans->status;
+	resources->dbc_id = out_trans->dbc_id;
+	if (!resources->status)
+		save_dbc_buf(qdev, resources, usr);
+	return 0;
+}
+
+static int decode_deactivate(struct qaic_device *qdev, void *trans,
+			     u32 *msg_len)
+{
+	struct _trans_deactivate_from_dev *in_trans = trans;
+	u32 dbc_id = le32_to_cpu(in_trans->dbc_id);
+	u32 status = le32_to_cpu(in_trans->status);
+
+	if (dbc_id >= QAIC_NUM_DBC)
+		/*
+		 * The device assigned an invalid resource, which should never
+		 * happen.  Inject an error so the user can try to recover.
+		 */
+		return -ENODEV;
+
+	if (status)
+		/*
+		 * Releasing resources failed on the device side, which puts
+		 * us in a bind since they may still be in use, so be safe and
+		 * do nothing.
+		 */
+		return -ENODEV;
+
+	release_dbc(qdev, dbc_id);
+	*msg_len += sizeof(*in_trans);
+	return 0;
+}
+
+static int decode_status(struct qaic_device *qdev, void *trans,
+			 struct qaic_manage_msg *user_msg, u32 *user_len)
+{
+	struct _trans_status_from_dev *in_trans = trans;
+	struct qaic_manage_trans_status_from_dev *out_trans;
+	u32 len;
+
+	out_trans = (void *)user_msg->data + user_msg->len;
+
+	len = le32_to_cpu(in_trans->hdr.len);
+	if (user_msg->len + len > QAIC_MANAGE_MAX_MSG_LENGTH)
+		return -ENOSPC;
+
+	out_trans->hdr.type = le32_to_cpu(TRANS_STATUS_FROM_DEV);
+	out_trans->hdr.len = len;
+	out_trans->major = le32_to_cpu(in_trans->major);
+	out_trans->minor = le32_to_cpu(in_trans->minor);
+	*user_len += in_trans->hdr.len;
+	user_msg->len += len;
+
+	return 0;
+}
+
+static int decode_message(struct qaic_device *qdev,
+			  struct qaic_manage_msg *user_msg, struct _msg *msg,
+			  struct ioctl_resources *resources,
+			  struct qaic_user *usr)
+{
+	struct _trans_hdr *trans_hdr;
+	u32 msg_len = 0;
+	int ret;
+	int i;
+
+	if (le32_to_cpu(msg->hdr.len) > sizeof(*msg))
+		return -EINVAL;
+
+	user_msg->len = 0;
+	user_msg->count = le32_to_cpu(msg->hdr.count);
+
+	for (i = 0; i < user_msg->count; ++i) {
+		trans_hdr = (struct _trans_hdr *)(msg->data + msg_len);
+		if (msg_len + le32_to_cpu(trans_hdr->len) > le32_to_cpu(msg->hdr.len))
+			return -EINVAL;
+
+		switch (le32_to_cpu(trans_hdr->type)) {
+		case TRANS_PASSTHROUGH_FROM_DEV:
+			ret = decode_passthrough(qdev, trans_hdr, user_msg,
+						 &msg_len);
+			break;
+		case TRANS_ACTIVATE_FROM_DEV:
+			ret = decode_activate(qdev, trans_hdr, user_msg,
+					      &msg_len, resources, usr);
+			break;
+		case TRANS_DEACTIVATE_FROM_DEV:
+			ret = decode_deactivate(qdev, trans_hdr, &msg_len);
+			break;
+		case TRANS_STATUS_FROM_DEV:
+			ret = decode_status(qdev, trans_hdr, user_msg,
+					    &msg_len);
+			break;
+		default:
+			return -EINVAL;
+		}
+
+		if (ret)
+			return ret;
+	}
+
+	if (msg_len != (le32_to_cpu(msg->hdr.len) - sizeof(msg->hdr)))
+		return -EINVAL;
+
+	return 0;
+}
+
+static void *msg_xfer(struct qaic_device *qdev, struct wrapper_msg *wrapper,
+		      u32 seq_num, bool ignore_signal)
+{
+	struct xfer_queue_elem elem;
+	struct _msg *out_buf;
+	size_t in_len;
+	long ret;
+
+	if (qdev->in_reset) {
+		mutex_unlock(&qdev->cntl_mutex);
+		return ERR_PTR(-ENODEV);
+	}
+
+	in_len = sizeof(wrapper->msg);
+
+	elem.seq_num = seq_num;
+	elem.buf = NULL;
+	init_completion(&elem.xfer_done);
+	if (likely(!qdev->cntl_lost_buf)) {
+		out_buf = kmalloc(sizeof(*out_buf), GFP_KERNEL);
+		if (!out_buf) {
+			mutex_unlock(&qdev->cntl_mutex);
+			return ERR_PTR(-ENOMEM);
+		}
+
+		ret = mhi_queue_buf(qdev->cntl_ch, DMA_FROM_DEVICE,
+				    out_buf, sizeof(*out_buf), MHI_EOT);
+		if (ret) {
+			mutex_unlock(&qdev->cntl_mutex);
+			return ERR_PTR(ret);
+		}
+	} else {
+		/*
+		 * We lost a buffer because we queued a recv buf, but then
+		 * queuing the corresponding tx buf failed.  To try to avoid
+		 * a memory leak, let's reclaim it and use it for this
+		 * transaction.
+		 */
+		qdev->cntl_lost_buf = false;
+	}
+
+	kref_get(&wrapper->ref_count);
+	ret = mhi_queue_buf(qdev->cntl_ch, DMA_TO_DEVICE, &wrapper->msg, in_len,
+			    MHI_EOT);
+	if (ret) {
+		qdev->cntl_lost_buf = true;
+		kref_put(&wrapper->ref_count, free_wrapper);
+		mutex_unlock(&qdev->cntl_mutex);
+		return ERR_PTR(ret);
+	}
+
+	list_add_tail(&elem.list, &qdev->cntl_xfer_list);
+	mutex_unlock(&qdev->cntl_mutex);
+
+	if (ignore_signal)
+		ret = wait_for_completion_timeout(&elem.xfer_done,
+						  RESP_TIMEOUT);
+	else
+		ret = wait_for_completion_interruptible_timeout(&elem.xfer_done,
+								RESP_TIMEOUT);
+	/*
+	 * not using _interruptible because we have to clean up or we'll
+	 * likely cause memory corruption
+	 */
+	mutex_lock(&qdev->cntl_mutex);
+	if (!list_empty(&elem.list))
+		list_del(&elem.list);
+	if (!ret && !elem.buf)
+		ret = -ETIMEDOUT;
+	else if (ret > 0 && !elem.buf)
+		ret = -EIO;
+	mutex_unlock(&qdev->cntl_mutex);
+
+	if (ret < 0) {
+		kfree(elem.buf);
+		return ERR_PTR(ret);
+	}
+
+	return elem.buf;
+}
+
+static int qaic_manage(struct qaic_device *qdev, struct qaic_user *usr,
+		       struct qaic_manage_msg *user_msg)
+{
+	struct ioctl_resources resources;
+	struct wrapper_msg *wrapper;
+	struct _msg *msg;
+	struct _msg *rsp;
+	int ret;
+
+	INIT_LIST_HEAD(&resources.dma_xfers);
+	resources.buf = NULL;
+
+	if (user_msg->len > QAIC_MANAGE_MAX_MSG_LENGTH ||
+	    user_msg->count >
+	    QAIC_MANAGE_MAX_MSG_LENGTH / sizeof(struct qaic_manage_trans_hdr)) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	wrapper = kzalloc(sizeof(*wrapper), GFP_KERNEL);
+	if (!wrapper) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	kref_init(&wrapper->ref_count);
+	msg = &wrapper->msg;
+
+	ret = encode_message(qdev, user_msg, msg, &resources, usr);
+	if (ret)
+		goto encode_failed;
+
+	ret = mutex_lock_interruptible(&qdev->cntl_mutex);
+	if (ret)
+		goto lock_failed;
+	msg->hdr.magic_number = MANAGE_MAGIC_NUMBER;
+	msg->hdr.sequence_number = cpu_to_le32(qdev->next_seq_num++);
+	msg->hdr.len = cpu_to_le32(msg->hdr.len);
+	msg->hdr.count = cpu_to_le32(msg->hdr.count);
+	if (usr)
+		msg->hdr.handle = cpu_to_le32(usr->handle);
+	else
+		msg->hdr.handle = 0;
+
+	/* msg_xfer releases the mutex */
+	rsp = msg_xfer(qdev, wrapper, qdev->next_seq_num - 1, false);
+	if (IS_ERR(rsp)) {
+		ret = PTR_ERR(rsp);
+		goto lock_failed;
+	}
+
+	ret = decode_message(qdev, user_msg, rsp, &resources, usr);
+
+	kfree(rsp);
+lock_failed:
+	free_dma_xfers(qdev, &resources);
+	free_dbc_buf(qdev, &resources);
+encode_failed:
+	kref_put(&wrapper->ref_count, free_wrapper);
+out:
+	return ret;
+}
+
+int qaic_manage_ioctl(struct qaic_device *qdev, struct qaic_user *usr,
+		      unsigned long arg)
+{
+	struct qaic_manage_msg *user_msg;
+	int ret;
+
+	user_msg = kmalloc(sizeof(*user_msg), GFP_KERNEL);
+	if (!user_msg) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	if (copy_from_user(user_msg, (void __user *)arg, sizeof(*user_msg))) {
+		ret = -EFAULT;
+		goto copy_from_user_failed;
+	}
+
+	ret = qaic_manage(qdev, usr, user_msg);
+	if (ret)
+		goto copy_from_user_failed;
+
+	if (copy_to_user((void __user *)arg, user_msg, sizeof(*user_msg)))
+		ret = -EFAULT;
+
+copy_from_user_failed:
+	kfree(user_msg);
+out:
+	return ret;
+}
+
+int get_cntl_version(struct qaic_device *qdev, struct qaic_user *usr,
+		     u16 *major, u16 *minor)
+{
+	int ret;
+	struct qaic_manage_msg *user_msg;
+	struct qaic_manage_trans_status_to_dev *status_query;
+	struct qaic_manage_trans_status_from_dev *status_result;
+
+	user_msg = kmalloc(sizeof(*user_msg), GFP_KERNEL);
+	if (!user_msg) {
+		ret = -ENOMEM;
+		goto out;
+	}
+	user_msg->len = sizeof(*status_query);
+	user_msg->count = 1;
+
+	status_query = (struct qaic_manage_trans_status_to_dev *)user_msg->data;
+	status_query->hdr.type = TRANS_STATUS_FROM_USR;
+	status_query->hdr.len = sizeof(status_query->hdr);
+
+	ret = qaic_manage(qdev, usr, user_msg);
+	if (ret)
+		goto kfree_user_msg;
+	status_result =
+		(struct qaic_manage_trans_status_from_dev *)user_msg->data;
+	*major = status_result->major;
+	*minor = status_result->minor;
+
+kfree_user_msg:
+	kfree(user_msg);
+out:
+	return ret;
+}
+
+static void resp_worker(struct work_struct *work)
+{
+	struct resp_work *resp = container_of(work, struct resp_work, work);
+	struct qaic_device *qdev = resp->qdev;
+	struct _msg *msg = resp->buf;
+	struct xfer_queue_elem *elem;
+	struct xfer_queue_elem *i;
+	bool found = false;
+
+	if (msg->hdr.magic_number != MANAGE_MAGIC_NUMBER) {
+		kfree(msg);
+		kfree(resp);
+		return;
+	}
+
+	mutex_lock(&qdev->cntl_mutex);
+	list_for_each_entry_safe(elem, i, &qdev->cntl_xfer_list, list) {
+		if (elem->seq_num == le32_to_cpu(msg->hdr.sequence_number)) {
+			found = true;
+			list_del_init(&elem->list);
+			elem->buf = msg;
+			complete_all(&elem->xfer_done);
+			break;
+		}
+	}
+	mutex_unlock(&qdev->cntl_mutex);
+
+	if (!found)
+		/* request must have timed out, drop packet */
+		kfree(msg);
+
+	kfree(resp);
+}
+
+void qaic_mhi_ul_xfer_cb(struct mhi_device *mhi_dev,
+			 struct mhi_result *mhi_result)
+{
+	struct _msg *msg = mhi_result->buf_addr;
+	struct wrapper_msg *wrapper = container_of(msg, struct wrapper_msg,
+						   msg);
+
+	kref_put(&wrapper->ref_count, free_wrapper);
+}
+
+void qaic_mhi_dl_xfer_cb(struct mhi_device *mhi_dev,
+			 struct mhi_result *mhi_result)
+{
+	struct qaic_device *qdev = dev_get_drvdata(&mhi_dev->dev);
+	struct _msg *msg = mhi_result->buf_addr;
+	struct resp_work *resp;
+
+	if (mhi_result->transaction_status) {
+		kfree(msg);
+		return;
+	}
+
+	resp = kmalloc(sizeof(*resp), GFP_ATOMIC);
+	if (!resp) {
+		pci_err(qdev->pdev, "dl_xfer_cb alloc fail, dropping message\n");
+		kfree(msg);
+		return;
+	}
+
+	INIT_WORK(&resp->work, resp_worker);
+	resp->qdev = qdev;
+	resp->buf = msg;
+	queue_work(qdev->cntl_wq, &resp->work);
+}
+
+int qaic_control_open(struct qaic_device *qdev)
+{
+	if (!qdev->cntl_ch)
+		return -ENODEV;
+
+	return mhi_prepare_for_transfer(qdev->cntl_ch);
+}
+
+void qaic_control_close(struct qaic_device *qdev)
+{
+	mhi_unprepare_from_transfer(qdev->cntl_ch);
+}
+
+void qaic_release_usr(struct qaic_device *qdev, struct qaic_user *usr)
+{
+	struct _trans_terminate_to_dev *trans;
+	struct wrapper_msg *wrapper;
+	struct _msg *msg;
+	struct _msg *rsp;
+
+	wrapper = kzalloc(sizeof(*wrapper), GFP_KERNEL);
+	if (!wrapper)
+		return;
+
+	kref_init(&wrapper->ref_count);
+	msg = &wrapper->msg;
+
+	trans = (struct _trans_terminate_to_dev *)msg->data;
+
+	trans->hdr.type = cpu_to_le32(TRANS_TERMINATE_TO_DEV);
+	trans->hdr.len = cpu_to_le32(sizeof(*trans));
+	trans->handle = cpu_to_le32(usr->handle);
+
+	mutex_lock(&qdev->cntl_mutex);
+	msg->hdr.magic_number = MANAGE_MAGIC_NUMBER;
+	msg->hdr.sequence_number = cpu_to_le32(qdev->next_seq_num++);
+	msg->hdr.len = cpu_to_le32(sizeof(msg->hdr) + sizeof(*trans));
+	msg->hdr.count = cpu_to_le32(1);
+	msg->hdr.handle = cpu_to_le32(usr->handle);
+
+	/*
+	 * msg_xfer releases the mutex.
+	 * We don't care about the return of msg_xfer since we will not do
+	 * anything different based on what happens.
+	 * We ignore pending signals since one will be set if the user is
+	 * killed, and we need to give the device a chance to clean up,
+	 * otherwise DMA may still be in progress when we return.
+	 */
+	rsp = msg_xfer(qdev, wrapper, qdev->next_seq_num - 1, true);
+	if (!IS_ERR(rsp))
+		kfree(rsp);
+	kref_put(&wrapper->ref_count, free_wrapper);
+}
+
+void wake_all_cntl(struct qaic_device *qdev)
+{
+	struct xfer_queue_elem *elem;
+	struct xfer_queue_elem *i;
+
+	mutex_lock(&qdev->cntl_mutex);
+	list_for_each_entry_safe(elem, i, &qdev->cntl_xfer_list, list) {
+		list_del_init(&elem->list);
+		complete_all(&elem->xfer_done);
+	}
+	mutex_unlock(&qdev->cntl_mutex);
+}
diff --git a/drivers/misc/qaic/qaic_drv.c b/drivers/misc/qaic/qaic_drv.c
index 6e4b936..6feecc0 100644
--- a/drivers/misc/qaic/qaic_drv.c
+++ b/drivers/misc/qaic/qaic_drv.c
@@ -12,6 +12,11 @@
 #include <linux/mutex.h>
 #include <linux/pci.h>
 #include <linux/pci_ids.h>
+#include <linux/sched.h>
+#include <linux/spinlock.h>
+#include <linux/workqueue.h>
+#include <linux/wait.h>
+#include <uapi/misc/qaic.h>
 
 #include "mhi_controller.h"
 #include "qaic.h"
@@ -21,6 +26,8 @@
 #define QAIC_NAME			"Qualcomm Cloud AI 100"
 #define QAIC_MAX_MINORS			256
 
+static u16 cntl_major;
+static u16 cntl_minor = 3;
 static int qaic_major;
 static struct class *qaic_class;
 static DEFINE_IDR(qaic_devs);
@@ -29,11 +36,14 @@ static bool link_up;
 
 static int qaic_device_open(struct inode *inode, struct file *filp);
 static int qaic_device_release(struct inode *inode, struct file *filp);
+static long qaic_ioctl(struct file *filp, unsigned int cmd, unsigned long arg);
 
 static const struct file_operations qaic_ops = {
 	.owner = THIS_MODULE,
 	.open = qaic_device_open,
 	.release = qaic_device_release,
+	.unlocked_ioctl = qaic_ioctl,
+	.compat_ioctl = qaic_ioctl,
 };
 
 static void free_usr(struct kref *kref)
@@ -106,6 +116,7 @@ static int qaic_device_release(struct inode *inode, struct file *filp)
 		if (!qdev->in_reset) {
 			pci_dbg(qdev->pdev, "%s pid:%d\n", __func__,
 								current->pid);
+			qaic_release_usr(qdev, usr);
 		}
 		srcu_read_unlock(&qdev->dev_lock, qdev_rcu_id);
 
@@ -123,12 +134,59 @@ static int qaic_device_release(struct inode *inode, struct file *filp)
 	return 0;
 }
 
+static long qaic_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
+{
+	struct qaic_user *usr = filp->private_data;
+	struct qaic_device *qdev = usr->qdev;
+	unsigned int nr = _IOC_NR(cmd);
+	int qdev_rcu_id;
+	int usr_rcu_id;
+	int ret;
+
+	usr_rcu_id = srcu_read_lock(&usr->qdev_lock);
+	if (!qdev) {
+		srcu_read_unlock(&usr->qdev_lock, usr_rcu_id);
+		return -ENODEV;
+	}
+
+	qdev_rcu_id = srcu_read_lock(&qdev->dev_lock);
+	if (qdev->in_reset) {
+		srcu_read_unlock(&qdev->dev_lock, qdev_rcu_id);
+		srcu_read_unlock(&usr->qdev_lock, usr_rcu_id);
+		return -ENODEV;
+	}
+
+	if (_IOC_TYPE(cmd) != 'Q') {
+		srcu_read_unlock(&qdev->dev_lock, qdev_rcu_id);
+		srcu_read_unlock(&usr->qdev_lock, usr_rcu_id);
+		return -ENOTTY;
+	}
+
+	switch (nr) {
+	case QAIC_IOCTL_MANAGE_NR:
+		if (_IOC_DIR(cmd) != (_IOC_READ | _IOC_WRITE) ||
+		    _IOC_SIZE(cmd) != sizeof(struct qaic_manage_msg)) {
+			ret = -EINVAL;
+			break;
+		}
+		ret = qaic_manage_ioctl(qdev, usr, arg);
+		break;
+	default:
+		ret = -ENOTTY;
+	}
+
+	srcu_read_unlock(&qdev->dev_lock, qdev_rcu_id);
+	srcu_read_unlock(&usr->qdev_lock, usr_rcu_id);
+	return ret;
+}
+
 static int qaic_mhi_probe(struct mhi_device *mhi_dev,
 			  const struct mhi_device_id *id)
 {
 	struct qaic_device *qdev;
 	dev_t devno;
 	int ret;
+	u16 major, minor;
 
 	/*
 	 * Invoking this function indicates that the control channel to the
@@ -150,13 +208,26 @@ static int qaic_mhi_probe(struct mhi_device *mhi_dev,
 	dev_set_drvdata(&mhi_dev->dev, qdev);
 	qdev->cntl_ch = mhi_dev;
 
+	ret = qaic_control_open(qdev);
+	if (ret) {
+		pci_dbg(qdev->pdev, "%s: control_open failed %d\n", __func__, ret);
+		goto err;
+	}
+
+	ret = get_cntl_version(qdev, NULL, &major, &minor);
+	if (ret || major != cntl_major || minor > cntl_minor) {
+		pci_dbg(qdev->pdev, "%s: Control protocol version (%d.%d) not supported.  Supported version is (%d.%d). Ret: %d\n", __func__, major, minor, cntl_major, cntl_minor, ret);
+		ret = -EINVAL;
+		goto close_control;
+	}
+
 	mutex_lock(&qaic_devs_lock);
 	ret = idr_alloc(&qaic_devs, qdev, 0, QAIC_MAX_MINORS, GFP_KERNEL);
 	mutex_unlock(&qaic_devs_lock);
 
 	if (ret < 0) {
 		pci_dbg(qdev->pdev, "%s: idr_alloc failed %d\n", __func__, ret);
-		goto err;
+		goto close_control;
 	}
 
 	devno = MKDEV(qaic_major, ret);
@@ -198,6 +269,8 @@ static int qaic_mhi_probe(struct mhi_device *mhi_dev,
 	mutex_lock(&qaic_devs_lock);
 	idr_remove(&qaic_devs, MINOR(devno));
 	mutex_unlock(&qaic_devs_lock);
+close_control:
+	qaic_control_close(qdev);
 err:
 	return ret;
 }
@@ -206,16 +279,6 @@ static void qaic_mhi_remove(struct mhi_device *mhi_dev)
 {
 }
 
-static void qaic_mhi_ul_xfer_cb(struct mhi_device *mhi_dev,
-				struct mhi_result *mhi_result)
-{
-}
-
-static void qaic_mhi_dl_xfer_cb(struct mhi_device *mhi_dev,
-				struct mhi_result *mhi_result)
-{
-}
-
 void qaic_dev_reset_clean_local_state(struct qaic_device *qdev)
 {
 	struct qaic_user *usr;
@@ -223,6 +286,8 @@ void qaic_dev_reset_clean_local_state(struct qaic_device *qdev)
 	dev_t devno;
 
 	qdev->in_reset = true;
+	/* wake up any waiters to avoid waiting for timeouts at sync */
+	wake_all_cntl(qdev);
 	synchronize_srcu(&qdev->dev_lock);
 
 	/*
@@ -255,10 +320,46 @@ void qaic_dev_reset_clean_local_state(struct qaic_device *qdev)
 	}
 }
 
+inline int get_dbc_req_elem_size(void)
+{
+	return 64;
+}
+
+inline int get_dbc_rsp_elem_size(void)
+{
+	return 4;
+}
+
+int disable_dbc(struct qaic_device *qdev, u32 dbc_id, struct qaic_user *usr)
+{
+	if (!qdev->dbc[dbc_id].usr ||
+	    qdev->dbc[dbc_id].usr->handle != usr->handle)
+		return -EPERM;
+
+	qdev->dbc[dbc_id].usr = NULL;
+	synchronize_srcu(&qdev->dbc[dbc_id].ch_lock);
+	return 0;
+}
+
+void release_dbc(struct qaic_device *qdev, u32 dbc_id)
+{
+	dma_free_coherent(&qdev->pdev->dev, qdev->dbc[dbc_id].total_size,
+			  qdev->dbc[dbc_id].req_q_base,
+			  qdev->dbc[dbc_id].dma_addr);
+	qdev->dbc[dbc_id].total_size = 0;
+	qdev->dbc[dbc_id].req_q_base = NULL;
+	qdev->dbc[dbc_id].dma_addr = 0;
+	qdev->dbc[dbc_id].nelem = 0;
+	qdev->dbc[dbc_id].usr = NULL;
+	qdev->dbc[dbc_id].in_use = false;
+	wake_up(&qdev->dbc[dbc_id].dbc_release);
+}
+
 static int qaic_pci_probe(struct pci_dev *pdev,
 			  const struct pci_device_id *id)
 {
 	int ret;
+	int i;
 	int mhi_irq;
 	struct qaic_device *qdev;
 
@@ -270,11 +371,28 @@ static int qaic_pci_probe(struct pci_dev *pdev,
 		goto qdev_fail;
 	}
 
+	qdev->cntl_wq = alloc_workqueue("qaic_cntl", WQ_UNBOUND, 0);
+	if (!qdev->cntl_wq) {
+		ret = -ENOMEM;
+		goto wq_fail;
+	}
 	pci_set_drvdata(pdev, qdev);
 	qdev->pdev = pdev;
+	mutex_init(&qdev->cntl_mutex);
+	INIT_LIST_HEAD(&qdev->cntl_xfer_list);
 	init_srcu_struct(&qdev->dev_lock);
 	INIT_LIST_HEAD(&qdev->users);
 	mutex_init(&qdev->users_mutex);
+	for (i = 0; i < QAIC_NUM_DBC; ++i) {
+		mutex_init(&qdev->dbc[i].mem_lock);
+		spin_lock_init(&qdev->dbc[i].xfer_lock);
+		idr_init(&qdev->dbc[i].mem_handles);
+		qdev->dbc[i].qdev = qdev;
+		qdev->dbc[i].id = i;
+		INIT_LIST_HEAD(&qdev->dbc[i].xfer_list);
+		init_srcu_struct(&qdev->dbc[i].ch_lock);
+		init_waitqueue_head(&qdev->dbc[i].dbc_release);
+	}
 
 	qdev->bars = pci_select_bars(pdev, IORESOURCE_MEM);
 
@@ -308,6 +426,15 @@ static int qaic_pci_probe(struct pci_dev *pdev,
 		goto ioremap_0_fail;
 	}
 
+	qdev->bar_2 = pci_ioremap_bar(pdev, 2);
+	if (!qdev->bar_2) {
+		ret = -ENOMEM;
+		goto ioremap_2_fail;
+	}
+
+	for (i = 0; i < QAIC_NUM_DBC; ++i)
+		qdev->dbc[i].dbc_base = qdev->bar_2 + QAIC_DBC_OFF(i);
+
 	ret = pci_alloc_irq_vectors(pdev, 1, 32, PCI_IRQ_MSI);
 	if (ret < 0)
 		goto alloc_irq_fail;
@@ -339,6 +466,8 @@ static int qaic_pci_probe(struct pci_dev *pdev,
 invalid_msi_config:
 	pci_free_irq_vectors(pdev);
 alloc_irq_fail:
+	iounmap(qdev->bar_2);
+ioremap_2_fail:
 	iounmap(qdev->bar_0);
 ioremap_0_fail:
 dma_mask_fail:
@@ -349,7 +478,11 @@ static int qaic_pci_probe(struct pci_dev *pdev,
 enable_fail:
 	pci_set_drvdata(pdev, NULL);
 bar_fail:
+	for (i = 0; i < QAIC_NUM_DBC; ++i)
+		cleanup_srcu_struct(&qdev->dbc[i].ch_lock);
 	cleanup_srcu_struct(&qdev->dev_lock);
+	destroy_workqueue(qdev->cntl_wq);
+wq_fail:
 	kfree(qdev);
 qdev_fail:
 	return ret;
@@ -358,6 +491,7 @@ static int qaic_pci_probe(struct pci_dev *pdev,
 static void qaic_pci_remove(struct pci_dev *pdev)
 {
 	struct qaic_device *qdev = pci_get_drvdata(pdev);
+	int i;
 
 	pci_dbg(pdev, "%s\n", __func__);
 	if (!qdev)
@@ -365,6 +499,9 @@ static void qaic_pci_remove(struct pci_dev *pdev)
 
 	qaic_dev_reset_clean_local_state(qdev);
 	qaic_mhi_free_controller(qdev->mhi_cntl, link_up);
+	for (i = 0; i < QAIC_NUM_DBC; ++i)
+		cleanup_srcu_struct(&qdev->dbc[i].ch_lock);
+	destroy_workqueue(qdev->cntl_wq);
 	pci_free_irq_vectors(pdev);
 	iounmap(qdev->bar_0);
 	pci_clear_master(pdev);
diff --git a/include/uapi/misc/qaic.h b/include/uapi/misc/qaic.h
new file mode 100644
index 0000000..9bcc33f
--- /dev/null
+++ b/include/uapi/misc/qaic.h
@@ -0,0 +1,117 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note
+ *
+ * Copyright (c) 2019-2020, The Linux Foundation. All rights reserved.
+ */
+
+#ifndef QAIC_H_
+#define QAIC_H_
+
+#include <linux/ioctl.h>
+#include <linux/types.h>
+
+#define QAIC_MANAGE_MAX_MSG_LENGTH 16364
+
+enum qaic_manage_transaction_type {
+	TRANS_UNDEFINED =		0,
+	TRANS_PASSTHROUGH_FROM_USR =	1,
+	TRANS_PASSTHROUGH_TO_USR =	2,
+	TRANS_PASSTHROUGH_FROM_DEV =	3,
+	TRANS_PASSTHROUGH_TO_DEV =	4,
+	TRANS_DMA_XFER_FROM_USR =	5,
+	TRANS_DMA_XFER_TO_DEV =		6,
+	TRANS_ACTIVATE_FROM_USR =	7,
+	TRANS_ACTIVATE_FROM_DEV =	8,
+	TRANS_ACTIVATE_TO_DEV =		9,
+	TRANS_DEACTIVATE_FROM_USR =	10,
+	TRANS_DEACTIVATE_FROM_DEV =	11,
+	TRANS_STATUS_FROM_USR =		12,
+	TRANS_STATUS_TO_USR =		13,
+	TRANS_STATUS_FROM_DEV =		14,
+	TRANS_STATUS_TO_DEV =		15,
+	TRANS_TERMINATE_FROM_DEV =	16,
+	TRANS_TERMINATE_TO_DEV =	17,
+	TRANS_MAX =			18
+};
+
+struct qaic_manage_trans_hdr {
+	__u32 type; /* value from enum qaic_manage_transaction_type */
+	__u32 len;  /* length of this transaction, including the header */
+};
+
+struct qaic_manage_trans_passthrough {
+	struct qaic_manage_trans_hdr hdr;
+	__u8 data[0]; /* userspace must encode in little endian */
+};
+
+struct qaic_manage_trans_dma_xfer {
+	struct qaic_manage_trans_hdr hdr;
+	__u32 tag;
+	__u32 count;
+	__u64 addr;
+	__u64 size;
+};
+
+struct qaic_manage_trans_activate_to_dev {
+	struct qaic_manage_trans_hdr hdr;
+	__u32 queue_size; /* in number of elements */
+	__u32 eventfd;
+	__u64 resv; /* reserved for future use, must be 0 */
+};
+
+struct qaic_manage_trans_activate_from_dev {
+	struct qaic_manage_trans_hdr hdr;
+	__u32 status;
+	__u32 dbc_id; /* Identifier of assigned DMA Bridge channel */
+};
+
+struct qaic_manage_trans_deactivate {
+	struct qaic_manage_trans_hdr hdr;
+	__u32 dbc_id; /* Identifier of assigned DMA Bridge channel */
+	__u32 resv;   /* reserved for future use, must be 0 */
+};
+
+struct qaic_manage_trans_status_to_dev {
+	struct qaic_manage_trans_hdr hdr;
+};
+
+struct qaic_manage_trans_status_from_dev {
+	struct qaic_manage_trans_hdr hdr;
+	__u16 major;
+	__u16 minor;
+	__u32 status;
+};
+
+struct qaic_manage_msg {
+	__u32 len;   /* Length of valid data - i.e. the sum of all transactions */
+	__u32 count; /* Number of transactions in message */
+	__u8 data[QAIC_MANAGE_MAX_MSG_LENGTH];
+};
+
+#define QAIC_IOCTL_MANAGE_NR	0x01
+
+/*
+ * Send Manage command to the device
+ *
+ * A manage command is a message that consists of N transactions.  The set
+ * of transactions constitutes a single operation.  In most cases, a manage
+ * command is a request for the device to do something.  The entire command
+ * must be encoded into a single message.
+ *
+ * The command will be encoded into the wire format, and sent to the device.
+ * The process will then be blocked until the device responds to the message
+ * or a timeout is reached.  If a response is successfully received, it will
+ * be decoded into the provided message structure.
+ *
+ * The return value is 0 for success, or a standard error code.  Some of the
+ * possible errors:
+ *
+ * EINTR     - Kernel waiting was interrupted (i.e. the user received a signal)
+ * ETIMEDOUT - Timeout for response from device expired
+ * EINVAL    - Invalid message
+ * ENOSPC    - Ran out of space to encode the message into the wire protocol
+ * ENOMEM    - Unable to obtain memory while processing message
+ * EFAULT    - Error in accessing memory from user
+ */
+#define QAIC_IOCTL_MANAGE _IOWR('Q', QAIC_IOCTL_MANAGE_NR, struct qaic_manage_msg)
+
+#endif /* QAIC_H_ */
-- 
Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [RFC PATCH 5/8] qaic: Implement data path
  2020-05-14 14:07 [RFC PATCH 0/8] Qualcomm Cloud AI 100 driver Jeffrey Hugo
                   ` (3 preceding siblings ...)
  2020-05-14 14:07 ` [RFC PATCH 4/8] qaic: Implement control path Jeffrey Hugo
@ 2020-05-14 14:07 ` Jeffrey Hugo
  2020-05-14 14:14   ` Greg KH
  2020-05-14 21:36   ` Arnd Bergmann
  2020-05-14 14:07 ` [RFC PATCH 6/8] qaic: Implement PCI link status error handlers Jeffrey Hugo
                   ` (4 subsequent siblings)
  9 siblings, 2 replies; 39+ messages in thread
From: Jeffrey Hugo @ 2020-05-14 14:07 UTC (permalink / raw)
  To: arnd, gregkh
  Cc: manivannan.sadhasivam, bjorn.andersson, wufan, pratanan,
	linux-arm-msm, linux-kernel, Jeffrey Hugo

Once a user has configured the device via the control path for their
workload, the user may use the data path to execute the workload.  Using
the data path involves several steps.

First, the user must use the memory ioctl to allocate one or more buffers
for use by the workload.  These buffers hold input data and results.
The memory ioctl can also be used to free buffers once they are no longer
needed.

Next, the user must mmap() the buffers, to gain access to them.

To submit buffers to the device, the user uses the execute ioctl.

Finally, the user may use the wait for execute ioctl to determine when
the device has completed its handling of submitted buffers (i.e. consumed
input data, or produced output).

Signed-off-by: Jeffrey Hugo <jhugo@codeaurora.org>
---
 drivers/misc/qaic/Makefile    |   3 +-
 drivers/misc/qaic/qaic.h      |  11 +
 drivers/misc/qaic/qaic_data.c | 952 ++++++++++++++++++++++++++++++++++++++++++
 drivers/misc/qaic/qaic_drv.c  | 116 +++--
 include/uapi/misc/qaic.h      | 129 ++++++
 5 files changed, 1175 insertions(+), 36 deletions(-)
 create mode 100644 drivers/misc/qaic/qaic_data.c

diff --git a/drivers/misc/qaic/Makefile b/drivers/misc/qaic/Makefile
index 7a5513b..d17ee0a 100644
--- a/drivers/misc/qaic/Makefile
+++ b/drivers/misc/qaic/Makefile
@@ -8,4 +8,5 @@ obj-$(CONFIG_QAIC)	:= qaic.o
 qaic-y := \
 	qaic_drv.o \
 	mhi_controller.o \
-	qaic_control.o
+	qaic_control.o \
+	qaic_data.o
diff --git a/drivers/misc/qaic/qaic.h b/drivers/misc/qaic/qaic.h
index c35a4e1..5f774d4 100644
--- a/drivers/misc/qaic/qaic.h
+++ b/drivers/misc/qaic/qaic.h
@@ -8,6 +8,7 @@
 
 #include <linux/cdev.h>
 #include <linux/idr.h>
+#include <linux/interrupt.h>
 #include <linux/kref.h>
 #include <linux/mhi.h>
 #include <linux/mutex.h>
@@ -80,6 +81,14 @@ int get_cntl_version(struct qaic_device *qdev, struct qaic_user *usr,
 		     u16 *major, u16 *minor);
 int qaic_manage_ioctl(struct qaic_device *qdev, struct qaic_user *usr,
 		      unsigned long arg);
+int qaic_mem_ioctl(struct qaic_device *qdev, struct qaic_user *usr,
+		   unsigned long arg);
+int qaic_execute_ioctl(struct qaic_device *qdev, struct qaic_user *usr,
+		       unsigned long arg);
+int qaic_wait_exec_ioctl(struct qaic_device *qdev, struct qaic_user *usr,
+			 unsigned long arg);
+int qaic_data_mmap(struct qaic_device *qdev, struct qaic_user *usr,
+		   struct vm_area_struct *vma);
 
 void qaic_mhi_ul_xfer_cb(struct mhi_device *mhi_dev,
 			 struct mhi_result *mhi_result);
@@ -91,7 +100,9 @@ int qaic_control_open(struct qaic_device *qdev);
 void qaic_control_close(struct qaic_device *qdev);
 void qaic_release_usr(struct qaic_device *qdev, struct qaic_user *usr);
 
+irqreturn_t dbc_irq_handler(int irq, void *data);
 int disable_dbc(struct qaic_device *qdev, u32 dbc_id, struct qaic_user *usr);
+void wakeup_dbc(struct qaic_device *qdev, u32 dbc_id);
 void release_dbc(struct qaic_device *qdev, u32 dbc_id);
 
 void wake_all_cntl(struct qaic_device *qdev);
diff --git a/drivers/misc/qaic/qaic_data.c b/drivers/misc/qaic/qaic_data.c
new file mode 100644
index 0000000..4261a3c
--- /dev/null
+++ b/drivers/misc/qaic/qaic_data.c
@@ -0,0 +1,952 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+/* Copyright (c) 2019-2020, The Linux Foundation. All rights reserved. */
+
+#include <linux/completion.h>
+#include <linux/dma-mapping.h>
+#include <linux/idr.h>
+#include <linux/kref.h>
+#include <linux/list.h>
+#include <linux/moduleparam.h>
+#include <linux/scatterlist.h>
+#include <linux/spinlock.h>
+#include <linux/srcu.h>
+#include <linux/types.h>
+#include <linux/uaccess.h>
+#include <linux/wait.h>
+#include <uapi/misc/qaic.h>
+
+#include "qaic.h"
+
+#define PGOFF_DBC_SHIFT 32
+#define PGOFF_DBC_MASK	GENMASK_ULL(63, 32)
+#define SEM_VAL_MASK	GENMASK_ULL(11, 0)
+#define SEM_INDEX_MASK	GENMASK_ULL(4, 0)
+#define BULK_XFER	BIT(3)
+#define GEN_COMPLETION	BIT(4)
+#define INBOUND_XFER	1
+#define OUTBOUND_XFER	2
+#define REQHP_OFF	0x0 /* we read this */
+#define REQTP_OFF	0x4 /* we write this */
+#define RSPHP_OFF	0x8 /* we write this */
+#define RSPTP_OFF	0xc /* we read this */
+
+#define ENCODE_SEM(val, index, sync, cmd, flags)			\
+			((val) |					\
+			(index) << 16 |					\
+			(sync) << 22 |					\
+			(cmd) << 24 |					\
+			((cmd) ? BIT(31) : 0) |				\
+			(((flags) & SEM_INSYNCFENCE) ? BIT(30) : 0) |	\
+			(((flags) & SEM_OUTSYNCFENCE) ? BIT(29) : 0))
+
+static unsigned int wait_exec_default_timeout = 5000; /* 5 sec default */
+module_param(wait_exec_default_timeout, uint, 0600);
+
+struct dbc_req { /* everything must be little endian encoded */
+	u16	req_id;
+	u8	seq_id;
+	u8	cmd;
+	u32	resv;
+	u64	src_addr;
+	u64	dest_addr;
+	u32	len;
+	u32	resv2;
+	u64	db_addr; /* doorbell address */
+	u8	db_len; /* not a raw value, special encoding */
+	u8	resv3;
+	u16	resv4;
+	u32	db_data;
+	u32	sem_cmd0;
+	u32	sem_cmd1;
+	u32	sem_cmd2;
+	u32	sem_cmd3;
+} __packed;
+
+struct dbc_rsp { /* everything must be little endian encoded */
+	u16	req_id;
+	u16	status;
+} __packed;
+
+struct mem_handle {
+	struct sg_table		*sgt;  /* Mapped pages */
+	int			nents; /* num dma mapped elements in sgt */
+	int			dir;   /* see enum dma_data_direction */
+	struct dbc_req		*reqs;
+	struct list_head	list;
+	u16			req_id;/* req_id for the xfer while in flight */
+	struct completion	xfer_done;
+	struct kref		ref_count;
+	struct qaic_device	*qdev;
+	bool			queued;
+	bool			no_xfer;
+};
+
+inline int get_dbc_req_elem_size(void)
+{
+	return sizeof(struct dbc_req);
+}
+
+inline int get_dbc_rsp_elem_size(void)
+{
+	return sizeof(struct dbc_rsp);
+}
+
+static int reserve_pages(unsigned long start_pfn, unsigned long nr_pages,
+			 bool reserve)
+{
+	unsigned long pfn;
+	unsigned long end_pfn = start_pfn + nr_pages;
+	struct page *page;
+
+	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
+		if (!pfn_valid(pfn))
+			return -EINVAL;
+		page = pfn_to_page(pfn);
+		if (reserve)
+			SetPageReserved(page);
+		else
+			ClearPageReserved(page);
+	}
+	return 0;
+}
+
+static int alloc_handle(struct qaic_device *qdev, struct qaic_mem_req *req)
+{
+	struct mem_handle *mem;
+	struct scatterlist *sg;
+	struct sg_table *sgt;
+	struct page *page;
+	int buf_extra;
+	int max_order;
+	int nr_pages;
+	int order;
+	int nents;
+	int ret;
+
+	if (!(req->dir == DMA_TO_DEVICE || req->dir == DMA_FROM_DEVICE ||
+	      req->dir == DMA_BIDIRECTIONAL)) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	mem = kmalloc(sizeof(*mem), GFP_KERNEL);
+	if (!mem) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	if (req->size) {
+		nr_pages = DIV_ROUND_UP(req->size, PAGE_SIZE);
+		/*
+		 * calculate how much extra we are going to allocate, to remove
+		 * later
+		 */
+		buf_extra = (PAGE_SIZE - req->size % PAGE_SIZE) % PAGE_SIZE;
+		max_order = min(MAX_ORDER, get_order(req->size));
+		mem->no_xfer = false;
+	} else {
+		/* allocate a single page for bookkeeping */
+		nr_pages = 1;
+		buf_extra = 0;
+		max_order = 0;
+		mem->no_xfer = true;
+	}
+
+	sgt = kmalloc(sizeof(*sgt), GFP_KERNEL);
+	if (!sgt) {
+		ret = -ENOMEM;
+		goto free_mem;
+	}
+
+	if (sg_alloc_table(sgt, nr_pages, GFP_KERNEL)) {
+		ret = -ENOMEM;
+		goto free_sgt;
+	}
+
+	sg = sgt->sgl;
+	sgt->nents = 0;
+
+	/*
+	 * Try to allocate enough pages to cover the request.  High order pages
+	 * will be contiguous, which will be conducive to DMA.
+	 */
+	while (1) {
+		order = min(fls(nr_pages) - 1, max_order);
+		while (1) {
+			page = alloc_pages(GFP_KERNEL | GFP_HIGHUSER |
+					   __GFP_NOWARN | __GFP_ZERO |
+					   (order ? __GFP_NORETRY :
+							__GFP_RETRY_MAYFAIL),
+					   order);
+			if (page)
+				break;
+			if (!order--) {
+				sg_set_page(sg, NULL, 0, 0);
+				sg_mark_end(sg);
+				ret = -ENOMEM;
+				goto free_partial_alloc;
+			}
+			max_order = order;
+		}
+
+		ret = reserve_pages(page_to_pfn(page), 1 << order, true);
+		if (ret)
+			goto free_partial_alloc;
+
+		sg_set_page(sg, page, PAGE_SIZE << order, 0);
+		sgt->nents++;
+		nr_pages -= 1 << order;
+		if (!nr_pages) {
+			if (buf_extra)
+				sg_set_page(sg, page,
+					    (PAGE_SIZE << order) - buf_extra,
+					    0);
+			sg_mark_end(sg);
+			break;
+		}
+		sg = sg_next(sg);
+	}
+
+	nents = dma_map_sg(&qdev->pdev->dev, sgt->sgl, sgt->nents, req->dir);
+	if (!nents) {
+		ret = -EFAULT;
+		goto free_partial_alloc;
+	}
+
+	if (req->dir == DMA_TO_DEVICE || req->dir == DMA_BIDIRECTIONAL)
+		dma_sync_sg_for_cpu(&qdev->pdev->dev, sgt->sgl, sgt->nents,
+				    req->dir);
+
+	mem->reqs = kcalloc(nents, sizeof(*mem->reqs), GFP_KERNEL);
+	if (!mem->reqs) {
+		ret = -ENOMEM;
+		goto req_alloc_fail;
+	}
+
+	mem->sgt = sgt;
+	mem->nents = nents;
+	mem->dir = req->dir;
+	mem->qdev = qdev;
+	mem->queued = false;
+
+	ret = mutex_lock_interruptible(&qdev->dbc[req->dbc_id].mem_lock);
+	if (ret)
+		goto lock_fail;
+	ret = idr_alloc(&qdev->dbc[req->dbc_id].mem_handles, mem, 1, 0,
+		       GFP_KERNEL);
+	mutex_unlock(&qdev->dbc[req->dbc_id].mem_lock);
+	if (ret < 0)
+		goto lock_fail;
+
+	req->handle = ret | (u64)req->dbc_id << PGOFF_DBC_SHIFT;
+	/*
+	 * When userspace uses the handle as the offset parameter to mmap,
+	 * it needs to be in multiples of PAGE_SIZE.
+	 */
+	req->handle <<= PAGE_SHIFT;
+
+	kref_init(&mem->ref_count);
+
+	return 0;
+
+lock_fail:
+	kfree(mem->reqs);
+req_alloc_fail:
+	dma_unmap_sg(&qdev->pdev->dev, sgt->sgl, sgt->nents, req->dir);
+free_partial_alloc:
+	for (sg = sgt->sgl; sg; sg = sg_next(sg))
+		if (sg_page(sg)) {
+			reserve_pages(page_to_pfn(sg_page(sg)),
+				      DIV_ROUND_UP(sg->length, PAGE_SIZE),
+				      false);
+			__free_pages(sg_page(sg), get_order(sg->length));
+		}
+	sg_free_table(sgt);
+free_sgt:
+	kfree(sgt);
+free_mem:
+	kfree(mem);
+out:
+	return ret;
+}
+
+static void free_handle_mem(struct kref *kref)
+{
+	struct mem_handle *mem = container_of(kref, struct mem_handle,
+					      ref_count);
+	struct scatterlist *sg;
+	struct sg_table *sgt;
+
+	sgt = mem->sgt;
+	dma_unmap_sg(&mem->qdev->pdev->dev, sgt->sgl, sgt->nents, mem->dir);
+	for (sg = sgt->sgl; sg; sg = sg_next(sg))
+		if (sg_page(sg)) {
+			reserve_pages(page_to_pfn(sg_page(sg)),
+				      DIV_ROUND_UP(sg->length, PAGE_SIZE),
+				      false);
+			__free_pages(sg_page(sg), get_order(sg->length));
+		}
+	sg_free_table(sgt);
+	kfree(sgt);
+	kfree(mem->reqs);
+	kfree(mem);
+}
+
+static int free_handle(struct qaic_device *qdev, struct qaic_mem_req *req)
+{
+	struct mem_handle *mem;
+	unsigned long flags;
+	int handle;
+	int dbc_id;
+	int ret;
+
+	handle = req->handle & ~PGOFF_DBC_MASK;
+	dbc_id = (req->handle & PGOFF_DBC_MASK) >> PGOFF_DBC_SHIFT;
+
+	/* we shifted up by PAGE_SHIFT to make mmap happy, need to undo that */
+	handle >>= PAGE_SHIFT;
+	dbc_id >>= PAGE_SHIFT;
+
+	if (dbc_id != req->dbc_id)
+		return -EINVAL;
+
+	ret = mutex_lock_interruptible(&qdev->dbc[dbc_id].mem_lock);
+	if (ret)
+		goto lock_fail;
+	mem = idr_find(&qdev->dbc[dbc_id].mem_handles, handle);
+	if (mem) {
+		spin_lock_irqsave(&qdev->dbc[dbc_id].xfer_lock, flags);
+		if (mem->queued)
+			ret = -EINVAL;
+		else
+			idr_remove(&qdev->dbc[dbc_id].mem_handles, handle);
+		spin_unlock_irqrestore(&qdev->dbc[dbc_id].xfer_lock, flags);
+	} else {
+		ret = -ENODEV;
+	}
+	mutex_unlock(&qdev->dbc[dbc_id].mem_lock);
+	if (ret)
+		goto lock_fail;
+
+	kref_put(&mem->ref_count, free_handle_mem);
+
+	ret = 0;
+
+lock_fail:
+	return ret;
+}
+
+int qaic_mem_ioctl(struct qaic_device *qdev, struct qaic_user *usr,
+		   unsigned long arg)
+{
+	struct qaic_mem_req req;
+	int rcu_id;
+	int ret;
+
+	if (copy_from_user(&req, (void __user *)arg, sizeof(req))) {
+		ret = -EFAULT;
+		goto out;
+	}
+
+	if (req.dbc_id >= QAIC_NUM_DBC || req.resv) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	rcu_id = srcu_read_lock(&qdev->dbc[req.dbc_id].ch_lock);
+	if (!qdev->dbc[req.dbc_id].usr ||
+	    usr->handle != qdev->dbc[req.dbc_id].usr->handle) {
+		ret = -EPERM;
+		goto release_rcu;
+	}
+
+	if (!req.handle) {
+		ret = alloc_handle(qdev, &req);
+		if (!ret && copy_to_user((void __user *)arg, &req,
+					 sizeof(req))) {
+			ret = -EFAULT;
+			free_handle(qdev, &req);
+			goto release_rcu;
+		}
+	} else {
+		ret = free_handle(qdev, &req);
+	}
+
+release_rcu:
+	srcu_read_unlock(&qdev->dbc[req.dbc_id].ch_lock, rcu_id);
+out:
+	return ret;
+}
+
+int qaic_data_mmap(struct qaic_device *qdev, struct qaic_user *usr,
+		   struct vm_area_struct *vma)
+{
+	unsigned long offset = 0;
+	struct mem_handle *mem;
+	struct scatterlist *sg;
+	int handle;
+	int dbc_id;
+	int rcu_id;
+	int ret;
+
+	dbc_id = (vma->vm_pgoff & PGOFF_DBC_MASK) >> PGOFF_DBC_SHIFT;
+	handle = vma->vm_pgoff & ~PGOFF_DBC_MASK;
+
+	if (dbc_id >= QAIC_NUM_DBC) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	rcu_id = srcu_read_lock(&qdev->dbc[dbc_id].ch_lock);
+	if (!qdev->dbc[dbc_id].usr ||
+	    usr->handle != qdev->dbc[dbc_id].usr->handle) {
+		ret = -EPERM;
+		goto release_rcu;
+	}
+
+	ret = mutex_lock_interruptible(&qdev->dbc[dbc_id].mem_lock);
+	if (ret)
+		goto release_rcu;
+	mem = idr_find(&qdev->dbc[dbc_id].mem_handles, handle);
+	mutex_unlock(&qdev->dbc[dbc_id].mem_lock);
+	if (!mem) {
+		ret = -ENODEV;
+		goto release_rcu;
+	}
+
+	if (mem->no_xfer) {
+		ret = -EINVAL;
+		goto release_rcu;
+	}
+
+	for (sg = mem->sgt->sgl; sg; sg = sg_next(sg)) {
+		if (sg_page(sg)) {
+			ret = remap_pfn_range(vma, vma->vm_start + offset,
+					      page_to_pfn(sg_page(sg)),
+					      sg->length, vma->vm_page_prot);
+			if (ret)
+				goto release_rcu;
+			offset += sg->length;
+		}
+	}
+
+release_rcu:
+	srcu_read_unlock(&qdev->dbc[dbc_id].ch_lock, rcu_id);
+out:
+	return ret;
+}
+
+static bool invalid_sem(struct qaic_sem *sem)
+{
+	if (sem->val & ~SEM_VAL_MASK || sem->index & ~SEM_INDEX_MASK ||
+	    !(sem->presync == 0 || sem->presync == 1) || sem->resv ||
+	    sem->flags & ~(SEM_INSYNCFENCE | SEM_OUTSYNCFENCE) ||
+	    sem->cmd > SEM_WAIT_GT_0)
+		return true;
+	return false;
+}
+
+static int encode_execute(struct qaic_device *qdev, struct mem_handle *mem,
+			  struct qaic_execute *exec, u16 req_id)
+{
+	u8 cmd = BULK_XFER;
+	u64 db_addr = cpu_to_le64(exec->db_addr);
+	u8 db_len;
+	u32 db_data = cpu_to_le32(exec->db_data);
+	struct scatterlist *sg;
+	u64 dev_addr;
+	int presync_sem;
+	int i;
+
+	if (!mem->no_xfer)
+		cmd |= (exec->dir == DMA_TO_DEVICE ? INBOUND_XFER :
+								OUTBOUND_XFER);
+
+	req_id = cpu_to_le16(req_id);
+
+	if (exec->db_len && !IS_ALIGNED(exec->db_addr, exec->db_len / 8))
+		return -EINVAL;
+
+	presync_sem = exec->sem0.presync + exec->sem1.presync +
+		      exec->sem2.presync + exec->sem3.presync;
+	if (presync_sem > 1)
+		return -EINVAL;
+
+	presync_sem = exec->sem0.presync << 0 | exec->sem1.presync << 1 |
+		      exec->sem2.presync << 2 | exec->sem3.presync << 3;
+
+	switch (exec->db_len) {
+	case 32:
+		db_len = BIT(7);
+		break;
+	case 16:
+		db_len = BIT(7) | 1;
+		break;
+	case 8:
+		db_len = BIT(7) | 2;
+		break;
+	case 0:
+		db_len = 0; /* doorbell is not active for this command */
+		break;
+	default:
+		return -EINVAL; /* should never hit this */
+	}
+
+	/*
+	 * When we end up splitting up a single request (i.e. a mem handle) into
+	 * multiple DMA requests, we have to manage the sync data carefully.
+	 * There can only be one presync sem.  That needs to be on every xfer
+	 * so that the DMA engine doesn't transfer data before the receiver is
+	 * ready.  We only do the doorbell and postsync sems after the xfer.
+	 * To guarantee previous xfers for the request are complete, we use a
+	 * fence.
+	 */
+	dev_addr = exec->dev_addr;
+	for_each_sg(mem->sgt->sgl, sg, mem->nents, i) {
+		mem->reqs[i].req_id = req_id;
+		mem->reqs[i].cmd = cmd;
+		mem->reqs[i].src_addr =
+			cpu_to_le64(exec->dir == DMA_TO_DEVICE ?
+					sg_dma_address(sg) : dev_addr);
+		mem->reqs[i].dest_addr =
+			cpu_to_le64(exec->dir == DMA_TO_DEVICE ?
+					dev_addr : sg_dma_address(sg));
+		mem->reqs[i].len = cpu_to_le32(sg_dma_len(sg));
+		switch (presync_sem) {
+		case BIT(0):
+			mem->reqs[i].sem_cmd0 = cpu_to_le32(
+						ENCODE_SEM(exec->sem0.val,
+							exec->sem0.index,
+							exec->sem0.presync,
+							exec->sem0.cmd,
+							exec->sem0.flags));
+			break;
+		case BIT(1):
+			mem->reqs[i].sem_cmd1 = cpu_to_le32(
+						ENCODE_SEM(exec->sem1.val,
+							exec->sem1.index,
+							exec->sem1.presync,
+							exec->sem1.cmd,
+							exec->sem1.flags));
+			break;
+		case BIT(2):
+			mem->reqs[i].sem_cmd2 = cpu_to_le32(
+						ENCODE_SEM(exec->sem2.val,
+							exec->sem2.index,
+							exec->sem2.presync,
+							exec->sem2.cmd,
+							exec->sem2.flags));
+			break;
+		case BIT(3):
+			mem->reqs[i].sem_cmd3 = cpu_to_le32(
+						ENCODE_SEM(exec->sem3.val,
+							exec->sem3.index,
+							exec->sem3.presync,
+							exec->sem3.cmd,
+							exec->sem3.flags));
+			break;
+		}
+		dev_addr += sg_dma_len(sg);
+	}
+	/* add post transfer stuff to last segment */
+	i--;
+	mem->reqs[i].cmd |= GEN_COMPLETION;
+	mem->reqs[i].db_addr = db_addr;
+	mem->reqs[i].db_len = db_len;
+	mem->reqs[i].db_data = db_data;
+	exec->sem0.flags |= (exec->dir == DMA_TO_DEVICE ? SEM_INSYNCFENCE :
+							SEM_OUTSYNCFENCE);
+	mem->reqs[i].sem_cmd0 = cpu_to_le32(ENCODE_SEM(exec->sem0.val,
+						       exec->sem0.index,
+						       exec->sem0.presync,
+						       exec->sem0.cmd,
+						       exec->sem0.flags));
+	mem->reqs[i].sem_cmd1 = cpu_to_le32(ENCODE_SEM(exec->sem1.val,
+						       exec->sem1.index,
+						       exec->sem1.presync,
+						       exec->sem1.cmd,
+						       exec->sem1.flags));
+	mem->reqs[i].sem_cmd2 = cpu_to_le32(ENCODE_SEM(exec->sem2.val,
+						       exec->sem2.index,
+						       exec->sem2.presync,
+						       exec->sem2.cmd,
+						       exec->sem2.flags));
+	mem->reqs[i].sem_cmd3 = cpu_to_le32(ENCODE_SEM(exec->sem3.val,
+						       exec->sem3.index,
+						       exec->sem3.presync,
+						       exec->sem3.cmd,
+						       exec->sem3.flags));
+
+	return 0;
+}
+
+static int commit_execute(struct qaic_device *qdev, struct mem_handle *mem,
+			  u32 dbc_id)
+{
+	struct dma_bridge_chan *dbc = &qdev->dbc[dbc_id];
+	u32 head = le32_to_cpu(__raw_readl(dbc->dbc_base + REQHP_OFF));
+	u32 tail = le32_to_cpu(__raw_readl(dbc->dbc_base + REQTP_OFF));
+	u32 avail = head - tail;
+	struct dbc_req *reqs = mem->reqs;
+	bool two_copy = tail + mem->nents > dbc->nelem;
+
+	if (head == U32_MAX || tail == U32_MAX)
+		/* PCI link error */
+		return -ENODEV;
+
+	if (head <= tail)
+		avail += dbc->nelem;
+
+	--avail;
+
+	if (avail < mem->nents)
+		return -EAGAIN;
+
+	if (two_copy) {
+		avail = dbc->nelem - tail;
+		avail = min_t(u32, avail, mem->nents);
+		memcpy(dbc->req_q_base + tail * get_dbc_req_elem_size(),
+		       reqs, sizeof(*reqs) * avail);
+		reqs += avail;
+		avail = mem->nents - avail;
+		if (avail)
+			memcpy(dbc->req_q_base, reqs, sizeof(*reqs) * avail);
+	} else {
+		memcpy(dbc->req_q_base + tail * get_dbc_req_elem_size(),
+		       reqs, sizeof(*reqs) * mem->nents);
+	}
+
+	init_completion(&mem->xfer_done);
+	list_add_tail(&mem->list, &dbc->xfer_list);
+	tail = (tail + mem->nents) % dbc->nelem;
+	__raw_writel(cpu_to_le32(tail), dbc->dbc_base + REQTP_OFF);
+	return 0;
+}
+
+int qaic_execute_ioctl(struct qaic_device *qdev, struct qaic_user *usr,
+		       unsigned long arg)
+{
+	struct mem_handle *mem;
+	struct qaic_execute *exec;
+	unsigned long flags;
+	bool queued;
+	u16 req_id;
+	int handle;
+	int dbc_id;
+	int rcu_id;
+	int ret;
+
+	exec = kmalloc(sizeof(*exec), GFP_KERNEL);
+	if (!exec) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	if (copy_from_user(exec, (void __user *)arg, sizeof(*exec))) {
+		ret = -EFAULT;
+		goto free_exec;
+	}
+
+	if (exec->dbc_id >= QAIC_NUM_DBC || exec->ver != 1 ||
+	    !(exec->dir == 1 || exec->dir == 2) ||
+	    !(exec->db_len == 32 || exec->db_len == 16 || exec->db_len == 8 ||
+	      exec->db_len == 0) ||
+	    invalid_sem(&exec->sem0) || invalid_sem(&exec->sem1) ||
+	    invalid_sem(&exec->sem2) || invalid_sem(&exec->sem3) ||
+	    exec->resv) {
+		ret = -EINVAL;
+		goto free_exec;
+	}
+
+	rcu_id = srcu_read_lock(&qdev->dbc[exec->dbc_id].ch_lock);
+	if (!qdev->dbc[exec->dbc_id].usr ||
+	    qdev->dbc[exec->dbc_id].usr->handle != usr->handle) {
+		ret = -EPERM;
+		goto release_rcu;
+	}
+
+	handle = exec->handle & ~PGOFF_DBC_MASK;
+	dbc_id = (exec->handle & PGOFF_DBC_MASK) >> PGOFF_DBC_SHIFT;
+
+	/* we shifted up by PAGE_SHIFT to make mmap happy, need to undo that */
+	handle >>= PAGE_SHIFT;
+	dbc_id >>= PAGE_SHIFT;
+
+	if (dbc_id != exec->dbc_id) {
+		ret = -EINVAL;
+		goto release_rcu;
+	}
+
+	ret = mutex_lock_interruptible(&qdev->dbc[exec->dbc_id].mem_lock);
+	if (ret)
+		goto release_rcu;
+	mem = idr_find(&qdev->dbc[exec->dbc_id].mem_handles, handle);
+	if (!mem) {
+		ret = -ENODEV;
+		mutex_unlock(&qdev->dbc[exec->dbc_id].mem_lock);
+		goto release_rcu;
+	}
+	/* prevent free_handle from taking the memory from under us */
+	kref_get(&mem->ref_count);
+	mutex_unlock(&qdev->dbc[exec->dbc_id].mem_lock);
+
+	if (mem->dir != DMA_BIDIRECTIONAL && mem->dir != exec->dir) {
+		ret = -EINVAL;
+		kref_put(&mem->ref_count, free_handle_mem);
+		goto release_rcu;
+	}
+
+	spin_lock_irqsave(&qdev->dbc[exec->dbc_id].xfer_lock, flags);
+	req_id = qdev->dbc[exec->dbc_id].next_req_id++;
+	queued = mem->queued;
+	mem->queued = true;
+	spin_unlock_irqrestore(&qdev->dbc[exec->dbc_id].xfer_lock, flags);
+	mem->req_id = req_id;
+
+	if (queued) {
+		ret = -EINVAL;
+		kref_put(&mem->ref_count, free_handle_mem);
+		goto release_rcu;
+	}
+
+	ret = encode_execute(qdev, mem, exec, req_id);
+	if (ret) {
+		mem->queued = false;
+		kref_put(&mem->ref_count, free_handle_mem);
+		goto release_rcu;
+	}
+
+	dma_sync_sg_for_device(&qdev->pdev->dev, mem->sgt->sgl, mem->sgt->nents,
+			       mem->dir);
+
+	spin_lock_irqsave(&qdev->dbc[exec->dbc_id].xfer_lock, flags);
+	ret = commit_execute(qdev, mem, exec->dbc_id);
+	spin_unlock_irqrestore(&qdev->dbc[exec->dbc_id].xfer_lock, flags);
+	if (ret) {
+		mem->queued = false;
+		kref_put(&mem->ref_count, free_handle_mem);
+		goto sync_to_cpu;
+	}
+
+	goto release_rcu;
+
+sync_to_cpu:
+	dma_sync_sg_for_cpu(&qdev->pdev->dev, mem->sgt->sgl, mem->sgt->nents,
+			    mem->dir);
+release_rcu:
+	srcu_read_unlock(&qdev->dbc[exec->dbc_id].ch_lock, rcu_id);
+free_exec:
+	kfree(exec);
+out:
+	return ret;
+}
+
+irqreturn_t dbc_irq_handler(int irq, void *data)
+{
+	struct dma_bridge_chan *dbc = data;
+	struct qaic_device *qdev = dbc->qdev;
+	struct mem_handle *mem;
+	struct mem_handle *i;
+	struct dbc_rsp *rsp;
+	unsigned long flags;
+	int rcu_id;
+	u16 status;
+	u16 req_id;
+	u32 head;
+	u32 tail;
+
+	rcu_id = srcu_read_lock(&dbc->ch_lock);
+read_fifo:
+	/*
+	 * if this channel isn't assigned or gets unassigned during processing
+	 * we have nothing further to do
+	 */
+	if (!dbc->usr) {
+		srcu_read_unlock(&dbc->ch_lock, rcu_id);
+		return IRQ_HANDLED;
+	}
+
+	head = le32_to_cpu(__raw_readl(dbc->dbc_base + RSPHP_OFF));
+	tail = le32_to_cpu(__raw_readl(dbc->dbc_base + RSPTP_OFF));
+
+	if (head == U32_MAX || tail == U32_MAX) {
+		/* PCI link error */
+		srcu_read_unlock(&dbc->ch_lock, rcu_id);
+		return IRQ_HANDLED;
+	}
+
+	if (head == tail) { /* queue empty */
+		srcu_read_unlock(&dbc->ch_lock, rcu_id);
+		return IRQ_HANDLED;
+	}
+
+	while (head != tail) {
+		rsp = dbc->rsp_q_base + head * sizeof(*rsp);
+		req_id = le16_to_cpu(rsp->req_id);
+		status = le16_to_cpu(rsp->status);
+		if (status)
+			pci_dbg(qdev->pdev, "req_id %d failed with status %d\n",
+				req_id, status);
+		spin_lock_irqsave(&dbc->xfer_lock, flags);
+		list_for_each_entry_safe(mem, i, &dbc->xfer_list, list) {
+			if (mem->req_id == req_id) {
+				list_del(&mem->list);
+				dma_sync_sg_for_cpu(&qdev->pdev->dev,
+						    mem->sgt->sgl,
+						    mem->sgt->nents,
+						    mem->dir);
+				mem->queued = false;
+				complete_all(&mem->xfer_done);
+				kref_put(&mem->ref_count, free_handle_mem);
+				break;
+			}
+		}
+		spin_unlock_irqrestore(&dbc->xfer_lock, flags);
+		head = (head + 1) % dbc->nelem;
+		__raw_writel(cpu_to_le32(head), dbc->dbc_base + RSPHP_OFF);
+	}
+
+	/* elements might have been put in the queue while we were processing */
+	goto read_fifo;
+}
+
+int qaic_wait_exec_ioctl(struct qaic_device *qdev, struct qaic_user *usr,
+			 unsigned long arg)
+{
+	struct mem_handle *mem;
+	struct qaic_wait_exec *wait;
+	unsigned int timeout;
+	int handle;
+	int dbc_id;
+	int rcu_id;
+	int ret;
+
+	wait = kmalloc(sizeof(*wait), GFP_KERNEL);
+	if (!wait) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	if (copy_from_user(wait, (void __user *)arg, sizeof(*wait))) {
+		ret = -EFAULT;
+		goto free_wait;
+	}
+
+	if (wait->resv) {
+		ret = -EINVAL;
+		goto free_wait;
+	}
+
+	handle = wait->handle & ~PGOFF_DBC_MASK;
+	dbc_id = (wait->handle & PGOFF_DBC_MASK) >> PGOFF_DBC_SHIFT;
+
+	/* we shifted up by PAGE_SHIFT to make mmap happy, need to undo that */
+	handle >>= PAGE_SHIFT;
+	dbc_id >>= PAGE_SHIFT;
+
+	if (dbc_id >= QAIC_NUM_DBC) {
+		ret = -EINVAL;
+		goto free_wait;
+	}
+
+	rcu_id = srcu_read_lock(&qdev->dbc[dbc_id].ch_lock);
+	if (!qdev->dbc[dbc_id].usr ||
+	    qdev->dbc[dbc_id].usr->handle != usr->handle) {
+		ret = -EPERM;
+		goto release_rcu;
+	}
+
+	ret = mutex_lock_interruptible(&qdev->dbc[dbc_id].mem_lock);
+	if (ret)
+		goto release_rcu;
+	mem = idr_find(&qdev->dbc[dbc_id].mem_handles, handle);
+	mutex_unlock(&qdev->dbc[dbc_id].mem_lock);
+	if (!mem) {
+		ret = -ENODEV;
+		goto release_rcu;
+	}
+
+	/* we don't want the mem handle freed under us in case of deactivate */
+	kref_get(&mem->ref_count);
+	srcu_read_unlock(&qdev->dbc[dbc_id].ch_lock, rcu_id);
+	timeout = wait->timeout ? wait->timeout : wait_exec_default_timeout;
+	ret = wait_for_completion_interruptible_timeout(&mem->xfer_done,
+				msecs_to_jiffies(timeout));
+	rcu_id = srcu_read_lock(&qdev->dbc[dbc_id].ch_lock);
+	if (!ret)
+		ret = -ETIMEDOUT;
+	else if (ret > 0)
+		ret = 0;
+	if (!qdev->dbc[dbc_id].usr) {
+		ret = -EPERM;
+		goto release_rcu;
+	}
+
+	kref_put(&mem->ref_count, free_handle_mem);
+
+release_rcu:
+	srcu_read_unlock(&qdev->dbc[dbc_id].ch_lock, rcu_id);
+free_wait:
+	kfree(wait);
+out:
+	return ret;
+}
+
+int disable_dbc(struct qaic_device *qdev, u32 dbc_id, struct qaic_user *usr)
+{
+	if (!qdev->dbc[dbc_id].usr ||
+	    qdev->dbc[dbc_id].usr->handle != usr->handle)
+		return -EPERM;
+
+	qdev->dbc[dbc_id].usr = NULL;
+	synchronize_srcu(&qdev->dbc[dbc_id].ch_lock);
+	return 0;
+}
+
+void wakeup_dbc(struct qaic_device *qdev, u32 dbc_id)
+{
+	struct mem_handle *mem;
+	struct mem_handle *i;
+
+	qdev->dbc[dbc_id].usr = NULL;
+	synchronize_srcu(&qdev->dbc[dbc_id].ch_lock);
+	list_for_each_entry_safe(mem, i, &qdev->dbc[dbc_id].xfer_list, list) {
+		list_del(&mem->list);
+		dma_sync_sg_for_cpu(&qdev->pdev->dev,
+				    mem->sgt->sgl,
+				    mem->sgt->nents,
+				    mem->dir);
+		complete_all(&mem->xfer_done);
+	}
+}
+
+void release_dbc(struct qaic_device *qdev, u32 dbc_id)
+{
+	struct mem_handle *mem;
+	int next_id = 0;
+
+	wakeup_dbc(qdev, dbc_id);
+
+	dma_free_coherent(&qdev->pdev->dev, qdev->dbc[dbc_id].total_size,
+			  qdev->dbc[dbc_id].req_q_base,
+			  qdev->dbc[dbc_id].dma_addr);
+	qdev->dbc[dbc_id].total_size = 0;
+	qdev->dbc[dbc_id].req_q_base = NULL;
+	qdev->dbc[dbc_id].dma_addr = 0;
+	qdev->dbc[dbc_id].nelem = 0;
+	qdev->dbc[dbc_id].usr = NULL;
+	while (1) {
+		mem = idr_get_next(&qdev->dbc[dbc_id].mem_handles, &next_id);
+		if (!mem)
+			break;
+		idr_remove(&qdev->dbc[dbc_id].mem_handles, next_id);
+		/* account for the missing put from the irq handler */
+		if (mem->queued) {
+			mem->queued = false;
+			kref_put(&mem->ref_count, free_handle_mem);
+		}
+		kref_put(&mem->ref_count, free_handle_mem);
+	}
+	qdev->dbc[dbc_id].in_use = false;
+	wake_up(&qdev->dbc[dbc_id].dbc_release);
+}
diff --git a/drivers/misc/qaic/qaic_drv.c b/drivers/misc/qaic/qaic_drv.c
index 6feecc0..800769d 100644
--- a/drivers/misc/qaic/qaic_drv.c
+++ b/drivers/misc/qaic/qaic_drv.c
@@ -37,6 +37,7 @@ static bool link_up;
 static int qaic_device_open(struct inode *inode, struct file *filp);
 static int qaic_device_release(struct inode *inode, struct file *filp);
 static long qaic_ioctl(struct file *filp, unsigned int cmd, unsigned long arg);
+static int qaic_mmap(struct file *filp, struct vm_area_struct *vma);
 
 static const struct file_operations qaic_ops = {
 	.owner = THIS_MODULE,
@@ -44,6 +45,7 @@ static const struct file_operations qaic_ops = {
 	.release = qaic_device_release,
 	.unlocked_ioctl = qaic_ioctl,
 	.compat_ioctl = qaic_ioctl,
+	.mmap = qaic_mmap,
 };
 
 static void free_usr(struct kref *kref)
@@ -109,6 +111,7 @@ static int qaic_device_release(struct inode *inode, struct file *filp)
 	struct qaic_device *qdev = usr->qdev;
 	int qdev_rcu_id;
 	int usr_rcu_id;
+	int i;
 
 	usr_rcu_id = srcu_read_lock(&usr->qdev_lock);
 	if (qdev) {
@@ -117,6 +120,10 @@ static int qaic_device_release(struct inode *inode, struct file *filp)
 			pci_dbg(qdev->pdev, "%s pid:%d\n", __func__,
 								current->pid);
 			qaic_release_usr(qdev, usr);
+			for (i = 0; i < QAIC_NUM_DBC; ++i)
+				if (qdev->dbc[i].usr &&
+				    qdev->dbc[i].usr->handle == usr->handle)
+					release_dbc(qdev, i);
 		}
 		srcu_read_unlock(&qdev->dev_lock, qdev_rcu_id);
 
@@ -171,6 +178,30 @@ static long qaic_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 		}
 		ret = qaic_manage_ioctl(qdev, usr, arg);
 		break;
+	case QAIC_IOCTL_MEM_NR:
+		if (_IOC_DIR(cmd) != (_IOC_READ | _IOC_WRITE) ||
+		    _IOC_SIZE(cmd) != sizeof(struct qaic_mem_req)) {
+			ret = -EINVAL;
+			break;
+		}
+		ret = qaic_mem_ioctl(qdev, usr, arg);
+		break;
+	case QAIC_IOCTL_EXECUTE_NR:
+		if (_IOC_DIR(cmd) != _IOC_WRITE ||
+		    _IOC_SIZE(cmd) != sizeof(struct qaic_execute)) {
+			ret = -EINVAL;
+			break;
+		}
+		ret = qaic_execute_ioctl(qdev, usr, arg);
+		break;
+	case QAIC_IOCTL_WAIT_EXEC_NR:
+		if (_IOC_DIR(cmd) != _IOC_WRITE ||
+		    _IOC_SIZE(cmd) != sizeof(struct qaic_wait_exec)) {
+			ret = -EINVAL;
+			break;
+		}
+		ret = qaic_wait_exec_ioctl(qdev, usr, arg);
+		break;
 	default:
 		ret = -ENOTTY;
 	}
@@ -180,6 +211,34 @@ static long qaic_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 	return ret;
 }
 
+static int qaic_mmap(struct file *filp, struct vm_area_struct *vma)
+{
+	struct qaic_user *usr = filp->private_data;
+	struct qaic_device *qdev = usr->qdev;
+	int qdev_rcu_id;
+	int usr_rcu_id;
+	int ret;
+
+	usr_rcu_id = srcu_read_lock(&usr->qdev_lock);
+	if (!qdev) {
+		srcu_read_unlock(&usr->qdev_lock, usr_rcu_id);
+		return -ENODEV;
+	}
+
+	qdev_rcu_id = srcu_read_lock(&qdev->dev_lock);
+	if (qdev->in_reset) {
+		srcu_read_unlock(&qdev->dev_lock, qdev_rcu_id);
+		srcu_read_unlock(&usr->qdev_lock, usr_rcu_id);
+		return -ENODEV;
+	}
+
+	ret = qaic_data_mmap(qdev, usr, vma);
+
+	srcu_read_unlock(&qdev->dev_lock, qdev_rcu_id);
+	srcu_read_unlock(&usr->qdev_lock, usr_rcu_id);
+	return ret;
+}
+
 static int qaic_mhi_probe(struct mhi_device *mhi_dev,
 			  const struct mhi_device_id *id)
 {
@@ -284,10 +343,13 @@ void qaic_dev_reset_clean_local_state(struct qaic_device *qdev)
 	struct qaic_user *usr;
 	struct qaic_user *u;
 	dev_t devno;
+	int i;
 
 	qdev->in_reset = true;
 	/* wake up any waiters to avoid waiting for timeouts at sync */
 	wake_all_cntl(qdev);
+	for (i = 0; i < QAIC_NUM_DBC; ++i)
+		wakeup_dbc(qdev, i);
 	synchronize_srcu(&qdev->dev_lock);
 
 	/*
@@ -318,41 +380,10 @@ void qaic_dev_reset_clean_local_state(struct qaic_device *qdev)
 		synchronize_srcu(&usr->qdev_lock);
 		kref_put(&usr->ref_count, free_usr);
 	}
-}
-
-inline int get_dbc_req_elem_size(void)
-{
-	return 64;
-}
 
-inline int get_dbc_rsp_elem_size(void)
-{
-	return 4;
-}
-
-int disable_dbc(struct qaic_device *qdev, u32 dbc_id, struct qaic_user *usr)
-{
-	if (!qdev->dbc[dbc_id].usr ||
-	    qdev->dbc[dbc_id].usr->handle != usr->handle)
-		return -EPERM;
-
-	qdev->dbc[dbc_id].usr = NULL;
-	synchronize_srcu(&qdev->dbc[dbc_id].ch_lock);
-	return 0;
-}
-
-void release_dbc(struct qaic_device *qdev, u32 dbc_id)
-{
-	dma_free_coherent(&qdev->pdev->dev, qdev->dbc[dbc_id].total_size,
-			  qdev->dbc[dbc_id].req_q_base,
-			  qdev->dbc[dbc_id].dma_addr);
-	qdev->dbc[dbc_id].total_size = 0;
-	qdev->dbc[dbc_id].req_q_base = NULL;
-	qdev->dbc[dbc_id].dma_addr = 0;
-	qdev->dbc[dbc_id].nelem = 0;
-	qdev->dbc[dbc_id].usr = NULL;
-	qdev->dbc[dbc_id].in_use = false;
-	wake_up(&qdev->dbc[dbc_id].dbc_release);
+	/* start tearing things down */
+	for (i = 0; i < QAIC_NUM_DBC; ++i)
+		release_dbc(qdev, i);
 }
 
 static int qaic_pci_probe(struct pci_dev *pdev,
@@ -451,6 +482,14 @@ static int qaic_pci_probe(struct pci_dev *pdev,
 		goto get_mhi_irq_fail;
 	}
 
+	for (i = 0; i < QAIC_NUM_DBC; ++i) {
+		ret = devm_request_irq(&pdev->dev, pci_irq_vector(pdev, i + 1),
+				       dbc_irq_handler, IRQF_SHARED, "qaic_dbc",
+				       &qdev->dbc[i]);
+		if (ret)
+			goto get_dbc_irq_failed;
+	}
+
 	qdev->mhi_cntl = qaic_mhi_register_controller(pdev, qdev->bar_0,
 						      mhi_irq);
 	if (IS_ERR(qdev->mhi_cntl)) {
@@ -462,6 +501,10 @@ static int qaic_pci_probe(struct pci_dev *pdev,
 	return 0;
 
 mhi_register_fail:
+get_dbc_irq_failed:
+	for (i = 0; i < QAIC_NUM_DBC; ++i)
+		devm_free_irq(&pdev->dev, pci_irq_vector(pdev, i + 1),
+			      &qdev->dbc[i]);
 get_mhi_irq_fail:
 invalid_msi_config:
 	pci_free_irq_vectors(pdev);
@@ -499,8 +542,11 @@ static void qaic_pci_remove(struct pci_dev *pdev)
 
 	qaic_dev_reset_clean_local_state(qdev);
 	qaic_mhi_free_controller(qdev->mhi_cntl, link_up);
-	for (i = 0; i < QAIC_NUM_DBC; ++i)
+	for (i = 0; i < QAIC_NUM_DBC; ++i) {
+		devm_free_irq(&pdev->dev, pci_irq_vector(pdev, i + 1),
+			      &qdev->dbc[i]);
 		cleanup_srcu_struct(&qdev->dbc[i].ch_lock);
+	}
 	destroy_workqueue(qdev->cntl_wq);
 	pci_free_irq_vectors(pdev);
 	iounmap(qdev->bar_0);
diff --git a/include/uapi/misc/qaic.h b/include/uapi/misc/qaic.h
index 9bcc33f..9e7f816 100644
--- a/include/uapi/misc/qaic.h
+++ b/include/uapi/misc/qaic.h
@@ -11,6 +11,21 @@
 
 #define QAIC_MANAGE_MAX_MSG_LENGTH 16364
 
+enum qaic_sem_flags {
+	SEM_INSYNCFENCE =	0x1,
+	SEM_OUTSYNCFENCE =	0x2,
+};
+
+enum qaic_sem_cmd {
+	SEM_NOP =		0,
+	SEM_INIT =		1,
+	SEM_INC =		2,
+	SEM_DEC =		3,
+	SEM_WAIT_EQUAL =	4,
+	SEM_WAIT_GT_EQ =	5, /* Greater than or equal */
+	SEM_WAIT_GT_0 =		6, /* Greater than 0 */
+};
+
 enum qaic_manage_transaction_type {
 	TRANS_UNDEFINED =		0,
 	TRANS_PASSTHROUGH_FROM_USR =	1,
@@ -87,7 +102,51 @@ struct qaic_manage_msg {
 	__u8 data[QAIC_MANAGE_MAX_MSG_LENGTH];
 };
 
+struct qaic_mem_req {
+	__u64 handle; /* 0 to alloc, or a valid handle to free */
+	__u64 size;   /* size to alloc, will be rounded to PAGE_SIZE */
+	__u32 dir;    /* direction of data: 0 = bidirectional data,
+			 1 = to device, 2 = from device */
+	__u32 dbc_id; /* Identifier of assigned DMA Bridge channel */
+	__u64 resv;   /* reserved for future use, must be 0 */
+};
+
+struct qaic_sem { /* semaphore command */
+	__u16 val;     /* only lower 12 bits are valid */
+	__u8  index;   /* only lower 5 bits are valid */
+	__u8  presync; /* 1 if presync operation, 0 if postsync */
+	__u8  cmd;     /* see enum qaic_sem_cmd */
+	__u8  flags;   /* see enum qaic_sem_flags for valid bits.
+			  All others must be 0 */
+	__u16 resv;    /* reserved for future use, must be 0 */
+};
+
+struct qaic_execute {
+	__u16		ver;    /* struct version, must be 1 */
+	__u8		dir;    /* 1 = to device, 2 = from device */
+	__u8		db_len; /* doorbell length - 32, 16, or 8 bits. 0 means
+				   doorbell is inactive */
+	__u32		db_data;/* data to write to doorbell */
+	__u64		db_addr;/* doorbell address */
+	__u64		handle; /* mem handle from mem_req */
+	__u64		dev_addr;
+	__u32		dbc_id; /* Identifier of assigned DMA Bridge channel */
+	__u32		resv;   /* reserved for future use, must be 0 */
+	struct qaic_sem	sem0;   /* Must be zero if not valid */
+	struct qaic_sem	sem1;   /* Must be zero if not valid */
+	struct qaic_sem	sem2;   /* Must be zero if not valid */
+	struct qaic_sem	sem3;   /* Must be zero if not valid */
+};
+
+struct qaic_wait_exec {
+	__u64 handle; /* handle to wait on until execute is complete */
+	__u32 timeout;/* timeout for wait (in ms).  0 means use default */
+	__u32 resv;   /* reserved for future use, must be 0 */
+};
+
 #define QAIC_IOCTL_MANAGE_NR	0x01
+#define QAIC_IOCTL_MEM_NR	0x02
+#define QAIC_IOCTL_EXECUTE_NR	0x03
+#define QAIC_IOCTL_WAIT_EXEC_NR	0x04
 
 /*
  * Send Manage command to the device
@@ -114,4 +173,74 @@ struct qaic_manage_msg {
  */
 #define QAIC_IOCTL_MANAGE _IOWR('Q', QAIC_IOCTL_MANAGE_NR, struct manage_msg)
 
+/*
+ * Memory alloc/free
+ *
+ * Allows user to request buffers to send/receive data to/from the device
+ * via a DMA Bridge channel.  An allocated buffer may then be mmap'd to be
+ * accessed.  Buffers are tied to a specific dbc.  It is expected that the
+ * user will request a pool of buffers, and reuse the buffers as necessary
+ * to send/receive multiple sets of data with the device over time.
+ *
+ * The handle to the allocated buffer will be returned in the struct upon
+ * success.  A buffer to be freed cannot be accessed after the ioctl is called.
+ *
+ * A request for a 0 size buffer is valid.  This signals that the DMA
+ * operation to/from the device does not transfer data, but does perform
+ * other tasks (ring doorbell, etc).  A handle from a zero size request cannot
+ * be mmap()'d.
+ *
+ * The return value is 0 for success, or a standard error code.  Some of the
+ * possible errors:
+ *
+ * EINTR  - Kernel waiting was interrupted (i.e. a signal was received)
+ * ENOMEM - Unable to obtain memory while processing request
+ * EPERM  - Invalid permissions to access resource
+ * EINVAL - Invalid request
+ * EFAULT - Error in accessing memory from user
+ * ENODEV - Resource does not exist
+ */
+#define QAIC_IOCTL_MEM _IOWR('Q', QAIC_IOCTL_MEM_NR, struct qaic_mem_req)
+
+/*
+ * Execute DMA Bridge transaction
+ *
+ * Allows user to execute a DMA Bridge transaction using a previously allocated
+ * memory resource.  This operation is non-blocking - success return only
+ * indicates the transaction is queued with the hardware, not that it is
+ * complete.  The user must ensure that the transaction is complete before
+ * reusing the memory resource.  It is invalid to attempt to execute multiple
+ * transactions concurrently which use the same memory resource in the same
+ * direction.
+ *
+ * The return value is 0 for success, or a standard error code.  Some of the
+ * possible errors:
+ *
+ * ENOMEM - Unable to obtain memory while processing request
+ * EPERM  - Invalid permissions to access resource
+ * EINVAL - Invalid request
+ * EFAULT - Error in accessing memory from user
+ * ENODEV - Resource does not exist
+ */
+#define QAIC_IOCTL_EXECUTE _IOW('Q', QAIC_IOCTL_EXECUTE_NR,	\
+				struct qaic_execute)
+
+/*
+ * Wait for executed DMA Bridge transaction
+ *
+ * Allows user to wait for a previously executed DMA Bridge transaction.
+ * This operation is blocking.
+ *
+ * The return value is 0 for success, or a standard error code.  Some of the
+ * possible errors:
+ *
+ * ENOMEM    - Unable to obtain memory while processing request
+ * EPERM     - Invalid permissions to access resource
+ * EINVAL    - Invalid request
+ * EFAULT    - Error in accessing memory from user
+ * ENODEV    - Resource does not exist
+ * ETIMEDOUT - The transaction did not complete before the timeout expired
+ */
+#define QAIC_IOCTL_WAIT_EXEC _IOW('Q', QAIC_IOCTL_WAIT_EXEC_NR,	\
+				  struct qaic_wait_exec)
+
 #endif /* QAIC_H_ */
-- 
Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [RFC PATCH 6/8] qaic: Implement PCI link status error handlers
  2020-05-14 14:07 [RFC PATCH 0/8] Qualcomm Cloud AI 100 driver Jeffrey Hugo
                   ` (4 preceding siblings ...)
  2020-05-14 14:07 ` [RFC PATCH 5/8] qaic: Implement data path Jeffrey Hugo
@ 2020-05-14 14:07 ` Jeffrey Hugo
  2020-05-14 14:07 ` [RFC PATCH 7/8] qaic: Implement MHI error status handler Jeffrey Hugo
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 39+ messages in thread
From: Jeffrey Hugo @ 2020-05-14 14:07 UTC (permalink / raw)
  To: arnd, gregkh
  Cc: manivannan.sadhasivam, bjorn.andersson, wufan, pratanan,
	linux-arm-msm, linux-kernel, Jeffrey Hugo

Whenever the link goes down, the device loses all state.  Therefore, we
need to be aware of when the link goes down and comes back up so that
our state is kept in sync with the device.  Implement the PCI error
handlers to be notified of link state changes.

Signed-off-by: Jeffrey Hugo <jhugo@codeaurora.org>
---
 drivers/misc/qaic/mhi_controller.c | 16 ++++++++++++++++
 drivers/misc/qaic/mhi_controller.h |  4 ++++
 drivers/misc/qaic/qaic_drv.c       | 29 +++++++++++++++++++++++++++++
 3 files changed, 49 insertions(+)

diff --git a/drivers/misc/qaic/mhi_controller.c b/drivers/misc/qaic/mhi_controller.c
index ba4808c..18fc830 100644
--- a/drivers/misc/qaic/mhi_controller.c
+++ b/drivers/misc/qaic/mhi_controller.c
@@ -493,3 +493,19 @@ void qaic_mhi_free_controller(struct mhi_controller *mhi_cntl, bool link_up)
 	kfree(mhi_cntl->irq);
 	kfree(mhi_cntl);
 }
+
+void qaic_mhi_link_down(struct mhi_controller *mhi_cntl)
+{
+	mhi_power_down(mhi_cntl, false);
+}
+
+void qaic_mhi_link_up(struct mhi_controller *mhi_cntl)
+{
+	struct pci_dev *pci_dev = container_of(mhi_cntl->cntrl_dev,
+					       struct pci_dev, dev);
+	int ret;
+
+	ret = mhi_async_power_up(mhi_cntl);
+	if (ret)
+		pci_err(pci_dev, "mhi_async_power_up failed when link came back %d\n", ret);
+}
diff --git a/drivers/misc/qaic/mhi_controller.h b/drivers/misc/qaic/mhi_controller.h
index c81725e..5eca586 100644
--- a/drivers/misc/qaic/mhi_controller.h
+++ b/drivers/misc/qaic/mhi_controller.h
@@ -11,4 +11,8 @@ struct mhi_controller *qaic_mhi_register_controller(struct pci_dev *pci_dev,
 						    int mhi_irq);
 
 void qaic_mhi_free_controller(struct mhi_controller *mhi_cntl, bool link_up);
+
+void qaic_mhi_link_down(struct mhi_controller *mhi_cntl);
+void qaic_mhi_link_up(struct mhi_controller *mhi_cntl);
+
 #endif /* MHICONTROLLERQAIC_H_ */
diff --git a/drivers/misc/qaic/qaic_drv.c b/drivers/misc/qaic/qaic_drv.c
index 800769d..1c8eefd 100644
--- a/drivers/misc/qaic/qaic_drv.c
+++ b/drivers/misc/qaic/qaic_drv.c
@@ -558,6 +558,28 @@ static void qaic_pci_remove(struct pci_dev *pdev)
 	kfree(qdev);
 }
 
+static pci_ers_result_t qaic_pci_error_detected(struct pci_dev *pdev,
+						enum pci_channel_state error)
+{
+	return PCI_ERS_RESULT_NEED_RESET;
+}
+
+static void qaic_pci_reset_prepare(struct pci_dev *pdev)
+{
+	struct qaic_device *qdev = pci_get_drvdata(pdev);
+
+	qaic_dev_reset_clean_local_state(qdev);
+	qaic_mhi_link_down(qdev->mhi_cntl);
+}
+
+static void qaic_pci_reset_done(struct pci_dev *pdev)
+{
+	struct qaic_device *qdev = pci_get_drvdata(pdev);
+
+	qdev->in_reset = false;
+	qaic_mhi_link_up(qdev->mhi_cntl);
+}
+
 static const struct mhi_device_id qaic_mhi_match_table[] = {
 	{ .chan = "QAIC_CONTROL", },
 	{},
@@ -581,11 +603,18 @@ static const struct pci_device_id ids[] = {
 };
 MODULE_DEVICE_TABLE(pci, ids);
 
+static const struct pci_error_handlers qaic_pci_err_handler = {
+	.error_detected = qaic_pci_error_detected,
+	.reset_prepare = qaic_pci_reset_prepare,
+	.reset_done = qaic_pci_reset_done,
+};
+
 static struct pci_driver qaic_pci_driver = {
 	.name = QAIC_NAME,
 	.id_table = ids,
 	.probe = qaic_pci_probe,
 	.remove = qaic_pci_remove,
+	.err_handler = &qaic_pci_err_handler,
 };
 
 static int __init qaic_init(void)
-- 
Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [RFC PATCH 7/8] qaic: Implement MHI error status handler
  2020-05-14 14:07 [RFC PATCH 0/8] Qualcomm Cloud AI 100 driver Jeffrey Hugo
                   ` (5 preceding siblings ...)
  2020-05-14 14:07 ` [RFC PATCH 6/8] qaic: Implement PCI link status error handlers Jeffrey Hugo
@ 2020-05-14 14:07 ` Jeffrey Hugo
  2020-05-14 14:07 ` [RFC PATCH 8/8] MAINTAINERS: Add entry for QAIC driver Jeffrey Hugo
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 39+ messages in thread
From: Jeffrey Hugo @ 2020-05-14 14:07 UTC (permalink / raw)
  To: arnd, gregkh
  Cc: manivannan.sadhasivam, bjorn.andersson, wufan, pratanan,
	linux-arm-msm, linux-kernel, Jeffrey Hugo

If the MHI bus has detected some kind of error with the device, it will
invoke the registered status_cb().  We are interested in both the fatal
error and sys error events.

For a fatal error, the device has essentially gone away and needs to be
reinitialized from scratch.  We lose all state.  The MHI bus expects the
controller to fully handle this scenario.

For a sys error, the device can recover by going through the MHI reset
flow, but we lose all state.  The MHI bus is going to handle the MHI
reset flow, but we need to still clean up our local state.

Signed-off-by: Jeffrey Hugo <jhugo@codeaurora.org>
---
 drivers/misc/qaic/mhi_controller.c | 27 +++++++++++++++++++++++++++
 drivers/misc/qaic/mhi_controller.h |  2 ++
 drivers/misc/qaic/qaic.h           |  2 ++
 drivers/misc/qaic/qaic_drv.c       | 16 +++++++++++++++-
 4 files changed, 46 insertions(+), 1 deletion(-)

diff --git a/drivers/misc/qaic/mhi_controller.c b/drivers/misc/qaic/mhi_controller.c
index 18fc830..c1498d4 100644
--- a/drivers/misc/qaic/mhi_controller.c
+++ b/drivers/misc/qaic/mhi_controller.c
@@ -7,6 +7,8 @@
 #include <linux/moduleparam.h>
 #include <linux/pci.h>
 
+#include "qaic.h"
+
 static unsigned int mhi_timeout = 20000; /* 20 sec default */
 module_param(mhi_timeout, uint, 0600);
 
@@ -424,6 +426,15 @@ static void mhi_runtime_put(struct mhi_controller *mhi_cntl)
 static void mhi_status_cb(struct mhi_controller *mhi_cntl,
 			  enum mhi_callback reason)
 {
+	struct qaic_device *qdev = (struct qaic_device *)pci_get_drvdata(
+					to_pci_dev(mhi_cntl->cntrl_dev));
+
+	/* this event occurs in atomic context */
+	if (reason == MHI_CB_FATAL_ERROR && !qdev->in_reset)
+		schedule_work(&qdev->reset_mhi_work);
+	/* this event occurs in non-atomic context */
+	if (reason == MHI_CB_SYS_ERROR && !qdev->in_reset)
+		qaic_dev_reset_clean_local_state(qdev);
 }
 
 struct mhi_controller *qaic_mhi_register_controller(struct pci_dev *pci_dev,
@@ -509,3 +520,19 @@ void qaic_mhi_link_up(struct mhi_controller *mhi_cntl)
 	if (ret)
 		pci_err(pci_dev, "mhi_async_power_up failed when link came back %d\n", ret);
 }
+
+void qaic_mhi_start_reset(struct mhi_controller *mhi_cntl)
+{
+	mhi_power_down(mhi_cntl, true);
+}
+
+void qaic_mhi_reset_done(struct mhi_controller *mhi_cntl)
+{
+	struct pci_dev *pci_dev = container_of(mhi_cntl->cntrl_dev,
+					       struct pci_dev, dev);
+	int ret;
+
+	ret = mhi_async_power_up(mhi_cntl);
+	if (ret)
+		pci_err(pci_dev, "mhi_async_power_up failed after reset %d\n", ret);
+}
diff --git a/drivers/misc/qaic/mhi_controller.h b/drivers/misc/qaic/mhi_controller.h
index 5eca586..1081f0a 100644
--- a/drivers/misc/qaic/mhi_controller.h
+++ b/drivers/misc/qaic/mhi_controller.h
@@ -14,5 +14,7 @@ void qaic_mhi_free_controller(struct mhi_controller *mhi_cntl, bool link_up);
 
 void qaic_mhi_link_down(struct mhi_controller *mhi_cntl);
 void qaic_mhi_link_up(struct mhi_controller *mhi_cntl);
+void qaic_mhi_start_reset(struct mhi_controller *mhi_cntl);
+void qaic_mhi_reset_done(struct mhi_controller *mhi_cntl);
 
 #endif /* MHICONTROLLERQAIC_H_ */
diff --git a/drivers/misc/qaic/qaic.h b/drivers/misc/qaic/qaic.h
index 5f774d4..bbb7512 100644
--- a/drivers/misc/qaic/qaic.h
+++ b/drivers/misc/qaic/qaic.h
@@ -68,6 +68,7 @@ struct qaic_device {
 	struct cdev		*cdev;
 	struct device		*dev;
 	struct dma_bridge_chan	dbc[QAIC_NUM_DBC];
+	struct work_struct	reset_mhi_work;
 	struct workqueue_struct	*cntl_wq;
 	bool			in_reset;
 	struct srcu_struct	dev_lock;
@@ -106,4 +107,5 @@ void wakeup_dbc(struct qaic_device *qdev, u32 dbc_id);
 void release_dbc(struct qaic_device *qdev, u32 dbc_id);
 
 void wake_all_cntl(struct qaic_device *qdev);
+void qaic_dev_reset_clean_local_state(struct qaic_device *qdev);
 #endif /* QAICINTERNAL_H_ */
diff --git a/drivers/misc/qaic/qaic_drv.c b/drivers/misc/qaic/qaic_drv.c
index 1c8eefd..010cf88 100644
--- a/drivers/misc/qaic/qaic_drv.c
+++ b/drivers/misc/qaic/qaic_drv.c
@@ -386,6 +386,18 @@ void qaic_dev_reset_clean_local_state(struct qaic_device *qdev)
 		release_dbc(qdev, i);
 }
 
+static void reset_mhi_work_func(struct work_struct *work)
+{
+	struct qaic_device *qdev;
+
+	qdev = container_of(work, struct qaic_device, reset_mhi_work);
+
+	qaic_dev_reset_clean_local_state(qdev);
+	qaic_mhi_start_reset(qdev->mhi_cntl);
+	qdev->in_reset = false;
+	qaic_mhi_reset_done(qdev->mhi_cntl);
+}
+
 static int qaic_pci_probe(struct pci_dev *pdev,
 			  const struct pci_device_id *id)
 {
@@ -411,6 +423,7 @@ static int qaic_pci_probe(struct pci_dev *pdev,
 	qdev->pdev = pdev;
 	mutex_init(&qdev->cntl_mutex);
 	INIT_LIST_HEAD(&qdev->cntl_xfer_list);
+	INIT_WORK(&qdev->reset_mhi_work, reset_mhi_work_func);
 	init_srcu_struct(&qdev->dev_lock);
 	INIT_LIST_HEAD(&qdev->users);
 	mutex_init(&qdev->users_mutex);
@@ -541,6 +554,7 @@ static void qaic_pci_remove(struct pci_dev *pdev)
 		return;
 
 	qaic_dev_reset_clean_local_state(qdev);
+	cancel_work_sync(&qdev->reset_mhi_work);
 	qaic_mhi_free_controller(qdev->mhi_cntl, link_up);
 	for (i = 0; i < QAIC_NUM_DBC; ++i) {
 		devm_free_irq(&pdev->dev, pci_irq_vector(pdev, i + 1),
@@ -682,4 +696,4 @@ module_exit(qaic_exit);
 MODULE_AUTHOR("Qualcomm Cloud AI 100 Accelerator Kernel Driver Team");
 MODULE_DESCRIPTION("Qualcomm Cloud 100 AI Accelerators Driver");
 MODULE_LICENSE("GPL v2");
-MODULE_VERSION("0.0.0"); /* MAJOR.MINOR.PATCH */
+MODULE_VERSION("3.0.1"); /* MAJOR.MINOR.PATCH */
-- 
Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [RFC PATCH 8/8] MAINTAINERS: Add entry for QAIC driver
  2020-05-14 14:07 [RFC PATCH 0/8] Qualcomm Cloud AI 100 driver Jeffrey Hugo
                   ` (6 preceding siblings ...)
  2020-05-14 14:07 ` [RFC PATCH 7/8] qaic: Implement MHI error status handler Jeffrey Hugo
@ 2020-05-14 14:07 ` Jeffrey Hugo
  2020-05-19  5:08 ` [RFC PATCH 0/8] Qualcomm Cloud AI 100 driver Dave Airlie
  2020-05-19  6:57 ` Manivannan Sadhasivam
  9 siblings, 0 replies; 39+ messages in thread
From: Jeffrey Hugo @ 2020-05-14 14:07 UTC (permalink / raw)
  To: arnd, gregkh
  Cc: manivannan.sadhasivam, bjorn.andersson, wufan, pratanan,
	linux-arm-msm, linux-kernel, Jeffrey Hugo

Add MAINTAINERS entry for the Qualcomm Cloud AI 100 driver.

Signed-off-by: Jeffrey Hugo <jhugo@codeaurora.org>
---
 MAINTAINERS | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index b816a45..7822d07 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -13962,6 +13962,13 @@ F:	Documentation/devicetree/bindings/media/qcom,camss.txt
 F:	Documentation/media/v4l-drivers/qcom_camss.rst
 F:	drivers/media/platform/qcom/camss/
 
+QUALCOMM CLOUD AI 100 (QAIC) DRIVER
+M:	Jeffrey Hugo <jhugo@codeaurora.org>
+L:	linux-arm-msm@vger.kernel.org
+S:	Supported
+F:	drivers/misc/qaic/
+F:	include/uapi/misc/qaic.h
+
 QUALCOMM CORE POWER REDUCTION (CPR) AVS DRIVER
 M:	Niklas Cassel <nks@flawful.org>
 L:	linux-pm@vger.kernel.org
-- 
Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH 3/8] qaic: Create char dev
  2020-05-14 14:07 ` [RFC PATCH 3/8] qaic: Create char dev Jeffrey Hugo
@ 2020-05-14 14:12   ` Greg KH
  2020-05-14 15:05     ` Jeffrey Hugo
  0 siblings, 1 reply; 39+ messages in thread
From: Greg KH @ 2020-05-14 14:12 UTC (permalink / raw)
  To: Jeffrey Hugo
  Cc: arnd, manivannan.sadhasivam, bjorn.andersson, wufan, pratanan,
	linux-arm-msm, linux-kernel

On Thu, May 14, 2020 at 08:07:41AM -0600, Jeffrey Hugo wrote:
>  /* Copyright (c) 2019-2020, The Linux Foundation. All rights reserved. */
>  
> +#include <linux/cdev.h>
> +#include <linux/idr.h>
> +#include <linux/list.h>
> +#include <linux/kref.h>
> +#include <linux/mhi.h>
>  #include <linux/module.h>
>  #include <linux/msi.h>
> +#include <linux/mutex.h>
>  #include <linux/pci.h>
>  #include <linux/pci_ids.h>
>  
> @@ -13,9 +19,242 @@
>  #define PCI_DEV_AIC100			0xa100
>  
>  #define QAIC_NAME			"Qualcomm Cloud AI 100"
> +#define QAIC_MAX_MINORS			256

Why have a max?

Why not just use a misc device so you make the logic a lot simple, no
class or chardev logic to mess with at all.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH 5/8] qaic: Implement data path
  2020-05-14 14:07 ` [RFC PATCH 5/8] qaic: Implement data path Jeffrey Hugo
@ 2020-05-14 14:14   ` Greg KH
  2020-05-14 15:06     ` Jeffrey Hugo
  2020-05-14 21:36   ` Arnd Bergmann
  1 sibling, 1 reply; 39+ messages in thread
From: Greg KH @ 2020-05-14 14:14 UTC (permalink / raw)
  To: Jeffrey Hugo
  Cc: arnd, manivannan.sadhasivam, bjorn.andersson, wufan, pratanan,
	linux-arm-msm, linux-kernel

On Thu, May 14, 2020 at 08:07:43AM -0600, Jeffrey Hugo wrote:
> +struct qaic_execute {
> +	__u16		ver;    /* struct version, must be 1 */

No need for structures to be versioned.  If you change something, then
add a new ioctl if you really needed it.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH 3/8] qaic: Create char dev
  2020-05-14 14:12   ` Greg KH
@ 2020-05-14 15:05     ` Jeffrey Hugo
  2020-05-14 15:56       ` Greg KH
  0 siblings, 1 reply; 39+ messages in thread
From: Jeffrey Hugo @ 2020-05-14 15:05 UTC (permalink / raw)
  To: Greg KH
  Cc: arnd, manivannan.sadhasivam, bjorn.andersson, wufan, pratanan,
	linux-arm-msm, linux-kernel

Wow, thank you for the near immediate response.  I am quite impressed.

On 5/14/2020 8:12 AM, Greg KH wrote:
> On Thu, May 14, 2020 at 08:07:41AM -0600, Jeffrey Hugo wrote:
>>   /* Copyright (c) 2019-2020, The Linux Foundation. All rights reserved. */
>>   
>> +#include <linux/cdev.h>
>> +#include <linux/idr.h>
>> +#include <linux/list.h>
>> +#include <linux/kref.h>
>> +#include <linux/mhi.h>
>>   #include <linux/module.h>
>>   #include <linux/msi.h>
>> +#include <linux/mutex.h>
>>   #include <linux/pci.h>
>>   #include <linux/pci_ids.h>
>>   
>> @@ -13,9 +19,242 @@
>>   #define PCI_DEV_AIC100			0xa100
>>   
>>   #define QAIC_NAME			"Qualcomm Cloud AI 100"
>> +#define QAIC_MAX_MINORS			256
> 
> Why have a max?
> 
> Why not just use a misc device so you make the logic a lot simple, no
> class or chardev logic to mess with at all.

It was our understanding that the preference is not to add new misc 
devices.  As I go and try to find a supporting reference for that, I 
cannot find one, so I'm not entirely sure where that idea came from.

In addition, we see that the Habana Labs driver also uses chardev, and 
has chosen the same max.  We assumed that since their driver is already 
accepted, using the same mechanisms where applicable would be the 
preferred approach.

Specific to the max, 256 was chosen as being a factor larger than the 
largest system we have, therefore we figured it wouldn't be hit for a 
long while even as we try to have a look at what might happen down the 
road.  Looking at the Habana code, it looks like they have the same 
value for much of the same reasons, although their usecases may vary 
from ours somewhat.

At this time, I don't think we have a strong requirement for a chardev, 
so we could investigate a switch over to a misc dev if you would prefer 
that over following the Habana Labs precedent.  All I ask is a 
confirmation that this is the approach you would like to see going forward 
after reviewing the above.

-- 
Jeffrey Hugo
Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH 5/8] qaic: Implement data path
  2020-05-14 14:14   ` Greg KH
@ 2020-05-14 15:06     ` Jeffrey Hugo
  2020-05-14 15:56       ` Greg KH
  0 siblings, 1 reply; 39+ messages in thread
From: Jeffrey Hugo @ 2020-05-14 15:06 UTC (permalink / raw)
  To: Greg KH
  Cc: arnd, manivannan.sadhasivam, bjorn.andersson, wufan, pratanan,
	linux-arm-msm, linux-kernel

On 5/14/2020 8:14 AM, Greg KH wrote:
> On Thu, May 14, 2020 at 08:07:43AM -0600, Jeffrey Hugo wrote:
>> +struct qaic_execute {
>> +	__u16		ver;    /* struct version, must be 1 */
> 
> No need for structures to be versioned.  If you change something, then
> add a new ioctl if you really needed it.

Huh.  We had thought the botching ioctls document advised having a 
version, but as I double check that document, it in fact does not.

Will remove.

-- 
Jeffrey Hugo
Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH 3/8] qaic: Create char dev
  2020-05-14 15:05     ` Jeffrey Hugo
@ 2020-05-14 15:56       ` Greg KH
  2020-05-14 16:24         ` Jeffrey Hugo
  0 siblings, 1 reply; 39+ messages in thread
From: Greg KH @ 2020-05-14 15:56 UTC (permalink / raw)
  To: Jeffrey Hugo
  Cc: arnd, manivannan.sadhasivam, bjorn.andersson, wufan, pratanan,
	linux-arm-msm, linux-kernel

On Thu, May 14, 2020 at 09:05:30AM -0600, Jeffrey Hugo wrote:
> Wow, thank you for the near immediate response.  I am quite impressed.
> 
> On 5/14/2020 8:12 AM, Greg KH wrote:
> > On Thu, May 14, 2020 at 08:07:41AM -0600, Jeffrey Hugo wrote:
> > >   /* Copyright (c) 2019-2020, The Linux Foundation. All rights reserved. */
> > > +#include <linux/cdev.h>
> > > +#include <linux/idr.h>
> > > +#include <linux/list.h>
> > > +#include <linux/kref.h>
> > > +#include <linux/mhi.h>
> > >   #include <linux/module.h>
> > >   #include <linux/msi.h>
> > > +#include <linux/mutex.h>
> > >   #include <linux/pci.h>
> > >   #include <linux/pci_ids.h>
> > > @@ -13,9 +19,242 @@
> > >   #define PCI_DEV_AIC100			0xa100
> > >   #define QAIC_NAME			"Qualcomm Cloud AI 100"
> > > +#define QAIC_MAX_MINORS			256
> > 
> > Why have a max?
> > 
> > Why not just use a misc device so you make the logic a lot simple, no
> > class or chardev logic to mess with at all.
> 
> It was our understanding that the preference is not to add new misc devices.

Huh, who said that?  Not the char/misc maintainer (i.e. me) :)

> As I go and try to find a supporting reference for that, I cannot find one,
> so I'm not entirely sure where that idea came from.
> 
> In addition, we see that the Habana Labs driver also uses chardev, and has
> chosen the same max.  We assumed that since their driver is already
> accepted, using the same mechanisms where applicable would be the preferred
> approach.

They had good reasons why not to use a chardev and convinced me of it.
If you can't come up with them, then stick with a misc for now please.

> Specific to the max, 256 was chosen as being a factor larger than the
> largest system we have, therefore we figured it wouldn't be hit for a long
> while even as we try to have a look at what might happen down the road.
> Looking at the Habana code, it looks like they have the same value for much
> of the same reasons, although their usecases may vary from ours somewhat.

Max numbers for no good reason are not a good thing to have.

> At this time, I don't think we have a strong requirement for a chardev, so
> we could investigate a switch over to a misc dev if you would prefer that
> over following the Habana Labs precedent.  All I ask is a confirmation that
> this is the approach you would like to see going forward after reviewing the
> above.

Please use misc.

That is, if you even need a character device node :)

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH 5/8] qaic: Implement data path
  2020-05-14 15:06     ` Jeffrey Hugo
@ 2020-05-14 15:56       ` Greg KH
  2020-05-14 16:12         ` Jeffrey Hugo
  0 siblings, 1 reply; 39+ messages in thread
From: Greg KH @ 2020-05-14 15:56 UTC (permalink / raw)
  To: Jeffrey Hugo
  Cc: arnd, manivannan.sadhasivam, bjorn.andersson, wufan, pratanan,
	linux-arm-msm, linux-kernel

On Thu, May 14, 2020 at 09:06:53AM -0600, Jeffrey Hugo wrote:
> On 5/14/2020 8:14 AM, Greg KH wrote:
> > On Thu, May 14, 2020 at 08:07:43AM -0600, Jeffrey Hugo wrote:
> > > +struct qaic_execute {
> > > +	__u16		ver;    /* struct version, must be 1 */
> > 
> > No need for structures to be versioned.  If you change something, then
> > add a new ioctl if you really needed it.
> 
> Huh.  We had thought the botching ioctls document advised having a version,
> but as I double check that document, it in fact does not.
> 
> Will remove.

Thanks, you can also remove the "reserved" variables as well as those
will not be needed either.

greg k-h

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH 5/8] qaic: Implement data path
  2020-05-14 15:56       ` Greg KH
@ 2020-05-14 16:12         ` Jeffrey Hugo
  2020-05-14 16:37           ` Greg KH
  0 siblings, 1 reply; 39+ messages in thread
From: Jeffrey Hugo @ 2020-05-14 16:12 UTC (permalink / raw)
  To: Greg KH
  Cc: arnd, manivannan.sadhasivam, bjorn.andersson, wufan, pratanan,
	linux-arm-msm, linux-kernel

On 5/14/2020 9:56 AM, Greg KH wrote:
> On Thu, May 14, 2020 at 09:06:53AM -0600, Jeffrey Hugo wrote:
>> On 5/14/2020 8:14 AM, Greg KH wrote:
>>> On Thu, May 14, 2020 at 08:07:43AM -0600, Jeffrey Hugo wrote:
>>>> +struct qaic_execute {
>>>> +	__u16		ver;    /* struct version, must be 1 */
>>>
>>> No need for structures to be versioned.  If you change something, then
>>> add a new ioctl if you really needed it.
>>
>> Huh.  We had thought the botching ioctls document advised having a version,
>> but as I double check that document, it in fact does not.
>>
>> Will remove.
> 
> Thanks, you can also remove the "reserved" variables as well as those
> will not be needed either.

Are you sure?

Documentation/process/botching-up-ioctls.rst
Starting at Line 38:

"Pad the entire struct to a multiple of 64-bits if the structure 
contains 64-bit types - the structure size will otherwise differ on 
32-bit versus 64-bit. Having a different structure size hurts when 
passing arrays of structures to the kernel, or if the kernel checks the 
structure size, which e.g. the drm core does."

The "reserved" variables seem to be in line with that.

-- 
Jeffrey Hugo
Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH 3/8] qaic: Create char dev
  2020-05-14 15:56       ` Greg KH
@ 2020-05-14 16:24         ` Jeffrey Hugo
  2020-05-15 21:08           ` Jeffrey Hugo
  0 siblings, 1 reply; 39+ messages in thread
From: Jeffrey Hugo @ 2020-05-14 16:24 UTC (permalink / raw)
  To: Greg KH
  Cc: arnd, manivannan.sadhasivam, bjorn.andersson, wufan, pratanan,
	linux-arm-msm, linux-kernel

On 5/14/2020 9:56 AM, Greg KH wrote:
> On Thu, May 14, 2020 at 09:05:30AM -0600, Jeffrey Hugo wrote:
>> Wow, thank you for the near immediate response.  I am quite impressed.
>>
>> On 5/14/2020 8:12 AM, Greg KH wrote:
>>> On Thu, May 14, 2020 at 08:07:41AM -0600, Jeffrey Hugo wrote:
>>>>    /* Copyright (c) 2019-2020, The Linux Foundation. All rights reserved. */
>>>> +#include <linux/cdev.h>
>>>> +#include <linux/idr.h>
>>>> +#include <linux/list.h>
>>>> +#include <linux/kref.h>
>>>> +#include <linux/mhi.h>
>>>>    #include <linux/module.h>
>>>>    #include <linux/msi.h>
>>>> +#include <linux/mutex.h>
>>>>    #include <linux/pci.h>
>>>>    #include <linux/pci_ids.h>
>>>> @@ -13,9 +19,242 @@
>>>>    #define PCI_DEV_AIC100			0xa100
>>>>    #define QAIC_NAME			"Qualcomm Cloud AI 100"
>>>> +#define QAIC_MAX_MINORS			256
>>>
>>> Why have a max?
>>>
>>> Why not just use a misc device so you make the logic a lot simple, no
>>> class or chardev logic to mess with at all.
>>
>> It was our understanding that the preference is not to add new misc devices.
> 
> Huh, who said that?  Not the char/misc maintainer (i.e. me) :)
> 
>> As I go and try to find a supporting reference for that, I cannot find one,
>> so I'm not entirely sure where that idea came from.
>>
>> In addition, we see that the Habana Labs driver also uses chardev, and has
>> chosen the same max.  We assumed that since their driver is already
>> accepted, using the same mechanisms where applicable would be the preferred
>> approach.
> 
> They had good reasons why not to use a chardev and convinced me of it.
> If you can't come up with them, then stick with a misc for now please.

Interesting.  I didn't see any discussion on this.

>> Specific to the max, 256 was chosen as being a factor larger than the
>> largest system we have, therefore we figured it wouldn't be hit for a long
>> while even as we try to have a look at what might happen down the road.
>> Looking at the Habana code, it looks like they have the same value for much
>> of the same reasons, although their usecases may vary from ours somewhat.
> 
> Max numbers for no good reason are not a good thing to have.
> 
>> At this time, I don't think we have a strong requirement for a chardev, so
>> we could investigate a switch over to a misc dev if you would prefer that
>> over following the Habana Labs precedent.  All I ask is a confirmation that
>> this is the approach you would like to see going forward after reviewing the
>> above.
> 
> Please use misc.

Ok, will investigate.

-- 
Jeffrey Hugo
Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH 5/8] qaic: Implement data path
  2020-05-14 16:12         ` Jeffrey Hugo
@ 2020-05-14 16:37           ` Greg KH
  2020-05-14 16:45             ` Jeffrey Hugo
  0 siblings, 1 reply; 39+ messages in thread
From: Greg KH @ 2020-05-14 16:37 UTC (permalink / raw)
  To: Jeffrey Hugo
  Cc: arnd, manivannan.sadhasivam, bjorn.andersson, wufan, pratanan,
	linux-arm-msm, linux-kernel

On Thu, May 14, 2020 at 10:12:03AM -0600, Jeffrey Hugo wrote:
> On 5/14/2020 9:56 AM, Greg KH wrote:
> > On Thu, May 14, 2020 at 09:06:53AM -0600, Jeffrey Hugo wrote:
> > > On 5/14/2020 8:14 AM, Greg KH wrote:
> > > > On Thu, May 14, 2020 at 08:07:43AM -0600, Jeffrey Hugo wrote:
> > > > > +struct qaic_execute {
> > > > > +	__u16		ver;    /* struct version, must be 1 */
> > > > 
> > > > No need for structures to be versioned.  If you change something, then
> > > > add a new ioctl if you really needed it.
> > > 
> > > Huh.  We had thought the botching ioctls document advised having a version,
> > > but as I double check that document, it in fact does not.
> > > 
> > > Will remove.
> > 
> > Thanks, you can also remove the "reserved" variables as well as those
> > will not be needed either.
> 
> Are you sure?
> 
> Documentation/process/botching-up-ioctls.rst
> Starting at Line 38:
> 
> "Pad the entire struct to a multiple of 64-bits if the structure contains
> 64-bit types - the structure size will otherwise differ on 32-bit versus
> 64-bit. Having a different structure size hurts when passing arrays of
> structures to the kernel, or if the kernel checks the structure size, which
> e.g. the drm core does."
> 
> The "reserved" variables seem to be in line with that.

Padding is fine to use, but don't use that as a "I'm reserving this to
use it for later" type of thing which is how I read the structure
definitions.  I might be totally wrong, but you should be explicit here.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH 5/8] qaic: Implement data path
  2020-05-14 16:37           ` Greg KH
@ 2020-05-14 16:45             ` Jeffrey Hugo
  0 siblings, 0 replies; 39+ messages in thread
From: Jeffrey Hugo @ 2020-05-14 16:45 UTC (permalink / raw)
  To: Greg KH
  Cc: arnd, manivannan.sadhasivam, bjorn.andersson, wufan, pratanan,
	linux-arm-msm, linux-kernel

On 5/14/2020 10:37 AM, Greg KH wrote:
> On Thu, May 14, 2020 at 10:12:03AM -0600, Jeffrey Hugo wrote:
>> On 5/14/2020 9:56 AM, Greg KH wrote:
>>> On Thu, May 14, 2020 at 09:06:53AM -0600, Jeffrey Hugo wrote:
>>>> On 5/14/2020 8:14 AM, Greg KH wrote:
>>>>> On Thu, May 14, 2020 at 08:07:43AM -0600, Jeffrey Hugo wrote:
>>>>>> +struct qaic_execute {
>>>>>> +	__u16		ver;    /* struct version, must be 1 */
>>>>>
>>>>> No need for structures to be versioned.  If you change something, then
>>>>> add a new ioctl if you really needed it.
>>>>
>>>> Huh.  We had thought the botching ioctls document advised having a version,
>>>> but as I double check that document, it in fact does not.
>>>>
>>>> Will remove.
>>>
>>> Thanks, you can also remove the "reserved" variables as well as those
>>> will not be needed either.
>>
>> Are you sure?
>>
>> Documentation/process/botching-up-ioctls.rst
>> Starting at Line 38:
>>
>> "Pad the entire struct to a multiple of 64-bits if the structure contains
>> 64-bit types - the structure size will otherwise differ on 32-bit versus
>> 64-bit. Having a different structure size hurts when passing arrays of
>> structures to the kernel, or if the kernel checks the structure size, which
>> e.g. the drm core does."
>>
>> The "reserved" variables seem to be in line with that.
> 
> Padding is fine to use, but don't use that as a "I'm reserving this to
> use it for later" type of thing which is how I read the structure
> definitions.  I might be totally wrong, but you should be explicit here.

Ok, I think I see your point.  I'll change them to be more explicit as 
padding.

-- 
Jeffrey Hugo
Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH 5/8] qaic: Implement data path
  2020-05-14 14:07 ` [RFC PATCH 5/8] qaic: Implement data path Jeffrey Hugo
  2020-05-14 14:14   ` Greg KH
@ 2020-05-14 21:36   ` Arnd Bergmann
  2020-05-14 22:06     ` Jeffrey Hugo
  1 sibling, 1 reply; 39+ messages in thread
From: Arnd Bergmann @ 2020-05-14 21:36 UTC (permalink / raw)
  To: Jeffrey Hugo
  Cc: gregkh, Manivannan Sadhasivam, Bjorn Andersson, wufan, pratanan,
	linux-arm-msm, linux-kernel

On Thu, May 14, 2020 at 4:09 PM Jeffrey Hugo <jhugo@codeaurora.org> wrote:
>
> +struct dbc_req { /* everything must be little endian encoded */

Instead of the comment, I suppose you want to use __le16 and __le32
types and let sparse check that you got it right.

> +       u16     req_id;
> +       u8      seq_id;
> +       u8      cmd;
> +       u32     resv;
> +       u64     src_addr;
> +       u64     dest_addr;
> +       u32     len;
> +       u32     resv2;
> +       u64     db_addr; /* doorbell address */
> +       u8      db_len; /* not a raw value, special encoding */
> +       u8      resv3;
> +       u16     resv4;
> +       u32     db_data;
> +       u32     sem_cmd0;
> +       u32     sem_cmd1;
> +       u32     sem_cmd2;
> +       u32     sem_cmd3;
> +} __packed;

All members are naturally aligned, so better drop the __packed
annotation get better code, unless the structure itself is
at an unaligned offset in memory.

> +struct dbc_rsp { /* everything must be little endian encoded */
> +       u16     req_id;
> +       u16     status;
> +} __packed;

Same here.

> +       init_completion(&mem->xfer_done);
> +       list_add_tail(&mem->list, &dbc->xfer_list);
> +       tail = (tail + mem->nents) % dbc->nelem;
> +       __raw_writel(cpu_to_le32(tail), dbc->dbc_base + REQTP_OFF);

What is this __raw accessor for? This generally results in non-portable
code that should be replaced with writel() or something specific to
the bus on the architecture you deal with.

> +       spin_lock_irqsave(&qdev->dbc[exec->dbc_id].xfer_lock, flags);
> +       req_id = qdev->dbc[exec->dbc_id].next_req_id++;
> +       queued = mem->queued;
> +       mem->queued = true;
> +       spin_unlock_irqrestore(&qdev->dbc[exec->dbc_id].xfer_lock, flags);

No need for 'irqsave' locks when you know that interrupts are enabled.

> +       head = le32_to_cpu(__raw_readl(dbc->dbc_base + RSPHP_OFF));
> +       tail = le32_to_cpu(__raw_readl(dbc->dbc_base + RSPTP_OFF));

More __raw accessors to replace.

> +       case QAIC_IOCTL_MEM_NR:
> +               if (_IOC_DIR(cmd) != (_IOC_READ | _IOC_WRITE) ||
> +                   _IOC_SIZE(cmd) != sizeof(struct qaic_mem_req)) {
> +                       ret = -EINVAL;
> +                       break;

This looks like a very verbose way to check 'cmd' against a known
constant. Why not use 'switch (cmd)' like all other drivers?

      Arnd

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH 5/8] qaic: Implement data path
  2020-05-14 21:36   ` Arnd Bergmann
@ 2020-05-14 22:06     ` Jeffrey Hugo
  2020-05-14 22:20       ` Arnd Bergmann
  0 siblings, 1 reply; 39+ messages in thread
From: Jeffrey Hugo @ 2020-05-14 22:06 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: gregkh, Manivannan Sadhasivam, Bjorn Andersson, wufan, pratanan,
	linux-arm-msm, linux-kernel

Thanks for the review.

On 5/14/2020 3:36 PM, Arnd Bergmann wrote:
> On Thu, May 14, 2020 at 4:09 PM Jeffrey Hugo <jhugo@codeaurora.org> wrote:
>>
>> +struct dbc_req { /* everything must be little endian encoded */
> 
> Instead of the comment, I suppose you want to use __le16 and __le32
> types and let sparse check that you got it right.

Ah yes, I was curious if those should be applied here.  Their use seems 
inconsistent.  I will do that.

> 
>> +       u16     req_id;
>> +       u8      seq_id;
>> +       u8      cmd;
>> +       u32     resv;
>> +       u64     src_addr;
>> +       u64     dest_addr;
>> +       u32     len;
>> +       u32     resv2;
>> +       u64     db_addr; /* doorbell address */
>> +       u8      db_len; /* not a raw value, special encoding */
>> +       u8      resv3;
>> +       u16     resv4;
>> +       u32     db_data;
>> +       u32     sem_cmd0;
>> +       u32     sem_cmd1;
>> +       u32     sem_cmd2;
>> +       u32     sem_cmd3;
>> +} __packed;
> 
> All members are naturally aligned, so better drop the __packed
> annotation get better code, unless the structure itself is
> at an unaligned offset in memory.

I'm going to have to disagree.  While most "sane" compilers would not 
add extra padding, I've debugged enough issues in the past when 
sending/receiving data with foreign environments to never trust anything 
that isn't "packed".

Unless I missed something in the C spec that requires naturally aligned 
structures to have an identical layout in memory, I'll take safety and 
functional correctness over performance.

> 
>> +struct dbc_rsp { /* everything must be little endian encoded */
>> +       u16     req_id;
>> +       u16     status;
>> +} __packed;
> 
> Same here.
> 
>> +       init_completion(&mem->xfer_done);
>> +       list_add_tail(&mem->list, &dbc->xfer_list);
>> +       tail = (tail + mem->nents) % dbc->nelem;
>> +       __raw_writel(cpu_to_le32(tail), dbc->dbc_base + REQTP_OFF);
> 
> What is this __raw accessor for? This generally results in non-portable
> code that should be replaced with writel() or something specific to
> the bus on the architecture you deal with.

The barrier(s) that come with writel are unnecessary in this case. 
Since this is part of our critical path, we are sensitive to its 
performance.

What are the portability issues around the __raw variant?

> 
>> +       spin_lock_irqsave(&qdev->dbc[exec->dbc_id].xfer_lock, flags);
>> +       req_id = qdev->dbc[exec->dbc_id].next_req_id++;
>> +       queued = mem->queued;
>> +       mem->queued = true;
>> +       spin_unlock_irqrestore(&qdev->dbc[exec->dbc_id].xfer_lock, flags);
> 
> No need for 'irqsave' locks when you know that interrupts are enabled.

Fair enough.

> 
>> +       head = le32_to_cpu(__raw_readl(dbc->dbc_base + RSPHP_OFF));
>> +       tail = le32_to_cpu(__raw_readl(dbc->dbc_base + RSPTP_OFF));
> 
> More __raw accessors to replace.

Same answer as before.

> 
>> +       case QAIC_IOCTL_MEM_NR:
>> +               if (_IOC_DIR(cmd) != (_IOC_READ | _IOC_WRITE) ||
>> +                   _IOC_SIZE(cmd) != sizeof(struct qaic_mem_req)) {
>> +                       ret = -EINVAL;
>> +                       break;
> 
> This looks like a very verbose way to check 'cmd' against a known
> constant. Why not use 'switch (cmd)' like all other drivers?

Huh.  That actually does sound more elegant.  Will do.


-- 
Jeffrey Hugo
Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.

* Re: [RFC PATCH 5/8] qaic: Implement data path
  2020-05-14 22:06     ` Jeffrey Hugo
@ 2020-05-14 22:20       ` Arnd Bergmann
  0 siblings, 0 replies; 39+ messages in thread
From: Arnd Bergmann @ 2020-05-14 22:20 UTC (permalink / raw)
  To: Jeffrey Hugo
  Cc: gregkh, Manivannan Sadhasivam, Bjorn Andersson, wufan, pratanan,
	linux-arm-msm, linux-kernel

On Fri, May 15, 2020 at 12:06 AM Jeffrey Hugo <jhugo@codeaurora.org> wrote:

> >> +       u16     req_id;
> >> +       u8      seq_id;
> >> +       u8      cmd;
> >> +       u32     resv;
> >> +       u64     src_addr;
> >> +       u64     dest_addr;
> >> +       u32     len;
> >> +       u32     resv2;
> >> +       u64     db_addr; /* doorbell address */
> >> +       u8      db_len; /* not a raw value, special encoding */
> >> +       u8      resv3;
> >> +       u16     resv4;
> >> +       u32     db_data;
> >> +       u32     sem_cmd0;
> >> +       u32     sem_cmd1;
> >> +       u32     sem_cmd2;
> >> +       u32     sem_cmd3;
> >> +} __packed;
> >
> > All members are naturally aligned, so better drop the __packed
> > annotation get better code, unless the structure itself is
> > at an unaligned offset in memory.
>
> I'm going to have to disagree.  While most "sane" compilers would not
> add extra padding, I've debugged enough issues in the past when
> sending/receiving data with foreign environments to never trust anything
> that isn't "packed".
>
> Unless I missed something in the C spec that requires naturally aligned
> structures to have an identical layout in memory, I'll take safety and
> functional correctness over performance.

The C standard does not mandate the layout and individual ABIs can
be different, but Linux only runs on ABIs that have the correct
layout for the structure above.

The problem is that the compiler will split up word accesses into bytes
on some architectures such as older ARM, to avoid unaligned
loads and stores. It should be fine if you add both __packed and
__aligned(8) to make the structure aligned again.

> >> +       init_completion(&mem->xfer_done);
> >> +       list_add_tail(&mem->list, &dbc->xfer_list);
> >> +       tail = (tail + mem->nents) % dbc->nelem;
> >> +       __raw_writel(cpu_to_le32(tail), dbc->dbc_base + REQTP_OFF);
> >
> > What is this __raw accessor for? This generally results in non-portable
> > code that should be replaced with writel() or something specific to
> > the bus on the architecture you deal with.
>
> The barrier(s) that come with writel are unnecessary in this case.
> Since this is part of our critical path, we are sensitive to its
> performance.
>
> What are the portability issues around the __raw variant?

The access may not be atomic on all architectures, but may get either
split up or combined with adjacent accesses.

There is no guarantee about endianness: while most architectures
treat __raw accesses as a simple load/store, some might have an
implied byteswap.

__raw accesses might be completely reordered around other
mmio accesses or spinlocks.

If you want to avoid the barrier, there are
writel_relaxed()/readl_relaxed(), which skip most barriers,
so they can be reordered against MSI interrupts and
DMA accesses on architectures that allow it (e.g. ARM
or PowerPC), but not against each other.

If you do use them, I'd still expect a comment that explains
why this instance is performance critical and how it is
synchronized against DMA or MSI if necessary.

    Arnd

* Re: [RFC PATCH 1/8] qaic: Add skeleton driver
  2020-05-14 14:07 ` [RFC PATCH 1/8] qaic: Add skeleton driver Jeffrey Hugo
@ 2020-05-15  0:43   ` Jeffrey Hugo
  2020-05-15  6:37     ` Greg KH
  0 siblings, 1 reply; 39+ messages in thread
From: Jeffrey Hugo @ 2020-05-15  0:43 UTC (permalink / raw)
  To: arnd, gregkh
  Cc: manivannan.sadhasivam, bjorn.andersson, wufan, pratanan,
	linux-arm-msm, linux-kernel

On 5/14/2020 8:07 AM, Jeffrey Hugo wrote:
> +#define QAIC_NAME			"Qualcomm Cloud AI 100"
<snip>
> +static struct pci_driver qaic_pci_driver = {
> +	.name = QAIC_NAME,

A question about the community's preference here.

Our driver name is very verbose due to the desire to have the proper 
"branding".  However, I can see it being a bit obtuse, particularly in logs.

Would the community prefer something more succinct here, such as "qaic"?

-- 
Jeffrey Hugo
Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.

* Re: [RFC PATCH 1/8] qaic: Add skeleton driver
  2020-05-15  0:43   ` Jeffrey Hugo
@ 2020-05-15  6:37     ` Greg KH
  0 siblings, 0 replies; 39+ messages in thread
From: Greg KH @ 2020-05-15  6:37 UTC (permalink / raw)
  To: Jeffrey Hugo
  Cc: arnd, manivannan.sadhasivam, bjorn.andersson, wufan, pratanan,
	linux-arm-msm, linux-kernel

On Thu, May 14, 2020 at 06:43:06PM -0600, Jeffrey Hugo wrote:
> On 5/14/2020 8:07 AM, Jeffrey Hugo wrote:
> > +#define QAIC_NAME			"Qualcomm Cloud AI 100"
> <snip>
> > +static struct pci_driver qaic_pci_driver = {
> > +	.name = QAIC_NAME,
> 
> A question about the community's preference here.
> 
> Our driver name is very verbose due to the desire to have the proper
> "branding".  However, I can see it being a bit obtuse, particularly in logs.
> 
> Would the community prefer something more succinct here, such as "qaic"?

Make it the same a KBUILD_MODNAME and then no one can complain :)

thanks,

greg k-h

* Re: [RFC PATCH 3/8] qaic: Create char dev
  2020-05-14 16:24         ` Jeffrey Hugo
@ 2020-05-15 21:08           ` Jeffrey Hugo
  2020-05-16  7:01             ` Greg KH
  0 siblings, 1 reply; 39+ messages in thread
From: Jeffrey Hugo @ 2020-05-15 21:08 UTC (permalink / raw)
  To: Greg KH
  Cc: arnd, manivannan.sadhasivam, bjorn.andersson, wufan, pratanan,
	linux-arm-msm, linux-kernel

On 5/14/2020 10:24 AM, Jeffrey Hugo wrote:
> On 5/14/2020 9:56 AM, Greg KH wrote:
>> Please use misc.
> 
> Ok, will investigate.
> 

Misc looks like it'll work, and I'm expecting to have that in the next 
revision.  However, a few of us looked at misc vs chardev, and didn't 
see much of a difference.

We were hoping you'd be kind enough to educate us on the considerations 
between the two, in case we missed something that ends up being very 
relevant.  We attempted to find relevant documentation within the 
kernel, and in Linux Device Drivers 3rd edition, but did not find any 
besides just reading the misc dev code.  If we missed something, please 
point us to it.

By going with a misc device, we see two possible limitations -

1. You don't have your own major number, so userspace may have to do a 
bit more work to identify your device.  However, given that the driver 
assigns the device name, one would think that the device name would be 
pretty unique.  So, it seems like this would have a minor impact to udev 
rules.

2. There are a limited number of dynamic minor numbers for misc devs 
(64), so if you are expecting more devices than that, a misc dev is not 
appropriate.  Also, these minors are shared with other misc dev users, so 
depending on the system configuration, you might have significantly less 
than 64 minors available for use.

-- 
Jeffrey Hugo
Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.

* Re: [RFC PATCH 3/8] qaic: Create char dev
  2020-05-15 21:08           ` Jeffrey Hugo
@ 2020-05-16  7:01             ` Greg KH
  2020-05-16 21:29               ` Jeffrey Hugo
  0 siblings, 1 reply; 39+ messages in thread
From: Greg KH @ 2020-05-16  7:01 UTC (permalink / raw)
  To: Jeffrey Hugo
  Cc: arnd, manivannan.sadhasivam, bjorn.andersson, wufan, pratanan,
	linux-arm-msm, linux-kernel

On Fri, May 15, 2020 at 03:08:59PM -0600, Jeffrey Hugo wrote:
> 2. There are a limited number of dynamic minor numbers for misc devs (64),
> so if you are expecting more devices than that, a misc dev is not
> appropriate.  Also, these minors are shared with other misc dev users, so
> depending on the system configuration, you might have significantly less
> than 64 minors available for use.

I'm pretty sure we can have more than 64 misc devices, that limitation
should have been removed a while ago.  Try it and see :)

greg k-h

* Re: [RFC PATCH 3/8] qaic: Create char dev
  2020-05-16  7:01             ` Greg KH
@ 2020-05-16 21:29               ` Jeffrey Hugo
  2020-05-17  7:14                 ` Greg KH
  0 siblings, 1 reply; 39+ messages in thread
From: Jeffrey Hugo @ 2020-05-16 21:29 UTC (permalink / raw)
  To: Greg KH
  Cc: arnd, manivannan.sadhasivam, bjorn.andersson, wufan, pratanan,
	linux-arm-msm, linux-kernel

On 5/16/2020 1:01 AM, Greg KH wrote:
> On Fri, May 15, 2020 at 03:08:59PM -0600, Jeffrey Hugo wrote:
>> 2. There are a limited number of dynamic minor numbers for misc devs (64),
>> so if you are expecting more devices than that, a misc dev is not
>> appropiate.  Also, these minors are shared with other misc dev users, so
>> depending on the system configuration, you might have significantly less
>> than 64 minors available for use.
> 
> I'm pretty sure we can have more than 64 misc devices, that limitation
> should have been removed a while ago.  Try it and see :)

In total, there can be more than 64 misc devices.  However my previous 
comment was specific to dynamic minors (i.e. devices which do not have an 
assigned minor).  The limit on dynamic minors still appears to be 64. 
Looking at the code -

DYNAMIC_MINORS is still 64
https://elixir.bootlin.com/linux/v5.7-rc5/source/drivers/char/misc.c#L63

I see the same in -next

DYNAMIC_MINORS is used to size a bitmap - one bit for each dynamic minor 
misc device that exists at one particular point in time.  After all 64 
bits are consumed by misc_register() by clients requesting a dynamic 
minor, no more dynamic minor misc devices can be registered until some 
are unregistered.

What am I missing?

-- 
Jeffrey Hugo
Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.

* Re: [RFC PATCH 3/8] qaic: Create char dev
  2020-05-16 21:29               ` Jeffrey Hugo
@ 2020-05-17  7:14                 ` Greg KH
  2020-05-17 19:37                   ` Jeffrey Hugo
  0 siblings, 1 reply; 39+ messages in thread
From: Greg KH @ 2020-05-17  7:14 UTC (permalink / raw)
  To: Jeffrey Hugo
  Cc: arnd, manivannan.sadhasivam, bjorn.andersson, wufan, pratanan,
	linux-arm-msm, linux-kernel

On Sat, May 16, 2020 at 03:29:19PM -0600, Jeffrey Hugo wrote:
> On 5/16/2020 1:01 AM, Greg KH wrote:
> > On Fri, May 15, 2020 at 03:08:59PM -0600, Jeffrey Hugo wrote:
> > > 2. There are a limited number of dynamic minor numbers for misc devs (64),
> > > so if you are expecting more devices than that, a misc dev is not
> > > appropriate.  Also, these minors are shared with other misc dev users, so
> > > depending on the system configuration, you might have significantly less
> > > than 64 minors available for use.
> > 
> > I'm pretty sure we can have more than 64 misc devices, that limitation
> > should have been removed a while ago.  Try it and see :)
> 
> In total, there can be more than 64 misc devices.  However my previous
> comment was specific to dynamic minors (i.e. devices which do not have an
> assigned minor).  The limit on dynamic minors still appears to be 64. Looking
> at the code -
> 
> DYNAMIC_MINORS is still 64
> https://elixir.bootlin.com/linux/v5.7-rc5/source/drivers/char/misc.c#L63
> 
> I see the same in -next
> 
> DYNAMIC_MINORS is used to size a bitmap - one bit for each dynamic minor
> misc device that exists at one particular point in time.  After all 64 bits
> are consumed by misc_register() by clients requesting a dynamic minor, no
> more dynamic minor misc devices can be registered until some are
> unregistered.
> 
> What am I missing?

Oops, nothing, my fault.  We fixed up the allocation of more dynamic
majors for chardev in 2017 and for some reason I thought we also
increased the number of misc dynamic minors at the same time, but that
was incorrect.

I'll gladly take patches that bump up the number of misc minors if
needed.

But to get back to the main issue here, you are only going to have 1 or
maybe 2 of these devices in a system at a time, right?  So "burning" a
whole major number for that feels like a waste.

thanks,

greg k-h

* Re: [RFC PATCH 3/8] qaic: Create char dev
  2020-05-17  7:14                 ` Greg KH
@ 2020-05-17 19:37                   ` Jeffrey Hugo
  0 siblings, 0 replies; 39+ messages in thread
From: Jeffrey Hugo @ 2020-05-17 19:37 UTC (permalink / raw)
  To: Greg KH
  Cc: arnd, manivannan.sadhasivam, bjorn.andersson, wufan, pratanan,
	linux-arm-msm, linux-kernel

On 5/17/2020 1:14 AM, Greg KH wrote:
> On Sat, May 16, 2020 at 03:29:19PM -0600, Jeffrey Hugo wrote:
>> On 5/16/2020 1:01 AM, Greg KH wrote:
>>> On Fri, May 15, 2020 at 03:08:59PM -0600, Jeffrey Hugo wrote:
>>>> 2. There are a limited number of dynamic minor numbers for misc devs (64),
>>>> so if you are expecting more devices than that, a misc dev is not
>>>> appropriate.  Also, these minors are shared with other misc dev users, so
>>>> depending on the system configuration, you might have significantly less
>>>> than 64 minors available for use.
>>>
>>> I'm pretty sure we can have more than 64 misc devices, that limitation
>>> should have been removed a while ago.  Try it and see :)
>>
>> In total, there can be more than 64 misc devices.  However my previous
>> comment was specific to dynamic minors (i.e. devices which do not have an
>> assigned minor).  The limit on dynamic minors still appears to be 64. Looking
>> at the code -
>>
>> DYNAMIC_MINORS is still 64
>> https://elixir.bootlin.com/linux/v5.7-rc5/source/drivers/char/misc.c#L63
>>
>> I see the same in -next
>>
>> DYNAMIC_MINORS is used to size a bitmap - one bit for each dynamic minor
>> misc device that exists at one particular point in time.  After all 64 bits
>> are consumed by misc_register() by clients requesting a dynamic minor, no
>> more dynamic minor misc devices can be registered until some are
>> unregistered.
>>
>> What am I missing?
> 
> Oops, nothing, my fault.  We fixed up the allocation of more dynamic
> majors for chardev in 2017 and for some reason I thought we also
> increased the number of misc dynamic minors at the same time, but that
> was incorrect.

No problem.

> I'll gladly take patches that bump up the number of misc minors if
> needed.

I don't think its needed at this time, but I will keep that in mind.

> But to get back to the main issue here, you are only going to have 1 or
> maybe 2 of these devices in a system at a time, right?  So "burning" a
> whole major number for that feels like a waste.

Depends on what the customer wants to do.  We support a number of 
systems, but one in particular can hold 6-12 of these devices.

-- 
Jeffrey Hugo
Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.

* Re: [RFC PATCH 0/8] Qualcomm Cloud AI 100 driver
  2020-05-14 14:07 [RFC PATCH 0/8] Qualcomm Cloud AI 100 driver Jeffrey Hugo
                   ` (7 preceding siblings ...)
  2020-05-14 14:07 ` [RFC PATCH 8/8] MAINTAINERS: Add entry for QAIC driver Jeffrey Hugo
@ 2020-05-19  5:08 ` Dave Airlie
  2020-05-19 14:57   ` Jeffrey Hugo
  2020-05-19 17:33   ` Greg Kroah-Hartman
  2020-05-19  6:57 ` Manivannan Sadhasivam
  9 siblings, 2 replies; 39+ messages in thread
From: Dave Airlie @ 2020-05-19  5:08 UTC (permalink / raw)
  To: Jeffrey Hugo
  Cc: Arnd Bergmann, Greg Kroah-Hartman, manivannan.sadhasivam,
	bjorn.andersson, wufan, pratanan, linux-arm-msm, LKML

On Fri, 15 May 2020 at 00:12, Jeffrey Hugo <jhugo@codeaurora.org> wrote:
>
> Introduction:
> Qualcomm Cloud AI 100 is a PCIe adapter card which contains a dedicated
> SoC ASIC for the purpose of efficiently running Deep Learning inference
> workloads in a data center environment.
>
> The official press release can be found at -
> https://www.qualcomm.com/news/releases/2019/04/09/qualcomm-brings-power-efficient-artificial-intelligence-inference
>
> The official product website is -
> https://www.qualcomm.com/products/datacenter-artificial-intelligence
>
> At the time of the official press release, numerous technology news sites
> also covered the product.  Doing a search of your favorite site is likely
> to find their coverage of it.
>
> It is our goal to have the kernel driver for the product fully upstream.
> The purpose of this RFC is to start that process.  We are still doing
> development (see below), and thus not quite looking to gain acceptance
> yet, but now that we have a working driver we believe we are at the stage
> where meaningful conversation with the community can occur.


Hi Jeffrey,

Just wondering what the userspace/testing plans are for this driver.

This introduces a new user facing API for a device without pointers to
users or tests for that API.

Although this isn't a graphics driver, and Greg will likely merge
anything to the kernel you throw at him, I do wonder how to validate
the uapi from a security perspective. It's always interesting when
someone wraps a DMA engine with user ioctls, and here there isn't
enough information to decide whether the DMA engine is secure against
userspace misprogramming it.

Also, if we don't understand the programming API on board the device,
we can't tell whether the cores on the device are able to reprogram
the device engines either.

Figuring this out is difficult at the best of times, it helps if there
is access to the complete device documentation or user space side
drivers in order to facilitate this.

The other area I'd mention is testing the uAPI: how do you envisage
regression testing and long-term sustainability of the uAPI?

Thanks,
Dave.

* Re: [RFC PATCH 0/8] Qualcomm Cloud AI 100 driver
  2020-05-14 14:07 [RFC PATCH 0/8] Qualcomm Cloud AI 100 driver Jeffrey Hugo
                   ` (8 preceding siblings ...)
  2020-05-19  5:08 ` [RFC PATCH 0/8] Qualcomm Cloud AI 100 driver Dave Airlie
@ 2020-05-19  6:57 ` Manivannan Sadhasivam
  2020-05-19 14:16   ` Jeffrey Hugo
  9 siblings, 1 reply; 39+ messages in thread
From: Manivannan Sadhasivam @ 2020-05-19  6:57 UTC (permalink / raw)
  To: Jeffrey Hugo
  Cc: arnd, gregkh, bjorn.andersson, wufan, pratanan, linux-arm-msm,
	linux-kernel

Hi Jeff,

On Thu, May 14, 2020 at 08:07:38AM -0600, Jeffrey Hugo wrote:
> Introduction:
> Qualcomm Cloud AI 100 is a PCIe adapter card which contains a dedicated
> SoC ASIC for the purpose of efficiently running Deep Learning inference
> workloads in a data center environment.
> 
> The official press release can be found at -
> https://www.qualcomm.com/news/releases/2019/04/09/qualcomm-brings-power-efficient-artificial-intelligence-inference
> 
> The official product website is -
> https://www.qualcomm.com/products/datacenter-artificial-intelligence
> 
> At the time of the official press release, numerous technology news sites
> also covered the product.  Doing a search of your favorite site is likely
> to find their coverage of it.
> 
> It is our goal to have the kernel driver for the product fully upstream.
> The purpose of this RFC is to start that process.  We are still doing
> development (see below), and thus not quite looking to gain acceptance
> yet, but now that we have a working driver we believe we are at the stage
> where meaningful conversation with the community can occur.
> 
> Design:

Can you add documentation in the next revision with all this information (or
more)?  In reStructuredText, of course.  Even though it is an RFC series,
adding documentation doesn't hurt, and it will help reviewers understand the
hardware better.

Thanks,
Mani

> 
> +--------------------------------+
> |       AI application           |
> |       (userspace)              |
> +-------------+------------------+
>               |
>               | Char dev interface
>               |
>               |
> +-------------+------------------+
> |       QAIC driver              |
> |       (kernel space)           |
> |                                |
> +----+------------------+--------+
>      |                  |
>      |                  |
>      |                  |
>      |                  |
>      |Control path      | Data path
>      |(MHI bus)         |
>      |                  |
>      |                  |
>      |                  |
>      |                  |
> +--------------------------------+
> | +--------+      +------------+ |
> | | MHI HW |      |DMA Bridge  | |
> | +--------+      |(DMA engine)| |
> |                 +------------+ |
> |                                |
> |                                |
> |                                |
> |  Qualcomm Cloud AI 100 device  |
> |                                |
> |                                |
> +--------------------------------+
> 
> A Qualcomm Cloud AI 100 device (QAIC device from here on) is a PCIe hardware
> accelerator for AI inference workloads.  Over the PCIe bus fabric, a QAIC
> device exposes two interfaces via PCI BARs - a MHI hardware region and a
> DMA engine hardware region.
> 
> Before workloads can be run, a QAIC device needs to be initialized.  Similar
> to other Qualcomm products which incorporate MHI, device firmware needs to be
> loaded onto the device from the host.  This occurs in two stages.  First,
> a secondary bootloader (SBL) needs to be loaded onto the device.  This occurs
> via the BHI protocol, and is handled by the MHI bus.  Once the SBL is loaded
> and running, it activates the Sahara protocol.  The Sahara protocol is used
> with a userspace application to load and initialize the remaining firmware.
> The Sahara protocol and associated userspace application are outside the
> scope of this series as they have no direct interaction with the QAIC driver.
> 
> Once a QAIC device is fully initialized, workloads can be sent to the device
> and run.  This involves a per-device instance char dev that the QAIC driver
> exposes to userspace.  Running a workload involves two phases - configuring the
> device, and interacting with the workload.
> 
> To configure the device, commands are sent via a MHI channel.  This is referred
> to as the control path.  A command is a single message.  A message contains
> one or more transactions.  Transactions are operations that the device
> is requested to perform.  Most commands are opaque to the kernel driver, however
> some are not.  For example, if the user application wishes to DMA data to the
> device, it requires the assistance of the kernel driver to translate the data
> addresses to an address space that the device can understand.  In this instance
> the transaction for DMAing the data is visible to the kernel driver, and the
> driver will do the required transformation when encoding the message.
> 
> To interact with the workload, the workload is assigned a DMA Bridge Channel
> (dbc).  This is dedicated hardware within the DMA engine.  Interacting with the
> workload consists of sending it input data, and receiving output data.  The
> user application requests appropriate buffers from the kernel driver, prepares
> the buffers, and directs the kernel driver to queue them to the hardware.
> 
> The kernel driver is required to support multiple QAIC devices, and also N
> users per device.
> 
> Status:
> This series introduces the driver for QAIC devices, and builds up the minimum
> functionality for running workloads.  Several features which have been omitted
> or are still planned are indicated in the future work section.
> 
> Before exiting the RFC phase, and attempting full acceptance, we wish to
> complete two features which are currently under development as we expect there
> to be userspace interface changes as a result.
> 
> The first feature is a variable length control message between the kernel driver
> and the device.  This allows us to support the total number of DMA transactions
> we require for certain platforms, while minimizing memory usage.  The interface
> impact of this would be to allow us to drop the size of the manage buffer
> between userspace and the kernel driver from the current 16k, much of which is
> wasted.
> 
> The second feature is an optimization and extension of the data path interface.
> We plan to move the bulk of the data in the qaic_execute structure to the
> qaic_mem_req structure, which optimizes our critical path processing.  We also
> plan to extend the qaic_execute structure to allow for a batch submit of
> multiple buffers as an optimization and convenience for userspace.  
> 
> Future work:
> For simplicity, we have omitted work related to the following features, and
> intend to submit in future series:
> 
> -debugfs
> -trace points
> -hwmon (device telemetry)
> 
> We are also investigating what it might mean to support dma_bufs.  We expect
> that such support would come as an extension of the interface.
> 
> Jeffrey Hugo (8):
>   qaic: Add skeleton driver
>   qaic: Add and init a basic mhi controller
>   qaic: Create char dev
>   qaic: Implement control path
>   qaic: Implement data path
>   qaic: Implement PCI link status error handlers
>   qaic: Implement MHI error status handler
>   MAINTAINERS: Add entry for QAIC driver
> 
>  MAINTAINERS                        |    7 +
>  drivers/misc/Kconfig               |    1 +
>  drivers/misc/Makefile              |    1 +
>  drivers/misc/qaic/Kconfig          |   20 +
>  drivers/misc/qaic/Makefile         |   12 +
>  drivers/misc/qaic/mhi_controller.c |  538 +++++++++++++++++++
>  drivers/misc/qaic/mhi_controller.h |   20 +
>  drivers/misc/qaic/qaic.h           |  111 ++++
>  drivers/misc/qaic/qaic_control.c   | 1015 ++++++++++++++++++++++++++++++++++++
>  drivers/misc/qaic/qaic_data.c      |  952 +++++++++++++++++++++++++++++++++
>  drivers/misc/qaic/qaic_drv.c       |  699 +++++++++++++++++++++++++
>  include/uapi/misc/qaic.h           |  246 +++++++++
>  12 files changed, 3622 insertions(+)
>  create mode 100644 drivers/misc/qaic/Kconfig
>  create mode 100644 drivers/misc/qaic/Makefile
>  create mode 100644 drivers/misc/qaic/mhi_controller.c
>  create mode 100644 drivers/misc/qaic/mhi_controller.h
>  create mode 100644 drivers/misc/qaic/qaic.h
>  create mode 100644 drivers/misc/qaic/qaic_control.c
>  create mode 100644 drivers/misc/qaic/qaic_data.c
>  create mode 100644 drivers/misc/qaic/qaic_drv.c
>  create mode 100644 include/uapi/misc/qaic.h
> 
> -- 
> Qualcomm Technologies, Inc. is a member of the
> Code Aurora Forum, a Linux Foundation Collaborative Project.

* Re: [RFC PATCH 0/8] Qualcomm Cloud AI 100 driver
  2020-05-19  6:57 ` Manivannan Sadhasivam
@ 2020-05-19 14:16   ` Jeffrey Hugo
  0 siblings, 0 replies; 39+ messages in thread
From: Jeffrey Hugo @ 2020-05-19 14:16 UTC (permalink / raw)
  To: Manivannan Sadhasivam
  Cc: arnd, gregkh, bjorn.andersson, wufan, pratanan, linux-arm-msm,
	linux-kernel

On 5/19/2020 12:57 AM, Manivannan Sadhasivam wrote:
> Hi Jeff,
> 
> On Thu, May 14, 2020 at 08:07:38AM -0600, Jeffrey Hugo wrote:
>> Introduction:
>> Qualcomm Cloud AI 100 is a PCIe adapter card which contains a dedicated
>> SoC ASIC for the purpose of efficiently running Deep Learning inference
>> workloads in a data center environment.
>>
>> The official press release can be found at -
>> https://www.qualcomm.com/news/releases/2019/04/09/qualcomm-brings-power-efficient-artificial-intelligence-inference
>>
>> The official product website is -
>> https://www.qualcomm.com/products/datacenter-artificial-intelligence
>>
>> At the time of the official press release, numerous technology news sites
>> also covered the product.  Doing a search of your favorite site is likely
>> to find their coverage of it.
>>
>> It is our goal to have the kernel driver for the product fully upstream.
>> The purpose of this RFC is to start that process.  We are still doing
>> development (see below), and thus not quite looking to gain acceptance
>> yet, but now that we have a working driver we believe we are at the stage
>> where meaningful conversation with the community can occur.
>>
>> Design:
> 
> Can you add documentation in the next revision with all this information (or more)?
> In reStructuredText, of course. Even though it is an RFC series, adding documentation
> doesn't hurt and it will help reviewers to understand the hardware better.

Sorry, saw this hit my inbox as I was sending out the next rev.  There 
will be another rev.

Sure.  I'm open to doing that.  Hmm, Documentation/misc-devices seems good?

Do you have specific additional information you think would be good?

-- 
Jeffrey Hugo
Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.

* Re: [RFC PATCH 0/8] Qualcomm Cloud AI 100 driver
  2020-05-19  5:08 ` [RFC PATCH 0/8] Qualcomm Cloud AI 100 driver Dave Airlie
@ 2020-05-19 14:57   ` Jeffrey Hugo
  2020-05-19 17:41     ` Greg Kroah-Hartman
  2020-05-19 17:33   ` Greg Kroah-Hartman
  1 sibling, 1 reply; 39+ messages in thread
From: Jeffrey Hugo @ 2020-05-19 14:57 UTC (permalink / raw)
  To: Dave Airlie
  Cc: Arnd Bergmann, Greg Kroah-Hartman, manivannan.sadhasivam,
	bjorn.andersson, wufan, pratanan, linux-arm-msm, LKML

On 5/18/2020 11:08 PM, Dave Airlie wrote:
> On Fri, 15 May 2020 at 00:12, Jeffrey Hugo <jhugo@codeaurora.org> wrote:
>>
>> Introduction:
>> Qualcomm Cloud AI 100 is a PCIe adapter card which contains a dedicated
>> SoC ASIC for the purpose of efficiently running Deep Learning inference
>> workloads in a data center environment.
>>
>> The official press release can be found at -
>> https://www.qualcomm.com/news/releases/2019/04/09/qualcomm-brings-power-efficient-artificial-intelligence-inference
>>
>> The official product website is -
>> https://www.qualcomm.com/products/datacenter-artificial-intelligence
>>
>> At the time of the official press release, numerous technology news sites
>> also covered the product.  A search of your favorite site is likely
>> to turn up their coverage of it.
>>
>> It is our goal to have the kernel driver for the product fully upstream.
>> The purpose of this RFC is to start that process.  We are still doing
>> development (see below), and thus are not quite looking to gain acceptance
>> yet, but now that we have a working driver we believe we are at the stage
>> where meaningful conversation with the community can occur.
> 
> 
> Hi Jeffrey,
>
> Just wondering what the userspace/testing plans are for this driver.
>
> This introduces a new user-facing API for a device without pointers to
> users or tests for that API.

We have daily internal testing, although I don't expect you to take my 
word for that.

I would like to get one of these devices into the hands of Linaro, so 
that it can be put into KernelCI.  Similar to other Qualcomm products. 
I'm trying to convince the powers that be to make this happen.

Regarding what the community could do on its own, everything but the 
Linux driver is considered proprietary - that includes the on device 
firmware and the entire userspace stack.  This is a decision above my 
pay grade.

I've asked for authorization to develop and publish a simple userspace 
application that might enable the community to do such testing, but 
obtaining that authorization has been slow.

> Although this isn't a graphics driver, and Greg will likely merge
> anything to the kernel you throw at him, I do wonder how to validate
> the uapi from a security perspective. It's always interesting when
> someone wraps a DMA engine with user ioctls without providing enough
> information to decide whether the DMA engine is secure against userspace
> misprogramming.

I'm curious, what information might you be looking for?  Are you 
concerned about the device attacking the host, or the host attacking the 
device?

> Also if we don't understand the programming API on board the device,
> we can't tell if the "cores" on the device are able to reprogram the
> device engines either.

So, you are looking for details about the messaging protocol which are 
considered opaque to the kernel driver?  Or something else?

> Figuring this out is difficult at the best of times; it helps if there
> is access to the complete device documentation or user space side
> drivers in order to facilitate this.

Regarding access to documentation, sadly that isn't going to happen now, 
or in the near future.  Again, above my pay grade.  The only public 
"documentation" is what you can see from my emails.

I understand your position, and if I can "bound" the information you are 
looking for, I can see what I can do about getting you what you want. 
No promises, but I will try.

> The other area I mention is testing the uAPI: how do you envisage
> regression testing and long-term sustainability of the uAPI?

Can you clarify what you mean by "uAPI"?  Are you referring to the 
interface between the device and the kernel driver?

-- 
Jeffrey Hugo
Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.

* Re: [RFC PATCH 0/8] Qualcomm Cloud AI 100 driver
  2020-05-19  5:08 ` [RFC PATCH 0/8] Qualcomm Cloud AI 100 driver Dave Airlie
  2020-05-19 14:57   ` Jeffrey Hugo
@ 2020-05-19 17:33   ` Greg Kroah-Hartman
  1 sibling, 0 replies; 39+ messages in thread
From: Greg Kroah-Hartman @ 2020-05-19 17:33 UTC (permalink / raw)
  To: Dave Airlie
  Cc: Jeffrey Hugo, Arnd Bergmann, manivannan.sadhasivam,
	bjorn.andersson, wufan, pratanan, linux-arm-msm, LKML

On Tue, May 19, 2020 at 03:08:42PM +1000, Dave Airlie wrote:
> On Fri, 15 May 2020 at 00:12, Jeffrey Hugo <jhugo@codeaurora.org> wrote:
> >
> > Introduction:
> > Qualcomm Cloud AI 100 is a PCIe adapter card which contains a dedicated
> > SoC ASIC for the purpose of efficiently running Deep Learning inference
> > workloads in a data center environment.
> >
> > The official press release can be found at -
> > https://www.qualcomm.com/news/releases/2019/04/09/qualcomm-brings-power-efficient-artificial-intelligence-inference
> >
> > The official product website is -
> > https://www.qualcomm.com/products/datacenter-artificial-intelligence
> >
> > At the time of the official press release, numerous technology news sites
> > also covered the product.  A search of your favorite site is likely
> > to turn up their coverage of it.
> >
> > It is our goal to have the kernel driver for the product fully upstream.
> > The purpose of this RFC is to start that process.  We are still doing
> > development (see below), and thus are not quite looking to gain acceptance
> > yet, but now that we have a working driver we believe we are at the stage
> > where meaningful conversation with the community can occur.
> 
> 
> Hi Jeffrey,
>
> Just wondering what the userspace/testing plans are for this driver.
>
> This introduces a new user-facing API for a device without pointers to
> users or tests for that API.
> 
> Although this isn't a graphics driver, and Greg will likely merge
> anything to the kernel you throw at him, I do wonder how to validate
> the uapi from a security perspective. It's always interesting when
> someone wraps a DMA engine with user ioctls without providing enough
> information to decide whether the DMA engine is secure against userspace
> misprogramming.

Hey, I'll not merge just anything!

Oh, well, maybe, if it's in staging :)

> Also if we don't understand the programming API on board the device,
> we can't tell if the "cores" on the device are able to reprogram the
> device engines either.
> 
> Figuring this out is difficult at the best of times; it helps if there
> is access to the complete device documentation or user space side
> drivers in order to facilitate this.
> 
> The other area I mention is testing the uAPI: how do you envisage
> regression testing and long-term sustainability of the uAPI?

I agree with this request, we should have some code that we can run in
order to test that things work properly.

thanks,

greg k-h

* Re: [RFC PATCH 0/8] Qualcomm Cloud AI 100 driver
  2020-05-19 14:57   ` Jeffrey Hugo
@ 2020-05-19 17:41     ` Greg Kroah-Hartman
  2020-05-19 18:07       ` Jeffrey Hugo
  0 siblings, 1 reply; 39+ messages in thread
From: Greg Kroah-Hartman @ 2020-05-19 17:41 UTC (permalink / raw)
  To: Jeffrey Hugo
  Cc: Dave Airlie, Arnd Bergmann, manivannan.sadhasivam,
	bjorn.andersson, wufan, pratanan, linux-arm-msm, LKML

On Tue, May 19, 2020 at 08:57:38AM -0600, Jeffrey Hugo wrote:
> On 5/18/2020 11:08 PM, Dave Airlie wrote:
> > On Fri, 15 May 2020 at 00:12, Jeffrey Hugo <jhugo@codeaurora.org> wrote:
> > > 
> > > Introduction:
> > > Qualcomm Cloud AI 100 is a PCIe adapter card which contains a dedicated
> > > SoC ASIC for the purpose of efficiently running Deep Learning inference
> > > workloads in a data center environment.
> > >
> > > The official press release can be found at -
> > > https://www.qualcomm.com/news/releases/2019/04/09/qualcomm-brings-power-efficient-artificial-intelligence-inference
> > >
> > > The official product website is -
> > > https://www.qualcomm.com/products/datacenter-artificial-intelligence
> > >
> > > At the time of the official press release, numerous technology news sites
> > > also covered the product.  A search of your favorite site is likely
> > > to turn up their coverage of it.
> > >
> > > It is our goal to have the kernel driver for the product fully upstream.
> > > The purpose of this RFC is to start that process.  We are still doing
> > > development (see below), and thus are not quite looking to gain acceptance
> > > yet, but now that we have a working driver we believe we are at the stage
> > > where meaningful conversation with the community can occur.
> > 
> > 
> > Hi Jeffrey,
> >
> > Just wondering what the userspace/testing plans are for this driver.
> >
> > This introduces a new user-facing API for a device without pointers to
> > users or tests for that API.
> 
> We have daily internal testing, although I don't expect you to take my word
> for that.
> 
> I would like to get one of these devices into the hands of Linaro, so that
> it can be put into KernelCI.  Similar to other Qualcomm products. I'm trying
> to convince the powers that be to make this happen.
> 
> Regarding what the community could do on its own, everything but the Linux
> driver is considered proprietary - that includes the on device firmware and
> the entire userspace stack.  This is a decision above my pay grade.

Ok, that's a decision you are going to have to push upward on, as we
really can't take this without a working, open, userspace.

Especially given the copyright owner of this code, it would be just
crazy and foolish to not have open userspace code as well.  Firmware
would be wonderful too; go poke your lawyers about derivative
work issues and the like for fun conversations :)

So without that changed, I'm not going to take this, and push to object
that anyone else take this.

I'm not going to be able to review any of this code anymore until that
changes, sorry.

thanks,

greg k-h

* Re: [RFC PATCH 0/8] Qualcomm Cloud AI 100 driver
  2020-05-19 17:41     ` Greg Kroah-Hartman
@ 2020-05-19 18:07       ` Jeffrey Hugo
  2020-05-19 18:12         ` Greg Kroah-Hartman
  0 siblings, 1 reply; 39+ messages in thread
From: Jeffrey Hugo @ 2020-05-19 18:07 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Dave Airlie, Arnd Bergmann, manivannan.sadhasivam,
	bjorn.andersson, wufan, pratanan, linux-arm-msm, LKML

On 5/19/2020 11:41 AM, Greg Kroah-Hartman wrote:
> On Tue, May 19, 2020 at 08:57:38AM -0600, Jeffrey Hugo wrote:
>> On 5/18/2020 11:08 PM, Dave Airlie wrote:
>>> On Fri, 15 May 2020 at 00:12, Jeffrey Hugo <jhugo@codeaurora.org> wrote:
>>>>
>>>> Introduction:
>>>> Qualcomm Cloud AI 100 is a PCIe adapter card which contains a dedicated
>>>> SoC ASIC for the purpose of efficiently running Deep Learning inference
>>>> workloads in a data center environment.
>>>>
>>>> The official press release can be found at -
>>>> https://www.qualcomm.com/news/releases/2019/04/09/qualcomm-brings-power-efficient-artificial-intelligence-inference
>>>>
>>>> The official product website is -
>>>> https://www.qualcomm.com/products/datacenter-artificial-intelligence
>>>>
>>>> At the time of the official press release, numerous technology news sites
>>>> also covered the product.  A search of your favorite site is likely
>>>> to turn up their coverage of it.
>>>>
>>>> It is our goal to have the kernel driver for the product fully upstream.
>>>> The purpose of this RFC is to start that process.  We are still doing
>>>> development (see below), and thus are not quite looking to gain acceptance
>>>> yet, but now that we have a working driver we believe we are at the stage
>>>> where meaningful conversation with the community can occur.
>>>
>>>
>>> Hi Jeffrey,
>>>
>>> Just wondering what the userspace/testing plans are for this driver.
>>>
>>> This introduces a new user-facing API for a device without pointers to
>>> users or tests for that API.
>>
>> We have daily internal testing, although I don't expect you to take my word
>> for that.
>>
>> I would like to get one of these devices into the hands of Linaro, so that
>> it can be put into KernelCI.  Similar to other Qualcomm products. I'm trying
>> to convince the powers that be to make this happen.
>>
>> Regarding what the community could do on its own, everything but the Linux
>> driver is considered proprietary - that includes the on device firmware and
>> the entire userspace stack.  This is a decision above my pay grade.
> 
> Ok, that's a decision you are going to have to push upward on, as we
> really can't take this without a working, open, userspace.

Fair enough.  I hope that your position may have made things easier for me.

I hope this doesn't widen the rift as it were, but what is the "bar" for 
this userspace?

Is a simple test application that adds two numbers on the hardware 
acceptable?

What is the bar for "working"?  I intend to satisfy this request in good 
faith, but I wonder, if no one has the hardware besides our customers, 
and possibly KernelCI, can you really say that I've provided a working 
userspace?

> > Especially given the copyright owner of this code, it would be just
> > crazy and foolish to not have open userspace code as well.  Firmware
> > would be wonderful too; go poke your lawyers about derivative
> > work issues and the like for fun conversations :)

Those are the kind of conversations I try to avoid  :)

> So without that changed, I'm not going to take this, and push to object
> that anyone else take this.
> 
> I'm not going to be able to review any of this code anymore until that
> changes, sorry.
> 
> thanks,
> 
> greg k-h
> 


-- 
Jeffrey Hugo
Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.

* Re: [RFC PATCH 0/8] Qualcomm Cloud AI 100 driver
  2020-05-19 18:07       ` Jeffrey Hugo
@ 2020-05-19 18:12         ` Greg Kroah-Hartman
  2020-05-19 18:26           ` Jeffrey Hugo
  0 siblings, 1 reply; 39+ messages in thread
From: Greg Kroah-Hartman @ 2020-05-19 18:12 UTC (permalink / raw)
  To: Jeffrey Hugo
  Cc: Dave Airlie, Arnd Bergmann, manivannan.sadhasivam,
	bjorn.andersson, wufan, pratanan, linux-arm-msm, LKML

On Tue, May 19, 2020 at 12:07:03PM -0600, Jeffrey Hugo wrote:
> On 5/19/2020 11:41 AM, Greg Kroah-Hartman wrote:
> > On Tue, May 19, 2020 at 08:57:38AM -0600, Jeffrey Hugo wrote:
> > > On 5/18/2020 11:08 PM, Dave Airlie wrote:
> > > > On Fri, 15 May 2020 at 00:12, Jeffrey Hugo <jhugo@codeaurora.org> wrote:
> > > > > 
> > > > > Introduction:
> > > > > Qualcomm Cloud AI 100 is a PCIe adapter card which contains a dedicated
> > > > > SoC ASIC for the purpose of efficiently running Deep Learning inference
> > > > > workloads in a data center environment.
> > > > >
> > > > > The official press release can be found at -
> > > > > https://www.qualcomm.com/news/releases/2019/04/09/qualcomm-brings-power-efficient-artificial-intelligence-inference
> > > > >
> > > > > The official product website is -
> > > > > https://www.qualcomm.com/products/datacenter-artificial-intelligence
> > > > >
> > > > > At the time of the official press release, numerous technology news sites
> > > > > also covered the product.  A search of your favorite site is likely
> > > > > to turn up their coverage of it.
> > > > >
> > > > > It is our goal to have the kernel driver for the product fully upstream.
> > > > > The purpose of this RFC is to start that process.  We are still doing
> > > > > development (see below), and thus are not quite looking to gain acceptance
> > > > > yet, but now that we have a working driver we believe we are at the stage
> > > > > where meaningful conversation with the community can occur.
> > > > 
> > > > 
> > > > Hi Jeffrey,
> > > >
> > > > Just wondering what the userspace/testing plans are for this driver.
> > > >
> > > > This introduces a new user-facing API for a device without pointers to
> > > > users or tests for that API.
> > > 
> > > We have daily internal testing, although I don't expect you to take my word
> > > for that.
> > > 
> > > I would like to get one of these devices into the hands of Linaro, so that
> > > it can be put into KernelCI.  Similar to other Qualcomm products. I'm trying
> > > to convince the powers that be to make this happen.
> > > 
> > > Regarding what the community could do on its own, everything but the Linux
> > > driver is considered proprietary - that includes the on device firmware and
> > > the entire userspace stack.  This is a decision above my pay grade.
> > 
> > Ok, that's a decision you are going to have to push upward on, as we
> > really can't take this without a working, open, userspace.
> 
> Fair enough.  I hope that your position may have made things easier for me.
> 
> I hope this doesn't widen the rift as it were, but what is the "bar" for
> this userspace?
> 
> Is a simple test application that adds two numbers on the hardware
> acceptable?

Make it the real library that you use for your applications that anyone
can then use as well if they have the hardware.  Why would you want
something "crippled"?

> What is the bar for "working"?  I intend to satisfy this request in good faith,
> but I wonder, if no one has the hardware besides our customers, and possibly
> KernelCI, can you really say that I've provided a working userspace?

How do you know who your customers really are, or who they sell the
chips to?  I could end up with one of these... :)

> > Especially given the copyright owner of this code, it would be just
> > crazy and foolish to not have open userspace code as well.  Firmware
> > would be wonderful too; go poke your lawyers about derivative
> > work issues and the like for fun conversations :)
> 
> Those are the kind of conversations I try to avoid  :)

Sounds like you are going to now have to have them, have fun!

greg k-h

* Re: [RFC PATCH 0/8] Qualcomm Cloud AI 100 driver
  2020-05-19 18:12         ` Greg Kroah-Hartman
@ 2020-05-19 18:26           ` Jeffrey Hugo
  2020-05-20  5:32             ` Greg Kroah-Hartman
  0 siblings, 1 reply; 39+ messages in thread
From: Jeffrey Hugo @ 2020-05-19 18:26 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Dave Airlie, Arnd Bergmann, manivannan.sadhasivam,
	bjorn.andersson, wufan, pratanan, linux-arm-msm, LKML

On 5/19/2020 12:12 PM, Greg Kroah-Hartman wrote:
> On Tue, May 19, 2020 at 12:07:03PM -0600, Jeffrey Hugo wrote:
>> On 5/19/2020 11:41 AM, Greg Kroah-Hartman wrote:
>>> On Tue, May 19, 2020 at 08:57:38AM -0600, Jeffrey Hugo wrote:
>>>> On 5/18/2020 11:08 PM, Dave Airlie wrote:
>>>>> On Fri, 15 May 2020 at 00:12, Jeffrey Hugo <jhugo@codeaurora.org> wrote:
>>>>>>
>>>>>> Introduction:
>>>>>> Qualcomm Cloud AI 100 is a PCIe adapter card which contains a dedicated
>>>>>> SoC ASIC for the purpose of efficiently running Deep Learning inference
>>>>>> workloads in a data center environment.
>>>>>>
>>>>>> The official press release can be found at -
>>>>>> https://www.qualcomm.com/news/releases/2019/04/09/qualcomm-brings-power-efficient-artificial-intelligence-inference
>>>>>>
>>>>>> The official product website is -
>>>>>> https://www.qualcomm.com/products/datacenter-artificial-intelligence
>>>>>>
>>>>>> At the time of the official press release, numerous technology news sites
>>>>>> also covered the product.  A search of your favorite site is likely
>>>>>> to turn up their coverage of it.
>>>>>>
>>>>>> It is our goal to have the kernel driver for the product fully upstream.
>>>>>> The purpose of this RFC is to start that process.  We are still doing
>>>>>> development (see below), and thus are not quite looking to gain acceptance
>>>>>> yet, but now that we have a working driver we believe we are at the stage
>>>>>> where meaningful conversation with the community can occur.
>>>>>
>>>>>
>>>>> Hi Jeffrey,
>>>>>
>>>>> Just wondering what the userspace/testing plans are for this driver.
>>>>>
>>>>> This introduces a new user-facing API for a device without pointers to
>>>>> users or tests for that API.
>>>>
>>>> We have daily internal testing, although I don't expect you to take my word
>>>> for that.
>>>>
>>>> I would like to get one of these devices into the hands of Linaro, so that
>>>> it can be put into KernelCI.  Similar to other Qualcomm products. I'm trying
>>>> to convince the powers that be to make this happen.
>>>>
>>>> Regarding what the community could do on its own, everything but the Linux
>>>> driver is considered proprietary - that includes the on device firmware and
>>>> the entire userspace stack.  This is a decision above my pay grade.
>>>
>>> Ok, that's a decision you are going to have to push upward on, as we
>>> really can't take this without a working, open, userspace.
>>
>> Fair enough.  I hope that your position may have made things easier for me.
>>
>> I hope this doesn't widen the rift as it were, but what is the "bar" for
>> this userspace?
>>
>> Is a simple test application that adds two numbers on the hardware
>> acceptable?
> 
> Make it the real library that you use for your applications that anyone
> can then use as well if they have the hardware.  Why would you want
> something "crippled"?

It makes it easier to dance around real or perceived IP issues, and thus 
I can likely more successfully "push upward" as you put it.

> 
>> What is the bar for "working"?  I intend to satisfy this request in good faith,
>> but I wonder, if no one has the hardware besides our customers, and possibly
>> KernelCI, can you really say that I've provided a working userspace?
> 
> How do you know who your customers really are, or who they sell the
> chips to?  I could end up with one of these... :)

At this time, I don't think that is going to happen, but I would like to 
see it regardless.

>>> Especially given the copyright owner of this code, it would be just
>>> crazy and foolish to not have open userspace code as well.  Firmware
>>> would be wonderful too; go poke your lawyers about derivative
>>> work issues and the like for fun conversations :)
>>
>> Those are the kind of conversations I try to avoid  :)
> 
> Sounds like you are going to now have to have them, have fun!

Honestly, I fail to see where you think there is a derivative work, so, 
I'm not really sure what discussions I need to revisit with our lawyers.

-- 
Jeffrey Hugo
Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.

* Re: [RFC PATCH 0/8] Qualcomm Cloud AI 100 driver
  2020-05-19 18:26           ` Jeffrey Hugo
@ 2020-05-20  5:32             ` Greg Kroah-Hartman
  0 siblings, 0 replies; 39+ messages in thread
From: Greg Kroah-Hartman @ 2020-05-20  5:32 UTC (permalink / raw)
  To: Jeffrey Hugo
  Cc: Dave Airlie, Arnd Bergmann, manivannan.sadhasivam,
	bjorn.andersson, wufan, pratanan, linux-arm-msm, LKML

On Tue, May 19, 2020 at 12:26:01PM -0600, Jeffrey Hugo wrote:
> On 5/19/2020 12:12 PM, Greg Kroah-Hartman wrote:
> > > > Especially given the copyright owner of this code, it would be just
> > > > crazy and foolish to not have open userspace code as well.  Firmware
> > > > would be wonderful too; go poke your lawyers about derivative
> > > > work issues and the like for fun conversations :)
> > > 
> > > Those are the kind of conversations I try to avoid  :)
> > 
> > Sounds like you are going to now have to have them, have fun!
> 
> Honestly, I fail to see where you think there is a derivative work, so, I'm
> not really sure what discussions I need to revisit with our lawyers.

Given that we are not lawyers, why don't we leave those types of
discussions up to the lawyers, and not depend on people like me and you
for that?  :)

If your lawyers think that the code division is fine as-is, that's
great, I'd be glad to review it if they add their signed-off-by: on it
verifying that the api divide is approved by them.

thanks!

greg k-h

end of thread, other threads:[~2020-05-20  5:32 UTC | newest]

Thread overview: 39+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-14 14:07 [RFC PATCH 0/8] Qualcomm Cloud AI 100 driver Jeffrey Hugo
2020-05-14 14:07 ` [RFC PATCH 1/8] qaic: Add skeleton driver Jeffrey Hugo
2020-05-15  0:43   ` Jeffrey Hugo
2020-05-15  6:37     ` Greg KH
2020-05-14 14:07 ` [RFC PATCH 2/8] qaic: Add and init a basic mhi controller Jeffrey Hugo
2020-05-14 14:07 ` [RFC PATCH 3/8] qaic: Create char dev Jeffrey Hugo
2020-05-14 14:12   ` Greg KH
2020-05-14 15:05     ` Jeffrey Hugo
2020-05-14 15:56       ` Greg KH
2020-05-14 16:24         ` Jeffrey Hugo
2020-05-15 21:08           ` Jeffrey Hugo
2020-05-16  7:01             ` Greg KH
2020-05-16 21:29               ` Jeffrey Hugo
2020-05-17  7:14                 ` Greg KH
2020-05-17 19:37                   ` Jeffrey Hugo
2020-05-14 14:07 ` [RFC PATCH 4/8] qaic: Implement control path Jeffrey Hugo
2020-05-14 14:07 ` [RFC PATCH 5/8] qaic: Implement data path Jeffrey Hugo
2020-05-14 14:14   ` Greg KH
2020-05-14 15:06     ` Jeffrey Hugo
2020-05-14 15:56       ` Greg KH
2020-05-14 16:12         ` Jeffrey Hugo
2020-05-14 16:37           ` Greg KH
2020-05-14 16:45             ` Jeffrey Hugo
2020-05-14 21:36   ` Arnd Bergmann
2020-05-14 22:06     ` Jeffrey Hugo
2020-05-14 22:20       ` Arnd Bergmann
2020-05-14 14:07 ` [RFC PATCH 6/8] qaic: Implement PCI link status error handlers Jeffrey Hugo
2020-05-14 14:07 ` [RFC PATCH 7/8] qaic: Implement MHI error status handler Jeffrey Hugo
2020-05-14 14:07 ` [RFC PATCH 8/8] MAINTAINERS: Add entry for QAIC driver Jeffrey Hugo
2020-05-19  5:08 ` [RFC PATCH 0/8] Qualcomm Cloud AI 100 driver Dave Airlie
2020-05-19 14:57   ` Jeffrey Hugo
2020-05-19 17:41     ` Greg Kroah-Hartman
2020-05-19 18:07       ` Jeffrey Hugo
2020-05-19 18:12         ` Greg Kroah-Hartman
2020-05-19 18:26           ` Jeffrey Hugo
2020-05-20  5:32             ` Greg Kroah-Hartman
2020-05-19 17:33   ` Greg Kroah-Hartman
2020-05-19  6:57 ` Manivannan Sadhasivam
2020-05-19 14:16   ` Jeffrey Hugo
