dmaengine.vger.kernel.org archive mirror
* [PATCH 0/7] Add support of AMD AE4DMA DMA Engine
@ 2024-05-10  8:20 Basavaraj Natikar
  2024-05-10  8:20 ` [PATCH 1/7] dmaengine: Move AMD DMA driver to separate directory Basavaraj Natikar
                   ` (6 more replies)
  0 siblings, 7 replies; 12+ messages in thread
From: Basavaraj Natikar @ 2024-05-10  8:20 UTC (permalink / raw)
  To: vkoul, dmaengine; +Cc: Raju.Rangoju, Basavaraj Natikar

The AMD AE4DMA controller is a multi-queue DMA controller. Its design
differs significantly from the PTDMA controller, although some
functionality overlaps. The functionality shared with PTDMA is extended
and merged into the PTDMA code so that it supports both PTDMA and
AE4DMA, maximizing code reuse. A new AE4DMA driver directory houses the
code unique to AE4DMA.

Basavaraj Natikar (7):
  dmaengine: Move AMD DMA driver to separate directory
  dmaengine: ae4dma: Add AMD ae4dma controller driver
  dmaengine: ptdma: Move common functions to common code
  dmaengine: ptdma: Extend ptdma to support multi-channel and version
  dmaengine: ae4dma: Register AE4DMA using pt_dmaengine_register
  dmaengine: ptdma: Extend ptdma-debugfs to support multi-queue
  dmaengine: ae4dma: Register debugfs using ptdma_debugfs_setup

 MAINTAINERS                                   |   9 +-
 drivers/dma/Kconfig                           |   4 +-
 drivers/dma/Makefile                          |   2 +-
 drivers/dma/amd/Kconfig                       |   6 +
 drivers/dma/amd/Makefile                      |   7 +
 drivers/dma/amd/ae4dma/Kconfig                |  13 +
 drivers/dma/amd/ae4dma/Makefile               |  10 +
 drivers/dma/amd/ae4dma/ae4dma-dev.c           | 281 ++++++++++++++++++
 drivers/dma/amd/ae4dma/ae4dma-pci.c           | 196 ++++++++++++
 drivers/dma/amd/ae4dma/ae4dma.h               |  80 +++++
 drivers/dma/amd/common/amd_dma.c              |  23 ++
 drivers/dma/amd/common/amd_dma.h              |  30 ++
 drivers/dma/{ => amd}/ptdma/Kconfig           |   0
 drivers/dma/{ => amd}/ptdma/Makefile          |   2 +-
 drivers/dma/{ => amd}/ptdma/ptdma-debugfs.c   |  76 +++--
 drivers/dma/{ => amd}/ptdma/ptdma-dev.c       |  14 +-
 drivers/dma/{ => amd}/ptdma/ptdma-dmaengine.c | 111 +++++--
 drivers/dma/{ => amd}/ptdma/ptdma-pci.c       |   0
 drivers/dma/{ => amd}/ptdma/ptdma.h           |   6 +-
 19 files changed, 805 insertions(+), 65 deletions(-)
 create mode 100644 drivers/dma/amd/Kconfig
 create mode 100644 drivers/dma/amd/Makefile
 create mode 100644 drivers/dma/amd/ae4dma/Kconfig
 create mode 100644 drivers/dma/amd/ae4dma/Makefile
 create mode 100644 drivers/dma/amd/ae4dma/ae4dma-dev.c
 create mode 100644 drivers/dma/amd/ae4dma/ae4dma-pci.c
 create mode 100644 drivers/dma/amd/ae4dma/ae4dma.h
 create mode 100644 drivers/dma/amd/common/amd_dma.c
 create mode 100644 drivers/dma/amd/common/amd_dma.h
 rename drivers/dma/{ => amd}/ptdma/Kconfig (100%)
 rename drivers/dma/{ => amd}/ptdma/Makefile (64%)
 rename drivers/dma/{ => amd}/ptdma/ptdma-debugfs.c (53%)
 rename drivers/dma/{ => amd}/ptdma/ptdma-dev.c (96%)
 rename drivers/dma/{ => amd}/ptdma/ptdma-dmaengine.c (78%)
 rename drivers/dma/{ => amd}/ptdma/ptdma-pci.c (100%)
 rename drivers/dma/{ => amd}/ptdma/ptdma.h (98%)

-- 
2.25.1



* [PATCH 1/7] dmaengine: Move AMD DMA driver to separate directory
  2024-05-10  8:20 [PATCH 0/7] Add support of AMD AE4DMA DMA Engine Basavaraj Natikar
@ 2024-05-10  8:20 ` Basavaraj Natikar
  2024-05-10  8:20 ` [PATCH 2/7] dmaengine: ae4dma: Add AMD ae4dma controller driver Basavaraj Natikar
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 12+ messages in thread
From: Basavaraj Natikar @ 2024-05-10  8:20 UTC (permalink / raw)
  To: vkoul, dmaengine; +Cc: Raju.Rangoju, Basavaraj Natikar

Currently, the AMD PTDMA driver is the only supported AMD DMA driver,
while newer AMD platforms ship a newer DMA engine. Move the existing
driver into a separate AMD-specific directory, so that future AMD DMA
driver submissions will also land there.

Reviewed-by: Raju Rangoju <Raju.Rangoju@amd.com>
Signed-off-by: Basavaraj Natikar <Basavaraj.Natikar@amd.com>
---
 MAINTAINERS                                   | 2 +-
 drivers/dma/Kconfig                           | 4 ++--
 drivers/dma/Makefile                          | 2 +-
 drivers/dma/amd/Kconfig                       | 5 +++++
 drivers/dma/amd/Makefile                      | 6 ++++++
 drivers/dma/{ => amd}/ptdma/Kconfig           | 0
 drivers/dma/{ => amd}/ptdma/Makefile          | 0
 drivers/dma/{ => amd}/ptdma/ptdma-debugfs.c   | 0
 drivers/dma/{ => amd}/ptdma/ptdma-dev.c       | 0
 drivers/dma/{ => amd}/ptdma/ptdma-dmaengine.c | 3 +--
 drivers/dma/{ => amd}/ptdma/ptdma-pci.c       | 0
 drivers/dma/{ => amd}/ptdma/ptdma.h           | 2 +-
 12 files changed, 17 insertions(+), 7 deletions(-)
 create mode 100644 drivers/dma/amd/Kconfig
 create mode 100644 drivers/dma/amd/Makefile
 rename drivers/dma/{ => amd}/ptdma/Kconfig (100%)
 rename drivers/dma/{ => amd}/ptdma/Makefile (100%)
 rename drivers/dma/{ => amd}/ptdma/ptdma-debugfs.c (100%)
 rename drivers/dma/{ => amd}/ptdma/ptdma-dev.c (100%)
 rename drivers/dma/{ => amd}/ptdma/ptdma-dmaengine.c (99%)
 rename drivers/dma/{ => amd}/ptdma/ptdma-pci.c (100%)
 rename drivers/dma/{ => amd}/ptdma/ptdma.h (99%)

diff --git a/MAINTAINERS b/MAINTAINERS
index 05720fcc95cb..b190efda33ba 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1073,7 +1073,7 @@ AMD PTDMA DRIVER
 M:	Basavaraj Natikar <Basavaraj.Natikar@amd.com>
 L:	dmaengine@vger.kernel.org
 S:	Maintained
-F:	drivers/dma/ptdma/
+F:	drivers/dma/amd/ptdma/
 
 AMD SEATTLE DEVICE TREE SUPPORT
 M:	Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig
index 002a5ec80620..ac6e9d3828b7 100644
--- a/drivers/dma/Kconfig
+++ b/drivers/dma/Kconfig
@@ -752,8 +752,6 @@ source "drivers/dma/bestcomm/Kconfig"
 
 source "drivers/dma/mediatek/Kconfig"
 
-source "drivers/dma/ptdma/Kconfig"
-
 source "drivers/dma/qcom/Kconfig"
 
 source "drivers/dma/dw/Kconfig"
@@ -772,6 +770,8 @@ source "drivers/dma/fsl-dpaa2-qdma/Kconfig"
 
 source "drivers/dma/lgm/Kconfig"
 
+source "drivers/dma/amd/Kconfig"
+
 # clients
 comment "DMA Clients"
 	depends on DMA_ENGINE
diff --git a/drivers/dma/Makefile b/drivers/dma/Makefile
index dfd40d14e408..41239304d21d 100644
--- a/drivers/dma/Makefile
+++ b/drivers/dma/Makefile
@@ -16,7 +16,7 @@ obj-$(CONFIG_DMATEST) += dmatest.o
 obj-$(CONFIG_ALTERA_MSGDMA) += altera-msgdma.o
 obj-$(CONFIG_AMBA_PL08X) += amba-pl08x.o
 obj-$(CONFIG_AMCC_PPC440SPE_ADMA) += ppc4xx/
-obj-$(CONFIG_AMD_PTDMA) += ptdma/
+obj-y += amd/
 obj-$(CONFIG_APPLE_ADMAC) += apple-admac.o
 obj-$(CONFIG_AT_HDMAC) += at_hdmac.o
 obj-$(CONFIG_AT_XDMAC) += at_xdmac.o
diff --git a/drivers/dma/amd/Kconfig b/drivers/dma/amd/Kconfig
new file mode 100644
index 000000000000..8246b463bcf7
--- /dev/null
+++ b/drivers/dma/amd/Kconfig
@@ -0,0 +1,5 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# AMD DMA Drivers
+
+source "drivers/dma/amd/ptdma/Kconfig"
diff --git a/drivers/dma/amd/Makefile b/drivers/dma/amd/Makefile
new file mode 100644
index 000000000000..dd7257ba7e06
--- /dev/null
+++ b/drivers/dma/amd/Makefile
@@ -0,0 +1,6 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# AMD DMA drivers
+#
+
+obj-$(CONFIG_AMD_PTDMA) += ptdma/
diff --git a/drivers/dma/ptdma/Kconfig b/drivers/dma/amd/ptdma/Kconfig
similarity index 100%
rename from drivers/dma/ptdma/Kconfig
rename to drivers/dma/amd/ptdma/Kconfig
diff --git a/drivers/dma/ptdma/Makefile b/drivers/dma/amd/ptdma/Makefile
similarity index 100%
rename from drivers/dma/ptdma/Makefile
rename to drivers/dma/amd/ptdma/Makefile
diff --git a/drivers/dma/ptdma/ptdma-debugfs.c b/drivers/dma/amd/ptdma/ptdma-debugfs.c
similarity index 100%
rename from drivers/dma/ptdma/ptdma-debugfs.c
rename to drivers/dma/amd/ptdma/ptdma-debugfs.c
diff --git a/drivers/dma/ptdma/ptdma-dev.c b/drivers/dma/amd/ptdma/ptdma-dev.c
similarity index 100%
rename from drivers/dma/ptdma/ptdma-dev.c
rename to drivers/dma/amd/ptdma/ptdma-dev.c
diff --git a/drivers/dma/ptdma/ptdma-dmaengine.c b/drivers/dma/amd/ptdma/ptdma-dmaengine.c
similarity index 99%
rename from drivers/dma/ptdma/ptdma-dmaengine.c
rename to drivers/dma/amd/ptdma/ptdma-dmaengine.c
index f79240734807..a2e7c2cec15e 100644
--- a/drivers/dma/ptdma/ptdma-dmaengine.c
+++ b/drivers/dma/amd/ptdma/ptdma-dmaengine.c
@@ -10,8 +10,7 @@
  */
 
 #include "ptdma.h"
-#include "../dmaengine.h"
-#include "../virt-dma.h"
+#include "../../dmaengine.h"
 
 static inline struct pt_dma_chan *to_pt_chan(struct dma_chan *dma_chan)
 {
diff --git a/drivers/dma/ptdma/ptdma-pci.c b/drivers/dma/amd/ptdma/ptdma-pci.c
similarity index 100%
rename from drivers/dma/ptdma/ptdma-pci.c
rename to drivers/dma/amd/ptdma/ptdma-pci.c
diff --git a/drivers/dma/ptdma/ptdma.h b/drivers/dma/amd/ptdma/ptdma.h
similarity index 99%
rename from drivers/dma/ptdma/ptdma.h
rename to drivers/dma/amd/ptdma/ptdma.h
index 21b4bf895200..2690a32fc7cb 100644
--- a/drivers/dma/ptdma/ptdma.h
+++ b/drivers/dma/amd/ptdma/ptdma.h
@@ -22,7 +22,7 @@
 #include <linux/wait.h>
 #include <linux/dmapool.h>
 
-#include "../virt-dma.h"
+#include "../../virt-dma.h"
 
 #define MAX_PT_NAME_LEN			16
 #define MAX_DMAPOOL_NAME_LEN		32
-- 
2.25.1



* [PATCH 2/7] dmaengine: ae4dma: Add AMD ae4dma controller driver
  2024-05-10  8:20 [PATCH 0/7] Add support of AMD AE4DMA DMA Engine Basavaraj Natikar
  2024-05-10  8:20 ` [PATCH 1/7] dmaengine: Move AMD DMA driver to separate directory Basavaraj Natikar
@ 2024-05-10  8:20 ` Basavaraj Natikar
  2024-05-10 18:16   ` Frank Li
  2024-05-10  8:20 ` [PATCH 3/7] dmaengine: ptdma: Move common functions to common code Basavaraj Natikar
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 12+ messages in thread
From: Basavaraj Natikar @ 2024-05-10  8:20 UTC (permalink / raw)
  To: vkoul, dmaengine; +Cc: Raju.Rangoju, Basavaraj Natikar

Add support for the AMD AE4DMA controller, which performs
high-bandwidth memory-to-memory and I/O copy operations. Device
commands are managed via a circular queue of 'descriptors', each of
which specifies the source and destination addresses for copying a
single buffer of data.

Reviewed-by: Raju Rangoju <Raju.Rangoju@amd.com>
Signed-off-by: Basavaraj Natikar <Basavaraj.Natikar@amd.com>
---
 MAINTAINERS                         |   6 +
 drivers/dma/amd/Kconfig             |   1 +
 drivers/dma/amd/Makefile            |   1 +
 drivers/dma/amd/ae4dma/Kconfig      |  13 ++
 drivers/dma/amd/ae4dma/Makefile     |  10 ++
 drivers/dma/amd/ae4dma/ae4dma-dev.c | 206 ++++++++++++++++++++++++++++
 drivers/dma/amd/ae4dma/ae4dma-pci.c | 195 ++++++++++++++++++++++++++
 drivers/dma/amd/ae4dma/ae4dma.h     |  77 +++++++++++
 drivers/dma/amd/common/amd_dma.h    |  26 ++++
 9 files changed, 535 insertions(+)
 create mode 100644 drivers/dma/amd/ae4dma/Kconfig
 create mode 100644 drivers/dma/amd/ae4dma/Makefile
 create mode 100644 drivers/dma/amd/ae4dma/ae4dma-dev.c
 create mode 100644 drivers/dma/amd/ae4dma/ae4dma-pci.c
 create mode 100644 drivers/dma/amd/ae4dma/ae4dma.h
 create mode 100644 drivers/dma/amd/common/amd_dma.h

diff --git a/MAINTAINERS b/MAINTAINERS
index b190efda33ba..45f2140093b6 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -909,6 +909,12 @@ L:	linux-edac@vger.kernel.org
 S:	Supported
 F:	drivers/ras/amd/atl/*
 
+AMD AE4DMA DRIVER
+M:	Basavaraj Natikar <Basavaraj.Natikar@amd.com>
+L:	dmaengine@vger.kernel.org
+S:	Maintained
+F:	drivers/dma/amd/ae4dma/
+
 AMD AXI W1 DRIVER
 M:	Kris Chaplin <kris.chaplin@amd.com>
 R:	Thomas Delev <thomas.delev@amd.com>
diff --git a/drivers/dma/amd/Kconfig b/drivers/dma/amd/Kconfig
index 8246b463bcf7..8c25a3ed6b94 100644
--- a/drivers/dma/amd/Kconfig
+++ b/drivers/dma/amd/Kconfig
@@ -3,3 +3,4 @@
 # AMD DMA Drivers
 
 source "drivers/dma/amd/ptdma/Kconfig"
+source "drivers/dma/amd/ae4dma/Kconfig"
diff --git a/drivers/dma/amd/Makefile b/drivers/dma/amd/Makefile
index dd7257ba7e06..8049b06a9ff5 100644
--- a/drivers/dma/amd/Makefile
+++ b/drivers/dma/amd/Makefile
@@ -4,3 +4,4 @@
 #
 
 obj-$(CONFIG_AMD_PTDMA) += ptdma/
+obj-$(CONFIG_AMD_AE4DMA) += ae4dma/
diff --git a/drivers/dma/amd/ae4dma/Kconfig b/drivers/dma/amd/ae4dma/Kconfig
new file mode 100644
index 000000000000..cf8db4dac98d
--- /dev/null
+++ b/drivers/dma/amd/ae4dma/Kconfig
@@ -0,0 +1,13 @@
+# SPDX-License-Identifier: GPL-2.0
+config AMD_AE4DMA
+	tristate  "AMD AE4DMA Engine"
+	depends on X86_64 && PCI
+	select DMA_ENGINE
+	select DMA_VIRTUAL_CHANNELS
+	help
+	  Enable support for the AMD AE4DMA controller. This controller
+	  provides DMA capabilities to perform high bandwidth memory to
+	  memory and IO copy operations. It performs DMA transfer through
+	  queue-based descriptor management. This DMA controller is intended
+	  to be used with AMD Non-Transparent Bridge devices and not for
+	  general purpose peripheral DMA.
diff --git a/drivers/dma/amd/ae4dma/Makefile b/drivers/dma/amd/ae4dma/Makefile
new file mode 100644
index 000000000000..e918f85a80ec
--- /dev/null
+++ b/drivers/dma/amd/ae4dma/Makefile
@@ -0,0 +1,10 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# AMD AE4DMA driver
+#
+
+obj-$(CONFIG_AMD_AE4DMA) += ae4dma.o
+
+ae4dma-objs := ae4dma-dev.o
+
+ae4dma-$(CONFIG_PCI) += ae4dma-pci.o
diff --git a/drivers/dma/amd/ae4dma/ae4dma-dev.c b/drivers/dma/amd/ae4dma/ae4dma-dev.c
new file mode 100644
index 000000000000..fc33d2056af2
--- /dev/null
+++ b/drivers/dma/amd/ae4dma/ae4dma-dev.c
@@ -0,0 +1,206 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * AMD AE4DMA driver
+ *
+ * Copyright (c) 2024, Advanced Micro Devices, Inc.
+ * All Rights Reserved.
+ *
+ * Author: Basavaraj Natikar <Basavaraj.Natikar@amd.com>
+ */
+
+#include "ae4dma.h"
+
+static unsigned int max_hw_q = 1;
+module_param(max_hw_q, uint, 0444);
+MODULE_PARM_DESC(max_hw_q, "max hw queues supported by engine (any non-zero value, default: 1)");
+
+static char *ae4_error_codes[] = {
+	"",
+	"ERR 01: INVALID HEADER DW0",
+	"ERR 02: INVALID STATUS",
+	"ERR 03: INVALID LENGTH - 4 BYTE ALIGNMENT",
+	"ERR 04: INVALID SRC ADDR - 4 BYTE ALIGNMENT",
+	"ERR 05: INVALID DST ADDR - 4 BYTE ALIGNMENT",
+	"ERR 06: INVALID ALIGNMENT",
+	"ERR 07: INVALID DESCRIPTOR",
+};
+
+static void ae4_log_error(struct pt_device *d, int e)
+{
+	if (e <= 7)
+		dev_info(d->dev, "AE4DMA error: %s (0x%x)\n", ae4_error_codes[e], e);
+	else if (e > 7 && e <= 15)
+		dev_info(d->dev, "AE4DMA error: %s (0x%x)\n", "INVALID DESCRIPTOR", e);
+	else if (e > 15 && e <= 31)
+		dev_info(d->dev, "AE4DMA error: %s (0x%x)\n", "INVALID DESCRIPTOR", e);
+	else if (e > 31 && e <= 63)
+		dev_info(d->dev, "AE4DMA error: %s (0x%x)\n", "INVALID DESCRIPTOR", e);
+	else if (e > 63 && e <= 127)
+		dev_info(d->dev, "AE4DMA error: %s (0x%x)\n", "PTE ERROR", e);
+	else if (e > 127 && e <= 255)
+		dev_info(d->dev, "AE4DMA error: %s (0x%x)\n", "PTE ERROR", e);
+	else
+		dev_info(d->dev, "Unknown AE4DMA error");
+}
+
+static void ae4_check_status_error(struct ae4_cmd_queue *ae4cmd_q, int idx)
+{
+	struct pt_cmd_queue *cmd_q = &ae4cmd_q->cmd_q;
+	struct ae4dma_desc desc;
+	u8 status;
+
+	memcpy(&desc, &cmd_q->qbase[idx], sizeof(struct ae4dma_desc));
+	/* Synchronize ordering */
+	mb();
+	status = desc.dw1.status;
+	if (status && status != AE4_DESC_COMPLETED) {
+		cmd_q->cmd_error = desc.dw1.err_code;
+		if (cmd_q->cmd_error)
+			ae4_log_error(cmd_q->pt, cmd_q->cmd_error);
+	}
+}
+
+static void ae4_pending_work(struct work_struct *work)
+{
+	struct ae4_cmd_queue *ae4cmd_q = container_of(work, struct ae4_cmd_queue, p_work.work);
+	struct pt_cmd_queue *cmd_q = &ae4cmd_q->cmd_q;
+	struct pt_cmd *cmd;
+	u32 cridx, dridx;
+
+	while (true) {
+		wait_event_interruptible(ae4cmd_q->q_w,
+					 ((atomic64_read(&ae4cmd_q->done_cnt)) <
+					   atomic64_read(&ae4cmd_q->intr_cnt)));
+
+		atomic64_inc(&ae4cmd_q->done_cnt);
+
+		mutex_lock(&ae4cmd_q->cmd_lock);
+
+		cridx = readl(cmd_q->reg_control + 0x0C);
+		dridx = atomic_read(&ae4cmd_q->dridx);
+
+		while ((dridx != cridx) && !list_empty(&ae4cmd_q->cmd)) {
+			cmd = list_first_entry(&ae4cmd_q->cmd, struct pt_cmd, entry);
+			list_del(&cmd->entry);
+
+			ae4_check_status_error(ae4cmd_q, dridx);
+			cmd->pt_cmd_callback(cmd->data, cmd->ret);
+
+			atomic64_dec(&ae4cmd_q->q_cmd_count);
+			dridx = (dridx + 1) % CMD_Q_LEN;
+			atomic_set(&ae4cmd_q->dridx, dridx);
+			/* Synchronize ordering */
+			mb();
+
+			complete_all(&ae4cmd_q->cmp);
+		}
+
+		mutex_unlock(&ae4cmd_q->cmd_lock);
+	}
+}
+
+static irqreturn_t ae4_core_irq_handler(int irq, void *data)
+{
+	struct ae4_cmd_queue *ae4cmd_q = data;
+	struct pt_cmd_queue *cmd_q;
+	struct pt_device *pt;
+	u32 status;
+
+	cmd_q = &ae4cmd_q->cmd_q;
+	pt = cmd_q->pt;
+
+	pt->total_interrupts++;
+	atomic64_inc(&ae4cmd_q->intr_cnt);
+
+	wake_up(&ae4cmd_q->q_w);
+
+	status = readl(cmd_q->reg_control + 0x14);
+	if (status & BIT(0)) {
+		status &= GENMASK(31, 1);
+		writel(status, cmd_q->reg_control + 0x14);
+	}
+
+	return IRQ_HANDLED;
+}
+
+void ae4_destroy_work(struct ae4_device *ae4)
+{
+	struct ae4_cmd_queue *ae4cmd_q;
+	int i;
+
+	for (i = 0; i < ae4->cmd_q_count; i++) {
+		ae4cmd_q = &ae4->ae4cmd_q[i];
+
+		if (!ae4cmd_q->pws)
+			break;
+
+		cancel_delayed_work(&ae4cmd_q->p_work);
+		destroy_workqueue(ae4cmd_q->pws);
+	}
+}
+
+int ae4_core_init(struct ae4_device *ae4)
+{
+	struct pt_device *pt = &ae4->pt;
+	struct ae4_cmd_queue *ae4cmd_q;
+	struct device *dev = pt->dev;
+	struct pt_cmd_queue *cmd_q;
+	int i, ret = 0;
+
+	writel(max_hw_q, pt->io_regs);
+
+	for (i = 0; i < max_hw_q; i++) {
+		ae4cmd_q = &ae4->ae4cmd_q[i];
+		ae4cmd_q->id = ae4->cmd_q_count;
+		ae4->cmd_q_count++;
+
+		cmd_q = &ae4cmd_q->cmd_q;
+		cmd_q->pt = pt;
+
+		/* Preset some register values (Q size is 32byte (0x20)) */
+		cmd_q->reg_control = pt->io_regs + ((i + 1) * 0x20);
+
+		ret = devm_request_irq(dev, ae4->ae4_irq[i], ae4_core_irq_handler, 0,
+				       dev_name(pt->dev), ae4cmd_q);
+		if (ret)
+			return ret;
+
+		cmd_q->qsize = Q_SIZE(sizeof(struct ae4dma_desc));
+
+		cmd_q->qbase = dmam_alloc_coherent(dev, cmd_q->qsize, &cmd_q->qbase_dma,
+						   GFP_KERNEL);
+		if (!cmd_q->qbase)
+			return -ENOMEM;
+	}
+
+	for (i = 0; i < ae4->cmd_q_count; i++) {
+		ae4cmd_q = &ae4->ae4cmd_q[i];
+
+		cmd_q = &ae4cmd_q->cmd_q;
+
+		/* Preset some register values (Q size is 32byte (0x20)) */
+		cmd_q->reg_control = pt->io_regs + ((i + 1) * 0x20);
+
+		/* Update the device registers with queue information. */
+		writel(CMD_Q_LEN, cmd_q->reg_control + 0x08);
+
+		cmd_q->qdma_tail = cmd_q->qbase_dma;
+		writel(lower_32_bits(cmd_q->qdma_tail), cmd_q->reg_control + 0x18);
+		writel(upper_32_bits(cmd_q->qdma_tail), cmd_q->reg_control + 0x1C);
+
+		INIT_LIST_HEAD(&ae4cmd_q->cmd);
+		init_waitqueue_head(&ae4cmd_q->q_w);
+
+		ae4cmd_q->pws = alloc_ordered_workqueue("ae4dma_%d", WQ_MEM_RECLAIM, ae4cmd_q->id);
+		if (!ae4cmd_q->pws) {
+			ae4_destroy_work(ae4);
+			return -ENOMEM;
+		}
+		INIT_DELAYED_WORK(&ae4cmd_q->p_work, ae4_pending_work);
+		queue_delayed_work(ae4cmd_q->pws, &ae4cmd_q->p_work,  usecs_to_jiffies(100));
+
+		init_completion(&ae4cmd_q->cmp);
+	}
+
+	return ret;
+}
diff --git a/drivers/dma/amd/ae4dma/ae4dma-pci.c b/drivers/dma/amd/ae4dma/ae4dma-pci.c
new file mode 100644
index 000000000000..4cd537af757d
--- /dev/null
+++ b/drivers/dma/amd/ae4dma/ae4dma-pci.c
@@ -0,0 +1,195 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * AMD AE4DMA driver
+ *
+ * Copyright (c) 2024, Advanced Micro Devices, Inc.
+ * All Rights Reserved.
+ *
+ * Author: Basavaraj Natikar <Basavaraj.Natikar@amd.com>
+ */
+
+#include "ae4dma.h"
+
+static int ae4_get_msi_irq(struct ae4_device *ae4)
+{
+	struct pt_device *pt = &ae4->pt;
+	struct device *dev = pt->dev;
+	struct pci_dev *pdev;
+	int ret, i;
+
+	pdev = to_pci_dev(dev);
+	ret = pci_enable_msi(pdev);
+	if (ret)
+		return ret;
+
+	for (i = 0; i < MAX_AE4_HW_QUEUES; i++)
+		ae4->ae4_irq[i] = pdev->irq;
+
+	return 0;
+}
+
+static int ae4_get_msix_irqs(struct ae4_device *ae4)
+{
+	struct ae4_msix *ae4_msix = ae4->ae4_msix;
+	struct pt_device *pt = &ae4->pt;
+	struct device *dev = pt->dev;
+	struct pci_dev *pdev;
+	int v, i, ret;
+
+	pdev = to_pci_dev(dev);
+
+	for (v = 0; v < ARRAY_SIZE(ae4_msix->msix_entry); v++)
+		ae4_msix->msix_entry[v].entry = v;
+
+	ret = pci_enable_msix_range(pdev, ae4_msix->msix_entry, 1, v);
+	if (ret < 0)
+		return ret;
+
+	ae4_msix->msix_count = ret;
+
+	for (i = 0; i < MAX_AE4_HW_QUEUES; i++)
+		ae4->ae4_irq[i] = ae4_msix->msix_entry[i].vector;
+
+	return 0;
+}
+
+static int ae4_get_irqs(struct ae4_device *ae4)
+{
+	struct pt_device *pt = &ae4->pt;
+	struct device *dev = pt->dev;
+	int ret;
+
+	ret = ae4_get_msix_irqs(ae4);
+	if (!ret)
+		return 0;
+
+	/* Couldn't get MSI-X vectors, try MSI */
+	dev_err(dev, "could not enable MSI-X (%d), trying MSI\n", ret);
+	ret = ae4_get_msi_irq(ae4);
+	if (!ret)
+		return 0;
+
+	/* Couldn't get MSI interrupt */
+	dev_err(dev, "could not enable MSI (%d)\n", ret);
+
+	return ret;
+}
+
+static void ae4_free_irqs(struct ae4_device *ae4)
+{
+	struct ae4_msix *ae4_msix;
+	struct pci_dev *pdev;
+	struct pt_device *pt;
+	struct device *dev;
+	int i;
+
+	if (ae4) {
+		pt = &ae4->pt;
+		dev = pt->dev;
+		pdev = to_pci_dev(dev);
+
+		ae4_msix = ae4->ae4_msix;
+		if (ae4_msix && ae4_msix->msix_count)
+			pci_disable_msix(pdev);
+		else if (pdev->irq)
+			pci_disable_msi(pdev);
+
+		for (i = 0; i < MAX_AE4_HW_QUEUES; i++)
+			ae4->ae4_irq[i] = 0;
+	}
+}
+
+static void ae4_deinit(struct ae4_device *ae4)
+{
+	ae4_free_irqs(ae4);
+}
+
+static int ae4_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
+{
+	struct device *dev = &pdev->dev;
+	struct ae4_device *ae4;
+	struct pt_device *pt;
+	int bar_mask;
+	int ret = 0;
+
+	ae4 = devm_kzalloc(dev, sizeof(*ae4), GFP_KERNEL);
+	if (!ae4)
+		return -ENOMEM;
+
+	ae4->ae4_msix = devm_kzalloc(dev, sizeof(struct ae4_msix), GFP_KERNEL);
+	if (!ae4->ae4_msix)
+		return -ENOMEM;
+
+	ret = pcim_enable_device(pdev);
+	if (ret)
+		goto ae4_error;
+
+	bar_mask = pci_select_bars(pdev, IORESOURCE_MEM);
+	ret = pcim_iomap_regions(pdev, bar_mask, "ae4dma");
+	if (ret)
+		goto ae4_error;
+
+	pt = &ae4->pt;
+	pt->dev = dev;
+
+	pt->io_regs = pcim_iomap_table(pdev)[0];
+	if (!pt->io_regs) {
+		ret = -ENOMEM;
+		goto ae4_error;
+	}
+
+	ret = ae4_get_irqs(ae4);
+	if (ret)
+		goto ae4_error;
+
+	pci_set_master(pdev);
+
+	ret = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(48));
+	if (ret) {
+		ret = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32));
+		if (ret)
+			goto ae4_error;
+	}
+
+	dev_set_drvdata(dev, ae4);
+
+	ret = ae4_core_init(ae4);
+	if (ret)
+		goto ae4_error;
+
+	return 0;
+
+ae4_error:
+	ae4_deinit(ae4);
+
+	return ret;
+}
+
+static void ae4_pci_remove(struct pci_dev *pdev)
+{
+	struct ae4_device *ae4 = dev_get_drvdata(&pdev->dev);
+
+	ae4_destroy_work(ae4);
+	ae4_deinit(ae4);
+}
+
+static const struct pci_device_id ae4_pci_table[] = {
+	{ PCI_VDEVICE(AMD, 0x14C8), },
+	{ PCI_VDEVICE(AMD, 0x14DC), },
+	{ PCI_VDEVICE(AMD, 0x149B), },
+	/* Last entry must be zero */
+	{ 0, }
+};
+MODULE_DEVICE_TABLE(pci, ae4_pci_table);
+
+static struct pci_driver ae4_pci_driver = {
+	.name = "ae4dma",
+	.id_table = ae4_pci_table,
+	.probe = ae4_pci_probe,
+	.remove = ae4_pci_remove,
+};
+
+module_pci_driver(ae4_pci_driver);
+
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("AMD AE4DMA driver");
diff --git a/drivers/dma/amd/ae4dma/ae4dma.h b/drivers/dma/amd/ae4dma/ae4dma.h
new file mode 100644
index 000000000000..24b1253ad570
--- /dev/null
+++ b/drivers/dma/amd/ae4dma/ae4dma.h
@@ -0,0 +1,77 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * AMD AE4DMA driver
+ *
+ * Copyright (c) 2024, Advanced Micro Devices, Inc.
+ * All Rights Reserved.
+ *
+ * Author: Basavaraj Natikar <Basavaraj.Natikar@amd.com>
+ */
+#ifndef __AE4DMA_H__
+#define __AE4DMA_H__
+
+#include "../common/amd_dma.h"
+
+#define MAX_AE4_HW_QUEUES		16
+
+#define AE4_DESC_COMPLETED		0x3
+
+struct ae4_msix {
+	int msix_count;
+	struct msix_entry msix_entry[MAX_AE4_HW_QUEUES];
+};
+
+struct ae4_cmd_queue {
+	struct ae4_device *ae4;
+	struct pt_cmd_queue cmd_q;
+	struct list_head cmd;
+	/* protect command operations */
+	struct mutex cmd_lock;
+	struct delayed_work p_work;
+	struct workqueue_struct *pws;
+	struct completion cmp;
+	wait_queue_head_t q_w;
+	atomic64_t intr_cnt;
+	atomic64_t done_cnt;
+	atomic64_t q_cmd_count;
+	atomic_t dridx;
+	unsigned int id;
+};
+
+union dwou {
+	u32 dw0;
+	struct dword0 {
+	u8	byte0;
+	u8	byte1;
+	u16	timestamp;
+	} dws;
+};
+
+struct dword1 {
+	u8	status;
+	u8	err_code;
+	u16	desc_id;
+};
+
+struct ae4dma_desc {
+	union dwou dwouv;
+	struct dword1 dw1;
+	u32 length;
+	u32 rsvd;
+	u32 src_hi;
+	u32 src_lo;
+	u32 dst_hi;
+	u32 dst_lo;
+};
+
+struct ae4_device {
+	struct pt_device pt;
+	struct ae4_msix *ae4_msix;
+	struct ae4_cmd_queue ae4cmd_q[MAX_AE4_HW_QUEUES];
+	unsigned int ae4_irq[MAX_AE4_HW_QUEUES];
+	unsigned int cmd_q_count;
+};
+
+int ae4_core_init(struct ae4_device *ae4);
+void ae4_destroy_work(struct ae4_device *ae4);
+#endif
diff --git a/drivers/dma/amd/common/amd_dma.h b/drivers/dma/amd/common/amd_dma.h
new file mode 100644
index 000000000000..31c35b3bc94b
--- /dev/null
+++ b/drivers/dma/amd/common/amd_dma.h
@@ -0,0 +1,26 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * AMD DMA Driver common
+ *
+ * Copyright (c) 2024, Advanced Micro Devices, Inc.
+ * All Rights Reserved.
+ *
+ * Author: Basavaraj Natikar <Basavaraj.Natikar@amd.com>
+ */
+
+#ifndef AMD_DMA_H
+#define AMD_DMA_H
+
+#include <linux/device.h>
+#include <linux/dmaengine.h>
+#include <linux/pci.h>
+#include <linux/spinlock.h>
+#include <linux/mutex.h>
+#include <linux/list.h>
+#include <linux/wait.h>
+#include <linux/dmapool.h>
+
+#include "../ptdma/ptdma.h"
+#include "../../virt-dma.h"
+
+#endif
-- 
2.25.1



* [PATCH 3/7] dmaengine: ptdma: Move common functions to common code
  2024-05-10  8:20 [PATCH 0/7] Add support of AMD AE4DMA DMA Engine Basavaraj Natikar
  2024-05-10  8:20 ` [PATCH 1/7] dmaengine: Move AMD DMA driver to separate directory Basavaraj Natikar
  2024-05-10  8:20 ` [PATCH 2/7] dmaengine: ae4dma: Add AMD ae4dma controller driver Basavaraj Natikar
@ 2024-05-10  8:20 ` Basavaraj Natikar
  2024-05-10  8:20 ` [PATCH 4/7] dmaengine: ptdma: Extend ptdma to support multi-channel and version Basavaraj Natikar
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 12+ messages in thread
From: Basavaraj Natikar @ 2024-05-10  8:20 UTC (permalink / raw)
  To: vkoul, dmaengine; +Cc: Raju.Rangoju, Basavaraj Natikar

To make the ptdma code reusable across modules, extract its common
functions into a shared module.

Reviewed-by: Raju Rangoju <Raju.Rangoju@amd.com>
Signed-off-by: Basavaraj Natikar <Basavaraj.Natikar@amd.com>
---
 MAINTAINERS                             |  1 +
 drivers/dma/amd/common/amd_dma.c        | 23 +++++++++++++++++++++++
 drivers/dma/amd/common/amd_dma.h        |  3 +++
 drivers/dma/amd/ptdma/Makefile          |  2 +-
 drivers/dma/amd/ptdma/ptdma-dev.c       | 14 +-------------
 drivers/dma/amd/ptdma/ptdma-dmaengine.c |  3 +--
 drivers/dma/amd/ptdma/ptdma.h           |  2 --
 7 files changed, 30 insertions(+), 18 deletions(-)
 create mode 100644 drivers/dma/amd/common/amd_dma.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 45f2140093b6..177445348f4e 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -914,6 +914,7 @@ M:	Basavaraj Natikar <Basavaraj.Natikar@amd.com>
 L:	dmaengine@vger.kernel.org
 S:	Maintained
 F:	drivers/dma/amd/ae4dma/
+F:	drivers/dma/amd/common/
 
 AMD AXI W1 DRIVER
 M:	Kris Chaplin <kris.chaplin@amd.com>
diff --git a/drivers/dma/amd/common/amd_dma.c b/drivers/dma/amd/common/amd_dma.c
new file mode 100644
index 000000000000..3552d36fa8b9
--- /dev/null
+++ b/drivers/dma/amd/common/amd_dma.c
@@ -0,0 +1,23 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * AMD DMA Driver common
+ *
+ * Copyright (c) 2024, Advanced Micro Devices, Inc.
+ * All Rights Reserved.
+ *
+ * Author: Basavaraj Natikar <Basavaraj.Natikar@amd.com>
+ */
+
+#include "../common/amd_dma.h"
+
+void pt_start_queue(struct pt_cmd_queue *cmd_q)
+{
+	/* Turn on the run bit */
+	iowrite32(cmd_q->qcontrol | CMD_Q_RUN, cmd_q->reg_control);
+}
+
+void pt_stop_queue(struct pt_cmd_queue *cmd_q)
+{
+	/* Turn off the run bit */
+	iowrite32(cmd_q->qcontrol & ~CMD_Q_RUN, cmd_q->reg_control);
+}
diff --git a/drivers/dma/amd/common/amd_dma.h b/drivers/dma/amd/common/amd_dma.h
index 31c35b3bc94b..c216a03b5161 100644
--- a/drivers/dma/amd/common/amd_dma.h
+++ b/drivers/dma/amd/common/amd_dma.h
@@ -23,4 +23,7 @@
 #include "../ptdma/ptdma.h"
 #include "../../virt-dma.h"
 
+void pt_start_queue(struct pt_cmd_queue *cmd_q);
+void pt_stop_queue(struct pt_cmd_queue *cmd_q);
+
 #endif
diff --git a/drivers/dma/amd/ptdma/Makefile b/drivers/dma/amd/ptdma/Makefile
index ce5410268a9a..42606d7302e6 100644
--- a/drivers/dma/amd/ptdma/Makefile
+++ b/drivers/dma/amd/ptdma/Makefile
@@ -5,6 +5,6 @@
 
 obj-$(CONFIG_AMD_PTDMA) += ptdma.o
 
-ptdma-objs := ptdma-dev.o ptdma-dmaengine.o ptdma-debugfs.o
+ptdma-objs := ptdma-dev.o ptdma-dmaengine.o ptdma-debugfs.o ../common/amd_dma.o
 
 ptdma-$(CONFIG_PCI) += ptdma-pci.o
diff --git a/drivers/dma/amd/ptdma/ptdma-dev.c b/drivers/dma/amd/ptdma/ptdma-dev.c
index a2bf13ff18b6..506b3dfca549 100644
--- a/drivers/dma/amd/ptdma/ptdma-dev.c
+++ b/drivers/dma/amd/ptdma/ptdma-dev.c
@@ -17,7 +17,7 @@
 #include <linux/module.h>
 #include <linux/pci.h>
 
-#include "ptdma.h"
+#include "../common/amd_dma.h"
 
 /* Human-readable error strings */
 static char *pt_error_codes[] = {
@@ -54,18 +54,6 @@ static void pt_log_error(struct pt_device *d, int e)
 	dev_err(d->dev, "PTDMA error: %s (0x%x)\n", pt_error_codes[e], e);
 }
 
-void pt_start_queue(struct pt_cmd_queue *cmd_q)
-{
-	/* Turn on the run bit */
-	iowrite32(cmd_q->qcontrol | CMD_Q_RUN, cmd_q->reg_control);
-}
-
-void pt_stop_queue(struct pt_cmd_queue *cmd_q)
-{
-	/* Turn off the run bit */
-	iowrite32(cmd_q->qcontrol & ~CMD_Q_RUN, cmd_q->reg_control);
-}
-
 static int pt_core_execute_cmd(struct ptdma_desc *desc, struct pt_cmd_queue *cmd_q)
 {
 	bool soc = FIELD_GET(DWORD0_SOC, desc->dw0);
diff --git a/drivers/dma/amd/ptdma/ptdma-dmaengine.c b/drivers/dma/amd/ptdma/ptdma-dmaengine.c
index a2e7c2cec15e..66ea10499643 100644
--- a/drivers/dma/amd/ptdma/ptdma-dmaengine.c
+++ b/drivers/dma/amd/ptdma/ptdma-dmaengine.c
@@ -9,8 +9,7 @@
  * Author: Gary R Hook <gary.hook@amd.com>
  */
 
-#include "ptdma.h"
-#include "../../dmaengine.h"
+#include "../common/amd_dma.h"
 
 static inline struct pt_dma_chan *to_pt_chan(struct dma_chan *dma_chan)
 {
diff --git a/drivers/dma/amd/ptdma/ptdma.h b/drivers/dma/amd/ptdma/ptdma.h
index 2690a32fc7cb..b4f9ee83b074 100644
--- a/drivers/dma/amd/ptdma/ptdma.h
+++ b/drivers/dma/amd/ptdma/ptdma.h
@@ -322,8 +322,6 @@ int pt_core_perform_passthru(struct pt_cmd_queue *cmd_q,
 			     struct pt_passthru_engine *pt_engine);
 
 void pt_check_status_trans(struct pt_device *pt, struct pt_cmd_queue *cmd_q);
-void pt_start_queue(struct pt_cmd_queue *cmd_q);
-void pt_stop_queue(struct pt_cmd_queue *cmd_q);
 
 static inline void pt_core_disable_queue_interrupts(struct pt_device *pt)
 {
-- 
2.25.1



* [PATCH 4/7] dmaengine: ptdma: Extend ptdma to support multi-channel and version
  2024-05-10  8:20 [PATCH 0/7] Add support of AMD AE4DMA DMA Engine Basavaraj Natikar
                   ` (2 preceding siblings ...)
  2024-05-10  8:20 ` [PATCH 3/7] dmaengine: ptdma: Move common functions to common code Basavaraj Natikar
@ 2024-05-10  8:20 ` Basavaraj Natikar
  2024-05-11 10:54   ` kernel test robot
  2024-05-10  8:20 ` [PATCH 5/7] dmaengine: ae4dma: Register AE4DMA using pt_dmaengine_register Basavaraj Natikar
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 12+ messages in thread
From: Basavaraj Natikar @ 2024-05-10  8:20 UTC (permalink / raw)
  To: vkoul, dmaengine; +Cc: Raju.Rangoju, Basavaraj Natikar

To support multi-channel functionality with the AE4DMA engine, extend the
PTDMA code with reusable components.

Reviewed-by: Raju Rangoju <Raju.Rangoju@amd.com>
Signed-off-by: Basavaraj Natikar <Basavaraj.Natikar@amd.com>
---
 drivers/dma/amd/ae4dma/ae4dma.h         |   1 +
 drivers/dma/amd/common/amd_dma.h        |   1 +
 drivers/dma/amd/ptdma/ptdma-dmaengine.c | 107 +++++++++++++++++++-----
 drivers/dma/amd/ptdma/ptdma.h           |   2 +
 4 files changed, 91 insertions(+), 20 deletions(-)

diff --git a/drivers/dma/amd/ae4dma/ae4dma.h b/drivers/dma/amd/ae4dma/ae4dma.h
index 24b1253ad570..4e4584e152a1 100644
--- a/drivers/dma/amd/ae4dma/ae4dma.h
+++ b/drivers/dma/amd/ae4dma/ae4dma.h
@@ -15,6 +15,7 @@
 #define MAX_AE4_HW_QUEUES		16
 
 #define AE4_DESC_COMPLETED		0x3
+#define AE4_DMA_VERSION			4
 
 struct ae4_msix {
 	int msix_count;
diff --git a/drivers/dma/amd/common/amd_dma.h b/drivers/dma/amd/common/amd_dma.h
index c216a03b5161..316af6ba0692 100644
--- a/drivers/dma/amd/common/amd_dma.h
+++ b/drivers/dma/amd/common/amd_dma.h
@@ -21,6 +21,7 @@
 #include <linux/dmapool.h>
 
 #include "../ptdma/ptdma.h"
+#include "../ae4dma/ae4dma.h"
 #include "../../virt-dma.h"
 
 void pt_start_queue(struct pt_cmd_queue *cmd_q);
diff --git a/drivers/dma/amd/ptdma/ptdma-dmaengine.c b/drivers/dma/amd/ptdma/ptdma-dmaengine.c
index 66ea10499643..eab372cfcd17 100644
--- a/drivers/dma/amd/ptdma/ptdma-dmaengine.c
+++ b/drivers/dma/amd/ptdma/ptdma-dmaengine.c
@@ -43,7 +43,24 @@ static void pt_do_cleanup(struct virt_dma_desc *vd)
 	kmem_cache_free(pt->dma_desc_cache, desc);
 }
 
-static int pt_dma_start_desc(struct pt_dma_desc *desc)
+static struct pt_cmd_queue *pt_get_cmd_queue(struct pt_device *pt, struct pt_dma_chan *chan)
+{
+	struct ae4_cmd_queue *ae4cmd_q;
+	struct pt_cmd_queue *cmd_q;
+	struct ae4_device *ae4;
+
+	if (pt->ver == AE4_DMA_VERSION) {
+		ae4 = container_of(pt, struct ae4_device, pt);
+		ae4cmd_q = &ae4->ae4cmd_q[chan->id];
+		cmd_q = &ae4cmd_q->cmd_q;
+	} else {
+		cmd_q = &pt->cmd_q;
+	}
+
+	return cmd_q;
+}
+
+static int pt_dma_start_desc(struct pt_dma_desc *desc, struct pt_dma_chan *chan)
 {
 	struct pt_passthru_engine *pt_engine;
 	struct pt_device *pt;
@@ -54,7 +71,9 @@ static int pt_dma_start_desc(struct pt_dma_desc *desc)
 
 	pt_cmd = &desc->pt_cmd;
 	pt = pt_cmd->pt;
-	cmd_q = &pt->cmd_q;
+
+	cmd_q = pt_get_cmd_queue(pt, chan);
+
 	pt_engine = &pt_cmd->passthru;
 
 	pt->tdata.cmd = pt_cmd;
@@ -149,7 +168,7 @@ static void pt_cmd_callback(void *data, int err)
 		if (!desc)
 			break;
 
-		ret = pt_dma_start_desc(desc);
+		ret = pt_dma_start_desc(desc, chan);
 		if (!ret)
 			break;
 
@@ -184,7 +203,11 @@ static struct pt_dma_desc *pt_create_desc(struct dma_chan *dma_chan,
 {
 	struct pt_dma_chan *chan = to_pt_chan(dma_chan);
 	struct pt_passthru_engine *pt_engine;
+	struct pt_device *pt = chan->pt;
+	struct ae4_cmd_queue *ae4cmd_q;
+	struct pt_cmd_queue *cmd_q;
 	struct pt_dma_desc *desc;
+	struct ae4_device *ae4;
 	struct pt_cmd *pt_cmd;
 
 	desc = pt_alloc_dma_desc(chan, flags);
@@ -192,7 +215,7 @@ static struct pt_dma_desc *pt_create_desc(struct dma_chan *dma_chan,
 		return NULL;
 
 	pt_cmd = &desc->pt_cmd;
-	pt_cmd->pt = chan->pt;
+	pt_cmd->pt = pt;
 	pt_engine = &pt_cmd->passthru;
 	pt_cmd->engine = PT_ENGINE_PASSTHRU;
 	pt_engine->src_dma = src;
@@ -203,6 +226,15 @@ static struct pt_dma_desc *pt_create_desc(struct dma_chan *dma_chan,
 
 	desc->len = len;
 
+	if (pt->ver == AE4_DMA_VERSION) {
+		ae4 = container_of(pt, struct ae4_device, pt);
+		ae4cmd_q = &ae4->ae4cmd_q[chan->id];
+		cmd_q = &ae4cmd_q->cmd_q;
+		mutex_lock(&ae4cmd_q->cmd_lock);
+		list_add_tail(&pt_cmd->entry, &ae4cmd_q->cmd);
+		mutex_unlock(&ae4cmd_q->cmd_lock);
+	}
+
 	return desc;
 }
 
@@ -260,8 +292,11 @@ static enum dma_status
 pt_tx_status(struct dma_chan *c, dma_cookie_t cookie,
 		struct dma_tx_state *txstate)
 {
-	struct pt_device *pt = to_pt_chan(c)->pt;
-	struct pt_cmd_queue *cmd_q = &pt->cmd_q;
+	struct pt_dma_chan *chan = to_pt_chan(c);
+	struct pt_device *pt = chan->pt;
+	struct pt_cmd_queue *cmd_q;
+
+	cmd_q = pt_get_cmd_queue(pt, chan);
 
 	pt_check_status_trans(pt, cmd_q);
 	return dma_cookie_status(c, cookie, txstate);
@@ -270,10 +305,13 @@ pt_tx_status(struct dma_chan *c, dma_cookie_t cookie,
 static int pt_pause(struct dma_chan *dma_chan)
 {
 	struct pt_dma_chan *chan = to_pt_chan(dma_chan);
+	struct pt_device *pt = chan->pt;
+	struct pt_cmd_queue *cmd_q;
 	unsigned long flags;
 
 	spin_lock_irqsave(&chan->vc.lock, flags);
-	pt_stop_queue(&chan->pt->cmd_q);
+	cmd_q = pt_get_cmd_queue(pt, chan);
+	pt_stop_queue(cmd_q);
 	spin_unlock_irqrestore(&chan->vc.lock, flags);
 
 	return 0;
@@ -283,10 +321,13 @@ static int pt_resume(struct dma_chan *dma_chan)
 {
 	struct pt_dma_chan *chan = to_pt_chan(dma_chan);
 	struct pt_dma_desc *desc = NULL;
+	struct pt_device *pt = chan->pt;
+	struct pt_cmd_queue *cmd_q;
 	unsigned long flags;
 
 	spin_lock_irqsave(&chan->vc.lock, flags);
-	pt_start_queue(&chan->pt->cmd_q);
+	cmd_q = pt_get_cmd_queue(pt, chan);
+	pt_start_queue(cmd_q);
 	desc = pt_next_dma_desc(chan);
 	spin_unlock_irqrestore(&chan->vc.lock, flags);
 
@@ -300,11 +341,17 @@ static int pt_resume(struct dma_chan *dma_chan)
 static int pt_terminate_all(struct dma_chan *dma_chan)
 {
 	struct pt_dma_chan *chan = to_pt_chan(dma_chan);
+	struct pt_device *pt = chan->pt;
+	struct pt_cmd_queue *cmd_q;
 	unsigned long flags;
-	struct pt_cmd_queue *cmd_q = &chan->pt->cmd_q;
 	LIST_HEAD(head);
 
-	iowrite32(SUPPORTED_INTERRUPTS, cmd_q->reg_control + 0x0010);
+	cmd_q = pt_get_cmd_queue(pt, chan);
+	if (pt->ver == AE4_DMA_VERSION)
+		pt_stop_queue(cmd_q);
+	else
+		iowrite32(SUPPORTED_INTERRUPTS, cmd_q->reg_control + 0x0010);
+
 	spin_lock_irqsave(&chan->vc.lock, flags);
 	vchan_get_all_descriptors(&chan->vc, &head);
 	spin_unlock_irqrestore(&chan->vc.lock, flags);
@@ -317,14 +364,24 @@ static int pt_terminate_all(struct dma_chan *dma_chan)
 
 int pt_dmaengine_register(struct pt_device *pt)
 {
-	struct pt_dma_chan *chan;
 	struct dma_device *dma_dev = &pt->dma_dev;
-	char *cmd_cache_name;
+	struct ae4_cmd_queue *ae4cmd_q = NULL;
+	struct ae4_device *ae4 = NULL;
+	struct pt_dma_chan *chan;
 	char *desc_cache_name;
-	int ret;
+	char *cmd_cache_name;
+	int ret, i;
+
+	if (pt->ver == AE4_DMA_VERSION)
+		ae4 = container_of(pt, struct ae4_device, pt);
+
+	if (ae4)
+		pt->pt_dma_chan = devm_kcalloc(pt->dev, ae4->cmd_q_count,
+					       sizeof(*pt->pt_dma_chan), GFP_KERNEL);
+	else
+		pt->pt_dma_chan = devm_kzalloc(pt->dev, sizeof(*pt->pt_dma_chan),
+					       GFP_KERNEL);
 
-	pt->pt_dma_chan = devm_kzalloc(pt->dev, sizeof(*pt->pt_dma_chan),
-				       GFP_KERNEL);
 	if (!pt->pt_dma_chan)
 		return -ENOMEM;
 
@@ -366,9 +423,6 @@ int pt_dmaengine_register(struct pt_device *pt)
 
 	INIT_LIST_HEAD(&dma_dev->channels);
 
-	chan = pt->pt_dma_chan;
-	chan->pt = pt;
-
 	/* Set base and prep routines */
 	dma_dev->device_free_chan_resources = pt_free_chan_resources;
 	dma_dev->device_prep_dma_memcpy = pt_prep_dma_memcpy;
@@ -380,8 +434,21 @@ int pt_dmaengine_register(struct pt_device *pt)
 	dma_dev->device_terminate_all = pt_terminate_all;
 	dma_dev->device_synchronize = pt_synchronize;
 
-	chan->vc.desc_free = pt_do_cleanup;
-	vchan_init(&chan->vc, dma_dev);
+	if (ae4) {
+		for (i = 0; i < ae4->cmd_q_count; i++) {
+			chan = pt->pt_dma_chan + i;
+			ae4cmd_q = &ae4->ae4cmd_q[i];
+			chan->id = ae4cmd_q->id;
+			chan->pt = pt;
+			chan->vc.desc_free = pt_do_cleanup;
+			vchan_init(&chan->vc, dma_dev);
+		}
+	} else {
+		chan = pt->pt_dma_chan;
+		chan->pt = pt;
+		chan->vc.desc_free = pt_do_cleanup;
+		vchan_init(&chan->vc, dma_dev);
+	}
 
 	ret = dma_async_device_register(dma_dev);
 	if (ret)
diff --git a/drivers/dma/amd/ptdma/ptdma.h b/drivers/dma/amd/ptdma/ptdma.h
index b4f9ee83b074..3ef290b78448 100644
--- a/drivers/dma/amd/ptdma/ptdma.h
+++ b/drivers/dma/amd/ptdma/ptdma.h
@@ -184,6 +184,7 @@ struct pt_dma_desc {
 struct pt_dma_chan {
 	struct virt_dma_chan vc;
 	struct pt_device *pt;
+	unsigned int id;
 };
 
 struct pt_cmd_queue {
@@ -262,6 +263,7 @@ struct pt_device {
 	unsigned long total_interrupts;
 
 	struct pt_tasklet_data tdata;
+	int ver;
 };
 
 /*
-- 
2.25.1



* [PATCH 5/7] dmaengine: ae4dma: Register AE4DMA using pt_dmaengine_register
  2024-05-10  8:20 [PATCH 0/7] Add support of AMD AE4DMA DMA Engine Basavaraj Natikar
                   ` (3 preceding siblings ...)
  2024-05-10  8:20 ` [PATCH 4/7] dmaengine: ptdma: Extend ptdma to support multi-channel and version Basavaraj Natikar
@ 2024-05-10  8:20 ` Basavaraj Natikar
  2024-05-10  8:20 ` [PATCH 6/7] dmaengine: ptdma: Extend ptdma-debugfs to support multi-queue Basavaraj Natikar
  2024-05-10  8:20 ` [PATCH 7/7] dmaengine: ae4dma: Register debugfs using ptdma_debugfs_setup Basavaraj Natikar
  6 siblings, 0 replies; 12+ messages in thread
From: Basavaraj Natikar @ 2024-05-10  8:20 UTC (permalink / raw)
  To: vkoul, dmaengine; +Cc: Raju.Rangoju, Basavaraj Natikar

Use the pt_dmaengine_register function to register an AE4DMA DMA engine.

Reviewed-by: Raju Rangoju <Raju.Rangoju@amd.com>
Signed-off-by: Basavaraj Natikar <Basavaraj.Natikar@amd.com>
---
 drivers/dma/amd/ae4dma/Makefile     |  2 +-
 drivers/dma/amd/ae4dma/ae4dma-dev.c | 73 +++++++++++++++++++++++++++++
 drivers/dma/amd/ae4dma/ae4dma-pci.c |  1 +
 drivers/dma/amd/ae4dma/ae4dma.h     |  2 +
 4 files changed, 77 insertions(+), 1 deletion(-)

diff --git a/drivers/dma/amd/ae4dma/Makefile b/drivers/dma/amd/ae4dma/Makefile
index e918f85a80ec..165d1c74b732 100644
--- a/drivers/dma/amd/ae4dma/Makefile
+++ b/drivers/dma/amd/ae4dma/Makefile
@@ -5,6 +5,6 @@
 
 obj-$(CONFIG_AMD_AE4DMA) += ae4dma.o
 
-ae4dma-objs := ae4dma-dev.o
+ae4dma-objs := ae4dma-dev.o  ../ptdma/ptdma-dmaengine.o ../common/amd_dma.o
 
 ae4dma-$(CONFIG_PCI) += ae4dma-pci.o
diff --git a/drivers/dma/amd/ae4dma/ae4dma-dev.c b/drivers/dma/amd/ae4dma/ae4dma-dev.c
index fc33d2056af2..7c4bd14c4f12 100644
--- a/drivers/dma/amd/ae4dma/ae4dma-dev.c
+++ b/drivers/dma/amd/ae4dma/ae4dma-dev.c
@@ -60,6 +60,15 @@ static void ae4_check_status_error(struct ae4_cmd_queue *ae4cmd_q, int idx)
 	}
 }
 
+void pt_check_status_trans(struct pt_device *pt, struct pt_cmd_queue *cmd_q)
+{
+	struct ae4_cmd_queue *ae4cmd_q = container_of(cmd_q, struct ae4_cmd_queue, cmd_q);
+	int i;
+
+	for (i = 0; i < CMD_Q_LEN; i++)
+		ae4_check_status_error(ae4cmd_q, i);
+}
+
 static void ae4_pending_work(struct work_struct *work)
 {
 	struct ae4_cmd_queue *ae4cmd_q = container_of(work, struct ae4_cmd_queue, p_work.work);
@@ -123,6 +132,66 @@ static irqreturn_t ae4_core_irq_handler(int irq, void *data)
 	return IRQ_HANDLED;
 }
 
+static int ae4_core_execute_cmd(struct ae4dma_desc *desc, struct ae4_cmd_queue *ae4cmd_q)
+{
+	bool soc = FIELD_GET(DWORD0_SOC, desc->dwouv.dw0);
+	struct pt_cmd_queue *cmd_q = &ae4cmd_q->cmd_q;
+	u32 tail_wi;
+
+	if (soc) {
+		desc->dwouv.dw0 |= FIELD_PREP(DWORD0_IOC, desc->dwouv.dw0);
+		desc->dwouv.dw0 &= ~DWORD0_SOC;
+	}
+
+	mutex_lock(&ae4cmd_q->cmd_lock);
+
+	tail_wi = atomic_read(&ae4cmd_q->tail_wi);
+	memcpy(&cmd_q->qbase[tail_wi], desc, sizeof(struct ae4dma_desc));
+
+	atomic64_inc(&ae4cmd_q->q_cmd_count);
+
+	tail_wi = (tail_wi + 1) % CMD_Q_LEN;
+
+	atomic_set(&ae4cmd_q->tail_wi, tail_wi);
+	/* Synchronize ordering */
+	mb();
+
+	writel(tail_wi, cmd_q->reg_control + 0x10);
+	/* Synchronize ordering */
+	mb();
+
+	mutex_unlock(&ae4cmd_q->cmd_lock);
+
+	wake_up(&ae4cmd_q->q_w);
+
+	return 0;
+}
+
+int pt_core_perform_passthru(struct pt_cmd_queue *cmd_q,
+			     struct pt_passthru_engine *pt_engine)
+{
+	struct ae4_cmd_queue *ae4cmd_q = container_of(cmd_q, struct ae4_cmd_queue, cmd_q);
+	struct ae4dma_desc desc;
+
+	cmd_q->cmd_error = 0;
+	cmd_q->total_pt_ops++;
+	memset(&desc, 0, sizeof(desc));
+	desc.dwouv.dws.byte0 = CMD_AE4_DESC_DW0_VAL;
+
+	desc.dw1.status = 0;
+	desc.dw1.err_code = 0;
+	desc.dw1.desc_id = 0;
+
+	desc.length = pt_engine->src_len;
+
+	desc.src_lo = upper_32_bits(pt_engine->src_dma);
+	desc.src_hi = lower_32_bits(pt_engine->src_dma);
+	desc.dst_lo = upper_32_bits(pt_engine->dst_dma);
+	desc.dst_hi = lower_32_bits(pt_engine->dst_dma);
+
+	return ae4_core_execute_cmd(&desc, ae4cmd_q);
+}
+
 void ae4_destroy_work(struct ae4_device *ae4)
 {
 	struct ae4_cmd_queue *ae4cmd_q;
@@ -202,5 +271,9 @@ int ae4_core_init(struct ae4_device *ae4)
 		init_completion(&ae4cmd_q->cmp);
 	}
 
+	ret = pt_dmaengine_register(pt);
+	if (ret)
+		ae4_destroy_work(ae4);
+
 	return ret;
 }
diff --git a/drivers/dma/amd/ae4dma/ae4dma-pci.c b/drivers/dma/amd/ae4dma/ae4dma-pci.c
index 4cd537af757d..883c4c28361f 100644
--- a/drivers/dma/amd/ae4dma/ae4dma-pci.c
+++ b/drivers/dma/amd/ae4dma/ae4dma-pci.c
@@ -131,6 +131,7 @@ static int ae4_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 
 	pt = &ae4->pt;
 	pt->dev = dev;
+	pt->ver = AE4_DMA_VERSION;
 
 	pt->io_regs = pcim_iomap_table(pdev)[0];
 	if (!pt->io_regs) {
diff --git a/drivers/dma/amd/ae4dma/ae4dma.h b/drivers/dma/amd/ae4dma/ae4dma.h
index 4e4584e152a1..f1b6dcc1d8c3 100644
--- a/drivers/dma/amd/ae4dma/ae4dma.h
+++ b/drivers/dma/amd/ae4dma/ae4dma.h
@@ -16,6 +16,7 @@
 
 #define AE4_DESC_COMPLETED		0x3
 #define AE4_DMA_VERSION			4
+#define CMD_AE4_DESC_DW0_VAL		2
 
 struct ae4_msix {
 	int msix_count;
@@ -36,6 +37,7 @@ struct ae4_cmd_queue {
 	atomic64_t done_cnt;
 	atomic64_t q_cmd_count;
 	atomic_t dridx;
+	atomic_t tail_wi;
 	unsigned int id;
 };
 
-- 
2.25.1



* [PATCH 6/7] dmaengine: ptdma: Extend ptdma-debugfs to support multi-queue
  2024-05-10  8:20 [PATCH 0/7] Add support of AMD AE4DMA DMA Engine Basavaraj Natikar
                   ` (4 preceding siblings ...)
  2024-05-10  8:20 ` [PATCH 5/7] dmaengine: ae4dma: Register AE4DMA using pt_dmaengine_register Basavaraj Natikar
@ 2024-05-10  8:20 ` Basavaraj Natikar
  2024-05-10  8:20 ` [PATCH 7/7] dmaengine: ae4dma: Register debugfs using ptdma_debugfs_setup Basavaraj Natikar
  6 siblings, 0 replies; 12+ messages in thread
From: Basavaraj Natikar @ 2024-05-10  8:20 UTC (permalink / raw)
  To: vkoul, dmaengine; +Cc: Raju.Rangoju, Basavaraj Natikar

To support multi-channel functionality with the AE4DMA engine, extend
ptdma-debugfs with reusable components.

Reviewed-by: Raju Rangoju <Raju.Rangoju@amd.com>
Signed-off-by: Basavaraj Natikar <Basavaraj.Natikar@amd.com>
---
 drivers/dma/amd/ptdma/ptdma-debugfs.c | 76 +++++++++++++++++++--------
 1 file changed, 55 insertions(+), 21 deletions(-)

diff --git a/drivers/dma/amd/ptdma/ptdma-debugfs.c b/drivers/dma/amd/ptdma/ptdma-debugfs.c
index c8307d3044a3..9aa7a49ae5be 100644
--- a/drivers/dma/amd/ptdma/ptdma-debugfs.c
+++ b/drivers/dma/amd/ptdma/ptdma-debugfs.c
@@ -12,7 +12,7 @@
 #include <linux/debugfs.h>
 #include <linux/seq_file.h>
 
-#include "ptdma.h"
+#include "../common/amd_dma.h"
 
 /* DebugFS helpers */
 #define	RI_VERSION_NUM	0x0000003F
@@ -23,11 +23,19 @@
 static int pt_debugfs_info_show(struct seq_file *s, void *p)
 {
 	struct pt_device *pt = s->private;
+	struct ae4_device *ae4;
 	unsigned int regval;
 
 	seq_printf(s, "Device name: %s\n", dev_name(pt->dev));
-	seq_printf(s, "   # Queues: %d\n", 1);
-	seq_printf(s, "     # Cmds: %d\n", pt->cmd_count);
+
+	if (pt->ver == AE4_DMA_VERSION) {
+		ae4 = container_of(pt, struct ae4_device, pt);
+		seq_printf(s, "   # Queues: %d\n", ae4->cmd_q_count);
+		seq_printf(s, "     # Cmds per queue: %d\n", CMD_Q_LEN);
+	} else {
+		seq_printf(s, "   # Queues: %d\n", 1);
+		seq_printf(s, "     # Cmds: %d\n", pt->cmd_count);
+	}
 
 	regval = ioread32(pt->io_regs + CMD_PT_VERSION);
 
@@ -55,6 +63,7 @@ static int pt_debugfs_stats_show(struct seq_file *s, void *p)
 static int pt_debugfs_queue_show(struct seq_file *s, void *p)
 {
 	struct pt_cmd_queue *cmd_q = s->private;
+	struct pt_device *pt;
 	unsigned int regval;
 
 	if (!cmd_q)
@@ -62,18 +71,24 @@ static int pt_debugfs_queue_show(struct seq_file *s, void *p)
 
 	seq_printf(s, "               Pass-Thru: %ld\n", cmd_q->total_pt_ops);
 
-	regval = ioread32(cmd_q->reg_control + 0x000C);
-
-	seq_puts(s, "      Enabled Interrupts:");
-	if (regval & INT_EMPTY_QUEUE)
-		seq_puts(s, " EMPTY");
-	if (regval & INT_QUEUE_STOPPED)
-		seq_puts(s, " STOPPED");
-	if (regval & INT_ERROR)
-		seq_puts(s, " ERROR");
-	if (regval & INT_COMPLETION)
-		seq_puts(s, " COMPLETION");
-	seq_puts(s, "\n");
+	pt = cmd_q->pt;
+	if (pt->ver == AE4_DMA_VERSION) {
+		regval = readl(cmd_q->reg_control + 0x4);
+		seq_printf(s, "     Enabled Interrupts:: status 0x%x\n", regval);
+	} else {
+		regval = ioread32(cmd_q->reg_control + 0x000C);
+
+		seq_puts(s, "      Enabled Interrupts:");
+		if (regval & INT_EMPTY_QUEUE)
+			seq_puts(s, " EMPTY");
+		if (regval & INT_QUEUE_STOPPED)
+			seq_puts(s, " STOPPED");
+		if (regval & INT_ERROR)
+			seq_puts(s, " ERROR");
+		if (regval & INT_COMPLETION)
+			seq_puts(s, " COMPLETION");
+		seq_puts(s, "\n");
+	}
 
 	return 0;
 }
@@ -84,8 +99,12 @@ DEFINE_SHOW_ATTRIBUTE(pt_debugfs_stats);
 
 void ptdma_debugfs_setup(struct pt_device *pt)
 {
-	struct pt_cmd_queue *cmd_q;
 	struct dentry *debugfs_q_instance;
+	struct ae4_cmd_queue *ae4cmd_q;
+	struct pt_cmd_queue *cmd_q;
+	struct ae4_device *ae4;
+	char name[30];
+	int i;
 
 	if (!debugfs_initialized())
 		return;
@@ -96,11 +115,26 @@ void ptdma_debugfs_setup(struct pt_device *pt)
 	debugfs_create_file("stats", 0400, pt->dma_dev.dbg_dev_root, pt,
 			    &pt_debugfs_stats_fops);
 
-	cmd_q = &pt->cmd_q;
 
-	debugfs_q_instance =
-		debugfs_create_dir("q", pt->dma_dev.dbg_dev_root);
+	if (pt->ver == AE4_DMA_VERSION) {
+		ae4 = container_of(pt, struct ae4_device, pt);
+		for (i = 0; i < ae4->cmd_q_count; i++) {
+			ae4cmd_q = &ae4->ae4cmd_q[i];
+			cmd_q = &ae4cmd_q->cmd_q;
+
+			snprintf(name, 29, "q%d", ae4cmd_q->id);
+
+			debugfs_q_instance =
+				debugfs_create_dir(name, pt->dma_dev.dbg_dev_root);
 
-	debugfs_create_file("stats", 0400, debugfs_q_instance, cmd_q,
-			    &pt_debugfs_queue_fops);
+			debugfs_create_file("stats", 0400, debugfs_q_instance, cmd_q,
+					    &pt_debugfs_queue_fops);
+		}
+	} else {
+		debugfs_q_instance =
+			debugfs_create_dir("q", pt->dma_dev.dbg_dev_root);
+		cmd_q = &pt->cmd_q;
+		debugfs_create_file("stats", 0400, debugfs_q_instance, cmd_q,
+				    &pt_debugfs_queue_fops);
+	}
 }
-- 
2.25.1



* [PATCH 7/7] dmaengine: ae4dma: Register debugfs using ptdma_debugfs_setup
  2024-05-10  8:20 [PATCH 0/7] Add support of AMD AE4DMA DMA Engine Basavaraj Natikar
                   ` (5 preceding siblings ...)
  2024-05-10  8:20 ` [PATCH 6/7] dmaengine: ptdma: Extend ptdma-debugfs to support multi-queue Basavaraj Natikar
@ 2024-05-10  8:20 ` Basavaraj Natikar
  6 siblings, 0 replies; 12+ messages in thread
From: Basavaraj Natikar @ 2024-05-10  8:20 UTC (permalink / raw)
  To: vkoul, dmaengine; +Cc: Raju.Rangoju, Basavaraj Natikar

Use the ptdma_debugfs_setup function to register debugfs for the AE4DMA DMA
engine.

Reviewed-by: Raju Rangoju <Raju.Rangoju@amd.com>
Signed-off-by: Basavaraj Natikar <Basavaraj.Natikar@amd.com>
---
 drivers/dma/amd/ae4dma/Makefile     | 2 +-
 drivers/dma/amd/ae4dma/ae4dma-dev.c | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/dma/amd/ae4dma/Makefile b/drivers/dma/amd/ae4dma/Makefile
index 165d1c74b732..5246d4738413 100644
--- a/drivers/dma/amd/ae4dma/Makefile
+++ b/drivers/dma/amd/ae4dma/Makefile
@@ -5,6 +5,6 @@
 
 obj-$(CONFIG_AMD_AE4DMA) += ae4dma.o
 
-ae4dma-objs := ae4dma-dev.o  ../ptdma/ptdma-dmaengine.o ../common/amd_dma.o
+ae4dma-objs := ae4dma-dev.o  ../ptdma/ptdma-dmaengine.o ../common/amd_dma.o ../ptdma/ptdma-debugfs.o
 
 ae4dma-$(CONFIG_PCI) += ae4dma-pci.o
diff --git a/drivers/dma/amd/ae4dma/ae4dma-dev.c b/drivers/dma/amd/ae4dma/ae4dma-dev.c
index 7c4bd14c4f12..3a5bda9e2c92 100644
--- a/drivers/dma/amd/ae4dma/ae4dma-dev.c
+++ b/drivers/dma/amd/ae4dma/ae4dma-dev.c
@@ -274,6 +274,8 @@ int ae4_core_init(struct ae4_device *ae4)
 	ret = pt_dmaengine_register(pt);
 	if (ret)
 		ae4_destroy_work(ae4);
+	else
+		ptdma_debugfs_setup(pt);
 
 	return ret;
 }
-- 
2.25.1



* Re: [PATCH 2/7] dmaengine: ae4dma: Add AMD ae4dma controller driver
  2024-05-10  8:20 ` [PATCH 2/7] dmaengine: ae4dma: Add AMD ae4dma controller driver Basavaraj Natikar
@ 2024-05-10 18:16   ` Frank Li
  2024-05-21  9:36     ` Basavaraj Natikar
  0 siblings, 1 reply; 12+ messages in thread
From: Frank Li @ 2024-05-10 18:16 UTC (permalink / raw)
  To: Basavaraj Natikar; +Cc: vkoul, dmaengine, Raju.Rangoju

On Fri, May 10, 2024 at 01:50:48PM +0530, Basavaraj Natikar wrote:
> Add support for the AMD AE4DMA controller. It performs high-bandwidth
> memory to memory and IO copy operations. Device commands are managed
> via a circular queue of 'descriptors', each of which specifies source
> and destination addresses for copying a single buffer of data.
> 
> Reviewed-by: Raju Rangoju <Raju.Rangoju@amd.com>
> Signed-off-by: Basavaraj Natikar <Basavaraj.Natikar@amd.com>
> ---
>  MAINTAINERS                         |   6 +
>  drivers/dma/amd/Kconfig             |   1 +
>  drivers/dma/amd/Makefile            |   1 +
>  drivers/dma/amd/ae4dma/Kconfig      |  13 ++
>  drivers/dma/amd/ae4dma/Makefile     |  10 ++
>  drivers/dma/amd/ae4dma/ae4dma-dev.c | 206 ++++++++++++++++++++++++++++
>  drivers/dma/amd/ae4dma/ae4dma-pci.c | 195 ++++++++++++++++++++++++++
>  drivers/dma/amd/ae4dma/ae4dma.h     |  77 +++++++++++
>  drivers/dma/amd/common/amd_dma.h    |  26 ++++
>  9 files changed, 535 insertions(+)
>  create mode 100644 drivers/dma/amd/ae4dma/Kconfig
>  create mode 100644 drivers/dma/amd/ae4dma/Makefile
>  create mode 100644 drivers/dma/amd/ae4dma/ae4dma-dev.c
>  create mode 100644 drivers/dma/amd/ae4dma/ae4dma-pci.c
>  create mode 100644 drivers/dma/amd/ae4dma/ae4dma.h
>  create mode 100644 drivers/dma/amd/common/amd_dma.h
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index b190efda33ba..45f2140093b6 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -909,6 +909,12 @@ L:	linux-edac@vger.kernel.org
>  S:	Supported
>  F:	drivers/ras/amd/atl/*
>  
> +AMD AE4DMA DRIVER
> +M:	Basavaraj Natikar <Basavaraj.Natikar@amd.com>
> +L:	dmaengine@vger.kernel.org
> +S:	Maintained
> +F:	drivers/dma/amd/ae4dma/
> +
>  AMD AXI W1 DRIVER
>  M:	Kris Chaplin <kris.chaplin@amd.com>
>  R:	Thomas Delev <thomas.delev@amd.com>
> diff --git a/drivers/dma/amd/Kconfig b/drivers/dma/amd/Kconfig
> index 8246b463bcf7..8c25a3ed6b94 100644
> --- a/drivers/dma/amd/Kconfig
> +++ b/drivers/dma/amd/Kconfig
> @@ -3,3 +3,4 @@
>  # AMD DMA Drivers
>  
>  source "drivers/dma/amd/ptdma/Kconfig"
> +source "drivers/dma/amd/ae4dma/Kconfig"
> diff --git a/drivers/dma/amd/Makefile b/drivers/dma/amd/Makefile
> index dd7257ba7e06..8049b06a9ff5 100644
> --- a/drivers/dma/amd/Makefile
> +++ b/drivers/dma/amd/Makefile
> @@ -4,3 +4,4 @@
>  #
>  
>  obj-$(CONFIG_AMD_PTDMA) += ptdma/
> +obj-$(CONFIG_AMD_AE4DMA) += ae4dma/
> diff --git a/drivers/dma/amd/ae4dma/Kconfig b/drivers/dma/amd/ae4dma/Kconfig
> new file mode 100644
> index 000000000000..cf8db4dac98d
> --- /dev/null
> +++ b/drivers/dma/amd/ae4dma/Kconfig
> @@ -0,0 +1,13 @@
> +# SPDX-License-Identifier: GPL-2.0
> +config AMD_AE4DMA
> +	tristate  "AMD AE4DMA Engine"
> +	depends on X86_64 && PCI
> +	select DMA_ENGINE
> +	select DMA_VIRTUAL_CHANNELS
> +	help
> +	  Enable support for the AMD AE4DMA controller. This controller
> +	  provides DMA capabilities to perform high bandwidth memory to
> +	  memory and IO copy operations. It performs DMA transfer through
> +	  queue-based descriptor management. This DMA controller is intended
> +	  to be used with AMD Non-Transparent Bridge devices and not for
> +	  general purpose peripheral DMA.
> diff --git a/drivers/dma/amd/ae4dma/Makefile b/drivers/dma/amd/ae4dma/Makefile
> new file mode 100644
> index 000000000000..e918f85a80ec
> --- /dev/null
> +++ b/drivers/dma/amd/ae4dma/Makefile
> @@ -0,0 +1,10 @@
> +# SPDX-License-Identifier: GPL-2.0
> +#
> +# AMD AE4DMA driver
> +#
> +
> +obj-$(CONFIG_AMD_AE4DMA) += ae4dma.o
> +
> +ae4dma-objs := ae4dma-dev.o
> +
> +ae4dma-$(CONFIG_PCI) += ae4dma-pci.o
> diff --git a/drivers/dma/amd/ae4dma/ae4dma-dev.c b/drivers/dma/amd/ae4dma/ae4dma-dev.c
> new file mode 100644
> index 000000000000..fc33d2056af2
> --- /dev/null
> +++ b/drivers/dma/amd/ae4dma/ae4dma-dev.c
> @@ -0,0 +1,206 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * AMD AE4DMA driver
> + *
> + * Copyright (c) 2024, Advanced Micro Devices, Inc.
> + * All Rights Reserved.
> + *
> + * Author: Basavaraj Natikar <Basavaraj.Natikar@amd.com>
> + */
> +
> +#include "ae4dma.h"
> +
> +static unsigned int max_hw_q = 1;
> +module_param(max_hw_q, uint, 0444);
> +MODULE_PARM_DESC(max_hw_q, "max hw queues supported by engine (any non-zero value, default: 1)");

Is this value read from a hardware register? You put it in a global variable.
What happens if the system has two different DMA controllers, where one's
max_hw_q is 1 and the other's is 2?

> +
> +static char *ae4_error_codes[] = {
> +	"",
> +	"ERR 01: INVALID HEADER DW0",
> +	"ERR 02: INVALID STATUS",
> +	"ERR 03: INVALID LENGTH - 4 BYTE ALIGNMENT",
> +	"ERR 04: INVALID SRC ADDR - 4 BYTE ALIGNMENT",
> +	"ERR 05: INVALID DST ADDR - 4 BYTE ALIGNMENT",
> +	"ERR 06: INVALID ALIGNMENT",
> +	"ERR 07: INVALID DESCRIPTOR",
> +};
> +
> +static void ae4_log_error(struct pt_device *d, int e)
> +{
> +	if (e <= 7)
> +		dev_info(d->dev, "AE4DMA error: %s (0x%x)\n", ae4_error_codes[e], e);
> +	else if (e > 7 && e <= 15)
> +		dev_info(d->dev, "AE4DMA error: %s (0x%x)\n", "INVALID DESCRIPTOR", e);
> +	else if (e > 15 && e <= 31)
> +		dev_info(d->dev, "AE4DMA error: %s (0x%x)\n", "INVALID DESCRIPTOR", e);
> +	else if (e > 31 && e <= 63)
> +		dev_info(d->dev, "AE4DMA error: %s (0x%x)\n", "INVALID DESCRIPTOR", e);
> +	else if (e > 63 && e <= 127)
> +		dev_info(d->dev, "AE4DMA error: %s (0x%x)\n", "PTE ERROR", e);
> +	else if (e > 127 && e <= 255)
> +		dev_info(d->dev, "AE4DMA error: %s (0x%x)\n", "PTE ERROR", e);
> +	else
> +		dev_info(d->dev, "Unknown AE4DMA error");
> +}
> +
> +static void ae4_check_status_error(struct ae4_cmd_queue *ae4cmd_q, int idx)
> +{
> +	struct pt_cmd_queue *cmd_q = &ae4cmd_q->cmd_q;
> +	struct ae4dma_desc desc;
> +	u8 status;
> +
> +	memcpy(&desc, &cmd_q->qbase[idx], sizeof(struct ae4dma_desc));
> +	/* Synchronize ordering */
> +	mb();

Is dma_wmb() enough here?

> +	status = desc.dw1.status;
> +	if (status && status != AE4_DESC_COMPLETED) {
> +		cmd_q->cmd_error = desc.dw1.err_code;
> +		if (cmd_q->cmd_error)
> +			ae4_log_error(cmd_q->pt, cmd_q->cmd_error);
> +	}
> +}
> +
> +static void ae4_pending_work(struct work_struct *work)
> +{
> +	struct ae4_cmd_queue *ae4cmd_q = container_of(work, struct ae4_cmd_queue, p_work.work);
> +	struct pt_cmd_queue *cmd_q = &ae4cmd_q->cmd_q;
> +	struct pt_cmd *cmd;
> +	u32 cridx, dridx;
> +
> +	while (true) {
> +		wait_event_interruptible(ae4cmd_q->q_w,
> +					 ((atomic64_read(&ae4cmd_q->done_cnt)) <
> +					   atomic64_read(&ae4cmd_q->intr_cnt)));

wait_event_interruptible_timeout()? To avoid a potential deadlock.

> +
> +		atomic64_inc(&ae4cmd_q->done_cnt);
> +
> +		mutex_lock(&ae4cmd_q->cmd_lock);
> +
> +		cridx = readl(cmd_q->reg_control + 0x0C);
> +		dridx = atomic_read(&ae4cmd_q->dridx);
> +
> +		while ((dridx != cridx) && !list_empty(&ae4cmd_q->cmd)) {
> +			cmd = list_first_entry(&ae4cmd_q->cmd, struct pt_cmd, entry);
> +			list_del(&cmd->entry);
> +
> +			ae4_check_status_error(ae4cmd_q, dridx);
> +			cmd->pt_cmd_callback(cmd->data, cmd->ret);
> +
> +			atomic64_dec(&ae4cmd_q->q_cmd_count);
> +			dridx = (dridx + 1) % CMD_Q_LEN;
> +			atomic_set(&ae4cmd_q->dridx, dridx);
> +			/* Synchronize ordering */
> +			mb();
> +
> +			complete_all(&ae4cmd_q->cmp);
> +		}
> +
> +		mutex_unlock(&ae4cmd_q->cmd_lock);
> +	}
> +}
> +
> +static irqreturn_t ae4_core_irq_handler(int irq, void *data)
> +{
> +	struct ae4_cmd_queue *ae4cmd_q = data;
> +	struct pt_cmd_queue *cmd_q;
> +	struct pt_device *pt;
> +	u32 status;
> +
> +	cmd_q = &ae4cmd_q->cmd_q;
> +	pt = cmd_q->pt;
> +
> +	pt->total_interrupts++;
> +	atomic64_inc(&ae4cmd_q->intr_cnt);
> +
> +	wake_up(&ae4cmd_q->q_w);
> +
> +	status = readl(cmd_q->reg_control + 0x14);
> +	if (status & BIT(0)) {
> +		status &= GENMASK(31, 1);
> +		writel(status, cmd_q->reg_control + 0x14);
> +	}
> +
> +	return IRQ_HANDLED;
> +}
> +
> +void ae4_destroy_work(struct ae4_device *ae4)
> +{
> +	struct ae4_cmd_queue *ae4cmd_q;
> +	int i;
> +
> +	for (i = 0; i < ae4->cmd_q_count; i++) {
> +		ae4cmd_q = &ae4->ae4cmd_q[i];
> +
> +		if (!ae4cmd_q->pws)
> +			break;
> +
> +		cancel_delayed_work(&ae4cmd_q->p_work);

Do you need cancel_delayed_work_sync()?

> +		destroy_workqueue(ae4cmd_q->pws);
> +	}
> +}
> +
> +int ae4_core_init(struct ae4_device *ae4)
> +{
> +	struct pt_device *pt = &ae4->pt;
> +	struct ae4_cmd_queue *ae4cmd_q;
> +	struct device *dev = pt->dev;
> +	struct pt_cmd_queue *cmd_q;
> +	int i, ret = 0;
> +
> +	writel(max_hw_q, pt->io_regs);
> +
> +	for (i = 0; i < max_hw_q; i++) {
> +		ae4cmd_q = &ae4->ae4cmd_q[i];
> +		ae4cmd_q->id = ae4->cmd_q_count;
> +		ae4->cmd_q_count++;
> +
> +		cmd_q = &ae4cmd_q->cmd_q;
> +		cmd_q->pt = pt;
> +
> +		/* Preset some register values (Q size is 32byte (0x20)) */
> +		cmd_q->reg_control = pt->io_regs + ((i + 1) * 0x20);
> +
> +		ret = devm_request_irq(dev, ae4->ae4_irq[i], ae4_core_irq_handler, 0,
> +				       dev_name(pt->dev), ae4cmd_q);
> +		if (ret)
> +			return ret;
> +
> +		cmd_q->qsize = Q_SIZE(sizeof(struct ae4dma_desc));
> +
> +		cmd_q->qbase = dmam_alloc_coherent(dev, cmd_q->qsize, &cmd_q->qbase_dma,
> +						   GFP_KERNEL);
> +		if (!cmd_q->qbase)
> +			return -ENOMEM;
> +	}
> +
> +	for (i = 0; i < ae4->cmd_q_count; i++) {
> +		ae4cmd_q = &ae4->ae4cmd_q[i];
> +
> +		cmd_q = &ae4cmd_q->cmd_q;
> +
> +		/* Preset some register values (Q size is 32byte (0x20)) */
> +		cmd_q->reg_control = pt->io_regs + ((i + 1) * 0x20);
> +
> +		/* Update the device registers with queue information. */
> +		writel(CMD_Q_LEN, cmd_q->reg_control + 0x08);
> +
> +		cmd_q->qdma_tail = cmd_q->qbase_dma;
> +		writel(lower_32_bits(cmd_q->qdma_tail), cmd_q->reg_control + 0x18);
> +		writel(upper_32_bits(cmd_q->qdma_tail), cmd_q->reg_control + 0x1C);
> +
> +		INIT_LIST_HEAD(&ae4cmd_q->cmd);
> +		init_waitqueue_head(&ae4cmd_q->q_w);
> +
> +		ae4cmd_q->pws = alloc_ordered_workqueue("ae4dma_%d", WQ_MEM_RECLAIM, ae4cmd_q->id);

Can an existing workqueue meet your requirement?

Frank

> +		if (!ae4cmd_q->pws) {
> +			ae4_destroy_work(ae4);
> +			return -ENOMEM;
> +		}
> +		INIT_DELAYED_WORK(&ae4cmd_q->p_work, ae4_pending_work);
> +		queue_delayed_work(ae4cmd_q->pws, &ae4cmd_q->p_work,  usecs_to_jiffies(100));
> +
> +		init_completion(&ae4cmd_q->cmp);
> +	}
> +
> +	return ret;
> +}
> diff --git a/drivers/dma/amd/ae4dma/ae4dma-pci.c b/drivers/dma/amd/ae4dma/ae4dma-pci.c
> new file mode 100644
> index 000000000000..4cd537af757d
> --- /dev/null
> +++ b/drivers/dma/amd/ae4dma/ae4dma-pci.c
> @@ -0,0 +1,195 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * AMD AE4DMA driver
> + *
> + * Copyright (c) 2024, Advanced Micro Devices, Inc.
> + * All Rights Reserved.
> + *
> + * Author: Basavaraj Natikar <Basavaraj.Natikar@amd.com>
> + */
> +
> +#include "ae4dma.h"
> +
> +static int ae4_get_msi_irq(struct ae4_device *ae4)
> +{
> +	struct pt_device *pt = &ae4->pt;
> +	struct device *dev = pt->dev;
> +	struct pci_dev *pdev;
> +	int ret, i;
> +
> +	pdev = to_pci_dev(dev);
> +	ret = pci_enable_msi(pdev);
> +	if (ret)
> +		return ret;
> +
> +	for (i = 0; i < MAX_AE4_HW_QUEUES; i++)
> +		ae4->ae4_irq[i] = pdev->irq;
> +
> +	return 0;
> +}
> +
> +static int ae4_get_msix_irqs(struct ae4_device *ae4)
> +{
> +	struct ae4_msix *ae4_msix = ae4->ae4_msix;
> +	struct pt_device *pt = &ae4->pt;
> +	struct device *dev = pt->dev;
> +	struct pci_dev *pdev;
> +	int v, i, ret;
> +
> +	pdev = to_pci_dev(dev);
> +
> +	for (v = 0; v < ARRAY_SIZE(ae4_msix->msix_entry); v++)
> +		ae4_msix->msix_entry[v].entry = v;
> +
> +	ret = pci_enable_msix_range(pdev, ae4_msix->msix_entry, 1, v);
> +	if (ret < 0)
> +		return ret;
> +
> +	ae4_msix->msix_count = ret;
> +
> +	for (i = 0; i < MAX_AE4_HW_QUEUES; i++)
> +		ae4->ae4_irq[i] = ae4_msix->msix_entry[i].vector;
> +
> +	return 0;
> +}
> +
> +static int ae4_get_irqs(struct ae4_device *ae4)
> +{
> +	struct pt_device *pt = &ae4->pt;
> +	struct device *dev = pt->dev;
> +	int ret;
> +
> +	ret = ae4_get_msix_irqs(ae4);
> +	if (!ret)
> +		return 0;
> +
> +	/* Couldn't get MSI-X vectors, try MSI */
> +	dev_err(dev, "could not enable MSI-X (%d), trying MSI\n", ret);
> +	ret = ae4_get_msi_irq(ae4);
> +	if (!ret)
> +		return 0;
> +
> +	/* Couldn't get MSI interrupt */
> +	dev_err(dev, "could not enable MSI (%d)\n", ret);
> +
> +	return ret;
> +}
> +
> +static void ae4_free_irqs(struct ae4_device *ae4)
> +{
> +	struct ae4_msix *ae4_msix;
> +	struct pci_dev *pdev;
> +	struct pt_device *pt;
> +	struct device *dev;
> +	int i;
> +
> +	if (ae4) {
> +		pt = &ae4->pt;
> +		dev = pt->dev;
> +		pdev = to_pci_dev(dev);
> +
> +		ae4_msix = ae4->ae4_msix;
> +		if (ae4_msix && ae4_msix->msix_count)
> +			pci_disable_msix(pdev);
> +		else if (pdev->irq)
> +			pci_disable_msi(pdev);
> +
> +		for (i = 0; i < MAX_AE4_HW_QUEUES; i++)
> +			ae4->ae4_irq[i] = 0;
> +	}
> +}
> +
> +static void ae4_deinit(struct ae4_device *ae4)
> +{
> +	ae4_free_irqs(ae4);
> +}
> +
> +static int ae4_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> +{
> +	struct device *dev = &pdev->dev;
> +	struct ae4_device *ae4;
> +	struct pt_device *pt;
> +	int bar_mask;
> +	int ret = 0;
> +
> +	ae4 = devm_kzalloc(dev, sizeof(*ae4), GFP_KERNEL);
> +	if (!ae4)
> +		return -ENOMEM;
> +
> +	ae4->ae4_msix = devm_kzalloc(dev, sizeof(struct ae4_msix), GFP_KERNEL);
> +	if (!ae4->ae4_msix)
> +		return -ENOMEM;
> +
> +	ret = pcim_enable_device(pdev);
> +	if (ret)
> +		goto ae4_error;
> +
> +	bar_mask = pci_select_bars(pdev, IORESOURCE_MEM);
> +	ret = pcim_iomap_regions(pdev, bar_mask, "ae4dma");
> +	if (ret)
> +		goto ae4_error;
> +
> +	pt = &ae4->pt;
> +	pt->dev = dev;
> +
> +	pt->io_regs = pcim_iomap_table(pdev)[0];
> +	if (!pt->io_regs) {
> +		ret = -ENOMEM;
> +		goto ae4_error;
> +	}
> +
> +	ret = ae4_get_irqs(ae4);
> +	if (ret)
> +		goto ae4_error;
> +
> +	pci_set_master(pdev);
> +
> +	ret = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(48));
> +	if (ret) {
> +		ret = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32));
> +		if (ret)
> +			goto ae4_error;
> +	}

There is no need to fall back to 32-bit: dma_set_mask_and_coherent() never
fails for masks of 32 bits or more.

For details, see:
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=f7ae20f2fc4e6a5e32f43c4fa2acab3281a61c81

if (support_48bit)
	dma_set_mask_and_coherent(dev, DMA_BIT_MASK(48))
else
	dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32))

You can determine support_48bit from a hardware register or from the PCI vendor/device ID.


> +
> +	dev_set_drvdata(dev, ae4);
> +
> +	ret = ae4_core_init(ae4);
> +	if (ret)
> +		goto ae4_error;
> +
> +	return 0;
> +
> +ae4_error:
> +	ae4_deinit(ae4);
> +
> +	return ret;
> +}
> +
> +static void ae4_pci_remove(struct pci_dev *pdev)
> +{
> +	struct ae4_device *ae4 = dev_get_drvdata(&pdev->dev);
> +
> +	ae4_destroy_work(ae4);
> +	ae4_deinit(ae4);
> +}
> +
> +static const struct pci_device_id ae4_pci_table[] = {
> +	{ PCI_VDEVICE(AMD, 0x14C8), },
> +	{ PCI_VDEVICE(AMD, 0x14DC), },
> +	{ PCI_VDEVICE(AMD, 0x149B), },
> +	/* Last entry must be zero */
> +	{ 0, }
> +};
> +MODULE_DEVICE_TABLE(pci, ae4_pci_table);
> +
> +static struct pci_driver ae4_pci_driver = {
> +	.name = "ae4dma",
> +	.id_table = ae4_pci_table,
> +	.probe = ae4_pci_probe,
> +	.remove = ae4_pci_remove,
> +};
> +
> +module_pci_driver(ae4_pci_driver);
> +
> +MODULE_LICENSE("GPL");
> +MODULE_DESCRIPTION("AMD AE4DMA driver");
> diff --git a/drivers/dma/amd/ae4dma/ae4dma.h b/drivers/dma/amd/ae4dma/ae4dma.h
> new file mode 100644
> index 000000000000..24b1253ad570
> --- /dev/null
> +++ b/drivers/dma/amd/ae4dma/ae4dma.h
> @@ -0,0 +1,77 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * AMD AE4DMA driver
> + *
> + * Copyright (c) 2024, Advanced Micro Devices, Inc.
> + * All Rights Reserved.
> + *
> + * Author: Basavaraj Natikar <Basavaraj.Natikar@amd.com>
> + */
> +#ifndef __AE4DMA_H__
> +#define __AE4DMA_H__
> +
> +#include "../common/amd_dma.h"
> +
> +#define MAX_AE4_HW_QUEUES		16
> +
> +#define AE4_DESC_COMPLETED		0x3
> +
> +struct ae4_msix {
> +	int msix_count;
> +	struct msix_entry msix_entry[MAX_AE4_HW_QUEUES];
> +};
> +
> +struct ae4_cmd_queue {
> +	struct ae4_device *ae4;
> +	struct pt_cmd_queue cmd_q;
> +	struct list_head cmd;
> +	/* protect command operations */
> +	struct mutex cmd_lock;
> +	struct delayed_work p_work;
> +	struct workqueue_struct *pws;
> +	struct completion cmp;
> +	wait_queue_head_t q_w;
> +	atomic64_t intr_cnt;
> +	atomic64_t done_cnt;
> +	atomic64_t q_cmd_count;
> +	atomic_t dridx;
> +	unsigned int id;
> +};
> +
> +union dwou {
> +	u32 dw0;
> +	struct dword0 {
> +	u8	byte0;
> +	u8	byte1;
> +	u16	timestamp;
> +	} dws;
> +};
> +
> +struct dword1 {
> +	u8	status;
> +	u8	err_code;
> +	u16	desc_id;
> +};
> +
> +struct ae4dma_desc {
> +	union dwou dwouv;
> +	struct dword1 dw1;
> +	u32 length;
> +	u32 rsvd;
> +	u32 src_hi;
> +	u32 src_lo;
> +	u32 dst_hi;
> +	u32 dst_lo;
> +};
> +
> +struct ae4_device {
> +	struct pt_device pt;
> +	struct ae4_msix *ae4_msix;
> +	struct ae4_cmd_queue ae4cmd_q[MAX_AE4_HW_QUEUES];
> +	unsigned int ae4_irq[MAX_AE4_HW_QUEUES];
> +	unsigned int cmd_q_count;
> +};
> +
> +int ae4_core_init(struct ae4_device *ae4);
> +void ae4_destroy_work(struct ae4_device *ae4);
> +#endif
> diff --git a/drivers/dma/amd/common/amd_dma.h b/drivers/dma/amd/common/amd_dma.h
> new file mode 100644
> index 000000000000..31c35b3bc94b
> --- /dev/null
> +++ b/drivers/dma/amd/common/amd_dma.h
> @@ -0,0 +1,26 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * AMD DMA Driver common
> + *
> + * Copyright (c) 2024, Advanced Micro Devices, Inc.
> + * All Rights Reserved.
> + *
> + * Author: Basavaraj Natikar <Basavaraj.Natikar@amd.com>
> + */
> +
> +#ifndef AMD_DMA_H
> +#define AMD_DMA_H
> +
> +#include <linux/device.h>
> +#include <linux/dmaengine.h>
> +#include <linux/pci.h>
> +#include <linux/spinlock.h>
> +#include <linux/mutex.h>
> +#include <linux/list.h>
> +#include <linux/wait.h>
> +#include <linux/dmapool.h>

Order these includes alphabetically.

> +
> +#include "../ptdma/ptdma.h"
> +#include "../../virt-dma.h"
> +
> +#endif
> -- 
> 2.25.1
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 4/7] dmaengine: ptdma: Extend ptdma to support multi-channel and version
  2024-05-10  8:20 ` [PATCH 4/7] dmaengine: ptdma: Extend ptdma to support multi-channel and version Basavaraj Natikar
@ 2024-05-11 10:54   ` kernel test robot
  0 siblings, 0 replies; 12+ messages in thread
From: kernel test robot @ 2024-05-11 10:54 UTC (permalink / raw)
  To: Basavaraj Natikar, vkoul, dmaengine
  Cc: oe-kbuild-all, Raju.Rangoju, Basavaraj Natikar

Hi Basavaraj,

kernel test robot noticed the following build warnings:

[auto build test WARNING on vkoul-dmaengine/next]
[also build test WARNING on linus/master v6.9-rc7 next-20240510]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Basavaraj-Natikar/dmaengine-Move-AMD-DMA-driver-to-separate-directory/20240510-162221
base:   https://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine.git next
patch link:    https://lore.kernel.org/r/20240510082053.875923-5-Basavaraj.Natikar%40amd.com
patch subject: [PATCH 4/7] dmaengine: ptdma: Extend ptdma to support multi-channel and version
config: x86_64-randconfig-122-20240511 (https://download.01.org/0day-ci/archive/20240511/202405111809.dOmIUtt3-lkp@intel.com/config)
compiler: gcc-11 (Ubuntu 11.4.0-4ubuntu1) 11.4.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240511/202405111809.dOmIUtt3-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202405111809.dOmIUtt3-lkp@intel.com/

All warnings (new ones prefixed by >>):

   drivers/dma/amd/ptdma/ptdma-dmaengine.c: In function 'pt_create_desc':
>> drivers/dma/amd/ptdma/ptdma-dmaengine.c:208:30: warning: variable 'cmd_q' set but not used [-Wunused-but-set-variable]
     208 |         struct pt_cmd_queue *cmd_q;
         |                              ^~~~~


vim +/cmd_q +208 drivers/dma/amd/ptdma/ptdma-dmaengine.c

   197	
   198	static struct pt_dma_desc *pt_create_desc(struct dma_chan *dma_chan,
   199						  dma_addr_t dst,
   200						  dma_addr_t src,
   201						  unsigned int len,
   202						  unsigned long flags)
   203	{
   204		struct pt_dma_chan *chan = to_pt_chan(dma_chan);
   205		struct pt_passthru_engine *pt_engine;
   206		struct pt_device *pt = chan->pt;
   207		struct ae4_cmd_queue *ae4cmd_q;
 > 208		struct pt_cmd_queue *cmd_q;
   209		struct pt_dma_desc *desc;
   210		struct ae4_device *ae4;
   211		struct pt_cmd *pt_cmd;
   212	
   213		desc = pt_alloc_dma_desc(chan, flags);
   214		if (!desc)
   215			return NULL;
   216	
   217		pt_cmd = &desc->pt_cmd;
   218		pt_cmd->pt = pt;
   219		pt_engine = &pt_cmd->passthru;
   220		pt_cmd->engine = PT_ENGINE_PASSTHRU;
   221		pt_engine->src_dma = src;
   222		pt_engine->dst_dma = dst;
   223		pt_engine->src_len = len;
   224		pt_cmd->pt_cmd_callback = pt_cmd_callback;
   225		pt_cmd->data = desc;
   226	
   227		desc->len = len;
   228	
   229		if (pt->ver == AE4_DMA_VERSION) {
   230			ae4 = container_of(pt, struct ae4_device, pt);
   231			ae4cmd_q = &ae4->ae4cmd_q[chan->id];
   232			cmd_q = &ae4cmd_q->cmd_q;
   233			mutex_lock(&ae4cmd_q->cmd_lock);
   234			list_add_tail(&pt_cmd->entry, &ae4cmd_q->cmd);
   235			mutex_unlock(&ae4cmd_q->cmd_lock);
   236		}
   237	
   238		return desc;
   239	}
   240	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


* Re: [PATCH 2/7] dmaengine: ae4dma: Add AMD ae4dma controller driver
  2024-05-10 18:16   ` Frank Li
@ 2024-05-21  9:36     ` Basavaraj Natikar
  2024-05-23 22:02       ` Frank Li
  0 siblings, 1 reply; 12+ messages in thread
From: Basavaraj Natikar @ 2024-05-21  9:36 UTC (permalink / raw)
  To: Frank Li, Basavaraj Natikar; +Cc: vkoul, dmaengine, Raju.Rangoju


On 5/10/2024 11:46 PM, Frank Li wrote:
> On Fri, May 10, 2024 at 01:50:48PM +0530, Basavaraj Natikar wrote:
>> Add support for AMD AE4DMA controller. It performs high-bandwidth
>> memory to memory and IO copy operation. Device commands are managed
>> via a circular queue of 'descriptors', each of which specifies source
>> and destination addresses for copying a single buffer of data.
>>
>> Reviewed-by: Raju Rangoju <Raju.Rangoju@amd.com>
>> Signed-off-by: Basavaraj Natikar <Basavaraj.Natikar@amd.com>
>> ---
>>  MAINTAINERS                         |   6 +
>>  drivers/dma/amd/Kconfig             |   1 +
>>  drivers/dma/amd/Makefile            |   1 +
>>  drivers/dma/amd/ae4dma/Kconfig      |  13 ++
>>  drivers/dma/amd/ae4dma/Makefile     |  10 ++
>>  drivers/dma/amd/ae4dma/ae4dma-dev.c | 206 ++++++++++++++++++++++++++++
>>  drivers/dma/amd/ae4dma/ae4dma-pci.c | 195 ++++++++++++++++++++++++++
>>  drivers/dma/amd/ae4dma/ae4dma.h     |  77 +++++++++++
>>  drivers/dma/amd/common/amd_dma.h    |  26 ++++
>>  9 files changed, 535 insertions(+)
>>  create mode 100644 drivers/dma/amd/ae4dma/Kconfig
>>  create mode 100644 drivers/dma/amd/ae4dma/Makefile
>>  create mode 100644 drivers/dma/amd/ae4dma/ae4dma-dev.c
>>  create mode 100644 drivers/dma/amd/ae4dma/ae4dma-pci.c
>>  create mode 100644 drivers/dma/amd/ae4dma/ae4dma.h
>>  create mode 100644 drivers/dma/amd/common/amd_dma.h
>>
>> diff --git a/MAINTAINERS b/MAINTAINERS
>> index b190efda33ba..45f2140093b6 100644
>> --- a/MAINTAINERS
>> +++ b/MAINTAINERS
>> @@ -909,6 +909,12 @@ L:	linux-edac@vger.kernel.org
>>  S:	Supported
>>  F:	drivers/ras/amd/atl/*
>>  
>> +AMD AE4DMA DRIVER
>> +M:	Basavaraj Natikar <Basavaraj.Natikar@amd.com>
>> +L:	dmaengine@vger.kernel.org
>> +S:	Maintained
>> +F:	drivers/dma/amd/ae4dma/
>> +
>>  AMD AXI W1 DRIVER
>>  M:	Kris Chaplin <kris.chaplin@amd.com>
>>  R:	Thomas Delev <thomas.delev@amd.com>
>> diff --git a/drivers/dma/amd/Kconfig b/drivers/dma/amd/Kconfig
>> index 8246b463bcf7..8c25a3ed6b94 100644
>> --- a/drivers/dma/amd/Kconfig
>> +++ b/drivers/dma/amd/Kconfig
>> @@ -3,3 +3,4 @@
>>  # AMD DMA Drivers
>>  
>>  source "drivers/dma/amd/ptdma/Kconfig"
>> +source "drivers/dma/amd/ae4dma/Kconfig"
>> diff --git a/drivers/dma/amd/Makefile b/drivers/dma/amd/Makefile
>> index dd7257ba7e06..8049b06a9ff5 100644
>> --- a/drivers/dma/amd/Makefile
>> +++ b/drivers/dma/amd/Makefile
>> @@ -4,3 +4,4 @@
>>  #
>>  
>>  obj-$(CONFIG_AMD_PTDMA) += ptdma/
>> +obj-$(CONFIG_AMD_AE4DMA) += ae4dma/
>> diff --git a/drivers/dma/amd/ae4dma/Kconfig b/drivers/dma/amd/ae4dma/Kconfig
>> new file mode 100644
>> index 000000000000..cf8db4dac98d
>> --- /dev/null
>> +++ b/drivers/dma/amd/ae4dma/Kconfig
>> @@ -0,0 +1,13 @@
>> +# SPDX-License-Identifier: GPL-2.0
>> +config AMD_AE4DMA
>> +	tristate  "AMD AE4DMA Engine"
>> +	depends on X86_64 && PCI
>> +	select DMA_ENGINE
>> +	select DMA_VIRTUAL_CHANNELS
>> +	help
>> +	  Enable support for the AMD AE4DMA controller. This controller
>> +	  provides DMA capabilities to perform high bandwidth memory to
>> +	  memory and IO copy operations. It performs DMA transfer through
>> +	  queue-based descriptor management. This DMA controller is intended
>> +	  to be used with AMD Non-Transparent Bridge devices and not for
>> +	  general purpose peripheral DMA.
>> diff --git a/drivers/dma/amd/ae4dma/Makefile b/drivers/dma/amd/ae4dma/Makefile
>> new file mode 100644
>> index 000000000000..e918f85a80ec
>> --- /dev/null
>> +++ b/drivers/dma/amd/ae4dma/Makefile
>> @@ -0,0 +1,10 @@
>> +# SPDX-License-Identifier: GPL-2.0
>> +#
>> +# AMD AE4DMA driver
>> +#
>> +
>> +obj-$(CONFIG_AMD_AE4DMA) += ae4dma.o
>> +
>> +ae4dma-objs := ae4dma-dev.o
>> +
>> +ae4dma-$(CONFIG_PCI) += ae4dma-pci.o
>> diff --git a/drivers/dma/amd/ae4dma/ae4dma-dev.c b/drivers/dma/amd/ae4dma/ae4dma-dev.c
>> new file mode 100644
>> index 000000000000..fc33d2056af2
>> --- /dev/null
>> +++ b/drivers/dma/amd/ae4dma/ae4dma-dev.c
>> @@ -0,0 +1,206 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/*
>> + * AMD AE4DMA driver
>> + *
>> + * Copyright (c) 2024, Advanced Micro Devices, Inc.
>> + * All Rights Reserved.
>> + *
>> + * Author: Basavaraj Natikar <Basavaraj.Natikar@amd.com>
>> + */
>> +
>> +#include "ae4dma.h"
>> +
>> +static unsigned int max_hw_q = 1;
>> +module_param(max_hw_q, uint, 0444);
>> +MODULE_PARM_DESC(max_hw_q, "max hw queues supported by engine (any non-zero value, default: 1)");
> Is this read from a hardware register? You put it in a global variable. What
> about a system with two different DMA controllers, where one's max_hw_q is 1
> and the other's is 2?

Yes, this global value configures the hardware register, defaulting to 1. Since
all the DMA controllers are identical, the same value is set for every
controller.

>
>> +
>> +static char *ae4_error_codes[] = {
>> +	"",
>> +	"ERR 01: INVALID HEADER DW0",
>> +	"ERR 02: INVALID STATUS",
>> +	"ERR 03: INVALID LENGTH - 4 BYTE ALIGNMENT",
>> +	"ERR 04: INVALID SRC ADDR - 4 BYTE ALIGNMENT",
>> +	"ERR 05: INVALID DST ADDR - 4 BYTE ALIGNMENT",
>> +	"ERR 06: INVALID ALIGNMENT",
>> +	"ERR 07: INVALID DESCRIPTOR",
>> +};
>> +
>> +static void ae4_log_error(struct pt_device *d, int e)
>> +{
>> +	if (e <= 7)
>> +		dev_info(d->dev, "AE4DMA error: %s (0x%x)\n", ae4_error_codes[e], e);
>> +	else if (e > 7 && e <= 15)
>> +		dev_info(d->dev, "AE4DMA error: %s (0x%x)\n", "INVALID DESCRIPTOR", e);
>> +	else if (e > 15 && e <= 31)
>> +		dev_info(d->dev, "AE4DMA error: %s (0x%x)\n", "INVALID DESCRIPTOR", e);
>> +	else if (e > 31 && e <= 63)
>> +		dev_info(d->dev, "AE4DMA error: %s (0x%x)\n", "INVALID DESCRIPTOR", e);
>> +	else if (e > 63 && e <= 127)
>> +		dev_info(d->dev, "AE4DMA error: %s (0x%x)\n", "PTE ERROR", e);
>> +	else if (e > 127 && e <= 255)
>> +		dev_info(d->dev, "AE4DMA error: %s (0x%x)\n", "PTE ERROR", e);
>> +	else
>> +		dev_info(d->dev, "Unknown AE4DMA error");
>> +}
>> +
>> +static void ae4_check_status_error(struct ae4_cmd_queue *ae4cmd_q, int idx)
>> +{
>> +	struct pt_cmd_queue *cmd_q = &ae4cmd_q->cmd_q;
>> +	struct ae4dma_desc desc;
>> +	u8 status;
>> +
>> +	memcpy(&desc, &cmd_q->qbase[idx], sizeof(struct ae4dma_desc));
>> +	/* Synchronize ordering */
>> +	mb();
> Is dma_wmb() enough?

Sure, I will change it to dma_rmb(), which is sufficient for this scenario.

>
>> +	status = desc.dw1.status;
>> +	if (status && status != AE4_DESC_COMPLETED) {
>> +		cmd_q->cmd_error = desc.dw1.err_code;
>> +		if (cmd_q->cmd_error)
>> +			ae4_log_error(cmd_q->pt, cmd_q->cmd_error);
>> +	}
>> +}
>> +
>> +static void ae4_pending_work(struct work_struct *work)
>> +{
>> +	struct ae4_cmd_queue *ae4cmd_q = container_of(work, struct ae4_cmd_queue, p_work.work);
>> +	struct pt_cmd_queue *cmd_q = &ae4cmd_q->cmd_q;
>> +	struct pt_cmd *cmd;
>> +	u32 cridx, dridx;
>> +
>> +	while (true) {
>> +		wait_event_interruptible(ae4cmd_q->q_w,
>> +					 ((atomic64_read(&ae4cmd_q->done_cnt)) <
>> +					   atomic64_read(&ae4cmd_q->intr_cnt)));
> wait_event_interruptible_timeout()? To avoid a potential deadlock.

A worker thread is created and started for each queue at init time. These
threads wait for DMA operations and complete them quickly. If there are no DMA
operations, the threads simply remain idle; there is no deadlock.

>
>> +
>> +		atomic64_inc(&ae4cmd_q->done_cnt);
>> +
>> +		mutex_lock(&ae4cmd_q->cmd_lock);
>> +
>> +		cridx = readl(cmd_q->reg_control + 0x0C);
>> +		dridx = atomic_read(&ae4cmd_q->dridx);
>> +
>> +		while ((dridx != cridx) && !list_empty(&ae4cmd_q->cmd)) {
>> +			cmd = list_first_entry(&ae4cmd_q->cmd, struct pt_cmd, entry);
>> +			list_del(&cmd->entry);
>> +
>> +			ae4_check_status_error(ae4cmd_q, dridx);
>> +			cmd->pt_cmd_callback(cmd->data, cmd->ret);
>> +
>> +			atomic64_dec(&ae4cmd_q->q_cmd_count);
>> +			dridx = (dridx + 1) % CMD_Q_LEN;
>> +			atomic_set(&ae4cmd_q->dridx, dridx);
>> +			/* Synchronize ordering */
>> +			mb();
>> +
>> +			complete_all(&ae4cmd_q->cmp);
>> +		}
>> +
>> +		mutex_unlock(&ae4cmd_q->cmd_lock);
>> +	}
>> +}
>> +
>> +static irqreturn_t ae4_core_irq_handler(int irq, void *data)
>> +{
>> +	struct ae4_cmd_queue *ae4cmd_q = data;
>> +	struct pt_cmd_queue *cmd_q;
>> +	struct pt_device *pt;
>> +	u32 status;
>> +
>> +	cmd_q = &ae4cmd_q->cmd_q;
>> +	pt = cmd_q->pt;
>> +
>> +	pt->total_interrupts++;
>> +	atomic64_inc(&ae4cmd_q->intr_cnt);
>> +
>> +	wake_up(&ae4cmd_q->q_w);
>> +
>> +	status = readl(cmd_q->reg_control + 0x14);
>> +	if (status & BIT(0)) {
>> +		status &= GENMASK(31, 1);
>> +		writel(status, cmd_q->reg_control + 0x14);
>> +	}
>> +
>> +	return IRQ_HANDLED;
>> +}
>> +
>> +void ae4_destroy_work(struct ae4_device *ae4)
>> +{
>> +	struct ae4_cmd_queue *ae4cmd_q;
>> +	int i;
>> +
>> +	for (i = 0; i < ae4->cmd_q_count; i++) {
>> +		ae4cmd_q = &ae4->ae4cmd_q[i];
>> +
>> +		if (!ae4cmd_q->pws)
>> +			break;
>> +
>> +		cancel_delayed_work(&ae4cmd_q->p_work);
> Do you need cancel_delayed_work_sync()?

Sure, I will change it to cancel_delayed_work_sync().

>
>> +		destroy_workqueue(ae4cmd_q->pws);
>> +	}
>> +}
>> +
>> +int ae4_core_init(struct ae4_device *ae4)
>> +{
>> +	struct pt_device *pt = &ae4->pt;
>> +	struct ae4_cmd_queue *ae4cmd_q;
>> +	struct device *dev = pt->dev;
>> +	struct pt_cmd_queue *cmd_q;
>> +	int i, ret = 0;
>> +
>> +	writel(max_hw_q, pt->io_regs);
>> +
>> +	for (i = 0; i < max_hw_q; i++) {
>> +		ae4cmd_q = &ae4->ae4cmd_q[i];
>> +		ae4cmd_q->id = ae4->cmd_q_count;
>> +		ae4->cmd_q_count++;
>> +
>> +		cmd_q = &ae4cmd_q->cmd_q;
>> +		cmd_q->pt = pt;
>> +
>> +		/* Preset some register values (Q size is 32byte (0x20)) */
>> +		cmd_q->reg_control = pt->io_regs + ((i + 1) * 0x20);
>> +
>> +		ret = devm_request_irq(dev, ae4->ae4_irq[i], ae4_core_irq_handler, 0,
>> +				       dev_name(pt->dev), ae4cmd_q);
>> +		if (ret)
>> +			return ret;
>> +
>> +		cmd_q->qsize = Q_SIZE(sizeof(struct ae4dma_desc));
>> +
>> +		cmd_q->qbase = dmam_alloc_coherent(dev, cmd_q->qsize, &cmd_q->qbase_dma,
>> +						   GFP_KERNEL);
>> +		if (!cmd_q->qbase)
>> +			return -ENOMEM;
>> +	}
>> +
>> +	for (i = 0; i < ae4->cmd_q_count; i++) {
>> +		ae4cmd_q = &ae4->ae4cmd_q[i];
>> +
>> +		cmd_q = &ae4cmd_q->cmd_q;
>> +
>> +		/* Preset some register values (Q size is 32byte (0x20)) */
>> +		cmd_q->reg_control = pt->io_regs + ((i + 1) * 0x20);
>> +
>> +		/* Update the device registers with queue information. */
>> +		writel(CMD_Q_LEN, cmd_q->reg_control + 0x08);
>> +
>> +		cmd_q->qdma_tail = cmd_q->qbase_dma;
>> +		writel(lower_32_bits(cmd_q->qdma_tail), cmd_q->reg_control + 0x18);
>> +		writel(upper_32_bits(cmd_q->qdma_tail), cmd_q->reg_control + 0x1C);
>> +
>> +		INIT_LIST_HEAD(&ae4cmd_q->cmd);
>> +		init_waitqueue_head(&ae4cmd_q->q_w);
>> +
>> +		ae4cmd_q->pws = alloc_ordered_workqueue("ae4dma_%d", WQ_MEM_RECLAIM, ae4cmd_q->id);
> Can an existing workqueue match your requirement?

Compared to one shared workqueue, a separate workqueue per hardware queue
improves performance by enabling load balancing across queues, guarantees DMA
command execution even under memory pressure (WQ_MEM_RECLAIM), and maintains
strict isolation between tasks on different queues.

>
> Frank
>
>> +		if (!ae4cmd_q->pws) {
>> +			ae4_destroy_work(ae4);
>> +			return -ENOMEM;
>> +		}
>> +		INIT_DELAYED_WORK(&ae4cmd_q->p_work, ae4_pending_work);
>> +		queue_delayed_work(ae4cmd_q->pws, &ae4cmd_q->p_work,  usecs_to_jiffies(100));
>> +
>> +		init_completion(&ae4cmd_q->cmp);
>> +	}
>> +
>> +	return ret;
>> +}
>> diff --git a/drivers/dma/amd/ae4dma/ae4dma-pci.c b/drivers/dma/amd/ae4dma/ae4dma-pci.c
>> new file mode 100644
>> index 000000000000..4cd537af757d
>> --- /dev/null
>> +++ b/drivers/dma/amd/ae4dma/ae4dma-pci.c
>> @@ -0,0 +1,195 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/*
>> + * AMD AE4DMA driver
>> + *
>> + * Copyright (c) 2024, Advanced Micro Devices, Inc.
>> + * All Rights Reserved.
>> + *
>> + * Author: Basavaraj Natikar <Basavaraj.Natikar@amd.com>
>> + */
>> +
>> +#include "ae4dma.h"
>> +
>> +static int ae4_get_msi_irq(struct ae4_device *ae4)
>> +{
>> +	struct pt_device *pt = &ae4->pt;
>> +	struct device *dev = pt->dev;
>> +	struct pci_dev *pdev;
>> +	int ret, i;
>> +
>> +	pdev = to_pci_dev(dev);
>> +	ret = pci_enable_msi(pdev);
>> +	if (ret)
>> +		return ret;
>> +
>> +	for (i = 0; i < MAX_AE4_HW_QUEUES; i++)
>> +		ae4->ae4_irq[i] = pdev->irq;
>> +
>> +	return 0;
>> +}
>> +
>> +static int ae4_get_msix_irqs(struct ae4_device *ae4)
>> +{
>> +	struct ae4_msix *ae4_msix = ae4->ae4_msix;
>> +	struct pt_device *pt = &ae4->pt;
>> +	struct device *dev = pt->dev;
>> +	struct pci_dev *pdev;
>> +	int v, i, ret;
>> +
>> +	pdev = to_pci_dev(dev);
>> +
>> +	for (v = 0; v < ARRAY_SIZE(ae4_msix->msix_entry); v++)
>> +		ae4_msix->msix_entry[v].entry = v;
>> +
>> +	ret = pci_enable_msix_range(pdev, ae4_msix->msix_entry, 1, v);
>> +	if (ret < 0)
>> +		return ret;
>> +
>> +	ae4_msix->msix_count = ret;
>> +
>> +	for (i = 0; i < MAX_AE4_HW_QUEUES; i++)
>> +		ae4->ae4_irq[i] = ae4_msix->msix_entry[i].vector;
>> +
>> +	return 0;
>> +}
>> +
>> +static int ae4_get_irqs(struct ae4_device *ae4)
>> +{
>> +	struct pt_device *pt = &ae4->pt;
>> +	struct device *dev = pt->dev;
>> +	int ret;
>> +
>> +	ret = ae4_get_msix_irqs(ae4);
>> +	if (!ret)
>> +		return 0;
>> +
>> +	/* Couldn't get MSI-X vectors, try MSI */
>> +	dev_err(dev, "could not enable MSI-X (%d), trying MSI\n", ret);
>> +	ret = ae4_get_msi_irq(ae4);
>> +	if (!ret)
>> +		return 0;
>> +
>> +	/* Couldn't get MSI interrupt */
>> +	dev_err(dev, "could not enable MSI (%d)\n", ret);
>> +
>> +	return ret;
>> +}
>> +
>> +static void ae4_free_irqs(struct ae4_device *ae4)
>> +{
>> +	struct ae4_msix *ae4_msix;
>> +	struct pci_dev *pdev;
>> +	struct pt_device *pt;
>> +	struct device *dev;
>> +	int i;
>> +
>> +	if (ae4) {
>> +		pt = &ae4->pt;
>> +		dev = pt->dev;
>> +		pdev = to_pci_dev(dev);
>> +
>> +		ae4_msix = ae4->ae4_msix;
>> +		if (ae4_msix && ae4_msix->msix_count)
>> +			pci_disable_msix(pdev);
>> +		else if (pdev->irq)
>> +			pci_disable_msi(pdev);
>> +
>> +		for (i = 0; i < MAX_AE4_HW_QUEUES; i++)
>> +			ae4->ae4_irq[i] = 0;
>> +	}
>> +}
>> +
>> +static void ae4_deinit(struct ae4_device *ae4)
>> +{
>> +	ae4_free_irqs(ae4);
>> +}
>> +
>> +static int ae4_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>> +{
>> +	struct device *dev = &pdev->dev;
>> +	struct ae4_device *ae4;
>> +	struct pt_device *pt;
>> +	int bar_mask;
>> +	int ret = 0;
>> +
>> +	ae4 = devm_kzalloc(dev, sizeof(*ae4), GFP_KERNEL);
>> +	if (!ae4)
>> +		return -ENOMEM;
>> +
>> +	ae4->ae4_msix = devm_kzalloc(dev, sizeof(struct ae4_msix), GFP_KERNEL);
>> +	if (!ae4->ae4_msix)
>> +		return -ENOMEM;
>> +
>> +	ret = pcim_enable_device(pdev);
>> +	if (ret)
>> +		goto ae4_error;
>> +
>> +	bar_mask = pci_select_bars(pdev, IORESOURCE_MEM);
>> +	ret = pcim_iomap_regions(pdev, bar_mask, "ae4dma");
>> +	if (ret)
>> +		goto ae4_error;
>> +
>> +	pt = &ae4->pt;
>> +	pt->dev = dev;
>> +
>> +	pt->io_regs = pcim_iomap_table(pdev)[0];
>> +	if (!pt->io_regs) {
>> +		ret = -ENOMEM;
>> +		goto ae4_error;
>> +	}
>> +
>> +	ret = ae4_get_irqs(ae4);
>> +	if (ret)
>> +		goto ae4_error;
>> +
>> +	pci_set_master(pdev);
>> +
>> +	ret = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(48));
>> +	if (ret) {
>> +		ret = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32));
>> +		if (ret)
>> +			goto ae4_error;
>> +	}
> There is no need to fall back to 32-bit: dma_set_mask_and_coherent() never
> fails for masks of 32 bits or more.
>
> For details, see:
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=f7ae20f2fc4e6a5e32f43c4fa2acab3281a61c81
>
> if (support_48bit)
> 	dma_set_mask_and_coherent(dev, DMA_BIT_MASK(48))
> else
> 	dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32))
>
> You can determine support_48bit from a hardware register or from the PCI vendor/device ID.

Sure, I will keep only the single call dma_set_mask_and_coherent(dev, DMA_BIT_MASK(48)).

>
>
>> +
>> +	dev_set_drvdata(dev, ae4);
>> +
>> +	ret = ae4_core_init(ae4);
>> +	if (ret)
>> +		goto ae4_error;
>> +
>> +	return 0;
>> +
>> +ae4_error:
>> +	ae4_deinit(ae4);
>> +
>> +	return ret;
>> +}
>> +
>> +static void ae4_pci_remove(struct pci_dev *pdev)
>> +{
>> +	struct ae4_device *ae4 = dev_get_drvdata(&pdev->dev);
>> +
>> +	ae4_destroy_work(ae4);
>> +	ae4_deinit(ae4);
>> +}
>> +
>> +static const struct pci_device_id ae4_pci_table[] = {
>> +	{ PCI_VDEVICE(AMD, 0x14C8), },
>> +	{ PCI_VDEVICE(AMD, 0x14DC), },
>> +	{ PCI_VDEVICE(AMD, 0x149B), },
>> +	/* Last entry must be zero */
>> +	{ 0, }
>> +};
>> +MODULE_DEVICE_TABLE(pci, ae4_pci_table);
>> +
>> +static struct pci_driver ae4_pci_driver = {
>> +	.name = "ae4dma",
>> +	.id_table = ae4_pci_table,
>> +	.probe = ae4_pci_probe,
>> +	.remove = ae4_pci_remove,
>> +};
>> +
>> +module_pci_driver(ae4_pci_driver);
>> +
>> +MODULE_LICENSE("GPL");
>> +MODULE_DESCRIPTION("AMD AE4DMA driver");
>> diff --git a/drivers/dma/amd/ae4dma/ae4dma.h b/drivers/dma/amd/ae4dma/ae4dma.h
>> new file mode 100644
>> index 000000000000..24b1253ad570
>> --- /dev/null
>> +++ b/drivers/dma/amd/ae4dma/ae4dma.h
>> @@ -0,0 +1,77 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +/*
>> + * AMD AE4DMA driver
>> + *
>> + * Copyright (c) 2024, Advanced Micro Devices, Inc.
>> + * All Rights Reserved.
>> + *
>> + * Author: Basavaraj Natikar <Basavaraj.Natikar@amd.com>
>> + */
>> +#ifndef __AE4DMA_H__
>> +#define __AE4DMA_H__
>> +
>> +#include "../common/amd_dma.h"
>> +
>> +#define MAX_AE4_HW_QUEUES		16
>> +
>> +#define AE4_DESC_COMPLETED		0x3
>> +
>> +struct ae4_msix {
>> +	int msix_count;
>> +	struct msix_entry msix_entry[MAX_AE4_HW_QUEUES];
>> +};
>> +
>> +struct ae4_cmd_queue {
>> +	struct ae4_device *ae4;
>> +	struct pt_cmd_queue cmd_q;
>> +	struct list_head cmd;
>> +	/* protect command operations */
>> +	struct mutex cmd_lock;
>> +	struct delayed_work p_work;
>> +	struct workqueue_struct *pws;
>> +	struct completion cmp;
>> +	wait_queue_head_t q_w;
>> +	atomic64_t intr_cnt;
>> +	atomic64_t done_cnt;
>> +	atomic64_t q_cmd_count;
>> +	atomic_t dridx;
>> +	unsigned int id;
>> +};
>> +
>> +union dwou {
>> +	u32 dw0;
>> +	struct dword0 {
>> +	u8	byte0;
>> +	u8	byte1;
>> +	u16	timestamp;
>> +	} dws;
>> +};
>> +
>> +struct dword1 {
>> +	u8	status;
>> +	u8	err_code;
>> +	u16	desc_id;
>> +};
>> +
>> +struct ae4dma_desc {
>> +	union dwou dwouv;
>> +	struct dword1 dw1;
>> +	u32 length;
>> +	u32 rsvd;
>> +	u32 src_hi;
>> +	u32 src_lo;
>> +	u32 dst_hi;
>> +	u32 dst_lo;
>> +};
>> +
>> +struct ae4_device {
>> +	struct pt_device pt;
>> +	struct ae4_msix *ae4_msix;
>> +	struct ae4_cmd_queue ae4cmd_q[MAX_AE4_HW_QUEUES];
>> +	unsigned int ae4_irq[MAX_AE4_HW_QUEUES];
>> +	unsigned int cmd_q_count;
>> +};
>> +
>> +int ae4_core_init(struct ae4_device *ae4);
>> +void ae4_destroy_work(struct ae4_device *ae4);
>> +#endif
>> diff --git a/drivers/dma/amd/common/amd_dma.h b/drivers/dma/amd/common/amd_dma.h
>> new file mode 100644
>> index 000000000000..31c35b3bc94b
>> --- /dev/null
>> +++ b/drivers/dma/amd/common/amd_dma.h
>> @@ -0,0 +1,26 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +/*
>> + * AMD DMA Driver common
>> + *
>> + * Copyright (c) 2024, Advanced Micro Devices, Inc.
>> + * All Rights Reserved.
>> + *
>> + * Author: Basavaraj Natikar <Basavaraj.Natikar@amd.com>
>> + */
>> +
>> +#ifndef AMD_DMA_H
>> +#define AMD_DMA_H
>> +
>> +#include <linux/device.h>
>> +#include <linux/dmaengine.h>
>> +#include <linux/pci.h>
>> +#include <linux/spinlock.h>
>> +#include <linux/mutex.h>
>> +#include <linux/list.h>
>> +#include <linux/wait.h>
>> +#include <linux/dmapool.h>
> Order these includes alphabetically.

Sure, I will change it accordingly.

Thanks,
--
Basavaraj

>
>> +
>> +#include "../ptdma/ptdma.h"
>> +#include "../../virt-dma.h"
>> +
>> +#endif
>> -- 
>> 2.25.1
>>



* Re: [PATCH 2/7] dmaengine: ae4dma: Add AMD ae4dma controller driver
  2024-05-21  9:36     ` Basavaraj Natikar
@ 2024-05-23 22:02       ` Frank Li
  0 siblings, 0 replies; 12+ messages in thread
From: Frank Li @ 2024-05-23 22:02 UTC (permalink / raw)
  To: Basavaraj Natikar; +Cc: Basavaraj Natikar, vkoul, dmaengine, Raju.Rangoju

On Tue, May 21, 2024 at 03:06:17PM +0530, Basavaraj Natikar wrote:
> 
> On 5/10/2024 11:46 PM, Frank Li wrote:
> > On Fri, May 10, 2024 at 01:50:48PM +0530, Basavaraj Natikar wrote:
> >> Add support for AMD AE4DMA controller. It performs high-bandwidth
> >> memory to memory and IO copy operation. Device commands are managed
> >> via a circular queue of 'descriptors', each of which specifies source
> >> and destination addresses for copying a single buffer of data.
> >>
> >> Reviewed-by: Raju Rangoju <Raju.Rangoju@amd.com>
> >> Signed-off-by: Basavaraj Natikar <Basavaraj.Natikar@amd.com>
> >> ---
> >>  MAINTAINERS                         |   6 +
> >>  drivers/dma/amd/Kconfig             |   1 +
> >>  drivers/dma/amd/Makefile            |   1 +
> >>  drivers/dma/amd/ae4dma/Kconfig      |  13 ++
> >>  drivers/dma/amd/ae4dma/Makefile     |  10 ++
> >>  drivers/dma/amd/ae4dma/ae4dma-dev.c | 206 ++++++++++++++++++++++++++++
> >>  drivers/dma/amd/ae4dma/ae4dma-pci.c | 195 ++++++++++++++++++++++++++
> >>  drivers/dma/amd/ae4dma/ae4dma.h     |  77 +++++++++++
> >>  drivers/dma/amd/common/amd_dma.h    |  26 ++++
> >>  9 files changed, 535 insertions(+)
> >>  create mode 100644 drivers/dma/amd/ae4dma/Kconfig
> >>  create mode 100644 drivers/dma/amd/ae4dma/Makefile
> >>  create mode 100644 drivers/dma/amd/ae4dma/ae4dma-dev.c
> >>  create mode 100644 drivers/dma/amd/ae4dma/ae4dma-pci.c
> >>  create mode 100644 drivers/dma/amd/ae4dma/ae4dma.h
> >>  create mode 100644 drivers/dma/amd/common/amd_dma.h
> >>
> >> diff --git a/MAINTAINERS b/MAINTAINERS
> >> index b190efda33ba..45f2140093b6 100644
> >> --- a/MAINTAINERS
> >> +++ b/MAINTAINERS
> >> @@ -909,6 +909,12 @@ L:	linux-edac@vger.kernel.org
> >>  S:	Supported
> >>  F:	drivers/ras/amd/atl/*
> >>  
> >> +AMD AE4DMA DRIVER
> >> +M:	Basavaraj Natikar <Basavaraj.Natikar@amd.com>
> >> +L:	dmaengine@vger.kernel.org
> >> +S:	Maintained
> >> +F:	drivers/dma/amd/ae4dma/
> >> +
> >>  AMD AXI W1 DRIVER
> >>  M:	Kris Chaplin <kris.chaplin@amd.com>
> >>  R:	Thomas Delev <thomas.delev@amd.com>
> >> diff --git a/drivers/dma/amd/Kconfig b/drivers/dma/amd/Kconfig
> >> index 8246b463bcf7..8c25a3ed6b94 100644
> >> --- a/drivers/dma/amd/Kconfig
> >> +++ b/drivers/dma/amd/Kconfig
> >> @@ -3,3 +3,4 @@
> >>  # AMD DMA Drivers
> >>  
> >>  source "drivers/dma/amd/ptdma/Kconfig"
> >> +source "drivers/dma/amd/ae4dma/Kconfig"
> >> diff --git a/drivers/dma/amd/Makefile b/drivers/dma/amd/Makefile
> >> index dd7257ba7e06..8049b06a9ff5 100644
> >> --- a/drivers/dma/amd/Makefile
> >> +++ b/drivers/dma/amd/Makefile
> >> @@ -4,3 +4,4 @@
> >>  #
> >>  
> >>  obj-$(CONFIG_AMD_PTDMA) += ptdma/
> >> +obj-$(CONFIG_AMD_AE4DMA) += ae4dma/
> >> diff --git a/drivers/dma/amd/ae4dma/Kconfig b/drivers/dma/amd/ae4dma/Kconfig
> >> new file mode 100644
> >> index 000000000000..cf8db4dac98d
> >> --- /dev/null
> >> +++ b/drivers/dma/amd/ae4dma/Kconfig
> >> @@ -0,0 +1,13 @@
> >> +# SPDX-License-Identifier: GPL-2.0
> >> +config AMD_AE4DMA
> >> +	tristate  "AMD AE4DMA Engine"
> >> +	depends on X86_64 && PCI
> >> +	select DMA_ENGINE
> >> +	select DMA_VIRTUAL_CHANNELS
> >> +	help
> >> +	  Enable support for the AMD AE4DMA controller. This controller
> >> +	  provides DMA capabilities to perform high bandwidth memory to
> >> +	  memory and IO copy operations. It performs DMA transfer through
> >> +	  queue-based descriptor management. This DMA controller is intended
> >> +	  to be used with AMD Non-Transparent Bridge devices and not for
> >> +	  general purpose peripheral DMA.
> >> diff --git a/drivers/dma/amd/ae4dma/Makefile b/drivers/dma/amd/ae4dma/Makefile
> >> new file mode 100644
> >> index 000000000000..e918f85a80ec
> >> --- /dev/null
> >> +++ b/drivers/dma/amd/ae4dma/Makefile
> >> @@ -0,0 +1,10 @@
> >> +# SPDX-License-Identifier: GPL-2.0
> >> +#
> >> +# AMD AE4DMA driver
> >> +#
> >> +
> >> +obj-$(CONFIG_AMD_AE4DMA) += ae4dma.o
> >> +
> >> +ae4dma-objs := ae4dma-dev.o
> >> +
> >> +ae4dma-$(CONFIG_PCI) += ae4dma-pci.o
> >> diff --git a/drivers/dma/amd/ae4dma/ae4dma-dev.c b/drivers/dma/amd/ae4dma/ae4dma-dev.c
> >> new file mode 100644
> >> index 000000000000..fc33d2056af2
> >> --- /dev/null
> >> +++ b/drivers/dma/amd/ae4dma/ae4dma-dev.c
> >> @@ -0,0 +1,206 @@
> >> +// SPDX-License-Identifier: GPL-2.0
> >> +/*
> >> + * AMD AE4DMA driver
> >> + *
> >> + * Copyright (c) 2024, Advanced Micro Devices, Inc.
> >> + * All Rights Reserved.
> >> + *
> >> + * Author: Basavaraj Natikar <Basavaraj.Natikar@amd.com>
> >> + */
> >> +
> >> +#include "ae4dma.h"
> >> +
> >> +static unsigned int max_hw_q = 1;
> >> +module_param(max_hw_q, uint, 0444);
> >> +MODULE_PARM_DESC(max_hw_q, "max hw queues supported by engine (any non-zero value, default: 1)");
> > Is this read from a hardware register? You have put it in a global variable.
> > What about a system with two different DMA controllers, where one's max_hw_q
> > is 1 and the other's is 2?
> 
> Yes, this global value configures the hardware register, defaulting to 1. Since
> all DMA controllers in the system are identical, the same value applies to
> every controller.

Even if it is the same now, I still prefer to put it in the PCI ID table:

+static const struct pci_device_id ae4_pci_table[] = {
+	{ PCI_VDEVICE(AMD, 0x14C8), MAX_HW_Q},
				    ^^^^^^^^

+	{ PCI_VDEVICE(AMD, 0x14DC), ...},
+	{ PCI_VDEVICE(AMD, 0x149B), ...},
+	/* Last entry must be zero */
+	{ 0, }

Then if a new design increases the queue count in the future,
you only need to add one line here.
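
Reduced to standalone C (the kernel's struct pci_device_id mocked as a plain
struct, and the queue count a placeholder rather than a datasheet value), the
suggestion is to carry the per-device limit in driver_data and look it up at
probe time instead of reading a global module parameter:

```c
#include <assert.h>
#include <stddef.h>

/* Mock of the relevant fields of struct pci_device_id: driver_data
 * carries the per-device maximum hardware queue count. */
struct mock_pci_device_id {
	unsigned int device;        /* PCI device ID */
	unsigned long driver_data;  /* max HW queues for this device */
};

#define AE4_MAX_HW_Q 16  /* placeholder, not a datasheet value */

static const struct mock_pci_device_id ae4_pci_table[] = {
	{ 0x14C8, AE4_MAX_HW_Q },
	{ 0x14DC, AE4_MAX_HW_Q },
	{ 0x149B, AE4_MAX_HW_Q },
	{ 0, 0 } /* sentinel, as in the real table */
};

/* In a real probe() the core passes the matched entry in, so the
 * driver would just read id->driver_data; this lookup only mimics
 * that match for illustration. */
static unsigned long ae4_queue_count(unsigned int device)
{
	for (size_t i = 0; ae4_pci_table[i].device != 0; i++)
		if (ae4_pci_table[i].device == device)
			return ae4_pci_table[i].driver_data;
	return 0;
}
```

A future device with a larger queue count then needs only a new table row with
a different driver_data value, and no module-parameter change.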

Frank

> 
> >
> >> +
> >> +static char *ae4_error_codes[] = {
> >> +	"",
> >> +	"ERR 01: INVALID HEADER DW0",
> >> +	"ERR 02: INVALID STATUS",
> >> +	"ERR 03: INVALID LENGTH - 4 BYTE ALIGNMENT",
> >> +	"ERR 04: INVALID SRC ADDR - 4 BYTE ALIGNMENT",
> >> +	"ERR 05: INVALID DST ADDR - 4 BYTE ALIGNMENT",
> >> +	"ERR 06: INVALID ALIGNMENT",
> >> +	"ERR 07: INVALID DESCRIPTOR",
> >> +};
> >> +
> >> +static void ae4_log_error(struct pt_device *d, int e)
> >> +{
> >> +	if (e <= 7)
> >> +		dev_info(d->dev, "AE4DMA error: %s (0x%x)\n", ae4_error_codes[e], e);
> >> +	else if (e > 7 && e <= 15)
> >> +		dev_info(d->dev, "AE4DMA error: %s (0x%x)\n", "INVALID DESCRIPTOR", e);
> >> +	else if (e > 15 && e <= 31)
> >> +		dev_info(d->dev, "AE4DMA error: %s (0x%x)\n", "INVALID DESCRIPTOR", e);
> >> +	else if (e > 31 && e <= 63)
> >> +		dev_info(d->dev, "AE4DMA error: %s (0x%x)\n", "INVALID DESCRIPTOR", e);
> >> +	else if (e > 63 && e <= 127)
> >> +		dev_info(d->dev, "AE4DMA error: %s (0x%x)\n", "PTE ERROR", e);
> >> +	else if (e > 127 && e <= 255)
> >> +		dev_info(d->dev, "AE4DMA error: %s (0x%x)\n", "PTE ERROR", e);
> >> +	else
> >> +		dev_info(d->dev, "Unknown AE4DMA error");
> >> +}
> >> +
> >> +static void ae4_check_status_error(struct ae4_cmd_queue *ae4cmd_q, int idx)
> >> +{
> >> +	struct pt_cmd_queue *cmd_q = &ae4cmd_q->cmd_q;
> >> +	struct ae4dma_desc desc;
> >> +	u8 status;
> >> +
> >> +	memcpy(&desc, &cmd_q->qbase[idx], sizeof(struct ae4dma_desc));
> >> +	/* Synchronize ordering */
> >> +	mb();
> > Is dma_wmb() enough here?
> 
> Sure, I will change it to dma_rmb(), which is sufficient for this scenario
> (the CPU is reading a descriptor written by the device).
> 
> >
> >> +	status = desc.dw1.status;
> >> +	if (status && status != AE4_DESC_COMPLETED) {
> >> +		cmd_q->cmd_error = desc.dw1.err_code;
> >> +		if (cmd_q->cmd_error)
> >> +			ae4_log_error(cmd_q->pt, cmd_q->cmd_error);
> >> +	}
> >> +}
> >> +
> >> +static void ae4_pending_work(struct work_struct *work)
> >> +{
> >> +	struct ae4_cmd_queue *ae4cmd_q = container_of(work, struct ae4_cmd_queue, p_work.work);
> >> +	struct pt_cmd_queue *cmd_q = &ae4cmd_q->cmd_q;
> >> +	struct pt_cmd *cmd;
> >> +	u32 cridx, dridx;
> >> +
> >> +	while (true) {
> >> +		wait_event_interruptible(ae4cmd_q->q_w,
> >> +					 ((atomic64_read(&ae4cmd_q->done_cnt)) <
> >> +					   atomic64_read(&ae4cmd_q->intr_cnt)));
> > wait_event_interruptible_timeout()? To avoid a potential deadlock.
> 
> A worker is created and started for each queue at initialization. These workers
> wait so that any completed DMA operation is handled quickly. If there are no
> DMA operations, the workers simply remain idle; there is no deadlock.
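
That handshake can be reduced to standalone C11 atomics (the counter names
mirror ae4_cmd_queue, but this is only an illustration, not kernel code): the
worker makes progress only while done_cnt trails intr_cnt, so with no pending
interrupts the wait condition stays false and the worker sleeps rather than
spinning or deadlocking.

```c
#include <assert.h>
#include <stdatomic.h>

/* Per-queue counters, as in struct ae4_cmd_queue. */
static atomic_long intr_cnt; /* bumped by the IRQ handler */
static atomic_long done_cnt; /* bumped by the pending worker */

/* Simulate the IRQ side: one increment per completion interrupt. */
static void mock_irq(void)
{
	atomic_fetch_add(&intr_cnt, 1);
}

/* Simulate one wake-up of the pending worker: drain until every
 * interrupt seen so far has been acknowledged, then go back to
 * waiting (here: return).  The real worker completes descriptors
 * and invokes callbacks in this loop. */
static long mock_pending_work(void)
{
	long processed = 0;

	while (atomic_load(&done_cnt) < atomic_load(&intr_cnt)) {
		atomic_fetch_add(&done_cnt, 1);
		processed++;
	}
	return processed;
}
```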
> 
> >
> >> +
> >> +		atomic64_inc(&ae4cmd_q->done_cnt);
> >> +
> >> +		mutex_lock(&ae4cmd_q->cmd_lock);
> >> +
> >> +		cridx = readl(cmd_q->reg_control + 0x0C);
> >> +		dridx = atomic_read(&ae4cmd_q->dridx);
> >> +
> >> +		while ((dridx != cridx) && !list_empty(&ae4cmd_q->cmd)) {
> >> +			cmd = list_first_entry(&ae4cmd_q->cmd, struct pt_cmd, entry);
> >> +			list_del(&cmd->entry);
> >> +
> >> +			ae4_check_status_error(ae4cmd_q, dridx);
> >> +			cmd->pt_cmd_callback(cmd->data, cmd->ret);
> >> +
> >> +			atomic64_dec(&ae4cmd_q->q_cmd_count);
> >> +			dridx = (dridx + 1) % CMD_Q_LEN;
> >> +			atomic_set(&ae4cmd_q->dridx, dridx);
> >> +			/* Synchronize ordering */
> >> +			mb();
> >> +
> >> +			complete_all(&ae4cmd_q->cmp);
> >> +		}
> >> +
> >> +		mutex_unlock(&ae4cmd_q->cmd_lock);
> >> +	}
> >> +}
> >> +
> >> +static irqreturn_t ae4_core_irq_handler(int irq, void *data)
> >> +{
> >> +	struct ae4_cmd_queue *ae4cmd_q = data;
> >> +	struct pt_cmd_queue *cmd_q;
> >> +	struct pt_device *pt;
> >> +	u32 status;
> >> +
> >> +	cmd_q = &ae4cmd_q->cmd_q;
> >> +	pt = cmd_q->pt;
> >> +
> >> +	pt->total_interrupts++;
> >> +	atomic64_inc(&ae4cmd_q->intr_cnt);
> >> +
> >> +	wake_up(&ae4cmd_q->q_w);
> >> +
> >> +	status = readl(cmd_q->reg_control + 0x14);
> >> +	if (status & BIT(0)) {
> >> +		status &= GENMASK(31, 1);
> >> +		writel(status, cmd_q->reg_control + 0x14);
> >> +	}
> >> +
> >> +	return IRQ_HANDLED;
> >> +}
> >> +
> >> +void ae4_destroy_work(struct ae4_device *ae4)
> >> +{
> >> +	struct ae4_cmd_queue *ae4cmd_q;
> >> +	int i;
> >> +
> >> +	for (i = 0; i < ae4->cmd_q_count; i++) {
> >> +		ae4cmd_q = &ae4->ae4cmd_q[i];
> >> +
> >> +		if (!ae4cmd_q->pws)
> >> +			break;
> >> +
> >> +		cancel_delayed_work(&ae4cmd_q->p_work);
> > Do you need cancel_delayed_work_sync()?
> 
> Sure, I will change it to cancel_delayed_work_sync().
> 
> >
> >> +		destroy_workqueue(ae4cmd_q->pws);
> >> +	}
> >> +}
> >> +
> >> +int ae4_core_init(struct ae4_device *ae4)
> >> +{
> >> +	struct pt_device *pt = &ae4->pt;
> >> +	struct ae4_cmd_queue *ae4cmd_q;
> >> +	struct device *dev = pt->dev;
> >> +	struct pt_cmd_queue *cmd_q;
> >> +	int i, ret = 0;
> >> +
> >> +	writel(max_hw_q, pt->io_regs);
> >> +
> >> +	for (i = 0; i < max_hw_q; i++) {
> >> +		ae4cmd_q = &ae4->ae4cmd_q[i];
> >> +		ae4cmd_q->id = ae4->cmd_q_count;
> >> +		ae4->cmd_q_count++;
> >> +
> >> +		cmd_q = &ae4cmd_q->cmd_q;
> >> +		cmd_q->pt = pt;
> >> +
> >> +		/* Preset some register values (Q size is 32byte (0x20)) */
> >> +		cmd_q->reg_control = pt->io_regs + ((i + 1) * 0x20);
> >> +
> >> +		ret = devm_request_irq(dev, ae4->ae4_irq[i], ae4_core_irq_handler, 0,
> >> +				       dev_name(pt->dev), ae4cmd_q);
> >> +		if (ret)
> >> +			return ret;
> >> +
> >> +		cmd_q->qsize = Q_SIZE(sizeof(struct ae4dma_desc));
> >> +
> >> +		cmd_q->qbase = dmam_alloc_coherent(dev, cmd_q->qsize, &cmd_q->qbase_dma,
> >> +						   GFP_KERNEL);
> >> +		if (!cmd_q->qbase)
> >> +			return -ENOMEM;
> >> +	}
> >> +
> >> +	for (i = 0; i < ae4->cmd_q_count; i++) {
> >> +		ae4cmd_q = &ae4->ae4cmd_q[i];
> >> +
> >> +		cmd_q = &ae4cmd_q->cmd_q;
> >> +
> >> +		/* Preset some register values (Q size is 32byte (0x20)) */
> >> +		cmd_q->reg_control = pt->io_regs + ((i + 1) * 0x20);
> >> +
> >> +		/* Update the device registers with queue information. */
> >> +		writel(CMD_Q_LEN, cmd_q->reg_control + 0x08);
> >> +
> >> +		cmd_q->qdma_tail = cmd_q->qbase_dma;
> >> +		writel(lower_32_bits(cmd_q->qdma_tail), cmd_q->reg_control + 0x18);
> >> +		writel(upper_32_bits(cmd_q->qdma_tail), cmd_q->reg_control + 0x1C);
> >> +
> >> +		INIT_LIST_HEAD(&ae4cmd_q->cmd);
> >> +		init_waitqueue_head(&ae4cmd_q->q_w);
> >> +
> >> +		ae4cmd_q->pws = alloc_ordered_workqueue("ae4dma_%d", WQ_MEM_RECLAIM, ae4cmd_q->id);
> > Can an existing workqueue match your requirement?
> 
> Compared to a shared existing workqueue, a separate workqueue per hardware
> queue improves performance by enabling load balancing across queues, guarantees
> DMA command execution even under memory pressure (WQ_MEM_RECLAIM), and
> maintains strict isolation between the tasks of different queues.
> 
> >
> > Frank
> >
> >> +		if (!ae4cmd_q->pws) {
> >> +			ae4_destroy_work(ae4);
> >> +			return -ENOMEM;
> >> +		}
> >> +		INIT_DELAYED_WORK(&ae4cmd_q->p_work, ae4_pending_work);
> >> +		queue_delayed_work(ae4cmd_q->pws, &ae4cmd_q->p_work,  usecs_to_jiffies(100));
> >> +
> >> +		init_completion(&ae4cmd_q->cmp);
> >> +	}
> >> +
> >> +	return ret;
> >> +}
> >> diff --git a/drivers/dma/amd/ae4dma/ae4dma-pci.c b/drivers/dma/amd/ae4dma/ae4dma-pci.c
> >> new file mode 100644
> >> index 000000000000..4cd537af757d
> >> --- /dev/null
> >> +++ b/drivers/dma/amd/ae4dma/ae4dma-pci.c
> >> @@ -0,0 +1,195 @@
> >> +// SPDX-License-Identifier: GPL-2.0
> >> +/*
> >> + * AMD AE4DMA driver
> >> + *
> >> + * Copyright (c) 2024, Advanced Micro Devices, Inc.
> >> + * All Rights Reserved.
> >> + *
> >> + * Author: Basavaraj Natikar <Basavaraj.Natikar@amd.com>
> >> + */
> >> +
> >> +#include "ae4dma.h"
> >> +
> >> +static int ae4_get_msi_irq(struct ae4_device *ae4)
> >> +{
> >> +	struct pt_device *pt = &ae4->pt;
> >> +	struct device *dev = pt->dev;
> >> +	struct pci_dev *pdev;
> >> +	int ret, i;
> >> +
> >> +	pdev = to_pci_dev(dev);
> >> +	ret = pci_enable_msi(pdev);
> >> +	if (ret)
> >> +		return ret;
> >> +
> >> +	for (i = 0; i < MAX_AE4_HW_QUEUES; i++)
> >> +		ae4->ae4_irq[i] = pdev->irq;
> >> +
> >> +	return 0;
> >> +}
> >> +
> >> +static int ae4_get_msix_irqs(struct ae4_device *ae4)
> >> +{
> >> +	struct ae4_msix *ae4_msix = ae4->ae4_msix;
> >> +	struct pt_device *pt = &ae4->pt;
> >> +	struct device *dev = pt->dev;
> >> +	struct pci_dev *pdev;
> >> +	int v, i, ret;
> >> +
> >> +	pdev = to_pci_dev(dev);
> >> +
> >> +	for (v = 0; v < ARRAY_SIZE(ae4_msix->msix_entry); v++)
> >> +		ae4_msix->msix_entry[v].entry = v;
> >> +
> >> +	ret = pci_enable_msix_range(pdev, ae4_msix->msix_entry, 1, v);
> >> +	if (ret < 0)
> >> +		return ret;
> >> +
> >> +	ae4_msix->msix_count = ret;
> >> +
> >> +	for (i = 0; i < MAX_AE4_HW_QUEUES; i++)
> >> +		ae4->ae4_irq[i] = ae4_msix->msix_entry[i].vector;
> >> +
> >> +	return 0;
> >> +}
> >> +
> >> +static int ae4_get_irqs(struct ae4_device *ae4)
> >> +{
> >> +	struct pt_device *pt = &ae4->pt;
> >> +	struct device *dev = pt->dev;
> >> +	int ret;
> >> +
> >> +	ret = ae4_get_msix_irqs(ae4);
> >> +	if (!ret)
> >> +		return 0;
> >> +
> >> +	/* Couldn't get MSI-X vectors, try MSI */
> >> +	dev_err(dev, "could not enable MSI-X (%d), trying MSI\n", ret);
> >> +	ret = ae4_get_msi_irq(ae4);
> >> +	if (!ret)
> >> +		return 0;
> >> +
> >> +	/* Couldn't get MSI interrupt */
> >> +	dev_err(dev, "could not enable MSI (%d)\n", ret);
> >> +
> >> +	return ret;
> >> +}
> >> +
> >> +static void ae4_free_irqs(struct ae4_device *ae4)
> >> +{
> >> +	struct ae4_msix *ae4_msix;
> >> +	struct pci_dev *pdev;
> >> +	struct pt_device *pt;
> >> +	struct device *dev;
> >> +	int i;
> >> +
> >> +	if (ae4) {
> >> +		pt = &ae4->pt;
> >> +		dev = pt->dev;
> >> +		pdev = to_pci_dev(dev);
> >> +
> >> +		ae4_msix = ae4->ae4_msix;
> >> +		if (ae4_msix && ae4_msix->msix_count)
> >> +			pci_disable_msix(pdev);
> >> +		else if (pdev->irq)
> >> +			pci_disable_msi(pdev);
> >> +
> >> +		for (i = 0; i < MAX_AE4_HW_QUEUES; i++)
> >> +			ae4->ae4_irq[i] = 0;
> >> +	}
> >> +}
> >> +
> >> +static void ae4_deinit(struct ae4_device *ae4)
> >> +{
> >> +	ae4_free_irqs(ae4);
> >> +}
> >> +
> >> +static int ae4_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> >> +{
> >> +	struct device *dev = &pdev->dev;
> >> +	struct ae4_device *ae4;
> >> +	struct pt_device *pt;
> >> +	int bar_mask;
> >> +	int ret = 0;
> >> +
> >> +	ae4 = devm_kzalloc(dev, sizeof(*ae4), GFP_KERNEL);
> >> +	if (!ae4)
> >> +		return -ENOMEM;
> >> +
> >> +	ae4->ae4_msix = devm_kzalloc(dev, sizeof(struct ae4_msix), GFP_KERNEL);
> >> +	if (!ae4->ae4_msix)
> >> +		return -ENOMEM;
> >> +
> >> +	ret = pcim_enable_device(pdev);
> >> +	if (ret)
> >> +		goto ae4_error;
> >> +
> >> +	bar_mask = pci_select_bars(pdev, IORESOURCE_MEM);
> >> +	ret = pcim_iomap_regions(pdev, bar_mask, "ae4dma");
> >> +	if (ret)
> >> +		goto ae4_error;
> >> +
> >> +	pt = &ae4->pt;
> >> +	pt->dev = dev;
> >> +
> >> +	pt->io_regs = pcim_iomap_table(pdev)[0];
> >> +	if (!pt->io_regs) {
> >> +		ret = -ENOMEM;
> >> +		goto ae4_error;
> >> +	}
> >> +
> >> +	ret = ae4_get_irqs(ae4);
> >> +	if (ret)
> >> +		goto ae4_error;
> >> +
> >> +	pci_set_master(pdev);
> >> +
> >> +	ret = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(48));
> >> +	if (ret) {
> >> +		ret = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32));
> >> +		if (ret)
> >> +			goto ae4_error;
> >> +	}
> > There is no need to fall back to 32-bit: dma_set_mask_and_coherent() never
> > fails for masks of 32 bits or more.
> >
> > Detail see: 
> > https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=f7ae20f2fc4e6a5e32f43c4fa2acab3281a61c81
> >
> > if (support_48bit)
> > 	dma_set_mask_and_coherent(dev, DMA_BIT_MASK(48))
> > else
> > 	dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32))
> >
> > You can decide support_48bit from a hardware register or from the PCI
> > vendor/device ID.
> 
> Sure, I will keep only the single call dma_set_mask_and_coherent(dev, DMA_BIT_MASK(48)).
> 
> >
> >
> >> +
> >> +	dev_set_drvdata(dev, ae4);
> >> +
> >> +	ret = ae4_core_init(ae4);
> >> +	if (ret)
> >> +		goto ae4_error;
> >> +
> >> +	return 0;
> >> +
> >> +ae4_error:
> >> +	ae4_deinit(ae4);
> >> +
> >> +	return ret;
> >> +}
> >> +
> >> +static void ae4_pci_remove(struct pci_dev *pdev)
> >> +{
> >> +	struct ae4_device *ae4 = dev_get_drvdata(&pdev->dev);
> >> +
> >> +	ae4_destroy_work(ae4);
> >> +	ae4_deinit(ae4);
> >> +}
> >> +
> >> +static const struct pci_device_id ae4_pci_table[] = {
> >> +	{ PCI_VDEVICE(AMD, 0x14C8), },
> >> +	{ PCI_VDEVICE(AMD, 0x14DC), },
> >> +	{ PCI_VDEVICE(AMD, 0x149B), },
> >> +	/* Last entry must be zero */
> >> +	{ 0, }
> >> +};
> >> +MODULE_DEVICE_TABLE(pci, ae4_pci_table);
> >> +
> >> +static struct pci_driver ae4_pci_driver = {
> >> +	.name = "ae4dma",
> >> +	.id_table = ae4_pci_table,
> >> +	.probe = ae4_pci_probe,
> >> +	.remove = ae4_pci_remove,
> >> +};
> >> +
> >> +module_pci_driver(ae4_pci_driver);
> >> +
> >> +MODULE_LICENSE("GPL");
> >> +MODULE_DESCRIPTION("AMD AE4DMA driver");
> >> diff --git a/drivers/dma/amd/ae4dma/ae4dma.h b/drivers/dma/amd/ae4dma/ae4dma.h
> >> new file mode 100644
> >> index 000000000000..24b1253ad570
> >> --- /dev/null
> >> +++ b/drivers/dma/amd/ae4dma/ae4dma.h
> >> @@ -0,0 +1,77 @@
> >> +/* SPDX-License-Identifier: GPL-2.0 */
> >> +/*
> >> + * AMD AE4DMA driver
> >> + *
> >> + * Copyright (c) 2024, Advanced Micro Devices, Inc.
> >> + * All Rights Reserved.
> >> + *
> >> + * Author: Basavaraj Natikar <Basavaraj.Natikar@amd.com>
> >> + */
> >> +#ifndef __AE4DMA_H__
> >> +#define __AE4DMA_H__
> >> +
> >> +#include "../common/amd_dma.h"
> >> +
> >> +#define MAX_AE4_HW_QUEUES		16
> >> +
> >> +#define AE4_DESC_COMPLETED		0x3
> >> +
> >> +struct ae4_msix {
> >> +	int msix_count;
> >> +	struct msix_entry msix_entry[MAX_AE4_HW_QUEUES];
> >> +};
> >> +
> >> +struct ae4_cmd_queue {
> >> +	struct ae4_device *ae4;
> >> +	struct pt_cmd_queue cmd_q;
> >> +	struct list_head cmd;
> >> +	/* protect command operations */
> >> +	struct mutex cmd_lock;
> >> +	struct delayed_work p_work;
> >> +	struct workqueue_struct *pws;
> >> +	struct completion cmp;
> >> +	wait_queue_head_t q_w;
> >> +	atomic64_t intr_cnt;
> >> +	atomic64_t done_cnt;
> >> +	atomic64_t q_cmd_count;
> >> +	atomic_t dridx;
> >> +	unsigned int id;
> >> +};
> >> +
> >> +union dwou {
> >> +	u32 dw0;
> >> +	struct dword0 {
> >> +	u8	byte0;
> >> +	u8	byte1;
> >> +	u16	timestamp;
> >> +	} dws;
> >> +};
> >> +
> >> +struct dword1 {
> >> +	u8	status;
> >> +	u8	err_code;
> >> +	u16	desc_id;
> >> +};
> >> +
> >> +struct ae4dma_desc {
> >> +	union dwou dwouv;
> >> +	struct dword1 dw1;
> >> +	u32 length;
> >> +	u32 rsvd;
> >> +	u32 src_hi;
> >> +	u32 src_lo;
> >> +	u32 dst_hi;
> >> +	u32 dst_lo;
> >> +};
> >> +
> >> +struct ae4_device {
> >> +	struct pt_device pt;
> >> +	struct ae4_msix *ae4_msix;
> >> +	struct ae4_cmd_queue ae4cmd_q[MAX_AE4_HW_QUEUES];
> >> +	unsigned int ae4_irq[MAX_AE4_HW_QUEUES];
> >> +	unsigned int cmd_q_count;
> >> +};
> >> +
> >> +int ae4_core_init(struct ae4_device *ae4);
> >> +void ae4_destroy_work(struct ae4_device *ae4);
> >> +#endif
> >> diff --git a/drivers/dma/amd/common/amd_dma.h b/drivers/dma/amd/common/amd_dma.h
> >> new file mode 100644
> >> index 000000000000..31c35b3bc94b
> >> --- /dev/null
> >> +++ b/drivers/dma/amd/common/amd_dma.h
> >> @@ -0,0 +1,26 @@
> >> +/* SPDX-License-Identifier: GPL-2.0 */
> >> +/*
> >> + * AMD DMA Driver common
> >> + *
> >> + * Copyright (c) 2024, Advanced Micro Devices, Inc.
> >> + * All Rights Reserved.
> >> + *
> >> + * Author: Basavaraj Natikar <Basavaraj.Natikar@amd.com>
> >> + */
> >> +
> >> +#ifndef AMD_DMA_H
> >> +#define AMD_DMA_H
> >> +
> >> +#include <linux/device.h>
> >> +#include <linux/dmaengine.h>
> >> +#include <linux/pci.h>
> >> +#include <linux/spinlock.h>
> >> +#include <linux/mutex.h>
> >> +#include <linux/list.h>
> >> +#include <linux/wait.h>
> >> +#include <linux/dmapool.h>
> > Order these alphabetically.
> 
> Sure, I will change it accordingly.
> 
> Thanks,
> --
> Basavaraj
> 
> >
> >> +
> >> +#include "../ptdma/ptdma.h"
> >> +#include "../../virt-dma.h"
> >> +
> >> +#endif
> >> -- 
> >> 2.25.1
> >>
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2024-05-23 22:02 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-05-10  8:20 [PATCH 0/7] Add support of AMD AE4DMA DMA Engine Basavaraj Natikar
2024-05-10  8:20 ` [PATCH 1/7] dmaengine: Move AMD DMA driver to separate directory Basavaraj Natikar
2024-05-10  8:20 ` [PATCH 2/7] dmaengine: ae4dma: Add AMD ae4dma controller driver Basavaraj Natikar
2024-05-10 18:16   ` Frank Li
2024-05-21  9:36     ` Basavaraj Natikar
2024-05-23 22:02       ` Frank Li
2024-05-10  8:20 ` [PATCH 3/7] dmaengine: ptdma: Move common functions to common code Basavaraj Natikar
2024-05-10  8:20 ` [PATCH 4/7] dmaengine: ptdma: Extend ptdma to support multi-channel and version Basavaraj Natikar
2024-05-11 10:54   ` kernel test robot
2024-05-10  8:20 ` [PATCH 5/7] dmaengine: ae4dma: Register AE4DMA using pt_dmaengine_register Basavaraj Natikar
2024-05-10  8:20 ` [PATCH 6/7] dmaengine: ptdma: Extend ptdma-debugfs to support multi-queue Basavaraj Natikar
2024-05-10  8:20 ` [PATCH 7/7] dmaengine: ae4dma: Register debugfs using ptdma_debugfs_setup Basavaraj Natikar
