dmaengine Archive on lore.kernel.org
* [PATCH v3 0/6] dma: Add Xilinx ZynqMP DPDMA driver
@ 2020-01-23  2:29 Laurent Pinchart
  2020-01-23  2:29 ` [PATCH v3 1/6] dt: bindings: dma: xilinx: dpdma: DT bindings for Xilinx DPDMA Laurent Pinchart
                   ` (5 more replies)
  0 siblings, 6 replies; 27+ messages in thread
From: Laurent Pinchart @ 2020-01-23  2:29 UTC (permalink / raw)
  To: dmaengine
  Cc: Michal Simek, Hyun Kwon, Tejas Upadhyay, Satish Kumar Nagireddy,
	Vinod Koul

Hello,

This patch series adds a new driver for the DPDMA engine found in the
Xilinx ZynqMP.

The previous version can be found at [1]. All review comments have been
taken into account. The most notable changes are:

- Introduction of a new DMA transfer type that combines interleaved and
  cyclic transfers (patch 2/6, suggested by Vinod)

- Switch to virt-dma (including a drive-by lockdep addition to virt-dma
  in patch 3/6)

- Removal of all non-interleaved, non-cyclic transfer types, as I have
  currently no way to test them given how the IP core is integrated in
  the hardware. Support for non-interleaved cyclic transfers may be
  added later for audio.

The driver has been successfully tested with the ZynqMP DisplayPort
subsystem DRM driver.

Vinod, please let me know if you would like authorship of patch 2/6 to
be assigned to you, in which case I will need your SoB line.

[1] https://lore.kernel.org/dmaengine/20191107021400.16474-1-laurent.pinchart@ideasonboard.com/

Hyun Kwon (1):
  dmaengine: xilinx: dpdma: Add the Xilinx DisplayPort DMA engine driver

Laurent Pinchart (5):
  dt: bindings: dma: xilinx: dpdma: DT bindings for Xilinx DPDMA
  dmaengine: Add interleaved cyclic transaction type
  dmaengine: virt-dma: Use lockdep to check locking requirements
  dmaengine: xilinx: dpdma: Add debugfs support
  arm64: dts: zynqmp: Add DPDMA node

 .../dma/xilinx/xlnx,zynqmp-dpdma.yaml         |   68 +
 MAINTAINERS                                   |    9 +
 arch/arm64/boot/dts/xilinx/zynqmp-clk.dtsi    |    4 +
 arch/arm64/boot/dts/xilinx/zynqmp.dtsi        |   10 +
 drivers/dma/Kconfig                           |   10 +
 drivers/dma/dmaengine.c                       |    8 +-
 drivers/dma/virt-dma.c                        |    2 +
 drivers/dma/virt-dma.h                        |   14 +
 drivers/dma/xilinx/Makefile                   |    1 +
 drivers/dma/xilinx/xilinx_dpdma.c             | 1754 +++++++++++++++++
 include/dt-bindings/dma/xlnx-zynqmp-dpdma.h   |   16 +
 include/linux/dmaengine.h                     |   18 +
 12 files changed, 1913 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/devicetree/bindings/dma/xilinx/xlnx,zynqmp-dpdma.yaml
 create mode 100644 drivers/dma/xilinx/xilinx_dpdma.c
 create mode 100644 include/dt-bindings/dma/xlnx-zynqmp-dpdma.h


base-commit: d1eef1c619749b2a57e514a3fa67d9a516ffa919
-- 
Regards,

Laurent Pinchart



* [PATCH v3 1/6] dt: bindings: dma: xilinx: dpdma: DT bindings for Xilinx DPDMA
  2020-01-23  2:29 [PATCH v3 0/6] dma: Add Xilinx ZynqMP DPDMA driver Laurent Pinchart
@ 2020-01-23  2:29 ` Laurent Pinchart
  2020-01-23  2:29 ` [PATCH v3 2/6] dmaengine: Add interleaved cyclic transaction type Laurent Pinchart
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 27+ messages in thread
From: Laurent Pinchart @ 2020-01-23  2:29 UTC (permalink / raw)
  To: dmaengine
  Cc: Michal Simek, Hyun Kwon, Tejas Upadhyay, Satish Kumar Nagireddy,
	devicetree

The ZynqMP includes the DisplayPort subsystem with its own DMA engine
called DPDMA. The DPDMA IP comes with 6 individual channels
(4 for display, 2 for audio). Document the DT bindings for the DPDMA.

Signed-off-by: Hyun Kwon <hyun.kwon@xilinx.com>
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Reviewed-by: Rob Herring <robh@kernel.org>
---
Changes since v2:

- Fix id URL
- Fix path to dma-controller.yaml
- Update license to GPL-2.0-only OR BSD-2-Clause

Changes since v1:

- Convert the DT bindings to YAML
- Drop the DT child nodes
---
 .../dma/xilinx/xlnx,zynqmp-dpdma.yaml         | 68 +++++++++++++++++++
 MAINTAINERS                                   |  8 +++
 include/dt-bindings/dma/xlnx-zynqmp-dpdma.h   | 16 +++++
 3 files changed, 92 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/dma/xilinx/xlnx,zynqmp-dpdma.yaml
 create mode 100644 include/dt-bindings/dma/xlnx-zynqmp-dpdma.h

diff --git a/Documentation/devicetree/bindings/dma/xilinx/xlnx,zynqmp-dpdma.yaml b/Documentation/devicetree/bindings/dma/xilinx/xlnx,zynqmp-dpdma.yaml
new file mode 100644
index 000000000000..5de510f8c88c
--- /dev/null
+++ b/Documentation/devicetree/bindings/dma/xilinx/xlnx,zynqmp-dpdma.yaml
@@ -0,0 +1,68 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/dma/xilinx/xlnx,zynqmp-dpdma.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Xilinx ZynqMP DisplayPort DMA Controller Device Tree Bindings
+
+description: |
+  These bindings describe the DMA engine included in the Xilinx ZynqMP
+  DisplayPort Subsystem. The DMA engine supports up to 6 DMA channels (3
+  channels for a video stream, 1 channel for a graphics stream, and 2 channels
+  for an audio stream).
+
+maintainers:
+  - Laurent Pinchart <laurent.pinchart@ideasonboard.com>
+
+allOf:
+  - $ref: "../dma-controller.yaml#"
+
+properties:
+  "#dma-cells":
+    const: 1
+    description: |
+      The cell is the DMA channel ID (see dt-bindings/dma/xlnx-zynqmp-dpdma.h
+      for a list of channel IDs).
+
+  compatible:
+    const: xlnx,zynqmp-dpdma
+
+  reg:
+    maxItems: 1
+
+  interrupts:
+    maxItems: 1
+
+  clocks:
+    description: The AXI clock
+    maxItems: 1
+
+  clock-names:
+    const: axi_clk
+
+required:
+  - "#dma-cells"
+  - compatible
+  - reg
+  - interrupts
+  - clocks
+  - clock-names
+
+additionalProperties: false
+
+examples:
+  - |
+    #include <dt-bindings/interrupt-controller/arm-gic.h>
+
+    dma: dma-controller@fd4c0000 {
+      compatible = "xlnx,zynqmp-dpdma";
+      reg = <0x0 0xfd4c0000 0x0 0x1000>;
+      interrupts = <GIC_SPI 122 IRQ_TYPE_LEVEL_HIGH>;
+      interrupt-parent = <&gic>;
+      clocks = <&dpdma_clk>;
+      clock-names = "axi_clk";
+      #dma-cells = <1>;
+    };
+
+...
diff --git a/MAINTAINERS b/MAINTAINERS
index cc0a4a8ae06a..c7a011837102 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -18182,6 +18182,14 @@ F:	drivers/misc/Kconfig
 F:	drivers/misc/Makefile
 F:	include/uapi/misc/xilinx_sdfec.h
 
+XILINX ZYNQMP DPDMA DRIVER
+M:	Hyun Kwon <hyun.kwon@xilinx.com>
+M:	Laurent Pinchart <laurent.pinchart@ideasonboard.com>
+L:	dmaengine@vger.kernel.org
+S:	Supported
+F:	Documentation/devicetree/bindings/dma/xilinx/xlnx,zynqmp-dpdma.yaml
+F:	include/dt-bindings/dma/xlnx-zynqmp-dpdma.h
+
 XILLYBUS DRIVER
 M:	Eli Billauer <eli.billauer@gmail.com>
 L:	linux-kernel@vger.kernel.org
diff --git a/include/dt-bindings/dma/xlnx-zynqmp-dpdma.h b/include/dt-bindings/dma/xlnx-zynqmp-dpdma.h
new file mode 100644
index 000000000000..3719cda5679d
--- /dev/null
+++ b/include/dt-bindings/dma/xlnx-zynqmp-dpdma.h
@@ -0,0 +1,16 @@
+/* SPDX-License-Identifier: (GPL-2.0 OR MIT) */
+/*
+ * Copyright 2019 Laurent Pinchart <laurent.pinchart@ideasonboard.com>
+ */
+
+#ifndef __DT_BINDINGS_DMA_XLNX_ZYNQMP_DPDMA_H__
+#define __DT_BINDINGS_DMA_XLNX_ZYNQMP_DPDMA_H__
+
+#define ZYNQMP_DPDMA_VIDEO0		0
+#define ZYNQMP_DPDMA_VIDEO1		1
+#define ZYNQMP_DPDMA_VIDEO2		2
+#define ZYNQMP_DPDMA_GRAPHICS		3
+#define ZYNQMP_DPDMA_AUDIO0		4
+#define ZYNQMP_DPDMA_AUDIO1		5
+
+#endif /* __DT_BINDINGS_DMA_XLNX_ZYNQMP_DPDMA_H__ */
-- 
Regards,

Laurent Pinchart



* [PATCH v3 2/6] dmaengine: Add interleaved cyclic transaction type
  2020-01-23  2:29 [PATCH v3 0/6] dma: Add Xilinx ZynqMP DPDMA driver Laurent Pinchart
  2020-01-23  2:29 ` [PATCH v3 1/6] dt: bindings: dma: xilinx: dpdma: DT bindings for Xilinx DPDMA Laurent Pinchart
@ 2020-01-23  2:29 ` Laurent Pinchart
  2020-01-23  8:03   ` Peter Ujfalusi
  2020-01-23  2:29 ` [PATCH v3 3/6] dmaengine: virt-dma: Use lockdep to check locking requirements Laurent Pinchart
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 27+ messages in thread
From: Laurent Pinchart @ 2020-01-23  2:29 UTC (permalink / raw)
  To: dmaengine
  Cc: Michal Simek, Hyun Kwon, Tejas Upadhyay, Satish Kumar Nagireddy,
	Vinod Koul

The new interleaved cyclic transaction type combines interleaved and
cyclic transactions. It is designed for DMA engines that back display
controllers, where the same 2D frame needs to be output to the display
until a new frame is available.
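
A rough usage sketch from the point of view of a display driver (the
channel, frame buffer address and line geometry below are placeholders,
not part of this patch):

	struct dma_interleaved_template *xt;
	struct dma_async_tx_descriptor *tx;

	xt = kzalloc(struct_size(xt, sgl, 1), GFP_KERNEL);
	if (!xt)
		return -ENOMEM;

	xt->dir = DMA_MEM_TO_DEV;
	xt->src_start = fb_dma_addr;		/* frame buffer DMA address */
	xt->src_sgl = true;
	xt->src_inc = true;
	xt->numf = height;			/* lines per frame */
	xt->frame_size = 1;
	xt->sgl[0].size = width * cpp;		/* bytes per line */
	xt->sgl[0].icg = stride - width * cpp;	/* gap between lines */

	tx = dmaengine_prep_interleaved_cyclic(chan, xt, DMA_PREP_INTERRUPT);
	kfree(xt);
	if (!tx)
		return -ENOMEM;

	dmaengine_submit(tx);
	dma_async_issue_pending(chan);

The engine then repeats the same frame until the consumer issues a new
descriptor or terminates the channel.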

Suggested-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
---
 drivers/dma/dmaengine.c   |  8 +++++++-
 include/linux/dmaengine.h | 18 ++++++++++++++++++
 2 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c
index 03ac4b96117c..4ffb98a47f31 100644
--- a/drivers/dma/dmaengine.c
+++ b/drivers/dma/dmaengine.c
@@ -981,7 +981,13 @@ int dma_async_device_register(struct dma_device *device)
 			"DMA_INTERLEAVE");
 		return -EIO;
 	}
-
+	if (dma_has_cap(DMA_INTERLEAVE_CYCLIC, device->cap_mask) &&
+	    !device->device_prep_interleaved_cyclic) {
+		dev_err(device->dev,
+			"Device claims capability %s, but op is not defined\n",
+			"DMA_INTERLEAVE_CYCLIC");
+		return -EIO;
+	}
 
 	if (!device->device_tx_status) {
 		dev_err(device->dev, "Device tx_status is not defined\n");
diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
index 8fcdee1c0cf9..e9af3bf835cb 100644
--- a/include/linux/dmaengine.h
+++ b/include/linux/dmaengine.h
@@ -61,6 +61,7 @@ enum dma_transaction_type {
 	DMA_SLAVE,
 	DMA_CYCLIC,
 	DMA_INTERLEAVE,
+	DMA_INTERLEAVE_CYCLIC,
 /* last transaction type for creation of the capabilities mask */
 	DMA_TX_TYPE_END,
 };
@@ -701,6 +702,10 @@ struct dma_filter {
  *	The function takes a buffer of size buf_len. The callback function will
  *	be called after period_len bytes have been transferred.
  * @device_prep_interleaved_dma: Transfer expression in a generic way.
+ * @device_prep_interleaved_cyclic: prepares an interleaved cyclic transfer.
+ *	This is similar to @device_prep_interleaved_dma, but the transfer is
+ *	repeated until a new transfer is issued. This transfer type is meant
+ *	for display.
  * @device_prep_dma_imm_data: DMA's 8 byte immediate data to the dst address
  * @device_config: Pushes a new configuration to a channel, return 0 or an error
  *	code
@@ -785,6 +790,9 @@ struct dma_device {
 	struct dma_async_tx_descriptor *(*device_prep_interleaved_dma)(
 		struct dma_chan *chan, struct dma_interleaved_template *xt,
 		unsigned long flags);
+	struct dma_async_tx_descriptor *(*device_prep_interleaved_cyclic)(
+		struct dma_chan *chan, struct dma_interleaved_template *xt,
+		unsigned long flags);
 	struct dma_async_tx_descriptor *(*device_prep_dma_imm_data)(
 		struct dma_chan *chan, dma_addr_t dst, u64 data,
 		unsigned long flags);
@@ -880,6 +888,16 @@ static inline struct dma_async_tx_descriptor *dmaengine_prep_interleaved_dma(
 	return chan->device->device_prep_interleaved_dma(chan, xt, flags);
 }
 
+static inline struct dma_async_tx_descriptor *dmaengine_prep_interleaved_cyclic(
+		struct dma_chan *chan, struct dma_interleaved_template *xt,
+		unsigned long flags)
+{
+	if (!chan || !chan->device || !chan->device->device_prep_interleaved_cyclic)
+		return NULL;
+
+	return chan->device->device_prep_interleaved_cyclic(chan, xt, flags);
+}
+
 static inline struct dma_async_tx_descriptor *dmaengine_prep_dma_memset(
 		struct dma_chan *chan, dma_addr_t dest, int value, size_t len,
 		unsigned long flags)
-- 
Regards,

Laurent Pinchart



* [PATCH v3 3/6] dmaengine: virt-dma: Use lockdep to check locking requirements
  2020-01-23  2:29 [PATCH v3 0/6] dma: Add Xilinx ZynqMP DPDMA driver Laurent Pinchart
  2020-01-23  2:29 ` [PATCH v3 1/6] dt: bindings: dma: xilinx: dpdma: DT bindings for Xilinx DPDMA Laurent Pinchart
  2020-01-23  2:29 ` [PATCH v3 2/6] dmaengine: Add interleaved cyclic transaction type Laurent Pinchart
@ 2020-01-23  2:29 ` Laurent Pinchart
  2020-01-23  2:29 ` [PATCH v3 4/6] dmaengine: xilinx: dpdma: Add the Xilinx DisplayPort DMA engine driver Laurent Pinchart
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 27+ messages in thread
From: Laurent Pinchart @ 2020-01-23  2:29 UTC (permalink / raw)
  To: dmaengine; +Cc: Michal Simek, Hyun Kwon, Tejas Upadhyay, Satish Kumar Nagireddy

A few virt-dma functions are documented as requiring the vc.lock to be
held by the caller. Check this with lockdep.

The vchan_vdesc_fini() and vchan_find_desc() functions gain a lockdep
check as well, because, even though they are not documented with this
requirement (and not documented at all for the latter), they touch
fields documented as protected by vc.lock. All callers have been
manually inspected to verify they call the functions with the lock held.
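
With lockdep enabled, a caller that uses one of these helpers without
holding the lock now gets a splat. The expected pattern (start_transfer()
standing in for whatever the driver uses to kick the hardware) is:

	spin_lock_irqsave(&vc->lock, flags);
	if (vchan_issue_pending(vc))
		start_transfer(vc);
	spin_unlock_irqrestore(&vc->lock, flags);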

Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
---
 drivers/dma/virt-dma.c |  2 ++
 drivers/dma/virt-dma.h | 14 ++++++++++++++
 2 files changed, 16 insertions(+)

diff --git a/drivers/dma/virt-dma.c b/drivers/dma/virt-dma.c
index ec4adf4260a0..9b59bc1c6a55 100644
--- a/drivers/dma/virt-dma.c
+++ b/drivers/dma/virt-dma.c
@@ -68,6 +68,8 @@ struct virt_dma_desc *vchan_find_desc(struct virt_dma_chan *vc,
 {
 	struct virt_dma_desc *vd;
 
+	lockdep_assert_held(&vc->lock);
+
 	list_for_each_entry(vd, &vc->desc_issued, node)
 		if (vd->tx.cookie == cookie)
 			return vd;
diff --git a/drivers/dma/virt-dma.h b/drivers/dma/virt-dma.h
index ab158bac03a7..942493e36666 100644
--- a/drivers/dma/virt-dma.h
+++ b/drivers/dma/virt-dma.h
@@ -81,6 +81,8 @@ static inline struct dma_async_tx_descriptor *vchan_tx_prep(struct virt_dma_chan
  */
 static inline bool vchan_issue_pending(struct virt_dma_chan *vc)
 {
+	lockdep_assert_held(&vc->lock);
+
 	list_splice_tail_init(&vc->desc_submitted, &vc->desc_issued);
 	return !list_empty(&vc->desc_issued);
 }
@@ -96,6 +98,8 @@ static inline void vchan_cookie_complete(struct virt_dma_desc *vd)
 	struct virt_dma_chan *vc = to_virt_chan(vd->tx.chan);
 	dma_cookie_t cookie;
 
+	lockdep_assert_held(&vc->lock);
+
 	cookie = vd->tx.cookie;
 	dma_cookie_complete(&vd->tx);
 	dev_vdbg(vc->chan.device->dev, "txd %p[%x]: marked complete\n",
@@ -108,11 +112,15 @@ static inline void vchan_cookie_complete(struct virt_dma_desc *vd)
 /**
  * vchan_vdesc_fini - Free or reuse a descriptor
  * @vd: virtual descriptor to free/reuse
+ *
+ * vc.lock must be held by caller
  */
 static inline void vchan_vdesc_fini(struct virt_dma_desc *vd)
 {
 	struct virt_dma_chan *vc = to_virt_chan(vd->tx.chan);
 
+	lockdep_assert_held(&vc->lock);
+
 	if (dmaengine_desc_test_reuse(&vd->tx))
 		list_add(&vd->node, &vc->desc_allocated);
 	else
@@ -141,6 +149,8 @@ static inline void vchan_terminate_vdesc(struct virt_dma_desc *vd)
 {
 	struct virt_dma_chan *vc = to_virt_chan(vd->tx.chan);
 
+	lockdep_assert_held(&vc->lock);
+
 	/* free up stuck descriptor */
 	if (vc->vd_terminated)
 		vchan_vdesc_fini(vc->vd_terminated);
@@ -158,6 +168,8 @@ static inline void vchan_terminate_vdesc(struct virt_dma_desc *vd)
  */
 static inline struct virt_dma_desc *vchan_next_desc(struct virt_dma_chan *vc)
 {
+	lockdep_assert_held(&vc->lock);
+
 	return list_first_entry_or_null(&vc->desc_issued,
 					struct virt_dma_desc, node);
 }
@@ -175,6 +187,8 @@ static inline struct virt_dma_desc *vchan_next_desc(struct virt_dma_chan *vc)
 static inline void vchan_get_all_descriptors(struct virt_dma_chan *vc,
 	struct list_head *head)
 {
+	lockdep_assert_held(&vc->lock);
+
 	list_splice_tail_init(&vc->desc_allocated, head);
 	list_splice_tail_init(&vc->desc_submitted, head);
 	list_splice_tail_init(&vc->desc_issued, head);
-- 
Regards,

Laurent Pinchart



* [PATCH v3 4/6] dmaengine: xilinx: dpdma: Add the Xilinx DisplayPort DMA engine driver
  2020-01-23  2:29 [PATCH v3 0/6] dma: Add Xilinx ZynqMP DPDMA driver Laurent Pinchart
                   ` (2 preceding siblings ...)
  2020-01-23  2:29 ` [PATCH v3 3/6] dmaengine: virt-dma: Use lockdep to check locking requirements Laurent Pinchart
@ 2020-01-23  2:29 ` Laurent Pinchart
  2020-01-23  2:29 ` [PATCH v3 5/6] dmaengine: xilinx: dpdma: Add debugfs support Laurent Pinchart
  2020-01-23  2:29 ` [PATCH v3 6/6] arm64: dts: zynqmp: Add DPDMA node Laurent Pinchart
  5 siblings, 0 replies; 27+ messages in thread
From: Laurent Pinchart @ 2020-01-23  2:29 UTC (permalink / raw)
  To: dmaengine; +Cc: Michal Simek, Hyun Kwon, Tejas Upadhyay, Satish Kumar Nagireddy

From: Hyun Kwon <hyun.kwon@xilinx.com>

The ZynqMP DisplayPort subsystem includes a DMA engine called DPDMA with
6 DMA channels (4 for display and 2 for audio). This driver exposes the
DPDMA through the dmaengine API, to be used by audio (ALSA) and display
(DRM) drivers for the DisplayPort subsystem.
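
For reference, a consumer would typically request a channel as follows
(the "vid0" name is only an example and depends on the consumer's
dma-names property):

	struct dma_chan *chan;

	chan = dma_request_chan(dev, "vid0");
	if (IS_ERR(chan))
		return PTR_ERR(chan);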

Signed-off-by: Hyun Kwon <hyun.kwon@xilinx.com>
Signed-off-by: Tejas Upadhyay <tejasu@xilinx.com>
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
---
Changes since v2:

- Switch to virt-dma
- Support interleaved cyclic transfers and nothing else
- Fix terminate_all behaviour (don't wait)
- Fix bug in extended address handling for hw desc
- Clean up video group handling
- Update driver name
- Use macros for bitfields
- Remove unneeded header
- Coding style and typo fixes

Changes since v1:

- Remove unneeded #include
- Drop enum xilinx_dpdma_chan_id
- Update compatible string
- Drop DT subnodes
- Replace XILINX_DPDMA_NUM_CHAN with ARRAY_SIZE(xdev->chan)
- Disable IRQ at remove() time
- Use devm_platform_ioremap_resource()
- Don't inline functions manually
- Add section headers
- Merge DMA engine implementation in their wrappers
- Rename xilinx_dpdma_sw_desc::phys to dma_addr
- Use GENMASK()
- Use FIELD_PREP/FIELD_GET
- Fix MSB handling in xilinx_dpdma_sw_desc_addr_64()
- Fix logic in xilinx_dpdma_chan_prep_slave_sg()
- Document why xilinx_dpdma_config() doesn't need to check most
  parameters
- Remove debugfs support
- Reschedule errored descriptor
- Align the line size to 128 bits
- SPDX header formatting
---
 MAINTAINERS                       |    1 +
 drivers/dma/Kconfig               |   10 +
 drivers/dma/xilinx/Makefile       |    1 +
 drivers/dma/xilinx/xilinx_dpdma.c | 1527 +++++++++++++++++++++++++++++
 4 files changed, 1539 insertions(+)
 create mode 100644 drivers/dma/xilinx/xilinx_dpdma.c

diff --git a/MAINTAINERS b/MAINTAINERS
index c7a011837102..cabe9d0417c2 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -18188,6 +18188,7 @@ M:	Laurent Pinchart <laurent.pinchart@ideasonboard.com>
 L:	dmaengine@vger.kernel.org
 S:	Supported
 F:	Documentation/devicetree/bindings/dma/xilinx/xlnx,zynqmp-dpdma.yaml
+F:	drivers/dma/xilinx/xilinx_dpdma.c
 F:	include/dt-bindings/dma/xlnx-zynqmp-dpdma.h
 
 XILLYBUS DRIVER
diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig
index 6fa1eba9d477..7f6a87161344 100644
--- a/drivers/dma/Kconfig
+++ b/drivers/dma/Kconfig
@@ -667,6 +667,16 @@ config XILINX_ZYNQMP_DMA
 	help
 	  Enable support for Xilinx ZynqMP DMA controller.
 
+config XILINX_ZYNQMP_DPDMA
+	tristate "Xilinx DPDMA Engine"
+	select DMA_ENGINE
+	select DMA_VIRTUAL_CHANNELS
+	help
+	  Enable support for Xilinx ZynqMP DisplayPort DMA. Choose this option
+	  if you have a Xilinx ZynqMP SoC with a DisplayPort subsystem. The
+	  driver provides the dmaengine required by the DisplayPort subsystem
+	  display driver.
+
 config ZX_DMA
 	tristate "ZTE ZX DMA support"
 	depends on ARCH_ZX || COMPILE_TEST
diff --git a/drivers/dma/xilinx/Makefile b/drivers/dma/xilinx/Makefile
index e921de575b55..767bb45f641f 100644
--- a/drivers/dma/xilinx/Makefile
+++ b/drivers/dma/xilinx/Makefile
@@ -1,3 +1,4 @@
 # SPDX-License-Identifier: GPL-2.0-only
 obj-$(CONFIG_XILINX_DMA) += xilinx_dma.o
 obj-$(CONFIG_XILINX_ZYNQMP_DMA) += zynqmp_dma.o
+obj-$(CONFIG_XILINX_ZYNQMP_DPDMA) += xilinx_dpdma.o
diff --git a/drivers/dma/xilinx/xilinx_dpdma.c b/drivers/dma/xilinx/xilinx_dpdma.c
new file mode 100644
index 000000000000..15ba85aa63d9
--- /dev/null
+++ b/drivers/dma/xilinx/xilinx_dpdma.c
@@ -0,0 +1,1527 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Xilinx ZynqMP DPDMA Engine driver
+ *
+ * Copyright (C) 2015 - 2019 Xilinx, Inc.
+ *
+ * Author: Hyun Woo Kwon <hyun.kwon@xilinx.com>
+ */
+
+#include <linux/bitfield.h>
+#include <linux/bits.h>
+#include <linux/clk.h>
+#include <linux/delay.h>
+#include <linux/dmaengine.h>
+#include <linux/dmapool.h>
+#include <linux/interrupt.h>
+#include <linux/module.h>
+#include <linux/of.h>
+#include <linux/of_dma.h>
+#include <linux/platform_device.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+#include <linux/wait.h>
+
+#include <dt-bindings/dma/xlnx-zynqmp-dpdma.h>
+
+#include "../dmaengine.h"
+#include "../virt-dma.h"
+
+/* DPDMA registers */
+#define XILINX_DPDMA_ERR_CTRL				0x000
+#define XILINX_DPDMA_ISR				0x004
+#define XILINX_DPDMA_IMR				0x008
+#define XILINX_DPDMA_IEN				0x00c
+#define XILINX_DPDMA_IDS				0x010
+#define XILINX_DPDMA_INTR_DESC_DONE(n)			BIT((n) + 0)
+#define XILINX_DPDMA_INTR_DESC_DONE_MASK		GENMASK(5, 0)
+#define XILINX_DPDMA_INTR_NO_OSTAND(n)			BIT((n) + 6)
+#define XILINX_DPDMA_INTR_NO_OSTAND_MASK		GENMASK(11, 6)
+#define XILINX_DPDMA_INTR_AXI_ERR(n)			BIT((n) + 12)
+#define XILINX_DPDMA_INTR_AXI_ERR_MASK			GENMASK(17, 12)
+#define XILINX_DPDMA_INTR_DESC_ERR(n)			BIT((n) + 16)
+#define XILINX_DPDMA_INTR_DESC_ERR_MASK			GENMASK(23, 18)
+#define XILINX_DPDMA_INTR_WR_CMD_FIFO_FULL		BIT(24)
+#define XILINX_DPDMA_INTR_WR_DATA_FIFO_FULL		BIT(25)
+#define XILINX_DPDMA_INTR_AXI_4K_CROSS			BIT(26)
+#define XILINX_DPDMA_INTR_VSYNC				BIT(27)
+#define XILINX_DPDMA_INTR_CHAN_ERR_MASK			0x00041000
+#define XILINX_DPDMA_INTR_CHAN_ERR			0x00fff000
+#define XILINX_DPDMA_INTR_GLOBAL_ERR			0x07000000
+#define XILINX_DPDMA_INTR_ERR_ALL			0x07fff000
+#define XILINX_DPDMA_INTR_CHAN_MASK			0x00041041
+#define XILINX_DPDMA_INTR_GLOBAL_MASK			0x0f000000
+#define XILINX_DPDMA_INTR_ALL				0x0fffffff
+#define XILINX_DPDMA_EISR				0x014
+#define XILINX_DPDMA_EIMR				0x018
+#define XILINX_DPDMA_EIEN				0x01c
+#define XILINX_DPDMA_EIDS				0x020
+#define XILINX_DPDMA_EINTR_INV_APB			BIT(0)
+#define XILINX_DPDMA_EINTR_RD_AXI_ERR(n)		BIT((n) + 1)
+#define XILINX_DPDMA_EINTR_RD_AXI_ERR_MASK		GENMASK(6, 1)
+#define XILINX_DPDMA_EINTR_PRE_ERR(n)			BIT((n) + 7)
+#define XILINX_DPDMA_EINTR_PRE_ERR_MASK			GENMASK(12, 7)
+#define XILINX_DPDMA_EINTR_CRC_ERR(n)			BIT((n) + 13)
+#define XILINX_DPDMA_EINTR_CRC_ERR_MASK			GENMASK(18, 13)
+#define XILINX_DPDMA_EINTR_WR_AXI_ERR(n)		BIT((n) + 19)
+#define XILINX_DPDMA_EINTR_WR_AXI_ERR_MASK		GENMASK(24, 19)
+#define XILINX_DPDMA_EINTR_DESC_DONE_ERR(n)		BIT((n) + 25)
+#define XILINX_DPDMA_EINTR_DESC_DONE_ERR_MASK		GENMASK(30, 25)
+#define XILINX_DPDMA_EINTR_RD_CMD_FIFO_FULL		BIT(32)
+#define XILINX_DPDMA_EINTR_CHAN_ERR_MASK		0x02082082
+#define XILINX_DPDMA_EINTR_CHAN_ERR			0x7ffffffe
+#define XILINX_DPDMA_EINTR_GLOBAL_ERR			0x80000001
+#define XILINX_DPDMA_EINTR_ALL				0xffffffff
+#define XILINX_DPDMA_CNTL				0x100
+#define XILINX_DPDMA_GBL				0x104
+#define XILINX_DPDMA_GBL_TRIG_MASK(n)			((n) << 0)
+#define XILINX_DPDMA_GBL_RETRIG_MASK(n)			((n) << 6)
+#define XILINX_DPDMA_ALC0_CNTL				0x108
+#define XILINX_DPDMA_ALC0_STATUS			0x10c
+#define XILINX_DPDMA_ALC0_MAX				0x110
+#define XILINX_DPDMA_ALC0_MIN				0x114
+#define XILINX_DPDMA_ALC0_ACC				0x118
+#define XILINX_DPDMA_ALC0_ACC_TRAN			0x11c
+#define XILINX_DPDMA_ALC1_CNTL				0x120
+#define XILINX_DPDMA_ALC1_STATUS			0x124
+#define XILINX_DPDMA_ALC1_MAX				0x128
+#define XILINX_DPDMA_ALC1_MIN				0x12c
+#define XILINX_DPDMA_ALC1_ACC				0x130
+#define XILINX_DPDMA_ALC1_ACC_TRAN			0x134
+
+/* Channel register */
+#define XILINX_DPDMA_CH_BASE				0x200
+#define XILINX_DPDMA_CH_OFFSET				0x100
+#define XILINX_DPDMA_CH_DESC_START_ADDRE		0x000
+#define XILINX_DPDMA_CH_DESC_START_ADDRE_MASK		GENMASK(15, 0)
+#define XILINX_DPDMA_CH_DESC_START_ADDR			0x004
+#define XILINX_DPDMA_CH_DESC_NEXT_ADDRE			0x008
+#define XILINX_DPDMA_CH_DESC_NEXT_ADDR			0x00c
+#define XILINX_DPDMA_CH_PYLD_CUR_ADDRE			0x010
+#define XILINX_DPDMA_CH_PYLD_CUR_ADDR			0x014
+#define XILINX_DPDMA_CH_CNTL				0x018
+#define XILINX_DPDMA_CH_CNTL_ENABLE			BIT(0)
+#define XILINX_DPDMA_CH_CNTL_PAUSE			BIT(1)
+#define XILINX_DPDMA_CH_CNTL_QOS_DSCR_WR_MASK		GENMASK(5, 2)
+#define XILINX_DPDMA_CH_CNTL_QOS_DSCR_RD_MASK		GENMASK(9, 6)
+#define XILINX_DPDMA_CH_CNTL_QOS_DATA_RD_MASK		GENMASK(13, 10)
+#define XILINX_DPDMA_CH_CNTL_QOS_VID_CLASS		11
+#define XILINX_DPDMA_CH_STATUS				0x01c
+#define XILINX_DPDMA_CH_STATUS_OTRAN_CNT_MASK		GENMASK(24, 21)
+#define XILINX_DPDMA_CH_VDO				0x020
+#define XILINX_DPDMA_CH_PYLD_SZ				0x024
+#define XILINX_DPDMA_CH_DESC_ID				0x028
+
+/* DPDMA descriptor fields */
+#define XILINX_DPDMA_DESC_CONTROL_PREEMBLE		0xa5
+#define XILINX_DPDMA_DESC_CONTROL_COMPLETE_INTR		BIT(8)
+#define XILINX_DPDMA_DESC_CONTROL_DESC_UPDATE		BIT(9)
+#define XILINX_DPDMA_DESC_CONTROL_IGNORE_DONE		BIT(10)
+#define XILINX_DPDMA_DESC_CONTROL_FRAG_MODE		BIT(18)
+#define XILINX_DPDMA_DESC_CONTROL_LAST			BIT(19)
+#define XILINX_DPDMA_DESC_CONTROL_ENABLE_CRC		BIT(20)
+#define XILINX_DPDMA_DESC_CONTROL_LAST_OF_FRAME		BIT(21)
+#define XILINX_DPDMA_DESC_ID_MASK			GENMASK(15, 0)
+#define XILINX_DPDMA_DESC_HSIZE_STRIDE_HSIZE_MASK	GENMASK(17, 0)
+#define XILINX_DPDMA_DESC_HSIZE_STRIDE_STRIDE_MASK	GENMASK(31, 18)
+#define XILINX_DPDMA_DESC_ADDR_EXT_NEXT_ADDR_MASK	GENMASK(15, 0)
+#define XILINX_DPDMA_DESC_ADDR_EXT_SRC_ADDR_MASK	GENMASK(31, 16)
+
+#define XILINX_DPDMA_ALIGN_BYTES			256
+#define XILINX_DPDMA_LINESIZE_ALIGN_BITS		128
+
+#define XILINX_DPDMA_NUM_CHAN				6
+
+struct xilinx_dpdma_chan;
+
+/**
+ * struct xilinx_dpdma_hw_desc - DPDMA hardware descriptor
+ * @control: control configuration field
+ * @desc_id: descriptor ID
+ * @xfer_size: transfer size
+ * @hsize_stride: horizontal size and stride
+ * @timestamp_lsb: LSB of time stamp
+ * @timestamp_msb: MSB of time stamp
+ * @addr_ext: upper 16 bit of 48 bit address (next_desc and src_addr)
+ * @next_desc: next descriptor 32 bit address
+ * @src_addr: payload source address (1st page, 32 LSB)
+ * @addr_ext_23: payload source address (2nd and 3rd pages, 16 LSBs)
+ * @addr_ext_45: payload source address (4th and 5th pages, 16 LSBs)
+ * @src_addr2: payload source address (2nd page, 32 LSB)
+ * @src_addr3: payload source address (3rd page, 32 LSB)
+ * @src_addr4: payload source address (4th page, 32 LSB)
+ * @src_addr5: payload source address (5th page, 32 LSB)
+ * @crc: descriptor CRC
+ */
+struct xilinx_dpdma_hw_desc {
+	u32 control;
+	u32 desc_id;
+	u32 xfer_size;
+	u32 hsize_stride;
+	u32 timestamp_lsb;
+	u32 timestamp_msb;
+	u32 addr_ext;
+	u32 next_desc;
+	u32 src_addr;
+	u32 addr_ext_23;
+	u32 addr_ext_45;
+	u32 src_addr2;
+	u32 src_addr3;
+	u32 src_addr4;
+	u32 src_addr5;
+	u32 crc;
+} __aligned(XILINX_DPDMA_ALIGN_BYTES);
+
+/**
+ * struct xilinx_dpdma_sw_desc - DPDMA software descriptor
+ * @hw: DPDMA hardware descriptor
+ * @node: list node for software descriptors
+ * @dma_addr: DMA address of the software descriptor
+ */
+struct xilinx_dpdma_sw_desc {
+	struct xilinx_dpdma_hw_desc hw;
+	struct list_head node;
+	dma_addr_t dma_addr;
+};
+
+/**
+ * struct xilinx_dpdma_tx_desc - DPDMA transaction descriptor
+ * @vdesc: virtual DMA descriptor
+ * @chan: DMA channel
+ * @descriptors: list of software descriptors
+ * @error: an error has been detected with this descriptor
+ */
+struct xilinx_dpdma_tx_desc {
+	struct virt_dma_desc vdesc;
+	struct xilinx_dpdma_chan *chan;
+	struct list_head descriptors;
+	bool error;
+};
+
+#define to_dpdma_tx_desc(_desc) \
+	container_of(_desc, struct xilinx_dpdma_tx_desc, vdesc)
+
+/**
+ * struct xilinx_dpdma_chan - DPDMA channel
+ * @vchan: virtual DMA channel
+ * @reg: register base address
+ * @id: channel ID
+ * @wait_to_stop: queue to wait for outstanding transactions before stopping
+ * @running: true if the channel is running
+ * @first_frame: flag for the first frame of stream
+ * @video_group: flag if multi-channel operation is needed for video channels
+ * @lock: lock to access struct xilinx_dpdma_chan
+ * @desc_pool: descriptor allocation pool
+ * @err_task: error IRQ bottom half handler
+ * @desc.pending: Descriptor scheduled to the hardware, pending execution
+ * @desc.active: Descriptor being executed by the hardware
+ * @xdev: DPDMA device
+ */
+struct xilinx_dpdma_chan {
+	struct virt_dma_chan vchan;
+	void __iomem *reg;
+	unsigned int id;
+
+	wait_queue_head_t wait_to_stop;
+	bool running;
+	bool first_frame;
+	bool video_group;
+
+	spinlock_t lock; /* lock to access struct xilinx_dpdma_chan */
+	struct dma_pool *desc_pool;
+	struct tasklet_struct err_task;
+
+	struct {
+		struct xilinx_dpdma_tx_desc *pending;
+		struct xilinx_dpdma_tx_desc *active;
+	} desc;
+
+	struct xilinx_dpdma_device *xdev;
+};
+
+#define to_xilinx_chan(_chan) \
+	container_of(_chan, struct xilinx_dpdma_chan, vchan.chan)
+
+/**
+ * struct xilinx_dpdma_device - DPDMA device
+ * @common: generic dma device structure
+ * @reg: register base address
+ * @dev: generic device structure
+ * @irq: the interrupt number
+ * @axi_clk: axi clock
+ * @chan: DPDMA channels
+ * @ext_addr: flag for 64 bit system (48 bit addressing)
+ */
+struct xilinx_dpdma_device {
+	struct dma_device common;
+	void __iomem *reg;
+	struct device *dev;
+	int irq;
+
+	struct clk *axi_clk;
+	struct xilinx_dpdma_chan *chan[XILINX_DPDMA_NUM_CHAN];
+
+	bool ext_addr;
+};
+
+/* -----------------------------------------------------------------------------
+ * I/O Accessors
+ */
+
+static inline u32 dpdma_read(void __iomem *base, u32 offset)
+{
+	return ioread32(base + offset);
+}
+
+static inline void dpdma_write(void __iomem *base, u32 offset, u32 val)
+{
+	iowrite32(val, base + offset);
+}
+
+static inline void dpdma_clr(void __iomem *base, u32 offset, u32 clr)
+{
+	dpdma_write(base, offset, dpdma_read(base, offset) & ~clr);
+}
+
+static inline void dpdma_set(void __iomem *base, u32 offset, u32 set)
+{
+	dpdma_write(base, offset, dpdma_read(base, offset) | set);
+}
+
+/* -----------------------------------------------------------------------------
+ * Descriptor Operations
+ */
+
+/**
+ * xilinx_dpdma_sw_desc_set_dma_addrs - Set DMA addresses in the descriptor
+ * @sw_desc: The software descriptor in which to set DMA addresses
+ * @prev: The previous descriptor
+ * @dma_addr: array of dma addresses
+ * @num_src_addr: number of addresses in @dma_addr
+ *
+ * Set all the DMA addresses in the hardware descriptor corresponding to @sw_desc
+ * from @dma_addr. If a previous descriptor is specified in @prev, its next
+ * descriptor DMA address is set to the DMA address of @sw_desc. @prev may be
+ * identical to @sw_desc for cyclic transfers.
+ */
+static void xilinx_dpdma_sw_desc_set_dma_addrs(struct xilinx_dpdma_device *xdev,
+					       struct xilinx_dpdma_sw_desc *sw_desc,
+					       struct xilinx_dpdma_sw_desc *prev,
+					       dma_addr_t dma_addr[],
+					       unsigned int num_src_addr)
+{
+	struct xilinx_dpdma_hw_desc *hw_desc = &sw_desc->hw;
+	unsigned int i;
+
+	hw_desc->src_addr = lower_32_bits(dma_addr[0]);
+	if (xdev->ext_addr)
+		hw_desc->addr_ext |=
+			FIELD_PREP(XILINX_DPDMA_DESC_ADDR_EXT_SRC_ADDR_MASK,
+				   upper_32_bits(dma_addr[0]));
+
+	for (i = 1; i < num_src_addr; i++) {
+		u32 *addr = &hw_desc->src_addr2;
+
+		addr[i-1] = lower_32_bits(dma_addr[i]);
+
+		if (xdev->ext_addr) {
+			u32 *addr_ext = &hw_desc->addr_ext_23;
+			u32 addr_msb;
+
+			addr_msb = upper_32_bits(dma_addr[i]) & GENMASK(15, 0);
+			addr_msb <<= 16 * ((i - 1) % 2);
+			addr_ext[(i - 1) / 2] |= addr_msb;
+		}
+	}
+
+	if (!prev)
+		return;
+
+	prev->hw.next_desc = lower_32_bits(sw_desc->dma_addr);
+	if (xdev->ext_addr)
+		prev->hw.addr_ext |=
+			FIELD_PREP(XILINX_DPDMA_DESC_ADDR_EXT_NEXT_ADDR_MASK,
+				   upper_32_bits(sw_desc->dma_addr));
+}
+
+/**
+ * xilinx_dpdma_chan_alloc_sw_desc - Allocate a software descriptor
+ * @chan: DPDMA channel
+ *
+ * Allocate a software descriptor from the channel's descriptor pool.
+ *
+ * Return: a software descriptor or NULL.
+ */
+static struct xilinx_dpdma_sw_desc *
+xilinx_dpdma_chan_alloc_sw_desc(struct xilinx_dpdma_chan *chan)
+{
+	struct xilinx_dpdma_sw_desc *sw_desc;
+	dma_addr_t dma_addr;
+
+	sw_desc = dma_pool_zalloc(chan->desc_pool, GFP_ATOMIC, &dma_addr);
+	if (!sw_desc)
+		return NULL;
+
+	sw_desc->dma_addr = dma_addr;
+
+	return sw_desc;
+}
+
+/**
+ * xilinx_dpdma_chan_free_sw_desc - Free a software descriptor
+ * @chan: DPDMA channel
+ * @sw_desc: software descriptor to free
+ *
+ * Free a software descriptor from the channel's descriptor pool.
+ */
+static void
+xilinx_dpdma_chan_free_sw_desc(struct xilinx_dpdma_chan *chan,
+			       struct xilinx_dpdma_sw_desc *sw_desc)
+{
+	dma_pool_free(chan->desc_pool, sw_desc, sw_desc->dma_addr);
+}
+
+/**
+ * xilinx_dpdma_chan_dump_tx_desc - Dump a tx descriptor
+ * @chan: DPDMA channel
+ * @tx_desc: tx descriptor to dump
+ *
+ * Dump contents of a tx descriptor
+ */
+static void xilinx_dpdma_chan_dump_tx_desc(struct xilinx_dpdma_chan *chan,
+					   struct xilinx_dpdma_tx_desc *tx_desc)
+{
+	struct xilinx_dpdma_sw_desc *sw_desc;
+	struct device *dev = chan->xdev->dev;
+	unsigned int i = 0;
+
+	dev_dbg(dev, "------- TX descriptor dump start -------\n");
+	dev_dbg(dev, "------- channel ID = %d -------\n", chan->id);
+
+	list_for_each_entry(sw_desc, &tx_desc->descriptors, node) {
+		struct xilinx_dpdma_hw_desc *hw_desc = &sw_desc->hw;
+
+		dev_dbg(dev, "------- HW descriptor %d -------\n", i++);
+		dev_dbg(dev, "descriptor DMA addr: %pad\n", &sw_desc->dma_addr);
+		dev_dbg(dev, "control: 0x%08x\n", hw_desc->control);
+		dev_dbg(dev, "desc_id: 0x%08x\n", hw_desc->desc_id);
+		dev_dbg(dev, "xfer_size: 0x%08x\n", hw_desc->xfer_size);
+		dev_dbg(dev, "hsize_stride: 0x%08x\n", hw_desc->hsize_stride);
+		dev_dbg(dev, "timestamp_lsb: 0x%08x\n", hw_desc->timestamp_lsb);
+		dev_dbg(dev, "timestamp_msb: 0x%08x\n", hw_desc->timestamp_msb);
+		dev_dbg(dev, "addr_ext: 0x%08x\n", hw_desc->addr_ext);
+		dev_dbg(dev, "next_desc: 0x%08x\n", hw_desc->next_desc);
+		dev_dbg(dev, "src_addr: 0x%08x\n", hw_desc->src_addr);
+		dev_dbg(dev, "addr_ext_23: 0x%08x\n", hw_desc->addr_ext_23);
+		dev_dbg(dev, "addr_ext_45: 0x%08x\n", hw_desc->addr_ext_45);
+		dev_dbg(dev, "src_addr2: 0x%08x\n", hw_desc->src_addr2);
+		dev_dbg(dev, "src_addr3: 0x%08x\n", hw_desc->src_addr3);
+		dev_dbg(dev, "src_addr4: 0x%08x\n", hw_desc->src_addr4);
+		dev_dbg(dev, "src_addr5: 0x%08x\n", hw_desc->src_addr5);
+		dev_dbg(dev, "crc: 0x%08x\n", hw_desc->crc);
+	}
+
+	dev_dbg(dev, "------- TX descriptor dump end -------\n");
+}
+
+/**
+ * xilinx_dpdma_chan_alloc_tx_desc - Allocate a transaction descriptor
+ * @chan: DPDMA channel
+ *
+ * Allocate a tx descriptor.
+ *
+ * Return: a tx descriptor or NULL.
+ */
+static struct xilinx_dpdma_tx_desc *
+xilinx_dpdma_chan_alloc_tx_desc(struct xilinx_dpdma_chan *chan)
+{
+	struct xilinx_dpdma_tx_desc *tx_desc;
+
+	tx_desc = kzalloc(sizeof(*tx_desc), GFP_KERNEL);
+	if (!tx_desc)
+		return NULL;
+
+	INIT_LIST_HEAD(&tx_desc->descriptors);
+	tx_desc->chan = chan;
+	tx_desc->error = false;
+
+	return tx_desc;
+}
+
+/**
+ * xilinx_dpdma_chan_free_tx_desc - Free a virtual DMA descriptor
+ * @vdesc: virtual DMA descriptor
+ *
+ * Free the virtual DMA descriptor @vdesc including its software descriptors.
+ */
+static void xilinx_dpdma_chan_free_tx_desc(struct virt_dma_desc *vdesc)
+{
+	struct xilinx_dpdma_sw_desc *sw_desc, *next;
+	struct xilinx_dpdma_tx_desc *desc;
+
+	if (!vdesc)
+		return;
+
+	desc = to_dpdma_tx_desc(vdesc);
+
+	list_for_each_entry_safe(sw_desc, next, &desc->descriptors, node) {
+		list_del(&sw_desc->node);
+		xilinx_dpdma_chan_free_sw_desc(desc->chan, sw_desc);
+	}
+
+	kfree(desc);
+}
+
+/**
+ * xilinx_dpdma_chan_prep_interleaved_cyclic - Prepare a cyclic interleaved dma
+ *					       descriptor
+ * @chan: DPDMA channel
+ * @xt: dma interleaved template
+ *
+ * Prepare a tx descriptor including internal software/hardware descriptors
+ * based on @xt.
+ *
+ * Return: A DPDMA TX descriptor on success, or NULL.
+ */
+static struct xilinx_dpdma_tx_desc *
+xilinx_dpdma_chan_prep_interleaved_cyclic(struct xilinx_dpdma_chan *chan,
+					  struct dma_interleaved_template *xt)
+{
+	struct xilinx_dpdma_tx_desc *tx_desc;
+	struct xilinx_dpdma_sw_desc *sw_desc;
+	struct xilinx_dpdma_hw_desc *hw_desc;
+	size_t hsize = xt->sgl[0].size;
+	size_t stride = hsize + xt->sgl[0].icg;
+
+	if (!IS_ALIGNED(xt->src_start, XILINX_DPDMA_ALIGN_BYTES)) {
+		dev_err(chan->xdev->dev, "buffer should be aligned at %d B\n",
+			XILINX_DPDMA_ALIGN_BYTES);
+		return NULL;
+	}
+
+	tx_desc = xilinx_dpdma_chan_alloc_tx_desc(chan);
+	if (!tx_desc)
+		return NULL;
+
+	sw_desc = xilinx_dpdma_chan_alloc_sw_desc(chan);
+	if (!sw_desc) {
+		xilinx_dpdma_chan_free_tx_desc(&tx_desc->vdesc);
+		return NULL;
+	}
+
+	xilinx_dpdma_sw_desc_set_dma_addrs(chan->xdev, sw_desc, sw_desc,
+					   &xt->src_start, 1);
+
+	hw_desc = &sw_desc->hw;
+	hsize = ALIGN(hsize, XILINX_DPDMA_LINESIZE_ALIGN_BITS / 8);
+	hw_desc->xfer_size = hsize * xt->numf;
+	hw_desc->hsize_stride =
+		FIELD_PREP(XILINX_DPDMA_DESC_HSIZE_STRIDE_HSIZE_MASK, hsize) |
+		FIELD_PREP(XILINX_DPDMA_DESC_HSIZE_STRIDE_STRIDE_MASK,
+			   stride / 16);
+	hw_desc->control |= XILINX_DPDMA_DESC_CONTROL_PREEMBLE;
+	hw_desc->control |= XILINX_DPDMA_DESC_CONTROL_COMPLETE_INTR;
+	hw_desc->control |= XILINX_DPDMA_DESC_CONTROL_IGNORE_DONE;
+	hw_desc->control |= XILINX_DPDMA_DESC_CONTROL_LAST_OF_FRAME;
+
+	list_add_tail(&sw_desc->node, &tx_desc->descriptors);
+
+	return tx_desc;
+}
+
+/* -----------------------------------------------------------------------------
+ * DPDMA Channel Operations
+ */
+
+/**
+ * xilinx_dpdma_chan_enable - Enable the channel
+ * @chan: DPDMA channel
+ *
+ * Enable the channel and its interrupts. Set the QoS values for video class.
+ */
+static void xilinx_dpdma_chan_enable(struct xilinx_dpdma_chan *chan)
+{
+	u32 reg;
+
+	reg = (XILINX_DPDMA_INTR_CHAN_MASK << chan->id)
+	    | XILINX_DPDMA_INTR_GLOBAL_MASK;
+	dpdma_write(chan->xdev->reg, XILINX_DPDMA_IEN, reg);
+	reg = (XILINX_DPDMA_EINTR_CHAN_ERR_MASK << chan->id)
+	    | XILINX_DPDMA_INTR_GLOBAL_ERR;
+	dpdma_write(chan->xdev->reg, XILINX_DPDMA_EIEN, reg);
+
+	reg = XILINX_DPDMA_CH_CNTL_ENABLE
+	    | FIELD_PREP(XILINX_DPDMA_CH_CNTL_QOS_DSCR_WR_MASK,
+			 XILINX_DPDMA_CH_CNTL_QOS_VID_CLASS)
+	    | FIELD_PREP(XILINX_DPDMA_CH_CNTL_QOS_DSCR_RD_MASK,
+			 XILINX_DPDMA_CH_CNTL_QOS_VID_CLASS)
+	    | FIELD_PREP(XILINX_DPDMA_CH_CNTL_QOS_DATA_RD_MASK,
+			 XILINX_DPDMA_CH_CNTL_QOS_VID_CLASS);
+	dpdma_set(chan->reg, XILINX_DPDMA_CH_CNTL, reg);
+}
+
+/**
+ * xilinx_dpdma_chan_disable - Disable the channel
+ * @chan: DPDMA channel
+ *
+ * Disable the channel and its interrupts.
+ */
+static void xilinx_dpdma_chan_disable(struct xilinx_dpdma_chan *chan)
+{
+	u32 reg;
+
+	reg = XILINX_DPDMA_INTR_CHAN_MASK << chan->id;
+	dpdma_write(chan->xdev->reg, XILINX_DPDMA_IEN, reg);
+	reg = XILINX_DPDMA_EINTR_CHAN_ERR_MASK << chan->id;
+	dpdma_write(chan->xdev->reg, XILINX_DPDMA_EIEN, reg);
+
+	dpdma_clr(chan->reg, XILINX_DPDMA_CH_CNTL, XILINX_DPDMA_CH_CNTL_ENABLE);
+}
+
+/**
+ * xilinx_dpdma_chan_pause - Pause the channel
+ * @chan: DPDMA channel
+ *
+ * Pause the channel.
+ */
+static void xilinx_dpdma_chan_pause(struct xilinx_dpdma_chan *chan)
+{
+	dpdma_set(chan->reg, XILINX_DPDMA_CH_CNTL, XILINX_DPDMA_CH_CNTL_PAUSE);
+}
+
+/**
+ * xilinx_dpdma_chan_unpause - Unpause the channel
+ * @chan: DPDMA channel
+ *
+ * Unpause the channel.
+ */
+static void xilinx_dpdma_chan_unpause(struct xilinx_dpdma_chan *chan)
+{
+	dpdma_clr(chan->reg, XILINX_DPDMA_CH_CNTL, XILINX_DPDMA_CH_CNTL_PAUSE);
+}
+
+static u32 xilinx_dpdma_chan_video_group_ready(struct xilinx_dpdma_chan *chan)
+{
+	struct xilinx_dpdma_device *xdev = chan->xdev;
+	u32 channels = 0;
+	unsigned int i;
+
+	for (i = ZYNQMP_DPDMA_VIDEO0; i <= ZYNQMP_DPDMA_VIDEO2; i++) {
+		if (xdev->chan[i]->video_group && !xdev->chan[i]->running)
+			return 0;
+
+		if (xdev->chan[i]->video_group)
+			channels |= BIT(i);
+	}
+
+	return channels;
+}
+
+/**
+ * xilinx_dpdma_chan_queue_transfer - Queue the next transfer
+ * @chan: DPDMA channel
+ *
+ * Queue the next descriptor, if any, to the hardware. If the channel is
+ * stopped, start it first. Otherwise retrigger it with the next descriptor.
+ */
+static void xilinx_dpdma_chan_queue_transfer(struct xilinx_dpdma_chan *chan)
+{
+	struct xilinx_dpdma_device *xdev = chan->xdev;
+	struct xilinx_dpdma_sw_desc *sw_desc;
+	struct xilinx_dpdma_tx_desc *desc;
+	struct virt_dma_desc *vdesc;
+	u32 reg, channels;
+
+	lockdep_assert_held(&chan->lock);
+
+	if (chan->desc.pending)
+		return;
+
+	if (!chan->running) {
+		xilinx_dpdma_chan_unpause(chan);
+		xilinx_dpdma_chan_enable(chan);
+		chan->first_frame = true;
+		chan->running = true;
+	}
+
+	if (chan->video_group)
+		channels = xilinx_dpdma_chan_video_group_ready(chan);
+	else
+		channels = BIT(chan->id);
+
+	if (!channels)
+		return;
+
+	vdesc = vchan_next_desc(&chan->vchan);
+	if (!vdesc)
+		return;
+
+	desc = to_dpdma_tx_desc(vdesc);
+	chan->desc.pending = desc;
+	list_del(&desc->vdesc.node);
+
+	/*
+	 * Assign the cookie to descriptors in this transaction. Only the lower
+	 * 16 bits will be used, but that should be enough.
+	 */
+	list_for_each_entry(sw_desc, &desc->descriptors, node)
+		sw_desc->hw.desc_id = desc->vdesc.tx.cookie;
+
+	sw_desc = list_first_entry(&desc->descriptors,
+				   struct xilinx_dpdma_sw_desc, node);
+	dpdma_write(chan->reg, XILINX_DPDMA_CH_DESC_START_ADDR,
+		    lower_32_bits(sw_desc->dma_addr));
+	if (xdev->ext_addr)
+		dpdma_write(chan->reg, XILINX_DPDMA_CH_DESC_START_ADDRE,
+			    FIELD_PREP(XILINX_DPDMA_CH_DESC_START_ADDRE_MASK,
+				       upper_32_bits(sw_desc->dma_addr)));
+
+	if (chan->first_frame)
+		reg = XILINX_DPDMA_GBL_TRIG_MASK(channels);
+	else
+		reg = XILINX_DPDMA_GBL_RETRIG_MASK(channels);
+
+	chan->first_frame = false;
+
+	dpdma_write(xdev->reg, XILINX_DPDMA_GBL, reg);
+}
+
+/**
+ * xilinx_dpdma_chan_ostand - Number of outstanding transactions
+ * @chan: DPDMA channel
+ *
+ * Read and return the number of outstanding transactions from register.
+ *
+ * Return: Number of outstanding transactions from the status register.
+ */
+static u32 xilinx_dpdma_chan_ostand(struct xilinx_dpdma_chan *chan)
+{
+	return FIELD_GET(XILINX_DPDMA_CH_STATUS_OTRAN_CNT_MASK,
+			 dpdma_read(chan->reg, XILINX_DPDMA_CH_STATUS));
+}
+
+/**
+ * xilinx_dpdma_chan_no_ostand - Notify no outstanding transaction event
+ * @chan: DPDMA channel
+ *
+ * Notify waiters for no outstanding event, so waiters can stop the channel
+ * safely. This function is supposed to be called when 'no outstanding'
+ * interrupt is generated. The 'no outstanding' interrupt is disabled and
+ * should be re-enabled when this event is handled. If the channel status
+ * register still shows some number of outstanding transactions, the interrupt
+ * remains enabled.
+ *
+ * Return: 0 on success. On failure, -EWOULDBLOCK if there's still outstanding
+ * transaction(s).
+ */
+static int xilinx_dpdma_chan_notify_no_ostand(struct xilinx_dpdma_chan *chan)
+{
+	u32 cnt;
+
+	cnt = xilinx_dpdma_chan_ostand(chan);
+	if (cnt) {
+		dev_dbg(chan->xdev->dev, "%d outstanding transactions\n", cnt);
+		return -EWOULDBLOCK;
+	}
+
+	/* Disable 'no outstanding' interrupt */
+	dpdma_write(chan->xdev->reg, XILINX_DPDMA_IDS,
+		    XILINX_DPDMA_INTR_NO_OSTAND(chan->id));
+	wake_up(&chan->wait_to_stop);
+
+	return 0;
+}
+
+/**
+ * xilinx_dpdma_chan_wait_no_ostand - Wait for the no outstanding irq
+ * @chan: DPDMA channel
+ *
+ * Wait for the no outstanding transaction interrupt. This function can sleep
+ * for up to 50ms.
+ *
+ * Return: 0 on success. On failure, -ETIMEDOUT on timeout, or the error code
+ * from wait_event_interruptible_timeout().
+ */
+static int xilinx_dpdma_chan_wait_no_ostand(struct xilinx_dpdma_chan *chan)
+{
+	int ret;
+
+	/* Wait for a 'no outstanding transaction' interrupt for up to 50 ms */
+	ret = wait_event_interruptible_timeout(chan->wait_to_stop,
+					       !xilinx_dpdma_chan_ostand(chan),
+					       msecs_to_jiffies(50));
+	if (ret > 0) {
+		dpdma_write(chan->xdev->reg, XILINX_DPDMA_IEN,
+			    XILINX_DPDMA_INTR_NO_OSTAND(chan->id));
+		return 0;
+	}
+
+	dev_err(chan->xdev->dev, "not ready to stop: %d trans\n",
+		xilinx_dpdma_chan_ostand(chan));
+
+	if (ret == 0)
+		return -ETIMEDOUT;
+
+	return ret;
+}
+
+/**
+ * xilinx_dpdma_chan_poll_no_ostand - Poll the outstanding transaction status
+ * @chan: DPDMA channel
+ *
+ * Poll the outstanding transaction status, and return when there's no
+ * outstanding transaction. This function can be used in interrupt context or
+ * where atomicity is required. The calling thread may wait for more than 50ms.
+ *
+ * Return: 0 on success, or -ETIMEDOUT.
+ */
+static int xilinx_dpdma_chan_poll_no_ostand(struct xilinx_dpdma_chan *chan)
+{
+	u32 cnt, loop = 50000;
+
+	/* Poll at least for 50ms (20 fps). */
+	do {
+		cnt = xilinx_dpdma_chan_ostand(chan);
+		udelay(1);
+	} while (loop-- > 0 && cnt);
+
+	if (loop) {
+		dpdma_write(chan->xdev->reg, XILINX_DPDMA_IEN,
+			    XILINX_DPDMA_INTR_NO_OSTAND(chan->id));
+		return 0;
+	}
+
+	dev_err(chan->xdev->dev, "not ready to stop: %d trans\n",
+		xilinx_dpdma_chan_ostand(chan));
+
+	return -ETIMEDOUT;
+}
+
+/**
+ * xilinx_dpdma_chan_stop - Stop the channel
+ * @chan: DPDMA channel
+ *
+ * Stop a previously paused channel by first waiting for completion of all
+ * outstanding transactions and then disabling the channel.
+ *
+ * Return: 0 on success, or -ETIMEDOUT if the channel failed to stop.
+ */
+static int xilinx_dpdma_chan_stop(struct xilinx_dpdma_chan *chan)
+{
+	unsigned long flags;
+	int ret;
+
+	ret = xilinx_dpdma_chan_wait_no_ostand(chan);
+	if (ret)
+		return ret;
+
+	spin_lock_irqsave(&chan->lock, flags);
+	xilinx_dpdma_chan_disable(chan);
+	chan->running = false;
+	spin_unlock_irqrestore(&chan->lock, flags);
+
+	return 0;
+}
+
+/**
+ * xilinx_dpdma_chan_done_irq - Handle hardware descriptor completion
+ * @chan: DPDMA channel
+ *
+ * Handle completion of the currently active descriptor (@chan->desc.active). As
+ * we currently support cyclic transfers only, this just invokes the cyclic
+ * callback. The descriptor will be completed at the VSYNC interrupt when a new
+ * descriptor replaces it.
+ */
+static void xilinx_dpdma_chan_done_irq(struct xilinx_dpdma_chan *chan)
+{
+	struct xilinx_dpdma_tx_desc *active = chan->desc.active;
+	unsigned long flags;
+
+	spin_lock_irqsave(&chan->lock, flags);
+
+	if (active)
+		vchan_cyclic_callback(&active->vdesc);
+	else
+		dev_warn(chan->xdev->dev,
+			 "DONE IRQ with no active descriptor!\n");
+
+	spin_unlock_irqrestore(&chan->lock, flags);
+}
+
+/**
+ * xilinx_dpdma_chan_vsync_irq - Handle hardware descriptor scheduling
+ * @chan: DPDMA channel
+ *
+ * At VSYNC the active descriptor may have been replaced by the pending
+ * descriptor. Detect this through the DESC_ID and perform appropriate
+ * bookkeeping.
+ */
+static void xilinx_dpdma_chan_vsync_irq(struct  xilinx_dpdma_chan *chan)
+{
+	struct xilinx_dpdma_tx_desc *pending;
+	struct xilinx_dpdma_sw_desc *sw_desc;
+	unsigned long flags;
+	u32 desc_id;
+
+	spin_lock_irqsave(&chan->lock, flags);
+
+	pending = chan->desc.pending;
+	if (!chan->running || !pending)
+		goto out;
+
+	desc_id = dpdma_read(chan->reg, XILINX_DPDMA_CH_DESC_ID);
+
+	/* If the retrigger raced with vsync, retry at the next frame. */
+	sw_desc = list_first_entry(&pending->descriptors,
+				   struct xilinx_dpdma_sw_desc, node);
+	if (sw_desc->hw.desc_id != desc_id)
+		goto out;
+
+	/*
+	 * Complete the active descriptor, if any, promote the pending
+	 * descriptor to active, and queue the next transfer, if any.
+	 */
+	if (chan->desc.active)
+		vchan_cookie_complete(&chan->desc.active->vdesc);
+	chan->desc.active = pending;
+	chan->desc.pending = NULL;
+
+	xilinx_dpdma_chan_queue_transfer(chan);
+
+out:
+	spin_unlock_irqrestore(&chan->lock, flags);
+}
+
+/**
+ * xilinx_dpdma_chan_err - Detect any channel error
+ * @chan: DPDMA channel
+ * @isr: masked Interrupt Status Register
+ * @eisr: Error Interrupt Status Register
+ *
+ * Return: true if any channel error occurs, or false otherwise.
+ */
+static bool
+xilinx_dpdma_chan_err(struct xilinx_dpdma_chan *chan, u32 isr, u32 eisr)
+{
+	if (!chan)
+		return false;
+
+	if (chan->running &&
+	    ((isr & (XILINX_DPDMA_INTR_CHAN_ERR_MASK << chan->id)) ||
+	    (eisr & (XILINX_DPDMA_EINTR_CHAN_ERR_MASK << chan->id))))
+		return true;
+
+	return false;
+}
+
+/**
+ * xilinx_dpdma_chan_handle_err - DPDMA channel error handling
+ * @chan: DPDMA channel
+ *
+ * This function is called when any channel error or any global error occurs.
+ * The function disables the channel that has been paused by the error, and
+ * determines whether the currently active descriptor can be rescheduled,
+ * depending on the descriptor status.
+ */
+static void xilinx_dpdma_chan_handle_err(struct xilinx_dpdma_chan *chan)
+{
+	struct xilinx_dpdma_device *xdev = chan->xdev;
+	struct xilinx_dpdma_tx_desc *active;
+	unsigned long flags;
+
+	spin_lock_irqsave(&chan->lock, flags);
+
+	dev_dbg(xdev->dev, "cur desc addr = 0x%04x%08x\n",
+		dpdma_read(chan->reg, XILINX_DPDMA_CH_DESC_START_ADDRE),
+		dpdma_read(chan->reg, XILINX_DPDMA_CH_DESC_START_ADDR));
+	dev_dbg(xdev->dev, "cur payload addr = 0x%04x%08x\n",
+		dpdma_read(chan->reg, XILINX_DPDMA_CH_PYLD_CUR_ADDRE),
+		dpdma_read(chan->reg, XILINX_DPDMA_CH_PYLD_CUR_ADDR));
+
+	xilinx_dpdma_chan_disable(chan);
+	chan->running = false;
+
+	if (!chan->desc.active)
+		goto out_unlock;
+
+	active = chan->desc.active;
+	chan->desc.active = NULL;
+
+	xilinx_dpdma_chan_dump_tx_desc(chan, active);
+
+	if (active->error)
+		dev_dbg(xdev->dev, "repeated error on desc\n");
+
+	/* Reschedule if there's no new descriptor */
+	if (!chan->desc.pending &&
+	    list_empty(&chan->vchan.desc_issued)) {
+		active->error = true;
+		list_add_tail(&active->vdesc.node,
+			      &chan->vchan.desc_issued);
+	} else {
+		xilinx_dpdma_chan_free_tx_desc(&active->vdesc);
+	}
+
+out_unlock:
+	spin_unlock_irqrestore(&chan->lock, flags);
+}
+
+/* -----------------------------------------------------------------------------
+ * DMA Engine Operations
+ */
+
+static struct dma_async_tx_descriptor *
+xilinx_dpdma_prep_interleaved_cyclic(struct dma_chan *dchan,
+				     struct dma_interleaved_template *xt,
+				     unsigned long flags)
+{
+	struct xilinx_dpdma_chan *chan = to_xilinx_chan(dchan);
+	struct xilinx_dpdma_tx_desc *desc;
+
+	if (xt->dir != DMA_MEM_TO_DEV)
+		return NULL;
+
+	if (!xt->numf || !xt->sgl[0].size)
+		return NULL;
+
+	desc = xilinx_dpdma_chan_prep_interleaved_cyclic(chan, xt);
+	if (!desc)
+		return NULL;
+
+	vchan_tx_prep(&chan->vchan, &desc->vdesc, flags | DMA_CTRL_ACK);
+
+	return &desc->vdesc.tx;
+}
+
+/**
+ * xilinx_dpdma_alloc_chan_resources - Allocate resources for the channel
+ * @dchan: DMA channel
+ *
+ * Allocate a descriptor pool for the channel.
+ *
+ * Return: 0 on success, or -ENOMEM if failed to allocate a pool.
+ */
+static int xilinx_dpdma_alloc_chan_resources(struct dma_chan *dchan)
+{
+	struct xilinx_dpdma_chan *chan = to_xilinx_chan(dchan);
+	size_t align = __alignof__(struct xilinx_dpdma_sw_desc);
+
+	chan->desc_pool = dma_pool_create(dev_name(chan->xdev->dev),
+					  chan->xdev->dev,
+					  sizeof(struct xilinx_dpdma_sw_desc),
+					  align, 0);
+	if (!chan->desc_pool) {
+		dev_err(chan->xdev->dev,
+			"failed to allocate a descriptor pool\n");
+		return -ENOMEM;
+	}
+
+	return 0;
+}
+
+/**
+ * xilinx_dpdma_free_chan_resources - Free all resources for the channel
+ * @dchan: DMA channel
+ *
+ * Free resources associated with the virtual DMA channel, and destroy the
+ * descriptor pool.
+ */
+static void xilinx_dpdma_free_chan_resources(struct dma_chan *dchan)
+{
+	struct xilinx_dpdma_chan *chan = to_xilinx_chan(dchan);
+
+	vchan_free_chan_resources(&chan->vchan);
+
+	dma_pool_destroy(chan->desc_pool);
+	chan->desc_pool = NULL;
+}
+
+static void xilinx_dpdma_issue_pending(struct dma_chan *dchan)
+{
+	struct xilinx_dpdma_chan *chan = to_xilinx_chan(dchan);
+	unsigned long flags;
+
+	spin_lock_irqsave(&chan->vchan.lock, flags);
+	if (vchan_issue_pending(&chan->vchan))
+		xilinx_dpdma_chan_queue_transfer(chan);
+	spin_unlock_irqrestore(&chan->vchan.lock, flags);
+}
+
+static int xilinx_dpdma_config(struct dma_chan *dchan,
+			       struct dma_slave_config *config)
+{
+	struct xilinx_dpdma_chan *chan = to_xilinx_chan(dchan);
+	unsigned long flags;
+	int ret = 0;
+
+	if (config->direction != DMA_MEM_TO_DEV)
+		return -EINVAL;
+
+	/*
+	 * The destination address doesn't need to be specified as the DPDMA is
+	 * hardwired to the destination (the DP controller). The transfer
+	 * width, burst size and port window size are thus meaningless, they're
+	 * fixed both on the DPDMA side and on the DP controller side.
+	 */
+
+	spin_lock_irqsave(&chan->lock, flags);
+
+	/* Can't reconfigure a running channel. */
+	if (chan->running) {
+		ret = -EBUSY;
+		goto unlock;
+	}
+
+	/*
+	 * Abuse the slave_id to indicate that the channel is part of a video
+	 * group.
+	 */
+	if (chan->id >= ZYNQMP_DPDMA_VIDEO0 && chan->id <= ZYNQMP_DPDMA_VIDEO2)
+		chan->video_group = config->slave_id != 0;
+
+unlock:
+	spin_unlock_irqrestore(&chan->lock, flags);
+
+	return ret;
+}
+
+static int xilinx_dpdma_pause(struct dma_chan *dchan)
+{
+	xilinx_dpdma_chan_pause(to_xilinx_chan(dchan));
+
+	return 0;
+}
+
+static int xilinx_dpdma_resume(struct dma_chan *dchan)
+{
+	xilinx_dpdma_chan_unpause(to_xilinx_chan(dchan));
+
+	return 0;
+}
+
+/**
+ * xilinx_dpdma_terminate_all - Terminate the channel and descriptors
+ * @dchan: DMA channel
+ *
+ * Pause the channel without waiting for ongoing transfers to complete. Waiting
+ * for completion is performed by xilinx_dpdma_synchronize() that will disable
+ * the channel to complete the stop.
+ *
+ * All the descriptors associated with the channel that are guaranteed not to
+ * be touched by the hardware are freed. The pending and active descriptors are
+ * not touched, and will be freed either upon completion, or by
+ * xilinx_dpdma_synchronize().
+ *
+ * Return: 0 on success, or -ETIMEDOUT if the channel failed to stop.
+ */
+static int xilinx_dpdma_terminate_all(struct dma_chan *dchan)
+{
+	struct xilinx_dpdma_chan *chan = to_xilinx_chan(dchan);
+	struct xilinx_dpdma_device *xdev = chan->xdev;
+	LIST_HEAD(descriptors);
+	unsigned long flags;
+	unsigned int i;
+
+	/* Pause the channel (including the whole video group if applicable). */
+	if (chan->video_group) {
+		for (i = ZYNQMP_DPDMA_VIDEO0; i <= ZYNQMP_DPDMA_VIDEO2; i++) {
+			if (xdev->chan[i]->video_group &&
+			    xdev->chan[i]->running) {
+				xilinx_dpdma_chan_pause(xdev->chan[i]);
+				xdev->chan[i]->video_group = false;
+			}
+		}
+	} else {
+		xilinx_dpdma_chan_pause(chan);
+	}
+
+	/* Gather all the descriptors we can free and free them. */
+	spin_lock_irqsave(&chan->vchan.lock, flags);
+	vchan_get_all_descriptors(&chan->vchan, &descriptors);
+	spin_unlock_irqrestore(&chan->vchan.lock, flags);
+
+	vchan_dma_desc_free_list(&chan->vchan, &descriptors);
+
+	return 0;
+}
+
+/**
+ * xilinx_dpdma_synchronize - Synchronize callback execution
+ * @dchan: DMA channel
+ *
+ * Synchronizing callback execution ensures that all previously issued
+ * transfers have completed and all associated callbacks have been called and
+ * have returned.
+ *
+ * This function waits for the DMA channel to stop. It assumes it has been
+ * paused by a previous call to dmaengine_terminate_async(), and that no new
+ * pending descriptors have been issued with dma_async_issue_pending(). The
+ * behaviour is undefined otherwise.
+ */
+static void xilinx_dpdma_synchronize(struct dma_chan *dchan)
+{
+	struct xilinx_dpdma_chan *chan = to_xilinx_chan(dchan);
+
+	xilinx_dpdma_chan_stop(chan);
+
+	vchan_synchronize(&chan->vchan);
+}
+
+/* -----------------------------------------------------------------------------
+ * Interrupt and Tasklet Handling
+ */
+
+/**
+ * xilinx_dpdma_err - Detect any global error
+ * @isr: Interrupt Status Register
+ * @eisr: Error Interrupt Status Register
+ *
+ * Return: True if any global error occurs, or false otherwise.
+ */
+static bool xilinx_dpdma_err(u32 isr, u32 eisr)
+{
+	if (isr & XILINX_DPDMA_INTR_GLOBAL_ERR ||
+	    eisr & XILINX_DPDMA_EINTR_GLOBAL_ERR)
+		return true;
+
+	return false;
+}
+
+/**
+ * xilinx_dpdma_handle_err_irq - Handle DPDMA error interrupt
+ * @xdev: DPDMA device
+ * @isr: masked Interrupt Status Register
+ * @eisr: Error Interrupt Status Register
+ *
+ * Handle any error that occurred, based on @isr and @eisr. This function
+ * disables the corresponding error interrupts; they should be re-enabled once
+ * handling is done.
+ */
+static void xilinx_dpdma_handle_err_irq(struct xilinx_dpdma_device *xdev,
+					u32 isr, u32 eisr)
+{
+	bool err = xilinx_dpdma_err(isr, eisr);
+	unsigned int i;
+
+	dev_dbg_ratelimited(xdev->dev,
+			    "error irq: isr = 0x%08x, eisr = 0x%08x\n",
+			    isr, eisr);
+
+	/* Disable channel error interrupts until errors are handled. */
+	dpdma_write(xdev->reg, XILINX_DPDMA_IDS,
+		    isr & ~XILINX_DPDMA_INTR_GLOBAL_ERR);
+	dpdma_write(xdev->reg, XILINX_DPDMA_EIDS,
+		    eisr & ~XILINX_DPDMA_EINTR_GLOBAL_ERR);
+
+	for (i = 0; i < ARRAY_SIZE(xdev->chan); i++)
+		if (err || xilinx_dpdma_chan_err(xdev->chan[i], isr, eisr))
+			tasklet_schedule(&xdev->chan[i]->err_task);
+}
+
+/**
+ * xilinx_dpdma_enable_irq - Enable interrupts
+ * @xdev: DPDMA device
+ *
+ * Enable interrupts.
+ */
+static void xilinx_dpdma_enable_irq(struct xilinx_dpdma_device *xdev)
+{
+	dpdma_write(xdev->reg, XILINX_DPDMA_IEN, XILINX_DPDMA_INTR_ALL);
+	dpdma_write(xdev->reg, XILINX_DPDMA_EIEN, XILINX_DPDMA_EINTR_ALL);
+}
+
+/**
+ * xilinx_dpdma_disable_irq - Disable interrupts
+ * @xdev: DPDMA device
+ *
+ * Disable interrupts.
+ */
+static void xilinx_dpdma_disable_irq(struct xilinx_dpdma_device *xdev)
+{
+	dpdma_write(xdev->reg, XILINX_DPDMA_IDS, XILINX_DPDMA_INTR_ERR_ALL);
+	dpdma_write(xdev->reg, XILINX_DPDMA_EIDS, XILINX_DPDMA_EINTR_ALL);
+}
+
+/**
+ * xilinx_dpdma_chan_err_task - Per channel tasklet for error handling
+ * @data: tasklet data to be cast to the DPDMA channel structure
+ *
+ * Per-channel error handling tasklet. This function waits for the outstanding
+ * transaction to complete and triggers error handling. It then re-enables the
+ * channel error interrupts and restarts the channel if needed.
+ */
+static void xilinx_dpdma_chan_err_task(unsigned long data)
+{
+	struct xilinx_dpdma_chan *chan = (struct xilinx_dpdma_chan *)data;
+	struct xilinx_dpdma_device *xdev = chan->xdev;
+	unsigned long flags;
+
+	/* Proceed error handling even when polling fails. */
+	xilinx_dpdma_chan_poll_no_ostand(chan);
+
+	xilinx_dpdma_chan_handle_err(chan);
+
+	dpdma_write(xdev->reg, XILINX_DPDMA_IEN,
+		    XILINX_DPDMA_INTR_CHAN_ERR_MASK << chan->id);
+	dpdma_write(xdev->reg, XILINX_DPDMA_EIEN,
+		    XILINX_DPDMA_EINTR_CHAN_ERR_MASK << chan->id);
+
+	spin_lock_irqsave(&chan->lock, flags);
+	xilinx_dpdma_chan_queue_transfer(chan);
+	spin_unlock_irqrestore(&chan->lock, flags);
+}
+
+static irqreturn_t xilinx_dpdma_irq_handler(int irq, void *data)
+{
+	struct xilinx_dpdma_device *xdev = data;
+	unsigned long mask;
+	unsigned int i;
+	u32 status;
+	u32 error;
+
+	status = dpdma_read(xdev->reg, XILINX_DPDMA_ISR);
+	error = dpdma_read(xdev->reg, XILINX_DPDMA_EISR);
+	if (!status && !error)
+		return IRQ_NONE;
+
+	dpdma_write(xdev->reg, XILINX_DPDMA_ISR, status);
+	dpdma_write(xdev->reg, XILINX_DPDMA_EISR, error);
+
+	if (status & XILINX_DPDMA_INTR_VSYNC) {
+		/*
+		 * There's a single VSYNC interrupt that needs to be processed
+		 * by each running channel to update the active descriptor.
+		 */
+		for (i = 0; i < ARRAY_SIZE(xdev->chan); i++) {
+			struct xilinx_dpdma_chan *chan = xdev->chan[i];
+
+			if (chan)
+				xilinx_dpdma_chan_vsync_irq(chan);
+		}
+	}
+
+	mask = FIELD_GET(XILINX_DPDMA_INTR_DESC_DONE_MASK, status);
+	if (mask) {
+		for_each_set_bit(i, &mask, ARRAY_SIZE(xdev->chan))
+			xilinx_dpdma_chan_done_irq(xdev->chan[i]);
+	}
+
+	mask = FIELD_GET(XILINX_DPDMA_INTR_NO_OSTAND_MASK, status);
+	if (mask) {
+		for_each_set_bit(i, &mask, ARRAY_SIZE(xdev->chan))
+			xilinx_dpdma_chan_notify_no_ostand(xdev->chan[i]);
+	}
+
+	mask = status & XILINX_DPDMA_INTR_ERR_ALL;
+	if (mask || error)
+		xilinx_dpdma_handle_err_irq(xdev, mask, error);
+
+	return IRQ_HANDLED;
+}
+
+/* -----------------------------------------------------------------------------
+ * Initialization & Cleanup
+ */
+
+static int xilinx_dpdma_chan_init(struct xilinx_dpdma_device *xdev,
+				  unsigned int chan_id)
+{
+	struct xilinx_dpdma_chan *chan;
+
+	chan = devm_kzalloc(xdev->dev, sizeof(*chan), GFP_KERNEL);
+	if (!chan)
+		return -ENOMEM;
+
+	chan->id = chan_id;
+	chan->reg = xdev->reg + XILINX_DPDMA_CH_BASE
+		  + XILINX_DPDMA_CH_OFFSET * chan->id;
+	chan->running = false;
+	chan->xdev = xdev;
+
+	spin_lock_init(&chan->lock);
+	init_waitqueue_head(&chan->wait_to_stop);
+
+	tasklet_init(&chan->err_task, xilinx_dpdma_chan_err_task,
+		     (unsigned long)chan);
+
+	chan->vchan.desc_free = xilinx_dpdma_chan_free_tx_desc;
+	vchan_init(&chan->vchan, &xdev->common);
+
+	xdev->chan[chan->id] = chan;
+
+	return 0;
+}
+
+static void xilinx_dpdma_chan_remove(struct xilinx_dpdma_chan *chan)
+{
+	if (!chan)
+		return;
+
+	tasklet_kill(&chan->err_task);
+	list_del(&chan->vchan.chan.device_node);
+}
+
+static struct dma_chan *of_dma_xilinx_xlate(struct of_phandle_args *dma_spec,
+					    struct of_dma *ofdma)
+{
+	struct xilinx_dpdma_device *xdev = ofdma->of_dma_data;
+	uint32_t chan_id = dma_spec->args[0];
+
+	if (chan_id >= ARRAY_SIZE(xdev->chan))
+		return NULL;
+
+	if (!xdev->chan[chan_id])
+		return NULL;
+
+	return dma_get_slave_channel(&xdev->chan[chan_id]->vchan.chan);
+}
+
+static int xilinx_dpdma_probe(struct platform_device *pdev)
+{
+	struct xilinx_dpdma_device *xdev;
+	struct dma_device *ddev;
+	unsigned int i;
+	int ret;
+
+	xdev = devm_kzalloc(&pdev->dev, sizeof(*xdev), GFP_KERNEL);
+	if (!xdev)
+		return -ENOMEM;
+
+	xdev->dev = &pdev->dev;
+	xdev->ext_addr = sizeof(dma_addr_t) > 4;
+
+	INIT_LIST_HEAD(&xdev->common.channels);
+
+	platform_set_drvdata(pdev, xdev);
+
+	xdev->axi_clk = devm_clk_get(xdev->dev, "axi_clk");
+	if (IS_ERR(xdev->axi_clk))
+		return PTR_ERR(xdev->axi_clk);
+
+	xdev->reg = devm_platform_ioremap_resource(pdev, 0);
+	if (IS_ERR(xdev->reg))
+		return PTR_ERR(xdev->reg);
+
+	xdev->irq = platform_get_irq(pdev, 0);
+	if (xdev->irq < 0) {
+		dev_err(xdev->dev, "failed to get platform irq\n");
+		return xdev->irq;
+	}
+
+	ret = request_irq(xdev->irq, xilinx_dpdma_irq_handler, IRQF_SHARED,
+			  dev_name(xdev->dev), xdev);
+	if (ret) {
+		dev_err(xdev->dev, "failed to request IRQ\n");
+		return ret;
+	}
+
+	ddev = &xdev->common;
+	ddev->dev = &pdev->dev;
+
+	dma_cap_set(DMA_SLAVE, ddev->cap_mask);
+	dma_cap_set(DMA_PRIVATE, ddev->cap_mask);
+	dma_cap_set(DMA_INTERLEAVE_CYCLIC, ddev->cap_mask);
+	ddev->copy_align = fls(XILINX_DPDMA_ALIGN_BYTES - 1);
+
+	ddev->device_alloc_chan_resources = xilinx_dpdma_alloc_chan_resources;
+	ddev->device_free_chan_resources = xilinx_dpdma_free_chan_resources;
+	ddev->device_prep_interleaved_cyclic = xilinx_dpdma_prep_interleaved_cyclic;
+	/* TODO: Can we achieve better granularity ? */
+	ddev->device_tx_status = dma_cookie_status;
+	ddev->device_issue_pending = xilinx_dpdma_issue_pending;
+	ddev->device_config = xilinx_dpdma_config;
+	ddev->device_pause = xilinx_dpdma_pause;
+	ddev->device_resume = xilinx_dpdma_resume;
+	ddev->device_terminate_all = xilinx_dpdma_terminate_all;
+	ddev->device_synchronize = xilinx_dpdma_synchronize;
+	ddev->src_addr_widths = BIT(DMA_SLAVE_BUSWIDTH_UNDEFINED);
+	ddev->directions = BIT(DMA_MEM_TO_DEV);
+	ddev->residue_granularity = DMA_RESIDUE_GRANULARITY_DESCRIPTOR;
+
+	for (i = 0; i < ARRAY_SIZE(xdev->chan); ++i) {
+		ret = xilinx_dpdma_chan_init(xdev, i);
+		if (ret < 0) {
+			dev_err(xdev->dev, "failed to initialize channel %u\n",
+				i);
+			goto error;
+		}
+	}
+
+	ret = clk_prepare_enable(xdev->axi_clk);
+	if (ret) {
+		dev_err(xdev->dev, "failed to enable the axi clock\n");
+		goto error;
+	}
+
+	ret = dma_async_device_register(ddev);
+	if (ret) {
+		dev_err(xdev->dev, "failed to register the dma device\n");
+		goto error_dma_async;
+	}
+
+	ret = of_dma_controller_register(xdev->dev->of_node,
+					 of_dma_xilinx_xlate, ddev);
+	if (ret) {
+		dev_err(xdev->dev, "failed to register DMA to DT DMA helper\n");
+		goto error_of_dma;
+	}
+
+	xilinx_dpdma_enable_irq(xdev);
+
+	dev_info(&pdev->dev, "Xilinx DPDMA engine is probed\n");
+
+	return 0;
+
+error_of_dma:
+	dma_async_device_unregister(ddev);
+error_dma_async:
+	clk_disable_unprepare(xdev->axi_clk);
+error:
+	for (i = 0; i < ARRAY_SIZE(xdev->chan); i++)
+		xilinx_dpdma_chan_remove(xdev->chan[i]);
+
+	free_irq(xdev->irq, xdev);
+
+	return ret;
+}
+
+static int xilinx_dpdma_remove(struct platform_device *pdev)
+{
+	struct xilinx_dpdma_device *xdev = platform_get_drvdata(pdev);
+	unsigned int i;
+
+	/* Start by disabling the IRQ to avoid races during cleanup. */
+	free_irq(xdev->irq, xdev);
+
+	xilinx_dpdma_disable_irq(xdev);
+	of_dma_controller_free(pdev->dev.of_node);
+	dma_async_device_unregister(&xdev->common);
+	clk_disable_unprepare(xdev->axi_clk);
+
+	for (i = 0; i < ARRAY_SIZE(xdev->chan); i++)
+		xilinx_dpdma_chan_remove(xdev->chan[i]);
+
+	return 0;
+}
+
+static const struct of_device_id xilinx_dpdma_of_match[] = {
+	{ .compatible = "xlnx,zynqmp-dpdma",},
+	{ /* end of table */ },
+};
+MODULE_DEVICE_TABLE(of, xilinx_dpdma_of_match);
+
+static struct platform_driver xilinx_dpdma_driver = {
+	.probe			= xilinx_dpdma_probe,
+	.remove			= xilinx_dpdma_remove,
+	.driver			= {
+		.name		= "xilinx-zynqmp-dpdma",
+		.of_match_table	= xilinx_dpdma_of_match,
+	},
+};
+
+module_platform_driver(xilinx_dpdma_driver);
+
+MODULE_AUTHOR("Xilinx, Inc.");
+MODULE_DESCRIPTION("Xilinx ZynqMP DPDMA driver");
+MODULE_LICENSE("GPL v2");
-- 
Regards,

Laurent Pinchart


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH v3 5/6] dmaengine: xilinx: dpdma: Add debugfs support
  2020-01-23  2:29 [PATCH v3 0/6] dma: Add Xilinx ZynqMP DPDMA driver Laurent Pinchart
                   ` (3 preceding siblings ...)
  2020-01-23  2:29 ` [PATCH v3 4/6] dmaengine: xilinx: dpdma: Add the Xilinx DisplayPort DMA engine driver Laurent Pinchart
@ 2020-01-23  2:29 ` Laurent Pinchart
  2020-01-23  2:29 ` [PATCH v3 6/6] arm64: dts: zynqmp: Add DPDMA node Laurent Pinchart
  5 siblings, 0 replies; 27+ messages in thread
From: Laurent Pinchart @ 2020-01-23  2:29 UTC (permalink / raw)
  To: dmaengine; +Cc: Michal Simek, Hyun Kwon, Tejas Upadhyay, Satish Kumar Nagireddy

Expose statistics through debugfs when debugfs is available. This helps
debugging issues with the DPDMA driver.
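
For reference, once debugfs is mounted the interface can be exercised from a
root shell along these lines (a sketch; the exact mount point may differ, and
channel 0 is the VIDEO0 channel):

  # echo "DESCRIPTOR_DONE_INTR start 0" > /sys/kernel/debug/dpdma/testcase
  # ... let a few frames go through, then read back the counter ...
  # cat /sys/kernel/debug/dpdma/testcase
  <number of done interrupts counted on channel 0>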

Signed-off-by: Hyun Kwon <hyun.kwon@xilinx.com>
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
---
Changes since v2:

- Refactor debugfs code
---
 drivers/dma/xilinx/xilinx_dpdma.c | 227 ++++++++++++++++++++++++++++++
 1 file changed, 227 insertions(+)

diff --git a/drivers/dma/xilinx/xilinx_dpdma.c b/drivers/dma/xilinx/xilinx_dpdma.c
index 15ba85aa63d9..a0df729e2034 100644
--- a/drivers/dma/xilinx/xilinx_dpdma.c
+++ b/drivers/dma/xilinx/xilinx_dpdma.c
@@ -10,6 +10,7 @@
 #include <linux/bitfield.h>
 #include <linux/bits.h>
 #include <linux/clk.h>
+#include <linux/debugfs.h>
 #include <linux/delay.h>
 #include <linux/dmaengine.h>
 #include <linux/dmapool.h>
@@ -265,6 +266,228 @@ struct xilinx_dpdma_device {
 	bool ext_addr;
 };
 
+/* -----------------------------------------------------------------------------
+ * DebugFS
+ */
+
+#ifdef CONFIG_DEBUG_FS
+
+#define XILINX_DPDMA_DEBUGFS_READ_MAX_SIZE	32
+#define XILINX_DPDMA_DEBUGFS_UINT16_MAX_STR	"65535"
+
+/* Match xilinx_dpdma_testcases vs dpdma_debugfs_reqs[] entry */
+enum xilinx_dpdma_testcases {
+	DPDMA_TC_INTR_DONE,
+	DPDMA_TC_NONE
+};
+
+struct xilinx_dpdma_debugfs {
+	enum xilinx_dpdma_testcases testcase;
+	u16 xilinx_dpdma_irq_done_count;
+	unsigned int chan_id;
+};
+
+static struct xilinx_dpdma_debugfs dpdma_debugfs;
+struct xilinx_dpdma_debugfs_request {
+	const char *name;
+	enum xilinx_dpdma_testcases tc;
+	ssize_t (*read)(char *buf);
+	int (*write)(char *args);
+};
+
+static void xilinx_dpdma_debugfs_desc_done_irq(struct xilinx_dpdma_chan *chan)
+{
+	if (chan->id == dpdma_debugfs.chan_id)
+		dpdma_debugfs.xilinx_dpdma_irq_done_count++;
+}
+
+static ssize_t xilinx_dpdma_debugfs_desc_done_irq_read(char *buf)
+{
+	size_t out_str_len;
+
+	dpdma_debugfs.testcase = DPDMA_TC_NONE;
+
+	out_str_len = strlen(XILINX_DPDMA_DEBUGFS_UINT16_MAX_STR);
+	out_str_len = min_t(size_t, XILINX_DPDMA_DEBUGFS_READ_MAX_SIZE,
+			    out_str_len);
+	snprintf(buf, out_str_len, "%d",
+		 dpdma_debugfs.xilinx_dpdma_irq_done_count);
+
+	return 0;
+}
+
+static int xilinx_dpdma_debugfs_desc_done_irq_write(char *args)
+{
+	char *arg;
+	int ret;
+	u32 id;
+
+	arg = strsep(&args, " ");
+	if (!arg || strncasecmp(arg, "start", 5))
+		return -EINVAL;
+
+	arg = strsep(&args, " ");
+	if (!arg)
+		return -EINVAL;
+
+	ret = kstrtou32(arg, 0, &id);
+	if (ret < 0)
+		return ret;
+
+	if (id < ZYNQMP_DPDMA_VIDEO0 || id > ZYNQMP_DPDMA_AUDIO1)
+		return -EINVAL;
+
+	dpdma_debugfs.testcase = DPDMA_TC_INTR_DONE;
+	dpdma_debugfs.xilinx_dpdma_irq_done_count = 0;
+	dpdma_debugfs.chan_id = id;
+
+	return 0;
+}
+
+/* Match xilinx_dpdma_testcases vs dpdma_debugfs_reqs[] entry */
+struct xilinx_dpdma_debugfs_request dpdma_debugfs_reqs[] = {
+	{
+		.name = "DESCRIPTOR_DONE_INTR",
+		.tc = DPDMA_TC_INTR_DONE,
+		.read = xilinx_dpdma_debugfs_desc_done_irq_read,
+		.write = xilinx_dpdma_debugfs_desc_done_irq_write,
+	},
+};
+
+static ssize_t xilinx_dpdma_debugfs_read(struct file *f, char __user *buf,
+					 size_t size, loff_t *pos)
+{
+	enum xilinx_dpdma_testcases testcase;
+	char *kern_buff;
+	int ret;
+
+	if (*pos != 0 || size <= 0)
+		return -EINVAL;
+
+	kern_buff = kzalloc(XILINX_DPDMA_DEBUGFS_READ_MAX_SIZE, GFP_KERNEL);
+	if (!kern_buff) {
+		dpdma_debugfs.testcase = DPDMA_TC_NONE;
+		return -ENOMEM;
+	}
+
+	testcase = READ_ONCE(dpdma_debugfs.testcase);
+	if (testcase != DPDMA_TC_NONE) {
+		ret = dpdma_debugfs_reqs[testcase].read(kern_buff);
+		if (ret < 0)
+			goto done;
+	} else {
+		strlcpy(kern_buff, "No testcase executed",
+			XILINX_DPDMA_DEBUGFS_READ_MAX_SIZE);
+	}
+
+	size = min(size, strlen(kern_buff));
+	ret = copy_to_user(buf, kern_buff, size);
+
+done:
+	kfree(kern_buff);
+	if (ret)
+		return ret;
+
+	*pos = size + 1;
+	return size;
+}
+
+static ssize_t xilinx_dpdma_debugfs_write(struct file *f,
+					  const char __user *buf, size_t size,
+					  loff_t *pos)
+{
+	char *kern_buff, *kern_buff_start;
+	char *testcase;
+	unsigned int i;
+	int ret;
+
+	if (*pos != 0 || size <= 0)
+		return -EINVAL;
+
+	/* Supporting single instance of test as of now. */
+	if (dpdma_debugfs.testcase != DPDMA_TC_NONE)
+		return -EBUSY;
+
+	kern_buff = kzalloc(size, GFP_KERNEL);
+	if (!kern_buff)
+		return -ENOMEM;
+	kern_buff_start = kern_buff;
+
+	ret = strncpy_from_user(kern_buff, buf, size);
+	if (ret < 0)
+		goto done;
+
+	/* Read the testcase name from a user request. */
+	testcase = strsep(&kern_buff, " ");
+
+	for (i = 0; i < ARRAY_SIZE(dpdma_debugfs_reqs); i++) {
+		if (!strcasecmp(testcase, dpdma_debugfs_reqs[i].name))
+			break;
+	}
+
+	if (i == ARRAY_SIZE(dpdma_debugfs_reqs)) {
+		ret = -EINVAL;
+		goto done;
+	}
+
+	ret = dpdma_debugfs_reqs[i].write(kern_buff);
+	if (ret < 0)
+		goto done;
+
+	ret = size;
+
+done:
+	kfree(kern_buff_start);
+	return ret;
+}
+
+static const struct file_operations fops_xilinx_dpdma_dbgfs = {
+	.owner = THIS_MODULE,
+	.read = xilinx_dpdma_debugfs_read,
+	.write = xilinx_dpdma_debugfs_write,
+};
+
+static int xilinx_dpdma_debugfs_init(struct device *dev)
+{
+	int err;
+	struct dentry *xilinx_dpdma_debugfs_dir, *xilinx_dpdma_debugfs_file;
+
+	dpdma_debugfs.testcase = DPDMA_TC_NONE;
+
+	xilinx_dpdma_debugfs_dir = debugfs_create_dir("dpdma", NULL);
+	if (!xilinx_dpdma_debugfs_dir) {
+		dev_err(dev, "debugfs_create_dir failed\n");
+		return -ENODEV;
+	}
+
+	xilinx_dpdma_debugfs_file =
+		debugfs_create_file("testcase", 0444,
+				    xilinx_dpdma_debugfs_dir, NULL,
+				    &fops_xilinx_dpdma_dbgfs);
+	if (!xilinx_dpdma_debugfs_file) {
+		dev_err(dev, "debugfs_create_file testcase failed\n");
+		err = -ENODEV;
+		goto err_dbgfs;
+	}
+	return 0;
+
+err_dbgfs:
+	debugfs_remove_recursive(xilinx_dpdma_debugfs_dir);
+	xilinx_dpdma_debugfs_dir = NULL;
+	return err;
+}
+
+#else
+static int xilinx_dpdma_debugfs_init(struct device *dev)
+{
+	return 0;
+}
+
+static void xilinx_dpdma_debugfs_desc_done_irq(struct xilinx_dpdma_chan *chan)
+{
+}
+#endif /* CONFIG_DEBUG_FS */
+
 /* -----------------------------------------------------------------------------
  * I/O Accessors
  */
@@ -840,6 +1063,8 @@ static void xilinx_dpdma_chan_done_irq(struct xilinx_dpdma_chan *chan)
 
 	spin_lock_irqsave(&chan->lock, flags);
 
+	xilinx_dpdma_debugfs_desc_done_irq(chan);
+
 	if (active)
 		vchan_cyclic_callback(&active->vdesc);
 	else
@@ -1469,6 +1694,8 @@ static int xilinx_dpdma_probe(struct platform_device *pdev)
 
 	xilinx_dpdma_enable_irq(xdev);
 
+	xilinx_dpdma_debugfs_init(&pdev->dev);
+
 	dev_info(&pdev->dev, "Xilinx DPDMA engine is probed\n");
 
 	return 0;
-- 
Regards,

Laurent Pinchart


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH v3 6/6] arm64: dts: zynqmp: Add DPDMA node
  2020-01-23  2:29 [PATCH v3 0/6] dma: Add Xilinx ZynqMP DPDMA driver Laurent Pinchart
                   ` (4 preceding siblings ...)
  2020-01-23  2:29 ` [PATCH v3 5/6] dmaengine: xilinx: dpdma: Add debugfs support Laurent Pinchart
@ 2020-01-23  2:29 ` Laurent Pinchart
  5 siblings, 0 replies; 27+ messages in thread
From: Laurent Pinchart @ 2020-01-23  2:29 UTC (permalink / raw)
  To: dmaengine; +Cc: Michal Simek, Hyun Kwon, Tejas Upadhyay, Satish Kumar Nagireddy

Add a DT node for the DisplayPort DMA engine (DPDMA).
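
For reference, a consumer node (the DisplayPort subsystem, for instance) would
then request a channel along these lines, using the ZYNQMP_DPDMA_* identifiers
from the DPDMA DT binding header (the dma-names value is illustrative):

	dmas = <&dpdma ZYNQMP_DPDMA_VIDEO0>;
	dma-names = "vid0";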

Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
---
 arch/arm64/boot/dts/xilinx/zynqmp-clk.dtsi |  4 ++++
 arch/arm64/boot/dts/xilinx/zynqmp.dtsi     | 10 ++++++++++
 2 files changed, 14 insertions(+)

diff --git a/arch/arm64/boot/dts/xilinx/zynqmp-clk.dtsi b/arch/arm64/boot/dts/xilinx/zynqmp-clk.dtsi
index 306ad2157c98..2936e5f97f84 100644
--- a/arch/arm64/boot/dts/xilinx/zynqmp-clk.dtsi
+++ b/arch/arm64/boot/dts/xilinx/zynqmp-clk.dtsi
@@ -80,6 +80,10 @@ &can1 {
 	clocks = <&clk100 &clk100>;
 };
 
+&dpdma {
+	clocks = <&dpdma_clk>;
+};
+
 &fpd_dma_chan1 {
 	clocks = <&clk600>, <&clk100>;
 };
diff --git a/arch/arm64/boot/dts/xilinx/zynqmp.dtsi b/arch/arm64/boot/dts/xilinx/zynqmp.dtsi
index 3c731e73903a..7e986461fd57 100644
--- a/arch/arm64/boot/dts/xilinx/zynqmp.dtsi
+++ b/arch/arm64/boot/dts/xilinx/zynqmp.dtsi
@@ -219,6 +219,16 @@ pmu@9000 {
 			};
 		};
 
+		dpdma: dma-controller@fd4c0000 {
+			compatible = "xlnx,zynqmp-dpdma";
+			status = "disabled";
+			reg = <0x0 0xfd4c0000 0x0 0x1000>;
+			interrupts = <0 122 4>;
+			interrupt-parent = <&gic>;
+			clock-names = "axi_clk";
+			#dma-cells = <1>;
+		};
+
 		/* GDMA */
 		fpd_dma_chan1: dma@fd500000 {
 			status = "disabled";
-- 
Regards,

Laurent Pinchart


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 2/6] dmaengine: Add interleaved cyclic transaction type
  2020-01-23  2:29 ` [PATCH v3 2/6] dmaengine: Add interleaved cyclic transaction type Laurent Pinchart
@ 2020-01-23  8:03   ` Peter Ujfalusi
  2020-01-23  8:43     ` Vinod Koul
  0 siblings, 1 reply; 27+ messages in thread
From: Peter Ujfalusi @ 2020-01-23  8:03 UTC (permalink / raw)
  To: Laurent Pinchart, dmaengine
  Cc: Michal Simek, Hyun Kwon, Tejas Upadhyay, Satish Kumar Nagireddy,
	Vinod Koul

Hi Laurent,

On 23/01/2020 4.29, Laurent Pinchart wrote:
> The new interleaved cyclic transaction type combines interleaved and
> cycle transactions. It is designed for DMA engines that back display
> controllers, where the same 2D frame needs to be output to the display
> until a new frame is available.
> 
> Suggested-by: Vinod Koul <vkoul@kernel.org>
> Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
> ---
>  drivers/dma/dmaengine.c   |  8 +++++++-
>  include/linux/dmaengine.h | 18 ++++++++++++++++++
>  2 files changed, 25 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c
> index 03ac4b96117c..4ffb98a47f31 100644
> --- a/drivers/dma/dmaengine.c
> +++ b/drivers/dma/dmaengine.c
> @@ -981,7 +981,13 @@ int dma_async_device_register(struct dma_device *device)
>  			"DMA_INTERLEAVE");
>  		return -EIO;
>  	}
> -
> +	if (dma_has_cap(DMA_INTERLEAVE_CYCLIC, device->cap_mask) &&
> +	    !device->device_prep_interleaved_cyclic) {
> +		dev_err(device->dev,
> +			"Device claims capability %s, but op is not defined\n",
> +			"DMA_INTERLEAVE_CYCLIC");
> +		return -EIO;
> +	}
>  
>  	if (!device->device_tx_status) {
>  		dev_err(device->dev, "Device tx_status is not defined\n");
> diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
> index 8fcdee1c0cf9..e9af3bf835cb 100644
> --- a/include/linux/dmaengine.h
> +++ b/include/linux/dmaengine.h
> @@ -61,6 +61,7 @@ enum dma_transaction_type {
>  	DMA_SLAVE,
>  	DMA_CYCLIC,
>  	DMA_INTERLEAVE,
> +	DMA_INTERLEAVE_CYCLIC,
>  /* last transaction type for creation of the capabilities mask */
>  	DMA_TX_TYPE_END,
>  };
> @@ -701,6 +702,10 @@ struct dma_filter {
>   *	The function takes a buffer of size buf_len. The callback function will
>   *	be called after period_len bytes have been transferred.
>   * @device_prep_interleaved_dma: Transfer expression in a generic way.
> + * @device_prep_interleaved_cyclic: prepares an interleaved cyclic transfer.
> + *	This is similar to @device_prep_interleaved_dma, but the transfer is
> + *	repeated until a new transfer is issued. This transfer type is meant
> + *	for display.

I think capture (camera) is another potential beneficiary of this.

So you don't need to terminate the running interleaved_cyclic and start
a new one, but prepare and issue a new one, which would
terminate/replace the currently running cyclic interleaved DMA?

Can you also update the documentation at
Documentation/driver-api/dmaengine/client.rst

One more thing that might be good to clarify for the interleaved_cyclic:
What is expected when DMA_PREP_INTERRUPT is set in the flags? The
client's callback is called for each completion of
dma_interleaved_template, right?
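
Just to illustrate the question, I mean something along these lines
(frame_done_cb and ctx are made-up names):

desc = dmaengine_prep_interleaved_cyclic(chan, xt, DMA_PREP_INTERRUPT);
desc->callback = frame_done_cb;	/* invoked once per completed frame? */
desc->callback_param = ctx;
dmaengine_submit(desc);
dma_async_issue_pending(chan);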

- Péter

>   * @device_prep_dma_imm_data: DMA's 8 byte immediate data to the dst address
>   * @device_config: Pushes a new configuration to a channel, return 0 or an error
>   *	code
> @@ -785,6 +790,9 @@ struct dma_device {
>  	struct dma_async_tx_descriptor *(*device_prep_interleaved_dma)(
>  		struct dma_chan *chan, struct dma_interleaved_template *xt,
>  		unsigned long flags);
> +	struct dma_async_tx_descriptor *(*device_prep_interleaved_cyclic)(
> +		struct dma_chan *chan, struct dma_interleaved_template *xt,
> +		unsigned long flags);
>  	struct dma_async_tx_descriptor *(*device_prep_dma_imm_data)(
>  		struct dma_chan *chan, dma_addr_t dst, u64 data,
>  		unsigned long flags);
> @@ -880,6 +888,16 @@ static inline struct dma_async_tx_descriptor *dmaengine_prep_interleaved_dma(
>  	return chan->device->device_prep_interleaved_dma(chan, xt, flags);
>  }
>  
> +static inline struct dma_async_tx_descriptor *dmaengine_prep_interleaved_cyclic(
> +		struct dma_chan *chan, struct dma_interleaved_template *xt,
> +		unsigned long flags)
> +{
> +	if (!chan || !chan->device || !chan->device->device_prep_interleaved_cyclic)
> +		return NULL;
> +
> +	return chan->device->device_prep_interleaved_cyclic(chan, xt, flags);
> +}
> +
>  static inline struct dma_async_tx_descriptor *dmaengine_prep_dma_memset(
>  		struct dma_chan *chan, dma_addr_t dest, int value, size_t len,
>  		unsigned long flags)
> 

Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki.
Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 2/6] dmaengine: Add interleaved cyclic transaction type
  2020-01-23  8:03   ` Peter Ujfalusi
@ 2020-01-23  8:43     ` Vinod Koul
  2020-01-23  8:51       ` Peter Ujfalusi
  0 siblings, 1 reply; 27+ messages in thread
From: Vinod Koul @ 2020-01-23  8:43 UTC (permalink / raw)
  To: Peter Ujfalusi
  Cc: Laurent Pinchart, dmaengine, Michal Simek, Hyun Kwon,
	Tejas Upadhyay, Satish Kumar Nagireddy

On 23-01-20, 10:03, Peter Ujfalusi wrote:
> Hi Laurent,
> 
> On 23/01/2020 4.29, Laurent Pinchart wrote:
> > The new interleaved cyclic transaction type combines interleaved and
> > cycle transactions. It is designed for DMA engines that back display
> > controllers, where the same 2D frame needs to be output to the display
> > until a new frame is available.
> > 
> > Suggested-by: Vinod Koul <vkoul@kernel.org>
> > Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
> > ---
> >  drivers/dma/dmaengine.c   |  8 +++++++-
> >  include/linux/dmaengine.h | 18 ++++++++++++++++++
> >  2 files changed, 25 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c
> > index 03ac4b96117c..4ffb98a47f31 100644
> > --- a/drivers/dma/dmaengine.c
> > +++ b/drivers/dma/dmaengine.c
> > @@ -981,7 +981,13 @@ int dma_async_device_register(struct dma_device *device)
> >  			"DMA_INTERLEAVE");
> >  		return -EIO;
> >  	}
> > -
> > +	if (dma_has_cap(DMA_INTERLEAVE_CYCLIC, device->cap_mask) &&
> > +	    !device->device_prep_interleaved_cyclic) {
> > +		dev_err(device->dev,
> > +			"Device claims capability %s, but op is not defined\n",
> > +			"DMA_INTERLEAVE_CYCLIC");
> > +		return -EIO;
> > +	}
> >  
> >  	if (!device->device_tx_status) {
> >  		dev_err(device->dev, "Device tx_status is not defined\n");
> > diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
> > index 8fcdee1c0cf9..e9af3bf835cb 100644
> > --- a/include/linux/dmaengine.h
> > +++ b/include/linux/dmaengine.h
> > @@ -61,6 +61,7 @@ enum dma_transaction_type {
> >  	DMA_SLAVE,
> >  	DMA_CYCLIC,
> >  	DMA_INTERLEAVE,
> > +	DMA_INTERLEAVE_CYCLIC,
> >  /* last transaction type for creation of the capabilities mask */
> >  	DMA_TX_TYPE_END,
> >  };
> > @@ -701,6 +702,10 @@ struct dma_filter {
> >   *	The function takes a buffer of size buf_len. The callback function will
> >   *	be called after period_len bytes have been transferred.
> >   * @device_prep_interleaved_dma: Transfer expression in a generic way.
> > + * @device_prep_interleaved_cyclic: prepares an interleaved cyclic transfer.
> > + *	This is similar to @device_prep_interleaved_dma, but the transfer is
> > + *	repeated until a new transfer is issued. This transfer type is meant
> > + *	for display.
> 
> I think capture (camera) is another potential beneficiary of this.
> 
> So you don't need to terminate the running interleaved_cyclic and start
> a new one, but prepare and issue a new one, which would
> terminate/replace the currently running cyclic interleaved DMA?

Why not explicitly terminate the transfer and start when a new one is
issued. That can be common usage for audio and display..

-- 
~Vinod

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 2/6] dmaengine: Add interleaved cyclic transaction type
  2020-01-23  8:43     ` Vinod Koul
@ 2020-01-23  8:51       ` Peter Ujfalusi
  2020-01-23 12:23         ` Laurent Pinchart
  0 siblings, 1 reply; 27+ messages in thread
From: Peter Ujfalusi @ 2020-01-23  8:51 UTC (permalink / raw)
  To: Vinod Koul
  Cc: Laurent Pinchart, dmaengine, Michal Simek, Hyun Kwon,
	Tejas Upadhyay, Satish Kumar Nagireddy

Vinod,

On 23/01/2020 10.43, Vinod Koul wrote:
> On 23-01-20, 10:03, Peter Ujfalusi wrote:
>> Hi Laurent,
>>
>> On 23/01/2020 4.29, Laurent Pinchart wrote:
>>> The new interleaved cyclic transaction type combines interleaved and
>>> cycle transactions. It is designed for DMA engines that back display
>>> controllers, where the same 2D frame needs to be output to the display
>>> until a new frame is available.
>>>
>>> Suggested-by: Vinod Koul <vkoul@kernel.org>
>>> Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
>>> ---
>>>  drivers/dma/dmaengine.c   |  8 +++++++-
>>>  include/linux/dmaengine.h | 18 ++++++++++++++++++
>>>  2 files changed, 25 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c
>>> index 03ac4b96117c..4ffb98a47f31 100644
>>> --- a/drivers/dma/dmaengine.c
>>> +++ b/drivers/dma/dmaengine.c
>>> @@ -981,7 +981,13 @@ int dma_async_device_register(struct dma_device *device)
>>>  			"DMA_INTERLEAVE");
>>>  		return -EIO;
>>>  	}
>>> -
>>> +	if (dma_has_cap(DMA_INTERLEAVE_CYCLIC, device->cap_mask) &&
>>> +	    !device->device_prep_interleaved_cyclic) {
>>> +		dev_err(device->dev,
>>> +			"Device claims capability %s, but op is not defined\n",
>>> +			"DMA_INTERLEAVE_CYCLIC");
>>> +		return -EIO;
>>> +	}
>>>  
>>>  	if (!device->device_tx_status) {
>>>  		dev_err(device->dev, "Device tx_status is not defined\n");
>>> diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
>>> index 8fcdee1c0cf9..e9af3bf835cb 100644
>>> --- a/include/linux/dmaengine.h
>>> +++ b/include/linux/dmaengine.h
>>> @@ -61,6 +61,7 @@ enum dma_transaction_type {
>>>  	DMA_SLAVE,
>>>  	DMA_CYCLIC,
>>>  	DMA_INTERLEAVE,
>>> +	DMA_INTERLEAVE_CYCLIC,
>>>  /* last transaction type for creation of the capabilities mask */
>>>  	DMA_TX_TYPE_END,
>>>  };
>>> @@ -701,6 +702,10 @@ struct dma_filter {
>>>   *	The function takes a buffer of size buf_len. The callback function will
>>>   *	be called after period_len bytes have been transferred.
>>>   * @device_prep_interleaved_dma: Transfer expression in a generic way.
>>> + * @device_prep_interleaved_cyclic: prepares an interleaved cyclic transfer.
>>> + *	This is similar to @device_prep_interleaved_dma, but the transfer is
>>> + *	repeated until a new transfer is issued. This transfer type is meant
>>> + *	for display.
>>
>> I think capture (camera) is another potential beneficiary of this.
>>
>> So you don't need to terminate the running interleaved_cyclic and start
>> a new one, but prepare and issue a new one, which would
>> terminate/replace the currently running cyclic interleaved DMA?
> 
> Why not explicitly terminate the transfer and start when a new one is
> issued. That can be common usage for audio and display..

Yes, this is what I'm asking. The cyclic transfer is running and in
order to start the new transfer, the previous one should stop. But in the
cyclic case that is not going to happen unless it is terminated.

When one would want to have a different interleaved transfer, the display
(or capture) IP needs to be reconfigured as well. The DMA would need to
be terminated anyway to avoid interpreting data in a wrong way.

- Péter

Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki.
Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 2/6] dmaengine: Add interleaved cyclic transaction type
  2020-01-23  8:51       ` Peter Ujfalusi
@ 2020-01-23 12:23         ` Laurent Pinchart
  2020-01-24  6:10           ` Vinod Koul
  2020-01-24  7:20           ` Peter Ujfalusi
  0 siblings, 2 replies; 27+ messages in thread
From: Laurent Pinchart @ 2020-01-23 12:23 UTC (permalink / raw)
  To: Peter Ujfalusi
  Cc: Vinod Koul, dmaengine, Michal Simek, Hyun Kwon, Tejas Upadhyay,
	Satish Kumar Nagireddy

Hello,

On Thu, Jan 23, 2020 at 10:51:42AM +0200, Peter Ujfalusi wrote:
> On 23/01/2020 10.43, Vinod Koul wrote:
> > On 23-01-20, 10:03, Peter Ujfalusi wrote:
> >> On 23/01/2020 4.29, Laurent Pinchart wrote:
> >>> The new interleaved cyclic transaction type combines interleaved and
> >>> cycle transactions. It is designed for DMA engines that back display
> >>> controllers, where the same 2D frame needs to be output to the display
> >>> until a new frame is available.
> >>>
> >>> Suggested-by: Vinod Koul <vkoul@kernel.org>
> >>> Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
> >>> ---
> >>>  drivers/dma/dmaengine.c   |  8 +++++++-
> >>>  include/linux/dmaengine.h | 18 ++++++++++++++++++
> >>>  2 files changed, 25 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c
> >>> index 03ac4b96117c..4ffb98a47f31 100644
> >>> --- a/drivers/dma/dmaengine.c
> >>> +++ b/drivers/dma/dmaengine.c
> >>> @@ -981,7 +981,13 @@ int dma_async_device_register(struct dma_device *device)
> >>>  			"DMA_INTERLEAVE");
> >>>  		return -EIO;
> >>>  	}
> >>> -
> >>> +	if (dma_has_cap(DMA_INTERLEAVE_CYCLIC, device->cap_mask) &&
> >>> +	    !device->device_prep_interleaved_cyclic) {
> >>> +		dev_err(device->dev,
> >>> +			"Device claims capability %s, but op is not defined\n",
> >>> +			"DMA_INTERLEAVE_CYCLIC");
> >>> +		return -EIO;
> >>> +	}
> >>>  
> >>>  	if (!device->device_tx_status) {
> >>>  		dev_err(device->dev, "Device tx_status is not defined\n");
> >>> diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
> >>> index 8fcdee1c0cf9..e9af3bf835cb 100644
> >>> --- a/include/linux/dmaengine.h
> >>> +++ b/include/linux/dmaengine.h
> >>> @@ -61,6 +61,7 @@ enum dma_transaction_type {
> >>>  	DMA_SLAVE,
> >>>  	DMA_CYCLIC,
> >>>  	DMA_INTERLEAVE,
> >>> +	DMA_INTERLEAVE_CYCLIC,
> >>>  /* last transaction type for creation of the capabilities mask */
> >>>  	DMA_TX_TYPE_END,
> >>>  };
> >>> @@ -701,6 +702,10 @@ struct dma_filter {
> >>>   *	The function takes a buffer of size buf_len. The callback function will
> >>>   *	be called after period_len bytes have been transferred.
> >>>   * @device_prep_interleaved_dma: Transfer expression in a generic way.
> >>> + * @device_prep_interleaved_cyclic: prepares an interleaved cyclic transfer.
> >>> + *	This is similar to @device_prep_interleaved_dma, but the transfer is
> >>> + *	repeated until a new transfer is issued. This transfer type is meant
> >>> + *	for display.
> >>
> >> I think capture (camera) is another potential beneficiary of this.

Possibly, although in the camera case I'd rather have the hardware stop
if there's no more buffer. Requiring a buffer to always be present is
annoying from a userspace point of view. For display it's different, if
userspace doesn't submit a new frame, the same frame should keep being
displayed on the screen.

> >> So you don't need to terminate the running interleaved_cyclic and start
> >> a new one, but prepare and issue a new one, which would
> >> terminate/replace the currently running cyclic interleaved DMA?

Correct.

> > Why not explicitly terminate the transfer and start when a new one is
> > issued. That can be common usage for audio and display..
> 
> Yes, this is what I'm asking. The cyclic transfer is running and in
> order to start the new transfer, the previous should stop. But in cyclic
> case it is not going to happen unless it is terminated.
> 
> When one would want to have different interleaved transfer the display
> (or capture )IP needs to be reconfigured as well. The the would need to
> be terminated anyways to avoid interpreting data in a wrong way.

The use case here is not to switch to a new configuration, but to switch
to a new buffer. If the transfer had to be terminated manually first,
the DMA engine would potentially miss a frame, which is not acceptable.
We need an atomic way to switch to the next transfer.
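
To make this more concrete, here is a rough sketch of the client code I have
in mind for a page flip (error handling and the setup of the
dma_interleaved_template are omitted, chan and xt are assumed to be set up
already):

struct dma_async_tx_descriptor *desc;

/* Called for every page flip, xt describes the new framebuffer. */
desc = dmaengine_prep_interleaved_cyclic(chan, xt, 0);
if (!desc)
	return;

dmaengine_submit(desc);
dma_async_issue_pending(chan);

/*
 * The previous cyclic transfer keeps running, and the previous frame keeps
 * being displayed, until the hardware reaches a frame boundary, where it is
 * atomically replaced by the new descriptor.
 */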

-- 
Regards,

Laurent Pinchart

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 2/6] dmaengine: Add interleaved cyclic transaction type
  2020-01-23 12:23         ` Laurent Pinchart
@ 2020-01-24  6:10           ` Vinod Koul
  2020-01-24  8:50             ` Laurent Pinchart
  2020-01-24  7:20           ` Peter Ujfalusi
  1 sibling, 1 reply; 27+ messages in thread
From: Vinod Koul @ 2020-01-24  6:10 UTC (permalink / raw)
  To: Laurent Pinchart
  Cc: Peter Ujfalusi, dmaengine, Michal Simek, Hyun Kwon,
	Tejas Upadhyay, Satish Kumar Nagireddy

Hi Laurent,

On 23-01-20, 14:23, Laurent Pinchart wrote:
> > >>> @@ -701,6 +702,10 @@ struct dma_filter {
> > >>>   *	The function takes a buffer of size buf_len. The callback function will
> > >>>   *	be called after period_len bytes have been transferred.
> > >>>   * @device_prep_interleaved_dma: Transfer expression in a generic way.
> > >>> + * @device_prep_interleaved_cyclic: prepares an interleaved cyclic transfer.
> > >>> + *	This is similar to @device_prep_interleaved_dma, but the transfer is
> > >>> + *	repeated until a new transfer is issued. This transfer type is meant
> > >>> + *	for display.
> > >>
> > >> I think capture (camera) is another potential beneficiary of this.
> 
> Possibly, although in the camera case I'd rather have the hardware stop
> if there's no more buffer. Requiring a buffer to always be present is
> annoying from a userspace point of view. For display it's different, if
> userspace doesn't submit a new frame, the same frame should keep being
> displayed on the screen.
> 
> > >> So you don't need to terminate the running interleaved_cyclic and start
> > >> a new one, but prepare and issue a new one, which would
> > >> terminate/replace the currently running cyclic interleaved DMA?
> 
> Correct.
> 
> > > Why not explicitly terminate the transfer and start when a new one is
> > > issued. That can be common usage for audio and display..
> > 
> > Yes, this is what I'm asking. The cyclic transfer is running and in
> > order to start the new transfer, the previous should stop. But in cyclic
> > case it is not going to happen unless it is terminated.
> > 
> > When one would want to have different interleaved transfer the display
> > (or capture )IP needs to be reconfigured as well. The the would need to
> > be terminated anyways to avoid interpreting data in a wrong way.
> 
> The use case here is not to switch to a new configuration, but to switch
> to a new buffer. If the transfer had to be terminated manually first,
> the DMA engine would potentially miss a frame, which is not acceptable.
> We need an atomic way to switch to the next transfer.

So in this case you have, let's say, a cyclic descriptor with N buffers,
and they are cyclically capturing data and providing it to the client/user...

So why would you like to submit again...? Once the whole capture has
completed you would terminate, right...

Sorry, I'm not able to wrap my head around why a new submission is required
and, if that is the case, why the previous one can't be terminated :)

-- 
~Vinod

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 2/6] dmaengine: Add interleaved cyclic transaction type
  2020-01-23 12:23         ` Laurent Pinchart
  2020-01-24  6:10           ` Vinod Koul
@ 2020-01-24  7:20           ` Peter Ujfalusi
  2020-01-24  7:38             ` Peter Ujfalusi
  2020-01-24  8:56             ` Laurent Pinchart
  1 sibling, 2 replies; 27+ messages in thread
From: Peter Ujfalusi @ 2020-01-24  7:20 UTC (permalink / raw)
  To: Laurent Pinchart
  Cc: Vinod Koul, dmaengine, Michal Simek, Hyun Kwon, Tejas Upadhyay,
	Satish Kumar Nagireddy

Hi Laurent,

On 23/01/2020 14.23, Laurent Pinchart wrote:
>>>> I think capture (camera) is another potential beneficiary of this.
> 
> Possibly, although in the camera case I'd rather have the hardware stop
> if there's no more buffer. Requiring a buffer to always be present is
> annoying from a userspace point of view. For display it's different, if
> userspace doesn't submit a new frame, the same frame should keep being
> displayed on the screen.
> 
>>>> So you don't need to terminate the running interleaved_cyclic and start
>>>> a new one, but prepare and issue a new one, which would
>>>> terminate/replace the currently running cyclic interleaved DMA?
> 
> Correct.
> 
>>> Why not explicitly terminate the transfer and start when a new one is
>>> issued. That can be common usage for audio and display..
>>
>> Yes, this is what I'm asking. The cyclic transfer is running and in
>> order to start the new transfer, the previous should stop. But in cyclic
>> case it is not going to happen unless it is terminated.
>>
>> When one would want to have different interleaved transfer the display
>> (or capture )IP needs to be reconfigured as well. The the would need to
>> be terminated anyways to avoid interpreting data in a wrong way.
> 
> The use case here is not to switch to a new configuration, but to switch
> to a new buffer. If the transfer had to be terminated manually first,
> the DMA engine would potentially miss a frame, which is not acceptable.
> We need an atomic way to switch to the next transfer.

You have special hardware in hand: most DMAs can not just replace a
cyclic transfer in-flight, and it also kind of violates the DMAengine
principles.
If a cyclic transfer is started then it is expected to run forever until
it is terminated. A newly prepared and issued transfer will not get
executed when there is already a cyclic transfer in flight, as your only
option is terminate_all, which will kill the running cyclic transfer _and_
discard the issued and pending transfers.

So the use case is page flip when you have multiple framebuffers and you
switch them to show the updated one, right?

There are certainly things missing at the DMAengine API level to do this,
imho.
The issue is that cyclic transfers never complete, they run until
terminated, but you want to replace the currently executing one with
another cyclic transfer without actually terminating it.

It is like pausing the 1st cyclic transfer and continuing with the 2nd one.
Then at some point you pause the 2nd one and restart the 1st one.
It is also crucial that the pause/switch happens when the executing one has
finished the interleaved round, and not somewhere in the middle, right?

If you:
desc_1 = dmaengine_prep_interleaved_cyclic(chan, );
cookie_1 = dmaengine_submit(desc_1);
desc_2 = dmaengine_prep_interleaved_cyclic(chan, );
cookie_2 = dmaengine_submit(desc_2);

/* cookie_1/desc_1 is started */
dma_async_issue_pending(chan);

/* When need to switch to cookie_2 */
dmaengine_cyclic_set_active_cookie(chan, cookie_2);
/*
 * cookie_1 execution is suspended after it finished the running
 * dma_interleaved_template or buffer in normal cyclic and cookie_2
 * is replacing it.
 */

/* Switch back to cookie_1 */
dmaengine_cyclic_set_active_cookie(chan, cookie_1);
/*
 * cookie_2 execution is suspended after it finished the running
 * dma_interleaved_template or buffer in normal cyclic and cookie_1
 * is replacing it.
 */

There should be a (yet another) capability flag for
cyclic_set_active_cookie, and the documentation should be strict on what
the expected behavior is.

You can kill everything with terminate_all.
There is another thing which is missing imho from DMAengine: to
terminate a specific cookie, not the entire channel, which might be a
good addition as you might spawn framebuffers and then delete them and
you might want to release the corresponding cookie/descriptor as well.

What do you think?

- Péter

Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki.
Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 2/6] dmaengine: Add interleaved cyclic transaction type
  2020-01-24  7:20           ` Peter Ujfalusi
@ 2020-01-24  7:38             ` Peter Ujfalusi
  2020-01-24  8:58               ` Laurent Pinchart
  2020-01-24  8:56             ` Laurent Pinchart
  1 sibling, 1 reply; 27+ messages in thread
From: Peter Ujfalusi @ 2020-01-24  7:38 UTC (permalink / raw)
  To: Laurent Pinchart
  Cc: Vinod Koul, dmaengine, Michal Simek, Hyun Kwon, Tejas Upadhyay,
	Satish Kumar Nagireddy



On 24/01/2020 9.20, Peter Ujfalusi wrote:
> Hi Laurent,
> 
> On 23/01/2020 14.23, Laurent Pinchart wrote:
>>>>> I think capture (camera) is another potential beneficiary of this.
>>
>> Possibly, although in the camera case I'd rather have the hardware stop
>> if there's no more buffer. Requiring a buffer to always be present is
>> annoying from a userspace point of view. For display it's different, if
>> userspace doesn't submit a new frame, the same frame should keep being
>> displayed on the screen.
>>
>>>>> So you don't need to terminate the running interleaved_cyclic and start
>>>>> a new one, but prepare and issue a new one, which would
>>>>> terminate/replace the currently running cyclic interleaved DMA?
>>
>> Correct.
>>
>>>> Why not explicitly terminate the transfer and start when a new one is
>>>> issued. That can be common usage for audio and display..
>>>
>>> Yes, this is what I'm asking. The cyclic transfer is running and in
>>> order to start the new transfer, the previous should stop. But in cyclic
>>> case it is not going to happen unless it is terminated.
>>>
>>> When one would want to have different interleaved transfer the display
>>> (or capture )IP needs to be reconfigured as well. The the would need to
>>> be terminated anyways to avoid interpreting data in a wrong way.
>>
>> The use case here is not to switch to a new configuration, but to switch
>> to a new buffer. If the transfer had to be terminated manually first,
>> the DMA engine would potentially miss a frame, which is not acceptable.
>> We need an atomic way to switch to the next transfer.
> 
> You have a special hardware in hand, most DMAs can not just replace a
> cyclic transfer in-flight and it also kind of violates the DMAengine
> principles.

Is there any specific reason why you need a DMAengine driver for a display
DMA? Usually the DRM drivers handle their DMA internally.

> If cyclic transfer is started then it is expected to run forever until
> it is terminated. Preparing and issuing a new transfer will not get
> executed when there is already a cyclic transfer in flight as your only
> option is to terminate_all, which will kill the running cyclic _and_
> will discard the issued and pending transfers.
> 
> So the use case is page flip when you have multiple framebuffers and you
> switch them to show the updated one, right?
> 
> There are things missing in DMAengine in API level for sure to do this,
> imho.
> The issue is that cyclic transfers will never complete, they run until
> terminated, but you want to replace the currently executing one with a
> another cyclic transfer without actually terminating the other.
> 
> It is like pause the 1st cyclic and continue with the 2nd one. Then at
> some point you pause the 2nd one and restart the 1st one.
> It is also crucial that the pause /switch happens when the executing one
> finished the interleaved round and not in the middle somewhere, right?
> 
> If you:
> desc_1 = dmaengine_prep_interleaved_cyclic(chan, );
> cookie_1 = dmaengine_submit(desc_1);
> desc_2 = dmaengine_prep_interleaved_cyclic(chan, );
> cookie_2 = dmaengine_submit(desc_1);
> 
> /* cookie_1/desc_1 is started */
> dma_async_issue_pending(chan);
> 
> /* When need to switch to cookie_2 */
> dmaengine_cyclic_set_active_cookie(chan, cookie_2);
> /*
>  * cookie_1 execution is suspended after it finished the running
>  * dma_interleaved_template or buffer in normal cyclic and cookie_2
>  * is replacing it.
>  */
> 
> /* Switch back to cookie_1 */
> dmaengine_cyclic_set_active_cookie(chan, cookie_1);
> /*
>  * cookie_2 execution is suspended after it finished the running
>  * dma_interleaved_template or buffer in normal cyclic and cookie_1
>  * is replacing it.
>  */
> 
> There should be a (yet another) capabilities flag got
> cyclic_set_active_cookie and the documentation should be strict on what
> is the expected behavior.
> 
> You can kill everything with terminate_all.
> There is another thing which is missing imho from DMAengine: to
> terminate a specific cookie, not the entire channel, which might be a
> good addition as you might spawn framebuffers and then delete them and
> you might want to release the corresponding cookie/descriptor as well.

This is a bit trickier, as DMAengine's cookie is an s32 that is internally
treated as a running number, and the cookie status is checked against s32
numbers with < and >. I think this will not cope well when someone kills a
cookie in the middle.

> 
> What do you think?
> 
> - Péter
> 
> Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki.
> Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki
> 

- Péter

Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki.
Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 2/6] dmaengine: Add interleaved cyclic transaction type
  2020-01-24  6:10           ` Vinod Koul
@ 2020-01-24  8:50             ` Laurent Pinchart
  2020-02-10 14:06               ` Laurent Pinchart
  0 siblings, 1 reply; 27+ messages in thread
From: Laurent Pinchart @ 2020-01-24  8:50 UTC (permalink / raw)
  To: Vinod Koul
  Cc: Peter Ujfalusi, dmaengine, Michal Simek, Hyun Kwon,
	Tejas Upadhyay, Satish Kumar Nagireddy

On Fri, Jan 24, 2020 at 11:40:47AM +0530, Vinod Koul wrote:
> On 23-01-20, 14:23, Laurent Pinchart wrote:
> > > >>> @@ -701,6 +702,10 @@ struct dma_filter {
> > > >>>   *	The function takes a buffer of size buf_len. The callback function will
> > > >>>   *	be called after period_len bytes have been transferred.
> > > >>>   * @device_prep_interleaved_dma: Transfer expression in a generic way.
> > > >>> + * @device_prep_interleaved_cyclic: prepares an interleaved cyclic transfer.
> > > >>> + *	This is similar to @device_prep_interleaved_dma, but the transfer is
> > > >>> + *	repeated until a new transfer is issued. This transfer type is meant
> > > >>> + *	for display.
> > > >>
> > > >> I think capture (camera) is another potential beneficiary of this.
> > 
> > Possibly, although in the camera case I'd rather have the hardware stop
> > if there's no more buffer. Requiring a buffer to always be present is
> > annoying from a userspace point of view. For display it's different, if
> > userspace doesn't submit a new frame, the same frame should keep being
> > displayed on the screen.
> > 
> > > >> So you don't need to terminate the running interleaved_cyclic and start
> > > >> a new one, but prepare and issue a new one, which would
> > > >> terminate/replace the currently running cyclic interleaved DMA?
> > 
> > Correct.
> > 
> > > > Why not explicitly terminate the transfer and start when a new one is
> > > > issued. That can be common usage for audio and display..
> > > 
> > > Yes, this is what I'm asking. The cyclic transfer is running and in
> > > order to start the new transfer, the previous should stop. But in cyclic
> > > case it is not going to happen unless it is terminated.
> > > 
> > > When one would want to have different interleaved transfer the display
> > > (or capture )IP needs to be reconfigured as well. The the would need to
> > > be terminated anyways to avoid interpreting data in a wrong way.
> > 
> > The use case here is not to switch to a new configuration, but to switch
> > to a new buffer. If the transfer had to be terminated manually first,
> > the DMA engine would potentially miss a frame, which is not acceptable.
> > We need an atomic way to switch to the next transfer.
> 
> So in this case you have, let's say a cyclic descriptor with N buffers
> and they are cyclically capturing data and providing to client/user..

For the display case it's cyclic over a single buffer that is repeatedly
displayed over and over again until a new one replaces it, when
userspace wants to change the content on the screen. Userspace only has
to provide a new buffer when content changes, otherwise the display has
to keep displaying the same one.

For cameras I don't think cyclic makes too much sense, except when the
DMA engine can't work in single-shot mode and always requires a buffer
to write into. That shouldn't be the norm.

> So why would you like to submit again...? Once whole capture has
> completed you would terminate, right...
> 
> Sorry not able to wrap my head around why new submission is required and
> if that is the case why previous one cant be terminated :)

-- 
Regards,

Laurent Pinchart

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 2/6] dmaengine: Add interleaved cyclic transaction type
  2020-01-24  7:20           ` Peter Ujfalusi
  2020-01-24  7:38             ` Peter Ujfalusi
@ 2020-01-24  8:56             ` Laurent Pinchart
  1 sibling, 0 replies; 27+ messages in thread
From: Laurent Pinchart @ 2020-01-24  8:56 UTC (permalink / raw)
  To: Peter Ujfalusi
  Cc: Vinod Koul, dmaengine, Michal Simek, Hyun Kwon, Tejas Upadhyay,
	Satish Kumar Nagireddy

Hi Peter,

On Fri, Jan 24, 2020 at 09:20:15AM +0200, Peter Ujfalusi wrote:
> On 23/01/2020 14.23, Laurent Pinchart wrote:
> >>>> I think capture (camera) is another potential beneficiary of this.
> > 
> > Possibly, although in the camera case I'd rather have the hardware stop
> > if there's no more buffer. Requiring a buffer to always be present is
> > annoying from a userspace point of view. For display it's different, if
> > userspace doesn't submit a new frame, the same frame should keep being
> > displayed on the screen.
> > 
> >>>> So you don't need to terminate the running interleaved_cyclic and start
> >>>> a new one, but prepare and issue a new one, which would
> >>>> terminate/replace the currently running cyclic interleaved DMA?
> > 
> > Correct.
> > 
> >>> Why not explicitly terminate the transfer and start when a new one is
> >>> issued. That can be common usage for audio and display..
> >>
> >> Yes, this is what I'm asking. The cyclic transfer is running and in
> >> order to start the new transfer, the previous should stop. But in cyclic
> >> case it is not going to happen unless it is terminated.
> >>
> >> When one would want to have different interleaved transfer the display
> >> (or capture )IP needs to be reconfigured as well. The the would need to
> >> be terminated anyways to avoid interpreting data in a wrong way.
> > 
> > The use case here is not to switch to a new configuration, but to switch
> > to a new buffer. If the transfer had to be terminated manually first,
> > the DMA engine would potentially miss a frame, which is not acceptable.
> > We need an atomic way to switch to the next transfer.
> 
> You have a special hardware in hand, most DMAs can not just replace a
> cyclic transfer in-flight and it also kind of violates the DMAengine
> principles.

That's why cyclic support is optional :-)

> If cyclic transfer is started then it is expected to run forever until
> it is terminated. Preparing and issuing a new transfer will not get
> executed when there is already a cyclic transfer in flight as your only
> option is to terminate_all, which will kill the running cyclic _and_
> will discard the issued and pending transfers.

For the existing cyclic API, I could agree with that, although there's
very little documentation in the dmaengine subsystem to be used as an
authoritative source of information :-(

> So the use case is page flip when you have multiple framebuffers and you
> switch them to show the updated one, right?

Correct.

> There are things missing in DMAengine in API level for sure to do this,
> imho.
> The issue is that cyclic transfers will never complete, they run until
> terminated, but you want to replace the currently executing one with a
> another cyclic transfer without actually terminating the other.

Correct.

> It is like pause the 1st cyclic and continue with the 2nd one. Then at
> some point you pause the 2nd one and restart the 1st one.

No, after the 2nd one comes the 3rd one. It's not a double-buffering
case, it's really about replacing the buffer with another one,
regardless of where it comes from. Userspace may double-buffer, or
triple, or more.

> It is also crucial that the pause /switch happens when the executing one
> finished the interleaved round and not in the middle somewhere, right?

Yes. But that's not specific to this use case: with all non-cyclic
transfers, submitting a new transfer request doesn't stop the ongoing
transfer (if any) immediately, it just queues the new transfer for
processing.

> If you:
> desc_1 = dmaengine_prep_interleaved_cyclic(chan, );
> cookie_1 = dmaengine_submit(desc_1);
> desc_2 = dmaengine_prep_interleaved_cyclic(chan, );
> cookie_2 = dmaengine_submit(desc_1);
> 
> /* cookie_1/desc_1 is started */
> dma_async_issue_pending(chan);
> 
> /* When need to switch to cookie_2 */
> dmaengine_cyclic_set_active_cookie(chan, cookie_2);
> /*
>  * cookie_1 execution is suspended after it finished the running
>  * dma_interleaved_template or buffer in normal cyclic and cookie_2
>  * is replacing it.
>  */
> 
> /* Switch back to cookie_1 */
> dmaengine_cyclic_set_active_cookie(chan, cookie_1);
> /*
>  * cookie_2 execution is suspended after it finished the running
>  * dma_interleaved_template or buffer in normal cyclic and cookie_1
>  * is replacing it.
>  */

As explained above, I don't want to switch back to a previous transfer,
I always want a new one. I don't see why we would need this kind of API
when we can just define that any queued interleaved transfer, whether
cyclic or not, is just queued and replaces the ongoing transfer at the
next frame boundary. Drivers don't have to implement the new API if the
hardware doesn't possess this capability.
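
To make that more concrete, here is a minimal sketch of the client-side
flow I have in mind, with dmaengine_prep_interleaved_cyclic() taking the
same arguments as dmaengine_prep_interleaved_dma(). The function name and
the error handling below are purely illustrative:

#include <linux/dmaengine.h>

/* Called for every new framebuffer received from userspace. */
static int display_queue_frame(struct dma_chan *chan,
			       struct dma_interleaved_template *xt)
{
	struct dma_async_tx_descriptor *desc;

	/* Describe the new frame as a cyclic interleaved transfer. */
	desc = dmaengine_prep_interleaved_cyclic(chan, xt, DMA_PREP_INTERRUPT);
	if (!desc)
		return -ENOMEM;

	/* Queue it; the transfer currently being repeated keeps running. */
	dmaengine_submit(desc);

	/*
	 * Issue it. With the semantics proposed above, the engine keeps
	 * repeating the old frame and switches to this one at the next
	 * frame boundary, so no frame is ever missed.
	 */
	dma_async_issue_pending(chan);

	return 0;
}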

> There should be a (yet another) capabilities flag for
> cyclic_set_active_cookie and the documentation should be strict on what
> is the expected behavior.
> 
> You can kill everything with terminate_all.
> There is another thing which is missing imho from DMAengine: to
> terminate a specific cookie, not the entire channel, which might be a
> good addition as you might spawn framebuffers and then delete them and
> you might want to release the corresponding cookie/descriptor as well.
> 
> What do you think?

I think it's overcomplicated for this use case :-)

-- 
Regards,

Laurent Pinchart

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 2/6] dmaengine: Add interleaved cyclic transaction type
  2020-01-24  7:38             ` Peter Ujfalusi
@ 2020-01-24  8:58               ` Laurent Pinchart
  0 siblings, 0 replies; 27+ messages in thread
From: Laurent Pinchart @ 2020-01-24  8:58 UTC (permalink / raw)
  To: Peter Ujfalusi
  Cc: Vinod Koul, dmaengine, Michal Simek, Hyun Kwon, Tejas Upadhyay,
	Satish Kumar Nagireddy

Hi Peter,

On Fri, Jan 24, 2020 at 09:38:50AM +0200, Peter Ujfalusi wrote:
> On 24/01/2020 9.20, Peter Ujfalusi wrote:
> > On 23/01/2020 14.23, Laurent Pinchart wrote:
> >>>>> I think capture (camera) is another potential beneficiary of this.
> >>
> >> Possibly, although in the camera case I'd rather have the hardware stop
> >> if there's no more buffer. Requiring a buffer to always be present is
> >> annoying from a userspace point of view. For display it's different, if
> >> userspace doesn't submit a new frame, the same frame should keep being
> >> displayed on the screen.
> >>
> >>>>> So you don't need to terminate the running interleaved_cyclic and start
> >>>>> a new one, but prepare and issue a new one, which would
> >>>>> terminate/replace the currently running cyclic interleaved DMA?
> >>
> >> Correct.
> >>
> >>>> Why not explicitly terminate the transfer and start when a new one is
> >>>> issued. That can be common usage for audio and display..
> >>>
> >>> Yes, this is what I'm asking. The cyclic transfer is running and in
> >>> order to start the new transfer, the previous should stop. But in cyclic
> >>> case it is not going to happen unless it is terminated.
> >>>
> >>> When one would want to have a different interleaved transfer, the display
> >>> (or capture) IP needs to be reconfigured as well. The transfer would need
> >>> to be terminated anyway to avoid interpreting data in a wrong way.
> >>
> >> The use case here is not to switch to a new configuration, but to switch
> >> to a new buffer. If the transfer had to be terminated manually first,
> >> the DMA engine would potentially miss a frame, which is not acceptable.
> >> We need an atomic way to switch to the next transfer.
> > 
> > You have a special hardware in hand, most DMAs can not just replace a
> > cyclic transfer in-flight and it also kind of violates the DMAengine
> > principles.
> 
> Is there any specific reason why you need DMAengine driver for a display
> DMA? Usually the drm drivers handle their DMA internally.

Because it's a separate IP core that can be reused in different FPGAs
for different purposes. It happens that in my case it's a hard IP
connected to a display controller, but it could be used for non-cyclic
use cases in a different chip.

> > If cyclic transfer is started then it is expected to run forever until
> > it is terminated. Preparing and issuing a new transfer will not get
> > executed when there is already a cyclic transfer in flight as your only
> > option is to terminate_all, which will kill the running cyclic _and_
> > will discard the issued and pending transfers.
> > 
> > So the use case is page flip when you have multiple framebuffers and you
> > switch them to show the updated one, right?
> > 
> > There are things missing in DMAengine in API level for sure to do this,
> > imho.
> > The issue is that cyclic transfers will never complete, they run until
> > terminated, but you want to replace the currently executing one with a
> > another cyclic transfer without actually terminating the other.
> > 
> > It is like pause the 1st cyclic and continue with the 2nd one. Then at
> > some point you pause the 2nd one and restart the 1st one.
> > It is also crucial that the pause /switch happens when the executing one
> > finished the interleaved round and not in the middle somewhere, right?
> > 
> > If you:
> > desc_1 = dmaengine_prep_interleaved_cyclic(chan, );
> > cookie_1 = dmaengine_submit(desc_1);
> > desc_2 = dmaengine_prep_interleaved_cyclic(chan, );
> > cookie_2 = dmaengine_submit(desc_2);
> > 
> > /* cookie_1/desc_1 is started */
> > dma_async_issue_pending(chan);
> > 
> > /* When need to switch to cookie_2 */
> > dmaengine_cyclic_set_active_cookie(chan, cookie_2);
> > /*
> >  * cookie_1 execution is suspended after it finished the running
> >  * dma_interleaved_template or buffer in normal cyclic and cookie_2
> >  * is replacing it.
> >  */
> > 
> > /* Switch back to cookie_1 */
> > dmaengine_cyclic_set_active_cookie(chan, cookie_1);
> > /*
> >  * cookie_2 execution is suspended after it finished the running
> >  * dma_interleaved_template or buffer in normal cyclic and cookie_1
> >  * is replacing it.
> >  */
> > 
> > There should be a (yet another) capabilities flag for
> > cyclic_set_active_cookie and the documentation should be strict on what
> > is the expected behavior.
> > 
> > You can kill everything with terminate_all.
> > There is another thing which is missing imho from DMAengine: to
> > terminate a specific cookie, not the entire channel, which might be a
> > good addition as you might spawn framebuffers and then delete them and
> > you might want to release the corresponding cookie/descriptor as well.
> 
> This is a bit trickier as DMAengine's cookie is an s32, internally
> treated as a running number, and cookie status is checked against s32
> numbers with < and >. I think this will not cope well when someone kills
> a cookie in the middle.

It would require a major redesign, yes. Not looking forward to that,
especially as I think we don't need it.

> > What do you think?

-- 
Regards,

Laurent Pinchart

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 2/6] dmaengine: Add interleaved cyclic transaction type
  2020-01-24  8:50             ` Laurent Pinchart
@ 2020-02-10 14:06               ` Laurent Pinchart
  2020-02-13 13:29                 ` Vinod Koul
  0 siblings, 1 reply; 27+ messages in thread
From: Laurent Pinchart @ 2020-02-10 14:06 UTC (permalink / raw)
  To: Vinod Koul
  Cc: Peter Ujfalusi, dmaengine, Michal Simek, Hyun Kwon,
	Tejas Upadhyay, Satish Kumar Nagireddy

Hi Vinod,

On Fri, Jan 24, 2020 at 10:50:51AM +0200, Laurent Pinchart wrote:
> On Fri, Jan 24, 2020 at 11:40:47AM +0530, Vinod Koul wrote:
> > On 23-01-20, 14:23, Laurent Pinchart wrote:
> >>>>>> @@ -701,6 +702,10 @@ struct dma_filter {
> >>>>>>   *	The function takes a buffer of size buf_len. The callback function will
> >>>>>>   *	be called after period_len bytes have been transferred.
> >>>>>>   * @device_prep_interleaved_dma: Transfer expression in a generic way.
> >>>>>> + * @device_prep_interleaved_cyclic: prepares an interleaved cyclic transfer.
> >>>>>> + *	This is similar to @device_prep_interleaved_dma, but the transfer is
> >>>>>> + *	repeated until a new transfer is issued. This transfer type is meant
> >>>>>> + *	for display.
> >>>>>
> >>>>> I think capture (camera) is another potential beneficiary of this.
> >> 
> >> Possibly, although in the camera case I'd rather have the hardware stop
> >> if there's no more buffer. Requiring a buffer to always be present is
> >> annoying from a userspace point of view. For display it's different, if
> >> userspace doesn't submit a new frame, the same frame should keep being
> >> displayed on the screen.
> >> 
> >>>>> So you don't need to terminate the running interleaved_cyclic and start
> >>>>> a new one, but prepare and issue a new one, which would
> >>>>> terminate/replace the currently running cyclic interleaved DMA?
> >> 
> >> Correct.
> >> 
> >>>> Why not explicitly terminate the transfer and start when a new one is
> >>>> issued. That can be common usage for audio and display..
> >>> 
> >>> Yes, this is what I'm asking. The cyclic transfer is running and in
> >>> order to start the new transfer, the previous should stop. But in cyclic
> >>> case it is not going to happen unless it is terminated.
> >>> 
> >>> When one would want to have a different interleaved transfer, the display
> >>> (or capture) IP needs to be reconfigured as well. The transfer would need
> >>> to be terminated anyway to avoid interpreting data in a wrong way.
> >> 
> >> The use case here is not to switch to a new configuration, but to switch
> >> to a new buffer. If the transfer had to be terminated manually first,
> >> the DMA engine would potentially miss a frame, which is not acceptable.
> >> We need an atomic way to switch to the next transfer.
> > 
> > So in this case you have, let's say a cyclic descriptor with N buffers
> > and they are cyclically capturing data and providing to client/user..
> 
> For the display case it's cyclic over a single buffer that is repeatedly
> displayed over and over again until a new one replaces it, when
> userspace wants to change the content on the screen. Userspace only has
> to provide a new buffer when content changes, otherwise the display has
> to keep displaying the same one.

Is the use case clear enough, or do you need more information ? Are you
fine with the API for this kind of use case ?

> For cameras I don't think cyclic makes too much sense, except when the
> DMA engine can't work in single-shot mode and always requires a buffer
> to write into. That shouldn't be the norm.
> 
> > So why would you like to submit again...? Once whole capture has
> > completed you would terminate, right...
> > 
> > Sorry not able to wrap my head around why new submission is required and
> > if that is the case why previous one can't be terminated :)

-- 
Regards,

Laurent Pinchart

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 2/6] dmaengine: Add interleaved cyclic transaction type
  2020-02-10 14:06               ` Laurent Pinchart
@ 2020-02-13 13:29                 ` Vinod Koul
  2020-02-13 13:48                   ` Laurent Pinchart
  0 siblings, 1 reply; 27+ messages in thread
From: Vinod Koul @ 2020-02-13 13:29 UTC (permalink / raw)
  To: Laurent Pinchart
  Cc: Peter Ujfalusi, dmaengine, Michal Simek, Hyun Kwon,
	Tejas Upadhyay, Satish Kumar Nagireddy

Hi Laurent,

On 10-02-20, 16:06, Laurent Pinchart wrote:

> > >> The use case here is not to switch to a new configuration, but to switch
> > >> to a new buffer. If the transfer had to be terminated manually first,
> > >> the DMA engine would potentially miss a frame, which is not acceptable.
> > >> We need an atomic way to switch to the next transfer.
> > > 
> > > So in this case you have, let's say a cyclic descriptor with N buffers
> > > and they are cyclically capturing data and providing to client/user..
> > 
> > For the display case it's cyclic over a single buffer that is repeatedly
> > displayed over and over again until a new one replaces it, when
> > userspace wants to change the content on the screen. Userspace only has
> > to provide a new buffer when content changes, otherwise the display has
> > to keep displaying the same one.
> 
> Is the use case clear enough, or do you need more information ? Are you
> fine with the API for this kind of use case ?

So we *know* when a new buffer is being used?

IOW would it be possible for display (rather a dmaengine facing display wrapper) to detect that we are reusing an
old buffer and keep the cyclic and once detected prepare a new
descriptor, submit a new one and then terminate old one which should
trigger next transaction to be submitted

Would that make sense here?

-- 
~Vinod

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 2/6] dmaengine: Add interleaved cyclic transaction type
  2020-02-13 13:29                 ` Vinod Koul
@ 2020-02-13 13:48                   ` Laurent Pinchart
  2020-02-13 14:07                     ` Vinod Koul
  0 siblings, 1 reply; 27+ messages in thread
From: Laurent Pinchart @ 2020-02-13 13:48 UTC (permalink / raw)
  To: Vinod Koul
  Cc: Peter Ujfalusi, dmaengine, Michal Simek, Hyun Kwon,
	Tejas Upadhyay, Satish Kumar Nagireddy

Hi Vinod,

On Thu, Feb 13, 2020 at 06:59:38PM +0530, Vinod Koul wrote:
> On 10-02-20, 16:06, Laurent Pinchart wrote:
> 
> > > >> The use case here is not to switch to a new configuration, but to switch
> > > >> to a new buffer. If the transfer had to be terminated manually first,
> > > >> the DMA engine would potentially miss a frame, which is not acceptable.
> > > >> We need an atomic way to switch to the next transfer.
> > > > 
> > > > So in this case you have, let's say a cyclic descriptor with N buffers
> > > > and they are cyclically capturing data and providing to client/user..
> > > 
> > > For the display case it's cyclic over a single buffer that is repeatedly
> > > displayed over and over again until a new one replaces it, when
> > > userspace wants to change the content on the screen. Userspace only has
> > > to provide a new buffer when content changes, otherwise the display has
> > > to keep displaying the same one.
> > 
> > Is the use case clear enough, or do you need more information ? Are you
> > fine with the API for this kind of use case ?
> 
> So we *know* when a new buffer is being used?

The user of the DMA engine (the DRM DPSUB driver in this case) knows
when a new buffer needs to be used, as it receives it from userspace. In
response, it prepares a new interleaved cyclic transaction and queues
it. At the next IRQ, the DMA engine driver switches to the new
transaction (the implementation is slightly more complex to handle race
conditions, but that's the idea).
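
On the DMA engine driver side, the switch at the frame boundary boils
down to something like the sketch below, built on top of virt-dma. The
dpdma_* structures and dpdma_chan_program() are simplified placeholders,
not the actual driver code, and the locking subtleties and race handling
are left out:

#include "../virt-dma.h"	/* virt_dma_chan, vchan_next_desc(), ... */

struct dpdma_tx_desc {
	struct virt_dma_desc vdesc;
	/* hardware descriptor chain, DMA addresses, etc. */
};

struct dpdma_chan {
	struct virt_dma_chan vchan;
	struct dpdma_tx_desc *active;		/* transfer being repeated */
};

/* Placeholder: programs the hardware with the given frame descriptor. */
static void dpdma_chan_program(struct dpdma_chan *chan,
			       struct dpdma_tx_desc *desc);

/* Called from the IRQ handler at every frame boundary. */
static void dpdma_chan_frame_done(struct dpdma_chan *chan)
{
	struct virt_dma_desc *vd;

	spin_lock(&chan->vchan.lock);

	/* Has a new transaction been issued since the last frame ? */
	vd = vchan_next_desc(&chan->vchan);
	if (vd) {
		list_del(&vd->node);

		/* Retire the transaction that was being repeated... */
		if (chan->active)
			vchan_cookie_complete(&chan->active->vdesc);

		/* ...and switch to the new one from this frame onwards. */
		chan->active = container_of(vd, struct dpdma_tx_desc, vdesc);
		dpdma_chan_program(chan, chan->active);
	}

	spin_unlock(&chan->vchan.lock);
}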

> IOW would it be possible for display (rather a dmaengine facing
> display wrapper) to detect that we are reusing an old buffer and keep
> the cyclic and once detected prepare a new descriptor, submit a new
> one and then terminate old one which should trigger next transaction
> to be submitted

I'm not sure to follow you. Do you mean that the display driver should
submit a non-cyclic transaction for every frame, reusing the same buffer
for every transaction, until a new buffer is available ? The issue with
this is that if the CPU load gets high, we may miss a frame, and the
display will break. The DPDMA hardware implements cyclic support for
this reason, and we want to use that feature to comply with the real
time requirements.

If you meant something else, could you please elaborate ?

> Would that make sense here?

-- 
Regards,

Laurent Pinchart

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 2/6] dmaengine: Add interleaved cyclic transaction type
  2020-02-13 13:48                   ` Laurent Pinchart
@ 2020-02-13 14:07                     ` Vinod Koul
  2020-02-13 14:15                       ` Peter Ujfalusi
  0 siblings, 1 reply; 27+ messages in thread
From: Vinod Koul @ 2020-02-13 14:07 UTC (permalink / raw)
  To: Laurent Pinchart
  Cc: Peter Ujfalusi, dmaengine, Michal Simek, Hyun Kwon,
	Tejas Upadhyay, Satish Kumar Nagireddy

On 13-02-20, 15:48, Laurent Pinchart wrote:
> Hi Vinod,
> 
> On Thu, Feb 13, 2020 at 06:59:38PM +0530, Vinod Koul wrote:
> > On 10-02-20, 16:06, Laurent Pinchart wrote:
> > 
> > > > >> The use case here is not to switch to a new configuration, but to switch
> > > > >> to a new buffer. If the transfer had to be terminated manually first,
> > > > >> the DMA engine would potentially miss a frame, which is not acceptable.
> > > > >> We need an atomic way to switch to the next transfer.
> > > > > 
> > > > > So in this case you have, let's say a cyclic descriptor with N buffers
> > > > > and they are cyclically capturing data and providing to client/user..
> > > > 
> > > > For the display case it's cyclic over a single buffer that is repeatedly
> > > > displayed over and over again until a new one replaces it, when
> > > > userspace wants to change the content on the screen. Userspace only has
> > > > to provide a new buffer when content changes, otherwise the display has
> > > > to keep displaying the same one.
> > > 
> > > Is the use case clear enough, or do you need more information ? Are you
> > > fine with the API for this kind of use case ?
> > 
> > So we *know* when a new buffer is being used?
> 
> The user of the DMA engine (the DRM DPSUB driver in this case) knows
> when a new buffer needs to be used, as it receives it from userspace. In
> response, it prepares a new interleaved cyclic transaction and queues
> it. At the next IRQ, the DMA engine driver switches to the new
> transaction (the implementation is slightly more complex to handle race
> conditions, but that's the idea).
> 
> > IOW would it be possible for display (rather a dmaengine facing
> > display wrapper) to detect that we are reusing an old buffer and keep
> > the cyclic and once detected prepare a new descriptor, submit a new
> > one and then terminate old one which should trigger next transaction
> > to be submitted
> 
> I'm not sure to follow you. Do you mean that the display driver should
> submit a non-cyclic transaction for every frame, reusing the same buffer
> for every transaction, until a new buffer is available ? The issue with
> this is that if the CPU load gets high, we may miss a frame, and the
> display will break. The DPDMA hardware implements cyclic support for
> this reason, and we want to use that feature to comply with the real
> time requirements.

Sorry to cause confusion :) I mean cyclic

So, DRM DPSUB get first buffer
A.1 Prepare cyclic interleave txn
A.2 Submit the txn (it doesn't start here)
A.3 Invoke issue_pending (that starts the txn)

DRM DPSUB gets next buffer:
B.1 Prepare cyclic interleave txn
B.2 Submit the txn
B.3 Call terminate for current cyclic txn (we need an updated terminate
which terminates the current txn, right now we have terminate_all which
is a sledge hammer approach)
B.4 Next txn would start once current one is started
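
Roughly, in code (dmaengine_terminate_txn() below is hypothetical, it is
the per-transaction terminate we are missing today):

/* first buffer */
desc1 = dmaengine_prep_interleaved_cyclic(chan, xt1, flags);	/* A.1 */
cookie1 = dmaengine_submit(desc1);				/* A.2 */
dma_async_issue_pending(chan);					/* A.3 */

/* next buffer */
desc2 = dmaengine_prep_interleaved_cyclic(chan, xt2, flags);	/* B.1 */
cookie2 = dmaengine_submit(desc2);				/* B.2 */
dmaengine_terminate_txn(chan, cookie1);		/* B.3, does not exist yet */
/* B.4: desc2 takes over once desc1 completes its current cycle */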

Does this help and make sense in your case

Thanks
-- 
~Vinod

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 2/6] dmaengine: Add interleaved cyclic transaction type
  2020-02-13 14:07                     ` Vinod Koul
@ 2020-02-13 14:15                       ` Peter Ujfalusi
  2020-02-13 16:52                         ` Laurent Pinchart
  0 siblings, 1 reply; 27+ messages in thread
From: Peter Ujfalusi @ 2020-02-13 14:15 UTC (permalink / raw)
  To: Vinod Koul, Laurent Pinchart
  Cc: dmaengine, Michal Simek, Hyun Kwon, Tejas Upadhyay,
	Satish Kumar Nagireddy

Hi Vinod, Laurent,

On 13/02/2020 16.07, Vinod Koul wrote:
> On 13-02-20, 15:48, Laurent Pinchart wrote:
>> Hi Vinod,
>>
>> On Thu, Feb 13, 2020 at 06:59:38PM +0530, Vinod Koul wrote:
>>> On 10-02-20, 16:06, Laurent Pinchart wrote:
>>>
>>>>>>> The use case here is not to switch to a new configuration, but to switch
>>>>>>> to a new buffer. If the transfer had to be terminated manually first,
>>>>>>> the DMA engine would potentially miss a frame, which is not acceptable.
>>>>>>> We need an atomic way to switch to the next transfer.
>>>>>>
>>>>>> So in this case you have, let's say a cyclic descriptor with N buffers
>>>>>> and they are cyclically capturing data and providing to client/user..
>>>>>
>>>>> For the display case it's cyclic over a single buffer that is repeatedly
>>>>> displayed over and over again until a new one replaces it, when
>>>>> userspace wants to change the content on the screen. Userspace only has
>>>>> to provide a new buffer when content changes, otherwise the display has
>>>>> to keep displaying the same one.
>>>>
>>>> Is the use case clear enough, or do you need more information ? Are you
>>>> fine with the API for this kind of use case ?
>>>
>>> So we *know* when a new buffer is being used?
>>
>> The user of the DMA engine (the DRM DPSUB driver in this case) knows
>> when a new buffer needs to be used, as it receives it from userspace. In
>> response, it prepares a new interleaved cyclic transaction and queues
>> it. At the next IRQ, the DMA engine driver switches to the new
>> transaction (the implementation is slightly more complex to handle race
>> conditions, but that's the idea).
>>
>>> IOW would it be possible for display (rather a dmaengine facing
>>> display wrapper) to detect that we are reusing an old buffer and keep
>>> the cyclic and once detected prepare a new descriptor, submit a new
>>> one and then terminate old one which should trigger next transaction
>>> to be submitted
>>
>> I'm not sure to follow you. Do you mean that the display driver should
>> submit a non-cyclic transaction for every frame, reusing the same buffer
>> for every transaction, until a new buffer is available ? The issue with
>> this is that if the CPU load gets high, we may miss a frame, and the
>> display will break. The DPDMA hardware implements cyclic support for
>> this reason, and we want to use that feature to comply with the real
>> time requirements.
> 
> Sorry to cause confusion :) I mean cyclic
> 
> So, DRM DPSUB get first buffer
> A.1 Prepare cyclic interleave txn
> A.2 Submit the txn (it doesn't start here)
> A.3 Invoke issue_pending (that starts the txn)
> 
> DRM DPSUB gets next buffer:
> B.1 Prepare cyclic interleave txn
> B.2 Submit the txn
> B.3 Call terminate for current cyclic txn (we need an updated terminate
> which terminates the current txn, right now we have terminate_all which
> is a sledge hammer approach)
> B.4 Next txn would start once current one is started
> 
> Does this help and make sense in your case

That would be a clean way to handle it. We were missing this API for a
long time to be able to cancel the ongoing transfer (whether it is
cyclic or slave_sg, or memcpy) and move to the next one if there is one
pending.

+1 from me if it counts ;)

> 
> Thanks
> 

- Péter

Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki.
Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 2/6] dmaengine: Add interleaved cyclic transaction type
  2020-02-13 14:15                       ` Peter Ujfalusi
@ 2020-02-13 16:52                         ` Laurent Pinchart
  2020-02-14  4:23                           ` Vinod Koul
  0 siblings, 1 reply; 27+ messages in thread
From: Laurent Pinchart @ 2020-02-13 16:52 UTC (permalink / raw)
  To: Peter Ujfalusi, Vinod Koul
  Cc: dmaengine, Michal Simek, Hyun Kwon, Tejas Upadhyay,
	Satish Kumar Nagireddy

Hi Vinod and Peter,

On Thu, Feb 13, 2020 at 04:15:38PM +0200, Peter Ujfalusi wrote:
> On 13/02/2020 16.07, Vinod Koul wrote:
> > On 13-02-20, 15:48, Laurent Pinchart wrote:
> >> On Thu, Feb 13, 2020 at 06:59:38PM +0530, Vinod Koul wrote:
> >>> On 10-02-20, 16:06, Laurent Pinchart wrote:
> >>>
> >>>>>>> The use case here is not to switch to a new configuration, but to switch
> >>>>>>> to a new buffer. If the transfer had to be terminated manually first,
> >>>>>>> the DMA engine would potentially miss a frame, which is not acceptable.
> >>>>>>> We need an atomic way to switch to the next transfer.
> >>>>>>
> >>>>>> So in this case you have, let's say a cyclic descriptor with N buffers
> >>>>>> and they are cyclically capturing data and providing to client/user..
> >>>>>
> >>>>> For the display case it's cyclic over a single buffer that is repeatedly
> >>>>> displayed over and over again until a new one replaces it, when
> >>>>> userspace wants to change the content on the screen. Userspace only has
> >>>>> to provide a new buffer when content changes, otherwise the display has
> >>>>> to keep displaying the same one.
> >>>>
> >>>> Is the use case clear enough, or do you need more information ? Are you
> >>>> fine with the API for this kind of use case ?
> >>>
> >>> So we *know* when a new buffer is being used?
> >>
> >> The user of the DMA engine (the DRM DPSUB driver in this case) knows
> >> when a new buffer needs to be used, as it receives it from userspace. In
> >> response, it prepares a new interleaved cyclic transaction and queues
> >> it. At the next IRQ, the DMA engine driver switches to the new
> >> transaction (the implementation is slightly more complex to handle race
> >> conditions, but that's the idea).
> >>
> >>> IOW would it be possible for display (rather a dmaengine facing
> >>> display wrapper) to detect that we are reusing an old buffer and keep
> >>> the cyclic and once detected prepare a new descriptor, submit a new
> >>> one and then terminate old one which should trigger next transaction
> >>> to be submitted
> >>
> >> I'm not sure to follow you. Do you mean that the display driver should
> >> submit a non-cyclic transaction for every frame, reusing the same buffer
> >> for every transaction, until a new buffer is available ? The issue with
> >> this is that if the CPU load gets high, we may miss a frame, and the
> >> display will break. The DPDMA hardware implements cyclic support for
> >> this reason, and we want to use that feature to comply with the real
> >> time requirements.
> > 
> > Sorry to cause confusion :) I mean cyclic
> > 
> > So, DRM DPSUB get first buffer
> > A.1 Prepare cyclic interleave txn
> > A.2 Submit the txn (it doesn't start here)
> > A.3 Invoke issue_pending (that starts the txn)

I assume that, at this point, the transfer is started, and repeated
forever until step B below, right ?

> > DRM DPSUB gets next buffer:
> > B.1 Prepare cyclic interleave txn
> > B.2 Submit the txn
> > B.3 Call terminate for current cyclic txn (we need an updated terminate
> > which terminates the current txn, right now we have terminate_all which
> > is a sledge hammer approach)
> > B.4 Next txn would start once current one is started

Do you mean "once current one is completed" ?

> > Does this help and make sense in your case

It does, but I really wonder why we need a new terminate operation that
would terminate a single transfer. If we call issue_pending at step B.3,
when the new txn is submitted, we can terminate the current transfer at
that point. It changes the semantics of issue_pending, but only for cyclic
transfers (this whole discussion is only about cyclic transfers). As a
cyclic transfer will be repeated forever until terminated, there's no
use case for issuing a new transfer without terminating the one in
progress. I thus don't think we need a new terminate operation: the only
thing that makes sense to do when submitting a new cyclic transfer is to
terminate the current one and switch to the new one, and we already have
all the APIs we need to enable this behaviour.

> That would be a clean way to handle it. We were missing this API for a
> long time to be able to cancel the ongoing transfer (whether it is
> cyclic or slave_sg, or memcpy) and move to the next one if there is one
> pending.

Note that this new terminate API wouldn't terminate the ongoing transfer
immediately, it would complete first, until the end of the cycle for
cyclic transfers, and until the end of the whole transfer otherwise.
This new operation would thus essentially be a no-op for non-cyclic
transfers. I don't see how it would help :-) Do you have any particular
use case in mind ?

> +1 from me if it counts ;)

-- 
Regards,

Laurent Pinchart

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 2/6] dmaengine: Add interleaved cyclic transaction type
  2020-02-13 16:52                         ` Laurent Pinchart
@ 2020-02-14  4:23                           ` Vinod Koul
  2020-02-14 16:22                             ` Laurent Pinchart
  0 siblings, 1 reply; 27+ messages in thread
From: Vinod Koul @ 2020-02-14  4:23 UTC (permalink / raw)
  To: Laurent Pinchart
  Cc: Peter Ujfalusi, dmaengine, Michal Simek, Hyun Kwon,
	Tejas Upadhyay, Satish Kumar Nagireddy

On 13-02-20, 18:52, Laurent Pinchart wrote:
> Hi Vinod and Peter,
> 
> On Thu, Feb 13, 2020 at 04:15:38PM +0200, Peter Ujfalusi wrote:
> > On 13/02/2020 16.07, Vinod Koul wrote:
> > > On 13-02-20, 15:48, Laurent Pinchart wrote:
> > >> On Thu, Feb 13, 2020 at 06:59:38PM +0530, Vinod Koul wrote:
> > >>> On 10-02-20, 16:06, Laurent Pinchart wrote:
> > >>>
> > >>>>>>> The use case here is not to switch to a new configuration, but to switch
> > >>>>>>> to a new buffer. If the transfer had to be terminated manually first,
> > >>>>>>> the DMA engine would potentially miss a frame, which is not acceptable.
> > >>>>>>> We need an atomic way to switch to the next transfer.
> > >>>>>>
> > >>>>>> So in this case you have, let's say a cyclic descriptor with N buffers
> > >>>>>> and they are cyclically capturing data and providing to client/user..
> > >>>>>
> > >>>>> For the display case it's cyclic over a single buffer that is repeatedly
> > >>>>> displayed over and over again until a new one replaces it, when
> > >>>>> userspace wants to change the content on the screen. Userspace only has
> > >>>>> to provide a new buffer when content changes, otherwise the display has
> > >>>>> to keep displaying the same one.
> > >>>>
> > >>>> Is the use case clear enough, or do you need more information ? Are you
> > >>>> fine with the API for this kind of use case ?
> > >>>
> > >>> So we *know* when a new buffer is being used?
> > >>
> > >> The user of the DMA engine (the DRM DPSUB driver in this case) knows
> > >> when a new buffer needs to be used, as it receives it from userspace. In
> > >> response, it prepares a new interleaved cyclic transaction and queues
> > >> it. At the next IRQ, the DMA engine driver switches to the new
> > >> transaction (the implementation is slightly more complex to handle race
> > >> conditions, but that's the idea).
> > >>
> > >>> IOW would it be possible for display (rather a dmaengine facing
> > >>> display wrapper) to detect that we are reusing an old buffer and keep
> > >>> the cyclic and once detected prepare a new descriptor, submit a new
> > >>> one and then terminate old one which should trigger next transaction
> > >>> to be submitted
> > >>
> > >> I'm not sure to follow you. Do you mean that the display driver should
> > >> submit a non-cyclic transaction for every frame, reusing the same buffer
> > >> for every transaction, until a new buffer is available ? The issue with
> > >> this is that if the CPU load gets high, we may miss a frame, and the
> > >> display will break. The DPDMA hardware implements cyclic support for
> > >> this reason, and we want to use that feature to comply with the real
> > >> time requirements.
> > > 
> > > Sorry to cause confusion :) I mean cyclic
> > > 
> > > So, DRM DPSUB get first buffer
> > > A.1 Prepare cyclic interleave txn
> > > A.2 Submit the txn (it doesn't start here)
> > > A.3 Invoke issue_pending (that starts the txn)
> 
> I assume that, at this point, the transfer is started, and repeated
> forever until step B below, right ?

Right, since the transaction is cyclic in nature, the transaction will continue
until stopped or switched :)

> > > DRM DPSUB gets next buffer:
> > > B.1 Prepare cyclic interleave txn
> > > B.2 Submit the txn
> > > B.3 Call terminate for current cyclic txn (we need an updated terminate
> > > which terminates the current txn, right now we have terminate_all which
> > > is a sledge hammer approach)
> > > B.4 Next txn would start once current one is started
> 
> Do you mean "once current one is completed" ?

Yup, sorry for the typo!

> > > Does this help and make sense in your case
> 
> It does, but I really wonder why we need a new terminate operation that
> would terminate a single transfer. If we call issue_pending at step B.3,
> when the new txn submitted, we can terminate the current transfer at the
> point. It changes the semantics of issue_pending, but only for cyclic
> transfers (this whole discussions it only about cyclic transfers). As a
> cyclic transfer will be repeated forever until terminated, there's no
> use case for issuing a new transfer without terminating the one in
> progress. I thus don't think we need a new terminate operation: the only
> thing that makes sense to do when submitting a new cyclic transfer is to
> terminate the current one and switch to the new one, and we already have
> all the APIs we need to enable this behaviour.

The issue_pending() is a NOP when engine is already running.

The design of APIs is that we submit a txn to pending_list and then the
pending_list is started when issue_pending() is called.
Or if the engine is already running, it will take next txn from
pending_list() when current txn completes.

The only consideration here in this case is that the cyclic txn never
completes. Do we really treat a new txn submission as an 'indication' of
completeness? That is indeed a point to ponder upon.

Also, we need to keep in mind that the dmaengine won't stop a cyclic
txn. It would keep running and start the next transfer (in this case,
restart from the beginning) while it also gives you an interrupt. Here
we would be required to stop it and then start a new one...

Or perhaps remove the cyclic setting from the txn when a new one
arrives and that behaviour IMO is controller dependent, not sure if
all controllers support it..

> > That would be a clean way to handle it. We were missing this API for a
> > long time to be able to cancel the ongoing transfer (whether it is
> > cyclic or slave_sg, or memcpy) and move to the next one if there is one
> > pending.
> 
> Note that this new terminate API wouldn't terminate the ongoing transfer
> immediately, it would complete first, until the end of the cycle for
> cyclic transfers, and until the end of the whole transfer otherwise.
> This new operation would thus essentially be a no-op for non-cyclic
> transfers. I don't see how it would help :-) Do you have any particular
> use case in mind ?

Yeah that is something more to think about. Do we really abort here or
wait for the txn to complete? I think Peter needs the former and yours
falls in the latter category

Thanks
-- 
~Vinod

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 2/6] dmaengine: Add interleaved cyclic transaction type
  2020-02-14  4:23                           ` Vinod Koul
@ 2020-02-14 16:22                             ` Laurent Pinchart
  2020-02-17 10:00                               ` Peter Ujfalusi
  0 siblings, 1 reply; 27+ messages in thread
From: Laurent Pinchart @ 2020-02-14 16:22 UTC (permalink / raw)
  To: Vinod Koul
  Cc: Peter Ujfalusi, dmaengine, Michal Simek, Hyun Kwon,
	Tejas Upadhyay, Satish Kumar Nagireddy

Hi Vinod,

On Fri, Feb 14, 2020 at 09:53:49AM +0530, Vinod Koul wrote:
> On 13-02-20, 18:52, Laurent Pinchart wrote:
> > On Thu, Feb 13, 2020 at 04:15:38PM +0200, Peter Ujfalusi wrote:
> > > On 13/02/2020 16.07, Vinod Koul wrote:
> > > > On 13-02-20, 15:48, Laurent Pinchart wrote:
> > > >> On Thu, Feb 13, 2020 at 06:59:38PM +0530, Vinod Koul wrote:
> > > >>> On 10-02-20, 16:06, Laurent Pinchart wrote:
> > > >>>
> > > >>>>>>> The use case here is not to switch to a new configuration, but to switch
> > > >>>>>>> to a new buffer. If the transfer had to be terminated manually first,
> > > >>>>>>> the DMA engine would potentially miss a frame, which is not acceptable.
> > > >>>>>>> We need an atomic way to switch to the next transfer.
> > > >>>>>>
> > > >>>>>> So in this case you have, let's say a cyclic descriptor with N buffers
> > > >>>>>> and they are cyclically capturing data and providing to client/user..
> > > >>>>>
> > > >>>>> For the display case it's cyclic over a single buffer that is repeatedly
> > > >>>>> displayed over and over again until a new one replaces it, when
> > > >>>>> userspace wants to change the content on the screen. Userspace only has
> > > >>>>> to provide a new buffer when content changes, otherwise the display has
> > > >>>>> to keep displaying the same one.
> > > >>>>
> > > >>>> Is the use case clear enough, or do you need more information ? Are you
> > > >>>> fine with the API for this kind of use case ?
> > > >>>
> > > >>> So we *know* when a new buffer is being used?
> > > >>
> > > >> The user of the DMA engine (the DRM DPSUB driver in this case) knows
> > > >> when a new buffer needs to be used, as it receives it from userspace. In
> > > >> response, it prepares a new interleaved cyclic transaction and queues
> > > >> it. At the next IRQ, the DMA engine driver switches to the new
> > > >> transaction (the implementation is slightly more complex to handle race
> > > >> conditions, but that's the idea).
> > > >>
> > > >>> IOW would it be possible for display (rather a dmaengine facing
> > > >>> display wrapper) to detect that we are reusing an old buffer and keep
> > > >>> the cyclic and once detected prepare a new descriptor, submit a new
> > > >>> one and then terminate old one which should trigger next transaction
> > > >>> to be submitted
> > > >>
> > > >> I'm not sure to follow you. Do you mean that the display driver should
> > > >> submit a non-cyclic transaction for every frame, reusing the same buffer
> > > >> for every transaction, until a new buffer is available ? The issue with
> > > >> this is that if the CPU load gets high, we may miss a frame, and the
> > > >> display will break. The DPDMA hardware implements cyclic support for
> > > >> this reason, and we want to use that feature to comply with the real
> > > >> time requirements.
> > > > 
> > > > Sorry to cause confusion :) I mean cyclic
> > > > 
> > > > So, DRM DPSUB get first buffer
> > > > A.1 Prepare cyclic interleave txn
> > > > A.2 Submit the txn (it doesn't start here)
> > > > A.3 Invoke issue_pending (that starts the txn)
> > 
> > I assume that, at this point, the transfer is started, and repeated
> > forever until step B below, right ?
> 
> Right, since the transaction is cyclic in nature, the transaction will continue
> until stopped or switched :)
> 
> > > > DRM DPSUB gets next buffer:
> > > > B.1 Prepare cyclic interleave txn
> > > > B.2 Submit the txn
> > > > B.3 Call terminate for current cyclic txn (we need an updated terminate
> > > > which terminates the current txn, right now we have terminate_all which
> > > > is a sledge hammer approach)
> > > > B.4 Next txn would start once current one is started
> > 
> > Do you mean "once current one is completed" ?
> 
> Yup, sorry for the typo!

No worries, I just wanted to make sure it wasn't a misunderstanding on
my side.

> > > > Does this help and make sense in your case
> > 
> > It does, but I really wonder why we need a new terminate operation that
> > would terminate a single transfer. If we call issue_pending at step B.3,
> > when the new txn submitted, we can terminate the current transfer at the
> > point. It changes the semantics of issue_pending, but only for cyclic
> > transfers (this whole discussions it only about cyclic transfers). As a
> > cyclic transfer will be repeated forever until terminated, there's no
> > use case for issuing a new transfer without terminating the one in
> > progress. I thus don't think we need a new terminate operation: the only
> > thing that makes sense to do when submitting a new cyclic transfer is to
> > terminate the current one and switch to the new one, and we already have
> > all the APIs we need to enable this behaviour.
> 
> The issue_pending() is a NOP when engine is already running.

That's not totally right. issue_pending() still moves submitted but not
issued transactions from the submitted queue to the issued queue. The
DMA engine only considers the issued queue, so issue_pending()
essentially tells the DMA engine to consider the submitted transaction
for processing after the already issued transactions complete (in the
non-cyclic case).
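
(This is exactly what the virt-dma helpers implement, by the way. A
driver's issue_pending handler typically looks like the minimal sketch
below, where xyz_chan, to_xyz_chan() and xyz_start_next() are
placeholders:)

static void xyz_issue_pending(struct dma_chan *dchan)
{
	struct xyz_chan *chan = to_xyz_chan(dchan);
	unsigned long flags;

	spin_lock_irqsave(&chan->vchan.lock, flags);

	/* Move descriptors from the submitted list to the issued list. */
	if (vchan_issue_pending(&chan->vchan) && !chan->busy)
		xyz_start_next(chan);	/* kick the hardware if it is idle */

	spin_unlock_irqrestore(&chan->vchan.lock, flags);
}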

> The design of APIs is that we submit a txn to pending_list and then the
> pending_list is started when issue_pending() is called.
> Or if the engine is already running, it will take next txn from
> pending_list() when current txn completes.
> 
> The only consideration here in this case is that the cyclic txn never
> completes. Do we really treat a new txn submission as an 'indication' of
> completeness? That is indeed a point to ponder upon.

The reason why I think we should is two-fold:

1. I believe it's semantically aligned with the existing behaviour of
issue_pending(). As explained above, the operation tells the DMA engine
to consider submitted transactions for processing when the current (and
other issued) transactions complete. If we extend the definition of
complete to cover cyclic transactions, I think it's a good match.

2. There's really nothing else we could do with cyclic transactions.
They never complete today and have to be terminated manually with
terminate_all(). Using issue_pending() to move to a next cyclic
transaction doesn't change the existing behaviour by replacing a useful
(and used) feature, as issue_pending() is currently a no-op for cyclic
transactions. The newly issued transaction is never considered, and
calling terminate_all() will cancel the issued transactions. By
extending the behaviour of issue_pending(), we're making a new use case
possible, without restricting any other feature, and without "stealing"
issue_pending() and preventing it from implementing another useful
behaviour.

In a nutshell, an important reason why I like using issue_pending() for
this purpose is because it makes cyclic and non-cyclic transactions
behave more similarly, which I think is good from an API consistency
point of view.

> Also, we need to keep in mind that the dmaengine won't stop a cyclic
> txn. It would keep running and start the next transfer (in this case,
> restart from the beginning) while it also gives you an interrupt. Here
> we would be required to stop it and then start a new one...

We wouldn't be required to stop it in the middle, the expected behaviour
is for the DMA engine to complete the cyclic transaction until the end
of the cycle and then replace it by the new one. That's exactly what
happens for non-cyclic transactions when you call issue_pending(), which
makes me like this solution.

> Or perhaps remove the cyclic setting from the txn when a new one
> arrives and that behaviour IMO is controller dependent, not sure if
> all controllers support it..

At the very least I would assume controllers to be able to stop a cyclic
transaction forcefully, otherwise terminate_all() could never be
implemented. This may not lead to a graceful switch from one cyclic
transaction to another one if the hardware doesn't allow doing so. In
that case I think tx_submit() could return an error, or we could turn
issue_pending() into an int operation to signal the error. Note that
there's no need to mass-patch drivers here, if a DMA engine client
issues a second cyclic transaction while one is in progress, the second
transaction won't be considered today. Signalling an error is in my
opinion a useful feature, but not doing so in DMA engine drivers can't
be a regression. We could also add a flag to tell whether this mode of
operation is supported.
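
For instance a new field in struct dma_slave_caps could be used; the
cyclic_replace name below is purely hypothetical, just to illustrate how
a client could check for the capability:

struct dma_slave_caps caps;

if (dma_get_slave_caps(chan, &caps) || !caps.cyclic_replace) {
	/*
	 * The engine can't atomically replace a running cyclic transfer:
	 * fall back to terminate_all() plus reprogramming (and accept the
	 * possible missed frame), or reject the channel.
	 */
}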

> > > That would be a clean way to handle it. We were missing this API for a
> > > long time to be able to cancel the ongoing transfer (whether it is
> > > cyclic or slave_sg, or memcpy) and move to the next one if there is one
> > > pending.
> > 
> > Note that this new terminate API wouldn't terminate the ongoing transfer
> > immediately, it would complete first, until the end of the cycle for
> > cyclic transfers, and until the end of the whole transfer otherwise.
> > This new operation would thus essentially be a no-op for non-cyclic
> > transfers. I don't see how it would help :-) Do you have any particular
> > use case in mind ?
> 
> Yeah that is something more to think about. Do we really abort here or
> wait for the txn to complete? I think Peter needs the former and yours
> falls in the latter category

I definitely need the latter, otherwise the display will flicker (or
completely misoperate) every time a new frame is displayed, which isn't
a good idea :-) I'm not sure about Peter's use cases, but it seems to me
that aborting a transaction immediately is racy in most cases, unless
the DMA engine supports byte-level residue reporting. One non-intrusive
option would be to add a flag to signal that a newly issued transaction
should interrupt the current transaction immediately.

-- 
Regards,

Laurent Pinchart

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 2/6] dmaengine: Add interleaved cyclic transaction type
  2020-02-14 16:22                             ` Laurent Pinchart
@ 2020-02-17 10:00                               ` Peter Ujfalusi
  2020-02-19  9:25                                 ` Vinod Koul
  0 siblings, 1 reply; 27+ messages in thread
From: Peter Ujfalusi @ 2020-02-17 10:00 UTC (permalink / raw)
  To: Laurent Pinchart, Vinod Koul
  Cc: dmaengine, Michal Simek, Hyun Kwon, Tejas Upadhyay,
	Satish Kumar Nagireddy

Hi Laurent, Vinod,

On 14/02/2020 18.22, Laurent Pinchart wrote:
>>> It does, but I really wonder why we need a new terminate operation that
>>> would terminate a single transfer. If we call issue_pending at step B.3,
>>> when the new txn is submitted, we can terminate the current transfer at
>>> that point. It changes the semantics of issue_pending, but only for cyclic
>>> transfers (this whole discussion is only about cyclic transfers). As a
>>> cyclic transfer will be repeated forever until terminated, there's no
>>> use case for issuing a new transfer without terminating the one in
>>> progress. I thus don't think we need a new terminate operation: the only
>>> thing that makes sense to do when submitting a new cyclic transfer is to
>>> terminate the current one and switch to the new one, and we already have
>>> all the APIs we need to enable this behaviour.
>>
>> The issue_pending() is a NOP when engine is already running.
> 
> That's not totally right. issue_pending() still moves submitted but not
> issued transactions from the submitted queue to the issued queue. The
> DMA engine only considers the issued queue, so issue_pending()
> essentially tells the DMA engine to consider the submitted transaction
> for processing after the already issued transactions complete (in the
> non-cyclic case).

Vinod's point is for the cyclic case at the current state. It is a NOP
essentially as we don't have a way to not kill the whole channel.

Just a sidenote: it is not even that clean cut for slave transfers
either as the slave_config must _not_ change between the issued
transfers. Iow, you can not switch between 16bit and 32bit word lengths
with some DMA. EDMA, sDMA can do that, but UDMA can not for example...
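
For context, the config in question is the one set with
dmaengine_slave_config(), which applies to the descriptors prepared
afterwards. A rough example (dev_fifo_addr is a made-up placeholder):

struct dma_slave_config cfg = {
	.direction	= DMA_MEM_TO_DEV,
	.dst_addr	= dev_fifo_addr,	/* placeholder address */
	.dst_addr_width	= DMA_SLAVE_BUSWIDTH_4_BYTES,
	.dst_maxburst	= 16,
};

dmaengine_slave_config(chan, &cfg);
/*
 * On UDMA all descriptors issued after this point must keep the same
 * bus width; it cannot be changed between issued transfers.
 */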

>> The design of APIs is that we submit a txn to pending_list and then the
>> pending_list is started when issue_pending() is called.
>> Or if the engine is already running, it will take next txn from
>> pending_list() when current txn completes.
>>
>> The only consideration here in this case is that the cyclic txn never
>> completes. Do we really treat a new txn submission as an 'indication' of
>> completeness? That is indeed a point to ponder upon.
> 
> The reason why I think we should is two-fold:
> 
> 1. I believe it's semantically aligned with the existing behaviour of
> issue_pending(). As explained above, the operation tells the DMA engine
> to consider submitted transactions for processing when the current (and
> other issued) transactions complete. If we extend the definition of
> complete to cover cyclic transactions, I think it's a good match.

We will end up with different behavior between cyclic and non cyclic
transfers and the new behavior should be somehow supported by existing
drivers.
Yes, issue_pending is moving the submitted tx to the issued queue to be
executed on HW when the current transfer finished.
We only needed this for non cyclic uses so far. Some DMA hw can replace
the current transfer with a new one (re-trigger to fetch the new
configuration, like yours), but some can not (none of the system DMAs
on TI platforms can).
If we say that this is the behavior the DMA drivers must follow then we
will have non compliant DMA drivers. You can not move simply to other
DMA or can not create generic DMA code shared by drivers.

> 2. There's really nothing else we could do with cyclic transactions.
> They never complete today and have to be terminated manually with
> terminate_all(). Using issue_pending() to move to a next cyclic
> transaction doesn't change the existing behaviour by replacing a useful
> (and used) feature, as issue_pending() is currently a no-op for cyclic
> transactions. The newly issued transaction is never considered, and
> calling terminate_all() will cancel the issued transactions. By
> extending the behaviour of issue_pending(), we're making a new use case
> possible, without restricting any other feature, and without "stealing"
> issue_pending() and preventing it from implementing another useful
> behaviour.

But at the same time we make existing drivers non compliant...

Imo a new callback to 'kill' / 'terminate' / 'replace' / 'abort' an
issued cookie would be cleaner.

cookie1 = dmaengine_issue_pending();
// will start the transfer
cookie2 = dmaengine_issue_pending();
// cookie1 still runs, cookie2 is waiting to be executed
dmaengine_abort_tx(chan);
// will kill cookie1 and executes cookie2

dmaengine_abort_tx() could take a cookie as parameter if we wish, so you
can say selectively which issued tx you want to remove, if it is the
running one, then stop it and move to the next one.
In place of the cookie parameter a 0 could imply that I don't know the
cookie, but kill the running one.

We would preserve what issue_pending does atm and would give us a
generic flow of how other drivers should handle such cases.
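
The shape of it could be something like this, purely as a sketch (none
of it exists today, device_abort_tx would be a new optional callback in
struct dma_device):

/* include/linux/dmaengine.h */
static inline int dmaengine_abort_tx(struct dma_chan *chan,
				     dma_cookie_t cookie)
{
	if (!chan->device->device_abort_tx)
		return -ENOSYS;

	/* cookie == 0: abort whatever is currently running. */
	return chan->device->device_abort_tx(chan, cookie);
}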

Note that this is not only useful for cyclic cases. Any driver which
currently uses brute-force termination can be upgraded.
Prime example is UART RX. We issue an RX buffer to receive data, but it
is not guaranteed that the remote will send data which would fill the
buffer and we hit a timeout waiting. We could issue the next buffer and
kill the stale transfer to reclaim the received data.
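
With the sketched dmaengine_abort_tx() from above, the UART RX timeout
path could then look roughly like this (rxbuf_len is a placeholder):

struct dma_tx_state state;

/* RX timed out: check how much data actually arrived... */
dmaengine_tx_status(chan, cookie, &state);
received = rxbuf_len - state.residue;

/* ...then drop only the stale transfer, the next issued one takes over. */
dmaengine_abort_tx(chan, cookie);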

I think this can be even implemented for DMAs which can not do the same
thing as your DMA can.

> In a nutshell, an important reason why I like using issue_pending() for
> this purpose is because it makes cyclic and non-cyclic transactions
> behave more similarly, which I think is good from an API consistency
> point of view.
> 
>> Also, we need to keep in mind that the dmaengine won't stop a cyclic
>> txn. It would keep running and start the next transfer (in this case,
>> restart from the beginning) while it also gives you an interrupt. Here
>> we would be required to stop it and then start a new one...
> 
> We wouldn't be required to stop it in the middle, the expected behaviour
> is for the DMA engine to complete the cyclic transaction until the end
> of the cycle and then replace it by the new one. That's exactly what
> happens for non-cyclic transactions when you call issue_pending(), which
> makes me like this solution.

Right, so we have two different use cases. Replace the current transfers
with the next issued one and abort the current transfer now and arm the
next issued one.
dmaengine_abort_tx(chan, cookie, forced) ?
forced == false: replace it at cyclic boundary
forced == true: right away (as HW allows), do not wait for cyclic round

>> Or perhaps remove the cyclic setting from the txn when a new one
>> arrives and that behaviour IMO is controller dependent, not sure if
>> all controllers support it..
> 
> At the very least I would assume controllers to be able to stop a cyclic
> transaction forcefully, otherwise terminate_all() could never be
> implemented. This may not lead to a graceful switch from one cyclic
> transaction to another one if the hardware doesn't allow doing so. In
> that case I think tx_submit() could return an error, or we could turn
> issue_pending() into an int operation to signal the error. Note that
> there's no need to mass-patch drivers here, if a DMA engine client
> issues a second cyclic transaction while one is in progress, the second
> transaction won't be considered today. Signalling an error is in my
> opinion a useful feature, but not doing so in DMA engine drivers can't
> be a regression. We could also add a flag to tell whether this mode of
> operation is supported.

My problem is that it is changing the behavior of issue_pending() for
cyclic. If we document this then all existing DMA drivers are broken
(not compliant with the API documentation) as they don't do this.


>>>> That would be a clean way to handle it. We were missing this API for a
>>>> long time to be able to cancel the ongoing transfer (whether it is
>>>> cyclic or slave_sg, or memcpy) and move to the next one if there is one
>>>> pending.
>>>
>>> Note that this new terminate API wouldn't terminate the ongoing transfer
>>> immediately, it would complete first, until the end of the cycle for
>>> cyclic transfers, and until the end of the whole transfer otherwise.
>>> This new operation would thus essentially be a no-op for non-cyclic
>>> transfers. I don't see how it would help :-) Do you have any particular
>>> use case in mind ?
>>
>> Yeah that is something more to think about. Do we really abort here or
>> wait for the txn to complete? I think Peter needs the former and yours
>> falls in the latter category
> 
> I definitely need the latter, otherwise the display will flicker (or
> completely misoperate) every time a new frame is displayed, which isn't
> a good idea :-)

Sure, and it is a great feature.

> I'm not sure about Peter's use cases, but it seems to me
> that aborting a transaction immediately is racy in most cases, unless
> the DMA engine supports byte-level residue reporting.

Sort of yes. With EDMA, sDMA I can just kill the channel and set up a
new one right away.
UDMA on the other hand is not that forgiving... I would need to kill the
channel, wait for the termination to complete, reconfigure the channel
and execute the new transfer.

But with a separate callback API at least there will be an entry point
when this can be initiated and handled.
Fwiw, I think it should be simple to add this functionality to them, the
code is kind of handling it in other parts, but implementing it in the
issue_pending() is not really a clean solution.

In a channel you can run slave_sg transfers followed by cyclic if you
wish. A slave channel is what it is: a slave channel, which may be capable
of executing slave_sg and/or cyclic (and/or interleaved) transfers.
If issue_pending() is to take care of this then we need to check if the current
transfer is cyclic or not and decide based on that.

With a separate callback we in the DMA driver just need to do what the
client is asking for and no need to think.

> One non-intrusive
> option would be to add a flag to signal that a newly issued transaction
> should interrupt the current transaction immediately.

- Péter

Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki.
Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 2/6] dmaengine: Add interleaved cyclic transaction type
  2020-02-17 10:00                               ` Peter Ujfalusi
@ 2020-02-19  9:25                                 ` Vinod Koul
  0 siblings, 0 replies; 27+ messages in thread
From: Vinod Koul @ 2020-02-19  9:25 UTC (permalink / raw)
  To: Peter Ujfalusi
  Cc: Laurent Pinchart, dmaengine, Michal Simek, Hyun Kwon,
	Tejas Upadhyay, Satish Kumar Nagireddy

On 17-02-20, 12:00, Peter Ujfalusi wrote:
> Hi Laurent, Vinod,
> 
> On 14/02/2020 18.22, Laurent Pinchart wrote:
> >>> It does, but I really wonder why we need a new terminate operation that
> >>> would terminate a single transfer. If we call issue_pending at step B.3,
> >>> when the new txn is submitted, we can terminate the current transfer at
> >>> that point. It changes the semantics of issue_pending, but only for cyclic
> >>> transfers (this whole discussion is only about cyclic transfers). As a
> >>> cyclic transfer will be repeated forever until terminated, there's no
> >>> use case for issuing a new transfer without terminating the one in
> >>> progress. I thus don't think we need a new terminate operation: the only
> >>> thing that makes sense to do when submitting a new cyclic transfer is to
> >>> terminate the current one and switch to the new one, and we already have
> >>> all the APIs we need to enable this behaviour.
> >>
> >> The issue_pending() is a NOP when engine is already running.
> > 
> > That's not totally right. issue_pending() still moves submitted but not
> > issued transactions from the submitted queue to the issued queue. The
> > DMA engine only considers the issued queue, so issue_pending()
> > essentially tells the DMA engine to consider the submitted transaction
> > for processing after the already issued transactions complete (in the
> > non-cyclic case).
> 
> Vinod's point is about the cyclic case as things stand today. It is
> essentially a NOP, as we have no way to avoid killing the whole channel.

Or, IOW, there is no descriptor movement to the hardware...
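
For reference, in virt-dma based drivers the queue movement Laurent
describes is just a list splice from the channel's submitted list to its
issued list; roughly (based on drivers/dma/virt-dma.h, the exact body may
differ slightly):

/* Called by the driver's device_issue_pending() with vc->lock held.
 * Everything submitted so far becomes visible to the hardware handling
 * code; returns true if there is now work to start.
 */
static inline bool vchan_issue_pending(struct virt_dma_chan *vc)
{
        list_splice_tail_init(&vc->desc_submitted, &vc->desc_issued);
        return !list_empty(&vc->desc_issued);
}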

> Just a side note: it is not even that clear-cut for slave transfers
> either, as the slave_config must _not_ change between the issued
> transfers. IOW, you cannot switch between 16-bit and 32-bit word lengths
> with some DMA controllers. EDMA and sDMA can do that, but UDMA cannot,
> for example...
> 
> >> The design of the API is that we submit a txn to the pending_list and
> >> then the pending_list is started when issue_pending() is called.
> >> Or, if the engine is already running, it will take the next txn from
> >> the pending_list when the current txn completes.
> >>
> >> The only consideration in this case is that the cyclic txn never
> >> completes. Do we really treat a new txn submission as an 'indication' of
> >> completeness? That is indeed a point to ponder upon.
> > 
> > The reason why I think we should is two-fold:
> > 
> > 1. I believe it's semantically aligned with the existing behaviour of
> > issue_pending(). As explained above, the operation tells the DMA engine
> > to consider submitted transactions for processing when the current (and
> > other issued) transactions complete. If we extend the definition of
> > complete to cover cyclic transactions, I think it's a good match.
> 
> We will end up with different behavior between cyclic and non-cyclic
> transfers, and the new behavior would somehow have to be supported by
> existing drivers.
> Yes, issue_pending() moves the submitted tx to the issued queue to be
> executed on HW when the current transfer finishes.
> We only needed this for non-cyclic uses so far. Some DMA hw can replace
> the current transfer with a new one (re-trigger to fetch the new
> configuration, like yours), but some cannot (none of the system DMAs
> on TI platforms can).
> If we say that this is the behavior DMA drivers must follow, then we
> will have non-compliant DMA drivers. You cannot simply move to another
> DMA controller, nor create generic DMA code shared by drivers.

That is a very important point for the API. We want no implicit
behaviour, so if we want a behaviour, let us make it explicit.

> > 2. There's really nothing else we could do with cyclic transactions.
> > They never complete today and have to be terminated manually with
> > terminate_all(). Using issue_pending() to move to a next cyclic
> > transaction doesn't change the existing behaviour by replacing a useful
> > (and used) feature, as issue_pending() is currently a no-op for cyclic
> > transactions. The newly issued transaction is never considered, and
> > calling terminate_all() will cancel the issued transactions. By
> > extending the behaviour of issue_pending(), we're making a new use case
> > possible, without restricting any other feature, and without "stealing"
> > issue_pending() and preventing it from implementing another useful
> > behaviour.
> 
> But at the same time we make existing drivers non-compliant...
> 
> IMO a new callback to 'kill' / 'terminate' / 'replace' / 'abort' an
> issued cookie would be cleaner.
> 
> cookie1 = dmaengine_issue_pending();
> // will start the transfer
> cookie2 = dmaengine_issue_pending();
> // cookie1 still runs, cookie2 is waiting to be executed
> dmaengine_abort_tx(chan);
> // will kill cookie1 and execute cookie2

Right, and we need a kill mode which kills cookie1 at the end of the
transfer (conditional on the hw supporting that).

I think it should be a generic API, usable in both the cyclic and
non-cyclic cases.
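
Expressed with the existing client API (where the cookie comes from
dmaengine_submit() rather than from issue_pending()), the flow sketched
above would look roughly like this; dmaengine_abort_tx() is the proposed
call and does not exist yet:

static void replace_running_transfer(struct dma_chan *chan,
                                     struct dma_async_tx_descriptor *desc1,
                                     struct dma_async_tx_descriptor *desc2)
{
        dma_cookie_t cookie1, cookie2;

        cookie1 = dmaengine_submit(desc1);
        dma_async_issue_pending(chan);  /* starts desc1 (e.g. cyclic) */

        cookie2 = dmaengine_submit(desc2);
        dma_async_issue_pending(chan);  /* desc1 still runs, desc2 waits */

        if (dma_submit_error(cookie1) || dma_submit_error(cookie2))
                return;

        /* Proposed, not-yet-existing call: stop cookie1 (right away, or
         * only at the end of the current cycle) and move on to cookie2. */
        dmaengine_abort_tx(chan);
}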

> 
> dmaengine_abort_tx() could take a cookie as a parameter if we wish, so
> you can select which issued tx you want to remove; if it is the running
> one, then stop it and move to the next one.
> In place of the cookie parameter, a 0 could imply "I don't know the
> cookie, but kill the running one".
> 
> We would preserve what issue_pending() does at the moment, and it would
> give us a generic flow for how other drivers should handle such cases.
> 
> Note that this is not only useful for cyclic cases. Any driver which
> currently uses brute-force termination can be upgraded.
> A prime example is UART RX. We issue an RX buffer to receive data, but
> it is not guaranteed that the remote will send enough data to fill the
> buffer, and we hit a timeout waiting. We could issue the next buffer and
> kill the stale transfer to reclaim the received data.
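
A rough sketch of that UART RX flow, assuming the proposed
dmaengine_abort_tx() exists; the residue from dmaengine_tx_status() is
what lets the driver reclaim the bytes received so far:

static void uart_rx_timeout(struct dma_chan *chan, dma_cookie_t stale_cookie,
                            size_t buf_len,
                            struct dma_async_tx_descriptor *next_desc)
{
        struct dma_tx_state state;
        size_t received;

        /* How much of the stale RX buffer was actually filled? */
        dmaengine_tx_status(chan, stale_cookie, &state);
        received = buf_len - state.residue;
        /* ... push 'received' bytes to the tty layer here ... */

        /* Queue the next RX buffer... */
        dmaengine_submit(next_desc);
        dma_async_issue_pending(chan);

        /* ...and kill the stale transfer so the new one can start
         * (proposed call, not in the current API). */
        dmaengine_abort_tx(chan);
}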
> 
> I think this can even be implemented for DMAs which cannot do the same
> thing as your DMA can.
> 
> > In a nutshell, an important reason why I like using issue_pending() for
> > this purpose is because it makes cyclic and non-cyclic transactions
> > behave more similarly, which I think is good from an API consistency
> > point of view.
> > 
> >> Also, we need to keep in mind that the dmaengine won't stop a cyclic
> >> txn. It would keep running and start the next transfer (in this case
> >> again from the start) while it also gives you an interrupt. Here we
> >> would be required to stop it and then start a new one...
> > 
> > We wouldn't be required to stop it in the middle; the expected behaviour
> > is for the DMA engine to complete the cyclic transaction until the end
> > of the cycle and then replace it with the new one. That's exactly what
> > happens for non-cyclic transactions when you call issue_pending(), which
> > makes me like this solution.
> 
> Right, so we have two different use cases: replace the current transfer
> with the next issued one, and abort the current transfer now and arm the
> next issued one.
> dmaengine_abort_tx(chan, cookie, forced) ?
> forced == false: replace it at cyclic boundary
> forced == true: right away (as HW allows), do not wait for cyclic round
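
Putting the two modes together, the proposed operation could have a
signature along these lines (purely illustrative; nothing of this exists
in <linux/dmaengine.h> today):

/*
 * Hypothetical API summarising the proposal above:
 * @cookie: issued transaction to abort, or 0 for "whatever is running"
 * @forced: false - finish the current cyclic period, then switch
 *          true  - stop as soon as the hardware allows
 */
int dmaengine_abort_tx(struct dma_chan *chan, dma_cookie_t cookie,
                       bool forced);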
> 
> >> Or perhaps remove the cyclic setting from the txn when a new one
> >> arrives; that behaviour IMO is controller-dependent, and I am not sure
> >> if all controllers support it...
> > 
> > At the very least I would assume controllers to be able to stop a cyclic
> > transaction forcefully, otherwise terminate_all() could never be
> > implemented. This may not lead to a graceful switch from one cyclic
> > transaction to another one if the hardware doesn't allow doing so. In
> > that case I think tx_submit() could return an error, or we could turn
> > issue_pending() into an int operation to signal the error. Note that
> > there's no need to mass-patch drivers here, if a DMA engine client
> > issues a second cyclic transaction while one is in progress, the second
> > transaction won't be considered today. Signalling an error is in my
> > opinion a useful feature, but not doing so in DMA engine drivers can't
> > be a regression. We could also add a flag to tell whether this mode of
> > operation is supported.
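
One possible shape for such a capability flag, modelled on the existing
struct dma_slave_caps; the cyclic_replace member below is hypothetical:

static bool can_replace_cyclic(struct dma_chan *chan)
{
        struct dma_slave_caps caps;

        if (dma_get_slave_caps(chan, &caps))
                return false;

        /* 'cyclic_replace' is a made-up capability bit meaning "a newly
         * issued cyclic transaction gracefully replaces the running one".
         */
        return caps.cyclic_replace;
}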
> 
> My problem is that it changes the behavior of issue_pending() for
> cyclic transfers. If we document this, then all existing DMA drivers are
> broken (not compliant with the API documentation) as they don't do this.
> 
> 
> >>>> That would be a clean way to handle it. We were missing this API for a
> >>>> long time to be able to cancel the ongoing transfer (whether it is
> >>>> cyclic or slave_sg, or memcpy) and move to the next one if there is one
> >>>> pending.
> >>>
> >>> Note that this new terminate API wouldn't terminate the ongoing transfer
> >>> immediately, it would complete first, until the end of the cycle for
> >>> cyclic transfers, and until the end of the whole transfer otherwise.
> >>> This new operation would thus essentially be a no-op for non-cyclic
> >>> transfers. I don't see how it would help :-) Do you have any particular
> >>> use case in mind ?
> >>
> >> Yeah that is something more to think about. Do we really abort here or
> >> wait for the txn to complete? I think Peter needs the former and yours
> >> falls in the latter category
> > 
> > I definitely need the latter, otherwise the display will flicker (or
> > completely misoperate) every time a new frame is displayed, which isn't
> > a good idea :-)
> 
> Sure, and it is a great feature.
> 
> > I'm not sure about Peter's use cases, but it seems to me
> > that aborting a transaction immediately is racy in most cases, unless
> > the DMA engine supports byte-level residue reporting.
> 
> Sort of, yes. With EDMA and sDMA I can just kill the channel and set up
> a new one right away.
> UDMA on the other hand is not that forgiving... I would need to kill the
> channel, wait for the termination to complete, reconfigure the channel
> and execute the new transfer.
> 
> But with a separate callback API there will at least be a well-defined
> entry point where this can be initiated and handled.
> FWIW, I think it should be simple to add this functionality to those
> drivers; the code already handles something similar elsewhere, but
> implementing it in issue_pending() is not really a clean solution.
> 
> On a channel you can run slave_sg transfers followed by cyclic ones if
> you wish. A slave channel is what it is: a slave channel, which may be
> capable of executing slave_sg and/or cyclic (and/or interleaved)
> transfers.
> If issue_pending() is to take care of this, then we need to check
> whether the current transfer is cyclic or not and decide based on that.
> 
> With a separate callback, we in the DMA driver just need to do what the
> client is asking for, with no need to second-guess.
> 
> > One non-intrusive
> > option would be to add a flag to signal that a newly issued transaction
> > should interrupt the current transaction immediately.
> 
> - Péter
> 
> Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki.
> Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki

-- 
~Vinod

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, back to index

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-23  2:29 [PATCH v3 0/6] dma: Add Xilinx ZynqMP DPDMA driver Laurent Pinchart
2020-01-23  2:29 ` [PATCH v3 1/6] dt: bindings: dma: xilinx: dpdma: DT bindings for Xilinx DPDMA Laurent Pinchart
2020-01-23  2:29 ` [PATCH v3 2/6] dmaengine: Add interleaved cyclic transaction type Laurent Pinchart
2020-01-23  8:03   ` Peter Ujfalusi
2020-01-23  8:43     ` Vinod Koul
2020-01-23  8:51       ` Peter Ujfalusi
2020-01-23 12:23         ` Laurent Pinchart
2020-01-24  6:10           ` Vinod Koul
2020-01-24  8:50             ` Laurent Pinchart
2020-02-10 14:06               ` Laurent Pinchart
2020-02-13 13:29                 ` Vinod Koul
2020-02-13 13:48                   ` Laurent Pinchart
2020-02-13 14:07                     ` Vinod Koul
2020-02-13 14:15                       ` Peter Ujfalusi
2020-02-13 16:52                         ` Laurent Pinchart
2020-02-14  4:23                           ` Vinod Koul
2020-02-14 16:22                             ` Laurent Pinchart
2020-02-17 10:00                               ` Peter Ujfalusi
2020-02-19  9:25                                 ` Vinod Koul
2020-01-24  7:20           ` Peter Ujfalusi
2020-01-24  7:38             ` Peter Ujfalusi
2020-01-24  8:58               ` Laurent Pinchart
2020-01-24  8:56             ` Laurent Pinchart
2020-01-23  2:29 ` [PATCH v3 3/6] dmaengine: virt-dma: Use lockdep to check locking requirements Laurent Pinchart
2020-01-23  2:29 ` [PATCH v3 4/6] dmaengine: xilinx: dpdma: Add the Xilinx DisplayPort DMA engine driver Laurent Pinchart
2020-01-23  2:29 ` [PATCH v3 5/6] dmaengine: xilinx: dpdma: Add debugfs support Laurent Pinchart
2020-01-23  2:29 ` [PATCH v3 6/6] arm64: dts: zynqmp: Add DPDMA node Laurent Pinchart
