linux-kernel.vger.kernel.org archive mirror
* [PATCH v3 0/7] Add-DMA-MDMA-chaining-support
@ 2018-09-28 13:01 Pierre-Yves MORDRET
  2018-09-28 13:01 ` [PATCH v3 1/7] dt-bindings: stm32-dma: Add DMA/MDMA chaining support bindings Pierre-Yves MORDRET
                   ` (6 more replies)
  0 siblings, 7 replies; 26+ messages in thread
From: Pierre-Yves MORDRET @ 2018-09-28 13:01 UTC (permalink / raw)
  To: Vinod Koul, Rob Herring, Mark Rutland, Alexandre Torgue,
	Maxime Coquelin, Dan Williams, devicetree, dmaengine,
	linux-arm-kernel, linux-kernel
  Cc: Pierre-Yves MORDRET

This series adds support for M2M transfers triggered by the STM32 DMA in
order to transfer data from/to SRAM to/from DDR.

Normally, this mode should not be needed, as transferring data from/to DDR
is supported by the STM32 DMA.
However, the STM32 DMA doesn't have the ability to generate burst transfers
on the DDR, as it only embeds a 4-word FIFO, although the minimal burst
length on the DDR is 8 words.
Due to this constraint, the STM32 DMA transfers data from/to DDR in single
accesses, which could pollute the DDR.
To avoid this, we have to use SRAM for all transfers where the STM32 DMA is
involved.

A HW design has been specially put in place to allow this chaining: the DMA
interrupt is connected to the GIC and to an MDMA request line as well. This
grants the possibility to trigger an MDMA transfer on DMA completion.
At the same time, the MDMA has the ability to acknowledge the DMA. The aim is
to have a self-refreshing mechanism to transfer from/to a device to/from DDR
with minimal SW support.
For instance, the DMA is set in cyclic double-buffer mode to feed the SRAM
while the MDMA transfers to DDR thanks to LLI.
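
From a client driver standpoint the chaining is transparent: the standard
dmaengine API is used unchanged, and the DMA driver internally requests and
drives the MDMA channel. A rough D2M usage sketch (names such as dev,
periph_fifo_addr, sgt and rx_done are illustrative, not from this series;
the sg table is assumed already dma-mapped):

    struct dma_slave_config cfg = {
            .direction      = DMA_DEV_TO_MEM,
            .src_addr       = periph_fifo_addr, /* device FIFO address */
            .src_addr_width = DMA_SLAVE_BUSWIDTH_4_BYTES,
            .src_maxburst   = 4,
    };
    struct dma_async_tx_descriptor *desc;
    struct dma_chan *chan;

    chan = dma_request_chan(dev, "rx");   /* "dmas"/"dma-names" in DT */
    dmaengine_slave_config(chan, &cfg);
    desc = dmaengine_prep_slave_sg(chan, sgt.sgl, sgt.nents,
                                   DMA_DEV_TO_MEM, DMA_PREP_INTERRUPT);
    desc->callback = rx_done;             /* runs once data landed in DDR */
    dmaengine_submit(desc);
    dma_async_issue_pending(chan);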

---
  Version history:
    v3:
       * Solve KBuild warnings
    v2:
       * Rework binding content
    v1:
       * Initial
---

Pierre-Yves MORDRET (7):
  dt-bindings: stm32-dma: Add DMA/MDMA chaining support bindings
  dt-bindings: stm32-dmamux: Add one cell to support DMA/MDMA chain
  dt-bindings: stm32-mdma: Add DMA/MDMA chaining support bindings
  dmaengine: stm32-dma: Add DMA/MDMA chaining support
  dmaengine: stm32-mdma: Add DMA/MDMA chaining support
  dmaengine: stm32-dma: enable descriptor_reuse
  dmaengine: stm32-mdma: enable descriptor_reuse

 .../devicetree/bindings/dma/stm32-dma.txt          |  27 +-
 .../devicetree/bindings/dma/stm32-dmamux.txt       |   6 +-
 .../devicetree/bindings/dma/stm32-mdma.txt         |  12 +-
 drivers/dma/stm32-dma.c                            | 903 ++++++++++++++++++---
 drivers/dma/stm32-mdma.c                           | 133 ++-
 5 files changed, 949 insertions(+), 132 deletions(-)

-- 
2.7.4



* [PATCH v3 1/7] dt-bindings: stm32-dma: Add DMA/MDMA chaining support bindings
  2018-09-28 13:01 [PATCH v3 0/7] Add-DMA-MDMA-chaining-support Pierre-Yves MORDRET
@ 2018-09-28 13:01 ` Pierre-Yves MORDRET
  2018-10-07 14:57   ` Vinod
  2018-10-12 14:42   ` Rob Herring
  2018-09-28 13:01 ` [PATCH v3 2/7] dt-bindings: stm32-dmamux: Add one cell to support DMA/MDMA chain Pierre-Yves MORDRET
                   ` (5 subsequent siblings)
  6 siblings, 2 replies; 26+ messages in thread
From: Pierre-Yves MORDRET @ 2018-09-28 13:01 UTC (permalink / raw)
  To: Vinod Koul, Rob Herring, Mark Rutland, Alexandre Torgue,
	Maxime Coquelin, Dan Williams, devicetree, dmaengine,
	linux-arm-kernel, linux-kernel
  Cc: Pierre-Yves MORDRET

From: M'boumba Cedric Madianga <cedric.madianga@gmail.com>

This patch adds DMA bindings to support DMA/MDMA chaining transfers.
2 bits are used to manage the DMA FIFO threshold.
1 bit is used to manage the DMA/MDMA chaining feature.
2 bits are used to specify the SRAM size to use for DMA/MDMA chaining.
The size in bytes for a given order is computed by the formula:
    (2 ^ order) * PAGE_SIZE.
The order is given by those 2 bits.
For cyclic, when chaining is chosen, any value can be set: the SRAM buffer
size will rely on the period size and not on this DT value.
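
For reference, the driver side (patch 4 of this series) decodes that fourth
cell with the masks below; building the value for a client could look like
this sketch (the 'features' variable name is illustrative):

    /* Masks used by patch 4 to decode the 4th dma cell */
    #define STM32_DMA_THRESHOLD_FTR_MASK    GENMASK(1, 0) /* FIFO threshold */
    #define STM32_DMA_MDMA_CHAIN_FTR_MASK   BIT(2)        /* chaining on/off */
    #define STM32_DMA_MDMA_SRAM_SIZE_MASK   GENMASK(4, 3) /* SRAM buf order */

    /* e.g. full FIFO threshold (0x3), chaining enabled, order 1 (2 pages) */
    u32 features = 0x3 | (0x1 << 2) | (0x1 << 3);         /* = 0xf */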

Signed-off-by: Pierre-Yves MORDRET <pierre-yves.mordret@st.com>
---
  Version history:
    v3:
    v2:
       * rework content
    v1:
       * Initial
---
---
 .../devicetree/bindings/dma/stm32-dma.txt          | 27 +++++++++++++++++++++-
 1 file changed, 26 insertions(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/dma/stm32-dma.txt b/Documentation/devicetree/bindings/dma/stm32-dma.txt
index c5f5190..2bac8c7 100644
--- a/Documentation/devicetree/bindings/dma/stm32-dma.txt
+++ b/Documentation/devicetree/bindings/dma/stm32-dma.txt
@@ -17,6 +17,12 @@ Optional properties:
 - resets: Reference to a reset controller asserting the DMA controller
 - st,mem2mem: boolean; if defined, it indicates that the controller supports
   memory-to-memory transfer
+- dmas: A list of eight dma specifiers, one for each entry in dma-names.
+  Refer to stm32-mdma.txt for more details.
+- dma-names: should contain "ch0", "ch1", "ch2", "ch3", "ch4", "ch5", "ch6" and
+  "ch7"; each entry represents a STM32 DMA channel connected to a STM32 MDMA one.
+- memory-region : phandle to a node describing memory to be used for
+  M2M intermediate transfer between DMA and MDMA.
 
 Example:
 
@@ -36,6 +42,16 @@ Example:
 		st,mem2mem;
 		resets = <&rcc 150>;
 		dma-requests = <8>;
+		dmas = <&mdma1 8 0x10 0x1200000a 0x40026408 0x00000020 1>,
+		       <&mdma1 9 0x10 0x1200000a 0x40026408 0x00000800 1>,
+		       <&mdma1 10 0x10 0x1200000a 0x40026408 0x00200000 1>,
+		       <&mdma1 11 0x10 0x1200000a 0x40026408 0x08000000 1>,
+		       <&mdma1 12 0x10 0x1200000a 0x4002640C 0x00000020 1>,
+		       <&mdma1 13 0x10 0x1200000a 0x4002640C 0x00000800 1>,
+		       <&mdma1 14 0x10 0x1200000a 0x4002640C 0x00200000 1>,
+		       <&mdma1 15 0x10 0x1200000a 0x4002640C 0x08000000 1>;
+		dma-names = "ch0", "ch1", "ch2", "ch3", "ch4", "ch5", "ch6", "ch7";
+		memory-region = <&sram_dmapool>;
 	};
 
 * DMA client
@@ -68,7 +84,16 @@ channel: a phandle to the DMA controller plus the following four integer cells:
 	0x1: 1/2 full FIFO
 	0x2: 3/4 full FIFO
 	0x3: full FIFO
-
+ -bit 2: Intermediate M2M transfer from/to DDR to/from SRAM through MDMA
+	0: MDMA not used to generate an intermediate M2M transfer
+	1: MDMA used to generate an intermediate M2M transfer.
+ -bit 3-4: indicates the SRAM buffer size in (2^order)*PAGE_SIZE.
+	PAGE_SIZE is defined by Linux (4 KiB by default): include/asm-generic/page.h.
+	The order is given by those 2 bits, starting at 0.
+	Valid only when the intermediate M2M transfer is enabled.
+	For cyclic, when the intermediate M2M transfer is chosen, any value can
+	be set: the SRAM buffer size will rely on the period size and not on
+	this DT value.
 
 Example:
 
-- 
2.7.4



* [PATCH v3 2/7] dt-bindings: stm32-dmamux: Add one cell to support DMA/MDMA chain
  2018-09-28 13:01 [PATCH v3 0/7] Add-DMA-MDMA-chaining-support Pierre-Yves MORDRET
  2018-09-28 13:01 ` [PATCH v3 1/7] dt-bindings: stm32-dma: Add DMA/MDMA chaining support bindings Pierre-Yves MORDRET
@ 2018-09-28 13:01 ` Pierre-Yves MORDRET
  2018-10-07 14:58   ` Vinod
  2018-10-12 14:46   ` Rob Herring
  2018-09-28 13:01 ` [PATCH v3 3/7] dt-bindings: stm32-mdma: Add DMA/MDMA chaining support bindings Pierre-Yves MORDRET
                   ` (4 subsequent siblings)
  6 siblings, 2 replies; 26+ messages in thread
From: Pierre-Yves MORDRET @ 2018-09-28 13:01 UTC (permalink / raw)
  To: Vinod Koul, Rob Herring, Mark Rutland, Alexandre Torgue,
	Maxime Coquelin, Dan Williams, devicetree, dmaengine,
	linux-arm-kernel, linux-kernel
  Cc: Pierre-Yves MORDRET

From: M'boumba Cedric Madianga <cedric.madianga@gmail.com>

Add one cell to support DMA/MDMA chaining.

Signed-off-by: Pierre-Yves MORDRET <pierre-yves.mordret@st.com>
---
  Version history:
    v3:
    v2:
       * rework content
    v1:
       * Initial
---
---
 Documentation/devicetree/bindings/dma/stm32-dmamux.txt | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/Documentation/devicetree/bindings/dma/stm32-dmamux.txt b/Documentation/devicetree/bindings/dma/stm32-dmamux.txt
index 1b893b2..5e92b59 100644
--- a/Documentation/devicetree/bindings/dma/stm32-dmamux.txt
+++ b/Documentation/devicetree/bindings/dma/stm32-dmamux.txt
@@ -4,9 +4,9 @@ Required properties:
 - compatible:	"st,stm32h7-dmamux"
 - reg:		Memory map for accessing module
 - #dma-cells:	Should be set to <3>.
-		First parameter is request line number.
-		Second is DMA channel configuration
-		Third is Fifo threshold
+-		First parameter is request line number.
+-		Second is DMA channel configuration
+-		Third is a 32-bit bitfield
 		For more details about the three cells, please see
 		stm32-dma.txt documentation binding file
 - dma-masters:	Phandle pointing to the DMA controllers.
-- 
2.7.4



* [PATCH v3 3/7] dt-bindings: stm32-mdma: Add DMA/MDMA chaining support bindings
  2018-09-28 13:01 [PATCH v3 0/7] Add-DMA-MDMA-chaining-support Pierre-Yves MORDRET
  2018-09-28 13:01 ` [PATCH v3 1/7] dt-bindings: stm32-dma: Add DMA/MDMA chaining support bindings Pierre-Yves MORDRET
  2018-09-28 13:01 ` [PATCH v3 2/7] dt-bindings: stm32-dmamux: Add one cell to support DMA/MDMA chain Pierre-Yves MORDRET
@ 2018-09-28 13:01 ` Pierre-Yves MORDRET
  2018-10-07 14:59   ` Vinod
  2018-09-28 13:01 ` [PATCH v3 4/7] dmaengine: stm32-dma: Add DMA/MDMA chaining support Pierre-Yves MORDRET
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 26+ messages in thread
From: Pierre-Yves MORDRET @ 2018-09-28 13:01 UTC (permalink / raw)
  To: Vinod Koul, Rob Herring, Mark Rutland, Alexandre Torgue,
	Maxime Coquelin, Dan Williams, devicetree, dmaengine,
	linux-arm-kernel, linux-kernel
  Cc: Pierre-Yves MORDRET

From: M'boumba Cedric Madianga <cedric.madianga@gmail.com>

This patch adds the description of the two properties needed to support M2M
transfers triggered by the STM32 DMA when its transfer is complete.
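
For reference, patch 5 of this series consumes this sixth cell in its
of_xlate callback, roughly as follows (abridged from that patch):

    if (dma_spec->args_count < 6) {
            dev_err(mdma2dev(dmadev), "Bad number of args\n");
            return NULL;
    }
    /* other cells: request line, priority, transfer config, masks */
    config.m2m_hw = dma_spec->args[5]; /* M2M transfer with HW trigger */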

Signed-off-by: Pierre-Yves MORDRET <pierre-yves.mordret@st.com>
---
  Version history:
    v3:
    v2:
       * rework content
    v1:
       * Initial
---
---
 Documentation/devicetree/bindings/dma/stm32-mdma.txt | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/Documentation/devicetree/bindings/dma/stm32-mdma.txt b/Documentation/devicetree/bindings/dma/stm32-mdma.txt
index d18772d..27c2812 100644
--- a/Documentation/devicetree/bindings/dma/stm32-mdma.txt
+++ b/Documentation/devicetree/bindings/dma/stm32-mdma.txt
@@ -10,7 +10,7 @@ Required properties:
 - interrupts: Should contain the MDMA interrupt.
 - clocks: Should contain the input clock of the DMA instance.
 - resets: Reference to a reset controller asserting the DMA controller.
-- #dma-cells : Must be <5>. See DMA client paragraph for more details.
+- #dma-cells : Must be <6>. See DMA client paragraph for more details.
 
 Optional properties:
 - dma-channels: Number of DMA channels supported by the controller.
@@ -26,7 +26,7 @@ Example:
 		interrupts = <122>;
 		clocks = <&timer_clk>;
 		resets = <&rcc 992>;
-		#dma-cells = <5>;
+		#dma-cells = <6>;
 		dma-channels = <16>;
 		dma-requests = <32>;
 		st,ahb-addr-masks = <0x20000000>, <0x00000000>;
@@ -35,8 +35,8 @@ Example:
 * DMA client
 
 DMA clients connected to the STM32 MDMA controller must use the format
-described in the dma.txt file, using a five-cell specifier for each channel:
-a phandle to the MDMA controller plus the following five integer cells:
+described in the dma.txt file, using a six-cell specifier for each channel:
+a phandle to the MDMA controller plus the following six integer cells:
 
 1. The request line number
 2. The priority level
@@ -76,6 +76,10 @@ a phandle to the MDMA controller plus the following five integer cells:
    if no HW ack signal is used by the MDMA client
 5. A 32bit mask specifying the value to be written to acknowledge the request
    if no HW ack signal is used by the MDMA client
+6. A bitfield value specifying whether the MDMA client wants to generate M2M
+   transfer with HW trigger (1) or not (0). This bitfield should only be set
+   for M2M transfers triggered by a STM32 DMA client. The memory devices
+   involved in this kind of transfer are SRAM and DDR.
 
 Example:
 
-- 
2.7.4



* [PATCH v3 4/7] dmaengine: stm32-dma: Add DMA/MDMA chaining support
  2018-09-28 13:01 [PATCH v3 0/7] Add-DMA-MDMA-chaining-support Pierre-Yves MORDRET
                   ` (2 preceding siblings ...)
  2018-09-28 13:01 ` [PATCH v3 3/7] dt-bindings: stm32-mdma: Add DMA/MDMA chaining support bindings Pierre-Yves MORDRET
@ 2018-09-28 13:01 ` Pierre-Yves MORDRET
  2018-10-07 16:00   ` Vinod
  2018-09-28 13:01 ` [PATCH v3 5/7] dmaengine: stm32-mdma: " Pierre-Yves MORDRET
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 26+ messages in thread
From: Pierre-Yves MORDRET @ 2018-09-28 13:01 UTC (permalink / raw)
  To: Vinod Koul, Rob Herring, Mark Rutland, Alexandre Torgue,
	Maxime Coquelin, Dan Williams, devicetree, dmaengine,
	linux-arm-kernel, linux-kernel
  Cc: Pierre-Yves MORDRET

This patch adds DMA/MDMA chaining support.
It introduces an intermediate transfer between peripherals and STM32 DMA.
This intermediate transfer is triggered by SW for a single M2D transfer and
by the STM32 DMA IP for all other modes (sg, cyclic) and direction (D2M).

A generic SRAM allocator is used for this intermediate buffer.
Each DMA channel is able to define its SRAM needs to achieve the chaining
feature: (2 ^ order) * PAGE_SIZE.
For cyclic, the SRAM buffer size is derived from the period length (rounded
up to PAGE_SIZE).
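
Concretely, the per-channel SRAM budget is derived as below (distilled from
the diff that follows; for cyclic the allocation is doubled to implement
double buffering):

    /* sg/slave case: size comes from the DT feature bits (patch 1) */
    chan->sram_size = (1 << STM32_DMA_MDMA_SRAM_SIZE_GET(cfg->features)) *
                      STM32_DMA_SRAM_GRANULARITY;   /* i.e. PAGE_SIZE */

    /* cyclic case: the DT value is ignored, the period drives the size */
    chan->sram_size = ALIGN(period_len, STM32_DMA_SRAM_GRANULARITY);
    desc->dma_buf_cpu = gen_pool_dma_alloc(dmadev->sram_pool,
                                           2 * chan->sram_size,
                                           &desc->dma_buf);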

Signed-off-by: Pierre-Yves MORDRET <pierre-yves.mordret@st.com>
---
  Version history:
    v3:
       * Solve KBuild warning
    v2:
    v1:
       * Initial
---
---
 drivers/dma/stm32-dma.c | 879 ++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 772 insertions(+), 107 deletions(-)

diff --git a/drivers/dma/stm32-dma.c b/drivers/dma/stm32-dma.c
index 379e8d5..85e81c4 100644
--- a/drivers/dma/stm32-dma.c
+++ b/drivers/dma/stm32-dma.c
@@ -15,11 +15,14 @@
 #include <linux/dmaengine.h>
 #include <linux/dma-mapping.h>
 #include <linux/err.h>
+#include <linux/genalloc.h>
 #include <linux/init.h>
+#include <linux/iopoll.h>
 #include <linux/jiffies.h>
 #include <linux/list.h>
 #include <linux/module.h>
 #include <linux/of.h>
+#include <linux/of_address.h>
 #include <linux/of_device.h>
 #include <linux/of_dma.h>
 #include <linux/platform_device.h>
@@ -118,6 +121,7 @@
 #define STM32_DMA_FIFO_THRESHOLD_FULL			0x03
 
 #define STM32_DMA_MAX_DATA_ITEMS	0xffff
+#define STM32_DMA_SRAM_GRANULARITY	PAGE_SIZE
 /*
  * Valid transfer starts from @0 to @0xFFFE leading to unaligned scatter
  * gather at boundary. Thus it's safer to round down this value on FIFO
@@ -135,6 +139,12 @@
 /* DMA Features */
 #define STM32_DMA_THRESHOLD_FTR_MASK	GENMASK(1, 0)
 #define STM32_DMA_THRESHOLD_FTR_GET(n)	((n) & STM32_DMA_THRESHOLD_FTR_MASK)
+#define STM32_DMA_MDMA_CHAIN_FTR_MASK	BIT(2)
+#define STM32_DMA_MDMA_CHAIN_FTR_GET(n)	(((n) & STM32_DMA_MDMA_CHAIN_FTR_MASK) \
+					 >> 2)
+#define STM32_DMA_MDMA_SRAM_SIZE_MASK	GENMASK(4, 3)
+#define STM32_DMA_MDMA_SRAM_SIZE_GET(n)	(((n) & STM32_DMA_MDMA_SRAM_SIZE_MASK) \
+					 >> 3)
 
 enum stm32_dma_width {
 	STM32_DMA_BYTE,
@@ -176,15 +186,31 @@ struct stm32_dma_chan_reg {
 	u32 dma_sfcr;
 };
 
+struct stm32_dma_mdma_desc {
+	struct sg_table sgt;
+	struct dma_async_tx_descriptor *desc;
+};
+
+struct stm32_dma_mdma {
+	struct dma_chan *chan;
+	enum dma_transfer_direction dir;
+	dma_addr_t sram_buf;
+	u32 sram_period;
+	u32 num_sgs;
+};
+
 struct stm32_dma_sg_req {
-	u32 len;
+	struct scatterlist stm32_sgl_req;
 	struct stm32_dma_chan_reg chan_reg;
+	struct stm32_dma_mdma_desc m_desc;
 };
 
 struct stm32_dma_desc {
 	struct virt_dma_desc vdesc;
 	bool cyclic;
 	u32 num_sgs;
+	dma_addr_t dma_buf;
+	void *dma_buf_cpu;
 	struct stm32_dma_sg_req sg_req[];
 };
 
@@ -201,6 +227,10 @@ struct stm32_dma_chan {
 	u32 threshold;
 	u32 mem_burst;
 	u32 mem_width;
+	struct stm32_dma_mdma mchan;
+	u32 use_mdma;
+	u32 sram_size;
+	u32 residue_after_drain;
 };
 
 struct stm32_dma_device {
@@ -210,6 +240,7 @@ struct stm32_dma_device {
 	struct reset_control *rst;
 	bool mem2mem;
 	struct stm32_dma_chan chan[STM32_DMA_MAX_CHANNELS];
+	struct gen_pool *sram_pool;
 };
 
 static struct stm32_dma_device *stm32_dma_get_dev(struct stm32_dma_chan *chan)
@@ -497,11 +528,15 @@ static void stm32_dma_stop(struct stm32_dma_chan *chan)
 static int stm32_dma_terminate_all(struct dma_chan *c)
 {
 	struct stm32_dma_chan *chan = to_stm32_dma_chan(c);
+	struct stm32_dma_mdma *mchan = &chan->mchan;
 	unsigned long flags;
 	LIST_HEAD(head);
 
 	spin_lock_irqsave(&chan->vchan.lock, flags);
 
+	if (chan->use_mdma)
+		dmaengine_terminate_async(mchan->chan);
+
 	if (chan->busy) {
 		stm32_dma_stop(chan);
 		chan->desc = NULL;
@@ -514,9 +549,96 @@ static int stm32_dma_terminate_all(struct dma_chan *c)
 	return 0;
 }
 
+static u32 stm32_dma_get_remaining_bytes(struct stm32_dma_chan *chan)
+{
+	u32 dma_scr, width, ndtr;
+	struct stm32_dma_device *dmadev = stm32_dma_get_dev(chan);
+
+	dma_scr = stm32_dma_read(dmadev, STM32_DMA_SCR(chan->id));
+	width = STM32_DMA_SCR_PSIZE_GET(dma_scr);
+	ndtr = stm32_dma_read(dmadev, STM32_DMA_SNDTR(chan->id));
+
+	return ndtr << width;
+}
+
+static int stm32_dma_mdma_drain(struct stm32_dma_chan *chan)
+{
+	struct stm32_dma_mdma *mchan = &chan->mchan;
+	struct stm32_dma_sg_req *sg_req;
+	struct dma_device *ddev = mchan->chan->device;
+	struct dma_async_tx_descriptor *desc = NULL;
+	enum dma_status status;
+	dma_addr_t src_buf, dst_buf;
+	u32 mdma_residue, mdma_wrote, dma_to_write, len;
+	struct dma_tx_state state;
+	int ret;
+
+	/* DMA/MDMA chain: drain remaining data in SRAM */
+
+	/* Get the residue on MDMA side */
+	status = dmaengine_tx_status(mchan->chan, mchan->chan->cookie, &state);
+	if (status == DMA_COMPLETE)
+		return status;
+
+	mdma_residue = state.residue;
+	sg_req = &chan->desc->sg_req[chan->next_sg - 1];
+	len = sg_dma_len(&sg_req->stm32_sgl_req);
+
+	/*
+	 * Total = mdma blocks * sram_period + rest (< sram_period)
+	 * so mdma blocks * sram_period = len - mdma residue - rest
+	 */
+	mdma_wrote = len - mdma_residue - (len % mchan->sram_period);
+
+	/* Remaining data stuck in SRAM */
+	dma_to_write = mchan->sram_period - stm32_dma_get_remaining_bytes(chan);
+	if (dma_to_write > 0) {
+		/* Stop DMA current operation */
+		stm32_dma_disable_chan(chan);
+
+		/* Terminate current MDMA to initiate a new one */
+		dmaengine_terminate_all(mchan->chan);
+
+		/* Double buffer management */
+		src_buf = mchan->sram_buf +
+			  ((mdma_wrote / mchan->sram_period) & 0x1) *
+			  mchan->sram_period;
+		dst_buf = sg_dma_address(&sg_req->stm32_sgl_req) + mdma_wrote;
+
+		desc = ddev->device_prep_dma_memcpy(mchan->chan,
+						    dst_buf, src_buf,
+						    dma_to_write,
+						    DMA_PREP_INTERRUPT);
+		if (!desc)
+			return -EINVAL;
+
+		ret = dma_submit_error(dmaengine_submit(desc));
+		if (ret < 0)
+			return ret;
+
+		status = dma_wait_for_async_tx(desc);
+		if (status != DMA_COMPLETE) {
+			dev_err(chan2dev(chan), "flush() dma_wait_for_async_tx error\n");
+			dmaengine_terminate_async(mchan->chan);
+			return -EBUSY;
+		}
+
+		/* We need to store residue for tx_status() */
+		chan->residue_after_drain = len - (mdma_wrote + dma_to_write);
+	}
+
+	return 0;
+}
+
 static void stm32_dma_synchronize(struct dma_chan *c)
 {
 	struct stm32_dma_chan *chan = to_stm32_dma_chan(c);
+	struct stm32_dma_mdma *mchan = &chan->mchan;
+
+	if (chan->desc && chan->use_mdma && mchan->dir == DMA_DEV_TO_MEM)
+		if (stm32_dma_mdma_drain(chan))
+			dev_err(chan2dev(chan), "%s: can't drain DMA\n",
+				__func__);
 
 	vchan_synchronize(&chan->vchan);
 }
@@ -539,62 +661,232 @@ static void stm32_dma_dump_reg(struct stm32_dma_chan *chan)
 	dev_dbg(chan2dev(chan), "SFCR:  0x%08x\n", sfcr);
 }
 
-static void stm32_dma_configure_next_sg(struct stm32_dma_chan *chan);
-
-static void stm32_dma_start_transfer(struct stm32_dma_chan *chan)
+static int stm32_dma_dummy_memcpy_xfer(struct stm32_dma_chan *chan)
 {
 	struct stm32_dma_device *dmadev = stm32_dma_get_dev(chan);
-	struct virt_dma_desc *vdesc;
+	struct dma_device *ddev = &dmadev->ddev;
+	struct stm32_dma_chan_reg reg;
+	u8 src_buf, dst_buf;
+	dma_addr_t dma_src_buf, dma_dst_buf;
+	u32 ndtr, status;
+	int len, ret;
+
+	ret = 0;
+	src_buf = 0;
+	len = 1;
+
+	dma_src_buf = dma_map_single(ddev->dev, &src_buf, len, DMA_TO_DEVICE);
+	ret = dma_mapping_error(ddev->dev, dma_src_buf);
+	if (ret < 0) {
+		dev_err(chan2dev(chan), "Source buffer map failed\n");
+		return ret;
+	}
+
+	dma_dst_buf = dma_map_single(ddev->dev, &dst_buf, len, DMA_FROM_DEVICE);
+	ret = dma_mapping_error(ddev->dev, dma_dst_buf);
+	if (ret < 0) {
+		dev_err(chan2dev(chan), "Destination buffer map failed\n");
+		dma_unmap_single(ddev->dev, dma_src_buf, len, DMA_TO_DEVICE);
+		return ret;
+	}
+
+	reg.dma_scr =	STM32_DMA_SCR_DIR(STM32_DMA_MEM_TO_MEM) |
+			STM32_DMA_SCR_PBURST(STM32_DMA_BURST_SINGLE) |
+			STM32_DMA_SCR_MBURST(STM32_DMA_BURST_SINGLE) |
+			STM32_DMA_SCR_MINC |
+			STM32_DMA_SCR_PINC |
+			STM32_DMA_SCR_TEIE;
+	reg.dma_spar = dma_src_buf;
+	reg.dma_sm0ar = dma_dst_buf;
+	reg.dma_sfcr = STM32_DMA_SFCR_MASK |
+		STM32_DMA_SFCR_FTH(STM32_DMA_FIFO_THRESHOLD_FULL);
+	reg.dma_sm1ar = dma_dst_buf;
+	reg.dma_sndtr = 1;
+
+	stm32_dma_write(dmadev, STM32_DMA_SCR(chan->id), reg.dma_scr);
+	stm32_dma_write(dmadev, STM32_DMA_SPAR(chan->id), reg.dma_spar);
+	stm32_dma_write(dmadev, STM32_DMA_SM0AR(chan->id), reg.dma_sm0ar);
+	stm32_dma_write(dmadev, STM32_DMA_SFCR(chan->id), reg.dma_sfcr);
+	stm32_dma_write(dmadev, STM32_DMA_SM1AR(chan->id), reg.dma_sm1ar);
+	stm32_dma_write(dmadev, STM32_DMA_SNDTR(chan->id), reg.dma_sndtr);
+
+	/* Clear interrupt status if it is there */
+	status = stm32_dma_irq_status(chan);
+	if (status)
+		stm32_dma_irq_clear(chan, status);
+
+	stm32_dma_dump_reg(chan);
+
+	chan->busy = true;
+	/* Start DMA */
+	reg.dma_scr |= STM32_DMA_SCR_EN;
+	stm32_dma_write(dmadev, STM32_DMA_SCR(chan->id), reg.dma_scr);
+
+	ret = readl_relaxed_poll_timeout_atomic(dmadev->base +
+						STM32_DMA_SNDTR(chan->id),
+						ndtr, !ndtr, 10, 1000);
+	if (ret) {
+		dev_err(chan2dev(chan), "%s: timeout!\n", __func__);
+		ret = -EBUSY;
+	}
+
+	chan->busy = false;
+
+	ret = stm32_dma_disable_chan(chan);
+	status = stm32_dma_irq_status(chan);
+	if (status)
+		stm32_dma_irq_clear(chan, status);
+
+	dma_unmap_single(ddev->dev, dma_src_buf, len, DMA_TO_DEVICE);
+	dma_unmap_single(ddev->dev, dma_dst_buf, len, DMA_FROM_DEVICE);
+
+	return ret;
+}
+
+static int stm32_dma_mdma_flush_remaining(struct stm32_dma_chan *chan)
+{
+	struct stm32_dma_mdma *mchan = &chan->mchan;
 	struct stm32_dma_sg_req *sg_req;
-	struct stm32_dma_chan_reg *reg;
-	u32 status;
+	struct dma_device *ddev = mchan->chan->device;
+	struct dma_async_tx_descriptor *desc = NULL;
+	enum dma_status status;
+	dma_addr_t src_buf, dst_buf;
+	u32 residue, remain, len;
 	int ret;
 
-	ret = stm32_dma_disable_chan(chan);
-	if (ret < 0)
-		return;
+	sg_req = &chan->desc->sg_req[chan->next_sg - 1];
 
-	if (!chan->desc) {
-		vdesc = vchan_next_desc(&chan->vchan);
-		if (!vdesc)
-			return;
+	residue = stm32_dma_get_remaining_bytes(chan);
+	len = sg_dma_len(&sg_req->stm32_sgl_req);
+	remain = len % mchan->sram_period;
 
-		chan->desc = to_stm32_dma_desc(vdesc);
-		chan->next_sg = 0;
+	if (residue > 0 && len > mchan->sram_period &&
+	    ((len % mchan->sram_period) != 0)) {
+		unsigned long dma_sync_wait_timeout =
+			jiffies + msecs_to_jiffies(5000);
+
+		while (residue > 0 &&
+		       residue > (mchan->sram_period - remain)) {
+			if (time_after_eq(jiffies, dma_sync_wait_timeout)) {
+				dev_err(chan2dev(chan),
+					"%s timeout waiting for last bytes\n",
+					__func__);
+				break;
+			}
+			cpu_relax();
+			residue = stm32_dma_get_remaining_bytes(chan);
+		}
+		stm32_dma_disable_chan(chan);
+
+		src_buf = mchan->sram_buf + ((len / mchan->sram_period) & 0x1)
+			* mchan->sram_period;
+		dst_buf = sg_dma_address(&sg_req->stm32_sgl_req) + len -
+			(len % mchan->sram_period);
+
+		desc = ddev->device_prep_dma_memcpy(mchan->chan,
+						    dst_buf, src_buf,
+						    len % mchan->sram_period,
+						    DMA_PREP_INTERRUPT);
+
+		if (!desc)
+			return -EINVAL;
+
+		ret = dma_submit_error(dmaengine_submit(desc));
+		if (ret < 0)
+			return ret;
+
+		status = dma_wait_for_async_tx(desc);
+		if (status != DMA_COMPLETE) {
+			dmaengine_terminate_async(mchan->chan);
+			return -EBUSY;
+		}
 	}
 
-	if (chan->next_sg == chan->desc->num_sgs)
-		chan->next_sg = 0;
+	return 0;
+}
 
-	sg_req = &chan->desc->sg_req[chan->next_sg];
-	reg = &sg_req->chan_reg;
+static void stm32_dma_start_transfer(struct stm32_dma_chan *chan);
 
-	stm32_dma_write(dmadev, STM32_DMA_SCR(chan->id), reg->dma_scr);
-	stm32_dma_write(dmadev, STM32_DMA_SPAR(chan->id), reg->dma_spar);
-	stm32_dma_write(dmadev, STM32_DMA_SM0AR(chan->id), reg->dma_sm0ar);
-	stm32_dma_write(dmadev, STM32_DMA_SFCR(chan->id), reg->dma_sfcr);
-	stm32_dma_write(dmadev, STM32_DMA_SM1AR(chan->id), reg->dma_sm1ar);
-	stm32_dma_write(dmadev, STM32_DMA_SNDTR(chan->id), reg->dma_sndtr);
+static void stm32_mdma_chan_complete(void *param,
+				     const struct dmaengine_result *result)
+{
+	struct stm32_dma_chan *chan = param;
 
-	chan->next_sg++;
+	chan->busy = false;
+	if (result->result == DMA_TRANS_NOERROR) {
+		if (stm32_dma_mdma_flush_remaining(chan)) {
+			dev_err(chan2dev(chan), "Can't flush DMA\n");
+			return;
+		}
 
-	/* Clear interrupt status if it is there */
-	status = stm32_dma_irq_status(chan);
-	if (status)
-		stm32_dma_irq_clear(chan, status);
+		if (chan->next_sg == chan->desc->num_sgs) {
+			list_del(&chan->desc->vdesc.node);
+			vchan_cookie_complete(&chan->desc->vdesc);
+			chan->desc = NULL;
+		}
+		stm32_dma_start_transfer(chan);
+	} else {
+		dev_err(chan2dev(chan), "MDMA transfer error: %d\n",
+			result->result);
+	}
+}
 
-	if (chan->desc->cyclic)
-		stm32_dma_configure_next_sg(chan);
+static int stm32_dma_mdma_start(struct stm32_dma_chan *chan,
+				struct stm32_dma_sg_req *sg_req)
+{
+	struct stm32_dma_mdma *mchan = &chan->mchan;
+	struct stm32_dma_mdma_desc *m_desc = &sg_req->m_desc;
+	struct dma_slave_config config;
+	int ret;
 
-	stm32_dma_dump_reg(chan);
+	/* Configure MDMA channel */
+	memset(&config, 0, sizeof(config));
+	if (mchan->dir == DMA_MEM_TO_DEV)
+		config.dst_addr = mchan->sram_buf;
+	else
+		config.src_addr = mchan->sram_buf;
 
-	/* Start DMA */
-	reg->dma_scr |= STM32_DMA_SCR_EN;
-	stm32_dma_write(dmadev, STM32_DMA_SCR(chan->id), reg->dma_scr);
+	ret = dmaengine_slave_config(mchan->chan, &config);
+	if (ret < 0)
+		goto error;
+
+	 /* Prepare MDMA descriptor */
+	m_desc->desc = dmaengine_prep_slave_sg(mchan->chan, m_desc->sgt.sgl,
+					       m_desc->sgt.nents, mchan->dir,
+					       DMA_PREP_INTERRUPT);
+	if (!m_desc->desc) {
+		ret = -EINVAL;
+		goto error;
+	}
 
-	chan->busy = true;
+	if (mchan->dir != DMA_MEM_TO_DEV) {
+		m_desc->desc->callback_result = stm32_mdma_chan_complete;
+		m_desc->desc->callback_param = chan;
+	}
 
-	dev_dbg(chan2dev(chan), "vchan %pK: started\n", &chan->vchan);
+	ret = dma_submit_error(dmaengine_submit(m_desc->desc));
+	if (ret < 0) {
+		dev_err(chan2dev(chan), "MDMA submit failed\n");
+		goto error;
+	}
+
+	dma_async_issue_pending(mchan->chan);
+
+	/*
+	 * In case of M2D transfer, we have to generate dummy DMA transfer to
+	 * copy 1st sg data into SRAM
+	 */
+	if (mchan->dir == DMA_MEM_TO_DEV) {
+		ret = stm32_dma_dummy_memcpy_xfer(chan);
+		if (ret < 0) {
+			dmaengine_terminate_async(mchan->chan);
+			goto error;
+		}
+	}
+
+	return 0;
+error:
+	return ret;
 }
 
 static void stm32_dma_configure_next_sg(struct stm32_dma_chan *chan)
@@ -626,23 +918,132 @@ static void stm32_dma_configure_next_sg(struct stm32_dma_chan *chan)
 	}
 }
 
-static void stm32_dma_handle_chan_done(struct stm32_dma_chan *chan)
+static void stm32_dma_start_transfer(struct stm32_dma_chan *chan)
 {
-	if (chan->desc) {
-		if (chan->desc->cyclic) {
-			vchan_cyclic_callback(&chan->desc->vdesc);
-			chan->next_sg++;
-			stm32_dma_configure_next_sg(chan);
+	struct stm32_dma_device *dmadev = stm32_dma_get_dev(chan);
+	struct virt_dma_desc *vdesc;
+	struct stm32_dma_sg_req *sg_req;
+	struct stm32_dma_chan_reg *reg;
+	u32 status;
+	int ret;
+
+	ret = stm32_dma_disable_chan(chan);
+	if (ret < 0)
+		return;
+
+	if (!chan->desc) {
+		vdesc = vchan_next_desc(&chan->vchan);
+		if (!vdesc)
+			return;
+
+		chan->desc = to_stm32_dma_desc(vdesc);
+		chan->next_sg = 0;
+	} else {
+		vdesc = &chan->desc->vdesc;
+	}
+
+	if (chan->next_sg == chan->desc->num_sgs)
+		chan->next_sg = 0;
+
+	sg_req = &chan->desc->sg_req[chan->next_sg];
+	reg = &sg_req->chan_reg;
+
+	/* Clear interrupt status if it is there */
+	status = stm32_dma_irq_status(chan);
+	if (status)
+		stm32_dma_irq_clear(chan, status);
+
+	if (chan->use_mdma) {
+		if (chan->next_sg == 0) {
+			struct stm32_dma_mdma_desc *m_desc;
+
+			m_desc = &sg_req->m_desc;
+			if (chan->desc->cyclic) {
+				/*
+				 * If one callback is set, it will be called by
+				 * MDMA driver.
+				 */
+				if (vdesc->tx.callback) {
+					m_desc->desc->callback =
+						vdesc->tx.callback;
+					m_desc->desc->callback_param =
+						vdesc->tx.callback_param;
+					vdesc->tx.callback = NULL;
+					vdesc->tx.callback_param = NULL;
+				}
+			}
+		}
+
+		if (chan->mchan.dir == DMA_MEM_TO_DEV) {
+			ret = stm32_dma_dummy_memcpy_xfer(chan);
+			if (ret < 0) {
+				dmaengine_terminate_async(chan->mchan.chan);
+				chan->desc = NULL;
+				return;
+			}
 		} else {
-			chan->busy = false;
-			if (chan->next_sg == chan->desc->num_sgs) {
-				list_del(&chan->desc->vdesc.node);
-				vchan_cookie_complete(&chan->desc->vdesc);
+			reg->dma_scr &= ~STM32_DMA_SCR_TCIE;
+		}
+
+		if (!chan->desc->cyclic) {
+			/*  MDMA already started */
+			if (chan->mchan.dir != DMA_MEM_TO_DEV &&
+			    sg_dma_len(&sg_req->stm32_sgl_req) >
+			    chan->mchan.sram_period)
+				reg->dma_scr |= STM32_DMA_SCR_DBM;
+			ret = stm32_dma_mdma_start(chan, sg_req);
+			if (ret < 0) {
 				chan->desc = NULL;
+				return;
 			}
-			stm32_dma_start_transfer(chan);
 		}
 	}
+
+	chan->next_sg++;
+
+	stm32_dma_write(dmadev, STM32_DMA_SCR(chan->id), reg->dma_scr);
+	stm32_dma_write(dmadev, STM32_DMA_SPAR(chan->id), reg->dma_spar);
+	stm32_dma_write(dmadev, STM32_DMA_SM0AR(chan->id), reg->dma_sm0ar);
+	stm32_dma_write(dmadev, STM32_DMA_SFCR(chan->id), reg->dma_sfcr);
+	stm32_dma_write(dmadev, STM32_DMA_SM1AR(chan->id), reg->dma_sm1ar);
+	stm32_dma_write(dmadev, STM32_DMA_SNDTR(chan->id), reg->dma_sndtr);
+
+	if (chan->desc->cyclic)
+		stm32_dma_configure_next_sg(chan);
+
+	stm32_dma_dump_reg(chan);
+
+	/* Start DMA */
+	chan->busy = true;
+	reg->dma_scr |= STM32_DMA_SCR_EN;
+	stm32_dma_write(dmadev, STM32_DMA_SCR(chan->id), reg->dma_scr);
+
+	dev_dbg(chan2dev(chan), "vchan %pK: started\n", &chan->vchan);
+}
+
+static void stm32_dma_handle_chan_done(struct stm32_dma_chan *chan)
+{
+	if (!chan->desc)
+		return;
+
+	if (chan->desc->cyclic) {
+		vchan_cyclic_callback(&chan->desc->vdesc);
+		if (chan->use_mdma)
+			return;
+		chan->next_sg++;
+		stm32_dma_configure_next_sg(chan);
+	} else {
+		chan->busy = false;
+		if (chan->use_mdma && chan->mchan.dir != DMA_MEM_TO_DEV)
+			return;
+		if (chan->next_sg == chan->desc->num_sgs) {
+			list_del(&chan->desc->vdesc.node);
+			vchan_cookie_complete(&chan->desc->vdesc);
+			chan->desc = NULL;
+		}
+
+		stm32_dma_start_transfer(chan);
+	}
 }
 
 static irqreturn_t stm32_dma_chan_irq(int irq, void *devid)
@@ -695,7 +1096,6 @@ static void stm32_dma_issue_pending(struct dma_chan *c)
 	if (vchan_issue_pending(&chan->vchan) && !chan->desc && !chan->busy) {
 		dev_dbg(chan2dev(chan), "vchan %pK: issued\n", &chan->vchan);
 		stm32_dma_start_transfer(chan);
-
 	}
 	spin_unlock_irqrestore(&chan->vchan.lock, flags);
 }
@@ -836,16 +1236,128 @@ static void stm32_dma_clear_reg(struct stm32_dma_chan_reg *regs)
 	memset(regs, 0, sizeof(struct stm32_dma_chan_reg));
 }
 
+static int stm32_dma_mdma_prep_slave_sg(struct stm32_dma_chan *chan,
+					struct scatterlist *sgl, u32 sg_len,
+					struct stm32_dma_desc *desc)
+{
+	struct stm32_dma_device *dmadev = stm32_dma_get_dev(chan);
+	struct scatterlist *sg, *m_sg;
+	dma_addr_t dma_buf;
+	u32 len, num_sgs, sram_period;
+	int i, j, ret;
+
+	desc->dma_buf_cpu = gen_pool_dma_alloc(dmadev->sram_pool,
+					       chan->sram_size,
+					       &desc->dma_buf);
+	if (!desc->dma_buf_cpu)
+		return -ENOMEM;
+
+	sram_period = chan->sram_size / 2;
+
+	for_each_sg(sgl, sg, sg_len, i) {
+		struct stm32_dma_mdma_desc *m_desc = &desc->sg_req[i].m_desc;
+
+		len = sg_dma_len(sg);
+		desc->sg_req[i].stm32_sgl_req = *sg;
+		num_sgs = 1;
+
+		if (chan->mchan.dir == DMA_MEM_TO_DEV) {
+			if (len > chan->sram_size) {
+				dev_err(chan2dev(chan),
+					"max buf size = %d bytes\n",
+					chan->sram_size);
+				ret = -EINVAL;
+				goto free_alloc;
+			}
+		} else {
+			/*
+			 * Build a new sg list for the MDMA transfer:
+			 * scatter the DMA request into several SRAM transfers
+			 */
+			if (len > sram_period)
+				num_sgs = len / sram_period;
+		}
+
+		ret = sg_alloc_table(&m_desc->sgt, num_sgs, GFP_ATOMIC);
+		if (ret) {
+			dev_err(chan2dev(chan), "MDMA sg table alloc failed\n");
+			ret = -ENOMEM;
+			goto err;
+		}
+
+		dma_buf = sg_dma_address(sg);
+		for_each_sg(m_desc->sgt.sgl, m_sg, num_sgs, j) {
+			size_t bytes = min_t(size_t, len, sram_period);
+
+			sg_dma_address(m_sg) = dma_buf;
+			sg_dma_len(m_sg) = bytes;
+			dma_buf += bytes;
+			len -= bytes;
+		}
+	}
+
+	chan->mchan.sram_buf = desc->dma_buf;
+	chan->mchan.sram_period = sram_period;
+	chan->mchan.num_sgs = num_sgs;
+
+	return 0;
+
+err:
+	for (j = 0; j < i; j++)
+		sg_free_table(&desc->sg_req[j].m_desc.sgt);
+free_alloc:
+	gen_pool_free(dmadev->sram_pool, (unsigned long)desc->dma_buf_cpu,
+		      chan->sram_size);
+	return ret;
+}
+
+static int stm32_dma_setup_sg_requests(struct stm32_dma_chan *chan,
+				       struct scatterlist *sgl,
+				       unsigned int sg_len,
+				       enum dma_transfer_direction direction,
+				       struct stm32_dma_desc *desc)
+{
+	struct scatterlist *sg;
+	u32 nb_data_items;
+	int i, ret;
+	enum dma_slave_buswidth buswidth;
+
+	for_each_sg(sgl, sg, sg_len, i) {
+		ret = stm32_dma_set_xfer_param(chan, direction, &buswidth,
+					       sg_dma_len(sg));
+		if (ret < 0)
+			return ret;
+
+		nb_data_items = sg_dma_len(sg) / buswidth;
+		if (nb_data_items > STM32_DMA_ALIGNED_MAX_DATA_ITEMS) {
+			dev_err(chan2dev(chan), "nb items not supported\n");
+			return -EINVAL;
+		}
+
+		stm32_dma_clear_reg(&desc->sg_req[i].chan_reg);
+		desc->sg_req[i].chan_reg.dma_scr = chan->chan_reg.dma_scr;
+		desc->sg_req[i].chan_reg.dma_sfcr = chan->chan_reg.dma_sfcr;
+		desc->sg_req[i].chan_reg.dma_spar = chan->chan_reg.dma_spar;
+		desc->sg_req[i].chan_reg.dma_sm0ar = sg_dma_address(sg);
+		desc->sg_req[i].chan_reg.dma_sm1ar = sg_dma_address(sg);
+		if (chan->use_mdma)
+			desc->sg_req[i].chan_reg.dma_sm1ar +=
+				chan->mchan.sram_period;
+		desc->sg_req[i].chan_reg.dma_sndtr = nb_data_items;
+	}
+
+	desc->num_sgs = sg_len;
+
+	return 0;
+}
+
 static struct dma_async_tx_descriptor *stm32_dma_prep_slave_sg(
 	struct dma_chan *c, struct scatterlist *sgl,
 	u32 sg_len, enum dma_transfer_direction direction,
 	unsigned long flags, void *context)
 {
 	struct stm32_dma_chan *chan = to_stm32_dma_chan(c);
+
 	struct stm32_dma_desc *desc;
-	struct scatterlist *sg;
-	enum dma_slave_buswidth buswidth;
-	u32 nb_data_items;
 	int i, ret;
 
 	if (!chan->config_init) {
@@ -868,48 +1380,141 @@ static struct dma_async_tx_descriptor *stm32_dma_prep_slave_sg(
 	else
 		chan->chan_reg.dma_scr &= ~STM32_DMA_SCR_PFCTRL;
 
-	for_each_sg(sgl, sg, sg_len, i) {
-		ret = stm32_dma_set_xfer_param(chan, direction, &buswidth,
-					       sg_dma_len(sg));
-		if (ret < 0)
-			goto err;
-
-		desc->sg_req[i].len = sg_dma_len(sg);
+	if (chan->use_mdma) {
+		struct sg_table new_sgt;
+		struct scatterlist *s, *_sgl;
 
-		nb_data_items = desc->sg_req[i].len / buswidth;
-		if (nb_data_items > STM32_DMA_ALIGNED_MAX_DATA_ITEMS) {
-			dev_err(chan2dev(chan), "nb items not supported\n");
-			goto err;
+		chan->mchan.dir = direction;
+		ret = stm32_dma_mdma_prep_slave_sg(chan, sgl, sg_len, desc);
+		if (ret < 0)
+			return NULL;
+
+		ret = sg_alloc_table(&new_sgt, sg_len, GFP_ATOMIC);
+		if (ret) {
+			dev_err(chan2dev(chan), "DMA sg table alloc failed\n");
+			goto err;
+		}
+
+		_sgl = sgl;
+		for_each_sg(new_sgt.sgl, s, sg_len, i) {
+			sg_dma_len(s) =
+				min(sg_dma_len(_sgl), chan->mchan.sram_period);
+			s->dma_address = chan->mchan.sram_buf;
+			_sgl = sg_next(_sgl);
 		}
 
-		stm32_dma_clear_reg(&desc->sg_req[i].chan_reg);
-		desc->sg_req[i].chan_reg.dma_scr = chan->chan_reg.dma_scr;
-		desc->sg_req[i].chan_reg.dma_sfcr = chan->chan_reg.dma_sfcr;
-		desc->sg_req[i].chan_reg.dma_spar = chan->chan_reg.dma_spar;
-		desc->sg_req[i].chan_reg.dma_sm0ar = sg_dma_address(sg);
-		desc->sg_req[i].chan_reg.dma_sm1ar = sg_dma_address(sg);
-		desc->sg_req[i].chan_reg.dma_sndtr = nb_data_items;
+		ret = stm32_dma_setup_sg_requests(chan, new_sgt.sgl, sg_len,
+						  direction, desc);
+		sg_free_table(&new_sgt);
+		if (ret < 0)
+			goto err;
+	} else {
+		/* Prepare a normal DMA transfer */
+		ret = stm32_dma_setup_sg_requests(chan, sgl, sg_len, direction,
+						  desc);
+		if (ret < 0)
+			goto err;
 	}
 
-	desc->num_sgs = sg_len;
 	desc->cyclic = false;
 
 	return vchan_tx_prep(&chan->vchan, &desc->vdesc, flags);
-
 err:
+	if (chan->use_mdma) {
+		struct stm32_dma_device *dmadev = stm32_dma_get_dev(chan);
+
+		for (i = 0; i < sg_len; i++)
+			sg_free_table(&desc->sg_req[i].m_desc.sgt);
+
+		gen_pool_free(dmadev->sram_pool,
+			      (unsigned long)desc->dma_buf_cpu,
+			      chan->sram_size);
+	}
 	kfree(desc);
+
 	return NULL;
 }
 
+static int stm32_dma_mdma_prep_dma_cyclic(struct stm32_dma_chan *chan,
+					  dma_addr_t buf_addr, size_t buf_len,
+					  size_t period_len,
+					  struct stm32_dma_desc *desc)
+{
+	struct stm32_dma_device *dmadev = stm32_dma_get_dev(chan);
+	struct stm32_dma_mdma *mchan = &chan->mchan;
+	struct stm32_dma_mdma_desc *m_desc = &desc->sg_req[0].m_desc;
+	struct dma_slave_config config;
+	dma_addr_t mem;
+	int ret;
+
+	chan->sram_size = ALIGN(period_len, STM32_DMA_SRAM_GRANULARITY);
+	desc->dma_buf_cpu = gen_pool_dma_alloc(dmadev->sram_pool,
+					       2 * chan->sram_size,
+					       &desc->dma_buf);
+	if (!desc->dma_buf_cpu)
+		return -ENOMEM;
+
+	memset(&config, 0, sizeof(config));
+	mem = buf_addr;
+
+	/* Configure MDMA channel */
+	if (chan->mchan.dir == DMA_MEM_TO_DEV)
+		config.dst_addr = desc->dma_buf;
+	else
+		config.src_addr = desc->dma_buf;
+	ret = dmaengine_slave_config(mchan->chan, &config);
+	if (ret < 0)
+		goto err;
+
+	/* Prepare MDMA descriptor */
+	m_desc->desc = dmaengine_prep_dma_cyclic(mchan->chan, buf_addr, buf_len,
+						 period_len, chan->mchan.dir,
+						 DMA_PREP_INTERRUPT);
+
+	if (!m_desc->desc) {
+		ret = -EINVAL;
+		goto err;
+	}
+
+	ret = dma_submit_error(dmaengine_submit(m_desc->desc));
+	if (ret < 0) {
+		dev_err(chan2dev(chan), "MDMA submit failed\n");
+		goto err;
+	}
+
+	dma_async_issue_pending(mchan->chan);
+
+	/*
+	 * In case of M2D transfer, we have to generate dummy DMA transfer to
+	 * copy 1 period of data into SRAM
+	 */
+	if (chan->mchan.dir == DMA_MEM_TO_DEV) {
+		ret = stm32_dma_dummy_memcpy_xfer(chan);
+		if (ret < 0) {
+			dev_err(chan2dev(chan),
+				"stm32_dma_dummy_memcpy_xfer failed\n");
+			dmaengine_terminate_async(mchan->chan);
+			goto err;
+		}
+	}
+
+	return 0;
+err:
+	gen_pool_free(dmadev->sram_pool,
+		      (unsigned long)desc->dma_buf_cpu,
+		      2 * chan->sram_size);
+	return ret;
+}
+
 static struct dma_async_tx_descriptor *stm32_dma_prep_dma_cyclic(
 	struct dma_chan *c, dma_addr_t buf_addr, size_t buf_len,
 	size_t period_len, enum dma_transfer_direction direction,
 	unsigned long flags)
 {
 	struct stm32_dma_chan *chan = to_stm32_dma_chan(c);
+	struct stm32_dma_chan_reg *chan_reg = &chan->chan_reg;
 	struct stm32_dma_desc *desc;
 	enum dma_slave_buswidth buswidth;
 	u32 num_periods, nb_data_items;
+	dma_addr_t dma_buf = 0;
 	int i, ret;
 
 	if (!buf_len || !period_len) {
@@ -957,28 +1562,49 @@ static struct dma_async_tx_descriptor *stm32_dma_prep_dma_cyclic(
 	/* Clear periph ctrl if client set it */
 	chan->chan_reg.dma_scr &= ~STM32_DMA_SCR_PFCTRL;
 
-	num_periods = buf_len / period_len;
+	if (chan->use_mdma)
+		num_periods = 1;
+	else
+		num_periods = buf_len / period_len;
 
 	desc = stm32_dma_alloc_desc(num_periods);
 	if (!desc)
 		return NULL;
 
-	for (i = 0; i < num_periods; i++) {
-		desc->sg_req[i].len = period_len;
+	desc->num_sgs = num_periods;
+	desc->cyclic = true;
 
+	if (chan->use_mdma) {
+		chan->mchan.dir = direction;
+
+		ret = stm32_dma_mdma_prep_dma_cyclic(chan, buf_addr, buf_len,
+						     period_len, desc);
+		if (ret < 0)
+			return NULL;
+		dma_buf = desc->dma_buf;
+	} else {
+		dma_buf = buf_addr;
+	}
+
+	for (i = 0; i < num_periods; i++) {
+		sg_dma_len(&desc->sg_req[i].stm32_sgl_req) = period_len;
+		sg_dma_address(&desc->sg_req[i].stm32_sgl_req) = dma_buf;
 		stm32_dma_clear_reg(&desc->sg_req[i].chan_reg);
-		desc->sg_req[i].chan_reg.dma_scr = chan->chan_reg.dma_scr;
-		desc->sg_req[i].chan_reg.dma_sfcr = chan->chan_reg.dma_sfcr;
-		desc->sg_req[i].chan_reg.dma_spar = chan->chan_reg.dma_spar;
-		desc->sg_req[i].chan_reg.dma_sm0ar = buf_addr;
-		desc->sg_req[i].chan_reg.dma_sm1ar = buf_addr;
+		desc->sg_req[i].chan_reg.dma_scr = chan_reg->dma_scr;
+		desc->sg_req[i].chan_reg.dma_sfcr = chan_reg->dma_sfcr;
+		desc->sg_req[i].chan_reg.dma_spar = chan_reg->dma_spar;
+		if (chan->use_mdma) {
+			desc->sg_req[i].chan_reg.dma_sm0ar = desc->dma_buf;
+			desc->sg_req[i].chan_reg.dma_sm1ar = desc->dma_buf +
+				chan->sram_size;
+		} else {
+			desc->sg_req[i].chan_reg.dma_sm0ar = dma_buf;
+			desc->sg_req[i].chan_reg.dma_sm1ar = dma_buf;
+			dma_buf += period_len;
+		}
 		desc->sg_req[i].chan_reg.dma_sndtr = nb_data_items;
-		buf_addr += period_len;
 	}
 
-	desc->num_sgs = num_periods;
-	desc->cyclic = true;
-
 	return vchan_tx_prep(&chan->vchan, &desc->vdesc, flags);
 }
 
@@ -1019,13 +1645,13 @@ static struct dma_async_tx_descriptor *stm32_dma_prep_dma_memcpy(
 			STM32_DMA_SCR_PINC |
 			STM32_DMA_SCR_TCIE |
 			STM32_DMA_SCR_TEIE;
-		desc->sg_req[i].chan_reg.dma_sfcr |= STM32_DMA_SFCR_MASK;
+		desc->sg_req[i].chan_reg.dma_sfcr &= ~STM32_DMA_SFCR_MASK;
 		desc->sg_req[i].chan_reg.dma_sfcr |=
 			STM32_DMA_SFCR_FTH(threshold);
 		desc->sg_req[i].chan_reg.dma_spar = src + offset;
 		desc->sg_req[i].chan_reg.dma_sm0ar = dest + offset;
 		desc->sg_req[i].chan_reg.dma_sndtr = xfer_count;
-		desc->sg_req[i].len = xfer_count;
+		sg_dma_len(&desc->sg_req[i].stm32_sgl_req) = xfer_count;
 	}
 
 	desc->num_sgs = num_sgs;
@@ -1034,18 +1660,6 @@ static struct dma_async_tx_descriptor *stm32_dma_prep_dma_memcpy(
 	return vchan_tx_prep(&chan->vchan, &desc->vdesc, flags);
 }
 
-static u32 stm32_dma_get_remaining_bytes(struct stm32_dma_chan *chan)
-{
-	u32 dma_scr, width, ndtr;
-	struct stm32_dma_device *dmadev = stm32_dma_get_dev(chan);
-
-	dma_scr = stm32_dma_read(dmadev, STM32_DMA_SCR(chan->id));
-	width = STM32_DMA_SCR_PSIZE_GET(dma_scr);
-	ndtr = stm32_dma_read(dmadev, STM32_DMA_SNDTR(chan->id));
-
-	return ndtr << width;
-}
-
 static size_t stm32_dma_desc_residue(struct stm32_dma_chan *chan,
 				     struct stm32_dma_desc *desc,
 				     u32 next_sg)
@@ -1054,6 +1668,10 @@ static size_t stm32_dma_desc_residue(struct stm32_dma_chan *chan,
 	u32 residue = 0;
 	int i;
 
+	/* Drain case */
+	if (chan->residue_after_drain)
+		return chan->residue_after_drain;
+
 	/*
 	 * In cyclic mode, for the last period, residue = remaining bytes from
 	 * NDTR
@@ -1069,7 +1687,7 @@ static size_t stm32_dma_desc_residue(struct stm32_dma_chan *chan,
 	 * transferred
 	 */
 	for (i = next_sg; i < desc->num_sgs; i++)
-		residue += desc->sg_req[i].len;
+		residue += sg_dma_len(&desc->sg_req[i].stm32_sgl_req);
 	residue += stm32_dma_get_remaining_bytes(chan);
 
 end:
@@ -1089,11 +1707,23 @@ static enum dma_status stm32_dma_tx_status(struct dma_chan *c,
 					   struct dma_tx_state *state)
 {
 	struct stm32_dma_chan *chan = to_stm32_dma_chan(c);
+	struct stm32_dma_mdma *mchan = &chan->mchan;
 	struct virt_dma_desc *vdesc;
 	enum dma_status status;
 	unsigned long flags;
 	u32 residue = 0;
 
+	/*
+	 * When DMA/MDMA chain is used, we return the status of MDMA in cyclic
+	 * mode and for D2M transfer in sg mode in order to return the correct
+	 * residue if any
+	 */
+	if (chan->desc && chan->use_mdma &&
+	    (mchan->dir != DMA_MEM_TO_DEV || chan->desc->cyclic) &&
+	    !chan->residue_after_drain)
+		return dmaengine_tx_status(mchan->chan, mchan->chan->cookie,
+					   state);
+
 	status = dma_cookie_status(c, cookie, state);
 	if (status == DMA_COMPLETE || !state)
 		return status;
@@ -1155,21 +1785,34 @@ static void stm32_dma_free_chan_resources(struct dma_chan *c)
 
 static void stm32_dma_desc_free(struct virt_dma_desc *vdesc)
 {
-	kfree(container_of(vdesc, struct stm32_dma_desc, vdesc));
+	struct stm32_dma_desc *desc = to_stm32_dma_desc(vdesc);
+	struct stm32_dma_chan *chan = to_stm32_dma_chan(vdesc->tx.chan);
+	struct stm32_dma_device *dmadev = stm32_dma_get_dev(chan);
+	int i;
+
+	if (chan->use_mdma) {
+		for (i = 0; i < desc->num_sgs; i++)
+			sg_free_table(&desc->sg_req[i].m_desc.sgt);
+
+		gen_pool_free(dmadev->sram_pool,
+			      (unsigned long)desc->dma_buf_cpu,
+			      chan->sram_size);
+	}
+
+	kfree(desc);
 }
 
 static void stm32_dma_set_config(struct stm32_dma_chan *chan,
 				 struct stm32_dma_cfg *cfg)
 {
 	stm32_dma_clear_reg(&chan->chan_reg);
-
 	chan->chan_reg.dma_scr = cfg->stream_config & STM32_DMA_SCR_CFG_MASK;
 	chan->chan_reg.dma_scr |= STM32_DMA_SCR_REQ(cfg->request_line);
-
-	/* Enable Interrupts  */
 	chan->chan_reg.dma_scr |= STM32_DMA_SCR_TEIE | STM32_DMA_SCR_TCIE;
-
 	chan->threshold = STM32_DMA_THRESHOLD_FTR_GET(cfg->features);
+	chan->use_mdma = STM32_DMA_MDMA_CHAIN_FTR_GET(cfg->features);
+	chan->sram_size = (1 << STM32_DMA_MDMA_SRAM_SIZE_GET(cfg->features)) *
+		STM32_DMA_SRAM_GRANULARITY;
 }
 
 static struct dma_chan *stm32_dma_of_xlate(struct of_phandle_args *dma_spec,
@@ -1207,6 +1850,9 @@ static struct dma_chan *stm32_dma_of_xlate(struct of_phandle_args *dma_spec,
 
 	stm32_dma_set_config(chan, &cfg);
 
+	if (!dmadev->sram_pool || !chan->mchan.chan)
+		chan->use_mdma = 0;
+
 	return c;
 }
 
@@ -1219,10 +1865,12 @@ MODULE_DEVICE_TABLE(of, stm32_dma_of_match);
 static int stm32_dma_probe(struct platform_device *pdev)
 {
 	struct stm32_dma_chan *chan;
+	struct stm32_dma_mdma *mchan;
 	struct stm32_dma_device *dmadev;
 	struct dma_device *dd;
 	const struct of_device_id *match;
 	struct resource *res;
+	char name[4];
 	int i, ret;
 
 	match = of_match_device(stm32_dma_of_match, &pdev->dev);
@@ -1258,6 +1906,13 @@ static int stm32_dma_probe(struct platform_device *pdev)
 		reset_control_deassert(dmadev->rst);
 	}
 
+	dmadev->sram_pool = of_gen_pool_get(pdev->dev.of_node, "sram", 0);
+	if (!dmadev->sram_pool)
+		dev_info(&pdev->dev, "no dma pool: can't use MDMA\n");
+	else
+		dev_dbg(&pdev->dev, "SRAM pool: %zu KiB\n",
+			gen_pool_size(dmadev->sram_pool) / 1024);
+
 	dma_cap_set(DMA_SLAVE, dd->cap_mask);
 	dma_cap_set(DMA_PRIVATE, dd->cap_mask);
 	dma_cap_set(DMA_CYCLIC, dd->cap_mask);
@@ -1293,6 +1948,16 @@ static int stm32_dma_probe(struct platform_device *pdev)
 		chan->id = i;
 		chan->vchan.desc_free = stm32_dma_desc_free;
 		vchan_init(&chan->vchan, dd);
+
+		mchan = &chan->mchan;
+		if (dmadev->sram_pool) {
+			snprintf(name, sizeof(name), "ch%d", chan->id);
+			mchan->chan = dma_request_slave_channel(dd->dev, name);
+			if (!mchan->chan)
+				dev_info(&pdev->dev,
+					 "can't request MDMA chan for %s\n",
+					 name);
+		}
 	}
 
 	ret = dma_async_device_register(dd);
@@ -1350,4 +2015,4 @@ static int __init stm32_dma_init(void)
 {
 	return platform_driver_probe(&stm32_dma_driver, stm32_dma_probe);
 }
-subsys_initcall(stm32_dma_init);
+device_initcall(stm32_dma_init);
-- 
2.7.4



* [PATCH v3 5/7] dmaengine: stm32-mdma: Add DMA/MDMA chaining support
  2018-09-28 13:01 [PATCH v3 0/7] Add-DMA-MDMA-chaining-support Pierre-Yves MORDRET
                   ` (3 preceding siblings ...)
  2018-09-28 13:01 ` [PATCH v3 4/7] dmaengine: stm32-dma: Add DMA/MDMA chaining support Pierre-Yves MORDRET
@ 2018-09-28 13:01 ` Pierre-Yves MORDRET
  2018-09-28 13:01 ` [PATCH v3 6/7] dmaengine: stm32-dma: enable descriptor_reuse Pierre-Yves MORDRET
  2018-09-28 13:01 ` [PATCH v3 7/7] dmaengine: stm32-mdma: " Pierre-Yves MORDRET
  6 siblings, 0 replies; 26+ messages in thread
From: Pierre-Yves MORDRET @ 2018-09-28 13:01 UTC (permalink / raw)
  To: Vinod Koul, Rob Herring, Mark Rutland, Alexandre Torgue,
	Maxime Coquelin, Dan Williams, devicetree, dmaengine,
	linux-arm-kernel, linux-kernel
  Cc: Pierre-Yves MORDRET

From: M'boumba Cedric Madianga <cedric.madianga@gmail.com>

This patch adds support for M2M transfers triggered by the STM32 DMA in
order to transfer data from/to SRAM to/from DDR.

Normally, this mode should not be needed, as transferring data from/to DDR
is supported by the STM32 DMA.
However, the STM32 DMA doesn't have the ability to generate burst transfers
on the DDR, as it only embeds a 4-word FIFO, although the minimal burst
length on the DDR is 8 words.
Due to this constraint, the STM32 DMA transfers data from/to DDR in single
accesses, which could pollute the DDR.
To avoid this, we have to use SRAM for all transfers where the STM32 DMA is
involved.

So, we need to add an intermediate M2M transfer handled by the MDMA, which
has the ability to generate burst transfers on the DDR, to copy data
from/to SRAM to/from DDR as described below:
For M2D: DDR --> MDMA --> SRAM --> DMA  --> IP
For D2M: IP  --> DMA  --> SRAM --> MDMA --> DDR

This intermediate transfer is triggered by the STM32 DMA when its transfer
complete flag is set. In that way, we are able to build a DMA/MDMA chaining
transfer completely handled by HW.

Note that this patch adds support for M2M transfers triggered by HW.
This mode is not really available in the dmaengine framework, as M2M
transfers are normally triggered by SW.
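
In cyclic mode, the MDMA hwdesc list alternates between the two halves of
the SRAM double buffer driven by the DMA; the address computation boils down
to this (distilled from the diff below):

    /* M2D: even hwdescs use SRAM buffer 0, odd ones buffer 1 */
    offset = ALIGN(period_len, STM32_DMA_SRAM_GRANULARITY);
    if (chan_config->m2m_hw && count > 1 && i % 2)
            dst_addr = dma_config->dst_addr + offset;
    else
            dst_addr = dma_config->dst_addr;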

Signed-off-by: Pierre-Yves MORDRET <pierre-yves.mordret@st.com>
---
  Version history:
    v3:
    v2:
    v1:
       * Initial
---
---
 drivers/dma/stm32-mdma.c | 131 +++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 114 insertions(+), 17 deletions(-)

diff --git a/drivers/dma/stm32-mdma.c b/drivers/dma/stm32-mdma.c
index 06dd172..6b6e63b 100644
--- a/drivers/dma/stm32-mdma.c
+++ b/drivers/dma/stm32-mdma.c
@@ -211,6 +211,8 @@
 #define STM32_MDMA_MAX_BURST		128
 #define STM32_MDMA_VERY_HIGH_PRIORITY	0x11
 
+#define STM32_DMA_SRAM_GRANULARITY	PAGE_SIZE
+
 enum stm32_mdma_trigger_mode {
 	STM32_MDMA_BUFFER,
 	STM32_MDMA_BLOCK,
@@ -237,6 +239,7 @@ struct stm32_mdma_chan_config {
 	u32 transfer_config;
 	u32 mask_addr;
 	u32 mask_data;
+	bool m2m_hw;
 };
 
 struct stm32_mdma_hwdesc {
@@ -262,6 +265,7 @@ struct stm32_mdma_desc {
 	u32 ccr;
 	bool cyclic;
 	u32 count;
+	enum dma_transfer_direction dir;
 	struct stm32_mdma_desc_node node[];
 };
 
@@ -577,13 +581,25 @@ static int stm32_mdma_set_xfer_param(struct stm32_mdma_chan *chan,
 		dst_addr = chan->dma_config.dst_addr;
 
 		/* Set device data size */
+		if (chan_config->m2m_hw)
+			dst_addr_width =
+			stm32_mdma_get_max_width(dst_addr, buf_len,
+						 STM32_MDMA_MAX_BUF_LEN);
+
 		dst_bus_width = stm32_mdma_get_width(chan, dst_addr_width);
 		if (dst_bus_width < 0)
 			return dst_bus_width;
 		ctcr &= ~STM32_MDMA_CTCR_DSIZE_MASK;
 		ctcr |= STM32_MDMA_CTCR_DSIZE(dst_bus_width);
+		if (chan_config->m2m_hw) {
+			ctcr &= ~STM32_MDMA_CTCR_DINCOS_MASK;
+			ctcr |= STM32_MDMA_CTCR_DINCOS(dst_bus_width);
+		}
 
 		/* Set device burst value */
+		if (chan_config->m2m_hw)
+			dst_maxburst = STM32_MDMA_MAX_BUF_LEN / dst_addr_width;
+
 		dst_best_burst = stm32_mdma_get_best_burst(buf_len, tlen,
 							   dst_maxburst,
 							   dst_addr_width);
@@ -626,13 +642,25 @@ static int stm32_mdma_set_xfer_param(struct stm32_mdma_chan *chan,
 		src_addr = chan->dma_config.src_addr;
 
 		/* Set device data size */
+		if (chan_config->m2m_hw)
+			src_addr_width =
+			stm32_mdma_get_max_width(src_addr, buf_len,
+						 STM32_MDMA_MAX_BUF_LEN);
+
 		src_bus_width = stm32_mdma_get_width(chan, src_addr_width);
 		if (src_bus_width < 0)
 			return src_bus_width;
 		ctcr &= ~STM32_MDMA_CTCR_SSIZE_MASK;
 		ctcr |= STM32_MDMA_CTCR_SSIZE(src_bus_width);
+		if (chan_config->m2m_hw) {
+			ctcr &= ~STM32_MDMA_CTCR_SINCOS_MASK;
+			ctcr |= STM32_MDMA_CTCR_SINCOS(src_bus_width);
+		}
 
 		/* Set device burst value */
+		if (chan_config->m2m_hw)
+			src_maxburst = STM32_MDMA_MAX_BUF_LEN / src_addr_width;
+
 		src_best_burst = stm32_mdma_get_best_burst(buf_len, tlen,
 							   src_maxburst,
 							   src_addr_width);
@@ -740,6 +768,7 @@ static int stm32_mdma_setup_xfer(struct stm32_mdma_chan *chan,
 {
 	struct stm32_mdma_device *dmadev = stm32_mdma_get_dev(chan);
 	struct dma_slave_config *dma_config = &chan->dma_config;
+	struct stm32_mdma_chan_config *chan_config = &chan->chan_config;
 	struct scatterlist *sg;
 	dma_addr_t src_addr, dst_addr;
 	u32 ccr, ctcr, ctbr;
@@ -762,6 +791,8 @@ static int stm32_mdma_setup_xfer(struct stm32_mdma_chan *chan,
 		} else {
 			src_addr = dma_config->src_addr;
 			dst_addr = sg_dma_address(sg);
+			if (chan_config->m2m_hw)
+				src_addr += ((i & 1) ? sg_dma_len(sg) : 0);
 			ret = stm32_mdma_set_xfer_param(chan, direction, &ccr,
 							&ctcr, &ctbr, dst_addr,
 							sg_dma_len(sg));
@@ -780,8 +811,6 @@ static int stm32_mdma_setup_xfer(struct stm32_mdma_chan *chan,
 	/* Enable interrupts */
 	ccr &= ~STM32_MDMA_CCR_IRQ_MASK;
 	ccr |= STM32_MDMA_CCR_TEIE | STM32_MDMA_CCR_CTCIE;
-	if (sg_len > 1)
-		ccr |= STM32_MDMA_CCR_BTIE;
 	desc->ccr = ccr;
 
 	return 0;
@@ -793,7 +822,9 @@ stm32_mdma_prep_slave_sg(struct dma_chan *c, struct scatterlist *sgl,
 			 unsigned long flags, void *context)
 {
 	struct stm32_mdma_chan *chan = to_stm32_mdma_chan(c);
+	struct stm32_mdma_chan_config *chan_config = &chan->chan_config;
 	struct stm32_mdma_desc *desc;
+	struct stm32_mdma_hwdesc *hwdesc;
 	int i, ret;
 
 	/*
@@ -815,6 +846,20 @@ stm32_mdma_prep_slave_sg(struct dma_chan *c, struct scatterlist *sgl,
 	if (ret < 0)
 		goto xfer_setup_err;
 
+	/*
+	 * In case of M2M HW transfer triggered by STM32 DMA, we do not have to
+	 * clear the transfer complete flag by hardware in order to let the
+	 * CPU rearm the DMA with the next sg element and update some data in
+	 * dmaengine framework
+	 */
+	if (chan_config->m2m_hw && direction == DMA_MEM_TO_DEV) {
+		for (i = 0; i < sg_len; i++) {
+			hwdesc = desc->node[i].hwdesc;
+			hwdesc->cmar = 0;
+			hwdesc->cmdr = 0;
+		}
+	}
+
 	desc->cyclic = false;
 
 	return vchan_tx_prep(&chan->vchan, &desc->vdesc, flags);
@@ -836,9 +881,10 @@ stm32_mdma_prep_dma_cyclic(struct dma_chan *c, dma_addr_t buf_addr,
 	struct stm32_mdma_chan *chan = to_stm32_mdma_chan(c);
 	struct stm32_mdma_device *dmadev = stm32_mdma_get_dev(chan);
 	struct dma_slave_config *dma_config = &chan->dma_config;
+	struct stm32_mdma_chan_config *chan_config = &chan->chan_config;
 	struct stm32_mdma_desc *desc;
 	dma_addr_t src_addr, dst_addr;
-	u32 ccr, ctcr, ctbr, count;
+	u32 ccr, ctcr, ctbr, count, offset;
 	int i, ret;
 
 	/*
@@ -892,12 +938,29 @@ stm32_mdma_prep_dma_cyclic(struct dma_chan *c, dma_addr_t buf_addr,
 	desc->ccr = ccr;
 
 	/* Configure hwdesc list */
+	offset =  ALIGN(period_len, STM32_DMA_SRAM_GRANULARITY);
 	for (i = 0; i < count; i++) {
 		if (direction == DMA_MEM_TO_DEV) {
+			/*
+			 * When the DMA is configured in double buffer mode,
+			 * the MDMA has to use 2 destination buffers to be
+			 * compliant with this mode.
+			 */
+			if (chan_config->m2m_hw && count > 1 && i % 2)
+				dst_addr = dma_config->dst_addr + offset;
+			else
+				dst_addr = dma_config->dst_addr;
 			src_addr = buf_addr + i * period_len;
-			dst_addr = dma_config->dst_addr;
 		} else {
-			src_addr = dma_config->src_addr;
+			/*
+			 * When the DMA is configured in double buffer mode,
+			 * the MDMA has to use 2 source buffers to be
+			 * compliant with this mode.
+			 */
+			if (chan_config->m2m_hw && count > 1 && i % 2)
+				src_addr = dma_config->src_addr + offset;
+			else
+				src_addr = dma_config->src_addr;
 			dst_addr = buf_addr + i * period_len;
 		}
 
@@ -907,6 +970,7 @@ stm32_mdma_prep_dma_cyclic(struct dma_chan *c, dma_addr_t buf_addr,
 	}
 
 	desc->cyclic = true;
+	desc->dir = direction;
 
 	return vchan_tx_prep(&chan->vchan, &desc->vdesc, flags);
 
@@ -1287,14 +1351,28 @@ static size_t stm32_mdma_desc_residue(struct stm32_mdma_chan *chan,
 {
 	struct stm32_mdma_device *dmadev = stm32_mdma_get_dev(chan);
 	struct stm32_mdma_hwdesc *hwdesc = desc->node[0].hwdesc;
-	u32 cbndtr, residue, modulo, burst_size;
+	u32 residue = 0;
+	u32 modulo, burst_size;
+	dma_addr_t next_clar;
+	u32 cbndtr;
 	int i;
 
-	residue = 0;
-	for (i = curr_hwdesc + 1; i < desc->count; i++) {
+	/*
+	 * Get the residue of pending descriptors
+	 */
+	/* Get the next hw descriptor to process from current transfer */
+	next_clar = stm32_mdma_read(dmadev, STM32_MDMA_CLAR(chan->id));
+	for (i = desc->count - 1; i >= 0; i--) {
 		hwdesc = desc->node[i].hwdesc;
+
+		if (hwdesc->clar == next_clar)
+			break;/* Current transfer found, stop cumulating */
+
+		/* Cumulate residue of unprocessed hw descriptors */
 		residue += STM32_MDMA_CBNDTR_BNDT(hwdesc->cbndtr);
 	}
+
+	/* Read & cumulate the residue of the current transfer */
 	cbndtr = stm32_mdma_read(dmadev, STM32_MDMA_CBNDTR(chan->id));
 	residue += cbndtr & STM32_MDMA_CBNDTR_BNDT_MASK;
 
@@ -1314,24 +1392,39 @@ static enum dma_status stm32_mdma_tx_status(struct dma_chan *c,
 					    struct dma_tx_state *state)
 {
 	struct stm32_mdma_chan *chan = to_stm32_mdma_chan(c);
+	struct stm32_mdma_chan_config *chan_config = &chan->chan_config;
 	struct virt_dma_desc *vdesc;
 	enum dma_status status;
 	unsigned long flags;
 	u32 residue = 0;
 
 	status = dma_cookie_status(c, cookie, state);
-	if ((status == DMA_COMPLETE) || (!state))
+	if (status == DMA_COMPLETE || !state)
 		return status;
 
 	spin_lock_irqsave(&chan->vchan.lock, flags);
 
 	vdesc = vchan_find_desc(&chan->vchan, cookie);
-	if (chan->desc && cookie == chan->desc->vdesc.tx.cookie)
-		residue = stm32_mdma_desc_residue(chan, chan->desc,
-						  chan->curr_hwdesc);
-	else if (vdesc)
+	if (chan->desc && cookie == chan->desc->vdesc.tx.cookie) {
+		/*
+		 * In case of M2D transfer triggered by STM32 DMA, the MDMA has
+		 * always one period in advance in cyclic mode. So, we have to
+		 * add 1 period of data to return the good residue to the
+		 * client
+		 */
+		if (chan_config->m2m_hw && chan->desc->dir == DMA_MEM_TO_DEV &&
+		    chan->curr_hwdesc > 1)
+			residue =
+				stm32_mdma_desc_residue(chan, chan->desc,
+							chan->curr_hwdesc - 1);
+		else
+			residue = stm32_mdma_desc_residue(chan, chan->desc,
+							  chan->curr_hwdesc);
+	} else if (vdesc) {
 		residue = stm32_mdma_desc_residue(chan,
 						  to_stm32_mdma_desc(vdesc), 0);
+	}
+
 	dma_set_residue(state, residue);
 
 	spin_unlock_irqrestore(&chan->vchan.lock, flags);
@@ -1498,7 +1591,7 @@ static struct dma_chan *stm32_mdma_of_xlate(struct of_phandle_args *dma_spec,
 	struct dma_chan *c;
 	struct stm32_mdma_chan_config config;
 
-	if (dma_spec->args_count < 5) {
+	if (dma_spec->args_count < 6) {
 		dev_err(mdma2dev(dmadev), "Bad number of args\n");
 		return NULL;
 	}
@@ -1508,6 +1601,7 @@ static struct dma_chan *stm32_mdma_of_xlate(struct of_phandle_args *dma_spec,
 	config.transfer_config = dma_spec->args[2];
 	config.mask_addr = dma_spec->args[3];
 	config.mask_data = dma_spec->args[4];
+	config.m2m_hw = dma_spec->args[5];
 
 	if (config.request >= dmadev->nr_requests) {
 		dev_err(mdma2dev(dmadev), "Bad request line\n");
@@ -1646,19 +1740,20 @@ static int stm32_mdma_probe(struct platform_device *pdev)
 	dmadev->irq = platform_get_irq(pdev, 0);
 	if (dmadev->irq < 0) {
 		dev_err(&pdev->dev, "failed to get IRQ\n");
-		return dmadev->irq;
+		ret = dmadev->irq;
+		goto clk_free;
 	}
 
 	ret = devm_request_irq(&pdev->dev, dmadev->irq, stm32_mdma_irq_handler,
 			       0, dev_name(&pdev->dev), dmadev);
 	if (ret) {
 		dev_err(&pdev->dev, "failed to request IRQ\n");
-		return ret;
+		goto clk_free;
 	}
 
 	ret = dma_async_device_register(dd);
 	if (ret)
-		return ret;
+		goto clk_free;
 
 	ret = of_dma_controller_register(of_node, stm32_mdma_of_xlate, dmadev);
 	if (ret < 0) {
@@ -1675,6 +1770,8 @@ static int stm32_mdma_probe(struct platform_device *pdev)
 
 err_unregister:
 	dma_async_device_unregister(dd);
+clk_free:
+	clk_disable_unprepare(dmadev->clk);
 
 	return ret;
 }
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v3 6/7] dmaengine: stm32-dma: enable descriptor_reuse
  2018-09-28 13:01 [PATCH v3 0/7] Add-DMA-MDMA-chaining-support Pierre-Yves MORDRET
                   ` (4 preceding siblings ...)
  2018-09-28 13:01 ` [PATCH v3 5/7] dmaengine: stm32-mdma: " Pierre-Yves MORDRET
@ 2018-09-28 13:01 ` Pierre-Yves MORDRET
  2018-09-28 13:01 ` [PATCH v3 7/7] dmaengine: stm32-mdma: " Pierre-Yves MORDRET
  6 siblings, 0 replies; 26+ messages in thread
From: Pierre-Yves MORDRET @ 2018-09-28 13:01 UTC (permalink / raw)
  To: Vinod Koul, Rob Herring, Mark Rutland, Alexandre Torgue,
	Maxime Coquelin, Dan Williams, devicetree, dmaengine,
	linux-arm-kernel, linux-kernel
  Cc: Pierre-Yves MORDRET

Enable clients to resubmit already processed descriptors
in order to save descriptor creation time.
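
For context, a client-side use of this could look roughly as follows (an
illustrative sketch against the generic dmaengine API, not part of this
patch; chan, sgl and sg_len are placeholders for the client's own objects):

	struct dma_async_tx_descriptor *desc;
	dma_cookie_t cookie;

	/* Ask for a reusable descriptor at prepare time */
	desc = dmaengine_prep_slave_sg(chan, sgl, sg_len, DMA_DEV_TO_MEM,
				       DMA_PREP_INTERRUPT | DMA_CTRL_REUSE);
	if (!desc)
		return -EINVAL;

	cookie = dmaengine_submit(desc);	/* first round */
	dma_async_issue_pending(chan);

	/* ... wait for completion and consume the data ... */

	cookie = dmaengine_submit(desc);	/* resubmit, no new prep */
	dma_async_issue_pending(chan);

	/* release the descriptor once it is no longer needed */
	dmaengine_desc_free(desc);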

Signed-off-by: Pierre-Yves MORDRET <pierre-yves.mordret@st.com>
---
  Version history:
    v3:
    v2:
    v1:
       * Initial
---
---
 drivers/dma/stm32-dma.c | 84 +++++++++++++++++++++++++++++++------------------
 1 file changed, 54 insertions(+), 30 deletions(-)

diff --git a/drivers/dma/stm32-dma.c b/drivers/dma/stm32-dma.c
index 85e81c4..ba79051 100644
--- a/drivers/dma/stm32-dma.c
+++ b/drivers/dma/stm32-dma.c
@@ -836,34 +836,8 @@ static int stm32_dma_mdma_start(struct stm32_dma_chan *chan,
 {
 	struct stm32_dma_mdma *mchan = &chan->mchan;
 	struct stm32_dma_mdma_desc *m_desc = &sg_req->m_desc;
-	struct dma_slave_config config;
 	int ret;
 
-	/* Configure MDMA channel */
-	memset(&config, 0, sizeof(config));
-	if (mchan->dir == DMA_MEM_TO_DEV)
-		config.dst_addr = mchan->sram_buf;
-	else
-		config.src_addr = mchan->sram_buf;
-
-	ret = dmaengine_slave_config(mchan->chan, &config);
-	if (ret < 0)
-		goto error;
-
-	 /* Prepare MDMA descriptor */
-	m_desc->desc = dmaengine_prep_slave_sg(mchan->chan, m_desc->sgt.sgl,
-					       m_desc->sgt.nents, mchan->dir,
-					       DMA_PREP_INTERRUPT);
-	if (!m_desc->desc) {
-		ret = -EINVAL;
-		goto error;
-	}
-
-	if (mchan->dir != DMA_MEM_TO_DEV) {
-		m_desc->desc->callback_result = stm32_mdma_chan_complete;
-		m_desc->desc->callback_param = chan;
-	}
-
 	ret = dma_submit_error(dmaengine_submit(m_desc->desc));
 	if (ret < 0) {
 		dev_err(chan2dev(chan), "MDMA submit failed\n");
@@ -1001,6 +975,7 @@ static void stm32_dma_start_transfer(struct stm32_dma_chan *chan)
 
 	chan->next_sg++;
 
+	reg->dma_scr &= ~STM32_DMA_SCR_EN;
 	stm32_dma_write(dmadev, STM32_DMA_SCR(chan->id), reg->dma_scr);
 	stm32_dma_write(dmadev, STM32_DMA_SPAR(chan->id), reg->dma_spar);
 	stm32_dma_write(dmadev, STM32_DMA_SM0AR(chan->id), reg->dma_sm0ar);
@@ -1238,9 +1213,11 @@ static void stm32_dma_clear_reg(struct stm32_dma_chan_reg *regs)
 
 static int stm32_dma_mdma_prep_slave_sg(struct stm32_dma_chan *chan,
 					struct scatterlist *sgl, u32 sg_len,
-					struct stm32_dma_desc *desc)
+					struct stm32_dma_desc *desc,
+					unsigned long flags)
 {
 	struct stm32_dma_device *dmadev = stm32_dma_get_dev(chan);
+	struct stm32_dma_mdma *mchan = &chan->mchan;
 	struct scatterlist *sg, *m_sg;
 	dma_addr_t dma_buf;
 	u32 len, num_sgs, sram_period;
@@ -1256,12 +1233,13 @@ static int stm32_dma_mdma_prep_slave_sg(struct stm32_dma_chan *chan,
 
 	for_each_sg(sgl, sg, sg_len, i) {
 		struct stm32_dma_mdma_desc *m_desc = &desc->sg_req[i].m_desc;
+		struct dma_slave_config config;
 
 		len = sg_dma_len(sg);
 		desc->sg_req[i].stm32_sgl_req = *sg;
 		num_sgs = 1;
 
-		if (chan->mchan.dir == DMA_MEM_TO_DEV) {
+		if (mchan->dir == DMA_MEM_TO_DEV) {
 			if (len > chan->sram_size) {
 				dev_err(chan2dev(chan),
 					"max buf size = %d bytes\n",
@@ -1293,6 +1271,38 @@ static int stm32_dma_mdma_prep_slave_sg(struct stm32_dma_chan *chan,
 			dma_buf += bytes;
 			len -= bytes;
 		}
+
+		/* Configure MDMA channel */
+		memset(&config, 0, sizeof(config));
+		if (mchan->dir == DMA_MEM_TO_DEV)
+			config.dst_addr = desc->dma_buf;
+		else
+			config.src_addr = desc->dma_buf;
+
+		ret = dmaengine_slave_config(mchan->chan, &config);
+		if (ret < 0)
+			goto err;
+
+		/* Prepare MDMA descriptor */
+		m_desc->desc = dmaengine_prep_slave_sg(mchan->chan,
+						       m_desc->sgt.sgl,
+						       m_desc->sgt.nents,
+						       mchan->dir,
+						       DMA_PREP_INTERRUPT);
+
+		if (!m_desc->desc) {
+			ret = -EINVAL;
+			goto err;
+		}
+
+		if (flags & DMA_CTRL_REUSE)
+			dmaengine_desc_set_reuse(m_desc->desc);
+
+		if (mchan->dir != DMA_MEM_TO_DEV) {
+			m_desc->desc->callback_result =
+				stm32_mdma_chan_complete;
+			m_desc->desc->callback_param = chan;
+		}
 	}
 
 	chan->mchan.sram_buf = desc->dma_buf;
@@ -1302,8 +1312,12 @@ static int stm32_dma_mdma_prep_slave_sg(struct stm32_dma_chan *chan,
 	return 0;
 
 err:
-	for (j = 0; j < i; j++)
+	for (j = 0; j < i; j++) {
+		struct stm32_dma_mdma_desc *m_desc = &desc->sg_req[j].m_desc;
+
+		m_desc->desc = NULL;
 		sg_free_table(&desc->sg_req[j].m_desc.sgt);
+	}
 free_alloc:
 	gen_pool_free(dmadev->sram_pool, (unsigned long)desc->dma_buf_cpu,
 		      chan->sram_size);
@@ -1385,7 +1399,8 @@ static struct dma_async_tx_descriptor *stm32_dma_prep_slave_sg(
 		struct scatterlist *s, *_sgl;
 
 		chan->mchan.dir = direction;
-		ret = stm32_dma_mdma_prep_slave_sg(chan, sgl, sg_len, desc);
+		ret = stm32_dma_mdma_prep_slave_sg(chan, sgl, sg_len, desc,
+						   flags);
 		if (ret < 0)
 			return NULL;
 
@@ -1791,6 +1806,14 @@ static void stm32_dma_desc_free(struct virt_dma_desc *vdesc)
 	int i;
 
 	if (chan->use_mdma) {
+		struct stm32_dma_mdma_desc *m_desc;
+
+		for (i = 0; i < desc->num_sgs; i++) {
+			m_desc = &desc->sg_req[i].m_desc;
+			dmaengine_desc_free(m_desc->desc);
+			m_desc->desc = NULL;
+		}
+
 		for (i = 0; i < desc->num_sgs; i++)
 			sg_free_table(&desc->sg_req[i].m_desc.sgt);
 
@@ -1934,6 +1957,7 @@ static int stm32_dma_probe(struct platform_device *pdev)
 	dd->directions = BIT(DMA_DEV_TO_MEM) | BIT(DMA_MEM_TO_DEV);
 	dd->residue_granularity = DMA_RESIDUE_GRANULARITY_BURST;
 	dd->max_burst = STM32_DMA_MAX_BURST;
+	dd->descriptor_reuse = true;
 	dd->dev = &pdev->dev;
 	INIT_LIST_HEAD(&dd->channels);
 
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v3 7/7] dmaengine: stm32-mdma: enable descriptor_reuse
  2018-09-28 13:01 [PATCH v3 0/7] Add-DMA-MDMA-chaining-support Pierre-Yves MORDRET
                   ` (5 preceding siblings ...)
  2018-09-28 13:01 ` [PATCH v3 6/7] dmaengine: stm32-dma: enable descriptor_reuse Pierre-Yves MORDRET
@ 2018-09-28 13:01 ` Pierre-Yves MORDRET
  6 siblings, 0 replies; 26+ messages in thread
From: Pierre-Yves MORDRET @ 2018-09-28 13:01 UTC (permalink / raw)
  To: Vinod Koul, Rob Herring, Mark Rutland, Alexandre Torgue,
	Maxime Coquelin, Dan Williams, devicetree, dmaengine,
	linux-arm-kernel, linux-kernel
  Cc: Pierre-Yves MORDRET

Enable descriptor reuse to spare descriptor creation in critical use cases.

Signed-off-by: Pierre-Yves MORDRET <pierre-yves.mordret@st.com>
---
  Version history:
    v3:
    v2:
    v1:
       * Initial
---
---
 drivers/dma/stm32-mdma.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/dma/stm32-mdma.c b/drivers/dma/stm32-mdma.c
index 6b6e63b..80a17dd 100644
--- a/drivers/dma/stm32-mdma.c
+++ b/drivers/dma/stm32-mdma.c
@@ -1715,6 +1715,8 @@ static int stm32_mdma_probe(struct platform_device *pdev)
 	dd->device_resume = stm32_mdma_resume;
 	dd->device_terminate_all = stm32_mdma_terminate_all;
 	dd->device_synchronize = stm32_mdma_synchronize;
+	dd->descriptor_reuse = true;
+
 	dd->src_addr_widths = BIT(DMA_SLAVE_BUSWIDTH_1_BYTE) |
 		BIT(DMA_SLAVE_BUSWIDTH_2_BYTES) |
 		BIT(DMA_SLAVE_BUSWIDTH_4_BYTES) |
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 1/7] dt-bindings: stm32-dma: Add DMA/MDMA chaining support bindings
  2018-09-28 13:01 ` [PATCH v3 1/7] dt-bindings: stm32-dma: Add DMA/MDMA chaining support bindings Pierre-Yves MORDRET
@ 2018-10-07 14:57   ` Vinod
  2018-10-09  7:18     ` Pierre Yves MORDRET
  2018-10-12 14:42   ` Rob Herring
  1 sibling, 1 reply; 26+ messages in thread
From: Vinod @ 2018-10-07 14:57 UTC (permalink / raw)
  To: Pierre-Yves MORDRET
  Cc: Rob Herring, Mark Rutland, Alexandre Torgue, Maxime Coquelin,
	Dan Williams, devicetree, dmaengine, linux-arm-kernel,
	linux-kernel

On 28-09-18, 15:01, Pierre-Yves MORDRET wrote:
> From: M'boumba Cedric Madianga <cedric.madianga@gmail.com>
> 
> This patch adds dma bindings to support DMA/MDMA chaining transfer.
> 2 bits are to manage the DMA FIFO threshold.
> 1 bit is to manage the DMA/MDMA chaining feature.
> 2 bits are used to specify the SRAM size to use for DMA/MDMA chaining.

Please do mention which specific bits?

> The size in bytes of a certain order is given by the formula:
>     (2 ^ order) * PAGE_SIZE.
> The order is given by those 2 bits.
> For cyclic, if chaining is chosen, any value above 1 can be set:
> SRAM buffer size will rely on period size and not on this DT value.
> 
> Signed-off-by: Pierre-Yves MORDRET <pierre-yves.mordret@st.com>
> ---
>   Version history:
>     v3:
>     v2:
>        * rework content
>     v1:
>        * Initial
> ---
> ---
>  .../devicetree/bindings/dma/stm32-dma.txt          | 27 +++++++++++++++++++++-
>  1 file changed, 26 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/devicetree/bindings/dma/stm32-dma.txt b/Documentation/devicetree/bindings/dma/stm32-dma.txt
> index c5f5190..2bac8c7 100644
> --- a/Documentation/devicetree/bindings/dma/stm32-dma.txt
> +++ b/Documentation/devicetree/bindings/dma/stm32-dma.txt
> @@ -17,6 +17,12 @@ Optional properties:
>  - resets: Reference to a reset controller asserting the DMA controller
>  - st,mem2mem: boolean; if defined, it indicates that the controller supports
>    memory-to-memory transfer
> +- dmas: A list of eight dma specifiers, one for each entry in dma-names.
> +  Refer to stm32-mdma.txt for more details.
> +- dma-names: should contain "ch0", "ch1", "ch2", "ch3", "ch4", "ch5", "ch6" and
> +  "ch7" and represents each STM32 DMA channel connected to a STM32 MDMA one.
> +- memory-region : phandle to a node describing memory to be used for
> +  M2M intermediate transfer between DMA and MDMA.
>  
>  Example:
>  
> @@ -36,6 +42,16 @@ Example:
>  		st,mem2mem;
>  		resets = <&rcc 150>;
>  		dma-requests = <8>;
> +		dmas = <&mdma1 8 0x10 0x1200000a 0x40026408 0x00000020 1>,
> +		       <&mdma1 9 0x10 0x1200000a 0x40026408 0x00000800 1>,
> +		       <&mdma1 10 0x10 0x1200000a 0x40026408 0x00200000 1>,
> +		       <&mdma1 11 0x10 0x1200000a 0x40026408 0x08000000 1>,
> +		       <&mdma1 12 0x10 0x1200000a 0x4002640C 0x00000020 1>,
> +		       <&mdma1 13 0x10 0x1200000a 0x4002640C 0x00000800 1>,
> +		       <&mdma1 14 0x10 0x1200000a 0x4002640C 0x00200000 1>,
> +		       <&mdma1 15 0x10 0x1200000a 0x4002640C 0x08000000 1>;
> +		dma-names = "ch0", "ch1", "ch2", "ch3", "ch4", "ch5", "ch6", "ch7";
> +		memory-region = <&sram_dmapool>;
>  	};
>  
>  * DMA client
> @@ -68,7 +84,16 @@ channel: a phandle to the DMA controller plus the following four integer cells:
>  	0x1: 1/2 full FIFO
>  	0x2: 3/4 full FIFO
>  	0x3: full FIFO
> -
> + -bit 2: Intermediate M2M transfer from/to DDR to/from SRAM through MDMA
> +	0: MDMA not used to generate an intermediate M2M transfer
> +	1: MDMA used to generate an intermediate M2M transfer.
> + -bit 3-4: indicates SRAM buffer size in (2^order)*PAGE_SIZE.
> +	PAGE_SIZE is given by Linux at 4KiB: include/asm-generic/page.h.
> +	Order is given by those 2 bits starting at 0.
> +	Valid only when Intermediate M2M transfer is set.

why do we need this as a property?

> +	For cyclic, if Intermediate M2M transfer is chosen, any value can
> +	be set: SRAM buffer size will rely on period size and not on this DT
> +	value.
>  
>  Example:
>  
> -- 
> 2.7.4

-- 
~Vinod

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 2/7] dt-bindings: stm32-dmamux: Add one cell to support DMA/MDMA chain
  2018-09-28 13:01 ` [PATCH v3 2/7] dt-bindings: stm32-dmamux: Add one cell to support DMA/MDMA chain Pierre-Yves MORDRET
@ 2018-10-07 14:58   ` Vinod
  2018-10-09  7:22     ` Pierre Yves MORDRET
  2018-10-12 14:46   ` Rob Herring
  1 sibling, 1 reply; 26+ messages in thread
From: Vinod @ 2018-10-07 14:58 UTC (permalink / raw)
  To: Pierre-Yves MORDRET
  Cc: Rob Herring, Mark Rutland, Alexandre Torgue, Maxime Coquelin,
	Dan Williams, devicetree, dmaengine, linux-arm-kernel,
	linux-kernel

On 28-09-18, 15:01, Pierre-Yves MORDRET wrote:
> From: M'boumba Cedric Madianga <cedric.madianga@gmail.com>
> 
> Add one cell to support DMA/MDMA chaining.
> 
> Signed-off-by: Pierre-Yves MORDRET <pierre-yves.mordret@st.com>
> ---
>   Version history:
>     v3:
>     v2:
>        * rework content
>     v1:
>        * Initial
> ---
> ---
>  Documentation/devicetree/bindings/dma/stm32-dmamux.txt | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/devicetree/bindings/dma/stm32-dmamux.txt b/Documentation/devicetree/bindings/dma/stm32-dmamux.txt
> index 1b893b2..5e92b59 100644
> --- a/Documentation/devicetree/bindings/dma/stm32-dmamux.txt
> +++ b/Documentation/devicetree/bindings/dma/stm32-dmamux.txt
> @@ -4,9 +4,9 @@ Required properties:
>  - compatible:	"st,stm32h7-dmamux"
>  - reg:		Memory map for accessing module
>  - #dma-cells:	Should be set to <3>.
> -		First parameter is request line number.
> -		Second is DMA channel configuration
> -		Third is Fifo threshold
> +-		First parameter is request line number.
> +-		Second is DMA channel configuration
> +-		Third is a 32bit bitfield

please separate out formatting changes and actual change proposed..

>  		For more details about the three cells, please see
>  		stm32-dma.txt documentation binding file
>  - dma-masters:	Phandle pointing to the DMA controllers.
> -- 
> 2.7.4

-- 
~Vinod

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 3/7] dt-bindings: stm32-mdma: Add DMA/MDMA chaining support bindings
  2018-09-28 13:01 ` [PATCH v3 3/7] dt-bindings: stm32-mdma: Add DMA/MDMA chaining support bindings Pierre-Yves MORDRET
@ 2018-10-07 14:59   ` Vinod
  2018-10-09  8:17     ` Pierre Yves MORDRET
  0 siblings, 1 reply; 26+ messages in thread
From: Vinod @ 2018-10-07 14:59 UTC (permalink / raw)
  To: Pierre-Yves MORDRET
  Cc: Rob Herring, Mark Rutland, Alexandre Torgue, Maxime Coquelin,
	Dan Williams, devicetree, dmaengine, linux-arm-kernel,
	linux-kernel

On 28-09-18, 15:01, Pierre-Yves MORDRET wrote:
> From: M'boumba Cedric Madianga <cedric.madianga@gmail.com>
> 
> This patch adds the description of the 2 properties needed to support M2M
> transfer triggered by STM32 DMA when its transfer is complete.
> 
> Signed-off-by: Pierre-Yves MORDRET <pierre-yves.mordret@st.com>
> ---
>   Version history:
>     v3:
>     v2:
>        * rework content
>     v1:
>        * Initial
> ---
> ---
>  Documentation/devicetree/bindings/dma/stm32-mdma.txt | 12 ++++++++----
>  1 file changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git a/Documentation/devicetree/bindings/dma/stm32-mdma.txt b/Documentation/devicetree/bindings/dma/stm32-mdma.txt
> index d18772d..27c2812 100644
> --- a/Documentation/devicetree/bindings/dma/stm32-mdma.txt
> +++ b/Documentation/devicetree/bindings/dma/stm32-mdma.txt
> @@ -10,7 +10,7 @@ Required properties:
>  - interrupts: Should contain the MDMA interrupt.
>  - clocks: Should contain the input clock of the DMA instance.
>  - resets: Reference to a reset controller asserting the DMA controller.
> -- #dma-cells : Must be <5>. See DMA client paragraph for more details.
> +- #dma-cells : Must be <6>. See DMA client paragraph for more details.

can you update the example for 6 cells?

Also what happens to dts using 5 cells..

>  
>  Optional properties:
>  - dma-channels: Number of DMA channels supported by the controller.
> @@ -26,7 +26,7 @@ Example:
>  		interrupts = <122>;
>  		clocks = <&timer_clk>;
>  		resets = <&rcc 992>;
> -		#dma-cells = <5>;
> +		#dma-cells = <6>;
>  		dma-channels = <16>;
>  		dma-requests = <32>;
>  		st,ahb-addr-masks = <0x20000000>, <0x00000000>;
> @@ -35,8 +35,8 @@ Example:
>  * DMA client
>  
>  DMA clients connected to the STM32 MDMA controller must use the format
> -described in the dma.txt file, using a five-cell specifier for each channel:
> -a phandle to the MDMA controller plus the following five integer cells:
> +described in the dma.txt file, using a six-cell specifier for each channel:
> +a phandle to the MDMA controller plus the following six integer cells:
>  
>  1. The request line number
>  2. The priority level
> @@ -76,6 +76,10 @@ a phandle to the MDMA controller plus the following five integer cells:
>     if no HW ack signal is used by the MDMA client
>  5. A 32bit mask specifying the value to be written to acknowledge the request
>     if no HW ack signal is used by the MDMA client
> +6. A bitfield value specifying if the MDMA client wants to generate M2M
> +   transfer with HW trigger (1) or not (0). This bitfield should be only
> +   enabled for M2M transfer triggered by STM32 DMA client. The memory devices
> +   involved in this kind of transfer are SRAM and DDR.
>  
>  Example:
>  
> -- 
> 2.7.4

-- 
~Vinod

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 4/7] dmaengine: stm32-dma: Add DMA/MDMA chaining support
  2018-09-28 13:01 ` [PATCH v3 4/7] dmaengine: stm32-dma: Add DMA/MDMA chaining support Pierre-Yves MORDRET
@ 2018-10-07 16:00   ` Vinod
  2018-10-09  8:40     ` Pierre Yves MORDRET
  0 siblings, 1 reply; 26+ messages in thread
From: Vinod @ 2018-10-07 16:00 UTC (permalink / raw)
  To: Pierre-Yves MORDRET
  Cc: Rob Herring, Mark Rutland, Alexandre Torgue, Maxime Coquelin,
	Dan Williams, devicetree, dmaengine, linux-arm-kernel,
	linux-kernel

On 28-09-18, 15:01, Pierre-Yves MORDRET wrote:
> This patch adds DMA/MDMA chaining support.
> It introduces an intermediate transfer between peripherals and STM32 DMA.
> This intermediate transfer is triggered by SW for single M2D transfer and
> by STM32 DMA IP for all other modes (sg, cyclic) and direction (D2M).
> 
> A generic SRAM allocator is used for this intermediate buffer
> Each DMA channel will be able to define its SRAM needs to achieve chaining
> feature : (2 ^ order) * PAGE_SIZE.
> For cyclic, SRAM buffer is derived from period length (rounded on
> PAGE_SIZE).

So IIUC, you chain two dma txns together and transfer data via an SRAM?

> 
> Signed-off-by: Pierre-Yves MORDRET <pierre-yves.mordret@st.com>
> ---
>   Version history:
>     v3:
>        * Solve KBuild warning
>     v2:
>     v1:
>        * Initial
> ---
> ---
>  drivers/dma/stm32-dma.c | 879 ++++++++++++++++++++++++++++++++++++++++++------

that is a lot of change for a driver, consider splitting it up
logically in smaller changes...

>  1 file changed, 772 insertions(+), 107 deletions(-)
> 
> diff --git a/drivers/dma/stm32-dma.c b/drivers/dma/stm32-dma.c
> index 379e8d5..85e81c4 100644
> --- a/drivers/dma/stm32-dma.c
> +++ b/drivers/dma/stm32-dma.c
> @@ -15,11 +15,14 @@
>  #include <linux/dmaengine.h>
>  #include <linux/dma-mapping.h>
>  #include <linux/err.h>
> +#include <linux/genalloc.h>
>  #include <linux/init.h>
> +#include <linux/iopoll.h>
>  #include <linux/jiffies.h>
>  #include <linux/list.h>
>  #include <linux/module.h>
>  #include <linux/of.h>
> +#include <linux/of_address.h>
>  #include <linux/of_device.h>
>  #include <linux/of_dma.h>
>  #include <linux/platform_device.h>
> @@ -118,6 +121,7 @@
>  #define STM32_DMA_FIFO_THRESHOLD_FULL			0x03
>  
>  #define STM32_DMA_MAX_DATA_ITEMS	0xffff
> +#define STM32_DMA_SRAM_GRANULARITY	PAGE_SIZE
>  /*
>   * Valid transfer starts from @0 to @0xFFFE leading to unaligned scatter
>   * gather at boundary. Thus it's safer to round down this value on FIFO
> @@ -135,6 +139,12 @@
>  /* DMA Features */
>  #define STM32_DMA_THRESHOLD_FTR_MASK	GENMASK(1, 0)
>  #define STM32_DMA_THRESHOLD_FTR_GET(n)	((n) & STM32_DMA_THRESHOLD_FTR_MASK)
> +#define STM32_DMA_MDMA_CHAIN_FTR_MASK	BIT(2)
> +#define STM32_DMA_MDMA_CHAIN_FTR_GET(n)	(((n) & STM32_DMA_MDMA_CHAIN_FTR_MASK) \
> +					 >> 2)
> +#define STM32_DMA_MDMA_SRAM_SIZE_MASK	GENMASK(4, 3)
> +#define STM32_DMA_MDMA_SRAM_SIZE_GET(n)	(((n) & STM32_DMA_MDMA_SRAM_SIZE_MASK) \
> +					 >> 3)
>  
>  enum stm32_dma_width {
>  	STM32_DMA_BYTE,
> @@ -176,15 +186,31 @@ struct stm32_dma_chan_reg {
>  	u32 dma_sfcr;
>  };
>  
> +struct stm32_dma_mdma_desc {
> +	struct sg_table sgt;
> +	struct dma_async_tx_descriptor *desc;
> +};
> +
> +struct stm32_dma_mdma {
> +	struct dma_chan *chan;
> +	enum dma_transfer_direction dir;
> +	dma_addr_t sram_buf;
> +	u32 sram_period;
> +	u32 num_sgs;
> +};
> +
>  struct stm32_dma_sg_req {
> -	u32 len;
> +	struct scatterlist stm32_sgl_req;
>  	struct stm32_dma_chan_reg chan_reg;
> +	struct stm32_dma_mdma_desc m_desc;
>  };
>  
>  struct stm32_dma_desc {
>  	struct virt_dma_desc vdesc;
>  	bool cyclic;
>  	u32 num_sgs;
> +	dma_addr_t dma_buf;
> +	void *dma_buf_cpu;
>  	struct stm32_dma_sg_req sg_req[];
>  };
>  
> @@ -201,6 +227,10 @@ struct stm32_dma_chan {
>  	u32 threshold;
>  	u32 mem_burst;
>  	u32 mem_width;
> +	struct stm32_dma_mdma mchan;
> +	u32 use_mdma;
> +	u32 sram_size;
> +	u32 residue_after_drain;
>  };
>  
>  struct stm32_dma_device {
> @@ -210,6 +240,7 @@ struct stm32_dma_device {
>  	struct reset_control *rst;
>  	bool mem2mem;
>  	struct stm32_dma_chan chan[STM32_DMA_MAX_CHANNELS];
> +	struct gen_pool *sram_pool;
>  };
>  
>  static struct stm32_dma_device *stm32_dma_get_dev(struct stm32_dma_chan *chan)
> @@ -497,11 +528,15 @@ static void stm32_dma_stop(struct stm32_dma_chan *chan)
>  static int stm32_dma_terminate_all(struct dma_chan *c)
>  {
>  	struct stm32_dma_chan *chan = to_stm32_dma_chan(c);
> +	struct stm32_dma_mdma *mchan = &chan->mchan;
>  	unsigned long flags;
>  	LIST_HEAD(head);
>  
>  	spin_lock_irqsave(&chan->vchan.lock, flags);
>  
> +	if (chan->use_mdma)
> +		dmaengine_terminate_async(mchan->chan);
> +
>  	if (chan->busy) {
>  		stm32_dma_stop(chan);
>  		chan->desc = NULL;
> @@ -514,9 +549,96 @@ static int stm32_dma_terminate_all(struct dma_chan *c)
>  	return 0;
>  }
>  
> +static u32 stm32_dma_get_remaining_bytes(struct stm32_dma_chan *chan)
> +{
> +	u32 dma_scr, width, ndtr;
> +	struct stm32_dma_device *dmadev = stm32_dma_get_dev(chan);
> +
> +	dma_scr = stm32_dma_read(dmadev, STM32_DMA_SCR(chan->id));
> +	width = STM32_DMA_SCR_PSIZE_GET(dma_scr);
> +	ndtr = stm32_dma_read(dmadev, STM32_DMA_SNDTR(chan->id));
> +
> +	return ndtr << width;
> +}
> +
> +static int stm32_dma_mdma_drain(struct stm32_dma_chan *chan)
> +{
> +	struct stm32_dma_mdma *mchan = &chan->mchan;
> +	struct stm32_dma_sg_req *sg_req;
> +	struct dma_device *ddev = mchan->chan->device;
> +	struct dma_async_tx_descriptor *desc = NULL;
> +	enum dma_status status;
> +	dma_addr_t src_buf, dst_buf;
> +	u32 mdma_residue, mdma_wrote, dma_to_write, len;
> +	struct dma_tx_state state;
> +	int ret;
> +
> +	/* DMA/MDMA chain: drain remaining data in SRAM */
> +
> +	/* Get the residue on MDMA side */
> +	status = dmaengine_tx_status(mchan->chan, mchan->chan->cookie, &state);
> +	if (status == DMA_COMPLETE)
> +		return status;
> +
> +	mdma_residue = state.residue;
> +	sg_req = &chan->desc->sg_req[chan->next_sg - 1];
> +	len = sg_dma_len(&sg_req->stm32_sgl_req);
> +
> +	/*
> +	 * Total = mdma blocks * sram_period + rest (< sram_period)
> +	 * so mdma blocks * sram_period = len - mdma residue - rest
> +	 */
> +	mdma_wrote = len - mdma_residue - (len % mchan->sram_period);
> +
> +	/* Remaining data stuck in SRAM */
> +	dma_to_write = mchan->sram_period - stm32_dma_get_remaining_bytes(chan);
> +	if (dma_to_write > 0) {
> +		/* Stop DMA current operation */
> +		stm32_dma_disable_chan(chan);
> +
> +		/* Terminate current MDMA to initiate a new one */
> +		dmaengine_terminate_all(mchan->chan);
> +
> +		/* Double buffer management */
> +		src_buf = mchan->sram_buf +
> +			  ((mdma_wrote / mchan->sram_period) & 0x1) *
> +			  mchan->sram_period;
> +		dst_buf = sg_dma_address(&sg_req->stm32_sgl_req) + mdma_wrote;
> +
> +		desc = ddev->device_prep_dma_memcpy(mchan->chan,
> +						    dst_buf, src_buf,
> +						    dma_to_write,
> +						    DMA_PREP_INTERRUPT);

why would you do that?

If at all you need to create another txn, I think it would be good to
prepare a new descriptor and chain it, not call the dmaengine APIs..

-- 
~Vinod

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 1/7] dt-bindings: stm32-dma: Add DMA/MDMA chaining support bindings
  2018-10-07 14:57   ` Vinod
@ 2018-10-09  7:18     ` Pierre Yves MORDRET
  2018-10-09  8:57       ` Vinod
  0 siblings, 1 reply; 26+ messages in thread
From: Pierre Yves MORDRET @ 2018-10-09  7:18 UTC (permalink / raw)
  To: Vinod
  Cc: Rob Herring, Mark Rutland, Alexandre Torgue, Maxime Coquelin,
	Dan Williams, devicetree, dmaengine, linux-arm-kernel,
	linux-kernel

Hi Vinod

On 10/07/2018 04:57 PM, Vinod wrote:
> On 28-09-18, 15:01, Pierre-Yves MORDRET wrote:
>> From: M'boumba Cedric Madianga <cedric.madianga@gmail.com>
>>
>> This patch adds dma bindings to support DMA/MDMA chaining transfer.
>> 2 bits are to manage the DMA FIFO threshold.
>> 1 bit is to manage the DMA/MDMA chaining feature.
>> 2 bits are used to specify the SRAM size to use for DMA/MDMA chaining.
> 
> Please do mention which specific bits?

This is described below in the DMA client section, but I can add some words here.

> 
>> The size in bytes of a certain order is given by the formula:
>>     (2 ^ order) * PAGE_SIZE.
>> The order is given by those 2 bits.
>> For cyclic, if chaining is chosen, any value above 1 can be set:
>> SRAM buffer size will rely on period size and not on this DT value.
>>
>> Signed-off-by: Pierre-Yves MORDRET <pierre-yves.mordret@st.com>
>> ---
>>   Version history:
>>     v3:
>>     v2:
>>        * rework content
>>     v1:
>>        * Initial
>> ---
>> ---
>>  .../devicetree/bindings/dma/stm32-dma.txt          | 27 +++++++++++++++++++++-
>>  1 file changed, 26 insertions(+), 1 deletion(-)
>>
>> diff --git a/Documentation/devicetree/bindings/dma/stm32-dma.txt b/Documentation/devicetree/bindings/dma/stm32-dma.txt
>> index c5f5190..2bac8c7 100644
>> --- a/Documentation/devicetree/bindings/dma/stm32-dma.txt
>> +++ b/Documentation/devicetree/bindings/dma/stm32-dma.txt
>> @@ -17,6 +17,12 @@ Optional properties:
>>  - resets: Reference to a reset controller asserting the DMA controller
>>  - st,mem2mem: boolean; if defined, it indicates that the controller supports
>>    memory-to-memory transfer
>> +- dmas: A list of eight dma specifiers, one for each entry in dma-names.
>> +  Refer to stm32-mdma.txt for more details.
>> +- dma-names: should contain "ch0", "ch1", "ch2", "ch3", "ch4", "ch5", "ch6" and
>> +  "ch7" and represents each STM32 DMA channel connected to a STM32 MDMA one.
>> +- memory-region : phandle to a node describing memory to be used for
>> +  M2M intermediate transfer between DMA and MDMA.
>>  
>>  Example:
>>  
>> @@ -36,6 +42,16 @@ Example:
>>  		st,mem2mem;
>>  		resets = <&rcc 150>;
>>  		dma-requests = <8>;
>> +		dmas = <&mdma1 8 0x10 0x1200000a 0x40026408 0x00000020 1>,
>> +		       <&mdma1 9 0x10 0x1200000a 0x40026408 0x00000800 1>,
>> +		       <&mdma1 10 0x10 0x1200000a 0x40026408 0x00200000 1>,
>> +		       <&mdma1 11 0x10 0x1200000a 0x40026408 0x08000000 1>,
>> +		       <&mdma1 12 0x10 0x1200000a 0x4002640C 0x00000020 1>,
>> +		       <&mdma1 13 0x10 0x1200000a 0x4002640C 0x00000800 1>,
>> +		       <&mdma1 14 0x10 0x1200000a 0x4002640C 0x00200000 1>,
>> +		       <&mdma1 15 0x10 0x1200000a 0x4002640C 0x08000000 1>;
>> +		dma-names = "ch0", "ch1", "ch2", "ch3", "ch4", "ch5", "ch6", "ch7";
>> +		memory-region = <&sram_dmapool>;
>>  	};
>>  
>>  * DMA client
>> @@ -68,7 +84,16 @@ channel: a phandle to the DMA controller plus the following four integer cells:
>>  	0x1: 1/2 full FIFO
>>  	0x2: 3/4 full FIFO
>>  	0x3: full FIFO
>> -
>> + -bit 2: Intermediate M2M transfer from/to DDR to/from SRAM through MDMA
>> +	0: MDMA not used to generate an intermediate M2M transfer
>> +	1: MDMA used to generate an intermediate M2M transfer.
>> + -bit 3-4: indicates SRAM buffer size in (2^order)*PAGE_SIZE.
>> +	PAGE_SIZE is given by Linux at 4KiB: include/asm-generic/page.h.
>> +	Order is given by those 2 bits starting at 0.
>> +	Valid only when Intermediate M2M transfer is set.
> 
> why do we need this as a property?

In some use cases, we need more than 4 KiB when chaining for better performance.
Chaining has to be enabled by the client if performance is at stake.
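
To make the encoding concrete (an illustration of mine, not from the binding
patch; the request line and channel-config values are placeholders):

	dmas = <&dmamux1 45 0x400 0x17>;
	/* 0x17 = bits 0-1: 0x3 (full FIFO threshold)
	 *        bit  2  : 1   (DMA/MDMA chaining enabled)
	 *        bits 3-4: 2   -> SRAM buffer = (2^2) * 4 KiB = 16 KiB
	 */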

> 
>> +	For cyclic, if Intermediate M2M transfer is chosen, any value can
>> +	be set: SRAM buffer size will rely on period size and not on this DT
>> +	value.
>>  
>>  Example:
>>  
>> -- 
>> 2.7.4
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 2/7] dt-bindings: stm32-dmamux: Add one cell to support DMA/MDMA chain
  2018-10-07 14:58   ` Vinod
@ 2018-10-09  7:22     ` Pierre Yves MORDRET
  0 siblings, 0 replies; 26+ messages in thread
From: Pierre Yves MORDRET @ 2018-10-09  7:22 UTC (permalink / raw)
  To: Vinod
  Cc: Rob Herring, Mark Rutland, Alexandre Torgue, Maxime Coquelin,
	Dan Williams, devicetree, dmaengine, linux-arm-kernel,
	linux-kernel



On 10/07/2018 04:58 PM, Vinod wrote:
> On 28-09-18, 15:01, Pierre-Yves MORDRET wrote:
>> From: M'boumba Cedric Madianga <cedric.madianga@gmail.com>
>>
>> Add one cell to support DMA/MDMA chaining.
>>
>> Signed-off-by: Pierre-Yves MORDRET <pierre-yves.mordret@st.com>
>> ---
>>   Version history:
>>     v3:
>>     v2:
>>        * rework content
>>     v1:
>>        * Initial
>> ---
>> ---
>>  Documentation/devicetree/bindings/dma/stm32-dmamux.txt | 6 +++---
>>  1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/Documentation/devicetree/bindings/dma/stm32-dmamux.txt b/Documentation/devicetree/bindings/dma/stm32-dmamux.txt
>> index 1b893b2..5e92b59 100644
>> --- a/Documentation/devicetree/bindings/dma/stm32-dmamux.txt
>> +++ b/Documentation/devicetree/bindings/dma/stm32-dmamux.txt
>> @@ -4,9 +4,9 @@ Required properties:
>>  - compatible:	"st,stm32h7-dmamux"
>>  - reg:		Memory map for accessing module
>>  - #dma-cells:	Should be set to <3>.
>> -		First parameter is request line number.
>> -		Second is DMA channel configuration
>> -		Third is Fifo threshold
>> +-		First parameter is request line number.
>> +-		Second is DMA channel configuration
>> +-		Third is a 32bit bitfield
> 
> please separate out formatting changes and actual change proposed..

Yes. sorry. my bad.

> 
>>  		For more details about the three cells, please see
>>  		stm32-dma.txt documentation binding file
>>  - dma-masters:	Phandle pointing to the DMA controllers.
>> -- 
>> 2.7.4
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 3/7] dt-bindings: stm32-mdma: Add DMA/MDMA chaining support bindings
  2018-10-07 14:59   ` Vinod
@ 2018-10-09  8:17     ` Pierre Yves MORDRET
  0 siblings, 0 replies; 26+ messages in thread
From: Pierre Yves MORDRET @ 2018-10-09  8:17 UTC (permalink / raw)
  To: Vinod
  Cc: Rob Herring, Mark Rutland, Alexandre Torgue, Maxime Coquelin,
	Dan Williams, devicetree, dmaengine, linux-arm-kernel,
	linux-kernel



On 10/07/2018 04:59 PM, Vinod wrote:
> On 28-09-18, 15:01, Pierre-Yves MORDRET wrote:
>> From: M'boumba Cedric Madianga <cedric.madianga@gmail.com>
>>
>> This patch adds the description of the 2 properties needed to support M2M
>> transfer triggered by STM32 DMA when his transfer is complete.
>>
>> Signed-off-by: Pierre-Yves MORDRET <pierre-yves.mordret@st.com>
>> ---
>>   Version history:
>>     v3:
>>     v2:
>>        * rework content
>>     v1:
>>        * Initial
>> ---
>> ---
>>  Documentation/devicetree/bindings/dma/stm32-mdma.txt | 12 ++++++++----
>>  1 file changed, 8 insertions(+), 4 deletions(-)
>>
>> diff --git a/Documentation/devicetree/bindings/dma/stm32-mdma.txt b/Documentation/devicetree/bindings/dma/stm32-mdma.txt
>> index d18772d..27c2812 100644
>> --- a/Documentation/devicetree/bindings/dma/stm32-mdma.txt
>> +++ b/Documentation/devicetree/bindings/dma/stm32-mdma.txt
>> @@ -10,7 +10,7 @@ Required properties:
>>  - interrupts: Should contain the MDMA interrupt.
>>  - clocks: Should contain the input clock of the DMA instance.
>>  - resets: Reference to a reset controller asserting the DMA controller.
>> -- #dma-cells : Must be <5>. See DMA client paragraph for more details.
>> +- #dma-cells : Must be <6>. See DMA client paragraph for more details.
> 
> can you update the example for 6 cells?

of course.
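
For reference, the six-cell client form already used in the stm32-dma.txt
example of this series looks like this (the trailing 1 being the new m2m_hw
cell):

	dmas = <&mdma1 8 0x10 0x1200000a 0x40026408 0x00000020 1>;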

> 
> Also what happens to dts using 5 cells..

They are not managed, but they should be. I will fix this flaw. Thanks for
pointing this out.

> 
>>  
>>  Optional properties:
>>  - dma-channels: Number of DMA channels supported by the controller.
>> @@ -26,7 +26,7 @@ Example:
>>  		interrupts = <122>;
>>  		clocks = <&timer_clk>;
>>  		resets = <&rcc 992>;
>> -		#dma-cells = <5>;
>> +		#dma-cells = <6>;
>>  		dma-channels = <16>;
>>  		dma-requests = <32>;
>>  		st,ahb-addr-masks = <0x20000000>, <0x00000000>;
>> @@ -35,8 +35,8 @@ Example:
>>  * DMA client
>>  
>>  DMA clients connected to the STM32 MDMA controller must use the format
>> -described in the dma.txt file, using a five-cell specifier for each channel:
>> -a phandle to the MDMA controller plus the following five integer cells:
>> +described in the dma.txt file, using a six-cell specifier for each channel:
>> +a phandle to the MDMA controller plus the following six integer cells:
>>  
>>  1. The request line number
>>  2. The priority level
>> @@ -76,6 +76,10 @@ a phandle to the MDMA controller plus the following five integer cells:
>>     if no HW ack signal is used by the MDMA client
>>  5. A 32bit mask specifying the value to be written to acknowledge the request
>>     if no HW ack signal is used by the MDMA client
>> +6. A bitfield value specifying if the MDMA client wants to generate M2M
>> +   transfer with HW trigger (1) or not (0). This bitfield should be only
>> +   enabled for M2M transfer triggered by STM32 DMA client. The memory devices
>> +   involved in this kind of transfer are SRAM and DDR.
>>  
>>  Example:
>>  
>> -- 
>> 2.7.4
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 4/7] dmaengine: stm32-dma: Add DMA/MDMA chaining support
  2018-10-07 16:00   ` Vinod
@ 2018-10-09  8:40     ` Pierre Yves MORDRET
  2018-10-10  4:03       ` Vinod
  0 siblings, 1 reply; 26+ messages in thread
From: Pierre Yves MORDRET @ 2018-10-09  8:40 UTC (permalink / raw)
  To: Vinod
  Cc: Rob Herring, Mark Rutland, Alexandre Torgue, Maxime Coquelin,
	Dan Williams, devicetree, dmaengine, linux-arm-kernel,
	linux-kernel



On 10/07/2018 06:00 PM, Vinod wrote:
> On 28-09-18, 15:01, Pierre-Yves MORDRET wrote:
>> This patch adds DMA/MDMA chaining support.
>> It introduces an intermediate transfer between peripherals and STM32 DMA.
>> This intermediate transfer is triggered by SW for single M2D transfer and
>> by STM32 DMA IP for all other modes (sg, cyclic) and direction (D2M).
>>
>> A generic SRAM allocator is used for this intermediate buffer
>> Each DMA channel will be able to define its SRAM needs to achieve chaining
>> feature : (2 ^ order) * PAGE_SIZE.
>> For cyclic, SRAM buffer is derived from period length (rounded on
>> PAGE_SIZE).
> 
> So IIUC, you chain two dma txns together and transfer data via an SRAM?

Correct. One DMA is DMAv2 (stm32-dma) and the other is MDMA (stm32-mdma).
The intermediate transfer is between device and memory.
This intermediate transfer is using SRAM.

> 
>>
>> Signed-off-by: Pierre-Yves MORDRET <pierre-yves.mordret@st.com>
>> ---
>>   Version history:
>>     v3:
>>        * Solve KBuild warning
>>     v2:
>>     v1:
>>        * Initial
>> ---
>> ---
>>  drivers/dma/stm32-dma.c | 879 ++++++++++++++++++++++++++++++++++++++++++------
> 
> that is a lot of change for a driver, consider splitting it up
> logically in smaller changes...
> 

This feature is rather monolithic. Difficult to split up.
All the code is required at once.

>>  1 file changed, 772 insertions(+), 107 deletions(-)
>>
>> diff --git a/drivers/dma/stm32-dma.c b/drivers/dma/stm32-dma.c
>> index 379e8d5..85e81c4 100644
>> --- a/drivers/dma/stm32-dma.c
>> +++ b/drivers/dma/stm32-dma.c
>> @@ -15,11 +15,14 @@
>>  #include <linux/dmaengine.h>
>>  #include <linux/dma-mapping.h>
>>  #include <linux/err.h>
>> +#include <linux/genalloc.h>
>>  #include <linux/init.h>
>> +#include <linux/iopoll.h>
>>  #include <linux/jiffies.h>
>>  #include <linux/list.h>
>>  #include <linux/module.h>
>>  #include <linux/of.h>
>> +#include <linux/of_address.h>
>>  #include <linux/of_device.h>
>>  #include <linux/of_dma.h>
>>  #include <linux/platform_device.h>
>> @@ -118,6 +121,7 @@
>>  #define STM32_DMA_FIFO_THRESHOLD_FULL			0x03
>>  
>>  #define STM32_DMA_MAX_DATA_ITEMS	0xffff
>> +#define STM32_DMA_SRAM_GRANULARITY	PAGE_SIZE
>>  /*
>>   * Valid transfer starts from @0 to @0xFFFE leading to unaligned scatter
>>   * gather at boundary. Thus it's safer to round down this value on FIFO
>> @@ -135,6 +139,12 @@
>>  /* DMA Features */
>>  #define STM32_DMA_THRESHOLD_FTR_MASK	GENMASK(1, 0)
>>  #define STM32_DMA_THRESHOLD_FTR_GET(n)	((n) & STM32_DMA_THRESHOLD_FTR_MASK)
>> +#define STM32_DMA_MDMA_CHAIN_FTR_MASK	BIT(2)
>> +#define STM32_DMA_MDMA_CHAIN_FTR_GET(n)	(((n) & STM32_DMA_MDMA_CHAIN_FTR_MASK) \
>> +					 >> 2)
>> +#define STM32_DMA_MDMA_SRAM_SIZE_MASK	GENMASK(4, 3)
>> +#define STM32_DMA_MDMA_SRAM_SIZE_GET(n)	(((n) & STM32_DMA_MDMA_SRAM_SIZE_MASK) \
>> +					 >> 3)
>>  
>>  enum stm32_dma_width {
>>  	STM32_DMA_BYTE,
>> @@ -176,15 +186,31 @@ struct stm32_dma_chan_reg {
>>  	u32 dma_sfcr;
>>  };
>>  
>> +struct stm32_dma_mdma_desc {
>> +	struct sg_table sgt;
>> +	struct dma_async_tx_descriptor *desc;
>> +};
>> +
>> +struct stm32_dma_mdma {
>> +	struct dma_chan *chan;
>> +	enum dma_transfer_direction dir;
>> +	dma_addr_t sram_buf;
>> +	u32 sram_period;
>> +	u32 num_sgs;
>> +};
>> +
>>  struct stm32_dma_sg_req {
>> -	u32 len;
>> +	struct scatterlist stm32_sgl_req;
>>  	struct stm32_dma_chan_reg chan_reg;
>> +	struct stm32_dma_mdma_desc m_desc;
>>  };
>>  
>>  struct stm32_dma_desc {
>>  	struct virt_dma_desc vdesc;
>>  	bool cyclic;
>>  	u32 num_sgs;
>> +	dma_addr_t dma_buf;
>> +	void *dma_buf_cpu;
>>  	struct stm32_dma_sg_req sg_req[];
>>  };
>>  
>> @@ -201,6 +227,10 @@ struct stm32_dma_chan {
>>  	u32 threshold;
>>  	u32 mem_burst;
>>  	u32 mem_width;
>> +	struct stm32_dma_mdma mchan;
>> +	u32 use_mdma;
>> +	u32 sram_size;
>> +	u32 residue_after_drain;
>>  };
>>  
>>  struct stm32_dma_device {
>> @@ -210,6 +240,7 @@ struct stm32_dma_device {
>>  	struct reset_control *rst;
>>  	bool mem2mem;
>>  	struct stm32_dma_chan chan[STM32_DMA_MAX_CHANNELS];
>> +	struct gen_pool *sram_pool;
>>  };
>>  
>>  static struct stm32_dma_device *stm32_dma_get_dev(struct stm32_dma_chan *chan)
>> @@ -497,11 +528,15 @@ static void stm32_dma_stop(struct stm32_dma_chan *chan)
>>  static int stm32_dma_terminate_all(struct dma_chan *c)
>>  {
>>  	struct stm32_dma_chan *chan = to_stm32_dma_chan(c);
>> +	struct stm32_dma_mdma *mchan = &chan->mchan;
>>  	unsigned long flags;
>>  	LIST_HEAD(head);
>>  
>>  	spin_lock_irqsave(&chan->vchan.lock, flags);
>>  
>> +	if (chan->use_mdma)
>> +		dmaengine_terminate_async(mchan->chan);
>> +
>>  	if (chan->busy) {
>>  		stm32_dma_stop(chan);
>>  		chan->desc = NULL;
>> @@ -514,9 +549,96 @@ static int stm32_dma_terminate_all(struct dma_chan *c)
>>  	return 0;
>>  }
>>  
>> +static u32 stm32_dma_get_remaining_bytes(struct stm32_dma_chan *chan)
>> +{
>> +	u32 dma_scr, width, ndtr;
>> +	struct stm32_dma_device *dmadev = stm32_dma_get_dev(chan);
>> +
>> +	dma_scr = stm32_dma_read(dmadev, STM32_DMA_SCR(chan->id));
>> +	width = STM32_DMA_SCR_PSIZE_GET(dma_scr);
>> +	ndtr = stm32_dma_read(dmadev, STM32_DMA_SNDTR(chan->id));
>> +
>> +	return ndtr << width;
>> +}
>> +
>> +static int stm32_dma_mdma_drain(struct stm32_dma_chan *chan)
>> +{
>> +	struct stm32_dma_mdma *mchan = &chan->mchan;
>> +	struct stm32_dma_sg_req *sg_req;
>> +	struct dma_device *ddev = mchan->chan->device;
>> +	struct dma_async_tx_descriptor *desc = NULL;
>> +	enum dma_status status;
>> +	dma_addr_t src_buf, dst_buf;
>> +	u32 mdma_residue, mdma_wrote, dma_to_write, len;
>> +	struct dma_tx_state state;
>> +	int ret;
>> +
>> +	/* DMA/MDMA chain: drain remaining data in SRAM */
>> +
>> +	/* Get the residue on MDMA side */
>> +	status = dmaengine_tx_status(mchan->chan, mchan->chan->cookie, &state);
>> +	if (status == DMA_COMPLETE)
>> +		return status;
>> +
>> +	mdma_residue = state.residue;
>> +	sg_req = &chan->desc->sg_req[chan->next_sg - 1];
>> +	len = sg_dma_len(&sg_req->stm32_sgl_req);
>> +
>> +	/*
>> +	 * Total = mdma blocks * sram_period + rest (< sram_period)
>> +	 * so mdma blocks * sram_period = len - mdma residue - rest
>> +	 */
>> +	mdma_wrote = len - mdma_residue - (len % mchan->sram_period);
>> +
>> +	/* Remaining data stuck in SRAM */
>> +	dma_to_write = mchan->sram_period - stm32_dma_get_remaining_bytes(chan);
>> +	if (dma_to_write > 0) {
>> +		/* Stop DMA current operation */
>> +		stm32_dma_disable_chan(chan);
>> +
>> +		/* Terminate current MDMA to initiate a new one */
>> +		dmaengine_terminate_all(mchan->chan);
>> +
>> +		/* Double buffer management */
>> +		src_buf = mchan->sram_buf +
>> +			  ((mdma_wrote / mchan->sram_period) & 0x1) *
>> +			  mchan->sram_period;
>> +		dst_buf = sg_dma_address(&sg_req->stm32_sgl_req) + mdma_wrote;
>> +
>> +		desc = ddev->device_prep_dma_memcpy(mchan->chan,
>> +						    dst_buf, src_buf,
>> +						    dma_to_write,
>> +						    DMA_PREP_INTERRUPT);
> 
> why would you do that?
> 
> If at all you need to create another txn, I think it would be good to
> prepare a new descriptor and chain it, not call the dmaengine APIs..
> 

In this use case, DMAv2 is configured in cyclic mode because this DMA doesn't
support HW LLI, only SW LLI. We really use this cyclic mode for performance
reasons. This very last txn is to flush the remaining bytes stuck in SRAM.
I don't believe I can chain the cyclic transfer and this last txn.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 1/7] dt-bindings: stm32-dma: Add DMA/MDMA chaining support bindings
  2018-10-09  7:18     ` Pierre Yves MORDRET
@ 2018-10-09  8:57       ` Vinod
  2018-10-09 13:46         ` Pierre Yves MORDRET
  0 siblings, 1 reply; 26+ messages in thread
From: Vinod @ 2018-10-09  8:57 UTC (permalink / raw)
  To: Pierre Yves MORDRET
  Cc: Rob Herring, Mark Rutland, Alexandre Torgue, Maxime Coquelin,
	Dan Williams, devicetree, dmaengine, linux-arm-kernel,
	linux-kernel

Hi Pierre,

On 09-10-18, 09:18, Pierre Yves MORDRET wrote:

> >>  * DMA client
> >> @@ -68,7 +84,16 @@ channel: a phandle to the DMA controller plus the following four integer cells:
> >>  	0x1: 1/2 full FIFO
> >>  	0x2: 3/4 full FIFO
> >>  	0x3: full FIFO
> >> -
> >> + -bit 2: Intermediate M2M transfer from/to DDR to/from SRAM through MDMA
> >> +	0: MDMA not used to generate an intermediate M2M transfer
> >> +	1: MDMA used to generate an intermediate M2M transfer.
> >> + -bit 3-4: indicates SRAM buffer size in (2^order)*PAGE_SIZE.
> >> +	PAGE_SIZE is given by Linux at 4KiB: include/asm-generic/page.h.
> >> +	Order is given by those 2 bits starting at 0.
> >> +	Valid only when Intermediate M2M transfer is set.
> > 
> > why do we need this as a property?
> 
> In some use cases, we need more than 4 KiB when chaining for better performance.
> Chaining has to be enabled by the client if performance is at stake.

Okay if that is the case why is the user not taking care of this?
Creating DMA txn and chaining them up and starting the chain? Why would
dmaengine driver need to do that?

-- 
~Vinod

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 1/7] dt-bindings: stm32-dma: Add DMA/MDMA chaining support bindings
  2018-10-09  8:57       ` Vinod
@ 2018-10-09 13:46         ` Pierre Yves MORDRET
  0 siblings, 0 replies; 26+ messages in thread
From: Pierre Yves MORDRET @ 2018-10-09 13:46 UTC (permalink / raw)
  To: Vinod
  Cc: Rob Herring, Mark Rutland, Alexandre Torgue, Maxime Coquelin,
	Dan Williams, devicetree, dmaengine, linux-arm-kernel,
	linux-kernel



On 10/09/2018 10:57 AM, Vinod wrote:
> Hi Pierre,
> 
> On 09-10-18, 09:18, Pierre Yves MORDRET wrote:
> 
>>>>  * DMA client
>>>> @@ -68,7 +84,16 @@ channel: a phandle to the DMA controller plus the following four integer cells:
>>>>  	0x1: 1/2 full FIFO
>>>>  	0x2: 3/4 full FIFO
>>>>  	0x3: full FIFO
>>>> -
>>>> + -bit 2: Intermediate M2M transfer from/to DDR to/from SRAM through MDMA
>>>> +	0: MDMA not used to generate an intermediate M2M transfer
>>>> +	1: MDMA used to generate an intermediate M2M transfer.
>>>> + -bit 3-4: indicates SRAM buffer size in (2^order)*PAGE_SIZE.
>>>> +	PAGE_SIZE is given by Linux at 4KiB: include/asm-generic/page.h.
>>>> +	Order is given by those 2 bits starting at 0.
>>>> +	Valid only when Intermediate M2M transfer is set.
>>>
>>> why do we need this as a property?
>>
>> In some use cases, we need more than 4 KiB when chaining for better performance.
>> Chaining has to be enabled by the client if performance is at stake.
> 
> Okay if that is the case why is the user not taking care of this?
> Creating DMA txn and chaining them up and starting the chain? Why would
> dmaengine driver need to do that?
> 

The user is using the standard DMA API (single, sg or cyclic) and is (almost)
agnostic about what is behind the scenes. As a driver I just fulfill the
request to transfer what he wants. My driver scatters the transfer into SRAM
chunks of the size defined by the user. Unfortunately not all transfers are a
multiple of the SRAM size given in DT. This very last txn is to flush the
last expected bytes.
Whatever the user sets for chaining (bit 2), the DMA API remains the same on
its side.
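
To make the arithmetic concrete (numbers are mine, purely illustrative):

	/* One 10000-byte sg entry with a 4 KiB SRAM period is served as
	 * two full 4096-byte DMA/MDMA rounds; the remaining
	 * 10000 - 2 * 4096 = 1808 bytes are what the final flush txn
	 * moves out of SRAM.
	 */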


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 4/7] dmaengine: stm32-dma: Add DMA/MDMA chaining support
  2018-10-09  8:40     ` Pierre Yves MORDRET
@ 2018-10-10  4:03       ` Vinod
  2018-10-10  7:02         ` Pierre Yves MORDRET
  0 siblings, 1 reply; 26+ messages in thread
From: Vinod @ 2018-10-10  4:03 UTC (permalink / raw)
  To: Pierre Yves MORDRET
  Cc: Rob Herring, Mark Rutland, Alexandre Torgue, Maxime Coquelin,
	Dan Williams, devicetree, dmaengine, linux-arm-kernel,
	linux-kernel

On 09-10-18, 10:40, Pierre Yves MORDRET wrote:
> 
> 
> On 10/07/2018 06:00 PM, Vinod wrote:
> > On 28-09-18, 15:01, Pierre-Yves MORDRET wrote:
> >> This patch adds DMA/MDMA chaining support.
> >> It introduces an intermediate transfer between peripherals and STM32 DMA.
> >> This intermediate transfer is triggered by SW for single M2D transfer and
> >> by STM32 DMA IP for all other modes (sg, cyclic) and direction (D2M).
> >>
> >> A generic SRAM allocator is used for this intermediate buffer
> >> Each DMA channel will be able to define its SRAM needs to achieve chaining
> >> feature : (2 ^ order) * PAGE_SIZE.
> >> For cyclic, SRAM buffer is derived from period length (rounded on
> >> PAGE_SIZE).
> > 
> > So IIUC, you chain two dma txns together and transfer data via an SRAM?
> 
> Correct. One DMA is DMAv2 (stm32-dma) and the other is MDMA (stm32-mdma).
> The intermediate transfer is between device and memory.
> This intermediate transfer is using SRAM.

Ah, so you use dma calls to set up mdma transfers? I don't think that is a
good idea. How do you know you should use mdma for the subsequent transfer?


> >>  drivers/dma/stm32-dma.c | 879 ++++++++++++++++++++++++++++++++++++++++++------
> > 
> > that is a lot of change for a driver, consider splitting it up
> > logically in smaller changes...
> > 
> 
> This feature is rather monolithic. Difficult to split up.
> All the code is required at once.

It can be enabled at the end but split up logically. Intrusive changes to a
driver make it hard to review..

-- 
~Vinod

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 4/7] dmaengine: stm32-dma: Add DMA/MDMA chaining support
  2018-10-10  4:03       ` Vinod
@ 2018-10-10  7:02         ` Pierre Yves MORDRET
  2018-10-15 17:14           ` Vinod
  0 siblings, 1 reply; 26+ messages in thread
From: Pierre Yves MORDRET @ 2018-10-10  7:02 UTC (permalink / raw)
  To: Vinod
  Cc: Rob Herring, Mark Rutland, Alexandre Torgue, Maxime Coquelin,
	Dan Williams, devicetree, dmaengine, linux-arm-kernel,
	linux-kernel



On 10/10/2018 06:03 AM, Vinod wrote:
> On 09-10-18, 10:40, Pierre Yves MORDRET wrote:
>>
>>
>> On 10/07/2018 06:00 PM, Vinod wrote:
>>> On 28-09-18, 15:01, Pierre-Yves MORDRET wrote:
>>>> This patch adds DMA/MDMA chaining support.
>>>> It introduces an intermediate transfer between peripherals and STM32 DMA.
>>>> This intermediate transfer is triggered by SW for single M2D transfer and
>>>> by STM32 DMA IP for all other modes (sg, cyclic) and direction (D2M).
>>>>
>>>> A generic SRAM allocator is used for this intermediate buffer
>>>> Each DMA channel will be able to define its SRAM needs to achieve chaining
>>>> feature : (2 ^ order) * PAGE_SIZE.
>>>> For cyclic, SRAM buffer is derived from period length (rounded on
>>>> PAGE_SIZE).
>>>
>>> So IIUC, you chain two dma txns together and transfer data via an SRAM?
>>
>> Correct. One DMA is DMAv2 (stm32-dma) and the other is MDMA (stm32-mdma).
>> The intermediate transfer is between device and memory.
>> This intermediate transfer is using SRAM.
> 
> Ah, so you use dma calls to set up mdma transfers? I don't think that is a
> good idea. How do you know you should use mdma for the subsequent transfer?
> 

When the user bindings tell us to set up chaining, intermediate MDMA
transfers are always triggered.
For instance, take a user requesting a Dev2Mem transfer with chaining. From
the client pov this is still a prep_slave_sg. Internally, DMAv2 is set up in
cyclic mode (in double buffer mode indeed => 2 buffers of PAGE_SIZE/2) and
the destination is SRAM.
DMAv2 will flip/flop on those 2 buffers.
At the same time the DMAv2 driver prepares an MDMA SG that will fetch data
from those 2 buffers in SRAM and fill the final destination memory.
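
A rough picture of that Dev2Mem path (my sketch of the above, not taken from
the patch):

	device -> DMAv2 (cyclic, double buffer) -> SRAM buf0/buf1
	       -> MDMA (SG list)                 -> DDR (final client buffer)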

> 
>>>>  drivers/dma/stm32-dma.c | 879 ++++++++++++++++++++++++++++++++++++++++++------
>>>
>>> that is a lot of change for a driver, consider splitting it up
>>> logically in smaller changes...
>>>
>>
>> This feature is rather monolithic. Difficult to split up.
>> All the code is required at once.
> 
> It can be enabled at the end but split up logically. Intrusive changes to a
> driver make it hard to review..
> 
Ok. I will think about how to proceed.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 1/7] dt-bindings: stm32-dma: Add DMA/MDMA chaining support bindings
  2018-09-28 13:01 ` [PATCH v3 1/7] dt-bindings: stm32-dma: Add DMA/MDMA chaining support bindings Pierre-Yves MORDRET
  2018-10-07 14:57   ` Vinod
@ 2018-10-12 14:42   ` Rob Herring
  1 sibling, 0 replies; 26+ messages in thread
From: Rob Herring @ 2018-10-12 14:42 UTC (permalink / raw)
  To: Pierre-Yves MORDRET
  Cc: Vinod Koul, Mark Rutland, Alexandre Torgue, Maxime Coquelin,
	Dan Williams, devicetree, dmaengine, linux-arm-kernel,
	linux-kernel

On Fri, Sep 28, 2018 at 03:01:49PM +0200, Pierre-Yves MORDRET wrote:
> From: M'boumba Cedric Madianga <cedric.madianga@gmail.com>
> 
> This patch adds dma bindings to support DMA/MDMA chaining transfer.
> 2 bits are to manage the DMA FIFO threshold.
> 1 bit is to manage the DMA/MDMA chaining feature.
> 2 bits are used to specify the SRAM size to use for DMA/MDMA chaining.
> The size in bytes of a certain order is given by the formula:
>     (2 ^ order) * PAGE_SIZE.
> The order is given by those 2 bits.
> For cyclic, if chaining is chosen, any value above 1 can be set:
> SRAM buffer size will rely on period size and not on this DT value.
> 
> Signed-off-by: Pierre-Yves MORDRET <pierre-yves.mordret@st.com>

Missing author S-o-b.

> ---
>   Version history:
>     v3:
>     v2:
>        * rework content
>     v1:
>        * Initial
> ---
> ---
>  .../devicetree/bindings/dma/stm32-dma.txt          | 27 +++++++++++++++++++++-
>  1 file changed, 26 insertions(+), 1 deletion(-)

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 2/7] dt-bindings: stm32-dmamux: Add one cell to support DMA/MDMA chain
  2018-09-28 13:01 ` [PATCH v3 2/7] dt-bindings: stm32-dmamux: Add one cell to support DMA/MDMA chain Pierre-Yves MORDRET
  2018-10-07 14:58   ` Vinod
@ 2018-10-12 14:46   ` Rob Herring
  1 sibling, 0 replies; 26+ messages in thread
From: Rob Herring @ 2018-10-12 14:46 UTC (permalink / raw)
  To: Pierre-Yves MORDRET
  Cc: Vinod Koul, Mark Rutland, Alexandre Torgue, Maxime Coquelin,
	Dan Williams, devicetree, dmaengine, linux-arm-kernel,
	linux-kernel

On Fri, Sep 28, 2018 at 03:01:50PM +0200, Pierre-Yves MORDRET wrote:
> From: M'boumba Cedric Madianga <cedric.madianga@gmail.com>
> 
> Add one cell to support DMA/MDMA chaining.

You aren't adding a cell. Is the change compatible with existing users 
(if you mask bits)?

> 
> Signed-off-by: Pierre-Yves MORDRET <pierre-yves.mordret@st.com>

Author S-o-b missing.

> ---
>   Version history:
>     v3:
>     v2:
>        * rework content
>     v1:
>        * Initial
> ---
> ---
>  Documentation/devicetree/bindings/dma/stm32-dmamux.txt | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/devicetree/bindings/dma/stm32-dmamux.txt b/Documentation/devicetree/bindings/dma/stm32-dmamux.txt
> index 1b893b2..5e92b59 100644
> --- a/Documentation/devicetree/bindings/dma/stm32-dmamux.txt
> +++ b/Documentation/devicetree/bindings/dma/stm32-dmamux.txt
> @@ -4,9 +4,9 @@ Required properties:
>  - compatible:	"st,stm32h7-dmamux"
>  - reg:		Memory map for accessing module
>  - #dma-cells:	Should be set to <3>.
> -		First parameter is request line number.
> -		Second is DMA channel configuration
> -		Third is Fifo threshold
> +-		First parameter is request line number.
> +-		Second is DMA channel configuration
> +-		Third is a 32bit bitfield
>  		For more details about the three cells, please see
>  		stm32-dma.txt documentation binding file
>  - dma-masters:	Phandle pointing to the DMA controllers.
> -- 
> 2.7.4
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 4/7] dmaengine: stm32-dma: Add DMA/MDMA chaining support
  2018-10-10  7:02         ` Pierre Yves MORDRET
@ 2018-10-15 17:14           ` Vinod
  2018-10-16  9:19             ` Pierre Yves MORDRET
  0 siblings, 1 reply; 26+ messages in thread
From: Vinod @ 2018-10-15 17:14 UTC (permalink / raw)
  To: Pierre Yves MORDRET
  Cc: Rob Herring, Mark Rutland, Alexandre Torgue, Maxime Coquelin,
	Dan Williams, devicetree, dmaengine, linux-arm-kernel,
	linux-kernel

On 10-10-18, 09:02, Pierre Yves MORDRET wrote:
> 
> 
> On 10/10/2018 06:03 AM, Vinod wrote:
> > On 09-10-18, 10:40, Pierre Yves MORDRET wrote:
> >>
> >>
> >> On 10/07/2018 06:00 PM, Vinod wrote:
> >>> On 28-09-18, 15:01, Pierre-Yves MORDRET wrote:
> >>>> This patch adds DMA/MDMA chaining support.
> >>>> It introduces an intermediate transfer between peripherals and STM32 DMA.
> >>>> This intermediate transfer is triggered by SW for a single M2D transfer
> >>>> and by the STM32 DMA IP for all other modes (sg, cyclic) and direction
> >>>> (D2M).
> >>>>
> >>>> A generic SRAM allocator is used for this intermediate buffer.
> >>>> Each DMA channel will be able to define its SRAM needs for the chaining
> >>>> feature: (2 ^ order) * PAGE_SIZE.
> >>>> For cyclic, the SRAM buffer size is derived from the period length
> >>>> (rounded up to PAGE_SIZE).
> >>>
> >>> So IIUC, you chain two dma txns together and transfer data via an SRAM?
> >>
> >> Correct. One DMA is DMAv2 (stm32-dma) and the other is MDMA (stm32-mdma).
> >> The intermediate transfer is between device and memory.
> >> This intermediate transfer uses SRAM.
> > 
> > Ah, so you use dma calls to set up mdma transfers? I don't think that is a
> > good idea. How do you know you should use mdma for the subsequent transfer?
> > 
> 
> When the user bindings request chaining, intermediate MDMA transfers are
> always triggered.
> For instance, if a client requests a Dev2Mem transfer with chaining, from the
> client pov this is still a prep_slave_sg. Internally, DMAv2 is set up in
> cyclic mode (double-buffer mode, i.e. 2 buffers of PAGE_SIZE/2 each) and the
> destination is SRAM. DMAv2 will flip/flop between those 2 buffers.
> At the same time, the DMAv2 driver prepares an MDMA SG that fetches data from
> those 2 buffers in SRAM and fills the final destination memory.

What I am not able to follow is why this needs to be internal; why should
the client not set up the two transfers and trigger them?

-- 
~Vinod

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 4/7] dmaengine: stm32-dma: Add DMA/MDMA chaining support
  2018-10-15 17:14           ` Vinod
@ 2018-10-16  9:19             ` Pierre Yves MORDRET
  2018-10-16 14:44               ` Vinod
  0 siblings, 1 reply; 26+ messages in thread
From: Pierre Yves MORDRET @ 2018-10-16  9:19 UTC (permalink / raw)
  To: Vinod
  Cc: Rob Herring, Mark Rutland, Alexandre Torgue, Maxime Coquelin,
	Dan Williams, devicetree, dmaengine, linux-arm-kernel,
	linux-kernel



On 10/15/18 7:14 PM, Vinod wrote:
> On 10-10-18, 09:02, Pierre Yves MORDRET wrote:
>>
>>
>> On 10/10/2018 06:03 AM, Vinod wrote:
>>> On 09-10-18, 10:40, Pierre Yves MORDRET wrote:
>>>>
>>>>
>>>> On 10/07/2018 06:00 PM, Vinod wrote:
>>>>> On 28-09-18, 15:01, Pierre-Yves MORDRET wrote:
>>>>>> This patch adds DMA/MDMA chaining support.
>>>>>> It introduces an intermediate transfer between peripherals and STM32 DMA.
>>>>>> This intermediate transfer is triggered by SW for a single M2D transfer
>>>>>> and by the STM32 DMA IP for all other modes (sg, cyclic) and direction
>>>>>> (D2M).
>>>>>>
>>>>>> A generic SRAM allocator is used for this intermediate buffer.
>>>>>> Each DMA channel will be able to define its SRAM needs for the chaining
>>>>>> feature: (2 ^ order) * PAGE_SIZE.
>>>>>> For cyclic, the SRAM buffer size is derived from the period length
>>>>>> (rounded up to PAGE_SIZE).
>>>>>
>>>>> So IIUC, you chain two dma txns together and transfer data via an SRAM?
>>>>
>>>> Correct. One DMA is DMAv2 (stm32-dma) and the other is MDMA (stm32-mdma).
>>>> The intermediate transfer is between device and memory.
>>>> This intermediate transfer uses SRAM.
>>>
>>> Ah, so you use dma calls to set up mdma transfers? I don't think that is a
>>> good idea. How do you know you should use mdma for the subsequent transfer?
>>>
>>
>> When the user bindings request chaining, intermediate MDMA transfers are
>> always triggered.
>> For instance, if a client requests a Dev2Mem transfer with chaining, from the
>> client pov this is still a prep_slave_sg. Internally, DMAv2 is set up in
>> cyclic mode (double-buffer mode, i.e. 2 buffers of PAGE_SIZE/2 each) and the
>> destination is SRAM. DMAv2 will flip/flop between those 2 buffers.
>> At the same time, the DMAv2 driver prepares an MDMA SG that fetches data from
>> those 2 buffers in SRAM and fills the final destination memory.
> 
> What I am not able to follow is why this needs to be internal; why should
> the client not set up the two transfers and trigger them?
> 

The client may or may not use chaining: this is defined within the DT.
The API and dynamics are the same at the client driver level. Moreover, the
driver exposes only DMAv2, not both DMAv2 and MDMA: this is totally hidden
from the client. If clients had to set up both transfers, this would imply
changing all drivers that want to use chaining, and even more, making them
deal with both DMAv2 and MDMA at their level.
Since DMAv2 deals with MDMA, all client drivers stay the same as before: no
changes required.
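
To illustrate: a client using a chained channel is written exactly like any
other dmaengine client. Minimal sketch, assuming an "rx" channel name and a
client-provided completion callback:

#include <linux/dmaengine.h>
#include <linux/err.h>
#include <linux/scatterlist.h>

static int client_start_rx(struct device *dev, struct scatterlist *sgl,
			   unsigned int nents, dma_addr_t fifo_addr,
			   void (*done)(void *), void *arg)
{
	struct dma_slave_config cfg = {
		.direction = DMA_DEV_TO_MEM,
		.src_addr = fifo_addr,		/* device FIFO (assumed) */
		.src_addr_width = DMA_SLAVE_BUSWIDTH_4_BYTES,
		.src_maxburst = 1,
	};
	struct dma_async_tx_descriptor *desc;
	struct dma_chan *chan;

	chan = dma_request_chan(dev, "rx");
	if (IS_ERR(chan))
		return PTR_ERR(chan);

	dmaengine_slave_config(chan, &cfg);

	/* Whether chaining through SRAM happens underneath is decided by
	 * the DT; nothing chaining-specific shows up here.
	 */
	desc = dmaengine_prep_slave_sg(chan, sgl, nents, DMA_DEV_TO_MEM,
				       DMA_PREP_INTERRUPT);
	if (!desc) {
		dma_release_channel(chan);
		return -EIO;
	}
	desc->callback = done;
	desc->callback_param = arg;
	dmaengine_submit(desc);
	dma_async_issue_pending(chan);
	return 0;
}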

Regards

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 4/7] dmaengine: stm32-dma: Add DMA/MDMA chaining support
  2018-10-16  9:19             ` Pierre Yves MORDRET
@ 2018-10-16 14:44               ` Vinod
  2018-10-19  9:21                 ` Pierre Yves MORDRET
  0 siblings, 1 reply; 26+ messages in thread
From: Vinod @ 2018-10-16 14:44 UTC (permalink / raw)
  To: Pierre Yves MORDRET
  Cc: Rob Herring, Mark Rutland, Alexandre Torgue, Maxime Coquelin,
	Dan Williams, devicetree, dmaengine, linux-arm-kernel,
	linux-kernel

On 16-10-18, 11:19, Pierre Yves MORDRET wrote:
> 
> 
> On 10/15/18 7:14 PM, Vinod wrote:
> > On 10-10-18, 09:02, Pierre Yves MORDRET wrote:
> >>
> >>
> >> On 10/10/2018 06:03 AM, Vinod wrote:
> >>> On 09-10-18, 10:40, Pierre Yves MORDRET wrote:
> >>>>
> >>>>
> >>>> On 10/07/2018 06:00 PM, Vinod wrote:
> >>>>> On 28-09-18, 15:01, Pierre-Yves MORDRET wrote:
> >>>>>> This patch adds DMA/MDMA chaining support.
> >>>>>> It introduces an intermediate transfer between peripherals and STM32 DMA.
> >>>>>> This intermediate transfer is triggered by SW for a single M2D transfer
> >>>>>> and by the STM32 DMA IP for all other modes (sg, cyclic) and direction
> >>>>>> (D2M).
> >>>>>>
> >>>>>> A generic SRAM allocator is used for this intermediate buffer.
> >>>>>> Each DMA channel will be able to define its SRAM needs for the chaining
> >>>>>> feature: (2 ^ order) * PAGE_SIZE.
> >>>>>> For cyclic, the SRAM buffer size is derived from the period length
> >>>>>> (rounded up to PAGE_SIZE).
> >>>>>
> >>>>> So IIUC, you chain two dma txns together and transfer data via an SRAM?
> >>>>
> >>>> Correct. One DMA is DMAv2 (stm32-dma) and the other is MDMA (stm32-mdma).
> >>>> The intermediate transfer is between device and memory.
> >>>> This intermediate transfer uses SRAM.
> >>>
> >>> Ah, so you use dma calls to set up mdma transfers? I don't think that is a
> >>> good idea. How do you know you should use mdma for the subsequent transfer?
> >>>
> >>
> >> When the user bindings request chaining, intermediate MDMA transfers are
> >> always triggered.
> >> For instance, if a client requests a Dev2Mem transfer with chaining, from the
> >> client pov this is still a prep_slave_sg. Internally, DMAv2 is set up in
> >> cyclic mode (double-buffer mode, i.e. 2 buffers of PAGE_SIZE/2 each) and the
> >> destination is SRAM. DMAv2 will flip/flop between those 2 buffers.
> >> At the same time, the DMAv2 driver prepares an MDMA SG that fetches data from
> >> those 2 buffers in SRAM and fills the final destination memory.
> > 
> > What I am not able to follow is why this needs to be internal; why should
> > the client not set up the two transfers and trigger them?
> > 
> 
> The client may or may not use chaining: this is defined within the DT.

That should be up to the client... As a dmaengine driver you should enable
data transfer from source to destination.

> The API and dynamics are the same at the client driver level. Moreover, the
> driver exposes only DMAv2, not both DMAv2 and MDMA: this is totally hidden
> from the client. If clients had to set up both transfers, this would imply

Why should a controller be hidden from the user? I don't see why that would
be a good thing.

> changing all drivers that want to use chaining, and even more, making them
> deal with both DMAv2 and MDMA at their level.
> Since DMAv2 deals with MDMA, all client drivers stay the same as before: no
> changes required.

It is not about changes, it is about the SW model you want to have.

The intermediate SRAM transfers should not be made within the dmaengine
driver; the client can choose to have two transfers and couple them or not,
it is up to them. Sorry, I do not like this abstraction and would like to
see a cleaner approach.

-- 
~Vinod

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 4/7] dmaengine: stm32-dma: Add DMA/MDMA chaining support
  2018-10-16 14:44               ` Vinod
@ 2018-10-19  9:21                 ` Pierre Yves MORDRET
  0 siblings, 0 replies; 26+ messages in thread
From: Pierre Yves MORDRET @ 2018-10-19  9:21 UTC (permalink / raw)
  To: Vinod
  Cc: Rob Herring, Mark Rutland, Alexandre Torgue, Maxime Coquelin,
	Dan Williams, devicetree, dmaengine, linux-arm-kernel,
	linux-kernel



On 10/16/18 4:44 PM, Vinod wrote:
> On 16-10-18, 11:19, Pierre Yves MORDRET wrote:
>>
>>
>> On 10/15/18 7:14 PM, Vinod wrote:
>>> On 10-10-18, 09:02, Pierre Yves MORDRET wrote:
>>>>
>>>>
>>>> On 10/10/2018 06:03 AM, Vinod wrote:
>>>>> On 09-10-18, 10:40, Pierre Yves MORDRET wrote:
>>>>>>
>>>>>>
>>>>>> On 10/07/2018 06:00 PM, Vinod wrote:
>>>>>>> On 28-09-18, 15:01, Pierre-Yves MORDRET wrote:
>>>>>>>> This patch adds DMA/MDMA chaining support.
>>>>>>>> It introduces an intermediate transfer between peripherals and STM32 DMA.
>>>>>>>> This intermediate transfer is triggered by SW for a single M2D transfer
>>>>>>>> and by the STM32 DMA IP for all other modes (sg, cyclic) and direction
>>>>>>>> (D2M).
>>>>>>>>
>>>>>>>> A generic SRAM allocator is used for this intermediate buffer.
>>>>>>>> Each DMA channel will be able to define its SRAM needs for the chaining
>>>>>>>> feature: (2 ^ order) * PAGE_SIZE.
>>>>>>>> For cyclic, the SRAM buffer size is derived from the period length
>>>>>>>> (rounded up to PAGE_SIZE).
>>>>>>>
>>>>>>> So IIUC, you chain two dma txns together and transfer data via an SRAM?
>>>>>>
>>>>>> Correct. One DMA is DMAv2 (stm32-dma) and the other is MDMA (stm32-mdma).
>>>>>> The intermediate transfer is between device and memory.
>>>>>> This intermediate transfer uses SRAM.
>>>>>
>>>>> Ah, so you use dma calls to set up mdma transfers? I don't think that is a
>>>>> good idea. How do you know you should use mdma for the subsequent transfer?
>>>>>
>>>>
>>>> When the user bindings request chaining, intermediate MDMA transfers are
>>>> always triggered.
>>>> For instance, if a client requests a Dev2Mem transfer with chaining, from the
>>>> client pov this is still a prep_slave_sg. Internally, DMAv2 is set up in
>>>> cyclic mode (double-buffer mode, i.e. 2 buffers of PAGE_SIZE/2 each) and the
>>>> destination is SRAM. DMAv2 will flip/flop between those 2 buffers.
>>>> At the same time, the DMAv2 driver prepares an MDMA SG that fetches data from
>>>> those 2 buffers in SRAM and fills the final destination memory.
>>>
>>> What I am not able to follow is why this needs to be internal; why should
>>> the client not set up the two transfers and trigger them?
>>>
>>
>> The client may or may not use chaining: this is defined within the DT.
> 
> That should be up to the client... As a dmaengine driver you should enable
> data transfer from source to destination.
> 
>> The API and dynamics are the same at the client driver level. Moreover, the
>> driver exposes only DMAv2, not both DMAv2 and MDMA: this is totally hidden
>> from the client. If clients had to set up both transfers, this would imply
> 
> Why should a controller be hidden from the user? I don't see why that would
> be a good thing.
> 
>> changing all drivers that want to use chaining, and even more, making them
>> deal with both DMAv2 and MDMA at their level.
>> Since DMAv2 deals with MDMA, all client drivers stay the same as before: no
>> changes required.
> 
> It is not about changes, it is about the SW model you want to have.
> 
> The intermediate SRAM transfers should not be made within the dmaengine
> driver; the client can choose to have two transfers and couple them or not,
> it is up to them. Sorry, I do not like this abstraction and would like to
> see a cleaner approach.
> 

What we have done is to hide all the complexity related to the DMA engine:
synchronization, residue and many other topics are solved by this approach.
If it were up to the client to perform the intermediate transfer, each client
driver using chaining would need to duplicate the required SW.
This approach is presented as a feature from the driver pov.
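
For illustration, if the coupling moved to the client side, every such
driver would have to open-code something like the sketch below (channel
names, the shape of the MDMA leg and the error unwinding are assumptions,
not an API this series defines):

#include <linux/dmaengine.h>
#include <linux/err.h>
#include <linux/genalloc.h>
#include <linux/mm.h>
#include <linux/scatterlist.h>

static int client_coupled_rx(struct device *dev, struct gen_pool *sram,
			     struct scatterlist *ddr_sgl, unsigned int nents)
{
	struct dma_async_tx_descriptor *d2m, *m2m;
	struct dma_chan *dma, *mdma;
	dma_addr_t bounce;
	void *buf;

	/* Two channels instead of one ("rx-mdma" is a made-up name)... */
	dma = dma_request_chan(dev, "rx");
	mdma = dma_request_chan(dev, "rx-mdma");
	if (IS_ERR(dma) || IS_ERR(mdma))
		return -ENODEV;

	/* ...its own SRAM bounce buffer... */
	buf = gen_pool_dma_alloc(sram, PAGE_SIZE, &bounce);
	if (!buf)
		return -ENOMEM;

	/* ...a descriptor on each side (MDMA leg shown as slave_sg only
	 * for brevity)...
	 */
	d2m = dmaengine_prep_dma_cyclic(dma, bounce, PAGE_SIZE,
					PAGE_SIZE / 2, DMA_DEV_TO_MEM,
					DMA_PREP_INTERRUPT);
	m2m = dmaengine_prep_slave_sg(mdma, ddr_sgl, nents, DMA_DEV_TO_MEM, 0);
	if (!d2m || !m2m)
		return -EIO;

	/* ...plus everything the DMAv2 driver hides today: arming the MDMA
	 * trigger/ack against the DMAv2 request line, merging the residue
	 * from dmaengine_tx_status() on both channels, and tearing the
	 * pair down in the right order.
	 */
	dmaengine_submit(m2m);
	dmaengine_submit(d2m);
	dma_async_issue_pending(mdma);
	dma_async_issue_pending(dma);
	return 0;
}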

Regards


^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2018-10-19  9:23 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-09-28 13:01 [PATCH v3 0/7] Add-DMA-MDMA-chaining-support Pierre-Yves MORDRET
2018-09-28 13:01 ` [PATCH v3 1/7] dt-bindings: stm32-dma: Add DMA/MDMA chaining support bindings Pierre-Yves MORDRET
2018-10-07 14:57   ` Vinod
2018-10-09  7:18     ` Pierre Yves MORDRET
2018-10-09  8:57       ` Vinod
2018-10-09 13:46         ` Pierre Yves MORDRET
2018-10-12 14:42   ` Rob Herring
2018-09-28 13:01 ` [PATCH v3 2/7] dt-bindings: stm32-dmamux: Add one cell to support DMA/MDMA chain Pierre-Yves MORDRET
2018-10-07 14:58   ` Vinod
2018-10-09  7:22     ` Pierre Yves MORDRET
2018-10-12 14:46   ` Rob Herring
2018-09-28 13:01 ` [PATCH v3 3/7] dt-bindings: stm32-mdma: Add DMA/MDMA chaining support bindings Pierre-Yves MORDRET
2018-10-07 14:59   ` Vinod
2018-10-09  8:17     ` Pierre Yves MORDRET
2018-09-28 13:01 ` [PATCH v3 4/7] dmaengine: stm32-dma: Add DMA/MDMA chaining support Pierre-Yves MORDRET
2018-10-07 16:00   ` Vinod
2018-10-09  8:40     ` Pierre Yves MORDRET
2018-10-10  4:03       ` Vinod
2018-10-10  7:02         ` Pierre Yves MORDRET
2018-10-15 17:14           ` Vinod
2018-10-16  9:19             ` Pierre Yves MORDRET
2018-10-16 14:44               ` Vinod
2018-10-19  9:21                 ` Pierre Yves MORDRET
2018-09-28 13:01 ` [PATCH v3 5/7] dmaengine: stm32-mdma: " Pierre-Yves MORDRET
2018-09-28 13:01 ` [PATCH v3 6/7] dmaengine: stm32-dma: enable descriptor_reuse Pierre-Yves MORDRET
2018-09-28 13:01 ` [PATCH v3 7/7] dmaengine: stm32-mdma: " Pierre-Yves MORDRET

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).