dmaengine.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] Add support for MEMCPY_SG transfers
@ 2021-07-06 23:43 Adrian Larumbe
  2021-07-06 23:43 ` [PATCH 1/2] dmaengine: xilinx_dma: Restore support for memcpy SG transfers Adrian Larumbe
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Adrian Larumbe @ 2021-07-06 23:43 UTC (permalink / raw)
  To: vkoul, dmaengine; +Cc: michal.simek, linux-arm-kernel, adrian.martinezlarumbe

Bring back dmaengine API support for scatter-gather memcpy's, which is a
follow-up from a previous patch series in which I wrongly tried to
implement the semantics of a MEMCPY_SG through the Slave SG interface.

The second patch fixes a race condition that I observed when calling
dmaengine_terminate_sync in the Xilinx DMA driver.

Adrian Larumbe (2):
  dmaengine: xilinx_dma: Restore support for memcpy SG transfers
  xilinx_dma: Fix read-after-free bug when terminating transfers

 .../driver-api/dmaengine/provider.rst         |  11 ++
 drivers/dma/dmaengine.c                       |   7 +
 drivers/dma/xilinx/xilinx_dma.c               | 134 ++++++++++++++++++
 include/linux/dmaengine.h                     |  20 +++
 4 files changed, 172 insertions(+)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/2] dmaengine: xilinx_dma: Restore support for memcpy SG transfers
  2021-07-06 23:43 [PATCH 0/2] Add support for MEMCPY_SG transfers Adrian Larumbe
@ 2021-07-06 23:43 ` Adrian Larumbe
  2021-07-14  4:58   ` Vinod Koul
  2021-07-06 23:43 ` [PATCH 2/2] xilinx_dma: Fix read-after-free bug when terminating transfers Adrian Larumbe
  2021-11-01 18:08 ` [PATCH 0/3] Add support for MEMCPY_SG transfers Adrian Larumbe
  2 siblings, 1 reply; 13+ messages in thread
From: Adrian Larumbe @ 2021-07-06 23:43 UTC (permalink / raw)
  To: vkoul, dmaengine; +Cc: michal.simek, linux-arm-kernel, adrian.martinezlarumbe

This is the old DMA_SG interface that was removed in commit
c678fa66341c ("dmaengine: remove DMA_SG as it is dead code in kernel"). It
has been renamed to DMA_MEMCPY_SG to better match the MEMSET and MEMSET_SG
naming convention.

It should only be used for mem2mem copies, either main system memory or
CPU-addressable device memory (like video memory on a PCI graphics card).

Bringing back this interface was prompted by the need to use the Xilinx
CDMA device for mem2mem SG transfers. The current CDMA binding for
device_prep_dma_memcpy_sg was partially borrowed from xlnx kernel tree, and
expanded with extended address space support when linking descriptor
segments and checking for incorrect zero transfer size.

Signed-off-by: Adrian Larumbe <adrian.martinezlarumbe@imgtec.com>
---
 .../driver-api/dmaengine/provider.rst         |  11 ++
 drivers/dma/dmaengine.c                       |   7 +
 drivers/dma/xilinx/xilinx_dma.c               | 122 ++++++++++++++++++
 include/linux/dmaengine.h                     |  20 +++
 4 files changed, 160 insertions(+)

diff --git a/Documentation/driver-api/dmaengine/provider.rst b/Documentation/driver-api/dmaengine/provider.rst
index ddb0a81a796c..9f0efe9e9952 100644
--- a/Documentation/driver-api/dmaengine/provider.rst
+++ b/Documentation/driver-api/dmaengine/provider.rst
@@ -162,6 +162,17 @@ Currently, the types available are:
 
   - The device is able to do memory to memory copies
 
+- - DMA_MEMCPY_SG
+
+  - The device supports memory to memory scatter-gather transfers.
+
+  - Even though a plain memcpy can look like a particular case of a
+    scatter-gather transfer, with a single chunk to transfer, it's a
+    distinct transaction type in the mem2mem transfer case. This is
+    because some very simple devices might be able to do contiguous
+    single-chunk memory copies, but have no support for more
+    complex SG transfers.
+
 - DMA_XOR
 
   - The device is able to perform XOR operations on memory areas
diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c
index af3ee288bc11..c4e3334b04cf 100644
--- a/drivers/dma/dmaengine.c
+++ b/drivers/dma/dmaengine.c
@@ -1160,6 +1160,13 @@ int dma_async_device_register(struct dma_device *device)
 		return -EIO;
 	}
 
+	if (dma_has_cap(DMA_MEMCPY_SG, device->cap_mask) && !device->device_prep_dma_memcpy_sg) {
+		dev_err(device->dev,
+			"Device claims capability %s, but op is not defined\n",
+			"DMA_MEMCPY_SG");
+		return -EIO;
+	}
+
 	if (dma_has_cap(DMA_XOR, device->cap_mask) && !device->device_prep_dma_xor) {
 		dev_err(device->dev,
 			"Device claims capability %s, but op is not defined\n",
diff --git a/drivers/dma/xilinx/xilinx_dma.c b/drivers/dma/xilinx/xilinx_dma.c
index 75c0b8e904e5..0e2bf75d42d3 100644
--- a/drivers/dma/xilinx/xilinx_dma.c
+++ b/drivers/dma/xilinx/xilinx_dma.c
@@ -2108,6 +2108,126 @@ xilinx_cdma_prep_memcpy(struct dma_chan *dchan, dma_addr_t dma_dst,
 	return NULL;
 }
 
+/**
+ * xilinx_cdma_prep_memcpy_sg - prepare descriptors for a memcpy_sg transaction
+ * @dchan: DMA channel
+ * @dst_sg: Destination scatter list
+ * @dst_sg_len: Number of entries in destination scatter list
+ * @src_sg: Source scatter list
+ * @src_sg_len: Number of entries in source scatter list
+ * @flags: transfer ack flags
+ *
+ * Return: Async transaction descriptor on success and NULL on failure
+ */
+static struct dma_async_tx_descriptor *xilinx_cdma_prep_memcpy_sg(
+			struct dma_chan *dchan, struct scatterlist *dst_sg,
+			unsigned int dst_sg_len, struct scatterlist *src_sg,
+			unsigned int src_sg_len, unsigned long flags)
+{
+	struct xilinx_dma_chan *chan = to_xilinx_chan(dchan);
+	struct xilinx_dma_tx_descriptor *desc;
+	struct xilinx_cdma_tx_segment *segment, *prev = NULL;
+	struct xilinx_cdma_desc_hw *hw;
+	size_t len, dst_avail, src_avail;
+	dma_addr_t dma_dst, dma_src;
+
+	if (unlikely(dst_sg_len == 0 || src_sg_len == 0))
+		return NULL;
+
+	if (unlikely(!dst_sg  || !src_sg))
+		return NULL;
+
+	desc = xilinx_dma_alloc_tx_descriptor(chan);
+	if (!desc)
+		return NULL;
+
+	dma_async_tx_descriptor_init(&desc->async_tx, &chan->common);
+	desc->async_tx.tx_submit = xilinx_dma_tx_submit;
+
+	dst_avail = sg_dma_len(dst_sg);
+	src_avail = sg_dma_len(src_sg);
+	/*
+	 * loop until there is either no more source or no more destination
+	 * scatterlist entry
+	 */
+	while (true) {
+		len = min_t(size_t, src_avail, dst_avail);
+		len = min_t(size_t, len, chan->xdev->max_buffer_len);
+		if (len == 0)
+			goto fetch;
+
+		/* Allocate the link descriptor from DMA pool */
+		segment = xilinx_cdma_alloc_tx_segment(chan);
+		if (!segment)
+			goto error;
+
+		dma_dst = sg_dma_address(dst_sg) + sg_dma_len(dst_sg) -
+			dst_avail;
+		dma_src = sg_dma_address(src_sg) + sg_dma_len(src_sg) -
+			src_avail;
+		hw = &segment->hw;
+		hw->control = len;
+		hw->src_addr = dma_src;
+		hw->dest_addr = dma_dst;
+		if (chan->ext_addr) {
+			hw->src_addr_msb = upper_32_bits(dma_src);
+			hw->dest_addr_msb = upper_32_bits(dma_dst);
+		}
+
+		if (prev) {
+			prev->hw.next_desc = segment->phys;
+			if (chan->ext_addr)
+				prev->hw.next_desc_msb =
+					upper_32_bits(segment->phys);
+		}
+
+		prev = segment;
+		dst_avail -= len;
+		src_avail -= len;
+		list_add_tail(&segment->node, &desc->segments);
+
+fetch:
+		/* Fetch the next dst scatterlist entry */
+		if (dst_avail == 0) {
+			if (dst_sg_len == 0)
+				break;
+			dst_sg = sg_next(dst_sg);
+			if (dst_sg == NULL)
+				break;
+			dst_sg_len--;
+			dst_avail = sg_dma_len(dst_sg);
+		}
+		/* Fetch the next src scatterlist entry */
+		if (src_avail == 0) {
+			if (src_sg_len == 0)
+				break;
+			src_sg = sg_next(src_sg);
+			if (src_sg == NULL)
+				break;
+			src_sg_len--;
+			src_avail = sg_dma_len(src_sg);
+		}
+	}
+
+	if (list_empty(&desc->segments)) {
+		dev_err(chan->xdev->dev,
+			"%s: Zero-size SG transfer requested\n", __func__);
+		goto error;
+	}
+
+	/* Link the last hardware descriptor with the first. */
+	segment = list_first_entry(&desc->segments,
+				struct xilinx_cdma_tx_segment, node);
+	desc->async_tx.phys = segment->phys;
+	prev->hw.next_desc = segment->phys;
+
+	return &desc->async_tx;
+
+error:
+	xilinx_dma_free_tx_descriptor(chan, desc);
+	return NULL;
+}
+
 /**
  * xilinx_dma_prep_slave_sg - prepare descriptors for a DMA_SLAVE transaction
  * @dchan: DMA channel
@@ -3094,7 +3214,9 @@ static int xilinx_dma_probe(struct platform_device *pdev)
 					  DMA_RESIDUE_GRANULARITY_SEGMENT;
 	} else if (xdev->dma_config->dmatype == XDMA_TYPE_CDMA) {
 		dma_cap_set(DMA_MEMCPY, xdev->common.cap_mask);
+		dma_cap_set(DMA_MEMCPY_SG, xdev->common.cap_mask);
 		xdev->common.device_prep_dma_memcpy = xilinx_cdma_prep_memcpy;
+		xdev->common.device_prep_dma_memcpy_sg = xilinx_cdma_prep_memcpy_sg;
 		/* Residue calculation is supported by only AXI DMA and CDMA */
 		xdev->common.residue_granularity =
 					  DMA_RESIDUE_GRANULARITY_SEGMENT;
diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
index 004736b6a9c8..7c342f77d8eb 100644
--- a/include/linux/dmaengine.h
+++ b/include/linux/dmaengine.h
@@ -50,6 +50,7 @@ enum dma_status {
  */
 enum dma_transaction_type {
 	DMA_MEMCPY,
+	DMA_MEMCPY_SG,
 	DMA_XOR,
 	DMA_PQ,
 	DMA_XOR_VAL,
@@ -891,6 +892,11 @@ struct dma_device {
 	struct dma_async_tx_descriptor *(*device_prep_dma_memcpy)(
 		struct dma_chan *chan, dma_addr_t dst, dma_addr_t src,
 		size_t len, unsigned long flags);
+	struct dma_async_tx_descriptor *(*device_prep_dma_memcpy_sg)(
+		struct dma_chan *chan,
+		struct scatterlist *dst_sg, unsigned int dst_nents,
+		struct scatterlist *src_sg, unsigned int src_nents,
+		unsigned long flags);
 	struct dma_async_tx_descriptor *(*device_prep_dma_xor)(
 		struct dma_chan *chan, dma_addr_t dst, dma_addr_t *src,
 		unsigned int src_cnt, size_t len, unsigned long flags);
@@ -1053,6 +1059,20 @@ static inline struct dma_async_tx_descriptor *dmaengine_prep_dma_memcpy(
 						    len, flags);
 }
 
+static inline struct dma_async_tx_descriptor *dmaengine_prep_dma_memcpy_sg(
+		struct dma_chan *chan,
+		struct scatterlist *dst_sg, unsigned int dst_nents,
+		struct scatterlist *src_sg, unsigned int src_nents,
+		unsigned long flags)
+{
+	if (!chan || !chan->device || !chan->device->device_prep_dma_memcpy_sg)
+		return NULL;
+
+	return chan->device->device_prep_dma_memcpy_sg(chan, dst_sg, dst_nents,
+						       src_sg, src_nents,
+						       flags);
+}
+
 static inline bool dmaengine_is_metadata_mode_supported(struct dma_chan *chan,
 		enum dma_desc_metadata_mode mode)
 {
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/2] xilinx_dma: Fix read-after-free bug when terminating transfers
  2021-07-06 23:43 [PATCH 0/2] Add support for MEMCPY_SG transfers Adrian Larumbe
  2021-07-06 23:43 ` [PATCH 1/2] dmaengine: xilinx_dma: Restore support for memcpy SG transfers Adrian Larumbe
@ 2021-07-06 23:43 ` Adrian Larumbe
  2021-07-14  5:10   ` Vinod Koul
  2021-11-01 18:08 ` [PATCH 0/3] Add support for MEMCPY_SG transfers Adrian Larumbe
  2 siblings, 1 reply; 13+ messages in thread
From: Adrian Larumbe @ 2021-07-06 23:43 UTC (permalink / raw)
  To: vkoul, dmaengine; +Cc: michal.simek, linux-arm-kernel, adrian.martinezlarumbe

When user calls dmaengine_terminate_sync, the driver will clean up any
remaining descriptors for all the pending or active transfers that had
previously been submitted. However, this might happen whilst the tasklet is
invoking the DMA callback for the last finished transfer, so by the time it
returns and takes over the channel's spinlock, the list of completed
descriptors it was traversing is no longer valid. This leads to a
read-after-free situation.

Fix it by signalling whether a user-triggered termination has happened by
means of a boolean variable.

Signed-off-by: Adrian Larumbe <adrian.martinezlarumbe@imgtec.com>
---
 drivers/dma/xilinx/xilinx_dma.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/drivers/dma/xilinx/xilinx_dma.c b/drivers/dma/xilinx/xilinx_dma.c
index 0e2bf75d42d3..8258e9fcc179 100644
--- a/drivers/dma/xilinx/xilinx_dma.c
+++ b/drivers/dma/xilinx/xilinx_dma.c
@@ -394,6 +394,7 @@ struct xilinx_dma_tx_descriptor {
  * @genlock: Support genlock mode
  * @err: Channel has errors
  * @idle: Check for channel idle
+ * @terminating: Check for channel being synchronized by user
  * @tasklet: Cleanup work after irq
  * @config: Device configuration info
  * @flush_on_fsync: Flush on Frame sync
@@ -431,6 +432,7 @@ struct xilinx_dma_chan {
 	bool genlock;
 	bool err;
 	bool idle;
+	bool terminating;
 	struct tasklet_struct tasklet;
 	struct xilinx_vdma_config config;
 	bool flush_on_fsync;
@@ -1049,6 +1051,13 @@ static void xilinx_dma_chan_desc_cleanup(struct xilinx_dma_chan *chan)
 		/* Run any dependencies, then free the descriptor */
 		dma_run_dependencies(&desc->async_tx);
 		xilinx_dma_free_tx_descriptor(chan, desc);
+
+		/*
+		 * While we ran a callback the user called a terminate function,
+		 * which takes care of cleaning up any remaining descriptors
+		 */
+		if (chan->terminating)
+			break;
 	}
 
 	spin_unlock_irqrestore(&chan->lock, flags);
@@ -1965,6 +1974,8 @@ static dma_cookie_t xilinx_dma_tx_submit(struct dma_async_tx_descriptor *tx)
 	if (desc->cyclic)
 		chan->cyclic = true;
 
+	chan->terminating = false;
+
 	spin_unlock_irqrestore(&chan->lock, flags);
 
 	return cookie;
@@ -2556,6 +2567,7 @@ static int xilinx_dma_terminate_all(struct dma_chan *dchan)
 
 	xilinx_dma_chan_reset(chan);
 	/* Remove and free all of the descriptors in the lists */
+	chan->terminating = true;
 	xilinx_dma_free_descriptors(chan);
 	chan->idle = true;
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] dmaengine: xilinx_dma: Restore support for memcpy SG transfers
  2021-07-06 23:43 ` [PATCH 1/2] dmaengine: xilinx_dma: Restore support for memcpy SG transfers Adrian Larumbe
@ 2021-07-14  4:58   ` Vinod Koul
  2021-07-26 22:14     ` Adrian Larumbe
  0 siblings, 1 reply; 13+ messages in thread
From: Vinod Koul @ 2021-07-14  4:58 UTC (permalink / raw)
  To: Adrian Larumbe; +Cc: dmaengine, michal.simek, linux-arm-kernel

On 07-07-21, 00:43, Adrian Larumbe wrote:
> This is the old DMA_SG interface that was removed in commit
> c678fa66341c ("dmaengine: remove DMA_SG as it is dead code in kernel"). It
> has been renamed to DMA_MEMCPY_SG to better match the MEMSET and MEMSET_SG
> naming convention.
> 
> It should only be used for mem2mem copies, either main system memory or
> CPU-addressable device memory (like video memory on a PCI graphics card).
> 
> Bringing back this interface was prompted by the need to use the Xilinx
> CDMA device for mem2mem SG transfers. The current CDMA binding for
> device_prep_dma_memcpy_sg was partially borrowed from xlnx kernel tree, and
> expanded with extended address space support when linking descriptor
> segments and checking for incorrect zero transfer size.
> 
> Signed-off-by: Adrian Larumbe <adrian.martinezlarumbe@imgtec.com>
> ---
>  .../driver-api/dmaengine/provider.rst         |  11 ++
>  drivers/dma/dmaengine.c                       |   7 +
>  drivers/dma/xilinx/xilinx_dma.c               | 122 ++++++++++++++++++

Can you make this split... documentation patch, core change and then
driver

>  include/linux/dmaengine.h                     |  20 +++
>  4 files changed, 160 insertions(+)
> 
> diff --git a/Documentation/driver-api/dmaengine/provider.rst b/Documentation/driver-api/dmaengine/provider.rst
> index ddb0a81a796c..9f0efe9e9952 100644
> --- a/Documentation/driver-api/dmaengine/provider.rst
> +++ b/Documentation/driver-api/dmaengine/provider.rst
> @@ -162,6 +162,17 @@ Currently, the types available are:
>  
>    - The device is able to do memory to memory copies
>  
> +- - DMA_MEMCPY_SG
> +
> +  - The device supports memory to memory scatter-gather transfers.
> +
> +  - Even though a plain memcpy can look like a particular case of a
> +    scatter-gather transfer, with a single chunk to transfer, it's a
> +    distinct transaction type in the mem2mem transfer case. This is
> +    because some very simple devices might be able to do contiguous
> +    single-chunk memory copies, but have no support for more
> +    complex SG transfers.

How does one deal with cases where
 - src_sg_len and dstn_sg_len are different?
 - src_sg and dstn_sg are different lists (maybe different number of
   entries with different lengths..)

I think we need to document these cases or limitations..


> +
>  - DMA_XOR
>  
>    - The device is able to perform XOR operations on memory areas
> diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c
> index af3ee288bc11..c4e3334b04cf 100644
> --- a/drivers/dma/dmaengine.c
> +++ b/drivers/dma/dmaengine.c
> @@ -1160,6 +1160,13 @@ int dma_async_device_register(struct dma_device *device)
>  		return -EIO;
>  	}
>  
> +	if (dma_has_cap(DMA_MEMCPY_SG, device->cap_mask) && !device->device_prep_dma_memcpy_sg) {
> +		dev_err(device->dev,
> +			"Device claims capability %s, but op is not defined\n",
> +			"DMA_MEMCPY_SG");
> +		return -EIO;
> +	}
> +
>  	if (dma_has_cap(DMA_XOR, device->cap_mask) && !device->device_prep_dma_xor) {
>  		dev_err(device->dev,
>  			"Device claims capability %s, but op is not defined\n",
> diff --git a/drivers/dma/xilinx/xilinx_dma.c b/drivers/dma/xilinx/xilinx_dma.c
> index 75c0b8e904e5..0e2bf75d42d3 100644
> --- a/drivers/dma/xilinx/xilinx_dma.c
> +++ b/drivers/dma/xilinx/xilinx_dma.c
> @@ -2108,6 +2108,126 @@ xilinx_cdma_prep_memcpy(struct dma_chan *dchan, dma_addr_t dma_dst,
>  	return NULL;
>  }
>  
> +/**
> + * xilinx_cdma_prep_memcpy_sg - prepare descriptors for a memcpy_sg transaction
> + * @dchan: DMA channel
> + * @dst_sg: Destination scatter list
> + * @dst_sg_len: Number of entries in destination scatter list
> + * @src_sg: Source scatter list
> + * @src_sg_len: Number of entries in source scatter list
> + * @flags: transfer ack flags
> + *
> + * Return: Async transaction descriptor on success and NULL on failure
> + */
> +static struct dma_async_tx_descriptor *xilinx_cdma_prep_memcpy_sg(
> +			struct dma_chan *dchan, struct scatterlist *dst_sg,
> +			unsigned int dst_sg_len, struct scatterlist *src_sg,
> +			unsigned int src_sg_len, unsigned long flags)
> +{
> +	struct xilinx_dma_chan *chan = to_xilinx_chan(dchan);
> +	struct xilinx_dma_tx_descriptor *desc;
> +	struct xilinx_cdma_tx_segment *segment, *prev = NULL;
> +	struct xilinx_cdma_desc_hw *hw;
> +	size_t len, dst_avail, src_avail;
> +	dma_addr_t dma_dst, dma_src;
> +
> +	if (unlikely(dst_sg_len == 0 || src_sg_len == 0))
> +		return NULL;
> +
> +	if (unlikely(!dst_sg  || !src_sg))
> +		return NULL;

no check for dst_sg_len == src_sg_len or it doesnt matter here?

> +
> +	desc = xilinx_dma_alloc_tx_descriptor(chan);
> +	if (!desc)
> +		return NULL;
> +
> +	dma_async_tx_descriptor_init(&desc->async_tx, &chan->common);
> +	desc->async_tx.tx_submit = xilinx_dma_tx_submit;
> +
> +	dst_avail = sg_dma_len(dst_sg);
> +	src_avail = sg_dma_len(src_sg);
> +	/*
> +	 * loop until there is either no more source or no more destination
> +	 * scatterlist entry
> +	 */
> +	while (true) {
> +		len = min_t(size_t, src_avail, dst_avail);
> +		len = min_t(size_t, len, chan->xdev->max_buffer_len);
> +		if (len == 0)
> +			goto fetch;
> +
> +		/* Allocate the link descriptor from DMA pool */
> +		segment = xilinx_cdma_alloc_tx_segment(chan);
> +		if (!segment)
> +			goto error;
> +
> +		dma_dst = sg_dma_address(dst_sg) + sg_dma_len(dst_sg) -
> +			dst_avail;
> +		dma_src = sg_dma_address(src_sg) + sg_dma_len(src_sg) -
> +			src_avail;
> +		hw = &segment->hw;
> +		hw->control = len;
> +		hw->src_addr = dma_src;
> +		hw->dest_addr = dma_dst;
> +		if (chan->ext_addr) {
> +			hw->src_addr_msb = upper_32_bits(dma_src);
> +			hw->dest_addr_msb = upper_32_bits(dma_dst);
> +		}
> +
> +		if (prev) {
> +			prev->hw.next_desc = segment->phys;
> +			if (chan->ext_addr)
> +				prev->hw.next_desc_msb =
> +					upper_32_bits(segment->phys);
> +		}
> +
> +		prev = segment;
> +		dst_avail -= len;
> +		src_avail -= len;
> +		list_add_tail(&segment->node, &desc->segments);
> +
> +fetch:
> +		/* Fetch the next dst scatterlist entry */
> +		if (dst_avail == 0) {
> +			if (dst_sg_len == 0)
> +				break;
> +			dst_sg = sg_next(dst_sg);
> +			if (dst_sg == NULL)
> +				break;
> +			dst_sg_len--;
> +			dst_avail = sg_dma_len(dst_sg);
> +		}
> +		/* Fetch the next src scatterlist entry */
> +		if (src_avail == 0) {
> +			if (src_sg_len == 0)
> +				break;
> +			src_sg = sg_next(src_sg);
> +			if (src_sg == NULL)
> +				break;
> +			src_sg_len--;
> +			src_avail = sg_dma_len(src_sg);
> +		}
> +	}
> +
> +	if (list_empty(&desc->segments)) {
> +		dev_err(chan->xdev->dev,
> +			"%s: Zero-size SG transfer requested\n", __func__);
> +		goto error;
> +	}
> +
> +	/* Link the last hardware descriptor with the first. */
> +	segment = list_first_entry(&desc->segments,
> +				struct xilinx_cdma_tx_segment, node);
> +	desc->async_tx.phys = segment->phys;
> +	prev->hw.next_desc = segment->phys;
> +
> +	return &desc->async_tx;
> +
> +error:
> +	xilinx_dma_free_tx_descriptor(chan, desc);
> +	return NULL;
> +}
> +
>  /**
>   * xilinx_dma_prep_slave_sg - prepare descriptors for a DMA_SLAVE transaction
>   * @dchan: DMA channel
> @@ -3094,7 +3214,9 @@ static int xilinx_dma_probe(struct platform_device *pdev)
>  					  DMA_RESIDUE_GRANULARITY_SEGMENT;
>  	} else if (xdev->dma_config->dmatype == XDMA_TYPE_CDMA) {
>  		dma_cap_set(DMA_MEMCPY, xdev->common.cap_mask);
> +		dma_cap_set(DMA_MEMCPY_SG, xdev->common.cap_mask);
>  		xdev->common.device_prep_dma_memcpy = xilinx_cdma_prep_memcpy;
> +		xdev->common.device_prep_dma_memcpy_sg = xilinx_cdma_prep_memcpy_sg;
>  		/* Residue calculation is supported by only AXI DMA and CDMA */
>  		xdev->common.residue_granularity =
>  					  DMA_RESIDUE_GRANULARITY_SEGMENT;
> diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
> index 004736b6a9c8..7c342f77d8eb 100644
> --- a/include/linux/dmaengine.h
> +++ b/include/linux/dmaengine.h
> @@ -50,6 +50,7 @@ enum dma_status {
>   */
>  enum dma_transaction_type {
>  	DMA_MEMCPY,
> +	DMA_MEMCPY_SG,
>  	DMA_XOR,
>  	DMA_PQ,
>  	DMA_XOR_VAL,
> @@ -891,6 +892,11 @@ struct dma_device {
>  	struct dma_async_tx_descriptor *(*device_prep_dma_memcpy)(
>  		struct dma_chan *chan, dma_addr_t dst, dma_addr_t src,
>  		size_t len, unsigned long flags);
> +	struct dma_async_tx_descriptor *(*device_prep_dma_memcpy_sg)(
> +		struct dma_chan *chan,
> +		struct scatterlist *dst_sg, unsigned int dst_nents,
> +		struct scatterlist *src_sg, unsigned int src_nents,
> +		unsigned long flags);
>  	struct dma_async_tx_descriptor *(*device_prep_dma_xor)(
>  		struct dma_chan *chan, dma_addr_t dst, dma_addr_t *src,
>  		unsigned int src_cnt, size_t len, unsigned long flags);
> @@ -1053,6 +1059,20 @@ static inline struct dma_async_tx_descriptor *dmaengine_prep_dma_memcpy(
>  						    len, flags);
>  }
>  
> +static inline struct dma_async_tx_descriptor *dmaengine_prep_dma_memcpy_sg(
> +		struct dma_chan *chan,
> +		struct scatterlist *dst_sg, unsigned int dst_nents,
> +		struct scatterlist *src_sg, unsigned int src_nents,
> +		unsigned long flags)
> +{
> +	if (!chan || !chan->device || !chan->device->device_prep_dma_memcpy_sg)
> +		return NULL;
> +
> +	return chan->device->device_prep_dma_memcpy_sg(chan, dst_sg, dst_nents,
> +						       src_sg, src_nents,
> +						       flags);
> +}
> +
>  static inline bool dmaengine_is_metadata_mode_supported(struct dma_chan *chan,
>  		enum dma_desc_metadata_mode mode)
>  {
> -- 
> 2.17.1

-- 
~Vinod

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] xilinx_dma: Fix read-after-free bug when terminating transfers
  2021-07-06 23:43 ` [PATCH 2/2] xilinx_dma: Fix read-after-free bug when terminating transfers Adrian Larumbe
@ 2021-07-14  5:10   ` Vinod Koul
  0 siblings, 0 replies; 13+ messages in thread
From: Vinod Koul @ 2021-07-14  5:10 UTC (permalink / raw)
  To: Adrian Larumbe; +Cc: dmaengine, michal.simek, linux-arm-kernel

On 07-07-21, 00:43, Adrian Larumbe wrote:
> When user calls dmaengine_terminate_sync, the driver will clean up any
> remaining descriptors for all the pending or active transfers that had
> previously been submitted. However, this might happen whilst the tasklet is
> invoking the DMA callback for the last finished transfer, so by the time it
> returns and takes over the channel's spinlock, the list of completed
> descriptors it was traversing is no longer valid. This leads to a
> read-after-free situation.
> 
> Fix it by signalling whether a user-triggered termination has happened by
> means of a boolean variable.

Applied after adding subsystem name, thanks

-- 
~Vinod

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] dmaengine: xilinx_dma: Restore support for memcpy SG transfers
  2021-07-14  4:58   ` Vinod Koul
@ 2021-07-26 22:14     ` Adrian Larumbe
  2021-07-29  4:19       ` Vinod Koul
  0 siblings, 1 reply; 13+ messages in thread
From: Adrian Larumbe @ 2021-07-26 22:14 UTC (permalink / raw)
  To: Vinod Koul; +Cc: Adrian Larumbe, dmaengine, michal.simek, linux-arm-kernel

[-- Attachment #1: Type: text/plain, Size: 10891 bytes --]

Hi Vinod, I'm the same person who authored this patch. I left my previous
employer so no longer have access to their company email address. However I've
signed this email with the same GPG key to confirm my identity.

On 14.07.2021 10:28, Vinod Koul wrote:
> On 07-07-21, 00:43, Adrian Larumbe wrote:
> > This is the old DMA_SG interface that was removed in commit
> > c678fa66341c ("dmaengine: remove DMA_SG as it is dead code in kernel"). It
> > has been renamed to DMA_MEMCPY_SG to better match the MEMSET and MEMSET_SG
> > naming convention.
> > 
> > It should only be used for mem2mem copies, either main system memory or
> > CPU-addressable device memory (like video memory on a PCI graphics card).
> > 
> > Bringing back this interface was prompted by the need to use the Xilinx
> > CDMA device for mem2mem SG transfers. The current CDMA binding for
> > device_prep_dma_memcpy_sg was partially borrowed from xlnx kernel tree, and
> > expanded with extended address space support when linking descriptor
> > segments and checking for incorrect zero transfer size.
> > 
> > Signed-off-by: Adrian Larumbe <adrian.martinezlarumbe@imgtec.com>
> > ---
> >  .../driver-api/dmaengine/provider.rst         |  11 ++
> >  drivers/dma/dmaengine.c                       |   7 +
> >  drivers/dma/xilinx/xilinx_dma.c               | 122 ++++++++++++++++++
> 
> Can you make this split... documentation patch, core change and then
> driver

I understand you'd like these in three different patches, is that right? Or
maybe one patch for the core change and its associated documentation, and
another one for the consumer.

> >  include/linux/dmaengine.h                     |  20 +++
> >  4 files changed, 160 insertions(+)
> > 
> > diff --git a/Documentation/driver-api/dmaengine/provider.rst b/Documentation/driver-api/dmaengine/provider.rst
> > index ddb0a81a796c..9f0efe9e9952 100644
> > --- a/Documentation/driver-api/dmaengine/provider.rst
> > +++ b/Documentation/driver-api/dmaengine/provider.rst
> > @@ -162,6 +162,17 @@ Currently, the types available are:
> >  
> >    - The device is able to do memory to memory copies
> >  
> > +- - DMA_MEMCPY_SG
> > +
> > +  - The device supports memory to memory scatter-gather transfers.
> > +
> > +  - Even though a plain memcpy can look like a particular case of a
> > +    scatter-gather transfer, with a single chunk to transfer, it's a
> > +    distinct transaction type in the mem2mem transfer case. This is
> > +    because some very simple devices might be able to do contiguous
> > +    single-chunk memory copies, but have no support for more
> > +    complex SG transfers.
> 
> How does one deal with cases where
>  - src_sg_len and dstn_sg_len are different?

Then only as many bytes as the smallest of the scattered buffers will be copied.

> >  - src_sg and dstn_sg are different lists (maybe different number of
> >    entries with different lengths..)
> > 
> > I think we need to document these cases or limitations..

I don't think we should place any restrictions on the number of scatterlist
entries or their length, and the consumer driver should ensure that these get
translated into a device-specific descriptor chain. However the previous
semantic should always be observed, which effectively turns the operation into
sort of a strncpy.

> > +
> >  - DMA_XOR
> >  
> >    - The device is able to perform XOR operations on memory areas
> > diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c
> > index af3ee288bc11..c4e3334b04cf 100644
> > --- a/drivers/dma/dmaengine.c
> > +++ b/drivers/dma/dmaengine.c
> > @@ -1160,6 +1160,13 @@ int dma_async_device_register(struct dma_device *device)
> >  		return -EIO;
> >  	}
> >  
> > +	if (dma_has_cap(DMA_MEMCPY_SG, device->cap_mask) && !device->device_prep_dma_memcpy_sg) {
> > +		dev_err(device->dev,
> > +			"Device claims capability %s, but op is not defined\n",
> > +			"DMA_MEMCPY_SG");
> > +		return -EIO;
> > +	}
> > +
> >  	if (dma_has_cap(DMA_XOR, device->cap_mask) && !device->device_prep_dma_xor) {
> >  		dev_err(device->dev,
> >  			"Device claims capability %s, but op is not defined\n",
> > diff --git a/drivers/dma/xilinx/xilinx_dma.c b/drivers/dma/xilinx/xilinx_dma.c
> > index 75c0b8e904e5..0e2bf75d42d3 100644
> > --- a/drivers/dma/xilinx/xilinx_dma.c
> > +++ b/drivers/dma/xilinx/xilinx_dma.c
> > @@ -2108,6 +2108,126 @@ xilinx_cdma_prep_memcpy(struct dma_chan *dchan, dma_addr_t dma_dst,
> >  	return NULL;
> >  }
> >  
> > +/**
> > + * xilinx_cdma_prep_memcpy_sg - prepare descriptors for a memcpy_sg transaction
> > + * @dchan: DMA channel
> > + * @dst_sg: Destination scatter list
> > + * @dst_sg_len: Number of entries in destination scatter list
> > + * @src_sg: Source scatter list
> > + * @src_sg_len: Number of entries in source scatter list
> > + * @flags: transfer ack flags
> > + *
> > + * Return: Async transaction descriptor on success and NULL on failure
> > + */
> > +static struct dma_async_tx_descriptor *xilinx_cdma_prep_memcpy_sg(
> > +			struct dma_chan *dchan, struct scatterlist *dst_sg,
> > +			unsigned int dst_sg_len, struct scatterlist *src_sg,
> > +			unsigned int src_sg_len, unsigned long flags)
> > +{
> > +	struct xilinx_dma_chan *chan = to_xilinx_chan(dchan);
> > +	struct xilinx_dma_tx_descriptor *desc;
> > +	struct xilinx_cdma_tx_segment *segment, *prev = NULL;
> > +	struct xilinx_cdma_desc_hw *hw;
> > +	size_t len, dst_avail, src_avail;
> > +	dma_addr_t dma_dst, dma_src;
> > +
> > +	if (unlikely(dst_sg_len == 0 || src_sg_len == 0))
> > +		return NULL;
> > +
> > +	if (unlikely(!dst_sg  || !src_sg))
> > +		return NULL;
> 
> no check for dst_sg_len == src_sg_len or it doesnt matter here?

It doesn't, according to the semantics that I had envisioned.

> +
> +	desc = xilinx_dma_alloc_tx_descriptor(chan);
> +	if (!desc)
> +		return NULL;
> +
> +	dma_async_tx_descriptor_init(&desc->async_tx, &chan->common);
> +	desc->async_tx.tx_submit = xilinx_dma_tx_submit;
> +
> +	dst_avail = sg_dma_len(dst_sg);
> +	src_avail = sg_dma_len(src_sg);
> +	/*
> +	 * loop until there is either no more source or no more destination
> +	 * scatterlist entry
> +	 */
> +	while (true) {
> +		len = min_t(size_t, src_avail, dst_avail);
> +		len = min_t(size_t, len, chan->xdev->max_buffer_len);
> +		if (len == 0)
> +			goto fetch;
> +
> +		/* Allocate the link descriptor from DMA pool */
> +		segment = xilinx_cdma_alloc_tx_segment(chan);
> +		if (!segment)
> +			goto error;
> +
> +		dma_dst = sg_dma_address(dst_sg) + sg_dma_len(dst_sg) -
> +			dst_avail;
> +		dma_src = sg_dma_address(src_sg) + sg_dma_len(src_sg) -
> +			src_avail;
> +		hw = &segment->hw;
> +		hw->control = len;
> +		hw->src_addr = dma_src;
> +		hw->dest_addr = dma_dst;
> +		if (chan->ext_addr) {
> +			hw->src_addr_msb = upper_32_bits(dma_src);
> +			hw->dest_addr_msb = upper_32_bits(dma_dst);
> +		}
> +
> +		if (prev) {
> +			prev->hw.next_desc = segment->phys;
> +			if (chan->ext_addr)
> +				prev->hw.next_desc_msb =
> +					upper_32_bits(segment->phys);
> +		}
> +
> +		prev = segment;
> +		dst_avail -= len;
> +		src_avail -= len;
> +		list_add_tail(&segment->node, &desc->segments);
> +
> +fetch:
> +		/* Fetch the next dst scatterlist entry */
> +		if (dst_avail == 0) {
> +			if (dst_sg_len == 0)
> +				break;
> +			dst_sg = sg_next(dst_sg);
> +			if (dst_sg == NULL)
> +				break;
> +			dst_sg_len--;
> +			dst_avail = sg_dma_len(dst_sg);
> +		}
> +		/* Fetch the next src scatterlist entry */
> +		if (src_avail == 0) {
> +			if (src_sg_len == 0)
> +				break;
> +			src_sg = sg_next(src_sg);
> +			if (src_sg == NULL)
> +				break;
> +			src_sg_len--;
> +			src_avail = sg_dma_len(src_sg);
> +		}
> +	}
> +
> +	if (list_empty(&desc->segments)) {
> +		dev_err(chan->xdev->dev,
> +			"%s: Zero-size SG transfer requested\n", __func__);
> +		goto error;
> +	}
> +
> +	/* Link the last hardware descriptor with the first. */
> +	segment = list_first_entry(&desc->segments,
> +				struct xilinx_cdma_tx_segment, node);
> +	desc->async_tx.phys = segment->phys;
> +	prev->hw.next_desc = segment->phys;
> +
> +	return &desc->async_tx;
> +
> +error:
> +	xilinx_dma_free_tx_descriptor(chan, desc);
> +	return NULL;
> +}
> +
>  /**
>   * xilinx_dma_prep_slave_sg - prepare descriptors for a DMA_SLAVE transaction
>   * @dchan: DMA channel
> @@ -3094,7 +3214,9 @@ static int xilinx_dma_probe(struct platform_device *pdev)
>  					  DMA_RESIDUE_GRANULARITY_SEGMENT;
>  	} else if (xdev->dma_config->dmatype == XDMA_TYPE_CDMA) {
>  		dma_cap_set(DMA_MEMCPY, xdev->common.cap_mask);
> +		dma_cap_set(DMA_MEMCPY_SG, xdev->common.cap_mask);
>  		xdev->common.device_prep_dma_memcpy = xilinx_cdma_prep_memcpy;
> +		xdev->common.device_prep_dma_memcpy_sg = xilinx_cdma_prep_memcpy_sg;
>  		/* Residue calculation is supported by only AXI DMA and CDMA */
>  		xdev->common.residue_granularity =
>  					  DMA_RESIDUE_GRANULARITY_SEGMENT;
> diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
> index 004736b6a9c8..7c342f77d8eb 100644
> --- a/include/linux/dmaengine.h
> +++ b/include/linux/dmaengine.h
> @@ -50,6 +50,7 @@ enum dma_status {
>   */
>  enum dma_transaction_type {
>  	DMA_MEMCPY,
> +	DMA_MEMCPY_SG,
>  	DMA_XOR,
>  	DMA_PQ,
>  	DMA_XOR_VAL,
> @@ -891,6 +892,11 @@ struct dma_device {
>  	struct dma_async_tx_descriptor *(*device_prep_dma_memcpy)(
>  		struct dma_chan *chan, dma_addr_t dst, dma_addr_t src,
>  		size_t len, unsigned long flags);
> +	struct dma_async_tx_descriptor *(*device_prep_dma_memcpy_sg)(
> +		struct dma_chan *chan,
> +		struct scatterlist *dst_sg, unsigned int dst_nents,
> +		struct scatterlist *src_sg, unsigned int src_nents,
> +		unsigned long flags);
>  	struct dma_async_tx_descriptor *(*device_prep_dma_xor)(
>  		struct dma_chan *chan, dma_addr_t dst, dma_addr_t *src,
>  		unsigned int src_cnt, size_t len, unsigned long flags);
> @@ -1053,6 +1059,20 @@ static inline struct dma_async_tx_descriptor *dmaengine_prep_dma_memcpy(
>  						    len, flags);
>  }
>  
> +static inline struct dma_async_tx_descriptor *dmaengine_prep_dma_memcpy_sg(
> +		struct dma_chan *chan,
> +		struct scatterlist *dst_sg, unsigned int dst_nents,
> +		struct scatterlist *src_sg, unsigned int src_nents,
> +		unsigned long flags)
> +{
> +	if (!chan || !chan->device || !chan->device->device_prep_dma_memcpy_sg)
> +		return NULL;
> +
> +	return chan->device->device_prep_dma_memcpy_sg(chan, dst_sg, dst_nents,
> +						       src_sg, src_nents,
> +						       flags);
> +}
> +
>  static inline bool dmaengine_is_metadata_mode_supported(struct dma_chan *chan,
>  		enum dma_desc_metadata_mode mode)
>  {
> -- 
> 2.17.1

Adrian Larumbe

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 659 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] dmaengine: xilinx_dma: Restore support for memcpy SG transfers
  2021-07-26 22:14     ` Adrian Larumbe
@ 2021-07-29  4:19       ` Vinod Koul
  0 siblings, 0 replies; 13+ messages in thread
From: Vinod Koul @ 2021-07-29  4:19 UTC (permalink / raw)
  To: Adrian Larumbe; +Cc: Adrian Larumbe, dmaengine, michal.simek, linux-arm-kernel

On 26-07-21, 23:14, Adrian Larumbe wrote:
> Hi Vinod, I'm the same person who authored this patch. I left my previous
> employer so no longer have access to their company email address. However I've
> signed this email with the same GPG key to confirm my identity.
> 
> On 14.07.2021 10:28, Vinod Koul wrote:
> > On 07-07-21, 00:43, Adrian Larumbe wrote:
> > > This is the old DMA_SG interface that was removed in commit
> > > c678fa66341c ("dmaengine: remove DMA_SG as it is dead code in kernel"). It
> > > has been renamed to DMA_MEMCPY_SG to better match the MEMSET and MEMSET_SG
> > > naming convention.
> > > 
> > > It should only be used for mem2mem copies, either main system memory or
> > > CPU-addressable device memory (like video memory on a PCI graphics card).
> > > 
> > > Bringing back this interface was prompted by the need to use the Xilinx
> > > CDMA device for mem2mem SG transfers. The current CDMA binding for
> > > device_prep_dma_memcpy_sg was partially borrowed from xlnx kernel tree, and
> > > expanded with extended address space support when linking descriptor
> > > segments and checking for incorrect zero transfer size.
> > > 
> > > Signed-off-by: Adrian Larumbe <adrian.martinezlarumbe@imgtec.com>
> > > ---
> > >  .../driver-api/dmaengine/provider.rst         |  11 ++
> > >  drivers/dma/dmaengine.c                       |   7 +
> > >  drivers/dma/xilinx/xilinx_dma.c               | 122 ++++++++++++++++++
> > 
> > Can you make this split... documentation patch, core change and then
> > driver
> 
> I understand you'd like these in three different patches, is that right? Or

Correct

> maybe one patch for the core change and its associated documentation, and

doc and core should be different

> another one for the consumer.
> 
> > >  include/linux/dmaengine.h                     |  20 +++
> > >  4 files changed, 160 insertions(+)
> > > 
> > > diff --git a/Documentation/driver-api/dmaengine/provider.rst b/Documentation/driver-api/dmaengine/provider.rst
> > > index ddb0a81a796c..9f0efe9e9952 100644
> > > --- a/Documentation/driver-api/dmaengine/provider.rst
> > > +++ b/Documentation/driver-api/dmaengine/provider.rst
> > > @@ -162,6 +162,17 @@ Currently, the types available are:
> > >  
> > >    - The device is able to do memory to memory copies
> > >  
> > > +- - DMA_MEMCPY_SG
> > > +
> > > +  - The device supports memory to memory scatter-gather transfers.
> > > +
> > > +  - Even though a plain memcpy can look like a particular case of a
> > > +    scatter-gather transfer, with a single chunk to transfer, it's a
> > > +    distinct transaction type in the mem2mem transfer case. This is
> > > +    because some very simple devices might be able to do contiguous
> > > +    single-chunk memory copies, but have no support for more
> > > +    complex SG transfers.
> > 
> > How does one deal with cases where
> >  - src_sg_len and dstn_sg_len are different?
> 
> Then only as many bytes as the smallest of the scattered buffers will be copied.

Is that not a restriction and that needs to be documented with examples!

> 
> > >  - src_sg and dstn_sg are different lists (maybe different number of
> > >    entries with different lengths..)
> > > 
> > > I think we need to document these cases or limitations..
> 
> I don't think we should place any restrictions on the number of scatterlist
> entries or their length, and the consumer driver should ensure that these get
> translated into a device-specific descriptor chain. However the previous
> semantic should always be observed, which effectively turns the operation into
> sort of a strncpy.

That sounds right, but someone needs to know how to handle such cases
with this API, that needs to be explained in detail on expected
behaviour when src_sg_len & dstn_sg_len and same, less or more!

-- 
~Vinod

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 0/3] Add support for MEMCPY_SG transfers
  2021-07-06 23:43 [PATCH 0/2] Add support for MEMCPY_SG transfers Adrian Larumbe
  2021-07-06 23:43 ` [PATCH 1/2] dmaengine: xilinx_dma: Restore support for memcpy SG transfers Adrian Larumbe
  2021-07-06 23:43 ` [PATCH 2/2] xilinx_dma: Fix read-after-free bug when terminating transfers Adrian Larumbe
@ 2021-11-01 18:08 ` Adrian Larumbe
  2021-11-01 18:08   ` [PATCH 1/3] dmaengine: Add documentation for new memcpy scatter-gather function Adrian Larumbe
                     ` (4 more replies)
  2 siblings, 5 replies; 13+ messages in thread
From: Adrian Larumbe @ 2021-11-01 18:08 UTC (permalink / raw)
  To: vkoul, dmaengine; +Cc: michal.simek, linux-arm-kernel, Adrian Larumbe

Bring back dmaengine API support for scatter-gather memcpy's.

Changes in patch v2:
  * Expanded API function documentation to elaborate on its semantics,
    limitations and corner case behaviour.
  * Broke the patch series into three different ones: documentation, core
    API change and consumer driver

v1 - https://lore.kernel.org/dmaengine/20210706234338.7696-1-adrian.martinezlarumbe@imgtec.com

Adrian Larumbe (3):
  dmaengine: Add documentation for new memcpy scatter-gather function
  dmaengine: Add core function and capability check for DMA_MEMCPY_SG
  dmaengine: Add consumer for the new DMA_MEMCPY_SG API function.

 .../driver-api/dmaengine/provider.rst         |  23 ++++
 drivers/dma/dmaengine.c                       |   7 +
 drivers/dma/xilinx/xilinx_dma.c               | 122 ++++++++++++++++++
 include/linux/dmaengine.h                     |  20 +++
 4 files changed, 172 insertions(+)


base-commit: e0674853943287669a82d1ffe09a700944615978
-- 
2.33.1


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/3] dmaengine: Add documentation for new memcpy scatter-gather function
  2021-11-01 18:08 ` [PATCH 0/3] Add support for MEMCPY_SG transfers Adrian Larumbe
@ 2021-11-01 18:08   ` Adrian Larumbe
  2021-11-01 18:08   ` [PATCH 2/3] dmaengine: Add core function and capability check for DMA_MEMCPY_SG Adrian Larumbe
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 13+ messages in thread
From: Adrian Larumbe @ 2021-11-01 18:08 UTC (permalink / raw)
  To: vkoul, dmaengine; +Cc: michal.simek, linux-arm-kernel, Adrian Larumbe

Documentation describes semantics, limitations and a typical use case
scenario.

Signed-off-by: Adrian Larumbe <adrianml@alumnos.upm.es>
---
 .../driver-api/dmaengine/provider.rst         | 23 +++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/Documentation/driver-api/dmaengine/provider.rst b/Documentation/driver-api/dmaengine/provider.rst
index ddb0a81a796c..0072c9c7efd3 100644
--- a/Documentation/driver-api/dmaengine/provider.rst
+++ b/Documentation/driver-api/dmaengine/provider.rst
@@ -162,6 +162,29 @@ Currently, the types available are:
 
   - The device is able to do memory to memory copies
 
+- - DMA_MEMCPY_SG
+
+  - The device supports memory to memory scatter-gather transfers.
+
+  - Even though a plain memcpy can look like a particular case of a
+    scatter-gather transfer, with a single chunk to copy, it's a distinct
+    transaction type in the mem2mem transfer case. This is because some very
+    simple devices might be able to do contiguous single-chunk memory copies,
+    but have no support for more complex SG transfers.
+
+  - No matter what the overall size of the combined chunks for source and
+    destination is, only as many bytes as the smallest of the two will be
+    transmitted. That means the number and size of the scatter-gather buffers in
+    both lists need not be the same, and that the operation functionally is
+    equivalent to a ``strncpy`` where the ``count`` argument equals the smallest
+    total size of the two scatter-gather list buffers.
+
+  - It's usually used for copying pixel data between host memory and
+    memory-mapped GPU device memory, such as found on modern PCI video graphics
+    cards. The most immediate example is the OpenGL API function
+    ``glReadPielx()``, which might require a verbatim copy of a huge framebuffer
+    from local device memory onto host memory.
+
 - DMA_XOR
 
   - The device is able to perform XOR operations on memory areas
-- 
2.33.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/3] dmaengine: Add core function and capability check for DMA_MEMCPY_SG
  2021-11-01 18:08 ` [PATCH 0/3] Add support for MEMCPY_SG transfers Adrian Larumbe
  2021-11-01 18:08   ` [PATCH 1/3] dmaengine: Add documentation for new memcpy scatter-gather function Adrian Larumbe
@ 2021-11-01 18:08   ` Adrian Larumbe
  2021-11-01 18:08   ` [PATCH 3/3] dmaengine: Add consumer for the new DMA_MEMCPY_SG API function Adrian Larumbe
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 13+ messages in thread
From: Adrian Larumbe @ 2021-11-01 18:08 UTC (permalink / raw)
  To: vkoul, dmaengine; +Cc: michal.simek, linux-arm-kernel, Adrian Larumbe

This is the old DMA_SG interface that was removed in commit
c678fa66341c ("dmaengine: remove DMA_SG as it is dead code in kernel"). It
has been renamed to DMA_MEMCPY_SG to better match the MEMSET and MEMSET_SG
naming convention.

It should only be used for mem2mem copies, either main system memory or
CPU-addressable device memory (like video memory on a PCI graphics card).

Bringing back this interface was prompted by the need to use the Xilinx
CDMA device for mem2mem SG transfers.

Signed-off-by: Adrian Larumbe <adrianml@alumnos.upm.es>
---
 drivers/dma/dmaengine.c   |  7 +++++++
 include/linux/dmaengine.h | 20 ++++++++++++++++++++
 2 files changed, 27 insertions(+)

diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c
index d9f7c097cfd6..2cfa8458b51b 100644
--- a/drivers/dma/dmaengine.c
+++ b/drivers/dma/dmaengine.c
@@ -1159,6 +1159,13 @@ int dma_async_device_register(struct dma_device *device)
 		return -EIO;
 	}
 
+	if (dma_has_cap(DMA_MEMCPY_SG, device->cap_mask) && !device->device_prep_dma_memcpy_sg) {
+		dev_err(device->dev,
+			"Device claims capability %s, but op is not defined\n",
+			"DMA_MEMCPY_SG");
+		return -EIO;
+	}
+
 	if (dma_has_cap(DMA_XOR, device->cap_mask) && !device->device_prep_dma_xor) {
 		dev_err(device->dev,
 			"Device claims capability %s, but op is not defined\n",
diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
index 9000f3ffce8b..554a86665de9 100644
--- a/include/linux/dmaengine.h
+++ b/include/linux/dmaengine.h
@@ -50,6 +50,7 @@ enum dma_status {
  */
 enum dma_transaction_type {
 	DMA_MEMCPY,
+	DMA_MEMCPY_SG,
 	DMA_XOR,
 	DMA_PQ,
 	DMA_XOR_VAL,
@@ -891,6 +892,11 @@ struct dma_device {
 	struct dma_async_tx_descriptor *(*device_prep_dma_memcpy)(
 		struct dma_chan *chan, dma_addr_t dst, dma_addr_t src,
 		size_t len, unsigned long flags);
+	struct dma_async_tx_descriptor *(*device_prep_dma_memcpy_sg)(
+		struct dma_chan *chan,
+		struct scatterlist *dst_sg, unsigned int dst_nents,
+		struct scatterlist *src_sg, unsigned int src_nents,
+		unsigned long flags);
 	struct dma_async_tx_descriptor *(*device_prep_dma_xor)(
 		struct dma_chan *chan, dma_addr_t dst, dma_addr_t *src,
 		unsigned int src_cnt, size_t len, unsigned long flags);
@@ -1051,6 +1057,20 @@ static inline struct dma_async_tx_descriptor *dmaengine_prep_dma_memcpy(
 						    len, flags);
 }
 
+static inline struct dma_async_tx_descriptor *dmaengine_prep_dma_memcpy_sg(
+		struct dma_chan *chan,
+		struct scatterlist *dst_sg, unsigned int dst_nents,
+		struct scatterlist *src_sg, unsigned int src_nents,
+		unsigned long flags)
+{
+	if (!chan || !chan->device || !chan->device->device_prep_dma_memcpy_sg)
+		return NULL;
+
+	return chan->device->device_prep_dma_memcpy_sg(chan, dst_sg, dst_nents,
+						       src_sg, src_nents,
+						       flags);
+}
+
 static inline bool dmaengine_is_metadata_mode_supported(struct dma_chan *chan,
 		enum dma_desc_metadata_mode mode)
 {
-- 
2.33.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 3/3] dmaengine: Add consumer for the new DMA_MEMCPY_SG API function.
  2021-11-01 18:08 ` [PATCH 0/3] Add support for MEMCPY_SG transfers Adrian Larumbe
  2021-11-01 18:08   ` [PATCH 1/3] dmaengine: Add documentation for new memcpy scatter-gather function Adrian Larumbe
  2021-11-01 18:08   ` [PATCH 2/3] dmaengine: Add core function and capability check for DMA_MEMCPY_SG Adrian Larumbe
@ 2021-11-01 18:08   ` Adrian Larumbe
  2021-11-02 10:33   ` [PATCH 0/3] Add support for MEMCPY_SG transfers Michal Simek
  2021-11-22  6:08   ` Vinod Koul
  4 siblings, 0 replies; 13+ messages in thread
From: Adrian Larumbe @ 2021-11-01 18:08 UTC (permalink / raw)
  To: vkoul, dmaengine; +Cc: michal.simek, linux-arm-kernel, Adrian Larumbe

This new CDMA binding for device_prep_dma_memcpy_sg was partially borrowed
from xlnx kernel tree, an expanded with extended address space support when
linking descriptor segments and checking for incorrect zero transfer size.

Signed-off-by: Adrian Larumbe <adrianml@alumnos.upm.es>
---
 drivers/dma/xilinx/xilinx_dma.c | 122 ++++++++++++++++++++++++++++++++
 1 file changed, 122 insertions(+)

diff --git a/drivers/dma/xilinx/xilinx_dma.c b/drivers/dma/xilinx/xilinx_dma.c
index 4677ce08ed40..61618148f9d4 100644
--- a/drivers/dma/xilinx/xilinx_dma.c
+++ b/drivers/dma/xilinx/xilinx_dma.c
@@ -2127,6 +2127,126 @@ xilinx_cdma_prep_memcpy(struct dma_chan *dchan, dma_addr_t dma_dst,
 	return NULL;
 }
 
+/**
+ * xilinx_cdma_prep_memcpy_sg - prepare descriptors for a memcpy_sg transaction
+ * @dchan: DMA channel
+ * @dst_sg: Destination scatter list
+ * @dst_sg_len: Number of entries in destination scatter list
+ * @src_sg: Source scatter list
+ * @src_sg_len: Number of entries in source scatter list
+ * @flags: transfer ack flags
+ *
+ * Return: Async transaction descriptor on success and NULL on failure
+ */
+static struct dma_async_tx_descriptor *xilinx_cdma_prep_memcpy_sg(
+			struct dma_chan *dchan, struct scatterlist *dst_sg,
+			unsigned int dst_sg_len, struct scatterlist *src_sg,
+			unsigned int src_sg_len, unsigned long flags)
+{
+	struct xilinx_dma_chan *chan = to_xilinx_chan(dchan);
+	struct xilinx_dma_tx_descriptor *desc;
+	struct xilinx_cdma_tx_segment *segment, *prev = NULL;
+	struct xilinx_cdma_desc_hw *hw;
+	size_t len, dst_avail, src_avail;
+	dma_addr_t dma_dst, dma_src;
+
+	if (unlikely(dst_sg_len == 0 || src_sg_len == 0))
+		return NULL;
+
+	if (unlikely(!dst_sg  || !src_sg))
+		return NULL;
+
+	desc = xilinx_dma_alloc_tx_descriptor(chan);
+	if (!desc)
+		return NULL;
+
+	dma_async_tx_descriptor_init(&desc->async_tx, &chan->common);
+	desc->async_tx.tx_submit = xilinx_dma_tx_submit;
+
+	dst_avail = sg_dma_len(dst_sg);
+	src_avail = sg_dma_len(src_sg);
+	/*
+	 * loop until there is either no more source or no more destination
+	 * scatterlist entry
+	 */
+	while (true) {
+		len = min_t(size_t, src_avail, dst_avail);
+		len = min_t(size_t, len, chan->xdev->max_buffer_len);
+		if (len == 0)
+			goto fetch;
+
+		/* Allocate the link descriptor from DMA pool */
+		segment = xilinx_cdma_alloc_tx_segment(chan);
+		if (!segment)
+			goto error;
+
+		dma_dst = sg_dma_address(dst_sg) + sg_dma_len(dst_sg) -
+			dst_avail;
+		dma_src = sg_dma_address(src_sg) + sg_dma_len(src_sg) -
+			src_avail;
+		hw = &segment->hw;
+		hw->control = len;
+		hw->src_addr = dma_src;
+		hw->dest_addr = dma_dst;
+		if (chan->ext_addr) {
+			hw->src_addr_msb = upper_32_bits(dma_src);
+			hw->dest_addr_msb = upper_32_bits(dma_dst);
+		}
+
+		if (prev) {
+			prev->hw.next_desc = segment->phys;
+			if (chan->ext_addr)
+				prev->hw.next_desc_msb =
+					upper_32_bits(segment->phys);
+		}
+
+		prev = segment;
+		dst_avail -= len;
+		src_avail -= len;
+		list_add_tail(&segment->node, &desc->segments);
+
+fetch:
+		/* Fetch the next dst scatterlist entry */
+		if (dst_avail == 0) {
+			if (dst_sg_len == 0)
+				break;
+			dst_sg = sg_next(dst_sg);
+			if (dst_sg == NULL)
+				break;
+			dst_sg_len--;
+			dst_avail = sg_dma_len(dst_sg);
+		}
+		/* Fetch the next src scatterlist entry */
+		if (src_avail == 0) {
+			if (src_sg_len == 0)
+				break;
+			src_sg = sg_next(src_sg);
+			if (src_sg == NULL)
+				break;
+			src_sg_len--;
+			src_avail = sg_dma_len(src_sg);
+		}
+	}
+
+	if (list_empty(&desc->segments)) {
+		dev_err(chan->xdev->dev,
+			"%s: Zero-size SG transfer requested\n", __func__);
+		goto error;
+	}
+
+	/* Link the last hardware descriptor with the first. */
+	segment = list_first_entry(&desc->segments,
+				struct xilinx_cdma_tx_segment, node);
+	desc->async_tx.phys = segment->phys;
+	prev->hw.next_desc = segment->phys;
+
+	return &desc->async_tx;
+
+error:
+	xilinx_dma_free_tx_descriptor(chan, desc);
+	return NULL;
+}
+
 /**
  * xilinx_dma_prep_slave_sg - prepare descriptors for a DMA_SLAVE transaction
  * @dchan: DMA channel
@@ -3115,7 +3235,9 @@ static int xilinx_dma_probe(struct platform_device *pdev)
 					  DMA_RESIDUE_GRANULARITY_SEGMENT;
 	} else if (xdev->dma_config->dmatype == XDMA_TYPE_CDMA) {
 		dma_cap_set(DMA_MEMCPY, xdev->common.cap_mask);
+		dma_cap_set(DMA_MEMCPY_SG, xdev->common.cap_mask);
 		xdev->common.device_prep_dma_memcpy = xilinx_cdma_prep_memcpy;
+		xdev->common.device_prep_dma_memcpy_sg = xilinx_cdma_prep_memcpy_sg;
 		/* Residue calculation is supported by only AXI DMA and CDMA */
 		xdev->common.residue_granularity =
 					  DMA_RESIDUE_GRANULARITY_SEGMENT;
-- 
2.33.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/3] Add support for MEMCPY_SG transfers
  2021-11-01 18:08 ` [PATCH 0/3] Add support for MEMCPY_SG transfers Adrian Larumbe
                     ` (2 preceding siblings ...)
  2021-11-01 18:08   ` [PATCH 3/3] dmaengine: Add consumer for the new DMA_MEMCPY_SG API function Adrian Larumbe
@ 2021-11-02 10:33   ` Michal Simek
  2021-11-22  6:08   ` Vinod Koul
  4 siblings, 0 replies; 13+ messages in thread
From: Michal Simek @ 2021-11-02 10:33 UTC (permalink / raw)
  To: Adrian Larumbe, vkoul, dmaengine, Radhey Shyam Pandey
  Cc: michal.simek, linux-arm-kernel

+Radhey,


On 11/1/21 19:08, Adrian Larumbe wrote:
> Bring back dmaengine API support for scatter-gather memcpy's.
> 
> Changes in patch v2:
>    * Expanded API function documentation to elaborate on its semantics,
>      limitations and corner case behaviour.
>    * Broke the patch series into three different ones: documentation, core
>      API change and consumer driver
> 
> v1 - https://lore.kernel.org/dmaengine/20210706234338.7696-1-adrian.martinezlarumbe@imgtec.com
> 
> Adrian Larumbe (3):
>    dmaengine: Add documentation for new memcpy scatter-gather function
>    dmaengine: Add core function and capability check for DMA_MEMCPY_SG
>    dmaengine: Add consumer for the new DMA_MEMCPY_SG API function.
> 
>   .../driver-api/dmaengine/provider.rst         |  23 ++++
>   drivers/dma/dmaengine.c                       |   7 +
>   drivers/dma/xilinx/xilinx_dma.c               | 122 ++++++++++++++++++
>   include/linux/dmaengine.h                     |  20 +++
>   4 files changed, 172 insertions(+)
> 
> 
> base-commit: e0674853943287669a82d1ffe09a700944615978
> 



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/3] Add support for MEMCPY_SG transfers
  2021-11-01 18:08 ` [PATCH 0/3] Add support for MEMCPY_SG transfers Adrian Larumbe
                     ` (3 preceding siblings ...)
  2021-11-02 10:33   ` [PATCH 0/3] Add support for MEMCPY_SG transfers Michal Simek
@ 2021-11-22  6:08   ` Vinod Koul
  4 siblings, 0 replies; 13+ messages in thread
From: Vinod Koul @ 2021-11-22  6:08 UTC (permalink / raw)
  To: Adrian Larumbe; +Cc: dmaengine, michal.simek, linux-arm-kernel

On 01-11-21, 18:08, Adrian Larumbe wrote:
> Bring back dmaengine API support for scatter-gather memcpy's.

Applied, thanks

-- 
~Vinod

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2021-11-22  6:08 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-06 23:43 [PATCH 0/2] Add support for MEMCPY_SG transfers Adrian Larumbe
2021-07-06 23:43 ` [PATCH 1/2] dmaengine: xilinx_dma: Restore support for memcpy SG transfers Adrian Larumbe
2021-07-14  4:58   ` Vinod Koul
2021-07-26 22:14     ` Adrian Larumbe
2021-07-29  4:19       ` Vinod Koul
2021-07-06 23:43 ` [PATCH 2/2] xilinx_dma: Fix read-after-free bug when terminating transfers Adrian Larumbe
2021-07-14  5:10   ` Vinod Koul
2021-11-01 18:08 ` [PATCH 0/3] Add support for MEMCPY_SG transfers Adrian Larumbe
2021-11-01 18:08   ` [PATCH 1/3] dmaengine: Add documentation for new memcpy scatter-gather function Adrian Larumbe
2021-11-01 18:08   ` [PATCH 2/3] dmaengine: Add core function and capability check for DMA_MEMCPY_SG Adrian Larumbe
2021-11-01 18:08   ` [PATCH 3/3] dmaengine: Add consumer for the new DMA_MEMCPY_SG API function Adrian Larumbe
2021-11-02 10:33   ` [PATCH 0/3] Add support for MEMCPY_SG transfers Michal Simek
2021-11-22  6:08   ` Vinod Koul

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).