linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/5] Support for QCA BAM DMA command descriptor
@ 2016-12-15  9:55 Abhishek Sahu
  2016-12-15  9:55 ` [PATCH 1/5] dmaengine: qca: bam_dma: Add header file for bam driver Abhishek Sahu
                   ` (4 more replies)
  0 siblings, 5 replies; 21+ messages in thread
From: Abhishek Sahu @ 2016-12-15  9:55 UTC (permalink / raw)
  To: vinod.koul, dan.j.williams, andy.gross
  Cc: stanimir.varbanov, mcgrof, okaya, pramod.gurav, arnd,
	linux-kernel, dmaengine, linux-arm-msm, Abhishek Sahu

These patches mainly adds the support for QCA BAM command
descriptor and per SG flags which are required for implementing
BAM DMA support for some QCA peripherals like QPIC NAND/LCD.

The BAM command descriptors perform all register reads and writes
while data descriptors do data transfer. The QPIC NAND forms the
chain of command and data descriptors for full page read/write and
submit it to BAM DMA.

Following are the limitation of existing DMA mapping function which
forces us to go for separate DMA custom mapping function and SG.

1. BAM descriptor has multiple flags which cannot be mapped with
   generic DMA engine flags.
2. For each page code word i.e 512 bytes, approx 10-15 register
   read/writes are required. The NAND driver combines all these into
   SGL and submit it to BAM. Each register read/writes require
   different flags and the current generic SG does not have field to
   set dma flags for each SG. We cannot add flag parameter in generic
   SG since it is being used by different subsystems across linux
   kernel.

So these patches add custom mapping function, QCA specific SG which
has dma flag for each SG and its DMA mapping functions. With these,
peripheral driver can set per SG flag and submit it to custom DMA
mapping function.

Abhishek Sahu (5):
  dmaengine: qca: bam_dma: Add header file for bam driver
  dmaengine: Add support for custom data mapping
  dmaengine: qca: bam_dma: Add support for bam sgl
  dmaengine: qca: bam_dma: implement custom data mapping
  dmaengine: qca: bam_dma: implement command descriptor

 drivers/dma/qcom/bam_dma.c       |  98 +++++++++++++++++++++--
 include/linux/dma/qcom_bam_dma.h | 162 +++++++++++++++++++++++++++++++++++++++
 include/linux/dmaengine.h        |   5 ++
 3 files changed, 258 insertions(+), 7 deletions(-)
 create mode 100644 include/linux/dma/qcom_bam_dma.h

-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 1/5] dmaengine: qca: bam_dma: Add header file for bam driver
  2016-12-15  9:55 [PATCH 0/5] Support for QCA BAM DMA command descriptor Abhishek Sahu
@ 2016-12-15  9:55 ` Abhishek Sahu
  2016-12-15  9:55 ` [PATCH 2/5] dmaengine: Add support for custom data mapping Abhishek Sahu
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 21+ messages in thread
From: Abhishek Sahu @ 2016-12-15  9:55 UTC (permalink / raw)
  To: vinod.koul, dan.j.williams, andy.gross
  Cc: stanimir.varbanov, mcgrof, okaya, pramod.gurav, arnd,
	linux-kernel, dmaengine, linux-arm-msm, Abhishek Sahu

The QCA BAM DMA descriptor has other flags which cannot be
mapped with generic DMA engine flags. This patch creates a
new header file for BAM driver and moves the BAM flags
to this file. Some other BAM specific mapping functions will
be added in this file which can be used by different QCA
peripheral drivers.

Signed-off-by: Abhishek Sahu <absahu@codeaurora.org>
---
 drivers/dma/qcom/bam_dma.c       |  6 +-----
 include/linux/dma/qcom_bam_dma.h | 25 +++++++++++++++++++++++++
 2 files changed, 26 insertions(+), 5 deletions(-)
 create mode 100644 include/linux/dma/qcom_bam_dma.h

diff --git a/drivers/dma/qcom/bam_dma.c b/drivers/dma/qcom/bam_dma.c
index 03c4eb3..7078a4d 100644
--- a/drivers/dma/qcom/bam_dma.c
+++ b/drivers/dma/qcom/bam_dma.c
@@ -49,6 +49,7 @@
 #include <linux/clk.h>
 #include <linux/dmaengine.h>
 #include <linux/pm_runtime.h>
+#include <linux/dma/qcom_bam_dma.h>
 
 #include "../dmaengine.h"
 #include "../virt-dma.h"
@@ -61,11 +62,6 @@ struct bam_desc_hw {
 
 #define BAM_DMA_AUTOSUSPEND_DELAY 100
 
-#define DESC_FLAG_INT BIT(15)
-#define DESC_FLAG_EOT BIT(14)
-#define DESC_FLAG_EOB BIT(13)
-#define DESC_FLAG_NWD BIT(12)
-
 struct bam_async_desc {
 	struct virt_dma_desc vd;
 
diff --git a/include/linux/dma/qcom_bam_dma.h b/include/linux/dma/qcom_bam_dma.h
new file mode 100644
index 0000000..c3b68c2
--- /dev/null
+++ b/include/linux/dma/qcom_bam_dma.h
@@ -0,0 +1,25 @@
+/*
+ * Copyright (c) 2016, The Linux Foundation. All rights reserved.
+ *
+ * Permission to use, copy, modify, and/or distribute this software for any
+ * purpose with or without fee is hereby granted, provided that the above
+ * copyright notice and this permission notice appear in all copies.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+ * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+ * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+ * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+ * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+ * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+ * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+ */
+
+#ifndef _QCOM_BAM_DMA_H
+#define _QCOM_BAM_DMA_H
+
+#define DESC_FLAG_INT BIT(15)
+#define DESC_FLAG_EOT BIT(14)
+#define DESC_FLAG_EOB BIT(13)
+#define DESC_FLAG_NWD BIT(12)
+
+#endif
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 2/5] dmaengine: Add support for custom data mapping
  2016-12-15  9:55 [PATCH 0/5] Support for QCA BAM DMA command descriptor Abhishek Sahu
  2016-12-15  9:55 ` [PATCH 1/5] dmaengine: qca: bam_dma: Add header file for bam driver Abhishek Sahu
@ 2016-12-15  9:55 ` Abhishek Sahu
  2016-12-18 16:26   ` Vinod Koul
  2016-12-15  9:55 ` [PATCH 3/5] dmaengine: qca: bam_dma: Add support for bam sgl Abhishek Sahu
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 21+ messages in thread
From: Abhishek Sahu @ 2016-12-15  9:55 UTC (permalink / raw)
  To: vinod.koul, dan.j.williams, andy.gross
  Cc: stanimir.varbanov, mcgrof, okaya, pramod.gurav, arnd,
	linux-kernel, dmaengine, linux-arm-msm, Abhishek Sahu

The current DMA APIs only support SGL or data in generic format.
The QCA BAM DMA engine data cannot be mapped with already
available APIs due to following reasons.

1. The QCA BAM DMA engine uses custom flags which cannot be
   mapped with generic DMA engine flags.
2. Some peripheral driver like QCA QPIC NAND/LCD requires to
   set specific flags (Like NWD, EOT) for some of the descriptors
   in scatter gather list. The already available mapping APIs take
   flags parameter in API itself and there is no support for
   passing DMA specific flags for each SGL entry.

Now this patch adds the support for making the DMA descriptor from
custom data with new DMA mapping function prep_dma_custom_mapping.
The peripheral driver will pass the custom data in this function and
DMA engine driver will form the descriptor according to its own
logic. In future, this API can be used by any other DMA engine
drivers also which are unable to do DMA mapping with already
available API’s.

Signed-off-by: Abhishek Sahu <absahu@codeaurora.org>
---
 include/linux/dmaengine.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
index cc535a4..6324c1f 100644
--- a/include/linux/dmaengine.h
+++ b/include/linux/dmaengine.h
@@ -692,6 +692,8 @@ struct dma_filter {
  *	be called after period_len bytes have been transferred.
  * @device_prep_interleaved_dma: Transfer expression in a generic way.
  * @device_prep_dma_imm_data: DMA's 8 byte immediate data to the dst address
+ * @device_prep_dma_custom_mapping: prepares a dma operation from dma driver
+ *	specific custom data
  * @device_config: Pushes a new configuration to a channel, return 0 or an error
  *	code
  * @device_pause: Pauses any transfer happening on a channel. Returns
@@ -783,6 +785,9 @@ struct dma_device {
 	struct dma_async_tx_descriptor *(*device_prep_dma_imm_data)(
 		struct dma_chan *chan, dma_addr_t dst, u64 data,
 		unsigned long flags);
+	struct dma_async_tx_descriptor *(*device_prep_dma_custom_mapping)(
+		struct dma_chan *chan, void *data,
+		unsigned long flags);
 
 	int (*device_config)(struct dma_chan *chan,
 			     struct dma_slave_config *config);
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 3/5] dmaengine: qca: bam_dma: Add support for bam sgl
  2016-12-15  9:55 [PATCH 0/5] Support for QCA BAM DMA command descriptor Abhishek Sahu
  2016-12-15  9:55 ` [PATCH 1/5] dmaengine: qca: bam_dma: Add header file for bam driver Abhishek Sahu
  2016-12-15  9:55 ` [PATCH 2/5] dmaengine: Add support for custom data mapping Abhishek Sahu
@ 2016-12-15  9:55 ` Abhishek Sahu
  2016-12-15  9:55 ` [PATCH 4/5] dmaengine: qca: bam_dma: implement custom data mapping Abhishek Sahu
  2016-12-15  9:55 ` [PATCH 5/5] dmaengine: qca: bam_dma: implement command descriptor Abhishek Sahu
  4 siblings, 0 replies; 21+ messages in thread
From: Abhishek Sahu @ 2016-12-15  9:55 UTC (permalink / raw)
  To: vinod.koul, dan.j.williams, andy.gross
  Cc: stanimir.varbanov, mcgrof, okaya, pramod.gurav, arnd,
	linux-kernel, dmaengine, linux-arm-msm, Abhishek Sahu

The default SG does not have flag field but the BAM requires
flags to be passed for each SG. Since SG is linux generic and
other subsystem drivers also use this SG, so we cannot add flag
field in default SG. Now, this patch adds BAM SG and its mapping
operations. This BAM SG contains generic SG and DMA flag field.
The mapping operations are just wrapper functions
over generic SG functions with per-SG DMA flag support.

Signed-off-by: Abhishek Sahu <absahu@codeaurora.org>
---
 include/linux/dma/qcom_bam_dma.h | 78 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 78 insertions(+)

diff --git a/include/linux/dma/qcom_bam_dma.h b/include/linux/dma/qcom_bam_dma.h
index c3b68c2..2307c4d 100644
--- a/include/linux/dma/qcom_bam_dma.h
+++ b/include/linux/dma/qcom_bam_dma.h
@@ -17,9 +17,87 @@
 #ifndef _QCOM_BAM_DMA_H
 #define _QCOM_BAM_DMA_H
 
+#include <linux/dma-mapping.h>
+
 #define DESC_FLAG_INT BIT(15)
 #define DESC_FLAG_EOT BIT(14)
 #define DESC_FLAG_EOB BIT(13)
 #define DESC_FLAG_NWD BIT(12)
 
+/*
+ * QCOM BAM DMA SGL struct
+ *
+ * @sgl: DMA SGL
+ * @dma_flags: BAM DMA flags
+ */
+struct qcom_bam_sgl {
+	struct scatterlist sgl;
+	unsigned int dma_flags;
+};
+
+/*
+ * qcom_bam_sg_init_table - Init QCOM BAM SGL
+ * @bam_sgl: bam sgl
+ * @nents: number of entries in bam sgl
+ *
+ * This function performs the initialization for each SGL in BAM SGL
+ * with generic SGL API.
+ */
+static inline void qcom_bam_sg_init_table(struct qcom_bam_sgl *bam_sgl,
+		unsigned int nents)
+{
+	int i;
+
+	for (i = 0; i < nents; i++)
+		sg_init_table(&bam_sgl[i].sgl, 1);
+}
+
+/*
+ * qcom_bam_unmap_sg - Unmap QCOM BAM SGL
+ * @dev: device for which unmapping needs to be done
+ * @bam_sgl: bam sgl
+ * @nents: number of entries in bam sgl
+ * @dir: dma transfer direction
+ *
+ * This function performs the DMA unmapping for each SGL in BAM SGL
+ * with generic SGL API.
+ */
+static inline void qcom_bam_unmap_sg(struct device *dev,
+	struct qcom_bam_sgl *bam_sgl, int nents, enum dma_data_direction dir)
+{
+	int i;
+
+	for (i = 0; i < nents; i++)
+		dma_unmap_sg(dev, &bam_sgl[i].sgl, 1, dir);
+}
+
+/*
+ * qcom_bam_map_sg - Map QCOM BAM SGL
+ * @dev: device for which mapping needs to be done
+ * @bam_sgl: bam sgl
+ * @nents: number of entries in bam sgl
+ * @dir: dma transfer direction
+ *
+ * This function performs the DMA mapping for each SGL in BAM SGL
+ * with generic SGL API.
+ *
+ * returns 0 on error and > 0 on success
+ */
+static inline int qcom_bam_map_sg(struct device *dev,
+	struct qcom_bam_sgl *bam_sgl, int nents, enum dma_data_direction dir)
+{
+	int i, ret = 0;
+
+	for (i = 0; i < nents; i++) {
+		ret = dma_map_sg(dev, &bam_sgl[i].sgl, 1, dir);
+		if (!ret)
+			break;
+	}
+
+	/* unmap the mapped sgl from previous loop in case of error */
+	if (!ret)
+		qcom_bam_unmap_sg(dev, bam_sgl, i, dir);
+
+	return ret;
+}
 #endif
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 4/5] dmaengine: qca: bam_dma: implement custom data mapping
  2016-12-15  9:55 [PATCH 0/5] Support for QCA BAM DMA command descriptor Abhishek Sahu
                   ` (2 preceding siblings ...)
  2016-12-15  9:55 ` [PATCH 3/5] dmaengine: qca: bam_dma: Add support for bam sgl Abhishek Sahu
@ 2016-12-15  9:55 ` Abhishek Sahu
  2016-12-15  9:55 ` [PATCH 5/5] dmaengine: qca: bam_dma: implement command descriptor Abhishek Sahu
  4 siblings, 0 replies; 21+ messages in thread
From: Abhishek Sahu @ 2016-12-15  9:55 UTC (permalink / raw)
  To: vinod.koul, dan.j.williams, andy.gross
  Cc: stanimir.varbanov, mcgrof, okaya, pramod.gurav, arnd,
	linux-kernel, dmaengine, linux-arm-msm, Abhishek Sahu

BAM custom mapping mainly adds per SG BAM specific flag support
which cannot be implemented with generic SG mapping function.
For each SG, it checks for dma_flags and set the same in
bam_async_desc.

Signed-off-by: Abhishek Sahu <absahu@codeaurora.org>
---
 drivers/dma/qcom/bam_dma.c       | 92 +++++++++++++++++++++++++++++++++++++++-
 include/linux/dma/qcom_bam_dma.h | 13 ++++++
 2 files changed, 103 insertions(+), 2 deletions(-)

diff --git a/drivers/dma/qcom/bam_dma.c b/drivers/dma/qcom/bam_dma.c
index 7078a4d..521ef45 100644
--- a/drivers/dma/qcom/bam_dma.c
+++ b/drivers/dma/qcom/bam_dma.c
@@ -615,7 +615,7 @@ static struct dma_async_tx_descriptor *bam_prep_slave_sg(struct dma_chan *chan,
 	for_each_sg(sgl, sg, sg_len, i)
 		num_alloc += DIV_ROUND_UP(sg_dma_len(sg), BAM_FIFO_SIZE);
 
-	/* allocate enough room to accomodate the number of entries */
+	/* allocate enough room to accommodate the number of entries */
 	async_desc = kzalloc(sizeof(*async_desc) +
 			(num_alloc * sizeof(struct bam_desc_hw)), GFP_NOWAIT);
 
@@ -666,6 +666,92 @@ static struct dma_async_tx_descriptor *bam_prep_slave_sg(struct dma_chan *chan,
 }
 
 /**
+ * bam_prep_dma_custom_mapping - Prep DMA descriptor from custom data
+ *
+ * @chan: dma channel
+ * @data: custom data
+ * @flags: DMA flags
+ */
+static struct dma_async_tx_descriptor *bam_prep_dma_custom_mapping(
+		struct dma_chan *chan,
+		void *data, unsigned long flags)
+{
+	struct bam_chan *bchan = to_bam_chan(chan);
+	struct bam_device *bdev = bchan->bdev;
+	struct bam_async_desc *async_desc;
+	struct qcom_bam_custom_data *desc_data = data;
+	u32 i;
+	struct bam_desc_hw *desc;
+	unsigned int num_alloc = 0;
+
+	if (!is_slave_direction(desc_data->dir)) {
+		dev_err(bdev->dev, "invalid dma direction\n");
+		return NULL;
+	}
+
+	/* calculate number of required entries */
+	for (i = 0; i < desc_data->sgl_cnt; i++)
+		num_alloc += DIV_ROUND_UP(
+			sg_dma_len(&desc_data->bam_sgl[i].sgl), BAM_FIFO_SIZE);
+
+	/* allocate enough room to accommodate the number of entries */
+	async_desc = kzalloc(sizeof(*async_desc) +
+			(num_alloc * sizeof(struct bam_desc_hw)), GFP_NOWAIT);
+
+	if (!async_desc)
+		goto err_out;
+
+	if (flags & DMA_PREP_FENCE)
+		async_desc->flags |= DESC_FLAG_NWD;
+
+	if (flags & DMA_PREP_INTERRUPT)
+		async_desc->flags |= DESC_FLAG_EOT;
+	else
+		async_desc->flags |= DESC_FLAG_INT;
+
+	async_desc->num_desc = num_alloc;
+	async_desc->curr_desc = async_desc->desc;
+	async_desc->dir = desc_data->dir;
+
+	/* fill in temporary descriptors */
+	desc = async_desc->desc;
+	for (i = 0; i < desc_data->sgl_cnt; i++) {
+		unsigned int remainder;
+		unsigned int curr_offset = 0;
+
+		remainder = sg_dma_len(&desc_data->bam_sgl[i].sgl);
+
+		do {
+			desc->addr = cpu_to_le32(
+				sg_dma_address(&desc_data->bam_sgl[i].sgl) +
+						 curr_offset);
+
+			if (desc_data->bam_sgl[i].dma_flags)
+				desc->flags |= cpu_to_le16(
+					desc_data->bam_sgl[i].dma_flags);
+
+			if (remainder > BAM_FIFO_SIZE) {
+				desc->size = cpu_to_le16(BAM_FIFO_SIZE);
+				remainder -= BAM_FIFO_SIZE;
+				curr_offset += BAM_FIFO_SIZE;
+			} else {
+				desc->size = cpu_to_le16(remainder);
+				remainder = 0;
+			}
+
+			async_desc->length += desc->size;
+			desc++;
+		} while (remainder > 0);
+	}
+
+	return vchan_tx_prep(&bchan->vc, &async_desc->vd, flags);
+
+err_out:
+	kfree(async_desc);
+	return NULL;
+}
+
+/**
  * bam_dma_terminate_all - terminate all transactions on a channel
  * @bchan: bam dma channel
  *
@@ -956,7 +1042,7 @@ static void bam_start_dma(struct bam_chan *bchan)
 
 	/* set any special flags on the last descriptor */
 	if (async_desc->num_desc == async_desc->xfer_len)
-		desc[async_desc->xfer_len - 1].flags =
+		desc[async_desc->xfer_len - 1].flags |=
 					cpu_to_le16(async_desc->flags);
 	else
 		desc[async_desc->xfer_len - 1].flags |=
@@ -1233,6 +1319,8 @@ static int bam_dma_probe(struct platform_device *pdev)
 	bdev->common.device_alloc_chan_resources = bam_alloc_chan;
 	bdev->common.device_free_chan_resources = bam_free_chan;
 	bdev->common.device_prep_slave_sg = bam_prep_slave_sg;
+	bdev->common.device_prep_dma_custom_mapping =
+		bam_prep_dma_custom_mapping;
 	bdev->common.device_config = bam_slave_config;
 	bdev->common.device_pause = bam_pause;
 	bdev->common.device_resume = bam_resume;
diff --git a/include/linux/dma/qcom_bam_dma.h b/include/linux/dma/qcom_bam_dma.h
index 2307c4d..46344cf 100644
--- a/include/linux/dma/qcom_bam_dma.h
+++ b/include/linux/dma/qcom_bam_dma.h
@@ -36,6 +36,19 @@ struct qcom_bam_sgl {
 };
 
 /*
+ * QCOM BAM DMA custom data
+ *
+ * @sgl_cnt: number of sgl in bam_sgl
+ * @dir: DMA data transfer direction
+ * @bam_sgl: BAM SGL pointer
+ */
+struct qcom_bam_custom_data {
+	u32 sgl_cnt;
+	enum dma_transfer_direction dir;
+	struct qcom_bam_sgl *bam_sgl;
+};
+
+/*
  * qcom_bam_sg_init_table - Init QCOM BAM SGL
  * @bam_sgl: bam sgl
  * @nents: number of entries in bam sgl
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 5/5] dmaengine: qca: bam_dma: implement command descriptor
  2016-12-15  9:55 [PATCH 0/5] Support for QCA BAM DMA command descriptor Abhishek Sahu
                   ` (3 preceding siblings ...)
  2016-12-15  9:55 ` [PATCH 4/5] dmaengine: qca: bam_dma: implement custom data mapping Abhishek Sahu
@ 2016-12-15  9:55 ` Abhishek Sahu
  4 siblings, 0 replies; 21+ messages in thread
From: Abhishek Sahu @ 2016-12-15  9:55 UTC (permalink / raw)
  To: vinod.koul, dan.j.williams, andy.gross
  Cc: stanimir.varbanov, mcgrof, okaya, pramod.gurav, arnd,
	linux-kernel, dmaengine, linux-arm-msm, Abhishek Sahu

QCA BAM also support command descriptor which allows the SW to
create descriptors of type command which does not generate any
data transmissions but configures registers in the peripheral.
In command descriptor the 32bit address point to the start of
the command block which holds the command elements and the
16bit size define the size of the command block.

Each Command Element is structured by 4 words:
	Write command: address + cmd
		       register data
		       register mask
		       reserved

	Read command: address + cmd
		      read data result address,
		      reserved
		      reserved

Signed-off-by: Abhishek Sahu <absahu@codeaurora.org>
---
 include/linux/dma/qcom_bam_dma.h | 46 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 46 insertions(+)

diff --git a/include/linux/dma/qcom_bam_dma.h b/include/linux/dma/qcom_bam_dma.h
index 46344cf..7e317d7 100644
--- a/include/linux/dma/qcom_bam_dma.h
+++ b/include/linux/dma/qcom_bam_dma.h
@@ -23,6 +23,7 @@
 #define DESC_FLAG_EOT BIT(14)
 #define DESC_FLAG_EOB BIT(13)
 #define DESC_FLAG_NWD BIT(12)
+#define DESC_FLAG_CMD BIT(11)
 
 /*
  * QCOM BAM DMA SGL struct
@@ -49,6 +50,34 @@ struct qcom_bam_custom_data {
 };
 
 /*
+ * This data type corresponds to the native Command Element
+ * supported by BAM DMA Engine.
+ *
+ * @addr - register address.
+ * @command - command type.
+ * @data - for write command: content to be written into peripheral register.
+ *	 for read command: dest addr to write peripheral register value to.
+ * @mask - register mask.
+ * @reserved - for future usage.
+ *
+ */
+struct bam_cmd_element {
+	__le32 addr:24;
+	__le32 command:8;
+	__le32 data;
+	__le32 mask;
+	__le32 reserved;
+};
+
+/*
+ * This enum indicates the command type in a command element
+ */
+enum bam_command_type {
+	BAM_WRITE_COMMAND = 0,
+	BAM_READ_COMMAND,
+};
+
+/*
  * qcom_bam_sg_init_table - Init QCOM BAM SGL
  * @bam_sgl: bam sgl
  * @nents: number of entries in bam sgl
@@ -113,4 +142,21 @@ static inline int qcom_bam_map_sg(struct device *dev,
 
 	return ret;
 }
+
+/*
+ * qcom_prep_bam_ce - Wrapper function to prepare a single BAM command element
+ *	with the data that is passed to this function.
+ * @bam_ce: bam command element
+ * @addr: target address
+ * @command: command in bam_command_type
+ * @data: actual data for write and dest addr for read
+ */
+static inline void qcom_prep_bam_ce(struct bam_cmd_element *bam_ce,
+				uint32_t addr, uint32_t command, uint32_t data)
+{
+	bam_ce->addr = cpu_to_le32(addr);
+	bam_ce->command = cpu_to_le32(command);
+	bam_ce->data = cpu_to_le32(data);
+	bam_ce->mask = 0xFFFFFFFF;
+}
 #endif
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/5] dmaengine: Add support for custom data mapping
  2016-12-15  9:55 ` [PATCH 2/5] dmaengine: Add support for custom data mapping Abhishek Sahu
@ 2016-12-18 16:26   ` Vinod Koul
  2016-12-19  5:06     ` Andy Gross
  0 siblings, 1 reply; 21+ messages in thread
From: Vinod Koul @ 2016-12-18 16:26 UTC (permalink / raw)
  To: Abhishek Sahu
  Cc: dan.j.williams, andy.gross, stanimir.varbanov, mcgrof, okaya,
	pramod.gurav, arnd, linux-kernel, dmaengine, linux-arm-msm

On Thu, Dec 15, 2016 at 03:25:52PM +0530, Abhishek Sahu wrote:
> The current DMA APIs only support SGL or data in generic format.
> The QCA BAM DMA engine data cannot be mapped with already
> available APIs due to following reasons.
> 
> 1. The QCA BAM DMA engine uses custom flags which cannot be
>    mapped with generic DMA engine flags.
> 2. Some peripheral driver like QCA QPIC NAND/LCD requires to
>    set specific flags (Like NWD, EOT) for some of the descriptors
>    in scatter gather list. The already available mapping APIs take
>    flags parameter in API itself and there is no support for
>    passing DMA specific flags for each SGL entry.
> 
> Now this patch adds the support for making the DMA descriptor from
> custom data with new DMA mapping function prep_dma_custom_mapping.
> The peripheral driver will pass the custom data in this function and
> DMA engine driver will form the descriptor according to its own
> logic. In future, this API can be used by any other DMA engine
> drivers also which are unable to do DMA mapping with already
> available API’s.
> 
> Signed-off-by: Abhishek Sahu <absahu@codeaurora.org>
> ---
>  include/linux/dmaengine.h | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
> index cc535a4..6324c1f 100644
> --- a/include/linux/dmaengine.h
> +++ b/include/linux/dmaengine.h
> @@ -692,6 +692,8 @@ struct dma_filter {
>   *	be called after period_len bytes have been transferred.
>   * @device_prep_interleaved_dma: Transfer expression in a generic way.
>   * @device_prep_dma_imm_data: DMA's 8 byte immediate data to the dst address
> + * @device_prep_dma_custom_mapping: prepares a dma operation from dma driver
> + *	specific custom data
>   * @device_config: Pushes a new configuration to a channel, return 0 or an error
>   *	code
>   * @device_pause: Pauses any transfer happening on a channel. Returns
> @@ -783,6 +785,9 @@ struct dma_device {
>  	struct dma_async_tx_descriptor *(*device_prep_dma_imm_data)(
>  		struct dma_chan *chan, dma_addr_t dst, u64 data,
>  		unsigned long flags);
> +	struct dma_async_tx_descriptor *(*device_prep_dma_custom_mapping)(
> +		struct dma_chan *chan, void *data,
> +		unsigned long flags);

This needs a discussion. Why do you need to add a new API for framework.

What are NWD and EOT flags, cna you details out the flags?

-- 
~Vinod

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/5] dmaengine: Add support for custom data mapping
  2016-12-18 16:26   ` Vinod Koul
@ 2016-12-19  5:06     ` Andy Gross
  2016-12-19 15:49       ` Vinod Koul
  0 siblings, 1 reply; 21+ messages in thread
From: Andy Gross @ 2016-12-19  5:06 UTC (permalink / raw)
  To: Vinod Koul
  Cc: Abhishek Sahu, dan.j.williams, stanimir.varbanov, mcgrof, okaya,
	pramod.gurav, arnd, linux-kernel, dmaengine, linux-arm-msm

On Sun, Dec 18, 2016 at 09:56:02PM +0530, Vinod Koul wrote:
> On Thu, Dec 15, 2016 at 03:25:52PM +0530, Abhishek Sahu wrote:
> > The current DMA APIs only support SGL or data in generic format.
> > The QCA BAM DMA engine data cannot be mapped with already
> > available APIs due to following reasons.
> > 
> > 1. The QCA BAM DMA engine uses custom flags which cannot be
> >    mapped with generic DMA engine flags.
> > 2. Some peripheral driver like QCA QPIC NAND/LCD requires to
> >    set specific flags (Like NWD, EOT) for some of the descriptors
> >    in scatter gather list. The already available mapping APIs take
> >    flags parameter in API itself and there is no support for
> >    passing DMA specific flags for each SGL entry.
> > 
> > Now this patch adds the support for making the DMA descriptor from
> > custom data with new DMA mapping function prep_dma_custom_mapping.
> > The peripheral driver will pass the custom data in this function and
> > DMA engine driver will form the descriptor according to its own
> > logic. In future, this API can be used by any other DMA engine
> > drivers also which are unable to do DMA mapping with already
> > available API’s.
> > 
> > Signed-off-by: Abhishek Sahu <absahu@codeaurora.org>
> > ---
> >  include/linux/dmaengine.h | 5 +++++
> >  1 file changed, 5 insertions(+)
> > 
> > diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
> > index cc535a4..6324c1f 100644
> > --- a/include/linux/dmaengine.h
> > +++ b/include/linux/dmaengine.h
> > @@ -692,6 +692,8 @@ struct dma_filter {
> >   *	be called after period_len bytes have been transferred.
> >   * @device_prep_interleaved_dma: Transfer expression in a generic way.
> >   * @device_prep_dma_imm_data: DMA's 8 byte immediate data to the dst address
> > + * @device_prep_dma_custom_mapping: prepares a dma operation from dma driver
> > + *	specific custom data
> >   * @device_config: Pushes a new configuration to a channel, return 0 or an error
> >   *	code
> >   * @device_pause: Pauses any transfer happening on a channel. Returns
> > @@ -783,6 +785,9 @@ struct dma_device {
> >  	struct dma_async_tx_descriptor *(*device_prep_dma_imm_data)(
> >  		struct dma_chan *chan, dma_addr_t dst, u64 data,
> >  		unsigned long flags);
> > +	struct dma_async_tx_descriptor *(*device_prep_dma_custom_mapping)(
> > +		struct dma_chan *chan, void *data,
> > +		unsigned long flags);
> 
> This needs a discussion. Why do you need to add a new API for framework.
> 
> What are NWD and EOT flags, cna you details out the flags?

These are the notify when done and end of transaction flags.  I believe the last
time we talked about this, we (Vinod and I)  agreed to just expose a QCOM only interface to set
the special transaction flags.  You'd then have to have some other API to fixup
the descriptor with the right qcom flags.

Ahbishek, correct me where i am wrong on the following:
So two main differences between a normal descriptor and a command descriptor:
1) size of the descriptor
2) the flag setting
3) data sent in is a modified scatter gather that includes flags , vs a normal
scatter gather

So the CMD descriptors in a given sgl can all have varying flags set? I'd assume
they all have CMD flag set.  Do the current users of the command descriptors
coalesce all of their requests into a big list?

So a couple of thoughts on how to deal with this:

1) Define a virtual channel for the command descriptors vs a normal DMA
transaction.  This lets you use the same hardware channel, but lets you discern
which descriptor format you need to use.  The only issue I see with this is the
required change in device tree binding to target the right type of channel (cmd
vs normal).

2) Provide an API to set flags for the descriptor on a whole descriptor basis.

3) If you have a set of transactions described by an sgl that has disparate use
of flags, you split the list and use a separate transaction.  In other words, we
need to enforce that the flag set API will be applied to all descriptors
described by an sgl.  This means that the whole transaction may be comprised of
multiple async TX descriptors.

Regards,
Andy

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/5] dmaengine: Add support for custom data mapping
  2016-12-19  5:06     ` Andy Gross
@ 2016-12-19 15:49       ` Vinod Koul
  2016-12-19 17:52         ` Andy Gross
  0 siblings, 1 reply; 21+ messages in thread
From: Vinod Koul @ 2016-12-19 15:49 UTC (permalink / raw)
  To: Andy Gross
  Cc: Abhishek Sahu, dan.j.williams, stanimir.varbanov, mcgrof, okaya,
	pramod.gurav, arnd, linux-kernel, dmaengine, linux-arm-msm

On Sun, Dec 18, 2016 at 11:06:42PM -0600, Andy Gross wrote:
> On Sun, Dec 18, 2016 at 09:56:02PM +0530, Vinod Koul wrote:
> > On Thu, Dec 15, 2016 at 03:25:52PM +0530, Abhishek Sahu wrote:
> > > The current DMA APIs only support SGL or data in generic format.
> > > The QCA BAM DMA engine data cannot be mapped with already
> > > available APIs due to following reasons.
> > > 
> > > 1. The QCA BAM DMA engine uses custom flags which cannot be
> > >    mapped with generic DMA engine flags.
> > > 2. Some peripheral driver like QCA QPIC NAND/LCD requires to
> > >    set specific flags (Like NWD, EOT) for some of the descriptors
> > >    in scatter gather list. The already available mapping APIs take
> > >    flags parameter in API itself and there is no support for
> > >    passing DMA specific flags for each SGL entry.
> > > 
> > > Now this patch adds the support for making the DMA descriptor from
> > > custom data with new DMA mapping function prep_dma_custom_mapping.
> > > The peripheral driver will pass the custom data in this function and
> > > DMA engine driver will form the descriptor according to its own
> > > logic. In future, this API can be used by any other DMA engine
> > > drivers also which are unable to do DMA mapping with already
> > > available API’s.
> > > 
> > > Signed-off-by: Abhishek Sahu <absahu@codeaurora.org>
> > > ---
> > >  include/linux/dmaengine.h | 5 +++++
> > >  1 file changed, 5 insertions(+)
> > > 
> > > diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
> > > index cc535a4..6324c1f 100644
> > > --- a/include/linux/dmaengine.h
> > > +++ b/include/linux/dmaengine.h
> > > @@ -692,6 +692,8 @@ struct dma_filter {
> > >   *	be called after period_len bytes have been transferred.
> > >   * @device_prep_interleaved_dma: Transfer expression in a generic way.
> > >   * @device_prep_dma_imm_data: DMA's 8 byte immediate data to the dst address
> > > + * @device_prep_dma_custom_mapping: prepares a dma operation from dma driver
> > > + *	specific custom data
> > >   * @device_config: Pushes a new configuration to a channel, return 0 or an error
> > >   *	code
> > >   * @device_pause: Pauses any transfer happening on a channel. Returns
> > > @@ -783,6 +785,9 @@ struct dma_device {
> > >  	struct dma_async_tx_descriptor *(*device_prep_dma_imm_data)(
> > >  		struct dma_chan *chan, dma_addr_t dst, u64 data,
> > >  		unsigned long flags);
> > > +	struct dma_async_tx_descriptor *(*device_prep_dma_custom_mapping)(
> > > +		struct dma_chan *chan, void *data,
> > > +		unsigned long flags);
> > 
> > This needs a discussion. Why do you need to add a new API for framework.
> > 
> > What are NWD and EOT flags, cna you details out the flags?
> 
> These are the notify when done and end of transaction flags.  I believe the last
> time we talked about this, we (Vinod and I)  agreed to just expose a QCOM only interface to set
> the special transaction flags.  You'd then have to have some other API to fixup
> the descriptor with the right qcom flags.

Okay, do you have pointer on that one, will avoid asking the same questions
again.

> Ahbishek, correct me where i am wrong on the following:
> So two main differences between a normal descriptor and a command descriptor:
> 1) size of the descriptor
> 2) the flag setting
> 3) data sent in is a modified scatter gather that includes flags , vs a normal
> scatter gather
> 
> So the CMD descriptors in a given sgl can all have varying flags set? I'd assume
> they all have CMD flag set.  Do the current users of the command descriptors
> coalesce all of their requests into a big list?
> 
> So a couple of thoughts on how to deal with this:
> 
> 1) Define a virtual channel for the command descriptors vs a normal DMA
> transaction.  This lets you use the same hardware channel, but lets you discern
> which descriptor format you need to use.  The only issue I see with this is the
> required change in device tree binding to target the right type of channel (cmd
> vs normal).

Or mark the descriptor is cmd and write accordingly...

> 
> 2) Provide an API to set flags for the descriptor on a whole descriptor basis.
> 
> 3) If you have a set of transactions described by an sgl that has disparate use
> of flags, you split the list and use a separate transaction.  In other words, we
> need to enforce that the flag set API will be applied to all descriptors
> described by an sgl.  This means that the whole transaction may be comprised of
> multiple async TX descriptors.

-- 
~Vinod

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/5] dmaengine: Add support for custom data mapping
  2016-12-19 15:49       ` Vinod Koul
@ 2016-12-19 17:52         ` Andy Gross
  2016-12-20 19:28           ` Abhishek Sahu
  0 siblings, 1 reply; 21+ messages in thread
From: Andy Gross @ 2016-12-19 17:52 UTC (permalink / raw)
  To: Vinod Koul
  Cc: Abhishek Sahu, dan.j.williams, stanimir.varbanov, mcgrof, okaya,
	pramod.gurav, arnd, linux-kernel, dmaengine, linux-arm-msm

On Mon, Dec 19, 2016 at 09:19:23PM +0530, Vinod Koul wrote:
> On Sun, Dec 18, 2016 at 11:06:42PM -0600, Andy Gross wrote:
> > On Sun, Dec 18, 2016 at 09:56:02PM +0530, Vinod Koul wrote:
> > > On Thu, Dec 15, 2016 at 03:25:52PM +0530, Abhishek Sahu wrote:
> > > > The current DMA APIs only support SGL or data in generic format.
> > > > The QCA BAM DMA engine data cannot be mapped with already
> > > > available APIs due to following reasons.
> > > > 
> > > > 1. The QCA BAM DMA engine uses custom flags which cannot be
> > > >    mapped with generic DMA engine flags.
> > > > 2. Some peripheral driver like QCA QPIC NAND/LCD requires to
> > > >    set specific flags (Like NWD, EOT) for some of the descriptors
> > > >    in scatter gather list. The already available mapping APIs take
> > > >    flags parameter in API itself and there is no support for
> > > >    passing DMA specific flags for each SGL entry.
> > > > 
> > > > Now this patch adds the support for making the DMA descriptor from
> > > > custom data with new DMA mapping function prep_dma_custom_mapping.
> > > > The peripheral driver will pass the custom data in this function and
> > > > DMA engine driver will form the descriptor according to its own
> > > > logic. In future, this API can be used by any other DMA engine
> > > > drivers also which are unable to do DMA mapping with already
> > > > available API’s.
> > > > 
> > > > Signed-off-by: Abhishek Sahu <absahu@codeaurora.org>
> > > > ---
> > > >  include/linux/dmaengine.h | 5 +++++
> > > >  1 file changed, 5 insertions(+)
> > > > 
> > > > diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
> > > > index cc535a4..6324c1f 100644
> > > > --- a/include/linux/dmaengine.h
> > > > +++ b/include/linux/dmaengine.h
> > > > @@ -692,6 +692,8 @@ struct dma_filter {
> > > >   *	be called after period_len bytes have been transferred.
> > > >   * @device_prep_interleaved_dma: Transfer expression in a generic way.
> > > >   * @device_prep_dma_imm_data: DMA's 8 byte immediate data to the dst address
> > > > + * @device_prep_dma_custom_mapping: prepares a dma operation from dma driver
> > > > + *	specific custom data
> > > >   * @device_config: Pushes a new configuration to a channel, return 0 or an error
> > > >   *	code
> > > >   * @device_pause: Pauses any transfer happening on a channel. Returns
> > > > @@ -783,6 +785,9 @@ struct dma_device {
> > > >  	struct dma_async_tx_descriptor *(*device_prep_dma_imm_data)(
> > > >  		struct dma_chan *chan, dma_addr_t dst, u64 data,
> > > >  		unsigned long flags);
> > > > +	struct dma_async_tx_descriptor *(*device_prep_dma_custom_mapping)(
> > > > +		struct dma_chan *chan, void *data,
> > > > +		unsigned long flags);
> > > 
> > > This needs a discussion. Why do you need to add a new API for framework.
> > > 
> > > What are NWD and EOT flags, cna you details out the flags?
> > 
> > These are the notify when done and end of transaction flags.  I believe the last
> > time we talked about this, we (Vinod and I)  agreed to just expose a QCOM only interface to set
> > the special transaction flags.  You'd then have to have some other API to fixup
> > the descriptor with the right qcom flags.
> 
> Okay, do you have pointer on that one, will avoid asking the same questions
> again.

I'll see if I can find the correspondance and send to you directly.

> 
> > Ahbishek, correct me where i am wrong on the following:
> > So two main differences between a normal descriptor and a command descriptor:
> > 1) size of the descriptor
> > 2) the flag setting
> > 3) data sent in is a modified scatter gather that includes flags , vs a normal
> > scatter gather
> > 
> > So the CMD descriptors in a given sgl can all have varying flags set? I'd assume
> > they all have CMD flag set.  Do the current users of the command descriptors
> > coalesce all of their requests into a big list?
> > 
> > So a couple of thoughts on how to deal with this:
> > 
> > 1) Define a virtual channel for the command descriptors vs a normal DMA
> > transaction.  This lets you use the same hardware channel, but lets you discern
> > which descriptor format you need to use.  The only issue I see with this is the
> > required change in device tree binding to target the right type of channel (cmd
> > vs normal).
> 
> Or mark the descriptor is cmd and write accordingly...

The only issue i see with that is knowing how much to pre-allocate during the
prep call.  The flag set API would be called on the allocated tx descriptor.  So
you'd have to know up front and be able to specify it.

> 
> > 
> > 2) Provide an API to set flags for the descriptor on a whole descriptor basis.
> > 
> > 3) If you have a set of transactions described by an sgl that has disparate use
> > of flags, you split the list and use a separate transaction.  In other words, we
> > need to enforce that the flag set API will be applied to all descriptors
> > described by an sgl.  This means that the whole transaction may be comprised of
> > multiple async TX descriptors.

Regards,

Andy

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/5] dmaengine: Add support for custom data mapping
  2016-12-19 17:52         ` Andy Gross
@ 2016-12-20 19:28           ` Abhishek Sahu
  2016-12-20 20:25             ` Andy Gross
  0 siblings, 1 reply; 21+ messages in thread
From: Abhishek Sahu @ 2016-12-20 19:28 UTC (permalink / raw)
  To: Andy Gross
  Cc: Vinod Koul, dan.j.williams, stanimir.varbanov, mcgrof, okaya,
	pramod.gurav, arnd, linux-kernel, dmaengine, linux-arm-msm

On 2016-12-19 23:22, Andy Gross wrote:
> On Mon, Dec 19, 2016 at 09:19:23PM +0530, Vinod Koul wrote:
>> On Sun, Dec 18, 2016 at 11:06:42PM -0600, Andy Gross wrote:
>> > On Sun, Dec 18, 2016 at 09:56:02PM +0530, Vinod Koul wrote:
>> > > On Thu, Dec 15, 2016 at 03:25:52PM +0530, Abhishek Sahu wrote:
>> > > > The current DMA APIs only support SGL or data in generic format.
>> > > > The QCA BAM DMA engine data cannot be mapped with already
>> > > > available APIs due to following reasons.
>> > > >
>> > > > 1. The QCA BAM DMA engine uses custom flags which cannot be
>> > > >    mapped with generic DMA engine flags.
>> > > > 2. Some peripheral driver like QCA QPIC NAND/LCD requires to
>> > > >    set specific flags (Like NWD, EOT) for some of the descriptors
>> > > >    in scatter gather list. The already available mapping APIs take
>> > > >    flags parameter in API itself and there is no support for
>> > > >    passing DMA specific flags for each SGL entry.
>> > > >
>> > > > Now this patch adds the support for making the DMA descriptor from
>> > > > custom data with new DMA mapping function prep_dma_custom_mapping.
>> > > > The peripheral driver will pass the custom data in this function and
>> > > > DMA engine driver will form the descriptor according to its own
>> > > > logic. In future, this API can be used by any other DMA engine
>> > > > drivers also which are unable to do DMA mapping with already
>> > > > available API’s.
>> > > >
>> > > > Signed-off-by: Abhishek Sahu <absahu@codeaurora.org>
>> > > > ---
>> > > >  include/linux/dmaengine.h | 5 +++++
>> > > >  1 file changed, 5 insertions(+)
>> > > >
>> > > This needs a discussion. Why do you need to add a new API for framework.
>> > >
>> > > What are NWD and EOT flags, cna you details out the flags?
>> >

The QCA BAM descriptor has multiple flags. Following is the detailed
explanation for these flags

1. EOT (End of Transfer) – this flag is used to specify end of 
transaction
    at the end of this descriptor.
2. EOB (End of Blcok) – this flag is used to specify end of block at the
    end of this descriptor.
3. NWD (Notify When Done) – when set, NWD provides a handshake between
    peripheral and BAM indicating the transaction is truly done and
    data/command has delivered its destination.

    SW can use the NWD feature in order to make the BAM to separate 
between
    executions of consecutive descriptors. This can be useful for 
features
    like Command Descriptor.
4. CMD (Command) - allows the SW to create descriptors of type Command 
which
    does not generate any data transmissions but configures registers in 
the
    Peripheral (write operations, and read registers operations).

>> > These are the notify when done and end of transaction flags.  I believe the last
>> > time we talked about this, we (Vinod and I)  agreed to just expose a QCOM only interface to set
>> > the special transaction flags.  You'd then have to have some other API to fixup
>> > the descriptor with the right qcom flags.
>> 
>> Okay, do you have pointer on that one, will avoid asking the same 
>> questions
>> again.
> 
> I'll see if I can find the correspondance and send to you directly.
> 
>> 
>> > Ahbishek, correct me where i am wrong on the following:
>> > So two main differences between a normal descriptor and a command descriptor:
>> > 1) size of the descriptor
>> > 2) the flag setting
>> > 3) data sent in is a modified scatter gather that includes flags , vs a normal
>> > scatter gather

Top level descriptor is same for both. Only difference is Command flag. 
The
command descriptor will contain list of register read/write instead of 
data address
The peripheral driver can form the list with helper function provided in 
patch 5
and submit it to BAM. The main issue is for other flag like EOT/NWD.

The top level descriptor is again in the form of list where BAM writes 
the
address of the list in register before starting of transfer. In this 
list,
each element will have different flags.

>> >
>> > So the CMD descriptors in a given sgl can all have varying flags set? I'd assume
>> > they all have CMD flag set.  Do the current users of the command descriptors
>> > coalesce all of their requests into a big list?
>> >

The main user for command descriptor is currently QPIC NAND/LCD. The 
NAND uses
3 BAM channels- tx, rx and command. NAND controller do the data transfer 
in
chunk of codeword (maximum 544 bytes). NAND chip does the data transfer 
on
page basis so each page read/write can have multiple codewords. The NAND
driver prepares command, tx(write) or rx(read) descriptor for complete 
page
, submit it to BAM and wait for its completion. So NAND driver coalesces
all of their requests into a big list. In this big list,

1. Some of the request for command channel requires NWD flag to be set.
2. TX request depends upon the setting of EOT flag so some of the TX 
request
    in complete page write will contain EOT flag and others will not. So 
this
    custom mapping will be used for data descriptor also in NAND driver.

>> > So a couple of thoughts on how to deal with this:
>> >
>> > 1) Define a virtual channel for the command descriptors vs a normal DMA
>> > transaction.  This lets you use the same hardware channel, but lets you discern
>> > which descriptor format you need to use.  The only issue I see with this is the
>> > required change in device tree binding to target the right type of channel (cmd
>> > vs normal).
>> 
>> Or mark the descriptor is cmd and write accordingly...
> 
> The only issue i see with that is knowing how much to pre-allocate 
> during the
> prep call.  The flag set API would be called on the allocated tx 
> descriptor.  So
> you'd have to know up front and be able to specify it.
> 
>> 
>> >
>> > 2) Provide an API to set flags for the descriptor on a whole descriptor basis.
>> >
>> > 3) If you have a set of transactions described by an sgl that has disparate use
>> > of flags, you split the list and use a separate transaction.  In other words, we
>> > need to enforce that the flag set API will be applied to all descriptors
>> > described by an sgl.  This means that the whole transaction may be comprised of
>> > multiple async TX descriptors.

Each async TX descriptor will generate the BAM interrupt so we are 
deviating
from main purpose of DMA where ideally we should get the interrupt at 
the end
of transfer. This is the main reason for going for this patch.

With the submitted patch, only 1 interrupt per channel is required for
complete NAND page and it solves the setting of BAM specific flags also.
Only issue with this patch is adding new API in DMA framework itself. 
But
this API can be used by other DMA engines in future for which mapping 
cannot
be done with available APIs and if this mapping is vendor specific.
> 
> Regards,
> 
> Andy

-- 
Abhishek

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/5] dmaengine: Add support for custom data mapping
  2016-12-20 19:28           ` Abhishek Sahu
@ 2016-12-20 20:25             ` Andy Gross
  2016-12-21 19:34               ` Abhishek Sahu
  0 siblings, 1 reply; 21+ messages in thread
From: Andy Gross @ 2016-12-20 20:25 UTC (permalink / raw)
  To: Abhishek Sahu
  Cc: Vinod Koul, dan.j.williams, stanimir.varbanov, mcgrof, okaya,
	pramod.gurav, arnd, linux-kernel, dmaengine, linux-arm-msm

On Wed, Dec 21, 2016 at 12:58:50AM +0530, Abhishek Sahu wrote:

<snip>

> >>Okay, do you have pointer on that one, will avoid asking the same
> >>questions
> >>again.
> >
> >I'll see if I can find the correspondance and send to you directly.
> >
> >>
> >>> Ahbishek, correct me where i am wrong on the following:
> >>> So two main differences between a normal descriptor and a command descriptor:
> >>> 1) size of the descriptor
> >>> 2) the flag setting
> >>> 3) data sent in is a modified scatter gather that includes flags , vs a normal
> >>> scatter gather
> 
> Top level descriptor is same for both. Only difference is Command flag. The
> command descriptor will contain list of register read/write instead of data
> address
> The peripheral driver can form the list with helper function provided in
> patch 5
> and submit it to BAM. The main issue is for other flag like EOT/NWD.
> 
> The top level descriptor is again in the form of list where BAM writes the
> address of the list in register before starting of transfer. In this list,
> each element will have different flags.

Ah that's right.  The command descriptor information is the format of the data
pointed to by the sgl.  So you'd have some set of register read/writes described
in those entries.

> 
> >>>
> >>> So the CMD descriptors in a given sgl can all have varying flags set? I'd assume
> >>> they all have CMD flag set.  Do the current users of the command descriptors
> >>> coalesce all of their requests into a big list?
> >>>
> 
> The main user for command descriptor is currently QPIC NAND/LCD. The NAND
> uses
> 3 BAM channels- tx, rx and command. NAND controller do the data transfer in
> chunk of codeword (maximum 544 bytes). NAND chip does the data transfer on
> page basis so each page read/write can have multiple codewords. The NAND
> driver prepares command, tx(write) or rx(read) descriptor for complete page
> , submit it to BAM and wait for its completion. So NAND driver coalesces
> all of their requests into a big list. In this big list,
> 
> 1. Some of the request for command channel requires NWD flag to be set.

I'd expect this to occur at the end of a chain.  So if you had 5 CMD descriptors
described in the SGL, only the last descriptor would have the NWD set.  Correct?

> 2. TX request depends upon the setting of EOT flag so some of the TX request
>    in complete page write will contain EOT flag and others will not. So this
>    custom mapping will be used for data descriptor also in NAND driver.

Can you give a sequence description of the descriptors and flags?  I haven't
seen the NAND documentation that describes the sequence/flow.

> >>> So a couple of thoughts on how to deal with this:
> >>>
> >>> 1) Define a virtual channel for the command descriptors vs a normal DMA
> >>> transaction.  This lets you use the same hardware channel, but lets you discern
> >>> which descriptor format you need to use.  The only issue I see with this is the
> >>> required change in device tree binding to target the right type of channel (cmd
> >>> vs normal).
> >>
> >>Or mark the descriptor is cmd and write accordingly...
> >
> >The only issue i see with that is knowing how much to pre-allocate during
> >the
> >prep call.  The flag set API would be called on the allocated tx
> >descriptor.  So
> >you'd have to know up front and be able to specify it.
> >
> >>
> >>>
> >>> 2) Provide an API to set flags for the descriptor on a whole descriptor basis.
> >>>
> >>> 3) If you have a set of transactions described by an sgl that has disparate use
> >>> of flags, you split the list and use a separate transaction.  In other words, we
> >>> need to enforce that the flag set API will be applied to all descriptors
> >>> described by an sgl.  This means that the whole transaction may be comprised of
> >>> multiple async TX descriptors.
> 
> Each async TX descriptor will generate the BAM interrupt so we are deviating
> from main purpose of DMA where ideally we should get the interrupt at the
> end
> of transfer. This is the main reason for going for this patch.

If the client queues 1 descriptor or 5 descriptors, it doesn't matter that much.
The client knows when it is done by waiting for the descriptors to complete.  It
is less efficient than grouping them all, but it should still work.

> 
> With the submitted patch, only 1 interrupt per channel is required for
> complete NAND page and it solves the setting of BAM specific flags also.
> Only issue with this patch is adding new API in DMA framework itself. But
> this API can be used by other DMA engines in future for which mapping cannot
> be done with available APIs and if this mapping is vendor specific.

I guess the key point in all of this is that the DMA operation being done is not
a normal data flow to/from the device.  It's direct remote register access to
the device using address information contained in the sgl.  And you are
collating the standard data access along with the special command access in the
same API call.

Regards,

Andy

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/5] dmaengine: Add support for custom data mapping
  2016-12-20 20:25             ` Andy Gross
@ 2016-12-21 19:34               ` Abhishek Sahu
  2016-12-29 17:54                 ` Andy Gross
  0 siblings, 1 reply; 21+ messages in thread
From: Abhishek Sahu @ 2016-12-21 19:34 UTC (permalink / raw)
  To: Andy Gross
  Cc: Vinod Koul, dan.j.williams, stanimir.varbanov, mcgrof, okaya,
	pramod.gurav, arnd, linux-kernel, dmaengine, linux-arm-msm

On 2016-12-21 01:55, Andy Gross wrote:
> On Wed, Dec 21, 2016 at 12:58:50AM +0530, Abhishek Sahu wrote:
> 
> <snip>
> 
>> >>Okay, do you have pointer on that one, will avoid asking the same
>> >>questions
>> >>again.
>> >
>> >I'll see if I can find the correspondance and send to you directly.
>> >
>> >>
>> >>> Ahbishek, correct me where i am wrong on the following:
>> >>> So two main differences between a normal descriptor and a command descriptor:
>> >>> 1) size of the descriptor
>> >>> 2) the flag setting
>> >>> 3) data sent in is a modified scatter gather that includes flags , vs a normal
>> >>> scatter gather
>> 
>> Top level descriptor is same for both. Only difference is Command 
>> flag. The
>> command descriptor will contain list of register read/write instead of 
>> data
>> address
>> The peripheral driver can form the list with helper function provided 
>> in
>> patch 5
>> and submit it to BAM. The main issue is for other flag like EOT/NWD.
>> 
>> The top level descriptor is again in the form of list where BAM writes 
>> the
>> address of the list in register before starting of transfer. In this 
>> list,
>> each element will have different flags.
> 
> Ah that's right.  The command descriptor information is the format of 
> the data
> pointed to by the sgl.  So you'd have some set of register read/writes 
> described
> in those entries.
> 
>> 
>> >>>
>> >>> So the CMD descriptors in a given sgl can all have varying flags set? I'd assume
>> >>> they all have CMD flag set.  Do the current users of the command descriptors
>> >>> coalesce all of their requests into a big list?
>> >>>
>> 
>> The main user for command descriptor is currently QPIC NAND/LCD. The 
>> NAND
>> uses
>> 3 BAM channels- tx, rx and command. NAND controller do the data 
>> transfer in
>> chunk of codeword (maximum 544 bytes). NAND chip does the data 
>> transfer on
>> page basis so each page read/write can have multiple codewords. The 
>> NAND
>> driver prepares command, tx(write) or rx(read) descriptor for complete 
>> page
>> , submit it to BAM and wait for its completion. So NAND driver 
>> coalesces
>> all of their requests into a big list. In this big list,
>> 
>> 1. Some of the request for command channel requires NWD flag to be 
>> set.
> 
> I'd expect this to occur at the end of a chain.  So if you had 5 CMD 
> descriptors
> described in the SGL, only the last descriptor would have the NWD set.  
> Correct?
> 
>> 2. TX request depends upon the setting of EOT flag so some of the TX 
>> request
>>    in complete page write will contain EOT flag and others will not. 
>> So this
>>    custom mapping will be used for data descriptor also in NAND 
>> driver.
> 
> Can you give a sequence description of the descriptors and flags?  I 
> haven't
> seen the NAND documentation that describes the sequence/flow.

Following is the sample list of command descriptor for page write(2K 
page).
The actual list will contain more no of descriptor which involves
spare area transfer also.

Index	INT	NWD	CMD	24bit Register Address
0	-	-	1	0x0000F0 (EBI2_ECC_BUF_CFG)

1	-	-	1	0x000020 (NAND_DEVn_CFG0)
				0x000024 (NAND_DEVn_CFG1)
				0x0000F0 (EBI2_ECC_BUF_CFG)
				0x00000C (NAND_FLASH_CHIP_SELECT)

2	-	-	1	0x000004 (NAND_ADDR0)
				0x000008 (NAND_ADDR1)

3	-	1	1	0x000000 (NAND_FLASH_CMD)

4	-	1	1	0x000010 (NANDC_EXEC_CMD)

5	-	-	1	0x000014 (NAND_FLASH_STATUS)

6	-	1	1	0x000000 (NAND_FLASH_CMD)

7	-	1	1	0x000010 (NANDC_EXEC_CMD)

8	-	-	1	0x000014 (NAND_FLASH_STATUS)

9	-	1	1	0x000000 (NAND_FLASH_CMD)

10	-	1	1	0x000010 (NANDC_EXEC_CMD)

11	-	-	1	0x000014 (NAND_FLASH_STATUS)

12	-	1	1	0x000000 (NAND_FLASH_CMD)

13	-	1	1	0x000010 (NANDC_EXEC_CMD)

14	1	-	1	0x000014 (NAND_FLASH_STATUS)

15	-	-	1	0x000044 (NAND_FLASH_READ_STATUS)
				0x000014 (NAND_FLASH_STATUS)
> 
>> >>> So a couple of thoughts on how to deal with this:
>> >>>
>> >>> 1) Define a virtual channel for the command descriptors vs a normal DMA
>> >>> transaction.  This lets you use the same hardware channel, but lets you discern
>> >>> which descriptor format you need to use.  The only issue I see with this is the
>> >>> required change in device tree binding to target the right type of channel (cmd
>> >>> vs normal).
>> >>
>> >>Or mark the descriptor is cmd and write accordingly...
>> >
>> >The only issue i see with that is knowing how much to pre-allocate during
>> >the
>> >prep call.  The flag set API would be called on the allocated tx
>> >descriptor.  So
>> >you'd have to know up front and be able to specify it.
>> >
>> >>
>> >>>
>> >>> 2) Provide an API to set flags for the descriptor on a whole descriptor basis.
>> >>>
>> >>> 3) If you have a set of transactions described by an sgl that has disparate use
>> >>> of flags, you split the list and use a separate transaction.  In other words, we
>> >>> need to enforce that the flag set API will be applied to all descriptors
>> >>> described by an sgl.  This means that the whole transaction may be comprised of
>> >>> multiple async TX descriptors.
>> 
>> Each async TX descriptor will generate the BAM interrupt so we are 
>> deviating
>> from main purpose of DMA where ideally we should get the interrupt at 
>> the
>> end
>> of transfer. This is the main reason for going for this patch.
> 
> If the client queues 1 descriptor or 5 descriptors, it doesn't matter 
> that much.
> The client knows when it is done by waiting for the descriptors to 
> complete.  It
> is less efficient than grouping them all, but it should still work.
> 
Yes. client will wait for final descriptor completion. But these 
interrupts
will be overhead for CPU. For 1-2 page it won't matter much I guess it 
will be
significant for complete chip read/write(during boot and FS i.e JFFS 
operations).
>> 
>> With the submitted patch, only 1 interrupt per channel is required for
>> complete NAND page and it solves the setting of BAM specific flags 
>> also.
>> Only issue with this patch is adding new API in DMA framework itself. 
>> But
>> this API can be used by other DMA engines in future for which mapping 
>> cannot
>> be done with available APIs and if this mapping is vendor specific.
> 
> I guess the key point in all of this is that the DMA operation being 
> done is not
> a normal data flow to/from the device.  It's direct remote register 
> access to
> the device using address information contained in the sgl.  And you are
> collating the standard data access along with the special command 
> access in the
> same API call.
Yes. Normally DMA engine (QCA ADM DMA engine also) allows to read/write 
memory mapped
registers just like data. But BAM is different (Since it is not a global 
DMA Engine
and coupled with peripheral). Also, this different flag requirement is 
not just
for command descriptors but for data descriptors also.

BAM data access and command access differs only with flag and register 
read/write
list. The register read and write list will be simply array of
struct bam_cmd_element added in patch
struct bam_cmd_element {
         __le32 addr:24;
         __le32 command:8;
         __le32 data;
         __le32 mask;
         __le32 reserved;
};

The address and size of the array will be passed in data and size field 
of SGL.
If we want to form the SGL for mentioned list then we will have SGL of 
size 15
with just one descriptor.

Now we require different flag for each SG entry. currently SG does not 
have
flag parameter and we can't add flag parameter just for our requirement 
in
generic SG. So we have added the custom mapping function and passed 
modified SG
as parameter which is generic SG and flag.

-- 
Abhishek Sahu

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/5] dmaengine: Add support for custom data mapping
  2016-12-21 19:34               ` Abhishek Sahu
@ 2016-12-29 17:54                 ` Andy Gross
  2017-01-02 14:25                   ` Abhishek Sahu
  0 siblings, 1 reply; 21+ messages in thread
From: Andy Gross @ 2016-12-29 17:54 UTC (permalink / raw)
  To: Abhishek Sahu
  Cc: Vinod Koul, dan.j.williams, stanimir.varbanov, mcgrof, okaya,
	pramod.gurav, arnd, linux-kernel, dmaengine, linux-arm-msm

On Thu, Dec 22, 2016 at 01:04:37AM +0530, Abhishek Sahu wrote:
> On 2016-12-21 01:55, Andy Gross wrote:
> >On Wed, Dec 21, 2016 at 12:58:50AM +0530, Abhishek Sahu wrote:
> >
> ><snip>
> >
> >>>>Okay, do you have pointer on that one, will avoid asking the same
> >>>>questions
> >>>>again.
> >>>
> >>>I'll see if I can find the correspondance and send to you directly.
> >>>
> >>>>
> >>>>> Ahbishek, correct me where i am wrong on the following:
> >>>>> So two main differences between a normal descriptor and a command descriptor:
> >>>>> 1) size of the descriptor
> >>>>> 2) the flag setting
> >>>>> 3) data sent in is a modified scatter gather that includes flags , vs a normal
> >>>>> scatter gather
> >>
> >>Top level descriptor is same for both. Only difference is Command flag.
> >>The
> >>command descriptor will contain list of register read/write instead of
> >>data
> >>address
> >>The peripheral driver can form the list with helper function provided in
> >>patch 5
> >>and submit it to BAM. The main issue is for other flag like EOT/NWD.
> >>
> >>The top level descriptor is again in the form of list where BAM writes
> >>the
> >>address of the list in register before starting of transfer. In this
> >>list,
> >>each element will have different flags.
> >
> >Ah that's right.  The command descriptor information is the format of the
> >data
> >pointed to by the sgl.  So you'd have some set of register read/writes
> >described
> >in those entries.
> >
> >>
> >>>>>
> >>>>> So the CMD descriptors in a given sgl can all have varying flags set? I'd assume
> >>>>> they all have CMD flag set.  Do the current users of the command descriptors
> >>>>> coalesce all of their requests into a big list?
> >>>>>
> >>
> >>The main user for command descriptor is currently QPIC NAND/LCD. The
> >>NAND
> >>uses
> >>3 BAM channels- tx, rx and command. NAND controller do the data transfer
> >>in
> >>chunk of codeword (maximum 544 bytes). NAND chip does the data transfer
> >>on
> >>page basis so each page read/write can have multiple codewords. The NAND
> >>driver prepares command, tx(write) or rx(read) descriptor for complete
> >>page
> >>, submit it to BAM and wait for its completion. So NAND driver coalesces
> >>all of their requests into a big list. In this big list,
> >>
> >>1. Some of the request for command channel requires NWD flag to be set.
> >
> >I'd expect this to occur at the end of a chain.  So if you had 5 CMD
> >descriptors
> >described in the SGL, only the last descriptor would have the NWD set.
> >Correct?
> >
> >>2. TX request depends upon the setting of EOT flag so some of the TX
> >>request
> >>   in complete page write will contain EOT flag and others will not. So
> >>this
> >>   custom mapping will be used for data descriptor also in NAND driver.
> >
> >Can you give a sequence description of the descriptors and flags?  I
> >haven't
> >seen the NAND documentation that describes the sequence/flow.
> 
> Following is the sample list of command descriptor for page write(2K page).
> The actual list will contain more no of descriptor which involves
> spare area transfer also.
> 
> Index	INT	NWD	CMD	24bit Register Address
> 0	-	-	1	0x0000F0 (EBI2_ECC_BUF_CFG)
> 
> 1	-	-	1	0x000020 (NAND_DEVn_CFG0)
> 				0x000024 (NAND_DEVn_CFG1)
> 				0x0000F0 (EBI2_ECC_BUF_CFG)
> 				0x00000C (NAND_FLASH_CHIP_SELECT)
> 
> 2	-	-	1	0x000004 (NAND_ADDR0)
> 				0x000008 (NAND_ADDR1)
> 
> 3	-	1	1	0x000000 (NAND_FLASH_CMD)
> 
> 4	-	1	1	0x000010 (NANDC_EXEC_CMD)
> 
> 5	-	-	1	0x000014 (NAND_FLASH_STATUS)
> 
> 6	-	1	1	0x000000 (NAND_FLASH_CMD)
> 
> 7	-	1	1	0x000010 (NANDC_EXEC_CMD)
> 
> 8	-	-	1	0x000014 (NAND_FLASH_STATUS)
> 
> 9	-	1	1	0x000000 (NAND_FLASH_CMD)
> 
> 10	-	1	1	0x000010 (NANDC_EXEC_CMD)
> 
> 11	-	-	1	0x000014 (NAND_FLASH_STATUS)
> 
> 12	-	1	1	0x000000 (NAND_FLASH_CMD)
> 
> 13	-	1	1	0x000010 (NANDC_EXEC_CMD)
> 
> 14	1	-	1	0x000014 (NAND_FLASH_STATUS)
> 
> 15	-	-	1	0x000044 (NAND_FLASH_READ_STATUS)
> 				0x000014 (NAND_FLASH_STATUS)

Yeah I was expecting something like:
- Setup NAND controller using some command writes (indices 0-4)
  Loop doing the following until all the data is done:
  - Send/Receive the Data
  - Check status.

The only one that sticks out to me is index 14.  Is the INT flag there to mark
the actual end of the data transfer from the device?  Then you do one more
Status read.

> >
> >>>>> So a couple of thoughts on how to deal with this:
> >>>>>
> >>>>> 1) Define a virtual channel for the command descriptors vs a normal DMA
> >>>>> transaction.  This lets you use the same hardware channel, but lets you discern
> >>>>> which descriptor format you need to use.  The only issue I see with this is the
> >>>>> required change in device tree binding to target the right type of channel (cmd
> >>>>> vs normal).
> >>>>
> >>>>Or mark the descriptor is cmd and write accordingly...
> >>>
> >>>The only issue i see with that is knowing how much to pre-allocate during
> >>>the
> >>>prep call.  The flag set API would be called on the allocated tx
> >>>descriptor.  So
> >>>you'd have to know up front and be able to specify it.
> >>>
> >>>>
> >>>>>
> >>>>> 2) Provide an API to set flags for the descriptor on a whole descriptor basis.
> >>>>>
> >>>>> 3) If you have a set of transactions described by an sgl that has disparate use
> >>>>> of flags, you split the list and use a separate transaction.  In other words, we
> >>>>> need to enforce that the flag set API will be applied to all descriptors
> >>>>> described by an sgl.  This means that the whole transaction may be comprised of
> >>>>> multiple async TX descriptors.
> >>
> >>Each async TX descriptor will generate the BAM interrupt so we are
> >>deviating
> >>from main purpose of DMA where ideally we should get the interrupt at
> >>the
> >>end
> >>of transfer. This is the main reason for going for this patch.
> >
> >If the client queues 1 descriptor or 5 descriptors, it doesn't matter that
> >much.
> >The client knows when it is done by waiting for the descriptors to
> >complete.  It
> >is less efficient than grouping them all, but it should still work.
> >
> Yes. client will wait for final descriptor completion. But these interrupts
> will be overhead for CPU. For 1-2 page it won't matter much I guess it will
> be
> significant for complete chip read/write(during boot and FS i.e JFFS
> operations).
> >>
> >>With the submitted patch, only 1 interrupt per channel is required for
> >>complete NAND page and it solves the setting of BAM specific flags also.
> >>Only issue with this patch is adding new API in DMA framework itself.
> >>But
> >>this API can be used by other DMA engines in future for which mapping
> >>cannot
> >>be done with available APIs and if this mapping is vendor specific.
> >
> >I guess the key point in all of this is that the DMA operation being done
> >is not
> >a normal data flow to/from the device.  It's direct remote register access
> >to
> >the device using address information contained in the sgl.  And you are
> >collating the standard data access along with the special command access
> >in the
> >same API call.
> Yes. Normally DMA engine (QCA ADM DMA engine also) allows to read/write
> memory mapped
> registers just like data. But BAM is different (Since it is not a global DMA
> Engine
> and coupled with peripheral). Also, this different flag requirement is not
> just
> for command descriptors but for data descriptors also.
> 
> BAM data access and command access differs only with flag and register
> read/write
> list. The register read and write list will be simply array of
> struct bam_cmd_element added in patch
> struct bam_cmd_element {
>         __le32 addr:24;
>         __le32 command:8;
>         __le32 data;
>         __le32 mask;
>         __le32 reserved;
> };
> 
> The address and size of the array will be passed in data and size field of
> SGL.
> If we want to form the SGL for mentioned list then we will have SGL of size
> 15
> with just one descriptor.
> 
> Now we require different flag for each SG entry. currently SG does not have
> flag parameter and we can't add flag parameter just for our requirement in
> generic SG. So we have added the custom mapping function and passed modified
> SG
> as parameter which is generic SG and flag.

I really think that we need some additional API that allows for the flag munging
for the descriptors instead of overriding the prep_slave_sg.  We already needed
to change the way the flags are passed anyway.  And instead of building up a
special sg list, the API should take a structure that has a 1:1 mapping of the
flags to the descriptors.  And you would call this API on your descriptor before
issuing it.

So build up the sglist.  Call the prep_slave_sg.  You get back a tx descriptor
that underneath is a bam descriptor.  Then call the API giving the descriptor
and the structure that defines the flags for the descriptors.  Then submit the
descriptor.

Something like:
int qcom_bam_apply_descriptor_flags(struct dma_async_tx_descriptor *tx,
				    u16 *flags)
{
	struct bam_async_desc async_desc = container_of(tx,
							struct bam_async_desc,
							vd.tx);
	int i;

	for (i = 0; i < async_desc->num_desc; i++)
		async_desc->desc[i].flags = cpu_to_le16(flags[i]);
}

EXPORT_SYMBOL(qcom_bam_apply_descriptor_flags)

This applies the flags directly to the underlying hardware descriptors.  The
prep_slave_sg call would need to remove all the flag munging.  The bam_start_dma
would need to account for this as well by only setting the INT flag if the
transfer cannot get all of the descriptors in the FIFO.

Regards,

Andy

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/5] dmaengine: Add support for custom data mapping
  2016-12-29 17:54                 ` Andy Gross
@ 2017-01-02 14:25                   ` Abhishek Sahu
  2017-01-02 16:12                     ` Andy Gross
  0 siblings, 1 reply; 21+ messages in thread
From: Abhishek Sahu @ 2017-01-02 14:25 UTC (permalink / raw)
  To: Andy Gross
  Cc: Vinod Koul, dan.j.williams, stanimir.varbanov, mcgrof, okaya,
	pramod.gurav, arnd, linux-kernel, dmaengine, linux-arm-msm

On 2016-12-29 23:24, Andy Gross wrote:
> On Thu, Dec 22, 2016 at 01:04:37AM +0530, Abhishek Sahu wrote:
>> On 2016-12-21 01:55, Andy Gross wrote:
>> >On Wed, Dec 21, 2016 at 12:58:50AM +0530, Abhishek Sahu wrote:
>> >
>> ><snip>
>> >
>> >>>>Okay, do you have pointer on that one, will avoid asking the same
>> >>>>questions
>> >>>>again.
>> >>>
>> >>>I'll see if I can find the correspondance and send to you directly.
>> >>>
>> >>>>
>> >>>>> Ahbishek, correct me where i am wrong on the following:
>> >>>>> So two main differences between a normal descriptor and a command descriptor:
>> >>>>> 1) size of the descriptor
>> >>>>> 2) the flag setting
>> >>>>> 3) data sent in is a modified scatter gather that includes flags , vs a normal
>> >>>>> scatter gather
>> >>
>> >>Top level descriptor is same for both. Only difference is Command flag.
>> >>The
>> >>command descriptor will contain list of register read/write instead of
>> >>data
>> >>address
>> >>The peripheral driver can form the list with helper function provided in
>> >>patch 5
>> >>and submit it to BAM. The main issue is for other flag like EOT/NWD.
>> >>
>> >>The top level descriptor is again in the form of list where BAM writes
>> >>the
>> >>address of the list in register before starting of transfer. In this
>> >>list,
>> >>each element will have different flags.
>> >
>> >Ah that's right.  The command descriptor information is the format of the
>> >data
>> >pointed to by the sgl.  So you'd have some set of register read/writes
>> >described
>> >in those entries.
>> >
>> >>
>> >>>>>
>> >>>>> So the CMD descriptors in a given sgl can all have varying flags set? I'd assume
>> >>>>> they all have CMD flag set.  Do the current users of the command descriptors
>> >>>>> coalesce all of their requests into a big list?
>> >>>>>
>> >>
>> >>The main user for command descriptor is currently QPIC NAND/LCD. The
>> >>NAND
>> >>uses
>> >>3 BAM channels- tx, rx and command. NAND controller do the data transfer
>> >>in
>> >>chunk of codeword (maximum 544 bytes). NAND chip does the data transfer
>> >>on
>> >>page basis so each page read/write can have multiple codewords. The NAND
>> >>driver prepares command, tx(write) or rx(read) descriptor for complete
>> >>page
>> >>, submit it to BAM and wait for its completion. So NAND driver coalesces
>> >>all of their requests into a big list. In this big list,
>> >>
>> >>1. Some of the request for command channel requires NWD flag to be set.
>> >
>> >I'd expect this to occur at the end of a chain.  So if you had 5 CMD
>> >descriptors
>> >described in the SGL, only the last descriptor would have the NWD set.
>> >Correct?
>> >
>> >>2. TX request depends upon the setting of EOT flag so some of the TX
>> >>request
>> >>   in complete page write will contain EOT flag and others will not. So
>> >>this
>> >>   custom mapping will be used for data descriptor also in NAND driver.
>> >
>> >Can you give a sequence description of the descriptors and flags?  I
>> >haven't
>> >seen the NAND documentation that describes the sequence/flow.
>> 
>> Following is the sample list of command descriptor for page write(2K 
>> page).
>> The actual list will contain more no of descriptor which involves
>> spare area transfer also.
>> 
>> Index	INT	NWD	CMD	24bit Register Address
>> 0	-	-	1	0x0000F0 (EBI2_ECC_BUF_CFG)
>> 
>> 1	-	-	1	0x000020 (NAND_DEVn_CFG0)
>> 				0x000024 (NAND_DEVn_CFG1)
>> 				0x0000F0 (EBI2_ECC_BUF_CFG)
>> 				0x00000C (NAND_FLASH_CHIP_SELECT)
>> 
>> 2	-	-	1	0x000004 (NAND_ADDR0)
>> 				0x000008 (NAND_ADDR1)
>> 
>> 3	-	1	1	0x000000 (NAND_FLASH_CMD)
>> 
>> 4	-	1	1	0x000010 (NANDC_EXEC_CMD)
>> 
>> 5	-	-	1	0x000014 (NAND_FLASH_STATUS)
>> 
>> 6	-	1	1	0x000000 (NAND_FLASH_CMD)
>> 
>> 7	-	1	1	0x000010 (NANDC_EXEC_CMD)
>> 
>> 8	-	-	1	0x000014 (NAND_FLASH_STATUS)
>> 
>> 9	-	1	1	0x000000 (NAND_FLASH_CMD)
>> 
>> 10	-	1	1	0x000010 (NANDC_EXEC_CMD)
>> 
>> 11	-	-	1	0x000014 (NAND_FLASH_STATUS)
>> 
>> 12	-	1	1	0x000000 (NAND_FLASH_CMD)
>> 
>> 13	-	1	1	0x000010 (NANDC_EXEC_CMD)
>> 
>> 14	1	-	1	0x000014 (NAND_FLASH_STATUS)
>> 
>> 15	-	-	1	0x000044 (NAND_FLASH_READ_STATUS)
>> 				0x000014 (NAND_FLASH_STATUS)
> 
> Yeah I was expecting something like:
> - Setup NAND controller using some command writes (indices 0-4)
>   Loop doing the following until all the data is done:
>   - Send/Receive the Data
>   - Check status.
> 
> The only one that sticks out to me is index 14.  Is the INT flag there 
> to mark
> the actual end of the data transfer from the device?  Then you do one 
> more
> Status read.
> 
This is sample list given in NAND document. INT will be set only for the 
last
command. I checked the NAND driver in which status will be read only 
once for
each codeword.
>> >
>> >>>>> So a couple of thoughts on how to deal with this:
>> >>>>>
>> >>>>> 1) Define a virtual channel for the command descriptors vs a normal DMA
>> >>>>> transaction.  This lets you use the same hardware channel, but lets you discern
>> >>>>> which descriptor format you need to use.  The only issue I see with this is the
>> >>>>> required change in device tree binding to target the right type of channel (cmd
>> >>>>> vs normal).
>> >>>>
>> >>>>Or mark the descriptor is cmd and write accordingly...
>> >>>
>> >>>The only issue i see with that is knowing how much to pre-allocate during
>> >>>the
>> >>>prep call.  The flag set API would be called on the allocated tx
>> >>>descriptor.  So
>> >>>you'd have to know up front and be able to specify it.
>> >>>
>> >>>>
>> >>>>>
>> >>>>> 2) Provide an API to set flags for the descriptor on a whole descriptor basis.
>> >>>>>
>> >>>>> 3) If you have a set of transactions described by an sgl that has disparate use
>> >>>>> of flags, you split the list and use a separate transaction.  In other words, we
>> >>>>> need to enforce that the flag set API will be applied to all descriptors
>> >>>>> described by an sgl.  This means that the whole transaction may be comprised of
>> >>>>> multiple async TX descriptors.
>> >>
>> >>Each async TX descriptor will generate the BAM interrupt so we are
>> >>deviating
>> >>from main purpose of DMA where ideally we should get the interrupt at
>> >>the
>> >>end
>> >>of transfer. This is the main reason for going for this patch.
>> >
>> >If the client queues 1 descriptor or 5 descriptors, it doesn't matter that
>> >much.
>> >The client knows when it is done by waiting for the descriptors to
>> >complete.  It
>> >is less efficient than grouping them all, but it should still work.
>> >
>> Yes. client will wait for final descriptor completion. But these 
>> interrupts
>> will be overhead for CPU. For 1-2 page it won't matter much I guess it 
>> will
>> be
>> significant for complete chip read/write(during boot and FS i.e JFFS
>> operations).
>> >>
>> >>With the submitted patch, only 1 interrupt per channel is required for
>> >>complete NAND page and it solves the setting of BAM specific flags also.
>> >>Only issue with this patch is adding new API in DMA framework itself.
>> >>But
>> >>this API can be used by other DMA engines in future for which mapping
>> >>cannot
>> >>be done with available APIs and if this mapping is vendor specific.
>> >
>> >I guess the key point in all of this is that the DMA operation being done
>> >is not
>> >a normal data flow to/from the device.  It's direct remote register access
>> >to
>> >the device using address information contained in the sgl.  And you are
>> >collating the standard data access along with the special command access
>> >in the
>> >same API call.
>> Yes. Normally DMA engine (QCA ADM DMA engine also) allows to 
>> read/write
>> memory mapped
>> registers just like data. But BAM is different (Since it is not a 
>> global DMA
>> Engine
>> and coupled with peripheral). Also, this different flag requirement is 
>> not
>> just
>> for command descriptors but for data descriptors also.
>> 
>> BAM data access and command access differs only with flag and register
>> read/write
>> list. The register read and write list will be simply array of
>> struct bam_cmd_element added in patch
>> struct bam_cmd_element {
>>         __le32 addr:24;
>>         __le32 command:8;
>>         __le32 data;
>>         __le32 mask;
>>         __le32 reserved;
>> };
>> 
>> The address and size of the array will be passed in data and size 
>> field of
>> SGL.
>> If we want to form the SGL for mentioned list then we will have SGL of 
>> size
>> 15
>> with just one descriptor.
>> 
>> Now we require different flag for each SG entry. currently SG does not 
>> have
>> flag parameter and we can't add flag parameter just for our 
>> requirement in
>> generic SG. So we have added the custom mapping function and passed 
>> modified
>> SG
>> as parameter which is generic SG and flag.
> 
> I really think that we need some additional API that allows for the 
> flag munging
> for the descriptors instead of overriding the prep_slave_sg.  We 
> already needed
> to change the way the flags are passed anyway.  And instead of building 
> up a
> special sg list, the API should take a structure that has a 1:1 mapping 
> of the
> flags to the descriptors.  And you would call this API on your 
> descriptor before
> issuing it.
> 
> So build up the sglist.  Call the prep_slave_sg.  You get back a tx 
> descriptor
> that underneath is a bam descriptor.  Then call the API giving the 
> descriptor
> and the structure that defines the flags for the descriptors.  Then 
> submit the
> descriptor.
> 
> Something like:
> int qcom_bam_apply_descriptor_flags(struct dma_async_tx_descriptor *tx,
> 				    u16 *flags)
> {
> 	struct bam_async_desc async_desc = container_of(tx,
> 							struct bam_async_desc,
> 							vd.tx);
> 	int i;
> 
> 	for (i = 0; i < async_desc->num_desc; i++)
> 		async_desc->desc[i].flags = cpu_to_le16(flags[i]);
> }
> 
> EXPORT_SYMBOL(qcom_bam_apply_descriptor_flags)
> 

We want to tightly couple the SG and its flag. But if this 1:1 mapping 
is acceptable
in linux kernel then we can go ahead with this. It will solve our 
requirement and
does not require any change in Linux DMA API. I will do the same and 
will submit the
new patches.

> This applies the flags directly to the underlying hardware descriptors. 
>  The
> prep_slave_sg call would need to remove all the flag munging.  The 
> bam_start_dma
> would need to account for this as well by only setting the INT flag if 
> the
> transfer cannot get all of the descriptors in the FIFO.

It seems no major change is required in prep_slave_sg or bam_start_dma 
since
it is just setting INT flag for last entry which is required for QPIC 
drivers
also. We need to change the assignment of flag with bitwise OR 
assignment for
last BAM desc in function bam_start_dma

         /* set any special flags on the last descriptor */
         if (async_desc->num_desc == async_desc->xfer_len)
                 desc[async_desc->xfer_len - 1].flags |=
                                         cpu_to_le16(async_desc->flags);

> 
> Regards,
> 
> Andy

-- 
Abhishek Sahu

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/5] dmaengine: Add support for custom data mapping
  2017-01-02 14:25                   ` Abhishek Sahu
@ 2017-01-02 16:12                     ` Andy Gross
  2017-01-19  5:01                       ` Vinod Koul
  0 siblings, 1 reply; 21+ messages in thread
From: Andy Gross @ 2017-01-02 16:12 UTC (permalink / raw)
  To: Abhishek Sahu
  Cc: Vinod Koul, dan.j.williams, stanimir.varbanov, mcgrof, okaya,
	pramod.gurav, arnd, linux-kernel, dmaengine, linux-arm-msm

On Mon, Jan 02, 2017 at 07:55:37PM +0530, Abhishek Sahu wrote:
> On 2016-12-29 23:24, Andy Gross wrote:
> >On Thu, Dec 22, 2016 at 01:04:37AM +0530, Abhishek Sahu wrote:
> >>On 2016-12-21 01:55, Andy Gross wrote:
> >>>On Wed, Dec 21, 2016 at 12:58:50AM +0530, Abhishek Sahu wrote:
> >>>
> >>><snip>
> >>>
> >>>>>>Okay, do you have pointer on that one, will avoid asking the same
> >>>>>>questions
> >>>>>>again.
> >>>>>
> >>>>>I'll see if I can find the correspondance and send to you directly.
> >>>>>
> >>>>>>
> >>>>>>> Ahbishek, correct me where i am wrong on the following:
> >>>>>>> So two main differences between a normal descriptor and a command descriptor:
> >>>>>>> 1) size of the descriptor
> >>>>>>> 2) the flag setting
> >>>>>>> 3) data sent in is a modified scatter gather that includes flags , vs a normal
> >>>>>>> scatter gather
> >>>>
> >>>>Top level descriptor is same for both. Only difference is Command flag.
> >>>>The
> >>>>command descriptor will contain list of register read/write instead of
> >>>>data
> >>>>address
> >>>>The peripheral driver can form the list with helper function provided in
> >>>>patch 5
> >>>>and submit it to BAM. The main issue is for other flag like EOT/NWD.
> >>>>
> >>>>The top level descriptor is again in the form of list where BAM writes
> >>>>the
> >>>>address of the list in register before starting of transfer. In this
> >>>>list,
> >>>>each element will have different flags.
> >>>
> >>>Ah that's right.  The command descriptor information is the format of the
> >>>data
> >>>pointed to by the sgl.  So you'd have some set of register read/writes
> >>>described
> >>>in those entries.
> >>>
> >>>>
> >>>>>>>
> >>>>>>> So the CMD descriptors in a given sgl can all have varying flags set? I'd assume
> >>>>>>> they all have CMD flag set.  Do the current users of the command descriptors
> >>>>>>> coalesce all of their requests into a big list?
> >>>>>>>
> >>>>
> >>>>The main user for command descriptor is currently QPIC NAND/LCD. The
> >>>>NAND
> >>>>uses
> >>>>3 BAM channels- tx, rx and command. NAND controller do the data transfer
> >>>>in
> >>>>chunk of codeword (maximum 544 bytes). NAND chip does the data transfer
> >>>>on
> >>>>page basis so each page read/write can have multiple codewords. The NAND
> >>>>driver prepares command, tx(write) or rx(read) descriptor for complete
> >>>>page
> >>>>, submit it to BAM and wait for its completion. So NAND driver coalesces
> >>>>all of their requests into a big list. In this big list,
> >>>>
> >>>>1. Some of the request for command channel requires NWD flag to be set.
> >>>
> >>>I'd expect this to occur at the end of a chain.  So if you had 5 CMD
> >>>descriptors
> >>>described in the SGL, only the last descriptor would have the NWD set.
> >>>Correct?
> >>>
> >>>>2. TX request depends upon the setting of EOT flag so some of the TX
> >>>>request
> >>>>   in complete page write will contain EOT flag and others will not. So
> >>>>this
> >>>>   custom mapping will be used for data descriptor also in NAND driver.
> >>>
> >>>Can you give a sequence description of the descriptors and flags?  I
> >>>haven't
> >>>seen the NAND documentation that describes the sequence/flow.
> >>
> >>Following is the sample list of command descriptor for page write(2K
> >>page).
> >>The actual list will contain more no of descriptor which involves
> >>spare area transfer also.
> >>
> >>Index	INT	NWD	CMD	24bit Register Address
> >>0	-	-	1	0x0000F0 (EBI2_ECC_BUF_CFG)
> >>
> >>1	-	-	1	0x000020 (NAND_DEVn_CFG0)
> >>				0x000024 (NAND_DEVn_CFG1)
> >>				0x0000F0 (EBI2_ECC_BUF_CFG)
> >>				0x00000C (NAND_FLASH_CHIP_SELECT)
> >>
> >>2	-	-	1	0x000004 (NAND_ADDR0)
> >>				0x000008 (NAND_ADDR1)
> >>
> >>3	-	1	1	0x000000 (NAND_FLASH_CMD)
> >>
> >>4	-	1	1	0x000010 (NANDC_EXEC_CMD)
> >>
> >>5	-	-	1	0x000014 (NAND_FLASH_STATUS)
> >>
> >>6	-	1	1	0x000000 (NAND_FLASH_CMD)
> >>
> >>7	-	1	1	0x000010 (NANDC_EXEC_CMD)
> >>
> >>8	-	-	1	0x000014 (NAND_FLASH_STATUS)
> >>
> >>9	-	1	1	0x000000 (NAND_FLASH_CMD)
> >>
> >>10	-	1	1	0x000010 (NANDC_EXEC_CMD)
> >>
> >>11	-	-	1	0x000014 (NAND_FLASH_STATUS)
> >>
> >>12	-	1	1	0x000000 (NAND_FLASH_CMD)
> >>
> >>13	-	1	1	0x000010 (NANDC_EXEC_CMD)
> >>
> >>14	1	-	1	0x000014 (NAND_FLASH_STATUS)
> >>
> >>15	-	-	1	0x000044 (NAND_FLASH_READ_STATUS)
> >>				0x000014 (NAND_FLASH_STATUS)
> >
> >Yeah I was expecting something like:
> >- Setup NAND controller using some command writes (indices 0-4)
> >  Loop doing the following until all the data is done:
> >  - Send/Receive the Data
> >  - Check status.
> >
> >The only one that sticks out to me is index 14.  Is the INT flag there to
> >mark
> >the actual end of the data transfer from the device?  Then you do one more
> >Status read.
> >
> This is sample list given in NAND document. INT will be set only for the
> last
> command. I checked the NAND driver in which status will be read only once
> for
> each codeword.
> >>>
> >>>>>>> So a couple of thoughts on how to deal with this:
> >>>>>>>
> >>>>>>> 1) Define a virtual channel for the command descriptors vs a normal DMA
> >>>>>>> transaction.  This lets you use the same hardware channel, but lets you discern
> >>>>>>> which descriptor format you need to use.  The only issue I see with this is the
> >>>>>>> required change in device tree binding to target the right type of channel (cmd
> >>>>>>> vs normal).
> >>>>>>
> >>>>>>Or mark the descriptor is cmd and write accordingly...
> >>>>>
> >>>>>The only issue i see with that is knowing how much to pre-allocate during
> >>>>>the
> >>>>>prep call.  The flag set API would be called on the allocated tx
> >>>>>descriptor.  So
> >>>>>you'd have to know up front and be able to specify it.
> >>>>>
> >>>>>>
> >>>>>>>
> >>>>>>> 2) Provide an API to set flags for the descriptor on a whole descriptor basis.
> >>>>>>>
> >>>>>>> 3) If you have a set of transactions described by an sgl that has disparate use
> >>>>>>> of flags, you split the list and use a separate transaction.  In other words, we
> >>>>>>> need to enforce that the flag set API will be applied to all descriptors
> >>>>>>> described by an sgl.  This means that the whole transaction may be comprised of
> >>>>>>> multiple async TX descriptors.
> >>>>
> >>>>Each async TX descriptor will generate the BAM interrupt so we are
> >>>>deviating
> >>>>from main purpose of DMA where ideally we should get the interrupt at
> >>>>the
> >>>>end
> >>>>of transfer. This is the main reason for going for this patch.
> >>>
> >>>If the client queues 1 descriptor or 5 descriptors, it doesn't matter that
> >>>much.
> >>>The client knows when it is done by waiting for the descriptors to
> >>>complete.  It
> >>>is less efficient than grouping them all, but it should still work.
> >>>
> >>Yes. client will wait for final descriptor completion. But these
> >>interrupts
> >>will be overhead for CPU. For 1-2 page it won't matter much I guess it
> >>will
> >>be
> >>significant for complete chip read/write(during boot and FS i.e JFFS
> >>operations).
> >>>>
> >>>>With the submitted patch, only 1 interrupt per channel is required for
> >>>>complete NAND page and it solves the setting of BAM specific flags also.
> >>>>Only issue with this patch is adding new API in DMA framework itself.
> >>>>But
> >>>>this API can be used by other DMA engines in future for which mapping
> >>>>cannot
> >>>>be done with available APIs and if this mapping is vendor specific.
> >>>
> >>>I guess the key point in all of this is that the DMA operation being done
> >>>is not
> >>>a normal data flow to/from the device.  It's direct remote register access
> >>>to
> >>>the device using address information contained in the sgl.  And you are
> >>>collating the standard data access along with the special command access
> >>>in the
> >>>same API call.
> >>Yes. Normally DMA engine (QCA ADM DMA engine also) allows to read/write
> >>memory mapped
> >>registers just like data. But BAM is different (Since it is not a global
> >>DMA
> >>Engine
> >>and coupled with peripheral). Also, this different flag requirement is
> >>not
> >>just
> >>for command descriptors but for data descriptors also.
> >>
> >>BAM data access and command access differs only with flag and register
> >>read/write
> >>list. The register read and write list will be simply array of
> >>struct bam_cmd_element added in patch
> >>struct bam_cmd_element {
> >>        __le32 addr:24;
> >>        __le32 command:8;
> >>        __le32 data;
> >>        __le32 mask;
> >>        __le32 reserved;
> >>};
> >>
> >>The address and size of the array will be passed in data and size field
> >>of
> >>SGL.
> >>If we want to form the SGL for mentioned list then we will have SGL of
> >>size
> >>15
> >>with just one descriptor.
> >>
> >>Now we require different flag for each SG entry. currently SG does not
> >>have
> >>flag parameter and we can't add flag parameter just for our requirement
> >>in
> >>generic SG. So we have added the custom mapping function and passed
> >>modified
> >>SG
> >>as parameter which is generic SG and flag.
> >
> >I really think that we need some additional API that allows for the flag
> >munging
> >for the descriptors instead of overriding the prep_slave_sg.  We already
> >needed
> >to change the way the flags are passed anyway.  And instead of building up
> >a
> >special sg list, the API should take a structure that has a 1:1 mapping of
> >the
> >flags to the descriptors.  And you would call this API on your descriptor
> >before
> >issuing it.
> >
> >So build up the sglist.  Call the prep_slave_sg.  You get back a tx
> >descriptor
> >that underneath is a bam descriptor.  Then call the API giving the
> >descriptor
> >and the structure that defines the flags for the descriptors.  Then submit
> >the
> >descriptor.
> >
> >Something like:
> >int qcom_bam_apply_descriptor_flags(struct dma_async_tx_descriptor *tx,
> >				    u16 *flags)
> >{
> >	struct bam_async_desc async_desc = container_of(tx,
> >							struct bam_async_desc,
> >							vd.tx);
> >	int i;
> >
> >	for (i = 0; i < async_desc->num_desc; i++)
> >		async_desc->desc[i].flags = cpu_to_le16(flags[i]);
> >}
> >
> >EXPORT_SYMBOL(qcom_bam_apply_descriptor_flags)
> >
> 
> We want to tightly couple the SG and its flag. But if this 1:1 mapping is
> acceptable
> in linux kernel then we can go ahead with this. It will solve our
> requirement and
> does not require any change in Linux DMA API. I will do the same and will
> submit the
> new patches.

Thanks.  Hopefully Vinod will be ok with the SoC specific EXPORT.

> 
> >This applies the flags directly to the underlying hardware descriptors.
> >The
> >prep_slave_sg call would need to remove all the flag munging.  The
> >bam_start_dma
> >would need to account for this as well by only setting the INT flag if the
> >transfer cannot get all of the descriptors in the FIFO.
> 
> It seems no major change is required in prep_slave_sg or bam_start_dma since
> it is just setting INT flag for last entry which is required for QPIC
> drivers
> also. We need to change the assignment of flag with bitwise OR assignment
> for
> last BAM desc in function bam_start_dma
> 
>         /* set any special flags on the last descriptor */
>         if (async_desc->num_desc == async_desc->xfer_len)
>                 desc[async_desc->xfer_len - 1].flags |=
>                                         cpu_to_le16(async_desc->flags);

Right.  That can be done in the start_dma directly.  That's the only function
that can really determine when the INT flag will be set (other than the last
descriptor).


Regards,

Andy

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/5] dmaengine: Add support for custom data mapping
  2017-01-02 16:12                     ` Andy Gross
@ 2017-01-19  5:01                       ` Vinod Koul
  2017-01-19 14:13                         ` Andy Gross
  0 siblings, 1 reply; 21+ messages in thread
From: Vinod Koul @ 2017-01-19  5:01 UTC (permalink / raw)
  To: Andy Gross
  Cc: Abhishek Sahu, dan.j.williams, stanimir.varbanov, mcgrof, okaya,
	pramod.gurav, arnd, linux-kernel, dmaengine, linux-arm-msm

On Mon, Jan 02, 2017 at 10:12:33AM -0600, Andy Gross wrote:
> On Mon, Jan 02, 2017 at 07:55:37PM +0530, Abhishek Sahu wrote:
> > >>>>>>> So a couple of thoughts on how to deal with this:
> > >>>>>>>
> > >>>>>>> 1) Define a virtual channel for the command descriptors vs a normal DMA
> > >>>>>>> transaction.  This lets you use the same hardware channel, but lets you discern
> > >>>>>>> which descriptor format you need to use.  The only issue I see with this is the
> > >>>>>>> required change in device tree binding to target the right type of channel (cmd
> > >>>>>>> vs normal).
> > >>>>>>
> > >>>>>>Or mark the descriptor is cmd and write accordingly...
> > >>>>>
> > >>>>>The only issue i see with that is knowing how much to pre-allocate during
> > >>>>>the
> > >>>>>prep call.  The flag set API would be called on the allocated tx
> > >>>>>descriptor.  So
> > >>>>>you'd have to know up front and be able to specify it.
> > >>>>>
> > >>>>>>
> > >>>>>>>
> > >>>>>>> 2) Provide an API to set flags for the descriptor on a whole descriptor basis.
> > >>>>>>>
> > >>>>>>> 3) If you have a set of transactions described by an sgl that has disparate use
> > >>>>>>> of flags, you split the list and use a separate transaction.  In other words, we
> > >>>>>>> need to enforce that the flag set API will be applied to all descriptors
> > >>>>>>> described by an sgl.  This means that the whole transaction may be comprised of
> > >>>>>>> multiple async TX descriptors.
> > >>>>
> > >>>>Each async TX descriptor will generate the BAM interrupt so we are
> > >>>>deviating
> > >>>>from main purpose of DMA where ideally we should get the interrupt at
> > >>>>the
> > >>>>end
> > >>>>of transfer. This is the main reason for going for this patch.
> > >>>
> > >>>If the client queues 1 descriptor or 5 descriptors, it doesn't matter that
> > >>>much.
> > >>>The client knows when it is done by waiting for the descriptors to
> > >>>complete.  It
> > >>>is less efficient than grouping them all, but it should still work.
> > >>>
> > >>Yes. client will wait for final descriptor completion. But these
> > >>interrupts
> > >>will be overhead for CPU. For 1-2 page it won't matter much I guess it
> > >>will
> > >>be
> > >>significant for complete chip read/write(during boot and FS i.e JFFS
> > >>operations).
> > >>>>
> > >>>>With the submitted patch, only 1 interrupt per channel is required for
> > >>>>complete NAND page and it solves the setting of BAM specific flags also.
> > >>>>Only issue with this patch is adding new API in DMA framework itself.
> > >>>>But
> > >>>>this API can be used by other DMA engines in future for which mapping
> > >>>>cannot
> > >>>>be done with available APIs and if this mapping is vendor specific.
> > >>>
> > >>>I guess the key point in all of this is that the DMA operation being done
> > >>>is not
> > >>>a normal data flow to/from the device.  It's direct remote register access
> > >>>to
> > >>>the device using address information contained in the sgl.  And you are
> > >>>collating the standard data access along with the special command access
> > >>>in the
> > >>>same API call.
> > >>Yes. Normally DMA engine (QCA ADM DMA engine also) allows to read/write
> > >>memory mapped
> > >>registers just like data. But BAM is different (Since it is not a global
> > >>DMA
> > >>Engine
> > >>and coupled with peripheral). Also, this different flag requirement is
> > >>not
> > >>just
> > >>for command descriptors but for data descriptors also.
> > >>
> > >>BAM data access and command access differs only with flag and register
> > >>read/write
> > >>list. The register read and write list will be simply array of
> > >>struct bam_cmd_element added in patch
> > >>struct bam_cmd_element {
> > >>        __le32 addr:24;
> > >>        __le32 command:8;
> > >>        __le32 data;
> > >>        __le32 mask;
> > >>        __le32 reserved;
> > >>};
> > >>
> > >>The address and size of the array will be passed in data and size field
> > >>of
> > >>SGL.
> > >>If we want to form the SGL for mentioned list then we will have SGL of
> > >>size
> > >>15
> > >>with just one descriptor.
> > >>
> > >>Now we require different flag for each SG entry. currently SG does not
> > >>have
> > >>flag parameter and we can't add flag parameter just for our requirement
> > >>in
> > >>generic SG. So we have added the custom mapping function and passed
> > >>modified
> > >>SG
> > >>as parameter which is generic SG and flag.
> > >
> > >I really think that we need some additional API that allows for the flag
> > >munging
> > >for the descriptors instead of overriding the prep_slave_sg.  We already
> > >needed
> > >to change the way the flags are passed anyway.  And instead of building up
> > >a
> > >special sg list, the API should take a structure that has a 1:1 mapping of
> > >the
> > >flags to the descriptors.  And you would call this API on your descriptor
> > >before
> > >issuing it.

Munging wont be a good idea, but for some of the cases current flags can be
used, and if need be, we can add additional flags

> > >
> > >So build up the sglist.  Call the prep_slave_sg.  You get back a tx
> > >descriptor
> > >that underneath is a bam descriptor.  Then call the API giving the
> > >descriptor
> > >and the structure that defines the flags for the descriptors.  Then submit
> > >the
> > >descriptor.
> > >
> > >Something like:
> > >int qcom_bam_apply_descriptor_flags(struct dma_async_tx_descriptor *tx,
> > >				    u16 *flags)
> > >{
> > >	struct bam_async_desc async_desc = container_of(tx,
> > >							struct bam_async_desc,
> > >							vd.tx);
> > >	int i;
> > >
> > >	for (i = 0; i < async_desc->num_desc; i++)
> > >		async_desc->desc[i].flags = cpu_to_le16(flags[i]);
> > >}
> > >
> > >EXPORT_SYMBOL(qcom_bam_apply_descriptor_flags)

This makes it bam specific and causes issues if we want to use this code in
generic libs, but yes this is an option but should be last resort.

-- 
~Vinod

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/5] dmaengine: Add support for custom data mapping
  2017-01-19  5:01                       ` Vinod Koul
@ 2017-01-19 14:13                         ` Andy Gross
  2017-01-19 14:57                           ` Abhishek Sahu
  2017-01-20 16:56                           ` Vinod Koul
  0 siblings, 2 replies; 21+ messages in thread
From: Andy Gross @ 2017-01-19 14:13 UTC (permalink / raw)
  To: Vinod Koul
  Cc: Abhishek Sahu, dan.j.williams, stanimir.varbanov, mcgrof, okaya,
	pramod.gurav, arnd, linux-kernel, dmaengine, linux-arm-msm

On Thu, Jan 19, 2017 at 10:31:50AM +0530, Vinod Koul wrote:

<snip>

> > > >
> > > >I really think that we need some additional API that allows for the flag
> > > >munging
> > > >for the descriptors instead of overriding the prep_slave_sg.  We already
> > > >needed
> > > >to change the way the flags are passed anyway.  And instead of building up
> > > >a
> > > >special sg list, the API should take a structure that has a 1:1 mapping of
> > > >the
> > > >flags to the descriptors.  And you would call this API on your descriptor
> > > >before
> > > >issuing it.
> 
> Munging wont be a good idea, but for some of the cases current flags can be
> used, and if need be, we can add additional flags

Is adding flags a possibility?  I tried to match up BAM flags to ones that made
sense that were currently defined, but adding a CMD flag would be kind of odd.

It was kind of a stretch to use the PREP_FENCE for the notify when done flag.

> > > >
> > > >So build up the sglist.  Call the prep_slave_sg.  You get back a tx
> > > >descriptor
> > > >that underneath is a bam descriptor.  Then call the API giving the
> > > >descriptor
> > > >and the structure that defines the flags for the descriptors.  Then submit
> > > >the
> > > >descriptor.
> > > >
> > > >Something like:
> > > >int qcom_bam_apply_descriptor_flags(struct dma_async_tx_descriptor *tx,
> > > >				    u16 *flags)
> > > >{
> > > >	struct bam_async_desc async_desc = container_of(tx,
> > > >							struct bam_async_desc,
> > > >							vd.tx);
> > > >	int i;
> > > >
> > > >	for (i = 0; i < async_desc->num_desc; i++)
> > > >		async_desc->desc[i].flags = cpu_to_le16(flags[i]);
> > > >}
> > > >
> > > >EXPORT_SYMBOL(qcom_bam_apply_descriptor_flags)
> 
> This makes it bam specific and causes issues if we want to use this code in
> generic libs, but yes this is an option but should be last resort.

If adding flags is a possibility (which it didn't seem to be in the past), that
would make things much easier.

Regards,

Andy

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/5] dmaengine: Add support for custom data mapping
  2017-01-19 14:13                         ` Andy Gross
@ 2017-01-19 14:57                           ` Abhishek Sahu
  2017-01-20 16:56                           ` Vinod Koul
  1 sibling, 0 replies; 21+ messages in thread
From: Abhishek Sahu @ 2017-01-19 14:57 UTC (permalink / raw)
  To: Andy Gross
  Cc: Vinod Koul, dan.j.williams, stanimir.varbanov, mcgrof, okaya,
	pramod.gurav, arnd, linux-kernel, dmaengine, linux-arm-msm

On 2017-01-19 19:43, Andy Gross wrote:
> On Thu, Jan 19, 2017 at 10:31:50AM +0530, Vinod Koul wrote:
> 
> <snip>
> 
>> > > >
>> > > >I really think that we need some additional API that allows for the flag
>> > > >munging
>> > > >for the descriptors instead of overriding the prep_slave_sg.  We already
>> > > >needed
>> > > >to change the way the flags are passed anyway.  And instead of building up
>> > > >a
>> > > >special sg list, the API should take a structure that has a 1:1 mapping of
>> > > >the
>> > > >flags to the descriptors.  And you would call this API on your descriptor
>> > > >before
>> > > >issuing it.
>> 
>> Munging wont be a good idea, but for some of the cases current flags 
>> can be
>> used, and if need be, we can add additional flags
> 
> Is adding flags a possibility?  I tried to match up BAM flags to ones 
> that made
> sense that were currently defined, but adding a CMD flag would be kind 
> of odd.
> 
> It was kind of a stretch to use the PREP_FENCE for the notify when done 
> flag.
> 
>> > > >
>> > > >So build up the sglist.  Call the prep_slave_sg.  You get back a tx
>> > > >descriptor
>> > > >that underneath is a bam descriptor.  Then call the API giving the
>> > > >descriptor
>> > > >and the structure that defines the flags for the descriptors.  Then submit
>> > > >the
>> > > >descriptor.
>> > > >
>> > > >Something like:
>> > > >int qcom_bam_apply_descriptor_flags(struct dma_async_tx_descriptor *tx,
>> > > >				    u16 *flags)
>> > > >{
>> > > >	struct bam_async_desc async_desc = container_of(tx,
>> > > >							struct bam_async_desc,
>> > > >							vd.tx);
>> > > >	int i;
>> > > >
>> > > >	for (i = 0; i < async_desc->num_desc; i++)
>> > > >		async_desc->desc[i].flags = cpu_to_le16(flags[i]);
>> > > >}
>> > > >
>> > > >EXPORT_SYMBOL(qcom_bam_apply_descriptor_flags)
>> 
>> This makes it bam specific and causes issues if we want to use this 
>> code in
>> generic libs, but yes this is an option but should be last resort.
> 
> If adding flags is a possibility (which it didn't seem to be in the 
> past), that
> would make things much easier.
Also, Main reason for this approach was to set different flags for each
BAM descriptor present in a TX descriptor.

> 
> Regards,
> 
> Andy

-- 
Abhishek Sahu

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/5] dmaengine: Add support for custom data mapping
  2017-01-19 14:13                         ` Andy Gross
  2017-01-19 14:57                           ` Abhishek Sahu
@ 2017-01-20 16:56                           ` Vinod Koul
  2017-04-07 13:58                             ` Abhishek Sahu
  1 sibling, 1 reply; 21+ messages in thread
From: Vinod Koul @ 2017-01-20 16:56 UTC (permalink / raw)
  To: Andy Gross
  Cc: Abhishek Sahu, dan.j.williams, stanimir.varbanov, mcgrof, okaya,
	pramod.gurav, arnd, linux-kernel, dmaengine, linux-arm-msm

On Thu, Jan 19, 2017 at 08:13:17AM -0600, Andy Gross wrote:
> On Thu, Jan 19, 2017 at 10:31:50AM +0530, Vinod Koul wrote:
> 
> <snip>
> 
> > > > >
> > > > >I really think that we need some additional API that allows for the flag
> > > > >munging
> > > > >for the descriptors instead of overriding the prep_slave_sg.  We already
> > > > >needed
> > > > >to change the way the flags are passed anyway.  And instead of building up
> > > > >a
> > > > >special sg list, the API should take a structure that has a 1:1 mapping of
> > > > >the
> > > > >flags to the descriptors.  And you would call this API on your descriptor
> > > > >before
> > > > >issuing it.
> > 
> > Munging wont be a good idea, but for some of the cases current flags can be
> > used, and if need be, we can add additional flags
> 
> Is adding flags a possibility?  I tried to match up BAM flags to ones that made
> sense that were currently defined, but adding a CMD flag would be kind of odd.

Matching flags is a good idea wherever they match, overriding is not :)

> It was kind of a stretch to use the PREP_FENCE for the notify when done flag.

For that, we should use PREP_INTERUPT. DMAengine should assert interrupt
only when this flag is set and continue to next transaction.

> > > > >
> > > > >So build up the sglist.  Call the prep_slave_sg.  You get back a tx
> > > > >descriptor
> > > > >that underneath is a bam descriptor.  Then call the API giving the
> > > > >descriptor
> > > > >and the structure that defines the flags for the descriptors.  Then submit
> > > > >the
> > > > >descriptor.
> > > > >
> > > > >Something like:
> > > > >int qcom_bam_apply_descriptor_flags(struct dma_async_tx_descriptor *tx,
> > > > >				    u16 *flags)
> > > > >{
> > > > >	struct bam_async_desc async_desc = container_of(tx,
> > > > >							struct bam_async_desc,
> > > > >							vd.tx);
> > > > >	int i;
> > > > >
> > > > >	for (i = 0; i < async_desc->num_desc; i++)
> > > > >		async_desc->desc[i].flags = cpu_to_le16(flags[i]);
> > > > >}
> > > > >
> > > > >EXPORT_SYMBOL(qcom_bam_apply_descriptor_flags)
> > 
> > This makes it bam specific and causes issues if we want to use this code in
> > generic libs, but yes this is an option but should be last resort.
> 
> If adding flags is a possibility (which it didn't seem to be in the past), that
> would make things much easier.

Yes if we can describe them generically then yes. So with this and resuing
existing flags without overriding, how many flags do we need..

-- 
~Vinod

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/5] dmaengine: Add support for custom data mapping
  2017-01-20 16:56                           ` Vinod Koul
@ 2017-04-07 13:58                             ` Abhishek Sahu
  0 siblings, 0 replies; 21+ messages in thread
From: Abhishek Sahu @ 2017-04-07 13:58 UTC (permalink / raw)
  To: Vinod Koul
  Cc: Andy Gross, dan.j.williams, stanimir.varbanov, mcgrof, okaya,
	pramod.gurav, arnd, linux-kernel, dmaengine, linux-arm-msm

On 2017-01-20 22:26, Vinod Koul wrote:
> On Thu, Jan 19, 2017 at 08:13:17AM -0600, Andy Gross wrote:
>> On Thu, Jan 19, 2017 at 10:31:50AM +0530, Vinod Koul wrote:
>> 
<snip>
>> > This makes it bam specific and causes issues if we want to use this code in
>> > generic libs, but yes this is an option but should be last resort.
>> 
>> If adding flags is a possibility (which it didn't seem to be in the 
>> past), that
>> would make things much easier.
> 
> Yes if we can describe them generically then yes. So with this and 
> resuing
> existing flags without overriding, how many flags do we need..

Extremely Sorry for delayed response. I couldn't get time to work on 
this.

Summarizing the original issue and suggestion mentioned in this
mail thread.

ISSUE 1: Using the BAM specific command flag
CMD (Command) - allows the SW to create descriptors of type Command
which does not generate any data transmissions but configures
registers in the Peripheral (write operations, and read registers
operations). If we are going to add this as a part of new flag then
we require 1 new flags (DMA_PREP_CMD).

ISSUE 2: Setting per SG flag
Currently peripheral driver calls dmaengine_prep_slave_sg with flag
as parameter. Some of the peripherals (like NAND) requires different
flag to be set in for each SG.

SOLUTION 1: The original patch adds prep_dma_custom_mapping in the
generic DMA engine. The peripheral driver will pass the custom data
in this function and DMA engine driver will form the descriptor
according to its own logic. In future, this API can be used by any
other DMA engine drivers also which are unable to do DMA mapping
with already available API’s.

Drawback: Requires adding additional API in DMA framework which uses
void pointer.

SOLUTION 2: Define and export BAM specific custom API that allows for
the flag munging for the descriptors instead of overriding the
prep_slave_sg. The API should take a structure that has a 1:1 mapping
of the flags to the descriptors and this API will be called before
submitting the descriptors.

Drawback: This makes it BAM specific and causes issues if we want to
use this code in generic libs.

SOLUTION 3: Add CMD flags in generic DMA engine, split the list
and call prep_slave_sg multiple times.

Drawback: This flag is BAM specific and it can’t be used in other
drivers without overriding. Also, each prep_slave_sg will generate
the BAM hardware interrupt and impact the performance.

I did some experiments and here are the results with linux
mtd_speedtest:

- With solution 1 and 2,
    - 2 interrupts will be generated per NAND page read/write
        for 2K page
    - The mtd read speed obtained is 8685 KiB/s

- With solution 3,
    - 8 interrupts will be generated per NAND page read/write
      for 2K page
    - The mtd read speed 7419 KiB/s which increases boot time
      and decrease the File System KPI's

-- 
Abhishek Sahu

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2017-04-07 13:58 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-12-15  9:55 [PATCH 0/5] Support for QCA BAM DMA command descriptor Abhishek Sahu
2016-12-15  9:55 ` [PATCH 1/5] dmaengine: qca: bam_dma: Add header file for bam driver Abhishek Sahu
2016-12-15  9:55 ` [PATCH 2/5] dmaengine: Add support for custom data mapping Abhishek Sahu
2016-12-18 16:26   ` Vinod Koul
2016-12-19  5:06     ` Andy Gross
2016-12-19 15:49       ` Vinod Koul
2016-12-19 17:52         ` Andy Gross
2016-12-20 19:28           ` Abhishek Sahu
2016-12-20 20:25             ` Andy Gross
2016-12-21 19:34               ` Abhishek Sahu
2016-12-29 17:54                 ` Andy Gross
2017-01-02 14:25                   ` Abhishek Sahu
2017-01-02 16:12                     ` Andy Gross
2017-01-19  5:01                       ` Vinod Koul
2017-01-19 14:13                         ` Andy Gross
2017-01-19 14:57                           ` Abhishek Sahu
2017-01-20 16:56                           ` Vinod Koul
2017-04-07 13:58                             ` Abhishek Sahu
2016-12-15  9:55 ` [PATCH 3/5] dmaengine: qca: bam_dma: Add support for bam sgl Abhishek Sahu
2016-12-15  9:55 ` [PATCH 4/5] dmaengine: qca: bam_dma: implement custom data mapping Abhishek Sahu
2016-12-15  9:55 ` [PATCH 5/5] dmaengine: qca: bam_dma: implement command descriptor Abhishek Sahu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).