linux-crypto.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/5] Broadcom SBA RAID support
@ 2017-02-07  8:16 Anup Patel
  2017-02-07  8:16 ` [PATCH v2 1/5] lib/raid6: Add log-of-2 table for RAID6 HW requiring disk position Anup Patel
                   ` (3 more replies)
  0 siblings, 4 replies; 16+ messages in thread
From: Anup Patel @ 2017-02-07  8:16 UTC (permalink / raw)
  To: Vinod Koul, Rob Herring, Mark Rutland, Herbert Xu,
	David S . Miller, Jassi Brar
  Cc: Dan Williams, Ray Jui, Scott Branden, Jon Mason, Rob Rice,
	bcm-kernel-feedback-list, dmaengine, devicetree,
	linux-arm-kernel, linux-kernel, linux-crypto, linux-raid,
	Anup Patel

The Broadcom SBA RAID is a stream-based device which provides
RAID5/6 offload.

It requires a SoC specific ring manager (such as Broadcom FlexRM
ring manager) to provide ring-based programming interface. Due to
this, the Broadcom SBA RAID driver (mailbox client) implements
DMA device having one DMA channel using a set of mailbox channels
provided by Broadcom SoC specific ring manager driver (mailbox
controller).

Important limitations of Broadcom SBA RAID hardware are:
1. Requires disk position instead of disk coefficient 
2. Supports only 30 PQ disk coefficients

To address limitation #1, we have added raid_gflog table which
will help driver convert disk coefficient to disk position. To
address limitation #2, we have extended Linux Async Tx APIs to
check for available PQ coefficients before doing PQ offload.

This patchset is based on Linux-4.10-rc2 and depends on patchset
"[PATCH v4 0/2] Broadcom FlexRM ring manager support"

It is also available at sba-raid-v2 branch of
https://github.com/Broadcom/arm64-linux.git

Changes since v1:
 - Droped patch to add mbox_channel_device() API
 - Used GENMASK and BIT macros wherever possible in bcm-sba-raid driver
 - Replaced C_MDATA macros with static inline functions in
   bcm-sba-raid driver
 - Removed sba_alloc_chan_resources() callback in bcm-sba-raid driver
 - Used dev_err() instead of dev_info() wherever applicable
 - Removed call to sba_issue_pending() from sba_tx_submit() in
   bcm-sba-raid driver
 - Implemented SBA request chaning for handling (len > sba->req_size)
   in bcm-sba-raid driver
 - Implemented device_terminate_all() callback in bcm-sba-raid driver

Anup Patel (5):
  lib/raid6: Add log-of-2 table for RAID6 HW requiring disk position
  async_tx: Handle DMA devices having support for fewer PQ coefficients
  async_tx: Fix DMA_PREP_FENCE usage in do_async_gen_syndrome()
  dmaengine: Add Broadcom SBA RAID driver
  dt-bindings: Add DT bindings document for Broadcom SBA RAID driver

 .../devicetree/bindings/dma/brcm,iproc-sba.txt     |   29 +
 crypto/async_tx/async_pq.c                         |    8 +-
 crypto/async_tx/async_raid6_recov.c                |   12 +-
 drivers/dma/Kconfig                                |   13 +
 drivers/dma/Makefile                               |    1 +
 drivers/dma/bcm-sba-raid.c                         | 1470 ++++++++++++++++++++
 include/linux/dmaengine.h                          |   19 +
 include/linux/raid/pq.h                            |    4 +
 lib/raid6/mktables.c                               |   20 +
 9 files changed, 1571 insertions(+), 5 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/dma/brcm,iproc-sba.txt
 create mode 100644 drivers/dma/bcm-sba-raid.c

-- 
2.7.4

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v2 1/5] lib/raid6: Add log-of-2 table for RAID6 HW requiring disk position
  2017-02-07  8:16 [PATCH v2 0/5] Broadcom SBA RAID support Anup Patel
@ 2017-02-07  8:16 ` Anup Patel
       [not found] ` <1486455406-11202-1-git-send-email-anup.patel-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 16+ messages in thread
From: Anup Patel @ 2017-02-07  8:16 UTC (permalink / raw)
  To: Vinod Koul, Rob Herring, Mark Rutland, Herbert Xu,
	David S . Miller, Jassi Brar
  Cc: Dan Williams, Ray Jui, Scott Branden, Jon Mason, Rob Rice,
	bcm-kernel-feedback-list, dmaengine, devicetree,
	linux-arm-kernel, linux-kernel, linux-crypto, linux-raid,
	Anup Patel

The raid6_gfexp table represents {2}^n values for 0 <= n < 256. The
Linux async_tx framework pass values from raid6_gfexp as coefficients
for each source to prep_dma_pq() callback of DMA channel with PQ
capability. This creates problem for RAID6 offload engines (such as
Broadcom SBA) which take disk position (i.e. log of {2}) instead of
multiplicative cofficients from raid6_gfexp table.

This patch adds raid6_gflog table having log-of-2 value for any given
x such that 0 <= x < 256. For any given disk coefficient x, the
corresponding disk position is given by raid6_gflog[x]. The RAID6
offload engine driver can use this newly added raid6_gflog table to
get disk position from multiplicative coefficient.

Signed-off-by: Anup Patel <anup.patel@broadcom.com>
Reviewed-by: Scott Branden <scott.branden@broadcom.com>
Reviewed-by: Ray Jui <ray.jui@broadcom.com>
---
 include/linux/raid/pq.h |  1 +
 lib/raid6/mktables.c    | 20 ++++++++++++++++++++
 2 files changed, 21 insertions(+)

diff --git a/include/linux/raid/pq.h b/include/linux/raid/pq.h
index 4d57bba..30f9453 100644
--- a/include/linux/raid/pq.h
+++ b/include/linux/raid/pq.h
@@ -142,6 +142,7 @@ int raid6_select_algo(void);
 extern const u8 raid6_gfmul[256][256] __attribute__((aligned(256)));
 extern const u8 raid6_vgfmul[256][32] __attribute__((aligned(256)));
 extern const u8 raid6_gfexp[256]      __attribute__((aligned(256)));
+extern const u8 raid6_gflog[256]      __attribute__((aligned(256)));
 extern const u8 raid6_gfinv[256]      __attribute__((aligned(256)));
 extern const u8 raid6_gfexi[256]      __attribute__((aligned(256)));
 
diff --git a/lib/raid6/mktables.c b/lib/raid6/mktables.c
index 39787db..e824d08 100644
--- a/lib/raid6/mktables.c
+++ b/lib/raid6/mktables.c
@@ -125,6 +125,26 @@ int main(int argc, char *argv[])
 	printf("EXPORT_SYMBOL(raid6_gfexp);\n");
 	printf("#endif\n");
 
+	/* Compute log-of-2 table */
+	printf("\nconst u8 __attribute__((aligned(256)))\n"
+	       "raid6_gflog[256] =\n" "{\n");
+	for (i = 0; i < 256; i += 8) {
+		printf("\t");
+		for (j = 0; j < 8; j++) {
+			v = 255;
+			for (k = 0; k < 256; k++)
+				if (exptbl[k] == (i + j)) {
+					v = k;
+					break;
+				}
+			printf("0x%02x,%c", v, (j == 7) ? '\n' : ' ');
+		}
+	}
+	printf("};\n");
+	printf("#ifdef __KERNEL__\n");
+	printf("EXPORT_SYMBOL(raid6_gflog);\n");
+	printf("#endif\n");
+
 	/* Compute inverse table x^-1 == x^254 */
 	printf("\nconst u8 __attribute__((aligned(256)))\n"
 	       "raid6_gfinv[256] =\n" "{\n");
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v2 2/5] async_tx: Handle DMA devices having support for fewer PQ coefficients
       [not found] ` <1486455406-11202-1-git-send-email-anup.patel-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>
@ 2017-02-07  8:16   ` Anup Patel
  2017-02-07  8:27     ` Dan Williams
  2017-02-07  8:16   ` [PATCH v2 3/5] async_tx: Fix DMA_PREP_FENCE usage in do_async_gen_syndrome() Anup Patel
  1 sibling, 1 reply; 16+ messages in thread
From: Anup Patel @ 2017-02-07  8:16 UTC (permalink / raw)
  To: Vinod Koul, Rob Herring, Mark Rutland, Herbert Xu,
	David S . Miller, Jassi Brar
  Cc: Dan Williams, Ray Jui, Scott Branden, Jon Mason, Rob Rice,
	bcm-kernel-feedback-list-dY08KVG/lbpWk0Htik3J/w,
	dmaengine-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-crypto-u79uwXL29TY76Z2rM5mHXA,
	linux-raid-u79uwXL29TY76Z2rM5mHXA, Anup Patel

The DMAENGINE framework assumes that if PQ offload is supported by a
DMA device then all 256 PQ coefficients are supported. This assumption
does not hold anymore because we now have BCM-SBA-RAID offload engine
which supports PQ offload with limited number of PQ coefficients.

This patch extends async_tx APIs to handle DMA devices with support
for fewer PQ coefficients.

Signed-off-by: Anup Patel <anup.patel-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>
Reviewed-by: Scott Branden <scott.branden-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>
---
 crypto/async_tx/async_pq.c          |  3 +++
 crypto/async_tx/async_raid6_recov.c | 12 ++++++++++--
 include/linux/dmaengine.h           | 19 +++++++++++++++++++
 include/linux/raid/pq.h             |  3 +++
 4 files changed, 35 insertions(+), 2 deletions(-)

diff --git a/crypto/async_tx/async_pq.c b/crypto/async_tx/async_pq.c
index f83de99..16c6526 100644
--- a/crypto/async_tx/async_pq.c
+++ b/crypto/async_tx/async_pq.c
@@ -187,6 +187,9 @@ async_gen_syndrome(struct page **blocks, unsigned int offset, int disks,
 
 	BUG_ON(disks > 255 || !(P(blocks, disks) || Q(blocks, disks)));
 
+	if (device && dma_maxpqcoef(device) < src_cnt)
+		device = NULL;
+
 	if (device)
 		unmap = dmaengine_get_unmap_data(device->dev, disks, GFP_NOWAIT);
 
diff --git a/crypto/async_tx/async_raid6_recov.c b/crypto/async_tx/async_raid6_recov.c
index 8fab627..2916f95 100644
--- a/crypto/async_tx/async_raid6_recov.c
+++ b/crypto/async_tx/async_raid6_recov.c
@@ -352,6 +352,7 @@ async_raid6_2data_recov(int disks, size_t bytes, int faila, int failb,
 {
 	void *scribble = submit->scribble;
 	int non_zero_srcs, i;
+	struct dma_chan *chan = async_dma_find_channel(DMA_PQ);
 
 	BUG_ON(faila == failb);
 	if (failb < faila)
@@ -359,12 +360,15 @@ async_raid6_2data_recov(int disks, size_t bytes, int faila, int failb,
 
 	pr_debug("%s: disks: %d len: %zu\n", __func__, disks, bytes);
 
+	if (chan && dma_maxpqcoef(chan->device) < RAID6_PQ_MAX_COEF)
+		chan = NULL;
+
 	/* if a dma resource is not available or a scribble buffer is not
 	 * available punt to the synchronous path.  In the 'dma not
 	 * available' case be sure to use the scribble buffer to
 	 * preserve the content of 'blocks' as the caller intended.
 	 */
-	if (!async_dma_find_channel(DMA_PQ) || !scribble) {
+	if (!chan || !scribble) {
 		void **ptrs = scribble ? scribble : (void **) blocks;
 
 		async_tx_quiesce(&submit->depend_tx);
@@ -432,15 +436,19 @@ async_raid6_datap_recov(int disks, size_t bytes, int faila,
 	void *scribble = submit->scribble;
 	int good_srcs, good, i;
 	struct page *srcs[2];
+	struct dma_chan *chan = async_dma_find_channel(DMA_PQ);
 
 	pr_debug("%s: disks: %d len: %zu\n", __func__, disks, bytes);
 
+	if (chan && dma_maxpqcoef(chan->device) < RAID6_PQ_MAX_COEF)
+		chan = NULL;
+
 	/* if a dma resource is not available or a scribble buffer is not
 	 * available punt to the synchronous path.  In the 'dma not
 	 * available' case be sure to use the scribble buffer to
 	 * preserve the content of 'blocks' as the caller intended.
 	 */
-	if (!async_dma_find_channel(DMA_PQ) || !scribble) {
+	if (!chan || !scribble) {
 		void **ptrs = scribble ? scribble : (void **) blocks;
 
 		async_tx_quiesce(&submit->depend_tx);
diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
index feee6ec..d938a8b 100644
--- a/include/linux/dmaengine.h
+++ b/include/linux/dmaengine.h
@@ -24,6 +24,7 @@
 #include <linux/scatterlist.h>
 #include <linux/bitmap.h>
 #include <linux/types.h>
+#include <linux/raid/pq.h>
 #include <asm/page.h>
 
 /**
@@ -668,6 +669,7 @@ struct dma_filter {
  * @cap_mask: one or more dma_capability flags
  * @max_xor: maximum number of xor sources, 0 if no capability
  * @max_pq: maximum number of PQ sources and PQ-continue capability
+ * @max_pqcoef: maximum number of PQ coefficients, 0 if all supported
  * @copy_align: alignment shift for memcpy operations
  * @xor_align: alignment shift for xor operations
  * @pq_align: alignment shift for pq operations
@@ -727,11 +729,13 @@ struct dma_device {
 	dma_cap_mask_t  cap_mask;
 	unsigned short max_xor;
 	unsigned short max_pq;
+	unsigned short max_pqcoef;
 	enum dmaengine_alignment copy_align;
 	enum dmaengine_alignment xor_align;
 	enum dmaengine_alignment pq_align;
 	enum dmaengine_alignment fill_align;
 	#define DMA_HAS_PQ_CONTINUE (1 << 15)
+	#define DMA_HAS_FEWER_PQ_COEF (1 << 15)
 
 	int dev_id;
 	struct device *dev;
@@ -1122,6 +1126,21 @@ static inline int dma_maxpq(struct dma_device *dma, enum dma_ctrl_flags flags)
 	BUG();
 }
 
+static inline void dma_set_maxpqcoef(struct dma_device *dma,
+				     unsigned short max_pqcoef)
+{
+	if (max_pqcoef < RAID6_PQ_MAX_COEF) {
+		dma->max_pqcoef = max_pqcoef;
+		dma->max_pqcoef |= DMA_HAS_FEWER_PQ_COEF;
+	}
+}
+
+static inline unsigned short dma_maxpqcoef(struct dma_device *dma)
+{
+	return (dma->max_pqcoef & DMA_HAS_FEWER_PQ_COEF) ?
+		(dma->max_pqcoef & ~DMA_HAS_FEWER_PQ_COEF) : RAID6_PQ_MAX_COEF;
+}
+
 static inline size_t dmaengine_get_icg(bool inc, bool sgl, size_t icg,
 				      size_t dir_icg)
 {
diff --git a/include/linux/raid/pq.h b/include/linux/raid/pq.h
index 30f9453..f3a04bb 100644
--- a/include/linux/raid/pq.h
+++ b/include/linux/raid/pq.h
@@ -15,6 +15,9 @@
 
 #ifdef __KERNEL__
 
+/* Max number of PQ coefficients */
+#define RAID6_PQ_MAX_COEF 256
+
 /* Set to 1 to use kernel-wide empty_zero_page */
 #define RAID6_USE_EMPTY_ZERO_PAGE 0
 #include <linux/blkdev.h>
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v2 3/5] async_tx: Fix DMA_PREP_FENCE usage in do_async_gen_syndrome()
       [not found] ` <1486455406-11202-1-git-send-email-anup.patel-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>
  2017-02-07  8:16   ` [PATCH v2 2/5] async_tx: Handle DMA devices having support for fewer PQ coefficients Anup Patel
@ 2017-02-07  8:16   ` Anup Patel
  1 sibling, 0 replies; 16+ messages in thread
From: Anup Patel @ 2017-02-07  8:16 UTC (permalink / raw)
  To: Vinod Koul, Rob Herring, Mark Rutland, Herbert Xu,
	David S . Miller, Jassi Brar
  Cc: Dan Williams, Ray Jui, Scott Branden, Jon Mason, Rob Rice,
	bcm-kernel-feedback-list-dY08KVG/lbpWk0Htik3J/w,
	dmaengine-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-crypto-u79uwXL29TY76Z2rM5mHXA,
	linux-raid-u79uwXL29TY76Z2rM5mHXA, Anup Patel

The DMA_PREP_FENCE is to be used when preparing Tx descriptor if output
of Tx descriptor is to be used by next/dependent Tx descriptor.

The DMA_PREP_FENSE will not be set correctly in do_async_gen_syndrome()
when calling dma->device_prep_dma_pq() under following conditions:
1. ASYNC_TX_FENCE not set in submit->flags
2. DMA_PREP_FENCE not set in dma_flags
3. src_cnt (= (disks - 2)) is greater than dma_maxpq(dma, dma_flags)

This patch fixes DMA_PREP_FENCE usage in do_async_gen_syndrome() taking
inspiration from do_async_xor() implementation.

Signed-off-by: Anup Patel <anup.patel-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>
Reviewed-by: Ray Jui <ray.jui-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>
Reviewed-by: Scott Branden <scott.branden-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>
---
 crypto/async_tx/async_pq.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/crypto/async_tx/async_pq.c b/crypto/async_tx/async_pq.c
index 16c6526..947cf35 100644
--- a/crypto/async_tx/async_pq.c
+++ b/crypto/async_tx/async_pq.c
@@ -62,9 +62,6 @@ do_async_gen_syndrome(struct dma_chan *chan,
 	dma_addr_t dma_dest[2];
 	int src_off = 0;
 
-	if (submit->flags & ASYNC_TX_FENCE)
-		dma_flags |= DMA_PREP_FENCE;
-
 	while (src_cnt > 0) {
 		submit->flags = flags_orig;
 		pq_src_cnt = min(src_cnt, dma_maxpq(dma, dma_flags));
@@ -83,6 +80,8 @@ do_async_gen_syndrome(struct dma_chan *chan,
 			if (cb_fn_orig)
 				dma_flags |= DMA_PREP_INTERRUPT;
 		}
+		if (submit->flags & ASYNC_TX_FENCE)
+			dma_flags |= DMA_PREP_FENCE;
 
 		/* Drivers force forward progress in case they can not provide
 		 * a descriptor
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v2 4/5] dmaengine: Add Broadcom SBA RAID driver
  2017-02-07  8:16 [PATCH v2 0/5] Broadcom SBA RAID support Anup Patel
  2017-02-07  8:16 ` [PATCH v2 1/5] lib/raid6: Add log-of-2 table for RAID6 HW requiring disk position Anup Patel
       [not found] ` <1486455406-11202-1-git-send-email-anup.patel-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>
@ 2017-02-07  8:16 ` Anup Patel
  2017-02-07  8:16 ` [PATCH v2 5/5] dt-bindings: Add DT bindings document for " Anup Patel
  3 siblings, 0 replies; 16+ messages in thread
From: Anup Patel @ 2017-02-07  8:16 UTC (permalink / raw)
  To: Vinod Koul, Rob Herring, Mark Rutland, Herbert Xu,
	David S . Miller, Jassi Brar
  Cc: Dan Williams, Ray Jui, Scott Branden, Jon Mason, Rob Rice,
	bcm-kernel-feedback-list, dmaengine, devicetree,
	linux-arm-kernel, linux-kernel, linux-crypto, linux-raid,
	Anup Patel

The Broadcom stream buffer accelerator (SBA) provides offloading
capabilities for RAID operations. This SBA offload engine is
accessible via Broadcom SoC specific ring manager.

This patch adds Broadcom SBA RAID driver which provides one
DMA device with RAID capabilities using one or more Broadcom
SoC specific ring manager channels. The SBA RAID driver in its
current shape implements memcpy, xor, and pq operations.

Signed-off-by: Anup Patel <anup.patel@broadcom.com>
Reviewed-by: Ray Jui <ray.jui@broadcom.com>
Reviewed-by: Scott Branden <scott.branden@broadcom.com>
---
 drivers/dma/Kconfig        |   13 +
 drivers/dma/Makefile       |    1 +
 drivers/dma/bcm-sba-raid.c | 1470 ++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 1484 insertions(+)
 create mode 100644 drivers/dma/bcm-sba-raid.c

diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig
index 263495d..bf8fb84 100644
--- a/drivers/dma/Kconfig
+++ b/drivers/dma/Kconfig
@@ -99,6 +99,19 @@ config AXI_DMAC
 	  controller is often used in Analog Device's reference designs for FPGA
 	  platforms.
 
+config BCM_SBA_RAID
+	tristate "Broadcom SBA RAID engine support"
+	depends on (ARM64 && MAILBOX && RAID6_PQ) || COMPILE_TEST
+	select DMA_ENGINE
+	select DMA_ENGINE_RAID
+	select ASYNC_TX_ENABLE_CHANNEL_SWITCH
+	default ARCH_BCM_IPROC
+	help
+	  Enable support for Broadcom SBA RAID Engine. The SBA RAID
+	  engine is available on most of the Broadcom iProc SoCs. It
+	  has the capability to offload memcpy, xor and pq computation
+	  for raid5/6.
+
 config COH901318
 	bool "ST-Ericsson COH901318 DMA support"
 	select DMA_ENGINE
diff --git a/drivers/dma/Makefile b/drivers/dma/Makefile
index a4fa336..ba96bdd 100644
--- a/drivers/dma/Makefile
+++ b/drivers/dma/Makefile
@@ -17,6 +17,7 @@ obj-$(CONFIG_AMCC_PPC440SPE_ADMA) += ppc4xx/
 obj-$(CONFIG_AT_HDMAC) += at_hdmac.o
 obj-$(CONFIG_AT_XDMAC) += at_xdmac.o
 obj-$(CONFIG_AXI_DMAC) += dma-axi-dmac.o
+obj-$(CONFIG_BCM_SBA_RAID) += bcm-sba-raid.o
 obj-$(CONFIG_COH901318) += coh901318.o coh901318_lli.o
 obj-$(CONFIG_DMA_BCM2835) += bcm2835-dma.o
 obj-$(CONFIG_DMA_JZ4740) += dma-jz4740.o
diff --git a/drivers/dma/bcm-sba-raid.c b/drivers/dma/bcm-sba-raid.c
new file mode 100644
index 0000000..c823462
--- /dev/null
+++ b/drivers/dma/bcm-sba-raid.c
@@ -0,0 +1,1470 @@
+/*
+ * Copyright (C) 2017 Broadcom
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+/*
+ * Broadcom SBA RAID Driver
+ *
+ * The Broadcom stream buffer accelerator (SBA) provides offloading
+ * capabilities for RAID operations. The SBA offload engine is accessible
+ * via Broadcom SoC specific ring manager. Two or more offload engines
+ * can share same Broadcom SoC specific ring manager due to this Broadcom
+ * SoC specific ring manager driver is implemented as a mailbox controller
+ * driver and offload engine drivers are implemented as mallbox clients.
+ *
+ * Typically, Broadcom SoC specific ring manager will implement larger
+ * number of hardware rings over one or more SBA hardware devices. By
+ * design, the internal buffer size of SBA hardware device is limited
+ * but all offload operations supported by SBA can be broken down into
+ * multiple small size requests and executed parallely on multiple SBA
+ * hardware devices for achieving high through-put.
+ *
+ * The Broadcom SBA RAID driver does not require any register programming
+ * except submitting request to SBA hardware device via mailbox channels.
+ * This driver implements a DMA device with one DMA channel using a set
+ * of mailbox channels provided by Broadcom SoC specific ring manager
+ * driver. To exploit parallelism (as described above), all DMA request
+ * coming to SBA RAID DMA channel are broken down to smaller requests
+ * and submitted to multiple mailbox channels in round-robin fashion.
+ * For having more SBA DMA channels, we can create more SBA device nodes
+ * in Broadcom SoC specific DTS based on number of hardware rings supported
+ * by Broadcom SoC ring manager.
+ */
+
+#include <linux/bitops.h>
+#include <linux/dma-mapping.h>
+#include <linux/dmaengine.h>
+#include <linux/list.h>
+#include <linux/mailbox_client.h>
+#include <linux/mailbox/brcm-message.h>
+#include <linux/module.h>
+#include <linux/of_device.h>
+#include <linux/slab.h>
+#include <linux/raid/pq.h>
+
+#include "dmaengine.h"
+
+/* SBA command helper macros */
+#define SBA_DEC(_d, _s, _m)		(((_d) >> (_s)) & (_m))
+#define SBA_ENC(_d, _v, _s, _m)					\
+		do {						\
+			(_d) &= ~((u64)(_m) << (_s));		\
+			(_d) |= (((u64)(_v) & (_m)) << (_s));	\
+		} while (0)
+
+/* SBA command related defines */
+#define SBA_TYPE_SHIFT					48
+#define SBA_TYPE_MASK					GENMASK(1, 0)
+#define SBA_TYPE_A					0x0
+#define SBA_TYPE_B					0x2
+#define SBA_TYPE_C					0x3
+#define SBA_USER_DEF_SHIFT				32
+#define SBA_USER_DEF_MASK				GENMASK(15, 0)
+#define SBA_R_MDATA_SHIFT				24
+#define SBA_R_MDATA_MASK				GENMASK(7, 0)
+#define SBA_C_MDATA_MS_SHIFT				18
+#define SBA_C_MDATA_MS_MASK				GENMASK(1, 0)
+#define SBA_INT_SHIFT					17
+#define SBA_INT_MASK					BIT(0)
+#define SBA_RESP_SHIFT					16
+#define SBA_RESP_MASK					BIT(0)
+#define SBA_C_MDATA_SHIFT				8
+#define SBA_C_MDATA_MASK				GENMASK(7, 0)
+#define SBA_C_MDATA_BNUMx_SHIFT(__bnum)			(2 * (__bnum))
+#define SBA_C_MDATA_BNUMx_MASK				GENMASK(1, 0)
+#define SBA_C_MDATA_DNUM_SHIFT				5
+#define SBA_C_MDATA_DNUM_MASK				GENMASK(4, 0)
+#define SBA_C_MDATA_LS(__v)				((__v) & 0xff)
+#define SBA_C_MDATA_MS(__v)				(((__v) >> 8) & 0x3)
+#define SBA_CMD_SHIFT					0
+#define SBA_CMD_MASK					GENMASK(3, 0)
+#define SBA_CMD_ZERO_ALL_BUFFERS			0x8
+#define SBA_CMD_LOAD_BUFFER				0x9
+#define SBA_CMD_XOR					0xa
+#define SBA_CMD_GALOIS_XOR				0xb
+#define SBA_CMD_ZERO_BUFFER				0x4
+#define SBA_CMD_WRITE_BUFFER				0xc
+
+/* Driver helper macros */
+#define to_sba_request(tx)		\
+	container_of(tx, struct sba_request, tx)
+#define to_sba_device(dchan)		\
+	container_of(dchan, struct sba_device, dma_chan)
+
+enum sba_request_state {
+	SBA_REQUEST_STATE_FREE = 1,
+	SBA_REQUEST_STATE_ALLOCED = 2,
+	SBA_REQUEST_STATE_PENDING = 3,
+	SBA_REQUEST_STATE_ACTIVE = 4,
+	SBA_REQUEST_STATE_COMPLETED = 5,
+	SBA_REQUEST_STATE_ABORTED = 6,
+};
+
+struct sba_request {
+	/* Global state */
+	struct list_head node;
+	struct sba_device *sba;
+	enum sba_request_state state;
+	bool fence;
+	/* Chained requests management */
+	struct sba_request *first;
+	struct list_head next;
+	unsigned int next_count;
+	atomic_t next_pending_count;
+	/* BRCM message data */
+	void *resp;
+	dma_addr_t resp_dma;
+	struct brcm_sba_command *cmds;
+	struct brcm_message *msgs;
+	struct brcm_message bmsg;
+	atomic_t msgs_pending_count;
+	struct dma_async_tx_descriptor tx;
+};
+
+enum sba_version {
+	SBA_VER_1 = 0,
+	SBA_VER_2
+};
+
+struct sba_device {
+	/* Underlying device */
+	struct device *dev;
+	/* DT configuration parameters */
+	enum sba_version ver;
+	u32 max_req;
+	u32 req_size;
+	/* Derived configuration parameters */
+	u32 hw_buf_size;
+	u32 hw_resp_size;
+	u32 max_pq_coefs;
+	u32 max_pq_srcs;
+	u32 max_msg_per_req;
+	u32 max_cmd_per_msg;
+	u32 max_cmd_per_req;
+	u32 max_xor_srcs;
+	u32 max_resp_pool_size;
+	u32 max_cmds_pool_size;
+	/* Maibox client and Mailbox channels */
+	struct mbox_client client;
+	int mchans_count;
+	atomic_t mchans_current;
+	struct mbox_chan **mchans;
+	struct device *mbox_dev;
+	/* DMA device and DMA channel */
+	struct dma_device dma_dev;
+	struct dma_chan dma_chan;
+	/* DMA channel resources */
+	void *resp_base;
+	dma_addr_t resp_dma_base;
+	void *cmds_base;
+	dma_addr_t cmds_dma_base;
+	spinlock_t reqs_lock;
+	struct sba_request *reqs;
+	bool reqs_fence;
+	struct list_head reqs_alloc_list;
+	struct list_head reqs_pending_list;
+	struct list_head reqs_active_list;
+	struct list_head reqs_completed_list;
+	struct list_head reqs_aborted_list;
+	struct list_head reqs_free_list;
+	int reqs_free_count;
+};
+
+/* ====== C_MDATA helper routines ===== */
+
+static inline u32 sba_cmd_load_c_mdata(u32 b0)
+{
+	return b0 & SBA_C_MDATA_BNUMx_MASK;
+}
+
+static inline u32 sba_cmd_write_c_mdata(u32 b0)
+{
+	return b0 & SBA_C_MDATA_BNUMx_MASK;
+}
+
+static inline u32 sba_cmd_xor_c_mdata(u32 b1, u32 b0)
+{
+	return (b0 & SBA_C_MDATA_BNUMx_MASK) |
+	       ((b1 & SBA_C_MDATA_BNUMx_MASK) << SBA_C_MDATA_BNUMx_SHIFT(1));
+}
+
+static inline u32 sba_cmd_pq_c_mdata(u32 d, u32 b1, u32 b0)
+{
+	return (b0 & SBA_C_MDATA_BNUMx_MASK) |
+	       ((b1 & SBA_C_MDATA_BNUMx_MASK) << SBA_C_MDATA_BNUMx_SHIFT(1)) |
+	       ((d & SBA_C_MDATA_DNUM_MASK) << SBA_C_MDATA_DNUM_SHIFT);
+}
+
+/* ====== Channel resource management routines ===== */
+
+static struct sba_request *sba_alloc_request(struct sba_device *sba)
+{
+	unsigned long flags;
+	struct sba_request *req = NULL;
+
+	spin_lock_irqsave(&sba->reqs_lock, flags);
+
+	if (!list_empty(&sba->reqs_free_list)) {
+		req = list_first_entry(&sba->reqs_free_list,
+				       struct sba_request,
+				       node);
+
+		list_move_tail(&req->node, &sba->reqs_alloc_list);
+		req->state = SBA_REQUEST_STATE_ALLOCED;
+		req->fence = false;
+		req->first = req;
+		INIT_LIST_HEAD(&req->next);
+		req->next_count = 1;
+		atomic_set(&req->next_pending_count, 1);
+		atomic_set(&req->msgs_pending_count, 0);
+
+		sba->reqs_free_count--;
+
+		dma_async_tx_descriptor_init(&req->tx, &sba->dma_chan);
+	}
+
+	spin_unlock_irqrestore(&sba->reqs_lock, flags);
+
+	return req;
+}
+
+/* Note: Must be called with sba->reqs_lock held */
+static void _sba_pending_request(struct sba_device *sba,
+				 struct sba_request *req)
+{
+	req->state = SBA_REQUEST_STATE_PENDING;
+	list_move_tail(&req->node, &sba->reqs_pending_list);
+	if (list_empty(&sba->reqs_active_list))
+		sba->reqs_fence = false;
+}
+
+/* Note: Must be called with sba->reqs_lock held */
+static bool _sba_active_request(struct sba_device *sba,
+				struct sba_request *req)
+{
+	if (list_empty(&sba->reqs_active_list))
+		sba->reqs_fence = false;
+	if (sba->reqs_fence)
+		return false;
+	req->state = SBA_REQUEST_STATE_ACTIVE;
+	list_move_tail(&req->node, &sba->reqs_active_list);
+	if (req->fence)
+		sba->reqs_fence = true;
+	return true;
+}
+
+/* Note: Must be called with sba->reqs_lock held */
+static void _sba_abort_request(struct sba_device *sba,
+			       struct sba_request *req)
+{
+	req->state = SBA_REQUEST_STATE_ABORTED;
+	list_move_tail(&req->node, &sba->reqs_aborted_list);
+	if (list_empty(&sba->reqs_active_list))
+		sba->reqs_fence = false;
+}
+
+/* Note: Must be called with sba->reqs_lock held */
+static void _sba_free_request(struct sba_device *sba,
+			      struct sba_request *req)
+{
+	req->state = SBA_REQUEST_STATE_FREE;
+	list_move_tail(&req->node, &sba->reqs_free_list);
+	if (list_empty(&sba->reqs_active_list))
+		sba->reqs_fence = false;
+	sba->reqs_free_count++;
+}
+
+static void sba_complete_chained_requests(struct sba_request *req)
+{
+	unsigned long flags;
+	struct sba_request *nreq;
+	struct sba_device *sba = req->sba;
+
+	spin_lock_irqsave(&sba->reqs_lock, flags);
+
+	list_for_each_entry(nreq, &req->next, next) {
+		nreq->state = SBA_REQUEST_STATE_COMPLETED;
+		list_move_tail(&nreq->node, &sba->reqs_completed_list);
+	}
+	req->state = SBA_REQUEST_STATE_COMPLETED;
+	list_move_tail(&req->node, &sba->reqs_completed_list);
+	if (list_empty(&sba->reqs_active_list))
+		sba->reqs_fence = false;
+
+	spin_unlock_irqrestore(&sba->reqs_lock, flags);
+}
+
+static void sba_free_chained_requests(struct sba_request *req)
+{
+	unsigned long flags;
+	struct sba_request *nreq;
+	struct sba_device *sba = req->sba;
+
+	spin_lock_irqsave(&sba->reqs_lock, flags);
+
+	list_for_each_entry(nreq, &req->next, next) {
+		_sba_free_request(sba, nreq);
+	}
+	_sba_free_request(sba, req);
+
+	spin_unlock_irqrestore(&sba->reqs_lock, flags);
+}
+
+static void sba_chain_request(struct sba_request *first,
+			      struct sba_request *req)
+{
+	unsigned long flags;
+	struct sba_device *sba = req->sba;
+
+	spin_lock_irqsave(&sba->reqs_lock, flags);
+
+	list_add_tail(&req->next, &first->next);
+	req->first = first;
+	first->next_count++;
+	atomic_set(&first->next_pending_count, first->next_count);
+
+	spin_unlock_irqrestore(&sba->reqs_lock, flags);
+}
+
+static void sba_cleanup_nonpending_requests(struct sba_device *sba)
+{
+	unsigned long flags;
+	struct sba_request *req, *req1;
+
+	spin_lock_irqsave(&sba->reqs_lock, flags);
+
+	/* Freeup all alloced request */
+	list_for_each_entry_safe(req, req1, &sba->reqs_alloc_list, node) {
+		_sba_free_request(sba, req);
+	}
+
+	/* Freeup all completed request */
+	list_for_each_entry_safe(req, req1, &sba->reqs_completed_list, node) {
+		_sba_free_request(sba, req);
+	}
+
+	/* Set all active requests as aborted */
+	list_for_each_entry_safe(req, req1, &sba->reqs_active_list, node) {
+		_sba_abort_request(sba, req);
+	}
+
+	/*
+	 * Note: We expect that aborted request will be eventually
+	 * freed by sba_receive_message()
+	 */
+
+	spin_unlock_irqrestore(&sba->reqs_lock, flags);
+}
+
+static void sba_cleanup_pending_requests(struct sba_device *sba)
+{
+	unsigned long flags;
+	struct sba_request *req, *req1;
+
+	spin_lock_irqsave(&sba->reqs_lock, flags);
+
+	/* Freeup all pending request */
+	list_for_each_entry_safe(req, req1, &sba->reqs_pending_list, node) {
+		if (req->bmsg.batch.msgs_queued < req->bmsg.batch.msgs_count)
+			/* Set partially-queued request as aborted */
+			_sba_abort_request(sba, req);
+		else
+			/* Freeup rest of the pending request */
+			_sba_free_request(sba, req);
+	}
+
+	spin_unlock_irqrestore(&sba->reqs_lock, flags);
+}
+
+/* ====== DMAENGINE callbacks ===== */
+
+static void sba_free_chan_resources(struct dma_chan *dchan)
+{
+	/*
+	 * Channel resources are pre-alloced so we just free-up
+	 * whatever we can so that we can re-use pre-alloced
+	 * channel resources next time.
+	 */
+	sba_cleanup_nonpending_requests(to_sba_device(dchan));
+}
+
+static int sba_device_terminate_all(struct dma_chan *dchan)
+{
+	/* Cleanup all pending requests */
+	sba_cleanup_pending_requests(to_sba_device(dchan));
+
+	return 0;
+}
+
+static int sba_send_mbox_request(struct sba_device *sba,
+				 struct sba_request *req)
+{
+	int mchans_idx, ret = 0;
+
+	/* Select mailbox channel in round-robin fashion */
+	mchans_idx = atomic_inc_return(&sba->mchans_current);
+	mchans_idx = mchans_idx % sba->mchans_count;
+
+	/* Send batch message for the request */
+	req->bmsg.batch.msgs_queued = 0;
+	ret = mbox_send_message(sba->mchans[mchans_idx], &req->bmsg);
+	if (ret < 0) {
+		dev_err(sba->dev, "channel %d message %d (total %d)",
+			mchans_idx, req->bmsg.batch.msgs_queued,
+			req->bmsg.batch.msgs_count);
+		dev_err(sba->dev, "send message failed with error %d", ret);
+		return ret;
+	}
+	ret = req->bmsg.error;
+	if (ret < 0) {
+		dev_err(sba->dev,
+			"mbox channel %d message %d (total %d)",
+			mchans_idx, req->bmsg.batch.msgs_queued,
+			req->bmsg.batch.msgs_count);
+		dev_err(sba->dev, "message error %d", ret);
+		return ret;
+	}
+
+	return 0;
+}
+
+static void sba_issue_pending(struct dma_chan *dchan)
+{
+	int ret;
+	unsigned long flags;
+	struct sba_request *req, *req1;
+	struct sba_device *sba = to_sba_device(dchan);
+
+	spin_lock_irqsave(&sba->reqs_lock, flags);
+
+	/* Process all pending request */
+	list_for_each_entry_safe(req, req1, &sba->reqs_pending_list, node) {
+		/* Try to make request active */
+		if (!_sba_active_request(sba, req))
+			break;
+
+		/* Send request to mailbox channel */
+		spin_unlock_irqrestore(&sba->reqs_lock, flags);
+		ret = sba_send_mbox_request(sba, req);
+		spin_lock_irqsave(&sba->reqs_lock, flags);
+
+		/* If something went wrong then keep request pending */
+		if (ret < 0) {
+			_sba_pending_request(sba, req);
+			break;
+		}
+	}
+
+	spin_unlock_irqrestore(&sba->reqs_lock, flags);
+}
+
+static dma_cookie_t sba_tx_submit(struct dma_async_tx_descriptor *tx)
+{
+	unsigned long flags;
+	dma_cookie_t cookie;
+	struct sba_device *sba;
+	struct sba_request *req, *nreq;
+
+	if (unlikely(!tx))
+		return -EINVAL;
+
+	sba = to_sba_device(tx->chan);
+	req = to_sba_request(tx);
+
+	/* Assign cookie and mark all chained requests pending */
+	spin_lock_irqsave(&sba->reqs_lock, flags);
+	cookie = dma_cookie_assign(tx);
+	list_for_each_entry(nreq, &req->next, next) {
+		_sba_pending_request(sba, nreq);
+	}
+	_sba_pending_request(sba, req);
+	spin_unlock_irqrestore(&sba->reqs_lock, flags);
+
+	return cookie;
+}
+
+static enum dma_status sba_tx_status(struct dma_chan *dchan,
+				     dma_cookie_t cookie,
+				     struct dma_tx_state *txstate)
+{
+	int mchan_idx;
+	enum dma_status ret;
+	struct sba_device *sba = to_sba_device(dchan);
+
+	for (mchan_idx = 0; mchan_idx < sba->mchans_count; mchan_idx++)
+		mbox_client_peek_data(sba->mchans[mchan_idx]);
+
+	ret = dma_cookie_status(dchan, cookie, txstate);
+	if (ret == DMA_COMPLETE)
+		return ret;
+
+	return dma_cookie_status(dchan, cookie, txstate);
+}
+
+static unsigned int sba_fillup_memcpy_msg(struct sba_request *req,
+					  struct brcm_sba_command *cmds,
+					  struct brcm_message *msg,
+					  dma_addr_t msg_offset, size_t msg_len,
+					  dma_addr_t dst, dma_addr_t src)
+{
+	u64 cmd;
+	u32 c_mdata;
+	struct brcm_sba_command *cmdsp = cmds;
+
+	/* Type-B command to load data into buf0 */
+	cmd = 0;
+	SBA_ENC(cmd, SBA_TYPE_B, SBA_TYPE_SHIFT, SBA_TYPE_MASK);
+	SBA_ENC(cmd, msg_len,
+		SBA_USER_DEF_SHIFT, SBA_USER_DEF_MASK);
+	c_mdata = sba_cmd_load_c_mdata(0);
+	SBA_ENC(cmd, SBA_C_MDATA_LS(c_mdata),
+		SBA_C_MDATA_SHIFT, SBA_C_MDATA_MASK);
+	SBA_ENC(cmd, SBA_CMD_LOAD_BUFFER,
+		SBA_CMD_SHIFT, SBA_CMD_MASK);
+	cmdsp->cmd = cmd;
+	*cmdsp->cmd_dma = cpu_to_le64(cmd);
+	cmdsp->flags = BRCM_SBA_CMD_TYPE_B;
+	cmdsp->data = src + msg_offset;
+	cmdsp->data_len = msg_len;
+	cmdsp++;
+
+	/* Type-A command to write buf0 */
+	cmd = 0;
+	SBA_ENC(cmd, SBA_TYPE_A, SBA_TYPE_SHIFT, SBA_TYPE_MASK);
+	SBA_ENC(cmd, msg_len,
+		SBA_USER_DEF_SHIFT, SBA_USER_DEF_MASK);
+	SBA_ENC(cmd, 0x1, SBA_RESP_SHIFT, SBA_RESP_MASK);
+	c_mdata = sba_cmd_write_c_mdata(0);
+	SBA_ENC(cmd, SBA_C_MDATA_LS(c_mdata),
+		SBA_C_MDATA_SHIFT, SBA_C_MDATA_MASK);
+	SBA_ENC(cmd, SBA_CMD_WRITE_BUFFER,
+		SBA_CMD_SHIFT, SBA_CMD_MASK);
+	cmdsp->cmd = cmd;
+	*cmdsp->cmd_dma = cpu_to_le64(cmd);
+	cmdsp->flags = BRCM_SBA_CMD_TYPE_A;
+	if (req->sba->hw_resp_size) {
+		cmdsp->flags |= BRCM_SBA_CMD_HAS_RESP;
+		cmdsp->resp = req->resp_dma;
+		cmdsp->resp_len = req->sba->hw_resp_size;
+	}
+	cmdsp->flags |= BRCM_SBA_CMD_HAS_OUTPUT;
+	cmdsp->data = dst + msg_offset;
+	cmdsp->data_len = msg_len;
+	cmdsp++;
+
+	/* Fillup brcm_message */
+	msg->type = BRCM_MESSAGE_SBA;
+	msg->sba.cmds = cmds;
+	msg->sba.cmds_count = cmdsp - cmds;
+	msg->ctx = req;
+	msg->error = 0;
+
+	return cmdsp - cmds;
+}
+
+static struct sba_request *
+sba_prep_dma_memcpy_req(struct sba_device *sba,
+			dma_addr_t off, dma_addr_t dst, dma_addr_t src,
+			size_t len, unsigned long flags)
+{
+	size_t msg_len;
+	dma_addr_t msg_offset = 0;
+	unsigned int msgs_count = 0, cmds_count, cmds_idx = 0;
+	struct sba_request *req = NULL;
+
+	/* Alloc new request */
+	req = sba_alloc_request(sba);
+	if (!req)
+		return NULL;
+	req->fence = (flags & DMA_PREP_FENCE) ? true : false;
+
+	/* Fillup request messages */
+	while (len) {
+		msg_len = (len < sba->hw_buf_size) ? len : sba->hw_buf_size;
+		cmds_count = sba_fillup_memcpy_msg(req,
+					&req->cmds[cmds_idx],
+					&req->msgs[msgs_count],
+					off + msg_offset, msg_len, dst, src);
+		msgs_count++;
+		cmds_idx += cmds_count;
+		msg_offset += msg_len;
+		len -= msg_len;
+	}
+	req->bmsg.type = BRCM_MESSAGE_BATCH;
+	req->bmsg.batch.msgs = &req->msgs[0];
+	req->bmsg.batch.msgs_queued = 0;
+	req->bmsg.batch.msgs_count = msgs_count;
+	req->bmsg.ctx = req;
+	req->bmsg.error = 0;
+	atomic_set(&req->msgs_pending_count, msgs_count);
+
+	/* Init async_tx descriptor */
+	req->tx.flags = flags;
+	req->tx.cookie = -EBUSY;
+
+	return req;
+}
+
+static struct dma_async_tx_descriptor *
+sba_prep_dma_memcpy(struct dma_chan *dchan, dma_addr_t dst, dma_addr_t src,
+		    size_t len, unsigned long flags)
+{
+	size_t req_len;
+	dma_addr_t off = 0;
+	struct sba_device *sba = to_sba_device(dchan);
+	struct sba_request *first = NULL, *req;
+
+	/* Create chained requests where each request is upto req_size */
+	while (len) {
+		req_len = (len < sba->req_size) ? len : sba->req_size;
+
+		req = sba_prep_dma_memcpy_req(sba, off, dst, src,
+					      req_len, flags);
+		if (!req) {
+			if (first)
+				sba_free_chained_requests(first);
+			return NULL;
+		}
+
+		if (first)
+			sba_chain_request(first, req);
+		else
+			first = req;
+
+		off += req_len;
+		len -= req_len;
+	}
+
+	return (first) ? &first->tx : NULL;
+}
+
+static unsigned int sba_fillup_xor_msg(struct sba_request *req,
+				struct brcm_sba_command *cmds,
+				struct brcm_message *msg,
+				dma_addr_t msg_offset, size_t msg_len,
+				dma_addr_t dst, dma_addr_t *src, u32 src_cnt)
+{
+	u64 cmd;
+	u32 c_mdata;
+	unsigned int i;
+	struct brcm_sba_command *cmdsp = cmds;
+
+	/* Type-B command to load data into buf0 */
+	cmd = 0;
+	SBA_ENC(cmd, SBA_TYPE_B, SBA_TYPE_SHIFT, SBA_TYPE_MASK);
+	SBA_ENC(cmd, msg_len,
+		SBA_USER_DEF_SHIFT, SBA_USER_DEF_MASK);
+	c_mdata = sba_cmd_load_c_mdata(0);
+	SBA_ENC(cmd, SBA_C_MDATA_LS(c_mdata),
+		SBA_C_MDATA_SHIFT, SBA_C_MDATA_MASK);
+	SBA_ENC(cmd, SBA_CMD_LOAD_BUFFER,
+		SBA_CMD_SHIFT, SBA_CMD_MASK);
+	cmdsp->cmd = cmd;
+	*cmdsp->cmd_dma = cpu_to_le64(cmd);
+	cmdsp->flags = BRCM_SBA_CMD_TYPE_B;
+	cmdsp->data = src[0] + msg_offset;
+	cmdsp->data_len = msg_len;
+	cmdsp++;
+
+	/* Type-B commands to xor data with buf0 and put it back in buf0 */
+	for (i = 1; i < src_cnt; i++) {
+		cmd = 0;
+		SBA_ENC(cmd, SBA_TYPE_B, SBA_TYPE_SHIFT, SBA_TYPE_MASK);
+		SBA_ENC(cmd, msg_len,
+			SBA_USER_DEF_SHIFT, SBA_USER_DEF_MASK);
+		c_mdata = sba_cmd_xor_c_mdata(0, 0);
+		SBA_ENC(cmd, SBA_C_MDATA_LS(c_mdata),
+			SBA_C_MDATA_SHIFT, SBA_C_MDATA_MASK);
+		SBA_ENC(cmd, SBA_CMD_XOR, SBA_CMD_SHIFT, SBA_CMD_MASK);
+		cmdsp->cmd = cmd;
+		*cmdsp->cmd_dma = cpu_to_le64(cmd);
+		cmdsp->flags = BRCM_SBA_CMD_TYPE_B;
+		cmdsp->data = src[i] + msg_offset;
+		cmdsp->data_len = msg_len;
+		cmdsp++;
+	}
+
+	/* Type-A command to write buf0 */
+	cmd = 0;
+	SBA_ENC(cmd, SBA_TYPE_A, SBA_TYPE_SHIFT, SBA_TYPE_MASK);
+	SBA_ENC(cmd, msg_len,
+		SBA_USER_DEF_SHIFT, SBA_USER_DEF_MASK);
+	SBA_ENC(cmd, 0x1, SBA_RESP_SHIFT, SBA_RESP_MASK);
+	c_mdata = sba_cmd_write_c_mdata(0);
+	SBA_ENC(cmd, SBA_C_MDATA_LS(c_mdata),
+		SBA_C_MDATA_SHIFT, SBA_C_MDATA_MASK);
+	SBA_ENC(cmd, SBA_CMD_WRITE_BUFFER,
+		SBA_CMD_SHIFT, SBA_CMD_MASK);
+	cmdsp->cmd = cmd;
+	*cmdsp->cmd_dma = cpu_to_le64(cmd);
+	cmdsp->flags = BRCM_SBA_CMD_TYPE_A;
+	if (req->sba->hw_resp_size) {
+		cmdsp->flags |= BRCM_SBA_CMD_HAS_RESP;
+		cmdsp->resp = req->resp_dma;
+		cmdsp->resp_len = req->sba->hw_resp_size;
+	}
+	cmdsp->flags |= BRCM_SBA_CMD_HAS_OUTPUT;
+	cmdsp->data = dst + msg_offset;
+	cmdsp->data_len = msg_len;
+	cmdsp++;
+
+	/* Fillup brcm_message */
+	msg->type = BRCM_MESSAGE_SBA;
+	msg->sba.cmds = cmds;
+	msg->sba.cmds_count = cmdsp - cmds;
+	msg->ctx = req;
+	msg->error = 0;
+
+	return cmdsp - cmds;
+}
+
+struct sba_request *
+sba_prep_dma_xor_req(struct sba_device *sba,
+		     dma_addr_t off, dma_addr_t dst, dma_addr_t *src,
+		     u32 src_cnt, size_t len, unsigned long flags)
+{
+	size_t msg_len;
+	dma_addr_t msg_offset = 0;
+	unsigned int msgs_count = 0, cmds_count, cmds_idx = 0;
+	struct sba_request *req = NULL;
+
+	/* Alloc new request */
+	req = sba_alloc_request(sba);
+	if (!req)
+		return NULL;
+	req->fence = (flags & DMA_PREP_FENCE) ? true : false;
+
+	/* Fillup request messages */
+	while (len) {
+		msg_len = (len < sba->hw_buf_size) ? len : sba->hw_buf_size;
+		cmds_count = sba_fillup_xor_msg(req,
+				     &req->cmds[cmds_idx],
+				     &req->msgs[msgs_count],
+				     off + msg_offset, msg_len,
+				     dst, src, src_cnt);
+		msgs_count++;
+		cmds_idx += cmds_count;
+		msg_offset += msg_len;
+		len -= msg_len;
+	}
+	req->bmsg.type = BRCM_MESSAGE_BATCH;
+	req->bmsg.batch.msgs = &req->msgs[0];
+	req->bmsg.batch.msgs_queued = 0;
+	req->bmsg.batch.msgs_count = msgs_count;
+	req->bmsg.ctx = req;
+	req->bmsg.error = 0;
+	atomic_set(&req->msgs_pending_count, msgs_count);
+
+	/* Init async_tx descriptor */
+	req->tx.flags = flags;
+	req->tx.cookie = -EBUSY;
+
+	return req;
+}
+
+static struct dma_async_tx_descriptor *
+sba_prep_dma_xor(struct dma_chan *dchan, dma_addr_t dst, dma_addr_t *src,
+		 u32 src_cnt, size_t len, unsigned long flags)
+{
+	size_t req_len;
+	dma_addr_t off = 0;
+	struct sba_device *sba = to_sba_device(dchan);
+	struct sba_request *first = NULL, *req;
+
+	/* Sanity checks */
+	if (unlikely(src_cnt > sba->max_xor_srcs))
+		return NULL;
+
+	/* Create chained requests where each request is upto req_size */
+	while (len) {
+		req_len = (len < sba->req_size) ? len : sba->req_size;
+
+		req = sba_prep_dma_xor_req(sba, off, dst, src, src_cnt,
+					   req_len, flags);
+		if (!req) {
+			if (first)
+				sba_free_chained_requests(first);
+			return NULL;
+		}
+
+		if (first)
+			sba_chain_request(first, req);
+		else
+			first = req;
+
+		off += req_len;
+		len -= req_len;
+	}
+
+	return (first) ? &first->tx : NULL;
+}
+
+static unsigned int sba_fillup_pq_msg(struct sba_request *req,
+				bool pq_continue,
+				struct brcm_sba_command *cmds,
+				struct brcm_message *msg,
+				dma_addr_t msg_offset, size_t msg_len,
+				dma_addr_t *dst_p, dma_addr_t *dst_q,
+				const u8 *scf, dma_addr_t *src, u32 src_cnt)
+{
+	u64 cmd;
+	u32 c_mdata;
+	unsigned int i;
+	struct brcm_sba_command *cmdsp = cmds;
+
+	if (pq_continue) {
+		/* Type-B command to load old P into buf0 */
+		if (dst_p) {
+			cmd = 0;
+			SBA_ENC(cmd, SBA_TYPE_B,
+				SBA_TYPE_SHIFT, SBA_TYPE_MASK);
+			SBA_ENC(cmd, msg_len,
+				SBA_USER_DEF_SHIFT, SBA_USER_DEF_MASK);
+			c_mdata = sba_cmd_load_c_mdata(0);
+			SBA_ENC(cmd, SBA_C_MDATA_LS(c_mdata),
+				SBA_C_MDATA_SHIFT, SBA_C_MDATA_MASK);
+			SBA_ENC(cmd, SBA_CMD_LOAD_BUFFER,
+				SBA_CMD_SHIFT, SBA_CMD_MASK);
+			cmdsp->cmd = cmd;
+			*cmdsp->cmd_dma = cpu_to_le64(cmd);
+			cmdsp->flags = BRCM_SBA_CMD_TYPE_B;
+			cmdsp->data = *dst_p + msg_offset;
+			cmdsp->data_len = msg_len;
+			cmdsp++;
+		}
+
+		/* Type-B command to load old Q into buf1 */
+		if (dst_q) {
+			cmd = 0;
+			SBA_ENC(cmd, SBA_TYPE_B,
+				SBA_TYPE_SHIFT, SBA_TYPE_MASK);
+			SBA_ENC(cmd, msg_len,
+				SBA_USER_DEF_SHIFT, SBA_USER_DEF_MASK);
+			c_mdata = sba_cmd_load_c_mdata(1);
+			SBA_ENC(cmd, SBA_C_MDATA_LS(c_mdata),
+				SBA_C_MDATA_SHIFT, SBA_C_MDATA_MASK);
+			SBA_ENC(cmd, SBA_CMD_LOAD_BUFFER,
+				SBA_CMD_SHIFT, SBA_CMD_MASK);
+			cmdsp->cmd = cmd;
+			*cmdsp->cmd_dma = cpu_to_le64(cmd);
+			cmdsp->flags = BRCM_SBA_CMD_TYPE_B;
+			cmdsp->data = *dst_q + msg_offset;
+			cmdsp->data_len = msg_len;
+			cmdsp++;
+		}
+	} else {
+		/* Type-A command to load data into buf0 */
+		cmd = 0;
+		SBA_ENC(cmd, SBA_TYPE_A, SBA_TYPE_SHIFT, SBA_TYPE_MASK);
+		SBA_ENC(cmd, msg_len,
+			SBA_USER_DEF_SHIFT, SBA_USER_DEF_MASK);
+		SBA_ENC(cmd, SBA_CMD_ZERO_ALL_BUFFERS,
+			SBA_CMD_SHIFT, SBA_CMD_MASK);
+		cmdsp->cmd = cmd;
+		*cmdsp->cmd_dma = cpu_to_le64(cmd);
+		cmdsp->flags = BRCM_SBA_CMD_TYPE_A;
+		cmdsp++;
+	}
+
+	/* Type-B commands for generate P onto buf0 and Q onto buf1 */
+	for (i = 0; i < src_cnt; i++) {
+		cmd = 0;
+		SBA_ENC(cmd, SBA_TYPE_B, SBA_TYPE_SHIFT, SBA_TYPE_MASK);
+		SBA_ENC(cmd, msg_len,
+			SBA_USER_DEF_SHIFT, SBA_USER_DEF_MASK);
+		c_mdata = sba_cmd_pq_c_mdata(raid6_gflog[scf[i]], 1, 0);
+		SBA_ENC(cmd, SBA_C_MDATA_LS(c_mdata),
+			SBA_C_MDATA_SHIFT, SBA_C_MDATA_MASK);
+		SBA_ENC(cmd, SBA_C_MDATA_MS(c_mdata),
+			SBA_C_MDATA_MS_SHIFT, SBA_C_MDATA_MS_MASK);
+		SBA_ENC(cmd, SBA_CMD_GALOIS_XOR,
+			SBA_CMD_SHIFT, SBA_CMD_MASK);
+		cmdsp->cmd = cmd;
+		*cmdsp->cmd_dma = cpu_to_le64(cmd);
+		cmdsp->flags = BRCM_SBA_CMD_TYPE_B;
+		cmdsp->data = src[i] + msg_offset;
+		cmdsp->data_len = msg_len;
+		cmdsp++;
+	}
+
+	/* Type-A command to write buf0 */
+	if (dst_p) {
+		cmd = 0;
+		SBA_ENC(cmd, SBA_TYPE_A, SBA_TYPE_SHIFT, SBA_TYPE_MASK);
+		SBA_ENC(cmd, msg_len,
+			SBA_USER_DEF_SHIFT, SBA_USER_DEF_MASK);
+		SBA_ENC(cmd, 0x1, SBA_RESP_SHIFT, SBA_RESP_MASK);
+		c_mdata = sba_cmd_write_c_mdata(0);
+		SBA_ENC(cmd, SBA_C_MDATA_LS(c_mdata),
+			SBA_C_MDATA_SHIFT, SBA_C_MDATA_MASK);
+		SBA_ENC(cmd, SBA_CMD_WRITE_BUFFER,
+			SBA_CMD_SHIFT, SBA_CMD_MASK);
+		cmdsp->cmd = cmd;
+		*cmdsp->cmd_dma = cpu_to_le64(cmd);
+		cmdsp->flags = BRCM_SBA_CMD_TYPE_A;
+		if (req->sba->hw_resp_size) {
+			cmdsp->flags |= BRCM_SBA_CMD_HAS_RESP;
+			cmdsp->resp = req->resp_dma;
+			cmdsp->resp_len = req->sba->hw_resp_size;
+		}
+		cmdsp->flags |= BRCM_SBA_CMD_HAS_OUTPUT;
+		cmdsp->data = *dst_p + msg_offset;
+		cmdsp->data_len = msg_len;
+		cmdsp++;
+	}
+
+	/* Type-A command to write buf1 */
+	if (dst_q) {
+		cmd = 0;
+		SBA_ENC(cmd, SBA_TYPE_A, SBA_TYPE_SHIFT, SBA_TYPE_MASK);
+		SBA_ENC(cmd, msg_len,
+			SBA_USER_DEF_SHIFT, SBA_USER_DEF_MASK);
+		SBA_ENC(cmd, 0x1, SBA_RESP_SHIFT, SBA_RESP_MASK);
+		c_mdata = sba_cmd_write_c_mdata(1);
+		SBA_ENC(cmd, SBA_C_MDATA_LS(c_mdata),
+			SBA_C_MDATA_SHIFT, SBA_C_MDATA_MASK);
+		SBA_ENC(cmd, SBA_CMD_WRITE_BUFFER,
+			SBA_CMD_SHIFT, SBA_CMD_MASK);
+		cmdsp->cmd = cmd;
+		*cmdsp->cmd_dma = cpu_to_le64(cmd);
+		cmdsp->flags = BRCM_SBA_CMD_TYPE_A;
+		if (req->sba->hw_resp_size) {
+			cmdsp->flags |= BRCM_SBA_CMD_HAS_RESP;
+			cmdsp->resp = req->resp_dma;
+			cmdsp->resp_len = req->sba->hw_resp_size;
+		}
+		cmdsp->flags |= BRCM_SBA_CMD_HAS_OUTPUT;
+		cmdsp->data = *dst_q + msg_offset;
+		cmdsp->data_len = msg_len;
+		cmdsp++;
+	}
+
+	/* Fillup brcm_message */
+	msg->type = BRCM_MESSAGE_SBA;
+	msg->sba.cmds = cmds;
+	msg->sba.cmds_count = cmdsp - cmds;
+	msg->ctx = req;
+	msg->error = 0;
+
+	return cmdsp - cmds;
+}
+
+struct sba_request *
+sba_prep_dma_pq_req(struct sba_device *sba,
+		    dma_addr_t off, dma_addr_t *dst, dma_addr_t *src,
+		    u32 src_cnt, const u8 *scf, size_t len, unsigned long flags)
+{
+	size_t dst_count, msg_len;
+	unsigned int msgs_count = 0, cmds_count, cmds_idx = 0;
+	dma_addr_t *dst_p = NULL, *dst_q = NULL;
+	dma_addr_t msg_offset = 0;
+	struct sba_request *req = NULL;
+
+	/* Figure-out P and Q destination addresses */
+	dst_count = 0;
+	if (!(flags & DMA_PREP_PQ_DISABLE_P))
+		dst_p = &dst[dst_count++];
+	if (!(flags & DMA_PREP_PQ_DISABLE_Q))
+		dst_q = &dst[dst_count++];
+
+	/* Alloc new request */
+	req = sba_alloc_request(sba);
+	if (!req)
+		return NULL;
+	req->fence = (flags & DMA_PREP_FENCE) ? true : false;
+
+	/* Fillup request messages */
+	while (len) {
+		msg_len = (len < sba->hw_buf_size) ? len : sba->hw_buf_size;
+		cmds_count = sba_fillup_pq_msg(req, dmaf_continue(flags),
+				    &req->cmds[cmds_idx],
+				    &req->msgs[msgs_count],
+				    off + msg_offset, msg_len,
+				    dst_p, dst_q, scf, src, src_cnt);
+		msgs_count++;
+		cmds_idx += cmds_count;
+		msg_offset += msg_len;
+		len -= msg_len;
+	}
+	req->bmsg.type = BRCM_MESSAGE_BATCH;
+	req->bmsg.batch.msgs = &req->msgs[0];
+	req->bmsg.batch.msgs_queued = 0;
+	req->bmsg.batch.msgs_count = msgs_count;
+	req->bmsg.ctx = req;
+	req->bmsg.error = 0;
+	atomic_set(&req->msgs_pending_count, msgs_count);
+
+	/* Init async_tx descriptor */
+	req->tx.flags = flags;
+	req->tx.cookie = -EBUSY;
+
+	return req;
+}
+
+static struct dma_async_tx_descriptor *
+sba_prep_dma_pq(struct dma_chan *dchan, dma_addr_t *dst, dma_addr_t *src,
+		u32 src_cnt, const u8 *scf, size_t len, unsigned long flags)
+{
+	u32 i;
+	size_t req_len;
+	dma_addr_t off = 0;
+	struct sba_device *sba = to_sba_device(dchan);
+	struct sba_request *first = NULL, *req;
+
+	/* Sanity checks */
+	if (unlikely(src_cnt > sba->max_pq_srcs))
+		return NULL;
+	for (i = 0; i < src_cnt; i++)
+		if (sba->max_pq_coefs <= raid6_gflog[scf[i]])
+			return NULL;
+
+	/* Create chained requests where each request is upto req_size */
+	while (len) {
+		req_len = (len < sba->req_size) ? len : sba->req_size;
+
+		req = sba_prep_dma_pq_req(sba, off, dst, src, src_cnt,
+					  scf, req_len, flags);
+		if (!req) {
+			if (first)
+				sba_free_chained_requests(first);
+			return NULL;
+		}
+
+		if (first)
+			sba_chain_request(first, req);
+		else
+			first = req;
+
+		off += req_len;
+		len -= req_len;
+	}
+
+	return (first) ? &first->tx : NULL;
+}
+
+/* ====== Mailbox callbacks ===== */
+
+static void sba_dma_tx_actions(struct sba_request *req)
+{
+	struct dma_async_tx_descriptor *tx = &req->tx;
+
+	WARN_ON(tx->cookie < 0);
+
+	if (tx->cookie > 0) {
+		dma_cookie_complete(tx);
+
+		/* call the callback (must not sleep or submit new
+		 * operations to this channel)
+		 */
+		if (tx->callback)
+			tx->callback(tx->callback_param);
+
+		dma_descriptor_unmap(tx);
+	}
+
+	/* run dependent operations */
+	dma_run_dependencies(tx);
+
+	/* If waiting for 'ack' then move to completed list */
+	if (!async_tx_test_ack(&req->tx))
+		sba_complete_chained_requests(req);
+	else
+		sba_free_chained_requests(req);
+}
+
+static void sba_receive_message(struct mbox_client *cl, void *msg)
+{
+	unsigned long flags;
+	struct brcm_message *m = msg;
+	struct sba_request *req = m->ctx, *req1;
+	struct sba_device *sba = req->sba;
+
+	/*  error count if message has error */
+	if (m->error < 0) {
+		dev_err(sba->dev, "%s got message with error %d",
+			dma_chan_name(&sba->dma_chan), m->error);
+	}
+
+	/* Wait for all messages of a request to be completed */
+	if (atomic_dec_return(&req->msgs_pending_count))
+		return;
+
+	/* Wait for all chained request to be completed */
+	if (atomic_dec_return(&req->first->next_pending_count))
+		return;
+
+	/* Point to first request */
+	req = req->first;
+
+	/* Update request */
+	if (req->state == SBA_REQUEST_STATE_ACTIVE)
+		sba_dma_tx_actions(req);
+	else
+		sba_free_chained_requests(req);
+
+	spin_lock_irqsave(&sba->reqs_lock, flags);
+
+	/* Re-check all completed request waiting for 'ack' */
+	list_for_each_entry_safe(req, req1, &sba->reqs_completed_list, node) {
+		spin_unlock_irqrestore(&sba->reqs_lock, flags);
+		sba_dma_tx_actions(req);
+		spin_lock_irqsave(&sba->reqs_lock, flags);
+	}
+
+	spin_unlock_irqrestore(&sba->reqs_lock, flags);
+
+	/* Try to submit pending request */
+	sba_issue_pending(&sba->dma_chan);
+}
+
+/* ====== Platform driver routines ===== */
+
+static int sba_prealloc_channel_resources(struct sba_device *sba)
+{
+	int i, j, p, ret = 0;
+	struct sba_request *req = NULL;
+
+	sba->resp_base = dma_alloc_coherent(sba->dma_dev.dev,
+					    sba->max_resp_pool_size,
+					    &sba->resp_dma_base, GFP_KERNEL);
+	if (!sba->resp_base)
+		return -ENOMEM;
+
+	sba->cmds_base = dma_alloc_coherent(sba->dma_dev.dev,
+					    sba->max_cmds_pool_size,
+					    &sba->cmds_dma_base, GFP_KERNEL);
+	if (!sba->cmds_base) {
+		ret = -ENOMEM;
+		goto fail_free_resp_pool;
+	}
+
+	spin_lock_init(&sba->reqs_lock);
+	sba->reqs_fence = false;
+	INIT_LIST_HEAD(&sba->reqs_alloc_list);
+	INIT_LIST_HEAD(&sba->reqs_pending_list);
+	INIT_LIST_HEAD(&sba->reqs_active_list);
+	INIT_LIST_HEAD(&sba->reqs_completed_list);
+	INIT_LIST_HEAD(&sba->reqs_aborted_list);
+	INIT_LIST_HEAD(&sba->reqs_free_list);
+
+	sba->reqs = devm_kcalloc(sba->dev, sba->max_req,
+				 sizeof(*req), GFP_KERNEL);
+	if (!sba->reqs) {
+		ret = -ENOMEM;
+		goto fail_free_cmds_pool;
+	}
+
+	for (i = 0, p = 0; i < sba->max_req; i++) {
+		req = &sba->reqs[i];
+		INIT_LIST_HEAD(&req->node);
+		req->sba = sba;
+		req->state = SBA_REQUEST_STATE_FREE;
+		INIT_LIST_HEAD(&req->next);
+		req->next_count = 1;
+		atomic_set(&req->next_pending_count, 0);
+		req->fence = false;
+		req->resp = sba->resp_base + p;
+		req->resp_dma = sba->resp_dma_base + p;
+		p += sba->hw_resp_size;
+		req->cmds = devm_kcalloc(sba->dev, sba->max_cmd_per_req,
+					 sizeof(*req->cmds), GFP_KERNEL);
+		if (!req->cmds) {
+			ret = -ENOMEM;
+			goto fail_free_cmds_pool;
+		}
+		for (j = 0; j < sba->max_cmd_per_req; j++) {
+			req->cmds[j].cmd = 0;
+			req->cmds[j].cmd_dma = sba->cmds_base +
+				(i * sba->max_cmd_per_req + j) * sizeof(u64);
+			req->cmds[j].cmd_dma_addr = sba->cmds_dma_base +
+				(i * sba->max_cmd_per_req + j) * sizeof(u64);
+			req->cmds[j].flags = 0;
+		}
+		req->msgs = devm_kcalloc(sba->dev, sba->max_msg_per_req,
+					 sizeof(*req->msgs), GFP_KERNEL);
+		if (!req->msgs) {
+			ret = -ENOMEM;
+			goto fail_free_cmds_pool;
+		}
+		memset(&req->bmsg, 0, sizeof(req->bmsg));
+		atomic_set(&req->msgs_pending_count, 0);
+		dma_async_tx_descriptor_init(&req->tx, &sba->dma_chan);
+		req->tx.tx_submit = sba_tx_submit;
+		req->tx.phys = req->resp_dma;
+		list_add_tail(&req->node, &sba->reqs_free_list);
+	}
+
+	sba->reqs_free_count = sba->max_req;
+
+	return 0;
+
+fail_free_cmds_pool:
+	dma_free_coherent(sba->dma_dev.dev,
+			  sba->max_cmds_pool_size,
+			  sba->cmds_base, sba->cmds_dma_base);
+fail_free_resp_pool:
+	dma_free_coherent(sba->dma_dev.dev,
+			  sba->max_resp_pool_size,
+			  sba->resp_base, sba->resp_dma_base);
+	return ret;
+}
+
+static void sba_freeup_channel_resources(struct sba_device *sba)
+{
+	dmaengine_terminate_all(&sba->dma_chan);
+	dma_free_coherent(sba->dma_dev.dev, sba->max_cmds_pool_size,
+			  sba->cmds_base, sba->cmds_dma_base);
+	dma_free_coherent(sba->dma_dev.dev, sba->max_resp_pool_size,
+			  sba->resp_base, sba->resp_dma_base);
+	sba->resp_base = NULL;
+	sba->resp_dma_base = 0;
+}
+
+static int sba_async_register(struct sba_device *sba)
+{
+	int ret;
+	struct dma_device *dma_dev = &sba->dma_dev;
+
+	/* Initialize DMA channel cookie */
+	sba->dma_chan.device = dma_dev;
+	dma_cookie_init(&sba->dma_chan);
+
+	/* Initialize DMA device capability mask */
+	dma_cap_zero(dma_dev->cap_mask);
+	dma_cap_set(DMA_MEMCPY, dma_dev->cap_mask);
+	dma_cap_set(DMA_XOR, dma_dev->cap_mask);
+	dma_cap_set(DMA_PQ, dma_dev->cap_mask);
+
+	/*
+	 * Set mailbox channel device as the base device of
+	 * our dma_device because the actual memory accesses
+	 * will be done by mailbox controller
+	 */
+	dma_dev->dev = sba->mbox_dev;
+
+	/* Set base prep routines */
+	dma_dev->device_free_chan_resources = sba_free_chan_resources;
+	dma_dev->device_terminate_all = sba_device_terminate_all;
+	dma_dev->device_issue_pending = sba_issue_pending;
+	dma_dev->device_tx_status = sba_tx_status;
+
+	/* Set memcpy routines and capability */
+	if (dma_has_cap(DMA_MEMCPY, dma_dev->cap_mask))
+		dma_dev->device_prep_dma_memcpy = sba_prep_dma_memcpy;
+
+	/* Set xor routines and capability */
+	if (dma_has_cap(DMA_XOR, dma_dev->cap_mask)) {
+		dma_dev->device_prep_dma_xor = sba_prep_dma_xor;
+		dma_dev->max_xor = sba->max_xor_srcs;
+	}
+
+	/* Set pq routines and capability */
+	if (dma_has_cap(DMA_PQ, dma_dev->cap_mask)) {
+		dma_dev->device_prep_dma_pq = sba_prep_dma_pq;
+		dma_set_maxpq(dma_dev, sba->max_pq_srcs, 0);
+		dma_set_maxpqcoef(dma_dev, sba->max_pq_coefs);
+	}
+
+	/* Initialize DMA device channel list */
+	INIT_LIST_HEAD(&dma_dev->channels);
+	list_add_tail(&sba->dma_chan.device_node, &dma_dev->channels);
+
+	/* Register with Linux async DMA framework*/
+	ret = dma_async_device_register(dma_dev);
+	if (ret) {
+		dev_err(sba->dev, "async device register error %d", ret);
+		return ret;
+	}
+
+	dev_info(sba->dev, "%s capabilities: %s%s%s\n",
+		 dma_chan_name(&sba->dma_chan),
+		 dma_has_cap(DMA_MEMCPY, dma_dev->cap_mask) ? "memcpy " : "",
+		 dma_has_cap(DMA_XOR, dma_dev->cap_mask) ? "xor " : "",
+		 dma_has_cap(DMA_PQ, dma_dev->cap_mask) ? "pq " : "");
+
+	return 0;
+}
+
+static int sba_probe(struct platform_device *pdev)
+{
+	int i, ret = 0, mchans_count;
+	struct sba_device *sba;
+	struct platform_device *mbox_pdev;
+	struct of_phandle_args args;
+
+	/* Allocate main SBA struct */
+	sba = devm_kzalloc(&pdev->dev, sizeof(*sba), GFP_KERNEL);
+	if (!sba)
+		return -ENOMEM;
+
+	sba->dev = &pdev->dev;
+	platform_set_drvdata(pdev, sba);
+
+	/* Determine SBA version from DT compatible string */
+	if (of_device_is_compatible(sba->dev->of_node, "brcm,iproc-sba"))
+		sba->ver = SBA_VER_1;
+	else if (of_device_is_compatible(sba->dev->of_node,
+					 "brcm,iproc-sba-v2"))
+		sba->ver = SBA_VER_2;
+	else
+		return -ENODEV;
+
+	/* Derived Configuration parameters */
+	switch (sba->ver) {
+	case SBA_VER_1:
+		sba->max_req = 256;
+		sba->req_size = PAGE_SIZE;
+		sba->hw_buf_size = 4096;
+		sba->hw_resp_size = 8;
+		sba->max_pq_coefs = 6;
+		sba->max_pq_srcs = 6;
+		break;
+	case SBA_VER_2:
+		sba->max_req = 256;
+		sba->req_size = PAGE_SIZE;
+		sba->hw_buf_size = 4096;
+		sba->hw_resp_size = 8;
+		sba->max_pq_coefs = 30;
+		/*
+		 * We can support max_pq_srcs == max_pq_coefs because
+		 * we are limited by number of SBA commands that we can
+		 * fit in one message for underlying ring manager HW.
+		 */
+		sba->max_pq_srcs = 12;
+		break;
+	default:
+		return -EINVAL;
+	}
+	sba->max_msg_per_req = sba->req_size / sba->hw_buf_size;
+	if ((sba->max_msg_per_req * sba->hw_buf_size) < sba->req_size)
+		sba->max_msg_per_req++;
+	sba->max_cmd_per_msg = sba->max_pq_srcs + 3;
+	sba->max_cmd_per_req = sba->max_msg_per_req * sba->max_cmd_per_msg;
+	sba->max_xor_srcs = sba->max_cmd_per_msg - 1;
+	sba->max_resp_pool_size = sba->max_req * sba->hw_resp_size;
+	sba->max_cmds_pool_size = sba->max_req *
+				  sba->max_cmd_per_req * sizeof(u64);
+
+	/* Setup mailbox client */
+	sba->client.dev			= &pdev->dev;
+	sba->client.rx_callback		= sba_receive_message;
+	sba->client.tx_block		= false;
+	sba->client.knows_txdone	= false;
+	sba->client.tx_tout		= 0;
+
+	/* Number of channels equals number of mailbox channels */
+	ret = of_count_phandle_with_args(pdev->dev.of_node,
+					 "mboxes", "#mbox-cells");
+	if (ret <= 0)
+		return -ENODEV;
+	mchans_count = ret;
+	sba->mchans_count = 0;
+	atomic_set(&sba->mchans_current, 0);
+
+	/* Allocate mailbox channel array */
+	sba->mchans = devm_kcalloc(&pdev->dev, sba->mchans_count,
+				   sizeof(*sba->mchans), GFP_KERNEL);
+	if (!sba->mchans)
+		return -ENOMEM;
+
+	/* Request mailbox channels */
+	for (i = 0; i < mchans_count; i++) {
+		sba->mchans[i] = mbox_request_channel(&sba->client, i);
+		if (IS_ERR(sba->mchans[i])) {
+			ret = PTR_ERR(sba->mchans[i]);
+			goto fail_free_mchans;
+		}
+		sba->mchans_count++;
+	}
+
+	/* Find-out underlying mailbox device */
+	ret = of_parse_phandle_with_args(pdev->dev.of_node,
+					 "mboxes", "#mbox-cells", 0, &args);
+	if (ret)
+		goto fail_free_mchans;
+	mbox_pdev = of_find_device_by_node(args.np);
+	of_node_put(args.np);
+	if (!mbox_pdev) {
+		ret = -ENODEV;
+		goto fail_free_mchans;
+	}
+	sba->mbox_dev = &mbox_pdev->dev;
+
+	/* All mailbox channels should be of same ring manager device */
+	for (i = 1; i < mchans_count; i++) {
+		ret = of_parse_phandle_with_args(pdev->dev.of_node,
+					 "mboxes", "#mbox-cells", i, &args);
+		if (ret)
+			goto fail_free_mchans;
+		mbox_pdev = of_find_device_by_node(args.np);
+		of_node_put(args.np);
+		if (sba->mbox_dev != &mbox_pdev->dev) {
+			ret = -EINVAL;
+			goto fail_free_mchans;
+		}
+	}
+
+	/* Register DMA device with linux async framework */
+	ret = sba_async_register(sba);
+	if (ret)
+		goto fail_free_mchans;
+
+	/* Prealloc channel resource */
+	ret = sba_prealloc_channel_resources(sba);
+	if (ret)
+		goto fail_async_dev_unreg;
+
+	/* Print device info */
+	dev_info(sba->dev, "%s using SBAv%d and %d mailbox channels",
+		 dma_chan_name(&sba->dma_chan), sba->ver+1,
+		 sba->mchans_count);
+
+	return 0;
+
+fail_async_dev_unreg:
+	dma_async_device_unregister(&sba->dma_dev);
+fail_free_mchans:
+	for (i = 0; i < sba->mchans_count; i++)
+		mbox_free_channel(sba->mchans[i]);
+	return ret;
+}
+
+static int sba_remove(struct platform_device *pdev)
+{
+	int i;
+	struct sba_device *sba = platform_get_drvdata(pdev);
+
+	sba_freeup_channel_resources(sba);
+
+	dma_async_device_unregister(&sba->dma_dev);
+
+	for (i = 0; i < sba->mchans_count; i++)
+		mbox_free_channel(sba->mchans[i]);
+
+	return 0;
+}
+
+static const struct of_device_id sba_of_match[] = {
+	{ .compatible = "brcm,iproc-sba", },
+	{ .compatible = "brcm,iproc-sba-v2", },
+	{},
+};
+MODULE_DEVICE_TABLE(of, sba_of_match);
+
+static struct platform_driver sba_driver = {
+	.probe = sba_probe,
+	.remove = sba_remove,
+	.driver = {
+		.name = "bcm-sba-raid",
+		.of_match_table = sba_of_match,
+	},
+};
+module_platform_driver(sba_driver);
+
+MODULE_DESCRIPTION("Broadcom SBA RAID driver");
+MODULE_AUTHOR("Anup Patel <anup.patel@broadcom.com>");
+MODULE_LICENSE("GPL v2");
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v2 5/5] dt-bindings: Add DT bindings document for Broadcom SBA RAID driver
  2017-02-07  8:16 [PATCH v2 0/5] Broadcom SBA RAID support Anup Patel
                   ` (2 preceding siblings ...)
  2017-02-07  8:16 ` [PATCH v2 4/5] dmaengine: Add Broadcom SBA RAID driver Anup Patel
@ 2017-02-07  8:16 ` Anup Patel
  3 siblings, 0 replies; 16+ messages in thread
From: Anup Patel @ 2017-02-07  8:16 UTC (permalink / raw)
  To: Vinod Koul, Rob Herring, Mark Rutland, Herbert Xu,
	David S . Miller, Jassi Brar
  Cc: Dan Williams, Ray Jui, Scott Branden, Jon Mason, Rob Rice,
	bcm-kernel-feedback-list, dmaengine, devicetree,
	linux-arm-kernel, linux-kernel, linux-crypto, linux-raid,
	Anup Patel

This patch adds the DT bindings document for newly added Broadcom
SBA RAID driver.

Signed-off-by: Anup Patel <anup.patel@broadcom.com>
Reviewed-by: Ray Jui <ray.jui@broadcom.com>
Reviewed-by: Scott Branden <scott.branden@broadcom.com>
---
 .../devicetree/bindings/dma/brcm,iproc-sba.txt     | 29 ++++++++++++++++++++++
 1 file changed, 29 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/dma/brcm,iproc-sba.txt

diff --git a/Documentation/devicetree/bindings/dma/brcm,iproc-sba.txt b/Documentation/devicetree/bindings/dma/brcm,iproc-sba.txt
new file mode 100644
index 0000000..092913a
--- /dev/null
+++ b/Documentation/devicetree/bindings/dma/brcm,iproc-sba.txt
@@ -0,0 +1,29 @@
+* Broadcom SBA RAID engine
+
+Required properties:
+- compatible: Should be one of the following
+	      "brcm,iproc-sba"
+	      "brcm,iproc-sba-v2"
+  The "brcm,iproc-sba" has support for only 6 PQ coefficients
+  The "brcm,iproc-sba-v2" has support for only 30 PQ coefficients
+- mboxes: List of phandle and mailbox channel specifiers
+
+Example:
+
+raid_mbox: mbox@67400000 {
+	...
+	#mbox-cells = <3>;
+	...
+};
+
+raid0 {
+	compatible = "brcm,iproc-sba-v2";
+	mboxes = <&raid_mbox 0 0x1 0xffff>,
+		 <&raid_mbox 1 0x1 0xffff>,
+		 <&raid_mbox 2 0x1 0xffff>,
+		 <&raid_mbox 3 0x1 0xffff>,
+		 <&raid_mbox 4 0x1 0xffff>,
+		 <&raid_mbox 5 0x1 0xffff>,
+		 <&raid_mbox 6 0x1 0xffff>,
+		 <&raid_mbox 7 0x1 0xffff>;
+};
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 2/5] async_tx: Handle DMA devices having support for fewer PQ coefficients
  2017-02-07  8:16   ` [PATCH v2 2/5] async_tx: Handle DMA devices having support for fewer PQ coefficients Anup Patel
@ 2017-02-07  8:27     ` Dan Williams
  2017-02-07  9:02       ` Anup Patel
  0 siblings, 1 reply; 16+ messages in thread
From: Dan Williams @ 2017-02-07  8:27 UTC (permalink / raw)
  To: Anup Patel
  Cc: Vinod Koul, Rob Herring, Mark Rutland, Herbert Xu,
	David S . Miller, Jassi Brar, Ray Jui, Scott Branden, Jon Mason,
	Rob Rice, BCM Kernel Feedback, dmaengine, Device Tree,
	linux-arm-kernel, linux-kernel, linux-crypto, linux-raid

On Tue, Feb 7, 2017 at 12:16 AM, Anup Patel <anup.patel@broadcom.com> wrote:
> The DMAENGINE framework assumes that if PQ offload is supported by a
> DMA device then all 256 PQ coefficients are supported. This assumption
> does not hold anymore because we now have BCM-SBA-RAID offload engine
> which supports PQ offload with limited number of PQ coefficients.
>
> This patch extends async_tx APIs to handle DMA devices with support
> for fewer PQ coefficients.
>
> Signed-off-by: Anup Patel <anup.patel@broadcom.com>
> Reviewed-by: Scott Branden <scott.branden@broadcom.com>

I don't like this approach. Define an interface for md to query the
offload engine once at the beginning of time. We should not be adding
any new extensions to async_tx.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 2/5] async_tx: Handle DMA devices having support for fewer PQ coefficients
  2017-02-07  8:27     ` Dan Williams
@ 2017-02-07  9:02       ` Anup Patel
  2017-02-07 16:42         ` Vinod Koul
  2017-02-07 18:16         ` Dan Williams
  0 siblings, 2 replies; 16+ messages in thread
From: Anup Patel @ 2017-02-07  9:02 UTC (permalink / raw)
  To: Dan Williams
  Cc: Mark Rutland, Device Tree, Herbert Xu, Scott Branden, Vinod Koul,
	Ray Jui, Jassi Brar, linux-kernel, linux-raid, Jon Mason,
	Rob Herring, BCM Kernel Feedback, linux-crypto, Rob Rice,
	dmaengine, David S . Miller, linux-arm-kernel

On Tue, Feb 7, 2017 at 1:57 PM, Dan Williams <dan.j.williams@intel.com> wrote:
> On Tue, Feb 7, 2017 at 12:16 AM, Anup Patel <anup.patel@broadcom.com> wrote:
>> The DMAENGINE framework assumes that if PQ offload is supported by a
>> DMA device then all 256 PQ coefficients are supported. This assumption
>> does not hold anymore because we now have BCM-SBA-RAID offload engine
>> which supports PQ offload with limited number of PQ coefficients.
>>
>> This patch extends async_tx APIs to handle DMA devices with support
>> for fewer PQ coefficients.
>>
>> Signed-off-by: Anup Patel <anup.patel@broadcom.com>
>> Reviewed-by: Scott Branden <scott.branden@broadcom.com>
>
> I don't like this approach. Define an interface for md to query the
> offload engine once at the beginning of time. We should not be adding
> any new extensions to async_tx.

Even if we do capability checks in Linux MD, we still need a way
for DMAENGINE drivers to advertise number of PQ coefficients
handled by the HW.

I agree capability checks should be done once in Linux MD but I don't
see why this has to be part of BCM-SBA-RAID driver patches. We need
separate patchsets to address limitations of async_tx framework.

Regards,
Anup

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 2/5] async_tx: Handle DMA devices having support for fewer PQ coefficients
  2017-02-07  9:02       ` Anup Patel
@ 2017-02-07 16:42         ` Vinod Koul
  2017-02-08  6:37           ` Anup Patel
  2017-02-07 18:16         ` Dan Williams
  1 sibling, 1 reply; 16+ messages in thread
From: Vinod Koul @ 2017-02-07 16:42 UTC (permalink / raw)
  To: Anup Patel
  Cc: Dan Williams, Rob Herring, Mark Rutland, Herbert Xu,
	David S . Miller, Jassi Brar, Ray Jui, Scott Branden, Jon Mason,
	Rob Rice, BCM Kernel Feedback, dmaengine, Device Tree,
	linux-arm-kernel, linux-kernel, linux-crypto, linux-raid

On Tue, Feb 07, 2017 at 02:32:15PM +0530, Anup Patel wrote:
> On Tue, Feb 7, 2017 at 1:57 PM, Dan Williams <dan.j.williams@intel.com> wrote:
> > On Tue, Feb 7, 2017 at 12:16 AM, Anup Patel <anup.patel@broadcom.com> wrote:
> >> The DMAENGINE framework assumes that if PQ offload is supported by a
> >> DMA device then all 256 PQ coefficients are supported. This assumption
> >> does not hold anymore because we now have BCM-SBA-RAID offload engine
> >> which supports PQ offload with limited number of PQ coefficients.
> >>
> >> This patch extends async_tx APIs to handle DMA devices with support
> >> for fewer PQ coefficients.
> >>
> >> Signed-off-by: Anup Patel <anup.patel@broadcom.com>
> >> Reviewed-by: Scott Branden <scott.branden@broadcom.com>
> >
> > I don't like this approach. Define an interface for md to query the
> > offload engine once at the beginning of time. We should not be adding
> > any new extensions to async_tx.
> 
> Even if we do capability checks in Linux MD, we still need a way
> for DMAENGINE drivers to advertise number of PQ coefficients
> handled by the HW.

If the question is only for advertising caps, then why not do as done
for dma_get_slave_caps(). you can add dma_get_pq_caps() so that clients (md)
in this case would know the HW capability.

> I agree capability checks should be done once in Linux MD but I don't
> see why this has to be part of BCM-SBA-RAID driver patches. We need
> separate patchsets to address limitations of async_tx framework.
> 
> Regards,
> Anup

-- 
~Vinod

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 2/5] async_tx: Handle DMA devices having support for fewer PQ coefficients
  2017-02-07  9:02       ` Anup Patel
  2017-02-07 16:42         ` Vinod Koul
@ 2017-02-07 18:16         ` Dan Williams
  2017-02-08  8:57           ` Anup Patel
  1 sibling, 1 reply; 16+ messages in thread
From: Dan Williams @ 2017-02-07 18:16 UTC (permalink / raw)
  To: Anup Patel
  Cc: Vinod Koul, Rob Herring, Mark Rutland, Herbert Xu,
	David S . Miller, Jassi Brar, Ray Jui, Scott Branden, Jon Mason,
	Rob Rice, BCM Kernel Feedback, dmaengine, Device Tree,
	linux-arm-kernel, linux-kernel, linux-crypto, linux-raid

On Tue, Feb 7, 2017 at 1:02 AM, Anup Patel <anup.patel@broadcom.com> wrote:
> On Tue, Feb 7, 2017 at 1:57 PM, Dan Williams <dan.j.williams@intel.com> wrote:
>> On Tue, Feb 7, 2017 at 12:16 AM, Anup Patel <anup.patel@broadcom.com> wrote:
>>> The DMAENGINE framework assumes that if PQ offload is supported by a
>>> DMA device then all 256 PQ coefficients are supported. This assumption
>>> does not hold anymore because we now have BCM-SBA-RAID offload engine
>>> which supports PQ offload with limited number of PQ coefficients.
>>>
>>> This patch extends async_tx APIs to handle DMA devices with support
>>> for fewer PQ coefficients.
>>>
>>> Signed-off-by: Anup Patel <anup.patel@broadcom.com>
>>> Reviewed-by: Scott Branden <scott.branden@broadcom.com>
>>
>> I don't like this approach. Define an interface for md to query the
>> offload engine once at the beginning of time. We should not be adding
>> any new extensions to async_tx.
>
> Even if we do capability checks in Linux MD, we still need a way
> for DMAENGINE drivers to advertise number of PQ coefficients
> handled by the HW.
>
> I agree capability checks should be done once in Linux MD but I don't
> see why this has to be part of BCM-SBA-RAID driver patches. We need
> separate patchsets to address limitations of async_tx framework.

Right, separate enabling before we pile on new hardware support to a
known broken framework.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 2/5] async_tx: Handle DMA devices having support for fewer PQ coefficients
  2017-02-07 16:42         ` Vinod Koul
@ 2017-02-08  6:37           ` Anup Patel
  0 siblings, 0 replies; 16+ messages in thread
From: Anup Patel @ 2017-02-08  6:37 UTC (permalink / raw)
  To: Vinod Koul
  Cc: Dan Williams, Rob Herring, Mark Rutland, Herbert Xu,
	David S . Miller, Jassi Brar, Ray Jui, Scott Branden, Jon Mason,
	Rob Rice, BCM Kernel Feedback, dmaengine, Device Tree,
	linux-arm-kernel, linux-kernel, linux-crypto, linux-raid

On Tue, Feb 7, 2017 at 10:12 PM, Vinod Koul <vinod.koul@intel.com> wrote:
> On Tue, Feb 07, 2017 at 02:32:15PM +0530, Anup Patel wrote:
>> On Tue, Feb 7, 2017 at 1:57 PM, Dan Williams <dan.j.williams@intel.com> wrote:
>> > On Tue, Feb 7, 2017 at 12:16 AM, Anup Patel <anup.patel@broadcom.com> wrote:
>> >> The DMAENGINE framework assumes that if PQ offload is supported by a
>> >> DMA device then all 256 PQ coefficients are supported. This assumption
>> >> does not hold anymore because we now have BCM-SBA-RAID offload engine
>> >> which supports PQ offload with limited number of PQ coefficients.
>> >>
>> >> This patch extends async_tx APIs to handle DMA devices with support
>> >> for fewer PQ coefficients.
>> >>
>> >> Signed-off-by: Anup Patel <anup.patel@broadcom.com>
>> >> Reviewed-by: Scott Branden <scott.branden@broadcom.com>
>> >
>> > I don't like this approach. Define an interface for md to query the
>> > offload engine once at the beginning of time. We should not be adding
>> > any new extensions to async_tx.
>>
>> Even if we do capability checks in Linux MD, we still need a way
>> for DMAENGINE drivers to advertise number of PQ coefficients
>> handled by the HW.
>
> If the question is only for advertising caps, then why not do as done
> for dma_get_slave_caps(). you can add dma_get_pq_caps() so that clients (md)
> in this case would know the HW capability.

We have large number of possible capabilities for
DMA slave such as src_addr_widths, dst_addr_widths,
directions, max_burst, residue_granularity, and
descriptor_resue.

The possible capabilities of PQ offload are:
1. Number of PQ sources handled by PQ offload
(Represented by "max_pq" member of "struct dma_device")
2. Number of PQ coefficients handled by PQ offload

The above two PQ capabilities are good enough for
current PQ HW and future PQ HW so we just need a
way to specify number of PQ coefficients.

Till now all of the PQ HW always supported all 256
PQ coefficients so we never felt the need of capability
to specify PQ coefficients. The BCM-SBA-RAID is the
only HW (as far as I know) which does not support all
256 PQ coefficients.

Currently, DMAENGINE drivers use dma_set_maxpq() to
specify number of PQ sources handled by PQ HW and
Linux Async Tx uses dma_maxpq() to get number of
PQ sources.

On similar lines, we added dma_set_maxpqcoef() to
specify number of PQ coefficients and Linux Async Tx
uses dma_maxpqcoef() to get number of PQ coefficients.
If DMAENGINE driver does not specify PQ coefficients
then dma_maxpqcoef() will return 256 assuming all
PQ coefficients are supported. This approach is
backward compatible to existing DMAENGINE APIs
and will not break existing DMAENGINE drivers.

If we add dma_get_pq_caps() similar to the
dma_get_slave_caps() for PQ capabilities then we
will have to use this new method for both of the above
PQ capabilities and we have to change all DMAENGINE
drivers to use new method of specifying PQ capabilities.
I think this is too intrusive and bit overkill because its
very very unlikely to see anymore additions to
PQ capabilities.

Regards,
Anup

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 2/5] async_tx: Handle DMA devices having support for fewer PQ coefficients
  2017-02-07 18:16         ` Dan Williams
@ 2017-02-08  8:57           ` Anup Patel
  2017-02-08 16:24             ` Dan Williams
  0 siblings, 1 reply; 16+ messages in thread
From: Anup Patel @ 2017-02-08  8:57 UTC (permalink / raw)
  To: Dan Williams
  Cc: Vinod Koul, Rob Herring, Mark Rutland, Herbert Xu,
	David S . Miller, Jassi Brar, Ray Jui, Scott Branden, Jon Mason,
	Rob Rice, BCM Kernel Feedback, dmaengine, Device Tree,
	linux-arm-kernel, linux-kernel, linux-crypto, linux-raid

On Tue, Feb 7, 2017 at 11:46 PM, Dan Williams <dan.j.williams@intel.com> wrote:
> On Tue, Feb 7, 2017 at 1:02 AM, Anup Patel <anup.patel@broadcom.com> wrote:
>> On Tue, Feb 7, 2017 at 1:57 PM, Dan Williams <dan.j.williams@intel.com> wrote:
>>> On Tue, Feb 7, 2017 at 12:16 AM, Anup Patel <anup.patel@broadcom.com> wrote:
>>>> The DMAENGINE framework assumes that if PQ offload is supported by a
>>>> DMA device then all 256 PQ coefficients are supported. This assumption
>>>> does not hold anymore because we now have BCM-SBA-RAID offload engine
>>>> which supports PQ offload with limited number of PQ coefficients.
>>>>
>>>> This patch extends async_tx APIs to handle DMA devices with support
>>>> for fewer PQ coefficients.
>>>>
>>>> Signed-off-by: Anup Patel <anup.patel@broadcom.com>
>>>> Reviewed-by: Scott Branden <scott.branden@broadcom.com>
>>>
>>> I don't like this approach. Define an interface for md to query the
>>> offload engine once at the beginning of time. We should not be adding
>>> any new extensions to async_tx.
>>
>> Even if we do capability checks in Linux MD, we still need a way
>> for DMAENGINE drivers to advertise number of PQ coefficients
>> handled by the HW.
>>
>> I agree capability checks should be done once in Linux MD but I don't
>> see why this has to be part of BCM-SBA-RAID driver patches. We need
>> separate patchsets to address limitations of async_tx framework.
>
> Right, separate enabling before we pile on new hardware support to a
> known broken framework.

Linux Async Tx not broken framework. The issue is:
1. Its not complete enough
2. Its not optimized for very high through-put offload engines

Regards,
Anup

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 2/5] async_tx: Handle DMA devices having support for fewer PQ coefficients
  2017-02-08  8:57           ` Anup Patel
@ 2017-02-08 16:24             ` Dan Williams
       [not found]               ` <CAPcyv4iFJXxvJFrUs2jtwP9GX5NcJ8LiEDHeZ5b1fwjCAToe5w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 16+ messages in thread
From: Dan Williams @ 2017-02-08 16:24 UTC (permalink / raw)
  To: Anup Patel
  Cc: Vinod Koul, Rob Herring, Mark Rutland, Herbert Xu,
	David S . Miller, Jassi Brar, Ray Jui, Scott Branden, Jon Mason,
	Rob Rice, BCM Kernel Feedback, dmaengine, Device Tree,
	linux-arm-kernel, linux-kernel, linux-crypto, linux-raid

On Wed, Feb 8, 2017 at 12:57 AM, Anup Patel <anup.patel@broadcom.com> wrote:
> On Tue, Feb 7, 2017 at 11:46 PM, Dan Williams <dan.j.williams@intel.com> wrote:
>> On Tue, Feb 7, 2017 at 1:02 AM, Anup Patel <anup.patel@broadcom.com> wrote:
>>> On Tue, Feb 7, 2017 at 1:57 PM, Dan Williams <dan.j.williams@intel.com> wrote:
>>>> On Tue, Feb 7, 2017 at 12:16 AM, Anup Patel <anup.patel@broadcom.com> wrote:
>>>>> The DMAENGINE framework assumes that if PQ offload is supported by a
>>>>> DMA device then all 256 PQ coefficients are supported. This assumption
>>>>> does not hold anymore because we now have BCM-SBA-RAID offload engine
>>>>> which supports PQ offload with limited number of PQ coefficients.
>>>>>
>>>>> This patch extends async_tx APIs to handle DMA devices with support
>>>>> for fewer PQ coefficients.
>>>>>
>>>>> Signed-off-by: Anup Patel <anup.patel@broadcom.com>
>>>>> Reviewed-by: Scott Branden <scott.branden@broadcom.com>
>>>>
>>>> I don't like this approach. Define an interface for md to query the
>>>> offload engine once at the beginning of time. We should not be adding
>>>> any new extensions to async_tx.
>>>
>>> Even if we do capability checks in Linux MD, we still need a way
>>> for DMAENGINE drivers to advertise number of PQ coefficients
>>> handled by the HW.
>>>
>>> I agree capability checks should be done once in Linux MD but I don't
>>> see why this has to be part of BCM-SBA-RAID driver patches. We need
>>> separate patchsets to address limitations of async_tx framework.
>>
>> Right, separate enabling before we pile on new hardware support to a
>> known broken framework.
>
> Linux Async Tx not broken framework. The issue is:
> 1. Its not complete enough
> 2. Its not optimized for very high through-put offload engines

I'm not understanding your point. I'm nak'ing this change to add yet
more per-transaction capability checking to async_tx. I don't like the
DMA_HAS_FEWER_PQ_COEF flag, especially since it is equal to
DMA_HAS_PQ_CONTINUE. I'm not asking for all of async_tx's problems to
be fixed before this new hardware support, I'm simply saying we should
start the process of moving offload-engine capability checking to the
raid code.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 2/5] async_tx: Handle DMA devices having support for fewer PQ coefficients
       [not found]               ` <CAPcyv4iFJXxvJFrUs2jtwP9GX5NcJ8LiEDHeZ5b1fwjCAToe5w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2017-02-09  9:29                 ` Anup Patel
  2017-02-09 16:44                   ` Dan Williams
  0 siblings, 1 reply; 16+ messages in thread
From: Anup Patel @ 2017-02-09  9:29 UTC (permalink / raw)
  To: Dan Williams
  Cc: Vinod Koul, Rob Herring, Mark Rutland, Herbert Xu,
	David S . Miller, Jassi Brar, Ray Jui, Scott Branden, Jon Mason,
	Rob Rice, BCM Kernel Feedback, dmaengine-u79uwXL29TY76Z2rM5mHXA,
	Device Tree, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-crypto-u79uwXL29TY76Z2rM5mHXA, linux-raid

On Wed, Feb 8, 2017 at 9:54 PM, Dan Williams <dan.j.williams-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> wrote:
> On Wed, Feb 8, 2017 at 12:57 AM, Anup Patel <anup.patel-dY08KVG/lbpWk0Htik3J/w@public.gmane.org> wrote:
>> On Tue, Feb 7, 2017 at 11:46 PM, Dan Williams <dan.j.williams-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> wrote:
>>> On Tue, Feb 7, 2017 at 1:02 AM, Anup Patel <anup.patel-dY08KVG/lbpWk0Htik3J/w@public.gmane.org> wrote:
>>>> On Tue, Feb 7, 2017 at 1:57 PM, Dan Williams <dan.j.williams-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> wrote:
>>>>> On Tue, Feb 7, 2017 at 12:16 AM, Anup Patel <anup.patel-dY08KVG/lbpWk0Htik3J/w@public.gmane.org> wrote:
>>>>>> The DMAENGINE framework assumes that if PQ offload is supported by a
>>>>>> DMA device then all 256 PQ coefficients are supported. This assumption
>>>>>> does not hold anymore because we now have BCM-SBA-RAID offload engine
>>>>>> which supports PQ offload with limited number of PQ coefficients.
>>>>>>
>>>>>> This patch extends async_tx APIs to handle DMA devices with support
>>>>>> for fewer PQ coefficients.
>>>>>>
>>>>>> Signed-off-by: Anup Patel <anup.patel-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>
>>>>>> Reviewed-by: Scott Branden <scott.branden-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>
>>>>>
>>>>> I don't like this approach. Define an interface for md to query the
>>>>> offload engine once at the beginning of time. We should not be adding
>>>>> any new extensions to async_tx.
>>>>
>>>> Even if we do capability checks in Linux MD, we still need a way
>>>> for DMAENGINE drivers to advertise number of PQ coefficients
>>>> handled by the HW.
>>>>
>>>> I agree capability checks should be done once in Linux MD but I don't
>>>> see why this has to be part of BCM-SBA-RAID driver patches. We need
>>>> separate patchsets to address limitations of async_tx framework.
>>>
>>> Right, separate enabling before we pile on new hardware support to a
>>> known broken framework.
>>
>> Linux Async Tx not broken framework. The issue is:
>> 1. Its not complete enough
>> 2. Its not optimized for very high through-put offload engines
>
> I'm not understanding your point. I'm nak'ing this change to add yet
> more per-transaction capability checking to async_tx. I don't like the
> DMA_HAS_FEWER_PQ_COEF flag, especially since it is equal to
> DMA_HAS_PQ_CONTINUE. I'm not asking for all of async_tx's problems to
> be fixed before this new hardware support, I'm simply saying we should
> start the process of moving offload-engine capability checking to the
> raid code.

The DMA_HAS_FEWER_PQ_COEF is not equal to
DMA_HAS_PQ_CONTINUE.

I will try to drop this patch and take care of unsupported PQ
coefficients in BCM-SBA-RAID driver itself even if this means
doing some computations in BCM-SBA-RAID driver itself.

Regards,
Anup
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 2/5] async_tx: Handle DMA devices having support for fewer PQ coefficients
  2017-02-09  9:29                 ` Anup Patel
@ 2017-02-09 16:44                   ` Dan Williams
  2017-02-10  3:24                     ` Anup Patel
  0 siblings, 1 reply; 16+ messages in thread
From: Dan Williams @ 2017-02-09 16:44 UTC (permalink / raw)
  To: Anup Patel
  Cc: Vinod Koul, Rob Herring, Mark Rutland, Herbert Xu,
	David S . Miller, Jassi Brar, Ray Jui, Scott Branden, Jon Mason,
	Rob Rice, BCM Kernel Feedback, dmaengine, Device Tree,
	linux-arm-kernel, linux-kernel, linux-crypto, linux-raid

On Thu, Feb 9, 2017 at 1:29 AM, Anup Patel <anup.patel@broadcom.com> wrote:
> On Wed, Feb 8, 2017 at 9:54 PM, Dan Williams <dan.j.williams@intel.com> wrote:
>> On Wed, Feb 8, 2017 at 12:57 AM, Anup Patel <anup.patel@broadcom.com> wrote:
>>> On Tue, Feb 7, 2017 at 11:46 PM, Dan Williams <dan.j.williams@intel.com> wrote:
>>>> On Tue, Feb 7, 2017 at 1:02 AM, Anup Patel <anup.patel@broadcom.com> wrote:
>>>>> On Tue, Feb 7, 2017 at 1:57 PM, Dan Williams <dan.j.williams@intel.com> wrote:
>>>>>> On Tue, Feb 7, 2017 at 12:16 AM, Anup Patel <anup.patel@broadcom.com> wrote:
>>>>>>> The DMAENGINE framework assumes that if PQ offload is supported by a
>>>>>>> DMA device then all 256 PQ coefficients are supported. This assumption
>>>>>>> does not hold anymore because we now have BCM-SBA-RAID offload engine
>>>>>>> which supports PQ offload with limited number of PQ coefficients.
>>>>>>>
>>>>>>> This patch extends async_tx APIs to handle DMA devices with support
>>>>>>> for fewer PQ coefficients.
>>>>>>>
>>>>>>> Signed-off-by: Anup Patel <anup.patel@broadcom.com>
>>>>>>> Reviewed-by: Scott Branden <scott.branden@broadcom.com>
>>>>>>
>>>>>> I don't like this approach. Define an interface for md to query the
>>>>>> offload engine once at the beginning of time. We should not be adding
>>>>>> any new extensions to async_tx.
>>>>>
>>>>> Even if we do capability checks in Linux MD, we still need a way
>>>>> for DMAENGINE drivers to advertise number of PQ coefficients
>>>>> handled by the HW.
>>>>>
>>>>> I agree capability checks should be done once in Linux MD but I don't
>>>>> see why this has to be part of BCM-SBA-RAID driver patches. We need
>>>>> separate patchsets to address limitations of async_tx framework.
>>>>
>>>> Right, separate enabling before we pile on new hardware support to a
>>>> known broken framework.
>>>
>>> Linux Async Tx not broken framework. The issue is:
>>> 1. Its not complete enough
>>> 2. Its not optimized for very high through-put offload engines
>>
>> I'm not understanding your point. I'm nak'ing this change to add yet
>> more per-transaction capability checking to async_tx. I don't like the
>> DMA_HAS_FEWER_PQ_COEF flag, especially since it is equal to
>> DMA_HAS_PQ_CONTINUE. I'm not asking for all of async_tx's problems to
>> be fixed before this new hardware support, I'm simply saying we should
>> start the process of moving offload-engine capability checking to the
>> raid code.
>
> The DMA_HAS_FEWER_PQ_COEF is not equal to
> DMA_HAS_PQ_CONTINUE.

#define DMA_HAS_PQ_CONTINUE (1 << 15
#define DMA_HAS_FEWER_PQ_COEF (1 << 15)

> I will try to drop this patch and take care of unsupported PQ
> coefficients in BCM-SBA-RAID driver itself even if this means
> doing some computations in BCM-SBA-RAID driver itself.

That should be nak'd as well, please do capability detection in a
routine that is common to all raid engines.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 2/5] async_tx: Handle DMA devices having support for fewer PQ coefficients
  2017-02-09 16:44                   ` Dan Williams
@ 2017-02-10  3:24                     ` Anup Patel
  0 siblings, 0 replies; 16+ messages in thread
From: Anup Patel @ 2017-02-10  3:24 UTC (permalink / raw)
  To: Dan Williams
  Cc: Mark Rutland, Device Tree, Herbert Xu, Scott Branden, Vinod Koul,
	Ray Jui, Jassi Brar, linux-kernel, linux-raid, Jon Mason,
	Rob Herring, BCM Kernel Feedback, linux-crypto, Rob Rice,
	dmaengine, David S . Miller, linux-arm-kernel

On Thu, Feb 9, 2017 at 10:14 PM, Dan Williams <dan.j.williams@intel.com> wrote:
> On Thu, Feb 9, 2017 at 1:29 AM, Anup Patel <anup.patel@broadcom.com> wrote:
>> On Wed, Feb 8, 2017 at 9:54 PM, Dan Williams <dan.j.williams@intel.com> wrote:
>>> On Wed, Feb 8, 2017 at 12:57 AM, Anup Patel <anup.patel@broadcom.com> wrote:
>>>> On Tue, Feb 7, 2017 at 11:46 PM, Dan Williams <dan.j.williams@intel.com> wrote:
>>>>> On Tue, Feb 7, 2017 at 1:02 AM, Anup Patel <anup.patel@broadcom.com> wrote:
>>>>>> On Tue, Feb 7, 2017 at 1:57 PM, Dan Williams <dan.j.williams@intel.com> wrote:
>>>>>>> On Tue, Feb 7, 2017 at 12:16 AM, Anup Patel <anup.patel@broadcom.com> wrote:
>>>>>>>> The DMAENGINE framework assumes that if PQ offload is supported by a
>>>>>>>> DMA device then all 256 PQ coefficients are supported. This assumption
>>>>>>>> does not hold anymore because we now have BCM-SBA-RAID offload engine
>>>>>>>> which supports PQ offload with limited number of PQ coefficients.
>>>>>>>>
>>>>>>>> This patch extends async_tx APIs to handle DMA devices with support
>>>>>>>> for fewer PQ coefficients.
>>>>>>>>
>>>>>>>> Signed-off-by: Anup Patel <anup.patel@broadcom.com>
>>>>>>>> Reviewed-by: Scott Branden <scott.branden@broadcom.com>
>>>>>>>
>>>>>>> I don't like this approach. Define an interface for md to query the
>>>>>>> offload engine once at the beginning of time. We should not be adding
>>>>>>> any new extensions to async_tx.
>>>>>>
>>>>>> Even if we do capability checks in Linux MD, we still need a way
>>>>>> for DMAENGINE drivers to advertise number of PQ coefficients
>>>>>> handled by the HW.
>>>>>>
>>>>>> I agree capability checks should be done once in Linux MD but I don't
>>>>>> see why this has to be part of BCM-SBA-RAID driver patches. We need
>>>>>> separate patchsets to address limitations of async_tx framework.
>>>>>
>>>>> Right, separate enabling before we pile on new hardware support to a
>>>>> known broken framework.
>>>>
>>>> Linux Async Tx not broken framework. The issue is:
>>>> 1. Its not complete enough
>>>> 2. Its not optimized for very high through-put offload engines
>>>
>>> I'm not understanding your point. I'm nak'ing this change to add yet
>>> more per-transaction capability checking to async_tx. I don't like the
>>> DMA_HAS_FEWER_PQ_COEF flag, especially since it is equal to
>>> DMA_HAS_PQ_CONTINUE. I'm not asking for all of async_tx's problems to
>>> be fixed before this new hardware support, I'm simply saying we should
>>> start the process of moving offload-engine capability checking to the
>>> raid code.
>>
>> The DMA_HAS_FEWER_PQ_COEF is not equal to
>> DMA_HAS_PQ_CONTINUE.
>
> #define DMA_HAS_PQ_CONTINUE (1 << 15
> #define DMA_HAS_FEWER_PQ_COEF (1 << 15)

You are only looking at the values of these flags.

The semantics of both these flags are different and both
flags are set in different members of "struct dma_device"
The DMA_HAS_PQ_CONTINUE is set in "max_pq" whereas
DMA_HAS_FEWER_PQ_COEF is set in "max_pqcoef".

When DMA_HAS_PQ_CONTINUE is set in "max_pq", it
means that PQ HW is capable of taking P & Q computed
previous txn as input. If DMA_HAS_PQ_CONTINUE is
not supported the async_pq() will pass P & Q computed
by previous txn as sources with coef as g^0.

When DMA_HAS_FEWER_PQ_COEF is set in "max_pqcoef",
it means the PQ HW is not capable of handling all 256 coefs.

>
>> I will try to drop this patch and take care of unsupported PQ
>> coefficients in BCM-SBA-RAID driver itself even if this means
>> doing some computations in BCM-SBA-RAID driver itself.
>
> That should be nak'd as well, please do capability detection in a
> routine that is common to all raid engines.

Thanks for NAKing this patch.

This motivated me to find clean work-around for handling
unsupported PQ coefs in BCM-SBA-RAID driver.

Let's assume max number of PQ coefs supported by PQ HW
is m coefs. Now for any coef n > m, we can use RAID6 math
to get g^n = (g^m)*(g^m)*....*(g^k) where k <= m.

Using the above fact, we can create chained txn for each
source of PQ request in BCM-SBA-RAID driver where each
txn will only compute PQ from one source using above
described RAID6 math. Also, each txn in chained txn will
depend on output of previous txn because we are computing
PQ from one source at a time.

I will drop this patch and send updated BCM-SBA-RAID
driver with above described work-around for handling
unsupported PQ coefs.

Regards,
Anup

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2017-02-10  3:24 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-02-07  8:16 [PATCH v2 0/5] Broadcom SBA RAID support Anup Patel
2017-02-07  8:16 ` [PATCH v2 1/5] lib/raid6: Add log-of-2 table for RAID6 HW requiring disk position Anup Patel
     [not found] ` <1486455406-11202-1-git-send-email-anup.patel-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>
2017-02-07  8:16   ` [PATCH v2 2/5] async_tx: Handle DMA devices having support for fewer PQ coefficients Anup Patel
2017-02-07  8:27     ` Dan Williams
2017-02-07  9:02       ` Anup Patel
2017-02-07 16:42         ` Vinod Koul
2017-02-08  6:37           ` Anup Patel
2017-02-07 18:16         ` Dan Williams
2017-02-08  8:57           ` Anup Patel
2017-02-08 16:24             ` Dan Williams
     [not found]               ` <CAPcyv4iFJXxvJFrUs2jtwP9GX5NcJ8LiEDHeZ5b1fwjCAToe5w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-02-09  9:29                 ` Anup Patel
2017-02-09 16:44                   ` Dan Williams
2017-02-10  3:24                     ` Anup Patel
2017-02-07  8:16   ` [PATCH v2 3/5] async_tx: Fix DMA_PREP_FENCE usage in do_async_gen_syndrome() Anup Patel
2017-02-07  8:16 ` [PATCH v2 4/5] dmaengine: Add Broadcom SBA RAID driver Anup Patel
2017-02-07  8:16 ` [PATCH v2 5/5] dt-bindings: Add DT bindings document for " Anup Patel

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).