linuxppc-dev.lists.ozlabs.org archive mirror
* [PATCH 0/3] soc: fsl: dpio: Enable QMAN batch enqueuing
@ 2019-12-12 17:01 Youri Querry
  2019-12-12 17:01 ` [PATCH 1/3] soc: fsl: dpio: Adding QMAN multiple enqueue interface Youri Querry
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Youri Querry @ 2019-12-12 17:01 UTC (permalink / raw)
  To: Roy Pledge, Leo Li, linux-kernel, linuxppc-dev, linux-arm-kernel,
	Ioana Ciornei, Alexandru Marginean
  Cc: Youri Querry

This patch set consists of:
- Add an interface to enqueue several packets at a time, to
  improve performance (a usage sketch follows the list).
- Make the algorithm decision once at initialization and use
  function pointers, again to improve performance.
- Replace the QMAN enqueue array mode algorithm with a ring
  mode algorithm, to make enqueuing several frames at a time
  more effective.
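
For illustration only, a minimal sketch of how a caller might use the
new bulk interface (hypothetical helper; frame-descriptor setup is
elided). dpaa2_io_service_enqueue_multiple_fq(), added in patch 1,
returns how many of the requested frames the portal accepted:

	static int xmit_burst(struct dpaa2_io *io, u32 fqid,
			      const struct dpaa2_fd *fds, int nb)
	{
		int sent = 0, ret;

		while (sent < nb) {
			ret = dpaa2_io_service_enqueue_multiple_fq(io, fqid,
						fds + sent, nb - sent);
			if (ret < 0)
				return ret;	/* e.g. -ENODEV */
			if (ret == 0)
				break;	/* ring full, caller retries later */
			sent += ret;
		}
		return sent;
	}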

Youri Querry (3):
  soc: fsl: dpio: Adding QMAN multiple enqueue interface.
  soc: fsl: dpio: QMAN performance improvement. Function pointer
    indirection.
  soc: fsl: dpio: Replace QMAN array mode by ring mode enqueue.

 drivers/soc/fsl/dpio/dpio-service.c |  69 +++-
 drivers/soc/fsl/dpio/qbman-portal.c | 766 ++++++++++++++++++++++++++++++++----
 drivers/soc/fsl/dpio/qbman-portal.h | 158 +++++++-
 include/soc/fsl/dpaa2-io.h          |   6 +-
 4 files changed, 907 insertions(+), 92 deletions(-)

-- 
2.7.4



* [PATCH 1/3] soc: fsl: dpio: Adding QMAN multiple enqueue interface.
  2019-12-12 17:01 [PATCH 0/3] soc: fsl: dpio: Enable QMAN batch enqueuing Youri Querry
@ 2019-12-12 17:01 ` Youri Querry
  2019-12-12 17:01 ` [PATCH 2/3] soc: fsl: dpio: QMAN performance improvement. Function pointer indirection Youri Querry
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 8+ messages in thread
From: Youri Querry @ 2019-12-12 17:01 UTC (permalink / raw)
  To: Roy Pledge, Leo Li, linux-kernel, linuxppc-dev, linux-arm-kernel,
	Ioana Ciornei, Alexandru Marginean
  Cc: Youri Querry

Update the QMAN interface to enqueue frames. We now support multiple
enqueue (qbman_swp_enqueue_multiple) and multiple enqueue with a
table of descriptors (qbman_swp_enqueue_multiple_desc); a short
usage sketch follows.
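
As a sketch of the intended use of the lower-level call (hypothetical
snippet; "swp" is an initialized software portal, "fqid" and "fd" are
caller-provided tables): build one cleared descriptor per target frame
queue, then hand the whole table to qbman_swp_enqueue_multiple_desc(),
which returns how many of the frames were accepted:

	struct qbman_eq_desc ed[8];
	int i, done;

	for (i = 0; i < 8; i++) {
		qbman_eq_desc_clear(&ed[i]);
		qbman_eq_desc_set_no_orp(&ed[i], 0); /* no order restoration */
		qbman_eq_desc_set_fq(&ed[i], fqid[i]);
	}
	done = qbman_swp_enqueue_multiple_desc(swp, ed, fd, 8);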

Signed-off-by: Youri Querry <youri.querry_1@nxp.com>
---
 drivers/soc/fsl/dpio/dpio-service.c | 69 ++++++++++++++++++++++++++++--
 drivers/soc/fsl/dpio/qbman-portal.c | 83 +++++++++++++++++++++++++++++++------
 drivers/soc/fsl/dpio/qbman-portal.h | 24 +++++++++++
 include/soc/fsl/dpaa2-io.h          |  6 ++-
 4 files changed, 165 insertions(+), 17 deletions(-)

diff --git a/drivers/soc/fsl/dpio/dpio-service.c b/drivers/soc/fsl/dpio/dpio-service.c
index 518a8e0..cd4f641 100644
--- a/drivers/soc/fsl/dpio/dpio-service.c
+++ b/drivers/soc/fsl/dpio/dpio-service.c
@@ -1,7 +1,7 @@
 // SPDX-License-Identifier: (GPL-2.0+ OR BSD-3-Clause)
 /*
  * Copyright 2014-2016 Freescale Semiconductor Inc.
- * Copyright 2016 NXP
+ * Copyright 2016-2019 NXP
  *
  */
 #include <linux/types.h>
@@ -433,6 +433,69 @@ int dpaa2_io_service_enqueue_fq(struct dpaa2_io *d,
 EXPORT_SYMBOL(dpaa2_io_service_enqueue_fq);
 
 /**
+ * dpaa2_io_service_enqueue_multiple_fq() - Enqueue multiple frames
+ * to a frame queue using one fqid.
+ * @d: the given DPIO service.
+ * @fqid: the given frame queue id.
+ * @fd: the frame descriptors to be enqueued.
+ * @nb: number of frames to be enqueued
+ *
+ * Return the number of frames enqueued, or -ENODEV if there is
+ * no dpio service.
+ */
+int dpaa2_io_service_enqueue_multiple_fq(struct dpaa2_io *d,
+				u32 fqid,
+				const struct dpaa2_fd *fd,
+				int nb)
+{
+	struct qbman_eq_desc ed;
+
+	d = service_select(d);
+	if (!d)
+		return -ENODEV;
+
+	qbman_eq_desc_clear(&ed);
+	qbman_eq_desc_set_no_orp(&ed, 0);
+	qbman_eq_desc_set_fq(&ed, fqid);
+
+	return qbman_swp_enqueue_multiple(d->swp, &ed, fd, 0, nb);
+}
+EXPORT_SYMBOL(dpaa2_io_service_enqueue_multiple_fq);
+
+/**
+ * dpaa2_io_service_enqueue_multiple_desc_fq() - Enqueue multiple frames
+ * to different frame queues using a list of fqids.
+ * @d: the given DPIO service.
+ * @fqid: the given list of frame queue ids.
+ * @fd: the frame descriptors to be enqueued.
+ * @nb: number of frames to be enqueued
+ *
+ * Return the number of frames enqueued, or -ENODEV if there is
+ * no dpio service.
+ */
+int dpaa2_io_service_enqueue_multiple_desc_fq(struct dpaa2_io *d,
+				u32 *fqid,
+				const struct dpaa2_fd *fd,
+				int nb)
+{
+	int i;
+	struct qbman_eq_desc ed[32];
+
+	d = service_select(d);
+	if (!d)
+		return -ENODEV;
+
+	for (i = 0; i < nb; i++) {
+		qbman_eq_desc_clear(&ed[i]);
+		qbman_eq_desc_set_no_orp(&ed[i], 0);
+		qbman_eq_desc_set_fq(&ed[i], fqid[i]);
+	}
+
+	return qbman_swp_enqueue_multiple_desc(d->swp, &ed[0], fd, nb);
+}
+EXPORT_SYMBOL(dpaa2_io_service_enqueue_multiple_desc_fq);
+
+/**
  * dpaa2_io_service_enqueue_qd() - Enqueue a frame to a QD.
  * @d: the given DPIO service.
  * @qdid: the given queuing destination id.
@@ -526,7 +589,7 @@ EXPORT_SYMBOL_GPL(dpaa2_io_service_acquire);
 
 /**
  * dpaa2_io_store_create() - Create the dma memory storage for dequeue result.
- * @max_frames: the maximum number of dequeued result for frames, must be <= 16.
+ * @max_frames: the maximum number of dequeued results for frames, must be <= 32.
  * @dev:        the device to allow mapping/unmapping the DMAable region.
  *
  * The size of the storage is "max_frames*sizeof(struct dpaa2_dq)".
@@ -541,7 +604,7 @@ struct dpaa2_io_store *dpaa2_io_store_create(unsigned int max_frames,
 	struct dpaa2_io_store *ret;
 	size_t size;
 
-	if (!max_frames || (max_frames > 16))
+	if (!max_frames || (max_frames > 32))
 		return NULL;
 
 	ret = kmalloc(sizeof(*ret), GFP_KERNEL);
diff --git a/drivers/soc/fsl/dpio/qbman-portal.c b/drivers/soc/fsl/dpio/qbman-portal.c
index c66f5b7..5a37ac8 100644
--- a/drivers/soc/fsl/dpio/qbman-portal.c
+++ b/drivers/soc/fsl/dpio/qbman-portal.c
@@ -1,7 +1,7 @@
 // SPDX-License-Identifier: (GPL-2.0+ OR BSD-3-Clause)
 /*
  * Copyright (C) 2014-2016 Freescale Semiconductor, Inc.
- * Copyright 2016 NXP
+ * Copyright 2016-2019 NXP
  *
  */
 
@@ -12,13 +12,6 @@
 
 #include "qbman-portal.h"
 
-#define QMAN_REV_4000   0x04000000
-#define QMAN_REV_4100   0x04010000
-#define QMAN_REV_4101   0x04010001
-#define QMAN_REV_5000   0x05000000
-
-#define QMAN_REV_MASK   0xffff0000
-
 /* All QBMan command and result structures use this "valid bit" encoding */
 #define QB_VALID_BIT ((u32)0x80)
 
@@ -156,7 +149,7 @@ static inline u32 qbman_set_swp_cfg(u8 max_fill, u8 wn,	u8 est, u8 rpm, u8 dcm,
  */
 struct qbman_swp *qbman_swp_init(const struct qbman_swp_desc *d)
 {
-	struct qbman_swp *p = kmalloc(sizeof(*p), GFP_KERNEL);
+	struct qbman_swp *p = kzalloc(sizeof(*p), GFP_KERNEL);
 	u32 reg;
 
 	if (!p)
@@ -467,22 +460,32 @@ static inline void qbman_write_eqcr_am_rt_register(struct qbman_swp *p,
 int qbman_swp_enqueue(struct qbman_swp *s, const struct qbman_eq_desc *d,
 		      const struct dpaa2_fd *fd)
 {
-	struct qbman_eq_desc *p;
+	struct qbman_eq_desc_with_fd *p;
 	u32 eqar = qbman_read_register(s, QBMAN_CINH_SWP_EQAR);
 
 	if (!EQAR_SUCCESS(eqar))
 		return -EBUSY;
 
 	p = qbman_get_cmd(s, QBMAN_CENA_SWP_EQCR(EQAR_IDX(eqar)));
-	memcpy(&p->dca, &d->dca, 31);
+	/* This is mapped as DEVICE type memory, writes are
+	 * with address alignment:
+	 * desc.dca address alignment = 1
+	 * desc.seqnum address alignment = 2
+	 * desc.orpid address alignment = 4
+	 * desc.tgtid address alignment = 8
+	 */
+	p->desc.dca = d->dca;
+	p->desc.seqnum = d->seqnum;
+	p->desc.orpid = d->orpid;
+	memcpy(&p->desc.tgtid, &d->tgtid, 24);
 	memcpy(&p->fd, fd, sizeof(*fd));
 
 	if ((s->desc->qman_version & QMAN_REV_MASK) < QMAN_REV_5000) {
 		/* Set the verb byte, have to substitute in the valid-bit */
 		dma_wmb();
-		p->verb = d->verb | EQAR_VB(eqar);
+		p->desc.verb = d->verb | EQAR_VB(eqar);
 	} else {
-		p->verb = d->verb | EQAR_VB(eqar);
+		p->desc.verb = d->verb | EQAR_VB(eqar);
 		dma_wmb();
 		qbman_write_eqcr_am_rt_register(s, EQAR_IDX(eqar));
 	}
@@ -490,6 +493,60 @@ int qbman_swp_enqueue(struct qbman_swp *s, const struct qbman_eq_desc *d,
 	return 0;
 }
 
+/**
+ * qbman_swp_enqueue_multiple() - Issue a multi enqueue command
+ * using one enqueue descriptor
+ * @s:  the software portal used for enqueue
+ * @d:  the enqueue descriptor
+ * @fd: table pointer of frame descriptor table to be enqueued
+ * @flags: table pointer of flags, not used for the moment
+ * @num_frames: number of fd to be enqueued
+ *
+ * Return the number of fd enqueued, or a negative error number.
+ */
+int qbman_swp_enqueue_multiple(struct qbman_swp *s,
+			       const struct qbman_eq_desc *d,
+			       const struct dpaa2_fd *fd,
+			       uint32_t *flags,
+			       int num_frames)
+{
+	int count = 0;
+
+	while (count < num_frames) {
+		if (qbman_swp_enqueue(s, d, fd) != 0)
+			break;
+		count++;
+	}
+
+	return count;
+}
+
+/**
+ * qbman_swp_enqueue_multiple_desc() - Issue a multi enqueue command
+ * using multiple enqueue descriptor
+ * @s:  the software portal used for enqueue
+ * @d:  table of minimal enqueue descriptor
+ * @fd: table pointer of frame descriptor table to be enqueued
+ * @num_frames: number of fd to be enqueued
+ *
+ * Return the number of fd enqueued, or a negative error number.
+ */
+int qbman_swp_enqueue_multiple_desc(struct qbman_swp *s,
+				    const struct qbman_eq_desc *d,
+				    const struct dpaa2_fd *fd,
+				    int num_frames)
+{
+	int count = 0;
+
+	while (count < num_frames) {
+		if (qbman_swp_enqueue(s, &(d[count]), fd) != 0)
+			break;
+		count++;
+	}
+
+	return count;
+}
+
 /* Static (push) dequeue */
 
 /**
diff --git a/drivers/soc/fsl/dpio/qbman-portal.h b/drivers/soc/fsl/dpio/qbman-portal.h
index f3ec5d2..ac58a97 100644
--- a/drivers/soc/fsl/dpio/qbman-portal.h
+++ b/drivers/soc/fsl/dpio/qbman-portal.h
@@ -9,6 +9,13 @@
 
 #include <soc/fsl/dpaa2-fd.h>
 
+#define QMAN_REV_4000   0x04000000
+#define QMAN_REV_4100   0x04010000
+#define QMAN_REV_4101   0x04010001
+#define QMAN_REV_5000   0x05000000
+
+#define QMAN_REV_MASK   0xffff0000
+
 struct dpaa2_dq;
 struct qbman_swp;
 
@@ -81,6 +88,10 @@ struct qbman_eq_desc {
 	u8 wae;
 	u8 rspid;
 	__le64 rsp_addr;
+};
+
+struct qbman_eq_desc_with_fd {
+	struct qbman_eq_desc desc;
 	u8 fd[32];
 };
 
@@ -193,6 +204,19 @@ void *qbman_swp_mc_start(struct qbman_swp *p);
 void qbman_swp_mc_submit(struct qbman_swp *p, void *cmd, u8 cmd_verb);
 void *qbman_swp_mc_result(struct qbman_swp *p);
 
+int
+qbman_swp_enqueue_multiple(struct qbman_swp *s,
+			   const struct qbman_eq_desc *d,
+			   const struct dpaa2_fd *fd,
+			   uint32_t *flags,
+			   int num_frames);
+
+int
+qbman_swp_enqueue_multiple_desc(struct qbman_swp *s,
+				const struct qbman_eq_desc *d,
+				const struct dpaa2_fd *fd,
+				int num_frames);
+
 /**
  * qbman_result_is_DQ() - check if the dequeue result is a dequeue response
  * @dq: the dequeue result to be checked
diff --git a/include/soc/fsl/dpaa2-io.h b/include/soc/fsl/dpaa2-io.h
index 672cfb5..c9d8499 100644
--- a/include/soc/fsl/dpaa2-io.h
+++ b/include/soc/fsl/dpaa2-io.h
@@ -1,7 +1,7 @@
 /* SPDX-License-Identifier: (GPL-2.0+ OR BSD-3-Clause) */
 /*
  * Copyright 2014-2016 Freescale Semiconductor Inc.
- * Copyright NXP
+ * Copyright 2017-2019 NXP
  *
  */
 #ifndef __FSL_DPAA2_IO_H
@@ -109,6 +109,10 @@ int dpaa2_io_service_pull_channel(struct dpaa2_io *d, u32 channelid,
 
 int dpaa2_io_service_enqueue_fq(struct dpaa2_io *d, u32 fqid,
 				const struct dpaa2_fd *fd);
+int dpaa2_io_service_enqueue_multiple_fq(struct dpaa2_io *d, u32 fqid,
+				const struct dpaa2_fd *fd, int number_of_frame);
+int dpaa2_io_service_enqueue_multiple_desc_fq(struct dpaa2_io *d, u32 *fqid,
+				const struct dpaa2_fd *fd, int number_of_frame);
 int dpaa2_io_service_enqueue_qd(struct dpaa2_io *d, u32 qdid, u8 prio,
 				u16 qdbin, const struct dpaa2_fd *fd);
 int dpaa2_io_service_release(struct dpaa2_io *d, u16 bpid,
-- 
2.7.4



* [PATCH 2/3] soc: fsl: dpio: QMAN performance improvement. Function pointer indirection.
  2019-12-12 17:01 [PATCH 0/3] soc: fsl: dpio: Enable QMAN batch enqueuing Youri Querry
  2019-12-12 17:01 ` [PATCH 1/3] soc: fsl: dpio: Adding QMAN multiple enqueue interface Youri Querry
@ 2019-12-12 17:01 ` Youri Querry
  2020-03-26 10:07   ` Arnd Bergmann
  2019-12-12 17:01 ` [PATCH 3/3] soc: fsl: dpio: Replace QMAN array mode by ring mode enqueue Youri Querry
  2020-02-06 20:39 ` [PATCH 0/3] soc: fsl: dpio: Enable QMAN batch enqueuing Roy Pledge
  3 siblings, 1 reply; 8+ messages in thread
From: Youri Querry @ 2019-12-12 17:01 UTC (permalink / raw)
  To: Roy Pledge, Leo Li, linux-kernel, linuxppc-dev, linux-arm-kernel,
	Ioana Ciornei, Alexandru Marginean
  Cc: Youri Querry

We make the access decision at initialization and set the function
pointers accordingly; a reduced sketch of the pattern follows.
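
Reduced to a sketch (not the driver's exact signatures): each portal
operation gets a _direct variant (cache-enabled portals) and a
_mem_back variant (memory-backed portals, QMan revision >= 5000), a
module-level function pointer defaults to _direct, and qbman_swp_init()
retargets it once based on the portal revision:

	#define QMAN_REV_MASK   0xffff0000
	#define QMAN_REV_5000   0x05000000

	struct qbman_swp;

	/* stand-ins for the real portal operations */
	static int enqueue_direct(struct qbman_swp *s)   { return 0; }
	static int enqueue_mem_back(struct qbman_swp *s) { return 0; }

	/* one indirect call per hot-path operation, defaulting to the
	 * cache-enabled (pre-5000) variant
	 */
	int (*qbman_swp_enqueue_ptr)(struct qbman_swp *s) = enqueue_direct;

	/* done once, from portal init */
	static void swp_select_ops(unsigned int qman_version)
	{
		if ((qman_version & QMAN_REV_MASK) >= QMAN_REV_5000)
			qbman_swp_enqueue_ptr = enqueue_mem_back;
	}

This trades a per-call revision test for one indirect call whose
target is fixed at initialization.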

Signed-off-by: Youri Querry <youri.querry_1@nxp.com>
---
 drivers/soc/fsl/dpio/qbman-portal.c | 451 +++++++++++++++++++++++++++++++-----
 drivers/soc/fsl/dpio/qbman-portal.h | 129 ++++++++++-
 2 files changed, 507 insertions(+), 73 deletions(-)

diff --git a/drivers/soc/fsl/dpio/qbman-portal.c b/drivers/soc/fsl/dpio/qbman-portal.c
index 5a37ac8..0ffe018 100644
--- a/drivers/soc/fsl/dpio/qbman-portal.c
+++ b/drivers/soc/fsl/dpio/qbman-portal.c
@@ -83,6 +83,82 @@ enum qbman_sdqcr_fc {
 	qbman_sdqcr_fc_up_to_3 = 1
 };
 
+/* Internal function declarations */
+static int qbman_swp_enqueue_direct(struct qbman_swp *s,
+				    const struct qbman_eq_desc *d,
+				    const struct dpaa2_fd *fd);
+static int qbman_swp_enqueue_mem_back(struct qbman_swp *s,
+				      const struct qbman_eq_desc *d,
+				      const struct dpaa2_fd *fd);
+static int qbman_swp_enqueue_multiple_direct(struct qbman_swp *s,
+					     const struct qbman_eq_desc *d,
+					     const struct dpaa2_fd *fd,
+					     uint32_t *flags,
+					     int num_frames);
+static int qbman_swp_enqueue_multiple_mem_back(struct qbman_swp *s,
+					       const struct qbman_eq_desc *d,
+					       const struct dpaa2_fd *fd,
+					       uint32_t *flags,
+					       int num_frames);
+static int
+qbman_swp_enqueue_multiple_desc_direct(struct qbman_swp *s,
+				       const struct qbman_eq_desc *d,
+				       const struct dpaa2_fd *fd,
+				       int num_frames);
+static
+int qbman_swp_enqueue_multiple_desc_mem_back(struct qbman_swp *s,
+					     const struct qbman_eq_desc *d,
+					     const struct dpaa2_fd *fd,
+					     int num_frames);
+static int qbman_swp_pull_direct(struct qbman_swp *s,
+				 struct qbman_pull_desc *d);
+static int qbman_swp_pull_mem_back(struct qbman_swp *s,
+				   struct qbman_pull_desc *d);
+
+const struct dpaa2_dq *qbman_swp_dqrr_next_direct(struct qbman_swp *s);
+const struct dpaa2_dq *qbman_swp_dqrr_next_mem_back(struct qbman_swp *s);
+
+static int qbman_swp_release_direct(struct qbman_swp *s,
+				    const struct qbman_release_desc *d,
+				    const u64 *buffers,
+				    unsigned int num_buffers);
+static int qbman_swp_release_mem_back(struct qbman_swp *s,
+				      const struct qbman_release_desc *d,
+				      const u64 *buffers,
+				      unsigned int num_buffers);
+
+/* Function pointers */
+int (*qbman_swp_enqueue_ptr)(struct qbman_swp *s,
+			     const struct qbman_eq_desc *d,
+			     const struct dpaa2_fd *fd)
+	= qbman_swp_enqueue_direct;
+
+int (*qbman_swp_enqueue_multiple_ptr)(struct qbman_swp *s,
+				      const struct qbman_eq_desc *d,
+				      const struct dpaa2_fd *fd,
+				      uint32_t *flags,
+					     int num_frames)
+	= qbman_swp_enqueue_multiple_direct;
+
+int
+(*qbman_swp_enqueue_multiple_desc_ptr)(struct qbman_swp *s,
+				       const struct qbman_eq_desc *d,
+				       const struct dpaa2_fd *fd,
+				       int num_frames)
+	= qbman_swp_enqueue_multiple_desc_direct;
+
+int (*qbman_swp_pull_ptr)(struct qbman_swp *s, struct qbman_pull_desc *d)
+			= qbman_swp_pull_direct;
+
+const struct dpaa2_dq *(*qbman_swp_dqrr_next_ptr)(struct qbman_swp *s)
+			= qbman_swp_dqrr_next_direct;
+
+int (*qbman_swp_release_ptr)(struct qbman_swp *s,
+			     const struct qbman_release_desc *d,
+			     const u64 *buffers,
+			     unsigned int num_buffers)
+			= qbman_swp_release_direct;
+
 /* Portal Access */
 
 static inline u32 qbman_read_register(struct qbman_swp *p, u32 offset)
@@ -218,6 +294,19 @@ struct qbman_swp *qbman_swp_init(const struct qbman_swp_desc *d)
 	 * applied when dequeues from a specific channel are enabled.
 	 */
 	qbman_write_register(p, QBMAN_CINH_SWP_SDQCR, 0);
+
+	if ((p->desc->qman_version & QMAN_REV_MASK) >= QMAN_REV_5000) {
+		qbman_swp_enqueue_ptr =
+			qbman_swp_enqueue_mem_back;
+		qbman_swp_enqueue_multiple_ptr =
+			qbman_swp_enqueue_multiple_mem_back;
+		qbman_swp_enqueue_multiple_desc_ptr =
+			qbman_swp_enqueue_multiple_desc_mem_back;
+		qbman_swp_pull_ptr = qbman_swp_pull_mem_back;
+		qbman_swp_dqrr_next_ptr = qbman_swp_dqrr_next_mem_back;
+		qbman_swp_release_ptr = qbman_swp_release_mem_back;
+	}
+
 	return p;
 }
 
@@ -447,7 +536,7 @@ static inline void qbman_write_eqcr_am_rt_register(struct qbman_swp *p,
 }
 
 /**
- * qbman_swp_enqueue() - Issue an enqueue command
+ * qbman_swp_enqueue_direct() - Issue an enqueue command
  * @s:  the software portal used for enqueue
  * @d:  the enqueue descriptor
  * @fd: the frame descriptor to be enqueued
@@ -457,7 +546,7 @@ static inline void qbman_write_eqcr_am_rt_register(struct qbman_swp *p,
  *
  * Return 0 for successful enqueue, -EBUSY if the EQCR is not ready.
  */
-int qbman_swp_enqueue(struct qbman_swp *s, const struct qbman_eq_desc *d,
+int qbman_swp_enqueue_direct(struct qbman_swp *s, const struct qbman_eq_desc *d,
 		      const struct dpaa2_fd *fd)
 {
 	struct qbman_eq_desc_with_fd *p;
@@ -480,21 +569,57 @@ int qbman_swp_enqueue(struct qbman_swp *s, const struct qbman_eq_desc *d,
 	memcpy(&p->desc.tgtid, &d->tgtid, 24);
 	memcpy(&p->fd, fd, sizeof(*fd));
 
-	if ((s->desc->qman_version & QMAN_REV_MASK) < QMAN_REV_5000) {
-		/* Set the verb byte, have to substitute in the valid-bit */
-		dma_wmb();
-		p->desc.verb = d->verb | EQAR_VB(eqar);
-	} else {
-		p->desc.verb = d->verb | EQAR_VB(eqar);
-		dma_wmb();
-		qbman_write_eqcr_am_rt_register(s, EQAR_IDX(eqar));
-	}
+	/* Set the verb byte, have to substitute in the valid-bit */
+	dma_wmb();
+	p->desc.verb = d->verb | EQAR_VB(eqar);
 
 	return 0;
 }
 
 /**
- * qbman_swp_enqueue_multiple() - Issue a multi enqueue command
+ * qbman_swp_enqueue_mem_back() - Issue an enqueue command
+ * @s:  the software portal used for enqueue
+ * @d:  the enqueue descriptor
+ * @fd: the frame descriptor to be enqueued
+ *
+ * Please note that 'fd' should only be NULL if the "action" of the
+ * descriptor is "orp_hole" or "orp_nesn".
+ *
+ * Return 0 for successful enqueue, -EBUSY if the EQCR is not ready.
+ */
+int qbman_swp_enqueue_mem_back(struct qbman_swp *s,
+			       const struct qbman_eq_desc *d,
+			       const struct dpaa2_fd *fd)
+{
+	struct qbman_eq_desc_with_fd *p;
+	u32 eqar = qbman_read_register(s, QBMAN_CINH_SWP_EQAR);
+
+	if (!EQAR_SUCCESS(eqar))
+		return -EBUSY;
+
+	p = qbman_get_cmd(s, QBMAN_CENA_SWP_EQCR(EQAR_IDX(eqar)));
+	/* This is mapped as DEVICE type memory, writes are
+	 * with address alignment:
+	 * desc.dca address alignment = 1
+	 * desc.seqnum address alignment = 2
+	 * desc.orpid address alignment = 4
+	 * desc.tgtid address alignment = 8
+	 */
+	p->desc.dca = d->dca;
+	p->desc.seqnum = d->seqnum;
+	p->desc.orpid = d->orpid;
+	memcpy(&p->desc.tgtid, &d->tgtid, 24);
+	memcpy(&p->fd, fd, sizeof(*fd));
+
+	p->desc.verb = d->verb | EQAR_VB(eqar);
+	dma_wmb();
+	qbman_write_eqcr_am_rt_register(s, EQAR_IDX(eqar));
+
+	return 0;
+}
+
+/**
+ * qbman_swp_enqueue_multiple_direct() - Issue a multi enqueue command
  * using one enqueue descriptor
  * @s:  the software portal used for enqueue
  * @d:  the enqueue descriptor
@@ -504,16 +629,44 @@ int qbman_swp_enqueue(struct qbman_swp *s, const struct qbman_eq_desc *d,
  *
  * Return the number of fd enqueued, or a negative error number.
  */
-int qbman_swp_enqueue_multiple(struct qbman_swp *s,
-			       const struct qbman_eq_desc *d,
-			       const struct dpaa2_fd *fd,
-			       uint32_t *flags,
-			       int num_frames)
+int qbman_swp_enqueue_multiple_direct(struct qbman_swp *s,
+				      const struct qbman_eq_desc *d,
+				      const struct dpaa2_fd *fd,
+				      uint32_t *flags,
+				      int num_frames)
+{
+	int count = 0;
+
+	while (count < num_frames) {
+		if (qbman_swp_enqueue_direct(s, d, fd) != 0)
+			break;
+		count++;
+	}
+
+	return count;
+}
+
+/**
+ * qbman_swp_enqueue_multiple_mem_back() - Issue a multi enqueue command
+ * using one enqueue descriptor
+ * @s:  the software portal used for enqueue
+ * @d:  the enqueue descriptor
+ * @fd: table pointer of frame descriptor table to be enqueued
+ * @flags: table pointer of flags, not used for the moment
+ * @num_frames: number of fd to be enqueued
+ *
+ * Return the number of fd enqueued, or a negative error number.
+ */
+int qbman_swp_enqueue_multiple_mem_back(struct qbman_swp *s,
+				      const struct qbman_eq_desc *d,
+				      const struct dpaa2_fd *fd,
+				      uint32_t *flags,
+				      int num_frames)
 {
 	int count = 0;
 
 	while (count < num_frames) {
-		if (qbman_swp_enqueue(s, d, fd) != 0)
+		if (qbman_swp_enqueue_mem_back(s, d, fd) != 0)
 			break;
 		count++;
 	}
@@ -522,7 +675,7 @@ int qbman_swp_enqueue_multiple(struct qbman_swp *s,
 }
 
 /**
- * qbman_swp_enqueue_multiple_desc() - Issue a multi enqueue command
+ * qbman_swp_enqueue_multiple_desc_direct() - Issue a multi enqueue command
  * using multiple enqueue descriptor
  * @s:  the software portal used for enqueue
  * @d:  table of minimal enqueue descriptor
@@ -531,15 +684,41 @@ int qbman_swp_enqueue_multiple(struct qbman_swp *s,
  *
  * Return the number of fd enqueued, or a negative error number.
  */
-int qbman_swp_enqueue_multiple_desc(struct qbman_swp *s,
-				    const struct qbman_eq_desc *d,
-				    const struct dpaa2_fd *fd,
-				    int num_frames)
+int qbman_swp_enqueue_multiple_desc_direct(struct qbman_swp *s,
+					   const struct qbman_eq_desc *d,
+					   const struct dpaa2_fd *fd,
+					   int num_frames)
+{
+	int count = 0;
+
+	while (count < num_frames) {
+		if (qbman_swp_enqueue_direct(s, &(d[count]), fd) != 0)
+			break;
+		count++;
+	}
+
+	return count;
+}
+
+/**
+ * qbman_swp_enqueue_multiple_desc_mem_back() - Issue a multi enqueue command
+ * using multiple enqueue descriptor
+ * @s:  the software portal used for enqueue
+ * @d:  table of minimal enqueue descriptor
+ * @fd: table pointer of frame descriptor table to be enqueued
+ * @num_frames: number of fd to be enqueued
+ *
+ * Return the number of fd enqueued, or a negative error number.
+ */
+int qbman_swp_enqueue_multiple_desc_mem_back(struct qbman_swp *s,
+					   const struct qbman_eq_desc *d,
+					   const struct dpaa2_fd *fd,
+					   int num_frames)
 {
 	int count = 0;
 
 	while (count < num_frames) {
-		if (qbman_swp_enqueue(s, &(d[count]), fd) != 0)
+		if (qbman_swp_enqueue_mem_back(s, &(d[count]), fd) != 0)
 			break;
 		count++;
 	}
@@ -702,7 +881,7 @@ void qbman_pull_desc_set_channel(struct qbman_pull_desc *d, u32 chid,
 }
 
 /**
- * qbman_swp_pull() - Issue the pull dequeue command
+ * qbman_swp_pull_direct() - Issue the pull dequeue command
  * @s: the software portal object
  * @d: the software portal descriptor which has been configured with
  *     the set of qbman_pull_desc_set_*() calls
@@ -710,7 +889,7 @@ void qbman_pull_desc_set_channel(struct qbman_pull_desc *d, u32 chid,
  * Return 0 for success, and -EBUSY if the software portal is not ready
  * to do pull dequeue.
  */
-int qbman_swp_pull(struct qbman_swp *s, struct qbman_pull_desc *d)
+int qbman_swp_pull_direct(struct qbman_swp *s, struct qbman_pull_desc *d)
 {
 	struct qbman_pull_desc *p;
 
@@ -728,18 +907,45 @@ int qbman_swp_pull(struct qbman_swp *s, struct qbman_pull_desc *d)
 	p->dq_src = d->dq_src;
 	p->rsp_addr = d->rsp_addr;
 	p->rsp_addr_virt = d->rsp_addr_virt;
+	dma_wmb();
+	/* Set the verb byte, have to substitute in the valid-bit */
+	p->verb = d->verb | s->vdq.valid_bit;
+	s->vdq.valid_bit ^= QB_VALID_BIT;
 
-	if ((s->desc->qman_version & QMAN_REV_MASK) < QMAN_REV_5000) {
-		dma_wmb();
-		/* Set the verb byte, have to substitute in the valid-bit */
-		p->verb = d->verb | s->vdq.valid_bit;
-		s->vdq.valid_bit ^= QB_VALID_BIT;
-	} else {
-		p->verb = d->verb | s->vdq.valid_bit;
-		s->vdq.valid_bit ^= QB_VALID_BIT;
-		dma_wmb();
-		qbman_write_register(s, QBMAN_CINH_SWP_VDQCR_RT, QMAN_RT_MODE);
+	return 0;
+}
+
+/**
+ * qbman_swp_pull_mem_back() - Issue the pull dequeue command
+ * @s: the software portal object
+ * @d: the software portal descriptor which has been configured with
+ *     the set of qbman_pull_desc_set_*() calls
+ *
+ * Return 0 for success, and -EBUSY if the software portal is not ready
+ * to do pull dequeue.
+ */
+int qbman_swp_pull_mem_back(struct qbman_swp *s, struct qbman_pull_desc *d)
+{
+	struct qbman_pull_desc *p;
+
+	if (!atomic_dec_and_test(&s->vdq.available)) {
+		atomic_inc(&s->vdq.available);
+		return -EBUSY;
 	}
+	s->vdq.storage = (void *)(uintptr_t)d->rsp_addr_virt;
+	if ((s->desc->qman_version & QMAN_REV_MASK) < QMAN_REV_5000)
+		p = qbman_get_cmd(s, QBMAN_CENA_SWP_VDQCR);
+	else
+		p = qbman_get_cmd(s, QBMAN_CENA_SWP_VDQCR_MEM);
+	p->numf = d->numf;
+	p->tok = QMAN_DQ_TOKEN_VALID;
+	p->dq_src = d->dq_src;
+	p->rsp_addr = d->rsp_addr;
+	p->rsp_addr_virt = d->rsp_addr_virt;
+	p->verb = d->verb | s->vdq.valid_bit;
+	s->vdq.valid_bit ^= QB_VALID_BIT;
+	dma_wmb();
+	qbman_write_register(s, QBMAN_CINH_SWP_VDQCR_RT, QMAN_RT_MODE);
 
 	return 0;
 }
@@ -747,14 +953,14 @@ int qbman_swp_pull(struct qbman_swp *s, struct qbman_pull_desc *d)
 #define QMAN_DQRR_PI_MASK   0xf
 
 /**
- * qbman_swp_dqrr_next() - Get an valid DQRR entry
+ * qbman_swp_dqrr_next_direct() - Get a valid DQRR entry
  * @s: the software portal object
  *
  * Return NULL if there are no unconsumed DQRR entries. Return a DQRR entry
  * only once, so repeated calls can return a sequence of DQRR entries, without
  * requiring they be consumed immediately or in any particular order.
  */
-const struct dpaa2_dq *qbman_swp_dqrr_next(struct qbman_swp *s)
+const struct dpaa2_dq *qbman_swp_dqrr_next_direct(struct qbman_swp *s)
 {
 	u32 verb;
 	u32 response_verb;
@@ -797,10 +1003,99 @@ const struct dpaa2_dq *qbman_swp_dqrr_next(struct qbman_swp *s)
 				       QBMAN_CENA_SWP_DQRR(s->dqrr.next_idx)));
 	}
 
-	if ((s->desc->qman_version & QMAN_REV_MASK) < QMAN_REV_5000)
-		p = qbman_get_cmd(s, QBMAN_CENA_SWP_DQRR(s->dqrr.next_idx));
-	else
-		p = qbman_get_cmd(s, QBMAN_CENA_SWP_DQRR_MEM(s->dqrr.next_idx));
+	p = qbman_get_cmd(s, QBMAN_CENA_SWP_DQRR(s->dqrr.next_idx));
+	verb = p->dq.verb;
+
+	/*
+	 * If the valid-bit isn't of the expected polarity, nothing there. Note,
+	 * in the DQRR reset bug workaround, we shouldn't need to skip these
+	 * check, because we've already determined that a new entry is available
+	 * and we've invalidated the cacheline before reading it, so the
+	 * valid-bit behaviour is repaired and should tell us what we already
+	 * knew from reading PI.
+	 */
+	if ((verb & QB_VALID_BIT) != s->dqrr.valid_bit) {
+		prefetch(qbman_get_cmd(s,
+				       QBMAN_CENA_SWP_DQRR(s->dqrr.next_idx)));
+		return NULL;
+	}
+	/*
+	 * There's something there. Move "next_idx" attention to the next ring
+	 * entry (and prefetch it) before returning what we found.
+	 */
+	s->dqrr.next_idx++;
+	s->dqrr.next_idx &= s->dqrr.dqrr_size - 1; /* Wrap around */
+	if (!s->dqrr.next_idx)
+		s->dqrr.valid_bit ^= QB_VALID_BIT;
+
+	/*
+	 * If this is the final response to a volatile dequeue command
+	 * indicate that the vdq is available
+	 */
+	flags = p->dq.stat;
+	response_verb = verb & QBMAN_RESULT_MASK;
+	if ((response_verb == QBMAN_RESULT_DQ) &&
+	    (flags & DPAA2_DQ_STAT_VOLATILE) &&
+	    (flags & DPAA2_DQ_STAT_EXPIRED))
+		atomic_inc(&s->vdq.available);
+
+	prefetch(qbman_get_cmd(s, QBMAN_CENA_SWP_DQRR(s->dqrr.next_idx)));
+
+	return p;
+}
+
+/**
+ * qbman_swp_dqrr_next_mem_back() - Get a valid DQRR entry
+ * @s: the software portal object
+ *
+ * Return NULL if there are no unconsumed DQRR entries. Return a DQRR entry
+ * only once, so repeated calls can return a sequence of DQRR entries, without
+ * requiring they be consumed immediately or in any particular order.
+ */
+const struct dpaa2_dq *qbman_swp_dqrr_next_mem_back(struct qbman_swp *s)
+{
+	u32 verb;
+	u32 response_verb;
+	u32 flags;
+	struct dpaa2_dq *p;
+
+	/* Before using valid-bit to detect if something is there, we have to
+	 * handle the case of the DQRR reset bug...
+	 */
+	if (unlikely(s->dqrr.reset_bug)) {
+		/*
+		 * We pick up new entries by cache-inhibited producer index,
+		 * which means that a non-coherent mapping would require us to
+		 * invalidate and read *only* once that PI has indicated that
+		 * there's an entry here. The first trip around the DQRR ring
+		 * will be much less efficient than all subsequent trips around
+		 * it...
+		 */
+		u8 pi = qbman_read_register(s, QBMAN_CINH_SWP_DQPI) &
+			QMAN_DQRR_PI_MASK;
+
+		/* there are new entries if pi != next_idx */
+		if (pi == s->dqrr.next_idx)
+			return NULL;
+
+		/*
+		 * if next_idx is/was the last ring index, and 'pi' is
+		 * different, we can disable the workaround as all the ring
+		 * entries have now been DMA'd to so valid-bit checking is
+		 * repaired. Note: this logic needs to be based on next_idx
+		 * (which increments one at a time), rather than on pi (which
+		 * can burst and wrap-around between our snapshots of it).
+		 */
+		if (s->dqrr.next_idx == (s->dqrr.dqrr_size - 1)) {
+			pr_debug("next_idx=%d, pi=%d, clear reset bug\n",
+				 s->dqrr.next_idx, pi);
+			s->dqrr.reset_bug = 0;
+		}
+		prefetch(qbman_get_cmd(s,
+				       QBMAN_CENA_SWP_DQRR(s->dqrr.next_idx)));
+	}
+
+	p = qbman_get_cmd(s, QBMAN_CENA_SWP_DQRR_MEM(s->dqrr.next_idx));
 	verb = p->dq.verb;
 
 	/*
@@ -929,7 +1224,7 @@ void qbman_release_desc_set_rcdi(struct qbman_release_desc *d, int enable)
 #define RAR_SUCCESS(rar) ((rar) & 0x100)
 
 /**
- * qbman_swp_release() - Issue a buffer release command
+ * qbman_swp_release_direct() - Issue a buffer release command
  * @s:           the software portal object
  * @d:           the release descriptor
  * @buffers:     a pointer pointing to the buffer address to be released
@@ -937,8 +1232,9 @@ void qbman_release_desc_set_rcdi(struct qbman_release_desc *d, int enable)
  *
  * Return 0 for success, -EBUSY if the release command ring is not ready.
  */
-int qbman_swp_release(struct qbman_swp *s, const struct qbman_release_desc *d,
-		      const u64 *buffers, unsigned int num_buffers)
+int qbman_swp_release_direct(struct qbman_swp *s,
+			     const struct qbman_release_desc *d,
+			     const u64 *buffers, unsigned int num_buffers)
 {
 	int i;
 	struct qbman_release_desc *p;
@@ -952,28 +1248,59 @@ int qbman_swp_release(struct qbman_swp *s, const struct qbman_release_desc *d,
 		return -EBUSY;
 
 	/* Start the release command */
-	if ((s->desc->qman_version & QMAN_REV_MASK) < QMAN_REV_5000)
-		p = qbman_get_cmd(s, QBMAN_CENA_SWP_RCR(RAR_IDX(rar)));
-	else
-		p = qbman_get_cmd(s, QBMAN_CENA_SWP_RCR_MEM(RAR_IDX(rar)));
+	p = qbman_get_cmd(s, QBMAN_CENA_SWP_RCR(RAR_IDX(rar)));
+
 	/* Copy the caller's buffer pointers to the command */
 	for (i = 0; i < num_buffers; i++)
 		p->buf[i] = cpu_to_le64(buffers[i]);
 	p->bpid = d->bpid;
 
-	if ((s->desc->qman_version & QMAN_REV_MASK) < QMAN_REV_5000) {
-		/*
-		 * Set the verb byte, have to substitute in the valid-bit
-		 * and the number of buffers.
-		 */
-		dma_wmb();
-		p->verb = d->verb | RAR_VB(rar) | num_buffers;
-	} else {
-		p->verb = d->verb | RAR_VB(rar) | num_buffers;
-		dma_wmb();
-		qbman_write_register(s, QBMAN_CINH_SWP_RCR_AM_RT +
-				     RAR_IDX(rar)  * 4, QMAN_RT_MODE);
-	}
+	/*
+	 * Set the verb byte, have to substitute in the valid-bit
+	 * and the number of buffers.
+	 */
+	dma_wmb();
+	p->verb = d->verb | RAR_VB(rar) | num_buffers;
+
+	return 0;
+}
+
+/**
+ * qbman_swp_release_mem_back() - Issue a buffer release command
+ * @s:           the software portal object
+ * @d:           the release descriptor
+ * @buffers:     a pointer pointing to the buffer address to be released
+ * @num_buffers: number of buffers to be released,  must be less than 8
+ *
+ * Return 0 for success, -EBUSY if the release command ring is not ready.
+ */
+int qbman_swp_release_mem_back(struct qbman_swp *s,
+			       const struct qbman_release_desc *d,
+			       const u64 *buffers, unsigned int num_buffers)
+{
+	int i;
+	struct qbman_release_desc *p;
+	u32 rar;
+
+	if (!num_buffers || (num_buffers > 7))
+		return -EINVAL;
+
+	rar = qbman_read_register(s, QBMAN_CINH_SWP_RAR);
+	if (!RAR_SUCCESS(rar))
+		return -EBUSY;
+
+	/* Start the release command */
+	p = qbman_get_cmd(s, QBMAN_CENA_SWP_RCR_MEM(RAR_IDX(rar)));
+
+	/* Copy the caller's buffer pointers to the command */
+	for (i = 0; i < num_buffers; i++)
+		p->buf[i] = cpu_to_le64(buffers[i]);
+	p->bpid = d->bpid;
+
+	p->verb = d->verb | RAR_VB(rar) | num_buffers;
+	dma_wmb();
+	qbman_write_register(s, QBMAN_CINH_SWP_RCR_AM_RT +
+			     RAR_IDX(rar)  * 4, QMAN_RT_MODE);
 
 	return 0;
 }
diff --git a/drivers/soc/fsl/dpio/qbman-portal.h b/drivers/soc/fsl/dpio/qbman-portal.h
index ac58a97..3b3fb86 100644
--- a/drivers/soc/fsl/dpio/qbman-portal.h
+++ b/drivers/soc/fsl/dpio/qbman-portal.h
@@ -145,6 +145,33 @@ struct qbman_swp {
 	} dqrr;
 };
 
+/* Function pointers */
+extern
+int (*qbman_swp_enqueue_ptr)(struct qbman_swp *s,
+			     const struct qbman_eq_desc *d,
+			     const struct dpaa2_fd *fd);
+extern
+int (*qbman_swp_enqueue_multiple_ptr)(struct qbman_swp *s,
+				      const struct qbman_eq_desc *d,
+				      const struct dpaa2_fd *fd,
+				      uint32_t *flags,
+				      int num_frames);
+extern
+int (*qbman_swp_enqueue_multiple_desc_ptr)(struct qbman_swp *s,
+					   const struct qbman_eq_desc *d,
+					   const struct dpaa2_fd *fd,
+					   int num_frames);
+extern
+int (*qbman_swp_pull_ptr)(struct qbman_swp *s, struct qbman_pull_desc *d);
+extern
+const struct dpaa2_dq *(*qbman_swp_dqrr_next_ptr)(struct qbman_swp *s);
+extern
+int (*qbman_swp_release_ptr)(struct qbman_swp *s,
+			     const struct qbman_release_desc *d,
+			     const u64 *buffers,
+			     unsigned int num_buffers);
+
+/* Functions */
 struct qbman_swp *qbman_swp_init(const struct qbman_swp_desc *d);
 void qbman_swp_finish(struct qbman_swp *p);
 u32 qbman_swp_interrupt_read_status(struct qbman_swp *p);
@@ -169,9 +196,6 @@ void qbman_pull_desc_set_wq(struct qbman_pull_desc *d, u32 wqid,
 void qbman_pull_desc_set_channel(struct qbman_pull_desc *d, u32 chid,
 				 enum qbman_pull_type_e dct);
 
-int qbman_swp_pull(struct qbman_swp *p, struct qbman_pull_desc *d);
-
-const struct dpaa2_dq *qbman_swp_dqrr_next(struct qbman_swp *s);
 void qbman_swp_dqrr_consume(struct qbman_swp *s, const struct dpaa2_dq *dq);
 
 int qbman_result_has_new_result(struct qbman_swp *p, const struct dpaa2_dq *dq);
@@ -183,15 +207,11 @@ void qbman_eq_desc_set_fq(struct qbman_eq_desc *d, u32 fqid);
 void qbman_eq_desc_set_qd(struct qbman_eq_desc *d, u32 qdid,
 			  u32 qd_bin, u32 qd_prio);
 
-int qbman_swp_enqueue(struct qbman_swp *p, const struct qbman_eq_desc *d,
-		      const struct dpaa2_fd *fd);
 
 void qbman_release_desc_clear(struct qbman_release_desc *d);
 void qbman_release_desc_set_bpid(struct qbman_release_desc *d, u16 bpid);
 void qbman_release_desc_set_rcdi(struct qbman_release_desc *d, int enable);
 
-int qbman_swp_release(struct qbman_swp *s, const struct qbman_release_desc *d,
-		      const u64 *buffers, unsigned int num_buffers);
 int qbman_swp_acquire(struct qbman_swp *s, u16 bpid, u64 *buffers,
 		      unsigned int num_buffers);
 int qbman_swp_alt_fq_state(struct qbman_swp *s, u32 fqid,
@@ -204,18 +224,60 @@ void *qbman_swp_mc_start(struct qbman_swp *p);
 void qbman_swp_mc_submit(struct qbman_swp *p, void *cmd, u8 cmd_verb);
 void *qbman_swp_mc_result(struct qbman_swp *p);
 
-int
+/**
+ * qbman_swp_enqueue() - Issue an enqueue command
+ * @s:  the software portal used for enqueue
+ * @d:  the enqueue descriptor
+ * @fd: the frame descriptor to be enqueued
+ *
+ * Return 0 for successful enqueue, -EBUSY if the EQCR is not ready.
+ */
+static inline int
+qbman_swp_enqueue(struct qbman_swp *s, const struct qbman_eq_desc *d,
+		  const struct dpaa2_fd *fd)
+{
+	return qbman_swp_enqueue_ptr(s, d, fd);
+}
+
+/**
+ * qbman_swp_enqueue_multiple() - Issue a multi enqueue command
+ * using one enqueue descriptor
+ * @s:  the software portal used for enqueue
+ * @d:  the enqueue descriptor
+ * @fd: table pointer of frame descriptor table to be enqueued
+ * @flags: table pointer of QBMAN_ENQUEUE_FLAG_DCA flags, not used if NULL
+ * @num_frames: number of fd to be enqueued
+ *
+ * Return the number of fd enqueued, or a negative error number.
+ */
+static inline int
 qbman_swp_enqueue_multiple(struct qbman_swp *s,
 			   const struct qbman_eq_desc *d,
 			   const struct dpaa2_fd *fd,
 			   uint32_t *flags,
-			   int num_frames);
+			   int num_frames)
+{
+	return qbman_swp_enqueue_multiple_ptr(s, d, fd, flags, num_frames);
+}
 
-int
+/**
+ * qbman_swp_enqueue_multiple_desc() - Issue a multi enqueue command
+ * using multiple enqueue descriptor
+ * @s:  the software portal used for enqueue
+ * @d:  table of minimal enqueue descriptor
+ * @fd: table pointer of frame descriptor table to be enqueued
+ * @num_frames: number of fd to be enqueued
+ *
+ * Return the number of fd enqueued, or a negative error number.
+ */
+static inline int
 qbman_swp_enqueue_multiple_desc(struct qbman_swp *s,
 				const struct qbman_eq_desc *d,
 				const struct dpaa2_fd *fd,
-				int num_frames);
+				int num_frames)
+{
+	return qbman_swp_enqueue_multiple_desc_ptr(s, d, fd, num_frames);
+}
 
 /**
  * qbman_result_is_DQ() - check if the dequeue result is a dequeue response
@@ -528,4 +590,49 @@ int qbman_bp_query(struct qbman_swp *s, u16 bpid,
 
 u32 qbman_bp_info_num_free_bufs(struct qbman_bp_query_rslt *a);
 
+/**
+ * qbman_swp_release() - Issue a buffer release command
+ * @s:           the software portal object
+ * @d:           the release descriptor
+ * @buffers:     a pointer pointing to the buffer address to be released
+ * @num_buffers: number of buffers to be released,  must be less than 8
+ *
+ * Return 0 for success, -EBUSY if the release command ring is not ready.
+ */
+static inline int qbman_swp_release(struct qbman_swp *s,
+				    const struct qbman_release_desc *d,
+				    const u64 *buffers,
+				    unsigned int num_buffers)
+{
+	return qbman_swp_release_ptr(s, d, buffers, num_buffers);
+}
+
+/**
+ * qbman_swp_pull() - Issue the pull dequeue command
+ * @s: the software portal object
+ * @d: the software portal descriptor which has been configured with
+ *     the set of qbman_pull_desc_set_*() calls
+ *
+ * Return 0 for success, and -EBUSY if the software portal is not ready
+ * to do pull dequeue.
+ */
+static inline int qbman_swp_pull(struct qbman_swp *s,
+				 struct qbman_pull_desc *d)
+{
+	return qbman_swp_pull_ptr(s, d);
+}
+
+/**
+ * qbman_swp_dqrr_next() - Get a valid DQRR entry
+ * @s: the software portal object
+ *
+ * Return NULL if there are no unconsumed DQRR entries. Return a DQRR entry
+ * only once, so repeated calls can return a sequence of DQRR entries, without
+ * requiring they be consumed immediately or in any particular order.
+ */
+static inline const struct dpaa2_dq *qbman_swp_dqrr_next(struct qbman_swp *s)
+{
+	return qbman_swp_dqrr_next_ptr(s);
+}
+
 #endif /* __FSL_QBMAN_PORTAL_H */
-- 
2.7.4



* [PATCH 3/3] soc: fsl: dpio: Replace QMAN array mode by ring mode enqueue.
  2019-12-12 17:01 [PATCH 0/3] soc: fsl: dpio: Enable QMAN batch enqueuing Youri Querry
  2019-12-12 17:01 ` [PATCH 1/3] soc: fsl: dpio: Adding QMAN multiple enqueue interface Youri Querry
  2019-12-12 17:01 ` [PATCH 2/3] soc: fsl: dpio: QMAN performance improvement. Function pointer indirection Youri Querry
@ 2019-12-12 17:01 ` Youri Querry
  2020-02-06 20:39 ` [PATCH 0/3] soc: fsl: dpio: Enable QMAN batch enqueuing Roy Pledge
  3 siblings, 0 replies; 8+ messages in thread
From: Youri Querry @ 2019-12-12 17:01 UTC (permalink / raw)
  To: Roy Pledge, Leo Li, linux-kernel, linuxppc-dev, linux-arm-kernel,
	Ioana Ciornei, Alexandru Marginean
  Cc: Youri Querry

This change of algorithm enables faster bulk enqueue, which will
greatly benefit XDP bulk enqueue; a sketch of the ring index
arithmetic follows.
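
The ring bookkeeping this relies on keeps producer/consumer indices
modulo twice the ring size, so that a full ring and an empty ring stay
distinguishable; free space is then computed with the qm_cyc_diff()
helper added below. A standalone sketch of the arithmetic for a
hypothetical 8-entry EQCR (indices 0..15):

	static unsigned char qm_cyc_diff(unsigned char ringsize,
					 unsigned char first,
					 unsigned char last)
	{
		/* 'first' is included, 'last' is excluded */
		if (first <= last)
			return last - first;
		return (2 * ringsize) - (first - last);
	}

	/*
	 * Example: the cached consumer index was 14 and a fresh read of
	 * EQCR_CI returns 2, i.e. hardware consumed entries 14, 15, 0
	 * and 1 across the wrap:
	 *
	 *	qm_cyc_diff(8, 14, 2) = (2 * 8) - (14 - 2) = 4 free slots
	 */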

Signed-off-by: Youri Querry <youri.querry_1@nxp.com>
---
 drivers/soc/fsl/dpio/qbman-portal.c | 410 +++++++++++++++++++++++++++---------
 drivers/soc/fsl/dpio/qbman-portal.h |  13 ++
 2 files changed, 328 insertions(+), 95 deletions(-)

diff --git a/drivers/soc/fsl/dpio/qbman-portal.c b/drivers/soc/fsl/dpio/qbman-portal.c
index 0ffe018..740ee0d 100644
--- a/drivers/soc/fsl/dpio/qbman-portal.c
+++ b/drivers/soc/fsl/dpio/qbman-portal.c
@@ -8,6 +8,7 @@
 #include <asm/cacheflush.h>
 #include <linux/io.h>
 #include <linux/slab.h>
+#include <linux/spinlock.h>
 #include <soc/fsl/dpaa2-global.h>
 
 #include "qbman-portal.h"
@@ -21,6 +22,7 @@
 
 /* CINH register offsets */
 #define QBMAN_CINH_SWP_EQCR_PI      0x800
+#define QBMAN_CINH_SWP_EQCR_CI	    0x840
 #define QBMAN_CINH_SWP_EQAR    0x8c0
 #define QBMAN_CINH_SWP_CR_RT        0x900
 #define QBMAN_CINH_SWP_VDQCR_RT     0x940
@@ -44,6 +46,8 @@
 #define QBMAN_CENA_SWP_CR      0x600
 #define QBMAN_CENA_SWP_RR(vb)  (0x700 + ((u32)(vb) >> 1))
 #define QBMAN_CENA_SWP_VDQCR   0x780
+#define QBMAN_CENA_SWP_EQCR_CI 0x840
+#define QBMAN_CENA_SWP_EQCR_CI_MEMBACK 0x1840
 
 /* CENA register offsets in memory-backed mode */
 #define QBMAN_CENA_SWP_DQRR_MEM(n)  (0x800 + ((u32)(n) << 6))
@@ -71,6 +75,12 @@
 /* opaque token for static dequeues */
 #define QMAN_SDQCR_TOKEN    0xbb
 
+#define QBMAN_EQCR_DCA_IDXMASK          0x0f
+#define QBMAN_ENQUEUE_FLAG_DCA          (1ULL << 31)
+
+#define EQ_DESC_SIZE_WITHOUT_FD 29
+#define EQ_DESC_SIZE_FD_START 32
+
 enum qbman_sdqcr_dct {
 	qbman_sdqcr_dct_null = 0,
 	qbman_sdqcr_dct_prio_ics,
@@ -215,6 +225,15 @@ static inline u32 qbman_set_swp_cfg(u8 max_fill, u8 wn,	u8 est, u8 rpm, u8 dcm,
 
 #define QMAN_RT_MODE	   0x00000100
 
+static inline u8 qm_cyc_diff(u8 ringsize, u8 first, u8 last)
+{
+	/* 'first' is included, 'last' is excluded */
+	if (first <= last)
+		return last - first;
+	else
+		return (2 * ringsize) - (first - last);
+}
+
 /**
  * qbman_swp_init() - Create a functional object representing the given
  *                    QBMan portal descriptor.
@@ -227,6 +246,10 @@ struct qbman_swp *qbman_swp_init(const struct qbman_swp_desc *d)
 {
 	struct qbman_swp *p = kzalloc(sizeof(*p), GFP_KERNEL);
 	u32 reg;
+	u32 mask_size;
+	u32 eqcr_pi;
 
 	if (!p)
 		return NULL;
+
+	spin_lock_init(&p->access_spinlock);
@@ -255,25 +278,38 @@ struct qbman_swp *qbman_swp_init(const struct qbman_swp_desc *d)
 	p->addr_cena = d->cena_bar;
 	p->addr_cinh = d->cinh_bar;
 
-	if ((p->desc->qman_version & QMAN_REV_MASK) >= QMAN_REV_5000)
-		memset(p->addr_cena, 0, 64 * 1024);
+	if ((p->desc->qman_version & QMAN_REV_MASK) < QMAN_REV_5000) {
 
-	reg = qbman_set_swp_cfg(p->dqrr.dqrr_size,
-				1, /* Writes Non-cacheable */
-				0, /* EQCR_CI stashing threshold */
-				3, /* RPM: Valid bit mode, RCR in array mode */
-				2, /* DCM: Discrete consumption ack mode */
-				3, /* EPM: Valid bit mode, EQCR in array mode */
-				1, /* mem stashing drop enable == TRUE */
-				1, /* mem stashing priority == TRUE */
-				1, /* mem stashing enable == TRUE */
-				1, /* dequeue stashing priority == TRUE */
-				0, /* dequeue stashing enable == FALSE */
-				0); /* EQCR_CI stashing priority == FALSE */
-	if ((p->desc->qman_version & QMAN_REV_MASK) >= QMAN_REV_5000)
+		reg = qbman_set_swp_cfg(p->dqrr.dqrr_size,
+			1, /* Writes Non-cacheable */
+			0, /* EQCR_CI stashing threshold */
+			3, /* RPM: RCR in array mode */
+			2, /* DCM: Discrete consumption ack */
+			2, /* EPM: EQCR in ring mode */
+			1, /* mem stashing drop enable */
+			1, /* mem stashing priority enable */
+			1, /* mem stashing enable */
+			1, /* dequeue stashing priority enable */
+			0, /* dequeue stashing enable */
+			0); /* EQCR_CI stashing priority enable */
+	} else {
+		memset(p->addr_cena, 0, 64 * 1024);
+		reg = qbman_set_swp_cfg(p->dqrr.dqrr_size,
+			1, /* Writes Non-cacheable */
+			1, /* EQCR_CI stashing threshold */
+			3, /* RPM: RCR in array mode */
+			2, /* DCM: Discrete consumption ack */
+			0, /* EPM: EQCR in ring mode */
+			1, /* mem stashing drop enable */
+			1, /* mem stashing priority enable */
+			1, /* mem stashing enable */
+			1, /* dequeue stashing priority enable */
+			0, /* dequeue stashing enable */
+			0); /* EQCR_CI stashing priority enable */
 		reg |= 1 << SWP_CFG_CPBS_SHIFT | /* memory-backed mode */
 		       1 << SWP_CFG_VPM_SHIFT |  /* VDQCR read triggered mode */
 		       1 << SWP_CFG_CPM_SHIFT;   /* CR read triggered mode */
+	}
 
 	qbman_write_register(p, QBMAN_CINH_SWP_CFG, reg);
 	reg = qbman_read_register(p, QBMAN_CINH_SWP_CFG);
@@ -295,7 +331,9 @@ struct qbman_swp *qbman_swp_init(const struct qbman_swp_desc *d)
 	 */
 	qbman_write_register(p, QBMAN_CINH_SWP_SDQCR, 0);
 
+	p->eqcr.pi_ring_size = 8;
 	if ((p->desc->qman_version & QMAN_REV_MASK) >= QMAN_REV_5000) {
+		p->eqcr.pi_ring_size = 32;
 		qbman_swp_enqueue_ptr =
 			qbman_swp_enqueue_mem_back;
 		qbman_swp_enqueue_multiple_ptr =
@@ -307,6 +345,15 @@ struct qbman_swp *qbman_swp_init(const struct qbman_swp_desc *d)
 		qbman_swp_release_ptr = qbman_swp_release_mem_back;
 	}
 
+	for (mask_size = p->eqcr.pi_ring_size; mask_size > 0; mask_size >>= 1)
+		p->eqcr.pi_ci_mask = (p->eqcr.pi_ci_mask << 1) + 1;
+	eqcr_pi = qbman_read_register(p, QBMAN_CINH_SWP_EQCR_PI);
+	p->eqcr.pi = eqcr_pi & p->eqcr.pi_ci_mask;
+	p->eqcr.pi_vb = eqcr_pi & QB_VALID_BIT;
+	p->eqcr.ci = qbman_read_register(p, QBMAN_CINH_SWP_EQCR_CI)
+			& p->eqcr.pi_ci_mask;
+	p->eqcr.available = p->eqcr.pi_ring_size;
+
 	return p;
 }
 
@@ -460,6 +507,7 @@ enum qb_enqueue_commands {
 #define QB_ENQUEUE_CMD_ORP_ENABLE_SHIFT      2
 #define QB_ENQUEUE_CMD_IRQ_ON_DISPATCH_SHIFT 3
 #define QB_ENQUEUE_CMD_TARGET_TYPE_SHIFT     4
+#define QB_ENQUEUE_CMD_DCA_EN_SHIFT          7
 
 /**
  * qbman_eq_desc_clear() - Clear the contents of a descriptor to
@@ -535,6 +583,7 @@ static inline void qbman_write_eqcr_am_rt_register(struct qbman_swp *p,
 				     QMAN_RT_MODE);
 }
 
+#define QB_RT_BIT ((u32)0x100)
 /**
  * qbman_swp_enqueue_direct() - Issue an enqueue command
  * @s:  the software portal used for enqueue
@@ -546,34 +595,19 @@ static inline void qbman_write_eqcr_am_rt_register(struct qbman_swp *p,
  *
  * Return 0 for successful enqueue, -EBUSY if the EQCR is not ready.
  */
-int qbman_swp_enqueue_direct(struct qbman_swp *s, const struct qbman_eq_desc *d,
-		      const struct dpaa2_fd *fd)
+static
+int qbman_swp_enqueue_direct(struct qbman_swp *s,
+			     const struct qbman_eq_desc *d,
+			     const struct dpaa2_fd *fd)
 {
-	struct qbman_eq_desc_with_fd *p;
-	u32 eqar = qbman_read_register(s, QBMAN_CINH_SWP_EQAR);
-
-	if (!EQAR_SUCCESS(eqar))
-		return -EBUSY;
+	uint32_t flags = 0;
+	int ret = qbman_swp_enqueue_multiple_direct(s, d, fd, &flags, 1);
 
-	p = qbman_get_cmd(s, QBMAN_CENA_SWP_EQCR(EQAR_IDX(eqar)));
-	/* This is mapped as DEVICE type memory, writes are
-	 * with address alignment:
-	 * desc.dca address alignment = 1
-	 * desc.seqnum address alignment = 2
-	 * desc.orpid address alignment = 4
-	 * desc.tgtid address alignment = 8
-	 */
-	p->desc.dca = d->dca;
-	p->desc.seqnum = d->seqnum;
-	p->desc.orpid = d->orpid;
-	memcpy(&p->desc.tgtid, &d->tgtid, 24);
-	memcpy(&p->fd, fd, sizeof(*fd));
-
-	/* Set the verb byte, have to substitute in the valid-bit */
-	dma_wmb();
-	p->desc.verb = d->verb | EQAR_VB(eqar);
-
-	return 0;
+	if (ret >= 0)
+		ret = 0;
+	else
+		ret = -EBUSY;
+	return ret;
 }
 
 /**
@@ -587,35 +621,19 @@ int qbman_swp_enqueue_direct(struct qbman_swp *s, const struct qbman_eq_desc *d,
  *
  * Return 0 for successful enqueue, -EBUSY if the EQCR is not ready.
  */
+static
 int qbman_swp_enqueue_mem_back(struct qbman_swp *s,
 			       const struct qbman_eq_desc *d,
 			       const struct dpaa2_fd *fd)
 {
-	struct qbman_eq_desc_with_fd *p;
-	u32 eqar = qbman_read_register(s, QBMAN_CINH_SWP_EQAR);
-
-	if (!EQAR_SUCCESS(eqar))
-		return -EBUSY;
+	uint32_t flags = 0;
+	int ret = qbman_swp_enqueue_multiple_mem_back(s, d, fd, &flags, 1);
 
-	p = qbman_get_cmd(s, QBMAN_CENA_SWP_EQCR(EQAR_IDX(eqar)));
-	/* This is mapped as DEVICE type memory, writes are
-	 * with address alignment:
-	 * desc.dca address alignment = 1
-	 * desc.seqnum address alignment = 2
-	 * desc.orpid address alignment = 4
-	 * desc.tgtid address alignment = 8
-	 */
-	p->desc.dca = d->dca;
-	p->desc.seqnum = d->seqnum;
-	p->desc.orpid = d->orpid;
-	memcpy(&p->desc.tgtid, &d->tgtid, 24);
-	memcpy(&p->fd, fd, sizeof(*fd));
-
-	p->desc.verb = d->verb | EQAR_VB(eqar);
-	dma_wmb();
-	qbman_write_eqcr_am_rt_register(s, EQAR_IDX(eqar));
-
-	return 0;
+	if (ret >= 0)
+		ret = 0;
+	else
+		ret = -EBUSY;
+	return ret;
 }
 
 /**
@@ -624,26 +642,82 @@ int qbman_swp_enqueue_mem_back(struct qbman_swp *s,
  * @s:  the software portal used for enqueue
  * @d:  the enqueue descriptor
  * @fd: table pointer of frame descriptor table to be enqueued
- * @flags: table pointer of flags, not used for the moment
+ * @flags: table pointer of QBMAN_ENQUEUE_FLAG_DCA flags, not used if NULL
  * @num_frames: number of fd to be enqueued
  *
  * Return the number of fd enqueued, or a negative error number.
  */
+static
 int qbman_swp_enqueue_multiple_direct(struct qbman_swp *s,
 				      const struct qbman_eq_desc *d,
 				      const struct dpaa2_fd *fd,
 				      uint32_t *flags,
 				      int num_frames)
 {
-	int count = 0;
+	uint32_t *p = NULL;
+	const uint32_t *cl = (uint32_t *)d;
+	uint32_t eqcr_ci, eqcr_pi, half_mask, full_mask;
+	int i, num_enqueued = 0;
+	uint64_t addr_cena;
+
+	spin_lock(&s->access_spinlock);
+	half_mask = (s->eqcr.pi_ci_mask>>1);
+	full_mask = s->eqcr.pi_ci_mask;
+
+	if (!s->eqcr.available) {
+		eqcr_ci = s->eqcr.ci;
+		p = s->addr_cena + QBMAN_CENA_SWP_EQCR_CI;
+		s->eqcr.ci = qbman_read_register(s, QBMAN_CINH_SWP_EQCR_CI);
+
+		s->eqcr.available = qm_cyc_diff(s->eqcr.pi_ring_size,
+					eqcr_ci, s->eqcr.ci);
+		if (!s->eqcr.available) {
+			spin_unlock(&s->access_spinlock);
+			return 0;
+		}
+	}
 
-	while (count < num_frames) {
-		if (qbman_swp_enqueue_direct(s, d, fd) != 0)
-			break;
-		count++;
+	eqcr_pi = s->eqcr.pi;
+	num_enqueued = (s->eqcr.available < num_frames) ?
+			s->eqcr.available : num_frames;
+	s->eqcr.available -= num_enqueued;
+	/* Fill in the EQCR ring */
+	for (i = 0; i < num_enqueued; i++) {
+		p = (s->addr_cena + QBMAN_CENA_SWP_EQCR(eqcr_pi & half_mask));
+		/* Skip copying the verb */
+		memcpy(&p[1], &cl[1], EQ_DESC_SIZE_WITHOUT_FD - 1);
+		memcpy(&p[EQ_DESC_SIZE_FD_START/sizeof(uint32_t)],
+		       &fd[i], sizeof(*fd));
+		eqcr_pi++;
 	}
 
-	return count;
+	dma_wmb();
+
+	/* Set the verb byte, have to substitute in the valid-bit */
+	eqcr_pi = s->eqcr.pi;
+	for (i = 0; i < num_enqueued; i++) {
+		p = (s->addr_cena + QBMAN_CENA_SWP_EQCR(eqcr_pi & half_mask));
+		p[0] = cl[0] | s->eqcr.pi_vb;
+		if (flags && (flags[i] & QBMAN_ENQUEUE_FLAG_DCA)) {
+			struct qbman_eq_desc *d = (struct qbman_eq_desc *)p;
+
+			d->dca = (1 << QB_ENQUEUE_CMD_DCA_EN_SHIFT) |
+				((flags[i]) & QBMAN_EQCR_DCA_IDXMASK);
+		}
+		eqcr_pi++;
+		if (!(eqcr_pi & half_mask))
+			s->eqcr.pi_vb ^= QB_VALID_BIT;
+	}
+
+	/* Flush all the cacheline without load/store in between */
+	eqcr_pi = s->eqcr.pi;
+	addr_cena = (size_t)s->addr_cena;
+	for (i = 0; i < num_enqueued; i++)
+		eqcr_pi++;
+	s->eqcr.pi = eqcr_pi & full_mask;
+	spin_unlock(&s->access_spinlock);
+
+	return num_enqueued;
 }
 
 /**
@@ -652,26 +726,80 @@ int qbman_swp_enqueue_multiple_direct(struct qbman_swp *s,
  * @s:  the software portal used for enqueue
  * @d:  the enqueue descriptor
  * @fd: table pointer of frame descriptor table to be enqueued
- * @flags: table pointer of flags, not used for the moment
+ * @flags: table pointer of QBMAN_ENQUEUE_FLAG_DCA flags, not used if NULL
  * @num_frames: number of fd to be enqueued
  *
  * Return the number of fd enqueued, or a negative error number.
  */
+static
 int qbman_swp_enqueue_multiple_mem_back(struct qbman_swp *s,
-				      const struct qbman_eq_desc *d,
-				      const struct dpaa2_fd *fd,
-				      uint32_t *flags,
-				      int num_frames)
+					const struct qbman_eq_desc *d,
+					const struct dpaa2_fd *fd,
+					uint32_t *flags,
+					int num_frames)
 {
-	int count = 0;
+	uint32_t *p = NULL;
+	const uint32_t *cl = (uint32_t *)(d);
+	uint32_t eqcr_ci, eqcr_pi, half_mask, full_mask;
+	int i, num_enqueued = 0;
+	unsigned long irq_flags;
+
+	spin_lock(&s->access_spinlock);
+	local_irq_save(irq_flags);
+
+	half_mask = (s->eqcr.pi_ci_mask>>1);
+	full_mask = s->eqcr.pi_ci_mask;
+	if (!s->eqcr.available) {
+		eqcr_ci = s->eqcr.ci;
+		p = s->addr_cena + QBMAN_CENA_SWP_EQCR_CI_MEMBACK;
+		s->eqcr.ci = __raw_readl(p) & full_mask;
+		s->eqcr.available = qm_cyc_diff(s->eqcr.pi_ring_size,
+					eqcr_ci, s->eqcr.ci);
+		if (!s->eqcr.available) {
+			local_irq_restore(irq_flags);
+			spin_unlock(&s->access_spinlock);
+			return 0;
+		}
+	}
+
+	eqcr_pi = s->eqcr.pi;
+	num_enqueued = (s->eqcr.available < num_frames) ?
+			s->eqcr.available : num_frames;
+	s->eqcr.available -= num_enqueued;
+	/* Fill in the EQCR ring */
+	for (i = 0; i < num_enqueued; i++) {
+		p = (s->addr_cena + QBMAN_CENA_SWP_EQCR(eqcr_pi & half_mask));
+		/* Skip copying the verb */
+		memcpy(&p[1], &cl[1], EQ_DESC_SIZE_WITHOUT_FD - 1);
+		memcpy(&p[EQ_DESC_SIZE_FD_START/sizeof(uint32_t)],
+		       &fd[i], sizeof(*fd));
+		eqcr_pi++;
+	}
 
-	while (count < num_frames) {
-		if (qbman_swp_enqueue_mem_back(s, d, fd) != 0)
-			break;
-		count++;
+	/* Set the verb byte, have to substitute in the valid-bit */
+	eqcr_pi = s->eqcr.pi;
+	for (i = 0; i < num_enqueued; i++) {
+		p = (s->addr_cena + QBMAN_CENA_SWP_EQCR(eqcr_pi & half_mask));
+		p[0] = cl[0] | s->eqcr.pi_vb;
+		if (flags && (flags[i] & QBMAN_ENQUEUE_FLAG_DCA)) {
+			struct qbman_eq_desc *d = (struct qbman_eq_desc *)p;
+
+			d->dca = (1 << QB_ENQUEUE_CMD_DCA_EN_SHIFT) |
+				((flags[i]) & QBMAN_EQCR_DCA_IDXMASK);
+		}
+		eqcr_pi++;
+		if (!(eqcr_pi & half_mask))
+			s->eqcr.pi_vb ^= QB_VALID_BIT;
 	}
+	s->eqcr.pi = eqcr_pi & full_mask;
 
-	return count;
+	dma_wmb();
+	qbman_write_register(s, QBMAN_CINH_SWP_EQCR_PI,
+				(QB_RT_BIT)|(s->eqcr.pi)|s->eqcr.pi_vb);
+	local_irq_restore(irq_flags);
+	spin_unlock(&s->access_spinlock);
+
+	return num_enqueued;
 }
 
 /**
@@ -684,20 +812,66 @@ int qbman_swp_enqueue_multiple_mem_back(struct qbman_swp *s,
  *
  * Return the number of fd enqueued, or a negative error number.
  */
+static
 int qbman_swp_enqueue_multiple_desc_direct(struct qbman_swp *s,
 					   const struct qbman_eq_desc *d,
 					   const struct dpaa2_fd *fd,
 					   int num_frames)
 {
-	int count = 0;
+	uint32_t *p;
+	const uint32_t *cl;
+	uint32_t eqcr_ci, eqcr_pi, half_mask, full_mask;
+	int i, num_enqueued = 0;
+	uint64_t addr_cena;
+
+	half_mask = (s->eqcr.pi_ci_mask>>1);
+	full_mask = s->eqcr.pi_ci_mask;
+	if (!s->eqcr.available) {
+		eqcr_ci = s->eqcr.ci;
+		p = s->addr_cena + QBMAN_CENA_SWP_EQCR_CI;
+		s->eqcr.ci = qbman_read_register(s, QBMAN_CINH_SWP_EQCR_CI);
+		s->eqcr.available = qm_cyc_diff(s->eqcr.pi_ring_size,
+					eqcr_ci, s->eqcr.ci);
+		if (!s->eqcr.available)
+			return 0;
+	}
+
+	eqcr_pi = s->eqcr.pi;
+	num_enqueued = (s->eqcr.available < num_frames) ?
+			s->eqcr.available : num_frames;
+	s->eqcr.available -= num_enqueued;
+	/* Fill in the EQCR ring */
+	for (i = 0; i < num_enqueued; i++) {
+		p = (s->addr_cena + QBMAN_CENA_SWP_EQCR(eqcr_pi & half_mask));
+		cl = (uint32_t *)(&d[i]);
+		/* Skip copying the verb */
+		memcpy(&p[1], &cl[1], EQ_DESC_SIZE_WITHOUT_FD - 1);
+		memcpy(&p[EQ_DESC_SIZE_FD_START/sizeof(uint32_t)],
+		       &fd[i], sizeof(*fd));
+		eqcr_pi++;
+	}
+
+	dma_wmb();
 
-	while (count < num_frames) {
-		if (qbman_swp_enqueue_direct(s, &(d[count]), fd) != 0)
-			break;
-		count++;
+	/* Set the verb byte, have to substitute in the valid-bit */
+	eqcr_pi = s->eqcr.pi;
+	for (i = 0; i < num_enqueued; i++) {
+		p = (s->addr_cena + QBMAN_CENA_SWP_EQCR(eqcr_pi & half_mask));
+		cl = (uint32_t *)(&d[i]);
+		p[0] = cl[0] | s->eqcr.pi_vb;
+		eqcr_pi++;
+		if (!(eqcr_pi & half_mask))
+			s->eqcr.pi_vb ^= QB_VALID_BIT;
 	}
 
-	return count;
+	/* eqcr_pi already points one past the last entry written above */
+	s->eqcr.pi = eqcr_pi & full_mask;
+
+	return num_enqueued;
 }
 
 /**
@@ -710,20 +884,62 @@ int qbman_swp_enqueue_multiple_desc_direct(struct qbman_swp *s,
  *
  * Return the number of fd enqueued, or a negative error number.
  */
+static
 int qbman_swp_enqueue_multiple_desc_mem_back(struct qbman_swp *s,
 					   const struct qbman_eq_desc *d,
 					   const struct dpaa2_fd *fd,
 					   int num_frames)
 {
-	int count = 0;
+	uint32_t *p;
+	const uint32_t *cl;
+	uint32_t eqcr_ci, eqcr_pi, half_mask, full_mask;
+	int i, num_enqueued = 0;
+
+	half_mask = (s->eqcr.pi_ci_mask>>1);
+	full_mask = s->eqcr.pi_ci_mask;
+	if (!s->eqcr.available) {
+		eqcr_ci = s->eqcr.ci;
+		p = s->addr_cena + QBMAN_CENA_SWP_EQCR_CI_MEMBACK;
+		s->eqcr.ci = __raw_readl(p) & full_mask;
+		s->eqcr.available = qm_cyc_diff(s->eqcr.pi_ring_size,
+					eqcr_ci, s->eqcr.ci);
+		if (!s->eqcr.available)
+			return 0;
+	}
 
-	while (count < num_frames) {
-		if (qbman_swp_enqueue_mem_back(s, &(d[count]), fd) != 0)
-			break;
-		count++;
+	eqcr_pi = s->eqcr.pi;
+	num_enqueued = (s->eqcr.available < num_frames) ?
+			s->eqcr.available : num_frames;
+	s->eqcr.available -= num_enqueued;
+	/* Fill in the EQCR ring */
+	for (i = 0; i < num_enqueued; i++) {
+		p = (s->addr_cena + QBMAN_CENA_SWP_EQCR(eqcr_pi & half_mask));
+		cl = (uint32_t *)(&d[i]);
+		/* Skip copying the verb */
+		memcpy(&p[1], &cl[1], EQ_DESC_SIZE_WITHOUT_FD - 1);
+		memcpy(&p[EQ_DESC_SIZE_FD_START/sizeof(uint32_t)],
+		       &fd[i], sizeof(*fd));
+		eqcr_pi++;
 	}
 
-	return count;
+	/* Set the verb byte, have to substitute in the valid-bit */
+	eqcr_pi = s->eqcr.pi;
+	for (i = 0; i < num_enqueued; i++) {
+		p = (s->addr_cena + QBMAN_CENA_SWP_EQCR(eqcr_pi & half_mask));
+		cl = (uint32_t *)(&d[i]);
+		p[0] = cl[0] | s->eqcr.pi_vb;
+		eqcr_pi++;
+		if (!(eqcr_pi & half_mask))
+			s->eqcr.pi_vb ^= QB_VALID_BIT;
+	}
+
+	s->eqcr.pi = eqcr_pi & full_mask;
+
+	dma_wmb();
+	qbman_write_register(s, QBMAN_CINH_SWP_EQCR_PI,
+				(QB_RT_BIT)|(s->eqcr.pi)|s->eqcr.pi_vb);
+
+	return num_enqueued;
 }
 
 /* Static (push) dequeue */
@@ -889,6 +1105,7 @@ void qbman_pull_desc_set_channel(struct qbman_pull_desc *d, u32 chid,
  * Return 0 for success, and -EBUSY if the software portal is not ready
  * to do pull dequeue.
  */
+static
 int qbman_swp_pull_direct(struct qbman_swp *s, struct qbman_pull_desc *d)
 {
 	struct qbman_pull_desc *p;
@@ -924,6 +1141,7 @@ int qbman_swp_pull_direct(struct qbman_swp *s, struct qbman_pull_desc *d)
  * Return 0 for success, and -EBUSY if the software portal is not ready
  * to do pull dequeue.
  */
+static
 int qbman_swp_pull_mem_back(struct qbman_swp *s, struct qbman_pull_desc *d)
 {
 	struct qbman_pull_desc *p;
@@ -942,6 +1160,8 @@ int qbman_swp_pull_mem_back(struct qbman_swp *s, struct qbman_pull_desc *d)
 	p->dq_src = d->dq_src;
 	p->rsp_addr = d->rsp_addr;
 	p->rsp_addr_virt = d->rsp_addr_virt;
+
+	/* Set the verb byte, have to substitute in the valid-bit */
 	p->verb = d->verb | s->vdq.valid_bit;
 	s->vdq.valid_bit ^= QB_VALID_BIT;
 	dma_wmb();
diff --git a/drivers/soc/fsl/dpio/qbman-portal.h b/drivers/soc/fsl/dpio/qbman-portal.h
index 3b3fb86..c7c2225 100644
--- a/drivers/soc/fsl/dpio/qbman-portal.h
+++ b/drivers/soc/fsl/dpio/qbman-portal.h
@@ -143,6 +143,19 @@ struct qbman_swp {
 		u8 dqrr_size;
 		int reset_bug; /* indicates dqrr reset workaround is needed */
 	} dqrr;
+
+	struct {
+		u32 pi;			/* producer index */
+		u32 pi_vb;		/* current valid-bit polarity */
+		u32 pi_ring_size;	/* number of EQCR entries */
+		u32 pi_ci_mask;		/* mask applied to pi and ci */
+		u32 ci;			/* cached consumer index */
+		int available;		/* free EQCR entries */
+		u32 pend;
+		u32 no_pfdr;
+	} eqcr;
+
+	spinlock_t access_spinlock;
 };
 
 /* Function pointers */
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 8+ messages in thread
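
The ring discipline shared by the new enqueue paths above is worth
spelling out: free EQCR space is recomputed lazily as the cyclic
distance the consumer index has advanced, and each entry is published
by writing its verb byte with a valid bit whose polarity flips on every
lap of the ring, letting hardware tell fresh entries from stale ones.
The stand-alone sketch below condenses that arithmetic; the ring size,
the masks and every name in it are simplified placeholders, not the
driver's actual definitions.

#include <stdint.h>
#include <stdio.h>

#define QB_VALID_BIT 0x80                   /* polarity bit in the verb byte */
#define RING_SIZE    8u                     /* toy ring; real size is probed */
#define FULL_MASK    (2u * RING_SIZE - 1u)  /* pi/ci span two laps of the ring */
#define HALF_MASK    (FULL_MASK >> 1)       /* mask yielding the entry index */

/* Cyclic distance from 'first' to 'last' on a ring of 'size' positions;
 * the driver's qm_cyc_diff() plays the analogous role when it refreshes
 * eqcr.available from a fresh read of the consumer index. */
static uint32_t cyc_diff(uint32_t size, uint32_t first, uint32_t last)
{
	return (first <= last) ? last - first : size + last - first;
}

int main(void)
{
	uint32_t pi = 0, ci = 0, pi_vb = QB_VALID_BIT;
	uint32_t available = RING_SIZE;
	int frame;

	for (frame = 0; frame < 12; frame++) {
		if (!available) {
			ci = pi;	/* stand-in for reading EQCR_CI back */
			available = RING_SIZE -
				    cyc_diff(FULL_MASK + 1, ci, pi);
		}
		available--;

		/* Fill slot (pi & HALF_MASK) with the descriptor and frame
		 * descriptor, then publish by writing the verb byte with
		 * the current valid-bit polarity. */
		printf("frame %2d -> slot %u, valid bit 0x%02x\n",
		       frame, pi & HALF_MASK, pi_vb);

		pi = (pi + 1) & FULL_MASK;
		if (!(pi & HALF_MASK))	/* crossed a lap boundary */
			pi_vb ^= QB_VALID_BIT;
	}
	return 0;
}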

* Re: [PATCH 0/3] soc: fsl: dpio: Enable QMAN batch enqueuing
  2019-12-12 17:01 [PATCH 0/3] soc: fsl: dpio: Enable QMAN batch enqueuing Youri Querry
                   ` (2 preceding siblings ...)
  2019-12-12 17:01 ` [PATCH 3/3] soc: fsl: dpio: Replace QMAN array mode by ring mode enqueue Youri Querry
@ 2020-02-06 20:39 ` Roy Pledge
  2020-02-06 22:05   ` Leo Li
  2020-02-20  0:07   ` Li Yang
  3 siblings, 2 replies; 8+ messages in thread
From: Roy Pledge @ 2020-02-06 20:39 UTC (permalink / raw)
  To: Youri Querry, Roy Pledge, Leo Li, linux-kernel, linuxppc-dev,
	linux-arm-kernel, Ioana Ciornei, Alexandru Marginean

On 12/12/2019 12:01 PM, Youri Querry wrote:
> This patch set consists of:
> - We added an interface to enqueue several packets at a time and
>    improve performance.
> - Make the algorithm decisions once at initialization and use
>    function pointers to improve performance.
> - Replaced the QMAN enqueue array mode algorithm with a ring
>    mode algorithm. This is to make the enqueue of several frames
>    at a time more effective.
> 
> Youri Querry (3):
>    soc: fsl: dpio: Adding QMAN multiple enqueue interface.
>    soc: fsl: dpio: QMAN performance improvement. Function pointer
>      indirection.
>    soc: fsl: dpio: Replace QMAN array mode by ring mode enqueue.
> 
>   drivers/soc/fsl/dpio/dpio-service.c |  69 +++-
>   drivers/soc/fsl/dpio/qbman-portal.c | 766 ++++++++++++++++++++++++++++++++----
>   drivers/soc/fsl/dpio/qbman-portal.h | 158 +++++++-
>   include/soc/fsl/dpaa2-io.h          |   6 +-
>   4 files changed, 907 insertions(+), 92 deletions(-)
> 
Acked-by: Roy Pledge <roy.pledge@nxp.com>

Leo - can you look at this series so we can get it integrated? Thanks

^ permalink raw reply	[flat|nested] 8+ messages in thread

* RE: [PATCH 0/3] soc: fsl: dpio: Enable QMAN batch enqueuing
  2020-02-06 20:39 ` [PATCH 0/3] soc: fsl: dpio: Enable QMAN batch enqueuing Roy Pledge
@ 2020-02-06 22:05   ` Leo Li
  2020-02-20  0:07   ` Li Yang
  1 sibling, 0 replies; 8+ messages in thread
From: Leo Li @ 2020-02-06 22:05 UTC (permalink / raw)
  To: Roy Pledge (OSS),
	Youri Querry, Roy Pledge, linux-kernel, linuxppc-dev,
	linux-arm-kernel, Ioana Ciornei, Alexandru Marginean



> -----Original Message-----
> From: Roy Pledge (OSS) <roy.pledge@oss.nxp.com>
> Sent: Thursday, February 6, 2020 2:40 PM
> To: Youri Querry <youri.querry_1@nxp.com>; Roy Pledge
> <roy.pledge@nxp.com>; Leo Li <leoyang.li@nxp.com>; linux-
> kernel@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; linux-arm-
> kernel@lists.infradead.org; Ioana Ciornei <ioana.ciornei@nxp.com>;
> Alexandru Marginean <alexandru.marginean@nxp.com>
> Subject: Re: [PATCH 0/3] soc: fsl: dpio: Enable QMAN batch enqueuing
> 
> On 12/12/2019 12:01 PM, Youri Querry wrote:
> > This patch set consists of:
> > - We added an interface to enqueue several packets at a time and
> >    improve performance.
> > - Make the algorithm decisions once at initialization and use
> >    function pointers to improve performance.
> > - Replaced the QMAN enqueue array mode algorithm with a ring
> >    mode algorithm. This is to make the enqueue of several frames
> >    at a time more effective.
> >
> > Youri Querry (3):
> >    soc: fsl: dpio: Adding QMAN multiple enqueue interface.
> >    soc: fsl: dpio: QMAN performance improvement. Function pointer
> >      indirection.
> >    soc: fsl: dpio: Replace QMAN array mode by ring mode enqueue.
> >
> >   drivers/soc/fsl/dpio/dpio-service.c |  69 +++-
> >   drivers/soc/fsl/dpio/qbman-portal.c | 766
> ++++++++++++++++++++++++++++++++----
> >   drivers/soc/fsl/dpio/qbman-portal.h | 158 +++++++-
> >   include/soc/fsl/dpaa2-io.h          |   6 +-
> >   4 files changed, 907 insertions(+), 92 deletions(-)
> >
> Acked-by: Roy Pledge <roy.pledge@nxp.com>
> 
> Leo - can you look at this series so we can get it integrated? Thanks

Sure.  Thanks for the review.  I will queue them up for v5.7.

Regards,
Leo

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 0/3] soc: fsl: dpio: Enable QMAN batch enqueuing
  2020-02-06 20:39 ` [PATCH 0/3] soc: fsl: dpio: Enable QMAN batch enqueuing Roy Pledge
  2020-02-06 22:05   ` Leo Li
@ 2020-02-20  0:07   ` Li Yang
  1 sibling, 0 replies; 8+ messages in thread
From: Li Yang @ 2020-02-20  0:07 UTC (permalink / raw)
  To: Roy Pledge
  Cc: Roy Pledge, Alexandru Marginean, Youri Querry, linux-kernel,
	Ioana Ciornei, linuxppc-dev, linux-arm-kernel

On Thu, Feb 6, 2020 at 2:41 PM Roy Pledge <roy.pledge@oss.nxp.com> wrote:
>
> On 12/12/2019 12:01 PM, Youri Querry wrote:
> > This patch set consists of:
> > - We added an interface to enqueue several packets at a time and
> >    improve performance.
> > - Make the algorithm decisions once at initialization and use
> >    function pointers to improve performance.
> > - Replaced the QMAN enqueue array mode algorithm with a ring
> >    mode algorithm. This is to make the enqueue of several frames
> >    at a time more effective.
> >
> > Youri Querry (3):
> >    soc: fsl: dpio: Adding QMAN multiple enqueue interface.
> >    soc: fsl: dpio: QMAN performance improvement. Function pointer
> >      indirection.
> >    soc: fsl: dpio: Replace QMAN array mode by ring mode enqueue.
> >
> >   drivers/soc/fsl/dpio/dpio-service.c |  69 +++-
> >   drivers/soc/fsl/dpio/qbman-portal.c | 766 ++++++++++++++++++++++++++++++++----
> >   drivers/soc/fsl/dpio/qbman-portal.h | 158 +++++++-
> >   include/soc/fsl/dpaa2-io.h          |   6 +-
> >   4 files changed, 907 insertions(+), 92 deletions(-)
> >
> Acked-by: Roy Pledge <roy.pledge@nxp.com>

Series applied with some cleanup and typo fixes in the titles/commit messages.

Regards,
Leo

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 2/3] soc: fsl: dpio: QMAN performance improvement. Function pointer indirection.
  2019-12-12 17:01 ` [PATCH 2/3] soc: fsl: dpio: QMAN performance improvement. Function pointer indirection Youri Querry
@ 2020-03-26 10:07   ` Arnd Bergmann
  0 siblings, 0 replies; 8+ messages in thread
From: Arnd Bergmann @ 2020-03-26 10:07 UTC (permalink / raw)
  To: Youri Querry
  Cc: Roy Pledge, Alexandru Marginean, linux-kernel, Leo Li,
	Ioana Ciornei, linuxppc-dev, linux-arm-kernel

On Thu, Dec 12, 2019 at 6:02 PM Youri Querry <youri.querry_1@nxp.com> wrote:
>
> We are making the access decision in the initialization and
> setting the function pointers accordingly.
>
> Signed-off-by: Youri Querry <youri.querry_1@nxp.com>
> ---
>  drivers/soc/fsl/dpio/qbman-portal.c | 451 +++++++++++++++++++++++++++++++-----
>  drivers/soc/fsl/dpio/qbman-portal.h | 129 ++++++++++-
>  2 files changed, 507 insertions(+), 73 deletions(-)

While merging pull requests, I came across some style issues with this driver.
I'm pulling it anyway, but please have another look and fix these for the next
release, or send a follow-up patch for the coming merge window.

>
> diff --git a/drivers/soc/fsl/dpio/qbman-portal.c b/drivers/soc/fsl/dpio/qbman-portal.c
> index 5a37ac8..0ffe018 100644
> --- a/drivers/soc/fsl/dpio/qbman-portal.c
> +++ b/drivers/soc/fsl/dpio/qbman-portal.c
> @@ -83,6 +83,82 @@ enum qbman_sdqcr_fc {
>         qbman_sdqcr_fc_up_to_3 = 1
>  };
>
> +/* Internal Function declaration */
> +static int qbman_swp_enqueue_direct(struct qbman_swp *s,
> +                                   const struct qbman_eq_desc *d,
> +                                   const struct dpaa2_fd *fd);
> +static int qbman_swp_enqueue_mem_back(struct qbman_swp *s,
> +                                     const struct qbman_eq_desc *d,
> +                                     const struct dpaa2_fd *fd);
> +static int qbman_swp_enqueue_multiple_direct(struct qbman_swp *s,
> +                                            const struct qbman_eq_desc *d,
> +                                            const struct dpaa2_fd *fd,
> +                                            uint32_t *flags,
> +                                            int num_frames);
> +static int qbman_swp_enqueue_multiple_mem_back(struct qbman_swp *s,
> +                                              const struct qbman_eq_desc *d,
> +                                              const struct dpaa2_fd *fd,
> +                                              uint32_t *flags,
> +                                              int num_frames);

Please try to avoid all static forward declarations. The coding style for the
kernel generally mandates that you define the functions in the order they
are used, and have no such declarations, unless there is a recursion
that requires it. If you do have recursion, then please add a comment that
explains how you limit it to avoid overrunning the kernel stack.
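
To make the requested ordering concrete, here is a trimmed-down sketch
with hypothetical names (not the driver's functions):

#include <stdio.h>

/* Discouraged: the definition order forces a forward declaration */
static int helper(int x);

static int caller(int x)
{
	return helper(x) + 1;
}

static int helper(int x)
{
	return x * 2;
}

/* Preferred: define the callee first, so no declaration is needed and
 * the file reads in dependency order */
static int helper2(int x)
{
	return x * 2;
}

static int caller2(int x)
{
	return helper2(x) + 1;
}

int main(void)
{
	printf("%d %d\n", caller(3), caller2(3));	/* both print 7 */
	return 0;
}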

> +const struct dpaa2_dq *qbman_swp_dqrr_next_direct(struct qbman_swp *s);
> +const struct dpaa2_dq *qbman_swp_dqrr_next_mem_back(struct qbman_swp *s);

Forward declarations for non-static functions in C code are much worse;
you should never need these. If a function is shared between files, put
the declaration into a header file that is included by both, to ensure
the prototypes match; if it's only used here, make it 'static'.
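
A minimal sketch of that header pattern, again with hypothetical names:

/* portal.h (hypothetical): the one declaration every user includes */
#ifndef PORTAL_H
#define PORTAL_H

struct portal;				/* opaque to callers */
int portal_next(struct portal *p);

#endif /* PORTAL_H */

/* portal.c: includes its own header, so the compiler checks the
 * definition against the prototype that all consumers see */
#include "portal.h"

struct portal {
	int cursor;
};

int portal_next(struct portal *p)
{
	return p->cursor++;
}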

> +/* Function pointers */
> +int (*qbman_swp_enqueue_ptr)(struct qbman_swp *s,
> +                            const struct qbman_eq_desc *d,
> +                            const struct dpaa2_fd *fd)
> +       = qbman_swp_enqueue_direct;
> +
> +int (*qbman_swp_enqueue_multiple_ptr)(struct qbman_swp *s,
> +                                     const struct qbman_eq_desc *d,
> +                                     const struct dpaa2_fd *fd,
> +                                     uint32_t *flags,
> +                                            int num_frames)
> +       = qbman_swp_enqueue_multiple_direct;

This looks like you just have an indirect function pointer with a single
possible implementation. This is less of a problem, but until you have a
way to safely override these at runtime, it may be better to simplify
this by using direct function calls.
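
For context, patch 2 of this series installs either the 'direct' or the
memory-backed routine in these pointers once at initialisation. A
condensed sketch of that one-time selection (hypothetical names
throughout) makes the trade-off concrete: when only one target can ever
be installed, the indirection buys nothing over a plain call.

#include <stdbool.h>

struct swp;	/* hypothetical portal handle */

static int enqueue_direct(struct swp *s)   { (void)s; return 0; }
static int enqueue_mem_back(struct swp *s) { (void)s; return 0; }

/* Bound once during probe, before any enqueue can run, so the pointer
 * itself needs no locking afterwards */
static int (*enqueue_fn)(struct swp *) = enqueue_direct;

static void portal_init(bool mem_backed)
{
	enqueue_fn = mem_backed ? enqueue_mem_back : enqueue_direct;
}

static int enqueue(struct swp *s)
{
	/* every caller pays an indirect call; with a single possible
	 * target a direct call would be equivalent and clearer */
	return enqueue_fn(s);
}

int main(void)
{
	portal_init(true);
	return enqueue((struct swp *)0);
}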

       Arnd

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread

Thread overview: 8+ messages
2019-12-12 17:01 [PATCH 0/3] soc: fsl: dpio: Enable QMAN batch enqueuing Youri Querry
2019-12-12 17:01 ` [PATCH 1/3] soc: fsl: dpio: Adding QMAN multiple enqueue interface Youri Querry
2019-12-12 17:01 ` [PATCH 2/3] soc: fsl: dpio: QMAN performance improvement. Function pointer indirection Youri Querry
2020-03-26 10:07   ` Arnd Bergmann
2019-12-12 17:01 ` [PATCH 3/3] soc: fsl: dpio: Replace QMAN array mode by ring mode enqueue Youri Querry
2020-02-06 20:39 ` [PATCH 0/3] soc: fsl: dpio: Enable QMAN batch enqueuing Roy Pledge
2020-02-06 22:05   ` Leo Li
2020-02-20  0:07   ` Li Yang
