* [PATCH 0/3] Optimization for DPAA/DPAA2 for PA/VA Addressing
@ 2018-04-27 16:25 Shreyansh Jain
  2018-04-27 16:25 ` [PATCH 1/3] crypto/dpaa_sec: remove ctx based offset for PA-VA conversion Shreyansh Jain
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Shreyansh Jain @ 2018-04-27 16:25 UTC (permalink / raw)
  To: thomas, dev; +Cc: hemant.agrawal, akhil.goyal, anatoly.burakov, Shreyansh Jain

Once the memory hotplugging patchset (6b42f7563) was merged, DPAA and
DPAA2 (in Physical Addressing mode) saw a drastic performance drop
(~95%). The cause was an inherent assumption, made during some memory
translations, that memsegs would be physically contiguous.

This series adds a workaround for that: an interim one, until a
complete solution is integrated.

The workaround creates a linked list of referenced buffers and
searches through it during physical to virtual translations.

Shreyansh Jain (3):
  crypto/dpaa_sec: remove ctx based offset for PA-VA conversion
  bus/fslmc: optimize physical to virtual address searching
  bus/dpaa: optimize physical to virtual address searching

 drivers/bus/dpaa/rte_dpaa_bus.h          | 27 +++++++++++++++++-
 drivers/bus/fslmc/portal/dpaa2_hw_pvt.h  | 23 +++++++++++++++
 drivers/crypto/dpaa_sec/dpaa_sec.c       | 49 +++++++++++++-------------------
 drivers/mempool/dpaa/dpaa_mempool.c      | 33 ++++++++++++++++++++-
 drivers/mempool/dpaa2/dpaa2_hw_mempool.c | 43 ++++++++++++++++++++++++++++
 5 files changed, 144 insertions(+), 31 deletions(-)

-- 
2.14.1


* [PATCH 1/3] crypto/dpaa_sec: remove ctx based offset for PA-VA conversion
  2018-04-27 16:25 [PATCH 0/3] Optimization for DPAA/DPAA2 for PA/VA Addressing Shreyansh Jain
@ 2018-04-27 16:25 ` Shreyansh Jain
  2018-04-27 16:25 ` [PATCH 2/3] bus/fslmc: optimize physical to virtual address searching Shreyansh Jain
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 12+ messages in thread
From: Shreyansh Jain @ 2018-04-27 16:25 UTC (permalink / raw)
  To: thomas, dev; +Cc: hemant.agrawal, akhil.goyal, anatoly.burakov, Shreyansh Jain

Crypto requires conversion between physical and virtual addresses for
descriptors. Prior to memory hotplugging, this was based on memseg
iteration, on the assumption that all memsegs are physically
contiguous, so that fast calculations could be done using a cached
start address. This assumption is no longer valid with memory
hotplugging support.

In preparation for supporting the hotplugging change to memory,
this patch removes the optimized PA-VA conversion that was based on
the physical address offset stored in the pool context.

Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
---

This adversely affects performance, as complete memsegs now need
to be parsed, but a rework containing the necessary optimization would
be posted on top of this.
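
For readers following the thread, the difference between the removed
cached-offset conversion and a per-segment walk can be illustrated with
a minimal, standalone sketch. This is not the driver code: struct seg,
vtop_cached() and vtop_walk() are hypothetical names, and the sketch
only assumes that each segment is contiguous within itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a memory segment: one virtually contiguous
 * range backed by one physically contiguous range.
 */
struct seg {
	uintptr_t va;	/* virtual start address */
	uint64_t pa;	/* physical (IOVA) start address */
	size_t len;	/* length in bytes */
};

/* Cached-offset style: a single (va - pa) offset is applied to every
 * address. Only correct while that offset holds for the address given.
 */
static uint64_t
vtop_cached(uintptr_t va, uint64_t vtop_offset)
{
	return (uint64_t)va - vtop_offset;
}

/* Walk style: find the segment containing va, then apply that segment's
 * own offset. Returns 0 when no segment contains va.
 */
static uint64_t
vtop_walk(const struct seg *segs, size_t n, uintptr_t va)
{
	size_t i;

	assert(segs != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (va >= segs[i].va && va - segs[i].va < segs[i].len)
			return segs[i].pa + (va - segs[i].va);
	}
	return 0;
}
```

With two segments that are adjacent in VA but not in PA, the offset
cached from the first segment is only right inside that segment. For an
address in the second segment it yields a physical address that belongs
to neither, whereas the walk applies the second segment's own offset.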

 drivers/crypto/dpaa_sec/dpaa_sec.c | 49 ++++++++++++++++----------------------
 1 file changed, 20 insertions(+), 29 deletions(-)

diff --git a/drivers/crypto/dpaa_sec/dpaa_sec.c b/drivers/crypto/dpaa_sec/dpaa_sec.c
index e456fd542..06f7e4373 100644
--- a/drivers/crypto/dpaa_sec/dpaa_sec.c
+++ b/drivers/crypto/dpaa_sec/dpaa_sec.c
@@ -103,13 +103,6 @@ dpaa_mem_vtop(void *vaddr)
 	return (size_t)NULL;
 }
 
-/* virtual address conversin when mempool support is available for ctx */
-static inline phys_addr_t
-dpaa_mem_vtop_ctx(struct dpaa_sec_op_ctx *ctx, void *vaddr)
-{
-	return (size_t)vaddr - ctx->vtop_offset;
-}
-
 static inline void *
 dpaa_mem_ptov(rte_iova_t paddr)
 {
@@ -630,7 +623,7 @@ build_auth_only_sg(struct rte_crypto_op *op, dpaa_sec_session *ses)
 	in_sg->extension = 1;
 	in_sg->final = 1;
 	in_sg->length = sym->auth.data.length;
-	qm_sg_entry_set64(in_sg, dpaa_mem_vtop_ctx(ctx, &cf->sg[2]));
+	qm_sg_entry_set64(in_sg, dpaa_mem_vtop(&cf->sg[2]));
 
 	/* 1st seg */
 	sg = in_sg + 1;
@@ -654,7 +647,7 @@ build_auth_only_sg(struct rte_crypto_op *op, dpaa_sec_session *ses)
 		sg++;
 		rte_memcpy(old_digest, sym->auth.digest.data,
 				ses->digest_length);
-		start_addr = dpaa_mem_vtop_ctx(ctx, old_digest);
+		start_addr = dpaa_mem_vtop(old_digest);
 		qm_sg_entry_set64(sg, start_addr);
 		sg->length = ses->digest_length;
 		in_sg->length += ses->digest_length;
@@ -708,7 +701,7 @@ build_auth_only(struct rte_crypto_op *op, dpaa_sec_session *ses)
 	if (is_decode(ses)) {
 		/* need to extend the input to a compound frame */
 		sg->extension = 1;
-		qm_sg_entry_set64(sg, dpaa_mem_vtop_ctx(ctx, &cf->sg[2]));
+		qm_sg_entry_set64(sg, dpaa_mem_vtop(&cf->sg[2]));
 		sg->length = sym->auth.data.length + ses->digest_length;
 		sg->final = 1;
 		cpu_to_hw_sg(sg);
@@ -722,7 +715,7 @@ build_auth_only(struct rte_crypto_op *op, dpaa_sec_session *ses)
 		cpu_to_hw_sg(sg);
 
 		/* let's check digest by hw */
-		start_addr = dpaa_mem_vtop_ctx(ctx, old_digest);
+		start_addr = dpaa_mem_vtop(old_digest);
 		sg++;
 		qm_sg_entry_set64(sg, start_addr);
 		sg->length = ses->digest_length;
@@ -775,7 +768,7 @@ build_cipher_only_sg(struct rte_crypto_op *op, dpaa_sec_session *ses)
 	out_sg = &cf->sg[0];
 	out_sg->extension = 1;
 	out_sg->length = sym->cipher.data.length;
-	qm_sg_entry_set64(out_sg, dpaa_mem_vtop_ctx(ctx, &cf->sg[2]));
+	qm_sg_entry_set64(out_sg, dpaa_mem_vtop(&cf->sg[2]));
 	cpu_to_hw_sg(out_sg);
 
 	/* 1st seg */
@@ -804,7 +797,7 @@ build_cipher_only_sg(struct rte_crypto_op *op, dpaa_sec_session *ses)
 	in_sg->length = sym->cipher.data.length + ses->iv.length;
 
 	sg++;
-	qm_sg_entry_set64(in_sg, dpaa_mem_vtop_ctx(ctx, sg));
+	qm_sg_entry_set64(in_sg, dpaa_mem_vtop(sg));
 	cpu_to_hw_sg(in_sg);
 
 	/* IV */
@@ -871,7 +864,7 @@ build_cipher_only(struct rte_crypto_op *op, dpaa_sec_session *ses)
 	sg->extension = 1;
 	sg->final = 1;
 	sg->length = sym->cipher.data.length + ses->iv.length;
-	qm_sg_entry_set64(sg, dpaa_mem_vtop_ctx(ctx, &cf->sg[2]));
+	qm_sg_entry_set64(sg, dpaa_mem_vtop(&cf->sg[2]));
 	cpu_to_hw_sg(sg);
 
 	sg = &cf->sg[2];
@@ -937,7 +930,7 @@ build_cipher_auth_gcm_sg(struct rte_crypto_op *op, dpaa_sec_session *ses)
 
 	/* output sg entries */
 	sg = &cf->sg[2];
-	qm_sg_entry_set64(out_sg, dpaa_mem_vtop_ctx(ctx, sg));
+	qm_sg_entry_set64(out_sg, dpaa_mem_vtop(sg));
 	cpu_to_hw_sg(out_sg);
 
 	/* 1st seg */
@@ -981,7 +974,7 @@ build_cipher_auth_gcm_sg(struct rte_crypto_op *op, dpaa_sec_session *ses)
 
 	/* input sg entries */
 	sg++;
-	qm_sg_entry_set64(in_sg, dpaa_mem_vtop_ctx(ctx, sg));
+	qm_sg_entry_set64(in_sg, dpaa_mem_vtop(sg));
 	cpu_to_hw_sg(in_sg);
 
 	/* 1st seg IV */
@@ -1018,7 +1011,7 @@ build_cipher_auth_gcm_sg(struct rte_crypto_op *op, dpaa_sec_session *ses)
 		sg++;
 		memcpy(ctx->digest, sym->aead.digest.data,
 			ses->digest_length);
-		qm_sg_entry_set64(sg, dpaa_mem_vtop_ctx(ctx, ctx->digest));
+		qm_sg_entry_set64(sg, dpaa_mem_vtop(ctx->digest));
 		sg->length = ses->digest_length;
 	}
 	sg->final = 1;
@@ -1056,7 +1049,7 @@ build_cipher_auth_gcm(struct rte_crypto_op *op, dpaa_sec_session *ses)
 	/* input */
 	rte_prefetch0(cf->sg);
 	sg = &cf->sg[2];
-	qm_sg_entry_set64(&cf->sg[1], dpaa_mem_vtop_ctx(ctx, sg));
+	qm_sg_entry_set64(&cf->sg[1], dpaa_mem_vtop(sg));
 	if (is_encode(ses)) {
 		qm_sg_entry_set64(sg, dpaa_mem_vtop(IV_ptr));
 		sg->length = ses->iv.length;
@@ -1101,7 +1094,7 @@ build_cipher_auth_gcm(struct rte_crypto_op *op, dpaa_sec_session *ses)
 		       ses->digest_length);
 		sg++;
 
-		qm_sg_entry_set64(sg, dpaa_mem_vtop_ctx(ctx, ctx->digest));
+		qm_sg_entry_set64(sg, dpaa_mem_vtop(ctx->digest));
 		sg->length = ses->digest_length;
 		length += sg->length;
 		sg->final = 1;
@@ -1115,7 +1108,7 @@ build_cipher_auth_gcm(struct rte_crypto_op *op, dpaa_sec_session *ses)
 
 	/* output */
 	sg++;
-	qm_sg_entry_set64(&cf->sg[0], dpaa_mem_vtop_ctx(ctx, sg));
+	qm_sg_entry_set64(&cf->sg[0], dpaa_mem_vtop(sg));
 	qm_sg_entry_set64(sg,
 		dst_start_addr + sym->aead.data.offset - ses->auth_only_len);
 	sg->length = sym->aead.data.length + ses->auth_only_len;
@@ -1184,7 +1177,7 @@ build_cipher_auth_sg(struct rte_crypto_op *op, dpaa_sec_session *ses)
 
 	/* output sg entries */
 	sg = &cf->sg[2];
-	qm_sg_entry_set64(out_sg, dpaa_mem_vtop_ctx(ctx, sg));
+	qm_sg_entry_set64(out_sg, dpaa_mem_vtop(sg));
 	cpu_to_hw_sg(out_sg);
 
 	/* 1st seg */
@@ -1226,7 +1219,7 @@ build_cipher_auth_sg(struct rte_crypto_op *op, dpaa_sec_session *ses)
 
 	/* input sg entries */
 	sg++;
-	qm_sg_entry_set64(in_sg, dpaa_mem_vtop_ctx(ctx, sg));
+	qm_sg_entry_set64(in_sg, dpaa_mem_vtop(sg));
 	cpu_to_hw_sg(in_sg);
 
 	/* 1st seg IV */
@@ -1256,7 +1249,7 @@ build_cipher_auth_sg(struct rte_crypto_op *op, dpaa_sec_session *ses)
 		sg++;
 		memcpy(ctx->digest, sym->auth.digest.data,
 			ses->digest_length);
-		qm_sg_entry_set64(sg, dpaa_mem_vtop_ctx(ctx, ctx->digest));
+		qm_sg_entry_set64(sg, dpaa_mem_vtop(ctx->digest));
 		sg->length = ses->digest_length;
 	}
 	sg->final = 1;
@@ -1293,7 +1286,7 @@ build_cipher_auth(struct rte_crypto_op *op, dpaa_sec_session *ses)
 	/* input */
 	rte_prefetch0(cf->sg);
 	sg = &cf->sg[2];
-	qm_sg_entry_set64(&cf->sg[1], dpaa_mem_vtop_ctx(ctx, sg));
+	qm_sg_entry_set64(&cf->sg[1], dpaa_mem_vtop(sg));
 	if (is_encode(ses)) {
 		qm_sg_entry_set64(sg, dpaa_mem_vtop(IV_ptr));
 		sg->length = ses->iv.length;
@@ -1323,7 +1316,7 @@ build_cipher_auth(struct rte_crypto_op *op, dpaa_sec_session *ses)
 		       ses->digest_length);
 		sg++;
 
-		qm_sg_entry_set64(sg, dpaa_mem_vtop_ctx(ctx, ctx->digest));
+		qm_sg_entry_set64(sg, dpaa_mem_vtop(ctx->digest));
 		sg->length = ses->digest_length;
 		length += sg->length;
 		sg->final = 1;
@@ -1337,7 +1330,7 @@ build_cipher_auth(struct rte_crypto_op *op, dpaa_sec_session *ses)
 
 	/* output */
 	sg++;
-	qm_sg_entry_set64(&cf->sg[0], dpaa_mem_vtop_ctx(ctx, sg));
+	qm_sg_entry_set64(&cf->sg[0], dpaa_mem_vtop(sg));
 	qm_sg_entry_set64(sg, dst_start_addr + sym->cipher.data.offset);
 	sg->length = sym->cipher.data.length;
 	length = sg->length;
@@ -1412,7 +1405,6 @@ dpaa_sec_enqueue_burst(void *qp, struct rte_crypto_op **ops,
 	struct rte_crypto_op *op;
 	struct dpaa_sec_job *cf;
 	dpaa_sec_session *ses;
-	struct dpaa_sec_op_ctx *ctx;
 	uint32_t auth_only_len;
 	struct qman_fq *inq[DPAA_SEC_BURST];
 
@@ -1497,8 +1489,7 @@ dpaa_sec_enqueue_burst(void *qp, struct rte_crypto_op **ops,
 			inq[loop] = ses->inq;
 			fd->opaque_addr = 0;
 			fd->cmd = 0;
-			ctx = container_of(cf, struct dpaa_sec_op_ctx, job);
-			qm_fd_addr_set64(fd, dpaa_mem_vtop_ctx(ctx, cf->sg));
+			qm_fd_addr_set64(fd, dpaa_mem_vtop(cf->sg));
 			fd->_format1 = qm_fd_compound;
 			fd->length29 = 2 * sizeof(struct qm_sg_entry);
 			/* Auth_only_len is set as 0 in descriptor and it is
-- 
2.14.1


* [PATCH 2/3] bus/fslmc: optimize physical to virtual address searching
  2018-04-27 16:25 [PATCH 0/3] Optimization for DPAA/DPAA2 for PA/VA Addressing Shreyansh Jain
  2018-04-27 16:25 ` [PATCH 1/3] crypto/dpaa_sec: remove ctx based offset for PA-VA conversion Shreyansh Jain
@ 2018-04-27 16:25 ` Shreyansh Jain
  2018-04-27 16:25 ` [PATCH 3/3] bus/dpaa: " Shreyansh Jain
  2018-04-27 17:20 ` [PATCH v2 0/3] Optimization for DPAA/DPAA2 for PA/VA Addressing Shreyansh Jain
  3 siblings, 0 replies; 12+ messages in thread
From: Shreyansh Jain @ 2018-04-27 16:25 UTC (permalink / raw)
  To: thomas, dev; +Cc: hemant.agrawal, akhil.goyal, anatoly.burakov, Shreyansh Jain

With memory hotplugging support, the order of memsegs has changed
from physically contiguous to virtually contiguous. The FSLMC bus and
dpaa2 drivers depend on PA to VA address conversion when in Physical
addressing mode.

This patch creates a list of blocks requested to be pinned to the
DPAA2 mempool. A physical address being searched for is expected to
belong to this list (as it comes from the hardware pool), so the
search is less expensive than a memseg walk. Even so, there is a
marginal impact on performance compared with the legacy mode with
physically contiguous memsegs.

Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>

---

An optimized algorithm, based on some recent hotplugging patches, is
being worked on. That would improve/recover the performance. Until
then, this patch is to be treated as a stop-gap solution.
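
The lookup order described above (driver-local list first, full walk
only on a miss) can be sketched in isolation as follows. This is an
illustrative sketch, not the patch itself: struct range, ptov() and
slow_iova2virt() are hypothetical names, and slow_iova2virt() merely
counts calls in place of the real EAL memseg walk. It assumes a
<sys/queue.h> that provides the TAILQ macros.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/queue.h>

/* One pinned block: a virtually and physically contiguous range. */
struct range {
	TAILQ_ENTRY(range) next;
	char *vaddr;
	uint64_t iova;
	size_t len;
};
TAILQ_HEAD(range_list, range);

/* Hypothetical slow path standing in for the full memseg walk. Here it
 * only records that it was reached.
 */
static unsigned int slow_calls;

static void *
slow_iova2virt(uint64_t iova)
{
	(void)iova;
	slow_calls++;
	return NULL;
}

/* Search the short local list first and fall back to the slow path
 * only when no range contains the address.
 */
static void *
ptov(struct range_list *list, uint64_t iova)
{
	struct range *r;

	assert(list != NULL);
	TAILQ_FOREACH(r, list, next) {
		/* Subtract before comparing so that the bound check
		 * cannot overflow for ranges near the top of the
		 * address space.
		 */
		if (iova >= r->iova && iova - r->iova < r->len)
			return r->vaddr + (iova - r->iova);
	}
	return slow_iova2virt(iova);
}
```

The cost of a hit is linear in the number of pinned blocks, which is
why this only pays off while that list stays much shorter than the
memseg list.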
---
 drivers/bus/fslmc/portal/dpaa2_hw_pvt.h  | 23 +++++++++++++++++
 drivers/mempool/dpaa2/dpaa2_hw_mempool.c | 43 ++++++++++++++++++++++++++++++++
 2 files changed, 66 insertions(+)

diff --git a/drivers/bus/fslmc/portal/dpaa2_hw_pvt.h b/drivers/bus/fslmc/portal/dpaa2_hw_pvt.h
index c76393d45..da6e639dc 100644
--- a/drivers/bus/fslmc/portal/dpaa2_hw_pvt.h
+++ b/drivers/bus/fslmc/portal/dpaa2_hw_pvt.h
@@ -254,15 +254,38 @@ enum qbman_fd_format {
  */
 #define DPAA2_EQ_RESP_ALWAYS		1
 
+/* Various structures representing contiguous memory maps */
+struct dpaa2_memseg {
+	TAILQ_ENTRY(dpaa2_memseg) next;
+	char *vaddr;
+	rte_iova_t iova;
+	size_t len;
+};
+
+TAILQ_HEAD(dpaa2_memseg_list, dpaa2_memseg);
+extern struct dpaa2_memseg_list dpaa2_memsegs;
+
 #ifdef RTE_LIBRTE_DPAA2_USE_PHYS_IOVA
 extern uint8_t dpaa2_virt_mode;
 static void *dpaa2_mem_ptov(phys_addr_t paddr) __attribute__((unused));
 /* todo - this is costly, need to write a fast coversion routine */
 static void *dpaa2_mem_ptov(phys_addr_t paddr)
 {
+	struct dpaa2_memseg *ms;
+
 	if (dpaa2_virt_mode)
 		return (void *)(size_t)paddr;
 
+	/* Check if the address is already part of the memseg list internally
+	 * maintained by the dpaa2 driver.
+	 */
+	TAILQ_FOREACH(ms, &dpaa2_memsegs, next) {
+		if (paddr >= ms->iova && paddr <
+			ms->iova + ms->len)
+			return RTE_PTR_ADD(ms->vaddr, (paddr - ms->iova));
+	}
+
+	/* If not, Fallback to full memseg list searching */
 	return rte_mem_iova2virt(paddr);
 }
 
diff --git a/drivers/mempool/dpaa2/dpaa2_hw_mempool.c b/drivers/mempool/dpaa2/dpaa2_hw_mempool.c
index ce7a4c577..4c44c33cc 100644
--- a/drivers/mempool/dpaa2/dpaa2_hw_mempool.c
+++ b/drivers/mempool/dpaa2/dpaa2_hw_mempool.c
@@ -32,6 +32,13 @@
 struct dpaa2_bp_info rte_dpaa2_bpid_info[MAX_BPID];
 static struct dpaa2_bp_list *h_bp_list;
 
+/* List of all the memseg information locally maintained in dpaa2 driver. This
+ * is to optimize the PA_to_VA searches until a better mechanism (algo) is
+ * available.
+ */
+struct dpaa2_memseg_list dpaa2_memsegs
+	= TAILQ_HEAD_INITIALIZER(dpaa2_memsegs);
+
 /* Dynamic logging identified for mempool */
 int dpaa2_logtype_mempool;
 
@@ -358,6 +365,41 @@ rte_hw_mbuf_get_count(const struct rte_mempool *mp)
 	return num_of_bufs;
 }
 
+static int
+dpaa2_populate(struct rte_mempool *mp, unsigned int max_objs,
+	      void *vaddr, rte_iova_t paddr, size_t len,
+	      rte_mempool_populate_obj_cb_t *obj_cb, void *obj_cb_arg)
+{
+	struct dpaa2_memseg *ms;
+
+	/* For each memory chunk pinned to the Mempool, a linked list of the
+	 * represented memsegs is created for searching when PA to VA
+	 * conversion is required.
+	 */
+	ms = rte_zmalloc(NULL, sizeof(struct dpaa2_memseg), 0);
+	if (!ms) {
+		DPAA2_MEMPOOL_ERR("Unable to allocate internal memory.");
+		DPAA2_MEMPOOL_WARN("Fast Physical to Virtual Addr translation would not be available.");
+		/* If the element is not added, it would only lead to failure
+		 * in searching for the element and the logic would Fallback
+		 * to traditional DPDK memseg traversal code. So, this is not
+		 * a blocking error - but, error would be printed on screen.
+		 */
+		return 0;
+	}
+
+	ms->vaddr = vaddr;
+	ms->iova = paddr;
+	ms->len = len;
+	/* Head insertions are generally faster than tail insertions as the
+	 * buffers pinned are picked from rear end.
+	 */
+	TAILQ_INSERT_HEAD(&dpaa2_memsegs, ms, next);
+
+	return rte_mempool_op_populate_default(mp, max_objs, vaddr, paddr, len,
+					       obj_cb, obj_cb_arg);
+}
+
 struct rte_mempool_ops dpaa2_mpool_ops = {
 	.name = DPAA2_MEMPOOL_OPS_NAME,
 	.alloc = rte_hw_mbuf_create_pool,
@@ -365,6 +407,7 @@ struct rte_mempool_ops dpaa2_mpool_ops = {
 	.enqueue = rte_hw_mbuf_free_bulk,
 	.dequeue = rte_dpaa2_mbuf_alloc_bulk,
 	.get_count = rte_hw_mbuf_get_count,
+	.populate = dpaa2_populate,
 };
 
 MEMPOOL_REGISTER_OPS(dpaa2_mpool_ops);
-- 
2.14.1


* [PATCH 3/3] bus/dpaa: optimize physical to virtual address searching
  2018-04-27 16:25 [PATCH 0/3] Optimization for DPAA/DPAA2 for PA/VA Addressing Shreyansh Jain
  2018-04-27 16:25 ` [PATCH 1/3] crypto/dpaa_sec: remove ctx based offset for PA-VA conversion Shreyansh Jain
  2018-04-27 16:25 ` [PATCH 2/3] bus/fslmc: optimize physical to virtual address searching Shreyansh Jain
@ 2018-04-27 16:25 ` Shreyansh Jain
  2018-04-27 17:20 ` [PATCH v2 0/3] Optimization for DPAA/DPAA2 for PA/VA Addressing Shreyansh Jain
  3 siblings, 0 replies; 12+ messages in thread
From: Shreyansh Jain @ 2018-04-27 16:25 UTC (permalink / raw)
  To: thomas, dev; +Cc: hemant.agrawal, akhil.goyal, anatoly.burakov, Shreyansh Jain

With memory hotplugging support, the order of memsegs has changed
from physically contiguous to virtually contiguous. The DPAA bus and
drivers depend on PA to VA address conversion for I/O.

This patch creates a list of blocks requested to be pinned to the
DPAA mempool. A physical address being searched for is expected to
belong to this list (as it comes from the hardware pool), so the
search is less expensive than a memseg walk. Even so, there is a
marginal drop in performance compared with the legacy mode with
physically contiguous memsegs.

Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>

---
An optimized algorithm, based on some recent hotplugging patches, is
being worked on. That would improve/recover the performance. Until
then, this patch is to be treated as a stop-gap solution.
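
The bookkeeping side (recording each pinned block when the pool is
populated) can be sketched on its own as below. The names struct chunk,
chunk_record() and chunk_list_free() are hypothetical, and calloc()
stands in for rte_zmalloc(). The free helper is an addition of this
sketch; the posted patch does not show a matching removal path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/queue.h>

struct chunk {
	TAILQ_ENTRY(chunk) next;
	char *vaddr;
	uint64_t iova;
	size_t len;
};
TAILQ_HEAD(chunk_list, chunk);

/* Record one chunk at the head of the list, so that the most recently
 * pinned chunk is the first one a later search visits.
 * Returns 0 on success and -1 if the node cannot be allocated. A caller
 * can treat -1 as "no fast path for this chunk" and carry on.
 */
static int
chunk_record(struct chunk_list *list, void *vaddr, uint64_t iova,
	     size_t len)
{
	struct chunk *c;

	assert(list != NULL);
	c = calloc(1, sizeof(*c));
	if (c == NULL)
		return -1;

	c->vaddr = vaddr;
	c->iova = iova;
	c->len = len;
	TAILQ_INSERT_HEAD(list, c, next);
	return 0;
}

/* Release every recorded node, leaving the list empty. */
static void
chunk_list_free(struct chunk_list *list)
{
	struct chunk *c;

	assert(list != NULL);
	while ((c = TAILQ_FIRST(list)) != NULL) {
		TAILQ_REMOVE(list, c, next);
		free(c);
	}
}
```

Returning a distinct error code lets the caller decide whether to go on
populating the pool without the fast path for that chunk.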
---
 drivers/bus/dpaa/rte_dpaa_bus.h     | 27 ++++++++++++++++++++++++++-
 drivers/mempool/dpaa/dpaa_mempool.c | 33 ++++++++++++++++++++++++++++++++-
 2 files changed, 58 insertions(+), 2 deletions(-)

diff --git a/drivers/bus/dpaa/rte_dpaa_bus.h b/drivers/bus/dpaa/rte_dpaa_bus.h
index 89aeac2d1..ca32b7f2f 100644
--- a/drivers/bus/dpaa/rte_dpaa_bus.h
+++ b/drivers/bus/dpaa/rte_dpaa_bus.h
@@ -95,9 +95,34 @@ struct dpaa_portal {
 	uint64_t tid;/**< Parent Thread id for this portal */
 };
 
-/* TODO - this is costly, need to write a fast coversion routine */
+/* Various structures representing contiguous memory maps */
+struct dpaa_memseg {
+	TAILQ_ENTRY(dpaa_memseg) next;
+	char *vaddr;
+	rte_iova_t iova;
+	size_t len;
+};
+
+TAILQ_HEAD(dpaa_memseg_list, dpaa_memseg);
+extern struct dpaa_memseg_list dpaa_memsegs;
+
+/* Either iterate over the list of internal memseg references or fallback to
+ * EAL memseg based iova2virt.
+ */
 static inline void *rte_dpaa_mem_ptov(phys_addr_t paddr)
 {
+	struct dpaa_memseg *ms;
+
+	/* Check if the address is already part of the memseg list internally
+	 * maintained by the dpaa driver.
+	 */
+	TAILQ_FOREACH(ms, &dpaa_memsegs, next) {
+		if (paddr >= ms->iova && paddr <
+			ms->iova + ms->len)
+			return RTE_PTR_ADD(ms->vaddr, (paddr - ms->iova));
+	}
+
+	/* If not, Fallback to full memseg list searching */
 	return rte_mem_iova2virt(paddr);
 }
 
diff --git a/drivers/mempool/dpaa/dpaa_mempool.c b/drivers/mempool/dpaa/dpaa_mempool.c
index 580e4640c..e5de15ec9 100644
--- a/drivers/mempool/dpaa/dpaa_mempool.c
+++ b/drivers/mempool/dpaa/dpaa_mempool.c
@@ -27,6 +27,13 @@
 
 #include <dpaa_mempool.h>
 
+/* List of all the memseg information locally maintained in dpaa driver. This
+ * is to optimize the PA_to_VA searches until a better mechanism (algo) is
+ * available.
+ */
+struct dpaa_memseg_list dpaa_memsegs
+	= TAILQ_HEAD_INITIALIZER(dpaa_memsegs);
+
 struct dpaa_bp_info rte_dpaa_bpid_info[DPAA_MAX_BPOOLS];
 
 static int
@@ -287,10 +294,34 @@ dpaa_populate(struct rte_mempool *mp, unsigned int max_objs,
 	/* Detect pool area has sufficient space for elements in this memzone */
 	if (len >= total_elt_sz * mp->size)
 		bp_info->flags |= DPAA_MPOOL_SINGLE_SEGMENT;
+	struct dpaa_memseg *ms;
+
+	/* For each memory chunk pinned to the Mempool, a linked list of the
+	 * represented memsegs is created for searching when PA to VA
+	 * conversion is required.
+	 */
+	ms = rte_zmalloc(NULL, sizeof(struct dpaa_memseg), 0);
+	if (!ms) {
+		DPAA_MEMPOOL_ERR("Unable to allocate internal memory.");
+		DPAA_MEMPOOL_WARN("Fast Physical to Virtual Addr translation would not be available.");
+		/* If the element is not added, it would only lead to failure
+		 * in searching for the element and the logic would Fallback
+		 * to traditional DPDK memseg traversal code. So, this is not
+		 * a blocking error - but, error would be printed on screen.
+		 */
+		return 0;
+	}
+
+	ms->vaddr = vaddr;
+	ms->iova = paddr;
+	ms->len = len;
+	/* Head insertions are generally faster than tail insertions as the
+	 * buffers pinned are picked from rear end.
+	 */
+	TAILQ_INSERT_HEAD(&dpaa_memsegs, ms, next);
 
 	return rte_mempool_op_populate_default(mp, max_objs, vaddr, paddr, len,
 					       obj_cb, obj_cb_arg);
-
 }
 
 struct rte_mempool_ops dpaa_mpool_ops = {
-- 
2.14.1


* [PATCH v2 0/3] Optimization for DPAA/DPAA2 for PA/VA Addressing
  2018-04-27 16:25 [PATCH 0/3] Optimization for DPAA/DPAA2 for PA/VA Addressing Shreyansh Jain
                   ` (2 preceding siblings ...)
  2018-04-27 16:25 ` [PATCH 3/3] bus/dpaa: " Shreyansh Jain
@ 2018-04-27 17:20 ` Shreyansh Jain
  2018-04-27 17:20   ` [PATCH v2 1/3] crypto/dpaa_sec: remove ctx based offset for PA-VA conversion Shreyansh Jain
                     ` (3 more replies)
  3 siblings, 4 replies; 12+ messages in thread
From: Shreyansh Jain @ 2018-04-27 17:20 UTC (permalink / raw)
  To: thomas, dev; +Cc: hemant.agrawal, akhil.goyal, anatoly.burakov, Shreyansh Jain

Once the memory hotplugging patchset (6b42f7563) was merged, DPAA and
DPAA2 (in Physical Addressing mode) saw a drastic performance drop
(~95%). The cause was an inherent assumption, made during some memory
translations, that memsegs would be physically contiguous.

This series adds a workaround for that: an interim one, until a
complete solution is integrated.

The workaround creates a linked list of referenced buffers and
searches through it during physical to virtual translations.

:Change history:

v2:
 - fixed spelling mistakes in patch as commit

Shreyansh Jain (3):
  crypto/dpaa_sec: remove ctx based offset for PA-VA conversion
  bus/fslmc: optimize physical to virtual address searching
  bus/dpaa: optimize physical to virtual address searching

 drivers/bus/dpaa/rte_dpaa_bus.h          | 27 +++++++++++++++++-
 drivers/bus/fslmc/portal/dpaa2_hw_pvt.h  | 23 +++++++++++++++
 drivers/crypto/dpaa_sec/dpaa_sec.c       | 49 +++++++++++++-------------------
 drivers/mempool/dpaa/dpaa_mempool.c      | 33 ++++++++++++++++++++-
 drivers/mempool/dpaa2/dpaa2_hw_mempool.c | 43 ++++++++++++++++++++++++++++
 5 files changed, 144 insertions(+), 31 deletions(-)

-- 
2.14.1


* [PATCH v2 1/3] crypto/dpaa_sec: remove ctx based offset for PA-VA conversion
  2018-04-27 17:20 ` [PATCH v2 0/3] Optimization for DPAA/DPAA2 for PA/VA Addressing Shreyansh Jain
@ 2018-04-27 17:20   ` Shreyansh Jain
  2018-04-27 17:20   ` [PATCH v2 2/3] bus/fslmc: optimize physical to virtual address searching Shreyansh Jain
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 12+ messages in thread
From: Shreyansh Jain @ 2018-04-27 17:20 UTC (permalink / raw)
  To: thomas, dev; +Cc: hemant.agrawal, akhil.goyal, anatoly.burakov, Shreyansh Jain

Crypto requires conversion between physical and virtual addresses for
descriptors. Prior to memory hotplugging, this was based on memseg
iteration, on the assumption that all memsegs are physically
contiguous, so that fast calculations could be done using a cached
start address. This assumption is no longer valid with memory
hotplugging support.

In preparation for supporting the hotplugging change to memory,
this patch removes the optimized PA-VA conversion that was based on
the physical address offset stored in the pool context.

This adversely affects performance, as complete memsegs now need
to be parsed, but a rework containing the necessary optimization would
be posted on top of this.

Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
---
 drivers/crypto/dpaa_sec/dpaa_sec.c | 49 ++++++++++++++++----------------------
 1 file changed, 20 insertions(+), 29 deletions(-)

diff --git a/drivers/crypto/dpaa_sec/dpaa_sec.c b/drivers/crypto/dpaa_sec/dpaa_sec.c
index e456fd542..06f7e4373 100644
--- a/drivers/crypto/dpaa_sec/dpaa_sec.c
+++ b/drivers/crypto/dpaa_sec/dpaa_sec.c
@@ -103,13 +103,6 @@ dpaa_mem_vtop(void *vaddr)
 	return (size_t)NULL;
 }
 
-/* virtual address conversin when mempool support is available for ctx */
-static inline phys_addr_t
-dpaa_mem_vtop_ctx(struct dpaa_sec_op_ctx *ctx, void *vaddr)
-{
-	return (size_t)vaddr - ctx->vtop_offset;
-}
-
 static inline void *
 dpaa_mem_ptov(rte_iova_t paddr)
 {
@@ -630,7 +623,7 @@ build_auth_only_sg(struct rte_crypto_op *op, dpaa_sec_session *ses)
 	in_sg->extension = 1;
 	in_sg->final = 1;
 	in_sg->length = sym->auth.data.length;
-	qm_sg_entry_set64(in_sg, dpaa_mem_vtop_ctx(ctx, &cf->sg[2]));
+	qm_sg_entry_set64(in_sg, dpaa_mem_vtop(&cf->sg[2]));
 
 	/* 1st seg */
 	sg = in_sg + 1;
@@ -654,7 +647,7 @@ build_auth_only_sg(struct rte_crypto_op *op, dpaa_sec_session *ses)
 		sg++;
 		rte_memcpy(old_digest, sym->auth.digest.data,
 				ses->digest_length);
-		start_addr = dpaa_mem_vtop_ctx(ctx, old_digest);
+		start_addr = dpaa_mem_vtop(old_digest);
 		qm_sg_entry_set64(sg, start_addr);
 		sg->length = ses->digest_length;
 		in_sg->length += ses->digest_length;
@@ -708,7 +701,7 @@ build_auth_only(struct rte_crypto_op *op, dpaa_sec_session *ses)
 	if (is_decode(ses)) {
 		/* need to extend the input to a compound frame */
 		sg->extension = 1;
-		qm_sg_entry_set64(sg, dpaa_mem_vtop_ctx(ctx, &cf->sg[2]));
+		qm_sg_entry_set64(sg, dpaa_mem_vtop(&cf->sg[2]));
 		sg->length = sym->auth.data.length + ses->digest_length;
 		sg->final = 1;
 		cpu_to_hw_sg(sg);
@@ -722,7 +715,7 @@ build_auth_only(struct rte_crypto_op *op, dpaa_sec_session *ses)
 		cpu_to_hw_sg(sg);
 
 		/* let's check digest by hw */
-		start_addr = dpaa_mem_vtop_ctx(ctx, old_digest);
+		start_addr = dpaa_mem_vtop(old_digest);
 		sg++;
 		qm_sg_entry_set64(sg, start_addr);
 		sg->length = ses->digest_length;
@@ -775,7 +768,7 @@ build_cipher_only_sg(struct rte_crypto_op *op, dpaa_sec_session *ses)
 	out_sg = &cf->sg[0];
 	out_sg->extension = 1;
 	out_sg->length = sym->cipher.data.length;
-	qm_sg_entry_set64(out_sg, dpaa_mem_vtop_ctx(ctx, &cf->sg[2]));
+	qm_sg_entry_set64(out_sg, dpaa_mem_vtop(&cf->sg[2]));
 	cpu_to_hw_sg(out_sg);
 
 	/* 1st seg */
@@ -804,7 +797,7 @@ build_cipher_only_sg(struct rte_crypto_op *op, dpaa_sec_session *ses)
 	in_sg->length = sym->cipher.data.length + ses->iv.length;
 
 	sg++;
-	qm_sg_entry_set64(in_sg, dpaa_mem_vtop_ctx(ctx, sg));
+	qm_sg_entry_set64(in_sg, dpaa_mem_vtop(sg));
 	cpu_to_hw_sg(in_sg);
 
 	/* IV */
@@ -871,7 +864,7 @@ build_cipher_only(struct rte_crypto_op *op, dpaa_sec_session *ses)
 	sg->extension = 1;
 	sg->final = 1;
 	sg->length = sym->cipher.data.length + ses->iv.length;
-	qm_sg_entry_set64(sg, dpaa_mem_vtop_ctx(ctx, &cf->sg[2]));
+	qm_sg_entry_set64(sg, dpaa_mem_vtop(&cf->sg[2]));
 	cpu_to_hw_sg(sg);
 
 	sg = &cf->sg[2];
@@ -937,7 +930,7 @@ build_cipher_auth_gcm_sg(struct rte_crypto_op *op, dpaa_sec_session *ses)
 
 	/* output sg entries */
 	sg = &cf->sg[2];
-	qm_sg_entry_set64(out_sg, dpaa_mem_vtop_ctx(ctx, sg));
+	qm_sg_entry_set64(out_sg, dpaa_mem_vtop(sg));
 	cpu_to_hw_sg(out_sg);
 
 	/* 1st seg */
@@ -981,7 +974,7 @@ build_cipher_auth_gcm_sg(struct rte_crypto_op *op, dpaa_sec_session *ses)
 
 	/* input sg entries */
 	sg++;
-	qm_sg_entry_set64(in_sg, dpaa_mem_vtop_ctx(ctx, sg));
+	qm_sg_entry_set64(in_sg, dpaa_mem_vtop(sg));
 	cpu_to_hw_sg(in_sg);
 
 	/* 1st seg IV */
@@ -1018,7 +1011,7 @@ build_cipher_auth_gcm_sg(struct rte_crypto_op *op, dpaa_sec_session *ses)
 		sg++;
 		memcpy(ctx->digest, sym->aead.digest.data,
 			ses->digest_length);
-		qm_sg_entry_set64(sg, dpaa_mem_vtop_ctx(ctx, ctx->digest));
+		qm_sg_entry_set64(sg, dpaa_mem_vtop(ctx->digest));
 		sg->length = ses->digest_length;
 	}
 	sg->final = 1;
@@ -1056,7 +1049,7 @@ build_cipher_auth_gcm(struct rte_crypto_op *op, dpaa_sec_session *ses)
 	/* input */
 	rte_prefetch0(cf->sg);
 	sg = &cf->sg[2];
-	qm_sg_entry_set64(&cf->sg[1], dpaa_mem_vtop_ctx(ctx, sg));
+	qm_sg_entry_set64(&cf->sg[1], dpaa_mem_vtop(sg));
 	if (is_encode(ses)) {
 		qm_sg_entry_set64(sg, dpaa_mem_vtop(IV_ptr));
 		sg->length = ses->iv.length;
@@ -1101,7 +1094,7 @@ build_cipher_auth_gcm(struct rte_crypto_op *op, dpaa_sec_session *ses)
 		       ses->digest_length);
 		sg++;
 
-		qm_sg_entry_set64(sg, dpaa_mem_vtop_ctx(ctx, ctx->digest));
+		qm_sg_entry_set64(sg, dpaa_mem_vtop(ctx->digest));
 		sg->length = ses->digest_length;
 		length += sg->length;
 		sg->final = 1;
@@ -1115,7 +1108,7 @@ build_cipher_auth_gcm(struct rte_crypto_op *op, dpaa_sec_session *ses)
 
 	/* output */
 	sg++;
-	qm_sg_entry_set64(&cf->sg[0], dpaa_mem_vtop_ctx(ctx, sg));
+	qm_sg_entry_set64(&cf->sg[0], dpaa_mem_vtop(sg));
 	qm_sg_entry_set64(sg,
 		dst_start_addr + sym->aead.data.offset - ses->auth_only_len);
 	sg->length = sym->aead.data.length + ses->auth_only_len;
@@ -1184,7 +1177,7 @@ build_cipher_auth_sg(struct rte_crypto_op *op, dpaa_sec_session *ses)
 
 	/* output sg entries */
 	sg = &cf->sg[2];
-	qm_sg_entry_set64(out_sg, dpaa_mem_vtop_ctx(ctx, sg));
+	qm_sg_entry_set64(out_sg, dpaa_mem_vtop(sg));
 	cpu_to_hw_sg(out_sg);
 
 	/* 1st seg */
@@ -1226,7 +1219,7 @@ build_cipher_auth_sg(struct rte_crypto_op *op, dpaa_sec_session *ses)
 
 	/* input sg entries */
 	sg++;
-	qm_sg_entry_set64(in_sg, dpaa_mem_vtop_ctx(ctx, sg));
+	qm_sg_entry_set64(in_sg, dpaa_mem_vtop(sg));
 	cpu_to_hw_sg(in_sg);
 
 	/* 1st seg IV */
@@ -1256,7 +1249,7 @@ build_cipher_auth_sg(struct rte_crypto_op *op, dpaa_sec_session *ses)
 		sg++;
 		memcpy(ctx->digest, sym->auth.digest.data,
 			ses->digest_length);
-		qm_sg_entry_set64(sg, dpaa_mem_vtop_ctx(ctx, ctx->digest));
+		qm_sg_entry_set64(sg, dpaa_mem_vtop(ctx->digest));
 		sg->length = ses->digest_length;
 	}
 	sg->final = 1;
@@ -1293,7 +1286,7 @@ build_cipher_auth(struct rte_crypto_op *op, dpaa_sec_session *ses)
 	/* input */
 	rte_prefetch0(cf->sg);
 	sg = &cf->sg[2];
-	qm_sg_entry_set64(&cf->sg[1], dpaa_mem_vtop_ctx(ctx, sg));
+	qm_sg_entry_set64(&cf->sg[1], dpaa_mem_vtop(sg));
 	if (is_encode(ses)) {
 		qm_sg_entry_set64(sg, dpaa_mem_vtop(IV_ptr));
 		sg->length = ses->iv.length;
@@ -1323,7 +1316,7 @@ build_cipher_auth(struct rte_crypto_op *op, dpaa_sec_session *ses)
 		       ses->digest_length);
 		sg++;
 
-		qm_sg_entry_set64(sg, dpaa_mem_vtop_ctx(ctx, ctx->digest));
+		qm_sg_entry_set64(sg, dpaa_mem_vtop(ctx->digest));
 		sg->length = ses->digest_length;
 		length += sg->length;
 		sg->final = 1;
@@ -1337,7 +1330,7 @@ build_cipher_auth(struct rte_crypto_op *op, dpaa_sec_session *ses)
 
 	/* output */
 	sg++;
-	qm_sg_entry_set64(&cf->sg[0], dpaa_mem_vtop_ctx(ctx, sg));
+	qm_sg_entry_set64(&cf->sg[0], dpaa_mem_vtop(sg));
 	qm_sg_entry_set64(sg, dst_start_addr + sym->cipher.data.offset);
 	sg->length = sym->cipher.data.length;
 	length = sg->length;
@@ -1412,7 +1405,6 @@ dpaa_sec_enqueue_burst(void *qp, struct rte_crypto_op **ops,
 	struct rte_crypto_op *op;
 	struct dpaa_sec_job *cf;
 	dpaa_sec_session *ses;
-	struct dpaa_sec_op_ctx *ctx;
 	uint32_t auth_only_len;
 	struct qman_fq *inq[DPAA_SEC_BURST];
 
@@ -1497,8 +1489,7 @@ dpaa_sec_enqueue_burst(void *qp, struct rte_crypto_op **ops,
 			inq[loop] = ses->inq;
 			fd->opaque_addr = 0;
 			fd->cmd = 0;
-			ctx = container_of(cf, struct dpaa_sec_op_ctx, job);
-			qm_fd_addr_set64(fd, dpaa_mem_vtop_ctx(ctx, cf->sg));
+			qm_fd_addr_set64(fd, dpaa_mem_vtop(cf->sg));
 			fd->_format1 = qm_fd_compound;
 			fd->length29 = 2 * sizeof(struct qm_sg_entry);
 			/* Auth_only_len is set as 0 in descriptor and it is
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v2 2/3] bus/fslmc: optimize physical to virtual address searching
  2018-04-27 17:20 ` [PATCH v2 0/3] Optimization for DPAA/DPAA2 for PA/VA Addressing Shreyansh Jain
  2018-04-27 17:20   ` [PATCH v2 1/3] crypto/dpaa_sec: remove ctx based offset for PA-VA conversion Shreyansh Jain
@ 2018-04-27 17:20   ` Shreyansh Jain
  2018-04-27 18:49     ` Thomas Monjalon
  2018-04-27 17:20   ` [PATCH v2 3/3] bus/dpaa: " Shreyansh Jain
  2018-04-27 19:38   ` [PATCH v2 0/3] Optimization for DPAA/DPAA2 for PA/VA Addressing Thomas Monjalon
  3 siblings, 1 reply; 12+ messages in thread
From: Shreyansh Jain @ 2018-04-27 17:20 UTC (permalink / raw)
  To: thomas, dev; +Cc: hemant.agrawal, akhil.goyal, anatoly.burakov, Shreyansh Jain

With memory hotplug support, the ordering of memsegs has changed from
physically contiguous to virtually contiguous. The FSLMC bus and dpaa2
drivers depend on PA to VA address conversion when in physical
addressing mode.

This patch creates a list of the blocks requested to be pinned to the
DPAA2 mempool. A physical address being looked up is expected to
belong to this list (it comes from the hardware pool), so searching
the list is less expensive than a memseg walk. There is still a
marginal performance impact compared with the legacy mode with
physically contiguous memsegs.
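
The lookup described above can be sketched outside DPDK as follows.
This is a minimal illustration, not the driver code: `struct seg`,
`seg_add()` and `seg_ptov()` are hypothetical names, `uint64_t` stands
in for `rte_iova_t`, and `malloc()` stands in for `rte_zmalloc()`. A
miss returns NULL here, which is where the driver would fall back to
`rte_mem_iova2virt()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/queue.h>

/* Hypothetical stand-in for struct dpaa2_memseg. */
struct seg {
	TAILQ_ENTRY(seg) next;
	char *vaddr;	/* start of the chunk in virtual memory */
	uint64_t iova;	/* start of the chunk in IOVA/physical space */
	size_t len;	/* chunk length in bytes */
};

TAILQ_HEAD(seg_list, seg);
static struct seg_list segs = TAILQ_HEAD_INITIALIZER(segs);

/* Record one pinned chunk. Returns 0 on success, -1 if malloc fails. */
static int seg_add(void *vaddr, uint64_t iova, size_t len)
{
	struct seg *s;

	assert(len > 0);
	s = malloc(sizeof(*s));
	if (s == NULL)
		return -1;

	s->vaddr = vaddr;
	s->iova = iova;
	s->len = len;
	/* Most recently pinned chunk is searched first. */
	TAILQ_INSERT_HEAD(&segs, s, next);
	return 0;
}

/* Translate paddr by scanning the tracked chunks; NULL means "not
 * tracked", i.e. the caller must use the slower generic lookup.
 */
static void *seg_ptov(uint64_t paddr)
{
	struct seg *s;

	TAILQ_FOREACH(s, &segs, next) {
		if (paddr >= s->iova && paddr - s->iova < s->len)
			return s->vaddr + (uintptr_t)(paddr - s->iova);
	}
	return NULL;
}
```

The range test is written as `paddr - s->iova < s->len` so that it
cannot overflow when a chunk ends at the top of the address space; the
patch itself uses `paddr < ms->iova + ms->len`.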

Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>

---
An optimized algorithm is being worked on, based on some recent patches
in hotplugging. That would improve/recover the performance. Until then,
this patch is to be treated as a stop-gap solution.
---
 drivers/bus/fslmc/portal/dpaa2_hw_pvt.h  | 23 +++++++++++++++++
 drivers/mempool/dpaa2/dpaa2_hw_mempool.c | 43 ++++++++++++++++++++++++++++++++
 2 files changed, 66 insertions(+)

diff --git a/drivers/bus/fslmc/portal/dpaa2_hw_pvt.h b/drivers/bus/fslmc/portal/dpaa2_hw_pvt.h
index c76393d45..da6e639dc 100644
--- a/drivers/bus/fslmc/portal/dpaa2_hw_pvt.h
+++ b/drivers/bus/fslmc/portal/dpaa2_hw_pvt.h
@@ -254,15 +254,38 @@ enum qbman_fd_format {
  */
 #define DPAA2_EQ_RESP_ALWAYS		1
 
+/* Various structures representing contiguous memory maps */
+struct dpaa2_memseg {
+	TAILQ_ENTRY(dpaa2_memseg) next;
+	char *vaddr;
+	rte_iova_t iova;
+	size_t len;
+};
+
+TAILQ_HEAD(dpaa2_memseg_list, dpaa2_memseg);
+extern struct dpaa2_memseg_list dpaa2_memsegs;
+
 #ifdef RTE_LIBRTE_DPAA2_USE_PHYS_IOVA
 extern uint8_t dpaa2_virt_mode;
 static void *dpaa2_mem_ptov(phys_addr_t paddr) __attribute__((unused));
 /* todo - this is costly, need to write a fast coversion routine */
 static void *dpaa2_mem_ptov(phys_addr_t paddr)
 {
+	struct dpaa2_memseg *ms;
+
 	if (dpaa2_virt_mode)
 		return (void *)(size_t)paddr;
 
+	/* Check if the address is already part of the memseg list internally
+	 * maintained by the dpaa2 driver.
+	 */
+	TAILQ_FOREACH(ms, &dpaa2_memsegs, next) {
+		if (paddr >= ms->iova && paddr <
+			ms->iova + ms->len)
+			return RTE_PTR_ADD(ms->vaddr, (paddr - ms->iova));
+	}
+
+	/* If not, Fallback to full memseg list searching */
 	return rte_mem_iova2virt(paddr);
 }
 
diff --git a/drivers/mempool/dpaa2/dpaa2_hw_mempool.c b/drivers/mempool/dpaa2/dpaa2_hw_mempool.c
index ce7a4c577..883d8d84a 100644
--- a/drivers/mempool/dpaa2/dpaa2_hw_mempool.c
+++ b/drivers/mempool/dpaa2/dpaa2_hw_mempool.c
@@ -32,6 +32,13 @@
 struct dpaa2_bp_info rte_dpaa2_bpid_info[MAX_BPID];
 static struct dpaa2_bp_list *h_bp_list;
 
+/* List of all the memseg information locally maintained in dpaa2 driver. This
+ * is to optimize the PA_to_VA searches until a better mechanism (algo) is
+ * available.
+ */
+struct dpaa2_memseg_list dpaa2_memsegs
+	= TAILQ_HEAD_INITIALIZER(dpaa2_memsegs);
+
 /* Dynamic logging identified for mempool */
 int dpaa2_logtype_mempool;
 
@@ -358,6 +365,41 @@ rte_hw_mbuf_get_count(const struct rte_mempool *mp)
 	return num_of_bufs;
 }
 
+static int
+dpaa2_populate(struct rte_mempool *mp, unsigned int max_objs,
+	      void *vaddr, rte_iova_t paddr, size_t len,
+	      rte_mempool_populate_obj_cb_t *obj_cb, void *obj_cb_arg)
+{
+	struct dpaa2_memseg *ms;
+
+	/* For each memory chunk pinned to the Mempool, a linked list of the
+	 * contained memsegs is created for searching when PA to VA
+	 * conversion is required.
+	 */
+	ms = rte_zmalloc(NULL, sizeof(struct dpaa2_memseg), 0);
+	if (!ms) {
+		DPAA2_MEMPOOL_ERR("Unable to allocate internal memory.");
+		DPAA2_MEMPOOL_WARN("Fast Physical to Virtual Addr translation would not be available.");
+		/* If the element is not added, it would only lead to failure
+		 * in searching for the element and the logic would Fallback
+		 * to traditional DPDK memseg traversal code. So, this is not
+		 * a blocking error - but, error would be printed on screen.
+		 */
+		return 0;
+	}
+
+	ms->vaddr = vaddr;
+	ms->iova = paddr;
+	ms->len = len;
+	/* Head insertions are generally faster than tail insertions as the
+	 * buffers pinned are picked from rear end.
+	 */
+	TAILQ_INSERT_HEAD(&dpaa2_memsegs, ms, next);
+
+	return rte_mempool_op_populate_default(mp, max_objs, vaddr, paddr, len,
+					       obj_cb, obj_cb_arg);
+}
+
 struct rte_mempool_ops dpaa2_mpool_ops = {
 	.name = DPAA2_MEMPOOL_OPS_NAME,
 	.alloc = rte_hw_mbuf_create_pool,
@@ -365,6 +407,7 @@ struct rte_mempool_ops dpaa2_mpool_ops = {
 	.enqueue = rte_hw_mbuf_free_bulk,
 	.dequeue = rte_dpaa2_mbuf_alloc_bulk,
 	.get_count = rte_hw_mbuf_get_count,
+	.populate = dpaa2_populate,
 };
 
 MEMPOOL_REGISTER_OPS(dpaa2_mpool_ops);
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v2 3/3] bus/dpaa: optimize physical to virtual address searching
  2018-04-27 17:20 ` [PATCH v2 0/3] Optimization for DPAA/DPAA2 for PA/VA Addressing Shreyansh Jain
  2018-04-27 17:20   ` [PATCH v2 1/3] crypto/dpaa_sec: remove ctx based offset for PA-VA conversion Shreyansh Jain
  2018-04-27 17:20   ` [PATCH v2 2/3] bus/fslmc: optimize physical to virtual address searching Shreyansh Jain
@ 2018-04-27 17:20   ` Shreyansh Jain
  2018-04-27 19:32     ` Thomas Monjalon
  2018-04-27 19:38   ` [PATCH v2 0/3] Optimization for DPAA/DPAA2 for PA/VA Addressing Thomas Monjalon
  3 siblings, 1 reply; 12+ messages in thread
From: Shreyansh Jain @ 2018-04-27 17:20 UTC (permalink / raw)
  To: thomas, dev; +Cc: hemant.agrawal, akhil.goyal, anatoly.burakov, Shreyansh Jain

With memory hotplug support, the ordering of memsegs has changed from
physically contiguous to virtually contiguous. The DPAA bus and drivers
depend on PA to VA address conversion for I/O.
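
To illustrate why the change in ordering matters, the sketch below
contrasts a translation that applies one fixed VA/IOVA delta with a
per-chunk lookup. It is an assumption-laden toy, not DPAA code: the
`struct chunk` type, both functions, and the addresses in
`demo_chunks` are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor of one virtually contiguous chunk. */
struct chunk {
	uintptr_t va;	/* virtual start address */
	uint64_t iova;	/* IOVA/physical start address */
	size_t len;	/* length in bytes */
};

/* One delta taken from the first chunk and applied to every address.
 * Only correct while all chunks share the same VA/IOVA delta.
 */
static uintptr_t ptov_single_offset(const struct chunk *c, uint64_t paddr)
{
	return c[0].va + (uintptr_t)(paddr - c[0].iova);
}

/* Find the chunk whose IOVA range covers paddr; 0 means not tracked. */
static uintptr_t ptov_per_chunk(const struct chunk *c, size_t n,
				uint64_t paddr)
{
	size_t i;

	assert(n > 0);
	for (i = 0; i < n; i++) {
		if (paddr >= c[i].iova && paddr - c[i].iova < c[i].len)
			return c[i].va + (uintptr_t)(paddr - c[i].iova);
	}
	return 0;
}

/* Two chunks adjacent in VA but far apart in IOVA. */
static const struct chunk demo_chunks[2] = {
	{ .va = 0x10000000u, .iova = 0x80000000ULL, .len = 0x1000 },
	{ .va = 0x10001000u, .iova = 0x90000000ULL, .len = 0x1000 },
};
```

For an address in the second chunk, such as 0x90000010, the per-chunk
lookup yields 0x10001010 while the single-offset version yields
0x20000010, which points outside both chunks.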

This patch creates a list of the blocks requested to be pinned to the
DPAA mempool. A physical address being looked up is expected to
belong to this list (it comes from the hardware pool), so searching
the list is less expensive than a memseg walk. There is still a
marginal drop in performance compared with the legacy mode with
physically contiguous memsegs.

Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>

---
An optimized algorithm is being worked on, based on some recent patches
in hotplugging. That would improve/recover the performance. Until then,
this patch is to be treated as a stop-gap solution.
---
 drivers/bus/dpaa/rte_dpaa_bus.h     | 27 ++++++++++++++++++++++++++-
 drivers/mempool/dpaa/dpaa_mempool.c | 33 ++++++++++++++++++++++++++++++++-
 2 files changed, 58 insertions(+), 2 deletions(-)

diff --git a/drivers/bus/dpaa/rte_dpaa_bus.h b/drivers/bus/dpaa/rte_dpaa_bus.h
index 89aeac2d1..ca32b7f2f 100644
--- a/drivers/bus/dpaa/rte_dpaa_bus.h
+++ b/drivers/bus/dpaa/rte_dpaa_bus.h
@@ -95,9 +95,34 @@ struct dpaa_portal {
 	uint64_t tid;/**< Parent Thread id for this portal */
 };
 
-/* TODO - this is costly, need to write a fast coversion routine */
+/* Various structures representing contiguous memory maps */
+struct dpaa_memseg {
+	TAILQ_ENTRY(dpaa_memseg) next;
+	char *vaddr;
+	rte_iova_t iova;
+	size_t len;
+};
+
+TAILQ_HEAD(dpaa_memseg_list, dpaa_memseg);
+extern struct dpaa_memseg_list dpaa_memsegs;
+
+/* Either iterate over the list of internal memseg references or fallback to
+ * EAL memseg based iova2virt.
+ */
 static inline void *rte_dpaa_mem_ptov(phys_addr_t paddr)
 {
+	struct dpaa_memseg *ms;
+
+	/* Check if the address is already part of the memseg list internally
+	 * maintained by the dpaa driver.
+	 */
+	TAILQ_FOREACH(ms, &dpaa_memsegs, next) {
+		if (paddr >= ms->iova && paddr <
+			ms->iova + ms->len)
+			return RTE_PTR_ADD(ms->vaddr, (paddr - ms->iova));
+	}
+
+	/* If not, Fallback to full memseg list searching */
 	return rte_mem_iova2virt(paddr);
 }
 
diff --git a/drivers/mempool/dpaa/dpaa_mempool.c b/drivers/mempool/dpaa/dpaa_mempool.c
index 580e4640c..9d6277f82 100644
--- a/drivers/mempool/dpaa/dpaa_mempool.c
+++ b/drivers/mempool/dpaa/dpaa_mempool.c
@@ -27,6 +27,13 @@
 
 #include <dpaa_mempool.h>
 
+/* List of all the memseg information locally maintained in dpaa driver. This
+ * is to optimize the PA_to_VA searches until a better mechanism (algo) is
+ * available.
+ */
+struct dpaa_memseg_list dpaa_memsegs
+	= TAILQ_HEAD_INITIALIZER(dpaa_memsegs);
+
 struct dpaa_bp_info rte_dpaa_bpid_info[DPAA_MAX_BPOOLS];
 
 static int
@@ -287,10 +294,34 @@ dpaa_populate(struct rte_mempool *mp, unsigned int max_objs,
 	/* Detect pool area has sufficient space for elements in this memzone */
 	if (len >= total_elt_sz * mp->size)
 		bp_info->flags |= DPAA_MPOOL_SINGLE_SEGMENT;
+	struct dpaa_memseg *ms;
+
+	/* For each memory chunk pinned to the Mempool, a linked list of the
+	 * contained memsegs is created for searching when PA to VA
+	 * conversion is required.
+	 */
+	ms = rte_zmalloc(NULL, sizeof(struct dpaa_memseg), 0);
+	if (!ms) {
+		DPAA_MEMPOOL_ERR("Unable to allocate internal memory.");
+		DPAA_MEMPOOL_WARN("Fast Physical to Virtual Addr translation would not be available.");
+		/* If the element is not added, it would only lead to failure
+		 * in searching for the element and the logic would Fallback
+		 * to traditional DPDK memseg traversal code. So, this is not
+		 * a blocking error - but, error would be printed on screen.
+		 */
+		return 0;
+	}
+
+	ms->vaddr = vaddr;
+	ms->iova = paddr;
+	ms->len = len;
+	/* Head insertions are generally faster than tail insertions as the
+	 * buffers pinned are picked from rear end.
+	 */
+	TAILQ_INSERT_HEAD(&dpaa_memsegs, ms, next);
 
 	return rte_mempool_op_populate_default(mp, max_objs, vaddr, paddr, len,
 					       obj_cb, obj_cb_arg);
-
 }
 
 struct rte_mempool_ops dpaa_mpool_ops = {
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 2/3] bus/fslmc: optimize physical to virtual address searching
  2018-04-27 17:20   ` [PATCH v2 2/3] bus/fslmc: optimize physical to virtual address searching Shreyansh Jain
@ 2018-04-27 18:49     ` Thomas Monjalon
  2018-04-27 19:24       ` Thomas Monjalon
  0 siblings, 1 reply; 12+ messages in thread
From: Thomas Monjalon @ 2018-04-27 18:49 UTC (permalink / raw)
  To: Shreyansh Jain; +Cc: dev, hemant.agrawal, akhil.goyal, anatoly.burakov

27/04/2018 19:20, Shreyansh Jain:
> --- a/drivers/bus/fslmc/portal/dpaa2_hw_pvt.h
> +++ b/drivers/bus/fslmc/portal/dpaa2_hw_pvt.h
> @@ -254,15 +254,38 @@ enum qbman_fd_format {
>   */
>  #define DPAA2_EQ_RESP_ALWAYS		1
>  
> +/* Various structures representing contiguous memory maps */
> +struct dpaa2_memseg {
> +	TAILQ_ENTRY(dpaa2_memseg) next;
> +	char *vaddr;
> +	rte_iova_t iova;
> +	size_t len;
> +};
> +
> +TAILQ_HEAD(dpaa2_memseg_list, dpaa2_memseg);
> +extern struct dpaa2_memseg_list dpaa2_memsegs;

Shared compilation is broken without the following patch:

--- a/drivers/bus/fslmc/rte_bus_fslmc_version.map
+++ b/drivers/bus/fslmc/rte_bus_fslmc_version.map
@@ -105,5 +105,6 @@ DPDK_18.05 {
        global:
 
        dpaa2_affine_qbman_ethrx_swp;
+       dpaa2_memsegs;
 
 } DPDK_18.02;

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 2/3] bus/fslmc: optimize physical to virtual address searching
  2018-04-27 18:49     ` Thomas Monjalon
@ 2018-04-27 19:24       ` Thomas Monjalon
  0 siblings, 0 replies; 12+ messages in thread
From: Thomas Monjalon @ 2018-04-27 19:24 UTC (permalink / raw)
  To: Shreyansh Jain; +Cc: dev, hemant.agrawal, akhil.goyal, anatoly.burakov

27/04/2018 20:49, Thomas Monjalon:
> 27/04/2018 19:20, Shreyansh Jain:
> > --- a/drivers/bus/fslmc/portal/dpaa2_hw_pvt.h
> > +++ b/drivers/bus/fslmc/portal/dpaa2_hw_pvt.h
> > @@ -254,15 +254,38 @@ enum qbman_fd_format {
> >   */
> >  #define DPAA2_EQ_RESP_ALWAYS		1
> >  
> > +/* Various structures representing contiguous memory maps */
> > +struct dpaa2_memseg {
> > +	TAILQ_ENTRY(dpaa2_memseg) next;
> > +	char *vaddr;
> > +	rte_iova_t iova;
> > +	size_t len;
> > +};
> > +
> > +TAILQ_HEAD(dpaa2_memseg_list, dpaa2_memseg);
> > +extern struct dpaa2_memseg_list dpaa2_memsegs;
> 
> Shared compilation is broken without the following patch:
> 
> --- a/drivers/bus/fslmc/rte_bus_fslmc_version.map
> +++ b/drivers/bus/fslmc/rte_bus_fslmc_version.map
> @@ -105,5 +105,6 @@ DPDK_18.05 {
>         global:
>  
>         dpaa2_affine_qbman_ethrx_swp;
> +       dpaa2_memsegs;
>  
>  } DPDK_18.02;

The right fix is:

--- a/drivers/bus/fslmc/portal/dpaa2_hw_pvt.h
+++ b/drivers/bus/fslmc/portal/dpaa2_hw_pvt.h
@@ -263,7 +263,7 @@ struct dpaa2_memseg {
 };
 
 TAILQ_HEAD(dpaa2_memseg_list, dpaa2_memseg);
-extern struct dpaa2_memseg_list dpaa2_memsegs;
+extern struct dpaa2_memseg_list rte_dpaa2_memsegs;
 
 #ifdef RTE_LIBRTE_DPAA2_USE_PHYS_IOVA
 extern uint8_t dpaa2_virt_mode;
@@ -279,10 +279,10 @@ static void *dpaa2_mem_ptov(phys_addr_t paddr)
        /* Check if the address is already part of the memseg list internally
         * maintained by the dpaa2 driver.
         */
-       TAILQ_FOREACH(ms, &dpaa2_memsegs, next) {
+       TAILQ_FOREACH(ms, &rte_dpaa2_memsegs, next) {
                if (paddr >= ms->iova && paddr <
                        ms->iova + ms->len)
-                       return RTE_PTR_ADD(ms->vaddr, (paddr - ms->iova));
+                       return RTE_PTR_ADD(ms->vaddr, (uintptr_t)(paddr - ms->iova));
        }
 
        /* If not, Fallback to full memseg list searching */

--- a/drivers/event/dpaa2/Makefile
+++ b/drivers/event/dpaa2/Makefile
@@ -18,7 +18,8 @@ CFLAGS += -I$(RTE_SDK)/drivers/bus/fslmc/portal
 CFLAGS += -I$(RTE_SDK)/drivers/mempool/dpaa2
 CFLAGS += -I$(RTE_SDK)/drivers/event/dpaa2
 CFLAGS += -I$(RTE_SDK)/lib/librte_eal/linuxapp/eal
-LDLIBS += -lrte_eal -lrte_eventdev -lrte_bus_fslmc -lrte_pmd_dpaa2
+LDLIBS += -lrte_eal -lrte_eventdev
+LDLIBS += -lrte_bus_fslmc -lrte_mempool_dpaa2 -lrte_pmd_dpaa2
 LDLIBS += -lrte_bus_vdev
 CFLAGS += -I$(RTE_SDK)/drivers/net/dpaa2
 CFLAGS += -I$(RTE_SDK)/drivers/net/dpaa2/mc

--- a/drivers/mempool/dpaa2/dpaa2_hw_mempool.c
+++ b/drivers/mempool/dpaa2/dpaa2_hw_mempool.c
@@ -36,8 +36,8 @@ static struct dpaa2_bp_list *h_bp_list;
  * is to optimize the PA_to_VA searches until a better mechanism (algo) is
  * available.
  */
-struct dpaa2_memseg_list dpaa2_memsegs
-       = TAILQ_HEAD_INITIALIZER(dpaa2_memsegs);
+struct dpaa2_memseg_list rte_dpaa2_memsegs
+       = TAILQ_HEAD_INITIALIZER(rte_dpaa2_memsegs);
 
 /* Dynamic logging identified for mempool */
 int dpaa2_logtype_mempool;
@@ -394,7 +394,7 @@ dpaa2_populate(struct rte_mempool *mp, unsigned int max_objs,
        /* Head insertions are generally faster than tail insertions as the
         * buffers pinned are picked from rear end.
         */
-       TAILQ_INSERT_HEAD(&dpaa2_memsegs, ms, next);
+       TAILQ_INSERT_HEAD(&rte_dpaa2_memsegs, ms, next);
 
        return rte_mempool_op_populate_default(mp, max_objs, vaddr, paddr, len,
                                               obj_cb, obj_cb_arg);

--- a/drivers/mempool/dpaa2/rte_mempool_dpaa2_version.map
+++ b/drivers/mempool/dpaa2/rte_mempool_dpaa2_version.map
@@ -3,6 +3,7 @@ DPDK_17.05 {
 
        rte_dpaa2_bpid_info;
        rte_dpaa2_mbuf_alloc_bulk;
+       rte_dpaa2_memsegs;
 
        local: *;
 };

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 3/3] bus/dpaa: optimize physical to virtual address searching
  2018-04-27 17:20   ` [PATCH v2 3/3] bus/dpaa: " Shreyansh Jain
@ 2018-04-27 19:32     ` Thomas Monjalon
  0 siblings, 0 replies; 12+ messages in thread
From: Thomas Monjalon @ 2018-04-27 19:32 UTC (permalink / raw)
  To: Shreyansh Jain; +Cc: dev, hemant.agrawal, akhil.goyal, anatoly.burakov

27/04/2018 19:20, Shreyansh Jain:
> --- a/drivers/bus/dpaa/rte_dpaa_bus.h
> +++ b/drivers/bus/dpaa/rte_dpaa_bus.h
> @@ -95,9 +95,34 @@ struct dpaa_portal {
>  	uint64_t tid;/**< Parent Thread id for this portal */
>  };
>  
> -/* TODO - this is costly, need to write a fast coversion routine */
> +/* Various structures representing contiguous memory maps */
> +struct dpaa_memseg {
> +	TAILQ_ENTRY(dpaa_memseg) next;
> +	char *vaddr;
> +	rte_iova_t iova;
> +	size_t len;
> +};
> +
> +TAILQ_HEAD(dpaa_memseg_list, dpaa_memseg);
> +extern struct dpaa_memseg_list dpaa_memsegs;

Same as for DPAA2, fixes are required:

--- a/drivers/bus/dpaa/rte_dpaa_bus.h
+++ b/drivers/bus/dpaa/rte_dpaa_bus.h
@@ -104,7 +104,7 @@ struct dpaa_memseg {
 };
 
 TAILQ_HEAD(dpaa_memseg_list, dpaa_memseg);
-extern struct dpaa_memseg_list dpaa_memsegs;
+extern struct dpaa_memseg_list rte_dpaa_memsegs;
 
 /* Either iterate over the list of internal memseg references or fallback to
  * EAL memseg based iova2virt.
@@ -116,10 +116,10 @@ static inline void *rte_dpaa_mem_ptov(phys_addr_t paddr)
        /* Check if the address is already part of the memseg list internally
         * maintained by the dpaa driver.
         */
-       TAILQ_FOREACH(ms, &dpaa_memsegs, next) {
+       TAILQ_FOREACH(ms, &rte_dpaa_memsegs, next) {
                if (paddr >= ms->iova && paddr <
                        ms->iova + ms->len)
-                       return RTE_PTR_ADD(ms->vaddr, (paddr - ms->iova));
+                       return RTE_PTR_ADD(ms->vaddr, (uintptr_t)(paddr - ms->iova));
        }
 
        /* If not, Fallback to full memseg list searching */

--- a/drivers/mempool/dpaa/dpaa_mempool.c
+++ b/drivers/mempool/dpaa/dpaa_mempool.c
@@ -31,8 +31,8 @@
  * is to optimize the PA_to_VA searches until a better mechanism (algo) is
  * available.
  */
-struct dpaa_memseg_list dpaa_memsegs
-       = TAILQ_HEAD_INITIALIZER(dpaa_memsegs);
+struct dpaa_memseg_list rte_dpaa_memsegs
+       = TAILQ_HEAD_INITIALIZER(rte_dpaa_memsegs);
 
 struct dpaa_bp_info rte_dpaa_bpid_info[DPAA_MAX_BPOOLS];
 
@@ -318,7 +318,7 @@ dpaa_populate(struct rte_mempool *mp, unsigned int max_objs,
        /* Head insertions are generally faster than tail insertions as the
         * buffers pinned are picked from rear end.
         */
-       TAILQ_INSERT_HEAD(&dpaa_memsegs, ms, next);
+       TAILQ_INSERT_HEAD(&rte_dpaa_memsegs, ms, next);
 
        return rte_mempool_op_populate_default(mp, max_objs, vaddr, paddr, len,
                                               obj_cb, obj_cb_arg);

--- a/drivers/mempool/dpaa/rte_mempool_dpaa_version.map
+++ b/drivers/mempool/dpaa/rte_mempool_dpaa_version.map
@@ -2,6 +2,7 @@ DPDK_17.11 {
        global:
 
        rte_dpaa_bpid_info;
+       rte_dpaa_memsegs;
 
        local: *;
 };

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 0/3] Optimization for DPAA/DPAA2 for PA/VA Addressing
  2018-04-27 17:20 ` [PATCH v2 0/3] Optimization for DPAA/DPAA2 for PA/VA Addressing Shreyansh Jain
                     ` (2 preceding siblings ...)
  2018-04-27 17:20   ` [PATCH v2 3/3] bus/dpaa: " Shreyansh Jain
@ 2018-04-27 19:38   ` Thomas Monjalon
  3 siblings, 0 replies; 12+ messages in thread
From: Thomas Monjalon @ 2018-04-27 19:38 UTC (permalink / raw)
  To: Shreyansh Jain; +Cc: dev, hemant.agrawal, akhil.goyal, anatoly.burakov

> Shreyansh Jain (3):
>   crypto/dpaa_sec: remove ctx based offset for PA-VA conversion
>   bus/fslmc: optimize physical to virtual address searching
>   bus/dpaa: optimize physical to virtual address searching

Applied with fixes for:
	- 32-bit compilation
	- symbols export for shared lib compilation
	- rte_ prefix namespace for exported symbols
	- dpaa2 mempool dependency for dpaa2 eventdev
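
The 32-bit item presumably corresponds to the `(uintptr_t)` cast added
in the fixes quoted earlier in the thread. A minimal sketch of the
issue, assuming a pointer-add macro with the same shape as DPDK's
`RTE_PTR_ADD`; `PTR_ADD` and `offset_into()` are local, hypothetical
names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-in with the same shape as RTE_PTR_ADD. */
#define PTR_ADD(ptr, x) ((void *)((uintptr_t)(ptr) + (x)))

/* The IOVA difference is 64 bits wide. On a 32-bit target, adding it
 * uncast would make the whole sum 64 bits wide, and converting that
 * to a 32-bit pointer triggers an int-to-pointer size warning.
 * Narrowing the offset to uintptr_t first keeps the sum pointer-sized.
 */
static void *offset_into(char *vaddr, uint64_t paddr, uint64_t base_iova)
{
	assert(paddr >= base_iova);
	return PTR_ADD(vaddr, (uintptr_t)(paddr - base_iova));
}
```

The narrowing is safe as long as the offset stays within the chunk
length, which is a `size_t` and therefore fits in a pointer-sized
integer.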

^ permalink raw reply	[flat|nested] 12+ messages in thread

Thread overview: 12+ messages
2018-04-27 16:25 [PATCH 0/3] Optimization for DPAA/DPAA2 for PA/VA Addressing Shreyansh Jain
2018-04-27 16:25 ` [PATCH 1/3] crypto/dpaa_sec: remove ctx based offset for PA-VA conversion Shreyansh Jain
2018-04-27 16:25 ` [PATCH 2/3] bus/fslmc: optimize physical to virtual address searching Shreyansh Jain
2018-04-27 16:25 ` [PATCH 3/3] bus/dpaa: " Shreyansh Jain
2018-04-27 17:20 ` [PATCH v2 0/3] Optimization for DPAA/DPAA2 for PA/VA Addressing Shreyansh Jain
2018-04-27 17:20   ` [PATCH v2 1/3] crypto/dpaa_sec: remove ctx based offset for PA-VA conversion Shreyansh Jain
2018-04-27 17:20   ` [PATCH v2 2/3] bus/fslmc: optimize physical to virtual address searching Shreyansh Jain
2018-04-27 18:49     ` Thomas Monjalon
2018-04-27 19:24       ` Thomas Monjalon
2018-04-27 17:20   ` [PATCH v2 3/3] bus/dpaa: " Shreyansh Jain
2018-04-27 19:32     ` Thomas Monjalon
2018-04-27 19:38   ` [PATCH v2 0/3] Optimization for DPAA/DPAA2 for PA/VA Addressing Thomas Monjalon
