* [dpdk-dev] [PATCH] mempool/octeontx2: optimize for L1D cache architecture
From: pbhagavatula @ 2020-01-31 12:50 UTC
  To: Jerin Jacob, Nithin Dabilpuram, Vamsi Attunuru; +Cc: dev, Pavan Nikhilesh

From: Pavan Nikhilesh <pbhagavatula@marvell.com>

OCTEON TX2 has an 8-set, 41-way L1D cache; VA<9:7> bits select
the set.
Add padding so that each element always occupies an odd number
of cache lines, which distributes elements evenly among the
L1D cache sets.

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Signed-off-by: Vamsi Attunuru <vattunuru@marvell.com>
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
---
 drivers/mempool/octeontx2/otx2_mempool_ops.c | 41 ++++++++++++++++++++
 1 file changed, 41 insertions(+)

diff --git a/drivers/mempool/octeontx2/otx2_mempool_ops.c b/drivers/mempool/octeontx2/otx2_mempool_ops.c
index ea4b1c45d..bd71babdb 100644
--- a/drivers/mempool/octeontx2/otx2_mempool_ops.c
+++ b/drivers/mempool/octeontx2/otx2_mempool_ops.c
@@ -641,6 +641,7 @@ otx2_npa_alloc(struct rte_mempool *mp)
 	struct npa_aura_s aura;
 	struct npa_pool_s pool;
 	uint64_t aura_handle;
+	size_t padding;
 	int rc;
 
 	lf = otx2_npa_lf_obj_get();
@@ -650,6 +651,18 @@ otx2_npa_alloc(struct rte_mempool *mp)
 	}
 
 	block_size = mp->elt_size + mp->header_size + mp->trailer_size;
+	/*
+	 * OCTEON TX2 has 8 sets, 41 ways L1D cache, VA<9:7> bits dictate
+	 * the set selection.
+	 * Add additional padding to ensure that the element size always
+	 * occupies odd number of cachelines to ensure even distribution
+	 * of elements among L1D cache sets.
+	 */
+	padding = ((block_size / RTE_CACHE_LINE_SIZE) % 2) ? 0 :
+				RTE_CACHE_LINE_SIZE;
+	mp->trailer_size += padding;
+	block_size += padding;
+
 	block_count = mp->size;
 
 	if (block_size % OTX2_ALIGN != 0) {
@@ -724,12 +737,22 @@ otx2_npa_calc_mem_size(const struct rte_mempool *mp, uint32_t obj_num,
 						align);
 }
 
+static uint8_t
+otx2_npa_l1d_way_set_get(uint64_t iova)
+{
+	return (iova >> rte_log2_u32(RTE_CACHE_LINE_SIZE)) & 0x7;
+}
+
 static int
 otx2_npa_populate(struct rte_mempool *mp, unsigned int max_objs, void *vaddr,
 		  rte_iova_t iova, size_t len,
 		  rte_mempool_populate_obj_cb_t *obj_cb, void *obj_cb_arg)
 {
+#define OTX2_L1D_NB_SETS	8
+	uint64_t distribution[OTX2_L1D_NB_SETS];
+	rte_iova_t start_iova;
 	size_t total_elt_sz;
+	uint8_t set;
 	size_t off;
 
 	if (iova == RTE_BAD_IOVA)
@@ -743,10 +766,28 @@ otx2_npa_populate(struct rte_mempool *mp, unsigned int max_objs, void *vaddr,
 	if (len < off)
 		return -EINVAL;
 
+
 	vaddr = (char *)vaddr + off;
 	iova += off;
 	len -= off;
 
+	memset(distribution, 0, sizeof(uint64_t) * OTX2_L1D_NB_SETS);
+	start_iova = iova;
+	while (start_iova < iova + len) {
+		set = otx2_npa_l1d_way_set_get(start_iova + mp->header_size);
+		distribution[set]++;
+		start_iova += total_elt_sz;
+	}
+
+	otx2_npa_dbg("iova %lx, aligned iova %lx", iova - off, iova);
+	otx2_npa_dbg("length %ld, aligned length %ld", len + off, len);
+	otx2_npa_dbg("element size %ld", total_elt_sz);
+	otx2_npa_dbg("requested objects %d, possible objects %ld", max_objs,
+			len / total_elt_sz);
+	otx2_npa_dbg("L1D set distribution :");
+	for (int i = 0; i < OTX2_L1D_NB_SETS; i++)
+		otx2_npa_dbg("set[%d] : objects : %ld", i, distribution[i]);
+
 	npa_lf_aura_op_range_set(mp->pool_id, iova, iova + len);
 
 	if (npa_lf_aura_range_update_check(mp->pool_id) < 0)
-- 
2.17.1
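
For readers following the thread, the padding rule added to otx2_npa_alloc() can be sketched in isolation as below. This is a minimal illustration, not the driver code: the sketch_ names and the hard-coded 128-byte line size are assumptions (128 bytes is what VA<9:7> selecting among 8 sets implies), and block_size is assumed to already be a multiple of the cache line size.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 128-byte cache lines, consistent with VA<9:7> picking the set. */
#define SKETCH_CACHE_LINE_SIZE 128u

/*
 * Return the number of padding bytes that makes block_size span an odd
 * number of cache lines: one extra line if the count is even, else none.
 */
static size_t
sketch_odd_line_padding(size_t block_size)
{
	/* Assumption of this sketch: the block is cache-line aligned in size. */
	assert(block_size % SKETCH_CACHE_LINE_SIZE == 0);

	return ((block_size / SKETCH_CACHE_LINE_SIZE) % 2) ?
		0 : SKETCH_CACHE_LINE_SIZE;
}
```

Under these assumptions a 256-byte block (2 lines) gets 128 bytes of padding and becomes 3 lines, while a 384-byte block (3 lines) is left unchanged.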



* Re: [dpdk-dev] [PATCH] mempool/octeontx2: optimize for L1D cache architecture
From: Jerin Jacob @ 2020-01-31 15:55 UTC
  To: Pavan Nikhilesh; +Cc: Jerin Jacob, Nithin Dabilpuram, Vamsi Attunuru, dpdk-dev

On Fri, Jan 31, 2020 at 6:21 PM <pbhagavatula@marvell.com> wrote:
>
> From: Pavan Nikhilesh <pbhagavatula@marvell.com>
>
> OCTEON TX2 has 8 sets, 41 ways L1D cache, VA<9:7> bits dictate
> the set selection.
> Add additional padding to ensure that the element size always
> occupies odd number of cachelines to ensure even distribution
> of elements among L1D cache sets.
>
> Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
> Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
> Signed-off-by: Vamsi Attunuru <vattunuru@marvell.com>
> Signed-off-by: Jerin Jacob <jerinj@marvell.com>
> ---
>  drivers/mempool/octeontx2/otx2_mempool_ops.c | 41 ++++++++++++++++++++
>  1 file changed, 41 insertions(+)


Please fix the 32-bit build issues (i.e., use PRIx64).

/export/dpdk.org/drivers/mempool/octeontx2/otx2_mempool_ops.c:782:41:
note: format string is defined here
  782 |  otx2_npa_dbg("iova %lx, aligned iova %lx", iova - off, iova);
      |                                       ~~^
      |                                         |
      |                                         long unsigned int
      |                                       %llx
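
The warning above arises because %lx expects an unsigned long, which is 32 bits wide on 32-bit targets, while the arguments are 64-bit. A minimal sketch of the portable pattern follows; it uses only the standard <inttypes.h> macros and snprintf(), and the function name sketch_format_iova is hypothetical rather than part of the driver.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format a 64-bit address and a size_t length without assuming the width
 * of long: PRIx64/PRIu64 expand to the right conversion for uint64_t, and
 * the size_t value is cast explicitly to uint64_t to match PRIu64.
 */
static int
sketch_format_iova(char *buf, size_t buf_len, uint64_t iova, size_t len)
{
	assert(buf != NULL && buf_len > 0);

	return snprintf(buf, buf_len, "iova %" PRIx64 ", length %" PRIu64,
			iova, (uint64_t)len);
}
```

The explicit cast matters for size_t arguments, since size_t is 32 bits on the targets where the original format strings broke.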


* [dpdk-dev] [PATCH v2] mempool/octeontx2: optimize for L1D cache architecture
From: pbhagavatula @ 2020-01-31 17:23 UTC
  To: Jerin Jacob, Nithin Dabilpuram, Vamsi Attunuru; +Cc: dev, Pavan Nikhilesh

From: Pavan Nikhilesh <pbhagavatula@marvell.com>

OCTEON TX2 has an 8-set, 41-way L1D cache; VA<9:7> bits select
the set.
Add padding so that each element always occupies an odd number
of cache lines, which distributes elements evenly among the
L1D cache sets.

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Signed-off-by: Vamsi Attunuru <vattunuru@marvell.com>
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
---
 v2 Changes:
 ----------
 - Fix 32bit build break.

 drivers/mempool/octeontx2/otx2_mempool_ops.c | 44 ++++++++++++++++++++
 1 file changed, 44 insertions(+)

diff --git a/drivers/mempool/octeontx2/otx2_mempool_ops.c b/drivers/mempool/octeontx2/otx2_mempool_ops.c
index ea4b1c45d..6b554471d 100644
--- a/drivers/mempool/octeontx2/otx2_mempool_ops.c
+++ b/drivers/mempool/octeontx2/otx2_mempool_ops.c
@@ -641,6 +641,7 @@ otx2_npa_alloc(struct rte_mempool *mp)
 	struct npa_aura_s aura;
 	struct npa_pool_s pool;
 	uint64_t aura_handle;
+	size_t padding;
 	int rc;

 	lf = otx2_npa_lf_obj_get();
@@ -650,6 +651,18 @@ otx2_npa_alloc(struct rte_mempool *mp)
 	}

 	block_size = mp->elt_size + mp->header_size + mp->trailer_size;
+	/*
+	 * OCTEON TX2 has 8 sets, 41 ways L1D cache, VA<9:7> bits dictate
+	 * the set selection.
+	 * Add additional padding to ensure that the element size always
+	 * occupies odd number of cachelines to ensure even distribution
+	 * of elements among L1D cache sets.
+	 */
+	padding = ((block_size / RTE_CACHE_LINE_SIZE) % 2) ? 0 :
+				RTE_CACHE_LINE_SIZE;
+	mp->trailer_size += padding;
+	block_size += padding;
+
 	block_count = mp->size;

 	if (block_size % OTX2_ALIGN != 0) {
@@ -724,12 +737,22 @@ otx2_npa_calc_mem_size(const struct rte_mempool *mp, uint32_t obj_num,
 						align);
 }

+static uint8_t
+otx2_npa_l1d_way_set_get(uint64_t iova)
+{
+	return (iova >> rte_log2_u32(RTE_CACHE_LINE_SIZE)) & 0x7;
+}
+
 static int
 otx2_npa_populate(struct rte_mempool *mp, unsigned int max_objs, void *vaddr,
 		  rte_iova_t iova, size_t len,
 		  rte_mempool_populate_obj_cb_t *obj_cb, void *obj_cb_arg)
 {
+#define OTX2_L1D_NB_SETS	8
+	uint64_t distribution[OTX2_L1D_NB_SETS];
+	rte_iova_t start_iova;
 	size_t total_elt_sz;
+	uint8_t set;
 	size_t off;

 	if (iova == RTE_BAD_IOVA)
@@ -743,10 +766,31 @@ otx2_npa_populate(struct rte_mempool *mp, unsigned int max_objs, void *vaddr,
 	if (len < off)
 		return -EINVAL;

+
 	vaddr = (char *)vaddr + off;
 	iova += off;
 	len -= off;

+	memset(distribution, 0, sizeof(uint64_t) * OTX2_L1D_NB_SETS);
+	start_iova = iova;
+	while (start_iova < iova + len) {
+		set = otx2_npa_l1d_way_set_get(start_iova + mp->header_size);
+		distribution[set]++;
+		start_iova += total_elt_sz;
+	}
+
+	otx2_npa_dbg("iova %"PRIx64", aligned iova %"PRIx64"", iova - off,
+		     iova);
+	otx2_npa_dbg("length %"PRIu64", aligned length %"PRIu64"",
+		     (uint64_t)(len + off), (uint64_t)len);
+	otx2_npa_dbg("element size %"PRIu64"", (uint64_t)total_elt_sz);
+	otx2_npa_dbg("requested objects %"PRIu64", possible objects %"PRIu64"",
+		     (uint64_t)max_objs, (uint64_t)(len / total_elt_sz));
+	otx2_npa_dbg("L1D set distribution :");
+	for (int i = 0; i < OTX2_L1D_NB_SETS; i++)
+		otx2_npa_dbg("set[%d] : objects : %"PRIu64"", i,
+			     distribution[i]);
+
 	npa_lf_aura_op_range_set(mp->pool_id, iova, iova + len);

 	if (npa_lf_aura_range_update_check(mp->pool_id) < 0)
--
2.24.0
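
To see why an odd number of cache lines per element matters, the set-distribution count from otx2_npa_populate() can be reproduced standalone. The sketch below is an illustration under stated assumptions, not the driver code: the sketch_ names are hypothetical, the cache line shift is fixed at 7 (128-byte lines), and objects are counted by index rather than by walking an IOVA range.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_CACHE_LINE_SHIFT 7u	/* assumption: 128-byte cache lines */
#define SKETCH_L1D_NB_SETS      8u	/* 8 sets, selected by address bits <9:7> */

/* Extract the L1D set index, i.e. address bits <9:7>. */
static unsigned int
sketch_l1d_set(uint64_t addr)
{
	return (unsigned int)((addr >> SKETCH_CACHE_LINE_SHIFT) &
			      (SKETCH_L1D_NB_SETS - 1));
}

/* Count how many of nb_objs objects, laid out stride bytes apart, start in each set. */
static void
sketch_set_distribution(uint64_t base, uint64_t stride, unsigned int nb_objs,
			uint64_t distribution[SKETCH_L1D_NB_SETS])
{
	unsigned int i;

	assert(distribution != NULL);
	memset(distribution, 0, sizeof(uint64_t) * SKETCH_L1D_NB_SETS);
	for (i = 0; i < nb_objs; i++)
		distribution[sketch_l1d_set(base + (uint64_t)i * stride)]++;
}
```

With a 256-byte stride (2 lines), 64 objects land only in sets 0, 2, 4 and 6, 16 in each. With a 384-byte stride (3 lines), all eight sets receive 8 objects each, because an odd line count is coprime with the 8 sets and therefore cycles through every set.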



* Re: [dpdk-dev] [PATCH v2] mempool/octeontx2: optimize for L1D cache architecture
From: Jerin Jacob @ 2020-02-01 11:51 UTC
  To: Pavan Nikhilesh, Thomas Monjalon
  Cc: Jerin Jacob, Nithin Dabilpuram, Vamsi Attunuru, dpdk-dev

On Fri, Jan 31, 2020 at 10:53 PM <pbhagavatula@marvell.com> wrote:
>
> From: Pavan Nikhilesh <pbhagavatula@marvell.com>
>
> OCTEON TX2 has 8 sets, 41 ways L1D cache, VA<9:7> bits dictate
> the set selection.
> Add additional padding to ensure that the element size always
> occupies odd number of cachelines to ensure even distribution
> of elements among L1D cache sets.
>
> Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
> Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
> Signed-off-by: Vamsi Attunuru <vattunuru@marvell.com>
> Signed-off-by: Jerin Jacob <jerinj@marvell.com>
> ---
>  v2 Changes:
>  ----------
>  - Fix 32bit build break.


Acked-by: Jerin Jacob <jerinj@marvell.com>

Delegated the patch to Thomas as it has come through the main tree.


* Re: [dpdk-dev] [PATCH v2] mempool/octeontx2: optimize for L1D cache architecture
From: Thomas Monjalon @ 2020-02-16 20:33 UTC
  To: Pavan Nikhilesh
  Cc: dev, Jerin Jacob, Nithin Dabilpuram, Vamsi Attunuru, Jerin Jacob

01/02/2020 12:51, Jerin Jacob:
> On Fri, Jan 31, 2020 at 10:53 PM <pbhagavatula@marvell.com> wrote:
> >
> > From: Pavan Nikhilesh <pbhagavatula@marvell.com>
> >
> > OCTEON TX2 has 8 sets, 41 ways L1D cache, VA<9:7> bits dictate
> > the set selection.
> > Add additional padding to ensure that the element size always
> > occupies odd number of cachelines to ensure even distribution
> > of elements among L1D cache sets.
> >
> > Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
> > Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
> > Signed-off-by: Vamsi Attunuru <vattunuru@marvell.com>
> > Signed-off-by: Jerin Jacob <jerinj@marvell.com>
> > ---
> >  v2 Changes:
> >  ----------
> >  - Fix 32bit build break.
> 
> Acked-by: Jerin Jacob <jerinj@marvell.com>
> 
> Delegated the patch to Thomas as it has come through the main tree.

Applied, thanks