* [pull request][net-next 00/15] mlx5 SW steering updates 2020-09-25
@ 2020-09-25 19:37 saeed
  2020-09-25 19:37 ` [net-next 01/15] net/mlx5: DR, Add buddy allocator utilities saeed
                   ` (14 more replies)
  0 siblings, 15 replies; 23+ messages in thread
From: saeed @ 2020-09-25 19:37 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski; +Cc: netdev, Saeed Mahameed

From: Saeed Mahameed <saeedm@nvidia.com>

Hi Dave/Jakub,

This series provides updates to mlx5 software steering.
For more information, please see the tag log below.

Please pull and let me know if there is any problem.

Thanks,
Saeed.

---
The following changes since commit aafe8853f5e2bcbdd231411aec218b8c5dc78437:

  Merge branch 'hns3-next' (2020-09-24 20:19:25 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git tags/mlx5-updates-2020-09-25

for you to fetch changes up to d2bbdbfbdbbeed34a15228990654f4d796fb6323:

  net/mlx5: DR, Add support for rule creation with flow source hint (2020-09-25 12:35:45 -0700)

----------------------------------------------------------------
mlx5-updates-2020-09-25

This series includes updates to the mlx5 software steering component.

1) DR, Memory management improvements

This patch series contains SW Steering memory management improvements:
replacing the existing bucket allocator with a buddy allocator, along
with several other optimizations.

The buddy system is a memory allocation and management algorithm
that manages memory in power of two increments.

The algorithm is well known and widely described, for example here:
https://en.wikipedia.org/wiki/Buddy_memory_allocation

Linux uses this algorithm for managing and allocating physical pages,
as described here:
https://www.kernel.org/doc/gorman/html/understand/understand009.html

In our case, although the algorithm is in principle similar to the
Linux physical page allocator, the "building blocks" and the circumstances
are different: in SW steering, the buddy allocator doesn't actually
allocate memory, but rather manages ICM (Interconnect Context Memory)
that was previously allocated and registered.

The ICM memory that is used in SW steering always comes in power-of-two
sizes (orders), so the buddy system is a good fit for it.
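
As a quick illustration of the index arithmetic this series relies on
(a toy sketch, not the driver code): a segment "seg" at order "o" covers
blocks [seg << o, (seg + 1) << o), and its buddy at the same order is
simply seg ^ 1. Splitting during allocation looks like this:

#include <stdio.h>

int main(void)
{
	unsigned int seg = 0, order = 4;	/* one free 16-block area */

	/* split down to order 2 (4 blocks), as the allocator does */
	while (order > 2) {
		order--;
		seg <<= 1;
		printf("order %u: keep seg %u, free its buddy seg %u\n",
		       order, seg, seg ^ 1);
	}
	printf("allocated block index: %u\n", seg << order);
	return 0;
}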

Patches in this series (a short usage sketch of the buddy API follows
the list):

[PATCH 1/5] net/mlx5: DR, Add buddy allocator utilities
  This patch adds a modified implementation of the well-known buddy allocator,
  adjusted for SW steering needs: the algorithm is in principle similar to
  the Linux physical page allocator, but in our case the buddy allocator
  doesn't actually allocate memory; it manages ICM memory that was previously
  allocated and registered.

[PATCH 2/5] net/mlx5: DR, Handle ICM memory via buddy allocation instead of bucket management
  This patch changes the ICM management of SW steering to use the buddy-system
  mechanism instead of the previous bucket management.

[PATCH 3/5] net/mlx5: DR, Sync chunks only during free
  This patch makes syncing happen only when freeing memory chunks.

[PATCH 4/5] net/mlx5: DR, ICM memory pools sync optimization
  This patch adds tracking of the pool's "hot" memory and makes the
  check of whether a steering sync is required much shorter and faster.

[PATCH 5/5] net/mlx5: DR, Free buddy ICM memory if it is unused
  This patch adds tracking of the buddy's used ICM memory,
  and frees the buddy if all of its memory becomes unused.
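
For orientation, a minimal usage sketch of the buddy API added in
patch 1 (a hypothetical stand-alone fragment, assuming the
mlx5dr_buddy_* declarations are in scope; in the driver the buddy is
embedded in the ICM pool, and the returned segment is translated into
an offset within pre-registered ICM memory):

	struct mlx5dr_icm_buddy_mem buddy = {};
	unsigned int seg;
	int err;

	err = mlx5dr_buddy_init(&buddy, 7);	/* manage 2^7 = 128 blocks */
	if (err)
		return err;

	if (!mlx5dr_buddy_alloc_mem(&buddy, 2, &seg)) {
		/* seg is the first block index of a 2^2 = 4 block segment */
		mlx5dr_buddy_free_mem(&buddy, seg, 2);
	}

	mlx5dr_buddy_cleanup(&buddy);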

2) A few improvements in the DR area, such as removing unneeded checks,
  renaming for clearer, more general names, refactoring in some places, etc.

3) Create SW steering rules with a flow source hint, which is needed to
 determine the rule direction (TX/RX) from the hint rather than from the
 metadata in reg_c0.

----------------------------------------------------------------
Hamdan Igbaria (1):
      net/mlx5: DR, Add support for rule creation with flow source hint

Yevgeny Kliteynik (13):
      net/mlx5: DR, Add buddy allocator utilities
      net/mlx5: DR, Handle ICM memory via buddy allocation instead of bucket management
      net/mlx5: DR, Sync chunks only during free
      net/mlx5: DR, ICM memory pools sync optimization
      net/mlx5: DR, Free buddy ICM memory if it is unused
      net/mlx5: DR, Replace the check for valid STE entry
      net/mlx5: DR, Remove unneeded check from source port builder
      net/mlx5: DR, Remove unneeded vlan check from L2 builder
      net/mlx5: DR, Remove unneeded local variable
      net/mlx5: DR, Call ste_builder directly with tag pointer
      net/mlx5: DR, Remove unused member of action struct
      net/mlx5: DR, Rename builders HW specific names
      net/mlx5: DR, Rename matcher functions to be more HW agnostic

sunils (1):
      net/mlx5: E-switch, Use PF num in metadata reg c0

 drivers/net/ethernet/mellanox/mlx5/core/Makefile   |   2 +-
 .../ethernet/mellanox/mlx5/core/eswitch_offloads.c |  36 +-
 .../mellanox/mlx5/core/steering/dr_buddy.c         | 255 +++++++++++
 .../ethernet/mellanox/mlx5/core/steering/dr_cmd.c  |   4 +-
 .../mellanox/mlx5/core/steering/dr_icm_pool.c      | 501 ++++++++-------------
 .../mellanox/mlx5/core/steering/dr_matcher.c       | 123 ++---
 .../ethernet/mellanox/mlx5/core/steering/dr_rule.c |  47 +-
 .../ethernet/mellanox/mlx5/core/steering/dr_send.c |   4 +-
 .../ethernet/mellanox/mlx5/core/steering/dr_ste.c  | 223 +++------
 .../mellanox/mlx5/core/steering/dr_types.h         | 101 +++--
 .../ethernet/mellanox/mlx5/core/steering/fs_dr.c   |   3 +-
 .../ethernet/mellanox/mlx5/core/steering/mlx5dr.h  |  36 +-
 include/linux/mlx5/eswitch.h                       |  15 +-
 13 files changed, 728 insertions(+), 622 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/steering/dr_buddy.c


* [net-next 01/15] net/mlx5: DR, Add buddy allocator utilities
  2020-09-25 19:37 [pull request][net-next 00/15] mlx5 SW steering updates 2020-09-25 saeed
@ 2020-09-25 19:37 ` saeed
  2020-09-26 22:15   ` David Miller
  2020-09-25 19:37 ` [net-next 02/15] net/mlx5: DR, Handle ICM memory via buddy allocation instead of bucket management saeed
                   ` (13 subsequent siblings)
  14 siblings, 1 reply; 23+ messages in thread
From: saeed @ 2020-09-25 19:37 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski
  Cc: netdev, Yevgeny Kliteynik, Erez Shitrit, Mark Bloch, Saeed Mahameed

From: Yevgeny Kliteynik <kliteyn@nvidia.com>

Add an implementation of the SW Steering variation of a buddy allocator.

The buddy system for ICM memory uses two main data structures:
  - A bitmap per order, which keeps the current state of allocated
    blocks for that order
  - A count of the available blocks per order

In addition, there is one more level of hierarchy for searching the
bitmaps, to accelerate finding the next free block via a find-first
function:
The buddy system spends much of its time searching for the next free
space using find_first_bit, which scans a big array of long values to
find the first set bit. We add one more array of longs, where each bit
indicates whether the corresponding long in the original array contains
any set bit; that way far fewer words need to be scanned to find the
next free area.

For example, for the following 128-bit bits array, where all
bits are zero except for the last one:    0000........00001
the corresponding bits-per-long array is: 0001

The search is done over the bits-per-long array first; once a set bit
is found there, it is used as the start position for the search in the
bigger array (the bits array).
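
The same two-level lookup can be seen in a stand-alone C sketch
(toy code, not the driver implementation; here with 64-bit longs, so
the 128-bit array spans two longs, the summary bitmap is hand-rolled,
and __builtin_ctzll stands in for find_first_bit; only bit 127 is set):

#include <stdint.h>
#include <stdio.h>

#define BITS_PER_LONG 64

int main(void)
{
	/* the 128-bit example above: only the last bit (127) is set */
	uint64_t bits[2] = { 0, 1ULL << 63 };
	/* one summary bit per long of bits[]: only long #1 is non-empty */
	uint64_t summary = 1ULL << 1;
	unsigned int word, bit;

	/* level 1: locate the first non-empty long without scanning bits[] */
	word = __builtin_ctzll(summary);
	/* level 2: search only inside that single long */
	bit = word * BITS_PER_LONG + __builtin_ctzll(bits[word]);

	printf("first set bit: %u\n", bit);	/* prints 127 */
	return 0;
}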

Signed-off-by: Erez Shitrit <erezsh@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../net/ethernet/mellanox/mlx5/core/Makefile  |   2 +-
 .../mellanox/mlx5/core/steering/dr_buddy.c    | 255 ++++++++++++++++++
 .../mellanox/mlx5/core/steering/mlx5dr.h      |  34 +++
 3 files changed, 290 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/steering/dr_buddy.c

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index 9826a041e407..132786e853aa 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -80,7 +80,7 @@ mlx5_core-$(CONFIG_MLX5_EN_TLS) += en_accel/tls.o en_accel/tls_rxtx.o en_accel/t
 
 mlx5_core-$(CONFIG_MLX5_SW_STEERING) += steering/dr_domain.o steering/dr_table.o \
 					steering/dr_matcher.o steering/dr_rule.o \
-					steering/dr_icm_pool.o \
+					steering/dr_icm_pool.o steering/dr_buddy.o \
 					steering/dr_ste.o steering/dr_send.o \
 					steering/dr_cmd.o steering/dr_fw.o \
 					steering/dr_action.o steering/fs_dr.o
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_buddy.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_buddy.c
new file mode 100644
index 000000000000..9cabf968ac11
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_buddy.c
@@ -0,0 +1,255 @@
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+/* Copyright (c) 2004 Topspin Communications. All rights reserved.
+ * Copyright (c) 2005 - 2008 Mellanox Technologies. All rights reserved.
+ * Copyright (c) 2006 - 2007 Cisco Systems, Inc. All rights reserved.
+ * Copyright (c) 2020 NVIDIA CORPORATION. All rights reserved.
+ */
+
+#include "dr_types.h"
+
+static unsigned long dr_find_first_bit(const unsigned long *bitmap_per_long,
+				       const unsigned long *bitmap,
+				       unsigned long size)
+{
+	unsigned int bit_per_long_size = DIV_ROUND_UP(size, BITS_PER_LONG);
+	unsigned int bitmap_idx;
+
+	/* find the first free in the first level */
+	bitmap_idx = find_first_bit(bitmap_per_long, bit_per_long_size);
+	/* find the next level */
+	return find_next_bit(bitmap, size, bitmap_idx * BITS_PER_LONG);
+}
+
+int mlx5dr_buddy_init(struct mlx5dr_icm_buddy_mem *buddy,
+		      unsigned int max_order)
+{
+	int i;
+
+	buddy->max_order = max_order;
+
+	INIT_LIST_HEAD(&buddy->list_node);
+	INIT_LIST_HEAD(&buddy->used_list);
+	INIT_LIST_HEAD(&buddy->hot_list);
+
+	buddy->bitmap = kcalloc(buddy->max_order + 1,
+				sizeof(*buddy->bitmap),
+				GFP_KERNEL);
+	buddy->num_free = kcalloc(buddy->max_order + 1,
+				  sizeof(*buddy->num_free),
+				  GFP_KERNEL);
+	buddy->bitmap_per_long = kcalloc(buddy->max_order + 1,
+					 sizeof(*buddy->bitmap_per_long),
+					 GFP_KERNEL);
+
+	if (!buddy->bitmap || !buddy->num_free || !buddy->bitmap_per_long)
+		goto err_free_all;
+
+	/* Allocating max_order + 1 bitmaps, one per order (0..max_order) */
+
+	for (i = 0; i <= buddy->max_order; ++i) {
+		unsigned int size = 1 << (buddy->max_order - i);
+
+		buddy->bitmap[i] = bitmap_zalloc(size, GFP_KERNEL);
+		if (!buddy->bitmap[i])
+			goto err_out_free_each_bit_per_order;
+	}
+
+	for (i = 0; i <= buddy->max_order; ++i) {
+		unsigned int size = BITS_TO_LONGS(1 << (buddy->max_order - i));
+
+		buddy->bitmap_per_long[i] = bitmap_zalloc(size, GFP_KERNEL);
+		if (!buddy->bitmap_per_long[i])
+			goto err_out_free_set;
+	}
+
+	/* In the beginning, we have only one order that is available for
+	 * use (the biggest one), so mark the first bit in both bitmaps.
+	 */
+
+	bitmap_set(buddy->bitmap[buddy->max_order], 0, 1);
+	bitmap_set(buddy->bitmap_per_long[buddy->max_order], 0, 1);
+
+	buddy->num_free[buddy->max_order] = 1;
+
+	return 0;
+
+err_out_free_set:
+	for (i = 0; i <= buddy->max_order; ++i)
+		bitmap_free(buddy->bitmap_per_long[i]);
+
+err_out_free_each_bit_per_order:
+	for (i = 0; i <= buddy->max_order; ++i)
+		bitmap_free(buddy->bitmap[i]);
+
+	/* bitmap_free() and kfree() ignore entries that were never
+	 * allocated, so every failure point unwinds safely here */
+err_free_all:
+	kfree(buddy->bitmap_per_long);
+	kfree(buddy->num_free);
+	kfree(buddy->bitmap);
+	return -ENOMEM;
+}
+
+void mlx5dr_buddy_cleanup(struct mlx5dr_icm_buddy_mem *buddy)
+{
+	int i;
+
+	list_del(&buddy->list_node);
+
+	for (i = 0; i <= buddy->max_order; ++i) {
+		bitmap_free(buddy->bitmap[i]);
+		bitmap_free(buddy->bitmap_per_long[i]);
+	}
+
+	kfree(buddy->bitmap_per_long);
+	kfree(buddy->num_free);
+	kfree(buddy->bitmap);
+}
+
+/**
+ * dr_buddy_get_seg_borders() - Find the borders of a specific segment.
+ * @seg: Segment number.
+ * @low: Pointer to hold the low border of the provided segment.
+ * @high: Pointer to hold the high border of the provided segment.
+ *
+ * Find the borders (high and low) of a specific seg (segment location)
+ * in the lower level of the bitmap, in order to mark the upper layer
+ * of the bitmap.
+ */
+static void dr_buddy_get_seg_borders(unsigned int seg,
+				     unsigned int *low,
+				     unsigned int *high)
+{
+	*low = (seg / BITS_PER_LONG) * BITS_PER_LONG;
+	*high = ((seg / BITS_PER_LONG) + 1) * BITS_PER_LONG;
+}
+
+/**
+ * dr_buddy_update_upper_bitmap() - Update second level bitmap.
+ * @buddy: Buddy to update.
+ * @seg: Segment number.
+ * @order: Order of the buddy to update.
+ *
+ * We have two layers of search in the bitmaps, so when
+ * needed, update the second search layer.
+ */
+static void dr_buddy_update_upper_bitmap(struct mlx5dr_icm_buddy_mem *buddy,
+					 unsigned long seg,
+					 unsigned int order)
+{
+	unsigned int h, l, m;
+
+	/* clear upper layer of search if needed */
+	dr_buddy_get_seg_borders(seg, &l, &h);
+	m = find_next_bit(buddy->bitmap[order], h, l);
+	if (m == h) /* nothing in the long that includes seg */
+		bitmap_clear(buddy->bitmap_per_long[order],
+			     seg / BITS_PER_LONG, 1);
+}
+
+static int dr_buddy_find_free_seg(struct mlx5dr_icm_buddy_mem *buddy,
+				  unsigned int start_order,
+				  unsigned int *segment,
+				  unsigned int *order)
+{
+	unsigned int seg, order_iter, m;
+
+	for (order_iter = start_order;
+	     order_iter <= buddy->max_order; ++order_iter) {
+		if (!buddy->num_free[order_iter])
+			continue;
+
+		m = 1 << (buddy->max_order - order_iter);
+		seg = dr_find_first_bit(buddy->bitmap_per_long[order_iter],
+					buddy->bitmap[order_iter], m);
+
+		if (WARN(seg >= m,
+			 "ICM Buddy: failed finding free mem for order %d\n",
+			 order_iter))
+			return -ENOMEM;
+
+		break;
+	}
+
+	if (order_iter > buddy->max_order)
+		return -ENOMEM;
+
+	*segment = seg;
+	*order = order_iter;
+	return 0;
+}
+
+/**
+ * mlx5dr_buddy_alloc_mem() - Allocate a memory segment from the buddy.
+ * @buddy: Buddy to allocate from.
+ * @order: Order (log2 of the size in blocks) of the requested segment.
+ * @segment: Pointer to hold the allocated segment number.
+ *
+ * This function finds the first free area of the ICM memory managed by
+ * this buddy. It uses the data structures of the buddy system to locate
+ * the first free segment, starting from the requested order up to the
+ * maximum order in the system.
+ *
+ * Return: 0 when a segment is set, non-zero error status otherwise.
+ *
+ * The returned location (segment) is the index of the memory segment
+ * within the whole buddy ICM memory area that is now available for use.
+ */
+int mlx5dr_buddy_alloc_mem(struct mlx5dr_icm_buddy_mem *buddy,
+			   unsigned int order,
+			   unsigned int *segment)
+{
+	unsigned int seg, order_iter;
+	int err;
+
+	err = dr_buddy_find_free_seg(buddy, order, &seg, &order_iter);
+	if (err)
+		return err;
+
+	bitmap_clear(buddy->bitmap[order_iter], seg, 1);
+	/* clear upper layer of search if needed */
+	dr_buddy_update_upper_bitmap(buddy, seg, order_iter);
+	--buddy->num_free[order_iter];
+
+	/* If we found free memory in some order that is bigger than the
+	 * required order, we need to split every order between the required
+	 * order and the order that we found into two parts, and mark accordingly.
+	 */
+	while (order_iter > order) {
+		--order_iter;
+		seg <<= 1;
+		bitmap_set(buddy->bitmap[order_iter], seg ^ 1, 1);
+		bitmap_set(buddy->bitmap_per_long[order_iter],
+			   (seg ^ 1) / BITS_PER_LONG, 1);
+
+		++buddy->num_free[order_iter];
+	}
+
+	seg <<= order;
+	*segment = seg;
+
+	return 0;
+}
+
+void mlx5dr_buddy_free_mem(struct mlx5dr_icm_buddy_mem *buddy,
+			   unsigned int seg, unsigned int order)
+{
+	seg >>= order;
+
+	/* Whenever a segment is freed, merge it with its buddy,
+	 * climbing up orders as long as the buddy is also free.
+	 */
+	while (test_bit(seg ^ 1, buddy->bitmap[order])) {
+		bitmap_clear(buddy->bitmap[order], seg ^ 1, 1);
+		dr_buddy_update_upper_bitmap(buddy, seg ^ 1, order);
+		--buddy->num_free[order];
+		seg >>= 1;
+		++order;
+	}
+	bitmap_set(buddy->bitmap[order], seg, 1);
+	bitmap_set(buddy->bitmap_per_long[order],
+		   seg / BITS_PER_LONG, 1);
+
+	++buddy->num_free[order];
+}
+
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/mlx5dr.h b/drivers/net/ethernet/mellanox/mlx5/core/steering/mlx5dr.h
index 7deaca9ade3b..40bdbdc6e3f2 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/mlx5dr.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/mlx5dr.h
@@ -126,4 +126,38 @@ mlx5dr_is_supported(struct mlx5_core_dev *dev)
 	return MLX5_CAP_ESW_FLOWTABLE_FDB(dev, sw_owner);
 }
 
+/* buddy functions & structure */
+
+struct mlx5dr_icm_mr;
+
+struct mlx5dr_icm_buddy_mem {
+	unsigned long		**bitmap;
+	unsigned int		*num_free;
+	unsigned long		**bitmap_per_long;
+	u32			max_order;
+	struct list_head	list_node;
+	struct mlx5dr_icm_mr	*icm_mr;
+	struct mlx5dr_icm_pool	*pool;
+
+	/* This is the list of used chunks. HW may be accessing this memory */
+	struct list_head	used_list;
+
+	/* Hardware may be accessing this memory but at some future,
+	 * undetermined time, it might cease to do so.
+	 * sync_ste command sets them free.
+	 */
+	struct list_head	hot_list;
+	/* indicates the byte size of hot mem */
+	unsigned int		hot_memory_size;
+};
+
+int mlx5dr_buddy_init(struct mlx5dr_icm_buddy_mem *buddy,
+		      unsigned int max_order);
+void mlx5dr_buddy_cleanup(struct mlx5dr_icm_buddy_mem *buddy);
+int mlx5dr_buddy_alloc_mem(struct mlx5dr_icm_buddy_mem *buddy,
+			   unsigned int order,
+			   unsigned int *segment);
+void mlx5dr_buddy_free_mem(struct mlx5dr_icm_buddy_mem *buddy,
+			   unsigned int seg, unsigned int order);
+
 #endif /* _MLX5DR_H_ */
-- 
2.26.2



* [net-next 02/15] net/mlx5: DR, Handle ICM memory via buddy allocation instead of bucket management
  2020-09-25 19:37 [pull request][net-next 00/15] mlx5 SW steering updates 2020-09-25 saeed
  2020-09-25 19:37 ` [net-next 01/15] net/mlx5: DR, Add buddy allocator utilities saeed
@ 2020-09-25 19:37 ` saeed
  2020-09-25 19:37 ` [net-next 03/15] net/mlx5: DR, Sync chunks only during free saeed
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 23+ messages in thread
From: saeed @ 2020-09-25 19:37 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski
  Cc: netdev, Yevgeny Kliteynik, Erez Shitrit, Mark Bloch, Saeed Mahameed

From: Yevgeny Kliteynik <kliteyn@nvidia.com>

Until now, in order to manage the ICM memory we used a bucket
mechanism, which kept a bucket per chunk size (sizes ranged
from 1 block to 2^21 blocks).

Now change that to a buddy-system mechanism, which gives us a much
more flexible way to manage the ICM memory.
Its biggest advantage over the buckets is that the same ICM memory
area serves all block sizes, which reduces memory consumption: a
single max-order area can be split on demand into any mix of sizes,
instead of pinning a dedicated memory area per size.

Signed-off-by: Erez Shitrit <erezsh@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../mellanox/mlx5/core/steering/dr_icm_pool.c | 501 +++++++-----------
 .../mellanox/mlx5/core/steering/dr_types.h    |  24 +-
 2 files changed, 211 insertions(+), 314 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_icm_pool.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_icm_pool.c
index cc33515b9aba..2c5886b469f7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_icm_pool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_icm_pool.c
@@ -6,48 +6,13 @@
 #define DR_ICM_MODIFY_HDR_ALIGN_BASE 64
 #define DR_ICM_SYNC_THRESHOLD (64 * 1024 * 1024)
 
-struct mlx5dr_icm_pool;
-
-struct mlx5dr_icm_bucket {
-	struct mlx5dr_icm_pool *pool;
-
-	/* Chunks that aren't visible to HW not directly and not in cache */
-	struct list_head free_list;
-	unsigned int free_list_count;
-
-	/* Used chunks, HW may be accessing this memory */
-	struct list_head used_list;
-	unsigned int used_list_count;
-
-	/* HW may be accessing this memory but at some future,
-	 * undetermined time, it might cease to do so. Before deciding to call
-	 * sync_ste, this list is moved to sync_list
-	 */
-	struct list_head hot_list;
-	unsigned int hot_list_count;
-
-	/* Pending sync list, entries from the hot list are moved to this list.
-	 * sync_ste is executed and then sync_list is concatenated to the free list
-	 */
-	struct list_head sync_list;
-	unsigned int sync_list_count;
-
-	u32 total_chunks;
-	u32 num_of_entries;
-	u32 entry_size;
-	/* protect the ICM bucket */
-	struct mutex mutex;
-};
-
 struct mlx5dr_icm_pool {
-	struct mlx5dr_icm_bucket *buckets;
 	enum mlx5dr_icm_type icm_type;
 	enum mlx5dr_icm_chunk_size max_log_chunk_sz;
-	enum mlx5dr_icm_chunk_size num_of_buckets;
-	struct list_head icm_mr_list;
-	/* protect the ICM MR list */
-	struct mutex mr_mutex;
 	struct mlx5dr_domain *dmn;
+	/* memory management */
+	struct mutex mutex; /* protect the ICM pool and ICM buddy */
+	struct list_head buddy_mem_list;
 };
 
 struct mlx5dr_icm_dm {
@@ -58,13 +23,11 @@ struct mlx5dr_icm_dm {
 };
 
 struct mlx5dr_icm_mr {
-	struct mlx5dr_icm_pool *pool;
 	struct mlx5_core_mkey mkey;
 	struct mlx5dr_icm_dm dm;
-	size_t used_length;
+	struct mlx5dr_domain *dmn;
 	size_t length;
 	u64 icm_start_addr;
-	struct list_head mr_list;
 };
 
 static int dr_icm_create_dm_mkey(struct mlx5_core_dev *mdev,
@@ -107,8 +70,7 @@ dr_icm_pool_mr_create(struct mlx5dr_icm_pool *pool)
 	if (!icm_mr)
 		return NULL;
 
-	icm_mr->pool = pool;
-	INIT_LIST_HEAD(&icm_mr->mr_list);
+	icm_mr->dmn = pool->dmn;
 
 	icm_mr->dm.length = mlx5dr_icm_pool_chunk_size_to_byte(pool->max_log_chunk_sz,
 							       pool->icm_type);
@@ -150,8 +112,6 @@ dr_icm_pool_mr_create(struct mlx5dr_icm_pool *pool)
 		goto free_mkey;
 	}
 
-	list_add_tail(&icm_mr->mr_list, &pool->icm_mr_list);
-
 	return icm_mr;
 
 free_mkey:
@@ -166,10 +126,9 @@ dr_icm_pool_mr_create(struct mlx5dr_icm_pool *pool)
 
 static void dr_icm_pool_mr_destroy(struct mlx5dr_icm_mr *icm_mr)
 {
-	struct mlx5_core_dev *mdev = icm_mr->pool->dmn->mdev;
+	struct mlx5_core_dev *mdev = icm_mr->dmn->mdev;
 	struct mlx5dr_icm_dm *dm = &icm_mr->dm;
 
-	list_del(&icm_mr->mr_list);
 	mlx5_core_destroy_mkey(mdev, &icm_mr->mkey);
 	mlx5_dm_sw_icm_dealloc(mdev, dm->type, dm->length, 0,
 			       dm->addr, dm->obj_id);
@@ -178,19 +137,17 @@ static void dr_icm_pool_mr_destroy(struct mlx5dr_icm_mr *icm_mr)
 
 static int dr_icm_chunk_ste_init(struct mlx5dr_icm_chunk *chunk)
 {
-	struct mlx5dr_icm_bucket *bucket = chunk->bucket;
-
-	chunk->ste_arr = kvzalloc(bucket->num_of_entries *
+	chunk->ste_arr = kvzalloc(chunk->num_of_entries *
 				  sizeof(chunk->ste_arr[0]), GFP_KERNEL);
 	if (!chunk->ste_arr)
 		return -ENOMEM;
 
-	chunk->hw_ste_arr = kvzalloc(bucket->num_of_entries *
+	chunk->hw_ste_arr = kvzalloc(chunk->num_of_entries *
 				     DR_STE_SIZE_REDUCED, GFP_KERNEL);
 	if (!chunk->hw_ste_arr)
 		goto out_free_ste_arr;
 
-	chunk->miss_list = kvmalloc(bucket->num_of_entries *
+	chunk->miss_list = kvmalloc(chunk->num_of_entries *
 				    sizeof(chunk->miss_list[0]), GFP_KERNEL);
 	if (!chunk->miss_list)
 		goto out_free_hw_ste_arr;
@@ -204,72 +161,6 @@ static int dr_icm_chunk_ste_init(struct mlx5dr_icm_chunk *chunk)
 	return -ENOMEM;
 }
 
-static int dr_icm_chunks_create(struct mlx5dr_icm_bucket *bucket)
-{
-	size_t mr_free_size, mr_req_size, mr_row_size;
-	struct mlx5dr_icm_pool *pool = bucket->pool;
-	struct mlx5dr_icm_mr *icm_mr = NULL;
-	struct mlx5dr_icm_chunk *chunk;
-	int i, err = 0;
-
-	mr_req_size = bucket->num_of_entries * bucket->entry_size;
-	mr_row_size = mlx5dr_icm_pool_chunk_size_to_byte(pool->max_log_chunk_sz,
-							 pool->icm_type);
-	mutex_lock(&pool->mr_mutex);
-	if (!list_empty(&pool->icm_mr_list)) {
-		icm_mr = list_last_entry(&pool->icm_mr_list,
-					 struct mlx5dr_icm_mr, mr_list);
-
-		if (icm_mr)
-			mr_free_size = icm_mr->dm.length - icm_mr->used_length;
-	}
-
-	if (!icm_mr || mr_free_size < mr_row_size) {
-		icm_mr = dr_icm_pool_mr_create(pool);
-		if (!icm_mr) {
-			err = -ENOMEM;
-			goto out_err;
-		}
-	}
-
-	/* Create memory aligned chunks */
-	for (i = 0; i < mr_row_size / mr_req_size; i++) {
-		chunk = kvzalloc(sizeof(*chunk), GFP_KERNEL);
-		if (!chunk) {
-			err = -ENOMEM;
-			goto out_err;
-		}
-
-		chunk->bucket = bucket;
-		chunk->rkey = icm_mr->mkey.key;
-		/* mr start addr is zero based */
-		chunk->mr_addr = icm_mr->used_length;
-		chunk->icm_addr = (uintptr_t)icm_mr->icm_start_addr + icm_mr->used_length;
-		icm_mr->used_length += mr_req_size;
-		chunk->num_of_entries = bucket->num_of_entries;
-		chunk->byte_size = chunk->num_of_entries * bucket->entry_size;
-
-		if (pool->icm_type == DR_ICM_TYPE_STE) {
-			err = dr_icm_chunk_ste_init(chunk);
-			if (err)
-				goto out_free_chunk;
-		}
-
-		INIT_LIST_HEAD(&chunk->chunk_list);
-		list_add(&chunk->chunk_list, &bucket->free_list);
-		bucket->free_list_count++;
-		bucket->total_chunks++;
-	}
-	mutex_unlock(&pool->mr_mutex);
-	return 0;
-
-out_free_chunk:
-	kvfree(chunk);
-out_err:
-	mutex_unlock(&pool->mr_mutex);
-	return err;
-}
-
 static void dr_icm_chunk_ste_cleanup(struct mlx5dr_icm_chunk *chunk)
 {
 	kvfree(chunk->miss_list);
@@ -277,166 +168,208 @@ static void dr_icm_chunk_ste_cleanup(struct mlx5dr_icm_chunk *chunk)
 	kvfree(chunk->ste_arr);
 }
 
+static enum mlx5dr_icm_type
+get_chunk_icm_type(struct mlx5dr_icm_chunk *chunk)
+{
+	return chunk->buddy_mem->pool->icm_type;
+}
+
 static void dr_icm_chunk_destroy(struct mlx5dr_icm_chunk *chunk)
 {
-	struct mlx5dr_icm_bucket *bucket = chunk->bucket;
+	enum mlx5dr_icm_type icm_type = get_chunk_icm_type(chunk);
 
 	list_del(&chunk->chunk_list);
-	bucket->total_chunks--;
 
-	if (bucket->pool->icm_type == DR_ICM_TYPE_STE)
+	if (icm_type == DR_ICM_TYPE_STE)
 		dr_icm_chunk_ste_cleanup(chunk);
 
 	kvfree(chunk);
 }
 
-static void dr_icm_bucket_init(struct mlx5dr_icm_pool *pool,
-			       struct mlx5dr_icm_bucket *bucket,
-			       enum mlx5dr_icm_chunk_size chunk_size)
+static int dr_icm_buddy_create(struct mlx5dr_icm_pool *pool)
 {
-	if (pool->icm_type == DR_ICM_TYPE_STE)
-		bucket->entry_size = DR_STE_SIZE;
-	else
-		bucket->entry_size = DR_MODIFY_ACTION_SIZE;
-
-	bucket->num_of_entries = mlx5dr_icm_pool_chunk_size_to_entries(chunk_size);
-	bucket->pool = pool;
-	mutex_init(&bucket->mutex);
-	INIT_LIST_HEAD(&bucket->free_list);
-	INIT_LIST_HEAD(&bucket->used_list);
-	INIT_LIST_HEAD(&bucket->hot_list);
-	INIT_LIST_HEAD(&bucket->sync_list);
+	struct mlx5dr_icm_buddy_mem *buddy;
+	struct mlx5dr_icm_mr *icm_mr;
+
+	icm_mr = dr_icm_pool_mr_create(pool);
+	if (!icm_mr)
+		return -ENOMEM;
+
+	buddy = kvzalloc(sizeof(*buddy), GFP_KERNEL);
+	if (!buddy)
+		goto free_mr;
+
+	if (mlx5dr_buddy_init(buddy, pool->max_log_chunk_sz))
+		goto err_free_buddy;
+
+	buddy->icm_mr = icm_mr;
+	buddy->pool = pool;
+
+	/* add it to the -start- of the list in order to search in it first */
+	list_add(&buddy->list_node, &pool->buddy_mem_list);
+
+	return 0;
+
+err_free_buddy:
+	kvfree(buddy);
+free_mr:
+	dr_icm_pool_mr_destroy(icm_mr);
+	return -ENOMEM;
 }
 
-static void dr_icm_bucket_cleanup(struct mlx5dr_icm_bucket *bucket)
+static void dr_icm_buddy_destroy(struct mlx5dr_icm_buddy_mem *buddy)
 {
 	struct mlx5dr_icm_chunk *chunk, *next;
 
-	mutex_destroy(&bucket->mutex);
-	list_splice_tail_init(&bucket->sync_list, &bucket->free_list);
-	list_splice_tail_init(&bucket->hot_list, &bucket->free_list);
+	list_for_each_entry_safe(chunk, next, &buddy->hot_list, chunk_list)
+		dr_icm_chunk_destroy(chunk);
 
-	list_for_each_entry_safe(chunk, next, &bucket->free_list, chunk_list)
+	list_for_each_entry_safe(chunk, next, &buddy->used_list, chunk_list)
 		dr_icm_chunk_destroy(chunk);
 
-	WARN_ON(bucket->total_chunks != 0);
+	dr_icm_pool_mr_destroy(buddy->icm_mr);
 
-	/* Cleanup of unreturned chunks */
-	list_for_each_entry_safe(chunk, next, &bucket->used_list, chunk_list)
-		dr_icm_chunk_destroy(chunk);
+	mlx5dr_buddy_cleanup(buddy);
+
+	kvfree(buddy);
 }
 
-static u64 dr_icm_hot_mem_size(struct mlx5dr_icm_pool *pool)
+static struct mlx5dr_icm_chunk *
+dr_icm_chunk_create(struct mlx5dr_icm_pool *pool,
+		    enum mlx5dr_icm_chunk_size chunk_size,
+		    struct mlx5dr_icm_buddy_mem *buddy_mem_pool,
+		    unsigned int seg)
 {
-	u64 hot_size = 0;
-	int chunk_order;
+	struct mlx5dr_icm_chunk *chunk;
+	int offset;
 
-	for (chunk_order = 0; chunk_order < pool->num_of_buckets; chunk_order++)
-		hot_size += pool->buckets[chunk_order].hot_list_count *
-			    mlx5dr_icm_pool_chunk_size_to_byte(chunk_order, pool->icm_type);
+	chunk = kvzalloc(sizeof(*chunk), GFP_KERNEL);
+	if (!chunk)
+		return NULL;
 
-	return hot_size;
-}
+	offset = mlx5dr_icm_pool_dm_type_to_entry_size(pool->icm_type) * seg;
+
+	chunk->rkey = buddy_mem_pool->icm_mr->mkey.key;
+	chunk->mr_addr = offset;
+	chunk->icm_addr =
+		(uintptr_t)buddy_mem_pool->icm_mr->icm_start_addr + offset;
+	chunk->num_of_entries =
+		mlx5dr_icm_pool_chunk_size_to_entries(chunk_size);
+	chunk->byte_size =
+		mlx5dr_icm_pool_chunk_size_to_byte(chunk_size, pool->icm_type);
+	chunk->seg = seg;
+
+	if (pool->icm_type == DR_ICM_TYPE_STE && dr_icm_chunk_ste_init(chunk)) {
+		mlx5dr_err(pool->dmn,
+			   "Failed to init ste arrays (order: %d)\n",
+			   chunk_size);
+		goto out_free_chunk;
+	}
 
-static bool dr_icm_reuse_hot_entries(struct mlx5dr_icm_pool *pool,
-				     struct mlx5dr_icm_bucket *bucket)
-{
-	u64 bytes_for_sync;
+	chunk->buddy_mem = buddy_mem_pool;
+	INIT_LIST_HEAD(&chunk->chunk_list);
 
-	bytes_for_sync = dr_icm_hot_mem_size(pool);
-	if (bytes_for_sync < DR_ICM_SYNC_THRESHOLD || !bucket->hot_list_count)
-		return false;
+	/* chunk now is part of the used_list */
+	list_add_tail(&chunk->chunk_list, &buddy_mem_pool->used_list);
 
-	return true;
-}
+	return chunk;
 
-static void dr_icm_chill_bucket_start(struct mlx5dr_icm_bucket *bucket)
-{
-	list_splice_tail_init(&bucket->hot_list, &bucket->sync_list);
-	bucket->sync_list_count += bucket->hot_list_count;
-	bucket->hot_list_count = 0;
+out_free_chunk:
+	kvfree(chunk);
+	return NULL;
 }
 
-static void dr_icm_chill_bucket_end(struct mlx5dr_icm_bucket *bucket)
+static bool dr_icm_pool_is_sync_required(struct mlx5dr_icm_pool *pool)
 {
-	list_splice_tail_init(&bucket->sync_list, &bucket->free_list);
-	bucket->free_list_count += bucket->sync_list_count;
-	bucket->sync_list_count = 0;
-}
+	u64 allow_hot_size, all_hot_mem = 0;
+	struct mlx5dr_icm_buddy_mem *buddy;
+
+	list_for_each_entry(buddy, &pool->buddy_mem_list, list_node) {
+		allow_hot_size =
+			mlx5dr_icm_pool_chunk_size_to_byte((buddy->max_order - 2),
+							   pool->icm_type);
+		all_hot_mem += buddy->hot_memory_size;
+
+		if (buddy->hot_memory_size > allow_hot_size ||
+		    all_hot_mem > DR_ICM_SYNC_THRESHOLD)
+			return true;
+	}
 
-static void dr_icm_chill_bucket_abort(struct mlx5dr_icm_bucket *bucket)
-{
-	list_splice_tail_init(&bucket->sync_list, &bucket->hot_list);
-	bucket->hot_list_count += bucket->sync_list_count;
-	bucket->sync_list_count = 0;
+	return false;
 }
 
-static void dr_icm_chill_buckets_start(struct mlx5dr_icm_pool *pool,
-				       struct mlx5dr_icm_bucket *cb,
-				       bool buckets[DR_CHUNK_SIZE_MAX])
+static int dr_icm_pool_sync_all_buddy_pools(struct mlx5dr_icm_pool *pool)
 {
-	struct mlx5dr_icm_bucket *bucket;
-	int i;
-
-	for (i = 0; i < pool->num_of_buckets; i++) {
-		bucket = &pool->buckets[i];
-		if (bucket == cb) {
-			dr_icm_chill_bucket_start(bucket);
-			continue;
-		}
+	struct mlx5dr_icm_buddy_mem *buddy, *tmp_buddy;
+	int err;
 
-		/* Freeing the mutex is done at the end of that process, after
-		 * sync_ste was executed at dr_icm_chill_buckets_end func.
-		 */
-		if (mutex_trylock(&bucket->mutex)) {
-			dr_icm_chill_bucket_start(bucket);
-			buckets[i] = true;
-		}
+	err = mlx5dr_cmd_sync_steering(pool->dmn->mdev);
+	if (err) {
+		mlx5dr_err(pool->dmn, "Failed to sync to HW (err: %d)\n", err);
+		return err;
 	}
-}
-
-static void dr_icm_chill_buckets_end(struct mlx5dr_icm_pool *pool,
-				     struct mlx5dr_icm_bucket *cb,
-				     bool buckets[DR_CHUNK_SIZE_MAX])
-{
-	struct mlx5dr_icm_bucket *bucket;
-	int i;
-
-	for (i = 0; i < pool->num_of_buckets; i++) {
-		bucket = &pool->buckets[i];
-		if (bucket == cb) {
-			dr_icm_chill_bucket_end(bucket);
-			continue;
-		}
 
-		if (!buckets[i])
-			continue;
+	list_for_each_entry_safe(buddy, tmp_buddy, &pool->buddy_mem_list, list_node) {
+		struct mlx5dr_icm_chunk *chunk, *tmp_chunk;
 
-		dr_icm_chill_bucket_end(bucket);
-		mutex_unlock(&bucket->mutex);
+		list_for_each_entry_safe(chunk, tmp_chunk, &buddy->hot_list, chunk_list) {
+			mlx5dr_buddy_free_mem(buddy, chunk->seg,
+					      ilog2(chunk->num_of_entries));
+			buddy->hot_memory_size -= chunk->byte_size;
+			dr_icm_chunk_destroy(chunk);
+		}
 	}
+
+	return 0;
 }
 
-static void dr_icm_chill_buckets_abort(struct mlx5dr_icm_pool *pool,
-				       struct mlx5dr_icm_bucket *cb,
-				       bool buckets[DR_CHUNK_SIZE_MAX])
+static int dr_icm_handle_buddies_get_mem(struct mlx5dr_icm_pool *pool,
+					 enum mlx5dr_icm_chunk_size chunk_size,
+					 struct mlx5dr_icm_buddy_mem **buddy,
+					 unsigned int *seg)
 {
-	struct mlx5dr_icm_bucket *bucket;
-	int i;
-
-	for (i = 0; i < pool->num_of_buckets; i++) {
-		bucket = &pool->buckets[i];
-		if (bucket == cb) {
-			dr_icm_chill_bucket_abort(bucket);
-			continue;
-		}
+	struct mlx5dr_icm_buddy_mem *buddy_mem_pool;
+	bool new_mem = false;
+	int err;
 
-		if (!buckets[i])
-			continue;
+	/* Check if we have chunks that are waiting for sync-ste */
+	if (dr_icm_pool_is_sync_required(pool))
+		dr_icm_pool_sync_all_buddy_pools(pool);
+
+alloc_buddy_mem:
+	/* find the next free place from the buddy list */
+	list_for_each_entry(buddy_mem_pool, &pool->buddy_mem_list, list_node) {
+		err = mlx5dr_buddy_alloc_mem(buddy_mem_pool,
+					     chunk_size, seg);
+		if (!err)
+			goto found;
+
+		if (WARN_ON(new_mem)) {
+			/* We have new memory pool, first in the list */
+			mlx5dr_err(pool->dmn,
+				   "No memory for order: %d\n",
+				   chunk_size);
+			goto out;
+		}
+	}
 
-		dr_icm_chill_bucket_abort(bucket);
-		mutex_unlock(&bucket->mutex);
+	/* no more available allocators in that pool, create new */
+	err = dr_icm_buddy_create(pool);
+	if (err) {
+		mlx5dr_err(pool->dmn,
+			   "Failed creating buddy for order %d\n",
+			   chunk_size);
+		goto out;
 	}
+
+	/* mark we have new memory, first in list */
+	new_mem = true;
+	goto alloc_buddy_mem;
+
+found:
+	*buddy = buddy_mem_pool;
+out:
+	return err;
 }
 
 /* Allocate an ICM chunk, each chunk holds a piece of ICM memory and
@@ -446,68 +379,42 @@ struct mlx5dr_icm_chunk *
 mlx5dr_icm_alloc_chunk(struct mlx5dr_icm_pool *pool,
 		       enum mlx5dr_icm_chunk_size chunk_size)
 {
-	struct mlx5dr_icm_chunk *chunk = NULL; /* Fix compilation warning */
-	bool buckets[DR_CHUNK_SIZE_MAX] = {};
-	struct mlx5dr_icm_bucket *bucket;
-	int err;
+	struct mlx5dr_icm_chunk *chunk = NULL;
+	struct mlx5dr_icm_buddy_mem *buddy;
+	unsigned int seg;
+	int ret;
 
 	if (chunk_size > pool->max_log_chunk_sz)
 		return NULL;
 
-	bucket = &pool->buckets[chunk_size];
-
-	mutex_lock(&bucket->mutex);
-
-	/* Take chunk from pool if available, otherwise allocate new chunks */
-	if (list_empty(&bucket->free_list)) {
-		if (dr_icm_reuse_hot_entries(pool, bucket)) {
-			dr_icm_chill_buckets_start(pool, bucket, buckets);
-			err = mlx5dr_cmd_sync_steering(pool->dmn->mdev);
-			if (err) {
-				dr_icm_chill_buckets_abort(pool, bucket, buckets);
-				mlx5dr_err(pool->dmn, "Sync_steering failed\n");
-				chunk = NULL;
-				goto out;
-			}
-			dr_icm_chill_buckets_end(pool, bucket, buckets);
-		} else {
-			dr_icm_chunks_create(bucket);
-		}
-	}
+	mutex_lock(&pool->mutex);
+	/* find mem, get back the relevant buddy pool and seg in that mem */
+	ret = dr_icm_handle_buddies_get_mem(pool, chunk_size, &buddy, &seg);
+	if (ret)
+		goto out;
 
-	if (!list_empty(&bucket->free_list)) {
-		chunk = list_last_entry(&bucket->free_list,
-					struct mlx5dr_icm_chunk,
-					chunk_list);
-		if (chunk) {
-			list_del_init(&chunk->chunk_list);
-			list_add_tail(&chunk->chunk_list, &bucket->used_list);
-			bucket->free_list_count--;
-			bucket->used_list_count++;
-		}
-	}
+	chunk = dr_icm_chunk_create(pool, chunk_size, buddy, seg);
+	if (!chunk)
+		goto out_err;
+
+	goto out;
+
+out_err:
+	mlx5dr_buddy_free_mem(buddy, seg, chunk_size);
 out:
-	mutex_unlock(&bucket->mutex);
+	mutex_unlock(&pool->mutex);
 	return chunk;
 }
 
 void mlx5dr_icm_free_chunk(struct mlx5dr_icm_chunk *chunk)
 {
-	struct mlx5dr_icm_bucket *bucket = chunk->bucket;
-
-	if (bucket->pool->icm_type == DR_ICM_TYPE_STE) {
-		memset(chunk->ste_arr, 0,
-		       bucket->num_of_entries * sizeof(chunk->ste_arr[0]));
-		memset(chunk->hw_ste_arr, 0,
-		       bucket->num_of_entries * DR_STE_SIZE_REDUCED);
-	}
+	struct mlx5dr_icm_buddy_mem *buddy = chunk->buddy_mem;
 
-	mutex_lock(&bucket->mutex);
-	list_del_init(&chunk->chunk_list);
-	list_add_tail(&chunk->chunk_list, &bucket->hot_list);
-	bucket->hot_list_count++;
-	bucket->used_list_count--;
-	mutex_unlock(&bucket->mutex);
+	/* move the memory to the waiting list AKA "hot" */
+	mutex_lock(&buddy->pool->mutex);
+	list_move_tail(&chunk->chunk_list, &buddy->hot_list);
+	buddy->hot_memory_size += chunk->byte_size;
+	mutex_unlock(&buddy->pool->mutex);
 }
 
 struct mlx5dr_icm_pool *mlx5dr_icm_pool_create(struct mlx5dr_domain *dmn,
@@ -515,7 +422,6 @@ struct mlx5dr_icm_pool *mlx5dr_icm_pool_create(struct mlx5dr_domain *dmn,
 {
 	enum mlx5dr_icm_chunk_size max_log_chunk_sz;
 	struct mlx5dr_icm_pool *pool;
-	int i;
 
 	if (icm_type == DR_ICM_TYPE_STE)
 		max_log_chunk_sz = dmn->info.max_log_sw_icm_sz;
@@ -526,43 +432,24 @@ struct mlx5dr_icm_pool *mlx5dr_icm_pool_create(struct mlx5dr_domain *dmn,
 	if (!pool)
 		return NULL;
 
-	pool->buckets = kcalloc(max_log_chunk_sz + 1,
-				sizeof(pool->buckets[0]),
-				GFP_KERNEL);
-	if (!pool->buckets)
-		goto free_pool;
-
 	pool->dmn = dmn;
 	pool->icm_type = icm_type;
 	pool->max_log_chunk_sz = max_log_chunk_sz;
-	pool->num_of_buckets = max_log_chunk_sz + 1;
-	INIT_LIST_HEAD(&pool->icm_mr_list);
 
-	for (i = 0; i < pool->num_of_buckets; i++)
-		dr_icm_bucket_init(pool, &pool->buckets[i], i);
+	INIT_LIST_HEAD(&pool->buddy_mem_list);
 
-	mutex_init(&pool->mr_mutex);
+	mutex_init(&pool->mutex);
 
 	return pool;
-
-free_pool:
-	kvfree(pool);
-	return NULL;
 }
 
 void mlx5dr_icm_pool_destroy(struct mlx5dr_icm_pool *pool)
 {
-	struct mlx5dr_icm_mr *icm_mr, *next;
-	int i;
-
-	mutex_destroy(&pool->mr_mutex);
-
-	list_for_each_entry_safe(icm_mr, next, &pool->icm_mr_list, mr_list)
-		dr_icm_pool_mr_destroy(icm_mr);
+	struct mlx5dr_icm_buddy_mem *buddy, *tmp_buddy;
 
-	for (i = 0; i < pool->num_of_buckets; i++)
-		dr_icm_bucket_cleanup(&pool->buckets[i]);
+	list_for_each_entry_safe(buddy, tmp_buddy, &pool->buddy_mem_list, list_node)
+		dr_icm_buddy_destroy(buddy);
 
-	kfree(pool->buckets);
+	mutex_destroy(&pool->mutex);
 	kvfree(pool);
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_types.h b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_types.h
index 0883956c58c0..f71ca74f96fd 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_types.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_types.h
@@ -114,7 +114,7 @@ enum mlx5dr_ipv {
 
 struct mlx5dr_icm_pool;
 struct mlx5dr_icm_chunk;
-struct mlx5dr_icm_bucket;
+struct mlx5dr_icm_buddy_mem;
 struct mlx5dr_ste_htbl;
 struct mlx5dr_match_param;
 struct mlx5dr_cmd_caps;
@@ -799,7 +799,7 @@ void mlx5dr_rule_update_rule_member(struct mlx5dr_ste *new_ste,
 				    struct mlx5dr_ste *ste);
 
 struct mlx5dr_icm_chunk {
-	struct mlx5dr_icm_bucket *bucket;
+	struct mlx5dr_icm_buddy_mem *buddy_mem;
 	struct list_head chunk_list;
 	u32 rkey;
 	u32 num_of_entries;
@@ -807,6 +807,11 @@ struct mlx5dr_icm_chunk {
 	u64 icm_addr;
 	u64 mr_addr;
 
+	/* indicates the index of this chunk in the whole memory,
+	 * used for deleting the chunk from the buddy
+	 */
+	unsigned int seg;
+
 	/* Memory optimisation */
 	struct mlx5dr_ste *ste_arr;
 	u8 *hw_ste_arr;
@@ -852,6 +857,15 @@ int mlx5dr_matcher_select_builders(struct mlx5dr_matcher *matcher,
 				   enum mlx5dr_ipv outer_ipv,
 				   enum mlx5dr_ipv inner_ipv);
 
+static inline int
+mlx5dr_icm_pool_dm_type_to_entry_size(enum mlx5dr_icm_type icm_type)
+{
+	if (icm_type == DR_ICM_TYPE_STE)
+		return DR_STE_SIZE;
+
+	return DR_MODIFY_ACTION_SIZE;
+}
+
 static inline u32
 mlx5dr_icm_pool_chunk_size_to_entries(enum mlx5dr_icm_chunk_size chunk_size)
 {
@@ -865,11 +879,7 @@ mlx5dr_icm_pool_chunk_size_to_byte(enum mlx5dr_icm_chunk_size chunk_size,
 	int num_of_entries;
 	int entry_size;
 
-	if (icm_type == DR_ICM_TYPE_STE)
-		entry_size = DR_STE_SIZE;
-	else
-		entry_size = DR_MODIFY_ACTION_SIZE;
-
+	entry_size = mlx5dr_icm_pool_dm_type_to_entry_size(icm_type);
 	num_of_entries = mlx5dr_icm_pool_chunk_size_to_entries(chunk_size);
 
 	return entry_size * num_of_entries;
-- 
2.26.2



* [net-next 03/15] net/mlx5: DR, Sync chunks only during free
  2020-09-25 19:37 [pull request][net-next 00/15] mlx5 SW steering updates 2020-09-25 saeed
  2020-09-25 19:37 ` [net-next 01/15] net/mlx5: DR, Add buddy allocator utilities saeed
  2020-09-25 19:37 ` [net-next 02/15] net/mlx5: DR, Handle ICM memory via buddy allocation instead of bucket management saeed
@ 2020-09-25 19:37 ` saeed
  2020-09-25 19:37 ` [net-next 04/15] net/mlx5: DR, ICM memory pools sync optimization saeed
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 23+ messages in thread
From: saeed @ 2020-09-25 19:37 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski
  Cc: netdev, Yevgeny Kliteynik, Erez Shitrit, Alex Vesker, Mark Bloch,
	Saeed Mahameed

From: Yevgeny Kliteynik <kliteyn@nvidia.com>

When freeing chunks, we want to sync the steering
so that all the "hot" memory will be written to ICM
and all the chunks that are in the hot_list will
actually be destroyed.
When allocating from the pool, there is no need
to sync the steering, as we're not freeing anything,
and a sync might just hurt performance in terms of
flows-per-second offloaded.

Signed-off-by: Erez Shitrit <erezsh@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../mellanox/mlx5/core/steering/dr_icm_pool.c      | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_icm_pool.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_icm_pool.c
index 2c5886b469f7..4d8330aab169 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_icm_pool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_icm_pool.c
@@ -332,10 +332,6 @@ static int dr_icm_handle_buddies_get_mem(struct mlx5dr_icm_pool *pool,
 	bool new_mem = false;
 	int err;
 
-	/* Check if we have chunks that are waiting for sync-ste */
-	if (dr_icm_pool_is_sync_required(pool))
-		dr_icm_pool_sync_all_buddy_pools(pool);
-
 alloc_buddy_mem:
 	/* find the next free place from the buddy list */
 	list_for_each_entry(buddy_mem_pool, &pool->buddy_mem_list, list_node) {
@@ -409,12 +405,18 @@ mlx5dr_icm_alloc_chunk(struct mlx5dr_icm_pool *pool,
 void mlx5dr_icm_free_chunk(struct mlx5dr_icm_chunk *chunk)
 {
 	struct mlx5dr_icm_buddy_mem *buddy = chunk->buddy_mem;
+	struct mlx5dr_icm_pool *pool = buddy->pool;
 
 	/* move the memory to the waiting list AKA "hot" */
-	mutex_lock(&buddy->pool->mutex);
+	mutex_lock(&pool->mutex);
 	list_move_tail(&chunk->chunk_list, &buddy->hot_list);
 	buddy->hot_memory_size += chunk->byte_size;
-	mutex_unlock(&buddy->pool->mutex);
+
+	/* Check if we have chunks that are waiting for sync-ste */
+	if (dr_icm_pool_is_sync_required(pool))
+		dr_icm_pool_sync_all_buddy_pools(pool);
+
+	mutex_unlock(&pool->mutex);
 }
 
 struct mlx5dr_icm_pool *mlx5dr_icm_pool_create(struct mlx5dr_domain *dmn,
-- 
2.26.2



* [net-next 04/15] net/mlx5: DR, ICM memory pools sync optimization
  2020-09-25 19:37 [pull request][net-next 00/15] mlx5 SW steering updates 2020-09-25 saeed
                   ` (2 preceding siblings ...)
  2020-09-25 19:37 ` [net-next 03/15] net/mlx5: DR, Sync chunks only during free saeed
@ 2020-09-25 19:37 ` saeed
  2020-09-25 19:37 ` [net-next 05/15] net/mlx5: DR, Free buddy ICM memory if it is unused saeed
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 23+ messages in thread
From: saeed @ 2020-09-25 19:37 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski
  Cc: netdev, Yevgeny Kliteynik, Hamdan Igbaria, Alex Vesker,
	Mark Bloch, Saeed Mahameed

From: Yevgeny Kliteynik <kliteyn@nvidia.com>

Track the pool's hot ICM memory when freeing/allocating
a chunk, so that checking whether a sync is required
reduces to comparing the pool's hot memory against the
sync threshold.

Signed-off-by: Hamdan Igbaria <hamdani@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../mellanox/mlx5/core/steering/dr_icm_pool.c | 22 +++++--------------
 .../mellanox/mlx5/core/steering/mlx5dr.h      |  2 --
 2 files changed, 6 insertions(+), 18 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_icm_pool.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_icm_pool.c
index 4d8330aab169..c49f8e86f3bc 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_icm_pool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_icm_pool.c
@@ -4,7 +4,7 @@
 #include "dr_types.h"
 
 #define DR_ICM_MODIFY_HDR_ALIGN_BASE 64
-#define DR_ICM_SYNC_THRESHOLD (64 * 1024 * 1024)
+#define DR_ICM_SYNC_THRESHOLD_POOL (64 * 1024 * 1024)
 
 struct mlx5dr_icm_pool {
 	enum mlx5dr_icm_type icm_type;
@@ -13,6 +13,7 @@ struct mlx5dr_icm_pool {
 	/* memory management */
 	struct mutex mutex; /* protect the ICM pool and ICM buddy */
 	struct list_head buddy_mem_list;
+	u64 hot_memory_size;
 };
 
 struct mlx5dr_icm_dm {
@@ -281,19 +282,8 @@ dr_icm_chunk_create(struct mlx5dr_icm_pool *pool,
 
 static bool dr_icm_pool_is_sync_required(struct mlx5dr_icm_pool *pool)
 {
-	u64 allow_hot_size, all_hot_mem = 0;
-	struct mlx5dr_icm_buddy_mem *buddy;
-
-	list_for_each_entry(buddy, &pool->buddy_mem_list, list_node) {
-		allow_hot_size =
-			mlx5dr_icm_pool_chunk_size_to_byte((buddy->max_order - 2),
-							   pool->icm_type);
-		all_hot_mem += buddy->hot_memory_size;
-
-		if (buddy->hot_memory_size > allow_hot_size ||
-		    all_hot_mem > DR_ICM_SYNC_THRESHOLD)
-			return true;
-	}
+	if (pool->hot_memory_size > DR_ICM_SYNC_THRESHOLD_POOL)
+		return true;
 
 	return false;
 }
@@ -315,7 +305,7 @@ static int dr_icm_pool_sync_all_buddy_pools(struct mlx5dr_icm_pool *pool)
 		list_for_each_entry_safe(chunk, tmp_chunk, &buddy->hot_list, chunk_list) {
 			mlx5dr_buddy_free_mem(buddy, chunk->seg,
 					      ilog2(chunk->num_of_entries));
-			buddy->hot_memory_size -= chunk->byte_size;
+			pool->hot_memory_size -= chunk->byte_size;
 			dr_icm_chunk_destroy(chunk);
 		}
 	}
@@ -410,7 +400,7 @@ void mlx5dr_icm_free_chunk(struct mlx5dr_icm_chunk *chunk)
 	/* move the memory to the waiting list AKA "hot" */
 	mutex_lock(&pool->mutex);
 	list_move_tail(&chunk->chunk_list, &buddy->hot_list);
-	buddy->hot_memory_size += chunk->byte_size;
+	pool->hot_memory_size += chunk->byte_size;
 
 	/* Check if we have chunks that are waiting for sync-ste */
 	if (dr_icm_pool_is_sync_required(pool))
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/mlx5dr.h b/drivers/net/ethernet/mellanox/mlx5/core/steering/mlx5dr.h
index 40bdbdc6e3f2..93ddb9c3e6b6 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/mlx5dr.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/mlx5dr.h
@@ -147,8 +147,6 @@ struct mlx5dr_icm_buddy_mem {
 	 * sync_ste command sets them free.
 	 */
 	struct list_head	hot_list;
-	/* indicates the byte size of hot mem */
-	unsigned int		hot_memory_size;
 };
 
 int mlx5dr_buddy_init(struct mlx5dr_icm_buddy_mem *buddy,
-- 
2.26.2



* [net-next 05/15] net/mlx5: DR, Free buddy ICM memory if it is unused
  2020-09-25 19:37 [pull request][net-next 00/15] mlx5 SW steering updates 2020-09-25 saeed
                   ` (3 preceding siblings ...)
  2020-09-25 19:37 ` [net-next 04/15] net/mlx5: DR, ICM memory pools sync optimization saeed
@ 2020-09-25 19:37 ` saeed
  2020-09-25 19:38 ` [net-next 06/15] net/mlx5: E-switch, Use PF num in metadata reg c0 saeed
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 23+ messages in thread
From: saeed @ 2020-09-25 19:37 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski
  Cc: netdev, Yevgeny Kliteynik, Alex Vesker, Mark Bloch, Saeed Mahameed

From: Yevgeny Kliteynik <kliteyn@nvidia.com>

Track the buddy's used ICM memory, and free the buddy if all
of its memory becomes unused.
Do this only for STEs.
MODIFY_ACTION buddies are much smaller, so in case there
is a large number of modify_header actions, which results
in a large number of MODIFY_ACTION buddies, doing this
cleanup during sync would cause a performance hit while
not freeing a significant amount of memory.

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../mellanox/mlx5/core/steering/dr_icm_pool.c      | 14 ++++++++++----
 .../ethernet/mellanox/mlx5/core/steering/mlx5dr.h  |  1 +
 2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_icm_pool.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_icm_pool.c
index c49f8e86f3bc..66c24767e3b0 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_icm_pool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_icm_pool.c
@@ -175,10 +175,12 @@ get_chunk_icm_type(struct mlx5dr_icm_chunk *chunk)
 	return chunk->buddy_mem->pool->icm_type;
 }
 
-static void dr_icm_chunk_destroy(struct mlx5dr_icm_chunk *chunk)
+static void dr_icm_chunk_destroy(struct mlx5dr_icm_chunk *chunk,
+				 struct mlx5dr_icm_buddy_mem *buddy)
 {
 	enum mlx5dr_icm_type icm_type = get_chunk_icm_type(chunk);
 
+	buddy->used_memory -= chunk->byte_size;
 	list_del(&chunk->chunk_list);
 
 	if (icm_type == DR_ICM_TYPE_STE)
@@ -223,10 +225,10 @@ static void dr_icm_buddy_destroy(struct mlx5dr_icm_buddy_mem *buddy)
 	struct mlx5dr_icm_chunk *chunk, *next;
 
 	list_for_each_entry_safe(chunk, next, &buddy->hot_list, chunk_list)
-		dr_icm_chunk_destroy(chunk);
+		dr_icm_chunk_destroy(chunk, buddy);
 
 	list_for_each_entry_safe(chunk, next, &buddy->used_list, chunk_list)
-		dr_icm_chunk_destroy(chunk);
+		dr_icm_chunk_destroy(chunk, buddy);
 
 	dr_icm_pool_mr_destroy(buddy->icm_mr);
 
@@ -267,6 +269,7 @@ dr_icm_chunk_create(struct mlx5dr_icm_pool *pool,
 		goto out_free_chunk;
 	}
 
+	buddy_mem_pool->used_memory += chunk->byte_size;
 	chunk->buddy_mem = buddy_mem_pool;
 	INIT_LIST_HEAD(&chunk->chunk_list);
 
@@ -306,8 +309,11 @@ static int dr_icm_pool_sync_all_buddy_pools(struct mlx5dr_icm_pool *pool)
 			mlx5dr_buddy_free_mem(buddy, chunk->seg,
 					      ilog2(chunk->num_of_entries));
 			pool->hot_memory_size -= chunk->byte_size;
-			dr_icm_chunk_destroy(chunk);
+			dr_icm_chunk_destroy(chunk, buddy);
 		}
+
+		if (!buddy->used_memory && pool->icm_type == DR_ICM_TYPE_STE)
+			dr_icm_buddy_destroy(buddy);
 	}
 
 	return 0;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/mlx5dr.h b/drivers/net/ethernet/mellanox/mlx5/core/steering/mlx5dr.h
index 93ddb9c3e6b6..0aaba0ae9cf7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/mlx5dr.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/mlx5dr.h
@@ -141,6 +141,7 @@ struct mlx5dr_icm_buddy_mem {
 
 	/* This is the list of used chunks. HW may be accessing this memory */
 	struct list_head	used_list;
+	u64			used_memory;
 
 	/* Hardware may be accessing this memory but at some future,
 	 * undetermined time, it might cease to do so.
-- 
2.26.2



* [net-next 06/15] net/mlx5: E-switch, Use PF num in metadata reg c0
  2020-09-25 19:37 [pull request][net-next 00/15] mlx5 SW steering updates 2020-09-25 saeed
                   ` (4 preceding siblings ...)
  2020-09-25 19:37 ` [net-next 05/15] net/mlx5: DR, Free buddy ICM memory if it is unused saeed
@ 2020-09-25 19:38 ` saeed
  2020-09-25 19:38 ` [net-next 07/15] net/mlx5: DR, Replace the check for valid STE entry saeed
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 23+ messages in thread
From: saeed @ 2020-09-25 19:38 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski
  Cc: netdev, sunils, Parav Pandit, Vu Pham, Saeed Mahameed

From: sunils <sunils@nvidia.com>

Currently only 256 vports can be supported, as only 8 bits are
reserved for them and 8 bits are reserved for vhca_ids in
metadata reg c0. To support more than 256 vports, replace the
vhca_id with a shorter unique 4-bit PF number, which covers
up to 16 PFs. Use the remaining 12 bits for vports, ranging 1-4095.
This will continue to generate unique metadata even if
multiple PCI devices have the same switch_id.
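
The arithmetic, as a small stand-alone sketch (values invented for the
example; the ESW_* widths match the patch below):

#include <stdint.h>
#include <stdio.h>

#define ESW_VPORT_BITS 12

int main(void)
{
	uint32_t pf_num = 2;	/* PCI function number, 0-15 */
	uint32_t vport_id = 5;	/* ida-allocated per-PF id, 1-4095 */

	/* 16-bit SOURCE_PORT metadata: <pf_num:4><vport_id:12> */
	uint32_t metadata = (pf_num << ESW_VPORT_BITS) | vport_id;

	/* it occupies the high half of reg_c0; the low 16 bits are
	 * left for the tc chain tag
	 */
	printf("metadata = 0x%04x, reg_c0 = 0x%08x\n",
	       metadata, metadata << 16);	/* 0x2005, 0x20050000 */
	return 0;
}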

Signed-off-by: sunils <sunils@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../mellanox/mlx5/core/eswitch_offloads.c     | 36 +++++++++----------
 include/linux/mlx5/eswitch.h                  | 15 ++++----
 2 files changed, 26 insertions(+), 25 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index ffd5d540a19e..6b49c0d59099 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -2019,31 +2019,31 @@ esw_check_vport_match_metadata_supported(const struct mlx5_eswitch *esw)
 
 u32 mlx5_esw_match_metadata_alloc(struct mlx5_eswitch *esw)
 {
-	u32 num_vports = GENMASK(ESW_VPORT_BITS - 1, 0) - 1;
-	u32 vhca_id_mask = GENMASK(ESW_VHCA_ID_BITS - 1, 0);
-	u32 vhca_id = MLX5_CAP_GEN(esw->dev, vhca_id);
-	u32 start;
-	u32 end;
+	u32 vport_end_ida = (1 << ESW_VPORT_BITS) - 1;
+	u32 max_pf_num = (1 << ESW_PFNUM_BITS) - 1;
+	u32 pf_num;
 	int id;
 
-	/* Make sure the vhca_id fits the ESW_VHCA_ID_BITS */
-	WARN_ON_ONCE(vhca_id >= BIT(ESW_VHCA_ID_BITS));
-
-	/* Trim vhca_id to ESW_VHCA_ID_BITS */
-	vhca_id &= vhca_id_mask;
-
-	start = (vhca_id << ESW_VPORT_BITS);
-	end = start + num_vports;
-	if (!vhca_id)
-		start += 1; /* zero is reserved/invalid metadata */
-	id = ida_alloc_range(&esw->offloads.vport_metadata_ida, start, end, GFP_KERNEL);
+	/* Only 4 bits of pf_num */
+	pf_num = PCI_FUNC(esw->dev->pdev->devfn);
+	if (pf_num > max_pf_num)
+		return 0;
 
-	return (id < 0) ? 0 : id;
+	/* Metadata is 4 bits of PFNUM and 12 bits of unique id */
+	/* Use only non-zero vport_id (1-4095) for all PF's */
+	id = ida_alloc_range(&esw->offloads.vport_metadata_ida, 1, vport_end_ida, GFP_KERNEL);
+	if (id < 0)
+		return 0;
+	id = (pf_num << ESW_VPORT_BITS) | id;
+	return id;
 }
 
 void mlx5_esw_match_metadata_free(struct mlx5_eswitch *esw, u32 metadata)
 {
-	ida_free(&esw->offloads.vport_metadata_ida, metadata);
+	u32 vport_bit_mask = (1 << ESW_VPORT_BITS) - 1;
+
+	/* Metadata contains only 12 bits of actual ida id */
+	ida_free(&esw->offloads.vport_metadata_ida, metadata & vport_bit_mask);
 }
 
 static int esw_offloads_vport_metadata_setup(struct mlx5_eswitch *esw,
diff --git a/include/linux/mlx5/eswitch.h b/include/linux/mlx5/eswitch.h
index c16827eeba9c..b0ae8020f13e 100644
--- a/include/linux/mlx5/eswitch.h
+++ b/include/linux/mlx5/eswitch.h
@@ -74,15 +74,16 @@ bool mlx5_eswitch_reg_c1_loopback_enabled(const struct mlx5_eswitch *esw);
 bool mlx5_eswitch_vport_match_metadata_enabled(const struct mlx5_eswitch *esw);
 
 /* Reg C0 usage:
- * Reg C0 = < ESW_VHCA_ID_BITS(8) | ESW_VPORT BITS(8) | ESW_CHAIN_TAG(16) >
+ * Reg C0 = < ESW_PFNUM_BITS(4) | ESW_VPORT BITS(12) | ESW_CHAIN_TAG(16) >
  *
- * Highest 8 bits of the reg c0 is the vhca_id, next 8 bits is vport_num,
- * the rest (lowest 16 bits) is left for tc chain tag restoration.
- * VHCA_ID + VPORT comprise the SOURCE_PORT matching.
+ * Highest 4 bits of the reg c0 is the PF_NUM (range 0-15), 12 bits of
+ * unique non-zero vport id (range 1-4095). The rest (lowest 16 bits) is left
+ * for tc chain tag restoration.
+ * PFNUM + VPORT comprise the SOURCE_PORT matching.
  */
-#define ESW_VHCA_ID_BITS 8
-#define ESW_VPORT_BITS 8
-#define ESW_SOURCE_PORT_METADATA_BITS (ESW_VHCA_ID_BITS + ESW_VPORT_BITS)
+#define ESW_VPORT_BITS 12
+#define ESW_PFNUM_BITS 4
+#define ESW_SOURCE_PORT_METADATA_BITS (ESW_PFNUM_BITS + ESW_VPORT_BITS)
 #define ESW_SOURCE_PORT_METADATA_OFFSET (32 - ESW_SOURCE_PORT_METADATA_BITS)
 #define ESW_CHAIN_TAG_METADATA_BITS (32 - ESW_SOURCE_PORT_METADATA_BITS)
 #define ESW_CHAIN_TAG_METADATA_MASK GENMASK(ESW_CHAIN_TAG_METADATA_BITS - 1,\
-- 
2.26.2



* [net-next 07/15] net/mlx5: DR, Replace the check for valid STE entry
  2020-09-25 19:37 [pull request][net-next 00/15] mlx5 SW steering updates 2020-09-25 saeed
                   ` (5 preceding siblings ...)
  2020-09-25 19:38 ` [net-next 06/15] net/mlx5: E-switch, Use PF num in metadata reg c0 saeed
@ 2020-09-25 19:38 ` saeed
  2020-09-25 19:38 ` [net-next 08/15] net/mlx5: DR, Remove unneeded check from source port builder saeed
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 23+ messages in thread
From: saeed @ 2020-09-25 19:38 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski
  Cc: netdev, Yevgeny Kliteynik, Alex Vesker, Saeed Mahameed

From: Yevgeny Kliteynik <kliteyn@nvidia.com>

The validity check is done by reading the next lu_type from the STE.
It can be replaced by checking the refcount instead, which makes the
check independent of the internal STE structure.

Signed-off-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../mellanox/mlx5/core/steering/dr_rule.c     |  6 +++---
 .../mellanox/mlx5/core/steering/dr_send.c     |  4 ++--
 .../mellanox/mlx5/core/steering/dr_ste.c      | 19 -------------------
 .../mellanox/mlx5/core/steering/dr_types.h    |  7 +++++--
 4 files changed, 10 insertions(+), 26 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_rule.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_rule.c
index 6ec5106bc472..17577181ce8f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_rule.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_rule.c
@@ -242,7 +242,7 @@ dr_rule_rehash_copy_ste(struct mlx5dr_matcher *matcher,
 	new_idx = mlx5dr_ste_calc_hash_index(hw_ste, new_htbl);
 	new_ste = &new_htbl->ste_arr[new_idx];
 
-	if (mlx5dr_ste_not_used_ste(new_ste)) {
+	if (mlx5dr_ste_is_not_used(new_ste)) {
 		mlx5dr_htbl_get(new_htbl);
 		list_add_tail(&new_ste->miss_list_node,
 			      mlx5dr_ste_get_miss_list(new_ste));
@@ -335,7 +335,7 @@ static int dr_rule_rehash_copy_htbl(struct mlx5dr_matcher *matcher,
 
 	for (i = 0; i < cur_entries; i++) {
 		cur_ste = &cur_htbl->ste_arr[i];
-		if (mlx5dr_ste_not_used_ste(cur_ste)) /* Empty, nothing to copy */
+		if (mlx5dr_ste_is_not_used(cur_ste)) /* Empty, nothing to copy */
 			continue;
 
 		err = dr_rule_rehash_copy_miss_list(matcher,
@@ -791,7 +791,7 @@ dr_rule_handle_ste_branch(struct mlx5dr_rule *rule,
 	miss_list = &cur_htbl->chunk->miss_list[index];
 	ste = &cur_htbl->ste_arr[index];
 
-	if (mlx5dr_ste_not_used_ste(ste)) {
+	if (mlx5dr_ste_is_not_used(ste)) {
 		if (dr_rule_handle_empty_entry(matcher, nic_matcher, cur_htbl,
 					       ste, ste_location,
 					       hw_ste, miss_list,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_send.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_send.c
index 2ca79b9bde1f..3d77f7d9fbdf 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_send.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_send.c
@@ -466,10 +466,10 @@ int mlx5dr_send_postsend_htbl(struct mlx5dr_domain *dmn,
 		 * need to add the bit_mask
 		 */
 		for (j = 0; j < num_stes_per_iter; j++) {
-			u8 *hw_ste = htbl->ste_arr[ste_index + j].hw_ste;
+			struct mlx5dr_ste *ste = &htbl->ste_arr[ste_index + j];
 			u32 ste_off = j * DR_STE_SIZE;
 
-			if (mlx5dr_ste_is_not_valid_entry(hw_ste)) {
+			if (mlx5dr_ste_is_not_used(ste)) {
 				memcpy(data + ste_off,
 				       formatted_ste, DR_STE_SIZE);
 			} else {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_ste.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_ste.c
index 00c2f598f034..053e63844bd2 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_ste.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_ste.c
@@ -549,25 +549,6 @@ void mlx5dr_ste_always_miss_addr(struct mlx5dr_ste *ste, u64 miss_addr)
 	dr_ste_set_always_miss((struct dr_hw_ste_format *)ste->hw_ste);
 }
 
-/* The assumption here is that we don't update the ste->hw_ste if it is not
- * used ste, so it will be all zero, checking the next_lu_type.
- */
-bool mlx5dr_ste_is_not_valid_entry(u8 *p_hw_ste)
-{
-	struct dr_hw_ste_format *hw_ste = (struct dr_hw_ste_format *)p_hw_ste;
-
-	if (MLX5_GET(ste_general, hw_ste, next_lu_type) ==
-	    MLX5DR_STE_LU_TYPE_NOP)
-		return true;
-
-	return false;
-}
-
-bool mlx5dr_ste_not_used_ste(struct mlx5dr_ste *ste)
-{
-	return !ste->refcount;
-}
-
 /* Init one ste as a pattern for ste data array */
 void mlx5dr_ste_set_formatted_ste(u16 gvmi,
 				  struct mlx5dr_domain_rx_tx *nic_dmn,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_types.h b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_types.h
index f71ca74f96fd..07cf24a38d47 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_types.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_types.h
@@ -227,7 +227,6 @@ void mlx5dr_ste_set_hit_gvmi(u8 *hw_ste_p, u16 gvmi);
 void mlx5dr_ste_set_hit_addr(u8 *hw_ste, u64 icm_addr, u32 ht_size);
 void mlx5dr_ste_always_miss_addr(struct mlx5dr_ste *ste, u64 miss_addr);
 void mlx5dr_ste_set_bit_mask(u8 *hw_ste_p, u8 *bit_mask);
-bool mlx5dr_ste_not_used_ste(struct mlx5dr_ste *ste);
 bool mlx5dr_ste_is_last_in_rule(struct mlx5dr_matcher_rx_tx *nic_matcher,
 				u8 ste_location);
 void mlx5dr_ste_rx_set_flow_tag(u8 *hw_ste_p, u32 flow_tag);
@@ -266,6 +265,11 @@ static inline void mlx5dr_ste_get(struct mlx5dr_ste *ste)
 	ste->refcount++;
 }
 
+static inline bool mlx5dr_ste_is_not_used(struct mlx5dr_ste *ste)
+{
+	return !ste->refcount;
+}
+
 void mlx5dr_ste_set_hit_addr_by_next_htbl(u8 *hw_ste,
 					  struct mlx5dr_ste_htbl *next_htbl);
 bool mlx5dr_ste_equal_tag(void *src, void *dst);
@@ -1001,7 +1005,6 @@ struct mlx5dr_icm_chunk *
 mlx5dr_icm_alloc_chunk(struct mlx5dr_icm_pool *pool,
 		       enum mlx5dr_icm_chunk_size chunk_size);
 void mlx5dr_icm_free_chunk(struct mlx5dr_icm_chunk *chunk);
-bool mlx5dr_ste_is_not_valid_entry(u8 *p_hw_ste);
 int mlx5dr_ste_htbl_init_and_postsend(struct mlx5dr_domain *dmn,
 				      struct mlx5dr_domain_rx_tx *nic_dmn,
 				      struct mlx5dr_ste_htbl *htbl,
-- 
2.26.2



* [net-next 08/15] net/mlx5: DR, Remove unneeded check from source port builder
  2020-09-25 19:37 [pull request][net-next 00/15] mlx5 SW steering updates 2020-09-25 saeed
                   ` (6 preceding siblings ...)
  2020-09-25 19:38 ` [net-next 07/15] net/mlx5: DR, Replace the check for valid STE entry saeed
@ 2020-09-25 19:38 ` saeed
  2020-09-25 19:38 ` [net-next 09/15] net/mlx5: DR, Remove unneeded vlan check from L2 builder saeed
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 23+ messages in thread
From: saeed @ 2020-09-25 19:38 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski
  Cc: netdev, Yevgeny Kliteynik, Alex Vesker, Saeed Mahameed

From: Yevgeny Kliteynik <kliteyn@nvidia.com>

Mask validity for STE builders is checked by mlx5dr_ste_build_pre_check
during matcher creation. It already checks the mask value of
source_port, so remove this duplicated check from the source port
builder. Also, move the check of the source_eswitch_owner_vhca_id
mask there as well.
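
To clarify the "partial mask" wording: the pre-check only accepts an
all-ones or all-zeros mask for these fields. A small illustration
(values only, not code from this patch):

	mask->misc.source_port = 0xffff;  /* ok: exact match */
	mask->misc.source_port = 0;       /* ok: field ignored */
	mask->misc.source_port = 0x00ff;  /* rejected: partial mask, -EINVAL */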

Signed-off-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../mellanox/mlx5/core/steering/dr_matcher.c  |  6 +--
 .../mellanox/mlx5/core/steering/dr_ste.c      | 40 +++++++------------
 .../mellanox/mlx5/core/steering/dr_types.h    |  8 ++--
 3 files changed, 21 insertions(+), 33 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_matcher.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_matcher.c
index c63f727273d8..2b794daca436 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_matcher.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_matcher.c
@@ -252,10 +252,8 @@ static int dr_matcher_set_ste_builders(struct mlx5dr_matcher *matcher,
 		if (dr_mask_is_gvmi_or_qpn_set(&mask.misc) &&
 		    (dmn->type == MLX5DR_DOMAIN_TYPE_FDB ||
 		     dmn->type == MLX5DR_DOMAIN_TYPE_NIC_RX)) {
-			ret = mlx5dr_ste_build_src_gvmi_qpn(&sb[idx++], &mask,
-							    dmn, inner, rx);
-			if (ret)
-				return ret;
+			mlx5dr_ste_build_src_gvmi_qpn(&sb[idx++], &mask,
+						      dmn, inner, rx);
 		}
 
 		if (dr_mask_is_smac_set(&mask.outer) &&
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_ste.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_ste.c
index 053e63844bd2..6e86704181cc 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_ste.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_ste.c
@@ -709,7 +709,14 @@ int mlx5dr_ste_build_pre_check(struct mlx5dr_domain *dmn,
 {
 	if (!value && (match_criteria & DR_MATCHER_CRITERIA_MISC)) {
 		if (mask->misc.source_port && mask->misc.source_port != 0xffff) {
-			mlx5dr_err(dmn, "Partial mask source_port is not supported\n");
+			mlx5dr_err(dmn,
+				   "Partial mask source_port is not supported\n");
+			return -EINVAL;
+		}
+		if (mask->misc.source_eswitch_owner_vhca_id &&
+		    mask->misc.source_eswitch_owner_vhca_id != 0xffff) {
+			mlx5dr_err(dmn,
+				   "Partial mask source_eswitch_owner_vhca_id is not supported\n");
 			return -EINVAL;
 		}
 	}
@@ -2257,25 +2264,14 @@ void mlx5dr_ste_build_register_1(struct mlx5dr_ste_build *sb,
 	sb->ste_build_tag_func = &dr_ste_build_register_1_tag;
 }
 
-static int dr_ste_build_src_gvmi_qpn_bit_mask(struct mlx5dr_match_param *value,
-					      u8 *bit_mask)
+static void dr_ste_build_src_gvmi_qpn_bit_mask(struct mlx5dr_match_param *value,
+					       u8 *bit_mask)
 {
 	struct mlx5dr_match_misc *misc_mask = &value->misc;
 
-	/* Partial misc source_port is not supported */
-	if (misc_mask->source_port && misc_mask->source_port != 0xffff)
-		return -EINVAL;
-
-	/* Partial misc source_eswitch_owner_vhca_id is not supported */
-	if (misc_mask->source_eswitch_owner_vhca_id &&
-	    misc_mask->source_eswitch_owner_vhca_id != 0xffff)
-		return -EINVAL;
-
 	DR_STE_SET_MASK(src_gvmi_qp, bit_mask, source_gvmi, misc_mask, source_port);
 	DR_STE_SET_MASK(src_gvmi_qp, bit_mask, source_qp, misc_mask, source_sqn);
 	misc_mask->source_eswitch_owner_vhca_id = 0;
-
-	return 0;
 }
 
 static int dr_ste_build_src_gvmi_qpn_tag(struct mlx5dr_match_param *value,
@@ -2320,19 +2316,15 @@ static int dr_ste_build_src_gvmi_qpn_tag(struct mlx5dr_match_param *value,
 	return 0;
 }
 
-int mlx5dr_ste_build_src_gvmi_qpn(struct mlx5dr_ste_build *sb,
-				  struct mlx5dr_match_param *mask,
-				  struct mlx5dr_domain *dmn,
-				  bool inner, bool rx)
+void mlx5dr_ste_build_src_gvmi_qpn(struct mlx5dr_ste_build *sb,
+				   struct mlx5dr_match_param *mask,
+				   struct mlx5dr_domain *dmn,
+				   bool inner, bool rx)
 {
-	int ret;
-
 	/* Set vhca_id_valid before we reset source_eswitch_owner_vhca_id */
 	sb->vhca_id_valid = mask->misc.source_eswitch_owner_vhca_id;
 
-	ret = dr_ste_build_src_gvmi_qpn_bit_mask(mask, sb->bit_mask);
-	if (ret)
-		return ret;
+	dr_ste_build_src_gvmi_qpn_bit_mask(mask, sb->bit_mask);
 
 	sb->rx = rx;
 	sb->dmn = dmn;
@@ -2340,6 +2332,4 @@ int mlx5dr_ste_build_src_gvmi_qpn(struct mlx5dr_ste_build *sb,
 	sb->lu_type = MLX5DR_STE_LU_TYPE_SRC_GVMI_AND_QP;
 	sb->byte_mask = dr_ste_conv_bit_to_byte_mask(sb->bit_mask);
 	sb->ste_build_tag_func = &dr_ste_build_src_gvmi_qpn_tag;
-
-	return 0;
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_types.h b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_types.h
index 07cf24a38d47..026fb4606606 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_types.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_types.h
@@ -346,10 +346,10 @@ void mlx5dr_ste_build_register_0(struct mlx5dr_ste_build *sb,
 void mlx5dr_ste_build_register_1(struct mlx5dr_ste_build *sb,
 				 struct mlx5dr_match_param *mask,
 				 bool inner, bool rx);
-int mlx5dr_ste_build_src_gvmi_qpn(struct mlx5dr_ste_build *sb,
-				  struct mlx5dr_match_param *mask,
-				  struct mlx5dr_domain *dmn,
-				  bool inner, bool rx);
+void mlx5dr_ste_build_src_gvmi_qpn(struct mlx5dr_ste_build *sb,
+				   struct mlx5dr_match_param *mask,
+				   struct mlx5dr_domain *dmn,
+				   bool inner, bool rx);
 void mlx5dr_ste_build_empty_always_hit(struct mlx5dr_ste_build *sb, bool rx);
 
 /* Actions utils */
-- 
2.26.2



* [net-next 09/15] net/mlx5: DR, Remove unneeded vlan check from L2 builder
  2020-09-25 19:37 [pull request][net-next 00/15] mlx5 SW steering updates 2020-09-25 saeed
                   ` (7 preceding siblings ...)
  2020-09-25 19:38 ` [net-next 08/15] net/mlx5: DR, Remove unneeded check from source port builder saeed
@ 2020-09-25 19:38 ` saeed
  2020-09-25 19:38 ` [net-next 10/15] net/mlx5: DR, Remove unneeded local variable saeed
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 23+ messages in thread
From: saeed @ 2020-09-25 19:38 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski
  Cc: netdev, Yevgeny Kliteynik, Alex Vesker, Saeed Mahameed

From: Yevgeny Kliteynik <kliteyn@nvidia.com>

When a matcher is created, we already check that all mask fields were
consumed, so there is no need for this specific vlan check. Removing
it keeps the STE builder functions simple and clean.
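
For context, a hedged sketch of the pattern this relies on (the
matcher-side snippet is illustrative, not the literal code): each
builder clears the mask fields it consumed, so a leftover non-zero
field is caught once, centrally, after all builders have run:

	/* builder side: consume the field and clear it from the mask */
	if (mask->cvlan_tag) {
		MLX5_SET(ste_eth_l2_src_dst, bit_mask, first_vlan_qualifier, -1);
		mask->cvlan_tag = 0;
	}

	/* matcher side (illustrative): any field still set was never consumed */
	if (memchr_inv(&mask, 0, sizeof(mask)))
		return -EOPNOTSUPP;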

Signed-off-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../mellanox/mlx5/core/steering/dr_matcher.c  | 12 +++------
 .../mellanox/mlx5/core/steering/dr_ste.c      | 25 +++++--------------
 .../mellanox/mlx5/core/steering/dr_types.h    |  6 ++---
 3 files changed, 13 insertions(+), 30 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_matcher.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_matcher.c
index 2b794daca436..a16d7faa2bb8 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_matcher.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_matcher.c
@@ -258,10 +258,8 @@ static int dr_matcher_set_ste_builders(struct mlx5dr_matcher *matcher,
 
 		if (dr_mask_is_smac_set(&mask.outer) &&
 		    dr_mask_is_dmac_set(&mask.outer)) {
-			ret = mlx5dr_ste_build_eth_l2_src_des(&sb[idx++], &mask,
-							      inner, rx);
-			if (ret)
-				return ret;
+			mlx5dr_ste_build_eth_l2_src_des(&sb[idx++], &mask,
+							inner, rx);
 		}
 
 		if (dr_mask_is_smac_set(&mask.outer))
@@ -338,10 +336,8 @@ static int dr_matcher_set_ste_builders(struct mlx5dr_matcher *matcher,
 
 		if (dr_mask_is_smac_set(&mask.inner) &&
 		    dr_mask_is_dmac_set(&mask.inner)) {
-			ret = mlx5dr_ste_build_eth_l2_src_des(&sb[idx++],
-							      &mask, inner, rx);
-			if (ret)
-				return ret;
+			mlx5dr_ste_build_eth_l2_src_des(&sb[idx++],
+							&mask, inner, rx);
 		}
 
 		if (dr_mask_is_smac_set(&mask.inner))
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_ste.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_ste.c
index 6e86704181cc..970dbabe3ea2 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_ste.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_ste.c
@@ -766,8 +766,8 @@ int mlx5dr_ste_build_ste_arr(struct mlx5dr_matcher *matcher,
 	return 0;
 }
 
-static int dr_ste_build_eth_l2_src_des_bit_mask(struct mlx5dr_match_param *value,
-						bool inner, u8 *bit_mask)
+static void dr_ste_build_eth_l2_src_des_bit_mask(struct mlx5dr_match_param *value,
+						 bool inner, u8 *bit_mask)
 {
 	struct mlx5dr_match_spec *mask = inner ? &value->inner : &value->outer;
 
@@ -795,13 +795,6 @@ static int dr_ste_build_eth_l2_src_des_bit_mask(struct mlx5dr_match_param *value
 		MLX5_SET(ste_eth_l2_src_dst, bit_mask, first_vlan_qualifier, -1);
 		mask->svlan_tag = 0;
 	}
-
-	if (mask->cvlan_tag || mask->svlan_tag) {
-		pr_info("Invalid c/svlan mask configuration\n");
-		return -EINVAL;
-	}
-
-	return 0;
 }
 
 static void dr_ste_copy_mask_misc(char *mask, struct mlx5dr_match_misc *spec)
@@ -1092,23 +1085,17 @@ static int dr_ste_build_eth_l2_src_des_tag(struct mlx5dr_match_param *value,
 	return 0;
 }
 
-int mlx5dr_ste_build_eth_l2_src_des(struct mlx5dr_ste_build *sb,
-				    struct mlx5dr_match_param *mask,
-				    bool inner, bool rx)
+void mlx5dr_ste_build_eth_l2_src_des(struct mlx5dr_ste_build *sb,
+				     struct mlx5dr_match_param *mask,
+				     bool inner, bool rx)
 {
-	int ret;
-
-	ret = dr_ste_build_eth_l2_src_des_bit_mask(mask, inner, sb->bit_mask);
-	if (ret)
-		return ret;
+	dr_ste_build_eth_l2_src_des_bit_mask(mask, inner, sb->bit_mask);
 
 	sb->rx = rx;
 	sb->inner = inner;
 	sb->lu_type = DR_STE_CALC_LU_TYPE(ETHL2_SRC_DST, rx, inner);
 	sb->byte_mask = dr_ste_conv_bit_to_byte_mask(sb->bit_mask);
 	sb->ste_build_tag_func = &dr_ste_build_eth_l2_src_des_tag;
-
-	return 0;
 }
 
 static void dr_ste_build_eth_l3_ipv6_dst_bit_mask(struct mlx5dr_match_param *value,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_types.h b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_types.h
index 026fb4606606..a4a026243896 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_types.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_types.h
@@ -288,9 +288,9 @@ int mlx5dr_ste_build_ste_arr(struct mlx5dr_matcher *matcher,
 			     struct mlx5dr_matcher_rx_tx *nic_matcher,
 			     struct mlx5dr_match_param *value,
 			     u8 *ste_arr);
-int mlx5dr_ste_build_eth_l2_src_des(struct mlx5dr_ste_build *builder,
-				    struct mlx5dr_match_param *mask,
-				    bool inner, bool rx);
+void mlx5dr_ste_build_eth_l2_src_des(struct mlx5dr_ste_build *builder,
+				     struct mlx5dr_match_param *mask,
+				     bool inner, bool rx);
 void mlx5dr_ste_build_eth_l3_ipv4_5_tuple(struct mlx5dr_ste_build *sb,
 					  struct mlx5dr_match_param *mask,
 					  bool inner, bool rx);
-- 
2.26.2



* [net-next 10/15] net/mlx5: DR, Remove unneeded local variable
  2020-09-25 19:37 [pull request][net-next 00/15] mlx5 SW steering updates 2020-09-25 saeed
                   ` (8 preceding siblings ...)
  2020-09-25 19:38 ` [net-next 09/15] net/mlx5: DR, Remove unneeded vlan check from L2 builder saeed
@ 2020-09-25 19:38 ` saeed
  2020-09-25 19:38 ` [net-next 11/15] net/mlx5: DR, Call ste_builder directly with tag pointer saeed
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 23+ messages in thread
From: saeed @ 2020-09-25 19:38 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski
  Cc: netdev, Yevgeny Kliteynik, Alex Vesker, Saeed Mahameed

From: Yevgeny Kliteynik <kliteyn@nvidia.com>

The misc3 variable is used only once and can be dropped.

Signed-off-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/steering/dr_matcher.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_matcher.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_matcher.c
index a16d7faa2bb8..7df883686d46 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_matcher.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_matcher.c
@@ -203,7 +203,6 @@ static int dr_matcher_set_ste_builders(struct mlx5dr_matcher *matcher,
 	struct mlx5dr_domain_rx_tx *nic_dmn = nic_matcher->nic_tbl->nic_dmn;
 	struct mlx5dr_domain *dmn = matcher->tbl->dmn;
 	struct mlx5dr_match_param mask = {};
-	struct mlx5dr_match_misc3 *misc3;
 	struct mlx5dr_ste_build *sb;
 	bool inner, rx;
 	int idx = 0;
@@ -309,8 +308,7 @@ static int dr_matcher_set_ste_builders(struct mlx5dr_matcher *matcher,
 			mlx5dr_ste_build_flex_parser_0(&sb[idx++], &mask,
 						       inner, rx);
 
-		misc3 = &mask.misc3;
-		if ((DR_MASK_IS_FLEX_PARSER_ICMPV4_SET(misc3) &&
+		if ((DR_MASK_IS_FLEX_PARSER_ICMPV4_SET(&mask.misc3) &&
 		     mlx5dr_matcher_supp_flex_parser_icmp_v4(&dmn->info.caps)) ||
 		    (dr_mask_is_flex_parser_icmpv6_set(&mask.misc3) &&
 		     mlx5dr_matcher_supp_flex_parser_icmp_v6(&dmn->info.caps))) {
-- 
2.26.2



* [net-next 11/15] net/mlx5: DR, Call ste_builder directly with tag pointer
  2020-09-25 19:37 [pull request][net-next 00/15] mlx5 SW steering updates 2020-09-25 saeed
                   ` (9 preceding siblings ...)
  2020-09-25 19:38 ` [net-next 10/15] net/mlx5: DR, Remove unneeded local variable saeed
@ 2020-09-25 19:38 ` saeed
  2020-09-25 19:38 ` [net-next 12/15] net/mlx5: DR, Remove unused member of action struct saeed
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 23+ messages in thread
From: saeed @ 2020-09-25 19:38 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski
  Cc: netdev, Yevgeny Kliteynik, Alex Vesker, Saeed Mahameed

From: Yevgeny Kliteynik <kliteyn@nvidia.com>

Instead of getting the tag in each function, call the builder directly
with the tag pointer. This will allow the same function to be used for
building both the tag and the bitmask.
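
For context on the last sentence: once builders receive a plain buffer
pointer, the same callback can fill either destination. A hypothetical
sketch (build_fn stands for any ste_build_tag_func; only
mlx5dr_ste_get_tag below is from this patch):

	build_fn(mask, sb, sb->bit_mask);                  /* build the bitmask */
	build_fn(value, sb, mlx5dr_ste_get_tag(hw_ste));   /* build the tag */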

Signed-off-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../mellanox/mlx5/core/steering/dr_ste.c      | 99 ++++++-------------
 .../mellanox/mlx5/core/steering/dr_types.h    |  2 +-
 2 files changed, 33 insertions(+), 68 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_ste.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_ste.c
index 970dbabe3ea2..b01aaec75622 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_ste.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_ste.c
@@ -155,6 +155,13 @@ static u16 dr_ste_conv_bit_to_byte_mask(u8 *bit_mask)
 	return byte_mask;
 }
 
+static u8 *mlx5dr_ste_get_tag(u8 *hw_ste_p)
+{
+	struct dr_hw_ste_format *hw_ste = (struct dr_hw_ste_format *)hw_ste_p;
+
+	return hw_ste->tag;
+}
+
 void mlx5dr_ste_set_bit_mask(u8 *hw_ste_p, u8 *bit_mask)
 {
 	struct dr_hw_ste_format *hw_ste = (struct dr_hw_ste_format *)hw_ste_p;
@@ -748,7 +755,7 @@ int mlx5dr_ste_build_ste_arr(struct mlx5dr_matcher *matcher,
 
 		mlx5dr_ste_set_bit_mask(ste_arr, sb->bit_mask);
 
-		ret = sb->ste_build_tag_func(value, sb, ste_arr);
+		ret = sb->ste_build_tag_func(value, sb, mlx5dr_ste_get_tag(ste_arr));
 		if (ret)
 			return ret;
 
@@ -1040,11 +1047,9 @@ void mlx5dr_ste_copy_param(u8 match_criteria,
 
 static int dr_ste_build_eth_l2_src_des_tag(struct mlx5dr_match_param *value,
 					   struct mlx5dr_ste_build *sb,
-					   u8 *hw_ste_p)
+					   u8 *tag)
 {
 	struct mlx5dr_match_spec *spec = sb->inner ? &value->inner : &value->outer;
-	struct dr_hw_ste_format *hw_ste = (struct dr_hw_ste_format *)hw_ste_p;
-	u8 *tag = hw_ste->tag;
 
 	DR_STE_SET_TAG(eth_l2_src_dst, tag, dmac_47_16, spec, dmac_47_16);
 	DR_STE_SET_TAG(eth_l2_src_dst, tag, dmac_15_0, spec, dmac_15_0);
@@ -1111,11 +1116,9 @@ static void dr_ste_build_eth_l3_ipv6_dst_bit_mask(struct mlx5dr_match_param *val
 
 static int dr_ste_build_eth_l3_ipv6_dst_tag(struct mlx5dr_match_param *value,
 					    struct mlx5dr_ste_build *sb,
-					    u8 *hw_ste_p)
+					    u8 *tag)
 {
-	struct dr_hw_ste_format *hw_ste = (struct dr_hw_ste_format *)hw_ste_p;
 	struct mlx5dr_match_spec *spec = sb->inner ? &value->inner : &value->outer;
-	u8 *tag = hw_ste->tag;
 
 	DR_STE_SET_TAG(eth_l3_ipv6_dst, tag, dst_ip_127_96, spec, dst_ip_127_96);
 	DR_STE_SET_TAG(eth_l3_ipv6_dst, tag, dst_ip_95_64, spec, dst_ip_95_64);
@@ -1151,11 +1154,9 @@ static void dr_ste_build_eth_l3_ipv6_src_bit_mask(struct mlx5dr_match_param *val
 
 static int dr_ste_build_eth_l3_ipv6_src_tag(struct mlx5dr_match_param *value,
 					    struct mlx5dr_ste_build *sb,
-					    u8 *hw_ste_p)
+					    u8 *tag)
 {
 	struct mlx5dr_match_spec *spec = sb->inner ? &value->inner : &value->outer;
-	struct dr_hw_ste_format *hw_ste = (struct dr_hw_ste_format *)hw_ste_p;
-	u8 *tag = hw_ste->tag;
 
 	DR_STE_SET_TAG(eth_l3_ipv6_src, tag, src_ip_127_96, spec, src_ip_127_96);
 	DR_STE_SET_TAG(eth_l3_ipv6_src, tag, src_ip_95_64, spec, src_ip_95_64);
@@ -1213,11 +1214,9 @@ static void dr_ste_build_eth_l3_ipv4_5_tuple_bit_mask(struct mlx5dr_match_param
 
 static int dr_ste_build_eth_l3_ipv4_5_tuple_tag(struct mlx5dr_match_param *value,
 						struct mlx5dr_ste_build *sb,
-						u8 *hw_ste_p)
+						u8 *tag)
 {
-	struct dr_hw_ste_format *hw_ste = (struct dr_hw_ste_format *)hw_ste_p;
 	struct mlx5dr_match_spec *spec = sb->inner ? &value->inner : &value->outer;
-	u8 *tag = hw_ste->tag;
 
 	DR_STE_SET_TAG(eth_l3_ipv4_5_tuple, tag, destination_address, spec, dst_ip_31_0);
 	DR_STE_SET_TAG(eth_l3_ipv4_5_tuple, tag, source_address, spec, src_ip_31_0);
@@ -1303,12 +1302,10 @@ dr_ste_build_eth_l2_src_or_dst_bit_mask(struct mlx5dr_match_param *value,
 }
 
 static int dr_ste_build_eth_l2_src_or_dst_tag(struct mlx5dr_match_param *value,
-					      bool inner, u8 *hw_ste_p)
+					      bool inner, u8 *tag)
 {
-	struct dr_hw_ste_format *hw_ste = (struct dr_hw_ste_format *)hw_ste_p;
 	struct mlx5dr_match_spec *spec = inner ? &value->inner : &value->outer;
 	struct mlx5dr_match_misc *misc_spec = &value->misc;
-	u8 *tag = hw_ste->tag;
 
 	DR_STE_SET_TAG(eth_l2_src, tag, first_vlan_id, spec, first_vid);
 	DR_STE_SET_TAG(eth_l2_src, tag, first_cfi, spec, first_cfi);
@@ -1378,16 +1375,14 @@ static void dr_ste_build_eth_l2_src_bit_mask(struct mlx5dr_match_param *value,
 
 static int dr_ste_build_eth_l2_src_tag(struct mlx5dr_match_param *value,
 				       struct mlx5dr_ste_build *sb,
-				       u8 *hw_ste_p)
+				       u8 *tag)
 {
-	struct dr_hw_ste_format *hw_ste = (struct dr_hw_ste_format *)hw_ste_p;
 	struct mlx5dr_match_spec *spec = sb->inner ? &value->inner : &value->outer;
-	u8 *tag = hw_ste->tag;
 
 	DR_STE_SET_TAG(eth_l2_src, tag, smac_47_16, spec, smac_47_16);
 	DR_STE_SET_TAG(eth_l2_src, tag, smac_15_0, spec, smac_15_0);
 
-	return dr_ste_build_eth_l2_src_or_dst_tag(value, sb->inner, hw_ste_p);
+	return dr_ste_build_eth_l2_src_or_dst_tag(value, sb->inner, tag);
 }
 
 void mlx5dr_ste_build_eth_l2_src(struct mlx5dr_ste_build *sb,
@@ -1415,16 +1410,14 @@ static void dr_ste_build_eth_l2_dst_bit_mask(struct mlx5dr_match_param *value,
 
 static int dr_ste_build_eth_l2_dst_tag(struct mlx5dr_match_param *value,
 				       struct mlx5dr_ste_build *sb,
-				       u8 *hw_ste_p)
+				       u8 *tag)
 {
-	struct dr_hw_ste_format *hw_ste = (struct dr_hw_ste_format *)hw_ste_p;
 	struct mlx5dr_match_spec *spec = sb->inner ? &value->inner : &value->outer;
-	u8 *tag = hw_ste->tag;
 
 	DR_STE_SET_TAG(eth_l2_dst, tag, dmac_47_16, spec, dmac_47_16);
 	DR_STE_SET_TAG(eth_l2_dst, tag, dmac_15_0, spec, dmac_15_0);
 
-	return dr_ste_build_eth_l2_src_or_dst_tag(value, sb->inner, hw_ste_p);
+	return dr_ste_build_eth_l2_src_or_dst_tag(value, sb->inner, tag);
 }
 
 void mlx5dr_ste_build_eth_l2_dst(struct mlx5dr_ste_build *sb,
@@ -1470,12 +1463,10 @@ static void dr_ste_build_eth_l2_tnl_bit_mask(struct mlx5dr_match_param *value,
 
 static int dr_ste_build_eth_l2_tnl_tag(struct mlx5dr_match_param *value,
 				       struct mlx5dr_ste_build *sb,
-				       u8 *hw_ste_p)
+				       u8 *tag)
 {
 	struct mlx5dr_match_spec *spec = sb->inner ? &value->inner : &value->outer;
-	struct dr_hw_ste_format *hw_ste = (struct dr_hw_ste_format *)hw_ste_p;
 	struct mlx5dr_match_misc *misc = &value->misc;
-	u8 *tag = hw_ste->tag;
 
 	DR_STE_SET_TAG(eth_l2_tnl, tag, dmac_47_16, spec, dmac_47_16);
 	DR_STE_SET_TAG(eth_l2_tnl, tag, dmac_15_0, spec, dmac_15_0);
@@ -1536,11 +1527,9 @@ static void dr_ste_build_eth_l3_ipv4_misc_bit_mask(struct mlx5dr_match_param *va
 
 static int dr_ste_build_eth_l3_ipv4_misc_tag(struct mlx5dr_match_param *value,
 					     struct mlx5dr_ste_build *sb,
-					     u8 *hw_ste_p)
+					     u8 *tag)
 {
-	struct dr_hw_ste_format *hw_ste = (struct dr_hw_ste_format *)hw_ste_p;
 	struct mlx5dr_match_spec *spec = sb->inner ? &value->inner : &value->outer;
-	u8 *tag = hw_ste->tag;
 
 	DR_STE_SET_TAG(eth_l3_ipv4_misc, tag, time_to_live, spec, ttl_hoplimit);
 
@@ -1583,11 +1572,9 @@ static void dr_ste_build_ipv6_l3_l4_bit_mask(struct mlx5dr_match_param *value,
 
 static int dr_ste_build_ipv6_l3_l4_tag(struct mlx5dr_match_param *value,
 				       struct mlx5dr_ste_build *sb,
-				       u8 *hw_ste_p)
+				       u8 *tag)
 {
 	struct mlx5dr_match_spec *spec = sb->inner ? &value->inner : &value->outer;
-	struct dr_hw_ste_format *hw_ste = (struct dr_hw_ste_format *)hw_ste_p;
-	u8 *tag = hw_ste->tag;
 
 	DR_STE_SET_TAG(eth_l4, tag, dst_port, spec, tcp_dport);
 	DR_STE_SET_TAG(eth_l4, tag, src_port, spec, tcp_sport);
@@ -1622,7 +1609,7 @@ void mlx5dr_ste_build_ipv6_l3_l4(struct mlx5dr_ste_build *sb,
 
 static int dr_ste_build_empty_always_hit_tag(struct mlx5dr_match_param *value,
 					     struct mlx5dr_ste_build *sb,
-					     u8 *hw_ste_p)
+					     u8 *tag)
 {
 	return 0;
 }
@@ -1648,11 +1635,9 @@ static void dr_ste_build_mpls_bit_mask(struct mlx5dr_match_param *value,
 
 static int dr_ste_build_mpls_tag(struct mlx5dr_match_param *value,
 				 struct mlx5dr_ste_build *sb,
-				 u8 *hw_ste_p)
+				 u8 *tag)
 {
-	struct dr_hw_ste_format *hw_ste = (struct dr_hw_ste_format *)hw_ste_p;
 	struct mlx5dr_match_misc2 *misc2_mask = &value->misc2;
-	u8 *tag = hw_ste->tag;
 
 	if (sb->inner)
 		DR_STE_SET_MPLS_TAG(mpls, misc2_mask, inner, tag);
@@ -1691,11 +1676,9 @@ static void dr_ste_build_gre_bit_mask(struct mlx5dr_match_param *value,
 
 static int dr_ste_build_gre_tag(struct mlx5dr_match_param *value,
 				struct mlx5dr_ste_build *sb,
-				u8 *hw_ste_p)
+				u8 *tag)
 {
-	struct dr_hw_ste_format *hw_ste = (struct dr_hw_ste_format *)hw_ste_p;
 	struct  mlx5dr_match_misc *misc = &value->misc;
-	u8 *tag = hw_ste->tag;
 
 	DR_STE_SET_TAG(gre, tag, gre_protocol, misc, gre_protocol);
 
@@ -1756,11 +1739,9 @@ static void dr_ste_build_flex_parser_0_bit_mask(struct mlx5dr_match_param *value
 
 static int dr_ste_build_flex_parser_0_tag(struct mlx5dr_match_param *value,
 					  struct mlx5dr_ste_build *sb,
-					  u8 *hw_ste_p)
+					  u8 *tag)
 {
-	struct dr_hw_ste_format *hw_ste = (struct dr_hw_ste_format *)hw_ste_p;
 	struct mlx5dr_match_misc2 *misc_2_mask = &value->misc2;
-	u8 *tag = hw_ste->tag;
 
 	if (DR_STE_IS_OUTER_MPLS_OVER_GRE_SET(misc_2_mask)) {
 		DR_STE_SET_TAG(flex_parser_0, tag, parser_3_label,
@@ -1878,11 +1859,9 @@ static int dr_ste_build_flex_parser_1_bit_mask(struct mlx5dr_match_param *mask,
 
 static int dr_ste_build_flex_parser_1_tag(struct mlx5dr_match_param *value,
 					  struct mlx5dr_ste_build *sb,
-					  u8 *hw_ste_p)
+					  u8 *tag)
 {
-	struct dr_hw_ste_format *hw_ste = (struct dr_hw_ste_format *)hw_ste_p;
 	struct mlx5dr_match_misc3 *misc_3 = &value->misc3;
-	u8 *tag = hw_ste->tag;
 	u32 icmp_header_data;
 	int dw0_location;
 	int dw1_location;
@@ -1982,11 +1961,9 @@ static void dr_ste_build_general_purpose_bit_mask(struct mlx5dr_match_param *val
 
 static int dr_ste_build_general_purpose_tag(struct mlx5dr_match_param *value,
 					    struct mlx5dr_ste_build *sb,
-					    u8 *hw_ste_p)
+					    u8 *tag)
 {
-	struct dr_hw_ste_format *hw_ste = (struct dr_hw_ste_format *)hw_ste_p;
 	struct mlx5dr_match_misc2 *misc_2_mask = &value->misc2;
-	u8 *tag = hw_ste->tag;
 
 	DR_STE_SET_TAG(general_purpose, tag, general_purpose_lookup_field,
 		       misc_2_mask, metadata_reg_a);
@@ -2027,11 +2004,9 @@ static void dr_ste_build_eth_l4_misc_bit_mask(struct mlx5dr_match_param *value,
 
 static int dr_ste_build_eth_l4_misc_tag(struct mlx5dr_match_param *value,
 					struct mlx5dr_ste_build *sb,
-					u8 *hw_ste_p)
+					u8 *tag)
 {
-	struct dr_hw_ste_format *hw_ste = (struct dr_hw_ste_format *)hw_ste_p;
 	struct mlx5dr_match_misc3 *misc3 = &value->misc3;
-	u8 *tag = hw_ste->tag;
 
 	if (sb->inner) {
 		DR_STE_SET_TAG(eth_l4_misc, tag, seq_num, misc3, inner_tcp_seq_num);
@@ -2077,11 +2052,9 @@ dr_ste_build_flex_parser_tnl_vxlan_gpe_bit_mask(struct mlx5dr_match_param *value
 static int
 dr_ste_build_flex_parser_tnl_vxlan_gpe_tag(struct mlx5dr_match_param *value,
 					   struct mlx5dr_ste_build *sb,
-					   u8 *hw_ste_p)
+					   u8 *tag)
 {
-	struct dr_hw_ste_format *hw_ste = (struct dr_hw_ste_format *)hw_ste_p;
 	struct mlx5dr_match_misc3 *misc3 = &value->misc3;
-	u8 *tag = hw_ste->tag;
 
 	DR_STE_SET_TAG(flex_parser_tnl_vxlan_gpe, tag,
 		       outer_vxlan_gpe_flags, misc3,
@@ -2133,11 +2106,9 @@ dr_ste_build_flex_parser_tnl_geneve_bit_mask(struct mlx5dr_match_param *value,
 static int
 dr_ste_build_flex_parser_tnl_geneve_tag(struct mlx5dr_match_param *value,
 					struct mlx5dr_ste_build *sb,
-					u8 *hw_ste_p)
+					u8 *tag)
 {
-	struct dr_hw_ste_format *hw_ste = (struct dr_hw_ste_format *)hw_ste_p;
 	struct mlx5dr_match_misc *misc = &value->misc;
-	u8 *tag = hw_ste->tag;
 
 	DR_STE_SET_TAG(flex_parser_tnl_geneve, tag,
 		       geneve_protocol_type, misc, geneve_protocol_type);
@@ -2180,11 +2151,9 @@ static void dr_ste_build_register_0_bit_mask(struct mlx5dr_match_param *value,
 
 static int dr_ste_build_register_0_tag(struct mlx5dr_match_param *value,
 				       struct mlx5dr_ste_build *sb,
-				       u8 *hw_ste_p)
+				       u8 *tag)
 {
-	struct dr_hw_ste_format *hw_ste = (struct dr_hw_ste_format *)hw_ste_p;
 	struct mlx5dr_match_misc2 *misc2 = &value->misc2;
-	u8 *tag = hw_ste->tag;
 
 	DR_STE_SET_TAG(register_0, tag, register_0_h, misc2, metadata_reg_c_0);
 	DR_STE_SET_TAG(register_0, tag, register_0_l, misc2, metadata_reg_c_1);
@@ -2224,11 +2193,9 @@ static void dr_ste_build_register_1_bit_mask(struct mlx5dr_match_param *value,
 
 static int dr_ste_build_register_1_tag(struct mlx5dr_match_param *value,
 				       struct mlx5dr_ste_build *sb,
-				       u8 *hw_ste_p)
+				       u8 *tag)
 {
-	struct dr_hw_ste_format *hw_ste = (struct dr_hw_ste_format *)hw_ste_p;
 	struct mlx5dr_match_misc2 *misc2 = &value->misc2;
-	u8 *tag = hw_ste->tag;
 
 	DR_STE_SET_TAG(register_1, tag, register_2_h, misc2, metadata_reg_c_4);
 	DR_STE_SET_TAG(register_1, tag, register_2_l, misc2, metadata_reg_c_5);
@@ -2263,15 +2230,13 @@ static void dr_ste_build_src_gvmi_qpn_bit_mask(struct mlx5dr_match_param *value,
 
 static int dr_ste_build_src_gvmi_qpn_tag(struct mlx5dr_match_param *value,
 					 struct mlx5dr_ste_build *sb,
-					 u8 *hw_ste_p)
+					 u8 *tag)
 {
-	struct dr_hw_ste_format *hw_ste = (struct dr_hw_ste_format *)hw_ste_p;
 	struct mlx5dr_match_misc *misc = &value->misc;
 	struct mlx5dr_cmd_vport_cap *vport_cap;
 	struct mlx5dr_domain *dmn = sb->dmn;
 	struct mlx5dr_cmd_caps *caps;
 	u8 *bit_mask = sb->bit_mask;
-	u8 *tag = hw_ste->tag;
 	bool source_gvmi_set;
 
 	DR_STE_SET_TAG(src_gvmi_qp, tag, source_qp, misc, source_sqn);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_types.h b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_types.h
index a4a026243896..ce04a2019f90 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_types.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_types.h
@@ -194,7 +194,7 @@ struct mlx5dr_ste_build {
 	u8 bit_mask[DR_STE_SIZE_MASK];
 	int (*ste_build_tag_func)(struct mlx5dr_match_param *spec,
 				  struct mlx5dr_ste_build *sb,
-				  u8 *hw_ste_p);
+				  u8 *tag);
 };
 
 struct mlx5dr_ste_htbl *
-- 
2.26.2



* [net-next 12/15] net/mlx5: DR, Remove unused member of action struct
  2020-09-25 19:37 [pull request][net-next 00/15] mlx5 SW steering updates 2020-09-25 saeed
                   ` (10 preceding siblings ...)
  2020-09-25 19:38 ` [net-next 11/15] net/mlx5: DR, Call ste_builder directly with tag pointer saeed
@ 2020-09-25 19:38 ` saeed
  2020-09-25 19:38 ` [net-next 13/15] net/mlx5: DR, Rename builders HW specific names saeed
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 23+ messages in thread
From: saeed @ 2020-09-25 19:38 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski
  Cc: netdev, Yevgeny Kliteynik, Erez Shitrit, Alex Vesker, Saeed Mahameed

From: Yevgeny Kliteynik <kliteyn@nvidia.com>

The data_size member of struct mlx5dr_action is unused, so remove it.

Signed-off-by: Erez Shitrit <erezsh@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/steering/dr_types.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_types.h b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_types.h
index ce04a2019f90..77615f6d2fdf 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_types.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_types.h
@@ -731,7 +731,6 @@ struct mlx5dr_action {
 			struct mlx5dr_domain *dmn;
 			struct mlx5dr_icm_chunk *chunk;
 			u8 *data;
-			u32 data_size;
 			u16 num_of_actions;
 			u32 index;
 			u8 allow_rx:1;
-- 
2.26.2



* [net-next 13/15] net/mlx5: DR, Rename builders HW specific names
  2020-09-25 19:37 [pull request][net-next 00/15] mlx5 SW steering updates 2020-09-25 saeed
                   ` (11 preceding siblings ...)
  2020-09-25 19:38 ` [net-next 12/15] net/mlx5: DR, Remove unused member of action struct saeed
@ 2020-09-25 19:38 ` saeed
  2020-09-25 19:38 ` [net-next 14/15] net/mlx5: DR, Rename matcher functions to be more HW agnostic saeed
  2020-09-25 19:38 ` [net-next 15/15] net/mlx5: DR, Add support for rule creation with flow source hint saeed
  14 siblings, 0 replies; 23+ messages in thread
From: saeed @ 2020-09-25 19:38 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski
  Cc: netdev, Yevgeny Kliteynik, Alex Vesker, Saeed Mahameed

From: Yevgeny Kliteynik <kliteyn@nvidia.com>

We will support multiple STE versions, and the existing naming is not
suitable for the newer versions. Remove the HW-specific details from
the builder names and give them more general names.

Signed-off-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../mellanox/mlx5/core/steering/dr_matcher.c  | 63 ++++++++++---------
 .../mellanox/mlx5/core/steering/dr_ste.c      | 42 ++++++-------
 .../mellanox/mlx5/core/steering/dr_types.h    | 42 ++++++-------
 3 files changed, 76 insertions(+), 71 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_matcher.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_matcher.c
index 7df883686d46..752afdb20e23 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_matcher.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_matcher.c
@@ -85,7 +85,7 @@ static bool dr_mask_is_ttl_set(struct mlx5dr_match_spec *spec)
 	(_misc2)._inner_outer##_first_mpls_s_bos || \
 	(_misc2)._inner_outer##_first_mpls_ttl)
 
-static bool dr_mask_is_gre_set(struct mlx5dr_match_misc *misc)
+static bool dr_mask_is_tnl_gre_set(struct mlx5dr_match_misc *misc)
 {
 	return (misc->gre_key_h || misc->gre_key_l ||
 		misc->gre_protocol || misc->gre_c_present ||
@@ -98,7 +98,7 @@ static bool dr_mask_is_gre_set(struct mlx5dr_match_misc *misc)
 	(_misc2).outer_first_mpls_over_##gre_udp##_s_bos || \
 	(_misc2).outer_first_mpls_over_##gre_udp##_ttl)
 
-#define DR_MASK_IS_FLEX_PARSER_0_SET(_misc2) ( \
+#define DR_MASK_IS_TNL_MPLS_SET(_misc2) ( \
 	DR_MASK_IS_OUTER_MPLS_OVER_GRE_UDP_SET((_misc2), gre) || \
 	DR_MASK_IS_OUTER_MPLS_OVER_GRE_UDP_SET((_misc2), udp))
 
@@ -148,12 +148,23 @@ dr_mask_is_flex_parser_tnl_geneve_set(struct mlx5dr_match_param *mask,
 	       dr_matcher_supp_flex_parser_geneve(&dmn->info.caps);
 }
 
-static bool dr_mask_is_flex_parser_icmpv6_set(struct mlx5dr_match_misc3 *misc3)
+static bool dr_mask_is_icmpv6_set(struct mlx5dr_match_misc3 *misc3)
 {
 	return (misc3->icmpv6_type || misc3->icmpv6_code ||
 		misc3->icmpv6_header_data);
 }
 
+static bool dr_mask_is_flex_parser_icmp_set(struct mlx5dr_match_param *mask,
+					    struct mlx5dr_domain *dmn)
+{
+	if (DR_MASK_IS_ICMPV4_SET(&mask->misc3))
+		return mlx5dr_matcher_supp_flex_parser_icmp_v4(&dmn->info.caps);
+	else if (dr_mask_is_icmpv6_set(&mask->misc3))
+		return mlx5dr_matcher_supp_flex_parser_icmp_v6(&dmn->info.caps);
+
+	return false;
+}
+
 static bool dr_mask_is_wqe_metadata_set(struct mlx5dr_match_misc2 *misc2)
 {
 	return misc2->metadata_reg_a;
@@ -257,7 +268,7 @@ static int dr_matcher_set_ste_builders(struct mlx5dr_matcher *matcher,
 
 		if (dr_mask_is_smac_set(&mask.outer) &&
 		    dr_mask_is_dmac_set(&mask.outer)) {
-			mlx5dr_ste_build_eth_l2_src_des(&sb[idx++], &mask,
+			mlx5dr_ste_build_eth_l2_src_dst(&sb[idx++], &mask,
 							inner, rx);
 		}
 
@@ -277,8 +288,8 @@ static int dr_matcher_set_ste_builders(struct mlx5dr_matcher *matcher,
 								 inner, rx);
 
 			if (DR_MASK_IS_ETH_L4_SET(mask.outer, mask.misc, outer))
-				mlx5dr_ste_build_ipv6_l3_l4(&sb[idx++], &mask,
-							    inner, rx);
+				mlx5dr_ste_build_eth_ipv6_l3_l4(&sb[idx++], &mask,
+								inner, rx);
 		} else {
 			if (dr_mask_is_ipv4_5_tuple_set(&mask.outer))
 				mlx5dr_ste_build_eth_l3_ipv4_5_tuple(&sb[idx++], &mask,
@@ -290,13 +301,11 @@ static int dr_matcher_set_ste_builders(struct mlx5dr_matcher *matcher,
 		}
 
 		if (dr_mask_is_flex_parser_tnl_vxlan_gpe_set(&mask, dmn))
-			mlx5dr_ste_build_flex_parser_tnl_vxlan_gpe(&sb[idx++],
-								   &mask,
-								   inner, rx);
+			mlx5dr_ste_build_tnl_vxlan_gpe(&sb[idx++], &mask,
+						       inner, rx);
 		else if (dr_mask_is_flex_parser_tnl_geneve_set(&mask, dmn))
-			mlx5dr_ste_build_flex_parser_tnl_geneve(&sb[idx++],
-								&mask,
-								inner, rx);
+			mlx5dr_ste_build_tnl_geneve(&sb[idx++], &mask,
+						    inner, rx);
 
 		if (DR_MASK_IS_ETH_L4_MISC_SET(mask.misc3, outer))
 			mlx5dr_ste_build_eth_l4_misc(&sb[idx++], &mask, inner, rx);
@@ -304,22 +313,18 @@ static int dr_matcher_set_ste_builders(struct mlx5dr_matcher *matcher,
 		if (DR_MASK_IS_FIRST_MPLS_SET(mask.misc2, outer))
 			mlx5dr_ste_build_mpls(&sb[idx++], &mask, inner, rx);
 
-		if (DR_MASK_IS_FLEX_PARSER_0_SET(mask.misc2))
-			mlx5dr_ste_build_flex_parser_0(&sb[idx++], &mask,
-						       inner, rx);
+		if (DR_MASK_IS_TNL_MPLS_SET(mask.misc2))
+			mlx5dr_ste_build_tnl_mpls(&sb[idx++], &mask, inner, rx);
 
-		if ((DR_MASK_IS_FLEX_PARSER_ICMPV4_SET(&mask.misc3) &&
-		     mlx5dr_matcher_supp_flex_parser_icmp_v4(&dmn->info.caps)) ||
-		    (dr_mask_is_flex_parser_icmpv6_set(&mask.misc3) &&
-		     mlx5dr_matcher_supp_flex_parser_icmp_v6(&dmn->info.caps))) {
-			ret = mlx5dr_ste_build_flex_parser_1(&sb[idx++],
-							     &mask, &dmn->info.caps,
-							     inner, rx);
+		if (dr_mask_is_flex_parser_icmp_set(&mask, dmn)) {
+			ret = mlx5dr_ste_build_icmp(&sb[idx++],
+						    &mask, &dmn->info.caps,
+						    inner, rx);
 			if (ret)
 				return ret;
 		}
-		if (dr_mask_is_gre_set(&mask.misc))
-			mlx5dr_ste_build_gre(&sb[idx++], &mask, inner, rx);
+		if (dr_mask_is_tnl_gre_set(&mask.misc))
+			mlx5dr_ste_build_tnl_gre(&sb[idx++], &mask, inner, rx);
 	}
 
 	/* Inner */
@@ -334,7 +339,7 @@ static int dr_matcher_set_ste_builders(struct mlx5dr_matcher *matcher,
 
 		if (dr_mask_is_smac_set(&mask.inner) &&
 		    dr_mask_is_dmac_set(&mask.inner)) {
-			mlx5dr_ste_build_eth_l2_src_des(&sb[idx++],
+			mlx5dr_ste_build_eth_l2_src_dst(&sb[idx++],
 							&mask, inner, rx);
 		}
 
@@ -354,8 +359,8 @@ static int dr_matcher_set_ste_builders(struct mlx5dr_matcher *matcher,
 								 inner, rx);
 
 			if (DR_MASK_IS_ETH_L4_SET(mask.inner, mask.misc, inner))
-				mlx5dr_ste_build_ipv6_l3_l4(&sb[idx++], &mask,
-							    inner, rx);
+				mlx5dr_ste_build_eth_ipv6_l3_l4(&sb[idx++], &mask,
+								inner, rx);
 		} else {
 			if (dr_mask_is_ipv4_5_tuple_set(&mask.inner))
 				mlx5dr_ste_build_eth_l3_ipv4_5_tuple(&sb[idx++], &mask,
@@ -372,8 +377,8 @@ static int dr_matcher_set_ste_builders(struct mlx5dr_matcher *matcher,
 		if (DR_MASK_IS_FIRST_MPLS_SET(mask.misc2, inner))
 			mlx5dr_ste_build_mpls(&sb[idx++], &mask, inner, rx);
 
-		if (DR_MASK_IS_FLEX_PARSER_0_SET(mask.misc2))
-			mlx5dr_ste_build_flex_parser_0(&sb[idx++], &mask, inner, rx);
+		if (DR_MASK_IS_TNL_MPLS_SET(mask.misc2))
+			mlx5dr_ste_build_tnl_mpls(&sb[idx++], &mask, inner, rx);
 	}
 	/* Empty matcher, takes all */
 	if (matcher->match_criteria == DR_MATCHER_CRITERIA_EMPTY)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_ste.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_ste.c
index b01aaec75622..d275823bff2f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_ste.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_ste.c
@@ -1090,7 +1090,7 @@ static int dr_ste_build_eth_l2_src_des_tag(struct mlx5dr_match_param *value,
 	return 0;
 }
 
-void mlx5dr_ste_build_eth_l2_src_des(struct mlx5dr_ste_build *sb,
+void mlx5dr_ste_build_eth_l2_src_dst(struct mlx5dr_ste_build *sb,
 				     struct mlx5dr_match_param *mask,
 				     bool inner, bool rx)
 {
@@ -1594,9 +1594,9 @@ static int dr_ste_build_ipv6_l3_l4_tag(struct mlx5dr_match_param *value,
 	return 0;
 }
 
-void mlx5dr_ste_build_ipv6_l3_l4(struct mlx5dr_ste_build *sb,
-				 struct mlx5dr_match_param *mask,
-				 bool inner, bool rx)
+void mlx5dr_ste_build_eth_ipv6_l3_l4(struct mlx5dr_ste_build *sb,
+				     struct mlx5dr_match_param *mask,
+				     bool inner, bool rx)
 {
 	dr_ste_build_ipv6_l3_l4_bit_mask(mask, inner, sb->bit_mask);
 
@@ -1693,8 +1693,8 @@ static int dr_ste_build_gre_tag(struct mlx5dr_match_param *value,
 	return 0;
 }
 
-void mlx5dr_ste_build_gre(struct mlx5dr_ste_build *sb,
-			  struct mlx5dr_match_param *mask, bool inner, bool rx)
+void mlx5dr_ste_build_tnl_gre(struct mlx5dr_ste_build *sb,
+			      struct mlx5dr_match_param *mask, bool inner, bool rx)
 {
 	dr_ste_build_gre_bit_mask(mask, inner, sb->bit_mask);
 
@@ -1771,9 +1771,9 @@ static int dr_ste_build_flex_parser_0_tag(struct mlx5dr_match_param *value,
 	return 0;
 }
 
-void mlx5dr_ste_build_flex_parser_0(struct mlx5dr_ste_build *sb,
-				    struct mlx5dr_match_param *mask,
-				    bool inner, bool rx)
+void mlx5dr_ste_build_tnl_mpls(struct mlx5dr_ste_build *sb,
+			       struct mlx5dr_match_param *mask,
+			       bool inner, bool rx)
 {
 	dr_ste_build_flex_parser_0_bit_mask(mask, inner, sb->bit_mask);
 
@@ -1792,8 +1792,8 @@ static int dr_ste_build_flex_parser_1_bit_mask(struct mlx5dr_match_param *mask,
 					       struct mlx5dr_cmd_caps *caps,
 					       u8 *bit_mask)
 {
+	bool is_ipv4_mask = DR_MASK_IS_ICMPV4_SET(&mask->misc3);
 	struct mlx5dr_match_misc3 *misc_3_mask = &mask->misc3;
-	bool is_ipv4_mask = DR_MASK_IS_FLEX_PARSER_ICMPV4_SET(misc_3_mask);
 	u32 icmp_header_data_mask;
 	u32 icmp_type_mask;
 	u32 icmp_code_mask;
@@ -1869,7 +1869,7 @@ static int dr_ste_build_flex_parser_1_tag(struct mlx5dr_match_param *value,
 	u32 icmp_code;
 	bool is_ipv4;
 
-	is_ipv4 = DR_MASK_IS_FLEX_PARSER_ICMPV4_SET(misc_3);
+	is_ipv4 = DR_MASK_IS_ICMPV4_SET(misc_3);
 	if (is_ipv4) {
 		icmp_header_data	= misc_3->icmpv4_header_data;
 		icmp_type		= misc_3->icmpv4_type;
@@ -1928,10 +1928,10 @@ static int dr_ste_build_flex_parser_1_tag(struct mlx5dr_match_param *value,
 	return 0;
 }
 
-int mlx5dr_ste_build_flex_parser_1(struct mlx5dr_ste_build *sb,
-				   struct mlx5dr_match_param *mask,
-				   struct mlx5dr_cmd_caps *caps,
-				   bool inner, bool rx)
+int mlx5dr_ste_build_icmp(struct mlx5dr_ste_build *sb,
+			  struct mlx5dr_match_param *mask,
+			  struct mlx5dr_cmd_caps *caps,
+			  bool inner, bool rx)
 {
 	int ret;
 
@@ -2069,9 +2069,9 @@ dr_ste_build_flex_parser_tnl_vxlan_gpe_tag(struct mlx5dr_match_param *value,
 	return 0;
 }
 
-void mlx5dr_ste_build_flex_parser_tnl_vxlan_gpe(struct mlx5dr_ste_build *sb,
-						struct mlx5dr_match_param *mask,
-						bool inner, bool rx)
+void mlx5dr_ste_build_tnl_vxlan_gpe(struct mlx5dr_ste_build *sb,
+				    struct mlx5dr_match_param *mask,
+				    bool inner, bool rx)
 {
 	dr_ste_build_flex_parser_tnl_vxlan_gpe_bit_mask(mask, inner,
 							sb->bit_mask);
@@ -2122,9 +2122,9 @@ dr_ste_build_flex_parser_tnl_geneve_tag(struct mlx5dr_match_param *value,
 	return 0;
 }
 
-void mlx5dr_ste_build_flex_parser_tnl_geneve(struct mlx5dr_ste_build *sb,
-					     struct mlx5dr_match_param *mask,
-					     bool inner, bool rx)
+void mlx5dr_ste_build_tnl_geneve(struct mlx5dr_ste_build *sb,
+				 struct mlx5dr_match_param *mask,
+				 bool inner, bool rx)
 {
 	dr_ste_build_flex_parser_tnl_geneve_bit_mask(mask, sb->bit_mask);
 	sb->rx = rx;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_types.h b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_types.h
index 77615f6d2fdf..5d0ae664aac0 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_types.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_types.h
@@ -288,7 +288,7 @@ int mlx5dr_ste_build_ste_arr(struct mlx5dr_matcher *matcher,
 			     struct mlx5dr_matcher_rx_tx *nic_matcher,
 			     struct mlx5dr_match_param *value,
 			     u8 *ste_arr);
-void mlx5dr_ste_build_eth_l2_src_des(struct mlx5dr_ste_build *builder,
+void mlx5dr_ste_build_eth_l2_src_dst(struct mlx5dr_ste_build *builder,
 				     struct mlx5dr_match_param *mask,
 				     bool inner, bool rx);
 void mlx5dr_ste_build_eth_l3_ipv4_5_tuple(struct mlx5dr_ste_build *sb,
@@ -312,31 +312,31 @@ void mlx5dr_ste_build_eth_l2_dst(struct mlx5dr_ste_build *sb,
 void mlx5dr_ste_build_eth_l2_tnl(struct mlx5dr_ste_build *sb,
 				 struct mlx5dr_match_param *mask,
 				 bool inner, bool rx);
-void mlx5dr_ste_build_ipv6_l3_l4(struct mlx5dr_ste_build *sb,
-				 struct mlx5dr_match_param *mask,
-				 bool inner, bool rx);
+void mlx5dr_ste_build_eth_ipv6_l3_l4(struct mlx5dr_ste_build *sb,
+				     struct mlx5dr_match_param *mask,
+				     bool inner, bool rx);
 void mlx5dr_ste_build_eth_l4_misc(struct mlx5dr_ste_build *sb,
 				  struct mlx5dr_match_param *mask,
 				  bool inner, bool rx);
-void mlx5dr_ste_build_gre(struct mlx5dr_ste_build *sb,
-			  struct mlx5dr_match_param *mask,
-			  bool inner, bool rx);
+void mlx5dr_ste_build_tnl_gre(struct mlx5dr_ste_build *sb,
+			      struct mlx5dr_match_param *mask,
+			      bool inner, bool rx);
 void mlx5dr_ste_build_mpls(struct mlx5dr_ste_build *sb,
 			   struct mlx5dr_match_param *mask,
 			   bool inner, bool rx);
-void mlx5dr_ste_build_flex_parser_0(struct mlx5dr_ste_build *sb,
+void mlx5dr_ste_build_tnl_mpls(struct mlx5dr_ste_build *sb,
+			       struct mlx5dr_match_param *mask,
+			       bool inner, bool rx);
+int mlx5dr_ste_build_icmp(struct mlx5dr_ste_build *sb,
+			  struct mlx5dr_match_param *mask,
+			  struct mlx5dr_cmd_caps *caps,
+			  bool inner, bool rx);
+void mlx5dr_ste_build_tnl_vxlan_gpe(struct mlx5dr_ste_build *sb,
 				    struct mlx5dr_match_param *mask,
 				    bool inner, bool rx);
-int mlx5dr_ste_build_flex_parser_1(struct mlx5dr_ste_build *sb,
-				   struct mlx5dr_match_param *mask,
-				   struct mlx5dr_cmd_caps *caps,
-				   bool inner, bool rx);
-void mlx5dr_ste_build_flex_parser_tnl_vxlan_gpe(struct mlx5dr_ste_build *sb,
-						struct mlx5dr_match_param *mask,
-						bool inner, bool rx);
-void mlx5dr_ste_build_flex_parser_tnl_geneve(struct mlx5dr_ste_build *sb,
-					     struct mlx5dr_match_param *mask,
-					     bool inner, bool rx);
+void mlx5dr_ste_build_tnl_geneve(struct mlx5dr_ste_build *sb,
+				 struct mlx5dr_match_param *mask,
+				 bool inner, bool rx);
 void mlx5dr_ste_build_general_purpose(struct mlx5dr_ste_build *sb,
 				      struct mlx5dr_match_param *mask,
 				      bool inner, bool rx);
@@ -588,9 +588,9 @@ struct mlx5dr_match_param {
 	struct mlx5dr_match_misc3 misc3;
 };
 
-#define DR_MASK_IS_FLEX_PARSER_ICMPV4_SET(_misc3) ((_misc3)->icmpv4_type || \
-						   (_misc3)->icmpv4_code || \
-						   (_misc3)->icmpv4_header_data)
+#define DR_MASK_IS_ICMPV4_SET(_misc3) ((_misc3)->icmpv4_type || \
+				       (_misc3)->icmpv4_code || \
+				       (_misc3)->icmpv4_header_data)
 
 struct mlx5dr_esw_caps {
 	u64 drop_icm_address_rx;
-- 
2.26.2



* [net-next 14/15] net/mlx5: DR, Rename matcher functions to be more HW agnostic
  2020-09-25 19:37 [pull request][net-next 00/15] mlx5 SW steering updates 2020-09-25 saeed
                   ` (12 preceding siblings ...)
  2020-09-25 19:38 ` [net-next 13/15] net/mlx5: DR, Rename builders HW specific names saeed
@ 2020-09-25 19:38 ` saeed
  2020-09-25 19:38 ` [net-next 15/15] net/mlx5: DR, Add support for rule creation with flow source hint saeed
  14 siblings, 0 replies; 23+ messages in thread
From: saeed @ 2020-09-25 19:38 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski
  Cc: netdev, Yevgeny Kliteynik, Alex Vesker, Saeed Mahameed

From: Yevgeny Kliteynik <kliteyn@nvidia.com>

Remove "flex parser" from the matcher function names, since the
matcher should not be aware of such HW-specific details.

Signed-off-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../mellanox/mlx5/core/steering/dr_cmd.c      |  4 +-
 .../mellanox/mlx5/core/steering/dr_matcher.c  | 54 +++++++++++--------
 .../mellanox/mlx5/core/steering/dr_types.h    | 12 -----
 3 files changed, 33 insertions(+), 37 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_cmd.c
index 6bd34b293007..ebc879052e42 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_cmd.c
@@ -93,12 +93,12 @@ int mlx5dr_cmd_query_device(struct mlx5_core_dev *mdev,
 	caps->gvmi		= MLX5_CAP_GEN(mdev, vhca_id);
 	caps->flex_protocols	= MLX5_CAP_GEN(mdev, flex_parser_protocols);
 
-	if (mlx5dr_matcher_supp_flex_parser_icmp_v4(caps)) {
+	if (caps->flex_protocols & MLX5_FLEX_PARSER_ICMP_V4_ENABLED) {
 		caps->flex_parser_id_icmp_dw0 = MLX5_CAP_GEN(mdev, flex_parser_id_icmp_dw0);
 		caps->flex_parser_id_icmp_dw1 = MLX5_CAP_GEN(mdev, flex_parser_id_icmp_dw1);
 	}
 
-	if (mlx5dr_matcher_supp_flex_parser_icmp_v6(caps)) {
+	if (caps->flex_protocols & MLX5_FLEX_PARSER_ICMP_V6_ENABLED) {
 		caps->flex_parser_id_icmpv6_dw0 =
 			MLX5_CAP_GEN(mdev, flex_parser_id_icmpv6_dw0);
 		caps->flex_parser_id_icmpv6_dw1 =
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_matcher.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_matcher.c
index 752afdb20e23..cb5202e17856 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_matcher.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_matcher.c
@@ -103,7 +103,7 @@ static bool dr_mask_is_tnl_gre_set(struct mlx5dr_match_misc *misc)
 	DR_MASK_IS_OUTER_MPLS_OVER_GRE_UDP_SET((_misc2), udp))
 
 static bool
-dr_mask_is_misc3_vxlan_gpe_set(struct mlx5dr_match_misc3 *misc3)
+dr_mask_is_vxlan_gpe_set(struct mlx5dr_match_misc3 *misc3)
 {
 	return (misc3->outer_vxlan_gpe_vni ||
 		misc3->outer_vxlan_gpe_next_protocol ||
@@ -111,21 +111,20 @@ dr_mask_is_misc3_vxlan_gpe_set(struct mlx5dr_match_misc3 *misc3)
 }
 
 static bool
-dr_matcher_supp_flex_parser_vxlan_gpe(struct mlx5dr_cmd_caps *caps)
+dr_matcher_supp_vxlan_gpe(struct mlx5dr_cmd_caps *caps)
 {
-	return caps->flex_protocols &
-	       MLX5_FLEX_PARSER_VXLAN_GPE_ENABLED;
+	return caps->flex_protocols & MLX5_FLEX_PARSER_VXLAN_GPE_ENABLED;
 }
 
 static bool
-dr_mask_is_flex_parser_tnl_vxlan_gpe_set(struct mlx5dr_match_param *mask,
-					 struct mlx5dr_domain *dmn)
+dr_mask_is_tnl_vxlan_gpe(struct mlx5dr_match_param *mask,
+			 struct mlx5dr_domain *dmn)
 {
-	return dr_mask_is_misc3_vxlan_gpe_set(&mask->misc3) &&
-	       dr_matcher_supp_flex_parser_vxlan_gpe(&dmn->info.caps);
+	return dr_mask_is_vxlan_gpe_set(&mask->misc3) &&
+	       dr_matcher_supp_vxlan_gpe(&dmn->info.caps);
 }
 
-static bool dr_mask_is_misc_geneve_set(struct mlx5dr_match_misc *misc)
+static bool dr_mask_is_tnl_geneve_set(struct mlx5dr_match_misc *misc)
 {
 	return misc->geneve_vni ||
 	       misc->geneve_oam ||
@@ -134,18 +133,27 @@ static bool dr_mask_is_misc_geneve_set(struct mlx5dr_match_misc *misc)
 }
 
 static bool
-dr_matcher_supp_flex_parser_geneve(struct mlx5dr_cmd_caps *caps)
+dr_matcher_supp_tnl_geneve(struct mlx5dr_cmd_caps *caps)
 {
-	return caps->flex_protocols &
-	       MLX5_FLEX_PARSER_GENEVE_ENABLED;
+	return caps->flex_protocols & MLX5_FLEX_PARSER_GENEVE_ENABLED;
 }
 
 static bool
-dr_mask_is_flex_parser_tnl_geneve_set(struct mlx5dr_match_param *mask,
-				      struct mlx5dr_domain *dmn)
+dr_mask_is_tnl_geneve(struct mlx5dr_match_param *mask,
+		      struct mlx5dr_domain *dmn)
 {
-	return dr_mask_is_misc_geneve_set(&mask->misc) &&
-	       dr_matcher_supp_flex_parser_geneve(&dmn->info.caps);
+	return dr_mask_is_tnl_geneve_set(&mask->misc) &&
+	       dr_matcher_supp_tnl_geneve(&dmn->info.caps);
+}
+
+static int dr_matcher_supp_icmp_v4(struct mlx5dr_cmd_caps *caps)
+{
+	return caps->flex_protocols & MLX5_FLEX_PARSER_ICMP_V4_ENABLED;
+}
+
+static int dr_matcher_supp_icmp_v6(struct mlx5dr_cmd_caps *caps)
+{
+	return caps->flex_protocols & MLX5_FLEX_PARSER_ICMP_V6_ENABLED;
 }
 
 static bool dr_mask_is_icmpv6_set(struct mlx5dr_match_misc3 *misc3)
@@ -154,13 +162,13 @@ static bool dr_mask_is_icmpv6_set(struct mlx5dr_match_misc3 *misc3)
 		misc3->icmpv6_header_data);
 }
 
-static bool dr_mask_is_flex_parser_icmp_set(struct mlx5dr_match_param *mask,
-					    struct mlx5dr_domain *dmn)
+static bool dr_mask_is_icmp(struct mlx5dr_match_param *mask,
+			    struct mlx5dr_domain *dmn)
 {
 	if (DR_MASK_IS_ICMPV4_SET(&mask->misc3))
-		return mlx5dr_matcher_supp_flex_parser_icmp_v4(&dmn->info.caps);
+		return dr_matcher_supp_icmp_v4(&dmn->info.caps);
 	else if (dr_mask_is_icmpv6_set(&mask->misc3))
-		return mlx5dr_matcher_supp_flex_parser_icmp_v6(&dmn->info.caps);
+		return dr_matcher_supp_icmp_v6(&dmn->info.caps);
 
 	return false;
 }
@@ -300,10 +308,10 @@ static int dr_matcher_set_ste_builders(struct mlx5dr_matcher *matcher,
 								  inner, rx);
 		}
 
-		if (dr_mask_is_flex_parser_tnl_vxlan_gpe_set(&mask, dmn))
+		if (dr_mask_is_tnl_vxlan_gpe(&mask, dmn))
 			mlx5dr_ste_build_tnl_vxlan_gpe(&sb[idx++], &mask,
 						       inner, rx);
-		else if (dr_mask_is_flex_parser_tnl_geneve_set(&mask, dmn))
+		else if (dr_mask_is_tnl_geneve(&mask, dmn))
 			mlx5dr_ste_build_tnl_geneve(&sb[idx++], &mask,
 						    inner, rx);
 
@@ -316,7 +324,7 @@ static int dr_matcher_set_ste_builders(struct mlx5dr_matcher *matcher,
 		if (DR_MASK_IS_TNL_MPLS_SET(mask.misc2))
 			mlx5dr_ste_build_tnl_mpls(&sb[idx++], &mask, inner, rx);
 
-		if (dr_mask_is_flex_parser_icmp_set(&mask, dmn)) {
+		if (dr_mask_is_icmp(&mask, dmn)) {
 			ret = mlx5dr_ste_build_icmp(&sb[idx++],
 						    &mask, &dmn->info.caps,
 						    inner, rx);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_types.h b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_types.h
index 5d0ae664aac0..3086a44f7e7f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_types.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_types.h
@@ -843,18 +843,6 @@ static inline void mlx5dr_domain_unlock(struct mlx5dr_domain *dmn)
 	mlx5dr_domain_nic_unlock(&dmn->info.rx);
 }
 
-static inline int
-mlx5dr_matcher_supp_flex_parser_icmp_v4(struct mlx5dr_cmd_caps *caps)
-{
-	return caps->flex_protocols & MLX5_FLEX_PARSER_ICMP_V4_ENABLED;
-}
-
-static inline int
-mlx5dr_matcher_supp_flex_parser_icmp_v6(struct mlx5dr_cmd_caps *caps)
-{
-	return caps->flex_protocols & MLX5_FLEX_PARSER_ICMP_V6_ENABLED;
-}
-
 int mlx5dr_matcher_select_builders(struct mlx5dr_matcher *matcher,
 				   struct mlx5dr_matcher_rx_tx *nic_matcher,
 				   enum mlx5dr_ipv outer_ipv,
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [net-next 15/15] net/mlx5: DR, Add support for rule creation with flow source hint
  2020-09-25 19:37 [pull request][net-next 00/15] mlx5 SW steering updates 2020-09-25 saeed
                   ` (13 preceding siblings ...)
  2020-09-25 19:38 ` [net-next 14/15] net/mlx5: DR, Rename matcher functions to be more HW agnostic saeed
@ 2020-09-25 19:38 ` saeed
  14 siblings, 0 replies; 23+ messages in thread
From: saeed @ 2020-09-25 19:38 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski
  Cc: netdev, Hamdan Igbaria, Alex Vesker, Saeed Mahameed

From: Hamdan Igbaria <hamdani@nvidia.com>

Skip rule creation according to the flow arrival source: on RX, skip
if the source is the local port; on TX, skip if the source is the
uplink. This information comes from the flow source hint that upper
layers pass when creating the rule.
This is needed because, for example, an FDB table has both TX and RX
tables, and inserting a rule with an encap action (a TX-only action)
would fail on RX. Relying on the flow source hint lets us skip the
RX side in such a case.
Until now we relied on metadata regc_0, where the upper layer mapped
the port, but the upper layer did not always use regc_0 for port
mapping. Now upper layers pass a flow source hint to SW steering
when creating a rule.
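
For illustration, a minimal sketch of a caller passing the hint,
using the flow-source constant that appears in the diff below; the
surrounding variables (matcher, params, actions) are assumed to be
already set up:

	/* This rule only matters for traffic arriving from the local
	 * vport, so SW steering may skip programming the RX side.
	 */
	rule = mlx5dr_rule_create(matcher, &params, num_actions, actions,
				  MLX5_FLOW_CONTEXT_FLOW_SOURCE_LOCAL_VPORT);
	if (!rule)
		return -EINVAL;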

Signed-off-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Hamdan Igbaria <hamdani@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../mellanox/mlx5/core/steering/dr_rule.c     | 41 ++++++++++---------
 .../mellanox/mlx5/core/steering/dr_types.h    |  1 +
 .../mellanox/mlx5/core/steering/fs_dr.c       |  3 +-
 .../mellanox/mlx5/core/steering/mlx5dr.h      |  3 +-
 4 files changed, 26 insertions(+), 22 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_rule.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_rule.c
index 17577181ce8f..b3c9dc032026 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_rule.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_rule.c
@@ -985,31 +985,28 @@ static enum mlx5dr_ipv dr_rule_get_ipv(struct mlx5dr_match_spec *spec)
 static bool dr_rule_skip(enum mlx5dr_domain_type domain,
 			 enum mlx5dr_ste_entry_type ste_type,
 			 struct mlx5dr_match_param *mask,
-			 struct mlx5dr_match_param *value)
+			 struct mlx5dr_match_param *value,
+			 u32 flow_source)
 {
+	bool rx = ste_type == MLX5DR_STE_TYPE_RX;
+
 	if (domain != MLX5DR_DOMAIN_TYPE_FDB)
 		return false;
 
 	if (mask->misc.source_port) {
-		if (ste_type == MLX5DR_STE_TYPE_RX)
-			if (value->misc.source_port != WIRE_PORT)
-				return true;
+		if (rx && value->misc.source_port != WIRE_PORT)
+			return true;
 
-		if (ste_type == MLX5DR_STE_TYPE_TX)
-			if (value->misc.source_port == WIRE_PORT)
-				return true;
+		if (!rx && value->misc.source_port == WIRE_PORT)
+			return true;
 	}
 
-	/* Metadata C can be used to describe the source vport */
-	if (mask->misc2.metadata_reg_c_0) {
-		if (ste_type == MLX5DR_STE_TYPE_RX)
-			if ((value->misc2.metadata_reg_c_0 & WIRE_PORT) != WIRE_PORT)
-				return true;
+	if (rx && flow_source == MLX5_FLOW_CONTEXT_FLOW_SOURCE_LOCAL_VPORT)
+		return true;
+
+	if (!rx && flow_source == MLX5_FLOW_CONTEXT_FLOW_SOURCE_UPLINK)
+		return true;
 
-		if (ste_type == MLX5DR_STE_TYPE_TX)
-			if ((value->misc2.metadata_reg_c_0 & WIRE_PORT) == WIRE_PORT)
-				return true;
-	}
 	return false;
 }
 
@@ -1038,7 +1035,8 @@ dr_rule_create_rule_nic(struct mlx5dr_rule *rule,
 
 	INIT_LIST_HEAD(&nic_rule->rule_members_list);
 
-	if (dr_rule_skip(dmn->type, nic_dmn->ste_type, &matcher->mask, param))
+	if (dr_rule_skip(dmn->type, nic_dmn->ste_type, &matcher->mask, param,
+			 rule->flow_source))
 		return 0;
 
 	hw_ste_arr = kzalloc(DR_RULE_MAX_STE_CHAIN * DR_STE_SIZE, GFP_KERNEL);
@@ -1173,7 +1171,8 @@ static struct mlx5dr_rule *
 dr_rule_create_rule(struct mlx5dr_matcher *matcher,
 		    struct mlx5dr_match_parameters *value,
 		    size_t num_actions,
-		    struct mlx5dr_action *actions[])
+		    struct mlx5dr_action *actions[],
+		    u32 flow_source)
 {
 	struct mlx5dr_domain *dmn = matcher->tbl->dmn;
 	struct mlx5dr_match_param param = {};
@@ -1188,6 +1187,7 @@ dr_rule_create_rule(struct mlx5dr_matcher *matcher,
 		return NULL;
 
 	rule->matcher = matcher;
+	rule->flow_source = flow_source;
 	INIT_LIST_HEAD(&rule->rule_actions_list);
 
 	ret = dr_rule_add_action_members(rule, num_actions, actions);
@@ -1232,13 +1232,14 @@ dr_rule_create_rule(struct mlx5dr_matcher *matcher,
 struct mlx5dr_rule *mlx5dr_rule_create(struct mlx5dr_matcher *matcher,
 				       struct mlx5dr_match_parameters *value,
 				       size_t num_actions,
-				       struct mlx5dr_action *actions[])
+				       struct mlx5dr_action *actions[],
+				       u32 flow_source)
 {
 	struct mlx5dr_rule *rule;
 
 	refcount_inc(&matcher->refcount);
 
-	rule = dr_rule_create_rule(matcher, value, num_actions, actions);
+	rule = dr_rule_create_rule(matcher, value, num_actions, actions, flow_source);
 	if (!rule)
 		refcount_dec(&matcher->refcount);
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_types.h b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_types.h
index 3086a44f7e7f..3e423c8ed22f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_types.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_types.h
@@ -796,6 +796,7 @@ struct mlx5dr_rule {
 	struct mlx5dr_rule_rx_tx rx;
 	struct mlx5dr_rule_rx_tx tx;
 	struct list_head rule_actions_list;
+	u32 flow_source;
 };
 
 void mlx5dr_rule_update_rule_member(struct mlx5dr_ste *new_ste,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/fs_dr.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/fs_dr.c
index 9b08eb557a31..96c39a17d026 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/fs_dr.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/fs_dr.c
@@ -487,7 +487,8 @@ static int mlx5_cmd_dr_create_fte(struct mlx5_flow_root_namespace *ns,
 	rule = mlx5dr_rule_create(group->fs_dr_matcher.dr_matcher,
 				  &params,
 				  num_actions,
-				  actions);
+				  actions,
+				  fte->flow_context.flow_source);
 	if (!rule) {
 		err = -EINVAL;
 		goto free_actions;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/mlx5dr.h b/drivers/net/ethernet/mellanox/mlx5/core/steering/mlx5dr.h
index 0aaba0ae9cf7..726c4cfcc399 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/mlx5dr.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/mlx5dr.h
@@ -67,7 +67,8 @@ struct mlx5dr_rule *
 mlx5dr_rule_create(struct mlx5dr_matcher *matcher,
 		   struct mlx5dr_match_parameters *value,
 		   size_t num_actions,
-		   struct mlx5dr_action *actions[]);
+		   struct mlx5dr_action *actions[],
+		   u32 flow_source);
 
 int mlx5dr_rule_destroy(struct mlx5dr_rule *rule);
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [net-next 01/15] net/mlx5: DR, Add buddy allocator utilities
  2020-09-25 19:37 ` [net-next 01/15] net/mlx5: DR, Add buddy allocator utilities saeed
@ 2020-09-26 22:15   ` David Miller
  2020-09-28 19:58     ` Yevgeny Kliteynik
  0 siblings, 1 reply; 23+ messages in thread
From: David Miller @ 2020-09-26 22:15 UTC (permalink / raw)
  To: saeed; +Cc: kuba, netdev, kliteyn, erezsh, mbloch, saeedm

From: saeed@kernel.org
Date: Fri, 25 Sep 2020 12:37:55 -0700

> From: Yevgeny Kliteynik <kliteyn@nvidia.com>
> 
> Add implementation of SW Steering variation of buddy allocator.
> 
> The buddy system for ICM memory uses 2 main data structures:
>   - Bitmap per order, that keeps the current state of allocated
>     blocks for this order
>   - Indicator for the number of available blocks per each order
> 
> In addition, there is one more hierarchy of searching in the bitmaps
> in order to accelerate the search of the next free block, which is
> done via a find-first function:
> The buddy system spends lots of its time searching for the next free
> space using find_first_bit, which scans a big array of long values
> in order to find the first set bit. We added one more array of
> longs, where each bit indicates a long value in the original array;
> that way, far fewer searches are needed for the next free area.
> 
> For example, for the following bits array of 128 bits where all
> bits are zero except for the last bit  :  0000........00001
> the corresponding bits-per-long array is:  0001
> 
> The search will be done over the bits-per-long array first, and after
> the first bit is found, we will use it as a start pointer in the
> bigger array (bits array).
> 
> Signed-off-by: Erez Shitrit <erezsh@nvidia.com>
> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
> Reviewed-by: Mark Bloch <mbloch@nvidia.com>
> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

Instead of a bits-per-long array, it seems much simpler and more
cache-friendly to just maintain a "lowest set bit" value.

In the initial state it is zero for all values, remember this is just
a hint.

When you allocate, if num_free is non-zero of course, you start the
bit scan from the "lowest set bit" value.  When the set bit is found,
update the "lowest set bit" cache to the set bit plus one (or zero if
the new value exceeds the bitmap size).

Then on free you update "lowest set bit" to the bit being set if it is
smaller than the current "lowest set bit" value for that order.

No double scanning of bitmap arrays, just a single bitmap search with
a variable start point.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* RE: [net-next 01/15] net/mlx5: DR, Add buddy allocator utilities
  2020-09-26 22:15   ` David Miller
@ 2020-09-28 19:58     ` Yevgeny Kliteynik
  2020-09-28 21:41       ` David Miller
  0 siblings, 1 reply; 23+ messages in thread
From: Yevgeny Kliteynik @ 2020-09-28 19:58 UTC (permalink / raw)
  To: David Miller, saeed
  Cc: kuba, netdev, Erez Shitrit, Mark Bloch, Saeed Mahameed

> From: David Miller <davem@davemloft.net>
> Sent: Sunday, September 27, 2020 01:16
> To: saeed@kernel.org
> Cc: kuba@kernel.org; netdev@vger.kernel.org; Yevgeny Kliteynik
> <kliteyn@nvidia.com>; Erez Shitrit <erezsh@nvidia.com>; Mark Bloch
> <mbloch@nvidia.com>; Saeed Mahameed <saeedm@nvidia.com>
> Subject: Re: [net-next 01/15] net/mlx5: DR, Add buddy allocator utilities
> 
> From: saeed@kernel.org
> Date: Fri, 25 Sep 2020 12:37:55 -0700
> 
> > From: Yevgeny Kliteynik <kliteyn@nvidia.com>
> >
> > Add implementation of SW Steering variation of buddy allocator.
> >
> > The buddy system for ICM memory uses 2 main data structures:
> >   - Bitmap per order, that keeps the current state of allocated
> >     blocks for this order
> >   - Indicator for the number of available blocks per each order
> >
> > In addition, there is one more hierarchy of searching in the bitmaps
> > in order to accelerate the search of the next free block, which is
> > done via a find-first function:
> > The buddy system spends lots of its time searching for the next free
> > space using find_first_bit, which scans a big array of long values
> > in order to find the first set bit. We added one more array of
> > longs, where each bit indicates a long value in the original array;
> > that way, far fewer searches are needed for the next free area.
> >
> > For example, for the following bits array of 128 bits where all
> > bits are zero except for the last bit  :  0000........00001
> > the corresponding bits-per-long array is:  0001
> >
> > The search will be done over the bits-per-long array first, and after
> > the first bit is found, we will use it as a start pointer in the
> > bigger array (bits array).
> >
> > Signed-off-by: Erez Shitrit <erezsh@nvidia.com>
> > Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
> > Reviewed-by: Mark Bloch <mbloch@nvidia.com>
> > Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
> 
> Instead of a bits-per-long array, it seems much simpler and more
> cache-friendly to just maintain a "lowest set bit" value.
> 
> In the initial state it is zero for all values, remember this is just a hint.
> 
> When you allocate, if num_free is non-zero of course, you start the bit scan
> from the "lowest set bit" value.  When the set bit is found, update the "lowest
> set bit" cache to the set bit plus one (or zero if the new value exceeds the bitmap
> size).

This will work when you allocate everything and free "in order".
In our case the system is more dynamic - we allocate, and then some chunks
are freed all over, creating free spots in the bit array in a rather random manner.
By replacing the bits-per-long array with a single counter we lose this ability
to jump faster to the free spot.
It is still an improvement to start from a certain place and not from the beginning,
but the bits-per-long array saves more time.
Also, in terms of memory, it's not a big hit - it's a bit per 32 chunks (or a byte per 256 chunks).
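
Roughly, the two-level lookup then goes like this (a sketch with
made-up names, not the actual patch code); on alloc/free, a summary
bit is cleared when its long in the main bitmap becomes all-zero and
set when it becomes non-zero again:

/* bits[]: 1 = free block.  summary[]: bit i set iff bits[i] != 0,
 * i.e. one summary bit per long of the main bitmap.
 */
static int two_level_find(const unsigned long *bits, unsigned int nbits,
			  const unsigned long *summary, unsigned int nlongs)
{
	unsigned int word, bit;

	/* Level 1: find the first long that has any free bit at all. */
	word = find_first_bit(summary, nlongs);
	if (word >= nlongs)
		return -ENOMEM;

	/* Level 2: scan the main bitmap starting from that long. */
	bit = find_next_bit(bits, nbits, word * BITS_PER_LONG);
	return bit < nbits ? (int)bit : -ENOMEM;
}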

-- YK

> Then on free you update "lowest set bit" to the bit being set if it is smaller than
> the current "lowest set bit" value for that order.
> 
> No double scanning of bitmap arrays, just a single bitmap search with a variable
> start point.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [net-next 01/15] net/mlx5: DR, Add buddy allocator utilities
  2020-09-28 19:58     ` Yevgeny Kliteynik
@ 2020-09-28 21:41       ` David Miller
  2020-09-29 20:44         ` Yevgeny Kliteynik
  0 siblings, 1 reply; 23+ messages in thread
From: David Miller @ 2020-09-28 21:41 UTC (permalink / raw)
  To: kliteyn; +Cc: saeed, kuba, netdev, erezsh, mbloch, saeedm

From: Yevgeny Kliteynik <kliteyn@nvidia.com>
Date: Mon, 28 Sep 2020 19:58:59 +0000

> By replacing the bits-per-long array with a single counter we lose
> this ability to jump faster to the free spot.

I don't understand why this is true, because upon the free we will
update the hint and that's where the next bit search will start.

Have you even tried my suggestion?

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [net-next 01/15] net/mlx5: DR, Add buddy allocator utilities
  2020-09-28 21:41       ` David Miller
@ 2020-09-29 20:44         ` Yevgeny Kliteynik
  2020-10-06 13:02           ` Yevgeny Kliteynik
  0 siblings, 1 reply; 23+ messages in thread
From: Yevgeny Kliteynik @ 2020-09-29 20:44 UTC (permalink / raw)
  To: David Miller; +Cc: saeed, kuba, netdev, erezsh, mbloch, saeedm


On 29-Sep-20 00:41, David Miller wrote:
> 
> From: Yevgeny Kliteynik <kliteyn@nvidia.com>
> Date: Mon, 28 Sep 2020 19:58:59 +0000
> 
>> By replacing the bits-per-long array with a single counter we lose
>> this ability to jump faster to the free spot.
> 
> I don't understand why this is true, because upon the free we will
> update the hint and that's where the next bit search will start.
> 
> Have you even tried my suggestion?

I did, but I couldn't make it work for some use cases that
I expect to be fairly common. Perhaps I misunderstood something.
Let's look at the following use case (I'll make it 'extreme' to
better illustrate the problem):

A buddy is fully allocated, mostly with small chunks.
This means that the bit array of size n will look something like this:

idx  | 0 | 1 | ...                                     |n-2|n-1|
     |---|---|-----------------------------------------|---|---|
array| 0 | 0 | ....... all zeroes - all full .......   | 0 | 0 |
     |---|---|-----------------------------------------|---|---|

Then the chunk that was allocated at index n-1 is freed, which means
that 'lowest set bit' is set to n-1.
Array will look like this:

idx  | 0 | 1 | ...                                     |n-2|n-1|
     |---|---|-----------------------------------------|---|---|
array| 0 | 0 | ....... all zeroes - all full .......   | 0 | 1 |
     |---|---|-----------------------------------------|---|---|


Then the chunk that was allocated at index 0 is freed, which means
that 'lowest set bit' is set to 0.
Array will look like this:

idx  | 0 | 1 | ...                                     |n-2|n-1|
     |---|---|-----------------------------------------|---|---|
array| 1 | 0 | ....... all zeroes - all full .......   | 0 | 1 |
     |---|---|-----------------------------------------|---|---|

Then a new allocation request comes in.
The 'lowest set bit' is 0, which allows us to find the spot.

The 'lowest set bit' is now set to 1.
Array will look like this:

idx  | 0 | 1 | ...                                     |n-2|n-1|
     |---|---|-----------------------------------------|---|---|
array| 0 | 0 | ....... all zeroes - all full .......   | 0 | 1 |
     |---|---|-----------------------------------------|---|---|

Then another allocation request comes in.
The 'lowest set bit' is 1, which means that we now need
to scan the whole array.

Is there a way to make the 'lowest set bit' hint work for
use cases similar to what I've described?

-- YK

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [net-next 01/15] net/mlx5: DR, Add buddy allocator utilities
  2020-09-29 20:44         ` Yevgeny Kliteynik
@ 2020-10-06 13:02           ` Yevgeny Kliteynik
  2020-10-06 14:47             ` David Miller
  0 siblings, 1 reply; 23+ messages in thread
From: Yevgeny Kliteynik @ 2020-10-06 13:02 UTC (permalink / raw)
  To: David Miller
  Cc: saeed, kuba, netdev, Erez Shitrit, Mark Bloch, Saeed Mahameed

On 29-Sep-20 23:44, Yevgeny Kliteynik wrote:
 >
 > On 29-Sep-20 00:41, David Miller wrote:
 >>
 >> From: Yevgeny Kliteynik <kliteyn@nvidia.com>
 >> Date: Mon, 28 Sep 2020 19:58:59 +0000
 >>
 >>> By replacing the bits-per-long array with a single counter we lose
 >>> this ability to jump faster to the free spot.
 >>
 >> I don't understand why this is true, because upon the free we will
 >> update the hint and that's where the next bit search will start.
 >>
 >> Have you even tried my suggestion?
 >
 > I did, but I couldn't make it work for some use cases that
 > I expect to be fairly common. Perhaps I misunderstood something.
 > Let's look at the following use case (I'll make it 'extreme' to
 > better illustrate the problem):

Following up on this mail.
In addition to the case I've described, I'd like to clarify why
the 'lowest set bit' hint doesn't work here.

The buddy allocator allocates blocks of different sizes, so when it
scans the bits array, the allocator looks for a free *area* of at
least the required size.
This info can't be stored in a 'lowest set bit' counter.

Furthermore, when the buddy allocator scans for such areas, it
takes into consideration the block alignment required by the HW,
which also can't be stored in an external counter.
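
For illustration, such an aligned-area scan could look roughly like
this (a naive sketch with made-up names, not the patch code):

/* Find a free run of (1 << order) blocks that starts on a
 * (1 << order)-aligned index, as the HW alignment rule requires.
 */
static int find_aligned_free_run(const unsigned long *bits,
				 unsigned int nbits, unsigned int order)
{
	unsigned int run = 1U << order;
	unsigned int start, i;

	for (start = 0; start + run <= nbits; start += run) {
		for (i = 0; i < run; i++)
			if (!test_bit(start + i, bits))
				break;
		if (i == run)
			return start;	/* aligned free area found */
	}
	return -ENOMEM;
}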

The code here implements a standard buddy allocator with a small
optimization: an additional level to speed up the search.
Do you see something wrong with this additional level?

-- YK


 > A buddy is fully allocated, mostly with small chunks.
 > This means that the bit array of size n will look something like this:
 >
 > idx  | 0 | 1 | ...                                     |n-2|n-1|
 >      |---|---|-----------------------------------------|---|---|
 > array| 0 | 0 | ....... all zeroes - all full .......   | 0 | 0 |
 >      |---|---|-----------------------------------------|---|---|
 >
 > Then the chunk that was allocated at index n-1 is freed, which means
 > that 'lowest set bit' is set to n-1.
 > Array will look like this:
 >
 > idx  | 0 | 1 | ...                                     |n-2|n-1|
 >      |---|---|-----------------------------------------|---|---|
 > array| 0 | 0 | ....... all zeroes - all full .......   | 0 | 1 |
 >      |---|---|-----------------------------------------|---|---|
 >
 >
 > Then the chunk that was allocated at index 0 is freed, which means
 > that 'lowest set bit' is set to 0.
 > Array will look like this:
 >
 > idx  | 0 | 1 | ...                                     |n-2|n-1|
 >      |---|---|-----------------------------------------|---|---|
 > array| 1 | 0 | ....... all zeroes - all full .......   | 0 | 1 |
 >      |---|---|-----------------------------------------|---|---|
 >
 > Then a new allocation request comes in.
 > The 'lowest set bit' is 0, which allows us to find the spot.
 >
 > The 'lowest set bit' is now set to 1.
 > Array will look like this:
 >
 > idx  | 0 | 1 | ...                                     |n-2|n-1|
 >      |---|---|-----------------------------------------|---|---|
 > array| 0 | 0 | ....... all zeroes - all full .......   | 0 | 1 |
 >      |---|---|-----------------------------------------|---|---|
 >
 > Then another allocation request comes in.
 > The 'lowest set bit' is 1, which means that we now need
 > to scan the whole array.
 >
 > Is there a way to make the 'lowest set bit' hint work for
 > use cases similar to what I've described?
 >
 > -- YK
 >

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [net-next 01/15] net/mlx5: DR, Add buddy allocator utilities
  2020-10-06 13:02           ` Yevgeny Kliteynik
@ 2020-10-06 14:47             ` David Miller
  2020-10-07 21:25               ` Yevgeny Kliteynik
  0 siblings, 1 reply; 23+ messages in thread
From: David Miller @ 2020-10-06 14:47 UTC (permalink / raw)
  To: kliteyn; +Cc: saeed, kuba, netdev, erezsh, mbloch, saeedm

From: Yevgeny Kliteynik <kliteyn@nvidia.com>
Date: Tue, 6 Oct 2020 16:02:24 +0300

> Buddy allocator allocates blocks of different sizes, so when it
> scans the bits array, the allocator looks for free *area* of at
> least the required size.
> Can't store this info in a 'lowest set bit' counter.

If you make it per-order, why not?

> Furthermore, when buddy allocator scans for such areas, it
> takes into consideration blocks alignment as required by HW,
> which can't be stored in an external counter.

That's why you scan the bits, which you have to do anyways.

I'm kinda disappointed in how this discussion is going, to be honest.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [net-next 01/15] net/mlx5: DR, Add buddy allocator utilities
  2020-10-06 14:47             ` David Miller
@ 2020-10-07 21:25               ` Yevgeny Kliteynik
  0 siblings, 0 replies; 23+ messages in thread
From: Yevgeny Kliteynik @ 2020-10-07 21:25 UTC (permalink / raw)
  To: David Miller; +Cc: saeed, kuba, netdev, erezsh, mbloch, saeedm

On 06-Oct-20 17:47, David Miller wrote:
 > From: Yevgeny Kliteynik <kliteyn@nvidia.com>
 > Date: Tue, 6 Oct 2020 16:02:24 +0300
 >
 >> Buddy allocator allocates blocks of different sizes, so when it
 >> scans the bits array, the allocator looks for free *area* of at
 >> least the required size.
 >> Can't store this info in a 'lowest set bit' counter.
 >
 > If you make it per-order, why not?

OK, I’ll send the V2 series with the standard allocator.
The optimization will be handled in a separate patch later on.

Thanks


^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2020-10-07 21:26 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-25 19:37 [pull request][net-next 00/15] mlx5 SW steering updates 2020-09-25 saeed
2020-09-25 19:37 ` [net-next 01/15] net/mlx5: DR, Add buddy allocator utilities saeed
2020-09-26 22:15   ` David Miller
2020-09-28 19:58     ` Yevgeny Kliteynik
2020-09-28 21:41       ` David Miller
2020-09-29 20:44         ` Yevgeny Kliteynik
2020-10-06 13:02           ` Yevgeny Kliteynik
2020-10-06 14:47             ` David Miller
2020-10-07 21:25               ` Yevgeny Kliteynik
2020-09-25 19:37 ` [net-next 02/15] net/mlx5: DR, Handle ICM memory via buddy allocation instead of bucket management saeed
2020-09-25 19:37 ` [net-next 03/15] net/mlx5: DR, Sync chunks only during free saeed
2020-09-25 19:37 ` [net-next 04/15] net/mlx5: DR, ICM memory pools sync optimization saeed
2020-09-25 19:37 ` [net-next 05/15] net/mlx5: DR, Free buddy ICM memory if it is unused saeed
2020-09-25 19:38 ` [net-next 06/15] net/mlx5: E-switch, Use PF num in metadata reg c0 saeed
2020-09-25 19:38 ` [net-next 07/15] net/mlx5: DR, Replace the check for valid STE entry saeed
2020-09-25 19:38 ` [net-next 08/15] net/mlx5: DR, Remove unneeded check from source port builder saeed
2020-09-25 19:38 ` [net-next 09/15] net/mlx5: DR, Remove unneeded vlan check from L2 builder saeed
2020-09-25 19:38 ` [net-next 10/15] net/mlx5: DR, Remove unneeded local variable saeed
2020-09-25 19:38 ` [net-next 11/15] net/mlx5: DR, Call ste_builder directly with tag pointer saeed
2020-09-25 19:38 ` [net-next 12/15] net/mlx5: DR, Remove unused member of action struct saeed
2020-09-25 19:38 ` [net-next 13/15] net/mlx5: DR, Rename builders HW specific names saeed
2020-09-25 19:38 ` [net-next 14/15] net/mlx5: DR, Rename matcher functions to be more HW agnostic saeed
2020-09-25 19:38 ` [net-next 15/15] net/mlx5: DR, Add support for rule creation with flow source hint saeed
