* [PATCH xfrm-next v3 0/6] Extend XFRM core to allow full offload configuration
From: Leon Romanovsky @ 2022-08-23 13:31 UTC (permalink / raw)
  To: Steffen Klassert
  Cc: Leon Romanovsky, David S. Miller, Eric Dumazet, Herbert Xu,
	Jakub Kicinski, netdev, Paolo Abeni, Raed Salem, Saeed Mahameed

From: Leon Romanovsky <leonro@nvidia.com>

Changelog:
v3:
 * I didn't hear any suggestion of what term to use instead of
   "full offload", so I left it as is. It is used in commit messages
   and documentation only and is easy to rename.
 * Added performance data and background info to cover letter
 * Reused xfrm_output_resume() function to support multiple XFRM transformations
 * Added PMTU check in addition to driver .xdo_dev_offload_ok validation
 * Documentation is in progress, but not part of this series yet.
v2: https://lore.kernel.org/all/cover.1660639789.git.leonro@nvidia.com
 * Rebased to latest 6.0-rc1
 * Added an extra check in the TX datapath patch to validate packets before
   forwarding to HW.
 * Added policy cleanup logic in case of netdev down event
v1: https://lore.kernel.org/all/cover.1652851393.git.leonro@nvidia.com
 * Moved comment to be before if (...) in third patch.
v0: https://lore.kernel.org/all/cover.1652176932.git.leonro@nvidia.com
-----------------------------------------------------------------------

The following series extends XFRM core code to handle a new type of IPsec
offload - full offload.

In this mode, the HW is going to be responsible for the whole data path,
so both policy and state should be offloaded.

IPsec full offload is an improved version of IPsec crypto mode.
In full mode, the HW is responsible for trimming/adding headers in
addition to decryption/encryption. In this mode, the packet arrives at
the stack already decrypted, and vice versa for TX (it exits to the HW
unencrypted).

Devices that implement IPsec full offload mode offload policies too.
In the RX path, this creates a situation where the HW can't effectively
handle mixed SW and HW priorities unless users make sure that HW-offloaded
policies have higher priorities.

To make sure that users have a coherent picture, we require that
HW-offloaded policies always have higher priorities (both RX and TX)
than SW ones.

To avoid over-engineering the code, HW policies are treated as SW ones and
don't take the netdev into account, which allows reuse of the same
priorities for different devices.

There are several deliberate limitations:
 * No software fallback
 * Fragments are dropped, both in RX and TX
 * No socket policies
 * Only IPsec transport mode is implemented

================================================================================
Configuration:

iproute2: https://lore.kernel.org/netdev/cover.1652179360.git.leonro@nvidia.com/

Full offload mode:
  ip xfrm state offload full dev <if-name> dir <in|out>
  ip xfrm policy .... offload full dev <if-name>
Crypto offload mode:
  ip xfrm state offload crypto dev <if-name> dir <in|out>
or (backward compatibility)
  ip xfrm state offload dev <if-name> dir <in|out>

================================================================================
Performance results:

TCP multi-stream, using iperf3 instance per-CPU.
+----------------------+--------+--------+--------+--------+---------+---------+
|                      | 1 CPU  | 2 CPUs | 4 CPUs | 8 CPUs | 16 CPUs | 32 CPUs |
|                      +--------+--------+--------+--------+---------+---------+
|                      |                       BW (Gbps)                        |
+----------------------+--------+--------+--------+--------+---------+---------+
| Baseline             | 27.9   | 59     | 93.1   | 92.8   | 93.7    | 94.4    |
+----------------------+--------+--------+--------+--------+---------+---------+
| Software IPsec       | 6      | 11.9   | 23.3   | 45.9   | 83.8    | 91.8    |
+----------------------+--------+--------+--------+--------+---------+---------+
| IPsec crypto offload | 15     | 29.7   | 58.5   | 89.6   | 90.4    | 90.8    |
+----------------------+--------+--------+--------+--------+---------+---------+
| IPsec full offload   | 28     | 57     | 90.7   | 91     | 91.3    | 91.9    |
+----------------------+--------+--------+--------+--------+---------+---------+

IPsec full offload mode behaves like the baseline and reaches line rate
with the same number of CPUs.

Setup details (similar for both sides):
* NIC: ConnectX6-DX dual port, 100 Gbps each.
  Single port used in the tests.
* CPU: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz

================================================================================
Series together with mlx5 part:
https://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma.git/log/?h=xfrm-next

Thanks

Leon Romanovsky (6):
  xfrm: add new full offload flag
  xfrm: allow state full offload mode
  xfrm: add an interface to offload policy
  xfrm: add TX datapath support for IPsec full offload mode
  xfrm: add RX datapath protection for IPsec full offload mode
  xfrm: enforce separation between priorities of HW/SW policies

 .../inline_crypto/ch_ipsec/chcr_ipsec.c       |   4 +
 .../net/ethernet/intel/ixgbe/ixgbe_ipsec.c    |   5 +
 drivers/net/ethernet/intel/ixgbevf/ipsec.c    |   5 +
 .../mellanox/mlx5/core/en_accel/ipsec.c       |   4 +
 drivers/net/netdevsim/ipsec.c                 |   5 +
 include/linux/netdevice.h                     |   3 +
 include/net/netns/xfrm.h                      |   8 +-
 include/net/xfrm.h                            | 104 +++++++---
 include/uapi/linux/xfrm.h                     |   6 +
 net/xfrm/xfrm_device.c                        | 102 +++++++++-
 net/xfrm/xfrm_output.c                        |  12 +-
 net/xfrm/xfrm_policy.c                        | 181 ++++++++++++++++++
 net/xfrm/xfrm_user.c                          |  19 ++
 13 files changed, 425 insertions(+), 33 deletions(-)

-- 
2.37.2



* [PATCH xfrm-next v3 1/6] xfrm: add new full offload flag
From: Leon Romanovsky @ 2022-08-23 13:31 UTC (permalink / raw)
  To: Steffen Klassert
  Cc: Leon Romanovsky, David S. Miller, Eric Dumazet, Herbert Xu,
	Jakub Kicinski, netdev, Paolo Abeni, Raed Salem, Saeed Mahameed

From: Leon Romanovsky <leonro@nvidia.com>

In the next patches, the xfrm core code will be extended to support a
new type of offload - full offload. In that mode, both policy and state
should be specially configured in order to offload the whole data path.

Full offload takes care of encryption, decryption, encapsulation and
other header manipulation.

As this mode is new to the XFRM policy flow, we can "start fresh" with
the flag bits and release the first and second bits for future use.
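
As a purely illustrative sketch (only struct xfrm_user_offload, its
fields and the XFRM_OFFLOAD_* flags come from this patch; the foo_*
helper is made up), a userspace tool could request full offload when
creating a state by filling the XFRMA_OFFLOAD_DEV attribute like this:

#include <linux/netlink.h>
#include <linux/xfrm.h>
#include <net/if.h>

static void foo_request_full_offload(struct nlmsghdr *nlh, const char *ifname)
{
	struct xfrm_user_offload xuo = {};

	xuo.ifindex = if_nametoindex(ifname);	/* offloading netdev */
	xuo.flags = XFRM_OFFLOAD_FULL;		/* the new bit */

	/* foo_nla_put() stands in for a netlink attribute helper that
	 * appends XFRMA_OFFLOAD_DEV to the XFRM_MSG_NEWSA request.
	 */
	foo_nla_put(nlh, XFRMA_OFFLOAD_DEV, &xuo, sizeof(xuo));
}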

Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 include/net/xfrm.h        | 7 +++++++
 include/uapi/linux/xfrm.h | 6 ++++++
 net/xfrm/xfrm_device.c    | 3 +++
 net/xfrm/xfrm_user.c      | 2 ++
 4 files changed, 18 insertions(+)

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 6e8fa98f786f..b4d487053dfd 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -131,12 +131,19 @@ enum {
 	XFRM_DEV_OFFLOAD_OUT,
 };
 
+enum {
+	XFRM_DEV_OFFLOAD_UNSPECIFIED,
+	XFRM_DEV_OFFLOAD_CRYPTO,
+	XFRM_DEV_OFFLOAD_FULL,
+};
+
 struct xfrm_dev_offload {
 	struct net_device	*dev;
 	netdevice_tracker	dev_tracker;
 	struct net_device	*real_dev;
 	unsigned long		offload_handle;
 	u8			dir : 2;
+	u8			type : 2;
 };
 
 struct xfrm_mode {
diff --git a/include/uapi/linux/xfrm.h b/include/uapi/linux/xfrm.h
index b1f3e6a8f11a..3b084246d610 100644
--- a/include/uapi/linux/xfrm.h
+++ b/include/uapi/linux/xfrm.h
@@ -519,6 +519,12 @@ struct xfrm_user_offload {
  */
 #define XFRM_OFFLOAD_IPV6	1
 #define XFRM_OFFLOAD_INBOUND	2
+/* Two bits above are relevant for state path only, while
+ * offload is used for both policy and state flows.
+ *
+ * In policy offload mode, they are free and can be safely reused.
+ */
+#define XFRM_OFFLOAD_FULL	4
 
 struct xfrm_userpolicy_default {
 #define XFRM_USERPOLICY_UNSPEC	0
diff --git a/net/xfrm/xfrm_device.c b/net/xfrm/xfrm_device.c
index 637ca8838436..6d1124eb1ec8 100644
--- a/net/xfrm/xfrm_device.c
+++ b/net/xfrm/xfrm_device.c
@@ -270,12 +270,15 @@ int xfrm_dev_state_add(struct net *net, struct xfrm_state *x,
 	else
 		xso->dir = XFRM_DEV_OFFLOAD_OUT;
 
+	xso->type = XFRM_DEV_OFFLOAD_CRYPTO;
+
 	err = dev->xfrmdev_ops->xdo_dev_state_add(x);
 	if (err) {
 		xso->dev = NULL;
 		xso->dir = 0;
 		xso->real_dev = NULL;
 		netdev_put(dev, &xso->dev_tracker);
+		xso->type = XFRM_DEV_OFFLOAD_UNSPECIFIED;
 
 		if (err != -EOPNOTSUPP)
 			return err;
diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index 2ff017117730..9c0aef815730 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -854,6 +854,8 @@ static int copy_user_offload(struct xfrm_dev_offload *xso, struct sk_buff *skb)
 	xuo->ifindex = xso->dev->ifindex;
 	if (xso->dir == XFRM_DEV_OFFLOAD_IN)
 		xuo->flags = XFRM_OFFLOAD_INBOUND;
+	if (xso->type == XFRM_DEV_OFFLOAD_FULL)
+		xuo->flags |= XFRM_OFFLOAD_FULL;
 
 	return 0;
 }
-- 
2.37.2



* [PATCH xfrm-next v3 2/6] xfrm: allow state full offload mode
From: Leon Romanovsky @ 2022-08-23 13:31 UTC (permalink / raw)
  To: Steffen Klassert
  Cc: Herbert Xu, Rohit Maheshwari, netdev, Raed Salem,
	Vinay Kumar Yadav, Saeed Mahameed, Eric Dumazet, intel-wired-lan,
	Ayush Sawal, Jakub Kicinski, Paolo Abeni, Leon Romanovsky,
	David S. Miller

From: Leon Romanovsky <leonro@nvidia.com>

Allow users to configure xfrm states with full offload mode.
Full mode must be requested for both policy and state, which
requires that we not implement a fallback.

We explicitly return an error if the requested full mode can't
be configured.
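
For driver authors, a minimal hypothetical sketch (the foo_* names are
made up) of how an .xdo_dev_state_add() implementation can distinguish
the offload types. Note that, per the core change below, returning
-EOPNOTSUPP in full offload mode is treated as a driver bug, so
unsupported requests are rejected with -EINVAL:

static int foo_ipsec_add_sa(struct xfrm_state *xs)
{
	switch (xs->xso.type) {
	case XFRM_DEV_OFFLOAD_CRYPTO:
		/* existing crypto-only offload path */
		return foo_add_crypto_sa(xs);
	case XFRM_DEV_OFFLOAD_FULL:
		/* program the policy-aware HW datapath */
		return foo_add_full_sa(xs);
	default:
		return -EINVAL;
	}
}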

Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 .../inline_crypto/ch_ipsec/chcr_ipsec.c       |  4 ++++
 .../net/ethernet/intel/ixgbe/ixgbe_ipsec.c    |  5 ++++
 drivers/net/ethernet/intel/ixgbevf/ipsec.c    |  5 ++++
 .../mellanox/mlx5/core/en_accel/ipsec.c       |  4 ++++
 drivers/net/netdevsim/ipsec.c                 |  5 ++++
 net/xfrm/xfrm_device.c                        | 24 +++++++++++++++----
 6 files changed, 42 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/inline_crypto/ch_ipsec/chcr_ipsec.c b/drivers/net/ethernet/chelsio/inline_crypto/ch_ipsec/chcr_ipsec.c
index 585590520076..ca21794281d6 100644
--- a/drivers/net/ethernet/chelsio/inline_crypto/ch_ipsec/chcr_ipsec.c
+++ b/drivers/net/ethernet/chelsio/inline_crypto/ch_ipsec/chcr_ipsec.c
@@ -283,6 +283,10 @@ static int ch_ipsec_xfrm_add_state(struct xfrm_state *x)
 		pr_debug("Cannot offload xfrm states with geniv other than seqiv\n");
 		return -EINVAL;
 	}
+	if (x->xso.type != XFRM_DEV_OFFLOAD_CRYPTO) {
+		pr_debug("Unsupported xfrm offload\n");
+		return -EINVAL;
+	}
 
 	sa_entry = kzalloc(sizeof(*sa_entry), GFP_KERNEL);
 	if (!sa_entry) {
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
index 774de63dd93a..53a969e34883 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
@@ -585,6 +585,11 @@ static int ixgbe_ipsec_add_sa(struct xfrm_state *xs)
 		return -EINVAL;
 	}
 
+	if (xs->xso.type != XFRM_DEV_OFFLOAD_CRYPTO) {
+		netdev_err(dev, "Unsupported ipsec offload type\n");
+		return -EINVAL;
+	}
+
 	if (xs->xso.dir == XFRM_DEV_OFFLOAD_IN) {
 		struct rx_sa rsa;
 
diff --git a/drivers/net/ethernet/intel/ixgbevf/ipsec.c b/drivers/net/ethernet/intel/ixgbevf/ipsec.c
index 9984ebc62d78..c1cf540d162a 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ipsec.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ipsec.c
@@ -280,6 +280,11 @@ static int ixgbevf_ipsec_add_sa(struct xfrm_state *xs)
 		return -EINVAL;
 	}
 
+	if (xs->xso.type != XFRM_DEV_OFFLOAD_CRYPTO) {
+		netdev_err(dev, "Unsupported ipsec offload type\n");
+		return -EINVAL;
+	}
+
 	if (xs->xso.dir == XFRM_DEV_OFFLOAD_IN) {
 		struct rx_sa rsa;
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c
index 2a8fd7020622..c182b640b80d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c
@@ -256,6 +256,10 @@ static inline int mlx5e_xfrm_validate_state(struct xfrm_state *x)
 		netdev_info(netdev, "Cannot offload xfrm states with geniv other than seqiv\n");
 		return -EINVAL;
 	}
+	if (x->xso.type != XFRM_DEV_OFFLOAD_CRYPTO) {
+		netdev_info(netdev, "Unsupported xfrm offload type\n");
+		return -EINVAL;
+	}
 	return 0;
 }
 
diff --git a/drivers/net/netdevsim/ipsec.c b/drivers/net/netdevsim/ipsec.c
index 386336a38f34..b93baf5c8bee 100644
--- a/drivers/net/netdevsim/ipsec.c
+++ b/drivers/net/netdevsim/ipsec.c
@@ -149,6 +149,11 @@ static int nsim_ipsec_add_sa(struct xfrm_state *xs)
 		return -EINVAL;
 	}
 
+	if (xs->xso.type != XFRM_DEV_OFFLOAD_CRYPTO) {
+		netdev_err(dev, "Unsupported ipsec offload type\n");
+		return -EINVAL;
+	}
+
 	/* find the first unused index */
 	ret = nsim_ipsec_find_empty_idx(ipsec);
 	if (ret < 0) {
diff --git a/net/xfrm/xfrm_device.c b/net/xfrm/xfrm_device.c
index 6d1124eb1ec8..5b04e5cdca64 100644
--- a/net/xfrm/xfrm_device.c
+++ b/net/xfrm/xfrm_device.c
@@ -215,6 +215,7 @@ int xfrm_dev_state_add(struct net *net, struct xfrm_state *x,
 	struct xfrm_dev_offload *xso = &x->xso;
 	xfrm_address_t *saddr;
 	xfrm_address_t *daddr;
+	bool is_full_offload;
 
 	if (!x->type_offload)
 		return -EINVAL;
@@ -223,9 +224,11 @@ int xfrm_dev_state_add(struct net *net, struct xfrm_state *x,
 	if (x->encap || x->tfcpad)
 		return -EINVAL;
 
-	if (xuo->flags & ~(XFRM_OFFLOAD_IPV6 | XFRM_OFFLOAD_INBOUND))
+	if (xuo->flags &
+	    ~(XFRM_OFFLOAD_IPV6 | XFRM_OFFLOAD_INBOUND | XFRM_OFFLOAD_FULL))
 		return -EINVAL;
 
+	is_full_offload = xuo->flags & XFRM_OFFLOAD_FULL;
 	dev = dev_get_by_index(net, xuo->ifindex);
 	if (!dev) {
 		if (!(xuo->flags & XFRM_OFFLOAD_INBOUND)) {
@@ -240,7 +243,7 @@ int xfrm_dev_state_add(struct net *net, struct xfrm_state *x,
 					x->props.family,
 					xfrm_smark_get(0, x));
 		if (IS_ERR(dst))
-			return 0;
+			return (is_full_offload) ? -EINVAL : 0;
 
 		dev = dst->dev;
 
@@ -251,7 +254,7 @@ int xfrm_dev_state_add(struct net *net, struct xfrm_state *x,
 	if (!dev->xfrmdev_ops || !dev->xfrmdev_ops->xdo_dev_state_add) {
 		xso->dev = NULL;
 		dev_put(dev);
-		return 0;
+		return (is_full_offload) ? -EINVAL : 0;
 	}
 
 	if (x->props.flags & XFRM_STATE_ESN &&
@@ -270,7 +273,10 @@ int xfrm_dev_state_add(struct net *net, struct xfrm_state *x,
 	else
 		xso->dir = XFRM_DEV_OFFLOAD_OUT;
 
-	xso->type = XFRM_DEV_OFFLOAD_CRYPTO;
+	if (is_full_offload)
+		xso->type = XFRM_DEV_OFFLOAD_FULL;
+	else
+		xso->type = XFRM_DEV_OFFLOAD_CRYPTO;
 
 	err = dev->xfrmdev_ops->xdo_dev_state_add(x);
 	if (err) {
@@ -280,7 +286,15 @@ int xfrm_dev_state_add(struct net *net, struct xfrm_state *x,
 		netdev_put(dev, &xso->dev_tracker);
 		xso->type = XFRM_DEV_OFFLOAD_UNSPECIFIED;
 
-		if (err != -EOPNOTSUPP)
+		/* The user explicitly requested full offload mode and
+		 * configured a policy in addition to the XFRM state. So be
+		 * civil to users and return an error instead of the fallback.
+		 *
+		 * This WARN_ON() serves as documentation for driver authors:
+		 * do not return -EOPNOTSUPP in full offload mode.
+		 */
+		WARN_ON(err == -EOPNOTSUPP && is_full_offload);
+		if (err != -EOPNOTSUPP || is_full_offload)
 			return err;
 	}
 
-- 
2.37.2


* [PATCH xfrm-next v3 3/6] xfrm: add an interface to offload policy
From: Leon Romanovsky @ 2022-08-23 13:32 UTC (permalink / raw)
  To: Steffen Klassert
  Cc: Leon Romanovsky, David S. Miller, Eric Dumazet, Herbert Xu,
	Jakub Kicinski, netdev, Paolo Abeni, Raed Salem, Saeed Mahameed

From: Leon Romanovsky <leonro@nvidia.com>

Extend the netlink interface to add and delete XFRM policies from the
device. This functionality is a first step toward implementing a full
IPsec offload solution.
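
A hypothetical driver-side sketch (all foo_* names are made up) of how
the three new callbacks introduced here plug into struct xfrmdev_ops,
next to the existing state hooks:

static int foo_xfrm_add_policy(struct xfrm_policy *xp)
{
	/* the core has already set xp->xdo.type and xp->xdo.dir;
	 * foo_hw_install_policy() stands in for programming the
	 * match + action into the HW steering tables.
	 */
	return foo_hw_install_policy(xp);
}

static const struct xfrmdev_ops foo_xfrmdev_ops = {
	.xdo_dev_state_add	= foo_ipsec_add_sa,
	.xdo_dev_state_delete	= foo_ipsec_del_sa,
	.xdo_dev_state_free	= foo_ipsec_free_sa,
	/* new policy hooks added by this patch */
	.xdo_dev_policy_add	= foo_xfrm_add_policy,
	.xdo_dev_policy_delete	= foo_xfrm_del_policy,
	.xdo_dev_policy_free	= foo_xfrm_free_policy,
};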

Signed-off-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 include/linux/netdevice.h |  3 ++
 include/net/xfrm.h        | 42 ++++++++++++++++++++++++
 net/xfrm/xfrm_device.c    | 61 ++++++++++++++++++++++++++++++++++-
 net/xfrm/xfrm_policy.c    | 68 +++++++++++++++++++++++++++++++++++++++
 net/xfrm/xfrm_user.c      | 17 ++++++++++
 5 files changed, 190 insertions(+), 1 deletion(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 64e8662632f8..f5a49d2b5992 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1012,6 +1012,9 @@ struct xfrmdev_ops {
 	bool	(*xdo_dev_offload_ok) (struct sk_buff *skb,
 				       struct xfrm_state *x);
 	void	(*xdo_dev_state_advance_esn) (struct xfrm_state *x);
+	int	(*xdo_dev_policy_add) (struct xfrm_policy *x);
+	void	(*xdo_dev_policy_delete) (struct xfrm_policy *x);
+	void	(*xdo_dev_policy_free) (struct xfrm_policy *x);
 };
 #endif
 
diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index b4d487053dfd..587697eb1d31 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -129,6 +129,7 @@ struct xfrm_state_walk {
 enum {
 	XFRM_DEV_OFFLOAD_IN = 1,
 	XFRM_DEV_OFFLOAD_OUT,
+	XFRM_DEV_OFFLOAD_FWD,
 };
 
 enum {
@@ -534,6 +535,8 @@ struct xfrm_policy {
 	struct xfrm_tmpl       	xfrm_vec[XFRM_MAX_DEPTH];
 	struct hlist_node	bydst_inexact_list;
 	struct rcu_head		rcu;
+
+	struct xfrm_dev_offload xdo;
 };
 
 static inline struct net *xp_net(const struct xfrm_policy *xp)
@@ -1577,6 +1580,7 @@ struct xfrm_state *xfrm_find_acq_byseq(struct net *net, u32 mark, u32 seq);
 int xfrm_state_delete(struct xfrm_state *x);
 int xfrm_state_flush(struct net *net, u8 proto, bool task_valid, bool sync);
 int xfrm_dev_state_flush(struct net *net, struct net_device *dev, bool task_valid);
+int xfrm_dev_policy_flush(struct net *net, struct net_device *dev, bool task_valid);
 void xfrm_sad_getinfo(struct net *net, struct xfrmk_sadinfo *si);
 void xfrm_spd_getinfo(struct net *net, struct xfrmk_spdinfo *si);
 u32 xfrm_replay_seqhi(struct xfrm_state *x, __be32 net_seq);
@@ -1887,6 +1891,8 @@ void xfrm_dev_backlog(struct softnet_data *sd);
 struct sk_buff *validate_xmit_xfrm(struct sk_buff *skb, netdev_features_t features, bool *again);
 int xfrm_dev_state_add(struct net *net, struct xfrm_state *x,
 		       struct xfrm_user_offload *xuo);
+int xfrm_dev_policy_add(struct net *net, struct xfrm_policy *xp,
+		       struct xfrm_user_offload *xuo, u8 dir);
 bool xfrm_dev_offload_ok(struct sk_buff *skb, struct xfrm_state *x);
 
 static inline void xfrm_dev_state_advance_esn(struct xfrm_state *x)
@@ -1935,6 +1941,28 @@ static inline void xfrm_dev_state_free(struct xfrm_state *x)
 		netdev_put(dev, &xso->dev_tracker);
 	}
 }
+
+static inline void xfrm_dev_policy_delete(struct xfrm_policy *x)
+{
+	struct xfrm_dev_offload *xdo = &x->xdo;
+	struct net_device *dev = xdo->dev;
+
+	if (dev && dev->xfrmdev_ops && dev->xfrmdev_ops->xdo_dev_policy_delete)
+		dev->xfrmdev_ops->xdo_dev_policy_delete(x);
+}
+
+static inline void xfrm_dev_policy_free(struct xfrm_policy *x)
+{
+	struct xfrm_dev_offload *xdo = &x->xdo;
+	struct net_device *dev = xdo->dev;
+
+	if (dev && dev->xfrmdev_ops) {
+		if (dev->xfrmdev_ops->xdo_dev_policy_free)
+			dev->xfrmdev_ops->xdo_dev_policy_free(x);
+		xdo->dev = NULL;
+		netdev_put(dev, &xdo->dev_tracker);
+	}
+}
 #else
 static inline void xfrm_dev_resume(struct sk_buff *skb)
 {
@@ -1962,6 +1990,20 @@ static inline void xfrm_dev_state_free(struct xfrm_state *x)
 {
 }
 
+static inline int xfrm_dev_policy_add(struct net *net, struct xfrm_policy *xp,
+				      struct xfrm_user_offload *xuo, u8 dir)
+{
+	return 0;
+}
+
+static inline void xfrm_dev_policy_delete(struct xfrm_policy *x)
+{
+}
+
+static inline void xfrm_dev_policy_free(struct xfrm_policy *x)
+{
+}
+
 static inline bool xfrm_dev_offload_ok(struct sk_buff *skb, struct xfrm_state *x)
 {
 	return false;
diff --git a/net/xfrm/xfrm_device.c b/net/xfrm/xfrm_device.c
index 5b04e5cdca64..1cc482e9c87d 100644
--- a/net/xfrm/xfrm_device.c
+++ b/net/xfrm/xfrm_device.c
@@ -302,6 +302,63 @@ int xfrm_dev_state_add(struct net *net, struct xfrm_state *x,
 }
 EXPORT_SYMBOL_GPL(xfrm_dev_state_add);
 
+int xfrm_dev_policy_add(struct net *net, struct xfrm_policy *xp,
+			struct xfrm_user_offload *xuo, u8 dir)
+{
+	struct xfrm_dev_offload *xdo = &xp->xdo;
+	struct net_device *dev;
+	int err;
+
+	if (!xuo->flags || xuo->flags & ~XFRM_OFFLOAD_FULL)
+		/* Only full offload mode is supported, which means
+		 * that the user must set the XFRM_OFFLOAD_FULL bit.
+		 */
+		return -EINVAL;
+
+	dev = dev_get_by_index(net, xuo->ifindex);
+	if (!dev)
+		return -EINVAL;
+
+	if (!dev->xfrmdev_ops || !dev->xfrmdev_ops->xdo_dev_policy_add) {
+		xdo->dev = NULL;
+		dev_put(dev);
+		return -EINVAL;
+	}
+
+	xdo->dev = dev;
+	netdev_tracker_alloc(dev, &xdo->dev_tracker, GFP_ATOMIC);
+	xdo->real_dev = dev;
+	xdo->type = XFRM_DEV_OFFLOAD_FULL;
+	switch (dir) {
+	case XFRM_POLICY_IN:
+		xdo->dir = XFRM_DEV_OFFLOAD_IN;
+		break;
+	case XFRM_POLICY_OUT:
+		xdo->dir = XFRM_DEV_OFFLOAD_OUT;
+		break;
+	case XFRM_POLICY_FWD:
+		xdo->dir = XFRM_DEV_OFFLOAD_FWD;
+		break;
+	default:
+		xdo->dev = NULL;
+		dev_put(dev);
+		return -EINVAL;
+	}
+
+	err = dev->xfrmdev_ops->xdo_dev_policy_add(xp);
+	if (err) {
+		xdo->dev = NULL;
+		xdo->real_dev = NULL;
+		xdo->type = XFRM_DEV_OFFLOAD_UNSPECIFIED;
+		xdo->dir = 0;
+		netdev_put(dev, &xdo->dev_tracker);
+		return err;
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(xfrm_dev_policy_add);
+
 bool xfrm_dev_offload_ok(struct sk_buff *skb, struct xfrm_state *x)
 {
 	int mtu;
@@ -404,8 +461,10 @@ static int xfrm_api_check(struct net_device *dev)
 
 static int xfrm_dev_down(struct net_device *dev)
 {
-	if (dev->features & NETIF_F_HW_ESP)
+	if (dev->features & NETIF_F_HW_ESP) {
 		xfrm_dev_state_flush(dev_net(dev), dev, true);
+		xfrm_dev_policy_flush(dev_net(dev), dev, true);
+	}
 
 	return NOTIFY_DONE;
 }
diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index f1a0bab920a5..4ee422c367f1 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -425,6 +425,7 @@ void xfrm_policy_destroy(struct xfrm_policy *policy)
 	if (del_timer(&policy->timer) || del_timer(&policy->polq.hold_timer))
 		BUG();
 
+	xfrm_dev_policy_free(policy);
 	call_rcu(&policy->rcu, xfrm_policy_destroy_rcu);
 }
 EXPORT_SYMBOL(xfrm_policy_destroy);
@@ -1769,12 +1770,41 @@ xfrm_policy_flush_secctx_check(struct net *net, u8 type, bool task_valid)
 	}
 	return err;
 }
+
+static inline int xfrm_dev_policy_flush_secctx_check(struct net *net,
+						     struct net_device *dev,
+						     bool task_valid)
+{
+	struct xfrm_policy *pol;
+	int err = 0;
+
+	list_for_each_entry(pol, &net->xfrm.policy_all, walk.all) {
+		if (pol->walk.dead ||
+		    xfrm_policy_id2dir(pol->index) >= XFRM_POLICY_MAX ||
+		    pol->xdo.dev != dev)
+			continue;
+
+		err = security_xfrm_policy_delete(pol->security);
+		if (err) {
+			xfrm_audit_policy_delete(pol, 0, task_valid);
+			return err;
+		}
+	}
+	return err;
+}
 #else
 static inline int
 xfrm_policy_flush_secctx_check(struct net *net, u8 type, bool task_valid)
 {
 	return 0;
 }
+
+static inline int xfrm_dev_policy_flush_secctx_check(struct net *net,
+						     struct net_device *dev,
+						     bool task_valid)
+{
+	return 0;
+}
 #endif
 
 int xfrm_policy_flush(struct net *net, u8 type, bool task_valid)
@@ -1814,6 +1844,43 @@ int xfrm_policy_flush(struct net *net, u8 type, bool task_valid)
 }
 EXPORT_SYMBOL(xfrm_policy_flush);
 
+int xfrm_dev_policy_flush(struct net *net, struct net_device *dev, bool task_valid)
+{
+	int dir, err = 0, cnt = 0;
+	struct xfrm_policy *pol;
+
+	spin_lock_bh(&net->xfrm.xfrm_policy_lock);
+
+	err = xfrm_dev_policy_flush_secctx_check(net, dev, task_valid);
+	if (err)
+		goto out;
+
+again:
+	list_for_each_entry(pol, &net->xfrm.policy_all, walk.all) {
+		dir = xfrm_policy_id2dir(pol->index);
+		if (pol->walk.dead ||
+		    dir >= XFRM_POLICY_MAX ||
+		    pol->xdo.dev != dev)
+			continue;
+
+		__xfrm_policy_unlink(pol, dir);
+		spin_unlock_bh(&net->xfrm.xfrm_policy_lock);
+		cnt++;
+		xfrm_audit_policy_delete(pol, 1, task_valid);
+		xfrm_policy_kill(pol);
+		spin_lock_bh(&net->xfrm.xfrm_policy_lock);
+		goto again;
+	}
+	if (cnt)
+		__xfrm_policy_inexact_flush(net);
+	else
+		err = -ESRCH;
+out:
+	spin_unlock_bh(&net->xfrm.xfrm_policy_lock);
+	return err;
+}
+EXPORT_SYMBOL(xfrm_dev_policy_flush);
+
 int xfrm_policy_walk(struct net *net, struct xfrm_policy_walk *walk,
 		     int (*func)(struct xfrm_policy *, int, int, void*),
 		     void *data)
@@ -2246,6 +2313,7 @@ int xfrm_policy_delete(struct xfrm_policy *pol, int dir)
 	pol = __xfrm_policy_unlink(pol, dir);
 	spin_unlock_bh(&net->xfrm.xfrm_policy_lock);
 	if (pol) {
+		xfrm_dev_policy_delete(pol);
 		xfrm_policy_kill(pol);
 		return 0;
 	}
diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index 9c0aef815730..698ff84da6ba 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -1747,6 +1747,14 @@ static struct xfrm_policy *xfrm_policy_construct(struct net *net, struct xfrm_us
 	if (attrs[XFRMA_IF_ID])
 		xp->if_id = nla_get_u32(attrs[XFRMA_IF_ID]);
 
+	/* configure the hardware if offload is requested */
+	if (attrs[XFRMA_OFFLOAD_DEV]) {
+		err = xfrm_dev_policy_add(net, xp,
+				nla_data(attrs[XFRMA_OFFLOAD_DEV]), p->dir);
+		if (err)
+			goto error;
+	}
+
 	return xp;
  error:
 	*errp = err;
@@ -1785,6 +1793,7 @@ static int xfrm_add_policy(struct sk_buff *skb, struct nlmsghdr *nlh,
 	xfrm_audit_policy_add(xp, err ? 0 : 1, true);
 
 	if (err) {
+		xfrm_dev_policy_delete(xp);
 		security_xfrm_policy_free(xp->security);
 		kfree(xp);
 		return err;
@@ -1897,6 +1906,8 @@ static int dump_one_policy(struct xfrm_policy *xp, int dir, int count, void *ptr
 		err = xfrm_mark_put(skb, &xp->mark);
 	if (!err)
 		err = xfrm_if_id_put(skb, xp->if_id);
+	if (!err && xp->xdo.dev)
+		err = copy_user_offload(&xp->xdo, skb);
 	if (err) {
 		nlmsg_cancel(skb, nlh);
 		return err;
@@ -3213,6 +3224,8 @@ static int build_acquire(struct sk_buff *skb, struct xfrm_state *x,
 		err = xfrm_mark_put(skb, &xp->mark);
 	if (!err)
 		err = xfrm_if_id_put(skb, xp->if_id);
+	if (!err && xp->xdo.dev)
+		err = copy_user_offload(&xp->xdo, skb);
 	if (err) {
 		nlmsg_cancel(skb, nlh);
 		return err;
@@ -3331,6 +3344,8 @@ static int build_polexpire(struct sk_buff *skb, struct xfrm_policy *xp,
 		err = xfrm_mark_put(skb, &xp->mark);
 	if (!err)
 		err = xfrm_if_id_put(skb, xp->if_id);
+	if (!err && xp->xdo.dev)
+		err = copy_user_offload(&xp->xdo, skb);
 	if (err) {
 		nlmsg_cancel(skb, nlh);
 		return err;
@@ -3414,6 +3429,8 @@ static int xfrm_notify_policy(struct xfrm_policy *xp, int dir, const struct km_e
 		err = xfrm_mark_put(skb, &xp->mark);
 	if (!err)
 		err = xfrm_if_id_put(skb, xp->if_id);
+	if (!err && xp->xdo.dev)
+		err = copy_user_offload(&xp->xdo, skb);
 	if (err)
 		goto out_free_skb;
 
-- 
2.37.2



* [PATCH xfrm-next v3 4/6] xfrm: add TX datapath support for IPsec full offload mode
From: Leon Romanovsky @ 2022-08-23 13:32 UTC (permalink / raw)
  To: Steffen Klassert
  Cc: Leon Romanovsky, David S. Miller, Eric Dumazet, Herbert Xu,
	Jakub Kicinski, netdev, Paolo Abeni, Raed Salem, Saeed Mahameed

From: Leon Romanovsky <leonro@nvidia.com>

In IPsec full mode, the device is going to encrypt and encapsulate
packets that are associated with an offloaded policy. After a successful
policy lookup, which indicates whether packets should be offloaded or
not, the stack forwards packets to the device to do the magic.
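
A hypothetical sketch (foo_* is made up) of the driver-side
.xdo_dev_offload_ok() validation that the TX path consults before
handing the packet to the device; it reflects the "fragments are
dropped" limitation from the cover letter:

static bool foo_ipsec_offload_ok(struct sk_buff *skb, struct xfrm_state *x)
{
	if (x->xso.type == XFRM_DEV_OFFLOAD_FULL &&
	    x->props.family == AF_INET &&
	    ip_hdr(skb)->frag_off & htons(IP_MF | IP_OFFSET))
		return false;	/* this HW can't offload IPv4 fragments */

	return true;
}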

Signed-off-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Huy Nguyen <huyn@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 net/xfrm/xfrm_device.c | 14 ++++++++++++--
 net/xfrm/xfrm_output.c | 12 +++++++++++-
 2 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/net/xfrm/xfrm_device.c b/net/xfrm/xfrm_device.c
index 1cc482e9c87d..2f0c2c6a1ed6 100644
--- a/net/xfrm/xfrm_device.c
+++ b/net/xfrm/xfrm_device.c
@@ -120,6 +120,16 @@ struct sk_buff *validate_xmit_xfrm(struct sk_buff *skb, netdev_features_t featur
 	if (xo->flags & XFRM_GRO || x->xso.dir == XFRM_DEV_OFFLOAD_IN)
 		return skb;
 
+	/* The packet was sent to the HW IPsec full offload engine,
+	 * but to the wrong device. Drop the packet, so it won't
+	 * bypass the XFRM stack.
+	 */
+	if (x->xso.type == XFRM_DEV_OFFLOAD_FULL && x->xso.dev != dev) {
+		kfree_skb(skb);
+		dev_core_stats_tx_dropped_inc(dev);
+		return NULL;
+	}
+
 	/* This skb was already validated on the upper/virtual dev */
 	if ((x->xso.dev != dev) && (x->xso.real_dev == dev))
 		return skb;
@@ -369,8 +379,8 @@ bool xfrm_dev_offload_ok(struct sk_buff *skb, struct xfrm_state *x)
 	if (!x->type_offload || x->encap)
 		return false;
 
-	if ((!dev || (dev == xfrm_dst_path(dst)->dev)) &&
-	    (!xdst->child->xfrm)) {
+	if (x->xso.type == XFRM_DEV_OFFLOAD_FULL ||
+	    (!dev || (dev == xfrm_dst_path(dst)->dev)) && !xdst->child->xfrm) {
 		mtu = xfrm_state_mtu(x, xdst->child_mtu_cached);
 		if (skb->len <= mtu)
 			goto ok;
diff --git a/net/xfrm/xfrm_output.c b/net/xfrm/xfrm_output.c
index 555ab35cd119..b554a98926bd 100644
--- a/net/xfrm/xfrm_output.c
+++ b/net/xfrm/xfrm_output.c
@@ -494,7 +494,7 @@ static int xfrm_output_one(struct sk_buff *skb, int err)
 	struct xfrm_state *x = dst->xfrm;
 	struct net *net = xs_net(x);
 
-	if (err <= 0)
+	if (err <= 0 || x->xso.type == XFRM_DEV_OFFLOAD_FULL)
 		goto resume;
 
 	do {
@@ -719,6 +719,16 @@ int xfrm_output(struct sock *sk, struct sk_buff *skb)
 		break;
 	}
 
+	if (x->xso.type == XFRM_DEV_OFFLOAD_FULL) {
+		if (!xfrm_dev_offload_ok(skb, x)) {
+			XFRM_INC_STATS(net, LINUX_MIB_XFRMOUTERROR);
+			kfree_skb(skb);
+			return -EHOSTUNREACH;
+		}
+
+		return xfrm_output_resume(sk, skb, 0);
+	}
+
 	secpath_reset(skb);
 
 	if (xfrm_dev_offload_ok(skb, x)) {
-- 
2.37.2



* [PATCH xfrm-next v3 5/6] xfrm: add RX datapath protection for IPsec full offload mode
From: Leon Romanovsky @ 2022-08-23 13:32 UTC (permalink / raw)
  To: Steffen Klassert
  Cc: Leon Romanovsky, David S. Miller, Eric Dumazet, Herbert Xu,
	Jakub Kicinski, netdev, Paolo Abeni, Raed Salem, Saeed Mahameed

From: Leon Romanovsky <leonro@nvidia.com>

Traffic received by a device with IPsec full offload enabled should be
forwarded to the stack only after decryption, with packet headers and
trailers removed.

Such packets are expected to be seen as normal (non-XFRM) ones, while
unsupported packets should be dropped by the HW.
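
On the driver side, this implies that the RX completion path attaches
an offload-annotated secpath, so that the policy check added below
short-circuits to "accept". A minimal hypothetical sketch (foo_* is
made up) of such a completion handler:

static void foo_ipsec_rx_complete(struct sk_buff *skb, struct xfrm_state *x)
{
	struct xfrm_offload *xo;
	struct sec_path *sp;

	sp = secpath_set(skb);
	if (!sp)
		return;		/* no memory: packet stays plain */

	xfrm_state_hold(x);
	sp->xvec[sp->len++] = x;
	sp->olen++;

	xo = xfrm_offload(skb);
	xo->flags = CRYPTO_DONE;
	xo->status = CRYPTO_SUCCESS;	/* HW decrypted and verified */
}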

Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 include/net/xfrm.h | 55 +++++++++++++++++++++++++++-------------------
 1 file changed, 32 insertions(+), 23 deletions(-)

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 587697eb1d31..38fff78a1421 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -1094,6 +1094,29 @@ xfrm_state_addr_cmp(const struct xfrm_tmpl *tmpl, const struct xfrm_state *x, un
 	return !0;
 }
 
+#ifdef CONFIG_XFRM
+static inline struct xfrm_state *xfrm_input_state(struct sk_buff *skb)
+{
+	struct sec_path *sp = skb_sec_path(skb);
+
+	return sp->xvec[sp->len - 1];
+}
+#endif
+
+static inline struct xfrm_offload *xfrm_offload(struct sk_buff *skb)
+{
+#ifdef CONFIG_XFRM
+	struct sec_path *sp = skb_sec_path(skb);
+
+	if (!sp || !sp->olen || sp->len != sp->olen)
+		return NULL;
+
+	return &sp->ovec[sp->olen - 1];
+#else
+	return NULL;
+#endif
+}
+
 #ifdef CONFIG_XFRM
 int __xfrm_policy_check(struct sock *, int dir, struct sk_buff *skb,
 			unsigned short family);
@@ -1125,10 +1148,19 @@ static inline int __xfrm_policy_check2(struct sock *sk, int dir,
 {
 	struct net *net = dev_net(skb->dev);
 	int ndir = dir | (reverse ? XFRM_POLICY_MASK + 1 : 0);
+	struct xfrm_offload *xo = xfrm_offload(skb);
+	struct xfrm_state *x;
 
 	if (sk && sk->sk_policy[XFRM_POLICY_IN])
 		return __xfrm_policy_check(sk, ndir, skb, family);
 
+	if (xo) {
+		x = xfrm_input_state(skb);
+		if (x->xso.type == XFRM_DEV_OFFLOAD_FULL)
+			return (xo->flags & CRYPTO_DONE) &&
+			       (xo->status & CRYPTO_SUCCESS);
+	}
+
 	return __xfrm_check_nopolicy(net, skb, dir) ||
 	       __xfrm_check_dev_nopolicy(skb, dir, family) ||
 	       __xfrm_policy_check(sk, ndir, skb, family);
@@ -1860,29 +1892,6 @@ static inline void xfrm_states_delete(struct xfrm_state **states, int n)
 }
 #endif
 
-#ifdef CONFIG_XFRM
-static inline struct xfrm_state *xfrm_input_state(struct sk_buff *skb)
-{
-	struct sec_path *sp = skb_sec_path(skb);
-
-	return sp->xvec[sp->len - 1];
-}
-#endif
-
-static inline struct xfrm_offload *xfrm_offload(struct sk_buff *skb)
-{
-#ifdef CONFIG_XFRM
-	struct sec_path *sp = skb_sec_path(skb);
-
-	if (!sp || !sp->olen || sp->len != sp->olen)
-		return NULL;
-
-	return &sp->ovec[sp->olen - 1];
-#else
-	return NULL;
-#endif
-}
-
 void __init xfrm_dev_init(void);
 
 #ifdef CONFIG_XFRM_OFFLOAD
-- 
2.37.2



* [PATCH xfrm-next v3 6/6] xfrm: enforce separation between priorities of HW/SW policies
From: Leon Romanovsky @ 2022-08-23 13:32 UTC (permalink / raw)
  To: Steffen Klassert
  Cc: Leon Romanovsky, David S. Miller, Eric Dumazet, Herbert Xu,
	Jakub Kicinski, netdev, Paolo Abeni, Raed Salem, Saeed Mahameed

From: Leon Romanovsky <leonro@nvidia.com>

Devices that implement IPsec full offload mode offload policies too.
In the RX path, this creates a situation where the HW can't effectively
handle mixed SW and HW priorities unless users make sure that
HW-offloaded policies have higher priorities.

In order to make sure that users have a coherent picture, let's require
that HW-offloaded policies always have higher priorities (both RX and
TX) than SW ones.

To avoid over-engineering the code, HW policies are treated as SW ones
and don't take the netdev into account, which allows reuse of the same
priorities for different devices.
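
Concretely (the numbers are hypothetical), per the check added below:
with an existing SW policy at priority 10 in a given direction, a HW
(full offload) policy must use a priority value strictly greater than
10, and any further SW policy must use a priority value strictly lower
than the lowest HW priority value; violating insertions fail with
-EINVAL.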

Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 include/net/netns/xfrm.h |   8 ++-
 net/xfrm/xfrm_policy.c   | 113 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 120 insertions(+), 1 deletion(-)

diff --git a/include/net/netns/xfrm.h b/include/net/netns/xfrm.h
index bd7c3be4af5d..f0cfa0faf611 100644
--- a/include/net/netns/xfrm.h
+++ b/include/net/netns/xfrm.h
@@ -29,6 +29,11 @@ struct xfrm_policy_hthresh {
 	u8			rbits6;
 };
 
+struct xfrm_policy_prio {
+	u32 max_sw_prio;
+	u32 min_hw_prio;
+};
+
 struct netns_xfrm {
 	struct list_head	state_all;
 	/*
@@ -52,7 +57,7 @@ struct netns_xfrm {
 	unsigned int		policy_idx_hmask;
 	struct hlist_head	policy_inexact[XFRM_POLICY_MAX];
 	struct xfrm_policy_hash	policy_bydst[XFRM_POLICY_MAX];
-	unsigned int		policy_count[XFRM_POLICY_MAX * 2];
+	unsigned int		policy_count[XFRM_POLICY_MAX * 3];
 	struct work_struct	policy_hash_work;
 	struct xfrm_policy_hthresh policy_hthresh;
 	struct list_head	inexact_bins;
@@ -67,6 +72,7 @@ struct netns_xfrm {
 	u32			sysctl_acq_expires;
 
 	u8			policy_default[XFRM_POLICY_MAX];
+	struct xfrm_policy_prio	policy_prio[XFRM_POLICY_MAX];
 
 #ifdef CONFIG_SYSCTL
 	struct ctl_table_header	*sysctl_hdr;
diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 4ee422c367f1..28018691c6b3 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -1570,13 +1570,70 @@ static struct xfrm_policy *xfrm_policy_insert_list(struct hlist_head *chain,
 	return delpol;
 }
 
+static int __xfrm_policy_check_hw_priority(struct net *net,
+					   struct xfrm_policy *policy, int dir)
+{
+	int left, right;
+
+	lockdep_assert_held(&net->xfrm.xfrm_policy_lock);
+
+	if (!net->xfrm.policy_count[dir])
+		/* Adding first policy */
+		return 0;
+
+	if (policy->xdo.type != XFRM_DEV_OFFLOAD_FULL) {
+		/* SW priority */
+		if (!net->xfrm.policy_count[2 * XFRM_POLICY_MAX + dir])
+			/* Special case to allow reuse maximum priority
+			 * (U32_MAX) for SW policies, when no HW policy exist.
+			 */
+			return 0;
+
+		left = policy->priority;
+		right = net->xfrm.policy_prio[dir].min_hw_prio;
+	} else {
+		/* HW priority */
+		left = net->xfrm.policy_prio[dir].max_sw_prio;
+		right = policy->priority;
+	}
+	if (left >= right)
+		return -EINVAL;
+
+	return 0;
+}
+
+static void __xfrm_policy_update_hw_priority(struct net *net,
+					     struct xfrm_policy *policy,
+					     int dir)
+{
+	u32 *hw_prio, *sw_prio;
+
+	lockdep_assert_held(&net->xfrm.xfrm_policy_lock);
+
+	if (policy->xdo.type != XFRM_DEV_OFFLOAD_FULL) {
+		sw_prio = &net->xfrm.policy_prio[dir].max_sw_prio;
+		*sw_prio = max(*sw_prio, policy->priority);
+		return;
+	}
+
+	hw_prio = &net->xfrm.policy_prio[dir].min_hw_prio;
+	*hw_prio = min(*hw_prio, policy->priority);
+}
+
 int xfrm_policy_insert(int dir, struct xfrm_policy *policy, int excl)
 {
 	struct net *net = xp_net(policy);
 	struct xfrm_policy *delpol;
 	struct hlist_head *chain;
+	int ret;
 
 	spin_lock_bh(&net->xfrm.xfrm_policy_lock);
+	ret = __xfrm_policy_check_hw_priority(net, policy, dir);
+	if (ret) {
+		spin_unlock_bh(&net->xfrm.xfrm_policy_lock);
+		return ret;
+	}
+
 	chain = policy_hash_bysel(net, &policy->selector, policy->family, dir);
 	if (chain)
 		delpol = xfrm_policy_insert_list(chain, policy, excl);
@@ -1606,6 +1663,7 @@ int xfrm_policy_insert(int dir, struct xfrm_policy *policy, int excl)
 	policy->curlft.use_time = 0;
 	if (!mod_timer(&policy->timer, jiffies + HZ))
 		xfrm_pol_hold(policy);
+	__xfrm_policy_update_hw_priority(net, policy, dir);
 	spin_unlock_bh(&net->xfrm.xfrm_policy_lock);
 
 	if (delpol)
@@ -2271,6 +2329,8 @@ static void __xfrm_policy_link(struct xfrm_policy *pol, int dir)
 
 	list_add(&pol->walk.all, &net->xfrm.policy_all);
 	net->xfrm.policy_count[dir]++;
+	if (pol->xdo.type == XFRM_DEV_OFFLOAD_FULL)
+		net->xfrm.policy_count[2 * XFRM_POLICY_MAX + dir]++;
 	xfrm_pol_hold(pol);
 }
 
@@ -2290,6 +2350,8 @@ static struct xfrm_policy *__xfrm_policy_unlink(struct xfrm_policy *pol,
 	}
 
 	list_del_init(&pol->walk.all);
+	if (pol->xdo.type == XFRM_DEV_OFFLOAD_FULL)
+		net->xfrm.policy_count[2 * XFRM_POLICY_MAX + dir]--;
 	net->xfrm.policy_count[dir]--;
 
 	return pol;
@@ -2305,12 +2367,58 @@ static void xfrm_sk_policy_unlink(struct xfrm_policy *pol, int dir)
 	__xfrm_policy_unlink(pol, XFRM_POLICY_MAX + dir);
 }
 
+static void __xfrm_policy_delete_prio(struct net *net,
+				      struct xfrm_policy *policy, int dir)
+{
+	struct xfrm_policy *pol;
+	u32 sw_prio = 0;
+
+	lockdep_assert_held(&net->xfrm.xfrm_policy_lock);
+
+	if (!net->xfrm.policy_count[dir]) {
+		net->xfrm.policy_prio[dir].max_sw_prio = sw_prio;
+		net->xfrm.policy_prio[dir].min_hw_prio = U32_MAX;
+		return;
+	}
+
+	if (policy->xdo.type == XFRM_DEV_OFFLOAD_FULL &&
+	    !net->xfrm.policy_count[2 * XFRM_POLICY_MAX + dir]) {
+		net->xfrm.policy_prio[dir].min_hw_prio = U32_MAX;
+		return;
+	}
+
+	list_for_each_entry(pol, &net->xfrm.policy_all, walk.all) {
+		if (pol->walk.dead)
+			continue;
+
+		if (policy->xdo.type != XFRM_DEV_OFFLOAD_FULL) {
+			/* SW priority */
+			if (pol->xdo.type == XFRM_DEV_OFFLOAD_FULL) {
+				net->xfrm.policy_prio[dir].max_sw_prio = sw_prio;
+				return;
+			}
+			sw_prio = pol->priority;
+			continue;
+		}
+		/* HW priority */
+		if (pol->xdo.type != XFRM_DEV_OFFLOAD_FULL)
+			continue;
+
+		net->xfrm.policy_prio[dir].min_hw_prio = pol->priority;
+		return;
+	}
+
+	net->xfrm.policy_prio[dir].max_sw_prio = sw_prio;
+}
+
 int xfrm_policy_delete(struct xfrm_policy *pol, int dir)
 {
 	struct net *net = xp_net(pol);
 
 	spin_lock_bh(&net->xfrm.xfrm_policy_lock);
 	pol = __xfrm_policy_unlink(pol, dir);
+	if (pol)
+		__xfrm_policy_delete_prio(net, pol, dir);
 	spin_unlock_bh(&net->xfrm.xfrm_policy_lock);
 	if (pol) {
 		xfrm_dev_policy_delete(pol);
@@ -4111,6 +4219,7 @@ static int __net_init xfrm_policy_init(struct net *net)
 
 		net->xfrm.policy_count[dir] = 0;
 		net->xfrm.policy_count[XFRM_POLICY_MAX + dir] = 0;
+		net->xfrm.policy_count[2 * XFRM_POLICY_MAX + dir] = 0;
 		INIT_HLIST_HEAD(&net->xfrm.policy_inexact[dir]);
 
 		htab = &net->xfrm.policy_bydst[dir];
@@ -4196,6 +4305,10 @@ static int __net_init xfrm_net_init(struct net *net)
 	net->xfrm.policy_default[XFRM_POLICY_FWD] = XFRM_USERPOLICY_ACCEPT;
 	net->xfrm.policy_default[XFRM_POLICY_OUT] = XFRM_USERPOLICY_ACCEPT;
 
+	net->xfrm.policy_prio[XFRM_POLICY_IN].min_hw_prio = U32_MAX;
+	net->xfrm.policy_prio[XFRM_POLICY_FWD].min_hw_prio = U32_MAX;
+	net->xfrm.policy_prio[XFRM_POLICY_OUT].min_hw_prio = U32_MAX;
+
 	rv = xfrm_statistics_init(net);
 	if (rv < 0)
 		goto out_statistics;
-- 
2.37.2



* Re: [PATCH xfrm-next v3 0/6] Extend XFRM core to allow full offload configuration
From: Jakub Kicinski @ 2022-08-25 21:36 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Steffen Klassert, Leon Romanovsky, David S. Miller, Eric Dumazet,
	Herbert Xu, netdev, Paolo Abeni, Raed Salem, Saeed Mahameed

On Tue, 23 Aug 2022 16:31:57 +0300 Leon Romanovsky wrote:
>  * I didn't hear any suggestion of what term to use instead of
>    "full offload", so I left it as is. It is used in commit messages
>    and documentation only and is easy to rename.
>  * Added performance data and background info to cover letter
>  * Reused xfrm_output_resume() function to support multiple XFRM transformations
>  * Added PMTU check in addition to driver .xdo_dev_offload_ok validation
>  * Documentation is in progress, but not part of this series yet.

Since the use case is somewhat in question, perhaps switch to RFC
postings until the drivers side incl. tc forwarding is implemented?
Also the perf traces, I don't see them here.


* Re: [PATCH xfrm-next v3 0/6] Extend XFRM core to allow full offload configuration
From: Leon Romanovsky @ 2022-08-26  6:26 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Steffen Klassert, David S. Miller, Eric Dumazet, Herbert Xu,
	netdev, Paolo Abeni, Raed Salem, Saeed Mahameed

On Thu, Aug 25, 2022 at 02:36:10PM -0700, Jakub Kicinski wrote:
> On Tue, 23 Aug 2022 16:31:57 +0300 Leon Romanovsky wrote:
> >  * I didn't hear any suggestion of what term to use instead of
> >    "full offload", so I left it as is. It is used in commit messages
> >    and documentation only and is easy to rename.
> >  * Added performance data and background info to cover letter
> >  * Reused xfrm_output_resume() function to support multiple XFRM transformations
> >  * Added PMTU check in addition to driver .xdo_dev_offload_ok validation
> >  * Documentation is in progress, but not part of this series yet.
> 
> Since the use case is somewhat in question, perhaps switch to RFC
> postings until the drivers side incl. tc forwarding is implemented?

The proposed driver implementation works fine with eswitch representors.
All our flow steering magic is performed on the local table entry, and it
ensures that representors receive/send "clean" traffic.

We are using the following configuration snippet to achieve that.

---------------------------------------------------------------------
#!/bin/bash
P0_OUTER_REMOTE_IP=192.168.50.2
P0_OUTER_LOCAL_IP=192.168.50.1
PF0=enp8s0f0
VF0_REP=enp8s0f0_0

set -v
# Configure IP and turn VF_REP on
ifconfig $PF0 $P0_OUTER_LOCAL_IP/24 up
ifconfig $VF0_REP up

# Clean all TC rules, start fresh
tc qdisc del dev enp8s0f0 ingress >/dev/null 2>&1
tc qdisc del dev enp8s0f0_0 ingress >/dev/null 2>&1

# Make sure steering mode is dmfs(FW) and eswitch encap is none
devlink dev param set pci/0000:08:00.0 name flow_steering_mode value dmfs cmode runtime
devlink dev eswitch set pci/0000:08:00.0 mode legacy
devlink dev eswitch set pci/0000:08:00.0 encap none
devlink dev eswitch set pci/0000:08:00.0 mode switchdev

sleep 2

tc qdisc add dev enp8s0f0 ingress
tc qdisc add dev enp8s0f0_0 ingress

# Add TC rules
tc filter add dev $PF0 parent ffff: protocol 802.1q chain 0 flower vlan_id 10 vlan_ethtype 802.1q cvlan_id 5 action vlan pop action vlan pop  action mirred egress redirect dev $VF0_REP
tc filter add dev $VF0_REP parent ffff: protocol all chain 0 flower action vlan push protocol 802.1q id 5 action vlan push protocol 802.1q id 10 action mirred egress redirect dev $PF0
tc filter show dev $PF0 ingress
----------------------------------------------------------------------------------------------------

We also don't offload anything related to routing, as we can't
differentiate local traffic from forwarded traffic.

> Also the perf traces, I don't see them here.

It is worth separating this into a standalone discussion with the title:
"why is crypto not fast enough?". I don't think that mixing the discussion
about full offload, which Steffen said he is interested in, with research
about crypto bottlenecks will be productive. These discussions are
orthogonal.

Thanks


* Re: [PATCH xfrm-next v3 0/6] Extend XFRM core to allow full offload configuration
From: Jakub Kicinski @ 2022-08-26 23:45 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Steffen Klassert, David S. Miller, Eric Dumazet, Herbert Xu,
	netdev, Paolo Abeni, Raed Salem, Saeed Mahameed

On Fri, 26 Aug 2022 09:26:57 +0300 Leon Romanovsky wrote:
> On Thu, Aug 25, 2022 at 02:36:10PM -0700, Jakub Kicinski wrote:
> > On Tue, 23 Aug 2022 16:31:57 +0300 Leon Romanovsky wrote:  
> > >  * I didn't hear any suggestion of what term to use instead of
> > >    "full offload", so I left it as is. It is used in commit messages
> > >    and documentation only and is easy to rename.
> > >  * Added performance data and background info to cover letter
> > >  * Reused xfrm_output_resume() function to support multiple XFRM transformations
> > >  * Added PMTU check in addition to driver .xdo_dev_offload_ok validation
> > >  * Documentation is in progress, but not part of this series yet.  
> > 
> > Since the use case is somewhat in question, perhaps switch to RFC
> > postings until the drivers side incl. tc forwarding is implemented?  
> 
> The proposed driver implementation works fine with eswitch representors.
> All our flow steering magic is performed on the local table entry, and it
> ensures that representors receive/send "clean" traffic.
> 
> We are using the following configuration snippet to achieve that.
> 
> ---------------------------------------------------------------------
> #!/bin/bash
> P0_OUTER_REMOTE_IP=192.168.50.2
> P0_OUTER_LOCAL_IP=192.168.50.1
> PF0=enp8s0f0
> VF0_REP=enp8s0f0_0
> 
> set -v
> # Configure IP and turn VF_REP on
> ifconfig $PF0 $P0_OUTER_LOCAL_IP/24 up
> ifconfig $VF0_REP up
> 
> # Clean all TC rules, start fresh
> tc qdisc del dev enp8s0f0 ingress >/dev/null 2>&1
> tc qdisc del dev enp8s0f0_0 ingress >/dev/null 2>&1
> 
> # Make sure steering mode is dmfs(FW) and eswitch encap is none
> devlink dev param set pci/0000:08:00.0 name flow_steering_mode value dmfs cmode runtime
> devlink dev eswitch set pci/0000:08:00.0 mode legacy
> devlink dev eswitch set pci/0000:08:00.0 encap none
> devlink dev eswitch set pci/0000:08:00.0 mode switchdev
> 
> sleep 2
> 
> tc qdisc add dev enp8s0f0 ingress
> tc qdisc add dev enp8s0f0_0 ingress
> 
> # Add TC rules
> tc filter add dev $PF0 parent ffff: protocol 802.1q chain 0 flower vlan_id 10 vlan_ethtype 802.1q cvlan_id 5 action vlan pop action vlan pop  action mirred egress redirect dev $VF0_REP
> tc filter add dev $VF0_REP parent ffff: protocol all chain 0 flower action vlan push protocol 802.1q id 5 action vlan push protocol 802.1q id 10 action mirred egress redirect dev $PF0
> tc filter show dev $PF0 ingress
> ----------------------------------------------------------------------------------------------------
> 
> We also don't offload anything related to routing, as we can't
> differentiate local traffic from forwarded traffic.

Yeah, nah, that's not what I'm asking for.
I said forwarding, not sending traffic through a different virtual
interface. The TC rules must forward from or to the IPsec interface.

That was the use case Jason mentioned.

> > Also the perf traces, I don't see them here.  
> 
> It is worth separating that into a standalone discussion with the title
> "why is crypto offload not fast enough?". I don't think that mixing the
> discussion about full offload, which Steffen said he is interested in,
> with research into crypto bottlenecks will be productive. These
> discussions are orthogonal.

What do you mean by crypto bottlenecks?

Please use more precise language. "crypto" here may mean "crypto-only
offload" or "crypto as done by the CPU". I have no idea which one you mean.

We are very much interested in the former; the latter is indeed out of
scope here.


* Re: [PATCH xfrm-next v3 0/6] Extend XFRM core to allow full offload configuration
  2022-08-26 23:45     ` Jakub Kicinski
@ 2022-08-28  9:28       ` Leon Romanovsky
  2022-08-30 17:36       ` Jason Gunthorpe
  1 sibling, 0 replies; 19+ messages in thread
From: Leon Romanovsky @ 2022-08-28  9:28 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Steffen Klassert, David S. Miller, Eric Dumazet, Herbert Xu,
	netdev, Paolo Abeni, Raed Salem, Saeed Mahameed

On Fri, Aug 26, 2022 at 04:45:22PM -0700, Jakub Kicinski wrote:
> On Fri, 26 Aug 2022 09:26:57 +0300 Leon Romanovsky wrote:
> > On Thu, Aug 25, 2022 at 02:36:10PM -0700, Jakub Kicinski wrote:
> > > On Tue, 23 Aug 2022 16:31:57 +0300 Leon Romanovsky wrote:  
> > > >  * I didn't hear any suggestion what term to use instead of
> > > >    "full offload", so left it as is. It is used in commit messages
> > > >    and documentation only and easy to rename.
> > > >  * Added performance data and background info to cover letter
> > > >  * Reused xfrm_output_resume() function to support multiple XFRM transformations
> > > >  * Add PMTU check in addition to driver .xdo_dev_offload_ok validation
> > > >  * Documentation is in progress, but not part of this series yet.  
> > > 
> > > Since the use case is somewhat in question, perhaps switch to RFC
> > > postings until the driver side, incl. tc forwarding, is implemented?

<...>

> > We also don't offload anything related to routing, as we can't
> > differentiate local traffic from forwarded traffic.
> 
> Yeah, nah, that's not what I'm asking for.
> I said forwarding, not sending traffic through a different virtual
> interface. The TC rules must forward from or to the IPsec interface.
> 
> That was the use case Jason mentioned.

I see, the word "TC" confused me, sorry about that.

My next mlx5-related task, after this IPsec full offload is accepted,
is to extend mlx5 with extra eswitch logic.

There is no change in the API, xfrm code, or behavior, just an internal
change in where the IPsec flow steering tables will be created and how
they will be created/destroyed.

Unfortunately, this "just..." change is a complicated task due to the mlx5
core's internal implementation and will take time, but as I said, I will do it.

> 
> > > Also the perf traces, I don't see them here.  
> > 
> > It is worth separating that into a standalone discussion with the title
> > "why is crypto offload not fast enough?". I don't think that mixing the
> > discussion about full offload, which Steffen said he is interested in,
> > with research into crypto bottlenecks will be productive. These
> > discussions are orthogonal.
> 
> What do you mean by crypto bottlenecks?

I think that I used the same language the whole time:
* IPsec SW - software path
* IPsec crypto - HW offload of the crypto part
* IPsec full offload - state and policy offloaded to the HW.

I will try to be more clear next time.

> 
> Please use more precise language. "crypto" here may mean "crypto-only
> offload" or "crypto as done by the CPU". I have no idea which one you mean.
> 
> We are very much interested in the former; the latter is indeed out of
> scope here.


* Re: [PATCH xfrm-next v3 0/6] Extend XFRM core to allow full offload configuration
  2022-08-23 13:31 [PATCH xfrm-next v3 0/6] Extend XFRM core to allow full offload configuration Leon Romanovsky
                   ` (6 preceding siblings ...)
  2022-08-25 21:36 ` [PATCH xfrm-next v3 0/6] Extend XFRM core to allow full offload configuration Jakub Kicinski
@ 2022-08-29  7:54 ` Steffen Klassert
  2022-08-30  6:48   ` Leon Romanovsky
  2022-09-04 16:45   ` Leon Romanovsky
  7 siblings, 2 replies; 19+ messages in thread
From: Steffen Klassert @ 2022-08-29  7:54 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Leon Romanovsky, David S. Miller, Eric Dumazet, Herbert Xu,
	Jakub Kicinski, netdev, Paolo Abeni, Raed Salem, Saeed Mahameed

On Tue, Aug 23, 2022 at 04:31:57PM +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@nvidia.com>
> 
> Changelog:
> v3:
>  * I didn't hear any suggestion what term to use instead of
>    "full offload", so left it as is. It is used in commit messages
>    and documentation only and easy to rename.
>  * Added performance data and background info to cover letter
>  * Reused xfrm_output_resume() function to support multiple XFRM transformations
>  * Add PMTU check in addition to driver .xdo_dev_offload_ok validation
>  * Documentation is in progress, but not part of this series yet.
> v2: https://lore.kernel.org/all/cover.1660639789.git.leonro@nvidia.com
>  * Rebased to latest 6.0-rc1
>  * Add an extra check in TX datapath patch to validate packets before
>    forwarding to HW.
>  * Added policy cleanup logic in case of netdev down event
> v1: https://lore.kernel.org/all/cover.1652851393.git.leonro@nvidia.com
>  * Moved comment to be before if (...) in third patch.
> v0: https://lore.kernel.org/all/cover.1652176932.git.leonro@nvidia.com
> -----------------------------------------------------------------------
> 
> The following series extends XFRM core code to handle a new type of IPsec
> offload - full offload.
> 
> In this mode, the HW is going to be responsible for the whole data path,
> so both policy and state should be offloaded.
> 
> IPsec full offload is an improved version of IPsec crypto mode,
> In full mode, HW is responsible to trim/add headers in addition to
> decrypt/encrypt. In this mode, the packet arrives to the stack as already
> decrypted and vice versa for TX (exits to HW as not-encrypted).
> 
> Devices that implement IPsec full offload mode offload policies too.
> In the RX path, it causes the situation that HW can't effectively
> handle mixed SW and HW priorities unless users make sure that HW offloaded
> policies have higher priorities.
> 
> To make sure that users have a coherent picture, we require that
> HW offloaded policies have always (both RX and TX) higher priorities
> than SW ones.
> 
> To not over-engineer the code, HW policies are treated as SW ones and
> don't take into account netdev to allow reuse of the same priorities for
> different devices.
> 
> There are several deliberate limitations:
>  * No software fallback
>  * Fragments are dropped, both in RX and TX
>  * No sockets policies
>  * Only IPsec transport mode is implemented

... and you still have not answered the fundamental questions:

As implemented, the software does not hold any state.
I.e. there is no sync between hardware and software
regarding stats, lifetime, lifebyte, packet counts
and replay window. IKE rekeying and auditing are based
on these; how should this be done?

How can tunnel mode work with this offload?

I want to see the full picture before I consider
applying this.


* Re: [PATCH xfrm-next v3 0/6] Extend XFRM core to allow full offload configuration
  2022-08-26  6:26   ` Leon Romanovsky
  2022-08-26 23:45     ` Jakub Kicinski
@ 2022-08-29  8:07     ` Steffen Klassert
  1 sibling, 0 replies; 19+ messages in thread
From: Steffen Klassert @ 2022-08-29  8:07 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Jakub Kicinski, David S. Miller, Eric Dumazet, Herbert Xu,
	netdev, Paolo Abeni, Raed Salem, Saeed Mahameed

On Fri, Aug 26, 2022 at 09:26:57AM +0300, Leon Romanovsky wrote:
> On Thu, Aug 25, 2022 at 02:36:10PM -0700, Jakub Kicinski wrote:
> > On Tue, 23 Aug 2022 16:31:57 +0300 Leon Romanovsky wrote:
> > >  * I didn't hear any suggestion what term to use instead of
> > >    "full offload", so left it as is. It is used in commit messages
> > >    and documentation only and easy to rename.
> > >  * Added performance data and background info to cover letter
> > >  * Reused xfrm_output_resume() function to support multiple XFRM transformations
> > >  * Add PMTU check in addition to driver .xdo_dev_offload_ok validation
> > >  * Documentation is in progress, but not part of this series yet.
> > 
> > Since the use case is somewhat in question, perhaps switch to RFC
> > postings until the driver side, incl. tc forwarding, is implemented?

+1

Please mark it as RFC as long as the full picture is not yet clear.
This is still far away from being ready for merging.

> 
> > Also the perf traces, I don't see them here.
> 
> It is worth separating that into a standalone discussion with the title
> "why is crypto offload not fast enough?".

This is not the question. I want to know whether the performance
gain comes from encapsulation/decapsulation offload, lookup
offload, or something else.

Please provide the traces.

> I don't think that mixing the discussion
> about full offload, which Steffen said he is interested in, with
> research into crypto bottlenecks will be productive. These discussions
> are orthogonal.

I'm interested in a 'full offload', but we need to make sure
this 'full offload' is able to support most of the common
use cases. It is still not clear whether this particular
offload you are proposing is the one that can make it.


* Re: [PATCH xfrm-next v3 0/6] Extend XFRM core to allow full offload configuration
  2022-08-29  7:54 ` Steffen Klassert
@ 2022-08-30  6:48   ` Leon Romanovsky
  2022-09-04 16:45   ` Leon Romanovsky
  1 sibling, 0 replies; 19+ messages in thread
From: Leon Romanovsky @ 2022-08-30  6:48 UTC (permalink / raw)
  To: Steffen Klassert
  Cc: David S. Miller, Eric Dumazet, Herbert Xu, Jakub Kicinski,
	netdev, Paolo Abeni, Raed Salem, Saeed Mahameed

On Mon, Aug 29, 2022 at 09:54:03AM +0200, Steffen Klassert wrote:
> On Tue, Aug 23, 2022 at 04:31:57PM +0300, Leon Romanovsky wrote:
> > From: Leon Romanovsky <leonro@nvidia.com>
> > 
> > Changelog:
> > v3:
> >  * I didn't hear any suggestion what term to use instead of
> >    "full offload", so left it as is. It is used in commit messages
> >    and documentation only and easy to rename.
> >  * Added performance data and background info to cover letter
> >  * Reused xfrm_output_resume() function to support multiple XFRM transformations
> >  * Add PMTU check in addition to driver .xdo_dev_offload_ok validation
> >  * Documentation is in progress, but not part of this series yet.
> > v2: https://lore.kernel.org/all/cover.1660639789.git.leonro@nvidia.com
> >  * Rebased to latest 6.0-rc1
> >  * Add an extra check in TX datapath patch to validate packets before
> >    forwarding to HW.
> >  * Added policy cleanup logic in case of netdev down event
> > v1: https://lore.kernel.org/all/cover.1652851393.git.leonro@nvidia.com
> >  * Moved comment to be before if (...) in third patch.
> > v0: https://lore.kernel.org/all/cover.1652176932.git.leonro@nvidia.com
> > -----------------------------------------------------------------------
> > 
> > The following series extends XFRM core code to handle a new type of IPsec
> > offload - full offload.
> > 
> > In this mode, the HW is going to be responsible for the whole data path,
> > so both policy and state should be offloaded.
> > 
> > IPsec full offload is an improved version of IPsec crypto mode,
> > In full mode, HW is responsible to trim/add headers in addition to
> > decrypt/encrypt. In this mode, the packet arrives to the stack as already
> > decrypted and vice versa for TX (exits to HW as not-encrypted).
> > 
> > Devices that implement IPsec full offload mode offload policies too.
> > In the RX path, it causes the situation that HW can't effectively
> > handle mixed SW and HW priorities unless users make sure that HW offloaded
> > policies have higher priorities.
> > 
> > To make sure that users have a coherent picture, we require that
> > HW offloaded policies have always (both RX and TX) higher priorities
> > than SW ones.
> > 
> > To not over-engineer the code, HW policies are treated as SW ones and
> > don't take into account netdev to allow reuse of the same priorities for
> > different devices.
> > 
> > There are several deliberate limitations:
> >  * No software fallback
> >  * Fragments are dropped, both in RX and TX
> >  * No sockets policies
> >  * Only IPsec transport mode is implemented
> 
> ... and you still have not answered the fundamental questions:
> 
> As implemented, the software does not hold any state.
> I.e. there is no sync between hardware and software
> regarding stats, lifetime, lifebyte, packet counts
> and replay window. IKE rekeying and auditing are based
> on these; how should this be done?
> 
> How can tunnel mode work with this offload?
> 
> I want to see the full picture before I consider
> applying this.

Hi,

Unfortunately, due to vacation time, it is taking longer to get internal
answers than I anticipated.

I will continue to collect the needed data and add everything to the cover
letter/documentation when I respin.

Thanks


* Re: [PATCH xfrm-next v3 0/6] Extend XFRM core to allow full offload configuration
  2022-08-26 23:45     ` Jakub Kicinski
  2022-08-28  9:28       ` Leon Romanovsky
@ 2022-08-30 17:36       ` Jason Gunthorpe
  2022-08-30 18:56         ` Jakub Kicinski
  1 sibling, 1 reply; 19+ messages in thread
From: Jason Gunthorpe @ 2022-08-30 17:36 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Leon Romanovsky, Steffen Klassert, David S. Miller, Eric Dumazet,
	Herbert Xu, netdev, Paolo Abeni, Raed Salem, Saeed Mahameed

On Fri, Aug 26, 2022 at 04:45:22PM -0700, Jakub Kicinski wrote:

> > # Add TC rules
> > tc filter add dev $PF0 parent ffff: protocol 802.1q chain 0 flower vlan_id 10 vlan_ethtype 802.1q cvlan_id 5 action vlan pop action vlan pop  action mirred egress redirect dev $VF0_REP
> > tc filter add dev $VF0_REP parent ffff: protocol all chain 0 flower action vlan push protocol 802.1q id 5 action vlan push protocol 802.1q id 10 action mirred egress redirect dev $PF0
> > tc filter show dev $PF0 ingress
> > ----------------------------------------------------------------------------------------------------
> > 
> > We also don't offload anything related to routing, as we can't
> > differentiate local traffic from forwarded traffic.
> 
> Yeah, nah, that's not what I'm asking for.
> I said forwarding, not sending traffic through a different virtual
> interface. The TC rules must forward from or to the IPsec interface.
> 
> That was the use case Jason mentioned.

I meant, rather generically, handling the packets on the
hypervisor side without involving the CPU.

We have customers deploying many different models for this in their
hypervisor, including a significant deployment using a model like the
above. 

It achieves a kind of connectivity to a VM with 0 hypervisor CPU
involvement with the vlan push/pop done in HW.

We have other use-models, like the offloaded OVS switching model you are
alluding to; that is what Leon has as a follow-up.

Jason


* Re: [PATCH xfrm-next v3 0/6] Extend XFRM core to allow full offload configuration
  2022-08-30 17:36       ` Jason Gunthorpe
@ 2022-08-30 18:56         ` Jakub Kicinski
  2022-08-30 22:34           ` Jason Gunthorpe
  0 siblings, 1 reply; 19+ messages in thread
From: Jakub Kicinski @ 2022-08-30 18:56 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Leon Romanovsky, Steffen Klassert, David S. Miller, Eric Dumazet,
	Herbert Xu, netdev, Paolo Abeni, Raed Salem, Saeed Mahameed

On Tue, 30 Aug 2022 14:36:52 -0300 Jason Gunthorpe wrote:
> I meant, rather generically, handling the packets on the
> hypervisor side without involving the CPU.
> 
> We have customers deploying many different models for this in their
> hypervisor, including a significant deployment using a model like the
> above. 
> 
> It achieves a kind of connectivity to a VM with 0 hypervisor CPU
> involvement with the vlan push/pop done in HW.

I don't see how that necessitates the full IPsec offload.

Sorry, perhaps I assumed we were on the same page too hastily.

I was referring to a model where the TC rules redirect to an IPsec
tunnel which is then routed to the uplink. All offloaded. Like
we do today for VxLAN encap/decap.

That necessitates full offload, and therefore can be used as a strong
argument for having a full offload in netdev.

> We have other use-models, like the offloaded OVS switching model you are
> alluding to; that is what Leon has as a follow-up.



* Re: [PATCH xfrm-next v3 0/6] Extend XFRM core to allow full offload configuration
  2022-08-30 18:56         ` Jakub Kicinski
@ 2022-08-30 22:34           ` Jason Gunthorpe
  0 siblings, 0 replies; 19+ messages in thread
From: Jason Gunthorpe @ 2022-08-30 22:34 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Leon Romanovsky, Steffen Klassert, David S. Miller, Eric Dumazet,
	Herbert Xu, netdev, Paolo Abeni, Raed Salem, Saeed Mahameed

On Tue, Aug 30, 2022 at 11:56:27AM -0700, Jakub Kicinski wrote:
> On Tue, 30 Aug 2022 14:36:52 -0300 Jason Gunthorpe wrote:
> > I meant, rather generically, handling the packets on the
> > hypervisor side without involving the CPU.
> > 
> > We have customers deploying many different models for this in their
> > hypervisor, including a significant deployment using a model like the
> > above. 
> > 
> > It achieves a kind of connectivity to a VM with 0 hypervisor CPU
> > involvement with the vlan push/pop done in HW.
> 
> I don't see how that necessitates the full IPsec offload.

I think it would help if Leon showed the xfrm commands in his
configuration snippet and explained the traffic flow that is achieved
by it.

He has a configuration that is as I described: 0 hypervisor CPU
involvement, IPsec, and VMs.
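
For illustration, the xfrm side of such a setup might look roughly like
this - a minimal sketch, assuming the proposed iproute2 "offload full"
extension; addresses, SPI, and key are placeholders, and only dir out is
shown (a mirrored pair with dir in covers RX):

  ip xfrm state add src 192.168.50.1 dst 192.168.50.2 \
      proto esp spi 0x1000 reqid 1 mode transport \
      aead 'rfc4106(gcm(aes))' 0x0102030405060708090a0b0c0d0e0f1011121314 128 \
      offload full dev enp8s0f0 dir out
  ip xfrm policy add src 192.168.50.1 dst 192.168.50.2 dir out \
      tmpl src 192.168.50.1 dst 192.168.50.2 proto esp reqid 1 \
      mode transport offload full dev enp8s0f0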

> Sorry, perhaps I assumed we were on the same page too hastily.

Well, I think we all agree that if a hypervisor environment is
controlling the IPsec, then we'd like a path similar to others where
the hypervisor doesn't process the packets on the CPU - the traffic
just flows in the HW right out of the VM.

It seems obvious, but this means that the hypervisor controls HW that
does the crypto, the anti-replay, and the counting and limiting
"autonomously" from the hypervisor CPU (the so-called "full offload").

> I was referring to a model where the TC rules redirect to an IPsec
> tunnel which is then routed to the uplink. All offloaded. Like
> we do today for VxLAN encap/decap.

I think there are several models with TC that can do this. It is
probably worth talking about exactly which models should and need to
have it.

E.g. what exactly do you mean when you say "TC rules redirect to an IPsec
tunnel"? How does a TC rule redirect to an xfrm?

Jason


* Re: [PATCH xfrm-next v3 0/6] Extend XFRM core to allow full offload configuration
  2022-08-29  7:54 ` Steffen Klassert
  2022-08-30  6:48   ` Leon Romanovsky
@ 2022-09-04 16:45   ` Leon Romanovsky
  1 sibling, 0 replies; 19+ messages in thread
From: Leon Romanovsky @ 2022-09-04 16:45 UTC (permalink / raw)
  To: Steffen Klassert
  Cc: David S. Miller, Eric Dumazet, Herbert Xu, Jakub Kicinski,
	netdev, Paolo Abeni, Raed Salem, Saeed Mahameed

On Mon, Aug 29, 2022 at 09:54:03AM +0200, Steffen Klassert wrote:
> On Tue, Aug 23, 2022 at 04:31:57PM +0300, Leon Romanovsky wrote:
> > From: Leon Romanovsky <leonro@nvidia.com>
> > 
> > Changelog:
> > v3:
> >  * I didn't hear any suggestion what term to use instead of
> >    "full offload", so left it as is. It is used in commit messages
> >    and documentation only and easy to rename.
> >  * Added performance data and background info to cover letter
> >  * Reused xfrm_output_resume() function to support multiple XFRM transformations
> >  * Add PMTU check in addition to driver .xdo_dev_offload_ok validation
> >  * Documentation is in progress, but not part of this series yet.
> > v2: https://lore.kernel.org/all/cover.1660639789.git.leonro@nvidia.com
> >  * Rebased to latest 6.0-rc1
> >  * Add an extra check in TX datapath patch to validate packets before
> >    forwarding to HW.
> >  * Added policy cleanup logic in case of netdev down event
> > v1: https://lore.kernel.org/all/cover.1652851393.git.leonro@nvidia.com
> >  * Moved comment to be before if (...) in third patch.
> > v0: https://lore.kernel.org/all/cover.1652176932.git.leonro@nvidia.com
> > -----------------------------------------------------------------------
> > 
> > The following series extends XFRM core code to handle a new type of IPsec
> > offload - full offload.
> > 
> > In this mode, the HW is going to be responsible for the whole data path,
> > so both policy and state should be offloaded.
> > 
> > IPsec full offload is an improved version of IPsec crypto mode,
> > In full mode, HW is responsible to trim/add headers in addition to
> > decrypt/encrypt. In this mode, the packet arrives to the stack as already
> > decrypted and vice versa for TX (exits to HW as not-encrypted).
> > 
> > Devices that implement IPsec full offload mode offload policies too.
> > In the RX path, it causes the situation that HW can't effectively
> > handle mixed SW and HW priorities unless users make sure that HW offloaded
> > policies have higher priorities.
> > 
> > To make sure that users have a coherent picture, we require that
> > HW offloaded policies have always (both RX and TX) higher priorities
> > than SW ones.
> > 
> > To not over-engineer the code, HW policies are treated as SW ones and
> > don't take into account netdev to allow reuse of the same priorities for
> > different devices.
> > 
> > There are several deliberate limitations:
> >  * No software fallback
> >  * Fragments are dropped, both in RX and TX
> >  * No sockets policies
> >  * Only IPsec transport mode is implemented
> 
> ... and you still have not answered the fundamental questions:
> 
> As implemented, the software does not hold any state.
> I.e. there is no sync between hardware and software
> regarding stats, lifetime, lifebyte, packet counts
> and replay window. IKE rekeying and auditing are based
> on these; how should this be done?

I hope that the patch added in v4 clarifies it. There is a sync between
the HW and the core regarding packet counts. The HW generates an event,
and xfrm_state_check_expire() is called to make sure that the already
existing logic will do the rekeying.

The replay window will be handled in a similar way: the HW will generate
an event.
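
As an illustrative sketch of how this plugs into existing configuration,
the soft/hard limits that xfrm_state_check_expire() evaluates are the
regular per-SA lifetime limits - placeholder values, assuming the
proposed "offload full" syntax:

  ip xfrm state add src 192.168.50.1 dst 192.168.50.2 proto esp spi 0x1000 \
      reqid 1 mode transport aead 'rfc4106(gcm(aes))' <key> 128 \
      offload full dev enp8s0f0 dir out \
      limit packet-soft 100000 limit packet-hard 150000

Once the HW-reported packet count crosses the soft limit, the existing
km/IKE notification path should trigger rekeying just as it does for the
SW datapath.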

> 
> How can tunnel mode work with this offload?

I don't see any issues here. The same rules will apply.

> 
> I want to see the full picture before I consider
> applying this.


end of thread

Thread overview: 19+ messages
2022-08-23 13:31 [PATCH xfrm-next v3 0/6] Extend XFRM core to allow full offload configuration Leon Romanovsky
2022-08-23 13:31 ` [PATCH xfrm-next v3 1/6] xfrm: add new full offload flag Leon Romanovsky
2022-08-23 13:31 ` [Intel-wired-lan] [PATCH xfrm-next v3 2/6] xfrm: allow state full offload mode Leon Romanovsky
2022-08-23 13:31   ` Leon Romanovsky
2022-08-23 13:32 ` [PATCH xfrm-next v3 3/6] xfrm: add an interface to offload policy Leon Romanovsky
2022-08-23 13:32 ` [PATCH xfrm-next v3 4/6] xfrm: add TX datapath support for IPsec full offload mode Leon Romanovsky
2022-08-23 13:32 ` [PATCH xfrm-next v3 5/6] xfrm: add RX datapath protection " Leon Romanovsky
2022-08-23 13:32 ` [PATCH xfrm-next v3 6/6] xfrm: enforce separation between priorities of HW/SW policies Leon Romanovsky
2022-08-25 21:36 ` [PATCH xfrm-next v3 0/6] Extend XFRM core to allow full offload configuration Jakub Kicinski
2022-08-26  6:26   ` Leon Romanovsky
2022-08-26 23:45     ` Jakub Kicinski
2022-08-28  9:28       ` Leon Romanovsky
2022-08-30 17:36       ` Jason Gunthorpe
2022-08-30 18:56         ` Jakub Kicinski
2022-08-30 22:34           ` Jason Gunthorpe
2022-08-29  8:07     ` Steffen Klassert
2022-08-29  7:54 ` Steffen Klassert
2022-08-30  6:48   ` Leon Romanovsky
2022-09-04 16:45   ` Leon Romanovsky
