All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC 0/2] add API to set process to primary or secondary
@ 2022-12-01  8:20 Rongwei Liu
  2022-12-01  8:20 ` [RFC 1/2] ethdev: add group description Rongwei Liu
  2022-12-01  8:20 ` [RFC 2/2] ethdev: add API to set process to primary or secondary Rongwei Liu
  0 siblings, 2 replies; 54+ messages in thread
From: Rongwei Liu @ 2022-12-01  8:20 UTC (permalink / raw)
  To: matan, viacheslavo, orika, thomas; +Cc: dev, rasland

Set the rte_eth process to the primary or secondary role which affects
the flow rules offloading which is userful when switching DPDK to a
different version.

Targeted version is 23.03.

Rongwei Liu (2):
  ethdev: add group description
  ethdev: add API to set process to primary or secondary

 doc/guides/nics/mlx5.rst   | 10 ++++++
 lib/ethdev/ethdev_driver.h | 63 ++++++++++++++++++++++++++++++++++++++
 lib/ethdev/rte_ethdev.c    | 20 ++++++++++++
 lib/ethdev/rte_flow.h      | 14 +++++++++
 lib/ethdev/version.map     |  3 ++
 5 files changed, 110 insertions(+)

-- 
2.27.0


^ permalink raw reply	[flat|nested] 54+ messages in thread

* [RFC 1/2] ethdev: add group description
  2022-12-01  8:20 [RFC 0/2] add API to set process to primary or secondary Rongwei Liu
@ 2022-12-01  8:20 ` Rongwei Liu
  2022-12-01  8:20 ` [RFC 2/2] ethdev: add API to set process to primary or secondary Rongwei Liu
  1 sibling, 0 replies; 54+ messages in thread
From: Rongwei Liu @ 2022-12-01  8:20 UTC (permalink / raw)
  To: matan, viacheslavo, orika, thomas, Ferruh Yigit, Andrew Rybchenko
  Cc: dev, rasland

Add more sentences to describe group concepts and define
group 0 as root group for traffic to search a hit rule.

Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
---
 lib/ethdev/rte_flow.h | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/lib/ethdev/rte_flow.h b/lib/ethdev/rte_flow.h
index b60987db4b..2443c92c6c 100644
--- a/lib/ethdev/rte_flow.h
+++ b/lib/ethdev/rte_flow.h
@@ -86,6 +86,20 @@ extern "C" {
  * but may be valid in a few cases.
  */
 struct rte_flow_attr {
+	/**
+	 * Group is a superset of several rules and the user should always
+	 * set the flow rules starting from group 0. Jump action is used
+	 * to find the next rule set by specifying the group ID if matching
+	 * criteria are met.
+	 * The group ID which is to be jumped may be not contiguous with the
+	 * current flow rules' group ID, then the group IDs between them will
+	 * be skipped.
+	 *
+	 * The user should always organize flow rules from lower group to
+	 * higher group number otherwise, the dead loop is expected.
+	 *
+	 * Note: group 0 is shared between kernel and DPDK for bifurcated drivers.
+	 */
 	uint32_t group; /**< Priority group. */
 	uint32_t priority; /**< Rule priority level within group. */
 	/**
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [RFC 2/2] ethdev: add API to set process to primary or secondary
  2022-12-01  8:20 [RFC 0/2] add API to set process to primary or secondary Rongwei Liu
  2022-12-01  8:20 ` [RFC 1/2] ethdev: add group description Rongwei Liu
@ 2022-12-01  8:20 ` Rongwei Liu
  2022-12-01 15:10   ` Stephen Hemminger
  1 sibling, 1 reply; 54+ messages in thread
From: Rongwei Liu @ 2022-12-01  8:20 UTC (permalink / raw)
  To: matan, viacheslavo, orika, thomas, Ferruh Yigit, Andrew Rybchenko
  Cc: dev, rasland

Users may want to change the DPDK process to different versions
such as hot upgrade.
There is a strong requirement to simplify the logic and shorten the
traffic downtime as much as possible.

This update introduces new rte_eth process role definitions: primary
or secondary.

The primary role means rules are programmed to HW immediately, and no
behavior changed. This is the default state.
The secondary role means rules are queued in the HW. If no primary roles
alive or back to primary, the rules are effective immediately.

Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
---
 doc/guides/nics/mlx5.rst   | 10 ++++++
 lib/ethdev/ethdev_driver.h | 63 ++++++++++++++++++++++++++++++++++++++
 lib/ethdev/rte_ethdev.c    | 20 ++++++++++++
 lib/ethdev/version.map     |  3 ++
 4 files changed, 96 insertions(+)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index 51f51259e3..ea78e14b80 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -2001,3 +2001,13 @@ where:
 * ``sw_queue_id``: queue index in range [64536, 65535].
   This range is the highest 1000 numbers.
 * ``hw_queue_id``: queue index given by HW in queue creation.
+
+ethdev set process primary or secondary
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+User should only program group 0 (fdb_def_rule_en=0) when ``rte_eth_process_set_primary``
+has been called and set to a secondary role.
+Group 0 is shared across different DPDK processes while the other groups are limited
+to the current process scope.
+The process can't move from primary to secondary role if preceding primary application's
+rules are still present and vice versa.
diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
index 6a550cfc83..67b53840c8 100644
--- a/lib/ethdev/ethdev_driver.h
+++ b/lib/ethdev/ethdev_driver.h
@@ -179,6 +179,16 @@ struct rte_eth_dev_data {
 	pthread_mutex_t flow_ops_mutex; /**< rte_flow ops mutex */
 } __rte_cache_aligned;
 
+/**@{@name Different rte_eth role flag definitions which will be used
+ *  when miagrating DPDK to a different version.
+ */
+/*
+ * Traffic coming from NIC domain rules will reach
+ * both primary and secondary processes.
+ */
+#define RTE_ETH_PROCESS_NIC_DUP_WITH_SECONDARY RTE_BIT32(0),
+/**@}*/
+
 /**
  * @internal
  * The pool of *rte_eth_dev* structures. The size of the pool
@@ -1087,6 +1097,22 @@ typedef const uint32_t *(*eth_buffer_split_supported_hdr_ptypes_get_t)(struct rt
  */
 typedef int (*eth_dev_priv_dump_t)(struct rte_eth_dev *dev, FILE *file);
 
+/**
+ * @internal
+ * Set rte_eth process to primary or secondary role.
+ *
+ * @param dev
+ *   Port (ethdev) handle.
+ * @param active
+ *   Device (role) primary or not (secondary).
+ * @param flag
+ *   Role specific flag.
+ *
+ * @return
+ *   Negative value on error, 0 on success.
+ */
+typedef int (*eth_process_set_primary_t)(struct rte_eth_dev *dev, bool primary, uint32_t flag);
+
 /**
  * @internal Set Rx queue available descriptors threshold.
  * @see rte_eth_rx_avail_thresh_set()
@@ -1403,6 +1429,8 @@ struct eth_dev_ops {
 	eth_cman_config_set_t cman_config_set;
 	/** Retrieve congestion management configuration */
 	eth_cman_config_get_t cman_config_get;
+	/** Set the whole rte_eth process to primary or secondary role. */
+	eth_process_set_primary_t eth_process_set_primary;
 };
 
 /**
@@ -2046,6 +2074,41 @@ struct rte_eth_fdir_conf {
 	struct rte_eth_fdir_flex_conf flex_conf;
 };
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Set the rte_eth process to the primary or secondary role which affects
+ * the flow rules offloading. It doesn't allow multiple processes to be the
+ * same role unless no offload rules are set.
+ * The primary process's flow rules are effective immediately while the secondary
+ * process's rules will be queued in hardware until it becomes primary or no
+ * primary process is alive.
+ * The primary application will always receive traffic while the secondary
+ * application will receive traffic when no matching rules are present from
+ * the primary application.
+ *
+ * The application is primary by default if this API is not called.
+ *
+ * When a process transforms from a secondary to a primary role, all preceding
+ * flow rules which are queued by hardware will be effective immediately.
+ * Before role transition, all the rules set by the primary process should be
+ * flushed first.
+ *
+ * When role flag "RTE_ETH_PROCESS_NIC_DUP_WITH_SECONDARY" is set, NIC domain
+ * flow rules are effective immediately even if a process is secondary.
+ *
+ * @param active
+ *   Process primary (role) or not (secondary).
+ * @param flag
+ *   The role flag.
+ * @return
+ *   - (>=0) Number of rte devices which have been switched successfully.
+ *   - (-EINVAL) if bad parameter.
+ */
+__rte_experimental
+int rte_eth_process_set_primary(bool primary, uint32_t flag);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index 5d5e18db1e..251908b4e3 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -6318,6 +6318,26 @@ rte_eth_buffer_split_get_supported_hdr_ptypes(uint16_t port_id, uint32_t *ptypes
 	return j;
 }
 
+int rte_eth_process_set_primary(bool primary, uint32_t flag)
+{
+	struct rte_eth_dev *dev;
+	uint16_t port_id;
+	int ret, val;
+
+	ret = 0;
+	RTE_ETH_FOREACH_DEV(port_id) {
+		dev = &rte_eth_devices[port_id];
+		if (*dev->dev_ops->eth_process_set_primary == NULL)
+			return -ENOTSUP;
+		val = (*dev->dev_ops->eth_process_set_primary)(dev, primary, flag);
+
+		if (val < 0)
+			return -EINVAL;
+		ret++;
+	}
+	return ret;
+}
+
 RTE_LOG_REGISTER_DEFAULT(rte_eth_dev_logtype, INFO);
 
 RTE_INIT(ethdev_init_telemetry)
diff --git a/lib/ethdev/version.map b/lib/ethdev/version.map
index 17201fbe0f..1823869e73 100644
--- a/lib/ethdev/version.map
+++ b/lib/ethdev/version.map
@@ -298,6 +298,9 @@ EXPERIMENTAL {
 	rte_flow_get_q_aged_flows;
 	rte_mtr_meter_policy_get;
 	rte_mtr_meter_profile_get;
+
+	# added in 23.03
+	rte_eth_process_set_primary;
 };
 
 INTERNAL {
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* Re: [RFC 2/2] ethdev: add API to set process to primary or secondary
  2022-12-01  8:20 ` [RFC 2/2] ethdev: add API to set process to primary or secondary Rongwei Liu
@ 2022-12-01 15:10   ` Stephen Hemminger
  2022-12-02  3:27     ` Rongwei Liu
  0 siblings, 1 reply; 54+ messages in thread
From: Stephen Hemminger @ 2022-12-01 15:10 UTC (permalink / raw)
  To: Rongwei Liu
  Cc: matan, viacheslavo, orika, thomas, Ferruh Yigit,
	Andrew Rybchenko, dev, rasland

On Thu, 1 Dec 2022 10:20:05 +0200
Rongwei Liu <rongweil@nvidia.com> wrote:

> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Set the rte_eth process to the primary or secondary role which affects
> + * the flow rules offloading. It doesn't allow multiple processes to be the
> + * same role unless no offload rules are set.
> + * The primary process's flow rules are effective immediately while the secondary
> + * process's rules will be queued in hardware until it becomes primary or no
> + * primary process is alive.
> + * The primary application will always receive traffic while the secondary
> + * application will receive traffic when no matching rules are present from
> + * the primary application.
> + *
> + * The application is primary by default if this API is not called.
> + *
> + * When a process transforms from a secondary to a primary role, all preceding
> + * flow rules which are queued by hardware will be effective immediately.
> + * Before role transition, all the rules set by the primary process should be
> + * flushed first.
> + *
> + * When role flag "RTE_ETH_PROCESS_NIC_DUP_WITH_SECONDARY" is set, NIC domain
> + * flow rules are effective immediately even if a process is secondary.
> + *
> + * @param active
> + *   Process primary (role) or not (secondary).
> + * @param flag
> + *   The role flag.
> + * @return
> + *   - (>=0) Number of rte devices which have been switched successfully.
> + *   - (-EINVAL) if bad parameter.
> + */
> +__rte_experimental
> +int rte_eth_process_set_primary(bool primary, uint32_t flag);

The state of the devices and the system is really unstable if
this fails. There is no rollback here.

I think this should have a PMD capability flag so that application
can check that device supports doing this. And it would have to
be opt-in so that existing devices would always fail.


^ permalink raw reply	[flat|nested] 54+ messages in thread

* RE: [RFC 2/2] ethdev: add API to set process to primary or secondary
  2022-12-01 15:10   ` Stephen Hemminger
@ 2022-12-02  3:27     ` Rongwei Liu
  2022-12-05 16:08       ` Stephen Hemminger
  0 siblings, 1 reply; 54+ messages in thread
From: Rongwei Liu @ 2022-12-02  3:27 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Matan Azrad, Slava Ovsiienko, Ori Kam,
	NBU-Contact-Thomas Monjalon (EXTERNAL),
	Ferruh Yigit, Andrew Rybchenko, dev, Raslan Darawsheh



BR
Rongwei

> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Thursday, December 1, 2022 23:10
> To: Rongwei Liu <rongweil@nvidia.com>
> Cc: Matan Azrad <matan@nvidia.com>; Slava Ovsiienko
> <viacheslavo@nvidia.com>; Ori Kam <orika@nvidia.com>; NBU-Contact-
> Thomas Monjalon (EXTERNAL) <thomas@monjalon.net>; Ferruh Yigit
> <ferruh.yigit@amd.com>; Andrew Rybchenko
> <andrew.rybchenko@oktetlabs.ru>; dev@dpdk.org; Raslan Darawsheh
> <rasland@nvidia.com>
> Subject: Re: [RFC 2/2] ethdev: add API to set process to primary or secondary
> 
> External email: Use caution opening links or attachments
> 
> 
> On Thu, 1 Dec 2022 10:20:05 +0200
> Rongwei Liu <rongweil@nvidia.com> wrote:
> 
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice
> > + *
> > + * Set the rte_eth process to the primary or secondary role which
> > +affects
> > + * the flow rules offloading. It doesn't allow multiple processes to
> > +be the
> > + * same role unless no offload rules are set.
> > + * The primary process's flow rules are effective immediately while
> > +the secondary
> > + * process's rules will be queued in hardware until it becomes
> > +primary or no
> > + * primary process is alive.
> > + * The primary application will always receive traffic while the
> > +secondary
> > + * application will receive traffic when no matching rules are
> > +present from
> > + * the primary application.
> > + *
> > + * The application is primary by default if this API is not called.
> > + *
> > + * When a process transforms from a secondary to a primary role, all
> > +preceding
> > + * flow rules which are queued by hardware will be effective immediately.
> > + * Before role transition, all the rules set by the primary process
> > +should be
> > + * flushed first.
> > + *
> > + * When role flag "RTE_ETH_PROCESS_NIC_DUP_WITH_SECONDARY" is set,
> > +NIC domain
> > + * flow rules are effective immediately even if a process is secondary.
> > + *
> > + * @param active
> > + *   Process primary (role) or not (secondary).
> > + * @param flag
> > + *   The role flag.
> > + * @return
> > + *   - (>=0) Number of rte devices which have been switched successfully.
> > + *   - (-EINVAL) if bad parameter.
> > + */
> > +__rte_experimental
> > +int rte_eth_process_set_primary(bool primary, uint32_t flag);
> 
> The state of the devices and the system is really unstable if this fails. There is
> no rollback here.
> 
Assume application is calling rte_eth_process_set_primary(false);
Once failed, call all preceding successful ports as rte_eth_process_set_primary(true);
What do you think?
> I think this should have a PMD capability flag so that application can check
> that device supports doing this. And it would have to be opt-in so that existing
> devices would always fail.
If device doesn't support it, it can set the ethdev callback to NULL or return failure for all devices.
Then the devices' state will be consistent.


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC 2/2] ethdev: add API to set process to primary or secondary
  2022-12-02  3:27     ` Rongwei Liu
@ 2022-12-05 16:08       ` Stephen Hemminger
  2022-12-06  3:47         ` Rongwei Liu
  0 siblings, 1 reply; 54+ messages in thread
From: Stephen Hemminger @ 2022-12-05 16:08 UTC (permalink / raw)
  To: Rongwei Liu
  Cc: Matan Azrad, Slava Ovsiienko, Ori Kam,
	NBU-Contact-Thomas Monjalon (EXTERNAL),
	Ferruh Yigit, Andrew Rybchenko, dev, Raslan Darawsheh

On Fri, 2 Dec 2022 03:27:38 +0000
Rongwei Liu <rongweil@nvidia.com> wrote:

> > 
> > The state of the devices and the system is really unstable if this fails. There is
> > no rollback here.
> >   
> Assume application is calling rte_eth_process_set_primary(false);
> Once failed, call all preceding successful ports as rte_eth_process_set_primary(true);
> What do you think?
> > I think this should have a PMD capability flag so that application can check
> > that device supports doing this. And it would have to be opt-in so that existing
> > devices would always fail.  
> If device doesn't support it, it can set the ethdev callback to NULL or return failure for all devices.
> Then the devices' state will be consistent.


Assume there are two DPDK ports.
If the application tries to change roles and one of the devices does not
support the change over, then that error is fatal. The first device has
changed state already, and the second doesn't allow it.

This needs to be a capability flag for the device, and would need an additional
flag in the device documentation as well.

I bet many devices do regular malloc or mmap in the primary process and that
is not going to work with this change.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* RE: [RFC 2/2] ethdev: add API to set process to primary or secondary
  2022-12-05 16:08       ` Stephen Hemminger
@ 2022-12-06  3:47         ` Rongwei Liu
  2022-12-06  5:54           ` Stephen Hemminger
  0 siblings, 1 reply; 54+ messages in thread
From: Rongwei Liu @ 2022-12-06  3:47 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Matan Azrad, Slava Ovsiienko, Ori Kam,
	NBU-Contact-Thomas Monjalon (EXTERNAL),
	Ferruh Yigit, Andrew Rybchenko, dev, Raslan Darawsheh

Hi

BR
Rongwei

> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Tuesday, December 6, 2022 00:08
> To: Rongwei Liu <rongweil@nvidia.com>
> Cc: Matan Azrad <matan@nvidia.com>; Slava Ovsiienko
> <viacheslavo@nvidia.com>; Ori Kam <orika@nvidia.com>; NBU-Contact-
> Thomas Monjalon (EXTERNAL) <thomas@monjalon.net>; Ferruh Yigit
> <ferruh.yigit@amd.com>; Andrew Rybchenko
> <andrew.rybchenko@oktetlabs.ru>; dev@dpdk.org; Raslan Darawsheh
> <rasland@nvidia.com>
> Subject: Re: [RFC 2/2] ethdev: add API to set process to primary or secondary
> 
> External email: Use caution opening links or attachments
> 
> 
> On Fri, 2 Dec 2022 03:27:38 +0000
> Rongwei Liu <rongweil@nvidia.com> wrote:
> 
> > >
> > > The state of the devices and the system is really unstable if this
> > > fails. There is no rollback here.
> > >
> > Assume application is calling rte_eth_process_set_primary(false);
> > Once failed, call all preceding successful ports as
> > rte_eth_process_set_primary(true);
> > What do you think?
> > > I think this should have a PMD capability flag so that application
> > > can check that device supports doing this. And it would have to be
> > > opt-in so that existing devices would always fail.
> > If device doesn't support it, it can set the ethdev callback to NULL or return
> failure for all devices.
> > Then the devices' state will be consistent.
> 
> 
> Assume there are two DPDK ports.
> If the application tries to change roles and one of the devices does not support
> the change over, then that error is fatal. The first device has changed state
> already, and the second doesn't allow it.
> 
If my understanding is correct, you are saying one application to probe two PMD vendors. This is difficult to handle. 
> This needs to be a capability flag for the device, and would need an additional
> flag in the device documentation as well.
> 
For multiple vendors simultaneously probing. Capability flag is a must. But no need for single vendor, right?
> I bet many devices do regular malloc or mmap in the primary process and that
> is not going to work with this change.
Sorry, looks like I mis-lead you. The words "Primary/Secondary" in this update have no relationship with current PRIMARY/SECDONARY definition.
Doesn't focus on the resource ownership.
In this update: Primary application' offload rules are effective immediately. Secondary' rules are in queue which will be effective if primary application exits or primary application doesn't insert any rule.
Maybe we can call it as "active/standby" or "main/standby" or "active/backup"?  Do you have naming suggestion?  

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC 2/2] ethdev: add API to set process to primary or secondary
  2022-12-06  3:47         ` Rongwei Liu
@ 2022-12-06  5:54           ` Stephen Hemminger
  2022-12-21  9:00             ` [RFC v3 0/2] add API to set process to active or standby Rongwei Liu
  0 siblings, 1 reply; 54+ messages in thread
From: Stephen Hemminger @ 2022-12-06  5:54 UTC (permalink / raw)
  To: Rongwei Liu
  Cc: Matan Azrad, Slava Ovsiienko, Ori Kam,
	NBU-Contact-Thomas Monjalon (EXTERNAL),
	Ferruh Yigit, Andrew Rybchenko, dev, Raslan Darawsheh

On Tue, 6 Dec 2022 03:47:42 +0000
Rongwei Liu <rongweil@nvidia.com> wrote:

> Hi
> 
> BR
> Rongwei
> 
> > -----Original Message-----
> > From: Stephen Hemminger <stephen@networkplumber.org>
> > Sent: Tuesday, December 6, 2022 00:08
> > To: Rongwei Liu <rongweil@nvidia.com>
> > Cc: Matan Azrad <matan@nvidia.com>; Slava Ovsiienko
> > <viacheslavo@nvidia.com>; Ori Kam <orika@nvidia.com>; NBU-Contact-
> > Thomas Monjalon (EXTERNAL) <thomas@monjalon.net>; Ferruh Yigit
> > <ferruh.yigit@amd.com>; Andrew Rybchenko
> > <andrew.rybchenko@oktetlabs.ru>; dev@dpdk.org; Raslan Darawsheh
> > <rasland@nvidia.com>
> > Subject: Re: [RFC 2/2] ethdev: add API to set process to primary or secondary
> > 
> > External email: Use caution opening links or attachments
> > 
> > 
> > On Fri, 2 Dec 2022 03:27:38 +0000
> > Rongwei Liu <rongweil@nvidia.com> wrote:
> >   
> > > >
> > > > The state of the devices and the system is really unstable if this
> > > > fails. There is no rollback here.
> > > >  
> > > Assume application is calling rte_eth_process_set_primary(false);
> > > Once failed, call all preceding successful ports as
> > > rte_eth_process_set_primary(true);
> > > What do you think?  
> > > > I think this should have a PMD capability flag so that application
> > > > can check that device supports doing this. And it would have to be
> > > > opt-in so that existing devices would always fail.  
> > > If device doesn't support it, it can set the ethdev callback to NULL or return  
> > failure for all devices.  
> > > Then the devices' state will be consistent.  
> > 
> > 
> > Assume there are two DPDK ports.
> > If the application tries to change roles and one of the devices does not support
> > the change over, then that error is fatal. The first device has changed state
> > already, and the second doesn't allow it.
> >   
> If my understanding is correct, you are saying one application to probe two PMD vendors. This is difficult to handle. 
> > This needs to be a capability flag for the device, and would need an additional
> > flag in the device documentation as well.
> >   
> For multiple vendors simultaneously probing. Capability flag is a must. But no need for single vendor, right?
> > I bet many devices do regular malloc or mmap in the primary process and that
> > is not going to work with this change.  
> Sorry, looks like I mis-lead you. The words "Primary/Secondary" in this update have no relationship with current PRIMARY/SECDONARY definition.
> Doesn't focus on the resource ownership.
> In this update: Primary application' offload rules are effective immediately. Secondary' rules are in queue which will be effective if primary application exits or primary application doesn't insert any rule.
> Maybe we can call it as "active/standby" or "main/standby" or "active/backup"?  Do you have naming suggestion?  


DPDK supports any combination of PCI and virtual devices.
Any patch that restricts that is a bad idea.

There already is a capability mechanism in ethdev API.
A well written application would look at those flags for all ethdev's before
attempting transition. Layered PMD's like bonding, failsafe, and netvsc would also
need to handle the nesting.

The problem is hard, what you did so far is a start but there are lots more issues
that need to be considered.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* [RFC v3 0/2] add API to set process to active or standby
  2022-12-06  5:54           ` Stephen Hemminger
@ 2022-12-21  9:00             ` Rongwei Liu
  2022-12-21  9:00               ` [RFC v3 1/2] ethdev: add group description Rongwei Liu
  2022-12-21  9:00               ` [RFC v3 2/2] ethdev: add API to set process to active or standby Rongwei Liu
  0 siblings, 2 replies; 54+ messages in thread
From: Rongwei Liu @ 2022-12-21  9:00 UTC (permalink / raw)
  To: matan, viacheslavo, orika, thomas; +Cc: dev, rasland

Set the rte_eth process to the active or standby role that affects
the flow rules offloading which is userful when switching DPDK to a
different version.

v3: Use active/standby words to distinguish.
    Add new device capability bit.
v2: Add more sentences to describe group concepts and define group 0 as
    root group for traffic to search a hit rule.

Rongwei Liu (2):
  ethdev: add group description
  ethdev: add API to set process to active or standby

 doc/guides/nics/mlx5.rst               | 10 ++++
 doc/guides/rel_notes/release_22_03.rst |  5 ++
 lib/ethdev/ethdev_driver.h             | 63 ++++++++++++++++++++++++++
 lib/ethdev/rte_ethdev.c                | 41 +++++++++++++++++
 lib/ethdev/rte_ethdev.h                |  7 ++-
 lib/ethdev/rte_flow.h                  | 15 ++++++
 lib/ethdev/version.map                 |  3 ++
 7 files changed, 143 insertions(+), 1 deletion(-)

-- 
2.27.0


^ permalink raw reply	[flat|nested] 54+ messages in thread

* [RFC v3 1/2] ethdev: add group description
  2022-12-21  9:00             ` [RFC v3 0/2] add API to set process to active or standby Rongwei Liu
@ 2022-12-21  9:00               ` Rongwei Liu
  2023-01-18 15:44                 ` [PATCH v4 0/3] add API for live migration Rongwei Liu
  2022-12-21  9:00               ` [RFC v3 2/2] ethdev: add API to set process to active or standby Rongwei Liu
  1 sibling, 1 reply; 54+ messages in thread
From: Rongwei Liu @ 2022-12-21  9:00 UTC (permalink / raw)
  To: matan, viacheslavo, orika, thomas, Ferruh Yigit, Andrew Rybchenko
  Cc: dev, rasland

Add more sentences to describe group concepts and define
group 0 as root group for traffic to search a hit rule.

Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
---
 lib/ethdev/rte_flow.h | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/lib/ethdev/rte_flow.h b/lib/ethdev/rte_flow.h
index b60987db4b..e004e41ac8 100644
--- a/lib/ethdev/rte_flow.h
+++ b/lib/ethdev/rte_flow.h
@@ -86,6 +86,21 @@ extern "C" {
  * but may be valid in a few cases.
  */
 struct rte_flow_attr {
+	/**
+	 * Group is a superset of several rules and the user should always
+	 * set the flow rules starting from group 0. Jump action is used
+	 * to find the next rule set by specifying the group ID if matching
+	 * criteria are met.
+	 * The group ID which is to be jumped may be not contiguous with the
+	 * current flow rules' group ID, then the group IDs between them will
+	 * be skipped.
+	 *
+	 * The user should make sure no dead loop is introduced when organizing
+	 * flow rules and this can be achieved by always jumping to higher group
+	 * number.
+	 *
+	 * Note: group 0 is shared between kernel and DPDK for bifurcated drivers.
+	 */
 	uint32_t group; /**< Priority group. */
 	uint32_t priority; /**< Rule priority level within group. */
 	/**
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [RFC v3 2/2] ethdev: add API to set process to active or standby
  2022-12-21  9:00             ` [RFC v3 0/2] add API to set process to active or standby Rongwei Liu
  2022-12-21  9:00               ` [RFC v3 1/2] ethdev: add group description Rongwei Liu
@ 2022-12-21  9:00               ` Rongwei Liu
  2022-12-21  9:12                 ` Jerin Jacob
  1 sibling, 1 reply; 54+ messages in thread
From: Rongwei Liu @ 2022-12-21  9:00 UTC (permalink / raw)
  To: matan, viacheslavo, orika, thomas, Ferruh Yigit, Andrew Rybchenko
  Cc: dev, rasland

Users may want to change the DPDK process to different versions
such as hot upgrade.
There is a strong requirement to simplify the logic and shorten the
traffic downtime as much as possible.

This update introduces new rte_eth process role definitions: active
or standby.

The active role means rules are programmed to HW immediately, and no
behavior changed. This is the default state.
The standby role means rules are queued in the HW. If no active roles
alive or back to active, the rules are effective immediately.

Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
---
 doc/guides/nics/mlx5.rst               | 10 ++++
 doc/guides/rel_notes/release_22_03.rst |  5 ++
 lib/ethdev/ethdev_driver.h             | 63 ++++++++++++++++++++++++++
 lib/ethdev/rte_ethdev.c                | 41 +++++++++++++++++
 lib/ethdev/rte_ethdev.h                |  7 ++-
 lib/ethdev/version.map                 |  3 ++
 6 files changed, 128 insertions(+), 1 deletion(-)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index 51f51259e3..de1fdac0a1 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -2001,3 +2001,13 @@ where:
 * ``sw_queue_id``: queue index in range [64536, 65535].
   This range is the highest 1000 numbers.
 * ``hw_queue_id``: queue index given by HW in queue creation.
+
+ethdev set process active or standby
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+User should only program group 0 (fdb_def_rule_en=0) when ``rte_eth_process_set_active``
+has been called and set to a standby role.
+Group 0 is shared across different DPDK processes while the other groups are limited
+to the current process scope.
+The process can't move from active to standby role if preceding active application's
+rules are still present and vice versa.
diff --git a/doc/guides/rel_notes/release_22_03.rst b/doc/guides/rel_notes/release_22_03.rst
index 0923707cb8..6fa48106c4 100644
--- a/doc/guides/rel_notes/release_22_03.rst
+++ b/doc/guides/rel_notes/release_22_03.rst
@@ -207,6 +207,11 @@ API Changes
 * ethdev: Old public macros and enumeration constants without ``RTE_ETH_`` prefix,
   which are kept for backward compatibility, are marked as deprecated.
 
+* ethdev: added a new experimental api:
+
+  The new API ``rte_eth_process_set_active()`` was added.
+  If ``RTE_ETH_CAPA_PROCESS_SET_ROLE`` is not advertised, this new api is not supported.
+
 * cryptodev: The asymmetric session handling was modified to use a single
   mempool object. An API ``rte_cryptodev_asym_session_pool_create`` was added
   to create a mempool with element size big enough to hold the generic asymmetric
diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
index 6a550cfc83..3c583bc39d 100644
--- a/lib/ethdev/ethdev_driver.h
+++ b/lib/ethdev/ethdev_driver.h
@@ -179,6 +179,16 @@ struct rte_eth_dev_data {
 	pthread_mutex_t flow_ops_mutex; /**< rte_flow ops mutex */
 } __rte_cache_aligned;
 
+/**@{@name Different rte_eth role flag definitions which will be used
+ *  when miagrating DPDK to a different version.
+ */
+/*
+ * Traffic coming from NIC domain rules will reach
+ * both active and standby processes.
+ */
+#define RTE_ETH_PROCESS_NIC_DUP_WITH_STANDBY RTE_BIT32(0),
+/**@}*/
+
 /**
  * @internal
  * The pool of *rte_eth_dev* structures. The size of the pool
@@ -1087,6 +1097,22 @@ typedef const uint32_t *(*eth_buffer_split_supported_hdr_ptypes_get_t)(struct rt
  */
 typedef int (*eth_dev_priv_dump_t)(struct rte_eth_dev *dev, FILE *file);
 
+/**
+ * @internal
+ * Set rte_eth process to active or standby role.
+ *
+ * @param dev
+ *   Port (ethdev) handle.
+ * @param active
+ *   Device (role) active or not (standby).
+ * @param flag
+ *   Role specific flag.
+ *
+ * @return
+ *   Negative value on error, 0 on success.
+ */
+typedef int (*eth_process_set_active_t)(struct rte_eth_dev *dev, bool active, uint32_t flag);
+
 /**
  * @internal Set Rx queue available descriptors threshold.
  * @see rte_eth_rx_avail_thresh_set()
@@ -1403,6 +1429,8 @@ struct eth_dev_ops {
 	eth_cman_config_set_t cman_config_set;
 	/** Retrieve congestion management configuration */
 	eth_cman_config_get_t cman_config_get;
+	/** Set the whole rte_eth process to active or standby role. */
+	eth_process_set_active_t eth_process_set_active;
 };
 
 /**
@@ -2046,6 +2074,41 @@ struct rte_eth_fdir_conf {
 	struct rte_eth_fdir_flex_conf flex_conf;
 };
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Set the rte_eth process to the active or standby role which affects
+ * the flow rules offloading. It doesn't allow multiple processes to be the
+ * same role unless no offload rules are set.
+ * The active process flow rules are effective immediately while the standby
+ * process rules will be matched (active) when the process becomes active or
+ * when the traffic is not matched by the active process rules.
+ * The active application will always receive traffic while the standby
+ * application will receive traffic when no matching rules are present from
+ * the active application.
+ *
+ * The application is active by default if this API is not called.
+ *
+ * When a process transforms from a standby to a active role, all preceding
+ * flow rules which are queued by hardware will be effective immediately.
+ * Before role transition, all the rules set by the active process should be
+ * flushed first.
+ *
+ * When role flag "RTE_ETH_PROCESS_NIC_DUP_WITH_STANDBY" is set, NIC domain
+ * flow rules are effective immediately even if a process is standby role.
+ *
+ * @param active
+ *   Process active (role) or not (standby).
+ * @param flag
+ *   The role flag.
+ * @return
+ *   - (>=0) Number of rte devices which have been switched successfully.
+ *   - (-EINVAL) if bad parameter.
+ */
+__rte_experimental
+int rte_eth_process_set_active(bool active, uint32_t flag);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index 5d5e18db1e..f19da75bfe 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -6318,6 +6318,47 @@ rte_eth_buffer_split_get_supported_hdr_ptypes(uint16_t port_id, uint32_t *ptypes
 	return j;
 }
 
+int rte_eth_process_set_active(bool active, uint32_t flag)
+{
+	struct rte_eth_dev_info dev_info = {0};
+	uint32_t flags[RTE_MAX_ETHPORTS];
+	struct rte_eth_dev *dev;
+	uint16_t port_id;
+	int ret = 0;
+
+	/* Check if all devices support. */
+	RTE_ETH_FOREACH_DEV(port_id) {
+		dev = &rte_eth_devices[port_id];
+		if (*dev->dev_ops->dev_infos_get == NULL ||
+		    *dev->dev_ops->eth_process_set_active == NULL)
+			return -ENOTSUP;
+		if ((*dev->dev_ops->dev_infos_get)(dev, &dev_info))
+			return -EINVAL;
+		if (!(dev_info.dev_capa & RTE_ETH_CAPA_PROCESS_SET_ROLE))
+			return -ENOTSUP;
+	}
+	RTE_ETH_FOREACH_DEV(port_id) {
+		dev = &rte_eth_devices[port_id];
+		if ((*dev->dev_ops->dev_infos_get)(dev, &dev_info))
+			goto err;
+		flags[port_id] = dev_info.eth_process_flag;
+		if ((*dev->dev_ops->eth_process_set_active)(dev, active, flag) < 0)
+			goto err;
+		ret++;
+	}
+	return ret;
+err:
+	if (!ret)
+		return 0;
+	RTE_ETH_FOREACH_DEV(port_id) {
+		dev = &rte_eth_devices[port_id];
+		(*dev->dev_ops->eth_process_set_active)(dev, !active, flags[port_id]);
+		if (--ret == 0)
+			break;
+	}
+	return 0;
+}
+
 RTE_LOG_REGISTER_DEFAULT(rte_eth_dev_logtype, INFO);
 
 RTE_INIT(ethdev_init_telemetry)
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index c129ca1eaf..d29f051d6f 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -1606,6 +1606,8 @@ struct rte_eth_conf {
 #define RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP         RTE_BIT64(3)
 /** Device supports keeping shared flow objects across restart. */
 #define RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP RTE_BIT64(4)
+/**Device supports "rte_eth_process_set_active" callback. */
+#define RTE_ETH_CAPA_PROCESS_SET_ROLE RTE_BIT64(5)
 /**@}*/
 
 /*
@@ -1777,8 +1779,11 @@ struct rte_eth_dev_info {
 	struct rte_eth_switch_info switch_info;
 	/** Supported error handling mode. */
 	enum rte_eth_err_handle_mode err_handle_mode;
+	/** Process specific role flag. */
+	uint32_t eth_process_flag;
 
-	uint64_t reserved_64s[2]; /**< Reserved for future fields */
+	uint32_t reserved_32s[1]; /**< Reserved for future fields */
+	uint64_t reserved_64s[1]; /**< Reserved for future fields */
 	void *reserved_ptrs[2];   /**< Reserved for future fields */
 };
 
diff --git a/lib/ethdev/version.map b/lib/ethdev/version.map
index 17201fbe0f..a5503f6fde 100644
--- a/lib/ethdev/version.map
+++ b/lib/ethdev/version.map
@@ -298,6 +298,9 @@ EXPERIMENTAL {
 	rte_flow_get_q_aged_flows;
 	rte_mtr_meter_policy_get;
 	rte_mtr_meter_profile_get;
+
+	# added in 23.03
+	rte_eth_process_set_active;
 };
 
 INTERNAL {
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* Re: [RFC v3 2/2] ethdev: add API to set process to active or standby
  2022-12-21  9:00               ` [RFC v3 2/2] ethdev: add API to set process to active or standby Rongwei Liu
@ 2022-12-21  9:12                 ` Jerin Jacob
  2022-12-21  9:32                   ` Rongwei Liu
  0 siblings, 1 reply; 54+ messages in thread
From: Jerin Jacob @ 2022-12-21  9:12 UTC (permalink / raw)
  To: Rongwei Liu
  Cc: matan, viacheslavo, orika, thomas, Ferruh Yigit,
	Andrew Rybchenko, dev, rasland

On Wed, Dec 21, 2022 at 2:31 PM Rongwei Liu <rongweil@nvidia.com> wrote:
>
> Users may want to change the DPDK process to different versions

Different version of DPDK? If there is any ABI change how to support this?

> such as hot upgrade.
> There is a strong requirement to simplify the logic and shorten the
> traffic downtime as much as possible.
>
> This update introduces new rte_eth process role definitions: active
> or standby.
>
> The active role means rules are programmed to HW immediately, and no

Why it has to be specific only to rte_flow rule? If it spedieic to
rte_flow, why it is in rte_eth_process_ name space?

Also, if we are moving the standby, What about the rule whose ABI is
changed between versions?

> behavior changed. This is the default state.
> The standby role means rules are queued in the HW. If no active roles
> alive or back to active, the rules are effective immediately.
>
> Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
> ---
>  doc/guides/nics/mlx5.rst               | 10 ++++
>  doc/guides/rel_notes/release_22_03.rst |  5 ++
>  lib/ethdev/ethdev_driver.h             | 63 ++++++++++++++++++++++++++
>  lib/ethdev/rte_ethdev.c                | 41 +++++++++++++++++
>  lib/ethdev/rte_ethdev.h                |  7 ++-
>  lib/ethdev/version.map                 |  3 ++
>  6 files changed, 128 insertions(+), 1 deletion(-)
>
> diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
> index 51f51259e3..de1fdac0a1 100644
> --- a/doc/guides/nics/mlx5.rst
> +++ b/doc/guides/nics/mlx5.rst
> @@ -2001,3 +2001,13 @@ where:
>  * ``sw_queue_id``: queue index in range [64536, 65535].
>    This range is the highest 1000 numbers.
>  * ``hw_queue_id``: queue index given by HW in queue creation.
> +
> +ethdev set process active or standby
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +User should only program group 0 (fdb_def_rule_en=0) when ``rte_eth_process_set_active``
> +has been called and set to a standby role.
> +Group 0 is shared across different DPDK processes while the other groups are limited
> +to the current process scope.
> +The process can't move from active to standby role if preceding active application's
> +rules are still present and vice versa.
> diff --git a/doc/guides/rel_notes/release_22_03.rst b/doc/guides/rel_notes/release_22_03.rst
> index 0923707cb8..6fa48106c4 100644
> --- a/doc/guides/rel_notes/release_22_03.rst
> +++ b/doc/guides/rel_notes/release_22_03.rst
> @@ -207,6 +207,11 @@ API Changes
>  * ethdev: Old public macros and enumeration constants without ``RTE_ETH_`` prefix,
>    which are kept for backward compatibility, are marked as deprecated.
>
> +* ethdev: added a new experimental api:
> +
> +  The new API ``rte_eth_process_set_active()`` was added.
> +  If ``RTE_ETH_CAPA_PROCESS_SET_ROLE`` is not advertised, this new api is not supported.
> +
>  * cryptodev: The asymmetric session handling was modified to use a single
>    mempool object. An API ``rte_cryptodev_asym_session_pool_create`` was added
>    to create a mempool with element size big enough to hold the generic asymmetric
> diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
> index 6a550cfc83..3c583bc39d 100644
> --- a/lib/ethdev/ethdev_driver.h
> +++ b/lib/ethdev/ethdev_driver.h
> @@ -179,6 +179,16 @@ struct rte_eth_dev_data {
>         pthread_mutex_t flow_ops_mutex; /**< rte_flow ops mutex */
>  } __rte_cache_aligned;
>
> +/**@{@name Different rte_eth role flag definitions which will be used
> + *  when miagrating DPDK to a different version.
> + */
> +/*
> + * Traffic coming from NIC domain rules will reach
> + * both active and standby processes.
> + */
> +#define RTE_ETH_PROCESS_NIC_DUP_WITH_STANDBY RTE_BIT32(0),
> +/**@}*/
> +
>  /**
>   * @internal
>   * The pool of *rte_eth_dev* structures. The size of the pool
> @@ -1087,6 +1097,22 @@ typedef const uint32_t *(*eth_buffer_split_supported_hdr_ptypes_get_t)(struct rt
>   */
>  typedef int (*eth_dev_priv_dump_t)(struct rte_eth_dev *dev, FILE *file);
>
> +/**
> + * @internal
> + * Set rte_eth process to active or standby role.
> + *
> + * @param dev
> + *   Port (ethdev) handle.
> + * @param active
> + *   Device (role) active or not (standby).
> + * @param flag
> + *   Role specific flag.
> + *
> + * @return
> + *   Negative value on error, 0 on success.
> + */
> +typedef int (*eth_process_set_active_t)(struct rte_eth_dev *dev, bool active, uint32_t flag);
> +
>  /**
>   * @internal Set Rx queue available descriptors threshold.
>   * @see rte_eth_rx_avail_thresh_set()
> @@ -1403,6 +1429,8 @@ struct eth_dev_ops {
>         eth_cman_config_set_t cman_config_set;
>         /** Retrieve congestion management configuration */
>         eth_cman_config_get_t cman_config_get;
> +       /** Set the whole rte_eth process to active or standby role. */
> +       eth_process_set_active_t eth_process_set_active;
>  };
>
>  /**
> @@ -2046,6 +2074,41 @@ struct rte_eth_fdir_conf {
>         struct rte_eth_fdir_flex_conf flex_conf;
>  };
>
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Set the rte_eth process to the active or standby role which affects
> + * the flow rules offloading. It doesn't allow multiple processes to be the
> + * same role unless no offload rules are set.
> + * The active process flow rules are effective immediately while the standby
> + * process rules will be matched (active) when the process becomes active or
> + * when the traffic is not matched by the active process rules.
> + * The active application will always receive traffic while the standby
> + * application will receive traffic when no matching rules are present from
> + * the active application.
> + *
> + * The application is active by default if this API is not called.
> + *
> + * When a process transforms from a standby to a active role, all preceding
> + * flow rules which are queued by hardware will be effective immediately.
> + * Before role transition, all the rules set by the active process should be
> + * flushed first.
> + *
> + * When role flag "RTE_ETH_PROCESS_NIC_DUP_WITH_STANDBY" is set, NIC domain
> + * flow rules are effective immediately even if a process is standby role.
> + *
> + * @param active
> + *   Process active (role) or not (standby).
> + * @param flag
> + *   The role flag.
> + * @return
> + *   - (>=0) Number of rte devices which have been switched successfully.
> + *   - (-EINVAL) if bad parameter.
> + */
> +__rte_experimental
> +int rte_eth_process_set_active(bool active, uint32_t flag);
> +
>  #ifdef __cplusplus
>  }
>  #endif
> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
> index 5d5e18db1e..f19da75bfe 100644
> --- a/lib/ethdev/rte_ethdev.c
> +++ b/lib/ethdev/rte_ethdev.c
> @@ -6318,6 +6318,47 @@ rte_eth_buffer_split_get_supported_hdr_ptypes(uint16_t port_id, uint32_t *ptypes
>         return j;
>  }
>
> +int rte_eth_process_set_active(bool active, uint32_t flag)
> +{
> +       struct rte_eth_dev_info dev_info = {0};
> +       uint32_t flags[RTE_MAX_ETHPORTS];
> +       struct rte_eth_dev *dev;
> +       uint16_t port_id;
> +       int ret = 0;
> +
> +       /* Check if all devices support. */
> +       RTE_ETH_FOREACH_DEV(port_id) {
> +               dev = &rte_eth_devices[port_id];
> +               if (*dev->dev_ops->dev_infos_get == NULL ||
> +                   *dev->dev_ops->eth_process_set_active == NULL)
> +                       return -ENOTSUP;
> +               if ((*dev->dev_ops->dev_infos_get)(dev, &dev_info))
> +                       return -EINVAL;
> +               if (!(dev_info.dev_capa & RTE_ETH_CAPA_PROCESS_SET_ROLE))
> +                       return -ENOTSUP;
> +       }
> +       RTE_ETH_FOREACH_DEV(port_id) {
> +               dev = &rte_eth_devices[port_id];
> +               if ((*dev->dev_ops->dev_infos_get)(dev, &dev_info))
> +                       goto err;
> +               flags[port_id] = dev_info.eth_process_flag;
> +               if ((*dev->dev_ops->eth_process_set_active)(dev, active, flag) < 0)
> +                       goto err;
> +               ret++;
> +       }
> +       return ret;
> +err:
> +       if (!ret)
> +               return 0;
> +       RTE_ETH_FOREACH_DEV(port_id) {
> +               dev = &rte_eth_devices[port_id];
> +               (*dev->dev_ops->eth_process_set_active)(dev, !active, flags[port_id]);
> +               if (--ret == 0)
> +                       break;
> +       }
> +       return 0;
> +}
> +
>  RTE_LOG_REGISTER_DEFAULT(rte_eth_dev_logtype, INFO);
>
>  RTE_INIT(ethdev_init_telemetry)
> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> index c129ca1eaf..d29f051d6f 100644
> --- a/lib/ethdev/rte_ethdev.h
> +++ b/lib/ethdev/rte_ethdev.h
> @@ -1606,6 +1606,8 @@ struct rte_eth_conf {
>  #define RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP         RTE_BIT64(3)
>  /** Device supports keeping shared flow objects across restart. */
>  #define RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP RTE_BIT64(4)
> +/**Device supports "rte_eth_process_set_active" callback. */
> +#define RTE_ETH_CAPA_PROCESS_SET_ROLE RTE_BIT64(5)
>  /**@}*/
>
>  /*
> @@ -1777,8 +1779,11 @@ struct rte_eth_dev_info {
>         struct rte_eth_switch_info switch_info;
>         /** Supported error handling mode. */
>         enum rte_eth_err_handle_mode err_handle_mode;
> +       /** Process specific role flag. */
> +       uint32_t eth_process_flag;
>
> -       uint64_t reserved_64s[2]; /**< Reserved for future fields */
> +       uint32_t reserved_32s[1]; /**< Reserved for future fields */
> +       uint64_t reserved_64s[1]; /**< Reserved for future fields */
>         void *reserved_ptrs[2];   /**< Reserved for future fields */
>  };
>
> diff --git a/lib/ethdev/version.map b/lib/ethdev/version.map
> index 17201fbe0f..a5503f6fde 100644
> --- a/lib/ethdev/version.map
> +++ b/lib/ethdev/version.map
> @@ -298,6 +298,9 @@ EXPERIMENTAL {
>         rte_flow_get_q_aged_flows;
>         rte_mtr_meter_policy_get;
>         rte_mtr_meter_profile_get;
> +
> +       # added in 23.03
> +       rte_eth_process_set_active;
>  };
>
>  INTERNAL {
> --
> 2.27.0
>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* RE: [RFC v3 2/2] ethdev: add API to set process to active or standby
  2022-12-21  9:12                 ` Jerin Jacob
@ 2022-12-21  9:32                   ` Rongwei Liu
  2022-12-21 10:59                     ` Jerin Jacob
  0 siblings, 1 reply; 54+ messages in thread
From: Rongwei Liu @ 2022-12-21  9:32 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: Matan Azrad, Slava Ovsiienko, Ori Kam,
	NBU-Contact-Thomas Monjalon (EXTERNAL),
	Ferruh Yigit, Andrew Rybchenko, dev, Raslan Darawsheh

HI Jerin:

BR
Rongwei

> -----Original Message-----
> From: Jerin Jacob <jerinjacobk@gmail.com>
> Sent: Wednesday, December 21, 2022 17:13
> To: Rongwei Liu <rongweil@nvidia.com>
> Cc: Matan Azrad <matan@nvidia.com>; Slava Ovsiienko
> <viacheslavo@nvidia.com>; Ori Kam <orika@nvidia.com>; NBU-Contact-
> Thomas Monjalon (EXTERNAL) <thomas@monjalon.net>; Ferruh Yigit
> <ferruh.yigit@amd.com>; Andrew Rybchenko
> <andrew.rybchenko@oktetlabs.ru>; dev@dpdk.org; Raslan Darawsheh
> <rasland@nvidia.com>
> Subject: Re: [RFC v3 2/2] ethdev: add API to set process to active or standby
> 
> External email: Use caution opening links or attachments
> 
> 
> On Wed, Dec 21, 2022 at 2:31 PM Rongwei Liu <rongweil@nvidia.com> wrote:
> >
> > Users may want to change the DPDK process to different versions
> 
> Different version of DPDK? If there is any ABI change how to support this?
> 
There is a new member which was introduced into rte_eth_dev_info but it shouldn’t be ABI breaking since using reserved fields.
> > such as hot upgrade.
> > There is a strong requirement to simplify the logic and shorten the
> > traffic downtime as much as possible.
> >
> > This update introduces new rte_eth process role definitions: active or
> > standby.
> >
> > The active role means rules are programmed to HW immediately, and no
> 
> Why it has to be specific only to rte_flow rule? If it spedieic to rte_flow, why it
> is in rte_eth_process_ name space?
For now, this design focuses on the flow rule offloading and traffic redirection. 
When switching process version, it' important to make sure which application receives and handles the traffic.
The changing should be effective across all probing eth devices, that' why it was put under rte_eth_process_ (for all rte_eth_dev) name space.
> 
> Also, if we are moving the standby, What about the rule whose ABI is changed
> between versions?

Like the comments mentioned: " Before role transition, all the rules set by the active process should be flushed first. "
> > behavior changed. This is the default state.
> > The standby role means rules are queued in the HW. If no active roles
> > alive or back to active, the rules are effective immediately.
> >
> > Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
> > ---
> >  doc/guides/nics/mlx5.rst               | 10 ++++
> >  doc/guides/rel_notes/release_22_03.rst |  5 ++
> >  lib/ethdev/ethdev_driver.h             | 63 ++++++++++++++++++++++++++
> >  lib/ethdev/rte_ethdev.c                | 41 +++++++++++++++++
> >  lib/ethdev/rte_ethdev.h                |  7 ++-
> >  lib/ethdev/version.map                 |  3 ++
> >  6 files changed, 128 insertions(+), 1 deletion(-)
> >
> > diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst index
> > 51f51259e3..de1fdac0a1 100644
> > --- a/doc/guides/nics/mlx5.rst
> > +++ b/doc/guides/nics/mlx5.rst
> > @@ -2001,3 +2001,13 @@ where:
> >  * ``sw_queue_id``: queue index in range [64536, 65535].
> >    This range is the highest 1000 numbers.
> >  * ``hw_queue_id``: queue index given by HW in queue creation.
> > +
> > +ethdev set process active or standby
> > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > +
> > +User should only program group 0 (fdb_def_rule_en=0) when
> > +``rte_eth_process_set_active`` has been called and set to a standby role.
> > +Group 0 is shared across different DPDK processes while the other
> > +groups are limited to the current process scope.
> > +The process can't move from active to standby role if preceding
> > +active application's rules are still present and vice versa.
> > diff --git a/doc/guides/rel_notes/release_22_03.rst
> > b/doc/guides/rel_notes/release_22_03.rst
> > index 0923707cb8..6fa48106c4 100644
> > --- a/doc/guides/rel_notes/release_22_03.rst
> > +++ b/doc/guides/rel_notes/release_22_03.rst
> > @@ -207,6 +207,11 @@ API Changes
> >  * ethdev: Old public macros and enumeration constants without
> ``RTE_ETH_`` prefix,
> >    which are kept for backward compatibility, are marked as deprecated.
> >
> > +* ethdev: added a new experimental api:
> > +
> > +  The new API ``rte_eth_process_set_active()`` was added.
> > +  If ``RTE_ETH_CAPA_PROCESS_SET_ROLE`` is not advertised, this new api
> is not supported.
> > +
> >  * cryptodev: The asymmetric session handling was modified to use a single
> >    mempool object. An API ``rte_cryptodev_asym_session_pool_create`` was
> added
> >    to create a mempool with element size big enough to hold the
> > generic asymmetric diff --git a/lib/ethdev/ethdev_driver.h
> > b/lib/ethdev/ethdev_driver.h index 6a550cfc83..3c583bc39d 100644
> > --- a/lib/ethdev/ethdev_driver.h
> > +++ b/lib/ethdev/ethdev_driver.h
> > @@ -179,6 +179,16 @@ struct rte_eth_dev_data {
> >         pthread_mutex_t flow_ops_mutex; /**< rte_flow ops mutex */  }
> > __rte_cache_aligned;
> >
> > +/**@{@name Different rte_eth role flag definitions which will be used
> > + *  when miagrating DPDK to a different version.
> > + */
> > +/*
> > + * Traffic coming from NIC domain rules will reach
> > + * both active and standby processes.
> > + */
> > +#define RTE_ETH_PROCESS_NIC_DUP_WITH_STANDBY RTE_BIT32(0),
> /**@}*/
> > +
> >  /**
> >   * @internal
> >   * The pool of *rte_eth_dev* structures. The size of the pool @@
> > -1087,6 +1097,22 @@ typedef const uint32_t
> *(*eth_buffer_split_supported_hdr_ptypes_get_t)(struct rt
> >   */
> >  typedef int (*eth_dev_priv_dump_t)(struct rte_eth_dev *dev, FILE
> > *file);
> >
> > +/**
> > + * @internal
> > + * Set rte_eth process to active or standby role.
> > + *
> > + * @param dev
> > + *   Port (ethdev) handle.
> > + * @param active
> > + *   Device (role) active or not (standby).
> > + * @param flag
> > + *   Role specific flag.
> > + *
> > + * @return
> > + *   Negative value on error, 0 on success.
> > + */
> > +typedef int (*eth_process_set_active_t)(struct rte_eth_dev *dev, bool
> > +active, uint32_t flag);
> > +
> >  /**
> >   * @internal Set Rx queue available descriptors threshold.
> >   * @see rte_eth_rx_avail_thresh_set() @@ -1403,6 +1429,8 @@ struct
> > eth_dev_ops {
> >         eth_cman_config_set_t cman_config_set;
> >         /** Retrieve congestion management configuration */
> >         eth_cman_config_get_t cman_config_get;
> > +       /** Set the whole rte_eth process to active or standby role. */
> > +       eth_process_set_active_t eth_process_set_active;
> >  };
> >
> >  /**
> > @@ -2046,6 +2074,41 @@ struct rte_eth_fdir_conf {
> >         struct rte_eth_fdir_flex_conf flex_conf;  };
> >
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice
> > + *
> > + * Set the rte_eth process to the active or standby role which
> > +affects
> > + * the flow rules offloading. It doesn't allow multiple processes to
> > +be the
> > + * same role unless no offload rules are set.
> > + * The active process flow rules are effective immediately while the
> > +standby
> > + * process rules will be matched (active) when the process becomes
> > +active or
> > + * when the traffic is not matched by the active process rules.
> > + * The active application will always receive traffic while the
> > +standby
> > + * application will receive traffic when no matching rules are
> > +present from
> > + * the active application.
> > + *
> > + * The application is active by default if this API is not called.
> > + *
> > + * When a process transforms from a standby to a active role, all
> > +preceding
> > + * flow rules which are queued by hardware will be effective immediately.
> > + * Before role transition, all the rules set by the active process
> > +should be
> > + * flushed first.
> > + *
> > + * When role flag "RTE_ETH_PROCESS_NIC_DUP_WITH_STANDBY" is set,
> NIC
> > +domain
> > + * flow rules are effective immediately even if a process is standby role.
> > + *
> > + * @param active
> > + *   Process active (role) or not (standby).
> > + * @param flag
> > + *   The role flag.
> > + * @return
> > + *   - (>=0) Number of rte devices which have been switched successfully.
> > + *   - (-EINVAL) if bad parameter.
> > + */
> > +__rte_experimental
> > +int rte_eth_process_set_active(bool active, uint32_t flag);
> > +
> >  #ifdef __cplusplus
> >  }
> >  #endif
> > diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c index
> > 5d5e18db1e..f19da75bfe 100644
> > --- a/lib/ethdev/rte_ethdev.c
> > +++ b/lib/ethdev/rte_ethdev.c
> > @@ -6318,6 +6318,47 @@
> rte_eth_buffer_split_get_supported_hdr_ptypes(uint16_t port_id, uint32_t
> *ptypes
> >         return j;
> >  }
> >
> > +int rte_eth_process_set_active(bool active, uint32_t flag) {
> > +       struct rte_eth_dev_info dev_info = {0};
> > +       uint32_t flags[RTE_MAX_ETHPORTS];
> > +       struct rte_eth_dev *dev;
> > +       uint16_t port_id;
> > +       int ret = 0;
> > +
> > +       /* Check if all devices support. */
> > +       RTE_ETH_FOREACH_DEV(port_id) {
> > +               dev = &rte_eth_devices[port_id];
> > +               if (*dev->dev_ops->dev_infos_get == NULL ||
> > +                   *dev->dev_ops->eth_process_set_active == NULL)
> > +                       return -ENOTSUP;
> > +               if ((*dev->dev_ops->dev_infos_get)(dev, &dev_info))
> > +                       return -EINVAL;
> > +               if (!(dev_info.dev_capa & RTE_ETH_CAPA_PROCESS_SET_ROLE))
> > +                       return -ENOTSUP;
> > +       }
> > +       RTE_ETH_FOREACH_DEV(port_id) {
> > +               dev = &rte_eth_devices[port_id];
> > +               if ((*dev->dev_ops->dev_infos_get)(dev, &dev_info))
> > +                       goto err;
> > +               flags[port_id] = dev_info.eth_process_flag;
> > +               if ((*dev->dev_ops->eth_process_set_active)(dev, active, flag) < 0)
> > +                       goto err;
> > +               ret++;
> > +       }
> > +       return ret;
> > +err:
> > +       if (!ret)
> > +               return 0;
> > +       RTE_ETH_FOREACH_DEV(port_id) {
> > +               dev = &rte_eth_devices[port_id];
> > +               (*dev->dev_ops->eth_process_set_active)(dev, !active,
> flags[port_id]);
> > +               if (--ret == 0)
> > +                       break;
> > +       }
> > +       return 0;
> > +}
> > +
> >  RTE_LOG_REGISTER_DEFAULT(rte_eth_dev_logtype, INFO);
> >
> >  RTE_INIT(ethdev_init_telemetry)
> > diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h index
> > c129ca1eaf..d29f051d6f 100644
> > --- a/lib/ethdev/rte_ethdev.h
> > +++ b/lib/ethdev/rte_ethdev.h
> > @@ -1606,6 +1606,8 @@ struct rte_eth_conf {
> >  #define RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP         RTE_BIT64(3)
> >  /** Device supports keeping shared flow objects across restart. */
> > #define RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP RTE_BIT64(4)
> > +/**Device supports "rte_eth_process_set_active" callback. */ #define
> > +RTE_ETH_CAPA_PROCESS_SET_ROLE RTE_BIT64(5)
> >  /**@}*/
> >
> >  /*
> > @@ -1777,8 +1779,11 @@ struct rte_eth_dev_info {
> >         struct rte_eth_switch_info switch_info;
> >         /** Supported error handling mode. */
> >         enum rte_eth_err_handle_mode err_handle_mode;
> > +       /** Process specific role flag. */
> > +       uint32_t eth_process_flag;
> >
> > -       uint64_t reserved_64s[2]; /**< Reserved for future fields */
> > +       uint32_t reserved_32s[1]; /**< Reserved for future fields */
> > +       uint64_t reserved_64s[1]; /**< Reserved for future fields */
> >         void *reserved_ptrs[2];   /**< Reserved for future fields */
> >  };
> >
> > diff --git a/lib/ethdev/version.map b/lib/ethdev/version.map index
> > 17201fbe0f..a5503f6fde 100644
> > --- a/lib/ethdev/version.map
> > +++ b/lib/ethdev/version.map
> > @@ -298,6 +298,9 @@ EXPERIMENTAL {
> >         rte_flow_get_q_aged_flows;
> >         rte_mtr_meter_policy_get;
> >         rte_mtr_meter_profile_get;
> > +
> > +       # added in 23.03
> > +       rte_eth_process_set_active;
> >  };
> >
> >  INTERNAL {
> > --
> > 2.27.0
> >

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC v3 2/2] ethdev: add API to set process to active or standby
  2022-12-21  9:32                   ` Rongwei Liu
@ 2022-12-21 10:59                     ` Jerin Jacob
  2022-12-21 12:05                       ` Rongwei Liu
  0 siblings, 1 reply; 54+ messages in thread
From: Jerin Jacob @ 2022-12-21 10:59 UTC (permalink / raw)
  To: Rongwei Liu
  Cc: Matan Azrad, Slava Ovsiienko, Ori Kam,
	NBU-Contact-Thomas Monjalon (EXTERNAL),
	Ferruh Yigit, Andrew Rybchenko, dev, Raslan Darawsheh

On Wed, Dec 21, 2022 at 3:02 PM Rongwei Liu <rongweil@nvidia.com> wrote:
>
> HI Jerin:
>

Hi Rongwei

> BR
> Rongwei
>
> > -----Original Message-----
> > From: Jerin Jacob <jerinjacobk@gmail.com>
> > Sent: Wednesday, December 21, 2022 17:13
> > To: Rongwei Liu <rongweil@nvidia.com>
> > Cc: Matan Azrad <matan@nvidia.com>; Slava Ovsiienko
> > <viacheslavo@nvidia.com>; Ori Kam <orika@nvidia.com>; NBU-Contact-
> > Thomas Monjalon (EXTERNAL) <thomas@monjalon.net>; Ferruh Yigit
> > <ferruh.yigit@amd.com>; Andrew Rybchenko
> > <andrew.rybchenko@oktetlabs.ru>; dev@dpdk.org; Raslan Darawsheh
> > <rasland@nvidia.com>
> > Subject: Re: [RFC v3 2/2] ethdev: add API to set process to active or standby
> >
> > External email: Use caution opening links or attachments
> >
> >
> > On Wed, Dec 21, 2022 at 2:31 PM Rongwei Liu <rongweil@nvidia.com> wrote:
> > >
> > > Users may want to change the DPDK process to different versions
> >
> > Different version of DPDK? If there is any ABI change how to support this?
> >
> There is a new member which was introduced into rte_eth_dev_info but it shouldn’t be ABI breaking since using reserved fields.

That is just for rte_eth_dev_info. What about the ABI change in
different ethdev structure and rte_flow structures across different
DPDK ABI versions.

> > > such as hot upgrade.
> > > There is a strong requirement to simplify the logic and shorten the
> > > traffic downtime as much as possible.
> > >
> > > This update introduces new rte_eth process role definitions: active or
> > > standby.
> > >
> > > The active role means rules are programmed to HW immediately, and no
> >
> > Why it has to be specific only to rte_flow rule? If it spedieic to rte_flow, why it
> > is in rte_eth_process_ name space?
> For now, this design focuses on the flow rule offloading and traffic redirection.
> When switching process version, it' important to make sure which application receives and handles the traffic.

Changing the DPDK version runtime is just beyond rte_flow driver.

> The changing should be effective across all probing eth devices, that' why it was put under rte_eth_process_ (for all rte_eth_dev) name space.
> >
> > Also, if we are moving the standby, What about the rule whose ABI is changed
> > between versions?
>
> Like the comments mentioned: " Before role transition, all the rules set by the active process should be flushed first. "

What happens to rte_flow flow handles for existing ones  which is
created with version X?
Also What if new version Y has ABI change in rte_flow_pattern and
rte_flow_action structure?

For me, If DPDK version change is needed, simply reload the
application. This API will soon bloat, and it will be a mess if to
start handling
Different DPDK version which is not ABI compatible at all.




> > > behavior changed. This is the default state.
> > > The standby role means rules are queued in the HW. If no active roles
> > > alive or back to active, the rules are effective immediately.
> > >
> > > Signed-off-by: Rongwei Liu <rongweil@nvidia.com>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* RE: [RFC v3 2/2] ethdev: add API to set process to active or standby
  2022-12-21 10:59                     ` Jerin Jacob
@ 2022-12-21 12:05                       ` Rongwei Liu
  2022-12-21 12:44                         ` Jerin Jacob
  0 siblings, 1 reply; 54+ messages in thread
From: Rongwei Liu @ 2022-12-21 12:05 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: Matan Azrad, Slava Ovsiienko, Ori Kam,
	NBU-Contact-Thomas Monjalon (EXTERNAL),
	Ferruh Yigit, Andrew Rybchenko, dev, Raslan Darawsheh

Hi Jerin:

BR
Rongwei

> -----Original Message-----
> From: Jerin Jacob <jerinjacobk@gmail.com>
> Sent: Wednesday, December 21, 2022 19:00
> To: Rongwei Liu <rongweil@nvidia.com>
> Cc: Matan Azrad <matan@nvidia.com>; Slava Ovsiienko
> <viacheslavo@nvidia.com>; Ori Kam <orika@nvidia.com>; NBU-Contact-
> Thomas Monjalon (EXTERNAL) <thomas@monjalon.net>; Ferruh Yigit
> <ferruh.yigit@amd.com>; Andrew Rybchenko
> <andrew.rybchenko@oktetlabs.ru>; dev@dpdk.org; Raslan Darawsheh
> <rasland@nvidia.com>
> Subject: Re: [RFC v3 2/2] ethdev: add API to set process to active or standby
> 
> External email: Use caution opening links or attachments
> 
> 
> On Wed, Dec 21, 2022 at 3:02 PM Rongwei Liu <rongweil@nvidia.com> wrote:
> >
> > HI Jerin:
> >
> 
> Hi Rongwei
> 
> > BR
> > Rongwei
> >
> > > -----Original Message-----
> > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > Sent: Wednesday, December 21, 2022 17:13
> > > To: Rongwei Liu <rongweil@nvidia.com>
> > > Cc: Matan Azrad <matan@nvidia.com>; Slava Ovsiienko
> > > <viacheslavo@nvidia.com>; Ori Kam <orika@nvidia.com>; NBU-Contact-
> > > Thomas Monjalon (EXTERNAL) <thomas@monjalon.net>; Ferruh Yigit
> > > <ferruh.yigit@amd.com>; Andrew Rybchenko
> > > <andrew.rybchenko@oktetlabs.ru>; dev@dpdk.org; Raslan Darawsheh
> > > <rasland@nvidia.com>
> > > Subject: Re: [RFC v3 2/2] ethdev: add API to set process to active
> > > or standby
> > >
> > > External email: Use caution opening links or attachments
> > >
> > >
> > > On Wed, Dec 21, 2022 at 2:31 PM Rongwei Liu <rongweil@nvidia.com>
> wrote:
> > > >
> > > > Users may want to change the DPDK process to different versions
> > >
> > > Different version of DPDK? If there is any ABI change how to support this?
> > >
> > There is a new member which was introduced into rte_eth_dev_info but it
> shouldn’t be ABI breaking since using reserved fields.
> 
> That is just for rte_eth_dev_info. What about the ABI change in different
> ethdev structure and rte_flow structures across different DPDK ABI versions.
> 
Besides this, there is no other ABI changes dependency.

Assume there is a DPDK process A running with version v21.11 and plan to upgrade to
version v22.11. Let' call v22.11 as process B.

Now, process A has been running for long time and has lot of rules configured. It' "active" role per this API definition.
Process B starts and it should call this API and set itself to "standby" role and user can program the flow rules as they want
and different NIC vendors may have different recommendations. Nvidia suggests only program process B with group 0' rules now.

The user should sync all desired configurations from process A to process B, and process A starts to yield traffic like "delete all group 0
rules for Nvidia' NICs" or quit.
After that process B calls this API and set itself to "active" role, now the hot-upgrade finishes.

> > > > such as hot upgrade.
> > > > There is a strong requirement to simplify the logic and shorten
> > > > the traffic downtime as much as possible.
> > > >
> > > > This update introduces new rte_eth process role definitions:
> > > > active or standby.
> > > >
> > > > The active role means rules are programmed to HW immediately, and
> > > > no
> > >
> > > Why it has to be specific only to rte_flow rule? If it spedieic to
> > > rte_flow, why it is in rte_eth_process_ name space?
> > For now, this design focuses on the flow rule offloading and traffic
> redirection.
> > When switching process version, it' important to make sure which
> application receives and handles the traffic.
> 
> Changing the DPDK version runtime is just beyond rte_flow driver.

It' not about changing DPDK version but upgrading DPDK from one PMD version to another one.
Does the preceding example answer your question?
> 
> > The changing should be effective across all probing eth devices, that' why it
> was put under rte_eth_process_ (for all rte_eth_dev) name space.
> > >
> > > Also, if we are moving the standby, What about the rule whose ABI is
> > > changed between versions?
> >
> > Like the comments mentioned: " Before role transition, all the rules set by
> the active process should be flushed first. "
> 
> What happens to rte_flow flow handles for existing ones  which is created with
> version X?
> Also What if new version Y has ABI change in rte_flow_pattern and
> rte_flow_action structure?
> 
> For me, If DPDK version change is needed, simply reload the application. This
> API will soon bloat, and it will be a mess if to start handling Different DPDK
> version which is not ABI compatible at all.
> 
Yes, you are right. Reloading the application is the easiest way but it may have a long time
Window that traffic is lost. No traffic arrives at process A or process B. 
We are trying to simplify the reloading logic and minimize the traffic down time as much as possible.
The approach may differentiate hugely between different NIC vendors, so I think it should be better if 
DPDK can provide an abstract API.

If process A and process B are ABI different, it doesn't matter. 
1. Call this API with process A means older ABI.
2. Call this API with process B means newer ABI.
It' have process concept and working scope. 

> 
> 
> 
> > > > behavior changed. This is the default state.
> > > > The standby role means rules are queued in the HW. If no active
> > > > roles alive or back to active, the rules are effective immediately.
> > > >
> > > > Signed-off-by: Rongwei Liu <rongweil@nvidia.com>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC v3 2/2] ethdev: add API to set process to active or standby
  2022-12-21 12:05                       ` Rongwei Liu
@ 2022-12-21 12:44                         ` Jerin Jacob
  2022-12-21 12:50                           ` Rongwei Liu
  0 siblings, 1 reply; 54+ messages in thread
From: Jerin Jacob @ 2022-12-21 12:44 UTC (permalink / raw)
  To: Rongwei Liu
  Cc: Matan Azrad, Slava Ovsiienko, Ori Kam,
	NBU-Contact-Thomas Monjalon (EXTERNAL),
	Ferruh Yigit, Andrew Rybchenko, dev, Raslan Darawsheh

On Wed, Dec 21, 2022 at 5:35 PM Rongwei Liu <rongweil@nvidia.com> wrote:
>
> Hi Jerin:
>
> BR
> Rongwei
>
> > -----Original Message-----
> > From: Jerin Jacob <jerinjacobk@gmail.com>
> > Sent: Wednesday, December 21, 2022 19:00
> > To: Rongwei Liu <rongweil@nvidia.com>
> > Cc: Matan Azrad <matan@nvidia.com>; Slava Ovsiienko
> > <viacheslavo@nvidia.com>; Ori Kam <orika@nvidia.com>; NBU-Contact-
> > Thomas Monjalon (EXTERNAL) <thomas@monjalon.net>; Ferruh Yigit
> > <ferruh.yigit@amd.com>; Andrew Rybchenko
> > <andrew.rybchenko@oktetlabs.ru>; dev@dpdk.org; Raslan Darawsheh
> > <rasland@nvidia.com>
> > Subject: Re: [RFC v3 2/2] ethdev: add API to set process to active or standby
> >
> > External email: Use caution opening links or attachments
> >
> >
> > On Wed, Dec 21, 2022 at 3:02 PM Rongwei Liu <rongweil@nvidia.com> wrote:
> > >
> > > HI Jerin:
> > >
> >
> > Hi Rongwei
> >
> > > BR
> > > Rongwei
> > >
> > > > -----Original Message-----
> > > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > > Sent: Wednesday, December 21, 2022 17:13
> > > > To: Rongwei Liu <rongweil@nvidia.com>
> > > > Cc: Matan Azrad <matan@nvidia.com>; Slava Ovsiienko
> > > > <viacheslavo@nvidia.com>; Ori Kam <orika@nvidia.com>; NBU-Contact-
> > > > Thomas Monjalon (EXTERNAL) <thomas@monjalon.net>; Ferruh Yigit
> > > > <ferruh.yigit@amd.com>; Andrew Rybchenko
> > > > <andrew.rybchenko@oktetlabs.ru>; dev@dpdk.org; Raslan Darawsheh
> > > > <rasland@nvidia.com>
> > > > Subject: Re: [RFC v3 2/2] ethdev: add API to set process to active
> > > > or standby
> > > >
> > > > External email: Use caution opening links or attachments
> > > >
> > > >
> > > > On Wed, Dec 21, 2022 at 2:31 PM Rongwei Liu <rongweil@nvidia.com>
> > wrote:
> > > > >
> > > > > Users may want to change the DPDK process to different versions
> > > >
> > > > Different version of DPDK? If there is any ABI change how to support this?
> > > >
> > > There is a new member which was introduced into rte_eth_dev_info but it
> > shouldn’t be ABI breaking since using reserved fields.
> >
> > That is just for rte_eth_dev_info. What about the ABI change in different
> > ethdev structure and rte_flow structures across different DPDK ABI versions.
> >
> Besides this, there is no other ABI changes dependency.
>
> Assume there is a DPDK process A running with version v21.11 and plan to upgrade to
> version v22.11. Let' call v22.11 as process B.

OK. That's a relief. I understand the use case now.

Why not simply use standard DPDK multiprocess model then.
Primary process act as server for slow path API. Secondary process can
come and go(aka can be updated at runtime)
and use as client to update rules via primary-secondray communication mechanism.


>
> Now, process A has been running for long time and has lot of rules configured. It' "active" role per this API definition.
> Process B starts and it should call this API and set itself to "standby" role and user can program the flow rules as they want
> and different NIC vendors may have different recommendations. Nvidia suggests only program process B with group 0' rules now.
>
> The user should sync all desired configurations from process A to process B, and process A starts to yield traffic like "delete all group 0
> rules for Nvidia' NICs" or quit.
> After that process B calls this API and set itself to "active" role, now the hot-upgrade finishes.
>
> > > > > such as hot upgrade.
> > > > > There is a strong requirement to simplify the logic and shorten
> > > > > the traffic downtime as much as possible.
> > > > >
> > > > > This update introduces new rte_eth process role definitions:
> > > > > active or standby.
> > > > >
> > > > > The active role means rules are programmed to HW immediately, and
> > > > > no
> > > >
> > > > Why it has to be specific only to rte_flow rule? If it spedieic to
> > > > rte_flow, why it is in rte_eth_process_ name space?
> > > For now, this design focuses on the flow rule offloading and traffic
> > redirection.
> > > When switching process version, it' important to make sure which
> > application receives and handles the traffic.
> >
> > Changing the DPDK version runtime is just beyond rte_flow driver.
>
> It' not about changing DPDK version but upgrading DPDK from one PMD version to another one.
> Does the preceding example answer your question?
> >
> > > The changing should be effective across all probing eth devices, that' why it
> > was put under rte_eth_process_ (for all rte_eth_dev) name space.
> > > >
> > > > Also, if we are moving the standby, What about the rule whose ABI is
> > > > changed between versions?
> > >
> > > Like the comments mentioned: " Before role transition, all the rules set by
> > the active process should be flushed first. "
> >
> > What happens to rte_flow flow handles for existing ones  which is created with
> > version X?
> > Also What if new version Y has ABI change in rte_flow_pattern and
> > rte_flow_action structure?
> >
> > For me, If DPDK version change is needed, simply reload the application. This
> > API will soon bloat, and it will be a mess if to start handling Different DPDK
> > version which is not ABI compatible at all.
> >
> Yes, you are right. Reloading the application is the easiest way but it may have a long time
> Window that traffic is lost. No traffic arrives at process A or process B.
> We are trying to simplify the reloading logic and minimize the traffic down time as much as possible.
> The approach may differentiate hugely between different NIC vendors, so I think it should be better if
> DPDK can provide an abstract API.
>
> If process A and process B are ABI different, it doesn't matter.
> 1. Call this API with process A means older ABI.
> 2. Call this API with process B means newer ABI.
> It' have process concept and working scope.
>
> >
> >
> >
> > > > > behavior changed. This is the default state.
> > > > > The standby role means rules are queued in the HW. If no active
> > > > > roles alive or back to active, the rules are effective immediately.
> > > > >
> > > > > Signed-off-by: Rongwei Liu <rongweil@nvidia.com>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* RE: [RFC v3 2/2] ethdev: add API to set process to active or standby
  2022-12-21 12:44                         ` Jerin Jacob
@ 2022-12-21 12:50                           ` Rongwei Liu
  2022-12-21 13:12                             ` Jerin Jacob
  0 siblings, 1 reply; 54+ messages in thread
From: Rongwei Liu @ 2022-12-21 12:50 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: Matan Azrad, Slava Ovsiienko, Ori Kam,
	NBU-Contact-Thomas Monjalon (EXTERNAL),
	Ferruh Yigit, Andrew Rybchenko, dev, Raslan Darawsheh

HI Jerin:

BR
Rongwei

> -----Original Message-----
> From: Jerin Jacob <jerinjacobk@gmail.com>
> Sent: Wednesday, December 21, 2022 20:45
> To: Rongwei Liu <rongweil@nvidia.com>
> Cc: Matan Azrad <matan@nvidia.com>; Slava Ovsiienko
> <viacheslavo@nvidia.com>; Ori Kam <orika@nvidia.com>; NBU-Contact-
> Thomas Monjalon (EXTERNAL) <thomas@monjalon.net>; Ferruh Yigit
> <ferruh.yigit@amd.com>; Andrew Rybchenko
> <andrew.rybchenko@oktetlabs.ru>; dev@dpdk.org; Raslan Darawsheh
> <rasland@nvidia.com>
> Subject: Re: [RFC v3 2/2] ethdev: add API to set process to active or standby
> 
> External email: Use caution opening links or attachments
> 
> 
> On Wed, Dec 21, 2022 at 5:35 PM Rongwei Liu <rongweil@nvidia.com> wrote:
> >
> > Hi Jerin:
> >
> > BR
> > Rongwei
> >
> > > -----Original Message-----
> > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > Sent: Wednesday, December 21, 2022 19:00
> > > To: Rongwei Liu <rongweil@nvidia.com>
> > > Cc: Matan Azrad <matan@nvidia.com>; Slava Ovsiienko
> > > <viacheslavo@nvidia.com>; Ori Kam <orika@nvidia.com>; NBU-Contact-
> > > Thomas Monjalon (EXTERNAL) <thomas@monjalon.net>; Ferruh Yigit
> > > <ferruh.yigit@amd.com>; Andrew Rybchenko
> > > <andrew.rybchenko@oktetlabs.ru>; dev@dpdk.org; Raslan Darawsheh
> > > <rasland@nvidia.com>
> > > Subject: Re: [RFC v3 2/2] ethdev: add API to set process to active
> > > or standby
> > >
> > > External email: Use caution opening links or attachments
> > >
> > >
> > > On Wed, Dec 21, 2022 at 3:02 PM Rongwei Liu <rongweil@nvidia.com>
> wrote:
> > > >
> > > > HI Jerin:
> > > >
> > >
> > > Hi Rongwei
> > >
> > > > BR
> > > > Rongwei
> > > >
> > > > > -----Original Message-----
> > > > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > > > Sent: Wednesday, December 21, 2022 17:13
> > > > > To: Rongwei Liu <rongweil@nvidia.com>
> > > > > Cc: Matan Azrad <matan@nvidia.com>; Slava Ovsiienko
> > > > > <viacheslavo@nvidia.com>; Ori Kam <orika@nvidia.com>;
> > > > > NBU-Contact- Thomas Monjalon (EXTERNAL) <thomas@monjalon.net>;
> > > > > Ferruh Yigit <ferruh.yigit@amd.com>; Andrew Rybchenko
> > > > > <andrew.rybchenko@oktetlabs.ru>; dev@dpdk.org; Raslan Darawsheh
> > > > > <rasland@nvidia.com>
> > > > > Subject: Re: [RFC v3 2/2] ethdev: add API to set process to
> > > > > active or standby
> > > > >
> > > > > External email: Use caution opening links or attachments
> > > > >
> > > > >
> > > > > On Wed, Dec 21, 2022 at 2:31 PM Rongwei Liu
> > > > > <rongweil@nvidia.com>
> > > wrote:
> > > > > >
> > > > > > Users may want to change the DPDK process to different
> > > > > > versions
> > > > >
> > > > > Different version of DPDK? If there is any ABI change how to support
> this?
> > > > >
> > > > There is a new member which was introduced into rte_eth_dev_info
> > > > but it
> > > shouldn’t be ABI breaking since using reserved fields.
> > >
> > > That is just for rte_eth_dev_info. What about the ABI change in
> > > different ethdev structure and rte_flow structures across different DPDK
> ABI versions.
> > >
> > Besides this, there is no other ABI changes dependency.
> >
> > Assume there is a DPDK process A running with version v21.11 and plan
> > to upgrade to version v22.11. Let' call v22.11 as process B.
> 
> OK. That's a relief. I understand the use case now.
> 
> Why not simply use standard DPDK multiprocess model then.
> Primary process act as server for slow path API. Secondary process can come
> and go(aka can be updated at runtime) and use as client to update rules via
> primary-secondray communication mechanism.
> 
Just image if process A and process B have ABI breakage like different rte_flow_item_*** and rte_flow_action_*** size and members.
How can we quickly accommodate primary/secondary to be ABI compatible across different versions?
It will be very huge effort and difficult to implement, at least in my opinion. 
What do you think?
> 
> >
> > Now, process A has been running for long time and has lot of rules
> configured. It' "active" role per this API definition.
> > Process B starts and it should call this API and set itself to
> > "standby" role and user can program the flow rules as they want and
> different NIC vendors may have different recommendations. Nvidia suggests
> only program process B with group 0' rules now.
> >
> > The user should sync all desired configurations from process A to
> > process B, and process A starts to yield traffic like "delete all group 0 rules
> for Nvidia' NICs" or quit.
> > After that process B calls this API and set itself to "active" role, now the hot-
> upgrade finishes.
> >
> > > > > > such as hot upgrade.
> > > > > > There is a strong requirement to simplify the logic and
> > > > > > shorten the traffic downtime as much as possible.
> > > > > >
> > > > > > This update introduces new rte_eth process role definitions:
> > > > > > active or standby.
> > > > > >
> > > > > > The active role means rules are programmed to HW immediately,
> > > > > > and no
> > > > >
> > > > > Why it has to be specific only to rte_flow rule? If it spedieic
> > > > > to rte_flow, why it is in rte_eth_process_ name space?
> > > > For now, this design focuses on the flow rule offloading and
> > > > traffic
> > > redirection.
> > > > When switching process version, it' important to make sure which
> > > application receives and handles the traffic.
> > >
> > > Changing the DPDK version runtime is just beyond rte_flow driver.
> >
> > It' not about changing DPDK version but upgrading DPDK from one PMD
> version to another one.
> > Does the preceding example answer your question?
> > >
> > > > The changing should be effective across all probing eth devices,
> > > > that' why it
> > > was put under rte_eth_process_ (for all rte_eth_dev) name space.
> > > > >
> > > > > Also, if we are moving the standby, What about the rule whose
> > > > > ABI is changed between versions?
> > > >
> > > > Like the comments mentioned: " Before role transition, all the
> > > > rules set by
> > > the active process should be flushed first. "
> > >
> > > What happens to rte_flow flow handles for existing ones  which is
> > > created with version X?
> > > Also What if new version Y has ABI change in rte_flow_pattern and
> > > rte_flow_action structure?
> > >
> > > For me, If DPDK version change is needed, simply reload the
> > > application. This API will soon bloat, and it will be a mess if to
> > > start handling Different DPDK version which is not ABI compatible at all.
> > >
> > Yes, you are right. Reloading the application is the easiest way but
> > it may have a long time Window that traffic is lost. No traffic arrives at
> process A or process B.
> > We are trying to simplify the reloading logic and minimize the traffic down
> time as much as possible.
> > The approach may differentiate hugely between different NIC vendors,
> > so I think it should be better if DPDK can provide an abstract API.
> >
> > If process A and process B are ABI different, it doesn't matter.
> > 1. Call this API with process A means older ABI.
> > 2. Call this API with process B means newer ABI.
> > It' have process concept and working scope.
> >
> > >
> > >
> > >
> > > > > > behavior changed. This is the default state.
> > > > > > The standby role means rules are queued in the HW. If no
> > > > > > active roles alive or back to active, the rules are effective
> immediately.
> > > > > >
> > > > > > Signed-off-by: Rongwei Liu <rongweil@nvidia.com>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC v3 2/2] ethdev: add API to set process to active or standby
  2022-12-21 12:50                           ` Rongwei Liu
@ 2022-12-21 13:12                             ` Jerin Jacob
  2022-12-21 14:33                               ` Rongwei Liu
  0 siblings, 1 reply; 54+ messages in thread
From: Jerin Jacob @ 2022-12-21 13:12 UTC (permalink / raw)
  To: Rongwei Liu
  Cc: Matan Azrad, Slava Ovsiienko, Ori Kam,
	NBU-Contact-Thomas Monjalon (EXTERNAL),
	Ferruh Yigit, Andrew Rybchenko, dev, Raslan Darawsheh

On Wed, Dec 21, 2022 at 6:20 PM Rongwei Liu <rongweil@nvidia.com> wrote:
>
> HI Jerin:
>
> BR
> Rongwei
>
> > -----Original Message-----
> > From: Jerin Jacob <jerinjacobk@gmail.com>
> > Sent: Wednesday, December 21, 2022 20:45
> > To: Rongwei Liu <rongweil@nvidia.com>
> > Cc: Matan Azrad <matan@nvidia.com>; Slava Ovsiienko
> > <viacheslavo@nvidia.com>; Ori Kam <orika@nvidia.com>; NBU-Contact-
> > Thomas Monjalon (EXTERNAL) <thomas@monjalon.net>; Ferruh Yigit
> > <ferruh.yigit@amd.com>; Andrew Rybchenko
> > <andrew.rybchenko@oktetlabs.ru>; dev@dpdk.org; Raslan Darawsheh
> > <rasland@nvidia.com>
> > Subject: Re: [RFC v3 2/2] ethdev: add API to set process to active or standby
> >
> > External email: Use caution opening links or attachments
> >
> >
> > On Wed, Dec 21, 2022 at 5:35 PM Rongwei Liu <rongweil@nvidia.com> wrote:
> > >
> > > Hi Jerin:
> > >
> > > BR
> > > Rongwei
> > >
> > > > -----Original Message-----
> > > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > > Sent: Wednesday, December 21, 2022 19:00
> > > > To: Rongwei Liu <rongweil@nvidia.com>
> > > > Cc: Matan Azrad <matan@nvidia.com>; Slava Ovsiienko
> > > > <viacheslavo@nvidia.com>; Ori Kam <orika@nvidia.com>; NBU-Contact-
> > > > Thomas Monjalon (EXTERNAL) <thomas@monjalon.net>; Ferruh Yigit
> > > > <ferruh.yigit@amd.com>; Andrew Rybchenko
> > > > <andrew.rybchenko@oktetlabs.ru>; dev@dpdk.org; Raslan Darawsheh
> > > > <rasland@nvidia.com>
> > > > Subject: Re: [RFC v3 2/2] ethdev: add API to set process to active
> > > > or standby
> > > >
> > > > External email: Use caution opening links or attachments
> > > >
> > > >
> > > > On Wed, Dec 21, 2022 at 3:02 PM Rongwei Liu <rongweil@nvidia.com>
> > wrote:
> > > > >
> > > > > HI Jerin:
> > > > >
> > > >
> > > > Hi Rongwei
> > > >
> > > > > BR
> > > > > Rongwei
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > > > > Sent: Wednesday, December 21, 2022 17:13
> > > > > > To: Rongwei Liu <rongweil@nvidia.com>
> > > > > > Cc: Matan Azrad <matan@nvidia.com>; Slava Ovsiienko
> > > > > > <viacheslavo@nvidia.com>; Ori Kam <orika@nvidia.com>;
> > > > > > NBU-Contact- Thomas Monjalon (EXTERNAL) <thomas@monjalon.net>;
> > > > > > Ferruh Yigit <ferruh.yigit@amd.com>; Andrew Rybchenko
> > > > > > <andrew.rybchenko@oktetlabs.ru>; dev@dpdk.org; Raslan Darawsheh
> > > > > > <rasland@nvidia.com>
> > > > > > Subject: Re: [RFC v3 2/2] ethdev: add API to set process to
> > > > > > active or standby
> > > > > >
> > > > > > External email: Use caution opening links or attachments
> > > > > >
> > > > > >
> > > > > > On Wed, Dec 21, 2022 at 2:31 PM Rongwei Liu
> > > > > > <rongweil@nvidia.com>
> > > > wrote:
> > > > > > >
> > > > > > > Users may want to change the DPDK process to different
> > > > > > > versions
> > > > > >
> > > > > > Different version of DPDK? If there is any ABI change how to support
> > this?
> > > > > >
> > > > > There is a new member which was introduced into rte_eth_dev_info
> > > > > but it
> > > > shouldn’t be ABI breaking since using reserved fields.
> > > >
> > > > That is just for rte_eth_dev_info. What about the ABI change in
> > > > different ethdev structure and rte_flow structures across different DPDK
> > ABI versions.
> > > >
> > > Besides this, there is no other ABI changes dependency.
> > >
> > > Assume there is a DPDK process A running with version v21.11 and plan
> > > to upgrade to version v22.11. Let' call v22.11 as process B.
> >
> > OK. That's a relief. I understand the use case now.
> >
> > Why not simply use standard DPDK multiprocess model then.
> > Primary process act as server for slow path API. Secondary process can come
> > and go(aka can be updated at runtime) and use as client to update rules via
> > primary-secondray communication mechanism.
> >
> Just image if process A and process B have ABI breakage like different rte_flow_item_*** and rte_flow_action_*** size and members.
> How can we quickly accommodate primary/secondary to be ABI compatible across different versions?
> It will be very huge effort and difficult to implement, at least in my opinion.
> What do you think?

Yes. it difficult what ever approach we take,
On other hand, ethdev subsystem has other components like rte_tm and
other offload etc, We can not simply have rte_eth_process_set_active()
and things magical works across different DPDK versions. Example, if
rte_flow rule has meter action will depend on another HW piece in NIC
card for doing the metering but set by flow API.
IMO, Customer can simply use standard multiprocess model if version
are compatible without any special intelligence in application.
Otherwise they can reload and start the application again
or have special intelligence in application to cater the specific area
of API they need to leverage based on server and client DPDK versions.



> >
> > >
> > > Now, process A has been running for long time and has lot of rules
> > configured. It' "active" role per this API definition.
> > > Process B starts and it should call this API and set itself to
> > > "standby" role and user can program the flow rules as they want and
> > different NIC vendors may have different recommendations. Nvidia suggests
> > only program process B with group 0' rules now.
> > >
> > > The user should sync all desired configurations from process A to
> > > process B, and process A starts to yield traffic like "delete all group 0 rules
> > for Nvidia' NICs" or quit.
> > > After that process B calls this API and set itself to "active" role, now the hot-
> > upgrade finishes.
> > >
> > > > > > > such as hot upgrade.
> > > > > > > There is a strong requirement to simplify the logic and
> > > > > > > shorten the traffic downtime as much as possible.
> > > > > > >
> > > > > > > This update introduces new rte_eth process role definitions:
> > > > > > > active or standby.
> > > > > > >
> > > > > > > The active role means rules are programmed to HW immediately,
> > > > > > > and no
> > > > > >
> > > > > > Why it has to be specific only to rte_flow rule? If it spedieic
> > > > > > to rte_flow, why it is in rte_eth_process_ name space?
> > > > > For now, this design focuses on the flow rule offloading and
> > > > > traffic
> > > > redirection.
> > > > > When switching process version, it' important to make sure which
> > > > application receives and handles the traffic.
> > > >
> > > > Changing the DPDK version runtime is just beyond rte_flow driver.
> > >
> > > It' not about changing DPDK version but upgrading DPDK from one PMD
> > version to another one.
> > > Does the preceding example answer your question?
> > > >
> > > > > The changing should be effective across all probing eth devices,
> > > > > that' why it
> > > > was put under rte_eth_process_ (for all rte_eth_dev) name space.
> > > > > >
> > > > > > Also, if we are moving the standby, What about the rule whose
> > > > > > ABI is changed between versions?
> > > > >
> > > > > Like the comments mentioned: " Before role transition, all the
> > > > > rules set by
> > > > the active process should be flushed first. "
> > > >
> > > > What happens to rte_flow flow handles for existing ones  which is
> > > > created with version X?
> > > > Also What if new version Y has ABI change in rte_flow_pattern and
> > > > rte_flow_action structure?
> > > >
> > > > For me, If DPDK version change is needed, simply reload the
> > > > application. This API will soon bloat, and it will be a mess if to
> > > > start handling Different DPDK version which is not ABI compatible at all.
> > > >
> > > Yes, you are right. Reloading the application is the easiest way but
> > > it may have a long time Window that traffic is lost. No traffic arrives at
> > process A or process B.
> > > We are trying to simplify the reloading logic and minimize the traffic down
> > time as much as possible.
> > > The approach may differentiate hugely between different NIC vendors,
> > > so I think it should be better if DPDK can provide an abstract API.
> > >
> > > If process A and process B are ABI different, it doesn't matter.
> > > 1. Call this API with process A means older ABI.
> > > 2. Call this API with process B means newer ABI.
> > > It' have process concept and working scope.
> > >
> > > >
> > > >
> > > >
> > > > > > > behavior changed. This is the default state.
> > > > > > > The standby role means rules are queued in the HW. If no
> > > > > > > active roles alive or back to active, the rules are effective
> > immediately.
> > > > > > >
> > > > > > > Signed-off-by: Rongwei Liu <rongweil@nvidia.com>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* RE: [RFC v3 2/2] ethdev: add API to set process to active or standby
  2022-12-21 13:12                             ` Jerin Jacob
@ 2022-12-21 14:33                               ` Rongwei Liu
  2022-12-26 16:44                                 ` Ori Kam
  0 siblings, 1 reply; 54+ messages in thread
From: Rongwei Liu @ 2022-12-21 14:33 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: Matan Azrad, Slava Ovsiienko, Ori Kam,
	NBU-Contact-Thomas Monjalon (EXTERNAL),
	Ferruh Yigit, Andrew Rybchenko, dev, Raslan Darawsheh

HI Jerin:

BR
Rongwei

> -----Original Message-----
> From: Jerin Jacob <jerinjacobk@gmail.com>
> Sent: Wednesday, December 21, 2022 21:12
> To: Rongwei Liu <rongweil@nvidia.com>
> Cc: Matan Azrad <matan@nvidia.com>; Slava Ovsiienko
> <viacheslavo@nvidia.com>; Ori Kam <orika@nvidia.com>; NBU-Contact-
> Thomas Monjalon (EXTERNAL) <thomas@monjalon.net>; Ferruh Yigit
> <ferruh.yigit@amd.com>; Andrew Rybchenko
> <andrew.rybchenko@oktetlabs.ru>; dev@dpdk.org; Raslan Darawsheh
> <rasland@nvidia.com>
> Subject: Re: [RFC v3 2/2] ethdev: add API to set process to active or standby
> 
> External email: Use caution opening links or attachments
> 
> 
> On Wed, Dec 21, 2022 at 6:20 PM Rongwei Liu <rongweil@nvidia.com> wrote:
> >
> > HI Jerin:
> >
> > BR
> > Rongwei
> >
> > > -----Original Message-----
> > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > Sent: Wednesday, December 21, 2022 20:45
> > > To: Rongwei Liu <rongweil@nvidia.com>
> > > Cc: Matan Azrad <matan@nvidia.com>; Slava Ovsiienko
> > > <viacheslavo@nvidia.com>; Ori Kam <orika@nvidia.com>; NBU-Contact-
> > > Thomas Monjalon (EXTERNAL) <thomas@monjalon.net>; Ferruh Yigit
> > > <ferruh.yigit@amd.com>; Andrew Rybchenko
> > > <andrew.rybchenko@oktetlabs.ru>; dev@dpdk.org; Raslan Darawsheh
> > > <rasland@nvidia.com>
> > > Subject: Re: [RFC v3 2/2] ethdev: add API to set process to active
> > > or standby
> > >
> > > External email: Use caution opening links or attachments
> > >
> > >
> > > On Wed, Dec 21, 2022 at 5:35 PM Rongwei Liu <rongweil@nvidia.com>
> wrote:
> > > >
> > > > Hi Jerin:
> > > >
> > > > BR
> > > > Rongwei
> > > >
> > > > > -----Original Message-----
> > > > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > > > Sent: Wednesday, December 21, 2022 19:00
> > > > > To: Rongwei Liu <rongweil@nvidia.com>
> > > > > Cc: Matan Azrad <matan@nvidia.com>; Slava Ovsiienko
> > > > > <viacheslavo@nvidia.com>; Ori Kam <orika@nvidia.com>;
> > > > > NBU-Contact- Thomas Monjalon (EXTERNAL) <thomas@monjalon.net>;
> > > > > Ferruh Yigit <ferruh.yigit@amd.com>; Andrew Rybchenko
> > > > > <andrew.rybchenko@oktetlabs.ru>; dev@dpdk.org; Raslan Darawsheh
> > > > > <rasland@nvidia.com>
> > > > > Subject: Re: [RFC v3 2/2] ethdev: add API to set process to
> > > > > active or standby
> > > > >
> > > > > External email: Use caution opening links or attachments
> > > > >
> > > > >
> > > > > On Wed, Dec 21, 2022 at 3:02 PM Rongwei Liu
> > > > > <rongweil@nvidia.com>
> > > wrote:
> > > > > >
> > > > > > HI Jerin:
> > > > > >
> > > > >
> > > > > Hi Rongwei
> > > > >
> > > > > > BR
> > > > > > Rongwei
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > > > > > Sent: Wednesday, December 21, 2022 17:13
> > > > > > > To: Rongwei Liu <rongweil@nvidia.com>
> > > > > > > Cc: Matan Azrad <matan@nvidia.com>; Slava Ovsiienko
> > > > > > > <viacheslavo@nvidia.com>; Ori Kam <orika@nvidia.com>;
> > > > > > > NBU-Contact- Thomas Monjalon (EXTERNAL)
> > > > > > > <thomas@monjalon.net>; Ferruh Yigit <ferruh.yigit@amd.com>;
> > > > > > > Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>;
> > > > > > > dev@dpdk.org; Raslan Darawsheh <rasland@nvidia.com>
> > > > > > > Subject: Re: [RFC v3 2/2] ethdev: add API to set process to
> > > > > > > active or standby
> > > > > > >
> > > > > > > External email: Use caution opening links or attachments
> > > > > > >
> > > > > > >
> > > > > > > On Wed, Dec 21, 2022 at 2:31 PM Rongwei Liu
> > > > > > > <rongweil@nvidia.com>
> > > > > wrote:
> > > > > > > >
> > > > > > > > Users may want to change the DPDK process to different
> > > > > > > > versions
> > > > > > >
> > > > > > > Different version of DPDK? If there is any ABI change how to
> > > > > > > support
> > > this?
> > > > > > >
> > > > > > There is a new member which was introduced into
> > > > > > rte_eth_dev_info but it
> > > > > shouldn’t be ABI breaking since using reserved fields.
> > > > >
> > > > > That is just for rte_eth_dev_info. What about the ABI change in
> > > > > different ethdev structure and rte_flow structures across
> > > > > different DPDK
> > > ABI versions.
> > > > >
> > > > Besides this, there is no other ABI changes dependency.
> > > >
> > > > Assume there is a DPDK process A running with version v21.11 and
> > > > plan to upgrade to version v22.11. Let' call v22.11 as process B.
> > >
> > > OK. That's a relief. I understand the use case now.
> > >
> > > Why not simply use standard DPDK multiprocess model then.
> > > Primary process act as server for slow path API. Secondary process
> > > can come and go(aka can be updated at runtime) and use as client to
> > > update rules via primary-secondray communication mechanism.
> > >
> > Just image if process A and process B have ABI breakage like different
> rte_flow_item_*** and rte_flow_action_*** size and members.
> > How can we quickly accommodate primary/secondary to be ABI compatible
> across different versions?
> > It will be very huge effort and difficult to implement, at least in my opinion.
> > What do you think?
> 
> Yes. it difficult what ever approach we take, On other hand, ethdev subsystem
> has other components like rte_tm and other offload etc, We can not simply
> have rte_eth_process_set_active() and things magical works across different
> DPDK versions. Example, if rte_flow rule has meter action will depend on
> another HW piece in NIC card for doing the metering but set by flow API.
> IMO, Customer can simply use standard multiprocess model if version are
> compatible without any special intelligence in application.
> Otherwise they can reload and start the application again or have special
> intelligence in application to cater the specific area of API they need to
> leverage based on server and client DPDK versions.

Thanks for the message.
IMO, we are trying to eliminate the version/ABI dependency with this new added API.
For example, if meter action is in the flow rules:
1. Process A have rules like "eth / ipv4 src 1.1.1.1 / meter / queue / end"
2. Process B starts with "rte_eth_process_set_active(false, flags)"

Just give Nvidia' hardware as example (other NIC vendors may not care if group 0 or not)
If the process A' rules are in group 0, users should set them one by one to process B.
Then either flush process A' rules or quit process A, then process B calls with "rte_eth_process_set_active(true, flags)"
All is set.
It will avoid complex operations with client/server model and avoid user mis-operation too.
We should avoid reload as much as possible since reloading is very time consuming and may take up to few seconds.
In this time slot, there is no application to handle the traffic, and everything is lost.
For end user especially cloud service providers, they are sensitive to the traffic down time.
> 
> 
> > >
> > > >
> > > > Now, process A has been running for long time and has lot of rules
> > > configured. It' "active" role per this API definition.
> > > > Process B starts and it should call this API and set itself to
> > > > "standby" role and user can program the flow rules as they want
> > > > and
> > > different NIC vendors may have different recommendations. Nvidia
> > > suggests only program process B with group 0' rules now.
> > > >
> > > > The user should sync all desired configurations from process A to
> > > > process B, and process A starts to yield traffic like "delete all
> > > > group 0 rules
> > > for Nvidia' NICs" or quit.
> > > > After that process B calls this API and set itself to "active"
> > > > role, now the hot-
> > > upgrade finishes.
> > > >
> > > > > > > > such as hot upgrade.
> > > > > > > > There is a strong requirement to simplify the logic and
> > > > > > > > shorten the traffic downtime as much as possible.
> > > > > > > >
> > > > > > > > This update introduces new rte_eth process role definitions:
> > > > > > > > active or standby.
> > > > > > > >
> > > > > > > > The active role means rules are programmed to HW
> > > > > > > > immediately, and no
> > > > > > >
> > > > > > > Why it has to be specific only to rte_flow rule? If it
> > > > > > > spedieic to rte_flow, why it is in rte_eth_process_ name space?
> > > > > > For now, this design focuses on the flow rule offloading and
> > > > > > traffic
> > > > > redirection.
> > > > > > When switching process version, it' important to make sure
> > > > > > which
> > > > > application receives and handles the traffic.
> > > > >
> > > > > Changing the DPDK version runtime is just beyond rte_flow driver.
> > > >
> > > > It' not about changing DPDK version but upgrading DPDK from one
> > > > PMD
> > > version to another one.
> > > > Does the preceding example answer your question?
> > > > >
> > > > > > The changing should be effective across all probing eth
> > > > > > devices, that' why it
> > > > > was put under rte_eth_process_ (for all rte_eth_dev) name space.
> > > > > > >
> > > > > > > Also, if we are moving the standby, What about the rule
> > > > > > > whose ABI is changed between versions?
> > > > > >
> > > > > > Like the comments mentioned: " Before role transition, all the
> > > > > > rules set by
> > > > > the active process should be flushed first. "
> > > > >
> > > > > What happens to rte_flow flow handles for existing ones  which
> > > > > is created with version X?
> > > > > Also What if new version Y has ABI change in rte_flow_pattern
> > > > > and rte_flow_action structure?
> > > > >
> > > > > For me, If DPDK version change is needed, simply reload the
> > > > > application. This API will soon bloat, and it will be a mess if
> > > > > to start handling Different DPDK version which is not ABI compatible at
> all.
> > > > >
> > > > Yes, you are right. Reloading the application is the easiest way
> > > > but it may have a long time Window that traffic is lost. No
> > > > traffic arrives at
> > > process A or process B.
> > > > We are trying to simplify the reloading logic and minimize the
> > > > traffic down
> > > time as much as possible.
> > > > The approach may differentiate hugely between different NIC
> > > > vendors, so I think it should be better if DPDK can provide an abstract API.
> > > >
> > > > If process A and process B are ABI different, it doesn't matter.
> > > > 1. Call this API with process A means older ABI.
> > > > 2. Call this API with process B means newer ABI.
> > > > It' have process concept and working scope.
> > > >
> > > > >
> > > > >
> > > > >
> > > > > > > > behavior changed. This is the default state.
> > > > > > > > The standby role means rules are queued in the HW. If no
> > > > > > > > active roles alive or back to active, the rules are
> > > > > > > > effective
> > > immediately.
> > > > > > > >
> > > > > > > > Signed-off-by: Rongwei Liu <rongweil@nvidia.com>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* RE: [RFC v3 2/2] ethdev: add API to set process to active or standby
  2022-12-21 14:33                               ` Rongwei Liu
@ 2022-12-26 16:44                                 ` Ori Kam
  2023-01-15 22:46                                   ` Thomas Monjalon
  0 siblings, 1 reply; 54+ messages in thread
From: Ori Kam @ 2022-12-26 16:44 UTC (permalink / raw)
  To: Rongwei Liu, Jerin Jacob
  Cc: Matan Azrad, Slava Ovsiienko,
	NBU-Contact-Thomas Monjalon (EXTERNAL),
	Ferruh Yigit, Andrew Rybchenko, dev, Raslan Darawsheh

Hi Rongwei and Jerin,

> -----Original Message-----
> From: Rongwei Liu <rongweil@nvidia.com>
> Sent: Wednesday, 21 December 2022 16:33
> 
> HI Jerin:
> 
> BR
> Rongwei
> 
> > -----Original Message-----
> > From: Jerin Jacob <jerinjacobk@gmail.com>
> > Sent: Wednesday, December 21, 2022 21:12
> > To: Rongwei Liu <rongweil@nvidia.com>
> > Cc: Matan Azrad <matan@nvidia.com>; Slava Ovsiienko
> > <viacheslavo@nvidia.com>; Ori Kam <orika@nvidia.com>; NBU-Contact-
> > Thomas Monjalon (EXTERNAL) <thomas@monjalon.net>; Ferruh Yigit
> > <ferruh.yigit@amd.com>; Andrew Rybchenko
> > <andrew.rybchenko@oktetlabs.ru>; dev@dpdk.org; Raslan Darawsheh
> > <rasland@nvidia.com>
> > Subject: Re: [RFC v3 2/2] ethdev: add API to set process to active or
> standby
> >
> > External email: Use caution opening links or attachments
> >
> >
> > On Wed, Dec 21, 2022 at 6:20 PM Rongwei Liu <rongweil@nvidia.com>
> wrote:
> > >
> > > HI Jerin:
> > >
> > > BR
> > > Rongwei
> > >
> > > > -----Original Message-----
> > > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > > Sent: Wednesday, December 21, 2022 20:45
> > > > To: Rongwei Liu <rongweil@nvidia.com>
> > > > Cc: Matan Azrad <matan@nvidia.com>; Slava Ovsiienko
> > > > <viacheslavo@nvidia.com>; Ori Kam <orika@nvidia.com>; NBU-
> Contact-
> > > > Thomas Monjalon (EXTERNAL) <thomas@monjalon.net>; Ferruh Yigit
> > > > <ferruh.yigit@amd.com>; Andrew Rybchenko
> > > > <andrew.rybchenko@oktetlabs.ru>; dev@dpdk.org; Raslan Darawsheh
> > > > <rasland@nvidia.com>
> > > > Subject: Re: [RFC v3 2/2] ethdev: add API to set process to active
> > > > or standby
> > > >
> > > > External email: Use caution opening links or attachments
> > > >
> > > >
> > > > On Wed, Dec 21, 2022 at 5:35 PM Rongwei Liu <rongweil@nvidia.com>
> > wrote:
> > > > >
> > > > > Hi Jerin:
> > > > >
> > > > > BR
> > > > > Rongwei
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > > > > Sent: Wednesday, December 21, 2022 19:00
> > > > > > To: Rongwei Liu <rongweil@nvidia.com>
> > > > > > Cc: Matan Azrad <matan@nvidia.com>; Slava Ovsiienko
> > > > > > <viacheslavo@nvidia.com>; Ori Kam <orika@nvidia.com>;
> > > > > > NBU-Contact- Thomas Monjalon (EXTERNAL)
> <thomas@monjalon.net>;
> > > > > > Ferruh Yigit <ferruh.yigit@amd.com>; Andrew Rybchenko
> > > > > > <andrew.rybchenko@oktetlabs.ru>; dev@dpdk.org; Raslan
> Darawsheh
> > > > > > <rasland@nvidia.com>
> > > > > > Subject: Re: [RFC v3 2/2] ethdev: add API to set process to
> > > > > > active or standby
> > > > > >
> > > > > > External email: Use caution opening links or attachments
> > > > > >
> > > > > >
> > > > > > On Wed, Dec 21, 2022 at 3:02 PM Rongwei Liu
> > > > > > <rongweil@nvidia.com>
> > > > wrote:
> > > > > > >
> > > > > > > HI Jerin:
> > > > > > >
> > > > > >
> > > > > > Hi Rongwei
> > > > > >
> > > > > > > BR
> > > > > > > Rongwei
> > > > > > >
> > > > > > > > -----Original Message-----
> > > > > > > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > > > > > > Sent: Wednesday, December 21, 2022 17:13
> > > > > > > > To: Rongwei Liu <rongweil@nvidia.com>
> > > > > > > > Cc: Matan Azrad <matan@nvidia.com>; Slava Ovsiienko
> > > > > > > > <viacheslavo@nvidia.com>; Ori Kam <orika@nvidia.com>;
> > > > > > > > NBU-Contact- Thomas Monjalon (EXTERNAL)
> > > > > > > > <thomas@monjalon.net>; Ferruh Yigit
> <ferruh.yigit@amd.com>;
> > > > > > > > Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>;
> > > > > > > > dev@dpdk.org; Raslan Darawsheh <rasland@nvidia.com>
> > > > > > > > Subject: Re: [RFC v3 2/2] ethdev: add API to set process to
> > > > > > > > active or standby
> > > > > > > >
> > > > > > > > External email: Use caution opening links or attachments
> > > > > > > >
> > > > > > > >
> > > > > > > > On Wed, Dec 21, 2022 at 2:31 PM Rongwei Liu
> > > > > > > > <rongweil@nvidia.com>
> > > > > > wrote:
> > > > > > > > >
> > > > > > > > > Users may want to change the DPDK process to different
> > > > > > > > > versions
> > > > > > > >
> > > > > > > > Different version of DPDK? If there is any ABI change how to
> > > > > > > > support
> > > > this?
> > > > > > > >
> > > > > > > There is a new member which was introduced into
> > > > > > > rte_eth_dev_info but it
> > > > > > shouldn’t be ABI breaking since using reserved fields.
> > > > > >
> > > > > > That is just for rte_eth_dev_info. What about the ABI change in
> > > > > > different ethdev structure and rte_flow structures across
> > > > > > different DPDK
> > > > ABI versions.
> > > > > >
> > > > > Besides this, there is no other ABI changes dependency.
> > > > >
> > > > > Assume there is a DPDK process A running with version v21.11 and
> > > > > plan to upgrade to version v22.11. Let' call v22.11 as process B.
> > > >
> > > > OK. That's a relief. I understand the use case now.
> > > >
> > > > Why not simply use standard DPDK multiprocess model then.
> > > > Primary process act as server for slow path API. Secondary process
> > > > can come and go(aka can be updated at runtime) and use as client to
> > > > update rules via primary-secondray communication mechanism.
> > > >
> > > Just image if process A and process B have ABI breakage like different
> > rte_flow_item_*** and rte_flow_action_*** size and members.
> > > How can we quickly accommodate primary/secondary to be ABI
> compatible
> > across different versions?
> > > It will be very huge effort and difficult to implement, at least in my
> opinion.
> > > What do you think?
> >
> > Yes. it difficult what ever approach we take, On other hand, ethdev
> subsystem
> > has other components like rte_tm and other offload etc, We can not simply
> > have rte_eth_process_set_active() and things magical works across
> different
> > DPDK versions. Example, if rte_flow rule has meter action will depend on
> > another HW piece in NIC card for doing the metering but set by flow API.
> > IMO, Customer can simply use standard multiprocess model if version are
> > compatible without any special intelligence in application.
> > Otherwise they can reload and start the application again or have special
> > intelligence in application to cater the specific area of API they need to
> > leverage based on server and client DPDK versions.
> 
> Thanks for the message.
> IMO, we are trying to eliminate the version/ABI dependency with this new
> added API.
> For example, if meter action is in the flow rules:
> 1. Process A have rules like "eth / ipv4 src 1.1.1.1 / meter / queue / end"
> 2. Process B starts with "rte_eth_process_set_active(false, flags)"
> 
> Just give Nvidia' hardware as example (other NIC vendors may not care if
> group 0 or not)
> If the process A' rules are in group 0, users should set them one by one to
> process B.
> Then either flush process A' rules or quit process A, then process B calls with
> "rte_eth_process_set_active(true, flags)"
> All is set.
> It will avoid complex operations with client/server model and avoid user mis-
> operation too.
> We should avoid reload as much as possible since reloading is very time
> consuming and may take up to few seconds.
> In this time slot, there is no application to handle the traffic, and everything is
> lost.
> For end user especially cloud service providers, they are sensitive to the
> traffic down time.

From my viewpoint the upgrade has nothing to do with DPDK as a library,
the upgrade may be because of application upgrade.
Unless I'm missing something, this API is not meant for API/ABI it is created
to allow minimum downtime when doing upgrade of the application.
Unless I'm missing something critical, since the upgrade in any case is not
only for the DPDK but the entire app, there isn't any ABI/API issue.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC v3 2/2] ethdev: add API to set process to active or standby
  2022-12-26 16:44                                 ` Ori Kam
@ 2023-01-15 22:46                                   ` Thomas Monjalon
  0 siblings, 0 replies; 54+ messages in thread
From: Thomas Monjalon @ 2023-01-15 22:46 UTC (permalink / raw)
  To: Rongwei Liu, Jerin Jacob, Ori Kam
  Cc: dev, Matan Azrad, Slava Ovsiienko, Ferruh Yigit,
	Andrew Rybchenko, dev, Raslan Darawsheh, maxime.coquelin

26/12/2022 17:44, Ori Kam:
> From: Rongwei Liu <rongweil@nvidia.com>
> > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > On Wed, Dec 21, 2022 at 6:20 PM Rongwei Liu wrote:
> > > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > > > On Wed, Dec 21, 2022 at 5:35 PM Rongwei Liu wrote:
> > > > > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > > > > > On Wed, Dec 21, 2022 at 3:02 PM Rongwei Liu wrote:
> > > > > > > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > > > > > > > On Wed, Dec 21, 2022 at 2:31 PM Rongwei Liu wrote:
> > > > > > > > > >
> > > > > > > > > > Users may want to change the DPDK process to different
> > > > > > > > > > versions
> > > > > > > > >
> > > > > > > > > Different version of DPDK? If there is any ABI change how to
> > > > > > > > > support
> > > > > this?
> > > > > > > > >
> > > > > > > > There is a new member which was introduced into
> > > > > > > > rte_eth_dev_info but it
> > > > > > > shouldn’t be ABI breaking since using reserved fields.
> > > > > > >
> > > > > > > That is just for rte_eth_dev_info. What about the ABI change in
> > > > > > > different ethdev structure and rte_flow structures across
> > > > > > > different DPDK
> > > > > ABI versions.
> > > > > > >
> > > > > > Besides this, there is no other ABI changes dependency.
> > > > > >
> > > > > > Assume there is a DPDK process A running with version v21.11 and
> > > > > > plan to upgrade to version v22.11. Let' call v22.11 as process B.
> > > > >
> > > > > OK. That's a relief. I understand the use case now.
> > > > >
> > > > > Why not simply use standard DPDK multiprocess model then.
> > > > > Primary process act as server for slow path API. Secondary process
> > > > > can come and go(aka can be updated at runtime) and use as client to
> > > > > update rules via primary-secondray communication mechanism.
> > > > >
> > > > Just image if process A and process B have ABI breakage like different
> > > rte_flow_item_*** and rte_flow_action_*** size and members.
> > > > How can we quickly accommodate primary/secondary to be ABI
> > compatible
> > > across different versions?
> > > > It will be very huge effort and difficult to implement, at least in my
> > opinion.
> > > > What do you think?
> > >
> > > Yes. it difficult what ever approach we take, On other hand, ethdev
> > subsystem
> > > has other components like rte_tm and other offload etc, We can not simply
> > > have rte_eth_process_set_active() and things magical works across
> > different
> > > DPDK versions. Example, if rte_flow rule has meter action will depend on
> > > another HW piece in NIC card for doing the metering but set by flow API.
> > > IMO, Customer can simply use standard multiprocess model if version are
> > > compatible without any special intelligence in application.
> > > Otherwise they can reload and start the application again or have special
> > > intelligence in application to cater the specific area of API they need to
> > > leverage based on server and client DPDK versions.
> > 
> > Thanks for the message.
> > IMO, we are trying to eliminate the version/ABI dependency with this new
> > added API.
> > For example, if meter action is in the flow rules:
> > 1. Process A have rules like "eth / ipv4 src 1.1.1.1 / meter / queue / end"
> > 2. Process B starts with "rte_eth_process_set_active(false, flags)"
> > 
> > Just give Nvidia' hardware as example (other NIC vendors may not care if
> > group 0 or not)
> > If the process A' rules are in group 0, users should set them one by one to
> > process B.
> > Then either flush process A' rules or quit process A, then process B calls with
> > "rte_eth_process_set_active(true, flags)"
> > All is set.
> > It will avoid complex operations with client/server model and avoid user mis-
> > operation too.
> > We should avoid reload as much as possible since reloading is very time
> > consuming and may take up to few seconds.
> > In this time slot, there is no application to handle the traffic, and everything is
> > lost.
> > For end user especially cloud service providers, they are sensitive to the
> > traffic down time.
> 
> From my viewpoint the upgrade has nothing to do with DPDK as a library,
> the upgrade may be because of application upgrade.
> Unless I'm missing something, this API is not meant for API/ABI it is created
> to allow minimum downtime when doing upgrade of the application.
> Unless I'm missing something critical, since the upgrade in any case is not
> only for the DPDK but the entire app, there isn't any ABI/API issue.

Yes we can consider the case of an application upgrade with the same DPDK.
The patch needs to be reworded in this more realistic direction I think.
We can also improve the usage explanations.

That said, another high level question is about the scope of the feature.
In this patch, only ethdev is targetted.
Do you think we need the same migration mechanism in other classes
like vDPA, crypto, etc?




^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH v4 0/3] add API for live migration
  2022-12-21  9:00               ` [RFC v3 1/2] ethdev: add group description Rongwei Liu
@ 2023-01-18 15:44                 ` Rongwei Liu
  2023-01-18 15:44                   ` [PATCH v4 1/3] ethdev: add flow rule group description Rongwei Liu
                                     ` (2 more replies)
  0 siblings, 3 replies; 54+ messages in thread
From: Rongwei Liu @ 2023-01-18 15:44 UTC (permalink / raw)
  To: dev, matan, viacheslavo, orika, thomas, jerinjacobk, stephen; +Cc: rasland

Introduce a new API to set eth_dev process to different roles: active or
standby.
This API should be used when switching DPDK application to a different
version and it should be simple and easy to use.

v4: add more descriptions and one user example.
v3: add rollback and device capability.

Rongwei Liu (3):
  ethdev: add flow rule group description
  ethdev: add standby state for live migration
  ethdev: add standby flags for live migration

 doc/guides/rel_notes/release_23_03.rst |  7 +++
 lib/ethdev/ethdev_driver.h             | 20 +++++++
 lib/ethdev/rte_ethdev.c                | 42 ++++++++++++++
 lib/ethdev/rte_ethdev.h                | 77 ++++++++++++++++++++++++++
 lib/ethdev/rte_flow.h                  | 13 ++++-
 lib/ethdev/version.map                 |  3 +
 6 files changed, 161 insertions(+), 1 deletion(-)

-- 
2.27.0


^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH v4 1/3] ethdev: add flow rule group description
  2023-01-18 15:44                 ` [PATCH v4 0/3] add API for live migration Rongwei Liu
@ 2023-01-18 15:44                   ` Rongwei Liu
  2023-01-31 11:53                     ` Ori Kam
  2023-02-07  2:57                     ` [PATCH v5] " Rongwei Liu
  2023-01-18 15:44                   ` [PATCH v4 2/3] ethdev: add standby state for live migration Rongwei Liu
  2023-01-18 15:44                   ` [PATCH v4 3/3] ethdev: add standby flags " Rongwei Liu
  2 siblings, 2 replies; 54+ messages in thread
From: Rongwei Liu @ 2023-01-18 15:44 UTC (permalink / raw)
  To: dev, matan, viacheslavo, orika, thomas, jerinjacobk, stephen
  Cc: rasland, Ferruh Yigit, Andrew Rybchenko

Add more sentences to describe the group concepts
and define group 0 as root group for traffic to search a
hit rule.

Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
---
 lib/ethdev/rte_flow.h | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/lib/ethdev/rte_flow.h b/lib/ethdev/rte_flow.h
index b60987db4b..e71ac0c199 100644
--- a/lib/ethdev/rte_flow.h
+++ b/lib/ethdev/rte_flow.h
@@ -86,7 +86,18 @@ extern "C" {
  * but may be valid in a few cases.
  */
 struct rte_flow_attr {
-	uint32_t group; /**< Priority group. */
+	/**
+	 * A group is a superset of multiple rules.
+	 * The default group is 0 and is processed for all packets.
+	 * The group 0 of bifurcated drivers is shared with the kernel.
+	 * Rules in other groups are processed only if the group is chained
+	 * by a jump action from a previously matched rule.
+	 * It means the group hierarchy is made by the flow rules,
+	 * and the group 0 is the hierarchy root.
+	 * Note there is no automatic dead loop protection.
+	 * @see rte_flow_action_jump
+	 */
+	uint32_t group;
 	uint32_t priority; /**< Rule priority level within group. */
 	/**
 	 * The rule in question applies to ingress traffic (non-"transfer").
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v4 2/3] ethdev: add standby state for live migration
  2023-01-18 15:44                 ` [PATCH v4 0/3] add API for live migration Rongwei Liu
  2023-01-18 15:44                   ` [PATCH v4 1/3] ethdev: add flow rule group description Rongwei Liu
@ 2023-01-18 15:44                   ` Rongwei Liu
  2023-01-31 13:50                     ` Ori Kam
                                       ` (2 more replies)
  2023-01-18 15:44                   ` [PATCH v4 3/3] ethdev: add standby flags " Rongwei Liu
  2 siblings, 3 replies; 54+ messages in thread
From: Rongwei Liu @ 2023-01-18 15:44 UTC (permalink / raw)
  To: dev, matan, viacheslavo, orika, thomas, jerinjacobk, stephen
  Cc: rasland, Ferruh Yigit, Andrew Rybchenko

When a DPDK application must be upgraded,
the traffic downtime should be shortened as much as possible.
During the migration time, the old application may stay alive
while the new application is starting and being configured.

In order to optimize the switch to the new application,
the old application may need to be aware of the presence
of the new application being prepared.
This is achieved with a new API allowing the user to change the
new application state to standby and active later.

The added function is trying to apply the new state to all probed
ethdev ports. To make this API simple and easy to use,
the same flags have to be accepted by all devices.

This is the scenario of operations in the old and new applications:
.       device: already configured by the old application
.       new:    start as active
.       new:    probe the same device
.       new:    set as standby
.       new:    configure the device
.       device: has configurations from old and new applications
.       old:    clear its device configuration
.       device: has only 1 configuration from new application
.       new:    set as active
.       device: downtime for connecting all to the new application
.       old:    shutdown

The active role means network handling configurations are programmed
to the HW immediately, and no behavior changed. This is the default state.
The standby role means configurations are queued in the HW.
If there is no application with active role,
any configuration is effective immediately.

Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
---
 doc/guides/rel_notes/release_23_03.rst |  7 ++++
 lib/ethdev/ethdev_driver.h             | 20 +++++++++
 lib/ethdev/rte_ethdev.c                | 42 +++++++++++++++++++
 lib/ethdev/rte_ethdev.h                | 56 ++++++++++++++++++++++++++
 lib/ethdev/version.map                 |  3 ++
 5 files changed, 128 insertions(+)

diff --git a/doc/guides/rel_notes/release_23_03.rst b/doc/guides/rel_notes/release_23_03.rst
index b8c5b68d6c..5367123f24 100644
--- a/doc/guides/rel_notes/release_23_03.rst
+++ b/doc/guides/rel_notes/release_23_03.rst
@@ -55,6 +55,13 @@ New Features
      Also, make sure to start the actual text at the margin.
      =======================================================
 
+* **Added process state in ethdev to improve live migration.**
+
+  Hot upgrade of an application may be accelerated by configuring
+  the new application in standby state while the old one is still active.
+  Such double ethdev configuration of the same device is possible
+  with the added process state API.
+
 
 Removed Items
 -------------
diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
index 6a550cfc83..4a098410d5 100644
--- a/lib/ethdev/ethdev_driver.h
+++ b/lib/ethdev/ethdev_driver.h
@@ -219,6 +219,23 @@ typedef int (*eth_dev_reset_t)(struct rte_eth_dev *dev);
 /** @internal Function used to detect an Ethernet device removal. */
 typedef int (*eth_is_removed_t)(struct rte_eth_dev *dev);
 
+/**
+ * @internal
+ * Set the role of the process to active or standby during live migration.
+ *
+ * @param dev
+ *   Port (ethdev) handle.
+ * @param standby
+ *   Role active if false, standby if true.
+ * @param flags
+ *   Role specific flags.
+ *
+ * @return
+ *   Negative value on error, 0 on success.
+ */
+typedef int (*eth_dev_process_set_role_t)(struct rte_eth_dev *dev,
+		bool standby, uint32_t flags);
+
 /**
  * @internal
  * Function used to enable the Rx promiscuous mode of an Ethernet device.
@@ -1186,6 +1203,9 @@ struct eth_dev_ops {
 	/** Check if the device was physically removed */
 	eth_is_removed_t           is_removed;
 
+	/** Set role during live migration */
+	eth_dev_process_set_role_t process_set_role;
+
 	eth_promiscuous_enable_t   promiscuous_enable; /**< Promiscuous ON */
 	eth_promiscuous_disable_t  promiscuous_disable;/**< Promiscuous OFF */
 	eth_allmulticast_enable_t  allmulticast_enable;/**< Rx multicast ON */
diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index 5d5e18db1e..3a1fb64053 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -558,6 +558,48 @@ rte_eth_dev_owner_get(const uint16_t port_id, struct rte_eth_dev_owner *owner)
 	return 0;
 }
 
+int
+rte_eth_process_set_role(bool standby, uint32_t flags)
+{
+	struct rte_eth_dev_info dev_info = {0};
+	struct rte_eth_dev *dev;
+	uint16_t port_id;
+	int ret = 0;
+
+	/* Check if all devices support process role. */
+	RTE_ETH_FOREACH_DEV(port_id) {
+		dev = &rte_eth_devices[port_id];
+		if (*dev->dev_ops->process_set_role != NULL &&
+			*dev->dev_ops->dev_infos_get != NULL &&
+			(*dev->dev_ops->dev_infos_get)(dev, &dev_info) == 0 &&
+			(dev_info.dev_capa & RTE_ETH_DEV_CAPA_PROCESS_ROLE) != 0)
+			continue;
+		rte_errno = ENOTSUP;
+		return -rte_errno;
+	}
+	/* Call the driver callbacks. */
+	RTE_ETH_FOREACH_DEV(port_id) {
+		dev = &rte_eth_devices[port_id];
+		if ((*dev->dev_ops->process_set_role)(dev, standby, flags) < 0)
+			goto failure;
+		ret++;
+	}
+	return ret;
+
+failure:
+	/* Rollback all changed devices in case one failed. */
+	if (ret) {
+		RTE_ETH_FOREACH_DEV(port_id) {
+			dev = &rte_eth_devices[port_id];
+			(*dev->dev_ops->process_set_role)(dev, !standby, flags);
+			if (--ret == 0)
+				break;
+		}
+	}
+	rte_errno = EPERM;
+	return -rte_errno;
+}
+
 int
 rte_eth_dev_socket_id(uint16_t port_id)
 {
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index c129ca1eaf..1505396ced 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -1606,6 +1606,8 @@ struct rte_eth_conf {
 #define RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP         RTE_BIT64(3)
 /** Device supports keeping shared flow objects across restart. */
 #define RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP RTE_BIT64(4)
+/** Device supports process role changing. @see rte_eth_process_set_active */
+#define RTE_ETH_DEV_CAPA_PROCESS_ROLE           RTE_BIT64(5)
 /**@}*/
 
 /*
@@ -2204,6 +2206,60 @@ int rte_eth_dev_owner_delete(const uint64_t owner_id);
 int rte_eth_dev_owner_get(const uint16_t port_id,
 		struct rte_eth_dev_owner *owner);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Set the role of the process to active or standby,
+ * affecting network traffic handling.
+ *
+ * If one device does not support this operation or fails,
+ * the whole operation is failed and rolled back.
+ *
+ * It is forbidden to have multiple processes with the same role
+ * unless only one of them is configured to handle the traffic.
+ *
+ * The application is active by default.
+ * The configuration from the active process is effective immediately
+ * while the configuration from the standby process is queued by hardware.
+ * When configuring the device from a standby process,
+ * it has no effect except for below situations:
+ *   - traffic not handled by the active process configuration
+ *   - no active process
+ *
+ * When a process is changed from a standby to an active role,
+ * all preceding configurations that are queued by hardware
+ * should become effective immediately.
+ * Before role transition, all the traffic handling configurations
+ * set by the active process should be flushed first.
+ *
+ * In summary, the operations are expected to happen in this order
+ * in "old" and "new" applications:
+ *   device: already configured by the old application
+ *   new:    start as active
+ *   new:    probe the same device
+ *   new:    set as standby
+ *   new:    configure the device
+ *   device: has configurations from old and new applications
+ *   old:    clear its device configuration
+ *   device: has only 1 configuration from new application
+ *   new:    set as active
+ *   device: downtime for connecting all to the new application
+ *   old:    shutdown
+ *
+ * @param standby
+ *   Role active if false, standby if true.
+ * @param flags
+ *   Role specific flags.
+ * @return
+ *   Positive value on success, -rte_errno value on error:
+ *   - (> 0) Number of switched devices.
+ *   - (-ENOTSUP) if not supported by a device.
+ *   - (-EPERM) if operation failed with a device.
+ */
+__rte_experimental
+int rte_eth_process_set_role(bool standby, uint32_t flags);
+
 /**
  * Get the number of ports which are usable for the application.
  *
diff --git a/lib/ethdev/version.map b/lib/ethdev/version.map
index 17201fbe0f..d5d3ea5421 100644
--- a/lib/ethdev/version.map
+++ b/lib/ethdev/version.map
@@ -298,6 +298,9 @@ EXPERIMENTAL {
 	rte_flow_get_q_aged_flows;
 	rte_mtr_meter_policy_get;
 	rte_mtr_meter_profile_get;
+
+	# added in 23.03
+	rte_eth_process_set_role;
 };
 
 INTERNAL {
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v4 3/3] ethdev: add standby flags for live migration
  2023-01-18 15:44                 ` [PATCH v4 0/3] add API for live migration Rongwei Liu
  2023-01-18 15:44                   ` [PATCH v4 1/3] ethdev: add flow rule group description Rongwei Liu
  2023-01-18 15:44                   ` [PATCH v4 2/3] ethdev: add standby state for live migration Rongwei Liu
@ 2023-01-18 15:44                   ` Rongwei Liu
  2023-01-23 13:20                     ` Jerin Jacob
  2 siblings, 1 reply; 54+ messages in thread
From: Rongwei Liu @ 2023-01-18 15:44 UTC (permalink / raw)
  To: dev, matan, viacheslavo, orika, thomas, jerinjacobk, stephen
  Cc: rasland, Ferruh Yigit, Andrew Rybchenko

Some flags are added to the process state API for live migration
in order to change the behavior of the flow rules in a standby process.

Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
---
 lib/ethdev/rte_ethdev.h | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index 1505396ced..9ae4f426a7 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -2260,6 +2260,27 @@ int rte_eth_dev_owner_get(const uint16_t port_id,
 __rte_experimental
 int rte_eth_process_set_role(bool standby, uint32_t flags);
 
+/**@{@name Process role flags
+ * used when migrating from an application to another one.
+ * @see rte_eth_process_set_active
+ */
+/**
+ * When set on a standby process, ingress flow rules will be effective
+ * in active and standby processes, so the ingress traffic may be duplicated.
+ */
+#define RTE_ETH_PROCESS_FLAG_STANDBY_DUP_FLOW_INGRESS      RTE_BIT32(0)
+/**
+ * When set on a standby process, egress flow rules will be effective
+ * in active and standby processes, so the egress traffic may be duplicated.
+ */
+#define RTE_ETH_PROCESS_FLAG_STANDBY_DUP_FLOW_EGRESS       RTE_BIT32(1)
+/**
+ * When set on a standby process, transfer flow rules will be effective
+ * in active and standby processes, so the transfer traffic may be duplicated.
+ */
+#define RTE_ETH_PROCESS_FLAG_STANDBY_DUP_FLOW_TRANSFER     RTE_BIT32(2)
+/**@}*/
+
 /**
  * Get the number of ports which are usable for the application.
  *
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* Re: [PATCH v4 3/3] ethdev: add standby flags for live migration
  2023-01-18 15:44                   ` [PATCH v4 3/3] ethdev: add standby flags " Rongwei Liu
@ 2023-01-23 13:20                     ` Jerin Jacob
  2023-01-30  2:47                       ` Rongwei Liu
  0 siblings, 1 reply; 54+ messages in thread
From: Jerin Jacob @ 2023-01-23 13:20 UTC (permalink / raw)
  To: Rongwei Liu
  Cc: dev, matan, viacheslavo, orika, thomas, stephen, rasland,
	Ferruh Yigit, Andrew Rybchenko

On Wed, Jan 18, 2023 at 9:15 PM Rongwei Liu <rongweil@nvidia.com> wrote:
>
> Some flags are added to the process state API for live migration
> in order to change the behavior of the flow rules in a standby process.
>
> Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
> ---
>  lib/ethdev/rte_ethdev.h | 21 +++++++++++++++++++++
>  1 file changed, 21 insertions(+)
>
> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> index 1505396ced..9ae4f426a7 100644
> --- a/lib/ethdev/rte_ethdev.h
> +++ b/lib/ethdev/rte_ethdev.h
> @@ -2260,6 +2260,27 @@ int rte_eth_dev_owner_get(const uint16_t port_id,
>  __rte_experimental
>  int rte_eth_process_set_role(bool standby, uint32_t flags);
>
> +/**@{@name Process role flags
> + * used when migrating from an application to another one.
> + * @see rte_eth_process_set_active
> + */
> +/**
> + * When set on a standby process, ingress flow rules will be effective
> + * in active and standby processes, so the ingress traffic may be duplicated.
> + */
> +#define RTE_ETH_PROCESS_FLAG_STANDBY_DUP_FLOW_INGRESS      RTE_BIT32(0)


How to duplicate if action has statefull items for example,
rte_flow_action_security::security_session -> it store the live pointer
rte_flow_action_meter::mtr_id; -> MTR object ID created with rte_mtr_create()

^ permalink raw reply	[flat|nested] 54+ messages in thread

* RE: [PATCH v4 3/3] ethdev: add standby flags for live migration
  2023-01-23 13:20                     ` Jerin Jacob
@ 2023-01-30  2:47                       ` Rongwei Liu
  2023-01-30 17:10                         ` Jerin Jacob
  0 siblings, 1 reply; 54+ messages in thread
From: Rongwei Liu @ 2023-01-30  2:47 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: dev, Matan Azrad, Slava Ovsiienko, Ori Kam,
	NBU-Contact-Thomas Monjalon (EXTERNAL),
	stephen, Raslan Darawsheh, Ferruh Yigit, Andrew Rybchenko

Hi Jerin

BR
Rongwei

> -----Original Message-----
> From: Jerin Jacob <jerinjacobk@gmail.com>
> Sent: Monday, January 23, 2023 21:20
> To: Rongwei Liu <rongweil@nvidia.com>
> Cc: dev@dpdk.org; Matan Azrad <matan@nvidia.com>; Slava Ovsiienko
> <viacheslavo@nvidia.com>; Ori Kam <orika@nvidia.com>; NBU-Contact-
> Thomas Monjalon (EXTERNAL) <thomas@monjalon.net>;
> stephen@networkplumber.org; Raslan Darawsheh <rasland@nvidia.com>;
> Ferruh Yigit <ferruh.yigit@amd.com>; Andrew Rybchenko
> <andrew.rybchenko@oktetlabs.ru>
> Subject: Re: [PATCH v4 3/3] ethdev: add standby flags for live migration
> 
> External email: Use caution opening links or attachments
> 
> 
> On Wed, Jan 18, 2023 at 9:15 PM Rongwei Liu <rongweil@nvidia.com> wrote:
> >
> > Some flags are added to the process state API for live migration in
> > order to change the behavior of the flow rules in a standby process.
> >
> > Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
> > ---
> >  lib/ethdev/rte_ethdev.h | 21 +++++++++++++++++++++
> >  1 file changed, 21 insertions(+)
> >
> > diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h index
> > 1505396ced..9ae4f426a7 100644
> > --- a/lib/ethdev/rte_ethdev.h
> > +++ b/lib/ethdev/rte_ethdev.h
> > @@ -2260,6 +2260,27 @@ int rte_eth_dev_owner_get(const uint16_t
> > port_id,  __rte_experimental  int rte_eth_process_set_role(bool
> > standby, uint32_t flags);
> >
> > +/**@{@name Process role flags
> > + * used when migrating from an application to another one.
> > + * @see rte_eth_process_set_active
> > + */
> > +/**
> > + * When set on a standby process, ingress flow rules will be
> > +effective
> > + * in active and standby processes, so the ingress traffic may be duplicated.
> > + */
> > +#define RTE_ETH_PROCESS_FLAG_STANDBY_DUP_FLOW_INGRESS
> RTE_BIT32(0)
> 
> 
> How to duplicate if action has statefull items for example,
> rte_flow_action_security::security_session -> it store the live pointer
> rte_flow_action_meter::mtr_id; -> MTR object ID created with
> rte_mtr_create()
I agree with you, not all actions can be supported in the active/standby model.
That' why we have return value checking and rollback.
In Nvidia driver doc, we suggested user to start from 'rss/queue/jump' actions.
Meter is possible, at least per my view.
Assume: "meter g_action queue 0 / y_action drop / r_action drop"
Old application: create meter_id 'A' with pre-defined limitation.
New application: create meter_id 'B' which has the same parameters with 'A'.
1. 1st possible approach:
	Hardware duplicates the traffic; old application use meter 'A' and new application uses meter 'B' to control traffic throughputs.
	Since traffic is duplicated, so it can go to different meters. 
2. 2nd possible approach:
             Meter 'A' and 'B' point to the same hardware resource, and traffic reaches this part first and if green, duplication happens. 

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v4 3/3] ethdev: add standby flags for live migration
  2023-01-30  2:47                       ` Rongwei Liu
@ 2023-01-30 17:10                         ` Jerin Jacob
  2023-01-31  2:53                           ` Rongwei Liu
  0 siblings, 1 reply; 54+ messages in thread
From: Jerin Jacob @ 2023-01-30 17:10 UTC (permalink / raw)
  To: Rongwei Liu
  Cc: dev, Matan Azrad, Slava Ovsiienko, Ori Kam,
	NBU-Contact-Thomas Monjalon (EXTERNAL),
	stephen, Raslan Darawsheh, Ferruh Yigit, Andrew Rybchenko

On Mon, Jan 30, 2023 at 8:17 AM Rongwei Liu <rongweil@nvidia.com> wrote:
>
> Hi Jerin
>
> BR
> Rongwei
>
> > -----Original Message-----
> > From: Jerin Jacob <jerinjacobk@gmail.com>
> > Sent: Monday, January 23, 2023 21:20
> > To: Rongwei Liu <rongweil@nvidia.com>
> > Cc: dev@dpdk.org; Matan Azrad <matan@nvidia.com>; Slava Ovsiienko
> > <viacheslavo@nvidia.com>; Ori Kam <orika@nvidia.com>; NBU-Contact-
> > Thomas Monjalon (EXTERNAL) <thomas@monjalon.net>;
> > stephen@networkplumber.org; Raslan Darawsheh <rasland@nvidia.com>;
> > Ferruh Yigit <ferruh.yigit@amd.com>; Andrew Rybchenko
> > <andrew.rybchenko@oktetlabs.ru>
> > Subject: Re: [PATCH v4 3/3] ethdev: add standby flags for live migration
> >
> > External email: Use caution opening links or attachments
> >
> >
> > On Wed, Jan 18, 2023 at 9:15 PM Rongwei Liu <rongweil@nvidia.com> wrote:
> > >
> > > Some flags are added to the process state API for live migration in
> > > order to change the behavior of the flow rules in a standby process.
> > >
> > > Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
> > > ---
> > >  lib/ethdev/rte_ethdev.h | 21 +++++++++++++++++++++
> > >  1 file changed, 21 insertions(+)
> > >
> > > diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h index
> > > 1505396ced..9ae4f426a7 100644
> > > --- a/lib/ethdev/rte_ethdev.h
> > > +++ b/lib/ethdev/rte_ethdev.h
> > > @@ -2260,6 +2260,27 @@ int rte_eth_dev_owner_get(const uint16_t
> > > port_id,  __rte_experimental  int rte_eth_process_set_role(bool
> > > standby, uint32_t flags);
> > >
> > > +/**@{@name Process role flags
> > > + * used when migrating from an application to another one.
> > > + * @see rte_eth_process_set_active
> > > + */
> > > +/**
> > > + * When set on a standby process, ingress flow rules will be
> > > +effective
> > > + * in active and standby processes, so the ingress traffic may be duplicated.
> > > + */
> > > +#define RTE_ETH_PROCESS_FLAG_STANDBY_DUP_FLOW_INGRESS
> > RTE_BIT32(0)
> >
> >
> > How to duplicate if action has statefull items for example,
> > rte_flow_action_security::security_session -> it store the live pointer
> > rte_flow_action_meter::mtr_id; -> MTR object ID created with
> > rte_mtr_create()
> I agree with you, not all actions can be supported in the active/standby model.

IMO, Where ever rules are not standalone (like QUEUE, RSS) etc, It
will be architecturally is not possible to migrate with pointers.
That's where I have concern generalizing this feature for this ethdev.

Also, I don't believe there is any real HW support needed for this.
IMO, Having DPDK standard multiprocess can do this by keeping
secondary application can migrate, keeping all the SW logic in the
primary process by doing the housekeeping in the application. On plus
side,
it works with pointers too.

I am not sure how much housekeeping offload to _HW_ in your case. In
my view, it should be generic utils functions to track the flow
and installing the rules using rte_flow APIs and keep the scope only
for rte_flow.

That's just my view. I leave to ethdev maintainers for the rest of the
review and decision on this series.

> That' why we have return value checking and rollback.
> In Nvidia driver doc, we suggested user to start from 'rss/queue/jump' actions.
> Meter is possible, at least per my view.
> Assume: "meter g_action queue 0 / y_action drop / r_action drop"
> Old application: create meter_id 'A' with pre-defined limitation.
> New application: create meter_id 'B' which has the same parameters with 'A'.
> 1. 1st possible approach:
>         Hardware duplicates the traffic; old application use meter 'A' and new application uses meter 'B' to control traffic throughputs.
>         Since traffic is duplicated, so it can go to different meters.
> 2. 2nd possible approach:
>              Meter 'A' and 'B' point to the same hardware resource, and traffic reaches this part first and if green, duplication happens.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* RE: [PATCH v4 3/3] ethdev: add standby flags for live migration
  2023-01-30 17:10                         ` Jerin Jacob
@ 2023-01-31  2:53                           ` Rongwei Liu
  2023-01-31  8:45                             ` Jerin Jacob
  0 siblings, 1 reply; 54+ messages in thread
From: Rongwei Liu @ 2023-01-31  2:53 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: dev, Matan Azrad, Slava Ovsiienko, Ori Kam,
	NBU-Contact-Thomas Monjalon (EXTERNAL),
	stephen, Raslan Darawsheh, Ferruh Yigit, Andrew Rybchenko

HI Jerin:

BR
Rongwei

> -----Original Message-----
> From: Jerin Jacob <jerinjacobk@gmail.com>
> Sent: Tuesday, January 31, 2023 01:10
> To: Rongwei Liu <rongweil@nvidia.com>
> Cc: dev@dpdk.org; Matan Azrad <matan@nvidia.com>; Slava Ovsiienko
> <viacheslavo@nvidia.com>; Ori Kam <orika@nvidia.com>; NBU-Contact-
> Thomas Monjalon (EXTERNAL) <thomas@monjalon.net>;
> stephen@networkplumber.org; Raslan Darawsheh <rasland@nvidia.com>;
> Ferruh Yigit <ferruh.yigit@amd.com>; Andrew Rybchenko
> <andrew.rybchenko@oktetlabs.ru>
> Subject: Re: [PATCH v4 3/3] ethdev: add standby flags for live migration
> 
> External email: Use caution opening links or attachments
> 
> 
> On Mon, Jan 30, 2023 at 8:17 AM Rongwei Liu <rongweil@nvidia.com> wrote:
> >
> > Hi Jerin
> >
> > BR
> > Rongwei
> >
> > > -----Original Message-----
> > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > Sent: Monday, January 23, 2023 21:20
> > > To: Rongwei Liu <rongweil@nvidia.com>
> > > Cc: dev@dpdk.org; Matan Azrad <matan@nvidia.com>; Slava Ovsiienko
> > > <viacheslavo@nvidia.com>; Ori Kam <orika@nvidia.com>; NBU-Contact-
> > > Thomas Monjalon (EXTERNAL) <thomas@monjalon.net>;
> > > stephen@networkplumber.org; Raslan Darawsheh <rasland@nvidia.com>;
> > > Ferruh Yigit <ferruh.yigit@amd.com>; Andrew Rybchenko
> > > <andrew.rybchenko@oktetlabs.ru>
> > > Subject: Re: [PATCH v4 3/3] ethdev: add standby flags for live
> > > migration
> > >
> > > External email: Use caution opening links or attachments
> > >
> > >
> > > On Wed, Jan 18, 2023 at 9:15 PM Rongwei Liu <rongweil@nvidia.com>
> wrote:
> > > >
> > > > Some flags are added to the process state API for live migration
> > > > in order to change the behavior of the flow rules in a standby process.
> > > >
> > > > Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
> > > > ---
> > > >  lib/ethdev/rte_ethdev.h | 21 +++++++++++++++++++++
> > > >  1 file changed, 21 insertions(+)
> > > >
> > > > diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> > > > index
> > > > 1505396ced..9ae4f426a7 100644
> > > > --- a/lib/ethdev/rte_ethdev.h
> > > > +++ b/lib/ethdev/rte_ethdev.h
> > > > @@ -2260,6 +2260,27 @@ int rte_eth_dev_owner_get(const uint16_t
> > > > port_id,  __rte_experimental  int rte_eth_process_set_role(bool
> > > > standby, uint32_t flags);
> > > >
> > > > +/**@{@name Process role flags
> > > > + * used when migrating from an application to another one.
> > > > + * @see rte_eth_process_set_active  */
> > > > +/**
> > > > + * When set on a standby process, ingress flow rules will be
> > > > +effective
> > > > + * in active and standby processes, so the ingress traffic may be
> duplicated.
> > > > + */
> > > > +#define RTE_ETH_PROCESS_FLAG_STANDBY_DUP_FLOW_INGRESS
> > > RTE_BIT32(0)
> > >
> > >
> > > How to duplicate if action has statefull items for example,
> > > rte_flow_action_security::security_session -> it store the live
> > > pointer rte_flow_action_meter::mtr_id; -> MTR object ID created with
> > > rte_mtr_create()
> > I agree with you, not all actions can be supported in the active/standby
> model.
> 
> IMO, Where ever rules are not standalone (like QUEUE, RSS) etc, It will be
> architecturally is not possible to migrate with pointers.
> That's where I have concern generalizing this feature for this ethdev.
> 
Not sure I understand your concern correctly. What' the pointer concept here?
Queue RSS actions can be migrated per my local test. Active/Standby application have its fully own rxq/txq.
They are totally separated processes and like two members in pipeline. 2nd member can't be feed if 1st member alive and handle the traffic. 

> Also, I don't believe there is any real HW support needed for this.
> IMO, Having DPDK standard multiprocess can do this by keeping secondary
> application can migrate, keeping all the SW logic in the primary process by
> doing the housekeeping in the application. On plus side, it works with pointers
> too.
IMO, in multiple process model, primary process usually owns the hardware resources via mmap/iomap/pci_map etc.
Secondary process is not able to run if primary quits no matter gracefully or crashing.
This patch wants to introduce a "backup to alive" model.
Assume user wants to upgrade from DPDK version 22.03 to 23.03, 22.03 is running and active role while 23.03 comes up in standby.
Both DPDK processes have its own resources and doesn't rely on each other. 
User can migrate the application following the steps in commit message with minimum traffic downtime. 
SW logic like flow rules can be done following iptables-save/iptables-restore approach.  
> 
> I am not sure how much housekeeping offload to _HW_ in your case. In my
> view, it should be generic utils functions to track the flow and installing the
> rules using rte_flow APIs and keep the scope only for rte_flow.
For rules part, totally agree with you. Issue is there maybe millions of flow rules in field and each rule may take different steps to 
re-install per vendor' implementations.
This serial wants to propose a unified interface for upper layer application' easy use.  
> 
> That's just my view. I leave to ethdev maintainers for the rest of the review
> and decision on this series.
> 
> > That' why we have return value checking and rollback.
> > In Nvidia driver doc, we suggested user to start from 'rss/queue/jump'
> actions.
> > Meter is possible, at least per my view.
> > Assume: "meter g_action queue 0 / y_action drop / r_action drop"
> > Old application: create meter_id 'A' with pre-defined limitation.
> > New application: create meter_id 'B' which has the same parameters with
> 'A'.
> > 1. 1st possible approach:
> >         Hardware duplicates the traffic; old application use meter 'A' and new
> application uses meter 'B' to control traffic throughputs.
> >         Since traffic is duplicated, so it can go to different meters.
> > 2. 2nd possible approach:
> >              Meter 'A' and 'B' point to the same hardware resource, and traffic
> reaches this part first and if green, duplication happens.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v4 3/3] ethdev: add standby flags for live migration
  2023-01-31  2:53                           ` Rongwei Liu
@ 2023-01-31  8:45                             ` Jerin Jacob
  2023-01-31  9:01                               ` Rongwei Liu
  0 siblings, 1 reply; 54+ messages in thread
From: Jerin Jacob @ 2023-01-31  8:45 UTC (permalink / raw)
  To: Rongwei Liu
  Cc: dev, Matan Azrad, Slava Ovsiienko, Ori Kam,
	NBU-Contact-Thomas Monjalon (EXTERNAL),
	stephen, Raslan Darawsheh, Ferruh Yigit, Andrew Rybchenko

On Tue, Jan 31, 2023 at 8:23 AM Rongwei Liu <rongweil@nvidia.com> wrote:
>
> HI Jerin:
>
> BR
> Rongwei
>
> > -----Original Message-----
> > From: Jerin Jacob <jerinjacobk@gmail.com>
> > Sent: Tuesday, January 31, 2023 01:10
> > To: Rongwei Liu <rongweil@nvidia.com>
> > Cc: dev@dpdk.org; Matan Azrad <matan@nvidia.com>; Slava Ovsiienko
> > <viacheslavo@nvidia.com>; Ori Kam <orika@nvidia.com>; NBU-Contact-
> > Thomas Monjalon (EXTERNAL) <thomas@monjalon.net>;
> > stephen@networkplumber.org; Raslan Darawsheh <rasland@nvidia.com>;
> > Ferruh Yigit <ferruh.yigit@amd.com>; Andrew Rybchenko
> > <andrew.rybchenko@oktetlabs.ru>
> > Subject: Re: [PATCH v4 3/3] ethdev: add standby flags for live migration
> >
> > External email: Use caution opening links or attachments
> >
> >
> > On Mon, Jan 30, 2023 at 8:17 AM Rongwei Liu <rongweil@nvidia.com> wrote:
> > >
> > > Hi Jerin
> > >
> > > BR
> > > Rongwei
> > >
> > > > -----Original Message-----
> > > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > > Sent: Monday, January 23, 2023 21:20
> > > > To: Rongwei Liu <rongweil@nvidia.com>
> > > > Cc: dev@dpdk.org; Matan Azrad <matan@nvidia.com>; Slava Ovsiienko
> > > > <viacheslavo@nvidia.com>; Ori Kam <orika@nvidia.com>; NBU-Contact-
> > > > Thomas Monjalon (EXTERNAL) <thomas@monjalon.net>;
> > > > stephen@networkplumber.org; Raslan Darawsheh <rasland@nvidia.com>;
> > > > Ferruh Yigit <ferruh.yigit@amd.com>; Andrew Rybchenko
> > > > <andrew.rybchenko@oktetlabs.ru>
> > > > Subject: Re: [PATCH v4 3/3] ethdev: add standby flags for live
> > > > migration
> > > >
> > > > External email: Use caution opening links or attachments
> > > >
> > > >
> > > > On Wed, Jan 18, 2023 at 9:15 PM Rongwei Liu <rongweil@nvidia.com>
> > wrote:
> > > > >
> > > > > Some flags are added to the process state API for live migration
> > > > > in order to change the behavior of the flow rules in a standby process.
> > > > >
> > > > > Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
> > > > > ---
> > > > >  lib/ethdev/rte_ethdev.h | 21 +++++++++++++++++++++
> > > > >  1 file changed, 21 insertions(+)
> > > > >
> > > > > diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> > > > > index
> > > > > 1505396ced..9ae4f426a7 100644
> > > > > --- a/lib/ethdev/rte_ethdev.h
> > > > > +++ b/lib/ethdev/rte_ethdev.h
> > > > > @@ -2260,6 +2260,27 @@ int rte_eth_dev_owner_get(const uint16_t
> > > > > port_id,  __rte_experimental  int rte_eth_process_set_role(bool
> > > > > standby, uint32_t flags);
> > > > >
> > > > > +/**@{@name Process role flags
> > > > > + * used when migrating from an application to another one.
> > > > > + * @see rte_eth_process_set_active  */
> > > > > +/**
> > > > > + * When set on a standby process, ingress flow rules will be
> > > > > +effective
> > > > > + * in active and standby processes, so the ingress traffic may be
> > duplicated.
> > > > > + */
> > > > > +#define RTE_ETH_PROCESS_FLAG_STANDBY_DUP_FLOW_INGRESS
> > > > RTE_BIT32(0)
> > > >
> > > >
> > > > How to duplicate if action has statefull items for example,
> > > > rte_flow_action_security::security_session -> it store the live
> > > > pointer rte_flow_action_meter::mtr_id; -> MTR object ID created with
> > > > rte_mtr_create()
> > > I agree with you, not all actions can be supported in the active/standby
> > model.
> >
> > IMO, Where ever rules are not standalone (like QUEUE, RSS) etc, It will be
> > architecturally is not possible to migrate with pointers.
> > That's where I have concern generalizing this feature for this ethdev.
> >
> Not sure I understand your concern correctly. What' the pointer concept here?

I meant, Any HW resource driver deals with "pointers" or "fixed ID"
can not get the same value
for the new application. That's where I believe this whole concepts
works for very standalone rte_flow patterns and actions.


> Queue RSS actions can be migrated per my local test. Active/Standby application have its fully own rxq/txq.

Yes. It because it is standalone.

> They are totally separated processes and like two members in pipeline. 2nd member can't be feed if 1st member alive and handle the traffic.
>
> > Also, I don't believe there is any real HW support needed for this.
> > IMO, Having DPDK standard multiprocess can do this by keeping secondary
> > application can migrate, keeping all the SW logic in the primary process by
> > doing the housekeeping in the application. On plus side, it works with pointers
> > too.

> IMO, in multiple process model, primary process usually owns the hardware resources via mmap/iomap/pci_map etc.
> Secondary process is not able to run if primary quits no matter gracefully or crashing.
> This patch wants to introduce a "backup to alive" model.
> Assume user wants to upgrade from DPDK version 22.03 to 23.03, 22.03 is running and active role while 23.03 comes up in standby.
> Both DPDK processes have its own resources and doesn't rely on each other.
> User can migrate the application following the steps in commit message with minimum traffic downtime.
> SW logic like flow rules can be done following iptables-save/iptables-restore approach.
> >
> > I am not sure how much housekeeping offload to _HW_ in your case. In my
> > view, it should be generic utils functions to track the flow and installing the
> > rules using rte_flow APIs and keep the scope only for rte_flow.
> For rules part, totally agree with you. Issue is there maybe millions of flow rules in field and each rule may take different steps to
> re-install per vendor' implementations.

I understand the desire for millon flow migrations. Which makes
sense.IMO, It may be just easy to make this feature just
for rte_flow name space. Just have APIs to export() existing rules for
the given port and import() the rules exported
rather than going to ethdev space and call it as "live migration".

> This serial wants to propose a unified interface for upper layer application' easy use.
> >
> > That's just my view. I leave to ethdev maintainers for the rest of the review
> > and decision on this series.
> >
> > > That' why we have return value checking and rollback.
> > > In Nvidia driver doc, we suggested user to start from 'rss/queue/jump'
> > actions.
> > > Meter is possible, at least per my view.
> > > Assume: "meter g_action queue 0 / y_action drop / r_action drop"
> > > Old application: create meter_id 'A' with pre-defined limitation.
> > > New application: create meter_id 'B' which has the same parameters with
> > 'A'.
> > > 1. 1st possible approach:
> > >         Hardware duplicates the traffic; old application use meter 'A' and new
> > application uses meter 'B' to control traffic throughputs.
> > >         Since traffic is duplicated, so it can go to different meters.
> > > 2. 2nd possible approach:
> > >              Meter 'A' and 'B' point to the same hardware resource, and traffic
> > reaches this part first and if green, duplication happens.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* RE: [PATCH v4 3/3] ethdev: add standby flags for live migration
  2023-01-31  8:45                             ` Jerin Jacob
@ 2023-01-31  9:01                               ` Rongwei Liu
  2023-01-31 14:37                                 ` Jerin Jacob
  0 siblings, 1 reply; 54+ messages in thread
From: Rongwei Liu @ 2023-01-31  9:01 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: dev, Matan Azrad, Slava Ovsiienko, Ori Kam,
	NBU-Contact-Thomas Monjalon (EXTERNAL),
	stephen, Raslan Darawsheh, Ferruh Yigit, Andrew Rybchenko

Hi Jerin:

BR
Rongwei

> -----Original Message-----
> From: Jerin Jacob <jerinjacobk@gmail.com>
> Sent: Tuesday, January 31, 2023 16:46
> To: Rongwei Liu <rongweil@nvidia.com>
> Cc: dev@dpdk.org; Matan Azrad <matan@nvidia.com>; Slava Ovsiienko
> <viacheslavo@nvidia.com>; Ori Kam <orika@nvidia.com>; NBU-Contact-
> Thomas Monjalon (EXTERNAL) <thomas@monjalon.net>;
> stephen@networkplumber.org; Raslan Darawsheh <rasland@nvidia.com>;
> Ferruh Yigit <ferruh.yigit@amd.com>; Andrew Rybchenko
> <andrew.rybchenko@oktetlabs.ru>
> Subject: Re: [PATCH v4 3/3] ethdev: add standby flags for live migration
> 
> External email: Use caution opening links or attachments
> 
> 
> On Tue, Jan 31, 2023 at 8:23 AM Rongwei Liu <rongweil@nvidia.com> wrote:
> >
> > HI Jerin:
> >
> > BR
> > Rongwei
> >
> > > -----Original Message-----
> > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > Sent: Tuesday, January 31, 2023 01:10
> > > To: Rongwei Liu <rongweil@nvidia.com>
> > > Cc: dev@dpdk.org; Matan Azrad <matan@nvidia.com>; Slava Ovsiienko
> > > <viacheslavo@nvidia.com>; Ori Kam <orika@nvidia.com>; NBU-Contact-
> > > Thomas Monjalon (EXTERNAL) <thomas@monjalon.net>;
> > > stephen@networkplumber.org; Raslan Darawsheh <rasland@nvidia.com>;
> > > Ferruh Yigit <ferruh.yigit@amd.com>; Andrew Rybchenko
> > > <andrew.rybchenko@oktetlabs.ru>
> > > Subject: Re: [PATCH v4 3/3] ethdev: add standby flags for live
> > > migration
> > >
> > > External email: Use caution opening links or attachments
> > >
> > >
> > > On Mon, Jan 30, 2023 at 8:17 AM Rongwei Liu <rongweil@nvidia.com>
> wrote:
> > > >
> > > > Hi Jerin
> > > >
> > > > BR
> > > > Rongwei
> > > >
> > > > > -----Original Message-----
> > > > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > > > Sent: Monday, January 23, 2023 21:20
> > > > > To: Rongwei Liu <rongweil@nvidia.com>
> > > > > Cc: dev@dpdk.org; Matan Azrad <matan@nvidia.com>; Slava
> > > > > Ovsiienko <viacheslavo@nvidia.com>; Ori Kam <orika@nvidia.com>;
> > > > > NBU-Contact- Thomas Monjalon (EXTERNAL) <thomas@monjalon.net>;
> > > > > stephen@networkplumber.org; Raslan Darawsheh
> > > > > <rasland@nvidia.com>; Ferruh Yigit <ferruh.yigit@amd.com>;
> > > > > Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> > > > > Subject: Re: [PATCH v4 3/3] ethdev: add standby flags for live
> > > > > migration
> > > > >
> > > > > External email: Use caution opening links or attachments
> > > > >
> > > > >
> > > > > On Wed, Jan 18, 2023 at 9:15 PM Rongwei Liu
> > > > > <rongweil@nvidia.com>
> > > wrote:
> > > > > >
> > > > > > Some flags are added to the process state API for live
> > > > > > migration in order to change the behavior of the flow rules in a
> standby process.
> > > > > >
> > > > > > Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
> > > > > > ---
> > > > > >  lib/ethdev/rte_ethdev.h | 21 +++++++++++++++++++++
> > > > > >  1 file changed, 21 insertions(+)
> > > > > >
> > > > > > diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> > > > > > index
> > > > > > 1505396ced..9ae4f426a7 100644
> > > > > > --- a/lib/ethdev/rte_ethdev.h
> > > > > > +++ b/lib/ethdev/rte_ethdev.h
> > > > > > @@ -2260,6 +2260,27 @@ int rte_eth_dev_owner_get(const
> > > > > > uint16_t port_id,  __rte_experimental  int
> > > > > > rte_eth_process_set_role(bool standby, uint32_t flags);
> > > > > >
> > > > > > +/**@{@name Process role flags
> > > > > > + * used when migrating from an application to another one.
> > > > > > + * @see rte_eth_process_set_active  */
> > > > > > +/**
> > > > > > + * When set on a standby process, ingress flow rules will be
> > > > > > +effective
> > > > > > + * in active and standby processes, so the ingress traffic
> > > > > > +may be
> > > duplicated.
> > > > > > + */
> > > > > > +#define RTE_ETH_PROCESS_FLAG_STANDBY_DUP_FLOW_INGRESS
> > > > > RTE_BIT32(0)
> > > > >
> > > > >
> > > > > How to duplicate if action has statefull items for example,
> > > > > rte_flow_action_security::security_session -> it store the live
> > > > > pointer rte_flow_action_meter::mtr_id; -> MTR object ID created
> > > > > with
> > > > > rte_mtr_create()
> > > > I agree with you, not all actions can be supported in the
> > > > active/standby
> > > model.
> > >
> > > IMO, Where ever rules are not standalone (like QUEUE, RSS) etc, It
> > > will be architecturally is not possible to migrate with pointers.
> > > That's where I have concern generalizing this feature for this ethdev.
> > >
> > Not sure I understand your concern correctly. What' the pointer concept
> here?
> 
> I meant, Any HW resource driver deals with "pointers" or "fixed ID"
> can not get the same value
> for the new application. That's where I believe this whole concepts works for
> very standalone rte_flow patterns and actions.
> 
> 
> > Queue RSS actions can be migrated per my local test. Active/Standby
> application have its fully own rxq/txq.
> 
> Yes. It because it is standalone.
> 
> > They are totally separated processes and like two members in pipeline. 2nd
> member can't be feed if 1st member alive and handle the traffic.
> >
> > > Also, I don't believe there is any real HW support needed for this.
> > > IMO, Having DPDK standard multiprocess can do this by keeping
> > > secondary application can migrate, keeping all the SW logic in the
> > > primary process by doing the housekeeping in the application. On
> > > plus side, it works with pointers too.
> 
> > IMO, in multiple process model, primary process usually owns the hardware
> resources via mmap/iomap/pci_map etc.
> > Secondary process is not able to run if primary quits no matter gracefully or
> crashing.
> > This patch wants to introduce a "backup to alive" model.
> > Assume user wants to upgrade from DPDK version 22.03 to 23.03, 22.03 is
> running and active role while 23.03 comes up in standby.
> > Both DPDK processes have its own resources and doesn't rely on each other.
> > User can migrate the application following the steps in commit message
> with minimum traffic downtime.
> > SW logic like flow rules can be done following iptables-save/iptables-restore
> approach.
> > >
> > > I am not sure how much housekeeping offload to _HW_ in your case. In
> > > my view, it should be generic utils functions to track the flow and
> > > installing the rules using rte_flow APIs and keep the scope only for
> rte_flow.
> > For rules part, totally agree with you. Issue is there maybe millions
> > of flow rules in field and each rule may take different steps to re-install per
> vendor' implementations.
> 
> I understand the desire for millon flow migrations. Which makes sense.IMO, It
> may be just easy to make this feature just for rte_flow name space. Just have
> APIs to export() existing rules for the given port and import() the rules
> exported rather than going to ethdev space and call it as "live migration".
> 
Do you mean the API naming should be "rte_flow_process_set_role()" instead of "rte_eth_process_set_role()" ?
Also move to rte_flow.c/.h files? Are we good to keep the PMD callback in eth_dev layer?
Simple export()/import() may not work. Image some flow rules are exclusive and can't be issued from both applications. 
We need to stop old application. I am afraid this will introduce big time window which traffic stops. 
Application won't like this behavior.
With this callback, each PMD can specify each rule, queue it or use lower priority if exclusive. Or return error.

> > This serial wants to propose a unified interface for upper layer application'
> easy use.
> > >
> > > That's just my view. I leave to ethdev maintainers for the rest of
> > > the review and decision on this series.
> > >
> > > > That' why we have return value checking and rollback.
> > > > In Nvidia driver doc, we suggested user to start from 'rss/queue/jump'
> > > actions.
> > > > Meter is possible, at least per my view.
> > > > Assume: "meter g_action queue 0 / y_action drop / r_action drop"
> > > > Old application: create meter_id 'A' with pre-defined limitation.
> > > > New application: create meter_id 'B' which has the same parameters
> > > > with
> > > 'A'.
> > > > 1. 1st possible approach:
> > > >         Hardware duplicates the traffic; old application use meter
> > > > 'A' and new
> > > application uses meter 'B' to control traffic throughputs.
> > > >         Since traffic is duplicated, so it can go to different meters.
> > > > 2. 2nd possible approach:
> > > >              Meter 'A' and 'B' point to the same hardware
> > > > resource, and traffic
> > > reaches this part first and if green, duplication happens.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* RE: [PATCH v4 1/3] ethdev: add flow rule group description
  2023-01-18 15:44                   ` [PATCH v4 1/3] ethdev: add flow rule group description Rongwei Liu
@ 2023-01-31 11:53                     ` Ori Kam
  2023-02-06 12:15                       ` Rongwei Liu
  2023-02-07  2:57                     ` [PATCH v5] " Rongwei Liu
  1 sibling, 1 reply; 54+ messages in thread
From: Ori Kam @ 2023-01-31 11:53 UTC (permalink / raw)
  To: Rongwei Liu, dev, Matan Azrad, Slava Ovsiienko,
	NBU-Contact-Thomas Monjalon (EXTERNAL),
	jerinjacobk, stephen
  Cc: Raslan Darawsheh, Ferruh Yigit, Andrew Rybchenko

Hi Rongwei,

> -----Original Message-----
> From: Rongwei Liu <rongweil@nvidia.com>
> Sent: Wednesday, 18 January 2023 17:45
> 
> Add more sentences to describe the group concepts
> and define group 0 as root group for traffic to search a
> hit rule.
> 
> Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
> ---
>  lib/ethdev/rte_flow.h | 13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/ethdev/rte_flow.h b/lib/ethdev/rte_flow.h
> index b60987db4b..e71ac0c199 100644
> --- a/lib/ethdev/rte_flow.h
> +++ b/lib/ethdev/rte_flow.h
> @@ -86,7 +86,18 @@ extern "C" {
>   * but may be valid in a few cases.
>   */
>  struct rte_flow_attr {
> -	uint32_t group; /**< Priority group. */
> +	/**
> +	 * A group is a superset of multiple rules.
> +	 * The default group is 0 and is processed for all packets.
> +	 * The group 0 of bifurcated drivers is shared with the kernel.
> +	 * Rules in other groups are processed only if the group is chained
> +	 * by a jump action from a previously matched rule.
> +	 * It means the group hierarchy is made by the flow rules,
> +	 * and the group 0 is the hierarchy root.
> +	 * Note there is no automatic dead loop protection.
> +	 * @see rte_flow_action_jump
> +	 */
> +	uint32_t group;
>  	uint32_t priority; /**< Rule priority level within group. */
>  	/**
>  	 * The rule in question applies to ingress traffic (non-"transfer").
> --
> 2.27.0

Acked-by: Ori Kam <orika@nvidia.com>
Best,
Ori

^ permalink raw reply	[flat|nested] 54+ messages in thread

* RE: [PATCH v4 2/3] ethdev: add standby state for live migration
  2023-01-18 15:44                   ` [PATCH v4 2/3] ethdev: add standby state for live migration Rongwei Liu
@ 2023-01-31 13:50                     ` Ori Kam
  2023-01-31 18:14                     ` Jerin Jacob
  2023-02-01  7:52                     ` Andrew Rybchenko
  2 siblings, 0 replies; 54+ messages in thread
From: Ori Kam @ 2023-01-31 13:50 UTC (permalink / raw)
  To: Rongwei Liu, dev, Matan Azrad, Slava Ovsiienko,
	NBU-Contact-Thomas Monjalon (EXTERNAL),
	jerinjacobk, stephen
  Cc: Raslan Darawsheh, Ferruh Yigit, Andrew Rybchenko

Hi Rongwei,

> -----Original Message-----
> From: Rongwei Liu <rongweil@nvidia.com>
> Sent: Wednesday, 18 January 2023 17:45
> 
> When a DPDK application must be upgraded,
> the traffic downtime should be shortened as much as possible.
> During the migration time, the old application may stay alive
> while the new application is starting and being configured.
> 
> In order to optimize the switch to the new application,
> the old application may need to be aware of the presence
> of the new application being prepared.
> This is achieved with a new API allowing the user to change the
> new application state to standby and active later.
> 
> The added function is trying to apply the new state to all probed
> ethdev ports. To make this API simple and easy to use,
> the same flags have to be accepted by all devices.
> 
> This is the scenario of operations in the old and new applications:
> .       device: already configured by the old application
> .       new:    start as active
> .       new:    probe the same device
> .       new:    set as standby
> .       new:    configure the device
> .       device: has configurations from old and new applications
> .       old:    clear its device configuration
> .       device: has only 1 configuration from new application
> .       new:    set as active
> .       device: downtime for connecting all to the new application
> .       old:    shutdown
> 
> The active role means network handling configurations are programmed
> to the HW immediately, and no behavior changed. This is the default state.
> The standby role means configurations are queued in the HW.
> If there is no application with active role,
> any configuration is effective immediately.
> 
> Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
> ---
>  doc/guides/rel_notes/release_23_03.rst |  7 ++++
>  lib/ethdev/ethdev_driver.h             | 20 +++++++++
>  lib/ethdev/rte_ethdev.c                | 42 +++++++++++++++++++
>  lib/ethdev/rte_ethdev.h                | 56 ++++++++++++++++++++++++++
>  lib/ethdev/version.map                 |  3 ++
>  5 files changed, 128 insertions(+)
> 
> diff --git a/doc/guides/rel_notes/release_23_03.rst
> b/doc/guides/rel_notes/release_23_03.rst
> index b8c5b68d6c..5367123f24 100644
> --- a/doc/guides/rel_notes/release_23_03.rst
> +++ b/doc/guides/rel_notes/release_23_03.rst
> @@ -55,6 +55,13 @@ New Features
>       Also, make sure to start the actual text at the margin.
>       =======================================================
> 
> +* **Added process state in ethdev to improve live migration.**
> +
> +  Hot upgrade of an application may be accelerated by configuring
> +  the new application in standby state while the old one is still active.
> +  Such double ethdev configuration of the same device is possible
> +  with the added process state API.
> +
> 
>  Removed Items
>  -------------
> diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
> index 6a550cfc83..4a098410d5 100644
> --- a/lib/ethdev/ethdev_driver.h
> +++ b/lib/ethdev/ethdev_driver.h
> @@ -219,6 +219,23 @@ typedef int (*eth_dev_reset_t)(struct rte_eth_dev
> *dev);
>  /** @internal Function used to detect an Ethernet device removal. */
>  typedef int (*eth_is_removed_t)(struct rte_eth_dev *dev);
> 
> +/**
> + * @internal
> + * Set the role of the process to active or standby during live migration.
> + *
> + * @param dev
> + *   Port (ethdev) handle.
> + * @param standby
> + *   Role active if false, standby if true.
> + * @param flags
> + *   Role specific flags.
> + *
> + * @return
> + *   Negative value on error, 0 on success.
> + */
> +typedef int (*eth_dev_process_set_role_t)(struct rte_eth_dev *dev,
> +		bool standby, uint32_t flags);
> +
>  /**
>   * @internal
>   * Function used to enable the Rx promiscuous mode of an Ethernet device.
> @@ -1186,6 +1203,9 @@ struct eth_dev_ops {
>  	/** Check if the device was physically removed */
>  	eth_is_removed_t           is_removed;
> 
> +	/** Set role during live migration */
> +	eth_dev_process_set_role_t process_set_role;
> +
>  	eth_promiscuous_enable_t   promiscuous_enable; /**< Promiscuous
> ON */
>  	eth_promiscuous_disable_t  promiscuous_disable;/**< Promiscuous
> OFF */
>  	eth_allmulticast_enable_t  allmulticast_enable;/**< Rx multicast ON
> */
> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
> index 5d5e18db1e..3a1fb64053 100644
> --- a/lib/ethdev/rte_ethdev.c
> +++ b/lib/ethdev/rte_ethdev.c
> @@ -558,6 +558,48 @@ rte_eth_dev_owner_get(const uint16_t port_id,
> struct rte_eth_dev_owner *owner)
>  	return 0;
>  }
> 
> +int
> +rte_eth_process_set_role(bool standby, uint32_t flags)
> +{
> +	struct rte_eth_dev_info dev_info = {0};
> +	struct rte_eth_dev *dev;
> +	uint16_t port_id;
> +	int ret = 0;
> +
> +	/* Check if all devices support process role. */
> +	RTE_ETH_FOREACH_DEV(port_id) {
> +		dev = &rte_eth_devices[port_id];
> +		if (*dev->dev_ops->process_set_role != NULL &&
> +			*dev->dev_ops->dev_infos_get != NULL &&
> +			(*dev->dev_ops->dev_infos_get)(dev, &dev_info) ==
> 0 &&
> +			(dev_info.dev_capa &
> RTE_ETH_DEV_CAPA_PROCESS_ROLE) != 0)
> +			continue;
> +		rte_errno = ENOTSUP;
> +		return -rte_errno;
> +	}
> +	/* Call the driver callbacks. */
> +	RTE_ETH_FOREACH_DEV(port_id) {
> +		dev = &rte_eth_devices[port_id];
> +		if ((*dev->dev_ops->process_set_role)(dev, standby, flags) <
> 0)
> +			goto failure;
> +		ret++;
> +	}
> +	return ret;
> +
> +failure:
> +	/* Rollback all changed devices in case one failed. */
> +	if (ret) {
> +		RTE_ETH_FOREACH_DEV(port_id) {
> +			dev = &rte_eth_devices[port_id];
> +			(*dev->dev_ops->process_set_role)(dev, !standby,
> flags);
> +			if (--ret == 0)
> +				break;
> +		}
> +	}
> +	rte_errno = EPERM;
> +	return -rte_errno;
> +}
> +
>  int
>  rte_eth_dev_socket_id(uint16_t port_id)
>  {
> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> index c129ca1eaf..1505396ced 100644
> --- a/lib/ethdev/rte_ethdev.h
> +++ b/lib/ethdev/rte_ethdev.h
> @@ -1606,6 +1606,8 @@ struct rte_eth_conf {
>  #define RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP         RTE_BIT64(3)
>  /** Device supports keeping shared flow objects across restart. */
>  #define RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP RTE_BIT64(4)
> +/** Device supports process role changing. @see
> rte_eth_process_set_active */
> +#define RTE_ETH_DEV_CAPA_PROCESS_ROLE           RTE_BIT64(5)
>  /**@}*/
> 
>  /*
> @@ -2204,6 +2206,60 @@ int rte_eth_dev_owner_delete(const uint64_t
> owner_id);
>  int rte_eth_dev_owner_get(const uint16_t port_id,
>  		struct rte_eth_dev_owner *owner);
> 
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Set the role of the process to active or standby,
> + * affecting network traffic handling.
> + *
> + * If one device does not support this operation or fails,
> + * the whole operation is failed and rolled back.
> + *
> + * It is forbidden to have multiple processes with the same role
> + * unless only one of them is configured to handle the traffic.
> + *
> + * The application is active by default.
> + * The configuration from the active process is effective immediately
> + * while the configuration from the standby process is queued by hardware.
> + * When configuring the device from a standby process,
> + * it has no effect except for below situations:
> + *   - traffic not handled by the active process configuration
> + *   - no active process
> + *
> + * When a process is changed from a standby to an active role,
> + * all preceding configurations that are queued by hardware
> + * should become effective immediately.
> + * Before role transition, all the traffic handling configurations
> + * set by the active process should be flushed first.
> + *
> + * In summary, the operations are expected to happen in this order
> + * in "old" and "new" applications:
> + *   device: already configured by the old application
> + *   new:    start as active
> + *   new:    probe the same device
> + *   new:    set as standby
> + *   new:    configure the device
> + *   device: has configurations from old and new applications
> + *   old:    clear its device configuration
> + *   device: has only 1 configuration from new application
> + *   new:    set as active
> + *   device: downtime for connecting all to the new application
> + *   old:    shutdown
> + *
> + * @param standby
> + *   Role active if false, standby if true.
> + * @param flags
> + *   Role specific flags.
> + * @return
> + *   Positive value on success, -rte_errno value on error:
> + *   - (> 0) Number of switched devices.
> + *   - (-ENOTSUP) if not supported by a device.
> + *   - (-EPERM) if operation failed with a device.
> + */
> +__rte_experimental
> +int rte_eth_process_set_role(bool standby, uint32_t flags);
> +
>  /**
>   * Get the number of ports which are usable for the application.
>   *
> diff --git a/lib/ethdev/version.map b/lib/ethdev/version.map
> index 17201fbe0f..d5d3ea5421 100644
> --- a/lib/ethdev/version.map
> +++ b/lib/ethdev/version.map
> @@ -298,6 +298,9 @@ EXPERIMENTAL {
>  	rte_flow_get_q_aged_flows;
>  	rte_mtr_meter_policy_get;
>  	rte_mtr_meter_profile_get;
> +
> +	# added in 23.03
> +	rte_eth_process_set_role;
>  };
> 
>  INTERNAL {
> --
> 2.27.0


Acked-by: Ori Kam <orika@nvidia.com>
Best,
Ori

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v4 3/3] ethdev: add standby flags for live migration
  2023-01-31  9:01                               ` Rongwei Liu
@ 2023-01-31 14:37                                 ` Jerin Jacob
  2023-01-31 14:45                                   ` Ori Kam
  0 siblings, 1 reply; 54+ messages in thread
From: Jerin Jacob @ 2023-01-31 14:37 UTC (permalink / raw)
  To: Rongwei Liu
  Cc: dev, Matan Azrad, Slava Ovsiienko, Ori Kam,
	NBU-Contact-Thomas Monjalon (EXTERNAL),
	stephen, Raslan Darawsheh, Ferruh Yigit, Andrew Rybchenko

On Tue, Jan 31, 2023 at 2:31 PM Rongwei Liu <rongweil@nvidia.com> wrote:
>
> Hi Jerin:
>
> BR
> Rongwei
>
> > -----Original Message-----
> > From: Jerin Jacob <jerinjacobk@gmail.com>
> > Sent: Tuesday, January 31, 2023 16:46
> > To: Rongwei Liu <rongweil@nvidia.com>
> > Cc: dev@dpdk.org; Matan Azrad <matan@nvidia.com>; Slava Ovsiienko
> > <viacheslavo@nvidia.com>; Ori Kam <orika@nvidia.com>; NBU-Contact-
> > Thomas Monjalon (EXTERNAL) <thomas@monjalon.net>;
> > stephen@networkplumber.org; Raslan Darawsheh <rasland@nvidia.com>;
> > Ferruh Yigit <ferruh.yigit@amd.com>; Andrew Rybchenko
> > <andrew.rybchenko@oktetlabs.ru>
> > Subject: Re: [PATCH v4 3/3] ethdev: add standby flags for live migration
> >
> > External email: Use caution opening links or attachments
> >
> >
> > On Tue, Jan 31, 2023 at 8:23 AM Rongwei Liu <rongweil@nvidia.com> wrote:
> > >
> > > HI Jerin:
> > >
> > > BR
> > > Rongwei
> > >
> > > > -----Original Message-----
> > > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > > Sent: Tuesday, January 31, 2023 01:10
> > > > To: Rongwei Liu <rongweil@nvidia.com>
> > > > Cc: dev@dpdk.org; Matan Azrad <matan@nvidia.com>; Slava Ovsiienko
> > > > <viacheslavo@nvidia.com>; Ori Kam <orika@nvidia.com>; NBU-Contact-
> > > > Thomas Monjalon (EXTERNAL) <thomas@monjalon.net>;
> > > > stephen@networkplumber.org; Raslan Darawsheh <rasland@nvidia.com>;
> > > > Ferruh Yigit <ferruh.yigit@amd.com>; Andrew Rybchenko
> > > > <andrew.rybchenko@oktetlabs.ru>
> > > > Subject: Re: [PATCH v4 3/3] ethdev: add standby flags for live
> > > > migration
> > > >
> > > > External email: Use caution opening links or attachments
> > > >
> > > >
> > > > On Mon, Jan 30, 2023 at 8:17 AM Rongwei Liu <rongweil@nvidia.com>
> > wrote:
> > > > >
> > > > > Hi Jerin
> > > > >
> > > > > BR
> > > > > Rongwei
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > > > > Sent: Monday, January 23, 2023 21:20
> > > > > > To: Rongwei Liu <rongweil@nvidia.com>
> > > > > > Cc: dev@dpdk.org; Matan Azrad <matan@nvidia.com>; Slava
> > > > > > Ovsiienko <viacheslavo@nvidia.com>; Ori Kam <orika@nvidia.com>;
> > > > > > NBU-Contact- Thomas Monjalon (EXTERNAL) <thomas@monjalon.net>;
> > > > > > stephen@networkplumber.org; Raslan Darawsheh
> > > > > > <rasland@nvidia.com>; Ferruh Yigit <ferruh.yigit@amd.com>;
> > > > > > Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> > > > > > Subject: Re: [PATCH v4 3/3] ethdev: add standby flags for live
> > > > > > migration
> > > > > >
> > > > > > External email: Use caution opening links or attachments
> > > > > >
> > > > > >
> > > > > > On Wed, Jan 18, 2023 at 9:15 PM Rongwei Liu
> > > > > > <rongweil@nvidia.com>
> > > > wrote:
> > > > > > >
> > > > > > > Some flags are added to the process state API for live
> > > > > > > migration in order to change the behavior of the flow rules in a
> > standby process.
> > > > > > >
> > > > > > > Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
> > > > > > > ---
> > > > > > >  lib/ethdev/rte_ethdev.h | 21 +++++++++++++++++++++
> > > > > > >  1 file changed, 21 insertions(+)
> > > > > > >
> > > > > > > diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> > > > > > > index
> > > > > > > 1505396ced..9ae4f426a7 100644
> > > > > > > --- a/lib/ethdev/rte_ethdev.h
> > > > > > > +++ b/lib/ethdev/rte_ethdev.h
> > > > > > > @@ -2260,6 +2260,27 @@ int rte_eth_dev_owner_get(const
> > > > > > > uint16_t port_id,  __rte_experimental  int
> > > > > > > rte_eth_process_set_role(bool standby, uint32_t flags);
> > > > > > >
> > > > > > > +/**@{@name Process role flags
> > > > > > > + * used when migrating from an application to another one.
> > > > > > > + * @see rte_eth_process_set_active  */
> > > > > > > +/**
> > > > > > > + * When set on a standby process, ingress flow rules will be
> > > > > > > +effective
> > > > > > > + * in active and standby processes, so the ingress traffic
> > > > > > > +may be
> > > > duplicated.
> > > > > > > + */
> > > > > > > +#define RTE_ETH_PROCESS_FLAG_STANDBY_DUP_FLOW_INGRESS
> > > > > > RTE_BIT32(0)
> > > > > >
> > > > > >
> > > > > > How to duplicate if action has statefull items for example,
> > > > > > rte_flow_action_security::security_session -> it store the live
> > > > > > pointer rte_flow_action_meter::mtr_id; -> MTR object ID created
> > > > > > with
> > > > > > rte_mtr_create()
> > > > > I agree with you, not all actions can be supported in the
> > > > > active/standby
> > > > model.
> > > >
> > > > IMO, Where ever rules are not standalone (like QUEUE, RSS) etc, It
> > > > will be architecturally is not possible to migrate with pointers.
> > > > That's where I have concern generalizing this feature for this ethdev.
> > > >
> > > Not sure I understand your concern correctly. What' the pointer concept
> > here?
> >
> > I meant, Any HW resource driver deals with "pointers" or "fixed ID"
> > can not get the same value
> > for the new application. That's where I believe this whole concepts works for
> > very standalone rte_flow patterns and actions.
> >
> >
> > > Queue RSS actions can be migrated per my local test. Active/Standby
> > application have its fully own rxq/txq.
> >
> > Yes. It because it is standalone.
> >
> > > They are totally separated processes and like two members in pipeline. 2nd
> > member can't be feed if 1st member alive and handle the traffic.
> > >
> > > > Also, I don't believe there is any real HW support needed for this.
> > > > IMO, Having DPDK standard multiprocess can do this by keeping
> > > > secondary application can migrate, keeping all the SW logic in the
> > > > primary process by doing the housekeeping in the application. On
> > > > plus side, it works with pointers too.
> >
> > > IMO, in multiple process model, primary process usually owns the hardware
> > resources via mmap/iomap/pci_map etc.
> > > Secondary process is not able to run if primary quits no matter gracefully or
> > crashing.
> > > This patch wants to introduce a "backup to alive" model.
> > > Assume user wants to upgrade from DPDK version 22.03 to 23.03, 22.03 is
> > running and active role while 23.03 comes up in standby.
> > > Both DPDK processes have its own resources and doesn't rely on each other.
> > > User can migrate the application following the steps in commit message
> > with minimum traffic downtime.
> > > SW logic like flow rules can be done following iptables-save/iptables-restore
> > approach.
> > > >
> > > > I am not sure how much housekeeping offload to _HW_ in your case. In
> > > > my view, it should be generic utils functions to track the flow and
> > > > installing the rules using rte_flow APIs and keep the scope only for
> > rte_flow.
> > > For rules part, totally agree with you. Issue is there maybe millions
> > > of flow rules in field and each rule may take different steps to re-install per
> > vendor' implementations.
> >
> > I understand the desire for millon flow migrations. Which makes sense.IMO, It
> > may be just easy to make this feature just for rte_flow name space. Just have
> > APIs to export() existing rules for the given port and import() the rules
> > exported rather than going to ethdev space and call it as "live migration".
> >
> Do you mean the API naming should be "rte_flow_process_set_role()" instead of "rte_eth_process_set_role()" ?
> Also move to rte_flow.c/.h files? Are we good to keep the PMD callback in eth_dev layer?

Yes. something with rte_flow_ prefix and not sure _set_role() kind of scheme.

> Simple export()/import() may not work. Image some flow rules are exclusive and can't be issued from both applications.
> We need to stop old application. I am afraid this will introduce big time window which traffic stops.

Yes, I think the  sequence is
rte_flow_rules_export() on app 1
stop the app 1
rte_flow_rules_import() of app 1 by app2.


> Application won't like this behavior.
> With this callback, each PMD can specify each rule, queue it or use lower priority if exclusive. Or return error.
>
> > > This serial wants to propose a unified interface for upper layer application'
> > easy use.
> > > >
> > > > That's just my view. I leave to ethdev maintainers for the rest of
> > > > the review and decision on this series.
> > > >
> > > > > That' why we have return value checking and rollback.
> > > > > In Nvidia driver doc, we suggested user to start from 'rss/queue/jump'
> > > > actions.
> > > > > Meter is possible, at least per my view.
> > > > > Assume: "meter g_action queue 0 / y_action drop / r_action drop"
> > > > > Old application: create meter_id 'A' with pre-defined limitation.
> > > > > New application: create meter_id 'B' which has the same parameters
> > > > > with
> > > > 'A'.
> > > > > 1. 1st possible approach:
> > > > >         Hardware duplicates the traffic; old application use meter
> > > > > 'A' and new
> > > > application uses meter 'B' to control traffic throughputs.
> > > > >         Since traffic is duplicated, so it can go to different meters.
> > > > > 2. 2nd possible approach:
> > > > >              Meter 'A' and 'B' point to the same hardware
> > > > > resource, and traffic
> > > > reaches this part first and if green, duplication happens.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* RE: [PATCH v4 3/3] ethdev: add standby flags for live migration
  2023-01-31 14:37                                 ` Jerin Jacob
@ 2023-01-31 14:45                                   ` Ori Kam
  2023-01-31 17:50                                     ` Thomas Monjalon
  0 siblings, 1 reply; 54+ messages in thread
From: Ori Kam @ 2023-01-31 14:45 UTC (permalink / raw)
  To: Jerin Jacob, Rongwei Liu
  Cc: dev, Matan Azrad, Slava Ovsiienko,
	NBU-Contact-Thomas Monjalon (EXTERNAL),
	stephen, Raslan Darawsheh, Ferruh Yigit, Andrew Rybchenko

Hi Jerin and Rongwei,

> -----Original Message-----
> From: Jerin Jacob <jerinjacobk@gmail.com>
> Sent: Tuesday, 31 January 2023 16:37
> 
> On Tue, Jan 31, 2023 at 2:31 PM Rongwei Liu <rongweil@nvidia.com> wrote:
> >
> > Hi Jerin:
> >
> > BR
> > Rongwei
> >
> > > -----Original Message-----
> > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > Sent: Tuesday, January 31, 2023 16:46
> > > To: Rongwei Liu <rongweil@nvidia.com>
> > > Cc: dev@dpdk.org; Matan Azrad <matan@nvidia.com>; Slava Ovsiienko
> > > <viacheslavo@nvidia.com>; Ori Kam <orika@nvidia.com>; NBU-Contact-
> > > Thomas Monjalon (EXTERNAL) <thomas@monjalon.net>;
> > > stephen@networkplumber.org; Raslan Darawsheh
> <rasland@nvidia.com>;
> > > Ferruh Yigit <ferruh.yigit@amd.com>; Andrew Rybchenko
> > > <andrew.rybchenko@oktetlabs.ru>
> > > Subject: Re: [PATCH v4 3/3] ethdev: add standby flags for live migration
> > >
> > > External email: Use caution opening links or attachments
> > >
> > >
> > > On Tue, Jan 31, 2023 at 8:23 AM Rongwei Liu <rongweil@nvidia.com>
> wrote:
> > > >
> > > > HI Jerin:
> > > >
> > > > BR
> > > > Rongwei
> > > >
> > > > > -----Original Message-----
> > > > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > > > Sent: Tuesday, January 31, 2023 01:10
> > > > > To: Rongwei Liu <rongweil@nvidia.com>
> > > > > Cc: dev@dpdk.org; Matan Azrad <matan@nvidia.com>; Slava
> Ovsiienko
> > > > > <viacheslavo@nvidia.com>; Ori Kam <orika@nvidia.com>; NBU-
> Contact-
> > > > > Thomas Monjalon (EXTERNAL) <thomas@monjalon.net>;
> > > > > stephen@networkplumber.org; Raslan Darawsheh
> <rasland@nvidia.com>;
> > > > > Ferruh Yigit <ferruh.yigit@amd.com>; Andrew Rybchenko
> > > > > <andrew.rybchenko@oktetlabs.ru>
> > > > > Subject: Re: [PATCH v4 3/3] ethdev: add standby flags for live
> > > > > migration
> > > > >
> > > > > External email: Use caution opening links or attachments
> > > > >
> > > > >
> > > > > On Mon, Jan 30, 2023 at 8:17 AM Rongwei Liu
> <rongweil@nvidia.com>
> > > wrote:
> > > > > >
> > > > > > Hi Jerin
> > > > > >
> > > > > > BR
> > > > > > Rongwei
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > > > > > Sent: Monday, January 23, 2023 21:20
> > > > > > > To: Rongwei Liu <rongweil@nvidia.com>
> > > > > > > Cc: dev@dpdk.org; Matan Azrad <matan@nvidia.com>; Slava
> > > > > > > Ovsiienko <viacheslavo@nvidia.com>; Ori Kam
> <orika@nvidia.com>;
> > > > > > > NBU-Contact- Thomas Monjalon (EXTERNAL)
> <thomas@monjalon.net>;
> > > > > > > stephen@networkplumber.org; Raslan Darawsheh
> > > > > > > <rasland@nvidia.com>; Ferruh Yigit <ferruh.yigit@amd.com>;
> > > > > > > Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> > > > > > > Subject: Re: [PATCH v4 3/3] ethdev: add standby flags for live
> > > > > > > migration
> > > > > > >
> > > > > > > External email: Use caution opening links or attachments
> > > > > > >
> > > > > > >
> > > > > > > On Wed, Jan 18, 2023 at 9:15 PM Rongwei Liu
> > > > > > > <rongweil@nvidia.com>
> > > > > wrote:
> > > > > > > >
> > > > > > > > Some flags are added to the process state API for live
> > > > > > > > migration in order to change the behavior of the flow rules in a
> > > standby process.
> > > > > > > >
> > > > > > > > Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
> > > > > > > > ---
> > > > > > > >  lib/ethdev/rte_ethdev.h | 21 +++++++++++++++++++++
> > > > > > > >  1 file changed, 21 insertions(+)
> > > > > > > >
> > > > > > > > diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> > > > > > > > index
> > > > > > > > 1505396ced..9ae4f426a7 100644
> > > > > > > > --- a/lib/ethdev/rte_ethdev.h
> > > > > > > > +++ b/lib/ethdev/rte_ethdev.h
> > > > > > > > @@ -2260,6 +2260,27 @@ int rte_eth_dev_owner_get(const
> > > > > > > > uint16_t port_id,  __rte_experimental  int
> > > > > > > > rte_eth_process_set_role(bool standby, uint32_t flags);
> > > > > > > >
> > > > > > > > +/**@{@name Process role flags
> > > > > > > > + * used when migrating from an application to another one.
> > > > > > > > + * @see rte_eth_process_set_active  */
> > > > > > > > +/**
> > > > > > > > + * When set on a standby process, ingress flow rules will be
> > > > > > > > +effective
> > > > > > > > + * in active and standby processes, so the ingress traffic
> > > > > > > > +may be
> > > > > duplicated.
> > > > > > > > + */
> > > > > > > > +#define
> RTE_ETH_PROCESS_FLAG_STANDBY_DUP_FLOW_INGRESS
> > > > > > > RTE_BIT32(0)
> > > > > > >
> > > > > > >
> > > > > > > How to duplicate if action has statefull items for example,
> > > > > > > rte_flow_action_security::security_session -> it store the live
> > > > > > > pointer rte_flow_action_meter::mtr_id; -> MTR object ID created
> > > > > > > with
> > > > > > > rte_mtr_create()
> > > > > > I agree with you, not all actions can be supported in the
> > > > > > active/standby
> > > > > model.
> > > > >
> > > > > IMO, Where ever rules are not standalone (like QUEUE, RSS) etc, It
> > > > > will be architecturally is not possible to migrate with pointers.
> > > > > That's where I have concern generalizing this feature for this ethdev.
> > > > >
> > > > Not sure I understand your concern correctly. What' the pointer
> concept
> > > here?
> > >
> > > I meant, Any HW resource driver deals with "pointers" or "fixed ID"
> > > can not get the same value
> > > for the new application. That's where I believe this whole concepts works
> for
> > > very standalone rte_flow patterns and actions.
> > >
> > >
> > > > Queue RSS actions can be migrated per my local test. Active/Standby
> > > application have its fully own rxq/txq.
> > >
> > > Yes. It because it is standalone.
> > >
> > > > They are totally separated processes and like two members in pipeline.
> 2nd
> > > member can't be feed if 1st member alive and handle the traffic.
> > > >
> > > > > Also, I don't believe there is any real HW support needed for this.
> > > > > IMO, Having DPDK standard multiprocess can do this by keeping
> > > > > secondary application can migrate, keeping all the SW logic in the
> > > > > primary process by doing the housekeeping in the application. On
> > > > > plus side, it works with pointers too.
> > >
> > > > IMO, in multiple process model, primary process usually owns the
> hardware
> > > resources via mmap/iomap/pci_map etc.
> > > > Secondary process is not able to run if primary quits no matter
> gracefully or
> > > crashing.
> > > > This patch wants to introduce a "backup to alive" model.
> > > > Assume user wants to upgrade from DPDK version 22.03 to 23.03, 22.03
> is
> > > running and active role while 23.03 comes up in standby.
> > > > Both DPDK processes have its own resources and doesn't rely on each
> other.
> > > > User can migrate the application following the steps in commit message
> > > with minimum traffic downtime.
> > > > SW logic like flow rules can be done following iptables-save/iptables-
> restore
> > > approach.
> > > > >
> > > > > I am not sure how much housekeeping offload to _HW_ in your case.
> In
> > > > > my view, it should be generic utils functions to track the flow and
> > > > > installing the rules using rte_flow APIs and keep the scope only for
> > > rte_flow.
> > > > For rules part, totally agree with you. Issue is there maybe millions
> > > > of flow rules in field and each rule may take different steps to re-install
> per
> > > vendor' implementations.
> > >
> > > I understand the desire for millon flow migrations. Which makes
> sense.IMO, It
> > > may be just easy to make this feature just for rte_flow name space. Just
> have
> > > APIs to export() existing rules for the given port and import() the rules
> > > exported rather than going to ethdev space and call it as "live migration".
> > >
> > Do you mean the API naming should be "rte_flow_process_set_role()"
> instead of "rte_eth_process_set_role()" ?
> > Also move to rte_flow.c/.h files? Are we good to keep the PMD callback in
> eth_dev layer?
> 
> Yes. something with rte_flow_ prefix and not sure _set_role() kind of
> scheme.

I think that the process of upgrade relates to the entire port and not only the rte_flow,
I don't mind that this flag will be part  of rte_flow, but it looks like this information is in higher level.

> 
> > Simple export()/import() may not work. Image some flow rules are
> exclusive and can't be issued from both applications.
> > We need to stop old application. I am afraid this will introduce big time
> window which traffic stops.
> 
> Yes, I think the  sequence is
> rte_flow_rules_export() on app 1
> stop the app 1
> rte_flow_rules_import() of app 1 by app2.
> 
I don't think export is the best solution, since maybe the second application doesn't want
all rules.
From my understanding the idea is to set priority between two process so when 
one application closes the traffic is going to be received by the second application.
We have also the option that the second process will get duplicated traffic with the
First application.

> 
> > Application won't like this behavior.
> > With this callback, each PMD can specify each rule, queue it or use lower
> priority if exclusive. Or return error.
> >
> > > > This serial wants to propose a unified interface for upper layer
> application'
> > > easy use.
> > > > >
> > > > > That's just my view. I leave to ethdev maintainers for the rest of
> > > > > the review and decision on this series.
> > > > >
> > > > > > That' why we have return value checking and rollback.
> > > > > > In Nvidia driver doc, we suggested user to start from
> 'rss/queue/jump'
> > > > > actions.
> > > > > > Meter is possible, at least per my view.
> > > > > > Assume: "meter g_action queue 0 / y_action drop / r_action drop"
> > > > > > Old application: create meter_id 'A' with pre-defined limitation.
> > > > > > New application: create meter_id 'B' which has the same
> parameters
> > > > > > with
> > > > > 'A'.
> > > > > > 1. 1st possible approach:
> > > > > >         Hardware duplicates the traffic; old application use meter
> > > > > > 'A' and new
> > > > > application uses meter 'B' to control traffic throughputs.
> > > > > >         Since traffic is duplicated, so it can go to different meters.
> > > > > > 2. 2nd possible approach:
> > > > > >              Meter 'A' and 'B' point to the same hardware
> > > > > > resource, and traffic
> > > > > reaches this part first and if green, duplication happens.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v4 3/3] ethdev: add standby flags for live migration
  2023-01-31 14:45                                   ` Ori Kam
@ 2023-01-31 17:50                                     ` Thomas Monjalon
  2023-01-31 18:10                                       ` Jerin Jacob
  0 siblings, 1 reply; 54+ messages in thread
From: Thomas Monjalon @ 2023-01-31 17:50 UTC (permalink / raw)
  To: Jerin Jacob, Rongwei Liu, Ori Kam
  Cc: dev, Matan Azrad, Slava Ovsiienko, stephen, Raslan Darawsheh,
	Ferruh Yigit, Andrew Rybchenko

31/01/2023 15:45, Ori Kam:
> From: Jerin Jacob <jerinjacobk@gmail.com>
> > On Tue, Jan 31, 2023 at 2:31 PM Rongwei Liu <rongweil@nvidia.com> wrote:
> > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > > On Tue, Jan 31, 2023 at 8:23 AM Rongwei Liu <rongweil@nvidia.com>
> > > > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > > > > On Mon, Jan 30, 2023 at 8:17 AM Rongwei Liu
> > > > > > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > > > > > > On Wed, Jan 18, 2023 at 9:15 PM Rongwei Liu
> > > > > > > > > +/**@{@name Process role flags
> > > > > > > > > + * used when migrating from an application to another one.
> > > > > > > > > + * @see rte_eth_process_set_active  */
> > > > > > > > > +/**
> > > > > > > > > + * When set on a standby process, ingress flow rules will be
> > > > > > > > > +effective
> > > > > > > > > + * in active and standby processes, so the ingress traffic
> > > > > > > > > +may be duplicated.
> > > > > > > > > + */
> > > > > > > > > +#define RTE_ETH_PROCESS_FLAG_STANDBY_DUP_FLOW_INGRESS RTE_BIT32(0)
> > > > > > > >
> > > > > > > >
> > > > > > > > How to duplicate if action has statefull items for example,
> > > > > > > > rte_flow_action_security::security_session -> it store the live
> > > > > > > > pointer rte_flow_action_meter::mtr_id; -> MTR object ID created
> > > > > > > > with
> > > > > > > > rte_mtr_create()
> > > > > > > I agree with you, not all actions can be supported in the
> > > > > > > active/standby model.
> > > > > >
> > > > > > IMO, Where ever rules are not standalone (like QUEUE, RSS) etc, It
> > > > > > will be architecturally is not possible to migrate with pointers.
> > > > > > That's where I have concern generalizing this feature for this ethdev.
> > > > > >
> > > > > Not sure I understand your concern correctly. What' the pointer concept here?
> > > >
> > > > I meant, Any HW resource driver deals with "pointers" or "fixed ID"
> > > > can not get the same value
> > > > for the new application. That's where I believe this whole concepts works
> > > > for very standalone rte_flow patterns and actions.
> > > >
> > > > > Queue RSS actions can be migrated per my local test. Active/Standby
> > > > application have its fully own rxq/txq.
> > > >
> > > > Yes. It because it is standalone.
> > > >
> > > > > They are totally separated processes and like two members in pipeline.
> > > > > 2nd member can't be feed if 1st member alive and handle the traffic.
> > > > >
[...]
> > > > > > my view, it should be generic utils functions to track the flow and
> > > > > > installing the rules using rte_flow APIs and keep the scope only for
> > > > > > rte_flow.
> > > > > 
> > > > > For rules part, totally agree with you. Issue is there maybe millions
> > > > > of flow rules in field and each rule may take different steps
> > > > > to re-install per vendor' implementations.
> > > >
> > > > I understand the desire for millon flow migrations. Which makes sense.
> > > > IMO, It may be just easy to make this feature just for rte_flow name space.
> > > > Just have APIs to export() existing rules for the given port
> > > > and import() the rules
> > > > exported rather than going to ethdev space and call it as "live migration".
> > > >
> > > Do you mean the API naming should be "rte_flow_process_set_role()"
> > > instead of "rte_eth_process_set_role()" ?
> > > Also move to rte_flow.c/.h files? Are we good to keep the PMD callback
> > > in eth_dev layer?
> > 
> > Yes. something with rte_flow_ prefix and not sure _set_role() kind of
> > scheme.
> 
> I think that the process of upgrade relates to the entire port and not only the rte_flow,
> I don't mind that this flag will be part  of rte_flow, but it looks like this information is in higher level.

I agree, application migration is a high-level concept.
For now we see that we can take advantage of it for some flow rules.
It could help more use cases.

I also agree that it is not a full solution.
Migration is complex, that's sure we cannot solve it in few weeks,
and we'll need to add more functions and helpers to make it easy to use
in more cases.


> > > Simple export()/import() may not work. Image some flow rules are
> > exclusive and can't be issued from both applications.
> > > We need to stop old application. I am afraid this will introduce big time
> > window which traffic stops.
> > 
> > Yes, I think the  sequence is
> > rte_flow_rules_export() on app 1
> > stop the app 1
> > rte_flow_rules_import() of app 1 by app2.
> > 
> I don't think export is the best solution, since maybe the second application doesn't want
> all rules.
> From my understanding the idea is to set priority between two process so when 
> one application closes the traffic is going to be received by the second application.
> We have also the option that the second process will get duplicated traffic with the
> First application.
> 
> > > Application won't like this behavior.
> > > With this callback, each PMD can specify each rule, queue it or use lower
> > priority if exclusive. Or return error.
> > >
> > > > > This serial wants to propose a unified interface for upper layer
> > application'
> > > > easy use.
> > > > > >
> > > > > > That's just my view. I leave to ethdev maintainers for the rest of
> > > > > > the review and decision on this series.

That's a first step which allows to declare the migration intent.
We should try to build on top of it and keep it as experimental
as long as needed to achieve a good migration support.

I am for going in this direction (accept the patch) for now.
If we discover in the next months that there is a better direction,
we can change.


> > > > > > > That' why we have return value checking and rollback.
> > > > > > > In Nvidia driver doc, we suggested user to start from
> > 'rss/queue/jump'
> > > > > > actions.
> > > > > > > Meter is possible, at least per my view.
> > > > > > > Assume: "meter g_action queue 0 / y_action drop / r_action drop"
> > > > > > > Old application: create meter_id 'A' with pre-defined limitation.
> > > > > > > New application: create meter_id 'B' which has the same
> > parameters
> > > > > > > with
> > > > > > 'A'.
> > > > > > > 1. 1st possible approach:
> > > > > > >         Hardware duplicates the traffic; old application use meter
> > > > > > > 'A' and new
> > > > > > application uses meter 'B' to control traffic throughputs.
> > > > > > >         Since traffic is duplicated, so it can go to different meters.
> > > > > > > 2. 2nd possible approach:
> > > > > > >              Meter 'A' and 'B' point to the same hardware
> > > > > > > resource, and traffic
> > > > > > reaches this part first and if green, duplication happens.




^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v4 3/3] ethdev: add standby flags for live migration
  2023-01-31 17:50                                     ` Thomas Monjalon
@ 2023-01-31 18:10                                       ` Jerin Jacob
  0 siblings, 0 replies; 54+ messages in thread
From: Jerin Jacob @ 2023-01-31 18:10 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: Rongwei Liu, Ori Kam, dev, Matan Azrad, Slava Ovsiienko, stephen,
	Raslan Darawsheh, Ferruh Yigit, Andrew Rybchenko

On Tue, Jan 31, 2023 at 11:20 PM Thomas Monjalon <thomas@monjalon.net> wrote:
>
> 31/01/2023 15:45, Ori Kam:
> > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > On Tue, Jan 31, 2023 at 2:31 PM Rongwei Liu <rongweil@nvidia.com> wrote:
> > > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > > > On Tue, Jan 31, 2023 at 8:23 AM Rongwei Liu <rongweil@nvidia.com>
> > > > > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > > > > > On Mon, Jan 30, 2023 at 8:17 AM Rongwei Liu
> > > > > > > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > > > > > > > On Wed, Jan 18, 2023 at 9:15 PM Rongwei Liu
> > > > > > > > > > +/**@{@name Process role flags
> > > > > > > > > > + * used when migrating from an application to another one.
> > > > > > > > > > + * @see rte_eth_process_set_active  */
> > > > > > > > > > +/**
> > > > > > > > > > + * When set on a standby process, ingress flow rules will be
> > > > > > > > > > +effective
> > > > > > > > > > + * in active and standby processes, so the ingress traffic
> > > > > > > > > > +may be duplicated.
> > > > > > > > > > + */
> > > > > > > > > > +#define RTE_ETH_PROCESS_FLAG_STANDBY_DUP_FLOW_INGRESS RTE_BIT32(0)
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > How to duplicate if action has statefull items for example,
> > > > > > > > > rte_flow_action_security::security_session -> it store the live
> > > > > > > > > pointer rte_flow_action_meter::mtr_id; -> MTR object ID created
> > > > > > > > > with
> > > > > > > > > rte_mtr_create()
> > > > > > > > I agree with you, not all actions can be supported in the
> > > > > > > > active/standby model.
> > > > > > >
> > > > > > > IMO, Where ever rules are not standalone (like QUEUE, RSS) etc, It
> > > > > > > will be architecturally is not possible to migrate with pointers.
> > > > > > > That's where I have concern generalizing this feature for this ethdev.
> > > > > > >
> > > > > > Not sure I understand your concern correctly. What' the pointer concept here?
> > > > >
> > > > > I meant, Any HW resource driver deals with "pointers" or "fixed ID"
> > > > > can not get the same value
> > > > > for the new application. That's where I believe this whole concepts works
> > > > > for very standalone rte_flow patterns and actions.
> > > > >
> > > > > > Queue RSS actions can be migrated per my local test. Active/Standby
> > > > > application have its fully own rxq/txq.
> > > > >
> > > > > Yes. It because it is standalone.
> > > > >
> > > > > > They are totally separated processes and like two members in pipeline.
> > > > > > 2nd member can't be feed if 1st member alive and handle the traffic.
> > > > > >
> [...]
> > > > > > > my view, it should be generic utils functions to track the flow and
> > > > > > > installing the rules using rte_flow APIs and keep the scope only for
> > > > > > > rte_flow.
> > > > > >
> > > > > > For rules part, totally agree with you. Issue is there maybe millions
> > > > > > of flow rules in field and each rule may take different steps
> > > > > > to re-install per vendor' implementations.
> > > > >
> > > > > I understand the desire for millon flow migrations. Which makes sense.
> > > > > IMO, It may be just easy to make this feature just for rte_flow name space.
> > > > > Just have APIs to export() existing rules for the given port
> > > > > and import() the rules
> > > > > exported rather than going to ethdev space and call it as "live migration".
> > > > >
> > > > Do you mean the API naming should be "rte_flow_process_set_role()"
> > > > instead of "rte_eth_process_set_role()" ?
> > > > Also move to rte_flow.c/.h files? Are we good to keep the PMD callback
> > > > in eth_dev layer?
> > >
> > > Yes. something with rte_flow_ prefix and not sure _set_role() kind of
> > > scheme.
> >
> > I think that the process of upgrade relates to the entire port and not only the rte_flow,
> > I don't mind that this flag will be part  of rte_flow, but it looks like this information is in higher level.
>
> I agree, application migration is a high-level concept.
> For now we see that we can take advantage of it for some flow rules.
> It could help more use cases.
>
> I also agree that it is not a full solution.
> Migration is complex, that's sure we cannot solve it in few weeks,
> and we'll need to add more functions and helpers to make it easy to use
> in more cases.

Makes sense.

>
>
> > > > Simple export()/import() may not work. Image some flow rules are
> > > exclusive and can't be issued from both applications.
> > > > We need to stop old application. I am afraid this will introduce big time
> > > window which traffic stops.
> > >
> > > Yes, I think the  sequence is
> > > rte_flow_rules_export() on app 1
> > > stop the app 1
> > > rte_flow_rules_import() of app 1 by app2.
> > >
> > I don't think export is the best solution, since maybe the second application doesn't want
> > all rules.
> > From my understanding the idea is to set priority between two process so when
> > one application closes the traffic is going to be received by the second application.
> > We have also the option that the second process will get duplicated traffic with the
> > First application.
> >
> > > > Application won't like this behavior.
> > > > With this callback, each PMD can specify each rule, queue it or use lower
> > > priority if exclusive. Or return error.
> > > >
> > > > > > This serial wants to propose a unified interface for upper layer
> > > application'
> > > > > easy use.
> > > > > > >
> > > > > > > That's just my view. I leave to ethdev maintainers for the rest of
> > > > > > > the review and decision on this series.
>
> That's a first step which allows to declare the migration intent.
> We should try to build on top of it and keep it as experimental
> as long as needed to achieve a good migration support.
>
> I am for going in this direction (accept the patch) for now.
> If we discover in the next months that there is a better direction,
> we can change.

Please have a driver support and test application to exercise this API
when merging this patch.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v4 2/3] ethdev: add standby state for live migration
  2023-01-18 15:44                   ` [PATCH v4 2/3] ethdev: add standby state for live migration Rongwei Liu
  2023-01-31 13:50                     ` Ori Kam
@ 2023-01-31 18:14                     ` Jerin Jacob
  2023-01-31 22:55                       ` Thomas Monjalon
  2023-02-01  7:52                     ` Andrew Rybchenko
  2 siblings, 1 reply; 54+ messages in thread
From: Jerin Jacob @ 2023-01-31 18:14 UTC (permalink / raw)
  To: Rongwei Liu
  Cc: dev, matan, viacheslavo, orika, thomas, stephen, rasland,
	Ferruh Yigit, Andrew Rybchenko

On Wed, Jan 18, 2023 at 9:15 PM Rongwei Liu <rongweil@nvidia.com> wrote:
>
> When a DPDK application must be upgraded,
> the traffic downtime should be shortened as much as possible.
> During the migration time, the old application may stay alive
> while the new application is starting and being configured.
>
> In order to optimize the switch to the new application,
> the old application may need to be aware of the presence
> of the new application being prepared.
> This is achieved with a new API allowing the user to change the
> new application state to standby and active later.
>
> The added function is trying to apply the new state to all probed
> ethdev ports. To make this API simple and easy to use,
> the same flags have to be accepted by all devices.
>
> This is the scenario of operations in the old and new applications:
> .       device: already configured by the old application
> .       new:    start as active
> .       new:    probe the same device

How to probe same device if is already bind to another application?
vfio-pci wont allow this.

Good to see any implementation and example application to exercise this APIs


> .       new:    set as standby
> .       new:    configure the device
> .       device: has configurations from old and new applications
> .       old:    clear its device configuration
> .       device: has only 1 configuration from new application
> .       new:    set as active
> .       device: downtime for connecting all to the new application
> .       old:    shutdown
>
> The active role means network handling configurations are programmed
> to the HW immediately, and no behavior changed. This is the default state.
> The standby role means configurations are queued in the HW.
> If there is no application with active role,
> any configuration is effective immediately.
>
> Signed-off-by: Rongwei Liu <rongweil@nvidia.com>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v4 2/3] ethdev: add standby state for live migration
  2023-01-31 18:14                     ` Jerin Jacob
@ 2023-01-31 22:55                       ` Thomas Monjalon
  2023-02-01  7:32                         ` Andrew Rybchenko
  0 siblings, 1 reply; 54+ messages in thread
From: Thomas Monjalon @ 2023-01-31 22:55 UTC (permalink / raw)
  To: Rongwei Liu, Jerin Jacob
  Cc: dev, matan, viacheslavo, orika, stephen, rasland, Ferruh Yigit,
	Andrew Rybchenko, bruce.richardson

31/01/2023 19:14, Jerin Jacob:
> On Wed, Jan 18, 2023 at 9:15 PM Rongwei Liu <rongweil@nvidia.com> wrote:
> >
> > When a DPDK application must be upgraded,
> > the traffic downtime should be shortened as much as possible.
> > During the migration time, the old application may stay alive
> > while the new application is starting and being configured.
> >
> > In order to optimize the switch to the new application,
> > the old application may need to be aware of the presence
> > of the new application being prepared.
> > This is achieved with a new API allowing the user to change the
> > new application state to standby and active later.
> >
> > The added function is trying to apply the new state to all probed
> > ethdev ports. To make this API simple and easy to use,
> > the same flags have to be accepted by all devices.
> >
> > This is the scenario of operations in the old and new applications:
> > .       device: already configured by the old application
> > .       new:    start as active
> > .       new:    probe the same device
> 
> How to probe same device if is already bind to another application?
> vfio-pci wont allow this.

I missed that part.
There is no way to share a VFIO device between 2 applications?


> > .       new:    set as standby
> > .       new:    configure the device
> > .       device: has configurations from old and new applications
> > .       old:    clear its device configuration
> > .       device: has only 1 configuration from new application
> > .       new:    set as active
> > .       device: downtime for connecting all to the new application
> > .       old:    shutdown




^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v4 2/3] ethdev: add standby state for live migration
  2023-01-31 22:55                       ` Thomas Monjalon
@ 2023-02-01  7:32                         ` Andrew Rybchenko
  2023-02-01  8:31                           ` Thomas Monjalon
  2023-02-01  8:40                           ` Jerin Jacob
  0 siblings, 2 replies; 54+ messages in thread
From: Andrew Rybchenko @ 2023-02-01  7:32 UTC (permalink / raw)
  To: Thomas Monjalon, Rongwei Liu, Jerin Jacob
  Cc: dev, matan, viacheslavo, orika, stephen, rasland, Ferruh Yigit,
	bruce.richardson

On 2/1/23 01:55, Thomas Monjalon wrote:
> 31/01/2023 19:14, Jerin Jacob:
>> On Wed, Jan 18, 2023 at 9:15 PM Rongwei Liu <rongweil@nvidia.com> wrote:
>>>
>>> When a DPDK application must be upgraded,
>>> the traffic downtime should be shortened as much as possible.
>>> During the migration time, the old application may stay alive
>>> while the new application is starting and being configured.
>>>
>>> In order to optimize the switch to the new application,
>>> the old application may need to be aware of the presence
>>> of the new application being prepared.
>>> This is achieved with a new API allowing the user to change the
>>> new application state to standby and active later.
>>>
>>> The added function is trying to apply the new state to all probed
>>> ethdev ports. To make this API simple and easy to use,
>>> the same flags have to be accepted by all devices.
>>>
>>> This is the scenario of operations in the old and new applications:
>>> .       device: already configured by the old application
>>> .       new:    start as active
>>> .       new:    probe the same device
>>
>> How to probe same device if is already bind to another application?
>> vfio-pci wont allow this.
> 
> I missed that part.
> There is no way to share a VFIO device between 2 applications?

As I understand multi-process shares an VFIO device between
many application. As far as I remember it is just required to
pass corresponding file descriptor to another application.

Anyway I fully agree that the patch requires more documentation
in doc/guides/ with the description of live migration theory of
operations.

>>> .       new:    set as standby
>>> .       new:    configure the device
>>> .       device: has configurations from old and new applications
>>> .       old:    clear its device configuration
>>> .       device: has only 1 configuration from new application
>>> .       new:    set as active
>>> .       device: downtime for connecting all to the new application
>>> .       old:    shutdown
> 
> 
> 


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v4 2/3] ethdev: add standby state for live migration
  2023-01-18 15:44                   ` [PATCH v4 2/3] ethdev: add standby state for live migration Rongwei Liu
  2023-01-31 13:50                     ` Ori Kam
  2023-01-31 18:14                     ` Jerin Jacob
@ 2023-02-01  7:52                     ` Andrew Rybchenko
  2023-02-01  8:27                       ` Thomas Monjalon
  2 siblings, 1 reply; 54+ messages in thread
From: Andrew Rybchenko @ 2023-02-01  7:52 UTC (permalink / raw)
  To: Rongwei Liu, dev, matan, viacheslavo, orika, thomas, jerinjacobk,
	stephen
  Cc: rasland, Ferruh Yigit

On 1/18/23 18:44, Rongwei Liu wrote:
> When a DPDK application must be upgraded,
> the traffic downtime should be shortened as much as possible.
> During the migration time, the old application may stay alive
> while the new application is starting and being configured.
> 
> In order to optimize the switch to the new application,
> the old application may need to be aware of the presence
> of the new application being prepared.
> This is achieved with a new API allowing the user to change the
> new application state to standby and active later.
> 
> The added function is trying to apply the new state to all probed
> ethdev ports. To make this API simple and easy to use,
> the same flags have to be accepted by all devices.
> 
> This is the scenario of operations in the old and new applications:
> .       device: already configured by the old application
> .       new:    start as active
> .       new:    probe the same device
> .       new:    set as standby
> .       new:    configure the device
> .       device: has configurations from old and new applications
> .       old:    clear its device configuration
> .       device: has only 1 configuration from new application
> .       new:    set as active
> .       device: downtime for connecting all to the new application
> .       old:    shutdown
> 
> The active role means network handling configurations are programmed
> to the HW immediately, and no behavior changed. This is the default state.
> The standby role means configurations are queued in the HW.
> If there is no application with active role,
> any configuration is effective immediately.
> 
> Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
> ---
>   doc/guides/rel_notes/release_23_03.rst |  7 ++++
>   lib/ethdev/ethdev_driver.h             | 20 +++++++++
>   lib/ethdev/rte_ethdev.c                | 42 +++++++++++++++++++
>   lib/ethdev/rte_ethdev.h                | 56 ++++++++++++++++++++++++++
>   lib/ethdev/version.map                 |  3 ++
>   5 files changed, 128 insertions(+)
> 
> diff --git a/doc/guides/rel_notes/release_23_03.rst b/doc/guides/rel_notes/release_23_03.rst
> index b8c5b68d6c..5367123f24 100644
> --- a/doc/guides/rel_notes/release_23_03.rst
> +++ b/doc/guides/rel_notes/release_23_03.rst
> @@ -55,6 +55,13 @@ New Features
>        Also, make sure to start the actual text at the margin.
>        =======================================================
>   
> +* **Added process state in ethdev to improve live migration.**
> +
> +  Hot upgrade of an application may be accelerated by configuring
> +  the new application in standby state while the old one is still active.
> +  Such double ethdev configuration of the same device is possible
> +  with the added process state API.
> +

Will we have the same API for other device classes? Any ideas
how to avoid duplication? (I'm afraid we don't have generic
place to put such functionality).

>   
>   Removed Items
>   -------------
> diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
> index 6a550cfc83..4a098410d5 100644
> --- a/lib/ethdev/ethdev_driver.h
> +++ b/lib/ethdev/ethdev_driver.h
> @@ -219,6 +219,23 @@ typedef int (*eth_dev_reset_t)(struct rte_eth_dev *dev);
>   /** @internal Function used to detect an Ethernet device removal. */
>   typedef int (*eth_is_removed_t)(struct rte_eth_dev *dev);
>   
> +/**
> + * @internal
> + * Set the role of the process to active or standby during live migration.
> + *
> + * @param dev
> + *   Port (ethdev) handle.
> + * @param standby
> + *   Role active if false, standby if true.
> + * @param flags
> + *   Role specific flags.
> + *
> + * @return
> + *   Negative value on error, 0 on success.
> + */
> +typedef int (*eth_dev_process_set_role_t)(struct rte_eth_dev *dev,
> +		bool standby, uint32_t flags);
> +
>   /**
>    * @internal
>    * Function used to enable the Rx promiscuous mode of an Ethernet device.
> @@ -1186,6 +1203,9 @@ struct eth_dev_ops {
>   	/** Check if the device was physically removed */
>   	eth_is_removed_t           is_removed;
>   
> +	/** Set role during live migration */
> +	eth_dev_process_set_role_t process_set_role;
> +
>   	eth_promiscuous_enable_t   promiscuous_enable; /**< Promiscuous ON */
>   	eth_promiscuous_disable_t  promiscuous_disable;/**< Promiscuous OFF */
>   	eth_allmulticast_enable_t  allmulticast_enable;/**< Rx multicast ON */
> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
> index 5d5e18db1e..3a1fb64053 100644
> --- a/lib/ethdev/rte_ethdev.c
> +++ b/lib/ethdev/rte_ethdev.c
> @@ -558,6 +558,48 @@ rte_eth_dev_owner_get(const uint16_t port_id, struct rte_eth_dev_owner *owner)
>   	return 0;
>   }
>   
> +int
> +rte_eth_process_set_role(bool standby, uint32_t flags)
> +{
> +	struct rte_eth_dev_info dev_info = {0};
> +	struct rte_eth_dev *dev;
> +	uint16_t port_id;
> +	int ret = 0;
> +
> +	/* Check if all devices support process role. */
> +	RTE_ETH_FOREACH_DEV(port_id) {
> +		dev = &rte_eth_devices[port_id];
> +		if (*dev->dev_ops->process_set_role != NULL &&
> +			*dev->dev_ops->dev_infos_get != NULL &&
> +			(*dev->dev_ops->dev_infos_get)(dev, &dev_info) == 0 &&
> +			(dev_info.dev_capa & RTE_ETH_DEV_CAPA_PROCESS_ROLE) != 0)

Why do we need both non-NULL callback and dev_capa flag?
Isn't just non-NULL callback sufficient?

> +			continue;
> +		rte_errno = ENOTSUP;
> +		return -rte_errno;
> +	}
> +	/* Call the driver callbacks. */
> +	RTE_ETH_FOREACH_DEV(port_id) {
> +		dev = &rte_eth_devices[port_id];
> +		if ((*dev->dev_ops->process_set_role)(dev, standby, flags) < 0)

Please, add error logging on failure.

> +			goto failure;

IMHO it would be useful to have info logs on success.

> +		ret++;

ret variable name is very confusing, please, be
more specific in variable naming.

> +	}
> +	return ret;
> +
> +failure:
> +	/* Rollback all changed devices in case one failed. */
> +	if (ret) {

Compare vs 0

> +		RTE_ETH_FOREACH_DEV(port_id) {
> +			dev = &rte_eth_devices[port_id];
> +			(*dev->dev_ops->process_set_role)(dev, !standby, flags);

IMHO, it would be useful to have error logs on rollback
failure and info logs on success.

> +			if (--ret == 0)
> +				break;
> +		}
> +	}
> +	rte_errno = EPERM;

Why is it always EPERM? Isn't it better to forward
failure code from driver?

> +	return -rte_errno;
> +}
> +
>   int
>   rte_eth_dev_socket_id(uint16_t port_id)
>   {
> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> index c129ca1eaf..1505396ced 100644
> --- a/lib/ethdev/rte_ethdev.h
> +++ b/lib/ethdev/rte_ethdev.h
> @@ -1606,6 +1606,8 @@ struct rte_eth_conf {
>   #define RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP         RTE_BIT64(3)
>   /** Device supports keeping shared flow objects across restart. */
>   #define RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP RTE_BIT64(4)
> +/** Device supports process role changing. @see rte_eth_process_set_active */

There is no rte_eth_process_set_active()

> +#define RTE_ETH_DEV_CAPA_PROCESS_ROLE           RTE_BIT64(5)
>   /**@}*/
>   
>   /*
> @@ -2204,6 +2206,60 @@ int rte_eth_dev_owner_delete(const uint64_t owner_id);
>   int rte_eth_dev_owner_get(const uint16_t port_id,
>   		struct rte_eth_dev_owner *owner);
>   
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Set the role of the process to active or standby,
> + * affecting network traffic handling.
> + *
> + * If one device does not support this operation or fails,
> + * the whole operation is failed and rolled back.
> + *
> + * It is forbidden to have multiple processes with the same role
> + * unless only one of them is configured to handle the traffic.

Why is it not allowed to have many processes in standby?

> + *
> + * The application is active by default.
> + * The configuration from the active process is effective immediately
> + * while the configuration from the standby process is queued by hardware.

I'm afraid error handling could be tricky in this case.
I think it makes sense to highlight that configuration
propagation failures will be returned on attempt to set
the process active.

> + * When configuring the device from a standby process,
> + * it has no effect except for below situations:
> + *   - traffic not handled by the active process configuration
> + *   - no active process
> + *
> + * When a process is changed from a standby to an active role,
> + * all preceding configurations that are queued by hardware
> + * should become effective immediately.
> + * Before role transition, all the traffic handling configurations
> + * set by the active process should be flushed first.
> + *
> + * In summary, the operations are expected to happen in this order
> + * in "old" and "new" applications:
> + *   device: already configured by the old application
> + *   new:    start as active
> + *   new:    probe the same device
> + *   new:    set as standby
> + *   new:    configure the device
> + *   device: has configurations from old and new applications
> + *   old:    clear its device configuration
> + *   device: has only 1 configuration from new application
> + *   new:    set as active
> + *   device: downtime for connecting all to the new application
> + *   old:    shutdown
> + *
> + * @param standby
> + *   Role active if false, standby if true.

Typically API with boolean parameters is bad. May be in this
particular case it is better to have two functions:
rte_eth_process_set_active() and rte_eth_process_set_standby().

> + * @param flags
> + *   Role specific flags.

Please, define namespace for flags.

> + * @return
> + *   Positive value on success, -rte_errno value on error:
> + *   - (> 0) Number of switched devices.

Isn't is success if there is no devices to switch?

> + *   - (-ENOTSUP) if not supported by a device.
> + *   - (-EPERM) if operation failed with a device.
> + */
> +__rte_experimental
> +int rte_eth_process_set_role(bool standby, uint32_t flags);
> +
>   /**
>    * Get the number of ports which are usable for the application.
>    *
> diff --git a/lib/ethdev/version.map b/lib/ethdev/version.map
> index 17201fbe0f..d5d3ea5421 100644
> --- a/lib/ethdev/version.map
> +++ b/lib/ethdev/version.map
> @@ -298,6 +298,9 @@ EXPERIMENTAL {
>   	rte_flow_get_q_aged_flows;
>   	rte_mtr_meter_policy_get;
>   	rte_mtr_meter_profile_get;
> +
> +	# added in 23.03
> +	rte_eth_process_set_role;
>   };
>   
>   INTERNAL {


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v4 2/3] ethdev: add standby state for live migration
  2023-02-01  7:52                     ` Andrew Rybchenko
@ 2023-02-01  8:27                       ` Thomas Monjalon
  2023-02-01  8:40                         ` Andrew Rybchenko
  0 siblings, 1 reply; 54+ messages in thread
From: Thomas Monjalon @ 2023-02-01  8:27 UTC (permalink / raw)
  To: Rongwei Liu, Andrew Rybchenko
  Cc: dev, matan, viacheslavo, orika, jerinjacobk, stephen, rasland,
	Ferruh Yigit, jerinj

01/02/2023 08:52, Andrew Rybchenko:
> On 1/18/23 18:44, Rongwei Liu wrote:
> > +* **Added process state in ethdev to improve live migration.**
> > +
> > +  Hot upgrade of an application may be accelerated by configuring
> > +  the new application in standby state while the old one is still active.
> > +  Such double ethdev configuration of the same device is possible
> > +  with the added process state API.
> > +
> 
> Will we have the same API for other device classes? Any ideas
> how to avoid duplication? (I'm afraid we don't have generic
> place to put such functionality).

It could be needed to have such feature in other classes, yes.
That's a device feature, so it must be in each class,
with different flags/options per class.

> > +	/* Check if all devices support process role. */
> > +	RTE_ETH_FOREACH_DEV(port_id) {
> > +		dev = &rte_eth_devices[port_id];
> > +		if (*dev->dev_ops->process_set_role != NULL &&
> > +			*dev->dev_ops->dev_infos_get != NULL &&
> > +			(*dev->dev_ops->dev_infos_get)(dev, &dev_info) == 0 &&
> > +			(dev_info.dev_capa & RTE_ETH_DEV_CAPA_PROCESS_ROLE) != 0)
> 
> Why do we need both non-NULL callback and dev_capa flag?
> Isn't just non-NULL callback sufficient?

We could have a callback returning ENOTSUP.
Anyway we need the capability for application query.

> > +			continue;
> > +		rte_errno = ENOTSUP;
> > +		return -rte_errno;
> > +	}
> > +	/* Call the driver callbacks. */
> > +	RTE_ETH_FOREACH_DEV(port_id) {
> > +		dev = &rte_eth_devices[port_id];
> > +		if ((*dev->dev_ops->process_set_role)(dev, standby, flags) < 0)
> 
> Please, add error logging on failure.

good idea

> 
> > +			goto failure;
> 
> IMHO it would be useful to have info logs on success.

When setting the role, we could have debug log, yes.

> > +		ret++;
> 
> ret variable name is very confusing, please, be
> more specific in variable naming.
> 
> > +	}
> > +	return ret;
> > +
> > +failure:
> > +	/* Rollback all changed devices in case one failed. */
> > +	if (ret) {
> 
> Compare vs 0
> 
> > +		RTE_ETH_FOREACH_DEV(port_id) {
> > +			dev = &rte_eth_devices[port_id];
> > +			(*dev->dev_ops->process_set_role)(dev, !standby, flags);
> 
> IMHO, it would be useful to have error logs on rollback
> failure and info logs on success.
> 
> > +			if (--ret == 0)
> > +				break;
> > +		}
> > +	}
> > +	rte_errno = EPERM;
> 
> Why is it always EPERM? Isn't it better to forward
> failure code from driver?
> 
> > +	return -rte_errno;
> > +}

Thanks for the good detailed review Andrew.

> > --- a/lib/ethdev/rte_ethdev.h
> > +++ b/lib/ethdev/rte_ethdev.h
> > @@ -1606,6 +1606,8 @@ struct rte_eth_conf {
> >   #define RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP         RTE_BIT64(3)
> >   /** Device supports keeping shared flow objects across restart. */
> >   #define RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP RTE_BIT64(4)
> > +/** Device supports process role changing. @see rte_eth_process_set_active */
> 
> There is no rte_eth_process_set_active()
> 
> > +#define RTE_ETH_DEV_CAPA_PROCESS_ROLE           RTE_BIT64(5)
> >   /**@}*/
> >   
> >   /*
> > @@ -2204,6 +2206,60 @@ int rte_eth_dev_owner_delete(const uint64_t owner_id);
> >   int rte_eth_dev_owner_get(const uint16_t port_id,
> >   		struct rte_eth_dev_owner *owner);
> >   
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice
> > + *
> > + * Set the role of the process to active or standby,
> > + * affecting network traffic handling.
> > + *
> > + * If one device does not support this operation or fails,
> > + * the whole operation is failed and rolled back.
> > + *
> > + * It is forbidden to have multiple processes with the same role
> > + * unless only one of them is configured to handle the traffic.
> 
> Why is it not allowed to have many processes in standby?
> 
> > + *
> > + * The application is active by default.
> > + * The configuration from the active process is effective immediately
> > + * while the configuration from the standby process is queued by hardware.
> 
> I'm afraid error handling could be tricky in this case.
> I think it makes sense to highlight that configuration
> propagation failures will be returned on attempt to set
> the process active.
> 
> > + * When configuring the device from a standby process,
> > + * it has no effect except for below situations:
> > + *   - traffic not handled by the active process configuration
> > + *   - no active process
> > + *
> > + * When a process is changed from a standby to an active role,
> > + * all preceding configurations that are queued by hardware
> > + * should become effective immediately.
> > + * Before role transition, all the traffic handling configurations
> > + * set by the active process should be flushed first.
> > + *
> > + * In summary, the operations are expected to happen in this order
> > + * in "old" and "new" applications:
> > + *   device: already configured by the old application
> > + *   new:    start as active
> > + *   new:    probe the same device
> > + *   new:    set as standby
> > + *   new:    configure the device
> > + *   device: has configurations from old and new applications
> > + *   old:    clear its device configuration
> > + *   device: has only 1 configuration from new application
> > + *   new:    set as active
> > + *   device: downtime for connecting all to the new application
> > + *   old:    shutdown
> > + *
> > + * @param standby
> > + *   Role active if false, standby if true.
> 
> Typically API with boolean parameters is bad. May be in this
> particular case it is better to have two functions:
> rte_eth_process_set_active() and rte_eth_process_set_standby().

Why?
It could be an enum as well.

> > + * @param flags
> > + *   Role specific flags.
> 
> Please, define namespace for flags.
> 
> > + * @return
> > + *   Positive value on success, -rte_errno value on error:
> > + *   - (> 0) Number of switched devices.
> 
> Isn't is success if there is no devices to switch?

It should be >= 0 I guess.




^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v4 2/3] ethdev: add standby state for live migration
  2023-02-01  7:32                         ` Andrew Rybchenko
@ 2023-02-01  8:31                           ` Thomas Monjalon
  2023-02-01  8:40                           ` Jerin Jacob
  1 sibling, 0 replies; 54+ messages in thread
From: Thomas Monjalon @ 2023-02-01  8:31 UTC (permalink / raw)
  To: Rongwei Liu, Jerin Jacob, Andrew Rybchenko
  Cc: dev, matan, viacheslavo, orika, stephen, rasland, Ferruh Yigit,
	bruce.richardson

01/02/2023 08:32, Andrew Rybchenko:
> On 2/1/23 01:55, Thomas Monjalon wrote:
> > 31/01/2023 19:14, Jerin Jacob:
> >> On Wed, Jan 18, 2023 at 9:15 PM Rongwei Liu <rongweil@nvidia.com> wrote:
> >>>
> >>> When a DPDK application must be upgraded,
> >>> the traffic downtime should be shortened as much as possible.
> >>> During the migration time, the old application may stay alive
> >>> while the new application is starting and being configured.
> >>>
> >>> In order to optimize the switch to the new application,
> >>> the old application may need to be aware of the presence
> >>> of the new application being prepared.
> >>> This is achieved with a new API allowing the user to change the
> >>> new application state to standby and active later.
> >>>
> >>> The added function is trying to apply the new state to all probed
> >>> ethdev ports. To make this API simple and easy to use,
> >>> the same flags have to be accepted by all devices.
> >>>
> >>> This is the scenario of operations in the old and new applications:
> >>> .       device: already configured by the old application
> >>> .       new:    start as active
> >>> .       new:    probe the same device
> >>
> >> How to probe same device if is already bind to another application?
> >> vfio-pci wont allow this.
> > 
> > I missed that part.
> > There is no way to share a VFIO device between 2 applications?
> 
> As I understand multi-process shares an VFIO device between
> many application. As far as I remember it is just required to
> pass corresponding file descriptor to another application.
> 
> Anyway I fully agree that the patch requires more documentation
> in doc/guides/ with the description of live migration theory of
> operations.

Yes that part is missing.
Rongwei, you should try a fake migration with e1000 or virtio
(both available in QEMU), so you can play with VFIO device probing.

> >>> .       new:    set as standby
> >>> .       new:    configure the device
> >>> .       device: has configurations from old and new applications
> >>> .       old:    clear its device configuration
> >>> .       device: has only 1 configuration from new application
> >>> .       new:    set as active
> >>> .       device: downtime for connecting all to the new application
> >>> .       old:    shutdown




^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v4 2/3] ethdev: add standby state for live migration
  2023-02-01  8:27                       ` Thomas Monjalon
@ 2023-02-01  8:40                         ` Andrew Rybchenko
  0 siblings, 0 replies; 54+ messages in thread
From: Andrew Rybchenko @ 2023-02-01  8:40 UTC (permalink / raw)
  To: Thomas Monjalon, Rongwei Liu
  Cc: dev, matan, viacheslavo, orika, jerinjacobk, stephen, rasland,
	Ferruh Yigit, jerinj

On 2/1/23 11:27, Thomas Monjalon wrote:
> 01/02/2023 08:52, Andrew Rybchenko:
>> On 1/18/23 18:44, Rongwei Liu wrote:
>>> + * When configuring the device from a standby process,
>>> + * it has no effect except for below situations:
>>> + *   - traffic not handled by the active process configuration
>>> + *   - no active process
>>> + *
>>> + * When a process is changed from a standby to an active role,
>>> + * all preceding configurations that are queued by hardware
>>> + * should become effective immediately.
>>> + * Before role transition, all the traffic handling configurations
>>> + * set by the active process should be flushed first.
>>> + *
>>> + * In summary, the operations are expected to happen in this order
>>> + * in "old" and "new" applications:
>>> + *   device: already configured by the old application
>>> + *   new:    start as active
>>> + *   new:    probe the same device
>>> + *   new:    set as standby
>>> + *   new:    configure the device
>>> + *   device: has configurations from old and new applications
>>> + *   old:    clear its device configuration
>>> + *   device: has only 1 configuration from new application
>>> + *   new:    set as active
>>> + *   device: downtime for connecting all to the new application
>>> + *   old:    shutdown
>>> + *
>>> + * @param standby
>>> + *   Role active if false, standby if true.
>>
>> Typically API with boolean parameters is bad. May be in this
>> particular case it is better to have two functions:
>> rte_eth_process_set_active() and rte_eth_process_set_standby().
> 
> Why?
> It could be an enum as well.

It is simply hard to read (what is set_role(true)???) and not
extensible. enum instead of boolean is an acceptable
alternative of many functions.


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v4 2/3] ethdev: add standby state for live migration
  2023-02-01  7:32                         ` Andrew Rybchenko
  2023-02-01  8:31                           ` Thomas Monjalon
@ 2023-02-01  8:40                           ` Jerin Jacob
  2023-02-01  8:46                             ` Thomas Monjalon
  1 sibling, 1 reply; 54+ messages in thread
From: Jerin Jacob @ 2023-02-01  8:40 UTC (permalink / raw)
  To: Andrew Rybchenko
  Cc: Thomas Monjalon, Rongwei Liu, dev, matan, viacheslavo, orika,
	stephen, rasland, Ferruh Yigit, bruce.richardson

On Wed, Feb 1, 2023 at 1:02 PM Andrew Rybchenko
<andrew.rybchenko@oktetlabs.ru> wrote:
>
> On 2/1/23 01:55, Thomas Monjalon wrote:
> > 31/01/2023 19:14, Jerin Jacob:
> >> On Wed, Jan 18, 2023 at 9:15 PM Rongwei Liu <rongweil@nvidia.com> wrote:
> >>>
> >>> When a DPDK application must be upgraded,
> >>> the traffic downtime should be shortened as much as possible.
> >>> During the migration time, the old application may stay alive
> >>> while the new application is starting and being configured.
> >>>
> >>> In order to optimize the switch to the new application,
> >>> the old application may need to be aware of the presence
> >>> of the new application being prepared.
> >>> This is achieved with a new API allowing the user to change the
> >>> new application state to standby and active later.
> >>>
> >>> The added function is trying to apply the new state to all probed
> >>> ethdev ports. To make this API simple and easy to use,
> >>> the same flags have to be accepted by all devices.
> >>>
> >>> This is the scenario of operations in the old and new applications:
> >>> .       device: already configured by the old application
> >>> .       new:    start as active
> >>> .       new:    probe the same device
> >>
> >> How to probe same device if is already bind to another application?
> >> vfio-pci wont allow this.
> >
> > I missed that part.
> > There is no way to share a VFIO device between 2 applications?
>
> As I understand multi-process shares an VFIO device between
> many application. As far as I remember it is just required to
> pass corresponding file descriptor to another application.

I think, Here, it is two different primary application.

Even if it is primary-secondary kind of case, bus or pci driver layer
needs fixup,
Currently, if we add allow list same PCI device on primary and secondary,
the second app start will fail.

>
> Anyway I fully agree that the patch requires more documentation
> in doc/guides/ with the description of live migration theory of
> operations.
>
> >>> .       new:    set as standby
> >>> .       new:    configure the device
> >>> .       device: has configurations from old and new applications
> >>> .       old:    clear its device configuration
> >>> .       device: has only 1 configuration from new application
> >>> .       new:    set as active
> >>> .       device: downtime for connecting all to the new application
> >>> .       old:    shutdown
> >
> >
> >
>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v4 2/3] ethdev: add standby state for live migration
  2023-02-01  8:40                           ` Jerin Jacob
@ 2023-02-01  8:46                             ` Thomas Monjalon
  2023-02-02 10:23                               ` Rongwei Liu
  0 siblings, 1 reply; 54+ messages in thread
From: Thomas Monjalon @ 2023-02-01  8:46 UTC (permalink / raw)
  To: Andrew Rybchenko, Jerin Jacob
  Cc: Rongwei Liu, dev, matan, viacheslavo, orika, stephen, rasland,
	Ferruh Yigit, bruce.richardson

01/02/2023 09:40, Jerin Jacob:
> On Wed, Feb 1, 2023 at 1:02 PM Andrew Rybchenko
> <andrew.rybchenko@oktetlabs.ru> wrote:
> >
> > On 2/1/23 01:55, Thomas Monjalon wrote:
> > > 31/01/2023 19:14, Jerin Jacob:
> > >> On Wed, Jan 18, 2023 at 9:15 PM Rongwei Liu <rongweil@nvidia.com> wrote:
> > >>>
> > >>> When a DPDK application must be upgraded,
> > >>> the traffic downtime should be shortened as much as possible.
> > >>> During the migration time, the old application may stay alive
> > >>> while the new application is starting and being configured.
> > >>>
> > >>> In order to optimize the switch to the new application,
> > >>> the old application may need to be aware of the presence
> > >>> of the new application being prepared.
> > >>> This is achieved with a new API allowing the user to change the
> > >>> new application state to standby and active later.
> > >>>
> > >>> The added function is trying to apply the new state to all probed
> > >>> ethdev ports. To make this API simple and easy to use,
> > >>> the same flags have to be accepted by all devices.
> > >>>
> > >>> This is the scenario of operations in the old and new applications:
> > >>> .       device: already configured by the old application
> > >>> .       new:    start as active
> > >>> .       new:    probe the same device
> > >>
> > >> How to probe same device if is already bind to another application?
> > >> vfio-pci wont allow this.
> > >
> > > I missed that part.
> > > There is no way to share a VFIO device between 2 applications?
> >
> > As I understand multi-process shares an VFIO device between
> > many application. As far as I remember it is just required to
> > pass corresponding file descriptor to another application.
> 
> I think, Here, it is two different primary application.

Yes it is 2 primary applications.

> Even if it is primary-secondary kind of case, bus or pci driver layer
> needs fixup,
> Currently, if we add allow list same PCI device on primary and secondary,
> the second app start will fail.

Can we imagine passing a VFIO handle to the new application?




^ permalink raw reply	[flat|nested] 54+ messages in thread

* RE: [PATCH v4 2/3] ethdev: add standby state for live migration
  2023-02-01  8:46                             ` Thomas Monjalon
@ 2023-02-02 10:23                               ` Rongwei Liu
  0 siblings, 0 replies; 54+ messages in thread
From: Rongwei Liu @ 2023-02-02 10:23 UTC (permalink / raw)
  To: NBU-Contact-Thomas Monjalon (EXTERNAL), Andrew Rybchenko, Jerin Jacob
  Cc: dev, Matan Azrad, Slava Ovsiienko, Ori Kam, stephen,
	Raslan Darawsheh, Ferruh Yigit, bruce.richardson

Hi Jerin and Thomas

BR
Rongwei

> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Wednesday, February 1, 2023 16:47
> To: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>; Jerin Jacob
> <jerinjacobk@gmail.com>
> Cc: Rongwei Liu <rongweil@nvidia.com>; dev@dpdk.org; Matan Azrad
> <matan@nvidia.com>; Slava Ovsiienko <viacheslavo@nvidia.com>; Ori Kam
> <orika@nvidia.com>; stephen@networkplumber.org; Raslan Darawsheh
> <rasland@nvidia.com>; Ferruh Yigit <ferruh.yigit@amd.com>;
> bruce.richardson@intel.com
> Subject: Re: [PATCH v4 2/3] ethdev: add standby state for live migration
> 
> External email: Use caution opening links or attachments
> 
> 
> 01/02/2023 09:40, Jerin Jacob:
> > On Wed, Feb 1, 2023 at 1:02 PM Andrew Rybchenko
> > <andrew.rybchenko@oktetlabs.ru> wrote:
> > >
> > > On 2/1/23 01:55, Thomas Monjalon wrote:
> > > > 31/01/2023 19:14, Jerin Jacob:
> > > >> On Wed, Jan 18, 2023 at 9:15 PM Rongwei Liu <rongweil@nvidia.com>
> wrote:
> > > >>>
> > > >>> When a DPDK application must be upgraded, the traffic downtime
> > > >>> should be shortened as much as possible.
> > > >>> During the migration time, the old application may stay alive
> > > >>> while the new application is starting and being configured.
> > > >>>
> > > >>> In order to optimize the switch to the new application, the old
> > > >>> application may need to be aware of the presence of the new
> > > >>> application being prepared.
> > > >>> This is achieved with a new API allowing the user to change the
> > > >>> new application state to standby and active later.
> > > >>>
> > > >>> The added function is trying to apply the new state to all
> > > >>> probed ethdev ports. To make this API simple and easy to use,
> > > >>> the same flags have to be accepted by all devices.
> > > >>>
> > > >>> This is the scenario of operations in the old and new applications:
> > > >>> .       device: already configured by the old application
> > > >>> .       new:    start as active
> > > >>> .       new:    probe the same device
> > > >>
> > > >> How to probe same device if is already bind to another application?
> > > >> vfio-pci wont allow this.
> > > >
> > > > I missed that part.
> > > > There is no way to share a VFIO device between 2 applications?
> > >
> > > As I understand multi-process shares an VFIO device between many
> > > application. As far as I remember it is just required to pass
> > > corresponding file descriptor to another application.
Thanks for the comments. I missed this part when designed this feature.
Per my test, vfio-pci doesn't support two primary applications using the same handle(/dev/vfio/***).
We will move this back to mlx5 specific. 
> >
> > I think, Here, it is two different primary application.
> 
> Yes it is 2 primary applications.
> 
> > Even if it is primary-secondary kind of case, bus or pci driver layer
> > needs fixup, Currently, if we add allow list same PCI device on
> > primary and secondary, the second app start will fail.
> 
> Can we imagine passing a VFIO handle to the new application?
> 
> 


^ permalink raw reply	[flat|nested] 54+ messages in thread

* RE: [PATCH v4 1/3] ethdev: add flow rule group description
  2023-01-31 11:53                     ` Ori Kam
@ 2023-02-06 12:15                       ` Rongwei Liu
  0 siblings, 0 replies; 54+ messages in thread
From: Rongwei Liu @ 2023-02-06 12:15 UTC (permalink / raw)
  To: Ori Kam, dev, Matan Azrad, Slava Ovsiienko,
	NBU-Contact-Thomas Monjalon (EXTERNAL),
	jerinjacobk, stephen
  Cc: Raslan Darawsheh, Ferruh Yigit, Andrew Rybchenko

HI:
Andrew, Thomas, Ferruh, are we good with this commit. Can you share some comments?
We still need this one after moving live migration API to MLX private.

BR
Rongwei

> -----Original Message-----
> From: Ori Kam <orika@nvidia.com>
> Sent: Tuesday, January 31, 2023 19:53
> To: Rongwei Liu <rongweil@nvidia.com>; dev@dpdk.org; Matan Azrad
> <matan@nvidia.com>; Slava Ovsiienko <viacheslavo@nvidia.com>; NBU-
> Contact-Thomas Monjalon (EXTERNAL) <thomas@monjalon.net>;
> jerinjacobk@gmail.com; stephen@networkplumber.org
> Cc: Raslan Darawsheh <rasland@nvidia.com>; Ferruh Yigit
> <ferruh.yigit@amd.com>; Andrew Rybchenko
> <andrew.rybchenko@oktetlabs.ru>
> Subject: RE: [PATCH v4 1/3] ethdev: add flow rule group description
> 
> Hi Rongwei,
> 
> > -----Original Message-----
> > From: Rongwei Liu <rongweil@nvidia.com>
> > Sent: Wednesday, 18 January 2023 17:45
> >
> > Add more sentences to describe the group concepts and define group 0
> > as root group for traffic to search a hit rule.
> >
> > Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
> > ---
> >  lib/ethdev/rte_flow.h | 13 ++++++++++++-
> >  1 file changed, 12 insertions(+), 1 deletion(-)
> >
> > diff --git a/lib/ethdev/rte_flow.h b/lib/ethdev/rte_flow.h index
> > b60987db4b..e71ac0c199 100644
> > --- a/lib/ethdev/rte_flow.h
> > +++ b/lib/ethdev/rte_flow.h
> > @@ -86,7 +86,18 @@ extern "C" {
> >   * but may be valid in a few cases.
> >   */
> >  struct rte_flow_attr {
> > -	uint32_t group; /**< Priority group. */
> > +	/**
> > +	 * A group is a superset of multiple rules.
> > +	 * The default group is 0 and is processed for all packets.
> > +	 * The group 0 of bifurcated drivers is shared with the kernel.
> > +	 * Rules in other groups are processed only if the group is chained
> > +	 * by a jump action from a previously matched rule.
> > +	 * It means the group hierarchy is made by the flow rules,
> > +	 * and the group 0 is the hierarchy root.
> > +	 * Note there is no automatic dead loop protection.
> > +	 * @see rte_flow_action_jump
> > +	 */
> > +	uint32_t group;
> >  	uint32_t priority; /**< Rule priority level within group. */
> >  	/**
> >  	 * The rule in question applies to ingress traffic (non-"transfer").
> > --
> > 2.27.0
> 
> Acked-by: Ori Kam <orika@nvidia.com>
> Best,
> Ori

^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH v5] ethdev: add flow rule group description
  2023-01-18 15:44                   ` [PATCH v4 1/3] ethdev: add flow rule group description Rongwei Liu
  2023-01-31 11:53                     ` Ori Kam
@ 2023-02-07  2:57                     ` Rongwei Liu
  2023-02-08 20:28                       ` Ferruh Yigit
  1 sibling, 1 reply; 54+ messages in thread
From: Rongwei Liu @ 2023-02-07  2:57 UTC (permalink / raw)
  To: dev, matan, viacheslavo, orika, thomas
  Cc: rasland, Ferruh Yigit, Andrew Rybchenko

Add more sentences to describe the group concepts
and define group 0 as root group for traffic to search a
hit rule.

Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
---
 lib/ethdev/rte_flow.h | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/lib/ethdev/rte_flow.h b/lib/ethdev/rte_flow.h
index b60987db4b..e71ac0c199 100644
--- a/lib/ethdev/rte_flow.h
+++ b/lib/ethdev/rte_flow.h
@@ -86,7 +86,18 @@ extern "C" {
  * but may be valid in a few cases.
  */
 struct rte_flow_attr {
-	uint32_t group; /**< Priority group. */
+	/**
+	 * A group is a superset of multiple rules.
+	 * The default group is 0 and is processed for all packets.
+	 * The group 0 of bifurcated drivers is shared with the kernel.
+	 * Rules in other groups are processed only if the group is chained
+	 * by a jump action from a previously matched rule.
+	 * It means the group hierarchy is made by the flow rules,
+	 * and the group 0 is the hierarchy root.
+	 * Note there is no automatic dead loop protection.
+	 * @see rte_flow_action_jump
+	 */
+	uint32_t group;
 	uint32_t priority; /**< Rule priority level within group. */
 	/**
 	 * The rule in question applies to ingress traffic (non-"transfer").
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* Re: [PATCH v5] ethdev: add flow rule group description
  2023-02-07  2:57                     ` [PATCH v5] " Rongwei Liu
@ 2023-02-08 20:28                       ` Ferruh Yigit
  2023-02-09  2:06                         ` Rongwei Liu
  2023-02-09  7:32                         ` [PATCH v6] " Rongwei Liu
  0 siblings, 2 replies; 54+ messages in thread
From: Ferruh Yigit @ 2023-02-08 20:28 UTC (permalink / raw)
  To: Rongwei Liu, dev, matan, viacheslavo, orika, thomas
  Cc: rasland, Andrew Rybchenko

On 2/7/2023 2:57 AM, Rongwei Liu wrote:
> Add more sentences to describe the group concepts
> and define group 0 as root group for traffic to search a
> hit rule.
> 
> Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
> Acked-by: Ori Kam <orika@nvidia.com>
> ---
>  lib/ethdev/rte_flow.h | 13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/ethdev/rte_flow.h b/lib/ethdev/rte_flow.h
> index b60987db4b..e71ac0c199 100644
> --- a/lib/ethdev/rte_flow.h
> +++ b/lib/ethdev/rte_flow.h
> @@ -86,7 +86,18 @@ extern "C" {
>   * but may be valid in a few cases.
>   */
>  struct rte_flow_attr {
> -	uint32_t group; /**< Priority group. */
> +	/**
> +	 * A group is a superset of multiple rules.
> +	 * The default group is 0 and is processed for all packets.
> +	 * The group 0 of bifurcated drivers is shared with the kernel.
> +	 * Rules in other groups are processed only if the group is chained
> +	 * by a jump action from a previously matched rule.
> +	 * It means the group hierarchy is made by the flow rules,
> +	 * and the group 0 is the hierarchy root.
> +	 * Note there is no automatic dead loop protection.
> +	 * @see rte_flow_action_jump
> +	 */
> +	uint32_t group;

Hi Rongwei, Ori,

The elaborated comment looks matching with flow API documentation [1],
except there is additional information here about default group being
shared with kernel for bifurcated drivers.

Should this additional information added to the flow API documentation?



[1]
https://doc.dpdk.org/guides/prog_guide/rte_flow.html



^ permalink raw reply	[flat|nested] 54+ messages in thread

* RE: [PATCH v5] ethdev: add flow rule group description
  2023-02-08 20:28                       ` Ferruh Yigit
@ 2023-02-09  2:06                         ` Rongwei Liu
  2023-02-09  7:32                         ` [PATCH v6] " Rongwei Liu
  1 sibling, 0 replies; 54+ messages in thread
From: Rongwei Liu @ 2023-02-09  2:06 UTC (permalink / raw)
  To: Ferruh Yigit, dev, Matan Azrad, Slava Ovsiienko, Ori Kam,
	NBU-Contact-Thomas Monjalon (EXTERNAL)
  Cc: Raslan Darawsheh, Andrew Rybchenko

HI Ferruh:

BR
Rongwei

> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@amd.com>
> Sent: Thursday, February 9, 2023 04:28
> To: Rongwei Liu <rongweil@nvidia.com>; dev@dpdk.org; Matan Azrad
> <matan@nvidia.com>; Slava Ovsiienko <viacheslavo@nvidia.com>; Ori Kam
> <orika@nvidia.com>; NBU-Contact-Thomas Monjalon (EXTERNAL)
> <thomas@monjalon.net>
> Cc: Raslan Darawsheh <rasland@nvidia.com>; Andrew Rybchenko
> <andrew.rybchenko@oktetlabs.ru>
> Subject: Re: [PATCH v5] ethdev: add flow rule group description
> 
> External email: Use caution opening links or attachments
> 
> 
> On 2/7/2023 2:57 AM, Rongwei Liu wrote:
> > Add more sentences to describe the group concepts and define group 0
> > as root group for traffic to search a hit rule.
> >
> > Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
> > Acked-by: Ori Kam <orika@nvidia.com>
> > ---
> >  lib/ethdev/rte_flow.h | 13 ++++++++++++-
> >  1 file changed, 12 insertions(+), 1 deletion(-)
> >
> > diff --git a/lib/ethdev/rte_flow.h b/lib/ethdev/rte_flow.h index
> > b60987db4b..e71ac0c199 100644
> > --- a/lib/ethdev/rte_flow.h
> > +++ b/lib/ethdev/rte_flow.h
> > @@ -86,7 +86,18 @@ extern "C" {
> >   * but may be valid in a few cases.
> >   */
> >  struct rte_flow_attr {
> > -     uint32_t group; /**< Priority group. */
> > +     /**
> > +      * A group is a superset of multiple rules.
> > +      * The default group is 0 and is processed for all packets.
> > +      * The group 0 of bifurcated drivers is shared with the kernel.
> > +      * Rules in other groups are processed only if the group is chained
> > +      * by a jump action from a previously matched rule.
> > +      * It means the group hierarchy is made by the flow rules,
> > +      * and the group 0 is the hierarchy root.
> > +      * Note there is no automatic dead loop protection.
> > +      * @see rte_flow_action_jump
> > +      */
> > +     uint32_t group;
> 
> Hi Rongwei, Ori,
> 
> The elaborated comment looks matching with flow API documentation [1],
> except there is additional information here about default group being shared
> with kernel for bifurcated drivers.
> 
> Should this additional information added to the flow API documentation?
Agree with you, it' better to be present at NIC documents.
> 
> 
> 
> [1]
> https://doc.dpdk.org/guides/prog_guide/rte_flow.html
> 


^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH v6] ethdev: add flow rule group description
  2023-02-08 20:28                       ` Ferruh Yigit
  2023-02-09  2:06                         ` Rongwei Liu
@ 2023-02-09  7:32                         ` Rongwei Liu
  2023-02-09  8:01                           ` Ori Kam
  1 sibling, 1 reply; 54+ messages in thread
From: Rongwei Liu @ 2023-02-09  7:32 UTC (permalink / raw)
  To: dev, matan, viacheslavo, orika, thomas
  Cc: rasland, Ferruh Yigit, Andrew Rybchenko

Add more sentences to describe the group concepts
and define group 0 as root group for traffic to search a
hit rule.

Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
---
 lib/ethdev/rte_flow.h | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/lib/ethdev/rte_flow.h b/lib/ethdev/rte_flow.h
index b60987db4b..321cf06fbc 100644
--- a/lib/ethdev/rte_flow.h
+++ b/lib/ethdev/rte_flow.h
@@ -86,7 +86,17 @@ extern "C" {
  * but may be valid in a few cases.
  */
 struct rte_flow_attr {
-	uint32_t group; /**< Priority group. */
+	/**
+	 * A group is a superset of multiple rules.
+	 * The default group is 0 and is processed for all packets.
+	 * Rules in other groups are processed only if the group is chained
+	 * by a jump action from a previously matched rule.
+	 * It means the group hierarchy is made by the flow rules,
+	 * and the group 0 is the hierarchy root.
+	 * Note there is no automatic dead loop protection.
+	 * @see rte_flow_action_jump
+	 */
+	uint32_t group;
 	uint32_t priority; /**< Rule priority level within group. */
 	/**
 	 * The rule in question applies to ingress traffic (non-"transfer").
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* RE: [PATCH v6] ethdev: add flow rule group description
  2023-02-09  7:32                         ` [PATCH v6] " Rongwei Liu
@ 2023-02-09  8:01                           ` Ori Kam
  2023-02-09 11:26                             ` Ferruh Yigit
  0 siblings, 1 reply; 54+ messages in thread
From: Ori Kam @ 2023-02-09  8:01 UTC (permalink / raw)
  To: Rongwei Liu, dev, Matan Azrad, Slava Ovsiienko,
	NBU-Contact-Thomas Monjalon (EXTERNAL)
  Cc: Raslan Darawsheh, Ferruh Yigit, Andrew Rybchenko

Hi Rongwei,

> -----Original Message-----
> From: Rongwei Liu <rongweil@nvidia.com>
> Sent: Thursday, 9 February 2023 9:33
> 
> Add more sentences to describe the group concepts
> and define group 0 as root group for traffic to search a
> hit rule.
> 
> Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
> ---
>  lib/ethdev/rte_flow.h | 12 +++++++++++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/ethdev/rte_flow.h b/lib/ethdev/rte_flow.h
> index b60987db4b..321cf06fbc 100644
> --- a/lib/ethdev/rte_flow.h
> +++ b/lib/ethdev/rte_flow.h
> @@ -86,7 +86,17 @@ extern "C" {
>   * but may be valid in a few cases.
>   */
>  struct rte_flow_attr {
> -	uint32_t group; /**< Priority group. */
> +	/**
> +	 * A group is a superset of multiple rules.
> +	 * The default group is 0 and is processed for all packets.
> +	 * Rules in other groups are processed only if the group is chained
> +	 * by a jump action from a previously matched rule.
> +	 * It means the group hierarchy is made by the flow rules,
> +	 * and the group 0 is the hierarchy root.
> +	 * Note there is no automatic dead loop protection.
> +	 * @see rte_flow_action_jump
> +	 */
> +	uint32_t group;
>  	uint32_t priority; /**< Rule priority level within group. */
>  	/**
>  	 * The rule in question applies to ingress traffic (non-"transfer").
> --
> 2.27.0

Acked-by: Ori Kam <orika@nvidia.com>
Best,
Ori

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v6] ethdev: add flow rule group description
  2023-02-09  8:01                           ` Ori Kam
@ 2023-02-09 11:26                             ` Ferruh Yigit
  0 siblings, 0 replies; 54+ messages in thread
From: Ferruh Yigit @ 2023-02-09 11:26 UTC (permalink / raw)
  To: Ori Kam, Rongwei Liu, dev, Matan Azrad, Slava Ovsiienko,
	NBU-Contact-Thomas Monjalon (EXTERNAL)
  Cc: Raslan Darawsheh, Andrew Rybchenko

On 2/9/2023 8:01 AM, Ori Kam wrote:
> Hi Rongwei,
> 
>> -----Original Message-----
>> From: Rongwei Liu <rongweil@nvidia.com>
>> Sent: Thursday, 9 February 2023 9:33
>>
>> Add more sentences to describe the group concepts
>> and define group 0 as root group for traffic to search a
>> hit rule.
>>
>> Signed-off-by: Rongwei Liu <rongweil@nvidia.com>
> 
> Acked-by: Ori Kam <orika@nvidia.com>
> 

Applied to dpdk-next-net/main, thanks.


^ permalink raw reply	[flat|nested] 54+ messages in thread

end of thread, other threads:[~2023-02-09 11:26 UTC | newest]

Thread overview: 54+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-12-01  8:20 [RFC 0/2] add API to set process to primary or secondary Rongwei Liu
2022-12-01  8:20 ` [RFC 1/2] ethdev: add group description Rongwei Liu
2022-12-01  8:20 ` [RFC 2/2] ethdev: add API to set process to primary or secondary Rongwei Liu
2022-12-01 15:10   ` Stephen Hemminger
2022-12-02  3:27     ` Rongwei Liu
2022-12-05 16:08       ` Stephen Hemminger
2022-12-06  3:47         ` Rongwei Liu
2022-12-06  5:54           ` Stephen Hemminger
2022-12-21  9:00             ` [RFC v3 0/2] add API to set process to active or standby Rongwei Liu
2022-12-21  9:00               ` [RFC v3 1/2] ethdev: add group description Rongwei Liu
2023-01-18 15:44                 ` [PATCH v4 0/3] add API for live migration Rongwei Liu
2023-01-18 15:44                   ` [PATCH v4 1/3] ethdev: add flow rule group description Rongwei Liu
2023-01-31 11:53                     ` Ori Kam
2023-02-06 12:15                       ` Rongwei Liu
2023-02-07  2:57                     ` [PATCH v5] " Rongwei Liu
2023-02-08 20:28                       ` Ferruh Yigit
2023-02-09  2:06                         ` Rongwei Liu
2023-02-09  7:32                         ` [PATCH v6] " Rongwei Liu
2023-02-09  8:01                           ` Ori Kam
2023-02-09 11:26                             ` Ferruh Yigit
2023-01-18 15:44                   ` [PATCH v4 2/3] ethdev: add standby state for live migration Rongwei Liu
2023-01-31 13:50                     ` Ori Kam
2023-01-31 18:14                     ` Jerin Jacob
2023-01-31 22:55                       ` Thomas Monjalon
2023-02-01  7:32                         ` Andrew Rybchenko
2023-02-01  8:31                           ` Thomas Monjalon
2023-02-01  8:40                           ` Jerin Jacob
2023-02-01  8:46                             ` Thomas Monjalon
2023-02-02 10:23                               ` Rongwei Liu
2023-02-01  7:52                     ` Andrew Rybchenko
2023-02-01  8:27                       ` Thomas Monjalon
2023-02-01  8:40                         ` Andrew Rybchenko
2023-01-18 15:44                   ` [PATCH v4 3/3] ethdev: add standby flags " Rongwei Liu
2023-01-23 13:20                     ` Jerin Jacob
2023-01-30  2:47                       ` Rongwei Liu
2023-01-30 17:10                         ` Jerin Jacob
2023-01-31  2:53                           ` Rongwei Liu
2023-01-31  8:45                             ` Jerin Jacob
2023-01-31  9:01                               ` Rongwei Liu
2023-01-31 14:37                                 ` Jerin Jacob
2023-01-31 14:45                                   ` Ori Kam
2023-01-31 17:50                                     ` Thomas Monjalon
2023-01-31 18:10                                       ` Jerin Jacob
2022-12-21  9:00               ` [RFC v3 2/2] ethdev: add API to set process to active or standby Rongwei Liu
2022-12-21  9:12                 ` Jerin Jacob
2022-12-21  9:32                   ` Rongwei Liu
2022-12-21 10:59                     ` Jerin Jacob
2022-12-21 12:05                       ` Rongwei Liu
2022-12-21 12:44                         ` Jerin Jacob
2022-12-21 12:50                           ` Rongwei Liu
2022-12-21 13:12                             ` Jerin Jacob
2022-12-21 14:33                               ` Rongwei Liu
2022-12-26 16:44                                 ` Ori Kam
2023-01-15 22:46                                   ` Thomas Monjalon

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.