virtualization.lists.linux-foundation.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next v10 0/4] ethtool: provide the dim profile fine-tuning channel
@ 2024-04-25 16:59 Heng Qi
  2024-04-25 16:59 ` [PATCH net-next v10 1/4] linux/dim: move useful macros to .h file Heng Qi
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Heng Qi @ 2024-04-25 16:59 UTC (permalink / raw)
  To: netdev, virtualization, Jakub Kicinski
  Cc: David S . Miller, Paolo Abeni, Eric Dumazet, Jason Wang,
	Michael S . Tsirkin, Brett Creeley, Ratheesh Kannoth,
	Alexander Lobakin, Xuan Zhuo, Tal Gilboa, Jonathan Corbet,
	linux-doc, Maxime Chevallier, Jiri Pirko, Paul Greenwalt,
	Ahmed Zaki, Vladimir Oltean, Kory Maincent, Andrew Lunn,
	justinstitt @ google . com

The NetDIM library provides excellent acceleration for many modern
network cards. However, the default profiles of DIM limits its maximum
capabilities for different NICs, so providing a way which the NIC can
be custom configured is necessary.

Currently, the way is based on the commonly used "ethtool -C".

Please review, thank you very much!

Changelog
=====
v9->v10:
  - Collect dim related flags/mode/work into one place.
  - Use rx_profile + tx_profile instead of four profiles.
  - Add several helps.
  - Update commit logs.

v8->v9:
  - Fix the compilation error of conflicting names of rx_profile in
    dim.h and ice driver: in dim.h, rx_profile is replaced with
    dim_rx_profile. So does tx_profile.

v7->v8:
  - Use kmemdup() instead of kzalloc()/memcpy() in dev_dim_profile_init().

v6->v7:
  - A new wrapper struct pointer is used in struct net_device.
  - Add IS_ENABLED(CONFIG_DIMLIB) to avoid compiler warnings.
  - Profile fields changed from u16 to u32.

v5->v6:
  - Place the profile in netdevice to bypass the driver.
    The interaction code of ethtool <-> kernel has not changed at all,
    only the interaction part of kernel <-> driver has changed.

v4->v5:
  - Update some snippets from Kuba, Thanks.

v3->v4:
  - Some tiny updates and patch 1 only add a new comment.

v2->v3:
  - Break up the attributes to avoid the use of raw c structs.
  - Use per-device profile instead of global profile in the driver.

v1->v2:
  - Use ethtool tool instead of net-sysfs

Heng Qi (4):
  linux/dim: move useful macros to .h file
  ethtool: provide customized dim profile management
  dim: add new interfaces for initialization and getting results
  virtio-net: support dim profile fine-tuning

 Documentation/netlink/specs/ethtool.yaml     |  23 ++
 Documentation/networking/ethtool-netlink.rst |   4 +
 drivers/net/virtio_net.c                     |  44 +++-
 include/linux/dim.h                          | 115 ++++++++
 include/linux/ethtool.h                      |   7 +-
 include/linux/netdevice.h                    |   5 +
 include/uapi/linux/ethtool_netlink.h         |  20 ++
 lib/dim/net_dim.c                            | 139 ++++++++++
 net/ethtool/coalesce.c                       | 264 ++++++++++++++++++-
 9 files changed, 612 insertions(+), 9 deletions(-)

-- 
2.32.0.3.g01195cf9f


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH net-next v10 1/4] linux/dim: move useful macros to .h file
  2024-04-25 16:59 [PATCH net-next v10 0/4] ethtool: provide the dim profile fine-tuning channel Heng Qi
@ 2024-04-25 16:59 ` Heng Qi
  2024-04-25 16:59 ` [PATCH net-next v10 2/4] ethtool: provide customized dim profile management Heng Qi
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 11+ messages in thread
From: Heng Qi @ 2024-04-25 16:59 UTC (permalink / raw)
  To: netdev, virtualization, Jakub Kicinski
  Cc: David S . Miller, Paolo Abeni, Eric Dumazet, Jason Wang,
	Michael S . Tsirkin, Brett Creeley, Ratheesh Kannoth,
	Alexander Lobakin, Xuan Zhuo, Tal Gilboa, Jonathan Corbet,
	linux-doc, Maxime Chevallier, Jiri Pirko, Paul Greenwalt,
	Ahmed Zaki, Vladimir Oltean, Kory Maincent, Andrew Lunn,
	justinstitt @ google . com

Useful macros will be used effectively elsewhere.
These will be utilized in subsequent patches.

Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
---
 include/linux/dim.h | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/include/linux/dim.h b/include/linux/dim.h
index f343bc9aa2ec..43398f5eade2 100644
--- a/include/linux/dim.h
+++ b/include/linux/dim.h
@@ -10,6 +10,13 @@
 #include <linux/types.h>
 #include <linux/workqueue.h>
 
+/* Number of DIM profiles and period mode. */
+#define NET_DIM_PARAMS_NUM_PROFILES 5
+#define NET_DIM_DEFAULT_RX_CQ_PKTS_FROM_EQE 256
+#define NET_DIM_DEFAULT_TX_CQ_PKTS_FROM_EQE 128
+#define NET_DIM_DEF_PROFILE_CQE 1
+#define NET_DIM_DEF_PROFILE_EQE 1
+
 /*
  * Number of events between DIM iterations.
  * Causes a moderation of the algorithm run.
-- 
2.32.0.3.g01195cf9f


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH net-next v10 2/4] ethtool: provide customized dim profile management
  2024-04-25 16:59 [PATCH net-next v10 0/4] ethtool: provide the dim profile fine-tuning channel Heng Qi
  2024-04-25 16:59 ` [PATCH net-next v10 1/4] linux/dim: move useful macros to .h file Heng Qi
@ 2024-04-25 16:59 ` Heng Qi
  2024-04-27  1:33   ` Jakub Kicinski
  2024-04-25 16:59 ` [PATCH net-next v10 3/4] dim: add new interfaces for initialization and getting results Heng Qi
  2024-04-25 16:59 ` [PATCH net-next v10 4/4] virtio-net: support dim profile fine-tuning Heng Qi
  3 siblings, 1 reply; 11+ messages in thread
From: Heng Qi @ 2024-04-25 16:59 UTC (permalink / raw)
  To: netdev, virtualization, Jakub Kicinski
  Cc: David S . Miller, Paolo Abeni, Eric Dumazet, Jason Wang,
	Michael S . Tsirkin, Brett Creeley, Ratheesh Kannoth,
	Alexander Lobakin, Xuan Zhuo, Tal Gilboa, Jonathan Corbet,
	linux-doc, Maxime Chevallier, Jiri Pirko, Paul Greenwalt,
	Ahmed Zaki, Vladimir Oltean, Kory Maincent, Andrew Lunn,
	justinstitt @ google . com

The NetDIM library, currently leveraged by an array of NICs, delivers
excellent acceleration benefits. Nevertheless, NICs vary significantly
in their dim profile list prerequisites.

Specifically, virtio-net backends may present diverse sw or hw device
implementation, making a one-size-fits-all parameter list impractical.
On Alibaba Cloud, the virtio DPU's performance under the default DIM
profile falls short of expectations, partly due to a mismatch in
parameter configuration.

I also noticed that ice/idpf/ena and other NICs have customized
profilelist or placed some restrictions on dim capabilities.

Motivated by this, I tried adding new params for "ethtool -C" that provides
a per-device control to modify and access a device's interrupt parameters.

Usage
========
The target NIC is named ethx.

Assume that ethx only declares support for rx profile setting
(with DIM_PROFILE_RX flag set in profile_flags) and supports modification
of usec and pkt fields.

1. Query the currently customized list of the device

$ ethtool -c ethx
...
rx-profile:
{.usec =   1, .pkts = 256, .comps = n/a,},
{.usec =   8, .pkts = 256, .comps = n/a,},
{.usec =  64, .pkts = 256, .comps = n/a,},
{.usec = 128, .pkts = 256, .comps = n/a,},
{.usec = 256, .pkts = 256, .comps = n/a,}
tx-profile:   n/a

2. Tune
$ ethtool -C ethx rx-profile 1,1,n_2,n,n_3,3,n_4,4,n_n,5,n
"n" means do not modify this field.
$ ethtool -c ethx
...
rx-profile:
{.usec =   1, .pkts =   1, .comps = n/a,},
{.usec =   2, .pkts = 256, .comps = n/a,},
{.usec =   3, .pkts =   3, .comps = n/a,},
{.usec =   4, .pkts =   4, .comps = n/a,},
{.usec = 256, .pkts =   5, .comps = n/a,}
tx-profile:   n/a

3. Hint
If the device does not support some type of customized dim profiles,
the corresponding "n/a" will display.

If the "n/a" field is being modified, -EOPNOTSUPP will be reported.

Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
---
 Documentation/netlink/specs/ethtool.yaml     |  23 ++
 Documentation/networking/ethtool-netlink.rst |   4 +
 include/linux/dim.h                          |  60 +++++
 include/linux/ethtool.h                      |   7 +-
 include/linux/netdevice.h                    |   5 +
 include/uapi/linux/ethtool_netlink.h         |  20 ++
 lib/dim/net_dim.c                            |  73 +++++
 net/ethtool/coalesce.c                       | 264 ++++++++++++++++++-
 8 files changed, 454 insertions(+), 2 deletions(-)

diff --git a/Documentation/netlink/specs/ethtool.yaml b/Documentation/netlink/specs/ethtool.yaml
index 87ae7b397984..3c51a1a0b5d9 100644
--- a/Documentation/netlink/specs/ethtool.yaml
+++ b/Documentation/netlink/specs/ethtool.yaml
@@ -413,6 +413,18 @@ attribute-sets:
       -
         name: combined-count
         type: u32
+  -
+    name: irq-moderation
+    attributes:
+      -
+        name: usec
+        type: u32
+      -
+        name: pkts
+        type: u32
+      -
+        name: comps
+        type: u32
 
   -
     name: coalesce
@@ -502,6 +514,15 @@ attribute-sets:
       -
         name: tx-aggr-time-usecs
         type: u32
+      -
+        name: rx-profile
+        type: nest
+        nested-attributes: irq-moderation
+      -
+        name: tx-profile
+        type: nest
+        nested-attributes: irq-moderation
+
   -
     name: pause-stat
     attributes:
@@ -1313,6 +1334,8 @@ operations:
             - tx-aggr-max-bytes
             - tx-aggr-max-frames
             - tx-aggr-time-usecs
+            - rx-profile
+            - tx-profile
       dump: *coalesce-get-op
     -
       name: coalesce-set
diff --git a/Documentation/networking/ethtool-netlink.rst b/Documentation/networking/ethtool-netlink.rst
index 4e63d3708ed9..78ee25081498 100644
--- a/Documentation/networking/ethtool-netlink.rst
+++ b/Documentation/networking/ethtool-netlink.rst
@@ -1040,6 +1040,8 @@ Kernel response contents:
   ``ETHTOOL_A_COALESCE_TX_AGGR_MAX_BYTES``     u32     max aggr size, Tx
   ``ETHTOOL_A_COALESCE_TX_AGGR_MAX_FRAMES``    u32     max aggr packets, Tx
   ``ETHTOOL_A_COALESCE_TX_AGGR_TIME_USECS``    u32     time (us), aggr, Tx
+  ``ETHTOOL_A_COALESCE_RX_PROFILE``            nested  profile of DIM, Rx
+  ``ETHTOOL_A_COALESCE_TX_PROFILE``            nested  profile of DIM, Tx
   ===========================================  ======  =======================
 
 Attributes are only included in reply if their value is not zero or the
@@ -1105,6 +1107,8 @@ Request contents:
   ``ETHTOOL_A_COALESCE_TX_AGGR_MAX_BYTES``     u32     max aggr size, Tx
   ``ETHTOOL_A_COALESCE_TX_AGGR_MAX_FRAMES``    u32     max aggr packets, Tx
   ``ETHTOOL_A_COALESCE_TX_AGGR_TIME_USECS``    u32     time (us), aggr, Tx
+  ``ETHTOOL_A_COALESCE_RX_PROFILE``            nested  profile of DIM, Rx
+  ``ETHTOOL_A_COALESCE_TX_PROFILE``            nested  profile of DIM, Tx
   ===========================================  ======  =======================
 
 Request is rejected if it attributes declared as unsupported by driver (i.e.
diff --git a/include/linux/dim.h b/include/linux/dim.h
index 43398f5eade2..af01389fcf39 100644
--- a/include/linux/dim.h
+++ b/include/linux/dim.h
@@ -9,6 +9,7 @@
 #include <linux/module.h>
 #include <linux/types.h>
 #include <linux/workqueue.h>
+#include <linux/netdevice.h>
 
 /* Number of DIM profiles and period mode. */
 #define NET_DIM_PARAMS_NUM_PROFILES 5
@@ -53,6 +54,39 @@ struct dim_cq_moder {
 	u8 cq_period_mode;
 };
 
+#define DIM_PROFILE_RX		BIT(0)	/* support rx dim profile modification */
+#define DIM_PROFILE_TX		BIT(1)	/* support tx dim profile modification */
+
+#define DIM_COALESCE_USEC	BIT(0)	/* support usec field modification */
+#define DIM_COALESCE_PKTS	BIT(1)	/* support pkts field modification */
+#define DIM_COALESCE_COMPS	BIT(2)	/* support comps field modification */
+
+struct dim_irq_moder {
+	/* See DIM_PROFILE_* */
+	u8 profile_flags;
+
+	/* See DIM_COALESCE_* for Rx and Tx */
+	u8 coal_flags;
+
+	/* Rx DIM period count mode: CQE or EQE */
+	u8 dim_rx_mode;
+
+	/* Tx DIM period count mode: CQE or EQE */
+	u8 dim_tx_mode;
+
+	/* DIM profile list for Rx */
+	struct dim_cq_moder __rcu *rx_profile;
+
+	/* DIM profile list for Tx */
+	struct dim_cq_moder __rcu *tx_profile;
+
+	/* Rx DIM worker function scheduled by net_dim() */
+	void (*rx_dim_work)(struct work_struct *work);
+
+	/* Tx DIM worker function scheduled by net_dim() */
+	void (*tx_dim_work)(struct work_struct *work);
+};
+
 /**
  * struct dim_sample - Structure for DIM sample data.
  * Used for communications between DIM and its consumer.
@@ -198,6 +232,32 @@ enum dim_step_result {
 	DIM_ON_EDGE,
 };
 
+/**
+ * net_dim_init_irq_moder - collect information to initialize irq moderation
+ * @dev: target network device
+ * @profile_flags: Rx or Tx profile modification capability
+ * @coal_flags: irq moderation params flags
+ * @rx_mode: CQ period mode for Rx
+ * @tx_mode: CQ period mode for Tx
+ * void (*rx_dim_work)(struct work_struct *work);
+ *	Rx worker called after dim decision.
+ *
+ * void (*tx_dim_work)(struct work_struct *work);
+ *	Tx worker called after dim decision.
+ *
+ * Return: 0 on success or a negative error code.
+ */
+int net_dim_init_irq_moder(struct net_device *dev, u8 profile_flags,
+			   u8 coal_flags, u8 rx_mode, u8 tx_mode,
+			   void (*rx_dim_work)(struct work_struct *work),
+			   void (*tx_dim_work)(struct work_struct *work));
+
+/**
+ * net_dim_free_irq_moder - free fields for irq moderation
+ * @dev: target network device
+ */
+void net_dim_free_irq_moder(struct net_device *dev);
+
 /**
  *	dim_on_top - check if current state is a good place to stop (top location)
  *	@dim: DIM context
diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
index 6fd9107d3cc0..902815b517dc 100644
--- a/include/linux/ethtool.h
+++ b/include/linux/ethtool.h
@@ -284,7 +284,9 @@ bool ethtool_convert_link_mode_to_legacy_u32(u32 *legacy_u32,
 #define ETHTOOL_COALESCE_TX_AGGR_MAX_BYTES	BIT(24)
 #define ETHTOOL_COALESCE_TX_AGGR_MAX_FRAMES	BIT(25)
 #define ETHTOOL_COALESCE_TX_AGGR_TIME_USECS	BIT(26)
-#define ETHTOOL_COALESCE_ALL_PARAMS		GENMASK(26, 0)
+#define ETHTOOL_COALESCE_RX_PROFILE		BIT(27)
+#define ETHTOOL_COALESCE_TX_PROFILE		BIT(28)
+#define ETHTOOL_COALESCE_ALL_PARAMS		GENMASK(28, 0)
 
 #define ETHTOOL_COALESCE_USECS						\
 	(ETHTOOL_COALESCE_RX_USECS | ETHTOOL_COALESCE_TX_USECS)
@@ -316,6 +318,9 @@ bool ethtool_convert_link_mode_to_legacy_u32(u32 *legacy_u32,
 	(ETHTOOL_COALESCE_TX_AGGR_MAX_BYTES |	\
 	 ETHTOOL_COALESCE_TX_AGGR_MAX_FRAMES |	\
 	 ETHTOOL_COALESCE_TX_AGGR_TIME_USECS)
+#define ETHTOOL_COALESCE_PROFILE	\
+	(ETHTOOL_COALESCE_RX_PROFILE |	\
+	 ETHTOOL_COALESCE_TX_PROFILE)
 
 #define ETHTOOL_STAT_NOT_SET	(~0ULL)
 
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index d45f330d083d..d0bcc69b1b82 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2400,6 +2400,11 @@ struct net_device {
 	/** @page_pools: page pools created for this netdevice */
 	struct hlist_head	page_pools;
 #endif
+
+#if IS_ENABLED(CONFIG_DIMLIB)
+	/** @irq_moder: dim related parameters for this netdevice */
+	struct dim_irq_moder	*irq_moder;
+#endif
 };
 #define to_net_dev(d) container_of(d, struct net_device, dev)
 
diff --git a/include/uapi/linux/ethtool_netlink.h b/include/uapi/linux/ethtool_netlink.h
index b4f0d233d048..4837e37e8b10 100644
--- a/include/uapi/linux/ethtool_netlink.h
+++ b/include/uapi/linux/ethtool_netlink.h
@@ -416,12 +416,32 @@ enum {
 	ETHTOOL_A_COALESCE_TX_AGGR_MAX_BYTES,		/* u32 */
 	ETHTOOL_A_COALESCE_TX_AGGR_MAX_FRAMES,		/* u32 */
 	ETHTOOL_A_COALESCE_TX_AGGR_TIME_USECS,		/* u32 */
+	ETHTOOL_A_COALESCE_RX_PROFILE,			/* nest - _A_PROFILE_IRQ_MODERATION */
+	ETHTOOL_A_COALESCE_TX_PROFILE,			/* nest - _A_PROFILE_IRQ_MODERATION */
 
 	/* add new constants above here */
 	__ETHTOOL_A_COALESCE_CNT,
 	ETHTOOL_A_COALESCE_MAX = (__ETHTOOL_A_COALESCE_CNT - 1)
 };
 
+enum {
+	ETHTOOL_A_PROFILE_UNSPEC,
+	ETHTOOL_A_PROFILE_IRQ_MODERATION,		/* nest, _A_IRQ_MODERATION_* */
+
+	__ETHTOOL_A_PROFILE_CNT,
+	ETHTOOL_A_PROFILE_MAX = (__ETHTOOL_A_PROFILE_CNT - 1)
+};
+
+enum {
+	ETHTOOL_A_IRQ_MODERATION_UNSPEC,
+	ETHTOOL_A_IRQ_MODERATION_USEC,			/* u32 */
+	ETHTOOL_A_IRQ_MODERATION_PKTS,			/* u32 */
+	ETHTOOL_A_IRQ_MODERATION_COMPS,			/* u32 */
+
+	__ETHTOOL_A_IRQ_MODERATION_CNT,
+	ETHTOOL_A_IRQ_MODERATION_MAX = (__ETHTOOL_A_IRQ_MODERATION_CNT - 1)
+};
+
 /* PAUSE */
 
 enum {
diff --git a/lib/dim/net_dim.c b/lib/dim/net_dim.c
index 4e32f7aaac86..ec0dc21793c0 100644
--- a/lib/dim/net_dim.c
+++ b/lib/dim/net_dim.c
@@ -101,6 +101,79 @@ net_dim_get_def_tx_moderation(u8 cq_period_mode)
 }
 EXPORT_SYMBOL(net_dim_get_def_tx_moderation);
 
+int net_dim_init_irq_moder(struct net_device *dev, u8 profile_flags,
+			   u8 coal_flags, u8 rx_mode, u8 tx_mode,
+			   void (*rx_dim_work)(struct work_struct *work),
+			   void (*tx_dim_work)(struct work_struct *work))
+{
+	struct dim_cq_moder *rxp, *txp;
+	struct dim_irq_moder *moder;
+	int len;
+
+	dev->irq_moder = kzalloc(sizeof(*dev->irq_moder), GFP_KERNEL);
+	if (!dev->irq_moder)
+		goto err_moder;
+
+	moder = dev->irq_moder;
+	len = NET_DIM_PARAMS_NUM_PROFILES * sizeof(*moder->rx_profile);
+
+	moder->profile_flags = profile_flags;
+	moder->coal_flags = coal_flags;
+
+	if (profile_flags & DIM_PROFILE_RX) {
+		moder->dim_rx_mode = rx_mode;
+		moder->rx_dim_work = rx_dim_work;
+		rxp = kmemdup(rx_profile[rx_mode], len, GFP_KERNEL);
+		if (!rxp)
+			goto err_rx_profile;
+
+		rcu_assign_pointer(moder->rx_profile, rxp);
+	}
+
+	if (profile_flags & DIM_PROFILE_TX) {
+		moder->dim_tx_mode = tx_mode;
+		moder->tx_dim_work = tx_dim_work;
+		txp = kmemdup(tx_profile[tx_mode], len, GFP_KERNEL);
+		if (!txp)
+			goto err_tx_profile;
+
+		rcu_assign_pointer(moder->tx_profile, txp);
+	}
+
+	return 0;
+
+err_tx_profile:
+	kfree(rxp);
+err_rx_profile:
+	kfree(moder);
+err_moder:
+	return -ENOMEM;
+}
+EXPORT_SYMBOL(net_dim_init_irq_moder);
+
+void net_dim_free_irq_moder(struct net_device *dev)
+{
+	struct dim_cq_moder *rx_profile, *tx_profile;
+
+	if (!dev->irq_moder)
+		return;
+
+	rcu_read_lock();
+	rx_profile = rcu_dereference(dev->irq_moder->rx_profile);
+	tx_profile = rcu_dereference(dev->irq_moder->tx_profile);
+	rcu_read_unlock();
+
+	rcu_assign_pointer(dev->irq_moder->tx_profile, NULL);
+	rcu_assign_pointer(dev->irq_moder->rx_profile, NULL);
+
+	synchronize_rcu();
+
+	kfree(rx_profile);
+	kfree(tx_profile);
+	kfree(dev->irq_moder);
+}
+EXPORT_SYMBOL(net_dim_free_irq_moder);
+
 static int net_dim_step(struct dim *dim)
 {
 	if (dim->tired == (NET_DIM_PARAMS_NUM_PROFILES * 2))
diff --git a/net/ethtool/coalesce.c b/net/ethtool/coalesce.c
index 83112c1a71ae..7b852938cf01 100644
--- a/net/ethtool/coalesce.c
+++ b/net/ethtool/coalesce.c
@@ -1,5 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0-only
 
+#include <linux/dim.h>
 #include "netlink.h"
 #include "common.h"
 
@@ -82,6 +83,14 @@ static int coalesce_prepare_data(const struct ethnl_req_info *req_base,
 static int coalesce_reply_size(const struct ethnl_req_info *req_base,
 			       const struct ethnl_reply_data *reply_base)
 {
+	int modersz = nla_total_size(0) + /* _PROFILE_IRQ_MODERATION, nest */
+		      nla_total_size(sizeof(u32)) + /* _IRQ_MODERATION_USEC */
+		      nla_total_size(sizeof(u32)) + /* _IRQ_MODERATION_PKTS */
+		      nla_total_size(sizeof(u32));  /* _IRQ_MODERATION_COMPS */
+
+	int total_modersz = nla_total_size(0) +  /* _{R,T}X_PROFILE, nest */
+			modersz * NET_DIM_PARAMS_NUM_PROFILES;
+
 	return nla_total_size(sizeof(u32)) +	/* _RX_USECS */
 	       nla_total_size(sizeof(u32)) +	/* _RX_MAX_FRAMES */
 	       nla_total_size(sizeof(u32)) +	/* _RX_USECS_IRQ */
@@ -108,7 +117,8 @@ static int coalesce_reply_size(const struct ethnl_req_info *req_base,
 	       nla_total_size(sizeof(u8)) +	/* _USE_CQE_MODE_RX */
 	       nla_total_size(sizeof(u32)) +	/* _TX_AGGR_MAX_BYTES */
 	       nla_total_size(sizeof(u32)) +	/* _TX_AGGR_MAX_FRAMES */
-	       nla_total_size(sizeof(u32));	/* _TX_AGGR_TIME_USECS */
+	       nla_total_size(sizeof(u32)) +	/* _TX_AGGR_TIME_USECS */
+	       total_modersz * 2;		/* _{R,T}X_PROFILE */
 }
 
 static bool coalesce_put_u32(struct sk_buff *skb, u16 attr_type, u32 val,
@@ -127,6 +137,75 @@ static bool coalesce_put_bool(struct sk_buff *skb, u16 attr_type, u32 val,
 	return nla_put_u8(skb, attr_type, !!val);
 }
 
+#if IS_ENABLED(CONFIG_DIMLIB)
+/**
+ * coalesce_put_profile - fill reply with a nla nest with four child nla nests.
+ * @skb: socket buffer the message is stored in
+ * @attr_type: nest attr type ETHTOOL_A_COALESCE_*X_PROFILE
+ * @profile: data passed to userspace
+ * @coal_flags: modifiable parameters supported by the driver
+ *
+ * Put a dim profile nest attribute. Refer to ETHTOOL_A_PROFILE_IRQ_MODERATION.
+ *
+ * Return: 0 on success or a negative error code.
+ */
+static int coalesce_put_profile(struct sk_buff *skb, u16 attr_type,
+				const struct dim_cq_moder *profile,
+				u8 coal_flags)
+{
+	struct nlattr *profile_attr, *moder_attr;
+	int i, ret;
+
+	if (!profile || !coal_flags)
+		return 0;
+
+	profile_attr = nla_nest_start(skb, attr_type);
+	if (!profile_attr)
+		return -EMSGSIZE;
+
+	for (i = 0; i < NET_DIM_PARAMS_NUM_PROFILES; i++) {
+		moder_attr = nla_nest_start(skb, ETHTOOL_A_PROFILE_IRQ_MODERATION);
+		if (!moder_attr) {
+			ret = -EMSGSIZE;
+			goto cancel_profile;
+		}
+
+		if (coal_flags & DIM_COALESCE_USEC) {
+			ret = nla_put_u32(skb, ETHTOOL_A_IRQ_MODERATION_USEC,
+					  profile[i].usec);
+			if (ret)
+				goto cancel_moder;
+		}
+
+		if (coal_flags & DIM_COALESCE_PKTS) {
+			ret = nla_put_u32(skb, ETHTOOL_A_IRQ_MODERATION_PKTS,
+					  profile[i].pkts);
+			if (ret)
+				goto cancel_moder;
+		}
+
+		if (coal_flags & DIM_COALESCE_COMPS) {
+			ret = nla_put_u32(skb, ETHTOOL_A_IRQ_MODERATION_COMPS,
+					  profile[i].comps);
+			if (ret)
+				goto cancel_moder;
+		}
+
+		nla_nest_end(skb, moder_attr);
+	}
+
+	nla_nest_end(skb, profile_attr);
+
+	return 0;
+
+cancel_moder:
+	nla_nest_cancel(skb, moder_attr);
+cancel_profile:
+	nla_nest_cancel(skb, profile_attr);
+	return ret;
+}
+#endif
+
 static int coalesce_fill_reply(struct sk_buff *skb,
 			       const struct ethnl_req_info *req_base,
 			       const struct ethnl_reply_data *reply_base)
@@ -134,6 +213,12 @@ static int coalesce_fill_reply(struct sk_buff *skb,
 	const struct coalesce_reply_data *data = COALESCE_REPDATA(reply_base);
 	const struct kernel_ethtool_coalesce *kcoal = &data->kernel_coalesce;
 	const struct ethtool_coalesce *coal = &data->coalesce;
+#if IS_ENABLED(CONFIG_DIMLIB)
+	struct net_device *dev = req_base->dev;
+	struct dim_irq_moder *irq_moder = dev->irq_moder;
+	u8 coal_flags;
+	int ret;
+#endif
 	u32 supported = data->supported_params;
 
 	if (coalesce_put_u32(skb, ETHTOOL_A_COALESCE_RX_USECS,
@@ -192,11 +277,51 @@ static int coalesce_fill_reply(struct sk_buff *skb,
 			     kcoal->tx_aggr_time_usecs, supported))
 		return -EMSGSIZE;
 
+#if IS_ENABLED(CONFIG_DIMLIB)
+	if (!irq_moder)
+		return 0;
+
+	coal_flags = irq_moder->coal_flags;
+	rcu_read_lock();
+	if (irq_moder->profile_flags & DIM_PROFILE_RX) {
+		ret = coalesce_put_profile(skb, ETHTOOL_A_COALESCE_RX_PROFILE,
+					   rcu_dereference(irq_moder->rx_profile),
+					   coal_flags);
+		if (ret) {
+			rcu_read_unlock();
+			return ret;
+		}
+	}
+
+	if (irq_moder->profile_flags & DIM_PROFILE_TX) {
+		ret = coalesce_put_profile(skb, ETHTOOL_A_COALESCE_TX_PROFILE,
+					   rcu_dereference(irq_moder->tx_profile),
+					   coal_flags);
+		if (ret) {
+			rcu_read_unlock();
+			return ret;
+		}
+	}
+	rcu_read_unlock();
+#endif
 	return 0;
 }
 
 /* COALESCE_SET */
 
+#if IS_ENABLED(CONFIG_DIMLIB)
+static const struct nla_policy coalesce_irq_moderation_policy[] = {
+	[ETHTOOL_A_IRQ_MODERATION_USEC]	= {.type = NLA_U32},
+	[ETHTOOL_A_IRQ_MODERATION_PKTS]	= {.type = NLA_U32},
+	[ETHTOOL_A_IRQ_MODERATION_COMPS] = {.type = NLA_U32},
+};
+
+static const struct nla_policy coalesce_profile_irq_policy[] = {
+	[ETHTOOL_A_PROFILE_IRQ_MODERATION] =
+		NLA_POLICY_NESTED(coalesce_irq_moderation_policy),
+};
+#endif
+
 const struct nla_policy ethnl_coalesce_set_policy[] = {
 	[ETHTOOL_A_COALESCE_HEADER]		=
 		NLA_POLICY_NESTED(ethnl_header_policy),
@@ -227,6 +352,12 @@ const struct nla_policy ethnl_coalesce_set_policy[] = {
 	[ETHTOOL_A_COALESCE_TX_AGGR_MAX_BYTES] = { .type = NLA_U32 },
 	[ETHTOOL_A_COALESCE_TX_AGGR_MAX_FRAMES] = { .type = NLA_U32 },
 	[ETHTOOL_A_COALESCE_TX_AGGR_TIME_USECS] = { .type = NLA_U32 },
+#if IS_ENABLED(CONFIG_DIMLIB)
+	[ETHTOOL_A_COALESCE_RX_PROFILE] =
+		NLA_POLICY_NESTED(coalesce_profile_irq_policy),
+	[ETHTOOL_A_COALESCE_TX_PROFILE] =
+		NLA_POLICY_NESTED(coalesce_profile_irq_policy),
+#endif
 };
 
 static int
@@ -234,6 +365,9 @@ ethnl_set_coalesce_validate(struct ethnl_req_info *req_info,
 			    struct genl_info *info)
 {
 	const struct ethtool_ops *ops = req_info->dev->ethtool_ops;
+#if IS_ENABLED(CONFIG_DIMLIB)
+	struct net_device *dev = req_info->dev;
+#endif
 	struct nlattr **tb = info->attrs;
 	u32 supported_params;
 	u16 a;
@@ -243,6 +377,15 @@ ethnl_set_coalesce_validate(struct ethnl_req_info *req_info,
 
 	/* make sure that only supported parameters are present */
 	supported_params = ops->supported_coalesce_params;
+#if IS_ENABLED(CONFIG_DIMLIB)
+	if (dev->irq_moder) {
+		if (dev->irq_moder->profile_flags & DIM_PROFILE_RX)
+			supported_params |= ETHTOOL_COALESCE_RX_PROFILE;
+
+		if (dev->irq_moder->profile_flags & DIM_PROFILE_TX)
+			supported_params |= ETHTOOL_COALESCE_TX_PROFILE;
+	}
+#endif
 	for (a = ETHTOOL_A_COALESCE_RX_USECS; a < __ETHTOOL_A_COALESCE_CNT; a++)
 		if (tb[a] && !(supported_params & attr_to_mask(a))) {
 			NL_SET_ERR_MSG_ATTR(info->extack, tb[a],
@@ -253,12 +396,104 @@ ethnl_set_coalesce_validate(struct ethnl_req_info *req_info,
 	return 1;
 }
 
+#if IS_ENABLED(CONFIG_DIMLIB)
+/**
+ * ethnl_update_profile - get a profile nla nest with child nla nests from userspace.
+ * @dev: netdevice to update the profile
+ * @dst: profile get from the driver and modified by ethnl_update_profile.
+ * @nests: nest attr ETHTOOL_A_COALESCE_*X_PROFILE to set profile.
+ * @extack: Netlink extended ack
+ *
+ * Layout of nests:
+ *   Nested ETHTOOL_A_COALESCE_*X_PROFILE attr
+ *     Nested ETHTOOL_A_PROFILE_IRQ_MODERATION attr
+ *       ETHTOOL_A_IRQ_MODERATION_USEC attr
+ *       ETHTOOL_A_IRQ_MODERATION_PKTS attr
+ *       ETHTOOL_A_IRQ_MODERATION_COMPS attr
+ *     ...
+ *     Nested ETHTOOL_A_PROFILE_IRQ_MODERATION attr
+ *       ETHTOOL_A_IRQ_MODERATION_USEC attr
+ *       ETHTOOL_A_IRQ_MODERATION_PKTS attr
+ *       ETHTOOL_A_IRQ_MODERATION_COMPS attr
+ *
+ * Return: 0 on success or a negative error code.
+ */
+static int ethnl_update_profile(struct net_device *dev,
+				struct dim_cq_moder __rcu **dst,
+				const struct nlattr *nests,
+				struct netlink_ext_ack *extack)
+{
+	struct nlattr *moder[ARRAY_SIZE(coalesce_irq_moderation_policy)];
+	struct dim_irq_moder *irq_moder = dev->irq_moder;
+	struct dim_cq_moder *new_profile, *old_profile;
+	int ret, rem, i = 0, len;
+	struct nlattr *nest;
+
+	if (!nests)
+		return 0;
+
+	if (!*dst)
+		return -EINVAL;
+
+	old_profile = rtnl_dereference(*dst);
+	len = NET_DIM_PARAMS_NUM_PROFILES * sizeof(*old_profile);
+	new_profile = kmemdup(old_profile, len, GFP_KERNEL);
+	if (!new_profile)
+		return -ENOMEM;
+
+	nla_for_each_nested_type(nest, ETHTOOL_A_PROFILE_IRQ_MODERATION, nests, rem) {
+		ret = nla_parse_nested(moder,
+				       ARRAY_SIZE(coalesce_irq_moderation_policy) - 1,
+				       nest, coalesce_irq_moderation_policy,
+				       extack);
+		if (ret)
+			return ret;
+
+		if (!NL_REQ_ATTR_CHECK(extack, nest, moder, ETHTOOL_A_IRQ_MODERATION_USEC)) {
+			if (irq_moder->coal_flags & DIM_COALESCE_USEC)
+				new_profile[i].usec =
+					nla_get_u32(moder[ETHTOOL_A_IRQ_MODERATION_USEC]);
+			else
+				return -EOPNOTSUPP;
+		}
+
+		if (!NL_REQ_ATTR_CHECK(extack, nest, moder, ETHTOOL_A_IRQ_MODERATION_PKTS)) {
+			if (irq_moder->coal_flags & DIM_COALESCE_PKTS)
+				new_profile[i].pkts =
+					nla_get_u32(moder[ETHTOOL_A_IRQ_MODERATION_PKTS]);
+			else
+				return -EOPNOTSUPP;
+		}
+
+		if (!NL_REQ_ATTR_CHECK(extack, nest, moder, ETHTOOL_A_IRQ_MODERATION_COMPS)) {
+			if (irq_moder->coal_flags & DIM_COALESCE_COMPS)
+				new_profile[i].comps =
+					nla_get_u32(moder[ETHTOOL_A_IRQ_MODERATION_COMPS]);
+			else
+				return -EOPNOTSUPP;
+		}
+
+		i++;
+	}
+
+	rcu_assign_pointer(*dst, new_profile);
+
+	synchronize_rcu();
+	kfree(old_profile);
+
+	return 0;
+}
+#endif
+
 static int
 __ethnl_set_coalesce(struct ethnl_req_info *req_info, struct genl_info *info,
 		     bool *dual_change)
 {
 	struct kernel_ethtool_coalesce kernel_coalesce = {};
 	struct net_device *dev = req_info->dev;
+#if IS_ENABLED(CONFIG_DIMLIB)
+	struct dim_irq_moder *irq_moder = dev->irq_moder;
+#endif
 	struct ethtool_coalesce coalesce = {};
 	bool mod_mode = false, mod = false;
 	struct nlattr **tb = info->attrs;
@@ -317,6 +552,33 @@ __ethnl_set_coalesce(struct ethnl_req_info *req_info, struct genl_info *info,
 	ethnl_update_u32(&kernel_coalesce.tx_aggr_time_usecs,
 			 tb[ETHTOOL_A_COALESCE_TX_AGGR_TIME_USECS], &mod);
 
+#if IS_ENABLED(CONFIG_DIMLIB)
+	if (!irq_moder)
+		goto skip_irq_moder;
+
+	if (irq_moder->profile_flags & DIM_PROFILE_RX) {
+		ret = ethnl_update_profile(dev, &irq_moder->rx_profile,
+					   tb[ETHTOOL_A_COALESCE_RX_PROFILE],
+					   info->extack);
+		if (ret < 0)
+			return ret;
+	}
+
+	if (irq_moder->profile_flags & DIM_PROFILE_TX) {
+		ret = ethnl_update_profile(dev, &irq_moder->tx_profile,
+					   tb[ETHTOOL_A_COALESCE_TX_PROFILE],
+					   info->extack);
+		if (ret < 0)
+			return ret;
+	}
+
+skip_irq_moder:
+#else
+	if (tb[ETHTOOL_A_COALESCE_RX_PROFILE] ||
+	    tb[ETHTOOL_A_COALESCE_TX_PROFILE])
+		return -EOPNOTSUPP;
+
+#endif
 	/* Update operation modes */
 	ethnl_update_bool32(&coalesce.use_adaptive_rx_coalesce,
 			    tb[ETHTOOL_A_COALESCE_USE_ADAPTIVE_RX], &mod_mode);
-- 
2.32.0.3.g01195cf9f


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH net-next v10 3/4] dim: add new interfaces for initialization and getting results
  2024-04-25 16:59 [PATCH net-next v10 0/4] ethtool: provide the dim profile fine-tuning channel Heng Qi
  2024-04-25 16:59 ` [PATCH net-next v10 1/4] linux/dim: move useful macros to .h file Heng Qi
  2024-04-25 16:59 ` [PATCH net-next v10 2/4] ethtool: provide customized dim profile management Heng Qi
@ 2024-04-25 16:59 ` Heng Qi
  2024-04-25 16:59 ` [PATCH net-next v10 4/4] virtio-net: support dim profile fine-tuning Heng Qi
  3 siblings, 0 replies; 11+ messages in thread
From: Heng Qi @ 2024-04-25 16:59 UTC (permalink / raw)
  To: netdev, virtualization, Jakub Kicinski
  Cc: David S . Miller, Paolo Abeni, Eric Dumazet, Jason Wang,
	Michael S . Tsirkin, Brett Creeley, Ratheesh Kannoth,
	Alexander Lobakin, Xuan Zhuo, Tal Gilboa, Jonathan Corbet,
	linux-doc, Maxime Chevallier, Jiri Pirko, Paul Greenwalt,
	Ahmed Zaki, Vladimir Oltean, Kory Maincent, Andrew Lunn,
	justinstitt @ google . com

DIM-related mode and work have been collected in one same place,
so new interfaces are added to provide convenience.

Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
---
 include/linux/dim.h | 48 +++++++++++++++++++++++++++++++++
 lib/dim/net_dim.c   | 66 +++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 114 insertions(+)

diff --git a/include/linux/dim.h b/include/linux/dim.h
index af01389fcf39..ea7551bbc599 100644
--- a/include/linux/dim.h
+++ b/include/linux/dim.h
@@ -258,6 +258,54 @@ int net_dim_init_irq_moder(struct net_device *dev, u8 profile_flags,
  */
 void net_dim_free_irq_moder(struct net_device *dev);
 
+/**
+ * net_dim_setting - initialize DIM's cq mode and schedule worker
+ * @dev: target network device
+ * @dim: DIM context
+ * @is_tx: true indicates the tx direction, false indicates the rx direction
+ */
+void net_dim_setting(struct net_device *dev, struct dim *dim, bool is_tx);
+
+/**
+ * net_dim_work_cancel - synchronously cancel dim's worker
+ * @dim: DIM context
+ */
+void net_dim_work_cancel(struct dim *dim);
+
+/**
+ * net_dim_get_rx_irq_moder - get DIM rx results based on profile_ix
+ * @dev: target network device
+ * @dim: DIM context
+ *
+ * Return: DIM irq moderation
+ */
+struct dim_cq_moder
+net_dim_get_rx_irq_moder(struct net_device *dev, struct dim *dim);
+
+/**
+ * net_dim_get_tx_irq_moder - get DIM tx results based on profile_ix
+ * @dev: target network device
+ * @dim: DIM context
+ *
+ * Return: DIM irq moderation
+ */
+struct dim_cq_moder
+net_dim_get_tx_irq_moder(struct net_device *dev, struct dim *dim);
+
+/**
+ * net_dim_set_rx_mode - set DIM rx cq mode
+ * @dev: target network device
+ * @rx_mode: target rx cq mode
+ */
+void net_dim_set_rx_mode(struct net_device *dev, u8 rx_mode);
+
+/**
+ * net_dim_set_tx_mode - set DIM tx cq mode
+ * @dev: target network device
+ * @tx_mode: target tx cq mode
+ */
+void net_dim_set_tx_mode(struct net_device *dev, u8 tx_mode);
+
 /**
  *	dim_on_top - check if current state is a good place to stop (top location)
  *	@dim: DIM context
diff --git a/lib/dim/net_dim.c b/lib/dim/net_dim.c
index ec0dc21793c0..8c4543a628e1 100644
--- a/lib/dim/net_dim.c
+++ b/lib/dim/net_dim.c
@@ -174,6 +174,72 @@ void net_dim_free_irq_moder(struct net_device *dev)
 }
 EXPORT_SYMBOL(net_dim_free_irq_moder);
 
+void net_dim_setting(struct net_device *dev, struct dim *dim, bool is_tx)
+{
+	struct dim_irq_moder *irq_moder = dev->irq_moder;
+
+	if (!irq_moder)
+		return;
+
+	if (is_tx) {
+		INIT_WORK(&dim->work, irq_moder->tx_dim_work);
+		dim->mode = irq_moder->dim_tx_mode;
+		return;
+	}
+
+	INIT_WORK(&dim->work, irq_moder->rx_dim_work);
+	dim->mode = irq_moder->dim_rx_mode;
+}
+EXPORT_SYMBOL(net_dim_setting);
+
+void net_dim_work_cancel(struct dim *dim)
+{
+	cancel_work_sync(&dim->work);
+}
+EXPORT_SYMBOL(net_dim_work_cancel);
+
+struct dim_cq_moder net_dim_get_rx_irq_moder(struct net_device *dev,
+					     struct dim *dim)
+{
+	struct dim_cq_moder res, *profile;
+
+	rcu_read_lock();
+	profile = rcu_dereference(dev->irq_moder->rx_profile);
+	res = profile[dim->profile_ix];
+	rcu_read_unlock();
+
+	dim->mode = READ_ONCE(dev->irq_moder->dim_rx_mode);
+
+	return res;
+}
+EXPORT_SYMBOL(net_dim_get_rx_irq_moder);
+
+struct dim_cq_moder net_dim_get_tx_irq_moder(struct net_device *dev,
+					     struct dim *dim)
+{
+	struct dim_cq_moder res, *profile;
+
+	rcu_read_lock();
+	profile = rcu_dereference(dev->irq_moder->tx_profile);
+	res = profile[dim->profile_ix];
+	rcu_read_unlock();
+
+	dim->mode = READ_ONCE(dev->irq_moder->dim_tx_mode);
+
+	return res;
+}
+EXPORT_SYMBOL(net_dim_get_tx_irq_moder);
+
+void net_dim_set_rx_mode(struct net_device *dev, u8 rx_mode)
+{
+	WRITE_ONCE(dev->irq_moder->dim_rx_mode, rx_mode);
+}
+
+void net_dim_set_tx_mode(struct net_device *dev, u8 tx_mode)
+{
+	WRITE_ONCE(dev->irq_moder->dim_tx_mode, tx_mode);
+}
+
 static int net_dim_step(struct dim *dim)
 {
 	if (dim->tired == (NET_DIM_PARAMS_NUM_PROFILES * 2))
-- 
2.32.0.3.g01195cf9f


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH net-next v10 4/4] virtio-net: support dim profile fine-tuning
  2024-04-25 16:59 [PATCH net-next v10 0/4] ethtool: provide the dim profile fine-tuning channel Heng Qi
                   ` (2 preceding siblings ...)
  2024-04-25 16:59 ` [PATCH net-next v10 3/4] dim: add new interfaces for initialization and getting results Heng Qi
@ 2024-04-25 16:59 ` Heng Qi
  3 siblings, 0 replies; 11+ messages in thread
From: Heng Qi @ 2024-04-25 16:59 UTC (permalink / raw)
  To: netdev, virtualization, Jakub Kicinski
  Cc: David S . Miller, Paolo Abeni, Eric Dumazet, Jason Wang,
	Michael S . Tsirkin, Brett Creeley, Ratheesh Kannoth,
	Alexander Lobakin, Xuan Zhuo, Tal Gilboa, Jonathan Corbet,
	linux-doc, Maxime Chevallier, Jiri Pirko, Paul Greenwalt,
	Ahmed Zaki, Vladimir Oltean, Kory Maincent, Andrew Lunn,
	justinstitt @ google . com

Virtio-net has different types of back-end device implementations.
In order to effectively optimize the dim library's gains for different
device implementations, let's use the new interface params to
initialize and query dim results from a customized profile list.

Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
---
 drivers/net/virtio_net.c | 44 +++++++++++++++++++++++++++++++++-------
 1 file changed, 37 insertions(+), 7 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 115c3c5414f2..555e6c9761da 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -2300,7 +2300,7 @@ static int virtnet_open(struct net_device *dev)
 
 	for (i--; i >= 0; i--) {
 		virtnet_disable_queue_pair(vi, i);
-		cancel_work_sync(&vi->rq[i].dim.work);
+		net_dim_work_cancel(&vi->rq[i].dim);
 	}
 
 	return err;
@@ -2466,7 +2466,7 @@ static int virtnet_rx_resize(struct virtnet_info *vi,
 
 	if (running) {
 		napi_disable(&rq->napi);
-		cancel_work_sync(&rq->dim.work);
+		net_dim_work_cancel(&rq->dim);
 	}
 
 	err = virtqueue_resize(rq->vq, ring_num, virtnet_rq_unmap_free_buf);
@@ -2718,7 +2718,7 @@ static int virtnet_close(struct net_device *dev)
 
 	for (i = 0; i < vi->max_queue_pairs; i++) {
 		virtnet_disable_queue_pair(vi, i);
-		cancel_work_sync(&vi->rq[i].dim.work);
+		net_dim_work_cancel(&vi->rq[i].dim);
 	}
 
 	return 0;
@@ -3580,7 +3580,7 @@ static void virtnet_rx_dim_work(struct work_struct *work)
 		if (!rq->dim_enabled)
 			continue;
 
-		update_moder = net_dim_get_rx_moderation(dim->mode, dim->profile_ix);
+		update_moder = net_dim_get_rx_irq_moder(dev, dim);
 		if (update_moder.usec != rq->intr_coal.max_usecs ||
 		    update_moder.pkts != rq->intr_coal.max_packets) {
 			err = virtnet_send_rx_ctrl_coal_vq_cmd(vi, qnum,
@@ -4182,6 +4182,33 @@ static void virtnet_tx_timeout(struct net_device *dev, unsigned int txqueue)
 		   jiffies_to_usecs(jiffies - READ_ONCE(txq->trans_start)));
 }
 
+static int virtnet_init_irq_moder(struct virtnet_info *vi)
+{
+	u8 profile_flags = 0, coal_flags = 0;
+	struct net_device *dev = vi->dev;
+	int ret, i;
+
+	profile_flags |= DIM_PROFILE_RX;
+	coal_flags |= DIM_COALESCE_USEC | DIM_COALESCE_PKTS;
+	ret = net_dim_init_irq_moder(dev, profile_flags, coal_flags,
+				     DIM_CQ_PERIOD_MODE_START_FROM_EQE,
+				     0, virtnet_rx_dim_work, NULL);
+
+	if (ret)
+		return ret;
+
+	for (i = 0; i < vi->max_queue_pairs; i++)
+		net_dim_setting(vi->dev, &vi->rq[i].dim, false);
+
+	return 0;
+}
+
+static void virtnet_free_irq_moder(struct virtnet_info *vi)
+{
+	if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_VQ_NOTF_COAL))
+		net_dim_free_irq_moder(vi->dev);
+}
+
 static const struct net_device_ops virtnet_netdev = {
 	.ndo_open            = virtnet_open,
 	.ndo_stop   	     = virtnet_close,
@@ -4461,9 +4488,6 @@ static int virtnet_alloc_queues(struct virtnet_info *vi)
 					 virtnet_poll_tx,
 					 napi_tx ? napi_weight : 0);
 
-		INIT_WORK(&vi->rq[i].dim.work, virtnet_rx_dim_work);
-		vi->rq[i].dim.mode = DIM_CQ_PERIOD_MODE_START_FROM_EQE;
-
 		sg_init_table(vi->rq[i].sg, ARRAY_SIZE(vi->rq[i].sg));
 		ewma_pkt_len_init(&vi->rq[i].mrg_avg_pkt_len);
 		sg_init_table(vi->sq[i].sg, ARRAY_SIZE(vi->sq[i].sg));
@@ -4837,6 +4861,10 @@ static int virtnet_probe(struct virtio_device *vdev)
 		for (i = 0; i < vi->max_queue_pairs; i++)
 			if (vi->sq[i].napi.weight)
 				vi->sq[i].intr_coal.max_packets = 1;
+
+		err = virtnet_init_irq_moder(vi);
+		if (err)
+			goto free;
 	}
 
 #ifdef CONFIG_SYSFS
@@ -4961,6 +4989,8 @@ static void virtnet_remove(struct virtio_device *vdev)
 	disable_rx_mode_work(vi);
 	flush_work(&vi->rx_mode_work);
 
+	virtnet_free_irq_moder(vi);
+
 	unregister_netdev(vi->dev);
 
 	net_failover_destroy(vi->failover);
-- 
2.32.0.3.g01195cf9f


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH net-next v10 2/4] ethtool: provide customized dim profile management
  2024-04-25 16:59 ` [PATCH net-next v10 2/4] ethtool: provide customized dim profile management Heng Qi
@ 2024-04-27  1:33   ` Jakub Kicinski
  2024-04-28 14:49     ` Heng Qi
  0 siblings, 1 reply; 11+ messages in thread
From: Jakub Kicinski @ 2024-04-27  1:33 UTC (permalink / raw)
  To: Heng Qi
  Cc: netdev, virtualization, David S . Miller, Paolo Abeni,
	Eric Dumazet, Jason Wang, Michael S . Tsirkin, Brett Creeley,
	Ratheesh Kannoth, Alexander Lobakin, Xuan Zhuo, Tal Gilboa,
	Jonathan Corbet, linux-doc, Maxime Chevallier, Jiri Pirko,
	Paul Greenwalt, Ahmed Zaki, Vladimir Oltean, Kory Maincent,
	Andrew Lunn, justinstitt @ google . com

On Fri, 26 Apr 2024 00:59:46 +0800 Heng Qi wrote:
> The NetDIM library, currently leveraged by an array of NICs, delivers
> excellent acceleration benefits. Nevertheless, NICs vary significantly
> in their dim profile list prerequisites.
> 
> Specifically, virtio-net backends may present diverse sw or hw device
> implementation, making a one-size-fits-all parameter list impractical.
> On Alibaba Cloud, the virtio DPU's performance under the default DIM
> profile falls short of expectations, partly due to a mismatch in
> parameter configuration.
> 
> I also noticed that ice/idpf/ena and other NICs have customized
> profilelist or placed some restrictions on dim capabilities.
> 
> Motivated by this, I tried adding new params for "ethtool -C" that provides
> a per-device control to modify and access a device's interrupt parameters.
> 
> Usage
> ========
> The target NIC is named ethx.
> 
> Assume that ethx only declares support for rx profile setting
> (with DIM_PROFILE_RX flag set in profile_flags) and supports modification
> of usec and pkt fields.
> 
> 1. Query the currently customized list of the device
> 
> $ ethtool -c ethx
> ...
> rx-profile:
> {.usec =   1, .pkts = 256, .comps = n/a,},
> {.usec =   8, .pkts = 256, .comps = n/a,},
> {.usec =  64, .pkts = 256, .comps = n/a,},
> {.usec = 128, .pkts = 256, .comps = n/a,},
> {.usec = 256, .pkts = 256, .comps = n/a,}
> tx-profile:   n/a
> 
> 2. Tune
> $ ethtool -C ethx rx-profile 1,1,n_2,n,n_3,3,n_4,4,n_n,5,n
> "n" means do not modify this field.
> $ ethtool -c ethx
> ...
> rx-profile:
> {.usec =   1, .pkts =   1, .comps = n/a,},
> {.usec =   2, .pkts = 256, .comps = n/a,},
> {.usec =   3, .pkts =   3, .comps = n/a,},
> {.usec =   4, .pkts =   4, .comps = n/a,},
> {.usec = 256, .pkts =   5, .comps = n/a,}
> tx-profile:   n/a
> 
> 3. Hint
> If the device does not support some type of customized dim profiles,
> the corresponding "n/a" will display.
> 
> If the "n/a" field is being modified, -EOPNOTSUPP will be reported.
> 
> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
> ---
>  Documentation/netlink/specs/ethtool.yaml     |  23 ++
>  Documentation/networking/ethtool-netlink.rst |   4 +
>  include/linux/dim.h                          |  60 +++++
>  include/linux/ethtool.h                      |   7 +-
>  include/linux/netdevice.h                    |   5 +
>  include/uapi/linux/ethtool_netlink.h         |  20 ++
>  lib/dim/net_dim.c                            |  73 +++++
>  net/ethtool/coalesce.c                       | 264 ++++++++++++++++++-
>  8 files changed, 454 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/netlink/specs/ethtool.yaml b/Documentation/netlink/specs/ethtool.yaml
> index 87ae7b397984..3c51a1a0b5d9 100644
> --- a/Documentation/netlink/specs/ethtool.yaml
> +++ b/Documentation/netlink/specs/ethtool.yaml
> @@ -413,6 +413,18 @@ attribute-sets:
>        -
>          name: combined-count
>          type: u32
> +  -
> +    name: irq-moderation
> +    attributes:
> +      -
> +        name: usec
> +        type: u32
> +      -
> +        name: pkts
> +        type: u32
> +      -
> +        name: comps
> +        type: u32
>  
>    -
>      name: coalesce
> @@ -502,6 +514,15 @@ attribute-sets:
>        -
>          name: tx-aggr-time-usecs
>          type: u32
> +      -
> +        name: rx-profile
> +        type: nest
> +        nested-attributes: irq-moderation
> +      -
> +        name: tx-profile
> +        type: nest
> +        nested-attributes: irq-moderation
> +
>    -
>      name: pause-stat
>      attributes:
> @@ -1313,6 +1334,8 @@ operations:
>              - tx-aggr-max-bytes
>              - tx-aggr-max-frames
>              - tx-aggr-time-usecs
> +            - rx-profile
> +            - tx-profile
>        dump: *coalesce-get-op
>      -
>        name: coalesce-set
> diff --git a/Documentation/networking/ethtool-netlink.rst b/Documentation/networking/ethtool-netlink.rst
> index 4e63d3708ed9..78ee25081498 100644
> --- a/Documentation/networking/ethtool-netlink.rst
> +++ b/Documentation/networking/ethtool-netlink.rst
> @@ -1040,6 +1040,8 @@ Kernel response contents:
>    ``ETHTOOL_A_COALESCE_TX_AGGR_MAX_BYTES``     u32     max aggr size, Tx
>    ``ETHTOOL_A_COALESCE_TX_AGGR_MAX_FRAMES``    u32     max aggr packets, Tx
>    ``ETHTOOL_A_COALESCE_TX_AGGR_TIME_USECS``    u32     time (us), aggr, Tx
> +  ``ETHTOOL_A_COALESCE_RX_PROFILE``            nested  profile of DIM, Rx
> +  ``ETHTOOL_A_COALESCE_TX_PROFILE``            nested  profile of DIM, Tx
>    ===========================================  ======  =======================
>  
>  Attributes are only included in reply if their value is not zero or the
> @@ -1105,6 +1107,8 @@ Request contents:
>    ``ETHTOOL_A_COALESCE_TX_AGGR_MAX_BYTES``     u32     max aggr size, Tx
>    ``ETHTOOL_A_COALESCE_TX_AGGR_MAX_FRAMES``    u32     max aggr packets, Tx
>    ``ETHTOOL_A_COALESCE_TX_AGGR_TIME_USECS``    u32     time (us), aggr, Tx
> +  ``ETHTOOL_A_COALESCE_RX_PROFILE``            nested  profile of DIM, Rx
> +  ``ETHTOOL_A_COALESCE_TX_PROFILE``            nested  profile of DIM, Tx
>    ===========================================  ======  =======================
>  
>  Request is rejected if it attributes declared as unsupported by driver (i.e.
> diff --git a/include/linux/dim.h b/include/linux/dim.h
> index 43398f5eade2..af01389fcf39 100644
> --- a/include/linux/dim.h
> +++ b/include/linux/dim.h
> @@ -9,6 +9,7 @@
>  #include <linux/module.h>
>  #include <linux/types.h>
>  #include <linux/workqueue.h>
> +#include <linux/netdevice.h>
>  
>  /* Number of DIM profiles and period mode. */
>  #define NET_DIM_PARAMS_NUM_PROFILES 5
> @@ -53,6 +54,39 @@ struct dim_cq_moder {
>  	u8 cq_period_mode;
>  };
>  
> +#define DIM_PROFILE_RX		BIT(0)	/* support rx dim profile modification */
> +#define DIM_PROFILE_TX		BIT(1)	/* support tx dim profile modification */
> +
> +#define DIM_COALESCE_USEC	BIT(0)	/* support usec field modification */
> +#define DIM_COALESCE_PKTS	BIT(1)	/* support pkts field modification */
> +#define DIM_COALESCE_COMPS	BIT(2)	/* support comps field modification */
> +
> +struct dim_irq_moder {
> +	/* See DIM_PROFILE_* */
> +	u8 profile_flags;
> +
> +	/* See DIM_COALESCE_* for Rx and Tx */
> +	u8 coal_flags;
> +
> +	/* Rx DIM period count mode: CQE or EQE */
> +	u8 dim_rx_mode;
> +
> +	/* Tx DIM period count mode: CQE or EQE */
> +	u8 dim_tx_mode;
> +
> +	/* DIM profile list for Rx */
> +	struct dim_cq_moder __rcu *rx_profile;
> +
> +	/* DIM profile list for Tx */
> +	struct dim_cq_moder __rcu *tx_profile;
> +
> +	/* Rx DIM worker function scheduled by net_dim() */
> +	void (*rx_dim_work)(struct work_struct *work);
> +
> +	/* Tx DIM worker function scheduled by net_dim() */
> +	void (*tx_dim_work)(struct work_struct *work);
> +};
> +
>  /**
>   * struct dim_sample - Structure for DIM sample data.
>   * Used for communications between DIM and its consumer.
> @@ -198,6 +232,32 @@ enum dim_step_result {
>  	DIM_ON_EDGE,
>  };
>  
> +/**
> + * net_dim_init_irq_moder - collect information to initialize irq moderation
> + * @dev: target network device
> + * @profile_flags: Rx or Tx profile modification capability
> + * @coal_flags: irq moderation params flags
> + * @rx_mode: CQ period mode for Rx
> + * @tx_mode: CQ period mode for Tx
> + * void (*rx_dim_work)(struct work_struct *work);
> + *	Rx worker called after dim decision.
> + *
> + * void (*tx_dim_work)(struct work_struct *work);
> + *	Tx worker called after dim decision.

this format is not going to make ./scripts/kernel-doc happy,
run it on the modified files it will show you the warnings:

./scripts/kernel-doc -Wall -none file

> diff --git a/include/uapi/linux/ethtool_netlink.h b/include/uapi/linux/ethtool_netlink.h
> index b4f0d233d048..4837e37e8b10 100644
> --- a/include/uapi/linux/ethtool_netlink.h
> +++ b/include/uapi/linux/ethtool_netlink.h
> @@ -416,12 +416,32 @@ enum {
>  	ETHTOOL_A_COALESCE_TX_AGGR_MAX_BYTES,		/* u32 */
>  	ETHTOOL_A_COALESCE_TX_AGGR_MAX_FRAMES,		/* u32 */
>  	ETHTOOL_A_COALESCE_TX_AGGR_TIME_USECS,		/* u32 */
> +	ETHTOOL_A_COALESCE_RX_PROFILE,			/* nest - _A_PROFILE_IRQ_MODERATION */
> +	ETHTOOL_A_COALESCE_TX_PROFILE,			/* nest - _A_PROFILE_IRQ_MODERATION */
>  
>  	/* add new constants above here */
>  	__ETHTOOL_A_COALESCE_CNT,
>  	ETHTOOL_A_COALESCE_MAX = (__ETHTOOL_A_COALESCE_CNT - 1)
>  };
>  
> +enum {
> +	ETHTOOL_A_PROFILE_UNSPEC,
> +	ETHTOOL_A_PROFILE_IRQ_MODERATION,		/* nest, _A_IRQ_MODERATION_* */
> +
> +	__ETHTOOL_A_PROFILE_CNT,
> +	ETHTOOL_A_PROFILE_MAX = (__ETHTOOL_A_PROFILE_CNT - 1)
> +};

I think this doesn't match what you described in the YAML spec.
There is no "irq-moderation" layer there and no multi-attr: true...
Does tools/net/ynl/cli.py work with the new attributes?

> +enum {
> +	ETHTOOL_A_IRQ_MODERATION_UNSPEC,
> +	ETHTOOL_A_IRQ_MODERATION_USEC,			/* u32 */
> +	ETHTOOL_A_IRQ_MODERATION_PKTS,			/* u32 */
> +	ETHTOOL_A_IRQ_MODERATION_COMPS,			/* u32 */
> +
> +	__ETHTOOL_A_IRQ_MODERATION_CNT,
> +	ETHTOOL_A_IRQ_MODERATION_MAX = (__ETHTOOL_A_IRQ_MODERATION_CNT - 1)
> +};
> +
>  /* PAUSE */
>  
>  enum {
> diff --git a/lib/dim/net_dim.c b/lib/dim/net_dim.c
> index 4e32f7aaac86..ec0dc21793c0 100644
> --- a/lib/dim/net_dim.c
> +++ b/lib/dim/net_dim.c
> @@ -101,6 +101,79 @@ net_dim_get_def_tx_moderation(u8 cq_period_mode)
>  }
>  EXPORT_SYMBOL(net_dim_get_def_tx_moderation);
>  
> +int net_dim_init_irq_moder(struct net_device *dev, u8 profile_flags,
> +			   u8 coal_flags, u8 rx_mode, u8 tx_mode,
> +			   void (*rx_dim_work)(struct work_struct *work),
> +			   void (*tx_dim_work)(struct work_struct *work))
> +{
> +	struct dim_cq_moder *rxp, *txp;
> +	struct dim_irq_moder *moder;
> +	int len;
> +
> +	dev->irq_moder = kzalloc(sizeof(*dev->irq_moder), GFP_KERNEL);
> +	if (!dev->irq_moder)
> +		goto err_moder;
> +
> +	moder = dev->irq_moder;
> +	len = NET_DIM_PARAMS_NUM_PROFILES * sizeof(*moder->rx_profile);
> +
> +	moder->profile_flags = profile_flags;
> +	moder->coal_flags = coal_flags;
> +
> +	if (profile_flags & DIM_PROFILE_RX) {
> +		moder->dim_rx_mode = rx_mode;
> +		moder->rx_dim_work = rx_dim_work;
> +		rxp = kmemdup(rx_profile[rx_mode], len, GFP_KERNEL);
> +		if (!rxp)
> +			goto err_rx_profile;
> +
> +		rcu_assign_pointer(moder->rx_profile, rxp);
> +	}
> +
> +	if (profile_flags & DIM_PROFILE_TX) {
> +		moder->dim_tx_mode = tx_mode;
> +		moder->tx_dim_work = tx_dim_work;
> +		txp = kmemdup(tx_profile[tx_mode], len, GFP_KERNEL);
> +		if (!txp)
> +			goto err_tx_profile;

you need to init rxp to NULL, otherwise this goto may read
uninitialized value

> +		rcu_assign_pointer(moder->tx_profile, txp);
> +	}
> +
> +	return 0;
> +
> +err_tx_profile:
> +	kfree(rxp);
> +err_rx_profile:
> +	kfree(moder);
> +err_moder:
> +	return -ENOMEM;
> +}
> +EXPORT_SYMBOL(net_dim_init_irq_moder);
> +
> +void net_dim_free_irq_moder(struct net_device *dev)
> +{
> +	struct dim_cq_moder *rx_profile, *tx_profile;
> +
> +	if (!dev->irq_moder)
> +		return;
> +
> +	rcu_read_lock();
> +	rx_profile = rcu_dereference(dev->irq_moder->rx_profile);
> +	tx_profile = rcu_dereference(dev->irq_moder->tx_profile);
> +	rcu_read_unlock();

rtnl_dereference() ? use of the pointers you got outside of the read
critical section looks wrong

> +	rcu_assign_pointer(dev->irq_moder->tx_profile, NULL);
> +	rcu_assign_pointer(dev->irq_moder->rx_profile, NULL);
> +
> +	synchronize_rcu();

Better to use kfree_rcu(), synchronize_rcu() can be quite slow

> +	kfree(rx_profile);
> +	kfree(tx_profile);
> +	kfree(dev->irq_moder);
> +}
> +EXPORT_SYMBOL(net_dim_free_irq_moder);
> +
>  static int net_dim_step(struct dim *dim)
>  {
>  	if (dim->tired == (NET_DIM_PARAMS_NUM_PROFILES * 2))
> diff --git a/net/ethtool/coalesce.c b/net/ethtool/coalesce.c
> index 83112c1a71ae..7b852938cf01 100644
> --- a/net/ethtool/coalesce.c
> +++ b/net/ethtool/coalesce.c
> @@ -1,5 +1,6 @@
>  // SPDX-License-Identifier: GPL-2.0-only
>  
> +#include <linux/dim.h>
>  #include "netlink.h"
>  #include "common.h"
>  
> @@ -82,6 +83,14 @@ static int coalesce_prepare_data(const struct ethnl_req_info *req_base,
>  static int coalesce_reply_size(const struct ethnl_req_info *req_base,
>  			       const struct ethnl_reply_data *reply_base)
>  {
> +	int modersz = nla_total_size(0) + /* _PROFILE_IRQ_MODERATION, nest */
> +		      nla_total_size(sizeof(u32)) + /* _IRQ_MODERATION_USEC */
> +		      nla_total_size(sizeof(u32)) + /* _IRQ_MODERATION_PKTS */
> +		      nla_total_size(sizeof(u32));  /* _IRQ_MODERATION_COMPS */
> +
> +	int total_modersz = nla_total_size(0) +  /* _{R,T}X_PROFILE, nest */
> +			modersz * NET_DIM_PARAMS_NUM_PROFILES;
> +
>  	return nla_total_size(sizeof(u32)) +	/* _RX_USECS */
>  	       nla_total_size(sizeof(u32)) +	/* _RX_MAX_FRAMES */
>  	       nla_total_size(sizeof(u32)) +	/* _RX_USECS_IRQ */
> @@ -108,7 +117,8 @@ static int coalesce_reply_size(const struct ethnl_req_info *req_base,
>  	       nla_total_size(sizeof(u8)) +	/* _USE_CQE_MODE_RX */
>  	       nla_total_size(sizeof(u32)) +	/* _TX_AGGR_MAX_BYTES */
>  	       nla_total_size(sizeof(u32)) +	/* _TX_AGGR_MAX_FRAMES */
> -	       nla_total_size(sizeof(u32));	/* _TX_AGGR_TIME_USECS */
> +	       nla_total_size(sizeof(u32)) +	/* _TX_AGGR_TIME_USECS */
> +	       total_modersz * 2;		/* _{R,T}X_PROFILE */
>  }
>  
>  static bool coalesce_put_u32(struct sk_buff *skb, u16 attr_type, u32 val,
> @@ -127,6 +137,75 @@ static bool coalesce_put_bool(struct sk_buff *skb, u16 attr_type, u32 val,
>  	return nla_put_u8(skb, attr_type, !!val);
>  }
>  
> +#if IS_ENABLED(CONFIG_DIMLIB)

Can we decrease the use of IS_ENABLED() here, somehow?
Do we need to protect anything else than accesses to dev->irq_moder ?
Does coalesce_put_profile() need CONFIG_DIMLIB to build?

> +/**
> + * coalesce_put_profile - fill reply with a nla nest with four child nla nests.
> + * @skb: socket buffer the message is stored in
> + * @attr_type: nest attr type ETHTOOL_A_COALESCE_*X_PROFILE
> + * @profile: data passed to userspace
> + * @coal_flags: modifiable parameters supported by the driver
> + *
> + * Put a dim profile nest attribute. Refer to ETHTOOL_A_PROFILE_IRQ_MODERATION.
> + *
> + * Return: 0 on success or a negative error code.
> + */
> +static int coalesce_put_profile(struct sk_buff *skb, u16 attr_type,
> +				const struct dim_cq_moder *profile,
> +				u8 coal_flags)
> +{
> +	struct nlattr *profile_attr, *moder_attr;
> +	int i, ret;
> +
> +	if (!profile || !coal_flags)
> +		return 0;
> +
> +	profile_attr = nla_nest_start(skb, attr_type);
> +	if (!profile_attr)
> +		return -EMSGSIZE;
> +
> +	for (i = 0; i < NET_DIM_PARAMS_NUM_PROFILES; i++) {
> +		moder_attr = nla_nest_start(skb, ETHTOOL_A_PROFILE_IRQ_MODERATION);
> +		if (!moder_attr) {
> +			ret = -EMSGSIZE;
> +			goto cancel_profile;
> +		}
> +
> +		if (coal_flags & DIM_COALESCE_USEC) {
> +			ret = nla_put_u32(skb, ETHTOOL_A_IRQ_MODERATION_USEC,
> +					  profile[i].usec);
> +			if (ret)
> +				goto cancel_moder;
> +		}
> +
> +		if (coal_flags & DIM_COALESCE_PKTS) {
> +			ret = nla_put_u32(skb, ETHTOOL_A_IRQ_MODERATION_PKTS,
> +					  profile[i].pkts);
> +			if (ret)
> +				goto cancel_moder;
> +		}
> +
> +		if (coal_flags & DIM_COALESCE_COMPS) {
> +			ret = nla_put_u32(skb, ETHTOOL_A_IRQ_MODERATION_COMPS,
> +					  profile[i].comps);
> +			if (ret)
> +				goto cancel_moder;
> +		}
> +
> +		nla_nest_end(skb, moder_attr);
> +	}
> +
> +	nla_nest_end(skb, profile_attr);
> +
> +	return 0;
> +
> +cancel_moder:
> +	nla_nest_cancel(skb, moder_attr);
> +cancel_profile:
> +	nla_nest_cancel(skb, profile_attr);
> +	return ret;
> +}
> +#endif
> +
>  static int coalesce_fill_reply(struct sk_buff *skb,
>  			       const struct ethnl_req_info *req_base,
>  			       const struct ethnl_reply_data *reply_base)
> @@ -134,6 +213,12 @@ static int coalesce_fill_reply(struct sk_buff *skb,
>  	const struct coalesce_reply_data *data = COALESCE_REPDATA(reply_base);
>  	const struct kernel_ethtool_coalesce *kcoal = &data->kernel_coalesce;
>  	const struct ethtool_coalesce *coal = &data->coalesce;
> +#if IS_ENABLED(CONFIG_DIMLIB)
> +	struct net_device *dev = req_base->dev;
> +	struct dim_irq_moder *irq_moder = dev->irq_moder;
> +	u8 coal_flags;
> +	int ret;
> +#endif
>  	u32 supported = data->supported_params;
>  
>  	if (coalesce_put_u32(skb, ETHTOOL_A_COALESCE_RX_USECS,
> @@ -192,11 +277,51 @@ static int coalesce_fill_reply(struct sk_buff *skb,
>  			     kcoal->tx_aggr_time_usecs, supported))
>  		return -EMSGSIZE;
>  
> +#if IS_ENABLED(CONFIG_DIMLIB)
> +	if (!irq_moder)
> +		return 0;
> +
> +	coal_flags = irq_moder->coal_flags;
> +	rcu_read_lock();
> +	if (irq_moder->profile_flags & DIM_PROFILE_RX) {
> +		ret = coalesce_put_profile(skb, ETHTOOL_A_COALESCE_RX_PROFILE,
> +					   rcu_dereference(irq_moder->rx_profile),

rtnl_deference can be used there, I assume updates are protected by
rtnl_lock

> +					   coal_flags);
> +		if (ret) {
> +			rcu_read_unlock();
> +			return ret;
> +		}
> +	}
> +
> +	if (irq_moder->profile_flags & DIM_PROFILE_TX) {
> +		ret = coalesce_put_profile(skb, ETHTOOL_A_COALESCE_TX_PROFILE,
> +					   rcu_dereference(irq_moder->tx_profile),
> +					   coal_flags);
> +		if (ret) {
> +			rcu_read_unlock();
> +			return ret;
> +		}
> +	}
> +	rcu_read_unlock();
> +#endif
>  	return 0;
>  }
>  
>  /* COALESCE_SET */
>  
> +#if IS_ENABLED(CONFIG_DIMLIB)
> +static const struct nla_policy coalesce_irq_moderation_policy[] = {
> +	[ETHTOOL_A_IRQ_MODERATION_USEC]	= {.type = NLA_U32},
> +	[ETHTOOL_A_IRQ_MODERATION_PKTS]	= {.type = NLA_U32},
> +	[ETHTOOL_A_IRQ_MODERATION_COMPS] = {.type = NLA_U32},

nit: empty spaces around brackets, please ...

> +};
> +
> +static const struct nla_policy coalesce_profile_irq_policy[] = {
> +	[ETHTOOL_A_PROFILE_IRQ_MODERATION] =
> +		NLA_POLICY_NESTED(coalesce_irq_moderation_policy),
> +};
> +#endif
> +
>  const struct nla_policy ethnl_coalesce_set_policy[] = {
>  	[ETHTOOL_A_COALESCE_HEADER]		=
>  		NLA_POLICY_NESTED(ethnl_header_policy),
> @@ -227,6 +352,12 @@ const struct nla_policy ethnl_coalesce_set_policy[] = {
>  	[ETHTOOL_A_COALESCE_TX_AGGR_MAX_BYTES] = { .type = NLA_U32 },
>  	[ETHTOOL_A_COALESCE_TX_AGGR_MAX_FRAMES] = { .type = NLA_U32 },
>  	[ETHTOOL_A_COALESCE_TX_AGGR_TIME_USECS] = { .type = NLA_U32 },

... like here ^

> +#if IS_ENABLED(CONFIG_DIMLIB)
> +	[ETHTOOL_A_COALESCE_RX_PROFILE] =
> +		NLA_POLICY_NESTED(coalesce_profile_irq_policy),
> +	[ETHTOOL_A_COALESCE_TX_PROFILE] =
> +		NLA_POLICY_NESTED(coalesce_profile_irq_policy),
> +#endif
>  };
>  
>  static int
> @@ -234,6 +365,9 @@ ethnl_set_coalesce_validate(struct ethnl_req_info *req_info,
>  			    struct genl_info *info)
>  {
>  	const struct ethtool_ops *ops = req_info->dev->ethtool_ops;
> +#if IS_ENABLED(CONFIG_DIMLIB)
> +	struct net_device *dev = req_info->dev;
> +#endif
>  	struct nlattr **tb = info->attrs;
>  	u32 supported_params;
>  	u16 a;
> @@ -243,6 +377,15 @@ ethnl_set_coalesce_validate(struct ethnl_req_info *req_info,
>  
>  	/* make sure that only supported parameters are present */
>  	supported_params = ops->supported_coalesce_params;
> +#if IS_ENABLED(CONFIG_DIMLIB)
> +	if (dev->irq_moder) {

This may be NULL

> +		if (dev->irq_moder->profile_flags & DIM_PROFILE_RX)
> +			supported_params |= ETHTOOL_COALESCE_RX_PROFILE;
> +
> +		if (dev->irq_moder->profile_flags & DIM_PROFILE_TX)
> +			supported_params |= ETHTOOL_COALESCE_TX_PROFILE;
> +	}
> +#endif
>  	for (a = ETHTOOL_A_COALESCE_RX_USECS; a < __ETHTOOL_A_COALESCE_CNT; a++)
>  		if (tb[a] && !(supported_params & attr_to_mask(a))) {
>  			NL_SET_ERR_MSG_ATTR(info->extack, tb[a],
> @@ -253,12 +396,104 @@ ethnl_set_coalesce_validate(struct ethnl_req_info *req_info,
>  	return 1;
>  }
>  
> +#if IS_ENABLED(CONFIG_DIMLIB)
> +/**
> + * ethnl_update_profile - get a profile nla nest with child nla nests from userspace.
> + * @dev: netdevice to update the profile
> + * @dst: profile get from the driver and modified by ethnl_update_profile.
> + * @nests: nest attr ETHTOOL_A_COALESCE_*X_PROFILE to set profile.
> + * @extack: Netlink extended ack
> + *
> + * Layout of nests:
> + *   Nested ETHTOOL_A_COALESCE_*X_PROFILE attr
> + *     Nested ETHTOOL_A_PROFILE_IRQ_MODERATION attr
> + *       ETHTOOL_A_IRQ_MODERATION_USEC attr
> + *       ETHTOOL_A_IRQ_MODERATION_PKTS attr
> + *       ETHTOOL_A_IRQ_MODERATION_COMPS attr
> + *     ...
> + *     Nested ETHTOOL_A_PROFILE_IRQ_MODERATION attr
> + *       ETHTOOL_A_IRQ_MODERATION_USEC attr
> + *       ETHTOOL_A_IRQ_MODERATION_PKTS attr
> + *       ETHTOOL_A_IRQ_MODERATION_COMPS attr
> + *
> + * Return: 0 on success or a negative error code.
> + */
> +static int ethnl_update_profile(struct net_device *dev,
> +				struct dim_cq_moder __rcu **dst,
> +				const struct nlattr *nests,
> +				struct netlink_ext_ack *extack)
> +{
> +	struct nlattr *moder[ARRAY_SIZE(coalesce_irq_moderation_policy)];
> +	struct dim_irq_moder *irq_moder = dev->irq_moder;
> +	struct dim_cq_moder *new_profile, *old_profile;
> +	int ret, rem, i = 0, len;
> +	struct nlattr *nest;
> +
> +	if (!nests)
> +		return 0;
> +
> +	if (!*dst)
> +		return -EINVAL;
> +
> +	old_profile = rtnl_dereference(*dst);
> +	len = NET_DIM_PARAMS_NUM_PROFILES * sizeof(*old_profile);
> +	new_profile = kmemdup(old_profile, len, GFP_KERNEL);
> +	if (!new_profile)
> +		return -ENOMEM;
> +
> +	nla_for_each_nested_type(nest, ETHTOOL_A_PROFILE_IRQ_MODERATION, nests, rem) {
> +		ret = nla_parse_nested(moder,
> +				       ARRAY_SIZE(coalesce_irq_moderation_policy) - 1,
> +				       nest, coalesce_irq_moderation_policy,
> +				       extack);
> +		if (ret)
> +			return ret;
> +
> +		if (!NL_REQ_ATTR_CHECK(extack, nest, moder, ETHTOOL_A_IRQ_MODERATION_USEC)) {
> +			if (irq_moder->coal_flags & DIM_COALESCE_USEC)

There are 3 options here, not 2:

	if (irq_moder->coal_flags & flag) {
		if (NL_REQ_ATTR_CHECK())
			val = nla_get_u32(...);
		else
			return -EINVAL;
	} else {
		if (moder[attr_type)) {
			BAD_ATTR()
			return -EOPNOTSUPP;
		}
	}

you probably want to factor this out to a helper..

> +				new_profile[i].usec =
> +					nla_get_u32(moder[ETHTOOL_A_IRQ_MODERATION_USEC]);
> +			else
> +				return -EOPNOTSUPP;
> +		}
> +
> +		if (!NL_REQ_ATTR_CHECK(extack, nest, moder, ETHTOOL_A_IRQ_MODERATION_PKTS)) {
> +			if (irq_moder->coal_flags & DIM_COALESCE_PKTS)
> +				new_profile[i].pkts =
> +					nla_get_u32(moder[ETHTOOL_A_IRQ_MODERATION_PKTS]);
> +			else
> +				return -EOPNOTSUPP;
> +		}
> +
> +		if (!NL_REQ_ATTR_CHECK(extack, nest, moder, ETHTOOL_A_IRQ_MODERATION_COMPS)) {
> +			if (irq_moder->coal_flags & DIM_COALESCE_COMPS)
> +				new_profile[i].comps =
> +					nla_get_u32(moder[ETHTOOL_A_IRQ_MODERATION_COMPS]);
> +			else
> +				return -EOPNOTSUPP;
> +		}
> +
> +		i++;
> +	}
> +
> +	rcu_assign_pointer(*dst, new_profile);
> +
> +	synchronize_rcu();
> +	kfree(old_profile);
> +
> +	return 0;
> +}
> +#endif
> +
>  static int
>  __ethnl_set_coalesce(struct ethnl_req_info *req_info, struct genl_info *info,
>  		     bool *dual_change)
>  {
>  	struct kernel_ethtool_coalesce kernel_coalesce = {};
>  	struct net_device *dev = req_info->dev;
> +#if IS_ENABLED(CONFIG_DIMLIB)
> +	struct dim_irq_moder *irq_moder = dev->irq_moder;
> +#endif
>  	struct ethtool_coalesce coalesce = {};
>  	bool mod_mode = false, mod = false;
>  	struct nlattr **tb = info->attrs;
> @@ -317,6 +552,33 @@ __ethnl_set_coalesce(struct ethnl_req_info *req_info, struct genl_info *info,
>  	ethnl_update_u32(&kernel_coalesce.tx_aggr_time_usecs,
>  			 tb[ETHTOOL_A_COALESCE_TX_AGGR_TIME_USECS], &mod);
>  
> +#if IS_ENABLED(CONFIG_DIMLIB)
> +	if (!irq_moder)
> +		goto skip_irq_moder;
> +
> +	if (irq_moder->profile_flags & DIM_PROFILE_RX) {
> +		ret = ethnl_update_profile(dev, &irq_moder->rx_profile,
> +					   tb[ETHTOOL_A_COALESCE_RX_PROFILE],
> +					   info->extack);
> +		if (ret < 0)
> +			return ret;
> +	}
> +
> +	if (irq_moder->profile_flags & DIM_PROFILE_TX) {
> +		ret = ethnl_update_profile(dev, &irq_moder->tx_profile,
> +					   tb[ETHTOOL_A_COALESCE_TX_PROFILE],
> +					   info->extack);
> +		if (ret < 0)
> +			return ret;
> +	}
> +
> +skip_irq_moder:
> +#else
> +	if (tb[ETHTOOL_A_COALESCE_RX_PROFILE] ||
> +	    tb[ETHTOOL_A_COALESCE_TX_PROFILE])
> +		return -EOPNOTSUPP;

ethnl_set_coalesce_validate() should already reject this, no?

> +#endif
>  	/* Update operation modes */
>  	ethnl_update_bool32(&coalesce.use_adaptive_rx_coalesce,
>  			    tb[ETHTOOL_A_COALESCE_USE_ADAPTIVE_RX], &mod_mode);


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net-next v10 2/4] ethtool: provide customized dim profile management
  2024-04-27  1:33   ` Jakub Kicinski
@ 2024-04-28 14:49     ` Heng Qi
  2024-04-29 17:47       ` Jakub Kicinski
  0 siblings, 1 reply; 11+ messages in thread
From: Heng Qi @ 2024-04-28 14:49 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: netdev, virtualization, David S . Miller, Paolo Abeni,
	Eric Dumazet, Jason Wang, Michael S . Tsirkin, Brett Creeley,
	Ratheesh Kannoth, Alexander Lobakin, Xuan Zhuo, Tal Gilboa,
	Jonathan Corbet, linux-doc, Maxime Chevallier, Jiri Pirko,
	Paul Greenwalt, Ahmed Zaki, Vladimir Oltean, Kory Maincent,
	Andrew Lunn, justinstitt @ google . com


在 2024/4/27 上午9:33, Jakub Kicinski 写道:
> On Fri, 26 Apr 2024 00:59:46 +0800 Heng Qi wrote:
>> The NetDIM library, currently leveraged by an array of NICs, delivers
>> excellent acceleration benefits. Nevertheless, NICs vary significantly
>> in their dim profile list prerequisites.
>>
>> Specifically, virtio-net backends may present diverse sw or hw device
>> implementation, making a one-size-fits-all parameter list impractical.
>> On Alibaba Cloud, the virtio DPU's performance under the default DIM
>> profile falls short of expectations, partly due to a mismatch in
>> parameter configuration.
>>
>> I also noticed that ice/idpf/ena and other NICs have customized
>> profilelist or placed some restrictions on dim capabilities.
>>
>> Motivated by this, I tried adding new params for "ethtool -C" that provides
>> a per-device control to modify and access a device's interrupt parameters.
>>
>> Usage
>> ========
>> The target NIC is named ethx.
>>
>> Assume that ethx only declares support for rx profile setting
>> (with DIM_PROFILE_RX flag set in profile_flags) and supports modification
>> of usec and pkt fields.
>>
>> 1. Query the currently customized list of the device
>>
>> $ ethtool -c ethx
>> ...
>> rx-profile:
>> {.usec =   1, .pkts = 256, .comps = n/a,},
>> {.usec =   8, .pkts = 256, .comps = n/a,},
>> {.usec =  64, .pkts = 256, .comps = n/a,},
>> {.usec = 128, .pkts = 256, .comps = n/a,},
>> {.usec = 256, .pkts = 256, .comps = n/a,}
>> tx-profile:   n/a
>>
>> 2. Tune
>> $ ethtool -C ethx rx-profile 1,1,n_2,n,n_3,3,n_4,4,n_n,5,n
>> "n" means do not modify this field.
>> $ ethtool -c ethx
>> ...
>> rx-profile:
>> {.usec =   1, .pkts =   1, .comps = n/a,},
>> {.usec =   2, .pkts = 256, .comps = n/a,},
>> {.usec =   3, .pkts =   3, .comps = n/a,},
>> {.usec =   4, .pkts =   4, .comps = n/a,},
>> {.usec = 256, .pkts =   5, .comps = n/a,}
>> tx-profile:   n/a
>>
>> 3. Hint
>> If the device does not support some type of customized dim profiles,
>> the corresponding "n/a" will display.
>>
>> If the "n/a" field is being modified, -EOPNOTSUPP will be reported.
>>
>>
>> --- a/include/uapi/linux/ethtool_netlink.h
>> +++ b/include/uapi/linux/ethtool_netlink.h
>> @@ -416,12 +416,32 @@ enum {
>>   	ETHTOOL_A_COALESCE_TX_AGGR_MAX_BYTES,		/* u32 */
>>   	ETHTOOL_A_COALESCE_TX_AGGR_MAX_FRAMES,		/* u32 */
>>   	ETHTOOL_A_COALESCE_TX_AGGR_TIME_USECS,		/* u32 */
>> +	ETHTOOL_A_COALESCE_RX_PROFILE,			/* nest - _A_PROFILE_IRQ_MODERATION */
>> +	ETHTOOL_A_COALESCE_TX_PROFILE,			/* nest - _A_PROFILE_IRQ_MODERATION */
>>   
>>   	/* add new constants above here */
>>   	__ETHTOOL_A_COALESCE_CNT,
>>   	ETHTOOL_A_COALESCE_MAX = (__ETHTOOL_A_COALESCE_CNT - 1)
>>   };
>>   
>> +enum {
>> +	ETHTOOL_A_PROFILE_UNSPEC,
>> +	ETHTOOL_A_PROFILE_IRQ_MODERATION,		/* nest, _A_IRQ_MODERATION_* */
>> +
>> +	__ETHTOOL_A_PROFILE_CNT,
>> +	ETHTOOL_A_PROFILE_MAX = (__ETHTOOL_A_PROFILE_CNT - 1)
>>
>> +};
> I think this doesn't match what you described in the YAML spec.
> There is no "irq-moderation" layer there and no multi-attr: true...
> Does tools/net/ynl/cli.py work with the new attributes?

Yes, YAML requires an extra layer for tools/net/ynl/cli.py to work

properly. It has been fixed now.

> +void net_dim_free_irq_moder(struct net_device *dev)
> +{
> +	struct dim_cq_moder *rx_profile, *tx_profile;
> +
> +	if (!dev->irq_moder)
> +		return;
> +
> +	rcu_read_lock();
> +	rx_profile = rcu_dereference(dev->irq_moder->rx_profile);
> +	tx_profile = rcu_dereference(dev->irq_moder->tx_profile);
> +	rcu_read_unlock();
> rtnl_dereference() ? use of the pointers you got outside of the read
> critical section looks wrong

rtnl_lock is not held in the current context. The next version will hold rtnl_lock

in the driver, and the function is as follows:

struct dim_cq_moder {
         u16 usec;
         u16 pkts;
         u16 comps;
         u8 cq_period_mode;
+        struct rcu_head rcu;
+};


+/* RTNL lock is held. */
+void net_dim_free_irq_moder(struct net_device *dev)
+{
+       struct dim_cq_moder *rxp, *txp;
+
+       if (!dev->irq_moder)
+               return;
+
+       rxp = rtnl_dereference(dev->irq_moder->rx_profile);
+       txp = rtnl_dereference(dev->irq_moder->tx_profile);
+
+       rcu_assign_pointer(dev->irq_moder->rx_profile, NULL);
+       rcu_assign_pointer(dev->irq_moder->tx_profile, NULL);
+
+       kfree_rcu(rxp, rcu);
+       kfree_rcu(txp, rcu);
+       kfree(dev->irq_moder);
+}
+EXPORT_SYMBOL(net_dim_free_irq_moder);

>> +	rcu_assign_pointer(dev->irq_moder->tx_profile, NULL);
>> +	rcu_assign_pointer(dev->irq_moder->rx_profile, NULL);
>> +
>> +	synchronize_rcu();
> Better to use kfree_rcu(), synchronize_rcu() can be quite slow


Ok.


>> +	
>>   static bool coalesce_put_u32(struct sk_buff *skb, u16 attr_type, u32 val,
>> @@ -127,6 +137,75 @@ static bool coalesce_put_bool(struct sk_buff *skb, u16 attr_type, u32 val,
>>   	return nla_put_u8(skb, attr_type, !!val);
>>   }
>>   
>> +#if IS_ENABLED(CONFIG_DIMLIB)
> Can we decrease the use of IS_ENABLED() here, somehow?
> Do we need to protect anything else than accesses to dev->irq_moder ?
> Does coalesce_put_profile() need CONFIG_DIMLIB to build?


I will try to decrease this.


>   static int coalesce_fill_reply(struct sk_buff *skb,
>   			       const struct ethnl_req_info *req_base,
>   			       const struct ethnl_reply_data *reply_base)
> @@ -134,6 +213,12 @@ static int coalesce_fill_reply(struct sk_buff *skb,
>   	const struct coalesce_reply_data *data = COALESCE_REPDATA(reply_base);
>   	const struct kernel_ethtool_coalesce *kcoal = &data->kernel_coalesce;
>   	const struct ethtool_coalesce *coal = &data->coalesce;
> +#if IS_ENABLED(CONFIG_DIMLIB)
> +	struct net_device *dev = req_base->dev;
> +	struct dim_irq_moder *irq_moder = dev->irq_moder;
> +	u8 coal_flags;
> +	int ret;
> +#endif
>   	u32 supported = data->supported_params;
>   
>   	if (coalesce_put_u32(skb, ETHTOOL_A_COALESCE_RX_USECS,
> @@ -192,11 +277,51 @@ static int coalesce_fill_reply(struct sk_buff *skb,
>   			     kcoal->tx_aggr_time_usecs, supported))
>   		return -EMSGSIZE;
>   
> +#if IS_ENABLED(CONFIG_DIMLIB)
> +	if (!irq_moder)
> +		return 0;
> +
> +	coal_flags = irq_moder->coal_flags;
> +	rcu_read_lock();
> +	if (irq_moder->profile_flags & DIM_PROFILE_RX) {
> +		ret = coalesce_put_profile(skb, ETHTOOL_A_COALESCE_RX_PROFILE,
> +					   rcu_dereference(irq_moder->rx_profile),
> rtnl_deference can be used there, I assume updates are protected by
> rtnl_lock


The update-side does hold rtnl_lock. But here is the read-side,

so I think rcu_read_lock + rcu_dereference is enough.


>> +					   coal_flags);
>> +		if (ret) {
>> +			rcu_read_unlock();
>> +			return ret;
>> +		}
>> +	}
>> +
>> +	if (irq_moder->profile_flags & DIM_PROFILE_TX) {
>> +		ret = coalesce_put_profile(skb, ETHTOOL_A_COALESCE_TX_PROFILE,
>> +					   rcu_dereference(irq_moder->tx_profile),
>> +					   coal_flags);
>> +		if (ret) {
>> +			rcu_read_unlock();
>> +			return ret;
>> +		}
>> +	}
>> +	rcu_read_unlock();
>> +#endif
>>   	return 0;
>>   }
>>   
>>   /* COALESCE_SET */
>>   
>>   static int
>> @@ -234,6 +365,9 @@ ethnl_set_coalesce_validate(struct ethnl_req_info *req_info,
>>   			    struct genl_info *info)
>>   {
>>   	const struct ethtool_ops *ops = req_info->dev->ethtool_ops;
>> +#if IS_ENABLED(CONFIG_DIMLIB)
>> +	struct net_device *dev = req_info->dev;
>> +#endif
>>   	struct nlattr **tb = info->attrs;
>>   	u32 supported_params;
>>   	u16 a;
>> @@ -243,6 +377,15 @@ ethnl_set_coalesce_validate(struct ethnl_req_info *req_info,
>>   
>>   	/* make sure that only supported parameters are present */
>>   	supported_params = ops->supported_coalesce_params;
>> +#if IS_ENABLED(CONFIG_DIMLIB)
>> +	if (dev->irq_moder) {
> This may be NULL


Do you mean dev may be null or dev->irq_moder may be null?
The former has been excluded (see const struct ethtool_ops *ops

= req_info->dev->ethtool_ops;).

And we are ruling out the latter using 'if (dev->irq_moder)'.

Or something else?


>> +		if (dev->irq_moder->profile_flags & DIM_PROFILE_RX)
>> +			supported_params |= ETHTOOL_COALESCE_RX_PROFILE;
>> +
>> +		if (dev->irq_moder->profile_flags & DIM_PROFILE_TX)
>> +			supported_params |= ETHTOOL_COALESCE_TX_PROFILE;
>> +	}
>> +#endif
>>   	for (a = ETHTOOL_A_COALESCE_RX_USECS; a < __ETHTOOL_A_COALESCE_CNT; a++)
>>   		if (tb[a] && !(supported_params & attr_to_mask(a))) {
>>   			NL_SET_ERR_MSG_ATTR(info->extack, tb[a],
>> @@ -253,12 +396,104 @@ ethnl_set_coalesce_validate(struct ethnl_req_info *req_info,
>>   	return 1;
>>   }
>>   
>> +#if IS_ENABLED(CONFIG_DIMLIB)
>> +/**
>> + * ethnl_update_profile - get a profile nla nest with child nla nests from userspace.
>> + * @dev: netdevice to update the profile
>> + * @dst: profile get from the driver and modified by ethnl_update_profile.
>> + * @nests: nest attr ETHTOOL_A_COALESCE_*X_PROFILE to set profile.
>> + * @extack: Netlink extended ack
>> + *
>> + * Layout of nests:
>> + *   Nested ETHTOOL_A_COALESCE_*X_PROFILE attr
>> + *     Nested ETHTOOL_A_PROFILE_IRQ_MODERATION attr
>> + *       ETHTOOL_A_IRQ_MODERATION_USEC attr
>> + *       ETHTOOL_A_IRQ_MODERATION_PKTS attr
>> + *       ETHTOOL_A_IRQ_MODERATION_COMPS attr
>> + *     ...
>> + *     Nested ETHTOOL_A_PROFILE_IRQ_MODERATION attr
>> + *       ETHTOOL_A_IRQ_MODERATION_USEC attr
>> + *       ETHTOOL_A_IRQ_MODERATION_PKTS attr
>> + *       ETHTOOL_A_IRQ_MODERATION_COMPS attr
>> + *
>> + * Return: 0 on success or a negative error code.
>> + */
>> +static int ethnl_update_profile(struct net_device *dev,
>> +				struct dim_cq_moder __rcu **dst,
>> +				const struct nlattr *nests,
>> +				struct netlink_ext_ack *extack)
>> +{
>> +	struct nlattr *moder[ARRAY_SIZE(coalesce_irq_moderation_policy)];
>> +	struct dim_irq_moder *irq_moder = dev->irq_moder;
>> +	struct dim_cq_moder *new_profile, *old_profile;
>> +	int ret, rem, i = 0, len;
>> +	struct nlattr *nest;
>> +
>> +	if (!nests)
>> +		return 0;
>> +
>> +	if (!*dst)
>> +		return -EINVAL;
>> +
>> +	old_profile = rtnl_dereference(*dst);
>> +	len = NET_DIM_PARAMS_NUM_PROFILES * sizeof(*old_profile);
>> +	new_profile = kmemdup(old_profile, len, GFP_KERNEL);
>> +	if (!new_profile)
>> +		return -ENOMEM;
>> +
>> +	nla_for_each_nested_type(nest, ETHTOOL_A_PROFILE_IRQ_MODERATION, nests, rem) {
>> +		ret = nla_parse_nested(moder,
>> +				       ARRAY_SIZE(coalesce_irq_moderation_policy) - 1,
>> +				       nest, coalesce_irq_moderation_policy,
>> +				       extack);
>> +		if (ret)
>> +			return ret;
>> +
>> +		if (!NL_REQ_ATTR_CHECK(extack, nest, moder, ETHTOOL_A_IRQ_MODERATION_USEC)) {
>> +			if (irq_moder->coal_flags & DIM_COALESCE_USEC)
> There are 3 options here, not 2:
>
> 	if (irq_moder->coal_flags & flag) {
> 		if (NL_REQ_ATTR_CHECK())
> 			val = nla_get_u32(...);
> 		else
> 			return -EINVAL;
> 	} else {
> 		if (moder[attr_type)) {
> 			BAD_ATTR()
> 			return -EOPNOTSUPP;
> 		}
> 	}

Maybe we missed something.

As shown in the commit log, the user is allowed to modify only
a certain fields in irq-moderation. It is assumed that the driver
supports modification of usec and pkts, but the user may only
modify usec and only fill in the usec attr.

Therefore, the kernel only gets usec attr here. Of course, the user
may have passed in 5 groups of "n, n, n", which means that nothing
is modified, and rx_profile and irq_moderation attrs are all empty.

Thanks!


> you probably want to factor this out to a helper..
>
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net-next v10 2/4] ethtool: provide customized dim profile management
  2024-04-28 14:49     ` Heng Qi
@ 2024-04-29 17:47       ` Jakub Kicinski
  2024-04-30  1:59         ` Heng Qi
  0 siblings, 1 reply; 11+ messages in thread
From: Jakub Kicinski @ 2024-04-29 17:47 UTC (permalink / raw)
  To: Heng Qi
  Cc: netdev, virtualization, David S . Miller, Paolo Abeni,
	Eric Dumazet, Jason Wang, Michael S . Tsirkin, Brett Creeley,
	Ratheesh Kannoth, Alexander Lobakin, Xuan Zhuo, Tal Gilboa,
	Jonathan Corbet, linux-doc, Maxime Chevallier, Jiri Pirko,
	Paul Greenwalt, Ahmed Zaki, Vladimir Oltean, Kory Maincent,
	Andrew Lunn, justinstitt @ google . com

On Sun, 28 Apr 2024 22:49:09 +0800 Heng Qi wrote:
> >> +#if IS_ENABLED(CONFIG_DIMLIB)
> >> +	if (dev->irq_moder) {  
> > This may be NULL  
> 
> 
> Do you mean dev may be null or dev->irq_moder may be null?
> The former has been excluded (see const struct ethtool_ops *ops
> 
> = req_info->dev->ethtool_ops;).
> 
> And we are ruling out the latter using 'if (dev->irq_moder)'.
> 
> Or something else?

Hm, I must have misread something here.

> >> +	nla_for_each_nested_type(nest, ETHTOOL_A_PROFILE_IRQ_MODERATION, nests, rem) {
> >> +		ret = nla_parse_nested(moder,
> >> +				       ARRAY_SIZE(coalesce_irq_moderation_policy) - 1,
> >> +				       nest, coalesce_irq_moderation_policy,
> >> +				       extack);
> >> +		if (ret)
> >> +			return ret;
> >> +
> >> +		if (!NL_REQ_ATTR_CHECK(extack, nest, moder, ETHTOOL_A_IRQ_MODERATION_USEC)) {
> >> +			if (irq_moder->coal_flags & DIM_COALESCE_USEC)  
> > There are 3 options here, not 2:
> >
> > 	if (irq_moder->coal_flags & flag) {
> > 		if (NL_REQ_ATTR_CHECK())
> > 			val = nla_get_u32(...);
> > 		else
> > 			return -EINVAL;
> > 	} else {
> > 		if (moder[attr_type)) {
> > 			BAD_ATTR()
> > 			return -EOPNOTSUPP;
> > 		}
> > 	}  
> 
> Maybe we missed something.
> 
> As shown in the commit log, the user is allowed to modify only
> a certain fields in irq-moderation. It is assumed that the driver
> supports modification of usec and pkts, but the user may only
> modify usec and only fill in the usec attr.
> 
> Therefore, the kernel only gets usec attr here. Of course, the user
> may have passed in 5 groups of "n, n, n", which means that nothing
> is modified, and rx_profile and irq_moderation attrs are all empty.

What you describe sounds good, but it's not what the code seems to be
doing. NL_REQ_ATTR_CHECK() will set an error attribute if the attr is
not present.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net-next v10 2/4] ethtool: provide customized dim profile management
  2024-04-29 17:47       ` Jakub Kicinski
@ 2024-04-30  1:59         ` Heng Qi
  2024-04-30  3:13           ` Jakub Kicinski
  0 siblings, 1 reply; 11+ messages in thread
From: Heng Qi @ 2024-04-30  1:59 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: netdev, virtualization, David S .  Miller, Paolo Abeni,
	Eric Dumazet, Jason Wang, Michael S  . Tsirkin, Brett Creeley,
	Ratheesh Kannoth, Alexander Lobakin, Xuan Zhuo, Tal Gilboa,
	Jonathan Corbet, linux-doc, Maxime Chevallier, Jiri Pirko,
	Paul Greenwalt, Ahmed Zaki, Vladimir Oltean, Kory Maincent,
	Andrew Lunn, justinstitt @  google . com

On Mon, 29 Apr 2024 10:47:41 -0700, Jakub Kicinski <kuba@kernel.org> wrote:
> On Sun, 28 Apr 2024 22:49:09 +0800 Heng Qi wrote:
> > >> +	nla_for_each_nested_type(nest, ETHTOOL_A_PROFILE_IRQ_MODERATION, nests, rem) {
> > >> +		ret = nla_parse_nested(moder,
> > >> +				       ARRAY_SIZE(coalesce_irq_moderation_policy) - 1,
> > >> +				       nest, coalesce_irq_moderation_policy,
> > >> +				       extack);
> > >> +		if (ret)
> > >> +			return ret;
> > >> +
> > >> +		if (!NL_REQ_ATTR_CHECK(extack, nest, moder, ETHTOOL_A_IRQ_MODERATION_USEC)) {
> > >> +			if (irq_moder->coal_flags & DIM_COALESCE_USEC)  
> > > There are 3 options here, not 2:
> > >
> > > 	if (irq_moder->coal_flags & flag) {
> > > 		if (NL_REQ_ATTR_CHECK())
> > > 			val = nla_get_u32(...);
> > > 		else
> > > 			return -EINVAL;
> > > 	} else {
> > > 		if (moder[attr_type)) {
> > > 			BAD_ATTR()
> > > 			return -EOPNOTSUPP;
> > > 		}
> > > 	}  
> > 
> > Maybe we missed something.
> > 
> > As shown in the commit log, the user is allowed to modify only
> > a certain fields in irq-moderation. It is assumed that the driver
> > supports modification of usec and pkts, but the user may only
> > modify usec and only fill in the usec attr.
> > 
> > Therefore, the kernel only gets usec attr here. Of course, the user
> > may have passed in 5 groups of "n, n, n", which means that nothing
> > is modified, and rx_profile and irq_moderation attrs are all empty.
> 
> What you describe sounds good, but it's not what the code seems to be
> doing. NL_REQ_ATTR_CHECK() will set an error attribute if the attr is
> not present.

So here we can use:

+ if (moder[ETHTOOL_A_IRQ_MODERATION_USEC]) {
+ 	if (irq_moder->coal_flags & DIM_COALESCE_USEC)
+ 		new_profile[i].usec =
+ 			nla_get_u32(moder[ETHTOOL_A_IRQ_MODERATION_USEC]);
+ 	else
+ 		return -EOPNOTSUPP;
+ }

instead of:

+ if (!NL_REQ_ATTR_CHECK(extack, nest, moder, ETHTOOL_A_IRQ_MODERATION_USEC)) {
+ 	if (irq_moder->coal_flags & DIM_COALESCE_USEC)
+ 		new_profile[i].usec =
+ 			nla_get_u32(moder[ETHTOOL_A_IRQ_MODERATION_USEC]);
+ 	else
+ 		return -EOPNOTSUPP;
+ }

to avoid missing information set in extack?

Thanks.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net-next v10 2/4] ethtool: provide customized dim profile management
  2024-04-30  1:59         ` Heng Qi
@ 2024-04-30  3:13           ` Jakub Kicinski
  2024-04-30  4:43             ` Heng Qi
  0 siblings, 1 reply; 11+ messages in thread
From: Jakub Kicinski @ 2024-04-30  3:13 UTC (permalink / raw)
  To: Heng Qi
  Cc: netdev, virtualization, David S .  Miller, Paolo Abeni,
	Eric Dumazet, Jason Wang, Michael S . Tsirkin, Brett Creeley,
	Ratheesh Kannoth, Alexander Lobakin, Xuan Zhuo, Tal Gilboa,
	Jonathan Corbet, linux-doc, Maxime Chevallier, Jiri Pirko,
	Paul Greenwalt, Ahmed Zaki, Vladimir Oltean, Kory Maincent,
	Andrew Lunn, justinstitt @  google . com

On Tue, 30 Apr 2024 09:59:39 +0800 Heng Qi wrote:
> + if (moder[ETHTOOL_A_IRQ_MODERATION_USEC]) {
> + 	if (irq_moder->coal_flags & DIM_COALESCE_USEC)
> + 		new_profile[i].usec =
> + 			nla_get_u32(moder[ETHTOOL_A_IRQ_MODERATION_USEC]);
> + 	else
> + 		return -EOPNOTSUPP;
> + }

Almost, the extack should still be there on error:

+ if (moder[ETHTOOL_A_IRQ_MODERATION_USEC])
+ 	if (irq_moder->coal_flags & DIM_COALESCE_USEC) {
+ 		new_profile[i].usec =
+ 			nla_get_u32(moder[ETHTOOL_A_IRQ_MODERATION_USEC]);
+ 	} else {
+		NL_SET_BAD_ATTR(extack, moder[ETHTOOL_A_IRQ_MODERATION_USEC]);
+ 		return -EOPNOTSUPP;
+ 	}


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net-next v10 2/4] ethtool: provide customized dim profile management
  2024-04-30  3:13           ` Jakub Kicinski
@ 2024-04-30  4:43             ` Heng Qi
  0 siblings, 0 replies; 11+ messages in thread
From: Heng Qi @ 2024-04-30  4:43 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: netdev, virtualization, David S .   Miller, Paolo Abeni,
	Eric Dumazet, Jason Wang, Michael S   . Tsirkin, Brett Creeley,
	Ratheesh Kannoth, Alexander Lobakin, Xuan Zhuo, Tal Gilboa,
	Jonathan Corbet, linux-doc, Maxime Chevallier, Jiri Pirko,
	Paul Greenwalt, Ahmed Zaki, Vladimir Oltean, Kory Maincent,
	Andrew Lunn, justinstitt @   google . com

On Mon, 29 Apr 2024 20:13:00 -0700, Jakub Kicinski <kuba@kernel.org> wrote:
> On Tue, 30 Apr 2024 09:59:39 +0800 Heng Qi wrote:
> > + if (moder[ETHTOOL_A_IRQ_MODERATION_USEC]) {
> > + 	if (irq_moder->coal_flags & DIM_COALESCE_USEC)
> > + 		new_profile[i].usec =
> > + 			nla_get_u32(moder[ETHTOOL_A_IRQ_MODERATION_USEC]);
> > + 	else
> > + 		return -EOPNOTSUPP;
> > + }
> 
> Almost, the extack should still be there on error:
> 
> + if (moder[ETHTOOL_A_IRQ_MODERATION_USEC])
> + 	if (irq_moder->coal_flags & DIM_COALESCE_USEC) {
> + 		new_profile[i].usec =
> + 			nla_get_u32(moder[ETHTOOL_A_IRQ_MODERATION_USEC]);
> + 	} else {
> +		NL_SET_BAD_ATTR(extack, moder[ETHTOOL_A_IRQ_MODERATION_USEC]);
> + 		return -EOPNOTSUPP;
> + 	}
> 

Sure, thanks!

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2024-04-30  4:44 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-04-25 16:59 [PATCH net-next v10 0/4] ethtool: provide the dim profile fine-tuning channel Heng Qi
2024-04-25 16:59 ` [PATCH net-next v10 1/4] linux/dim: move useful macros to .h file Heng Qi
2024-04-25 16:59 ` [PATCH net-next v10 2/4] ethtool: provide customized dim profile management Heng Qi
2024-04-27  1:33   ` Jakub Kicinski
2024-04-28 14:49     ` Heng Qi
2024-04-29 17:47       ` Jakub Kicinski
2024-04-30  1:59         ` Heng Qi
2024-04-30  3:13           ` Jakub Kicinski
2024-04-30  4:43             ` Heng Qi
2024-04-25 16:59 ` [PATCH net-next v10 3/4] dim: add new interfaces for initialization and getting results Heng Qi
2024-04-25 16:59 ` [PATCH net-next v10 4/4] virtio-net: support dim profile fine-tuning Heng Qi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).